Abstract Copy number variants (CNVs) are a type of genetic polymorphism which contribute to phenotypic variation in several species, including livestock. In this study, we used genomic data of 192 animals from 3 Iranian sheep breeds including 96 Baluchi sheep and 47 Lori-Bakhtiari sheep as fat-tailed breeds and 47 Zel sheep as thin-tailed sheep breed genotyped with Illumina OvineSNP50K Beadchip arrays. Also, for association test, 70 samples of Valle del Belice sheep were added to the association test as thin-tailed sheep breed. PennCNV and CNVRuler software were, respectively, used to study the copy number variation and genomic association analyses. We detected 573 and 242 CNVs in the fat and thin tailed breeds, respectively. In terms of CNV regions (CNVRs), these represented 328 and 187 CNVRs that were within or overlapping with 790 known Ovine genes. The CNVRs covered approximately 73.85 Mb of the sheep genome with average length 146.88 kb, and corresponded to 2.6% of the autosomal genome sequence. Five CNVRs were randomly chosen for validation, of which 4 were experimentally confirmed using Real time qPCR. Functional enrichment analysis showed that genes harbouring CNVs in thin-tailed sheep were involved in the adaptive immune response, regulation of reactive oxygen species biosynthetic process and response to starvation. In fat-tailed breeds these genes were involved in cellular protein modification process, regulation of heart rate, intestinal absorption, olfactory receptor activity and ATP binding. Association test identified one copy gained CNVR on chromosomes 6 harbouring two protein-coding genes HGFAC and LRPAP1. Our findings provide information about genomic structural changes and their association to the interested traits including fat deposition and environmental compatibility in sheep. Subject terms: Agricultural genetics, Animal breeding, Genetic association study, Genetic markers, Genomics Introduction Sheep breeding has an important role in meat production in Iran. There are 28 distinct sheep breeds in Iran distributed over different environments from dry and warm climate to the mountainous cold areas^[36]1,[37]2. These breeds are characterized by a wide range of phenotypic variation especially in the case of tail shape. The vast majority of Iranian sheep are fat-tailed^[38]3 while Zel is known as the only thin-tailed breed reared in the north part of Iran near to Caspian Sea (Fig. [39]1). The fat tail characteristic of sheep has a role in the survival and adaptation mechanisms in harsh environments. It is aimed for depositing nutrients when food supply is abundant and represents a valuable metabolic energy during periods of climate changes like drought and highly cold periods and food insufficiency^[40]4,[41]5. In addition, the fat in the tails can be consumed by humans as a source of energy during periods of drought and famine^[42]6,[43]7. Figure 1. Figure 1 [44]Open in a new tab Traditional geographic distributions of the three Iranian sheep breeds. Since a large portion of the fat deposited in the carcass of sheep is in the tail and represents more than 20% of the carcass weight, the amount of feed requirement to store fat in the tail and therefore, the cost of feeding, could be considerable^[45]8,[46]9. With the development of modern livestock breeding systems, most of the benefits of a fat tail have lost their significance and accordingly reduction of fat deposition in tail is one of the main interests of the producers^[47]7. Besides, the consumers in many cases prefer the low-fat meat and therefore the carcass adiposity especially in the case of the fat tailed, is no longer desirable to customers and lessens the value of the meat^[48]10. Removing or reducing the amount of the fat in the tails of local sheep can be a breeding objective for the sheep industry. This can be approached by docking the fat-tail^[49]11, slaughtering lambs at an early age, or crossing the fat-tailed breeds with tailed breeds^[50]10. The phenotypic diversity and genetic structure among sheep breeds appeared due to environmental adaptation and artificial selection for economically important traits such as meat, milk and wool^[51]12. Sheep adaptation to different environmental conditions may have resulted in variation in genes and genomic structure^[52]13. A better understanding of the genomic structure underlying fat deposition is very important for controlling the fat deposition in carcass through sheep breeding. With the advent of genome-wide SNP detection technology, it is now possible to identify genomic regions associated with a variety of economically important traits and subsequently to use this information in genetic improvement of livestock. Structural genetic variants represent a category of genomic changes of DNA that typically extend more than a thousand bases^[53]14,[54]15. Copy number variations (CNVs) are the most prevalent type of structural genetic variation as DNA segments that are presented at a different copy number when compared to a reference genome^[55]15. They are observed with variable length varying from 1 kb to several Mb where duplication or deletion events can be detected^[56]16. CNVs play main role in genetic and phenotypic variation^[57]17,[58]18, gene expression and adaptation by disrupting encoding sequences, gene structure changes and the appearance of the recessive alleles^[59]16,[60]19–[61]21. There are several experimental approaches of CNV detection including array-based comparative genomic hybridization arrays (aCGH)^[62]22, SNP genotyping panels^[63]23; and next-generation sequencing (NGS)^[64]24. High-throughput genotyping arrays are the most commonly implemented mainly because of their benefits of their appropriate signal-to-noise ratios, measuring total signal intensity and allelic intensity ratio altogether, which makes the explanation of results easier^[65]25,[66]26. Several genomic studies have been performed to identify functional genes associated with fat deposition in sheep. Using Genomic scan of selective sweeps, three novel regions on chromosomes 5, 7 and X have been reported to be associated with fat deposition in sheep^[67]7. Zhang et al. using high density SNP markers identified 13 candidate genes including SMURF2, FBF1, DTNBP1, SETD7 and RBM11 associated with fat metabolism in sheep^[68]27. Fei et al. using miRNA seqencing identified differentially expression miRNAs in short-fat-tailed short-thin-tailed sheeps among which 17 miRNAs were related with lipid metabolism^[69]28. Salehian-Dehkordi et al. using the genome-wide SNPs detected eight CNVRs associated with fat deposition in the tails in the Large-tailed and Small-tailed Han sheep horboting PPP1R11 and GABBR1 genes to be involved in fat deposition in the tails^[70]29. Zhu et al. using high-density SNP arrays detected genome-wide CNVs in Chinese indigenous sheep with different types of tails and reported CNVRs including genes associated with fat deposition^[71]30. Yuan et al. via selection signature analysis detected 6.24 Mb of overlapped regions and 43 genes that may related to fat tail process in Chinese indigenous sheep^[72]31. Bakhtiarizadeh et al. implemented an expressed sequence tag (EST) study and reported candidate genes associated with tail type development and reported that the FABP4 gene expression in the fat-tail is a main cause of fat formation^[73]32. Moreover, many studies have reported the contribution of the CNVs in many traits in sheep^[74]13,[75]30,[76]33–[77]36. It is known that the agouti duplication, affects the ASIP locus for white and grey coat phenotypes in sheep^[78]33,[79]37. Also, CNV in the KIT gene causes a white coat color in pigs^[80]38, while the phenotype of the pea comb in chickens is affected by CNV in intron 1 of SOX5^[81]39. In addition, genome scan of CNVs in Chinese sheep identified the candidate genes related to fat deposition^[82]40. Despite the importance of CNVs, there are no reports on CNV structure in Iranian indigenous sheep and the mechanism responsible for differences in fat deposition between the fat-tailed and thin-tailed breeds is not yet clear. The aim of this study was to identify the characteristic of CNVs in three Iranian sheep breeds with different types of tails (Baluchi and Lori-Bakhtiari as fat-tailed breeds and Zel as the thin-tailed breed). Also, we performed the association between copy number variation regions with fat-tailed deposition trait in these breeds and run subsequent bioinformatics approaches to report genes related to this trait. Methods Genotyping data A total of 192 individual’s genomic data from 3 breeds including 96 Baluchi sheep^[83]41 and 47 Lori-Bakhtiari sheep^[84]7 as fat-tailed breeds and 47 Zel sheep^[85]7 as thin-tailed breed were used for this study. All samples were genotyped using the Illumina OvineSNP50 BeadChip array with 54,241 SNPs. SNP quality control To obtain reliable and precise results for CNV detection, quality control (QC) was done in two phase including SNP genotyping and CNV calling. At SNP genotyping phase, we performed QC using Plink v1.07 software^[86]42. SNPs or samples were excluded if any of the following criteria was met: MAF < 0.01, (ii) animal call rate < 0.99, SNPs call rate < 0.95 and p-value for Hardy–Weinberg equilibrium < 0.00001. Also, the X and Y chromosomes were excluded from further analyses. CNV calling The CNVs calling was implemented with the SNP data file from the GenomeStudio 1.0 software. The intensity files containing SNP name, chromosome, position, BAF and LRR were obtained for each dataset. The CNV analysis was performed by PennCNV v1.0.5 software^[87]43. This software uses different kind of data based on a hidden Markov model for CNV detection. First, the intensity files were converted into individual files using ‘-split’ option in PennCNV package. The individual-calling algorithm was performed using the ‘-test’ option. PennCNV includes GCmodel argument which utilizes a regression model for adjusting the high GC content and recovers samples influenced by genomic waves^[88]44. To adjust genomic waves, the ‘-gcmodel’ option with the ‘gcmodel’ file was implemented and GC content of the 1-Mb genomic region surrounding each marker (500 kb on each side) was measured. The additional input file for PennCNV including PFB (Population Frequency of B allele) was calculated based on the average BAF of each marker in the three breeds separately. If a large fraction of samples has waviness factor (WF value) less than -0.04 or higher than 0.04, it is much better to apply the adjustment procedure to reduce false positive calls ([89]http://penncnv.openbioinformatics.org/en/latest/user-guide/test/). The status of CNV was classified into two classes: “loss” (CNV containing a deletion) and “gain” (CNV containing a duplication). CNVs quality control The CNV filter was performed to enhance the accuracy of detected CNVs based on these criteria: (1) the CNV must contain at least ten SNPs without gap; (2) the length of the CNV more than 10 kb; and. Quality control was implemented in following rules: standard deviation of LRR < 0.3, BAF drift < 0.01 and a waviness factor between 0.05 and -0.05. Identifying CNVR After CNV detection, the copy number variation regions (CNVRs) were identified by overlapping CNVs using the CNVRuler V1.2 program^[90]45. First, individual CNVs were merged into CNVRs, which are genomic regions covering CNVs overlapping by at least 1 bp^[91]16. This step is easy and straight, however, when the overlapping CNVs are highly long it can lead to overestimation of the size of CNVRs. To alleviate this issue, the CNVRuler gives the opportunity to evaluate base-by-base the regional density of the contributing CNVs and remove the low-density regions. Genomic regions with density lower than 10% were removed (“recurrence 0.1”)^[92]45. The recurrence trims a CNVR on the base of its occurrence to prevent false positive results, and it provides more reliable bounds of the regions^[93]45. The option "Gain/Loss separated regions" was used to assess the result (gain, loss) in each region. Overlapping "gain" and "loss" CNVRs were merged into single regions to identify genomic regions in which both gain and loss events can be observed ("mixed" CNVRs). Association analysis To identify CNVRs significantly associated with fat deposition, a case–control analysis was performed using Fisher’s exact test on which fat-tailed sheep were treated as cases while thin-tailed sheep as controls. The genome wide association analysis between the CNVRs and the tail type was executed using CNVRuler V1.2 program^[94]45 and applying a logistic regression model. Bonferroni correction was used for solving the multiple testing problem. To balance the number of control samples against the cases and also to help focus the differences between the case and control groups on fat deposition, 70 samples of Valle del Belice breed^[95]36 were added to the association test as thin-tailed sheep breed. For Valle del Belice breed, quality control and CNV detection and filtration were performed exactly as previously mentioned for the other three breeds, and then the results were used for association analysis. Gene content in CNVRs The BioMart database ([96]http://www.biomart.org/) in Ensembl was used to investigate genes within or overlapping with the identified CNVRs for each sheep breed based on the Ovis aries (Oar_v3.1) gene sequence assembly. CNVRs overlapping with the coding region of the gene by at least 1 bp were considered to measure the proportion of CNVR overlapping genes^[97]13. Gene ontology and KEGG pathway analyses was performed in DAVID ([98]http://david.abcc.ncifcrf.gov/). Because the genome annotation for sheep is not complete, we converted the ovine Ensembl gene IDs into human Ensemble gene IDs for the functional enrichment analysis. qPCR validation of CNVRs To validate the detected CNVRs, qPCR analysis was used using five CNVRs that were selected randomly based on the results of PennCNV analysis. The primers (Additional file: Table [99]S6) were designed using Primer 3 ([100]http://bioinfo.ut.ee/primer3-0.4.0/) based on NCBI reference sequences. The genomic DNA of the same individuals in genotyping process was used for the experimental validation. For reference samples, four individuals with normal copy number were considered. The DGAT1 gene was used as reference since it has been shown to have no variation in copy number in sheep genome^[101]33. PCR phases were implemented using Power SYBR Green PCR Reagent Kit (Applied Biosystems). The qPCR conditions with three replications for each sample were as: 95 °C for 3 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 60 s. We follow a standard 2^−ΔΔCt method to determine fold changes^[102]46 for which, to obtain the ΔΔC[t], the ΔC[t] value of a reference sample was compared with the sample of interest. To do a well comparison of copy number in all qPCR plots fold changes were normalized to a diploid number^[103]13. Finally, the copy number of the target regions was obtained through 2 × 2^−ΔΔCt equation. According to what described by Jiang et al., values of 2, 3 or more, and 1 or below approves normal, gain and loss events, respectively^[104]19. Results Quality control, CNVs and CNVRs Table [105]1 shows the results of the raw data quality control. With strict quality control, most potentially problematic, low-quality data that reduce the reliability of CNVs are excluded from further analysis. Table 1. Results of data quality control. Removal criteria Zel Lori-Bakhtiari Baluchi Total animals 47 47 96 animal call rate < 0.99 4 3 9 Remaining animals 43 45 87 Number of SNPs 53,903 53,903 51,135 MAF < 0.01 1648 1945 2760 SNPs call rate < 0.95 3718 3884 224 HWE < 0.00001 69 98 29 unknown SNPs 31 30 25 Remaining SNPs 48,437 47,946 48,097 [106]Open in a new tab The CNVs calling were performed based on Illumina Ovine SNP50 Beadchip and PennCNV software. Table [107]2 represents the total number of CNVs detected by PennCNV in the Iranian sheep breeds. Table 2. Copy number variation characteristics in two tail type sheep breeds. Measurement Fat-tailed Thin-tailed Gain Loss Total Gain Loss Total CNV count 273 300 573 164 78 242 Total length (Mb) 34.71 44.28 79.001 20.26 14.004 34.26 Average length(kb) 126.7 147.6 137.63 124.3 179.56 142.17 Median length(kb) 112.96 112.92 112.97 122.25 120.76 120.76 Max length(kb) 414 1729.93 1729.93 705.53 666.69 705.53 Min length(kb) 11.97 25.710 11.97 19.7 34.44 19.7 Average per sample(n) 2.3 2.52 4.82 3.8 1.81 5.61 [108]Open in a new tab According to the PennCNV results, on 26 autosomal chromosomes a total of 815 CNVs including 573 and 242 CNVs were identified in fat tailed and thin tail sheep breeds, respectively. The CNVs length ranged from 11.97 kb (gain) to 1729.93 kb (loss) with an average of 137.63 kb and median size 112.97 kb in fat tailed, and from 19.7 kb (gain) to 705.53 kb (gain) with an average length of 142.17 kb and a median size of 120.76 kb in thin tail. The frequency of CNVs for each animal varied from 0.36% to 10.47% and the average number of CNVs in each individual ranged from 4.82 to 5.6 for fat and thin tail breeds, respectively. Our findings showed that in fat tail sheep, loss events were higher in number than gain events, whereas in thin tail sheep, gain events were higher in number than loss events. The results of the analysis showed that CNVs with a length range between 100 and 500 kb constituted 54.53% and 64.91% of the total CNVs in fat tail and thin tail sheep, respectively. CNVs shorter than 10 kb were not identified while CNVs longer than 1 Mb were detected with a frequency of 0.17 in fat-tailed (Fig. [109]2). Figure 2. Figure 2 [110]Open in a new tab The distribution of CNVs size (a) and CNVRs size (b) in in fat and thin tail groups. Results of CNV distribution in the 26 autosome chromosomes revealed the highest number of CNVs for fat tail on chromosomes 1, 6 and 2 with frequencies of 12.18, 9.27 and 8.36%, respectively. In thin tail, the highest number of CNVs was detected on chromosomes 2, 1 and 7 with frequencies of 13.22, 9.5 and 9.5%, respectively. In total, chromosomes 1, 2 and 7 had the most CNVs (Fig. [111]S1). The maximum and minimum CNV length sizes were identified on Chromosome 10 (loss event) and on chromosome 1 (gain event) respectively. We found the number of CNVs in each chromosome to vary from 3 (chromosome 24) to 89 (chromosome1) and the average number of CNVs per chromosome was 30.42 (Fig. [112]3). Figure 3. [113]Figure 3 [114]Open in a new tab Distribution of the CNVs (a) and CNVRs (b) number on the 26 autosomal sheep chromosomes. After merging overlapping CNVs, a total of 328 CNVRs (gain: 212, loss: 107, both: 9) and 187 CNVRs (gain: 152, loss: 34, both: 1) with lengths of 50.63 Mb, and 25.15 Mb were detected in fat tail and thin tail sheep breeds, respectively. Table [115]3 showed CNVRs information in two tail type Iranian sheep breeds. In total after merging overlapping CNVs in all samples, 483 CNVRs with a length of 73.85 Mb and an average of 146.89 kb were identified representing 2.6% of the entire sheep genome. Out of 483 CNVRs, 343 were gain, 126 were loss, and 14 were mixed within the same region. Most CNVRs sizes were identified in the range of 100 to 500 kb (57.74%). Also, shorter CNVRs lower than 10 kb were not observed. Also, chromosomes 1, 2, 3 and 6 had the highest CNVR and chromosomes 12, 24 and 21 had the lowest CNVR (Table [116]S1). Table 3. CNVRs Information in two tail type Iranian sheep breeds. Measurement Total Loss Gain Mixed Count 483 126 343 14 Length (Mb) 73.85 46.39 48.3 5.03 Average length (kb) 146.89 151.7 135.64 418.98 Length range  < 10 kb 0 0 0 0 10- 50 kb 34(16.7%) 8(1.68%) 26(5.47%) 50-100 kb 163(34.32%) 38(8%) 123(25.9%) 2(0.42%) 100-500 kb 273(57.74%) 74(15.58%) 191(40.21%) 8(1.68%) 500-1000 kb 4(0.84%) 1(0.21%) 2(0.42%) 1(0.21%)  > 1 Mb 1(0.21%) 0 0 1(0.21%) [117]Open in a new tab Figure [118]4 shows the distribution of CNVRs on all chromosomes. As mentioned in material and methods chromosomes X and Y were excluded, respectively for poor coverage and to avoid gender effect. The map showed that CNVRs were not uniformly distributed across chromosomes and varied based on the position on each chromosome. The results showed that the number of CNVRs on chromosomes ranged from 2 to 57. Most CNVRs were located on larger chromosomes 2(57), 1(56), 3(32) and the lowest CNVRs were located on chromosomes 24(2), 12(2), 21(7). Also, the shortest and furthest distances between CNVRs were 26.351 kb and 29.38 kb, respectively on chromosome 13 and chromosome 5. In total, the average distance was 4.75 Mb between adjacent CNVRs. Figure 4. [119]Figure 4 [120]Open in a new tab Genomic distribution and status of detected CNVRs in Iranian sheep breeds with two different types of tails. The chromosomal regions covered by CNVRs (the total length of CNVRs detected per chromosome over chromosome length) varied between chromosomes, ranging from 0.91% in OAR12 to 4.34% in OAR16. The number of CNVRs on each chromosome varied from 2 on OAR24 to 57 on OAR2. Results showed that the number of CNVRs were positively related to chromosome length and this relationship was also linear (Table [121]S1; Fig. [122]S2). Association analysis In comparative analysis between cases (two fat-tailed breeds) and controls (two thin-tailed breeds), one genomic region on chromosomes 6 was found to be in significant association with fat tail deposition (Table [123]4). The detected significant copy gained CNVR contained two significant protein-coding genes HGFAC and LRPAP1. Table 4. association test between cases (fat tail sheep breeds) and controls (thin tail sheep breed). CNVR Chr Start End Size Freq (thin-tailed sheep) Freq (fat-tailed sheep) Type Fisher’s exact test (P-value) Gene(s) 1 6 114,604,408 114,720,361 115,953 11 0 Gain 0.00001 HGFAC,LRPAP1 [124]Open in a new tab CNVRs gene content Out of 483 CNVRs, 315 identified CNVRs (65.35%) overlapped with 790 genes, while remaining 167 CNVRs (34.64%) were in non-annotated gene regions. Further investigation showed that 77.72% of identified genes were of encoding protein, 9% of lincRNAs, 3.67% of miRNAs, 3.67% of snRNAs and 6% were other pseudogenes, processed pseudomorphs, snoRNAs, rRNAs, snRNA and misc_RNA. There were also 9 common genes between fat and thin tail sheep breeds (Table [125]S2). GO analysis showed that the annotation of these genes was involved in biological processes, cellular components, and molecular functions (Tables [126]S3, [127]S4). In the case of biological processes, genes were enriched in several terms such as adaptive immune response, cellular protein modification process, regulation of reactive oxygen species in thin-tailed breed and cellular protein modification process, regulation of heart rate, intestinal absorption in fat-tailed breeds. In molecular function annotation, these candidate genes were associated with protein-L-isoaspartate (D-aspartate) O-methyltransferase activity and phosphatidylinositol 3-kinase activity in thin tailed breed and Olfactory receptor activity, transcription factor binding and interleukin-10 receptor activity carbon monoxide binding, ATP binding in fat-tailed breeds. For the cell component annotation, genes were enriched in nuclear chromatin in thin-tailed breed and cytoplasm, mitochondrial respiratory chain complex in fat-tailed breeds. According to the results of KEGG pathway analysis, these candidate genes were enriched in several signalling pathways, such as Regulation of lipolysis in adipocytes, Ras signalling pathway, Biosynthesis of antibiotics, Insulin resistance, Pantothenate and CoA biosynthesis, TNF signalling pathway, Melanoma, Glioma, Asthma and Regulation of lipolysis in adipocytes were significantly enriched (p < 0.05) (Table [128]S5). CNVR validation by qPCR In order to validate the obtained CNVRs by qPCR experiment, we randomly select five CNVRs with different types of CNV format (gain, loss). Four selected CNVRs were confirmed and were completely in accordance with PennCNV results (Table [129]S6). To evaluate the validity of the CNVRs identified in this study, the results were compared with 10 previous studies with different breeds, size, population structure, platform, and CNV identification algorithms (Table [130]5). Table 5. Comparison of our study with recent ovine CNVR reports. References Assembly Algorithm(s) Platform Sample size Length range(kb)