Abstract Runs of Homozygosity (ROH) are homozygous genomic fragments inherited from parents to offspring. ROH can be used to indicate the level of inbreeding, as well as to identify possible signatures of artificial or natural selection. Indigenous sheep populations on the southern edge of the Taklimakan Desert have evolved unique genetic traits adapted to extreme desert environments. In an attempt to better understand the adaptive mechanisms of these populations under harsh conditions, we used Illumina® Ovine SNP50K BeadChip to perform a genomic characterization of three recognized breeds (Duolang: n = 36, Hetian: n = 84, Qira black: n = 189) and one ecotypic breed (Kunlun: n = 27) in the region. Additionally, we assessed genomic inbreeding coefficients through ROH analysis, revealing insights into the inbreeding history of these populations. Subsequently, we retrieved candidate genes associated with economic traits in sheep from ROH islands in each breed. To better understand the autozygosity and distribution of ROH islands in these indigenous sheep breeds relative to international breeds, we also included three commercial mutton breeds (Poll Dorset: n = 108, Suffolk: n = 163, Texel: n = 150). The study revealed that among seven sheep breeds, Hetian exhibited the shortest linkage disequilibrium (LD) decay distance, while Kunlun demonstrated the highest LD levels. A total of 10,916 ROHs were obtained. The number of ROHs per breed ranged from 34 (Kunlun) to 2,826 (Texel). The length of ROH was mainly 1–5 Mb (63.54%). Furthermore, 991 candidate genes specific to indigenous sheep breeds were identified, including those associated with heat tolerance, adaptability, energy metabolism, reproduction, and immune response. These findings elucidate the genetic adaptation of indigenous sheep in the Taklimakan Desert, uncovering distinctive characteristics of indigenous sheep formation, and advocating for the conservation and genetic enhancement of local sheep populations. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-025-11445-9. Keywords: Taklamakan desert, Indigenous sheep, Runs of homozygosity, Linkage disequilibrium, ROH islands Introduction The Taklamakan Desert, located in the central Tarim Basin in the northwest of China and surrounded by towering mountains [[36]1, [37]2], is known for its arid conditions, high solar radiation levels, significant day-night temperature fluctuations, and frequent sandstorms. These challenging environmental conditions have driven natural selection in the local sheep population [[38]3, [39]4] resulting in breeds such as the Duolang [[40]5, [41]6], Hetian [[42]7, [43]8], Qira black [[44]9, [45]10], and Kunlun sheep [[46]11]. Over time, these sheep have developed region-specific adaptive traits that enable them to maintain productivity and reproduce under extreme environmental stress. Despite the remarkable adaptations driven by natural selection, there is a need for further comprehensive research to uncover the genetic mechanisms responsible for phenotypic variations, including coat type, color, adaptability, and disease resistance, resulting from domestication in these populations. ROH are genome-wide segments of homozygous genotypes resulting from the transmission of identical haplotypes by both parents to their offspring [[47]12]. Throughout biological evolution and the formation of breeds, variations in selection pressure intensity, population size, and structure contribute to diversity in the length, number, distribution, and frequency of ROH in the genome [[48]13]. Examining the unique distribution patterns of ROH within the genomic composition of animal populations proves valuable for deducing the extent and age of inbreeding and unraveling the evolutionary history of livestock [[49]14]. Longer ROH segments typically signify more recent mating events among close relatives, whereas shorter ROH segments indicate inbreeding events occurring in more ancient generations [[50]15, [51]16]. Dzomba et al. [[52]17] utilized the Illumina® Ovine SNP50K Beadchip to analyze ROH in 14 South African and 17 global sheep breeds. They found that long ROH were more prevalent in breeds with smaller population sizes, accompanied by elevated genomic inbreeding coefficients. Furthermore, the prevalence of short ROH reflected the long-term commercial breeding history of these breeds [[53]18]. Sallam et al. [[54]19] utilized 60 K chip data to analyze ROH in four Egyptian sheep breeds. They discovered that the Damascus breed had the shortest ROH, whereas the Egyptian Nubian breed not only had the longest ROH but also exhibited the highest inbreeding coefficient (F[ROH]= 0.12), as previously reported by Eydivandi et al. under intensive breeding practices [[55]20]. Moreover, selection imparts discernible imprints in particular segments of the animal genome, resulting in an elevated purity of these genomic regions in comparison to others. These selectively favored genomic regions frequently manifest as ROH islands [[56]13, [57]21, [58]22], which exhibit a non-random distribution across the genome and are universally shared among individuals within the same species [[59]23]. Consequently, ROH islands are commonly used to evaluate the genetic adaptation of populations in order to identify genes related to livestock adaptability and economic traits. Liu et al. [[60]24] identified candidate genes associated with tail types, reproductive performance, environmental adaptability, and immunity within the high-frequency ROH regions of five distinct Chinese sheep breeds. Mastrangelo et al. [[61]25] discovered candidate genes related to environmental adaptability, immune response, and lipid metabolism based on the high-frequency ROH regions of the Valle del Belice sheep breed. He et al. [[62]26] searched the high-frequency ROH regions of 635 Chinese Merino sheep and retrieved Quantitative Trait Locus (QTLs) associated with horn phenotype, immune traits, and environmental adaptability. Linkage disequilibrium (LD) can improve the accuracy of genomic association analysis and predict marker regions [[63]27]. LD decay patterns also help in understanding the selection patterns experienced by regions of gene function [[64]28]. Effective population size (Ne), a key population genomics parameter, reflects ideal population size and is crucial for assessing genetic diversity and adaptation potential [[65]29, [66]30]. Comparing LD and Ne of different species can reveal their inbreeding rate and environmental adaptability. The Taklamakan region is celebrated for its distinctive geographical and climatic characteristics, presenting unique challenges to the survival and reproduction of native sheep. Our research aims to employ the Illumina® Ovine SNP50K BeadChip to conduct a genomic characterization of the indigenous sheep breeds in the region, revealing their distribution pattern of ROH, genome-level diversity, inbreeding coefficients, and population history. Furthermore, we investigate adaptive mechanisms to extreme environments in these sheep based on identified high-frequency ROH islands in the genome. Additionally, this study can serve as a reference for genetic resource conservation, rational breeding planning, and controlling inbreeding in different sheep populations. Materials and methods This study adheres rigorously to the relevant guidelines and regulations of the Ministry of Agriculture of the People’s Republic of China. The animal study underwent thorough review and received approval from the Ethics Committee of the College of Animal Science and Technology of Tarim University (protocol code: 2023039). Prior to their participation in this study, written informed consent was obtained from the owners of the animals. Animal samples The experiment involved the selection of 336 healthy adult indigenous ewes of similar age, sourced from the southern edge of the Taklamakan Desert in Xinjiang, China. These ewes represented four distinct populations: Duolang sheep (DUO), Qira Black sheep (QIR), Hetian sheep (HET), and Kunlun sheep (KUN). DUO from the Duolang sheep conservation farm in Maigaiti County, Xinjiang; QIR from the Cele Black sheep conservation farm in Cele County, Xinjiang; HET from Hetian region, Xinjiang; and KUN from the Kunlun sheep conservation farm in Qiemo County, Xinjiang (Fig. [67]1; Table [68]1). Additionally, the analysis incorporated 421 genotypes from three sheep breeds of international origin, including Poll Dorset (POL), Suffolk (SUF), and Texel (TEX). Access to the dataset was granted under the permission of the International Sheep Genomics Consortium (ISGC, [69]http://www.sheephapmap.org). To minimize any potential confounding effects stemming from sexual disparities, all 757 samples were deliberately chosen from female specimens. Fig. 1. [70]Fig. 1 [71]Open in a new tab Geographic distribution of the four indigenous sheep populations from the southern edge of the Taklamakan Desert in Xinjiang, China Table 1. A brief introduction to experimental animals (Breed name, place of origin, sample size, purpose, reproduction rate, different climatic and geographical conditions, and feeding methods) Breed Name Duolang (DUO) Qira Black (QIR) Hetian (HET) Kunlun (KUN) Location Makati County, Xinjiang Hotan prefecture, Xinjiang Moyu County, Xinjiang Qiemo County, Xinjiang Sample Size 36 189 84 27 Purpose Wool/Meat Lamb skin Carpet wool Wool/Meat Av. Altitude (m) 1200 1336 1100–4000 1170–4500 Agro-Ecology and feeding mode Temperate continental extreme arid climate, with lowland meadows as the main grassland type, dominated by reed-like grasses. Stall feeding is the primary method. Warm and extremely arid climate, with ice and snow meltwater as the main source of water in the production area. Seasonal grazing is the primary method, with grazing in the mountains during the spring lambing season and in crop stubble fields during the autumn. During the winter, a semi-grazing, semi-stall-feeding approach is used. Temperate continental desert climate, high-altitude river valleys with sparse shrubs. Grazing throughout the year in natural grasslands, with the possibility of digging through snow for grass in winter. During the winter, mainly stall feeding for rams, and providing appropriate supplemental feeding to ewes during the winter and lambing season. Temperate continental desert climate, natural grasslands in lowland meadows, high-altitude cold grasslands, with access to snow and glacier meltwater as well as mountain streams. Grazing takes place from May to November, with sheltered feeding during December to April. Fecundity (%) 113–130 215 98–103 —— Annual Average sunshine (hour) 2806 2695 2470–2795 —— Annual Average Temperature (°C) 22.4 13.4 -11.0-4.0 —— Extreme Maximum Temperature (°C) 40.8 40.8 43.2 —— Annual Precipitation (mm) 39 29 15–150 —— Annual Evaporation (mm) 2553 2553 2143.8-2858.7 —— [72]Open in a new tab Genotyping and quality control Blood samples were obtained from the jugular vein of sheep, and Deoxyribonucleic acid (DNA) extraction was carried out using a DNA extraction kit (Tiangen Biotech Co., Ltd., Beijing, China). All samples were genotyped using the Illumina® Ovine SNP50 BeadChip (Beijing Compass Biotechnology Co., Ltd, Beijing, China), which includes 46 900 single nucleotide polymorphisms (SNPs). Quality control for SNPs was executed using PLINK v1.90 (access dates: Jul-18-2023) [[73]31], with exclusion criteria applied to SNP sites having a call rate < 0.95, and Hardy-Weinberg equilibrium (HWE) p-value < 10^− 6. Additionally, individuals with more than 5% missing genotypes excluded from the analysis. After quality control, 757 samples with 46,896 SNPs were retained for continued analysis. Estimation of linkage disequilibrium (LD) and effective population size (Ne) The LD degree between markers was quantified using the r^2 proposed by Hill and Robertson, where r^2 represents the square of the correlation coefficient between the alleles of two loci [[74]32, [75]33]. The values of r^2 range from 0 to 1, and the distances between markers were categorized into 10 classes: 0 < r^2 ≤ 20 kb; 20 < r^2 ≤ 40 kb; 40 < r^2 ≤ 60 kb; 60 < r^2 ≤ 100 kb; 100 < r^2 ≤ 200 kb; 200 < r^2 ≤ 500 kb; 500 < r^2 ≤ 1000 kb; 1000 < r^2 ≤ 2000 kb; 2000 < r^2 ≤ 4000 kb; 4000 < r^2 ≤ 6000 kb. The calculation formula is as follows: graphic file with name M1.gif Where f(AB), f(ab), f(Ab), and f(aB) represent the frequencies of the four haplotypes formed between genotypes, and f(A), f(a), f(B), and f(b) are the allele frequencies. We utilized PopLDdecay V3.42 (access dates: Jul-25-2023) to compute the r^2 of alleles to determine the levels of LD and decay [[76]34]. The obtained results were then visualized using Plot_OnePop.pl (access dates: Jul-25-2023). We utilized SMC + + v1.15.3 (access dates: Aug-11-2023), which incorporates an innovative spline regularization scheme [[77]35, [78]36], to infer the population size history and divergence time of the four sheep populations from the southern edge of the Taklamakan Desert in Xinjiang, China. This method effectively minimizing estimation errors. Initially, employing the vcf2smc tool, each population within the VCF dataset was converted into independent SMC + + format output files. Next, the model fitting was executed using the estimate function, with all simulations conducted under the initial condition of a mutation rate of 1.25e-9 [[79]37]. Detection of runs of homozygosity The distribution of ROH in each sample within seven populations was assessed using PLINK v1.90 (access dates: Aug-31-2023). ROH estimation adhered to the following criteria: (1) Each sliding window contained 50 SNPs; (2) a maximum of one heterozygous genotype per window; (3) a maximum of one missing genotype per window; (4) The scanning window threshold was set at 0.05; (5) a maximum gap between two consecutive SNPs in ROH of 500 kb; (6) The minimum length of ROH was set to 1000 kb to exclude short ROH resulting from LD; (7) To minimize the likelihood of detecting ROH by chance, the minimum number of SNPs constituting a ROH was determined using the method proposed by Lentz et al. [[80]38]. That is: graphic file with name M2.gif The variable I[ROH] denotes the minimum number of SNPs constituting ROH. α represents the false positive rate of identified ROH (set to 0.05 in this study), n[S] represents the number of SNPs per individual, n[i] is the total number of individuals in the population, and Inline graphic signifies the average heterozygosity across all SNPs. To assess the distribution of ROH lengths among the seven breeds, ROH lengths were classified into six categories: 1–5 Mb, 5–10 Mb, 10–15 Mb, 15–20 Mb, 20–25 Mb, and > 25 Mb. We calculated the ROH frequency for each length category (Within each breed, count the number of ROH within each length category and calculate the proportion within the total ROHs) and the average ROH length for each category (For each breed, sum up the lengths of all individual ROHs belonging to different categories, and then divide the resulting sums by the number of individuals in that breed). The enumeration of ROH was conducted independently for each chromosome within the seven distinct sheep breeds. Furthermore, the cumulative assessment included determining the total length and aggregate count of ROH for individual sheep specimens. Estimation of inbreeding coefficient Utilizing the equation proposed by Sams et al. [[81]15], the ROH-based inbreeding coefficient (F[ROH]) was calculated for each breed as follows: graphic file with name M4.gif We calculated the cumulative length of identified ROH within individuals, denoted as ∑[LROH]. L[AUTO] signifies the total length of the autosomal sheep genome, which was 2,452 Mb in our study. Additionally, for validating the accuracy of F[ROH], we computed the inbreeding coefficient based on the expected number of homozygous genotypes (F[HOM]) using PLINK v1.90 with the following parameters: -het (access dates: Sep-17-2023) [[82]31]. graphic file with name M5.gif Where Ho represents the observed number of heterozygous genotypes in the population, He represents the theoretical frequency of heterozygous genotypes under Hardy-Weinberg equilibrium, and N represents the total number of SNPs in the individual. Selection signatures To identify genomic regions that are most frequently associated with ROH across sheep populations under different environmental conditions, we calculated the percentage of each SNP within ROH by tallying their occurrences. Subsequently, we visualized the distribution of all SNPs along chromosomal positions. In this study, to identify genomic regions frequently associated with ROH in each breed, we used an SNP frequency threshold of > 45% to screen genomic regions associated with ROH across various breeds [[83]39]. Within these regions, ROH islands were defined as clusters consecutive SNP loci > 10 Mb in length, containing a minimum of 200 SNPs [[84]17]. We annotated the genomic regions associated with these ROH islands using the Oar_v4.0 sheep reference genome. Subsequently, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for genes annotated to ROH islands in different populations using the online tool DAVID ([85]https://david.ncifcrf.gov/) [[86]40]. Additionally, we referred to NCBI ([87]https://www.ncbi.nlm.nih.gov/pmc/) and OMIM ([88]http://www.ncbi.nlm.nih.gov/omim) databases for functional information on genes within the ROH islands. Results Estimation of LD and Ne We observed a gradual decline in r^2 values with increasing physical distance between SNPs, accompanied by a slowdown in the rate of decay (Fig. [89]2a). By comparing LD among seven sheep populations, we observed differences in LD levels and decay rates with the same SNP intervals. Overall, HET exhibited the shortest LD decay distance, while KUN demonstrated the highest LD levels (r^2 = 0.081). Additionally, it was evident that among the three commercial meat sheep breeds, SUF had the fastest LD decay rate, followed by TEX and POL. Fig. 2. [90]Fig. 2 [91]Open in a new tab (a) LD decay map measured by r^2 over distance between SNPs in seven sheep populations; (b) Estimated Ne for 10,000 ancestral generations of four Chinese indigenous sheep breeds Figure [92]2b illustrates the changing trends of Ne in different sheep populations as the number of generations changes. Between the 1,000th and 2,000th generations, the Ne of all four sheep breeds had reached their maximum, with estimated values of approximately 4,718 (DUO), 1,293 (KUN), 2,765 (QIR), and 2,810(HET). In the most recent generations, the Ne of the four sheep breeds remained relatively stable. QIR and DUO exhibited the largest effective populations, with estimated Ne values of 1,082 (QIR) and 901 (DUO), respectively. In contrast, KUN and HET had the smallest effective populations, with corresponding Ne estimates of 704 (KUN) and 746 (HET), respectively. Runs of homozygosity patterns Based on the defined parameters, a total of 10,916 ROH segments were detected. The number of ROHs per breed ranged from 34 (KUN) to 2,826 (TEX). Among the indigenous sheep populations, QIR had the highest total number of ROH (2,255), followed by DUO (346) and HET (294), while KUN had the lowest total number of ROH (34) (Table [93]S1). Importantly, readers should interpret the results presented here with caution, as varying sample sizes may influence the detection and interpretation of ROHs. Therefore, the identified genomic regions and their implications should be considered in light of these differences. Analyzing ROH lengths in seven sheep breeds revealed predominant clustering within 1–5 Mb (63.54%). Commercial meat breeds showed higher average lengths and proportions of ROHs within this range than local breeds. Among the local breeds with ROHs exceeding 10 Mb in length, QIR had the longest average ROH length and the highest frequency, whereas KUN exhibited the shortest average ROH length and the lowest frequency (Fig. [94]3, Table [95]S2, Table [96]S3). Fig. 3. [97]Fig. 3 [98]Open in a new tab The average sum of ROH of each breed in different ROH length classes Figure [99]4 illustrates the distribution of ROHs per chromosome for each breed. ROHs did not display a uniform distribution among chromosomes within each breed. Across all breeds, the highest percentage of ROHs was observed on Ovis aries chromosomes (OAR) 2(13.25%) and 3(13.14%). Notably, ROHs were not detected on OAR 17, 21, and 23 in the indigenous sheep (Table [100]S1). Fig. 4. [101]Fig. 4 [102]Open in a new tab Number of ROH found on each chromosome in seven sheep populations Figure [103]5 depicts the relationship between the total number of ROHs per individual and the total genomic length covered by ROHs. There were significant differences among different breeds, with KUN and HET exhibiting a high number of short ROHs. Additionally, extreme individuals with ROH coverage exceeding 400 Mb on autosomes were observed in QIR (Table [104]S4). Fig. 5. [105]Fig. 5 [106]Open in a new tab The total genomic length covered by ROH per individual (x axis) and total number of ROH per individual (y axis) Inbreeding coefficient We computed the inbreeding coefficient (F[ROH]) based on ROH (Table [107]2). Figure [108]6A shows the variation and distribution of the average inbreeding coefficient estimates among different sheep populations. Among four indigenous populations from the southern edge of the Taklamakan Desert, QIR had the highest inbreeding coefficient (0.0393 ± 0.0331), while KUN had the lowest (0.0059 ± 0.0083). Moreover, inbreeding coefficient (F[HOM]) based on the expected E(HOM) and observed O(HOM) number of homozygous genotypes was calculated using Plink V. 1.90 [[109]31] (Table [110]2). To visualize inbreeding degrees, we plotted F[ROH] and F[HOH] as line graphs (Fig. [111]6B). A significant correlation was observed between F[ROH] and F[HOM] (r = 0.785, p = 0.036). Table 2. Mean and standard deviation of ROH and genomic inbreeding coefficients in seven sheep breeds Breed Name Sample Size ROH number ROH length, Mb FROH FHOM Mean SD Mean SD Mean SD Mean SD Duolang (DUO) 36 9.6111 6.8442 73.8218 92.4277 0.0310 0.0379 0.3057 0.054 Qira Black (QIR) 189 11.9312 6.9848 89.7825 81.9726 0.0393 0.0331 0.3244 0.0684 Hetian (HET) 84 3.5000 4.0285 21.5177 34.3799 0.0119 0.0152 0.2597 0.0445 Kunlun (KUN) 27 1.2593 2.1589 7.5456 16.234 0.0059 0.0083 0.2769 0.0072 Poll Dorset (POL) 108 24.2130 6.1736 113.9845 37.9326 0.0465 0.0155 0.3040 0.0317 Suffolk (SUF) 163 15.6196 10.1879 85.3954 65.4967 0.0355 0.0265 0.2887 0.0649 Texel (TEX) 150 18.8400 8.4679 85.8968 46.716 0.0355 0.0187 0.2964 0.0446 [112]Open in a new tab Fig. 6. [113]Fig. 6 [114]Open in a new tab (A) Distribution of the runs of homozygosity inbreeding coefficient (F[ROH]) within each breed; (B) Trend plot of F[ROH] and F[HOM] for each sheep breed Gene annotation of common ROH islands The statistical analysis of ROH islands in seven sheep breeds are presented in Table [115]S4 and Table [116]S5. To compare the differences in ROH islands between the indigenous breeds from the southern edge of the Taklamakan Desert and commercial meat breeds, we categorized the seven sheep breeds into two distinct groups: Chinese indigenous sheep and commercial meat sheep. The results revealed a significant number of ROH islands on OAR 1, 2, 3, 6, 20, and 25 in four indigenous breeds (Fig. [117]7A); similarly, a significant number of ROH islands were observed on OAR 2, 4, 5, 6, 7, and 10 in three commercial meat breeds (Fig. [118]7B). A total of 542 ROH islands were identified in the Chinese indigenous sheep group, while 559 ROH islands were found in the commercial meat sheep group. Fig. 7. [119]Fig. 7 [120]Open in a new tab (a) Manhattan plot of SNPs in ROH islands on different chromosomes of four indigenous sheep breeds; (b) Manhattan plot of SNPs in ROH islands on different chromosomes of three commercial meat sheep breeds; (c) Wayne diagram of candidate genes of four indigenous sheep breeds; (d) Wayne diagram of candidate genes of three commercial meat sheep breeds; (e) Wayne plots of candidate genes in four indigenous sheep breeds versus three commercial meat sheep breeds The genomic region associated with each of these ROH islands was annotated using the sheep genome Oar_v4.0. In the four indigenous breeds, a total of 10,387 candidate genes were retrieved (DUO: 5,129, HET: 3,621, KUN: 433, QIR: 9,976) (Fig. [121]7C). In the three commercial meat breeds, a total of 9,979 candidate genes were identified (SUF: 8,593, TEX: 6,411, POL: 6,455) (Fig. [122]7D). To better understand the adaptive mechanisms of indigenous sheep from the southern Taklamakan Desert under extreme conditions, we excluded intersecting genes with commercial meat breeds, leaving us with 991 genes considered as candidate genes for adaptation to extreme environments (Fig. [123]7E; Table [124]S6). After functional enrichment analysis of these candidate genes, a total of 44 GO terms and KEGG pathways were enriched at the threshold (p < 0.05). The most significant GO terms were found in coenzyme A biosynthetic process (GO:0015937), regulation of ventricular cardiac muscle cell membrane repolarization (GO:0060307), negative regulation of epithelial cell proliferation involved in prostate gland (GO:0060770), voltage-gated potassium channel complex (GO:0008076), and olfactory receptor activity (GO:0004984). The most enriched KEGG pathways included Olfactory transduction (oas04740), Insulin resistance (oas04931), Adipocytokine signaling pathway (oas04920), and PI3K-Akt signaling pathway (oas04151) (Table [125]S7). Discussion This study utilized the Illumina® Ovine SNP50 Beadchip to detect ROH in seven sheep populations, quantified their number, length, distribution, and frequency. Using this ROH information, we inferred the population histories of different sheep breeds, calculated inbreeding coefficients, and explored adaptive mechanisms in indigenous sheep under extreme production conditions by analyzing candidate genes identified in ROH islands. Additionally, we conducted LD analysis on four indigenous sheep breeds and three commercial meat breeds, and estimated Ne for the indigenous breeds. Linkage disequilibrium and effective population size Among all sheep breeds, KUN exhibited the highest value of average r², which corresponded to the smallest Ne. The smallest values of average r² were observed in QIR, which had the highest Ne. Notably, the LD decay lines of QIR and HET almost overlap. Our finding is consistent with previous research on HET, which indicated that higher r² levels correspond to smaller Ne [[126]41]. In this study, the minimal Ne observed in KUN can be primarily attributed to their geographic isolation and the environmental stress of high altitudes. These factors likely contributed to reduced genetic flow and, consequently, a smaller Ne [[127]42, [128]43]. In contrast, QIR boasted the highest Ne, owing to its status as a prominent wool-producing breed in China. This high Ne level indicates successful conservation efforts for this valuable genetic resource [[129]44]. Furthermore, the study reveals that the Ne of four sheep populations declined over the past approximately 1300 generations. The period coincided with the peak of the Silk Road trade [[130]45], a time of significant cultural and economic exchange in the region. As a crucial junction along the Silk Road, Xinjiang experienced significant genetic admixture, leading to a reduction in the genetic resources of indigenous sheep populations during that era [[131]46, [132]47]. This historical context provides a plausible explanation for the observed decline in Ne. ROHs and ROH-based inbreeding coefficient (F[ROH]) The length of ROH segments reflects the history of animal inbreeding [[133]14]. As the number of generations increases, the likelihood of ROH segments being disrupted by recombination increases. Consequently, shorter ROH segments primarily result from inbreeding in more distant generations, while longer ROH segments indicate more recent inbreeding [[134]15, [135]16]. Among the indigenous sheep breeds studied, QIR exhibited the highest proportion of long ROH segments and the highest levels of F[ROH] and F[HOH]. As an elite sheep breed valued for its high-quality lambskin, QIR is predominantly distributed in Cele County, Hotan Prefecture, Xinjiang [[136]48]. However, recent changes in market demand and the massive introduction of foreign commercial meat sheep breeds have contributed to a significant decline in the number of purebred QIR [[137]49]. Notably, all QIR samples in this study were collected from the Cele Black Sheep Conservation Farm, where the limited population size has led to a high degree of inbreeding. To address this issue, we recommend implementing a regular strategy for exchanging rams within the population to enhance genetic diversity and expand the effective population size of QIR [[138]41]. KUN possesses the smallest proportion of long ROH segments and the lowest levels of inbreeding coefficient. Distributed in higher-altitude mountainous areas and adopting a semi-grazing and semi-confined feeding method, KUN has a relatively small core population size. However, existing breeding plans have successfully avoided excessive inbreeding [[139]41]. This suggests that adaptive breeding strategies can mitigate the negative effects of small population size on genetic diversity. HET, widely distributed across the Hotan region of Xinjiang, originated from a small ancestral population and has evolved into diverse ecotypes through inbreeding in more distant generations, natural selection, and artificial breeding [[140]41, [141]50]. The HET samples in this study, originating from various conservation farms, exhibited a higher proportion of short ROH segments due to geographical isolation. This reinforces the significant role of geographical factors in shaping population genetic structures [[142]51, [143]52]. Selection signatures of candidate genes within ROH islands Significant genetic differences exist between domestic and foreign sheep in economically important traits, making it a crucial research area in genetic breeding [[144]53, [145]54, [146]55]. To deeply explore the adaptive mechanisms of indigenous sheep from the southern edge of the Taklimakan Desert under harsh environmental conditions, this study separately identified ROH islands in indigenous sheep and commercial mutton sheep that may be related to breed-specific traits. After gene annotation, by eliminating intersecting genes, we successfully identified multiple candidate genes associated with the indigenous sheep breeds in extreme environments. Candidate genes related to olfactory receptor activity and olfactory transduction The well-developed sense of smell in sheep is a key evolutionary advantage for their survival in the natural environment, affecting critical behaviors like foraging [[147]56], danger detection [[148]57], and reproduction [[149]58]. Olfactory receptors in vertebrates, located in the olfactory epithelium, sense odor molecules in the environment [[150]59]. These receptors are mostly G protein-coupled receptors (GPCRs). When odor molecules bind to these receptors, they activate a biochemical cascade that converts external odors into electrochemical signals, involving G protein activation [[151]60, [152]61], adenosine triphosphate (ATP) conversion to cyclic adenosine monophosphate (cAMP) [[153]62], ion channel opening, and intracellular Ca^2+ increase [[154]63], ultimately activating olfactory neurons [[155]64]. These neurons then transmit the sensory information to the brain, forming the animal’s perception of olfaction [[156]65]. We identified GPCRs, including CELSR2 [[157]66], RXFP3 [[158]67], and VIPR1 [[159]68], which are associated with neuronal development and synapse formation in mammals, potentially influencing the complexity and sensitivity of the olfactory system. Additionally, GRM7 [[160]69] and ADGRL [[161]70] play roles in neurotransmitter release and neuronal signaling, which are crucial for the rapid and accurate processing of olfactory information. Additionally, we identified several genes within the GPCR family linked to reproduction. GPR18 has been established as a genetic marker associated with phenotypic traits of livestock semen quality [[162]71]. GRM7 has been identified as an important and unique candidate gene associated with the semen volume of Merino sheep [[163]72]. KISS1R has been identified as a major gene affecting sheep reproduction traits [[164]73]. These findings suggest a potential link between olfactory system and reproductive behavior in sheep. PRKACB, a catalytic subunit of cAMP-dependent protein kinase, mediates olfactory signal transduction by regulating cAMP levels [[165]74]. GNB1, as the β subunit of the G protein family, plays a crucial role in transmembrane signal transduction [[166]75]. MCOLN2 and MCOLN3 belong to the mucolipin family of Ca^2+ channel proteins, regulating intracellular Ca^2+ levels and neural signal transmission [[167]76, [168]77]. NALCN specifically mediates sodium leak currents to modulate neuronal excitability [[169]78]. Genes related to mammalian olfactory receptor activity or conduction identified in this study help us understand the unique genetic adaptations evolved by these native sheep breeds under harsh environmental conditions. Candidate genes related to adipocyte synthesis and energy metabolism Fat is the main energy reserve in animals, and fat-tailed sheep store excess fat in their tails, helping them adapt to food scarcity and climate changes in arid deserts [[170]79]. In this study, we identified three key pathways specifically linked to fat deposition and energy metabolism: coenzyme A biosynthetic process (GO:0015937), adipocytokine signaling (oas04920), and insulin resistance (oas04931). Coenzyme A (CoA) is a critical cofactor in fatty acid synthesis, influencing fat deposition. In the CoA biosynthetic pathway, we found that the genes PANK4, PANK3, PPCS, and ACOT7 play pivotal roles. These genes encode enzymes essential for converting pantothenate to 4’-phosphopantothenate and synthesizing 4’-phosphopantothenoylcysteine, a key intermediate in CoA biosynthesis [[171]80, [172]81, [173]82]. Additionally, ACOT7 is involved in breaking down acyl-CoA into free fatty acids and CoA [[174]83]. Although these genes collectively play pivotal roles in the CoA biosynthetic process, their specific roles and interactions in fat-tailed sheep have not been previously explored. The Insulin resistance and adipocytokine signaling pathways involve critical genes such as PRKAA1, SLC2A1, PRKAG2, IRS2, STK11, and IL6. PRKAA1 and PRKAG2 encode subunits of AMP-activated protein kinase (AMPK) [[175]84, [176]85], which regulates metabolic homeostasis by activating the fatty acid oxidation pathway. SLC2A1 encodes glucose transporter 1 (GLUT1) [[177]86], facilitating glucose transport into cells, essential for fatty acid synthesis and cellular energy production [[178]87]. IRS2 regulates adipocyte metabolism by controlling fat storage and breakdown through its role in insulin signaling [[179]88, [180]89]. STK11, a key upstream gene in the AMPK cascade, promotes adipogenesis in ruminant preadipocytes [[181]90], while IL6 plays crucial role in insulin resistance and fat metabolism [[182]91]. The candidate genes we identified related to fat deposition and energy metabolism highlight the unique genetic adaptations of indigenous sheep to extreme arid desert environments. Candidate genes related to desert environment adaptation and immune response Indigenous sheep from the Taklamakan Desert are exposed to complex physical stressors throughout the year, including significant diurnal temperature fluctuations, drought, and high radiation. We identified some pathways associated with adaptation to desert environments and specific immune responses [[183]92, [184]93]. In environments with significant temperature variations, animals regulate local circulation to adjust to tissue demands [[185]94]. Vasoconstriction in cold environments increases blood viscosity and reduces flow to minimize heat loss [[186]95], while vasodilation in hot environments decreases blood viscosity and increases flow to aid heat dissipation [[187]96]. The vascular system thus plays a crucial role in maintaining thermal balance. COL4A1 [[188]97], KDR [[189]98], CTNNB1 [[190]99], and SEMA3Ed [[191]100] play a crucial role in blood vessel formation and maintenance of mammalian blood vessels. Notably, KDR facilitates the high-altitude adaptation of Tibetan sheep by regulating angiogenesis [[192]101, [193]102]. High temperatures enhance hemoglobin oxygenation and dissociation, boosting oxygen transport [[194]103, [195]104]. To increase tissue oxygen supply, the sympathetic nervous system activates the cardiovascular system, increasing heart rate and cardiac output. We identified two pathways related to cardiac function: regulation of ventricular cardiac muscle cell membrane repolarization and voltage-gated potassium channel complex. Proper repolarization, driven mainly by voltage-gated potassium channels [[196]105], is crucial for maintaining normal cardiac rhythm, contraction, and relaxation. Blood-fluid shear stress, a biomechanical characteristic [[197]106], exerts force on arterial walls and stimulates endothelial cells to produce nitric oxide (NO) [[198]107]. NO is a crucial signaling molecule that dilates blood vessels, increases blood flow, and inhibits thrombus formation [[199]108]. This explains the involvement of genes like KDR (vascular endothelial growth factor) and NOS3 (encoding nitric oxide synthase) in the pathway of fluid shear stress and atherosclerosis. NOS3 has been reported to regulate the diameter of arterial blood vessels in sheep embryos, causing vasodilation and increasing blood flow, which aids in heat dissipation for indigenous sheep from the southern edge of the Taklamakan Desert in hot environments [[200]109, [201]110]. Additionally, NO also exhibits anti-inflammatory and anti-platelet aggregation effects, maintaining vascular homeostasis [[202]111, [203]112]. The immune system is vital for organism health and survival. In desert environments, extreme day-night temperature swings and high variations can disrupt inflammation levels, reducing immune function [[204]113, [205]114]. Additionally, intense radiation and microorganisms elevate infection risks. To adapt, organisms adjust immune-related gene expression, enhancing responses to specific antigens. The four genes (CSF1, IL6, IL7, IL7R) on the PI3K-Akt signaling pathway are involved in biological processes like immune regulation, inflammatory response, and cell proliferation. While these genes have been previously reported in sheep [[206]115, [207]116, [208]117], our study contributes to the understanding of their role in the immune system of desert sheep, emphasizing their novelty and the unique selective pressures they may face. For instance, the role of CSF1 in macrophage differentiation and inflammation regulation may be particularly important in desert environments, where inflammation levels can be disrupted by temperature variations [[209]118, [210]119]. IL-6 and IL-7, belonging to the interleukin cytokine family, regulate immune cells by mediating the proliferation, differentiation, and activation of T and B cells. Specifically, IL-6 also impacts immune cell activity, participates in inflammatory responses, and regulates metabolic homeostasis [[211]120]. Meanwhile, IL-7 is essential for lymphocyte development and maintaining immune system balance [[212]121], while IL-7R, as the receptor for IL-7, plays a crucial role in IL-7 signal transduction [[213]122]. This is particularly significant when dealing with environmental stressors such as radiation and temperature changes. Chemotaxis is a crucial process in the immune system, allowing immune cells to sense chemical signals (chemokines) and migrate to inflammatory or lesion sites, participating in pathogen clearance and immune regulation. Our study is the first to report the presence and potential roles of CX3CR1 [[214]123], RARRES2 [[215]124], CCR8 [[216]125], and DOCK2 [[217]126], which are involved in chemotaxis and immune recognition, in desert-dwelling sheep. Similarly, genes involved in the response to gamma radiation, such as FANCD2 [[218]127], XRCC2 [[219]128], and LIG4 [[220]129], have not been previously reported in sheep in desert environments. Conclusions This study focused on analyzing LD, Ne, and homozygosity patterns in four distinct indigenous sheep breeds from the southern edge of the Taklamakan Desert in comparison to three foreign commercial meat sheep breeds. Variations in LD and ROH patterns were observed among the four breeds. The QIR genome exhibited the highest inbreeding coefficient and the highest proportion of long ROH segments, indicating relatively recent breeding events. The KUN Ne was smaller, with a low inbreeding coefficient, suggesting appropriate breeding management. Utilizing ROH analysis, we identified genes associated with heat resistance, adaptability to extreme arid environments, energy metabolism, reproduction, and immune response within prominent homozygous islands. These findings significantly impact the adaptability and survival of indigenous sheep populations. These findings will not only enrich our understanding of the genetics of indigenous sheep but will provide essential genetic information for the conservation and rational utilization of this population. Future research can further explore the functional significance of these genes and explore genetic improvement strategies related to indigenous sheep. Electronic supplementary material Below is the link to the electronic supplementary material. [221]Supplementary Material 1^ (21KB, xlsx) [222]Supplementary Material 2^ (16.5KB, xlsx) [223]Supplementary Material 3^ (18.6KB, xlsx) [224]Supplementary Material 4^ (52.4KB, xlsx) [225]Supplementary Material 5^ (54.1KB, xlsx) [226]Supplementary Material 6^ (341.4KB, xlsx) [227]Supplementary Material 7^ (30.3KB, xlsx) Acknowledgements