Abstract Background Diverse gut microbiota in animals significantly influences host physiology, ecological adaptation, and evolution. However, the specific functional roles of gut microbiota in facilitating host adaptation, as well as the coevolutionary dynamics between microbiota and their hosts, remain largely understudied. Results A total of 41,847 metagenome-assembled genomes and 3193 high-quality species-level genome bins were generated, establishing a comprehensive gut microbiome catalog for cervids in this study. Phylogenetic analysis revealed a coevolutionary relationship between cervids and their gut microbiota. Comparative metagenomic analyses further indicated that the gut microbiota of plateau cervids have undergone genome-level adaptations related to energy metabolism. At the genus level, species-level genome bins from the genera Alistipes and Faecousia in plateau cervids exhibit enhanced energy metabolism capabilities. Structural variations analysis revealed that the insertion and duplications structural variations in the gut microbiota of plateau cervids were significantly enriched in energy metabolism pathways. In contrast, the deletions and contractions in structural variations were predominantly enriched with metabolic pathways involved in the biosynthesis of diverse biochemical molecules. Conclusions Our study provides a comprehensive gut microbiome catalog of the cervid gut microbiota, revealing the coevolutionary relationship between cervid gut microbiota and hosts. These findings highlight the adaptive genomic evolution of the gut microbiota in contributing to the plateau adaptability of cervids and offer new insights into the mechanisms by which the gut microbiota help hosts adapt to extreme environments. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-025-02269-w. Keywords: Cervids, Metagenome-assembled genomes, Comparative macrogenomics, Structural variation, Plateau adaptation Background Cervids, encompassing all species within the family Cervidae, represent a significant subgroup of the order Cetartiodactyla within the class Mammalia and are the second most diverse family of even-toed ungulates, following the Bovidae family [[40]1]. Cervids are found throughout much of the world, with notable exceptions being Oceania and Antarctica. Cervids exhibit exceptional adaptability and resilience, allowing them to thrive in a wide range of geographical and climatic environments [[41]1]. Arctic reindeer have undergone significant evolutionary adaptations, involving specific genes related to vitamin D metabolism (CYP27B1 and POR) mutations. These adaptations are crucial for their survival in the harsh Arctic climate and its extreme light–dark cycles [[42]2]. The Qinghai-Tibet Plateau (QTP), commonly known as the “Third Pole,” has an average elevation of over 4000 m. This region is situated at the junction of the Asian and Indian tectonic plates, where their interaction has significantly influenced its geological and topographical features [[43]3]. The prolonged geological processes of tectonic compression and uplift have shaped an extreme climate, with high-altitude, reduced atmospheric pressure, cold temperatures, intense ultraviolet radiation, and hypoxia, in the QTP [[44]3]. The extreme environment of this region presents significant challenges for local wildlife. Numerous animals on the QTP, such as the plateau pika (Ochotona curzoniae), have developed unique physiological mechanisms, genetic adaptations, and ecological strategies, along with co-evolved microorganisms, to survive and reproduce in the harsh environment [[45]4]. However, existing research on the adaptability of species to the QTP has predominantly concentrated on small mammals and bovids. In contrast, the adaptive mechanisms of cervids to the extreme environments of the QTP remain largely underexplored and inadequately addressed. The substantial presence of gut microbiota in animal gut creates a complex and densely populated ecosystem that drives food digestion, energy metabolism, immune regulation, and pathogen defense [[46]5–[47]7]. Coevolution of gut microbiota with their host is a prolonged and intricate process, shaped by dynamic interactions between the host, the gut microbiota, and the surrounding environment. Throughout the prolonged evolutionary process, the host establishes an ecological niche that supports microbial growth and reproduction. In return, gut microorganisms play a pivotal role in host physiology through their metabolic activities [[48]8]. The mutualistic interaction enhances the ability of host to adapt to specific environments. For example, although the giant panda (Ailuropoda melanoleuca) genome does not include genes necessary for cellulose digestion, its gut microbiota harbors specialized taxa that produce cellulolytic enzymes, crucial for enabling the host to efficiently digest its predominantly bamboo-based diet. Furthermore, during the bamboo shoot season, butyrate-producing bacteria such as Clostridium butyricum are notably more abundant than in the leaf-eating season. This seasonal variation in the gut microbiota facilitates increased fat storage in giant pandas during the bamboo shoot season [[49]9, [50]10]. Consequently, gut microbiota plays a crucial role as a plastic factor, significantly enhancing the ability to adapt to environmental fluctuations. Research has demonstrated significant variations in bacterial diversity and composition among individuals of the same species inhabiting different altitudes [[51]11]. Nonetheless, research has largely overlooked the specific functional roles of the gut microbiome in facilitating host adaptation and the dynamics of coevolution between gut microbiota and their hosts. We hypothesize that the gut microbiota of cervids inhabiting the QTP underwent distinct evolution and adaptation compared to cervids from lowland regions, driven by the extreme conditions of the QTP. The evolution may enhance the ability of cervids to cope with the unique challenges posed by the harsh conditions in the QTP. Further research involving a diverse range of cervid species and extensive sample sizes is crucial for elucidating the coevolutionary dynamics between cervids and their gut microbiota, particularly in the context of adapting to the extreme environments of the QTP. Recent breakthroughs in high-throughput sequencing technologies and metagenomic binning techniques have substantially advanced the field of microbiology. These advancements have made it possible to obtain the large-scale assembly of nearly complete metagenome-assembled genomes (MAGs), where each MAG constitutes a comprehensive genomic representation of individual organisms [[52]12–[53]14]. Metagenomic binning techniques facilitate the discovery of numerous previously unidentified bacterial species within the gut microbiome and allow for the detailed characterization of these microorganisms evolutionary and functional differences at the genomic level [[54]15]. We conducted metagenomic assembly and binning on 414 metagenome samples collected from six cervid species across various regions. We analyzed the microbial composition and functional profiles of gut microbiota across different altitudes and cervid species. We also reconstructed the six cervid species phylogenetic relationships to investigate the distinctive evolutionary and differentiation processes undergone by the gut microbiota. Through comparative metagenomic analysis, we investigated the functional genes, evolutionary patterns, and structural variations (SVs) of gut microbiome species and strains from cervids in distinct regions. Furthermore, the study explored the functional characteristics of cervid gut microbiota in plateau, highlighting the adaptive mechanisms between cervids and gut microbiota in response to the harsh plateau environment. Results Metagenome sequencing and assembly of cervids The 222 samples from cervids inhabiting diverse regions of the QTP generated 4.25 tera base pairs (Tb) of raw sequencing data using the Illumina platform. Additionally, 192 publicly available samples from cervid species in plain were downloaded from National Center for Biotechnology Information (NCBI), providing 2.92 Tb of raw Illumina data. After quality control, 7.07 Tb of clean, high-quality sequencing data were retained. 5.13 Tb of host-free reads remained for subsequent metagenome assembly and binning, after filtering out host genomic sequences using reference genomes from previously published studies (Fig. [55]1A, Additional file 2: Table S1). A total of 41,847 MAGs were reconstructed from the host-free reads following assembly and binning, comprising 25,144 low-quality MAGs, 8053 medium-quality MAGs, and 8650 high-quality MAGs (Fig. [56]1B). Among the high-quality MAGs, the average completeness was 90.39%, with an average contamination rate of 1.77%, and genome sizes ranging from 0.6 to 7.2 Mb. Additionally, 82.53% (7139) of high-quality MAGs exhibited an N50 assembly length greater than 10,000 bp (Additional file 2: Table S2). Positive correlation was observed between MAG relative abundance and assembly completeness, with overall MAG abundance (R = 0.166, p < 0.001), high-quality MAG abundance (R = 0.228, p < 0.001), and medium-quality MAG abundance positively correlated with completeness across samples (Fig. [57]1C, Additional file 1: Fig. S1). Numerous low-abundance MAGs were effectively captured on account of the high sequencing depth (15.72 ± 2.35 Gb per sample) and robust assembly processes. It included 5884 high-quality MAGs (68.02%) that were fully assembled despite lower relative abundances (log[10]Abundance < 1) and 6524 medium-quality MAGs (81.03%) that were also well assembled under similar environments (Additional file 2: Table S2). Furthermore, the rarefaction curve analysis showed a leveling off in the discovery of new MAGs once the sequencing depth reached around 12 Gb, with additional data yielding only a small number of new MAGs (Additional file 1: Fig. S2). Fig. 1. [58]Fig. 1 [59]Open in a new tab Overview of metagenomic assembly, binning, and taxonomic annotation. Sika deer (HCEN), white-lipped deer (CEA), Siberian roe deer (HCAP), and red deer (CEC) in plateau; moose (ALA), sika deer (LCEN), Siberian roe deer (LCAP), and Chinese water deer (HYI) in plain. A Workflow of the metagenomic assembly process, binning strategy, and taxonomic annotation. B Number and proportion of MAGs categorized by quality. C Spearman correlation between relative abundance and assembly completeness of all MAGs. R > 0 indicates a positive correlation; R < 0 indicates a negative correlation. p < 0.001 denotes a highly significant correlation. D Number of known and unknown SGBs in different regions. Red font represents species from plateau hosts, while black font represents species from plain hosts. E Phylogenetic tree of 3193 non-redundant SGBs generated by PhyloPhlAn. Colored stars represent phylum-level SGBs, with the legend arranged in descending order based on the number of detected bacterial and archaeal phyla within each clade. Bacteria: black font; while archaea: red font The high-quality MAGs from the gut microbiota of different cervid species collected from various regions were dereplicated separately at the strain level (ANI ≤ 99%) and species level (ANI ≤ 95%) for reducing redundancy (Fig. [60]1A). Specifically, a total of 5463 MAGs were classified at the strain level and 3896 MAGs were classified at the species level. Based on annotations from the Genome Taxonomy Database (GTDB), a total of 1636 species-level bins (SGBs) were categorized as known (Fig. [61]1D). In contrast, 2260 SGBs did not have corresponding reference genomes in the GTDB and were classified as unknown MAGs. The number of unknown SGBs in cervids from plateau exceeded that in plain (Fig. [62]1D). Furthermore, the number of unknown SGBs binned from each contig was standardized, and the relative number in plateau and plain cervids was compared. The results showed that plateau cervids had a significantly higher relative number of unknown SGBs than plain cervids (Additional file 1: Fig. S3). Subsequently, SGBs from various groups were consolidated and dereplicated at the species level (ANI ≤ 95%), resulting in a final set of 3193 high-quality, non-redundant SGBs. Moreover, 1192 (37.3%) MAGs have corresponding reference genomes in the GTDB, whereas 2001 (62.7%) MAGs did not have corresponding reference genomes in the GTDB and therefore could not be annotated to the species level among these non-redundant SGBs (Fig. [63]1A). The latter SGBs represent new species discovered in this study. All 3193 non-redundant SGBs were assigned to taxonomic groups predicted by the GTDB, comprising 3173 bacterial SGBs and 22 archaeal SGBs. The bacterial SGBs were classified into 21 known phyla (3172 SGBs) and 1 unclassified phylum (1 SGB). The SGB of the unclassified bacterial phylum, designated CEN.ZECO.010_bin.108, was derived from a plateau sika deer. The 5 most abundant bacterial phyla identified were Bacillota_A (1839 SGBs), Bacteroidota (812 SGBs), Pseudomonadota (113 SGBs), Verrucomicrobiota (112 SGBs), Bacillota_I (111 SGBs) (Fig. [64]1E). Within bacteria, 1989 SGBs could not be annotated to the species level, with the majority found in the Bacillota_A phylum (1294 SGBs) and the Bacteroidota phylum (434 SGBs). The archaea comprised three phyla, including Methanobacteriota (13 SGBs), Thermoplasmatota (5 SGBs), and Halobacteriota (5 SGBs) (Fig. [65]1E). Among the archaeal SGBs, 12 species remained unclassified at the species level, most of which belonged to the Methanobacteriota phylum (8 SGBs). Structural annotation and functional characteristics of gut microbiome MAGs in cervids A total of 3193 non-redundant SGBs contained 6,070,813 predicted proteins, of which 5,278,392 (86.94%) were successfully annotated using the Evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) Database. The predominant functional category was metabolic pathways, comprising approximately 37.03% of the overall functional composition (Fig. [66]2A). Within metabolic pathways, carbohydrate metabolism was the most prominent, accounting for around 30.44% of the metabolic functions. Subsequently, 337,303 predicted proteins (5.56%) were identified as containing at least one carbohydrate-active enzymes (CAZymes) function, based on CAZymes functional annotation. This dataset encompassed 181,370 (53.8%) glycoside hydrolases (GH), 98,494 (29.2%) glycosyl transferases (GT), 30,039 (8.9%) carboxy esterases (CE), 21,087 (6.3%) carbohydrate-binding modules (CBM), 5316 (1.5%) polysaccharide lyases (PL), 709 (0.2%) auxiliary activities (AA), 192 S-layer homology (SLH) proteins, and 96 cohesins (Fig. [67]2A). A comparative analysis of the predicted CAZymes against the current CAZymes database was conducted. A total of 269,323 (79.8%) exhibited low similarity (< 95%) to known CAZymes based on amino acid sequence alignment among the 337,303 predicted CAZymes proteins (Additional file 1: Fig. S4). The distribution of CAZymes across the genomes of the identified phyla was uneven, with notable enrichment observed in the Bacillota_A and Bacteroidota phyla MAGs (Fig. [68]2B). The relative abundance of CAZymes in cervids from plateau was higher compared to those from plain, although this difference was not statistically significant (Fig. [69]2C). Interestingly, no SLH and cohesin enzymes were annotated in the Bacteroidota phylum MAGs. Functional genes of the bacterium classified under the previously unidentified phylum CEN.ZECO.010_bin.108 were annotated with 18 GH, 14 GT, 5 CBM, and 2 CE enzyme functions. Fig. 2. [70]Fig. 2 [71]Open in a new tab Functional annotation of cervid MAGs. A Functional annotation of proteins from the cervid gut microbiome, as determined by KEGG (top) and dbCAN2 (bottom) databases. B Relative abundance of CAZymes categorized by phylum MAGs. C Comparison of relative abundance of CAZymes in cervids from plateau and plain, assessed using Wilcoxon test, ns means p > 0.05 Coevolution and genomic metabolic features of cervid gut microbiome with hosts We reconstructed the phylogenetic relationships of cervids from plateau and plain by constructing maximum likelihood trees based on circular mitochondrial genomes extracted from metagenomic reads (Fig. [72]3A). Global analyses using ParaFit and Phylogenetic Association with Co-occurrence (PACo) demonstrated significant evidence for coevolution between all SGBs and hosts, with ParaFitGlobal value = 157,503 (p = 0.012) and m^2 global value = 3708 (p < 0.001). Additionally, the application of the eMPRess led to the rejection of the null hypothesis of random host-all SGBs associations, with statistically significant results at the 0.01 (p = 0.009) (Fig. [73]3B). The ParaFit H–S link model, used to assess the coevolutionary relationships between each SGB and host, revealed that 1235 SGBs exhibited significant coevolutionary associations with their hosts, as evidenced by ParaFit1 and ParaFit2 p ≤ 0.05. In contrast, 2661 SGBs did not display significant coevolutionary relationships, with p > 0.05 (Additional file 2: Table S3). Fig. 3. [74]Fig. 3 [75]Open in a new tab Coevolution between cervid gut microbiota SGBs and hosts. A Maximum likelihood phylogenetic tree of cervids from plateau and plain. Orange represents cervids in plain, and green represents cervids in plateau. B Histogram of p for all the 3896 SGBs. In each histogram, orange bars represent the best adjustment cost for eMPRess coevolution trees, while blue bars denote the best cost for trees with randomly permuted tip associations. C Phylogenetic tree of Alistipes SGBs in cervids, generated with PhyloPhlAn. Orange represents SGBs from plain, and green represents SGBs the plateau. The outer ring was the results of ParaFit H–S link model tests. Green squares were SGBs with significant coevolutionary relationships with their hosts, while orange squares were SGBs without significant coevolution. On the right side, the reconstructed phylogenetic trees for clade 1 and clade 2 were presented from top to bottom. D Histogram of p for Alistipes SGBs. E Relative average phylogenetic distances for clade 1 and clade 2, assessed using Wilcoxon test. *** means 0.001 < p < 0.005. F Histogram of p for Alistipes clade 1 SGBs. G Histogram of p for Alistipes clade 2 SGBs The intricate phylogenetic relationships among SGBs present a significant challenge in studying their coevolution with hosts. Notably, no same SGB was identified across all hosts during the dereplication process. Consequently, we focused on the genus Alistipes within the phylum Bacteroidota and the genus Faecousia within the phylum Bacillota_A. These genera included the most abundant SGBs within their respective phyla and were present across all cervid hosts in our result. We reconstructed phylogenetic trees for the SGBs of Alistipes (294 SGBs) and Faecousia (110 SGBs) genera from the SGBs of all the hosts (Fig. [76]3C). The SGBs of the phylogenetic tree of Alistipes were predominantly derived from plateau cervids. The ParaFit and PACo demonstrated significant coevolution between SGBs in Alistipes and hosts (ParaFitGlobal = 12,492.81, p = 0.001; m^2 global value = 363.2, p < 0.001) (Fig. [77]3C). The ParaFit H–S link model revealed that 235 SGBs exhibited significant coevolution with their hosts, whereas only 59 SGBs did not. Among the 235 SGBs with significant coevolution, 193 were from plateau hosts and 42 from plain hosts (Fig. [78]3C). Additionally, the eMPRess analysis between SGBs in Alistipes and hosts yielded p = 0.009 (Fig. [79]3D). The SGBs in the genera Alistipes and Faecousia were classified into different clades (Clade) based on the evolutionary distances of the phylogenetic trees. Clade 1 in the Alistipes exhibited short branch lengths and were predominantly derived from plateau hosts (81.16%) (Fig. [80]3C). In contrast, clade 2 in the Alistipes displayed longer branch lengths and was also predominantly associated with plateau hosts (72.44%) (Fig. [81]3C). The average relative phylogenetic distance in clade 1 was significantly shorter than that in clade 2 (Fig. [82]3E). For clade 1, the Global ParaFit and PACo statistic results were ParaFitGlobal value of 19.57 (p = 0.001) and m^2 global value of 82.9 (p < 0.001). The eMPRess analysis yielded p = 0.009 (Fig. [83]3F). For clade 2, the Global ParaFit and PACo statistic results were ParaFitGlobal value of 416.30 (p = 0.001) and m^2 global value of 278.9 (p < 0.001). The eMPRess analysis produced p = 0.009 (Fig. [84]3G). The SGBs in the reconstructed phylogenetic tree of Faecousia were predominantly derived from plain cervids. The Global ParaFit and PACo test result of Faecousia was ParaFitGlobal of 836.32 (p = 0.001) and m^2 global value of 135.8 (p < 0.001). The ParaFit H–S linkage model indicated that 80 SGBs display significant coevolutionary relationships with their hosts, while 30 SGBs do not (Additional file 1: Fig. S5A). Among the 80 SGBs with significant coevolution, 32 originate from plateau hosts and 48 from plain hosts. Additionally, the eMPRess analysis between SGBs in Faecousia and hosts yielded p = 0.009 (Additional file 1: Fig. S5B). The tree was classified into three clades based on branch lengths. Clade 1, characterized by relatively short branches, contained a higher proportion of SGBs from plateau hosts (51.85%). Clade 2, which exhibited the longest branches, had the fewest SGBs and showed an equal distribution between plateau and plain hosts. Clade 3 was predominantly composed of SGBs from plain hosts (58.67%) (Additional file 1: Fig. S5A). Further analysis of phylogenetic distances within clades 1, 2, and 3 indicated that clade 3 had a significantly greater average relative phylogenetic distance compared to clades 1 and 2 (Additional file 1: Fig. S5C). We also analyzed the phylogenetic trees for clade 1, clade 2, and clade 3 of Faecousia coevolution with hosts. Significant coevolutionary relationships were detected only in clade 3, while clades 1 and 2 did not show significant coevolution. Specifically, clade 1 had ParaFitGlobal of 31.02, p = 0.001; m^2 global value = 32.4, p = 0.024, and eMPRess p = 0.02. Clade 2 had ParaFitGlobal = 2.04, p = 0.854; m^2 global value = 8.8, p = 0.7, and eMPRess p = 0.33. Clade 3 had ParaFitGlobal = 166.04, p = 0.001; m^2 global value = 85.8, p < 0.001, and eMPRess p = 0.01 (Additional file 1: Fig. S5D). The Alistipes had a significantly higher relative abundance of CAZymes in clade 1 compared to clade 2 (Fig. [85]4A). Similarly, in the genus Faecousia, both clade 1 and clade 2 exhibited significantly greater CAZymes relative abundance than clade 3 (Fig. [86]4B). For Alistipes genus, genes with higher relative abundance in clade 1 compared to clade 2 were predominantly significant enriched in metabolic pathways related to polysaccharide degradation and short-chain fatty acid metabolism, including starch and sucrose metabolism, carbohydrate metabolism, glycolysis/gluconeogenesis, fatty acid biosynthesis, and lipid metabolism (Fig. [87]4C). In contrast, the metabolic pathways significantly enriched in genes with higher relative abundance in clade 2 compared to clade 1 focused on the synthesis and repair of membrane structures, amino acid metabolism. Specifically, enriched pathways included starch and sucrose metabolism, glycerophospholipid metabolism, metabolism of other amino acids, cyanoamino acid metabolism, lipid metabolism, pentose and glucuronate interconversions, and ether lipid metabolism (Additional file 1: Fig. S6A). Similarly, for Faecousia, genes with higher relative abundance in clade 1 compared to clade 2 and clade 3, as well as in clade 2 compared to clade 3, were enriched in metabolic pathways similar to those observed in Alistipes genera in clade 1 (Additional file 1: Fig. S6B–D). Unexpectedly, genes with higher relative abundance in clade 3 compared to clade 1 and clade 2 and those in clade 2 compared to clade 1 did not show significant enrichment in any metabolic pathways. Fig. 4. [88]Fig. 4 [89]Open in a new tab Genomic and metabolic features of different clades in Alistipes and Faecousia. A Comparison of CAZymes relative abundance across distinct clades of Alistipes. B Comparison of CAZymes relative abundance across distinct clades of Faecousia. Significant differences were assessed using Wilcoxon test, *** means 0.001 < p < 0.005. C KEGG-enriched metabolic pathways for genes in Alistipes with higher relative abundance in clade 1 compared to clade 2 Functional features of SVs in plateau cervid MAGs High-quality strain-level MAGs from cervid species inhabiting both plateau and plain were screened and clustered. This process resulted in the clustering of 3193 non-redundant SGBs from 5463 strain-level MAGs. The SVs analysis was performed in the species clusters generated during the deduplication process. The SVs were identified, including insertions, deletions, duplications, translocations, inversions, and contractions, by comparing the species clusters of SGBs from plateau and plain cervids (Fig. [90]5A). A total of 40,649 SVs with a length of at least 200 bp were identified. It included 16,765 insertions, 17,792 deletions, 1923 duplications, 866 contractions, 2921 translocations, and 382 inversions (Fig. [91]5B). In plateau cervids, a total of 31,117 SVs were identified in the SGBs. The white-lipped deer exhibited the highest number of SVs, totaling 12,760, followed by the SGBs in plateau red deer with 7562 SVs. Conversely, the plain cervids had 9532 SVs in SGBs, with the plain sika deer accounting for the highest number at 3036 SVs (Additional file 2: Table S4). The number of SVs was normalized relative to SGBs genome size on account of the potential undetectable SVs within assembly gaps of MAGs. No significant differences were observed in the relative abundances of SVs between cervids from the plateau and the plain (Fig. [92]5C). Furthermore, the differences in the relative abundance of different SVs in cervids were analyzed. The results showed that the relative abundance of insertions in plateau cervids was significantly higher than that of other SVs, whereas the relative abundance of deletions in plain cervids was significantly higher than that of other SVs (Additional file 1: Fig. S7). Substantial variation in SV relative abundances was observed among the 16 phyla, with archaea displaying significantly fewer SVs compared to bacteria (Fig. [93]5D). The Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolism pathways significant enriched in the duplicated SVs closely mirrored pathways enriched in the insertion genes, with a predominant focus on carbohydrate and energy metabolism. It included pathways such as fatty acid degradation, pentose phosphate pathway, galactose metabolism, fatty acid biosynthesis, fructose and mannose metabolism, alanine, aspartate, and glutamate metabolism, glycolysis/gluconeogenesis, carbohydrate metabolism, energy metabolism, and amino acid metabolism (Fig. [94]5E). The KEGG metabolism pathway enrichment analysis on insertion from the SGBs of plateau cervids revealed that the insertion genes were enriched in pathways associated with carbohydrate and energy metabolism (Fig. [95]5F). It included carbohydrate metabolism, glycan biosynthesis and metabolism, starch and sucrose metabolism, energy metabolism, and pentose and glucuronate interconversions (Fig. [96]5F). Fig. 5. [97]Fig. 5 [98]Open in a new tab Functional features of SVs in the gut microbiome of plateau cervids. A Schematic representation of SVs in gut microbiome SGBs. B Numbers of SVs identified in the gut microbiome of cervids. C Comparison of relative abundance of SVs across plateau and plateau cervid gut microbiota. D Relative abundance of SVs across different phylum levels in cervid gut microbiota. Significant differences were assessed using Wilcoxon test, ns means p > 0.05, * means 0.01 < p < 0.05, ** means 0.005 < p < 0.01, *** means 0.001 < p < 0.005, **** means p < 0.001. E KEGG metabolism pathways significantly enriched with duplication SVs in the gut microbiome of plateau cervids. F Top 20 KEGG metabolic pathways significantly enriched in the gut microbiome of plateau cervids associated with the insertion SVs The KEGG metabolism pathways significantly enriched in deletion SVs differed markedly from those associated with insertion and duplication SVs. Deletion SVs were primarily enriched in pathways involved in the metabolism and biosynthesis of various molecules. It included o-antigen nucleotide sugar biosynthesis, polyketide sugar unit biosynthesis, amino sugar and nucleotide sugar metabolism, acarbose and validamycin biosynthesis, ascorbate and aldarate metabolism, streptomycin biosynthesis, glycan biosynthesis and metabolism, cysteine and methionine metabolism, and the biosynthesis of vancomycin group antibiotics (Additional file 1: Fig. S8). Furthermore, no KEGG metabolic pathways were significantly enriched in the contraction SVs. Discussion Taxonomic and functional annotations of MAGs Previous studies have demonstrated that short-read sequencing at depths exceeding 10 Gb can reveal novel genomic features of gut microbiota, with increases in sequencing depth allowing for the detection of additional low-abundance species [[99]16]. We achieved an average sequencing depth of 15.72 ± 2.35 Gb per sample, resulting in the successful assembly of a significant number of relatively low abundances high-quality and medium-quality MAGs. The rarefaction curve analysis suggested that the sequencing depth of 12 Gb could cover the genomic information of most target species. These results demonstrate that the sequencing depth was adequate to capture the majority of gut microbiota diversity in cervids in this study. The extensive metagenomic dataset generated also serves as a valuable resource for future investigations into the gut microbiomes of cervids. Furthermore, the positive correlation observed between the relative abundance and completeness of MAGs indicates that higher-quality MAGs generally harbor higher abundant sequences in samples. These MAGs typically could provide richer functional information and are expected to play more critical roles in gut [[100]17, [101]18]. In plateau cervids, the gut microbiota exhibit a significantly greater number of unidentified SGBs compared to those in the plain. It not only enhances the understanding of gut microbial diversity in plateau cervids but also suggests that prolonged exposure to extreme plateau pressures may drive adaptive evolution between hosts and their gut microbiota, thus increasing the diversity and complexity of gut microbiota [[102]19, [103]20]. The gut microbiota SGBs in cervids were predominantly comprised of the phyla Bacillota_A and Bacteroidota, aligning with previous studies [[104]21–[105]24]. A total of 1294 unclassified MAGs in the Bacillota_A phylum and 434 in the Bacteroidota phylum were assembled, significantly enhancing the discovery of new species within two phyla. It reveals the importance of Bacillota_A and Bacteroidota in the cervid gut microbiota and suggests that microbial diversity within these phyla substantially exceeds current database. Notably, of the 23 archaeal MAGs obtained from the cervid gut microbiota, 12 were served as unclassified MAGs. Our study contributes to the discovery of new archaea species. Additionally, archaea from the phylum Methanobacteriota are primarily involved in methane metabolism [[106]25, [107]26]. The significant presence of Methanobacteriota within the archaea also shows its crucial role in methane metabolism in the cervid gut. Among all SGBs, 86.94% of the predicted proteins were annotated in the eggNOG database, indicating that a majority of predicted proteins possess detailed functional annotations. It provides a high-quality functional annotation dataset for the gut microbiota of cervids. Notably, carbohydrate metabolism was the predominant metabolic pathway among the annotated proteins, consistent with previous findings that the functions of ruminant gut microbiota primarily focus on energy metabolism [[108]27, [109]28]. The low similarity observed among a significant number of CAZymes in the annotation results indicates that these proteins are novel CAZymes [[110]29]. It not only establishes a foundational dataset for the functional annotation of cervid gut microbiota but also reveals the limitations of current CAZymes databases. Furthermore, the relative abundance of CAZymes of SGBs in plateau cervids was slightly greater than that in plain cervids, implying a potentially greater carbohydrate metabolic capacity within the gut microbiota of plateau cervids [[111]26, [112]30]. Coevolution of hosts and gut microbiota and genomic functional features Coevolution between hosts and gut microbiota represents a complex and interdependent process that is widely observed in nature [[113]8, [114]31]. For example, as humans transitioned from hunter-gatherer societies to agricultural and urban lifestyles, the diversity and structure of their gut microbiota underwent significant changes [[115]32]. Similarly, the gut microbiome composition of bats exhibits greater similarity to that of birds than to that of terrestrial mammals [[116]33]. Furthermore, the coevolution relationship of the bacterium Buchnera within aphids reveals that natural selection in the host drives the evolution of symbiotic bacteria, thereby accelerating the coevolution between the host and bacterium [[117]34]. We employed multiple methods to investigate the coevolution between gut microbiota and their cervid hosts. All the results showed coevolution between the gut microbiota of cervids and their hosts. Both Alistipes and Faecousia genera display a clear coevolutionary relationship with their hosts. These finding reveals that coevolution between hosts and gut microbiota is a widespread phenomenon that occurs broadly within the gut microbiota of cervids, rather than being restricted to specific taxon [[118]35]. The SGBs of the Alistipes genus predominantly originated from plateau hosts, suggesting that extreme plateau environments impose selective pressures on both hosts and their symbiotic gut microbiota, promoting coevolution. Furthermore, the observed coevolutionary relationships in two clades of Alistipes and their hosts indicate that, under the harsh environments of plateau, the gut microbiota of these hosts undergo a stable long-term coevolution process while also experiencing rapid species emergence together with adaptive evolution [[119]8, [120]36]. The SGBs of the genus Faecousia primarily originated from hosts in plain, with coevolutionary relationships observed only in clade 3, where plain hosts are more prevalent. It may be attributed to the relatively consistent environments in the plains, which enable the host-gut microbiota coevolutionary relationship to persist throughout the long process of evolution [[121]37]. The SGBs primarily originate from plateau hosts and harbor a greater number of CAZymes in the Alistipes clade 1 and Faecousia clade 1, exhibiting a higher relative abundance compared to other clades within the same genus. Furthermore, the metabolic pathways enriched by highly abundant genes in clades in the Alistipes and Faecousia, predominantly derived from plateau hosts, are primarily associated with polysaccharide degradation and short-chain fatty acid metabolism. In contrast, the metabolic pathways enriched by highly abundant genes in clades predominantly derived from plain hosts differ from these. These findings indicate that the genomic characteristics of SGBs from plateau cervids demonstrate superior capabilities in polysaccharide degradation, carbohydrates, and energy metabolism relative to those from plain. The products of these metabolic pathways not only enhance the host’s energy metabolism and thermogenesis but also improve the efficiency of food digestion and energy absorption [[122]7, [123]9, [124]27, [125]30]. SVs in cervid gut microbiota support adaptation to plateau SVs are prevalent in the genomes of species and constitute a primary source of genomic functional variation. Typically, SVs are closely linked to the phenotypes of organisms and could significantly impact phenotype, disease susceptibility, and the adaptive capacity of species [[126]38, [127]39]. Through the detection and screening of SVs, we observed that in SGBs from plateau cervids possess a significantly higher number of SVs than those from plain cervids. Furthermore, certain plateau cervid SGBs demonstrate a relative abundance of SVs that exceeds that of SGBs from plain cervids. These results imply that extreme plateau environments appear to exert considerable selective pressure on gut microbiota, prompting SGBs to generate greater SVs at the genome [[128]40]. Furthermore, the relative abundance of SVs in archaea was significantly lower than in bacteria, also implying that the genomic structure and replication mechanisms of archaea are conserved [[129]41, [130]42]. The KEGG enrich pathways associated with inserted and duplicated genes in the gut microbiota of plateau cervids exhibited a similar pattern, predominantly related to carbohydrate and energy metabolism. The pathways including fatty acid degradation, fatty acid biosynthesis, carbohydrate metabolism, energy metabolism, and amino acid metabolism were significant enriched in both gene sets. The metabolic products from these pathways can enhance the energy metabolism and thermogenic mechanisms of host, aiding their adaptation to the extreme environment of plateau [[131]43–[132]45]. Overall, these findings reveal that gut microbiota strains in plateau cervids have undergone significant genomic structural variation and enhanced energy metabolism efficiency as an adaptation to the extreme pressures of plateau, in contrast to plain. Conclusions In conclusion, our sequencing and assembly efforts have significantly advanced the understanding of gut microbiota diversity in cervids, revealing that prolonged exposure to extreme environmental pressures in the plateau enhances the diversity and complexity of the gut microbiota. A pronounced coevolutionary relationship exists between cervids and their gut microbiota, particularly in plateau environments, where the gut microbiota of plateau cervids undergoes both long-term stability and rapid species differentiation. These unique species, which have co-evolved with plateau cervids, exhibit enhanced capacities for carbohydrate and energy metabolism. Moreover, the extreme plateau environment has prompted SVs in the gut microbiota genome, primarily concentrating on the enhancement of metabolic pathways associated with energy metabolism. The observed increase in energy metabolism efficiency, both inter-specifically and intra-specifically within the gut microbiota of plateau cervids, promotes cervid adaptation to these extreme environments. These findings provide critical evidence for microbial contributions to host adaptation in extreme environments, while offering fresh insights into the coevolution of gut microbiota and their functional adaptations. Methods Metagenome samples All details about samples are listed in Additional file 2: Table S5. Two hundred twenty-two fresh fecal samples were obtained from Siberian roe deer (Capreolus pygargus), white-lipped deer (Przewalskium albirostris), red deer (Cervus elaphus), and sika deer (Cervus nippon) inhabiting plateau in this study. Fecal samples from Siberian roe deer (HCAP: n = 37) and sika deer (HCEN: n = 37) inhabiting plateau were collected from Qilian County, Haibei Tibetan Autonomous Prefecture, Qinghai Province, China. Qilian County (37°25′16″ N ~ 39°05′15″ N, 98°05′35″ E ~ 101°02′06″ E) is located in the heart of the Qilian Mountains, a region in the northeastern QTP, with an average elevation of 3169 m. Fecal samples from red deer (CEC: n = 60) were collected from Tianjun County, Haixi Mongol and Tibetan Autonomous Prefecture, Qinghai Province, China. Tianjun County (96°49′42″ E ~ 99°41′48″ E, 36°53′ N ~ 48°39′12″ N) is situated in the northeastern QTP, with an average elevation of 4100 m. Fecal samples from white-lipped deer (CEA: n = 88) were collected from Qumarlêb County (33°36′ N ~ 35°40′ N, 92°56′ E ~ 97°35′ E) and Zaduo County (93°38′ E ~ 96°12′ E, 32°08′ N ~ 34°15′ N) in Yushu Tibetan Autonomous Prefecture, Qinghai Province, China. Qumarlêb County is situated in the southwest of Qinghai Province and the northern part of Yushu Tibetan Autonomous Prefecture, within the Yangtze River headwater region of Sanjiangyuan National Park, with an average elevation of 4550 m. Zaduo County is located in the southern part of Qinghai Province and the southwestern part of Yushu Tibetan Autonomous Prefecture, within the Lancang River section of Sanjiangyuan National Park, with an average elevation of 4290 m. We located the deer herd in the field, observed their activity for no more than 1 h, and then immediately collected the fecal samples to ensure the freshness. The fresh, soft, and intact feces were collected, excluding loose stools to avoid potential variations caused by gastrointestinal diseases. To avoid cross-contamination of the fecal samples, we used disposable polyethylene (PE) gloves for collection. Fresh, moist feces were promptly sealed in disposable plastic bags, tightly wrapped in thick aluminum foil, and secured with transparent tape. Then, the samples were then stored in a − 196 °C liquid nitrogen tank carried in the accompanying vehicle to ensure freshness. Upon returning to the lab, they were stored in a − 80 °C freezer until DNA extraction was performed. In addition, 192 publicly accessible metagenomic samples were retrieved from the NCBI database. These samples were from moose (Alces alces) (ALA: n = 35), sika deer (LCEN: n = 87), Siberian roe deer (LCAP: n = 40), and Chinese water deer (Hydropotes inermis) (HYI: n = 30) from plain areas below 1000 m in China. DNA extraction and quality control High-quality microbial DNA was extracted from fecal samples using the Mag-Bind Soil DNA Kit (OMEGA). Specifically, 0.8 ml of Buffer SLX Mlus, 500 mg of glass beads, and 0.5 g of processed fecal sample were combined in a 2-ml centrifuge tube and vortexed for 1–5 min. Subsequently, 80 μl of Buffer DS was added, mixed thoroughly, and the mixture was incubated at 70 °C for 10 min with intermittent inversion of the tube. After centrifugation, the supernatant was transferred to a new 2-ml centrifuge tube, and 200 μl of Buffer SP2 was added and mixed. Following this, 100 μl of HTR Reagent was added, the sample was ice-bathed for 5 min, and centrifugation was repeated. The supernatant was then transferred to a new 1.5-ml centrifuge tube, mixed with 40 μl of magnetic beads and 450 μl of Binding Buffer, allowed to stand at room temperature for 2 min, and placed on a magnetic rack until the solution clarified and the beads adhered to the side. The liquid was aspirated. Subsequently, 500 μl of Binding Buffer, 1000 μl of Buffer PHB, and 1000 μl of SPM Wash Buffer were added sequentially, with mixing and clarification on the magnetic rack between each addition, and the liquid was aspirated each time. The beads were then air-dried at room temperature for 15 min until slightly cracked. Finally, 50 μl of Elution Buffer was added, incubated at 65 °C for 15 min, and the tube was placed on the magnetic rack until the solution clarified. The DNA solution was transferred to a new 1.5-ml centrifuge tube. The DNA concentration was measured using a Qubit 4 fluorometer, and DNA quality was assessed by 1% agarose gel electrophoresis. Library preparation, sequencing, and quality control of data The metagenome sequencing library was constructed using the standard Illumina TruSeq DNA library preparation protocol. Initially, DNA was fragmented with a Bioruptor machine. End Repair Mix was then applied to repair the DNA fragments with protruding ends and to introduce an “A” base at the 3′ end for complementary pairing with the “T” bases of the adapters. Adapters containing unique tags were subsequently ligated to the DNA fragments using ligase. To remove free and self-ligated adapter sequences, magnetic beads were employed, and DNA fragments of the appropriate size were selected and purified. PCR was performed to selectively enrich for DNA fragments with adapters on both ends and to amplify the DNA library. The library was quantified using a Qubit 4 fluorometer, and the quality of the PCR-enriched fragments was evaluated using an Agilent 2100 bioanalyzer to confirm fragment size and distribution. Finally, the library, diluted to a concentration of 10 nM, was sequenced on the Illumina NovaSeq 6000 platform (PE150). To ensure the quality of the raw data from the Illumina sequencing platform, quality control was performed using fastp v0.23.4 [[133]46]. This software was employed to filter out low-quality fragments with poor average quality scores, adapter contamination (with adapter sequences overlapping more than 15 base pairs), undetermined bases, and excessively short fragments. The quality control parameters applied were “-W 5 -M 20 −5 -q 15 -u 40 -n 0 -l 75” for the metagenomic paired-end reads. Following this, the filtered DNA sequencing data were mapped to the host genome, obtained from NCBI, using bwa-mem2 v2.2.1 to remove host contamination [[134]47]. The host genome information is listed in Additional file 2: Table S6. Metagenome assembly and binning SPAdes v4.0.0 was used to assemble high-quality sequences, which were then subjected to quality control and host removal to generate contigs [[135]48]. Short fragments less than 1000 bp were filtered out using seqkit v2.8.2 to ensure the integrity of the contigs [[136]49]. Subsequently, Bowtie2 v2.5.4 was employed to map the host-removed reads back to the filtered contigs, producing corresponding SAM files [[137]50]. These SAM files were converted to BAM format using Samtools v1.21, followed by indexing and sorting of the contigs [[138]51]. The jgi_summarize_BAM_contig_depth script from MetaBAT2 v2.17 was then used to determine sequencing depth for the contigs from the BAM files [[139]52, [140]53]. Metagenomic binning was performed based on contig sequence characteristics and sequencing depth, resulting in various bins. Each bin represents one MAG. The quality of the bins was assessed using CheckM v1.2.3, with bins classified into quality categories based on completeness and contamination levels: bins with completeness < 50% or contamination > 10% were deemed low-quality; bins with completeness ≥ 50% and contamination ≤ 10% were categorized as medium-quality; and bins with completeness ≥ 80% and contamination ≤ 10% were considered high-quality [[141]29, [142]54]. Then, dRep v3.5.0 was used to remove redundancy among high-quality bins at both strain and species levels, with average nucleotide identity (ANI) thresholds set at 99% and 95%, respectively [[143]55]. For duplicate bins, scores were calculated based on completeness − 5 × contamination + 0.5 × log (N50), retaining the bins with the highest score. Following the dRep v3.5.0 redundancy removal procedure, bins from each host were first dereplicated to the strain level using the parameters “-pa 0.9 -sa 0.99 -nc 0.30 -cm larger.” The resulting strain-level bins were then dereplicated to the species level using the parameters “-pa 0.9 -sa 0.95 -nc 0.30 -cm” [[144]55]. Finally, the SGBs from each host were merged, and redundant SGBs were removed using the parameters “-pa 0.9 -sa 0.95 -nc 0.30 -cm larger.” The intermediate results from these dereplication processes were used for subsequent analyses [[145]55]. MAG taxonomic annotation, abundance calculation, and annotation Taxonomic annotation was performed using the gtdbtk v2.4.0 [[146]56]. Specifically, the classify_wf function from the GTDB was used for taxonomic annotation of SGBs. The naming of prokaryotes follows the latest standards of the International Code of Nomenclature of Prokaryotes (ICNP) [[147]57, [148]58]. Phylogenetic trees were constructed using PhyloPhlAn based on over 400 universal protein markers from microbial genomes and visualized using the TVBOT website ([149]https://www.chiplot.online/tvbot.html) [[150]59, [151]60]. MAG abundance was quantified for each sequence assembled by SPAdes using the quant_bins parameter of MetaWRAP v0.02 [[152]61]. The Spearman correlation between MAG abundance and completeness was assessed using the R package ggplot2 v3.5.1 [[153]62]. Rarefaction curve analysis was conducted using the “rarefaction_curve.py” auxiliary script (v0.3.0) from the InStrain ([154]https://github.com/MrOlm/inStrain), applied to Bowtie 2-aligned BAM files [[155]63]. MAGs demonstrating a coverage breadth below 50% were excluded based on initial rarefaction results. For each rarefaction level, the analysis involved three sequential steps: First, a scaling threshold was determined by calculating the ratio of rarefaction depth to total sequencing depth. Second, scaled coverage values for individual MAGs were obtained by multiplying their original coverage values by this scaling factor. Finally, MAGs achieving scaled coverage values ≥ 1 were classified as detected [[156]64]. Open reading frames (ORFs) for each high-quality MAG were predicted using the Prodigal function of the bakta v1.9.4, and the generated ORFs were translated into protein sequences using the bakta-provided database [[157]57, [158]65, [159]66]. Functional annotation of the protein sequences was performed using eggNOG-mapper based on the eggNOG database [[160]67]. Carbohydrate-active enzyme (CAZy) annotation was carried out using dbCAN v4.1.4 and HMMER v3.4 against the CAZY database for all predicted proteins [[161]68, [162]69]. Phylogenetic relationships and coevolution of cervids and MAGs Complete circular mitochondrial sequences of the host were reconstructed from the clean metagenome sequences using GetOrganelle [[163]70]. The mitochondrial sequences were subsequently aligned with MAFFT [[164]71, [165]72]. A maximum likelihood tree was then constructed based on mitochondrial sequences using IQ-tree within the PhyloSuite [[166]73, [167]74]. The optimal nucleotide substitution model was determined via ModelFinder, and the phylogenetic tree was inferred accordingly [[168]75]. To evaluate the support for the nodes in the phylogenetic tree, an ultra-fast bootstrap approach was utilized, generating 5000 pseudo-replicate samples. The phylogenetic tree of MAGs was reconstructed using the PhyloPhlAn software, which employs clade-specific maximally informative phylogenetic markers [[169]59]. Two distance-based global testing programs, ParaFit and PACo, along with the event-based program eMPRess, were employed to evaluate the topological consistency between the host phylogenetic tree and the MAG phylogenetic tree [[170]76]. ParaFit assessed the independence of host and parasite evolution using 999 permutations, with a p value of ≤ 0.05 considered significant [[171]77]. PACo examined the dependency between the two phylogenies, providing graphical assessments and goodness-of-fit statistics through Procrustes plots and 1000 randomizations to determine significance [[172]78]. eMPRess applied a duplication-transfer-loss (DTL) model to adjust the paired phylogenetic trees, utilizing maximum parsimony to calculate the cost of each coevolution event and comparing the best adjustment cost for each dataset against the costs of randomly permuted trees [[173]79]. To further explore the genomic metabolic features of SGBs from distinct clades within the genera Alistipes and Faecousia, CAZymes annotation was conducted on the SGBs using the dbCAN [[174]69]. The relative abundance of CAZymes was normalized based on the genome size of SGBs. Subsequent annotation of genes from the genera Alistipes and Faecousia was conducted using the KEGG database. To account for variability across different clades, the average abundance of genes sharing the same KEGG Orthology (KO) number was normalized and processed based on their respective SGBs. The top 5% genes which have higher average abundances in clade 1 compared to clade 2 and clade 2 compared to clade 1 in genus Alistipes and clade 1 compared to clade 2 and clade 3, clade 2 compared to clade 3 and clade 1, and clade 3 compared to clade 1 and clade 3 in genus Faecousia were selected and performed function enrichment using the KEGG pathway, respectively. MAG structural variant analysis Structural variant (SV) events, insertions, deletions, duplications, translocations, inversions, and contractions were identified for each species cluster of SGBs. It was achieved by comparing the reference MAG with the highest dRep score to other query MAGs using the MUM&Co (Additional file 2: Table S4) [[175]80]. SV events occurring within 10 bp of contig ends were excluded to minimize potential false positives. The total number of SV events was subsequently normalized according to the genome size of the MAGs. Genes located within SVs greater than 200 bp were screened and predicted, and the KEGG Orthology (KO) profiles were annotated using eggNOG-mapper [[176]81]. The genes within SVs were designated as the foreground gene set, while all genes from MAGs containing SVs were used as the background gene set for KEGG enrichment analyzing by TBtools [[177]71]. Supplementary Information [178]12915_2025_2269_MOESM1_ESM.docx^ (2.8MB, docx) Additional file 1: Figures S1–S8. Fig. S1 Spearman correlation between relative abundance and assembly completeness of MAGs. Fig. S2 Rarefaction analysis. Fig. S3 Comparison of relative numbers of unknown SGBs among plateau and plain cervid species. Fig. S4 Sequence similarity of annotated CAZy enzymes. Fig. S5 Coevolution between cervid gut microbiota SGBs in genera Faecousia and hosts. Fig. S6 Genomic and metabolic features of different clades in Alistipes and Faecousia. Fig. S8 KEGG enriched metabolism pathways associated with the top 20 deletion SVs in the gut microbiome of plateau cervids. [179]12915_2025_2269_MOESM2_ESM.xlsx^ (5.4MB, xlsx) Additional file 2: Tables S1–S6. Table S1 Sequence data statistics and quality control information. Table S2 Statistics and quality control information of MAGs. Table S3 ParaFit H-S link model between 3896 SGBs and cervid hosts. Table S4 SVs statistics information. Table S5 All detailed information on cervids and samples. Table S6 Cervid host genome information. Acknowledgements