ABSTRACT Chinese serow (Capricornis milneedwardsii) is mainly distributed in the south of Yellow River in China, which has been listed as vulnerable by the International Union for Conservation of Nature (IUCN). However, the reference genome of serow has not been reported and its taxonomic status is still unclear. Here, we first constructed a high‐quality chromosome‐level reference genome of C. milneedwardsii using PacBio long HiFi reads combined with Hi‐C technology. The assembled genome was ~2.83 Gb in size, with a contig N50 of 100.96 Mb and scaffold N50 of 112.75 Mb, which were anchored onto 24 chromosomes. Furthermore, we found that the Chinese serow was more closely related to muskox, which diverged from ~4.85 million years ago (Mya). Compared to the karyotype of goat (2n = 60), we found the Chinese serow (2n = 48) experienced six chromosome fusions, which resulted in the formation of six central centromere chromosomes. We also identified two positively selected genes (MYH6 and DCSTAMP) specific to Chinese serow, which were involved in ‘viral myocarditis’ and ‘Cardiac muscle contraction’. Interestingly, compared to other Caprinae animals, the MYH6 protein of Chinese serow occurred two mutations (E1520S and G1521S), which might be related to hypoxia tolerance. The high‐quality reference genome of C. milneedwardsii provides a valuable information for protection of serows and insights into its evolution. Keywords: Chinese serow, chromosomal evolution, Hi‐C, HiFi, hypoxia tolerance __________________________________________________________________ We constructed a high‐quality chromosome‐level reference genome of C. milneedwardsii for the first time. The Chinese serow was more closely related to muskox, and the karyotype of serow (2n = 48) occurred six chromosome fusions. Interestingly, compared to other Caprinae species, the MYH6 protein of Chinese serow occurred two mutations (E1520S and G1521S) which were conserved in Cetaceans. graphic file with name ECE3-14-e70400-g004.jpg 1. Introduction The serow (Capricornis spp.) is a typical bovid herbivore in the tropical and subtropical regions of southeast Asia, belonging to the Caprinae subfamily (Meng et al. [44]2009), which has been listed as vulnerable by the IUCN (Chen et al. [45]2021). The Capricornis genus is comprised of six species: Chinese serow (C. milneedwardsii), Sumatran serow (C. sumatraensis), Formosan serow (C. swinhoei), Himalayan serow (C. thar), Japanese serow (C. crispus) and Red serow (C. rubidus) according to the Mammal Species of the World (3rd edition) (Wilson and Reeder [46]2005). Chinese serow, also known as Tenma, is mainly distributed in the south of Yellow River in China (Figure [47]1), which can be divided into five subspecies according to the physiological characteristics and geographical location (Wang [48]1990). FIGURE 1. FIGURE 1 [49]Open in a new tab Chinese serow in Qinling Mountains (Photographer: Naxun Zhao). Based on the morphology, characteristic of chromosomes or analysis of mitochondrial cytochrome b genes (Cytb) sequences, serow (2n = 48) most closely related to goral (2n = 50) (Jantarat et al. [50]2017). Compared to other Caprinae animals, serow was more closely related to goral based on the mitochondrial genome sequence (Dou, Zhang, and Feng [51]2016; Song et al. [52]2023). However, the mitochondrial phylogeny might be mixed with different species (Song et al. [53]2023). Therefore, the phylogenetic analysis of the whole‐genomic level was required to classify the taxonomic status of serow. In this study, we constructed a chromosome‐level reference genome of C. milneedwardsii using PacBio long HiFi reads combined with Hi‐C technology. Furthermore, the taxonomic status and chromosomal evolution of serow were analyzed. What's more, positive selection gene and rapid evolution gene were identified. These findings will be helpful to understand the evolution of Caprinae species. 2. Materials and Methods 2.1. Sample Collection and Ethics Statement The liver sample from a dead adult female Chinese serow collected in December 2022 at Louguantai, Zhouzhi, Xi'an City, Shaanxi Province, China was used for genome sequencing. Sample collection and the experimental procedures were carried out according to the guidelines of the China Council on Animal Care and approved by the Experimental Animal Management Committee (EAMC) of Northwest A&F University (Approval ID: XN2023‐0619). 2.2. Genome Sequencing and Assembly Obtaining HiFi reads using the PacBio Sequel II platform. First, the software HiFiAdapterFilt (v2.0.0) (Sim et al. [54]2022) was used with default parameters 44 bp and 97% match to remove the remaining adapter sequences from PacBio HiFi reads. Only 7 reads contaminated with adapters were removed. Second, the remaining HiFi reads were assembled de novo to the contig level using hifiasm (v0.19.5) with default parameters (Cheng et al. [55]2021). The clean Hi‐C reads were aligned to the contigs using Chromap (v0.2.4) with the additional parameter – remove‐pcr‐duplicates (Zhang et al. [56]2021). Third, the alignments sorted by name were used as input for YaHs (v1.2a.1) to perform genome scaffolding (Zhou, Mccarthy, and Durbin [57]2023). Finally, visualization and manual inspection of the Hi‐C interaction matrix were conducted in Juicebox (Durand et al. [58]2016). The TGS‐GapCloser (v1.2.1) (Xu et al. [59]2020), Pilon (v1.23) (Walker et al. [60]2014), BUSCO (v5.4.7) (Simao et al. [61]2015) and the mammalia_odb10 database were used to assess genome completeness. The distribution of GC content in the genome was determined using bedtools (v2.31.0) software with a window size of 1 Mb (Quinlan [62]2014). The genome resequencing was conducted using the Illumina NovaSeq6000 platform. The software GCE (v1.0.2) ([63]https://arxiv.org/abs/1308.2012v2) was used to estimate the genome size and heterozygosity. The estimation was based on 40× Illumina WGS reads using the k‐mer method with a k‐mer size of 17. 2.3. Repeats and Gene Annotation RepeatMasker (v4.1.5) ([64]https://www.repeatmasker.org) and repeat database (including Dfam v.3.7 and RepBase library October 26, 2018) were used to identify repeats in the serow genome. The software Liftoff (v1.6.3) (Shumate and Salzberg [65]2021) was utilized to map the annotation of the sheep genome (ARS‐UI_Ramb_v2.0_GCF_016772045.1, annotation‐source NCBI Ovis aries Annotation Release 104) onto the serow genome. The parameters were polish, copies, and flank 0.2. The input comprised the genome sequences of sheep and serow, along with the GFF‐formatted annotation of the sheep genome. Minimap2 (v2.26‐r1175) (Li [66]2018) was used to align the entire gene sequence, including exons and introns, to the target, resulting in the gene annotation file of the serow, from which protein‐coding genes were filtered. The Circos (v0.69–9) software (Krzywinski et al. [67]2009) was used to visualize the details of the serow genome. The outermost track represents chromosome lengths, followed by gene density, repetitive sequences, and GC content, each with a window size of 1 Mb. Centromere: 50,000 bp was defined as the longest merging distance between the two sections, and the location of centromere on each chromosome was found with the interval span greater than 200,000 bp as the filter condition. Telomeres: Telomere positions on each chromosome were determined using a filtering criterion of telomeric repeat sequences extending at least 500 bp from the chromosome ends. Additionally, we assessed the presence of telomeric repeat sequence (TTAGGG) within the terminal 20 kb of the autosomal telomeric sequences. Using the AGAT (v1.2.0) software to process annotation results, the agat_sp_keep_longest_isoform.pl. script was executed to extract the annotation file of the longest transcript sequences from the annotation file of the serow genome. The agat_sp_extract_sequences.pl. script was executed to obtain the protein sequences of the longest transcripts. CDSs (average CDSs length per gene) and exon average length were calculated from the longest transcription sequence annotation file of serow. According to the longest transcription sequence annotation file of serow, the intron sequence position is obtained by taking the difference set between gene sequence position and exon sequence, and then its average length is calculated. 2.4. Mitochondrial Genome Assembly The MitoHiFi (v2.2) software (Uliano‐Silva et al. [68]2023) was used to assemble the mitochondrial genome from contigs using the serow mitochondrial genome [69]NC_023457.1 as a reference. Mitochondrial sequences were retrieved from contigs using BLASTn, and sequences with alignment length > 80% of mitochondrial length and identical matches > 90% were identified as mitochondrial sequences. All 64 sequences identified were small contig (< 60 kb) that did not interact with other sequences. The mitochondrial genome was annotated with MitoFinder (v1.4) (Allio et al. [70]2020) and visualized by OGDRAW (Greiner, Lehwark, and Bock [71]2019). 2.5. Phylogenetic Analysis and Divergence Time Estimation The phylogenetic tree was constructed including 18 mitochondrial genomes: C. milneedwardsii (OR_551470), Ovis aries ([72]NC_001941.1), Ovis ammon ([73]KT781689.1), Ovis nivicola ([74]NC_039431.1), Capra sibirica ([75]NC_020626.1), Capra hircus ([76]MZ073671.1), Capra aegagrus ([77]NC_028161.1), Oreamnos americanus ([78]NC_020630.1), Budorcas taxicolor ([79]NC_013069.1), Ovibos moschatus ([80]NC_020631.1), Pantholops hodgsonii ([81]NC_007441.1), Bos taurus ([82]NC_006853.1), Hemitragus hylocrius (the mitochondrial genome was assembled based on the Illumina WGS sequencing data SRR11430235), Pseudois nayaur ([83]NC_020632.1), Ammotragus lervia ([84]NC_009510.1), Ovis orientalis ([85]NC_026063.1), Naemorhedus goral ([86]NC_021381.1) and Rupicapra rupicapra ([87]NC_020633.1). Mitochondrial genome was applied to MEGA (v11) (Tamura, Stecher, and Kumar [88]2021). The ClustalW was used for sequence alignment, and the Maximum Likelihood method was selected with the bootstrap value set to 100 to construct the mitochondrial phylogenetic tree. The genomes of Pantholops hodgsonii (GCF_000400835.1), Oreamnos americanus (GCA_009758055.1), Ovibos moschatus (GCA_021462335.1), Ovis ammon (GCA_028583565.1), Capra aegagrus (GCA_000978405.1), Capra hircus (GCA_015443085.1), C. milneedwardsii (serow.2 genome, GCA_032405125.2), Ovis aries (GCF_016772045.1), Capra sibirica (GCA_003182615.2), Ovis nivicola (GCA_903231385.1), Budorcas taxicolor (GCF_023091745.1), Hemitragus hylocrius (GCA_004026825.1), Pseudois nayaur (GCA_003182575.1), Ammotragus lervia (GCA_002201775.1), Ovis orientalis (GCA_014523465.1), and Rupicapra rupicapra (GCA_963981305.1) were separately aligned to the genome of Bos Taurus (GCF_002263795.1) using the LAST (v1409) (Kielbasa et al. [89]2011). The comparison results were then combined with Multiz (v11.2) (Blanchette et al. [90]2004) and filtered with Gblocks (v0.91b), retaining 4.31 million quadruple degenerate sites. A species tree was then constructed using RAxML‐NG (v1.2.0) (Kozlov et al. [91]2019) under the GTR + G model, with bootstrap replicates set to 100. The BASEML subroutine in the PAML (v4.10.7) package (Yang [92]2007) was used to calculate the neutral replacement rate (0.197217 + −0.000417). The calculated neutral replacement rate was used to calculate the divergence time for MCMCTREE, and the time calibration was performed on both nodes using the previously studied calibration time. The divergence between cattle and other animals is ‘> 18.3 < 28.5’ Mya, the divergence between goats and sheep is ‘> 3.9 < 8.1’ Mya (Yang [93]2007). 2.6. Chromosomal Evolution The LAST (version 1409) (Kielbasa et al. [94]2011) was used to align the goat genome (GCA_015443085.1) to the takin (GCF_023091745.1) and the assembled serow genome to the goat genome, respectively. Using MCScan (v1.3.6) (Tang et al. [95]2024), a fragment length is more than 30,000 bp for filter, linear relation between visualization on two groups. 2.7. Heterozygosity Estimation The genome sequencing data was compared with the assembled genome using BWA‐MEM (v0.7.17). The SAMtools (v1.17) software (Li et al. [96]2009) was used to convert SAM files to BAM files. GATK (v4.1.8.1) software (Mckenna et al. [97]2010) was used to invoke and screen single nucleotide polymorphisms (SNPs). SNPs with mutation type 0/1 were screened out, and then the number of SNPs was compared with the length of the reference genome to obtain the heterozygosity of the genome. Heterozygosity of other species was obtained from previous research (Liu et al. [98]2021). 2.8. Positive Selection Gene, Rapid Evolution Gene Identification The CODEML subroutine in PAML was used to identify positive selection genes (PSGs) and rapid evolution genes (REGs) in the serow genome. Serow branches in phylogenetic trees were designated as foreground branches for independent positive selection analysis. The PSG was determined by the branch‐site model, where the alternative model allowed a positive selection of the site on the foreground branch. REG was tested with the branch model, where the alternative model allowed for different rates for different branches. Models allowing positive selection (dN/dS > 1) and empty models allowing neutral evolution and purification selection (dN/dS < 1) were compared using the likelihood ratio test (LRT). KOBAS (v3.0) software (Bu et al. [99]2021) and DAVID software were used for KEGG and GO pathway enrichment analysis of the identified positive selection genes and rapid evolution genes (p < 0.05). 2.9. Prediction of the Three‐Dimensional Structure of MYH6 Protein Download the amino acid sequence of the bovine MYH6 gene from NCBI, the software SMART (Simple Modular Architecture Research Tool) ([100]https://smart.embl‐heidelberg.de/) was used to predict the functional domain in the protein sequence. The nucleotide sequences of MYH6 gene were converted into amino acid sequences using the CODEML subroutine in PAML (v4.10.7) (Yang [101]2007). The converted amino acid sequences were subjected to multiple sequence alignment using the Clustal Omega (version 1.2.4) tool available in the EMBOSS software ([102]https://www.ebi.ac.uk/Tools/emboss/). Download the amino acid sequence of the human MYH6 gene from NCBI. Visualize the retrieved three‐dimensional structure from the database using PyMOL (version2.5.7) ([103]https://pymol.org/). Mutate the amino acids at positions 1520 and 1521 from E and G to S and S, respectively. Visualize the hydrogen bonds formed between the surrounding atoms and the two amino acids before and after the mutation. Utilize the software ProtScale ([104]https://web.expasy.org/protscale/) to analyze the hydrophobicity of the two positions before and after the mutation and their adjacent amino acids. 3. Results 3.1. Chromosome‐Level de novo Genome Assembly of Chinese serow To estimate the genome size of C. milneedwardsii, 120.10 Gb clean reads were used for k‐mer analysis. The genome size was estimated at 2.88 Gb with 0.43% heterozygosity (k = 17, Figure [105]S1). A total of 107.2 Gb HiFi reads (~37×) were generated by the PacBio Sequel II platform (Table [106]S1). We assembled a ~2.83 Gb reference genome with 240 contigs and a contig N50 of 100.96 Mb using Hifiasm v0.19.5 (Table [107]1). TABLE 1. Assembly statistics of Chinese serow genome. Assembly Number/length Total assembly length (bp) 2,828,266,837 Gap number 6 Number of contigs 165 N50 contig length (bp) 100,955,055 Contig L50 11 Number of scaffolds 158 N50 scaffold length (bp) 112,748,084 Scaffold L50 9 Chromosome number 23 + X GC content (%) 43.07 Protein‐coding genes 20,783 [108]Open in a new tab To obtain a chromosome‐level assembly, 352.60 Gb Hi‐C reads (~120×) spanned to 252 scaffolds with a N50 of 112.75 Mb (Table [109]S2), which anchored onto 24 chromosomes (Figure [110]2A). The gap in Chr 7 was filled by TGS‐GapCloser v1.2.1. After deleting the mitochondrial genome sequences and polishing with pilon v1.23, the completeness of final assembled genome reached 95.90% with six gaps using cetartiodactyla_odb10 database in BUSCO v5.4.7 (Figure [111]2B, Table [112]S3). Compared to the references of other species, Chinese