Abstract

   Taimen (Hucho taimen) is an important ecological and economic species
   that is classified as vulnerable by the IUCN Red List of Threatened
   Species; however, limited genomic information is available on this
   species. RNA‐Seq is a useful tool for obtaining genetic information and
   developing genetic markers for nonmodel species in addition to its
   application in gene expression profiling. In this study, we performed a
   comprehensive RNA‐Seq analysis of taimen. We obtained 157 M clean reads
   (14.7 Gb) and used them to de novo assemble a high‐quality
   transcriptome with an N50 size of 1,060 bp. In the assembly, 82% of the
   transcripts were annotated using several databases, and 14,666 of the
   transcripts contained a full open reading frame. The assembly covered
   75% of the transcripts of Atlantic salmon and 57.3% of the
   protein‐coding genes of rainbow trout. To learn about the genome
   evolution, we performed a systematic comparative analysis across 11
   teleosts including eight salmonids and found 313 unique gene families
   in taimen. Using Atlantic salmon and rainbow trout transcriptomes as
   the background, we identified 250 positively selected transcripts. The
   pathway enrichment analysis revealed a unique characteristic of taimen:
   it possesses more immune‐related genes than Atlantic salmon and rainbow
   trout; moreover, some genes have undergone strong positive selection.
   We also developed a pipeline for identifying microsatellite marker
   genotypes in samples and successfully identified 24 polymorphic
   microsatellite markers for taimen. These data and tools are useful for
   studying conservation genetics, phylogenetics, evolution among
   salmonids, and selective breeding for threatened taimen.

   Keywords: comparative transcript analysis, Hucho taimen, microsatellite
   markers, positive selection, RNA‐Seq

1. INTRODUCTION

   The genus Hucho belongs to the Salmonidae family and includes four
   species that are endangered. Taimen (Hucho taimen) is a strictly
   freshwater member of Hucho that is distributed widely from the Danube
   drainage basin to the Pacific Ocean in terms of longitude and from the
   Arctic Ocean to Mongolia and northern China in terms of latitude. In
   recent decades, overexploitation, hydropower dams, and pollution have
   dramatically diminished the habitat of this species (Gilroy et al.,
   2010; Zolotukhin, 2013), and its population has decreased by
   approximately 37.3% worldwide (Hogan & Jensen, 2013). Taimen is
   also a rare endemic species in China. In the 1950s, this fish had a
   wide distribution from Northwest China to Northeast China, and the
   harvest was abundant. At present, however, only a small population
   remains in Kanas Lake in Xinjiang Province and the Amur River in
   Heilongjiang Province, and the estimated abundance in China has
   decreased by 80% over the past three generations (Hogan & Jensen,
   2013). Therefore, taimen has been listed as an endangered species
   in China since 1998 (Yue & Chen, 1998) and has been included on the
   China Species Red List since 2004 (Wang & Xie, 2004). The global
   endangered status of taimen was re‐evaluated in 2013, and it was then
   classified as vulnerable on the IUCN Red List of Threatened Species
   (Hogan & Jensen, 2013).

   Taimen also has great potential to become an excellent aquaculture
   species because it is the largest salmonid and can grow to 55–66 kg in
   body weight and 160–170 cm in total length (Andreji & Stráňai,
   2013). Moreover, taimen is one of the fastest‐growing species among
   salmonids (Andreji & Stráňai, 2013), and its body weight increases
   at an approximately linear rate until 10 years of age (Andreji &
   Stráňai, 2013). For conservation and exploitation of this species,
   artificial propagation and fry rearing have been successfully
   developed, and the species has been cultured widely in China (Li, Wang,
   Liu, Yin, & Lu, 2016; Wang et al., 2016). In aquaculture, taimen
   shows higher resistance to bacterial disease than rainbow trout
   under the same conditions (Li et al., 2016; Wang et al., 2016).

   Although taimen is considered an ecologically and economically valuable
   species, few genetic resources are available for studies of its
   conservation genetics, phylogenetics, and selective breeding. Only
   approximately 450 taimen nucleotide records are available in GenBank
   (searched on 9 May 2017), and most are mitochondrial genes. Tong,
   Kuang, Yin, Liang, and Sun (2006) and Wang, Kuang, Tong, and Yin
   (2011) developed microsatellite markers, and Wang, Zhang, Yang, and
   Song (2011) and Balakirev, Romanov, Mikheev, and Ayala (2016)
   sequenced the complete mitochondrial genome. RNA‐Seq is an excellent
   technology for studying phylogenetics and evolution and developing SSR
   and SNP markers for nonmodel species (Cahais et al., 2012; Ekblom &
   Galindo, 2010; Qian, Ba, Zhuang, & Zhong, 2014). In this study,
   we used RNA‐Seq to construct the transcriptome of taimen and develop
   SSR and SNP markers, and we also performed a systematic cross‐species
   comparative analysis for taimen and other salmonids.

2. MATERIALS AND METHODS

2.1. Sample collection and RNA isolation

   The RNA‐Seq analysis was performed on first‐generation offspring
   (Figure 1) artificially reproduced from wild stock, and these fish
   were collected from the Bohai Station of the Heilongjiang River
   Fisheries Research Institute (HRFRI) of the Chinese Academy of Fishery
   Sciences. To obtain the whole reference transcriptome, 10 individuals
   were collected, including two individuals at age 4+, six individuals at
   age 2+, and two individuals at age 1+. Twelve organs from each
   individual, including the skin, muscle, eyes, brain, spleen, kidney,
   intestines, stomach, liver, testes, ovaries, and gills, were dissected
   for RNA extraction, and all tissue samples were stored in RNALater
   solution (Qiagen, CA, USA) for transport. Total RNA was extracted using
   an RNeasy kit (Qiagen, CA, USA) and treated with DNaseI (Invitrogen,
   CA, USA) to remove genomic DNA. After a quality examination using a
   Bioanalyzer 2100 system (Agilent, CA, USA) and quantitation using a
   NanoDrop 8000 system (Thermo Fisher Scientific Inc., CA, USA), equal
   quantities of total RNA with RIN (RNA integrity number) ≥7.0 from each
   organ of each individual were mixed to construct the RNA‐Seq library.
   All experiments involving the handling and treatment of fish in this
   study were approved by the Animal Care and Use Committee of the HRFRI
   of the Chinese Academy of Fishery Sciences.

Figure 1.

   Young fish (age 3+) of Hucho taimen. Photographed by Wei Xu, with
   permission.

2.2. cDNA library construction and sequencing

   Two normalized cDNA libraries were constructed using an RNA‐Seq assay
   for transcriptome sequencing. One paired‐end cDNA library was generated
   from the pooled total RNA of 12 tissues from one individual at age 4+,
   and another library was generated from the pooled total RNA of 12
   tissues from the other nine individuals. The paired‐end cDNA libraries
   were prepared using a TruSeq RNA prep Kit (Illumina, CA, USA) according
   to the Illumina protocols. The insert size was approximately 200 bp,
   and the two libraries were normalized with a DSN normalization kit
   (Illumina, CA, USA). The normalized libraries were sequenced on an
   Illumina HiSeq2000 system in the 100‐bp paired‐end mode. The cDNA library
   construction and sequencing were performed by a commercial service
   company (Genergy Inc, Shanghai, China).

2.3. De novo assembly

   The transcriptome sequences were assembled using the Trinity package
   (Grabherr et al., 2011; Haas et al., 2013). Before assembly,
   low‐quality reads were filtered from the raw reads using Trimmomatic
   (Bolger, Lohse, & Usadel, 2014) with the parameters LEADING:30
   TRAILING:30 SLIDINGWINDOW:4:20 MINLEN:50. The clean reads from the two
   pooled libraries were merged and in silico normalized using the Trinity
   package with default parameters to reduce the running time and memory
   consumption. A k‐mer size of 25 and a minimum k‐mer depth of two were
   used for assembly with the Trinity package. The contigs resulting from
   Trinity were then passed to the TGI Clustering Tool (version 2.1)
   (Pertea et al., 2003) to cluster alternatively spliced and redundant
   sequences.

2.4. Assembly assessment

   A gold standard is not available to assess the quality of a de novo
   assembled transcriptome for a nonmodel species. We therefore adopted
   the general methods suggested by Haas et al. (2013) for Trinity and by
   Fu and He (2012) to assess the assembly quality. The assessment method
   consisted of five criteria: read composition of the assembly, number of
   full‐length protein‐coding transcripts, quality of the assembled
   transcript sequences, completeness, and gene coverage. First, Bowtie2
   (Langmead & Salzberg, 2012) was used to map the clean reads to the
   assembly, and a Perl script in the Trinity package was then used to
   summarize the properly mapped reads. Generally, a vast number of reads
   were mapped back to the assembly, and among them, approximately 70%–80%
   of the mapped reads were proper pairs. Second, the full‐length
   protein‐coding transcripts were characterized using the Blastx or
   Blastp tool, and the Swiss‐Prot protein database and the
   whole‐genome‐predicted protein sequences of several species
   (zebrafish, threespine stickleback, Atlantic salmon, rainbow trout,
   medaka, fugu, and green spotted puffer; Table S1) were used as
   references to search the