Abstract

Taimen (Hucho taimen) is an ecologically and economically important species that is classified as vulnerable by the IUCN Red List of Threatened Species; however, limited genomic information is available on this species. RNA‐Seq is a useful tool for obtaining genetic information and developing genetic markers for nonmodel species in addition to its application in gene expression profiling. In this study, we performed a comprehensive RNA‐Seq analysis of taimen. We obtained 157 M clean reads (14.7 Gb) and used them to de novo assemble a high‐quality transcriptome with an N50 size of 1,060 bp. In the assembly, 82% of the transcripts were annotated using several databases, and 14,666 of the transcripts contained a full open reading frame. The assembly covered 75% of the transcripts of Atlantic salmon and 57.3% of the protein‐coding genes of rainbow trout. To learn about the genome evolution, we performed a systematic comparative analysis across 11 teleosts including eight salmonids and found 313 unique gene families in taimen. Using Atlantic salmon and rainbow trout transcriptomes as the background, we identified 250 transcripts under positive selection. The pathway enrichment analysis revealed a unique characteristic of taimen: it possesses more immune‐related genes than Atlantic salmon and rainbow trout; moreover, some genes have undergone strong positive selection. We also developed a pipeline for identifying microsatellite marker genotypes in samples and successfully identified 24 polymorphic microsatellite markers for taimen. These data and tools are useful for studying conservation genetics, phylogenetics, evolution among salmonids, and selective breeding for threatened taimen.

Keywords: comparative transcript analysis, Hucho taimen, microsatellite markers, positive selection, RNA‐Seq

1. INTRODUCTION

The genus Hucho belongs to the Salmonidae family and includes four species that are endangered.
Taimen (Hucho taimen) is a strictly freshwater member of Hucho that is distributed widely from the Danube drainage basin to the Pacific Ocean in terms of longitude and from the Arctic Ocean to Mongolia and northern China in terms of latitude. In recent decades, overexploitation, hydropower dams, and pollution have dramatically diminished the habitat of this species (Gilroy et al., 2010; Zolotukhin, 2013), and its population has decreased by approximately 37.3% worldwide (Hogan & Jensen, 2013). Taimen is also a rare endemic species in China. In the 1950s, this fish had a wide distribution from Northwest China to Northeast China, and the harvest was abundant. At present, however, only a small population remains in Kanas Lake in Xinjiang Province and the Amur River in Heilongjiang Province, and the estimated abundance in China has decreased by 80% over the past three generations (Hogan & Jensen, 2013). Therefore, taimen has been listed as an endangered species in China since 1998 (Yue & Chen, 1998) and has been included on the China Species Red List since 2004 (Wang & Xie, 2004). The global endangered status of taimen was re‐evaluated in 2013, and it was then classified as vulnerable on the IUCN Red List of Threatened Species (Hogan & Jensen, 2013). Taimen also has great potential to become an excellent aquaculture species because it is the largest salmonid and can grow to 55–66 kg in body weight and 160–170 cm in total length (Andreji & Stráňai, 2013). Moreover, taimen is one of the fastest‐growing species among salmonids, and its body weight increases at an approximately linear rate until 10 years of age (Andreji & Stráňai, 2013). For conservation and exploitation of this species, artificial propagation and fry rearing have been successfully developed, and the species has been cultured widely in China (Li, Wang, Liu, Yin, & Lu, 2016; Wang et al., 2016).
During aquaculture activity, taimen presents higher bacterial disease resistance than rainbow trout under the same conditions (Li et al., 2016; Wang et al., 2016). Although taimen is considered an ecologically and economically valuable species, few genetic resources are available for studies of its conservation genetics, phylogenetics, and selective breeding. Only approximately 450 taimen nucleotide records are available in GenBank (searched on 9 May 2017), and most are mitochondrial genes. Tong, Kuang, Yin, Liang, and Sun (2006) and Wang, Kuang, Tong, and Yin (2011) developed microsatellite markers, and Wang, Zhang, Yang, and Song (2011) and Balakirev, Romanov, Mikheev, and Ayala (2016) sequenced the complete mitochondrial genome. RNA‐Seq is an excellent technology for studying phylogenetics and evolution and developing SSR and SNP markers for nonmodel species (Cahais et al., 2012; Ekblom & Galindo, 2010; Qian, Ba, Zhuang, & Zhong, 2014). In this study, we used RNA‐Seq to construct the transcriptome of taimen and develop SSR and SNP markers, and we also performed a systematic cross‐species comparative analysis for taimen and other salmonids.

2. MATERIALS AND METHODS

2.1. Sample collection and RNA isolation

The RNA‐Seq analysis was performed on first‐generation offspring (Figure 1) artificially reproduced from wild stock, and these fish were collected from the Bohai Station of the Heilongjiang River Fisheries Research Institute (HRFRI) of the Chinese Academy of Fishery Sciences. To obtain the whole reference transcriptome, 10 individuals were collected, including two individuals at age 4+, six individuals at age 2+, and two individuals at age 1+. Twelve organs from each individual, including the skin, muscle, eyes, brain, spleen, kidney, intestines, stomach, liver, testes, ovaries, and gills, were dissected for RNA extraction, and all tissue samples were stored in RNALater solution (Qiagen, CA, USA) for transport.
Total RNA was extracted using an RNeasy kit (Qiagen, CA, USA) and treated with DNaseI (Invitrogen, CA, USA) to remove genomic DNA. After a quality examination using a Bioanalyzer 2100 system (Agilent, CA, USA) and quantitation using a NanoDrop 8000 system (Thermo Fisher Scientific Inc., CA, USA), equal quantities of total RNA with RIN (RNA integrity number) ≥7.0 from each organ of each individual were mixed to construct the RNA‐Seq library. All experiments involving the handling and treatment of fish in this study were approved by the Animal Care and Use Committee of the HRFRI of the Chinese Academy of Fishery Sciences.

Figure 1. Young fish (age 3+) of Hucho taimen. Photographed by Wei Xu, with permission.

2.2. cDNA library construction and sequencing

Two normalized cDNA libraries were constructed using an RNA‐Seq assay for transcriptome sequencing. One paired‐end cDNA library was generated from the pooled total RNA of 12 tissues from one individual at age 4+, and another library was generated from the pooled total RNA of 12 tissues from the other nine individuals. The paired‐end cDNA libraries were prepared using a TruSeq RNA prep Kit (Illumina, CA, USA) according to the Illumina protocols. The insert size was approximately 200 bp, and the two libraries were normalized with a DSN normalization kit (Illumina, CA, USA). The normalized libraries were sequenced on an Illumina HiSeq2000 system in the 100‐bp paired‐end mode. The cDNA library construction and sequencing were performed by a commercial service company (Genergy Inc, Shanghai, China).

2.3. De novo assembly

The transcriptome sequences were assembled using the Trinity package (Grabherr et al., 2011; Haas et al., 2013). Before assembly, low‐quality reads were filtered from the raw reads using Trimmomatic (Bolger, Lohse, & Usadel, 2014) with the parameters LEADING:30 TRAILING:30 SLIDINGWINDOW:4:20 MINLEN:50.
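To make the effect of these four filtering parameters concrete, the following Python sketch applies the same kind of steps to a single read, in the order listed. It is a simplified illustration only and not Trimmomatic's own implementation: the function name `trim_read` and its arguments are hypothetical, quality scores are assumed to be already decoded to integer Phred values, and the sliding‐window step simply cuts the read at the start of the first window whose mean quality falls below the threshold.

```python
def trim_read(seq, quals, leading=30, trailing=30,
              window=4, window_avg=20, minlen=50):
    """Simplified quality trimming of one read (hypothetical helper).

    seq   -- base string
    quals -- list of integer Phred scores, same length as seq
    Returns (seq, quals) after trimming, or None if the read is dropped.
    """
    # LEADING: remove bases from the start while quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    seq, quals = seq[start:], quals[start:]

    # TRAILING: remove bases from the end while quality is below the threshold.
    end = len(quals)
    while end > 0 and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # SLIDINGWINDOW: scan from the 5' end and cut at the first window
    # whose mean quality drops below window_avg.
    cut = len(quals)
    for i in range(0, len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < window_avg:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: drop reads that are too short after trimming.
    if len(seq) < minlen:
        return None
    return seq, quals


# Example: two low-quality bases at each end are removed, 56 bases remain.
example_quals = [10, 10] + [35] * 56 + [10, 10]
example_seq = "A" * len(example_quals)
trimmed = trim_read(example_seq, example_quals)
```

In a paired‐end workflow, a read pair would normally be kept as a pair only if both mates survive this filter; that bookkeeping is omitted here.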
The clean reads from the two pooled libraries were merged and in silico normalized using the Trinity package with default parameters to reduce the running time and memory consumption. A k‐mer size of 25 and a minimum k‐mer depth of two were used for assembly with the Trinity package. The contigs resulting from Trinity were further fed to the TGI Clustering Tool (version 2.1) (Pertea et al., 2003) to process alternative splicing and redundant sequences.

2.4. Assembly assessment

A gold standard is not available to assess the quality of a de novo assembled transcriptome for a nonmodel species. We adopted general methods suggested by Trinity (Haas et al., 2013) and by Fu and He (2012) to assess the assembly quality. The assessment method consisted of five criteria: read composition of the assembly, number of full‐length protein‐coding transcripts, quality of the assembled transcript sequences, completeness, and gene coverage. First, Bowtie2 (Langmead & Salzberg, 2012) was used to map the clean reads to the assembly, and a Perl script in the Trinity package was then used to summarize the properly mapped reads. Generally, a vast number of reads were mapped back to the assembly, and among them, approximately 70%–80% of the mapped reads were proper pairs. Second, the full‐length protein‐coding transcripts were characterized using the Blastx or Blastp tool, and the Swiss‐Prot protein database and several species’ (including zebrafish, threespine stickleback, Atlantic salmon, rainbow trout, medaka, fugu, and green spotted puffer) whole‐genome‐predicted protein sequences (Table S1) were used as references to search the