Abstract Background Sturgeons (Acipenseriformes) are polyploid chondrostean fish that constitute an important model species for studying development and evolution in vertebrates. To better understand the mechanisms of reproduction regulation in sturgeon, this study combined PacBio isoform sequencing (Iso-Seq) with Illumina short-read RNA-seq methods to discover full-length genes involved in early gametogenesis of the Amur sturgeon, Acipenser schrenckii. Results A total of 50.04 G subread bases were generated from two SMRT cells, and herein 164,618 nonredundant full-length transcripts (unigenes) were produced with an average length of 2782 bp from gonad tissues (three testes and four ovaries) from seven 3-year-old A. schrenckii individuals. The number of ovary-specific expressed unigenes was greater than those of testis (19,716 vs. 3028), and completely different KEGG pathways were significantly enriched between the ovary-biased and testis-biased DEUs. Importantly, 60 early gametogenesis-related genes (involving 755 unigenes) were successfully identified, and exactly 50% (30/60) genes of those showed significantly differential expression in testes and ovaries. Among these, the Amh and Gsdf with testis-biased expression, and the Foxl2 and Cyp19a with ovary-biased expression strongly suggested the important regulatory roles in spermatogenesis and oogenesis of A. schrenckii, respectively. We also found the four novel Sox9 transcript variants, which increase the numbers of regulatory genes and imply function complexity in early gametogenesis. Finally, a total of 236,672 AS events (involving 36,522 unigenes) were detected, and 10,556 putative long noncoding RNAs (lncRNAs) and 4339 predicted transcript factors (TFs) were also respectively identified, which were all significantly associated with the early gametogenesis of A. schrenckii. Conclusions Overall, our results provide new genetic resources of full-length transcription data and information as a genomic-level reference for sturgeon. Crucially, we explored the comprehensive genetic characteristics that differ between the testes and ovaries of A. schrenckii in the early gametogenesis stage, which could provide candidate genes and theoretical basis for further the mechanisms of reproduction regulation of sturgeon. Keywords: Amur sturgeon, Acipenser schrenckii; Isoform sequencing; Gonad transcriptome; Early gametogenesis Background Sturgeons (Acipenseriformes) are polyploid chondrostean fish that originated during the Devonian period and have over 200 million years of history; thus, they constitute an important model species for studying development and evolution in vertebrates [[39]1, [40]2]. As the source of caviar food, the sturgeon has high economic value, which has resulted in intense fishing pressure on wild stocks, so leading them to be listed among the more endangered group of species. In the updated February 2019 press release, the International Union for Conversation of Nature and Natural Resource (IUCN) Red List identified sturgeon as one of the most endangered animal groups; some 85% of sturgeon species are on the verge of extinction ([41]http://www.iucnredlist.org/search). Efforts have been made worldwide to develop a sturgeon aquaculture industry for artificial reproduction since the early 1960s. However, to date, many new problems have emerged that require attention. For instance, The developmental asynchronism of the double sex gametes i.e. sperm and ovum and reproduction interval of 2–7 years markedly reduce reproduction efficiency in sturgeon artificial propagation practices, causing more aquaculture cost. Therefore, studies of gametogenesis mastering the mechanism of reproduction regulation in sturgeon are greatly useful for reproduction evaluation and aquaculture management. Gametogenesis includes oogenesis and spermatogenesis, which mainly are comprised of germ cell growth and proliferation, primary spermatocytes and primary oocytes formation, until matured gametes of double sexes production. In fish, the onset of meiosis of germ cell is one of the most important steps in gametogenesis, which involved in the sex steroid hormones stimulating and the endocrine regulation [[42]3, [43]4]. The previous studies reported various abnormalities in sturgeon gametogenesis and occurs of aberrant intersex gonads in cultured sturgeons [[44]5, [45]6], which significantly impact on the viability of progeny and decrease the efficiency of sturgeon stock enhancement. In this context, the first and core step in investigating the key genes involved in early gametogenesis in sturgeon is to acquire full-length nucleotide sequences information of the genes. Whole-genome sequencing and assembly combined with transcriptome data would be an efficient way to systematically characterize gene models. However, the sturgeon genome is large and includes numerous mini-chromosomes and substantial polyploidy caused by genome duplication [[46]7, [47]8], which possess significant difficulties to entire-genome sequencing. Recently, transcriptome-scale sex-related gene characterization was conducted in different sturgeon species with next-generation high-throughput sequencing technologies, including Adriatic sturgeon (A. naccarii) [[48]9], Chinese sturgeon (A. sinensis) [[49]10], Amur sturgeon (A. schrenckii) [[50]11, [51]12] and Russian sturgeon (A. gueldenstaedtii) [[52]13]. However, the genes acquired by only assemble procedure used in above studies signify incomplete sequences, which extremely restricts the yield of full-length genes. The third-generation long-read sequencing platform can overcome this difficulty. In comparison with short-read sequencing, the methodological advantages of PacBio Isoform sequencing (Iso-Seq) include better completeness in sequencing both the 5′ and 3′ ends of the full-length cDNA molecules and greater accuracy in producing isoform-level transcripts. Recently, PacBio Iso-Seq technology has been successfully used in multiple species, such as Populus [[53]14], Dolly Varden char [[54]15], human cell lines and tissues [[55]16], rabbits [[56]17] and primates [[57]18]. The transcriptomic data produced by PacBio Iso-Seq provide innovative research materials. For example, deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon has opened the door for gaining a deep understanding of genome evolution and human diseases [[58]19], and the full-length transcripts from the Iso-Seq platform have provided new insight into the extreme metabolism of the ruby-throated hummingbird [[59]20]. Meanwhile, the reconstruction and annotation of full-length transcripts also plays a critical role in gene discovery, particularly for species with no reference genome, such as the transcript variants involved in the innate immune system in Litopenaeus vannamei [[60]21], gene families with two and more isoforms in Misgurnus anguillicaudatus [[61]22] and transcript diversity in bioactive compound biosynthesis of Astragalus membranaceus [[62]23]. Full-length transcriptome analyses may also drive new innovative progress for understanding the mechanisms of reproduction regulation in sturgeon. In this study, we adopted joint PacBio Iso-Seq and short-read RNA-seq to generate a high-confidence full-length transcripts dataset of the gonads of 3-year-old A. schrenckii individuals and further used them to obtain comparative transcriptomic analysis for quantification investigation between testes and ovaries. Subsequently, a functional annotation of these full-length transcripts was systematically conducted with well-curated databases. Alternative splicing (AS) events, long noncoding RNAs (LncRNAs) and transcription factors (TFs) were detected. Most importantly, searching for genes involved in early gametogenesis and association analyses of related factors and early gametogenesis were performed. Herein, we not only provide a valuable resource i.e. a comprehensive full-length transcript set for the genomic-level reference of sturgeon but also systematically characterize early gametogenesis related genes of A. schrenckii to further investigate the functions of molecules during gametogenesis in sturgeon. Materials and methods Samples collection and histological analysis A total of ten 3-year-old healthy Amur sturgeon (five males and five females) were sampled from the Engineering and Technology Center of Sturgeon Breeding and Cultivation of the Chinese Academy of Fishery Sciences (Beijing, China) in this study. Before sampling, the experimental individuals were anesthetized with eugenol in water for 1–3 min according to the AVMA guidelines for use (2013 version). The gonads i.e. ovaries and testes were separately collected and treated. One part of each gonad tissues was preserved in Bouin’s fixation for histological procedures, and another part was immediately immersed into liquid nitrogen until total RNA extraction. The histological analysis of each gonad tissue was performed using hematoxylin and eosin (HE) staining. RNA extraction and quality evaluation Total RNA was extracted from each gonad tissue (testis or ovary) using RNAiso reagent (Takara, Tokyo, Japan). To prevent genomic DNA contamination, RNA sample was treated to digest DNA using RNase-free DNase I during extraction of total RNA. RNA purity and concentration were checked using a Nanodrop 2000 spectrophotometer (Thermo Scientific). RNA integrity was assessed using an Agilent RNA 6000 Nano reagents part I Kit in an Agilent 2100 Bioanalyzer System (Agilent Technologies, CA, USA). The RNA quality criteria for the RNA samples were RIN ≥ 7.0 (RNA Integrity Number), and 1.8 < OD260/280 < 2.2. The qualified RNAs were used for further PacBio and Illumina library construction, respectively. All the sequencing operations were conducted at Biomarker Technologies CO., LTD (Beijing, China). Considering the relatively large size of the sturgeon genome and the available A. schrenckii individuals (a presumed octaploid species), the standard of sequencing quantity (clean data) was set as follows: Pacbio Iso-Seq of the library was at least 40 G and Illumina short-read RNA-seq was greater than 20 G for each sample. PacBio library construction and sequencing To construct the library for PacBio sequencing, the qualified RNA from seven tissues, including the three testes and four ovaries, were mixed in equal amounts. The mixed RNA sample was reverse-transcribed for mRNA using the SMARTer™ PCR cDNA Synthesis Kit. PCR amplification was performed using the KAPA HiFi HotStart PCR Kit. Then, the PCR product for the SMRTbell library was constructed using the SMRTbell template pre kit. The concentration of the SMRTbell library was measured using a Qubit 3.0 fluorometer with a Qubit™ 1X dsDNA HS Assay kit (Invitrogen, Carlsbad, USA). The quantified criteria of library quality were concentration > 10 ng/μl with dispersive but continuous distribution in the range of 1–10kbp. A total of 2.5 ng of the library was sequenced for each SMRT cell using the binding kit 2.1 from the PacBio Sequel platform, producing 20 h of movies. The sample information was first registered as BioProject with accession number PRJNA532819 and BioSample with accession number SAMN11415730. Subsequently, the subread sequence generated by the PacBio Iso-Seq platform was deposited into the NCBI Sequence Read Archive (SRA) with accession number SRR7453063. Error correction of PacBio Iso-Seq reads According to PacBio’s protocol, the raw polymerase reads were first processed using SMRTlink 5.0 software. Briefly, after removing the SMRTbell™ adapter and the low-quality data, postfilter polymerase reads were obtained. The circular consensus sequence (CCS) was generated from the subreads BAM files, also known as the reads of insert (ROI). All the ROIs whose number of full passes were > 1 were further classified into full-length (FL) and non full-length (nFL) transcript sequences based on whether the 5′ primer, 3′ primer and poly A tail could be simultaneously observed. We employed a three-step strategy for error correction to improve the accuracy of the full-length transcripts produced by the PacBio Iso-Seq platform. First, the circle sequencing with > 1 pass provided an opportunity for ROI self-correction. Second, full-length, nonchimeric (FLNC) reads were subjected to nonredundant and clustering treatments by the ICE Quiver algorithm and to Arrow polishing with the nFL sequence, producing high-quality and polished full-length consensus sequences. Finally, these polished consensus sequences were further subjected to correction and redundancy removal with Illumina short reads using the Proovread tool and the CD-HIT program with a –c 0.99 parameter cutoff [[63]24], respectively. The above three corrections resulted in nonredundant, nonchimeric, full-length unigenes (isoform level) with high accuracy for subsequent analyses. Illumina library construction and sequencing The Illumina library was prepared using the NEBNext, Ultra™ RNA Library Prep Kit (E7530 L) for Illumina (NEB, USA). Briefly, polyadenylated RNA was isolated and randomly separated into fragments. First-strand cDNA was synthesized using random hexamer primers, followed by second-strand synthesis. The purification of the double-stranded cDNA was performed using VAHTSTM mRNA capture beads. The purified and repaired double-stranded cDNA fragments were selected by size in the range of 250 bp ~ 350 bp. The concentration and quality control of the Illumina library were measured using a Qubit 3.0 fluorometer with an ExKubit dsDNA HS Assay kit (Invitrogen, Carlsbad, USA) and a Qsep 400 fragment analyzer, respectively. The quantified criteria of the library quality were a concentration > 1 ng/μl in a range of 380 bp ~ 480 bp fragments. The Illumina libraries were finally sequenced on the Illumina HiSeq platform. Raw short reads in FASTQ format were first processed through in-house Perl scripts; clean short reads were obtained by removing reads containing adapter and ploy-N, and with low quality from the raw short reads. Simultaneously, the Q30, GC content and sequence duplication level of the clean data were calculated. The clean short reads were then mapped to the PacBio reference sequence using Tophat2 tools. Only reads with perfect matches or only one mismatch were further analyzed. Functional annotation of unigenes For comprehensive functional annotation, the unigenes were searched against the following seven databases using BLAST software (version 2.2.26) [[64]25]: NR (NCBI nonredundant protein sequences) [[65]26], COG (Cluster of Orthologous Groups of proteins) [[66]27], Pfam (Protein family) [[67]28], Swiss-Prot (A manually annotated and reviewed protein sequence database) [[68]29], KEGG (Kyoto Encyclopedia of Genes and Genomes) [[69]30], GO (Gene Ontology) [[70]31] and eggNOG (Cluster of Orthologous Groups of proteins) [[71]32]. The Diamond BLASTX methods [[72]33] with an E-value < 1 × 10^− 5 were analyzed in NR, COG, Swiss-Prot, Pfam and KEGG annotations. Quantification of unigene expression levels and differential expression analysis Using the full-length transcripts yielded from the SMRT Iso-Seq analysis as reference sequences, the unigene expression levels between the ovaries and testes of A. schrenckii were further analyzed based on the short-reads datasets generated by the Illumina sequencing platform. The extracts of three testes and four ovaries from seven A. schrenckii individuals were separately used as examples of the analysis. Quantifications of the unigene expression values from the Illumina reads of each sample were determined with RSEM using the default parameters [[73]34]. Briefly, the clean data from the Illumina sequencing were mapped back onto the reference sequences, and the readcount values of the unigenes for each sample were obtained. To eliminate the effects of the sequencing depth and transcript length, all the readcounts were transformed into FPKM values (expected number of fragments per kilobase of transcript sequence per million base pairs sequenced). The 0.1 threshold of FPKM value is regarded as the expression criterion of the unigene in tested tissue (FPKM > 0.1) according to the method [[74]35]. To detect the differentially expressed genes (DEGs), all the readcounts of each sample were first normalized into a standardized readcount using the edgeR package. Then, the differential expression analysis of testes and ovaries was performed using the EBSeq R package mode based on the negative binomial distribution from the biological replicates. The resulting false discovery rate (FDR) values were adjusted using the posterior probability of being DE (PPDE) approach. Herein, the |log2(Fold Change)| > 1 and FDR < 0.05 were used as the threshold for determining DEGs. Alternative splicing prediction and analysis Based on the BLAST method [[75]25], all the unigenes were used for pairwise alignment. Finally, BLAST alignments that met all three of the following criteria were considered products of candidate AS events [[76]36]. Briefly, 1) the length of two unigenes was both greater than 1000 bp, and there were two high-scoring segment pairs (HSPs) in the alignment. 2) The AS gap between two aligned unigenes was greater than 100 bp and at least 100 bp from the 3′ end and 5′ end. 3) A 5 bp overlap could be tolerated. LncRNA prediction and analysis Unigenes with a length of over 200 nt and having more than two exons were selected as lncRNA candidates. Then, four computational approaches were employed to further screen the protein-coding unigenes from the noncoding unigenes: 1) we performed the coding-noncoding-Index (CNCI) with default settings [[77]37] to assess the coding potential; 2) we used the coding potential calculator (CPC) to search for the unigenes in the NCBI eukaryotic protein database with a score < 0 setting [[78]38]; 3) we translated each transcript in all three possible frames and used the Pfam scan utility with default parameters to identify the occurrence of any of the known protein family domains documented in the Pfam database [[79]39]; 4) we used the coding potential assessment tool (CPAT) to assess the putative protein-coding unigenes by calculating the Fickett and Hexamer scores based on the logistic regression model [[80]40]. As a result, all the isoform transcripts with coding potential were filtered, and the intersecting unigenes without coding potential formed our candidate set of lncRNAs. Transcription factor prediction and further analysis Transcription factor (TF)-related unigene sequences were predicted using the BLAST method with the AnimalTFDB database [[81]41]. The HMG domain of SOX was further verified using the SMART mode ([82]http://smart.emblheidelbergde/index2.cgi). According to the lengths of the four nonredundant full-length Sox9 genes, we renamed them as Asc_Sox9–1, Asc_Sox9–2, Asc_Sox9–3 and Asc_Sox9–4, respectively. A neighbor-joining phylogenetic tree was reconstructed based on amino acid sequences using the MEGA 7.0 package with the following parameter settings: p-distance model and partial deletion treatment. Meanwhile, Sox2 from the zebrafish Danio rerio (accession number: [83]BAE48583.1) was chosen as the out-group protein sequence. Results Histology characteristics from the gonads of 3-year-old A. schrenckii The gonads of 3-year-old A. schrenckii were visible at the early gametogenesis stage. In 3-year-old female individuals, the histological section of ovary showed deep, branching ovarian lamellae structure and was mainly composed of primary growth oocytes of different diameter sizes ranging from 100 to 500 μm. Some oocytes at the perinucleolar stage were clearly observed with a high number of small nucleoli along the nuclear perimeter. In 3-year-old male individuals, section of the entire germ region of testis showed smooth surface of the lateral side. By histological observation, it was found that the testis tissue displayed alveolate seminiferous lobules organization structure. The anastomosing tubules were separated from each other by thin layers of compact connective tissues and filled with spermatogonia enveloped by their own Sertoli cells. Meanwhile, we also found that there were some degenerating oocytes probably undergoing autophagy and some spermatogonia with pyknotic nuclei were probably apoptotic. Histomorphological characteristics of the ovary and testis from 3-year-old A. schrenckii individuals are shown in Fig. [84]1. Fig. 1. [85]Fig. 1 [86]Open in a new tab Histological characteristics of ovary (a and b) and testes (c and d) from 3-year-old A. schrenckii individuals. OI, ovarian lamellae; OL, ovarian lumen; PG, primary growth oocyte; BM, basement membrane; N, oocyte nucleus; Nu, oocyte nucleoli; OG, oogonia; BV, blood vessels; Lo, lobule; SG, spermatogonia; SC, Sertoli cell; The gonadal tissues were stained with hematoxylin and eosin (HE staining). Scale bar in A = 100 μm and in B, C, D = 50 μm Full-length transcripts from the gonads of A. schrenckii The full-length transcriptome of A. schrenckii was generated using the PacBio Sequel platform on the pooled RNA from seven tissues, including three testes and four ovaries. The resulting total of 50.04 G subread bases was generated by two SMRT cells from the PacBio library; therefore, the 1,260,958 reads of insert were produced with a mean read quality of 0.95 and mean passes of 14 circles (Supplementary Table [87]1). By applying the standard Iso-Seq classification and clustering protocol, all the ROIs were further classified into 358,153 nFL sequences and 860,617 FLNC reads with a mean length of 2548 bp. Based on the ICE Quiver and Arrow polishing algorithms, we produced 461,596 polished full-length consensus transcripts with a mean length of 2782 bp, including 335,067 high-quality (HQ) and 125,969 low-quality (LQ) sequences. After correction using short reads produced by Illumina short-read RNA-seq and subsequently removing redundancies using the CD-Hit program, the consensus transcripts were finally clustered into a total of 164,618 unigenes for subsequent analysis. We found that 91.09% of the unigenes main length distribution ranging from 1 to 6 kbp (Fig. [88]2 and Supplementary Table [89]2). The Iso-Seq statistics from the gonads of A. schrenckii by the PacBio Sequel platform are listed in Table [90]1. Fig. 2. Fig. 2 [91]Open in a new tab Distribution of nonredundant full-length unigenes from the gonad transcriptome of A. schrenckii by the PacBio Sequel platform Table 1. Description of Iso-Seq from the gonads of A. schrenckii by PacBio Sequel platform species SMRT cells Subreads base (G) Reads of Insert (ROIs) nonfull-length reads Full-length reads FLNC reads Consensus transcripts Mean length (bp) High-quality consensus transcripts Low-quality consensus transcripts Unigenes A. schrenckii 2 50.04 1,260,958 358,153 869,918 860,617 461,596 2782 335,067 125,969 164,618 [92]Open in a new tab Meanwhile, the clean short reads were respectively generated from seven gonads of A. schrenckii (including three testes and four ovaries) using Illumina sequencing platform. The quality evaluation indexes of the clean short reads are summarized in Supplementary Table [93]3. In sum, the high-quality clean short reads from each sampled Illumina library was obtained with at least 6.96 × 10^7 read numbers and a Q30 of over 90%. Efficient gene annotation of full-length unigenes from A. schrenckii gonads To obtain a comprehensive functional annotation from the full-length gonad transcriptome of A. schrenckii, all the 164,618 nonredundant unigenes were assigned to align with seven different databases, including NR, KOG, Pfam, SwissProt, KEGG, GO, and eggNOG. A total of 93.55% of the unigenes (154,006 of 164,618) were successfully annotated with significant hits (E-value <1E^− 5) from these well-curated databases. The statistics of the full-length unigene annotations are listed in Table [94]2. The remaining unannotated unigenes (10,612 unigenes) might represent novel A. schrenckii species-specific genes. Table 2. Statistics of full-length unigene annotation performed with seven different databases Database NO. transcripts annotated Annotated rate (%) NO.unigene 164,618 – NR 153,455 93.21 KOG 151,313 91.92 Pfam 134,181 81.51 SwissProt 102,427 62.22 KEGG 87,086 52.90 GO 72,758 44.20 eggNOG 47,463 28.83 Total 154,006 93.55 [95]Open in a new tab Among the annotated 60 classified GO terms, cellular process was identified as the most common annotation in the biological process; metabolic process and biological regulation were the next most abundant GO terms. In the molecular function and cellular component categories, binding and cell part annotations were identified as the most abundant terms, respectively. Two putative early gametogenesis related GO terms, including reproductive process (involving 1188 unigenes) and reproduction (involving 1134 unigenes), were successfully annotated. The GO classifications of the full-length unigenes from the gonads of A. schrenckii are shown in Supplementary Figure [96]1 and Supplementary Table [97]4. In the KEGG classification, a total of 295 pathways annotated from 87,086 nonredundant unigenes were extracted from the gonad transcriptome of A. schrenckii (Supplementary Table [98]5). The results showed that the protein processing endoplasmic reticulum (2329 unigenes), RNA transport (2312 unigenes) and cell cycle (2263 unigenes) were the top three pathways with the most abundant unigenes. Notably, we paid attention to 14 KEGG pathways, which may be closely associated with early gametogenesis of sturgeon. Among these, MAPK signaling pathway (1834 unigenes), oocyte meiosis (1731 unigenes) and progesterone-mediated oocyte maturation (1398 unigenes) were the top three pathways with the most abundant unigenes distribution. Search for genes involved in early gametogenesis of A. schrenckii A total of 60 genes (involving 755 unigenes), reported as active in the gonad development of sturgeon [[99]10, [100]42–[101]44], were detected to be present in the full-length gonad transcriptome of 3-year-old A. schrenckii. Therefore, the 60 early gametogenesis related genes and their NR annotations of full-length unigenes from the Iso-Seq of A. scipenserkii gonads are listed in Table [102]3 and Supplementary Table [103]6, respectively. Mainly, two major sex determination related genes in fish, Dmrt1and Gsdf, were also present in the full-length gonad transcriptome. Nine spermatogenesis-related genes were significantly matched with the full-length gonad transcriptome, including AR, Vasa, ER, Rspo1, Igf I, Dkk1, Hsd11b2, Cyp11b and ATRX. Meanwhile, four oogenesis-related genes were also predicted from the full-length gonad transcriptome, including Foxl2, Cyp19a, VR and Ozf. Four hormone receptor genes, including AR, ER, Gnrhr and Fshr. Two members of transforming growth factor (TGF)-β superfamily, Amh and Gsdf. Nine genes belonging to the sox subfamily, Sox3, Sox4, Sox5, Sox6, Sox7, Sox8, Sox9, Sox10 and Sox18, and five genes belonging to the double sex and mab-s (DM) domain, Dmrt1, Dmrt2, DmrtA1, DmrtA2 and DmrtB1, were found. In addition, Nine Sap family genes and four Hsp family genes were also concerned. We also found that Oct4 has the most abundant transcript diversity, with 100 unigenes predicted with ORFs among 104 unigenes, followed by Hsp70 (59/64) and Cyp19a (49/50). Table 3. Searching for genes putatively involved in early gametogenesis from the full-length gonad transcriptome of A. scipenserki Gene Symbol Gene Description Length (bp) NO. Unigene NO. Unigene with ORF Dmrt1 double sex and mab-3 related transcription factor 1 2893&3843 2 0 Dmrt2 double sex and mab-3 related transcription factor 2 2291 1 1 DmrtA1 double sex and mab-3 related transcription factor A1 1422–2140 4 3 DmrtA2 double sex and mab-3 related transcription factor A2 2297–3868 12 12 DmrtB1 doublesex and mab-3 related transcription factor B1 2325–2377 4 0 Amh anti-müllerian hormone 2071–3443 5 5 Foxl2 forkhead box protein L2 1802–2143 3 3 Bmp15 bone morphogenetic protein 15 1906–3389 11 7 Cyp19a cytochrome P450 1A 1757–3305 50 49 Ctnnb1 catenin beta-1 3074–3598 13 13 OCT4 POU2 1830–4694 104 100 Lhx1 LIM homeobox 1 1213 1 1 Sf1 steroidogenic acute regulatory protein 2674 1 1 Rspo1 R spondin 1 2372 1 1 Sox3 SRY box 3 1335–3125 56 55 Sox4 SRY box 4 2257–3666 5 4 Sox5 SRY box 5 1500–6536 7 5 Sox6 SRY box 6 1491 1 0 Sox7 SRY box 7 1679–2211 15 13 Sox8 SRY box 8 2619–3152 7 7 Sox9 SRY-related HMG box gene 9 2657–3491 6 6 Sox10 SRY box 10 1968 1 1 Sox18 SRY box 18 2439 1 1 AR androgen receptor 2941–8079 6 5 Vasa vasa 2284–2876 30 27 ERɑ, ERβ estrogen receptor 1956–6770 17 11 Pdgfrα platelet-derived growth factor receptor alpha 3552 1 0 Hsd11b2 11-beta-hydroxysteroid dehydrogenase 2243–3278 3 2 Lhx9 LIM homeobox 9 3300&3332 2 2 Emx2 homeobox protein EMX2 2058 1 1 Fshr follicle-stimulating hormone receptor 4201–5062 4 4 Gnrhr gonadotropin-releasing hormone receptor 815&1002 2 0 Igf I insulin-like growth factor I 2729&4884 2 2 Dkk1 dickkopf-related protein 1 803–1948 9 7 Wt1 wilms tumor protein homolog 3322 1 1 Fgfr2 fibroblast growth factor receptor 2 3916 1 1 VR vitellogenin receptor 1224&3294 2 1 Fem1 fem-1 homolog A 1763–4084 13 10 Gsdf gonadal soma derived factor 1602–2461 12 7 Spo11 meiotic recombination protein 1652–1797 5 3 Fstb2 follistatin b2 3181 1 1 Ozf6 oocyte zinc finger protein 6 1506–4396 47 35 Ozf7 oocyte zinc finger protein 7 1487–4252 21 17 Ozf22 oocyte zinc finger protein 22 3066 1 0 Hsp heat shock protein 479–3651 7 7 Hsp70 heat shock protein 70 971–4279 64 59 Hsp75 heat shock protein 75 2289–2505 3 0 Hsp90 heat shock protein 90 403–7147 48 35 Kpi1 kunitz-type protease inhibitor 1 2541–4371 13 10 Gsf1 gametocyte-specific factor 1 592–6520 27 24 ATRX ATRX 1417–8148 21 16 Sap2 spermatogenesis-associated protein 2 2141–4829 26 25 Sap5 spermatogenesis-associated protein 5 2419–3622 13 10 Sap6 spermatogenesis-associated protein 6 2173–4421 10 10 Sap7 spermatogenesis-associated protein 7 1666–3201 14 11 Sap13 spermatogenesis-associated protein 13 4707–6505 5 5 Sap17 spermatogenesis-associated protein 17 644–2371 3 2 Sap20 spermatogenesis-associated protein 20 3934 1 1 Sap22 spermatogenesis-associated protein 22 1406–1699 4 3 Sap24 spermatogenesis-associated protein 24 1691–5433 4 0 [104]Open in a new tab Differential expression genes in early gametogenesis in the gonad transcriptome of A. schrenckii Using nonredundant full-length transcripts as genome sequence references and combining the clean short reads datasets from the