Abstract

   Transcriptome complexity plays crucial roles in regulating the
   biological functions of eukaryotes. Except for functional genes,
   alternative splicing and fusion transcripts produce a vast expansion of
   transcriptome diversity. In this study, we applied PacBio
   single-molecule long-read sequencing technology to unveil the whole
   transcriptome landscape of Lateolabrax maculatus. We obtained 28,809
   high-quality non-redundant transcripts, including 18,280 novel isoforms
   covering 8,961 annotated gene loci within the current reference genome
   and 3,172 novel isoforms. A total of 10,249 AS events were detected,
   and intron retention was the predominant AS event. In addition, 1,359
   alternative polyadenylation events, 3,112 lncRNAs, 29,609 SSRs, 365
   fusion transcripts, and 1,194 transcription factors were identified in
   this study. Furthermore, we performed RNA-Seq analysis combined with
   Iso-Seq results to investigate salinity regulation mechanism at the
   transcripts level. A total of 518 transcripts were differentially
   expressed, which were further divided into 8 functional groups.
   Notably, transcripts from the same genes exhibited similar or opposite
   expression patterns. Our study provides a comprehensive view of the
   transcriptome complexity in L. maculatus, which significantly improves
   current gene models. Moreover, the diversity of the expression patterns
   of transcripts may enhance the understanding of salinity regulatory
   mechanism in L. maculatus and other euryhaline teleosts.

   Keywords: Lateolabrax maculatus, Iso-Seq, full-length transcripts,
   alternative splicing, isoform, salinity regulation

Introduction

   With the development of high-throughput sequencing of the
   transcriptome, biologists have begun to pay more attention to multiple
   post-transcriptional processes of precursor-messenger RNA (pre-mRNA).
   Alternative splicing (AS), a key post-transcriptional processing of
   pre-mRNA, is prevalent in most eukaryotic organisms ([43]Barbazuk et
   al., 2008; [44]Pan et al., 2008; [45]Kornblihtt et al., 2013; [46]Yi et
   al., 2018; [47]Zhang et al., 2019a), which makes an important
   contribution to the enhancement of the functional complexity of the
   transcriptome ([48]Graveley, 2001; [49]Reddy et al., 2013;
   [50]Abdel-Ghany et al., 2016). Transcriptome complexity plays an
   important role in increasing the coding capacity of genes, generating
   proteome diversity, regulating gene expression, cellular physiological
   and developmental processes ([51]Lareau et al., 2004; [52]Abdel-Ghany
   et al., 2016; [53]Wang et al., 2016). It has been shown that over 90%
   of multi-exonic genes in human (Homo sapiens) ([54]Pan et al., 2008),
   46% in fruit fly (Drosophila melanogaster) ([55]Hansen et al., 2009),
   and 61% in the model plant thale cress (Arabidopsis thaliana) are
   alternatively spiced ([56]Marquez et al., 2012). Although the
   functional significance of most spliced isoforms has yet to be fully
   elucidated, several studies suggest that AS is a profound regulatory
   process involved in organismal function. For instance, in thale cress,
   zinc-induced facilitator-like 1 can produce two spliced isoforms, one
   that regulates stomatal movement and another that influences cellular
   auxin transport ([57]Reddy et al., 2013). Coincidentally, the Bcl-x
   gene in fruit fly yields two different isoforms, one of which inhibits
   apoptosis, while the other activates apoptosis ([58]Chang et al.,
   2004). In addition, alternative polyadenylation (APA), another
   post-transcriptional regulatory events in which RNA molecules with
   different 3’ ends originate from distinct polyadenylation sites of a
   single gene, is emerging as a mechanism widely used to regulate gene
   expression ([59]Chen et al., 2017b). APA events may alter sequence
   elements and/or the coding capacity of transcripts, and could be
   considered as a mechanism that adds another layer to regulation of
   transcriptome diversity ([60]Shen et al., 2011; [61]Abdel-Ghany et al.,
   2016; [62]Ha et al., 2018).

   However, due to technical limitations, most information on these
   post-transcriptional regulatory events remains limited. Although data
   from short-read sequencing have accumulated over recent years, it
   remains an immense challenge to obtain full-length (FL) sequences for
   each RNA because of difficulties in the short read-based assembly,
   which limits the identification and prediction of post-transcriptional
   events ([63]Wang et al., 2016; [64]Chen et al., 2017a). In the last few
   years, Pacific BioSciences (PacBio) single-molecule real-time
   sequencing has been introduced ([65]Rhoads and Au, 2015). The PacBio
   isoform sequencing (Iso-Seq) platform can directly produce FL
   transcripts without an assembly process, providing superior evidence
   for comprehensive analysis of splice isoforms of each gene and
   improving the annotation of existing gene models ([66]Tilgner et al.,
   2014; [67]Gordon et al., 2015; [68]Wang et al., 2016). Recently,
   Iso-Seq has led to the discovery of thousands of novel genes and
   alternatively spliced isoforms in human ([69]Au et al., 2013), mouse
   (Mus musculus) ([70]Karlsson and Linnarsson, 2017), rabbit (Oryctolagus
   cuniculus) ([71]Chen et al., 2017a), sorghum (Sorghum bicolor)
   ([72]Abdel-Ghany et al., 2016), and maize (Zea mays) ([73]Wang et al.,
   2016). This finding indicates that Iso-Seq is sensitive to detecting FL
   transcripts and serves as a valuable resource for transcriptome
   complexity research. In addition, the studies in aspen (Populus
   tremuloides) ([74]Chao et al., 2019), strawberry (Fragaria vesca)
   ([75]Li et al., 2017), and pig (Sus scrofa) ([76]Li et al., 2018) also
   provide strong evidence that Iso-Seq could complement short-read
   sequencing with cataloguing and quantifying eukaryotic transcripts.

   Spotted sea bass (Lateolabrax maculatus) is a euryhaline teleost fish
   naturally distributed in the northwestern Pacific Ocean, especially
   along the Chinese coast, reaching south to the borders of Vietnam and
   north to Korea and Japan ([77]Zhang et al., 2001; [78]Tseng and Hwang,
   2008; [79]Seo et al., 2016). It is considered as one of the most
   popular economic fishes because of its high nutritive value and
   pleasant taste. Since the release of the draft reference genome of L.
   maculatus ([80]Shao et al., 2018; [81]Chen et al., 2019), more
   functional genes have been discovered. However, most of the existing
   gene models are derived from in silico prediction with a lack of
   reliable annotation of alternative isoforms and untranslated regions,
   which would prevent accurate evaluation of transcriptome complexity
   ([82]Chen et al., 2017a). Hence, our study is crucial for facilitating
   the biological research of L. maculatus.

   Salinity represents a major abiotic stress and critical environmental
   factor that directly affects the survival, growth, development,
   reproduction, and physiological functions of all aquatic organisms
   ([83]Kultz et al., 2013; [84]Kultz, 2015). L. maculatus, a typical
   euryhaline fish, is capable of inhabiting freshwater, brackish water,
   seawater, and hypersaline water ([85]Kim et al., 1998). It has been
   documented that L. maculatus can tolerate a considerable range of
   external salt concentrations (0–45 ppt) and maintain constant internal
   osmotic homeostasis ([86]Zhang et al., 2019b). Hence, it provides an
   excellent model with which to identify and characterize osmoregulatory
   mechanisms. In spotted sea bass, RNA-Seq analysis has been performed to
   identify hundreds of genes involved in salinity adaptation and
   osmoregulation ([87]Zhang et al., 2017). In addition, previous RNA-Seq
   studies in other aquaculture fish species, including Asian seabass
   (Lates calcarifer), striped catfish (Pangasianodon hypophthalmus),
   Mozambique tilapia (Oreochromis mossambicus), and Nile tilapia
   (Oreochromis niloticus), have identified several differentially
   expressed genes in response to distinct salinity concentrations
   ([88]Xia et al., 2013; [89]Thanh et al., 2014; [90]Ronkin et al.,
   2015), which were considered as candidate osmoregulatory genes.
   However, due to limitations of technology, RNA-Seq lacks the ability to
   accurately quantify the transcripts or isoforms ([91]Steijger et al.,
   2013). In this study, we applied Iso-Seq to uncover
   post-transcriptional regulatory events in L. maculatus and combined
   with gill RNA-Seq to investigate salinity regulation at the transcript
   level. It was the first time that Iso-Seq was applied in aquaculture
   teleosts, providing the first comprehensive view of transcriptome
   complexity in L. maculatus and characterizing differentially expressed
   transcripts (DETs) involved in osmoregulatory mechanisms, which refines
   the annotation of the reference genome and serves as a valuable
   reference for future Iso-Seq studies.

Materials and Methods

Ethics Statement

   All experiments involving animals were conducted according to the
   guidelines and approved by the respective Animal Research and Ethics
   Committees of Ocean University of China (Permit Number: 20141201). The
   field studies did not involve any endangered or protected species.

Fish Sample Collection for Iso-Seq

   Three L. maculatus adults (body length: 44.92 ± 4.63 cm, body weight:
   551.23 ± 7 9.84 g) were obtained from Kiaochow Bay of the Yellow Sea,
   China. The fish individuals were anesthetized with MS-222 and rapidly
   dissected for 13 tissues including brain, hypophysis, gill, heart,
   liver, stomach, intestine, kidney, spleen, gonad, muscle, fin, and
   skin. Then, these tissues were immediately frozen in liquid nitrogen
   and transferred to −80°C refrigerator until the extraction of RNA.

RNA Extraction

   Total RNA was extracted using TRIzol reagent (Invitrogen, CA, USA)
   according to the manufacturer’s instructions and digested with
   RNase-free DNase I (Takara, Shiga, Japan) to remove genomic DNA
   contamination. The reagents and instruments involved in this experiment
   were treated with 0.1% (vol/vol) diethylpyrocarbonate (DEPC) to
   maintain RNase-free conditions. The concentration and integrity of RNA
   was monitored using NanoDrop ND-1000 (NanoDrop Technologies, DE, USA)
   and 1% agarose gel electrophoresis, respectively. Agilent Bioanalyzer
   2100 system (Agilent Technologies, CA, USA) was used to assess the
   quality of extracted RNA. All RNA samples from three L. maculatus were
   equally pooled together for following PacBio Iso-Seq.

Iso-Seq Library Construction and Sequencing

   According to the Iso-Seq protocol, 1 µg total RNA was transcribed to
   generate full-length cDNA using the SMARTer PCR cDNA Synthesis Kit
   (Clontech, CA, USA). Then, the cDNA was amplified using the advantage 2
   PCR kit (Clontech, CA, USA), and PCR products were purified with AMpure
   PB beads (Beckman Coulter, CA, UAS). Purification was followed by size
   selection using the BluePippinTM Size Selection System (Sage Science,
   MA, USA) of the following bins: 1-2, 2-3 and 3-6 kb. The three
   libraries were then constructed using SMRTbell Template Prep kit
   (Pacific Biosciences, CA, USA). Before sequencing, the quality of the
   libraries was assessed by Agilent Bioanalyzer 2100 system (Agilent
   Technologies, CA, USA) and Qubit fuorometer 2.0 (Life Technologies, CA,
   USA). Libraries were prepared for sequencing by annealing a sequencing
   primer and adding polymerase to the primer annealed template. The
   polymerase-bound template was bound to MagBeads and a total of 6 SMRT
   cells were sequencing on PacBio RS II platform using P6-C4 chemistry (2
   cell each library).

PacBio Long-Read Processing

   PacBio polymerase reads were processed into error corrected reads of
   insert (ROIs) using SMART Analysis v2.3
   ([92]https://www.pacb.com/products-and-services/analytical-software)
   with min-Full Pass > = 0 and min-Predicted Accuracy >75%. After ROIs
   <50 bp were discarded, they were classified into full-length
   non-chimeric (FLNC) and non-full-length (NFL) reads based on the
   presence of the poly(A) tail signal and the 5’ and 3’ cDNA primers.
   FLNC reads were clustered into consensus sequences using the Iterative
   Clustering for Error Correction (ICE) algorithm
   ([93]https://www.pacb.com/products-and-services/analytical-software).
   Combined with NFL reads, consensus sequences were then polished in
   clusters using Quiver ([94]Chin et al., 2013). Based on the criteria of
   post-correction accuracy >99%, consensus sequences were divided into
   high-quality and low-quality sequences. To improve the accuracy of
   consensus sequences, low-quality sequences were corrected by the above
   Illumina clean reads using Proovread v2.13.13 with default parameters
   ([95]Thomas et al., 2014). Consensus sequences were mapped to the
   reference genome of L. maculatus (NCBI BioProject ID: PRJNA407434)
   using Genomic Mapping and Alignment Program (GMAP) ([96]Wu and
   Watanabe, 2005). Mapped sequences were further collapsed using the
   pbtranscript-ToFU package
   ([97]http://github.com/PacificBiosciences/cDNA_primer/) with
   min-coverage = 85% and min-identity = 90% to generate non-redundant
   transcripts.

Alternative Splicing (AS) Analysis

   AStalavista v3.2 software with default parameters was employed to
   determine the AS events in above obtained non-redundant transcripts
   ([98]Foissac and Sammeth, 2007). The non-redundant transcripts were
   further classified into five major types of AS events following the
   rules in previous publication ([99]Wang et al., 2016), namely Intron
   retention, Exon skipping, Alternative 3’ splice site, Mutually
   exclusive exon, and Alternative 5’ splice site.

Alternative Polyadenylation (APA) Identification

   In our study, FLNC reads were selected to identify APA sites using
   Transcriptome Analysis Pipeline for Isoform Sequencing (TAPIS pipeline
   v1.2.1, default parameters) ([100]Abdel-Ghany et al., 2016). The
   qualified APA for a gene must be supported by at least two FLNC reads
   aligned to the gene loci.

Long Non-Coding RNA (LncRNA) and Simple Sequence Repeat (SSR) Analysis

   Four computational approaches, including Coding-Non-Coding-Index (CNCI,
   v2), Coding Potential Calculator (CPC, v1), Coding Potential Assessment
   Tool (CPAT, v1.2), and Pfam (v1.5), were combined to identify
   non-protein coding RNA candidates from the non-redundant transcripts.
   Transcripts with lengths more than 200 bp and more than two exons were
   selected as lncRNAs candidates and further screened using CPC, CNCI,
   CPAT, and Pfam that have the power to distinguish the protein-coding
   genes from the non-coding genes. The relationship between lncRNAs and
   target genes were predicted based on their position (<100 kb upstream
   or downstream) and base complementary using lncTar target gene
   prediction tool (v1.0) with default parameters ([101]Li et al., 2015).

   The simple sequence repeats (SSR) were identified using Microsatellite
   identification tool (MISA, v1.0) with default parameters ([102]Beier et
   al., 2017). Only non-redundant transcripts that were > = 500 bp in size
   were selected for SSR detection. A total of seven SSR types were
   identified, namely, Mono-nucleotide, Di-, Tri-, Tetra-, Penta-, Hexa-,
   and compound SSR, respectively.

Fusion Transcripts and Transcription Factors (TFs) Detection

   Consensus sequences from PacBio Iso-Seq were selected for fusion
   transcripts identification. A fusion transcript is a chimeric RNA
   encoded by a single fusion gene or by two different genes that are
   subsequently joined by trans-splicing. The criteria used to identify
   candidate fusion transcripts as follows A) map to two or more loci; B)
   minimum coverage for each locus is 5% and minimum coverage in bp is > =
   1bp; C) total coverage is > = 95%; D) distance between the loci is at
   least 10 kb ([103]Wang et al., 2016; [104]Li et al., 2018).

   Animal TFDB 3.0 ([105]http://bioinfo.life.hust.edu.cn/AnimalTFDB/) was
   set as the reference transcription factor database. The algorithm HMMER
   3.0 software was used to identify TFs and assign transcripts to
   different families ([106]Eddy, 2009).

Functional Annotation

   The non-redundant transcripts were aligned against several protein and
   nucleotide databases, including Clusters of Gene Ontology (GO), Kyoto
   Encyclopedia of Genes and Genomes (KEGG), conserved Protein families or
   domains (Pfam), Swiss-prot, NCBI non-redundant proteins (NR), and
   non-redundant nucleotide (NT) databases, using BLASTX (v2.2.26) with
   cutoff E-value < = 1e^-5 ([107]Camacho et al., 2009).

Fish and Experimental Treatments for RNA-Seq

   L. maculatus adults (body length: 21.92 ± 3.17 cm, body weight: 158.23
   ± 18.77 g) were acquired from Shuangying Aquaculture Company (Dongying,
   Shandong Province, China) and acclimated for a week. Water temperature
   (13.5–14.5°C), pH (7.8–8.15), salinity (30 ppt), and DO (6.7–7.5 mg/L)
   were stabilized during the acclimation. After acclimation, 90
   individuals were randomly divided into two groups: freshwater (FW, 0
   ppt) and seawater group (SW, 30 ppt) in triplicate tanks at the density
   of 15 individuals per tank. After 30 days of breeding, three
   individuals per tank were anesthetized with MS-222 and rapidly sampled
   for gill tissues, which were frozen in liquid nitrogen and transferred
   to −80°C refrigerator until the extraction of RNA.

RNA-Seq Library Construction and Sequencing

   Total RNA was extracted using TRIzol method mentioned above (the
   section RNA Extraction). Equal amounts of RNA from the gill tissues of
   three individuals (500 ng per individual) from the same tank were
   pooled as one sample to minimize the variation among individuals. A
   total of 6 libraries (3 replicated samples × 2 treatment groups) were
   constructed using the TruSeq™ RNA Sample Prep Kit (Illumina, CA, USA).
   The libraries were sequenced on Illumina HiSeq 4000 platform and 150 bp
   paired-end raw reads were generated. Then, the raw reads were processed
   using Trimmomatic software ([108]Bolger et al., 2014) and clean reads
   were obtained for the following analysis.

Differentially Expressed Transcript (DETs) Analysis

   To estimate the expression level of transcripts, the Iso-Seq database
   was added to the genome database to construct a new database with a
   modified general feature format (GFF) file. Then, the clean reads from
   the above RNA-Seq were mapped to the new database using STAR software
   (v2.5.3) ([109]Dobin et al., 2013). The Cuffquant and Cuffnorm modules
   of the Cufflinks program (v2.2.1) were used to quantify transcript
   abundance based on the mapped results ([110]Trapnell et al., 2010).
   When all the splicing junctions of transcripts were supported by clean
   reads, it was defined as an expressed transcript. For the reads mapped
   to multiple isoforms derived from the same gene, they were distributed
   according to uniquely mapped reads using Cuffquant and Cuffnorm
   modules. The mapped reads were counted and subsequently normalized to
   fragments per kilobase of transcript per million fragments mapped
   (FPKM) as the expression value. Differential expression analysis in the
   FW and SW environments was performed using the DEseq R package
   (1.10.1). The P-values were adjusted using Benjamini and Hochberg’s
   approach for controlling the false discovery rate (FDR). The FDR <0.05
   and |fold change| > = 2 were set as the threshold for significantly
   differentially expressed transcripts (DETs).

Validation Experiments

   For RT-PCR validation of AS events, fusion transcripts, and novel
   transcripts, total RNA was reverse-transcribed to cDNA using
   PrimeScript RT reagent kit (Takara, Shiga, Japan) following the
   manufacturer’s instructions. 10×diluted cDNA was served as the template
   and Fastpfu reagent kit (TransGen, Beijing, China) was used for RT-PCR
   amplification. Transcript-specific primers were designed to span the
   predicted splicing events using Primer 5 software ([111] Supplementary
   Table 1 ). PCR conditions were 5 min at 94°C followed by 35 cycles of
   94°C for 30 s, Tm for 30 s, 72°C for a time period that depends on the
   product sizes, and 72°C for 10 min. PCR products were monitored on 1%
   agarose gel stained by GelStain (TransGen, Beijing, China). For the APA
   validation, 1 µg RNA was used to synthesize first-strand cDNA using the
   SMART™ RACE cDNA Amplification Kit (Clontech, California, USA).
   Gene-specific primers ([112] Supplementary Table 1 ) were designed for
   the 3’ rapid amplification of cDNA ends (3’ RACE). PCRs were performed
   using Taq DNA Polymerase (Clontech, California, USA) following
   touchdown PCR cycling conditions: denaturation step at 94°C for 3 min,
   followed by 20 cycles of 94°C for 15 s and at a range of annealing
   temperature from 60 to 50°C, decreasing 0.5°C each cycle and 72°C for
   40 s, and finally ended with 10 min at 72°C for extension. PCR products
   were also monitored on 1% agarose gel stained by GelStain (TransGen,
   Beijing, China). Finally, the products were purified, subcloned into T1
   vector, propagated in Escherichia coli DH5a, and sequenced by the
   Sanger method.

   Quantitative real-time PCR (qPCR) analysis was employed to verify
   differentially expressed transcripts. Total RNA was isolated from the
   gill tissues of fish exposed to freshwater and seawater in previous
   salinity challenge experiment. cDNA was synthesized using the
   PrimeScript RT reagent kit (TaKaRa, Shiga, Japan). All
   transcripts-specific primers for qPCR were designed using Primer 5
   software and listed in [113]Supplementary Table 2 . SYBR Premix Ex Taq
   kit was used for qPCR (Takara, Shiga, Japan). Each PCR reaction
   consisted of 2 µl cDNA, 10 µl SYBR premix Ex Taq, 0.4 µl of each
   forward and reverse primers, 0.4 µl ROX Reference Dye, 6. 8µl ddH2O to
   a final volume of 20 µl. qPCR was performed on the Applied Biosystems
   7300 machines (Applied Biosystems, CA, USA) under the following
   conditions: 95°C for 30 s and 40 cycles of 95°C for 5 s, 55°C for 30 s,
   and 72°C for 30 s. The relative expression levels of transcripts were
   normalized by 18S ribosomal RNA. 2^-ΔΔCT method was used for subsequent
   analysis. The correlation coefficient between differential expressed
   transcripts and qPCR were determined by SPSS 17.0 software ([114]Bryman
   and Cramer, 2011). One-way ANOVA was conducted followed by Duncan’s
   multiple tests to identify significance differences when P < 0.05.

Results

PacBio Iso-Seq and Bioinformatic Analysis

   In total, six SMRT cells, including three size-fractionated libraries
   (1–2, 2–3, and 3–6 kb), were used for Iso-Seq, yielding 13.42 Gb of
   clean data. A specific bioinformatic analysis pipeline for our Iso-Seq
   data is outlined in [115]Figure 1A . In detail, 363,371 ROIs were
   retained after filtering, and the mean length was 3,120 bp ([116] Table
   1 ). The density plot of the length of the ROIs showed three obvious
   peaks, which was consistent with the size of the three libraries ([117]
   Supplementary Figure 1 ). ROIs were further classified into FLNC and
   NFL reads based on the presence of 5’ primer, 3’ primer, and poly(A)
   tails. A total of 39.79%, 44.88%, and 44.77% of ROIs were qualified as
   FLNC reads in the 1–2, 2–3, and 3–6 kb libraries ([118] Table 1 ),
   respectively, with an average FLNC ratio of 42.5% ([119] Figure 1B ).
   ICE was applied for sequence clustering, yielding 60,573 consensus
   sequences ([120] Table 1 ). Combined with NFL reads, these consensus
   sequences were corrected by Quiver. A total of 68.92% (41,744) of the
   sequences were defined as high-quality sequences. The remaining 18,829
   consensus sequences were defined as low-quality sequences, which were
   subsequently corrected by Illumina clean reads. Finally, these
   consensus sequences were collapsed by the TOFU process, yielding 28,809
   non-redundant transcripts retained for the following study ([121] Table
   1 ). The reference genome and Iso-Seq data information was shown in
   [122]Figure 2 .

Figure 1.

   [123]Figure 1
   [124]Open in a new tab

   Iso-Seq analysis pipeline and ROIs classification. (A) Pipeline of
   bioinformatic analysis in Iso-Seq. (B) Pie chart of ROIs classification
   for Iso-Seq. Full-length reads: containing 5’ primer, 3’ primer and the
   poly (A) tails. Filtered short reads: length 300 bp; Chimeric reads:
   containing artificial concatemers.

Table 1.

   Statistics of Iso-Seq data in L. maculatus.
                Library                 1-2kb     2-3kb    3-6kb    Total
   SMRT cell                          2         2         2       6
   Number of subreads                 2,933,418 1,466,540 869,243 5,269,201
   Mean length of subreads            1,835     3,178     3,877   2,546
   Number of ROIs                     156,048   122,203   85,120  363,371
   Mean length of ROIs                2,540     3,482     3,663   3,120
   Mean quality of ROIs               0.92      0.90      0.87    0.90
   Mean Number of Passes              12        7         5       8.7
   Number of FL reads                 62,090    54,840    38,112  154,969
   Number of FLNC reads               61,392    54,818    38,093  154,249
   Mean length of FLNC reads          1,258     2,159     3,337   2,091
   FL percentage                      39.79%    44.88%    44.77%  42.50%
   Number of consensus sequences      24,188    17,561    18,824  60,573
   Mean length of consensus sequences 1,321     2,360     3,712   2,365
   High-quality sequences             21,390    12,434    7,920   41,744
   Low-quality sequences              2,798     5,127     10,904  18,829
   Non-redundant transcripts          –         –         –       28,809
   [125]Open in a new tab

Figure 2.

   [126]Figure 2
   [127]Open in a new tab

   Circos visualization of L. maculatus genome and Iso-Seq results. (A)
   Karyotype of the L. maculatus chromosome; (B) Gene density of the L.
   maculatus genome; (C) Gene density of Iso-Seq; (D) Transcript density
   of Iso-Seq; (E) Alternative splicing (AS) events distribution; (F)
   Alternative polyadenylation (APA) events distribution; (G) Long
   non-coding RNA distribution; and (H) Fusion transcripts distribution:
   intra-chromosome (purple); inter-chromosome (deyellow). The
   distribution was calculated in a 1-Mb sliding window at 20 kb
   intervals.

Transcripts and Alternative Splicing (AS) Events

   A total of 28,809 non-redundant transcripts were compared against L.
   maculatus reference genome. In total, 88.9% of non-redundant
   transcripts (25,637) were aligned to 12,477 annotated gene loci ([128]
   Figure 3A ), covering 52.7% of the L. maculatus genome loci (23,657).
   Based on the splice sites of the genome and structures of transcripts,
   a total of 25,637 transcripts annotated in the genome were further
   classified into two groups as follows ([129] Figure 3A ): 1) known
   isoforms (7,357, 25.5%) sharing the same splice sites with the existing
   L. maculatus gene models; and 2) novel isoforms (18,280, 63.5%) that
   share at least one splice site with existing L. maculatus gene models
   but differ in other splice sites. Typical examples were shown in
   [130]Supplementary Figure 2A . The remaining 3,172 transcripts (11.1%)
   were absent from any annotated gene loci in the L. maculatus genome and
   were identified as novel isoforms from novel genes. The 3,172 novel
   isoforms were clustered into 2,580 gene loci defined as novel genes
   ([131] Figure 3B , [132]Supplementary Figure 2B ). To further
   investigate the homology and annotation, these novel gene loci were
   aligned against the Swiss-Prot and NR databases. A total of 24.96% of
   novel gene loci (644) were annotated in the Swiss-Prot protein
   database, and 43.64% of novel loci (1,126) were in NR database, which
   exhibited their homology to other species. The remaining genes absent
   in the databases were likely species-specific genes in L. maculatus.
   Four novel isoforms from novel genes were randomly selected for
   validation by RT-PCR ([133] Supplementary Figure 3A ).

Figure 3.

   [134]Figure 3
   [135]Open in a new tab

   Characteristics of transcripts and alternative splicing (AS) events of
   L. maculatus. (A) Numbers of three types of transcripts. (B) Venn
   diagram showing the common and unique genes from the L. maculatus
   genome and Iso-Seq. (C) Distribution of the number of isoforms per
   gene. (D) Schematic diagram of five types of AS events. (E) Pie chart
   showing the numbers of five AS events in Iso-Seq.

   A total of 15,057 gene loci (12,477 existing gene loci and 2580 novel
   gene loci) were identified in our Iso-Seq data, of which 6,396 (42.5%)
   were found to generate at least two different isoforms ([136] Figure 3C
   ). Notably, 476 out of 15,057 (3.2%) genes produced more than 5
   isoforms for each gene loci, generating a total of 6,087 unique
   isoforms that accounted for 21% of the total Iso-Seq transcripts. To
   investigate the potential function of the genes with numerous isoforms,
   KEGG pathway enrichment analysis was performed for 847 genes harboring
   more than 4 isoforms. The results revealed that the most enriched
   pathways were related to the phagosome, apoptosis, and the AGE-RAGE
   signaling pathway ([137] Supplementary Figure 4 ).

   During AS events, splice sites are used with greater or lesser
   frequency to produce messages that differ in their exon content and
   structure ([138]Liu et al., 2017). Although this happens frequently,
   only a few of the AS events have been reported in aquaculture species.
   In this study, a total of 10,249 AS events were detected from the
   Iso-Seq database and further classified into five main types ([139]
   Figures 2E and [140]3D, E , [141]Supplementary Figures 5A–E ).
   Strikingly, intron retention (39.9%, 4,089) was the most enriched type
   of AS event, and exon skipping (27.99%, 2,869) was the second most
   prevalent AS event. The number of the two AS types accounted for more
   than half (67.86%) of the total AS events in L. maculatus.

   To verify the accuracy of isoforms identified by Iso-Seq, 10 genes with
   predicted AS events were randomly selected, and the existence and size
   of isoforms were validated by RT-PCR. Primers were designed in the
   overlapping regions of various transcripts derived from the same gene.
   The experimental results demonstrated that the amplified product sizes
   were consistent with predicted target fragments by Iso-Seq, confirming
   the credibility of our Iso-Seq data ([142] Figure 4 ).

Figure 4.

   [143]Figure 4
   [144]Open in a new tab

   Validation of alternative splicing (AS) events by RT-PCR. Ten AS events
   were randomly selected for RT-PCR validation. Gene name, gene ID, and
   transcript ID were shown in order to the left of the gene model.
   Splicing events of different transcripts were shown by red or blue
   lines connecting exons, and green arrowheads indicated the position of
   PCR primers (F, forward and R, reverse). Introns are represented by
   black lines. In the agarose gel images, the red or blue arrowheads
   indicated the PCR products generated from the corresponding
   transcripts.

Alternative Polyadenylation (APA) Events

   In our Iso-Seq data, the TAPIS pipeline was used to detect APA events
   in L. maculatus. The qualified gene loci for APA must be supported by
   at least two aligned FLNC reads. Of the 6,506 detected genes with
   evidence of a poly(A) site, 5,147 genes (79.11%) were found to contain
   a single poly(A) site ([145] Figures 2F and [146]5A ). The remaining
   1,359 (20.89%) genes contained two or more detected poly(A) sites, and
   14 genes were predicted to generate more than 5 poly(A) sites. An
   example, the transcripts structure of haptoglobin gene, which contained
   several distinct poly(A) sites, was illustrated in [147]Figure 5B .
   Additionally, a gene with APA events was randomly selected for the
   validation experiment using 3’RACE and Sanger sequencing ([148]
   Supplementary Figure 3B ).

Figure 5.

   [149]Figure 5
   [150]Open in a new tab

   Statistics of identified Alternative polyadenylation (APA) events. (A)
   Distribution of the number of poly(A) sites per gene. (B) IGV view of
   haptoglobin gene with APA events.

Long Non-Coding RNA (LncRNA)

   In our Iso-Seq database, a total of 3,112 lncRNAs were ultimately
   identified by intersection analysis of four computational approaches,
   including CNCI, CPC, CPAT, and Pfam ([151] Figures 2G and [152]6A ).
   Based on biogenesis positions relative to protein-coding genes of the
   genome of L. maculatus, 2,734 (87.85%) lncRNAs were further divided
   into four categories: 22.69% (706) were generated from intergenic
   regions (lincRNA), 20.18% (628) from intronic regions
   (intronic-lncRNA), 10.32% (321) from the antisense strand
   (antisense-lncRNA), and 34.67% (1,079) from the sense strand
   (sense-lncRNA) ([153] Figure 6B ). The relationship between lncRNAs and
   target genes was predicted based on their position (< 100 kb upstream
   or downstream) and base complementary. A total of 13,566 protein-coding
   gene loci were screened in the 100 kb upstream or downstream of 3,007
   lncRNAs. In total, 909 lncRNAs were found to have a base-pairing
   interaction with 14,080 mRNAs.

Figure 6.

   [154]Figure 6
   [155]Open in a new tab

   Statistics of long non-coding RNA and SSR in L. maculatus. (A) Venn
   diagram showing the number of lncRNAs annotated by four computational
   approaches including CPC, CNCI, CPAT, and Pfam. (B) The number of four
   types of lncRNA based on biogenesis positions relative to
   protein-coding genes. (C) Density of SSR motifs. c, compound SSR; p1:
   Mono-nucleotide SSR; p2, Di-nucleotide SSR; p3, Tri-nucleotide SSR; p4,
   Tetra-nucleotide SSR; p5, Penta-nucleotide SSR; p6, Hexa-nucleotide
   SSR.

Simple Sequence Repeat (SSR)

   Of 21,432 selected novel transcripts (novel isoforms from known gene
   loci and novel transcripts from novel gene loci), a total of 13,450
   transcripts were found to contain 29,690 SSR motifs ([156]
   Supplementary Table 3 ). Additionally, more than half of the SSR
   transcripts (7,401, 55.03%) contained at least 2 SSR motifs, and 5,019
   SSR motifs (16.95%) were classified as compound repeats. Of the
   detected SSR motifs, the mono-nucleotide motif (240.30/Mb) was the most
   abundant in density, followed by di- (82.00/Mb), compound (69.10/Mb),
   tri- (48.66/Mb), tetra- (5.40/Mb), penta- (0.87/Mb), and hexa-
   (0.45/Mb) nucleotide motifs ([157] Figure 6C ).

Fusion Transcripts and Transcription Factors (TFs)

   Fusion transcripts, such as chimeric mRNA transcripts, result from
   either trans-splicing of distinct genes or aberrant chromosomal
   translocations ([158]Wang et al., 2016). In our Iso-Seq dataset, a
   total of 365 fusion transcripts were identified, and their chromosome
   distribution was shown in [159]Figure 2H . Among them, 43 fusion
   transcripts were observed in the intra-chromosomic region, while the
   others (322) were in the inter-chromosomic region. The results of GO
   enrichment analysis showed that the fusion transcripts were primarily
   (top six) associated with cell (GO:0005623), cell part (GO: 0044464),
   catalytic activity (GO:0003824), binding (GO: 0005488), cellular
   process (GO: 0009987), and single-organism process (GO: 0044699) ([160]
   Supplementary Figure 6 ). Two fusion transcripts (PB.19, PB.130) were
   randomly selected and verified by RT-PCR ([161] Supplementary Figure 3C
   ).

   In our Iso-Seq, a total of 1,194 TFs transcripts generated from 723 TFs
   genes were identified and their detailed information was shown in
   [162]Supplementary Table 4 . Based on the Animal TFDB 3.0 database
   classification, these TFs belong to more than 52 families. It is the
   first time to extensively identified TFs using transcriptome dataset in
   L. maculatus, which provided a useful foundation for TFs studies in the
   future.

Differentially Expressed Transcripts (DETs) in Response to FW and SW
Environment

   To capture transcript-level expression differences in response to
   different salinity environment, the Illumina RNA-Seq data of gill
   tissue was aligned to the refined genome combined with both the
   reference genome and the Iso-Seq database for quantification. In total,
   265.90 million clean reads were mapped to the new database. Using these
   criteria, a total of 518 DETs covering 497 gene loci were identified,
   of which 264 transcripts were up-regulated and 254 transcripts were
   down-regulated in the SW relative to the FW group ([163] Supplementary
   Table 5 ). The distribution of DETs was illustrated in
   [164]Supplementary Figure 7 .

   These candidate DETs were classified into eight functional groups,
   including energy metabolism, immune response, molecule and ion
   transport and metabolism, protein biosynthesis, protein degradation,
   RNA processing and modification, signal transduction, and structure
   reorganization based on the combination of GO and KEGG annotation,
   enrichment analysis, and published literature ([165] Supplementary
   Table 6 ). These results indicated that transcripts showed different
   expression patterns in response to FW and SW environment. As shown in
   [166]Supplementary Figure 8 , among these DETs, exon skipping events
   were the most frequent AS type, accounting for 35.37% (208), followed
   by intron retention (32.48%, 191), which was different from their
   percentages in the Iso-Seq results.

   To verify the accuracy of the expression patterns of DETs by Iso-Seq,
   we randomly selected nine transcripts derived from four genes for qPCR
   validation ([167] Figure 7 ). The experimental results demonstrated
   that the expression patterns were consistent with our analysis results,
   confirming the credibility and accuracy of DETs results.

Figure 7.

   [168]Figure 7
   [169]Open in a new tab

   qPCR validation of DETs. The expression levels were showed as the
   log[2]fold change in SW relative to FW group. Gene abbreviations were
   as followed: slc4a1, solute carrier family 4a1; slc25a39, solute
   carrier family 25 member 39; nka-beta233, sodium/potassium-transporting
   ATPase subunit beta-233; mtGPAT1, glycerol-3-phosphate acyltransferase
   1, mitochondrial.

Genes With Distinct Differentially Expressed Transcripts (DETs)

   In our study, a total of 518 DETs were generated from 497 genes,
   suggesting that some genes could produce at least two DETs. A total of
   17 genes were found to generate several spliced variants, of which two
   or more variants were differentially expressed. Based on their
   expression patterns, two situations were observed for the genes ([170]
   Table 2 ). Firstly, DETs from the same gene loci exhibited the similar
   expression trends. For example, all three transcripts of Cysteine
   dioxygenase type 1 (cdo1) gene were significantly down-regulated in the
   SW relative to the FW group, and a similar expression pattern was also
   found in different transcripts of sodium/potassium-transporting ATPase
   subunit beta-233 (nka-beta233) gene. The second situation was that DETs
   of the same gene exhibited the opposite expression trends (including
   seven genes; the schematic diagram of their structures was illustrated
   in [171]Supplementary Figure 9 ). For example, 2,4-dienoyl-CoA
   reductase, mitochondrial (decr1), involved in the decomposition of
   polyunsaturated fatty acids, was alternatively spliced to generate two
   transcripts with opposite expression patterns. Similar events were also
   discovered in 6-phosphofructo-2-kinase fructose-2,6-bisphosphatase 3
   (pfkfb3), and prostaglandin D2 receptor 2 (ptgdr2). The results
   suggested that spliced transcripts of the same gene may be involved in
   diverse physiological functions.

Table 2.

   List of genes with distinct differentially expressed transcripts (DETs)
   in SW relative to FW environment.
   Gene name Gene ID Transcripts ID FDR Log[2]FC
   Genes with distinct DETs in similar expression patterns
   Pre-mRNA-processing factor 39 evm.TU.scaffold_1.98 PB.341.2 0.000106
   2.67500760
   PB.341.5 0.039993 1.53015860
   Sodium/potassium-transporting ATPase subunit beta-233
   evm.TU.scaffold_125.24 PB.2116.24 5.48E-05 −1.31694649
   PB.2116.25 0.04437353 −1.59047126
   PB.2116.26 1.22E-05 −1.79317407
   Immunoglobulin heavy chain evm.TU.scaffold_156.67 PB.3643.213 0.010499
   −3.25928268
   PB.3643.274 0.031384 −1.20801476
   Endothelial PAS domain-containing protein 1 evm.TU.scaffold_166.2
   PB.3978.4 2.99E-06 1.23813931
   PB.3979.2 0.001415 2.01478964
   Cuticle protein evm.TU.scaffold_180.61 PB.4536.1 0.007865 1.03098816
   PB.4536.2 0.014347 6.48597657
   PB.4536.5 0.03675 1.21411681
   PB.4536.13 7.81E-05 1.19817302
   Cytidine deaminase evm.TU.scaffold_226.10 PB.6084.1 0.002701 −2.9710025
   PB.6084.2 4.35E-12 −2.76291983
   Cysteine dioxygenase type 1 evm.TU.scaffold_3.79 PB.8015.1 1.42E-15
   −6.26435354
   PB.8015.2 2.13E-11 −5.18024918
   PB.8015.3 6.58E-23 −5.03618275
   Coiled-coil domain-containing protein 22 evm.TU.scaffold_309.8
   PB.8408.5 0.000442 -inf
   PB.8408.6 0.002339 −2.24439212
   Nucleolin 2 evm.TU.scaffold_41.58 PB.10153.4 0.000822 −1.82073360
   PB.10153.6 1.61E-05 -inf
   Interferon-induced very large GTPase 1 evm.TU.scaffold_56.81 PB.11500.1
   1.25E-24 −3.27867754
   PB.11500.2 1.16E-17 −2.07765032
   Genes with distinct DETs in opposite expression patterns
   KAT8 regulatory NSL complex subunit 3 evm.TU.scaffold_0.458 PB.253.1
   0.009568 −3.28744420
   PB.253.8 0.001073 3.68324251
   Ran GTPase-activating protein 1 evm.TU.scaffold_116.12 PB.1622.1
   0.046889 1.24749357
   PB.1622.5 −1.56839702
   6-Phosphofructo-2-kinase/fructose-2,6-bisphosphatase 3
   evm.TU.scaffold_118.17 PB.1707.1 6.17E-29 inf
   PB.1707.4 2.61E-23 −5.16585478
   Prostaglandin D2 receptor 2 evm.TU.scaffold_29.136 PB.7861.1 0.026324
   −3.1457994
   PB.7861.2 0.045375 2.30238325
   2,4-Dienoyl-CoA reductase, mitochondrial evm.TU.scaffold_313.25
   PB.8538.1 2.65E-06 −2.72486991
   PB.8538.2 0.025641 1.04664231
   Protein kinase C-binding protein 1 evm.TU.scaffold_44.98 PB.10442.1
   0.010769 −2.17623890
   PB.10442.2 1.13E-07 inf
   MAPK/MAK/MRK overlapping kinase evm.TU.scaffold_80.12 PB.13693.1
   0.039827 −3.55865208
   PB.13693.2 7.81E-05 inf
   [172]Open in a new tab

Differentially Expressed Transcription Factors

   Of 518 DETs, 17 DETs were identified as TFs belonging to 8 families,
   including C2H2-ZF, bHLH, ETS, Fork head, HMG, Homeobox, MBD, and RHD
   ([173] Supplementary Table 7 ). Both C2H2-ZF and bHLH TF families have
   been well characterized with roles in response to stresses ([174]Fujita
   et al., 2006; [175]Terova et al., 2008; [176]Steinberg, 2012). Of 17
   differentially expressed TFs, 14 TFs were found to be up-regulated in
   SW relative to FW group, suggesting that TFs may play important roles
   in response to hypertonic stress and enhance salt tolerance of L.
   maculatus.

Discussion

   In our study, we employed PacBio Iso-Seq to uncover the complexity of
   the L. maculatus transcriptome, providing the first comprehensive view
   of splice variants in aquaculture teleosts. Using Iso-Seq, 2,580 novel
   genes were discovered, accounting for more than 10% of the total number
   of genes in the L. maculatus genome. The most impressive gene was
   immunoglobulin heavy chain with 279 unique transcripts, which was even
   more than neurexin-1-alpha with 247 splicing variants in mouse
   ([177]Barbara et al., 2014). This suggests that Iso-Seq is advantageous
   for the identification of novel gene loci and the detection of
   alternative transcripts, which is consistent with a previous study. In
   addition, post-transcriptional events (AS and APA), lncRNAs, and fusion
   transcripts were predicted to improve the understanding of the
   complexity of L. maculatus. These results would be great resource for
   further analysis of post-transcriptional events and refinement of the
   annotation of the L. maculatus reference genome.

   Over the past decade, it has been shown that AS is a major mechanism
   for the enhancement of transcriptome and proteome diversity ([178]Keren
   et al., 2010). A certain AS event is the outcome of the cooperative or
   antagonistic interactions between RNA cis-elements and splicing factors
   ([179]Black, 2003; [180]Matlin et al., 2005), including members of the
   serine-arginine-rich protein family ([181]Fu, 1995), members of the
   heterogenous nuclear ribonucleoproteins family ([182]Krecic and
   Swanson, 1999), and other specific proteins ([183]Underwood et al.,
   2005). Accumulating evidence indicates that numerous stimuli, such as
   growth factors, cytokines, and stress, would alter the choice of splice
   sites and produce multiple transcripts ([184]Barbazuk et al., 2008).
   Multiple transcripts in teleosts could promote the tolerance to
   stresses ([185]Xia et al., 2017; [186]Tan et al., 2018). In our study,
   several typical splicing factors, including serine/arginine-rich
   splicing factor 1, serine/arginine-rich splicing factor 7, RNA-binding
   protein 5, RNA-binding protein 33, RNA-binding protein 39, and
   RNA-binding protein 47, were differentially expressed under salinity
   challenge, indicating the splice factors and AS events in L. maculatus
   can be activated by salinity stimuli. The stress-induced AS events
   could increase the tolerance to the stresses by two different
   mechanisms. 1) stress-induced AS events could generate aberrant
   transcripts with splicing errors, which would be removed by nonsense
   mediated mRNA decay ([187]Wollerton et al., 2004; [188]Chang et al.,
   2007). This mechanism could weaken the function of the corresponding
   genes by decreasing the abundance of functional transcripts
   ([189]Maquat, 2004; [190]Chang et al., 2014; [191]Cui et al., 2014). 2)
   AS transcripts could encode unique protein, often with alternations in
   localization, activity, and function ([192]Wang et al., 2008;
   [193]Kalam et al., 2017). Moreover, their biological function change
   and expression abundance regulation are largely independent process to
   increase organismal tolerance against stresses. For example, in human,
   Na^+/K^+/2Cl^- cotransporter (nkcc2) is proved to be key regulator
   associated with salt and water homoeostasis in kidney. At least 3 nkcc2
   transcripts are generated via different splicing of exon 4
   ([194]Schiessl and Castrop, 2015). The exon 4 encodes the second
   transmembrane domain, which is crucially involved in the Cl^- binding
   ([195]Haas and Mcmanus, 1983; [196]Schiessl and Castrop, 2015). As a
   result, these nkcc2 transcripts differ markedly in their ion affinities
   and transport characteristics ([197]Haas and Mcmanus, 1983;
   [198]Schiessl and Castrop, 2015). The nkcc2 splicing is need for
   enhanced ion reabsorption during a salt-restricted diet, even without
   changes in total nkcc2 abundance ([199]Schiessl and Castrop, 2015). In
   our study, three nkcc2 transcripts were also identified in L. maculatus
   and their splicing mode has been shown in [200]Supplementary Figure 5A
   . These transcripts in L. maculatus were generated by different
   splicing mode similar with those in human, which 5 exons were lost
   between exon 2 and 8; intron is retained between exon 23 and 24. The
   various splicing modes may also change transmembrane domain of nkcc2
   and influence their ion affinities. However, their specific
   physiological function is required to be further studied in the future.
   Besides, AS transcripts could also regulate their expression abundance
   to cope with stress, which has been widely reported in previous
   transcriptomic studies ([201]Xia et al., 2017). In our study, a total
   518 DETs were identified in response to different salinity environment.
   However, the functional significance of most spliced transcripts in
   teleosts is yet to be fully elucidated. Hence, their gene function
   would be discussed as follows.

   In response to cell shrinkage and swelling caused by salinity stress,
   fishes need to cope with salt depletion or gain, and water loss or
   gain. In the molecule and ion transport and metabolism groups, the
   expression of sodium/potassium-transporting ATPase subunit alpha-2 and
   beta-233 transcripts were significantly up-regulated in the FW
   environment. Both the sodium/potassium-transporting ATPase subunit
   alpha-2 and beta-233 genes are members of the
   sodium/potassium-transporting ATPase family, which play an important
   role in providing a driving force for ion transport to maintain cell
   osmotic balance and volume in euryhaline teleosts, such as senegal sole
   (Solea senegalensis)([202]Skou and Esmann, 1992; [203]Feng et al.,
   2002; [204]Armesto et al., 2014). Solute carrier family 4 a1 (slc4a1)
   is generally accepted as a bifunctional protein with both Cl^-/HCO3^-
   exchange and Cl^-/taurine channel functions ([205]Romero et al., 2013).
   It has been proposed that hypotonic stress induces taurine movements
   via an anion channel that is depending on or controlled by slc4a1
   ([206]Fiévet et al., 1995). Consistent with previous studies, our data
   indicated that the expression of slc4a1 in the FW environment was
   up-regulated in response to swelling stress. In coping with hypertonic
   stress, adenosylhomocysteinase 2 gene could reduce the apparent
   affinity for intracellular Mg^2+ in the inhibition of slc4a1 currents,
   which explains the high expression level of adenosylhomocysteinase 2 in
   the SW environment ([207]Soichiro and Toru, 2014).

   An adequate and timely energy supply is a prerequisite for enzymes and
   transporters used in iono- and osmoregulatory processes ([208]Tseng and
   Hwang, 2008). The oxidation of glucose and fatty acids is the major
   source of energy for organisms ([209]Lavrentyev et al., 2004). In the
   energy metabolism group, transcripts of
   6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase 3 and
   2,4-dienoyl-CoA reductasewere differentially expressed in response to
   salinity stress. 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase 3
   plays a role in maintaining elevating fructose-2,6-bisphosphate levels,
   which is considered as the major regulator controlling carbon flux
   through glycolysis ([210]Sakakibara et al., 1997; [211]Chesney et al.,
   1999; [212]Alexander et al., 2002; [213]Kawaguchi et al., 2015).
   2,4-dienoyl-CoA reductase encodes an essential enzyme that participates
   in the beta-oxidation and metabolism of polyunsaturated fatty enoyl-CoA
   esters ([214]Gurvitz et al., 1999).

   Adaptive and acclamatory responses of fish to salinity stress depend on
   efficient mechanisms of osmosensing and osmotic stress signaling
   ([215]Kultz, 2015). Instead of directly coupling osmosensors to osmotic
   effector proteins, large-scale osmoregulatory mechanisms are operated
   by linking molecular osmosensors to cell signaling pathways to initiate
   adaptive reactions ([216]Evans, 2010). In our study, several DETs were
   involved in typical signal transduction, such as mitogen-activated
   protein kinase kinase kinase 14, tyrosine-protein kinase Fyn-like, rho
   GTPase-activating protein 35, tyrosine kinase 2, and
   serine/threonine-protein kinase Sgk2. They may integrate and amplify
   signals from osmosensors to activate appropriate downstream targets
   mediating physiological acclimation ([217]Kültz, 2010; [218]Zhang et
   al., 2017).

   In addition, one of the heat shock 70 kDa protein transcripts was
   differentially expressed in the SW group. Heat shock 70 kDa protein,
   known as chaperone proteins, is pivotal in maintaining protein
   homeostasis by interacting with stress-denatured proteins to prevent
   their aggregation and malfolding ([219]Parsell and Lindquist, 1993). In
   the protein degradation classification, many DETs were involved in
   ubiquitin. Ubiquitin in cells acts as a covalent modifier of proteins
   in functionalization and degradation, which is dependent on ubiquitin
   ligase ([220]Lyu et al., 2018). E3 ubiquitin proteins are the final
   enzymes in the ubiquitin-proteasome pathway, regulating protein
   degradation, cell growth, and apoptosis in response to environmental
   changes ([221]Mani and Gelmann, 2005; [222]Sardella and Kultz, 2009;
   [223]Li et al., 2014).

   Cytoskeletal organization is notably affected by perturbations in cell
   volume. Thus, cytoskeletal protein has been considered as a putative
   osmosensor. Correspondingly, several DETs are found to be involved in
   structural components of the cytoskeleton, such as cuticle protein,
   filamin-B, and beta tubulin. In addition, previous reports demonstrate
   that salinity could enhance the abundance of innate immune defenses
   proteins, and chronic salinity stressors could stimulate the
   proliferation and antimicrobial functions of innate immune cells, as
   well as the release of pro-inflammatory cytokines, in several
   euryhaline and stenohaline fish species ([224]Cuesta et al., 2005;
   [225]Delamare-Deboutteville et al., 2006; [226]Jiang et al., 2008;
   [227]Schmitz et al., 2016). In L. maculatus, several transcripts,
   encoding immune-related proteins, also exhibited differentially
   expressed profiles, such as classical MHC class I molecule alpha-chain,
   tumor necrosis factor receptor superfamily member 6B, IgGFc-binding
   protein-like, and leucine-rich-repeat-containing protein C3.

   Recently, accumulating evidence indicates that TFs are also crucial in
   mediating organism adaptation to salinity stresses by activating or
   suppressing downstream genes in the pathway ([228]Fujita et al., 2006;
   [229]Nie et al., 2019). Indeed, TFs are also greatly affected by AS
   events and TF transcripts with the alternative function often are low
   in abundance. One interesting example is the sex determination
   mechanism of fruit fly. Sex-lethal, acting as a master regulatory
   switch in female flies, plays a key role in orchestrating the changes
   in gene expression responsible for all aspects of sexual determination
   in fruit fly ([230]Förch and Valcárcel, 2003). Despite the presence of
   Sex-lethal transcripts in both sexes, however, Sex-lethal protein is
   only produced in female flies. It is resulted from a critical
   difference between the transcripts in the two sexes: exon 3 with stop
   codons in frame is included in male flies and skipped in females
   ([231]Salz et al., 1989; [232]Bell et al., 1991). However, little is
   known about similar mechanism in response to salinity. In our study, a
   total of 1,194 TF transcripts from 723 genes are identified in L.
   maculatus, suggesting AS events are common in the TF genes of L.
   maculatus. The question remains as to whether there exists alternative
   biological function among TF transcripts. A satisfactory answer to this
   question will require further researches in the future. Additionally, a
   total of 17 TF transcripts were differentially expressed after salinity
   change, including members of C2H2-ZF, bHLH, ETS, and others. Previous
   studies have demonstrated that C2H2-ZF ([233]Steinberg, 2012), bHLH
   ([234]Terova et al., 2008; [235]Liu et al., 2009), Homeobox ([236]Nie
   et al., 2019), RHD ([237]Carlsen et al., 2004), and ETS ([238]Wasylyk
   et al., 1998) TFs could be crucial in increasing stresses tolerance by
   signal transduction or modulation pathway. MBD TFs are mainly involved
   in the cytosine methylation of the nuclear DNA ([239]Nan et al., 1998)
   and HMG proteins are ubiquitous nuclear proteins that bind to DNA,
   nucleosomes and induce structural changes in the chromatin fiber
   ([240]Hock et al., 2007). In our study, a total of 17 differentially
   expressed TFs suggested their important roles in response in salinity
   change in L. maculatus.

   Post-transcriptional regulatory mechanisms, including AS, APA, and
   fusion transcripts, make essential contribution to physiological
   function regulatory of aquaculture species. For example, AS events have
   been studies in Pacific oyster (Crassostrea gigas) ([241]Huang et al.,
   2016), Nile tilapia ([242]Xia et al., 2017) and channel catfish
   (Ietalurus punetaus) ([243]Tan et al., 2018) using Illumina RNA-Seq
   datasets. However, these RNA-Seq projects of aquaculture species obtain
   transcripts only based on short read-based assembly, which would limit
   the accuracy of identification of post-transcriptional events. In our
   study, it is the first time that Iso-Seq is applied in the aquaculture
   teleost, which has detected numerous full-length transcripts and
   characterized many post-transcriptional regulatory events in L.
   maculatus. It creates a paradigm for future post-transcriptional
   regulatory studies of aquaculture species in transcriptome wide.
   Besides, this study investigates the DETs of euryhaline L. maculatus in
   response to different salinity environment. It has pushed the limit of
   previous gene-level transcriptome studies ([244]Zhang et al., 2017),
   which would be helpful to unveil molecular mechanism of coping with
   salinity stress in fishes.

Conclusion

   In our study, we applied PacBio Iso-Seq to yield a new set of
   transcriptomic data of L. maculatus as follows: 28,809 non-redundant
   transcripts, 10,249 AS events, 1,359 APA events, 3,112 lncRNA, 29,609
   SSRs, 365 fusion transcripts, and 1,194 TFs. It is the first time in
   aquaculture teleosts that Iso-Seq was applied to unveil the
   transcriptome complexity. To investigate transcripts involved in
   salinity regulation in L. maculatus, RNA-Seq data was combined with
   Iso-Seq results and identified 518 DETs in different environment.
   Notably, transcripts from the same genes may exhibit similar or
   opposite expression patterns. In addition, the expression level of 14
   TFs is significantly up-regulated in SW environment, implying their
   roles in hypertonic stress. Our study not only improves current gene
   models of L. maculatus, but also enhances the understanding of salinity
   regulatory mechanisms in euryhaline teleosts.

Data Availability Statement

   The raw sequences of our study have been submitted to the Sequence Read
   Archive (SRA) of National Center for Biotechnology Information (NCBI)
   with the accession number of PRJNA515783 (BioProject ID of Iso-Seq) and
   PRJNA515986 (BioProject ID of RNA-Seq). Reference genome of L.
   maculatus was downloaded from NCBI with the accession number of
   PRJNA407434 (BioProject ID).

Ethics Statement

   This study was carried out in accordance with the recommendations of
   “Animal Research and Ethics Committees of Ocean University of China
   (Permit Number: 20141201).” The protocol was approved by the “Animal
   Research and Ethics Committees of Ocean University of China.”

Author Contributions

   YL and YT conceived the study. YT, XQ, XZ, and SL performed
   bioinformatics analysis. YL provided funding support. WY collected
   samples and extracted RNA samples. HW, JL, and FH administrated the
   project. BL and YS verified the sequencing results. All authors read
   and approved the final manuscript.

Funding

   This study was supported by National Natural Science Foundation of
   China (No.31602147), National Key R&D Program of China
   (No.2018YFD0900101), and China Agriculture Research System (No.
   CARS-47).

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Supplementary Material

   The Supplementary Material for this article can be found online at:
   [245]https://www.frontiersin.org/articles/10.3389/fgene.2019.01126/full
   #supplementary-material
   Supplementary Figure 1

   ROIs length distribution of three size bins.
   [246]Click here for additional data file.^ (264.2KB, jpeg)
   Supplementary Figure 2

   Examples of structures showing the gene with different types of
   isoforms. The blue bars indicated the annotated gene model in the L.
   maculatus genome. In (A), the orange bars indicated the isoform
   structure detected by Iso-Seq. Lines represented introns, and arrows
   indicated the orientation of transcription. In (B), novel gene and
   their transcripts were absent in the annotation of the genome.
   [247]Click here for additional data file.^ (2.5MB, jpeg)
   Supplementary Figure 3

   Examples of validation experiments for (A) novel isoforms, (B) APA
   event and (C) fusion transcripts.
   [248]Click here for additional data file.^ (3.2MB, jpeg)
   Supplementary Figure 4

   KEGG analysis of the genes with more than four isoforms. (A) KEGG
   pathway annotation; (B) Statistics of pathway enrichment.
   [249]Click here for additional data file.^ (2MB, jpeg)
   Supplementary Figure 5

   Examples of five types of alternative splicing events detected by
   Iso-Seq. The blue bars indicated the annotated gene model in the L.
   maculatus genome, and dark green bars indicated the transcripts
   structure detected by Iso-Seq. (A) Example of intron retention events
   in evm.TU.scaffold_294.3; (B) Example of exon skipping events in
   evm.TU.scaffold_229.19; (C) Example of alternative 3’ splice site
   events in evm.TU.scaffold_13.295; (E) Example of alternative 5’ splice
   site events in evm.TU.scaffold_81.120; (E) Example of mutually
   exclusive exon events in evm.TU.scaffold_98.14;
   [250]Click here for additional data file.^ (11.9MB, jpeg)
   Supplementary Figure 6

   Histogram of gene ontology classifications of L. maculatus fusion
   transcripts.
   [251]Click here for additional data file.^ (296.6KB, png)
   Supplementary Figure 7

   Volcano plot showing the DETs between the FW and SW treatment groups.
   The horizontal axis was the log2 fold change in SW relative to FW
   groups. The vertical axis was the -log10 false discovery rate. Green
   dots represented significantly down-regulated transcripts in SW
   relative to FW group, while red dots represented significantly
   up-regulated transcripts. Black dots represented transcripts without
   significant expression difference between two groups.
   [252]Click here for additional data file.^ (188.7KB, jpeg)
   Supplementary Figure 8

   Pie chart showing frequencies of five types of alternative spliced
   events in DETs.
   [253]Click here for additional data file.^ (292.6KB, jpeg)
   Supplementary Figure 9

   The transcripts structures of DETs with opposite expression patterns
   generated from 7 genes. Blue bars indicated the annotated gene model in
   the L. maculatus genome, and orange bars indicated the transcripts
   detected by Iso-Seq. Transcripts with DETs were marked with red dashed
   rectangles.
   [254]Click here for additional data file.^ (6.3MB, jpeg)
   [255]Click here for additional data file.^ (31.3KB, docx)
   [256]Click here for additional data file.^ (5.7MB, xls)
   [257]Click here for additional data file.^ (190.6KB, docx)

References