Abstract

   Seabuckthorn carpenter moth, Eogystia hippophaecolus (Lepidoptera:
   Cossidae), is an important pest of sea buckthorn (Hippophae
   rhamnoides), which is a shrub that has significant ecological and
   economic value in China. E. hippophaecolus is highly cold tolerant, but
   limited studies have been conducted to elucidate the molecular
   mechanisms underlying its cold resistance. Here we sequenced the E.
   hippophaecolus transcriptome using RNA-Seq technology and performed de
   novo assembly from the short paired-end reads. We investigated the
   larval response to cold stress by comparing gene expression profiles
   between treatments. We obtained 118,034 unigenes, of which 22,161 were
   annotated with gene descriptions, conserved domains, gene ontology
   terms, and metabolic pathways. These resulted in 57 GO terms and 193
   Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. By comparing
   transcriptome profiles for differential gene expression, we identified
   many differentially expressed proteins and genes, including heat shock
   proteins and cuticular proteins which have previously been reported to
   be involved in cold resistance of insects. This study provides a global
   transcriptome analysis and an assessment of differential gene
   expression in E. hippophaecolus under cold stress. We found seven
   differential expressed genes in common between developmental stages,
   which were verified with qPCR. Our findings facilitate future genomic
   studies aimed at improving our understanding of the molecular
   mechanisms underlying the response of insects to low temperatures.

Introduction

   Insects are poikilothermic animals and thus temperature is one of the
   most important abiotic factors influencing their distribution and
   behavior [[32]1,[33]2]. Nearly every aspect of an insect’s life is
   affected by temperature, including survival, development, reproduction,
   and longevity [[34]3–[35]6]. Nevertheless, insects have evolved to
   flourish in a wide variety of thermal environments [[36]3]. They have
   achieved this through the natural selection of a variety of strategies
   to cope with changes in ambient temperatures. Some insects overwinter
   by migrating to warmer climes or by moving to protected habitats
   [[37]7–[38]9]. Most insects, however, survive through the winter as a
   result of a variety of physiological and biochemical adaptations that
   can be grouped into two general strategies: freeze intolerance or
   freeze tolerance [[39]10–[40]12]. Freeze-intolerant insects are very
   sensitive to ice formation. They avoid freezing by supercooling to
   survive at low temperatures [[41]1]. They achieve this by reducing
   intra-cellular water content, synthesizing cryoprotectants, and
   actively removing ice-nucleating agents (INAs) [[42]7,[43]12,[44]13].
   Compared with freeze intolerance, the freeze-tolerant strategy is more
   advanced because it permits extracellular ice formation within tissues.
   Ice formation is triggered by INAs such as insect ice nucleating
   proteins (INPs), which are often lipoproteins [[45]14, [46]15].

   To date, considerable research has been conducted on the biochemical
   mechanisms of insect cold resistance. Increasingly, genomic and
   proteomic screening methods are being applied to discover the genes and
   proteins that enable winter survival, which is greatly broadening our
   understanding of important molecular processes. Two key proteins have
   so far been recognized to directly contribute to insect cold hardiness:
   anti-freeze proteins (AFPs) and INPs [[47]16]. Other proteins
   contribute to cold hardiness indirectly, including heat-shock proteins
   (HSPs), antioxidant enzymes, and enzymes involved in cryoprotectant
   synthesis and degradation [[48]17]. In the transcriptomic studies of
   Dunning et al., (2013) and Dennis et al., (2015), a set of structural
   cuticular genes were identified to be associated with stick insect
   adaptation to the alpine environment. [[49]18, [50]19]. They also
   identified three candidate genes that responded to low temperature:
   prolyl 4-hydroxylase subunit alpha-1, staphylococcal nuclease
   domain-containing protein 1 and a cuticular protein gene [[51]20].
   Durant et al’s (2015) research on transcription profiling of the brain
   in Ceratina calcarata showed that changes in gene expression associated
   with overwintering, included an underrepresentation of genes related to
   muscle fibers and an overrepresentation of genes related to lipid
   metabolic processes [[52]21].

   Because RNA transcript levels generally reflect physiological demand,
   changes in transcript abundance should identify genes involved in an
   organism’s response to extrinsic stimuli [[53]22]. Transcriptome
   analysis using Next Generation technologies has been proven to be a
   powerful method for exploring the underlying basis of the cold response
   in a variety of insect lineages. Examples include the study of
   transcriptional profiling of overwintering gene expression in a
   carpenter bee [[54]21], the identification of cold-responsive genes in
   an alpine stick insect [[55]20], the analysis of divergent
   transcriptional responses in alpine and lowland stick insects [[56]18],
   and the study of cold adaptation in eight genera stick insects
   [[57]19]. Compared with traditional differential genes expression
   methods such as expressed sequence tags (EST), microarrays, serial
   analysis of gene expression (SAGE), or massively parallel signature
   sequencing (MPSS), a complete transcriptome sequence and analysis can
   be completed for any species with high efficiency without a reference
   genome. Moreover, transcripts normally expressed at very low levels can
   be detected and quantified [[58]23–[59]26]. While transcriptome
   analysis have been used in studies of the cold response of Drosophila
   melanogaster and D. subobscura [[60]27–[61]29], its application to
   study cold hardiness in other insects has been quite limited.

   The seabuckthorn carpenter moth Eogystia hippophaecolus (Hua, Chou,
   Fang & Chen, 1990) is the dominant boring insect pest of sea buckthorn
   (Hippophae rhamnoides L.). Sea buckthorn is a shrub that is distributed
   across northern and western China. The plant grows well in sandy soils
   in cold climates. It has been extensively planted to prevent soil
   erosion and desertification, and also to serve as a source for food and
   medicine products [[62]30]. E. hippophaecolus completes a generation
   every three to four years and most of its life history is spent in 16
   larval stages lasting about 31 months [[63]31]. Eggs are mainly
   deposited in wounds or in cracks of the bark. The egg stage lasts 7–30
   days. The early larvae feed on trunk phloem at first. Small larvae
   often congregate in large numbers and later migrate to the base of the
   truck or to the roots before winter. Larvae are very cold tolerant and
   can survive in temperatures lower than -30°C. The larvae obstruct water
   transport in sea buckthorn plants by burying into the trunk and roots.
   The shape of the galleries is irregular and most are not connected to
   each other. Each gallery is filled with sawdust and fecula. The mature
   larvae will climb out of the wormhole, and then burrow 15 cm deep into
   the soil in June. The pupal stage lasts about 30 days and is followed
   by eclosion, mating, egg-laying, and hatching in July, followed by
   overwintering in October [[64]32]. Its voraciousness, high reproduction
   rate, and inconspicuous habit make the pest difficult to control. In
   China, seabuckthorn carpenter moth is considered a major threat to sea
   buckthorns, and it can also feed on other plants, such as elms
   [[65]33,[66]34]. In preliminary studies measuring physiological indices
   (e.g., SCP, freezing point, etc.) of cold resistance in E.
   hippophaecolus, we classified the seabuckthorn carpenter moth as a
   highly freeze-tolerant organism [[67]35].

   In this study, we used RNA-Seq and de novo transcriptome assembly to
   generate transcriptomes and examine the changes in the regulation of
   transcription associated with cold treatment. Detailed differential
   expression analysis revealed a number of candidate genes that are
   potentially related to the cold tolerance of E. hippophaecolus. Our
   results provide new genetic information about this insect pest and
   provide a rich resource for the discovery and identification of novel
   genes involved in the cold stress response.

Materials and methods

Insect material and RNA extraction

   E. hippophaecolus larvae were collected from Jianping County, Chaoyang
   City, Liaoning Province, China, in late December 2014. The latitude and
   longitude of the collection site are 119.637° E and 41.3954° N. Given
   the long life history and many larval stages of this species, we chose
   three developmental stages (seventh instar, tenth instar, and
   thirteenth instar) [[68]31] and two treatment temperatures to
   characterize variation in gene expression. Healthy larvae in different
   developmental stages were treated as presented in [69]Table 1. Prior to
   treatment, we cleaned the moths with sterile water and starved them for
   three days. All of the instruments used to handle the moths were
   sterilized before use. After housing for 48 h at room temperature
   (25°C), three samples of each development stage were treated
   differently: one stayed at room temperature as a control and the other
   two were kept in 5°C and -5°C, respectively, for 10 h, and then
   snap-frozen in liquid nitrogen and stored at -80°C until the RNA
   extraction step. Total RNA was extracted using TRIzol reagent (Ambion)
   and the RNeasy Plus Mini Kit (No. 74134; Qiagen, Hilden, Germany)
   following the manufacturer’s instructions. A NanoDrop 8000 (Thermo,
   Waltham, MA, USA) was used to quantify the RNA. The mRNA samples were
   purified and fragmented using the TruSeq RNA Sample Preparation Kit
   v2-Set A (No. RS-122-2001; Illumina, San Diego, CA, USA) to remove
   rRNA. After additional quality control, cDNA library construction,
   Illumina sequencing, and de novo transcriptome assembly were performed
   by the Beijing Biomarker Biotechnology Co. (Beijing, China). In view of
   the costs, 5°C and -5°C samples were pooled by instar to construct the
   cDNA library. This method has been used in previous transcriptome
   studies [[70]36, [71]37]. No specific permits were required for the
   described field studies. The location is not privately owned or
   protected, and the field studies did not involve endangered or
   protected species.

Table 1. Schematic table of the experimental design used to investigate
effects of cold stress.

                       control group Treatment group
   Developmental stage     25°C      5°C     5°C
       7th instar           T1             T4
       10th instar          T2             T5
       13th instar          T3             T6
   [72]Open in a new tab

   Note: There was one larva in each different treatment. After treatment,
   the total RNA of larvae under 5°C and 5°C were mixed for sequencing
   library construction. Samples were named T1~T6.

Illumina sequencing and de novo transcript assembly

   We sequenced the E. hippophaecolus transcriptome on an Illumina HiSeq
   2500 platform and performed de novo assembly from the short paired-end
   reads (200bp). Before transcript assembly, all raw reads were processed
   to remove low-quality and adaptor sequences by Trimmomatic 0.32
   ([73]http://www.usadellab.org/cms/index.php?page=trimmomatic) using the
   following parameters (Phred64 LEADING:3 TRAILING:3 SLIDING WINDOW:4:15
   MINLEN:36), while the unpaired reads were discarded [[74]38]. The clean
   reads were then assembled into contigs and the contigs were
   subsequently combined into transcripts based on paired-end information
   using the Trinity platform (Version: r2013.11.10) [[75]39]. CD-HIT was
   used to exclude redundant sequences and the clustering was implemented
   using TGICL (2.1) [[76]40] to combine sequences with more than 90%
   similarity and an overlapping length longer than 35bp. The completeness
   assessment was performed by BUSCO (version 2), using the insecta
   lineage [[77]41]. The transcripts were blasted to various RNA databases
   including Silva, GtRNAdb by BLAST software (version 2.2.26) for
   filtering (e-value≤10^−5). After filtering, ribosomal RNA (rRNA),
   transfer RNA (tRNA), small nuclear RNA (snRNA), microRNA (miRNA), small
   nucleolar RNA (snoRNA), repeat sequences, and other non-coding RNA
   (ncRNA) were removed. The list of the sequences identified as ncRNA has
   been shown in [78]S1 Table.

   The raw data has been submitted to the NCBI Sequence Read Archive (SRA)
   with accession number SRR4409152. The mapped read counts have been
   submitted to the Gene Expression Omnibus (GEO) with accession number
   [79]GSE92686.

Gene functional annotation

   We utilized BLAST software (version 2.2.26) [[80]42] to obtain
   homologous sequences (e-value≤10^−5) by searching several public
   protein databases, including the National Center for Biotechnology
   Information (NCBI) non-redundant (nr) protein database (downloaded on
   02/2015) [[81]43], the Swiss-Prot protein database 51.0 [[82]44], the
   Gene Ontology (GO) database (downloaded on 12/2014) [[83]45], the Kyoto
   Encyclopedia of Genes and Genomes (KEGG) pathway database 73.1
   [[84]46], and the Cluster of Orthologous Groups (COG) database
   (downloaded on 2013) [[85]47]. We used the single best blast hit only
   to identify the contig and to extract GO information, while
   non-arthropod contigs were filtered out.

   After predicting the amino-acid sequences of unique genes by using
   Transdecoder (downloaded on 2015), we obtained functional information
   about the unigenes by comparing these to the Pfam database using HMMER
   software (e-value≤10^−10) [[86]48].

Differential gene expression analysis

   Bowtie (4.4.7) [[87]49] was used to align reads to reference sequences.
   Gene expression levels were measured in the RNA-Seq analysis as reads
   per kilobase of exon model per million mapped reads (RPKM) [[88]50].
   The EBSeq software was used to identify differentially expressed genes
   in pair-wise comparisons and separate models were fit for each
   development stage[[89]51]. The results of all statistical tests were
   corrected for multiple testing with the Benjamini–Hochberg false
   discovery rate with the following parameter values: FDR < 0.01.
   Sequences were deemed to be significantly differentially expressed if
   the adjusted P value obtained by this method was <0.001 and there was
   at least a twofold change (>1 or <-1 in log2 ratio value which were
   calculated by the average RPKM value of treatment libraries divided by
   the average RPKM value of control libraries). K-Means clustering of
   gene expression was performed using the R Package software v.3.1.3 to
   examine the relationships between samples. GO annotation and KEGG
   pathway analysis were also performed to assess the differential
   expression results. To annotate the gene with gene ontology (GO) terms,
   the nr BLAST results were imported into the Blast2GO program (Conesa,
   Gotz et al. 2005). GO annotations for the genes were obtained by
   Blast2GO. This analysis mapped all of the annotated genes to GO terms
   in the database and counted the number of genes associated with each
   term. A Perl script was then used to plot GO functional classification
   for the unigenes with a GO term hit to view the distribution of gene
   functions. The gene sequences were also aligned to the Clusters of
   Orthologous Group (COG) database to predict and classify functions
   (Tatusov, Galperin et al. 2000). We used KOBAS
   ([90]http://kobas.cbi.pku.edu.cn/home.do) to annotate a set of
   sequences with KO terms and identify both the most frequent and the
   statistically significantly enriched pathways. KEGG pathways were
   tested for enrichment using the right-sided Fisher's exact test. We
   removed the T3 transcript and its corresponding control, T6, from the
   DEG analysis because of large, unexplained variation or low correlation
   between the T3 and other samples.

Gene expression validation by real-time quantitative PCR

   The seven genes DE in response to cold treatment in both developmental
   stages were chosen for validation using qRT-PCR. Total RNA was reverse
   transcribed using the PrimeScript RT Reagent Kit with gDNA Eraser (No.
   RR047A; TaKaRa, Shiga, Japan). Gene-specific primers were designed
   using Primer3 ([91]http://bioinfo.ut.ee/primer3-0.4.0/). qRT-PCR was
   performed on a Bio-Rad CFX96 real-time PCR System (Hercules, CA, USA)
   using SYBR Premix Ex-Taq II (No. RR820A; TaKaRa), according to the
   manufacturer's instructions.

   The template cDNAs were diluted 20-fold with nuclease-free deionized
   water. Of this, 2 μl was added to the reaction mixture (25 μl), which
   included 12.5 μl of SYBR Premix Ex Taq II, 1 μl of each primer (10 μM),
   and 8.5 μl of ddH[2]O. The PCR was performed with the following thermal
   cycle: 95°C for 30 s, 40 cycles at 95°C for 5 s, 60°C for 30 s and 65°C
   to 95°C in increments of 0.5°C for 5 s to generate melting curves. Each
   reaction was performed on three biological and four technical
   replicates. The average threshold cycle was calculated to estimate the
   relative gene expression levels using the 2^−ΔΔCt method [[92]52]. The
   data were expressed as the mean ± SE, and all primer information is
   listed in [93]S2 Table. A t-test was performed for the qPCR data.
   β-Actin was used as the reference gene to normalize target gene
   expression [[94]53].

Results

Sequencing and assembly

   A total of 39.00 Gb of clean data passed the Illumina quality filter
   after transcriptome sequencing of six cDNA samples. The number of the
   paired-end reads and the length of these reads are provided in [95]S3
   Table. The Q30 and GC contents were 88.30% and 44.93%, respectively.
   The quality control result of six samples was seen in [96]S3 Table. The
   de novo assembly using Trinity software and standard parameter values
   yielded 191,200 transcripts (none shorter than 200 bp) with an N50 of
   1,687 bp. The unigene dataset included 118,034 sequences with an N50 of
   904bp. The length distribution of unigenes closely followed the length
   distribution of transcripts [97]S1 Fig). Coverage of clean reads mapped
   to the assembly was provided in [98]S3 Table. The completeness
   assessment result showed that 90.0% of BUSCO genes were “complete”, of
   which 84.3% were single-copy BUSCOs and 5.7% were duplicated BUSCOs.
   6.2% were “fragmented”, while the remaining 3.8% were “missing”.

Annotation of predicted proteins

   After annotation, genes with a significant blast hit to arthropods were
   identified. In total, 22,161 unigenes were found in at least one public
   database, such as the nr database, the Swiss-Prot protein database, the
   GO database, the KEGG database, and the COG database. The SwissProt
   database (21,452 annotated unigenes, 21.47%) had the most matches,
   followed by the nr database (21,232, 21.25%). A total of 77,735
   (77.82%) unigenes were not found in the public databases ([99]Table 2).
   Overall, most of the unigenes either could not be annotated or their
   descriptions were uninformative (e.g., putative, unknown, hypothetical,
   or unnamed protein). Among the annotated unigenes, more than 70% of
   them matched with other lepidopteran species. The best homologous
   species match in the nr database for each sequence is shown in [100]S2
   Fig. The top matched species were the silkworm (Bombyx mori), with 8627
   unigenes (40.73%), followed by the monarch butterfly (Danaus
   plexippus). The blast information for each contig was provided in
   [101]S4 Table.

Table 2. Statistics of annotation analysis of unigenes.

   Databases            Unigene 300≤length<1000 length≥1000
   COG_Annotation       4,452   692             3,272
   GO_Annotation        10,525  2,613           5,578
   KEGG_Annotation      7,611   1,708           4,257
   KOG_Annotation       10,784  2,266           6,763
   Pfam_Annotation      12,413  2,893           7,857
   Swissprot_Annotation 21,452  6,629           9,654
   nr_Annotation        21,232  6,540           9,621
   All                  22,161  6,967           9,723
   [102]Open in a new tab

Global functional classification of unigenes by GO and KEGG

   One unigene can have different functional annotations; therefore,
   38,435 unigenes in total with a match in the GO database were
   classified by their GO terms. The unigenes could be categorized into 57
   functional sub-groups of the three main GO groups: 15,798 unigenes fell
   in the biological process group, 6,721 in the cellular component group,
   and 15,916 in the molecular function group ([103]Fig 1). The most
   frequent GO terms were: metabolic process (6,071 unigenes), binding
   (5,492 unigenes), cellular process (5,258 unigenes), catalytic activity
   (5,067 unigenes), single-organism process (4,354 unigenes), cell part
   (3,159 unigenes), cell (3,156 unigenes).

Fig 1. Gene ontology (GO) annotations of the annotated unigenes.

   [104]Fig 1
   [105]Open in a new tab

   38,435 unigenes were assigned to three GO categories containing 57
   functional subcategories. The y-axis on the right shows the number of
   genes in category, and that on the left shows the percentage of a
   specific category of genes in that main category.

   Pathway enrichment analysis based on the KEGG database was used to
   identify enriched metabolic pathways or signal transduction pathways.
   In total, 5,990 unigenes were assigned to 193 pathways ([106]Fig 2).
   The pathway categories with the largest number of unigenes were purine
   metabolism and ribosome (both 208 unigenes), followed by oxidative
   phosphorylation (174), RNA transport (170), Protein processing in
   endoplasmic reticulum (158), spliceosome (138), carbon metabolism
   (137), peroxisome (133), and lysosome (115).

Fig 2. Pathway assignment based on KEGG mapping.

   [107]Fig 2
   [108]Open in a new tab

DEGs in E. hippophaecolus larvae under cold stress

   The correlations between samples were shown in [109]S3 Fig. Based on
   the correlations, the clustering of samples was presented in [110]S4
   Fig. It can be seen sample T3 was not correlated well with other
   samples (r^2 all less than 0.6). Use of such samples is likely to
   result in the spurious detection of DEG, and so sample T3 and its
   corresponding control, T6, were excluded from the DEG analysis. The
   remaining two groups (T1-T4, T2-T5) were then used to explore the genes
   involved in cold resistance. The DE information for each contig was
   provided in [111]S5 Table.

   We applied a rigorous procedure to identify DEGs (see [112]Methods) and
   identified 427 unigenes differentially expressed between group T1 and
   T4, and 71 between group T2 and T5 ([113]Fig 3). About half of the DEGs
   had annotations. The annotations of the DEGs were shown in [114]S6
   Table. Only seven DEGs were shared by the two groups, three of which
   were annotated. The blast results of these seven genes were shown in
   [115]Table 3. GO annotation was used to evaluate the function of DEGs
   that were significantly different between treatments ([116]Fig 4).
   Between T1 and T4, the most significant items were “Biological Process:
   metabolic process”, “Molecular Function: catalytic activity”,
   “Biological Process: cellular process”, “Molecular Function: binding”,
   and “Cellular Component: cell part” ([117]Fig 4a). When comparing T2
   and T5, unigenes were assigned to “Biological Process: metabolic
   process”, “Molecular Function: binding”, “Molecular Function: catalytic
   activity”, “Biological Process: cellular process” and “Molecular
   Function: structural molecule activity” ([118]Fig 4b).

Fig 3. Distribution of DEGs between samples.

   [119]Fig 3
   [120]Open in a new tab

   (A) Distribution of DEGs between T1 and T4. (B) Distribution of DEGs
   between T2 and T5.

Table 3. Common DEGs differentially expressed genes in two groups (T1-T4 and
T2-T5) from RNAseq.

   Unigene ID Blast Blastx Match Log[2]FC^a Regulated^a Log[2]FC^b
   Regulated^b
   annotation Acc. Number E-value Identity
   c79869.graph_c1 endonuclease-reverse transcriptase [Bombyx mori]
   [121]ADI61830.1 0 64% 3.567980225 up 3.829149437 up
   c76833.graph_c0 PREDICTED: uncharacterized protein LOC105570877
   [Vollenhovia emeryi] [122]XP_011883719.1 1.00E-152 47% 3.587111514 down
   4.534550285 up
   c73192.graph_c0 PREDICTED: uncharacterized protein LOC106708429 isoform
   X1 [Papilio machaon] [123]XP_014355420.1 0 64% 6.026349523 up
   7.944625711 up
   c59923.graph_c0 putative axonemal leucine-rich repeat protein [Danaus
   plexippus] [124]EHJ70571.1 5.00E-128 67% 2.600939044 normal 2.969087192
   normal
   c33229.graph_c0 -- -- -- -- 9.625223032 up 8.727798365 up
   c36418.graph_c0 -- -- -- -- 2.841664353 up 5.818570542 up
   c68914.graph_c0 -- -- -- -- 2.459464841 up 6.968321714 up
   [125]Open in a new tab

   Note: FC^a indicates the ratio of RPKM of T4 sample divided by the RPKM
   of T1 sample; FC^b indicates the ratio of RPKM of T5 sample divided by
   the RPKM of T2 sample; Regulated^a indicates the expression level of
   DEGs in T4 compared to T1, including three situations, up-regulated,
   down-regulated or kept unchanged; other columns are functional
   annotations of unigenes; Regulated^b indicates the expression level of
   DEGs in T5 compared to T2; “--” means no information.

Fig 4. GO annotations of the DEGs.

   [126]Fig 4
   [127]Open in a new tab

   (A) GO annotations of the DEGs between T1 and T4. (B) GO annotations of
   the DEGs between T2 and T5.

   KEGG pathway enrichment analysis revealed the primary DEG pathways
   ([128]Fig 5). Only group T1-T4 had pathway enrichment result, because
   DEGs between T2 and T5 were too few to be identified in the enrichment
   analysis. The most remarkable pathway was “ribosome”, which is
   dominated by ribosomal proteins. Another pathway was “oxidative
   phosphorylation”, which includes several enzymes, including cytochrome
   oxidase subunit, NADH dehydrogenase subunit, and ATP synthase subunit.
   Proteins present in the “spliceosome” pathway were Hsp70, RNA-binding
   protein, and ATP-dependent RNA helicase DDX5/DBP2. Chitinase was the
   changed protein in “amino sugar and nucleotide sugar metabolism”
   pathway. Hsp70 and cyclic AMP-dependent transcription factor ATF-4 were
   the changed participants of the “protein processing in endoplasmic
   reticulum” pathway. Genes of some enzymes, such as acetyl-CoA
   synthetase, gluconolactonase, glucuronosyltransferase,
   alpha-glucosidase, DNA-directed RNA polymerase III subunit RPC1,
   glutathione S-transferase, glycerol-3-phosphate O-acyltransferase,
   formyltetrahydrofolate dehydrogenase, lysosomal acid phosphatase,
   porphobilinogen synthase, were also differentially expressed during the
   cold stress. KEGG Pathway Maps were provided in [129]S1 File.

Fig 5. Pathways enriched for DEGs in T1-T4.

   [130]Fig 5
   [131]Open in a new tab

   The y-axis indicates the pathways and the x-axis indicates the
   percentage of unigenes in each pathway.

QRT-PCR validation

   This experiment involved RT-PCR (reverse transcription PCR), gel
   electrophoresis, Sanger sequencing, and qPCR of seven common DEGs
   identified in the RNA sequence data. The non-redundant transcripts from
   the transcriptome data and the RT-PCR fragments showed high sequence
   identities, validating the accuracy of the Illumina sequencing and de
   novo assembly. The qPCR results are presented in [132]Fig 6. It can be
   seen that expression changes were in the same direction for the qPCR,
   but the magnitude differed. The FKPM of these genes were shown in
   [133]Table 3. In general, the results were consistent with the Illumina
   sequencing data, corroborating the reliability and accuracy of the
   transcriptome analysis. This ensures the feasibility and sustainability
   of our further research on these or other DEGs from the transcriptome
   data.

Fig 6. Validation of the DEGs by qRT-PCR.

   [134]Fig 6
   [135]Open in a new tab

   c33229.graph_c0~ c79869.graph_c1 are unigene IDs. T1&T2 represent the
   control groups; T4&T5 represent the low temperature groups. Relative
   expression values are presented as the mean ± SE of three biological
   replicates. Differences statistically significant are indicated by
   asterisk (P<0.05).

Discussion

   In this study, we conducted a comprehensive transcriptome analysis and
   characterized the gene expression profiles of E. hippophaecolus under
   cold stress. Through the analysis of DEGs sets, we identified
   transcriptome changes in E. hippophaecolus larvae.

   From the assembly result, it can be seen that the length of 57.79% of
   the transcripts and 73.76% of the unigenes are less than 500 bp. N50
   length of the transcripts is 1,687 bp, while the length of the unigenes
   is 904 bp. The completeness assessment by BUSCO indicates the
   comparatively high quality of the assembly according to the study of
   Simão et al. (2015) [[136]41]. Our results are in line with other
   insect transcriptome projects using Illumina technology [[137]36,
   [138]54].

   Only 22.39% of the unigenes produced a significant blast hit to an
   arthropod sequence in the nr database. This can be partly attributed to
   the limited information available on Lepidoptera transcriptomes and
   genomes. Moreover, some genes in this moth could be highly diverged,
   thereby excessing the blast threshold. Unsurprisingly, most of the E.
   hippophaecolus unigenes matched sequences from other lepidopteran
   species.

   In this study, DEG libraries from cold treated and untreated E.
   hippophaecolus larvae were compared to identify functional genes at the
   whole-organism level. There were small-scale changes in gene
   expression. The number of DEGs from the T1-T4 comparison was larger
   than in the T2-T5 comparison. This difference may due to the different
   stages of larvae. Young larvae of E. hippophaecolus may not be as
   innately cold tolerant as older larvae to resist cold, so they may have
   to regulate related genes intensively to compensate. It has been
   previously noted in Drosophila montana that genes implicated in
   increasing cold tolerance are different between diapausing and
   non-diapausing flies, and that this could be due to different energy
   budgets at different life stages [[139]55]. There were seven DE genes
   in common between developmental stages. Little previous work has been
   conducted on these genes, but our work suggests they are important for
   cold tolerance. Although there was little overlap in the genes between
   stages, there were some overlaps in the processes involved. Processes
   such as “binding”, “RNA binding”, “structural constituent of cuticle”,
   “N-acetylmuramoyl-L-alanine amidase activity” and “peptidoglycan
   catabolic process”, in which genes involved were all up-regulated;
   “nucleic acid binding” and “nucleic acid binding transferase activity”
   in which genes involved were all down-regulated were found in common
   through GO analysis.

   From the pathway analysis, we found some hsps were differentially
   expressed. HSPs are well-known chaperone proteins that play important
   roles in cold hardiness and organism responses to other stressors
   (e.g., heat, low oxygen levels, UV radiation, bacterial and viral
   infection, heavy metals) that can affect the folding and functional
   conformation of proteins [[140]56, [141]57]. That not all HSP genes
   were upregulated is consistent with previous studies [[142]58, [143]59]
   and highlights that the mechanisms regulating these genes need further
   research. The “oxidative phosphorylation” pathway, which involves
   several enzymes, including cytochrome oxidase subunit, NADH
   dehydrogenase subunit, and ATP synthase subunit, was significantly
   downregulated under cold treatment. Antioxidant defense is a cell
   preservation strategy in response to environmental challenges [[144]60]
   (e.g., heat, cold, UV radiation, osmotic challenge, toxins). Genes
   associated with the “metabolism of xenobiotics by cytochrome P450”
   pathways were modestly downregulated. It has been previously reported
   that cytochrome P-450 is related to temperature regulation [[145]61].

   In addition, genes for a number of cuticular proteins were also
   upregulated. These proteins are essential during insect cuticle
   formation and in the development of insect body structures and organs.
   Previous studies have shown that some insects upregulate the expression
   of cuticular protein genes to resist cold or other environmental
   stressors [[146]19, [147]62–[148]64]. To investigate the changes of
   structure or amount of the cuticle in response to cold could be an
   avenue for future work.

   In this research, we also discovered many DEGs and the roles they play
   in cold tolerance in insects are unknown. Though it is hard to
   elucidate the molecular mechanisms underlying a complex process from
   the action of a few genes, the functions of these genes are worth being
   verified in future studies.

   Curiously, the proteins most associated with insect cold hardiness,
   AFPs, were not found in the transcriptome profile. There may be several
   reasons for this. Firstly, although AFPs have been well studied, only
   about 200 insect AFP genes have been registered in GenBank. Secondly,
   the relationship between mRNA transcript levels and protein levels is
   complex. Integrated proteome/transcriptome studies have shown that mRNA
   levels are poorly associated with protein levels for all but the most
   abundant proteins [[149]65]. Thirdly, it may be that more
   genes/proteins would be stimulated or expressed in the chill recovery
   phase [[150]56]. Fourthly, genes involved in increasing cold tolerance
   have been shown to differ even between closely related species
   [[151]66]. Finally, it is generally presumed that AFPs play a greater
   role in the cold resistance of freeze intolerant insects; for freeze
   tolerant species, on the other hand, AFPs may not be important
   [[152]12].

   During the transcriptome construction, we took into account the
   development stage of the moth, but neglected the replicates, thus might
   cause problems such as false positives when conducted DE analysis.
   Mixing samples in one RNA-seq sample may cause inconsistent of gene
   expression. The parallel experiment within different development stages
   could provide a reference when we choose interesting genes, thus we
   focus on the common DEGs in the parallel tests. Moreover, further
   validation results of the DEGs using qRT-PCR showed the consistency of
   the transcriptome data. This study provides a profile of the candidate
   genes that are suspected to be involved with insect cold hardness. Many
   further studies should be carried out to test and verify the function
   the candidate genes.

   In conclusion, this study is the first to characterize the
   transcriptome of E. hippophaecolus larvae without a reference genome.
   Our data will facilitate future genomic research and in-depth
   annotation of insect genomes. Moreover, the preliminary identification
   of genes and pathways exhibiting differential expression under cold
   stress may further our general understanding of cold resistance in
   insects.

Supporting information

   S1 Fig. Statistics of de novo assembly of the transcriptome.

   (a) Distribution of all unigenes length. (b) Distribution of all
   transcripts lengths. The x-axis indicates the length and y-axis
   indicates the number of unigenes or transcripts.

   (TIF)
   [153]Click here for additional data file.^ (1.8MB, tif)
   S2 Fig. Characteristics of the homology search of Illumina sequences in
   the nr database.

   (TIF)
   [154]Click here for additional data file.^ (267.9KB, tif)
   S3 Fig. Correlation between samples.

   (TIF)
   [155]Click here for additional data file.^ (973.4KB, tif)
   S4 Fig. Correlations matrix heat map.

   (TIF)
   [156]Click here for additional data file.^ (669.8KB, tif)
   S1 Table. The list of the sequences identified as ncRNA.

   (XLSX)
   [157]Click here for additional data file.^ (441.9KB, xlsx)
   S2 Table. Primers for real-time PCR.

   (PDF)
   [158]Click here for additional data file.^ (120.8KB, pdf)
   S3 Table. Part of the sequencing and assembly results.

   (DOCX)
   [159]Click here for additional data file.^ (78.5KB, docx)
   S4 Table. The blast information for each contig.

   (XLSX)
   [160]Click here for additional data file.^ (1.6MB, xlsx)
   S5 Table. DE information for each contig.

   FDR: false discovery rate; Log[2]FC: log[2] fold change of DE genes;
   regulated: the expression level of DEGs in T04/T05 compared to T01/T02,
   including three situations, up-regulated, down-regulated or normal;
   T01~T05 are FPKM values of unigenes in each sample.

   (XLSX)
   [161]Click here for additional data file.^ (5.5MB, xlsx)
   S6 Table. Annotations of the DEGs.

   _FPKM: FPKM values of unigenes in each sample; FDR: false discovery
   rate; Log[2]FC: log[2] fold change of DE genes; regulated: the
   expression level of DEGs in T04/T05 compared to T01/T02, including
   three situations, up-regulated, down-regulated or normal; other columns
   are functional annotations of unigenes.

   (XLSX)
   [162]Click here for additional data file.^ (62KB, xlsx)
   S1 File. KEGG pathway maps of DEGs in T1-T4.

   (ZIP)
   [163]Click here for additional data file.^ (1.5MB, zip)

Acknowledgments