Abstract The sweet potato weevil, Cylas formicarius (F.) (Coleoptera: Brentidae), is an important pest of sweet potato worldwide. However, there is limited knowledge on the molecular mechanisms underlying growth and differentiation of C. formicarius. The transcriptomes of the eggs, second instar larvae, third instar larvae (L3), pupae, females, and males of C. formicarius were sequenced using Illumina sequencing technology for obtaining global insights into developing transcriptome characteristics and elucidating the relative functional genes. A total of 54,255,544 high-quality reads were produced, trimmed, and de novo assembled into 115,281 contigs. 61,686 unigenes were obtained, with an average length of 1,009 nt. Among these unigenes, 17,348 were annotated into 59 Gene Ontology (GO) terms and 12,660 were assigned to 25 Cluster of Orthologous Groups classes, whereas 24,796 unigenes were mapped to 258 pathways. Differentially expressed unigenes between various developmental stages of C. formicarius were detected. Higher numbers of differentially expressed genes (DEGs) were recorded in the eggs versus L3 and eggs versus male samples (2,141 and 2,058 unigenes, respectively) than the others. Genes preferentially expressed in each stage were also identified. GO and pathway-based enrichment analysis were used to further investigate the functions of the DEGs. In addition, the expression profiles of ten DEGs were validated by quantitative real-time PCR. The transcriptome profiles presented in this study and these DEGs detected by comparative analysis of different developed stages of C. formicarius will facilitate the understanding of the molecular mechanism of various living process and will contribute to further genome-wide research. Keywords: different stages, differentially expressed genes, sweet potato weevil, transcriptome __________________________________________________________________ Sweet potato (Ipomoea batatas (L.) Lam., Convolvulaceae) possesses remarkable importance in energy production, and the utilization as food, feed, and industrial products in China and worldwide. The sweet potato weevil, Cylas formicarius (F.) (Coleoptera: Brentidae) is one of the most destructive pest of sweet potato in the tropical and temperate growing areas, including China ([33]Kuriwada et al. 2010, [34]Prentice et al. 2015, [35]Reddy and Chi 2015). This pest not only causes huge production losses, but also seriously affects the quality of sweet potato, both in the field and in the storage. C.formicarius infests all plant parts: roots, stems, foliage, and flowers seeds ([36]Edison et al. 2009). Damage caused by the weevils is often worse during dry times ([37]Reddy et al. 2014). In addition to damage caused directly by tunneling, larvae also cause indirect damage by facilitating entry of soil-borne pathogens ([38]Uritani et al. 1975), which renders the product unsuitable for consumption. China accounts for the highest sweet potato production in the world ([39]FAOSTAT 2016). However, its productivity is severely constrained by insect feeding during the production and storage processes, especially by the sweet potato weevil. The yield losses caused by sweet potato weevil damage range from 5 to 50%, and can even lead to a complete loss under heavy infestations ([40]Wang et al. 2014). [41]Reddy and Chi (2015) collected life table data for C. formicarius grown on I. batatas, and showed that the weevil could survive more than 4 mo; the mean of total preoviposition period was 50.1 d and female adults did not contribute to the population growth after 92 d. In southern China, C. formicarius could produce several generations per year, and overwinter in storage or in open fields ([42]Wang et al. 2014). Eggs of C. formicarius are laid in cavities just below the skin of the root, and then larvae feed and pupate in the roots. The peak period of adult occurrence was from the beginning of August to October ([43]Wang et al. 2014, [44]Ye 2015). Control of the pest remains difficult. Efficacy of chemical insecticides to the weevils was limited by the cryptic nature of the larvae, the nocturnal activities of the adults, and increasingly pesticide resistance. Effective crop rotations could reduce the tuber damage compared to monoculture of sweet potato ([45]Pillai et al. 1996, [46]Ehisianya et al. 2013). Sweet potato weevils have the capacity to adapt and develop resistance to active proteins and compounds found or introduced into new sweet potato varieties ([47]Mwanga et al. 2011). Currently, there are few superior varieties with high level of resistance against C. formicarius. The high throughput sequencing technology is a powerful and cost-efficient tool for advanced research in many areas, including microRNA expression profiling, DNA methylation, especially de novo transcriptome sequencing for non-model organisms ([48]Varshney et al. 2009, [49]Parchman et al. 2010, [50]Wang et al. 2010b, [51]Sun et al. 2012, [52]Zhang et al. 2015c). Illumina sequencing of transcriptome for organisms with completed genomes confirmed that the relatively short reads produced could be effectively assembled and used for gene discovery and gene expression analysis ([53]Marioni et al. 2008, [54]Hegedűs et al. 2009, [55]Wang et al. 2011). The development of next-generation sequencing technologies and the de novo analysis of the transcriptome data could provide a clear insight into the molecular mechanism of different life process. [56]Heyland et al. (2011) analyzed temporal and spatial gene expression changes during embryogenesis, larval development, and metamorphic stages of Aplysia californica (Cooper) using microarrays and in situ hybridization, revealed novel molecular components associated with life history transitions. [57]Daines et al. (2011) provided interesting insights into the extent of alternate splicing in Drosophila melanogaster (Meigen) by the transcriptome of ten developmental stages of D. melanogaster. [58]Zheng et al. (2015) sequenced the transcriptome of the early parasitic second-stage juveniles of Heterodera avenae (Wollenweber) and revealed the genes and molecular mechanisms of the incompatible interaction between H. avenae and the host plant Aegilops variabilis. [59]Prentice et al. (2015) reported the transcriptome analysis of the second instar larvae of Cylas puncticollis (Boheman) and demonstrated that C. puncticollis exhibits a strong and systemic RNAi effect, suggesting the potential of RNAi as a future strategy to control C. formicarius. Though transcriptome sequencing has been broadly applied to illustrate gene expression patterns in many species, genomic, and transcriptomic details about C. formicarius, especially its biosynthetic pathways and gene expression in different stages, have remained largely unknown. Genomic resources available in public databases for C. formicarius were limited to a few sequences. Due to the limitations of information on its genetic basis, it is necessary to carry out an extensive transcriptome study to systematically gauge molecular differences associated with the development of different stages of C. formicarius. The management of the weevils is vital for sweet potato production. Understanding the gene expression changes at various developmental stages of the sweet potato weevil will provide important insight into mechanisms underlying growth, differentiation, stage structure, and this will be useful for the management programs. In this study, in order to establish a useful database of transcriptome sequences as well as of differentially expressed genes (DEGs) at different developmental stages of C. formicarius, we performed de novo transcriptome sequencing by the Illumina Hiseq2000 sequencing platform. A comprehensive analysis of the transcriptome data for eggs, larvae, pupae, and adults of C. formicarius were present. 61,686 distinct sequences including hundreds of stage specific and metabolism genes were identified. A total of 35,789 unigenes could be annotated with known biological functions; enrichment analysis and pathway analysis were applied to find interesting biological processes. We expect these new dataset will provide genomic resource and new leads for future studies of this species. Materials and Methods Insect Rearing and Sample Preparation A laboratory colony of the C. formicarius was established using the insects collected from Fujian province, China in 2011 and maintained on sweet potato storage roots at 28 ± 1°C, 75 ± 10% relative humidity. For this experiment, 100–150 virgin females and virgin males were collected separately at 2–8 d post eclosion to ensure they were actively feeding and sexually mature. Eggs and pupae were randomly picked out from the sweet potato with a brush. 100–150 second instar larvae (L2) and third instar larvae (L3) were collected ∼11 and 15 d post hatch, respectively. Different instars of C. formicarius were sampled from different generations and the total weight of each life stage (pooled whole body samples) is ∼500 mg. All samples were washed with diethyl pyrocarbonate treated water, then snap-frozen immediately in liquid nitrogen and stored at −80°C until further processing. RNA Extraction and Library Construction Total RNA form the five different developmental stages of C. formicarius (mixed sex of eggs, second instar and third instar larvae, pupae; adult females and adult males) were extracted individually using Trizol reagent (Invitrogen, Beijing, China) according to the manufacturer’s protocol. The RNA samples were treated with DNase I (Takara Biotech Co., Dalian, China) for purification from DNA contamination. RNA quality and purity were assessed using Nanodrop2000 spectrophotometer (Thermo Fisher Scientific, MA, USA) based on absorbance ratios at 260/280 and 260/230 nm. The integrity of the RNA preparation was verified using Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) with a minimum integrity number value of 8.0. RNA samples with high purity (OD260/280 between 1.9 and 2.0) and high integrity were selected and used for cDNA library construction. Poly(A) mRNA was purified from total RNA using oligo (dT) magnetic beads. Following purification, the mRNA is fragmented to small pieces using fragmentation buffer under elevated temperature and the cleaved RNA fragments were transcribed into first strand cDNA using reverse transcriptase and random primers. This was followed by second strand cDNA synthesis using DNA polymerase I and RNaseH. Then the high-throughput RNA-sequencing libraries (RNA-seq) were prepared following Illumina’s protocols and were sequenced on the Illumina Hiseq 2000 platform (Illumina, San Diego, USA) at the Beijing Genomics Institute (BGI; Shenzhen, China). The datasets were deposited in NCBI Sequence Read Archive (SRA, [60]http://www.ncbi.nlm.nih.gov/sra) under the accession number SRP067907. This transcriptome shotgun assembly project has been deposited at DDBJ/EMBL/GenBank under the accession [61]GEOI00000000. De Novo Transcriptome Assembly and Annotation The raw reads were cleaned by removing adaptor sequence, low-quality sequences (tags with ambiguous sequences ‘N’), empty tags (sequence with only adaptor sequences but no tags), low complexity, and tags with a copy number of 1 (probably sequencing error) ahead of assembly. The quality reads were de novo assembled with Trinity software ([62]Pertea et al. 2003). The TGI clustering tool ([63]Iseli et al. 1999) is used to assemble all the unigenes from different samples to form a single set of non-redundant unigenes. The unigenes were annotated by aligning with the deposited ones in diverse protein databases including NCBI non-redundant protein (NR), UniProt/Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG) database, Gene Ontology (GO) and Cluster of Orthologous Groups (COG) database using BLASTX (E-value ≤1e^−5). Sequence orientations and the coding sequence (CDS) of unigenes were determined according to the best hit in these databases. ESTScan ([64]Iseli et al. 1999) was used to predict the sequence direction and the CDS when unigenes were unaligned to any of the databases. With NR annotation, Blast2GO program ([65]Conesa et al. 2005) was used to get GO annotation of the unigenes. After getting GO annotation for every unigene, WEGO software ([66]Ye et al. 2006) were used to do functional classification for all unigenes and to decipher the distribution of gene functions at the macro level. Then we used the blast information to extract CDS from unigene sequences and translated them into peptide sequences. Differentially Expressed Genes For gene expression analysis, the number of expressed unigenes was calculated and then normalized to TPM (number of transcripts per million tags). A rigorous algorithm was developed to identify DEGs between two samples ([67]Audic and Claverie 1997, [68]Zhang et al. 2015b). The false discovery rate (FDR) was used to determine the threshold of P value in multiple tests ([69]Benjamini and Yekutieli 2001). Fold changes (log2 Ratio) were estimated according to the normalized gene expression level in each sample ([70]Wang et al. 2010a). “FDR ≤0.001 and the absolute value of log2 Ratio ≥1” were used as the threshold to judge the significance of gene expression difference. Pairwise and multi-condition analyses were used to detect DEGs between two samples and among the five development stages of C. formicarius. For the functional and pathway enrichment analysis, the DEGs were then mapped into GO terms and the KEGG databases, significantly enriched GO and KEGG terms were determined by P value ≤0.05. Cluster analysis of gene expression patterns were performed with “cluster” software ([71]Eisen et al. 1998). Quantitative Real-Time PCR Analysis of Gene Expression To validate the data obtained by the transcriptome sequencing, the relative expression levels of ten DEGs were analyzed by real-time quantitative PCR (qPCR) with three biological replicates. Specific primers of selected genes were designed by Primer Premier 5.0 (Premier Biosoft International, CA, USA) ([72]Table 1). Total RNA from ∼150 mg of the eggs, L2, L3, pupae, females, and males of C. formicarius was extracted separately as described above. First-strand cDNA was synthesized using AMV first strand cDNA synthesis kit (Sangon, Shanghai, China). The SYBR Green real-time PCR assay was performed on a LightCycler 480 real-time PCR instrument (Roche Diagnostics, Mannheim, Germany) using the SG fast qPCR master mix (BBI Life Science, Ontario, Canada) according to the manufacturer’s protocol in a total volume of 20 µl, containing 2 μl of cDNA, 0.4 μl of 10 μM of each primer, and 10 ml Master mix. The PCR was performed at 95°C for 3 min, followed by 40 cycles of 95°C for 7 s, 57°C for 10 s and 72°C for 20 s. The experiments were repeated three times. The melting curve analysis was conducted from 60 to 95°C. Each of the two primer pairs for the tested genes and endogenous references produced a single peak in melting curve