Abstract

   The sycamore lace bug, Corythucha ciliata (Hemiptera: Tingidae), is an
   invasive forestry pest rapidly expanding in many countries. This pest
   poses a considerable threat to the urban forestry ecosystem, especially
   to Platanus spp. However, its molecular biology and biochemistry are
   poorly understood. This study reports the first C. ciliata
   transcriptome, encompassing three different life stages (Nymphs, adults
   female (AF) and adults male (AM)). In total, 26.53 GB of clean data and
   60,879 unigenes were obtained from three RNA-seq libraries. These
   unigenes were annotated and classified by Nr (NCBI non-redundant
   protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam
   (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins),
   Swiss-Prot (A manually annotated and reviewed protein sequence
   database), and KO (KEGG Ortholog database). After all pairwise
   comparisons between these three different samples, a large number of
   differentially expressed genes were revealed. The dramatic differences
   in global gene expression profiles were found between distinct life
   stages (nymphs and AF, nymphs and AM) and sex difference (AF and AM),
   with some of the significantly differentially expressed genes (DEGs)
   being related to metamorphosis, digestion, immune and sex difference.
   The different express of unigenes were validated through quantitative
   Real-Time PCR (qRT-PCR) for 16 randomly selected unigenes. In addition,
   17,462 potential simple sequence repeat molecular markers were
   identified in these transcriptome resources. These comprehensive C.
   ciliata transcriptomic information can be utilized to promote the
   development of environmentally friendly methodologies to disrupt the
   processes of metamorphosis, digestion, immune and sex differences.

Introduction

   The sycamore lace bug, Corythucha ciliata (Say, 1832) is a native North
   American insect that specifically feeds on Platanus spp. (Platanaceae),
   including P.× occidentalis, P. ×acerifolia and P. ×orientalis
   [[34]1,[35]2]. The bugs feed on the undersides of the leaves of plane
   trees (Platanus spp.), leading to a white stippling that eventually
   progress into chlorotic or bronzed foliage and premature senescence of
   leaves [[36]1]. C. ciliata may transmit two fungi, Ceratocystis
   fimbriata Hell. et Halsted forma specialis platani Walter and
   Apiognomonia (= Gnomonia) veneta (Sacc. and Spreg.)[[37]3]. The
   insulted trees will defoliate earlier, stop upward growth, become weak
   and even die [[38]4,[39]5]. A recent study found that C. ciliata can
   suck human blood, leading to health problem in human [[40]6].

   C. ciliata was first discovered in Europe in 1964 in Padova, Italy and
   has now spread through many countries in the world [[41]1,[42]7]. C.
   ciliata migrates quickly and now it has been detected in at least 26
   cities of 12 provinces since it was first found in 2002[[43]8] in
   China, including Hunan, Hubei, Shanghai, Zhejiang, Jiangsu, Shandong,
   Henan, Chongqing and Guizhou, etc [[44]9]. This pest has been included
   in the list of dangerous forest pest and it was even listed as a
   moderate dangerous forest pest by the Chinese Forestry Administrative
   Department in 2007[[45]10].

   Currently, the management of C. Ciliata was depended heavily on the use
   of insecticides. However, large-scale application of insecticides can
   lead to many problems in urban area[[46]1]. In addition, C. ciliata has
   developed the tolerances to insecticides with phosphorus acid ester
   ingredient [[47]11]. Faced with this situation, the development of
   effective and environmentally sound novel management strategies
   including behavioral manipulations[[48]12], neurophysiological
   manipulations[[49]13] and RNAi methods[[50]14] was necessary. The
   genomic and transcriptomic resources of C. ciliata have been needed to
   aid the development of novel method for managing this insect.
   Furthmore, C. ciliata represents one of a group of important
   agricultural and forestry pests, highly invasive herbivorous bugs of
   the Tingidae family-which include avocado lace bug, Pseudacysta
   perseae, subsocial lace bug, Gargaphia solani, azalea lace bug
   Stephanitis pyrioides (Scott), and Chrysanthemum lace bug, Corythucha
   marmorata (Uhler,1878)[[51]15–[52]17]. Thus, the C. ciliata
   transcriptome could serve as a reference transcriptome for promoting
   the molecular biology and evolutionary studies of Tingidae species.

   In the present study, we sequenced and analyzed the C. Ciliata
   transcriptomes from three samples represented difference developmental
   stages and sex difference. This is the first species of Tingidae where
   the RNA has been sequenced. The sequenced, assembled and annotated
   transcriptome datasets will provide assistance with the identification
   of genes associated with C. Ciliata metamorphosis, digestion and
   immune, and sex difference, help to better analyses of the genetic
   function, reveal molecular regulatory mechanisms and develop molecular
   marker.

Materials and Methods

Insect materials and RNA isolation

   C. Ciliata samples (nymphs, AM and AF) were collected from the
   P.×acerifolia trees grown in the garden of plant protection institute,
   Chinese Academy of Agricultural Science, Haidian District
   (40°01’24.13”, 116°16’46”) in September 2015. The AF and AM were
   distinguished based on the abdomen morphology[[53]9]. Nymphs were mix
   3^th and 4^th instar nymphs. Nearly 500 individuals of each sample
   (nymphs, AF and AM) were used in RNA extraction. Total RNA was isolated
   using the RNAqueous-Micro kit (Life Technologies), and the integrity of
   all RNA samples was assessed using Agilent Bioanalyzer 2100 system
   (Agilent Technologies, CA, USA) with a minimum RNA integration value of
   6.

cDNA library preparation and Illumina sequencing

   To obtain the comprehensive gene information of C. Ciliata, Illumina
   sequencing was performed for all three C. Ciliata RNA samples
   represented the different developmental stages and sex differences. All
   cDNA libraries were generated in accordance with the manufacturer’s
   protocol (Illumina Inc., San Diego, CA, USA). 1.5 μg RNA per sample of
   C. Ciliata developmental stages was used for library construction.
   Sequencing libraries were generated using NEBNext^® Ultra^™ RNA Library
   Prep Kit for Illumina^® (NEB, USA) following manufacturer’s
   recommendations. Briefly, mRNA was purified from total RNA using poly-T
   oligo-attached magnetic beads. Fragmentation was carried out using
   divalent cations under elevated temperature in NEBNext First Strand
   Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using
   random hexamer primer and M-MuLV Reverse Transcriptase (RNase H^-).
   Second strand cDNA synthesis was subsequently performed using DNA
   Polymerase I and RNase H. Remaining overhangs were converted into blunt
   ends via exonuclease/polymerase activities. After adenylation of 3’ends
   of DNA fragments, NEBNext Adaptor with hairpin loop structure were
   ligated to prepare for hybridization. In order to select cDNA fragments
   of preferentially 150–200 bp in length, the library fragments were
   purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3
   ul USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated
   cDNA at 37°C for 15 min followed by 5 min at 95°C before PCR. Then PCR
   was performed with Phusion High-Fidelity DNA polymerase, Universal PCR
   primers and Index (X) Primer. At last, PCR products were purified
   (AMPure XP system) and library quality was assessed on the Agilent
   Bioanalyzer 2100 system. The library preparations were sequenced on an
   Illumina Hiseq2500 platform and 150-bp paired-end reads were generated.

Bioinformatics analysis

   The raw reads were cleaned by removing adapter sequences, reads
   containing ploy-N and low quality reads through in-house perl scripts.
   At the same time, Q20, Q30, GC-content and sequence duplication level
   of the clean data were calculated. All the downstream analyses were
   based on clean data with high quality. The left files (read1 files)
   from all libraries/samples were pooled into one big left.fq file, and
   right files (read2 files) into one big right.fq file. Transcriptome
   assembly was accomplished based on the left.fq and right.fq using
   Trinity v2.2.0 [[54]18] with min_kmer_cov set to 2 by default and all
   other parameters set default. Unigenes were annotated based on the
   following seven databases: Nr, Nt, Pfam[[55]19], KOG/COG, Swiss-Prot,
   and KO and Gene Ontology (GO) [[56]20], using BLAST with a cutoff
   E-value of < 10^−5.

Identification of simple sequence repeats (SSRs)

   SSRs among all unigenes were identified using the MISA [[57]21]
   ([58]http://pgrc.ipk-gatersleben.de/misa/misa.html), and primer for
   each SSR was designed by Primer3 [[59]22]
   ([60]http://primer3.sourceforge.net/releases.php).

Differential expression analyses

   Gene expression levels were estimated by RSEM [[61]23] for each sample
   by following steps: Clean data were mapped back onto the assembled
   transcriptome, and then Read count for each gene was obtained from the
   mapping results and normalized to reads per kb of exon model per
   million mapped reads (RPKM). For each sequenced library, the read
   counts were adjusted by edegR program package through one scaling
   normalized factor[[62]24]. Differential expression analysis of three
   samples was performed using the DEGseq R package v1.21.1 [[63]25]. P
   value was adjusted using q-value which was defined as the multiple
   testing analog of the p-value[[64]26]. q-value<0.005 and |log2
   (foldchange)|>1 was set as the threshold for significantly differential
   expression. Variations in gene expression levels were analyzed for
   specific comparisons that encompass two categories: (i) different
   developmental stages (AF and nymphs, AM and nymphs); (ii) sex
   differences (AF and AM). GO enrichment analysis of the DEGs was
   determined using the GOseqR packages[[65]20]. Kyoto Encyclopedia of
   Genes and Genomes (KEGG)[[66]27] pathway enrichment analysis was
   conducted using KOBAS software[[67]28].

Quantitative Real-time PCR

   The samples collections and their total RNA isolation were same as
   described above. The cDNA was generated using PrimeScript RT Master Mix
   (Perfect Real Time) (TaKaRa). The qRT-PCR primers were designed using
   IDT Primer Quest ([68]http://www.idtdna.com/primerquest/Home/Index) and
   shown in [69]S4 Table. The qRT-PCR was performed on the ABI Prism^®
   7500 (Applied Biosystems, Carlsbad, CA, USA) and the reaction was
   conducted in 20 μl reaction system containing 10 μl 2×GoTaq^®qPCR
   Master Mix (Promega, Madison, WI, USA), 10 μM of each primer pair,
   100ng cDNA and nuclease-free water. The following PCR-cycling condition
   was used for the qRT-PCR, 2 min at 95°C for 10 s, 40 cycles of 30 sec
   at 95°C and 1 min at 60°C. Relative RNA variations were normalized and
   corrected with the level of the actin gene (NCBI accession number
   [70]KX108734). All experiments were performed in three biological
   replicates with three technical replicates. The relative expression
   quantities of each gene was calculated by the comparative2^-ΔΔCt
   method[[71]29].

Results

Transcriptome sequencing and reads assembly

   To obtain a comprehensive understanding of molecular mechanisms that
   control C. Ciliata biology at distinct life stages and sex difference,
   total RNA was extracted from nymphs, AF and AM samples, followed by
   sequencing with the Illumina Hiseq sequencing platform. The sequencing
   data were deposited in the NCBI Sequence Read Archive (SRA,
   [72]http://www.ncbi.nlm.nih.gov/Traces/sra); BioProject ID number was
   PRJNA312006. The accession numbers were SRR3170921, SRR3170922 and
   SRR3170923, respectively. In total, 212,259,810 clean reads were
   assembled into a total of 83,754 transcripts from the nymphs, AF and AM
   libraries (Tables [73]1 and [74]2). The AF sample produced the most
   data (76,903,406 clean reads) and the AM sample produced the least
   amount of data (65,036,134 clean reads). The Q20 scores (the average
   quality value) were above 95% ([75]Table 1). Moreover, the overall GC
   content of the clean reads was high similarities, ranging from 44.87%
   to 46.24%. The assembled reads from three samples representing the
   different life stages and sex difference generated a total length of
   47,464,477 bp and 60,879 unigenes, with a mean contig length of 780bp
   and a N50 of 1,678 bp ([76]Table 2). Analysis of the size distributions
   revealed that 11,459 unigenes were over 1,000 bp in length.

Table 1. Output statistics from AM, AF and Nymphs of C. Ciliata.

   Samples Total Raw Reads Total Clean Reads Clean bases(G) Q20%  GC %
   Nymphs  73,934,914      70,320,270             8.79      95.04 46.24
   AM      67,466,456      65,036,134             8.13      95.16 45.93
   AF      79,748,574      76,903,406             9.61      95.24 44.87
   All     221,149,944     212,259,810           26.53
   [77]Open in a new tab

Table 2. Assembly statistics from AM, AF and Nymphs of C. Ciliata.

               Total Number Total Length (bp) Mean Length (bp) N50
   Transcripts 83,754       86,443,735        1,032            2,319
   Unigenes    60,879       47,464,477        780              1,678
   [78]Open in a new tab

Function Annotation

   All 60,879 C. Ciliata unigenes were subjected to the seven public
   databases ([79]Table 3) for functional annotations. There were 26.81%
   of them (16,322 unigenes) matched against the Nr database, and 7.42% of
   them (4,519) had specific matches to nucleotide sequences in the
   GenBank Nt database, and 20.66% (12,581) were similar to proteins in
   the Swiss-Prot database ([80]Table 3).

Table 3. Summary of functional annotations of C. Ciliata unigenes.

                                      Number of Unigenes Percentage (%)
   Annotated in Nr                    16,322                 26.81
   Annotated in Nt                    4,519                   7.42
   Annotated in KO                    5,783                   9.49
   Annotated in SwissProt             12,581                 20.66
   Annotated in PFAM                  14,246                  23.4
   Annotated in GO                    14,349                 23.56
   Annotated in KOG                   8,967                  14.72
   Annotated in all Databases         1,846                   3.03
   Annotated in at least one Database 20,579                  33.8
   [81]Open in a new tab

   Statistical tests on the distribution of the BLASTx[[82]30] matches of
   C. Ciliata unigenes against Nr databases indicated that 63.0% of the
   annotated unigenes had significant homology (E-values<1e^−30) to other
   sequences in Nr database, much higher than the percentage of homologous
   sequences with E-values between 1e^−5 and 1e^−30 (37%) ([83]Fig 1a).
   The similarity distributions suggested that 69.5% sequences had a
   similarity more than 60%, while 30.4% sequences had a similarity
   ranging from 18% to 59% ([84]Fig 1b). For main species distribution
   matched against Nr database, C. Ciliata unigenes have closely matched
   with sequences of the dampwood termite Zootermopsis nevadensis (13.2%),
   the pea aphid Acyrthosiphon pisum (8.6%), the red flour beetle
   Tribolium castaneum (7.0%), the Asian citrus psyllid Diaphorina citri
   (4.7%), and the bean bug Riptortus pedestris (4.5%) ([85]Fig 1c).

Fig 1. Pie charts showing distribution of the BLASTxmatches of C. Ciliata
transcriptome unigenes against Nr databases.

   Fig 1
   [86]Open in a new tab

   (a) E-values distribution, (b) similarity distribution, and (c) species
   classification.

   The functional annotation of C. Ciliata unigenes by GO, KOG, and KEGG
   recovered diverse potential biological functions and processes. A total
   of 14,349 unigenes were annotated with GO functions, including 73.87%
   unigenes at the biological process level, 83.81% unigenes at the
   molecular function level, and 49.26% unigenes at the cellular component
   level. Within the Biological process GO category, Cellular process and
   metabolic process were most abundant. Cell and cell part terms were
   most abundant among the Cellular component category. For Molecular
   Function, unigenes were predominantly associated with binding and
   catalytic activity functions ([87]Fig 2).

Fig 2. GO Classification of C. Ciliata unigenes according to the categories
of Biological process, Molecular function and Cellular component.

   [88]Fig 2
   [89]Open in a new tab

   After KOG-based annotation, a total of 8,967 annotated putative
   proteins were assigned to 26 KOG categories, mainly including general
   function prediction, signal transduction mechanisms, posttranslational
   modification, protein turnover, chaperones, translation, ribosomal
   structure and biogenesis, cytoskeleton, energy production and
   conversion, lipid transport and metabolism, and so on ([90]Fig 3).

Fig 3. KOG annotations of C. Ciliata predicted proteins.

   [91]Fig 3
   [92]Open in a new tab

   A total of 8,967 predicted proteins have a KOG classification among the
   26 categories.

   We mapped the predicted proteins of the C. Ciliata to the reference
   authoritative pathways in KEGG for further functional categorization
   and annotation. In total, we assigned 5,783 proteins into 228 KEGG
   pathways, with 2,047 proteins (30.61%) being associated with metabolic
   pathways. The pathways involving the largest number of unigenes were
   signal transduction (740), followed by transport and catabolism (454),
   whereas biosynthesis of other secondary metabolism (11) was the
   smallest group ([93]Fig 4).

Fig 4. KEGG annotation of C. Ciliata predicted proteins.

   [94]Fig 4
   [95]Open in a new tab

Protein coding sequence (CDs) prediction

   All 60,879 unigenes were aligned against the protein databases, with
   the priority order of Nr, Swiss-Prot, KEGG and KOG. Using BLASTx, a
   total of 17,533 coding sequences (CDs) were obtained from unigenes
   sequences and translated into amino acid sequences ([96]Fig 5A and 5B).
   Using the ESTScan[[97]31], we identified 11,706 unigenes CDs that could
   not be matched to the above protein databases and translated them into
   amino acid sequences. Of these unigenes with CDs sequences, 4,187 were
   over 500 bp and 1,070 were over 1000 bp ([98]Fig 5C and 5D).

Fig 5. Coding sequence predictions of C. Ciliata transcriptome by BLASTx and
ESTScan.

   [99]Fig 5
   [100]Open in a new tab

   (a) Length distribution of CDs using BLASTx (E-value < 10^−5); (b)
   Length distribution of proteins using BLASTx; (c) Length distribution
   of CDs predicted by ESTScan, and (d) Length distribution of proteins
   using ESTScan.

Frequency and distribution of SSRs

   We screened the C. Ciliata unigene dataset to mine potential SSRs for
   future population and evolutionary genetics analysis. Among 60,879
   examined sequences (47,464,477bp), a total of 17,462 SSRs were
   identified and the number of SSRs containing sequences reached 12,105.
   A total of 9,300 pairs of potential SSR primers were designed.
   Furthermore, 3,359 sequences contained more than 1 SSRs. In the C.
   Ciliata transcriptome, the SSRs frequency was 28.68% and the
   distribution density was 0.37 per kb. Based on the repeat motifs, all
   SSRs loci were divided into 6 categories: mono-nucleotide repeats
   (14,566), di-nucleotide (1,078), tri-nucleotide (1,723),
   tetra-nucleotide (88), penta-nucleotide (4) and hexa-nucleotide repeats
   (3), respectively ([101]Fig 6).

Fig 6. Frequency of SSRs in C. Ciliata transcriptomes.

   [102]Fig 6
   [103]Open in a new tab

Differential gene expression profile among C. Ciliata developmental stages
and sex difference

   To specifically identify development-biased and sex-biased
   deferentially expressed genes, we calculated expression profiling to
   examine gene activity changes between selected developmental stages
   (nymph and adult male, nymph and adult female) and sexes difference
   (adult male and adult female) ([104]Fig 7). DEGs (q-value <0.005 and
   |log2 (foldchange)| >1) were defined as genes that were significantly
   enriched or depleted in one sample relative to the other.

Fig 7. Volcano plots of differentially expressed unigenes.

   [105]Fig 7
   [106]Open in a new tab

   The abscissa represents the expressed levels fold change of unigenes in
   three different samples. The ordinate indicates the statistically
   significant difference degree. The lower and higher -log10 (p-adj)
   values mean greater differences. The scatters in diagram stand for each
   gene, the blue dot represents there was no significant difference of
   genes. The up-regulated and down-regulated genes were indicated by a
   red dot and green dot, respectively.

   In total, 1,690 genes showed significant change in gene expression on
   comparison of AF and nymphs libraries, with 584 and 1,106 genes that
   were up-regulated and down-regulated in the AF library, respectively
   ([107]Fig 7A). Among the top ten up-regulated DEGs, 5 showed defined
   functions, i.e., polyprotein (Bemisia tabaci), sodium-coupled
   monocarboxylate transporter 1-like (Plutella xylostella), group XV
   phospholipase A2-like (Acyrthosiphon pisum), chemosensory protein 8
   (Apolygus lucorum), and cytochrome P450 6k1-like (Microplitis
   demolitor). In addition, among ten most down-regulated DEGs, five genes
   have predicted functions, i.e., hypothetical protein L798_05227
   (Zootermopsis nevadensis), cuticle protein, putative (Riptortus
   pedestris), cuticle protein 19 (Tribolium castaneum), hypothetical
   protein TcasGA2_TC013819 (Tribolium castaneum), and hypothetical
   protein L798_05226, partial (Zootermopsis nevadensis) ([108]S1 Table).

   In the comparison between AM and nymphs, the expression profiles of
   2,524 genes had changed. There were 1,520 and 1,004 unigenes that were
   significantly up-regulated and down-regulated in the AM library,
   respectively ([109]Fig 7B). Among the top ten up-regulated genes, one
   displayed predicted functional gene in adult male library: the
   Cathepsin L1 in Strongylocentrotus purpuratus. The top ten
   down-regulated genes included defined functions: hypothetical protein
   L798_05227 (Zootermopsis nevadensis), uncharacterized protein
   LOC101741662 (Bombyx mori), Apolipoprotein D precursor, putative
   (Pediculus humanus corporis), hypothetical protein TcasGA2_TC013819
   (Tribolium castaneum), cuticle protein 19 (Tribolium castaneum), and
   lysosomal aspartic protease (Vollenhovia emeryi) ([110]S1 Table).

   The comparison between AF and AM libraries revealed 1,600 genes were
   significantly difference, with 125 and 1,475 genes that were
   up-regulated and down-regulated in the AF library,
   respectively([111]Fig 7C). Five of the top ten up-regulated DEGs in the
   AF library have putative functions, i.e., polyprotein (Bemisia tabaci),
   protein bicaudal C homolog 1-B isoform X1 (Cerapachys biroi), ankyrin
   repeat protein (Trichomonas vaginalis G3), hypothetical protein
   captedraft_143611 (Capitella teleta) and cathepsin B (Riptortus
   pedestris). Among the top ten down-regulated DEGs, two genes have
   annotated functions, i.e., ornithine aminotransferase (Riptortus
   pedestris) and Cytosol aminopeptidase (Pediculus humanus corporis)
   ([112]S1 Table).

GO enrichment analysis for DGEs

   DEGs sets in AF and nymphs libraries were investigated by GO enrichment
   analysis. For up-regulated DEGs in AF compared to nymphs, three terms
   oxidoreductase activity, catalytic activity, and cofactor binding were
   mostly highly enriched in the molecular function category. And the top
   three enriched terms in the cell components category were respiratory
   chain complex, respiratory chain and mitochondrion. For biological
   process, oxidation-reduction process, single-organism metabolic
   process, and monovalent inorganic cation transport were the top three
   dominant enriched terms. For the down-regulated DEGs in AF, chitin
   metabolic process, glucosamine-containing compound metabolic process
   and amino sugar metabolic process were the highest three biological
   process enriched. Only one term extracellular region was enriched in
   cellular component. The top three terms enriched in molecular function
   were structural constituent of cuticle, chitin binding, and structural
   molecule activity ([113]S2 Table).

   The GO enrichment analysis was conducted for the DEGs in AM sample,
   compared with nymphs (q-value < 0.05). The GO enrichment of
   up-regulated DEGs in AM was shown in [114]S2 Table. In the molecular
   function category, the top three enriched terms were aminopeptidase
   activity, peptidase activity, acting on L-amino acid peptides and
   catalytic activity. In the cell components category, microtubule
   associated complex, microtubule cytoskeleton and cytoskeletal part were
   the top three dominant enriched terms. For biological process,
   proteolysis, tetraterpenoid metabolic process, and tetraterpenoid
   biosynthetic process were the mostly highly enriched. For the
   down-regulated DEGs in AM compared to nymph sample, chitin metabolic
   process, glucosamine-containing compound metabolic process and amino
   sugar metabolic process were the top-three biological process enriched
   by the down-regulated DEGs. Two terms extracellular region and
   extracellular matrix were the cellular component enriched by
   down-regulated DEGs. Four terms structural constituent of cuticle,
   chitin binding, structural molecule activity and serine-type
   endopeptidase activity were the enriched molecular function ([115]S2
   Table).

   The GO enrichment analysis was conducted for the DEGs between AF and AM
   sample also ([116]S2 Table). For the down-regulated DEGs in AF vs AM,
   serine-type endopeptidase activity, peptidase activity, acting on
   L-amino acid peptides, and serine-type peptidase activity were the top
   three enriched terms in the molecular function category. In the cell
   components category, microtubule cytoskeleton, microtubule associated
   complex and dynein complex were the top three dominant enriched terms.
   For biological process, proteolysis, protein metabolic process and
   tetraterpenoid metabolic process were the mostly highly enriched.

Pathway enrichment analysis for DGEs

   The KEGG pathway enrichment of the up-regulated DEGs in AF, compared
   with nymphs was investigated also. Seven pathways were enriched
   (q-value < 0.05) ([117]S2 Table). For the down-regulated DEGs in the
   AF, five pathways including Glutathione metabolism, Amino sugar and
   nucleotide sugar metabolism, Arginine and proline metabolism, Gap
   junction, and Pathogenic Escherichia coli infection were enriched
   ([118]S2 Table).

   For the KEGG pathway enrichment of the up-regulated DEGs in AM,
   compared with nymphs. Two pathways Non-alcoholic fatty liver disease
   and Arginine and proline metabolism were enriched (q-value < 0.05). For
   the down-regulated DEGs in the AM, hippo signaling pathway—fly,
   pathways in cancer, basal cell carcinoma, amino sugar and nucleotide
   sugar metabolism and wnt signaling pathway were the enriched pathways
   ([119]S2 Table).

   However, no significant GO-term enrichment were observed for the
   up-regulated DEGs in AF. For KEGG pathway enrichment, the up-regulated
   DEGs in AF vs AM were enriched in the pathways lysosome and
   glycosphingolipid biosynthesis—globo series, according to KEGG pathway
   enrichment analysis. While, the down-regulated DEGs were not enriched
   significantly in AF.

DEGs associated with metamorphosis, digestion and immune

   To explore potential targets that might serve as novel pest management
   strategies for C. Ciliata, unigenes involved in metamorphosis,
   digestion and immunity were identified from the 1,291 DGEs between
   nymphs and adults (AM and AF)([120]S1 Fig). Based on the best BLAST
   matches against the Nr database, total of 10, 27 and 9 genes associated
   with metamorphosis, digestion and immunity were found, respectively
   ([121]Table 4 and [122]S3 Table). Among 10 metamorphosis-related
   unigenes, 6 were annotated respectively as methoprene-tolerant protein,
   broad, E7 series nuclear receptor, hormone receptors and ftz-f1. Other
   4 unigenes belonged to juvenile hormone-related genes ([123]S3 Table).
   In addition, we annotated 27 putative unigenes related to the digestive
   system ([124]Table 4), which encoded 2 serine protease, 1 cysteine
   protease, 11 aminopeptidase, 6 carboxypeptidase, 5 lipase and 2
   glucosidase. Among these 27 unigene, 12 were significantly
   differentially expressed between AF and AM. For 9 immunity-related
   unigenes, 4 were pathogen recognition genes encode pathogen
   surveillance proteins including 2 scavenger receptor and 2 Toll-like
   receptor. 4 unigenes were signaling pathway genes (Srpn or serpin). 1
   unigene was effector gene encode antimicrobial peptide.

Table 4. Metamorphosis, digestion and immune related genes detected in the C.
Ciliata DEGs dataset.

   Unigene ID Fold Change
   AF vs Nymphs AM vs Nymphs AF vs AM
   Metamorphosis-related genes Met
   c37392_g1 1.8916 1.1994 0.48447
   Broad
   c30286_g1 -1.5684 -2.1682 0.39217
   E7 series nuclear receptor
   c34837_g1 -4.2365 -4.2058 -0.23834
   Hormone receptors
   c38244_g1 -3.5604 -3.9799 0.21193
   ftz-f1
   c32527_g1 -1.144 -1.4728 0.12109
   c38083_g2 -1.1329 -1.4648 0.12419
   Others
   c36420_g1 1.3299 1.1657 -0.04342
   c23250_g1 -3.1944 -3.3947 -0.0072726
   c36792_g1 -6.5889 -5.7741 -1.0224
   c33463_g1 1.3668 1.5055 -0.34635
   digestion-related genes Serine protease
   c39372_g1 -3.0936 -4.0106 0.70943
   c27683_g1 -4.8423 -5.2471 0.1971
   Cysteine protease
   c36392_g1 3.581 -5.9139 9.2873
   Aminopeptidase
   c36547_g2 -5.1543 3.4303 -8.7922
   c37887_g1 -5.6824 -5.851 -0.039035
   c24987_g1 -6.1746 3.7993 -10.182
   c26582_g1 -11.509 3.3086 -15.025
   c32511_g2 -4.9939 3.1614 -8.152
   c34521_g1 -8.8125 3.3063 -12.326
   c34676_g1 -3.3304 3.1462 -6.6842
   c34884_g1 -9.6653 3.1381 -13.011
   c35035_g1 -1.6619 -2.1215 0.25194
   c35539_g1 -2.6068 -3.7812 0.96677
   c36253_g1 -5.6286 3.0675 -8.6928
   Carboxypeptidase
   c31658_g1 -5.0132 -4.8101 -0.41068
   c32354_g1 -5.8394 -4.2195 -1.8275
   c32694_g1 -4.5936 -6.6929 1.8917
   c34338_g2 -4.4356 -5.7741 1.1309
   c34575_g1 1.9712 1.3423 0.42127
   c35834_g1 -1.3044 2.5209 -4.0329
   lipase
   c32378_g1 -4.0463 -4.4767 0.22277
   c14857_g1 -2.4211 -2.5122 -0.11657
   c27654_g1 -5.8152 -5.0262 -0.9966
   c28590_g1 -3.1372 3.4538 -6.7987
   c31359_g1 -2.1681 -1.9684 -0.40728
   glucosidase
   c23488_g1 -5.0055 2.6057 -7.608
   c33442_g1 1.9176 1.6349 0.075112
   Immune-related genes Scavenger receptor
   c34938_g1 2.9408 3.0163 -0.28313
   c30960_g1 2.3447 1.956 0.18107
   Toll-like receptors
   c37763_g2 -1.4859 -1.6096 -0.083979
   c38863_g1 -2.8667 -3.2647 0.19034
   Srpn or serpin
   c27683_g1 -4.8423 -5.2471 0.1971
   c34936_g1 -3.1244 -2.9175 -0.41454
   c39627_g1 -5.8738 -2.4996 -3.5818
   c39627_g2 -5.3046 -2.6767 -2.8355
   Antimicrobial peptide
   c20205_g1 7.85 8.1072 -0.46479
   [125]Open in a new tab

   Number in bold represents significantly difference: q-value <0.005 and
   |log2 (foldchange)| >1. The q-value and Nr annotations of these
   unigenes were in [126]S3 Table.

qRT-PCR analysis

   To confirm the reliability of gene expression profiles in the
   sequencing data, 9 DEGs (difference between AF vs nymphs and AM vs
   nymphs) related to Juvenile hormone (c33463_g1, Juvenile hormone
   epoxide hydrolase 1; c37392_g1, methoprene-tolerant protein; c23250_g1,
   juvenile hormone binding protein-like precursor; c30286_g1, broad),
   cellular homeostasis (c20097_g1, small heat shock protein), defense
   (c20205_g1, defensin 4), molting and metamorphosis (c34837_g1,
   ecdysone-induced protein 78C), digestive (c31526_g1, salivary protein
   MYS2 precursor) and one cuticle protein (c36547_g6, Larval cuticle
   protein A3A) were chosen randomly and examined by the qRT-PCR in AF, AM
   and nymphs. In addition, 7 DEGs (difference between AF vs AM) related
   to sex difference (c37363_g1, vitellogenin receptor), zinc metabolism
   (c28326_g1, ADIPOR-like receptor) and other functions (c36825_g1,
   myosin-IIIb-like isoform X5; c38424_g1, protein bicaudal C homolog 1-B
   isoform X1; c33040_g1, ATP synthase subunits, mitochondrial isoform X1;
   c24503_g1, Atlastin; c36078_g1, Calpain-7) were examined in AF and AM.

   All the designed primers were listed in [127]S4 Table. Results from the
   qRT-PCR experiment were demonstrated consistent expression patterns
   with the relative expression levels obtained by RNA-Seq data analyses
   and shown similar results in up- or down-regulation of C. Ciliata genes
   ([128]Fig 8).

Fig 8. qRT-PCR validations results of DEGs in C. Ciliata.

   [129]Fig 8
   [130]Open in a new tab

   a. The fold changes for 9 DEGs (difference between AF vs nymphs and AM
   vs nymphs) from RPKM value in RNA-Seq data. b. The fold changes of
   these 9 DEGs in AF, AM and nymphs from qRT-PCR data. The fold changes
   of the 9 DEGs were calculated as the log2 value of each AF/nymph and
   AM/nymph comparison from RNA-Seq and qRT-PCR data. c. The fold changes
   for 7 DEGs (difference between AF vs AM) from RPKM value in RNA-Seq
   data. d. The fold changes of these 7 DEGs in AF and AM from qRT-PCR
   data. The fold changes of the 7 DEGs were calculated as the log2 value
   of each AF/AM comparison from RNA-Seq and qRT-PCR data. qRT-PCR were
   performed in triplicates. Error bars indicate the standard error of the
   mean. Significant difference was detected in all 9 DEGs between AF vs
   nymphs, AM vs nymphs (t-test, P<0.05) and all 7 DEGs between AF vs AM
   (t-test, P<0.05).

Discussion

   In this subject, the transcriptomes of C. Ciliata were sequenced by the
   Illumina platform. We identified a total of 60,879 unigenes from C.
   Ciliata transcriptome and only 26.81% showed specific NCBI Nr protein
   database matches ([131]Table 3). This suggests that the transcriptome
   data provide abundant information besides the now available Nr protein
   sequences. Thus, our result make a significant contribution to the
   current molecular resources for this invasive forestry pest of plane
   trees (Platanus spp.), and provides a framework for understanding the
   development and sex difference. This dataset could serve as not only a
   valuable resource to better understanding of biology and physiology of
   C. Ciliata, but may also contribute to the development of novel pest
   manipulation strategies for C. Ciliata. Furthermore, this first
   large-scale transcriptomic dataset for C. Ciliata overall offers a
   valuable genetic resource for gene function and evolutionary research
   on Tingidae species.

   Comparison of gene expression among the different developmental stages
   and different sex in the current experiment is helpful for
   identification of deferentially expressed genes across this pest's
   development and sexual dimorphism, thereby expanding our current
   knowledge of 60,879 C. Ciliata gene expression profiles. This result
   will contribute to future research on molecular feature of
   metamorphosis, digestion and immune, developmental mechanisms, and
   sex-determination mechanisms of this and other Tingidae species. During
   the developmental stage of C. Ciliata, the 60.22% of DEGs were found to
   be up-regulated and 65.48% were found to be down-regulated in the AM
   and AF when compared with the nymphs, respectively. In addition, a
   large number of genes showed developmental stage-related expression
   levels (e.g., juvenile hormone related genes, chemosensory protein,
   cuticular protein, etc) that were likely associated with C. Ciliata
   developmental process. Results From the comparison between AF and AM
   demonstrate that 125 and 1,475 unigenes were up-regulated and
   down-regulated in the AF, respectively. Vitellogenin receptor and
   protein bicaudal C were found to be up-regulated in AF vs AM, similar
   with previous study in Brown Planthopper Nilaparvata lugens[[132]32].
   Other various genes validated by qRT-PCR such as the Atlastin,
   ADIPOR-like receptor and the Calpain-7 gene appear to be sex-biased
   genes. In current study, A total of 19 juvenile hormone related genes
   were identified in our transcriptome data ([133]S5 Table), the genes
   are more abundant than the previous study in another hemipteran pest
   Diaphorina citri, in which 13 juvenile hormone genes were found by
   transcriptome analysis[[134]33]. The transcriptional expression levels
   of four juvenile hormone related genes (c33463_g1, Juvenile hormone
   epoxide hydrolase 1; c37392_g1, methoprene-tolerant protein; c23250_g1,
   juvenile hormone binding protein-like precursor; c30286_g1, broad) was
   further validate by qPCR in AF, AM and nymphs. Furthermore, many of sex
   biased genes (e.g. transformer protein, odorant binding protein,
   cuticular protein) identified from the whitefly Bemisia tabaci
   [[135]34] and mirid bugs Adelphocoris suturalis[[136]35] were also
   investigated in our transcriptome data, thereby further enriching our
   understanding of sex determination and differentiation in non-model
   insect pests.

   In current study, many important DEGs in C. Ciliata transcriptome were
   detected which were potentially related to various key processes in the
   life of C. Ciliata (e.g., metamorphosis, digestion and immune, sex
   difference). These genes may offer novel molecular targets for
   developing new insecticides to control C. Ciliata. Further examination
   is necessary for characterizing these genes in development or sex
   differences in C. Ciliata, using a combination molecular biology, cell
   biology, gene knockout and knockdown approach (e.g., gene expression,
   electrophysiology analysis, morphological analysis, behavior analysis,
   CRSPR/Cas9[[137]36] and RNAi [[138]14]).

   It is noteworthy that there is no annotation available for a large
   number of DEGs revealed in current study. These DEGs (536 unigenes
   between AM vs nymphs, 373 unigenes between AF vs nymphs, 391 unigenes
   between AF vs AM) showed significantly up-regulated or down-regulated
   patterns during development and sex difference. These new unigenes
   provides a unique resource for further studies of the developmental
   mechanism or sex determination mechanisms in C. Ciliata.

   We identified 17,462 microsatellite DNA loci from the transcriptome
   dataset of C. Ciliata. In comparison to FIASCO methods for
   microsatellite isolation[[139]37,[140]38], transcriptomes provide an
   economical, fast, sensitive and high-throughput screening method. It
   also produces multiple types of microsatellite simultaneously. These
   microsatellite DNA loci found in our study will facilitate future
   research in population genetic structure, gene flow studies, and
   parentage analysis of C. Ciliata, as well as other Tingidae
   species[[141]39,[142]40].

Supporting Information

   S1 Fig. Venn diagram of the DEGs detected in AF, AM and nymphs
   libraries.

   (TIF)
   [143]Click here for additional data file.^ (774.3KB, tif)
   S1 Table. Top ten DEGs in each library of comparisons.

   (XLS)
   [144]Click here for additional data file.^ (32.5KB, xls)
   S2 Table. GO and KEGG enrichment analysis for DEGs.

   (XLS)
   [145]Click here for additional data file.^ (62KB, xls)
   S3 Table. Metamorphosis, digestion and immune related genes detected in
   the C. Ciliata DEGs dataset.

   (XLS)
   [146]Click here for additional data file.^ (39KB, xls)
   S4 Table. Primers used in this study for qRT-PCR validation, including
   Nr annotation information for each unigenes.

   (XLS)
   [147]Click here for additional data file.^ (25KB, xls)
   S5 Table. Juvenile hormone related genes identified in the C. Ciliata.
   transcriptome.

   (XLS)
   [148]Click here for additional data file.^ (24KB, xls)

Data Availability

   The deep sequencing dataset has been uploaded to the NCBI Short Read
   Archive database, BioProject ID number PRJNA312006. The Accession
   numbers are SRR3170921, SRR3170922 and SRR3170923.

Funding Statement

   This work was funded by the Project of science and technology of
   Beijing City (grant No. Z151100001115002).

References