Abstract Background De novo transcriptome sequencing is a robust method of predicting miRNA target genes, especially for organisms without reference genomes. Differentially expressed miRNAs had been identified previously in kidney samples collected from susceptible and resistant grass carp (Ctenopharyngodon idella) affected by Aeromonas hydrophila. Target identification for these differentially expressed miRNAs poses a major challenge in this non-model organism. Results Two cDNA libraries constructed from mRNAs of susceptible and resistant C. idella were sequenced by Illumina Hiseq 2000 technology. A total of more than 100 million reads were generated and de novo assembled into 199,593 transcripts which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. A BLASTx analysis against a non-redundant protein database revealed that 61,373 unigenes coded for 28,311 annotated proteins. Two cDNA libraries from susceptible and resistant samples showed that 721 unigenes were expressed at significantly different levels; 475 were significantly up-regulated and 246 were significantly down-regulated in the SG samples compared to the RG samples. The computational prediction of miRNA targets from these differentially expressed genes identified 188 unigenes as the targets of 5 conserved and 4 putative novel miRNA families. Conclusion This study demonstrates the feasibility of identifying miRNA targets by transcriptome analysis. The transcriptome assembly data represent a substantial increase in the genomic resources available for C. idella and will provide insights into the gene expression profile analysis and the miRNA function annotations in further studies. Background Next-generation sequencing (NGS) -based RNA sequencing for transcriptome methods (RNA-seq) allow simultaneous acquisition of sequences for gene discovery as well as identification of transcripts involved in specific biological processes. This is especially suitable for non-model organisms whose genomic sequences are unknown [31][1], [32][2]. In addition, the dynamic range, sensitivity and specificity of RNA-seq also make it ideal for quantitatively analyzing various aspects of gene regulation [33][3]. These techniques do not require prior knowledge of genomic sequence and are much advanced in terms of time, cost, labor, amount of data produced, data coverage, sensitivity, and accuracy compared to traditional sequencing methods [34][4], [35][5]. The grass carp (Ctenopharyngodon idella) is one of the most important farmed fish species in China, with a cultural history dating back to the 7th century CE (Tang Dynasty) [36][6]. According to the FAO, the value of farmed C. idella reached more than 6.46 billion USD for a production of 5.03 billion tons in 2012, thus accounting for the highest production and third highest value of major cultured fish species worldwide at single species level [37][7]. Despite favorable growth traits, farmed C. idella are rather susceptible to various disease. Outbreaks of disease associated with bacteria such as Aeromonas hydrophila have caused high mortality, resulting in reduced production and considerable economic losses [38][8]. A. hydrophila is a causative agent of a wide spectrum of diseases in humans and animals [39][9]. While originally thought to be an opportunistic pathogen in immunocompromized humans, an increasing number of intestinal and extraintestinal disease cases suggest that it is an emerging human pathogen irrespective of the immune status of the host [40][10]. The pathogenesis, pathogenic mechanism, and virulence factors responsible for selected A. hydrophila infections in different species are not well understood [41][11]. A. hydrophila is a Gram-negative motile bacillus widely distributed in aquatic environments. It causes motile aeromonad septicemia (MAS), which results in great economic losses in worldwide freshwater fish farming [42][12]. Thus, more effective measures against A. hydrophila infection in fish are needed. Identification of differentially expressed genes (DGEs) following A. hydrophila infection is important for an improved understanding of fish MAS. MicroRNAs (miRNAs) are 20–22 nt non-coding RNAs that play important roles in post-transcriptional gene regulation. In animal cells, miRNAs regulate their targets by translational inhibition and mRNA destabilization [43][13]. MicroRNAs (miRNAs) are key effectors in mediating host-pathogen interactions and constitute a family of small RNA species; they are considered a promising candidate for regulating the interaction between host and pathogen [44][14], [45][15]. Therefore, dissecting the biological functions of miRNAs may help us understand the pathogenic mechanism of motile aeromonad septicemia in C. idella. Many studies have identified miRNAs and mRNA transcriptome in fish species, like common carp [46][16], [47][17], nile tilapia [48][18], [49][19] rainbow trout [50][20], [51][21] channel catfish [52][22], [53][23] and silver carp [54][24], [55][25]. To thoroughly interpret the biological functions of these miRNAs, a first step is to predict their targets. Therefore, establishing a more powerful transcriptome data for target identification is preferred. Although two parallel C. idella expressed sequence tag analyses have already been conducted using head kidney tissue [56][26], [57][27], the data presented here represent the first effort to analyze the transcriptome of C. idella affected by A. hydrophila. Two cDNA libraries from SG and RG C. idella used for our miRNA analysis were constructed and sequenced with Illumina Hiseq 2000. The obtained reads were assembled into transcripts and annotated by BLAST analysis against various databases before screening the results for differentially expressed genes and the prediction of miRNA targets. Our work will provide an approach to identify the target genes of miRNAs and to characterize their functional/regulatory network to increase our understanding of hemorrhagic septicemia outbreaks in C. idella. Materials and Methods Ethics statement All handling of fishes was conducted in accordance with the guidelines on the care and use of animals for scientific purposes set up by the Institutional Animal Care and Use Committee (IACUC) of the Shanghai Ocean University, Shanghai, China. The IACUC has specially approved this study within the project “Breeding of Grass Carp” (approval number is SHOU-09-007). Sample collection C. idella with an average weight of 50 g were cultured individually at the Wujiang National Farm of Chinese Four Family Carps (Jiangsu Province, China). Animals were raised at 28°C in 400-L aerated tanks for one week before the experiment and fed twice daily (in the morning and late in the afternoon) at a ratio of 5% of total biomass. Two groups (30 animals in each group) were maintained in two aquariums and intraperitoneally injected with A. hydrophila AH10 (Aquatic Pathogen Collection Center of Ministry of Agriculture, China) at a dose of 7.0×10^6 cells suspended in 100 µl PBS per fish. All fish were observed every 4 h for any mortality and samples were taken until the termination of the experiment at 240 h post-challenge. C. idella died in the first 72 h post-challenge were classified as susceptible group (SG) for their high sensitivity to A. hydrophila, while the animals that survived over 240 h post-challenge were considered as resistant group (RG) [58][28]. Spleen and kidney samples were immediately snap-frozen in liquid nitrogen and stored at −80°C until further use. cDNA library construction and sequencing Experimental protocols for the cDNA sequence were performed according to the manufacturer's technical instructions. The spleen and kidney tissues of randomly-selected three fish from both the susceptible and resistant groups were collected and, labeled as SG and RG, respectively. The RNA from the same tissue of three fishes of SG and RG C. idella was pooled with equal quantity for the construction of SG and RG cDNA libraries. The pooled total RNA was isolated from each spleen and kidney samples with TRIZOL reagent (Invitrogen, Grand Island, NY, USA). RNA integrity was confirmed using a 2100 Bioanalyzer (Agilent Technologies, Inc.) by RNA 6000 nano with a minimum RNA integrity number (RIN) value of 7.0. Poly (A) mRNA was purified from the total RNA using oligo (dT) magnetic beads. Equal amounts of the high-quality mRNA samples were obtained from each group for cDNA library preparation using the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) and purified using Agencount AMPure XP beads (Beckman Coulter, Krefeld, Germany) according to the manufacturer's recommendations. The concentration of the cDNA library was determined on an Agilent Technologies 2100 Bioanalyzer by Agilent DNA-1000. Libraries were sequenced, at the Novogene Bioinformatics Institute (Beijing, China) on an Illumina HiSeq 2000 instrument (Illumina, San Diego CA, USA) that generated paired-end reads of 101 nucleotides. Data processing, assembly, and functional annotation Raw reads generated by Illumina Hiseq 2000 were then cleaned by removing the adaptor containing sequences, any ambiguous base>10% reads and low quality reads (1/2 reads with Q-value≤5) to get clean reads. Then, all clean reads were assembled using the de novo assembly program Trinity [59][29]: first, short reads were assembled into high-coverage contigs that could not be extended farther in either direction in a k-mer-based approach for fast and efficient transcript assembly. Then, the related contigs were clustered and a de Bruijn graph for each cluster was constructed. Finally, in the context of the corresponding de Bruijn graph and all plausible transcripts, alternatively spliced isoforms and transcripts were derived. All assembled transcripts were compared with publicly available databases including Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences) [60][30], KOG/COG (Clusters of Orthologous Groups of proteins) [61][31], Swiss-Prot (a manually annotated and reviewed protein sequence database) [62][32], KO (KEGG ortholog database) [63][33], Pfam (protein family) [64][34] and GO (Gene Ontology) ([65]http://www.geneontology.org/). Nr, Nt, KOG/COG, Swiss-Prot, and KO used the BLASTx analysis with a cut-off E-value of 10^−5, Pfam used Hmmerscan and GO used Blast2GO [66][35]. The best Blast hits from all Blast results were parsed for a homology-based functional annotation. For the nr annotations, the Blast2GO program was used to obtain GO annotations of unique assembled transcripts to describe biological processes, molecular functions, and cellular components. Differentially expressed genes between the SG/RG libraries High-quality reads were mapped to reference sequences (unigenes from the transcriptome data of the cDNA library) using RSEM [67][36]. Gene expression levels were calculated using the fragments per kilo bases per million mapped reads (FPKM) method [68][37]. The calculation of unigene expression levels and the identification of unigenes that were differentially expressed between the libraries were performed by DEGseq [69][38] based on TMM normalized counts. The settings “q.value <0.005” and “|log2.Fold change.normalized|>2” were used as thresholds for judging significant differences in transcript expression. Differentially expressed genes across the samples were further annotated by GO and KEGG pathway analysis MiRNA target prediction The SG and RG kidney miRNA-seq analysis were conducted in the same biological samples as mRNA-seq. Small RNA libraries were constructed using a Small RNA Cloning Kit (Takara). RNA was purified by polyacrylamide gel electrophoresis (PAGE) to enrich for the molecules in the range of 17–27 nt, then was ligated with 5' and 3′ adapters. The resulting samples were used as templates for cDNA synthesis followed by PCR amplification. The obtained sequencing libraries were subjected to Solexa sequencing-by-synthesis method. After the run, image analysis, sequencing quality evaluation and data production summarization were performed with Illumina/Solexa pipeline. The sequencing data was pretreated to discard low quality reads, no 3′-adaptor reads, 5′-adaptor contaminants and sequences shorter than 18 nucleotides. After trimming the 3′ adaptor sequence, sequence tags were mapped onto the transcriptome of C. idella using bowtie. Any small RNAs having exact matches to transcriptome of C. idella were used from further analysis. The mapped reads were compared to the miRBase (19.0) to annotate conserved miRNAs. To predict novel miRNAs, the miREvo [70][39] and mirdeep2 [71][40] were used. Computational identification of differentially expressed miRNA targets was performed using the miRanda toolbox [72][41], using the complementary region between miRNAs and mRNAs and the thermodynamic stability of the miRNA-mRNA duplex. All mRNAs used for target prediction came from the differentially expressed unigenes obtained as described above. The miRanda toolbox employed a dynamic programming algorithm to search the complementary regions between the miRNA and the 3′-UTR of the mRNA, and the scores were based on sequence complementary as well as minimum free energy of RNA duplexes, and were calculated with the Vienna RNA package [73][42]. All detected targets with scores and energies less than the threshold parameters of S>90 (single-residue pair scores) and ΔG <−17 kcal/mol (minimum free energy) were selected as potential targets. Real time PCR validation The sequencing results were validated by real time PCR using One Step PrimeScript miRNA cDNA Synthesis Kit for miRNA (TaKaRa) reversely transcribed, PrimeScript RT reagent Kit with gDNA Eraser (TaKaRa) for mRNA and SYBR Premix Ex Taq II (2x) (Takara) for qPCR according to the manufacturer protocols. Specific primer assays for miR-21 [F: 5′-TAGCTTATCAGACTGGTGTTGGC-3′, R: Uni-miR qPCR Primer (TaRaKa)], JNK1 (F: 5′- TGGTCAGAGGTAGTGTGTTG-3′, R: 5′-AGTTTGTTGTGGTCCGAGTC-3′) and ccr7 (F: 5′-CAAGCCAAGAACTTTGAGAGG-3′, R: 5′-GGCATAAAGGCGAATGTTGTC-3′) were purchased from sangon biotech and real time PCR quantification was carried out in CFX96 Real-Time PCR System (Bio-Rad, CA, USA). To normalize the expression values, miR-22a for miRNA and 18s for mRNA were used as housekeeping control [74][43], [75][44]. Expression levels were quantitatively analyzed using the 2^−△△CT method. One-way ANOVA tests were performed using SPSS 17.0 to determine significant differences. Each experiment was repeated in triplicates. Results and Discussion Antagonistic bacteria, such as A. hydrophila, enhance non-specifically immune-related enzyme activities and disease resistance in C. idella and provide a theoretical basis for disease prevention in aquaculture. However, the molecular mechanisms of this disease are still far from fully understood. The identification and characterization of candidate genes involved in MAS would represent the first step in understanding the genetic basis of this process in C. idella. De novo assemblies and unigenes annotation The sequencing generated 135.80 million raw reads. After trimming, 130.88 million clean reads remained, corresponding to 13.22 GB clean bases. The dataset of each sample, SG and RG, was represented by over 60 million clean reads, a read density sufficient for the subsequent quantitative analysis of genes. The raw sequencing reads have been submitted to the NCBI Short Read Archive under the accession number of SRR1124206 and SRR1125014. Then, all clean reads were assembled using a de novo assembly program Trinity [76][45]. These short reads were further assembled into 199,554 transcripts with an average length of 977 bp ([77]Table 1). The size distribution of these transcripts ranged from 201 to 27,185 bp, of which 27,334 were larger than 2,000 bp ([78]Figure 1). The assembled transcriptome data were deposited in NCBI's Transcriptome Shotgun Assembly (TSA) database under the accession numbers from SUB583458. Table 1. Summaries of sequencing cDNA library. Sample name SG RG Total reads 73,063,654 62,737,669 Clean reads 70,210,307 60,668,815 Total mapped to unigenes readcounts 61,396,052.97 51,785,549.97 Reads length (bp) 101 GC content (%) 48.43 47.13 Number of unigenes 199,554 Total length of unigenes (bp) 195,075,872 Mean length of unigenes (bp) 977 N50 of unigenes (bp) 2,117 Maximal length of unigenes (bp) 27,185 [79]Open in a new tab Figure 1. Length distribution of assembled unigenes in the sequenced cDNA library. [80]Figure 1 [81]Open in a new tab Annotation of predicted proteins A total of 61,373 distinct sequences (30.75% of the transcripts) matched known genes corresponding to 28,311 of the annotated proteins ([82]Table 2, [83]Table S1). An additional functional annotation of the unigenes of C. idella was performed searching for putative orthologs and paralogs within the KOG database [84][31]. A total of 16980 unigenes (27.7%) were assigned to 26 eukaryotic orthologous groups ([85]Figure 2A). The category “signal transduction”, which contained 3,611 unigenes (21.27% of 16980 unigenes), was the largest, followed by the categories “general functional prediction only” (3506, 20.65%), “post-translational modification, protein turnover, chaperone” (1539, 9.06%) and “cytoskeleton” (1226, 7.22%). Table 2. Statistics of the annotation results for the C. idella unigenes. All Nr Nt Pfam KOG Swiss-Prot KO GO Number of unigenes 61373 28311 46653 33013 16980 22674 27775 34207 % of unigenes 100 46.1 76.0 53.8 27.7 36.9 45.3 55.7 [86]Open in a new tab Nr: NCBI non-redundant protein sequences, Nt: NCBI non-redundant nucleotide sequences, Pfam: Protein family, KOG: Clusters of Orthologous Groups of proteins, Swiss-Prot: A manually annotated and reviewed protein sequence database, KO: KEGG Ortholog database and GO: Gene Ontology. Figure 2. Functional annotations of the unigenes of C. idella. [87]Figure 2 [88]Open in a new tab (A) KOG annotation. (B) Level 2 GO term distribution for the biological process, cellular component and molecular function categories. GO annotation and KEGG pathway analyses After GO annotation, C. idella transcripts could be assigned to three categories: biological processes, molecular functions and cellular components. Within the various biological processes, cellular processes (19,121 unigenes) metabolic processes (16,091) and biological regulation (8,463) were the most highly represented members ([89]Figure 2B). Important functions, such as cell death (389) and immune system processes (540), were also identified in this category. Similarly, cell (18,586) as well as cell part (18,586) and binding (19,460) were the most represented sub-categories in the cellular component and molecular function categories, respectively. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) [90][33] revealed that 10,561 unigenes could be matched to 298 KEGG pathways. The most-represented pathways hierarchy 2 were the “infectious diseases” pathway (3210 unigenes) and the “signal transduction” pathway (2316) ([91]Table 3). Some pathways related to immune system were also identified, such as the “Toll-like receptor signaling” pathway (110) [92][46] and the “chemokine signaling” pathway (224) [93][47] ([94]Table 4). Table 3. Top 10 list of the gene number of Pathway Hierarchy 2. Pathway Hierarchy 2 Unigene Number Infectious Diseases 3210 Signal Transduction 2316 Cancers 2235 Nervous System 1648 Immune System 1554 Neurodegenerative Diseases 932 Digestive System 875 Endocrine System 861 Cell Communication 859 Signaling Molecules and Interaction 763 [95]Open in a new tab Table 4. Top 10 list of pathways related to immune system. Pathway Hierarchy 2 KEGG Pathway Unigene Numbers Immune System Chemokine signaling pathway 224 Immune System Leukocyte transendothelial migration 199 Immune System T cell receptor signaling pathway 156 Immune System Fc gamma R-mediated phagocytosis 155 Immune System Natural killer cell mediated cytotoxicity 129 Immune System B cell receptor signaling pathway 110 Immune System Toll-like receptor signaling pathway 110 Immune System Antigen processing and presentation 86 Immune System Fc epsilon RI signaling pathway 84 Immune System RIG-I-like receptor signaling pathway 67 [96]Open in a new tab Digital gene expression library sequencing Based on the transcriptome sequence data, two DGE libraries were constructed to identify the differentially expressed unigenes between the SG and RG samples. After removing low-quality reads, 70,210,307 and 60,668,815 clean reads were generated from the SG and RG libraries, respectively ([97]Table 1). Among these clean reads, 61,396,052.97 of the SG and 51,785,549.97 of the RG readcounts were mapped to unigenes. Differential gene expression between the SG and RG libraries The results suggest that the expression of 721 genes differed significantly between the SG and RG groups of C. idella. Of these genes, 475 were up-regulated and 246 were down-regulated in the SG samples compared to the RG samples ([98]Figure 3 and [99]Table S2). GO enrichment analysis of DEGs indicated that these genes were significantly enriched in oxidation-reduction processes (biological process), integral to membrane (cellular components), and protein binding (molecular function) ([100]Table S3). Pathway enrichment analysis found the DEGs to be mainly enriched in complement and coagulation cascades, Staphylococcus aureus infection and porphyrin and chlorophy II metabolism ([101]Table S4). Notably, several genes involved in the immune and inflammatory response were also identified, such as C-type lectin [102][48] and matrix metalloproteinase-9 [103][49]. Figure 3. Number and Fold change distribution of differentially expressed genes between the SG/RG libraries. [104]Figure 3 [105]Open in a new tab MiRNA target prediction The identification of miRNAs and their targets is important for understanding the physiological functions of miRNAs and the functional roles of differentially expressed miRNAs between healthy and diseased fish. We were thus interested in predicting miRNA target genes involved in the immune response or immune system, according to the KEGG analysis. In a previous study, small RNA deep-sequencing data were aligned with miRBase 18.0 to search for known miRNAs with complete matches, namely, conserved miRNAs (data unpublished). Meanwhile, miRNAs predicted by miRDeep 2.0 that could form stable secondary structures, were identified as novel miRNAs. We have used a single algorithm, miRanda [106][50] to predict miRNA targets. As miRNA binding to target 3′ UTR generally results in mRNA destabilization and degradation [107][51], we chose to narrow down potential targets to those showing differential expression in the opposite direction as the mRNA. This approach increases the strength to discover true target genes and functions affected by miRNA dysregulation. In total, 188 of the target genes predicted by miRanda were differentially expressed in the opposite direction in the target tissue. The identification of 188 unigenes ([108]Table S5) as the predicted target genes of 5 conserved and 4 putative novel miRNA families ([109]Table 5). The identified target genes involved in biological processes, molecular functions, and cellular components were defined using GO annotations. GO analysis demonstrated that these targets were involved in a broad range of physiological processes, including gene expression, transcription regulation, immune system processes, and responses to stress or stimuli ([110]Table S6). Table 5. Differentially expressed miRNAs in C. idella kidney between SG and RG. miRNA Sequence (5′-3′) SG (TMP) RG (TMP) log2 (Fold_change) normalized p-value let-7i UGAGGUAGUAGUUUGUGCUGUU 3419.67473 1475.401006 1.212751983 4.62E-174 miR-142a-3p UGUAGUGUUUCCUACUUUAUGGA 4770.81118 9798.344345 −1.038303405 0 miR-21 UAGCUUAUCAGACUGGUGUUGGC 3431.54242 1640.502276 1.064719594 4.12E-142 miR-217 UACUGCAUCAGGAACUGAUUGG 240.9140969 2983.647091 −3.630486183 0 miR-223 UGUCAGUUUGUCAAAUACCCC 1460.912578 6800.683353 −2.218809871 0 novel_115 UGAAGGCCGAAGUGGAGA 3.560306851 17.95531055 −2.334337113 0.001253505 novel_131 UGCCCGCAUUCUCCACCA 7.713998177 40.28996513 −2.384869847 9.85E-07 novel_154 CCCAGCCAUAUUUGUUUGAAC 16.02138083 0 4.768565529 4.44E-05 novel_3 UGUUUCUGGCUCUGAUAUUUGCU 32.04276166 71.82124218 −1.164412111 8.07E-05 [111]Open in a new tab Searching against the KEGG indicated that 188 unigenes mapped to 48 KEGG pathways. 26 pathways were related to immune and diseases in all pathways ([112]Table 6). These included the categories “Toll-like receptor signaling pathway”, “Fc epsilon RI signaling pathway”, “Chemokine signaling pathway”, “Fc gamma R-mediated phagocytosis”, “Antigen processing and presentation”, “Natural killer cell mediated cytotoxicity”, “T cell receptor signaling pathway” and “Complement and coagulation cascades” related immune functions. Toll-like receptor signaling pathway induce the expression of a variety of host defense genes. These include chemokine signaling pathway and other effectors necessary to arm the host cell against the invading pathogen [113][52]. Cytokine-cytokine receptor interaction play a pivotal role in the generation of immunological responses during bacterial infection [114][53]. This implicated functions that are likely regulated by miRNAs and suggests regulation of different pathways during immune activation in susceptible and resistant C. idella. Table 6. 26 pathways were related to immune and diseases in all pathways. KEGG Pathway Pathway Hierarchy 2 Target gene of different expression miRNA Toll-like receptor signaling pathway Immune system JNK, TLR5, TBK1 Hepatitis C Infectious diseases: Viral JNK, TBK1, LDLR Salmonella infection Infectious diseases: Bacterial JNK, TLR5 Fc epsilon RI signaling pathway Immune system JNK, FYN Tuberculosis Infectious diseases: Bacterial JNK, PLK3, CTSS Influenza A Infectious diseases: Viral JNK, DNAJB1, TBK1 Measles Infectious diseases: Viral FYN, TBK1 MAPK signaling pathway Signal transduction JNK, MKP Toxoplasmosis Infectious diseases: Parasitic JNK, LDLR Chemokine signaling pathway Immune system CCR7, ADCY7, DOCK2 HTLV-I infection Infectious diseases: Viral JNK, ADCY7, SLC2A1 Herpes simplex infection Infectious diseases: Viral JNK, TBK1 Fc gamma R-mediated phagocytosis Immune system DOCK2 Prion diseases Neurodegenerative diseases FYN Viral myocarditis Cardiovascular diseases FYN Pathways in cancer Cancers: Overview JNK, SLC2A1 Transcriptional misregulation in cancer Cancers: Overview CCR7 Type II diabetes mellitus Endocrine and metabolic diseases JNK Cytokine-cytokine receptor interaction Signaling molecules and interaction CCR7, TNFSF12 Antigen processing and presentation Immune system CTSS Pertussis Infectious diseases: Bacterial JNK Natural killer cell mediated cytotoxicity Immune system FYN T cell receptor signaling pathway Immune system FYN Chagas disease Infectious diseases: Parasitic JNK Staphylococcus aureus infection Infectious diseases: Bacterial CFB Complement and coagulation cascades Immune system CFB [115]Open in a new tab List of gene abbreviations: JNK: c-Jun N-terminal kinase, TLR5: toll-like receptor 5, TBK1: TANK-binding kinase 1, LDLR: low-density lipoprotein receptor, FYN: tyrosine-protein kinase Fyn, CTSS: cathepsin S, PLK3: polo-like kinase 3, DNAJB1: DnaJ homolog subfamily B member 1, MKP: dual specificity MAP kinase phosphatase, DOCK2: dedicator of cytokinesis protein 2, ADCY7: adenylate cyclase 7, SLC2A1: MFS transporter, SP family, solute carrier family 2 (facilitated glucose transporter), member 1, CCR7: C-C chemokine receptor type 7, TNFSF12: tumor necrosis factor ligand superfamily member 12, CFB: component factor B. Of particular interest to our study is the fact that several of the most highly expressed miRNAs in SG have been shown to have a role in immunity. let-7i and associated TLR4 expression are involved in cholangiocyte immune responses against C. parvum infection [116][54]. miR-21 targets multiple genes associated with the immunologically localized disease [117][55]. miR-21 which is up-regulated in SG is predicted to target 28 differentially expressed genes ([118]Table S7) in C. idella. Frequently represented in the top immune functions were protein kinase JNK1 (JNK1) and chemokine (C-C motif) receptor 7 (ccr7) ([119]Table 6). JNK1 and ccr7 showed clearly down-regulated expression profiles in the SG samples compared to the RG samples ([120]Figure 4A [121]http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone .0073506-pone.0073506.s003 ). The decreased expression profile of these targets in the susceptible samples supported our previous finding that the expression of miR-21 was significantly up-regulated, with TMP of 3,432 and 1,642 in the SG and RG groups, respectively ([122]Table 5). We validated the miR-21, JNK1 and ccr7 which had expression change in the sequencing data by performing real time PCR in the same samples used for the sequencing ([123]Figure 4B). Figure 4. The expression analysis of selected genes from the expression profile by relative quantitative real-time PCR. [124]Figure 4 [125]Open in a new tab A Transcriptome sequencing data, B Real-time PCR data. Increases and decreases in relative levels of transcripts with respect to the control 18s for mRNA and miR-22a for miRNA are shown. The settings “q.value <0.005” and “|log2.Fold change.normalized|>2” were used as thresholds for judging significant differences in transcript expression. One-way ANOVA tests were performed using SPSS 17.0 to determine significant differences for Real-time PCR data. Statistical significance of the relative expression ratio is indicated *. JNK1 is involved in apoptosis, neurodegeneration, cell differentiation and proliferation, inflammatory conditions and cytokine production mediated by AP-1 (activation protein 1), such as Regulated upon activation normal T cell expressed and presumably secreted (RANTES), Interleukin 8 (IL-8), and Granulocyte-macrophage colony-stimulating factor (GM-CSF) [126][56]. It has been reported that JNK plays an important role in the innate immune response to microbial challenge [127][57], [128][58]. Most importantly, JNK1 serves as a negative regulatory factor for MAP kinase phosphatase 5 (MKP5) that plays an essential role in innate immune responses [129][59]. The chemokine receptor CCR7 acts as an important organizer of the primary immune response [130][60]. A previous study demonstrated a discrete CCR7 requirement in the activation of different T cell subsets during bacterial infection [131][61]. CCR7 is differentially regulated by macrophages in exposure to bacteria, as it is triggered by exposure to both Gram-negative and Gram-positive bacteria [132][62]. The discovery of microRNAs dramatically changed our perspective on eukaryotic gene expression regulation [133][63]. MicroRNAs play important gene-regulatory roles in animals and plants by pairing to the mRNAs of protein-coding genes to direct their posttranscriptional repression [134][64], [135][65]. The identification of miRNA target genes is an important step in understanding their role in gene regulatory networks. Most miRNA-associated computational methods comprise the prediction of miRNA genes and their targets, and an increasing number of computational algorithms and web-based resources are being developed to fulfill the needs of scientists performing miRNA research, like miRanda [136][42], TargetScan [137][66], RNAhybrid [138][67] and PicTar [139][68]. However, animal miRNA targets are difficult to predict since miRNA: mRNA duplexes often contain several mismatches, gaps, and G:U base pairs in many positions, thus limiting the maximum length of contiguous sequences of matched nucleotides [140][69]. The predicted interactions using these computational methods are inconsistent and the expected false positive rates are still high. Recently, several authors suggested integrating expression profiles from both miRNA and mRNA with in silico target predictions to reduce the number of false positives and increase the number of biologically relevant targets [141][70]. These methods have been shown to be effective in identifying the most prominent interactions from the databases of putative targets [142][71]. To minimize false positive rates in our study, the RNA-seq and miRNA-seq analysis were conducted in the same biological samples. Likewise, to reduce the number of putative target genes, the miRNA targets were predicted from differentially expressed genes. However, some false positive predictions proved inevitable. Further studies will focus on the experimental validation of the differentially expressed mRNA and miRNAs identified in this study. They are likely to be central regulators of the innate immune response to A. hydrophila and thus represent potential therapeutic targets or novel biomarkers of infection and inflammation. Conclusions In this study, we used high-throughput sequencing data to characterize the transcriptome of C. idella, a species for which little genomic data are available. Further, DGE tags were mapped to the assembled transcriptome for further gene expression analysis. A large number of candidate genes involved in MAS were identified. This represents a fully characterized transcriptome, and provides a valuable resource for genetic and genomic studies in C. idella. Additionally, DGE profiling provides new leads for functionally studies of genes involved in MAS. Finally, comparison with our previous miRNA profiling, this study strongly indicates that miRNA is a critical factor in determining mRNA abundance and regulation during MAS. Our on-going effort using experimental approach such as knock-down or over-express candidate miRNAs and mRNAs in vitro is expected to provide new evidence in understanding these regulatory mechanisms of MAS in C. idella. Supporting Information Table S1 Summary of unigene annotation against Nr, Nt, Pfam, KOG, Swiss-Prot, KO and GO database. (XLSX) [143]Click here for additional data file.^ (23.9MB, xlsx) Table S2 721 genes differed significantly between the SG and RG groups of C. idella. (XLSX) [144]Click here for additional data file.^ (119.3KB, xlsx) Table S3 GO enrichment analysis of 721 genes differed significantly. (XLSX) [145]Click here for additional data file.^ (100.4KB, xlsx) Table S4 Pathway enrichment analysis of 721 genes differed significantly. (XLSX) [146]Click here for additional data file.^ (35KB, xlsx) Table S5 188 genes as the predicted target genes of 9 different expression miRNA between SG and RG groups of C. idella. (XLSX) [147]Click here for additional data file.^ (47.5KB, xlsx) Table S6 GO enrichment analysis of 188 genes as the predicted target genes of different expression miRNA. (XLSX) [148]Click here for additional data file.^ (61.8KB, xlsx) Table S7 28 target gene information of miR-21 . (XLSX) [149]Click here for additional data file.^ (15.4KB, xlsx) Acknowledgments