Abstract
Background
Ammopiptanthus mongolicus (Maxim. ex Kom.) Cheng f., an evergreen
broadleaf legume shrub, is distributed in Mid-Asia where the
temperature can be as low as −30°C during the winter. Although A.
mongolicus is an ideal model to study the plant response to cold
stress, insufficient genomic resources for this species are available
in public databases. To identify genes involved in cold acclimation (a
phenomenon experienced by plants after low temperature stress), a
high-throughput sequencing technology was applied.
Results
We sequenced cold-treated and control (untreated) samples of A.
mongolicus, and obtained 65,075,656 and 67,287,120 high quality reads,
respectively. After de novo assembly and quantitative assessment, 82795
all-unigenes were finally generated with an average length of 816 bp.
We then obtained functional annotations by aligning all-unigenes with
public protein databases including NR, SwissProt, KEGG and COG.
Differentially expressed genes (DEGs) were investigated using the RPKM
method. Overall, 9309 up-regulated genes and 23419 down-regulated genes
were identified. To increase our understanding of these DEGs, we
performed GO enrichment and metabolic pathway enrichment analyses.
Based on these results, a series of candidate genes involved in cold
responsive pathways were selected and discussed. Moreover, we analyzed
transcription factors, and found 720 of them are differentially
expressed. Finally, 20 of the candidate genes that were up-regulated
and known to be associated with cold stress were examined using
qRT-PCR.
Conclusions
In this study, we identified a large set of cDNA unigenes from A.
mongolicus. This is the first transcriptome sequencing of this
non-model species under cold-acclimation using Illumina/Solexa, a
next-generation sequencing technology. We sequenced cold-treated and
control (untreated) samples of A. mongolicus and obtained large numbers
of unigenes annotated to public databases. Studies of differentially
expressed genes involved in cold-related metabolic pathways and
transcription factors facilitate the discovery of cold-resistance
genes.
Keywords: Ammopiptanthus mongolicus, Cold acclimation, Transcriptome,
Illumina/Solexa
Background
Ammopiptanthus mongolicus (Maxim. ex Kom.) Cheng f. is an endangered
angiosperm genus (Leguminosae) mainly distributed in the deserts of
eastern central Asia [[30]1]. As the only evergreen broad-leaved shrub
in this area, it is particularly important for the
ecological-environmental stability of native habitats [[31]2].
Moreover, the increasing desertification in central Asia is gradually
becoming more serious in recent years [[32]3]. Due to the important
role that A. mongolicus plays in fixing moving sands and delaying
further desertification, the protection and research on A. mongolicus
are becoming particularly important and necessary [[33]4]. Former
studies on A. mongolicus have shown its capacity of resisting high
solar radiation, heat, cold, and drought stresses [[34]5,[35]6]. Some
genes involved in drought, cold, heat, and salinity tolerance have been
identified, such as AmCBL1[[36]7], AmLEA[[37]8]AmVP1[[38]9],
AmNHX2[[39]10] and AmCIP[[40]11]. However, these studies on A.
mongolicus are still limited and genomic resources of A. mongolicus
(749 EST and 155 nucleotide sequences in GeneBank prior to 11 January
2013) are also scarce. The ability to survive at −30°C or lower
temperatures makes A. mongolicus an ideal model for studying mechanisms
of cold tolerance in plants. Therefore deeper understanding on genes
expression profile of A. mongolicus under cold stress would be
significative and imperative.
Cold acclimation; i.e., enhancement of the freezing tolerance of plants
after low temperature stress, has been observed in many plants,
including Arabidopsis[[41]12], Oryza sativa[[42]13], Triticum
aestivum[[43]14], and A. mongolicus[[44]15]. During this process,
plants alter the expression of certain genes as well as the
biosynthesis of amino acids and soluble sugars. To elucidate the
mechanism underlying cold acclimation, it is important to determine how
plants alter gene expression in response to this biological process
[[45]16].
Compared with the traditional Sanger method, which is expensive and
time consuming, next-generation sequencing (NGS) technologies or
massively parallel sequencing technologies (e.g. Illumina/Solexa-based
RNA-Seq technology) are much simpler and more cost-effective [[46]17].
Furthermore, these high-throughput RNA sequencing (RNA-Seq)
technologies have other advantages, such as accuracy and sensitivity
for both low- and high-level gene expression [[47]18], and facilitate
rapid identification and analysis of the vast majority of
transcriptomes [[48]19]. Three main commercially available
next-generation sequencing technologies are extant; namely, ABI/SOLiD,
454/Roche and Illumina/Solexa [[49]20]. Illumina/Solexa has been
successfully applied to transcriptome sequencing of many plant species,
including Populus euphratica[[50]21], Aegilops variabilis[[51]22] ,
Brassica napus[[52]23], Zea mays[[53]24], Arachis hypogaea[[54]25], and
Picrorhiza kurrooa[[55]26].
Here, we describe the first transcriptome sequencing of A. mongolicus
under cold-acclimation conditions using a next-generation sequencing
technology, Illumina/Solexa. We sequenced two cDNA libraries
(cold-treated and control samples) of living A. mongolicus tissues, and
got an unprecedented amount of data. All sequences were deposited in
Short Read Archive (SRA) division of the GenBank repository (accession
no. SRA064010). Genes identified in this study expanded the available
EST resources of A. mongolicus. Moreover, the analyses on
differentially expressed genes under cold stress also furthers our
understanding of the cold response mechanism of A. mongolicus, and the
cold-related genes would also contribute to provide a method of
developing cold-tolerant plants through genetic manipulation.
Results
Genome-size estimates
We determined the genome size of A. mongolicus by flow cytometry using
cotyledons nuclei with maize (≈2500 Mb [[56]27]) as an internal
standard. The result showed that the genome size of A. mongolicus
(2n = 18) was approximately 819.56 ± 7.61 Mb (Table [57]1), which was
similar to that of another legume, the chick pea (Cicer arietinuum,
≈738 Mb) [[58]28]. This is the first report of the genome size of A.
mongolicus as far as we know. It is helpful to determine the sequencing
depth.
Table 1.
Flow cytometry determination of the nuclear genome sizes of A.
mongolicus
A. mongolicus peak Reference peak^1 Peak ratio (Ammopiptanthus
/reference) A. mongolicus genome size (1C, Mb, ±standad deviation)
15.53
__________________________________________________________________
47.07
__________________________________________________________________
0.329934141
__________________________________________________________________
824.8353516
__________________________________________________________________
16.48
__________________________________________________________________
50.06
__________________________________________________________________
0.329204954
__________________________________________________________________
823.0123851
__________________________________________________________________
15.62
__________________________________________________________________
48.16
__________________________________________________________________
0.324335548
__________________________________________________________________
810.8388704
__________________________________________________________________
819.56 ± 7.61
[59]Open in a new tab
^1Nuclei from maize young leaves serve as a size standard, which has a
haploid genome size of 2500 Mb.
Illumina sequencing and reads assembly
The two libraries (cold-treated and control samples), were sequenced
respectively using Illumina HiSeq™ 2000. The total clean nucleotides
generated from each sample exceeded 5.8 Gb, that is, an unprecedented
depth of seven times (7-fold or 7× coverage) as much as the genome size
(819.56 ± 7.61 Mb). We obtained approximately 71 million raw reads for
the cold-treated sample (CT) and 73 million for the control sample
(CK). We discarded low-quality reads, which contained adapters and
unknown or low-quality bases, according to our bioinformatics analysis.
A total of 65 million and 67 million clean reads were obtained from CT
and CK samples, respectively. Of these clean reads, the total length
was 11.9 × 10^9 nt and the Q20 percentage (percentage of sequences with
sequencing error rate lower than 1%) was over 97% for both samples. All
clean reads were also deposited in the National Center for
Biotechnology Information (NCBI) and can be accessed in the Short Read
Archive (SRA) under the accession number SRA064010.
Transcriptome de novo assembly was performed using Trinity, a short
reads assembling program [[60]29], which generated 145,000 (CT) and
148,797 contigs (CK) (Table [61]2). In both samples, the average contig
size exceeded 300 nt, with the N50 of 500 nt. The contigs of each
sample were then connected into unigenes, generating 76000 (CT) and
84583 unigenes (CK), respectively. After long-sequence clustering
between both samples, 82795 all-unigenes were obtained. The sequencing
coverage ranged from 1- to 66245-fold (average, 45-fold). The total
length was 67,554,337 nt, with a mean length of 816 nt and an N50 of
1343 nt (i.e., 50% of the assembled bases were incorporated into
unigenes of 1343 nt or longer). Each all-unigene was longer than 200
nt, and 35350 (42.70%) unigenes were 200 to 400 nt. Also, 8066 (9.74%)
unigenes were longer than 2000 nt. The size distribution of the contigs
and unigenes is shown in Figure [62]1.
Table 2.
Overview of the sequencing and assembly
Cold-treated Control Total
Total Raw Reads
__________________________________________________________________
71,441,910
__________________________________________________________________
73,279,028
__________________________________________________________________
__________________________________________________________________
Total Clean Reads
__________________________________________________________________
65,075,656
__________________________________________________________________
67,287,120
__________________________________________________________________
__________________________________________________________________
Total Clean Nucleotides (nt)
__________________________________________________________________
5,856,809,040
__________________________________________________________________
6,055,840,800
__________________________________________________________________
__________________________________________________________________
Q20 percentage
__________________________________________________________________
97.39%
__________________________________________________________________
97.60%
__________________________________________________________________
__________________________________________________________________
N percentage
__________________________________________________________________
0.00%
__________________________________________________________________
0.00%
__________________________________________________________________
__________________________________________________________________
GC percentage
__________________________________________________________________
45.87%
__________________________________________________________________
45.40%
__________________________________________________________________
__________________________________________________________________
Contig
__________________________________________________________________
Total Number
__________________________________________________________________
145,000
__________________________________________________________________
148,797
__________________________________________________________________
__________________________________________________________________
Total Length(nt)
__________________________________________________________________
45,723,903
__________________________________________________________________
51,308,749
__________________________________________________________________
__________________________________________________________________
Mean Length(nt)
__________________________________________________________________
315
__________________________________________________________________
345
__________________________________________________________________
__________________________________________________________________
N50
__________________________________________________________________
521
__________________________________________________________________
619
__________________________________________________________________
__________________________________________________________________
Unigene
__________________________________________________________________
Total Number
__________________________________________________________________
76,000
__________________________________________________________________
84,583
__________________________________________________________________
82,795
__________________________________________________________________
Total Length(nt)
__________________________________________________________________
48,956,203
__________________________________________________________________
57,108,594
__________________________________________________________________
67,554,337
__________________________________________________________________
Mean Length(nt)
__________________________________________________________________
644
__________________________________________________________________
675
__________________________________________________________________
816
__________________________________________________________________
N50
__________________________________________________________________
1122
__________________________________________________________________
1191
__________________________________________________________________
1343
__________________________________________________________________
Total Consensus Sequences
__________________________________________________________________
76,000
__________________________________________________________________
84,583
__________________________________________________________________
82,795
__________________________________________________________________
Distinct Clusters
__________________________________________________________________
1,831
__________________________________________________________________
2,043
__________________________________________________________________
40,988
__________________________________________________________________
Distinct Singletons 74,169 82,540 41,807
[63]Open in a new tab
Figure 1.
[64]Figure 1
[65]Open in a new tab
Length distribution of the contigs and unigenes. The length
distribution of contigs of CT sample (A), contigs of CK sample (B),
unigenes of CT sample (C), unigenes of CK sample (D), and the
all-unigenes (E).
Functional annotation and classification
We next performed BLAST (E-value < 0.00001) analysis of the 82795
all-unigenes against protein databases, following a priority order of
Nr (non-redundant protein sequences in NCBI), Swiss-Prot, KEGG (Kyoto
Encyclopedia of Genes and Genomes database), and COG. There were 53398
(64.5%) unigenes with homologous sequences in at least one of the above
databases. Among them, 53252 (64.3%), 30890 (37.3%), and 27479 (33.2%)
unigenes were found in NR, SwissProt, and KEGG, respectively. A total
of 22101 (26.7%) unigenes were found in all three databases, while
29397 (35.5%) unigenes were not identified (Figure [66]2).
Figure 2.
[67]Figure 2
[68]Open in a new tab
Number of unigenes blasted to NR, Swiss-Prot, KEGG and COG
(E < 0.00001).
We analyzed the 100 most-abundant transcripts of each sample and found
some differences in functional annotations. Compared with control
sample, the most-abundant transcripts in the cold-treated sample were
dehydrin, the LEA protein precursor, SRC1 protein, defensin D1, early
light inducible protein, SOS2-like protein kinase, Ethylene-responsive
transcription factor, and sucrose synthase. The 100 most-abundant
transcripts are listed in Additional file [69]1.
We also identified a total of 1322 unigenes genes expressed only in the
cold-treated sample. Functional annotations showed that some of them
were closely related to cold stress, including calcium-transporting
ATPase, serine/threonine-protein kinase, CBL-interacting protein
kinase, late embryogenesis abundant protein, and dehydrin. Unigenes
expressed only in the cold-treated sample are listed in Additional file
[70]2.
Based on Nr annotations, we used the Gene Ontology (GO) classification
system to classify the possible functions of the unigenes. A total of
23167 (28.0%) unigenes were successfully assigned to at least one GO
term annotation (Figure [71]3). The unigenes were then classified into
three main categories: biological processes, cellular components, and
molecular function. The category of biological processes consisted of
944 GO terms, which were assigned to 14719 (17.8%) unigenes. The
cellular components category consisted of 196 GO terms, which were
assigned to 14095 (17.0%) unigenes. The category of molecular functions
consisted of 577 GO terms, which were assigned to 18775 (22.7%)
unigenes. For biological process, the top five largest categories were:
“metabolic process” (9731), “cellular process” (8907), “response to
stimulus” (3037), “biological regulation” (2502), and “localization”
(2226). For cellular components, the top three largest categories were:
“cell” (13821), “cell part” (12354), and “organelle” (9472). For
molecular function, the top three largest categories were: “binding”
(10876), “catalytic activity” (10452), and “transporter activity”
(1215).
Figure 3.
[72]Figure 3
[73]Open in a new tab
Histogram presentation of Gene Ontology classification. The results are
summarized in three main categories: biological process, cellular
component, and molecular function. The y-axis on the right side
indicates the percent of genes in a category, and the y-axis on the
left side means the number of genes.
To assess the integrality of our transcriptome library and
effectiveness of the annotation process, we aligned the all-unigenes to
the COG database and 17327 (20.9%) were identified. By classifying the
possible functions of these unigenes, they were grouped into 25
functional categories (Figure [74]4). The largest category was “General
function prediction only” (5504 of 17327 unigenes, about 31.8%),
followed by “Transcription” (3344 unigenes, about 19.3%), “Replication,
recombination and repair” (3248, 18.7%), “Post-translational
modification, protein turnover, chaperones” (2408, 13.9%), and “Signal
transduction mechanisms” (2402, 13.9%). The categories of
“Extracellular structures” (8, 0.046%), “Nuclear structure” (20, 0.12%)
and “Cell motility” (257, 1.48%) had the fewest responding genes. Also,
1576 (9.10%) unigenes were annotated as “Function unknown”.
Figure 4.
[75]Figure 4
[76]Open in a new tab
COG classificationf. A total of 17327 unigenes were assigned to 25
classifications. The capital letters in x-axis indicates the COG
categories as listed on the right of the histogram, and the y-axis
indicates the number of unigenes.
We performed BLAST analysis of the 82795 all-unigenes against the KEGG
database to further analyze gene products during metabolic processes
and determine their functions in cellular processes. A total of 27479
(33.2%) unigenes matched 255,974 members, which were involved in 125
KEGG pathways. Of the 27479 unigenes, 5905 (21.49%) were related to
metabolic pathways, 2821 (10.27%) were related to the biosynthesis of
secondary metabolites, 1667 (6.07%) to plant hormone signal
transduction, 1060 (3.86%) to RNA transport, and 975 (3.55%) to the
spliceosome.
Protein-coding region prediction
After searching all-unigene sequences against protein databases using
BLASTx (E-value < 0.00001) in the order mentioned in the functional
annotation and classification section, we extracted 53485 coding
sequences (CDS) from unigene sequences and translated them into peptide
sequences. For unigenes with no BLAST hits, we used ESTScan to predict
the 1936 CDS and translated them into peptide sequences. The
distribution of the CDS is shown in Figure [77]5.
Figure 5.
[78]Figure 5
[79]Open in a new tab
Length distribution of the protein coding region prediction (CDS). A.
The length distribution of CDs using BLASTx. B. The length distribution
of proteins using BLASTx. C. The length distribution of CDs using
ESTscan. D. The length distribution of proteins using ESTscan.
Differential expression analysis
To identify genes with different expression levels, we used the RPKM
method (Reads Per kb per Million reads) to calculate the expression
levels of the unigenes. The result showed that 9309 genes were
up-regulated and 23419 genes were down-regulated with FDR ≤ 0.001 and
ratios larger than 2. The distribution of transcript changes is shown
in Figure [80]6.
Figure 6.
[81]Figure 6
[82]Open in a new tab
Distribution of transcript changes in cold-stress sample compared with
control sample. A. The expression levels of all-unigenes B. The
distribution of differentially expressed genes.
We then mapped all differentially expressed genes to each term of the
Gene Ontology database ([83]http://www.geneontology.org/, release data:
Aug 1st, 2012) and calculated the gene numbers from each GO term. Using
a hypergeometric test, we identified significantly enriched GO terms in
DEGs compared to the genomic background. GO terms with a corrected p
value ≤ 0.05 were defined as significantly enriched in DEGs. The GO
enrichment analysis results are shown in Additional file [84]3.
We also performed metabolic pathways enrichment analysis, and
identified the primary biochemical pathways and signal transduction
pathways in which DEGs involved. A total of 3868 up-regulated unigenes
and 11691 down-regulated unigenes were identified to be involved in
cold stress, on which all following analyses and discussions of
metabolic pathway were based. These genes were related to 44 metabolic
pathways, showing significantly changed under cold stress (p ≤ 0.05)
including genes involved in carbohydrate, amino acid metabolism,
energy, lipid, confactors and vitamins, terpenoids and polyketides,
immune system, and environmental adaptation (Table [85]3).
Table 3.
Statistical enrichment analysis for KEGG metabolic pathways (p ≤0.05)
Pathway ID
__________________________________________________________________
Pathway
__________________________________________________________________
Genes with pathway annotation
__________________________________________________________________
p value
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
All genes
__________________________________________________________________
DEGs
__________________________________________________________________
__________________________________________________________________
Up Down
ko01100
__________________________________________________________________
Metabolic pathways
__________________________________________________________________
5905 (21.49%)
__________________________________________________________________
607
__________________________________________________________________
2004
__________________________________________________________________
9.37E-18
__________________________________________________________________
ko01110
__________________________________________________________________
Biosynthesis of secondary metabolites
__________________________________________________________________
2821 (10.27%)
__________________________________________________________________
322
__________________________________________________________________
989
__________________________________________________________________
4.54E-16
__________________________________________________________________
Carbohydrate metabolism
__________________________________________________________________
ko00040
__________________________________________________________________
Pentose and glucuronate interconversions
__________________________________________________________________
233 (0.85%)
__________________________________________________________________
31
__________________________________________________________________
130
__________________________________________________________________
3.28E-20
__________________________________________________________________
ko00500
__________________________________________________________________
Starch and sucrose metabolism
__________________________________________________________________
651 (2.37%)
__________________________________________________________________
72
__________________________________________________________________
257
__________________________________________________________________
3.70E-09
__________________________________________________________________
ko00053
__________________________________________________________________
Ascorbate and aldarate metabolism
__________________________________________________________________
208 (0.76%)
__________________________________________________________________
21
__________________________________________________________________
80
__________________________________________________________________
0.004397
__________________________________________________________________
ko00010
__________________________________________________________________
Glycolysis / Gluconeogenesis
__________________________________________________________________
337 (1.23%)
__________________________________________________________________
31
__________________________________________________________________
119
__________________________________________________________________
0.03086
__________________________________________________________________
ko00660
__________________________________________________________________
C5-Branched dibasic acid metabolism
__________________________________________________________________
11 (0.04%)
__________________________________________________________________
5
__________________________________________________________________
3
__________________________________________________________________
0.026546
__________________________________________________________________
ko00620
__________________________________________________________________
Pyruvate metabolism
__________________________________________________________________
247 (0.9%)
__________________________________________________________________
20
__________________________________________________________________
94
__________________________________________________________________
0.017667
__________________________________________________________________
Amino acid metabolism
__________________________________________________________________
ko00360
__________________________________________________________________
Phenylalanine metabolism
__________________________________________________________________
214 (0.78%)
__________________________________________________________________
23
__________________________________________________________________
92
__________________________________________________________________
1.42E-05
__________________________________________________________________
ko00270
__________________________________________________________________
Cysteine and methionine metabolism
__________________________________________________________________
225 (0.82%)
__________________________________________________________________
34
__________________________________________________________________
78
__________________________________________________________________
0.000967
__________________________________________________________________
ko00250
__________________________________________________________________
Alanine, aspartate and glutamate metabolism
__________________________________________________________________
171 (0.62%)
__________________________________________________________________
21
__________________________________________________________________
63
__________________________________________________________________
0.006038
__________________________________________________________________
ko00300
__________________________________________________________________
Lysine biosynthesis
__________________________________________________________________
39 (0.14%)
__________________________________________________________________
5
__________________________________________________________________
21
__________________________________________________________________
0.000519
__________________________________________________________________
ko00290
__________________________________________________________________
Valine, leucine and isoleucine biosynthesis
__________________________________________________________________
90 (0.33%)
__________________________________________________________________
16
__________________________________________________________________
33
__________________________________________________________________
0.00268
__________________________________________________________________
ko00380
__________________________________________________________________
Tryptophan metabolism
__________________________________________________________________
116 (0.42%)
__________________________________________________________________
19
__________________________________________________________________
42
__________________________________________________________________
0.002663
__________________________________________________________________
ko00260
__________________________________________________________________
Glycine, serine and threonine metabolism
__________________________________________________________________
139 (0.51%)
__________________________________________________________________
16
__________________________________________________________________
52
__________________________________________________________________
0.013956
__________________________________________________________________
ko00330
__________________________________________________________________
Arginine and proline metabolism
__________________________________________________________________
167 (0.61%)
__________________________________________________________________
25
__________________________________________________________________
52
__________________________________________________________________
0.045321
__________________________________________________________________
Metabolism of other amino acids
__________________________________________________________________
ko00460
__________________________________________________________________
Cyanoamino acid metabolism
__________________________________________________________________
167 (0.61%)
__________________________________________________________________
23
__________________________________________________________________
71
__________________________________________________________________
7.09E-06
__________________________________________________________________
Biosynthesis of other secondary metabolites
__________________________________________________________________
ko00940
__________________________________________________________________
Phenylpropanoid biosynthesis
__________________________________________________________________
483 (1.76%)
__________________________________________________________________
57
__________________________________________________________________
201
__________________________________________________________________
2.29E-10
__________________________________________________________________
ko00941
__________________________________________________________________
Flavonoid biosynthesis
__________________________________________________________________
272 (0.99%)
__________________________________________________________________
29
__________________________________________________________________
124
__________________________________________________________________
1.24E-08
__________________________________________________________________
ko00945
__________________________________________________________________
Stilbenoid, diarylheptanoid and gingerol biosynthesis
__________________________________________________________________
241 (0.88%)
__________________________________________________________________
39
__________________________________________________________________
96
__________________________________________________________________
1.20E-07
__________________________________________________________________
ko00944
__________________________________________________________________
Flavone and flavonol biosynthesis
__________________________________________________________________
78 (0.28%)
__________________________________________________________________
8
__________________________________________________________________
37
__________________________________________________________________
0.000808
__________________________________________________________________
ko00966
__________________________________________________________________
Glucosinolate biosynthesis
__________________________________________________________________
40 (0.15%)
__________________________________________________________________
11
__________________________________________________________________
16
__________________________________________________________________
0.000298
__________________________________________________________________
ko00402
__________________________________________________________________
Benzoxazinoid biosynthesis
__________________________________________________________________
43 (0.16%)
__________________________________________________________________
3
__________________________________________________________________
24
__________________________________________________________________
0.001612
__________________________________________________________________
ko00950
__________________________________________________________________
Isoquinoline alkaloid biosynthesis
__________________________________________________________________
40 (0.15%)
__________________________________________________________________
5
__________________________________________________________________
18
__________________________________________________________________
0.015469
__________________________________________________________________
Energy metabolism
__________________________________________________________________
ko00195
__________________________________________________________________
Photosynthesis
__________________________________________________________________
113 (0.41%)
__________________________________________________________________
4
__________________________________________________________________
72
__________________________________________________________________
1.86E-09
__________________________________________________________________
Lipid metabolism
__________________________________________________________________
ko00592
__________________________________________________________________
alpha-Linolenic acid metabolism
__________________________________________________________________
156 (0.57%)
__________________________________________________________________
20
__________________________________________________________________
67
__________________________________________________________________
2.46E-05
__________________________________________________________________
ko01040
__________________________________________________________________
Biosynthesis of unsaturated fatty acids
__________________________________________________________________
94 (0.34%)
__________________________________________________________________
9
__________________________________________________________________
44
__________________________________________________________________
0.000624
__________________________________________________________________
ko00565
__________________________________________________________________
Ether lipid metabolism
__________________________________________________________________
675 (2.46%)
__________________________________________________________________
83
__________________________________________________________________
215
__________________________________________________________________
0.006119
__________________________________________________________________
ko00062
__________________________________________________________________
Fatty acid elongation
__________________________________________________________________
6 (0.02%)
__________________________________________________________________
2
__________________________________________________________________
4
__________________________________________________________________
0.003736
__________________________________________________________________
ko00564
__________________________________________________________________
Glycerophospholipid metabolism
__________________________________________________________________
866 (3.15%)
__________________________________________________________________
110
__________________________________________________________________
273
__________________________________________________________________
0.001839
__________________________________________________________________
ko00100
__________________________________________________________________
Steroid biosynthesis
__________________________________________________________________
80 (0.29%)
__________________________________________________________________
9
__________________________________________________________________
31
__________________________________________________________________
0.034756
__________________________________________________________________
ko00591
__________________________________________________________________
Linoleic acid metabolism
__________________________________________________________________
76 (0.28%)
__________________________________________________________________
5
__________________________________________________________________
35
__________________________________________________________________
0.013071
__________________________________________________________________
ko00071
__________________________________________________________________
Fatty acid metabolism
__________________________________________________________________
136 (0.49%)
__________________________________________________________________
11
__________________________________________________________________
56
__________________________________________________________________
0.012113
__________________________________________________________________
Metabolism of cofactors and vitamins
__________________________________________________________________
ko00785
__________________________________________________________________
Lipoic acid metabolism
__________________________________________________________________
11 (0.04%)
__________________________________________________________________
0
__________________________________________________________________
9
__________________________________________________________________
0.005247
__________________________________________________________________
ko00770
__________________________________________________________________
Pantothenate and CoA biosynthesis
__________________________________________________________________
79 (0.29%)
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
0.045586
__________________________________________________________________
Metabolism of terpenoids and polyketides
__________________________________________________________________
ko00904
__________________________________________________________________
Diterpenoid biosynthesis
__________________________________________________________________
98 (0.36%)
__________________________________________________________________
10
__________________________________________________________________
55
__________________________________________________________________
6.17E-08
__________________________________________________________________
ko00903
__________________________________________________________________
Limonene and pinene degradation
__________________________________________________________________
205 (0.75%)
__________________________________________________________________
33
__________________________________________________________________
83
__________________________________________________________________
4.46E-07
__________________________________________________________________
ko00906
__________________________________________________________________
Carotenoid biosynthesis
__________________________________________________________________
195 (0.71%)
__________________________________________________________________
25
__________________________________________________________________
85
__________________________________________________________________
1.07E-06
__________________________________________________________________
ko00908
__________________________________________________________________
Zeatin biosynthesis
__________________________________________________________________
312 (1.14%)
__________________________________________________________________
41
__________________________________________________________________
113
__________________________________________________________________
0.000209
__________________________________________________________________
ko00902
__________________________________________________________________
Monoterpenoid biosynthesis
__________________________________________________________________
30 (0.11%)
__________________________________________________________________
13
__________________________________________________________________
26
__________________________________________________________________
0.01791
__________________________________________________________________
Glycan Biosynthesis and Metabolism
__________________________________________________________________
ko00531
__________________________________________________________________
Glycosaminoglycan degradation
__________________________________________________________________
86 (0.31%)
__________________________________________________________________
5
__________________________________________________________________
39
__________________________________________________________________
0.017583
__________________________________________________________________
Immune system
__________________________________________________________________
ko04650
__________________________________________________________________
Natural killer cell mediated cytotoxicity
__________________________________________________________________
123 (0.45%)
__________________________________________________________________
23
__________________________________________________________________
42
__________________________________________________________________
0.00168
__________________________________________________________________
Signal Transduction
__________________________________________________________________
ko04075
__________________________________________________________________
Plant hormone signal transduction
__________________________________________________________________
1667 (6.07%)
__________________________________________________________________
185
__________________________________________________________________
614
__________________________________________________________________
1.87E-13
__________________________________________________________________
Environmental Adaptation
__________________________________________________________________
ko04626 Plant-pathogen interaction 1719 (6.26%) 139 693 3.34E-15
[86]Open in a new tab
Transcription factor prediction
A total of 1636 unigenes were identified to be involved in
transcription, including 720 DEGs (209 up-regulated and 511
down-regulated) (Figure [87]7). The largest gene family was the
ethylene-responsive element binding factor family (ERF), followed by
the basic helix-loop-helix family (bHLH), C2H2 family, the
Homeodomain-leucine zipper family (HD-ZIP), and the WRKY family.
Figure 7.
[88]Figure 7
[89]Open in a new tab
Distribution of transcription factors in gene families. A. The
distribution of transcription factors according to the gene family
information. B. DEGs from every gene family involved in transcription.
Quantitative real-time reverse transcription-PCR (qRT-PCR) analysis
Our genome-wide expression analysis results were confirmed by
quantitative real-time (qRT) PCR using TaqMan probes. We selected 20
unigenes, all of which are known to be related to cold stress, such as
CBF (C-repeat-binding factors), LEA (Late embryogenesis abundant), LTI
(low-temperature-induced), COR (cold-responsive), ERD (early
dehydration-inducible), and DREB (dehydration-responsive element
binding) [[90]30,[91]31]. The Ct values of 18S rRNA of all samples
ranged from 24.0 to 26.0. All 20 transcripts showed the same expression
pattern as the in silico differential analysis results from
high-throughput sequencing (Table [92]4).
Table 4.
Real-time RT-PCR with putative unique transcripts (PUTs)
Putative unique transcript ID Annotation (BLASTX) Relative gene
expression by qRT-PCR (2^-ΔΔ CT) Expression difference analysis of
Illumina/Solexa (Log2(CT_RPKM/WT_RPKM))
Unigene3649_All
__________________________________________________________________
CBF3 protein [Glycine max]
__________________________________________________________________
2.11 ± 0.18
__________________________________________________________________
7.8564
__________________________________________________________________
Unigene5045_All
__________________________________________________________________
PREDICTED: uncharacterized protein LOC100795990 isoform 1 [Glycine max]
__________________________________________________________________
2.64 ± 0.51
__________________________________________________________________
2.1173
__________________________________________________________________
CL9479.Contig1_All
__________________________________________________________________
ICE-like protein [Corylus heterophylla]
__________________________________________________________________
1.90 ± 0.40
__________________________________________________________________
2.2465
__________________________________________________________________
Unigene2612_All
__________________________________________________________________
PREDICTED: probable transcription factor PosF21-like [Glycine max]
__________________________________________________________________
2.70 ± 0.67
__________________________________________________________________
4.3792
__________________________________________________________________
CL26498.Contig1_All
__________________________________________________________________
Basic leucine zipper transcription factor [Medicago truncatula]
__________________________________________________________________
4.30 ± 2.79
__________________________________________________________________
2.0836
__________________________________________________________________
Unigene12211_All
__________________________________________________________________
PREDICTED: microtubule-associated protein 70-5-like [Glycine max]
__________________________________________________________________
2.82 ± 0.11
__________________________________________________________________
2.9144
__________________________________________________________________
CL25117.Contig1_All
__________________________________________________________________
Cold acclimation protein COR413-PM1 [Medicago truncatula]
__________________________________________________________________
3.51 ± 0.32
__________________________________________________________________
2.9919
__________________________________________________________________
CL33467.Contig1_All
__________________________________________________________________
PREDICTED: sugar transporter ERD6-like 5-like [Glycine max]
__________________________________________________________________
2.44 ± 0.27
__________________________________________________________________
1.9228
__________________________________________________________________
Unigene12905_All
__________________________________________________________________
PREDICTED: LOW QUALITY PROTEIN: sugar transporter ERD6-like 16-like
[Glycine max]
__________________________________________________________________
12.59 ± 7.89
__________________________________________________________________
10.7104
__________________________________________________________________
CL21996.Contig1_All
__________________________________________________________________
PREDICTED: cysteine proteinase RD19a-like [Glycine max]
__________________________________________________________________
1.78 ± 0.10
__________________________________________________________________
1.2868
__________________________________________________________________
Unigene28735_All
__________________________________________________________________
Medicago truncatula HVA22-like protein a (MTR_4g108350) mRNA, complete
cds
__________________________________________________________________
2.35 ± 0.28
__________________________________________________________________
2.107
__________________________________________________________________
Unigene5543_All
__________________________________________________________________
HVA22-like protein e [Medicago truncatula]
__________________________________________________________________
9.91 ± 2.45
__________________________________________________________________
6.7199
__________________________________________________________________
Unigene6814_All
__________________________________________________________________
PREDICTED: HVA22-like protein k-like [Glycine max]
__________________________________________________________________
7.02 ± 0.32
__________________________________________________________________
2.8637
__________________________________________________________________
Unigene37576_All
__________________________________________________________________
PREDICTED: low-temperature-induced 65 kDa protein-like [Glycine max]
__________________________________________________________________
2.78 ± 1.34
__________________________________________________________________
7.4373
__________________________________________________________________
CL11725.Contig1_All
__________________________________________________________________
PREDICTED: low-temperature-induced 65 kDa protein-like [Glycine max]
__________________________________________________________________
2.16 ± 0.17
__________________________________________________________________
7.5431
__________________________________________________________________
CL5093.Contig1_All
__________________________________________________________________
Hydrophobic protein LTI6B [Medicago truncatula]
__________________________________________________________________
2.12 ± 0.10
__________________________________________________________________
3.8777
__________________________________________________________________
Unigene5480_All
__________________________________________________________________
Hydrophobic protein RCI2A [Arabidopsis thaliana]
__________________________________________________________________
4.32 ± 0.32
__________________________________________________________________
5.8638
__________________________________________________________________
CL26053.Contig1_All
__________________________________________________________________
PREDICTED: hydrophobic protein LTI6A-like [Glycine max]
__________________________________________________________________
3.43 ± 0.26
__________________________________________________________________
1.5422
__________________________________________________________________
Unigene37577_All
__________________________________________________________________
PREDICTED: low-temperature-induced 65 kDa protein-like [Glycine max]
__________________________________________________________________
3.53 ± 1.07
__________________________________________________________________
6.5686
__________________________________________________________________
CL30168.Contig1_All seed maturation protein [Glycine tomentella]
7.42 ± 0.88 3.3724
[93]Open in a new tab
Discussion
Ammopiptanthus mongolicus, the sole broad-leaved evergreen angiosperm
genus in the deserts of eastern central Asia [[94]32], can maintain its
growth at temperatures as low as −30°C [[95]15]. This attribute makes
it an ideal model for studying the cold-tolerance mechanisms of plants.
Several cold-related genes have been identified in A. mongolicus,
including AmCBL1[[96]7], AmLEA[[97]8], and AmCIP[[98]11]. However, for
woody plants, the available genetic resources are not sufficient to
determine the mechanisms of cold tolerance.
Few studies have performed transcriptome sequencing of A. mongolicus.
Zhou et al. identified putative genes associated with drought tolerance
using 454 pyrosequencing [[99]33]. Compared with this alternative
next-generation method, solexa has the advantages of lower cost and
generation of larger amounts of data [[100]17]. Moreover, using the
recently developed Trinity software, any disadvantages related to short
reads can be overcome, resulting in the assembly of transcriptome
results without a reference genome being as reliable as those with an
available reference genome [[101]29].
During cold acclimation, plants receive low temperature signals and
initiate a defense mechanism, including physical structure adaptations
(changes in lipid composition), increases in intercellular
osmoprotectants (such as soluble sugars, proline and betaine), and
increased synthesis of anti-oxidants (superoxide dismutase, catalase
and ascorbic acid reductase), enabling restoration of the balance of
matter synthesis and energy metabolism and enhancing survival in colder
environments [[102]30,[103]34]. Furthermore, expression patterns of a
large number of genes during cold acclimation have been detected using
gene chip and microarray technologies [[104]35,[105]36]. Use of
transcriptome sequencing during cold acclimation will increase our
understanding of the cold tolerance mechanisms of plants.
The amount of data obtained from transcriptome sequencing varies
according to the transcriptome size of the species. However, the
transcriptome size is affected by both gene number and abundance, and
varies markedly among species. Our estimate of the nuclear genome size
of A. mongolicus (2n = 18) using flow cytometry was
2C = 1639.12 ± 15.22 Mb. In this study, the total length of the reads
from both samples was ~11.9 gigabases (Gb). This is the first report of
the genome size of Ammopiptanthus mongolicus.
Membrane systems
Membrane systems, which are known to be the primary site of freezing
injury in plants, suffer multiple forms of damage caused by
freeze-induced cellular dehydration [[106]37]. During cold acclimation,
plants experience improved cold tolerance with increased concentrations
of unsaturated fatty acids and phospholipids [[107]38]. We identified a
total of 773 genes (4.46%) involved in “Lipid transport and metabolism”
according to the COG classification. Moreover, according to the
metabolic pathway enrichment analysis, eight pathways, including
“Biosynthesis of unsaturated fatty acids” (Ko01040), were involved in
lipid metabolism. Increases in the biosynthesis of unsaturated fatty
acids improve cold defense and prevent damage caused by low
temperatures [[108]39]. Nine genes in this process showed significant
upregulation in transcripts after cold stress. For example, the fatty
acid desaturase 8 (FAD8) gene (CL23852.Contig1_All) was upregulated by
1.62-fold. The FAD8 gene in A. thaliana encodes chloroplast
membrane-associated ω-3 desaturase, which contributes to freezing
tolerance by altering the lipid composition [[109]40].
Intercellular osmoprotectant
Proline, one of the most important organic osmolytes, participates in
the responses to various environmental stresses [[110]41]. As a
hydrophilic protein, proline can relieve the osmotic stress caused by
cold-induced dehydration. Two genes (CL407.Contig3_All and
CL31553.COntig1_All) encoding delta-1-pyrroline-5-carboxylate synthase
(P5CS) were to be significantly upregulated by 1.42- and 3.82-fold
after cold stress, respectively. As a key enzyme in proline synthesis,
P5CS participates in the cold-stress response and shows high
expression, which promotes the synthesis of proline for cold tolerance
[[111]42,[112]43]. The accumulation of sucrose and other simple sugars
also contributes to the stabilization of membranes, as these molecules
protect membranes against freeze-induced damage in vitro[[113]44].
Three LEA (late embryogenesis abundant)-related genes
(CL36265.Contig1_All, CL30168.Contig1_All) were also examined with fold
changes in their expression ranging from 3.37 to 5.64. The LEA protein
functions as an antioxidant, as well as a membrane and protein
stabilizer, during water stress [[114]45]. In A. mongolicus and other
legumes, recent studies have suggested that novel hydrophilic and LEA
polypeptides stabilize membranes against freeze-induced injury
[[115]8,[116]46].
Antioxidant enzyme system
When plants are under cold stress, reactive oxygen species (ROS)
accumulate. These are harmful to both the membrane and related
biological macromolecules [[117]47]. During cold acclimation, the
antioxidant enzyme system of plants is enhanced in response to the
increased stress [[118]48]. A. mongolicus can maintain efficient growth
in extremely stressful environments, which makes it a valuable natural
resource and strong antioxidant [[119]5]. The expression profiles of
seven ROS-scavenging enzyme genes (AmSOD, AmAPX, AmGPX, AmCAT, AmGLR,
AmPrx, and AmTrx) in A. mongolicus were reported by Shi et al. and
AmCAT and AmSOD showed high expression levels [[120]49]. In our study,
a total of 830 DEGs were related to the GO term “oxidoreductase
activity”, 186 of which were up-regulated. Among them, we identified 20
genes in the peroxisome pathway, including two (CL33833.Contig1_All and
Unigene5521_All) encoding CAT (catalase) and one (Unigene12648_All)
encoding SOD (superoxide dismutase) with the fold changes of 2.84,
1.42, and 4.50, respectively. In rice, CAT plays an important role in
cold acclimation, and the accumulation of SODs can reduce cold injury
[[121]50].
Ca^2+ and ABA
As an important second messenger, Ca^2+ is known to play a role in the
plant cold-stress response. The concentration of Ca^2+ increases
rapidly during cold stress, followed by a series of signals mediated by
a combinations of protein phosphorylation/dephosphorylation cascades
[[122]51]. As a large subfamily of plant kinases, Calcium dependent
protein kinase (CDPKs) are implicated as important sensors of Ca^2+
flux in plants in response to a variety of biotic and abiotic stress
stimuli [[123]52]. We have identified three genes (Unigene567_All,
Unigene13378_All, Unigene13204_All) related to CDPK, with fold changes
ranging from 1.96 to 3.14 in their expression after cold stress.
Abscisic acid (ABA) also plays a crucial role in the cold acclimation
of plants. Cold acclimation has been reported to be involved in both
ABA-dependent and -independent pathways [[124]53,[125]54]. The type 2C
protein phosphatases (PP2C) ABI1 and ABI2, which negatively regulate
ABA responses, play a key role in ABA signal transduction [[126]55]. In
our study, eight DEGs (Unigene22945_All, Unigene30641_All,
Unigene34701_All, CL2188.Contig1_All, CL1877.Contig1_All,
CL16627.Contig1_All, CL10137.Contig1_All, and CL28715.Contig1_All)
related to PP2C were identified, showing significant down-regulation
with fold changes ranging from −2.3 to −12.5 in their expression after
cold stress. The function of ABA-activated SnRK2 protein kinase has
also been reported in dehydration stress signaling in
Arabidopsis[[127]56,[128]57]. One gene (Unigene3661_All) related to
SnRK2 was identified with a fold change of 2.4. On the other hand, two
genes related to SnRK2 in A. mongolicus were reported by Zhou et al.,
with fold changes of 1.89 and 3.75, respectively, after drought
exposure for 72 h [[129]33]. Therefore, it is likely that the SnRK2
gene of A. mongolicus shows an expression pattern under cold stress
similar to that under drought stress. We also identified PLDα1, PLDβ1
and PLDδ to be up-regulated (fold changes of 1.20, 1.18 and 1.93,
respectively). These genes are believed to function in ABA signaling in
guard cells [[130]58,[131]59].
Cold-related genes and transcription factors
We last focused on changes in the expression of genes associated with
transcription factors. A total of 720 of 1636 unigenes were identified
as DEGs, 209 of which showed significant upregulation. Most of these
unigenes had homologs in other legumes, such as Glycine max, Medicago
truncatula and Lotus japonicas, which are known to be stress-induced,
such as the ERF and WRKY families. These up-regulated transcription
factors may play important roles in plant defense and stress responses
[[132]60]. Some cold-related genes identified by Thomashow [[133]31],
such as COR, LEA, CBF, and DREB, have been cloned and identified by
qRT-PCR showing the same trend as our Illumina/Solexa sequencing.
Conclusions
This is the first report of transcriptome sequencing of A. mongolicus
under cold acclimation using Illumina/Solexa. The total length of the
reads was ~11.9 Gb. A total of 82795 unigenes were assembled, 32728 of
which were differently expressed with 9309 unigenes showing
up-regulation. By performing BLAST analysis of the all-unigenes against
public databases (Nr, Swiss-Prot, KEGG and COG), we obtained functional
annotations and classifications. The large number of transcriptomic
sequences and their functional annotations provide sufficient resources
for molecular studies of A. mongolicus. Moreover, information on the
KEGG metabolic pathways and transcription factors will facilitate the
discovery of other cold resistant genes.
Methods
Plant materials
A. mongolicus seeds were collected from AlaShan Desert in Inner
Mongolia, Northwest China. After sterilization and seeding in
hormone-free Murashige and Skoog medium for 30 days, the seedlings were
moved and cultured at room temperature (approximately 20°C), with a
photoperiod of 16 h light and 8 h dark. Two weeks after transplanting,
we divided the plantlets into two groups. The first group served as the
control sample (CK), while the other was moved and cultured at 4°C as
the cold-treated sample (CT). Both samples were planted in the nursery
of Beijing Forestry University (BJFU) (116.3°E, 40.0°N), and watered
every three days. Leaves and roots of both samples were collected
simultaneously after treatment for 14 days and rapidly stored at −80°C
until required for RNA extraction.
Nuclear DNA content determination
The cotyledons of A. mongolicus were collected from the plants from
which the leaves and roots for sequencing were obtained. Cotyledons
were homogenized in 2-mL homogenization buffer (45 mM MgCl[2]; 30 mM
sodium; 20 mM MOPS; 0.1% (w/v) TritonX-100; pH 7.0) [[134]61] and
filtered through a 300-mesh nylon netting. The nuclei suspension was
obtained by centrifuging the suspension at 1100 rpm for 6 min. After
staining with propidium iodide (PI, 50 μg/mL) and incubating at 4°C for
20 min, nuclei were examined by flow cytometry. The standard sample was
young healthy leaves from 1-week-old maize seedlings (Zea mays L.)
(Zheng Dan 958). Three measurements per sample were obtained.
RNA extraction and quality determination
Total RNA of each sample was extracted three times using a CTAB
procedure [[135]62]. Leaves and roots of each sample were extracted
separately and mixed with equal amounts of mRNA after examination. The
RNA samples were dissolved in 10 mM Tris (PH 7.6) and examined using
the NanoDrop ND-8000; the A[260]/A[280] ratios of both samples ranged
from 1.9 to 2.1. The integrity of the RNA samples was assessed with an
Agilent 2100 Bioanalyzer; no sign of degradation was found.
cDNA library construction and sequencing
For the synthesis of cDNA and Solexa sequencing, we prepared 45 μg of
total RNA for treated and control sample at concentrations of
approximately 1500 ng/μl. We then enriched the poly (A) mRNA using
beads with Oligo (dT) and interrupted mRNA into short fragments with
fragmentation buffer. Using these short fragments as templates, we
synthesized first-strand cDNA with hexamer-primers and reverse
transcriptase (Invitrogen). The second-strand cDNA was synthesized
using buffer, dNTPs, RNaseH (Invitrogen) and DNA polymerase I (New
England BioLabs). The short fragments were then purified using a
QiaQuick PCR extraction kit and resolved with EB buffer to finish the
end reparation, and were connected using sequencing adaptors. After
resolution by agarose gel electrophoresis, we selected fragments
suitable for PCR amplification. We then constructed two paired-end
libraries which were sequenced using an Illumina HiSeq™ 2000.
Raw reads produced from sequencing machines contain low-quality reads,
which negatively affect subsequent bioinformatics analyses. Therefore,
we discarded the these reads, including those with adaptors, those with
unknown nucleotides larger than 5% and those of low quality (< 20% of
the bases with a quality score Q ≤ 10) using an in-house Perl script.
The average proportion of clean reads in each sample was ~91.5%, on
which the following analysis was based.
De novo assembly and assessment
Transcriptome de novo assembly was performed using the short-reads
assembly program, Trinity [[136]29], which first combined reads with
certain lengths of overlap to form longer fragments, known as contigs.
These reads were then mapped back to contigs. Using paired-end reads,
we detected contigs from the same transcript as well as the distances
between these contigs. Next, we used Trinity to connect the contigs and
obtained sequences that cannot be extended on either end, known as
unigenes. Finally, we used TGICL [[137]63], a software system for rapid
clustering of large EST datasets, to assemble all the unigenes from
both samples to form a single set of non-redundant unigenes.
After clustering, the unigenes were divided into two classes: clusters
and singletons. Finally, BLASTx alignment (E-value < 0.00001) was
performed between unigenes and the protein databases, following the
priority order: Nr, Swiss-Prot, KEGG and COG. The best alignment
results were used to decide the sequence direction of unigenes. For
unigenes that could not be aligned to any of the above databases, we
used ESTScan to determine the sequence direction.
Functional annotation
We aligned the unigene sequences to the above-mentioned protein
databases by BLASTx (E-value < 0.00001) and to the nucleotide sequence
database Nt (E-value < 0.00001) by BLASTn. We thus obtained proteins
with the highest similarity to the given unigenes, as well as the
functional annotations.
According to the Nr annotation, we obtained the GO functional
annotation using the Balst2GO program [[138]64], and the GO functional
classification for all-unigenes using the WEGO software [[139]65] to
understand the distribution of gene functions of the species from the
macro-level. After aligning the all-unigenes to the COG database, we
obtained the COG functional annotations. Next, we further examined the
complex biological behaviors. To investigate the metabolic pathway
annotation of unigenes, we aligned the all-unigenes to the KEGG
database [[140]66], and so obtained pathway annotations.
Protein-coding region prediction
To predict the protein-coding regions, unigenes were aligned to the
protein databases in the above-mentioned priority order. The coding
regions of proteins with the highest ranks based on the BLAST results
were determined, and the coding region sequences were translated into
amino acid sequences using the standard codon table. Both the
nucleotide (5’-3’) and amino acid sequences of the unigene-coding
region were acquired. Unigenes that could not be aligned to any of the
above databases were scanned by ESTScan [[141]67] to determine the
sequence (5’-3’) direction and amino acid sequences of the predicted
coding region.
Differential expression analysis of unigenes
To identify those whose levels of expression differed, we performed a
differential expression analysis of the unigenes. RPKM (reads per kb
per million reads) [[142]68] was used to calculate unigene expression
levels, which eliminated the influence of gene length and sequencing
level on the calculation of gene expression. The RPKM method formula
was:
[MATH: RPKM=106CNL/103
, :MATH]
where C is the number of reads that uniquely aligned to one unigene; N
is the total number of reads that uniquely aligned to all unigenes; L
is the base number in the CDS of one unigene.
Based on the method described by Audic and Claverie [[143]69], we
determined the statistical significance of differential expression
profile for each gene. FDR (False Discovery Rate) control method
[[144]70] was used in multiple hypothesis testing to correct the
results for p value. After the FDR was obtained, we used the ratio of
RPKMs to calculate the fold-change in the expression of each gene in
two samples simultaneously. In our analysis, the differentially
expressed genes (DEGs) were screene with the threshold of FDR ≤ 0.001
and the absolute value of log2Ratio ≥ 1 [[145]71].
We then mapped all DEGs to each term of Gene Ontology database
([146]http://www.geneontology.org/) and calculated the gene numbers
each GO term had, and got a gene list and gene numbers for every
certain GO term. Then using hypergeometric test [[147]72], we found
significantly enriched GO terms in DEGs comparing to the genome
background of A. mongolicus. The calculated p value went through
Bonferroni Correction, taking corrected p value ≤ 0.05 as a threshold.
GO terms fulfilling this condition were defined as significantly
enriched GO terms in DEGs. P value formula was:
[MATH: P=1−∑i=0<
mi>m−1Mi
N−M
n−i
Nn
, :MATH]
where N is the number of all genes with GO annotation; n is the number
of DEGs in N; M is the number of all genes that are annotated to the
certain GO terms; m is the number of DEGs in M.
DEGs were also used in pathway enrichment analysis. We calculated the
gene numbers in each pathway by mapping all DEGs to KEGG database
([148]http://www.genome.jp/kegg). By comparing with the whole genome
background of A. mogolicus, pathways with p ≤ 0.05 were chosen as
significantly enriched in DEGs, using the same multiple testing
correction method with GO enrichment analysis. With pathway enrichment
analysis, we got the main biochemical pathways and signal transduction
pathways in which DEGs involved. Some cold stress related pathways were
listed, on which all following analyses and discussions based.
Transcription factors analysis
Transcription factors were predicted according to protein sequences
obtained from CDS prediction. We used hmmsearch to search the domain of
the plant transcription factors
([149]http://plntfdb.bio.uni-potsdam.de/v3.0/) and classified unigenes
according to the gene family information. Genes that were believed to
be associated with cold stress were selected for further investigation.
Quantitative real-time PCR analysis
The unigenes selected were then assessed by quantitative real-time PCR.
Approximately 1 μg of total RNA of each sample was converted into
single-stranded cDNA using M-MLV Reverse Transcriptase (Progema, USA).
The cDNA products were then diluted 100-fold with deionized water
before use as a template. The reaction was performed on a STEP ONE
PLUS™ Real-Time PCR System (Applied Biosystems, USA) using the
RealMasterMix (SYBR Green) (China). The reaction system (20 μL)
contained 9-μL 2.5 × RealMasterMix/20 × SYBR Solution, 1 μL of each of
the forward and reverse primers and 1-μL cDNA template. The reaction
was performed under the following conditions: 94°C for 2 min, followed
by 45 cycles of 94°C for 20 s, 60°C for 35 s and 68°C for 1 min. Three
independent biological replicates were performed for each sample.
Expression levels of the selected unigenes were normalized to that of
18S rRNA, an internal reference gene. Relative gene expression levels
were calculated using the 2^-ΔΔCt[[150]73]. The primer sequences used
for qRT-PCR are listed in Additional file [151]4.
Competing interest
The authors declare that they have no competing interests.
Authors’ contributions
WLY and XLX conceived this study. TP designed the experimental plan. TP
and CYY participated in sample collection, RNA preparation and analyzed
the sequence data. All authors read and approved the final manuscript.
Supplementary Material
Additional file 1
100 most abundant transcripts in both samples. The list of 100 most
abundant transcripts in both cold-treated sample (CT) and control
sample (CK). The Nr annotations of each gene-ID were also listed.
[152]Click here for file^ (22KB, xlsx)
Additional file 2
Genes that expressed only under cold treated. The list of genes that
expressed only under cold treated. The gene length, raw reads, as well
as functional annotation of several public databases were also listed.
[153]Click here for file^ (118.9KB, xlsx)
Additional file 3
GO enrichment analysis. The list of the GO enrichment analysis results.
GO terms with a corrected p value ≤ 0.05 were defined as significantly
enriched in DEGs.
[154]Click here for file^ (10.8KB, xlsx)
Additional file 4
Primer sequences for qRT-PCR. The primers used in quantitative
real-time PCR analysis.
[155]Click here for file^ (14.2KB, docx)
Contributor Information
Tao Pang, Email: pangtao@bjfu.edu.cn.
Chu-Yu Ye, Email: chuyu.ye@gmail.com.
Xinli Xia, Email: xiaxl@bjfu.edu.cn.
Weilun Yin, Email: yinwl@bjfu.edu.cn.
Acknowledgements