Abstract The lily, a famous bulbous flower, is seriously affected by high temperatures, which affect their growth and production. To date, the signalling pathways and the molecular mechanisms related to heat response in Lilium have not been elucidated. In this study, a comparative transcriptome analysis was performed in an important thermo-tolerant flower, L. longiflorum, and a thermo-sensitive flower, L. distichum. Lily seedlings were first exposed to heat stress at 42°C for different lengths of time, and the optimal time-points (2 h and 24 h) were selected for RNA sequencing (RNA-seq). Approximately 66.51, 66.21, and 65.36 Mb clean reads were identified from three libraries of L. longiflorum (LL_CK, LL_T2h and LL_T24h, respectively) and 66.18, 66.03, and 65.16 Mb clean reads were obtained from three libraries of L. distichum (LD_CK, LD_T2h and LD_T24h, respectively) after rRNA removing. A total of 34,301 unigenes showed similarity to known proteins in the database NCBI non-redundant protein (NR), Swiss-Prot proteins, InterPro proteins, Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG). In addition, 1,621 genes were differentially expressed in the overlapping libraries between LL_DEGs and LD_DEGs; of these genes, 352 DEGs were obviously upregulated in L. longiflorum and downregulated in L. distichum during heat stress, including 4-coumarate, CoA ligase (4CL), caffeoyl-CoA O-methyltransferase (CCoAOMT), peroxidase, pathogenesis-related protein 10 family genes (PR10s), 14-3-3 protein, leucine-rich repeat receptor-like protein kinase, and glycine-rich cell wall structural protein-like. These genes were mainly involved in metabolic pathways, phenylpropanoid biosynthesis, plant-pathogen interactions, plant hormone signal transduction, and kinase signalling pathways. Quantitative RT-PCR was performed to validate the expression profiling of these DEGs in RNA-seq data. Taken together, the results obtained in the present study provide a comprehensive sequence resource for the discovery of heat-resistance genes and reveal potential key components that are responsive to heat stress in lilies, which may help to elucidate the heat signal transcription networks and facilitate heat-resistance breeding in lily. Introduction High temperature is a significant environmental factor that affects plant growth and development. Heat stress, triggered by high temperatures, causes a set of physiological and biochemical changes in plants, such as the inhibition of photosynthesis and damage to the cell membrane, thereby leading to cell senescence and even cell death [[36]1]. In addition, when exposed to high temperature, plants can display growth stagnation, the abortion of flower buds and a decline in economic yield [[37]2, [38]3]. It is concerning that as global temperatures continue to increase, extreme heat events may threaten food and environment safety in the 21^st century [[39]4]. Thus, it is of great urgency to discover the genes related to heat tolerance and investigate the defence mechanisms against heat stress. As sessile organisms, plants have evolved numerous mechanisms to adapt to high temperature stress, including long-term evolutionary phenological and morphological adaptations and short-term avoidance/acclimation mechanisms. In past decades, major progress has been made in understanding plant response to heat stress. Various signal transduction components, such as phytohormones, sugar (as a signalling molecule), Ca-dependent protein kinases (CDPKs), and mitogen-activated protein kinase (MAPK/MPKs), are closely associated to the activation of heat-responsive signalling pathways [[40]5]. Some key factors are ion transporters, osmoprotectants, antioxidant proteins and transcription factors (TFs), which have been demonstrated to be involved in the regulation of the plant response to high temperatures [[41]2, [42]3, [43]6]. Transcription profiling of thermo tolerant and/or thermo susceptible genotypes has shown that many genes are influenced by high temperatures, including heat stress transcription factors (HSFs), heat shock proteins (HSPs), calmodulin CaM3 and genes involved in membrane fluidity, chromatin changes, RNA unfolding, enzymatic reactions, such as reactive oxygen species (ROS), and programmed cell death [[44]7]. Furthermore, the upregulation of several genes has been reported to enhance plant tolerance to high temperatures. These genes include Calmodulin-binding protein phosphatase PP7 [[45]8], protein phosphatase RCF2 and NAC19 [[46]9], HEAT-INDUCED TAS1 TARGET1 (HTT1) in Arabidopsis [[47]10] and SUMO E3 Ligase SlSIZ1 in tomato [[48]11], in addition to the numbers of HSFs that act as key transcription switches in regulating the activation of genes responsive to heat stress in virous plants [[49]12]. The lily, known for the attractive variations in colours, patterns and shapes, is one of the best cut and potted flowers and the lily bulb is always used as a vegetable for editable or medicinal in china, such as Lilium lancifolium thumb., L. brownii var.viridulum and L. davidii var. unicolor. However, the lily mainly enjoys cool-humid climate conditions, the scale of temperature is at approximately 18–24°C. Exposure to high temperature can result in the wilting of leaf tissues, abortion of flower buds, a reduction in cut flowers and even the degeneration of bulbs [[50]13–[51]15]. In most places in China, high temperatures can exceed 37°C, and the maximum can reach 45°C in the summer, becoming a barrier to lily production throughout the year. Therefore, it is very important to enhance the thermo-tolerance of lily plants by breeding heat-resistant lily cultivars. In recent years, increasing attention has been paid to the investigation of the mechanisms of thermo-tolerance in lily plants. HsfA2 was first cloned from Lilium longiflorum, and overexpression of LlHsfA2 enhanced the heat tolerance of transgenic Arabidopsis plants [[52]14]. LlHsfA1 interacting with LlHSFA2 played important roles in the heat stress response of lily [[53]16]. A small heat shock protein named HSP16.45, based on the molecular weight of its protein, was cloned from L. davidii, and the heterologous expression of LimHSP16.45 in E. coli or Arabidopsis further improved cell viability under extreme temperatures [[54]17, [55]18]. In addition, LlCaM3 expression was induced by heat and CaCl[2] in lily, and its overexpression increased the transcript levels of HsfA1a and Hsp18.2 and improved the thermo-tolerance of transgenic plants, suggesting that LlCaM3 is a key component in the Ca^2+-CaM HS signalling pathway in lily [[56]15]. However, traditional cloning and genetic transformation methods are difficult and time-consuming for further study of the mechanisms of thermo-tolerance. Transcriptome changes in response to heat stress as well as combined stresses have been reported in several plant species, including Arabidopsis thaliana [[57]19], Solanum lycopersicum [[58]20], Chinese cabbage [[59]21], Wheat [[60]22], Carnation [[61]23] and Pinus radiata [[62]24], etc. However, fewer studies have focused on heat stress-related metabolic pathways and the molecular signalling network in lily. The wild lily L. longiflorum is well-known for its strong thermo-tolerance [[63]13], while L. distichum is a thermo-sensitive species only found in Korea and northeastern China [[64]25]. In the present study, the physiological and biochemical changes of these two lily species with different thermo-tolerance levels were determined in response to short-term heat (42°C) stress. Three RNA libraries of L. longiflorum leaves in contrast to that of L. distichum leaves were produced using the RNA-seq method. The results revealed that 1,621 common unigenes were significantly responsive to heat stress, of which 350 unigenes were upregulated in L. longiflorum but downregulated in L. distichum. In contrast, 133 unigenes showed decreased levels in L. longiflorum but increased abundance in L. distichum. These unigenes were composed of glycine-rich cell wall structural protein-like, leucine-rich repeat receptor-like protein kinase (IMK2 and IMK3), 4-coumarate-coA ligase (4CL), caffeoyl-CoA O-methyltransferase-like (CCoAOMT), protein G1-like 1, glutathione transferase GST 23-like, glutathione S-transferase U17/U18-like, peroxidase (FLXPER3) and cationic peroxidase 1/SPC4-like, etc. The collected data provide new insights into heat tolerance in lily plants and contribute to the development of heat-tolerant plants through genetic modification approaches. Materials and methods Plant material and experimental design The heat-tolerant L. longiflorum and the heat-sensitive L. distichum were selected and cultivated in the tissue culture room of Yangtze Normal University under the conditions of 70% relative humidity under 25°C/18°C (day/night) and 16 h/8 h (light/dark). After 2 months of growth in aseptic condition from lily scales, the lily plantlets were comprised two groups. The control sample (CK: 0 h, 2 h, 8 h, 16 h, 24 h, 48 h, 72 h) was exposed to 25°C in 16 h/8 h (light/dark) and the heat-treated samples (HT: treatment for 0 h, 2 h, 8 h, 16 h, 24 h, 48 h, 72 h) was exposed to 42°C in the same light/dark conditions in a climate chamber respectively. At each time point, three bottles of these lily samples, each with 4 or 5 independent seedlingswere cleaned and immediately frozen in liquid nitrogen for RNA extraction and the same sampling were further performed for biochemical analysis. We abstracted cDNAs from LD_CK(0h), LD_CK(2h), LD_CK(24h), LD_0h, LD_2h, LD_24h, and LL_CK(0h), LL_CK(2h), LL_CK(24h), LL_0h, LL_2h and LL_24h, a total of 36 samples for qRT-PCR experiment with three replicates. Three independent biological replicates were performed, and a total of 42 samples for each type of lily were prepared together with the control sample, which were selected for physiological analysis of chlorophyll, electrical conductivity, carbohydrate and proline analyse. Physiological analysis of chlorophyll, electrical conductivity, carbohydrate and proline Total chlorophyll content was quantified according to the method described by Sims and Gamon (2002) in three replicates per treatment [[65]26]. Pigments were extracted with 10 mL of acetone/Tris (50 mM) buffer at pH 7.8 (4:1) (v/v). After homogenization and centrifugation, supernatants were used to read absorbance at 537, 647 and 663 nm (UV-1800, [66]www.aoe-sh.com, Shanghai, China), and the pigment content was determined. Electrical conductivity was measured using an electrical conductivity analyser (DDS-307, mL.fzchina.com, Chengdu, China). Three replicates of 100 mg of fresh leaves were collected after high temperature exposure and immediately immersed in sterile de-ionized water. After incubation at room temperature in 24 mL under agitation at 40 rpm, the initial conductivity of each was recorded as S1. Then, the leaves were cooked in boiling water for 15 min; after cooling to room temperature, the total conductivity of each sample was recorded as S2. Relative electrical conductivity (REC) was calculated using the following equation: [MATH: REC(%)=( S1/S2)×100. :MATH] Data of the total soluble sugars and the starch content were collected as previously described in Escandon et al. (2016) [[67]24]. Briefly, 50 mg of each frozen leaf was used to extract total soluble sugars, and after centrifugation, the pellet was used to quantify starch content. Absorbance was read at 625 nm and both contents were determined against a D-glucose standard curve. Three biological replicates were performed for each time-point treatment. Proline content was determined according to Bates et al. (1973) [[68]27]. Three replicates of 100 mg of frozen leaves were homogenized in 4 mL of 3% sulfosalicylic acid and centrifuged. Equal volumes (2 mL) of ninhydrin, glacial acetic acids and the supernatant were incubated at 100°C for 1 min. After cooling to room temperature, 4 mL of toluene was added for the extraction of the chromophore, and the absorbance was analysed at 520 nm by spectrophotometer (UV-1800, [69]www.aoe-sh.com, Shanghai, China). A standard curve of proline was used for calculating the proline concentration. RNA isolation sequencing library construction and sequencing According to the manufacturer’s instructions, total RNA was extracted from plantlets using TRIzol Reagent (Invitrogen, USA), and then contaminated DNA was removed from the extracted RNA by treated with DNase I (Promega, USA). The RNA quality and purity were verified using Nanodrop 2000 and an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA) respectively. Then, high-quality RNA samples of each time point in three independent biological replicates were mixed and then used for RNA sequencing. The quality assessment should yield an OD[260/280] ratio of 1.8 to 2.0 and an OD[260/230] ratio of 2.0 to 2.2 with latter being an additional value for purity determination. A total of 6 cDNA libraries were constructed, comprising three libraries of L. longiflorum (LL_CK, LL_T2h, and LL_T24h) and three libraries of L. distichum (LD_CK, LD_ T2h, and LD_ T24h), using the BGISEQ-500 platform at the Wuhan, Hubei Province (BGI’ Tech, China). In brief, the total RNA was purified to enrich poly(A) mRNA with magnetic oligo (dT) beads. The target RNA was fragmented and then used for dscDNA library construction by random hexamer (N6) primers. The ends of the dscDNA were repaired with phosphate at the 5’ end and sticky ‘A’ at the 3’ end, after which the dscDNA strands were ligated with adapters that had a sticky ‘T’ at the 3’ end. Two specific primers were used to amplify the ligation product. Finally, the PCR product was denatured by heat, and the single-strand DNA was cyclized by splint oligo and DNA ligase. Six cDNA libraries were constructed and sequenced on a BGISEQ-500 RS platform at BGI. All raw read sequences are available in the NCBI sequence read archive under the accession number PRJNA577293 and the assembled transcript sequences were also submitted to the NCBI TSA database ([70]GIUQ00000000). Reads processing and identification of differentially expressed genes (DEGs) Clean reads were obtained by filtering out adaptor-only reads, trimming reads containing more than 5% unknown nucleotides, and low-quality reads with the percentage of low quality bases (base quality ≤10) using Trimmomatic and we subsquently aligned the clean high-quality reads to the SSU and LSU rRNA sequences using BWA software. After that, we assembled the clean reads from the heat-tolerant L. longiflorum after rRNA removing using de novo assembly program Trinity except K-mer value to conduct the de novo assembly [[71]28]. Additionally, only one read copy will be kept for assembly and redundant duplication reads be eliminated for mutli-duplication’s reads, After that, overlapped nucleic acid sequence were generated to the contigs assembled using Trinity. To obtain the unigene, the paired-end reads were used for constructing scaffolds with the paired end information by realigning to contigs. Then, these contigs in one transcript were assembled by the Trinity and gained the sequence not being extended on either end defined as unigenes. To harvest as much description as possible for the assembled sequences, all unigenes were annotated based on BLAST searches using BLASTx search tool through Swiss-Prot protein databases and the National Center for Biotechnology Information non-redundant protein (Nr) with threshold E-value set as less than 1e-10, identity > 70%, query coverage ≥ 80%, and other parameters were defaulted. In general, we used BLAST alignment of transcripts (or translations of predicted ORFs from transcripts) to reference protein sets as a means of assessing coding transcript completeness. Transcripts with ≥ 80% sequence coverage (i.e. a significant alignment between a transcript sequence from our assembly and a target protein sequence, where the alignment covers at least 80% of the target protein sequence) are thus considered “full or near-full length”. In addition, the conserved ortholog content was identified using the nematod Benchmark Universal SingleCopy Orthologs (BUSCOs, v4.0.6). To further predict their functions, based on Nr and SwissPro BLAST results, the unigenes were then annotated in Kyoto Encyclopedia of Genes and Genomes (KEGG, [72]http://www.kegg.jp/) database and Gene Ontology (GO, [73]http://www.geneontology.org/) database. For the nr annotations, the BLAST2GO program was used to assign GO annotations (comprised of biological processes, molecular functions, and cellular components) of unique assembled transcripts [[74]29]. Subsequently, WEGO (Web Gene Ontology Annotation Plot) software was used to conduct GO functional classification for understanding the distribution of gene functions at the macroscopic level [[75]30]. After that, Bowtie2 was adopted to map the clean reads to the de novo assembly transcriptome reference sequences, and based on the mapping of RNA-seq reads to the assembled transcriptome, the developed software RSEM was performed to assess transcript abundances by quantification of the de novo assembly transcript and calculated as the FPKM (fragments per kilobase of transcript per million mapped reads); Expression data from two libraries (treatment and control) were determined by mapping to the transcriptome assembly using Bowtie2 software [[76]31]. The fragments per kilobase of transcripts per million fragments mapped (FPKM) values were analysed further using RESM [[77]32] and PossionDis [[78]33] to get differentially expressed genes (DEGs) between the control and infected groups [(LL_T24h vs. LL_CK and LL_T2h vs. LL_CK for L. longiflorum) and (LD_T24h vs. LD_CK and LD_T2h vs. LD_CK for L. distichum)]. Further, to determine the threshold p-value in multiple tests, a false discovery rate (FDR) was used. Furthermore, significant enrichment was calculated when FDR was <0.05 and FPKM values showed at least a two-fold difference between the two samples reads. Furthermore, DEGs related to heat-stress responsiveness were analysed and plotted using Neighbour–Joining cluster through homemade R script. Followed by the coefficient of variation with a threshold of 1 across the different samples, these genes were selected and then transformed FPKM values by log2. GO functional annotation and KEGG pathway analysis of common differently expressed genes (DEGs) in lilies Common DEGs in both lily species responsive to heat stress were identified using the venny program ([79]https://bioinfogp.cnb.csic.es/tools/venny/); subsequently, we submitted those common DEGs for GO annotation through Gene Ontology Database ([80]http://www.geneontology.org/) and KEGG pathway enrichment analysis using KOBAS software. GO functional categories were assigned to differentially expressed genes based on Gene Ontology Database ([81]http://mL.geneontology.org/); the false discovery rate (FDR) was calculated to correct the P-value for the GOs of all the DEGs. Moreover, KOBAS software [[82]34] was used for pathway enrichment analysis by testing the statistical enrichment of DEGs in the KEGG pathway; the common DEGs in L. longiflorum (LL_T24h vs. LL_CK and LL_T2h vs. LL_CK) and L. distichum (LD_T24h vs. LD_CK and LD_T2h vs. LD_CK) were annotated to the KEGG database, which exhibited all significant enrichments for DEGs in the pathway. A P-value and FDR (FDR < 0.05) were presented after pathway analysis for each type. After that, their relative graphs were constructed using R script. Verification of the gene expression profiles of candidate DEGs by qRT-PCR in lilies Quantitative real-time PCR was performed on QuantStudio Real-time PCR system (Applied Biosystems by Thermo Fisher Scientific) platform using SYBR Premix Ex Taq (Cat. #RR420L, TakaRa, China) following the manufacturer’s instructions. The primers employed in the qRT-PCR experiments are designed by GenScript Molecular Biology Tools online ([83]https://mL.genscript.com/tools/real-time-pcr-tagman-primer-design- tool) ([84]S1 Table). The SAND gene was selected as the internal control according to a previous analysis [[85]35]. At least three independent biological replicates of RNA samples at each time point of heat treatment including CK were obtained from these lily plantlets and collected for qRT-PCR analysis. The first-stand cDNA was synthesized using PrimeScript^™ RT reagent Kit with gDNA Eraser (Cat. #RR047A, TakaRa, China). Amplification was performed using PCR as follows: a first denaturation step at 95°C for 3 min; 40 cycles of denaturation at 95°C for 15 s; followed by annealing and extension at 60°C for 1 min. Melting curves were generated to verify amplification specificity by a thermal denaturing step at 95°C for 15 s and 60°C for 1 min, and then from 60°C to 95°C slowly (0.15°C/S). To calculate the fold change in the expression levels of target genes, the relative quantitative method (2^-ΔΔCT) was selected and performed with three technical replicates for all reactions. Statistical analysis To evaluate differences between difference samples in L. longiflorum and L. distichum, variance analysis was carried out for the Student’s t-test using SPSS statistical software, significant differences among the relative expression levels of the genes were shown to be statistically significant (P<0.05). Results Physiological changes of lily seedings under heat stress conditions To determine the optimal time to analyse the heat response for lilies (L. longiflorum and L. distichum), uniform plantlets with 2 months of growth in aseptic conditions were subjected to high temperature (42°C) treatment and then sampled at six different times of stress (2 h, 8 h, 16 h, 24 h, 48 h and 72 h). As shown in [86]Fig 1, the plantlets of L. longiflorum grew normally, but the leaves gradually turned white with longer durations of heat stress. After 24 h, the colour of L. longiflorum leaves started to change but no signs of wilting, and the plantlets lost their green colour during 48 h to 72 h ([87]Fig 1A). By contrast, the leaves of L. distichum started to wilt accompanied by a colour change after 24 h; the leaves lost almost of their green colour at 48 h and finally appeared to be dying after 72 h ([88]Fig 1B). The analysis of chlorophyll content indicated coincident results with the colour change in lilies ([89]Fig 2). To determine leaf membrane damage, measurements of the electric conductivity showed that the levels were rapidly increased after 2 h in both lilies and reached their peak after 24 h in L. longiflorum and 8 h in L. distichum ([90]Fig 2B). Furthermore, the levels of the measured soluble sugar and the starch increased rapidly and sharply following heat exposure of L. longiflorum leaves compared with L. distichum leaves; however, the starch level declined faster than that of the soluble sugar ([91]Fig 2C and 2D). The general trends of the soluble sugar and proline were similar; both leveled off at 24 h, although they reached their peaks at different times, with the former reaching its peak at 16 h and the latter at 2 h ([92]Fig 2E). The entire physiological index showed that L. longiflorum was more responsive to the high temperature condition than L. distichum, which is consistent with the heat resistance of these two lilies. Taken together, the following experiments of the two lilies were performed at 0 h (CK) and two different heat-treatment times designated as LL_CK, LL_T2h, and LL_T24h for L. longiflorum in contrast to LD_CK, LD_ T2h, and LD_ T24h for L. distichum, accounting for the early heat response and the physiological changes. Fig 1. Dynamic changes of growth development for different lilies (L. longiflorum and L. distichum) subjected to high temperature (42°C) treatment for six lengths of time (2 h, 8 h, 16 h, 24 h, 48 h and 72 h). [93]Fig 1 [94]Open in a new tab Fig 1A. Dynamic changes of growth development for L. longiflorum under heat stress for different treatment times. Fig 1B. Dynamic changes of growth development for L. distichum under heat stress for different treatment time points. Fig 2. Physiological changes of chlorophyll content, soluble sugar content, relative electric conductivity, starch content and proline content for lily seedings under heat stress conditions. [95]Fig 2 [96]Open in a new tab Fig 2A. Dynamics changes of chlorophyll content determined in the heat-tolerance L. longiflorum and the heat-sensitive L. distichum. Fig 2B. Dynamics changes of relative electric conductivity validated by the heat-tolerance L. longiflorum and the heat-sensitive L. distichum. Fig 2C. Dynamics changes of soluble sugar content identified in the heat-tolerance L. longiflorum and the heat-sensitive L. distichum. Fig 2D. Dynamics changes of starch content determined in the heat-tolerance L. longiflorum and the heat-sensitive L. distichum. Fig 2E. Dynamics changes of proline content identified in the heat-tolerance L. longiflorum and the heat-sensitive L. distichum. Illumina sequencing and de novo assembly of lily transcriptome responding to heat stress To develop a comprehensive overview of gene expression profiles of lilies (lilium sp.) responding to high temperature stress, three libraries (LL_C, LL_T2h and LL_T24h) of L. longiflorum in contrast to that (LD_C, LD_T2h and LD_T24h) of L. distichum were constructed. The RNA samples were prepared from lily fleshy leaf with differential thermotolerance and subjected to pair-end read (PE) sequencing using the BGISEQ-500 platform at Beijing Genomics Institute (BGI, Wuhan, China). PE sequencing could increase the depth of sequencing and improve de novo assembly efficiency [[97]36]. As a result, 74.71, 74.71, and 72.22 Mb raw reads were obtained from the LL_C, LL_T2h and LL_T24h libraries, respectively, and 74.71, 74.71, and 74.71 Mb raw reads in LD_C, LD_T2h and LD_T24h libraries, respectively. After removing adaptor sequences and low-quality reads, 66.51, 62.61, and 65.36 Mb clean reads were generated in three libraries of L. longiflorum and 66.18, 66.03, and 65.16 Mb clean reads in three libraries of L. distichum ([98]S2 Table). Subsequently, 198, 342 contigs were constructed from the raw sequence reads with an average length of 411 bp, and 130, 170 unigenes with a N50 length of 1442 bp were generated with an average length of 875 bp using paired-end reads in all the libraries. The percent of length distribution of the assembled contigs and unigenes is shown in [99]Fig 3A, which indicated that unigenes mainly ranged from 300 to 3000 bp, and the majority were distributed from 300 to 1800 bp (over 2000 numbers) accounting for more than 69.2% of all of the unigenes. The assembly completeness has been assessed by calculating the BUSCO. The assembled transcripts were compared against the benchmarking universal single-copy orthologs (BUSCO) revealing that almost 65% of the BUSCO groups have complete gene representation, while 10% are only partially recovered, and 25% are missing ([100]S1 Fig). It is probable that the assembly completeness was not good owing to the bigger and complex of the lily genome; there was not currently genome of the lily provided for research. To evaluate completeness of transcriptome libraries and annotations of all the assembled unigenes, the annotated sequences were corresponded to the known nucleotide sequences of plant species, with 32.96%, 27.75%, 10.52%, and 2.96% matching with Elaeis guineensis, Phoenix dactylifera, Musa acuminate subsp. malaccensis, and Nelumbo nucifera respectively, and the remaining 25.81% matching with others (except for the top matched species) ([101]Fig 3B). All of the top three species with BLAST hits were involved in the Palmaceae family (almost 60.71% of all assembled transcripts) and the Musaceae family (10.52% of all assembled transcripts), implying that the sequences of the lily transcripts aligned more closely to these two families. Through the BLASTx alignment against the public protein databases including Nr, KEGG, COG, Swiss-Prot and InterPro proteins, the results indicated that a total of 34,301 unigenes (89.11% of all unigenes) were annotated and matched to all of the public protein databases ([102]Fig 3C). Fig 3. Comprehensive overview of de novo assembly of lily transcriptome responding to heat stress in lily. [103]Fig 3 [104]Open in a new tab Fig 3A. length distribution of the assembled contigs and unigenes obtained from de novo assembly of lily transcriptome. Fig 3B. Species distribution of the top BLAST hits for lily assembled transcriptome. Fig 3C. Comprehensive overview of BLASTx alignment against the public protein databases including Nr, KEGG, COG, Swiss-Prot and InterPro proteins. Functional annotation and classification of the assembled unigenes from lily transcriptome Based on sequence homology, the Gene Ontology (GO) assignments system was performed to classify the roles of these uncharacterized sequences. By using the BLAST2GO program, all assembled unigenes were assigned to three main categories such as “biological process” (BP), “cellular component” (CC) and “molecular function” (MP), with a total of 22,659, 24,799 and 12,212 unigenes, respectively. Then, these annotated sequences were assigned to GO-slim terms to facilitate the functional distribution of plants. All GO terms were subsequently divided into 54 functional groups such as biological processes (23 groups), cellular component (17 groups), and molecular function (14 groups). Within the biological processes, the metabolic process including 5,754 (25.4%) unigenes and cellular process including 5,445 unigenes (24.0%) were the most dominant. For cellular component, the majority of GO terms were primarily assigned to the cell and cell part including 5,103 (20.6%) and 5,055 unigenes (20.4%), respectively. With regards to molecular function, those assignments were focused on catalytic activity including 5,543 unigenes (45.4%) and binding including 5,010 unigenes (41.0%) ([105]S2A Fig). In addition, blast analysis from the COG database indicated that all the unigenes were assigned to 15 COG categories. Among these categories, the cluster for ‘General functions prediction only’ related to basic physiological and metabolic functions accounted for the largest group, followed by ‘Posttranslational modification, protein turnover, chaperones’, ‘Function unknown’, ‘Defense mechanisms’, ‘Cytoskeleton’, ‘Lipid transport and metabolism’, and ‘Intracellular trafficking, secretion, and vesicular transport’, while fewer unigenes were assigned to ‘Nucleotide transport and metabolism’, ‘Coenzyme transport and metabolism’ and ‘Nuclear structure’ ([106]S2B Fig). These findings revealed that a great deal of genes with molecular functions were related to numerous biological progress under high temperature stress in lilies. Moreover, the KEGG pathway database contributed to better understand the biological roles of genes involved in different networks [[107]37]. Through comparing against the KEGG pathway database with KEGG Orthology (KO) numbers using BLASTx alignments with a cut-off E value of 10^−5, there were altogether 51,199 unigenes belong to 21 KEGG pathways annotated and assigned. Of these categories, the most represented pathways were ‘Transport and catabolism’ (2,291, 4.47%), Signal transduction (2,445, 4.78%), Translation (4,065, 7.94%), Carbohydrate metabolism (4,427, 8.65%) and Environment adaption (1,788, 3.49%) ([108]S2C Fig). All of the annotations provide valuable references for