Abstract

Background
Sacha Inchi (Plukenetia volubilis L.), which belongs to the Euphorbiaceae, has been considered a new potential oil crop because of the high content of polyunsaturated fatty acids in its seed oil. The seed oil is especially rich in α-linolenic acid (ALA), which is useful for the prevention of various diseases. However, little is known about the genetic information and genome sequence of Sacha Inchi, which has largely hindered functional genomics and molecular breeding studies.

Results
In this study, a de novo transcriptome assembly based on transcripts sequenced in eight major organs of Sacha Inchi (roots, stems, shoot apexes, mature leaves, male flowers, female flowers, fruits, and seeds) was performed, resulting in a set of 124,750 non-redundant putative transcripts with an average length of 851 bp and an N50 value of 1909 bp. Analysis of organ-specific unigenes revealed that the most organ-specific transcripts were found in female flowers (2244 unigenes), whereas relatively few unigenes were expressed specifically in the other organs, with the fewest in stems (24 unigenes). A total of 42,987 simple sequence repeats (SSRs) were detected, which will contribute to marker-assisted selection breeding of Sacha Inchi. We analyzed the expression of genes related to α-linolenic acid metabolism based on the de novo assembly and annotation of the Sacha Inchi transcriptome. It appears that Sacha Inchi accumulates high levels of ALA in seeds through strong expression of biosynthesis-related genes and weak expression of degradation-related genes. In particular, the up-regulation of FAD3 and FAD7 is consistent with the higher level of ALA in seeds than in other organs. Meanwhile, several transcription factors (ABI3, LEC1 and FUS3) may regulate key genes involved in oil accumulation in seeds of Sacha Inchi.

Conclusions
The transcriptome of major organs of Sacha Inchi has been sequenced and de novo assembled, which will expand the genetic information for functional genomic studies of Sacha Inchi. In addition, the identification of candidate genes involved in ALA metabolism will provide useful resources for the genetic improvement of Sacha Inchi and the metabolic engineering of ALA biosynthesis in other plants.

Electronic supplementary material
The online version of this article (10.1186/s12864-018-4774-y) contains supplementary material, which is available to authorized users.

Keywords: Sacha Inchi, de novo transcriptome, Organ-specific gene expression, α-linolenic acid metabolism, Plukenetia volubilis

Background
Sacha Inchi (Plukenetia volubilis L.), a member of the Euphorbiaceae family [1], is native to the rainforests of South America [2]. Sacha Inchi is also known as Inca peanut, wild peanut, Sacha peanut or mountain peanut, and has been cultivated for centuries by the indigenous population [3]. Thus, the effects of evolutionary forces such as selection, genetic drift and gene flow on the population structure of Sacha Inchi have been shaped by strong anthropogenic influence [4]. Based on cytological study, the most common chromosome number is 2n = 58 [5]. Sacha Inchi seeds contain 41–54% oil [2, 3, 6], which is characterized predominantly by high levels of polyunsaturated fatty acids (PUFAs), especially α-linolenic acid (ALA, C18:3 cis Δ9, 12, 15, ω-3) and linoleic acid (LA, C18:2 cis Δ9, 12, ω-6), which represent approximately 50 and 35% of the total oil, respectively [2, 7, 8]. In addition, Sacha Inchi seeds contain substantial amounts of total tocopherols (137 mg/100 g) [7], a class of chemical compounds that consist of various methylated phenols and display strong antioxidant activity [7, 9, 10].
Considerable amounts of phytosterols (75.7–86.2 mg/100 g) and 15 polyphenolic compounds, belonging to the phenyl alcohol, flavonoid, secoiridoid, and lignan classes, were positively identified in seeds [7, 11]. In particular, condensed tannins are the main family of phenolic compounds, which might be indicative of potentially high antioxidant properties [12]. Leaf extracts were characterized by phenolic compounds, steroids, and/or terpenoids, which were responsible for the antioxidant activity of Sacha Inchi leaves [13]. These results indicate that Sacha Inchi could be considered an important material for the production of antioxidant phenolic compounds and phytosterols.
ALA and LA are essential fatty acids [14], which are useful in the prevention of coronary heart disease, hypertension, diabetes, arthritis, high cholesterol, cancer, and inflammatory and autoimmune disorders [15–19]. Serum parameters in rats treated with Sacha Inchi oil indicated lower levels of cholesterol and triglycerides, and higher levels of high-density lipoprotein, in comparison with the control group [20]. The lipid profiles of patients with hypercholesterolemia who took Sacha Inchi seed oil for four months showed a decrease in total cholesterol and non-esterified fatty acids, and a rise in high-density lipoprotein and insulin levels [21]. Moreover, ALA and LA are also critical for the development of infants during pregnancy and breastfeeding periods [8]. However, the amount of ALA in most human diets is insufficient. The ALA/LA ratio in the Western diet is approximately 1:15, which is much lower than the ratio recommended by the WHO (1:2 to 1:6) [15, 22]. The Western diet is very high in ω-6 FAs and low in ω-3 FAs because of the unwise recommendation to substitute ω-6 FAs for saturated fats to lower serum cholesterol concentrations, which resulted in the production of products rich in ω-6 and poor in ω-3 FAs [23].
Thus, it is necessary to increase the intake of ALA, which can be extracted from the seeds of some oil plants, such as Sacha Inchi.
Triacylglycerol (TAG) is the primary unit of energy storage in eukaryotic cells. Normally, TAGs are derived either from the glycerol-3-phosphate (G3P) pathway (also known as the Kennedy pathway) or the acyl-dihydroxyacetone phosphate (acyl-DHAP) pathway [24, 25]. TAG degradation through oxidation (primarily beta-oxidation) releases FAs [26]. One aim of this study is to analyze the expression of genes involved in the G3P pathway and beta-oxidation, which are the major pathways for TAG biosynthesis and degradation in most tissues or organisms.
Thus far, most studies on Sacha Inchi have dealt with plant development and physiology [27–29], the characterization of seed oil [3, 6, 8], in vitro regeneration systems [30], and potential applications in biofuel production [31] and in the cosmetic, pharmaceutical, and food industries [7, 32–34]. However, the genetic information and molecular mechanisms underlying ALA metabolism in Sacha Inchi have rarely been studied, especially given the absence of a genome sequence. Only one transcriptome analysis and one study of the expression of oil biosynthesis genes in Sacha Inchi seeds have been published [35, 36]. In the transcriptome analysis, developing seeds at two stages from two-year-old Sacha Inchi plants were sequenced, and unigenes that may be involved in de novo FA and triacylglycerol biosynthesis were identified [35]. In the other study, expression profiles of genes controlling unsaturated fatty acid biosynthesis and oil deposition in developing seeds of Sacha Inchi were investigated by quantitative real-time PCR [35, 36]. However, only a few genes contributing to the high level of ALA have been characterized in Sacha Inchi. Our results provide priority candidates for future research.
In this study, we sequenced and de novo assembled the transcriptome of eight major organs of Sacha Inchi using next-generation sequencing technology. The assembled transcriptome sequences will expand the genetic information for functional genomic studies of Sacha Inchi. In addition, the identification of candidate genes involved in ALA metabolism will provide useful resources for the genetic improvement of Sacha Inchi and the metabolic engineering of ALA metabolism in other plants.

Results

Transcriptome sequencing and de novo assembly of Sacha Inchi
To comprehensively construct the transcriptome of Sacha Inchi, eight major organs (roots, stems, shoot apexes, mature leaves, male flowers, female flowers, fruits, and seeds) were sampled for RNA isolation. Separate cDNA libraries of those organs were constructed and sequenced, resulting in a total of 164 G of 150-bp paired-end raw reads. After the removal of adapters, poly-N-containing reads and low-quality sequences from the raw data, approximately 162 G of clean reads were retained and used for transcriptome assembly and analysis (Additional file 1: Table S1). The Trinity assembly program [37], which has been shown to be the best single k-mer assembler for RNA-Seq short reads, was used for de novo assembly [38]. As a result, an assembly of 349,951 contigs was established for the Sacha Inchi transcriptome. To reduce redundancy and potential assembly errors, the candidate unigenes that most likely had the longest open reading frames (ORFs) were chosen from the assembly result, and transcripts with fragments per kilobase of transcript per million mapped reads (FPKM) values below 0.1 were then removed. Finally, a set of 124,750 unigenes with an average length of 851 bp and an N50 value of 1909 bp was obtained (Table 1). We compared our data with the seed transcriptome reported by Wang et al. [35] using BLASTn (E-value ≤ 1e-10).
The alignment rate of the reported seed transcriptome to our data was 93.8%, indicating that 6.2% of the seed transcriptome was not detected in our dataset. This probably occurred because the seed transcriptome reported by Wang et al. [35] contains transcripts of seeds at two developmental stages, whereas our dataset contains transcripts of seeds at a single developmental stage. The size distribution of unigenes is shown in Additional file 2: Figure S1a. Principal component analysis (PCA) was conducted using an R package. The distance assessment revealed that the three independent biological replicates of each sample were highly reproducible and that seeds showed the most distinctive expression pattern among all tested organs (Additional file 3: Figure S2), which is in accord with the result in Arabidopsis [39].

Table 1. Summary of the assembly and annotation of the Sacha Inchi transcriptome

Assembly                                        No. of sequences
  Total number of unigenes                      124,750
  Total bases (Mb)                              145
  Average unigene length (bp)                   851
  N50 (bp)                                      1909
  Number of unigenes (≥ 500 bp)                 54,675
  Number of unigenes (≥ 1 kb)                   29,591
Annotation                                      No. of matched unigenes (percentage)
  Transcript BLASTx against NR                  67,832 (54.37%)
  Transcript BLASTx against UniRef90            68,063 (54.56%)
  Transcript BLASTx against TAIR10              42,109 (33.75%)
  Transcript BLASTx against KOG                 55,403 (44.41%)
  Transcript BLASTx against SwissProt           43,501 (34.87%)
  All annotated transcripts                     70,142 (56.23%)
  Transcripts identified in all five databases  35,381 (28.36%)

To evaluate the reliability and quality of the assembly, the clean reads were mapped back to the Trinity-assembled transcriptome. The overall alignment rate was 83.39%, indicating that a high-quality de novo assembled transcriptome was obtained.
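
The length statistics reported in Table 1 can be recomputed from the unigene lengths alone. The following is a minimal sketch rather than the pipeline used in this study: the function names `n50` and `mean_length` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical unigene lengths in bp (not data from the study).
toy_lengths = [200, 300, 400, 500, 600, 1000, 1000]

print(n50(toy_lengths))                    # N50 of the toy set
print(round(mean_length(toy_lengths), 1))  # mean length of the toy set
```

Because N50 weights each sequence by its length, it is usually larger than the mean length when many short sequences are present, which matches the pattern in Table 1 (N50 of 1909 bp against an average of 851 bp).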

Functional annotation of the Sacha Inchi transcriptome
After assembly, the 124,750 non-redundant transcripts were subjected to a BLAST search against five public databases (NR, TAIR10, UniRef90, KOG and Swiss-Prot) to predict gene function, using an E-value cut-off of 1e-5 [40]. We annotated 67,832 (54.37%) unigenes against the NR database, 68,063 (54.56%) unigenes against the UniRef90 database, 42,109 (33.75%) unigenes against the TAIR10 database, 55,403 (44.41%) unigenes against the KOG database, and 43,501 (34.87%) unigenes against the Swiss-Prot database (Table 1). In total, 70,142 (56.23%) unigenes had at least one homologous match in these databases, whereas 35,381 (28.36%) unigenes had significant BLAST matches to proteins in all five databases, as shown in a Venn diagram (Additional file 4: Figure S3). The similarity distribution of the top hits showed that 33.99% of the mapped sequences had similarities higher than 80%, while 62.87% of the hits had similarities ranging from 40 to 80% (Additional file 2: Figure S1b). The E-value distribution had a comparable pattern: 38.47% of the mapped sequences showed high homology (E-value < 1e-50), whereas the E-values of the remaining 61.53% ranged between 1e-5 and 1e-50 (Additional file 2: Figure S1c). The species distribution of NR BLAST matches is shown in Additional file 2: Figure S1d. The top-scoring BLASTx hits against the NR protein database showed that the top three species were Ricinus communis (29.27%), Jatropha curcas (9.46%) and Malus domestica (3.20%). We further analyzed the unigenes that showed high similarity with those of the KOG database. Among the 25 categories, the largest group was “General function prediction” (11,069, 19.98%), followed by “Posttranslational modification, protein turnover, chaperones” (5827, 10.52%) and “Signal transduction mechanisms” (5752, 10.38%) (Additional file 5: Figure S4).
The categories “Cell motility” (66, 0.12%), “Nuclear structure” (160, 0.29%), “Extracellular structures” (208, 0.38%) and “Coenzyme transport and metabolism” (511, 0.92%) accounted for relatively low proportions (Additional file 5: Figure S4).

Gene ontology and KEGG pathway analysis of the Sacha Inchi transcriptome
We used the best BLASTx hit from the NR database to functionally classify the Sacha Inchi unigenes. The best hits were submitted to Blast2GO [41, 42] to assign gene ontology (GO) terms and enzyme commission (EC) numbers. The result showed that 45,319 unigenes were assigned 240,186 GO term annotations, an average of 5.3 GO terms per unigene. The functions of these 45,319 unigenes were classified into three main divisions (biological processes, cellular components and molecular functions). The predominant groups in the biological processes, cellular components and molecular functions divisions were metabolic process (GO:0008152), cell (GO:0005623) and catalytic activity (GO:0003824), respectively (Additional file 6: Figure S5). To further understand the biological functions and interactions of transcripts, the assembled unigenes were mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The result showed that a total of 24,678 unigenes were involved in 323 different pathways (see Additional file 7).

Analysis of organ-specific unigenes
Examination of the unigenes expressed in each organ type revealed that female flowers expressed the most unigenes (16%), while stems and seeds expressed the fewest (11% each) (Additional file 8: Figure S6a). We designated a gene as organ-specific if its expression value (FPKM) was at least 10 in one organ and less than 1 in all other organs.
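
The organ-specificity rule above can be written as a short predicate. This is a minimal sketch under stated assumptions: the function name `organ_specific` is hypothetical, and the example values are the PvoABI3 and PvoFUS3 expression levels from Table 3, treated here as FPKM values purely for illustration.

```python
def organ_specific(fpkm_by_organ, high=10.0, low=1.0):
    """Return the organ in which a unigene is organ-specific, or None.

    A unigene qualifies when its FPKM is at least `high` in exactly one
    organ and below `low` in every other organ.
    """
    candidates = [organ for organ, value in fpkm_by_organ.items() if value >= high]
    if len(candidates) != 1:
        return None
    organ = candidates[0]
    others_low = all(
        value < low for name, value in fpkm_by_organ.items() if name != organ
    )
    return organ if others_low else None


# Expression levels copied from Table 3 (assumed to be FPKM).
pvo_abi3 = {
    "male flowers": 0.0083, "fruits": 0.0000, "mature leaves": 0.0110,
    "female flowers": 0.0393, "roots": 0.0420, "shoot apexes": 0.0073,
    "seeds": 59.3290, "stems": 0.0110,
}
pvo_fus3 = {
    "male flowers": 0.0000, "fruits": 0.0000, "mature leaves": 0.0000,
    "female flowers": 0.0000, "roots": 0.1130, "shoot apexes": 0.0297,
    "seeds": 6.8967, "stems": 0.4457,
}

print(organ_specific(pvo_abi3))  # seeds
print(organ_specific(pvo_fus3))  # None: the seed value is below 10
```

Under these thresholds a gene such as PvoFUS3, although clearly seed-enriched, is not counted as organ-specific, so the rule is a deliberately strict one.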
Using the criteria, 495, 24, 249, 390, 195, 2244, 287, and 388 unigenes were found to be specific to roots, stems, shoot apexes, mature leaves, male flowers, female flowers, fruits, and seeds, respectively (Additional file 8: Figure S6b). Detailed annotation and FPKM information for the organ-specific unigenes are listed in Additional file 9. To evaluate the functional properties of these organ-specific unigenes, GO and KEGG annotation enrichment analyses were carried out. As a result, transcripts were significantly enriched in 7, 18, 10, 18 and 22 GO terms in male flowers, female flowers, roots, seeds and stems, respectively (Additional file 10). However, no significantly enriched GO term was found in fruits or mature leaves. KEGG pathway enrichment analysis of female flowers showed that most of the enriched pathways are related to metabolism, including “Amino acid metabolism,” “Biosynthesis of other secondary metabolites,” “Carbohydrate metabolism,” “Energy metabolism,” “Glycan biosynthesis and metabolism,” “Lipid metabolism,” “Metabolism of cofactors and vitamins,” “Metabolism of terpenoids and polyketides” and “Nucleotide metabolism,” followed by pathways related to genetic information processing (Additional file 11: Figure S7); the genes specifically expressed in female flowers were significantly enriched in 17 pathways (Additional file 12). Genes specifically expressed in female flowers were enriched not only in pathways related to sugar, lipid, and amino acid metabolism but also in plant hormone signal transduction and phagosome pathways, which is consistent with the GO analysis (Additional file 11: Figure S7).

Identification and characterization of genes involved in ALA metabolism
Based on the de novo assembly and annotation of the Sacha Inchi transcriptome, a total of 211 transcripts were identified as candidate unigenes for 13 enzymes of ALA biosynthesis (Table 2 and Additional file 13). The ALA biosynthesis pathway (Fig. 1) is composed of two sub-pathways: fatty acid (FA) elongation and desaturation [43, 44]. The initiation and acyl chain elongation steps of de novo FA biosynthesis use acetyl-CoA. Acetyl-CoA is first carboxylated by acetyl-CoA carboxylase (ACC) to form malonyl-CoA. Then, malonyl-ACP, which is the primary substrate for subsequent elongation, is generated from malonyl-CoA by malonyl-CoA:ACP malonyltransferase (MCMT). Four enzymes play a significant role in each addition of two carbons: ketoacyl-ACP synthase III (KAS III), ketoacyl-ACP reductase (KAR), 3-hydroxyacyl-ACP dehydratase (HAD, EC: 4.2.1.-) and enoyl-ACP reductase (EAR) [45, 46]. After these four reactions, C4:0-ACP, which is the substrate for further elongation, is produced. Next, ketoacyl-ACP synthase I (KAS I) carries out the elongation from C4 to C16. The elongation from C16 to C18 is catalyzed by ketoacyl-ACP synthase II (KAS II), and the enzyme stearoyl-ACP desaturase (SAD) removes two hydrogen atoms from stearic acid (C18:0) to form oleic acid (C18:1) during unsaturated FA formation. The enzymes fatty acid desaturase 2 (FAD2) and fatty acid desaturase 6 (FAD6) desaturate oleic acid (C18:1Δ9) to generate LA [47]. Fatty acid desaturase 3 (FAD3), FAD7, and FAD8 act on the ω6 fatty acid LA [48, 49] to catalyze the biosynthesis of ALA from LA in Sacha Inchi. In the ALA biosynthesis pathway, the genes encoding FATA, KAS II, FAD2, FAD3, and FAD7 were identified and showed significant up-regulation in seeds compared with other organs (Fig. 2), indicating that the unsaturated FA biosynthesis pathway might be blocked in the other organs. However, the genes encoding HAD, MCMT, and KAS I were not found, which might result from the single developmental stage of the seeds sampled in this study.

Table 2.
Summary of enzymes involved in ALA metabolism identified by the annotation of the Sacha Inchi transcriptome

EC No.              Enzyme abbreviation  Enzyme full name                             No. of unigenes
ALA biosynthesis
1.1.1.100           KAR                  Ketoacyl-ACP reductase                       28
1.3.1.9             EAR                  Enoyl-ACP reductase                          3
1.14.19.1           SAD                  Stearoyl-CoA desaturase                      66
1.14.19.6           FAD2                 Fatty acid desaturase 2                      29
1.14.19.22          FAD6                 Fatty acid desaturase 6                      6
1.14.19.25          FAD3                 Fatty acid desaturase 3                      8
1.14.19.35          FAD7                 Fatty acid desaturase 7                      17
1.14.19.36          FAD8                 Fatty acid desaturase 8                      2
2.3.1.179           KAS II               Ketoacyl-ACP synthase II                     19
2.3.1.180           KAS III              Ketoacyl-ACP synthase III                    3
3.1.2.14            FATA                 Acyl-ACP thioesterase A                      2
3.1.2.21            FATB                 Acyl-ACP thioesterase B                      7
6.4.1.2             ACC                  Acetyl-CoA carboxylase                       21
Fatty acid catabolism
6.2.1.3             LACS                 Long-chain acyl-CoA synthetase               27
4.2.1.17, 1.1.1.35  MFP2                 Multifunctional protein                      15
1.3.3.6             ACX                  Acyl-CoA oxidase                             16
2.3.1.16            KAT                  3-ketoacyl-CoA thiolase                      31
TAG biosynthesis
2.3.1.15            GPAT                 Glycerol-3-phosphate acyltransferase         12
2.3.1.51            LPAAT                Lysophosphatidic acid acyltransferase        2
3.1.3.4             PP                   Phosphatidate phosphatase                    10
2.3.1.20            DGAT                 Diacylglycerol O-acyltransferase             4
2.3.1.158           PDAT                 Phospholipid:diacylglycerol acyltransferase  5

Fig. 1. Identification of genes in the pathway of ALA biosynthesis based on the transcriptome of Sacha Inchi. Enzymes involved in ALA biosynthesis are abbreviated as follows: ACC, acetyl-CoA carboxylase; KASI, ketoacyl-ACP synthase I; KASII, ketoacyl-ACP synthase II; KASIII, ketoacyl-ACP synthase III; KAR, ketoacyl-ACP reductase; EAR, enoyl-ACP reductase; FATA, acyl-ACP thioesterase A; FATB, acyl-ACP thioesterase B; SAD, stearoyl-CoA desaturase; FAD2, fatty acid desaturase 2; FAD3, fatty acid desaturase 3; FAD6, fatty acid desaturase 6; FAD7, fatty acid desaturase 7; FAD8, fatty acid desaturase 8.
Enzymes encoded by identified genes are shown in red boxes. This figure was constructed according to the KEGG reference pathway of ALA metabolism (http://aralip.plantbiology.msu.edu/pathways/pathways).

Fig. 2. Heat map representation and hierarchical clustering of putative genes involved in ALA metabolism based on the de novo-assembled Sacha Inchi transcriptome. FF, female flowers; Fr, fruits; MF, male flowers; ML, mature leaves; Ro, roots; SA, shoot apexes; Se, seeds; St, stems. The genes on the y-axis shown in red are involved in the ALA biosynthesis pathway, those in blue are involved in TAG biosynthesis and those in black are involved in the fatty acid catabolism pathway.

It is worth noting that phosphatidylcholine diacylglycerol cholinephosphotransferase (PDCT) was highly expressed in female flowers and seeds in this study (Fig. 3). PDCT was not found in the previous report on the transcriptome of Sacha Inchi seeds [35], but it has been shown to play an important role in PUFA accumulation by catalyzing the interconversion between phosphatidylcholine (PC) and diacylglycerol (DAG) [50–52].

Fig. 3. The PDCT transcript levels in eight major organs. Values are means ± standard deviations (n = 3). FF, female flowers; Fr, fruits; MF, male flowers; ML, mature leaves; Ro, roots; SA, shoot apexes; Se, seeds; St, stems.

Analysis of differentially expressed genes (DEGs) involved in TAG synthesis
Differentially expressed genes (DEGs) were analyzed in the eight major organs, with a focus on the genes involved in TAG synthesis, the glycolysis/gluconeogenesis pathway (ko00010) and the pentose phosphate pathway (ko00030). Genes with a false discovery rate (FDR) < 0.001 and |log2(fold change)| ≥ 2, as determined by edgeR, were regarded as DEGs [53]. As a result, a total of 24,196 DEGs were detected in our dataset.
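
The two DEG thresholds can be applied to a table of test results as in the sketch below. The study ran the statistical test itself in edgeR; this Python fragment only illustrates the final filtering step, and the function name `is_deg`, the record layout and all values are hypothetical.

```python
def is_deg(log2_fold_change, fdr, fdr_cutoff=0.001, lfc_cutoff=2.0):
    """Return True when a gene passes both DEG thresholds:
    FDR < fdr_cutoff and |log2(fold change)| >= lfc_cutoff."""
    return fdr < fdr_cutoff and abs(log2_fold_change) >= lfc_cutoff


# Hypothetical differential-expression results (not data from the study).
results = [
    {"unigene": "unigene_A", "log2_fc": 3.1, "fdr": 1e-6},   # passes both
    {"unigene": "unigene_B", "log2_fc": -2.0, "fdr": 5e-4},  # passes (down-regulated)
    {"unigene": "unigene_C", "log2_fc": 4.5, "fdr": 0.01},   # fails the FDR cut-off
    {"unigene": "unigene_D", "log2_fc": 1.2, "fdr": 1e-8},   # fails the fold-change cut-off
]

degs = [r["unigene"] for r in results if is_deg(r["log2_fc"], r["fdr"])]
print(degs)  # ['unigene_A', 'unigene_B']
```

Taking the absolute value of the log2 fold change keeps both up-regulated and down-regulated genes, which is why a gene with a fold change of -2.0 is retained.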
Among these DEGs, thirty-three genes were involved in TAG biosynthesis (Table 2 and Additional file 13). Glycerol-3-phosphate acyltransferase (GPAT) catalyzes the acylation of glycerol-3-phosphate to produce 1-acyl-sn-glycerol-3-phosphate (lysophosphatidic acid, LPA) (Fig. 4) [54, 55]. Phosphatidic acid (PA) is synthesized de novo from the acylation of LPA in a reaction catalyzed by acyl-CoA:lysophosphatidic acid acyltransferase (LPAAT). Diacylglycerol (DAG) is produced from PA by phosphatidate phosphatase (PP). The DAG is then available for two reactions: diacylglycerol O-acyltransferase (DGAT) transfers an acyl group from acyl-CoA to the sn-3 position of DAG to produce TAG; alternatively, phospholipid:diacylglycerol acyltransferase (PDAT) transfers the sn-2 acyl group from a phospholipid such as PC to DAG, forming TAG [56, 57]. In addition to the DAG derived from the Kennedy pathway, a second, PC-derived DAG pool has been identified [58]. GPAT and DGAT were up-regulated in female flowers, while LPAAT, PP, and PDAT were up-regulated in shoot apexes, seeds and fruits, respectively (Fig. 2). The glycolysis/gluconeogenesis pathway produces glycerol-3-phosphate, which is a precursor for TAG biosynthesis, and the pentose phosphate pathway is a process of glucose turnover that produces precursor materials and NADPH for FA biosynthesis [54–58]. A total of 137 and 64 unigenes were identified as involved in the glycolysis/gluconeogenesis pathway and the pentose phosphate pathway, respectively, most of which were up-regulated in female flowers and seeds (Additional file 14: Figure S8).

Fig. 4. Identification of genes in the pathway of TAG biosynthesis based on the transcriptome of Sacha Inchi.
Enzymes are abbreviated as follows: GPAT, glycerol-3-phosphate acyltransferase; LPAAT, lysophosphatidic acid acyltransferase; PP, phosphatidate phosphatase; DGAT, diacylglycerol O-acyltransferase; PDAT, phospholipid:diacylglycerol acyltransferase; PDCT, phosphatidylcholine diacylglycerol cholinephosphotransferase.

In addition to the key enzymes, we found that some transcription factors related to TAG biosynthesis were highly up-regulated in seeds. ABSCISIC ACID-INSENSITIVE 3 (ABI3) and LEAFY COTYLEDON1 (LEC1) showed more than a 1000-fold expression difference in seeds compared with any other organ, and the B3-domain transcription factor FUSCA3 (FUS3) was also expressed most strongly in seeds, at a level at least 15-fold higher than in any other organ (Table 3). These transcription factors probably regulate seed oil content by directly or indirectly regulating FA biosynthesis or TAG accumulation [59].

Table 3. Expression levels of transcription factors PvoABI3, PvoLEC1, and PvoFUS3 detected in the transcriptome of Sacha Inchi

         Male flowers  Fruits  Mature leaves  Female flowers  Roots   Shoot apexes  Seeds    Stems
PvoABI3  0.0083        0.0000  0.0110         0.0393          0.0420  0.0073        59.3290  0.0110
PvoLEC1  0.0343        0.0360  0.0300         0.0283          0.0177  0.0000        66.6107  0.0150
PvoFUS3  0.0000        0.0000  0.0000         0.0000          0.1130  0.0297        6.8967   0.4457

Identification and characterization of genes involved in FA catabolism
The major pathway of FA catabolism, the beta-oxidation pathway, involves acyl-CoA oxidase (ACX), ketoacyl-CoA thiolase (KAT) and multifunctional protein (MFP) (Fig. 5). Based on the de novo assembly and annotation of the Sacha Inchi transcriptome, a total of 89 transcripts were identified as candidate unigenes for 4 enzymes of the fatty acid catabolism pathway, including 27 unigenes encoding long-chain acyl-CoA synthetase (LACS), which catalyzes the initial reactions of FA catabolism, 16 unigenes encoding ACX, 15 unigenes encoding MFP, and 31 unigenes encoding KAT (Table 2 and Additional file 13).
Most of the FA catabolism-related genes exhibited weak expression in seeds. For example, ACX and MFP, which catalyze the first and second steps of the β-oxidation of fatty acids, a process that generates acetyl-CoA and energy [60, 61], were down-regulated in seeds (Fig. 2).

Fig. 5. Identification of genes in the fatty acid catabolism pathway based on the transcriptome of Sacha Inchi. Enzymes involved in FA catabolism are abbreviated as follows: LACS, long-chain acyl-CoA synthetase; ACX, acyl-CoA oxidase; MFP, enoyl-CoA hydratase; KAT, 3-ketoacyl-CoA thiolase.

Identification of simple sequence repeats (SSRs)
To develop molecular markers for genetic analysis and marker-assisted selection breeding of Sacha Inchi, simple sequence repeats (SSRs) were identified in our transcriptome. A total of 42,987 SSRs were detected among the 124,750 assembled unigenes using MISA software (Additional file 15). Of the SSR-containing unigenes, 9902 contained more than one SSR, and 4788 SSRs were present in compound formation (Table 4). Of the 42,987 detected SSRs, mononucleotide repeats were the most abundant type (30,088; 69.99%), followed by dinucleotide (7345; 17.09%), trinucleotide (5061; 11.77%), tetranucleotide (356; 0.83%), pentanucleotide (46; 0.11%) and hexanucleotide (91; 0.21%) repeats. Among the 42,987 repeats, A/T (69.40%) was the most abundant motif, followed by AT/AT (9.02%) and AG/CT (5.64%).

Table 4.
Summary of SSR searching results in the Sacha Inchi transcriptome

Item                                             Number
Total number of sequences examined               124,750
Total size of examined sequences (bp)            106,216,748
Total number of identified SSRs                  42,987
Number of SSR containing sequences               27,420
Number of sequences containing more than one SSR 9902
Number of SSRs present in compound formation     4788

Validation of gene expression profiles using qRT-PCR
To experimentally confirm the expression patterns of genes identified by transcriptome sequencing, 9 key unigenes involved in ALA metabolism (Fig. 6a), 7 unigenes detected in our data and 8 unigenes showing organ-specific expression (Fig. 6b) in Sacha Inchi were chosen for qRT-PCR analysis. Detailed information on the selected unigenes is presented in Additional file 16. The results showed that the expression patterns of most genes tested by qRT-PCR and RNA-Seq were consistent (Fig. 6a, b). Overall, a highly significant correlation (Pearson correlation coefficient r = 0.835) existed between the qRT-PCR and RNA-Seq results regarding the ratios of gene expression levels (Additional file 17: Figure S9), suggesting that our transcriptome data reflect the expression patterns of most genes in Sacha Inchi.

Fig. 6. Validation of gene expression profiles by quantitative real-time PCR. a Oil-related genes. The full names of the genes can be found in the legends of Figs. 1, 4 and 5. Values are means ± standard deviations (n = 3). b Organ-specific genes. HPG, hypothetical protein GLOINDRAFT_348631; GDS, glutamine-dependent NAD(+) synthetase; CBL, CBL-interacting serine/threonine-protein kinase; PP, peroxidase 64 precursor; HPA, hypothetical protein AALP_AAs53585U000100; HSP90, HSP90 co-chaperone CPR7; LP, phospholipid transfer protein; RABC2a, ras-related protein RABC2a-like.
FF, female flowers; Fr, fruits; MF, male flowers; ML, mature leaves; Ro, roots; SA, shoot apexes; Se, seeds; St, stems. The levels of the detected amplicons were normalized using the amplified product of ACTIN of Sacha Inchi (n = 3).

Discussion
In this study, we provide a large number of organ-specific unigenes by analyzing the gene expression profile of individual organs. As shown in Additional file 8: Figure S6b, different organs expressed distinct sets of genes. The vast majority of the organ-specific transcripts were found in female flowers, whereas a relatively small number of unigenes were predicted to be expressed specifically in other organs. These findings indicate that metabolic processes such as lipid and amino acid metabolism are active in female flowers and involve many specifically expressed genes. This result also provides a large number of candidate female flower-specific promoters.
Considering that ALA has potential applications in the food and pharmaceutical industries [7], the main objective of our research was to comprehensively study ALA metabolism and investigate the molecular basis of this pathway in Sacha Inchi. The omega-3 fatty acid desaturases, especially FAD3, are key enzymes for the formation of ALA by the desaturation of LA [62, 63]. Overexpression of FAD3 in roots and seeds led to an increase of ALA in a fad3-2 mutant of A. thaliana [64]. Meanwhile, overexpression of FAD7 in tomato increased ALA levels, which led to a series of physiological alterations, such as changes in electrolyte leakage and malondialdehyde content [65]. Taken together, we suggest that the high transcript levels of FAD3 and FAD7 are consistent with the high ALA content in seeds of Sacha Inchi. Our results showed that the predominant genes related to these core pathways are conserved in sequence.
Furthermore, previous research showed that when the flax PDCTs were co-expressed with FAD2 and FAD3, PUFA levels increased in Saccharomyces cerevisiae [50]. The flax PDCTs are capable of substantially increasing C18-PUFA levels in metabolically engineered yeast and transgenic A. thaliana seeds [50]. These data indicate that these genes play an important role in determining PUFA content in TAG synthesis.

DEGs that directly and indirectly regulate TAG biosynthesis were identified in our data. In addition to a number of genes related to the glycolysis/gluconeogenesis pathway and the pentose phosphate pathway that were up-regulated in female flowers and seeds, the key gene encoding phosphatidate phosphatase (PP), which catalyzes the formation of DAG from PA, was highly expressed in seeds. Transcription factors also regulate TAG biosynthesis directly or indirectly in plants. Overexpression of maize LEC1 can increase seed oil by as much as 48% [66]. ABI3 and FUS3 are key regulators of phase transition and of seed development, maturation and dormancy, and thereby indirectly regulate oil biosynthesis [26]. The genes involved in ALA biosynthesis are regulated by miRNAs, as shown in a previous study: KASII and KASIII are regulated by miR159; KAR is regulated by miR156b, miR156c, miR156g and miR6029; FATA is regulated by miR801, miR298, miR1430 and miR828; FATB is regulated by miR555 and miR113; and SAD is regulated by miR2163 [67]. Based on the gene expression pattern analysis, genes encoding enzymes related to ALA biosynthesis are strongly expressed in seeds, whereas FA catabolism-related genes are weakly expressed in seeds compared with other organs (Figs. 2 and 7), which might be related to the high ALA content.

Fig. 7. The ratio of expression of genes encoding enzymes for ALA biosynthesis relative to fatty acid catabolism.
FF, female flowers; Fr, fruits; MF, male flowers; ML, mature leaves; Ro, roots; SA, shoot apexes; Se, seeds; St, stems.

Conclusions
In the present study, the complete transcriptome of Sacha Inchi was de novo assembled and annotated for the first time, generating a total of 124,750 non-redundant transcripts, of which 70,142 could be functionally annotated. Among the eight organs analyzed in this study, the largest number of specifically expressed genes was found in female flowers, while the smallest was found in stems. We identified 211 unigenes and 89 unigenes potentially involved in the ALA biosynthesis and FA catabolism pathways, respectively. Compared with other organs, most of the unigenes related to ALA biosynthesis were up-regulated in seeds of Sacha Inchi, whereas most of those related to FA catabolism were down-regulated. In particular, the up-regulation of FAD3 and FAD7 may play an important role in the high-level accumulation of ALA in seeds of Sacha Inchi. Some transcription factors that are potentially related to TAG accumulation are highly up-regulated in seeds. The transcriptome data reported here provide a foundation for functional genomics research and genetic improvement of Sacha Inchi.

In conclusion, we present a high-quality transcriptome sequence for Sacha Inchi. The sequences of genes related to organ-specific expression and ALA metabolism were obtained from large-scale transcriptomic data, which will enable further metabolomic and gene functional studies. This ALA-rich species was studied to diversify the available sources of ALA, with the eventual aim of increasing ALA production to meet growing human demand worldwide.
Methods

Plant materials, cDNA library construction and sequencing
Eight organs, including roots (Ro), stems (St), shoot apexes (SA), mature leaves (ML), male flowers (MF), female flowers (FF), fruits (Fr), and seeds (Se), were harvested in August from 1-year-old plants of Sacha Inchi grown at the Xishuangbanna Tropical Botanical Garden (21°54′ N, 101°46′ E, 580 m above sea level), Chinese Academy of Sciences, Mengla, Yunnan, China, under natural climate conditions. August has a mean temperature of 25.1 °C and a mean monthly precipitation greater than 200 mm in the Xishuangbanna Tropical Botanical Garden [68]. The samples were collected at 60 days after pollination (DAP), when fruits and seeds had reached full size. Three independent biological replicates of each sample were collected from three individual plants. All samples were frozen immediately in liquid nitrogen and then stored at −80 °C for RNA extraction.

A total of 3 μg of RNA per sample was used as input material for RNA sample preparation. Using poly-T oligo-attached magnetic beads, mRNA was purified from total RNA following the manufacturer's recommendations (NEB, USA). Fragmentation was then carried out using divalent cations under elevated temperature. Using these short fragments (about 200 bp) as templates, first-strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H). Second-strand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. These cDNA fragments were then subjected to end repair and adapter ligation according to the manufacturer's protocol (Beckman Coulter, Beverly, USA). The products were purified and enriched by PCR to prepare the final sequencing library. Finally, library quality was assessed using an Agilent Bioanalyzer 2100 system. A TruSeq SR Cluster Kit v3-cBot-HS (Illumina) was used to cluster the index-coded samples.
After cluster generation, the library preparations were sequenced on an Illumina HiSeq 4000 platform, and 150-bp paired-end reads were generated at the Novogene Bioinformatics Institute, Beijing, P. R. China.

Sequence data processing and de novo assembly
First, raw reads in fastq format were processed through in-house Perl scripts. Then, the clean data were de novo assembled using Trinity software [37] with the "--trimmomatic" option, which removes reads containing adapters, reads containing poly-N and reads of low quality [69].

Functional annotation and pathway assignments
Unigenes were aligned by BLASTx to the NCBI non-redundant (NR), Swiss-Prot, TAIR10, UniRef90 and KOG databases with a threshold E-value of 10^−5. For each unigene, the best BLASTx hit from the NR database was submitted to the BLAST2GO platform [42, 70], and GO terms were obtained based on annotations between gene names and GO terms. To determine metabolic pathways, the unigenes were also submitted to the online KEGG (Kyoto Encyclopedia of Genes and Genomes) Automatic Annotation Server (KAAS) using the single-directional best hit method [71].

Differentially expressed gene (DEG) analysis
The expression level of each unigene was measured as fragments per kilobase of transcript per million mapped fragments (FPKM). The clean reads were aligned to the assembly using Bowtie [72], and the resulting alignments were used to estimate expression abundances in FPKM [37]. All read counts were normalized to FPKM. Differential expression analysis of the samples, each with three biological replicates, was performed using edgeR [53]. Genes with an FDR < 0.001 and |log2(fold change)| ≥ 2, as calculated by edgeR, were regarded as differentially expressed.
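The FPKM normalization and the DEG thresholds described above can be illustrated with a minimal Python sketch. This is not the pipeline used in this study, which relied on Bowtie, Trinity and edgeR. The function names, unigene identifiers and numbers below are hypothetical and only show the arithmetic, assuming the usual definition of FPKM (fragments per kilobase of transcript per million mapped fragments).

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_deg(log2_fold_change, fdr, fdr_cutoff=0.001, log2fc_cutoff=2.0):
    """Apply the thresholds stated in the text: FDR < 0.001 and |log2FC| >= 2."""
    return fdr < fdr_cutoff and abs(log2_fold_change) >= log2fc_cutoff


# Hypothetical values for one unigene in one library (not from the study).
example_fpkm = fpkm(fragment_count=500,
                    transcript_length_bp=2000,
                    total_mapped_fragments=20_000_000)

# Hypothetical edgeR-style results: unigene id -> (log2 fold change, FDR).
results = {
    "unigene_A": (3.1, 1e-6),   # passes both thresholds
    "unigene_B": (1.5, 1e-8),   # fold change too small
    "unigene_C": (-2.0, 5e-4),  # passes (down-regulated, at the boundary)
    "unigene_D": (-4.2, 0.01),  # FDR too high
}
degs = sorted(name for name, (lfc, fdr) in results.items() if is_deg(lfc, fdr))
print(example_fpkm, degs)
```

Note that the fold-change cutoff is inclusive (≥ 2) while the FDR cutoff is strict (< 0.001), matching the wording of the criteria above.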
Simple sequence repeat (SSR) analysis
The Perl script MISA (MIcroSAtellite identification tool, http://pgrc.ipk-gatersleben.de/misa/misa.html) was used to identify SSRs in Sacha Inchi transcriptome sequences [73].

Validation by quantitative real-time PCR (qRT-PCR)
Gene-specific primer pairs were designed using Primer Premier 5.0 software, and the amplified PCR products varied from 100 to 250 bp (Additional file 16). qRT-PCR assays were performed as previously described [74]. The reference gene was chosen based on the methods of Niu et al. [75].

Additional files
Additional file 1: Table S1. Overview of the sequencing data of the Sacha Inchi transcriptomes. (XLS 25 kb)
Additional file 2: Figure S1. Overview of Sacha Inchi transcriptome assembly and the characteristics of the homology search of unigenes against the NR database by BLAST (cut-off E-value of 1.0E-5). (a) Size distribution of the assembled unigenes. (b) Similarity distribution of the best BLAST hits for each unigene. (c) E-value distributions of the best BLAST hits for each unigene against the NR database. (d) Species distribution of the best BLAST hit for each unigene. (TIF 935 kb)
Additional file 3: Figure S2. The Pearson correlation coefficient (r) was used to estimate the difference between the replicates of each tissue. The r value between each pair of samples is given in the plot. The color represents the r value, with high correlation between two samples shown in red and low correlation in blue. (TIF 4328 kb)
Additional file 4: Figure S3. Venn diagram showing the BLAST searches of the Sacha Inchi transcriptome against the five public databases. De novo unigene sequences were used to search against the following public databases: NR, UniRef90, TAIR10, KOG and Swiss-Prot. The numbers of unigenes that have significant hits against the five databases are shown in each intersection in the Venn diagram. (TIF 574 kb)
Additional file 5: Figure S4. Histogram presentation of clusters of orthologous group classification of assembled unigenes. A total of 124,750 unigenes were classified into 25 functional categories. (TIF 816 kb)
Additional file 6: Figure S5. Distribution of gene ontology (GO) categories of unigenes for Sacha Inchi. GO functional annotations are summarized into three main categories: biological process, cellular component, and molecular function. The number of unigenes in each category is shown on the y-axis. (TIF 1684 kb)
Additional file 7: Pathways identified in the Sacha Inchi transcriptome. Three hundred and fifty-three KEGG pathways identified in Sacha Inchi and the corresponding unigene numbers of each pathway are shown. (XLS 60 kb)
Additional file 8: Figure S6. The statistics of unigenes in eight organs. (a) The percentage of unigenes expressed in each organ. (b) Number of organ-specific unigenes. FF, female flowers; Fr, fruits; MF, male flowers; ML, mature leaves; Ro, roots; SA, shoot apexes; Se, seeds; St, stems. (TIF 2536 kb)
Additional file 9: Detailed FPKM and annotation information of organ-specific unigenes. (XLS 646 kb)
Additional file 10: The list of GO terms that were significantly enriched in male flowers, female flowers, roots, seeds and stems. Gene ontology (GO) terms were assigned to organ-specific unigenes based on the top hits against the NR database. (XLS 33 kb)
Additional file 11: Figure S7. Histogram presentation of the KEGG pathway annotation of female flower (FF)-specific genes. (TIF 1210 kb)
Additional file 12: The list of KEGG pathways that were enriched in each organ. (XLS 70 kb)
Additional file 13: List of FA and TAG metabolism-related genes detected in the Sacha Inchi transcriptome. (A) Unigenes related to ALA biosynthesis. (B) Unigenes related to the fatty acid catabolism pathway. (C) Unigenes related to TAG biosynthesis. (XLS 69 kb)
Additional file 14: Figure S8. Heat map representation and hierarchical clustering of putative genes involved in the glycolysis/gluconeogenesis pathway and the pentose phosphate pathway. (A) Glycolysis/gluconeogenesis pathway (ko00010). (B) Pentose phosphate pathway (ko00030). (TIF 2539 kb)
Additional file 15: Overview of the SSRs detected in the assembled unigenes of Sacha Inchi. (XLS 4872 kb)
Additional file 16: Primers for qRT-PCR in this study. (XLS 31 kb)
Additional file 17: Figure S9. Pearson correlation analysis of the gene expression ratios obtained from RNA-Seq and qPCR data. The qPCR log10 values (expression ratios; y-axis) were plotted against the RNA-Seq log10 values (x-axis). The Pearson correlation coefficient (r) is given in the plot, and the circle indicates an extremely significant difference at p < 0.01. (TIF 337 kb)

Acknowledgments