Abstract Soluble sugars and organic acids are important components of fruit flavor and have a strong impact on the overall organoleptic quality of watermelon (Citrullus lanatus) fruit. Several studies have analyzed the expression levels of the genes related to soluble sugar accumulation and the dynamic changes in their content during watermelon fruit development and ripening. Nevertheless, to date, there have been no reports on the organic acid content in watermelon or the genes regulating their synthesis. In this study, the soluble sugars and organic acids in watermelon were measured and a comparative transcriptome analysis was performed to identify the key genes involved in the accumulation of these substances during fruit development and ripening. The watermelon cultivar ‘203Z’ and its near-isogenic line (NIL) ‘SW’ (in the ‘203Z’ background) were used as experimental materials. The results suggested that soluble sugar consist of fructose, glucose and sucrose while malic-, citric-, and oxalic acids are the primary organic acids in watermelon fruit. Several differentially expressed genes (DEGs) related to soluble sugar- and organic acid accumulation and metabolism were identified. These include the DEGs encoding raffinose synthase, sucrose synthase (SuSy), sucrose-phosphate synthase (SPSs), insoluble acid invertases (IAI), NAD-dependent malate dehydrogenase (NAD-cyt MDH), aluminum-activated malate transporter (ALMT), and citrate synthase (CS). This is the first report addressing comparative transcriptome analysis via NILs materials in watermelon fruit. These findings provide an important basis for understanding the molecular mechanism that leads to soluble sugar and organic acid accumulation and metabolism during watermelon fruit development and ripening. 1. Introduction Watermelon [Citrullus lanatus (Thunb.) Matsum and Nakai] belongs to the Cucurbitaceae and is an important and popular staple summertime fresh fruit worldwide. It constitutes 7% of the global area dedicated to fruit and vegetable production [[36]1]. Watermelon fruit provides large amounts of water and nutrients including sugars, carotenoids, lycopene, minerals, and amino acids[[37]2, [38]3]. Soluble sugars, organic acids, and aroma are important components of fruit flavor and have a strong impact on overall organoleptic fruit quality[[39]4]. Soluble sugars in fruits include fructose, glucose, and sucrose. Malic-, citric-, and oxalic acids are the primary organic acids. The type and content of soluble sugars and organic acids determine the organoleptic properties of fruits. Improvement of fruit quality is an important goal in all watermelon breeding programs. Soluble sugars are important components of watermelon fruit quality and have become the focus of a great deal of research. Previous studies have addressed the dynamic changes in soluble sugar levels and the activities of sugar-metabolizing enzymes that occur during watermelon fruit development and ripening [[40]5–[41]7]. It has been established that the primary soluble sugars in watermelon fruit are fructose, glucose and sucrose[[42]8]. Several quantitative trait loci (QTL) related to sugar content were detected by mapping[[43]9–[44]11], only a Tonoplast Sugar Transporter gene ClTST2 involved in sucrose accumulation in watermelon fruit was identified by resequencing and biochemical analyses[[45]12], however, sugar accumulation in fruits is a complex quantitative trait, the other key candidate sugar content-regulating genes have not yet been identified. The organic acid composition and content in fruit are also regarded as commercially important traits because they influence organoleptic quality. They also play critical roles in fruit metabolism. Nevertheless, there is relatively little reported research on organic acids in watermelon fruit. It was recognized that malic- and citric acids are the main organic acids in ripe watermelon fruit[[46]13]. Both accumulation and metabolism of soluble sugars and organic acids are developmental stage-dependent. Fruit development and ripening are complex biological processes. They are regulated by several factors including environmental conditions, phytohormones, and gene regulation[[47]14]. Numerous studies have suggested that fruit development and ripening are regulated by the coordinated expression of a set of genes. A total of 832 expressed sequence tags (ESTs) from a subtracted cDNA library of watermelon fruit were utilized to study gene expression as the fruit develops. Of these, 211 were differentially expressed genes with annotation. A significant number were associated with ethylene biosynthesis, transcriptional regulation, pathogen and stress response, carotenoid biosynthesis, and the vascular system[[48]14]. A total of 3,023 differentially expressed genes (DEGs) were identified during watermelon fruit development and ripening, they encoded metabolites related to pigmentation and sweetness[[49]15]. The transcriptome profiles of fruit tissues from cultivated- and wild developing and ripening watermelon were compared. Several DEGs involved in biochemical pathways were identified. These included sugar metabolism and accumulation, flesh carotenoid biosynthesis and metabolism, flesh texture change, ethylene biosynthesis, and signal transduction[[50]16]. Many DEGs related to sugar- and cell wall metabolism, carotenoid biosynthesis, and phytohormone pathways were identified using comparative transcriptome analysis of two different watermelon types during fruit development and ripening[[51]8]. Thus far, there have been no reports on the genes and pathways involved in organic acid metabolism and accumulation during watermelon fruit development and ripening, we have not known that how the soluble sugars and organic acids interconvert each other. In this study, soluble sugars and organic acids content were measured during fruit development and ripening in watermelon, we performed a comparative transcriptome analysis of the watermelon cultivar ‘203Z’ and its near-isogenic line (NIL) ‘SW’. We identified DEGs that may be related to soluble sugar- and organic acid metabolism and accumulation during fruit development and ripening. We confirmed their expression profiles by quantitative real-time polymerase chain reaction (qRT-PCR). These results provide insights into identifying the key candidate genes or pathways involved in soluble sugar- and organic acid metabolism and accumulation during watermelon fruit development and ripening. 2. Materials and methods 2.1 Plant materials In this study, the experimental materials included ‘SW’, a near-isogenic line containing foreign introgressed segments from the wild watermelon subspecies ‘PI271769’ of the ‘203Z’ cultivar, and the recurrent parent ‘203Z’. ‘203Z’ and ‘PI271769’ used as germplasm resources were conserved in Zhengzhou Fruit Research Institute. The pure inbred line watermelon cultivar ‘203Z’ has spherical fruits with green rind, dark green stripes, red flesh, a high total soluble sugar content (up to 91.3 mg.g^-1 FW), and a low total organic acid content (up to 5.90 mg.g^-1 FW) at maturity. ‘PI271769’ belongs to wild watermelon subspecie, has spherical fruits with white and hard flesh, a low total soluble sugar content (up to 6.83 mg.g^-1 FW) and a high total organic acin content (up to 17.26 mg.g^-1 FW). ‘SW’ was derived from a cross between the inbred ‘203Z’ and the wild subspecies ‘PI271769’. The latter has a low total soluble sugar content and a high total organic acid content. The F[1] plants were backcrossed seven times with ‘203Z’ as recurrent parents to generate BC[7]F[1] then self-pollinated four times to yield BC[7]F[5]. Progeny with stable high total organic acid content were selected and named ‘SW’. There are no significant phenotypic differences between ‘203Z’ and ‘SW’ except that at maturity, the latter has a higher total organic acid content than the former. In this report, ‘C’ means ‘203Z’ and ‘D’ refers to ‘SW’. All of the aforementioned experimental materials were grown in a greenhouse in Xinxiang city, China. The female flowers were manually self-pollinated then tagged to record the number of days after pollination (DAP). According to previous studies, red-flesh cultivated watermelon ripens in four critical stages: (1) immature white flesh (C1 and D1); (2) white-pink flesh (C2 and D2); (3) red flesh (C3 and D3); and (4) full-ripe (C4 and D4) (10, 18, 26, and 34 DAP, respectively)[[52]15–[53]17]. Standard conventional field practices (including fertilization, irrigation, and pest control) were followed during the growing season. Twenty four flesh samples were collected from the center of three uniform watermelon fruits at the four development stages then immediately flash-frozen in liquid nitrogen and stored at -80°C until use. 2.2 Measurement of the soluble sugar and organic acid content in the fruit pulp After the watermelon fruit flesh samples were homogenized, their soluble solid content (%) and pH were determined with a laboratory refractometer (HC-112ATC, Shanghai LICHENKEYI, China) and a pH meter (PHB-4, Shanghai LICHENKEYI, China), respectively. The pooled fruit pulp samples were then flash-frozen in liquid nitrogen and stored at -80 °C until they were used to determine the soluble sugar (glucose, fructose, and sucrose) and organic acid (malic, citric, and oxalic) content according to previously reported methods[[54]18]. 2.3 RNA extraction and quality assessment Total RNA was isolated from the frozen watermelon flesh using a Plant Total RNA Purification Kit (GeneMark, Beijing, China) according to the manufacturer’s instructions. The quantity, quality, and integrity of the RNA samples were determined with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and a Nanodrop NanoPhotometer (Implen GmbH, Munich, Germany). 2.4 cDNA library preparation and sequencing The cDNA library construction and sequencing were performed at BGITech (Shenzhen, China). The mRNA with polyA tail was enriched with oligo magnetic beads[[55]19], then purified. The cleaved RNA fragments were reverse transcribed to double-strand cDNA using N6 random primer. The cDNA fragments were purified, blunted with phosphate at the 5' end and stickiness 'A' at 3' end, and adaptor-ligated. The ligation product was amplified by two specific primers then denatured by heat. The single-strand DNA was cyclized by splint oligo and DNA ligase. Finally, the cDNA libraries were sequenced on the BGISEQ-500 sequencing platform. 2.5 Quality control for raw sequencing and mapping of the reads to the reference genome The sequences of the adaptor, the unknown bases, and the low-quality reads were removed from the raw reads. The Q20-scores for the clean reads were also calculated. After filtering, the clean reads were stored in FASTQ format[[56]20]. High-quality clean reads were used for downstream analyses. The clean reads were mapped to the reference gene and the genome by Bowtie2[[57]21], and HISAT [[58]22], respectively. 2.6 Quantification of gene expression levels and screening differentially expressed genes (DEGs) Gene expression levels were measured by a software package named RSEM [[59]23]. The FPKM method was used to calculate the expression level using the formula FPKM = [10^9/NL] C, where C is the number of fragments aligned to the target unigene, N represents the total number of fragments aligned to all genes, and L refers to the length of the target unigene. Differentially expressed genes (DEGs) between two samples were identified using the NOISeq method [[60]24] based on the following default criteria: log2 (fold change) ≥1, and probability of divergence ≥0.8. 2.7 GO term and KEGG pathway enrichment Gene ontology (GO) is a standard international gene function classification system based on molecular functions, cellular components, and biological processes. In this study, the DEGs were annotated by GO using the ([61]http://www.geneontology.org/) database and the 'GO Term Finder' ([62]http://www.yeastgenome.org/help/analyze/go-term-finder). After obtaining the GO annotations for the DEGs, the GO functional classification was executed with WEGO[[63]25]. Genes usually participate in certain biological functions by interacting with each other. To assign the DEGs to specific biological pathways, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation was used based on the KEGG database[[64]26]. False discovery rates were controlled using methods published earlier[[65]27], setting P ≤ 0.05 as a threshold for significantly enriched categories. 2.8 Validation of DEG expression by quantitative real-time polymerase chain reaction (qRT-PCR) Nine differentially expressed genes (DEGs) were selected to validate RNA-Seq output by qRT-PCR. The first-strand cDNA was synthesized using a PrimeScript^TM RT reagent kit with gDNA Eraser (Perfect Real Time) (TaKaRa, Kusatsu, Shiga, Japan) based on the manufacturer^’s protocol. In the first step, genomic DNA contamination was removed from the cDNA by subjecting it to a 10-μL reaction system consisting of 1 μL gDNA Eraser, 2 μL 5XgDNA Eraser Buffer, 5 μL total RNA (200 ng μl^-1), and 2 μL RNase-free ddH[2]O for 2 min at 42 °C. Then 1 μL PrimeScript RT Enzyme Mix, 1 μL RT Primer Mix, 4 μL 5X PrimeScript Buffer2, and 4 μL RNase-free ddH[2]O were added to the reaction system in the last step. The mixture (final volume 20 μL) was incubated at 37 °C for 15 min followed by 5 s at 85 °C. The qRT-PCR was performed on the LightCycler480 RT-PCR system (Roche Diagnostics International AG, Rotkreuz, Switzerland) using LightCycler 480 SYBR Green I Master (Roche Diagnostics International AG, Rotkreuz, Switzerland) according to the manufacturer^’s instructions. Gene-specific primers were designed using the database ([66]https://www.ncbi.nlm.nih.gov/tools/primer-blast/). Cla016178 [[67]28] was used as the internal control gene ([68]S1 Table). Full-length sequences already determined for watermelon[[69]1], were used to design primers. Each reaction system (total volume: 20 μL) contained 2 μL cDNA, 1 μL of each forward- and reverse primer (10 ng mL^-1), 10 μL 2X SYBR Green real-time PCR Mix, and 6μL ddH[2]O. The PCR program was carried out under the following conditions: initial preheat at 95 °C for 5 min followed by 40 cycles of 30 s at 95 °C, 65 °C, and 72 °C, respectively. Melting temperature curve analysis was performed at the end of each reaction run to confirm the specificity of the qRT-PCR products. Each experiment was performed in triplicate. The raw qRT-PCR data were analyzed with LCS480 v. 1.5.0.39 (Roche Diagnostics International AG, Rotkreuz, Switzerland) and the relative expression levels of the genes were calculated by the 2^-ΔΔCT method [[70]29](Livak and Schmittgen, 2001). 3. Results 3.1 Variations in the soluble sugar and organic acid content during the ripening of ‘203Z’ and ‘SW’ watermelon fruits The soluble sugar content of watermelon fruit largely determine its quality and the organic acid content influences its flavor. Therefore, the soluble sugar and organic acid content of ‘203Z’ and ‘SW’ fruit were measured during their development and ripening. In watermelon fruit, the dominant soluble sugars are fructose, sucrose, and glucose, and the dominant organic acids are malic acid, citric acid, and oxalic acid. Patterns of change in the soluble sugar and organic acid content during fruit development and ripening in watermelon are shown in [71]Fig 1. In both ‘203Z’ and ‘SW’ fruit, the soluble solid content (SSC) and sucrose content peaked during development at 26 DAP then decreased slightly at 34 DAP ([72]Fig 1A and 1B). Nevertheless, the maximum glucose content was measured at 18 DAP ([73]Fig 1C) while that of fructose was detected at 34 DAP ([74]Fig 1D). Moreover, the SSC, fructose, sucrose, and glucose content in ‘203Z’ were much higher than those in ‘SW’ from 18–34 DAP. The pH of ‘203Z’ and ‘SW’ fruit pulp gradually decreased between 10 DAP and 26 DAP then increased slightly at 34 DAP ([75]Fig 1E). In both ‘203Z’ and ‘SW’, the malic- and citric acid content peaked during fruit ripening at 26 DAP then decreased at 34 DAP ([76]Fig 1F and 1G). The oxalic acid content peaked at the early stage of fruit ripening in both ‘203Z’ and ‘SW’, rapidly decreased to their minimum levels at 18 DAP, increased at 26 DAP, then decreased at the end ([77]Fig 1H). The organic acid contents were significantly lower in ‘203Z’ than in ‘SW’ between 10 DAP and 34 DAP. These results indicate that the soluble sugar and organic acid levels in ‘203Z’ and ‘SW’ fruits significantly differ from each other during development and ripening. Fig 1. The trends in soluble sugars and organic acids contents in 203Z and SW watermelon fruit during development and ripening. [78]Fig 1 [79]Open in a new tab Total soluble sugar (a), fructose (b,) glucose (c), sucrose (d), PH (e), malic acid (f), citric acid (g), and oxalic acid (h) were extracted at 10, 18, 26 and 34 DAP. Three individual replicates were used to reduce the experimental error. The bars represent mean ± SE (n = 3). 3.2 An overview of the RNA-Seq data To understand the potential molecular synthesis mechanisms involved in the soluble sugar and organic acid of fruit development and ripening, twenty-four cDNA libraries were prepared from fruit flesh samples at the four critical ripening stages. Three biological replicates were used for each stage and watermelon species. After RNA sequencing, the quality of the data was assessed. An overview of the sequencing and assembly is shown in [80]Table 1. Table 1. An overview of the RNA-Seq data. Sample Raw reads number Clean reads number Total mapped reads(%) Unique match(%) Multi-position match(%)[81]^a Q20(%)[82]^b C1-1 24135583 24119672 97.24 87.67 9.57 97.9 C1-2 24135958 24119571 97.37 87.81 9.56 97.7 C1-3 24136068 24117125 97.88 89.52 8.36 98.0 C2-1 23140509 23122652 98.53 91.27 7.26 98.1 C2-2 23039150 22981291 98.46 91.23 7.23 98.3 C2-3 23235688 23203234 98.52 91.28 7.24 98.2 C3-1 23151242 23116937 98.39 91.06 7.33 98.1 C3-2 23112076 23070091 98.54 91.30 7.24 98.2 C3-3 23223555 23165679 98.51 91.55 6.96 98.4 C4-1 23183013 23145040 98.47 91.06 7.41 98.0 C4-2 23015964 22966485 98.55 91.55 7.00 98.5 C4-3 23716281 23658477 97.49 88.49 9.00 97.5 D1-1 24134726 24113550 97.13 87.12 10.01 97.7 D1-2 24135188 24119312 97.09 87.04 10.05 97.7 D1-3 23386471 23370924 98.48 90.87 7.61 98.1 D2-1 23075487 23032938 98.39 91.04 7.35 98.2 D2-2 22949522 22901119 98.41 91.19 7.22 98.4 D2-3 22792986 22761826 98.41 90.99 7.42 98.3 D3-1 23498245 23484291 98.46 91.29 7.17 98.2 D3-2 23063140 23021956 98.41 91.04 7.37 98.0 D3-3 23135529 23089960 98.39 90.99 7.4 98.1 D4-1 23918095 23865044 97.73 89.09 8.64 97.2 D4-2 23840098 23798619 97.4 88.36 9.04 97.0 D4-3 23814255 23781964 97.7 89.14 8.56 97.6 [83]Open in a new tab Notes ^a Total Mapped Reads (%) = Unique Match (%) + Multi-position Match (%), are the percentages of clean reads align to reference genome. ^b Q20 (%) are the percentages of reads with Phred qualities scores over than 20. After filtering the low-quality reads, an average of 23,421,989 high-quality clean reads (99.85% of the 23,457,034 raw reads) was obtained. Alignment statistics of reads align to reference genome, in all the samples, >90.1% of the total clean reads from the RNA-Seq data were mapped uniquely to the reference genome whereas only a few (<3%) were not mapped to it. More than 97% of the total clean reads had Phred-like quality scores at the Q20 level. Via comparative transcriptome analysis, this high-quality RNA-Seq data provided a solid foundation for identifying key genes participating in soluble sugar and organic acid syntheses during watermelon development and ripening. 3.3 Analyses of differentially expressed genes (DEGs) To find differentially expressed genes between two samples and perform other functional analyses on them, DEG screening was conducted by setting the probability of divergence at ≥0.8 and log2^Ratio>1 as thresholds. Results are shown in [84]Fig 2. Comparisons of ‘203Z’ and ‘SW’ at the same fruit development time points revealed 320 (C1 vs. D1) and 244 (C4 vs. D4) DEGs. In contrast, C2 vs. D2 disclosed only 81 DEGs and C3 vs. D3 indicated only 99. In the latter case, the numbers of up- and downregulated genes were similar. The analyses of ‘203Z’ and ‘SW’ at different development stages showed that there were far more DEGs in C1 vs. C2 than in C2 vs. C3 or C3 vs. C4. Genes with similar expression patterns often participate in the same biological processes or share functionality. Therefore, hierarchical clustering analysis of the DEG expression patterns was performed ([85]Fig 3). The results demonstrated that there were far more up- and downregulated genes in C1 vs. C2 and D1 vs. D2 than the other groups. The probable reason for this was that the expression level of most genes were low at 10 DAP but, as the fruit developed, most of the genes began to express at high levels in order to drive the increasingly complex biological processes. Fig 2. Statistic of differentially expressed genes. [86]Fig 2 [87]Open in a new tab X axis represents pairwise and Y axis means number of screened DEGs. Blue bar denotes down-regulated genes and orange bar for the up-regulated genes. Fig 3. Hierarchical clustering analysis of DEGs between different developmental stages of 203Z and SW. [88]Fig 3 [89]Open in a new tab (C1-VS-C2, D1-VS-D2, C2-VS-C3, D2-VS-D3, C3-VS-C4, D3-VS-D4, “a” was the control and “b” was experimental group in “a-VS-b”). Each line refers to data from one gene. The color bar represents the log[2] (Fold change) and ranges from blue (low expression) to red (high expression). After excluding the unigenes (probability of divergence <0.8 or log2^Ratio<1), a total of 2,260 common DEGs were detected at the various fruit development stages in both ‘203Z’ and ‘SW’. These common DEGs were listed in [90]S2 Table and accounted for 9.6% of the 23,440 predicted genes. There were 636 DEGs identified in ‘203Z’ and ‘SW’ at the same development stages, these DEGs were listed in [91]S3 Table, and they only accounted for 2.7% of the predicted genes. Therefore, transcriptome variation was wider among developmental stages than it was between varieties. 3.4 Gene ontology (GO) and pathway functional enrichment analysis of the DEGs To obtain functional information on the DEGs, a literature search and annotated biological and biochemical functional analyses were run using the WEGO database. Based on the GO classification, the differentially expressed transcripts were classified into three high-level categories: molecular functions, cellular components, and biological processes. To characterize gene function distribution at the macro level, we performed a GO enrichment analysis to determine the DEG functions at various development stages and in different source materials. The results of the GO functional enrichment are shown in [92]Fig 4 and [93]Fig 5. A total of 848 and 251 unigenes have GO annotations in various development stages ([94]Fig 4 and [95]S4 Table) and different source materials ([96]Fig 5 and [97]S5 Table), respectively. They were grouped into three functional GO categories. For molecular function, “catalytic activity” (523 DEGs for the various developmental stages and 152 DEGs for the different source materials) was the most highly represented GO team, followed by “binding” (387 DEGs for the stages and 111 DEGs for the materials). “Transcription factor activity, protein binding” (only 2 DEGs) was the least represented GO team in the various developmental stages while “signal transducer activity” (3 DEGs) had the least representation for the different source materials. For the cellular component, the categories with considerable enrichment and the highest number of DEGs were “cell” (185 DEGs and 69 DEGs) and “cell part” (185 DEGs and 69 DEGs). Only one team, “cell junction” (1 DEG) was significantly enriched in the various developmental stages. For biological process, the categories with the greatest enrichment and number of DEGs were “metabolic” (446 DEGs and 124 DEGs), followed by “cellular process” (287 DEGs and 81 DEGs) while the “growth” team had only one DEG in this category. Fig 4. GO functional enrichment analysis for the DEGs in different development stages for the same experimental material. [98]Fig 4 [99]Open in a new tab X axis means number of DEGs (the number is presented by its square root value). Y axis represents GO terms. All GO terms are grouped in to three ontologies: blue is for biological process, brown is for cellular component and orange is for molecular function. Fig 5. GO functional enrichment analysis for the DEGs in different experimental materials for the same development stage. [100]Fig 5 [101]Open in a new tab X axis means number of DEGs (the number is presented by its square root value). Y axis represents GO terms. All GO terms are grouped in to three ontologies: blue is for biological process, brown is for cellular component and orange is for molecular function. Significantly enriched metabolic pathways and signal transduction pathways in the DEGs were identified by KEGG pathway enrichment analysis. To further identify the biological pathways that are related to the DEGs of fruit development, the detected DEGs were mapped to the reference pathways in the KEGG database and compared to the whole transcriptome background. A KEGG pathway classification and functional enrichment analysis for the DEGs were run in various developmental stages and different source materials. As shown in [102]Fig 6 and [103]Fig 7, 1,620/2,260 DEGs and 458/636 DEGs were mapped to the 20 references pathways in various developmental stages ([104]Fig 6 and