Abstract

Introduction

   The potato (Solanum tuberosum) cultivar ‘Xin Daping’ is tetraploid with
   white skin and white flesh, while the cultivar ‘Hei Meiren’ is also
   tetraploid with purple skin and purple flesh. Comparative transcriptome
   analysis of white and purple cultivars was carried out using
   high-throughput RNA sequencing in order to further understand the
   mechanism of anthocyanin biosynthesis in potato.

Methods and Results

   By aligning transcript reads to the recently published diploid potato
   genome and de novo assembly, 209 million paired-end Illumina RNA-seq
   reads from these tetraploid cultivars were assembled on to 60,930
   transcripts, of which 27,754 (45.55%) are novel transcripts and 9393
   alternative transcripts. Using a comparison of the RNA-sequence
   datasets, multiple versions of the genes encoding anthocyanin
   biosynthetic steps and regulatory transcription factors were
   identified. Other novel genes potentially involved in anthocyanin
   biosynthesis in potato tubers were also discovered. Real-time qPCR
   validation of candidate genes revealed good correlation with the
   transcriptome data. SNPs (Single Nucleotide Polymorphism) and indels
   were predicted and validated for the transcription factors MYB AN1 and
   bHLH1 and the biosynthetic gene anthocyanidin 3-O-glucosyltransferase
   (UFGT).

Conclusions

   These results contribute to our understanding of the molecular
   mechanism of white and purple potato development, by identifying
   differential responses of biosynthetic gene family members together
   with the variation in structural genes and transcription factors in
   this highly heterozygous crop. This provides an excellent platform and
   resource for future genetic and functional genomic research.

Introduction

   Potato (Solanum tuberosum L.) is one of the world’s most important food
   crops, and is recognized as a source of health-promoting antioxidants
   for the human diet [[45]1]. Potato tubers contain significant amounts
   of polyphenols. Anthocyanins are the predominant group of visible
   polyphenols that comprise the red, purple, and blue pigmentation of
   potato [[46]2–[47]4]. As well as playing important physiological roles
   in plant response to stress and acting as attractants for pollination
   and seed dispersal, anthocyanins have high free radical scavenging
   activity, anti-inflammatory and anti-microbial properties, and dietary
   consumption has been associated with a reduced incidence of
   cardiovascular diseases, cancers, neurodegenerative diseases,
   osteoporosis, or diabetes [[48]5–[49]7]. Pigmented potato genotypes
   have been shown to contain significantly higher levels of anthocyanins
   and antioxidant activity; especially cultivars with purple and red skin
   and/or flesh, compared to those with white and yellow tubers
   [[50]4–[51]7]. Therefore, a high anthocyanin potato has potential as a
   new cultivar with enhanced health benefits.

   In many plant species, the anthocyanin biosynthetic pathway and its
   regulation has been elucidated. The genes that encode flavonoid
   biosynthetic steps and regulatory genes have also been cloned [[52]8,
   [53]9]. The biosynthesis of anthocyanin pigments is regulated at the
   transcriptional level by a MYB-bHLH-WD40 (MBW) transcription factor
   (TF) complex [[54]10, [55]11], composed of TFs from the R2R3-MYB, basic
   Helix-Loop-Helix (bHLH) and WD40 classes [[56]12, [57]13]. MYB TFs can
   be classified into three subfamilies based on the number of highly
   conserved imperfect repeats in the DNA-binding domain including R3 MYB
   (MYB1R) with one repeat, R2R3 MYB with two repeats, and R1R2R3 MYB
   (MYB3R) with three repeats [[58]14]. Those associated with
   up-regulation of the anthocyanin pathway are R2R3 MYBs. The first plant
   protein studied with a bHLH domain was Lc which is involved in the
   control of flavonoid/anthocyanin biosynthesis in maize [[59]15]. The
   first two hundred amino acids of this bHLH protein are required to
   interact with the MYB transcription factor partner, while the
   C-terminal of the protein interacts with WD40 proteins [[60]16]. The
   C-terminal ACT-like domain facilitates binding of the MYB to the
   promoter [[61]17]. WD proteins have four to eight imperfect tandem
   repeats and interact with other proteins through the WD repeat region
   [[62]18, [63]19].

   Genes involved in the regulation of the anthocyanin pathway have been
   identified in many different species, for example AN1 and AN2 in
   Petunia [[64]20, [65]21]; C1 and P1 in Zea mays [[66]22, [67]23];
   Rosea1 and Delila in Antirrhinum [[68]24, [69]25]; PAP1 and TTG8 in
   Arabidopsis [[70]26]; MYB1 and MdMYB10 in apples [[71]27, [72]28]; and
   VvMYBA in grape [[73]29]. In apple, two candidate bHLH transcription
   cofactors (bHLH3 and bHLH33) are also needed for activating promoters
   of anthocyanin biosynthetic genes and MYB10 autoregulation [[74]30].

   In potato, previous studies have found that the biosynthesis of
   anthocyanin pigments in the periderm of the tuber is controlled in part
   by three loci, D, P, and R. Genetic evidence based on the
   co-localization in the potato genetic map of R, P and D indicated these
   loci encode dihydroflavonol 4-reductase (DFR), flavonoid 3’,
   5’-hydroxylase (F3’5’H) and an R2R3 MYB transcription factor designated
   AN1[[75]31].

   The recent development of next-generation sequencing technologies can
   generate sequences on an unprecedented scale with a markedly reduced
   cost compared with traditional technologies [[76]32]. Next-generation
   RNA-sequencing (RNA-Seq) has rapidly replaced microarrays as an
   approach to profile transcriptomes in a high-throughput way [[77]33].
   It allows detection of transcripts with low abundance, identifies novel
   transcript units, and reveals their differential expression between
   different samples [[78]34, [79]35].

   To date, there have been no reports of using RNA-Seq technology to
   analyze the control of pigmentation in different potato cultivars. The
   availability of the complete genome sequence of the doubled monoploid
   Solanum tubersosum clone DM1-3 516R44 [[80]36] has made RNA-seq more
   informative. In this study, we used next-generation sequencing
   technology and bioinformatics tools to analyze the transcriptome of
   tetraploid white and purple potato cultivars, by both de novo assembly
   of transcripts and alignment to the published diploid potato genome to
   analyse differentially expressed genes related to anthocyanin
   biosynthesis between the cultivars and discovered new versions of the
   pathway genes and potential transcription factors. In addition,
   putative SNPs were identified and validated in AN1 bHLH1 and UFGT.
   According to the generated RNA-seq datasets, there are significant
   differences in expression among genes involved in anthocyanin
   biosynthesis. These RNA-seq datasets enhance the available
   transcriptome data of potato, and improves our understanding of
   variations in tuber color to identify important genes for anthocyanin
   biosynthesis in potato.

Materials and Methods

Plant material, RNA extraction, library construction and RNA sequencing

   A purple potato (Solanum tuberosum L.) cultivar ‘Hei Meiren’ (purple
   skin and purple flesh), a white potato cultivar ‘Xin Daping’ (white
   skin and white flesh) ([81]Fig 1), two red potato cultivars ‘Gannongshu
   NO.5’ (red skin and white flesh) and ‘Qinshu NO.9’ (red skin, white
   flesh and red vascular ring) ([82]S1 Fig) were grown in a greenhouse at
   Gansu Agricultural University, China. Three potato cultivars ‘Agria’
   (white skin and white flesh), ‘Red Jackets’ (red skin and white flesh)
   and ‘Heather’ (purple skin and white flesh) ([83]S1 Fig) were bought in
   a local supermarket, New Zealand. Five fresh tubers (diameter 4-5cm)
   were harvested, and cleaned with sterilized water. Skin tissue was
   carefully separated from cortex tissue using a scalpel to minimize
   flesh contamination. The skin and flesh of these potatoes were then
   immediately frozen in liquid nitrogen and stored in a -80°C freezer for
   later use.

Fig 1. Tubers of Solanum tuberosum cultivars.

   [84]Fig 1
   [85]Open in a new tab

   (A) white skin and white flesh tetraploid cultivar ‘Xin Daping’, (B)
   purple skin and purple flesh tetraploid cultivar ‘Hei Meiren’.

   Total RNA was extracted from purple skin and purple flesh of ‘Hei
   Meiren’ and white skin and white flesh of ‘Xin Daping’ using the
   PureLink Plant RNA Reagent Kit (Invitrogen, USA) according to the
   manufacturer's instructions. The RNA was quality checked and quantified
   using a Nanodrop ND-1000 (Nanodrop Technologies, USA). Enrichment of
   mRNA, fragment interruption, addition of adapters, size selection, PCR
   amplification and RNA-Seq were performed by Beijing Genome Institute
   (BGI; Shenzhen, China). The mRNA was separated from total RNA using
   oligo (dT) magnetic beads (Invitrogen, USA), then fragmented into short
   fragments using fragmentation buffer. cDNA was synthesized using the
   mRNA fragments as templates. Short fragments were purified and resolved
   with EB buffer for end repair and single nucleotide A (adenine)
   addition. Short fragments (200±20bp) were connected to adapters,
   suitable fragments are selected for the PCR amplification as templates
   according to agarose gel electrophoresis results, then an Agilent 2100
   Bioanalyzer and ABI StepOnePlus Real-Time PCR Systems were used in
   quantification and qualification of the sample library. In total, there
   were 4 RNA-Seq libraries constructed—purple skin (PS), purple flesh
   (PF), white skin (WS) and white flesh (WF)—using the same protocol.
   Finally, an aliquot from each of the libraries was barcoded, pooled and
   sequenced using Illumina HiSeq 2000 in paired-end (PE) mode. The
   remaining RNA was used for real-time quantitative PCR (qPCR)
   verification. The RNA-seq dataset is available at the NCBI Sequence
   Read Archive (SRA) with the accession number: SRP036626.

RNA Data filtering and de novo transcriptome assembly

   Raw RNA sequencing reads passing BGI’s quality control (QC) were 100bp
   long in length. According to the FastQC
   ([86]http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) report,
   the reads were further trimmed to 75bp long, with 15bp removed from 5’
   and 10bp from 3’. Trimmed reads with phred scores less than 20 and
   uncertain bases N were discarded from the analysis. De novo
   transcriptome assembly was performed for each cultivar (‘Hei Meiren’
   with PS and PF, ‘Xin Daping’ with WS and WF) using Trinity version
   r20131110 ([87]http://trinityrnaseq.sourceforge.net/). Transcriptome
   assembly completeness was assessed using cegma_v2.4.010312
   ([88]http://korflab.ucdavis.edu/datasets/cegma/). These two assemblies
   were merged with PGSC_DM_v3.4 gene models downloaded from the Potato
   Genome Sequencing Consortium (PGSC,
   [89]http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml,
   [[90]36]) to create the consensus potato transcriptome reference using
   EvidentialGene VERSION 2013.07.27
   ([91]http://arthropods.eugenes.org/EvidentialGene/).

Differentially gene expression test

   Cleaned reads from each of the 4 RNA-Seq libraries were mapped to the
   consensus potato transcriptome reference sequences using bowtie2-2.2.1
   ([92]http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) with a
   maximum fragment size of 500 and maximum of 1 mismatch in seed region
   of 20 bases. Raw read counts were extracted from the sorted alignment
   files. For differential expression test (DET), we compared three
   conditions: WF vs. PF; WS vs. WS; White (S & F) vs. Purple (S & F). For
   each comparison, genes with low read counts per million (cpm < 1) that
   didn’t represent sufficient statistic significance were excluded from
   the DET analysis. The libraries were digitally normalized after
   filtering and the DET was completed using EdgeR 3.24
   ([93]http://www.bioconductor.org/packages/release/bioc/html/edgeR.html)
   [[94]37] and gplots 2.14
   ([95]http://cran.r-project.org/web/packages/gplots/index.html) in R
   3.0.1 ([96]http://www.r-project.org/) environment. Genes with FDR <
   0.05 and the absolute value of logFC (log2 fold change) not less than
   1.0 were considered as highly differentially expressed genes (DEGs).
   The gene expression unit was calculated using the RPKM [[97]38] method
   (Reads per kilobase transcriptome per million mapped reads).

Annotation and predicted CDS

   Identified differentially expressed transcripts were annotated using
   MapMen Mercator Web Service
   ([98]http://mapman.gabipd.org/web/guest/app/mercator) by searching the
   Arabidopsis TAIR proteins (TAIR10), SwissProt/UniProt Plant Proteins,
   Clusters of orthologous eucaryotic genes database (KOG) and Use
   conserved domain database (CDD), with a value of blast cut-off of 80.
   The final description for each novel transcript that was assembled from
   our RNA-Seq data but not predicted in the PGSC gene set was based on a
   series of searches as well as the ratio lengths to the HSP
   (highest-scoring segment pairs) from Blast. The complete Open Reading
   Frame (ORF) was predicted using the BioPerl
   ([99]http://www.bioperl.org/).

Gene ontology functional enrichment and pathway analysis of DEGs

   The DEGs were mapped to GO terms in the GO database
   ([100]http://www.geneontology.org/) to calculate gene numbers for every
   term. The hypergeometric test was then used to find significantly
   enriched GO terms in the input list of DEGs, based on 'GO::TermFinder'
   ([101]http://smd.stanford.edu/help/GO-TermFinder/GO_TermFinder_help.sht
   ml/). GO terms conforming to p-value through Bonferroni Correction ≤
   0.05 were defined as significantly enriched GO terms. KEGG [[102]39]
   ([103]http://www.genome.jp/kegg/) is used to perform pathway enrichment
   analysis of DEGs. MapMan software (version 3.6.0) was used to display
   expression profiles at the pathway level [[104]40]. The expression
   profiles of the metabolic pathways can be viewed by a discrete signal
   visualized using different types (blue and red) and intensity of color.

Real-time PCR (qPCR) quantitative analysis

   For qPCR analysis, total RNA from skin and flesh of white cultivars
   ‘Xin Daping’ and ‘Agria’, purple cultivar ‘Hei Meiren’ and ‘Heather’,
   red cultivars ‘Gannongshu NO.5’, ‘Qinshu NO.9’ and ‘Red Jackets’ were
   extracted as described above. Each RNA sample was subjected to DNase
   digestion (Takara, Dalian, China) to remove any remaining DNA and first
   strand cDNA synthesis was carried out using oligo dT according to the
   manufacturer’s instructions (SuperScript III, Invitrogen, USA). cDNA
   was diluted twenty-fold and used as the template for qPCR. All the
   primer sequences for 4-coumarate-CoA ligase (4CL), flavanone 3
   beta-hydroxylase (F3H), flavonoid 3'-monooxygenase (F3’H), chalcone
   synthase (CHS), dihydroflavonol 4-reductase (DFR), flavonoid
   3',5'-hydroxylase (F3’5’H), leucoanthocyanidin
   dioxygenase/anthocyanidin synthase (LDOX/ANS), flavonol synthase (FLS),
   anthocyanidin 3-O-glucosyltransferase (UFGT), transcription factors AN1
   ^816 and bHLH1 are listed in [105]S1 Table. qPCR DNA amplification and
   analysis was carried out using the LightCycler System (Roche
   LightCycler 480, Roche), with LightCycler software version 4. The
   LightCycler FastStart SYBR Green Master Mix (Roche) was used and 10 μl
   of total reaction volume applied to all the reactions following the
   manufacturer’s method. qPCR conditions were 5 min at 95°C, followed by
   40 cycles of 5 s at 95°C, 5 s at 60°C, and 10 s at 72°C, followed by
   65°C to 95°C melting curve detection. The qPCR efficiency of each gene
   was obtained by analyzing the standard curve of a cDNA serial dilution
   of that gene. Relative abundance was calculated with the ΔC[T] method
   using actin (accession number: [106]X55752) for template normalization.
   Potato actin is widely used as a reference gene in qPCR analysis
   [[107]19, [108]41, [109]42]. Error bars shown in qPCR data are
   technical replicates, representing the means ±SE of four replicate qPCR
   reactions. Statistical significance was determined by one-way ANOVA.

SNPs and indel discovery

   We used the nucleotide sequences of the published UFGT1 ([110]KP096267)
   from white skin of ‘Xin Daping’ cultivar, AN1 ([111]AY841127) from red
   skin of ‘Y83-1’ cultivar and bHLH1 ([112]JX848660) from purple tuber of
   ‘Magic Molly’ as the reference sequences for SNPs and indels discovery.
   Cleaned reads from the white and purple genotypes were mapped back to
   the references using Bowtie 2 with mapping criteria ‘very sensitive’.