Abstract Anthocyanins have significant functions in stress tolerance in pepper (Capsicum annuum L.) and also benefit human health. Nevertheless, the key structural genes and regulatory genes involved in anthocyanin accumulation in pepper fruits are still not well understood and fine mapped. For the present study, 383 F[2] plants from a cross between the green-fruited C. annuum line Z5 and the purple-fruited line Z6 was developed. Two separate bulked DNA pools were constructed with DNAs extracted from either 37 plants with high anthocyanin content or from 18 plants with no anthocyanin. A combination of specific-locus amplified fragment sequencing (SLAF-seq) and bulked segregant analysis (BSA) was used to identify candidates for regions associated with anthocyanin accumulation. We identified a total of 127,004 high-quality single nucleotide polymorphism (SNP) markers, and detected 1674 high-quality SNP markers associated with anthocyanin accumulation. Three candidate anthocyanin-associated regions including the intervals from 12.48 to 20.00 Mb, from 54.67 to 56.59 Mb, and from 192.17 to 196.82 Mb were identified within a 14.10-Mb interval on chromosome 10 containing 126 candidate genes. Based on their annotations, we identified 12 candidate genes that are predicted to be associated with anthocyanin expression. The present results provide an efficient strategy for genetic mapping of and valuable insights into the genetic mechanisms of anthocyanin accumulation in pepper fruit, and allow us to clone and functionally analyze the genes that influence anthocyanin accumulation in this species. Introduction Anthocyanins are soluble flavonoid plant pigments that confer colors ranging from bright red-orange to blue-violet or black [[40]1] to fruits, flowers, seeds, leaves, and stems. Anthocyanins are involved in pigmentation, attraction of seed distributors and pollinators [[41]2, [42]3], and protection against photo-oxidative damage in plants [[43]4], and might have health-promoting effects in humans [[44]5, [45]6]. Anthocyanins are accumulated in the palisade and mesophyll cells of purple or black leaves and in the outer mesocarp of black and violet fruit [[46]1]. The violet or black pigmentation of pepper is also due to high levels of anthocyanin [[47]1, [48]7]. Therefore, new pepper cultivars with high anthocyanin content could both improve stress tolerance in pepper plants and enhance health benefits for humans. Extensive studies of the biosynthetic pathway leading to anthocyanins have found 12 structural genes and three transcription factors (TFs) involved in the pathway [[49]8–[50]10]. Anthocyanin biosynthesis begins in the phenylpropanoid pathway followed by the flavonoid pathway ([51]S1 Fig) [[52]8]. Phenylpropanoid pathway enzymes phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate CoA ligase (4CL) transform phenylpropanoid into 4-coumaroyl-CoA. In the flavonoid pathway, the product of condensation of one 4-coumaroyl-CoA with three malonyl-CoA is converted into dihydroflavonol by the enzymes chalcone synthase (CHS), chalcone isomerase (CHI), and flavanone 3-hydroxylase (F3H). Dihydroflavonol is then transformed into dihydromyricetin, a colorless molecule, by the enzyme flavonoid 3´,5´-hydroxylase (F3´5´H). This compound is subsequently reduced by dihydroflavonol 4-reductase (DFR) and ultimately converted into blue delphinidin by anthocyanidin synthase (ANS). All other anthocyanins are then derived from delphinidin. The predominant anthocyanin in pepper (C. annuum L.) is delphinidin-3-trans-coumaroylrutinoside-5-glucoside [[53]10–[54]12]. The delphinidins undergo further modification via glycosylation catalyzed by UDP glucose-flavonoid 3-O-glycosyl-transferase (UFGT/3GT), 5/7-O-glycosyl- transferases (5GT/7GT) [[55]13, [56]14], rhamnosylation catalyzed by rhamnosyl transferases (RTs) [[57]15], acylation catalyzed by acyl transferases (AT) [[58]16], and methylation catalyzed by methyltransferases (MTs) [[59]17]. The anthocyanins are subsequently transported into the vacuole, which might be mediated by glutathione S-transferase (GST) [[60]18, [61]19], anthocyanic vacuolar inclusions (AVIs) in the cytoplasm [[62]20], and anthocyanin permease (ANP) [[63]8, [64]9]. A series of structural genes that encode these anthocyanin biosynthetic enzymes have already been cloned. Each of these genes is expressed in a specific way depending on the tissue or developmental stage. The sequences of these genes are also highly conserved across plant species [[65]8, [66]10]. A regulatory complex comprised of the basic helix-loop-helix (bHLH) protein R2R3-MYB plus WD40 repeats (MYB-bHLH-WD40) interacts with structural gene promoters to modulate the expression of anthocyanin structural genes [[67]7, [68]12, [69]21, [70]22]. Among these, MYB appears to be an important determinant in anthocyanin accumulation [[71]10]. MYB and WD40 TFs control the expression of some late (F3’5’H, DFR, or 3GT) and early anthocyanin biosynthesis pathway structural genes (CHS, CHI, or F3H) in pepper fruits [[72]9]. However, this result differs from that of Borovsky et al. (2004), which found that only the expression of the late structural genes (DFR, ANS) is dependent on the incompletely dominant gene anthocyanin (A) [[73]15]. The A locus, which encodes a MYB TF (CaMYB), controls the expression of anthocyanin in various tissues of pepper [[74]15, [75]23–[76]25]. Li et al. (2013) also found that CaMYB1 and CaMYB2 which were homologous to CaMYB might regulate the expression of anthocyanin biosynthesis in green fruits of hot pepper [[77]26]. Ben-Chaim et al. (2003) mapped the pepper A locus, which is homologous to tomato anthocyanin 1 (ANT1) and petunia anthocyanin 2 (AN2) [[78]15], onto pepper chromosome 10 [[79]24]. In addition, the A locus is linked to a major-effect quantitative trait locus (QTL) (fs10.1) controlling pepper fruit shape [[80]24, [81]27]. Borovsky and Paran (2011) mapped fs10.1 to 0.3 cM from the its nearest molecular marker, CT11, and 6.3 Mb from ANT1, which is the ortholog of the pepper A gene in tomato, according to the tomato genome assembly release 2.30 ([82]http://solgenomics.net/) [[83]27]. The pepper genome sequence was published in 2014 [[84]28], so the physical distance between fs10.1 and CT11 in pepper was not known at that time. Informative, saturated linkage maps are important for fine mapping quantitative traits. Due to the lack of saturated linkage maps in pepper, the anthocyanin accumulation trait had also not yet been finely mapped in this species. The genome sequence of pepper [[85]28] will enable fine mapping of anthocyanin accumulation and other traits in pepper. Efficient identification of genes or QTL linked to plant traits of interest will become possible by combining specific-locus amplified fragment sequencing (SLAF-seq) with bulked-segregant analysis (BSA) [[86]29–[87]34]. SLAF-seq is a type of next-generation, reduced-representation genome sequencing strategy for efficient identification of single-nucleotide polymorphisms (SNPs), while BSA allows subsequent rapid screening of bulked pools of DNA to identify molecular markers linked closely to the target allele of a gene or QTL controlling a trait of interest [[88]35, [89]36]. Here, BSA combined with SLAF-seq was first time to be used to map the anthocyanin accumulation trait in pepper by pooling DNAs from F[2] plants with distinct anthocyanin accumulation phenotypes from a cross between the inbred C. annuum lines Z5 and Z6 as parents. The purple pepper line Z6 has high anthocyanin content with purple stems and fruit, while the green pepper line Z5 has no anthocyanin with green stems and fruit. Although a lot of anthocyanin-related genes have been isolated so far, but only A locus in pepper has been mapped. Nevertheless, the anthocyanin accumulation-related genes including A locus have not been fine mapped due to the lack of saturated linkage maps in pepper and candidate genes for bHLH, WD40 and variation of anthocyanin have not been identified. Thus, the main objective of this study was to identify genome regions and candidate genes related to anthocyanin accumulation-related genes in pepper, and to develop SNP markers to provide diagnostic SNP markers for molecular marker-assisted selection of anthocyanin accumulation genes and lay a foundation for cloning. Materials and methods Plant materials We used the green-fruited C. annuum line Z5 and purple-fruited C. annuum line Z6 as parental lines in the present study. Z6 is an anthocyanin-pigmented variety (0.96 mg anthocyanin/100 g fresh weight) that has purple stems and fruit, while Z5 contains no anthocyanins and has green stems and fruit ([90]S1 Table). We obtained an F[1] hybrid from a Z5 × Z6 cross, and developed the F[2] mapping population of 383 plants by self-pollinating the F[1] hybrid. The parental plants, and F[1] and F[2] populations were grown in a greenhouse at Beijing Vegetable Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China. Extraction and measurement of anthocyanin content Pepper fruits were harvested at 20 d post-anthesis following the method of Aza-González et al. (2013) [[91]8]. The pepper fruit pericarps were dissected and immediately immersed into liquid nitrogen, and the frozen pericarps were stored at -80 °C. Extraction and measurement techniques described in Lee et al. (2005) [[92]37] were used to isolate and quantify anthocyanins in pericarp tissue. In brief, three aliquots of 20.00 g of frozen pulverized pericarp tissues were extracted in 100 mL of 0.1% (v/v) hydrochloric acid in methanol for 2 h at RT with lights on, and then were filtered. The filtered extract was then transferred to a 250-mL volumetric flask and the residue was re-extracted with hydrochloric acid in methanol at least three times until a spectrophotometrically colorless extract was obtained. Two aliquots of 50 mL of combined extracts were condensed to dryness in a rotary evaporator (Buchi, Flawil, Switzerland), and then were diluted with either 0.025 M potassium chloride (pH 1.0) or 0.4 M sodium acetate (pH 4.5) in separate 25-mL volumetric flasks. A test aliquot was then diluted to determine the dilution factor with pH 1.0 buffer until reaching an absorbance of 0.2 to 0.8 at 520 nm (A520). The absorbances at both 520 and 700 nm of test aliquots diluted with pH 1.0 or pH 4.5 buffer were then measured. The anthocyanin concentration as cyanidin-3-glucoside equivalents was calculated as: Anthocyanin (cyanidin-3-glucoside equivalents, mg/L) = [MATH: A×MW×DF×103ε×1 :MATH] , where A = (A520nm–A700nm)[pH 1.0] –(A520nm–A700nm)[pH 4.5]; MW (the molecular weight of cyanidin-3-glucoside) = 449.2 g/mol; DF = the dilution factor determined above; l = cm path length; ε = molar extinction coefficient of cyanidin-3-glucoside; and 10^3 convert from g to mg. Extraction and pooling of DNA Young leaves were harvested from both parents and F[2] plants and genomic DNA was isolated using the N-cetyl N,N,N-trimethylammonium bromide (CTAB) method [[93]38]. The two DNA pools for bulked segregant analysis (BSA) were constructed by combining equal amounts of DNA from high- or low-anthocyanin F[2] plants chosen based on measured anthocyanin concentrations ([94]S1 Table). The high-anthocyanin pool (H-pool) was comprised of DNA from 37 plants with high anthocyanin content that ranged from 0.90 to 7.31 mg per 100 g fresh weight, and the low-anthocyanin pool (N-pool) was comprised of DNA from 18 plants with no measurable anthocyanin content. DNA isolated from parental lines Z5 and Z6 and the two pools of F[2] DNA were used for construction and sequencing of SLAF libraries. Construction and high-throughput sequencing of SLAF libraries An initial restriction enzyme digestion experiment was performed in the pepper line CM334 to select the appropriate restriction enzyme for SLAF library construction as per Xu et al. (2015) [[95]39]. RsaI (New England Biolabs, Ipswich, MA, USA) was chosen for digesting the parental and bulked genomic DNAs because it resulted in even distribution of SLAFs on each pepper CM334 chromosome ([96]S2 Fig). Single-nucleotide A overhangs were added to the resulting DNA fragments using Klenow fragment (USB, Cleveland, OH, USA), and then fragments were ligated with dual-index sequencing adaptors [[97]40] and amplified by PCR (polymerase chain reaction). Amplified fragments were then purified, pooled, and screened for the optimal fragment size range for construction of SLAF libraries, as described by Sun et al. (2013) [[98]36] with minor modifications. DNA fragments of 364 to 414 bp were identified and subjected to paired-end sequencing to identify SLAFs using the Illumina HiSeq 2500 platform (Illumina, Inc., San Diego, CA, USA) at Beijing Biomarker Technologies Corporation ([99]http://www.biomarker.com.cn). Evaluation for the SLAF libraries To validate sequencing procedures and accuracy, we used rice (Oryza sativa L. japonica) as a control with the version 7.0 rice reference genome ([100]http://rice.plantbiology.msu.edu/) with the same library construction and sequencing methods as used for the Z5 × Z6 mapping population to evaluate the SLAF libraries constructed for pepper. First, BWA software was used to compare the rice control sequencing reads with those of the pepper reference genome [[101]41]. As [102]Table 1 shows, we achieved a typical 84.94% efficiency of mapping paired-end reads in the rice control. Because complete digestion by a restriction enzyme improves SLAF experiments, enzyme digestion efficiency was evaluated as an index of optimal SLAF results, which can be affected by genome complexity, DNA purity, and restriction enzyme digestion completeness. Second, the efficiency of digestion of rice genomic DNA in the present study was 93.81% ([103]Table 1), which was adequate for construction of these SLAF libraries. Finally, the read lengths of the paired-end reads of the rice control that mapped to the rice reference genome ranged from 364 to 414 bp ([104]S3 Fig). These results indicated the construction of SLAF libraries were accurate. Table 1. Efficiency of mapping paired-end reads and enzyme digestion in the rice control. Mapped reads Efficiency (%) Enzyme digestion Efficiency (%) Mapped paired-end reads 84.94 Complete digestion 93.81 Mapped single-end reads 4.58 Partial digestion 6.19 Unmapped reads 10.48 — — [105]Open in a new tab Analysis of SLAF-seq data Raw sequence reads from each sample generated on the Illumina HiSeq 2500 platform were subjected to a series of quality control procedures including assessment of sequence quality scores and guanine-cytosine (GC) content to ensure reliable and unbiased reads [[106]36]. More than 80% of sequences in the four libraries had quality scores greater than or equal to Q30 (where a quality score of Q30 indicates 0.1% error probability, or a 99.9% probability of sequence accuracy). We used BWA software to align clean sequence reads against the C. annuum cv. Criollo de Morelos 334 reference genome CM334 ([107]http://peppergenome.snu.ac.kr/download.php, version 1.55) [[108]28, [109]41]. BLAT [[110]42] was then used to cluster all paired-end reads that had distinct index data by sequence similarities among the two parents and the pooled libraries. Sequences that were over 90% identical were then considered as a single SLAF locus or tag. Detecting high-quality SNPs We used GATK [[111]43] and SAMtools [[112]44] software to detect single-nucleotide polymorphisms (SNPs). We designated the intersection of SNPs detected by both GATK and SAMtools as the ultimate set of SNPs to subject to further analysis. SnpEff software [[113]45] was then used to annotate whether SNPs were located upstream or downstream from a nearby gene, or in an intergenic region, and whether they were synonymous or non-synonymous mutations with reference to annotated gene models from the pepper reference genome. SNPs were filtered prior to association analysis according to the following criteria: SNPs with multiple alleles were excluded; SNPs with sequencing depth of less than 4-fold in each F[2] pool or parental DNA were omitted; SNPs with identical genotypes between pools were eliminated; and SNPs exhibiting recessive alleles that were not inherited from the recessive parent were excluded. Ultimately, the resulting high-quality SNPs were then used in association analysis. Association analysis Both Euclidean distance (ED) [[114]43] and a SNP-index algorithm [[115]46, [116]47] were used to conduct the association analysis. The ED of the allele frequencies of each SNP was calculated between the H-pool and N-pool, as described by Hill et al. (2013) [[117]43]: [MATH: ED=(ANp< /mi>oolAHpool)2+(CNp< /mi>oolCHpool)2+(GNp< /mi>oolGHpool)2+(TNp< /mi>oolTHpool)2 :MATH] , where A, C, G, or T refers to the frequency of each nucleotide. Euclidean distances were squared (ED^2) to reduce noise and increase the effect of large ED values. The ED values from the H-pool and N-pool were fitted using a Loess regression analysis [[118]43], and the significance threshold for marker-trait associations was set at three standard deviations above the median of the Loess-fitted values. Regions with Loess-fitted values exceeding this threshold were considered as the candidate genomic regions associated with the anthocyanin content of pepper in this cross. The SNP-index algorithm is a useful tool for identifying significant between-pool differences in genotype frequencies. We used SLAF depth to represent genotype frequency in our SNP-index algorithm [[119]48]. We calculated the ΔSNP-index as follows: ΔSNP-index = SNP-index (H-pool)–SNP-index (N-pool), where SNP-index (N-pool) = N[Np]/(N[Np] + H[Np]) and SNP-index (H-pool) = H[Hp]/(H[Hp] + N[Hp]). N[Np] and H[Np] indicate the depth of the N-pool derived from Z5 and Z6, respectively; and N[Hp] and H[Hp] represent the depth of the H-pool derived from Z5 and Z6, respectively. Thus, the ΔSNP-index equals zero if the SNP-indices of the N-pool and the H-pool are equal. A ΔSNP-index closer to 1 suggests that high anthocyanin content is almost completely associated with one genotype, and that the associated SNPs are closely linked to genomic regions conferring high anthocyanin content. In contrast, a ΔSNP-index closer to –1 indicates that the associated SNPs are linked to genomic regions conferring low anthocyanin content. We calculated the confidence coefficient for each ΔSNP-index then fitted ΔSNP-index values using the SNPNUM method [[120]44]. The genomic regions associated with a trait are identified when the fitted marker values are greater than the 99% confidence coefficient threshold. Ultimately, the intersections of candidate regions identified by both the ED and SNP-index approaches were designated as the final candidate anthocyanin-related regions. We then plotted ED, SNP-index, and ΔSNP-index data individually, and used CIRCOS 0.66 ([121]http://circos.ca/) to plot a circular graph of the distributions of chromosomes, genes, SNPs, ED values, and ΔSNP-indices. Candidate gene annotation and screening The functional annotations of candidate genes were explored using blastx alignment with default parameters to sequences at the Cluster of Orthologous Groups of proteins database (COG, [122]http://www.ncbi.nlm.nih.gov/COG/), Gene Ontology (GO, [123]http://www.geneontology.org/), the Kyoto Encyclopedia of Genes and Genomes (KEGG, [124]http://www.genome.jp/kegg/), the NCBI non-redundant protein database (NR, [125]ftp://ftp.ncbi.nih.gov/blast/db/), and Swiss-Prot ([126]http://www.uniprot.org/). All genes that could be related to the synthesis, storage, or regulation of anthocyanin accumulation, as well as any genes containing SNPs resulting in non-synonymous mutations were chosen for further analysis as candidate genes. Results Analysis of sequence data and identification of SNPs A total of 52.76 Mb and 27.65 Mb of clean sequence reads with an average read length of 50 bp were obtained in the SLAF libraries for the two pools and parents, respectively ([127]Table 2). The average percentage of Q30 bases was 94.50% and the GC content was 37.03%, indicating a majority of high-quality bases. Using BWA, we could map over 90.81% of the clean sequence reads to the CM334 pepper reference genome, indicating the level of sequencing accuracy. After clustering, 771,025 SLAFs were evenly distributed on the CM334 pepper reference chromosomes ([128]Fig 1A; [129]S2 Table). The numbers of SLAFs ranged from 41,197 on chromosome 08 to 73,838 on chromosome 01, 603,960 from the N-pool, 556,567 from the H-pool, 491,731 from Z6, and 496,868 from Z5. The average SLAF sequencing depths were 23.45-, 25.77-, 33.83-, and 45.74-fold in Z6, Z5, the H-pool, and the N-pool, respectively. Table 2. Sequence data summary for parental lines Z5 and Z6 and bulked H-pool and N-pool. Sample Clean_Reads Q30 (%) GC (%) Mapped reads rate (%) Number of SLAFs Total depth Average depth N-pool 31,297,954 94.76 36.70 90.85 603,960 27,624,290 45.74 H-pool 21,460,775 94.35 37.11 90.50 556,567 18,829,026 33.83 Z6 13,090,095 94.37 37.20 91.11 491,731 11,529,003 23.45 Z5 14,556,569 94.50 37.12 90.77 496,868 12,806,096 25.77 [130]Open in a new tab N-pool, the library created from pooled DNA from plants with no anthocyanin content; H-pool, and the library created from pooled DNA from plants with high anthocyanin content; Mapped reads rate (%), the proportion of the clean sequence reads mapped to the CM334 reference genome relative to the total clean reads; Q30, a quality score of 30 indicates 0.1% error probability or 99.9% sequence accuracy; GC, guanine-cytosine content; SLAF, specific-locus amplified fragment. Fig 1. Distribution of SLAFs and SNPs on each chromosome of pepper. [131]Fig 1 [132]Open in a new tab The x-axis and y-axis represent chromosome length and chromosome number, respectively. The distance between two adjacent yellow bars indicates 1 Mb on the chromosome, and black lines indicate SLAFs or SNPs. GATK [[133]43] and SAMtools [[134]44] software were used to identify a total of 836,852 SNPs, whose properties are shown in [135]S3 Table. The most SNPs, 127,688, were detected on chromosome 12, while the fewest, 17,073, were detected on chromosome 08 ([136]S2 Table). These SNPs were mapped to each chromosome on the CM334 reference genome as shown in [137]Fig 1B. A total of 127,004 high-quality SNPs remained after filtering for use in the association analysis to find candidate anthocyanin-associated regions in pepper. Association analysis of anthocyanin content The ED value at each SNP locus was calculated for the 127,004 high-quality SNP markers. We squared ED values to reduce noise, and then fitted the association values by Loess regression analysis [[138]43]. The threshold for association between markers and the anthocyanin trait was set to 0.27, or three standard deviations above the median of all the Loess-fitted values. When the Loess-fitted value was greater than 0.27, we detected three candidate regions on chromosome 05 for anthocyanin accumulation, and four candidate regions on chromosome 10 for this trait ([139]Fig 2). As shown in [140]Table 3, we identified 270 genes in a total interval of 80.12 Mb on chromosome 05. On chromosome 10, we identified 64 genes in the interval from 11,722,320 to 20,005,690 bp, 20 genes in the interval from 53,805,574 to 57,945,748 bp, 68 genes in the interval from 191,899,653 to 196,817,467 bp, and 15 genes in the interval from 210,028,131 to 211,321,646 bp. A total of 437 genes and 6268 high-quality SNP markers were found in these candidate regions, among which we identified two genes containing SNPs resulting in non-synonymous mutations. As shown in [141]S4 Table, a total of 7537 and 2781 SNP markers were detected in parents and bulked pools, respectively, and more than 97.00% of these SNPs were annotated as located in the intergenic regions. Fig 2. Association values for anthocyanin accumulation on each pepper chromosome from Euclidean distance-based association analysis. [142]Fig 2 [143]Open in a new tab The 12 pepper chromosomes are represented along the x-axis, and the association values based on Euclidean distance (ED) are indicated along the y-axis. ED-based association values at each SNP location are represented by colored dots. The red dashed line represents the association threshold and black line indicates Loess-fitted values. When ED-based association values are higher, stronger association of a SNP with anthocyanin accumulation is indicated. Table 3. Summary information for SNP-index and Euclidean distance-based association analyses. ChrID Start (bp) End (bp) Interval size (Mb) Number of genes Number of SNPs Euclidean distance Chr05 53,667,767 93,840,115 40.17 124 3,238 Chr05 100,693,328 103,396,658 2.70 8 37 Chr05 132,573,706 169,821,807 37.25 138 891 Chr10 11,722,320 20,005,690 8.28 64 961 Chr10 53,805,574 57,945,748 4.14 20 341 Chr10 191,899,653 196,817,467 4.92 68 601 Chr10 210,028,131 211,321,646 1.29 15 199 SNP-index Chr10 12,479,910 20,005,690 7.53 52 912 Chr10 54,672,471 56,593,567 1.92 6 226 Chr10 192,166,533 196,817,467 4.65 68 536 [144]Open in a new tab Chr, chromosome; SNP, single-nucleotide polymorphism. ΔSNP-index values were determined by subtracting the H-pool SNP-index from the N-pool SNP-index. Average SNP-indices for the H-pool and N-pool were plotted in 400-SNPsliding windows in 1-SNP steps in the CM334 genome assembly. The SNP-indices of the H- and N-pools and the ΔSNP-index are shown plotted in [145]Fig 3A, [146]Fig 3B, and [147]Fig 3C, respectively. When the fitted SNP marker values exceeded the threshold at a 99% confidence coefficient, three candidate regions for anthocyanin accumulation were detected on chromosome 10 by examining ΔSNP-index values ([148]Fig 3C; [149]Table 3). There were 52 genes in the intervals from 12,479,910 to 20,005,690 bp, six genes in the interval from 54,672,471 to 56,593,567 bp, and 68 genes in interval from 192,166,533 to 196,817,467 bp, or a total of 126 genes in a 14.10 Mb candidate region interval. One gene containing a SNP resulting in a non-synonymous mutation was found in the interval from 194,974,062 to 194,974,736 bp. In addition, 1674 high-quality SNP markers were identified and mapped in the candidate regions ([150]Table 3), and 2069 and 926 SNP markers we identified between the parents and bulked H-pool and N-pool, respectively ([151]S4 Table). A total of 97.20% of the SNP markers in parents and 96.54% of the SNP markers in bulked pools were annotated as located in intergenic regions. Fig 3. [152]Fig 3 [153]Open in a new tab Graphs of H-pool (a) and N-pool (b) indices and ΔSNP-index (c) for SNP-index-based association analysis. The 12 chromosomes are represented along the x-axis, and the SNP index or ΔSNP-index values are shown along the y-axis. The black line indicates fitted SNP-index or ΔSNP-index values. The red, blue, and green lines indicate the thresholds at the 99%, 95%, and 90% confidence coefficients, respectively. Final candidate regions were determined by overlapping the ED and SNP-index association analysis results to identify regions tightly associated with anthocyanin content. The final candidate regions were the same as those identified by SNP-index association analysisin the intervals from 12.48 to 20.00 Mb, from 54.67 to 56.59 Mb, and from 192.17 to 196.82 Mb on chromosome 10. Visualization of results of genomic analysis of anthocyanin accumulation with the combined SLAF-seq/BSA strategy To more clearly visualize all of the results from the present study, we plotted a circular graph ([154]S4 Fig) representing, in order from the first circle to the fifth circle, the 12 chromosomes of pepper, the distributions of genes, SNP densities, ED values, and ΔSNP-index values. Annotation of SNPs and genes in the candidate anthocyanin- associated region We detected a total of 2069 and 926 SNP markers in the candidate anthocyanin-associated genomic regions between the parents and between the bulked pools, respectively ([155]S4 Table). Analysis using SnpEff software [[156]45] indicated the 20 SNPs in the upstream regions of genes, 2011 SNPs in intergenic regions, and 26 SNPs in the downstream regions of genes between the parental lines. This analysis also revealed 10 SNPs in the upstream regions of genes, 894 SNPs in intergenic regions, and 13 SNPs in the downstream regions of genes between the bulked pools. Additionally, there were only two SNPs resulting in synonymous mutations and one SNP resulting in a non-synonymous in coding regions, indicating that most SNP markers were located in the intergenic regions ([157]S4 Table). A total of 126 candidate genes related to anthocyanin content ([158]S5 Table) were identified within the three candidate regions on chromosome 10 according to the current CM334 pepper reference genome annotation. Excluding 11 genes in the candidate regions that have not yet been annotated in public databases, we found annotations for 38, 101, 18, 100, and 126 genes within the candidate anthocyanin-associated regions in the COG, GO, KEGG, NCBI NR, and Swiss-Protdatabases, respectively ([159]Table 4). The 38 genes predicted by COG analysis were classified by their putative functions ([160]Fig 4); 11 genes were classified as ‘general function prediction only’ and seven genes were associated with ‘transcription’. The putative functions of seven other genes were classified as ‘replication, recombination and repair’. In addition, four genes were associated with ‘carbohydrate transport and metabolism’, ‘cell wall/membrane/envelope biogenesis’, ‘posttranslational modification, protein turnover, chaperones’, or ‘signal transduction mechanisms’, which accounted for 10.53% of all genes annotated in the COG analysis ([161]S5 Table; [162]Fig 4). Table 4. Annotation statistics for candidate anthocyanin-associated pepper genes for anthocyanin accumulation. Database NR GO SwissProt COG KEGG Total Number of annotated 126 101 100 38 18 126 Number of CDS containing non-synonymous subsitutions 1 1 1 1 0 1 [163]Open in a new tab Fig 4. Functional annotations of candidate anthocyanin-associated genes according to the Cluster of Orthologous Groups of proteins database (COG). [164]Fig 4 [165]Open in a new tab The x-axis indicates the classification annotation in COG and the y-axis represents the number of annotated candidate genes. GO term enrichment analysis identified genes associated with enriched GO terms from the molecular function, biological process, and cellular component domains ([166]S5 Fig). Because some genes had more than one annotation from different GO domains, genes could sometimes fall into more than one functional category ([167]S5 Fig). The major enriched functional categories in our data included those associated with cellular components such as cell (GO:0005623), cell part (GO:0044464), or organelle (GO:0043226); molecular function such as catalytic activity (GO:0003824) or binding (GO:0005488); and biological process, specifically biological regulation (GO:0065007), cellular process (GO:0009987), metabolic process (GO:0008152), response to stimulus (GO:0050896), or single-organism process (GO:0044699). These results suggested that major metabolic changes in the biological process domain and posttranslational modifications in the molecular function domain might be involved in the regulation of anthocyanin content of pepper. At the same time, the regulation of anthocyanin content could also be associated with cell, organelle and cell part functions in the cellular component domain. Additionally, directed acyclic graphs (DAG) for GO terms were plotted using the Bioconductor package TopGO [[168]49] to show the hierarchical parent-child relationships among GO terms ([169]S6–[170]S8 Figs). This analysis showed that the most-enriched term was chloroplast thylakoid membrane (GO:0009535) in the cellular component domain, DNA-directed RNA polymerase activity (GO:0003899) in the molecular function domain, and negative regulation of abscisic acid-activated signaling pathway (GO:0009788) in the biological process domain, which might be involved in the regulation of pepper anthocyanin content. To better understanding of the functions of genes potentially involved in anthocyanin accumulation, we performed KEGG analysis and identified 18 genes in 22 pathways using KEGG analysis ([171]Fig 5). One gene, CA10g12720, was involved in the ‘endocytosis’ pathway in cellular processes, and CA10g12690 was involved in the ‘plant hormone signal transduction’ pathway that was associated with environmental information processing. In addition, 10 genes were identified in seven pathways in genetic information processing and 16 genes were identified in 12 pathways related to metabolism. Four pathways were identified as enriched (P <0.05) by KEGG enrichment analysis including more than two genes with predicted functions in the ‘RNA polymerase’ (ko03020), ‘homologous recombination’ (ko03440), ‘pyrimidine metabolism’ (ko00240), or ‘purine metabolism’ (ko00230) pathways ([172]S6 Table) that are associated with the regulation of gene expression. Fig 5. Pathways identified as enriched in the candidate regions via KEGG analysis. [173]Fig 5 [174]Open in a new tab The x-axis represents the number and percentage of annotated candidate genes and the y-axis represents name of pathway in KEGG. Candidate genes correlated with anthocyanin accumulation Annotations suggested that 12 candidate genes could have functions associated with anthocyanin accumulation in pepper ([175]Table 5). Seven genes of these genes have annotations clearly associated with anthocyanin biosynthesis or metabolism. For example, CA10g04060 is annotated as 4CL2 in the 4CL gene family [[176]50] and encodes a predicted 4-coumarate-CoA ligase 2 in the ‘ubiquinone and other terpenoid-quinone biosynthesis’ pathway (ko00130), or 4-coumarate-CoA ligase (GO:0016207) in the phenylpropanoid pathway. CA10g03640 and CA10g03760, which encode a predicted anthocyanin 5-aromatic acyltransferase (5AT) and a predicted anthocyanidin 3-O-glucoside 6'-O-acyltransferase (3AT), respectively, were predicted to have anthocyanin 5-O-glucoside 6'-O-malonyltransferase activity (GO:0033810). CA10g03880, CA10g12640, and CA10g12650 were all annotated as glucosyltransferases. CA10g03880 was homologous to 3GGT from Ipomoea nil in anthocyanin biosynthesis pathway [[177]51]. CA10g12640 and CA10g12650 were predicted to encode proteins with flavonol 3-O-glucosyltransferase activity (GO:0047893). Additionally, CA10g12890 (homologous to Solanum melongena cytochrome P450 76A2, CYP76A2) might encode a protein with F3´5´H activity (GO:0033772). Three out of 12 candidate genes were homologs of regulatory genes that regulate structural anthocyanin biosynthesis genes. CA10g03650 was homologous to MYB39 from Arabidopsis thaliana, which encodes a MYB39 TF involved in regulation of phenylpropanoid metabolism (GO:2000762). Meanwhile, CA10g12810, which is homologous to Arabidopsis thaliana APL, also encodes a MYB-family TF APL predicted to participate in the regulation of anthocyanin metabolism (GO:0031537). Further, Zhou et al. (2016) reported that APL was also involved in phenylpropanoid and flavonoid biosynthesis pathways [[178]52]. CA10g12710, a homolog of nol10 in Danio rerio was predicted to be a WD40 repeat protein based on its COG annotation. In addition, CA10g03810 was homologous to Arabidopsis thaliana GDSL esterase/lipase At5g45960, which was associated with accumulation of anthocyanin in ultraviolet light-exposed plants (GO:0043481). Finally, CA10g12840 was the only candidate gene containing a SNP resulting in a non-synonymous mutation. CA10g12840 encodes a predicted subtilisin-like protease, which can bind to maltose binding protein [[179]53] and, based on its COG annotation, is associated with ‘Posttranslational modification, protein turnover, chaperones’. The function of CA10g12840 in anthocyanin accumulation needs to be further studied. Table 5. Candidate genes related to anthocyanin accumulation in pepper. Classification Gene name Gene_ID Start (bp) End (bp) Description Structural genes 4CL2 CA10g04060 18,570,738 18,572,196 4-coumarate—CoA ligase 2 Nicotiana tabacum [[180]50] 5AT CA10g03640 12,745,342 12,746,112 Anthocyanin 5-aromatic acyltransferase Gentiana triflora 3AT CA10g03760 14,981,576 14,981,851 Anthocyanidin 3-O-glucoside 6”-O-acyltransferase Perilla frutescens 3GGT CA10g03880 16,367,650 16,369,032 Anthocyanidin 3-O-glucoside 2”-O-glucosyltransferase Ipomoea nil [[181]51] UGT73C3 CA10g12640 194,040,338 194,041,813 UDP-glycosyltransferase 73C3 Arabidopsis thaliana UGT73C5 CA10g12650 194,053,311 194,053,472 UDP-glycosyltransferase 73C5 Arabidopsis thaliana CYP76A2 CA10g12890 195,250,338 195,253,243 Cytochrome P450 76A2 Solanum melongena Transcription factors MYB39 CA10g03650 13,533,765 13,535,787 Transcription factor MYB39 Arabidopsis thaliana nol10 CA10g12710 194,493,711 194,494,820 Nucleolar protein 10 Danio rerio APL CA10g12810 194,789,125 194,791,433 MYB family transcription factor APL Arabidopsis thaliana [[182]52] Anthocyanin accumulation in response to stress At5g45960 CA10g03810 15,633,033 15,636,114 GDSL esterase/lipase At5g45960 Arabidopsis thaliana Genes with non-synonymous SNPs ARA12 CA10g12840 194974062 194974736 Subtilisin-like protease Arabidopsis thaliana [[183]53] [184]Open in a new tab Discussion The pepper genome is large, with an estimated size of 3.48 Gb [[185]28], and such large genomes can be relatively costly to analyze by low-coverage sequencing or whole genome deep resequencing. SLAF-seq is a relatively new strategy for genome-wide discovery of SNPs and genotyping on a large scale that can improve the throughput and accuracy of high-coverage sequencing, and also make it more efficient and cost effective [[186]36]. A new strategy that combines SLAF-seq with BSA and takes advantage of both methods was used to analyze the genetic control of anthocyanin accumulation in the present study. This strategy also has been used for QTL analysis and linkage mapping in various species such as rice [[187]29], cotton [[188]30], and melon [[189]32], as well as pepper [[190]33, [191]34, [192]54]. For example, we previous had successfully mapped the pepper first flower node trait using the strategy, and the strategy had proven to be an effective method to identify candidate region and genes linked to a specific trait [[193]54]. Anthocyanins are one of the determinants of pepper color, and also happen to increase abiotic stress tolerance in plants and have benefits for human health [[194]4–[195]6]. In pepper, virus-induced gene silencing (VIGS) has been used to analyze the functions of structural genes and TFs that are part of the anthocyanin biosynthetic pathway [[196]9, [197]10, [198]55, [199]56]. For example, Zhang et al. (2015) revealed that silencing the R2R3-MYB TF CaMYB using VIGS led to the repression of the majority of anthocyanin pathway genes, except for PAL, C4H, and 4CL [[200]10]. However, to date, there are no reports yet of the use of combined SLAF-seq and BSA for the study of anthocyanin accumulation in pepper. The present study represents the first application of a combined SLAF-seq/BSA strategy for identification of genomic regions and genes linked to anthocyanin accumulation in pepper. Sequence data analysis indicated that accurate SLAF libraries were constructed after choosing the appropriate restriction enzyme, sizes of restriction fragments, and evaluating the SLAF libraries against a control library prepared from rice corresponding to our previous study [[201]54]. A total of 771,025 SLAFs evenly distributed along the CM334 reference pepper chromosomes were obtained from the SLAF libraries, and the sequencing depths of SLAFs from both pools and parents were all greater than 20×. For successful SLAF-seq, sequencing depth should exceed 6× and quality scores should be greater than Q30 for [[202]36]. Therefore, our data indicate that we successfully constructed an accurate and high-quality SLAF library for identification of the candidate regions and genes associated with anthocyanin accumulation in pepper. Molecular markers accurate, high-resolution genetic maps are essential for mapping QTL and for improving the efficiency of marker-assisted selection [[203]57–[204]60]. However, some PCR-based molecular markers such as amplified fragment-length polymorphism (AFLP) [[205]15, [206]61], random amplified polymorphic DNA (RAPD) [[207]62, [208]63], and simple sequence repeat (SSR) [[209]32, [210]64] markers are relatively low-density, non-specific, and provide incomplete coverage. In contrast, SNPs are high frequency markers with denser, genome-wide [[211]65, [212]66] distributions than SSR or other markers [[213]66]. Their ease of automatic genotyping [[214]67] and high polymorphism make SNPS valuable for molecular genetic analyses [[215]68]. Of the 836,852 SNPs we identified, over 17,073 SNPs were mapped onto the pepper chromosomes, which resulted in denser coverage of the whole pepper genome than in the Cheng et al. (2016) map [[216]69]. Our map will provide higher marker density and increase accuracy for identifying candidate genes. We used a set of 127,004 of high-quality SNPs to perform association analysis and identify candidate anthocyanin-associated regions in pepper. Overlapping the results of ED-based and SNP-index-based association analyses should improve prediction of candidate regions associated with anthocyanin accumulation, as was done by Geng et al. (2016) for seed weight in Brassica napus [[217]31]. SNP-index analysis is more accurate and quantitative for evaluation of frequencies and inheritance of parental alleles in the F[2] [[218]70]. We narrowed candidate anthocyanin-associated regions down to three within an interval of 14.10 Mb on chromosome 10 that contained 1674 high-quality SNPs. The annotations for these SNPs showed the locations of more than 96% of them in intergenic regions, and would thus be useful for fine-mapping of anthocyanin-related genes. The pepper CM334 reference genome has been available since 2014 [[219]28], and has allowed comparisons of high-throughput sequencing results from other pepper crosses, which helps to identify polymorphisms throughout the pepper genome. Anthocyanin biosynthetic pathway genes are incompletely dominant and quantitatively inherited in the Solanaceae [[220]15, [221]23, [222]71, [223]72]. Although the enzyme- and TF-encoding genes of the anthocyanin biosynthetic pathway have been extensively studied, most of these genes had not yet been fine mapped in pepper due to the lack of saturated linkage maps. With an accurate and high-quality SLAF library, we identified three candidate regions associated with anthocyanin accumulation on pepper chromosome 10, as did previous reports in pepper [[224]15, [225]24, [226]25, [227]27]. The A locus that controls anthocyanin accumulation had previously been mapped to pepper chromosome 10 [[228]24, [229]25] and found to be allelic to the fs10.1 locus 2.1 cM from the fs10.1 locus [[230]27]. Three candidate regions on chromosome 10 were identified in our study, intervals that ranged from 1.92 to 7.53 Mb in length. Additionally, two adjacent major QTLs (fap10.1, 106.4 cM and fap10.2, 109.8 cM) [[231]72] associated with control of anthocyanin in eggplant fruit were also mapped to chromosome 10. The ANT1 and AN2 loci from tomato have also been mapped to chromosome 10 [[232]73]. A homolog of AN2 from petunia has also been identified as a candidate gene for the pepper A locus [[233]15], and its location corresponds to those of tomato anthocyanin gainer and eggplant fap10.1 [[234]71], highlighting the genetic similarities between the solanaceous species pepper, tomato, and eggplant. Therefore, the 14.10-Mb interval on pepper chromosome 10 associated with anthocyanin accumulation is a strong candidate region that could harbor a gene(s) controlling anthocyanin accumulation in pepper. A series of structural genes that includes PAL, C4H, 4CL, CHS, CHI, F3H, F3’5’H, DFR, ANS, GTs, ATs, and MTs [[235]8–[236]10, [237]15–[238]17], and three regulatory genes including MYB, bHLH and WD40 affect anthocyanin biosynthesis [[239]7, [240]12, [241]21, [242]22]. However, there have been limited studies of the genetic regulatory mechanisms underlying the anthocyanin biosynthesis in pepper fruit. Until now, A locus is the only one locus to be identified to control the expression of early genes in the pathway, which encoded a CaMYB that cannot control the expression of PAL, C4H, and 4CL [[243]10]. The AN2 has been thought the most likely candidate gene for the pepper A locus [[244]15]. Whether there are genes other than CaMYB that can control the expression of early structural genes (PAL, C4H, and 4CL), and other transcription factors like bHLH and WD40 that can control the anthocyanin accumulation as the candidate genes for bHLH and WD40 have not been found in pepper so far. Besides, Deshpande (1933) also postulated the presence of a second locus other than A locus to explain the variation of anthocyanin [[245]74]. In this study, we sought to identify the candidate genes involved in anthocyanin accumulation and variation including A locus in pepper fruit. We identified a total of 126 genes in the candidate regions, and 12 of which, annotations indicate, could be related to anthocyanin accumulation. Future studies to isolate and functional testing other genes will likely be aided by the study of these candidate genes. Our results showed that CA10g04060, CA10g03640, CA10g0376, CA10g03880, CA10g12640, CA10g12650, and CA10g12890 are related to structural genes that might play important roles in pepper fruit anthocyanin biosynthesis. Among these, CA10g04060 and CA10g12890 were homologous to 4CL2 from Nicotiana tabacum and CYP76A2 from Solanum melongena, respectively, which have been separately related to 4CL and F3´5´H. Shi and Xie (2010) also found that the expression of 4CL2 increased during anthocyanin biosynthesis [[246]50]. CA10g03640 and CA10g03760, which encode 5AT and 3AT respectively, were both predicted to function in acylation of anthocyanin. CA10g03880, CA10g12640, and CA10g12650, homologs of 3GGT, UGT73C3, and UGT73C5 respectively, have all been annotated as predicted glucosyltransferases. Further, the genes encoding the anthocyanin biosynthetic enzymes are likely also transcriptionally regulated in pepper. We also identified homologs of MYB39 (CA10g03650) and APL (CA10g12810), which belong to the MYB TF family and could be involved in regulation of anthocyanin metabolism based on the GO annotation. Additionally, we may answer the question rasied by Zhang et al. (2015) that MYB39 might take part in phenylpropanoid pathway [[247]75]. Zhou et al. (2016) also reported that APL took part in phenylpropanoid and flavonoid biosynthesis pathways [[248]52]. CA10g03650 and CA10g12810 were different from A locus because A locus was homologous to tomato ANT1 and petunia AN2 [[249]15]. The two genes (CA10g03650 and CA10g12810) might be detected as a second or third gene other than A locus to explain the synthesis of anthocyanin, which was consistent with the postulation of Deshpande (1933) [[250]74]. Although previous studies have showed that the method of BSA combined with SLAF-seq in the study was accurate[[251]29,[252]30,[253]32,[254]33,[255]34,[256]54], A locus wasn’t identified most possibly because the plant material used in this study was different from that of Borovsky et al. (2004) [[257]15]. At the same time, the function of two genes in anthocyanin metabolism and the relationship between two genes with CaMYB need to be further verified. COG annotation indicated that CA10g12710 is a likely WD40-repeat protein that might regulate the expression of anthocyanin. Finally, CA10g03810 is related to a gene involved in accumulation of anthocyanin in tissues exposed to ultraviolet light, and a SNP resulting in a non-synonymous mutation in CA10g12840 was found. All above annotated genes lay a good foundation for the understanding of the genetic regulatory mechanisms of the anthocyanin accumulation in pepper fruit, and allow us to be cloned to further analyze the function of these genes that influence anthocyanin accumulation in this species, but not all genes are identified due to the limitation of BSA and large pepper genome (3.48Gb) as previous study mentioned [[258]54]. Although these genes are homologous to genes that are related to anthocyanin biosynthesis and accumulation, or contain non-synonymous SNPs in pepper, direct evidence as to whether these candidate genes control anthocyanin accumulation in pepper has not been found, but could be revealed by future analyses of the functions of these candidate genes. Supporting information S1 Fig. The anthocyanin biosynthesis pathway in pepper. (TIF) [259]Click here for additional data file.^ (1.1MB, tif) S2 Fig. Distribution of SLAFs on the chromosomes of the pepper CM334 reference genome in the restriction enzyme digestion experiment. The x-axis and y-axis represent chromosome length and chromosome number, respectively. The distance between two adjacent yellow bars indicates 1 Mb on the chromosome, and black lines indicate SLAFs or SNPs. (TIF) [260]Click here for additional data file.^ (2.1MB, tif) S3 Fig. The distribution of paired-end reads of the rice SLAF control mapped to the rice genome. Fragments between 364 and 414 bp in size were chosen. (TIF) [261]Click here for additional data file.^ (166.6KB, tif) S4 Fig. Circular graph of sequence variants detected by BSA in pepper. The first to fifth circles in the graph represent, in order, the 12 chromosomes of pepper, gene distribution, SNP density, Euclidean distance values, and ΔSNP-index values related to anthocyanin accumulation. (TIF) [262]Click here for additional data file.^ (1.4MB, tif) S5 Fig. GO term enrichment analysis of 24 candidate genes related to anthocyanin content according to functional categories in the cellular component, molecular function and biological process domains. (TIF) [263]Click here for additional data file.^ (501.9KB, tif) S6 Fig. Directed acyclic graphs of enriched cellular component domain GO terms in the candidate region associated with anthocyanin accumulation. Each enriched GO term is shown, and the box indicates the 10 most-enriched terms. A detailed description of each GO term and the significance of its enrichment are shown in the box or ellipse. Different colors represent different degrees of significance of enrichment: darker colors indicate greater significance. (PDF) [264]Click here for additional data file.^ (53.4KB, pdf) S7 Fig. Directed acyclic graphs of enriched GO terms from the molecular function domain in the candidate region for anthocyanin accumulation. Each enriched GO term is shown, and the box indicates the 10 most-enriched terms. A detailed description of each GO term and the significance of its enrichment are shown in the box or ellipse. Different colors represent different degrees of significance of enrichment: darker colors indicate greater significance. (PDF) [265]Click here for additional data file.^ (37KB, pdf) S8 Fig. Directed acyclic graphs of enriched biological process domain GO terms in the candidate region for anthocyanin accumulation. Each enriched GO term is shown, and the box indicates the 10 most-enriched terms. A detailed description of each GO term and the significance of its enrichment are shown in the box or ellipse. Different colors represent different degrees of significance of enrichment: darker colors indicate greater significance. (PDF) [266]Click here for additional data file.^ (140.5KB, pdf) S1 Table. Anthocyanin concentrations in parental lines and high- and low-anthocyanin pools. (XLSX) [267]Click here for additional data file.^ (11.8KB, xlsx) S2 Table. The distribution of SLAFs and SNPs on each chromosome of pepper. (DOCX) [268]Click here for additional data file.^ (13.3KB, docx) S3 Table. The properties of all SNPs identified between high- and low-anthocyanin lines and pools. (XLSX) [269]Click here for additional data file.^ (72.7MB, xlsx) S4 Table. Annotation of SNP markers in the candidate region for high-and low- anthocyanin parents and pools by both Euclidean distance and SNP-index association analysis. (DOCX) [270]Click here for additional data file.^ (14.2KB, docx) S5 Table. Annotation for 126 candidate genes for anthocyanin accumulation. (XLSX) [271]Click here for additional data file.^ (40.3KB, xlsx) S6 Table. Pathway enrichment analysis via KEGG for candidate genes in pepper. (DOCX) [272]Click here for additional data file.^ (13KB, docx) Data Availability All relevant data are within the paper and its Supporting Information files. Funding Statement Construction Program of Science and Technology Innovation Capability of Beijing Academy of Agriculture and Forestry ScienceS (KJCX2018042), Innovation Ability Construction Project of Beijing Academy of Agriculture and Forestry Sciences (KJCX20170102-14) and National Key R & D Plan (2016YFD0101700). References