Abstract Colored-leaf plants are increasingly popular and have been attracting more and more attentions. However, the molecular mechanism of leaf coloration in plants has not been fully understood. In this study, a colored-leaf cultivar of Populus deltoides (Caihong poplar, CHP) and green-leaf cultivar of Populus deltoides L2025 were used to explore the mechanism of leaf coloration through physiological and the whole genome resequencing analysis. The content of anthocyanins, total Chl, and carotenoids in the leaves of CHP and L2025 were evaluated. The ratio of anthocyanins to total Chl in CHP was 25.0 times higher than that in L2025; this could be attributed to the red leaf color of CHP. Based on the whole genome resequencing analysis, 951,421 polymorphic SNPs and 221,907 indels were screened between CHP and L2025. Using qRT-PCR analysis, three structural genes (flavonol synthase 1 family protein, UDP-glucose flavonoid 3-O-glucosyltransferase 3′ and flavonoid 3-O-galactosyl transferase family protein) and six transcription factors (MYB-related protein Myb4, transcription factor GAMYB, PtrMYB179, transcription factor bHLH53, transcription factor bHLH3, VARICOSE family protein) may be involved in the anthocyanin synthesis pathway, which could be used as candidate genes to explore the molecular regulation mechanism of leaf coloration in Populus deltoids, and could be used in molecular breeding in the future. Subject terms: Plant sciences, Plant genetics, Plant molecular biology Introduction With the progress of society and economic development, colored-leaf plants are increasingly popular, which have a wide range of applications in the courtyard embellishment, road greening, garden set King and so on. The mechanism of leaf coloration in plants is quite complicated, and several factors such as the type, concentration, and distribution of pigments in leaf cells can affect the colors of leaves^[38]1. There are three kinds of pigments in the leaves of higher plants: (i) flavonoids, mainly including anthocyanins; (ii) carotenoids, mainly including lutein and carotenoid pigments; and (iii) chlorophyll (Chl), mainly including Chl a and Chl b^[39]2. In spring, the leaves of plants are very tender, and have a weak capacity for chlorophyll synthesis owing to the low temperature. Anthocyanins are often dominant in spring, making the leaves red^[40]3,[41]4. The leaves of some woody trees turn red in fall as a result of the lower temperatures in this season; the lower temperatures are more conducive to the formation of anthocyanin pigments, and also mean the net concentration of Chl decreases as the leaves fall, increasing the ratio of anthocyanins to total Chl^[42]5,[43]6. The anthocyanin content in purple-leaved Pak Choi and red-leaved Quanhong poplar (QHP) was much higher than the anthocyanin content in green-leaved Pak Choi and green-leaved cultivar of Populus deltoides L2025^[44]7,[45]8. The occurrence of red, purple or blue leaves in many transgenic plants is due to the overexpression of the structural genes or regulatory genes associated with anthocyanin synthesis^[46]9–[47]12. The genes associated with the anthocyanin biosynthesis have been well studied; they include structural genes (CHS, CHI, F3H, DFR, ANS, and UFGT) and regulatory genes (MYB transcription factors, bHLH transcription factors, and WD40 transcription factors)^[48]13. These three kinds of transcription factors could combine to form a ternary complex to regulate structural genes or work individually^[49]14–[50]16. The sequence changes in genes or their promoters may change the content of anthocyanins through the expression regulation of genes associated with anthocyanin synthesis. MdMYB1–1 and MdMYB1–2, the alleles of MdMYB1 associated with the coloration of apple fruit skin, has eight nucleotide differences at the promoter regions. These differences indicate that MdMYB1–1 contributes to the red pigmentation in apple fruit skin, while MdMYB1–2 does not contribute to the red pigmentation in apple skin^[51]17. An autoregulatory site produced due to the insertion of multiple repeats on the MdMYB10 promoter, which produced increased anthocyanin content in flowers and fruits of apples through the increase in MYB10 transcription levels^[52]18. In addition, plants overexpressing regulatory genes or structural genes associated with anthocyanins synthesis, especially the MYB family genes, could modulate expression of various genes involved in anthocyanin synthesis to modify the colors of plant tissues, such as flowers, leaves and fruits. Arabidopsis overexpressing PAP1/AtMYB75 promoted the expression of structural genes associated with anthocyanin biosynthesis, leading to much higher accumulation of anthocyanins^[53]19–[54]21. Tobacco overexpressing IbMYB1a promotes the concentration of anthocyanin through up-regulation of several structural genes, such as ANS and DFR^[55]22. Tobacco overexpressing CsMYB6A increased the expression levels of structural genes associated with flavonoid synthesis, such as 3GT and CHS, leading to the high accumulation of anthocyanins in leaves of transgenic tobacco^[56]11. Recently, there have been larger number of reports on the functional analysis of genes associated with anthocyanin biosynthesis in poplar. Hybrid poplar overexpressing PtrMYB119 and PtrMYB118 separately, has higher anthocyanin content in leaves compared with those in wildtype poplar^[57]23,[58]24. MYB6, is a R2R3-MYB transcription factor from Populus tomentosa, and transgenic MYB6-OE poplar showed a red color in the young leaves and shoots, indicating that overexpression of MYB6 in transgenic poplar increased anthocyanin accumulation^[59]25. Populus trichocarpa R3 MYB-LIKE1 (PtrRML1), a repressor motif-containing poplar R3 MYB-like transcription factor, can reduce the anthocyanin content of stems, petioles, and rosette leaves of PtrRML1 transgenic Arabidopsis plants^[60]26. Overexpression of PtoMYB156, PtrMYB57, MYB182, MYB 165, and MYB 194 separately can also down-regulate anthocyanin biosynthesis in transgenic poplar^[61]27–[62]30. Although genes associated with anthocyanin biosynthesis are well understood, the molecular mechanism of leaf coloration in poplar is not well understood, and the identification of candidate genes responsible for expression of leaf color in plants is a topic of research that is worth pursuing. In the present study, colored-leaf cultivar of Populus deltoides (Caihong poplar, CHP) and green-leaf poplar L2025 were used to explore the mechanism of leaf coloration through physiological and whole genome resequencing analysis. The genes associated with anthocyanin biosynthesis and carrying more than five non-synonymous single nucleotide polymorphisms (SNPs) in the coding regions were identified as candidate genes. In addition, these candidate genes were further validated according to the quantitative reverse transcription polymerase chain reaction (qRT-PCR) analysis. These results could provide a physiological and global genomic perspective to understand the mechanism of leaf coloration in Populus deltoides, and could also contribute to the molecular breeding of poplar and other woody plants. Results Morphological phenotypes and growth status between CHP and L2025 There are many significant differences in morphological phenotypes between CHP and L2025, especially with regarded to leaf color. The color of leaves, veins, branches and petioles in CHP are bright red, while the color of these in L2025 are always green (Fig. [63]1). For the propagated cuttings, CHP grew up to 172.3 cm height and 13.6 mm basal diameter in one year, while L2025 grew up to 387.5 cm height and 28.2 mm basal diameter (Table [64]1). Figure 1. [65]Figure 1 [66]Open in a new tab The morphological phenotype of CHP and L2025. (A) CHP; (B) L2025. Table 1. The height and basal diameter of CHP and L2025, which were cutting propagation in one year. Height/cm ± SD Basal diameter/mm ± SD CHP 172.3 ± 3.0** 13.6 ± 0.2** L2025 387.5 ± 2.3 28.2 ± 0.2 [67]Open in a new tab Each value is the mean ± SE of five replicates. ** and * indicate p < 0.001 and p < 0.05 by Student’s t-test, respectively. Changes of leaf pigment contents between CHP and L2025 There was a lower concentration of total Chl, Chl a, Chl b and carotenoids in CHP leaves compared with concentrations in L2025 leaves (Table [68]2). However, the concentration of anthocyanin in CHP leaves was much higher than concentrations in L2025 leaves (Table [69]2). The anthocyanin to total Chl content ratio in CHP leaves was 25.0 times of these in L2025 leaves; this may be the physiological reason why CHP had bright red leaves and weaker growth. The bright red leaves could be a consequence of the higher anthocyanin to total Chl ratio, and the weaker growth may be due to the lower total chlorophyll content. Table 2. The content of chlorophyll, carotenoid and anthocyanin in the leaves of CHP and L2025. Chlorophyll a (mg/g FW) Chlorophyll b (mg/g FW) Total Chlorophyll (mg/g FW) Carotenoids (mg/g FW) Anthocyanin (mg/g FW) CHP 0.60 ± 0.00** 0.29 ± 0.01** 0.89 ± 0.00** 0.14 ± 0.01** 386.51 ± 0.10** L2025 1.02 ± 0.01 0.40 ± 0.01 1.42 ± 0.01 0.24 ± 0.00 24.63 ± 0.05 [70]Open in a new tab Each value is the mean ± SE of five replicates. ** and * indicate p < 0.001 and p < 0.05 by Student’s t-test, respectively. Whole genome resequencing of the Populus deltoids genome To explore the genome variation between CHP and L2025, the whole genome resequencing was conducted using the P. trichocarpa genome as a reference. A total of 106.4 million reads (about 15.96 Gb) for L2025 and 108.9 million reads (16.33 Gb) for CHP were generated compared with the reference genome, which has 37.2× and 38.4× sequencing depth, respectively (Table [71]3). There was 410.4 Mb consensus sequence in the L2025 genome from 100.7 million reads (94.7% of the total reads), and 410.8 Mb consensus sequence in the CHP genome from 103.5 million reads (95.0% of the total reads). The CHP and L2025 genome coverage were 95.8% and 95.7%, respectively, compared with the P. trichocarpa genome (Table [72]3). Among 36,071 homozygote polymorphic SNPs, transitions (C/T and A/G) accounted for 65.7%, and transversions (A/T, A/C, G/T, and C/G) accounted for 34.3%, indicating that transitions occurred more easily than transversions (Table [73]S1). Table 3. The screening of genome variation between CHP and L2025 compared with the reference genome based on the whole genome resequencing. L2025 CHP Total reads 106,415,483 108,920,101 Total size (bp) 4,723,926,206 4,876,051,559 Sequencing depth 37.2371x 38.4297x Mapped reads 100,732,805 (94.7%) 103,464,008 (95.0%) Properly paired reads 98,599,707 101,402,196 Consensus sequence (bp) 410444509 410845308 Genome coverage 95.7% 95.8% [74]Open in a new tab There were 951,421 polymorphic SNPs between L2025 and CHP compared with the P. trichocarpa genome. The highest number of SNPs in chromosome 01 was 94,154, and the lowest number of SNPs in chromosome 16 was 32,315 (Fig. [75]2, Table [76]S2). Among these polymorphic SNPs, 33.2% of total polymorphic SNPs occurred in genic regions, and the remaining ones occurred in intergenic regions. There were 91,732 polymorphic SNPs in coding sequence regions, and non-synonymous and synonymous SNPs accounted for 53.4% and 46.6%, respectively. The functions of 36,389 genes might be changed due to the polymorphic SNPs occurred in coding sequence regions (Table [77]S2). Figure 2. [78]Figure 2 [79]Open in a new tab The distribution of SNPs, indels and Structural variation (SV) in each poplar chromosome of CHP and L2025 compared with the chromosome of Populus trichocarpa. Structural variation (SV) in genome includes insertion, duplication, deletion, translocation and inversion with length of more than 50 bp. From outside to inside in turn is chromosome location, distribution of SNP density, indels density and SV in each poplar chromosome of L2025 and CHP, respectively. There was a similar trend for the frequency and length of deletions and insertions. The most frequent indels were mononucleotide indels, accounting for 49.9% of the total. The next most frequent were dinucleotide indels with 18.1%, followed by trinucleotide indels with 9.7%, and tetranucleotide indels with 7.2% (Fig. [80]S1). Among the 221,907 polymorphic indels, the highest number of polymorphic indels was 21,607 on chromosome 01, and the smallest number of polymorphic indels was 7580 on chromosome 16 (Table [81]S3). Most of the polymorphic indels (71.2%) occurred in intergenic regions. Of the 63,832 polymorphic indels in genic regions, there were 45,596 indels in the intron regions, 13,511 indels in the untranslated regions, and 4725 indels in the coding sequence regions (Table [82]S3). Gene ontology and KEGG analysis of candidate genes with polymorphic SNPs Gene Ontology (GO) analysis of candidate genes with polymorphic SNPs was conducted between L2025 and CHP. For the ‘molecular function’, the binding contained most genes (13,856), followed by catalytic activity (13,577) and transporter activity (1444). There were 968 genes for nucleic acid binding transcription factor activity, 551 genes for structural molecule activity, and 477 genes for molecular transducer activity. These candidate genes were also classified according to the ‘cellular components’ as cell part with 10,445 genes, membrane with 5797 genes, organelle with 7992 genes, and cell with 10,409 genes. For the ‘biological processes’, the metabolic processes contained 16,657 genes, cellular processes contained 13,573 genes, and single-organism processes contained 11,918 genes (Fig. [83]3). Figure 3. [84]Figure 3 [85]Open in a new tab Gene ontology (GO) functional enrichment analysis of candidate genes carrying polymorphic SNPs between CHP and L2025. The KEGG pathway enrichment analysis of candidate genes were conducted, and ‘Flavonoid biosynthesis’ (ko00941) was the enriched pathway associated with anthocyanin biosynthesis. There were 27 genes carrying polymorphic SNPs in this pathway. To further screen the possibly useful candidate genes associated with anthocyanin biosynthesis, genes carrying more than five non-synonymous SNPs in coding regions were considered. If these candidate genes do not work, genes carrying less than five non-synonymous SNPs in coding regions were also considered. In the flavonoid biosynthesis pathway, flavonol synthase 1 family protein (Ptr FLS1, Potri.004G139500) and shikimate O-hydroxycinnamoyl transferase-like (Potri.005G028000), carried more than five non-synonymous SNPs in coding regions, and were considered as candidate genes (Table [86]4). The relative expression level of these two candidate genes was evaluated, and there was a significant difference for Ptr FLS1 between CHP and L2025, and no significant difference for Shikimate O-hydroxycinnamoyl transferase-like (Fig. [87]S2). Therefore, Ptr FLS1 was screened as a candidate gene associated with anthocyanin biosynthesis. In addition, UDP-glucosyl transferase family genes with polymorphic SNPs were also counted. Six genes with more than five non-synonymous SNPs in coding regions are shown in Table [88]S4. The relative expression level of these six genes was also evaluated by qRT-PCR, and there were significant differences between UDP-glucose flavonoid 3-O-glucosyltransferase 3 and flavonoid 3-O-galactosyl transferase family protein; there were no significant differences among the other four genes (Fig. [89]4). Therefore, UDP-glucose flavonoid 3-O-glucosyltransferase 3 and flavonoid 3-O-galactosyl transferase family protein were also considered as candidate genes associated with anthocyanin biosynthesis. Table 4. The statistics of genes involved in the flavonoid biosynthesis carrying polymorphic SNPs in coding regions between CHP and L2025. Gene ID SNP numbers in coding regions Description Potri.013G073300 0 trans-cinnamate 4-hydroxylase [Populus tremuloides] Potri.018G146100 0 trans-cinnamate 4-hydroxylase [Populus trichocarpa] Potri.014G145100 0 chalcone synthase [Populus alba] Potri.010G104400 0 caffeoyl-CoA O-methyltransferase-like [Populus euphratica] Potri.009G069100 0 flavonoid 3′,5′-hydroxylase 2 isoform 1 [Theobroma cacao] Potri.005G229500 0 dihydroflavonol-4-reductase [Vitis vinifera] Potri.004G139700 0 flavonol synthase 1 family protein [Populus trichocarpa] Potri.003G176700 0 chalcone synthase 1-like isoform X1 [Populus euphratica] Potri.003G119100 0 leucoanthocyanidin dioxygenase-like [Populus euphratica] Potri.002G127500 0 dihydroflavonol-4-reductase [Zea mays] Potri.002G033600 0 dihydroflavonol reductase [Populus tremuloides] Potri.001G274600 0 flavonoid 3′,5′-hydroxylase 2 [Petunia hybrida] Potri.015G050200 1 leucoanthocyanidin reductase [Desmodium uncinatum] Potri.012G138800 1 stilbene synthase 4 OS = (Grape) PE = 3 SV = 1 [Vitis vinifera] Potri.003G176800 1 naregenin-chalcone synthase family protein [Populus trichocarpa] Potri.001G051600 1 naregenin-chalcone synthase family protein [Populus trichocarpa] Potri.001G051500 1 naregenin-chalcone synthase family protein [Populus trichocarpa] Potri.013G073300 1 flavonoid 3′-monooxygenase [Petunia hybrida] Potri.005G113700 2 naringenin-2-oxoglutarate 3-dioxygenase [Vitis vinifera] Potri.005G113900 2 naringenin-2-oxoglutarate 3-dioxygenase [Vitis vinifera] Potri.010G129800 2 leucoanthocyanidin reductase [Desmodium uncinatum] Potri.010G213000 2 chalcone isomerase family protein [Populus trichocarpa] Potri.003G176900 3 PREDICTED: chalcone synthase 1 [Populus euphratica] Potri.003G183900 3 shikimate O-hydroxycinnamoyltransferase [Arabidopsis thaliana] Potri.004G030700 3 leucoanthocyanidin reductase family protein [Populus trichocarpa] Potri.005G028000 5 shikimate O-hydroxycinnamoyltransferase-like [Populus euphratica] Potri.004G139500 9 flavonol synthase 1 family protein [Populus trichocarpa] [90]Open in a new tab Figure 4. [91]Figure 4 [92]Open in a new tab The relative expression levels of six genes involved in flavonoid biosynthesis in the leaves determined by quantitative real-time PCR (qRT-PCR) analysis between L2025 and CHP. Bars indicated the mean ± SE, n = 3. ***Indicated P ≤ 0.001. Candidate transcription factors associated with anthocyanin biosynthesis between L2025 and CHP The anthocyanin biosynthesis was mainly affected by three families of transcription factors: MYB, bHLH, and WD40. There were 157 MYB family genes, 121 bHLH family genes, and 57 WD40 family genes. Ten candidate transcription factors with more than five non-synonymous SNPs in coding regions were identified for MYB family genes, six for bHLH family genes, and four for WD40 family genes (Tables [93]S5–[94]S7). To further screen candidate transcription factors associated with anthocyanin biosynthesis, the relative expression levels of these ten, six, and four candidate transcription factors were conducted. There was much significant difference for ‘MYB-related protein Myb4’, ‘PtrMYB179’ and ‘transcription factor transcription factor GAMYB’ of leaves between L2025 and CHP, and no significant difference for the left ones (Fig. [95]5). There was higher relative expression of ‘transcription factor bHLH53’ and ‘transcription factor bHLH3’ in CHP leaves compared with that in L2025 leaves, and no significant differences for the other transcription factors (Fig. [96]6). For the WD40 family genes, there were significant differences for expression of the ‘VARICOSE family protein’ in CHP leaves compared with L2025 leaves; there were no significant differences for the other transcription factors (Fig. [97]7). These three MYB family, two bHLH family, and one WD40 family genes might play important roles in determining coloration of poplar, and could provide some references to further explore the