Abstract Background Anthocyanin is an important class of water-soluble pigments that are widely distributed in various tissues of plants, and it not only facilitates diverse color changes but also plays important roles in various biological processes. Maize silk, serving as an important reproductive organ and displaying a diverse range of colors, plays an indispensable role in biotic resistance through its possession of anthocyanin. However, the copy numbers, characteristics, and expression patterns of genes involved in maize anthocyanin biosynthesis are not fully understood. In this study, gene numbers, distribution, structure, cis-elements of the anthocyanin synthetic gene family were identified, and then the potential transcriptional factors were predicted by two analyzed methods. Finally, genes involved in maize silk pigment were screened by un-targeted metabolism analysis. Results Ten gene families involved in the maize anthocyanin biosynthesis pathway were identified, and 142 synthetic genes were obtained. These anthocyanin biosynthetic genes have high copy numbers and are normally clustered on chromosomes. The promoters of these synthetic genes contain various cis-elements and the gene expression patterns and transcriptional regulatory networks were analyzed. These genes are distributed on different chromosomes and gene expression patterns vary across different tissues in maize. Specifically, these genes often exhibit higher expression in the stem, leaves, and seeds. Ten highly expressed genes in silks were identified. Based on un-targeted metabolites detection in the silks of four maize representative inbred lines with different colors, two main differential anthocyanin components were identified. Furthermore, the gene expression patterns of the ten highly expressed genes and their potential interacting transcriptional factors were analyzed across the four inbred lines. Conclusions The results in this study show a through picture of maize anthocyanin synthetic genes, and the structure and function of genes related to anthocyanin biosynthesis in maize could be further investigated. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-025-06053-4. Keywords: Maize, Anthocyanin, Biosynthetic genes, Gene expression, Silk Background Anthocyanins, a subclass of proanthocyanidins featuring a 6-carbon (C6) to 3-carbon (C3) to 6-carbon (C6) skeleton, represent a significant category of water-soluble pigments that are naturally occurring in various plant parts, including stems, leaves, flowers, and fruits [[38]1]. Six forms of anthocyanins, including cyanidin, peonidin, delphinidin, malvidin, petunidin, and pelargonidin, are widely distributed in plants, and they facilitate diverse color changes and play vital roles in visual attractiveness and ecological adaptability [[39]2, [40]3]. The diversity of anthocyanins not only enriches the appearance of plants but also plays a significant part in several biological processes [[41]4], such as drought, low temperature, salt stress, and pathogenic attacks [[42]5]. For example, over-expressing wheat R2R3-MYB gene TaPL1 (PURPLE PLANT1) in Arabidopsis led to increased anthocyanin glycoside content of leaves and enhanced drought resistance [[43]6]. In Brassica napus, overexpressing AtDFR of Arabidopsis elevated anthocyanin glycoside content and enhanced salt stress resistance [[44]7]. In addition, overexpression of MdATG18a in Arabidopsis and apples increased tolerance to N-depletion and induced significant anthocyanin accumulation [[45]8]. Besides increasing abiotic stress, over-expression of anthocyanin biosynthetic genes could enhance biotic stress resistance. For example, over-expression of maize leaf color gene (Lc) in apples enhanced resistance to fungal pathogens [[46]9], and silencing of GhCHS, a key enzyme in flavonoid biosynthesis, enhanced susceptibility to yellowing disease [[47]10]. The biosynthesis and metabolism of anthocyanins fall within the flavonoid metabolic pathway. The biosynthesis of plant flavonoids is a complex process that involves the participation of multiple enzyme genes, intricate regulatory relationships, and recognized pathways for biosynthesis [[48]11–[49]13]. In the initial phase of this metabolic pathway, the phenylalanine is first catalyzed by phenylalanine lyase (PAL), cinnamon 4-hydroxylase (C4H), and 4-coumarate: CoA ligase (4CL) facilitates the formation of 4-coumaroyl-CoA, which is then converted into naringenin by chalcone synthase (CHS) [[50]14], and chalcone isomerase (CHI) [[51]15, [52]16] to produce naringenin. Naringenin is an intermediate metabolite of flavonoids, and its synthesis is catalyzed by flavonoid 3-hydroxylase (F3H), which produces dihydrokaempferol. Subsequently, dihydro quercetin and dihydro succinate were produced by catalyzing flavonoids 3 ‘-hydroxylase (F3’H) and 3’-5’ -hydroxylase (F3’5’H), respectively. The synthesis of flavonoids and anthocyanins is catalyzed by three dihydroflavonoid-flavonol synthase (FLS) and dihydroflavonol-4-reductase (DFR) enzymes. Ultimately, anthocyanidin synthase (ANS) facilitates the formation of anthocyanins through the catalysis of anthocyanins [[53]17, [54]18] (Supplementary Fig. [55]1). In Arabidopsis, the anthocyanin biosynthetic genes have been extensively studied, including their copy numbers in the genome and their functions in anthocyanin biosynthesis. For instance, multiple genes encoding PAL, C4H, and 4CL [[56]19–[57]23] were cloned and identified, including four genes encoding PAL isoforms. The analysis of mutants and gene expression experiments conducted under conditions of nitrogen depletion and hypothermia demonstrated that the two isoforms, PAL1 and PAL2, exhibited notable advantages in the biosynthesis of flavonoids. Similarly, 4CL3 is involved in the flavonoid pathway, while 4CL1 and 4CL2 are associated with forming hydroxycinnamic acid derivatives. Compared to the genes encoding PAL and 4CL, only one gene was identified in the Arabidopsis genome as encoding C4H. Furthermore, the research team identified extensive gene families of anthocyanin biosynthetic genes in various plants, representing both model and horticultural species. For illustration, the CHS gene family has been characterized in Phalaenopsis Aphrodite [[58]24], Solanum melongena [[59]14], and Begonia grandis [[60]25]. In contrast, the CHI gene family has been identified in Morus Multicaulis [[61]26] and Dracaena Cambodiana [[62]27]. There are significant differences in the size and composition of each plant species. Therefore, studying different gene families in other species will aid us in understanding their evolution and function. In addition to the biosynthetic genes in these pathways, transcription factors (TFs), including MYB, bHLH, and WD40, were identified as binding specifically to the regions of anthocyanin biosynthetic genes that regulate their expression, thereby influencing the synthesis of anthocyanins [[63]28–[64]31]. The MBW complex is constituted by R2R3-MYB proteins, MYC family bHLH proteins, and WD40 proteins and plays a pivotal role in the biosynthesis of anthocyanins [[65]32, [66]33]. The WD40 gene pale aleurone color1 (PAC1) isolated from maize was the plant’ first identified WD40-type TF. As a WD40 protein, PAC1 activates anthocyanin biosynthetic genes in maize seeds. Furthermore, PAC1 forms a complex with MYB and bHLH proteins, which jointly regulate the expression of biosynthetic genes. Meanwhile, scientists also isolated TRANSPARENT TESTA GLABRA1 (TTG1) transcription factors from Arabidopsis and maize, which generally form complexes with bHLH proteins and MYBs to regulate anthocyanin synthesis. As one of the most significant cereal crops, maize (Zea mays L.) provides food, animal feed, and biofuel globally [[67]34]. Until now, breeders have developed numerous maize varieties with distinct seed pigment deposition by using different methods [[68]35–[69]37], determined by the accumulation of various types and levels of flavonoid compounds, including anthocyanins, phlobaphenes, carotenoids, and others. In maize seeds, anthocyanins are primarily concentrated in the aleurone layer, and several biosynthetic genes have been identified, for example, C2 (Colorless2) [[70]38] and WHP1 (White pollen1) [[71]39]. Dihydroflavonol 4-reductase (DFR) [[72]40] which A1 encodes in corn, promotes the conversion of 4-carbonyl groups in CRC, resulting in the formation of 3-deoxy flavonol 4-alcohols (such as flavonoids, which are subsequently polymerized into phlobaphenes) and 3-hydroxy flavonols 4-ols (such as leucoglycosides, which further stimulate the hydrogen cyanide reaction). The enzyme anthocyanidin synthase (ANS), also identified as leucoanthocyanidin dioxygenase (LDOX), is responsible for converting leucocyanidin to anthocyanidins. Its encoding is designated as A2 in maize, as evidenced by previous research [[73]41, [74]42]. Maize silk is a vital reproductive organ and a rich source of color and plays an essential role in disease resistance [[75]43, [76]44]. Nevertheless, the function of some genes involved in the biosynthesis of anthocyanins remains uncertain, seriously hindering the further investigation of molecular mechanisms of anthocyanin biosynthesis. By homologous comparison of anthocyanin pathway genes in Arabidopsis, the research identified ten families of genes involved in the biosynthesis of anthocyanins in the maize genome. A comprehensive examination of the gene structural characteristics, patterns of evolutionary relationships, cis-elements, and expression patterns was undertaken. Based on the genome identification of anthocyanin pathway genes, four maize inbred lines with different silk colors were selected to check un-targeted metabolites. Differential metabolites were identified by comparing the untargeted profiling. This study investigated the expression patterns of anthocyanin biosynthesis-related genes in the silks of various inbred lines. The findings of this study offer a comprehensive genomic analysis of anthocyanin pathway genes in maize, identifying pivotal genes within this pathway. This establishes a foundation for advancing research on anthocyanin synthesis. Materials and methods Plant materials Four inbred lines of colored silks of maize were obtained from the University of Science and Technology Beijing collection, designated as B112, B286, B440, and B632. The seeds were planted in the Beijing Pinggu experimental field. Roots, stems, and leaves of B73 at the stage of 10th leaves and silks 5 days of 5-day unpollinated. Following rapid freezing with liquid nitrogen, the samples were stored at -80 °C for subsequent gene expression pattern analysis. The unpollinated silks from the remaining four inbred lines were harvested and stored at -80 °C for subsequent gene expression analysis of the identified biosynthesis candidates. Identification of gene family members and sequence analysis The gene sequences of the biosynthetic pathways in A. thaliana and rice genome were obtained from the TAIR ([77]http://www.arabidopsis.org) and Oryzabase ([78]https://shigen.nig.ac.jp/shigenjabrice/) database (Supplementary Table [79]S1). The BLAST program (E-value < 1E-50) in TBtools (v1.098745) software [[80]45] was used to identify homologous genes in the maize genome [[81]46, [82]47] by querying the protein sequences of the Arabidopsis and rice genes. Protein sequences were aligned using ClustalW in MEGAX. TBtools trimmed the alignment to remove gaps. Phylogenic analysis of these genes was performed using the Neighbor-Joining (NJ) method in MEGAX. The phylogenetic tree was visualized using the iTOL online tool ([83]https://itol.embl.de/itol.cgi). Chromosome localization and prediction of cis-elements All the identified gene family members’ chromosome localization was retrieved from the MaizeGDB database and mapped using MapChart (v2.32) ([84]https://www.mapchart.net). The gene promoter sequences (2000 bp upstream of the transcription start site) for all the genes were extracted and analyzed using the Plant CARE tool ([85]http://bioinformatrics.psb.urgent.be/webtools/plantcare/HTML/) to identify cis-elements. The results were visualized using TBtools [[86]45]. Gene expression pattern analysis of anthocyanin biosynthetic gene and visualization Expression pattern data for anthocyanin biosynthetic genes in 21 specific tissues were obtained from the MaizeGDB database (Supplementary Table [87]S2). Heat maps of gene expression data were constructed by using TBtools. Total RNA was extracted using TransZol (TransGen Biotech, Beijing, China) from distinct tissues (including roots, stems, and leaves of 10-leaf seedlings) and B73 silk that had not been pollinated for five days. Total RNA was inversely transcribed into cDNA using an all-in-one RT Super-Mix kit (Vazyme, Nanjing, China). Based on B73 gene expression values, ten genes, including Zm00001eb245960 (ZmPR1), Zm00001eb159020 (ZmA1), Zm00001eb062510, Zm00001eb346150 (ZmCYP47), Zm00001eb394430 (ZmFNS1), Zm00001eb185250 (ZmPAL5), Zm00001eb187910, Zm00001eb431510, Zm00001eb198080 (ZmCHLS7), and Zm00001eb142000 with high expression values in silk tissue were randomly selected for performing RT-qPCR analysis in the four represented inbred lines. Gene-specific primers were devised utilizing the qprimerDB software ([88]https://qprimerdb.biodb.org/analysis). Quantitative real-time polymerase chain reaction (RT-qPCR) was performed on the Quantitative Real-Time PCR System 5 (ABI, MA, USA) using the TaKaRa Premix Ex Taq with ZmCAT as an intra-assay control. Three replicates were conducted for each experiment, with the data analyzed using the 2^^−ΔΔCt _method [[89]48]. Gene structure and conserved motif analysis To analyze the conserved motifs of the identified genes, MEME software ([90]http://meme-suite.org/) was used with the default parameters to identify up to 10 motifs. Domain names were determined with the NCBI CD-Search Tool([91]https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). Gene structure prediction was conducted using gene annotation files from the Ensembl Plants database ([92]https://plants.ensembl.org/index.html). Finally, the conserved motif and genetic structure were comprehensively analyzed using the Mycobacterium TB tool. GO enrichment and KEGG pathway enrichment analysis GO Enrichment analysis was conducted using GO software ([93]https://biit.cs.ut.ee/gprofiler/gost), covering three distinct categories: biological processes (BP), cellular component (CC), and molecular function (MF). The significance of the results was evaluated using a hypergeometric test, and multiple testing corrections were conducted using the Benjamini-Hochberg method, with a significance threshold of p < 0.05 for a corrected q-value. KEGG pathway enrichment analysis was conducted utilizing the orthology-based annotation system ([94]http://kobas.cbi.pku.edu.cn/). The list of genes used was the same as in the GO analysis. Specific background gene sets were selected, and a hypergeometric test was applied to assess the abundance of each pathway. The Benjamini-Hochberg method was employed for calibration purposes to control the analysis of multiple tests, with a corrected p-value of less than 0.05 representing a significant threshold. Subcellular localization of selected genes The subcellular localization of Zm00001eb049660, Zm00001eb149150, and Zm00001eb200050 genes was established in Nicotiana benthamiana (tobacco) cells through homologous recombination with analogous coding sequences. These genes’ full-length coding sequences (CDS) were cloned into a pJG186-35 S-eGFP vector downstream of the 35 S promoter. Nicotiana benthamiana plants were then infected with Agrobacterium strain EHA105, carrying the structure. After a 36-hour incubation, GFP fluorescence signals were identified and documented using a scanning probe microscope with an excitation wavelength of 448 nm (Leica SP8). Empty pJG186-35 S eGFP vectors were used as controls for evaluating background fluorescence. Potential transcriptional regulation analysis of biosynthetic genes To investigate the potential transcriptional regulation patterns of transcription associated with these biosynthetic genes, the PlantTFDB database ([95]http://planttfdb.gao-lab.org/) was used for the first time to predict transcription factors related to the genes. The gene expression data were further analyzed to identify potential interaction relationships. This was achieved using a Triple Gene Mutual Interaction (TGMI) algorithm [[96]49] to examine the identified three-gene interactions and biosynthetic genes. Differential metabolite identification in silks of four maize inbred lines The Silks of the four inbred lines were subjected to cryogenic grinding (MM 400, Retsch, Germany) and freeze-drying (Alpha 3–4 LSC, CHRIST, Germany) to facilitate the detection of untargeted metabolism. The metabolisms were extracted as previously described [[97]50]. Then, the UHPLC-Q-TOF-MS system [[98]51] was performed to detect the metabolisms. Three biological replicates were used for metabolism detection. The raw data of metabolisms were finally collected by the metabolomics software Mass Profiler Professional Version 15.0 (MPP). The differential metabolites among the four inbred lines were further analyzed. The numerical values of each metabolite from B112 were used as a control. The students’ t-test processed the data, and principal component analysis (PCA) and clustering analysis were conducted using MetaboAnalyst 5.0 ([99]https://www.metaboanalyst.ca/). Gene expression analysis of anthocyanin biosynthetic genes and TFs in silks with different colors To better predict the candidate genes for anthocyanin synthesis in maize silk, the genes with high silk expression and the predicted TFs were selected for a qRT-PCR experiment in the four inbred lines. The gene-specific primers are also provided in Supplementary Table [100]S3. Results Identification of gene families involved in the anthocyanin biosynthetic pathway in maize One hundred forty-two biosynthetic genes were identified from 10 gene families within the maize genome (Fig. [101]1, Supplementary Table [102]S4). Of these, 52 were previously annotated as known synthase genes, and the remaining 90 were designated with the prefix “Zm” based on their specific chromosomal locations. A comparison of the number of gene families in maize and Arabidopsis revealed that Arabidopsis has a relatively limited number of genes, with an average of 28 genes per family and only three members (Supplementary Table [103]S1). In contrast, the number of genes involved in anthocyanin biosynthesis is considerably higher in maize, with an average of 14 members per gene family (Supplementary Table [104]S4). The number of members in different gene families ranged from 2 to 35, with 35 being recorded for the F3’H gene family and a minimum of two observed in the F3H gene family. Fig. 1. [105]Fig. 1 [106]Open in a new tab The genes’ distribution in the anthocyanin biosynthesis pathway in the maize genome. Chromosome numbers are shown on the left side of each chromosome. Genes are labeled on the right side of the chromosome. Characters with different colors represent ten types of anthocyanin biosynthetic genes. The leftmost scale bar represents the chromosome length (Mb) Next, we show where each gene is located in the chromosome. We found 142 genes on ten chromosomes, each unevenly distributed from 5 to 24 genes (Fig. [107]1). these genes also exhibit distinct clustering patterns on chromosomes. The ZmF3’H and ZmDFR gene families exhibited the highest concentrations, while the ZmANS gene family demonstrated the lowest concentrations. It is noteworthy that the ZmPAL3-ZmPAL5 genes of the PAL family and the ZmCHLS7-ZmCHLS10 genes in the CHLS family were found to be contiguously distributed (Fig. [108]1). To provide further insight into the evolutionary relationships between the genetic family members of Arabidopsis and their homologs, phylogenetic trees were constructed (Supplementary Fig. [109]2). The ZmPAL, Zm4CL, ZmCHS, ZmF3’5’H, and ZmF3’H families had two evolutionary branches. At the same time, the remaining genes belonged to only one sizeable evolutionary branch (Supplementary Fig. [110]2). In the ZmANS family, one gene, Zm00001eb073030, was evolutionarily distant from the other genes. For Zm4CL, two Arabidopsis homologous genes were grouped in one evolutionary branch, while the other members fell into another branch. Analysis of cis-elements in gene promoters of the identified biosynthetic genes We analyzed the cis-elements in 142 gene promoter regions to elucidate gene structure better. This resulted in identifying 24 distinct cis-elements distributed across the promoter regions (Supplementary Fig. [111]3). Each gene contains 2–41 cis-elements. These included fundamental cis-elements, such as MYB binding sites, G-boxes, GC-motifs, and TATA-boxes (Fig. [112]2). The lowest number of cis-elements was observed in Zm00001eb159020, Zm00001eb177240, and Zm00001eb200040. In contrast, the highest number of these elements was identified in F3’5’H Zm00001eb031040, totaling 41. In addition to the fundamental cis-acting elements, three cisgender factors involved in growth, environmental stress, and hormone response pathways were identified. These included G-box, TCA element, ABRE, TGACG-motif, AACGG -motif, TGA element, and MBS. In addition to the AACGG -motif, the other cis-elements were identified, such as P-boxes, H-boxes, and G-boxes, which were explicitly recognized and bound by MYB TFs. The cis-elements were classified into three categories based on their related functions: plant growth and development, plant hormone response, and biotic and abiotic stress responses (Fig. [113]2B). Fig. 2. [114]Fig. 2 [115]Open in a new tab The cis-elements identified within 2.0 kb promoter regions of anthocyanin biosynthetic genes using Plant CARE. (A) The number of cis-elements identified by Plant CARE. (B) Three most frequent categories of cis-acting elements in biosynthetic genes Gene expression pattern analysis for the identified biosynthetic genes We conducted an in-depth analysis of their gene expression patterns using RNA-Seq data from 21 diverse tissue types to obtain comprehensive expression profiles for the identified biosynthetic genes. Based on the resulting expression patterns, the genes were classified into five groups, primarily comprising those expressed in roots, stems, leaves, seeds, and silks (Fig. [116]3A). Of the five groups, 44 genes exhibited high expression levels in roots, 8 in stems, 17 in leaves, 8 in seeds, and 17 in silks. Consistent with previous studies, the core genes ZmPAL9(Zm00001eb247630), Zm00001eb316920, and Zm4CL2 (Zm00001eb233720) in the phenylpropanoid pathway were found to be highly expressed in the majority of stem and root tissues, due to their significant lignification. Fig. 3. [117]Fig. 3 [118]Open in a new tab Gene expression in the anthocyanin biosynthesis pathway in different tissue types. (A) Heat map representation of biosynthetic genes across various tissues and developmental stages. The FPKM values were transformed in log2, and a heat map was generated using TB tool software. The bar at the side represents log2 transformed values. (B) The qRT-PCR analysis of genes in five tissues. The error bars indicate the mean ± SD (n = 3) Based on the gene types, we selected ten genes with high expression in silks for validation in different tissues of B73 using qRT-PCR. We found seven genes, ZmPAL5, ZmCYP47, Zm00001eb187910, ZmFNS1, ZmPR1, ZmA1, and Zm00001eb431510 exhibited the highest expression levels in silks, which correlated with the data obtained from RNA-seq. The other two genes, ZmCHLS7 and Zm00001eb142000 displayed relatively lower expression in leaves followed by silks, and Zm00001eb062510 expressed relatively more in seeds followed by silks (Fig. [119]3B). Analysis of gene motif composition and gene structure Based on the phylogenetic tree analysis, we identified 40 new genes (Supplementary Table [120]S5). To elucidate the protein structure of the newly discovered genes, we investigated the sequence, domain conservation, and gene structure of 40 genes (Fig. [121]4). We found up to 10 highly conserved sequences in each gene. The results revealed that proteins within the same group exhibited comparable motif compositions, indicating that these anthocyanin-related genes may have analogous functional functions. Additionally, we examined the intron and exon compositions based on gene annotation files to characterize the gene structure of anthocyanin-related genes. The range of introns per gene is between two and five. Notably, considerable variation was observed in the gene structure of the biosynthetic gene families. All genes contained one or two non-coding regions, which may be attributed to a dearth of annotation data within the corresponding untranslated region (UTR) in the GFF file. Fig. 4. [122]Fig. 4 [123]Open in a new tab Analysis of conserved motifs and gene structure in anthocyanin biosynthetic genes in maize. (A) The motif composition of anthocyanin biosynthesis-related genes was analyzed and depicted using colored boxes to represent ten distinct conserved motifs. (B) The gene structures were examined, highlighting the arrangement of introns and exons in these genes. The representation provides insight into the structural features associated with anthocyanin synthesis Prediction of new gene KEGG pathway analysis and GO analysis To better understand the role of these genes in biological networks, enhanced analysis of KEGG function was performed. The enriched pathways covered several critical biological processes and metabolic pathways; a total of 12 significant pathways were selected (Supplementary Table [124]S6). Notably, these genes are predominantly linked to the biosynthesis of secondary metabolites (Fig. [125]5A). The enrichment analysis highlights their significant involvement in Phenylpropanoid Biosynthesis, Biosynthesis of Secondary Metabolites, biosynthesis of a stile, Diarylheptanoid and Gingerol Biosynthesis, and biosynthesis of Ubiquinone and Other Terpenoid-Quinone Biosynthesis. The discovery of these genes indicates that they are integral to phenylpropanoid biosynthesis. Fig. 5. [126]Fig. 5 [127]Open in a new tab KEGG pathway and GO functional enrichment analysis. (A) KEGG pathway enrichment analysis of the relevant genes. The paths are represented by points on the vertical axis, with the enrichment factor shown on the horizontal axis and point sizes indicating the number of genes involved. The top 20 enriched pathways are displayed, with p-values < 0.05. (B) GO functional analysis of genes, showing enrichment in biological processes, cellular components, and molecular functions The corresponding genes were analyzed by GO enrichment, the dot size representing the number of genes associated with each GO item, and the dot color reflecting the p-value range. The results, shown for p-values < 0.05, highlight the significant enrichments. Gene binding patterns were analyzed alongside KEGG enrichment, revealing conserved motifs at gene binding sites and identifying genes associated with these predicted binding patterns. The GO functional enrichment indicates that these genes are primarily related to molecular functions and biological processes. Specifically, 40 predicted genes exhibit diverse protein functions (Supplementary Table [128]S7). In biological processes, significant functions include the secondary metabolic process (GO:0019748), secondary metabolite biosynthetic process (GO:0044550), phenylpropanoid metabolic process (GO:0009698), polyketide biosynthetic process (GO:0030639), polyketide metabolic process (GO:0030638), flavonoid biosynthetic process (GO:0009813), flavonoid metabolic process (GO:0009812), single-organism metabolic process (GO:0044710), and single-organism biosynthetic process (GO:0044711) (Fig. [129]5B). For cellular components, significant functions include an integral component of membrane (GO:0016021) and membrane part (GO:0044425) (Fig. [130]5B). For molecular functions, significant activities include oxidoreductase activity, binding to or reduction of molecular oxygen to paired donors (GO:0016705), monooxygenase activity (GO:0004497), iron ion binding (GO:0005506), heme binding (GO:0020037), tetrapyrrole binding (GO:0046906), and oxidoreductase activity (GO:0016491). Subcellular localization To explore the subcellular localization of new genes, we selected Zm00001eb049660, Zm00001eb149150, and Zm00001eb200050, which predicted that they are F3’H, DFR, and CHS-type genes. The fusion protein was expressed in vivo through the infection of tobacco plants (Nicotiana benthamiana) with the Agrobacterium tumefaciens strain GV3101. The fluorescent signal emitted by the fusion protein was distinguishable from that produced by the pJG186-35 S-eGFP control, displaying a specific localization at the cell membrane. These observations indicate that the fusion proteins are predominantly expressed in the cell membrane and Zm00001eb049660 is expressed in both the membrane and the nucleus (Fig. [131]6). Fig. 6. [132]Fig. 6 [133]Open in a new tab Subcellular localization of GFP fusion proteins. The GFP fusion proteins 35 S-GFP, Zm00001eb049660, Zm00001eb149150, and Zm00001eb200050 were transiently expressed in Nicotiana benthamiana epidermal cells. Fluorescence microscopy images are shown with a scale bar of 50 μm Prediction of the transcriptional regulatory network in the anthocyanin biosynthesis pathway In this study, two methods, PlantTFDB and TGMI, were employed to comprehensively predict the transcriptional regulatory network in the anthocyanin biosynthesis pathway of maize. Ultimately, 137 transcription factors were identified as potential interactors with 99 biosynthetic genes (Supplementary Table [134]S8). In previous studies, It was well-established that MYB, bHLH, and bZIP were the three predominant TFs regulating flavonoid synthesis within the regulatory network [[135]52] (Fig. [136]7). Accordingly, these three transcription factors were selected for further analysis. Finally, we identified 40 TFs that might have potential in gene expression regulation of 62 biosynthetic genes (Table [137]1; Fig. [138]7A). Fig. 7. [139]Fig. 7 [140]Open in a new tab Prediction of transcriptional regulatory networks between the anthocyanin biosynthetic genes and TFs using RNA-Seq data. (A) Regulatory relationship of the three main types of the TF and all the corresponding monolignol biosynthetic genes generated. Green blocks represent statistically significant interactions. (B) Regulatory network of the three main types of the TFs and the corresponding specific monolignol biosynthetic genes. The dark blue nodes represent monolignol biosynthetic genes. Green, purple, and orange nodes are TFs Table 1. Three main types of predicted TFs associated with the biosynthetic genes in maize Gene family Gene ID Gene symbol Reference bHLH Zm00001eb061300 bHLH17 [[141]53] Zm00001eb050790 bHLH43 [[142]54] Zm00001eb059460 bHLH47 [[143]55] Zm00001eb155400 bHLH57 [[144]56] Zm00001eb114530 bHLH59 [[145]57] Zm00001eb332400 bHLH65 [[146]56] Zm00001eb323880 bHLH90 [[147]58] Zm00001eb315910 bHLH106 [[148]44] Zm00001eb317460 bHLH136 [[149]59] Zm00001eb024330 bHLH149 [[150]60] BZIP Zm00001eb133640 bZIP3 [[151]53] Zm00001eb316720 bZIP7 [[152]61] Zm00001eb253370 bZIP13 [[153]62] Zm00001eb296340 bZIP18 [[154]63] Zm00001eb092090 bZIP23 [[155]64] Zm00001eb069140 bZIP27 [[156]65] Zm00001eb060860 bZIP41 [[157]66] Zm00001eb330680 bZIP58 [[158]67] Zm00001eb235510 bZIP61 [[159]68] Zm00001eb209070 bZIP84 [[160]69] Zm00001eb373300 bZIP100 [[161]70] Zm00001eb150010 bZIP102 [[162]64] Zm00001eb231360 bZIP160 [[163]71] MYB Zm00001eb136720 MYB6 [[164]72] Zm00001eb321920 MYB14 [[165]73] Zm00001eb148320 MYB15 [[166]74] Zm00001eb187810 MYB19 [[167]75] Zm00001eb335060 MYB22 [[168]76] Zm00001eb103730 MYB31 [[169]74] Zm00001eb216040 MYB37 [[170]77] Zm00001eb247570 MYB41 [[171]78] Zm00001eb255300 MYB48 [[172]76] Zm00001eb041450 MYB78 [[173]79] Zm00001eb060980 MYB80 [[174]53] Zm00001eb248590 MYB89 [[175]80] Zm00001eb066260 MYB133 [[176]53] Zm00001eb150140 MYB138 [[177]81] Zm00001eb417620 MYB149 [[178]76] Zm00001eb307400 MYB153 [[179]70] Zm00001eb288730 MYBR3 [[180]71] [181]Open in a new tab The MYB regulatory network encompasses 65 nodes, of which 17 are transcription factors (TFs) and 44 are structural genes. The central TF Zm00001eb417620 interacted with up to nine downstream genes, while peripheral TFs, Zm00001eb136720, Zm00001eb148320, and Zm00001eb288730 were found to interact with only one downstream gene, indicating a potentially more constrained regulatory framework. Similarly, 21 nodes, consisting of 10 TFs and 19 structural genes, were found in the bHLH regulatory network. The core TF Zm00001eb155400 regulated up to five genes, while marginal TFs like Zm00001eb315910 and Zm00001eb323880 owned the smallest regulatory scope of only one downstream gene each. The bZIP regulatory network comprised 34 nodes, 12 TFs, and 25 structural genes. The central TF Zm00001eb373300 exhibited the most significant number of downstream genes (six genes). In contrast, TFs situated at the periphery, such as Zm00001eb231360 and Zm00001eb316720, displayed a markedly diminished regulatory impact, with each TF associated with a mere one downstream gene (Fig. [182]7B). Among the highly expressed biosynthetic genes in silks, five genes, ZmA1, Zm00001eb062510, ZmPR1, ZmPAL5, and ZmCHLS7, were found to have two types of regulatory relationships with their corresponding TFs. The first type was that one biosynthetic gene corresponds to only one TF, such as ZmA1, ZmPR1, ZmCHLS7, and Zm00001eb062510 being regulated by TFs ZmbHLH57, ZmMYB41, ZmMYB138, and ZmbHLH149, respectively (Fig. [183]7B). The other type was that multiple TFs, such as ZmCYP47 regulated one gene, it was found co-regulated by ZmMYB19 and Zm00001eb255300. These results showed that different biosynthetic genes had different regulatory mechanisms, and they worked together to control the anthocyanin synthesis in the maize genome. Determination of metabolite content in the silks of four representative maize inbred lines To identify the substances that caused silk color differences among the four maize inbred lines (Fig. [184]8A), an un-targeted metabolomic analysis was initially conducted on the silk. A total of 427 metabolites from 12 samples of the four inbred lines were detected (Supplementary Table [185]S9), and the detected substances were classified into nine types, including flavonoids, organic acids, amino acids, terpenoids and other compounds (Fig. [186]8B). Among these types, the type of the flavonoids was the most abundant superclass (10.75%), followed by organic acids and lipids (8.6%), carbohydrates (7.53%), and amino acids (6.45%). Principal component analysis (PCA) of the 27 metabolites detected revealed that the samples exhibited significant differentiation from different lines (Fig. [187]8C). PC1 and PC2 collectively accounted for 57.38% and 32.47% of the total mutations, respectively, indicating that these components effectively distinguished the samples. The PCA results showed that metabolite-abundant differences between B440 and B632 were relatively minor, whereas B112 vs. B286 and B286 vs. B440 were significant. These mean that some primary differential metabolites exist between purple and green silk varieties. After analyzing the differential metabolites among these inbred lines, we identified a total of 213 differential metabolites (Fig. [188]8D, Supplementary Fig. [189]4). For B112 vs. B286, we detected 124 differential metabolites (64 up-regulated, 60 down-regulated); for B112 vs. B632, we found 146 differential metabolites (47 up-regulated, 99 down-regulated); and for B112 vs. B440, we identified 173 differential metabolites (77 up-regulated, 96 down-regulated). The Venn diagram showed that each line exhibited its unique set of differential metabolites and 52 shared differential metabolites, which also serve as potential biomarkers for distinguishing maize materials (Fig. [190]8D). Fig. 8. [191]Fig. 8 [192]Open in a new tab Detection of un-targeted metabolites in the silks with different colors in the four representative maize inbred lines. (A) The silk color phenotype of the four inbred lines. (B) The classification of all the identified metabolites. (C) PCA analysis result of all the identified metabolites. (D) The Venn Diagram of the differential metabolites’ numbers after comparing the detecting results of the four inbred lines Analysis of differential metabolites in different colored silks We employed hierarchical clustering analysis (HCA) to assess the similarities and distinctions among all the 52 differential metabolites. Correlation scores between different varieties were evaluated based on metabolite values. The results demonstrate that the four inbred lines can be classified into two primary groups: the B112 samples, which constitute a discrete group, and the B286, B440, and B632 samples, which form a distinct group. The heatmap revealed that, compared to the B112 variety, the other three varieties exhibited relatively higher levels of D-(-)-Erythrose (120.042_1.14), DL-Malic acid (134.022_1.29), coumaroyl (356.108_6.64), curcumin (368.111_6.21), ginger glycolipid B (724.509_10.63), cyanidin-3-O-(6-malonyl glucoside) (534.099_7.42), and cyanidin 3-(3,6-dimalonylglucoside) (620.101_8.23). Conversely, the lines with green silk exhibited relatively higher levels of naringenin-7-O-glucoside (434.121_8.51), isoschaftoside (564.147_9.53), and 2-O-Rhamnosylvitexin (578.155_11.02) (Fig. [193]9A). Comparing the relative levels of anthocyanin compounds, we found that B112 did not contain two compounds. In contrast, B286 exhibited the highest content, followed by B440 and B632, consistent with the filament color trend, with purple filaments displaying the highest flavonoid content, followed by mixed-color filaments. The two major anthocyanin components (534.099_7.42 and 620.101_8.23 were tentatively identified as cyanidin-3-O-(6-malonyl glucoside) and cyanidin 3-(3,6-malonyl glucoside), which are associated with color (Fig. [194]9B). In this study, identification of these two types of anthocyanins suggested that some specific biosynthetic genes and regulated networks controlling the silk colors of maize. Fig. 9. [195]Fig. 9 [196]Open in a new tab Analysis of the differential metabolites in silks of the four inbred lines. (A) Heatmap of the 52 Identified differential metabolites in silks of the four inbred lines. (B) The content of the two main differential anthocyanin components in the four inbred lines Gene expression analysis of anthocyanin biosynthetic genes and TFs in silks To gain further insight into the expression patterns of biosynthetic genes and their corresponding transcription factors (TFs) in silk with varying anthocyanin contents, we analyzed the expression of 46 of these genes in four inbred lines (Fig. [197]10). The expression patterns could be classified into two types. The first type is biosynthetic genes and their corresponding TFs having the same expression pattern, such as ZmPAL5, Zm00001eb187910, Zm00001eb062510, and Zm00001eb0431510 (Fig. [198]10A). For ZmPAL5 and Zm00001eb187910, they highly expressed in B286 and lower in B632, and the corresponding TFs exhibited the same expression tendency. The second type was the two types of genes having different expression patterns, such as the CHLS gene ZmCHLS7 and the F3’5’H biosynthetic gene ZmPR1(Fig. [199]10B). The CHI biosynthetic gene Zm00001eb062510 and the DFR biosynthetic gene Zm00001eb431510 highly expression in B440. At the same time, their corresponding TFs, ZmbHLH149 and ZmMYB80, did not exhibit relatively higher expression than others. Furthermore, the ANS biosynthetic gene Zm00001eb142000 and the transcription factor ZmbHLH90, along with the DFR biosynthetic gene ZmA1 and its transcription factor ZmbHLH90, exhibited a negative regulatory pattern, showing expression opposite to that of the biosynthetic genes, with the lowest relative expression in B112. Also, the ZmCYP47 and the FNS1 genes showed expression patterns different from those of their TFs. Fig. 10. [200]Fig. 10 [201]Open in a new tab Gene expression patterns of ten biosynthetic genes and predicted TFs in silks of the four inbred lines. The expression values were presented in fold change Discussion In the present study, 142 synthetic genes from ten gene families have been identified as participating in the anthocyanin biosynthesis pathway within the maize genome. Based on the phylogenetic tree analysis results, 40 new genes were identified (Supplementary Table [202]S5). Identifying new genes related to anthocyanin synthesis makes it possible to precisely regulate the anthocyanin content in maize, thereby breeding functional maize varieties with more diverse appearances and rich in anthocyanins. Two methods were used to predict 137 TFs and 99 synthesis genes (Supplementary Table [203]S8). Anthocyanins and their biosynthesis pathways are often associated with plant stress resistance. Identifying new regulatory transcription factors, especially those that regulate anthocyanin biosynthesis, can help develop maize varieties with enhanced stress tolerance. The anthocyanin biosynthesis pathway constitutes a component of the plant’s secondary metabolism network. Identifying new genes and regulatory transcription factors gave researchers a deeper understanding of the complexity and diversity of secondary metabolic pathways. Moreover, the copy number, chromosomal location, and gene expression patterns were investigated. In addition to the fundamental cis-acting elements, three cisgender factors involved in growth, environmental stress, and hormone response pathways were identified. The abovementioned elements included G-box, TCA element, ABRE, TGACG-motif, AACGG --motif, TGA element, and MBS. One of the motifs, AACGG-motif, has been identified as predominantly recognized by the R2R3-MYB TF, which was crucial for anthocyanin biosynthesis genes to facilitate transcriptional activation [[204]82]. The anthocyanin biosynthesis genes identified in maize exhibit some differences compared to those in other plant species, but overall, the core pathway for anthocyanin biosynthetic is conserved among angiosperms. The anthocyanin biosynthetic pathway is comprised of a series of critical enzymes, which are encoded by a series of conserved genes, such as CHS (chalcone synthase), CHI (chalcone isomerase), F3H (flavanone 3-hydroxylase), DFR (dihydro flavonol 4-reductase), and ANS (anthocyanidin synthase). Previous research has indicated that genes involved in flavonoid synthesis are confined to the cytoplasm, where flavonoids are synthesized and transported to the vacuole. However, recent studies have demonstrated that genes encoding enzymes involved in flavonoid synthesis, including CHS, CHI, and FLS, are not limited to the cytoplasm but are also present in the nucleus. It has been demonstrated that flavonoids in the nucleus protect DNA from damage caused by ultraviolet (UV) radiation and oxidative agents. Additionally, these compounds have been shown to directly or indirectly regulate the activity of auxin transporters and other genes involved in growth. Previous research has revealed that the enzymes responsible for the synthesis of flavonoids are not confined to the cytoplasm but are also present in the nucleus and other cellular organelles. Earlier studies have demonstrated that flavonoid synthesis proteins are present not only in the cytoplasm but also in the nucleus and other organelles. ZmF3’H, ZmDFR, and ZmCHS were also identified in the cell membrane. Additionally, ZmF3’H has been observed to be expressed in the nucleus. However, further verification is required to ascertain whether the two proteins synthesize flavonoids in this cellular compartment. We conducted an in-depth analysis of gene expression patterns using RNA-Seq data from 21 distinct tissues to obtain comprehensive expression profiles for the identified biosynthetic genes. Our findings revealed that ZmA1 exhibited low expression levels in various tissues, as evidenced by the RNA-Seq data and the qRT-PCR results. However, it demonstrated elevated expression in roots and silks. Meanwhile, Zm00001eb359030 (A4) was barely expressed in diverse tissues. Previous studies also showed that among the dihydroflavonol reductases (DFR), ZmDFR1 (A1) maintained low expression levels (average FPKM 6.05) in nearly all issues except roots throughout the whole developmental process. In contrast, the duplicate ZmDFR2 (A4) was not expressed in any tissue [[205]83]. Similarly, the current study indicated that these gene expression data could be used for further candidate gene analysis. Transcription factors (TFs), exemplified by MYB, bHLH, and WD40, have been demonstrated to bind specifically to promoter regions of anthocyanin biosynthetic genes, modulating their expression to regulate the anthocyanin biosynthesis process [[206]28–[207]31]. Various TFs regulate anthocyanin synthesis across different crops, influencing distinct biological processes. Many websites predict TFs interacting with genes, but only one model is usually used to predict them. Combining two methods for predicting the interactions between anthocyanin synthesis genes and transcription factors yielded more accurate results than a single approach. Subsequent gene expression assays also confirmed the crucial importance of the identified transcription factor and biosynthetic gene pairs as targets for further research. We can deepen our understanding of this complex metabolic pathway by identifying new biosynthetic synthesis genes and regulatory transcription factors. In particular, discovering unknown enzymes or new regulatory mechanisms can help refine existing models of anthocyanin synthesis. This deeper understanding not only helps explain the variations in color and phenotype in maize but also provides insights into the secondary metabolic pathways of other plants. In addition, we identified the substances that caused silk color differences among the four maize inbred lines (Fig. [208]8A), and an un-targeted metabolomic analysis was initially conducted on the silk. The results showed that each line exhibited its unique set of differential metabolites and 52 shared differential metabolites. We employed hierarchical clustering analysis (HCA) to assess the similarities and distinctions among all the 52 differential metabolites. Correlation scores between different varieties were evaluated based on metabolite values. Previous studies have reported that various anthocyanin types contribute to the total anthocyanin content in cereals, with flavonol 3-O-glucosides being the predominant type found in cereals [[209]84–[210]86]. In this study, identifying these two types of anthocyanins suggested that some specific biosynthetic genes and regulated networks control the silk colors of maize. The results of the RT-qPCR experiments, as illustrated in Fig. [211]10, demonstrated the expression of the aforementioned genes in the four inbred lines. The result elucidated the expression patterns of biosynthetic genes with high levels of transcription and their corresponding transcription factors (TFs) in silks with varying anthocyanin contents. The expression patterns could be classified into two types. These observations suggest that, as is the case with numerous biological processes, the anthocyanin biosynthesis process is a relatively complex one, regulated by different genetic factors and exhibits different regulatory mechanisms, the full details of which require further detailed experimental analysis [[212]87–[213]89]. Concurrently, the methodologies employed in this study can be extended to other crops. Anthocyanins are a class of secondary metabolites that perform a range of biological functions in plants, including acting as antioxidants, protecting against ultraviolet (UV) radiation, and enhancing resilience to pests and pathogens. By identifying new anthocyanin biosynthetic genes and regulatory transcription factors, we can deepen our understanding of this complex metabolic pathway. In particular, discovering unknown enzymes or new regulatory mechanisms can help refine existing anthocyanin synthesis models. This deeper understanding not only helps explain the differences in color and phenotype in maize but also provides insights into secondary metabolic pathways in other plants. By identifying new genes related to anthocyanin synthesis, it becomes possible to specifically regulate anthocyanin content in maize, enabling the cultivation of functional maize varieties with more diverse appearances and richer anthocyanin content. Such maize varieties may have better market value and nutritional properties, especially in terms of enhancing antioxidant capacity and preventing aging. Conclusion The present study has identified ten gene families and three major transcription factors (TFs) that are involved in the pathway of maize anthocyanin biosynthesis. A total of 142 genes associated with biosynthesis were subjected to analysis, with particular attention paid to their varying copy numbers and distribution patterns within genomes, cis-elements, and expression patterns. Phylogenetic tree analysis identified 40 new genes. Furthermore, 137 transcription factors (TFs) were identified as interacting with 99 biosynthetic genes. Among all the TFs, 40 TFs belonging to three major types were found to potentially regulate 62 biosynthetic genes. To elucidate how these anthocyanin biosynthetic genes relate with the silk color in maize, an un-targeted metabolism analysis was conducted for silks of four representative maize inbred lines. Two major anthocyanin components were screened. Finally, gene expression patterns of ten genes which highly expressed in silks and their corresponding TFs were analyzed in the silks of four inbred lines, The results demonstrated that the expression profiles of these genes were not identical. The results indicated that further investigation into the structure and function of genes related to anthocyanin biosynthesis in maize may be warranted. The identification of new anthocyanin biosynthetic genes and regulatory transcription factors will facilitate a deeper understanding of this complex metabolic pathway. In particular, discovering unknown enzymes or new regulatory mechanisms can help refine the existing anthocyanin biosynthetic models. This deeper understanding not only helps explain the variations in color and phenotype in maize but also provides insights into the secondary metabolic pathways of other plants. Electronic supplementary material Below is the link to the electronic supplementary material. [214]Supplementary Material 1^ (2.6MB, docx) [215]Supplementary Material 2^ (99.9KB, xlsx) Acknowledgements