Abstract
Background
The cytochrome P450 superfamily comprises a large group of enzymes crucial for the biosynthesis and metabolism of diverse endogenous and exogenous secondary metabolites in plants. Chrysanthemum, an ornamental genus with considerable medicinal value, is one of the most economically important floricultural crops in the world. The characteristics and functions of CYP450 genes in Chrysanthemum species, however, remain largely unknown.
Results
In this study, we identified 371 CYP450 genes in the Chrysanthemum indicum genome and classified them into 8 clans and 44 families through phylogenetic analysis. Gene duplication analysis revealed 111 genes in 47 tandemly duplicated clusters and 28 genes in 15 syntenic blocks, suggesting that extensive duplication events may account for the rapid expansion of the CiCYP450 superfamily. Additionally, extensive variation in gene structure, motif composition, and cis-regulatory elements likely enhances the functional diversity of CiCYP450 proteins. Volatile metabolomic analysis detected a total of 53 distinct volatile organic compounds across the leaves, stems, and roots of C. indicum, with 19 and 16 compounds exclusive to leaves and stems, respectively. Transcriptomic analysis identified 248 expressed CiCYP450 genes, of which 31, 40, and 88 were specifically or preferentially expressed in leaves, stems, and roots, respectively. Further correlation analyses between gene expression levels and compound contents highlighted 36 candidate CiCYP450 genes potentially responsible for the biosynthesis of 47 volatile organic compounds.
Conclusions
The genome-wide analyses of the cytochrome P450 superfamily offer essential genomic resources for functional studies of CiCYP450 genes and are significant for the molecular breeding of Chrysanthemum.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-025-11664-0.
Keywords: CYP450, Chrysanthemum indicum, Phylogeny, Volatile metabolome, Gene expression
Background
Cytochromes P450 (CYP450s) are a superfamily of heme-thiolate enzymes found across all kingdoms of life [1]. These versatile enzymes catalyze a wide range of biochemical reactions, including monooxygenation, oxidative rearrangement of carbon skeletons, and oxidative C-C bond cleavage, and thereby transform endogenous or exogenous compounds into diverse secondary metabolites [2, 3, 4]. In plants, these metabolites include membrane sterols, phytohormones (e.g., gibberellins, strigolactones, and abscisic acid), biopolymers (e.g., lignin and cutin), pigments (e.g., flavonoids and carotenoids), volatile organic compounds (e.g., terpenoids and fatty acid derivatives), and a wide range of lineage-specific secondary metabolites (e.g., taxol in Taxus and berberine in the Ranunculales) [3, 5, 6, 7]. These chemical components are crucial for plant development and adaptation to the environment. For example, lignin strengthens the cell wall and provides structural support, while cutin forms a protective barrier that limits water loss and protects against desiccation [8, 9]. Furthermore, compounds such as cyanogenic glucosides, alkaloids, flavonoids, and terpenoids are essential for plant interactions with biotic and abiotic factors [10, 11, 12, 13]. Advances in sequencing technology have greatly accelerated research on plant genomes, as well as the identification of CYP450 genes. To date, over 32,000 CYP450s, grouped into 670 families, have been named and annotated across the plant kingdom, and more than 800 of them have been functionally characterized [3]. Notably, CYPomes (i.e., the complete set of CYP450s in a given species) vary widely between species because of frequent gene duplication and loss events [14].
For instance, only 39 CYP genes were detected in the green alga Chlamydomonas reinhardtii, whereas in angiosperms they account for up to 1% of protein-coding genes, with 236 in Arabidopsis thaliana, 309 in poplar (Populus trichocarpa), and 305 in rice (Oryza sativa) [15, 16]. Land plant CYP450s are divided into 11 clans based on homology and phylogeny: seven single-family clans (CYP51, CYP74, CYP97, CYP710, CYP711, CYP727, and CYP746) and four multi-family clans (CYP71, CYP72, CYP85, and CYP86) [3, 5]. Structurally, only a few sequence motifs are highly conserved, including the heme-binding motif FxxGxRxCxG (CXG motif) in the C-terminal region, the ExxR motif in the K-helix, the PxRx motif, and the oxygen-binding and catalysis motif (A/G)Gx(E/D)T(T/S) in the I-helix [17]. The extensive diversity in the number and structure of plant CYP450s provides a molecular basis for variability in metabolite profiles, which may lead to differences in morphological or developmental traits and enable adaptation to specific ecological niches.
The genus Chrysanthemum (Asteraceae) consists of approximately 40 species and more than 30,000 cultivars, most of which are cultivated as ornamental plants of significant economic and cultural value [18]. Beyond their visual appeal, Chrysanthemum species are also renowned for their medicinal and culinary uses, with various plant parts used in traditional medicine and tea beverages [19, 20, 21]. In Chrysanthemum species, pharmacological activities are primarily linked to flavonoids, alkaloids, and sesquiterpene lactones, while the distinctive floral scent is largely attributed to terpenoids, including camphor, α-pinene, cineole, and caryophyllene [22, 23, 24, 25].
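To illustrate how the four conserved motifs listed above can be located in a protein sequence, the Python sketch below translates them into regular expressions ("x" becomes any residue, bracketed alternatives become character classes). This is not the method used in this study, which identified motifs de novo with MEME; the motif dictionary, the function names, and the toy sequence are hypothetical and chosen only for demonstration.

```python
import re

# Hypothetical regex translation of the conserved P450 motifs named in the text.
# "x" (any residue) becomes "."; "(A/G)"-style alternatives become "[AG]".
P450_MOTIFS = {
    "FxxGxRxCxG": r"F..G.R.C.G",               # heme-binding (CXG) motif, C-terminal region
    "ExxR": r"E..R",                           # K-helix motif
    "PxRx": r"P.R.",
    "(A/G)Gx(E/D)T(T/S)": r"[AG]G.[ED]T[TS]",  # I-helix oxygen-binding motif
}


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of every motif occurrence in a protein sequence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in P450_MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


def has_all_core_motifs(seq: str) -> bool:
    """True if each of the four motifs occurs at least once in the sequence."""
    return all(find_motifs(seq).values())


if __name__ == "__main__":
    toy = "MAAGGSETTLLEAARPSRWFSAGKRICAG"  # toy sequence, not a real CYP450
    print(find_motifs(toy))
    print(has_all_core_motifs(toy))
```

Fixed patterns like these only test for exact motif matches; real CYP450s often deviate at individual positions, which is one reason profile-based tools such as MEME or HMMER are preferred for genome-wide surveys.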
Although studies on closely related species have highlighted the role of CYP450 genes in the biosynthesis of several important secondary metabolites, such as artemisinin in Artemisia annua (CYP71AV1), pyrethrin I in Tanacetum cinerariifolium (CYP82Q3), and parthenolide in T. parthenium (CYP71CB1), research on the characteristics and functions of CYP450s in Chrysanthemum remains limited [26, 27, 28]. A comprehensive genome-wide identification and expression analysis of the CYP450 gene family in Chrysanthemum is therefore needed for effective gene utilization in breeding and metabolic engineering. Chrysanthemum indicum, a diploid species (2n = 2x = 18) with a genome size of approximately 3.11 Gb, has been developed as a model for the genus Chrysanthemum [29, 30]. In this study, we identified the CYP450 genes in the C. indicum genome, classified them through phylogenetic analysis, and characterized their physicochemical properties, gene structures, motif compositions, gene duplication events, and cis-regulatory elements. Furthermore, by integrating volatile metabolomic and transcriptomic data, we explored the genetic basis of volatile organic compound production in C. indicum. Our findings not only provide insights into the functions of CYP450 genes in C. indicum but also offer valuable genetic resources for future molecular breeding efforts.
Methods
Plant materials
C. indicum was identified and kindly provided by Prof. Chao Ma of China Agricultural University [30] and maintained at the National Engineering Research Center for Floriculture, Beijing Forestry University, China. Voucher specimens (BJFC 00115566) were deposited at the Beijing Forestry University Herbarium (BJFC), Beijing. Rooted cuttings of C. indicum were grown in a greenhouse under a 12-h light/12-h dark photoperiod at 25 °C and 60% relative humidity.
Identification of CiCYP450 genes
To identify the CYP450 genes in the C.
indicum genome (available at the Chrysanthemum Genome Database, http://210.22.121.250:8880/asteraceae/download/downloadPage; accessed on 27 April 2024), the Hidden Markov Model (HMM) profile of the conserved CYP450 domain (PF00067) was downloaded from the Pfam database (http://pfam.xfam.org/; accessed on 28 April 2024), and HMMER v3.3.2 (accessed on 28 April 2024) was used to search for protein sequences containing this domain [31]. In parallel, we obtained the CYP450 protein sequences of Arabidopsis thaliana, poplar, and rice from TAIR (https://www.arabidopsis.org/; accessed on 29 June 2024) and the Cytochrome P450 Homepage (https://drnelson.uthsc.edu/; accessed on 29 June 2024), and used them as queries for a local BLAST search against the C. indicum genome with an E-value ≤ 10^-5. Candidate protein sequences of 300 to 600 amino acids were further filtered using the Conserved Domain Database (CDD, https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi; accessed on 29 June 2024) [32]. In total, 371 CYP450 genes were identified in the C. indicum genome.
Analysis of physicochemical properties of CiCYP450 genes
To characterize the identified CiCYP450 protein sequences, we used the ExPASy ProtParam tool (https://web.expasy.org/protparam/; accessed on 1 July 2024) to predict their molecular weights, isoelectric points, and grand average of hydropathicity (GRAVY) [33], and the WoLF PSORT tool (https://wolfpsort.hgc.jp/; accessed on 1 July 2024) to predict their subcellular localization [34].
Phylogenetic analysis of CiCYP450 genes
To investigate the phylogenetic relationships of the CiCYP450 genes, the CYP450 protein sequences from C. indicum, A. thaliana, poplar, and rice were aligned using MAFFT v7.490 with default parameters [35]. A maximum likelihood tree was then constructed using IQ-TREE 2.1.4-beta with the parameters "-bb 1000 -nt 128" [36].
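The length filter and the GRAVY score reported by ProtParam can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used here: it assumes HMMER and BLAST hits have already been parsed into sets of protein IDs and that candidates are the union of both searches (the text does not say whether a union or an intersection was used), and it omits the CDD domain check. GRAVY is computed as the mean Kyte-Doolittle hydropathy over all residues, the definition ProtParam documents; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (the scale behind ProtParam's GRAVY score).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over standard residues."""
    # Non-standard symbols such as '*' (stop) or 'X' are skipped in this sketch.
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def filter_candidates(
    hmm_hits: set[str],
    blast_hits: set[str],
    proteins: dict[str, str],
    min_len: int = 300,
    max_len: int = 600,
) -> list[str]:
    """Keep candidate IDs (assumed: union of both searches) inside the length window."""
    candidates = hmm_hits | blast_hits
    return sorted(
        pid for pid in candidates
        if min_len <= len(proteins[pid].rstrip("*")) <= max_len
    )


if __name__ == "__main__":
    proteins = {"g1": "M" * 350, "g2": "M" * 250, "g3": "AI" * 200}
    print(filter_candidates({"g1", "g2"}, {"g3"}, proteins))
    print(round(gravy("AI"), 2))
```

A positive GRAVY indicates an overall hydrophobic protein and a negative value a hydrophilic one, which is how the range of values reported in Table S1 can be read.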
Chromosomal localization of CiCYP450 genes
The locations of CiCYP450 genes on chromosomes were obtained from the annotation file of the C. indicum genome and visualized using TBtools (accessed on 6 July 2024) [37].
Genome collinearity analyses of CiCYP450 genes
The Chrysanthemum seticuspe genome was obtained from PlantGarden (https://plantgarden.jp/; accessed on 17 May 2024), and the A. thaliana genome was downloaded from NCBI (https://www.ncbi.nlm.nih.gov/; accessed on 17 May 2024). Genome collinearity analysis was performed using MCScanX with default parameters (accessed on 6 July 2024 and 12 January 2025) [38], and the results were visualized using TBtools.
Conserved motifs of CiCYP450 genes
To analyze the conserved motifs of the CiCYP450 proteins, the MEME tool (https://meme-suite.org/meme/; accessed on 8 July 2024) was employed, with the number of motifs set to 15 [39].
Cis-regulatory element analyses of CiCYP450 genes
To identify potential cis-regulatory elements of CiCYP450 genes, the 2-kb sequences upstream of the translation start sites of CiCYP450 genes were retrieved from the C. indicum genome and submitted to PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/; accessed on 8 July 2024) [40]. All results were visualized using TBtools.
GC-MS analysis of volatiles in C. indicum
Volatiles from roots, stems, and leaves were detected using headspace solid-phase microextraction coupled with gas chromatography-mass spectrometry (HS-SPME-GC-MS), with three biological replicates per organ. Fresh samples were collected separately and sealed in 20-ml glass vials, with ethyl decanoate (1:1000) added as an internal standard. A pre-conditioned SPME fiber (DVB/CAR/PDMS, Sigma-Aldrich) was then inserted into the vial headspace and incubated at 40 °C for 40 min. After extraction, the fiber was desorbed in the GC injection port at 250 °C for 5 min.
The GC-MS analysis was performed on an Agilent 7890B-5977A GC/MSD (Agilent Technologies, USA) equipped with a DB-5MS capillary column (30 m × 0.25 mm × 0.25 μm, Agilent). The oven program started at 40 °C for 2 min, increased to 180 °C at 5 °C/min, and finally rose to 270 °C at 20 °C/min. Mass spectra were acquired over a scan range of 50–400 m/z. Compounds were identified by comparing their spectra with the National Institute of Standards and Technology database (NIST 2017) using Qualitative Navigator B.08.00 (accessed on 13 July 2024). Volatile compounds were quantified relative to the internal standard, and peak areas were normalized to percentages to calculate the amounts of volatiles, following the method of Feng et al. [41].
RNA-seq and expression analysis of CiCYP450 genes
Fresh leaves, stems, and roots of C. indicum were collected and stored at −80 °C, with three biological replicates each. Total RNA was extracted separately using the RNAprep Pure Plant Plus Kit (TIANGEN). RNA concentration was measured using the Qubit RNA Assay Kit on a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA), and RNA integrity was assessed using the RNA Nano 6000 Assay Kit on the Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). Libraries were generated using the VAHTS Universal V6 RNA-seq Library Prep Kit for Illumina® (Vazyme Biotech Co., Ltd) and sequenced on the Illumina NovaSeq X Plus platform (Illumina, San Diego, CA) as 150-bp paired-end (PE) reads by Annoroad Gene Technology (Beijing, China). Approximately 6.0 Gb of raw data was obtained per sample. To obtain clean reads, all raw reads were filtered using fastp v0.23.4 with default parameters. The clean reads were mapped to the C. indicum genome using HISAT2 v2.21 with default parameters [42], and TPM (transcripts per million) values were calculated using StringTie v2.21 [43].
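TPM normalizes read support first by transcript length and then by sequencing depth, so that values within each sample sum to one million. The Python sketch below uses the textbook count-based definition to show the arithmetic; StringTie computes TPM from its own transcript-abundance (coverage) estimates, so its values will not be reproduced exactly. The input dictionaries and the function name are illustrative only.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from read counts and transcript lengths (bp).

    Step 1: reads per kilobase (RPK) corrects for transcript length.
    Step 2: dividing by the per-million sum of RPK corrects for sequencing depth.
    """
    rpk = {tid: counts[tid] / (lengths_bp[tid] / 1_000) for tid in counts}
    per_million = sum(rpk.values()) / 1_000_000
    if per_million == 0:
        # No reads at all in this sample: every transcript gets TPM 0.
        return {tid: 0.0 for tid in rpk}
    return {tid: value / per_million for tid, value in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 10, "geneB": 20, "geneC": 0}
    lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 1_500}
    print(tpm(counts, lengths))
```

In this toy example geneB has twice the reads of geneA but is also twice as long, so both receive the same TPM; this length correction is why TPM, rather than raw counts, is used to compare genes within a sample.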
Correlation analyses between gene expression levels and volatile compound contents were performed using the corrplot package in R v4.3.2.
KEGG enrichment analysis of CiCYP450 genes
KEGG annotation information for CiCYP450s was obtained from eggNOG-mapper (http://eggnog-mapper.embl.de/; accessed on 27 August 2024) [44]. KEGG background files were then obtained from TBtools, and the enriched KEGG pathways were visualized using TBtools.
Results
Identification and phylogenetic analysis of CYP450 genes in Chrysanthemum indicum
A total of 371 CYP450 genes were identified in the C. indicum genome, with protein lengths ranging from 300 to 600 amino acids and molecular weights between 34.02 and 69.34 kDa (Table S1). CiCYP71-23 and CiCYP71-24 have the lowest isoelectric points (pI = 4.96), while CiCYP71-44 has the highest (pI = 9.82) (Table S1). The proteins also differ in hydropathicity, with GRAVY values ranging from −0.42 (CiCYP71-21 and CiCYP76-4) to 0.307 (CiCYP71-1 and CiCYP75-1) (Table S1). Subcellular localization predictions showed that most proteins were localized to the chloroplast (223), followed by the cytosol (65), plasma membrane (30), nucleus (13), endoplasmic reticulum (11), vacuolar membrane (9), extracellular space (7), peroxisome (6), mitochondrion (6), and cytoskeleton (1) (Table S1). To investigate the evolutionary relationships of CiCYP450s, the CYP450 protein sequences from C. indicum, A. thaliana, O. sativa, and P. trichocarpa were aligned and used to construct a phylogenetic tree.
The analysis revealed that the CiCYP450 genes could be divided into 8 clans and 44 families: clan710 (CYP710), clan85 (CYP716, CYP90, CYP724, CYP720, CYP87, CYP85, CYP88, CYP707, CYP728, CYP729, CYP733, and CYP718), clan74 (CYP74), clan711 (CYP711), clan97 (CYP97), clan72 (CYP714, CYP715, CYP735, CYP749, CYP721, CYP734, and CYP72), clan86 (CYP94, CYP704, CYP96, and CYP86), and clan71 (CYP89, CYP77, CYP701, CYP79, CYP78, CYP73, CYP98, CYP76, CYP80, CYP706, CYP82, CYP81, CYP93, CYP75, CYP84, CYP71, and CYP92) (Fig. 1, Fig. S1). Among them, CYP71, CYP706, and CYP72 were the three largest families, with 58, 40, and 33 members, respectively (Fig. 1, Table S2). Conversely, the smallest families, each containing only one member, were CYP85, CYP98, CYP711, CYP715, CYP721, CYP724, CYP733, CYP734, and CYP735 (Fig. 1, Table S2). Notably, families involved in the biosynthesis of flavonoids, alkaloids, and/or terpenoids, such as CYP82, CYP706, CYP72, and CYP80, were more abundant in C. indicum than in the other three species (Table S2).
Fig. 1. Phylogenetic tree of CiCYP450 genes. The tree was constructed using the maximum likelihood (ML) method with 1000 bootstrap replicates. Different clans are represented by different colors.
Chromosomal locations and gene duplication analysis of CiCYP450 genes
To reveal the evolutionary mechanisms of CiCYP450 genes, we first analyzed their chromosomal locations in the C. indicum genome. Six genes were located on unplaced contigs, and the remaining 365 CYP450 genes were unevenly distributed across the nine chromosomes, with gene counts ranging from 16 on chromosome 8 to 56 on chromosome 4 (Fig. 2). Moreover, these genes were predominantly clustered near chromosome ends, forming 47 tandemly duplicated clusters (Fig. 2, Table S3).
Of these, 35 clusters contained two genes and 12 contained three or more, with the two largest clusters, each composed of five members (CiCYP80-6 to CiCYP80-8, CiCYP92-2 to CiCYP92-8), located on chromosome 4 (Fig. 2). The tandemly duplicated genes spanned 17 families: 18 in CYP706, 14 in CYP71, 12 in CYP72, 10 in CYP80, 7 each in CYP96, CYP92, and CYP89, 6 in CYP82, 5 each in CYP94 and CYP704, 4 each in CYP84, CYP729, and CYP76, and 2 each in CYP93, CYP81, CYP707, and CYP718 (Fig. 2). These findings suggest that tandem duplication has played an important role in the expansion of the CiCYP450 superfamily.
Fig. 2. Locations of CiCYP450 genes on chromosomes. Red arcs highlight gene pairs arising from tandem duplication.
In addition to tandem duplication, segmental duplication has been proposed as another driver of gene family expansion. Through a collinearity analysis of the C. indicum genome, we further identified 16 pairs of segmentally duplicated CiCYP450 genes within 15 syntenic blocks (Fig. 3, Table S4). These syntenic blocks were distributed across eight chromosomes; no segmentally duplicated genes were observed on chromosome 5 (Fig. 3). The 16 segmentally duplicated gene pairs belonged to 11 families: three pairs from CYP84; two pairs each from CYP71, CYP82, and CYP749; and one pair each from CYP77, CYP78, CYP90, CYP92, CYP94, CYP701, and CYP707 (Fig. 3). Together, these results suggest that at least 35.58% (132/371) of CiCYP450 genes originated from duplication events, offering opportunities for the acquisition of new gene functions or the refinement of existing ones.
Fig. 3. Genome-wide synteny of CiCYP450 genes. Gray lines connect syntenic gene pairs, with the CiCYP450 genes highlighted in different colors.
The two rings represent the chromosomes and the gene density of each chromosome.
To further investigate the origin and evolutionary relationships of CiCYP450s, we performed collinearity analyses among C. indicum, C. seticuspe, and A. thaliana (Fig. S3). C. indicum shared 213 collinear genes with C. seticuspe and 47 with A. thaliana (Tables S5-S6), consistent with the closer evolutionary relationship between C. indicum and C. seticuspe than between C. indicum and A. thaliana. In addition, 22 and 7 CiCYP450 genes, present as multiple copies in C. indicum, were homologous to a single CsCYP450 or AtCYP450 gene, respectively, suggesting that these CiCYP450 genes likely expanded after the divergence of C. indicum from C. seticuspe and A. thaliana.
Gene structure and conserved motif analyses of CiCYP450 genes
To understand the structural characteristics and diversity of the CiCYP450s, we conducted detailed analyses of the gene structures and motif compositions of all CiCYP450 genes. The number of exons in CiCYP450 genes ranged from 1 to 16, with two exons being the most common (Fig. 4A, Fig. S2). While exon numbers varied considerably across families, they remained relatively conserved within each family. For instance, families such as CYP97 (12, 14, 16), CYP87 (9, 11), CYP733 (10), CYP720 (9), CYP724 (9), CYP90 (6–8), CYP707 (6–9), CYP85 (8), CYP88 (7), and CYP721 (7) had relatively high exon numbers (Fig. 4A, Fig. S2). In contrast, most members of the other families had fewer exons. In particular, intronless genes were found in the families CYP81 (1), CYP82 (1), CYP74 (2), CYP80 (2), CYP86 (2), CYP89 (2), CYP76 (3), CYP77 (3), CYP718 (3), CYP92 (6), CYP93 (9), CYP94 (9), CYP71 (16), CYP96 (16), and CYP706 (17), with the number of intronless members given in parentheses (Fig. 4A, Fig. S2).
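Exon counts, intronless genes, and intron lengths of the kind summarized here follow directly from exon coordinates in the genome annotation. The Python sketch below assumes the exon coordinates of one representative transcript per gene have already been parsed from the annotation file (parsing not shown) as 1-based, inclusive (start, end) pairs; the function name and the adjustable 20-kb long-intron cutoff are illustrative.

```python
def gene_structure(exons: list[tuple[int, int]], long_intron_bp: int = 20_000) -> dict:
    """Summarize exon/intron structure from 1-based, inclusive exon coordinates.

    Coordinates are sorted first, so exons listed in minus-strand order also work.
    """
    ordered = sorted(exons)
    # Each intron is the gap between consecutive exons, excluding both exon bases.
    introns = [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "exon_count": len(ordered),
        "intronless": len(ordered) == 1,
        "intron_lengths": introns,
        "has_long_intron": any(length > long_intron_bp for length in introns),
    }


if __name__ == "__main__":
    print(gene_structure([(1, 1_500)]))
    print(gene_structure([(30_001, 30_500), (1, 500), (600, 900)]))
```

Applied to every CiCYP450 gene, such a summary gives per-family exon-number distributions and flags genes carrying unusually long introns.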
Moreover, we found 8 CiCYP450 genes, namely CYP71-21, CYP92-1, CYP84-1, CYP96-1, CYP72-19, CYP721, CYP728-3, and CYP710-2, that contained an intron longer than 20 kb (Fig. S2).
Fig. 4. Phylogenetic relationships, gene structures, and conserved motifs of representative CiCYP450 genes in 44 families. Gray boxes represent CDS, and colored, numbered boxes represent different motifs.
Using the MEME tool, we identified 15 conserved motifs in CiCYP450 proteins (Fig. 4B-C, Fig. S2). Notably, motifs 1, 2, 7, and 8 correspond to the highly conserved sequences characteristic of the cytochrome P450 superfamily, including the FxxGxRxCxG, ExxR, PxRx, and (A/G)Gx(E/D)T(T/S) motifs (Fig. 4C). Just over half of the CiCYP450 proteins (50.13%) contained all four of these motifs, while the remaining 185 members lacked one or more of them (Fig. S2). For example, CiCYP71-1 lacked motifs 1, 2, 7, and 8, and CiCYP84-2 lacked motifs 1 and 7 (Fig. S2). Motif composition also varied markedly among clans. In the CYP71 clan, 9 members contained 14 conserved motifs (motifs 1 to 14), whereas many members lacked the C-terminal motifs 1, 2, 7, 8, 10, and 12, likely because of incomplete or incorrect annotation (Fig. 4B, Fig. S2). Unlike the CYP71 clan, the remaining seven clans had fewer conserved motifs. For example, most members of the CYP86 clan contained 9 conserved motifs (motifs 1, 2, 6, 7, 8, 10, 12, 13, and 15), and most members of the CYP710 clan possessed 5 conserved motifs (motifs 2, 7, 8, 10, and 12) (Fig. 4B, Fig. S2). This marked diversity in gene structure and motif composition among CiCYP450 genes suggests extensive functional diversification.
Analysis of cis-regulatory elements in the promoter regions of CiCYP450 genes
Cis-elements in promoters serve as binding sites for transcription factors and are crucial for transcriptional regulation.
To identify potential cis-elements of CiCYP450 genes, the promoter regions (2 kb upstream of the translation start site) were retrieved and analyzed using PlantCARE. A total of 25 putative cis-elements related to plant growth and development, stress response, and phytohormone response were identified across the 371 CiCYP450 genes (Fig. S2). The cis-elements involved in plant growth and development included those for zein metabolism regulation (O2-site), meristem expression (CAT-box), endosperm expression (GCN4-motif and AACA-motif), circadian control (circadian), seed-specific regulation (RY-element), palisade mesophyll cell differentiation (HD-Zip 1), and cell cycle regulation (MSA-like) (Fig. S4). Stress-responsive elements included those involved in anaerobic induction (ARE), low-temperature responsiveness (LTR), drought inducibility (MBS), defense and stress responsiveness (TC-rich repeats), and anoxia-specific inducibility (GC-motif) (Fig. S4). Phytohormone-responsive cis-elements included those responsive to MeJA (CGTCA-motif and TGACG-motif), ABA (ABRE), gibberellin (TATC-box, GARE-motif, and P-box), auxin (TGA-element, TGA-box, AuxRE, and AuxRR-core), and SA (SARE and TCA-element) (Fig. S4). These findings suggest that CiCYP450 genes may play roles in plant growth and development, as well as in responses to hormonal signals and to abiotic and biotic stresses.
Integrated volatile metabolomic and transcriptomic analyses of CiCYP450 genes
Chrysanthemum species are distinguished by their unique scent, which sets them apart from other ornamental plants. To explore the chemical and genetic basis of the chrysanthemum's unique aroma, we conducted integrated volatile metabolomic and transcriptomic analyses focusing on CiCYP450 genes. Using HS-SPME coupled with GC-MS, we identified 53 distinct volatile organic compounds (VOCs) in C. indicum (Fig. 5A-B, Table S7).
These VOCs could be categorized into three groups: 40 terpenoids (18 monoterpenoids and 22 sesquiterpenoids), 6 aromatic hydrocarbons, and 7 fatty acid derivatives (Fig. 5A, Table S7). Among the analyzed organs, 33 volatiles were detected in both leaves and stems, while only 10 were found in roots, and most compounds showed organ-specific distributions. For instance, 19 compounds (e.g., 2-isopropyltoluene, 1,8-cineole, and calamenene) were unique to leaves, and 16 (e.g., β-sesquiphellandrene, α-zingiberene, and β-phellandrene) were exclusive to stems (Fig. 5A). Notably, even VOCs present in all three organs differed markedly in content. For example, the content of caryophyllene was 67.01 µg/g in stems, compared with only 2.78 µg/g in leaves and 0.60 µg/g in roots (Fig. 5A, Table S7).
Fig. 5. Volatile organic compounds and expression patterns of CiCYP450 genes. (A) Heatmap of 53 volatile organic compounds in roots, stems, and leaves of C. indicum. (B) Total ion chromatograms of volatile components from different tissues of C. indicum. (C) Expression patterns of 248 CiCYP450 genes in roots, stems, and leaves of C. indicum. (D) KEGG pathway enrichment analysis of the 248 expressed CiCYP450 genes.
We then analyzed the expression patterns of CiCYP450 genes in leaves, stems, and roots using RNA-seq data. Of the 371 CiCYP450 genes, 248 (66.85%) had a TPM value greater than 1 in at least one organ and were considered expressed (Fig. 5C, Table S8). Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis showed that these genes were mainly involved in the pathways "metabolism of terpenoids and polyketides", "isoflavonoid biosynthesis", "cutin, suberine and wax biosynthesis", and "diterpenoid biosynthesis" (Fig. 5D).
Notably, 31, 40, and 88 genes were specifically expressed (i.e., TPM ≥ 1.0 only in that organ and < 1.0 in all other organs) or preferentially expressed (i.e., TPM in that organ at least two-fold higher than in any other organ) in leaves, stems, and roots, respectively (Fig. 5C, Table S8). Additionally, several genes were preferentially expressed in two of the three organs: 22 in leaves and stems, 5 in leaves and roots, and 25 in stems and roots (Fig. 5C, Table S8). The remaining 37 genes showed comparable expression levels across all three organs (Fig. 5C, Table S8). Interestingly, different members of the same family also displayed distinct expression patterns, suggesting functional divergence. For example, CiCYP706-13, CiCYP706-32, CiCYP706-35, and CiCYP706-36 were highly expressed in leaves, CiCYP706-1, CiCYP706-2, CiCYP706-3, and CiCYP706-4 were preferentially expressed in stems, whereas CiCYP706-12, CiCYP706-14, CiCYP706-15, and CiCYP706-20 were primarily expressed in roots (Fig. 5C).
The variation in VOC profiles and gene expression provides an opportunity to identify candidate genes associated with the diverse volatile organic compounds in C. indicum. We therefore analyzed the correlations between gene expression levels and VOC contents. The 28 compounds that were either unique to or abundant in leaves, such as 2-isopropyltoluene, 1,8-cineole, germacrene D, and terpinen-4-ol, were positively correlated with 39 CiCYP450 genes, including CiCYP704-13, CiCYP706-35, and CiCYP71-23 (Fig. 6). Notably, nine compounds detected in both leaves and stems (i.e., germacrene D, (-)-γ-cadinene, γ-terpinene, β-cadinene, copaene, pinocarvone, (+)-camphor, camphene, and terpinen-4-ol) were also associated with an additional 25 CiCYP450 genes, including CiCYP96-11, CiCYP96-13, CiCYP96-14, CiCYP72-5, CiCYP90-5, and CiCYP706-30 (Fig. 6).
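The organ-specificity criteria defined earlier in this section can be written as simple rules on per-organ TPM values. The Python sketch below is one literal reading of those definitions, assuming TPM has already been averaged over the three biological replicates. It additionally requires a non-zero TPM for the preferential rule (so that all-zero genes are not labeled), and it does not implement the two-organ category, whose exact threshold is not stated. Organ labels and function names are hypothetical.

```python
from typing import Optional

ORGANS = ("leaf", "stem", "root")


def is_expressed(tpm: dict[str, float], threshold: float = 1.0) -> bool:
    """Expressed = TPM greater than the threshold in at least one organ."""
    return any(tpm[organ] > threshold for organ in ORGANS)


def organ_pattern(tpm: dict[str, float], fold: float = 2.0) -> Optional[str]:
    """Return the organ where a gene is specifically or preferentially expressed.

    specific:     TPM >= 1.0 in this organ and < 1.0 in every other organ
    preferential: TPM in this organ >= `fold` times the highest TPM elsewhere
    Returns None when neither rule holds for any organ.
    """
    for organ in ORGANS:
        here = tpm[organ]
        others = [tpm[o] for o in ORGANS if o != organ]
        specific = here >= 1.0 and all(v < 1.0 for v in others)
        preferential = here > 0 and here >= fold * max(others)
        if specific or preferential:
            return organ
    return None


if __name__ == "__main__":
    genes = {
        "geneA": {"leaf": 5.0, "stem": 0.2, "root": 0.0},   # leaf-specific
        "geneB": {"leaf": 3.0, "stem": 10.0, "root": 4.0},  # stem-preferential
        "geneC": {"leaf": 4.0, "stem": 5.0, "root": 6.0},   # comparable levels
    }
    for name, values in genes.items():
        if is_expressed(values):
            print(name, organ_pattern(values))
```

At most one organ can satisfy either rule for a given gene, so the loop order does not affect the result.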
In particular, methyl isovalerate was associated with 9 CiCYP450 genes, including CiCYP82-5, CiCYP72-3, CiCYP74-3, CiCYP82-12, and CiCYP72-26 (Fig. 6). Similarly, the 16 volatiles exclusive to or enriched in stems, such as β-sesquiphellandrene, α-zingiberene, β-phellandrene, and terpinolene, were positively correlated with 43 CiCYP450 genes, including CiCYP72-17, CiCYP706-4, and CiCYP77-4 (Fig. 6). Four compounds (myrcene, caryophyllene, α-humulene, and β-pinene) were associated with 31 CiCYP450 genes, including CiCYP72-5, CiCYP71-8, and CiCYP72-17 (Fig. 6). Four further compounds detected in both stems and roots (cis-β-farnesene, (+)-α-pinene, α-curcumene, and hexanal) were associated with 56 CiCYP450 genes, including CiCYP93-5, CiCYP81-8, and CiCYP78-5 (Fig. 6).
Fig. 6. Correlation analyses between the expression levels of CYP450 genes and the contents of volatile compounds.
Discussion
The cytochrome P450 superfamily represents the largest enzyme family in plant metabolism and plays crucial roles in metabolic diversification during plant evolution [5, 45]. Gene identification, classification, and expression analysis are essential for understanding the functions of this extensive gene family. With the growing number of sequenced plant genomes, CYP450 genes have been identified in various plants, such as tea plant (Camellia sinensis, 273 genes), citrus (Citrus clementina, 301 genes), peanut (Arachis hypogaea, 589 genes), sweet potato (Ipomoea batatas, 95 genes), pear (Pyrus spp., 419 genes), and Brassica napus (384 genes) [46, 47, 48, 49, 50, 51]. However, little is known about the CYP450 gene family in C. indicum, a model species of the genus Chrysanthemum. In this study, we identified a total of 371 CYP450 genes in the C. indicum genome and characterized their phylogenetic relationships, molecular characteristics, and expression patterns.
Furthermore, we examined correlations between the expression levels of 248 CiCYP450 genes and the contents of 53 VOCs to gain insight into their potential functions. The 371 CiCYP450 genes were grouped into 8 clans and 44 families. As in most angiosperms [52, 53], the CYP71 clan was the largest CYP450 clan in C. indicum, comprising 17 families and 227 members. In contrast, although the CYP51 clan is generally considered evolutionarily conserved [3, 54], no members of this clan were detected in C. indicum, which is likely due to incomplete annotation of protein-coding genes. In addition, C. indicum has undergone extensive gene expansion, resulting in multiple copies in several gene families, such as 58 members in CYP71, 40 in CYP706, 33 in CYP72, and 18 in CYP82. This expansion appears to be driven primarily by gene duplication, as 47 tandemly duplicated clusters and 15 syntenic blocks of CiCYP450 genes were identified in the C. indicum genome. Gene duplication, together with variation in gene structure, motif composition, and cis-regulatory elements, likely produced a wide range of substrate specificities among CiCYP450 proteins and thus provided C. indicum with opportunities to meet diverse adaptive requirements [3, 55]. For instance, the most expanded gene families (e.g., CYP71, CYP82, CYP706, and CYP72) are known to participate in the biosynthesis of flavonoids, alkaloids, and terpenoids, which coincides with the pharmacological activities and distinctive scent of Chrysanthemum [3, 5, 56, 57]. Previous studies have identified the major constituents of Chrysanthemum essential oils as camphor, borneol, camphene, α-pinene, β-cymene, 1,8-cineole, and caryophyllene [22, 25, 58, 59].
Our GC-MS analysis confirmed the presence of all these compounds except β-cymene, which is consistent with previous reports and supports the reliability of our volatile profiling. Beyond these primary constituents, our analysis also detected 6 aromatic hydrocarbons and 7 fatty acid derivatives in the volatile metabolome, which were not reported in the aforementioned studies. These differences in chemical composition may stem from several factors, including species-specific differences in secondary metabolite production or differences in experimental methodology, such as the tissues sampled or the extraction and analytical techniques used. The biosynthesis of volatile terpenoids, including monoterpenes, sesquiterpenes, and their oxygenated derivatives, has been suggested to involve mainly members of the CYP71, CYP76, and CYP706 families [3, 5, 60]. Of the 53 volatiles detected in C. indicum, 40 were terpenoids, indicating that terpenoids are the dominant volatile components of this species. Furthermore, correlation analyses between gene expression levels and compound contents revealed that 21 terpenoids (e.g., 1,8-cineole, germacrene D, (+)-camphor, and camphene) were positively correlated with 17 genes, including CiCYP71-23, CiCYP76-4, and CiCYP706-35 (Table S9) [61, 62, 63, 64, 65], whereas the remaining 19 terpenoids (e.g., β-sesquiphellandrene, sabinene, borneol, and cis-β-farnesene) were associated with 17 genes, including CiCYP706-5 and CiCYP71-50 (Table S9) [61, 66], suggesting that members of the same clan may be responsible for the biosynthesis of different terpenoids. In addition to terpenoids, seven fatty acid derivatives were identified in C. indicum, which likely contribute to its distinctive scent. Previous studies have found that the biosynthesis of volatile fatty acid derivatives is catalyzed primarily by members of the CYP74 family [67, 68].
Accordingly, given the high correlations between gene expression and compound contents, we hypothesized that the production of five fatty acid derivatives (cis-3-hexenyl acetate, methyl 2-hexenoate, methyl hexoate, hexyl acetate, and methyl isovalerate) is likely mediated by CiCYP74-3 [69, 70], whereas the biosynthesis of 2-hexenal and hexanal may be controlled by CiCYP74-1 [71]. These predictions suggest that different genes within the same family may catalyze the production of different volatile compounds, indicating functional divergence among homologous genes [72, 73]. Nonetheless, further functional studies are necessary to confirm the roles of individual CiCYP450 genes.

Conclusions
In summary, our work presents a comprehensive genome-wide analysis of the cytochrome P450 superfamily in the C. indicum genome, identifying 371 CYP450 genes across 44 families within 8 clans. Tandem and segmental duplications appear to be the major drivers of the rapid expansion of the CiCYP450 gene family. Gene duplication, together with variations in gene structure, motif composition, and cis-regulatory elements, likely contributed to the functional diversity of CiCYP450 proteins. Integrated volatile metabolomic and transcriptomic analyses not only revealed several tissue-specific compounds and genes but also identified 36 CiCYP450 genes potentially responsible for the biosynthesis of 47 volatile organic compounds. Our findings provide a valuable foundation for functional studies of CiCYP450 genes and should help accelerate the genetic improvement of Chrysanthemum.

Electronic supplementary material
Supplementary Material 1 (3.1 MB, pdf)
Supplementary Material 2 (83.1 KB, xlsx)

Acknowledgements