Abstract

Background

Sophora tonkinensis Gagnep. is a significant Chinese herbal medicine whose key pharmacological components are alkaloids and flavonoids. Despite its importance, the metabolic pathways of these substances in S. tonkinensis remain inadequately explored.

Results

This study investigates the molecular regulation of alkaloid and flavonoid accumulation in S. tonkinensis. Through high-throughput transcriptome sequencing (RNA-seq) and liquid chromatography-mass spectrometry (LC-MS) of seeds, leaves, stems, and roots, the research identifies differential metabolites and genes involved in alkaloid and flavonoid biosynthesis. The transcriptome analysis reveals 2,727 differentially expressed genes (DEGs), with 35 related to alkaloids and 48 to flavonoids. Metabolome analysis uncovers 296 differentially accumulated metabolites (DAMs), including 23 alkaloid-related DAMs and 23 flavonoid-related DAMs. Additionally, weighted gene co-expression network analysis suggests StCAO (evm.model.3.924) as a key regulator of alkaloid biosynthesis and StCHIs (evm.model.3.2047, evm.model.1.2104, and evm.model.1.2101) as crucial genes for flavonoid biosynthesis. qPCR confirmed the consistency of expression trends for 12 selected DEGs across the roots, stems, leaves, and seeds of S. tonkinensis.

Conclusion

This study offers a comprehensive analysis of the regulatory mechanisms governing alkaloid and flavonoid accumulation, as well as the associated key genes, across various S. tonkinensis tissues. These findings pave the way for future research into the regulatory processes of alkaloids and flavonoids in S. tonkinensis.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12870-025-06865-4.

Keywords: Sophora tonkinensis, Transcriptomics, Metabolomics, Alkaloids, Flavonoids

Background

Sophora tonkinensis Gagnep.
(family Leguminosae), a medicinal plant native to China’s karst regions, is classified under national protection Category II [1]. Known in Chinese as “Shan-Dou-Gen”, its dried rhizomes and roots have traditionally been used to treat ailments such as heat toxicity accumulation, sore throat, swollen gums, and mouth and tongue soreness [2]. By 2018, over 150 compounds had been identified in S. tonkinensis, including more than 80 flavonoids, 40 volatile oils, and 20 alkaloids, with alkaloids and flavonoids being the primary pharmacological substances [3, 4]. Notable alkaloids include substantial amounts of matrine and oxymatrine, along with minor quantities of sophocarpine, harmine, sophoranol, and sophoramol [4]. Matrine and oxymatrine are recognized for their antibacterial, anti-inflammatory, and anti-cancer properties [5–7]. Besides its roots and rhizomes, S. tonkinensis also yields matrine and oxymatrine from its aerial stems and leaves [1]. As of 2022, over 100 flavonoids had been identified in S. tonkinensis, including dihydroflavonoids, isoflavonoids, flavonols, and chalcones, exhibiting diverse pharmacological activities such as anti-tumor, anti-inflammatory, antioxidant, antibacterial, and antiviral effects, as well as cardiovascular and cerebrovascular protection [8]. Research on S. tonkinensis has predominantly focused on germplasm breeding [9], cultivation optimization [10], and pharmacological effects [11, 12]. Studies on the metabolic pathways of alkaloids and flavonoids remain scarce, with existing research primarily targeting extraction, detection, and activity analysis [13, 14].

The rapid advancements in systems biology and high-throughput sequencing technologies have established multi-omics as an essential research method. This approach illuminates the dynamic changes in plant growth and development at both system and cellular levels.
The integration of metabolomics and transcriptomics (RNA-seq) has emerged as a potent tool for exploring post-genomic processes and the molecular regulatory mechanisms underlying gene expression and metabolic pathways in plants [15]. This combined analysis has revealed the mechanisms and dynamic changes in the synthesis of various plant metabolic compounds in species such as Mesona chinensis [16], Plumbago zeylanica [17], Bupleurum chinense [18], and Gynostemma pentaphyllum [19].

In this study, transcriptomics and metabolomics were combined to systematically investigate the metabolic variations of alkaloids and flavonoids in the roots, stems, leaves, and seeds of S. tonkinensis. The study identified key enzyme genes closely associated with the biosynthesis of alkaloids and flavonoids, providing a foundation for exploring the molecular regulatory mechanisms behind the differential accumulation of these compounds in various tissues of S. tonkinensis. Furthermore, it enhances the understanding of how the plant's pharmacological effects and quality are formed.

Results

Transcriptome alterations overview

Transcriptome sequencing was conducted on the roots, stems, leaves, and seeds of S. tonkinensis (Fig. 1) to investigate gene expression regulation across different tissues. The acquired raw reads ranged from 45.71 to 51.09 million, with Q20 and Q30 values exceeding 97% and 93%, respectively (Table S1), indicating high-throughput and high-quality RNA-Seq data. After removing low-quality reads, 44.36 to 49.33 million clean reads were analyzed further. Among these, 69.19–92.05% mapped to the S. tonkinensis genome. The RNA-Seq dataset is archived in the NCBI SRA database under accession number PRJNA1052132. Additionally, the number of differentially expressed genes (DEGs) across various S. tonkinensis tissues was analyzed (Fig. 2A). A total of 2,727 DEGs were identified across all comparison groups (Fig. 2B and Table S2).
Of these, 543 DEGs were shared by all comparison groups, representing 3.55%. Each of the six comparison groups also contained group-specific DEGs (Fig. 2B), indicating that these genes are expressed in S. tonkinensis in response to the functional structure of each tissue. To verify the reliability of the RNA-Seq data, 12 DEGs were selected, including two related to alkaloid biosynthesis, two related to flavonoid biosynthesis, two related to transcription factors, two related to photosynthesis, and four randomly selected DEGs. Their expression levels were assessed using qPCR, revealing a positive correlation between the expression profiles detected by qPCR and the RNA-Seq results (Fig. 3).

Fig. 1. Phenotypic characteristics of the entire plant (A), roots (B), stems (C), leaves (D), and seeds (E) of S. tonkinensis

Fig. 2. Analysis of differentially expressed genes (DEGs) in various S. tonkinensis tissues. A Comparison of up- and down-regulated DEGs in various plant parts. B Venn diagram of DEGs in various plant parts

Fig. 3. Verification of 12 selected DEGs. A RNA-seq heatmap analysis. B Analysis of qPCR value differences. Different letters on the bars indicate significant differences at the 0.05 level

DEGs involved in the alkaloid biosynthesis and flavonoid biosynthesis pathways

To further understand the differential expression of genes associated with the biosynthetic pathways of alkaloids in distinct S. tonkinensis tissues, 35 DEGs involved in alkaloid biosynthesis were identified (Fig. 4A). These DEGs include ten tropinone reductases (TRs), eight copper amine oxidases (CAOs), seven polyphenol oxidases (PPOs), three tyrosine aminotransferases (TATs), two dependent decarboxylase conserves (DDCs), two histidinol-phosphate aminotransferases (HisCs), two aspartate aminotransferases (ASPs), and one lysine decarboxylase (LDC).

Fig. 4. DEGs associated with alkaloid and flavonoid biosynthesis in different tissues of S. tonkinensis. A Genes related to alkaloid biosynthesis. B Genes related to flavonoid biosynthesis

Additionally, 48 DEGs involved in flavonoid biosynthesis were identified (Fig. 4B), comprising 11 chalcone synthases (CHSs), seven hydroxycinnamoyl transferases (HCTs), six chalcone isomerases (CHIs), three flavonol synthases (FLSs), three spermidine hydroxycinnamoyl transferases (SHTs), two caffeoyl-CoA O-methyltransferases (CCoAOMTs), two cinnamate 4-hydroxylases (CYP73As), two flavonoid 3’,5’-hydroxylases (CYP75As), two dihydroflavonol reductases (DFRs), two flavonoid 3’,5’-methyltransferases (FAOMTs), one anthocyanidin reductase (ANR), one cinnamoyl-CoA O-methyltransferase (CCOMT), one flavonoid 3’-monooxygenase (CYP75B1), one coumaroylquinate (coumaroylshikimate) 3’-monooxygenase (CYP98A), one omega-hydroxypalmitate O-feruloyl transferase (HHT1), one leucoanthocyanidin dioxygenase (LDOX), one NAD(P)H-dependent 6’-deoxychalcone synthase (NADH), and one phloretin 2’-O-glucosyltransferase (PGT1).

Examining expression levels revealed that StCAO (evm.model.3.924) in the alkaloid biosynthesis pathway was highly expressed across roots, stems, leaves, and seeds, with the highest expression observed in seeds. In the flavonoid biosynthetic pathway, StCHI (evm.model.3.2047) exhibited significantly higher expression in roots, stems, leaves, and seeds. Notably, StCHI expression in seeds was 1.76, 1.98, and 3.07 times greater than in roots, stems, and leaves, respectively, showing significant upregulation in seeds compared to other genes.

Alkaloid and flavonoid content in different S. tonkinensis tissues

As shown in Fig. 5, the concentrations of total alkaloids, matrine, and oxymatrine in seeds were significantly higher than in other tissues (P < 0.05), with values of 19.88 mg/g, 169.48 µg/g, and 9.43 mg/g, respectively.
Similarly, the concentrations of total flavonoids and genistin in seeds were significantly elevated compared to other tissues (P < 0.05), measuring 14.02 mg/g and 27.13 µg/g, respectively. The concentration of genistein in leaves reached 15.05 µg/g, significantly higher than in other tissues (P < 0.05). These results highlight the variations in alkaloid and flavonoid contents across different S. tonkinensis tissues.

Fig. 5. Analysis of alkaloid and flavonoid compounds in distinct S. tonkinensis tissues. A Total alkaloid content. B Total flavonoid content. C Oxymatrine content. D Genistin content. E Matrine content. F Genistein content. Different letters on the bars indicate significant differences at the 0.05 level

Overview of the metabolome changes

The metabolite detection results for these samples underwent quality control (QC) analysis, as shown in Fig. S1. The relative standard deviation (RSD) evaluation plot of the QC samples indicated that the data are qualified and reliable for subsequent analysis: at an RSD of 0.3, the cumulative proportion of ion peaks reached 82.01%. Principal component analysis (PCA) revealed that the first principal component (PC1) explained 43.90% of the total variance, distinguishing samples based on different tissues of S. tonkinensis. The second principal component (PC2) accounted for 31.50% of the total variance (Fig. 6A). Samples from the same tissue clustered closely together, indicating good data quality. Roots and stems showed higher similarity in chemical composition, while leaves and seeds were more distinct, indicating unique metabolic characteristics and low chemical composition similarity. The PCA results underscore differences in the metabolic products among different S. tonkinensis tissues.

Fig. 6. Principal component analysis (PCA) and differentially accumulated metabolite (DAM) analysis of different tissues of S. tonkinensis.
A Metabolite PCA of LC-MS in each tissue. B Cluster heatmap between samples of different tissues. C Up- and down-regulated DAMs in certain tissues. D DAMs in various tissues depicted in a Venn diagram

Pearson correlation coefficient analysis revealed consistent cumulative metabolite values among the six biological replicates in 24 S. tonkinensis samples (Fig. 6B). The correlation coefficients within groups were higher than those between groups, affirming the reliability of the differential metabolites obtained. These findings underscore the robust correlation and reliability of experimental results across different tissues of S. tonkinensis.

A total of 4,138 metabolites were identified from four tissues, with 2,282 in positive ion mode and 1,856 in negative ion mode. All metabolites were annotated at MSI level 2 (Table S3). Differentially accumulated metabolites (DAMs) were screened with a PLS-DA model using significance criteria of P < 0.05 and VIP value > 1. The analysis revealed 1,553 DAMs between roots and stems (432 upregulated, 1,121 downregulated), 1,658 DAMs between roots and leaves (584 upregulated, 1,074 downregulated), 1,761 DAMs between roots and seeds (894 upregulated, 867 downregulated), 1,634 DAMs between stems and leaves (861 upregulated, 773 downregulated), 1,831 DAMs between stems and seeds (1,173 upregulated, 658 downregulated), and 1,846 DAMs between leaves and seeds (1,149 upregulated, 697 downregulated) (Fig. 6C). These results underscore the diverse metabolic composition across different S. tonkinensis tissues. Further analysis identified 18 major DAMs among roots, stems, leaves, and seeds (Fig. 
6D), including sparteine, homoferreirin, O-desmethyltramadol, dihydrofolic acid, deoxypyridinoline, artocarpesin, ID14326, 12,20-dioxo-leukotriene B4, CDP-ethanolamine, lucuminoside, 4-hydroxyandrostenedione glucuronide, 6-Epi-7-isocucurbic acid glucoside, isopropyl beta-D-glucoside, ribostamycin, (3S,7E,9R)-4,7-megastigmadiene-3,9-diol 9-[apiosyl-(1->6)-glucoside], kuwanon B, wyerone, and isovitexin 2’’-(6’’’-feruloylglucoside) 4’-glucoside. Notably, sparteine, which is associated with alkaloid biosynthesis, was among the identified DAMs.

Integrated transcriptome and metabolome analysis

Histograms illustrate the KEGG pathway enrichment of DEGs and DAMs. In root_vs_stem, 126 DEGs and 79 DAMs were enriched in 20 metabolic pathways, with significant enrichment (P < 0.01) in the phenylpropanoid and flavonoid biosynthesis pathways (Fig. 7A). In leaf_vs_root, 130 DEGs and 84 DAMs were enriched in 21 pathways, with significant enrichment (P < 0.01) in the isoflavonoid, phenylpropanoid, flavonoid, and betalain biosynthesis pathways (Fig. 7B). In root_vs_seed, 129 DEGs and 84 DAMs were enriched in 20 pathways, with significant enrichment (P < 0.01) in phenylpropanoid, flavonoid, and isoflavonoid biosynthesis (Fig. 7C). In leaf_vs_stem, 129 DEGs and 82 DAMs were enriched in 21 pathways, with significant enrichment (P < 0.01) in flavonoid biosynthesis, phenylpropanoid biosynthesis, tryptophan metabolism, isoflavonoid biosynthesis, and alanine, aspartate and glutamate metabolism (Fig. 7D). In leaf_vs_seed, 132 DEGs and 69 DAMs were enriched in 21 pathways, with significant enrichment (P < 0.01) in isoflavonoid, phenylpropanoid, and flavonoid biosynthesis (Fig. 7E). In seed_vs_stem, 125 DEGs and 79 DAMs were enriched in 20 pathways, with significant enrichment (P < 0.01) in flavonoid biosynthesis, phenylpropanoid biosynthesis, tryptophan metabolism, isoflavonoid biosynthesis, and alanine, aspartate and glutamate metabolism (Fig. 7F).
These findings underscore the importance of phenylpropanoid, flavonoid, and isoflavonoid biosynthesis as critical metabolic pathways in S. tonkinensis.

Fig. 7. KEGG pathway enrichment analysis of DEGs and DAMs in the transcriptome and metabolome. A root_vs_stem KEGG pathway enrichment analysis. B leaf_vs_root KEGG pathway enrichment analysis. C root_vs_seed KEGG pathway enrichment analysis. D leaf_vs_stem KEGG pathway enrichment analysis. E leaf_vs_seed KEGG pathway enrichment analysis. F seed_vs_stem KEGG pathway enrichment analysis. The horizontal axis represents metabolic pathways, while the vertical axis shows the enriched P-values of DEGs (red) and DAMs (green), expressed as −log(P). Pathways with P-values < 0.05 are marked, and the top 10 pathways are selected for each comparison

Co-expression network analysis

Using RNA-seq and content data, weighted gene co-expression network analysis (WGCNA) was used to examine genes related to flavonoid and alkaloid biosynthesis and metabolism. The dendrogram identified fifteen distinct modules (Fig. 8A). These modules, represented by different colors, showed significant associations with the content of alkaloids, flavonoids, matrine, oxymatrine, genistein, and genistin, notably the “magenta,” “tan,” “salmon,” “magenta,” “tan,” and “green-yellow” modules (r > 0.7, P < 0.05) (Fig. 8B).

Fig. 8. Networks of co-expression of transcripts involved in the biosynthesis and metabolism of alkaloids, flavonoids, oxymatrine, matrine, genistin, and genistein. A Fifteen modules discovered by weighted gene co-expression network analysis (WGCNA) shown by a hierarchical cluster tree and color bands. Genes not classified into specific modules are shown in gray. Each branch in the tree corresponds to a certain gene. B Module–trait correlation analysis. Columns represent distinct chemical compounds, while rows represent modules. The left box indicates the number of genes in each module.
Intersections of rows and columns display correlation coefficients and P-values between chemical compounds and modules. C Co-expression subnetwork analysis of the magenta module related to alkaloid and oxymatrine accumulation. D Co-expression subnetwork analysis of the tan module related to flavonoid, genistin and matrine accumulation. E Co-expression subnetwork analysis of the green-yellow module related to genistein accumulation

Based on eigengene connectivity (kME) values in the co-expression network, subnetworks were generated using the top 30 node genes from the “magenta,” “tan,” and “green-yellow” modules. In the “magenta” module, evm.model.8.1855 (no functional annotation information) exhibited the highest kME value, followed by copper amine oxidase (StCAO, evm.model.3.924). Tyrosine aminotransferase (StTAT, evm.model.2.3509), involved in alkaloid biosynthesis, was also identified in this network (Fig. 8C). In the “tan” module, squalene monooxygenase (StSQLE, evm.model.5.860) had the highest kME value. Chalcone isomerase (StCHI, evm.model.1.2104 and evm.model.1.2101), essential for flavonoid synthesis, played a central role in this network (Fig. 8D). In the “green-yellow” module, chlorophyll a-b binding protein (StCAP10A, evm.model.1.2462) showed the highest kME value and strong correlations with other node genes, all associated with photosynthesis, indicating its crucial role in this network (Fig. 8E). These findings suggest that StCAO, StTAT, StSQLE, StCHI, and StCAP10A regulate alkaloid and flavonoid compound metabolism in different tissues of S. tonkinensis. qPCR verification of differential gene expression among seeds, leaves, stems, and roots showed a positive correlation between qPCR and RNA-seq results (Fig. S9).

Discussion

DEGs and DAMs were identified in different S. tonkinensis tissues

Transcriptome data identified 2,727 DEGs across the roots, stems, leaves, and seeds of S. tonkinensis (Fig. 2).
DEG analysis linked these genes to processes such as photosynthesis, plastid nucleoid, phenylpropanoid metabolism, oxidoreductase activity, secondary metabolism, nucleoside transmembrane transport, auxin-activated signaling, response to reactive oxygen species, starch metabolism, and thylakoid functions (Fig. S2). These findings indicate that DEGs in different tissues are influenced by both internal and external factors [20]. Significant induction effects were observed in isoflavonoid biosynthesis, phenylpropanoid biosynthesis, plant hormone signal transduction, nitrogen metabolism, flavonoid biosynthesis, isoquinoline alkaloid biosynthesis, amino sugar and nucleotide sugar metabolism, tropane, piperidine, and pyridine alkaloid biosynthesis, and the biosynthesis of various secondary metabolites (Fig. S3). Secondary metabolites exhibit dynamic changes throughout the plant’s growth, responding to diverse environmental conditions [21]. Intracellular genes intricately regulate secondary metabolite synthesis, with chemical component accumulation affecting regulatory enzyme activities and associated gene expression [22, 23].

Metabolome data identified 296 DAMs in the experimental group (Fig. 6). Oxysolavetivone and 1-ethyl-4-methyl-2-[2-(3-methylbutylamino)-2-oxoethyl]pyrrole-3-carboxylic acid significantly accumulated in the roots of S. tonkinensis. Indomethacrylic acid, N-[2-(cyclohexen-1-yl)ethyl]-N’-[[1-(hydroxymethyl)cyclopropyl]methyl]oxamide, and N-diethyl-m-toluamide showed significant accumulation in the stems. The leaves exhibited significant accumulation of 3-genistein-8-c-glucoside, vitexin-2’-o-rhamnoside, vitexin-2’’-o-rhamnoside, and leucylphenylalanine. Matrine significantly accumulated in the seeds, and oxymatrine in both roots and seeds (Fig. S4).
Notably, genistein, matrine, and oxymatrine displayed consistent changes across different tissues, with their relative abundance values matching the measured content changes in the various tissues (Fig. 5). These findings highlight the influence of gene abundance values on the content of different components in S. tonkinensis.

Procrustes analysis visualizes the overall correlation between the transcriptomes and metabolomes of diverse tissues. This method integrates multiple data types, such as microbiome, metabolome, and transcriptome, by assessing the distribution of datasets within the same system [24]. In this study, Procrustes and histogram analyses were used to assess the overall correlation between the transcriptomes and metabolomes of different tissues in S. tonkinensis. KEGG pathway enrichment analyses for both DEGs and DAMs highlighted flavonoid, phenylpropanoid, and isoflavonoid biosynthesis as crucial metabolic pathways in S. tonkinensis. Comprehensive transcriptomics and metabolomics analyses elucidated the metabolic pathways within the roots, stems, leaves, and seeds of S. tonkinensis. This detailed exploration provides insight into the biosynthetic mechanisms responsible for the active ingredients in S. tonkinensis. Clarifying these mechanisms at the molecular level offers a theoretical foundation and reference for further research on the metabolic engineering of active ingredients in S. tonkinensis.

Metabolites and genes involved in alkaloid biosynthesis in S. tonkinensis

Alkaloid metabolism in S. tonkinensis is linked to significant therapeutic effects, including anti-inflammatory [25], antiviral [26], and anti-cancer properties [27]. Matrine, a key alkaloid, primarily derives from the quinolizidine alkaloid (QA) biosynthesis pathway [27]. However, the detailed mechanism of QA biosynthesis remains largely unknown [28], impeding a comprehensive understanding of alkaloid metabolism in S. tonkinensis.
This study identified 35 DEGs involved in alkaloid biosynthesis, including eight unigenes encoding CAO (Fig. 4A), consistent with previous research on CAO in S. tonkinensis under drought stress [29]. All eight CAO-encoding unigenes were expressed in various tissues, with four highly expressed in leaves, two in seeds, and two in stems. Notably, StCAO (evm.model.3.924) showed significantly upregulated expression in seeds compared to other genes. Correspondingly, alkaloid content in seeds was significantly higher than in roots, stems, and leaves (Fig. 5). Additionally, 23 alkaloid synthesis-related DAMs were preliminarily identified in different tissues of S. tonkinensis, including morphine, alkaloid RC, aquifoliunine EI, tetrahydroharmol, cinchonidine, 5-methoxycanthin-6-one, 3-hydroxyquinidine, 3-hydroxyquinine, and matrine (Fig. S5). Cinchona alkaloids and morphinans exhibited the highest content among the four tissues, followed by harmala alkaloids, lupin alkaloids, rhoeadine alkaloids, strychnos alkaloids, betalains, indolonaphthyridine alkaloids, erythrina alkaloids, benzophenanthridine alkaloids, and camptothecins. Notably, the biosynthesis of lupin alkaloids, a subset of QAs, is less well studied than that of alkaloids from some economically important plants [30].

Transcriptome and gene co-expression analyses can link new genes to specific metabolic pathways [31]. In this study, this approach selected genes with expression patterns similar to alkaloid content, revealing potential alkaloid biosynthesis genes such as CAO (StCAO, evm.model.3.924) and tyrosine aminotransferase (StTAT, evm.model.2.3509). CAO has been implicated in the second step of QA biosynthesis, converting cadaverine to 5-aminopentanal, which subsequently cyclizes to 1-piperideine [30, 32]. This suggests that the StCAO gene serves as a potential indicator of alkaloid accumulation in S. 
tonkinensis, providing theoretical and genetic resources for developing new varieties with high alkaloid content.

Metabolites and genes involved in flavonoid biosynthesis in S. tonkinensis

S. tonkinensis is a valuable medicinal plant that is particularly rich in flavonoids, a class of plant secondary metabolites that includes flavonols, flavanones, and flavones and plays crucial roles in many biological processes [33]. However, limited information has hindered molecular-level exploration of flavonoids in S. tonkinensis. Our study presents a transcriptome assembly for S. tonkinensis, revealing 48 DEGs associated with flavonoid biosynthesis (Fig. 4B). Eleven unigenes encoding CHS, the initial enzyme in the plant flavonoid synthesis pathway and a key enzyme in secondary metabolic pathways, were expressed across all tissues. Among these, six unigenes exhibited high expression in roots, two in seeds, two in leaves, and one in stems. CHS holds significant physiological importance for plants [34]. Additionally, six CHI-encoding unigenes were identified, with generally higher expression levels in roots and seeds. Notably, StCHI (evm.model.3.2047) displayed significantly higher expression levels across various S. tonkinensis tissues, particularly in seeds, where its expression surpassed that of other genes. This correlates with the higher total flavonoid and genistin contents observed in S. tonkinensis seeds compared to roots, stems, and leaves (Fig. 5). Previous research has highlighted the pivotal regulatory and catalytic roles of CHS and CHI in flavonoid biosynthesis, emphasizing their impact on compound accumulation and diversity [35]. Therefore, the transcriptional changes in genes encoding these enzymes in S. tonkinensis tissues may indicate their important roles in flavonoid metabolism. Flavonoid accumulation levels are also correlated with the expression levels of other genes involved in flavonoid synthesis [36].
FLS, a key enzyme that converts dihydroflavonol to flavonol, promotes flavonol accumulation [37]. In this study, FLS expression levels varied, with two unigenes showing higher expression in stems and leaves, and one in roots. Distinctive expression patterns of genes involved in flavonoid biosynthesis were observed across the roots, stems, leaves, and seeds of S. tonkinensis. The combined expression of these enzyme-encoding genes is likely associated with changes in flavonoid compound concentrations and related metabolites in different tissues.

The study initially identified 23 flavonoid-related DAMs in different tissues of S. tonkinensis, including butein, galangin, L-epicatechol, tricetin, and naringenin chalcone (Fig. S6). Phenylpropanoid biosynthesis and flavonoid biosynthesis were considered crucial metabolic pathways for the differential metabolites in different tissues. The phenylpropanoid pathway constitutes the initial three steps in the flavonoid biosynthetic pathway, converting phenylalanine to 4-coumaroyl-CoA through phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), and 4-coumarate:CoA ligase (4CL). Subsequently, 4-coumaroyl-CoA is converted into flavonoids via FLS, DFR, CHS, CHI, and ANR [38]. Co-expression network analysis identified genes exhibiting similar expression patterns to flavonoid content, including potential flavonoid biosynthesis genes such as CHI (StCHIs, evm.model.1.2104 and evm.model.1.2101). CHI is a crucial rate-limiting enzyme in flavonoid biosynthesis, catalyzing the intramolecular cyclization of chalcones into specific (2S)-flavanones [39]. Additionally, a close association was observed between squalene monooxygenase (StSQLE, evm.model.5.860) and genes involved in flavonoid biosynthesis. Squalene monooxygenase, a rate-limiting enzyme in the cholesterol biosynthetic pathway, catalyzes squalene epoxidation [40, 41].
The epoxidation of squalene by StSQLE may indirectly impact the activity or expression levels of enzymes involved in flavonoid biosynthesis, thus influencing flavonoid synthesis.

Conclusions

This study compares DEGs and DAMs associated with alkaloid and flavonoid biosynthesis in different tissues of S. tonkinensis using transcriptome and metabolome datasets. A total of 2,727 DEGs were identified, including 35 related to alkaloid biosynthesis and 48 related to flavonoid biosynthesis. Additionally, 296 DAMs were identified, encompassing 23 alkaloid-related and 23 flavonoid-related DAMs. Metabolome data analysis underscored the significance of phenylpropanoid and flavonoid biosynthesis as pivotal metabolic pathways in S. tonkinensis, corroborating the integrated analysis results. Furthermore, WGCNA highlighted StCAO (evm.model.3.924) as a key gene regulating alkaloid metabolism and the StCHIs (evm.model.3.2047, evm.model.1.2104, and evm.model.1.2101) as crucial genes regulating flavonoid metabolism. These findings provide valuable insights for future investigations into the regulatory mechanisms of alkaloids and flavonoids in S. tonkinensis.

Materials and methods

Plant materials

This study utilized seeds, leaves, stems, and roots of S. tonkinensis sourced from the S. tonkinensis planting base in Huanjiang County, Hechi City, Guangxi Zhuang Autonomous Region, China (24°53′53″ N, 107°53′29″ E). The plant materials were formally identified by Professor Kunhua Wei and have been deposited in the Germplasm Repository of the Guangxi Botanical Garden of Medicinal Plants, Nanning, China, under voucher number YYZW20220070. Planting began in March 2019, with sampling occurring in September 2022. The region has an annual average temperature of approximately 20–22 ℃, with the coldest month being January (around 10 ℃) and the hottest month being July (approximately 28 ℃). The roots, stems, leaves, and seeds were labeled Root, Stem, Leaf, and Seed, respectively.
Samples were collected from five healthy plants of similar size and growth, with three biological replicates for transcriptome analysis and six for metabolome analysis. All samples were rinsed with distilled water, rapidly placed in liquid nitrogen, and stored at −80 ℃ for subsequent experimental analysis.

RNA isolation, library construction, and Illumina sequencing

Total RNA was extracted from the tissue samples, and RNA purity and concentration were assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). RNA integrity was evaluated with an Agilent 2100/Lab Chip GX. Eukaryotic mRNA was enriched using magnetic beads with oligo(dT), and the mRNA was then fragmented using a fragmentation buffer. The first and second cDNA strands were synthesized using the mRNA as a template, followed by cDNA purification. For sequencing, end repair, A-tail addition, and adapter ligation were performed on the purified double-stranded cDNA. AMPure XP beads were used for fragment size selection, and the cDNA library was enriched through PCR. Following library quality control, all mRNA transcripts were sequenced using the Illumina NovaSeq 6000 platform, generating raw reads. Reads containing adapters, reads with an N ratio over 10%, and reads in which bases with a quality value of Q ≤ 10 made up more than 50% of the entire read were removed, resulting in high-quality clean reads.

Read mapping and annotation

Following quality control, mapped reads were obtained by aligning the clean reads with the S. tonkinensis genome for further analysis. TopHat2 software (http://tophat.cbcb.umd.edu/) was used for sequence alignment analysis [42]. StringTie software (http://ccb.jhu.edu/software/stringtie/) was employed to assemble and concatenate the mapped reads [43].
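The read-filtering criteria stated for the sequencing step (adapter contamination, an N ratio over 10%, and low-quality bases making up more than half of a read) can be sketched as a simple predicate. This is an illustrative sketch only, not the pipeline used in the study: the `Read` type, the `keep_read` function, and the default adapter string are hypothetical, and reading the Q10 criterion as "more than 50% of bases with Phred quality ≤ 10" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    seq: str          # base calls
    quals: List[int]  # Phred quality score for each base


def keep_read(read: Read,
              adapter: str = "AGATCGGAAGAGC",   # illustrative adapter, not from the source
              max_n_frac: float = 0.10,
              low_q: int = 10,
              max_low_q_frac: float = 0.50) -> bool:
    """Return True if the read passes all three filters."""
    if not read.seq or len(read.seq) != len(read.quals):
        return False
    # 1) Discard reads that still contain adapter sequence.
    if adapter and adapter in read.seq:
        return False
    # 2) Discard reads whose fraction of undetermined bases (N) exceeds 10%.
    n_frac = read.seq.upper().count("N") / len(read.seq)
    if n_frac > max_n_frac:
        return False
    # 3) Discard reads in which low-quality bases (Q <= 10) exceed 50%.
    low_frac = sum(q <= low_q for q in read.quals) / len(read.quals)
    return low_frac <= max_low_q_frac


if __name__ == "__main__":
    reads = [
        Read("ACGTACGTAC", [30] * 10),            # clean read
        Read("ACGTNNACGT", [30] * 10),            # 20% N
        Read("ACGTACGTAC", [5] * 6 + [30] * 4),   # 60% low-quality bases
    ]
    clean = [r for r in reads if keep_read(r)]
    print(f"{len(clean)} of {len(reads)} reads kept")
```

In practice a dedicated trimming tool would handle partial adapter matches and paired-end reads; the exact string match here is only meant to make the three criteria explicit.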
Gene function annotations were based on the following databases: euKaryotic Orthologous Groups (KOG), Swiss-Prot (a manually annotated and reviewed protein sequence database), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG), Clusters of Orthologous Groups (COG), Protein family (Pfam), and National Center for Biotechnology Information (NCBI) non-redundant protein sequences (NR).

Differential gene expression analysis

Gene expression levels were quantified using the fragments per kilobase of transcript per million fragments (FPKM) value. DEGs between two groups were identified through differential gene expression analysis using DESeq2 software [44], with a screening threshold of a fold change greater than 2 for both up- and down-regulation and a significance level of P < 0.05.

Quantitative real-time PCR analysis

Quantitative real-time PCR (qPCR) was used to determine the expression of selected genes. Twelve genes were chosen, and specific primers were designed from the retrieved sequences. The reference gene was evm.model.5.2407. The qPCR reactions were conducted on a qTOWER2.2 Real-Time PCR system (AnalytikJena, Jena, Germany) with 2×SYBR Green Supermix (Bio-Rad, Hercules, CA, USA). Table S4 lists the qPCR primers. Three biological replicates per sample were used for analysis, and relative expression levels were calculated using the 2^−ΔΔCt method [45].

Metabolite extraction

A 6 mm diameter grinding bead and 50 mg of the sample were placed in a 2 mL centrifuge tube. Then, 400 µL of extraction solution (methanol:acetonitrile = 1:1, v/v) was added under low-temperature conditions. The tissue was ground with a freezing tissue grinder for 6 min at −10 ℃ and 50 Hz. This was followed by a 30 min low-temperature ultrasound extraction (5 ℃, 40 kHz). The sample was then kept at −20 ℃ for another 30 min.
After 15 min of centrifugation at 4 ℃ and 13,000 × g, the supernatant was transferred to an insert-coupled sample vial for analysis. Additionally, a quality control sample was prepared by transferring and mixing 20 µL of supernatant from each sample.

Mass spectrometry parameter settings and data collection

Metabolomic analysis utilized an ultra-high performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) system. Chromatography conditions included an ACQUITY UPLC HSS T3 column (100 mm × 2.1 mm, 1.8 μm; Waters, Milford, MA, USA) with a column temperature of 40 ℃. Mobile phase A consisted of 95% water + 5% acetonitrile (with 0.1% formic acid), while mobile phase B comprised 47.5% acetonitrile + 47.5% isopropanol + 5% water (with 0.1% formic acid). The positive ion mode elution gradient steps (phase A/phase B) were: 100:0 at 0 min; 80:20 at 3 min; 65:35 at 4.5 min; 0:100 at 5 min; 0:100 at 6.3 min; 100:0 at 6.4 min; 100:0 at 8.0 min. For the negative ion mode, the steps (phase A/phase B) were: 100:0 at 0 min; 95:5 at 1.5 min; 90:10 at 2 min; 70:30 at 4.5 min; 0:100 at 5 min; 0:100 at 6.3 min; 100:0 at 6.4 min; 100:0 at 8.0 min. The flow rate was 0.4 mL/min, and the injection volume was 3 µL. Mass spectrometry data were acquired using an electrospray ionization source (ESI) in both negative and positive ion modes on a Q-Exactive HF-X mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). The positive ion spray voltage was set at 3500 V, and the negative ion spray voltage at −3500 V. The sheath and auxiliary gas flow rates were 50 and 13 arb, respectively. The capillary temperature was 325 ℃, and the heater temperature was 425 ℃. The scanning range for a full scan was 70–1050 m/z, with a full MS resolution of 60,000 and an MS^2 resolution of 7,500.
The normalized collision energy was set at 20, 40, and 60 eV. Further data processing was conducted using Progenesis QI software (Waters Corporation, Milford, USA). Metabolites were identified by searching internal and public databases (https://hmdb.ca/, https://metlin.scripps.edu/) and were compared against the KEGG database (https://www.kegg.jp/kegg/compound/) for metabolic pathway analysis.

Conjoint analysis

KEGG pathway enrichment analysis was performed on the selected differential gene sets and differential metabolite sets. P-values were corrected using the Benjamini–Hochberg (BH) method, and pathway enrichment was considered significant when the corrected P-value was < 0.05.

Phytochemical determination

Alkaloid content determination: The sample was dried, crushed, and sieved through an 80-mesh screen. Approximately 0.1 g of the sample was weighed, and 1 mL of 80% ethanol was added. After thorough mixing, the mixture was transferred to an EP tube, subjected to ultrasound extraction for 60 min, and centrifuged for 10 min at 8,000 × g and 25 ℃. The supernatant was retrieved for testing. Following the steps outlined in the alkaloid content kit (Suzhou Michy Biomedical Technology Co., Ltd., Item No. M0122A), the solution was vigorously shaken and allowed to settle for 40 min at 25 ℃. A 200 µL aliquot of the lower chloroform layer was withdrawn, and its absorbance at 416 nm was measured.

Flavonoid content determination: The sample was dried to a constant weight, crushed, and passed through a 40-mesh screen, and approximately 0.1 g was weighed. Subsequently, 1 mL of 60% ethanol was added, and the mixture was shaken for 2 h at 60 ℃, followed by 10 min of centrifugation at 10,000 × g and 25 ℃. The supernatant was retrieved for testing. Following the steps outlined in the flavonoid content kit (Suzhou Michy Biomedical Technology Co., Ltd., Item No. M0118A), the solution was mixed and allowed to settle for 15 min at 25 ℃. The absorbance was measured at 510 nm.
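As an aside on the conjoint analysis, the Benjamini–Hochberg correction applied to the KEGG enrichment P-values can be sketched in a few lines. This is a generic illustration of the BH step-up procedure, not the software used in this study; the function name `benjamini_hochberg` and the example pathway labels and P-values are invented for demonstration.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity:
    # adjusted p at rank k is min over ranks j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment P-values for four placeholder pathways.
pathways = ["pathway_A", "pathway_B", "pathway_C", "pathway_D"]
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)

# Keep pathways whose corrected P-value is below the 0.05 cutoff from the text.
significant = [name for name, p in zip(pathways, adj_p) if p < 0.05]
```

With these toy values, only the pathway with the smallest raw P-value remains below the 0.05 cutoff after correction, which shows why the corrected rather than the raw P-value is used to call enrichment.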
Matrine and oxymatrine content determination: The high-performance liquid chromatography (HPLC) method, as per the Chinese Pharmacopoeia (Commission, 2020: Content determination), was employed for content determination. A sample of 0.1 g was ground and mixed with 1.0 mL of methanol. After creating a slurry using a grinding instrument, ultrasonic extraction was conducted for 1 h. The supernatant obtained from centrifugation was filtered through a 0.45 μm filter for HPLC analysis. An Agilent 1100 HPLC instrument (wavelength 220 nm) was used. The chromatographic conditions were as follows: Compass C18 (2) reversed-phase column (250 mm × 4.6 mm, 5 μm), 10 µL injection volume, 0.8 mL/min flow rate, and 30 ℃ column temperature. The mobile phase consisted of acetonitrile and a mixed solution (3.4 g of potassium dihydrogen phosphate in 500 mL water with 900 µL of triethylamine) in a 1:9 ratio. Calibration curves for matrine and oxymatrine were generated by diluting reference standards with methanol to appropriate concentrations. Each sample analysis included three biological and three technical replicates. The reference standard chromatogram is shown in Fig. S7.

Genistin and genistein content determination: The sample was ground into powder, and approximately 0.1 g was weighed and mixed with 1.0 mL of 70% methanol. The slurry was prepared using a grinding instrument, followed by a 2 h ultrasonic extraction. The supernatant obtained from centrifugation was filtered through a needle filter. An Agilent 1100 HPLC instrument (wavelength 260 nm) was used for analysis. The chromatographic conditions were: Compass C18 (2) reversed-phase column (250 mm × 4.6 mm, 5 μm), 10 µL injection volume, 1.0 mL/min flow rate, and 30 ℃ column temperature. The mobile phase consisted of methanol and 1% aqueous acetic acid.
Calibration curves for genistin and genistein were generated by diluting reference standards with methanol to appropriate concentrations. Each sample analysis included three biological and three technical replicates. The reference standard chromatogram is shown in Fig. S8.

Statistical analysis

Data were analyzed by one-way ANOVA with least significant difference (LSD) multiple comparisons, and statistical analyses were performed using SPSS software version 22.0. Results were expressed as means ± standard errors of the means. A significance criterion of P < 0.05 was established. Student’s t-test was used to calculate P-values for the significance of differences between samples and for the qPCR assay results.

Supplementary Information

Supplementary Material 1 (7.9 MB, docx)
Supplementary Material 2 (2.7 MB, xlsx)

Acknowledgements