Abstract Iodes seguinii is a woody vine known for its potential therapeutic applications in treating rheumatoid arthritis (RA) due to its rich bioactive components. Here, we achieved the first chromosome‐level assembly of the nuclear genome of I. seguinii using PacBio HiFi and chromatin conformation capture (Hi‐C) sequencing data. The initial assembly with PacBio data produced contigs with an N50 length of 9.71 Mb, and Hi‐C data anchored these contigs into 13 chromosomes, achieving a total length of 273.58 Mb, closely matching the estimated genome size. Quality assessments, including BUSCO, long terminal repeat assembly index, transcriptome mapping rates, and sequencing coverage, confirmed the high quality, completeness, and continuity of the assembly, identifying 115.28 Mb of repetitive sequences, 1062 RNA genes, and 25,270 protein‐coding genes. Additionally, we assembled and annotated the 150,599 bp chloroplast genome using Illumina sequencing data, containing 121 genes including key DNA barcodes, with maturase K (matK) proving effective for species identification. Phylogenetic analysis positioned I. seguinii at the base of the Lamiales clade, identifying significant gene family expansions and contractions, particularly related to secondary metabolite synthesis and DNA damage repair. Metabolite analysis identified 84 active components in I. seguinii, including the discovery of luteolin, with 119 targets predicted for RA treatment, including core targets like AKT1, toll‐like receptor 4 (TLR4), epidermal growth factor receptor (EGFR), tumor necrosis factor (TNF), TP53, NFKB1, janus kinase 2 (JAK2), BCL2, mitogen‐activated protein kinase 1 (MAPK1), and spleen‐associated tyrosine kinase (SYK). Key active components such as flavonoids and polyphenols with anti‐inflammatory activities were highlighted. The discovery of luteolin, in particular, underscores its potential therapeutic role. 
These findings provide a valuable genomic resource and a scientific basis for the development and application of I. seguinii, addressing the genomic gap in the genus Iodes and the order Icacinales and underscoring the need for further research in genomics, transcriptomics, and metabolomics to fully explore its potential.

Core Ideas

* First chromosome‐level genome assembly of Iodes seguinii using PacBio HiFi and chromatin conformation capture, providing key resources.
* Identified 84 bioactive compounds in I. seguinii, including luteolin, with potential for arthritis treatment.
* Revealed 119 therapeutic targets, such as AKT1, toll‐like receptor 4, and tumor necrosis factor, showing potential for autoimmune disease treatment.
* Expanded gene families linked to metabolites offer insights into I. seguinii's evolutionary adaptation.
* Network pharmacology and docking show luteolin and flavonoids bind to therapeutic targets, aiding drug development.

Abbreviations: AKT1, AKT serine/threonine kinase 1; BCL2, BCL2 apoptosis regulator; CHI, chalcone isomerase; CHS, chalcone synthase; EGFR, epidermal growth factor receptor; GO, gene ontology; Hi‐C, chromatin conformation capture; HOG, hierarchical orthologous group; JAK2, janus kinase 2; KEGG, Kyoto encyclopedia of genes and genomes; MAPK1, mitogen‐activated protein kinase 1; matK, maturase K; NF‐κB, nuclear factor kappa‐light‐chain‐enhancer of activated B cells; OPLS‐DA, orthogonal partial least squares discriminant analysis; PCA, principal component analysis; PCR, polymerase chain reaction; psbA‐trnH, intergenic spacer region; QC, quality control; RA, rheumatoid arthritis; rbcL, ribulose‐1,5‐bisphosphate carboxylase large subunit; SMILES, simplified molecular input line entry specification; SYK, spleen‐associated tyrosine kinase; TLR4, toll‐like receptor 4; TNF, tumor necrosis factor.

1.
INTRODUCTION Globally, the family Icacinaceae comprises approximately 58 genera and 400 species, primarily distributed in tropical and subtropical regions, with the Southern Hemisphere serving as their predominant habitat (Allen et al., [32]2015). In China, 13 genera and 25 species of Icacinaceae are found, mainly in the southern and southwestern regions. Specifically, 10 genera have been identified in Yunnan Province: Apodytes, Gomphandra, Gonocaryum, Natsiatum, Nothapodytes, Pittosporopsis, Platea, Mappianthus, and Iodes. Among these, several genera include medicinal plants with notable pharmacological properties. For example, Mappianthus iodoides, from the genus Mappianthus, is a traditional medicinal plant used by the Dai people to treat irritability, frequent urination, and traumatic injuries, while the Yao people use its roots and stems to treat jaundice hepatitis, menstrual disorders, and snake bites (Hazarika et al., [33]2023). The genus Iodes is rich in active compounds and exhibits therapeutic effects in treating conditions such as rheumatism, nephritis, dysuria, and swelling pain by dispelling wind and cold, removing dampness, and promoting blood circulation (Gan et al., [34]2008; Ramesha et al., [35]2013). The Iodes genus in China includes several species, such as Iodes cirrhosa, Iodes balansae, Iodes vitiginea, and Iodes seguinii, predominantly found in Yunnan Province. Research has unveiled multiple phenolic compounds in I. cirrhosa, like lignans, phenylpropanoids, and simple phenols, with potential applications in treating inflammation, rheumatic diseases, nephritis, and cancer (Gan et al., [36]2008; Murray & Young, [37]2019; Ramesha et al., [38]2013; Thi Ngoc et al., [39]2022). This species, however, is the sole member of the Iodes genus subjected to genomic analysis, revealing a chloroplast genome of 151,994 base pairs, housing 80 protein‐coding genes, 28 tRNA genes, and four rRNA genes (L. Wang et al., [40]2019). 
Therefore, the lack of fundamental genomic research has significantly hindered the study of the Iodes genus. Iodes seguinii is distinguished by its nodular lenticels and is prized for its sweet, pungent fruit, as well as its medicinal properties (Figure 1a). In Guangxi, this plant is particularly valued for treating nephritis and is sometimes used as a substitute for Stephania tetrandra, a traditional remedy known for alleviating rheumatic arthralgia and edema (Jiang et al., 2020). The therapeutic attributes of I. seguinii suggest that it may be beneficial in managing rheumatic immune disorders. Despite its potential, research on the medicinal molecules and mechanisms of action of I. seguinii is almost nonexistent, highlighting a significant gap in the understanding of this plant's pharmacological potential.

FIGURE 1. Comprehensive de novo genome assembly analysis. (a) Morphological characteristics of I. seguinii. (b) GenomeScope projection of genome size and heterogeneity using a 17‐mer analysis. (c) Smudgeplot estimation depicting genome ploidy. (d) 3D‐DNA produced Hi‐C interaction map showcasing chromosomal interactions. (e) Tracks A–F represent the distribution of chromosome karyotypes, guanine–cytosine (GC) content density, gene density, density of repeat sequences, RNA gene density, and genomic collinearity, respectively, with each density measured over 100‐kb genomic windows.

In this study, we have documented high‐quality nuclear and chloroplast genomes of I. seguinii. Moreover, we have investigated the potential therapeutic applications of active compounds found in the roots, stems, and leaves of this plant for the treatment of rheumatoid arthritis (RA). These findings provide a robust molecular genetic basis for further genetic enhancement, quality improvement, and biosynthetic studies of I. seguinii.

2. MATERIALS AND METHODS

2.1. Sample collection, library construction, and sequencing

Entire plants of I.
seguinii growing naturally were collected from Xichou County in the Wenshan Zhuang and Miao Autonomous Prefecture, Yunnan Province. Each plant was comprehensively sampled, including root, stem, and leaf tissues, and individually labeled as Xichou#1, Xichou#2, and Xichou#3. The samples were quickly frozen using liquid nitrogen and subsequently stored at −80°C. Observations and photographic records were made to document the size, shape, color, surface characteristics, and texture of the fresh plants. For nuclear genome assembly from Xichou#1, genomic DNA was extracted from young leaves using the UNIQ‐10 Column Trizol Total RNA Isolation Kit and assessed via agarose gel electrophoresis for quality. Single‐molecule real‐time sequencing libraries were prepared following Pacific Biosciences' standard protocols. The genomic DNA was sheared to approximately 20 kb, and damaged ends were repaired and linked with blunt‐end adaptors. The libraries were then sequenced on the PacBio Sequel II platform. The workflow for constructing Illumina libraries was as follows: genomic DNA was fragmented into target fragments of approximately 350 bp using ultrasonic shearing. The fragments underwent end repair, addition of an A base, adapter ligation, target fragment selection, and polymerase chain reaction (PCR) amplification to construct the short fragment sequencing library. For the chromatin conformation capture (Hi‐C) library construction, leaf samples were first fixed with formaldehyde, and chromatin was extracted. This chromatin was then digested using 400 U of DpnII enzyme at 37°C. The DNA ends were biotin‐labeled and ligated with T4 DNA ligase from NEB. Post ligation, proteinase K was used for reverse cross‐linking. The DNA fragments were purified and dissolved, then fragmented to 350–500 bp, and biotin‐labeled fragments were isolated with Dynabeads MyOne Streptavidin C1. The Illumina libraries and Hi‐C libraries were both sequenced using the Illumina NovaSeq 6000 sequencing platform.
For transcriptome sequencing, total RNA was extracted from leaf samples using the RNAprep Pure Plant Plus Kit. Equal amounts of total RNA from different samples were mixed, and this mixture was then used for library construction and sequencing. The Illumina library construction process involves using magnetic beads with Oligo (dT) to enrich eukaryotic mRNA, adding Fragmentation Buffer to randomly break the mRNA, using mRNA fragments as templates to synthesize the first strand of cDNA with six‐base random primers, and then adding buffer, dNTPs, RNase H, and DNA polymerase I for second‐strand cDNA synthesis. The double‐stranded cDNA is purified using AMPure XP magnetic beads, followed by end repair, A‐tailing, and sequencing adapter ligation, with further size selection using AMPure XP beads, and finally enriched by PCR to complete the cDNA library construction. High‐throughput sequencing of both ONT and Illumina libraries was performed on PromethION P48 and Illumina HiSeq X Ten platforms, respectively.

2.2. Methodology for species identification using DNA barcoding

For precise species identification, the ribulose‐1,5‐bisphosphate carboxylase large subunit (rbcL), maturase K (matK), and intergenic spacer region (psbA‐trnH) sequences were amplified using genomic DNA (Figure S1).
Primer sequences utilized for amplifying these DNA barcodes, along with the specific PCR reaction protocols, are detailed in Table [45]S1. The sequencing results of rbcL and matK were compared with sequences in the BOLD Systems database ([46]https://www.boldsystems.org), while the psbA‐trnH sequences were aligned against the Nucleotide Sequence Database (NT). Species identification was based on comparison scores, sequence similarity, and E‐value. 2.3. Genome survey, assembly, and quality assessment As illustrated by the bioinformatics pipeline in Figure [47]S2, Jellyfish (v2.2.10) calculates k‐mer frequencies from genomic Illumina sequencing data. Subsequently, a histogram of these frequencies is generated using the same software. GenomeScope (v2.0) then evaluates the nuclear genome's total length, heterozygosity, and repeat content. Finally, Smudgeplot (v0.2.5) determines the k‐mer coverage range for nuclear genome ploidy assessment and generates smudge plots. The parameters set for this assessment are default. Hifiasm software (v0.19.4‐r575) leverages quality‐controlled genomic PacBio HiFi and Hi‐C sequencing data to construct accurate and contiguous haploid genome assemblies, further processed into FASTA format using gfatools (v0.4‐r214‐dirty). HiCUP software (v0.9.2) aligns Hi‐C sequencing data to these contigs, which are assembled into scaffolds using YaHS software (v1.2a.1), guided by the recognition site of HindIII. The assembly is refined and validated using Juicer software (v1.1) and Juicebox software (v1.11.08), which utilize hic and assembly files to correct assembly errors and finalize chromosome‐level structures. BUSCO software (v5.4.6) evaluates the completeness and accuracy of the assembly by comparing it to the Embryophyta_odb10 dataset, a comprehensive plant orthologous gene database. 
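The k‐mer survey described above rests on two simple operations: counting how often each k‐mer occurs in the reads, and summarizing those counts as a histogram of multiplicities. The following Python sketch illustrates both on toy reads. It is not Jellyfish's implementation; the function names, the toy reads, and the choice to count canonical k‐mers (a k‐mer and its reverse complement treated as one) are assumptions made for illustration only.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


# Hypothetical toy reads; a real survey would stream reads from FASTQ files.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, k=3)
histogram = kmer_histogram(counts)
```

A histogram of this kind (multiplicity versus number of distinct k‐mers) is the input that tools such as GenomeScope model to estimate genome size, heterozygosity, and repeat content.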
Alignment and coverage analyses are performed using HISAT2 (v2.2.1), minimap2 (v2.17‐r941), and samtools (v1.18), supplemented by mosdepth (v0.3.6) for precise coverage calculations, ensuring a robust assessment of the sequencing and assembly processes. 2.4. Genome annotation To identify long terminal repeats (LTRs) in the nuclear genome, LTRharvest (v1.6.1) and LTR_FINDER_parallel (v1.2) are employed (Ellinghaus et al., [48]2008; Ou & Jiang, [49]2019). Results from both software are integrated, and the long terminal repeat assembly index (LAI) score is calculated using LTR_retriever (v2.9.0) (Ou & Jiang, [50]2018). RepeatModeler (v2.0.4) predicts repeat sequences de novo in the nuclear genome (Abrusan et al., [51]2009). Predicted results are merged with repeat sequences from the Repbase and Dfam databases and annotated using RepeatMasker (v4.1.4) (Jurka et al., [52]2005; Storer et al., [53]2021; Tarailo‐Graovac & Chen, [54]2009). After that, tRNAscan‐SE (v2.0.11) and RNAmmer (v1.2) are used to predict tRNA and rRNA genes, respectively (Chan et al., [55]2021; Lagesen et al., [56]2007). The INFERNAL software's cmscan tool (v1.1.4) compares covariance models from the Rfam database with the nuclear genome to identify RNA genes (Kalvari et al., [57]2021; Nawrocki & Eddy, [58]2013). Three strategies are utilized for annotating protein‐coding genes in the nuclear genome: de novo prediction, homology‐based prediction, and transcriptome‐based prediction. PASA (v2.5.2) aligns transcriptome splicing results with the nuclear genome (Haas, [59]2003). The correct gene structures from PASA are used for training AUGUSTUS (v3.5.0) and GlimmerHMM (v3.0.4), while GeneMark‐ET (v4.71_lic) is trained with intron positions from the nuclear genome (Bruna et al., [60]2020; Majoros et al., [61]2004; Stanke et al., [62]2008). Genome sequences and annotations from the NCBI Genome database for Coffea arabica (C. arabica) and Coffea eugenioides (C. 
eugenioides), along with transcriptome alignment results, are input into GeMoMa (v1.9) for homology‐based prediction (Keilwagen et al., 2018). Protein sequences from the Lamiaceae family in the UniProt database are used for GenomeThreader (v1.7.1) homology prediction. EVidenceModeler (v1.1.1) integrates results from de novo, protein, and transcript‐based homology predictions to form the final gene structure (Haas et al., 2008). Predicted protein‐coding genes are functionally annotated using similarity comparison methods. DIAMOND (v2.0.15.153) searches for homologous genes in the UniProt and NR databases with the parameters --evalue 1e-5 and -k 1 (Buchfink et al., 2021; UniProt, 2023). The eggNOG‐mapper and KAAS websites provide gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) annotations for these genes (Cantalapiedra et al., 2021; Moriya et al., 2007). Finally, InterProScan (v5.61‐93.0) performs structural domain annotations using databases like Pfam and also provides GO terms and InterPro domain annotations (Jones et al., 2014).

2.5. Homologous gene clustering and phylogenetic tree construction

Complete genome sequences, genome annotation files, coding sequences (CDS), and protein sequences for nine plant species (eight Lamiids plus an outgroup) were downloaded from the NCBI Genome database: Anisodus acutangulus, Arabidopsis thaliana (the outgroup), Capsicum annuum, C. eugenioides, Lycium barbarum, Solanum dulcamara, Sesamum indicum, Solanum lycopersicum, and Salvia miltiorrhiza. Based on the genome annotations, genes and their longest transcripts were identified for subsequent analysis. OrthoFinder (v2.5.5) was used for orthologous and paralogous gene clustering. The clustered single‐copy orthologous genes were input into MAFFT (v7.520) for multiple sequence alignment. Protein alignments were converted to corresponding CDS alignments using PAL2NAL (v14), and conserved sites were extracted from these alignments with trimAl (v1.4.rev15).
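The conversion of a protein alignment into a CDS alignment mentioned above works by threading each sequence's codons onto its aligned amino acids, so that every gap in the protein alignment becomes a three‐base gap. The sketch below shows only that core idea in Python. It is not PAL2NAL's implementation: the function name `backtranslate_alignment` is hypothetical, and the sketch assumes the CDS translates exactly to the ungapped protein (no frameshifts, no trailing stop codon).

```python
def backtranslate_alignment(aligned_protein: str, cds: str) -> str:
    """Thread codons from an unaligned CDS onto a gapped protein sequence.

    Each residue consumes the next codon; each '-' becomes '---'.
    Raises ValueError if the CDS length does not match the residue count.
    """
    residue_count = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * residue_count:
        raise ValueError("CDS length must be three times the number of residues")

    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    aligned_codons = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            aligned_codons.append("---")
        else:
            aligned_codons.append(codons[next_codon])
            next_codon += 1
    return "".join(aligned_codons)


# Hypothetical toy input: protein "MK" aligned with one gap, CDS "ATGAAA".
codon_alignment = backtranslate_alignment("M-K", "ATGAAA")
```

Keeping gaps in multiples of three preserves the reading frame, which is what downstream codon‐based analyses require.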
These conserved sites were then used as input to create a supermatrix with the phylotools package (v0.2.2), which served as the basis for constructing a phylogenetic tree using IQ‐TREE (v2.2.5).

2.6. Divergence time estimation

Based on the results of the CDS multiple sequence alignment, phylogenetic tree, and species divergence times obtained from the TimeTree website (https://timetree.org), the mcmctree program in PAML (v4.10.6) was used to calculate species divergence times.

2.7. Gene family expansion and contraction analysis

Based on the results of homologous gene clustering, phylogenetic tree construction, and species divergence time estimation, CAFE (v5.0.0) was used to analyze changes in gene family size throughout evolutionary history. Using the nuclear genome's protein‐coding genes as a background gene set, genes in significantly expanded or contracted families were analyzed for GO and KEGG enrichment using the clusterProfiler package (v4.6.2). Based on the results of CDS multiple sequence alignment and phylogenetic tree construction, the codeml program in PAML software was used to detect Darwinian positive selection driving protein evolution. This study used the branch of I. seguinii in the phylogenetic tree as the foreground branch, with other branches serving as background lineages. The branch‐site model was used to calculate selective pressure on the foreground branch and to identify genes under positive selection through likelihood ratio tests (p‐value ≤ 0.05).

2.8. Genome polyploidization analysis

Using BlastP software (v2.14.0), protein sequence alignments of A. acutangulus were performed and compared with the protein sequences of L. barbarum, S. dulcamara, S. indicum, and S. miltiorrhiza (Camacho et al., 2009). Based on the alignment results, MCScanX software was used to detect collinear blocks within the nuclear genome of A. acutangulus, and the results were visualized on the SynVisio website (https://synvisio.github.io/) (Y. Wang et al., 2012).
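The likelihood test used for the branch‐site model in Section 2.7 compares the log‐likelihood of a null model (no positive selection on the foreground branch) with that of the alternative model. A minimal Python sketch of this comparison follows. It assumes the test statistic is referred to a chi‐square distribution with one degree of freedom, which is a common convention but is not stated in the text; the log‐likelihood values and function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null), floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(stat: float) -> float:
    """P(X >= stat) for a chi-square variable with 1 degree of freedom.

    For df = 1, X is the square of a standard normal, so the tail
    probability equals erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


def is_positively_selected(lnl_null: float, lnl_alt: float, alpha: float = 0.05) -> bool:
    """Apply the p-value <= alpha rule to one gene."""
    return chi2_sf_df1(lrt_statistic(lnl_null, lnl_alt)) <= alpha


# Hypothetical log-likelihoods for a single gene under the two models.
p_value = chi2_sf_df1(lrt_statistic(lnl_null=-1002.5, lnl_alt=-1000.0))
```

In practice the two log‐likelihoods would be read from the codeml output for each gene, and a multiple‐testing correction may be advisable when thousands of genes are screened.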
WGDI software (v0.6.5) was employed to analyze collinear blocks within and between species and to calculate the substitution per synonymous site (Ks), plotting the results as frequency distribution charts (Sun et al., 2022). JCVI software (v1.3.8) was used to identify collinear blocks between species and to generate macro‐synteny plots (H. Tang et al., 2008). The parameters used for running these three software applications were set to default.

2.9. Gene identification and evolutionary analysis

The protein domain HMM configuration files for chalcone synthase (CHS) and chalcone isomerase (CHI) were downloaded from the Pfam database (http://pfam.xfam.org). HMMER software (v2.3.2) and its subprogram hmmsearch were used to search for proteins with the same domains in the protein sequences of A. acutangulus (Finn et al., 2015). The search results were combined with gene functional annotations from eggNOG‐mapper to identify the CHS and CHI genes in A. acutangulus. Then, OMA standalone software (v2.6.0) was used for hierarchical orthologous clustering based on the protein sequences of 10 plants, and the clustering results that included the CHS and CHI genes were visualized using the pyHam package (v1.1.12) (Altenhoff et al., 2019; Train et al., 2019). The parameters set for gene identification, clustering, and visualization of clustering results were default. To explore the gain or loss of genes encoding these two key rate‐limiting enzymes during adaptive evolution, this study used the protein sequences of I. seguinii, A. thaliana, and the aforementioned eight Lamiids species as input files. The OMA standalone software was employed to identify hierarchical orthologous groups (HOGs).

2.10.
De novo assembly and annotation of the chloroplast genome

Chloroplast genomes carry conserved genetic markers that help resolve evolutionary relationships and genetic diversity among plant species, and they provide a foundation for comparing genetic similarities and differences that can shed light on plant adaptation and speciation. As illustrated by the bioinformatics pipeline in Figure S3, the whole genome Illumina sequencing data, after quality assessment and control, were assembled into the chloroplast genome using GetOrganelle (v1.7.7.0) (Jin et al., 2020). After assembly, the results were viewed in Bandage (v0.8.1) to confirm circular assembly (Wick et al., 2015). Subsequently, the chloroplast genome sequences of two configurations were aligned with those of I. cirrhosa, C. arabica, Solanum tuberosum, S. lycopersicum, C. annuum, Nicotiana tabacum, and A. thaliana using MAFFT (v7.520) (Rozewicki et al., 2019). Conserved sites were extracted using trimAl (v1.4.rev15) (Capella‐Gutierrez et al., 2009), and the results were used to construct a phylogenetic tree with IQ‐TREE (v2.2.5) (Minh et al., 2020). The chloroplast genome annotation file of I. cirrhosa was downloaded from the NCBI Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore) and used as a reference for annotating the chloroplast genome of I. seguinii on the CPGAVAS2 website (http://47.96.249.172:16019/analyzer/home) (Shi et al., 2019). The GenBank file was then submitted to the CPGView website (http://47.96.249.172:16085/cpgview/home) to visualize the annotation results as a feature map (S. Liu et al., 2023).
The annotation results from CPGAVAS2 and the visualization results from CPGView were combined to analyze the chloroplast genome's length, lengths of various regions (large single copy [LSC], small single copy [SSC], and inverted repeat [IR]), number of protein‐coding genes, and number of noncoding genes. 2.11. Sample preparation, extraction for metabolite detection Rhizome and leaf samples were flash‐frozen and ground into a fine powder, which was then mixed with a methanol/acetonitrile/water solution for centrifugation. The resulting supernatant was dried and subsequently redissolved for analysis. High‐performance liquid chromatography‐tandem mass spectrometry (HPLC‐MS/MS) was employed to separate and analyze the samples. Quality control measures were incorporated throughout the process to ensure the reliability of the data. Metabolites were qualitatively and quantitatively analyzed by converting mass spectrometry data into mzXML format, followed by peak alignment, retention time correction, and peak area extraction. Identification of metabolites was performed against a laboratory‐built database. The detailed flowchart for the analysis of metabolite detection data was illustrated in Figure [91]S4. 2.12. Metabolite detection results statistics and inter‐group difference analysis Metabolites identified in both positive and negative ion modes were categorized based on their chemical classifications. The principal component analysis (PCA) was conducted using the ropls package (v1.34.0) to evaluate the reliability of metabolite analysis and the distribution trends between groups. Differences in metabolite abundance among roots, stems, and leaves were analyzed using T‐tests, fold change analysis, and orthogonal partial least squares discriminant analysis (OPLS‐DA) to identify significant differential metabolites (DMs). These DMs were further analyzed for expression patterns in root, stem, and leaf samples using the Mfuzz package (v2.26.0). 
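Two of the screening quantities named in Section 2.12, the fold change and the t‐test statistic, can be computed directly from per‐sample peak areas. The Python sketch below computes a log2 fold change and Welch's t statistic with its approximate degrees of freedom for one metabolite in two tissues. The peak areas, the function names, and the choice of Welch's unequal‐variance form are assumptions for illustration; the source does not state which t‐test variant or thresholds were used, and a p‐value would normally be obtained from a statistics library.

```python
import math
from statistics import mean, variance


def log2_fold_change(reference, treatment):
    """log2 of the ratio of group means (treatment over reference)."""
    return math.log2(mean(treatment) / mean(reference))


def welch_t(reference, treatment):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_ref, var_trt = variance(reference), variance(treatment)
    n_ref, n_trt = len(reference), len(treatment)
    se_squared = var_ref / n_ref + var_trt / n_trt
    t_stat = (mean(treatment) - mean(reference)) / math.sqrt(se_squared)
    df = se_squared ** 2 / (
        (var_ref / n_ref) ** 2 / (n_ref - 1) + (var_trt / n_trt) ** 2 / (n_trt - 1)
    )
    return t_stat, df


# Hypothetical peak areas for one metabolite in three root and three leaf samples.
root_areas = [100.0, 110.0, 90.0]
leaf_areas = [400.0, 420.0, 380.0]

fold_change = log2_fold_change(root_areas, leaf_areas)
t_stat, df = welch_t(root_areas, leaf_areas)
```

A metabolite would then be flagged as differential only if it passes the chosen fold‐change, p‐value, and OPLS‐DA criteria together.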
The relative abundances of differential metabolites were visualized with hierarchical clustering heatmaps using the pheatmap package (v1.0.12). Metabolite pathway annotations and enrichments were performed using the KEGG databases, and subsequent pathway enrichment analysis of the differential metabolites was conducted using the clusterProfiler package (v4.10.0). 2.13. Network pharmacology analysis and docking analysis The physicochemical and pharmacokinetic properties of the detected metabolites were assessed using SwissADME. Simplified molecular input line entry specification (SMILES) codes were subsequently entered into SwissTargetPrediction for target identification. RA treatment targets were investigated using OMIM, GeneCards, and DisGeNET databases. Significant targets identified were analyzed using the VennDiagram package (v1.7.3), and enrichment analysis was performed with the clusterProfiler package. These targets were integrated into a network constructed in Cytoscape (v3.10.1), employing the NetworkAnalyzer plugin to identify key active components. Additionally, a protein–protein interaction (PPI) network was constructed using the STRING database and analyzed in Cytoscape with the CytoNCA plugin to determine essential therapeutic targets. Key therapeutic target structures were retrieved from the RCSB protein data bank database. These structures were prepared in PyMOL (v2.5.5) by removing water and ligand molecules and subsequently processed in AutoDock Tools (v1.5.7) for docking simulations. Structures of active components were downloaded from PubChem, converted to mol2 format using Open Babel (v3.1.0), and further prepared in AutoDock Tools. Docking simulations between these key targets and active components were conducted using AutoDock Vina (v1.1.2) to evaluate their interaction efficacy. 3. RESULTS 3.1. 
Species identification In addition to morphological observations, DNA barcoding was employed to identify the three plant specimens collected from Yunnan, China. By analyzing the rbcL gene, genus‐level identification was achieved, confirming that the specimens belong to the genus Iodes. However, the use of the matK gene provided precise species‐level identification, consistently confirming the target species (Table [92]S2). Due to the limited number of Iodes sequences in the NT database, the psbA‐trnH sequence aligned with the chloroplast genome of I. cirrhosa (Table [93]S3). 3.2. Genome assembly and completeness evaluation An individual of I. seguinii, numbered as Xichou#1, was sequenced. The estimated genome size of I. seguinii was approximately 288.70 Mb (Figure [94]1b), with a diploid karyotype as determined by a k‐mer survey using Illumina short reads (Figure [95]1c). The heterozygosity level was estimated at 0.65%, and repetitive content accounted for 42.1% of the genome. A de novo assembly from 29 Gb (∼100× coverage) of HiFi reads, with an average read length of 18.52 kb, produced an assembled sequence of 273.58 Mb, comprising 355 contigs with an N50 size of 9.71 Mb (Table [96]1; Figure [97]S5a). Utilizing 107 Gb of Hi‐C data, the sequence of I. seguinii was further organized by anchoring the contigs into 399 scaffolds, achieving an N50 size of 18.97 Mb. After manual correction, the sequence was ultimately assembled to a total size of 273.58 Mb, organized into 13 pseudochromosomes (Figure [98]1d). These pseudochromosomes range in size from approximately 14.92 Mb to 36.59 Mb (Table [99]S4). Chromosome 1 is the longest and is composed of longer contig sequences, which is also observed in chromosomes 6 and 10. In contrast, the shortest chromosome, chromosome 13, contains shorter contig sequences, similar to chromosomes 2, 3, 4, 5, 7, 8, 9, 11, and 12 (Figure [100]S5b). TABLE 1. Quality assessment of the nuclear genome assembly of I. seguinii. 
Items                                      Data
Plant material                             Xichou#1 from Yunnan Province, China
Estimated genome size (Mb)                 288.70
Estimated heterozygosity (%)               0.65
Sequencing platform (genome coverage)
  PacBio Sequel II (Gb)                    28.97 (100×)
  Illumina NovaSeq 6000 (DNA‐seq, Gb)      29.04 (100×)
  Illumina NovaSeq 6000 (Hi‐C, Gb)         107.3 (371×)
  ONT PromethION P48 (Gb)                  51.82
  Illumina HiSeq X (RNA‐seq, Gb)           24.77
Assembly statistics
  Total number of scaffolds                399
  Scaffold N50 length (Mb)                 18.97
  Total number of contigs                  355
  Contig N50 length (Mb)                   9.71
  Assembly size (Mb)                       273.58
  GC content (%)                           32.73
  Repetitive content (%)                   42.10
Assessment
  Genome BUSCOs (%)                        97.40
  LTR assembly index                       11.44
  Illumina mapping rate (%)                87.64
  ONT mapping rate (%)                     89.62

Abbreviations: GC, guanine–cytosine; LTR, long terminal repeat.

A comprehensive evaluation of the assembly quality was conducted. The Hi‐C interaction heatmap indicates that the contigs anchored to the 13 chromosomes are accurate in clustering, sorting, and orientation. The alignment rates of the transcriptome Illumina sequencing data and ONT sequencing data to the 13 chromosomes are 87.64% and 89.62%, respectively. Additionally, the assembly completeness of these 13 chromosomes is 97.4%, the HiFi sequencing coverage is 52.08×, and the LAI is 11.44, which falls under the “Reference” level. These results indicate that the chromosome‐level nuclear genome assembled in this study meets high standards of accuracy, completeness, and contiguity.

3.3. Genome annotation

Transposons constitute the largest proportion of repetitive sequences at 22.32%. DNA transposons and retrotransposons, including SINE, LINE, and LTR, account for 4.17%, 0.17%, 3.89%, and 14.09% of the nuclear genome, respectively (Table S5). Microsatellites (simple repeats) are the most numerous, comprising 33.18% of the total repetitive sequences. Satellites, another type of tandem repeat, account for only 0.000511%.
Chromosome 1 has the highest number of DNA transposons, LINEs, microsatellites, and SINEs. Chromosome 2 has the highest number of LTRs and satellites, while chromosome 5 has the highest content of repetitive sequences (Figure S6). A total of 25,270 genes were predicted in the I. seguinii genome assembly using homology, transcript‐based, and ab initio gene prediction approaches, after excluding 115.28 Mb (42.14%) of repetitive sequences (Figure 1e; Table S6). The completeness of the genome annotation was assessed using BUSCO, revealing that the annotated gene set covered 1513 (93.74%) of the 1614 universal single‐copy genes found in the Embryophyta lineage (Table S7). Sequence similarity comparisons against six major databases (NR, Swiss‐Prot, eggNOG, KEGG, InterPro, and Pfam) were used to identify known functional genes homologous to the 25,270 protein‐coding genes, and the functions of 23,998 (94.97%) genes were annotated on this basis. Functional annotations for these genes were derived from at least one of the aforementioned databases. Additionally, 16,992 genes (67.24%) were associated with at least one GO annotation, and 9777 genes (38.69%) were associated with at least one KEGG pathway annotation (Table S8).

3.4. Orthologue identification and phylogenetic inference

The results from OrthoFinder indicated that the protein‐coding genes from A. acutangulus, A. thaliana, C. annuum, C. eugenioides, L. barbarum, S. dulcamara, S. indicum, S. lycopersicum, S. miltiorrhiza, and I. seguinii constitute 9630 gene families, including 1076 single‐copy orthologous genes (Figure 2a). Among the 25,270 protein‐coding genes in I. seguinii, 23,308 were clustered into 14,410 gene families, while the remaining 1962 genes did not belong to any gene family (Figure 2b). To determine the phylogenetic position of I. seguinii within the Lamiids, a maximum likelihood phylogenetic tree was constructed, with A. thaliana serving as the outgroup.
The resulting rooted tree reveals that I. seguinii does not share a close evolutionary relationship with the eight other Lamiids plants included in the study. Instead, the data suggest that I. seguinii may be situated at the base of the Lamiids evolutionary tree, indicating a more distant ancestral lineage within this clade (Figure 2c).

FIGURE 2. Gene family dynamics across species and phylogenetic inference. (a) Gene clustering analysis involving 10 different species. (b) Comprehensive overview of orthologous and paralogous gene relationships. (c) Phylogenetic tree illustrating the dynamic evolution of gene families among 10 species, with blue and red numbers on each branch indicating the number of gene families that have expanded and contracted, respectively. Pie charts adjacent to each branch detail the proportions of these expanded (blue) and contracted (red) gene families, while black numbers denote divergence times. (d) Self-self synteny analysis of the pseudo-assembled genome, depicting numerous inversions and duplications. (e) Ks distribution among I. seguinii and three other species, highlighting the synonymous substitution rates and revealing evolutionary events and divergence times. (f) Characterization of the chloroplast genome. (g) Phylogenetic tree inferred from the chloroplast genomes of eight plant species. ATP, adenosine triphosphate; LSU, large subunit; MRCA, most recent common ancestor; NADH, nicotinamide adenine dinucleotide (reduced form); SSU, small subunit.

To date evolutionary events, a time-calibrated phylogenetic tree was reconstructed using fossil calibration times. The analysis revealed that A. acutangulus and L. barbarum, two common medicinal herbs, diverged approximately 14.7 million years ago. In addition, S. dulcamara and S. lycopersicum diverged around 11.4 million years ago, and their common ancestor split from C. annuum around 15.9 million years ago.
The divergence between these five Solanaceae plants and the Rubiaceae plant (C. eugenioides) occurred approximately 90.3 million years ago. This lineage was sister to the Lamiales branch, which includes S. indicum (Pedaliaceae) and S. miltiorrhiza (Lamiaceae), with the two branches diverging about 99.7 million years ago. Sesamum indicum and S. miltiorrhiza themselves diverged between 47.4 and 50.1 million years ago. The divergence of these eight plants from I. seguinii occurred between 114.4 and 120.5 million years ago. Serving as the outgroup, A. thaliana diverged from these nine Lamiids plants around 15.96 million years ago.

3.5. Expansion, contraction, and positive selection of gene families

Of the 14,410 gene families of I. seguinii, 1566 expanded and 1663 contracted. Genes in significantly expanded families (SEGs) were primarily involved in the synthesis of secondary metabolites such as ethylene, coumarins, monoterpenes, phenylpropanoids, flavonoids, and steroid hormones (Figure S7a,b). Conversely, genes in significantly contracted families (SCGs) were related to the synthesis of sesquiterpenes, phenylpropanoids, monoterpenes, glucosinolates, isoflavones, and zeatin, as well as the degradation of terpenoids and isoprenoids, and the metabolism of alkaloids, linoleic acid, triterpenes, and galactose (Figure S7c,d). Additionally, SEGs were enriched in the isoflavonoid biosynthesis pathway, while SCGs were enriched in GO terms for defense response to damage and herbivores. These enrichment results provide insights into the adaptation of I. seguinii to its environment and the associated metabolic processes over its long-term evolution. In the lineage containing I. seguinii, a total of 156 genes are under positive selection (p-value < 0.05).
These genes are primarily associated with DNA damage response mechanisms, including DNA recombinational repair (GO:0000725, ko03400), interstrand cross-link repair (GO:0036297), double-strand break repair (GO:0006302), and homologous recombination repair (GO:0000724, ko03440) (Figure S7e,f).

3.6. Genome collinearity and polyploidy events

The collinearity analysis of I. seguinii chromosomes identified 6646 collinear genes and 342 collinear blocks within the nuclear genome (Figure 2d; Figure S8). Subsequently, three medicinal plants were selected together with I. seguinii to calculate Ks values based on the nuclear genome structure annotation, CDS sequences, and protein sequences (Figure 2e). Ks calculations for S. dulcamara and S. indicum indicated that they, along with I. seguinii, underwent an ancient polyploidy event (Ks ≈ 1.89) followed by divergence (Ks ≈ 1.40), consistent with the phylogenetic tree. After divergence, I. seguinii experienced a separate polyploidy event (Ks ≈ 1.08), while S. dulcamara and S. indicum shared another event with L. barbarum (Ks ≈ 0.69). Divergence among these species occurred approximately 114.4–120.5 million years ago, suggesting a synonymous nucleotide substitution rate of approximately 5.81 × 10⁻⁹ to 6.12 × 10⁻⁹ substitutions per site per year. Thus, the two polyploidy events in I. seguinii likely occurred 88.2–92.9 million years ago and 154.4–162.7 million years ago. Collinearity analysis revealed that each chromosome of I. seguinii had collinear blocks with multiple chromosomes of S. dulcamara and S. indicum, indicating no one-to-one collinearity relationship, likely due to the unique polyploidy events in these species (Figures S9 and S10).

3.7. Chloroplast genome assembly and annotation

The assembly results of the chloroplast genome of I. seguinii indicate a single isoform configuration with a total size of 150,599 bp.
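The dating of the polyploidy events in Section 3.6 can be retraced with a few lines of arithmetic. The sketch below assumes the standard relation T = Ks / (2r) between a Ks value, the synonymous substitution rate r, and the elapsed time T. The text does not state the formula explicitly, but this relation reproduces the reported rates and age ranges to within rounding. Function and variable names are illustrative.

```python
# Sketch only: assumes T = Ks / (2 * r), which is not stated in the text
# but matches the reported numbers. All names here are illustrative.

def substitution_rate(ks: float, divergence_mya: float) -> float:
    """Synonymous substitutions per site per year implied by a dated split."""
    return ks / (2 * divergence_mya * 1e6)

def event_age_mya(ks: float, rate: float) -> float:
    """Age (million years) of an event with the given Ks peak."""
    return ks / (2 * rate) / 1e6

KS_DIVERGENCE = 1.40           # Ks peak of the species divergence
T_MIN, T_MAX = 114.4, 120.5    # divergence time range (million years ago)

rate_fast = substitution_rate(KS_DIVERGENCE, T_MIN)  # upper bound on the rate
rate_slow = substitution_rate(KS_DIVERGENCE, T_MAX)  # lower bound on the rate

# Age range (youngest, oldest) for each polyploidy Ks peak of I. seguinii.
ages = {
    ks: (event_age_mya(ks, rate_fast), event_age_mya(ks, rate_slow))
    for ks in (1.08, 1.89)
}

if __name__ == "__main__":
    print(f"rate: {rate_slow:.2e} to {rate_fast:.2e} per site per year")
    for ks, (young, old) in ages.items():
        print(f"Ks = {ks}: {young:.1f} to {old:.1f} million years ago")
```

Because the rate is itself derived from the divergence Ks, each age is simply the divergence time scaled by the ratio of the two Ks values; the computed ranges agree with the reported ones to within about 0.1 million years.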
The chloroplast genome is composed of four regions: an LSC region of 84,147 bp, an SSC region of 18,932 bp, and two inverted repeats (IRs), A and B, each measuring 23,760 bp (Figure 2f). It contains 121 genes, including 84 protein-coding genes, 29 tRNA genes, and eight rRNA genes. Six protein-coding genes (ndhB, rpl2, rpl23, rps7, ycf15, ycf2), six tRNA genes (trnI-CAT, trnL-CAA, trnM-CAT, trnN-GTT, trnR-ACG, trnV-GAC), and four rRNA genes (rrn4.5S, rrn5S, rrn16S, rrn23S) are duplicated; eight protein-coding genes (atpF, ndhA, ndhB, petB, petD, rpl2, rpoC1, rps16) contain one intron each; and two protein-coding genes (clpP and ycf3) contain two introns each (Table S9). Subsequently, the chloroplast genome sequence was compared to those of A. thaliana and six related species to construct a phylogenetic tree (Figure 2g). The analysis revealed that I. cirrhosa and I. seguinii cluster together as sister taxa within the genus Iodes.

3.8. Identification and analysis of differential metabolites

A total of 1289 metabolites from 11 categories were detected in the roots, stems, and leaves of the I. seguinii plant (Figure 3a). These include 344 lipids and lipid-like molecules, 213 phenylpropanoids and polyketides, 121 benzene ring compounds, 110 oxygen-containing organic compounds, 105 heterocyclic compounds, 104 organic acids and their derivatives, 29 alkaloids and their derivatives, 27 nitrogen-containing organic compounds, 24 lignans and neolignans, and 20 nucleosides, nucleotides, and their analogs. The PCA results for the experimental and quality control (QC) samples showed that the QC samples were tightly clustered at the center of the score plot under both positive (Figure 3b) and negative ion modes (Figure 3c), while the more dispersed distribution of root, stem, and leaf samples indicated significant differences in metabolite content among these tissues, a finding corroborated by the OPLS-DA model (Figure S11a–c).
FIGURE 3. Detailed metabolomic profiling and target analysis. (a) Classification of metabolites identified in roots, stems, and leaves, displaying various compound categories. The principal component analysis (PCA) of root, stem, leaf, and quality control (QC) samples in (b) positive and (c) negative ion mode showed clear separation of tissue types. (d) Hierarchical clustering analysis of differential metabolites (DMs), demonstrating unique metabolic profiles. (e) Venn diagram showing the overlap between rheumatoid arthritis therapeutic targets and active component targets, revealing 119 common targets. (f) Kyoto encyclopedia of genes and genomes (KEGG) pathway and (g) gene ontology (GO) enrichment analysis of these common targets, highlighting the associated biological processes and pathways. RA, rheumatoid arthritis.

Ultimately, 137 differential metabolites (DMs) were identified between leaves and stems (77 upregulated, 60 downregulated), 133 DMs between roots and leaves (54 upregulated, 79 downregulated), and 95 DMs between roots and stems (30 upregulated, 65 downregulated) (Figure S12a). There were 22 DMs common to all comparison groups, while 29 DMs were unique to the leaves and stems group, 16 to the roots and leaves group, and 26 to the roots and stems group (Figure S12b). A hierarchical clustering heatmap of the 207 distinct DMs showed that the samples within each group clustered together, indicating high intra-group similarity. The row clustering grouped DMs with similar expression patterns (Figure 3d). These expression patterns could be divided into nine modules. Among them, 84 DMs had the highest relative content in leaves (expression modules 1, 2, and 9), 50 DMs had the highest relative content in roots (expression modules 3, 4, and 6), and 73 DMs had the highest relative content in stems (expression modules 5, 7, and 8) (Figure S12c).

3.9. Network pharmacology and molecular docking

Based on the metabolite detection results from the roots, stems, and leaves of I. seguinii, 84 active compounds were identified that have high bioavailability and anti-inflammatory, immunomodulatory, or immunosuppressive effects. These compounds adhere to at least four of five drug-likeness rules (Lipinski, Ghose, Veber, Egan, and Muegge). Using their SMILES codes as input, target prediction on the SwissTargetPrediction website (http://www.swisstargetprediction.ch) identified 733 targets. Further searches using "rheumatoid arthritis" as a keyword in databases such as DisGeNet, GeneCards, and OMIM yielded 829 potential therapeutic targets after filtering. A Venn diagram showed that 119 of the 733 predicted interaction targets are also RA therapeutic targets (Figure 3e). This relationship forms a "compound-target" network with 666 edges. Hierarchical clustering and analysis identified 11 key bioactive compounds, predominantly flavonoids, except for morin, a polyphenolic compound (Figure 4a).

FIGURE 4. Detailed analysis of protein interactions and biosynthetic pathways of luteolin. (a) Component-target network illustrating the interactions between active ingredients in I. seguinii and therapeutic targets for rheumatoid arthritis. Blue nodes represent the active ingredients, while red nodes indicate intersection targets. Gray lines connect blue nodes to red nodes, showing interactions between the active ingredients and their targets. (b) Topological analysis of the protein–protein interaction (PPI) network, highlighting key hub proteins and their interactions. (c) Heat map showing the binding energies of the main components with the hub targets, indicating potential therapeutic interactions. (d) Proposed biosynthetic pathway for luteolin, detailing the enzymatic steps and intermediate compounds involved.
4CL, 4-coumarate: CoA ligase; C4H, cinnamic acid 4-hydroxylase; CHI, chalcone isomerase; CHS, chalcone synthase; FNS, flavone synthase; PAL, phenylalanine ammonia-lyase.

The KEGG enrichment analysis indicated that the intersection targets were significantly enriched in the RA disease pathway (hsa05323); three immune response-related pathways, namely the Toll-like receptor signaling pathway (hsa04620), T cell receptor signaling pathway (hsa04660), and Th17 cell differentiation (hsa04659); three inflammation-related immune response pathways, namely the nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) signaling pathway (hsa04064), chemokine signaling pathway (hsa04062), and IL-17 signaling pathway (hsa04657); and one immune regulation pathway, the prolactin signaling pathway (hsa04917). This indicates that the therapeutic effects of the active ingredients on RA may be related to their regulation of these pathways (Figure 3f). The GO enrichment analysis showed that the intersection targets were significantly enriched in biological processes such as regulation of inflammatory response (GO:0050727), positive regulation of inflammatory response (GO:0050729), positive regulation of cytokine production (GO:0001819), leukocyte migration (GO:0050900), and cytokine-mediated signaling pathway (GO:0019221). This suggests that the active ingredients may participate in regulating the human inflammatory response by inhibiting cytokine secretion, cytokine-mediated signaling pathways, and excessive migration of immune cells (Figure 3g).

Further analyses in the STRING database constructed a PPI network of 106 proteins with 602 interactions, pinpointing highly connected core nodes crucial for network integrity and biological significance (Figure 4b).
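Hub screening of a PPI network of this kind ranks nodes by centrality measures such as degree and betweenness. As a minimal sketch, the code below computes both on a hypothetical five-node toy graph (nodes A to E are placeholders, not proteins from the study's STRING network), using Brandes' algorithm for betweenness. The software used for this step in the study is not stated here, so the function names and the graph are assumptions for illustration only.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: A is a hub, D bridges A and E.
toy_graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}

degree = degree_centrality(toy_graph)
betweenness = betweenness_centrality(toy_graph)
hubs = sorted(toy_graph, key=lambda v: (degree[v], betweenness[v]), reverse=True)
```

In the toy graph, node A ranks first on both measures and the bridging node D second, which is the kind of ordering a hub screen relies on.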
Based on degree centrality and betweenness centrality, the screened core targets included AKT serine/threonine kinase 1 (AKT1), toll-like receptor 4 (TLR4), epidermal growth factor receptor (EGFR), tumor necrosis factor (TNF), tumor protein P53 (TP53), nuclear factor kappa B subunit 1 (NFKB1), janus kinase 2 (JAK2), BCL2 apoptosis regulator (BCL2), mitogen-activated protein kinase 1 (MAPK1), and spleen-associated tyrosine kinase (SYK). The affinity value of luteolin with two core therapeutic targets (AKT1 and TNF) is greater than −7 kcal/mol, while its affinity value with the remaining eight core therapeutic targets (TLR4, EGFR, TP53, NFKB1, JAK2, BCL2, MAPK1, and SYK) is less than or equal to −7 kcal/mol, where more negative values indicate stronger predicted binding (Figure 4c). Other key active ingredients (hesperetin, apigenin, isorhamnetin, wogonin, kaempferol, myricetin, and quercetin) have an affinity value greater than −7 kcal/mol with at least three core therapeutic targets (Table S10). Furthermore, hydrogen bond visualization results indicate that luteolin is connected to the core therapeutic targets by three, four, two, four, three, four, three, one, four, and three hydrogen bonds, respectively. These results collectively suggest that luteolin can bind to the core therapeutic targets of RA and form structurally stable complexes (Figure 5).

FIGURE 5. Three-dimensional interactions of luteolin with key hub targets. This figure displays the docking results of luteolin with 10 critical targets: (a) AKT1, (b) BCL2, (c) EGFR, (d) JAK2, (e) MAPK1, (f) NFKB1, (g) SYK, (h) TLR4, (i) TNF, and (j) TP53. Each panel illustrates the interaction sites, with amino acid residues that bind to luteolin highlighted in light cyan. The names of the amino acid residues are labeled next to the binding sites, and the yellow dashed lines represent hydrogen bonds with distances indicated.

3.10. Luteolin biosynthesis

This study detailed the biosynthetic pathway of flavonoids, focusing on luteolin, using data from the KEGG pathway database and existing literature. The biosynthesis begins with the conversion of phenylalanine to cinnamic acid by phenylalanine ammonia-lyase (PAL) (Figure 4d). Cinnamic acid is then hydroxylated by cinnamic acid 4-hydroxylase (C4H) to form p-coumaric acid, which is linked with coenzyme A by 4-coumarate: CoA ligase (4CL) to produce p-coumaroyl-CoA. CHS then catalyzes the formation of naringenin chalcone from p-coumaroyl-CoA and malonyl-CoA. CHI converts naringenin chalcone to naringenin, a precursor of apigenin and eriodictyol, which are subsequently converted to luteolin by flavone synthase (FNS) and flavonoid hydroxylases (flavonoid 3′-hydroxylase and flavonoid 3′,5′-hydroxylase).

Based on the qualitative and quantitative analysis of metabolites, this study determined the relative contents of five metabolites in the luteolin biosynthesis pathway: phenylalanine, cinnamic acid, apigenin, eriodictyol, and luteolin. As shown in Figure 4d, the relative content of apigenin, eriodictyol, and luteolin was highest in the leaves. In the stems, the relative content of eriodictyol and luteolin was higher than in the roots, while the relative content of apigenin was similar to that in the roots. Additionally, the relative content of phenylalanine was highest in the stems, and the relative content of cinnamic acid was highest in the roots. From these distributions, the CHS-catalyzed synthesis of naringenin chalcone and its isomerization by CHI were inferred to be the rate-limiting steps. HOG analysis showed that three HOGs contained CHS genes and six HOGs contained CHI genes. I. seguinii acquired new CHS genes (evm.model.chr5.1279 and evm.model.chr5.1283) and also underwent CHS gene duplication (evm.model.chr9.1030 and evm.model.chr9.1485) without experiencing any loss (Table S11).
The increase in CHS gene number may indicate an enhanced capacity for naringenin chalcone synthesis (Figure 6a). In contrast, the CHI gene in I. seguinii was lost, a phenomenon also observed in C. arabica, C. annuum, L. barbarum, and S. miltiorrhiza (Figure 6b).

FIGURE 6. Chalcone synthase (CHS) and chalcone isomerase (CHI) hierarchical orthologous groups (HOGs) among 10 species. The yellow bars indicate gene duplication, the red bars indicate gene loss, and the blue bars represent genes that have been neither gained nor lost.

4. DISCUSSION

To fill the genomic gap in the genus Iodes and in the order Icacinales more broadly, we used PacBio HiFi and Hi-C sequencing data to achieve the first chromosome-level assembly of the nuclear genome of I. seguinii. The initial contig sequences were assembled from PacBio HiFi sequencing data, resulting in an N50 length of 9.71 Mb. Subsequently, based on the alignment of Hi-C sequencing data, the contig sequences were anchored to 13 chromosomes with a total length of 273.58 Mb, approximately 95% of the estimated genome size. The assembly quality of the chromosomes was evaluated from four aspects: BUSCO, LAI, transcriptome sequencing data mapping rate, and PacBio HiFi sequencing coverage. The results consistently indicated that the assembly quality, completeness, and continuity of the chromosomes are excellent. On this basis, various functional elements in the nuclear genome were annotated, yielding 115.28 Mb of repetitive sequences (covering 42.14% of the nuclear genome), 1062 RNA genes, and 25,270 protein-coding genes (BUSCO evaluation result: 93.74%).
The high-quality assembly and annotation results lay a solid foundation for subsequent analysis of evolutionary processes, mining of functional genes, and exploration of biosynthetic pathways of active ingredients, providing valuable genomic resources for genetic improvement and synthetic biology research of active ingredients.

In addition to assembling and annotating the nuclear genome, we completed the de novo assembly and annotation of the chloroplast genome based on Illumina sequencing data. The chloroplast genome of I. seguinii is relatively large, at 150,599 bp, with the four components (LSC region, SSC region, IRA, and IRB) being 84,147, 18,932, 23,760, and 23,760 bp in length, respectively. The total length of this genome and the lengths of its components are very similar to those of the chloroplast genome of I. cirrhosa. Using the annotated I. cirrhosa chloroplast genome as a reference, we annotated the chloroplast genome of I. seguinii, identifying 121 genes, including some commonly used DNA barcodes such as rbcL, matK, psbA, trnH, accD, and ycf1. We performed PCR amplification of rbcL, matK, and the noncoding region between psbA and trnH and used their sequencing results for species identification. The results showed that matK can be used as a basis for the identification of I. seguinii, with better species identification efficacy than rbcL and psbA-trnH.

To determine the phylogenetic relationship between I. seguinii and other Lamiales plants and to analyze its adaptive evolution, we conducted an evolutionary analysis based on the nuclear genomes of I. seguinii, two Lamiales species, one Gentianales species, and five Solanales species. The results showed that I. seguinii is positioned at the base of the Lamiales clade, indicating it is a basal Lamiales species. The divergence between I.
seguinii and the core Lamiids (Gentianales, Lamiales, Solanales) occurred approximately 117.8 million years ago, consistent with previous findings based on chloroplast genomes or pollen traits (Alawfi & Alzahrani, 2023; Stull et al., 2015). Additionally, 1566 gene families in I. seguinii have expanded, 1663 gene families have contracted, and 156 genes are under positive selection. Genes within significantly expanded or contracted gene families are related to the synthesis of secondary metabolites, while genes under positive selection are associated with DNA damage repair. The expansion of gene families in I. seguinii may be attributed to polyploidization events. According to the Ks distribution, I. seguinii, along with S. dulcamara and S. indicum, experienced a whole-genome duplication event between 154.4 and 162.7 million years ago. After divergence, I. seguinii underwent another independent whole-genome duplication event between 88.2 and 92.9 million years ago, whereas S. dulcamara and S. indicum experienced a whole-genome duplication event together with L. barbarum. Consequently, there is no one-to-one syntenic relationship between the nuclear genomes of I. seguinii and either S. dulcamara or S. indicum.

Based on the metabolite detection results from the roots, stems, and leaves of I. seguinii, this study identified a total of 84 active components. Using disease target information from public databases and target prediction results, 119 targets were identified for RA treatment. Among these, the core therapeutic targets closely related to the development and progression of RA are AKT1, TLR4, EGFR, TNF, TP53, NFKB1, JAK2, BCL2, MAPK1, and SYK. Research indicates that the expression levels of BCL2, TP53, and EGFR are significantly elevated in patients with RA (Alawfi & Alzahrani, 2023; Taghadosi et al., 2021; Yuan et al., 2013). BCL2, an apoptosis inhibitor, helps maintain the integrity of mitochondrial structures in affected joints.
TP53, a transcription factor, alleviates inflammation by interfering with the NF-κB and MAPK signaling pathways and reducing the expression of inflammatory factors. EGFR, a transmembrane glycoprotein, promotes the proliferation of synovial fibroblasts and the secretion of cytokines while also inhibiting the formation and differentiation of osteoclasts. Targeting EGFR can alleviate RA symptoms. Among the other core therapeutic targets, both SYK and JAK2 are non-receptor tyrosine kinases; SYK regulates the formation and secretion of cytokines, osteoclast maturation, and platelet aggregation, while JAK2 is crucial for hematopoiesis. Reducing the activity of these enzymes can help improve the condition (Cooper et al., 2023; Ma et al., 2016; Roskoski, 2022). AKT1 and MAPK1 are serine/threonine kinases; AKT1 is a central node in the phosphoinositide 3-kinase signaling pathway, which is associated with abnormal proliferation of fibroblast-like synoviocytes and synovitis, as well as osteoclast formation and differentiation. MAPK1 plays a key role in regulating the secretion of pro-inflammatory cytokines and is related to joint inflammation and damage. TLR4, a pattern recognition receptor, is activated by specific exogenous substances or endogenous molecules, triggering innate immune and inflammatory responses. However, excessive activation leads to the production of large amounts of inflammatory factors (Zhang et al., 2022). TNF is a pro-inflammatory cytokine, and antagonists developed against it are powerful tools for treating RA (Balkwill, 2009).

The subsequent topological analysis of the "compound-target" network revealed that the key active components in I. seguinii are 10 flavonoids (hesperetin, apigenin, luteolin, chrysin, baicalein, morin, kaempferol, isorhamnetin, fisetin, quercetin) and one polyphenol (scutellarin).
These compounds all possess anti-inflammatory activity and can stably bind to core therapeutic targets for RA to exert their therapeutic effects (Conti et al., 2021; Kim, 2023; X. R. Liu et al., 2024; Nam et al., 2013; Periferakis et al., 2022; Shina et al., 2009; Singaravelu et al., 2021; M. Tang et al., 2022; Y. Wang et al., 2022; G. Yang, Xia, et al., 2021). Additionally, studies have shown that apigenin, fisetin, and quercetin can inhibit the proliferation of fibroblast-like synoviocytes, while kaempferol, isorhamnetin, and scutellarin can inhibit their migration and invasion. Hesperetin and luteolin can downregulate the levels of pro-inflammatory cytokines (Lee et al., 2009; Pan et al., 2018; L. Yang, Cao, et al., 2021). Since the abnormal proliferation of fibroblast-like synoviocytes and the excessive secretion of pro-inflammatory cytokines are the main pathological changes in the affected areas of RA, these active components may help alleviate the symptoms of RA and hold promise as the basis for developing new therapeutic drugs for the disease.

Before luteolin was detected in I. seguinii in this study, this flavonoid had already been found in various Chinese medicinal herbs (such as Cleome rutidosperma, Crocus sativus, Cyperus rotundus, Matricaria chamomilla, Ocimum basilicum, Perilla frutescens, Portulaca oleracea, Punica granatum, and Zingiber officinale) as well as in fruits and vegetables (such as celery, parsley, broccoli, onion, cabbage, and apple) (Taban et al., 2023). Its biosynthetic pathway includes two key rate-limiting enzymes: CHS and CHI. CHS catalyzes the condensation of one molecule of coumaroyl-CoA with three molecules of malonyl-CoA, producing naringenin chalcone; CHI catalyzes the rapid isomerization of naringenin chalcone to naringenin (Yin et al., 2019).
The CHS gene sequence is highly conserved across different plants, but the number of copies and the chromosomes containing this gene vary significantly. For example, the nuclear genome of parsley contains only one CHS gene, while that of snapdragon contains 14, and rice has 27. The CHS gene is present on only one chromosome in A. thaliana, but it is found on seven chromosomes in soybean. Using sequence similarity comparison, this study identified six CHS genes in I. seguinii, distributed across three chromosomes. Subsequently, by analyzing HOGs, we found that the CHS gene family in I. seguinii expanded through gene duplication and the formation of new genes. This expansion, likely due to polyploidization events, may contribute to the synthesis and accumulation of luteolin in I. seguinii. Such expansion of key gene families is also common in other Chinese medicinal herbs (Kang et al., 2020; J. Wang et al., 2021; Xu et al., 2020). In contrast, the CHI gene family in I. seguinii contracted due to gene loss, a phenomenon also observed in C. arabica, C. annuum, L. barbarum, and S. miltiorrhiza.

In summary, we assessed the effectiveness of DNA barcoding for identifying I. seguinii and achieved high-precision assembly and annotation of its nuclear and chloroplast genomes by integrating Illumina, PacBio HiFi, and Hi-C sequencing data. Through genomic evolutionary analysis, this study constructed the phylogenetic tree of I. seguinii, elucidating its evolutionary relationships with other Lamiaceae species. Furthermore, by combining untargeted metabolite analysis and network pharmacology, this study identified components in I. seguinii with potential therapeutic effects on RA, focusing specifically on the biosynthesis pathway of the key compound luteolin and the changes in its key gene families. These findings provide a scientific basis for the development and application of I.
seguinii, but there are still gaps in the breadth of research. Future studies need to delve deeper into areas such as genomics, transcriptomics, and metabolomics.

AUTHOR CONTRIBUTIONS

Xun Gong: Conceptualization; funding acquisition; resources; writing – original draft. Hantao Zhang: Data curation; software; writing – review and editing. Yinluo Guo: Visualization; writing – review and editing. Shaoshuai Yu: Methodology; resources; writing – review and editing. Min Tang: Project administration; writing – review and editing.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Supporting information

Figure S1 Comprehensive species identification through rbcL, psbA-trnH, and matK gene amplification. (TPG2-18-e20534-s020.docx)
Figure S2 Bioinformatics workflow for genomic assembly and annotation of I. seguinii. (TPG2-18-e20534-s010.docx)
Figure S3 Flowchart for assembling and annotating the chloroplast genome and surveying the nuclear genome. (TPG2-18-e20534-s018.docx)
Figure S4 Flowchart for integrated network pharmacology analysis of metabolite assay data. (TPG2-18-e20534-s011.docx)
Figure S5 Overview of sequencing data and genomic analysis. (TPG2-18-e20534-s022.docx)
Figure S6 Distribution of repeats in the nuclear genome of I. seguinii. (TPG2-18-e20534-s008.docx)
Figure S7 GO and KEGG enrichment analysis of genes in the gene families of I. seguinii undergoing significant (a, b) expansion, (c, d) contraction, and (e, f) positive selection. (TPG2-18-e20534-s002.docx)
Figure S8 Chromosome-wise self-synteny of I. seguinii, illustrating the syntenic relationships within its chromosomes. (TPG2-18-e20534-s014.docx)
Figure S9 Collinear relationships among I. seguinii, S. dulcamara, and S. indicum. (TPG2-18-e20534-s017.docx)
Figure S10 Synteny map of I. seguinii pseudochromosomes compared to contigs of other species reveals a close relationship, with significant portions of the chromosomes conserved between the species. (TPG2-18-e20534-s012.docx)
Figure S11 Multivariate analysis and clustering of DMs. (TPG2-18-e20534-s023.docx)
Figure S12 Comprehensive analysis of DMs and FCM clustering. (TPG2-18-e20534-s009.docx)
Table S1 Identification of species using rbcL, psbA-trnH, and matK gene amplification with corresponding forward (F) and reverse (R) primers. (TPG2-18-e20534-s016.docx)
Table S2 Detailed results from the BOLD for accurate species identification using DNA barcoding. (TPG2-18-e20534-s006.docx)
Table S3 Results of sequence alignment for psbA-trnH against NT database. (TPG2-18-e20534-s001.docx)
Table S4 Chromosome assembly and global statistics for I. seguinii. (TPG2-18-e20534-s007.docx)
Table S5 Length statistics of different types of repeats in the nuclear genome of I. seguinii. (TPG2-18-e20534-s004.docx)
Table S6 Number and length statistics of RNA genes in the nuclear genome of I. seguinii. (TPG2-18-e20534-s019.docx)
Table S7 Genome annotation evaluation using BUSCO analysis. (TPG2-18-e20534-s015.docx)
Table S8 Functional annotation of I. seguinii protein-coding genes. (TPG2-18-e20534-s021.docx)
Table S9 Classification of genes encoded by the chloroplast genome of I. seguinii. (TPG2-18-e20534-s013.docx)
Table S10 Molecular docking results of the main components in I. seguinii with the hub targets. (TPG2-18-e20534-s005.docx)
Table S11 CHS and CHI genes in the nuclear genome of I. seguinii. (TPG2-18-e20534-s003.docx)

ACKNOWLEDGMENTS