Abstract Rafflesia possesses unique biological features and known primarily for producing the world’s largest and existing as a single flower. However, to date, little is known about key regulators participating in Rafflesia flower development. In order to further understand the molecular mechanism that regulates Rafflesia cantleyi flower development, RNA-seq data from three developmental stages of floral bud, representing the floral organ primordia initiation, floral organ differentiation, and floral bud outgrowth, were analysed. A total of 89,890 transcripts were assembled of which up to 35% could be annotated based on homology search. Advanced transcriptome analysis using K-mean clustering on the differentially expressed genes (DEGs) was able to identify 12 expression clusters that reflect major trends and key transitional states, which correlate to specific developmental stages. Through this, comparative gene expression analysis of different floral bud stages identified various transcription factors related to flower development. The members of WRKY, NAC, bHLH, and MYB families are the most represented among the DEGs, suggesting their important function in flower development. Furthermore, pathway enrichment analysis also revealed DEGs that are involved in various phytohormone signal transduction events such as auxin and auxin transport, cytokinin and gibberellin biosynthesis. Results of this study imply that transcription factors and phytohormone signalling pathways play major role in Rafflesia floral bud development. This study provides an invaluable resource for molecular studies of the flower development process in Rafflesia and other plant species. Introduction Rafflesia is a member of the holoparasitic plant family, Rafflesiaceae, which is known to produce the world’s largest flower. There are over 30 species of Rafflesia that can be found in the tropical rainforest of Southeast Asia. Rafflesia cantleyi was the first species identified from Peninsular Malaysia, with more species identified later on [[40]1–[41]2]. Besides an extraordinary flower size, the floral structure of R. cantleyi is highly modified compared to other angiosperms. It has no apparent leaves, stems or roots, and only appears as a flower, which parasitises a specific host, Tetrastigma [[42]3]. Rafflesia possesses five perigone lobes as perianth connected to a diaphragm enclosing a large and bowl-shaped floral chamber with a central column as the reproductive organ [[43]4]. Apart from gigantism, flowering of R. cantleyi is irregular, infrequent, and the development of floral bud takes up to nine months. At the early developmental stage, the swollen bud of R. cantleyi appears through the bark of Tetrastigma covered with bracts and continues to grow progressively. Upon maturation and bracts abscission, the bud opens gradually over a 24 to 48-hour period [[44]5]. Flower development, preceded by the flowering process is the most important developmental event in a plant’s life cycle. Molecular and genetic studies in the annual model species Arabidopsis present an intricate genetic network that orchestrates the flowering process, controlled by diverse exogenous and endogenous factors. Endogenous factors include hormones, autonomous pathway, and aging pathway, whereas exogenous factors comprise of photoperiod and vernalisation [[45]6]. Subsequently, genes involved in the flowering pathway converge on floral integrators to activate floral meristem identity genes, which are essential for floral organ development. Floral integrators include FLOWERING LOCUS T (FT), CONSTANS (CO), CONSTANS-LIKE (COL), SUPPRESSOR OF OVEREXPRESSION OF CO1 (SOC1), FLOWERING LOCUS C (FLC), as well as floral meristem identity genes, including LEAFY (LFY) and APETALA1 (AP1) [[46]7]. Floral integrators activate floral organ identity genes known as ABC (extended to ABCDE) model genes [[47]8]. Additionally, transcription factors (TFs) such as bHLH, MYB, and MADS-box families are important regulatory proteins of transcription in flower development [[48]9]. Substantial progress has been achieved in understanding the mechanism of flower development, particularly in Arabidopsis [[49]8, [50]10]. However, while considerable work has been carried out in recent years on the different aspects of Rafflesia [[51]11–[52]12] including genes potentially involved in the growth and flower development [[53]13–[54]14], our knowledge regarding the molecular mechanism of R. cantleyi floral development is still very limited due to scarce sample availability, challenges during sample collection, lack of suitable material in the wild, undeveloped analytical methodologies, and inadequate molecular resources. In an effort to address this issue, we have previously generated RNA-seq data from early, mid and advanced R. cantleyi bud stages [[55]15]. The cross-sections of these three bud stages showed the early (floral bud stage 1), mid (floral bud stage 2) and advanced (floral bud stage 3) developmental stages. Floral bud stage 1 (FBS1) consists of undifferentiated cells while floral bud stage 2 (FBS2) contains moderately differentiated and visible internal organs, whereas floral bud stage 3 (FBS3) have more developed and mature internal organs. In this study, functional annotation of the transcriptome data and differential gene expression analysis were carried out to glean insights into molecular genetics underlying the regulation of flower development in R. cantleyi. This valuable genomic resource will facilitate further study of flower development in Rafflesia and other plant species. Materials and methods Transcriptome de novo assembly and functional annotation Raw data previously generated as described in [[56]15] were used in this study. The raw data obtained for three different floral bud stages (early, mid and advanced) has been registered under NCBI BioProject with the accession number PRJNA378435. The raw reads were trimmed and quality-filtered using Trimmomatic [[57]16]. The high-quality raw reads were pooled for de novo assembly using the Trinity (v2.0.6) analysis pipeline [[58]17]. The generated contigs were assembled and constructed sequences that could not be extended on either end are considered as unique transcripts [[59]18], herein referred as transcripts. All the assembled transcripts were subjected to BLASTX similarity search against the NCBI non-redundant (Nr) ([60]www.ncbi.nlm.nih.gov) and Swiss-Prot ([61]www.uniprot.org) databases with an E-value cut-off of 1e^-5. Based on Nr BLAST result, BLAST2GO [[62]19] was performed to obtain Gene Ontology (GO) annotation of assembled transcripts to represent biological process, cellular component, and molecular function categories. WEGO [[63]20] was employed to retrieve the GO functional classification to explain the distribution of gene functions. Transcript sequences were also searched against the Cluster of Orthologous Groups (COG) protein database ([64]www.ncbi.nlm.nih.gov/COG/). Furthermore, annotated transcripts were searched against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway maps database [[65]21] for mapping of biological pathways represented by the R. cantleyi bud transcriptome. Protein coding sequences of R. cantleyi transcriptome were predicted via Transdecoder (transdecoder.github.io) implemented in the Trinity analysis pipeline. Trinotate annotation pipeline (trinotate.github.io) was carried out to annotate the transcriptome against Pfam [[66]22–[67]23]. Identification of transcription factors The transcripts were aligned against known TFs, as grouped in Plant Transcription Factor Database (PlnTFDB) [[68]24], using default parameters and cut-off E-value of 1e^-5. PlnTFDB is a comprehensive library of plant TFs that provides the complete lists of TFs families of fully sequenced genomes. Online protein sequence data for all genes listed in PlntTFDB version 3.0 (plntfdb.bio.uni-potsdam.de/v3.0/) were downloaded for transcript annotation using BLASTX. Heatmaps depicting the expression patterns of TF families at different bud stages of R. cantleyi were created using the MeV tool ([69]www.tm4.org/mev.html). Differential gene expression analysis To compare the differences in gene expression between different bud stages, transcript abundance was estimated from mapped reads with RSEM [[70]25]. The relative gene expression levels for each bud samples were normalised and expressed as Fragments per Kilobase of Exon per Million Reads Mapped (FPKM) values. Bioconductor package edgeR [[71]26] was used to determine the DEGs by evaluating the dispersion of the entire dataset. Significant DEGs were defined by P-value ≤ 0.001, Benjamin-Hochberg False Discovery Rate (FDR) ≤ 0.05 and |Log[2] fold change| > 2. KEGG pathway and GO enrichment analysis of DEGs The pathway enrichment analysis with hypergeometric test was performed using ‘Annotate’ and ‘Identify’ programs in KEGG Orthology-Based Annotation System (KOBAS 2.0) with Benjamini-Hochberg FDR correction [[72]27] to annotate putative pathways and biological functions of DEGs. The web-based ReviGO software (revigo.irb.hr) was used to reveal enriched gene GO functional categories identified from DEGs [[73]28]. Data validation by RT‑qPCR R. cantleyi buds of different developmental stages (early mid, and advanced) for RT-qPCR were sampled independently from the Forest Reserve in Raub, Pahang, Malaysia (3° 47′ 24″ N, 101° 51′25″E). Samples were rinsed using 10% (v/v) Clorox® solution (1% sodium hypochlorite) after carefully dissected from the host plant, followed by three rinses with sterile water, and flash frozen in liquid nitrogen before stored at -80°C. Inner tissues (for FBS2 and FBS3 perigone lobe) of the floral buds were used for the isolation of total RNA using modified CTAB extraction protocol [[74]29–[75]31]. RNA quality and quantity were evaluated using gel electrophoresis, ND-1000 Nanodrop spectrophotometer (Thermo Scientific) and Agilent 2100 Bioanalyzer with a minimum RNA integrity number of 7. Using PrimerBlast 5.0, primers were designed to amplify short regions for each target and reference genes ranging in product size from 90 to 120 bp. The primer sequences of references