Abstract Ricebean (Vigna umbellata) is a lesser known pulse with well-recognized potential. Recently, it has emerged as a legume with endowed nutritional potential because of high concentration of quality protein and other vital nutrients in its seeds. However, the genes and pathways involved in regulating seed development and size are not understood in this crop. In our study, we analyzed the transcriptome of two genotypes with contrasting grain size (IC426787: large seeded and IC552985: small seeded) at two different time points, namely, 5 and 10 days post-anthesis (DPA). The bold seeded genotype across the time points (B5_B10) revealed 6,928 differentially expressed genes (DEGs), whereas the small seeded genotype across the time point (S5_S10) contributed to 14,544 DEGs. We have also identified several candidate genes for seed development–related traits like seed size and 100-seed weight. On the basis of similarity search and domain analysis, some candidate genes (PHO1, cytokinin dehydrogenase, A-type cytokinin, and ARR response negative regulator) related to 100-seed weight and seed size showed downregulation in the small seeded genotype. The MapMan and KEGG analysis confirmed that auxin and cytokinin pathways varied in both the contrasting genotypes and can therefore be the regulators of the seed size and other seed development–related traits in ricebeans. A total of 51 genes encoding SCF ^TIR1/AFB , Aux/IAA, ARFs, E3 ubiquitin transferase enzyme, and 26S proteasome showing distinct expression dynamics in bold and small genotypes were also identified. We have also validated randomly selected SSR markers in eight accessions of the Vigna species (V. umbellata: 6; Vigna radiata: 1; and Vigna mungo: 1). Cross-species transferability pattern of ricebean–derived SSR markers was higher in V. radiata (73.08%) than V. mungo (50%). To the best of our knowledge, this is the first transcriptomic study conducted in this crop to understand the molecular basis of any trait. It would provide us a comprehensive understanding of the complex transcriptome dynamics during the seed development and gene regulatory mechanism of the seed size determination in ricebeans. Keywords: ricebean, seed size, hormone signaling, MapMan, SSR Introduction A rapid increase in the human population, which is expected to reach 9.7 billion by 2050, is one of the biggest challenges of this world ([49]Gu et al., 2021). To ensure food and nutritional security to the ever-growing human population, it is extremely important to bring underutilized and neglected crops into mainstream agriculture. Owing to its short growth duration and ability to thrive well in stress conditions and various soil types, ricebean (Vigna umbellata) is one such crop which has huge potential to sustain food and nutritional security in most parts of the world ([50]Pattanayak et al., 2019). It is a diploid (2n = 2× = 22), warm-season annual legume with a genome size of approximately 440 Mb ([51]Kaul et al., 2019). Ricebean is mainly cultivated in Nepal, Bhutan, Northeast India up to Myanmar, Southern China, Northern Thailand, Laos, Vietnam, Indonesia, and East Timor ([52]Tian et al., 2013), where it constitutes an important source of protein for the sizable population and contributes to household food and nutritional security. The observed protein content in ricebean is 25.57% with high concentration of various essential amino acids. Besides protein, ricebean grains also contain a significant amount of other nutrients such as carbohydrates, fiber, minerals, vitamins, and fatty acids. Moreover, ricebean is a rich source of unsaturated fatty acids like linoleic and linolenic acids ([53]Katoch, 2013). Among various productivity traits, pod length, seed size, and seed weight have major emphasis on ricebean genetic improvement programs because of their direct impact on the total grain yield. Furthermore, the seed is the key reservoir of proteins, essential amino acids, unsaturated fatty acids, and minerals in ricebean. Therefore, it is of great importance to decipher the molecular mechanism underlying seed development and size determination process in this minor but potential pulse crop. In recent years, with the advent of next-generation sequencing technology, key gene regulatory networks governing pod and seed development have been well characterized in both model plants like rice ([54]Herridge et al., 2011), Arabidopsis ([55]Herridge et al., 2011; [56]Mahto et al., 2017), and also in non-model legume species like black gram ([57]Souframanien and Reddy, 2015), cowpea ([58]Lonardi et al., 2019), chickpea ([59]Pradhan et al., 2014), mungbean ([60]Tian et al., 2016a), and soybean ([61]Jones and Vodkin, 2013; [62]Qi et al., 2018; [63]Peng et al., 2021). These studies revealed that seed development in higher plants is a highly complex process and governed by phytohormone signaling including cytokinins (CKs), gibberellins (GAs), brassinolides (BRs), ethylene (ET), and their associated genes and transcription factors. In all these phytohormones, genes related to auxin pathways including indole-3-acetic acid (IAA), auxin-responsive protein (IAA12, IAA), auxin response factors (ARFs), SAUR-like auxin superfamilies, auxin-related Aux/IAA, OsIAA18, and AP2/ERF, along with other genes such as ARR-B (cytokinin signaling), ethylene-responsive transcription factor (ERF084-like, ERF4, ERF061), ethylene-insensitive protein 3 (EIN3), ethylene receptor (ETR), ethylene-insensitive protein 4 (EIN4), serine/threonine-protein kinase (CTR1), ethylene responsive APATELA2 (AP2), ethylene-responsive element binding protein (EREBP), and many more genes were reported during seed development ([64]Garg et al., 2011; [65]Jones and Vodkin, 2013; [66]Tian et al., 2016a; [67]Nelson et al., 2017; [68]Geng et al., 2018; [69]Li et al., 2019a; [70]Lonardi et al., 2019; [71]Yi et al., 2019; [72]Raizada and Jegadeesan, 2020; [73]Zhu et al., 2020). The aforementioned transcriptome-based gene expression analysis has provided a robust functional genomics resource for deciphering gene networks and candidate genes regulating various biological processes in crop plants. For minor crops with poorly characterized genomes, like ricebean, such detailed transcriptome analysis will provide comprehensive information about expression patterns of genes and molecular mechanisms governing traits of economic importance. This valuable information can further be employed for the development of functional markers for gene and QTL mapping. Therefore, in the present study, we conducted transcriptome analyses to investigate gene expression networks and identify the candidate genes controlling seed size variation in ricebean. RNA sequencing of two contrasting ricebean genotypes was performed at early development stages (i.e., 5 and 10 DPA). The study provides detailed insights into various gene networks and their potential roles in determining seed size. Furthermore, the study also identified simple sequence repeat (SSR) motifs that could be used for molecular mapping of seed size/weight and other related traits. Materials and Methods Plant Material and Growth Conditions Seeds of two contrasting ricebean genotypes, namely, IC426787 (bold seeded) and IC552985 (small seeded) were obtained from ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi ([74]Figure 1). On the basis of the 2-year trial (2018 and 2019), the average 100-seed weight of IC426787 and IC552985 was 13.20 and 3.87 gm, respectively. Plants were grown in a net-house at ICAR-NBPGR, New Delhi (latitude: 28°38′56″N, longitude: 77°9′8″E, altitude: 228 mean sea level (msl)), during Kharif (rainy) season 2020. During pod filling, the minimum temperature ranged from 10.8 to 23°C, maximum temperature ranged from 30.4 to 36°C, and average RH% varied from 53 to 56. The ricebean pod filling duration varied from 20 to 30 DPA depending upon the genotype. Genotypes with smaller grain size took comparatively less pod filling time than the genotypes having larger grain size. Three biological replicates of pod samples were harvested from three full-grown plants of both genotypes at 5 and 10 DPA each. The seeds were separated and immediately frozen in liquid nitrogen and stored at −80°C for future use. A total of 12 samples were prepared for the construction of RNA libraries. FIGURE 1. FIGURE 1 [75]Open in a new tab Two contrasting genotypes of ricebean, that is, IC426787 (bold size) and IC552985 (small size), selected for the transcriptome analysis on the basis of their seed size. RNA Extraction, Library Preparation, and Sequencing The Pure Link RNA Mini Kit (Ambion, United States) was used to extract RNA from the frozen samples. The total RNA quality was checked using the RNA 6000 Nano Kit (Agilent Technologies, United States) on a 2100 Bioanalyzer (Agilent Technologies, United States), with a minimum RNA integrity number (RIN) of 7. RNA concentrations were determined with a NanoDrop ND-8000 spectrophotometer (Nano-Drop Technologies, Thermo scientific, Wilmington, DE). RNA-Seq libraries for all samples were prepared using the NEBNext UltraII RNA library preparation kit for Illumina; Cat no: E7770 (New England Biolabs), according to manufacturers recommended protocol, and sequencing was done in a single HiSeq 4000 lane using 150 bp paired-end chemistry. Briefly, total RNA was used to purify poly (A) messenger RNA (mRNA) using oligo-dT labeled magnetic beads. Then, the isolated mRNA was fragmented into 200 to 500 bp pieces in the presence of divalent cations at 94°C for 5 min using an ultrasonicator. The cleaved RNA fragments were copied into first-strand cDNA using SuperScript-II reverse transcriptase (Life Technologies, Inc.) and random primers. After second-strand cDNA synthesis, fragments were end-repaired and A-tailed, and indexed adapters were ligated. The products were purified and enriched with PCR to create the final cDNA library. The tagged cDNA libraries were pooled in equal ratios and used for 2 × 150 bp paired-end sequencing on a single lane of the Illumina HiSeq 4000. Illumina clusters were generated and loaded onto the Illumina Flow Cell on the Illumina HiSeq 4000 instrument, and sequencing was carried out using 2 × 150 bp paired-end chemistry. After sequencing, the samples were demultiplexed, and the indexed adapter sequences were trimmed using CASAVA v1.8.2 software (Illumina Inc.). Read Quality and Adapter Removal Raw reads of ricebean were evaluated for their quality using FASTQC v0.11.8 package ([76]http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Four parameters were considered: base quality score distribution, sequence quality score distribution, average base content per read, and GC distribution in the reads. Trimmomatic v0.36 was applied to remove the adapter and trim the low-quality reads (trimming includes reads with or without ambiguous sequence “N”) using default parameters. To correct the random sequencing errors in Illumina RNA-Seq reads, Rcorrector v1.0.3 was used. Clean reads were also checked for their quality using FASTQC only. RNA-Seq De Novo Assembly and Transcriptome Assessment The obtained clean reads of all 12 samples were assembled using Trinity v2.4.0 with the paired-end model and default K-mer value of 25. The de novo assembly was merged and clustered using CDHIT v4.0 to get non-redundant sequences. Furthermore, these non-redundant sequences were made transcripts using the trinity in-built script. The clean reads of each sample were mapped back to the de novo assembled genome through BWA-mem software with default parameters. The BAM files were handled by samtools. The number of reads mapped to genes was calculated using samtools v0.1.19. The expression difference of each transcript between different samples was calculated using DESeq2 R package. False discovery rate (FDR) values less than 0.01 and |log2 (fold change)| ≥2 were considered significant differences at the expression level. The transcript abundance was normalized by the fragments per kilobase of transcript per million mapped reads (FPKM) value. Gene Functional Analysis To annotate the assembled transcripts, sequences were aligned by BLASTX (e-value <1e^−5) to protein databases, including the non-redundant protein (NR) database, Swiss-Prot, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. A GO enrichment analysis was conducted for the transcripts according to biological process, cellular component, and molecular function ontologies using Blast2GO software ([77]Liu et al., 2013; [78]Calzadilla et al., 2016). The GO annotation functional classifications were determined using WEGO software for the distribution of gene functions ([79]Ye et al., 2006). GO functional enrichment and KEGG pathway enrichment analysis were also tested at a significance cutoff of p-value. All the p-values were adjusted with the criterion of Bonferroni correction. We selected the corrected p-value of 0.05 as the threshold to determine significant enrichment terms of the gene sets. The MapMan analysis was also conducted to provide a graphical overview of the metabolic and regulatory pathways for the detected genes using the MapMan tool, and the mapping file of ricebean for all the samples was generated using the Mercator tool. Candidate Gene Identification and Their Domain Analysis The candidate genes for seed development-related traits were identified on the basis of similarity (BLASTX with similarity >80% and e-value <0.001) with genes responsible for similar traits in other species, including Arabidopsis, Phaseolus vulgaris, and Vigna species (V. radiata and V. angularis). Furthermore, the candidate genes were also validated in silico on the basis of their domain analysis. The amino acid sequences of the identified candidate genes were predicted and compared against the Pfam protein database using HMMER 3.0 (e-value ≤ 1e^−10) to obtain candidate gene domain/family annotation information. A heatmap was also generated for the candidate genes on the basis of their expression in both the genotypes at different times of development. The heatmap was made using an in-house R script. Simple Sequence Repeats Identification and Primer Design The MIcroSAtellite (MISA) search engine ([80]http://pgrc.ipk-gatersleben.de/misa) was employed for the identification of SSRs. The minimum numbers of repeats used for selecting the SSRs were ten for mononucleotide-based loci, six for dinucleotide loci, five for trinucleotide loci, and three for all larger repeat types (tetra- to hexanucleotide motifs). For validation, 50 SSR motifs were randomly selected, that is, 25 for dinucleotide and trinucleotide each. The primers for these selected SSR motifs were designed based on flanking sequences using Primer3 software ([81]http://sourceforge.net/projects/primer3) with targeted size of PCR products in the range between 100 and 300 bp, primer length between 18 and 22 bp, GC content between 40 and 70, and melting temperature of 50–60°C. Simple Sequence Repeats Validation DNA was isolated from young leaves of eight accessions of Vigna species including V. umbellata (6), V. mungo (1), and V. radiata (1) by following the protocol described in the DNeasy plant mini kit (QIAGEN, Hilden, Germany). DNA concentration was measured using a NanoDrop™ 2000 spectrophotometer (Thermo, United States), and DNA quality was analyzed using 0.8% agarose gel. A working stock of DNA was (10 ng/µl) prepared with nuclease-free water for polymerase chain reaction (PCR) for SSR amplification. For the SSR amplification, 20 µl reaction mixture containing 4 µl genomic DNA (40 ng), 10 µl Taq Polymerase 2X Master Mix (United States), 0.8 µl primers (10 pM), and 5.2 µl nuclease-free water were used. For the amplification, the following thermal conditions were carried out: initial denaturation of 94°C for 3 min, then 35 cycles of 94°C for 30 s, primer annealing at 55°C for 45 s, extension at 72°C for 1 min, and final extension for 10 min. PCR products were separated using high-resolution metaphor agarose gel (3%) electrophoresis. Furthermore, the dendrogram of the genotypes was generated using the hierarchical clustering algorithm in DARwin v6.0.21 software ([82]https://darwin.cirad.fr/). Results and Discussion Transcriptome Sequencing and De Novo Assembly To obtain a comprehensive transcriptome profile of ricebean 12 RNA libraries were sequenced, and a total of 94.35 Gb raw data were generated. For these 12 samples, approximately, 98.50–99.80% of reads passed the quality control, and 98.60–99.60% of the clean reads were mapped back to the de novo assembled ricebean genome. On average, raw data of the seed transcriptome at 5 DPA and 10 DPA had 50.33 and 48.66% GC content, respectively, while after trimming, the GC content of clean data at 5 DPA and 10 DPA was 48.66 and 49.33%, respectively, which is similar to the GC content reported in the previous study of ricebean ([83]Chen et al., 2016; [84]Table 1). TABLE 1. Summary of RNA-Seq data for 12 samples of ricebean at 5 DPA and 10 DPA. Genotype Replicate Time point Read before quality control Read after quality control GC% Bold (IC426787) Replicate 1 5 DPA 28,178,488 28,081,924 49 10 DPA 32,397,064 32,223,234 52 Replicate 2 5 DPA 20,042,440 19,897,123 51 10 DPA 23,598,201 23,252,956 48 Replicate 3 5 DPA 32,043,486 31,869,247 51 10 DPA 26,455,283 26,137,490 46 Small (IC552985) Replicate 1 5 DPA 27,160,408 27,037,555 49 10 DPA 21,272,208 21,053,611 50 Replicate 2 5 DPA 24,589,394 24,425,550 49 10 DPA 25,209,773 25,053,008 49 Replicate 3 5 DPA 26,969,114 26,850,966 48 10 DPA 24,511,033 24,368,871 49 [85]Open in a new tab The obtained clean reads of all 12 samples were assembled using Trinity (v2.4.0) with default parameters. The assembled transcriptome consists of 218,486 super transcripts with an N50 value of 1,041. The number of transcripts generated in the current study is comparable to previous studies. In terms of N50, the ricebean had a higher N50 value than field pea (781) and chickpea (441) ([86]Pradhan et al., 2014; [87]Sudheesh et al., 2015) and less value than mungbean, common bean, and adzuki bean ([88]Hiz et al., 2014; [89]Chen et al., 2015b; [90]Chen et al., 2016). These results indicate the good quality of ricebean transcriptome. The lengths of the transcripts ranged from 201 to 15,828 bp, with an average length of 669 bp, which is less than other Vigna species like cowpea (871 bp) and mungbean (874 bp) but more than that of black gram (443 bp) ([91]Chen et al., 2015b, [92]2017; [93]Souframanien and Reddy, 2015). Of these transcripts, 146,622 (67.11%) were 201–500 bp; 39,620 (18.13%) were 501–1,000 bp; 12,654 (5.79%) were 1,001–1,500 bp; 6,511 (2.98%) were 1,501–2,000 bp; 3,986 (1.82%) were 2,001–2,500 bp; 2,567 (1.17%) were 2,501–3,000 bp; and 6,526 (2.99%) were more than 3,000 bp in length ([94]Figure 2). The developed assembly showed ∼100% back mapping of total and important reads, and this shows that our assembly had vast and proper mapping quality for the generated reads. The high percentage of reads mapping back to the de novo assembled transcriptome is a quality metric that provides an assessment of assembly entirety ([95]Hornett and Wheat, 2012). FIGURE 2. [96]FIGURE 2 [97]Open in a new tab Sequence length distribution of the assembled transcripts. Differential Expression Analysis In this study, a comprehensive transcriptome analysis has been performed with the aim to reveal those gene expression changes that, independently of the genotype diversity, are involved in controlling seed size in ricebean. Comparative transcriptome analysis was performed between two genotypes with contrasting seed size at two time points, namely, 5 and 10 DPA. A similar type of study using two genotypes with a contrasting seed size has also been done in the peanut ([98]Li et al., 2019b). The expression profile was checked for the individual genotypes across the time points (B5_B10, S5_S10) as well as between the genotypes at each time point (B5_S5, B10_S10) ([99]Supplementary Table S1). False discovery rate (FDR) values less than 0.01 and |log2 (fold change) |≥2 were considered significant differences at the expression level. While evaluating the expression difference individually for the bold genotype across the time point (B5_B10), 6,928 differentially expressed genes were identified. In B5_B10, the number of upregulated genes (6,284) were higher than downregulated genes (644), suggesting that these upregulated DEGs might be responsible for the increase in seed size. Similarly, a small genotype across the time point (S5_S10) contributed to 14,544 DEGs ([100]Figure3A; [101]Table 2). In contrast to B5_B10 expression results, S5_S10 had a high number of downregulated genes (7,862) in comparison with the upregulated genes (6,682), indicating that these downregulated genes might be repressing any transcriptional activity or downstream pathways resulting in the small size of ricebean seeds ([102]Li et al., 2019b). To gain a better understanding of molecular processes/regulatory networks associated with the seed size in ricebeans, the pattern of differentially expressed genes was analyzed between genotypes in each time point and across the time point using a Venn diagram ([103]Figure 3B). FIGURE 3. [104]FIGURE 3 [105]Open in a new tab Differentially expressed genes in the two genotypes of ricebean at two time points, i.e., 5 and 10 DPA. (A) Comparison of DEGs representing the share of overlapped and non-overlapped transcripts in bold and small genotypes at 5 and 10 DPA. (B) Number of upregulated and downregulated significant genes in bold and small genotypes. TABLE 2. Summary of significant DEGs identified in ricebean. Comparison Total DEG Total significant DEG Significantly upregulated DEG Significantly downregulated DEG B5_B10 DPA 276,372 6,928 6,284 644 B5_S5 DPA 227,479 7,185 2,079 5,106 B10_S10 DPA 264,964 5,223 634 4,589 S5_S10 DPA 220,089 14,544 6,682 7,862 [106]Open in a new tab We also identified common genes between the individual genotype across time points (i.e., B5_B10 and S5_S10) as well as between the genotypes at each time point (B5_S5 and B10_S10). In case of B5_B10 and S5_S10, in total, 2091 DEGs were common. On the other hand, 850 DEGs were common between B5_S5 and B10_S10 ([107]Supplementary Table S2). The comparative gene expression analysis indicated that a relatively large amount of the transcriptional program operating during seed development or maturation is shared between both the genotypes. The same results have been observed in the case of common bean, where 2,487 DEGs were shared by two contrasting genotypes ([108]González et al., 2021). Gene Ontology Analysis of the Transcriptome To infer the biological processes and the functions related to seed development stages, gene ontology analysis was conducted for differentially expressed genes in terms of their biological involvement, target cellular component, and molecular function using Blast2GO. Out of total 33,880 DEGs, 16,002 DEGs contributed to GO terms. In core GO annotation, 7,002 (25.37%) genes annotated for biological process (BP), 12,069 (43.72%) for molecular function (MF), and 8,533 (30.91%) for cellular components (CC). The highest number of GO terms were observed in the case of S5_S10 (44.10%), followed by B5_B10 (23.89%), B10_S10 (16.91%), and B5_S5 (15.09%). In case of the bold genotype across the time point (B5_B10), out of 6,928 DEGs, only 3,764 were annotated, constituting 1,696, 2,046, and 2,864 GO terms for BP, CC, and MF, respectively. However, in S5_S10, we observed 7,158 annotated DEGs from 14,544 DEGs and 2,954, 3,909, and 5,312 GO terms for BP, CC, and MF, respectively. On the other hand, in the case of between the genotype at the first time point (B5_S5), 2,537 DEGs were found to be annotated as compared B10_S10, in which 7,158 DEGs were annotated. In case of BP, 1,075 and 2,954 GO terms were identified in B5_S5 and S5_S10. Similarly, 2,046 and 3,909 GO terms were found for the cellular component function in B5_S5 and S5_S10, respectively, whereas in the case of molecular function, B5_S5 and S5_S10 consisted of 1,875 and 5,312 GO terms, respectively ([109]Supplementary Table S3). We have also illustrated the top or enriched functions in terms of BP, MF, and CC for both the genotypes. For example, the top biological activities include “cellular process,” “nitrogen compound metabolic process,” “small molecule metabolic process,” “cellular component organization,” “regulation of metabolic process,” “response to stress,” “cell wall organization,” cellular response to stimulus,” and developmental process. All these results indicated the biological process of DEGs vary over a broad range of terms. These enriched GO terms for BP indicate that hormone and environment stimuli played a vital role in ricebean seed/pod development. A similar type of results was also found in peanut pod development ([110]Zhu et al., 2014). Similarly, in the case of MF, bold and small genotypes were identified to be involved in “binding,” “metabolic processes,” “organic cyclic compound binding,” “heterocyclic compound binding,” “ion binding,” “transferase activity,” and “biosynthetic processes.” However, on the other hand, cellular component activities include “catalytic activity,” “membrane,” “membrane part,” “intrinsic component of the brain,” and “intracellular” and cellular activities” ([111]Figure 4). Similar results for MF and CC were observed in the pod development of peanuts ([112]Zhu et al., 2014). FIGURE 4. [113]FIGURE 4 [114]Open in a new tab Gene ontology (GO) annotation of differentially expressed genes in ricebean summarized in three main categories: biological process, cellular component, and molecular function. MapMan Analysis For comprehensive assessment of gene expression network dynamics in a developing seed of bold and small genotypes, identified DEGs were mapped onto metabolic maps using the MapMan tool and categorized into BINS on the basis of their functional groups. We could observe various functional groups of genes activated at different stages of seed development. Interestingly, we noticed a major variation between the bold and small genotypes with respect to genes related to important functions like those involved in different aspects of metabolism and signaling or regulation. A detailed analysis of genes expressed in these categories that actually distinguish the two genotypes was considered relevant, and a major emphasis was therefore given to the BINS in which the genotypes were found to be involved. This analysis allowed exploration of the global activation of specific metabolic pathways and gene regulatory networks activated during ricebean seed development. For the whole ricebean transcriptome, we annotated 13,759 transcripts with MapMan BINS of known function after running the Mercator web tool. In total, these transcripts were classified into 29 BINS. The transcripts were expressed mainly in the following categories: carbohydrate metabolism (major and minor CHO metabolism), amino acid turnover, photosynthesis, secondary metabolism, and cell wall organization ([115]Supplementary Table S4). In the former categories, most of the transcripts were highly expressed in B5_B10, while downregulated in the case of S5_S10 ([116]Figure 5A). B5_B10 and S5_S10 shared 17 pathways, but only two pathways were found in B5_B10 such as RNA processing and polyamine metabolism, indicating that these two pathways triggered after 5 DPA. Similarly, while comparing expressed transcripts between the genotype at the same time points (i.e., B5_S5 and B10_S10), 18 categories were the same, except the polyamine metabolism which was detected only at the second time point, that is, 10 DPA, which also confirms our previous result that polyamines activate only in the case of bold genotype after 5 DPA of seed development ([117]Figure 6A). FIGURE 5. [118]FIGURE 5 [119]Open in a new tab MapMan pathway representing the differential expression of genes across the time point involved in (A) metabolism (B) cellular and regulation pathway in bold and small genotypes. FIGURE 6. [120]FIGURE 6 [121]Open in a new tab MapMan pathway representing the differential expression of genes across the genotype involved in (A) metabolism (B) cellular and regulation pathway in bold and small genotypes. In our study, photosynthesis-related genes were highly enriched in bold genotypes in comparison with the small genotype which is in similarity with the previously published reports ([122]Zhu et al., 2014; [123]Clevenger et al., 2016; [124]Sinha et al., 2020). The main role of photosynthesis in seed development is reported to increase the internal oxygen content and to control biosynthetic fluxes by improving the energy supply ([125]Borisjuk et al., 2004), and it can also affect the metabolism in a number of distinct ways ([126]Ruuska et al., 2004). Our results indicate that many metabolic genes are most active during ricebean seed filling, which aligns with previous studies on M. truncatula and P. sativa where approximately half of the seed-regulated genes were assigned to metabolic pathways ([127]Benedito et al., 2008; [128]Liu et al., 2015). Furthermore, some DEGs are also mapped to hormone metabolic pathways. Majority of the genes associated with biosynthesis and response of many phytohormones like IAA, ABA, BAP, ethylene, cytokinin, jasmonate, and gibberellic acid were upregulated in the case of bold genotypes (B5_B10) as compared to small genotypes (S5_S10), in which most of the genes were downregulated ([129]Supplementary Table S5; [130]Figure 5B), whereas in case of B5_S5 and B10_S10, mixed expression of phytohormones was observed ([131]Figure 6B); a complex regulatory network triggers the initiation of seed development, maturation, and accumulation of storage products. Several studies suggested the vital role of phytohormones in pod and seed development ([132]Zhu et al., 2014; [133]Huang et al., 2017; [134]Wan et al., 2017; [135]Kumar et al., 2019; [136]Sinha et al., 2020). In 2017, a study demonstrated the role of phytohormones in various aspects of plant hormone homeostasis including biosynthesis, metabolism, receptor, and signal transduction ([137]Xu and Huang, 2017). Kyoto Encyclopedia of Genes and Genomes Pathway Analysis The KEGG pathway enrichment analysis was conducted for two contrasting genotypes at both time points (i.e., 5 DPA and 10 DPA) at a p-value <0.05 using the KEGG database server. The KEGG pathway enrichment analysis indicated that 7,178 transcripts obtained hits in the KEGG database, and those transcripts were associated with 106 unique pathways. The 7,178 transcripts included 3,112, 434, 2,103, and 1,529 transcripts with respect to B5_B10, B5_S5, B10_S10, and S5_S10, respectively. The pathway enrichment analysis of DEG conducted between different combinations, B5_B10, B5_S5, B10_S10, and S5_S10, revealed involvement the of 7, 52, 458, and 35 pathways, respectively. In case of B5_B10 and S5_S10, from the top 10 pathways, four pathways, namely, biosynthesis of secondary metabolites, protein processing in the endoplasmic reticulum, plant–pathogen interaction, and starch and sucrose metabolism were common. On the other hand, between the genotypes at both the time points (i.e., B5_S5 and B10_S10), only one pathway i.e., metabolic pathway—was common. The top 10 pathways among the time points for both genotypes as well as between the genotypes at both the time points are represented in [138]Figure 7. FIGURE 7. [139]FIGURE 7 [140]Open in a new tab List of top 10 pathways revealed by KEGG enrichment analysis. In KEGG pathway–based analysis, we observed a clear difference in the expression of some phytohormones which regulates seed development, including auxin, cytokinin, gibberellin, and ethylene. The differential expression of these phytohormones was also observed in our MapMan analysis. This was not surprising since phytohormones control or influence all aspects of plant growth and reproduction, including seed germination, growth of roots, stems and leaves, plant flowering, seed development, seed fill, and seed dormancy. The expression pattern of key genes involved in biosynthesis and signaling of important phytohormones was compared between small and bold seeded genotypes for their possible role in determining seed size. Auxin Pathway Auxin regulates many aspects of plant growth and development, including embryogenesis ([141]Möller and Weijers, 2009), the architecture of the root system ([142]Benková et al., 2003), gravitropism ([143]Rashotte et al., 2003), phototropism ([144]Blakeslee et al., 2004), initiation and radial positioning of plant lateral organs, and cell elongation ([145]Reinhardt et al., 2000; [146]Christian et al., 2006). Auxin is sensed by its receptor protein such as TRANSPORT INHIBITOR RESPONSE 1/AUXIN-SIGNALING F-BOX proteins (TIR1/AFBs) which mediate the auxin signaling pathway and centered on a ubiquitin-dependent Skp1-Cullin-F-box (SCF)^ TIR1/AFBs protein complex to regulate the Aux/IAAs-ARFs flow ([147]Leyser, 2003; [148]Figure 8A). The TIR receptor protein confers substrate specificity and target-specific Aux/IAA proteins for degradation via the SCF ^TIR1/AFBs protein complex, in the presence of auxin. The degradation of Aux/IAA leads to switching on transcriptional expression of a range of genes including auxin responsive factors (ARFs) which in turn regulate the expression of several other genes that have a role in auxin-mediated plant growth and development. FIGURE 8. [149]FIGURE 8 [150]Open in a new tab Phytohormone pathways important for seed development are represented in two contrasting genotypes of ricebeans on the basis of their expression and involvement in the enriched KEGG pathways. (A) Auxin signaling pathway. (B) Cytokinin pathway. (C) Ethylene pathway. (D) Gibberellic acid pathway. The KEGG pathway expression–based analysis revealed a clear difference in the auxin signaling pathway in two contrasting ricebean genotypes, which is also in accordance with our MapMan results where auxin signaling related genes showed higher expression in the bold genotype than the small genotype. We found approximately 51 genes encoding SCF ^TIR1/AFB , Aux/IAA, ARFs, E3 ubiquitin transferase enzyme, and 26S proteasome, showing distinct expression dynamics in bold (B5_B10) and small (S5_S10) genotypes ([151]Supplementary Table S6). The three key signaling elements TIR1/AFBs, Aux/IAAs, and ARFs have also been identified in different species including Arabidopsis ([152]Chapman and Estelle, 2009), populus ([153]Kalluri et al., 2007), and rice ([154]Jain et al., 2005; [155]Parry et al., 2009; [156]Shen et al., 2010). Similarly, several studies focused on the role of AUX/IAA in determining the seed size with the influence of the expression of a gene in AUX biosynthesis (ZmTar3, ZmTar1, and ZmYuc1) and signaling (auxin efflux carriers, PIN, and ARF2) ([157]Schruff et al., 2006; [158]Bernardi et al., 2016). Homologs of ZmYuc1, PIN, and ARF2 were significantly differentially expressed during tartary buckwheat seed development ([159]Huang et al., 2017). The high expression of DEGs in bold genotypes corresponds to cell division, and expansion is faster to form larger size seeds at these stages. The upregulation of SCF ^TIR1 , E3 ubiquitin transferase enzyme, and 26S proteasome was found in the bold genotypes, indicating the degradation of Aux/IAA and release of ARFs to modulate the expression of their target genes including SMALL AUXIN UP RNA (SAUR), Gretchen Hagen 3 (GH3), and indole-3-acetic acid–inducible gene (Aux/IAA), while in case of small genotypes, SCF ^TIR1 was not expressed, but TOPLESS (TPL) gene was upregulated, suggesting that Aux/IAA might have formed the complex with ARFs to block the transcriptional activity ([160]Figure 8A; [161]Hayashi, 2012).The induction of auxin-inducible acyl amidosynthetases, GH3, by the ARF family is the early event of auxin signaling cascade ([162]Zhang et al., 2016). The expression of GH3 gene was upregulated in the case of bold genotypes, while it was downregulated in the small genotypes. SAUR expression was upregulated in small genotypes, while downregulated in bold genotypes. The aforementioned results clearly inferred that the differential regulation of the auxin signaling pathway in bold and small genotypes might be the main factor contributing to the variation in ricebean seed size ([163]Bai et al., 2019). Cytokinin Pathway Similar to auxin, cytokinin is another important plant hormone regulating many aspects of plant growth ([164]Tarkowski et al., 2006; [165]Werner et al., 2008). In plants, the regulation of cytokinin is facilitated by the two-component system (TCS) which consists of four groups of proteins: histidine kinases (AHKs; AHK2, AHK3, and AHK4/WOL1/CRE1), histidine-containing phosphotransfer proteins (AHPs; AHP1-AHP5), type-B response regulators (type-B ARRs; ARR1, ARR2, ARR10-ARR14, and ARR18-ARR21), and type-A ARRs (ARR3-ARR9 and ARR15-ARR17). In Arabidopsis, AHK2, AHK3, and CRE1 were found to be involved in seed size ([166]Riefler et al., 2006; [167]Heyl et al., 2012). Cytokinins have been reported to function in seed development, such as seed size, seed yield, embryonic growth, with the involvement of genes encoding isopentenyl transferase (IPT), cytokinin oxidase/dehydrogenase (CKX), and histidine kinase (HK) ([168]Bartrina et al., 2011). In our study, we have also found the expression of genes such as IPT, CKX, and HK. IPT upregulation was observed only in the case of small genotypes, whereas CKX was upregulated in bold genotypes, and mixed expression of HK was noticed in both the genotypes ([169]Figure 8B; [170]Supplementary Table S7). The upregulation of CKX in bold seed genotypes hints at its possible role in determining the seed size. The CKX proteins are widely distributed in plants and implicated in various plant growth and developmental processes by maintaining the endogenous cytokinin level via irreversible degradation. In plant tissues, the expression of the CKX genes is primarily regulated by the endogenous cytokinin level. Various past studies have shown the role of CKX genes in the regulation of the seed size and grain yield in different plant species. In Arabidopsis, a CKX family gene–encoded enzyme CYTOKININ OXIDASE 2 (CKX 2) has been demonstrated to be associated with large seed size via catalyzing irreversible degradation of cytokinin. Similarly, in rice, a Gn1a locus encoding for cytokinin oxidase/dehydrogenase (OsCKX2) is shown to be responsible for high grain yield ([171]Ashikari et al., 2005). On the other hand, the expression of type-A Arabidopsis response regulator (type-A ARRs) genes that negatively regulates the cytokinin signaling was majorly detected in small genotypes. This suggested that type-A ARR genes may be repressing the cytokinin signaling pathway ([172]Heyl and Schmülling, 2003; [173]Lohar et al., 2004; [174]Desbrosses and Stougaard, 2011). The inhibition of the cytokinin signaling pathway may contribute to plant and bacterial cell differentiation ([175]Bromley et al., 2014). Mixed expression of AHP and type-B ARRs was found in both the genotypes. Phosphate transfer to type-B ARR proteins modulates the transcriptional changes in the nucleus and causes the expression of primary cytokinin response genes including the type-B ARRs. Ethylene Pathway Ethylene, an “aging” hormone, has been reported to control the development of plant seeds and grains in various species ([176]Zhong et al., 2002; [177]Hentrich et al., 2013; [178]Huang et al., 2013; [179]Guo et al., 2016).Molecular evidence demonstrated ethylene’s role in the regulation of seed size and seed shape, in which genes in ethylene biosynthesis (EIN2, ERS1, and ETR1), signaling (CTR1, ETO1, ETR1, and EIN2), and catabolism (ACC deaminase) were involved ([180]Robert et al., 2008; [181]Walton et al., 2012). According to our results, the expression of ethylene receptors (ERS1/2) was higher in bold genotypes than small genotypes, whereas CTR1, a negative regulator of ethylene hormone showed contrasting expression with upregulation in small and downregulation in bold genotypes ([182]Figure 8C; [183]Supplementary Table S8). In buckwheat, the differential expression of ERS1, ETO1, ETR1, etc. was observed ([184]Huang et al., 2017). In case of bold genotypes, we have noticed the high expression of SIMKK, MPK6, EIN3-like transcription factors, and EIN2, indicating positive regulation of transcriptional response in the bold genotype. In case of small genotypes, the upregulation of ERFs depicted that ERF might have shown activity after the phosphorylation via the MPK3/6-cascade, which regulates the ethylene biosynthesis, and the expression of EIN3/EIL1 was not found which possibly indicates its degradation by ubiquitination. In our samples, we found a full cascade of gene expressions in bold genotypes, while in small genotypes, the expression of genes detour from the normal expression and opted a new route for the ethylene-inducible gene expression. Gibberellin Pathway Gibberellins (GAs) are well-known plant hormones that are widely involved in the growth and development processes. GAs, auxin, ABA, and ethylene have been involved in the regulation of seed development and pod maturation ([185]Ziv and Kahana, 1988; [186]Shlamovitz et al., 1995; [187]Ozga et al., 2003). In case of bold genotypes, the expression of GA, DELLAs, and SCF-complex protein is upregulated, which indicates DELLA proteolysis; simultaneously, the upregulation of protein indeterminate domains (IDDs) and scarecrow-like proteins (SCLs) were also observed, which supports the feedback loop mechanism which regulate the GA signaling ([188]Figure 8D). According to the feedback loop mechanism, DELLA initiates the expression of downstream genes, including SCLs by IDD-mediated interaction with their promoters. The subsequent increased concentration of SCLs enhances the SCL3/IDD complex synthesis while decreasing the formation of the DELLA/IDD complex and consequent suppression of the expression of SCLs, which mediates the homeostatic regulation of the downstream genes, including positive regulation of SCLs and GA signaling. In case of small genotypes, the expression of SCF complex protein was absent, while the expression of DELLAs was unregulated. Consecutively, we observed the SCL protein script, while IDD protein was completely absent. The expression of the phytochrome-interacting factor (PIF) protein was found, which indicates the DELLA-mediated inhibition of hypocotyls elongation ([189]Supplementary Table S9). Previous studies have revealed that genes encoding GA2 oxidase and GA3 oxidase in the GA biosynthesis pathway can affect seed development, starch biosynthesis, embryo, and seed coat development ([190]Nakayama et al., 2002; [191]Singh et al., 2002). The downregulation of GA2-oxidase was observed in our results similar to a study of the tartary buckwheat in which the downregulation of GA2-oxidase was also depicted during seed development ([192]Huang et al., 2017). The KEGG pathway and the MapMan analysis suggested the differential expression of phytohormone biosynthesis or response genes. According to the MapMan analysis, auxin, cytokinin, ethylene, and gibberellin showed contrasting expressions in both the genotypes ([193]Figure 5B). Similarly, in terms of the KEGG pathway, we have observed how the signaling pathways of these phytohormones were different. The present work confirms that auxin, cytokinin, ethylene, and gibberellin are the important regulators of the seed size in ricebean. Our results are also in accordance with those of previous studies in other species ([194]Riefler et al., 2006; [195]Heyl et al., 2012). Candidate Gene Identification for Seed Development–Related Traits The expression of a number of genes starting from the anthesis to early stages of maturity may have a crucial role in determining grain size and various other pod-related traits in pulses ([196]Pazhamala et al., 2016). In this study, candidate genes for various traits such as days of flowering, pod shattering, seed per pod, seed size, 100-seed weight, and pod length were identified from the assembled transcriptome on the basis of sequence similarity search. In total, we identified 142 genes in ricebean belonging to development-related traits on the basis of similarity search (BLASTX) and e-value. Furthermore, the candidate genes were also characterized in silico on the basis of their domain analysis using Pfam software. Out of 142 genes, only 120 genes showed domain similarity with their hits. Therefore, we discarded 22 genes whose domain was not matched. Hence, according to our study, we found 120 candidate genes of ricebean belonging to different development-related traits (days of flowering, pod shattering, seed per pod, seed size, 100-seed weight, and seed length) ([197]Figure 9; [198]Supplementary Table S10). FIGURE 9. [199]FIGURE 9 [200]Open in a new tab Heat map representing the differential gene expression of the identified candidate genes for six traits including seed size, 100-seed weight, seed/pod, days to flowering, pod shattering, and pod length in bold and small genotypes at 5 DPA and 10 DPA. In terms of pod development, seed size is a key determinant for the seed or grain yield in legume crops ([201]Amkul et al., 2020). In ricebean, we found four candidate genes for seed size encoding: histidine kinase 2, delta sterol reductase, phosphate transporter (PHO1), and WRKY domain–containing protein (WRKY 40). These genes have already been reported to be involved in seed size. For example, Vigun05g039600 (PHO1) has been reported to be a positive regulator of seed development that affects both the cell size and cell number ([202]Lo et al., 2019). Similarly, Vigun08g217000 which codes for histidine kinase 2 has been identified as a potential candidate gene for improved organ size during cowpea domestication ([203]Lonardi et al., 2019), and its Arabidopsis ortholog AHK2 has been shown to regulate the seed size ([204]Riefler et al., 2006; [205]Bartrina et al., 2017). Vigun11g191300 encoding a delta (24)-sterol reductase is an ortholog of the Arabidopsis DIMINUTO gene which has been shown to regulate cell elongation ([206]Takahashi et al., 1995). In foxtail millet, Loose Panicle1–encoded WRKY transcription factor regulates the seed size by increasing the length and width of the seed ([207]Xiang et al., 2017). Hence, these genes are the strong candidates as seed size is affected by multiple pathways. On the other hand, for 100-seed weight, 29 candidate genes were identified corresponding to expansin, cytokinin dehydrogenase, cytochrome P450, and response regulatory domain containing protein. The significance of these genes as candidate loci related with the 100-seed weight is supported by the work done on Arabidopsis, where orthologs of the candidate genes in the cytokinin pathway have been shown, in transgenic studies, to regulate seed size and/or weight ([208]Daele et al., 2012). Our findings are also in accordance with the common bean in which type-B regulators were found to be involved in the activation of downstream genes in the cytokinin pathway, and the genes encoding cytokinin dehydrogenase regulates the pathway by degrading active cytokinin ([209]Hwang et al., 2012; [210]Schmutz et al., 2014). Likewise, in Arabidopsis, expansins increased grain size and also improved grain production ([211]Bae et al., 2014). Recent studies have also associated expansins with grain size and weight in wheat and tomato ([212]Muñoz and Calderini, 2015; [213]Brinton et al., 2017). TaCYP78A3 in wheat and CYP78A5 in Arabidopsis encodes the cytochrome P450, which positively correlates with seed size and seed weight ([214]Ma et al., 2015; [215]Tian et al., 2016b). Like other development-related traits, flowering time is also an important trait because several agronomical traits such as quality of the grain and grain yield depend on flowering time. For days to flowering trait, we identified 21 candidate genes in our dataset encoding protein Flowering Locus T-like, GIGANTEA-like, cryptochrome, and transcription factors such as bHLH, ERF, and PIF-3. Most of the candidate genes of days to flowering had high expression in case of 10 DPA, instead of 5 DPA. In rice, florigen is encoded by RICE FLOWERING LOCUS T 1 (RFT1) and the orthologs of Arabidopsis FT and plays important role in heading date, influencing yield traits in rice ([216]Tamaki et al., 2007; [217]Komiya et al., 2009), whereas GIGNANTEA-like genes observed in the regulation of many genes which influence the circadian clock, blue light photoreceptor, and flowering time have also been reported in Arabidopsis ([218]Hayama et al., 2003; [219]Fornara et al., 2009). Similar to rice results, the Flowering Locust T-like in ricebean might help in the improvement of yield. On the other hand, the number of seeds per pod might be useful for increasing the seed yield of ricebean. We identified 15 candidate genes for seeds per pod trait having annotations like MAPK, NAC, MALE STERILE 5, and ABSCISIC ACID-INSENSITIVE 5-like protein. Vigun03g187300 (ABA-insensitive 5-like protein 6) is an ABA-responsive element (ABRE)–binding factor that regulates ABRE-dependent gene expression ([220]Nakashima and Yamaguchi-Shinozaki, 2013). In Arabidopsis, ABA deficiency reportedly decreases the number of seeds per siliqua ([221]Cheng et al., 2014). Hence, the higher expression of this gene in bold genotypes implies an increase in the number of seeds per pod that could result in the improvement of the ricebean yield. The Vigun05g126900 gene, encoding MALE STERILE 5, was selected as a candidate gene in zombie pea ([222]Amkul et al., 2020). In a previous study on Arabidopsis, mutations to MALE STERILE 5 resulted in the development of “polyads” (i.e., tetrads with more than four pools of chromosomes following male meiosis) ([223]Glover et al., 1998). Plants that are homozygous for the MS5 recessive allele apparently revealed arrested growth and harvested empty siliques, whereas in plants that are heterozygous for MS5, siliqua elongation and seed set are less repressed ([224]Glover et al., 1998). In case of the pod length, five candidate genes in ricebeans have been identified, mostly corresponding to the auxin response factor. Glyma.07G134800, an ortholog of Arabidopsis, was also associated with the auxin pathway ([225]Jiang et al., 2018). Furthermore, we have also identified a few candidate genes associated with pod shattering which is considered to be an undesirable agronomical trait. We identified maximum candidate genes (i.e., 57) for this trait in our ricebean study. Out of the 57 candidate genes, 18 genes encode transcription factors like AP2/ERF, WRKY, and NAC, whereas the rest of the genes were involved in cellulose synthase and serine/threonine protein kinase. The candidate genes for pod shattering have also been identified in other legumes including Vigun02g095200 (cellulose synthase), Vigun03g306000 (NAC domain transcription factor), and zombi pea ([226]Suanum et al., 2016; [227]Lo et al., 2018; [228]Takahashi et al., 2019; [229]Amkul et al., 2020; [230]Watcharatpong et al., 2020). In Sorghum propinquum, WRKY modulates the flower and seed development and lignin deposition, and it is also found to be involved in pod shattering ([231]Tang et al., 2013). Recently, in rice, AP2 transcription factor–coding gene SHATTERING ABORTION1 (SHAT1) was observed having a crucial role in pod shattering. Two genes encoding NAC in Vigna unguiculata were found to be involved in cell wall biosynthesis and hence influencing the pod shattering ([232]Zhou et al., 2012; [233]Lo et al., 2018). The identification of pod shattering genes may reduce preharvest yield damages in ricebean, resulting in a more efficient yield. Thus, pod indehiscence may be a valuable trait during seed harvesting, making it a main concern during crop domestication ([234]Amkul et al., 2020). To support our findings related to candidate genes, we performed a comparative analysis of the identified candidate genes with our MapMan and KEGG pathway results. Out of 120 candidate genes, 23 genes matched with the MapMan results ([235]Table 3). For example, the expression of eight candidate genes of 100-seed weight and seed size were only shown in the small genotype encoding PHO1, cytokinin dehydrogenase, A-type cytokinin ARR response negative regulator, etc. Similarly, for bold genotypes, only one gene, aB10dtrinity_dn14585_c0_g1_i2, for a seed was upregulated, revealing a high number of pods in bold genotypes as compared to the small genotype. On the other hand, in terms of time point, three genes (cS5dtrinity_dn10996_c2_g4_i3: seed size; cS5dtrinity_dn11557_c0_g1_i4: seeds/pod; and aB10dtrinity_dn30303_c0_g10_i1: days to flowering) were detected only at the first time point, that is, 5 DPA. Two genes (aB10dtrinity_dn33078_c0_g1_i1: 100-seed weight and bS10d1trinity_dn10624_c1_g9_i1: pod shattering) were found to be highly expressed only in bold genotypes, whereas nine genes encoding alpha class expansins were found to be downregulated, specifically in the small genotype. TABLE 3. List of candidate genes matched with our MapMan results. MapMan category Candidate ricebean gene ID Description B5_B10 S5_S10 B5_S5 B10_S10 Trait Amino acid metabolism cs5dtrinity_dn11557_c0_g1_i4 Histidinol-phosphate aminotransferase — — 2.23 — Seeds/pod Cell wall organization bs10d1trinity_dn10624_c1_g9_i1 Catalytic component CesA of cellulose synthase complex 2.34 — — — Pod shattering ab10dtrinity_dn33078_c0_g1_i1 Alpha-class expansin 7.38 — — — Seed weight ab4dtrinity_dn10598_c3_g1_i1 Alpha-class expansin — −3.88 — — Seed weight ab10dtrinity_dn30367_c2_g3_i1 Alpha-class expansin — −3.22 — — Seed weight bb4dtrinity_dn13171_c6_g7_i1 Alpha-class expansin — −3.13 — — Seed weight bb10dtrinity_dn16316_c5_g6_i2 Alpha-class expansin — −2.76 — — Seed weight bs5dtrinity_dn13044_c1_g3_i3 Alpha-class expansin — −2.82 — — Seed weight cb4dtrinity_dn14045_c9_g6_i1 Alpha-class expansin — −3.33 — — Seed weight cb4dtrinity_dn14247_c0_g4_i1 Alpha-class expansin — −4.18 — — Seed weight cs5dtrinity_dn11417_c0_g2_i1 Alpha-class expansin — −3.43 — — Seed weight cs5dtrinity_dn12484_c15_g1_i1 alpha-class expansin — −3.3 — — Seed weight Lipid metabolism cs5dtrinity_dn11621_c1_g7_i3 Sterol delta24 reductase — −2.09 — — Seed size ab10dtrinity_dn14585_c0_g1_i2 Dihydrolipoamide acetyltransferase component E2 5.3 — — — seeds/pod Nucleotide metabolism ab10dtrinity_dn30303_c0_g10_i1 Uracil phosphoribosyltransferase (UPP) — — −3.05 — days to flowering Nutrient uptake cs5dtrinity_dn18119_c0_g1_i1 Phosphate transporter (PHO1) — −2.93 — — Seed size Phytohormone action cs5dtrinity_dn10996_c2_g4_i3 Receptor protein (AHK) — — 2.2 — Seed size ab4dtrinity_dn16153_c0_g1_i2 Cytokinin dehydrogenase — −4.66 — — Seed weight cs5dtrinity_dn9587_c0_g1_i4 Steroid 22-alpha-hydroxylase (DWF4) — −3.24 — — Seed weight as10dtrinity_dn8390_c0_g1_i1 A-type cytokinin ARR response negative regulator — 3.93 — — Seed weight cs5dtrinity_dn5774_c0_g1_i1 Cytokinin dehydrogenase — −3.51 — — Seed weight Protein homeostasis bb4dtrinity_dn32604_c0_g1_i1 Matrixin-type metalloprotease — −2.66 — — Pod Shattering Redox homeostasis ab4dtrinity_dn10550_c1_g1_i11 GDP-D-mannose-epimerase (GME) — −2.78 — — Seeds/pod [236]Open in a new tab Similarly, 16 candidate genes (auxin: 2; cytokinin: 8; ethylene: 5; GA: 1) were matched with the KEGG pathway results ([237]Table 4). The matched genes were found to be associated with several seed development–related traits like pod length, days to flowering, 100-seed weight, seeds/pod, and pod shattering. All the genes were expressed in the small genotype, except two (ab10dtrinity_dn29885_c1_g2_i2 and bs10dtrinity_dn12088_c3_g6_i6) which were expressed in bold genotypes corresponding to MAPK. TABLE 4. List of candidate genes matched with our KEGG pathway results. KEGG pathway Ricebean candidate gene ID Description B5_B10 S5_S10 Trait Auxin cb10dtrinity_dn16977_c3_g12_i1 Auxin response factor — −2.09 Pod length cs5dtrinity_dn10719_c0_g1_i4 Auxin response factor — −2.15 Pod length Cytokinin ab4dtrinity_dn16153_c0_g1_i2 Cytokinin dehydrogenase 6–like — −4.66 Seed weight cs5dtrinity_dn10983_c0_g1_i2 Two-component response regulator–like APRR1 isoform X4 (CCT motif, rec) — −2.19 Days to flowering as5dtrinity_dn1065_c0_g1_i1 HPt domain–containing protein — 2.17 Seed weight as10dtrinity_dn8390_c0_g1_i1 Response regulatory domain–containing protein (type A) — 3.93 Seed weight bs10d1trinity_dn9417_c0_g2_i1 HPt domain–containing protein — 2.26 Seed weight cb10dtrinity_dn35628_c0_g1_i1 HPt domain–containing protein — 2.71 Seed weight cs5dtrinity_dn4103_c0_g1_i1 Cytokinin hydroxylase–like — −2.85 Seed weight cs5dtrinity_dn5774_c0_g1_i1 Cytokinin dehydrogenase 6–like — −3.51 Seed weight Ethylene bb10dtrinity_dn16202_c0_g2_i1 Ethylene-responsive transcription factor RAP2-7–like isoform X2 — 2.52 Days to flowering cb4dtrinity_dn13190_c1_g1_i4 AP2/ERF domain–containing protein — 3.14 Pod Shattering ab10dtrinity_dn29885_c1_g2_i2 Mitogen-activated protein kinase 9.72 — Seed/pod bs10dtrinity_dn12088_c3_g6_i6 Mitogen-activated protein kinase 4.21 — Seed/pod bs10dtrinity_dn12088_c3_g6_i6 Mitogen-activated protein kinase — 5.55 Seed/pod GA ab4dtrinity_dn7670_c0_g1_i4 Transcription factor PIF3-like isoform — 2.06 Days to flowering [238]Open in a new tab Simple Sequence Repeat Identification In this study, we used the MISA Perl script ([239]http://pgrc.ipk-gatersleben.de/misa) to detect the microsatellites. Of the 288,393 transcripts generated in this study, 14,663 contained an SSR totaling 201,517,181 bp. Out of these 14,663 sequences, 2,317 sequences had more than a single SSR, and 1,487 had SSRs of different motifs (compound SSR). Dinucleotide repeat motifs were the most abundant among the five types of motifs, totaling 8,866 (50.67%). The second most abundant were trinucleotides totaling 7,938 (45.36%), followed by 448 tetranucleotides (2.56%), 145 pentanucleotides (0.82%), and 100 hexanucleotide motifs (0.57%) ([240]Figure 10A). Similar results have been reported in the previous transcriptome published for ricebean varieties ([241]Chen et al., 2016) as well as for other legume species including mungbean ([242]Tian et al., 2016b), adzuki bean ([243]Chankaew et al., 2014; [244]Chen et al., 2015b), cowpea ([245]Gupta et al., 2010), and chickpea ([246]Choudhary et al., 2008). FIGURE 10. [247]FIGURE 10 [248]Open in a new tab (A) Bar diagram representing the type and frequency of SSRs identified in ricebean using assembled transcripts. (B) SSR13 polymorphism on selected eight accessions of Vigna species. (C) Dendrogram representing the relationship distance among the eight accessions. The number of the given repeat unit of SSRs ranged from 5 to >10, and as the number of repeat units increased, the frequency of the given SSR structure progressively decreased ([249]Supplementary Table S11). As for the two most abundant repeat motif types (di- and trinucleotides), the frequency of the AG/CT motif type accounted for 17.41% in dinucleotide repeat motifs, and the frequency of GAA/TTC was the most abundant motif type in the trinucleotide, accounting for 6.3%. A previous study on adzuki bean also showed a high frequency of AG motifs in dinucleotides ([250]Chankaew et al., 2014). Simple Sequence Repeat Validation To determine the polymorphism level of the identified EST-SSRs, the randomly selected 50 SSRs were evaluated in eight accessions of Vigna species including V. umbellata (6), V. mungo (1), and V. radiata (1) ([251]Supplementary Table S12). From 50 pairs, 43 were successfully amplified, while seven pairs were not able to generate a PCR product ([252]Supplementary Table S13). More than 85% of the SSR markers were successfully amplified, suggesting that the quality of our assembled transcripts was very high. The annealing temperatures of the primers ranged between 54 and 56°C. Out of these 43 SSR primer pairs, 26 pairs showed polymorphism (dinucleotide: 12, trinucleotide: 14) and the rest were monomorphic ([253]Figure 10B; [254]Table 5). A high polymorphism level (60.46%) of ricebean EST-SSRs was observed in the selected set of eight accessions which was higher than that from previous reports in other legume species including the chickpea (47.3%) ([255]Nayak et al., 2010), mungbean (33%) ([256]Chen et al., 2015b), black gram (58.2%) ([257]Souframanien and Reddy, 2015), and adzuki bean (7.6%) ([258]Chen et al., 2015a) while lower than common bean (71.3%) ([259]Hanai et al., 2007), whereas when we considered only ricebean genotypes (six accessions), only 34.88% SSR markers were found polymorphic. We have also checked the cross-species transferability pattern and found that the transferability of ricebean–derived SSR markers was higher in V. radiata (73.08%) than in V. mungo (50%). Various studies depicted the importance of SSR cross-transferability in Vigna species including ricebean, mungbean, and cowpea ([260]Pattanayak et al., 2019). Furthermore, the genetic distance among the accessions was determined, and we found two clusters, with six (V. umbellata) in the first cluster and two (V. mungo and V. radiata) in the second cluster, respectively ([261]Figure 10C). TABLE 5. List of 26 SSR markers that showed polymorphism in a set of eight accessions of Vigna species. Transcript ID Marker Forward primer (5'->3′) Reverse primer (5'->3′) Annealing temp (°C) Repeat motif Allele size (bp) No. of alleles BB7DTRINITY_DN9136_c0_g1_i1 SSR2 ATG​ATC​GGA​CAC​TAG​GAG​AC TTG​GCC​AAT​GTC​TAT​TTG​A 54 ATT(18) 150–160 2 BS7D1TRINITY_DN10554_c1_g1_i1 SSR3 ACG​CAC​AGT​TTC​ATG​GTT​A ACA​ATC​TTC​AAC​CAC​ACT​CC 55 GAA(19) 100–130 4 BS4DTRINITY_DN12307_c0_g4_i3 SSR4 CAA​ACC​CAC​TAA​CCC​AAG​TA ATG​AAA​ATG​CAA​ACA​CAC​TG 55 TAA(17) 140–150 2 CB4DTRINITY_DN13434_c0_g3_i2 SSR7 ATT​CCC​AGC​TTA​GGA​GAA​AC TGG​ATT​TGT​TCT​TAA​TGG​TG 55 ATA(18) 140–170 2 AB4DTRINITY_DN10798_c2_g4_i2 SSR8 GTT​ATT​GGA​ATG​GAA​GAG​CA CTT​CCG​ACA​ACA​ATT​CCT​T 55 GAA(16) 120–140 3 AS4DTRINITY_DN8490_c0_g1_i1 SSR9 CAA​CCG​GGT​AGA​GAA​AAG​TA CTA​CCA​AGT​TGC​TTG​CTT​CT 54 AAT(22) 210–220 2 CB7DTRINITY_DN17452_c5_g1_i3 SSR11 ATG​GGT​TTC​CTA​TGA​ATT​TG GCT​AAT​GAC​TCT​GCT​GTT​CC 55 TAA(11) 140–150 2 AB4DTRINITY_DN10727_c2_g3_i1 SSR12 GCT​AAT​GAC​TCT​GCT​GTT​CC ATG​GGT​TTC​CTA​TGA​ATT​TG 55 TTA(11) 140–150 2 AS4DTRINITY_DN11798_c1_g2_i4 SSR13 GGG​AAA​ATG​TTA​CGG​AGT​TC GTT​TTC​CCA​CCA​CAA​CTA​AC 56 TGG(12) 120–150 2 BS4DTRINITY_DN12006_c0_g8_i4 SSR14 CTG​GGA​AAC​TGA​GCA​GAT​AG CAG​ATA​GTT​GCA​ATA​GCT​TGA​A 55 TAT(12) 170–190 3 CB7DTRINITY_DN16904_c1_g1_i5 SSR15 TTA​GAA​TTT​CCG​TTG​CTA​CC CCC​TGA​AAG​AAG​TTT​GGA​AT 55 TAT(12) 170–180 2 BB7DTRINITY_DN17092_c1_g1_i3 SSR16 TTC​ACC​TCT​GAC​TGA​TCA​CA CAA​GTC​TAA​TGC​ATC​CAC​CT 55 GAT(13) 160–180 3 BS7DTRINITY_DN11614_c2_g22_i1 SSR18 CTG​GGA​AAC​TGA​GCA​GAT​AG CAG​ATA​GTT​GCA​ATA​GCT​TGA​A 55 TAT(12) 190–200 2 AS4DTRINITY_DN11028_c4_g1_i3 SSR24 CTG​GGA​AAC​TGA​GCA​GAT​AG CAG​ATA​GTT​GCA​ATA​GCT​TGA​A 55 TAT(12) 190–200 2 BS4DTRINITY_DN11934_c2_g1_i5 SSR28 TTC​CAC​GTT​CTC​ACT​CTC​TT GGA​ATC​CAT​TAC​TGT​GAA​CG 55 TC(37) 100–130 4 BS7DTRINITY_DN11790_c6_g2_i8 SSR30 CTC​TTC​TTA​GAG​CCA​AAC​CA ACG​CCA​TGT​GTA​TGA​AGA​TT 55 CT(36) 100–120 3 BS7DTRINITY_DN3955_c0_g1_i2 SSR31 CGT​TTC​CTA​AGC​TTC​CTT​TA GAG​AAG​CGA​AGA​AGA​AAG​GT 55 TC(35) 100–130 4 AB7DTRINITY_DN28298_c2_g1_i4 SSR32 CTA​CCA​GTG​GGT​TCG​TTT​AC TCT​CTC​TTC​TCC​CCT​TAA​CC 55 GA(32) 130–160 4 AB4DTRINITY_DN10313_c1_g2_i7 SSR35 CAC​CCT​AAC​CTC​ATT​CTC​AG GAC​AGC​AAG​AAG​GAG​AGA​GA 54 CT(48) 100–110 2 CB4DTRINITY_DN14022_c6_g1_i1 SSR37 TCA​CAA​AAC​CCT​AAA​ACT​CG GGC​AGT​GTG​AAA​GAA​AGA​GA 55 TC(28) 200–220 3 CS4DTRINITY_DN11268_c2_g5_i1 SSR38 AAT​GTG​CTC​TTC​TTG​TTG​CT ACC​GAT​GGA​ATA​ACC​AAA​C 55 TC(28) 100–110 2 BB7DTRINITY_DN13571_c0_g1_i4 SSR39 TTG​TGG​ATA​TAA​ACC​CAA​CC GCT​CCT​CCG​CTC​TTC​TAT​TA 56 AG(28) 120–150 4 BB7DTRINITY_DN16958_c4_g1_i5 SSR40 TGA​TTA​ACT​GGG​TTC​TCT​GC TTC​TAC​AAC​CAC​CCA​ATC​TC 55 AT(28) 110–130 3 BS7DTRINITY_DN11905_c0_g1_i6 SSR41 GGG​AGT​ATC​CAA​AGA​AAC​AA AAT​CCA​CAC​ACA​AAT​GTG​AA 54 TC(30) 110–120 2 BB4DTRINITY_DN12047_c0_g1_i7 SSR42 GGA​ATC​CAT​TAC​TGT​GAA​CG TTC​CAC​GTT​CTC​ACT​CTC​TT 55 GA(30) 110–140 4 CB7DTRINITY_DN16920_c1_g4_i9 SSR45 GTG​GGT​AAC​TAT​GCC​CTA​AGT GGT​GAG​TGG​ATG​TGA​GAA​AG 55 TC(27) 110–120 2 [262]Open in a new tab SSR molecular markers on the basis of transcriptomes have become more promising and useful because of their high cross-species transferability, their high rate of amplification, and being reasonably cheap as compared with the SSR markers of non-transcribed regions ([263]Hansen et al., 2008; [264]Rai et al., 2013). Moreover, since they can easily expose variance in the expressed portion of the genome, it is possible to evaluate marker–trait association (MTA) and specific genomic regions stating important physio-agronomic traits ([265]Kalia et al., 2011). Conclusion The transcriptomic analysis in our study provided detailed insights into molecular processes and candidate genes controlling seed size and other seed development–related traits in ricebean. The MapMan and KEGG analyses confirmed that the phytohormone signaling pathways varied in both the contrasting expressions taken in this study and can therefore be the regulators of seed size as well as other seed development–related traits in ricebean. We hypothesize that the auxin, cytokinin, ethylene, and gibberellin signaling pathways interact cooperatively with one another, thereby modulating the expression of genes of seed development–related traits. Further research is required to identify key regulators/genes in determining seed size. Acknowledgments