Abstract The normal megastrobilli and microstrobilli before and after the sexual reversal in Pinus massoniana Lamb. were studied and classified using a transcriptomic approach. In the analysis, a total of 190,023 unigenes were obtained with an average length of 595 bp. The annotated unigenes were divided into 56 functional groups and 130 metabolic pathways involved in the physiological and biochemical processes related to ribosome biogenesis, carbon metabolism, and amino acid biosynthesis. Analysis revealed 4,758 differentially expressed genes (DEGs) between the mega- and microstrobili from the polycone twig. The DEGs between the mega- and microstrobili from the normal twig were 5,550. In the polycone twig, 1,188 DEGs were identified between the microstrobili and the sexually reversed megastrobili. Concerning plant hormone signal transduction pathways, the DEGs from both the normal and polycone twigs displayed distinct male or female associated expression patterns. There were 36 common hormone-related DEGs from the two types of twigs of P. massoniana. Interestingly, expression of these DEGs was up-regulated in the bisexual strobili, which underwent the sexual reversal. A portion of MADS-box genes in the bisexual strobili were up-regulated relative to expression in microstrobili. Keywords: Pinus massoniana Lamb., Microstrobilus, Sexual reversal, Transcriptome, Differentially expressed gene 1. Introduction Pinus massoniana Lamb. (Fam.: Pinaceae) is a monoecious gymnosperm with unisexual flowers. It serves as an important afforestation and timber yielding species in the Peoples Republic of China. Usually in September to October, the axillary buds of the vegetative stem of P. massoniana begin to form the male cone primordia in the direction of development from bottom to top. Later, it produces nearly one hundred microstrobili per vegetative stem. In October, 2-4 female cone primordia develop at the apex of the twig. In the following months from February to April, 2-4 megastrobili (female flowers) are developed in the shoot apex. These form 2-4 female cones following pollination, fertilization and development ([29]Fig. 1). However, the microstrobili in twigs of some plants experience sexual reversal. In those plants, the microsporangia develop into bracts and ovuliferous scales of the female cones basipetally. Gradually, they are converted into female flowers morphologically which develop further into long strings of cones (polycones). Previous studies have shown that this trait is genetically stable, with a great potential in increasing seed yield [[30]1]. The sexual reversal of microstrobili to polycones has been discovered in other species of Pinaceae too [[31]2,[32]3,[33]4,[34]5,[35]6]. Currently, few studies are available on the sexual reversal mechanism of unisexual flowers of gymnosperms. In Pinus tabulaeformis Carr., via transcriptome analysis, Shihui et al.[[36]7] revealed that the expression of genes were dramatically different between male and female flowers. Fig. 1. [37]Fig. 1 [38]Open in a new tab A-F. Normal- and polycones of P. massoniana. A, normal microstobilli; B, normal megastrobilli; C, normal cones; D and E, sexual reversal of microstrobilli into megastrobilli and F, polycones. In the present investigation, transcriptome sequence analysis of strobilli before and after the sexual reversal, and also the normal strobilli of P. massoniana has been performed for the first time using this second generation sequencing technology. This will provide data on the induction factors the reproductive regulation of bud differentiation and the sexual reversal processes of the bisexual flower of P. massoniana. 2. Materials and Methods 2.1. Test material On April 6, 2016, material for the present study was collected from the gene collection area of the national P. massoniana seedling base, located in Ma’anshan (26°16’ N and 107°31’ E), Duyun, Guizhou Province, China. One 14 year old P. massoniana plant with polycones was selected as the study subject. The same plant contained both the normal and polycone twigs. On the normal twigs, both mega- and microstrobili developed normally, without sexual reversal. Whereas on the polycone twigs, the microstrobili reversed sexually into megastrobili and produced polycones ([39]Fig. 1). Five groups of samples were collected: (1) microstrobili before the sexual reversal (PM_w), (2) bisexual strobili during the sexual reversal (PM_b), (3) megastrobili formed by the sexual reversal (PM_q) from a polycone twig, (4) megastrobili (PM_f) and (5) microstrobili (PM_m) from a normal twig. From each group, three replicates were made and stored in liquid nitrogen. The description of the collected material and their images are shown in [40]Table 1 and [41]Fig. 2, respectively. Table 1. A brief description of the collected samples from the same polycone twig of P. massoniana. Samples Description Collection site PM_b Bisexual strobili during the sexual reversal (half red) Polycone twig PM_q Megastrobili formed by the sexual reversal of microstrobili (all red) Polycone twig PM_w Microstrobili Polycone twig PM_f Megastrobili Normal twig PM_m Microstrobili Normal twig [42]Open in a new tab Figs 2. [43]Figs 2 [44]Open in a new tab m, f, w, b and q. Material sources of the normal and polycone twigs of the P. massoniana. m, microstrobili from a normal twig; f, megastrobili from a normal twig; w, microstrobili that had not yet been sexually reversed on the polycone twig; b, bisexual strobili during sexual reversal on the polycone twig and q, megastrobili formed from sexual reversal on the polycone twig. 2.2. RNA extraction and library construction A Trizol kit (Invitrogen) was used to extract the total RNA, then the total RNA was treated with RNase-free DNase I (Takara Bio, Japan) for 30 min at 37°C to remove residual DNA. RNA quality was verified using a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA) and were also checked by RNase free agarose gel electrophoresis. The mRNA was enriched with oligo dT beads for total RNA whose quality met the requirements. Further, the mRNA was fragmented into short segments in the fragmentation buffer. Using mRNA as a template, cDNA was synthesized with reverse transcriptase (RT) using random hexamers. Purification was completed with end repair and addition of a poly-A tail to the doublestranded cDNA, thus establishing a cDNA library. The cDNA library was then sequenced on an Illumina sequencing platform after the qualification examination. 2.3. Transcriptome analysis of P.massoniana polycone After filtering raw data to remove low-quality sequences, and reads with adapters that produced an N-ratio greater than 0.1%, clean reads were obtained and evaluated further. The Trinity system [[45]8] was used to splice the clean reads. The longest transcript of each gene, obtained thereby, was used as the unigene for subsequent analysis. Unigenes then were queried against the major databases with BLASTX and were further classified into the Gene Ontology (GO), euKaryotic Ortholog Groups (KOG) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) according to the annotations. The clean reads of each sample were mapped to the longest transcript sequence, the resulting read counts were converted to FPKM (expected number of Fragments Per Kilobase of transcript sequence per millions base pairs sequenced) values to analyze the gene expression level. Using DEGseq [[46]9], read-counts obtained from the gene expression analysis with |log2 fold change >1 and q value < 0.005 were taken as the differentially expressed gene (DEGs). 3. Results 3.1. Assembly and splicing of transcriptome From the splicing quality as presented in [47]Table 2, it was observed that the base error rates of all samples are below 0.01%. The averages of Q20, Q30 and GC were 97.88, 94.80 and ~45.2%, respectively. In total, 190,023 unigenes were obtained, with an average length of 595 bp. For N50 the length was 929 bp, and that for N90 it was 245 bp. The lengths of unigenes are predominantly within 200-300bp. More specifically, the number of genes within the size ranges of 200-301, 301-500, 501-1,000, 1,001-2,000 bp, and >2,000 bp are 90,894, 45,307, 28,009, 15,329, and 10,484 unigenes, respectively. The number of unigene sequences decrease gradually with the length of the sequence, thus indicating good sequence quality. Table 2. Summary of sequencing data quality of P. massoniana polycone Sample Raw reads Clean reads Clean bases Error (%) Q20 (%) Q30 (%) GC Content (%) PM_b1 58793018 57647452 8.65G 0.01 98.14 95.34 45.2 PM_b2 47738824 46710370 7.01G 0.01 97.86 94.76 45.05 PM_b3 47597638 46288816 6.94G 0.01 97.82 94.79 45.31 PM_q1 52536546 50929904 7.64G 0.01 98.03 95.45 45.75 PM_q2 45975644 44775990 6.72G 0.01 97.75 94.62 45.76 PM_q3 44138302 42725456 6.41G 0.01 97.82 94.84 45.57 PM_w1 68036142 66709378 10.01G 0.01 97.82 94.83 44.96 PM_w2 52038922 50992778 7.65G 0.01 97.8 94.62 44.87 PM_w3 45970238 44803048 6.72G 0.01 98.12 95.37 44.92 PM_f1 59838294 58207864 8.73G 0.01 97.63 94.47 45.33 PM_f2 51126784 49650744 7.45G 0.01 97.82 94.81 45.44 PM_f3 41165852 40130986 6.02G 0.01 97.59 94.3 45.76 PM_m1 49471722 48001564 7.2G 0.01 98.02 95.17 44.9 PM_m2 48692894 47399458 7.11G 0.01 97.87 94.84 44.62 PM_m3 47169460 45979646 6.9G 0.01 98.04 95.14 44.76 [48]Open in a new tab 3.2. Unigene function annotation According to the sequence similarity analysis, the resultant unigenes were queried in the NR, NT, KO, SwissProt, Pfam, GO, and KOG databases ([49]Table 3). Due to the reproducibility and complexity of the sequence of the gymnospermic plants belonging to the family Pinaceae, the resources for the pine trees were relatively scarce compared to those of the other model organisms. The greatest number of annotations was retrieved from the NR database, with a value of 66,236 (34.85%), and the fewest from the KOG database, with a value of 28,025 (14.74%). Shared annotations between the seven databases are 12,198 (6.41%). From the NR database, the near-source species with the highest similarity is Picea sitchenensis (Bong) Carr. (24.9%), followed by Gossypium raimondii Ulbr. (7.4%), Amborella trichopoda Baill. (4.7%), Prunus persica (L.) Batsch (4.2%), and Prunus mume Siebold & Succ (3.2%). Table 3. List of annotation of unigenes in different databases. Database Number of unigenes Percentage (%) NR 66,236 34.85 NT 65,429 34.43 KO 25,863 13.61 SwissProt 55,836 29.38 Pfam 51,428 27.06 GO 52,472 27.61 KOG 28,025 14.74 Annotated in all databases 12,198 6.41 Annotated in at least one database 96,476 50.77 Total unigenes 190,023 100 [50]Open in a new tab 3.3. The GO, KOG, and KEGG classification of unigenes The functional classification of unigenes was performed using the GO database. In it’s entirety, there were 56 functional groups identified, among which, cellular process (GO: GO:0009987), the binding (GO: 0005488), and metabolic process (GO:0008152) have most annotations. The number of genes of those important functional groups, namely binding, catalytic activity and cell part are 27,225, 21,903, and 14,994, respectively. In the KOG database, a total of 28,025 unigenes (14.74%) were annotated in 26 categories, with a maximal number of 4,742 in the class of General function prediction only. A minimum of two was obtained in the class of unnamed proteins, and varied gene expression abundance in other functional categories. In the KEGG database, a total of 25,863 unigenes were divided into 130 metabolic pathways, involving ribosome, carbon metabolism, biosynthesis of amino acids, and plant pathogen interactions. The number and proportion of metabolic pathways in the top 15 are listed in [51]Table 4. Table 4. KEGG metabolic pathway classification of unigenes of polycone Pinus. KEGG Pathway Pathway ID Quantity (percentage) 1 Ribosome ko03010 1767(6.01%) 2 Carbon metabolism ko01200 1523(5.18%) 3 Biosynthesis of amino acids ko01230 1130(3.84%) 4 Protein processing in endoplasmic reticulum ko04141 1041(3.54%) 5 Plant-pathogen interaction ko04626 875(2.98%) 6 Oxidative phosphorylation ko00190 791(2.69%) 7 Spliceosome ko03040 706(2.40%) 8 Glycolysis/Gluconeogenesis ko00010 698(2.38%) 9 Starch and sucrose metabolism ko00500 643(2.19%) 10 RNA transport ko03013 615(2.09%) 11 Endocytosis ko04144 592(2.01%) 12 Purine metabolism ko00230 516(1.76%) 13 Phenylpropanoid biosynthesis ko00940 481(1.64%) 14 Plant hormone signal transduction ko04075 468(1.59%) 15 Pyruvate metabolism ko00620 464(1.58%) [52]Open in a new tab The greatest proportion belonged to ribosome, with a gene number of 1,767 (6.01%), followed by carbon metabolism and biosynthesis of amino acids. The latter two metabolic pathways showed a number of 1,523 (5.18%) and 1,130 (3.84%), respectively. Plant hormone signal conduction involves a total of 468 genes (1.59%). 4. Expression analysis of DEGs Common DEGs from the paired analysis of the ploycone of P. massoniana were subjected to stratified cluster analysis ([53]Fig. 3). The PM_w from the polycone twig and the PM_m from the normal twig clustered together, while PM_q from the polycone twig and the PM_f from the normal twig clustered together. The difference between these two clusters was shown to be significant. The overall value of PM_b expression from the polycone twig also fell between the values of the microstrobili and the megastrobili. Principle component analysis (PCA) was performed for the expression of the unigenes of the ploycone of P. massoniana. The results demonstrated that the PM_b was located between the microtrobili and the megastrobili at the transition state of sexual reversal. This result is consistent with the actual situation and was observed with good reproducibility. Figs. 3. [54]Figs. 3 [55]Open in a new tab Clustering of transcriptome expression of the P. massoniana. A, stratified clustering of DEGs of the polycone of P. massoniana; B, three-dimensional PCA of transcriptome data. 4.1. Gene expression differences between microstrobili and megastrobili of P. massoniana polycone According to screening criteria, 1,188 DEGs were found for the comparison between the PM_b and the PM_w samples. Of these, 715 genes were up-regulated and 473 were down-regulated. Further, a total of 4,768 DEGs were found for the comparison between the PM_q and the PM_w, of which, 2,075 genes were up-regulated and 2,717 were down-regulated. For the comparison between PM_f and PM_m, a total of 5,550 DEGs were identified, of which 2,651 genes were up-regulated and 2,899 were down-regulated. Among them, there are 69 differential genes specific to microstrobili to bisexual strobili (PM-bvsPM-w), and these genes may be related to the process of the sexual reversa. 4.2. Analysis of unigenes involved in plant hormone signal transduction of P. massoniana polycone Identified DEGs were subjected to the pathway enrichment analysis. It was found that the majority of the combinations among the paired comparisons were significantly enriched in the plant hormone signaling pathway (ko04075). Regarding the metabolic pathway of plant hormone signal transduction, there were 51 DEGs between the megastrobili and microstrobili from the polycone twig, with 26 up-regulated and 25 down-regulated. Also, there were 51 DEGs between the megastrobili and microstrobili from the normal twig, with 30 up-regulated and 21 down-regulated. Altogether there were 36 DEGs common to the polycone and normal twigs, among which, 15 were related to the auxin (indole-3-acetic acid, IAA), three to gibberellic acid (GA), five to abscisic acid (ABA), three to zeatin nucleoside (ZR), two to salicylic acid (SA), four to brassinosteroid (BR) and three to cytokinin (CTK). With respect to the IAA metabolic pathway, ten genes were related to the small auxin upregulated RNA (SAUR) family, six to auxin-reactive protein IAA, and the expression of the six genes were all up-regulated in the megastrobili. The genes and their related metabolic pathways are listed in [56]Table 5. Table 5. DEGs common to megastrobili and microstrobili from two different twigs that were involved in plant hormone signaling pathways. Number Gene ID KO ID KO Description 1 c72332_g1 K14431 transcription factor TGA 6 c81784_g1, c84572_g1, c81784_g4, c71445_g1, c81784_g3, c71164_g1 K14484 auxin-responsive protein IAA 10 c66758_g1, c72270_g1, c80646_g1, c85500_g2, c85500_g4, c82056_g6, c21023_g1, c85347_g2, c88602_g3, c83511_g1 K14488 SAUR family protein 3 c80401_g2, c80401_g1, c86301_g1 K14489 arabidopsis histidine kinase 2/3/4 (cytokinin receptor) 1 c67220_g1 K14490 histidine-containing phosphotransfer peotein 1 c84223_g9 K14491 two-component response regulator ARR-B family 1 c67613_g1 K14492 two-component response regulator ARR-A family 1 c84139_g2 K14494 DELLA protein 2 c7599_g1, c139107_g1 K14495 F-box protein GID2 3 c87733_g1, c79770_g4, c87368_g3 K14496 abscisic acid receptor PYR/PYL family 2 c84358_g1, c69215_g2 K14497 protein phosphatase 2C 2 c67761_g1, c73235_g1 K14503 brassinosteroid resistant 1/2 2 c76507_g1, c79098_g6 K14505 cyclin D3, plant 1 c55993_g2 K14508 regulatory protein NPR1 [57]Open in a new tab The DEGs between the megastrobili and microstrobili from the normal twig and those from the polycone twig, as well as their common DEGs which were related to the plant hormone signaling pathways are shown in [58]Fig. 4. Interestingly, the involved genes demonstrated either male or female preferred expression, associated with the sex difference. However, the expression of 36 common DEGs were all up-regulated in the bisexual strobili during sexual reversal from the polycone twig. Figs. 4. [59]Figs. 4 [60]Open in a new tab Heat map of unigene expression involved in plant hormone signal transduction of P. massoniana. A, DEGs of the megastrobili and microstrobili from the normal twig; B, DEGs of the megastrobili and microstrobili from the polycone twig; C, clustering of common DEGs from the two different types of twigs and D, common DEGs from the two different types of twigs. 4.3. Expression of MADS-box genes involved in the megastrobili and microstrobili of P. massoniana According to our search using the conserved MADS-box protein domain of Arabidopsis thaliana in the transcriptome data of P. massoniana, with using the method of local blastp, a total of 63 unigenes were identified as homologues to the MADS box transcription factor. Their expression was analyzed using R software ([61]Figure 5). It can be found that most of the MADS-box genes in the megastrobili and microstrobili of P. massoniana demonstrated either male or female preferred expression. For the expression of the MADS-box, there were two distinct expression patterns between the megastrobili and microstrobili. The genes from the megastrobili showed high expression in cluster 1 and the expression of PM_q was similar to that of PM_f. The genes from the microstrobili showed the high expression in cluster 2 and the expression of PM_w was similar to that of PM_m. The MADS-box genes of PM_b showed a higher expression level than the microstrobili on the cluster 1. Fig. 5. [62]Fig. 5 [63]Open in a new tab Expression of MADS-box genes in P. massoniana. 5. Discussion P. massoniana is the main timber species in southern China as well as a pioneer afforestation plant to control desertification. Polycone development is a special phenomenon occurring in P. massoniana, in both seedling base and in natural forests. To date, there have been few studies examining the sexual reversal mechanism of Pinus, especially P. massoniana. During the reproductive growth of plants, there are a number of external and internal factors affecting the flower bud differentiation. Six signaling pathways have been identified to regulate the flowering of A. thaliana: photoperiod, vernalization, autonomous, gibberellin (GA), temperature-sensitive, and age-dependent control [[64]10]. The regulation and interaction of plant endogenous hormones in plant tissues have direct effects on plant bud differentiation and sex determination [[65]11,[66]12,[67]13]. In angiosperms which are monoecism, endogenous GA played a feminine role in the sex determination of Zea mays L. [[68]14]. Sex-determining genes and plant hormones have some connection with the sex determination of Cucumis sativus L [[69]15,[70]16]. The plant hormone, auxin, is at the core of many aspects of plant growth and development [[71]17,[72]18]. In the bisexual strobili during sexual reversal of the polycone twig expression of many hormone related genes is up-regulated, especially the expression of SAUR and AUX / IAA. These two kinds of hormones belong to the early response genes of auxin [[73]19]. AUX / IAA is a transcription factor that is rapidly induced by auxin [[74]20]. In A. thaliana, AUX / IAA gene-derived mutants induce auxin related phenotype abnormalities [[75]21,[76]22,[77]23]. Recent genetic and molecular studies have shown auxin to be a major regulator of differential growth responses [[78]24]. Previously, Wakushima et al. [[79]25] were able to induce sex changes, and the production of bisexual strobili by administering exogenous hormones in conifers.The high expression of plant hormone related genes during the spontaneous reversal of P. massoniana indicated that the early response gene of IAA was related to the occurrence of sexual inversion. The sex system of gymnosperms is very complex when compared to the system of angiosperms [[80]26]. However, some aspects of the control of female reproductive development are conserved between flowering plants and their sister group, the gymnosperms, indicating the presence of these processes in a common ancestor of the extant seeds plants [[81]27]. Besides, gymnosperms do not produce petals, and their male reproductive organs are different from angiosperms stamens. In the classical plant flowering ‘ABCDE model’ [[82]28,[83]29], all genes belong to the MIKC type MADS-box gene except the AP2 gene [[84]30]. In this model, class B genes play a key role in specifying the identity of male reproductive organs (stamens) and petals during the development of flowers, while class C genes control female organ identity [[85]31], the absence of B gene expression leads to the formation of female reproductive organs [[86]32]. Theissen et al. [[87]33] found that the phylogenetic development of the MADS-box gene is similar to the origin and evolution of plant reproductive structures such as the ovule and flower. MADS-box as an important transcription factor in seed plants (including flowering plants and conifers) [[88]34], and plays an important role in controlling flower development and organ formation [[89]35]. Comparing functions of the floral MADS-box genes in gymnosperms with their orthologues in the early angiosperm Amborella can improve our understanding of the transition of their control functions from cone to flower development in early angiosperm evolution [[90]36]. According to our search of the conserved MADS-box protein domain of A. thaliana in the transcriptomic data of P. massoniana using the method of local blastp, a total of 63 unigenes were selected as homologues to the MADS-box transcription factor. Interestingly, the expression of MADS-box related genes in P. massoniana was found to be related to the gender difference. The MADS-box genes of PM_b that related to the process of sexual inversion showed higher expression than detected in the microstrobili in cluster 1. However, the expression of MADS-box genes in bisexual strobili was similar to that of microstrobili. The expression of many genes is regulated by transcription factors, and the different expression of MADS-box genes may be the first critical step during sex reversal. At present, for the occurrence of P. massoniana inversion, there is no transcriptomic data available. Researches on the reversal of plant sex are still rare, and most of them only stay at the level of physiology and anatomy, many specific regulatory mechanisms are unclear. Questions remain, such as why the P. massoniana polycone can have both twigs of the polycone and normal cone, and yet, these twigs can inherit stably; how do plant hormones interact and respond to control the differentiation of flower buds; or the role of specific regulation factors in the phenomenon of the sexual reversal. However, the answers to these questions are not yet known, it need learning and exploring more deeply. 6. Conclusions Results of the present study demonstrated that DEGs of the megastrobili and microstrobili of the normal and polycone twigs of P. massoniana exhibited male and female preferred expression in the plant hormone signal transduction pathways. A total of 36 common hormone-related DEGs between the two groups of DEGs (from the normal twig and the polycone twig) were all up-regulated in the bisexual strobili. This process involved a total of seven hormones, and the effect of IAA was the most significant. Among them, the expression of six auxin-related genes were up-regulated in the megastrobili and bisexual strobili. There was a significant positive correlation between IAA signal transduction pathway and the occurrence of sexual reversal in the P. massoniana. The expression of MADS-box related genes in P. massoniana was found to be related to sex difference. A part of the MADS-box genes of bisexual strobili showed a higher expression than measured in the microstrobili. However, the expression of MADS-box genes in the bisexual strobili was similar to that of microstrobili. Acknowledgements