Abstract Flower bud development is a defining feature of walnut, which contributes to the kernel yield, yield stability, fruit quality and commodity value. However, little is known about the mechanism of the flower bud development in walnut. Here, the stages of walnut female flower bud development were divided into five period (P01-05) by using histological observation. They were further studied through PacBio Iso-Seq and RNA-seq analysis. Accordingly, we obtained 52,875 full-length transcripts, where 4,579 were new transcripts, 3,065 were novel genes, 1,437 were consensus lncRNAs and 20,813 were alternatively spliced isoforms. These transcripts greatly improved the current genome annotation and enhanced our understanding of the walnut transcriptome. Next, RNA sequencing of female flower buds at five periods revealed that circadian rhythm-plant was commonly enriched along with the flower bud developmental gradient. A total of 14 differentially expressed genes (DEGs) were identified, and six of them were confirmed by real-time quantitative analysis. Additionally, six and two differentially expressed clock genes were detected to be regulated by AS events and lncRNAs, respectively. All these detected plant circadian genes form a complex interconnected network to regulate the flower bud development. Thus, investigation of key genes associated with the circadian clock could clarify the process of flower bud development in walnut. Introduction Juglans regia L., also known as common walnut, is widely cultivated in temperate regions extending from North, Central and South America to Europe and North Africa because of its high nutritional values [[44]1,[45]2]. Walnut has more than 7000 years of evolutionary and domestication history in China [[46]3], where produced numerous walnut cultivars with elite traits such as early fruiting, high fruit yield, high oil content, thin kernel and so on. Therein, early fruiting, fruit yield and kernel quantity are directly influenced by the intensity and quality of flower bud development. However, the molecular mechanism of flower bud development has remained unclear in walnut. Flower development is a highly complex physiological, biochemical and morphogenetic process in plant. It is affected by various factors such as, photoperiod [[47]4], circadian clocks [[48]5], hormones [[49]4,[50]6,[51]7], sucrose distribution [[52]8–[53]10] and other metabolic processes [[54]7,[55]11]. However, the advances in the studies of flowering pathway are relatively slow in woody trees. Zhu [[56]11] demonstrated that genes related to oxidoreductase activity and phytohormone metabolism regulate the dormant flower buds of Chinese cherry (Prunus pseudocerasus). Muhammad [[57]7] indicated that genes involved to the secondary metabolites biosynthesis and plant hormone signaling affect the bud dormancy in grape. In walnut, the treatment of heat orgibberellic acid (GA[3]) can affect the flowering pattern [[58]12,[59]13]. MicroRNAs sequencing revealed that the differentially expressed miRNAs mainly enriched to ubiquitin mediated proteolysis, RNA degradation and various carbohydrate metabolism pathways were related to the female flower induction [[60]14]. Transcriptome profiling identified 374 genes, which differentially expressed to regulate the transition between female buds and leaf buds [[61]15]. To date, however, none of the reported miRNAs or genes can completely explain the genetic basis of flower bud development in walnut. The published ‘Chandler’ walnut genome provides an important tool to facilitate the genetic dissection of complex traits [[62]16]. Nevertheless, the gene annotation was built based on the transcriptional data with short reads generated by second-generation sequencing (SGS) [[63]17], which is difficult in dealing with the repeat regions. Additionally, cultivars usually exhibit a strong reduction in genetic variation due to population bottlenecks and strong artificial selection during domestication, implying the limitation of single genome reference based on short-read sequencing. Fortunately, long-read or full-length cDNA sequences based on a third-generation sequencing (TGS) platform can perfectly overcome the problems of SGS platforms [[64]18,[65]19] and correct the gene annotations [[66]20,[67]21]. Furthermore, full-length cDNA sequences are particularly helpful to analyze different transcript isoforms generated by alternate splicing and non-coding RNA (ncRNA). In this study, we preformed full-length cDNA and transcriptome sequencing from female flower buds, and analyses the differentially expressed genes (DEGs), alternate splicing (AS) and long non-coding RNA (lncRNA) related to flower bud development in walnut. Our results provided valuable resources for elucidating the development of the flower buds in walnut. Materials and methods Plant materials ‘Wen 185’, one of the most widely cultivated walnut varieties in Xinjiang, was grown under natural conditions in the Xinjiang Fruit Science Experiment Station of Ministry of Agriculture and Rural Affairs, Yecheng, China. Female flower buds were collected from the upper bud of the axillary bud in the 2-3^rd position below the top bud of terminal. They were collected five stages (P01, P02, P03, P04 and P05) in an interval of 20 days starting from the initial period of flowering (P01) on 5^th April 2018 to 25^th June 2018. A total of 18 female flower buds were collected for each stage. Among them, 15 buds were stored in -80°C, and the others were fixed in flavone acetic acid (FAA) fixative after peeling off the outer scales before use. Histological observation using paraffin sections The fixed female flower buds at different stages were used for histological observation using paraffin sections. These buds were dehydrated with a continuous gradient of (%) ethanol and were embedded in paraffin. Samples were cut into 8–10μm slices (Leica Microtome, Germany), deparaffinized with xylene, and hydrated in a decreasing ethanol series. The sections were stained with Safranin and Fast Green and mounted with neutral gum. Finally, the slices were observed under a Motic microscope (Motic AE31, China). RNA preparation Each sample was pooled with five buds from the same walnut tree, and three biological repeats (from three different walnut trees) were performed for five different stages. The total RNA of 15 samples were extracted individually by using RNA preparation kit (Tiangen Biotech, Beijing, China) following the provided protocol. RNA concentration was measured using NanoDrop 2000 (Thermo Scientific, Waltham, MA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). Then, quantified RNA samples were used for constructing cDNA libraries subsequently. PacBio cDNA library construction and third-generation sequencing One mixed RNA of all 15 samples in five periods was used to construct PacBio cDNA library. The sequencing library was prepared according to the Iso-Seq protocol as described by Pacific Biosciences (P/N100-377-100-05 and P/N100-377-100-04). The SMARTer PCR cDNA Synthesis Kit was used to synthesize cDNA from the same RNA samples used for PacBio sequencing. Size fractionation and selection (1-2kb, 2-3kb, and 3-6kb) were performed using the BluePippin™ Size Selection System (Sage Science, Beverly, MA, USA). The amplified cDNA products were used to generate SMRTbell Template libraries according to the Iso-Seq protocol. Libraries were prepared for sequencing by annealing a sequencing primer and adding polymerase to the primer annealed template. The polymerase-bound template was bound to MagBeads, and sequencing was performed on a PacBio RSII using the provided protocol. Quality filtering and error correction of PacBio long reads TGS subreads were filtered using the standard protocols in the SMRT Analysis software suite ([68]http://www.pacificbiosciences.com), and reads of insert (ROIs) were obtained. After examining for poly (A) signal and 5’ and 3’ adaptors, full-length and non-full-length cDNA reads were recognized. Consensus isoforms were identified using the algorithm of iterative clustering for error correction and further polished to obtain high-quality consensus isoforms. The raw Illumina SGS reads were filtered to remove adaptor sequences, ambiguous reads with ’N’ bases, and low-quality reads. Afterward, error correction of low-quality isoforms was conducted using the SGS reads with the software proovread 2.13.841. Redundant isoforms were then removed to generate a high-quality transcript dataset for walnut using the program CD-HIT 4.6.142 ([69]http://weizhongli-lab.org/cd-hit/). Functional annotation of transcripts Functional annotations were conducted by using BLASTX (cutoff E-value ≤ 1e-5) against different protein and nucleotide databases of Clusters of Orthologous Groups (COG), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG). For each transcript in each database searching, the functional information of the best matched sequence was assigned to the query transcript. Predictions of ORF, gene fusion and long non-coding RNAs To predict the open reading frames (ORFs) in transcripts, we used the package TransDecoder v2.0.1 ([70]https://transdecoder.github.io/) to define putative coding sequences (CDS). The fusion transcript was identified according to the following criterions: 1) A single transcript must map onto the reference genome with two or more loci; 2) The minimum coverage for each loci was 5% length of the total transcript; 3) Total coverage was ≥ 95%; 4) the distance between adjacent loci was at least 10kb. The predicted CDS was searched and confirmed by BLASTX (E-value ≤1e-5) against the protein databases of NR, SWISS-PROT, and KEGG. Those transcripts containing complete ORFs as well as 5’- and 3’-UTR (untranslated regions) were designated as full-length transcripts. To predict long non-coding RNAs (lncRNAs), we used PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme ([71]https://sourceforge.net/projects/plek). Illumina cDNA library construction and second-generation sequencing A total of 15 Three RNA samples from five buds in each period were used for Illumina cDNA library construction using NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, Beverly, MA, USA), following the manufacturer’s recommendations. Qualified libraries were applied to NGS using an Illumina Hiseq 2500 (Illumina, San Diego, CA, USA) to generate 125bp paired-end sequence reads (2×125bp). High-throughput sequencing reported in this study was performed in the Biomarker Technology Co., Ltd (Beijing, China). Illumina data analysis Raw data (raw reads) in fastq format were first processed using in-house perl scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapters, reads containing poly “N” and low-quality reads. These clean reads were then mapped to the reference genome sequence using Tophat2 [[72]22]. Only reads with a perfect match or one mismatch were further analyzed and annotated based on the reference genome. Gene expression levels were estimated by FPKM. Differential expression analysis between two stages was performed using the DESeq package [[73]23]. The resulting P values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate [[74]24]. Genes identified by DESeq with FDR ≤ 0.01 and FC ≥ 2 were defined as differentially expressed. K-means clustering was conducted based on the Pearson correlation of gene expression profiles [[75]25]. GO and KEGG enrichment analysis Functional annotations of the DEGs were performed and searched against the GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomics) databases. Classification and enrichment of DEGs were carried out by WEGO and ArigGO [[76]26]. respectively. Metabolic pathway assignments were carried out based on the KEGG orthology database ([77]http://www.genome.ad.jp/kegg/) using KOBAS2.0 [[78]27,[79]28]. GO and KEGG pathway enrichment analysis was conducted by Fisher’s exact test corrected with FDR of 5%. Arabidopsis homologous genes searches and phylogenetic analysis of the clock genes DEGs were used as queries in translated nucleotide BLAST (BLASTX) searches against Arabidopsis gene sequences in the Arabidopsis Information Resource 10 database ([80]http://www.arabidopsis.org/) to obtain the closest Arabidopsis homolog (E-value ≤ e-10). The domains and functional sites in their corresponding proteins were examined with InterProScan [[81]29]. The ClustalW software was used for the sequence alignment between walnut and Arabidopsis, and phylogenetic trees were constructed using the MEGA 7.0 software. The neighbor-joining method was performed and 1000 bootstrap test replicates were used during the construction [[82]30]. Quantitative real-time PCR RNA samples were isolated from female flower buds in different developmental stages from ‘Wen 185’. qRT-PCR primers were designed with PRIMER PREMIER 6 (PREMIER Biosoft, [83]http://www.premierbiosoft.com/primerdesign/). PCRs were conducted using the StepOnePlus Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). Data were analysed using the 2^−ΔΔCt method to calculate relative gene expression [[84]31]. [85]S1 Table lists all primers used in the qRT-PCR experiments. Results Morphological characteristics of walnut floral development In the Kashgar region of Xinjiang, the mixed buds of the ‘Wen185’ began to sprout in late March. After that, new branches began to grow, the axillary buds of the new shoot form and grow continuously ([86]Fig 1A, [87]1C, [88]1E, [89]1G and 1I). While the external morphology of female flower bud changes, the internal development also continues. In P01, the internal growth of female flower buds was flat, and the leaf primordia already formed ([90]Fig 1B); In P02, the apical meristem began to flatten and sag ([91]Fig 1D). In P03, the growth point grows upward, and the spired turn round and shape protuberance ([92]Fig 1F). In P04, the lower part of the globular bulge developed into the flower stalk primordium ([93]Fig 1H). In P05, the top developed into the female flower primordium ([94]Fig 1J). Fig 1. Morphological characteristics of the ‘Wen 185’ female flower bud. [95]Fig 1 [96]Open in a new tab a, c, e, g and i present the phenotypic characteristics of five different developmental stages of female flower buds; b, d, f, h and j represent the internal morphological differentiation characteristics of five different developmental stages of female flower buds, a and b: The bud sprouted; c and d: Apical meristem began to flatten and sag; e and f: Spire turn rounded and shape protuberance; g and h: Flower stalk primordium development stage; i and j: Female flower primordium development stage. Ruler: 100μm. lp, leaf primordia; gp, growth point; sam, apical meristem; fip, female flower inflorescence primordium; fst, flower stalk; fp, female flower primordium. Full-length sequencing and acquisition of high-quality redundant sequences To identify as many transcripts as possible, the concentration of RNA extracted from the female flowers of five different growth stages were mixed with equivalent content and used for library construction. Multiple size-fractionated libraries (1–2, 2–3 and 3–6 k) were constructed to avoid loading bias, which favoured sequencing of short transcripts ([97]S2 Table). Eight cells were carried (four cells for 1–2 k libraries, two cells for 2–3 k libraries and two cells for 3–6 k libraries) ([98]S2 Table). Filtered for sub-reads with length less than 50 bp and sequence accuracy less than 0.75, a total of 23.81 Gb of clean data containing 9,710,591 sub-reads were obtained ([99]S2 Table). Each size-selected library had the expected distribution of mean transcript lengths ranging from 1,946 bp to 4,308 bp ([100]S2 Table). Of the inserted reads, 240,258 of 616,540 were full-length reads based on the presence of bar-coded primers and polyA tails ([101]S2 Table). The SMRT Analysis (v2.3.0) software using the ICE (Iterative Clustering for Error Correction) algorithm, combined with the quiver program, was used for sequence clustering. As a result, a total of 121,139 consensus isoforms were obtained, of which HQ (High-Quality) transcripts were 95,766 and LQ (Low-Quality) transcripts were 25,373. Transcriptomes sequence for the same samples were further used to correct the LQ transcripts ([102]S3 Table). To get the non-redundant transcript sequences, HQ transcript sequences were mapped to the draft genome using GMAP (Genomic Mapping and Alignment Program). The sequences with identity less than 0.9 and coverage less than 0.85 were filtered. Reads differing only at the 5′-start site within the first exon was counted as redundant reads. Using this method, 52,875 non-redundant transcript sequences were obtained ([103]S4 Table). Among of these non-redundant transcript sequences, 48,296 are similar to that of the known genome in transcript lengths. In which, a total of 6,951 transcript were completely mapped to the genome, but 41,345 transcripts with partial mapping. Additional 4,579 non-redundant transcripts were identified as the new transcripts. Thus, the walnut genome annotation was enriched with the SMRT results, and the integrated version was used for further analysis. Identification of novel genes, LncRNAs and alternative splicing events Prediction of novel genes, lncRNAs and AS events offers a new direction to further study the development of female flower buds in walnut. Using the transcripts detected by PacBio sequencing, 3,065 novel genes and 1,350 fusion genes were identified ([104]Fig 2A). A total of 1,434 consensus lncRNAs were predicted by using CPC, CNCI, CPAT and PFAM software ([105]Fig 2B). 14,313 lncRNA-mRNA pairs were predicted based on the location and LncTar software ([106]S5 Table). In addition, a total of 43,171 ORFs containing 33,734 complete ORFs were obtained by using TransDecoder (v3.0.0) ([107]Fig 2C and 2D). Fig 2. Identification of fusion genes, novel genes, lncRNAs, alternative splicing events and ORF numbers based on SMRT-based RNA sequencing. [108]Fig 2 [109]Open in a new tab (a) numbers of fusion genes and novel genes, (b) Number of lncRNAs analyzed by pfam, cpat, cnci and cpc based on PacBio platform, (c) ORF coding protein complete length distribution, (d) numbers of ORF and complete ORF, (e) number and type of lncRNA. lncRNA, long non-coding RNA. Five kinds of alternative splicing (AS) patterns containing 20,813 AS events were identified based on Pacbio reads and corresponding annotated gene models ([110]S6 Table). The number of alternative 3′ splice sites (17.61%) and retained introns (61.92%) were much more than alternative 5′ splice sites (9.21%), skipped exons (10.52%) and mutually exclusive exons (0.74%) ([111]Fig 2E). Obviously, the majority of AS events is retained introns events. Transcriptome analysis of the flower buds in differently developmental stages RNA-seq provided an overview of genes differentially expressed during different developmental stages. We constructed libraries and analyzed the sequences based on the Illumina HiSeq Xten platform. A total of 107.87sequencing data were obtained from the 15 libraries (P01, P02, P03, P04 and P05 with 3 replications), respectively ([112]Table 1). After removing low-quality reads and adaptor sequences, 24.83, 21.74, 24.24, 23.94 and 25.50 million clean reads were generated, and more than 80% of the reads were uniquely mapped to the reference genome ([113]Table 1). The transcript abundances of genes were estimated by fragments per kilobase of exon per million fragments mapped (FPKM). The boxplot distribution of the log[10]FPKM values in [114]Fig 3A showed that the median and quartile values of the expression values across the libraries compared for differential expression were comparable. Table 1. Summary of transcripts detected from RNA sequencing data. Sample Library Base sum Total reads Uniquely mapped reads (%) Reads mapped to multiple loci (%) Reads mapped to many loci(%) GC (%) Q30(%) P01 T01 7556617516 25298473 84.71% 7.50% 0.05% 45.91% 93.93% T02 7012081266 23464545 83.51% 7.00% 0.04% 45.51% 93.68% T03 7704480602 25754343 83.19% 6.88% 0.04% 45.55% 93.25% P02 T04 6404233368 21456602 84.35% 7.37% 0.04% 45.39% 92.21% T05 5945529716 19913116 83.37% 8.25% 0.15% 45.6% 92.28% T06 7114499344 23859892 87.39% 8.09% 0.05% 45.59% 92.82% P03 T07 7398788244 24743100 83.06% 7.10% 0.03% 45% 93.43% T08 6577597758 22011929 84.70% 7.85% 0.04% 45.75% 93.79% T09 7767310550 25980409 83.33% 7.07% 0.04% 45.22% 93.28% P04 T10 6922008312 23166410 82.62% 7.53% 0.06% 45.32% 93.56% T11 7144878360 23865918 81.47% 7.03% 0.04% 45.39% 93.35% T12 7416817380 24788064 81.73% 7.28% 0.06% 45.25% 93.51% P05 T13 7790691172 26016207 80.82% 7.55% 0.05% 45.6% 93.27% T14 7739820868 25875265 83.57% 7.18% 0.04% 45.13% 93.7% T15 7377869284 24632169 81.53% 6.86% 0.03% 45.31% 93.25% [115]Open in a new tab Fig 3. Identification of DEGs based on Illumina-based RNA sequencing. [116]Fig 3 [117]Open in a new tab (a) Boxplots of transcript expression of female flower buds with three biological repeats at five different developmental periods, (b) number and regulated pattern of DEGs in different comparable groups, (c) KEGG enrichment analysis of DEGs in differently comparable groups, (d) number and regulated pattern of DEGs related to plant circadian rhythm in different comparable groups, (e) venn diagrams of DEGs related to plant circadian rhythm in differently comparable groups, (f) qRT-PCR analysis of selected plant circadian rhythm DEGs. DEG, differentially expressed gene. fpkm, fragments per kilobase per million mapped reads. To detect the DEGs related to the development of flower buds, we emphatically studied the gene expression patterns in the comparable groups: P01_P02, P02_P03, P03_P04 and P04_P05. Correspondingly, 56, 669, 312 and 2,029 DEGs were identified from P01_P02, P02_P03, P03_P04 and P04_P05, respectively ([118]Fig 3B). The number of DEGs was least in P01_P02, but largest in P04_P05. The number of down-regulated genes were more than that of up-regulated genes in each comparable group. It can, therefore, be inferred that more and more genes were activated along with the progress of the flower development, and the negative regulatory genes are mainly dominanted during the flower bud development in walnut. To explore the putative function of DEGs in the development of flower bud, KEGG pathway enrichment analysis was performed in each comparable group ([119]Fig 3C). A total of 10 pathway terms contained 89 DEGs were significantly enriched (q <0.05) in the four comparable groups: one pathway contained three genes for P01_P02, five pathways contained 42 genes for P02_P03, one pathway contained six genes for P03_P04, and four pathways contained 38 genes for P04_P05, respectively. The pathway of DNA replication was significantly (q = 0.003) enriched in the P01_P02. P02_P03 and P04_P05. The flavonoid biosynthesis pathway was significantly enriched in the P02_P03 and P04_P05. The genes associated to DNA replication and flavonoid biosynthesis appear to especially express in the certain stage to affect flower bud development in walnut. Circadian rhythm-plant was commonly shared by P02_P03, P03_P04 and P04_P05. It implied that the circadian rhythm-plant pathway is the representation from the developmental stage P02, when was a initial stage of transformation of apical meristem to pistil primordium in walnut ([120]Fig 1C and 1D). Thus, circadian rhythm-plant pathway was further investigated as candidate ones in the development of flower bud. DEGs associated with circadian rhythm-plant pathway There were 14 DEGs to be annotated as circadian rhythm factors in three comparable groups (P02_P03, P03_P04 and P04_P05) ([121]Fig 3D). Among of these DEGs, seven (three up-regulated genes and four down-regulated genes), six (two up-regulated genes and four down-regulated genes) and ten (eight up-regulated genes and two down-regulated genes) genes were identified in P02_P03, P03_P04 and P04_P05, respectively ([122]Table 2). Two DEGs (WALNUT_00003099-RA and WALNUT_00021104-RA) were commonly detected in these three comparable groups. They were up-regulated in both P02_P03 and P04_P05, but were down-regulated in P03_P04. Similarly, WALNUT_00021264-RA showed differently regulated direction between P02_P03 and P03_P04. WALNUT_00012557-RA and WALNUT_00018494-RA exhibited the opposite trend between P03_P04 and P04_P05. These genes affected the development of flower bud by different regulated models along with the different flower bud developmental stages in walnut. Conversely, WALNUT_00021631-RA was down-regulated in both P02_P03 and P04_P05, but WALNUT_00014467-RA was up-regulated in these two comparable groups. It indicated that the expression of WALNUT_00014467-RA and WALNUT_00021631-RA might associate with the flower bud differentiation. PB.6052 is first annotated as circadian rhythm factors by full-length sequencing, and positively regulated the development of flower buds from P04 to P05 ([123]Table 2). Table 2. List of DEGs related to circadian rhythm-plant pathway in walnut. Walnut A.thaliana Comparable_group Gene_id FDR log[2]FC Regulated Homologous Description P02_P03 WALNUT_00003099-RA 0.004 1.98 up — WALNUT_00014467-RA 0.002 1.81 up AT2G46790 (PRR9) Involved in clock function. Interact with TOC1 and act as transcriptional repressor of CCA1 and LHY. WALNUT_00014989-RA 0.006 -7.40 down AT1G01060 (LHY) Encodes a myb-related putative transcription factor involved in circadian rhythm along with another myb transcription factor CCA1. WALNUT_00016144-RA 0.007 -2.19 down — WALNUT_00021104-RA 0.000 1.40 up AT1G22770 (GI) WALNUT_00021264-RA 0.003 -8.24 down — WALNUT_00021631-RA 0.000 -2.41 down — P03_P04 WALNUT_00003099-RA 0.002 -1.23 down — WALNUT_00006554-RA 0.000 1.29 up — WALNUT_00012557-RA 0.000 -1.32 down — WALNUT_00018494-RA 0.000 -1.88 down AT1G68050 (FKF1) WALNUT_00021104-RA 0.000 -1.57 down AT1G22770 (GI) Regulates several developmental processes, including photoperiod-mediated flowering, circadian clock. WALNUT_00021264-RA 0.004 8.32 up — P04_P05 PB.6052 0.001 Inf up — WALNUT_00003099-RA 0.000 1.94 up — WALNUT_00008415-RA 0.000 1.02 up AT5G02810 (PRR7) WALNUT_00012557-RA 0.000 1.44 up — WALNUT_00014467-RA 0.001 1.25 up AT2G46790 (PRR9) WALNUT_00018494-RA 0.000 1.96 up AT1G68050 (FKF1) A flavin-binding kelch repeat F box protein, is clock-controlled, regulates transition to flowering. WALNUT_00021104-RA 0.000 1.78 up AT1G22770 (GI) WALNUT_00021631-RA 0.002 -1.58 down — WALNUT_00024934-RA 0.008 -1.69 down — WALNUT_00026613-RA 0.000 1.21 up AT5G02810 (PRR7) Acts as transcriptional repressor of CCA1 and LHY. [124]Open in a new tab To further study the molecular function of the DEGs, they were used as query genes to detect their homologous genes in the model plant Arabidopsis thaliana ([125]Table 2). WALNUT_00014989-RA is orthologous to A. thaliana AT1G01060 (LHY), which was down-regulated in P02_P03. There are two genes WALNUT_00008415-RA and WALNUT_00026613-RA, which are both orthologous to A. thaliana AT5G02810 (PRR7), up-regulated in P04_P05. Additionally, WALNUT_00014467-RA, WALNUT_00018494-RA and WALNUT_00021104-RA are orthologous to A. thaliana AT2G46790 (PRR9), AT1G68050 (FKF1) and AT1G22770 (GI), respectively. To our knowledge, the PRR7, PRR9, LHY, FKF1 and GI are extensively proved to be clock genes controlling the flower in A. thaliana [[126]32–[127]34]. To validate the results from RNA-seq data, we randomly selected six genes (WALNUT_00014467-RA, WALNUT_00021104-RA, WALNUT_00021264-RA, WALNUT_00006554-RA, WALNUT_00012557-RA and WALNUT_00018494-RA) from 14 circadian rhythm DEGs for real-time quantitative reverse transcription PCR (qRT-PCR) analysis. As expected, the expression profiles of these genes by qRT-PCR showed the similar trend with RNA-seq data ([128]Fig 3F). Phylogeny and structure characteristics of the circadian genes To gain insights into the phylogenetic relationship of the clock genes between walnut and A. thaliana, a phylogenetic tree was built by using their sequences ([129]Fig 4). WALNUT00021631-RA, WALNUT00016144-RA, WALNUT_00018494-RA and their orthologous FKF1 were grouped into the same cluster, they encoded the common PLN_superfamily domain. WALNUT_00024934-RA and COP1 were grouped together, and they also encoded a common PLN_superfamily domain. Fig 4. Phylogenetic tree and conserved structure domain of plant circadian rhythm proteins in walnut and Arabidopsis. [130]Fig 4 [131]Open in a new tab Proteins encoding by WALNUT_00003099-RA, WALNUT00014467-RA and PRR5 shared common CCT domain, they were clustered into the same group. WALNUT00026613-RA, WALNUT00008415-RA combined with PRR7 formed the same cluster, their proteins not only had a common REC_superfamily domain but also CCT domain. SANT_superfamily was both detected in WALNUT00014989 and WALNUT00021264, which tended to form a same clay. WALNUT_00012557-RA, WALNUT_00021104-RA and their orthologous GI were clustered to the same clay. Similarly, WALNUT_00006554-RA and CRY2 were grouped into the same cluster, their proteins had PhrB_superfamily and crypt_chrom_pln domain, respectively. Collectively, walnut circadian genes and their orthologous in A. thalinia tended to form the same clusters as their encoding proteins with same structure domain. These genes in the same clusters may act within the development of flower bud by displaying similar functions. Moreover, PB.6052 formed a single clay without any predicted domain, implying that it may specially regulate the development of flower bud by response to circadian clock in walnut. AS events and lncRNAs coupled with expressed circadian genes AS plays important role in mRNA processing [[132]35,[133]36]. Our results showed that eight expressed circadian genes were alternatively spliced with three types of splice isoform (Exon skipping, Intron retention, Alternative 3’ splice site) in walnut ([134]S7 Table). Therein, intron retention (IR) was the predominant splicing pattern, with alternative 3′ splice sites ranking second and exon skipping ranking last. The proportion of different splicing events varied among DEGs. For WALNUT_00012557-RA, WALNUT_00006554-RA and WALNUT_00003099-RA, only a single splice isoform was detected, and others with more than two splice isoforms. All of these novel splice isoforms were newly annotated in the walnut genome. LncRNA play a variety of biological roles in organisms [[135]37]. In the study, large number of lncRNA-mRNA pairs were predicted based on full-length transcriptome sequences and transcriptome analysis. Among of these expressed circadian genes, WALNUT_00003099-RA, WALNUT_00021104-RA and WALNUT_00012557-RA were proved to be regulated by LncRNA.16269.3, LncRNA.18027.1 and LncRNA.18728.3, respectively ([136]S7 Table). It implied the important roles of interaction between LncRNA and mRNA during the development of flower buds in walnut. Discussion Detection of novel genes, AS events and LncRNAs De novo transcriptomes assemblies can be developed for the studies in functional genomics of plants. Mohammad Sadat-Hosseini [[137]38] compared the assembled transcripts with transcripts of the walnut leaf, published genome assembly for the ‘Chandler’ cultivar using the BLAST algorithm and identified a number of novel homologue transcripts. This is a combine study of the full-length transcriptome sequences and transcriptome analysis to conduct the transcript detection of flower buds in walnut. Using this strategy, 52,875 non-redundant transcript sequences were obtained, and 4,579 new transcripts were identified with SMRT sequencing. Correspondingly, 3,065 novel and 1,350 fusion genes were identified by comparing with the walnut genome [[138]16]. Among of these novel genes, PB.6052 was annotated to response to circadian rhythm-plant, and whose expression was associated with the development of flower buds in walnut. Which is an excellent complement to the single genome in walnut. On the one hand, walnut exhibits a strong reduction in genetic variation due to population bottlenecks and strong artificial selection during domestication, which means certain functionally important genes lost in the reference walnut genome. On the other hand, characterizations of the reference walnut genome rely on short reads sequencing, and the genome gap regions may reduce the gene annotation accuracy. A total of 1,437 consensus lncRNAs and 14,313 lncRNA-mRNA pairs were predicted based on SMRT sequencing. Additionally, 20,813 AS events were first found in walnut genome based on Pacbio reads and corresponding annotated gene models. The results indicated that large-scale full-length cDNA sequence based on SMRT is a powerful tool for identifying novel genes/isoforms, which has greatly improved the current genome annotation and enhanced our understanding of the walnut transcriptome. Detection of circadian genes for the flower bud development Circadian clocks fine-tune physiology and behaviour to match the environmental day/night cycle [[139]39,[140]40]. The plant circadian clock generates daily and even seasonal rhythms in an orderly succession of physiological processes including stomatal opening, leaf movement, hypocotyl elongation, photosynthesis and carbon metabolism, resistance to abiotic and biotic stresses, and flowering time [[141]15,[142]33,[143]41–[144]45]. Expression analysis of critical genes helped to dissect the genetic basis of important traits, such as floral in hickory [[145]46] and Persian walnut [[146]47]. Based on our analysis, the pathway of circadian rhythm-plant was enriched during the development of flower buds of walnut, which is also observed by Quan [[147]15]. A total of 14 DEGs involved to the circadian rhythm were identified during the study. Homologous comparison and phylogenetic analysis revealed that JrFKF1, JrGI, JrPRR5, JrLHY, JrPRR9 and JrPRR7 are highly orthologous to the corresponding genes in Arabidopsis thaliana, indicating their similar functions in these two species. Additionally, JrCOP1-like, CRY2-like, JrPRR5-like, JrFKF1-like, JrLHY-like and JrGI-like were first reported to confer flower bud development in walnut. All these DEGs provided a valuable resource for further gene cloning. Regulatory network of circadian genes Extensively excellent reviews and references had reported the details