Abstract Background The genus Pseudoroegneria (Nevski) Löve (Triticeae, Poaceae), whose genome symbol was designed as “St”, accounts for more than 60% of perennial Triticeae species. The diploid species Psudoroegneria libanotica (2n = 14) contains the most ancient St genome, exhibited strong drought resistance, and was morphologically covered by cuticular wax on the aerial part. Therefore, the St-genome sequencing data could provide fundamental information for studies of genome evolution and reveal its mechanisms of cuticular wax and drought resistance. Results In this study, we reported the chromosome-level genome assembly for the St genome of Pse. libanotica, with a total size of 2.99 Gb. 46,369 protein-coding genes annotated and 71.62% was repeat sequences. Comparative analyses revealed that the genus Pseudoroegneria diverged during the middle and late Miocene. During this period, unique genes, gene family expansion, and contraction in Pse. libanotica were enriched in biotic and abiotic stresses, such as fatty acid biosynthesis which may greatly contribute to its drought adaption. Furthermore, we investigated genes associated with the cuticular wax formation and water deficit and found a new Kcs gene evm.TU.CTG175.54. It plays a critical role in the very long chain fatty acid (VLCFA) elongation from C18 to C26 in Pse. libanotica. The function needs more evidence to be verified. Conclusions We sequenced and assembled the St genome in Triticeae and discovered a new KCS gene that plays a role in wax extension to cope with drought. Our study lays a foundation for the genome diversification of Triticeae species and deciphers cuticular wax formation genes involved in drought resistance. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10140-5. Keywords: Genome assemble, St genome, Triticeae, Comparative analyses, Cuticular wax, Drought resistance Background The Triticeae tribe (Poaceae), includes economically important annual crops (e.g., wheat, rye, and barley) and crucial perennial forage grasses (e.g., Roegneria, Agropyron, Leymus, and Pseudoroegneria) [[51]1]. The genus Pseudoroegneria consists of six diploids (2n = 2x = 14, StSt) and nine autotetraploid species (2n = 4x = 28, StStStSt) [[52]2]. Pseudoroegneria was built around one genome designated as St, which is one of the most important basic genomes (St, E, H, P, W, and Ns) in perennial Triticeae. The St genome is the donor genome of species in eight polyploid genera, including Roegneria (StY), Douglasdeweya (StP), Elymus (StH), Thinopyrum (StE), Campeiostachys (StYH), Anthosachne (StYW), Kengyilia (StYP), and Pascopyrum (StHNsXm) [[53]2, [54]3]. At present, the St-containing species were utilized for wheat breeding, such as wheat-Thinopyrum poticum (StStEEE) translocation line with stripe rust resistance [[55]4], Wheat-Elymus repens (StStH) 3D/3St translocation line with highly resistance to Fusarium head blight (FHB) [[56]5]. Recently, a new resistance locus FhbRc1 was mapped to the distal region of chromosome 7S^cL of Roegneria ciliaris [[57]6, [58]7]. In addition, high grass production, excellent disease, and saline-alkali resistance characteristics make the St-containing species to be the ideal germplasm for selecting forage varieties and ecological restoration [[59]8, [60]9]. Thus, the St genome not only accounts for extensive perennial speciation in Triticeae, but also is prominent for forage and crop breeding. More and more efforts involved the origin of the St genome and genetic relationships among the Triticeae species. Molecular data based on several single-copy genes or transcriptome analysis estimated that the genus Pseudoroegneria is more ancient than the Triticum/Aegilops group but younger than the genus Psathyrostachys (genome symbol was designed as Ns) [[61]10, [62]11]. Genomic affinity suggested that the St genome was closely related to the J (= E) genome of Thinopyrum elongatum, and diverged from H (Hordeum bogdanii), D sub-genome, then followed by B and A sub-genome in common wheat [[63]12, [64]13]. Recently, Wang et al. (2020) mapped 14 linkage groups (LGs) and identified seven homologous groups of the St genome, and revealed the genome shared homology between the St and ABD, and H genome (35% and 24%, respectively) [[65]14]. Karyotype comparative analyses revealed a highly conserved cytogenetic collinearity between the St genome and the common wheat genome, except for a well-known 4 A chromosome translocation [[66]15]. These studies provide partly St genome information, nevertheless, the whole genome homoeology and genetic relationship between the St genome and the published Triticeae species from whole genome level have not been studied. Pseudoroegneria plants are predominately cool-season and drought-tolerant grasses, which are distributed in arid or semi-arid areas. The diploid Pse. libanotica grows in the lithoid slopes of Lebanon, Iraq, and the north of Iran [[67]2]. Molecular and phylogeny studies of Triticeae species showed that Pse. libanotica has earlier speciation at phylogenetic tree, implying probably a more ancient St genome than diploid Pseudoroegneria species from Eastern Europe (Pse. strigosa), East Asia (Pse. stipifolia), and North America (Pse. spicata) [[68]16, [69]17]. Morphologically, Pse. libanotica is covered by thickened cuticular wax on the aerial part and it has been reported that the cuticular waxes is essential to prevent non-stomata water loss in Pseudoroegneria [[70]8, [71]18]. Thus, characterizing the St genome of Pse. libanotica might reveal the regulation of cuticular wax biosynthesis in response to water deficit. The regulation of cuticle deposition in response to drought stress was found in model species, crops, and important economic species such as Arabidopsis thaliana, Oryza sativa, wheat (Triticum aestivum), maize, soybean (Glycine max), sesame (Sesamum indicum), sorghum, oats (Avena sativa), Medicago sativa, cotton (Gossypium hirsutum), tree tobacco (Nicotiana glauca), and pine (Pinus palustris) [[72]19–[73]21]. Cuticular waxes mostly comprise very long chain fatty acids (VLCFAs) and their derivatives, including alkanes, wax esters, branched alkanes, primary alcohols, alkenes, secondary alcohols, aldehydes ketones, and unsaturated fatty alcohols, as well as cyclic compounds including terpenoids and metabolites, such as sterols and flavonoids. Multiple genes involved the cuticular wax biosynthesis have been investigated. The plastid Acetyl-CoA carboxylases are responsible for de novo fatty acid biosynthesis (up to C16:0, C16:1, and C18:1) [[74]22, [75]23]. Further, the long-chain fatty acids (C16–C18) are exported to the cytosol after the hydrolysis of ACPs by acyl-ACP thioesterases [[76]24]. These 3-ketoacyl-CoA synthases (KCS) isoforms are involved in fatty acid elongation [[77]25]. Some cytochrome P450s have been reported to be involved in plant cuticle wax synthesis as well [[78]26, [79]27]. Eceriferum (CER) genes affect different steps of the wax biosynthesis pathway including primary alcohol forming, aldehydes, and alkanes synthesis [[80]28]. Researchers investigated distinctive cuticular waxes among different plant species, tissues, and organs, even in different growth and developmental stages [[81]29]. Novel genes involved in cuticular wax biosynthesis may be practically used as valuable genetic resources to improve drought tolerance in crop breeding [[82]30]. However, drought-induced wax biosynthesis has been seldomly investigated in wild germplasms in Triticeae. Given the importance of the St genome in Triticeae and the excellent drought resistance traits of the Pseudoroegneria species, the diploid wild species Pse. libanotica was performed whole-genome sequencing in this study. We sequence and assemble the chromosome-scale reference St genome to (1) identify the genetic relationship, especially among Triticeae species using whole genome comparative analysis; (2) elucidate the mechanism of fatty acid biosynthesis in Pse. libanotica under drought stress. Those results will provide solid information for the evolution of Triticeae species, and facility better germplasm utilization. Results Genome assembly, quality evaluation, and annotation Before the genome de novo assembly, a genome size survey of Pse. libanotica (PI 228,392) based on flow cytometry and k-mer statistics is about 3,273.28 Mb (1 C) and 3,048.18 Mb, respectively (Fig. [83]S1). The genome of Pse. libanotica was de novo assembled by integrating ~ 191 Gb (64×) Illumina short paired-end reads, ~ 440 Gb (sequencing depth 147×) Nanopore sequencing data, and ~ 330 Gb (110×) high-throughput chromosome conformation capture (Hi-C) data to generate the V1.0 assembly. For draft assembly improvement, we conducted three rounds of self-correction for Nanopore data and Illumina correction and generated a chromosome-level assembly of Pse. libanotica. The assembly sequence comprised 2.99 Gb of genome data, with a contig N50 of 0.96 Mb and a super-scaffold N50 of 398.55 Mb, accounting for 96.45% of the estimated genome size with 45.07% GC content and 1.34% heterozygous (Table [84]1; Table [85]S1 and S2; Fig. [86]S2 and [87]S3). Of the 2.99 Gb scaffold sequences, there are 2.75 Gb (92.09%) totaling 4060 scaffolds were anchored to seven super-scaffolds (chromosomes) using the Hi-C platform. The number of chromosome-scale super scaffolds is consistent with the species’ determined chromosome number of 7. The shortest chromosome is Chr4 (327.30 Mb), which contains 462 scaffolds, and the longest chromosome is Chr2 (464.04 Mb), which contains 660 scaffolds (Fig. [88]S4; Table [89]S3). The assembly accuracy and the qv were 23.23, which indicated the reliability of the assembly genome. Table 1. Overview of genome assembly and gene annotation for Pse. libanotica Assembly characteristics Values Estimated genome size 3.10 Gb Assembled genome size 2.99 Gb Total length of contigs 2.99 Gb N50 length of contigs 0.96 Mb Total number of contigs 7,391 N50 length of super scaffolds 398.55 Mb Number of annotated high-confidence genes 46,369 Percentage of repeat sequences 71.62% Complete BUSCOs 95.20% Fragmented BUSCOs 1.20% Missed BUSCOs 3.60% [90]Open in a new tab The long terminal repeat (LTR) Assembly Index (LAI), which evaluates the contiguity of intergenic and repetitive regions of genome assemblies based on the intactness of LTR retrotransposons (LTR-RTs) [[91]31], of the Pse. libanotica genome assembly was 13.20 (Table [92]S4), reaching to the criterion of reference quality. Further, we conducted CEGMA [[93]32] and BUSCO [[94]33] to evaluate the integrity and base accuracy of the assembled Pse. libanotica genome. CEGMA showed that the assembled genome completely covered 228 (91.94%) of the 248 core genes, and partially covered 11 core genes. Less than 4% of the core genes were not detected. BUSCO displayed that 95.2% of the 1440 single-copy genes were homologous sequences in Triticeae species (Table [95]S5). The draft assembly was further evaluated by mapping short high-quality reads into the assembled genome. The mapping rate was 98.95%, with 58.95% of the average sequencing depth (Table [96]S6). In Pse. libanotica, 151,872 expressed sequence tag (EST) sequences were mapped to the genome with > 95% identity, in which 132,240 (87.10%) were aligned to the reference genome with > 87% coverage (Table [97]S7). Collectively, these data showed the high coverage of the assembled St genome. A total of 46,369 protein-coding genes were identified, 82.89% of which had functional annotations, and 67.16% were annotated with functional domain using eggNOG database (v5.0) (Fig. [98]1; Tables S8, S9). Moreover, we identified 1,483 transfer RNAs, 18,438 miRNAs, 1,427 small nuclear RNAs, and 473 ribosomal RNAs (Table [99]S10). Repeat sequences comprised 71.62% of the assembled genome, in which the long terminal repeat-retrotransposon (LTRs) were the most abundant repeat type, including two ubiquitous classes Gypsy and Copia. Meanwhile, the DNA retrotransposon, short interspersed nuclear elements (SINEs), and long interspersed nuclear elements (LINEs) had the lowest proportion in the final assembly (Table [100]2, S11). The Gypsy family density was increased from the telomere to the centromere, while the Copia family was uniformly distributed along the seven chromosomes (Fig. [101]1). Fig. 1. [102]Fig. 1 [103]Open in a new tab Overview of the Pse. libanotica PI 228,392 genome. The tracks indicate (moving inwards): (a) Chromosomes (Chr1 ~ 7) name and size, (b) Gene density, (c) repeat density (window size of 500 kb), (d) LTR-Gypsy (window size of 500 kb), (e) LTR-Copia (window size of 500 kb), (f) GC content (%), (g) Expansion gene (window size of 10 Mb), (h) Contraction gene (window size of 10 Mb), (i) Chromosome homologous relationship Table 2. Statistics of transposable elements in Pse. libanotica genome sequences Denovo+Repbase Length(bp) % in Genome TE proteins Length(bp) % in Genome Combined Tes Length(bp) % in Genome DNA 80,757,639 2.70 49,637,759 1.66 128,642,083 4.30 LINE 2,196,166 0.07 33,738,564 1.13 35,319,985 1.18 SINE 1,570,054 0.05 0 0 1,570,054 0.05 LTR 1,952,684,497 65.32 391,751,832 13.1 1,977,516,340 66.15 Unknown 9,318,886 0.31 120 0.000004 9,319,006 0.31 Total 2,045,135,021 68.41 475,109,220 15.89 2,105,455,067 70.43 [104]Open in a new tab Phylogenetic evolution, whole-genome duplication, and genome synteny To explore the phylogeny and divergence of the St genome, we reconstructed the phylogenetic tree using single-copy ortholog genes shared by 14 Poaceae plants (Fig. [105]2A). Sorghum bicolor and Zea mays were C[4] plants and diverged firstly in the phylogenetic tree, followed by O. sativa, Brachypodium distachyon, and Dactylis glomerat. Triticeae species were clustered into one monophyletic group (Aegilops tauschii, Hordeum vulgare, Secale cereale, T. aestivum, T. durum, T. dicoccoides, T. urartu). The Pse. libanotica diverged ~ 10.7 million years ago (Mya) after H. vulgare (~ 11.9 Mya), while the S. cereale and the genus Triticum separated ~ 9.6 Mya. To clarify whole-genome duplication in Pse. libanotica, synonymous substitutions (Ks) were calculated in Pse. libanotica, D. glomerat, and B. distachyon. The peak Ks at 0.20 which occurred after the divergence peak at 0.05 between Pse. libanotica and D. glomerat and 0.05 between Pse. libanotica and B. distachyon, indicating that a whole-genome duplication event occurred in the common ancestor of Poaceae (Fig. [106]2B). Fig. 2. [107]Fig. 2 [108]Open in a new tab Gene family and genome evolution of Pse. libanotica. A: The left panel includes the estimation of divergence time of Pse. libanotica and O. sativa, B. distachyon, T. aestivum, A. tauschii, T. durum, T. dicoccoides, T. urartu, S. cereale, H. vulgare, Z. mays and (A) thaliana. The right panel displays the distribution of single-copy, multiple-copy, unique and other orthologues. B: Distribution of the ks values of the best reciprocal BLASTP hits in the genomes of Pse. libanotica (Pse), D. glomerata (Dgl) and (B) distachyon (Bdi). C: The number of gene families shared among five Triticeae species genomes shown in Venn diagrams. The overlap between the circles indicates gene families shared between species, and the numbers refer to the number of gene families Orthologous genes in Pse. libanotica, T. aestivum, T. urartu, H. vulgare, and D. glomerat were analyzed to illustrate chromosome collinearity and derivation (Fig. [109]S5). Pse. libanotica exhibited conserved chromosome homoeologous collinearity with A genome, B-subgenome, D-subgenome, and H genome, and varied with D. glomerat. Interestingly, the 4St in Pse. libanotica and the 4 A chromosome of T. urartu displayed high consistency. Nevertheless, a pericentric inversion was observed in the 4 A chromosome, and a 4AL (long arm)-5AL-7BS (short arm) translocation in the common wheat. The St genome and the D. glomerat displayed chaotic chromosome collinearity. The 2St and 3St chromosomes mainly matched to the 1 and 5 chromosomes of D. glomerat, the 5St had syntenic to the 3 and 4 chromosomes of D. glomerat, and the 7 St was syntenic to the 3 and 7 chromosomes of D. glomerat (Fig. [110]S5). Thus, gene order and location varied among Poaceae species after divergence, conserved among diploid Triticeae species, and partly rearranged during the polyploidization of Triticeae. Unique gene families, contract and expanded gene families involved in cuticular wax biosynthesis in pse. Libanotica To characterize the St genome, unique gene families, contract and expanded gene families were analyzed in the St genome. In total, 10,073 shared gene families were identified in published Triticeae species and B. distachyon, whereas 882 gene families were specific to Pse. libanotica (Fig. [111]2C; Table [112]S1). Unique gene families in Pse. libanotica were mainly involved in MAPK signaling pathway, ABC transport, and cuticular wax biosynthetic pathways (Fig. [113]S6; Table [114]S12, S13). Compared with the most recent common ancestor (MRCA), Pse. libanotica contained 3,558 expanded and 7,007 contracted genes (Table [115]S14). There showed no distribution preference bias of expanded or contracted gene location among seven chromosomes, whereas GO term analysis revealed that most expanded or contracted genes were enriched in protein binding molecular functions including fatty acid binding. KEGG pathway enrichment analysis showed that the expanded gens were in five pathways: cutin, suberine, and wax biosynthesis, sesquiterpenoid and triterpenoid biosynthesis, flavone and flavonol biosynthesis, and benzoxazinoid biosynthesis (Table [116]S15, S16). In summary, unique gene families, expanded and contracted gene families in the St genome are clustered in the cutin, suberine, and wax biosynthesis pathways. Fatty acid-related genes under drought stress in pse. Libanotica To identify candidate genetic regions associated with fatty acids and their derivatives metabolism pathway, we performed 28 d drought stress treatment without irrigation and conducted transcriptome sequencing (6 Gb/sample) on each treatment. Every seven days was taken as a treatment 7 d, 14 d, 21 d, 28 d, and each treatment has three biology repeats. Total reads and mapping ratio were listed in Table S17. The number of differentially expressed genes (DEGs) under drought conditions at 7d, 14d, 21d, and 28d were selected using criteria| log2FC| ≥ 1 and adjusted q-value < 0.05 (Table S18). We found that DGEs were largely expressed at 14 d drought treatment. In total, 1,010 co-expressed DEGs were shared at four treatments (Fig. [117]S7A, S7B). GO enrichment analysis showed that 1,190 GO terms were assigned to the 2,300 DEGs that responded to drought treatment (Fig. [118]S8). KEGG enrichment analysis showed that DEGs were mainly involved in the biosynthesis of unsaturated fatty acids, fatty acid metabolism and the elongation process (Fig. [119]S9). Comparing the DEGs with expansion and contraction genes, 1,576 DEGs (9.25%) were found in the expansion genes, whereas 61 DEGs (29.35%) were found in the contraction genes. In total, we identified 14 significantly different expression genes (ACC, FATB, FACR, FACR1, FACR4, KCS1, KCS5, KCS6, KCS11, KCS12, KCS20, CER1, CER3 and CYP96A15) including 19 transcripts that directedly participated in the fatty acid biosynthesis pathway under 28 days of drought stress (Table S19). Phylogeny tree of the transcripts corresponding genes, involving in wax biosynthesis, displayed that Kcs gene family, Facr gene family and Cer gene family were formed subclade respectively (Fig. [120]S10). Fatb was annotated as a novel gene (novel.7326), thus, it was absent in the phylogenetic tree. During the de novo fatty acid biosynthesis, the Acc and Fatb were upregulated during 21-day water deficit. During fatty acid elongation, except for two transcripts of Kcs5 and evm.TU.CTG175.54, other Kcs genes (Kcs1, Kcs6, Kcs11, Kcs12, and Kcs20) were significantly downregulated. Genes catalyzed VCLFAs derivatives in alcohol-forming pathway (Far, Far1, and Far4) and alkane-forming pathway (Cer1, Cer3/Cytb5, Cyp96a15/Mah1) were upregulated apart from one of the Far1 transcript evm.TU.CTG5611.2 (Fig. [121]3). In summary, we speculated that only Kcs5 and evm.TU.CTG175.54 were responsible for VCLFA elongation in Pse. libanotica. Fig. 3. [122]Fig. 3 [123]Open in a new tab Cuticular wax biosynthetic pathway ([124]www.kegg.jp/kegg/kegg1.html) related genes in Pse. libanotica under drought stress. Cuticular wax are synthesized and secreted in the endoplasmic reticulum (ER). First, long-chain fatty acids (C16–C18) are synthesized in the plastids. Second, the saturated C16 and C18 fatty acyl-CoA precursors are elongated to very long chain fatty acyl-coenzyme As (VLCFA-CoAs) in ER. Once the VLC-acyl-CoAs are synthesized, they can be further released as free VLCFAs that can either be directly exported as cuticular waxes, or undergo further modifications in the via either the alkane-forming pathway or the alcohol-forming pathway Characterization of the evm.TU.CTG175.54 gene related to VCLFA elongation To characterize the evm.TU.CTG175.54, synteny analysis among major Triticeae species (T. aestivum, T. urartu, A. tauschii, and H. vulgare) were conducted. Candidate genes presented ortholog among major Triticeae genomes, however, no orthologs of the evm.TU.CTG175.54 were found (Fig. [125]4). We aligned the cDNA in the National Center for Biotechnology Information ([126]https://blast.ncbi.nlm.nih.gov/Blast.cgi, NCBI database), and the result showed that it was Kcs5 like and/or Kcs6 like in Poaceae species, such as T. aestivum, A. tauschii, H. vulgare, Lolium rigidum, Z. mays, O. sativa, and Panicum virgatum, etc. Comparative analysis of cDNA sequence showed that 86.92% and 77.23% similarity shared between evm.TU.CTG175.54 and Kcs5 and Kcs6, respectively. Fig. 4. [127]Fig. 4 [128]Open in a new tab Microsynteny analysis between fatty acid biosynthesis candidate genes loci and their respective synteny counterparts in common T. aestivum (ABD), T. urartu (A), A. tauschii (D) and H. vulgare (H). The syntenic genes are connected by colored lines. The evm.TU.CTG175.54 were displayed in red line. The arrowhead indicates genes resided in the Pse. libanotica locus, and gene names are above the arrowhead. Corresponding syntenic gene names are displayed below The phylogenetic tree based on 132 KCS proteins in Poaceae displayed that the evm.TU.CTG175.54 distributed in a subclade including KCS5 or KCS6 similar proteins, while other KCS proteins (KCS1, KCS5, KCS6, KCS11, KCS12, and KCS20) formed a single subclade, respectively (Fig. [129]5). Thus, we speculated that the evm.TU.CTG175.54 might be a new Kcs gene during fatty acid elongation under drought stress in Pse. libanotica and named it as PlKCS5/6. Fig. 5. [130]Fig. 5 [131]Open in a new tab Maximum likelihood tree derived from 132 KCS protein sequences among Poaceae. KCS proteins in this research were presented in red, while KCS proteins downloaded from NCBI were nominated as ‘KCS protein + species + protein entry’ To investigate the growth of yeast cells exposed to stresses, the yeast cells harboring empty pYES2 (as a control) or pYES2-PlKCS5/6 plasmid was evaluated after drought treated with 2 M sorbitol (Fig. [132]6). The results showed that the growth rate of yeast contained empty pYES2 and pYES2-PlKCS5/6 had no difference under the treatment of 0 M sorbitol. However, the growth of the PYES2-PlKCS5/6 plasmid was worse than empty pYES2 under 2 M sorbitol treatment (Fig. [133]6). These results demonstrated that expression of the PlKCS5/6 gene in yeast related to the abiotic stress treatments. The function of PlKCS5/6 needs further in-depth research. Fig. 6. [134]Fig. 6 [135]Open in a new tab The growth activity of pYES2 and pYES2-PlKCS5/6 under different treatments. 5 µL of serial dilutions of pYES2 and pYES2-PlKCS5/6 stressed with 0 M sorbitol and 2 M sorbitol were cultivated on SC-Ura medium for 2 days Discussion Characterization of the reference genome of pse. Libanotica Karyotype comparative analysis displayed that the chromosome length of the St genome is shorter than other genomes in Triticeae, such as R genome (Secale), H genome (Hordeum), P genome (Agropyron), Ns genome (Psathyrostachys), and E genome (Thinopyrum) [[136]34]. In this study, we assemble a highly contiguous reference genome sequence of Pse. libanotica with 2.99 Gb in size, by using Illumina sequencing, nanopore-based next-generation sequencing, and chromosome-scale scaffolding by Hi-C. The genome size of St is smaller than other diploid Triticeae species reported at present, such as the D genome (4.3 Gb) of A. tauschii [[137]35], the E genome (4.78 Gb) of Th. elongata [[138]36], and the R genome (7.74 Gb) of S. cereale [[139]37]. Repeat sequences comprise 71.62% of the assembled Pse. libanotica genome which is lower than repetitive proportions in Hordeum vulgare, Th. elongatum, Triticum aestivum, Aegilops speltoides and Secale cereal (varies from 80.8 to 90.31%) even though they displayed no composition difference. Thus, we suppose that less repetitive sequence may responsible for smaller St genome size. From the standpoint of germplasm innovation for breeding, clarifying the genome relationships in Triticeae can provide useful strategies for obtaining germplasm with high genetic diversity and fitness by artificial hybridization [[140]3]. In this study, we found that the St genome is closely related to the A, B, and D genomes of wheat, offering the potential to transfer gene(s) from the St genome to wheat. Furthermore, we observed that T. aestivum 4 A was syntenic to Pse. libanotica chromosome 4St, 5St, and 7St, and T. urartu 4 A was syntenic to Pse. libanotica chromosome 4St and 5St, which were related to the most significant overall re-arrangement of chromosome 4 A in wheat [[141]38, [142]39]. The St genome combines with other basic genomes formed more than 60% perennial Triticeae species [[143]2, [144]3]. These St-containing species are excellent forages for feeding livestock and contain abundant resistant genes for crop breeding and improvement [[145]6–[146]9]. Pse. libanotica is an often-crosspollination plant, with an estimated genome size of 1.34% heterozygous. Cytogenetic research revealed that 5St and 7St displayed high-level genetic heterogeneity among seven pairs of chromosomes of Pse. libanotica [[147]15]. The high genomic heterozygosity could frequently occur from open pollination plants, which not only spurs the establishment of fitness and abundant genetic diversity but also contributes to the genetic advantage in terms of the richness of their polyploid offspring species in Triticeae. Many natural hybrids between various St-containing polyploid species have been reported [[148]40–[149]43], further enlarging the richness of St-containing descendants. The hierarchy of phylogenetic tree based on all single-copy gene families in this research is similar to that from few nuclear single-copy genes [[150]10], whole chloroplast genome [[151]3] and nuclear transcriptome sequencing [[152]11]. However, the estimate of diverge time is slightly different. We evaluated the Pseudorogneria diverged 10.7 Mya (10.0 ~ 11.4 Mya), which is earlier than 8.0 Mya (5.6 ~ 10.3 Mya) based on nuclear single-copy genes, but later than 14.5 Mya (14.4 ~ 14.7 Mya) inferred from nuclear transcriptome. The difference might be caused by different genetic information used and different species used for constructing the chronogram. Yet despite all that, the speculated diverge time belongs to the middle and late Miocene (5.3 ~ 15 Mya). During Miocene stage, the ice sheet grew, and the Tethyan Seaway went through a gradually complete closure leading to the low temperature and water deficit in the Iranian Plateau [[153]44]. The St genomic analysis revealed that unique gene families and expanded families displayed enrichment in cutin, suberine, and wax biosynthesis. Morphologically, the aerial parts of Pse. libanotica is covered by cuticular wax [[154]2]. It has been reported that plant cuticular wax, which acts as the first barrier against environmental threats, consistently serves a critical role in restricting nonstomatal water loss which was often related to drought resistance [[155]19–[156]21, [157]45, [158]46]. Thus, it is reasonable to speculate that Pse. libanotica diverged during the middle and late Miocene in Tran and might develop cuticular wax resistance to cold and drought environments. In this study, Ks was calculated in Pse. libanotica, D. glomerat, and B. distachyon to identify putative whole-genome duplication events by using each gene pair in syntenic blocks. The results indicated that a whole-genome duplication event occurred in the common ancestor of Poaceae, which is consistent with previous study [[159]47]. However, Pse. libanotica and other species used for calculated Ks are heterozygous, and polymorphic positions are represented by only one of the possible nucleotides in assembled sequences. Carina et al. [[160]48] think that traditionally methods (using a single representative DNA sequence per species) ignored the presence of polymorphisms, thus leading to bias. To address the problem, they apply a time-dependent Poisson random field model of sequence divergence to study codon evolution in protein coding genes. They suggested a best-practice protocol for the estimation of Ka/Ks and illustrate the performance of this protocol by studying genome data set of four crow species [[161]48]. Therefore, the effect of heterozygosity or polymorphism on Ks should be fully considered in subsequent studies. A combination of cuticular wax and related genes might promote the excellent drought resistance of pse. Libanotica Cuticular waxes mostly comprise VLCFAs and their derivatives. In this study, 14 genes involved in cuticular wax biosynthesis were identified in leaves of Pse. libanotica during 28 days of drought treatment. The Acc was reported to catalyze the first step and increased significantly the overall rate of fatty acid biosynthesis up to C16:0, C16:1, and C18:1 [[162]49]. The long-chain fatty acids are further hydrolyzed by Fatb, and the FATB thioesterase gene encodes a group of enzymes with more heterogeneous substrate specificity. However, generally shows high activities towards saturated acyl-ACPs which is the major determinant of the chain length and level of saturated fatty acid found on most plant tissues [[163]24]. In Pse. libanotica, Acc and Fatb, determined the de novo fatty acid biosynthesis, are significantly upregulated during 21 days of water deficit which might indicate its excellent drought resistance. Based on differences in VLCFA length and degree of unsaturation, 21 KCS genes in Arabidopsis showed substrate specificity and organ- or tissue-specific expression patterns during fatty acid elongation [[164]25, [165]50]. It is proved that Kcs1 is an important gene and has broad substrate specificity for saturated and mono-unsaturated C16–C24 acyl CoAs formation [[166]51]. The Kcs20 and Kcs2/DAISY are involved in the two-carbon elongation to C22 VLCFA, and are required for cuticular wax and root suberin biosynthesis and expressed in stem epidermal peels [[167]52]. The Kcs6 is essential and crucial for the production of epicuticular and pollen coat lipids that are longer than C28 [[168]53, [169]54]. In this study, two Kcs5 transcripts and evm.TU.CTG175.54 were upregulated during VLCFA elongation. It has been reported that the Kcs5 is responsible for C26–C34 formation in Arabidopsis [[170]55], whereas gene responsible for the C18 elongated to C26 is missing in Pse. libanotica. The evm.TU.CTG175.54 displayed partly DNA and protein sequence similarity with published KCS5 and KCS6, and differed from other reported KCS gene families. Thus, we speculate that evm.TU.CTG175.54 might be a new gene responsible for C18 to C26 biosynthesis during VLCFA elongation in Pse. libanotica. The fatty acyl-CoA reductase (FAR) gene family is expressed in the epidermis of aerial tissues and in roots, which is responsible for the primary alcohol formatting pathway. In Pse. libanotica, the Far1 was downregulated, while the Far4 was upregulated. It has been reported that Far4 catalyzes the reduction of VLC-acyl-CoAs to primary alcohols, presumably via an unreleased aldehyde intermediate which was produced through alkane forming pathway [[171]56]. The Cer1, Cer3, Cyp96a15/Mah1 related to alkane forming pathway were significantly upregulated in Pse. libanotica. Cer1 interacts with the wax-associated protein Cer3 and endoplasmic reticulum–localized Cytb5s resulting in VLC alkane synthesis [[172]28]. The alkane was further hydrolyzed by Cyp96a15/Mah1, which led alkanes producing secondary alcohols and corresponding ketones [[173]26]. To sum up, we assume that the aldehydes, alkanes, secondary alcohol, and ketones accumulated in the alkane forming pathway are the main cuticular wax component in Pse. libanotica leaves. These might increase wax deposition and restricting water loss. Conclusions In this study, we reported the chromosome-level genome assembly for the St genome of Pse. libanotica. The total genome size is 2.99 Gb and 46,369 protein-coding genes were annotated with 71.62% repeat sequences. During the middle and late Miocene, unique genes, gene family expansion, and contraction in Pse. libanotica were enriched in biotic and abiotic stresses, such as fatty acid biosynthesis which may greatly contribute to its drought adaption. Furthermore, we investigated genes associated with the cuticular wax formation and water deficit and found a new Kcs gene evm.TU.CTG175.54 plays a critical role in the VLCFA elongation from C18 to C26 in Pse. libanotica. Our study lays a foundation for the genome diversification of Triticeae species and deciphers cuticular wax formation genes involved in drought resistance. Materials and methods Plant materials The diploid Pse. libanotica accession PI 228,392 (2n = 14) was used for genome sequencing in this study. The original seeds were collected from the northeast side of Kuhe Savalan, Azerbaijan, Iran (47.85 N, 38.30E). It was kindly provided by the National Plant Germplasm System (NPGS, United States). The voucher specimens were kept in the herbarium of Triticeae Research Institute, Sichuan Agricultural University, China. DNA extraction and library preparation High-molecular-weight genomic DNA was extracted from fresh leaves of a single plant using a DNAsecure Plant Kit (TIANGEN, Beijing, China). For Nanopore sequencing, a ligation of the sequencing adapters library was prepared following the manufacturer’s protocol (PromethION, CA). For Illumina (San Diego, CA) short-read sequencing, libraries were size-selected for PE150 sequencing. Sequencing libraries with insert sizes of 350 bp were constructed and sequenced using an Illumina HiSeq X Ten platform at the Novogene Bioinformatics Institute, Beijing. Genome assembly We constructed a de novo assembly of the St genome of Pse. libanotica PI 228,392 by combining sequences from three different technologies: Illumina PE150 short-read sequencing, Nanopore long-read sequencing, and Hi-C conformational alignment. The clean Nanopore reads after filtering and decontamination were assembled with wtdbg2 (v2.5). The error-corrected reads were aligned to each other and assembled into genomic contigs. The iterative polishing was conducted using Pilon (v1.22) in which clean Illumina reads were aligned with the pre-assembled contigs and BWA-MEM with the default parameters [[174]57, [175]58]. Further, we combined the final pre-assembled contig sequences from Nanopore sequencing and clean paired-read data from Illumina sequencing into scaffolds using SSPACE (v3.0) tool [[176]59]. Genome assembly completeness was assessed using the plantae database of 1440 single-copy orthologues using BUSCO (v3) with a BLAST threshold E-value of 1 × 10^− 5. The LAI was used to evaluate the assembly quality [[177]31]. GenomeTools v1.6.2 (gt ltrharvest) and LTR_Finder v1.07 was used to accurately identify LTR-RTs. LAI was calculated with parameters: -t 24 -window 3,000,000 -step 300,000 -1. Higher LAI scores correspond to more complete genome assemblies because a greater number of intact LTR retrotransposons are identified in these cases. The Hi-C libraries were prepared as described previously [[178]60]. Hi-C library sequence used a modified SNAP read mapper to align the draft input assembly ([179]http://snap.cs.berkeley.edu) [[180]61]. HiRise was used to analyze the segregation of Hi-C read pairs mapped within draft scaffolds, and a likelihood model of the genomic distance between the read pairs was generated. The model was used to identify and break putative mis-joins, score prospective joins, and select joins above a threshold. Annotation of repetitive sequences Both homology-based and de novo-based approaches were used to search for TEs. Tandem Repeat was extracted using TRF ([181]http://tandem.bu.edu/trf/trf.html) by ab initio prediction. The homolog prediction used Repbase ([182]http://www.girinst.org/repbase) database employing RepeatMasker ([183]http://www.repeatmasker.org/) software and its in-house scripts (RepeatProteinMask) with default parameters to extract repeat regions. For the de novo-based approach, we used LTR_FINDER ([184]http://tlife.fudan.edu.cn/ltr_finder/), RepeatScout ([185]http://www.repeatmasker.org/), and RepeatModeler ([186]http://www.repeatmasker.org/RepeatModeler.html) to build the de novo repeat library. All the repeats identified by different methods were combined into the final repeat annotation after removing the redundant repeats. Genome annotation To predict protein-coding genes, three approaches were used: de novo gene prediction, homolog prediction, and RNA-sequencing annotation. For de novo prediction, Augustus (v3.2.3), Geneid (v1.4), Genescan (v1.0), GlimmerHMM (v3.04), and SNAP ([187]http://homepage.mac.com/iankorf/) were applied to predict genes. For homolog prediction, the protein sequences of twelve published plant genomes (A. tauschii, B. distachyon, T. aestivum, T. durum, T. dicoccoides, T. urartu, H. vulgare, O. sativa, S. cereale, Sorghum bicolor, Z. mays, and Arabidopsis thaliana) were aligned to the genome using TblastN (v2.2.26; E-value ≤ 1e-5), and then used Gene-Wise (v2.4.1) [[188]62] to predict gene structures. To optimize the genome annotation, the RNA-seq reads were aligned to the genome using TopHat (v2.0.11), and the alignments were used as input for Cufflinks (v.2.2.1) [[189]63, [190]64]. The non-redundant reference gene set was generated by merging genes predicted by three methods with EvidenceModeler (v1.1.1) using PASA (Program to Assemble Spliced Alignment) terminal exon support and including masked transposable elements as input into gene prediction [[191]65]. Genes functions were assigned according to the best match by aligning the protein sequences to the Swiss-Prot (with a threshold of E-value ≤ 1e^-5) [[192]66]. The motifs and domains were annotated using InterProScan70 (v5.31) by searching against publicly available databases, including ProDom, PRINTS, Pfam, SMRT, PANTHER, PROSITE, and eggNOG (v5.0) [[193]67–[194]71]. The Gene Ontology (GO) IDs for each gene were assigned according to the corresponding InterPro entry. Constructing gene families To construct the dataset for gene-family clustering, the protein sequences from the genomes of Pse. libanotica and 13 other plants (A. tauschii [[195]35], B. distachyon [[196]72], T. aestivum [[197]73], T. durum [[198]74], T. dicoccoides [[199]75], T. urartu [[200]76], H. vulgare [[201]77], O. sativa [[202]78], S. cereale [[203]37], Sorghum bicolor [[204]79], Z. mays [[205]80], Dactylis glomerata [[206]81] and Arabidopsis thaliana [[207]82] were used. In the included species, only the longest transcript in the coding region was retained for further analysis when multiple transcripts were present in a gene. Additionally, genes encoding proteins with fewer than 50 amino acids were filtered. The protein sequences of all species were filtered by BLASTP with an E-value of 1e^-5 Protein sequences from all 14 species were clustered into paralogous and orthologous groups using OrthoMCL ([208]http://orthomcl.org/orthomcl/) with an inflation parameter equal to 1.5. Phylogenetic tree reconstruction Protein sequences of all single-copy gene families were aligned using MUSCLE [[209]83], and the alignments of each gene family were concatenated into a super-alignment matrix. These data matrices were used for maximum likelihood phylogenetic analyses by RAxML ([210]http://sco.h-its.org/exelixis/web/software/raxml/index.html) with a bootstrap value of 100, where A. thaliana was designated as the outgroup. The Venn diagram was constructed to display the number of gene families that were shared among four Poaceae species (A. tauschii, T. aestivum, H. vulgare, and S. cereale) clustered into one group of the phylogenetic tree. Species divergence time estimation Single-copy gene families among Pse. libanotica, (A) tauschii, (B) distachyon, T. aestivum, T. durum, T. dicoccoides, T. urartu, H. vulgare, O. sativa, S. cereale, Sorghum bicolor, Z. mays, Dactylis glomerata and Arabidopsis thaliana were selected using the MCMCTree program ([211]http://abacus.gene.ucl.ac.uk/software/paml.html) in phylogenetic analysis with Maximum Likelihood (PAML) for an estimate the divergence time of the nodes on the phylogenetic tree. The MCMCTree parameters were as follows: a burn-in of 10,000 steps, a sample number of 100,000, and a sample frequency of 2. The calibration times of divergence were obtained from the TimeTree database ([212]http://www.timetree.org/): 2.20–3.80 Mya for T. durum and T. aestivum, 3.60–4.40 Mya for T. dicoccoides and T. urartu, 10.0–11.40 Mya for Pse. libanotica and (A) tauschii, 9.60–11.90 Mya for S. cereale and H. vulgare, 31.4–47.2 Mya for Dactylis glomerata and (B) distachyon, 9. 0–12.0 Mya for S. bicolor and Z. mays, 47.2–189 Mya for O. sativa and A. thaliana. Gene family expansion and contraction The expansion and contraction of gene families were determined by comparing the cluster size differences between the ancestor and each species using the CAFE program ([213]http://sourceforge.net/projects/cafehahnlab/). A random birth-and-death model was used to evaluate changes in gene families of the phylogenetic tree. A probabilistic graphical model (PGM) was used to calculate the transfer probability of each gene family from parent to child nodes in the phylogeny. The conditional likelihood was used as the test statistics to calculate the corresponding P-value of each lineage, and a P-value of or below 0.05 was considered significant. Whole-genome duplication The homologous search in the Pse. libanotica genome was performed using BLASTP (E-value < 1e^-5), and then MCScanX ([214]http://chibba.pgml.uga.edu/mcscan2/) was used to identify syntenic blocks in the genome according to the gene location and blast results. For each gene pair in syntenic blocks, Ks values were calculated, and values of all gene pairs were plotted to identify putative whole-genome duplication events in Pse. libanotica. Drought experiments The seeds of Pse. libanotica PI 228,392 were germinated in a petri dish lined with a double layer of filter paper. Afterward, young seedlings were transplanted to potted monocultures and cultivated in a greenhouse at a temperature of 20℃ with a light cycle of 16 h/8 h until the sturdy stage and had a certain number of tillers. For drought treatments, adult stage (five-month-old) potted Pse. libanotica plants of uniform growth were selected for water deficit treatment. The experimental groups were not watered for 28 days (every seven days were taken as drought stress 7d, 14d, 21d, 28d, and each treatment has three biology repeats). The control group was watered every seven days during 28 days experiments. Fresh leaves of each treatment were taken at 9–10 am and stored at -80℃. Library preparation, transcriptome sequencing and analysis Accurate detection of RNA integrity and total volume with Agilent 2100 bioanalyzer. NEB general library building using NEBNext® Ultra™ RNA Library Prep Kit for Illumina® kit and strand-specific library building using NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® kit. After passing the library test, the different libraries are pooled according to the effective concentration and the target downstream data volume required for Illumina sequencing. The basic principle of sequencing is sequencing while synthesizing. The library construction and Illumina sequencing were conducted at Novogene limited liability company (Beijing, China). Clean data were obtained after quality clipping of the raw data and Q20, Q30, and GC content. All the downstream analyses were based on clean data with high quality. We used the assembly Pse. libanotica as the reference genome. Reads were aligned to the reference genome using Hisat2 (v2.0.5) which can generate a database of splice junctions based on the gene model annotation file and thus produce a better mapping result than other non-splice mapping tools. The mapped reads of each sample were assembled by StringTie (v1.3.3b) in a reference-based approach [[215]84]. FeatureCounts v1.5.0 was used to count the reads numbers mapped to each gene. And the FPKM values were then mapped back to read counts according to known gene lengths. Differential expression analysis of two conditions/groups was performed using the DESeq2 R package (1.20.0). The resulting P-values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate. Differentially expressed genes (DEGs) were selected using criteria| log2FC| ≥ 1 and adjusted q-value < 0.05. We plot the venn diagram by the number of differential expressed genes in each experiment group. All numbers in the circle represent the total different genes in the comparison groups, the overlapping regions indicate different genes between groups. All DEGs were mapped to individual terms in the gene ontology (GO) database and the number of genes per term was calculated by the clusterProfiler R package. Analysis of gene regulatory pathways was conducted using the KEGG pathway database [[216]85, [217]86]. Finally, based on the above analysis, wax biosynthesis genes and Kcs genes were obtained from these DEGs. Electronic supplementary material Below is the link to the electronic supplementary material. [218]Supplementary Material 1^ (938.9KB, pdf) [219]Supplementary Material 2^ (161.7KB, pdf) [220]Supplementary Material 3^ (141.8KB, pdf) [221]Supplementary Material 4^ (9.8KB, xlsx) [222]Supplementary Material 5^ (11.1KB, xlsx) [223]Supplementary Material 6^ (19.9KB, xlsx) [224]Supplementary Material 7^ (11.5KB, xlsx) [225]Supplementary Material 8^ (276.1KB, xlsx) [226]Supplementary Material 9^ (1.2MB, xlsx) [227]Supplementary Material 10^ (57.5KB, xlsx) [228]Supplementary Material 11^ (11.7KB, xlsx) [229]Supplementary Material 12^ (184.6KB, xlsx) [230]Supplementary Material 13^ (11.7KB, xlsx) [231]Supplementary Material 14^ (1.1MB, xlsx) Acknowledgements