Abstract Background Asia minor bluegrass (Polypogon fugax, P. fugax), a weed that is both distributed across China and associated with winter crops, has evolved resistance to acetyl-CoA carboxylase (ACCase) herbicides, but the resistance mechanism remains unclear. The goal of this study was to analyze the transcriptome between resistant and sensitive populations of P. fugax at the flowering stage. Results Populations resistant and susceptible to clodinafop-propargyl showed distinct transcriptome profiles. A total of 206,041 unigenes were identified; 165,901 unique sequences were annotated using BLASTX alignment databases. Among them, 5904 unigenes were classified into 58 transcription factor families. Nine families were related to the regulation of plant growth and development and to stress responses. Twelve unigenes were differentially expressed between the clodinafop-propargyl-sensitive and clodinafop-propargyl-resistant populations at the early flowering stage; among those unigenes, three belonged to the ABI3VP1, BHLH, and GRAS families, while the remaining nine belonged to the MADS family. Compared with the clodinafop-propargyl-sensitive plants, the resistant plants exhibited different expression pattern of these 12 unigenes. Conclusion This study identified differentially expressed unigenes related to ACCase-resistant P. fugax and thus provides a genomic resource for understanding the molecular basis of early flowering. Electronic supplementary material The online version of this article (10.1186/s12864-017-4324-z) contains supplementary material, which is available to authorized users. Keywords: RNA sequencing, Transcriptomics, Herbicide resistance, Polypogon fugax, Clodinafop-propargyl, Flowering Background Common annual Asia minor bluegrass (Polypogon fugax) is a weed that is both distributed across China and associated with winter crops. This weed has evolved resistance to clodinafop-propargyl, an acetyl-CoA carboxylase (ACCase) herbicide [[33]1]. The mechanisms of herbicide resistance have been intensively studied in the past twenty years and at least three have been identified: 1) target site change; 2) closure or translocation of herbicides; and 3) alteration in the rate of herbicide metabolism [[34]2–[35]4]. Nevertheless, these three mechanisms alone often fail to explain the development of herbicide resistance from an evolutionary and ecological perspective [[36]5]. Herbicide resistance will increase the fitness of resistant individuals and hence their ability to produce the next generation. Identifying which biological characteristics play a major role on fitness and interactions with environmental factors is essential for predicting herbicide resistance. It is generally believed that the initial occurrence of major resistance genes in weed populations is the main factor that influences the dynamic evolution of a resistance under herbicide selection [[37]6]. In Lolium rigidum, carrying the common Leu-1781 in ACCase affects the fitness of resistant mutants, and the germination rate of the resistant biotype is lower than that of the sensitive biotype [[38]7]. To palliate this lower germination rate, the mutant Setaria viridis produces more seeds than the sensitive population [[39]8]. In addition, herbicide-resistant Setaria flowers earlier than the susceptible population, with more tillers and panicle, and with lighter seeds [[40]9]. These mechanisms contribute to a more important spread of the resistant populations compared with susceptible ones. Stress-induced flowering has recently received increased attention. Early flowering and seed setting of resistant plants allow the resistant seeds better access to resources [[41]9]. Photoperiodic flowering and vernalization have been well characterized, and the regulatory mechanisms are well known [[42]10–[43]12]. In addition, flowering physiologists had reported that plants tend to flower when grown under unsuitable conditions, indicating that stress is a flower-inducing factor [[44]13–[45]16]. Stress-induced flowering is now considered as the third category of flowering responses alongside regulated autonomous flowering and environment-induced flowering [[46]17]. Nevertheless, the mechanisms underlying early flowering remain poorly understood, particularly among herbicide-resistant weeds. Stress-induced flowering changes the life cycle and might alter fitness. In addition, stress adaptation extends to the evolution of the flowering characteristics [[47]17]. Transcription factors are proteins regulating gene expression and specific transcription factors selectively regulate the transcriptional expression of specific genes. Therefore, in the present study, we aimed to investigate the transcription factors that regulate plant flowering in order to elucidate the relationship between early flowering and selection pressure (herbicide application) of ACCase-resistant P. fugax, and to identify candidate genes responsible for early flowering in resistant plants. Methods Plants The seeds of a putative resistant population of P. fugax (RP population) were collected from Qingsheng County (29° 54′ 1″ N, 103° 48′ 57″ E), Sichuan Province, China, where clodinafop-propargyl has been used for more than five years and has failed to control P. fugax growth. A sensitive population of P. fugax (susceptible plant (SP) population) was sampled from a non-cultivated area in Xichang City, Sichuan Province (27° 50′ 56″ N, 102° 15′ 53″ E). The plants were collected without permissions being sought for the nature of scientific research according to the law of the People’s Republic of China. Because gene expression can differ due to genetic background, genetically homogenized plant material was generated by controlled pairings to narrow the difference. In brief, F1 plants were transplanted to individual 1-L pots in a greenhouse that has an 18/15 °C day/night temperature and a 14-h photoperiod. At the four-tiller stage, the plants were subjected to vegetative propagation: all individual tillers of each plant were separated and transplanted to individual pots. Four clones were therefore obtained. At the three-leaf growth stage, ACCase herbicide was applied. The herbicide sensitivity of each F1 plant was assessed by spraying with clodinafop-propargyl (45 g. active ingredient (a.i.)/ha). Sensitive and resistant F1 plants were then crossed, yielding an F2 population. Visual phenotype rating of the F2 plants was carried out by clodinafop-propargyl selection. The F2 plants with contrasting phenotypes were selected as resistant and sensitive plants. And the F2 generation was used for transcriptome sequencing. Sample collection RPs and SPs (n = 18/group) at the seedling stage (3-leaf stage) were selected, and treated with clodinafop-propargyl (R: 45 g.a.i./ha; S: 0.2 g.a.i/ha) (n = 9 plants/group-); the remaining plants were treated with an equal volume of water. After 72 h, the aerial parts were collected (n = 3 plants/group) and immediately cryopreserved in liquid nitrogen. Afterward, samples were taken at the tillering (four tillers, n = 3 plants/group) and flowering (early flowering, n = 3 plants/group) stages in the same manner and stored in liquid nitrogen. Genomic library construction and sequencing The total RNA was extracted using TRIzol (Invitrogen Inc., Carlsbad, CA, USA) and DNase I (Takara, Otsu, Japan) in accordance with the manufacturer’s instructions. Magnetic beads with oligo (dT) (Takara, Otsu, Japan) were used to isolate mRNA, and the mRNA was mixed together with fragmentation buffer by a ThermoMixer (Thermo Fisher Scientific, Waltham, MA, USA), breaking the mRNA into short fragments. cDNA was synthesized using the RevertAid First Strand cDNA Synthesis Kit (Fermentas, Thermo Fisher Scientific, Waltham, MA, USA) in accordance with the manufacturer’s instructions. Short fragments were purified and resolved with EB buffer (10 mM Tris·Cl, pH 8.5) to repair their ends by the addition of a single adenine nucleotide. The short fragments were then connected with adaptors (BGI, Beijing, China). Suitable fragments were selected as templates for PCR amplification. The constructed sample library was quantified and characterized using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and an ABI StepOnePlus Real-Time PCR system (Applied Biosystems, Foster City, CA, USA) and then sequenced using a HiSeq 4000 system (Illumina, Inc., San Diego, CA, USA). Sequencing and de novo assembly To obtain high-quality clean reads for de novo assembly, the raw reads were generated from transcriptome sequencing in accordance with the following steps. First, the adaptor sequences were removed. Then, reads with more than 5% of unknown nucleotides were removed and reads with more than 50% of low-quality bases (base quality ≤ 20) were discarded. The clean reads that remained were assembled into unigenes using the Trinity software with an optimized K-mer length of 25 for de novo assembly, as previously published [[48]18]. The expression of unigenes was calculated via the reads per kb per million reads (RPKM, ≥ 0.5), which is a general method of quantifying gene expression from RNA sequencing data by normalizing for total read length and the number of sequencing reads [[49]19]. Data analysis Genes expressed at different levels in RPs and SPs (i.e., differentially expressed genes (DEGs)) were subjected to Gene Ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. We used getorf software ([50]http://emboss.bioinformatics.nl/cgi-bin/emboss/getorf) to identify the open reading frame (ORF) of each unigene. Then, the ORFs were aligned to the transcription factor (TF) domains using hmmsearch software ([51]http://hmmer.org/) [[52]20]. The false discovery rate (FDR) statistical method was used in multiple hypothesis testing to correct the P-value. A smaller FDR and larger ratio indicate a greater difference in the expression level between two samples. In this analysis, we chose samples with an FDR ≤ 0.001 and a ratio greater than 2. Differential gene expression analysis Gene expression levels were estimated using RSEM for each sample, as previously described [[53]21]. Clean reads were mapped back onto the assembled transcriptome, and read count for each gene was obtained from the mapping results and normalized as FPKM (fragments per kilobase of transcript per million mapped reads). Differential expression analysis of the two groups was performed using the DESeq 2. The resulting P-values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate. Genes with an adjusted P-value < 0.05 (fold change ≥ 2) found by DESeq were determined as being differentially expressed. Differentially expressed genes (DEGs) were analyzed by GO and KEGG enrichment analysis. GO and KEGG enrichment analyses All DEGs were mapped to terms in the GO database ([54]http://www.geneontology.org//) and the number of genes corresponding to each GO term was calculated. We established a gene list and gene numbers for each GO term and then used a hypergeometric test to identify DEG GO terms whose enrichment was significantly greater than that in the genome background. The KEGG pathway database contains networks of molecular interactions in cells and variants specific to particular organisms. We used pathway enrichment analysis to identify DEG metabolic pathways or signal transduction pathways whose enrichment was significantly greater than that in the whole genomic background. After multiple corrections, we selected pathways with an FDR value ≤ 0.001 to represent pathways significantly enriched in DEGs. Real-time PCR The total RNA was extracted as described above and cDNA was synthesized using an M-MLV Rtase Kit (Thermo Fisher Scientific, Waltham, MA, USA) in accordance with the manufacturer’s instructions. The qRT-PCR mix (25 μl) contained 12.5 μl of SYBR Green Mix (Thermo Fisher Scientific, Waltham, MA, USA), 0.5 μl of each primer (10 μM), 2 μl of cDNA, and 9.5 μl of RNase-free water. The reaction was performed on an ABI 7300 real-time PCR system (Applied Biosystems, Foster City, CA, USA). The qRT-PCR program consisted of 95 °C for 10 min, 40 cycles of 95 °C for 15 s and 60 °C for 45 s, and finally 60 °C for 15 s. EF1 was used as a reference gene for normalization. GraphPad Prism 5 software (GraphPad Software, Inc., La Jolla, CA, USA) was used for data analysis. Expression was calculated in accordance with the 2^-ΔΔCt method. Each experiment was repeated at least three times and consisted of three replicates. Primer sequences are listed in Additional file [55]1: Table S1. Data access The raw reads have been deposited in the NCBI Sequence Read Archive (SRA) database (BioProject PRJNA385696, SRP106591). The data are available at [56]https://trace.ncbi.nlm.nih.gov/Traces/sra_sub/sub.cgi?acc=SRP106591 &focus=SRP106591&from=list&action=show:STUDY. Results Polypogon fugax Transcriptome sequencing and data assembly Compared with P. fugax SPs, RPs flowered 10-15 days earlier, and their inflorescence was morphologically altered, RPs produced 1.9 times more seed than did SPs Additional file [57]2: Figure S1). Therefore, the transcriptome of SPs and RPs were compared to explore the potential genes involved in this process. Strand-specific RNA-Seq was applied to assess RNAs from three pairs of RPs and SPs at the seedling, tillering, and flowering stages with or without clodinafop-propargyl treatment (seedling stage) to comprehensively identify the unigenes associated with herbicide resistance. The sequencing reads containing low-quality, adaptor-contaminated, or high contents of unknown base (N) reads were removed before downstream analyses. Afterward, 150-base single-end sequence raw reads were subjected to quality control using the Phred scaled quality score. Overall, 1.39 billion raw reads and 1.07 billion clean reads (average clean read ratio of 77.05%) were obtained; 92.8% of the clean reads had a quality score ≥ 30, and 97.7% of the clean reads were quality filtered and matched the Illumina’s quality requirements. Read quality metrics after filtering are shown in Additional file [58]3: Table S2. De novo assembly of the 150-base reads yielded 206,041 unique sequences ranging from 300 to 3000 nt in length (Table [59]1) (including 14,166 unigenes with sequences of up to 3000 nt in length) Additional file [60]3: Table S3). The length distribution of the assembled contigs is shown in Additional file [61]4: Figure S2. Among the detected 206,041 unigenes, 165,901 unique sequences were annotated based on BLASTX alignment (E-value <0.00001) searches of seven databases: the NCBI non-redundant (NR), NCBI non-redundant nucleotide (NT), Swiss-Prot protein, KEGG, Cluster of Orthologous Groups of proteins (COG), InterPro protein, and GO databases (Additional file [62]3: Table S4). The 153,591 unique sequences were annotated by reference to the NR database, and then compared to those encoded in the genomes of all grass (Poaceae) species whose genome is fully sequenced, i.e., Brachypodium distachyon, Hordeum vulgare, Aegilops tauschii, and Triticum urartu. (Additional file [63]5: Figure S3). Table 1. Summary of Polypogon fugax transcriptome sequencing, assembly, and annotation Reference transcriptome Total clean reads 275,659,060 Assembled unigenes 206,041 Average read length 1337 N50 contig size 1851 N90 contig size 705 Annotation in NR 153,591 Annotation in NT 145,977 Annotation in KO 121,526 Annotation in SwissProt 111,290 Annotation in GO 80,312 Annotation in COG 74,434 Annotation in Interpro 93,000 [64]Open in a new tab NR, NCBI non-redundant protein sequences database NT, NCBI nucleotide sequences database KO, KEGG (Kyoto Encyclopedia of Genes and Genomes) Orthology database GO, Gene Ontology COG, Clusters of Orthologous Groups of Proteins Annotation of assembled unigenes To further examine the integrity and effectiveness of the annotation process, the number of unigenes (that have NR matches) with a COG classification was calculated. A total of 74,434 unigenes were identified with a COG classification (Additional file [65]3: Table S5). Among the 25 COG categories, the cluster of “General function prediction only” had the highest number (21,107, 28.36%), followed by “Transcription” (15,595, 20.95%), and “Function unknown” (14,816, 19.90%). Categories of “Extracellular structures” (86, 0.001%) and “Nuclear structure” (11, 1.48 e^−4) had the fewest matching genes (Additional file [66]6: Figure S4). GO and KEGG enrichment analyses were used to classify the functions of the predicted P. fugax unigenes. Based on homologous genes, 80,312 sequences (Additional file [67]3: Table S6) from all unigenes of 36 P. fugax libraries were categorized into 56 GO terms comprising three domains: biological process, cellular component, and molecular function (Fig. [68]1). Most were categorized in “cellular process”, “metabolic process”, “cell”, and “cell part”. A high percentage of genes were also assigned to “binding”, “catalytic activity”, “organelle”, and “membrane” as well as “biological regulation”, “development process”, “transporter activity”, and “reproductive process” (Fig. [69]1). Fig. 1. Fig. 1 [70]Open in a new tab Histogram of GO classification. Based on the homologous genes, 80,312 sequences from all unigenes of 36 P. fugax libraries were categorized into 56 GO terms comprising three domains: biological process, cellular component, and molecular function. The majority of the terms were categorized as “cellular process”, “metabolic process”, “cell”, and “cell part”. A high percentage of genes were also assigned to “binding”, “catalytic activity”, “organelle”, and “membrane”, and as well as “biological regulation”, “development process”, “transporter activity”, and “reproductive process” There were 121,526 unigenes that mapped onto KEGG pathways (Additional file [71]3: Table S7). A total of 53,337 annotated unigenes between NR, COG, KEGG, SwissProt, and InterPro databases were identified with Venn diagrams (Fig. [72]2). Fig. 2. Fig. 2 [73]Open in a new tab Venn diagram of the NR, COG, KEGG, SwissProt, and InterPro databases. A total of 53,337 annotated unigenes between the NR, COG, KEGG, SwissProt, and InterPro databases were identified by Venn diagram TF prediction of assembled unigenes Next, we studied unigenes that encoded TFs. The list of unigenes that encode TFs is shown in Additional file [74]3: Table S8. We also performed TF family classification, and found that 5904 unigenes were classified into 58 TF families (Fig. [75]3). Among those families, the MYB family had the highest number (742, 12.57%), followed by the MYB-related family (604, 10.23%) and the AP2-EREBP family (453, 7.67%). Genes of the MADS-box family were also found (131, 0.02% %); these genes are associated with plant development and adversity responses. Fig. 3. Fig. 3 [76]Open in a new tab TF family classification of unigenes. In total, 5904 unigenes were classified into 58 TF families. Among these families, the MYB family represented the highest number (742, 12.57%), followed by the MYB-related family (604, 10.23%) and the AP2-EREBP family (453, 7.67%). Genes of the MADS-box family, which are associated with plant development and adversity responses, were also identified (131, 0.02%). The X axis represents the number of unigenes. The Y axis represents the TF family. “All-Unigene” indicates that the unigenes were those assembled from the 36 samples of Polypogon fugax qRT-PCR validation of P. fugax expression data To verify the gene expression patterns, qRT-PCR analyses were performed on Unigene12462, CL1441.Contig20, CL21112.Contig3, CL4600.Contig9 and CL4600. Contig2, CL12188.Contig2, and EF1 served as candidate reference genes for RT-PCR normalization. The expression of CL12188.Contig2, Unigene12462, and CL1441.Contig20 was higher in the treated sensitive population when flowering (TFS) than in the untreated sensitive population when flowering (UFS). CL4600.Contig2 and CL21112.Contig3 were higher in the UFS than in the TFS. These results confirmed the reliability of the data (Fig. [77]4). Fig. 4. Fig. 4 [78]Open in a new tab qRT-PCR validation of RNA-Seq results. qRT-PCR analysis of six randomly selected genes was conducted to confirm the expression patterns indicated by sequencing. The expression of CL12188.Contig2, Unigene12462, and CL1441.Contig20 was higher in the TFS than in the UFS. CL4600.Contig2 and CL21112.Contig3 was higher in the UFS than in the TFS. These results confirmed the reliability of the data. The error bars represent the standard error of the mean. U: untreated; T: treated; F: flowering stage; S: sensitive population; R: resistant population. *significant difference p>0.05, **significant difference 0.010.05, **significant difference 0.01