Abstract Background The expression profiles of differentially expressed genes (DEGs) during pupal development have been demonstrated to be vital in age estimation of forensic entomological study. Here, using forensically important Aldrichina grahami (Diptera: Calliphoridae), we aimed to explore the potential of intrapuparial stage aging and postmortem interval (PMI) estimation based on characterization of successive developmental transcriptomes and gene expression patterns. Methods We collected A. grahami pupae at 11 successive intrapuparial stages at 20 °C and used the RNA-seq technique to build the transcriptome profiles of their intrapuparial stages. The DEGs were identified during the different intrapuparial stages using comparative transcriptome analysis. The selected marker DEGs were classified and clustered for intrapuparial stage aging and PMI estimation and then further verified for transcriptome data validation. Ultimately, we categorized the overall gene expression levels as the dependent variable and the age of intrapuparial A. grahami as the independent variable to conduct nonlinear regression analysis. Results We redefined the intrapuparial stages of A. grahami into five key successive substages (I, II, III, IV, and V), based on the overall gene expression patterns during pupal development. We screened 99 specific time-dependent expressed genes (stage-specific DEGs) to determine the different intrapuparial stages based on comparison of the gene expression levels during the 11 developmental intrapuparial stages of A. grahami. We observed that 55 DEGs showed persistent upregulation during the development of intrapuparial A. grahami. We then selected four DEGs (act79b, act88f, up and ninac) which presented consistent upregulation using RT-qPCR (quantitative real-time PCR) analysis, along with consideration of the maximum fold changes during the pupal development. We conducted nonlinear regression analysis to simulate the calculations of the relationships between the expression levels of the four selected DEGs and the developmental time of intrapuparial A. grahami and constructed fitting curves. The curves demonstrated that act79b and ninac showed continuous relatively increasing levels. Conclusions This study redefined the intrapuparial stages of A. grahami based on expression profiles of developmental transcriptomes for the first time. The stage-specific DEGs and those with consistent tendencies of expression were found to have potential in age estimation of intrapuparial A. grahami and could be supplementary to a more accurate prediction of PMI. Keywords: Postmortem interval estimation, Forensic entomology, Pupal development, Age estimation, Transcriptome, Aldrichina grahami 1. Introduction The close association between dead bodies and necrophagous insects and the use of experimental data from necrophagous insects in practical case work is the basis of forensic entomology [[43]1,[44]2]. Under certain conditions, insect evidence that has been found at the scene of a crime can provide information on manner, time and place of death [[45]3]. The predominant task in forensic entomology is estimating the age of the oldest insect evidence that have developed on the corpses at death scenes to determine the time since death through constantly collecting the developmental data of different species of necrophagous insects in various environments and analyzing the composition of the chronological insect succession patterns on the corpses [[46]3,[47]4]. The postmortem interval (PMI) refers to the time between death and the discovery of a corpse [[48]5]. The estimation of PMI continues to be one of the most important yet complicated problems in forensic science, despite the existence of quite a few macroscopic and microscopic approaches [[49]6]. The minimum postmortem interval (PMI[min]) is often referred to as the time from when a female adult insect first arrives to a body after death to the recovery of that body [[50]7]. The developmental time of necrophagous insects are excellent predictors for determining PMI[min] [[51]8]. Blow flies (Diptera: Calliphoridae) are characteristically the first visitors to the corpses as they are capable of detecting and colonizing remains within minutes of death; thus, they are the focus of PMI[min] estimation [[52]9]. Therefore, calliphorids are of considerable importance in forensic practice. The developmental time of the immature stages of blow flies can be divided into three stages: eggs, larvae, and pupae [[53]10]. The larval developmental time can be divided into three stages (first-instar, second-instar and third-instar) and can be evaluated through morphological characteristics such as development of posterior spiracles and cephalopharyngeal skeleton [[54]11], and variations of width, length, and weight. The immature developmental stages comprise more than 50 % of the complete life cycle of the blow flies; however, the intrapuparial stage refers to the whole process from pupariation to emergence and occupies about 50 % of the immature development. The intrapuparial period takes a relatively long time (can last for up to several days or even several weeks at low temperatures for specific species), during which morphological changes are not easily visible to the unaided eyes, making accurate pupae age estimation extremely difficult [[55]12,[56]13]. In other words, the accurate estimation of the duration of the intrapuparial stages would improve PMI prediction. Traditionally, there are several methods which are useful for pupae age estimation. Initially, pupae found at the crime scene are reared under controlled laboratory conditions to adulthood for subsequent age estimation. However, this method is time-consuming and relies on rearing environment and conditions, or just be infeasible when relevant specimens die during the rearing process ^[12、14]. Although the variation of puparium color has been suggested as a means for pupae age estimation, this approach is only useful within the first few hours of puparium formation, with color changes no longer being predictable after a set amount of time [[57]14,[58]15]. In addition, the morphological characteristics can only be scrutinized through a time-consuming and laborious preparation of the insect specimens; most importantly, the assessment of morphological changes within the puparium relies heavily on the experimenters and requires entomological experience about pupal development, and especially hard to quantify (though it is the most economical and easiest approach) [[59]16,[60]17]. Thus, it would be necessary to develop approaches that could be carried out using standard experiments and would produce more standardized and quantifiable results for the reliable pupae age estimation of insects of forensic importance, especially when pupae represent the oldest specimens or are the only available sources of entomological evidence at a crime scene [[61]18]. Metamorphosis during the intrapuparial stages comprises an integrated set of developmental processes that are controlled by a series of cellular divisions, cellular differentiations, cell deaths, and tissue remodeling. Additionally, there are regular changes in the expression of many differentially expressed genes (DEGs) that are involved in the control and execution of metamorphosis [[62]19,[63]20]. The data of the analysis of DEGs during metamorphosis could be quantitative and allows easy standardization. In addition, for non-insect experts such as the police or investigators, without the participation of insect experts, molecular methods could be regarded as objective and effective approaches. Under various conditions, three blind studies had been conducted to prove how the age of Calliphora vicina (Diptera: Calliphoridae) during the intrapuparial stage can be well evaluated based on data of gene expression patterns using an inverse prediction; the results show that the estimated minimum age approximated the true minimum age by at least 1.0 ± 1.6 days (median ± standard deviation) and the estimated maximum age approximated the true maximum age by up to 1.6 ± 1.7 days (median ± s.d.) [[64]9]. The analysis of DEGs and specific time-dependent (stage-specific) expressed genes could form the basis of a measurable method for more accurate age estimation of immature necrophagous insects, thereby facilitating the possibility of more accurate PMI prediction in future forensic investigations [[65]21,[66]22]. Molecular genetic researches on Drosophila melanogaster (Diptera: Drosophilidae) have led to remarkable progresses in our comprehension of developmental gene regulation, and the expression patterns of nearly one-third of all Drosophila genes during the entire developmental time have been reported [[67]23]. In addition, the first use of DEGs to estimate the age of insects associated with a corpse was by Tarone in 2007, whereby the expression of three genes (bcd, sll and cs) was profiled during the process of maturation of eggs to verify the practicability of age prediction [[68]24]. The expression profiles of DEGs during developmental processes have been used for the age estimation of forensically important Lucilia sericata (Diptera: Calliphoridae) [[69]22, [70]24] and C. vicina [[71]18, [72]25]. Notably, some candidate genes have also been identified through the transcriptome analysis of L. sericata [[73]26]. Several of the DEGs in C. vicina intrapuparial stages were found on the basis of transcriptome data analysis, and their use is expected to contribute to more accurate PMI estimation compared with the molecular markers for age determination that have been published so far [[74]27]. The expression profiles of three DEGs (15_2, actin, and tbp) have previously been used to predict age throughout the intrapuparial stages of Lucilia illustris (Diptera: Calliphoridae) metamorphosis [[75]28]. Additionally, four DEGs (Hsp90, A-alpha, AFP, and AFBP) with the potential to be used in accurate age estimation of intrapuparial Sarcophaga peregrina (Diptera: Sarcophagidae) have been identified [[76]12]. Transcriptome analysis of C. vicina provided 111 stage-specific DEGs that can be used as molecular markers for estimating the age of intrapuparial C. vicina [[77]29]. Collectively, these findings demonstrate that DEGs are vital for pupae age estimation and PMI prediction. The more stable the expressions of the DEGs, the more accurate the estimation of pupae age [[78]30]. Aldrichina grahami (Diptera: Calliphoridae), a common species of necrophagous insects of forensic importance that is situated predominantly in Asia, was first reported by Aldrich in 1929 [[79]31,[80]32]. A. grahami has since been useful in forensic investigations [[81]33]. A. grahami also adapts to a broad range of environmental temperatures and particularly exhibits a characteristically high tolerance to cold, which makes it active (i.e., the first and dominant species of necrophagous insects to detect and colonize cadavers) in early spring and late autumn when the relatively low temperatures cause other necrophagous insects to be almost inactive [[82]34]. Compared with other necrophagous insect species, the prominent advantage of low-temperature tolerance of A. grahami is not possessed by the vast majority of blow flies, and thus allows it to be applied to PMI estimation in cold environments [[83]8]. In the present study, a comprehensive analysis of the gene expression profiles of successive developmental intrapuparial stage samples of A. grahami was performed. We redefined the successive developmental intrapuparial stages of A. grahami into the key substages, based on the overall gene expression patterns for the first time. We screened stage-specific DEGs to determine the different developmental intrapuparial stages and obtained DEGs with stable expression tendencies and high potential for age estimation. Finally, we constructed a model that could be applied to the age estimation of forensically important intrapuparial blow flies based on the relationships between the gene expression levels of selected and verified marker DEGs and the aging of intrapuparial A. grahami, which could facilitate accurate pupae age estimation and thus accurate PMI determination. 2. Materials and methods 2.1. Fly rearing and specimen collection [[84]34,[85]35] The adult specimens of A. grahami were obtained using pig lung in Changsha (28°12′N, 112°58′E), Hunan province, China, in November 2019. Before rearing, species identification was conducted according to the morphological characteristics of adults. The adult specimens were reared in an insect cage (35 cm × 35 cm × 35 cm), in which a nest feeder was used to provide fresh drinking water, and one dish (12 cm) contained soybean milk powder as food for adults, while another dish (12 cm) contained fresh pig lung to induce oviposition. After 4 h, the eggs (500–1000 individuals) were taken out and reared (enough of fresh pig lung was supplemented every 24 h until pupation) in a plastic bowl (18 cm × 5 cm); then, the bowl was placed in a box (25 cm × 25 cm × 12 cm) that was filled with 3–4 cm of fine sand to cover the bottom of the box, and this box was placed in a new insect cage. The combination of the nest feeder, two dishes, plastic bowl, box, and two insect cages for rearing eggs, larvae, and adults was placed in an artificial climate box with a constant temperature of 20 °C, 75 % relative humidity, and a 12 h:12 h light/dark cycle. The A. grahami population used for this experiment was established through over 6 generations in subculture. The intrapuparial A. grahami samples were collected after about 50 % of the larvae were pupated for 8 h and the pupae age was set to 8 h. We collected 10 available pupae as one sample at 24 h (1 d) intervals from the time point of 8 h until more than five occurrences of emergence were observed. All of the samples were gathered in 5 mL cryovials and immediately frozen in liquid nitrogen and stored at −80 °C until use (total RNA extraction). In total, three biological replications from each interval were gathered from all samples. Pupal development lasted nearly 11 d at 20 °C, and a total of 330 pupae were collected. We divided and defined the 11 d of A. grahami pupal development into 11 successive developmental intrapuparial stages (with a 24 h interval), denoted A to K. That was to say, stages A-K correspond to 8 h, 32 h, 56 h, …, and 248 h (pupae age), respectively, and were suffixed with 1, 2, and 3 according to the three biological replications (e.g., “A-1, A-2, and A-3”). 2.2. Total RNA extraction, library construction and transcriptome sequencing [[86]36] Total RNA extraction of the collected samples used a TRIzol Reagent Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol, and the extracted total RNA was stored at −80 °C until analysis. The RNA quality was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). The RNA integrity and DNA contamination were examined using 1 % RNase free agarose gel electrophoresis. The extracted total RNA was treated by removing rRNA using Ribo-Zero™ Magnetic Kit (Epicentre, Madison, WI, USA) and enriching poly-A tail mRNA with magnetic beads with Oligo(dT), and then obtained the desired mRNA. Subsequently, the enriched mRNA was split into short fragments using fragmentation buffer and reverse transcribed into double-stranded cDNA using random primers. The second-strand cDNA was synthesized using buffer, dNTP, RNase H, and DNA polymerase I. The cDNA fragments were subsequently purified using QIAquick PCR Purification Kit (Qiagen, Venlo, The Netherlands), end-repaired, mixed with poly-A tail and ligated to Illumina sequencing adapters. These modified products were then selected according to length using 1 % agarose gel electrophoresis, PCR amplified, and sequenced by the Gene Denovo Biotechnology Company (Guangzhou, China) using Illumina HiSeq2500. 2.3. Sequence read mapping and assembly [[87][37], [88][38], [89][39]] Firstly, the raw reads were filtered by removing reads that contained adapters, reads that contained more than 10 % unknown nucleotides (N), and low-quality reads that contained more than 50 % low-quality (Q-value ≤20) bases using fastp (version 0.18.0). Then, the high-quality clean reads (reads that contained less than 20 % incorrectly bases, i.e., quality score of at least Q20) were obtained. Additionally, we calculated the Q20, Q30, GC-content and sequence duplication levels of the clean data. All of the subsequent analyses were conducted using the high-quality clean reads. Then, Bowtie2 (version 2.2.8), a short read alignment tool, was used to map the reads to the A. grahami ribosomal RNA (rRNA) database. The mapped rRNA reads were removed, and the remaining high-quality clean reads were further used for assembly and gene abundance calculations. Lastly, an index of the reference genome was built, and the paired-end clean reads were mapped to the reference A. grahami genome using HISAT2.2.4, with “-rna-strandness RF” and other parameters set as default. 2.4. Differential expression analysis [[90]40,[91]41] Gene differential expression analysis between two groups was carried out using the parameter FPKM (fragments per kilobase of transcript per million mapped reads), which was relevant for quantifying the gene expression levels. HTseq was applied to the count calculations and FPKM was calculated using the NCBI gtf file through gene length annotation. RNA differential expression analysis between two groups was carried out using DESeq2 software. The characteristics of false discovery rate (FDR) < 0.05 and absolute fold change ≥2 of genes/transcripts were considered to be differentially expressed. 2.5. Sample relation analysis [[92]42] A correlation analysis was conducted using the correlation coefficient R. The correlation coefficients of two parallel experiments provide an estimation of the reliability of the results of the experiments as well as the stability of the operation. The Pearson correlation coefficients of two parallel experiments were calculated to evaluate the reliability of the biological replications. The closer the Pearson correlation coefficient was to 1, the higher the repeatability between two parallel experiments. Principal component analysis (PCA), a statistical procedure for dimension reduction that converts large and complex correlated variables (gene expression levels) into sets of values for linearly uncorrelated variables (called principal components), is often applied to reveal the structures/relationships between samples/data. In this study, the PCA was performed using pcaMethods. 2.6. KEGG pathway enrichment analysis [[93]43] The complex interaction among genes plays a significant role in certain biological functions. Pathway-based analyses help to find out more about the biological functions of genes. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a bioinformatics database resource that can be used to explore the high-level biological functions and practicalities of biological systems. In this study, KEGG pathway enrichment analysis was used to recognize the remarkably enriched metabolic pathways and signal transduction among DEGs. A hypergeometric test was used to identify the crucial enrichment pathways, and the calculated p-values were subjected to FDR correction, taking FDR ≤0.05 as the threshold. Pathways which met the condition were defined as significantly enriched pathways. 2.7. Identification and co-expression of DEGs [[94]44,[95]45] Weighted gene co-expression network analysis (WGCNA), which is a systems biology method, provides the R functions for weighted correlation network analyses, which can be used to find clusters (modules) of highly correlated genes, summarize such clusters using module eigengenes, relate clusters to each another and to other external sample traits, and calculate module membership measures. In this study, the gene co-expression network was constructed using WGCNA of the PFKM change tendencies of 33 samples across the 11 successive developmental intrapuparial stages of A. grahami. After filtering out the less abundant genes, the remaining genes were clustered according to WGCNA (power value: 8; minimum module genes: 50; maximum modules: 20) based on similar expression patterns. The module eigengenes were calculated for each module to find the vital module that had the strongest correlation with the developmental time of the pupae. In the present study, stage-specific DEGs were defined as genes that exhibit remarkably high expression levels during specific developmental stages compared with all other developmental intrapuparial stages. Then, screening criteria were introduced to identify the stage-specific DEGs under the following conditions: firstly, the elimination of the less abundant genes (FPKM <1); the gene expression in the specific developmental stage (FPKM[n]) was more than seven times the average gene expression in all other developmental intrapuparial stages (FPKM[v]), i.e., FC= FPKM[(n/v)] > 7; finally, the sum of the gene expression levels (s-FPKM) of the DEGs in all stages was more than 22, i.e., s-FPKM >22. All data analyses (after screening) were performed using Microsoft Excel 2007, and heatmaps representing the expression levels of the DEGs in all developmental intrapuparial stages were prepared using the Omicshare tool. Series cluster analysis was carried out using STEM to identify the trends of the DEGs based on the FPKM change tendencies of the genes during the 11 successive developmental intrapuparial stages in the object module (i.e., the module that had the strongest correlation with the developmental time of the pupae). The steadily upregulated or downregulated DEGs were finally filtered as candidate genes for the estimation of the developmental time of pupae and, further, for the estimation of PMI. 2.8. Quantitative real-time PCR analysis [[96]46,[97]47] We processed the same RNA samples that were used in transcriptome sequencing using quantitative real-time PCR (RT-qPCR) to evaluate the reliability of the sequencing and analyses. The RT-qPCR reactions were performed on a 7500 Real-Time PCR System (Applied Biosystems) using the 2 × T5 Fast qPCR Mix (SYBR Green I) (Qingke Biotechnology Co., Ltd. Hunan, China) according to the manufacturer's protocols. Each RT-qPCR amplification mixture (20 μL) contained the 2 × T5 Fast qPCR Mix (10 μL), forward primer (0.5 μL), reverse primer (0.5 μL), template cDNA (2 μL) and ddH[2]O (7 μL). The PCR reactions were set as follows: initial denaturation at 95 °C for 2 min; followed by 40 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s; and melting curve analysis. GST1 was selected as the internal reference gene. Primer Premier 5 was used to design the primers for all selected genes. The expression of the DEGs was standardized against the reference gene, and the 2^−△△Ct method was used to calculate the relative quantification (RQ) of the gene expression levels. Afterward, a log-transform to the base of 2 (log[2](RQ)) was used to normalize the FC (fold change, i.e., the relative change in gene expression) values of the DEGs (i.e., log[2](RQ) = log FC); thus, positive and negative values represent upregulation and downregulation, respectively, of the DEGs. The relationships between the DEG expression levels and the age of intrapuparial A. grahami were ultimately analyzed by categorizing the DEG expression levels (log FC) as the dependent variables and the age of intrapuparial A. grahami (i.e., the developmental time of the pupae) as the independent variable to conduct nonlinear regression analysis to simulate the calculations that would be used for PMI prediction. All statistical analyses of the data were conducted using SPSS version 22.0, and OriginPro version 8.6. 3. Results 3.1. Overview of RNA-seq data In total, 33 samples (Additional File 1: [98]Table S1) were sequenced from the A.grahami samples at 11 successive developmental intrapuparial stages (A-K). A total of 1567.38 Mb of raw reads was generated. After removing reads that contained adapters or low-quality sequences, a mean of 47.30 Mb clean reads was obtained for each sample. The mean Q30 value was approximately 94.03 % for each sample. The details of the data quality control are shown in Additional File 2: [99]Table S2. In total, 13,719 genes were annotated, including 12,823 known genes and 896 predicted novel genes. The sums of the FPKM values for each developmental intrapuparial stage (A-K) were calculated as 748801.246, 867756.916, 794180.623, 805456.817, 848336.911, 859134.854, 857455.297, 900073.219, 929483.831, 951515.667 and 925887.348, respectively (Additional File 3: [100]Table S3). 3.2. Sample relation analysis The correlations between the transcript expression levels are crucial indicators for the reliability of the results of the experiments, and we found that the Pearson Correlation Coefficients between the three biological replications of the 11 stages in the present study all had high repeatability (i.e., all R^2 values ≥ 0.890; [101]Fig. 1A and Additional File 4: [102]Table S4). According to our PCA results, the replications of each stage were well clustered ([103]Fig. 1B). In relation to the heatmaps and PCA plots, the whole pupal development process could be divided into five cluster: cluster I (stage A), cluster II (stage B), cluster III (stages C-G), cluster IV (stage H), and cluster V (stages I–K). Fig. 1. [104]Fig. 1 [105]Fig. 1 [106]Open in a new tab [107]Fig. 1A The Pearson correlation coefficients of three biological replications of 11 successive developmental intrapuparial stages (A–K) of A.grahami at 20 °C. The color red and color blue show the highest and lowest correlations between every two samples, [108]respectively: the darker the color, the stronger the correlation between the two samples. [109]Fig. 1B A PCA plot that shows the gene expression profiles at successive developmental intrapuparial stages of A. grahami. The three points of each triangle represent the three biological replications, and the point of gravity of each triangle represents the average FPKM value of the three repeated samples. 3.3. DEGs involved in the 11 successive developmental stages Under set conditions (|log[2]FC| > 1 and FDR <0.05), we filtered the DEGs to find out the differences between the developmental intrapuparial stages. We identified 1546, 1795, 814, 633, 294, 276, 1116, 1628, 771, and 964 DEGs in the comparisons of A vs. B, B vs. C, C vs. D, D vs. E, E vs. F, F vs. G, G vs. H, H vs. I, I vs. J, and J vs. K, respectively ([110]Fig. 2A). We also identified 1546, 2137, 3694, 4871, 2034, 3786, 4570, 2126, 5499, and 3044 DEGs in the comparisons of I vs. II, I vs. III, I vs. IV, I vs. V, II vs. III, II vs. IV, II vs. V, III vs. IV, III vs. V, and IV vs. V, respectively ([111]Fig. 2B). Fig. 2. [112]Fig. 2 [113]Fig. 2 [114]Open in a new tab Fig. 2A The number of DEGs in the comparison stages. The upregulated (red) and downregulated (blue) DEGs are presented in the histogram. [115]Fig. 2B The number of DEGs in the comparison clusters. The upregulated (red) and downregulated (blue) DEGs are presented in the histogram. 3.4. KEGG pathway enrichment analysis of DEGs KEGG pathway enrichment analyses (excluding the pathways related to human diseases) were conducted on the DEGs in four comparison groups (I vs. II, II vs. III, III vs. IV, and IV vs. V), and the following were observed: lysosome and metabolic pathways are obviously enriched in comparison group I vs. II; phagosome, phototransduction-fly, cutin, suberin, and wax biosynthesis pathways are markedly enriched in comparison group II vs. III; oxidative phosphorylation, thermogenesis, and metabolic pathways are significantly enriched in comparison group III vs. IV; oxidative phosphorylation, and thermogenesis are remarkably enriched in comparison group IV vs. V (Additional File 5: [116]Fig. S1). 3.5. Weighted gene co-expression network analysis (WGCNA) After filtering out the less abundant genes (FPKM <1), a total of 11378 genes clustered into 20 modules with different expression patterns through our WGCNA (Additional File 6: [117]Fig. S2). The number of genes differed in each module (Additional File 7: [118]Fig. S3); therefore, several genes whose patterns of expression were found to be closely correlated with the developmental time of pupae were identified. The genes in the light green, pink, royal blue, black, salmon, dark orange, grey, dark magenta, steel blue, saddle brown, and midnight blue modules showed the highest expression levels in stages A, B, C, D, E, F, G, H, I, J, and K, respectively (Additional File 8: [119]Fig. S4). Additionally, the genes in the purple module displayed the strongest correlation with the developmental time of the pupae (PCA = 0.93; p-value = 0) (Additional File 9: [120]Fig. S5). The eigengenes of different samples in the purple module are presented in the histogram in [121]Fig. 3, which demonstrates that these genes show consistent and stable upregulation in expression according to the developmental time of the pupae. Therefore, the genes that showed high expression levels in each module and the genes clusters in the purple module were selected for subsequent analyses. Fig. 3. [122]Fig. 3 [123]Open in a new tab The gene expression profiles of eigengenes in the purple module according to developmental stages. 3.6. 3.6 identification of stage-specific DEGs According to the WGCNA results, we found that the genes in 11 modules with different colors showed the highest expression levels in 11 successive developmental intrapuparial stages, respectively. Under set conditions (FPKM ≥1, FC= FPKM[(n/v)] > 7 and s-FPKM >22), the genes in 11 modules were screened manually and finally 99 stage-specific DEGs were obtained to determine the differences between the developmental intrapuparial stages. The screened stage-specific DEGs show significantly higher expression levels (red) in specific stages than in the other stages ([124]Fig. 4). The number of stage-specific DEGs that were identified in stages A, B, C, D, E, F, G, H, I, and K was 9, 25, 11, 1, 8, 3, 2, 2, 3, 4, and 31, respectively. Stages A, B, C, and K contained the largest numbers of DEGs. According to the results of our gene annotation, the stage-specific DEGs identified in the different developmental intrapuparial stages are mainly involved in encoding lysozyme proteins, larval cuticular proteins, pupal cuticular proteins, mitochondrial respiratory chains, and heat shock proteins (HSPs). Fig. 4. [125]Fig. 4 [126]Open in a new tab The gene expression levels of 99 stage-specific DEGs. Each vertical color block represents a developmental intrapuparial stage (A–K). The screened stage-specific DEGs show the highest (red) and lowest (blue) expression levels in specific stages. The IDs and annotation information are shown on the right-hand side of each gene. 3.7. Identification of consistently upregulated or downregulated DEGs From the WGCNA results, the purple module was selected as the object module for further study on account of it having the strongest correlation with the developmental time of the pupae. The expression trend analysis of the genes in the purple module was carried out using STEM, and a total of 1165 genes were ultimately clustered into 20 profiles (profiles 0–19). Profile 0 contains 257 genes with a decreasing trend and profile 19 contains 767 genes with an increasing trend. To obtain DEGs with more stable expression trends, we screened the genes in the purple module using the following screening criteria: the gene expression levels were consistently more or less than 0 in each specific developmental intrapuparial stage compared with the previous stage. There were 55 DEGs that showed persistent upregulation during the pupal development of A. grahami ([127]Fig. 5), but no genes were found to be stably downregulated. These persistently upregulated genes could have great potential in PMI prediction. Fig. 5. [128]Fig. 5 [129]Open in a new tab The expression profiles of steadily upregulated DEGs at each developmental stage of intrapuparial A. grahami. 3.8. RT-qPCR analysis In total, four DEGs were screened to validate whether the sequencing and analyses are reliable. The four DEGs (act79b, act88f, up and ninac), which presented consistent upregulation, were used in our RT-qPCR analysis, along with consideration of the maximum fold changes during stages A (8 h), C (56 h), E (104 h), G (152 h), I (200 h), and K (248 h) of pupal development. Details of the gene expression levels detected using RT-qPCR analysis are shown in Additional File 10: [130]Table S5. The detailed primer pairs used in the present study are shown in Additional File 11: [131]Table S6. Comparative analysis of all of the screened genes showed similar expression patterns in the RT-qPCR analysis as were observed in the RNA-seq data ([132]Fig. 6), indicating that the data are reliable. The relationships between the overall DEG expression levels of intrapuparial A. grahami and the developmental time of intrapuparial A. grahami were used to conduct nonlinear regression analysis to simulate the calculations. The regression data were fit using fourth-order polynomials, and the simulation equations are shown in [133]Table 1. The expression levels of the four DEGs are shown in [134]Fig. 7, which demonstrates that act79b and ninac showed continuous relatively increasing levels. The expression level of up first decreased, then immediately showed a steady increase over approximately 3–80 % of pupal development to reach its maximum and then decreased again until the conclusion of pupal development. The expression profile of act88f presented a roughly upward trend with fluctuations. Fig. 6. [135]Fig. 6 [136]Open in a new tab Validation of the RNA-seq data by RT-qPCR. The histograms show four screened DEGs of A. grahami, the red line charts show the FPKM values of the RNA-seq data, and the blue bars show the RT-qPCR results, and the black “T” represented as the mean ± s.d. of the three biological replications. The left-hand Y-axis shows the relative expression levels that were calculated by RT-qPCR, and the right-hand Y-axis shows the FPKM values of the RNA-seq data, and the X-axis shows the developmental time of intrapuparial A. grahami. Table 1. The simulation equations, F values, and coefficients of determination (R^2) of the relationships between the overall DEG expression levels of intrapuparial A. grahami (y) (log FC) and the developmental time (x) (h) of intrapuparial A. grahami. Gene Simulation Equation F R^2 act79b y = 0.33925 − 0.04552x + 0.00111x^2 − 3.09852E − 6x^3 + 1.26126E − 9x^4 96.02 0.997 act88f y = 1.62064 − 0.26689x + 0.00675x^2 − 4.13762E − 5x^3 + 7.88977E − 8x^4 23.81 0.948 up y = 0.31425 − 0.04454x + 0.00106x^2 − 4.60468E − 6x^3 + 5.70154E − 9x^4 67.65 0.982 ninac y = −0.01339 + 0.00506x + 5.50645E − 4x^2 − 3.03718E − 6x^3 + 5.72616E − 9x^4 20.62 0.940 [137]Open in a new tab Fig. 7. [138]Fig. 7 [139]Open in a new tab The overall expression levels of four DEGs according to developmental stages. The log FC values of four DEGs were averaged and plotted against the developmental time of intrapuparial A. grahami. The regression data were fit using fourth-order polynomials. The dots represent the mean values, and the curved lines represent the fitting curves. 4. Discussion 4.1. Redefinition of the developmental intrapuparial stages of A. grahami based on gene expression levels According to our PCA results, the whole pupal development process of A. grahami could be roughly separated into two groups ([140]Fig. 1B): the pre-intrapuparial stage (stages A-G) and the post-intrapuparial stage (stages H–K). The pre-intrapuparial stage accounted for two-thirds of the whole pupal development process, indicating a longer developmental time but a slower developmental speed. The post-intrapuparial stage accounted for one-third of the whole pupal development process, indicating a shorter developmental time but a faster developmental speed. In addition, the samples from stages A, B, C-G, H and I–K were separately clustered into five groups ([141]Fig. 1B). The numbers of DEGs in the comparison groups of A vs. B, B vs. C and H vs. I were the highest among all of the other comparison groups ([142]Fig. 2A). Thus, we redefined the 11 successive developmental intrapuparial stages of A. grahami into five key substages: I (stage A), II (stage B), III (stages C-G), IV (stage H), and V (stages I–K). This represents the first documented attempt to define developmental substages based on the overall gene expression levels during pupal development. In contrast to the use of the dissection of puparium and key tissues within puparium, this method allows avoiding the identification difficulties due to unremarkable changes in anatomy and morphology during pupal development [[143]48,[144]49]. Furthermore, to divide metamorphosis into smaller developmental substages could be subsequently used for more exact PMI estimation. 4.2. A combination analysis of morphological changes and gene expression patterns during pupal development According to morphological changes, division of intrapuparial stage of A. grahami was conducted as described previously [[145]50], and then the intrapuparial stage was subdivided into 11 substages, denoted A to K [[146]51]. In the present study, we redefined the intrapuparial stage of A. grahami into five key successive substages (I, II, III, IV, and V), based on the overall gene expression patterns during pupal development. Through calculating the experienced time after pupation at 20 °C during each intrapuparial stage of A. grahami [[147]50,[148]51], we could correspond our proposed substages I, II, III, IV, and V to one or several morphological substages, respectively. The substage I could correspond to the prepupal stage, and the substage II could correspond to the cryptocephalic pupal stage and the phanerocephalic pupal stage. According to the results of our KEGG pathway enrichment analysis, the lysosomal and apoptotic pathways were obviously enriched in comparison group I vs. II, and played significant roles in the larval tissue self-digestion process during pupal development [[149][52], [150][53], [151][54]], indicating that pupae mainly undergo the process of larval tissue degradation during our proposed substages I-II, which is consistent with the degradation of a large mass of larval tissue between the prepupal stage and the phanerocephalic pupal stage. The substage III could correspond to the yellow-eye stage, the visible stage of the antenna, and the early-stage of the appearance stage of thoracic setae. The phagosome, phototransduction-fly, cutin, suberin, and wax biosynthesis pathways were markedly enriched in comparison group II vs. III, indicating that the pupae mainly undergo significant changes in tissue remodeling, epidermal development, and external light sensation during our proposed substages II-III, which is consistent with the continuous formation of various tissues and organs within the pupae [[152]55,[153]56]. The substage IV could correspond to the early-stage of the appearance stage of thoracic setae. The substage V could correspond to the late-stage of the appearance stage of thoracic setae, coloring stage of thorax and abdomen, pink-eye stage, the red-eye stage and the brown-eye stage. The oxidative phosphorylation, thermogenesis, and metabolic pathways were remarkably enriched in comparison groups III vs. IV and IV vs. V, suggesting that DEGs in the post-intrapuparial stage are mainly related to energy metabolism and environmental adaptability. It should be noticed that the hormone biosynthesis pathways were significantly enriched in comparison groups I vs. II and III vs. IV. Insect hormones play an essential role in the processes of larva-pupa and pupa-adult molting [[154]57], indicating that pupae experience the larva-pupa molting process during our proposed substages I-II and the pupa-adult molting process during our proposed substages III-IV, which is consistent with the morphological changes. Given all that, the division of developmental intrapuparial stages based on gene expression profiles is an important supplement to the definition of morphological developmental intrapuparial stages, which can avoid the errors caused by subjective factors during morphological observation. 4.3. Identification of stage-specific DEGs Stage-specific DEGs have been shown to play vital roles in regulating developmental processes [[155]58,[156]59]. Stage-specific DEGs can obtain not only the key candidates involved in the development of plants, but also be used as diagnostic markers for early detection of cancer or recently identified biological targets of drugs [[157][59], [158][60], [159][61]]. The expression profiles in six developmental stages (early larvae, late larvae, early pupae, late pupae, adult male, and adult female) of Aedes aegypti (Diptera: Culicidae) were examined, which indicated that the levels of gene expression between the larval, pupal, and adult stages vary significantly; thus, demonstrated that genes involved in the growth and development are highly stage-specific [[160]62]. A large-scale catalog of the stage-specific DEGs that are involved in spectacular changes in body coloration was successfully obtained for the swallowtail butterfly larvae [[161]63]. De novo transcriptome analysis of C. vicina pupae was performed for 15 different developmental stages, and 111 highly stage-specific DEGs were finally identified across all 15 stages; more importantly, the accuracy of the age calculations was no more than ± 1–3 days using these genes [[162]29]. Therefore, stage-specific DEGs that can be used as molecular markers for pupae age estimation are of instructive significance for PMI prediction. In the present study, we identified 99 stage-specific DEGs that could indicate pupae age based on a comparison of gene expression levels among the 11 developmental intrapuparial stages of A. grahami. According to the results of our gene annotation, the stage-specific DEGs are mainly involved in encoding apoptosis proteins, chitin proteins (e.g., chia, chit10, etc.), cuticular proteins (e.g., edg84a, ccp84ab, pcp, etc.), hypothetical proteins, heat shock proteins (i.e., hsp70ba, hsp68, etc.), and proteins of the mitochondrial respiratory chains (e.g., phosphorylase, cytochrome P450, etc.). Previously, the change tendencies of the genes that encode hsp60 and hsp90 have shown potential for the estimation of pupae age in L. sericata [[163]22]. We have further verified that members of the hsp superfamily probably play a vital role in pupae age prediction based on the identification of hsp70ba and hsp68 in our study. 4.4. Identification of consistently upregulated DEGs WGCNA is an innovative and effective statistical approach based on gene expression correlations, which is used not only to build networks among various genes to show the expression patterns among multiple samples but also to screen and identify candidate genes for biological markers [[164]64]. Theoretically speaking, the most appropriate DEGs for indicating pupae age during metamorphosis should show sustained increases or sustained decreases in gene expression levels during the pupal development, so that each relative gene expression level would match with one corresponding developmental time [[165]64,[166]65]. However, finding the optimal DEGs that match the criteria is hard. The expression levels of Hsp23 and Hsp24 at 15 °C fluctuated initially, then showed growing tendencies and reached the peak, and fell during the intrapuparial process of A. grahami; in addition, the expression levels of Hsp23 and Hsp24 at 17 °C and 22 °C showed similar changing trends [[167]65]. These results showed that it is feasible to integrate the results of various DEGs to obtain more precise results. The gene expression patterns were examined using RT-qPCR and several marker transcripts of C. vicina pupae were newly identified, which are reliable for examining a certain age by indicating a monotonous rise of gene expression at a certain time point throughout metamorphosis regardless of the rearing temperature (within a temperature range between 17 °C and 25 °C); therefore, these recently identified biological markers will contribute to more accurate PMI estimation compared to so far published molecular markers for age estimation [[168]27]. In total, four DEGs with continuous expression trends have been found to have the potentiality to be used in age estimation of intrapuparial S. peregrina [[169]12]. In the present study, 55 consistently upregulated DEGs were identified during the development of intrapuparial A. grahami using RNA-seq and DEG-seq combined with the WGCNA and STEM analyses. Improving the accuracy of intrapuparial A. grahami age estimation and subsequent PMI determination using the newly identified DEGs could allow expanding the marker gene pool to provide more reference genes for age estimation of other closely related species. 4.5. Age estimation model of intrapuparial A. grahami based on DEGs We constructed a model that could be applied to the age estimation of intrapuparial A. grahami based on the gene expression levels of four DEGs consistently upregulated during six developmental intrapuparial stages. The model suggests that the four DEGs (especially act79b and ninac) could have great potential in PMI prediction. The act79b and act88f genes are two significant actin genes, which are mainly expressed in muscles, promote the jumping movements of flies, and are involved in the morphogenesis of muscles and tendons during D. melanogaster pupal development [[170]66]. Over recent years, actin genes have been used as molecular markers in the pupae age estimation of forensically important necrophagous insects. Under different temperature conditions, actin remains consistently upregulated during C. vicina pupal development [[171]18], while it remains consistently downregulated during L. sericata pupal development [[172]22]. In conclusion, act79b and act88f could be used as marker genes for the age estimation of intrapuparial A.grahami. The ninac and up genes were newly identified DEGs in the present study that have not been reported in previous publications. These two genes could be significant molecular markers to be applied in improving the accuracy of pupae age estimation and thus PMI prediction. The model which we constructed is based on the data collected at one optimal constant temperature of A.grahami. However, the temperature had a crucial effect on the gene expression levels of almost all the genes especially the DEGs, because temperature is the most dominant factor during the developmental process of heterothermic animals, which was indicated by the developmental time decreasing with higher temperature resulting in a faster progression of gene expression and changes in mRNA levels [[173]18, [174]67]. The gene expression patterns of four DEGs (Hsp90, A-alpha, AFP, and AFBP) of S. peregrina were examined for intrapuparial age estimation at different constant temperatures (35 °C, 25 °C, and 15 °C), and the gene expression patterns of four DEGs mentioned were proved statistically significant for identification of the intrapuparial developmental stages of S. peregrina, and the gene expression profiles of the same DEGs were similar at different constant temperatures [[175]12]. 15_2, 2014192, actin, and arylphorin receptor gene expression patterns were fairly the same in C. vicina pupae reared at all three constant temperatures (15 °C, 20 °C, and 25 °C); and with the exception of 15_2, the gene expression levels of all DEGs show a single up-regulation or down-regulation over time [[176]18]. However, the gene expression levels of actin, tbp were discordant at different constant temperatures [[177]28]. actin expression was down-regulated at temperatures between 25 °C and 30 °C, while it progressively up-regulated at 20 °C throughout the first nine days, and then down-regulated until the end of the pupal development. tbp expression initially up-regulated at temperatures between 20 °C and 25 °C, then down-regulated and then up-regulated again; at the temperature of 30 °C, it initially up-regulated and then down-regulated. During the intrapuparial stages of A. grahami, the expression of 1_16 slightly fluctuates at 15 °C and 27 °C, but, at 22 °C, the expression showed a dynamically growing tendency [[178]65]. In addition, a blind study represented a typical situation occurred in forensic practice while the forensic entomologists are confronted with a fluctuating ambient temperature at the crime scene with the ongoing investigation but have reference data under different constant temperatures from related studies; however, in these cases, it is notable that gene expression data would still contribute to relatively accurate age estimation [[179]9]. Furthermore, besides the temperature, the effects of geographical origin on developmental time of L. sericata [[180]68] and geographic variation on developmental time of S. peregrina [[181]69] were researched, but the effects on the expression profiles of DEGs are indefinite and need further researches in the future. We made basic verification at the gene expression and the transcriptome levels. The fitting curves largely confirmed the consistence of expression patterns among genes with predictive tendencies. The verification of efficiency or precision of the model we constructed should be based on systematic parallel experiment and practical cases, which will be explored in our future research. 5. Conclusions With this study, we aimed to provide the potential DEGs that may meet the consideration of forensic entomology. We provided new insights into intrapuparial development patterns at the transcriptome level. We revealed several functional genes related to pupal development and redefined the intrapuparial stages of A.grahami, which could facilitate our understanding of the changes and regulations during the pupal development. The stage-specific DEGs and those with consistent tendencies of expression could also be helpful in facilitating more accurate age estimation of intrapuparial Calliphoridae and, therefore, the more accurate prediction of PMI. Additionally, the genes with similar expression patterns could serve as candidate variables that adjust and improve the statistical model for PMI estimation. Because a multi marker gene pool could provide more references for checking the