Abstract Background Real-time quantitative PCR (RT-qPCR) is a crucial and widely used method for gene expression analysis. Selecting suitable reference genes is extremely important for the accuracy of RT-qPCR results. Commonly used reference genes are not always stable in various organisms or under different environmental conditions. With the increasing application of high-throughput sequencing, transcriptome analysis has become an effective method for identifying novel stable reference genes. Results In this study, we identified candidate reference genes based on transcriptome data covering embryos and larvae of early development, normal adult tissues, and the hindgut under sulfide stress using the coefficient of variation (CV) method in the echiuran Urechis unicinctus, resulting in 6834 (15.82%), 7110 (16.85%) and 13880 (35.87%) candidate reference genes, respectively. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses revealed that the candidate reference genes were significantly enriched in cellular metabolic process, protein metabolic process and ribosome in early development and normal adult tissues as well as in cellular localization and endocytosis in the hindgut under sulfide stress. Subsequently, ten genes including five new candidate reference genes and five commonly used reference genes, were validated by RT-qPCR. The expression stability of the ten genes was analyzed using four methods (geNorm, NormFinder, BestKeeper, and ∆Ct). The comprehensive results indicated that the new candidate reference genes were more stable than most commonly used reference genes. The commonly used ACTB was the most unstable gene. The candidate reference genes STX12, EHMT1, and LYAG were the most stable genes in early development, normal adult tissues, and hindgut under sulfide stress, respectively. The log[2](TPM) of the transcriptome data was significantly negatively correlated with the Ct values of RT-qPCR (Ct =  − 0.5405 log[2](TPM) + 34.51), which made it possible to estimate the Ct value before RT-qPCR using transcriptome data. Conclusion Our study is the first to select reference genes for RT-qPCR from transcriptome data in Echiura and provides important information for future gene expression studies in U. unicinctus. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09358-6. Keywords: Reference genes, RT-qPCR, Transcriptome data, Urechis unicinctus Background Real-time quantitative PCR (RT-qPCR) is the most widely used technique for relative gene quantification because of its good repeatability, high sensitivity, strong specificity, high throughput, simplicity, speed, and low cost [[43]1–[44]3]. However, the biological variability of initial materials and the technical factors involved in sample preparation, such as the quantity of cDNA, RNA extraction, RNA integrity, and storage conditions will inevitably affect the accuracy of RT-qPCR [[45]4–[46]6]. Therefore, normalization is necessary to correct for variations in template quantity. Reference genes are used for the normalization of gene expression because of the stability of expression levels among different tissues, different developmental stages, or under various treatments [[47]5]. In general, constitutively expressed housekeeping genes are used as reference genes, such as actin, elongation factor, glyceraldehyde-3-phosphate dehydrogenase, ribosomal RNA, translation initiation factor, tubulin, and ubiquitin [[48]7–[49]9]. However, many studies have shown that some of these genes are not always stable and their expression levels vary greatly under specific experimental conditions [[50]6, [51]10, [52]11]. This is especially true for non-model organisms, which currently lag behind well-characterized model organisms in terms of genomic resources and empirically tested reference genes [[53]4, [54]6, [55]10, [56]11]. Moreover, recent studies have shown that it is impossible to totally normalize gene expression data from all sample types using a single gene [[57]6, [58]10]. Therefore, two or more reference genes are desirable to improve the reliability and accuracy of the RT-qPCR results. With the increasing application of high-throughput sequencing, RNA-seq has provided a new strategy for identifying new highly stable reference genes from transcriptome data. Heretofore, identification of many novel reference genes has been performed based on transcriptome data in various organisms [[59]10–[60]22]. The Echiura worm Urechis unicinctus, a typical benthic species living in intertidal sediments, is widely distributed in Russia, Korean Peninsula, Japan and China [[61]23]. U. unicinctus possesses high economic value because of great edible value and potential medical value [[62]23]. U. unicinctus is also mostly used to study gametogenesis [[63]24], development [[64]25–[65]31], evolution [[66]32, [67]33], and sulfide metabolism [[68]34] because of its characteristics such as a large number of eggs laid, high fertilization rate, biphasic life cycle, and high sulfide tolerance ability. Recently, the use of U. unicinctus in evo-devo studies has generated many breakthrough [[69]32, [70]33]. Evolutionary transcriptome analysis of the trochophores of U. unicinctus and other metazoan animals reveal an adult-first evolutionary scenario with a single metazoan larval intercalation [[71]32]. Hox-mediated body plan diversification is an important developmental process [[72]35]. In U. unicinctus, the expression of Hox genes exhibits a subcluster-based whole-cluster spatio-temporal collinearity pattern, suggesting that Hox subcluster play an important role in spatio-temporal collinearity pattern in invertebrates [[73]33]. On the other hand, as a species living in the intertidal zone, U. unicinctus can tolerate, metabolize and utilize environmental sulfide and is considered a model species for sulfide adaptation [[74]34, [75]36–[76]45]. At present, gene expression analysis by RT-qPCR has been widely performed in U. unicinctus, using commonly used reference genes, such as ATPase [[77]29–[78]31, [79]33, [80]46] and β-actin [[81]34, [82]36, [83]40, [84]41, [85]43, [86]47–[87]51]. Previous studies have identified some reference genes, but have generally focused on some traditionally used genes, such as EF-1-α, TBP, TUB, eIF3, and ATPase [[88]52, [89]53]. Suitable reference genes are crucial for verifying the expression profiles of related genes for future studies in U. unicinctus. RNA-Seq, which can provide a large amount of gene transcription information, is a better method for reference gene screening [[90]10–[91]22]. In addition to classical housekeeping genes, transcriptome data analysis provides an opportunity to identify novel and more stable reference genes. In recent years, a large amount of transcriptome data of U. unicinctus has been published [[92]32, [93]45, [94]46, [95]54], which provides a new strategy for selecting housekeeping genes or reference genes in U. unicinctus. In this study, we systematically screened reference genes by analyzing transcriptome data including early development, normal adult tissues, and the hindgut under sulfide stress in U. unicinctus. Candidate reference genes were selected from the three datasets. Moreover, the correlation between the Ct of RT-qPCR and transcripts per million (TPM) of the transcriptome data was investigated. Our findings identified novel stable reference genes from transcriptome data and contributed to the accurate quantification of gene expression in U. unicinctus. Results and discussion Identification of the candidate reference genes from transcriptome data In this study, we systematically screened reference genes based on transcriptome datasets from early developmental embryos and larvae, normal adult tissues, and the hindgut under sulfide stress using the coefficient of variation (CV) method in U. unicinctus. The CV method is simple to use for candidate reference gene selection. Moreover, compared with other methods such as the fold change method, the CV method can quantify expression variability in a way in which genes can be ranked and directly compared, which has previously been used to identify novel reference genes from transcriptome data in plant species such as the monkeyflower genera Mimulus luteus, Polygonum cuspidatum, apple, and Lycium barbarum L [[96]12, [97]16, [98]21, [99]55] and animals such as Mizuhopecten yessoensis, and silkworm Bombyx mori [[100]17, [101]18]. Although both reads/fragments per kilobase per million (RPKM or FPKM) and TPM can be used to measure gene expression levels, RPKM and FPKM may not be applicable to the comparison of gene expression levels because of the differences sequencing depth between samples. TPM is more suitable for comparison of expression levels among samples [[102]11, [103]56]. First, we excluded genes with low expression levels for easy detection in RT-qPCR assays and adopted a minimum mean log[2](TPM) cut-off of 5 as a criterion for gene expression levels. Second, to ensure that the reference genes had low variance, a standard deviation (SD) log[2](TPM) value of less than 1 was required. So a 0.2 CV cut-off was applied to further identify reference genes, which has been recommended in previous studies [[104]11]. Based on these criteria, we identified 6834 (15.82%), 7110 (16.85%) and 13880 (35.87%) candidate reference genes from 43209 genes of early developmental embryos and larvae, 42191 genes of different normal adult tissues, and 38690 genes of the hindgut under sulfide stress. The number of candidate reference genes for early development was lowest. This result was expected because gene expression levels can change dramatically in a short time during early development [[105]11]. Further, the expression levels of the candidate reference genes were analyzed. The results indicated that the median log[2](TPM) values of the candidate reference genes were 14.162 in the early developmental stages, 15.389 in normal adult tissues and 16.317 in hindgut under sulfide stress (Fig. [106]1A). The ten most stable genes with the lowest CV values in early developmental embryos and larvae, different normal adult tissues, and hindgut under sulfide stress are listed in Table [107]1. The mean log[2](TPM) and CV values of the ten most stable genes in the early development stages ranged from 14.317 to 14.329 and 0.0157 to 0.0203, respectively. All genes were annotated, of which six encoded proteins (FXRD1, IF2M, STX12, PTCD3, TCF25, and TFG) related to gene transcription, translation, protein transport and assembly, and three encoded proteins (CAPR1, NSMA, PDCD6, and HBS1L) related to cell growth, proliferation and apoptosis. As for the ten most stable genes in normal adult tissues, their mean log[2](TPM) and CV values ranged from 15.524 to 15.532 and 0.0093 to 0.0138, respectively. Nine genes were annotated, of which six were encoded proteins (UBE4A, OGT1, GGA3, EHMT1, TRA2B, and PRP39) related to protein modification, transport and mRNA splicing. The ten most stable genes in the hindgut under sulfide stress showed mean log[2](TPM) values ranged from 16.3457 to 16.3463 and CV values ranging from 0.0018 to 0.0032. Nine genes were annotated, of which three (CNOT1, RBM26, and HTR5B) encoded proteins related to post-transcriptional regulation. Fig. 1. [108]Fig. 1 [109]Open in a new tab The selected candidate reference genes in early development, normal adult tissues, and the hindgut under sulfide stress. A Boxplot exhibiting the log[2]TPM values of candidate reference genes that met the criteria. B Venn diagram showing the relationships between candidate reference genes under three conditions. Pink, early development; blue, normal adult tissues; green, hindgut under sulfide stress Table 1. The information of the top 10 candidate reference genes in the early development, normal adult tissues and hindgut under sulfide stress of the U. unicinctus Gene ID Gene name CV Mean Early development evm.model.Hic_asm_9.985.3 FAD-dependent oxidoreductase domain-containing protein 1, FXRD1 0.0157 14.329 evm.model.Hic_asm_2.658 Caprin-1, CAPR1 0.0170 14.326 evm.model.Hic_asm_11.431 Putative neutral sphinGOmyelinase, NSMA 0.0171 14.326 evm.model.Hic_asm_13.131 Translation initiation factor IF-2 Mitochondrial, IF2M 0.0183 14.323 evm.model.Hic_asm_2.533 Syntaxin-12, STX12 0.0190 14.322 evm.model.Hic_asm_6.652 Programmed cell death protein 6, PDCD6 0.0191 14.326 evm.model.Hic_asm_3.310.1 Pentatricopeptide repeat domain-containing protein 3, mitochondrial, PTCD3 0.0192 14.320 evm.model.Hic_asm_6.1042 Transcription factor 25, TCF25 0.0201 14.319 evm.model.Hic_asm_11.1625 TRK-fused gene protein, TFG 0.0202 14.319 evm.model.Hic_asm_0.1156 HBS1-like protein, HBS1L 0.0203 14.317 Normal adult tissues evm.model.Scaf262.4 Ubiquitin conjugation factor E4 A, UBE4A 0.0093 15.532 evm.model.Hic_asm_10.1197 UDP-N-acetylglucosamine–peptide N-acetylglucosaminyltransferase 110 kDa subunit, OGT1 0.0102 15.531 novel.9221 Not annotated 0.0114 15.528 evm.model.Hic_asm_9.1113_evm.model.Hic_asm_9.1114 Histone-lysine N-methyltransferase, EHMT1 0.0114 15.528 evm.model.Hic_asm_8.1166 ADP-ribosylation factor-binding protein, GGA3 0.0118 15.528 evm.model.Hic_asm_7.291 Transformer-2 protein homolog beta, TRA2B 0.0125 15.526 evm.model.Hic_asm_4.469 F-box only protein 18, FBX18 0.0129 15.525 evm.model.Hic_asm_3.391 Uncharacterized protein STK_23830 0.0131 15.525 evm.model.Hic_asm_13.300 Pre-mRNA-processing factor 39, PRP39 0.0134 15.524 evm.model.Scaf156.13 Rabenosyn-5, RBNS5 0.0138 15.524 Hindgut under sulfide stress evm.model.Hic_asm_6.1089 CCR4-NOT transcription complex subunit 1, CNOT1 0.0018 16.346 evm.model.Hic_asm_1.1227 Clathrin heavy chain 1, CLH1 0.0027 16.346 evm.model.Hic_asm_3.989 Exocyst complex component 6, EXOC6 0.0027 16.346 evm.model.Hic_asm_14.1611 Lysosomal alpha-glucosidase, LYAG 0.0027 16.346 evm.model.Hic_asm_7.654 Not annotated 0.0028 16.346 evm.model.Hic_asm_3.766 Melanotransferrin, TRFM 0.0030 16.346 evm.model.Hic_asm_16.105 Angiomotin, AMOT 0.0031 16.346 evm.model.Hic_asm_10.1107 RNA-binding protein 26, RBM26 0.0031 16.346 evm.model.Hic_asm_2.1309 Calumenin-B, CALUB 0.0031 16.346 evm.model.Hic_asm_13.2016 HEAT repeat-containing protein 5B, HTR5B 0.0032 16.346 [110]Open in a new tab Functional enrichment analysis of candidate reference genes To further analyze the relationships of candidate reference genes in early development, normal adult tissues, and the hindgut under sulfide stress, we compared the three candidate reference gene datasets. As shown in Fig. [111]1B, 4079 genes were shared in three candidate reference gene datasets, 4184 genes were shared in early development and normal adult tissues, 5790 genes were shared in early development and hindgut under sulfide stress, and 6615 genes were shared in normal adult tissues and hindgut under sulfide stress. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the three candidate reference gene datasets were then performed (Table [112]2). In early development, GO enrichment analysis showed that the candidate reference genes were mainly enriched in biological process (BP) terms associated with cellular protein metabolic process and macromolecule metabolic process, and in molecular function (MF) terms related to binding. These genes were also enriched in cellular component (CC) terms associated with intracellular and cell. KEGG pathway enrichment analysis indicated ribosome was the most significant pathway, followed by proteasome. In different adult tissues, GO enrichment analysis showed that the candidate reference genes were mainly enriched in biological process (BP) terms associated with cellular protein metabolic process and intracellular transport, and in molecular function (MF) terms related to binding. These genes were also enriched in cellular component (CC) terms associated with intracellular and cell. KEGG pathway enrichment analysis indicated ribosome was the most significant pathway, followed by oxidative phosphorylation. In the hindgut under sulfide stress, GO enrichment analysis showed that the candidate reference genes were mainly enriched in biological process (BP) terms associated with protein phosphorylation and in molecular function (MF) terms related to protein binding. These genes were also enriched in cellular component (CC) terms associated with intracellular, cell and organelle. KEGG pathway enrichment analysis revealed diabetic cardiomyopathy was the most significant pathway, followed by the oxidative phosphorylation. In summary, the results of GO and KEGG enrichment analysis in the hindgut under sulfide stress were different from those in early development and different normal adult tissues, with the proportion of specific candidate reference genes belonging to the hindgut under sulfide stress being the largest, suggesting that the hindgut under sulfide stress, early developmental embryos and larvae, and different adult tissues may focus on diverse biological processes or pathways. Therefore, we need to screen for optimal reference genes in early development, different normal adult tissues, and the hindgut under sulfide stress. Table 2. GO and KEGG pathways enriched in the candidate reference genes ID code Term/ Pathway Subgroup Gene number Q value Early development GO GO:0044267 cellular protein metabolic process Biological process 679 (19.73%) 0.000000 GO:0044260 cellular macromolecule metabolic process Biological process 1518 (44.1%) 0.000000 GO:0044237 cellular metabolic process Biological process 1883 (54.71%) 0.000000 GO:0046907 intracellular transport Biological process 281 (8.16%) 0.000000 GO:0005622 intracellular Cellular component 1315 (57%) 0.000000 GO:0005623 cell Cellular component 1414 (61.29%) 0.000000 GO:0044464 Cell part Cellular component 1414 (61.29%) 0.000000 GO:0044424 intracellular part Cellular component 1263 (54.75%) 0.000000 GO:0043226 organelle Cellular component 1032 (44.73%) 0.000000 GO:0043229 intracellular organelle Cellular component 1014 (43.95%) 0.000000 GO:0005488 binding Molecular function 3127 (72.4%) 0.000000 GO:0005515 protein binding Molecular function 1612 (37.32%) 0.000000 GO:0043168 anion binding Molecular function 753 (17.43%) 0.000000 GO:0000166 nucleotide binding Molecular function 737 (17.06%) 0.000000 KEGG ko03010 Ribosome 89 (4.38%) 0.000000 ko03050 Proteasome 37 (1.82%) 0.000000 ko04141 Protein processing in endoplasmic reticulum 97 (4.77%) 0.000000 ko04120 Ubiquitin mediated proteolysis 84 (4.13%) 0.000001 ko03420 Nucleotide excision repair 33 (1.62%) 0.000002 ko05171 Coronavirus disease—COVID-19 75 (3.69%) 0.000010 Normal adult tissues GO GO:0044267 cellular protein metabolic process Biological process 743 (20.74%) 0.000000 GO:0046907 intracellular transport Biological process 298 (8.32%) 0.000000 GO:0051649 establishment of localization in cell Biological process 323 (9.01%) 0.000000 GO:0044260 cellular macromolecule metabolic process Biological process 1577 (44.01%) 0.000000 GO:0005622 intracellular Cellular component 1364 (57.97%) 0.000000 GO:0044424 intracellular part Cellular component 1312 (55.76%) 0.000000 GO:0005623 cell Cellular component 1458 (61.96%) 0.000000 GO:0044464 cell part Cellular component 1458 (61.96%) 0.000000 GO:0043226 organelle Cellular component 1084 (46.07%) 0.000000 GO:0,043,229 intracellular organelle Cellular component 1056 (44.88%) 0.000000 GO:0005488 bind Molecular function 3371 (74.51%) 0.000000 GO:0005515 protein binding Molecular function 1763 (38.97%) 0.000000 GO:0043168 anion binding Molecular function 801 (17.71%) 0.000000 GO:0017076 purine nucleotide binding Molecular function 668 (14.77%) 0.000000 KEGG ko03010 Ribosome 96 (5.02%) 0.000000 ko00190 Oxidative phosphorylation 67 (3.51%) 0.000000 ko03050 Proteasome 38 (1.99%) 0.000000 ko05171 Coronavirus disease—COVID-19 85 (4.45%) 0.000000 ko05415 Diabetic cardiomyopathy 85 (4.45%) 0.000000 ko05016 Huntington disease 141 (7.38%) 0.000000 Hindgut under sulfide stress GO GO:0006468 protein phosphorylation Biological process 329 (4.82%) 0.000000 GO:0007264 small GTPase mediated signal transduction Biological process 225 (3.3%) 0.000000 GO:0051641 cellular localization Biological process 558 (8.18%) 0.000000 GO:0051649 establishment of localization in cell Biological process 513 (7.52%) 0.000000 GO:0005622 intracellular Cellular component 2248 (48.43%) 0.000000 GO:0005623 cell Cellular component 2461 (53.02%) 0.000000 GO:0044464 cell part Cellular component 2461 (53.02%) 0.000000 GO:0044424 intracellular part Cellular component 2147 (46.25%) 0.000000 GO:0043226 organelle Cellular component 1776 (38.26%) 0.000000 GO:0043229 intracellular organelle Cellular component 1729 (37.25%) 0.000000 GO:0005515 protein binding Molecular function 3325 (38.6%) 0.000000 GO:0005488 binding Molecular function 6141 (71.3%) 0.000000 GO:0043168 anion binding Molecular function 1372 (15.93%) 0.000000 GO:0017076 purine nucleotide binding Molecular function 1119 (12.99%) 0.000000 KEGG ko05415 Diabetic cardiomyopathy 102 (3.23%) 0.000004 ko00190 Oxidative phosphorylation 73 (2.31%) 0.000004 ko05017 Spinocerebellar ataxia 97 (3.07%) 0.000008 ko03050 Proteasome 38 (1.2%) 0.000013 ko04136 Autophagy—other 36 (1.14%) 0.000026 ko04120 Ubiquitin mediated proteolysis 107 (3.39%) 0.000026 [113]Open in a new tab Validation of candidate and commonly used reference genes expression stability by RT-qPCR assay Five candidates and five commonly used reference genes were selected for validation and comparison of the expression stability in this study (Additional file [114]1: Table S1). Five of the top ten candidate reference genes were chosen for RT-qPCR in each case, as shown in Table [115]1. The five candidate reference genes were FXRD1, CAPR1, NSMA, IF2M, and STX12 in early development; UBE4A, OGT1, EHMT1, GGA3, and TRA2B in normal adult tissues and CNOT1, CLH1, EXOC6, LYAG, and TRFM in the hindgut under sulfide stress. The five commonly used reference genes were ATPase B, TBP, eIF3, ACTB, and GAPDH (Table [116]3). Table 3. Detailed information on the five commonly used reference genes in the early development, normal adult tissues, and the hindgut under sulfide stress of the U. unicinctus GenBank ID Gene name Early development Normal adult tissues Hindgut under sulfide stress CV Mean CV Mean CV Mean [117]MN662252 ATPase B 0.0623 14.128 0.0424 15.415 0.0150 16.327 [118]OQ111925 TBP 0.0956 13.568 0.0360 15.435 0.0154 16.325 [119]MN662253 eIF3 0.0517 14.205 0.0495 15.361 0.0131 16.331 [120]GU592178 ACTB 0.2601 12.530 0.1265 14.444 0.0132 16.330 [121]EF012775 GAPDH 0.0887 13.880 0.0607 15.235 0.0202 16.312 [122]Open in a new tab Boxplots were constructed to present the expression levels of five candidate reference genes and five commonly used reference genes under all three conditions (Fig. [123]2). As shown in Fig. [124]2, Table [125]1, and Table [126]3, the novel candidate reference genes possessed higher stability than the commonly used reference genes in all three conditions. The variances in the commonly used reference genes were different among three cases. Three of the five commonly used reference genes, ACTB, GAPDH, and TBP are unstable during early development. ACTB was the most unstable reference gene in early development and normal adult tissues. GAPDH have the highest variance in the hindgut under sulfide stress, followed by TBP. Fig. 2. [127]Fig. 2 [128]Open in a new tab The boxplots that show the log[2]TPM values of the five commonly used reference genes and five candidate reference genes in early development (A), normal adult tissues (B) and hindgut under sulfide stress To further examine the results of transcriptome analysis, RT-qPCR experiments were carried out, and the expression level constancy of ten reference genes in different cases was assessed by four data processing methods (geNorm, NormFinder, BestKeeper, and ∆Ct) [[129]57–[130]60]. Despite a slight difference in the samples used between transcriptome data and RT-qPCR, the results of transcriptome analysis and RT-qPCR assay are very similar, which suggests that candidate reference genes have higher stability than most of the commonly used genes. As shown in Fig. [131]3A, during early development, syntaxin-12 (STX12) was the most stable gene, which is a member of the syntaxin family localized to the endosome [[132]61]. The syntaxin family belongs to the t-SNARE subfamily of the SNARE superfamily and is involved in vesicle trafficking [[133]62, [134]63]. STX12 is widely expressed and potentially participates in a common trafficking event that occurs in every cell [[135]64–[136]66], which explains why STX12 showed stable expression levels during early development. The most stable reference gene in different adult tissues was euchromatic histone-lysine N-methyltransferase 1 (EHMT1) (Fig. [137]3B). EHMT1 and euchromatic histone-lysine N-methyltransferase 2 (EHMT2) are highly homologous and generate functional heterodimeric complexes that are mainly responsible for mono- and dimethylation of histone H3 lysine 9 (H3K9) in euchromatin [[138]67]. EHMT1/EHMT2 is essential for maintaining the normal methylation patterns of H3K9 and plays a central role in the epigenetic control of euchromatin, which is vital for normal cell function. They are universally expressed and associated with many biological processes [[139]67–[140]71]. EHMT1 is also required for normal levels of DNA methylation in facultative heterochromatin [[141]72]. The stable expression levels of EHMT1 and its key role in cells enabled its use as a reference gene in normal adult tissues. As to hindgut under sulfide stress, the most stable gene was lysosomal alpha-glucosidase (LYAG) (Fig. [142]3C), which is a retaining exo-glucosidase catalyzing the production of glucose from glycogen in lysosomes [[143]73, [144]74]. LYAG is extremely important for the degradation of glycogen in lysosomes [[145]75]. This defect can cause the substrate to accumulate in almost all body tissues [[146]76]. Alpha-glucosidases have weak specificity and a given substrate is not strictly connected to a single type of protein [[147]77]. The expression levels of LYAG were not affected by sulfide treatment. Therefore, STX12, EHMT1 and LYAG can be selected as reference genes to normalize the results of the RT-qPCR assay in early development, normal adult tissues, and hindgut after sulfide stress in U. unicinctus, respectively. Fig. 3. Fig. 3 [148]Open in a new tab Expression stability of the ten genes in early development (A), normal adult tissues (B), and hindgut under sulfide stress (C). Stability was estimated using geNorm, NormFinder, BestKeeper, and ∆Ct analyses of the RT-qPCR data. The genes were ranked in descending order of expression stability from left to right after comprehensive analysis Compared to the novel screened candidate reference genes, most of the commonly used reference genes had lower comprehensive ranking values. During early development, ACTB was the most variable, followed by TBP, which is consistent with the transcriptome data of the two genes with high CV values (Fig. [149]3A, Table [150]3). Similarly, in different normal adult tissues, ACTB had the lowest comprehensive analysis ranking values of RT-qPCR and the highest variance, which was not included in the candidate reference gene list. TBP and ATPase B were more stable with relatively high rankings of stability by RT-qPCR compared with other traditional reference genes (Fig. [151]3B), which is expected because the two genes are candidate reference genes that pass the criteria and have lower CV values in five commonly used reference genes (Table [152]3). The results of the comprehensive analysis in early development and normal adult tissues are in accordance with those of previous studies in U. unicinctus [[153]52]. During sulfide stress in the hindgut, ACTB was the most unstable in the comprehensive RT-qPCR results (Fig. [154]3C). However, there was also some inconsistency between the results of RT-qPCR and transcriptome data analysis. For example, GAPDH, which ranked second after STX12 in the RT-qPCR results during early development (Fig. [155]3A), was not included in the candidate reference gene list with a 1.23 of SD log[2](TPM) value by transcriptome data analysis. We deduced that the inconsistent phenomenon may result from the difference in the samples between the transcriptome data and RT-qPCR. Relationship of gene expression level between transcriptome data and RT-qPCR Previous studies have suggested that there is a high correlation between RNA-Seq data and Ct value of RT-qPCR [[156]11]. Therefore, we assessed the relationship between the TPM values of the transcriptome data and RT-qPCR data. As shown in Fig. [157]4, there was a significant negative correlation between log[2](TPM) and Ct values (R^2 = 0.0453, P < 0.0001), with the formula Ct =  − 0.5405 log[2](TPM) + 34.51. This formula will contribute to the estimation the Ct value based on transcriptome data without executing the RT-qPCR assay, and will be conducive to our further research. Fig. 4. [158]Fig. 4 [159]Open in a new tab The correlation of gene expression between log[2]TPM of transcriptome data and Ct values of RT-qPCR Conclusions In this study, we identified candidate reference genes for embryos and larvae of early developmental stages, normal adult tissues, and the hindgut under sulfide stress based on transcriptome data from U. unicinctus. We then validated of the candidate reference genes by RT-qPCR using four methods (geNorm, NormFinder, BestKeeper, and ∆Ct) and compared the stability between the candidate reference genes and commonly used reference genes. The results showed that STX12, EHMT1, and LYAG are the most stable candidate reference genes in early development, normal adult tissues, and the hindgut under sulfide stress, respectively. Our study indicates that transcriptome analysis approaches have great potential to discover novel stable reference genes and will contribute to future gene expression level research in U. unicinctus. Materials and methods Animals materials and treatments Adult U. unicinctus were collected from the intertidal zone along the coast of Yantai city, China. They were maintained in aerated seawater (19℃, pH8.0, salinity 30 PSU) and raised with Chaetoceros muelleri, Chlorella vulgaris, and Platymonas helgolandica. We selected three healthy adult worms and dissected six tissues, including the body wall, coelomic fluid, foregut, mid-gut, hindgut and anal sac from each individual in phosphate-buffered saline (PBS, pH7.4). After dissection, the tissues were immediately frozen in liquid nitrogen and stored at -80℃. Sexually mature individuals were selected and dissected to acquire mature ova and sperm from nephridia (gonoducts) during the spawning season. Sperm and ova were then mixed for artificial insemination at ratio of 10:1. Fertilized eggs were reared in filtered seawater (FSW) (17℃, pH 7.9, salinity 30). Embryos and larvae from ten developmental stages, including early cells (EC, 2 cells, 4 cells, and 8 cells), multiple cell (MC, generally more than 32 cells), blastulae (BL), gastrulae (GA), early trochophore larva (ET, 1 d post fertilization, dpf), mid-trochophore larva (MT, 2 dpf), late-trochophore larva (LT, 25 dpf), early segmentation larva (ES, 30 dpf), segmentation larvae (SL, 35 dpf) and worm-shaped larvae (WL, 42 dpf), were collected, frozen immediately in liquid nitrogen and then stored at -80℃ for total RNA extraction. Three biological replicates were prepared for each developmental stage. The experimental system and sulfide treatment were conducted as described previously [[160]45]. We prepared three aquariums containing 30 L of seawater and sealed them with a cling film. Six individuals were randomly selected for placement in each aquarium. The sulfide concentration in seawater was maintained at 50 μM (equivalent to moderately polluted sediment that U. unicinctus can live normally) by adding the sulfide stock solution (10 mM Na[2]S, pH 8.0) at 2 h interval, and detected by the methylene blue method [[161]41]. The hindguts of three individuals from each aquarium were dissected 0 (control), 6, 24, and 48 h after sulfide treatment. The hindgut was immediately frozen in liquid nitrogen and stored at − 80 °C. Transcriptome datasets Transcriptome data of U. unicinctus embryos and larvae at various developmental periods were obtained from the NCBI Sequence Read Archive (SRA) database under the accession numbers PRJNA485379 and PRJNA394029, mainly including the following stages, EC: Early cells; MC: Multicellular; BL: Blastula; GA: Gastrula; ET: Early-trochophore; MT: Mid-trochophore; LT: Late-trochophore; SL: Segmentation larva and WL: Worm-shaped larva. The transcriptome data of the normal adult tissues were also obtained from the NCBI SRA database under the accession number PRJNA917787, which mainly included the body wall, coelomic fluid, foregut, mid-gut, hindgut, and anal sac. Transcriptome data of the sulfide stress hindgut samples (50 μM for 0, 6, 24, and 48 h) were also obtained from the NCBI SRA database under the accession number PRJNA752504. Identification of reference genes based on transcriptome data Reference genes for RT-qPCR were selected using the coefficient of variation (CV) method as previously described [[162]11]. TPM values were used to measure gene expression levels and averaged for subsequent analyses of biological replicates. Firs, genes with log[2](TPM) values less than or equal to 5 were excluded, because these low-expression genes would lead to poor RT-qPCR results, which makes it difficult to detect and quantify their expression. CV values were calculated using the formula CV = standard deviation (SD) of log[2](TPM) / average log[2](TPM) (mean). Calculations for the mean, SD, and CV were implemented in Microsoft Excel. Candidate reference genes with low variances were required, with SD values lower than 1. Therefore, a CV cut-off of 0.2 for stable genes was adopted, which was the cut-off for stable expression across heterogeneous genes. Functional enrichment analysis To further understand the functions of the selected candidate reference genes, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were performed. The Swiss-Prot Blast results for all genes and the results of reference genes were imported into the online software OmicShare Tools ([163]https://www.omicshare.com/tools/home/index/index.html), and the GO and KEGG enrichment analysis was completed using the Bioinformatics Cloud Tool Platform [[164]78–[165]80]. RNA isolation and cDNA synthesis Total RNA was extracted from the stored different samples using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), according to the manufacturer’s instructions. The RNA quality was assessed using NanoDrop 2000 (Thermo Scientific, Wilmington, DE, USA) and agarose gel electrophoresis. Then, the cDNA template was prepared using a PrimeScript™ RT reagent kit with gDNA Eraser (TaKaRa, Dalian, China) following the manufacturing’s instruction, and diluted with distilled water (1:10) for subsequent experiments. Validation of the reference gene expression stability by RT-qPCR assay Ten genes, consisting of five novel candidates and five commonly used reference genes, were chosen for RT-qPCR validation in early development, normal adult tissues, and the hindgut under sulfide stress. Primers were designed using Primer Primier software (5.0) and the primer sequences are listed in Additional file [166]1: Table S1. RT-qPCR was performed on Light Cycler 480 system (Roche, Basel, Switzerland) using SYBR Premix Ex TaqTM (TaKaRa, Dalian, China). All reactions were carried out with three sample replicates and three technical replicates, and all RT-qPCR assays were validated in compliance with “MIQE guidelines” [[167]81]. Four statistical approaches, geNorm ([168]https://genorm.cmgg.be/) [[169]57], NormFinder ([170]http://moma.dk/) [[171]58], BestKeeper ([172]www.gene-quantification.com/bestkeeper.html) [[173]59], and ∆Ct method [[174]60], were applied to estimate the expression stability of the reference genes. The final ranking of gene expression stability was determined by calculating the geometric mean values of the results acquired using the four approaches. Statistical analysis Statistical analysis of the correlation between FPKM and Ct values was performed using one-way analysis of variance (one-way ANOVA). Statistical significance was set at P < 0.05. Supplementary Information [175]12864_2023_9358_MOESM1_ESM.xlsx^ (11.6KB, xlsx) Additional file 1: Table S1. Information of primers of reference genes in RT-qPCR validation. Acknowledgements