Abstract

Background

   Real-time quantitative PCR (RT-qPCR) is a crucial and widely used
   method for gene expression analysis. Selecting suitable reference genes
   is extremely important for the accuracy of RT-qPCR results. Commonly
   used reference genes are not always stable in various organisms or
   under different environmental conditions. With the increasing
   application of high-throughput sequencing, transcriptome analysis has
   become an effective method for identifying novel stable reference
   genes.

Results

   In this study, we identified candidate reference genes based on
   transcriptome data covering embryos and larvae of early development,
   normal adult tissues, and the hindgut under sulfide stress using the
   coefficient of variation (CV) method in the echiuran Urechis
   unicinctus, resulting in 6834 (15.82%), 7110 (16.85%) and 13880
   (35.87%) candidate reference genes, respectively. Gene Ontology (GO)
   and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses
   revealed that the candidate reference genes were significantly enriched
   in cellular metabolic process, protein metabolic process and ribosome
   in early development and normal adult tissues as well as in cellular
   localization and endocytosis in the hindgut under sulfide stress.
   Subsequently, ten genes including five new candidate reference genes
   and five commonly used reference genes, were validated by RT-qPCR. The
   expression stability of the ten genes was analyzed using four methods
   (geNorm, NormFinder, BestKeeper, and ∆Ct). The comprehensive results
   indicated that the new candidate reference genes were more stable than
   most commonly used reference genes. The commonly used ACTB was the most
   unstable gene. The candidate reference genes STX12, EHMT1, and LYAG
   were the most stable genes in early development, normal adult tissues,
   and hindgut under sulfide stress, respectively. The log[2](TPM) of the
   transcriptome data was significantly negatively correlated with the Ct
   values of RT-qPCR (Ct =  − 0.5405 log[2](TPM) + 34.51), which made it
   possible to estimate the Ct value before RT-qPCR using transcriptome
   data.

Conclusion

   Our study is the first to select reference genes for RT-qPCR from
   transcriptome data in Echiura and provides important information for
   future gene expression studies in U. unicinctus.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12864-023-09358-6.

   Keywords: Reference genes, RT-qPCR, Transcriptome data, Urechis
   unicinctus

Background

   Real-time quantitative PCR (RT-qPCR) is the most widely used technique
   for relative gene quantification because of its good repeatability,
   high sensitivity, strong specificity, high throughput, simplicity,
   speed, and low cost [[43]1–[44]3]. However, the biological variability
   of initial materials and the technical factors involved in sample
   preparation, such as the quantity of cDNA, RNA extraction, RNA
   integrity, and storage conditions will inevitably affect the accuracy
   of RT-qPCR [[45]4–[46]6]. Therefore, normalization is necessary to
   correct for variations in template quantity. Reference genes are used
   for the normalization of gene expression because of the stability of
   expression levels among different tissues, different developmental
   stages, or under various treatments [[47]5].

   In general, constitutively expressed housekeeping genes are used as
   reference genes, such as actin, elongation factor,
   glyceraldehyde-3-phosphate dehydrogenase, ribosomal RNA, translation
   initiation factor, tubulin, and ubiquitin [[48]7–[49]9]. However, many
   studies have shown that some of these genes are not always stable and
   their expression levels vary greatly under specific experimental
   conditions [[50]6, [51]10, [52]11]. This is especially true for
   non-model organisms, which currently lag behind well-characterized
   model organisms in terms of genomic resources and empirically tested
   reference genes [[53]4, [54]6, [55]10, [56]11]. Moreover, recent
   studies have shown that it is impossible to totally normalize gene
   expression data from all sample types using a single gene [[57]6,
   [58]10]. Therefore, two or more reference genes are desirable to
   improve the reliability and accuracy of the RT-qPCR results. With the
   increasing application of high-throughput sequencing, RNA-seq has
   provided a new strategy for identifying new highly stable reference
   genes from transcriptome data. Heretofore, identification of many novel
   reference genes has been performed based on transcriptome data in
   various organisms [[59]10–[60]22].

   The Echiura worm Urechis unicinctus, a typical benthic species living
   in intertidal sediments, is widely distributed in Russia, Korean
   Peninsula, Japan and China [[61]23]. U. unicinctus possesses high
   economic value because of great edible value and potential medical
   value [[62]23]. U. unicinctus is also mostly used to study
   gametogenesis [[63]24], development [[64]25–[65]31], evolution [[66]32,
   [67]33], and sulfide metabolism [[68]34] because of its characteristics
   such as a large number of eggs laid, high fertilization rate, biphasic
   life cycle, and high sulfide tolerance ability. Recently, the use of U.
   unicinctus in evo-devo studies has generated many breakthrough [[69]32,
   [70]33]. Evolutionary transcriptome analysis of the trochophores of U.
   unicinctus and other metazoan animals reveal an adult-first
   evolutionary scenario with a single metazoan larval intercalation
   [[71]32]. Hox-mediated body plan diversification is an important
   developmental process [[72]35]. In U. unicinctus, the expression of Hox
   genes exhibits a subcluster-based whole-cluster spatio-temporal
   collinearity pattern, suggesting that Hox subcluster play an important
   role in spatio-temporal collinearity pattern in invertebrates [[73]33].
   On the other hand, as a species living in the intertidal zone, U.
   unicinctus can tolerate, metabolize and utilize environmental sulfide
   and is considered a model species for sulfide adaptation [[74]34,
   [75]36–[76]45].

   At present, gene expression analysis by RT-qPCR has been widely
   performed in U. unicinctus, using commonly used reference genes, such
   as ATPase [[77]29–[78]31, [79]33, [80]46] and β-actin [[81]34, [82]36,
   [83]40, [84]41, [85]43, [86]47–[87]51]. Previous studies have
   identified some reference genes, but have generally focused on some
   traditionally used genes, such as EF-1-α, TBP, TUB, eIF3, and ATPase
   [[88]52, [89]53]. Suitable reference genes are crucial for verifying
   the expression profiles of related genes for future studies in U.
   unicinctus. RNA-Seq, which can provide a large amount of gene
   transcription information, is a better method for reference gene
   screening [[90]10–[91]22]. In addition to classical housekeeping genes,
   transcriptome data analysis provides an opportunity to identify novel
   and more stable reference genes. In recent years, a large amount of
   transcriptome data of U. unicinctus has been published [[92]32, [93]45,
   [94]46, [95]54], which provides a new strategy for selecting
   housekeeping genes or reference genes in U. unicinctus.

   In this study, we systematically screened reference genes by analyzing
   transcriptome data including early development, normal adult tissues,
   and the hindgut under sulfide stress in U. unicinctus. Candidate
   reference genes were selected from the three datasets. Moreover, the
   correlation between the Ct of RT-qPCR and transcripts per million (TPM)
   of the transcriptome data was investigated. Our findings identified
   novel stable reference genes from transcriptome data and contributed to
   the accurate quantification of gene expression in U. unicinctus.

Results and discussion

Identification of the candidate reference genes from transcriptome data

   In this study, we systematically screened reference genes based on
   transcriptome datasets from early developmental embryos and larvae,
   normal adult tissues, and the hindgut under sulfide stress using the
   coefficient of variation (CV) method in U. unicinctus. The CV method is
   simple to use for candidate reference gene selection. Moreover,
   compared with other methods such as the fold change method, the CV
   method can quantify expression variability in a way in which genes can
   be ranked and directly compared, which has previously been used to
   identify novel reference genes from transcriptome data in plant species
   such as the monkeyflower genera Mimulus luteus, Polygonum cuspidatum,
   apple, and Lycium barbarum L [[96]12, [97]16, [98]21, [99]55] and
   animals such as Mizuhopecten yessoensis, and silkworm Bombyx mori
   [[100]17, [101]18]. Although both reads/fragments per kilobase per
   million (RPKM or FPKM) and TPM can be used to measure gene expression
   levels, RPKM and FPKM may not be applicable to the comparison of gene
   expression levels because of the differences sequencing depth between
   samples. TPM is more suitable for comparison of expression levels among
   samples [[102]11, [103]56]. First, we excluded genes with low
   expression levels for easy detection in RT-qPCR assays and adopted a
   minimum mean log[2](TPM) cut-off of 5 as a criterion for gene
   expression levels. Second, to ensure that the reference genes had low
   variance, a standard deviation (SD) log[2](TPM) value of less than 1
   was required. So a 0.2 CV cut-off was applied to further identify
   reference genes, which has been recommended in previous studies
   [[104]11]. Based on these criteria, we identified 6834 (15.82%), 7110
   (16.85%) and 13880 (35.87%) candidate reference genes from 43209 genes
   of early developmental embryos and larvae, 42191 genes of different
   normal adult tissues, and 38690 genes of the hindgut under sulfide
   stress. The number of candidate reference genes for early development
   was lowest. This result was expected because gene expression levels can
   change dramatically in a short time during early development [[105]11].
   Further, the expression levels of the candidate reference genes were
   analyzed. The results indicated that the median log[2](TPM) values of
   the candidate reference genes were 14.162 in the early developmental
   stages, 15.389 in normal adult tissues and 16.317 in hindgut under
   sulfide stress (Fig. [106]1A). The ten most stable genes with the
   lowest CV values in early developmental embryos and larvae, different
   normal adult tissues, and hindgut under sulfide stress are listed in
   Table [107]1. The mean log[2](TPM) and CV values of the ten most stable
   genes in the early development stages ranged from 14.317 to 14.329 and
   0.0157 to 0.0203, respectively. All genes were annotated, of which six
   encoded proteins (FXRD1, IF2M, STX12, PTCD3, TCF25, and TFG) related to
   gene transcription, translation, protein transport and assembly, and
   three encoded proteins (CAPR1, NSMA, PDCD6, and HBS1L) related to cell
   growth, proliferation and apoptosis. As for the ten most stable genes
   in normal adult tissues, their mean log[2](TPM) and CV values ranged
   from 15.524 to 15.532 and 0.0093 to 0.0138, respectively. Nine genes
   were annotated, of which six were encoded proteins (UBE4A, OGT1, GGA3,
   EHMT1, TRA2B, and PRP39) related to protein modification, transport and
   mRNA splicing. The ten most stable genes in the hindgut under sulfide
   stress showed mean log[2](TPM) values ranged from 16.3457 to 16.3463
   and CV values ranging from 0.0018 to 0.0032. Nine genes were annotated,
   of which three (CNOT1, RBM26, and HTR5B) encoded proteins related to
   post-transcriptional regulation.

Fig. 1.

   [108]Fig. 1
   [109]Open in a new tab

   The selected candidate reference genes in early development, normal
   adult tissues, and the hindgut under sulfide stress. A Boxplot
   exhibiting the log[2]TPM values of candidate reference genes that met
   the criteria. B Venn diagram showing the relationships between
   candidate reference genes under three conditions. Pink, early
   development; blue, normal adult tissues; green, hindgut under sulfide
   stress

Table 1.

   The information of the top 10 candidate reference genes in the early
   development, normal adult tissues and hindgut under sulfide stress of
   the U. unicinctus
   Gene ID Gene name CV Mean
   Early development evm.model.Hic_asm_9.985.3 FAD-dependent
   oxidoreductase domain-containing protein 1, FXRD1 0.0157 14.329
   evm.model.Hic_asm_2.658 Caprin-1, CAPR1 0.0170 14.326
   evm.model.Hic_asm_11.431 Putative neutral sphinGOmyelinase, NSMA 0.0171
   14.326
   evm.model.Hic_asm_13.131

   Translation initiation factor IF-2

   Mitochondrial, IF2M
   0.0183 14.323
   evm.model.Hic_asm_2.533 Syntaxin-12, STX12 0.0190 14.322
   evm.model.Hic_asm_6.652 Programmed cell death protein 6, PDCD6 0.0191
   14.326
   evm.model.Hic_asm_3.310.1 Pentatricopeptide repeat domain-containing
   protein 3, mitochondrial, PTCD3 0.0192 14.320
   evm.model.Hic_asm_6.1042 Transcription factor 25, TCF25 0.0201 14.319
   evm.model.Hic_asm_11.1625 TRK-fused gene protein, TFG 0.0202 14.319
   evm.model.Hic_asm_0.1156 HBS1-like protein, HBS1L 0.0203 14.317
   Normal adult tissues evm.model.Scaf262.4 Ubiquitin conjugation factor
   E4 A, UBE4A 0.0093 15.532
   evm.model.Hic_asm_10.1197 UDP-N-acetylglucosamine–peptide
   N-acetylglucosaminyltransferase 110 kDa subunit, OGT1 0.0102 15.531
   novel.9221 Not annotated 0.0114 15.528
   evm.model.Hic_asm_9.1113_evm.model.Hic_asm_9.1114 Histone-lysine
   N-methyltransferase, EHMT1 0.0114 15.528
   evm.model.Hic_asm_8.1166 ADP-ribosylation factor-binding protein, GGA3
   0.0118 15.528
   evm.model.Hic_asm_7.291 Transformer-2 protein homolog beta, TRA2B
   0.0125 15.526
   evm.model.Hic_asm_4.469 F-box only protein 18, FBX18 0.0129 15.525
   evm.model.Hic_asm_3.391 Uncharacterized protein STK_23830 0.0131 15.525
   evm.model.Hic_asm_13.300 Pre-mRNA-processing factor 39, PRP39 0.0134
   15.524
   evm.model.Scaf156.13 Rabenosyn-5, RBNS5 0.0138 15.524
   Hindgut under sulfide stress evm.model.Hic_asm_6.1089 CCR4-NOT
   transcription complex subunit 1, CNOT1 0.0018 16.346
   evm.model.Hic_asm_1.1227 Clathrin heavy chain 1, CLH1 0.0027 16.346
   evm.model.Hic_asm_3.989 Exocyst complex component 6, EXOC6 0.0027
   16.346
   evm.model.Hic_asm_14.1611 Lysosomal alpha-glucosidase, LYAG 0.0027
   16.346
   evm.model.Hic_asm_7.654 Not annotated 0.0028 16.346
   evm.model.Hic_asm_3.766 Melanotransferrin, TRFM 0.0030 16.346
   evm.model.Hic_asm_16.105 Angiomotin, AMOT 0.0031 16.346
   evm.model.Hic_asm_10.1107 RNA-binding protein 26, RBM26 0.0031 16.346
   evm.model.Hic_asm_2.1309 Calumenin-B, CALUB 0.0031 16.346
   evm.model.Hic_asm_13.2016 HEAT repeat-containing protein 5B, HTR5B
   0.0032 16.346
   [110]Open in a new tab

Functional enrichment analysis of candidate reference genes

   To further analyze the relationships of candidate reference genes in
   early development, normal adult tissues, and the hindgut under sulfide
   stress, we compared the three candidate reference gene datasets. As
   shown in Fig. [111]1B, 4079 genes were shared in three candidate
   reference gene datasets, 4184 genes were shared in early development
   and normal adult tissues, 5790 genes were shared in early development
   and hindgut under sulfide stress, and 6615 genes were shared in normal
   adult tissues and hindgut under sulfide stress. Gene Ontology (GO) and
   Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of
   the three candidate reference gene datasets were then performed (Table
   [112]2). In early development, GO enrichment analysis showed that the
   candidate reference genes were mainly enriched in biological process
   (BP) terms associated with cellular protein metabolic process and
   macromolecule metabolic process, and in molecular function (MF) terms
   related to binding. These genes were also enriched in cellular
   component (CC) terms associated with intracellular and cell. KEGG
   pathway enrichment analysis indicated ribosome was the most significant
   pathway, followed by proteasome. In different adult tissues, GO
   enrichment analysis showed that the candidate reference genes were
   mainly enriched in biological process (BP) terms associated with
   cellular protein metabolic process and intracellular transport, and in
   molecular function (MF) terms related to binding. These genes were also
   enriched in cellular component (CC) terms associated with intracellular
   and cell. KEGG pathway enrichment analysis indicated ribosome was the
   most significant pathway, followed by oxidative phosphorylation. In the
   hindgut under sulfide stress, GO enrichment analysis showed that the
   candidate reference genes were mainly enriched in biological process
   (BP) terms associated with protein phosphorylation and in molecular
   function (MF) terms related to protein binding. These genes were also
   enriched in cellular component (CC) terms associated with
   intracellular, cell and organelle. KEGG pathway enrichment analysis
   revealed diabetic cardiomyopathy was the most significant pathway,
   followed by the oxidative phosphorylation. In summary, the results of
   GO and KEGG enrichment analysis in the hindgut under sulfide stress
   were different from those in early development and different normal
   adult tissues, with the proportion of specific candidate reference
   genes belonging to the hindgut under sulfide stress being the largest,
   suggesting that the hindgut under sulfide stress, early developmental
   embryos and larvae, and different adult tissues may focus on diverse
   biological processes or pathways. Therefore, we need to screen for
   optimal reference genes in early development, different normal adult
   tissues, and the hindgut under sulfide stress.

Table 2.

   GO and KEGG pathways enriched in the candidate reference genes
   ID code Term/ Pathway Subgroup Gene number Q value
   Early development GO
   GO:0044267 cellular protein metabolic process Biological process 679
   (19.73%) 0.000000
   GO:0044260 cellular macromolecule metabolic process Biological process
   1518 (44.1%) 0.000000
   GO:0044237 cellular metabolic process Biological process 1883 (54.71%)
   0.000000
   GO:0046907 intracellular transport Biological process 281 (8.16%)
   0.000000
   GO:0005622 intracellular Cellular component 1315 (57%) 0.000000
   GO:0005623 cell Cellular component 1414 (61.29%) 0.000000
   GO:0044464 Cell part Cellular component 1414 (61.29%) 0.000000
   GO:0044424 intracellular part Cellular component 1263 (54.75%) 0.000000
   GO:0043226 organelle Cellular component 1032 (44.73%) 0.000000
   GO:0043229 intracellular organelle Cellular component 1014 (43.95%)
   0.000000
   GO:0005488 binding Molecular function 3127 (72.4%) 0.000000
   GO:0005515 protein binding Molecular function 1612 (37.32%) 0.000000
   GO:0043168 anion binding Molecular function 753 (17.43%) 0.000000
   GO:0000166 nucleotide binding Molecular function 737 (17.06%) 0.000000
   KEGG
   ko03010 Ribosome 89 (4.38%) 0.000000
   ko03050 Proteasome 37 (1.82%) 0.000000
   ko04141 Protein processing in endoplasmic reticulum 97 (4.77%) 0.000000
   ko04120 Ubiquitin mediated proteolysis 84 (4.13%) 0.000001
   ko03420 Nucleotide excision repair 33 (1.62%) 0.000002
   ko05171 Coronavirus disease—COVID-19 75 (3.69%) 0.000010
   Normal adult tissues GO
   GO:0044267 cellular protein metabolic process Biological process 743
   (20.74%) 0.000000
   GO:0046907 intracellular transport Biological process 298 (8.32%)
   0.000000
   GO:0051649 establishment of localization in cell Biological process 323
   (9.01%) 0.000000
   GO:0044260 cellular macromolecule metabolic process Biological process
   1577 (44.01%) 0.000000
   GO:0005622 intracellular Cellular component 1364 (57.97%) 0.000000
   GO:0044424 intracellular part Cellular component 1312 (55.76%) 0.000000
   GO:0005623 cell Cellular component 1458 (61.96%) 0.000000
   GO:0044464 cell part Cellular component 1458 (61.96%) 0.000000
   GO:0043226 organelle Cellular component 1084 (46.07%) 0.000000
   GO:0,043,229 intracellular organelle Cellular component 1056 (44.88%)
   0.000000
   GO:0005488 bind Molecular function 3371 (74.51%) 0.000000
   GO:0005515 protein binding Molecular function 1763 (38.97%) 0.000000
   GO:0043168 anion binding Molecular function 801 (17.71%) 0.000000
   GO:0017076 purine nucleotide binding Molecular function 668 (14.77%)
   0.000000
   KEGG
   ko03010 Ribosome 96 (5.02%) 0.000000
   ko00190 Oxidative phosphorylation 67 (3.51%) 0.000000
   ko03050 Proteasome 38 (1.99%) 0.000000
   ko05171 Coronavirus disease—COVID-19 85 (4.45%) 0.000000
   ko05415 Diabetic cardiomyopathy 85 (4.45%) 0.000000
   ko05016 Huntington disease 141 (7.38%) 0.000000
   Hindgut under sulfide stress GO
   GO:0006468 protein phosphorylation Biological process 329 (4.82%)
   0.000000
   GO:0007264 small GTPase mediated signal transduction Biological process
   225 (3.3%) 0.000000
   GO:0051641 cellular localization Biological process 558 (8.18%)
   0.000000
   GO:0051649 establishment of localization in cell Biological process 513
   (7.52%) 0.000000
   GO:0005622 intracellular Cellular component 2248 (48.43%) 0.000000
   GO:0005623 cell Cellular component 2461 (53.02%) 0.000000
   GO:0044464 cell part Cellular component 2461 (53.02%) 0.000000
   GO:0044424 intracellular part Cellular component 2147 (46.25%) 0.000000
   GO:0043226 organelle Cellular component 1776 (38.26%) 0.000000
   GO:0043229 intracellular organelle Cellular component 1729 (37.25%)
   0.000000
   GO:0005515 protein binding Molecular function 3325 (38.6%) 0.000000
   GO:0005488 binding Molecular function 6141 (71.3%) 0.000000
   GO:0043168 anion binding Molecular function 1372 (15.93%) 0.000000
   GO:0017076 purine nucleotide binding Molecular function 1119 (12.99%)
   0.000000
   KEGG
   ko05415 Diabetic cardiomyopathy 102 (3.23%) 0.000004
   ko00190 Oxidative phosphorylation 73 (2.31%) 0.000004
   ko05017 Spinocerebellar ataxia 97 (3.07%) 0.000008
   ko03050 Proteasome 38 (1.2%) 0.000013
   ko04136 Autophagy—other 36 (1.14%) 0.000026
   ko04120 Ubiquitin mediated proteolysis 107 (3.39%) 0.000026
   [113]Open in a new tab

Validation of candidate and commonly used reference genes expression
stability by RT-qPCR assay

   Five candidates and five commonly used reference genes were selected
   for validation and comparison of the expression stability in this study
   (Additional file [114]1: Table S1). Five of the top ten candidate
   reference genes were chosen for RT-qPCR in each case, as shown in Table
   [115]1. The five candidate reference genes were FXRD1, CAPR1, NSMA,
   IF2M, and STX12 in early development; UBE4A, OGT1, EHMT1, GGA3, and
   TRA2B in normal adult tissues and CNOT1, CLH1, EXOC6, LYAG, and TRFM in
   the hindgut under sulfide stress. The five commonly used reference
   genes were ATPase B, TBP, eIF3, ACTB, and GAPDH (Table [116]3).

Table 3.

   Detailed information on the five commonly used reference genes in the
   early development, normal adult tissues, and the hindgut under sulfide
   stress of the U. unicinctus
   GenBank ID Gene name Early development Normal adult tissues Hindgut
   under sulfide stress
   CV Mean CV Mean CV Mean
   [117]MN662252 ATPase B 0.0623 14.128 0.0424 15.415 0.0150 16.327
   [118]OQ111925 TBP 0.0956 13.568 0.0360 15.435 0.0154 16.325
   [119]MN662253 eIF3 0.0517 14.205 0.0495 15.361 0.0131 16.331
   [120]GU592178 ACTB 0.2601 12.530 0.1265 14.444 0.0132 16.330
   [121]EF012775 GAPDH 0.0887 13.880 0.0607 15.235 0.0202 16.312
   [122]Open in a new tab

   Boxplots were constructed to present the expression levels of five
   candidate reference genes and five commonly used reference genes under
   all three conditions (Fig. [123]2). As shown in Fig. [124]2, Table
   [125]1, and Table [126]3, the novel candidate reference genes possessed
   higher stability than the commonly used reference genes in all three
   conditions. The variances in the commonly used reference genes were
   different among three cases. Three of the five commonly used reference
   genes, ACTB, GAPDH, and TBP are unstable during early development. ACTB
   was the most unstable reference gene in early development and normal
   adult tissues. GAPDH have the highest variance in the hindgut under
   sulfide stress, followed by TBP.

Fig. 2.

   [127]Fig. 2
   [128]Open in a new tab

   The boxplots that show the log[2]TPM values of the five commonly used
   reference genes and five candidate reference genes in early development
   (A), normal adult tissues (B) and hindgut under sulfide stress

   To further examine the results of transcriptome analysis, RT-qPCR
   experiments were carried out, and the expression level constancy of ten
   reference genes in different cases was assessed by four data processing
   methods (geNorm, NormFinder, BestKeeper, and ∆Ct) [[129]57–[130]60].
   Despite a slight difference in the samples used between transcriptome
   data and RT-qPCR, the results of transcriptome analysis and RT-qPCR
   assay are very similar, which suggests that candidate reference genes
   have higher stability than most of the commonly used genes. As shown in
   Fig. [131]3A, during early development, syntaxin-12 (STX12) was the
   most stable gene, which is a member of the syntaxin family localized to
   the endosome [[132]61]. The syntaxin family belongs to the t-SNARE
   subfamily of the SNARE superfamily and is involved in vesicle
   trafficking [[133]62, [134]63]. STX12 is widely expressed and
   potentially participates in a common trafficking event that occurs in
   every cell [[135]64–[136]66], which explains why STX12 showed stable
   expression levels during early development. The most stable reference
   gene in different adult tissues was euchromatic histone-lysine
   N-methyltransferase 1 (EHMT1) (Fig. [137]3B). EHMT1 and euchromatic
   histone-lysine N-methyltransferase 2 (EHMT2) are highly homologous and
   generate functional heterodimeric complexes that are mainly responsible
   for mono- and dimethylation of histone H3 lysine 9 (H3K9) in
   euchromatin [[138]67]. EHMT1/EHMT2 is essential for maintaining the
   normal methylation patterns of H3K9 and plays a central role in the
   epigenetic control of euchromatin, which is vital for normal cell
   function. They are universally expressed and associated with many
   biological processes [[139]67–[140]71]. EHMT1 is also required for
   normal levels of DNA methylation in facultative heterochromatin
   [[141]72]. The stable expression levels of EHMT1 and its key role in
   cells enabled its use as a reference gene in normal adult tissues. As
   to hindgut under sulfide stress, the most stable gene was lysosomal
   alpha-glucosidase (LYAG) (Fig. [142]3C), which is a retaining
   exo-glucosidase catalyzing the production of glucose from glycogen in
   lysosomes [[143]73, [144]74]. LYAG is extremely important for the
   degradation of glycogen in lysosomes [[145]75]. This defect can cause
   the substrate to accumulate in almost all body tissues [[146]76].
   Alpha-glucosidases have weak specificity and a given substrate is not
   strictly connected to a single type of protein [[147]77]. The
   expression levels of LYAG were not affected by sulfide treatment.
   Therefore, STX12, EHMT1 and LYAG can be selected as reference genes to
   normalize the results of the RT-qPCR assay in early development, normal
   adult tissues, and hindgut after sulfide stress in U. unicinctus,
   respectively.

Fig. 3.

   Fig. 3
   [148]Open in a new tab

   Expression stability of the ten genes in early development (A), normal
   adult tissues (B), and hindgut under sulfide stress (C). Stability was
   estimated using geNorm, NormFinder, BestKeeper, and ∆Ct analyses of the
   RT-qPCR data. The genes were ranked in descending order of expression
   stability from left to right after comprehensive analysis

   Compared to the novel screened candidate reference genes, most of the
   commonly used reference genes had lower comprehensive ranking values.
   During early development, ACTB was the most variable, followed by TBP,
   which is consistent with the transcriptome data of the two genes with
   high CV values (Fig. [149]3A, Table [150]3). Similarly, in different
   normal adult tissues, ACTB had the lowest comprehensive analysis
   ranking values of RT-qPCR and the highest variance, which was not
   included in the candidate reference gene list. TBP and ATPase B were
   more stable with relatively high rankings of stability by RT-qPCR
   compared with other traditional reference genes (Fig. [151]3B), which
   is expected because the two genes are candidate reference genes that
   pass the criteria and have lower CV values in five commonly used
   reference genes (Table [152]3). The results of the comprehensive
   analysis in early development and normal adult tissues are in
   accordance with those of previous studies in U. unicinctus [[153]52].
   During sulfide stress in the hindgut, ACTB was the most unstable in the
   comprehensive RT-qPCR results (Fig. [154]3C). However, there was also
   some inconsistency between the results of RT-qPCR and transcriptome
   data analysis. For example, GAPDH, which ranked second after STX12 in
   the RT-qPCR results during early development (Fig. [155]3A), was not
   included in the candidate reference gene list with a 1.23 of SD
   log[2](TPM) value by transcriptome data analysis. We deduced that the
   inconsistent phenomenon may result from the difference in the samples
   between the transcriptome data and RT-qPCR.

Relationship of gene expression level between transcriptome data and RT-qPCR

   Previous studies have suggested that there is a high correlation
   between RNA-Seq data and Ct value of RT-qPCR [[156]11]. Therefore, we
   assessed the relationship between the TPM values of the transcriptome
   data and RT-qPCR data. As shown in Fig. [157]4, there was a significant
   negative correlation between log[2](TPM) and Ct values (R^2 = 0.0453,
   P < 0.0001), with the formula Ct =  − 0.5405 log[2](TPM) + 34.51. This
   formula will contribute to the estimation the Ct value based on
   transcriptome data without executing the RT-qPCR assay, and will be
   conducive to our further research.

Fig. 4.

   [158]Fig. 4
   [159]Open in a new tab

   The correlation of gene expression between log[2]TPM of transcriptome
   data and Ct values of RT-qPCR

Conclusions

   In this study, we identified candidate reference genes for embryos and
   larvae of early developmental stages, normal adult tissues, and the
   hindgut under sulfide stress based on transcriptome data from U.
   unicinctus. We then validated of the candidate reference genes by
   RT-qPCR using four methods (geNorm, NormFinder, BestKeeper, and ∆Ct)
   and compared the stability between the candidate reference genes and
   commonly used reference genes. The results showed that STX12, EHMT1,
   and LYAG are the most stable candidate reference genes in early
   development, normal adult tissues, and the hindgut under sulfide
   stress, respectively. Our study indicates that transcriptome analysis
   approaches have great potential to discover novel stable reference
   genes and will contribute to future gene expression level research in
   U. unicinctus.

Materials and methods

Animals materials and treatments

   Adult U. unicinctus were collected from the intertidal zone along the
   coast of Yantai city, China. They were maintained in aerated seawater
   (19℃, pH8.0, salinity 30 PSU) and raised with Chaetoceros muelleri,
   Chlorella vulgaris, and Platymonas helgolandica.

   We selected three healthy adult worms and dissected six tissues,
   including the body wall, coelomic fluid, foregut, mid-gut, hindgut and
   anal sac from each individual in phosphate-buffered saline (PBS,
   pH7.4). After dissection, the tissues were immediately frozen in liquid
   nitrogen and stored at -80℃.

   Sexually mature individuals were selected and dissected to acquire
   mature ova and sperm from nephridia (gonoducts) during the spawning
   season. Sperm and ova were then mixed for artificial insemination at
   ratio of 10:1. Fertilized eggs were reared in filtered seawater (FSW)
   (17℃, pH 7.9, salinity 30). Embryos and larvae from ten developmental
   stages, including early cells (EC, 2 cells, 4 cells, and 8 cells),
   multiple cell (MC, generally more than 32 cells), blastulae (BL),
   gastrulae (GA), early trochophore larva (ET, 1 d post fertilization,
   dpf), mid-trochophore larva (MT, 2 dpf), late-trochophore larva (LT, 25
   dpf), early segmentation larva (ES, 30 dpf), segmentation larvae (SL,
   35 dpf) and worm-shaped larvae (WL, 42 dpf), were collected, frozen
   immediately in liquid nitrogen and then stored at -80℃ for total RNA
   extraction. Three biological replicates were prepared for each
   developmental stage.

   The experimental system and sulfide treatment were conducted as
   described previously [[160]45]. We prepared three aquariums containing
   30 L of seawater and sealed them with a cling film. Six individuals
   were randomly selected for placement in each aquarium. The sulfide
   concentration in seawater was maintained at 50 μM (equivalent to
   moderately polluted sediment that U. unicinctus can live normally) by
   adding the sulfide stock solution (10 mM Na[2]S, pH 8.0) at 2 h
   interval, and detected by the methylene blue method [[161]41]. The
   hindguts of three individuals from each aquarium were dissected 0
   (control), 6, 24, and 48 h after sulfide treatment. The hindgut was
   immediately frozen in liquid nitrogen and stored at − 80 °C.

Transcriptome datasets

   Transcriptome data of U. unicinctus embryos and larvae at various
   developmental periods were obtained from the NCBI Sequence Read Archive
   (SRA) database under the accession numbers PRJNA485379 and PRJNA394029,
   mainly including the following stages, EC: Early cells; MC:
   Multicellular; BL: Blastula; GA: Gastrula; ET: Early-trochophore; MT:
   Mid-trochophore; LT: Late-trochophore; SL: Segmentation larva and WL:
   Worm-shaped larva. The transcriptome data of the normal adult tissues
   were also obtained from the NCBI SRA database under the accession
   number PRJNA917787, which mainly included the body wall, coelomic
   fluid, foregut, mid-gut, hindgut, and anal sac. Transcriptome data of
   the sulfide stress hindgut samples (50 μM for 0, 6, 24, and 48 h) were
   also obtained from the NCBI SRA database under the accession number
   PRJNA752504.

Identification of reference genes based on transcriptome data

   Reference genes for RT-qPCR were selected using the coefficient of
   variation (CV) method as previously described [[162]11]. TPM values
   were used to measure gene expression levels and averaged for subsequent
   analyses of biological replicates. Firs, genes with log[2](TPM) values
   less than or equal to 5 were excluded, because these low-expression
   genes would lead to poor RT-qPCR results, which makes it difficult to
   detect and quantify their expression. CV values were calculated using
   the formula CV = standard deviation (SD) of log[2](TPM) / average
   log[2](TPM) (mean). Calculations for the mean, SD, and CV were
   implemented in Microsoft Excel. Candidate reference genes with low
   variances were required, with SD values lower than 1. Therefore, a CV
   cut-off of 0.2 for stable genes was adopted, which was the cut-off for
   stable expression across heterogeneous genes.

Functional enrichment analysis

   To further understand the functions of the selected candidate reference
   genes, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and
   Genomes (KEGG) enrichment analysis were performed. The Swiss-Prot Blast
   results for all genes and the results of reference genes were imported
   into the online software OmicShare Tools
   ([163]https://www.omicshare.com/tools/home/index/index.html), and the
   GO and KEGG enrichment analysis was completed using the Bioinformatics
   Cloud Tool Platform [[164]78–[165]80].

RNA isolation and cDNA synthesis

   Total RNA was extracted from the stored different samples using TRIzol
   reagent (Invitrogen, Carlsbad, CA, USA), according to the
   manufacturer’s instructions. The RNA quality was assessed using
   NanoDrop 2000 (Thermo Scientific, Wilmington, DE, USA) and agarose gel
   electrophoresis. Then, the cDNA template was prepared using a
   PrimeScript™ RT reagent kit with gDNA Eraser (TaKaRa, Dalian, China)
   following the manufacturing’s instruction, and diluted with distilled
   water (1:10) for subsequent experiments.

Validation of the reference gene expression stability by RT-qPCR assay

   Ten genes, consisting of five novel candidates and five commonly used
   reference genes, were chosen for RT-qPCR validation in early
   development, normal adult tissues, and the hindgut under sulfide
   stress. Primers were designed using Primer Primier software (5.0) and
   the primer sequences are listed in Additional file [166]1: Table S1.

   RT-qPCR was performed on Light Cycler 480 system (Roche, Basel,
   Switzerland) using SYBR Premix Ex TaqTM (TaKaRa, Dalian, China). All
   reactions were carried out with three sample replicates and three
   technical replicates, and all RT-qPCR assays were validated in
   compliance with “MIQE guidelines” [[167]81].

   Four statistical approaches, geNorm ([168]https://genorm.cmgg.be/)
   [[169]57], NormFinder ([170]http://moma.dk/) [[171]58], BestKeeper
   ([172]www.gene-quantification.com/bestkeeper.html) [[173]59], and ∆Ct
   method [[174]60], were applied to estimate the expression stability of
   the reference genes. The final ranking of gene expression stability was
   determined by calculating the geometric mean values of the results
   acquired using the four approaches.

Statistical analysis

   Statistical analysis of the correlation between FPKM and Ct values was
   performed using one-way analysis of variance (one-way ANOVA).
   Statistical significance was set at P < 0.05.

Supplementary Information

   [175]12864_2023_9358_MOESM1_ESM.xlsx^ (11.6KB, xlsx)

   Additional file 1:  Table S1. Information of primers of reference genes
   in RT-qPCR validation.

Acknowledgements