Graphical abstract (image file fx1.jpg)

Highlights

* Toti is the first-in-field multi-omics resource focused on totipotency
* Toti enables intuitive and comparative in silico exploration of totipotency
* Toti provides interactive single-cell transcriptomic analysis of totipotent states
* Toti offers motif- and pathway-based analyses

In this article, Sheng, Fu, and colleagues present Toti, the first comprehensive multi-omics database focused on totipotency, enabling in silico exploration of its transcriptional and epigenetic regulation. Toti integrates data from in vivo, in vitro, and genome-edited models across 8,284 human and mouse embryonic samples, allowing flexible search, visualization, and analysis of totipotent and related stem cell states.

Introduction

Totipotent stem cells (TSCs), which emerge in early embryogenesis and have the broadest cellular plasticity in the mammalian body, harbor enormous potential for regenerative medicine and reproductive technology (Cai et al., 2022). Relative to the developmentally more restricted pluripotent stem cells (PSCs), TSCs are capable of producing all of the differentiated cell types in both embryonic and extraembryonic components and of forming an entire organism (Wu and Schöler, 2016). Totipotency is limited to early-stage blastomeres. In mice, only the zygotes and blastomeres from 2-cell embryos are bona fide TSCs and can give rise to the blastocyst, which is composed of the inner cell mass (ICM) and outer trophectoderm (Figure 1A) (Sotomaru et al., 1998). As cells develop into distinct cell lineages by the blastocyst stage, totipotency is lost and cellular plasticity is gradually reduced. Currently, due to the scarcity of embryonic TSCs, the gene regulatory logic underlying totipotency is still incompletely understood.

Figure 1.
Toti provides multi-omics data to investigate the epigenetic and transcriptomic underpinnings of totipotency
(A) Toti contains multi-omics data from in vivo, in vitro, and genome-edited embryonic TSCs, TSC-like cells, PSCs, and embryos spanning preimplantation stages from 4,680 human and 3,604 mouse samples. FSHD, facioscapulohumeral muscular dystrophy; PSCs, pluripotent stem cells; TSCs, trophoblast stem cells; 8CLCs, 8-cell like cells; EPSCs, expanded pluripotent stem cells; EPSs, extended pluripotent stem cells; H9-e4CL, H9 hESC cultured in e4CL (stepwise) for 5 days; TBLCs, totipotent blastomere-like cells; ESCs, embryonic stem cells; H9-4CL, H9 hESC cultured in 4CL (stepwise) for 12 days; C2C12, cultured mouse myoblasts; MEF, mouse embryonic fibroblasts; 2CLCs, 2-cell-like cells; TPSs, totipotent potential stem cells; TLSCs, totipotent-like stem cells; ciTotiSCs, chemically induced totipotent stem cells; ZGA, zygotic genome activation.
(B) A Circos plot showing read coverage for gene expression (orange), H3K4me3 (blue), and H3K27ac (green) histone-modified peaks in mESC, 2CLC, and 2-cell embryo. The black lines in the center represent genomic gaps or low-mappability regions, such as centromeres and other repetitive sequences.
(C) Read coverage represents gene expression (orange), H3K4me3 (blue), and H3K27ac (green) modifications around totipotent signature genes, Zfp352, Zscan4b, and Zscan4c, and pluripotent signature gene Klf4, in 2-cell embryo, 2CLC, and mESC. mESC, mouse embryonic stem cell.

In vitro cellular models are critical for understanding the molecular architecture of cell stemness. For instance, embryonic stem cells (ESCs), derived from the ICM of blastocysts, are classic pluripotent cellular models and have greatly facilitated the exploration of pluripotency (Rossant and Tam, 2009).
Intriguingly, approximately 1%–5% of ESCs resemble the blastomeres of 2-cell embryos; these cells, referred to as “2-cell-like cells” (2CLCs), arise spontaneously in ESC culture in vitro. They show greater cellular plasticity and downregulated protein levels of pluripotency factors (Rodriguez-Terrones et al., 2018). Unlike pluripotent ESCs, 2CLCs retain a totipotent-like state and can generate both embryonic and extraembryonic tissues when reintroduced into early embryos (Yang et al., 2020), highlighting the promising application of these cells in understanding totipotency. Other in vitro totipotent-like cellular counterparts with biological relevance, including totipotent blastomere-like cells (TBLCs) (Shen et al., 2021), totipotent potential stem cells (TPSs) (Xu et al., 2022), and chemically induced totipotent stem cells (ciTotiSCs), have since been actively pursued (Malik and Wang, 2022) and are also widely used in totipotency studies.

Recent advances in totipotent cellular models and sequencing technology have led to an ever-growing influx of multi-omics data on totipotent-like cells. These results have rapidly advanced our understanding of the epigenomic and transcriptomic features governing the establishment and exit of totipotency. For example, previous studies have identified the transcription factor (TF) Dux as the master inducer of the 2C-like transition in mouse embryonic stem cells (mESCs) (Hendrickson et al., 2017), based on RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) data from CRISPR-based genome-edited embryonic cells. Supported by RNA-seq data, 2CLCs are typically characterized by transient activation of major satellites (Kresoja-Rakic and Santoro, 2019) and endogenous retroviral elements that significantly contribute to promoting zygotic genome activation (ZGA) (Lu and Zhang, 2015).
In particular, as shown by comprehensive epigenetic information measured by ATAC-seq, bisulfite sequencing, and ChIP-seq, the endogenous retrovirus MERVL activated many 2C-specific genes, including the Zscan4 cluster genes, likely through epigenetic variations, such as DNA methylation and histone modifications associated with DNMT (Eckersley-Maslin et al., 2016) and LSD1 (KDM1A) (Wasson et al., 2016). Other features of 2-cell embryos, such as their chromatin accessibility landscape (Hendrickson et al., 2017) and increased global histone mobility (Bošković et al., 2014), are also recapitulated in 2CLCs. Moreover, the remodeling of the totipotency-specific broad H3K4me3 domains could help stabilize totipotency in vitro (Yang et al., 2022), highlighting the importance of characterizing the epigenetic underpinnings of totipotency. In addition, using scRNA-seq data, our previous studies identified a novel intermediate state during the 2C-like transition (Fu et al., 2019, 2020). Therefore, extensive investigation of epigenetic and transcriptomic architectures in totipotent cells is of paramount importance for pinpointing the molecular mechanisms controlling cellular plasticity and lineage segregation in early development. Integrative analysis of multi-omics data in both embryonic cells and their in vitro counterparts makes it possible to comprehend the molecular basis of in vivo cell fate differentiation and in vitro cell fate engineering.

Currently, existing databases mainly contain information on pluripotent rather than totipotent cells and little information on CRISPR-based genome-edited embryonic cells.
For example, EMAGE (Christiansen et al., 2006), DBTMEE (Park et al., 2015), EmExplorer (Hu et al., 2019), StemCellDB (Mallon et al., 2013), LifeMap Discovery (Edgar et al., 2013), FunGenES (Schulz et al., 2009), and StemMapper (Pinto et al., 2018) provide gene expression patterns during mouse or human embryo development and in induced pluripotent stem cells. ScRNASeqDB (Cao et al., 2017) provides single-cell gene expression profiles in 200 human early embryonic cells at different developmental stages. MethBank (Li et al., 2018), DevMouse (Liu et al., 2014), iHMS (Gan et al., 2017), GED (Bai et al., 2017), and MetaImprint (Wei et al., 2014) focus mainly on epigenetic modifications, such as DNA methylation and histone modifications, during embryogenesis in human and mouse. EpiDenovo (Mao et al., 2018), ESCAPE (Xu et al., 2013), DevOmics (Yan et al., 2021), and dbEmbryo (Huang et al., 2022) integrate genomic, epigenomic, and transcriptomic data from human and mouse early embryos but do not include TSCs. However, none of the existing databases provides multi-omics data on in vivo, in vitro, and genome-edited totipotent cells, which greatly hinders the interpretation of the mechanisms that contribute to totipotency and embryo development.

To bridge this gap, here we present Toti, a first-in-field multi-omics database dedicated to comprehensively investigating epigenetic and transcriptomic architectures in early-stage embryogenesis of mouse and human (Figure 1A). Toti encompasses in vivo, in vitro, and genome-edited embryonic cells, with a particular focus on totipotent cells, thus providing a platform for extensive investigation of totipotency in silico (Figure 1A). Toti provides quick access to samples/datasets of interest through searches by gene, sequencing type, or other relevant keywords.
Toti enables comparative analyses not only of epigenetic characteristics measured by multiple methods, including bisulfite sequencing, ATAC-seq, ChIP-seq, CUT&RUN, and CUT&Tag, but also of temporal gene expression patterns supported by RNA-seq and single-cell RNA-seq (scRNA-seq) data, to dissect molecular (dis)similarity across in vivo and in vitro totipotent cells, PSCs, and other embryonic cells in human and mouse (Figure 1B). Toti also helps users prioritize top enriched TFs, genes, and pathways under different developmental stages and genome-edited conditions. Taken together, Toti serves as a unique, comprehensive, and valuable resource that provides insights into the molecular characteristics shaping totipotency.

Results

Toti overview

As characterizing the molecular basis of TSCs is key to comprehending the mechanisms underlying totipotency, we designed Toti to store gene expression and epigenetic modification information covering in vivo, in vitro, or CRISPR-based genome-edited totipotent, pluripotent, and differentiated embryonic cells (Figure 1A) from 4,680 human and 3,604 mouse samples (Tables S1 and S2). Toti contains: (1) DNA methylation (WGBS, PBAT, and RRBS); (2) genome-wide chromatin features, including open chromatin peaks (ATAC-seq), TF-binding sites, and histone modifications (ChIP-seq, CUT&Tag, and CUT&RUN); and (3) gene expression patterns (RNA-seq and scRNA-seq) from different types of cells. Based on these multi-omics data in Toti, we observed globally similar patterns of peaks in gene expression, H3K4me3, and H3K27ac modifications in mouse 2-cell embryos and 2CLCs, which were distinct from those in mESCs (Figure 1B).
Specifically, for totipotent genes such as Zfp352 (Li et al., 2023), Zscan4b, and Zscan4c (Falco et al., 2007), enriched peaks in gene expression, H3K4me3, and H3K27ac were present in both mouse 2-cell embryos and 2CLCs, but not in mESCs, highlighting their totipotency-specific signatures in 2-cell embryos and 2CLCs (Figure 1C). As a comparison, we also observed mESC-specific peaks in gene expression, H3K4me3, and H3K27ac around the pluripotent gene Klf4 (Figure 1C) (Guo et al., 2009). Hence, by integrative analysis of these orthogonal omics data from cells under different developmental stages or genome-edited conditions, Toti makes it possible for users to comprehensively investigate totipotency in silico.

Toti is designed with three main functionalities: Search, Browse, and Analysis (Figure 2).

Figure 2. The main functions of Toti
(A) “Search” module allows a flexible search of genes, sequencing types, or other relevant keywords.
(B) “Browse” includes two sub-modules. “Genome Browse” visualizes analyzed multi-omics data by embedding configured JBrowse. “Featured Gene” visualizes expression patterns of top variable signature genes or user-selected genes under selected developmental stages or genome-edited conditions. ZGA, zygotic genome activation.
(C) “Analysis” contains three sub-modules. “Motif Enrichment” and “Pathway Enrichment” prioritize enriched transcription factors and biological pathways, respectively, associated with the developmental stage or genome-edited condition of interest. “scRNA-seq Analysis” facilitates the intuitive comparison of gene expression or co-expression patterns among embryos under selected developmental stages or genome-edited conditions.
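As an illustration of the “top variable” ranking behind “Featured Gene”, which the article bases on the standard deviation of gene expression among the selected samples, the following is a minimal Python sketch. The data layout, the function name `top_variable_genes`, the toy values, and the choice of population rather than sample standard deviation are all assumptions made for illustration; they are not taken from the Toti implementation.

```python
from statistics import pstdev


def top_variable_genes(expr, samples, n_top=3):
    """Rank genes by the standard deviation of expression across chosen samples.

    expr: {gene: {sample: expression value}} (hypothetical layout).
    Returns (gene, standard deviation) pairs, most variable first.
    """
    scored = []
    for gene, by_sample in expr.items():
        values = [by_sample[s] for s in samples]
        # Population SD is an assumption; the article does not specify the variant.
        scored.append((gene, pstdev(values)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_top]


# Toy values for illustration only; not taken from Toti.
expr = {
    "Zscan4c": {"2C_embryo": 8.0, "2CLC": 7.0, "mESC": 0.0},
    "Klf4": {"2C_embryo": 1.0, "2CLC": 2.0, "mESC": 6.0},
    "Actb": {"2C_embryo": 9.0, "2CLC": 9.0, "mESC": 9.0},
}
ranking = top_variable_genes(expr, ["2C_embryo", "2CLC", "mESC"], n_top=2)
print([gene for gene, _ in ranking])
```

In this toy input, the gene with identical values in every sample has zero spread and drops out of the top list, which is the behavior a variability-based ranking is meant to produce.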
Search

Toti enables a flexible search that provides quick access to totipotency-related studies/datasets of interest by querying genes, sequencing types, species, or other relevant keywords, such as histone modification type, cell type, library strategy, article name, and GEO accession ID. All the queried tables recording meta information of data stored in Toti are sortable and downloadable. Users can also click the hyperlink of each GEO accession ID to obtain further experimental details or acquire the corresponding raw data (Figure 2A).

Browse

Systematic scrutiny of epigenetic and transcriptomic architectures, specifically for signature genes representing totipotency or pluripotency, is essential for understanding the key factors governing the establishment and exit of totipotency in embryos. Toti supports both orthogonal and parallel comparative investigation of this molecular basis with “Genome Browse” and “Featured Genes” (Figure 2B).

Genome browse: Allows orthogonal and parallel comparative analysis

Toti provides extensive investigation into gene expression and epigenetic variations, such as open chromatin peaks, histone modifications, and DNA methylation, across in vivo, in vitro, and genome-edited totipotent, pluripotent, and differentiated cells by embedding our configured JBrowse (Figure 2B). Data are hierarchically organized based on species, sequencing type, epigenetic modification, and gene expression. The interface permits the selection of one or multiple tracks to visualize analyzed multi-omics data. By specifying the species, cell type, gene, or chromosome region of interest, users can display tracks for the corresponding regions. Queried results can be shared with others for further discussion by clicking the “Share” button at the top right of the Genome Browser. Users can also add custom data tracks for personalized comparison.
All the resulting images, such as plots for gene structure, read coverage, and histone modification peaks, can be easily exported.

Featured genes: Provides insights into expression patterns of signature genes

The developmental trajectory of totipotent cells during early development is governed by highly coordinated transcriptomic remodeling. After fertilization, the maternal genes inherited from oocytes drive two waves of ZGA, minor ZGA and major ZGA (Lu and Zhang, 2015). These ZGA genes modulate the entry and exit of totipotency in embryos. Understanding the intricate interplay of these signature genes is critical for comprehending the biological nuances of totipotency. Here, we designed the “Featured Genes” sub-module (Figure 2B) to allow users to scrutinize expression patterns of signature genes, including maternal, minor ZGA, major ZGA, totipotent, and pluripotent genes defined by Yang et al. (2022), across both in vivo and in vitro TSCs and cells under different developmental stages or genome-edited circumstances. Users can customize their search by choosing featured genes of interest and specifying the cells to compare. Toti prioritizes development- or genome-edited-condition-associated genes based on the standard deviation of gene expression among selected samples. Subsequently, interactive heatmaps are rendered, providing a visual representation of top variable signature genes across chosen samples. Alternatively, users can select any gene of interest and visualize its expression pattern across selected samples. The expression matrix of each heatmap can be downloaded and further tailored for customized use.

Analysis

It is of fundamental importance to pinpoint top enriched TFs, genes, and pathways under different totipotency-associated developmental stages or genome-edited conditions, as these analyses could shed light on the gene regulatory logic contributing to totipotency.
Here, we designed three sub-modules, “Motif Enrichment,” “Pathway Enrichment,” and “scRNA-seq analysis,” in the “Analysis” section (Figure 2C) to provide insights into the top development- or genome-edited-condition-associated TFs, genes, or pathways and to facilitate the dissection of gene expression patterns at single-cell resolution in totipotency-related samples.

Motif enrichment: Prioritizes top associated TFs

As TFs play important roles in modulating cellular gene expression, we developed a “Motif Enrichment” sub-module to prioritize top associated TFs under different genome-edited conditions, histone modifications, or developmental stages (Figure 2C). When users select epigenetic modifications of interest, such as open chromatin peaks (ATAC-seq), TF-binding sites, or histone modifications (ChIP-seq, CUT&Tag, or CUT&RUN), in cells under certain conditions, Toti provides enriched TF-binding motifs analyzed by HOMER (http://homer.ucsd.edu/homer/). Further details for motifs are accessible by clicking links, such as “motif file” and “Known Motif Enrichment Results.”

Pathway enrichment: Prioritizes top associated biological pathways

Previous studies indicated that recapitulating epigenetic modifications of 2-cell embryos, such as the chromatin accessibility landscape (Hendrickson et al., 2017) and broad H3K4me3 (Yang et al., 2022), can facilitate totipotency stabilization in vitro. This underscores the importance of characterizing the biological pathways activated during the establishment, self-renewal, or exit of totipotency. Therefore, we developed the “Pathway Enrichment” sub-module (Figure 2C), providing users with an intuitive and interactive chart, along with a sortable and downloadable table that prioritizes key relevant biological pathways based on gene ontology (GO) (Ashburner et al., 2000) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000).
This sub-module enables users to gain insights into biological context-dependent epigenetic modifications in in vivo, in vitro, or genome-edited totipotent, pluripotent, and differentiated cells. Each resulting table can be sorted and downloaded. More information, such as the genes involved in each pathway, is available in the downloaded CSV file.

scRNA-seq analysis: Dissects transcriptome nuances at single-cell resolution

Using scRNA-seq data sequenced at different time points of Dux induction, we previously confirmed the existence of an intermediate state during the 2C-like transition (Fu et al., 2019), emphasizing that scrutiny of the transcriptome at single-cell resolution could promote a better understanding of the molecular mechanisms underlying the establishment and exit of totipotency. Here, we present an interactive “scRNA-seq analysis” interface to facilitate the intuitive comparison of gene expression or co-expression across cells under different developmental stages and TSC-like cells with or without genome editing (Figure 2C). To our knowledge, this is the first online scRNA-seq analysis tool tailored to investigating the transcriptomic nuances of totipotency.

A case study: A systematic integrated analysis of epigenetic and transcriptomic architectures in totipotency

Here, we present a case study to illustrate how the functional modules in Toti, such as “Featured Gene,” “Genome Browse,” “Motif Enrichment,” “Pathway Enrichment,” “scRNA-seq analysis,” and “Download,” can be effectively utilized to explore the epigenetic and transcriptomic nuances of totipotency. In this case study, we mainly chose five mouse cell types for comparison: pluripotent mESCs, embryonic TSCs (2-cell embryos), and three in vitro TSC models (2CLCs, TPSs, and ciTotiSCs).
2CLCs are naturally occurring, transient totipotent-like cell populations within mESC cultures, whereas TPSs and ciTotiSCs are totipotent-like cell lines generated through distinct chemical reprogramming protocols. All three in vitro TSC models were previously found to show upregulated expression of totipotency factors similar to that in totipotent 2-cell (2C) embryos and to be distinct from mESCs (Hu et al., 2023). Our case study aims to delineate the epigenetic and transcriptomic architectures recapitulated by these in vitro TSCs through systematic integrated analysis of multi-omics data available in Toti. This case study not only sheds light on how to leverage these modules in totipotency exploration but also highlights their potential applications in advancing our understanding of totipotent states.

Deciphering transcriptome discrepancies in totipotency across in vitro TSC models

The in vitro TSCs exhibit totipotent-like features in terms of transcriptome and developmental potential and thus serve as alternative models for totipotency studies (Xu and Liang, 2022). To evaluate which totipotency-associated characteristics these in vitro TSCs recapitulate, we chose embryos across different developmental stages and compared the molecular (dis)similarity between these cellular models and in vivo cells using scRNA-seq analysis. In addition, mESCs were included in the analysis as controls. Consistent with previous findings (Hu et al., 2023), we observed that all in vitro TSCs showed higher similarity to the totipotent 2-cell embryos than mESCs did (Figure 3A). Interestingly, although these TSC models were all considered in vitro totipotent-like cells sharing core totipotent features, discrepancies in their gene expression patterns suggest that they may capture different molecular states within the totipotency spectrum, as reflected by their separate clustering positions in the UMAP (Figure 3A).

Figure 3.
Transcriptome discrepancies in totipotency across in vitro TSC models
(A) The UMAP of in vitro mouse TSC models (2CLCs, ciTotiSCs, and TPSs), mESCs, and mouse embryo cells spanning from zygote to blastocyst (n = 34,874 cells). 15 cell clusters were identified.
(B) Expression patterns of maternal, minor ZGA, major ZGA, totipotent, and pluripotent genes in oocyte, zygote, 2-cell embryo, mESC, 2CLC, ciTotiSC, and TPS. ZGA, zygotic genome activation; mESC, mouse embryonic stem cell; 2CLC, 2-cell-like cell; ciTotiSC, chemically induced totipotent stem cell; TPS, totipotent potential stem cell.
(C) Heatmaps showing gene expression patterns of maternal genes in oocytes, 2-cell embryos, 2CLCs, TPSs, ciTotiSCs, and mESCs (left). Functional enrichment analysis results (right). The x axis is the −log10 (two-sided p value); the y axis represents the enriched pathways.
(D) Heatmaps showing gene expression patterns of totipotent genes in 2-cell embryos, 2CLCs, TPSs, ciTotiSCs, and mESCs (left). Functional enrichment (gene ontology, GO) analysis results (right).
(E) Heatmaps showing gene expression patterns of minor ZGA genes in 2-cell embryos, 2CLCs, TPSs, ciTotiSCs, and mESCs (left). Functional enrichment (gene ontology, GO) analysis results (right).
(F) Heatmaps showing gene expression patterns of major ZGA genes in 2-cell embryos, 2CLCs, TPSs, ciTotiSCs, and mESCs (left). Functional enrichment (gene ontology, GO) analysis results (right). The x axis is the −log10 (two-sided p value); the y axis represents the enriched pathways. Shared, genes expressed in all three in vitro TSC models (2CLCs, ciTotiSCs, and TPSs); 2CLC-sig/ciTotiSC-sig/TPS-sig, cell-type-specific expressed genes.
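The enrichment panels in Figure 3 plot −log10 of a two-sided p value for each pathway. The article does not state which statistical test produces that p value, so the sketch below assumes a two-sided Fisher's exact test on the overlap between a query gene list and a pathway, which is one common choice for GO-style enrichment. The function name `fisher_two_sided` and all counts are hypothetical and serve only to show how a bar length on such a plot could be derived.

```python
from math import comb, log10


def fisher_two_sided(k, n, K, N):
    """Two-sided Fisher's exact p value for a gene-set overlap.

    k: genes in the query list that belong to the pathway
    n: size of the query list
    K: pathway genes in the background
    N: size of the background
    """
    total = comb(N, n)

    def pmf(x):
        # Hypergeometric probability of exactly x pathway genes in the query list
        return comb(K, x) * comb(N - K, n - x) / total

    lo, hi = max(0, n - (N - K)), min(n, K)
    p_obs = pmf(k)
    # Sum every possible overlap that is no more likely than the observed one
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Toy counts for illustration only; not taken from Toti.
p_value = fisher_two_sided(k=5, n=5, K=5, N=20)
bar_length = -log10(p_value)  # the quantity shown on the x axis
print(f"p = {p_value:.3g}, -log10(p) = {bar_length:.2f}")
```

Because of the −log10 transform, smaller p values map to longer bars, so the most strongly enriched pathways stand out at a glance.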
To investigate the discrepancies among these TSC models, we used the “Featured Gene” module to examine the expression patterns of previously reported signature genes, including maternal, minor ZGA, major ZGA, totipotent, and pluripotent genes (Yang et al., 2022). In line with previous findings, these TSC models exhibited elevated expression of totipotent and ZGA genes and decreased expression of pluripotent genes compared to mESCs (Figure 3B) (Xu and Liang, 2022). Notably, we also observed differences in the expression of maternal and totipotent genes in these in vitro TSCs compared with that in zygotes (Figure 3B). This suggests that, although all three TSC models are widely used as in vitro models of totipotency and share core totipotent features, they also exhibit unique transcriptional signatures, likely reflecting their different origins, derivation methods, and molecular states within the totipotent spectrum. This diversity offers a valuable opportunity to explore multiple facets of totipotency that may not be fully captured in any single model.

Next, based on the expression patterns of previously reported signature genes (Yang et al., 2022) in the three in vitro TSC models, we categorized maternal, totipotent, and ZGA genes into shared and cell-type-specific clusters using unsupervised hierarchical clustering. These included a shared cluster and three cell-model-specific clusters, defined as the 2CLC-signature, ciTotiSC-signature, and TPS-signature clusters (Figures 3C–3F; Table S3). Our analysis revealed both common and distinct expression patterns of these signature genes across the three in vitro TSC models (Figures 3C–3F). Specifically, while each model displayed unique transcriptional signatures, they all shared core totipotency features.
As shown in Figure 3D, the totipotent genes shared among the three in vitro TSC models were functionally enriched in pathways related to cell development and maturation, supporting a common totipotent identity. However, each in vitro TSC model also expressed a unique subset of totipotent genes, reflecting molecular distinctions among them. Notably, 2CLCs were uniquely enriched for totipotent genes associated with chromosome telomeric regions (Figure 3D), including the well-established Zscan4 family genes, consistent with previous transcriptomic and epigenetic studies of 2CLCs (Eckersley-Maslin et al., 2016). In contrast, ciTotiSC-signature totipotent genes were enriched in pathways related to cell fate commitment, pattern specification, and regionalization (Figure 3D). TPS-specific totipotent genes showed enrichment in immune-response-related pathways. These distinct features likely reflect differences in the origin and induction strategy of the models and suggest that each captures a different molecular state within the totipotent spectrum.

Key histone modifications correlated with gene expression in 2CLCs

Our analysis showed that 2CLCs partially recapitulated the transcriptome features of totipotency, as evidenced by the extremely high expression of totipotent and minor ZGA genes in 2CLCs compared to other cells (Figure 3B). To explore key epigenetic modifications modulating totipotent signatures in 2CLCs, we first examined the correlation between gene expression and histone modifications previously shown to regulate early development (Zhou and Dean, 2015), including H3K4me3, H3K27ac, H3K27me3, and H3K79me3. When considering all protein-coding genes (PCGs) whose promoter regions overlap with histone modification peaks (±5 kb around the transcription start site [TSS] of each PCG), we observed that only H3K4me3 modification was associated with their expression (Figure 4A).
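The promoter-level comparison described above can be sketched in a few lines of Python: collect the histone-mark signal falling within ±5 kb of each TSS, then correlate it with log2(RPKM + 1) expression, the normalization used in Figure 4. This is a simplified illustration under stated assumptions, not the Toti pipeline: the article reports read density over modified regions, whereas this sketch sums a per-peak signal; Pearson correlation is assumed because the article does not name the coefficient; and the helper names, gene names, and numbers are hypothetical.

```python
from math import log2, sqrt

WINDOW = 5_000  # ±5 kb around the TSS, as described in the text


def promoter_signal(tss, peaks, window=WINDOW):
    """Sum the signal of peaks overlapping [tss - window, tss + window].

    peaks: list of (start, end, signal) on the same chromosome as the gene.
    """
    lo, hi = tss - window, tss + window
    return sum(sig for start, end, sig in peaks if start <= hi and end >= lo)


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy single-chromosome data for illustration only; not taken from Toti.
genes = {  # name: (TSS position, RPKM)
    "geneA": (10_000, 30.0),
    "geneB": (50_000, 3.0),
    "geneC": (90_000, 0.0),
}
peaks = [
    (9_000, 11_000, 40.0),
    (12_000, 13_000, 10.0),
    (54_000, 56_000, 8.0),
    (120_000, 121_000, 25.0),  # outside every promoter window
]

expression = [log2(rpkm + 1) for _, rpkm in genes.values()]
signal = [promoter_signal(tss, peaks) for tss, _ in genes.values()]
r = pearson(expression, signal)
print(signal, round(r, 3))
```

A peak that only partly overlaps the window still counts in this sketch; a production analysis would also need to handle chromosomes, strand, and genes without any variation in signal.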
Among the histone marks examined, H3K4me3 showed the strongest correlation with the expression of both totipotent genes and minor ZGA genes, relative to all PCGs in 2CLCs (Figure 4A; Table S4). In addition, the H3K27ac mark correlated positively with the expression of both totipotent and minor ZGA genes (Figures S1A and S1B). The Polycomb repression mark H3K27me3 (Kundaje et al., 2015) showed no correlation with the expression of all PCGs, totipotent genes, or minor ZGA genes in 2CLCs (Figures 4B and S1C), which is in line with the previous finding that H3K27me3 does not modulate the 2C-like transition in mESCs (Rodriguez-Terrones et al., 2018). H3K79me3 showed only a slightly positive association with totipotent genes but not with all PCGs (Figures 4B and S1D). Furthermore, among the histone modifications analyzed, only H3K4me3 showed enrichment around the TSS of PCGs in 2CLCs (Figure 4C). Collectively, these findings suggest that, among histone modifications known to regulate embryonic gene expression, H3K4me3 is particularly strongly associated with the transcriptional activation of totipotent genes and minor ZGA genes in 2CLCs. This observation is consistent with previous studies showing that H3K4me3 deposition closely correlates with the expression of upregulated 2C-specific genes (Zhang et al., 2021), further supporting the robustness of the standardized data analysis pipeline implemented in Toti.

Figure 4. Identification of key histone modifications that modulate gene expression in 2CLCs
(A) Correlation between H3K4me3 modification and expression of all protein-coding genes (PCGs; left), totipotent genes (middle), and minor ZGA genes (right).
(B) Correlation between expression of all protein-coding genes (PCGs) and H3K27ac (left), H3K79me3 (middle), and H3K27me3 (right) modification.
The x axis is normalized gene expression (log2(RPKM + 1)), and the y axis is the density of reads overlapping with histone-modified regions (±5 kb around TSS) in 2CLCs.
(C) Density distribution of reads overlapping with histone-modified regions (±5 kb around TSS) in 2CLCs. The x axis is the distance from the transcription start site, and the y axis is the read density.

The potential role of the H3K4me3-Zscan4 axis in 2CLCs

Zscan4 family genes are typical totipotent signature genes in 2CLCs (Falco et al., 2007). We next investigated gene expression and epigenetic regulatory patterns around the Zscan4a, Zscan4c, Zscan4d, and Zscan4f genes in 2-cell embryos, selected in vitro TSC models (2CLCs, TPSs, ciTotiSCs, and TLSCs), and pluripotent mESCs. Zscan4c was expressed only in 2-cell embryos and in vitro TSC models but not in mESCs (Figure 5A). Accessible chromatin regions around Zscan4c showed a similar totipotent-specific pattern, with accessible peaks present only in 2-cell embryos and 2CLCs and not in mESCs. Importantly, H3K4me3 peaks around Zscan4c were likewise present specifically in 2CLCs and 2-cell embryos but not in mESCs (Figure 5A), suggesting a potential totipotency-relevant role of the H3K4me3-Zscan4c axis. Similar expression and epigenetic modification patterns were also observed around Zscan4a, Zscan4d, and Zscan4f (Figure S2). These results are consistent with previous findings (Genet and Torres-Padilla, 2020) showing totipotent-specific patterns in both expression and H3K4me3 modification surrounding Zscan4 family genes.

Figure 5. The potential role of H3K4me3-Zscan4c axis in 2CLCs
(A) Read coverage represents gene expression, chromatin accessibility, and H3K4me3 modifications (from left to right) around gene Zscan4c in 2C embryo, 2CLC, TPS, ciTotiSC, TLSC, and mESC.
2CLC, 2-cell-like cell; TPS, totipotent potential stem cell; ciTotiSC, chemically induced totipotent stem cell; TLSC, totipotent-like stem cell; mESC, mouse embryonic stem cell.
(B) Functional annotation (gene ontology) of H3K4me3-Zscan4c axis (H3K4me3 modified regions located within ±2 kb around TSS of Zscan4c) in 2CLCs. The x axis is the −log10 (two-sided p value), and the y axis represents the enriched GO terms.
(C) Read coverage represents H3K4me3 modification and gene expression around Ctc1 in 2CLC and mESC.
(D) Read coverage represents H3K4me3 modification and gene expression around Tinf2 in 2CLC and mESC.
(E) Motif enrichment results for Zscan4c in Zscan4c-over-expressed mESCs.
(F) Schematic illustrating the putative role of Zscan4c in maintaining genomic stability and telomere elongation.

Next, we investigated the functional role of the H3K4me3-Zscan4c axis in 2CLCs using the “Pathway Enrichment” module. Intriguingly, we found that the H3K4me3-Zscan4c axis showed enrichment in pathways relevant to telomere maintenance and chromosome telomeric regions (Figure 5B). In addition to Zscan4c, we also observed that H3K4me3 modification and expression of other genes involved in telomere regulation and maintenance, such as Ctc1 (Wang and Chai, 2018) and Tinf2 (Schmutz et al., 2020), showed similar patterns of enrichment in 2CLCs compared to mESCs (Figures 5C and 5D). Altogether, these results indicate a potential role of the H3K4me3-Zscan4 axis in regulating telomeres in 2CLCs (Figures 5B–5D; Table S5).

Furthermore, we examined the target sequences of Zscan4c in Zscan4c-over-expressed mESCs (ChIP-seq data in Toti) using the “Motif Enrichment” module. The top two enriched binding motifs displayed a repeated CA/TG pattern, suggesting a microsatellite signature (Figure 5E). The (CA)n/(TG)n microsatellites are susceptible to chromosomal recombination, which affects genome integrity (Gendrel et al., 2000).
Zscan4 was previously found to recognize and bind to (CA)n/(TG)n microsatellites in nucleosomes, preventing them from forming an energetically unfavorable structure known as Z-DNA and disassembling under torsional stress ([162]Srinivasan et al., 2020). This mechanism potentially functions as a developmentally regulated safeguard that maintains genomic integrity in totipotent embryos, which is in line with the enrichment in the “regulation of DNA recombination” pathway observed for the H3K4me3-Zscan4c axis in [163]Figure 5B. Collectively, our primary analysis suggests a role for Zscan4 in preserving genomic integrity by maintaining telomeres and safeguarding microsatellite regions in 2CLCs. This case study illustrates how Toti can help users understand the potential function of H3K4me3 in regulating genomic stability in TSCs through activation of Zscan4 ([164]Figure 5F). It also exemplifies how Toti can be applied to explore transcriptomic and epigenomic differences underlying totipotency across mESCs, 2-cell embryos, and in vitro TSC models.

Discussion

To the best of our knowledge, Toti is the first multi-omics database developed exclusively for investigating the transcriptional and epigenetic factors that shape totipotency, and it can be accessed through a user-friendly Web interface ([165]http://toti.zju.edu.cn/). Toti provides a comprehensive repository of transcriptomic and epigenomic data from 4,680 human and 3,604 mouse samples, covering in vivo, in vitro, and genome-edited embryonic TSCs, TSC-like cells, PSCs, and embryos spanning preimplantation stages, which sets it apart from other embryonic and stem cell resources, such as dbEmbryo, DevOmics, and StemCellDB. The “Search” module enables users to easily and effectively focus on studies/datasets of interest by searching for genes, sequencing types, or other relevant keywords.
The “Browse” module provides insights into the (dis)similarity across in vivo and in vitro totipotent cells, PSCs, and other embryonic cells in human and mouse by allowing intuitive and comparative visualization of epigenetic and transcriptomic features. With three sub-modules in the “[166]Analysis” section, Toti also helps users prioritize key motifs and biological pathways that may capture the gene regulatory logic underlying totipotency. Furthermore, Toti offers an online scRNA-seq analysis module tailored to investigating the transcriptomic underpinnings of the establishment and exit of totipotency at single-cell resolution.

We conducted a case study to provide a walkthrough of various modules, such as “Featured Gene,” “Genome Browse,” “Motif Enrichment,” “Pathway Enrichment,” “scRNA-seq analysis,” and “Download,” demonstrating how they can be used effectively in totipotency research. Our case study reproduced previously reported findings, supporting the robustness, consistency, and reliability of our standardized data analysis pipeline. While we acknowledge that the analyses presented in this case study may not meet the stringent standards typically expected for standalone biological discoveries, they sufficiently demonstrate how researchers can utilize the integrated multi-omics datasets and multifaceted analytical modules in Toti for totipotency-related research. We believe Toti constitutes a comprehensive resource and opens new avenues for understanding the intricacies of totipotency.

In the future, we will continue to maintain and update Toti by curating and incorporating the latest findings in TSC studies. As TSCs are the progenitors of all cell types in mammals and have great potential for regenerative medicine, we will continue to expand our collection of publicly available multi-omics data of TSCs to more species, such as pigs and monkeys.
In addition, we will continue to integrate other data types, such as scATAC-seq and Hi-C data, to further improve our understanding of totipotency-associated regulatory networks. These additions are anticipated to provide insights into the molecular basis of totipotency and help researchers stabilize totipotency in vitro.

Methods

Data processing

We processed the collected data using the pipelines illustrated in [167]Figure S3. We used the GRCm38 (mm10) and GRCh38 (hg38) reference genomes for mouse and human, respectively.

scRNA-seq data harmonization

We performed downstream single-cell analysis using Seurat (v.5.0.1) ([168]Butler et al., 2018). Data were normalized with SCTransform ([169]Hao et al., 2024), which uses regularized negative binomial regression to remove technical variation while maintaining biological heterogeneity. Compared to traditional log-normalization, SCTransform better stabilizes variance across expression levels, retaining signals from both lowly and highly expressed genes. To reduce integration bias from disparate sample sizes, we randomly down-sampled 1,500 cells per cell type, balancing dataset diversity and statistical power. Prior to integration, we compared harmonization methods, including FindIntegrationAnchors (Seurat), Harmony, and LIGER, using pooled scRNA-seq data from embryos at different developmental stages, with the goal of identifying the integration approach that best minimizes batch effects while preserving biologically meaningful relationships and developmental hierarchies. FindIntegrationAnchors performed best, maintaining biologically plausible distances on the UMAP and preserving developmental trajectories ([170]Figure S4). Thus, we integrated scRNA-seq datasets using FindIntegrationAnchors (anchor features: 5–20; neighbors filtering anchors: 30–300), followed by IntegrateData to correct for batch effects.
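The per-cell-type down-sampling described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for Toti, whose pipeline is built on Seurat; the function name `downsample_per_type`, the fixed random seed, the toy label counts, and the choice to keep every cell of a type that has fewer than 1,500 cells are all assumptions.

```python
import random
from collections import Counter, defaultdict

def downsample_per_type(cell_types, max_cells=1500, seed=0):
    """Return sorted cell indices, keeping at most max_cells per cell type.

    Assumption: cell types with fewer than max_cells cells are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_type = defaultdict(list)
    for idx, ctype in enumerate(cell_types):
        by_type[ctype].append(idx)

    kept = []
    for idxs in by_type.values():
        if len(idxs) > max_cells:
            idxs = rng.sample(idxs, max_cells)  # sample without replacement
        kept.extend(idxs)
    return sorted(kept)

# Hypothetical label vector: one cell-type label per cell.
labels = ["2CLC"] * 4000 + ["mESC"] * 1200 + ["ICM"] * 300
kept = downsample_per_type(labels)
counts = Counter(labels[i] for i in kept)
print(dict(counts))
```

Capping each cell type at the same size keeps abundant populations from dominating anchor selection during integration, at the cost of discarding cells from the largest groups.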
Cell identities were defined by experimental sorting, ensuring high confidence without relying on post hoc gene-based annotation. To validate the robustness and biological relevance of our results, we performed principal-component analysis on pseudobulk expression profiles from scRNA-seq data of embryos at different developmental stages. As shown in [171]Figure S5A, cell types clustered by developmental stage: early, mid, and late 2-cell embryos clustered closely with ciTotiSCs, 2CLCs, and TPSs, while mESCs, ICM, and blastocysts formed a separate cluster, confirming the robustness of our integration and annotations.

Featured gene selection

Representative gene sets (maternal, minor ZGA, major ZGA, totipotent, and pluripotent genes) were previously defined in the literature ([172]Yang et al., 2022). We validated their robustness using two independent datasets: scRNA-seq data ([173]Deng et al., 2014) and bulk RNA-seq data ([174]Wu et al., 2016). Despite the absence of oocytes in the scRNA-seq data, signature genes showed the expected developmental expression. Totipotent genes were enriched in zygotes and early, mid, and late 2-cell populations ([175]Figure S5B). Similarly, in the bulk RNA-seq dataset, expression patterns of maternal, totipotent, and pluripotent genes aligned with oocytes, early embryos, and ICM/mESCs, respectively ([176]Figure S5C). Minor and major ZGA genes also showed developmental stage-appropriate activation. These consistent expression patterns support the robustness of these gene sets.

Resource availability

Lead contact

Further information and resource requests should be directed to Xin Sheng (shengxin@zju.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

Toti is freely accessible at [177]https://toti.zju.edu.cn. All raw sequencing data are publicly available from SRA ([178]http://www.ncbi.nlm.nih.gov/sra) and ENA ([179]http://www.ebi.ac.uk/ena/browser/search).
Metadata for all datasets are provided in [180]Tables S1 and [181]S2. Gene/isoform expression matrices and epigenetically modified regions in embryonic cells can be downloaded from the “Download” page. Data processing code is available at [182]https://github.com/cosmoss8274/Toti.

Acknowledgments