Abstract Background Guinea pigs exhibit numerous physiological similarities to humans, yet the details of their preimplantation embryonic development remain largely unexplored. Results To address this, we conducted single-cell sequencing on the transcriptomes of cells isolated from the zygote stage through preimplantation stages in guinea pigs. This study identified seven distinct cell types within guinea pig preimplantation embryos and pinpointed the timing of zygotic gene activation (ZGA). Trajectory analysis revealed a bifurcation into two lineage-specific branches, accompanied by alterations in specific pathways, including oxidative phosphorylation and vascular endothelial growth factor (VEGF). Additionally, co-expressed gene network analysis highlighted the most enriched functional modules for the epiblast (EPI), primitive endoderm (PrE), and inner cell mass (ICM). Finally, we compared the similarities and differences between human and guinea pig epiblasts (EPIs). Conclusion This study systematically constructs a cell atlas of guinea pig preimplantation embryonic development, offering fresh insights into mammalian embryonic development and providing alternative experimental models for studying human embryonic development. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10815-z. Keywords: Guinea pigs, Single-cell sequencing, Early embryonic development Introduction Early embryonic development, especially mammalian embryonic development (including human), has been studied mainly using mouse and non-human primate embryos as models [[38]1]. Mice are extensively used due to their quick reproductive cycle and the advancement of technologies such as gene editing and embryo manipulation. The preimplantation development in mice spans approximately five days, progressing from a single fertilized egg through multiple cell divisions, zygotic gene activation, compaction, polarization, and cavitation to form the blastocyst cavity. The initial lineage separation occurs during the transition from the morula to the early blastocyst stage [[39]2]. Subsequently, the totipotent inner cell mass (ICM) differentiates into the primitive endoderm (PrE) and the pluripotent epiblast (EPI) [[40]3–[41]5]. Extensive research has focused on cell fate determination in mouse preimplantation embryos. Although human and mouse preimplantation embryos exhibit morphological similarities, mice develop early blastocysts at embryonic day 3.25 (E3.25) whereas humans do so at day post fertilization 5 (d.p.f 5). Late blastocysts form at E3.75 in mice and d.p.f 6–7 in humans. The first lineage separation begins in the morula of mice and the expanding blastocyst of humans, with the second lineage separation occurring in the late blastocyst [[42]6, [43]7]. Human embryonic development research has been limited in part due to ethical and other constraints, and the maturity of technologies such as micromanipulation and gene editing has facilitated the use of mice and other alternative models for research. However, as mentioned above, mouse embryo development still exhibits differences from human embryo development in certain aspects. Therefore, it is essential to study embryos from various species to gain a comprehensive understanding of mammalian embryonic development. Guinea pigs, also known as Dutch pigs, share several physiological similarities with humans. Firstly, their immune systems closely resemble those of humans, making guinea pigs excellent models for immunization studies [[44]8–[45]10]. Neuroanatomically mature, guinea pigs have well-characterized brain and neuroendocrine development, with earlier brain maturation than rats [[46]11, [47]12]. They are among the few laboratory rodents that exhibit a complete estrous cycle, including follicular and luteal phases similar to humans, unlike most rodents such as mice, rats, and hamsters, which have distinct and incomplete estrous cycles lacking a functional luteal phase [[48]13]. Crucially, guinea pigs have a placental pattern similar to humans [[49]14–[50]16] and possess unique developmental advantages over mice and rats [[51]17–[52]21]. Additionally, like humans, guinea pigs cannot synthesize their own vitamin C, and research indicates that vitamin C deficiency in these animals can negatively impact fertility and pregnancy [[53]22, [54]23]. Due to the numerous similarities between guinea pigs and humans, the use of guinea pigs was prevalent in the early 1970s, surpassing that of mice, rats, and hamsters. However, this usage has since declined [[55]24]. The limitations in genome assembly and the immaturity of gene editing technologies have contributed to this decrease. Given these similarities and limitations, further investigation into guinea pig embryonic development is essential to bridge gaps in our understanding of mammalian development. In this study, we hypothesize that guinea pig embryos undergo zygotic genome activation, followed by the first and second cell fate determinations, all occurring in a relatively orderly manner before implantation under physiological conditions. To address this, we propose to employ single-cell transcriptome sequencing technology to explore the lineage decisions and gene expression patterns during guinea pig preimplantation embryo development. Our objectives include characterizing cell types in these embryos, pinpointing the timing of zygotic genome activation (ZGA) in guinea pigs, and delineating two critical cell fate-determining branches with specific pathway alterations across developmental stages. Additionally, we aim to identify functional modules predominantly enriched during the second lineage segregation involving three distinct cell types. Lastly, we will investigate interspecies differences in the preimplantation epiblast (EPI) population between humans and guinea pigs. Methods Collection of guinea pig preimplantation embryos and isolation of single cell samples Pregnant guinea pigs were euthanized using carbon dioxide, and the reproductive organs (ovaries, oviducts, and uterus) were excised and immersed in pre-warmed (37 °C) phosphate-buffered saline (PBS). To harvest zygotes, two-cell, four-cell, and eight-cell stage embryos, the fallopian tube was incised at the uterine horn junction, and 500 μl of pre-warmed embryo collection solution (DMEM/F12 supplemented with 2% fetal bovine serum, FBS) was injected using a 1 ml syringe. The embryos were then flushed out from the fallopian tube’s fimbriae. For the collection of morulae and blastocysts, the uterus was cut near the cervix, and a syringe needle was inserted near the tubal-uterine horn junction to inject 1 to 2 ml of preheated embryo collection solution, facilitating the flushing of embryos from the uterus. The zona pellucida was subsequently removed by treating the embryos with Acidic Tyrode’s Solution for 15–30 s. Embryos without zona pellucida were treated with Accutase alone or a mixture of Accutase and TrypLE (1:1) for 3–30 min until the cells were loosened. Single cells were then dispersed in DPBS containing 0.2% BSA using a mouth pipette for sampling. To separately sample mural trophectoderm (mTE) cells, some blastocyst-stage embryos were microdissected along the inner cell mass (ICM) using a laser membrane rupture device on a micromanipulation platform. The mTE and the ICM plus polar trophectoderm (pTE) groups were then digested and sampled individually. Construction and sequencing of single-cell RNA-seq libraries Cells were lysed, and RNA with polyA tails, primarily mRNA, was reverse transcribed using an oligo(dT) primer to synthesize the first strand of cDNA. A splice sequence was appended to the 3’ end of the cDNA utilizing the template-switching activity of the reverse transcriptase. The second strand of cDNA was synthesized at this junction sequence using a TSO (template-switching oligo) primer, which displaced the RNA complementary to the first strand of cDNA. PCR amplification followed to achieve full-length cDNA products, effectively mitigating 3’ bias and rRNA contamination during cDNA synthesis. The resultant single-cell cDNAs underwent quality control using a high-sensitivity chip, with fragment lengths ranging from 400 bp to 10,000 bp, and the peak of the library distribution was around 2,000 bp, indicating qualified cDNA libraries. These qualified single-cell cDNA libraries were fragmented using Tn5 transposase, which also added junction sequences to the cDNA ends. A final round of PCR amplification of the labeled cDNAs was conducted using sequencing primers, and the amplified cDNAs were purified with magnetic beads to yield on-board libraries of 200–600 bp. All steps of the cDNA library construction, including lysis, reverse transcription, amplification, and purification, were carried out in-house by our lab. The sequencing platforms chosen for this study were Illumina’s NovaSeq6000 and BGI’s MGISEQ2000. The data volume generated from a single-cell library on the paired-end PE150/100 machines was approximately 1.5G-2G, sufficient for comprehensive analysis. The library construction method employed was Smart-seq2 [[56]25]. Bioinformatics workflow Upon receiving raw FASTQ files, we conducted quality control with FastQC and MultiQC to generate comprehensive quality reports. The subsequent adapter trimming was performed using Trim Galore. Post-quality check, the data were aligned with HISAT2, and expression quantification was achieved using featureCounts. The resulting expression matrix was then processed through the “Seurat” package for standard single-cell workflow analysis. Annotated cells underwent differential expression analysis, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, developmental trajectory construction with Monocle2, and high-dimensional gene co-expression network analysis using hdWGCNA to examine lineage-specific gene expression characteristics. Additionally, we conducted interspecies comparisons. Detailed descriptions of the algorithms used and their corresponding parameters are provided in the following sections. Preprocessing of scRNA-seq data The quality of the initial sequencing data, represented by the FASTQ file, is assessed, encompassing the GC content, base quality at each position, base distribution, presence of junctions, and junction content. Sequencing files that exhibit no evident quality issues, contamination from adapter sequences or other species, or sequences of identical length are retained. Adapter sequences are automatically detected and removed using “trim_galore,” which also eliminates low-quality bases scoring below q20 (--quality 20). The sequenced reads are aligned to the guinea pig reference genome (Cavpor3.0) using the Hisat2 (v2.2.1) software package [[57]26]. Unmatched data is filtered out using “samtools” (v1.17) [[58]27], resulting in bam files. The gene expression matrix is subsequently quantified by counting the number of molecules per gene, guided by the annotation file, through “featureCounts” (Liao et al., 2014). Dimensionality reduction clustering The expression matrix derived from the upstream analysis was imported into R, and the Seurat package(v4.4.0) [[59]28] was employed to create Seurat objects. Following a simple merge, three batches of data were filtered based on QC-specific criteria: genes expressed in at least 3 cells, and thresholds for nFeature_RNA (> 5000), nCount_RNA (> 1,000,000), and nCount_RNA (< 10,000,000). The standard Seurat workflow was then applied, including “NormalizeData”, “FindVariableFeatures”, and “ScaleData”. Principal component analysis (PCA) was conducted using the “RunPCA” function. Clustering was achieved through “FindNeighbors” and “FindClusters”, and the results were visualized using Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction. The “FindAllMarkers” function, with default parameters, was used for identifying characteristic genes of each cluster. The default parameters used in “FindAllMarkers” for this study are “only.pos = FALSE, min.pct = 0.25, thresh.use = 0.25”. Cell type annotation was performed manually. Default parameters were used throughout, with dimensions selected from 1 to 15 (dims = 1:15) and a resolution of 0.8. Identification of the period of occurrence of ZGA To detect changes in gene expression during successive developmental stages, we conducted a differential analysis comparing each time point to its preceding one. Genes exhibiting significant differences were identified using the criteria of absolute average log2 fold change (avg_log2FC) greater than 1 and a p-value less than 0.05. Visualization of these differential genes was achieved through scatter plots and volcano plots, both of which were generated using the “ggplot2” package [[60]29]. DEG analysis The analysis was conducted using the built-in “FindMarkers” function within the Seurat package, which employs the “wilcox” test for identifying differentially expressed genes. The specific parameters are “verbose = FALSE, test.use = ‘wilcox’, min.pct = 0.1,” with all other parameters set to default. GO and KEGG enrichment analysis The initial step involved utilizing the “AnnotationHub” package [[61]30] to retrieve annotations specific to guinea pigs. Following this, GO and KEGG enrichment analyses were conducted on the outcomes of the differential expression analysis, employing the “clusterProfiler” package(v4.10.1) [[62]31]. Visualization of the results was accomplished through the “GOplot” package [[63]32]. Pseudotime analysis To investigate the two genealogical separations, single-cell trajectory analysis was conducted using the R Monocle2 package with the DDRTree method and default parameters. The top 3000 genes were selected as the input for this trajectory analysis. Subsequently, heatmaps were generated to display significantly changed genes, which were clustered based on their pseudo-temporal expression patterns. Pathway enrichment analysis was then performed using gene sets from KEGG. In a similar vein, the “AddModuleScore_UCell” function [[64]33] was employed to integrate the gene set scoring results into the corresponding matrix of the Monocle object, and the “plot_cell_trajectory” function was used to visualize the pathways. High dimensional weighted gene co-expression network analysis (hdWGCNA) To pinpoint genes linked to the second lineage segregation in guinea pigs, a co-expression gene network analysis was executed using the “hdWGCNA” package [[65]34]. The process involved constructing metacells individually for each sample and cell cluster using the hdWGCNA function, with 50 cells amalgamated per metacell. For each cell population, a subset of Seurat objects was established for the targeted cell population, followed by the application of the standard hdWGCNA pipeline. This pipeline was implemented sequentially through the execution of the following functions with default parameters: TestSoftPowers, ConstructNetwork, ModuleEigengenes, ModuleConnectivity, and RunModuleUMAP. Species comparison The dataset utilized in this study, sourced from [66]GSE136447 [[67]35], comprised scRNA-seq profiles of human preimplantation EPI cells aligned with the human genome. The dataset included FPKM values for 13,915 homologous genes, derived from a combination of 106 human ICM and EPI single cells and 315 guinea pig ICM and EPI scRNA-seq profiles. To refine the dataset, filtering was applied to retain 10,698 genes common to both human and guinea pig, based on the criterion that these genes had a log2(FPKM + 1) value of at least 4 in at least one of the 106 human or 315 guinea pig cells. The raw FPKM values for these 10,698 genes from both species were subsequently normalized and scaled using the Seurat package in R. Principal Component Analysis (PCA) was conducted on these normalized and scaled FPKM values via the prcomp function in R. The PCA results for the first three major components were visualized using MatLab (MathWorks) [[68]36]. Notably, 952 homologous genes in human and guinea pig significantly contributed to PC1, with absolute PC1 values greater than 0.8 standard deviations. Clustering and visualization of 12,475 genes were performed using the heatmap function on the R platform. Further analysis focused on genes with PC1 values greater than 0.8 and less than − 1.75 standard deviations, encompassing 250 and 702 genes respectively, for KEGG enrichment using the clusterProfiler package. Animals Females and males representing the outbred Hartley strain of guinea pigs (Cavia porcellus) were purchased from Beijing Vital River Laboratory. Results Acquisition of high-quality guinea pig preimplantation embryos To acquire high-quality single-cell data from guinea pig preimplantation embryos, we employed morphological observation to stage these embryos based on post-fertilization time and microscopic morphological features. Embryos were harvested from guinea pigs at various embryonic days ranging from E0.5 to E5.5, encompassing stages from Zygote to Blastocyst, including 2-cell, 4-cell, 8-cell, early morula, late morula, early blastocyst, and blastocysts stages. Some blastocysts were further dissected into Inner Cell Mass (ICM) plus polar Trophoblast Cells (pTE) and mural Trophoblast Cells (mTE) groups using microdissection. These embryos were then enzymatically dissociated into single cells. We collected a total of 599 samples from 39 pregnant guinea pigs (Fig. [69]1A, B). Following quality control (QC), we obtained final data from 551 high-quality single-cell samples derived from guinea pig preimplantation embryos. The morphology of each stage of embryos and the dissociated single cells was documented (Fig. [70]1C, Fig.[71]S1A, B). Fig. 1. [72]Fig. 1 [73]Open in a new tab Preimplantation Embryo Sampling in Guinea Pigs (A) Embryo sampling conducted at different preimplantation stages. (B) Summary of the number of embryos and cells collected at each time point. (C) Representative bright-field microscopy images of in vivo acquired embryos. Scale bar: 50 μm Single-cell analysis of guinea pig preimplantation embryos The post-QC single-cell data were subjected to UMAP dimensionality reduction and classified into a total of 9 clusters. Given the limited research on guinea pig preimplantation embryos, we relied on cell markers from human and mouse preimplantation embryos to characterize the single-cell data of guinea pig embryos (Fig. [74]2B; Fig. [75]S1D) [[76]37–[77]40]. The joint morphological pre-staging of sample information facilitated accurate cell annotation of clusters (Fig. [78]S1C). We defined Totipotency (Toti) to include Zygote, 2-cell, and 4-cell stages, as well as 8-cell, morula, Trophoblastic Ectoderm (TE), and Primitive Endoderm (PrE) based on morphological observations and corresponding markers (Fig. [79]2A). However, the definitions of the Inner Cell Mass (ICM) and Epiblast (EPI) were more ambiguous. The regulation of the pre-implantation Epiblast (EPI) in various species has been shown to be regulated by Nanog [[80]41–[81]43]. Consequently, we aimed to use Nanog as a specific marker for EPI in cell cluster characterization. However, unexpectedly, Nanog expression was not detected in the available single-cell data (Fig. [82]2B). Given that guinea pig embryos initiate implantation at E6 [[83]21], and theoretically, Nanog should be expressed in both ICM and EPI at E5.5, which was not observed. Further investigation revealed a gene named LOC100731016, which, upon inquiry in the NCBI database, was identified as the Nanog2 gene of the guinea pig. Nanog belongs to a gene family that includes NANOG1, its tandem duplication gene NANOG2, and the pseudogenes NANOGP2-P11. NANOG and NANOG2 have been demonstrated to be functionally equivalent and to activate specific stem cell regulatory pathways [[84]44]. Thus, we adopted LOC100731016 (Nanog2) as an EPI-specific marker for cell cluster characterization (Fig. [85]2B). Fig. 2. [86]Fig. 2 [87]Open in a new tab Transcriptomic analysis of preimplantation embryos in guinea pigs (A) UMAP visualization of 551 single-cell transcriptomes from guinea pig preimplantation embryos. Points are colored by cell types. (B) Dot plot of selected marker genes for various cell types. (C) Visualization of the top 5 upregulated and downregulated genes in each cluster. (D) Heatmap showing the top 25 differentially expressed genes (DEGs) in each cluster. (E) UMAP plot of all cells with projected expression patterns of the following marker genes for each cluster: Zscan4 (8 C), Nanog2 (EPI), Cdx2, Gata3 (TE), and Sox17, Gata6 (PrE) Additionally, Dnmt3b was highly expressed in EPI, indicating a hypermethylation state, akin to findings in human embryo studies [[88]45]. By subgrouping based on sampling information and marker expressions such as Tfcp2l1 and LOC100731016, the ICM group was also defined. The final classification comprised 7 cell classes: Totipotency (Toti), 8 C, Morula, ICM, TE, EPI, and PrE (Fig. [89]2A). Multiple marker gene enrichment and specific up- and down-regulated genes were also clearly differentiated (Fig. [90]2C-E). Guinea pig ZGA occurs at the 4–8 cell stage Mammalian embryonic development commences with the formation of a fertilized egg following sperm-egg fusion. The initial stage of embryonic development is predominantly governed by a series of maternal factors supplied by the egg cell’s cytoplasm. During this phase, the zygotic genome remains transcriptionally silent, enabling the zygote to maintain a state of totipotency. As the maternally deposited mRNA degrades, the zygotic genome starts to be transcribed, leading to the activation of the genome, the mobilization of embryonic gene products, and the clearance of maternal factors. This process, known as the maternal-to-zygotic transition (MZT), involves two key events: the degradation of maternal mRNA and the production of nascent mRNA, which signifies the activation of the zygotic genome (ZGA) [[91]46, [92]47]. The coordination of these events is crucial. MZT is a critical phase in developmental studies, as the transitions that occur during this period are essential for embryonic patterning, which is heavily influenced by the actions of transcriptional activators [[93]48–[94]50]. Advances in histological sequencing have enabled the development of a sensitive and accurate method for sequencing in continuous time to assess the activation of the zygotic genome. We analyzed differences in gene expression between adjacent time periods for samples ranging from the zygote to the morula stage. The onset of ZGA is marked by significant changes in gene expression patterns and transcript levels, transitioning the embryo from a state of minimal transcription to one where thousands of genes are actively transcribed. This shift triggers a cascade of gene expression changes that influence subsequent cell fates [[95]51]. Analysis of the differentially expressed genes (DEGs) revealed that the transcriptomic differences in guinea pig embryos at the zygote, 2 C, and 4 C stages are minimal, suggesting a transcriptionally quiescent state. In stark contrast, the transition from 4 C to 8 C stages exhibited significant transcriptomic alterations (Fig. [96]3A), aligning with the anticipated dynamics associated with the onset of ZGA. We delved into the gene expression patterns during the 4 C to 8 C developmental stages (Fig. [97]3B; Fig. [98]S2,B). Gene Ontology (GO) enrichment analysis indicated that the up-regulated genes were predominantly involved in protein synthesis and transport (Fig. [99]3C). Conversely, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis highlighted the pentose phosphate pathway (Fig. [100]3D). which is crucial for providing precursors for nucleic acid synthesis. The enrichment of this pathway during the 4 C to 8 C stages suggests vigorous DNA replication and synthesis. Additionally, Gene Set Enrichment Analysis (GSEA) and GO confirmed that numerous synthesis reactions are actively occurring at this stage. (Fig. [101]S2). Fig. 3. [102]Fig. 3 [103]Open in a new tab The zygotic genome activation (ZGA) of guinea pigs (A) Percentage difference (Δ percent of cells) and log-fold change based on the Wilcoxon rank-sum test results for differential gene expression comparing 2 C versus zygote(Left), 4 C versus 2 C(Middle), and 8 C versus 4 C(Right). Red: upregulated genes; Blue: downregulated genes. (B) Volcano plots of differential gene expression comparing 4 C versus zygote (Left) and 8 C versus zygote (Right). (C) Gene Ontology (GO) circle plot displaying gene-annotation enrichment analysis. Red indicates upregulated gene-associated GO terms, relative to the z-score of the analysis. (D) KEGG Chord plot illustrating the relationship between the list of selected KEGG terms (upregulated genes in 8 C versus 4 C) and their corresponding genes, along with the log fold change (logFC) of the genes. The left half of the KEGG Chord plot displays whether the gene expression is upregulated, and the right half represents different KEGG terms with various colors Pseudotime analysis of preimplantation embryonic development in guinea pigs provides insights into lineage specification, a process extensively investigated in humans and mice. In mice, lineage specification typically unfolds in two stages. The initial stage occurs at the morula stage, delineating the inner cell mass (ICM) and trophectoderm (TE) cell fates, with TE later giving rise to extraembryonic tissues like the placenta. The subsequent stage transpires in the blastocyst phase, where the ICM differentiates into epiblast (EPI) and primitive endoderm (PrE) lineages, with EPI eventually forming the individual and PrE contributing to the yolk sac [[104]3, [105]52]. In human studies, however, two contrasting views have emerged. One perspective posits that TE, EPI, and PrE are specified concurrently during human preimplantation development, suggesting a one-step process [[106]41]. The alternative view advocates for an intermediate ICM state during lineage specification, akin to the two-step model observed in mice [[107]53–[108]55]. To investigate the lineage specification of preimplantation embryonic development in guinea pigs, we analyzed embryonic data spanning from the 8 C to the blastocyst stages. The pseudotime analysis revealed a bifurcating trajectory with two primary branches emerging from an initial branch at the first branching point. Subsequently, one of these branches further diverged into two additional branches at the second branching point. Mapping cell types onto these branches indicated that the initial branch was predominantly composed of 8 C and 16 C morula cells. The first branching point delineated the trophectoderm (TE) and inner cell mass (ICM) branches. Following the proposed developmental timeline, the ICM branch subsequently differentiated into epiblast (EPI) and primitive endoderm (PrE) branches at the second branching point (Fig. [109]4A). Additionally, we overlaid the sampling time and pseudotime values onto the branches, identifying the first lineage separation in guinea pig preimplantation embryonic development around day E4.5, with the second lineage separation occurring around day E5.0 (Fig. [110]4B). To elucidate the key drivers of lineage specification, we visualized the top five genes with the highest contribution to the trajectory construction. Among these, Argfx, a gene homologous to a frame gene frequently observed in mammalian embryonic development studies from 8 C to morula stages and known to influence developmental regulation [[111]56], exhibited a comparable expression pattern in guinea pigs. The projection of specific marker genes further validated the accuracy of the trajectory construction (Fig. [112]S3; Fig. [113]S4). Notably, the ICM branch in guinea pigs was found to contain a mixture of EPI cells, differing from mice where distinct ICM cell clusters can be identified based on significant molecular features. Similar to humans, guinea pigs exhibit a high degree of similarity between ICM and EPI cell clusters. Overall, the pseudotime analysis of our data offers a comprehensive view of the three-lineage specification in guinea pig preimplantation embryos. Furthermore, we depicted the cell proportion dynamics at various time points, aligning with the developmental trajectory (Fig. [114]4C). Fig. 4. [115]Fig. 4 [116]Open in a new tab Two rounds of lineage separation in guinea pig preimplantation embryos (A) UMAP plot showing the developmental trajectory across different sampling times. (B) UMAP plot illustrating the developmental trajectory of 8-cell stage (8 C), morula, trophectoderm (TE), inner cell mass (ICM), epiblast (EPI), and primitive endoderm (PrE) cells. (C) Bar plot representing the proportion of each cell type at different sampling times. (D) Heatmap displaying trends of differentially expressed genes as a function of inferred pseudotime. Genes in each row are grouped into six clusters based on their expression patterns. (E-G) These UMAP plots represent the distribution of pathway scores across different pseudotime points. The color gradient on the plot indicates the pathway scores, with dark blue representing low scores and light yellow representing high scores. To explore the gene expression dynamics during lineage specification, we ranked the top 1000 genes by ascending q-values and generated a heatmap depicting differential gene expression across pseudotime, aiming to pinpoint genes with significant expression shifts over time. These genes were grouped into six clusters based on their expression patterns (Fig. [117]4D). The heatmap clusters illustrated distinct gene expression profiles that evolved with pseudotime. To delve deeper into pathway expression trends, we utilized Ucell to score relevant pathways, focusing on key processes such as oxidative phosphorylation, purine metabolism, and the VEGF pathway (Fig. [118]4E-G). Notably, the oxidative phosphorylation pathway exhibited elevated scores in ICM and EPI cells, aligning with observations of enrichment in preimplantation EPI cells across various species [[119]57, [120]58]. These findings are congruent with prior research. Furthermore, the VEGF pathway showed higher scores in TE and PrE cells, which is consistent with its role in angiogenesis, as TE and PrE later contribute to the vascular-rich placenta and yolk sac, respectively. The observed expression patterns were congruent with the pathway enrichment analysis, underscoring the relevance of these pathways in early embryonic development. Functional characteristics of ICM, EPI and PrE To delineate the molecular characteristics of each lineage, we employed hdWGCNA (high-dimensional Weighted Gene Co-expression Network Analysis) [[121]34] to uncover gene expression patterns associated with lineage specification (Fig. [122]5A; Fig. [123]S5B). We selected a soft threshold power of 7 to construct the co-expression network (Fig. [124]S5A). Following high-dimensional gene co-expression analysis, we identified 19 functional modules (Fig. [125]S5A-B; Fig. [126]S5D), with particular attention to those associated with the second lineage segregation. Utilizing correlation-assisted enrichment module screening, we identified the most significant modules enriched for EPI, ICM, and PrE as 3, 6, and 13 modules, respectively (Fig. [127]5B; Fig. [128]S5C). The hub genes of these modules were determined by selecting the top 25 hub genes for each. Visualization revealed that Nanog2 was the central regulator in the 3 modules most significantly enriched for EPI, whereas Sox17 was the predominant core regulator enriched in PrE (Fig. [129]5C), aligning with previous findings [[130]59]. Fig. 5. [131]Fig. 5 [132]Open in a new tab Identification of co-expressed gene modules associated with ICM, EPI, and PrE (A) Schematic representation of the second cell fate decision for preimplantation embryos. (B) Module activities in different clusters estimated using the hdWGCNA algorithm. (C) Co-expression plots for modules M3 (EPI), M6 (ICM), and M13 (PrE). (D) Differential gene heatmap and KEGG enrichment analysis results for EPI, ICM, and PrE To understand the gene expression patterns associated with the second cell fate decision, we identified 1209 genes by intersecting the markers identified by hdWGCNA and the findmarker function. We then analyzed these genes using KEGG to plot and examine a heatmap. The analysis revealed that EPI cells primarily involve pathways such as Hippo, Notch, oxidative phosphorylation, embryonic development, and cell adhesion. In contrast, ICM cells were associated with embryonic development and oxidative phosphorylation, while PrE cells were enriched in pathways like Ras, Rap1, and VEGF. These cell types exhibited partially overlapping yet distinct pathway expression profiles (Fig. [133]5D). Comparison of human and guinea pig EPI characteristics It is now generally accepted that preimplantation EPI is characterized by a naïve state, with the potential to produce a complete individual (but not to develop extra-embryonic tissues) and the ability to form chimeras [[134]60, [135]61]. Also similar to humans, after implantation into the endometrium, both guinea pig and human EPIs show developmental events of cavitation and formation of a double-layered intervertebral disc [[136]62]. EPI plays an important role in understanding the developmental mechanisms of pluripotent stem cells as well as the development of regenerative medicine. To compare the characteristics of EPI in human and guinea pig preimplantation embryos, we combined previously published human data with our guinea pig data. We focused on the dynamics of naïve genes and the EPI pluripotency transition process (EPST), which have been documented in humans [[137]35]. Our analysis revealed that, compared to human EPST, guinea pigs lacked expression of Il6r and Nr0b1, showed reduced expression of Utf1, and exhibited increased expression of Lifr (Fig. [138]6C -D). Fig. 6. [139]Fig. 6 [140]Open in a new tab Comparison of preimplantation EPI in humans and guinea pigs (A) Three-dimensional Principal Component Analysis (PCA) of the EPI lineage, determined by the expressed genes among all groups of EPIs during development in humans (hu) and guinea pigs (GP). In total, 10,698 out of 13,915 annotated genes expressed in human and guinea pig cells (human, 106 cells; guinea pig, 315 cells) were used. (B) Heatmap of 952 genes that highly contributed to PC1 (greater than 0.8 standard deviation of PC1). (C) Violin plots showing expression levels of pluripotency genes during the EPI pluripotency transition in human embryos. (D) Violin plots showing expression levels of selected genes from (C) in guinea pig embryos Furthermore, we performed PCA on the integrated data. The analysis indicated a. clear separation of human and guinea pig cell data along the PC1 axis, highlighting species differences as the most significant factor (Fig. [141]6A). We identified 13,915 genes expressed in both species. Along the PC1 axis, we classified these genes into significantly positive and significantly negative categories. KEGG enrichment analysis revealed notable differences in pathways such as phosphatidylinositol metabolism, pyrimidine and purine metabolism, and oxidative phosphorylation (Fig. [142]6B). Along the PC2 and PC3 axes, the cellular projections of human and guinea pig overlapped, suggesting similar developmental trajectories during the EPI pluripotency transition process (Fig. [143]6A). These findings suggest both conservation and some differences in the development of human and guinea pig preimplantation EPI. Discussion In our study, we systematically analyzed single-cell data from 62 guinea pig preimplantation embryos, encompassing 551 single cells, using single-cell sequencing technology to explore their developmental characteristics. Our analysis covered key developmental events from the zygote stage through to embryo implantation, including zygotic gene activation (ZGA), ICM and TE cell fate decisions, EPI and PrE cell fate decisions. We identified specific time windows for these critical events and observed corresponding gene expression changes. Additionally, we compared the similarities and differences between human and guinea pig preimplantation EPIs by integrating previously reported human embryonic data. Our findings offer fresh perspectives on guinea pig embryonic development and expand the range of species available for studying mammalian embryogenesis, thereby providing new options for comparative studies. We employed single-cell RNA-seq for lineage tracing in guinea pig preimplantation embryos. Following QC, the data were subjected to dimensionality reduction and clustered into seven distinct cell types. The preimplantation EPI population in guinea pigs exhibits a unique core marker compared to other species, with high expression of the Nanog2 gene, which is functionally similar to Nanog. Additionally, we observed that Totipotency (Toti) cells highly express Sox2, 8 C cells express Zscan4, and ICM cells express Tfcp2l1. In contrast, the TE population is characterized by high enrichment of Krt8, Krt18, Gata3, and Cdx2. The PrE signature marker Sox17, commonly found in human and mouse, is also highly enriched in the guinea pig PrE population. These findings resulted in high-resolution single-cell transcriptome profiles, providing detailed insights into the molecular signatures of different cell types during early embryonic development. Embryonic development commences with a fertilized egg, initially governed primarily by maternal factors. Subsequently, there is a gradual decline in maternal deposition and the activation of the zygotic genome. The occurrence of ZGA is a conserved process across species; however, the timing varies significantly. For instance, human ZGA primarily occurs at the 8-cell stage, while in mice, it happens at the 2-cell stage, and in cattle, it is at the 8-16-cell stage [[144]63]. Our findings indicate that guinea pig ZGA predominantly takes place during the 4-cell to 8-cell phase, aligning closely with the timing in humans. The enrichment of up-regulated genes during this ZGA window is associated with processes such as protein synthesis and transport, as well as ribose synthesis. The number of activated genes and the enriched functions collectively suggest that zygotic gene activation in guinea pig embryos occurs from the 4 C to 8 C stages. We conducted pseudotime analysis of transcriptome developmental trajectories in guinea pig embryos, which revealed a clear bifurcation into two distinct branches. Analysis of the embryo sampling points indicated that the ICM/TE cell fate decision occurs around E4.5, while the EPI/PrE cell fate decision takes place earlier, at E5.0. Despite the clear branching, we observed a high degree of similarity between early ICM and EPI, with significant overlap in trajectory branching. This challenge in distinguishing EPI/PrE lineages is also present in human preimplantation embryos. Our findings suggest that EPI may have emerged first, followed by PrE fate. In mice, the presence of Fgf4 (secreted by EPI cells) is crucial for PrE fate decision [[145]64]. A hypothesis regarding human EPI/PrE cell fate decision posits that ICM is closer to EPI at the transcriptome level, potentially leading to confusion between ICM and EPI [[146]6]. Our data indicate that guinea pigs exhibit a similar pattern to humans in this aspect. Furthermore, we examined the gene expression characteristics of the second lineage decision. Employing hdWGCNA, we identified functional modules most pertinent to ICM, EPI, and PrE. For instance, at the core of regulation for EPI and PrE are Nanog2 and Sox17. Nanog and Sox17 are key regulators in the cell fate determination of the EPI and PrE lineages in mice. During the specialization of the EPI lineage, Nanog, Sox2, and Oct4 work synergistically, promoting the expression of FGF4 while suppressing Gata6 and Fgf signaling, thus driving the specification of the ICM lineage. In contrast, Sox17 plays a crucial role in the specification of the PrE lineage. FGF signaling facilitates PrE formation by promoting Gata6 expression and inhibiting Nanog, with Sox17 acting as a downstream regulatory factor. Given that Nanog2 is functionally equivalent to Nanog, it is intriguing to explore whether the regulatory mechanism in guinea pig EPI/PrE follows a similar pathway. This question remains open and warrants further investigation. Recognizing that the analyzed functional modules could be expressed across multiple cell populations, we intersected the most enriched module genes of ICM, EPI, and PrE with genes identified by the Findmarker function to pinpoint specific genes for these three cell types. We then enriched these for further analysis. The pathways enriched in the respective cells are similar and conserved compared to those in human and mouse, suggesting that the guinea pig offers certain advantages and is a rational choice as an experimental animal for studying human embryonic development. Trophoblast (TE) cells play a crucial role in implantation. Traditionally, it was thought that all TE cells possess equal potential to interact with and invade the endometrium. However, recent research has revealed that polar TEs (pTE) and mural TEs (mTE), which have distinct localizations, exhibit different potentials [[147]6, [148]41, [149]65]. In species such as mice, rats, pigs, and cows, embryos attach to the endometrium through mTE, while in humans and non-human primates, it is the pTE that facilitates attachment. In guinea pigs, similar to mice and rats, it is the mTE that is oriented towards attachment and invasion of the endometrium [[150]66–[151]69]. Despite these findings, the specific mechanisms by which mTE and pTE influence embryo attachment to the endometrium remain elusive, making this an intriguing area of inquiry. The guinea pig genome available on NCBI have not been assembled to the level of individual chromosomes, which limits the resolution of genome assembly. Consequently, research on guinea pig chromatin accessibility, epigenetic modifications of transcription factor binding, and related aspects is constrained. The current study is predominantly confined to the transcriptome level, yet transcription factors and epigenetic modifications are crucial regulators during embryonic development. This represents a limitation of the study. Therefore, constructing a more comprehensive and refined reference genome for the guinea pig is a critical step for advancing future research in this species. Conclusions To summarize, the current research is the first to systematically establish a single-cell global transcriptional profile of guinea pig preimplantation embryos, which is characterized by a wide array of similarities with human embryos. Embryonic development is a pivotal topic in developmental biology and broader biological research. Ethical constraints limit the study of human embryos, and while mice have served as model organisms for embryonic development research, they exhibit differences from humans in certain aspects. The investigation of guinea pig preimplantation embryos offers a novel perspective and method for comprehending human preimplantation embryonic development. As a consequence, deciphering molecular mechanisms underlying inter-transcriptomic hallmarks correlated with early embryogenesis in guinea pigs might open up new possibilities for elaborating an excellent ex vivo embryological rodent-based model. The latter seems to be a highly suitable for exploration of both maternal-to-embryonic transition for controlling gene expression and developmental timeline arising from embryonic cell commitment/differentiation in humans. Moreover, this guinea pig model aimed at investigating experimental and applied embryology in humans might provide a powerful tool for transgenic and biomedical research with the use of such assisted reproductive technologies as cloning by somatic cell nuclear transfer and in vitro fertilization of oocytes either by their coincubation with capacitated spermatozoa or by their intracytoplasmic microinjection with single sperm cells. Electronic supplementary material Below is the link to the electronic supplementary material. [152]Supplementary Material 1^ (3.2MB, jpg) Acknowledgements