Abstract
   Zygotic genome activation (ZGA) marks the critical transition from
   reliance on maternal transcripts to the initiation of embryonic
   transcription early in development. Despite extensive characterization
   in model species, the regulatory framework of ZGA in sheep remains
   poorly defined. Here, we applied single-cell RNA sequencing
   (Smart-seq2) to in vivo- and in vitro-derived sheep embryos at the 8-,
   16-, and 32-cell stages. Differential expression analysis revealed 114,
   1628, and 1465 genes altered in the 8- vs. 16-, 16- vs. 32-, and 8- vs.
   32-cell transitions, respectively, with the core pluripotency factors
   SOX2, NANOG, POU5F1, and KLF4 upregulated during major ZGA. To uncover
   coordinated regulatory modules, we constructed a weighted gene
   co-expression network using WGCNA, identifying the MEred module as most
   tightly correlated with developmental progression (r = 0.48, p = 8.6 ×
   10^−14). The integration of MERed genes into the STRING v11
   protein–protein interaction network furnished a high-confidence
   scaffold for community detection. Louvain partitioning delineated two
   discrete communities: Community 0 was enriched in ER–Golgi
   vesicle-mediated transport, transmembrane transport, and cytoskeletal
   dynamics, suggesting roles in membrane protein processing, secretion,
   and early signaling; Community 1 was enriched in G2/M cell-cycle
   transition and RNA splicing/processing, indicating a coordinated
   network for accurate post-ZGA cell division and transcript maturation.
   Together, these integrated analyses reveal a modular regulatory
   architecture underlying sheep ZGA and provide a framework for
   dissecting early embryonic development in this species.
1. Introduction
   Zygotic genome activation (ZGA) is a crucial transcriptional event in
   mammalian development, signifying the transition from maternal mRNA and
   proteins to transcription driven by the embryonic genome [[38]1].
   During the early stages of embryonic development, transcription remains
   largely inactive, and the embryo primarily relies on maternal
   transcripts and proteins [[39]2,[40]3,[41]4]. ZGA occurs in two
   distinct phases: minor ZGA and major ZGA [[42]5,[43]6]. Minor ZGA,
   characterized by low-level gene expression, establishes the groundwork
   for major ZGA [[44]7]. During minor ZGA, RNA polymerase II (RNA Pol II)
   accumulates at gene promoters with high GC contents. In contrast, major
   ZGA is marked by a significant increase in transcriptional activity
   [[45]8,[46]9]. The timing of ZGA varies among species, occurring at the
   1-cell stage in mice, the late 2-cell stage in cattle and humans, and
   the 8- vs 16-cell stage in sheep [[47]10,[48]11,[49]12]. Epigenetic
   modifications, notably histone H3 lysine 4 trimethylation (H3K4me3),
   have emerged as critical regulators of ZGA. Broad H3K4me3 domains
   established in oocytes are refined into promoter-centric peaks during
   major ZGA, thereby facilitating transcriptional competence [[50]13]. In
   that same study, Kdm5b knockdown was shown to enhance H3K4me3
   deposition at LINE1 and ERVL elements, which correlated with increased
   RNA polymerase II occupancy and TE expression during ZGA [[51]13].
   These findings underscore a dual role for H3K4me3 in licensing both
   gene- and TE-driven transcription during early development [[52]13].
   Despite these advances, the extent to which epigenetic and
   transcriptional regulators are organized into coordinated networks
   during sheep ZGA remains poorly understood. In particular, it is
   unknown whether sheep embryos deploy discrete co-expression modules to
   orchestrate minor and major ZGA or how H3K4me3–Pol II interactions at
   TEs integrate into these networks. To address these gaps, we performed
   Smart-seq2 single cell RNA sequencing on in vivo- and in vitro-derived
   sheep embryos at the 8-, 16-, and 32-cell stages. By combining
   differential expression and weighted gene co-expression network
   analyses, we aim to (i) delineate the modular transcriptional
   architecture of sheep ZGA, (ii) identify key signaling pathways and hub
   regulators, and (iii) explore the interplay between epigenetic
   reprogramming and TE activation.
   This work establishes a comprehensive framework for understanding the
   molecular drivers of early embryonic development in sheep.
2. Materials and Methods
   In this experiment, all embryo samples were collected from the Key
   Laboratory of Livestock and Grassland Resources Utilization in the
   Tarim Basin (Tarim University Animal Experiment Station, Xinjiang,
   China). The ovaries were sourced from Xiyu Livestock Co., Ltd., located
   in Aksu, Xinjiang, China, during routine slaughter. A total of 12
   single-embryo samples were used for Smart-seq2-based transcriptome
   sequencing, including embryos at the 8-cell, 16-cell, and 32-cell
   stages. For each developmental stage, four embryos were analyzed: two
   in vivo-derived and two in vitro-derived. All embryos were
   morphologically normal and evaluated based on cell number, symmetry,
   and zona pellucida integrity under a stereomicroscope.
2.1. Chemicals and Culture Media
   All chemicals were purchased from Sigma-Aldrich (Oakville, ON, Canada),
   unless stated otherwise.
2.2. Animals and Sample Collection
2.2.1. In Vivo Embryo Production
   Healthy female sheep, aged 2 to 4 years, were selected as donors to
   ensure high oocyte quality. Superovulation was initiated during the
   luteal phase using a decreasing dose of follicle-stimulating hormone
   (FSH) via intramuscular injection over 3–4 days. Concurrently with the
   final FSH injection, 2 mg of prostaglandin F[2]α (PGF[2]α) was
   administered intramuscularly to induce luteolysis and estrus. Estrous
   behavior was closely monitored 36–48 h after PGF[2]α injection, and
   artificial insemination (AI) was performed 12 h post-estrus detection.
   Embryos were recovered non-surgically on day 6 post-insemination, with
   mild sedation applied to minimize stress. The uterus was flushed with
   sterile saline, and the recovered fluid was transferred to a Petri
   dish. Under a stereomicroscope, embryos at the 8-cell and 16-cell
   stages were selected and evaluated based on morphological
   characteristics, including cell number, symmetry, and zona pellucida
   integrity.
2.2.2. In Vitro Embryo Production
     * Oocyte Collection and In Vitro Maturation (IVM)
   Ovaries were obtained from abattoir-sourced sheep and transported in
   sterile saline at 38 °C containing 100 IU/mL penicillin and 100 µg/mL
   streptomycin. Upon arrival, ovaries were rinsed in saline and 70%
   ethanol and then maintained at 37 °C.
   Follicles (2–6 mm diameter) were aspirated using a sterile 5 mL
   syringe; cumulus–oocyte complexes (COCs) were selected under a
   stereomicroscope. Grade A COCs exhibited ≥3 compact cumulus cell
   layers; Grade B COCs had 2–3 layers with a uniform cytoplasm.
   Selected COCs were rinsed in pre-equilibrated maturation medium and
   cultured in 80 µL droplets (30 COCs/droplet) under mineral oil at 38.5
   °C, 5% CO[2], and saturated humidity for 24 h.
   After maturation, cumulus cells were removed by incubation in PBS
   containing 0.1% hyaluronidase; oocyte maturity was confirmed based on
   first polar body extrusion.
     * In Vitro Fertilization (IVF)
   Ram semen was collected via electroejaculation, diluted 1:4 in diluent,
   and incubated at 37 °C. Capacitation was performed according to the
   manufacturer’s protocol. After 30 min at 38.5 °C, sperm were washed
   twice at 800 × g and resuspended to 1 × 10^6 sperm/mL. Mature oocytes
   were washed in fertilization medium and co-incubated with sperm (≤15
   oocytes/50 µL droplet) at 38.5 °C, 5% CO[2], and 100% humidity for 18
   h. At 24 h post-insemination, cumulus cells were removed by gentle
   pipetting in PBS, and presumptive zygotes were washed twice in
   equilibrated SOF before culture.
     * In Vitro Culture (IVC)
   Presumptive zygotes were cultured in synthetic oviduct fluid (SOF)
   supplemented with 0.5× essential and non-essential amino acids, 3 mg/mL
   bovine serum albumin, and 5% fetal bovine serum. Seventy-microliter SOF
   droplets under mineral oil were equilibrated at 38.5 °C, 5% CO[2], and
   saturated humidity for ≥4 h. No more than 15 embryos were cultured per
   droplet. Cleavage was assessed at 48 h (2–4-cell rate) and 72 h
   (8–16-cell rate) post-insemination. Half of the medium was replaced
   every 48 h (days 2, 4, and 6) without disturbing the embryos.
   Morula formation was evaluated on day 5 (120 h), based on compaction
   and the cell number (16–32 cells). Blastocyst development was recorded
   on day 7 (168 h); blastocysts were graded (1–3) based on blastocoel
   expansion and inner cell mass quality according to International Embryo
   Technology Society standards. All procedures were conducted under
   sterile conditions and recorded with time-stamped observations for
   subsequent analysis.
2.3. RNA Extraction and High-Throughput Sequencing
   Two embryos were collected from each of the in vivo and in vitro groups
   at the 8-cell, 16-cell, and 32-cell stages, resulting in four embryos
   per developmental stage (total = 12 samples). Each embryo was
   immediately placed in 10 µL of lysis buffer containing RNase inhibitors
   (RNeasy Micro Kit, Qiagen, Hilden, Germany, Cat. No. 74004) to preserve
   RNA integrity. Total RNA was extracted according to the manufacturer’s
   protocol. The RNA concentration and purity were measured on a NanoDrop
   2000 spectrophotometer (Thermo Fisher Scientific, San Diego, CA, USA),
   and the RNA integrity was assessed on an Agilent 2100 Bioanalyzer
   (Agilent Technologies, Santa Clara, CA, USA); all samples exhibited an
   RIN ≥ 8.0.
   First-strand cDNA synthesis was performed using the PrimeScript RT
   Reagent Kit (Takara, Shiga, Japan, Cat. No. RR037A) following the
   manufacturer’s protocol. RNA libraries were prepared using the NEBNext
   Ultra II RNA Library Prep Kit for Illumina (New England Biolabs,
   Ipswich, MA, USA, Cat. No. E7770) with poly(A) enrichment. Sequencing
   was conducted on an Illumina NovaSeq 6000 platform (Illumina, San
   Diego, CA, USA) with paired-end 150 bp reads.
2.4. RNA Data Analysis
   The raw sequencing reads were first evaluated with FastQC v0.12.1
   ([53]https://www.bioinformatics.babraham.ac.uk/projects/fastqc/,
   accessed on 5 June 2025) to assess the per-base quality, GC content,
   and adapter contamination. Adapter and low-quality sequences were then
   removed using Trimmomatic v0.39 [[54]14] with the following parameters:
   (1) Phred33 quality encoding; (2) trimming of bases with quality <3
   from both the 5′ and 3′ ends; (3) application of a 5-bp sliding window
   to cut when the average quality fell below 20; (4) removal of the first
   13 bp to eliminate Smart-seq2 template-switching bias; and (5)
   discarding of any reads shorter than 36 bp. The resulting clean reads
   were aligned to the sheep reference genome (Oar_v4.0) using HISAT2
   v2.2.1 [[55]15], sorted with SAMtools, and quantified with
   FeatureCounts v2.0.6 [[56]16] to produce a gene-by-sample count matrix.
   Differential expression analysis was carried out in R (v4.4.0) using
   DESeq2 v1.34.0 [[57]17], applying the Benjamini–Hochberg correction;
   genes with an adjusted p-value (p.adj) < 0.05 and |log[2] fold-change|
   > 1 were defined as differentially expressed and used for subsequent
   analyses.
2.5. Weighted Gene Co-Expression Network Analysis (WGCNA)
   We constructed a gene co-expression network using the WGCNA R package
   in the R environment (version 4.4.0) [[58]18]. First, we selected the
   top 5000 genes with the greatest variation in gene expression (measured
   based on the median absolute deviation, MAD), and these genes were used
   to build the co-expression network. To construct the weighted
   co-expression network, we evaluated different soft threshold values (β)
   and determined the optimal value. The pickSoftThreshold function was
   used to identify the best soft threshold, which amplifies strong
   correlations between genes and penalizes weak correlations [[59]19].
   Based on the calculated optimal soft threshold (β = 8), we proceeded
   with the network construction. To explore the correlation between
   different gene modules and various cell developmental stages, we
   utilized the module eigengene (ME) values and the sample phenotype data
   and visualized the correlations between the modules and phenotypes
   using a heatmap.
2.6. Louvain Community Detection
   We constructed a gene co-expression network based on the PPI network
   obtained from the MERed module, which was derived from the WGCNA
   analysis. The purpose was to identify potential regulatory modules
   involved in zygotic genome activation (ZGA) in sheep. Experimental
   protein–protein interaction (PPI) data were initially processed to
   generate an undirected weighted graph
   [MATH: G=(V,E) :MATH]
   , where each node represents a gene and the weight
   [MATH:
   Aij :MATH]
   assigned to an edge indicates the interaction strength between gene
   [MATH: i :MATH]
   and gene
   [MATH: j :MATH]
   . The degree of node
   [MATH: i :MATH]
   is defined as
   [MATH: Ki=∑jAij
   :MATH]
   and the total weight of all edges is given by
   [MATH: m=12∑i,j
   mi>Aij
   . :MATH]
   Since the raw PPI network comprises a considerable number of low-degree
   nodes that often participate in only a few interactions and may be
   confounded by experimental noise, a filtering step was introduced
   during the initial network construction. Based on a statistical
   analysis of the node degree distribution—where the maximum node degree
   observed was approximately 213—we set a degree threshold of
   degree_threshold = 200, retaining only those nodes with
   [MATH: ki :MATH]
   > 200. This filtering strategy focuses the subsequent analyses on core
   genes, which are likely to function as hubs in the network and play
   critical roles in the regulatory mechanisms underlying ZGA in sheep.
   Following network filtering, the Louvain algorithm was employed for
   community detection. Initially, every gene was assigned to an
   individual community, and the quality of the resulting partition was
   evaluated based the modularity Q, defined for a weighted network as
   [MATH: Q=12m∑i,j
   mi>Aij
   −kikj2mδci,cj, :MATH]
   where
   [MATH: ci :MATH]
   represents the community assignment of gene
   [MATH: i :MATH]
   and
   [MATH: δci,c<
   mi>j :MATH]
   is the Kronecker delta function, which takes the value 1 when
   [MATH: ci=c<
   mi>j :MATH]
   and 0 otherwise. For each node
   [MATH: i :MATH]
   , the algorithm calculates the change in modularity
   [MATH: ΔQ :MATH]
   associated with moving
   [MATH: i :MATH]
   from its current community to the community of one of its neighboring
   nodes. The modularity increment is calculated as
   [MATH: ΔQ=∑in+ki,in
   2m−
   mo>∑tot+ki
   2m2
   mn>−∑in2m−∑tot2m2−ki2m
   mrow>2 :MATH]
   A higher modularity Q indicates that interactions among genes within
   the same community are relatively dense, suggesting that these genes
   may participate in the same biological processes or regulatory networks
   [[60]20]. Specifically, for each gene, the algorithm evaluates the
   change in modularity (
   [MATH: ΔQ :MATH]
   ) that would result from moving the gene from its original community to
   one of its neighboring communities; only when
   [MATH:
   ΔQ>0
   :MATH]
   is the gene reassigned, thereby gradually optimizing the network’s
   modular structure. After local optimization, the gene nodes within the
   same community are aggregated into “super-nodes” to construct a new
   aggregated network, and this process is iterated until the modularity
   converges, ultimately yielding a hierarchical community structure that
   reflects the regulatory characteristics of zygotic genome activation in
   sheep. The complete source code is available on GitHub (version 3.4.20)
   ([61]https://github.com/miexiaochi/RNA_seq_data/tree/master, accessed
   on 5 June 2025).
2.7. Functional Annotation and Pathway Enrichment Analysis
   Using R software (version 4.4.0), we performed GO enrichment analysis
   and KEGG pathway annotation for differentially expressed genes at
   various developmental stages using the clusterProfiler R package
   [[62]21]. The GO enrichment analysis includes three categories:
   Cellular Component (CC), Molecular Function (MF), and Biological
   Process (BP).
3. Results
3.1. Summary of RNA-Seq Data
   After quality trimming with Trimmomatic and alignment to the sheep
   reference genome using HISAT2, we obtained 299 GB of SAM-format reads.
   A summary of the RNA-Seq dataset, including the alignment metrics, is
   provided in [63]Supplementary Table S1. Gene-level counts for 27,754
   annotated genes were generated with FeatureCounts ([64]Supplementary
   Table S2). Differential expression analysis with DESeq2 identified 114,
   1628, and 1465 DEGs in the 8- vs. 16-cell, 16- vs. 32-cell, and 8- vs.
   32-cell comparisons, respectively ([65]Figure 1A; [66]Supplementary
   Table S3). Strikingly, only two genes were shared across all three
   contrasts, indicating largely nonoverlapping transcriptional programs
   at each transition ([67]Figure 1B). Notably, the core pluripotency
   factors SOX2, NANOG, POU5F1, and KLF4 showed increased mean expression
   in the 16- vs. 32-cell transition, although paired t-tests did not
   reach significance—likely reflecting high intercellular variability at
   this stage ([68]Figure 1C; [69]Supplementary Table S4).
Figure 1.
   [70]Figure 1
   [71]Open in a new tab
   Analysis of significantly differentially expressed genes across
   different embryonic developmental stages. (A) Comparison of gene
   expression changes in the transitions 8- vs. 16-cell, 16- vs. 32-cell,
   and 8- vs. 32-cell, showing numbers of significantly upregulated (UP)
   and downregulated (DOWN) genes. (B) Venn diagram of stage-specific and
   shared differentially expressed genes among the 8-, 16-, and 32-cell
   stages. Numbers in each region indicate the counts of unique and
   overlapping DEGs. (C) Expression patterns of four core pluripotency
   transcription factors (SOX2, NANOG, POU5F1, KLF4) across the 8-, 16-,
   and 32-cell stages. Boxplots display the median and interquartile range
   (IQR); whiskers extend to 1.5× IQR, and hollow circles denote
   individual outlier values beyond this range. Pairwise t-test
   significance is annotated above each comparison (ns, not significant).
3.2. WGCNA Analysis Result
   We conducted WGCNA analysis on the counts matrix, encompassing gene
   clustering, module identification, and module interaction assessment
   ([72]Figure 2A). This analysis identified a developmentally significant
   module, MEred, which is strongly associated with embryonic progression.
   The key genes within this module play a central regulatory role during
   the transition from the 16-cell to 32-cell stage ([73]Figure 2B).
Figure 2.
   [74]Figure 2
   [75]Open in a new tab
   WGCNA and Louvain Community Detection Results. (A) Heatmap showing the
   predicted correlations among gene co-expression modules identified via
   WGCNA. (B) Module–trait relationship heatmap across embryonic
   developmental stages. The X-axis denotes the three developmental stages
   (8-cell, 16-cell, 32-cell), and the Y-axis lists the color-coded WGCNA
   gene modules. Each cell shows the Pearson correlation coefficient
   (upper number) between the module eigengene and stage, with the
   corresponding p-value in parentheses below. A continuous color bar
   alongside maps the correlation values from –1.0 (green) to +1.0 (red),
   facilitating a direct interpretation of the association strength and
   direction. (C) Circular plot illustrating the results of Louvain
   community detection and enrichment analysis. The outer ring displays
   genes from Community 1 (purple nodes), and the inner ring shows genes
   from Community 0 (light blue nodes). The central network represents
   functional enrichment results generated using the R package aPEAR, with
   node colors corresponding to the respective community.
3.3. Louvain Community Detection Result
   To dissect the internal modularity of the MEred-derived PPI network
   ([76]Supplementary Table S5), we first filtered out low connectivity
   nodes (degree ≤ 200), yielding a core subnetwork of 182 genes linked by
   1045 high-confidence edges. We then applied the Louvain algorithm for
   community detection, which partitions the network by optimizing
   modularity through iterative node reassignment and aggregation. This
   procedure identified two major communities, each corresponding to a
   distinct functional program during sheep ZGA ([77]Figure 2C).
   Community 0, composed of 89 genes, is functionally enriched in membrane
   trafficking and cytoskeletal dynamics. Central to this module is
   TRAPPC2, a core component of the TRAPP complex, which is essential for
   ER-to-Golgi vesicle tethering and fusion, facilitating early secretory
   pathway organization. Other prominent members, such as BCAP29 and
   YIPF5, are involved in protein transport and vesicle docking,
   reinforcing this module’s role in intracellular trafficking. The
   presence of KIF18A, a kinesin motor protein that regulates
   microtubule-based chromosome alignment during mitosis, links membrane
   trafficking with cytoskeletal remodeling. Gene Ontology (GO) enrichment
   supports these functions, highlighting vesicle-mediated transport
   (GO:0006888, p.adj = 4.8 × 10^−5), transmembrane transporter activity
   (GO:0022857, p.adj = 5.8 × 10^−2), and microtubule motor activity
   (GO:0003777, p.adj = 2.6 × 10^−2).
   Community 1, encompassing 93 genes, converges on cell cycle regulation
   and RNA processing. This module is anchored by CDK1 (Cyclin-dependent
   kinase 1), a master regulator of the G2/M transition, which
   phosphorylates key substrates to initiate mitotic entry. Other pivotal
   genes include CDC25C, which activates CDK1 via dephosphorylation;
   PTTG1, a securin that prevents premature chromatid separation; and
   FBXO5, which inhibits the anaphase-promoting complex, ensuring proper
   cell cycle progression. These components drive robust enrichment in
   cell division (GO:0051301, p.adj = 9.6 × 10^−6), mitotic cell cycle
   processes (GO:1903047, p.adj = 4.9 × 10^−4), and oocyte meiosis
   (oas04114, p.adj = 1.6 × 10^−2). Notably, RNA processing is tightly
   integrated into this regulatory axis through PRPF18, MAGOH, and PPWD1,
   which are core components of the spliceosome, mediating intron removal
   and mRNA maturation. GO terms such as spliceosomal complex (GO:0005681,
   p.adj = 2.1 × 10^−3) and catalytic step 2 spliceosome (GO:0071013,
   p.adj = 3.5 × 10^−3) confirm this role.
   Together, these findings reveal a modular regulatory architecture
   underpinning sheep ZGA. Community 0 orchestrates membrane remodeling
   and cytoskeletal organization, facilitating the establishment of early
   secretory and signaling capacity, while Community 1 governs cell cycle
   entry and transcriptome maturation, thereby supporting precise
   coordination of nuclear and cytoplasmic events during the
   maternal-to-zygotic transition
3.4. Results of Functional Annotation and Pathway Enrichment Analysis
   In the comparison of 8 vs. 16 cells ([78]Figure 3A,B), GO enrichment
   revealed that processes such as “neuronal differentiation of the
   central nervous system” (GO:0021953), “cell adhesion” (GO:0007155), and
   “cell-cell adhesion” (GO:0098609) were significantly enriched. The
   corresponding differentially expressed genes included CBLN1, NPY, MOG,
   FUT9, RAC2, etc. This result indicates that at the 16-cell stage, genes
   related to neuronal differentiation in the embryonic transcriptome
   begin to be expressed, and intercellular adhesion molecules gradually
   establish, laying the foundation for the subsequent formation of cell
   polarity and tissue morphology ([79]Figure 3A). The KEGG enrichment
   analysis showed that the “Rap1 signaling pathway” (bta04015) and the
   “cAMP signaling pathway” (bta04024) were marginally significantly
   different (p.adjust ≈ 0.068) at the 8–16 stage. Rap1 and cAMP signaling
   play a crucial role in embryonic cell proliferation, migration, and
   differentiation. Their early activation indicates that the cell
   signaling network begins to mediate phenotypic shaping after ZGA
   ([80]Figure 3B). In the comparison of 16 vs. 32 cells ([81]Figure
   3C,D), the GO enrichment of “translation” (GO:0006412), “regulation of
   translation” (GO:0006417), and “peptide biosynthesis process”
   (GO:0043043) was significantly enriched. The corresponding
   differentially expressed genes included EIF4EBP1, EIF3B, MRPS34, RPLP2,
   AKT1, etc. In the KEGG pathway, “ribosome” (bta03010) showed extremely
   significant enrichment (p.adjust ≈ 2.37 × 10^−18), and 52 out of 551
   differentially expressed genes were members of the ribosomal protein
   family. This result indicates that the protein translation capacity of
   the 16- vs. 32-cell phase has reached its peak, providing a foundation
   for rapid cell proliferation and functional differentiation, which is
   consistent with the extensive mRNA translation and ribosome assembly
   during the main ZGA period ([82]Figure 3C). The KEGG pathway “oxidative
   phosphorylation” (bta00190, 33/551 genes, p.adjust ≈ 1.37 × 10^−8) and
   “carbon metabolism” (bta01200, 19/611 genes, p.adjust ≈ 0.0135) are
   significantly enriched, clearly indicating a significant enhancement in
   the mitochondrial energy supply. During this period, the cell
   proliferation rate accelerated, requiring a higher ATP output and the
   need for energy metabolism reprogramming in the early stage of
   development to adapt to rapid proliferation ([83]Figure 3D). For the 8-
   vs. 32-cell overall comparison, protein synthesis and mitochondrial
   function continue to strengthen; genes related to neuronal
   differentiation and cell adhesion are gradually expressed; the signal
   transduction network and metabolic pathways jointly regulate,
   demonstrating the dynamic coupling of transcription, translation, and
   metabolism during embryonic development ([84]Figure 3E,F).
Figure 3.
   [85]Figure 3
   [86]Open in a new tab
   Functional enrichment analysis of differentially expressed genes during
   early embryonic development stages. (A) GO enrichment analysis of
   differentially expressed genes between the 8-cell stage and the 16-cell
   stage. (B) KEGG enrichment analysis of differentially expressed genes
   between the 8-cell stage and the 16-cell stage. (C) GO enrichment
   analysis of differentially expressed genes between the 16-cell stage
   and the 32-cell stage. (D) KEGG enrichment analysis of differentially
   expressed genes between the 16-cell stage and the 32-cell stage. (E) GO
   enrichment analysis of differentially expressed genes between the
   8-cell stage and the 32-cell stage. (F) KEGG enrichment analysis of
   differentially expressed genes between the 8-cell stage and the 32-cell
   stage.
   In the 16-cell to 32-cell comparison, a substantial increase in
   transcriptomic activity was observed (n = 1628 DEGs). However, multiple
   enriched GO terms lost statistical significance after multiple testing
   correction, likely due to functional heterogeneity ([87]Figure 3C).
   Nonetheless, the enrichment of terms such as granulocyte activation
   (GO:0036230), leukocyte activation involved in immune response
   (GO:0002366), and regulation of cell morphogenesis (GO:0022604) was
   detected ([88]Figure 3D). KEGG analysis identified enrichment in
   pathways such as proteoglycans in cancer (oas05205), tight junction
   (oas04530), and endocytosis (oas04144), indicating the activation of
   cell adhesion, cytoskeletal remodeling, and vesicular trafficking
   programs critical for morphogenesis and compaction during this
   transition ([89]Figure 3C,D).
   The 8-cell to 32-cell comparison, which reflects the cumulative
   transcriptional changes across early ZGA, yielded a broad spectrum of
   enriched terms predominantly related to biosynthetic and metabolic
   processes. Notably, GO terms such as amide metabolic process
   (GO:0043603), translation (GO:0006412), cellular respiration
   (GO:0045333), and oxidative phosphorylation (GO:0006119) were
   significantly overrepresented (adjusted p < 1 × 10^−7) ([90]Figure
   3E,F). KEGG pathway analysis consistently demonstrated enrichment in
   metabolism-centered pathways, including carbon metabolism (oas01200),
   biosynthesis of nucleotide sugars (oas01250), inositol phosphate
   metabolism (oas00562), and oxidative phosphorylation (oas00190),
   suggesting a developmental shift toward enhanced translational output
   and mitochondrial bioenergetics to support post-ZGA proliferation and
   morphogenesis ([91]Figure 3E,F).
   Collectively, these results indicate that early-stage DEGs (8 vs. 16)
   are functionally linked to signaling and immune priming, whereas
   later-stage DEGs (16 vs. 32 and 8 vs. 32) are primarily engaged in
   protein biosynthesis, energy metabolism, and organelle maturation. This
   stage-specific transition in molecular function reflects the
   progressive reprogramming of embryonic cells during ZGA and provides
   functional insight into the transcriptional logic of early sheep
   development.
4. Discussion
4.1. Two-Stage Regulatory Mechanisms of Sheep ZGA
   Sheep zygotic genome activation (ZGA) proceeds not as an abrupt switch
   but as a two-phase continuum. During the minor ZGA phase (8-cell vs.
   16-cell transition), only 114 genes are differentially expressed
   ([92]Figure 1A), yet these include key signaling and patterning factors
   such as NODAL, HOXA10, and ID1. Gene Ontology enrichment highlights
   innate immune readiness and intercellular signaling (“response to other
   organism,” “defense response”; [93]Figure 3B), suggesting that early
   embryos prime essential communication and developmental pathways before
   widespread transcription begins. Notably, similar early immune- and
   signaling-related gene activations have been observed during minor ZGA
   in bovine embryos, whereas murine minor ZGA primarily primes cell cycle
   and RNA-processing genes [[94]1,[95]2]. Thus, while the two-phase logic
   appears to be conserved, sheep early embryos emphasize immune-related
   functions more strongly, perhaps reflecting species-specific
   requirements for controlling the maternal environment. This selective
   transcription may establish the framework for subsequent pluripotency
   network activation.
   In the major ZGA phase (16- vs. 32-cell transition), transcriptomic
   activity expands dramatically: 1628 DEGs are detected ([96]Figure 1A),
   including the core pluripotency transcription factors SOX2, NANOG,
   POU5F1, and KLF4 ([97]Figure 1B). Concomitantly, the DNA demethylase
   TET1 is upregulated, indicating an active epigenetic reprogramming that
   erases maternal methylation marks to enable robust embryonic
   transcription ([98]Supplementary Figure S1). The surge in the DEG count
   and the enrichment of cell-cycle and RNA-processing pathways underscore
   the global remodeling of gene regulatory networks as the embryo assumes
   transcriptional autonomy [[99]22,[100]23].
   Notably, a total of 1 742 transcription factors exhibit dynamic,
   stage-specific expression changes when both transitions are considered,
   reflecting finely tuned regulatory waves that coordinate minor and
   major ZGA. Collectively, these data support a two-stage model in which
   an initial priming phase selectively engages signaling and patterning
   genes, followed by a broad activation phase driven by core pluripotency
   factors and epigenetic modifiers. Future studies should dissect the
   upstream regulators that trigger each phase and employ targeted
   functional assays (e.g., qRT-PCR, loss-of-function) to validate
   candidate drivers of sheep ZGA.
4.2. Core Pluripotency Transcription Factors in Sheep Zygotic Genome
Activation
   SOX2, NANOG, POU5F1, and KLF4, as essential core pluripotency
   transcription factors, play pivotal roles in embryonic development
   across various species [[101]24,[102]25]. The core pluripotency
   transcription factors (TFs) NANOG, SOX2, and OCT4 collaboratively
   regulate the intrinsic pluripotency transcription factor network
   [[103]26]. Moreover, pluripotent embryonic stem cells express
   “auxiliary” pluripotency regulators, such as KLF4, a process also
   reflected by our findings ([104]Figure 1B). In sheep embryonic
   development, our research reveals a marked upregulation of POU5F1
   during ZGA, suggesting its involvement in the transition of sheep
   embryos from maternal dependence to zygotic genome activation. SOX2
   also plays a significant role in embryonic development. In sheep
   embryos, changes in its expression may be closely related to the
   formation and development of specific cell types. NANOG is vital for
   maintaining the self-renewal and pluripotency of embryonic stem cells.
   Its upregulation during sheep zygotic genome activation may provide the
   necessary conditions for the subsequent differentiation of totipotent
   embryonic stem cells [[105]27,[106]28,[107]29,[108]30]. In this study,
   the significant changes in core pluripotency transcription factors
   during sheep ZGA further confirm their central roles in regulating the
   ZGA process.
4.3. Modular Logic of PPI–Louvain Integration
   In the present study, we extended the traditional co-expression
   analysis by overlaying the MEred-derived gene set onto a
   high-confidence protein–protein interaction (PPI) network, thereby
   integrating transcriptional correlation with physical association data.
   While WGCNA effectively clusters genes according to shared expression
   profiles, it cannot discriminate between direct biochemical
   interactions and indirect, co-regulated relationships; by incorporating
   PPI information from the STRING v11 database, we leverage
   experimentally validated and computationally predicted associations to
   resolve mechanistic cores within each module [[109]31]. This
   integrative strategy is grounded in the principles of network biology,
   which posit that cellular functions emerge from the coordinated action
   of interacting proteins and that PPI networks can reveal functional
   assemblies not apparent from expression data alone [[110]32].
   To partition this combined co-expression/PPI network into biologically
   coherent substructures, we applied the Louvain algorithm, a two-phase,
   greedy optimization method designed to maximize network modularity
   [[111]33]. In the first phase, individual nodes are iteratively
   reassigned to neighboring communities if doing so yields a positive
   gain in modularity, effectively capturing local edge density
   enhancements. In the second phase, entire communities are collapsed
   into “super-nodes,” and the process repeats, uncovering a hierarchical
   organization of communities at multiple scales. This approach has been
   shown to efficiently and accurately identify community structures in
   large and complex networks, outperforming earlier methods in both speed
   and modularity quality [[112]34].
   Recognizing that peripheral or spurious interactions can obscure core
   regulatory modules, we filtered the PPI network to retain only nodes
   with a degree > 200 before community detection. This thresholding
   enriches for hub-centered subnetworks, which, in accordance with
   network biology theory, often correspond to essential regulatory
   circuits and molecular complexes [[113]32]. Indeed, a hub-based
   analysis has been demonstrated to enhance the detection of functional
   modules in both health and disease contexts, ensuring that resulting
   communities represent robust, high-confidence interaction cores rather
   than noise-driven clusters [[114]35,[115]36,[116]37,[117]38].
   Biologically, the two communities delineated by Louvain detection map
   onto the major functional demands of early embryogenesis. The first
   community, enriched for vesicle-mediated transport and cytoskeletal
   dynamics, likely orchestrates the establishment of the secretory
   pathways and membrane architecture necessary for signal reception and
   organelle biogenesis during ZGA. The second community, characterized by
   cell-cycle regulators and spliceosomal components, appears to
   coordinate the G2/M transition and mRNA processing modules essential
   for accurate cell division and transcriptome maturation. This clear
   division of labor underscores how discrete network modules can
   simultaneously satisfy the sequential and overlapping requirements of
   zygotic genome activation [[118]6,[119]39,[120]40,[121]41].
   Methodologically, our multi-layered framework—combining WGCNA and PPI
   integration with Louvain community detection—mitigates the limitations
   inherent to each individual approach. Spurious co-expression links are
   filtered by physical interaction evidence, while community detection
   distills complex networks into functionally coherent modules. Such
   integrative network methodologies are increasingly recognized as
   critical for unraveling the architecture of developmental processes,
   from molecular assemblies to higher-order morphogenetic events
   [[122]42,[123]43,[124]44]. Future extensions could incorporate
   temporally resolved PPIs or single-cell network reconstructions to
   capture the dynamic reconfiguration of regulatory modules as embryos
   progress through ZGA.
5. Conclusions
   This study provides a systems-level view of sheep zygotic genome
   activation by integrating differential expression, WGCNA, and
   PPI-guided community detection.
   We identified stage-specific transcriptional changes, highlighted core
   pluripotency factors, and uncovered two major regulatory communities
   coordinating membrane dynamics and cell cycle progression.
   These findings offer new insights into the modular regulatory
   architecture underlying early embryogenesis in sheep.
Supplementary Materials
   The following supporting information can be downloaded at:
   [125]https://www.mdpi.com/article/10.3390/biology14060676/s1.
   [126]biology-14-00676-s001.zip^ (14.2MB, zip)
Author Contributions
   X.L. Conceptualization, Methodology, Software, Data curation,
   Visualization, Writing—review and editing; P.N. Conceptualization,
   Methodology, Data curation, Visualization; K.H. Conceptualization, Data
   curation, Visualization, Writing—review and editing; F.H. Data
   curation, Visualization, Writing—review and editing; X.W.
   Conceptualization, Data curation; F.H. Data curation, Visualization,
   Writing—review and editing; P.S. Investigation, Software; Q.G.
   Investigation, Methodology, Writing—review and editing; D.F.
   Conceptualization, Methodology, Writing—review and editing. All authors
   have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
   This study was conducted in strict accordance with the “Guidelines for
   the Care and Use of Animals in Biological Research in the People’s
   Republic of China” and received approval from the Animal Ethics
   Committee of the College of Animal Science and Technology, Tarim
   University, under approval number DWBH20220101. All experimental
   animals were legally obtained from the slaughterhouse, and the study
   adhered to ethical guidelines for the use of animal tissues.
Informed Consent Statement
   Not applicable.
Data Availability Statement
   The datasets generated and/or analyzed during the current study are not
   publicly available due to privacy and proprietary constraints but are
   available from the corresponding author on reasonable request.
Conflicts of Interest
   The authors declare no conflicts of interest.
Funding Statement
   This research was supported by grants from the National Natural Science
   Foundation of China (32402760) and the Xinjiang Tianchi Doctoral
   Introduction Plan Project.
Footnotes
   Disclaimer/Publisher’s Note: The statements, opinions and data
   contained in all publications are solely those of the individual
   author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI
   and/or the editor(s) disclaim responsibility for any injury to people
   or property resulting from any ideas, methods, instructions or products
   referred to in the content.
References