Abstract

   The regulatory mechanisms governing cell fate determination,
   particularly lineage diversification during mammalian embryonic
   development, remain poorly understood with in‐depth regulatory
   paradigms yet to be fully elucidated. Here, leveraging the epigenetic
   landscape of mouse gastrula, p‐Enh is identified, an enhancer located
   within the first intron of Cdx2 and epigenetically pre‐marked in the
   primitive streak region, as a pivotal regulator for posterior tissue
   development in mouse embryos. Morphological and single‐cell
   transcriptomic analyses confirmed embryonic lethality phenotype with
   disrupted posterior tissue development trajectories in p‐Enh‐KO
   embryos. Molecularly, apart from regulating the neighboring coding‐gene
   Cdx2 in cis, the findings suggest that p‐Enh also modulates the global
   transcriptome and epigenomic landscape, which might through the
   transient production of eRNA in trans. Further investigation revealed
   p‐Enh‐derived eRNAs participate in the regulatory cascades of TGF‐β
   signaling by directly interacting with SMAD4 protein. Combinatorial
   modulation of TGF‐β signaling and p‐Enh‐eRNA abundance can largely
   rescue the posterior development deficiency in in vitro gastruloids
   through a Cdx2‐independent mechanism. Thus, a potential model is
   proposed in which the broadly distributed p‐Enh transcripts within the
   nucleus can serve as essential cross‐modular coordinators, priming the
   posterior development of mouse embryo.

   Keywords: enhancer RNA, posterior tissue development, Pre‐marked
   enhancer, TGF‐β signaling
     __________________________________________________________________

   A gastrula‐premarked enhancer within the first intron of Cdx2 governs
   posterior embryonic development via both regulating Cdx2 expression in
   cis and coordinating TGF‐β signaling in trans to ensure proper
   posterior body development. Functional rescue in gastruloids reveals a
   Cdx2‐independent pathway, highlighting p‐Enh‐eRNA as key epigenetic
   coordinators in posterior tissue development.

   graphic file with name ADVS-12-e00895-g007.jpg

1. Introduction

   Gastrulation stands as a pivotal developmental process to set up the
   embryonic body plan, characterized by the differentiation of the
   epiblast into three germ layers.^[ [56]^1 , [57]^2 ^] Studies reported
   that the posteriorly positioned primitive streak (PS) and adjacent
   posterior epiblast region can integrate the intricate developmental
   signals, and then serve as a progenitor reservoir with the competence
   to give rise to a diverse range of tissues, spanning from mesodermal
   tissues such as the anlage of somitic, limb and cardiac mesoderm to the
   posterior neural lineage of the spinal cord.^[ [58]^3 , [59]^4 , [60]^5
   ^] Nonetheless, how cells reside in the PS region coordinate the
   extrinsic signals and intrinsic molecular features and sustain the
   competence for prospective cell fate differentiation remains largely
   elusive.

   Vital developmental morphogens lay the foundation for tissue patterning
   and homeostasis across various organs.^[ [61]^6 ^] The cross‐talk among
   different pathways forms a “signal cocktail” with an optimal
   concentration niche, which facilitates diverse cell differentiation
   programs. For instance, during gastrulation, Wnt and TGF‐β signaling
   pathways are activated in the PS and mesodermal cells, then diffuse
   from the organized sources and form morphogen gradients, thus
   instructing the cell fate of surrounding tissues.^[ [62]^7 , [63]^8 ^]
   However, β‐catenin and SMAD4, the downstream effectors of Wnt and TGF‐β
   signaling, are broadly expressed throughout the entire embryo during
   gastrulation.^[ [64]^9 ^] Distinct sets of downstream developmental
   genes are then activated to promote cell fate diversification in
   different regions of the embryo.^[ [65]^1 ^] The specific underlying
   mechanisms for how these signals anchor precisely to a specific set of
   developmental genes remain as mysteries.

   The selective chromatin responsiveness, manifested as various chromatin
   accessibilities, chromatin modifications, or chromatin topologies at
   regulatory loci, to signal effectors and co‐factors, have been reported
   as an important dimension in orchestrating the regulative forces of
   extrinsic signals.^[ [66]^10 ^] Notably, the regulatory elements are
   frequently docking platforms for transcription factors (TFs) and are
   usually annotated as enhancers or promoters.^[ [67]^11 ^] Dynamics of
   regulatory elements contribute to the regulation of lineage‐specific
   transcriptional programs under distinct developmental conditions.^[
   [68]^12 ^] Mechanistic studies indicate that regulatory elements can
   not only regulate the expression of neighboring genes but also hold the
   proper cell fate plasticity through direct enhancer‐promoter chromatin
   interactions or self‐transcribed enhancer RNAs (eRNAs).^[ [69]^13 ,
   [70]^14 ^] In particular, the discovery of eRNAs has expanded the
   understanding of enhancer functions from modulating local gene targets
   in cis to even genome‐wide chromatin remodeling and cross‐modular
   interactions with the transcriptional machinery in trans.^[ [71]^15 ,
   [72]^16 , [73]^17 ^] Recently, we and others have reported that
   enhancers crucial for prospective tissues development are frequently
   pre‐marked by epigenetic modifications or TFs’ binding prior to the
   corresponding gene expression,^[ [74]^13 , [75]^18 , [76]^19 , [77]^20
   , [78]^21 , [79]^22 ^] a phenomena termed “chromatin priming”. However,
   limited knowledge has been acquired concerning the coordination between
   chromatin regulation and the extrinsic signals in the establishment of
   cellular developmental competence and the subsequent lineage
   diversification of the PS region.

   In this study, through combined bioinformatic analysis and experimental
   screening of the H3K27ac pre‐marked regulatory elements in the PS
   regions of the mouse gastrula, we identified a regulatory element,
   p‐Enh, which is located in the first intron of Cdx2. Transgenic
   enhancer reporter assay revealed that p‐Enh is pre‐activated in the PS
   region at the late gastrulation stage (E7.5) and consistently activated
   in the tailbud region during early organogenesis. Phenotypic analyses
   of the p‐Enh knockout embryo mutants revealed embryo lethality at
   around E10.5 stage with severe disruptions of posterior tissue
   development. Single‐cell RNA‐seq confirmed that the cellular abundances
   of posterior tissues, such as neuromesodermal progenitors (NMP), limb
   mesenchyme, and presomitic mesoderm, were heavily decreased.
   Mechanistically, besides regulating its neighboring coding‐gene Cdx2,
   p‐Enh could transiently express abundant eRNAs, in the nuclei of the PS
   cells at E7.5 but not in later‐stage embryos. The genetic removal of
   p‐Enh in in vitro differentiated NMP leads to dramatic remodeling of
   epigenomic landscape specifically at SMAD4 binding hotspots, thus
   reducing SMAD4 binding. RNA pull‐down assays demonstrated that eRNAs
   derived from p‐Enh directly interact with SMAD4 protein. Furthermore,
   functional assays demonstrate that combinatorial activation of TGF‐β
   signaling and eRNAs enrichment can largely rescue the p‐Enh‐KO defects.
   Thus, we propose that the transient expressed p‐Enh‐eRNAs might act as
   trans‐modular coordinator, which bridges the specific interactions with
   morphogen effectors and specific genomic loci of master regulators,
   therefore contributing to the priming of posterior tissue development.

2. Results

2.1. Screening and Identification of Pre‐Marked Distal Regulatory Elements
within PS Cells in the Mouse Gastrula

   During mouse gastrulation, the PS cells first emerge at the E6.5 stage,
   then gradually form a morphologically distinct structure at the
   posterior embryonic region, and finally establish a diverse cell
   fate‐specified progenitor pool at the late gastrulation stage (E7.0 to
   E7.5).^[ [80]^4 , [81]^23 ^] PS cells possess a remarkable capacity to
   give rise to a diverse range of tissues, encompassing mesodermal cells
   such as somites,^[ [82]^24 ^] limb mesenchyme,^[ [83]^25 ^] and cardiac
   mesoderm (Figure [84]1a).^[ [85]^26 ^] Moreover, it has been recognized
   that the PS region of the mouse gastrula also harbors a newly
   identified bi‐potent NMP, which can contribute to both the posterior
   spinal cord and mesoderm (Figure [86]1a).^[ [87]^27 , [88]^28 , [89]^29
   ^] Thus, the mechanisms governing PS cell development are fundamental
   for subsequent patterning of diverse tissues and organs.

Figure 1.

   Figure 1
   [90]Open in a new tab

   Screening and identification of pre‐marked distal regulatory elements
   within PS cells in the mouse gastrula. a) The diagram illustrating the
   lineages of posterior tissue development during mouse embryonic
   development. The precursor cells contributing to the posterior tissue
   are denoted in red shade. b) The ranking of distal regulatory elements
   (DREs) and corresponding heatmap based on quantified H3K27ac ChIP‐seq
   data.^[ [91]^18 , [92]^31 ^] The top‐30 significant DREs are shown
   based on calculated ‐log[10](p‐value). p‐values are calculated by using
   a modified two‐sided t‐test (two‐sided) in MAnorm2 (see Statistical
   analysis). Heatmap is constructed based on log[2](normalized count).
   DREs with retained abundant H3K27ac at E9.5 are marked with asterisk.
   c) The heatmap visualizing the eRNA qPCR results correlated with
   Figure [93]1b. Multiple tissue samples are obtained from embryos at
   E7.0, E7.5, and E9.5. Tissue samples are stratified into three groups
   based on developmental trajectories. For each DRE locus, two
   independent primer sets with distinct targeting segments are used for
   qPCR test, all tests are replicated in pairs to ensure reliability. The
   DREs highlighted in bold indicating DREs subjected to transgenic assay.
   d) The LacZ staining revealing spatiotemporal activities of distinct
   DREs at key developmental stages (E7.0, E7.5, E9.5). (Some transgenic
   embryos with no detectable positive signals are labeled with “no
   signal”) Scale bar: 500 µm. e) The genome browser snapshots showing
   chromatin status near p‐Enh (DRE‐1). The tracks including published
   H3K27ac ChIP‐seq,^[ [94]^18 , [95]^31 ^] ATAC‐seq,^[ [96]^122 ^]
   ChIP‐seq,^[ [97]^38 , [98]^39 , [99]^40 , [100]^41 , [101]^42 ,
   [102]^43 , [103]^44 ^] GRO‐seq,^[ [104]^45 ^] and RNA‐seq^[ [105]^123
   ^] datasets. f) RNAscope targeting p‐Enh‐eRNA during gastrulation
   (E6.5, E7.0, E7.5). The schematic of the mouse gastrula is presented on
   the left side. Zoomed‐in view showing the enrichment of p‐Enh‐eRNA in
   the E7.5 PS. The labeling scheme employs “A” for anterior region, “P”
   for posterior region, “Epi” for epiblast, “Mes” for mesoderm, and “End”
   for endoderm. Scale bar: 50 µm.

   Previously, we identified the epigenetic pre‐mark at regulatory
   elements associated with developmental genes during gastrulation. This
   mechanism may serve as a widely available mechanism that primes
   prospective cell fate for following organogenesis.^[ [106]^18 ^] Here,
   to pinpoint the specific regulatory elements responsible for the
   prospective developmental competence of the PS cells, we first employed
   MAnorm2^[ [107]^30 ^] to search for pre‐marked elements with low
   H3K27ac at E7.0 stage, abundant H3K27ac enrichment at E7.5 stage,
   sustainable H3K27ac signal at E9.5 tailbud region (Figure [108]1b). In
   total, 1828 distal regulatory elements (DREs) were identified with
   significant upregulation (twofold, p‐value < 0.01) of H3K27ac
   enrichment within the posterior tissue from E7.0 to E7.5 stage (Figure
   [109]S1a, Supporting Information). Genomic Regions Enrichment of
   Annotations Tool (GREAT) analysis of the top‐100 DREs, ranked by
   p‐value, revealed that these DREs are related to genes involved in
   anterior‐posterior (A‐P) patterning, animal organ morphogenesis, neural
   tube development, as well as mesenchymal cell differentiation, which
   further indicate that the pre‐mark of H3K27ac at DREs are functionally
   related to the posterior tissue development (Figure [110]S1b,
   Supporting Information; Table [111]S1, Supporting Information). To
   determine the enrichment of H3K27ac during the following organogenesis,
   we checked the H3K27ac distribution at the DREs in E9.5 tailbud tissues
   by incorporating the published dataset.^[ [112]^31 ^] As shown in
   Figure [113]1b, among the top‐30 ranked DREs, 10 DREs (marked with
   asterisk) maintain abundant H3K27ac enrichment in E9.5 tailbud.

   Enhancer RNAs, which are transcribed from corresponding enhancers and
   often pre‐loaded with RNA polymerase II and H3K27ac, could be used to
   represent the enhancer activity and also be involved in the regulation
   of gene expression and cell fate determination.^[ [114]^32 , [115]^33 ,
   [116]^34 , [117]^35 , [118]^36 ^] To further refine the scope of
   potential functional DREs, we determined the abundance of eRNA
   transcripts for the top‐30 DREs across various tissues of embryos at
   different stages (E7.0, E7.5, E9.5) using RT‐qPCR (Figure [119]1c).
   Based on the expression pattern of these eRNAs, we clustered the eRNAs
   into three groups: posterior early‐activated group with relatively high
   and specific eRNA expression in the PS cells of E7.5 stage; posterior
   late‐activated group with obvious eRNA expression in the E9.5 trunk and
   tailbud; and posterior no‐enriched group (Figure [120]1c). Next, to
   determine the innate regulatory activity during mouse embryogenesis, we
   used in vivo transgenic enhancer reporter system to validate the DREs
   from the above three groups (Figure [121]S1c, Supporting
   Information).^[ [122]^18 , [123]^37 ^] We performed transgenic reporter
   assays with the top‐10 DREs, spanning the three eRNA groups
   (highlighted in bold in Figure [124]1c) and found that 6 out of 10 DREs
   could drive LacZ expression in the tested developmental stages (E7.0 to
   E9.5). Notably, the majority (5 out of 6) of DREs with LacZ signals
   belonged to the early‐activated group (DRE‐1, 2, 3, 5, and 8). Only one
   DRE (DRE‐6) from the late‐activated group showed LacZ signal (1 out of
   3), and no positive results were observed in the posterior no‐enriched
   group (Figure [125]1d; Figure [126]S1d, Supporting Information). Within
   these six DREs, DRE‐1 exhibits the most pronounced caudal tissue
   specificity of LacZ signals. DRE‐1 is de novo activated in the E7.5 PS
   cells with extraembryonic activity in E7.0 embryos, and its posterior
   region‐specific enhancer activity is sustained in the tailbud region
   till later organogenetic stages (E8.5, E9.5, and E11.5)
   (Figure [127]1d; Figure [128]S1e, Supporting Information). Further
   exploration of the genomic features of DRE‐1 revealed that the
   evolutionarily conserved DRE‐1 element is located in the first intron
   of the coding‐gene, Cdx2, and harbors pervasive binding sites for
   epigenetic activators (MED1 and p300),^[ [129]^38 , [130]^39 ^] signal
   effectors (β‐catenin, TCF‐1, and SMAD4),^[ [131]^40 , [132]^41 ,
   [133]^42 ^] and master regulators (T and SOX2) (Figure [134]1e; Figure
   [135]S1f, Supporting Information).^[ [136]^43 , [137]^44 ^] Motif
   enrichment and sequence composition analyses revealed that DRE‐1 is
   GC‐rich and enriched for binding motifs of key developmental
   transcription factors (Figure [138]S1g,h, Supporting Information).
   Published gene run‐on sequencing (GRO‐seq)^[ [139]^45 ^] using in vitro
   PS counterpart also demonstrated the existence of nascent transcripts
   from the DRE‐1 locus (Figure [140]1e). Given the high
   tissue‐specificity of DRE‐1 in the posterior tissue, we named the
   element as posterior enhancer (p‐Enh) in the following studies.
   Consistent with the GRO‐seq and eRNA quantification results
   (Figure [141]1c,e), RNA in situ visualization using RNAscope with
   strand‐specific probes revealed that the p‐Enh region gives rise to
   bi‐directional transcripts, with the sense‐oriented transcript being
   more abundant and thus referred to as “p‐Enh‐eRNA” in the following
   text. Specifically, the abundant p‐Enh‐eRNA signals were detected in
   the PS and adjacent epiblast region of E7.5 embryos (Figure [142]1f,
   Figure [143]S2a, Supporting Information). There were no eRNA signals
   could be detected in PS cells of earlier embryonic stages (E6.5 and
   E7.0), or in the tailbud of the organogenetic embryos (E9.5)
   (Figure [144]1f; Figure [145]S2b,c, Supporting Information).
   Additionally, despite robust expression of the Cdx2 in the ExE region,
   we found no signals of eRNA in these cells (Figure [146]S2d, Supporting
   Information). This result persuasively eliminated the possibility that
   p‐Enh‐eRNA act as a byproduct of unspliced intronic fragments of newly
   transcribed Cdx2.

   Taken together, we identified a DRE, p‐Enh, which is pre‐marked by
   H3K27ac specifically in posterior epiblast at E7.5, possessing the
   posterior tissue‐specific enhancer activity from late gastrulation
   stage to organogenesis stage. Notably, p‐Enh shows transient eRNA
   transcription with strict spatial‐temporal specificity, occurring only
   in the E7.5 posterior epiblast.

2.2. Genetic Removal of p‐Enh Causes Severe Posterior Tissue Developmental
Defects and Embryonic Lethality at the Organogenetic Stage

   To investigate the potential functional role of p‐Enh during mouse
   embryo development, we genetically knocked‐out the p‐Enh (≈1.3 kb) in
   mouse embryo using CRISPR/Cas9 system (Figure [147]S3a, Supporting
   Information). No live mouse individual with homozygous p‐Enh removal
   (p‐Enh‐KO) could be acquired (Figure [148]S3b, Supporting Information).
   Further analyses identified that p‐Enh‐KO embryos are deceased around
   the E10.5‐E11.5 stage with noticeable posterior tailbud developmental
   abnormalities and hindlimb absence (Figure [149]2a; Figure [150]S3c,d,
   Supporting Information). TUNEL assay further confirmed that the
   apoptotic signals become prominent in the E10.5 p‐Enh‐KO embryos,
   predominantly enriched at the tailbud and the brain regions (Figure
   [151]S3e, Supporting Information).

Figure 2.

   Figure 2
   [152]Open in a new tab

   Genetic removal of p‐Enh causes severe posterior tissue developmental
   defects and embryonic lethality at the organogenetic stage. a) Bright
   field images of WT and p‐Enh‐KO mouse embryos at different development
   stages (E8.5, E9.5, E10.5, and E11.5). The white dotted lines
   representing the top baseline of the head. Arrowheads indicating the
   tailbud region. The tailbud and limb are outlined with dotted lines,
   with tailbud in white, forelimb in black, and hindlimb in yellow. FL:
   forelimb; HL: hindlimb; TB: tailbud. Scale bar: 500 µm. b) Top left:
   the schematic diagram describing the sampling strategy of single‐cell
   RNA‐seq. Right: UMAP plots of single‐cell RNA‐seq data from WT and
   p‐Enh‐KO embryo from E8.5 to E10.5, color‐coded by cluster identities.
   Bottom left: the percent of cluster cells in WT embryos displaying
   progressive cell‐type complexity changes. c) Differential abundance of
   cell clusters with pronounced changes in p‐Enh‐KO compared with WT
   based on log[2](fold change). d) Enrichment score illustrating
   expression of top‐100 marker genes in each cell clusters. The
   enrichment score of upregulated genes are labeled with red color, and
   the downregulated genes in blue. p‐values for enrichment analysis are
   calculated using Fisher's exact test (one‐sided) and adjusted for
   multiple testing using Benjamini‐Hochberg (BH) method. e) Boxplot
   illustrating normalized read count of marker genes with significant
   expression changes between WT and p‐Enh‐KO embryos at E8.5 in indicated
   clusters. p‐values are calculated by using a two‐sided Wilcoxon rank
   sum test in Seurat FindMarkers function, with significantly
   differentially expressed genes defined as genes with log2(fold change)
   > 0.25, BH‐adjusted p‐values < 0.05 and fraction of detected cells >
   0.25.

   To systematically characterize the molecular phenotypes of p‐Enh‐KO
   embryos, we performed single‐cell transcriptomic analyses in p‐Enh‐KO
   and wildtype counterparts from the same litter covering successive
   developmental stages from E8.5 to E10.5 (Figure [153]2b). In total,
   180930 cells across three sequential stages were sequenced with a
   median coverage of 23783 unique molecular identifiers (UMIs) per cell.
   127069 cells were retained for further analysis after rigorous quality
   filters (Figure [154]S3f, Supporting Information; Experimental
   Section). By referencing the published embryo atlas dataset,^[ [155]^46
   , [156]^47 ^] 21 cell clusters of the WT embryos with precise
   annotation were identified (Figure [157]2b; Figure [158]S3g, Supporting
   Information). Generally, our annotation unveiled diverse clusters with
   unique set of markers spanning all three germ layer derivatives. For
   example, the neural tube cells were defined by the expression of Sox2,
   Pax6, and Fabp7, while NMPs were distinguished by Epha5, T, and Fgf8
   (Figure [159]S3g, Supporting Information). Additionally, the dynamics
   of cell type proportions across developmental stages also reflect the
   sequential cell fate differentiation during organogenesis
   (Figure [160]2b). Thus, the single‐cell transcriptome profile of early
   organogenesis ranging from E8.5 to E10.5 faithfully captures the in
   vivo developmental events.

   Next, we projected the single‐cell transcriptome of p‐Enh‐KO embryos
   onto the WT reference, and found that the major cell type compositions
   between the WT and p‐Enh‐KO embryos were comparable (Figure [161]2b).
   The prominent apoptotic signals identified in E10.5 p‐Enh‐KO embryos
   were further confirmed by the assessment of apoptosis and hypoxia
   activity hallmarks expression (Figure [162]S4a, Supporting
   Information). To mitigate the potential bias caused by apoptotic cells,
   we focused our analyses on the E8.5 and E9.5 embryos in the following
   sections. Differential cellular abundance analysis highlighted a
   substantial decrease in posterior derived tissues of p‐Enh‐KO embryos,
   such as NMP, presomitic mesoderm, and limb mesenchyme (Figure [163]2c;
   Figure [164]S4b, Supporting Information). Following this, we calculated
   the enrichment scores of marker gene sets within each cluster.
   Remarkably, while the most significant cell density changes were
   observed at E9.5 (Figure [165]2c), the marker gene expression patterns,
   especially for NMP and presomitic mesoderm cells, already showed severe
   abnormalities at E8.5 in p‐Enh‐KO embryos (Figure [166]2d,e; Figure
   [167]S4c, Supporting Information). The alterations of gene expression
   prior to those of cellular phenotypes indicates that p‐Enh may function
   at an earlier stage when the posterior tissues were differentiated from
   the PS region in the gastrula.

2.3. Developmental Anomalies in both A‐P Axis Patterning and Limb Development
in p‐Enh‐KO Embryos

   Given the phenotypes of p‐Enh‐KO were mainly associated with the
   posterior tissue development, we focused on cell types comprising the
   embryonic A‐P axis from the single‐cell reference atlas, encompassing
   forebrain/midbrain/hindbrain, neural tube, NMP, presomitic mesoderm,
   and paraxial mesoderm. Sub‐clustering of these single cells by UMAP
   further divided the cells into 7 clusters (Figure [168]3a). RNA
   velocity and pseudo‐time transcriptional ordering analyses showed a
   clear bi‐directional trajectory originated from NMP toward the neural
   and mesodermal lineages, respectively (Figure [169]3a). This result
   also aligns with the model that two distinct developmental trajectories
   exist in neural tube cells: one originating from the anterior brain
   cells and the other from the bipotential NMP in the posterior region.^[
   [170]^27 , [171]^28 ^] The developmental signal gradients (RA, Wnt, and
   FGF signaling) and A‐P axis‐related marker genes alignment^[ [172]^48 ,
   [173]^49 ^] signify the robust correlation between the plotted
   single‐cell transcriptome and the A‐P axis patterning process in vivo
   (Figure [174]S5a–d, Supporting Information), suggesting that the
   reconstructed UMAP plot effectively reflects cellular distribution
   along the in vivo A‐P axis, and highlight NMP as the origin for
   posterior neural tube cells (Figure [175]3a).

Figure 3.

   Figure 3
   [176]Open in a new tab

   Developmental anomalies in both A‐P axis patterning and limb
   development in p‐Enh‐KO embryos. a) Left: cartoon diagram demonstrating
   the structure and cell types in tailbud region. RA, Wnt, and FGF
   signaling show gradient activities along the A‐P axis. Right: UMAP
   visualizing the subclusters and RNA velocity projection field. The
   dashed lines delineate diverse developmental trajectories within the
   neural tube. b) Cell density distribution of the A‐P axis related cells
   along 1D transcriptional axis. The dashed line indicating boundary
   between anterior‐derived and NMP‐derived neural tube cells. The black
   arrows indicating noticeable cell density decrease in p‐Enh‐KO embryos
   compared with WT. c) Heatmap showing scaled expression of
   trajectory‐based expressed genes based on NMP‐derived tissues (NMP,
   posterior neural tube, presomitic mesoderm, paraxial mesoderm‐1, and
   paraxial mesoderm‐2). Examples of key TFs are labeled. Scaled
   expression is calculated based on the output of the NormalizeData
   function from the Seurat package, and further normalized to a [0, 1]
   range per gene using min‐max scaling. d) In situ hybridization results
   showing the expression patterns of Hoxb9 in WT and p‐Enh‐KO embryos at
   different developmental stages. Dashed lines outlining the
   Hoxb9‐positive regions in posterior neural tube and somites. Similar
   results are obtained from embryo replicates. Scale bar: 500 µm. e)
   Expression patterns of Tcf15 in WT and p‐Enh‐KO embryos at different
   developmental stages indicating shrinkage of the undifferentiated
   progenitor of aPSM. Yellow dashed lines indicating Tcf15‐negative
   tailbud region. Similar results are obtained from embryo replicates.
   Scale bar: 500 µm. f) Lightsheet imaging of whole‐mount RNAscope
   targeting Uncx4.1 in WT and p‐Enh‐KO embryo samples at E8.5 and E9.5.
   The clear strip‐like pattern of Uncx4.1 in WT embryos disturbs in E9.5
   p‐Enh‐KO embryos. Scale bar: 500 µm. g) UMAP visualizing the projection
   field of limb related cell clusters with RNA velocity. Colors
   indicating different cell types. h) Density plots illustrating the
   distribution of limb related cells in WT and p‐Enh‐KO embryos at
   different developmental stages. i) Boxplot showing the expression
   (log[2](Normalized Read Count +1)) of limb marker genes in WT and
   p‐Enh‐KO samples. p‐values are calculated by using a two‐sided Wilcoxon
   rank sum test in Seurat FindMarkers Function. j) In situ hybridization
   of forelimb marker Tbx5 and hindlimb marker Pitx1 in WT and p‐Enh‐KO
   embryos. Scale bar: 500 µm.

   Subsequently, we compared the cellular distribution between wildtype
   and p‐Enh‐KO embryos along with the reconstructed digital A‐P map. By
   linearly plotting the single cells along the reconstructed A‐P axis,
   the p‐Enh‐KO embryo exhibited a noticeable reduction in cell density in
   NMP and their immediate developmental progenies (Figure [177]3b).
   Trajectory‐based gene expression analyses revealed significant
   disruptions in the developmental programs during both posterior neural
   tube and mesodermal lineage development in p‐Enh‐KO embryos
   (Figure [178]3c). Experimental validation showed that the expression
   territory and abundance of Hoxb9, a marker for posterior neural tube
   and mesodermal cells with the M‐shaped distribution, was restricted in
   the tailbud tip of the p‐Enh‐KO embryos (Figure [179]3d). Additionally,
   Tcf15, marker for anterior presomitic mesoderm (aPSM) and somite,^[
   [180]^50 ^] exhibited a substantial decrease and shrinkage in the
   undifferentiated progenitor pool for aPSM (Tcf15‐negative region) in
   E9.5 p‐Enh‐KO embryos (Figure [181]3e), reflecting a depletion of
   posterior progenitors and/or precocious differentiation toward anterior
   somitic fates. Besides, the highly organized stripe‐like pattern as
   marked by of Uncx4.1 and Tbx18 in WT embryos^[ [182]^51 , [183]^52 ^]
   was severely disrupted in p‐Enh‐KO embryos, especially for embryos at
   E9.5 (Figure [184]3f; Figure [185]S5e, Supporting Information).
   Interestingly, the anterior somite structure appears to be normal in
   the E8.5 p‐Enh‐KO embryos (Figure [186]3f). Given the different
   trajectory routes for the early‐anterior (early PS‐derived) and
   late‐posterior somitogenesis (NMP‐derived) during mouse embryonic
   development,^[ [187]^24 ^] the disrupted Uncx4.1 distribution indicated
   that p‐Enh may specifically regulate the NMP‐derived late‐posterior
   somitogenesis process in the mouse embryo.

   Additionally, we also observed a notable decrease in the
   posterior‐derived limb mesenchyme cell population in E9.5 p‐Enh‐KO
   embryos (Figure [188]2c). To delve into these abnormalities, we
   specifically analyzed cells associated with limb mesenchyme and further
   categorized them into three sub‐clusters representing forelimb
   progenitors, hindlimb progenitors, and other mesenchyme based on marker
   gene expression. RNA velocity analysis confirmed that both forelimb and
   hindlimb progenitors seem to share one common developmental origin
   (Figure [189]3g). Cell density analyses reveal comparable patterns
   between WT and p‐Enh‐KO embryos at E8.5, whereas there is a pronounced
   reduction in hindlimb progenitors in E9.5 p‐Enh‐KO embryos
   (Figure [190]3h). Molecular analyses identified that genes related to
   hindlimb development, such as Hoxc10, Hoxd11, Pitx1, and Tmem119, were
   significantly down‐regulated in p‐Enh‐KO embryos, while the markers of
   forelimb showed limited expression alterations (Figure [191]3i).
   Further experimental validation of the typical hindlimb marker Pitx1
   confirmed the absence of hindlimb structure in the p‐Enh‐KO embryos
   with the normal distribution of forelimb cells as marked by Tbx5
   (Figure [192]3j).

   Moreover, to assess whether additional developmental defects beyond the
   A‐P axis and hindlimb development were present in p‐Enh‐KO embryos, we
   examined the development of other major embryonic structures, including
   the brain, branchial arch, and heart. The representative markers such
   as Sox2 for the neural tube, Msx2 for the branchial arch, and Tbx5 and
   Myh6 for the heart remain largely normal expression patterns in
   p‐Enh‐KO embryos, indicating no major disruption in these tissues
   (Figure [193]S6a‐f, Supporting Information).

   Together, these results indicate that the deletion of p‐Enh
   specifically leads to severe developmental abnormalities in posterior
   tissues, manifesting as disrupted aPSM differentiation, paralyzed
   somitogenesis, and absent hindlimbs, which finally cause embryonic
   lethality around E10.5 stage. Single‐cell transcriptomics analysis
   reveals that p‐Enh can regulate the gene expression of NMP‐derived
   clusters as early as E8.5, p‐Enh‐KO results in a pronounced decrease in
   cell density in posterior cell clusters at E9.5.

2.4. Two Distinct Mechanisms with Differential CDX2 Dependencies Underlying
p‐Enh Function

   A substantial body of work on various enhancers have proposed
   mechanisms of enhancer function by regulating adjacent genes.^[
   [194]^53 , [195]^54 , [196]^55 , [197]^56 , [198]^57 ^] Considering
   that p‐Enh is located in the first intron of Cdx2 and the crucial role
   of Cdx2 in body axis development,^[ [199]^31 , [200]^58 , [201]^59 ,
   [202]^60 ^] we examined whether the expression of Cdx2 was affected in
   p‐Enh‐KO embryos. As shown, there was no change in Cdx2 expression in
   the extraembryonic cells, with partial down‐regulation in the posterior
   regions at E7.5 (Figure [203]4a). Consistently, single‐cell
   transcriptome data also showed reduced Cdx2 expression in posterior
   tissues such as NMP and presomitic mesoderm (Figure [204]S7a,
   Supporting Information). To probe the downstream molecular mechanisms
   for p‐Enh function and dissect the potential relevance between p‐Enh
   and its neighboring coding gene‐Cdx2, we prepared the p‐Enh‐KO and
   Cdx2‐KO mouse embryonic stem cell lines with CRISPR/Cas9 system,
   respectively (Figure [205]S7b, Supporting Information).^[ [206]^61 ^]
   It is noteworthy that the Cdx2‐KO cell line features a specific 35 bp
   deletion in the first exon, causing a frameshift mutation and
   eliminating functional CDX2 protein without interrupting the p‐Enh
   element genetically (Figure [207]S7c,d, Supporting Information).
   Generally, the morphologies of stem cell clones and distributions of
   pluripotent markers NANOG and OCT4 were comparable among WT, Cdx2‐KO,
   and p‐Enh‐KO ESCs (Figure [208]S7e, Supporting Information), indicating
   that the removal of CDX2 protein or p‐Enh element did not affect stem
   cell pluripotency. Next, we applied the WT, Cdx2‐KO and p‐Enh‐KO cell
   lines to the newly developed in vitro gastruloids^[ [209]^62 ^] and
   Trunk‐like structures (TLSs)^[ [210]^63 ^] differentiation systems
   (Figure [211]4b), which can be used to model the anterior‐posterior
   axis formation and later somitogenesis in the embryo. We found that the
   WT embryoids could generate tailbud‐like structures that resemble the
   caudal region of embryos in both gastruloids and TLSs systems
   (Figure [212]4c; Figure [213]S7f, Supporting Information). However, the
   elongation of tailbud regions in both the gastruloids and TLSs for
   Cdx2‐KO and the p‐Enh‐KO cells was largely compromised in comparison to
   the WT group. Moreover, in contrast to the WT and Cdx2‐KO gastruloids
   and TLSs, the embryoids derived from p‐Enh‐KO ESCs remain as large
   spherical aggregates with no morphological evidence of A‐P axis
   elongation (Figure [214]4c; Figure [215]S7f, Supporting Information).
   Geometric analysis of the acquired gastruloids and TLSs further
   confirmed the consistent and significant defects (p<0.0001) in A‐P axis
   organization and elongation in both Cdx2‐KO and p‐Enh‐KO embryoids.
   Moreover, the phenotypic abnormalities were more pronounced in p‐Enh‐KO
   embryoids compared to Cdx2‐KO counterparts (Figure [216]4d; Figure
   [217]S7g, Supporting Information).

Figure 4.

   Figure 4
   [218]Open in a new tab

   a) The dynamic expression pattern of Cdx2 in WT and p‐Enh‐KO embryos at
   different development stages. Similar results are obtained from embryo
   replicates. Scale bar: 500 µm. b) Schematic overview of gastruloid and
   TLS differentiation system. The length, width, and tail are indicated.
   c) Representative bright‐field images of gastruloids at D5 from WT,
   Cdx2‐KO, and p‐Enh‐KO groups. The length, width, and tail are labeled
   and measured for morphological assessment. Scale bar: 500 µm. d) The
   geometric results summarizing the morphological features for
   gastruloids of each genotype. WT (n = 35); Cdx2‐KO (n = 26); p‐Enh‐KO
   (n = 38). The p‐values are calculated based on one‐way ANOVA. e)
   Principal component analysis (PCA) of transcriptome of WT, Cdx2‐KO and
   p‐Enh‐KO gastruloid. Three biological replicates are used. f) Venn plot
   showing a modest percentage of shared genes between Cdx2‐KO affecting
   genes and p‐Enh‐KO affecting genes (adjusted p‐value<0.01; fold change
   >2). Differential expression analysis was performed by using DESeq2.
   p‐values are calculated by using a Wald test (two‐tailed) in DESeq2 and
   corrected for multiple testing by using Benjamini‐Hochberg (BH) method.
   g) Examples of Cdx2‐KO and p‐Enh‐KO overlapped DEGs. Statistical
   significances are determined by one‐way ANOVA. h) Heatmap showing
   differentially expressed genes in WT, Cdx2‐KO, and p‐Enh‐KO groups.
   Significant terms of Gene Ontology (GO) in Group3 and Group5 are shown.
   The adjusted p‐values are calculated using hypergeometric test
   (one‐sided) and adjusted for multiple testing using Benjamini‐Hochberg
   (BH) method. i) Transcriptome projection of genes regulated by Cdx2
   and/or p‐Enh onto in vivo single cell dataset. The cluster enrichment
   scores are listed below, which are calculated based on AddModuleScore
   function in Seurat package. j) The schematic diagram illustrating the
   experimental design of Dox‐induced iCdx2 in gastruloid system. The time
   window for adding Dox is highlighted in green. k) RNAscope results
   targeting Tbx6 and Uncx4.1 in gastruloids with different genotypes.
   Strip‐like patterns of Uncx4.1 only appear in WT gastruloids and cannot
   be rescued by iCdx2 overexpression. Scale bar: 100 µm. l) Heatmap
   showing differential expressed genes in WT, p‐Enh‐KO and p‐Enh‐KO/iCdx2
   groups. Gene examples are listed. Boxplot showing the average
   expression of genes across different genotypes. The Gene Ontology
   (Biological Process) of Group 4‐B and Group 5‐B are shown on the right.
   The adjusted p‐values are calculated using hypergeometric test
   (one‐sided) and adjusted for multiple testing using BH method.

   To comprehensively determine the molecular architectures of gastruloids
   derived from Cdx2‐KO and p‐Enh‐KO cells, we performed transcriptomic
   profiling of the gastruloids with three biological replicates for each
   group (Figure [219]S8a, Supporting Information). Our analysis revealed
   a significant decrease in Cdx2 expression in both Cdx2‐KO and p‐Enh‐KO
   samples, which was further experimentally confirmed (Figure [220]S8b,c,
   Supporting Information). Principal component analyses (PCA) revealed
   distinct transcriptomic features for Cdx2‐KO and p‐Enh‐KO samples
   (Figure [221]4e). Additionally, p‐Enh‐KO exhibited a greater number of
   differentially expressed genes (DEGs) compared to Cdx2‐KO (p‐Enh‐KO:
   608 up and 1451 down, Cdx2‐KO: 819 up and 455 down) (Figure [222]S8d,
   Supporting Information). Remarkably, only a small portion of DEGs were
   overlapped between Cdx2‐KO and p‐Enh‐KO group (330/1274, 25.9% of
   Cdx2‐KO DEGs; 330/2059, 16% of p‐Enh‐KO DEGs) (Figure [223]4f),
   suggesting both shared and distinct regulatory roles of Cdx2 and p‐Enh
   during A‐P axis formation and elongation in embryoids. Despite
   constituting a small fraction, the shared target genes among the
   overlapping DEGs, including regulators of A‐P axis patterning such as
   Hox genes and T,^[ [224]^31 , [225]^64 , [226]^65 ^] exhibited
   substantial downregulation in both Cdx2‐KO and p‐Enh‐KO gastruloids
   (Figure [227]4g). qPCR results of representative genes from different
   groups confirmed expression changes (Figure [228]S8e, Supporting
   Information). Next, we categorized these DEGs (Figure [229]4f) into 5
   clusters based on gene expression patterns, including the shared DEGs
   (Group 1), Cdx2‐KO specific up (Group 2), Cdx2‐KO specific down (Group
   3), p‐Enh‐KO specific up (Group 4), and p‐Enh‐KO specific down (Group
   5) (Figure [230]4h). Gene ontology (GO) analysis revealed that genes in
   Group 3 were enriched in the biological process related to
   anterior/posterior patterning and somitogenesis process, consistent
   with the well‐known functions of CDX2 protein.^[ [231]^58 ^] As for
   p‐Enh‐KO specific down (Group 5) genes, the pathways related to a broad
   spectrum of mesodermal development, such as gland development,
   mesenchyme development, and heart morphogenesis, were enriched
   (Figure [232]4h; Figure [233]S8f, Supporting Information). Following
   this, we conducted an enrichment analysis of the DEGs of Group 1, 3, 5
   with the constructed in vivo single‐cell A‐P atlas (Figure [234]3a),
   and found that Group 1 and Group 3 exhibited a clear posteriorized
   enrichment, displaying high enrichment scores with NMP, presomitic
   mesoderm, and posterior neural tube clusters. However, genes in the
   p‐Enh‐KO specific down group (Group 5) showed a strong correlation with
   NMP and the mesodermal clusters (Figure [235]4i), indicating p‐Enh may
   participate into posterior mesoderm development through a
   CDX2‐independent mechanism.

   To further validate this notion, we established the inducible Cdx2
   overexpression cell line on the background of p‐Enh‐KO (p‐Enh‐KO/iCdx2)
   and subjected this cell line into gastruloid differentiation system
   (Figure [236]4j; Figure [237]S8g, Supporting Information). We observed
   that the expression of the presomitic mesoderm marker Tbx6, a
   downstream target of T and Cdx2 network,^[ [238]^60 ^] was reactivated
   in the p‐Enh‐KO/iCdx2 gastruloids (Figure [239]4k). However, the A‐P
   elongated morphology as well as the stripe‐like structure marked by
   Uncx4.1 transcripts remained compromised in p‐Enh‐KO/iCdx2
   (Figure [240]4k). Transcriptome analyses showed that more than 50% of
   p‐Enh‐KO specific down genes (651/1224, 53% of Group 5) failed to be
   fully rescued by the inducible Cdx2 overexpression in comparison with
   the WT counterpart (Figure [241]4l; Figure [242]S8h, Supporting
   Information), which enriched organ formation‐related genes. In
   particular, genes involved in the response to the transforming growth
   factor beta (TGF‐β) pathway remains abnormal even with the restoration
   of CDX2 (Figure [243]4l). Regulon analyses summarizing the upstream
   regulators for the un‐rescuable genes revealed that major effectors
   involved in TGF‐β signaling, such as SMAD1 and SMAD9, were specifically
   enriched (Figure [244]S8i, Supporting Information).

   In summary, in vitro embryoid data showed that p‐Enh can regulate
   posterior tissue development partially through the neighboring gene
   Cdx2. Moreover, these results also suggest that a previously
   unrecognized CDX2‐independent mechanism may contribute to p‐Enh
   function, possibly involving cellular responsiveness to TGF‐β signaling
   pathway.

2.5. Genome‐Wide Epigenomic Remodeling Suggests a Potential Functional
Relevance between p‐Enh and TGF‐β Signaling

   To assess whether p‐Enh regulates specific target genes via direct
   chromatin contacts, we performed 4C‐seq using the p‐Enh locus as the
   viewpoint in NMP cells. In consistent with the published HiC dataset,^[
   [245]^66 ^] we found that p‐Enh exhibits interactions with neighboring
   chromatin regions, within a 3 Mb window surrounding its genomic locus
   (Figure [246]S9a, Supporting Information). By integrating 4C‐seq data
   with transcriptomic changes upon p‐Enh deletion, we identified three
   candidate genes—Cdx2, Rasl11a, and Slc46a3—that showed both physical
   interaction and differential expression (fold change > 2) (Figure
   [247]S9a, Supporting Information). However, functional analysis through
   shRNA‐mediated knockdown of Slc46a3 in p‐Enh‐KO cells failed to rescue
   the developmental defects (Figure [248]S9b,c, Supporting Information),
   indicating that in cis chromatin interactions with distal genes, such
   as Slc46a3, is not responsible for p‐Enh functions.

   To systematically determine the downstream molecular cascades of p‐Enh,
   we profiled the H3K27ac modification enrichment from both WT and
   p‐Enh‐KO NMP cells acquired from directed NMP differentiation system
   (Figure [249]5a).^[ [250]^67 ^] The comparative analysis of chromatin
   activities marked by H3K27ac revealed the occurrence of pervasive
   epigenomic remodeling in p‐Enh‐KO NMP cells, especially at distal
   intergenic chromatin regions (Figure [251]5b; Figure [252]S9d–f,
   Supporting Information). GREAT analyses identified that peaks with
   weakened enrichment of H3K27ac in p‐Enh‐KO cells are strongly enriched
   with SMAD binding capabilities (Figure [253]5b). Following this, we
   also incorporated published dataset of SMADs ChIP‐seq data,^[ [254]^68
   , [255]^69 ^] as well as CDX2 ChIP‐seq^[ [256]^31 ^] data from in vitro
   sample counterpart, and checked the distribution of these signaling
   effectors around peak sets with changed H3K27ac level in p‐Enh‐KO cells
   (Figure [257]5c). We observed enrichment of TGF‐β signal effectors
   (SMADs) binding with these peaks, but not CDX2 binding
   (Figure [258]5c). These results demonstrate that the loss of p‐Enh
   element in NMP cells lead to significant decrement of H3K27ac at TGF‐β
   signal effectors chromatin binding sites. To test whether epigenetic
   remodeling of H3K27ac at these regions coincides with the change of
   chromatin binding for SMAD proteins, we performed ChIP‐qPCR analyses
   for the co‐SMAD effector, SMAD4, in both WT and p‐Enh‐KO NMP cells. As
   shown, the bindings of SMAD4 are significantly down‐regulated in
   p‐Enh‐KO cells (Figure [259]S9g, Supporting Information). These
   results, together with previous transcriptome data (Figure [260]4f,l),
   indicate that p‐Enh could play roles in mediating the chromatin
   responsiveness to TGF‐β signal effector, in a CDX2‐independent
   mechanism.

Figure 5.

   Figure 5
   [261]Open in a new tab

   Functional interplay between p‐Enh and the TGF‐β signaling pathway. a)
   Schematic of 2D NMP differentiation system. H3K27ac ChIP‐seq data are
   employed to assess the chromatin activity of WT and p‐Enh‐KO NMPs. b)
   Heatmap showing differential H3K27ac peaks in WT and p‐Enh‐KO NMPs. GO
   terms of top‐ranked GREAT analysis from differential H3K27ac peaks are
   shown. No GO terms enriched in p‐Enh‐KO gain group under default
   parameters. c) Heatmaps illustrating chromatin binding patterns of
   multiple TGF‐β signaling downstream effectors and CDX2 at chromatin
   regions with altered H3K27ac enrichment in Figure [262]5b. Published
   datasets for SMAD2, SMAD3, SMAD4 in TGF‐β‐activated condition,^[
   [263]^68 ^] SMAD2/3 in TGF‐β‐activated condition,^[ [264]^69 ^] and
   CDX2 ChIP‐seq^[ [265]^31 ^] are used. d) Co‐staining of Cdx2 mRNA and
   p‐Enh‐eRNA using RNAscope in E7.5 mouse embryos. Both transverse and
   sagittal sections are shown. The labeling scheme is consistent with
   Figure [266]1f, and the nuclear‐cytoplasmic boundary is outlined in a
   vibrant yellow color. Scale bar: 50 µm. e) TGF‐β signaling activity
   determined by gene set (GESA: MM9770) using AUCell (v1.20.2) in E7.5
   mouse embryo. The schematic diagram is adapted and modified from the
   spatial transcriptome of gastrulation.^[ [267]^9 ^]f) RNAscope
   targeting p‐Enh‐eRNA and co‐staining with SMAD4 protein in E7.5 mouse
   embryo sections. Zoomed‐in view showing the detailed signals at
   posterior region. The labeling scheme is consistent with
   Figure [268]1f. The co‐localization spots of p‐Enh‐eRNA with SMAD4
   proteins are further magnified for the clear presentation. The
   co‐localization analysis is performed using ImageJ plugin ComDet 0.5.5.
   Scale bar: 10 µm. g) Diagram showing the RNA pull‐down assay workflow.
   h) Western blot (WES) results. The bands corresponding to SMAD4 protein
   (≈64 kDa) are indicated. Two biological replicates are included in each
   condition to ensure robustness. i) Diagram showing modulation of TGF‐β
   signaling and eRNA in gastruloid differentiation system. Activin A
   (20 ng mL^−1) or SB421542 (10 µM) are added into differentiation system
   from the end of D1. j) Bright‐field images of D5 gastruloids in
   different conditions. Scale bar: 200 µm. k) Geometric features (Axial
   length, width, tail length, and length/width ratio) of D5 gastruloids
   in Figure [269]5j. Each group includes multiple biological replicates
   (n = 16–31) to ensure statistical robustness. Mean values for selected
   bars are indicated in blue text. The p‐value are calculated based on
   one‐way ANOVA. l) Expression pattern of Cdx2 in WT, p‐Enh‐KO and
   p‐Enh‐KO (Activin A) gastruloids. The p‐values are calculated based on
   one‐way ANOVA. m) Hierarchical clustering of transcriptome from WT,
   p‐Enh‐KO and p‐Enh‐KO (Activin A) gastruloids. n) Heatmap illustrating
   expression pattern of p‐Enh specific regulated genes (defined in
   Figure [270]4h,l) in different groups. Subclusters A and B are
   delineated according to the rescue efficacy of iCdx2 overexpression.
   Boxplot showing the average expression of genes in each subclusters.
   p‐values are calculated by using Wilcoxon rank sum test (two‐sided).

   Recent studies have found that a broad spectrum of TFs bind with RNA
   molecules, highlighting the conserved and essential nature of
   interactions between TFs and RNA in vertebrate development.^[ [271]^70
   ^] We found that p‐Enh can transcribe eRNAs, which are largely
   distributed in the posterior regions at the late‐gastrulation stage
   (E7.5) (Figure [272]1f). Co‐detection of p‐Enh transcripts and Cdx2
   mRNA in the E7.5 mouse gastrula revealed that both transcripts were
   specifically enriched in the posterior region (Figure [273]5d).
   Moreover, sub‐cellular distribution analyses showed that the
   p‐Enh‐eRNAs were prominently enriched in the nuclei, forming
   puncta‐like structure at multiple loci within a single nucleus. In
   contrast, the transcripts of the neighboring coding‐gene, Cdx2, were
   universally located in the cytoplasm (Figure [274]5d). Meanwhile, the
   signaling activity of TGF‐β superfamily was also enriched in the
   posterior embryonic region at E7.5, represented by the spatial gene
   expression enrichment of typical TGF‐β signaling target genes
   (Figure [275]5e). To investigate the spatial association between
   p‐Enh‐eRNA and TGF‐β signaling, we performed co‐staining of p‐Enh‐eRNA
   with SMAD proteins in E7.5 embryos. Over 40% of nuclear p‐Enh‐eRNA
   puncta co‐localized with SMAD4 (Figure [276]5f), and ≈30% overlapped
   with phosphorylated SMAD1/5‐positive foci (Figure [277]S10a, Supporting
   Information). The spatial co‐occurrence of p‐Enh transcripts and TGF‐β
   signaling effectors hints an intriguing involvement role of p‐Enh‐eRNA
   in TGF‐β signaling transduction.

   To further validate this possibility, we performed RNA pull‐down
   followed by Western blot to directly examine the physical interaction
   between p‐Enh‐eRNA and SMAD4 (Figure [278]5g). Using biotin‐labeled
   antisense probes targeting p‐Enh‐eRNA, we found that SMAD4 protein was
   specifically enriched in the probe (+) group purified from WT NMP
   cells. Moreover, the abundance of SMAD4 was increased upon p‐Enh‐eRNA
   overexpression (Figure [279]5h). These results provide direct
   biochemical evidence supporting a physical interaction between
   p‐Enh‐eRNA and SMAD4, reinforcing that p‐Enh‐eRNA may participate in
   TGF‐β signaling by engaging with SMAD4.

   To further validate the involvement of eRNA and TGF‐β in p‐Enh's
   functions, we performed gastruloid differentiation experiments under
   multiple conditions (Figure [280]5i). First, to test whether p‐Enh‐eRNA
   could compensate for the loss of p‐Enh, we generated p‐Enh‐KO cells
   with nuclear‐enriched overexpression of eRNA (p‐Enh‐KO/eRNA‐OE)^[
   [281]^71 , [282]^72 ^] (Figure [283]S10b, Supporting Information).
   However, p‐Enh‐KO/eRNA‐OE gastruloids did not exhibit obviously
   morphological change at D5 compared to the p‐Enh‐KO group, suggesting
   that eRNA alone is insufficient to restore axial elongation. Next, we
   activated TGF‐β signaling using Activin A in gastruloid system
   (Figure [284]5i). Notably, Activin A treatment markedly ameliorated the
   differentiation defects in p‐Enh‐KO gastruloids (Figure [285]5j,k).
   Conversely, inhibition of TGF‐β signaling with SB431542 induced
   extensive cell death, indicating the essential roles of TGF‐β signaling
   in A‐P axis patterning (Figure [286]S10c, Supporting Information). To
   further assess the potential interplay between p‐Enh‐eRNA and TGF‐β
   signaling, we performed a combinatorial treatment in which p‐Enh‐KO
   gastruloids were subjected to both eRNA overexpression and Activin A
   treatment. Compared to Activin A alone, the p‐Enh‐KO/eRNA‐OE (Activin
   A) group exhibited further improvement in morphological features,
   including axial length, tail length, and the length/width ratio.
   Specially, the length and length/width ratio, which reflect the
   anterior‐posterior polarity, of the p‐Enh‐KO gastruloids exhibit
   gradual alleviation of the defects and finally reach a comparable level
   to the WT group (Figure [287]5k). These results suggest a cooperative
   effect between p‐Enh‐eRNA and TGF‐β signaling in modulating posterior
   developmental programs.

   As the phenotypic improvement mainly occurred upon TGF‐β activation, we
   next examined the associated transcriptomic profiles (Figure [288]S10d,
   Supporting Information). Notably, Cdx2 expression remained low even
   after Activin A treatment (Figure [289]5l), indicating that the rescue
   effect was not mediated through this neighboring gene. This supports
   the notion that the differentiation defects could be attributed to the
   interactions between p‐Enh and TGF‐β signaling in a Cdx2‐independent
   way, as shown in Figure [290]4l. In contrast to the expression pattern
   of Cdx2, a list of genes crucial for A‐P axis patterning and mesoderm
   development, including Hes7, Tgfb2 and Mixl1, exhibited marked
   upregulation upon TGF‐β activation. This underscores the ability of
   TGF‐β to effectively ameliorate the phenotype associated with p‐Enh‐KO
   (Figure [291]S10e, Supporting Information). Hierarchical clustering
   result also revealed that Activin A treatment effectively rescued genes
   expression abnormalities caused by p‐Enh depletion (Figure [292]5m).
   Specifically, p‐Enh regulated genes that failed to be rescued by Cdx2
   re‐expression (subcluster B of Group 4 and Group 5 in Figure [293]4l)
   were largely restored with Activin A treatment (Figure [294]5n).

   Together, these findings suggest that the direct crosstalk between
   p‐Enh‐eRNA and the TGF‐β signaling pathway, which facilitates the
   chromatin binding of signaling effectors, is responsible for p‐Enh
   function, thus holds substantial biological importance for the
   development of posterior tissues.

3. Discussion

   In this study, we have identified a pre‐marked regulatory element,
   p‐Enh, exhibiting highly restricted spatial‐temporal activity and
   significant biological importance during mouse early embryonic
   development. The pivotal role of p‐Enh in posterior tissue development
   is underscored by the embryonic lethality observed in p‐Enh‐KO embryos,
   accompanied by severe posterior tissue growth failure, disruption of
   A‐P axis gene expression pattern, and dysfunctions across multiple cell
   types (Figures [295]2 and [296]3). Mechanism analyses through
   transcriptomic dissection pinpointed the co‐occurrence of two distinct
   mechanisms underlying the p‐Enh function: (i) regulating the expression
   level of its neighboring gene Cdx2 in cis, thereby affecting the
   downstream cascades of CDX2 (Figure [297]4); (ii) transient production
   of nuclei‐distributed eRNAs, potentially involved in coordinating the
   nuclear responsiveness to the extrinsic developmental signals, such as
   TGF‐β signaling pathway (Figure [298]5), through direct interaction
   with signal effectors. Thus, we propose a molecular model for H3K27ac
   pre‐marked developmental enhancers, wherein genetic sequence‐based in
   cis local regulation and RNA transcripts‐based in trans genome‐wide
   regulation co‐exists to facilitate tight spatiotemporal control of
   global molecular architecture and thus set up the molecular blueprint
   for future tissue development (Figure [299]6 ).

Figure 6.

   Figure 6
   [300]Open in a new tab

   Model of the regulation of posterior developmental genes through p‐Enh
   with distinct in cis and in trans mechanisms. The gastrula pre‐marked
   p‐Enh plays dual roles in governing posterior tissue development: (i)
   regulating the neighboring gene Cdx2 (in cis), which in turn affects
   their shared downstream targets, such as Hox and T genes; (ii)
   modulating the global transcriptome and epigenome via transient
   expression of broadly nuclei‐distributed eRNA (in trans), by mediating
   the interplay between p‐Enh‐eRNA and signaling effector such as SMAD4,
   which may contribute to the modulation of TGF‐β signaling. p‐Enh‐KO
   embryos exhibit embryonic lethality, aberrant posterior tissue
   development trajectories, and failure to initiate hindlimb formation.

   Recently, there has been a growing interest in dissecting the molecular
   hierarchy of gene regulatory networks (GRNs) that govern rapid cell
   fate commitment.^[ [301]^9 , [302]^18 , [303]^73 , [304]^74 ^] Previous
   studies have primarily focused on identifying specific pioneer
   transcription factors capable of steering the developmental competence
   toward specific developmental lineages.^[ [305]^75 , [306]^76 ,
   [307]^77 ^] In line with the pioneering factors, emerging evidences
   also indicated that the epigenetic priming or pre‐mark at regulatory
   elements could offer a notable feature prior to cell fate decision.^[
   [308]^13 , [309]^78 ^] However, it remains unclear whether the
   “chromatin priming” of regulatory elements holds potential functional
   roles in biological process beyond merely serving as a marker. Here, we
   determined one H3K27ac pre‐marked regulatory element, p‐Enh, which is
   specifically activated in the PS and adjacent epiblast at late
   gastrulation stage, and sustains the epigenetic activity till late
   organogenesis. Phenotypic and molecular analysis demonstrated that
   genetic loss of p‐Enh leads to dramatic developmental defects when the
   posterior lineage diversifies. Moreover, the precedence of
   transcriptomic disruptions over phenotypic anomalies strengthens the
   evidence for the importance of the priming effects. Thus, future
   evaluation of pre‐marked regulatory elements and further phenotyping
   through incorporating multi‐dimensional data may better elucidate the
   function of novel crucial regulatory elements.

   A considerable amount of research has put forth on how enhancers exert
   their functions, particularly through the regulation of neighboring
   genes. Here, through in‐depth transcriptomic dissection, chromatin
   interaction capture and genetic analyses, as reflected by the
   transcriptomic comparison among multiple genetic edited cell lines (WT,
   p‐Enh‐KO, Cdx2‐KO, p‐Enh‐KO/iCdx2), we found that only 16% genes were
   regulated through the p‐Enh‐Cdx2 axis (Figure [310]4f). Meanwhile,
   TGF‐β signaling responsive genes were highly enriched in the cluster of
   p‐Enh specifically regulated genes (Figure [311]4l; Figure [312]S8i,
   Supporting Information). Further, through profiling H3K27ac pattern in
   both WT and p‐Enh‐KO cells, we found that the global distribution of
   H3K27ac was pervasively remodeled in p‐Enh‐KO cells (Figure [313]5b).
   Interestingly, the genomic regions with changed H3K27ac level are
   enriched with TGF‐β signaling effectors but not CDX2 binding
   (Figure [314]5c, Figure [315]S9g, Supporting Information). Thus, the
   regulatory mechanisms underlying p‐Enh function are intricately linked
   with TGF‐β signaling, suggesting a complex interplay rather than a
   straightforward linear molecular cascade acting through its neighboring
   gene Cdx2 or functioning solely through in cis chromatin interactions
   with distal targets such as Slc46a3 (Figure [316]4; Figure [317]S9a–c,
   Supporting Information). p‐Enh possesses both in cis and in trans
   regulatory mechanisms in the developmental control of posterior
   tissues.

   The discovery of enhancer‐derived transcripts largely broaden the
   current scope of the molecular mechanisms governing chromatin structure
   organization and gene expression regulation.^[ [318]^79 , [319]^80 ^]
   The abundant expression and nuclear localization of p‐Enh‐eRNAs at
   multiple loci within individual nuclei in the PS and adjacent epiblast
   region suggest their potential roles in global GRNs instead of
   localized functions. Their broad distribution within nuclei may
   contribute to the observed pervasive chromatin activity changes in
   p‐Enh‐KO cells (Figure [320]5a–d). Furthermore, a recent study has
   underscored the extensive interaction between TFs and RNA molecules.^[
   [321]^70 ^] Here, we demonstrated that p‐Enh‐eRNA not only physically
   interacts with TGF‐β effectors, SMAD4 and pSMAD1/5, but also regulates
   the chromatin binding of the interacting TF partners. Importantly, the
   increment of TGFβ signaling activity and restore of eRNA abundance
   synergistically rescue the differentiation defects in gastruloid.
   Drawing from the existing research and our findings, we hypothesize
   that the widespread eRNAs within individual nuclei can function as
   trans‐modular coordinators to interact with signaling downstream
   effectors, potentially serving as “regulatory units”, then facilitate
   the chromatin binding of interacting TF partners to the designated
   chromatin regions, thereby influencing the long‐range modulation of
   global GRNs for proper developmental programming (Figure [322]6).

   Even though the comprehensive nature of our data, we do acknowledge
   that future in‐depth characterization of the structure of p‐Enh‐eRNA
   and systematic identification of its potential interacting partners, of
   this transcript will finally clarify the complex regulatory network and
   promote a global understanding of mouse embryogenesis.

   In summary, to delve into the molecular mechanisms underlying
   epigenetic pre‐marked enhancers, we systematically dissect both the in
   cis and in trans paths for p‐Enh, proposing this as a novel strategy of
   developmental enhancer to coordinate multiple downstream modules
   genome‐widely, and then establish a responsive cellular state for
   forthcoming lineage diversification.

4. Experimental Section

Animals Use, Embryo Staging, and Collection

   Animal experiments were performed at the Animal Core Facility, under
   program project SIBCB‐S308‐1807‐025 approved by the Institutional
   Animal Care and Use Committee in Center for Excellence in Molecular
   Cell Science, CAS, and the program project GZLAB‐AUCP‐2022‐10‐A04
   approved by Institutional Animal Care and Use Committee of Guangzhou
   National Laboratory. Timed matings were set up between sexually mature
   (older than six weeks of age) of wildtype C57BL/6 or p‐Enh‐KO. Plugged
   female mice were picked after mating and marked as embryonic day 0.5
   (E0.5). Female mice were sacrificed for embryos collection at specific
   gestational stages. Embryos were isolated from the uterus and carefully
   transferred into PBS in petri dishes, and extra‐embryonic tissues were
   removed using needles under Olympus stereoscope. The developmental
   stage and morphology of embryos were reconfirmed according to the
   Theiler stage methodology. The acquired embryos were ready for further
   experiments including whole‐mount in situ hybridization, iDISCO,
   RNAscope, or single cell transcriptome profiling.

qPCR Quantification of eRNA Expression

   To ensure sufficient RNA input for reliable detection of low‐abundance
   enhancer RNAs, microdissected tissues were pooled from ≈50 mouse
   gastrula‐stage embryos for each sample. For E9.5 stage samples, tissues
   from 5 embryos were microdissected and combined per sample. The pooling
   strategy was designed to enrich for the target regions and provide
   adequate RNA yield for downstream analyses. Total RNA was extracted
   using the TRIzol reagent (Invitrogen, 15596018) following the
   manufacturer's standard protocol. Briefly, tissues were homogenized in
   TRIzol, followed by phase separation with chloroform and RNA
   precipitation with isopropanol. After washing with 75% ethanol, RNA
   pellets were air‐dried and resuspended in RNase‐free water. RNA
   concentration and purity were assessed using a NanoDrop
   spectrophotometer. Reverse transcription was carried out using random
   primers and reverse transcriptase (Tiangen, KR108) to synthesize
   complementary DNA (cDNA) from total RNA, enabling unbiased detection of
   eRNAs. The resulting cDNA was used as the template for quantitative PCR
   (qPCR) with region‐specific primers to measure eRNA expression levels.

mESC Maintenance and Differentiation

   mESC lines were maintained in standard feeder‐free culture conditions
   which were supplemented with 3 µM CHIR99021 (Selleck Chemicals, S1263),
   1 µM PD0325901(Selleck Chemicals, S1036) and 10 ng mL^−1 mouse leukemia
   inhibitory factor (Millipore, ESG1107) as previously described.^[
   [323]^81 ^] The NMP differentiation was performed as previously
   reported.^[ [324]^67 ^] In brief, 50000 cells were seeding in N2B27
   medium containing 10 ng mL^−1 bFGF (Shanghai Pufei Biotechnology,
   1106‐010) in Matrigel pre‐coated 35 mm dish and cultured for two days.
   The N2B27 medium contained a 1:1 ratio of DMEM/F12:Neurobasal medium
   (GIBCO) supplemented with 1xN2 (GIBCO), 1xB27 (GIBCO), 1x GlutaMax
   (GIBCO), 40 mg mL^−1 BSA (Roche), penicillin/streptomycin (GIBCO) and
   0.1 mM 2‐mercaptoethanol. Cells were pulsed with 5 µM CHIR99021 during
   24 h to 36 h. At the end of D3, the NMPs were ready for transcriptome
   profiling, chromatin conformation capture assays, and ChIP‐seq
   analysis.

Gastruloids and Trunk‐Like Structure (TLS) Generation

   Gastruloids and TLSs were generated using established protocols with
   minor adjustments.^[ [325]^63 , [326]^82 ^] The basic N2B27 medium
   contained a 1:1 ratio of DMEM/F12: Neurobasal medium (GIBCO)
   supplemented with 1xN2 (GIBCO), 1xB27 (GIBCO), 1x GlutaMax (GIBCO),
   penicillin/streptomycin (GIBCO) and 0.1 mM 2‐mercaptoethanol. The
   initial 4 days of differentiation for both gastruloids and TLSs were
   consistent. In brief, 200–250 single mESC were seeded in 40 µL N2B27
   medium into wells of the 96‐well V bottom Ultra‐low Attachment Plates
   (S‐BIO, MS‐9096VZ) and allowed to aggregated for 48 h. Subsequently,
   the spheres were treated with 3 µM CHIR99021 in 150 µL N2B27 medium for
   24 h. After this, medium was refreshed every 24 h with the same volume
   of basic N2B27 medium. On the final day of differentiation, gastruloids
   were cultured with basic N2B27, while TLSs were embedded in 5%
   Growth‐Factor‐Reduced Matrigel (Corning, 356231) in N2B27 medium. At
   the end of day 5, the gastruloids and TLSs were subjected for
   morphology analysis, transcriptome sample collection, or
   immunofluorescence. In the TGF‐β signaling modulation condition,
   Activin A (20 ng mL^−1) or SB421542 (10 µM) are added into
   differentiation system from the end of D1.

Transgenic Embryo Enhancer Activity Screening

   Transgenic embryo enhancer activity screen was performed following
   methods as previously reported.^[ [327]^37 ^] In brief, enhancer
   candidates with posterior epigenetic activity were selected from our
   published dataset and ranked according to H3K27ac increasing enrichment
   at posterior region from E7.0 to E7.5 mouse gastrula.^[ [328]^18 ^]
   Selected fragments were cloned from mouse genomic DNA and ligated into
   PB‐pHsp68‐LacZ plasmid using seamless cloning following the
   manufacturer's instructions (Beyotime, D7010M). Then, the acquired
   PB‐DREs‐Hsp68‐LacZ plasmids and PBase mRNA were microinjected into the
   cytoplasm of a fertilized egg with the FemtoJet Microinjection System
   (Eppendorf). The injected embryos were then cultured to the 2‐cell
   stage in KSOM medium at 37 °C, 5% CO2 in standard incubator, and the
   2‐cell embryos were transferred to the oviduct of pseudo‐pregnant ICR
   females and marked as 0.5 dpc. Finally, embryos were collected at
   appropriate stages and subjected to further genotyping and LacZ
   staining following the protocol previously reported.^[ [329]^37 ^]

Whole‐Mount In Situ Hybridization

   The mRNA expression in E6.5‐E10.5 embryos was assessed by whole‐mount
   in situ hybridization using the digoxigenin‐labelled antisense RNA
   probes as described previously.^[ [330]^83 ^] Primers for amplifying
   probe templates are listed in Table [331]S4 (Supporting Information).
   In brief, embryos at different stages were collected and fixed with
   4%PFA overnight and dehydrated stepwise in graded series of methanol.
   Embryos can be stored in methanol for a minimum overnight and up to 1
   week at −20 °C. Embryos were rehydrated through 75%, 50%, and 25%
   methanol at room temperature, washed three times with DPBS and treated
   with 10 µg mL^−1 proteinase K (Invitrogen, AM2548) in PTW for suitable
   time. After post‐fixation for 30 min, ≈500 ng of digoxigenin‐labelled
   RNA probe was incubated with the embryo at 70 °C overnight. The embryos
   were then washed with TBST and incubated with 1:2000 diluted
   anti‐digoxigenin AP antibody (Roche, 11093274910) in blocking buffer.
   Incubating on a rocker platform at 4 °C overnight. The next day,
   embryos were further washed with TBST and stained with NBT/BCIP
   solution. Images were obtained using Olympus stereoscope.

Haematoxylin and Eosin (HE) Staining

   Embryo samples were fixed in 4% paraformaldehyde/PBS at 4 °C overnight
   and then processed for paraffin wax embedding. 7‐µm thick sections were
   cut, dewaxed in xylene, rehydrated through an ethanol series into PBS.
   HE counterstaining was performed using staining kit (Beyotime, C0105)
   according to manufacturer's instructions. Images were taken on Olympus
   VS120 microscope.

Single‐Cell Transcriptome Profiling of Mouse Embryos

   Sexually mature p‐Enh^+/− mice were inter‐crossed and pregnant females
   at the indicated gestational stages (E8.5, E9.5 and E10.5) were
   sacrificed for embryo collection. After dissection from the decidua,
   embryos with the same genotype were combined together and incubated in
   200 uL TrypLE (Gibco, 12604013) at 37 °C with shaking (300 rpm) for ≈
   15 min. The tissue pellets were manually triturated with 2 min
   interval. Immediately following digestion, single cells were washed
   twice in 1% BSA/PBS and filtered through Falcon 40 µm Cell Strainer
   (Corning, 352340). Finally, cells were spun at 500 g for 5 min at 4 °C
   and resuspended in 0.04% BSA/PBS for further quality check and loading
   onto the 10× Chromium Controller. Processing of the samples was
   performed using the Chromium Single Cell 3′ library & Gel Bead Kit
   version 3 (10× Genomics) according to the manufacturer's instructions.
   Pair‐end 150 bp sequencing was performed on NovaSeq platform.

The Generation of Specific Fragments Genomic Knockout ESC Line and Mouse

   The generation of enhancer knockout or Cdx2 knockout mESCs were
   performed following protocol published previously with slightly
   modify.^[ [332]^61 ^] Briefly, small guided RNAs (sgRNAs) were designed
   according to the instruction from Chop‐chop website
   ([333]http://chopchop.cbu.uib.no/). sgRNAs were annealed and cloned
   into linearized px330‐mCherry plasmid (Addgene, Plasmid #98 750).
   Px330‐mCherry‐sgRNA was transfected into E14TG2a by Lipofectamine 3000
   Transfection Reagent (Invitrogen, L3000008) following the
   manufacturer's instructions. The mCherry positive cells were isolated
   and collected by BD FACS Aria SORP for further culture for 4 to 5 days
   until clone formation. Single clones were picked up manually for
   genotyping and sanger sequencing. The potential off‐target sites were
   tested based on the Chop‐chop website's indication. Primers used for
   genotyping and sgRNA sequences targeting p‐Enh and Cdx2 were listed in
   Table [334]S4 (Supporting Information). For the generation of p‐Enh‐KO
   mice, we employed the same sgRNAs utilized for p‐Enh knockout in mESCs.
   Cas9 mRNA and sgRNAs with scaffolds were obtained through in vitro
   transcription using the MEGAscript Kit (Invitrogen). Subsequently,
   these Cas9 mRNAs and sgRNAs were microinjected into the cytoplasm of
   fertilized ova utilizing the FemtoJet Microinjection System
   (Eppendorf). Following microinjection, the embryos were subjected to
   culture under standard culture conditions (37 °C, 5% CO[2]) in KSOM
   medium until reaching the 2‐cell stage. These 2‐cell embryos were then
   surgically transferred to the oviducts of pseudo‐pregnant females. Upon
   reaching adulthood, the resulting offspring were subjected to
   genotyping analysis.

RNA Extraction and Quantitative PCR (qPCR) and RNA‐seq Library Preparation
and Sequencing

   For the in vitro culture cells and embryo samples, total RNA was
   extracted using TRIzol reagent (Invitrogen, 15596018). cDNA was
   reverse‐transcribed with FastQuant RT Super Mix (Tiangen, KR108),
   following the manufacturer's instructions. Quantitative PCR analysis of
   mRNA levels for different markers was performed using diluted acquired
   cDNA with Stormstar SYBR green qPCR master mix (DBI Bioscience,
   DBI‐143) with specific primers, and all primers used in qPCR analysis
   were listed in Table [335]S4 (Supporting Information). RNA‐seq
   libraries were prepared from 300 ng pure total RNA using NEBNext Ultra
   II RNA Library Prep kit (NEB, E7775L). Two highly reproducible
   biological replicates (each with two technical replicates) were
   performed. Pair‐end 150 bp sequencing was performed on NovaSeq
   platform.

Immunofluorescence

   For the cultured cells, protocol for immunofluorescence was performed
   as described previously.^[ [336]^84 ^] Briefly, cultured cells were
   washed twice with PBS to remove detached cells, and fixed with 4%
   paraformaldehyde in PBS (pH 7.3) for ≈ 30 min at room temperature,
   followed by permeabilized and blocked with blocking buffer (0.5% Triton
   X‐100/5%BSA in PBS) for 1 h. Next, primary antibodies targeting NANOG
   (Abcam, ab80892. 1:200) and OCT4 (Santa Cruz, sc‐5279. 1:200) were
   diluted in blocking buffer and incubated with samples overnight at 4
   °C. Then, samples were washed three times with PBS containing 0.3%
   Triton‐X100 (PBS‐Tri), followed by incubation with secondary antibodies
   (1:400 diluted) in PBS‐Tri for 2 h at room temperature. After wash
   steps and DAPI staining, samples were imaged using Leica TCS SP8 STED
   confocal microscope.

   For the gastruloids and embryo samples, immunolabeling‐enabled 3D
   imaging of solvent‐cleared organs (iDISCO) was used for the whole‐mount
   immunolabeling as previously reported.^[ [337]^85 ^] In brief, samples
   were dehydrated with methanol, bleach in chilled fresh 5% H[2]O[2] in
   methanol overnight at 4 °C. After washing and rehydration, samples were
   incubated in blocking buffer which containing 0.2% TritonX‐100/20%
   DMSO/0.3 M glycine in PBS at 37 °C overnight. Samples were washed in
   PBS/0.2% Tween‐20 with 10 µg mL^−1 heparin (PTwH) for 1 h twice. Next,
   the primary antibodies (CDX2 (Abcam, ab76541), SOX2 (Abcam, ab97959),
   MyHC (R&D Systems, MAB4470)) targeting specific protein were diluted in
   PTwH/5% DMSO/3% donkey serum (1:200) and incubated with samples at 37
   °C overnight. Then, samples were washed with PTwH for 1 day and
   incubated with specific secondary antibodies (1:400 diluted). Finally,
   after washing steps and nuclear labeling, solvent‐based tissue clearing
   was carried out which involving dichloromethane (Sigma,
   270997‐12×100ML) and dibenzyl ether (DBE) (Sigma, 108014‐1KG). OLYMPUS
   IXplore SpinSR and LiTone XL Light‐sheet Microscope were used for the
   imaging.

RNAscope

   Fresh mouse embryo samples were fixed in 4% PFA/PBS at 4 °C overnight.
   The fixed samples were further embedded in OCT and cryo‐sectioned at
   20 µm thickness. The RNAscope procedure was then performed using
   RNAscope Multiplex Fluorescent Reagent Kit v2 (Advanced Cell
   Diagnostics, 323100) according to the RNAscope Multiplex Fluorescent
   Reagent Kit v2 Assay User Manual (Document Number 323100‐USM). Probes
   targeting from Advanced Cell Diagnostics Uncx (Cat No. 521431), Tbx6
   (Cat No. 498251‐C2), p‐Enh‐eRNA (Cat No. 1199081‐C2), p‐Enh‐eRNA
   (antisense) (Cat No. 1198881), Cdx2 (Cat No. 438921‐C3) were used. For
   the simultaneous detection of RNA and protein in embryo samples, the
   procedural guidelines outlined in the RNAscope Multiplex Fluorescent v2
   Assay were followed and combined with Immunofluorescence – Integrated
   Co‐Detection Workflow (ICW) (Document Number MK 51–150). The sample
   pre‐treatment step using protease digestion (as applied in RNA‐only
   detection) was replaced with 1% Triton X‐100 treatment to preserve both
   RNA and protein signals. SMAD4 polyclonal antibody (Invitrogen,
   PA5‐34806, diluted at 1:100), Phospho‐Smad1/5 (Ser463/465) (41D10)
   (Cell Signaling Technology, 9516T, diluted at 1:200) were used. Images
   were acquired using OLYMPUS IXplore SpinSR, Zeiss LSM 980 confocal
   microscope, and LiTone XL Light‐sheet Microscope.

Western Blotting

   The cultured cells were washed 3 times with PBS, lysed with lysis
   buffer and heated at 100 °C for 10 mins. Protein solution was run on
   SDS‐PAGE and the gels were transferred to nitrocellulose membranes
   (GE). After blocking in 5% skimmed milk at room temperature for 1 h,
   membranes were then incubated with primary antibodies targeting CDX2
   (Abcam, ab76541) overnight at 4 °C. Next, membranes were washed 3 times
   with TBST and incubated with HRP‐conjugated secondary antibody at room
   temperature for 1 h. immunoreactive bands were visualized with
   SperSignal West Pico Chemiluminescent Substrate (Thermo Scientific,
   34579) and detected using Tanon‐5500 imaging system. Automated western
   blotting analysis were performed on WES (ProteinSimple) system using a
   Size Separation Master Kit with Split Buffer (12–230 kDa) in accordance
   with the manufacturer's instructions. Capillary‐based immunoassays were
   performed using the Wes Simple Western method (SMAD4, Invitrogen,
   PA5‐34806. diluted at 1:50) and an anti‐rabbit detection module
   (ProteinSimple). Compass for Simple Western program (ProteinSimple) was
   used for further illustration.

eRNA Overexpression

   To achieve stable nuclear expression of p‐Enh‐eRNA, we employed the
   previously reported sno‐vector system.^[ [338]^71 , [339]^72 ^] In
   brief, p‐Enh‐eRNA sequence was cloned into pZW1‐snoVector with flanking
   SNORD116‐13 and SNORD116‐14. Subsequently, mESCs were transfected with
   the pZW1‐snoVector construct and selected using 1 mg mL^−1 G418 for
   Kan/Neo resistance.

Chromatin Immunoprecipitation (ChIP) and ChIP‐seq Libraries Preparation

   The protocol for chromatin immunoprecipitation (ChIP) was described
   previously.^[ [340]^18 ^] Cross‐linked cells were lysed in Solution I
   (10 mm HEPES, pH 7.9, 0.5% NP40, 1.5 mm MgCl2, 10 mm KCl, 0.5 mm DTT)
   and SDS lysis buffer (1% SDS, 10 mm EDTA, 50 mm Tris‐HCl, pH 8.0), and
   the samples were sheared for 14 cycles (30 s on/off) in a Bioruptor
   Pico (Diagenode, Belgium) to achieve an average fragment size of
   200–300 bp. Solubilized fragmented chromatin was immunoprecipitated
   with antibody against H3K27ac (Active Motif, 39133) or SMAD4 polyclonal
   antibody (Invitrogen, PA5‐34806). Antibody‐chromatin complexes were
   pulled down using Dynabeads Protein G (Invitrogen, 10004D) on a
   magnetic separation rack, washed several times, and then eluted from
   the magnetic beads. Reverse crosslink was performed subsequently under
   65 °C for at least 4 h. ChIP‐DNA was treated with RNase A and
   Proteinase K, and precipitated with ethanol. ChIP‐DNA was finally
   solved in nuclease‐free water and quantified using Qubit. Sequencing
   libraries were generated by using NEBNext Ultra DNA library preparation
   kit (NEB, E7370). Libraries were quality‐controlled and quantified
   using a Qubit 2.0 Fluorometer (Life Technologies) and Agilent 2100
   Bioanalyzer (Agilent Technologies). High‐throughput sequencing was
   performed on a NovaSeq instrument.

Circular Chromosome Conformation Capture (4C) Libraries Construction

   The protocol for 4C and 4C‐seq libraries construction were reported
   previously with slight modifications.^[ [341]^86 ^] Briefly, ≈5 million
   NMP cells were cross‐linked with 1% formaldehyde/PBS for 12 min,
   followed by quenching with 0.125 M glycine for 5 min at room
   temperature. The cells were then centrifuged at 500 g at 4 °C for 5
   min. After removing the supernatant, the cell pellets were frozen with
   liquid nitrogen and stored at −80 °C. The next day, the pellets were
   resuspended in 1 mL cold lysis buffer (50 mm Tris–HCl pH 7.5, 150 mm
   NaCl, 5 mM EDTA, 0.5% NP‐40, 1% Triton X‐100) containing protease
   inhibitors and incubated on ice for 30 min. After a 5 min
   centrifugation at 2,500 g at 4 °C, the supernatant was carefully
   removed. The nuclei pellets were then resuspended in 50 µL 0.5% SDS,
   incubated on ice for 30 min, heated up to 62 °C, and incubated for 10
   min. Subsequently, 160 µL 1.56% Triton X‐100 was added to the reaction
   tube, and the mixture was incubated at 37 °C for 30 min. For the first
   round of digestion, 200 U Dpn II was used, and the reaction was
   incubated at 37 °C for 3 h. Following complete digestion, the reaction
   tube was incubated at 65 °C for 20 min. After heat‐inactivation, the
   ligation process was carried out with the T4 DNA ligase system at 37 °C
   for 4 h. Subsequently, reverse cross‐linking and precipitation were
   performed, and the DNA was dissolved in 450 µL H2O. In the next step,
   the second‐round digestion with Csp6l was carried out at 37 °C for 3 h.
   After heat‐inactivation and ligation, DNA was purified with AMPure
   beads (Beckman Coulter, A63881). Finally, the DNA product was amplified
   with reading primer and reverse primer, and Illumina sequencing indexes
   were added following the manufacturer's instructions. High‐throughput
   sequencing was performed on a NovaSeq instrument.

RNA Pull‐Down Assay

   The RNA pull‐down assay was performed as previously described with
   minor modifications.^[ [342]^87 , [343]^88 ^] Briefly, a set of
   biotinylated antisense DNA probes targeting the p‐Enh region were
   synthesized with a Biotin‐TEG modification at the 3′ end (probe
   sequences listed in Table [344]S4, Supporting Information). NMP cells
   were subjected to UV crosslinking using a UV crosslinker (Analytik
   Jena, CL‐3000) at 254 nm, with an energy setting of 120 mJoules cm^−
   ^2, repeated three times to ensure efficient crosslinking. Cells were
   then pelleted by centrifugation, washed with ice‐cold PBS, snap‐frozen
   in liquid nitrogen, and stored at −80 °C until further use.

   Approximately 20–30 million cells were used for one reaction. For
   lysis, the frozen cell pellets were resuspended in lysis buffer (50 mm
   Tris‐Cl pH 7.0, 10 mm EDTA, 1% SDS) supplemented with protease
   inhibitors, PMSF, and Superase‐In (Invitrogen), followed by sonication
   using a Bioruptor Pico (Diagenode, Belgium). The lysates were
   centrifuged at 14000 × g for 10 min at 4 °C, and the supernatant was
   collected for hybridization. A small aliquot of the supernatant was
   retained as the total nuclear lysate (TNL) control.

   For hybridization, the lysates were incubated with biotinylated probes
   (1 µL of 100 µM probes mix per 1 mL of lysate), supplemented with
   one‐fourth volume of 5× hybridization buffer (50 mm Tris‐HCl pH 7.5,
   10 mm EDTA, 1.5 m NaCl, 50% formamide). The mixture was rotated at
   39 °C for 3 h, followed by incubation with 100 µL of pre‐washed
   streptavidin magnetic beads (Dynabeads Streptavidin C1, Invitrogen,
   65001) for an additional 3 h at the same temperature. Beads were then
   washed five times with 2× SSC containing 0.5% SDS at 37 °C to remove
   nonspecifically bound material.

   Protein complexes bound to the probes were eluted by incubating the
   beads in 100 µL biotin elution buffer (12.5 mm D‐biotin, 7.5 mm HEPES
   pH 7.5, 75 mm NaCl, 1.5 mm EDTA, 0.15% SDS, 0.075% sarkosyl, 0.02%
   sodium deoxycholate) with rotation at room temperature for 20 min,
   followed by heating at 65 °C for 10 min. The supernatant was
   transferred to a new tube, and the beads were eluted once more with an
   additional 100 µL of elution buffer. The two eluates were combined, and
   proteins were precipitated by adding four volumes of pre‐chilled
   acetone and incubating at −80 °C overnight. The precipitated proteins
   were collected by centrifugation at 15000 × g for 30 min at 4 °C and
   used for subsequent Western blot analysis.

Quantification and Statistical Analysis–Enhancer Screening Based on H3K27ac
ChIP‐seq Data—Data Processing

   H3K27ac ChIP‐seq data for E7.0 and E7.5 mouse embryo posterior tissues
   were obtained from the NCBI Gene Expression Omnibus (GEO) under
   accession numbers [345]GSE98101.^[ [346]^18 ^] Additionally, H3K27ac
   ChIP‐seq data for E9.5 mouse embryo tailbud were obtained from
   [347]GSE84899.^[ [348]^31 ^] Processing of all ChIP‐seq samples started
   with raw sequencing reads. First, Trim Galore (version 0.4.4_dev) was
   used to remove adapter and low‐quality sequences by trimming 3′ ends of
   reads.^[ [349]^89 ^] The resulting reads were then aligned to mouse
   reference genome mm10 using Bowtie (version 1.2.2) with the parameters
   “–chunkmbs = 512 ‐I = 0 ‐X = 1000 –best ‐m = 1”.^[ [350]^90 ^] After
   removing duplicates, we performed peak calling using MACS (version
   1.4.2) with the parameters “–shiftsize = 100 –nomodel –keep‐dup =
   all”.^[ [351]^91 ^] Subsequently, we merged all resulting peaks of
   samples into a consensus list of genomic regions and counted the reads
   within those regions using MAnorm2_utils (version 1.0.0) with the
   parameters “–min‐peak‐gap = 150 –typical‐bin‐size = 2000 –shiftsize =
   100 –filter = blacklist”.^[ [352]^30 ^] The blacklist regions of mm10
   were obtained from Amemiya et al.^[ [353]^92 ^]

Screening Distal Regulatory Elements Activated at E7.5 Mouse Embryo Posterior
Tissues

   We used MAnorm2 (version 1.2.2) to identify differential H3K27ac
   regions between E7.0 and E7.5 embryo posterior tissue samples with the
   cutoffs p‐value < 0.01 and fold change > 2.^[ [354]^30 ^] Distal
   regulatory elements activated in E7.5 mouse embryo posterior tissues
   were defined as the differential H3K27ac regions which both exhibited
   increased H3K27ac levels in E7.5 samples and located greater than
   1.5 kb from transcription start sites (TSSs) of genes. The top‐30 most
   significant elements are shown in Figure [355]1B. The gene annotation
   file of mm10 was obtained from the GENCODE project.^[ [356]^93 ^]

TF Motif Enrichment Analysis

   Position Frequency Matrices (PFMs) of vertebrate transcription factors
   were acquired from the JASPAR CORE database^[ [357]^94 ^]
   (JASPAR2024_CORE_vertebrates_non‐redundant_pfms_jaspar.txt). For TF
   motif enrichment analysis, we employed two tools: FIMO^[ [358]^95 ^]
   (version 5.5.5) and MotifScan^[ [359]^96 ^] (version 1.3.0). FIMO
   identified 189 significantly enriched TF motifs with the significance
   threshold p‐value < 0.0001. Motifscan detected 100 TF motifs in the
   p‐Enh regions under the threshold p‐value < 0.0001.

Single Cell RNA Sequencing Data Analysis—Data Processing

   For each dataset, raw sequencing data was preprocessed using cellranger
   (10x Genomics Cell Ranger 5.0.0) with default settings, including
   alignment to mouse reference genome mm10, filtering, barcode counting
   and unique molecular identifier (UMI) counting.^[ [360]^97 ^] The
   output files (matrix.tsv.gz, barcordes.tsv.gz and features.tsv.gz)
   generated by cellranger were subsequently used for downstream analysis.
   Cell doublets were estimated by scrublet pipeline (version 0.2.3) and
   discarded.^[ [361]^98 ^] Cells with fewer than 1000 detectable genes or
   with higher than 5% UMIs from mitochondrial genes were excluded. Genes
   detected in more than 1% total number of cells were retained for
   following analysis. The detailed quality control information was listed
   in Table [362]S2 (Supporting Information).

Construction of Mouse Embryo Single‐Cell Transcriptome Reference Map

   scRNA‐seq datasets were utilized from E8.5, E9.5, and E10.5 WT embryos
   to establish a reference map with the R package Seurat (version
   4.0.1).^[ [363]^99 ^] First, we normalized gene expression using the
   NormalizeData function and identified variable genes using the
   FindVariableFeatures function for each dataset. Specifically, gene
   count matrices were normalized by the total number of UMIs per cell and
   then multiplied by a scale factor (10000), followed by
   log‐transformation. After using variance‐stabilizing transformation
   (VST), top 2000 genes sorted by variance were selected as variable
   genes.

   Next, the datasets were integrated by identifying cell pairwise
   correspondences, referred to as “anchors”, between datasets based on
   the consistently variable genes using the FindIntegrationAnchors
   function.^[ [364]^100 ^] Batch correction was performed with the
   identified anchors using the IntegrateData function.^[ [365]^100 ^]
   Subsequently, principal component analysis was performed using the
   RunPCA function and top 30 principal components were chosen for cell
   clustering. Cell clusters were identified using the FindClusters
   function with a resolution parameter of 0.5, which employed a K‐nearest
   neighbor (KNN) graph‐based clustering approach to iteratively group
   cells together.^[ [366]^101 ^] For visualization, UMAP dimensionality
   reduction was performed with top 30 principal components using the
   RunUMAP function.^[ [367]^102 ^]

   For cell type annotation, we detected marker genes of cell clusters by
   comparing each cell cluster with all other cell clusters using Wilcoxon
   rank sum test, with fraction of expressed cells > 0.25 in ether of the
   two cell populations, adjusted p‐value < 0.05 and log2(fold change) >
   0.25. Based on the marker genes of cell clusters, 22 major cell types
   were identified. Subsequently, the marker genes of cell types were
   detected again using the same method as described above. The complete
   list of marker genes for all cell types can be found in Table [368]S3
   (Supporting Information).

Projecting p‐Enh‐KO Embryo Cells onto Reference Map

   p‐Enh‐KO cells were projected onto reference map by using the Seurat
   package. For scRNA‐seq datasets from E8.5, E9.5, and E10.5 p‐Enh KO
   embryos, gene expression normalization and variable gene identification
   were first performed using NormalizeData and FindVariableFeatures
   functions respectively. Next, the anchors between p‐Enh‐KO datasets and
   the reference map were identified using the FindTransferAnchors
   function. Finally, p‐Enh‐KO cells were classified and annotated using
   the TransferData function, as well as projected onto the reference UMAP
   using the MapQuery function.

Differential Expression Analysis Between WT and KO Embryos

   Comparisons between WT and p‐Enh‐KO were done for each cell type and at
   each stage. By using Wilcoxon rank sum test, differential expression
   genes (DEGs) were detected with fraction of expressed cells > 0.25 in
   ether WT or p‐Enh‐KO cells, adjusted p‐value < 0.05 and log2(fold
   change) > 0.25. Enrichment analysis between DEGs and the top 100
   significant marker genes of cell types were performed using Fisher's
   exact test.

Calculating Module Scores of a Gene Set at Single Cell Levels

   For a gene set of interest, the module scores in single cells were
   computed using the AddModuleScore function of the R package Seurat.
   Specially, all expressed genes were first divided into bins based on
   their averaged expression across all cells, and then control genes were
   randomly selected from each bin. Module scores were calculated as the
   difference between the average expression of the gene set of interest
   and the average expression of the control gene set.^[ [369]^103 ^]

Analysis of Anterior‐Posterior Axis Pattering and Limb Development

   For analysis of anterior‐posterior axis pattering, the cells of
   Forebrain/Midbrain/Hindbrain, NMP, Neural tube, Paraxial mesoderm‐1,
   Paraxial mesoderm‐2, and presomitic mesoderm were selected for
   reclustering. For analysis of limb development, the cells of Limb
   mesenchyme and Mesenchyme were selected for reclustering. Consistent
   with the previous analysis strategy, we first constructed a WT
   reference and then projected cells of corresponding cell types from
   p‐Enh‐KO embryos onto the reference.

   For RNA velocity analysis, the count matrices of pre‐mature and mature
   mRNA abundances were first generated by velocyto (version 0.17.17)
   counting pipeline, and then used as input for scVelo (version 0.2.4) to
   estimate RNA velocity.^[ [370]^104 , [371]^105 ^]

   Trajectory analysis was performed using monocle3 (version 1.0.0) and
   pseudotime of cells was calculated based on the trajectory.^[ [372]^46
   ^] The density plots of WT or p‐Enh‐KO cells along the pseudotime axis
   were implemented using the function kdeplot function of the python
   seaborn package, where a Gaussian kernel was used to produce a
   continuous density estimate and the density was scaled by the number of
   observations so as to make the total area under the density equal to 1.
   For posterior neural tube development and the posterior mesodermal
   lineages development, we selected the top 500 most variable genes,
   detected by using the FindVariableFeatures function of the Seurat
   package, as trajectory‐based expressed genes.

Gastruloids Transcriptome Data Processing

   Raw sequencing reads were first trimmed by Trim Galore (version
   0.4.4_dev) and then aligned to mm10 reference genome by STAR (version
   2.5.2b) with default parameters.^[ [373]^106 ^] After removal of
   duplicates, featureCounts (version 1.6.5) was used to count reads for
   genes.^[ [374]^107 ^] The gene annotation file
   (gencode.vM10.annotation.gtf) was obtained from the GENCODE project.
   Batch effects were corrected using the ComBat function of the sva
   package, where the first batch was selected as the reference batch.^[
   [375]^108 , [376]^109 ^] Differential expression analysis was performed
   using the R package DESeq2 (version 1.40.2) and significantly
   differential expression genes (DEGs) were defined as the genes with
   adjusted p‐value less than 0.01 and fold change greater than 2.^[
   [377]^110 ^] GO terms enriched by DEGs was identified using the R
   package clusterProfiler (version 4.8.2).^[ [378]^111 ^] The table
   containing the list of genes specifically regulated by CDX2 or p‐Enh is
   provided in Table [379]S5 (Supporting Information).

H3K27ac ChIP‐seq Data Processing

   Paired‐end ChIP‐seq reads were trimmed by TrimGalore v0.6.6 with
   parameters ’‐q 20′, then aligned to the mouse genome assembly (mm10)
   with Bowtie 2 v2.2.5^[ [380]^112 ^] and sorted with SAMtools v1.13.^[
   [381]^113 ^] All reads from mitochondria and chromosome Y were removed.
   Duplicated reads and reads with mapping quality below 30 were removed.

   Peaks were called by MACS2 v2.2.7.1^[ [382]^91 ^] with parameters
   ‘–extsize 200 –nomodel –nolambda –keep‐dup all ‐p 0.01 ‐g mm’.
   Differential peak analysis was performed with the DESeq2 method in
   DiffBind 3.8.4.^[ [383]^114 ^] Significant differential peaks were
   defined with FDR <0.05 and log2(FoldChange)≥1. Bigwig files were
   normalized in RPKM under 50 bp bin and generated with deepTools v3.5.1
   bamCoverage.^[ [384]^115 ^] Annotations of peaks were performed with R
   package ChIPSeeker v1.5.1.^[ [385]^116 ^] Table containing detailed
   information about differential H3K27ac ChIP‐seq peaks is provided in
   Table [386]S6 (Supporting Information).

TFs ChIP‐seq Data Processing

   TFs ChIP‐seq includes SMAD2/3/4, CDX2 from different public datasets
   with paired‐end or single‐end sequencing. Paired‐end and single‐end
   reads were trimmed by TrimGalore v0.6.6 with parameters ‘‐q 20′
   relatively, then aligned to the mouse genome assembly (mm10) with
   Bowtie 2 v2.2.5^[ [387]^112 ^] and sorted with SAMtools v1.13.^[
   [388]^113 ^] All reads from mitochondria and chromosome Y were removed.
   Duplicated reads and reads with mapping quality below 30 were removed.

   Peaks were called by MACS2 v2.2.7.1^[ [389]^91 ^] with parameters
   ‘–extsize 200 –nomodel –keep‐dup all ‐p 1e‐5 ‐f BAM ‐g mm’. Bigwig
   files were normalized in RPKM under 50 bp/10 bp bin and generated with
   deepTools v3.5.1^[ [390]^115 ^] bamCoverage.

4C‐seq Data Processing and Peak Calling

   Raw sequencing data was processed using pipe4C^[ [391]^86 ^] (version
   1.0.0). Subsequently, all resulting RDS files were used to identify
   reproducible peaks using the PeakC^[ [392]^117 ^] R package (version
   0.2). Significant peaks were identified based on a p‐value threshold of
   p < 0.05.

Signaling Pathway Enrichment Analysis

   GEO‐seq data for E7.5 mouse embryos were obtained from the NCBI GEO
   under accession numbers [393]GSE120963.^[ [394]^9 ^] Processing of
   GEO‐seq data samples started with raw sequencing reads. Quality control
   metrics such as sequence quality scores, GC content, and adapter
   content were assessed using FastQC (v0.11.9). Low‐quality reads and
   adapters were trimmed or removed. Then, the high‐quality reads were
   aligned to a reference genome (mm10) with Hisat2 (v2.2.1).^[ [395]^118
   ^] Gene expression quantification was performed using featureCounts
   (v1.5.3).^[ [396]^107 ^] The count matrix generated representing the
   number of reads mapped to each gene in each sample. The count matrix
   was normalized to account for differences in sequencing depth and gene
   length as Transcripts Per Kilobase of exon model per Million mapped
   reads (TPM). To evaluate the enrichment of genes response to TGF‐β,
   AUCell (v1.20.2) was used to calculate the gene‐set activity score
   (gene set from GESA: MM9770).^[ [397]^119 ^] The score was calculated
   by the top 10% genes of all genes in expression matrix.

Statistical Analysis

   Analysis of H3K27ac ChIP‐seq data for E7.0 and E7.5 mouse embryo
   posterior tissues, each condition with two biological replicates, was
   performed by MAnorm2. For normalization of ChIP‐seq signals, MAnorm2
   first identified common peaks between samples as baseline regions.
   These common peaks are then subjected to MA‐plot (log‐ratio versus
   average signal intensity), where a linear regression model is fitted to
   calculate sample‐specific scaling factors. Finally, these scaling
   factors are applied to adjust the ChIP‐seq signals across all peaks,
   including sample‐specific peaks. For differential analysis, MAnorm2
   employs a modified t‐test based on both the empirical Bayes framework
   to improve variance estimation for small sample sizes and the
   mean‐variance modeling to stabilize variance estimates by borrowing
   information across genomic regions. p‐values were corrected for
   multiple testing by using Benjamini‐Hochberg (BH) method.

   Single cell RNA sequencing data analysis was performed by Seurat.
   Differential gene expression analysis was performed by using the
   FindMarkers function in Seurat tested by two‐sided Wilcoxon rank sum
   test followed by BH correction.

   Analysis of gastruloids transcriptome data, each condition with three
   biological replicates, was performed by DESeq2. For normalization,
   DESeq2 calculated sample‐specific size factors as the median of gene
   expression ratios to correct for sequencing depth variations. For
   differential expression analysis, DESeq2 employs a negative binomial
   generalized linear model (NB‐GLM) framework, which involves three key
   computational steps: 1) estimation of gene‐specific dispersion
   parameters with empirical Bayes shrinkage to account for biological
   variability, 2) fitting of the generalized linear model to quantify
   expression changes, and 3) statistical hypothesis testing using Wald
   test to assess significant differences between treatment and control
   groups. p‐values were corrected for multiple testing by using
   Benjamini‐Hochberg (BH) method.

   GO enrichment analysis was performed by clusterProfiler where
   statistical significance is tested by one‐sided hypergeometric test
   followed by Benjamini‐Hochberg (BH) correction.

   For the ChIP‐seq data, the differential peak analysis was performed
   with the DESeq2 method in DiffBind 3.8.4.^[ [398]^114 ^] Significant
   differential peaks were defined with FDR <0.05 and log2(FoldChange) ≥
   1.

   For the 4C‐seq data analysis, significant peaks were identified based
   on a p‐value threshold of p < 0.05 in the PeakC^[ [399]^117 ^] R
   package (version 0.2).

   For comparisons involving three or more groups, such as gastruloid
   morphological measurements and qPCR‐based gene expression analyses,
   statistical significance was assessed using one‐way ANOVA. For
   ChIP‐qPCR data comparing two groups, unpaired t‐tests were performed.
   Statistical analyses were conducted using GraphPad Prism. Bar plots
   represent mean ± standard deviation (SD), and sample sizes (n) are
   indicated in the corresponding figure legends. Detailed statistical
   tests and exact p‐values are labeled in the figures.

Conflict of Interest

   The authors declare no conflict of interest.

Author Contributions

   Y.C., F.T., Q.F., and L.Z. contributed equally to this work. X.Y.,
   N.J., and Y.C. performed Conceptualization: Y.C., F.T., X.Y., L.Z.,
   Q.F., J.L., P.S., M.W., Y.Q., R.S., Y.F., H.J.X., and R.W. performed
   Investigation: Y.C., X.Y., F.T., Q.F., R.S., and Y.F. performed
   visualization: X.Y. and N.J. performed funding acquisition: X.Y., N.J.,
   J.L., Z.S., and C.L. supervision: Y.C., X.Y., F.T., Q.F., and N.J.
   wrote original draft: X.Y., N.J., Y.C., F.T., Q.F., and C.L. wrote,
   reviewed, and edited.

Data and Materials Availability

   All raw and processed sequence data reported in this paper have been
   deposited in the Genome Sequence Archive^[ [400]^120 ^] in National
   Genomics Data Center,^[ [401]^121 ^] China National Center for
   Bioinformation / Beijing Institute of Genomics, Chinese Academy of
   Sciences (GSA: CRA014616) that are publicly accessible at
   https://ngdc.cncb.ac.cn/gsa.

Supporting information

   Supporting Information
   [402]ADVS-12-e00895-s006.pdf^ (37.4MB, pdf)

   Supporting Information
   [403]ADVS-12-e00895-s003.xlsx^ (21KB, xlsx)

   Supplemental Table 1
   [404]ADVS-12-e00895-s004.xlsx^ (11.4KB, xlsx)

   Supplemental Table 2
   [405]ADVS-12-e00895-s002.xlsx^ (36.3KB, xlsx)

   Supplemental Table 3
   [406]ADVS-12-e00895-s005.xlsx^ (13.9KB, xlsx)

   Supplemental Table 4
   [407]ADVS-12-e00895-s007.xlsx^ (20KB, xlsx)

   Supplemental Table 5
   [408]ADVS-12-e00895-s001.xlsx^ (195.7KB, xlsx)

Acknowledgements