Abstract

   Despite extensive research, the gene regulatory architecture governing
   mammalian cell states remains poorly understood. Here we present an
   integrative systems biology approach to elucidate the network
   architecture of primed state pluripotency. Using an unbiased
   methodology, we identified and experimentally confirmed 132
   transcription factors as master regulators (MRs) of mouse epiblast stem
   cell (EpiSC) pluripotency, many of which were further validated by
   CRISPR-mediated functional assays. To assemble a comprehensive
   regulatory network, we silenced each of the 132 MRs to assess their
   effects on the other MRs and their transcriptional targets, yielding a
   network of 1273 MR → MR interactions. Network architecture analyses
   revealed four functionally distinct MR modules (communities), and
   identified key Speaker and Mediator MRs based on their hierarchical
   rank and centrality. Our findings elucidate the de-centralized logic of
   a “communal interaction” model in which the balanced activities of four
   MR communities maintain primed state pluripotency.

   Subject terms: Embryonic stem cells, Regulatory networks, Gene
   silencing
     __________________________________________________________________

   The architecture of gene regulatory networks that govern mammalian
   cells has been poorly understood. Here, Li et al. present a
   computational systems approach to elucidate the regulatory logic of the
   gene network for primed state pluripotency.

Introduction

   Pluripotency is defined by cellular states in which cells are capable
   of self-renewal while remaining poised to differentiate into all
   embryonic cell lineages. To date, a progression of three distinct in
   vivo pluripotency states—naïve, formative, and primed—has been
   identified in the mouse embryo^[34]1–[35]4, with formative cells
   representing a transition between the naïve and primed
   states^[36]5,[37]6. Due to their central significance for stem cell
   biology and regenerative medicine, the naïve and primed states have
   been captured as cell lines in vitro and intensively studied.

   In culture, embryonic stem cells (ESCs) established from
   pre-implantation blastocysts reflect the naïve pluripotent state,
   whereas epiblast stem cells (EpiSCs) isolated from pre-gastrulation
   embryos^[38]7,[39]8 correspond to the primed pluripotent state. Primed
   pluripotent cells co-express markers characteristic of the
   undifferentiated naïve state, as well as markers of more differentiated
   lineage-specific states; they differ from the naïve state in a range of
   molecular and functional properties^[40]1,[41]2,[42]9. Notably,
   conventional pluripotent stem cell cultures established from human and
   many other mammalian species largely correspond to a primed
   pluripotency state, with smaller subpopulations corresponding to naïve
   and formative states, and thus most closely resemble mouse
   EpiSCs^[43]1–[44]3. However, although the primed state is highly
   conserved, it is less understood than the naïve state, particularly in
   terms of the regulatory network that governs its pluripotency.

   Previous studies of pluripotency network architecture have largely
   focused on a limited number of well-known regulators that have been
   primarily identified by mutational analyses in cultured stem cells or
   in vivo^[45]10–[46]17. These studies led to two general models of the
   regulatory logic underlying naïve state pluripotency. In the first
   model, a highly interconnected, hierarchical network of core
   transcription factors such as Oct4 and Nanog are responsible for
   activating pluripotency programs and repressing
   differentiation^[47]10,[48]11,[49]13,[50]17,[51]18. In the second
   model, pluripotency regulators comprise an array of specification genes
   that direct differentiation toward a specific lineage while repressing
   other lineages, creating an unstable undifferentiated equilibrium with
   the potential to commit to multiple lineages^[52]19–[53]23. However,
   these models were based on analyses of a relatively small number of
   genes, and it is conceivable that key features that may have emerged
   from an unbiased approach were missed. More critically, elucidating
   large-scale network topology by studying a relatively small number of
   genes is suboptimal, as underscored by an ever-expanding repertoire of
   genes implicated in pluripotency maintenance, which suggests that
   pluripotency is controlled by a sizeable, highly complex gene
   regulatory network.

   To address this challenge, we leveraged a comprehensive and unbiased
   systems approach for the large-scale systematic elucidation of the
   regulatory logic of mouse EpiSC cells, a defined cell state of
   fundamental importance in normal development. The approach—which is
   based on the seamless integration of both computational and
   experimental methodologies—has led to the systematic identification and
   validation of primed pluripotency regulators, followed by elucidation
   of their regulatory relationships and overall network architecture. For
   this purpose, we have utilized ARACNe^[54]24 and VIPER^[55]25, two
   algorithms extensively validated in a wide range of
   physiologic^[56]26–[57]28 and pathologic contexts, both
   tumor^[58]29–[59]32 and non-tumor related^[60]33–[61]35. ARACNe was
   used to assemble an initial regulatory network (interactome) from large
   collections of expression profiles, using an information theory
   approach that largely eliminates indirect interactions. VIPER
   interrogates the resulting interactome to identify candidate Master
   Regulator (MR) proteins representing mechanistic determinants of a
   specific transcriptional cell state via their transcriptional targets;
   this is accomplished by measuring the enrichment of their activated and
   repressed targets in genes differentially expressed in that state, akin
   to a highly multiplexed gene reporter assay. Notably, the accuracy and
   sensitivity of VIPER-inferred protein activity compares favorably with
   antibody-based measurements^[62]35,[63]36, and allows unbiased
   identification of candidate MRs (i.e., independent of prior literature
   knowledge).

   In our study, we have generated an extensive collection of EpiSC gene
   expression profiles, as well as multiple signatures of lineage-specific
   differentiation to support the systematic generation of a mouse
   EpiSC-specific interactome and the identification of MR proteins
   responsible for the maintenance of primed state pluripotency. Using a
   set of 132 experimentally confirmed MRs, we elucidated their mutual
   regulatory relationships by systematically silencing each MR in turn to
   determine the resulting effects on the activity of all other MRs. This
   large-scale experimental analysis yielded a comprehensive MR → MR
   causal interaction network, which we have investigated using concepts
   from social network analysis to understand its regulatory architecture.
   Notably, topology analysis identified four distinct gene communities
   within the EpiSC network, suggesting a “communal interaction” model for
   how pluripotency is maintained. Our analysis of the regulatory
   architecture of primed state pluripotency thus provides relevant
   insights into mechanisms of pluripotency and may guide future studies
   of other similarly complex regulatory networks.

Results

Overall experimental strategy

   To dissect the logic of molecular interactions responsible for
   regulating primed state pluripotency, we pursued a six-step strategy
   that integrates computational inference with validation assays,
   followed by experimental refinement. First, as inputs for the ARACNe
   algorithm, we generated a large repertoire of gene expression profiles
   from EpiSCs undergoing a wide range of small molecule perturbations.
   This resulted in a comprehensive first-generation interactome
   comprising the computationally inferred transcriptional targets
   (regulon) of every transcription factor (Fig. [64]1a). Second, we
   leveraged this network to identify candidate MRs of primed state
   pluripotency, by using the VIPER algorithm to analyze gene expression
   time series representative of EpiSC differentiation along multiple
   lineages (Fig. [65]1b). Third, we validated candidate MRs emerging from
   the analysis in terms of their ability to modulate the pluripotent
   transcriptional state following RNAi-mediated silencing, resulting in a
   list of 132 confirmed MRs that contained both known and novel
   regulators of pluripotency (Fig. [66]1c). Fourth, among these, we
   sought to validate 70 MRs not previously reported in the literature
   using functional assays of self-renewal and differentiation phenotypes
   following their CRISPR-mediated knock-out (Fig. [67]1d). Fifth, we used
   the RNA-seq profiles of EpiSC cells following RNAi-mediated silencing
   of each of the 132 confirmed MRs, as generated in step three, to
   assemble a comprehensive, experimentally-derived regulatory network
   governing primed state pluripotency (Fig. [68]1e). This fully causal
   MR → MR interaction network is based directly on experimental
   determination of the differential activity of each MR as induced by the
   silencing of every other MR. Finally, we investigated the architecture
   of the primed pluripotency network through modularity, hierarchy, and
   centrality analyses (Fig. [69]1f). Overall, this strategy allowed
   reconstructing the architecture of a complex and previously
   uncharacterized EpiSC regulatory network, using a fully unbiased
   approach that is largely based on the analysis of targeted experimental
   perturbations.

Fig. 1. Flowchart of experimental design.

   [70]Fig. 1
   [71]Open in a new tab

   a Interactome construction. Two independent EpiSC lines were treated
   with 5 differentiation conditions and 33 small molecule perturbations,
   generating 276 expression profiles. These profiles served as input for
   the ARACNe algorithm to create the “first-generation” EpiSC
   interactome. b Inference of candidate MRs. 144 expression profiles from
   5 distinct lineage-specific differentiation time courses were used to
   generate signatures. The VIPER algorithm then analyzed these signatures
   to identify candidate MRs. c Validation of candidate MRs. Two
   independent screens investigated whether silencing of candidate MRs
   could modulate the pluripotent state, resulting in 132 validated MRs.
   The first screen examined whether candidate MR silencing altered
   endogenous Oct4 protein levels, while the second screen used PLATE-seq
   to determine if candidate MR silencing recapitulated the EpiSC
   differentiation signature. d Functional validation of MRs. 70 selected
   MRs underwent CRISPR-mediated knockout to investigate pluripotency
   defects through cell culture assays and teratoma formation. e Network
   modularity analysis. PLATE-seq expression profiles from MR silencing
   were used to assemble a comprehensive, experimentally-derived
   regulatory network. Four distinct MR communities were identified within
   this network. f Regulatory architecture inference. Network hierarchy
   and centrality were analyzed, followed by further examination of
   community structure.

Computational inference of a primed state pluripotency network

   The unbiased ARACNe-based^[72]24 reverse-engineering of a primed
   pluripotency cell state transcriptional interactome requires a large
   (n > 100) compendium of gene expression profiles, with sufficient
   expression variability to reveal the range of mutual information
   between transcriptional regulators and their potential targets. To
   accomplish this goal, we developed a general strategy to create the
   necessary spectrum of EpiSC-derived expression profiles by combining
   two complementary sources of experimental perturbations. Specifically,
   we combined treatment with 33 small molecules (termed “perturbagens”)
   with 5 conditions known to induce differentiation. To be optimally
   informative and minimize underlying statistical co-dependencies, we
   selected perturbagens that affect an orthogonal range of signaling
   pathways to treat two distinct EpiSC cell lines. Critically, the order
   of treatment was defined based on a pilot experiment demonstrating
   greater variability of gene expression signatures when perturbagen
   treatment preceded induction of differentiation (Methods; Supplementary
   Fig. [73]1a; Supplementary Data [74]1).

   This strategy generated 276 diverse gene expression profiles
   representing the response of the two EpiSC cell lines to combinatorial
   treatment with perturbagens and differentiation inducers (Methods;
   Supplementary Fig. [75]1a, b; Supplementary Data [76]1, [77]2.1). These
   profiles were then analyzed by ARACNe to infer the transcriptional
   targets of 1478 proteins annotated as transcription factors in the Gene
   Ontology^[78]37 database. The resulting EpiSC primed pluripotency
   interactome comprises 911,753 TF →target interactions between 1393 TFs
   and 17,068 candidate transcriptional target genes. This
   first-generation interactome represents a foundational model for the
   subsequent identification of pluripotency maintenance master regulator
   (MR) proteins by VIPER analysis (Supplementary Data [79]3.1).

   We then proceeded to identify candidate MRs for five distinct pathways
   of lineage-specific differentiation using the VIPER algorithm. VIPER
   requires signatures representing two distinct cellular states to
   identify the proteins (candidate MRs) representing mechanistic
   determinants of the transition between them. For this purpose, we
   generated gene expression profiles of EpiSCs following treatment with
   five distinct differentiation protocols at multiple time points
   (ranging from 3 h to 72 h). The resulting time series included a total
   of 144 gene expression profiles (Supplementary Data [80]2.2). These
   protocols were selected to enrich for differentiation along specific
   lineages, including neuroectoderm (RA and SB431542), trophectoderm
   (BMP4), mesoderm (WAFB), and endoderm (WA) (Methods; Supplementary
   Fig. [81]1c–f). To confirm that these differentiation treatments
   induced distinct transcriptional states, we analyzed the resulting time
   series by Principal Components Analysis (PCA) and by expression of
   lineage-specific marker proteins as assessed by immunofluorescence
   (Supplementary Fig. [82]1c–f). For each time point in this analysis, we
   generated a differential gene expression signature for VIPER analysis
   by comparing perturbagen-treated to baseline untreated EpiSC cells. The
   resulting differentiation signatures were then analyzed by VIPER, using
   the EpiSC interactome, to identify candidate MRs for the associated
   cell state transitions.

   We reasoned that the most universal pluripotency MRs would be those
   conserved across most if not all of the five differentiation pathways
   examined. We thus used Fisher’s method to integrate the VIPER
   statistics (p-values) representing the differential activity of 1393
   TFs, as determined by the analysis of each differentiation signature,
   across all time points and differentiation trajectories. The resulting
   ranked list identified 300 candidate MRs whose regulons were
   significantly enriched in differentially expressed genes (p < 10^-4),
   with negative and positive enrichment representing MRs capable of
   inducing or repressing pluripotency, respectively (Fig. [83]2a;
   Supplementary Data [84]4.1). From this initial list, we selected 172
   candidate MRs for further analysis, including most of those that were
   top ranked, as well as additional candidate MRs of interest (see
   Methods section “Selection and evaluation of pluripotency MRs” for
   selection criteria and additional details). Many of the candidate MRs
   nominated by this systematic approach had been previously implicated in
   pluripotency regulation (Supplementary Data [85]4.4), confirming that
   our unbiased analysis was highly effective and suggesting that
   additional candidates not previously reported in the literature might
   include many bona fide pluripotency MRs.

Fig. 2. Identification and validation of candidate master regulators of
primed state pluripotency.

   [86]Fig. 2
   [87]Open in a new tab

   a Examples of transcriptional regulatory proteins being inactivated,
   unaffected, or activated during the differentiation treatments. Each
   heatmap cell represents the VIPER-inferred differential protein
   activity (DA) expressed as normalized Enrichment Score (NES).
   Enrichment plots show the distribution of activated (orange) and
   repressed (purple) transcriptional targets on the 72 h gene expression
   signatures. The right-most column represents the integrated
   differential activity of each protein across all time points and
   treatments (integrated DA). RA: Retinoic acid; WAFB: Wnt3a, Activin A,
   FGF2, and BMP4; WA: Wnt3a and Activin A. b Oct4 staining of EpiSCs
   after shRNA-mediated silencing of selected candidate MRs and vehicle or
   mock-treated controls. Each MR was targeted by two individual hairpins
   in four independent replicates. Scale bars: 100 microns. c Waterfall
   plot showing differential Oct4 protein level  (z-score) after silencing
   of each candidate MR; dashed lines indicate FDR < 0.05 (2-tailed). Raw
   data are provided in Supplementary Data 5.2. d Scatterplot of the
   effect of silencing each MR on Oct4 protein level and the
   differentiation score of EpiSCs, which is determined by the
   recapitulation of the protein activity signature of EpiSC
   differentiation by the silencing experiment (see Methods). We observed
   a significant correlation between these scores (Spearman’s Rho 0.32,
   p < 10^−^4); dotted lines indicate FDR = 0.05 (2-tailed). Candidate MRs
   showing significant validation (differentiation) score are highlighted
   in green. Selected MRs labeled in black indicate MRs functionally
   validated in CRISPR/Cas9 assays, and in blue indicate known key
   pluripotency regulators. Raw data are provided in Supplementary Data
   6.2.

Experimental confirmation of candidate MRs

   To systematically confirm the functional role of the 172 candidate MRs
   in pluripotency control, we employed two independent approaches. In the
   first approach, we examined whether silencing of candidate MRs would
   affect Oct4 protein abundance, which has been frequently employed as a
   readout in pluripotency screens^[88]38–[89]41. However, since there may
   be MRs that act independently of Oct4, we also employed a second assay
   to determine whether silencing of candidate MRs might result in a gene
   expression signature with similarities to the overall EpiSC
   differentiation signature.

   First, we assessed whether lentiviral-mediated shRNA silencing of
   candidate MRs would modulate the expression of endogenous Oct4 (Pou5f1)
   at the protein level (Fig. [90]2b, c). To minimize contributions from
   off-target effects, we used multiple independent hairpins for each
   candidate MR, with 92% of the candidate MRs targeted by at least 3
   hairpins (Supplementary Fig. [91]1g; Supplementary Data [92]5.1). We
   quantified Oct4 protein levels by immunofluorescence staining, and
   calculated z-scores representing its differential protein expression
   relative to non-targeting controls (Fig. [93]2c). Using this assay, we
   found that 117 (68%) and 15 (8.7%) of the candidate MRs down-regulated
   and up-regulated Oct4 protein expression when silenced, respectively
   (FDR < 0.05) (Fig. [94]2c; Supplementary Data [95]5.2). Notably, the
   analysis identified Oct4 itself as well as other well-known MRs such as
   Otx2 and Stat3^[96]42,[97]43 as pluripotency regulators (Fig. [98]2b,
   c).

   Secondly, to confirm the roles of candidate MRs that modulate
   pluripotency independently of their influence on Oct4 protein
   expression, we developed an assay to determine whether their silencing
   might recapitulate the EpiSC differentiation signature. This assay
   utilized PLATE-seq, a multiplexed barcoded RNA-sequencing
   protocol^[99]44 that generates low-depth (1 ~ 2 million reads/sample)
   gene expression profiles. We used PLATE-seq to generate gene expression
   profiles produced by 596 independent shRNA-mediated silencing assays
   that targeted 154 of the 172 candidate MRs in EpiSC cells; the other 18
   candidate MRs were either targeted unsuccessfully or lacked a regulon
   for analysis (Methods; Supplementary Data [100]5.1). Next, we performed
   VIPER-based analysis of the PLATE-seq profiles, using the EpiSC
   interactome, to assess the differential activity of all candidate
   pluripotency MRs, as induced by each silencing assay vs. the pool of
   non-targeting controls. The resulting differential MR activity profiles
   were compared to those obtained along the five lineage-specific
   differentiation time courses by enrichment analysis using the aREA
   algorithm^[101]25. The resulting Normalized Enrichment Scores (NES)
   from all hairpins targeting the same candidate MR were integrated
   across all five differentiation signatures, using a weighted
   implementation of Stouffer’s method. Weights were proportional to the
   effect of each hairpin on the activity of its target protein, such that
   hairpins inducing less-efficient silencing had lower weight
   (Supplementary Data [102]6.1).

   Interestingly, we have previously shown that up to ~20% of candidate
   VIPER-inferred MRs may be correctly inferred but with an opposite
   directionality (inverted NES sign), due to the presence of
   autoregulatory loops that invert the relationship between mRNA
   expression and the activity of its associated protein at steady state
   (equilibrium)^[103]25. As a result, these analyses were instrumental in
   confirming the role of candidate MRs as either activators or
   repressors. Furthermore, we also identified and corrected the activity
   of 27 of the tested MRs (18%) that had been inferred with inverted
   activity (Methods; Supplementary Data [104]6.2).

   To determine the number of candidate MRs whose induced differentiation
   signature was consistent with or divergent from their ability to
   modulate Oct4 protein expression, we computed a Marker Score (MS) and a
   Differentiation Score (DS; see Methods for details) (Fig. [105]2d;
   Supplementary Data [106]6.1). As expected, the two scores were highly
   correlated (p = 4.8 x 10^−^5), with 85 of 154 candidate MRs tested by
   both assays (55%, p < 0.05, 1-tailed FET) showing concordant scores
   (FDR < 0.05, by 2-tail analysis). Notably, an additional 47 candidate
   MRs (31%) presented with a significant DS but not MS score, suggesting
   that these genes may regulate primed state pluripotency in a
   Oct4-independent manner. In total, we confirmed 132 MRs (86%) by the
   more conservative test (DS analysis, green dots in Fig. [107]2d), which
   were selected as the basis for further investigation.

Differentiation analysis of primed state pluripotency MRs

   Since the ability of an MR to induce the experimentally determined
   differentiation signature upon silencing suggests a potential
   mechanistic role in pluripotency control, we tested whether silencing
   of each confirmed MR was sufficient to induce differentiation in
   culture, using several cell-based assays. For this purpose, we focused
   on 70 of the 132 confirmed MRs that were identified as novel (i.e., not
   previously reported in the pluripotency literature at the time the list
   was compiled), as well as on 4 additional genes (Utf1, Hes6, Rarb, and
   Ilf3) selected from the MRs previously implicated in pluripotency
   regulation as positive controls (Supplementary Data [108]4.2). For
   functional validation, we performed CRISPR-mediated gene knock-out
   assays (CRISPR-KO) using a lentiviral system^[109]45 to generate mutant
   alleles for each MR, thus yielding 1150 clonal EpiSC lines and 128
   scramble or mock control lines (Supplementary Data [110]7.1). Of the
   1150 clonal lines, 311 contained monoallelic frameshift mutations,
   while 319 contained biallelic in-frame mutations that may not be
   loss-of-function. The remaining 520 lines contained biallelic
   frameshift mutations and were thus identified as candidate null mutants
   for 62 of the targeted novel MRs and the 4 selected known MRs. We
   failed to derive candidate null lines for the other 8 MRs, 3 of which
   (Eno1b, Zfp598, and Zfp184) might be due to the low number of clonal
   sgRNA-targeted lines obtained (Supplementary Data [111]7.1). However,
   of the other 5 remaining MRs (Tada2a, Phf5a, Zpr1, Zfp207 and Rrn3), we
   derived ≥ 20 clonal targeted lines for each, none of which represented
   a biallelic frameshift mutant, suggesting that null mutants for these
   genes may be impaired in self-renewal and/or clonogenicity (p < 0.01 by
   Chi-square test; Supplementary Fig. [112]2a).

   We screened the candidate null mutant and control targeted lines for
   their morphological phenotypes in feeder-free culture conditions, in
   which wild-type EpiSC cells display round and densely packed colonies
   with large nuclei, prominent nucleoli, and scant cytoplasm. We found
   that candidate null mutants for 15 MRs displayed a higher frequency of
   abnormal morphologies relative to controls (Fig. [113]3a, Supplementary
   Figs. [114]3 and [115]4a). In addition, shRNA-mediated knock-down of
   the 5 MRs that failed to generate biallelic frameshift mutants also
   resulted in abnormal morphologies (Supplementary Figs. [116]3 and
   [117]4a). Overall, 3 of the 4 (75%) MRs previously implicated in
   pluripotency regulation as well as 17 of the 70 (24%) not previously
   reported MRs displayed morphological phenotypes in this analysis,
   demonstrating its robustness in identifying pluripotency regulators
   validated by multiple assays. For further analyses of these 20 MRs, we
   selected a total of 40 knock-out (KO) lines for the 15 MRs with
   biallelic frameshift mutations and 11 knock-down (KD) lines for the
   remaining 5 MRs, as well as 9 negative controls (Supplementary
   Fig. [118]2b; Supplementary Data [119]7). To assess whether these KO/KD
   lines were truly null, we performed qRT-PCR analyses of targeted MR
   expression, which showed significantly lower expression in 46 of the 51
   KO/KD lines (Supplementary Fig. [120]5), consistent with increased RNA
   degradation induced by frameshift mutations or RNA interference.

Fig. 3. Functional validation of two master regulators.

   [121]Fig. 3
   [122]Open in a new tab

   a Morphology of KO or control EpiSC lines; representative images for
   independent lines are shown. b Fold change in cell number after five
   days of colony formation. c Number of alkaline phosphatase positive
   colonies. d Time line of neuroectoderm differentiation. e
   Immunofluorescence staining of biological replicates (clonal lines) for
   lineage-specific markers. Each MR was characterized by two clonal KO
   cell lines in at least three independent replicates. f Time line of
   mesendoderm differentiation. g Staining for lineage-specific markers;
   number of replicates as in (e). h Western blots for Cbl and Zc3h13.
   Source data are provided as a Source Data file. i Schematic of the
   teratoma assay. j Representative images of mutant and control teratomas
   dissected from contralateral legs of the same animal. k Teratoma
   weight. l Hematoxylin and eosin (H&E) and immunohistochemical staining
   of serial sections. m, n Fraction of area showing ectoderm (neural
   rosette) (m) and endoderm (ciliated epithelium) features (n). Line and
   whiskers show the mean ± s.d.; statistical significance is shown as
   FDR, 2-tailed U-test. Scale bars: 100 microns.

   We analyzed the self-renewal and differentiation potential of the 20
   MRs using in vitro functional assays. We found that KO/KD lines for 18
   of the 20 MRs presented lower colony formation efficiency, with 13 of
   these displaying significantly fewer alkaline phosphatase (AP)-high
   colonies (Fig. [123]3a–c, Supplementary Figs. [124]3 and [125]4a–c;
   2-tailed U-test; FDR < 0.05). We used embryoid body (EB) formation to
   assess pluripotency exit, and found significantly reduced EB formation
   efficiency in KO/KD lines for 18 MRs (Fig. [126]3a, Supplementary
   Figs. [127]3 and [128]4a). Finally, we performed directed
   differentiation to neuroectoderm or mesendoderm followed by
   immunostaining for lineage-specific markers, and observed significant
   effects on neuroectoderm differentiation in 16 KO/KD lines, and on
   mesendoderm differentiation in 8 KO/KD lines (Fig. [129]3d–g,
   Supplementary Figs. [130]3 and [131]4a, d, e). Interestingly, none of
   the 20 functionally validated MRs displayed similar effects across all
   assays, indicating the functional diversity of individual MRs in
   regulating distinct aspects of pluripotency and lineage
   differentiation, as well as the robustness of our MR discovery
   pipeline. Finally, during the time span of our experiments, 6 of the 20
   validated MRs (Irf6, Jazf1, Peg3, Phf5a, Zc3h13, and Zfp207) were
   identified as pluripotency relevant, albeit in non-EpiSC
   contexts^[132]46–[133]51, providing independent confirmation of our
   systematic MR analyses.

   We focused in greater detail on 2 MRs that consistently generated
   highly significant results in nearly all assays, Cbl and Zc3h13. Cbl is
   a member of a family of three closely related RING finger E3 ubiquitin
   ligases that are responsible for proteasome-mediated substrate
   degradation^[134]52; although Cbl null mutant mice are homozygous
   viable with T cell defects, Cbl; Cbl-b double homozygotes display an
   uncharacterized lethality prior to mid-gestation^[135]53. Zc3h13 is a
   component of the WMM (Wtap-Mettl3-Mettl14) complex that mediates
   N6-methyladenosine (m6A) methylation of RNA, and its knock-down
   decreases self-renewal of mouse embryonic stem cells^[136]47; in
   addition, we have found that Zc3h13 null mutants display a
   peri-implantation lethal phenotype (N. Chirathivat et al. in
   preparation). We confirmed that the KO EpiSC lines for Cbl and Zc3h13
   were true null alleles by Western blotting (Fig. [137]3h), and that Cbl
   and Zc3h13 mutant EpiSCs displayed altered differentiation of
   neuroectoderm and mesendoderm at the mRNA level by qRT-PCR
   (Supplementary Fig. [138]6). To determine whether loss-of-function for
   Cbl and Zc3h13 affected differentiation potential in vivo, we performed
   grafts of KO lines to generate teratomas and found that both Cbl and
   Zc3h13 KO teratomas were smaller in size and weight (Fig. [139]3i–k).
   By H&E staining and marker immunostaining of tissue sections, we found
   that Cbl and Zc3h13 KO teratomas displayed increased neural rosette
   formation and neural marker expression, whereas Cbl KO but not Zc3h13
   KO resulted in increased ciliated epithelium, consistent with increased
   epithelial marker expression (Fig. [140]3l–n, Supplementary
   Fig. [141]7). Thus, both Cbl and Zc3h13 are required for pluripotency
   maintenance in teratomas.

An experimentally-based primed state pluripotency network

   Having computationally identified a large number of previously known as
   well as unknown MRs for primed state pluripotency, and functionally
   validated many of the previously unknown ones, we next sought to
   elucidate the architecture of the regulatory network that governs the
   primed state pluripotency state. To accomplish this goal, we derived an
   experimentally-refined, second-generation primed pluripotency
   transcriptional interactome, by leveraging the PLATE-seq gene
   expression profiles representing the response of EpiSC cells to the
   silencing of each of the 132 confirmed MRs (green dots in
   Fig. [142]2d). Specifically, these data were used to refine the ARACNe
   regulon of each confirmed MR by integrating the statistical
   significance of (a) the candidate target’s differential expression
   following MR silencing and (b) their mutual information. By
   representing the causal response of the cell to silencing of each MR,
   this strategy also helped infer the directionality of each interaction.
   This analysis produced a comprehensive set of experimentally refined
   regulons, resulting in an “experimentally curated,” causal interactome
   (ECC-interactome) (Supplementary Data [143]3.2).

   Using the ECC-interactome, we evaluated whether silencing a specific MR
   would affect the VIPER-inferred activity of another MR, as assessed
   using the experimentally refined regulons (Supplementary
   Data [144]3.2). This analysis yielded a pluripotency focused,
   experimentally-based EpiSC regulatory MR network, in which MRs
   represent nodes in the network and MR[1] → MR[2] interactions
   correspond to directed edges that are defined by a significant change
   in MR[2] protein activity (FDR < 10^−^10) when MR[1] is silenced
   (Fig. [145]4a). Specifically, the EpiSC pluripotency network contains
   1,273 MR → MR interactions (edges) between 120 of the 132 confirmed MRs
   (nodes) (Fig. [146]4a and Supplementary Data [147]8).

Fig. 4. Regulatory network model for primed state pluripotency.

   [148]Fig. 4
   [149]Open in a new tab

   a Regulatory network of primed state pluripotency with 120 nodes and
   1273 unidirectional edges. Only significant effects (FDR < 10^−10,
   2-tailed aREA test) are included. The size of each node is proportional
   to its Regularized Out-Degree (ROD) score; colors indicate community
   assignment (strong, FDR < 0.01; weak, 0.01 < FDR < 0.1; unassigned
   FDR > 0.1). Six outlier MRs showing the highest ROD scores (Speakers)
   are highlighted. b Heatmap showing the effect of silencing each MR on
   the activity of all other MRs. The silenced MRs are represented by the
   columns, and the rows report activity changes for the indicated
   proteins. The 77 MRs (76 strong community members and Utf1) are grouped
   by their community membership. The color scale indicates statistical
   significance (-log[10](FDR), 2-tailed aREA test) for each MR protein
   differential activity; only significant changes in protein activity
   (FDR < 10^−^10) are shown.

   To investigate the architecture of this EpiSC pluripotency MR network,
   we assessed its modularity, using Louvain’s community detection
   algorithm^[150]54. The analysis showed that 74% (n = 89/120) of the
   confirmed MRs were organized into four distinct communities
   (Supplementary Fig. [151]8a). We then assessed the association between
   each MR and each community, based on the statistical significance of
   the enrichment of the MRs representing its first neighbors (i.e., the
   MRs directly connected to it) in each community. The analysis
   identified 76 MRs with strong community association (FDR < 0.01, 1-tail
   Fisher’s Exact Test, FET), 13 MRs with weak association
   (0.01 ≤ FDR < 0.1), and 31 MRs lacking association (FDR ≥ 0.1) with any
   community (Fig. [152]4b; Supplementary Data [153]8.4; Supplementary
   Fig. [154]8a).

The regulatory logic of primed state pluripotency

   We next assessed whether individual MRs may play distinct roles in the
   EpiSC pluripotency MR network, based on their interactions with other
   MRs. For this purpose, we assessed the In-Degree and Out-Degree metrics
   for each MR, corresponding to its number of incoming ( → MR) and
   outgoing (MR → ) regulatory interactions; we also computed a
   “Regularized Out-Degree” (ROD) metric, which corrects for the different
   number of interactions of each MR (see Methods). The statistical
   significance of their In-Degree vs. Out-Degree helped stratify MRs into
   three classes, corresponding to “Speakers” (significantly higher
   Out-Degree), “Listeners” (significantly higher In-Degree), and
   “Communicators” (no significant difference) (Fig. [155]4a;
   Fig. [156]5a–c), thus illuminating their role in mediating the flow of
   information in pluripotent cells. We could also identify “Mediator” MRs
   based on their “Betweenness” centrality, a metric aimed at assessing
   the number of shortest paths between any two MRs going through an MR of
   interest, thus capturing its influence over the flow of information in
   the network (Fig. [157]5a, c; Supplementary Fig. [158]9a, b).
   Importantly, for the 77 Core MRs (76 with strong community association
   plus Utf1), there was no correlation of their average protein activity,
   community score, ROD, or Betweenness with the silencing efficiency of
   shRNA hairpins used in our PLATE-seq analysis (Rho < 0.22, p > 0.01;
   Supplementary Fig. [159]9c–f), suggesting the absence of assay-derived
   bias. However, we did observe ROD differences between communities 1 and
   4, as well as differences in interaction degree between communities
   (Fig. [160]5d–f), suggesting differences in the connectivity of these
   communities.

Fig. 5. Community structure of the primed state pluripotency network.

   [161]Fig. 5
   [162]Open in a new tab

   a Schematic representation of parameters for defining node hierarchy
   based on their regularized out-degree (ROD) and betweenness score. b
   Distribution density for ROD across all MRs. The vertical dotted lines
   indicate the point of equal probability for a mixture of Gaussian
   models fitted to the data, defining the boundaries for Listeners,
   Communicators and Speakers. c Heatmap showing network parameters for
   the 77 MRs, showing scaled values for Degree, ROD and betweenness
   scores. d–f Violin plots showing distribution of regularized out-degree
   (d), betweenness (e), and degree (f) (FDR < 0.05, 2-tailed U-test). g
   Heatmap of the integrated meta-protein activity change for each
   community across the five differentiation treatments. Each cell
   represents the weighted average of the relative change in activity vs.
   untreated EpiSC controls across all MRs of each community, with the
   weight calculated based on the MR community score (Supplementary
   Fig. [163]8). h Heatmap showing GOBP pathways preferentially enriched
   (-log[10](FDR), 1-tailed aREA test) on each of the four communities.

   These analyses identified 6 Speaker, 75 Listener, and 39 Communicator
   MRs, as well as 4 Mediator MRs (Oct4, Cited2, Utf1, and Otx2)
   (Figs. [164]4a, [165]5a, c; Supplementary Fig. [166]9b). The Speakers
   include three ETS transcription factors (Etv1, Etv6, and Etv4), likely
   due to their roles in mediating FGF signaling, which is involved in
   maintaining primed state pluripotency^[167]7. Interestingly, Zfp207,
   which has only recently been characterized in regulation of
   pluripotency^[168]48,[169]55, was identified as a Speaker, while Utf1
   was the only Speaker or Mediator not strongly associated with any of
   the four communities. Notably, two well-characterized pluripotency
   regulators, Oct4 and Otx2, were identified as both Speakers and
   Mediators (Fig. [170]4a; Supplementary Fig. [171]9b; Supplementary
   Data [172]8.2), reinforcing their significance as key pluripotency
   regulators.

   Next, we asked whether the arrangement of MRs into communities might
   reflect differences in their functions in pluripotency maintenance or
   lineage specification. Therefore, we examined the differential activity
   of each MR in the network along the five lineage differentiation time
   courses. Interestingly, nearly all MRs displayed similar patterns, with
   ~10% and ~90% of the MRs showing time-dependent increased and decreased
   activity along the five differentiation treatments, respectively,
   suggesting a highly coordinated program (Supplementary Fig. [173]8b).
   To determine whether individual communities may be associated with
   specific lineage differentiation paths, we then integrated the activity
   of the MRs in each community weighted by their community scores
   (Supplementary Fig. [174]8a). Interestingly, the average activity of
   all four communities was significantly reduced across all five lineage
   differentiation signatures (Fig. [175]5g), indicating that the four
   communities are not associated with any specific lineage
   differentiation trajectory.

   Finally, to determine whether individual communities oversee distinct
   biological functions, we performed pathway enrichment analyses on the
   gene expression signature obtained by silencing of each individual MR.
   We then integrated the enrichment scores for the MRs assigned to each
   community while weighting their contribution by the community
   association scores (Fig. [176]5h and Supplementary Fig. [177]8a). This
   analysis identified 250 Gene Ontology-Biological Process (GOBP) terms
   enriched in at least one community, with 57 enriched in all four
   (FDR < 0.01, 1-tailed aREA test; Supplementary Data [178]9.1). Common
   biological terms included “developmental processes,” “nucleotide
   processing,” “protein modification,” and “cellular transportation
   events,” all of which are involved in pluripotency regulation. In
   contrast, 81 GOBP terms were differentially enriched only in a single
   community (Fig. [179]5h; Supplementary Data [180]9.2). Community 1, for
   instance, is enriched in “nucleotide processing,” “protein
   modification,” and “other intracellular activities,” Community 2 in a
   range of developmental processes and cellular signaling pathways, and
   Community 4 in metabolic pathways. Interestingly, Community 3 was only
   associated with a single category, “platelet formation,” thus defying a
   more general characterization. These findings suggest that while
   different communities are associated with distinct GO biological
   processes, changes in the activity of their constituent MR
   proteins—primarily progressive inactivation occurring over the short
   time scale of our study—were observed similarly across all five
   differentiation protocols, as indicated by Fig. [181]5g. Thus, pathway
   enrichment analysis revealed both cooperative and unique roles of MR
   communities in regulating primed state pluripotency.

Discussion

   Pluripotent cell states and dynamic transitions between these and more
   differentiated states have long been utilized as model systems for
   reconstruction of gene regulatory networks. For example, development of
   the CellNet platform revealed a regulatory network for somatic cell
   reprogramming and identified factors modulating the efficacy of these
   cell fate conversions^[182]56–[183]59. However, earlier studies of
   pluripotency networks have focused only on a relatively small set of
   regulators, many of which had been previously identified in the
   literature. Our current network for primed state pluripotency is
   notable for its size, unbiased generation, and derivation by
   computation followed by direct experimentation. Furthermore, the size
   of our network has allowed analysis of its overall topology and
   elucidation of its regulatory logic.

   Our analysis includes several advances that, taken together, produce an
   innovative and highly generalizable methodology. Specifically: (a)
   Nodes (i.e., MRs) in the regulatory network were identified by an
   unbiased methodology that is not based on prior knowledge of gene
   function; (b) Node identification was not dependent upon differential
   expression of the corresponding gene, but rather on the role of the
   encoded protein in regulating the overall transcriptional state of the
   pluripotent cell (differential activity), as determined by the
   differential expression of its transcriptional targets. This is
   critical since transcription factor activity is often regulated
   post-translationally rather than transcriptionally; (c) Network
   interactions (i.e., edges) and causality were dissected based on the
   effect of regulatory proteins on the inferred activity of other
   regulatory proteins, following experimental silencing of 132 confirmed
   MRs identified by the analysis; (d) Assessment of pluripotency
   integrated the effect of individual regulators on Oct4 protein
   expression as well as on induction of lineage-specific differentiation
   signatures, providing an approach to identify both Oct-4 dependent and
   Oct4-independent pluripotency regulators; (e) Analyses of network
   architecture utilized concepts of centrality and betweenness, which are
   critical properties of social network interactions^[184]60,[185]61.
   This approach is instrumental in illuminating the control of
   information flow in primed pluripotent cells, based on the well-studied
   concept in graph theory that some nodes may play a critical role in
   mediating the information flow between communities (Mediators) or in
   controlling entire programs (Speakers)^[186]60. In combination, these
   features result in an unbiased primed state pluripotency network
   architecture that is substantially larger than described in previous
   studies.

   We have shown that 76 of the 120 regulatory proteins that control
   pluripotency in our network are organized within four regulatory
   modules, termed “communities,” where the highly interconnected
   architecture of each community is countered by a much sparser
   connectivity between communities. This network architecture contrasts
   significantly with the previously hypothesized “hierarchy” and
   “balance” models. Instead, our findings suggest that, in mouse primed
   state pluripotent cells, the “hierarchy” and “balance” models coexist
   in a hybrid form that we term “communal interaction”. Specifically, we
   propose that primed state pluripotency is maintained by the balanced
   activities of four regulatory communities, with a small number of
   community members providing hubs for information flow. Each community
   represents a distinct functional unit yet is not closely associated
   with lineage-specific differentiation, as originally proposed in the
   balance model.

   In principle, identification of 8 MR proteins as either Speakers or
   Mediators might lend support to the existence of core factors in the
   network. Notably, Mediators and Speakers include Oct4 and Otx2,
   consistent with their essential roles in primed state
   pluripotency^[187]42,[188]62,[189]63. In addition, the identification
   of ETS transcription factors Etv1, Etv4, and Etv6 as Speakers is
   consistent with the central role of FGF signaling in maintaining primed
   state pluripotency^[190]7. However, previous studies have not suggested
   a critical role for Zfp207, Utf1, and Cited2 in controlling primed
   state pluripotency, suggesting that these proteins may deserve further
   investigation. Importantly, EpiSCs differ from mESCs in their lack of
   requirement for key regulators of the naïve pluripotent state such as
   Sox2 and Nanog^[191]64,[192]65, and notably these genes were not
   detected as having central roles. Overall, however, the strong
   association of 7 of the 8 Speaker/Mediator MRs with one of the 4
   communities argues against a dominant role of these MRs. Rather, the
   nearly equivalent levels of hierarchical organization and centrality
   among the four communities (Fig. [193]5d–f) supports a more
   decentralized architecture of the primed state pluripotency network.

   It is important to note some limitations of the approach that may
   affect the completeness and accuracy of our regulatory network for
   primed state pluripotency. First, the analysis is limited to 1393 genes
   annotated as transcription factors. Second, our original EpiSC
   interactome was derived from microarray data rather than from RNA-seq,
   which could have produced higher-quality transcriptomic profiles that
   might have improved the resolution of the network. Third, MRs are often
   synergistic, even if they fail to produce an observable phenotype on an
   individual basis^[194]29,[195]30; consequently some of the MRs
   identified by this study may lack a detectable phenotype in vitro
   and/or in vivo, suggesting that the number of MRs contributing to
   primed state pluripotency maintenance may be larger than that validated
   by our functional assays. Finally, this study was performed using bulk,
   rather than single-cell profiles, and consequently may not account for
   potential heterogeneity within EpiSC cultures.

   More broadly, our work provides a general road-map for elucidating the
   regulatory logic of a complex cell state. In particular, when combined
   with gene expression and protein activity analysis, our methodology for
   low-cost, high-throughput RNA-sequencing facilitates practical direct
   experimentation to infer or refine the architecture of any regulatory
   network. The inclusion of ATAC-seq data, in particular for targets of
   transcription factors with known binding motifs, might further enhance
   this strategy to strengthen the network inference. Furthermore, our
   study provides a rich and valuable resource to investigate molecular
   mechanisms that govern stemness. Future studies will promote
   development of innovative strategies to control the dynamics of
   pluripotency and may therefore hold considerable promise for
   translational applications in regenerative medicine.

Methods

EpiSC derivation and maintenance

   All experiments utilized the EpiSC-5 line^[196]8, which was derived
   from 129SvEv mice, or the EpiSC-G line. As described in the sections
   below, both lines were used for generation of the expression profiles
   for initial interactome assembly using the ARACNe algorithm, but the
   EpiSC-5 line was solely used for all other assays, including time
   course differentiation signatures, shRNA-based validation, PLATE-seq
   assays, and CRISPR-mediated validation.

   We derived the EpiSC-G line from C57BL/6 J (Jackson Laboratory) mice as
   described^[197]7,[198]8, with the following minor modifications. In
   brief, C57BL/6 pregnant female mice were euthanized at 5.5 dpc, and
   pre-gastrulation epiblast was dissected intact and separated from
   extraembryonic ectoderm using glass needles in calcium- and
   magnesium-free PBS. Dissected tissue was treated with Cell Dissociation
   Buffer (Gibco 13151-014) at 4 °C for 15 min. Epiblast cells were
   cultured on inactivated mouse embryonic fibroblasts (MEFs) in EpiSC
   basal medium containing DMEM/F12, 1x GlutaMAX supplement (Gibco
   10565-018), 20% knockout serum replacement (Gibco 10828-010), 1x MEM
   non-essential amino acids (Gibco 11140-050), 0.1 mM 2-Mercaptoethanol
   (Gibco 21985-023), 10 ng/ml Activin (Gemini 300-356 P), and 12.5 ng/ml
   FGF2 (Peprotech 100-18B). Every 3-4 days, cells were dissociated with
   2.5 U/ml Dispase (Stemcell Technology 07913), and re-plated as small
   clumps at ~1:6 ratio onto a new MEF layer. After 8 ~ 10 passages, cells
   were dissociated, aliquoted, and frozen in cryovials in 50% basal
   medium/40% knockout serum replacement/10% DMSO. The established cell
   line was verified for normal male karyotype and tested negative for
   mycoplasma.

   EpiSC lines were cultured under feeder-free conditions in
   Matrigel-coated plates with filtered MEF-conditioned media (EpiSC basal
   media harvested after culturing with inactivated MEFs for 24 h)
   supplemented with 12.5 ng/ml FGF2 and 10 ng/ml Activin (CMAF). In
   experiments requiring cell quantitation, cells were dissociated with
   0.05% Trypsin or TrypLE and counted with an automated cell counter
   (Bio-Rad TC20). Cells were then re-plated onto Matrigel-coated plates
   with CMAF supplemented with 10 uM Rock inhibitor Y-27632.

EpiSC differentiation for interactome generation and analysis

   To generate gene expression profiles for interactome assembly, we used
   the EpiSC-5^[199]8 and EpiSC-G lines. Cells were plated at a density of
   50,000 cells/well in 6-well plates coated with Matrigel and 2 ml
   CMAF + 10 µM Y27632. At 24 h after plating, cells were maintained in
   CMAF or switched to EpiSC basal media (Supplementary Data [200]2.1),
   followed by addition of each of 33 perturbagens (Supplementary
   Data [201]1). Each one of five differentiation-inducing treatments
   (Supplementary Data [202]1) was then added at 12 h after perturbagen
   addition; finally, cells were harvested at 60 h after initial plating.
   For embryoid body formation, perturbagen-treated EpiSCs were
   dissociated and plated onto AggreWell^TM 400 plates (StemCell
   Technology 34411), with 2 ml of EpiSC basal media at 1.2 million
   cells/well. Plates were spun for 5 min at 1500 rpm to settle the cells.
   At 24 h after plating, the nascent embryoid bodies were transferred to
   6-well, low-attachment plate cultures (Corning 3471) for an additional
   12 h before harvest. Gene expression profiles were generated as
   described in the transcriptomic profiling section.

   Differentiation time course data were generated using the EpiSC-5 line.
   For retinoic acid treatment^[203]66, cells were cultured with 5 µM
   retinoic acid in MEF conditioned media without supplementation of FGF2
   or Activin A (CM) for 72 h. For BMP4 treatment^[204]67, cells were
   cultured for 72 h in CM + 100 ng/ml BMP4. For SB431542
   treatment^[205]68, cells were cultured for 72 h with 10 nM SB431542 in
   culture media containing DMEM/F12 with 1x GlutaMAX/1x
   insulin-transferrin-selenium/1x NEAA/2% B-27 (Gibco 17504-044)/90 µM
   2-Merceptoethanol. For endoderm differentiation^[206]69, cells were
   first cultured for 24 h in RPMI (Gibco 21870-076)/GlutaMAX/Activin A
   (100 ng/ml)/Wnt3a (25 ng/ml)/0.075 mM EGTA, followed by another 48 h in
   RPMI/GlutaMAX/Activin A (100 ng/ml)/0.2% FBS. Mesoderm differentiation
   was performed according to^[207]70, by culturing cells in basal media +
   25 ng/ml Wnt3a + 50 ng/ml of Activin A on Day 1; Basal media + 25 ng/ml
   Wnt3a + 25 ng/ml Activin A + 20 ng/ml FGF2 on Day 2, and basal media +
   25 ng/ml Wnt3a + 10 ng/ml Activin A + 20 ng/ml FGF2 + 40 ng/ml BMP4 on
   Day 3. Gene expression profiles were generated as described in the
   transcriptomic profiling section.

Transcriptomic profiling

   To generate gene expression profiles for interactome assembly and MR
   analysis, cells were lysed with Trizol (ThermoFisher 15596018), and
   transferred into Eppendorf tubes for short-term storage at −80 °C. We
   used 600 µl of cell lysate for total RNA extraction (Clontech 740955),
   followed by processing for microarray (for interactome assembly) or RNA
   sequencing (for differentiation signatures). For microarray analysis,
   total RNA was biotin-labeled using the Illumina TotalPrep RNA
   Amplification Kit (ThermoFisher AMIL1791) and hybridized on mouseWG-6
   v2 BeadArrays (Illumina). Slides were scanned using an iScan (Illumina
   SY-101-1001) to generate the resulting files.

   For RNA sequencing, samples were submitted to the Columbia Genome
   Center for library preparation and sequencing. In brief, mRNA was
   enriched by poly-A pull-down, and library synthesis was performed using
   the Illumina TruSeq RNA prep kit (Illumina RS-122-2001). Libraries were
   pooled and sequenced on the Illumina HiSeq 2500 or 4000 platforms,
   yielding approximately 30 million single-ended reads per sample.

EpiSC perturbation (pilot project)

   To guide the design of the EpiSC perturbation assays for interactome
   generation, we performed a pilot experiment. Specifically, Illumina
   BeadArrays (mouse WG-6, Illumina) were used to profile the
   transcriptome of 33 independent EpiSC samples. This experiment was
   divided in three arms. The first arm consisted of cells perturbed with
   10 small molecule compounds (100 nM rapamycin, 250 nM PF573228, 2 μM
   dorsomorphin dihydrochloride, 25 μM BAY 11-7082, 50 μM forskolin, 5 μM
   BAPTA-AM, 500 pM docetaxel, 50 nM wortmannin, 2 μM purmorphamine, and 2
   μM SU5402) and vehicle control for 24 h. For the second arm, cells were
   first perturbed for 36 h with 5 μM RA and for 24 h with 10 small
   molecule compounds and vehicle control. Specifically, the cells were
   exposed to either control media or RA for 12 h, small molecule
   compounds were then added, and cells were further incubated for 24 h.
   The third arm consisted of the same RA/compounds combination experiment
   but was performed using an inverted perturbagen vs. RA treatment order
   (i.e., 24 h RA and 36 h perturbagen treatment). Gene expression
   profiles were generated using an iScan BeadArray scanner (Illumina) and
   pre-processed by variance stabilization and robust spline
   normalization, as implemented by the lumi R package^[208]71. Gene
   expression profiles are available from the Gene Expression Omnibus
   ([209]GSE197632).

   A 2D plot (Supplementary Fig. [210]1a)—obtained by projecting the data
   on the first two principal components, after scaling by subtracting the
   mean and dividing by the standard deviation of each gene—showed that
   treating first with perturbagens and then with RA induced greater data
   dynamic range than the opposite order. We thus decided to perform the
   larger study by first treating with each perturbagen for a total of
   36 h, and with each morphogen for 24 h, thus maximizing the dynamic
   range for the reverse engineering of an EpiSC-specific interactome (see
   below). In particular, the 36 h time point was selected to maximize
   gene expression dynamic range while minimizing the likelihood of
   chromatin remodeling, which otherwise would alter the resulting network
   topology.

Dimensionality reduction analysis

   To project the data in 2 dimensions, we used Principal Component
   Analysis (PCA), as implemented by the prcomp function of the stats
   package in R-system. We also used the ‘umap’ v0.2.7.0 R package to
   perform Uniform Manifold Approximation and Projection (UMAP) analysis,
   as available from the CRAN repository
   ([211]https://cran.r-project.org/web/packages/umap/index.html).

Generating an EpiSC context-specific interactome

   The transcriptomes of the two EpiSC cell lines—following treatment with
   combinations of 33 perturbagen (36 h) and 5 morphogens (24 h), in two
   growth media conditions (CMAF and Basal)—were profiled by Illumina
   BeadArrays (mouse WG-6 v2, Illumina). Poor quality profiles were
   removed using the built-in Illumina BeadArray software, for a total of
   276 high-quality gene expression profiles. (Supplementary
   Data [212]2.1). Gene expression profiles were generated using an iScan
   BeadArray scanner (Illumina) and pre-processed by variance
   stabilization and robust spline normalization, as implemented in the
   lumi R package^[213]71. Gene expression profiles are available from the
   Gene Expression Omnibus ([214]GSE197414).

   The profiles were then analyzed using the ARACNe
   algorithm^[215]24,[216]72 to reverse engineer an EpiSC-specific
   transcriptional interactome representing the transcriptional targets of
   1478 regulatory proteins. The latter included those annotated as
   GO:0003700 (transcription factor activity), GO:0004677 (DNA binding),
   and GO:0030528 (transcription regulator activity), or as GO:0004677 and
   GO:0045449 (regulation of transcription) in the Gene Ontology (GO)
   Molecular Function database^[217]37. To generate an optimal consensus
   network, we performed 100 bootstrap iterations of the ARACNe analysis.
   Parameters were set to DPI = 0 (data processing inequality tolerance)
   and MI P-value p ≤ 10^−8 (mutual information threshold). The resulting
   interactome comprises 911,753 transcriptional interactions
   (Supplementary Data [218]3.1).

Predicting master regulators of pluripotency and differentiation

   To generate time courses representing the path from a pluripotent to a
   differentiated cell state, we generated RNA-Seq profiles of EpiSC-5
   cells undergoing 5 distinct differentiation treatments (differentiation
   arms), at 3 h, 6 h, 12 h, 18 h, 24 h, 36 h, 48 h and 72 h
   (Supplementary Data [219]2.2). RNA-Seq profiles were generated with the
   Illumina TruSeq protocol, using a HiSeq 2000 instrument (Illumina), at
   the Columbia Sulzberger Genome Center. Sequencing data were mapped to
   the mouse genome MGSCv37 with TopHat v1.3.3^[220]73. Reads mapping to
   known genes, based on Entrez gene identifiers, were then counted using
   the GenomicFeatures R package, from Bioconductor^[221]74. Summarized
   raw counts for 23,283 known genes in 144 samples are available from the
   Gene Expression Omnibus ([222]GSE199114). We computed a size factor for
   each sample using the median log-ratio method and identified as
   outliers the samples showing residuals from a linear fit of the
   computed size factors vs. sequencing depth (total number of mapped
   reads) greater than 50%. Samples identified as outliers were removed
   from further analyses. Gene expression data were then normalized by
   equivariance transformation, based on the negative binomial
   distribution, with the DESeq R package^[223]75.

   The expression likelihood for each gene was estimated by fitting a
   mixture of three Gaussian distributions to the Log (RPKM + 1) data,
   across all genes in the integrated dataset, i.e. raw counts for each
   gene aggregates across all control samples, with the mixtools
   package^[224]76, and expressed as 1 minus the relative likelihood for
   an observation, expressed as Log (RPKM + 1), to be derived from the
   first (lowest mean) distribution.

   For each differentiation arm, gene expression signatures were computed
   at each time point by differential gene expression analysis against the
   profiles of undifferentiated EpiSC-5 cells (t = 0 h) as the reference,
   using Student’s t-test (Supplementary Data [225]4.3). Protein activity
   signatures were then assessed using the msviper function of the VIPER R
   package, available from Bioconductor^[226]25. Normalized Enrichment
   Score (NES) and associated p-values were estimated by permuting sample
   labels uniformly, at random 1,000 times. For time points with fewer
   than 5 replicates, the closest samples from contiguous time points were
   included, up to 5 sample in total, to support the permutation-based
   estimation of NES and p-values (Supplementary Data [227]4.4). These
   were assessed based on the similarity of their gene expression
   signatures using Euclidean distance.

   For each differentiation arm and time point, differential protein
   activity p-values were assessed from the VIPER-computed NES. To account
   for proteins that could be activated in some conditions and inactivated
   in others, p-values were further integrated using the Fisher’s method.
   Finally, to compensate for a potentially non-statistically independent
   contribution of the mesoderm and endoderm differentiation arms—which
   produced strongly correlated protein activity signatures—we
   down-weighted their contribution by half during Fisher’s p-value
   integration.

   The resulting ranked protein list identified 300 candidate MRs
   (p < 10^-4) (Supplementary Data [228]4.1). Among these, 134 (45%) had
   been previously implicated in pluripotency regulation in the literature
   (Supplementary Data [229]4.2). To prioritize candidate MRs for an
   experimentally feasible shRNA-based validation assay, we selected
   candidate genes using the following four criteria: (1) The top 120 most
   differentially active proteins, as ranked by p-value, were included by
   default; (2) 70 additional proteins were selected uniformly at random,
   from proteins ranked between positions 121 and 300 by p-value; (3) 30
   previously established pluripotency regulators—including 18 that were
   not identified among the top 300 MRs, and 12 that were not represented
   as transcriptional regulators in the EpiSC interactome—were also
   included; (4) Finally, since MR validation was based on silencing
   assays, we removed those not expressed in EpiSC (likelihood > 0.5).
   Taken together, these criteria produced a list of 172 candidate MRs for
   experimental validation (Supplementary Data [230]6.2). Note that of the
   18 previously identified pluripotency regulators that were not in the
   original list of 300 candidate MRs, only 4 (Gli2, Jarid2, Sall4, and
   Sox2) remain among the 132 MRs used in the final network analysis.
   Thus, 128 out of the 132 MRs in the final network were identified in a
   completely unbiased manner.

High-throughput shRNA screen

   Lentiviruses containing shRNA hairpins were purchased from the MISSION®
   shRNA Lentiviral library (Sigma) (Supplementary Data [231]4.1). To
   minimize off-target effects, a minimum of three independent hairpins
   were used for > 80 of candidate MRs. EpiSCs were passaged onto
   matrigel-coated 96-well plates (Greiner 655090) at a density of 3,000
   cells/well in 100 µl CMAF medium supplemented with 10 µM Y27632. At
   12 h after passaging, 50 µl of the culture medium was replaced with
   50 µl lentivirus with 8 µg/ml polybrene. At 24 h after lentivirus
   transduction, the medium was replaced with 100 µl CMAF + 2 µg/ml
   puromycin (Sigma P7255) for selection of stable integrants; puromycin
   concentration was increased to 4 µg/ml at 48 h. At 60 h, cells were
   fixed with 4% paraformaldehyde (PFA) and immunostained for Oct4
   (Pou5f1) (Santa Cruz sc-5279, 1:1,000) using an Alexa Fluor 488
   secondary antibody (Invitrogen A11029, 1:500). Cells were
   counter-stained with Hoechst 33342 (ThermoFisher H3570, 1:5,000), and
   plates were scanned using the IN Cell analyzer 2000 (GE Healthcare).
   For each well, 12 square-shaped fields were imaged for two channels,
   and exported images were analyzed by the IN Cell Analyzer workstation
   software.

Oct4 immunofluorescence data analysis

   We used the IN Cell Image Analyzer to perform immunofluorescence
   intensity quantification at the individual cell level. This instrument
   can scan wells in a 96-well plate and exports image files using two
   channels (Green Fluorescence (GF) and DAPI). These data were analyzed
   using the IN Cell 2000 workstation software. By setting up diameter
   parameter as 5um~10um, the software first analyzes the DAPI-channel to
   identify individual nuclei. The green fluorescence of each nucleus is
   then quantified by scanning the 488 nm channel. CSV files comprising
   the green fluorescence quantification for the cells in each well were
   then exported by the software. The data was acquired in 22 96-well
   plates (batches), with each plate assaying a subset of MR-directed and
   negative control shRNA hairpins.

   Overall, the instrument quantified the fluorescence of 23,317,327
   individual cells. These include 6,537,819 cells representing control
   conditions, 15,243,283 cells transduced with candidate MR-targeting
   shRNA hairpins, and 1,536,225 cells transduced with shRNA clones
   targeting established pluripotency regulators. In total, 623 and 47
   hairpins, targeting 158 candidate MR and 14 established pluripotency
   regulators, were tested, respectively.

   Cells with fluorescence values higher than 6 inter-quartile range (IQR)
   units above the median of control conditions were removed from further
   analysis as outliers. The data were then normalized, independently for
   each plate, using the plate-specific control conditions. A cumulative
   empirical density distribution was computed for the control conditions
   of each plate and used to estimate the normalized fluorescence z-score
   of each cell. By using plate-specific controls, the analysis eliminated
   all systematic difference between plates (batch effect). Data were
   summarized on an individual well basis by computing the area over the
   cumulative empirical distribution across all cells in the well, thus
   eliminating well-specific cell density as a confounding factor. Next,
   the statistical significance for the differential Oct4 protein level
   was estimated by comparing the summarized replicates for each hairpin
   to the summarized replicates of control wells, using Student’s t-test.
   Finally, to compute a gene-level z-score, the statistics of each
   associated shRNA clone were integrated using Stouffer’s method.

   To account for different silencing efficiencies, the shRNA-mediated
   reduction in the activity of the target protein (see next section
   below), expressed as z-score, was used to weight the contribution of
   the corresponding shRNA. Thus, shRNAs failing to inhibit the target
   proteins provided no contribution to the score. The final Oct4
   corroboration score (CS) was computed by multiplying the Stouffer’s
   integrated z-scores for the corresponding gene by the sign of the MR
   protein activity change during differentiation (i.e., +1 for activity
   increase and -1 for activity decrease). Finally, we derived the Oct4
   protein score as the log-transformation of CS while maintaining its
   sign:
   [MATH: <mi mathvariant="normal">Oct</mi><mspace
   width="0.25em"></mspace><mn>4</mn><mspace width="0.25em"></mspace><mi
   mathvariant="normal">protein</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">score</mi><mo>=</mo><msub><mrow><mi>log</mi></mrow
   ><mrow><mn>2</mn></mrow></msub><mrow><mo>(</mo><mrow><mo>∣</mo><mi
   mathvariant="normal">CS</mi><mo>∣</mo><mo>+</mo><mn>1</mn></mrow><mo>)<
   /mo></mrow><mo>⋅</mo><mi
   mathvariant="normal">sign</mi><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">CS</mi></mrow><mo>)</mo></mrow> :MATH]
   1

PLATE-seq analyses

   For PLATE-seq (Pooled Library Amplification for Transcriptome
   Expression), samples were prepared as described^[232]44, with minor
   modifications. Total RNA from 100 µl of Trizol-dissolved cell lysate
   was purified using the “No Spin” method of the MagMAX-96 for
   Microarrays kit (Ambion AM1830). RNA was eluted using 50 µl of elution
   buffer, and the concentration measured by Nanoview. 100 ng total RNA
   from each sample was diluted to 15 µl using nuclease-free water and
   transferred to 96-well plates (Applied Biosystems N8010560). We added
   1 µl of 1:1,000 dilution of ERCC Ex-Fold Spike-Ins (ThermoFisher
   4456739) to every other column of the plate to check for potential
   inter-sample contamination during library preparation.

   To anneal primers for reverse transcription, 3 µl of 100 µM
   barcode-linked oligo-dT primer was added to each well with 5 µl of 5x
   ProtoScript RT buffer (New England Biolabs M0368). The plate was
   incubated at 94 °C for 2 min and immediately cooled on ice for at least
   5 min, followed by addition of the reverse transcription master mix:
   2.88 mM dNTPs, 6.4 mM DTT, 0.14 U/µl SuperaseIN, 1.44 U/µl Protoscript
   II Reverse Transcriptase, and water to a final volume of 25 µl. The
   plate was then incubated at 42 °C for 2 h. To remove excess RT primer,
   2 µl of 1:4 diluted ExoI (20 U/µl) was added to each well, and the
   plate incubated at 25 °C for 1 h. To hydrolyze the RNA, 10 µl of 1:1
   mixture of 1 M sodium hydroxide and 0.5 M EDTA was added followed by
   incubation at 65 °C for 15 min. All wells on the plate were then pooled
   in a 50 ml conical tube.

   The pooled sample was then purified and concentrated using DNA Clean &
   Concentrator Kit (Zymo Research D4013) using a modified protocol. The
   sample was first diluted seven-fold using DNA binding buffer (Zymo
   Research D4003-1-L), and passed through the binding column using a
   vacuum apparatus. The column was washed twice with washing buffer, and
   eluted with 15 µl of nuclease-free water. The eluted sample was further
   purified using AMPure XP PCR purification beads (Beckman Coulter
   A63880) at a 1:1 sample to bead ratio, and eluted with 15 µl of
   nuclease-free water. For second-strand synthesis, 1 µl of 10 mM dNTPs
   and 1 µl of 100 mM adapter-linked random hexamer primer were added,
   followed by heating at 70 °C for 2 min and immediate cooling on ice for
   5 min. 2 µl of NEBuffer 2 (New England Biolabs B7002) and 1 µl of
   Klenow large fragment DNA polymerase (New England Biolabs M0210) were
   then added, followed by incubation at 25 degrees for 30 min and two
   rounds of purification with AMPure XP beads. The double-stranded cDNA
   was eluted into 15 µl nuclease-free water and concentration measured
   using a Qubit Fluorometer 3.0 with dsDNA high-sensitivity assay kit
   (ThermoFisher [233]Q32851).

   For library amplification, the cDNA was diluted to 0.05 µg/ml and
   PCR-amplified using Phusion DNA polymerase (New England Biolabs M0530)
   with 0.5 mM dNTPs, 0.5 µM Illumina RP1 and RPl1 primers. The PCR
   product was purified using AMPure XP beads for one round and eluted
   into 15 µl nuclease-free water. 2 µl of the product was analyzed by
   bioanalyzer with an expected broad peak from 200 bp to 1,000 bp and
   average length of product 500 bp. 1 nM of PCR product was hydrolyzed
   with an equal amount of 0.2 M sodium hydroxide, and mixed with
   hydrolyzed PhIX control library (Illumina FC-110-3001). The final
   product contained 1.8 pM library with 30% PhIX/70% sample in 1.3 ml HT1
   buffer. The product was loaded on the NextSeq 500 high output v2 kit
   (Illumina FC-404-2005) together with two custom primers and two PhIX
   primers at final concentration of 300 nM each. The sample was loaded
   onto an Illumina NextSeq 500 sequencer for paired-end sequencing, with
   Read 1 (26 cycles) identifying the 8 bp barcode for sample well
   information and Read 2 (66 cycles) to determine the identity of the
   mRNA. All PLATE-seq oligo sequences are listed in Supplementary
   Data [234]10.4.

EpiSC expression profiles after candidate MR silencing

   We used the PLATE-seq technology^[235]44 to generate gene expression
   profiles from EpiSC cells, at 72 h following lentiviral-mediated shRNA
   transduction. Sequencing reads were de-multiplexed as previously
   described in ref.^[236]44, mapped to the mouse genome build GRCm38
   (mm10), and counted at the known gene level with STAR aligner
   v2.5.2^[237]77. Summarized gene expression counts are available from
   the Gene Expression Omnibus database ([238]GSE199855). Expression data
   were normalized by equivariance transformation, based on the negative
   binomial distribution with the DESeq R-system package^[239]75.

   For each candidate MR-targeting shRNA clone, differential gene
   expression signatures were computed, against plate-matched,
   non-targeting controls, as a reference. Statistical significance was
   assessed by moderated Student’s t-test, as implemented in the limma
   package from Bioconductor^[240]77 and represented as a z-score vector.

   For each MR-targeting shRNA, differential protein activity was assessed
   by VIPER analysis, using the EpiSC interactome. To control for
   differences in regulon size (number of transcriptional targets per
   regulon), all regulons were capped to the top 50 targets, based on
   their interaction likelihood.

Inverted regulon analysis

   We used the silencing assays to identify regulons likely to predict an
   inverted activity for the associated regulatory protein. Specifically,
   we assessed the change in VIPER-assessed protein activity following MR
   silencing and changed the regulon sign of candidate MRs, whose activity
   was assessed as increased following silencing. This addresses a problem
   originally reported in the VIPER manuscript^[241]25, whereby the
   activity change may be inferred in the opposite direction due to
   negative autoregulatory loops that may anticorrelate the expression of
   a regulator and that of its targets. To identify the regulons reporting
   an inverted activity, we fitted a mixture of two Gaussian distributions
   to the empirical distribution of estimated activity z-scores, for all
   silenced regulators, representing the distribution of differential
   protein activity for the inverted and non-inverted regulons. Then,
   based on these two Gaussians, we estimated a conservative threshold to
   identify regulons with an inverted regulatory sign. This was computed
   as the z-score corresponding to a likelihood ratio of 4 between the
   Gaussian models fitted to the inverted and non-inverted regulons,
   respectively.

Differentiation signature score

   We then assessed whether the changes induced by candidate MR silencing
   would recapitulate those observed in the differentiation time-courses.
   This was achieved by computing the enrichment of the 100 most
   differentially active transcriptional regulators, following
   shRNA-mediated silencing, in proteins differentially active at 72 h in
   the morphogen-mediated differentiation assays, using the aREA
   algorithm^[242]25. For these analyses, protein activity was again
   assessed using regulons capped at the top 50 transcriptional targets.

   The Normalized Enrichment Score (NES) of each MR-specific hairpin and
   for the 5 lineage specific differentiation assays, was integrated using
   a weighted implementation of Stouffer’s method. Mesoderm and endoderm
   differentiation were included with half the weight of other lineages
   since they were strongly correlated. In addition, as discussed above,
   the efficiency of each hairpin in reducing the activity of the target
   protein, expressed as z-score, was also used to weigh its contribution.
   The final NES was multiplied by the sign of the MR protein activity
   change, change to obtain the differentiation z-score (DZ), such that
   concordant and discordant effects would have a positive and negative
   NES, respectively. Finally, we derived the Differentiation Score as the
   log-transformation of DZ while maintaining its sign:
   [MATH: <mi mathvariant="normal">Differentiation</mi><mspace
   width="0.16em"></mspace><mi
   mathvariant="normal">Score</mi><mo>=</mo><msub><mrow><mi>log</mi></mrow
   ><mrow><mn>2</mn></mrow></msub><mrow><mo>(</mo><mrow><mo>∣</mo><mi
   mathvariant="normal">DZ</mi><mo>∣</mo><mo>+</mo><mspace
   width="0.25em"></mspace><mn>1</mn></mrow><mo>)</mo></mrow><mo>⋅</mo><ms
   pace width="0.25em"></mspace><mi
   mathvariant="normal">sign</mi><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">DZ</mi></mrow><mo>)</mo></mrow> :MATH]
   2

Selection and evaluation of pluripotency MRs

   In the first assay using Oct4 protein level as the readout for
   pluripotency alteration following silencing, we tested 172 selected
   genes expressed in EpiSCs, including the top 95 most differentially
   active candidate MRs expressed in EpiSC, 63 additional candidate MRs
   expressed in EpiSC and ranking between 121 and 300, and 14 additional
   genes previously known to be involved in pluripotency. An additional
   candidate MR, Kmt2a, could not be evaluated because all hairpins used
   for targeting did not pass the data QC in either of the two assays
   below. From the 172 genes tested, 117 MRs (68%) were found to
   down-regulate Oct4 protein levels (FDR < 0.05) and 15 MRs (8.7%) were
   found to up-regulate Oct4 when silenced. In a second, parallel assay
   using PLATE-seq datasets from silenced cells, we obtained analyzable
   datasets after QC for 154 of the 172 MRs—including all the top 95 most
   differentially active candidate MRs expressed in EpiSC, 54 candidate
   MRs ranking between 121 and 300, and 5 genes previously known to be
   involved in pluripotency—following their individual silencing in
   EpiSCs. Among those 154 MRs, a total of 132 were confirmed by PLATE-seq
   based assay to induce an expected differentiation state when silenced
   (positive DS with FDR < 0.05; Fig. [243]2d).

CRISPR/Cas9 editing

   To minimize off-target effects, we used a minimum of two independent
   sgRNAs for each candidate MR. sgRNA sequences were optimized using two
   online sgRNA design tools (at [244]https://design.synthego.com/ or the
   site formerly available at [245]http://crispr.mit.edu). For most
   experiments, we selected sgRNAs with low off-target effects, high
   on-target efficiency, and targeting to a common exon shared by all
   splice isoforms in the first half of the coding region. In some cases,
   we selected a pair of sgRNAs targeting intronic regions that flank a
   common exon located in the first half of the coding region to achieve
   targeted deletion of the entire exon. sgRNA primers were ordered from
   Integrated DNA Technologies. Single-stranded primers were annealed into
   double-stranded sequences with restriction enzyme engineered ends, and
   cloned into lentiGuide-Puro (Addgene 52963) by the Golden Gate method
   as described in
   ([246]https://media.addgene.org/cms/filer_public/3e/e1/3ee1ce9c-99f9-40
   74-9a28-109d34971471/zhang-lab-sam-cloning-protocol.pdf). All inserted
   sgRNAs were sequence-verified.

   For production of lentivirus, HEK293FT cells were grown to 90%
   confluency and transfected with pMD2.G, psPAX2 packaging vectors and
   the lentiviral expression construct, using Lipofectamine 3000
   (ThermoFisher L3000001). Virus was harvested twice on two consecutive
   days, and titers were verified by Lenti-X GoStix Plus (Takara Bio
   631280) before storage at −80 °C.

   We established a stable Cas9-expressing line by infecting EpiSC-5 cells
   with lentiCas9-Blast (Addgene 52962), followed by selection with
   5 µg/ml blasticidin (Sigma 15205) for five days. Established
   Cas9-positive EpiSCs were confirmed for pluripotency by marker analysis
   as well as self-renewal and differentiation assays. To induce
   CRISPR/Cas9 mediated mutagenesis, Cas9-positive cells were infected
   with lentiGuide-Puro (Addgene 52963) inserted with sgRNA. At 24 h,
   cells were selected with 2 µg/ml puromycin for 2 days and passaged at
   clonal density (1000 cells/well on 6-well plates). At 72 h after
   passaging, single colonies were picked into 96-well plates; at least 16
   colonies were picked for each sgRNA targeted line. Colonies were grown
   to confluency and expanded onto 12-well plates to establish clonal cell
   lines. To validate targeted mutations, cells were lysed with DirectPCR
   Lysis Reagent (Viagen Biotech 301-C). Genomic regions were
   PCR-amplified (Platinum SuperFi II PCR master mix, ThermoFisher
   12368050) using a pair of primers at least 200 bp away from the 5’ and
   3’ ends of the sgRNA targeted site. PCR products were diluted 1:40 and
   10 µl mixed with 5 µl of 5 µM sequencing primer for Sanger sequencing.
   Results from true clonal cell lines displayed either a single ORF
   consistent with identical genome editing results on both alleles, or
   double peaks with equal height indicating different alleles on the
   homologous chromosomes. In the latter case, to determine sequences of
   the individual alleles, PCR products were cloned into TOPO vector
   (Strataclone Blunt PCR Cloning Kit, Agilent 240207) and selected on
   agar plates with ampicillin and kanamycin. Multiple colonies were
   mini-prepped (Qiagen 51306) and sequenced.

RT-qPCR

   Total RNA was reverse-transcribed to cDNA using SuperScript Reverse
   Transcriptase II. Diluted cDNA was mixed with SYBR green master mix
   (Applied Biosystems [247]A25777) and qPCR reaction was performed on an
   ABI7500 instrument (Applied Biosystems) using conditions of 50 °C for
   2 min; 94 °C for 2 min; 94 °C for 30 s + 60 °C 30 s for 40 cycles. The
   expression of Gapdh or Hprt was used to normalize the loading error,
   and the Delta-delta Ct method was used to quantify relative gene
   expression levels.

Differentiation phenotype analysis

   As a secondary validation assay, aimed at assessing whether a
   differentiation phenotype would emerge following CRISPR/Cas9 based
   silencing, we focused on 70 of the 132 MRs that had not been previously
   reported as pluripotency MRs in the literature, as well as 4 additional
   MRs that were previously known. A total of 1150 clonal lines were
   generated by sgRNA targeting each one of these MRs, together with 128
   scramble/mock controls. Among them, 520 lines contained biallelic
   frameshift mutations. Analyzing the morphology of those 520 lines
   identified 15 MRs whose knock-out significantly affected EpiSC
   morphology (Supplementary Data [248]7.1). In addition, we could not
   recover biallelic frameshift lines for 5 additional MRs (Chi-square
   test, p < 0.05; Supplementary Fig. [249]2a), suggesting that null
   mutants for these genes may be impaired in self-renewal and/or
   clonogenicity. Collectively, a total of 15 + 5 = 20 MRs were validated
   by CRISPR/Cas9 based analysis, including 3 of 4 genes selected as known
   MRs (75%) and 17 of 70 novel MRs (24%). As a follow-up, a panel of in
   vitro functional assays evaluating self-renewal and differentiation
   potential of these 20 MRs were performed using 40 knock-out (KO) lines
   for the 15 MRs with biallelic frameshifts and 11 knock-down (KD) lines
   for the remaining 5 MRs, as well as 9 empty or mock controls
   (Supplementary Data [250]7.2, [251]7.3). 2 MRs (Cbl and Zc3h13) display
   the strongest results and their KO lines were further evaluated by in
   vivo teratoma formation assay.

Immunofluorescence and immunohistochemistry

   For adherent cells, cells were fixed with 4% PFA for 15 min and washed
   with 1x PBS. Cells were permeabilized by 0.1% TritonX-100 for 10 min.
   Blocking was performed using 5% goat serum or 2% non-fat milk, and
   primary antibody was diluted with 1% serum or non-fat milk for staining
   overnight at 4 °C. Cells were then washed three times with 0.05%
   Tween-20 and secondary antibody was diluted 1:500 with 1% goat serum or
   non-fat milk for staining at room temperature for 1 h, followed by
   washing three times with 0.05% Tween-20. Nuclei were stained with
   Hoechst 33342 (1:5,000 dilution in PBS) for 10 min and washed with PBS
   twice.

   For teratomas, hematoxylin and eosin (H&E) staining or
   immunohistochemistry was performed on 5-micron paraffin sections. For
   H&E staining, sections were rehydrated by serial incubation with
   xylenes three times, 100% EtOH twice, 95% EtOH twice, 70% EtOH once,
   and 50% EtOH once. Slides were dipped into hematoxylin for 3 min,
   followed by rinsing with deionized water and quick incubation with acid
   ethanol. Slides were then incubated with eosin for 30 s and then
   dehydrated by incubating with solvents in the reverse order of
   rehydration. The slides were then cover-slipped using Clearmount.

   For immunohistochemistry, rehydrated slides were blocked with 3%
   hydrogen peroxide for 20 min. Antigen retrieval was performed by
   heating in a steamer for 45 min. Slides were then permeabilized with
   PBS + 0.1% Triton-X for 15 min and blocked with 5% serum for 1 hr.
   Primary antibody was diluted with 1% serum and incubated with slides in
   humid chamber overnight at 4 °C. Secondary antibody staining used
   biotinylated antibodies at 1:500 dilution in 1% serum. Color detection
   used an ABC amplification kit and Vector NovaRed Substrate Kit (Vector
   Laboratories). Hematoxylin staining was performed by quick dipping
   slides into hematoxylin 3–5 times, followed by washing, dehydrating,
   and coverslipping as described above.

Colony formation assays

   Confluent cells were dissociated, counted (Bio-Rad TC20), and replated
   at clonal density (500 cells/well) on 12-well attachment plates, in 1:2
   diluted CMAF without Y27632. Media was refreshed once at 48 h after
   replating. At 96 h, cells were stained with methylene blue by
   incubating with 1% methylene blue diluted in 50% methanol for 5 min
   followed by washing with tap water for 10 min. Alternatively, cells
   were stained for alkaline phosphatase using a commercially available
   kit (Stemgent 00-0055). In a separate experiment at 120 h after plating
   at clonal density, cells were dissociated with 0.05% trypsin and
   resuspended in 1 ml basal media. Cell numbers in suspension were
   counted by the Bio-Rad TC10 cell counter and divided by the initial
   number of plated cells to determine the fold-change.

Embryoid body formation

   Embryoid body formation was performed as described in refs.
   ^[252]78–[253]80, with minor modifications. In brief, EpiSCs were
   dissociated into single cells and diluted to 10,000 cells/ml in EB
   growth media containing DMEM/F12, 20% FBS, 1x MEM non-essential amino
   acids, 1x GlutaMAX, and 55 µM 2-merceptoethanol. Hanging drops were
   plated as 30 µl drops containing 300 cells on the top of a 150 mm
   culture dish. After 72 h, embryoid bodies were washed and imaged.

Western blotting

   Attached cells were grown to confluency on 10 cm^2 dishes and
   dissociated with 0.05% trypsin. Cell suspension was lysed, measured and
   denatured by following protocol as described in PMID: 29625057. For
   SDS-PAGE, a 10–20 µg sample was loaded onto Bio-Rad pre-cast 4%–20%
   Mini-PROTEAN TGX gels (4561094S), run for 2–3 h at 120–140 V, and then
   transferred to PVDF membranes at 400 mM for 3 h. Primary antibody
   staining was performed overnight at 4 °C with antibodies diluted in 5%
   non-fat milk in TBST. HRP-conjugated secondary antibody was used to
   incubate the membrane for 1 hour at room temperature. Color reaction
   was performed using SuperSignal West Pico Plus Chemiluminescent
   Substrate (ThermoFisher 34577).

In vitro lineage-specific differentiation analysis

   EpiSCs were differentiated into neuroectoderm and mesendoderm lineages
   as described in refs. ^[254]68,[255]81, with minor modifications. In
   brief, 500–5000 cells, depending on their growth rate, were plated in
   24-well culture plates with 0.5 ml CMAF + 10 µM Y27632. The next day,
   differentiation was initiated in MEF conditioned media, lacking FGF2
   and Activin A. For neuroectoderm differentiation, cells were treated
   with 10 µM SB431542 (Millipore Sigma S4317) + 12 ng/ml FGF2 for 7 days.
   For mesendoderm differentiation, cells were treated with 10 µM SU5402
   (Millipore Sigma 572630) + 5 ng/ml Activin for 3 days, followed by
   10 ng/ml BMP4 + 20 ng/ml FGF2 + 30 ng/ml Activin for 4 days. On day 4,
   after most wells reached confluency, cells were dissociated and
   replated onto 24-well and 96-well plates to achieve 30–50% confluency.
   Cells continued differentiation for another three days before
   harvesting for RT-qPCR analysis (24-well cultures) and
   immunofluorescence staining (96-well cultures).

Image analysis

   Immunofluorescence-stained samples in 96-well plates (Greiner 655090)
   were imaged by the Cytation 5 Cell Imaging Multimode Reader using the
   Biotek Gen5^TM software. For each marker, exposure time for two
   channels (GFP: 488 nm and DAPI: 405 nm) was adjusted to reduce
   background and to maximize on-target signal. For each well, 9
   square-shaped field were recorded for each channel and exported to a
   stitched image file. Two channel images from the same sample were
   pseudo-colored and overlaid by Fiji ImageJ.

   An image analysis pipeline was developed using the CellProfiler
   software (version 3.0.0). This pipeline scans the DAPI-channel image to
   mark the area containing the cell nucleus, and then scans the
   488nm-channel image to quantify the nuclear staining intensity of the
   marker protein. For markers that stain cytoplasm such as Doublecortin,
   the area surrounding the nucleus is marked for green fluorescent
   intensity quantification. CSV formatted files containing green
   fluorescent intensity quantification of individual cells in each sample
   well were exported, which were then analyzed by an R pipeline generated
   by the Herbert Irving Comprehensive Cancer Center (HICCC) Confocal and
   Specialized Microscopy Shared Resource. The background intensity was
   estimated for each antibody marker using the CellProfiler software as
   the average intensity of pixels in an area with no cells. The R
   pipeline then scanned the data and identified cells that showed higher
   intensity of green fluorescence signal than background. The proportion
   of cells with positive marker expression was then calculated by
   dividing the number of cells with greater than background staining by
   the total number of cells recorded. To correct for batch-effects
   created by variations in experiments over time, the proportions of
   positive cells, expressed as percentages, were divided by the average
   percentage of positive cells in the control wells (wild type or
   untreated EpiSCs) on the same 96-well plate. Statistical significance
   for the difference between mutants and controls was estimated by
   two-tailed Mann-Whitney U-test.

Teratoma formation

   Teratoma formation was performed as described^[256]82 with minor
   modifications. In brief, EpiSCs were dissociated into single cells, and
   300,000 cells were resuspended in 50 µl Matrigel. Cells were injected
   into the gastrocnemius muscle of immunodeficient nude male mice
   (CrTac:NCr-Foxn1^nu, Taconic) at 10 weeks of age (n = 20), and mice
   were monitored daily to track tumor growth; male mice were used as
   graft hosts since the implanted EpiSC cells were male. Mice were
   euthanized when tumors grew to 2 cm diameter, around 25 days after
   injection. Tumors were dissected intact, weighed, cut into 1 mm
   sections by razor blade, fixed with 4% PFA or 10% formalin,
   paraffin-embedded, and processed for H&E staining or
   immunohistochemistry.

   All animal studies were approved by and conducted according to
   standards set by the Columbia University Institutional Animal Care and
   Use Committee (IACUC) under protocol AC-AABV7651. Mice were housed
   under specific pathogen-free (SPF) conditions in individually
   ventilated autoclaved cages with irradiated feed and automated RO
   (reverse osmosis) watering, under a 12 hr dark/12 hr light cycle with
   temperatures at 68–79 °F and humidity between 30 and 70%.

MR interactome refinement

   To refine the transcriptional targets of each candidate MRs using
   experimental data, we integrated z-scores corresponding to (1) the
   differential expression (DE) of its candidate targets, following
   shRNA-mediated silencing, and (2) the mutual information (MI) between
   the MR and its candidate targets (see EpiSC context-specific
   interactome above). To avoid bias in assessing the statistical
   significance of these metrics, we rank-transformed and scaled between 0
   and 1 both variables before integration, while preserving the
   differential expression (DE) score sign. Integration between MI and DE
   was performed using a weighed implementation of the Stouffer’s method.
   The DE contribution was weighted by the silencing efficiency of each
   shRNA hairpin, as previously discussed. Such weights were computed by a
   logistic transformation of the silencing efficiency scores with
   inflection point equal 1 and trend at the inflection point equal 3. MI
   scores were only limited to the regulator’s transcriptional
   interactions present in the original EpiSC interactome, in this way
   making full use of the data processing inequality filter. The “TFmode”
   component of the new regulons^[257]25 was computed for each target as
   the average between the direction-corrected “TFmode” from the EpiSC
   interactome and the DE score for the target in response to the
   regulator silencing.

   To estimate the “likelihood” component of the new regulons, we first
   computed two integration scores: (1) a non-directional integration
   score (NDIS) as the weighted Stouffer’s integration of the
   rank-transformed MI and |DE | , where |x| denotes the absolute value of
   x, and (2) a directional integration score (DIS) as the weighted
   Stouffer’s integration of the rank-transformed MI multiplied by the
   sign of the direction-corrected “TFmode” from the EpiSC interactome and
   the DE score. Finally, the direction-corrected “TFmode” from the EpiSC
   interactome was used to weight the contribution of the NDIS and DIS
   scores when computing the integrated interaction score (IS) as DIS x
   |TFmode | + NDIS x (1 - |TFmode | ). The “likelihood” component was
   then estimated by scaling the IS by its maximum value across all the
   targets for the regulator. The top 200 transcriptional targets per
   regulator, based on IS, were included in the interactome. For VIPER
   analysis, we limited the number of transcriptional targets per regulon
   such as the effective size—i.e., sum(likelihood^2)—was capped at 50.

Experimentally refined network analysis

   Based on the new, experimentally-refined network, we used VIPER to
   assess the differential activity of 132 candidate pluripotency MRs
   following shRNA-mediated silencing. The NES scores for each shRNA
   targeting the same candidate MR were integrated using the Stouffer’s
   method, weighted to account for the hairpin silencing efficiency. A
   major advantage of this approach is that we could determine network
   directionality. ARACNe cannot determine the directionality of an
   interaction between two regulatory proteins A and B. However, if B’s
   expression or activity is affected by the experimental silencing of A,
   or vice-versa, this can effectively resolve this issue, thus leading to
   a bona fide causal network. This can also resolve situations where A
   regulates B and B regulates A (autoregulatory loop).

   The statistical threshold to determine the presence of a causal edge
   was determined by evaluating the relationship between the number of
   nodes (proteins) and edges at different thresholds. Based on this
   analysis, we inferred an edge between a silenced regulator P[r] and a
   downstream target protein P[t] conservatively, only when the activity
   of P[t] significantly changed in response to P[r] silencing
   (FDR < 10^−^10). The final network comprises 120 MRs and 1273 causal
   MR → MR edges (Fig. [258]4a). Of the validated MRs, 12 were not
   included in the network because no significant interactions with other
   MRs were identified by the analysis.

Causal pluripotency MR network analysis

   Community analysis was performed with the Louvain multi-level
   modularity optimization algorithm^[259]54. The optimal number of
   communities was inferred by modularity optimization as implemented in
   the igraph package in Bioconductor, with the precision parameter set to
   the default of 1.

   For each node, we computed the betweenness centrality, defined as the
   number of shortest paths connecting all node pairs in the network that
   pass though the evaluated node; the degree, as the number of edges
   associated to each node; the in-degree, as the number of incoming edges
   for each node; and the out-degree, as the number of outgoing edges for
   each node, using the implementation for computing these centrality
   measurements in the igraph package. We defined the outlier nodes
   showing the highest betweenness as “Mediators”, because they mediate
   most of information flow in the network.

   We define the Regularized out-degree (ROD) for each node as the total
   number of outgoing interactions (out-degree) multiplied by the
   difference between its out- and in-degree, after scaling such
   differences between 0 and 1 along all nodes. The ROD analysis
   identified three proteins classes including (a) Speakers, defined as
   nodes with high levels of ROD, (b) Communicators, defined as nodes with
   intermediate ROD, and (c) Listeners, defined as nodes with low ROD. The
   ROD thresholds for considering a node as a Speaker, Communicator or
   Listener were inferred by fitting a mixture of 3 Gaussian distributions
   to the ROD distribution along all nodes. We defined the ROD threshold
   separating listener from communicator nodes as the point of equal
   probability between the first (lowest mean) and second fitted
   distributions. The speakers were defined by showing an ROD higher than
   the quantile of the third fitted distribution corresponding to a
   right-tail probability lower than 5% (Fig. [260]5b).

Community functional gene-set analysis

   The enrichment of the MR silencing gene expression signatures on Gene
   Ontology Biological Processes (GO-BP), or on MsigDB v7.4, including
   KEGG, Reactome, Biocarta, PID pathways and biological hallmarks were
   estimated with the aREA algorithm. Enrichment due to gene-sets overlap
   was corrected using shadow analysis as implemented in the VIPER package
   from Bioconductor^[261]25. Multiple hypothesis testing was addressed by
   False Discovery Rate (FDR). The enrichment results were integrated for
   each community, or each category of regulator (i.e., speaker,
   communicator and listener) using average integration of the normalized
   enrichment scores weighted by the community score. The results are
   reported for all GO-BP or MsigDB categories significant in at least one
   community, or at least one speaker category at FDR < 0.01. Due to the
   low number of hits obtained from MsigDB dataset, only the enrichment of
   GO-BP to the four communities were reported in Supplementary
   Data [262]9.1. The GO-BP terms preferentially enriched to each of the
   four communities were hand-picked using the following criteria: 1) GOBP
   terms with -log[10] p < 2.5—a value selected to optimize the number of
   statistically significant categories—in all four community signatures
   were first filtered out; 2) For the remaining GOBP terms, the maximal
   number of the -log[10] p value among all four communities were
   determined, and the differences between this maximal value to the other
   three values were calculated; 3) GOBP terms with maximal value larger
   than any other three values by more than 2, or GOBP terms that are
   significantly enriched to only one community (-log[10] p < 2.5) but not
   the other three communities were included in Supplementary
   Data [263]9.2 and Fig. [264]4h.

Reporting summary

   Further information on research design is available in the [265]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [266]Supplementary Information^ (6.1MB, pdf)
   [267]41467_2025_57894_MOESM2_ESM.pdf^ (121.5KB, pdf)

   Description of Addtional Supplementary File
   [268]Supplementary Data 1^ (22.3KB, xlsx)
   [269]Supplementary Data 2^ (61.4KB, xlsx)
   [270]Supplementary Data 3^ (28.8MB, xlsx)
   [271]Supplementary Data 4^ (8.6MB, xlsx)
   [272]Supplementary Data 5^ (49.6KB, xlsx)
   [273]Supplementary Data 6^ (99.3KB, xlsx)
   [274]Supplementary Data 7^ (21.3KB, xlsx)
   [275]Supplementary Data 8^ (189.1KB, xlsx)
   [276]Supplementary Data 9^ (54.3KB, xlsx)
   [277]Supplementary Data 10^ (67.9KB, xlsx)
   [278]Reporting Summary^ (4.3MB, pdf)
   [279]Transparent Peer Review file^ (347KB, pdf)

Source data

   [280]Source Data^ (586.6KB, zip)

Acknowledgements