Abstract Despite extensive research, the gene regulatory architecture governing mammalian cell states remains poorly understood. Here we present an integrative systems biology approach to elucidate the network architecture of primed state pluripotency. Using an unbiased methodology, we identified and experimentally confirmed 132 transcription factors as master regulators (MRs) of mouse epiblast stem cell (EpiSC) pluripotency, many of which were further validated by CRISPR-mediated functional assays. To assemble a comprehensive regulatory network, we silenced each of the 132 MRs to assess their effects on the other MRs and their transcriptional targets, yielding a network of 1273 MR → MR interactions. Network architecture analyses revealed four functionally distinct MR modules (communities), and identified key Speaker and Mediator MRs based on their hierarchical rank and centrality. Our findings elucidate the de-centralized logic of a “communal interaction” model in which the balanced activities of four MR communities maintain primed state pluripotency. Subject terms: Embryonic stem cells, Regulatory networks, Gene silencing __________________________________________________________________ The architecture of gene regulatory networks that govern mammalian cells has been poorly understood. Here, Li et al. present a computational systems approach to elucidate the regulatory logic of the gene network for primed state pluripotency. Introduction Pluripotency is defined by cellular states in which cells are capable of self-renewal while remaining poised to differentiate into all embryonic cell lineages. To date, a progression of three distinct in vivo pluripotency states—naïve, formative, and primed—has been identified in the mouse embryo^[34]1–[35]4, with formative cells representing a transition between the naïve and primed states^[36]5,[37]6. Due to their central significance for stem cell biology and regenerative medicine, the naïve and primed states have been captured as cell lines in vitro and intensively studied. In culture, embryonic stem cells (ESCs) established from pre-implantation blastocysts reflect the naïve pluripotent state, whereas epiblast stem cells (EpiSCs) isolated from pre-gastrulation embryos^[38]7,[39]8 correspond to the primed pluripotent state. Primed pluripotent cells co-express markers characteristic of the undifferentiated naïve state, as well as markers of more differentiated lineage-specific states; they differ from the naïve state in a range of molecular and functional properties^[40]1,[41]2,[42]9. Notably, conventional pluripotent stem cell cultures established from human and many other mammalian species largely correspond to a primed pluripotency state, with smaller subpopulations corresponding to naïve and formative states, and thus most closely resemble mouse EpiSCs^[43]1–[44]3. However, although the primed state is highly conserved, it is less understood than the naïve state, particularly in terms of the regulatory network that governs its pluripotency. Previous studies of pluripotency network architecture have largely focused on a limited number of well-known regulators that have been primarily identified by mutational analyses in cultured stem cells or in vivo^[45]10–[46]17. These studies led to two general models of the regulatory logic underlying naïve state pluripotency. In the first model, a highly interconnected, hierarchical network of core transcription factors such as Oct4 and Nanog are responsible for activating pluripotency programs and repressing differentiation^[47]10,[48]11,[49]13,[50]17,[51]18. In the second model, pluripotency regulators comprise an array of specification genes that direct differentiation toward a specific lineage while repressing other lineages, creating an unstable undifferentiated equilibrium with the potential to commit to multiple lineages^[52]19–[53]23. However, these models were based on analyses of a relatively small number of genes, and it is conceivable that key features that may have emerged from an unbiased approach were missed. More critically, elucidating large-scale network topology by studying a relatively small number of genes is suboptimal, as underscored by an ever-expanding repertoire of genes implicated in pluripotency maintenance, which suggests that pluripotency is controlled by a sizeable, highly complex gene regulatory network. To address this challenge, we leveraged a comprehensive and unbiased systems approach for the large-scale systematic elucidation of the regulatory logic of mouse EpiSC cells, a defined cell state of fundamental importance in normal development. The approach—which is based on the seamless integration of both computational and experimental methodologies—has led to the systematic identification and validation of primed pluripotency regulators, followed by elucidation of their regulatory relationships and overall network architecture. For this purpose, we have utilized ARACNe^[54]24 and VIPER^[55]25, two algorithms extensively validated in a wide range of physiologic^[56]26–[57]28 and pathologic contexts, both tumor^[58]29–[59]32 and non-tumor related^[60]33–[61]35. ARACNe was used to assemble an initial regulatory network (interactome) from large collections of expression profiles, using an information theory approach that largely eliminates indirect interactions. VIPER interrogates the resulting interactome to identify candidate Master Regulator (MR) proteins representing mechanistic determinants of a specific transcriptional cell state via their transcriptional targets; this is accomplished by measuring the enrichment of their activated and repressed targets in genes differentially expressed in that state, akin to a highly multiplexed gene reporter assay. Notably, the accuracy and sensitivity of VIPER-inferred protein activity compares favorably with antibody-based measurements^[62]35,[63]36, and allows unbiased identification of candidate MRs (i.e., independent of prior literature knowledge). In our study, we have generated an extensive collection of EpiSC gene expression profiles, as well as multiple signatures of lineage-specific differentiation to support the systematic generation of a mouse EpiSC-specific interactome and the identification of MR proteins responsible for the maintenance of primed state pluripotency. Using a set of 132 experimentally confirmed MRs, we elucidated their mutual regulatory relationships by systematically silencing each MR in turn to determine the resulting effects on the activity of all other MRs. This large-scale experimental analysis yielded a comprehensive MR → MR causal interaction network, which we have investigated using concepts from social network analysis to understand its regulatory architecture. Notably, topology analysis identified four distinct gene communities within the EpiSC network, suggesting a “communal interaction” model for how pluripotency is maintained. Our analysis of the regulatory architecture of primed state pluripotency thus provides relevant insights into mechanisms of pluripotency and may guide future studies of other similarly complex regulatory networks. Results Overall experimental strategy To dissect the logic of molecular interactions responsible for regulating primed state pluripotency, we pursued a six-step strategy that integrates computational inference with validation assays, followed by experimental refinement. First, as inputs for the ARACNe algorithm, we generated a large repertoire of gene expression profiles from EpiSCs undergoing a wide range of small molecule perturbations. This resulted in a comprehensive first-generation interactome comprising the computationally inferred transcriptional targets (regulon) of every transcription factor (Fig. [64]1a). Second, we leveraged this network to identify candidate MRs of primed state pluripotency, by using the VIPER algorithm to analyze gene expression time series representative of EpiSC differentiation along multiple lineages (Fig. [65]1b). Third, we validated candidate MRs emerging from the analysis in terms of their ability to modulate the pluripotent transcriptional state following RNAi-mediated silencing, resulting in a list of 132 confirmed MRs that contained both known and novel regulators of pluripotency (Fig. [66]1c). Fourth, among these, we sought to validate 70 MRs not previously reported in the literature using functional assays of self-renewal and differentiation phenotypes following their CRISPR-mediated knock-out (Fig. [67]1d). Fifth, we used the RNA-seq profiles of EpiSC cells following RNAi-mediated silencing of each of the 132 confirmed MRs, as generated in step three, to assemble a comprehensive, experimentally-derived regulatory network governing primed state pluripotency (Fig. [68]1e). This fully causal MR → MR interaction network is based directly on experimental determination of the differential activity of each MR as induced by the silencing of every other MR. Finally, we investigated the architecture of the primed pluripotency network through modularity, hierarchy, and centrality analyses (Fig. [69]1f). Overall, this strategy allowed reconstructing the architecture of a complex and previously uncharacterized EpiSC regulatory network, using a fully unbiased approach that is largely based on the analysis of targeted experimental perturbations. Fig. 1. Flowchart of experimental design. [70]Fig. 1 [71]Open in a new tab a Interactome construction. Two independent EpiSC lines were treated with 5 differentiation conditions and 33 small molecule perturbations, generating 276 expression profiles. These profiles served as input for the ARACNe algorithm to create the “first-generation” EpiSC interactome. b Inference of candidate MRs. 144 expression profiles from 5 distinct lineage-specific differentiation time courses were used to generate signatures. The VIPER algorithm then analyzed these signatures to identify candidate MRs. c Validation of candidate MRs. Two independent screens investigated whether silencing of candidate MRs could modulate the pluripotent state, resulting in 132 validated MRs. The first screen examined whether candidate MR silencing altered endogenous Oct4 protein levels, while the second screen used PLATE-seq to determine if candidate MR silencing recapitulated the EpiSC differentiation signature. d Functional validation of MRs. 70 selected MRs underwent CRISPR-mediated knockout to investigate pluripotency defects through cell culture assays and teratoma formation. e Network modularity analysis. PLATE-seq expression profiles from MR silencing were used to assemble a comprehensive, experimentally-derived regulatory network. Four distinct MR communities were identified within this network. f Regulatory architecture inference. Network hierarchy and centrality were analyzed, followed by further examination of community structure. Computational inference of a primed state pluripotency network The unbiased ARACNe-based^[72]24 reverse-engineering of a primed pluripotency cell state transcriptional interactome requires a large (n > 100) compendium of gene expression profiles, with sufficient expression variability to reveal the range of mutual information between transcriptional regulators and their potential targets. To accomplish this goal, we developed a general strategy to create the necessary spectrum of EpiSC-derived expression profiles by combining two complementary sources of experimental perturbations. Specifically, we combined treatment with 33 small molecules (termed “perturbagens”) with 5 conditions known to induce differentiation. To be optimally informative and minimize underlying statistical co-dependencies, we selected perturbagens that affect an orthogonal range of signaling pathways to treat two distinct EpiSC cell lines. Critically, the order of treatment was defined based on a pilot experiment demonstrating greater variability of gene expression signatures when perturbagen treatment preceded induction of differentiation (Methods; Supplementary Fig. [73]1a; Supplementary Data [74]1). This strategy generated 276 diverse gene expression profiles representing the response of the two EpiSC cell lines to combinatorial treatment with perturbagens and differentiation inducers (Methods; Supplementary Fig. [75]1a, b; Supplementary Data [76]1, [77]2.1). These profiles were then analyzed by ARACNe to infer the transcriptional targets of 1478 proteins annotated as transcription factors in the Gene Ontology^[78]37 database. The resulting EpiSC primed pluripotency interactome comprises 911,753 TF →target interactions between 1393 TFs and 17,068 candidate transcriptional target genes. This first-generation interactome represents a foundational model for the subsequent identification of pluripotency maintenance master regulator (MR) proteins by VIPER analysis (Supplementary Data [79]3.1). We then proceeded to identify candidate MRs for five distinct pathways of lineage-specific differentiation using the VIPER algorithm. VIPER requires signatures representing two distinct cellular states to identify the proteins (candidate MRs) representing mechanistic determinants of the transition between them. For this purpose, we generated gene expression profiles of EpiSCs following treatment with five distinct differentiation protocols at multiple time points (ranging from 3 h to 72 h). The resulting time series included a total of 144 gene expression profiles (Supplementary Data [80]2.2). These protocols were selected to enrich for differentiation along specific lineages, including neuroectoderm (RA and SB431542), trophectoderm (BMP4), mesoderm (WAFB), and endoderm (WA) (Methods; Supplementary Fig. [81]1c–f). To confirm that these differentiation treatments induced distinct transcriptional states, we analyzed the resulting time series by Principal Components Analysis (PCA) and by expression of lineage-specific marker proteins as assessed by immunofluorescence (Supplementary Fig. [82]1c–f). For each time point in this analysis, we generated a differential gene expression signature for VIPER analysis by comparing perturbagen-treated to baseline untreated EpiSC cells. The resulting differentiation signatures were then analyzed by VIPER, using the EpiSC interactome, to identify candidate MRs for the associated cell state transitions. We reasoned that the most universal pluripotency MRs would be those conserved across most if not all of the five differentiation pathways examined. We thus used Fisher’s method to integrate the VIPER statistics (p-values) representing the differential activity of 1393 TFs, as determined by the analysis of each differentiation signature, across all time points and differentiation trajectories. The resulting ranked list identified 300 candidate MRs whose regulons were significantly enriched in differentially expressed genes (p < 10^-4), with negative and positive enrichment representing MRs capable of inducing or repressing pluripotency, respectively (Fig. [83]2a; Supplementary Data [84]4.1). From this initial list, we selected 172 candidate MRs for further analysis, including most of those that were top ranked, as well as additional candidate MRs of interest (see Methods section “Selection and evaluation of pluripotency MRs” for selection criteria and additional details). Many of the candidate MRs nominated by this systematic approach had been previously implicated in pluripotency regulation (Supplementary Data [85]4.4), confirming that our unbiased analysis was highly effective and suggesting that additional candidates not previously reported in the literature might include many bona fide pluripotency MRs. Fig. 2. Identification and validation of candidate master regulators of primed state pluripotency. [86]Fig. 2 [87]Open in a new tab a Examples of transcriptional regulatory proteins being inactivated, unaffected, or activated during the differentiation treatments. Each heatmap cell represents the VIPER-inferred differential protein activity (DA) expressed as normalized Enrichment Score (NES). Enrichment plots show the distribution of activated (orange) and repressed (purple) transcriptional targets on the 72 h gene expression signatures. The right-most column represents the integrated differential activity of each protein across all time points and treatments (integrated DA). RA: Retinoic acid; WAFB: Wnt3a, Activin A, FGF2, and BMP4; WA: Wnt3a and Activin A. b Oct4 staining of EpiSCs after shRNA-mediated silencing of selected candidate MRs and vehicle or mock-treated controls. Each MR was targeted by two individual hairpins in four independent replicates. Scale bars: 100 microns. c Waterfall plot showing differential Oct4 protein level (z-score) after silencing of each candidate MR; dashed lines indicate FDR < 0.05 (2-tailed). Raw data are provided in Supplementary Data 5.2. d Scatterplot of the effect of silencing each MR on Oct4 protein level and the differentiation score of EpiSCs, which is determined by the recapitulation of the protein activity signature of EpiSC differentiation by the silencing experiment (see Methods). We observed a significant correlation between these scores (Spearman’s Rho 0.32, p < 10^−^4); dotted lines indicate FDR = 0.05 (2-tailed). Candidate MRs showing significant validation (differentiation) score are highlighted in green. Selected MRs labeled in black indicate MRs functionally validated in CRISPR/Cas9 assays, and in blue indicate known key pluripotency regulators. Raw data are provided in Supplementary Data 6.2. Experimental confirmation of candidate MRs To systematically confirm the functional role of the 172 candidate MRs in pluripotency control, we employed two independent approaches. In the first approach, we examined whether silencing of candidate MRs would affect Oct4 protein abundance, which has been frequently employed as a readout in pluripotency screens^[88]38–[89]41. However, since there may be MRs that act independently of Oct4, we also employed a second assay to determine whether silencing of candidate MRs might result in a gene expression signature with similarities to the overall EpiSC differentiation signature. First, we assessed whether lentiviral-mediated shRNA silencing of candidate MRs would modulate the expression of endogenous Oct4 (Pou5f1) at the protein level (Fig. [90]2b, c). To minimize contributions from off-target effects, we used multiple independent hairpins for each candidate MR, with 92% of the candidate MRs targeted by at least 3 hairpins (Supplementary Fig. [91]1g; Supplementary Data [92]5.1). We quantified Oct4 protein levels by immunofluorescence staining, and calculated z-scores representing its differential protein expression relative to non-targeting controls (Fig. [93]2c). Using this assay, we found that 117 (68%) and 15 (8.7%) of the candidate MRs down-regulated and up-regulated Oct4 protein expression when silenced, respectively (FDR < 0.05) (Fig. [94]2c; Supplementary Data [95]5.2). Notably, the analysis identified Oct4 itself as well as other well-known MRs such as Otx2 and Stat3^[96]42,[97]43 as pluripotency regulators (Fig. [98]2b, c). Secondly, to confirm the roles of candidate MRs that modulate pluripotency independently of their influence on Oct4 protein expression, we developed an assay to determine whether their silencing might recapitulate the EpiSC differentiation signature. This assay utilized PLATE-seq, a multiplexed barcoded RNA-sequencing protocol^[99]44 that generates low-depth (1 ~ 2 million reads/sample) gene expression profiles. We used PLATE-seq to generate gene expression profiles produced by 596 independent shRNA-mediated silencing assays that targeted 154 of the 172 candidate MRs in EpiSC cells; the other 18 candidate MRs were either targeted unsuccessfully or lacked a regulon for analysis (Methods; Supplementary Data [100]5.1). Next, we performed VIPER-based analysis of the PLATE-seq profiles, using the EpiSC interactome, to assess the differential activity of all candidate pluripotency MRs, as induced by each silencing assay vs. the pool of non-targeting controls. The resulting differential MR activity profiles were compared to those obtained along the five lineage-specific differentiation time courses by enrichment analysis using the aREA algorithm^[101]25. The resulting Normalized Enrichment Scores (NES) from all hairpins targeting the same candidate MR were integrated across all five differentiation signatures, using a weighted implementation of Stouffer’s method. Weights were proportional to the effect of each hairpin on the activity of its target protein, such that hairpins inducing less-efficient silencing had lower weight (Supplementary Data [102]6.1). Interestingly, we have previously shown that up to ~20% of candidate VIPER-inferred MRs may be correctly inferred but with an opposite directionality (inverted NES sign), due to the presence of autoregulatory loops that invert the relationship between mRNA expression and the activity of its associated protein at steady state (equilibrium)^[103]25. As a result, these analyses were instrumental in confirming the role of candidate MRs as either activators or repressors. Furthermore, we also identified and corrected the activity of 27 of the tested MRs (18%) that had been inferred with inverted activity (Methods; Supplementary Data [104]6.2). To determine the number of candidate MRs whose induced differentiation signature was consistent with or divergent from their ability to modulate Oct4 protein expression, we computed a Marker Score (MS) and a Differentiation Score (DS; see Methods for details) (Fig. [105]2d; Supplementary Data [106]6.1). As expected, the two scores were highly correlated (p = 4.8 x 10^−^5), with 85 of 154 candidate MRs tested by both assays (55%, p < 0.05, 1-tailed FET) showing concordant scores (FDR < 0.05, by 2-tail analysis). Notably, an additional 47 candidate MRs (31%) presented with a significant DS but not MS score, suggesting that these genes may regulate primed state pluripotency in a Oct4-independent manner. In total, we confirmed 132 MRs (86%) by the more conservative test (DS analysis, green dots in Fig. [107]2d), which were selected as the basis for further investigation. Differentiation analysis of primed state pluripotency MRs Since the ability of an MR to induce the experimentally determined differentiation signature upon silencing suggests a potential mechanistic role in pluripotency control, we tested whether silencing of each confirmed MR was sufficient to induce differentiation in culture, using several cell-based assays. For this purpose, we focused on 70 of the 132 confirmed MRs that were identified as novel (i.e., not previously reported in the pluripotency literature at the time the list was compiled), as well as on 4 additional genes (Utf1, Hes6, Rarb, and Ilf3) selected from the MRs previously implicated in pluripotency regulation as positive controls (Supplementary Data [108]4.2). For functional validation, we performed CRISPR-mediated gene knock-out assays (CRISPR-KO) using a lentiviral system^[109]45 to generate mutant alleles for each MR, thus yielding 1150 clonal EpiSC lines and 128 scramble or mock control lines (Supplementary Data [110]7.1). Of the 1150 clonal lines, 311 contained monoallelic frameshift mutations, while 319 contained biallelic in-frame mutations that may not be loss-of-function. The remaining 520 lines contained biallelic frameshift mutations and were thus identified as candidate null mutants for 62 of the targeted novel MRs and the 4 selected known MRs. We failed to derive candidate null lines for the other 8 MRs, 3 of which (Eno1b, Zfp598, and Zfp184) might be due to the low number of clonal sgRNA-targeted lines obtained (Supplementary Data [111]7.1). However, of the other 5 remaining MRs (Tada2a, Phf5a, Zpr1, Zfp207 and Rrn3), we derived ≥ 20 clonal targeted lines for each, none of which represented a biallelic frameshift mutant, suggesting that null mutants for these genes may be impaired in self-renewal and/or clonogenicity (p < 0.01 by Chi-square test; Supplementary Fig. [112]2a). We screened the candidate null mutant and control targeted lines for their morphological phenotypes in feeder-free culture conditions, in which wild-type EpiSC cells display round and densely packed colonies with large nuclei, prominent nucleoli, and scant cytoplasm. We found that candidate null mutants for 15 MRs displayed a higher frequency of abnormal morphologies relative to controls (Fig. [113]3a, Supplementary Figs. [114]3 and [115]4a). In addition, shRNA-mediated knock-down of the 5 MRs that failed to generate biallelic frameshift mutants also resulted in abnormal morphologies (Supplementary Figs. [116]3 and [117]4a). Overall, 3 of the 4 (75%) MRs previously implicated in pluripotency regulation as well as 17 of the 70 (24%) not previously reported MRs displayed morphological phenotypes in this analysis, demonstrating its robustness in identifying pluripotency regulators validated by multiple assays. For further analyses of these 20 MRs, we selected a total of 40 knock-out (KO) lines for the 15 MRs with biallelic frameshift mutations and 11 knock-down (KD) lines for the remaining 5 MRs, as well as 9 negative controls (Supplementary Fig. [118]2b; Supplementary Data [119]7). To assess whether these KO/KD lines were truly null, we performed qRT-PCR analyses of targeted MR expression, which showed significantly lower expression in 46 of the 51 KO/KD lines (Supplementary Fig. [120]5), consistent with increased RNA degradation induced by frameshift mutations or RNA interference. Fig. 3. Functional validation of two master regulators. [121]Fig. 3 [122]Open in a new tab a Morphology of KO or control EpiSC lines; representative images for independent lines are shown. b Fold change in cell number after five days of colony formation. c Number of alkaline phosphatase positive colonies. d Time line of neuroectoderm differentiation. e Immunofluorescence staining of biological replicates (clonal lines) for lineage-specific markers. Each MR was characterized by two clonal KO cell lines in at least three independent replicates. f Time line of mesendoderm differentiation. g Staining for lineage-specific markers; number of replicates as in (e). h Western blots for Cbl and Zc3h13. Source data are provided as a Source Data file. i Schematic of the teratoma assay. j Representative images of mutant and control teratomas dissected from contralateral legs of the same animal. k Teratoma weight. l Hematoxylin and eosin (H&E) and immunohistochemical staining of serial sections. m, n Fraction of area showing ectoderm (neural rosette) (m) and endoderm (ciliated epithelium) features (n). Line and whiskers show the mean ± s.d.; statistical significance is shown as FDR, 2-tailed U-test. Scale bars: 100 microns. We analyzed the self-renewal and differentiation potential of the 20 MRs using in vitro functional assays. We found that KO/KD lines for 18 of the 20 MRs presented lower colony formation efficiency, with 13 of these displaying significantly fewer alkaline phosphatase (AP)-high colonies (Fig. [123]3a–c, Supplementary Figs. [124]3 and [125]4a–c; 2-tailed U-test; FDR < 0.05). We used embryoid body (EB) formation to assess pluripotency exit, and found significantly reduced EB formation efficiency in KO/KD lines for 18 MRs (Fig. [126]3a, Supplementary Figs. [127]3 and [128]4a). Finally, we performed directed differentiation to neuroectoderm or mesendoderm followed by immunostaining for lineage-specific markers, and observed significant effects on neuroectoderm differentiation in 16 KO/KD lines, and on mesendoderm differentiation in 8 KO/KD lines (Fig. [129]3d–g, Supplementary Figs. [130]3 and [131]4a, d, e). Interestingly, none of the 20 functionally validated MRs displayed similar effects across all assays, indicating the functional diversity of individual MRs in regulating distinct aspects of pluripotency and lineage differentiation, as well as the robustness of our MR discovery pipeline. Finally, during the time span of our experiments, 6 of the 20 validated MRs (Irf6, Jazf1, Peg3, Phf5a, Zc3h13, and Zfp207) were identified as pluripotency relevant, albeit in non-EpiSC contexts^[132]46–[133]51, providing independent confirmation of our systematic MR analyses. We focused in greater detail on 2 MRs that consistently generated highly significant results in nearly all assays, Cbl and Zc3h13. Cbl is a member of a family of three closely related RING finger E3 ubiquitin ligases that are responsible for proteasome-mediated substrate degradation^[134]52; although Cbl null mutant mice are homozygous viable with T cell defects, Cbl; Cbl-b double homozygotes display an uncharacterized lethality prior to mid-gestation^[135]53. Zc3h13 is a component of the WMM (Wtap-Mettl3-Mettl14) complex that mediates N6-methyladenosine (m6A) methylation of RNA, and its knock-down decreases self-renewal of mouse embryonic stem cells^[136]47; in addition, we have found that Zc3h13 null mutants display a peri-implantation lethal phenotype (N. Chirathivat et al. in preparation). We confirmed that the KO EpiSC lines for Cbl and Zc3h13 were true null alleles by Western blotting (Fig. [137]3h), and that Cbl and Zc3h13 mutant EpiSCs displayed altered differentiation of neuroectoderm and mesendoderm at the mRNA level by qRT-PCR (Supplementary Fig. [138]6). To determine whether loss-of-function for Cbl and Zc3h13 affected differentiation potential in vivo, we performed grafts of KO lines to generate teratomas and found that both Cbl and Zc3h13 KO teratomas were smaller in size and weight (Fig. [139]3i–k). By H&E staining and marker immunostaining of tissue sections, we found that Cbl and Zc3h13 KO teratomas displayed increased neural rosette formation and neural marker expression, whereas Cbl KO but not Zc3h13 KO resulted in increased ciliated epithelium, consistent with increased epithelial marker expression (Fig. [140]3l–n, Supplementary Fig. [141]7). Thus, both Cbl and Zc3h13 are required for pluripotency maintenance in teratomas. An experimentally-based primed state pluripotency network Having computationally identified a large number of previously known as well as unknown MRs for primed state pluripotency, and functionally validated many of the previously unknown ones, we next sought to elucidate the architecture of the regulatory network that governs the primed state pluripotency state. To accomplish this goal, we derived an experimentally-refined, second-generation primed pluripotency transcriptional interactome, by leveraging the PLATE-seq gene expression profiles representing the response of EpiSC cells to the silencing of each of the 132 confirmed MRs (green dots in Fig. [142]2d). Specifically, these data were used to refine the ARACNe regulon of each confirmed MR by integrating the statistical significance of (a) the candidate target’s differential expression following MR silencing and (b) their mutual information. By representing the causal response of the cell to silencing of each MR, this strategy also helped infer the directionality of each interaction. This analysis produced a comprehensive set of experimentally refined regulons, resulting in an “experimentally curated,” causal interactome (ECC-interactome) (Supplementary Data [143]3.2). Using the ECC-interactome, we evaluated whether silencing a specific MR would affect the VIPER-inferred activity of another MR, as assessed using the experimentally refined regulons (Supplementary Data [144]3.2). This analysis yielded a pluripotency focused, experimentally-based EpiSC regulatory MR network, in which MRs represent nodes in the network and MR[1] → MR[2] interactions correspond to directed edges that are defined by a significant change in MR[2] protein activity (FDR < 10^−^10) when MR[1] is silenced (Fig. [145]4a). Specifically, the EpiSC pluripotency network contains 1,273 MR → MR interactions (edges) between 120 of the 132 confirmed MRs (nodes) (Fig. [146]4a and Supplementary Data [147]8). Fig. 4. Regulatory network model for primed state pluripotency. [148]Fig. 4 [149]Open in a new tab a Regulatory network of primed state pluripotency with 120 nodes and 1273 unidirectional edges. Only significant effects (FDR < 10^−10, 2-tailed aREA test) are included. The size of each node is proportional to its Regularized Out-Degree (ROD) score; colors indicate community assignment (strong, FDR < 0.01; weak, 0.01 < FDR < 0.1; unassigned FDR > 0.1). Six outlier MRs showing the highest ROD scores (Speakers) are highlighted. b Heatmap showing the effect of silencing each MR on the activity of all other MRs. The silenced MRs are represented by the columns, and the rows report activity changes for the indicated proteins. The 77 MRs (76 strong community members and Utf1) are grouped by their community membership. The color scale indicates statistical significance (-log[10](FDR), 2-tailed aREA test) for each MR protein differential activity; only significant changes in protein activity (FDR < 10^−^10) are shown. To investigate the architecture of this EpiSC pluripotency MR network, we assessed its modularity, using Louvain’s community detection algorithm^[150]54. The analysis showed that 74% (n = 89/120) of the confirmed MRs were organized into four distinct communities (Supplementary Fig. [151]8a). We then assessed the association between each MR and each community, based on the statistical significance of the enrichment of the MRs representing its first neighbors (i.e., the MRs directly connected to it) in each community. The analysis identified 76 MRs with strong community association (FDR < 0.01, 1-tail Fisher’s Exact Test, FET), 13 MRs with weak association (0.01 ≤ FDR < 0.1), and 31 MRs lacking association (FDR ≥ 0.1) with any community (Fig. [152]4b; Supplementary Data [153]8.4; Supplementary Fig. [154]8a). The regulatory logic of primed state pluripotency We next assessed whether individual MRs may play distinct roles in the EpiSC pluripotency MR network, based on their interactions with other MRs. For this purpose, we assessed the In-Degree and Out-Degree metrics for each MR, corresponding to its number of incoming ( → MR) and outgoing (MR → ) regulatory interactions; we also computed a “Regularized Out-Degree” (ROD) metric, which corrects for the different number of interactions of each MR (see Methods). The statistical significance of their In-Degree vs. Out-Degree helped stratify MRs into three classes, corresponding to “Speakers” (significantly higher Out-Degree), “Listeners” (significantly higher In-Degree), and “Communicators” (no significant difference) (Fig. [155]4a; Fig. [156]5a–c), thus illuminating their role in mediating the flow of information in pluripotent cells. We could also identify “Mediator” MRs based on their “Betweenness” centrality, a metric aimed at assessing the number of shortest paths between any two MRs going through an MR of interest, thus capturing its influence over the flow of information in the network (Fig. [157]5a, c; Supplementary Fig. [158]9a, b). Importantly, for the 77 Core MRs (76 with strong community association plus Utf1), there was no correlation of their average protein activity, community score, ROD, or Betweenness with the silencing efficiency of shRNA hairpins used in our PLATE-seq analysis (Rho < 0.22, p > 0.01; Supplementary Fig. [159]9c–f), suggesting the absence of assay-derived bias. However, we did observe ROD differences between communities 1 and 4, as well as differences in interaction degree between communities (Fig. [160]5d–f), suggesting differences in the connectivity of these communities. Fig. 5. Community structure of the primed state pluripotency network. [161]Fig. 5 [162]Open in a new tab a Schematic representation of parameters for defining node hierarchy based on their regularized out-degree (ROD) and betweenness score. b Distribution density for ROD across all MRs. The vertical dotted lines indicate the point of equal probability for a mixture of Gaussian models fitted to the data, defining the boundaries for Listeners, Communicators and Speakers. c Heatmap showing network parameters for the 77 MRs, showing scaled values for Degree, ROD and betweenness scores. d–f Violin plots showing distribution of regularized out-degree (d), betweenness (e), and degree (f) (FDR < 0.05, 2-tailed U-test). g Heatmap of the integrated meta-protein activity change for each community across the five differentiation treatments. Each cell represents the weighted average of the relative change in activity vs. untreated EpiSC controls across all MRs of each community, with the weight calculated based on the MR community score (Supplementary Fig. [163]8). h Heatmap showing GOBP pathways preferentially enriched (-log[10](FDR), 1-tailed aREA test) on each of the four communities. These analyses identified 6 Speaker, 75 Listener, and 39 Communicator MRs, as well as 4 Mediator MRs (Oct4, Cited2, Utf1, and Otx2) (Figs. [164]4a, [165]5a, c; Supplementary Fig. [166]9b). The Speakers include three ETS transcription factors (Etv1, Etv6, and Etv4), likely due to their roles in mediating FGF signaling, which is involved in maintaining primed state pluripotency^[167]7. Interestingly, Zfp207, which has only recently been characterized in regulation of pluripotency^[168]48,[169]55, was identified as a Speaker, while Utf1 was the only Speaker or Mediator not strongly associated with any of the four communities. Notably, two well-characterized pluripotency regulators, Oct4 and Otx2, were identified as both Speakers and Mediators (Fig. [170]4a; Supplementary Fig. [171]9b; Supplementary Data [172]8.2), reinforcing their significance as key pluripotency regulators. Next, we asked whether the arrangement of MRs into communities might reflect differences in their functions in pluripotency maintenance or lineage specification. Therefore, we examined the differential activity of each MR in the network along the five lineage differentiation time courses. Interestingly, nearly all MRs displayed similar patterns, with ~10% and ~90% of the MRs showing time-dependent increased and decreased activity along the five differentiation treatments, respectively, suggesting a highly coordinated program (Supplementary Fig. [173]8b). To determine whether individual communities may be associated with specific lineage differentiation paths, we then integrated the activity of the MRs in each community weighted by their community scores (Supplementary Fig. [174]8a). Interestingly, the average activity of all four communities was significantly reduced across all five lineage differentiation signatures (Fig. [175]5g), indicating that the four communities are not associated with any specific lineage differentiation trajectory. Finally, to determine whether individual communities oversee distinct biological functions, we performed pathway enrichment analyses on the gene expression signature obtained by silencing of each individual MR. We then integrated the enrichment scores for the MRs assigned to each community while weighting their contribution by the community association scores (Fig. [176]5h and Supplementary Fig. [177]8a). This analysis identified 250 Gene Ontology-Biological Process (GOBP) terms enriched in at least one community, with 57 enriched in all four (FDR < 0.01, 1-tailed aREA test; Supplementary Data [178]9.1). Common biological terms included “developmental processes,” “nucleotide processing,” “protein modification,” and “cellular transportation events,” all of which are involved in pluripotency regulation. In contrast, 81 GOBP terms were differentially enriched only in a single community (Fig. [179]5h; Supplementary Data [180]9.2). Community 1, for instance, is enriched in “nucleotide processing,” “protein modification,” and “other intracellular activities,” Community 2 in a range of developmental processes and cellular signaling pathways, and Community 4 in metabolic pathways. Interestingly, Community 3 was only associated with a single category, “platelet formation,” thus defying a more general characterization. These findings suggest that while different communities are associated with distinct GO biological processes, changes in the activity of their constituent MR proteins—primarily progressive inactivation occurring over the short time scale of our study—were observed similarly across all five differentiation protocols, as indicated by Fig. [181]5g. Thus, pathway enrichment analysis revealed both cooperative and unique roles of MR communities in regulating primed state pluripotency. Discussion Pluripotent cell states and dynamic transitions between these and more differentiated states have long been utilized as model systems for reconstruction of gene regulatory networks. For example, development of the CellNet platform revealed a regulatory network for somatic cell reprogramming and identified factors modulating the efficacy of these cell fate conversions^[182]56–[183]59. However, earlier studies of pluripotency networks have focused only on a relatively small set of regulators, many of which had been previously identified in the literature. Our current network for primed state pluripotency is notable for its size, unbiased generation, and derivation by computation followed by direct experimentation. Furthermore, the size of our network has allowed analysis of its overall topology and elucidation of its regulatory logic. Our analysis includes several advances that, taken together, produce an innovative and highly generalizable methodology. Specifically: (a) Nodes (i.e., MRs) in the regulatory network were identified by an unbiased methodology that is not based on prior knowledge of gene function; (b) Node identification was not dependent upon differential expression of the corresponding gene, but rather on the role of the encoded protein in regulating the overall transcriptional state of the pluripotent cell (differential activity), as determined by the differential expression of its transcriptional targets. This is critical since transcription factor activity is often regulated post-translationally rather than transcriptionally; (c) Network interactions (i.e., edges) and causality were dissected based on the effect of regulatory proteins on the inferred activity of other regulatory proteins, following experimental silencing of 132 confirmed MRs identified by the analysis; (d) Assessment of pluripotency integrated the effect of individual regulators on Oct4 protein expression as well as on induction of lineage-specific differentiation signatures, providing an approach to identify both Oct-4 dependent and Oct4-independent pluripotency regulators; (e) Analyses of network architecture utilized concepts of centrality and betweenness, which are critical properties of social network interactions^[184]60,[185]61. This approach is instrumental in illuminating the control of information flow in primed pluripotent cells, based on the well-studied concept in graph theory that some nodes may play a critical role in mediating the information flow between communities (Mediators) or in controlling entire programs (Speakers)^[186]60. In combination, these features result in an unbiased primed state pluripotency network architecture that is substantially larger than described in previous studies. We have shown that 76 of the 120 regulatory proteins that control pluripotency in our network are organized within four regulatory modules, termed “communities,” where the highly interconnected architecture of each community is countered by a much sparser connectivity between communities. This network architecture contrasts significantly with the previously hypothesized “hierarchy” and “balance” models. Instead, our findings suggest that, in mouse primed state pluripotent cells, the “hierarchy” and “balance” models coexist in a hybrid form that we term “communal interaction”. Specifically, we propose that primed state pluripotency is maintained by the balanced activities of four regulatory communities, with a small number of community members providing hubs for information flow. Each community represents a distinct functional unit yet is not closely associated with lineage-specific differentiation, as originally proposed in the balance model. In principle, identification of 8 MR proteins as either Speakers or Mediators might lend support to the existence of core factors in the network. Notably, Mediators and Speakers include Oct4 and Otx2, consistent with their essential roles in primed state pluripotency^[187]42,[188]62,[189]63. In addition, the identification of ETS transcription factors Etv1, Etv4, and Etv6 as Speakers is consistent with the central role of FGF signaling in maintaining primed state pluripotency^[190]7. However, previous studies have not suggested a critical role for Zfp207, Utf1, and Cited2 in controlling primed state pluripotency, suggesting that these proteins may deserve further investigation. Importantly, EpiSCs differ from mESCs in their lack of requirement for key regulators of the naïve pluripotent state such as Sox2 and Nanog^[191]64,[192]65, and notably these genes were not detected as having central roles. Overall, however, the strong association of 7 of the 8 Speaker/Mediator MRs with one of the 4 communities argues against a dominant role of these MRs. Rather, the nearly equivalent levels of hierarchical organization and centrality among the four communities (Fig. [193]5d–f) supports a more decentralized architecture of the primed state pluripotency network. It is important to note some limitations of the approach that may affect the completeness and accuracy of our regulatory network for primed state pluripotency. First, the analysis is limited to 1393 genes annotated as transcription factors. Second, our original EpiSC interactome was derived from microarray data rather than from RNA-seq, which could have produced higher-quality transcriptomic profiles that might have improved the resolution of the network. Third, MRs are often synergistic, even if they fail to produce an observable phenotype on an individual basis^[194]29,[195]30; consequently some of the MRs identified by this study may lack a detectable phenotype in vitro and/or in vivo, suggesting that the number of MRs contributing to primed state pluripotency maintenance may be larger than that validated by our functional assays. Finally, this study was performed using bulk, rather than single-cell profiles, and consequently may not account for potential heterogeneity within EpiSC cultures. More broadly, our work provides a general road-map for elucidating the regulatory logic of a complex cell state. In particular, when combined with gene expression and protein activity analysis, our methodology for low-cost, high-throughput RNA-sequencing facilitates practical direct experimentation to infer or refine the architecture of any regulatory network. The inclusion of ATAC-seq data, in particular for targets of transcription factors with known binding motifs, might further enhance this strategy to strengthen the network inference. Furthermore, our study provides a rich and valuable resource to investigate molecular mechanisms that govern stemness. Future studies will promote development of innovative strategies to control the dynamics of pluripotency and may therefore hold considerable promise for translational applications in regenerative medicine. Methods EpiSC derivation and maintenance All experiments utilized the EpiSC-5 line^[196]8, which was derived from 129SvEv mice, or the EpiSC-G line. As described in the sections below, both lines were used for generation of the expression profiles for initial interactome assembly using the ARACNe algorithm, but the EpiSC-5 line was solely used for all other assays, including time course differentiation signatures, shRNA-based validation, PLATE-seq assays, and CRISPR-mediated validation. We derived the EpiSC-G line from C57BL/6 J (Jackson Laboratory) mice as described^[197]7,[198]8, with the following minor modifications. In brief, C57BL/6 pregnant female mice were euthanized at 5.5 dpc, and pre-gastrulation epiblast was dissected intact and separated from extraembryonic ectoderm using glass needles in calcium- and magnesium-free PBS. Dissected tissue was treated with Cell Dissociation Buffer (Gibco 13151-014) at 4 °C for 15 min. Epiblast cells were cultured on inactivated mouse embryonic fibroblasts (MEFs) in EpiSC basal medium containing DMEM/F12, 1x GlutaMAX supplement (Gibco 10565-018), 20% knockout serum replacement (Gibco 10828-010), 1x MEM non-essential amino acids (Gibco 11140-050), 0.1 mM 2-Mercaptoethanol (Gibco 21985-023), 10 ng/ml Activin (Gemini 300-356 P), and 12.5 ng/ml FGF2 (Peprotech 100-18B). Every 3-4 days, cells were dissociated with 2.5 U/ml Dispase (Stemcell Technology 07913), and re-plated as small clumps at ~1:6 ratio onto a new MEF layer. After 8 ~ 10 passages, cells were dissociated, aliquoted, and frozen in cryovials in 50% basal medium/40% knockout serum replacement/10% DMSO. The established cell line was verified for normal male karyotype and tested negative for mycoplasma. EpiSC lines were cultured under feeder-free conditions in Matrigel-coated plates with filtered MEF-conditioned media (EpiSC basal media harvested after culturing with inactivated MEFs for 24 h) supplemented with 12.5 ng/ml FGF2 and 10 ng/ml Activin (CMAF). In experiments requiring cell quantitation, cells were dissociated with 0.05% Trypsin or TrypLE and counted with an automated cell counter (Bio-Rad TC20). Cells were then re-plated onto Matrigel-coated plates with CMAF supplemented with 10 uM Rock inhibitor Y-27632. EpiSC differentiation for interactome generation and analysis To generate gene expression profiles for interactome assembly, we used the EpiSC-5^[199]8 and EpiSC-G lines. Cells were plated at a density of 50,000 cells/well in 6-well plates coated with Matrigel and 2 ml CMAF + 10 µM Y27632. At 24 h after plating, cells were maintained in CMAF or switched to EpiSC basal media (Supplementary Data [200]2.1), followed by addition of each of 33 perturbagens (Supplementary Data [201]1). Each one of five differentiation-inducing treatments (Supplementary Data [202]1) was then added at 12 h after perturbagen addition; finally, cells were harvested at 60 h after initial plating. For embryoid body formation, perturbagen-treated EpiSCs were dissociated and plated onto AggreWell^TM 400 plates (StemCell Technology 34411), with 2 ml of EpiSC basal media at 1.2 million cells/well. Plates were spun for 5 min at 1500 rpm to settle the cells. At 24 h after plating, the nascent embryoid bodies were transferred to 6-well, low-attachment plate cultures (Corning 3471) for an additional 12 h before harvest. Gene expression profiles were generated as described in the transcriptomic profiling section. Differentiation time course data were generated using the EpiSC-5 line. For retinoic acid treatment^[203]66, cells were cultured with 5 µM retinoic acid in MEF conditioned media without supplementation of FGF2 or Activin A (CM) for 72 h. For BMP4 treatment^[204]67, cells were cultured for 72 h in CM + 100 ng/ml BMP4. For SB431542 treatment^[205]68, cells were cultured for 72 h with 10 nM SB431542 in culture media containing DMEM/F12 with 1x GlutaMAX/1x insulin-transferrin-selenium/1x NEAA/2% B-27 (Gibco 17504-044)/90 µM 2-Merceptoethanol. For endoderm differentiation^[206]69, cells were first cultured for 24 h in RPMI (Gibco 21870-076)/GlutaMAX/Activin A (100 ng/ml)/Wnt3a (25 ng/ml)/0.075 mM EGTA, followed by another 48 h in RPMI/GlutaMAX/Activin A (100 ng/ml)/0.2% FBS. Mesoderm differentiation was performed according to^[207]70, by culturing cells in basal media + 25 ng/ml Wnt3a + 50 ng/ml of Activin A on Day 1; Basal media + 25 ng/ml Wnt3a + 25 ng/ml Activin A + 20 ng/ml FGF2 on Day 2, and basal media + 25 ng/ml Wnt3a + 10 ng/ml Activin A + 20 ng/ml FGF2 + 40 ng/ml BMP4 on Day 3. Gene expression profiles were generated as described in the transcriptomic profiling section. Transcriptomic profiling To generate gene expression profiles for interactome assembly and MR analysis, cells were lysed with Trizol (ThermoFisher 15596018), and transferred into Eppendorf tubes for short-term storage at −80 °C. We used 600 µl of cell lysate for total RNA extraction (Clontech 740955), followed by processing for microarray (for interactome assembly) or RNA sequencing (for differentiation signatures). For microarray analysis, total RNA was biotin-labeled using the Illumina TotalPrep RNA Amplification Kit (ThermoFisher AMIL1791) and hybridized on mouseWG-6 v2 BeadArrays (Illumina). Slides were scanned using an iScan (Illumina SY-101-1001) to generate the resulting files. For RNA sequencing, samples were submitted to the Columbia Genome Center for library preparation and sequencing. In brief, mRNA was enriched by poly-A pull-down, and library synthesis was performed using the Illumina TruSeq RNA prep kit (Illumina RS-122-2001). Libraries were pooled and sequenced on the Illumina HiSeq 2500 or 4000 platforms, yielding approximately 30 million single-ended reads per sample. EpiSC perturbation (pilot project) To guide the design of the EpiSC perturbation assays for interactome generation, we performed a pilot experiment. Specifically, Illumina BeadArrays (mouse WG-6, Illumina) were used to profile the transcriptome of 33 independent EpiSC samples. This experiment was divided in three arms. The first arm consisted of cells perturbed with 10 small molecule compounds (100 nM rapamycin, 250 nM PF573228, 2 μM dorsomorphin dihydrochloride, 25 μM BAY 11-7082, 50 μM forskolin, 5 μM BAPTA-AM, 500 pM docetaxel, 50 nM wortmannin, 2 μM purmorphamine, and 2 μM SU5402) and vehicle control for 24 h. For the second arm, cells were first perturbed for 36 h with 5 μM RA and for 24 h with 10 small molecule compounds and vehicle control. Specifically, the cells were exposed to either control media or RA for 12 h, small molecule compounds were then added, and cells were further incubated for 24 h. The third arm consisted of the same RA/compounds combination experiment but was performed using an inverted perturbagen vs. RA treatment order (i.e., 24 h RA and 36 h perturbagen treatment). Gene expression profiles were generated using an iScan BeadArray scanner (Illumina) and pre-processed by variance stabilization and robust spline normalization, as implemented by the lumi R package^[208]71. Gene expression profiles are available from the Gene Expression Omnibus ([209]GSE197632). A 2D plot (Supplementary Fig. [210]1a)—obtained by projecting the data on the first two principal components, after scaling by subtracting the mean and dividing by the standard deviation of each gene—showed that treating first with perturbagens and then with RA induced greater data dynamic range than the opposite order. We thus decided to perform the larger study by first treating with each perturbagen for a total of 36 h, and with each morphogen for 24 h, thus maximizing the dynamic range for the reverse engineering of an EpiSC-specific interactome (see below). In particular, the 36 h time point was selected to maximize gene expression dynamic range while minimizing the likelihood of chromatin remodeling, which otherwise would alter the resulting network topology. Dimensionality reduction analysis To project the data in 2 dimensions, we used Principal Component Analysis (PCA), as implemented by the prcomp function of the stats package in R-system. We also used the ‘umap’ v0.2.7.0 R package to perform Uniform Manifold Approximation and Projection (UMAP) analysis, as available from the CRAN repository ([211]https://cran.r-project.org/web/packages/umap/index.html). Generating an EpiSC context-specific interactome The transcriptomes of the two EpiSC cell lines—following treatment with combinations of 33 perturbagen (36 h) and 5 morphogens (24 h), in two growth media conditions (CMAF and Basal)—were profiled by Illumina BeadArrays (mouse WG-6 v2, Illumina). Poor quality profiles were removed using the built-in Illumina BeadArray software, for a total of 276 high-quality gene expression profiles. (Supplementary Data [212]2.1). Gene expression profiles were generated using an iScan BeadArray scanner (Illumina) and pre-processed by variance stabilization and robust spline normalization, as implemented in the lumi R package^[213]71. Gene expression profiles are available from the Gene Expression Omnibus ([214]GSE197414). The profiles were then analyzed using the ARACNe algorithm^[215]24,[216]72 to reverse engineer an EpiSC-specific transcriptional interactome representing the transcriptional targets of 1478 regulatory proteins. The latter included those annotated as GO:0003700 (transcription factor activity), GO:0004677 (DNA binding), and GO:0030528 (transcription regulator activity), or as GO:0004677 and GO:0045449 (regulation of transcription) in the Gene Ontology (GO) Molecular Function database^[217]37. To generate an optimal consensus network, we performed 100 bootstrap iterations of the ARACNe analysis. Parameters were set to DPI = 0 (data processing inequality tolerance) and MI P-value p ≤ 10^−8 (mutual information threshold). The resulting interactome comprises 911,753 transcriptional interactions (Supplementary Data [218]3.1). Predicting master regulators of pluripotency and differentiation To generate time courses representing the path from a pluripotent to a differentiated cell state, we generated RNA-Seq profiles of EpiSC-5 cells undergoing 5 distinct differentiation treatments (differentiation arms), at 3 h, 6 h, 12 h, 18 h, 24 h, 36 h, 48 h and 72 h (Supplementary Data [219]2.2). RNA-Seq profiles were generated with the Illumina TruSeq protocol, using a HiSeq 2000 instrument (Illumina), at the Columbia Sulzberger Genome Center. Sequencing data were mapped to the mouse genome MGSCv37 with TopHat v1.3.3^[220]73. Reads mapping to known genes, based on Entrez gene identifiers, were then counted using the GenomicFeatures R package, from Bioconductor^[221]74. Summarized raw counts for 23,283 known genes in 144 samples are available from the Gene Expression Omnibus ([222]GSE199114). We computed a size factor for each sample using the median log-ratio method and identified as outliers the samples showing residuals from a linear fit of the computed size factors vs. sequencing depth (total number of mapped reads) greater than 50%. Samples identified as outliers were removed from further analyses. Gene expression data were then normalized by equivariance transformation, based on the negative binomial distribution, with the DESeq R package^[223]75. The expression likelihood for each gene was estimated by fitting a mixture of three Gaussian distributions to the Log (RPKM + 1) data, across all genes in the integrated dataset, i.e. raw counts for each gene aggregates across all control samples, with the mixtools package^[224]76, and expressed as 1 minus the relative likelihood for an observation, expressed as Log (RPKM + 1), to be derived from the first (lowest mean) distribution. For each differentiation arm, gene expression signatures were computed at each time point by differential gene expression analysis against the profiles of undifferentiated EpiSC-5 cells (t = 0 h) as the reference, using Student’s t-test (Supplementary Data [225]4.3). Protein activity signatures were then assessed using the msviper function of the VIPER R package, available from Bioconductor^[226]25. Normalized Enrichment Score (NES) and associated p-values were estimated by permuting sample labels uniformly, at random 1,000 times. For time points with fewer than 5 replicates, the closest samples from contiguous time points were included, up to 5 sample in total, to support the permutation-based estimation of NES and p-values (Supplementary Data [227]4.4). These were assessed based on the similarity of their gene expression signatures using Euclidean distance. For each differentiation arm and time point, differential protein activity p-values were assessed from the VIPER-computed NES. To account for proteins that could be activated in some conditions and inactivated in others, p-values were further integrated using the Fisher’s method. Finally, to compensate for a potentially non-statistically independent contribution of the mesoderm and endoderm differentiation arms—which produced strongly correlated protein activity signatures—we down-weighted their contribution by half during Fisher’s p-value integration. The resulting ranked protein list identified 300 candidate MRs (p < 10^-4) (Supplementary Data [228]4.1). Among these, 134 (45%) had been previously implicated in pluripotency regulation in the literature (Supplementary Data [229]4.2). To prioritize candidate MRs for an experimentally feasible shRNA-based validation assay, we selected candidate genes using the following four criteria: (1) The top 120 most differentially active proteins, as ranked by p-value, were included by default; (2) 70 additional proteins were selected uniformly at random, from proteins ranked between positions 121 and 300 by p-value; (3) 30 previously established pluripotency regulators—including 18 that were not identified among the top 300 MRs, and 12 that were not represented as transcriptional regulators in the EpiSC interactome—were also included; (4) Finally, since MR validation was based on silencing assays, we removed those not expressed in EpiSC (likelihood > 0.5). Taken together, these criteria produced a list of 172 candidate MRs for experimental validation (Supplementary Data [230]6.2). Note that of the 18 previously identified pluripotency regulators that were not in the original list of 300 candidate MRs, only 4 (Gli2, Jarid2, Sall4, and Sox2) remain among the 132 MRs used in the final network analysis. Thus, 128 out of the 132 MRs in the final network were identified in a completely unbiased manner. High-throughput shRNA screen Lentiviruses containing shRNA hairpins were purchased from the MISSION® shRNA Lentiviral library (Sigma) (Supplementary Data [231]4.1). To minimize off-target effects, a minimum of three independent hairpins were used for > 80 of candidate MRs. EpiSCs were passaged onto matrigel-coated 96-well plates (Greiner 655090) at a density of 3,000 cells/well in 100 µl CMAF medium supplemented with 10 µM Y27632. At 12 h after passaging, 50 µl of the culture medium was replaced with 50 µl lentivirus with 8 µg/ml polybrene. At 24 h after lentivirus transduction, the medium was replaced with 100 µl CMAF + 2 µg/ml puromycin (Sigma P7255) for selection of stable integrants; puromycin concentration was increased to 4 µg/ml at 48 h. At 60 h, cells were fixed with 4% paraformaldehyde (PFA) and immunostained for Oct4 (Pou5f1) (Santa Cruz sc-5279, 1:1,000) using an Alexa Fluor 488 secondary antibody (Invitrogen A11029, 1:500). Cells were counter-stained with Hoechst 33342 (ThermoFisher H3570, 1:5,000), and plates were scanned using the IN Cell analyzer 2000 (GE Healthcare). For each well, 12 square-shaped fields were imaged for two channels, and exported images were analyzed by the IN Cell Analyzer workstation software. Oct4 immunofluorescence data analysis We used the IN Cell Image Analyzer to perform immunofluorescence intensity quantification at the individual cell level. This instrument can scan wells in a 96-well plate and exports image files using two channels (Green Fluorescence (GF) and DAPI). These data were analyzed using the IN Cell 2000 workstation software. By setting up diameter parameter as 5um~10um, the software first analyzes the DAPI-channel to identify individual nuclei. The green fluorescence of each nucleus is then quantified by scanning the 488 nm channel. CSV files comprising the green fluorescence quantification for the cells in each well were then exported by the software. The data was acquired in 22 96-well plates (batches), with each plate assaying a subset of MR-directed and negative control shRNA hairpins. Overall, the instrument quantified the fluorescence of 23,317,327 individual cells. These include 6,537,819 cells representing control conditions, 15,243,283 cells transduced with candidate MR-targeting shRNA hairpins, and 1,536,225 cells transduced with shRNA clones targeting established pluripotency regulators. In total, 623 and 47 hairpins, targeting 158 candidate MR and 14 established pluripotency regulators, were tested, respectively. Cells with fluorescence values higher than 6 inter-quartile range (IQR) units above the median of control conditions were removed from further analysis as outliers. The data were then normalized, independently for each plate, using the plate-specific control conditions. A cumulative empirical density distribution was computed for the control conditions of each plate and used to estimate the normalized fluorescence z-score of each cell. By using plate-specific controls, the analysis eliminated all systematic difference between plates (batch effect). Data were summarized on an individual well basis by computing the area over the cumulative empirical distribution across all cells in the well, thus eliminating well-specific cell density as a confounding factor. Next, the statistical significance for the differential Oct4 protein level was estimated by comparing the summarized replicates for each hairpin to the summarized replicates of control wells, using Student’s t-test. Finally, to compute a gene-level z-score, the statistics of each associated shRNA clone were integrated using Stouffer’s method. To account for different silencing efficiencies, the shRNA-mediated reduction in the activity of the target protein (see next section below), expressed as z-score, was used to weight the contribution of the corresponding shRNA. Thus, shRNAs failing to inhibit the target proteins provided no contribution to the score. The final Oct4 corroboration score (CS) was computed by multiplying the Stouffer’s integrated z-scores for the corresponding gene by the sign of the MR protein activity change during differentiation (i.e., +1 for activity increase and -1 for activity decrease). Finally, we derived the Oct4 protein score as the log-transformation of CS while maintaining its sign: [MATH: Oct4proteinscore=log2(CS+1)< /mo>sign(CS) :MATH] 1 PLATE-seq analyses For PLATE-seq (Pooled Library Amplification for Transcriptome Expression), samples were prepared as described^[232]44, with minor modifications. Total RNA from 100 µl of Trizol-dissolved cell lysate was purified using the “No Spin” method of the MagMAX-96 for Microarrays kit (Ambion AM1830). RNA was eluted using 50 µl of elution buffer, and the concentration measured by Nanoview. 100 ng total RNA from each sample was diluted to 15 µl using nuclease-free water and transferred to 96-well plates (Applied Biosystems N8010560). We added 1 µl of 1:1,000 dilution of ERCC Ex-Fold Spike-Ins (ThermoFisher 4456739) to every other column of the plate to check for potential inter-sample contamination during library preparation. To anneal primers for reverse transcription, 3 µl of 100 µM barcode-linked oligo-dT primer was added to each well with 5 µl of 5x ProtoScript RT buffer (New England Biolabs M0368). The plate was incubated at 94 °C for 2 min and immediately cooled on ice for at least 5 min, followed by addition of the reverse transcription master mix: 2.88 mM dNTPs, 6.4 mM DTT, 0.14 U/µl SuperaseIN, 1.44 U/µl Protoscript II Reverse Transcriptase, and water to a final volume of 25 µl. The plate was then incubated at 42 °C for 2 h. To remove excess RT primer, 2 µl of 1:4 diluted ExoI (20 U/µl) was added to each well, and the plate incubated at 25 °C for 1 h. To hydrolyze the RNA, 10 µl of 1:1 mixture of 1 M sodium hydroxide and 0.5 M EDTA was added followed by incubation at 65 °C for 15 min. All wells on the plate were then pooled in a 50 ml conical tube. The pooled sample was then purified and concentrated using DNA Clean & Concentrator Kit (Zymo Research D4013) using a modified protocol. The sample was first diluted seven-fold using DNA binding buffer (Zymo Research D4003-1-L), and passed through the binding column using a vacuum apparatus. The column was washed twice with washing buffer, and eluted with 15 µl of nuclease-free water. The eluted sample was further purified using AMPure XP PCR purification beads (Beckman Coulter A63880) at a 1:1 sample to bead ratio, and eluted with 15 µl of nuclease-free water. For second-strand synthesis, 1 µl of 10 mM dNTPs and 1 µl of 100 mM adapter-linked random hexamer primer were added, followed by heating at 70 °C for 2 min and immediate cooling on ice for 5 min. 2 µl of NEBuffer 2 (New England Biolabs B7002) and 1 µl of Klenow large fragment DNA polymerase (New England Biolabs M0210) were then added, followed by incubation at 25 degrees for 30 min and two rounds of purification with AMPure XP beads. The double-stranded cDNA was eluted into 15 µl nuclease-free water and concentration measured using a Qubit Fluorometer 3.0 with dsDNA high-sensitivity assay kit (ThermoFisher [233]Q32851). For library amplification, the cDNA was diluted to 0.05 µg/ml and PCR-amplified using Phusion DNA polymerase (New England Biolabs M0530) with 0.5 mM dNTPs, 0.5 µM Illumina RP1 and RPl1 primers. The PCR product was purified using AMPure XP beads for one round and eluted into 15 µl nuclease-free water. 2 µl of the product was analyzed by bioanalyzer with an expected broad peak from 200 bp to 1,000 bp and average length of product 500 bp. 1 nM of PCR product was hydrolyzed with an equal amount of 0.2 M sodium hydroxide, and mixed with hydrolyzed PhIX control library (Illumina FC-110-3001). The final product contained 1.8 pM library with 30% PhIX/70% sample in 1.3 ml HT1 buffer. The product was loaded on the NextSeq 500 high output v2 kit (Illumina FC-404-2005) together with two custom primers and two PhIX primers at final concentration of 300 nM each. The sample was loaded onto an Illumina NextSeq 500 sequencer for paired-end sequencing, with Read 1 (26 cycles) identifying the 8 bp barcode for sample well information and Read 2 (66 cycles) to determine the identity of the mRNA. All PLATE-seq oligo sequences are listed in Supplementary Data [234]10.4. EpiSC expression profiles after candidate MR silencing We used the PLATE-seq technology^[235]44 to generate gene expression profiles from EpiSC cells, at 72 h following lentiviral-mediated shRNA transduction. Sequencing reads were de-multiplexed as previously described in ref.^[236]44, mapped to the mouse genome build GRCm38 (mm10), and counted at the known gene level with STAR aligner v2.5.2^[237]77. Summarized gene expression counts are available from the Gene Expression Omnibus database ([238]GSE199855). Expression data were normalized by equivariance transformation, based on the negative binomial distribution with the DESeq R-system package^[239]75. For each candidate MR-targeting shRNA clone, differential gene expression signatures were computed, against plate-matched, non-targeting controls, as a reference. Statistical significance was assessed by moderated Student’s t-test, as implemented in the limma package from Bioconductor^[240]77 and represented as a z-score vector. For each MR-targeting shRNA, differential protein activity was assessed by VIPER analysis, using the EpiSC interactome. To control for differences in regulon size (number of transcriptional targets per regulon), all regulons were capped to the top 50 targets, based on their interaction likelihood. Inverted regulon analysis We used the silencing assays to identify regulons likely to predict an inverted activity for the associated regulatory protein. Specifically, we assessed the change in VIPER-assessed protein activity following MR silencing and changed the regulon sign of candidate MRs, whose activity was assessed as increased following silencing. This addresses a problem originally reported in the VIPER manuscript^[241]25, whereby the activity change may be inferred in the opposite direction due to negative autoregulatory loops that may anticorrelate the expression of a regulator and that of its targets. To identify the regulons reporting an inverted activity, we fitted a mixture of two Gaussian distributions to the empirical distribution of estimated activity z-scores, for all silenced regulators, representing the distribution of differential protein activity for the inverted and non-inverted regulons. Then, based on these two Gaussians, we estimated a conservative threshold to identify regulons with an inverted regulatory sign. This was computed as the z-score corresponding to a likelihood ratio of 4 between the Gaussian models fitted to the inverted and non-inverted regulons, respectively. Differentiation signature score We then assessed whether the changes induced by candidate MR silencing would recapitulate those observed in the differentiation time-courses. This was achieved by computing the enrichment of the 100 most differentially active transcriptional regulators, following shRNA-mediated silencing, in proteins differentially active at 72 h in the morphogen-mediated differentiation assays, using the aREA algorithm^[242]25. For these analyses, protein activity was again assessed using regulons capped at the top 50 transcriptional targets. The Normalized Enrichment Score (NES) of each MR-specific hairpin and for the 5 lineage specific differentiation assays, was integrated using a weighted implementation of Stouffer’s method. Mesoderm and endoderm differentiation were included with half the weight of other lineages since they were strongly correlated. In addition, as discussed above, the efficiency of each hairpin in reducing the activity of the target protein, expressed as z-score, was also used to weigh its contribution. The final NES was multiplied by the sign of the MR protein activity change, change to obtain the differentiation z-score (DZ), such that concordant and discordant effects would have a positive and negative NES, respectively. Finally, we derived the Differentiation Score as the log-transformation of DZ while maintaining its sign: [MATH: DifferentiationScore=log2(DZ+1)sign(DZ) :MATH] 2 Selection and evaluation of pluripotency MRs In the first assay using Oct4 protein level as the readout for pluripotency alteration following silencing, we tested 172 selected genes expressed in EpiSCs, including the top 95 most differentially active candidate MRs expressed in EpiSC, 63 additional candidate MRs expressed in EpiSC and ranking between 121 and 300, and 14 additional genes previously known to be involved in pluripotency. An additional candidate MR, Kmt2a, could not be evaluated because all hairpins used for targeting did not pass the data QC in either of the two assays below. From the 172 genes tested, 117 MRs (68%) were found to down-regulate Oct4 protein levels (FDR < 0.05) and 15 MRs (8.7%) were found to up-regulate Oct4 when silenced. In a second, parallel assay using PLATE-seq datasets from silenced cells, we obtained analyzable datasets after QC for 154 of the 172 MRs—including all the top 95 most differentially active candidate MRs expressed in EpiSC, 54 candidate MRs ranking between 121 and 300, and 5 genes previously known to be involved in pluripotency—following their individual silencing in EpiSCs. Among those 154 MRs, a total of 132 were confirmed by PLATE-seq based assay to induce an expected differentiation state when silenced (positive DS with FDR < 0.05; Fig. [243]2d). CRISPR/Cas9 editing To minimize off-target effects, we used a minimum of two independent sgRNAs for each candidate MR. sgRNA sequences were optimized using two online sgRNA design tools (at [244]https://design.synthego.com/ or the site formerly available at [245]http://crispr.mit.edu). For most experiments, we selected sgRNAs with low off-target effects, high on-target efficiency, and targeting to a common exon shared by all splice isoforms in the first half of the coding region. In some cases, we selected a pair of sgRNAs targeting intronic regions that flank a common exon located in the first half of the coding region to achieve targeted deletion of the entire exon. sgRNA primers were ordered from Integrated DNA Technologies. Single-stranded primers were annealed into double-stranded sequences with restriction enzyme engineered ends, and cloned into lentiGuide-Puro (Addgene 52963) by the Golden Gate method as described in ([246]https://media.addgene.org/cms/filer_public/3e/e1/3ee1ce9c-99f9-40 74-9a28-109d34971471/zhang-lab-sam-cloning-protocol.pdf). All inserted sgRNAs were sequence-verified. For production of lentivirus, HEK293FT cells were grown to 90% confluency and transfected with pMD2.G, psPAX2 packaging vectors and the lentiviral expression construct, using Lipofectamine 3000 (ThermoFisher L3000001). Virus was harvested twice on two consecutive days, and titers were verified by Lenti-X GoStix Plus (Takara Bio 631280) before storage at −80 °C. We established a stable Cas9-expressing line by infecting EpiSC-5 cells with lentiCas9-Blast (Addgene 52962), followed by selection with 5 µg/ml blasticidin (Sigma 15205) for five days. Established Cas9-positive EpiSCs were confirmed for pluripotency by marker analysis as well as self-renewal and differentiation assays. To induce CRISPR/Cas9 mediated mutagenesis, Cas9-positive cells were infected with lentiGuide-Puro (Addgene 52963) inserted with sgRNA. At 24 h, cells were selected with 2 µg/ml puromycin for 2 days and passaged at clonal density (1000 cells/well on 6-well plates). At 72 h after passaging, single colonies were picked into 96-well plates; at least 16 colonies were picked for each sgRNA targeted line. Colonies were grown to confluency and expanded onto 12-well plates to establish clonal cell lines. To validate targeted mutations, cells were lysed with DirectPCR Lysis Reagent (Viagen Biotech 301-C). Genomic regions were PCR-amplified (Platinum SuperFi II PCR master mix, ThermoFisher 12368050) using a pair of primers at least 200 bp away from the 5’ and 3’ ends of the sgRNA targeted site. PCR products were diluted 1:40 and 10 µl mixed with 5 µl of 5 µM sequencing primer for Sanger sequencing. Results from true clonal cell lines displayed either a single ORF consistent with identical genome editing results on both alleles, or double peaks with equal height indicating different alleles on the homologous chromosomes. In the latter case, to determine sequences of the individual alleles, PCR products were cloned into TOPO vector (Strataclone Blunt PCR Cloning Kit, Agilent 240207) and selected on agar plates with ampicillin and kanamycin. Multiple colonies were mini-prepped (Qiagen 51306) and sequenced. RT-qPCR Total RNA was reverse-transcribed to cDNA using SuperScript Reverse Transcriptase II. Diluted cDNA was mixed with SYBR green master mix (Applied Biosystems [247]A25777) and qPCR reaction was performed on an ABI7500 instrument (Applied Biosystems) using conditions of 50 °C for 2 min; 94 °C for 2 min; 94 °C for 30 s + 60 °C 30 s for 40 cycles. The expression of Gapdh or Hprt was used to normalize the loading error, and the Delta-delta Ct method was used to quantify relative gene expression levels. Differentiation phenotype analysis As a secondary validation assay, aimed at assessing whether a differentiation phenotype would emerge following CRISPR/Cas9 based silencing, we focused on 70 of the 132 MRs that had not been previously reported as pluripotency MRs in the literature, as well as 4 additional MRs that were previously known. A total of 1150 clonal lines were generated by sgRNA targeting each one of these MRs, together with 128 scramble/mock controls. Among them, 520 lines contained biallelic frameshift mutations. Analyzing the morphology of those 520 lines identified 15 MRs whose knock-out significantly affected EpiSC morphology (Supplementary Data [248]7.1). In addition, we could not recover biallelic frameshift lines for 5 additional MRs (Chi-square test, p < 0.05; Supplementary Fig. [249]2a), suggesting that null mutants for these genes may be impaired in self-renewal and/or clonogenicity. Collectively, a total of 15 + 5 = 20 MRs were validated by CRISPR/Cas9 based analysis, including 3 of 4 genes selected as known MRs (75%) and 17 of 70 novel MRs (24%). As a follow-up, a panel of in vitro functional assays evaluating self-renewal and differentiation potential of these 20 MRs were performed using 40 knock-out (KO) lines for the 15 MRs with biallelic frameshifts and 11 knock-down (KD) lines for the remaining 5 MRs, as well as 9 empty or mock controls (Supplementary Data [250]7.2, [251]7.3). 2 MRs (Cbl and Zc3h13) display the strongest results and their KO lines were further evaluated by in vivo teratoma formation assay. Immunofluorescence and immunohistochemistry For adherent cells, cells were fixed with 4% PFA for 15 min and washed with 1x PBS. Cells were permeabilized by 0.1% TritonX-100 for 10 min. Blocking was performed using 5% goat serum or 2% non-fat milk, and primary antibody was diluted with 1% serum or non-fat milk for staining overnight at 4 °C. Cells were then washed three times with 0.05% Tween-20 and secondary antibody was diluted 1:500 with 1% goat serum or non-fat milk for staining at room temperature for 1 h, followed by washing three times with 0.05% Tween-20. Nuclei were stained with Hoechst 33342 (1:5,000 dilution in PBS) for 10 min and washed with PBS twice. For teratomas, hematoxylin and eosin (H&E) staining or immunohistochemistry was performed on 5-micron paraffin sections. For H&E staining, sections were rehydrated by serial incubation with xylenes three times, 100% EtOH twice, 95% EtOH twice, 70% EtOH once, and 50% EtOH once. Slides were dipped into hematoxylin for 3 min, followed by rinsing with deionized water and quick incubation with acid ethanol. Slides were then incubated with eosin for 30 s and then dehydrated by incubating with solvents in the reverse order of rehydration. The slides were then cover-slipped using Clearmount. For immunohistochemistry, rehydrated slides were blocked with 3% hydrogen peroxide for 20 min. Antigen retrieval was performed by heating in a steamer for 45 min. Slides were then permeabilized with PBS + 0.1% Triton-X for 15 min and blocked with 5% serum for 1 hr. Primary antibody was diluted with 1% serum and incubated with slides in humid chamber overnight at 4 °C. Secondary antibody staining used biotinylated antibodies at 1:500 dilution in 1% serum. Color detection used an ABC amplification kit and Vector NovaRed Substrate Kit (Vector Laboratories). Hematoxylin staining was performed by quick dipping slides into hematoxylin 3–5 times, followed by washing, dehydrating, and coverslipping as described above. Colony formation assays Confluent cells were dissociated, counted (Bio-Rad TC20), and replated at clonal density (500 cells/well) on 12-well attachment plates, in 1:2 diluted CMAF without Y27632. Media was refreshed once at 48 h after replating. At 96 h, cells were stained with methylene blue by incubating with 1% methylene blue diluted in 50% methanol for 5 min followed by washing with tap water for 10 min. Alternatively, cells were stained for alkaline phosphatase using a commercially available kit (Stemgent 00-0055). In a separate experiment at 120 h after plating at clonal density, cells were dissociated with 0.05% trypsin and resuspended in 1 ml basal media. Cell numbers in suspension were counted by the Bio-Rad TC10 cell counter and divided by the initial number of plated cells to determine the fold-change. Embryoid body formation Embryoid body formation was performed as described in refs. ^[252]78–[253]80, with minor modifications. In brief, EpiSCs were dissociated into single cells and diluted to 10,000 cells/ml in EB growth media containing DMEM/F12, 20% FBS, 1x MEM non-essential amino acids, 1x GlutaMAX, and 55 µM 2-merceptoethanol. Hanging drops were plated as 30 µl drops containing 300 cells on the top of a 150 mm culture dish. After 72 h, embryoid bodies were washed and imaged. Western blotting Attached cells were grown to confluency on 10 cm^2 dishes and dissociated with 0.05% trypsin. Cell suspension was lysed, measured and denatured by following protocol as described in PMID: 29625057. For SDS-PAGE, a 10–20 µg sample was loaded onto Bio-Rad pre-cast 4%–20% Mini-PROTEAN TGX gels (4561094S), run for 2–3 h at 120–140 V, and then transferred to PVDF membranes at 400 mM for 3 h. Primary antibody staining was performed overnight at 4 °C with antibodies diluted in 5% non-fat milk in TBST. HRP-conjugated secondary antibody was used to incubate the membrane for 1 hour at room temperature. Color reaction was performed using SuperSignal West Pico Plus Chemiluminescent Substrate (ThermoFisher 34577). In vitro lineage-specific differentiation analysis EpiSCs were differentiated into neuroectoderm and mesendoderm lineages as described in refs. ^[254]68,[255]81, with minor modifications. In brief, 500–5000 cells, depending on their growth rate, were plated in 24-well culture plates with 0.5 ml CMAF + 10 µM Y27632. The next day, differentiation was initiated in MEF conditioned media, lacking FGF2 and Activin A. For neuroectoderm differentiation, cells were treated with 10 µM SB431542 (Millipore Sigma S4317) + 12 ng/ml FGF2 for 7 days. For mesendoderm differentiation, cells were treated with 10 µM SU5402 (Millipore Sigma 572630) + 5 ng/ml Activin for 3 days, followed by 10 ng/ml BMP4 + 20 ng/ml FGF2 + 30 ng/ml Activin for 4 days. On day 4, after most wells reached confluency, cells were dissociated and replated onto 24-well and 96-well plates to achieve 30–50% confluency. Cells continued differentiation for another three days before harvesting for RT-qPCR analysis (24-well cultures) and immunofluorescence staining (96-well cultures). Image analysis Immunofluorescence-stained samples in 96-well plates (Greiner 655090) were imaged by the Cytation 5 Cell Imaging Multimode Reader using the Biotek Gen5^TM software. For each marker, exposure time for two channels (GFP: 488 nm and DAPI: 405 nm) was adjusted to reduce background and to maximize on-target signal. For each well, 9 square-shaped field were recorded for each channel and exported to a stitched image file. Two channel images from the same sample were pseudo-colored and overlaid by Fiji ImageJ. An image analysis pipeline was developed using the CellProfiler software (version 3.0.0). This pipeline scans the DAPI-channel image to mark the area containing the cell nucleus, and then scans the 488nm-channel image to quantify the nuclear staining intensity of the marker protein. For markers that stain cytoplasm such as Doublecortin, the area surrounding the nucleus is marked for green fluorescent intensity quantification. CSV formatted files containing green fluorescent intensity quantification of individual cells in each sample well were exported, which were then analyzed by an R pipeline generated by the Herbert Irving Comprehensive Cancer Center (HICCC) Confocal and Specialized Microscopy Shared Resource. The background intensity was estimated for each antibody marker using the CellProfiler software as the average intensity of pixels in an area with no cells. The R pipeline then scanned the data and identified cells that showed higher intensity of green fluorescence signal than background. The proportion of cells with positive marker expression was then calculated by dividing the number of cells with greater than background staining by the total number of cells recorded. To correct for batch-effects created by variations in experiments over time, the proportions of positive cells, expressed as percentages, were divided by the average percentage of positive cells in the control wells (wild type or untreated EpiSCs) on the same 96-well plate. Statistical significance for the difference between mutants and controls was estimated by two-tailed Mann-Whitney U-test. Teratoma formation Teratoma formation was performed as described^[256]82 with minor modifications. In brief, EpiSCs were dissociated into single cells, and 300,000 cells were resuspended in 50 µl Matrigel. Cells were injected into the gastrocnemius muscle of immunodeficient nude male mice (CrTac:NCr-Foxn1^nu, Taconic) at 10 weeks of age (n = 20), and mice were monitored daily to track tumor growth; male mice were used as graft hosts since the implanted EpiSC cells were male. Mice were euthanized when tumors grew to 2 cm diameter, around 25 days after injection. Tumors were dissected intact, weighed, cut into 1 mm sections by razor blade, fixed with 4% PFA or 10% formalin, paraffin-embedded, and processed for H&E staining or immunohistochemistry. All animal studies were approved by and conducted according to standards set by the Columbia University Institutional Animal Care and Use Committee (IACUC) under protocol AC-AABV7651. Mice were housed under specific pathogen-free (SPF) conditions in individually ventilated autoclaved cages with irradiated feed and automated RO (reverse osmosis) watering, under a 12 hr dark/12 hr light cycle with temperatures at 68–79 °F and humidity between 30 and 70%. MR interactome refinement To refine the transcriptional targets of each candidate MRs using experimental data, we integrated z-scores corresponding to (1) the differential expression (DE) of its candidate targets, following shRNA-mediated silencing, and (2) the mutual information (MI) between the MR and its candidate targets (see EpiSC context-specific interactome above). To avoid bias in assessing the statistical significance of these metrics, we rank-transformed and scaled between 0 and 1 both variables before integration, while preserving the differential expression (DE) score sign. Integration between MI and DE was performed using a weighed implementation of the Stouffer’s method. The DE contribution was weighted by the silencing efficiency of each shRNA hairpin, as previously discussed. Such weights were computed by a logistic transformation of the silencing efficiency scores with inflection point equal 1 and trend at the inflection point equal 3. MI scores were only limited to the regulator’s transcriptional interactions present in the original EpiSC interactome, in this way making full use of the data processing inequality filter. The “TFmode” component of the new regulons^[257]25 was computed for each target as the average between the direction-corrected “TFmode” from the EpiSC interactome and the DE score for the target in response to the regulator silencing. To estimate the “likelihood” component of the new regulons, we first computed two integration scores: (1) a non-directional integration score (NDIS) as the weighted Stouffer’s integration of the rank-transformed MI and |DE | , where |x| denotes the absolute value of x, and (2) a directional integration score (DIS) as the weighted Stouffer’s integration of the rank-transformed MI multiplied by the sign of the direction-corrected “TFmode” from the EpiSC interactome and the DE score. Finally, the direction-corrected “TFmode” from the EpiSC interactome was used to weight the contribution of the NDIS and DIS scores when computing the integrated interaction score (IS) as DIS x |TFmode | + NDIS x (1 - |TFmode | ). The “likelihood” component was then estimated by scaling the IS by its maximum value across all the targets for the regulator. The top 200 transcriptional targets per regulator, based on IS, were included in the interactome. For VIPER analysis, we limited the number of transcriptional targets per regulon such as the effective size—i.e., sum(likelihood^2)—was capped at 50. Experimentally refined network analysis Based on the new, experimentally-refined network, we used VIPER to assess the differential activity of 132 candidate pluripotency MRs following shRNA-mediated silencing. The NES scores for each shRNA targeting the same candidate MR were integrated using the Stouffer’s method, weighted to account for the hairpin silencing efficiency. A major advantage of this approach is that we could determine network directionality. ARACNe cannot determine the directionality of an interaction between two regulatory proteins A and B. However, if B’s expression or activity is affected by the experimental silencing of A, or vice-versa, this can effectively resolve this issue, thus leading to a bona fide causal network. This can also resolve situations where A regulates B and B regulates A (autoregulatory loop). The statistical threshold to determine the presence of a causal edge was determined by evaluating the relationship between the number of nodes (proteins) and edges at different thresholds. Based on this analysis, we inferred an edge between a silenced regulator P[r] and a downstream target protein P[t] conservatively, only when the activity of P[t] significantly changed in response to P[r] silencing (FDR < 10^−^10). The final network comprises 120 MRs and 1273 causal MR → MR edges (Fig. [258]4a). Of the validated MRs, 12 were not included in the network because no significant interactions with other MRs were identified by the analysis. Causal pluripotency MR network analysis Community analysis was performed with the Louvain multi-level modularity optimization algorithm^[259]54. The optimal number of communities was inferred by modularity optimization as implemented in the igraph package in Bioconductor, with the precision parameter set to the default of 1. For each node, we computed the betweenness centrality, defined as the number of shortest paths connecting all node pairs in the network that pass though the evaluated node; the degree, as the number of edges associated to each node; the in-degree, as the number of incoming edges for each node; and the out-degree, as the number of outgoing edges for each node, using the implementation for computing these centrality measurements in the igraph package. We defined the outlier nodes showing the highest betweenness as “Mediators”, because they mediate most of information flow in the network. We define the Regularized out-degree (ROD) for each node as the total number of outgoing interactions (out-degree) multiplied by the difference between its out- and in-degree, after scaling such differences between 0 and 1 along all nodes. The ROD analysis identified three proteins classes including (a) Speakers, defined as nodes with high levels of ROD, (b) Communicators, defined as nodes with intermediate ROD, and (c) Listeners, defined as nodes with low ROD. The ROD thresholds for considering a node as a Speaker, Communicator or Listener were inferred by fitting a mixture of 3 Gaussian distributions to the ROD distribution along all nodes. We defined the ROD threshold separating listener from communicator nodes as the point of equal probability between the first (lowest mean) and second fitted distributions. The speakers were defined by showing an ROD higher than the quantile of the third fitted distribution corresponding to a right-tail probability lower than 5% (Fig. [260]5b). Community functional gene-set analysis The enrichment of the MR silencing gene expression signatures on Gene Ontology Biological Processes (GO-BP), or on MsigDB v7.4, including KEGG, Reactome, Biocarta, PID pathways and biological hallmarks were estimated with the aREA algorithm. Enrichment due to gene-sets overlap was corrected using shadow analysis as implemented in the VIPER package from Bioconductor^[261]25. Multiple hypothesis testing was addressed by False Discovery Rate (FDR). The enrichment results were integrated for each community, or each category of regulator (i.e., speaker, communicator and listener) using average integration of the normalized enrichment scores weighted by the community score. The results are reported for all GO-BP or MsigDB categories significant in at least one community, or at least one speaker category at FDR < 0.01. Due to the low number of hits obtained from MsigDB dataset, only the enrichment of GO-BP to the four communities were reported in Supplementary Data [262]9.1. The GO-BP terms preferentially enriched to each of the four communities were hand-picked using the following criteria: 1) GOBP terms with -log[10] p < 2.5—a value selected to optimize the number of statistically significant categories—in all four community signatures were first filtered out; 2) For the remaining GOBP terms, the maximal number of the -log[10] p value among all four communities were determined, and the differences between this maximal value to the other three values were calculated; 3) GOBP terms with maximal value larger than any other three values by more than 2, or GOBP terms that are significantly enriched to only one community (-log[10] p < 2.5) but not the other three communities were included in Supplementary Data [263]9.2 and Fig. [264]4h. Reporting summary Further information on research design is available in the [265]Nature Portfolio Reporting Summary linked to this article. Supplementary information [266]Supplementary Information^ (6.1MB, pdf) [267]41467_2025_57894_MOESM2_ESM.pdf^ (121.5KB, pdf) Description of Addtional Supplementary File [268]Supplementary Data 1^ (22.3KB, xlsx) [269]Supplementary Data 2^ (61.4KB, xlsx) [270]Supplementary Data 3^ (28.8MB, xlsx) [271]Supplementary Data 4^ (8.6MB, xlsx) [272]Supplementary Data 5^ (49.6KB, xlsx) [273]Supplementary Data 6^ (99.3KB, xlsx) [274]Supplementary Data 7^ (21.3KB, xlsx) [275]Supplementary Data 8^ (189.1KB, xlsx) [276]Supplementary Data 9^ (54.3KB, xlsx) [277]Supplementary Data 10^ (67.9KB, xlsx) [278]Reporting Summary^ (4.3MB, pdf) [279]Transparent Peer Review file^ (347KB, pdf) Source data [280]Source Data^ (586.6KB, zip) Acknowledgements