Abstract

   Polycomb repressive complexes (PRCs) are important histone modifiers,
   which silence gene expression; yet, there exists a subset of PRC-bound
   genes actively transcribed by RNA polymerase II (RNAPII). It is likely
   that the role of Polycomb repressive complex is to dampen expression of
   these PRC-active genes. However, it is unclear how this flipping
   between chromatin states alters the kinetics of transcription. Here, we
   integrate histone modifications and RNAPII states derived from bulk
   ChIP-seq data with single-cell RNA-sequencing data. We find that
   Polycomb repressive complex-active genes have greater cell-to-cell
   variation in expression than active genes, and these results are
   validated by knockout experiments. We also show that PRC-active genes
   are clustered on chromosomes in both two and three dimensions, and
   interactions with active enhancers promote a stabilization of gene
   expression noise. These findings provide new insights into how
   chromatin regulation modulates stochastic gene expression and
   transcriptional bursting, with implications for regulation of
   pluripotency and development.
     __________________________________________________________________

   Polycomb repressive complexes modify histones but it is unclear how
   changes in chromatin states alter kinetics of transcription. Here, the
   authors use single-cell RNAseq and ChIPseq to find that actively
   transcribed genes with Polycomb marks have greater cell-to-cell
   variation in expression.

Introduction

   Embryonic stem cells (ESCs) are capable of self-renewing and
   differentiating into all somatic cell types^[42]1, [43]2, and their
   homeostasis is maintained by epigenetic regulators^[44]3. In this
   context, Polycomb repressive complexes (PRCs) are important histone
   modifiers, which play a fundamental role in maintaining the pluripotent
   state of ESCs by silencing important developmental regulators^[45]4.
   There are two major PRCs: PRC1, which monoubiquitinylates histone 2 A
   lysine 119 (H2Aub1) via the ubiquitin ligase RING1A/B; and PRC2, which
   catalyzes dimethylation and trimethylation of H3K27 (H3K27me2/3) via
   the histone methyltransferase (HMT) EZH1/2.

   Recently, we discovered that a group of important signaling genes
   coexists in active and Polycomb-repressed states in mouse ESCs
   (mESCs)^[46]5. During the transcription cycle, recruitment of histone
   modifiers or RNA-processing factors is achieved through changing
   patterns of post-translational modifications of the carboxy-terminal
   domain of RNAPII^[47]6. Phosphorylation of S5 residues (S5p) correlates
   with initiation, capping, and H3K4 HMT recruitment. S2 phosphorylation
   (S2p) correlates with elongation, splicing, polyadenylation, and H3K36
   HMT recruitment. Phosphorylation of RNAPII on S5, but not on S2, is
   associated with Polycomb repression and poised transcription factories,
   while active factories are associated with phosphorylation on both
   residues^[48]5, [49]7, [50]8. S7 phosphorylation (S7p) marks the
   transition between S5p and S2p^[51]9, but its mechanistic role is
   unclear presently.

   Our genome-wide analyses of RNAPII and Polycomb occupancy in mESCs
   identified two major groups of PRC targets: (1) repressed genes
   associated with PRCs and unproductive RNAPII (phosphorylated at S5 but
   lacking S2p; PRC-repressed) and (2) expressed genes bound by PRCs and
   active RNAPII (both S5p and S2p; PRC-active)^[52]5. Both types of genes
   are marked by H3K4me3 and H3K27me3, a state termed bivalency^[53]1,
   [54]10. H3K4me3 correlates tightly with RNAPII-S5p^[55]5, a mark that
   does not distinguish PRC-active and Polycomb-repressed states.

   The role of PRCs in modulating the expression of PRC-active genes was
   shown by PRC1 conditional knockout (KO). Sequential ChIP and
   single-cell imaging showed mutual exclusion of S2p and PRCs at
   PRC-active genes^[56]5, although PRCs were found to co-associate with
   S5p. This indicates that PRC-active genes acquire separate active and
   PRC-repressed chromatin states. It remains unclear whether these two
   states occur in different cells within a cell population, or within
   different alleles in the same cell^[57]5. This pattern of two distinct
   chromatin states could imply a digital switch between actively
   transcribing and repressed promoters within a population of cells,
   thereby introducing more cell-to-cell variation in gene expression
   compared to genes with both alleles in active chromatin states.

   Motivated by this hypothesis, here, we integrate states of histone and
   RNAPII modification from a published classification of ChIP-seq
   data^[58]5 with single-cell RNA-sequencing (RNA-seq) data generated for
   this analysis. The matched chromatin and scRNA-seq data sets allow us
   to decipher, on a genome-wide scale, how differences in the chromatin
   state can affect transcriptional kinetics. A schematic overview of our
   analysis strategy is shown in Fig. [59]1. We focus on active PRC-target
   genes that are marked by PRCs (H3K27me3 modification or both H3K27me3
   and H2Aub1) and active RNAPII (S5pS7pS2p), and compare these with
   “active” genes (marked by S5p, S7p, S2p without H3K27me3 and H2Aub1
   marks). We quantify variation in gene expression and transcriptional
   kinetics statistically and by mathematical modeling (Fig. [60]1). In
   addition, we map the functions of PRC-active genes in the context of
   pluripotency signaling and homeostasis networks. Further, we analyze
   the linear ordering and three-dimensional contacts of PRC-active genes
   on the mouse chromosomes. Finally, we investigate the effect of
   Polycomb on regulating transcriptional heterogeneity by deletion of
   Ring1A/B, followed by single-cell profiling.

Fig. 1.

   Fig. 1
   [61]Open in a new tab

   Summary of methodology. OS25 mESCs were cultured and characterized by
   single-cell RNA-seq using the Fluidigm C1 system, applying the SMARTer
   kit to obtain cDNA and the Nextera XT kit for Illumina library
   preparation. OS25 cells are grown in conditions that select for
   undifferentiated cells (high Oct4-expressing). Libraries from 96 cells
   were pooled and sequenced on four lanes of a HiSeq. After
   quality-control analysis of cells, 90 cells out of 96 remained for
   further analysis. We first unraveled contributions of components of
   gene expression variation using the scLVM method^[62]13. Removing cell
   cycle variation and technical noise allowed us to focus on stochastic
   gene expression. Gene expression variation can be quantified by CV^2 or
   DM, which is a measure of noise independent of gene expression levels
   and gene length. To explore the transcriptional kinetics of OS25 ES
   cells, poisson-beta model^[63]16 was fitted to single-cell gene
   expression data, leading to estimates of burst frequency and size.
   Next, histone and RNAPII promoter modifications were obtained from
   Brookes et al.^[64]5 and integrated with single-cell RNA-seq to
   investigate relationship between stochastic gene expression and
   epigenetics. Active genes with no PRC marks are usually in the “on”
   state with high burst frequencies (k [on]), PRCr genes are mostly “off”
   and PRC-active genes switch between “on” and “off” states very
   frequently. Considering the allele-level possibilities, at active
   genes, both alleles would be in an actively transcribing state. For
   PRCa genes, both alleles would be in an actively transcribing state, or
   both alleles would be in a silent PRC-marked state, or only one allele
   is in PRC-marked state, which, subsequently, would result in noisier
   gene expression. For PRC-repressed genes, both alleles are expected to
   be in a silent PRC-marked state

Results

Single-cell RNA-seq and data processing

   To investigate how Polycomb repression relates to stochasticity in gene
   expression, we profiled single-cell transcriptomes of mouse OS25 ESCs
   cultured in serum and leukemia-inhibitory factor (LIF), previously used
   to map RNAPII phosphorylation and H2Aub1^[65]5. Single-cell RNA-seq was
   performed using the Fluidigm C1 system, applying the SMARTer kit to
   obtain cDNA and the Nextera XT kit for Illumina library preparation.
   Libraries from 96 cells were pooled and sequenced on four lanes of an
   Illumina HiSeq2000 (Fig. [66]1; please refer to Methods for details).

   Next, we performed quality-control analysis for each individual cell
   data set and removed poor-quality data based on two criteria (as
   described before in ref. ^[67]11). Cells were removed if: (1) the total
   number of reads mapping to exons for the cell was lower than half a
   million and (2) the percentage of reads mapping to
   mitochondrial-encoded RNAs was higher than 10%. We also compared
   normalized read counts of genes between cells and found many genes
   abnormally amplified for three cells. Therefore, we removed these
   cells, resulting in 90 cells that could be used for further analysis.
   For these 90 cells, over 80% of reads were mapped to the Mus musculus
   genome (GRCm38) and over 60% to exons (Supplementary Fig. [68]1A–C).

   OS25 ES cells are grown under Oct4 selection and do not express
   early-differentiation markers such as Gata4 and Gata6^[69]5, having the
   expected features of pluripotency. They are ideal for studying Polycomb
   repression and its impact on transcriptional cell-to-cell variation as
   compared to other culture conditions such as 2i (serum-free). ESCs
   grown in 2i show decreased Polycomb repression and RNAPII poising at
   well-characterized early-developmental genes^[70]12, therefore making
   2i conditions the least ideal conditions to study mechanisms of
   Polycomb regulation in the pluripotent state. As previously
   shown^[71]5, we do not observe distinct subpopulations of cells based
   on key pluripotency factors and differentiation markers in our OS25
   single-cell data sets (Supplementary Fig. [72]1D).

   In addition, we compared single-cell expression profiles of the OS25
   ESCs grown under Oct4 with recently published scRNAseq data sets from
   mESCs cultured in serum + LIF and 2i^[73]11; Principal component
   analysis using pluripotency genes and differentiation markers shows
   that OS25 cells are more similar to the subpopulation of pluripotent
   serum cells, rather than the subpopulation of serum cells that are
   either “primed for differentiation” or are “on the differentiation
   path” (Supplementary Fig. [74]1E).

Defining chromatin state and gene expression noise

   We integrated our new single-cell RNA-seq data with a previous
   classification of gene promoters according to the presence of histone
   and RNAPII modifications^[75]5 (Fig. [76]1). Comparison of our average
   single-cell expression profiles with the bulk gene expression
   (mRNA-seq) profiles from Brookes et al.^[77]5 yields a high correlation
   (Spearman’s rho = 0.87, Supplementary Fig. [78]1F), suggesting that the
   chromatin and RNAPII data reflect cells in the same biological state as
   the single-cell RNA-seq data.

   Next, we analyzed gene expression variation within the single-cell
   data. First, we quantified cell-to-cell variation at each mean
   expression level using the coefficient of variation (Supplementary
   Fig. [79]2A). Cell-to-cell variation can arise either due to stochastic
   gene expression itself, or technical noise or confounding expression
   heterogeneity due to biological processes such as the cell cycle.

   To isolate pure stochastic gene expression from cell cycle variation in
   gene expression, we applied a latent variable model^[80]13. This is a
   two-step approach, which reconstructs cell cycle state before using
   this information to obtain “corrected” gene expression levels. The
   method reveals that the cell cycle contribution to variation is 1.2% on
   average (Supplementary Fig. [81]2B). While this effect is small, when
   clustering all cells based on G2/M stage markers, we found that cells
   separate into two groups: one with high expression of G2 and M genes
   and the other with low expression of these genes (Supplementary
   Fig. [82]2C). Applying the cell cycle correction removes this effect,
   leading to a more homogeneous expression distribution of these genes
   across the cells (Supplementary Fig. [83]2D).

   To account for the technical noise present in single-cell RNA-seq data,
   we removed lowly expressed (LE) genes that are most likely to display
   high technical variability^[84]14, [85]15. Here, a gene is considered
   as LE if the average normalized read count is less than 10. This
   results in a set of 11,861 genes with moderate to high mRNA abundance.
   Subsequently, we use the DM (distance to median) to quantify gene
   expression variation in mRNA expression^[86]11, since it accounts for
   confounding effects of expression level and gene length on variation
   (described in detail in the Methods; Fig. [87]1).

   Among the 11,861 expressed genes, 7175 have categorized ChIP-seq
   profiles as defined by Brookes et al.^[88]5; genes excluded have
   transcription start site (TSS) regions that overlap with other genes,
   and therefore cannot be unequivocally classified. We defined two major
   sets of genes based on their PRC marks and RNAPII states: (1) “Active”
   genes (n = 4483) without PRC marks (H3K27me3 or H2Aub1) but with active
   RNAPII (S5pS7pS2p), (2) “PRC-active” genes (labeled as “PRCa”; n = 945)
   with PRC marks (H3K27me3 or H3K27me3 plus H2Aub1), and active RNAPII.

   To explore the transcriptional kinetics of these genes and describe
   stochastic gene expression in OS25 ES cells, we estimated their kinetic
   transcription parameters using a Poisson-beta model described
   previously^[89]16 (see also in the Methods).

PRCa genes have distinct transcriptional kinetics

   Using the DM measure to quantify gene expression variation in single
   cells, we observe that histone modifications mediated by PRCs (H3K27me3
   or H3K27me3 and H2Aub1) correlate with high levels of variability
   compared to active genes (those without PRC marks; P < 2.2 × 10^−16 by
   the two-tailed Wilcoxon rank sum test, Fig. [90]2a). Furthermore, the
   inferred kinetic parameters provide insight into the expression
   behavior of genes, showing that active genes have significantly higher
   burst frequencies than PRCa genes (Fig. [91]2a and Supplementary
   Fig. [92]3A). This suggests that PRCa genes are more frequently in the
   “off” state, i.e., more alleles are in the off state at any given point
   in time, potentially due to the PRC repression of a subset of alleles.

Fig. 2.

   Fig. 2
   [93]Open in a new tab

   Stochastic gene expression of PRCa and active genes. a Comparison of
   PRCa and active genes reveals that PRCa genes are more variable with
   lower burst frequency levels than active genes (P < 2.2 × 10^−16 by the
   two-tailed Wilcoxon rank sum test). Gene expression variation is
   represented by DM values. b Expression profiles of PRCa genes show
   bimodal patterns. The distribution of a gene with bimodal expression is
   assumed to be expressed as a mixture of two normal distributions (LE
   and HE states; upper panel). PRCa genes have mixed cell states (on
   average 49% in HE and 51% in LE) indicating they are either in active
   state (i.e., active RNAPII and no PRC marks) or in repressed state
   (unproductive RNAPII and with PRC marks) consistent with cellular
   heterogeneity, suggested in Brookes et al.^[94]5 Error bars represent
   s.e.m.

   To ensure that differences between the kinetic parameters are not
   driven by changes in gene expression levels between the active and PRCa
   groups, we extracted expression-matched genes of active and PRCa groups
   (please refer to Methods). These analyses confirmed that PRCa genes
   have lower burst frequency and higher noise levels than active genes
   (Supplementary Fig. [95]3B and C). Consequently, the greater
   cell-to-cell variability for PRCa compared to Active genes is not
   driven by difference in the mean expression level, but potentially
   linked to the presence of PRC marks themselves.

   To explore whether H3K9me3 could contribute to the transcriptional
   heterogeneity identified at PRCa genes, we analyzed H3K9me3 ChIP-seq
   data of Mikkelsen et al.^[96]17, and found that only a few expressed
   PRCa genes (n = 5) are marked by H3K9me3 at their promoter region (2 kb
   centered on the TSS), making further analysis statistically impossible.

   Although the literature shows that the DNA of mESCs is hypomethylated,
   and genes that are marked by Polycomb are usually devoid of DNA
   methylation^[97]18, [98]19, we checked the extent of DNA methylation at
   the PRCa gene list considered. We extracted the DNA methylation
   patterns at proximal promoter regions in mESCs reported in Fouse et
   al.^[99]19. Only a small proportion of genes (n = 110) has DNA
   methylation according to this definition. Owing to the small sample
   size, a statistical assessment will be weak, but comparison of gene
   expression variation profiles of these genes with the same number of
   PRCa genes (and same expression levels) that are unmethylated showed
   that the differences are not significant (Wilcoxon rank sum test, 0.1).
   This suggests no detectable effect of DNA methylation on
   transcriptional heterogeneity of PRCa genes (Supplementary
   Fig. [100]3D).

   A decrease in the frequency of transcriptional bursting can manifest
   itself as a more bimodal pattern of gene expression across a cell
   population. Indeed, we observe that PRCa genes have significantly more
   bimodal expression profiles compared to active genes (see Methods for
   bimodality index calculation; Supplementary Fig. [101]3E and
   Fig. [102]2b). Assuming that the distribution of a gene with bimodal
   expression can be expressed as a mixture of two log-normal
   distributions^[103]20 (LE and highly expressed (HE) states), we observe
   that PRCa genes have mixed cell states (on average 49% of cells in HE
   and 51% in LE). In contrast, active genes are mostly in the active
   state as expected (on average 70% in HE and 30% in LE). PRC-repressed
   genes with unproductive RNAPII and PRC marks, labeled as “PRCr”) are
   24% in HE and 76% in LE (Fig. [104]2b). Therefore, expression patterns
   of PRCa are in between Active and PRCr, suggesting a composite of these
   two states.

   We should note that in our kinetic models, decay rates are set to 1 to
   normalize kinetic parameters so that they are independent of
   time^[105]16. To investigate whether decay rates have profound effects
   on kinetic parameters, we integrated published mRNA decay rates in
   mESCs^[106]21 into our kinetic model. The subtle differences in decay
   rates across genes did not result in major changes in the inferred
   kinetic parameters, leaving all major trends unaffected (Supplementary
   Fig. [107]3F).

PRCa genes are important regulators in signaling pathways

   To investigate potential functions of the cell-to-cell variation in
   gene expression in PRCa genes, we carried out KEGG pathway enrichment
   analysis for PRCa genes in our OS25 mESCs (see also Brookes et
   al.^[108]5). While active genes are enriched in pathways related to
   housekeeping functions, such as RNA transport, consistent with their
   uniform and stable expression across cells, PRCa genes are enriched in
   signaling pathways such as PI(3)K-Akt, Ras signaling, and TGF-beta
   signaling (Supplementary Table [109]1). These signaling pathways show
   high levels of cell-to-cell variation compared to pathways related to
   housekeeping functions (Supplementary Fig. [110]3G). This may be due to
   transcriptomic fluctuations introduced by cytokine LIF signalling via
   two signaling pathways: Jak-Stat3 and PI(3)K-Akt^[111]22 (Fig. [112]3).

Fig. 3.

   Fig. 3
   [113]Open in a new tab

   Signaling pathways that are key regulators of pluripotency in mESCs. In
   OS25 cells there is a selection for undifferentiated cells (high
   Oct4-expressing). LIF integrates signals into the core regulatory
   circuitry of pluripotency (Sox2, Oct4, and Nanog) via two signaling
   pathways: Jak-Stat and PI3K-Akt^[114]22. The Jak-Stat pathway activates
   Klf4, and the PI3K-Akt pathway stimulates the transcription of Tbx3,
   which are both PRCa genes. The MAPK pathway antagonizes the nuclear
   localization of Tbx3. PRCa genes are enriched in Jak-Stat and PI3K-Akt
   pathways, which show high cell-to-cell variation, suggesting a crucial
   role of PRCs in modulating fluctuations in signaling pathways that
   integrate LIF signals into core transcription factor network (Figure
   adapted from ref. ^[115]22)

   The Jak-Stat3 pathway activates Klf4, and the PI(3)K-Akt pathway
   stimulates the transcription of Tbx3 ^[116]22. The expression levels of
   Klf4 and Tbx3, which are PRCa genes, are noisier than the pluripotency
   factors Nanog, Sox2, and Oct4. This pattern of noise propagation from
   the signaling pathways through the downstream transcriptional
   regulatory network is interesting, as it might indicate the role of
   PRCs in modulating transcriptomic fluctuations.

Chromosomal position effects and stochastic gene expression

   It is known that neighboring genes on chromosomes exhibit significant
   correlations in gene expression abundance and regulation, partly due to
   two-dimensional (2D) chromatin domains^[117]23–[118]26. Is there a
   similar effect of clustering by chromatin marks and noise in gene
   expression?

   To address this, we investigated the positional effects of noise in
   mRNA expression using the DM values (Methods). If genes cluster
   together based upon their transcriptional noise, we would expect that
   the DM values of genes adjacent to noisy genes would be higher than
   those of genes adjacent to stable genes. Indeed, the noise levels of
   genes in the neighborhood of noisy genes are significantly higher than
   those of genes that flank stable genes (P = 1.3 × 10^−4 by the
   one-tailed Wilcoxon rank sum test, ±50 kb of TSS, Supplementary
   Fig. [119]4A). This suggests that the genomic neighborhood might
   influence the frequency of transcriptional bursting.

   In Fig. [120]4a, we show the association between chromosomal position
   and gene expression noise. The difference between the mean expression
   levels of flanking genes between noisy and stable genes is not
   significant (P = 0.7311 by the two-tailed Wilcoxon rank sum test,
   ±50 kb of TSS), suggesting that the clusters of genes are not driven by
   their expression levels. The association between chromosomal position
   and gene expression noise was most significant at the window size of
   50 kb, but weaker at a neighborhood size of 0.5 Mb (Fig. [121]4a).
   (Please refer to Methods for P value calculation.) Thus, genes tend to
   be clustered into neighborhood domains by their noise levels, ranging
   in size up to 0.5 Mb.

Fig. 4.

   Fig. 4
   [122]Open in a new tab

   Chromosomal position effects and stochastic gene expression. a Maps of
   genes belonging to noisy and stable clusters. Chromosomal positions of
   genes marked by PRCs and RNAPII in the noisiest clusters. One of the
   noisy clusters is visualized, DM levels correlate with bimodal
   expression patterns. In lower panel, association between chromosomal
   position and gene expression noise is shown; the noise levels of genes
   in the neighborhood of noisy genes are significantly higher than those
   of flanking genes of stable genes. As a control, we constructed 100
   randomized genomes in which the positions of genes were fixed but the
   DM value of each gene was assigned randomly without replacement, and
   the same analysis was performed on each randomized genome to obtain
   random P values. In all, 2.5% quantile of random P value, and 97.5%
   quantile of random P values are shaded in gray. All data are shown on a
   −log(p) scale. b Enrichments of PRC marks and RNAPII states in noisy
   and stable clusters; two-tailed Fisher’s exact test; *P < 0.1,
   **P < 0.05

   To identify the clusters of noisy or stable genes, we performed a
   sliding-window analysis on the mouse genome (Methods). We found 129
   noisy clusters ranging in size from 4 to 11 genes, spanning a total
   number of 669 genes. Similarly, 112 stable clusters (between 4 and 13
   genes) with a total number of 556 genes were found (Fig. [123]4a). The
   noise levels of genes in noisy clusters are significantly higher than
   that of genes in stable clusters (P < 2.2 × 10^−16 by two-tailed
   Wilcoxon rank sum test, Supplementary Fig. [124]4B) independent of the
   mean expression levels and gene lengths (Supplementary Fig. [125]4C–D).

   In addition, we found that DM levels correlate with bimodal expression
   patterns within the noisy clusters. One example is visualized in
   Fig. [126]4a; one of the noisy clusters on chromosome 1 consists of
   three PRCa and two active genes. Lefty1 and Lefty2 PRCa genes, which
   are important in controlling the balance between self-renewal and
   pluripotent differentiation in mESCs, are highly variable, and also
   highly correlated in their gene expression. An active gene, Pycr2,
   Pyrroline-5-carboxylate reductase 2, is in close proximity to both
   Lefty1 and Lefty2, and is more variable than the Sde2 gene that lies in
   proximity of Lefty2 only (density profiles are shown in Supplementary
   Fig. [127]4E). Indeed, within the clusters, gene expression variation
   levels of active genes increase with the increasing number of flanking
   variable genes (Supplementary Fig. [128]4F). Another PRCa gene is
   Tmem63a, which is a transmembrane protein implicated in maintenance of
   pluripotency, lies near Lefty1 and has high cell-to-cell variation in
   gene expression.

   Interestingly, PRCs characterize the noisy clusters, i.e., PRC marks
   are enriched in noisy clusters rather than in stable ones. In
   particular, genes with H3K27me3 are enriched at noisy clusters
   (P = 1.1 × 10^−2 by the two-tailed Fisher’s exact test), but depleted
   at stable clusters (P = 5.9 × 10^−2 by the two-tailed Fisher’s exact
   test, Fig. [129]4b). Since PRCs are tightly associated with RNAPII
   states, we examined differences between the RNAPII state of genes
   between noisy and stable clusters. We found that genes marked by active
   elongating RNAPII (S5pS7pS2p) are depleted at noisy clusters
   (P = 1.3 × 10^−3 by the two-tailed Fisher’s exact test, Fig. [130]4b),
   supporting the view that elongating RNAPII modifications promote stable
   gene expression. Together, noisy clusters are characterized by the
   presence of PRC marks and the absence of active elongating RNAPII,
   while stable clusters are characterized by the absence of PRCs.

Gene and enhancer clustering in 2D and 3D

   Next, we analyzed whether PRCa genes are proximal to fully repressed
   Polycomb genes, which could eventually increase their sensitivity to
   Polycomb repression. Linear spatial proximity between PRCa genes and
   PRCr genes is significantly closer than the median distance between
   randomly chosen genes and PRCr genes (empirical P = 2 × 10^−2,
   Fig. [131]5a; Methods). Interestingly, PRCa genes are also in close
   proximity to active genes (empirical P < 0.0001, Supplementary
   Fig. [132]4G), while active genes are distal from PRCr genes (empirical
   P = 5 × 10^−3, Supplementary Fig. [133]4H), suggesting a 2D spatial
   arrangement of these genes as Active-PRCa-PRCr (as visualized in
   Fig. [134]5a).

Fig. 5.

   Fig. 5
   [135]Open in a new tab

   Effects of 2D and 3D neighborhood on transcriptional kinetics. a
   Histogram of simulated median distances under a null model assuming no
   positional preference in the neighborhood of PRCr genes. The observed
   median distance of PRCa genes to their nearest neighbor in the PRCr
   group, depicted by vertical dashed red line, are significantly less
   than expected by chance (empirical P = 2 × 10^−2). b Analyzing mESC
   Promoter Capture Hi-C data reveals that PRCa genes have a strong
   enrichment for long-range contacts between promoters with levels in
   between PRCr and active genes. Error bars represent s.e.m. c PRCa genes
   have significantly more interactions with active enhancers compared to
   PRCr genes (P < 2.2 × 10^−16). In contrast, interactions with poised
   enhancers are mainly observed for PRCr genes rather than PRCa
   (P < 2.2 × 10^−16). d Occurrence of interactions with active enhancers
   decrease noise of PRCa genes independent of the mean expression levels

   We next asked whether the linear genomic position effects of PRCs are
   reflected in the three-dimensional (3D) genome organization in ESCs.
   Recently, Schoenfelder et al.^[136]27 found that PRC1 acts as a major
   regulator of ESC genome architecture by organizing genes into 3D
   interaction networks. They generated mESC Promoter Capture Hi-C (CHi-C)
   data^[137]28, and analyzed it using the GOTHiC (Genome Organization
   Through Hi-C) Bioconductor package
   ([138]http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.htm
   l). This yielded a strong enrichment for long-range contacts between
   promoters bound by PRCs.

   We applied the same approach to this data set using our gene list. We
   found that there is a strong enrichment for long-range
   promoter–promoter contacts for both PRCa and PRCr genes (Fig. [139]5b).
   Interestingly, PRCr genes have significantly stronger contact
   enrichment than PRCa genes in mESCs (one-tailed t-test
   P = 6.3 × 10^−6). PRCa genes are in between PRCr and active genes; they
   have stronger contact enrichment than active genes (one-tailed t-test
   P = 1 × 10^−4; Fig. [140]5b).

   In Fig. [141]5b, the promoter contacts of the aforementioned noisy
   cluster PRCa gene Lefty2 are visualized. It is in contact with the
   other PRCa genes Lefty1 and Tmem63a, and it has a strong connectivity
   with the active Pycr2 gene. These contacts may affect Pycr2’s frequency
   of transcriptional bursting, and thereby tune expression noise.

   In terms of the promoter preferences of gene sets, it is interesting to