Abstract

   The execution of biological activities inside space‐limited cell nuclei
   requires sophisticated organization. Current studies on the 3D genome
   focus on chromatin interactions and local structures, e.g.,
   topologically associating domains (TADs). In this study, two global
   physical properties: DNA density and distance to nuclear periphery
   (DisTP), are introduced and a 2D matrix, D^2 plot, is constructed for
   mapping genetic and epigenetic markers. Distinct patterns of functional
   markers on the D^2 plot, indicating its ability to compartmentalize
   functional genome regions, are observed. Furthermore, enrichments of
   transcription‐related markers are concatenated into a cross‐species
   transcriptional activation model, where the nucleus is divided into
   four areas: active, intermediate, repress and histone, and repress and
   repeat. Based on the trajectories of the genomic regions on D^2 plot,
   the constantly active and newly activated genes are successfully
   identified during olfactory sensory neuron maturation. The analysis
   reveals that the D^2 plot effectively categorizes functional regions
   and provides a universal and transcription‐related measurement for the
   3D genome.

   Keywords: 3D genome, DNA density, nuclear lamina, single‐cell Hi‐C
     __________________________________________________________________

   The D^2 algorithm is developed to detect two physical properties, DNA
   density and distance to nuclear periphery, from the reconstructed
   genome structures. Based on them, D^2 plot is constructed to reveal the
   spatial positions of genetic and epigenetic markers. Furthermore, a
   cross‐species transcriptional activation model on D^2 plot is
   summarized, which is able to reflect expression levels.

   graphic file with name ADVS-9-2202149-g005.jpg

1. Introduction

   DNA molecules are highly condensed inside cell nuclei to compact into
   limited space, with the necessary loci exposed for active
   transcription.^[ [36]^1 ^] The rapid development of 3C‐based
   technologies offers insight into the sophisticated organization of the
   genome, allowing for the discovery of functional nuclear structures,
   for example, TADs, loops, and compartments.^[ [37]^2 ^] Nevertheless,
   most of these interaction‐derived structures focus on local spatial
   organizations, lacking a universal ability to compare distal regions.
   Although compartments are genome‐wise structures, they fail to
   correlate a given region to a specific biological function without
   additional functional data.^[ [38]^2 ^] While some researchers use
   interdisciplinary approaches to delineate the chromatin structure,^[
   [39]^3 , [40]^4 , [41]^5 ^] their methods continue to focus on
   interactions. Therefore, universal measurement for a 3D genome with
   functional correlation is required to elucidate the structural basis of
   genome function.

   DNA density (hereafter referred to as density) and distance to nuclear
   periphery (DisTP) are two primary physical properties that are related
   to transcriptional activities in genome spatial organization. A
   negative correlation between density and transcriptional activity has
   been implied by several nuclear structures in previous studies.^[
   [42]^6 , [43]^7 , [44]^8 ^] Specifically, the densely packed
   heterochromatin is less accessible and transcriptionally repressed,^[
   [45]^6 , [46]^9 ^] whereas euchromatin, which is less condensed,
   exhibits more active transcription.^[ [47]^6 ^] Several studies have
   shown that DisTP is positively correlated with transcriptional
   activity.^[ [48]^9 , [49]^10 , [50]^11 , [51]^12 ^] The
   lamina‐associated domains (LADs), located at the nuclear periphery, are
   believed to repress gene expression.^[ [52]^9 ^] Experimental evidence
   has reinforced the idea that reduced proximity to the nuclear lamina is
   usually accompanied with higher transcriptional activity, while
   increased proximity to the nuclear lamina results in more repressed
   transcription.^[ [53]^13 , [54]^14 ^] Previous works have clustered the
   genomic segments into groups with different DisTP (1 Mb resolution),
   either according to the results of experimentation^[ [55]^11 ^] or
   those of simulation.^[ [56]^12 ^] By computing the correlation with
   genetic markers, such as histone modifications, they found that the
   inner groups were more likely to enrich for active markers compared
   with the peripheral groups.

   Owing to the lack of appropriate techniques for detecting the physical
   properties in genome‐wide scale with high resolution, the extent to
   which density or DisTP are connected to transcriptional activities, and
   the analysis comprising the properties for the comprehensive
   description of genome structure and function are unclear. We developed
   an algorithm, D^2, to compute the two properties: density and DisTP,
   and integrated them as a 2D matrix, D^2 plot, for mapping functional
   markers at the genome‐wide scale. We identified distinct enrichment
   patterns of functional markers on the D^2 plot and proposed a
   cross‐species transcriptional activation model on D^2 plot. Our study
   revealed a novel analysis procedure for correlating nuclear
   organization with gene transcriptional activities and provided a new
   perspective for the study of the 3D genome structure.

2. Results

2.1. Detection of DNA Density and DisTP

   D^2 algorithm computed Density and DisTP at the genome‐wide scale in
   high resolution (Figure [57]1A; the Experimental Section). The input of
   the D^2 algorithm was a set of DNA particles, which were reconstructed
   from the single‐cell Hi‐C contact map by Dip‐C^[ [58]^15 ^] or
   NucDynamics^[ [59]^16 ^] (Table [60]S1, Supporting Information; the
   Experimental Section). Each particle had its own spatial coordinate and
   genomic position, thus linking spatial structure to genomic
   sequence. First, we introduced a 3D mesh segmentation to assign the
   input DNA particles into different cubes. The cube length was
   adjusted so that the average number of particles for each cube was
   approximately equal among cells, thereby allowing for intercell
   comparison. Density was calculated as the number of particles in one
   cube and then smoothed to reduce noise. To obtain DisTP, we first
   identified the membrane cubes (cubes located at the nuclear periphery),
   and then calculated DisTP for each cube as its distance to the nearest
   membrane cubes (Figure [61]S1, Supporting Information). Both properties
   were consistent across different reconstructed replicates of a given
   individual cell (Figure [62]S2, Supporting Information), indicating
   that D^2 was robust against the randomness introduced by the structure
   reconstruction algorithm. Therefore, we selected the first
   reconstruction of each cell for the following analysis. Serial sections
   of density and DisTP of a sample cell were illustrated in
   Figure [63]1B.

Figure 1.

   Figure 1
   [64]Open in a new tab

   D^2 detects DNA density and DisTP at a genome‐wide scale.
   A) Illustration of the D^2 algorithm. Each yellow dot from the
   reconstructed structure represents a 20 kb genomic segment. The line
   threading yellow dots indicates they are part of a given chromosome.
   The light gray mesh is the 3D segmentation, which is shown in 2D for
   clear visualization. In the “define membrane cubes” panel, the gray
   cubes are regarded as the membrane cubes as they are not buried. The
   intensity of red for each cube in the right top subpanels indicates its
   density value, while the intensity of yellow in the right bottom
   subpanel indicates its DisTP value. B) Serial sections of a sample
   nucleus (GM12878 cell 1) showing density (red) and DisTP (yellow). Only
   seven sections in the middle are shown. Each section is 8% of the
   entire width. The white patches on the sections denote the empty or
   almost empty regions inside the nuclei.

   We then validated the results of D^2 with experimental evidence. LADs
   were the conspicuous nucleic structures that were mainly located at the
   nuclear periphery. We found a significantly high correlation between
   the probability of appearing at the nuclear periphery (low DisTP) and
   the experimental LAD data, thereby verifying the fitness of DisTP
   (Figure [65]2A; Pearson coefficient correlation (PCC): 72.87%, p‐value:
   4.8 × 10^−213). Rods of nocturnal animals adopted an inverted radial
   organization, in which euchromatin resided at periphery and
   heterochromatin preferred the interior.^[ [66]^17 ^] Inverted
   distributions between the computed DisTP of rods and that of other cell
   types were observed (Figure [67]2B), indicating that our algorithm is
   able to capture diverse configurations of DisTP from dramatically
   different genome structures. Owing to the lack of experimental
   measurement for density, we attempted to validate this through highly
   compacted chromocenters located at the inner nuclei of mouse cells but
   not in human cells (Figure [68]S3, Supporting Information).^[ [69]^18 ,
   [70]^19 ^] We found that high‐density regions preferentially located at
   the inner nuclei of mouse cells but this was not the case in human
   cells (Figure [71]2C), indicating the feasibility of density detection.

Figure 2.

   Figure 2
   [72]Open in a new tab

   Validation of D^2 algorithm. A) Scatter plot of the experimental LAD
   data versus the probability of periphery regions from the D^2
   algorithm. Each point denotes a 2 Mb DNA segment of the mESC cell. The
   correlation and significance are computed by Pearson correlation (n =
   1103). The black line shows the fitting line of linear regression. B)
   Correlations of computed DisTP among different cell types. Each scatter
   plot includes 50 000 randomly sampled 20 kb genomic segments. Detailed
   analysis of adult rods is shown in the Supporting Information.
   C) Percentage of high‐density regions for a set of given DisTP. Each
   line denotes a cell type. The top 5% of dense regions are regarded as
   the high‐density regions.

   We then investigated the correlation between density and DisTP. In
   human cells, density and DisTP were largely independent of each other
   (PCC: −0.019 and −0.103; Figure [73] 3A,B; Table [74]S2, Supporting
   Information). By contrast, they were positively correlated in mouse
   cells (PCC: 0.278 to 0.673, Figure [75]3A,B; Table [76]S2, Supporting
   Information). This interspecies difference indicated that human and
   mouse cells adapted different strategies for nuclear spatial
   organization.

Figure 3.

   Figure 3
   [77]Open in a new tab

   Strong intrinsic stochasticity of density and DisTP. A) Scatter and
   kernel density estimate (KDE) plots of density and DisTP. The gray line
   delineates the fitting line of linear regression. The correlations are
   computed by Pearson correlation. Detailed information for the
   additional 15 cell types is shown in Table [78]S2 of the Supporting
   Information. B) Boxplot of PCC between density and DisTP for human and
   mouse cells. n indicates the number of cell types. C) 2D histogram and
   KDE plots of mean and SD for DisTP. One human and two mouse cell lines
   are shown here, and additional 14 cell types are shown in Figure [79]S5
   of the Supporting Information. D) Similar to (C) but for density.

   Next, we explored the stochasticity of density and DisTP by computing
   their standard deviations (SDs; Figure [80]S4, Supporting Information;
   the Experimental Section). SDs of density (≈1.5) and DisTP (≈4.7) were
   comparable to their means (≈2.2 and ≈6.8, Table [81]S3, Supporting
   Information), indicating a strong intrinsic stochasticity that was
   consistent with previous observations.^[ [82]^20 ^] In addition,
   certain regions with specific density or DisTP had lower standard
   deviation (SD). Specifically, for DisTP, the periphery regions showed
   lower SD for both human and mouse cells (Figure [83]3C; Figure [84]S5A,
   Supporting Information), thereby coinciding with their roles as anchors
   for attachment to the nucleus membrane. For density, the lower SD
   regions exhibited noticeable cell‐type specificity. For most cell types
   (11 out of 17), high‐density regions were lower in SD, while for the
   other six mouse neuron cell types, low‐density regions had lower SD
   (Figure [85]3; Figures [86]S5B and [87]S6, Supporting Information),
   suggesting a cell‐type‐specific anchoring mechanism for density.

2.2. Distinct Patterns of Functional Markers on D^2 Plot Revealed Functional
Genomic Regions

   The spatial organization of eukaryotic cell nuclei plays a pivotal role
   in regulating the biological activities of DNA molecules.^[ [88]^1 ,
   [89]^21 , [90]^22 ^] However, a comprehensive and systematic analysis
   of the correlation of density and DisTP with chromatin activities was
   lacking. To achieve that, we collected different types of
   epigenomic and genomic data from human GM12878 cells and mouse neonatal
   forebrain neurons, including DNA methylations, histone modifications,
   transcription factor binding sites, genomic repeats, and chromatin
   states.

   For joint analysis, we constructed a 2D matrix of density (x‐axis) and
   DisTP (y‐axis), named D^2 plot (the Experimental Section). D^2 plot was
   composed of physical states (the Experimental Section), the rectangular
   bins with specific values of density and DisTP (Figure [91]4A; Table
   [92]S4, Supporting Information). We only kept the states with enough
   genomic segments for statistical significance. Genetic markers were
   then mapped to the D^2 plot according to their enrichments on genomic
   segments. Enrichment scores of each marker at each physical state were
   calculated, compared to the whole genome as the background (the
   Experimental Section). The enrichment scores of all markers are shown
   in Figure [93]4 and the figures in the Supporting Information.

Figure 4.

   Figure 4
   [94]Open in a new tab

   Enrichment scores of genetic and epigenetic markers on D^2 plot. A) 2D
   histogram and KDE plots of density and DisTP in human GM12878 cells and
   mouse neonatal forebrain neurons. The top and right subpanels show the
   1D histograms of density (top) and DisTP (right). Yellow lines mark the
   boundaries to exclude outliers from downstream analysis. The boundaries
   of 16 cell types are shown in Table [95]S4 of the Supporting
   Information. B) Enrichment scores of H3K4me3 marker of human and mouse
   cells on D^2 plot. The enrichment score for the bottom right physical
   state in mouse cells is not shown owing to its small number (n < 200)
   of genomic segments (Methods section, Supporting Information).
   C) Similar to (B) but with markers H3K27me3 and H3K9me3. D) Enrichment
   scores of Alu and L1 for human GM12878 cell line. The results for mouse
   cell lines are shown in Figure [96]S11 of the Supporting Information.
   The right panel illustrates the competitive relationship between Alu
   and L1 in both human and mouse cells. E) Enrichment scores of YY1 and
   RAD21 for human GM12878 cell line. The enrichment scores of CTCF and
   SMC3 are shown in Figure [97]S13 of the Supporting Information. The
   right panel illustrates the mostly cooperative but occasionally
   independent relationships among CTCF, YY1, and cohesin. Details on the
   process of computing enrichment scores are presented in Methods section
   of the Supporting Information.

   We first examined the distribution of active markers on D^2 plot. We
   found that H3K4me3, a well‐established marker for transcriptional
   activation,^[ [98]^23 ^] enriched at the center rather than at the
   periphery of the nuclei, which was consistent with previous microscopic
   observations (Figure [99]4B; Figure [100]S7, Supporting Information).^[
   [101]^7 ^] Regions with enriched H3K4me3 distribution showed lower
   density (Figure [102]4), which confirmed that transcriptional
   regulators around active genes lowered the local DNA density.^[ [103]^6
   ^] Other active histone markers and chromatin states, and even the
   poised or weakly transcribed chromatin states exhibited similar
   patterns (Figure [104]S8, Supporting Information). The coexistence of
   active and weak transcriptional markers implied that the inner
   low‐density areas were not a determining factor for transcriptional
   activity, but rather provided a hospitable environment, allowing for
   the recruitment of regulating proteins for transcriptional activation,
   and thereby reinforcing the previously proposed “permissive role” of
   spatial organization.^[ [105]^24 ^]

   Next, we focused on the facultative and constitutive heterochromatins
   with repressed transcription, as marked by the trimethylation of lysine
   27 on histone H3 (H3K27me3) and trimethylation of lysine 9 on histone
   H3 (H3K9me3), respectively.^[ [106]^25 ^] On the D^2 plot, H3K27me3 was
   more likely to occupy the condensed regions inside the nuclei, while
   H3K9me3 showed an obvious preference for nuclear periphery
   (Figure [107]4C), as confirmed by the immunofluorescence experiments
   (Figures [108]S9 and [109]S10, Supporting Information).^[ [110]^26 ,
   [111]^27 ^] We also found that H3K9me3 enriched at inner high‐density
   regions in mouse cells but not in human cells, suggesting the
   species‐specific features of heterochromatin arrangements.^[ [112]^18 ,
   [113]^19 ^]

   We then investigated the repetitive elements on the D^2
   plot. We found that long repeats, including long terminal repeats and
   long interspersed nuclear elements (for example, L1), preferred the
   inner high‐density regions or periphery regions (Figure [114]4D; Figure
   [115]S11, Supporting Information). Contrarily, the short interspersed
   nuclear elements (including Alu, B2, and ID) preferably enriched at
   inner low‐density regions (Figure [116]4; Figure [117]S10, Supporting
   Information), which was consistent with the FISH analysis performed by
   Lu et al., who found that Alu clustered at the nuclear interior, while
   L1 clustered at the nuclear periphery.^[ [118]^28 ^] The distributions
   of Alu and L1 were significantly negatively correlated in both species
   (Figure [119]S12, Supporting Information), indicating a competitive
   relationship between the two transposons in spatial organization.

   Next, we examined the distribution of the architectural proteins,
   including RAD21, SMC3 (both subunits of cohesin), CTCF, and YY1. They
   are believed to cooperate in the construction of genomic structures via
   the use of a previously proposed loop extrusion model.^[ [120]^29 ^]
   The patterns of these proteins largely overlapped on the D^2 plot at
   the inner region of the nuclei (Figure [121]4E; Figure [122]S13,
   Supporting Information). Nonetheless, YY1 and CTCF were more likely to
   enrich at the lower‐density regions than cohesin, indicating that YY1
   and CTCF preferred to locate at the active compartment. Consistently,
   the fold changes of YY1 (1.71) and CTCF (1.23) enrichments in
   compartment A to B were higher than those of RAD21 (1.13) and SMC3
   (1.15) (Figure [123]S14, Supporting Information). The differences in
   enrichments implied that, in addition to working together, these
   proteins might function individually, as previously observed by Wei
   et al.^[ [124]^30 ^]

2.3. Cross‐Species Transcriptional Activation Model on D^2 Plot

   To link transcription activity to the D^2 plot, we selected 16
   transcription‐related markers and performed the hierarchy cluster on
   physical states (the Experimental Section). As shown in Figure [125]5A,
   we successfully distinguished the active states from the repressed
   ones. However, there were still intermediate states marked by both
   active markers and repressive histone modifications, corresponding to
   the “bivalent” genomic regions where promoters were prepared for rapid
   activation during differentiation.^[ [126]^31 ^]

Figure 5.

   Figure 5
   [127]Open in a new tab

   Cross‐species transcriptional activation model on D^2 plot.
   A) Hierarchy clustering of D^2 plot physical states (n = 225 for human,
   n = 224 for mouse due to one state with fewer than 200 genomic
   segments) based on the 16 transcription related markers. Using the
   optimal ranking hierarchy cluster, the states are ranked automatically
   according to their activation levels. The number of clusters is set as
   4 for both species. The black dashed lines mark the boundaries of the
   clusters. The abbreviations have been spelled out in the legends of
   Figures [128]S7 and [129]S10 of the Supporting Information. B) The four
   areas on each D^2 plot for either human GM12878 cells or mouse neonatal
   neurons, corresponding to (A). C) Illustration of the transcriptional
   activation model on the D^2 plot. The arrows point to more active
   regions. D) The enrichment scores of newly activated genes on the D^2
   plots of oligodendrocyte progenitors and mature oligodendrocytes. The
   enrichment scores of blank physical states are not shown owing to the
   small number (n < 200) of genomic bins. E) Similar to (D) but with
   newly repressed genes.

   Based on the hierarchy clusters for both human and mouse cells, we
   divided the physical states into four groups: active, intermediate,
   repress and histone (repressed states marked by repressed histone
   modifications), and repress and repeat (repressed states marked by long
   repeats) areas. Then we assigned these four areas back to the D^2 plot
   to reveal their transcriptional patterns (Figure [130]5B). The patterns
   between human GM12878 cells and mouse neonatal forebrain neurons showed
   slight differences. For example, by computing the property strength
   (Figure [131]S15, Supporting Information; the Experimental Section), we
   found that DisTP had less impact on the nucleic interior in human cells
   compared to mouse cells, probably owing to the different strategies of
   heterochromatin organization between the two species.^[ [132]^18 ^]
   Other than the marginal difference, the patterns were highly similar,
   out of which we proposed a cross‐species transcriptional activation
   model on the D^2 plot (Figure [133]5C). The model showed that the most
   inactive area, repress and repetitive, located at the periphery
   regardless of density. At the inner nucleus, however, density and DisTP
   jointly affected gene regulation. From inactive to active areas
   (repress and histone, intermediate to active area), density decreased
   while DisTP increased. This model is largely consistent with the
   current understanding of density and DisTP, but provides additional
   functional annotation.

   To verify the proposed model, we projected the lineage‐specific active
   genes of mouse brain cells onto the D^2 plot and found that those
   active genes located at more interior regions with lower density in the
   corresponding cell types compared with other cell types (Figure
   [134]S16, Supporting Information). Next, we focused on the genes that
   were previously defined as either newly activated or repressed during
   maturation from oligodendrocyte progenitors to mature
   oligodendrocytes.^[ [135]^32 ^] As expected, the newly activated genes
   distributed almost randomly on the D^2 plot of progenitors, but
   enriched at inner low‐density regions after the maturation
   (Figure [136]5D), corresponding to the active and intermediate areas.
   Conversely, the newly repressed genes located at inner low‐density
   regions in progenitors but distributed randomly in mature cells
   (Figure [137]5E).

   Tan et al. found no explanation for the adult‐neuron‐specific genes
   maintaining the same radial position during differentiation, despite
   the newly activation of these genes.^[ [138]^32 ^] We revealed that
   DisTP values of these genes remained unchanged (Fisher Test p‐value:
   0.68), but density values declined significantly (Fisher Test p‐value:
   3.34 × 10^−148; Figure [139]S17, Supporting Information), demonstrating
   that although DisTP alone was not sufficient for predicting gene
   regulation, its combination with DNA density was instructive.

2.4. Trajectories on D^2 Plots Revealed Transcriptional Activation Modes

   After establishing the correlation between the transcriptional level
   and the D^2 plot, we aimed to identify the transcriptional patterns of
   genomic regions merely by their trajectories on the D^2 plot. To do
   this, the activation index was introduced as the probability of a given
   genomic region appearing at the active or intermediate areas, and
   changes in its value reflected the trajectories on the D^2 plot during
   cell maturation (Figure [140]S18, Supporting Information; the
   Experimental Section). We examined our approach on olfactory sensory
   neurons (OSNs), which included three cell types at different
   differentiating stages (newborn, developing, and mature OSNs). By
   investigating the trajectories of the whole genome on the D^2 plot, we
   identified three transcriptional modes: constantly active (CA),
   repressed to active (RA), and constantly repressed (CR), which revealed
   the dynamic regulation of gene expression during OSN development.

   Next, we investigated the enrichments of functional elements in the
   three modes. Compared to the background, the percentages of exons in CA
   and RA were remarkably higher, while that of CR was lower (Figure
   [141]6A), in accordance with their expression levels. In stark
   contrast, the percentages of L1 in CA and RA were lower while that of
   CR was higher. As expected, CA was enriched with foundational gene
   pathways (ribosome, translation, and lipid transport), whereas RA was
   enriched with olfactory receptor (OR)‐related pathways
   (Figure [142]6B). No significantly enriched pathways (FDR < 0.01) were
   identified for CR.

Figure 6.

   Figure 6
   [143]Open in a new tab

   Trajectories on D^2 plots reveal the transcriptional activation modes.
   A) Boxplots for percentages of exon (left) and L1 (right) in three
   identified modes, namely constantly active (CA), repressed to active
   (RA), and constantly repressed (CR). For background datasets, 20 000
   bins from the whole genome are randomly selected. B) Enriched GO terms
   and KEGG pathways of the three modes. Only the top five significant
   terms (FDR < 0.05) are plotted. No significant term is found for the
   CR, therefore it is not shown. C) Schematic of a 3 Mb genomic segment
   of paternal haplotype containing 19 HK genes, 10 ORs, and 129 L1
   repeats. The colored segments mark the locations of these functional
   elements. D) The top three heatmaps show the activation indexes of
   three OSN cell types. The three identified modes are shown beneath
   them. The gray bins are not classified into the three modes. The bottom
   three bar plots denote the HK gene density, L1 density, and OR density
   at the 3 Mb genomic region. Each bin at the heatmap or bar plot
   indicates a 20 kb genomic segment. E) The trajectories of the sample
   genomic segments on the D^2 plot. The trajectories for other genomic
   segments are shown in Figure [144]S20 of the Supporting Information.
   The black arrows show the directions of trajectories for ORs.

   We then spotted an ≈3 Mb genomic segment (chromosome 10: 126.16–129.28
   Mb), harboring 19 housekeeping (HK) genes (mainly located at CA), 10
   ORs (at RA), and 129 L1 repeats (at CR) (Figure [145]6C). The
   activation indexes of HK genes stabilized at high levels across the
   three cell types, while those of L1 repeats remained at low levels
   (Figure [146]6D). On the D^2 plot, the housekeeping genes persistently
   enriched at the inner low‐density regions, while the L1 repeats
   enriched at the periphery (Figure [147]6E; Figure [148]S19, Supporting
   Information), corresponding to their constitutive activation or
   repression, respectively.

   ORs were a large gene family that expressed at mature OSNs for sensing
   odors.^[ [149]^33 ^] Correspondingly, their activation indexes
   increased during development (Figure [150]6D), and they moved on the
   D^2 plot from the periphery (newborn OSNs), to inner high‐density
   (developing OSNs), and then toward lower density (mature OSNs), in
   consonance with their activation process (Figure [151]6E; Figure
   [152]S18, Supporting Information). Additionally, for each individual
   OSN, only one specific OR expressed while all the others were
   repressed,^[ [153]^33 ^] which would leave featured distributions on
   D^2 plot. Our D^2 plot demonstrated that ORs collectively enriched
   inside nuclei with varying densities, corresponding to the active,
   intermediate, and repress and histone areas (Figure [154]S20,
   Supporting Information), and reflecting their diverse levels of
   activation at the bulk level.

3. Conclusion

   Inside the cell nuclei, the genome is compartmentalized into regions
   with different physical properties, to effectively execute their
   biological functions. Based on the reconstructed genome structures from
   single‐cell Hi‐C, we developed the D^2 algorithm to define density and
   DisTP at a genome‐wide scale and integrated them into a two‐dimensional
   D^2 plot. Different genetic and epigenetic markers exhibited distinct
   enrichment patterns on the D^2 plot, indicating that these two physical
   properties, density and DisTP, were two pivotal parameters for
   identifying the functional compartmentalization of the genome
   structure. For instance, transcription‐related enzymes and active
   genomic sites altogether clustered at inner low‐density regions,
   whereas repressed heterochromatin markers enriched at inner
   high‐density or periphery regions.

   In this study, we investigated the enrichments of architectural
   proteins (CTCF, YY1, and cohesin) on the D^2 plot and revealed their
   largely overlapping enrichment zones, correlating to their cooperative
   functions. In addition, we spotted cohesin‐independent enrichments of
   YY1 and CTCF at inner low‐density regions, hinting their independent
   functions. Furthermore, all the architectural proteins located inside
   the nuclei, suggesting alternative mechanisms for genome construction
   at the nuclear periphery. L1 repeats, which were previously reported to
   promote heterochromatin compartmentalization,^[ [155]^28 ^] were found
   to mainly locate at the nucleic periphery, implying their possible
   involvement in maintaining genome construction.

   In this study, we proposed a cross‐species transcriptional activation
   model on the D^2 plot among different human and mouse cells. This model
   revealed that the distance to the periphery was negatively correlated
   with the transcriptional activities of genes, which seemingly
   contrasted to the center‐positioned nucleoli that were mainly composed
   of heterochromatin. The significantly low genome coverage (<5%) of
   nucleolus‐associated domains might render these regions neglected in
   our model.^[ [156]^34 ^] To examine the fitness of our model to reflect
   gene activation, mouse olfactory neurons at three differentiating
   stages were analyzed, and our model successfully identified the
   activated genes based on the change of density and DisTP. However, the
   underlying mechanism has not been explored. We assume that the
   translocation of the selected genes between distal regions with
   different density and DisTP is unfavorable, owing to the high cost of
   energy. Alternatively, the changes of local chromatin modifications,
   such as histone modification, assembling into distinct condensates
   through phase separation, and simultaneously altering surrounding
   density or DisTP, is feasible.

   Notably, the D^2 model of mouse cells, albeit being derived from mouse
   forebrain neurons, is capable of linking the structure with gene
   expression during OSN development, verifying its broad fitness and
   feasibility. Nonetheless, further verification and application of this
   model in other species is required, but these are currently unavailable
   owing to the lack of high‐quality single‐cell Hi‐C data. Our model
   would be applicable to most species, where sophisticated Mb‐scale
   spatial organization is essential for regulating gene expression. It
   might be helpful to elucidate human tumorigenesis analysis, which
   usually involves dramatic alteration of the genome structure.

   Our study provides a new perspective for 3D genome analysis. Previous
   studies mainly focused on well‐known structures such as loops or TADs,
   which only revealed the spatial adjacency of focal structures.
   Contrarily, our D^2 model allows for universal measurements of the
   genome structure and correlates the genomic structure with its
   function. Nonetheless, we speculate that physical properties of DNA act
   more as microenvironments to facilitate or repress gene transcription
   activity rather than direct and decisive factors, as protein structure
   to its function. In summary, our study paves the way for further
   methodological exploration to link genome structure with its function.

4. Experimental Section

D^2 for Detecting Density and DisTP

   Python script D2.py implemented the D^2 algorithm, which rapidly and
   accurately computed the DNA density and DisTP values based on the
   reconstructed genome structure. D^2 algorithm can be found at
   [157]https://github.com/xjtu‐omics/D2.

Inputted Reconstructed Structures

   Reconstructed genome structures, which were generated from single‐cell
   Hi‐C, were the main input for the D^2 algorithm. Genome structures from
   two research groups were collected in this study (Table [158]S1,
   Supporting Information). Stevens et al.^[ [159]^16 ^] processed
   single‐cell Hi‐C on eight individual haploid mouse ES cells and then
   reconstructed the nuclear structures, using the NucDynamics algorithm.
   The algorithm was designed based on molecular dynamic simulation.
   First, the genome was divided into 100 kb genomic bins, each of which
   represented a DNA particle in 3D structure. Then, a structure was
   constructed using the physical force field, including attraction,
   repel, and restriction. Using the simulated annealing algorithm, the
   structures were constructed to optimally simulate actual structures.
   The other group, Tan et al. developed a novel chromain conformation
   capture method, Dip‐C, which can detect more contacts than single‐cell
   Hi‐C. They omitted biotin pull down and conducted high‐coverage
   whole‐genome amplification (META). Then, they used a genome
   reconstruction algorithm, hickit, which was similar to NucDynamics and
   enabled the construction of diploid cells at a relatively higher
   resolution (20 kb).^[ [160]^10 ^]

Partition Particles by 3D Segmentation

   Owing to the heterogeneity of nuclei sizes among cells, the
   reconstructed structures were not directly comparable among cells. This
   batch effect resulted in different nucleic radii and DNA densities. To
   address this, the DNA particles were placed inside a 3D mesh
   segmentation with equally shaped cubes, with cell‐specific cube length.
   The length was set as the average distance to the nearest ith particle
   of all particles from a given cell (i is the predetermined parameter,
   default as 7), to ensure the average number of DNA particles within a
   cube was approximately the same across cells. This predetermined
   parameter was not sensitive for further analysis, because the following
   analysis concentrated on the differences between the density and DisTP
   of genomic regions (e.g., lower‐density regions), while the change in
   this parameter increased or decreased density or DisTP universally. The
   segmentation was constructed according to the cube length and covered
   the particles. Then the spatial coordinate of particles was replaced
   using the cube coordinate. This normalization was based purely on DNA
   density, but the nucleic radius (half of the maximum distance among
   filled cubes) was adequately normalized (Figure [161]S21, Supporting
   Information). Furthermore, this partition reduced the number of tasks
   from particles to cubes, thus efficiently speeding up and saving memory
   during calculation.

Computing DNA Density

   Once segmentation was performed, DNA density could simply be defined as
   the number of particles per cube. However, the arbitrary partition
   above could result in an inaccurate density. For example, if a DNA
   particle resided in the transiting regions between compacted and loose
   regions, it should be labeled as middle density. However, it may be
   partitioned into high‐density or low‐density cubes, instead of a middle
   one. To solve this problem, the density of a given cube was smoothened
   by its nearby cubes through the following equation
   [MATH: <mrow><mrow><msub><mi>D</mi><mi>i</mi></msub><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mo>∑</mo><mrow><mi>j</mi
   ><mo>∈</mo><mi>S</mi></mrow></msub><msub><mi>C</mi><mi>j</mi></msub><mo
   >/</mo><mo stretchy="false">∥</mo><mover
   accent="true"><mi>i</mi><mo>⃗</mo></mover><mo>−</mo><mover
   accent="true"><mi>j</mi><mo>⃗</mo></mover><mo
   stretchy="false">∥</mo></mrow><mrow><mi>num</mi><mfenced open="("
   close=")"><mi
   mathvariant="normal">S</mi></mfenced></mrow></mfrac></mrow></mrow>
   :MATH]
   (1)

   where D[i] is smoothed DNA density of the spatial cube i, C[i] is the
   DNA particle number within the cube i, and S is the set of nearby
   cubes. The nearby range was set as 3. A longer range would be more
   optimized for calculation, but it was unnecessary as distant cubes
   contributed little.

Computing Distance to Periphery

   Previous methods for detecting DisTP were simple and, to a certain
   extent, inaccurate (Note [162]S2, Supporting Information). One of them
   was developed in 2017 by Stevens et al.^[ [163]^16 ^] It first
   identified the empty cubes around the nonempty cubes as the surface.
   Then, they calculated the DisTP as the nearest distance to the surface.
   However, imaging experiments showed that chromocenters would repel DNA
   inside the nucleus,^[ [164]^10 ^] which would be wrongly defined as the
   nuclear surface in Steven's methods. Tan et al. used another metric,
   the distance to the nuclear center.^[ [165]^10 ^] This circumvented the
   chromocenter, but assumed the nuclei to be spherical. However, nuclei
   were always in different shapes. Therefore, a new methodology that
   accurately defined the periphery and then computed the DisTP was
   introduced.

   In this method (Figure [166]S1, Supporting Information), the outer
   empty cubes were first defined as the membrane cubes. They were started
   to define from the boundaries of cubes, and then were moved toward the
   inner cubes. The definition was stopped when filled cubes were
   encountered. To eliminate the dissociative outlier particles, the
   definition was stopped only when there were enough particles within a
   certain range (3 particles in 5 cubes by default). In other words, the
   faraway cubes with only one or two particles were defined as membrane
   cubes. Then, the DisTP for a given cube was calculated as the average
   value of the top 10 minimum distances to membrane cubes. The minimum
   distances were not used because they were discrete values and highly
   sensitive. For example, if one inner cube was incorrectly defined as a
   membrane cube, the cubes around it would be defined as periphery.

D^2 Plot: Construction of D^2 Plot

   To combine density and DisTP for further systematic analysis, a 2D
   density–DisTP matrix, the D^2 plot was constructed. Outlier bins were
   eliminated during plotting. For the density axis, the distribution
   resembled the normal distribution. Therefore, the 1.96 * SD was
   selected from the center as the separating line, to retain 95% of the
   center bins. For the DisTP axis, all the bins were maintained near the
   periphery, but eliminated the top 5% bins further away from the
   periphery. The ranges were similar among the diploid cell types, even
   among the species (Table [167]S4, Supporting Information). Therefore, a
   standard range was set for all diploid cells (1 to 3.2 for density,
   1.21 to 16 for DisTP). Each axis was then divided into 15 equally
   shaped bins. The bin number was set arbitrarily, but it was showed that
   different bin numbers did not affect the enrichment patterns (Figure
   [168]S22, Supporting Information). Each 2D bin at the D^2 plot
   represented a specific physical state with a unique combination of
   density and DisTP. Notably, states with fewer than 200 particles were
   removed from subsequent analysis to ensure statistical significance
   (marked as empty).

Computing Marker Enrichment Scores

   To calculate the marker enrichment scores on the D^2 plot, genetic data
   were weighted and averaged by the probability of the genomic segment
   appearing at the physical state, as shown below
   [MATH: <mrow><mrow><msub><mi>E</mi><mi>a</mi></msub><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mo>∑</mo><mi>G</mi></msu
   b><msub><mi>P</mi><mrow><mi>g</mi><mi>a</mi></mrow></msub><mo>·</mo><mi
   >N</mi><msub><mi>V</mi><mi
   mathvariant="normal">g</mi></msub></mrow><mrow><msub><mo>∑</mo><mi>G</m
   i></msub><msub><mi>P</mi><mrow><mi>g</mi><mi>a</mi></mrow></msub></mrow
   ></mfrac></mrow></mrow> :MATH]
   (2)

   where E[a]  is the computed enrichment score of the genomic marker in
   the physical state, a. NV[g]  is the z‐scored normalized enrichment
   score of a genetic marker at the genome segment, g. P[ga]  is the
   probability of the genome segment g appearing in the state a, defined
   as the ratio of the number of cells whose genomic segment g located at
   the state a with respect to the total cell number. Afterward, the
   enrichment scores were normalized by z‐score normalization.

   Through this procedure, the density and DisTP of single cells were
   determined. There was another way of computing the enrichment scores,
   where the density and DisTP were averaged first
   [MATH: <mrow><mrow><msubsup><mi>E</mi><mi>a</mi><mo>′</mo></msubsup><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mo>∑</mo><msub><mi>G</mi
   ><mi>a</mi></msub></msub><mi>N</mi><msub><mi>V</mi><mi>g</mi></msub></m
   row><mrow><mi>N</mi><mfenced separators="" open="("
   close=")"><msub><mi>G</mi><mi>a</mi></msub></mfenced></mrow></mfrac></m
   row></mrow> :MATH]
   (3)

   where
   [MATH: <mrow><msubsup><mi>E</mi><mi>a</mi><mo>′</mo></msubsup></mrow>
   :MATH]
   is the alternative enrichment score derived from the average density
   and DisTP at physical state, a. G[a] is the collection of genomic
   segments whose average density and DisTP were located at state a.
   N(G[a]) is the number of collections. The average values delineated the
   bulk activities of a cell type more optimally, but were unable to
   represent the raw density and DisTP detected from the single cell. For
   example, the average values could not reflect the sparse distribution,
   for example, the multimodal distribution.

   To test if the average density and DisTP would give similar enrichment
   patterns, the dip test was first carried out for unimodal distribution.
   All genomic bins were unimodal for both density and DisTP (Table
   [169]S3, Supporting Information, percentage of both p‐value < 0.01:
   98.11–99.88%). Then, the enrichment score analysis with average density
   and DisTP was repeated and it was found that it exhibited similar
   patterns to Figure [170]4 (Figure [171]S23, Supporting Information),
   indicating that the average density and DisTP equivalently reflect
   biological activity.

Stochasticity of Marker Enrichment Score

   Considering the high cell stochasticity of density and DisTP, the
   stochasticity of marker enrichment scores was examined by computing
   their SD using the following equation
   [MATH: <mrow><mrow><msub><mi>SD</mi><mi>a</mi></msub><mo
   linebreak="badbreak">=</mo><msqrt><mfrac><mrow><msub><mo>∑</mo><mi>G</m
   i></msub><msub><mi>P</mi><mrow><mi>g</mi><mi>a</mi></mrow></msub><mo>·<
   /mo><msup><mfenced separators="" open="("
   close=")"><mrow><mi>N</mi><msub><mi>V</mi><mi>g</mi></msub><mo>−</mo><m
   sub><mi>E</mi><mi>a</mi></msub></mrow></mfenced><mn>2</mn></msup></mrow
   ><mrow><msub><mo>∑</mo><mi>G</mi></msub><msub><mi>P</mi><mrow><mi>g</mi
   ><mi>a</mi></mrow></msub></mrow></mfrac></msqrt></mrow></mrow> :MATH]
   (4)

   where SD[ a ] is the calculated SD of the marker enrichment score at
   physical state a. NV[g] , P[ga] , and G are defined as above. To
   compare SD with the mean enrichment scores, they were presented
   together at Figure [172]S24 of the Supporting Information. The max–min
   normalization on mean values was utilized for comparison between
   markers. This normalization was also used on SD values. Therefore,
   markers with SD less than 0 denoted low stochasticity, while others
   with greater than 1 denoted high stochasticity. Certain markers (e.g.,
   H3K27me3 and H3K9me3) showed lower SD than the mean value, indicating
   their consistent enrichment among cells. Nonetheless, most markers
   showed a comparable or larger SD, indicating high stochasticity. To
   test the robustness of the enrichment score analysis under such
   stochasticity, the analysis was repeated using only one cell (Figure
   [173]S25, Supporting Information). The enrichment patterns, even from
   one cell, were highly similar to Figure [174]3, suggesting a variable
   despite the robust correlation between physical properties and genetic
   markers.

Hierarchy Cluster and Property Strength

   To obtain a transcriptional pattern on the D^2 plot, 16
   transcription‐related markers (Txn‐St, Txn‐W, Pr‐Ac, Pr‐W, En‐St, En‐W,
   H3K27ac, H3K36me3, H3K4me3, H3K4me2, Alu, H3K9mne3, H3K27me3, L1, Ht‐L;
   the abbreviations are spelled out in the legend of Figures [175]S7 and
   [176]S10, Supporting Information) were selected and the hierarchy
   cluster (scipy.cluster.hierarchy with parameter optimal_ordering =
   True, method = “ward”) was performed on physical states from the D^2
   plot. The hierarchy cluster number was set as 4 for both species.

   The property strength reflected the intensity of impact of density or
   DisTP on the transcription level. The active markers were selected and
   the property strength was computed at a given DisTP as the slope
   between property changes (x) and marker enrichment score (y), as
   follows
   [MATH: <mrow><mrow><msub><mi>S</mi><mrow><mi
   mathvariant="normal">d</mi><mo>,</mo><mi
   mathvariant="normal">p</mi></mrow></msub><mo
   linebreak="badbreak">=</mo><mrow><mi>slope</mi><mspace></mspace><mi>of<
   /mi><mspace></mspace><mi>linear</mi><mspace></mspace><mi>fitting</mi><m
   space></mspace></mrow><mfenced separators="" open="("
   close=")"><mrow><mi>x</mi><mo
   linebreak="goodbreak">=</mo><msub><mi>MFG</mi><mrow><mi
   mathvariant="normal">d</mi><mo>,</mo><mrow><mspace></mspace><mi
   mathvariant="normal">p</mi></mrow></mrow></msub><mo>,</mo><mi>y</mi><mo
   >=</mo><msub><mi>P</mi><mrow><mi
   mathvariant="normal">d</mi><mo>,</mo><mi
   mathvariant="normal">p</mi></mrow></msub></mrow></mfenced></mrow></mrow
   > :MATH]
   (5)

   where S [d,p] is the computed property strength of property p at DisTP
   d. For density, the x input for linear fitting is the collection of all
   densities at a given DisTP and y is the corresponding marker enrichment
   score. For DisTP, x is the collection of the given DisTP and its nearby
   DisTP, and y is the corresponding enrichment score. The property
   strength reflected the association between the physical properties and
   the transcription level.

Distribution of Lineage‐Specific Active Genes

   The distribution of active genes was examined on the D^2 plot, to
   validate the transcriptional activation model. Considering the
   nonuniform distribution of whole genome bins on the D^2 plot, the
   enrichment scores of active genes over the background were computed.
   Specifically, the distribution of active genes was first calculated,
   and then it was divided by the distribution of all genomic bins as the
   background.

Trajectories on D^2 Plots

   The activation index and its change were introduced to reveal the
   trajectories of a given genomic region on the D^2 plot, which reflected
   its transcriptional activation modes (Figure [177]S18, Supporting
   Information). Specifically, the activation index was defined as the sum
   of the enrichment scores of a given genomic bin on active or
   intermediate regions. Based on the activation index, the top 10%
   genomic bins were marked as the active bins, the bottom 10% as the
   repressed bins, and the others as medium. Three transcriptional modes:
   constantly active (CA), constantly repressed (CR), repressed to active
   (RA), were identified. During OSN maturation, the bins, which were
   active at all OSN cell stages, were marked as the CA bins. Conversely,
   the bins repressed at all stages were marked as CR bins. RA bins
   comprised bins that were not active at newborn OSNs, but active at
   mature OSNs.

   The genomic positions of exons were obtained from the gff file, and
   those of L1s were obtained from RepeatMasker. Housekeeping genes were
   obtained from the HRT Altas v1.1 database.^[ [178]^35 ^] Their
   percentage of length at the given genomic bins was calculated.

   The gene pathway enrichment analysis was performed by DAVID.^[ [179]^36
   ^] Only KEGG pathways and GO terms were chosen for enrichment.

Reference Genome

   The mouse reference genome (GRCm38) and gene annotations (ALL) were
   downloaded from the GENCODE ([180]https://www.gencodegenes.org/mouse/)
   M19 release.

   The human reference genome (GRCh37) and gene annotations (ALL) were
   downloaded from the GENCODE ([181]https://www.gencodegenes.org/human/)
   39 release.

Statistical Analysis

   All of the Pearson correlations were two‐sided and were computed using
   scipy.stats.pearsonr. The linear fitting was conducted using
   scipy.stats.linregress, with parameter deg = 1. The sample size for
   each statistical analysis is available in the figure legends or in the
   Supporting Information. Fisher tests were two‐sided and computed by
   scipy.stats.fisher_exact.

Preprocess of Genetic Markers

   To integrate density and DisTP with other genomic data, the other data
   were transferred into the same indexed data. All data (except
   DamID‐LmnB1 data of mESC) were indexed by 20 kb genomic bins. For
   unphased data, the data were copied to two haplotypes.

ChIP‐seq Data

   ChIP‐seq sequenced protein binding sites. ChIP‐seq data of
   transcription factors (CTCF, YY1, RAD21, SMC3, POLR2A, POLR2Ap5 for
   human GM12878 cells) and histone modifications (H3K4me2, H3K4me3,
   H3K9me3, H3K27ac, H3K27me3, H3K36me3 for both human GM12878 cells and
   mouse neonatal neurons) were collected. These data, with different
   types of processed files, were collected from ENCODE. Here the fold
   change was utilized over the control file, because it provides whole
   genome protein binding information. This file was formatted as BigWig
   and was transformed to a bedgraph file using the UCSC script
   bigWigToBedGraph. Then bedtools intersect were used to find the
   intersected regions between the index file and the marker bedgraph
   file. Next, the fold change of all 20 kb bins was weight‐averaged and
   they were indexed according to the index file.

Genomic Repeat

   The repeat data were obtained from RepeatMasker. Owing to the low
   resolution of the 3D genome, only broadly distributed repeats would be
   statistically significant for enrichment. Therefore, repeats with less
   occurrences were filtered out. Only repeats with more than 20 000
   occurrences were left. The remaining repeats with the index file were
   intersected and then their percentage at a given genomic bin was
   calculated, and finally they were indexed in the aforementioned
   procedure.

Chromatin State

   Chromatin states described the transcriptional states of genomic loci.
   They were calculated using HMM and based on chip‐seq data.^[ [182]^37 ,
   [183]^38 ^] Because chromatin states were discrete states,
   intersections, calculated percentages, and indexed were performed
   through processes similar to those of the repeats. Each genomic bin had
   a value for each chromatin state, representing the length of the state
   within that bin.

CpG

   The CpG data were downloaded from ENCODE, formatted as bedMethyl. Sites
   were filtered out without reads. The remaining sites were intersected
   with the index file, the median CpG percentages were calculated in a
   given bin, and finally they were indexed using the aforementioned
   procedure.

DamID‐Lmn for Detecting LADs

   DamID‐seq identified protein binding sites by expressing the proposed
   DNA‐binding protein as a fusion protein with DNA methyltransferase.
   Using DamID‐seq to detect the binding sites of Lmn‐B1 (Lamin‐B1), the
   main component of nuclear lamina, is a popular method of detecting
   LADs. The mESC LAD experimental data were utilized.^[ [184]^39 ^]
   Because these data were discrete like genomic repeats, the data were
   intersected, calculated, and indexed in the same way as the repeats.
   Notably, the bin size was 100 kb for the used mES cells.

   To validate DisTP, experimental LAD data were compared to the
   probability of appearing at the nuclear periphery (percentages of cells
   whose given genomic bin had DisTP below 2). As they were both indexed
   by the same index file, they were associated with their index. The
   outliers were eliminated using three‐sigma limits.

   Noteworthy, due to the small number of available cells (n = 8), the
   possible values for simulated LAD data are too sparse at 100 kb
   resolution. Therefore, the nearby bins were concatenated to 2 Mb
   resolution, in order to provide continuous percentages. The number of 2
   Mb genomic bins is 1103. The two‐sided Pearson correlation was computed
   using scipy.stats.pearsonr.

Stochasticity of Density and DisTP

   To explore the stochasticity of DisTP, its standard deviations were
   computed (SDs; Figure [185]S4, Supporting Information). SD was computed
   among cells of given genomic bins, thereby reflecting the cell
   heterogeneity. The lower SD meant a more stable DisTP value among
   cells.

   The stochasticity of density was computed similarly to DisTP. In
   contrast with DisTP, the lower SD regions of density showed obvious
   cell‐type specificity. Therefore, the Pearson correlation between
   density mean and density SD (scipy.stats.pearsonr) was conducted, to
   identify the values of density that were more stable.

Conflict of Interest

   The authors declare no conflict of interest.

Authors Contribution

   Y.C. developed the D^2 algorithm with the help of X.Y. Y.C. downloaded,
   analyzed, and interpreted the data in current studies. P.J. and T.W.
   provided help for analyzing data. Y.C. was the major contributor in
   writing the manuscript. T.G. helped with the benchmark. X.Y., D.X., and
   K.Y. revised the manuscript. K.Y. supervised the whole program. All
   authors read and approved the final manuscript.

Supporting information

   Supporting Information
   [186]Click here for additional data file.^ (4.6MB, pdf)

Acknowledgements