Abstract

   Large copy number variants (CNVs) in the human genome are strongly
   associated with common neurodevelopmental, neuropsychiatric disorders
   such as schizophrenia and autism. Here we report on the epigenomic
   effects of the prominent large deletion CNVs on chromosome 22q11.2 and
   on chromosome 1q21.1. We use Hi-C analysis of long-range chromosome
   interactions, including haplotype-specific Hi-C analysis, ChIP-Seq
   analysis of regulatory histone marks, and RNA-Seq analysis of gene
   expression patterns. We observe changes on all the levels of analysis,
   within the deletion boundaries, in the deletion flanking regions, along
   chromosome 22q, and genome wide. We detect gene expression changes as
   well as pronounced and multilayered effects on chromatin states,
   chromosome folding and on the topological domains of the chromatin,
   that emanate from the large CNV locus. These findings suggest basic
   principles of how such large genomic deletions can alter nuclear
   organization and affect genomic molecular activity.
     __________________________________________________________________

   Copy number variants in the human genome (CNVs) are associated with
   neurodevelopmental and psychiatric disorders such as schizophrenia and
   autism. Here the authors investigate how  the large deletion CNV on
   chromosome 22q11.2 alters chromatin organization.

Introduction

   Two of the most exciting discoveries in human genetics of the past
   decade are that small-to-medium-sized copy number variants (CNVs) are
   very common in the human genome and that there is a group of large CNVs
   that are strongly associated with brain development and
   neuropsychiatric disorders, such as schizophrenia and the autism
   spectrum disorders (ASDs)^[42]1,[43]2. These large CNVs are widely
   considered to be  enticing points of entry to the analysis of the
   strong but complex genetic, molecular, and possibly even cellular,
   basis of these common disorders.

   Large CNVs, typically sized from hundreds of thousands to millions of
   base pairs of genomic DNA sequence, were previously known to be in
   strong association with often severe but rare congenital malformations,
   or found in cancer genomes. It was a striking discovery when a series
   of studies^[44]1,[45]2 showed that there is a group of more than ten
   large CNVs that are strongly associated with aberrant brain development
   and a resulting neuropsychiatric phenotype such as schizophrenia or
   ASD. These large neuropsychiatric CNVs each encompass multiple genes
   and their effects across the various molecular levels of gene activity
   and regulation, and the connections from there to the clinical
   phenotypes, are complex and only poorly understood. For example, 22q11
   deletion syndrome (22q11DS) is a disorder caused in the vast majority
   of cases by a heterozygous deletion of about 3 million base pairs
   spanning about 60 known genes on chromosome 22q11.2. It occurs in 1 per
   3000–6000 live births^[46]3. The common phenotypes of 22q11DS include a
   large spectrum of congenital anomalies, for example of the facial
   structures and the immune and cardiovascular systems—and notably there
   is a strong association with several neurodevelopmental psychiatric
   disorders, in particular schizophrenia and ASD^[47]2,[48]4–[49]7.

   On the molecular level, these large neuropsychiatric CNVs have been
   mostly studied by focusing on the effects of individual genes or small
   groups of genes from within the CNV boundaries. Many very interesting
   insights have been gained using this approach.

   However, these findings about individual genes fall short of explaining
   the full effects of the large CNVs. There already have been a number of
   transcriptome-wide studies that at least hint at certain network
   effects emanating from the large CNVs^[50]8–[51]12. Which mechanisms
   mediate such transcription network effects is then the question.
   Furthermore, there are an increasing number of studies that show a
   potentially very important role of chromatin regulation in the
   molecular etiology of neuropsychiatric disorder^[52]13–[53]19.

   Against this backdrop, we reasoned that it was worthwhile testing
   whether large CNVs with association with brain development might cause
   a disruption or at least alteration of one or several aspects of
   chromatin conformation, such as the distribution of regulatory
   chromatin marks, the long-range direct physical interactions between
   distant regions on one chromosome or between different chromosomes, or
   the higher-order chromatin domain structures that are defined by such
   marks or interactions. Such effects on these important layers of
   molecular regulation of gene activity would then constitute a basic
   principle by which large CNVs could transmit their presence to the
   machinery of cellular physiology.

   Here we show, in a cohort of lymphoblastoid cell lines (LCLs) derived
   from patients with 22q11DS, that chromatin marks, chromatin domains,
   and long-range chromosome interactions are affected in several distinct
   ways by the large, common, and strongly disease-associated CNV on
   chromosome 22q11.2. We use the large CNV on 22q11.2 as a model to
   determine the generalizable principles along which large CNVs of this
   category can lead to changes to the various ways in which chromatin is
   ordered, using unbiased, genome-wide, sequencing-based assays for
   discovery. We then go on to show in a smaller number of LCLs from
   different patients that at least some of the same observations can also
   be made for another neuropsychiatric large CNV on chromosome 1q21.1.

Results

Generation of Hi-C, capture-Hi-C, and haplotype phasing data

   To determine the possible effects of the 22q11.2 deletion on
   chromosomal interactions, we generated Hi-C contact maps for 11 human
   LCLs (5 patient cell lines with 22q11.2 deletion and 6 control cell
   lines without), with a total of 3.1 billion Hi-C contact reads of which
   680 million read-pairs were of high quality and used for the downstream
   analyses (Supplementary Table [54]1). The presence and boundaries of
   the heterozygous 3-Mbp deletion in the patient cell lines were
   validated by whole-genome sequencing (Supplementary Fig. [55]1). To
   rule out an increase in overall genome instability in the patient LCLs,
   we carried out whole-genome sequencing at a coverage comparable to that
   used for the majority of samples by the 1000 Genomes Project (i.e.,
   <10× genome-wide coverage). Almost all of our control cell lines had
   been part of the 1000 Genomes Project and we compared the genome-wide
   CNV load between our patient LCLs and LCLs used by the 1000 Genomes
   Project. We found no elevated genome-wide CNV burden in our patient
   cell lines (Supplementary Table [56]2).

   As a means of quality control, our study included the cell line
   GM06990, which was the cell line used in the original Hi-C
   paper^[57]20. Inter-chromosomal contacts of GM06990 as determined by
   our own Hi-C data for this line (Supplementary Fig. [58]2a) show the
   same patterns of chromosomal interactions across the nucleus as in ref.
   ^[59]20; i.e., small chromosomes generally have more interactions with
   each other than larger chromosomes with each other and many more than
   chromosomes in the medium size range. Specific interaction pairs, e.g.,
   between chromosomes 17 and 19, were also replicated in our GM06990
   data. Global inter-chromosomal contact maps resulting from combining
   all our controls and cases, respectively, again replicated these global
   interaction patterns (Supplementary Fig. [60]2b, c).

   In order to be able to carry out haplotype-specific analysis of the
   chromatin and transcriptome data, we conducted haplotype phasing for
   the two related patients in our cohort, a mother and son duo with the
   22q11.2 deletion, where this was made possible by combining the
   sequencing-based approach of statistically aided long-read haplotyping
   (SLRH)^[61]21 with an analysis of Mendelian inheritance of
   single-nucleotide variants (SNVs). We were able to phase 1,868,316 and
   1,870,948 heterozygous SNVs on the autosomes of ID00014 and ID00016,
   respectively (see details in Methods and Supplementary Information).
   Two control samples from related donors, GM12878 and GM12892, were also
   haplotype-phased using the same approach. For GM12892, we phased
   1,929,967 autosomal SNVs and for GM12878 we phased 1,874,181 autosomal
   SNVs. These phased SNVs were then used to create haplotype-specific
   Hi-C interaction maps and to carry out allele-specific analyses of the
   chromatin immunoprecipitation–sequencing (ChIP-Seq) and RNA-Seq data.

   Haplotype-specific Hi-C analysis requires very deep sequencing coverage
   of Hi-C libraries since only paired-end reads that cover at least one
   informative (heterozygous) SNV can be used for that purpose. To
   increase the Hi-C coverage for chromosome 22q to these required very
   deep levels, we carried out custom-designed chromosome-wide targeted
   capture Hi-C. We performed capture Hi-C by using a custom-designed set
   of 2.1 million NimbleGen oligomer capture probes representing the
   entire sequence of chromosome 22q and Hi-C libraries prepared using the
   in situ Hi-C protocol from the four cell lines for which we generated
   haplotype phasing data, followed by Illumina sequencing (Supplementary
   Table [62]1). With this approach, we achieved 10–16-fold levels of
   chromosome 22q-specific enrichment of Hi-C paired-end reads over the
   standard non-capture Hi-C data (Supplementary Fig. [63]3).

Normalization of Hi-C data

   Many factors, such as mappability of sequencing reads, GC content,
   length of the restriction enzyme fragment, etc., can lead to biases in
   Hi-C data. Several alternative computational approaches for Hi-C data
   analysis have been developed that allow for the removal of these biases
   from data^[64]22–[65]31. When using these alternative algorithmic
   approaches on Hi-C data that was generated for genomes without large
   CNVs, such as GM12878, the resulting normalization metrics will be
   highly correlated^[66]32. For our Hi-C data, however, we needed to be
   certain to use a normalization method that is not thrown off at the
   outset by the presence of the heterozygous 3-Mbp deletion in 22q11.2.

   We tested three different commonly used normalization algorithms that
   were developed for Hi-C data, following the rationale that the
   normalization methods should not change the general patterns of
   interaction we can see in the raw data.

   We found that not all of the available normalization methods are robust
   for the use with Hi-C data coming from genomes with large CNVs.
   However, the hicpipe algorithm^[67]22 is quite suitable for this
   purpose (details of the comparison between normalization methods in
   Supplementary Information and Supplementary Fig. [68]4).

Decrease of intra-chromosomal contacts involving 22q11.2

   We observed that in 22q11DS cell lines the chromosomal contacts within
   the 22q11.2 deletion region are decreased significantly compared to
   control cell lines (Fig. [69]1a). Also strongly reduced are the
   chromosomal contacts between the 22q11.2 deletion region with the
   entire remainder of chromosome 22 (Fig. [70]1a; Supplementary
   Fig. [71]5a). This decrease is consistent with the copy number of the
   22q11.2 deletion region in the patient cell lines, as all of the
   22q11DS cell lines are heterozygously deleted for this region. No such
   decrease of chromosomal contacts that involved an extended and
   contiguous chromosomal region was observed that did not involve the
   22q11.2 region. Likewise there was no such strong decrease of
   intra-chromosomal contacts over an extended and contiguous chromosomal
   region on any of the other autosomes (for example, chromosome 19 in
   Fig. [72]1b; Supplementary Fig. [73]5b).

Fig. 1.

   [74]Fig. 1
   [75]Open in a new tab

   Effects of the 22q11.2 deletion on chromosome interactions, gene
   expression and chromatin marks. a–d Each pixel in the heatmaps
   represents the intra- or inter-chromosomal contact frequency in Hi-C
   data from 22q11.2del cell lines (n = 5) versus control cell lines
   (n = 6) for a 500-kbp region. Yellow dashed lines indicate the 3-Mbp
   deletion on chromosome 22q. The color scale goes from −2 (blue) to 0
   (gray) to 2 (red). a Fold change of cis-contacts along chromosome 22 in
   22q11.2del versus control cell lines. Black boxes indicate increased
   contacts between the deletion-flanking regions in 22q11.2del cell
   lines. Blue box: the signal for increased contacts between the
   centromere–distal deletion-flanking region and the telomeric end of
   chromosome 22q (red arrows and dashed red line indicate the
   corresponding chromosome folding event). b Lack of intra-chromosomal
   fold change of contacts for chromosome 19. c Fold change of
   inter-chromosomal contacts between chromosome 22 and chromosome 17. d
   Lack of inter-chromosomal fold change of contacts between chromosome 18
   and chromosome 19. e Log2-transformed fold change of gene expression
   for genes on chromosome 22q in RNA-Seq data from 22q11.2del (n = 5)
   versus control (n = 9) cell lines. Each point represents a gene. f–h
   Log2-transformed fold change in ChIP-Seq signals in 22q11.2del versus
   control cell lines. f H3K27ac histone modifications (n = 5 for
   22a11.2del and n = 4 for control cell lines). g H3K27me3 histone
   modifications (n = 5 for 22a11.2del and n = 4 for control cell lines).
   h CTCF-binding sites (n = 4 for 22a11.2del and n = 3 for control cell
   lines). Black vertical dashed lines indicate the 3-Mbp deletion on
   chromosome 22q in e–h

   We then investigated whether this effect of a decrease in chromosomal
   interactions between the 22q11.2 deletion and cis-contacts (i.e.,
   within the deletion boundaries and between the deletion region and
   elsewhere on chromosome 22q) also holds for trans-contacts (i.e., for
   contacts between the region within the deletion boundaries and the rest
   of the genome). We found the trans-contacts involving the 22q11.2
   deletion in the patient cell lines and any other chromosome also
   decreased compared to control cell lines (Fig. [76]1c; Supplementary
   Fig. [77]5c). No other regions of chromosome 22q showed such an effect
   and neither did any other pair of autosomes that did not include 22q
   (Fig. [78]1d; Supplementary Fig. [79]5d).

Gene expression and epigenetic profiles in 22q11.2 region

   To determine the effects of the 22q11.2 deletion on gene expression
   patterns and chromatin marks, we performed RNA-Seq on 14 cell lines
   (Supplementary Table [80]1) and ChIP-seq for H3K27ac and H3K27me3
   histone modifications in 6 cell lines, as well as ChIP-Seq for CTCF
   binding in 5 cell lines, respectively (Supplementary Table [81]3).
   RNA-Seq analysis showed that if a gene that is located within the
   boundaries of the large CNV in 22q11.2 is expressed, the level of that
   expression is decreased in 22q11.2DS patient lines relative to control
   cell lines (Fig. [82]1e), consistent with a previous study^[83]8.

   Differential pattern analysis of the ChIP-Seq data for H3K27ac,
   H3K27me3, and CTCF showed that for the majority of binding sites within
   the 22q11.2 deletion boundaries the binding signal decreased in
   22q11.2DS patient lines compared to control cell lines (Fig. [84]1f–h).
   This phenomenon of decreased binding signals over a large chromosomal
   region is specific to chromosome 22q11.

Chromosome contacts and chromatin marks in flanking regions

   The distal and proximal flanking regions of the 22q11.2 deletion are
   brought into close proximity to each other by the formation of the
   deletion breakpoint junction. We hypothesized that, since Hi-C contacts
   between two given regions will increase with decreasing genomic
   distance, the contacts between the distal and proximal
   deletion-flanking regions in 22q11DS patient cells would be markedly
   enhanced. We indeed found such stronger contacts between the
   deletion-flanking regions in 22q11.2DS cell lines (Fig. [85]1a;
   Supplementary Fig. [86]4a).

   We used the haplotype phasing information and the deep capture Hi-C
   data for two patient and two control cell lines to generate
   haplotype-specific contact maps for chromosome 22q (Fig. [87]2a). The
   chromosome contacts between proximal and distal flanking regions of the
   22q11.2 deletion increased on the chromosome 22q with the deletion when
   compared with the intact chromosome 22q within the same patient cell
   line (Fig. [88]2). The chromosome contacts between proximal and distal
   deletion-flanking regions in 22q11.2 on the intact chromosome in the
   patient cell lines were not affected compared to the chromosomes in the
   controls (Fig. [89]2).

Fig. 2.

   [90]Fig. 2
   [91]Open in a new tab

   Haplotype-specific intra-chromosomal contacts of chromosome 22q. a
   Contacts within each homologous chromosome 22q for patients (ID00014
   and ID00016) and controls (GM12878 and GM12892) at 200 kbp resolution.
   The chromosome 22q with 22q11.2 deletion is shown on the left for the
   two patient cell lines. Intensity of contacts is represented by a scale
   from 0 (white) to 5 (deep red). The gap in signal for the region from
   39 to 42 Mbp in ID00016 is caused by a very low density of heterozygous
   SNVs. b Zoomed-in view of the regions in black boxes in a. The color
   scale goes from 0 (white) to 10 (red)

   Following this observation, we wanted to examine whether there is an
   effect on the chromatin marks concurrent to these increased chromosomal
   contact patterns. To do so, we performed enrichment analysis in our
   ChIP-Seq data to detect genomic regions that were enriched with
   significantly differential signals of H3K27ac, H3K27me3, and CTCF
   between patients and controls in 500-kbp bins. We found that the
   deletion-flanking regions were significantly enriched for differences
   in H3K27ac and H3K27me3 signal (Fig. [92]3a, b). For CTCF, only the
   distal deletion-flanking region was significantly enriched (Fisher’s
   exact test p value = 2.53e−06) (Fig. [93]3c).

Fig. 3.

   [94]Fig. 3
   [95]Open in a new tab

   Distribution of significantly enriched differential ChIP-Seq signals
   and haplotype-specific A/B compartments. a H3K27ac histone marks (n = 5
   for 22a11.2del and n = 4 for control cell lines). b H3K27me3 histone
   marks (n = 5 for 22a11.2del and n = 4 for control cell lines). c
   CTCF-binding sites (n = 4 for 22q11.2del and n = 3 for control cell
   lines). d A/B compartments for each homologous chromosome 22q. Each bar
   represents a 500-kbp bin. Red and black dashed lines mark the
   boundaries of the 3-Mbp deletion in 22q11.2. Shown in d is the first
   eigenvector for the principal component analysis of the chromosomal
   contact matrix. x Axis: position on chromosome 22q. y Axis: value of
   the first eigenvector. The signal from the homologous chromosome 22q
   with 22q11.2 deletion and the intact chromosome 22q are shown in red
   and turquoise, respectively, for patients (ID00016 and ID00014).
   Signals from the homologous chromosomes 22q of the controls are in blue

   We then determined which individual binding sites within the
   differential 500-kbp bins contributed to the differential signal.
   Within the proximal deletion-flanking region, from 18 to 18.5 Mbp, we
   found that 5 out of the 24 sites with ChIP-Seq signal for H3K27ac and 6
   out of the 15 sites with ChIP-Seq signal for H3K27me3 showed
   significantly differential binding (Fisher’s exact test p
   value = 0.0075 for H3K27ac and p value = 5.67e−05 for H3K27me3,
   respectively). Interestingly, for H3K27ac all of the five sites with
   significant differential binding were bound less strongly while for the
   same region for H3K27me3 all of the six sites with significant
   differential binding were bound more strongly in the 22q11DS patient
   lines. For the distal region, from 23 to 23.5 Mbp, 5 out of the 13
   sites with H3K27ac binding and 11 out of the 30 sites with H3K27me3
   binding showed significantly differential binding (Fisher’s exact test
   p value = 0.0004 for H3K27ac and p value = 1.26e−07 for H3K27me3,
   respectively). Again we observed the reciprocity between significantly
   differential changes for the two different histone marks: 4 out of the
   5 sites for H3K27ac gave a less strong signal while all of the 11 of
   such sites for H3K27me3 showed a stronger signal in the 22q11.2DS
   patient cell lines.

   Such reciprocity in signal strengths between these two histone marks is
   a general feature of their principle of action and indicates that the
   observed changes in chromatin marks are of physiological relevance.

Haplotype-specific A/B compartments and topological domains

   Previous studies of cell lines without large CNVs have revealed that
   there is a level of organization in the cell’s nucleus where the
   chromatin is partitioned into two states, termed A and B compartments,
   which broadly correspond to large regions that are overall active and
   overall inactive, respectively^[96]20. These A/B compartments can be
   derived from Hi-C data. We sought to determine whether the 22q11.2
   deletion might lead to changes in the A/B compartments. We computed the
   haplotype-specific A/B compartments for the four cell lines for which
   both deep capture Hi-C and genome haplotype phasing data are available
   (Fig. [97]3d). We did not observe differences in the A/B compartments
   between cell lines from patients and from controls and neither between
   the two homologous chromosomes 22q in any individual cell line,
   patient, or control. Our results indicated that A/B compartments of the
   homologous chromosome with the 22q11.2 deletion were not affected by
   the deletion.

   Topological domains (also known as topologically associating domains
   (TADs)) are a megabase-sized structural feature of the genome
   organization that is constituted of highly self-interacting chromosome
   regions^[98]33. We calculated the haplotype-specific topological
   domains on the four cell lines with both capture Hi-C and haplotype
   phasing data (Fig. [99]4, Supplementary Figs. [100]6-[101]7). Although
   there were variations in the calling of topological domains across
   individuals and between the homologous chromosomes, the direction
   indices were highly consistent between the two homologous chromosomes
   in the controls (Supplementary Fig. [102]6c-d, Supplementary
   Fig. [103]7) as well as in the patients (Fig. [104]4, Supplementary
   Fig. [105]6a-b), suggesting no changes of topological domains on the
   homologous chromosome with the 22q11.2 deletion. In spite of the
   increased contacts between the flanking regions of the deletion, they
   were not strong enough to incur fusion of the topological domains
   proximal and distal of the 22q11.2 deletion. However, the increased
   contacts between the flanking regions of the deletion extended to
   multiple deletion-distal topological domains.

Fig. 4.

   [106]Fig. 4
   [107]Open in a new tab

   Differential chromosome contacts for the two chromosomes 22q and
   haplotype-specific topological domains for two patient cell lines. a
   Patient ID00014, b Patient ID00016. Heatmaps with bin size 200 kbp are
   showing the intra-chromosomal contacts on the homologous chromosome 22q
   with the 22q11.2 deletion minus the intra-chromosomal contacts on the
   intact chromosome 22q. The color scale of the heatmap goes from −5
   (blue) to 5 (red) with 0 represented by black. Yellow rectangles mark
   the increased contacts between the topological domains proximal and
   distal of the 22q11.2 region on the homologous chromosome 22q with the
   deletion (the topological domain on the proximal side of the deletion
   increases its contact not just with the topological domain immediately
   distal of the deletion, as predicted, but also with the two following
   topological domains). Yellow triangles and pink dashed lines mark the
   topological domains involved in the increased contacts. Black lines
   mark the boundaries and extent of the 22q11.2 deletion. Signal tracks
   for CTCF binding, histone marks, RNA-Seq, direction indices (DIs), and
   topological domains are shown twice, for the chromosome 22q with
   deletion (above dashed red line) and for the intact chromosome 22q
   (below dashed red line). Direction indices (DIs) are calculated at
   40-kbp resolution for each homologous chromosome 22q.
   Haplotype-specific gene expression, histone modifications, and CTCF
   binding, respectively, are shown after normalizing the reads to 10
   million read pairs. RNA-Seq and ChIP-Seq read-count bars within the
   boundaries of the deletion but from the intact chromosome 22q are shown
   for loci where there are phased SNVs in GM12878

Change in cis-contacts of distal deletion-flanking region

   Based on the above findings that the genomic region distal of the
   22q11.2 deletion was affected on different levels of molecular
   regulation in 22q11.2DS patient cell lines, we next sought to
   investigate whether the presence of the 22q11.2 deletion also affected
   the chromosomal cis-contacts of the distal deletion-flanking region
   with any other region on chromosome 22q. To do so, we analyzed our Hi-C
   data by calculating the fold change of contacts between the 21.5 and
   22-Mbp window, which is situated right distal of the 22q11.2 deletion,
   and all the other 1-Mbp-sized regions on chromosome 22q. We observed
   that region 17–18 Mbp, i.e., the breakpoint-proximal region right
   proximal of the deletion, showed the largest fold change (2.04) of
   contact with region 21.5–22 Mbp between 22q11DS and control cell lines.
   This was as expected given that these chromosomal regions were brought
   into close proximity to each other by the 22q11.2 deletion
   (Fig. [108]1a).

   The second largest fold change (1.96) of cis-contacts involving region
   21.5–22 Mbp was for contacts with region 50–51 Mbp, i.e., toward the
   very telomeric end of chromosome 22q (Fig. [109]1a). Intriguingly, we
   also observed strong positive correlation between region 22–22.5 Mbp
   and region 50–50.5 Mbp for CTCF binding (Pearson’s r = 0.933, p
   value = 0.02) (Supplementary Fig. [110]8a) and between region
   22–22.5 Mbp and region 50.5–51 Mbp for H3K27ac enrichment (Pearson’s
   r = 0.811, p value < 0.05) (Supplementary Fig. [111]8b). Furthermore,
   there was weak correlation between region 22–22.5 Mbp and 50.5–51 Mbp
   for H3K27me3 enrichment (Pearson’s r = 0.74, p value = 0.090)
   (Supplementary Fig. [112]8c). Given that both regions 21.5–22 and
   22–22.5 Mbp are in the distal flanking region of the 22q11.2 deletion,
   our results indicate that increased chromosomal contacts between the
   distal deletion-flanking region and the telomeric region 50–51 Mbp may
   be associated with the differential changes of histone modifications
   and CTCF binding that we found to be in correlation between these two
   regions.

   To validate the increased chromosomal contact between the
   deletion-distal region and the telomeric end of 22q, we performed
   three-dimensional fluorescence in situ hybridization (3D FISH) on 5
   cell lines with the 22q11.2 deletion and on 6 control cell lines, using
   two FISH probes specific to loci in region 21.8–22.5 Mbp (RP11–47L18:
   chr22:21,931,796–22,118,344) and in region 50–51 Mbp (RP11–125K3:
   chr22:50,149,996–50,287,311), respectively. FISH showed that the distal
   flanking region of the 22q11.2 deletion is significantly closer to
   region 50–51 Mbp in nuclear space in 22q11DS cell lines than in control
   cell lines (t test p value 0.008) (Fig. [113]5; Supplementary
   Fig. [114]9). Taken together, our results strongly indicate that the
   22q11.2 deletion causes conformational changes on several levels on
   chromosome 22q.

Fig. 5.

   [115]Fig. 5
   [116]Open in a new tab

   Examples of 3D FISH visualization of intra-chromosomal interaction
   changes. The regions for which Hi-C predicted changes in interaction,
   regions 21.8–22.5 and 50–51 Mbp on chromosome 22q, were visualized by
   3D DNA FISH using BAC probes RP11-47L18 and RP11-125K3, labeled with
   biotin (magenta) or digoxigenin (green). Arrow: an example for the
   magenta and green FISH probes in close proximity in a cell carrying the
   22q11.2 deletion. Scale bars are 5 µm. Magenta and green bars on
   chromosome 22q indicate the locations of the biotin and digoxigenin
   FISH probes, respectively. Black dashed lines indicate the position of
   the 3-Mbp deletion

Global changes of inter-chromosomal contact patterns

   To explore the effect of the 22q11.2 deletion on trans-contacts, i.e.,
   between any non-homologous autosomes in the nucleus of 22q11.2DS
   patient lines, we analyzed our Hi-C data for significantly different
   trans-contacts between 22q11DS and control cell lines. On the
   genome-wide level, we found 272 trans-contacts with a Fisher’s exact
   test p value of <0.0001 (Fig. [117]6a). Interestingly the majority of
   these chromosomal trans-contacts did not involve chromosome 22q as one
   of the interacting partners. Notably, 56 of these inter-chromosomal
   contact signals are among the top 5% of the strongest trans-contacts
   (Fig. [118]6b). This enrichment of significantly different genome-wide
   chromosomal trans-contacts within the strongest trans-contacts is again
   highly statistically significant (Fisher’s exact test p
   value < 2.2e−16). We found far fewer significantly different
   genome-wide chromosomal trans-contacts by randomly swapping the
   assignment of 22q11.2 deletion and control status across our Hi-C data
   sets, by comparing within control cell lines and by comparing within
   22q11.2del cell lines (Supplementary Table [119]4). None of these
   swapping analyses achieved the same enrichment. This indicates that a
   relevant amount of the significantly different genome-wide chromosomal
   trans-contacts is not due to random chromosomal motion or to as-of-yet
   unknown factors such as cell culture variations across the LCLs.
   Rather, our analysis points to a genome-wide disturbance of the network
   of chromosomal trans-contacts that is at least in part attributable to
   the presence of the 22q11.2 deletion on chromosome 22q.

Fig. 6.

   [120]Fig. 6
   [121]Open in a new tab

   Genome-wide inter-chromosomal contact changes in 22q11.2del versus
   control cell lines. a Circos plot of the inter-chromosomal contacts
   exhibiting differential interaction in 22q11.2del (n = 5) versus
   control (n = 6) cell lines at significance level of Student’s t test p
   value = 0.0001. b Circos plot of the inter-chromosomal contacts
   exhibiting differential interaction in 22q11.2del (n = 5) versus
   control (n = 6) cell lines at significance level of Student’s t test p
   value = 0.0001 and showing only the top 5% strongest inter-chromosomal
   contacts. The circle displays all autosome-sized scaffolds and each
   line represents an inter-chromosomal contact change in a, b

Global changes of gene expression patterns

   To investigate the global effect of the 22q11.2 deletion on gene
   expression, we performed differential expression analysis between the
   22q11.2DS and control cell lines. Of the 11,374 genes with detectable
   levels of expression (fragments per kilobase of transcript per million
   mapped reads (FPKM) > 0.5), 1610 genes showed significantly
   differential expression (false discovery rate (FDR) < 0.05). Gene
   ontology analysis indicated that these differentially expressed genes
   are enriched for genes involved in mitochondrial pathways such as the
   respiratory chain (n = 32, modified Fisher’s exact test p
   value = 2.96e−11). KEGG pathway analysis demonstrated enrichment of
   genes involved in oxidative phosphorylation and neurodegenerative
   diseases, such as Parkinson’s disease (Fig. [122]7a). Earlier studies
   have noted that there are several genes related to mitochondrial
   function that are located within the 22q11 deletion boundaries^[123]34,
   therefore the pathways associated with this cellular function would
   have a high likelihood to be affected by the change in copy number. The
   enrichment for pathways related to neurodegenerative disorder is
   unexpected in LCLs but notable, as an association between 22q11DS and
   Parkinson’s Disorder has been reported previously^[124]35–[125]37. This
   may indicate that the LCL cell culture may in certain instances be of
   relevance for the molecular study of 22q11DS even beyond the
   fundamental and general levels.

Fig. 7.

   [126]Fig. 7
   [127]Open in a new tab

   Differential expression analysis and correlation between gene
   expression and epigenetic profiles. a KEGG pathway analysis for genes
   differentially expressed between 22q11.2del (n = 5) and control (n = 9)
   cell lines. Shown are pathways with modified Fisher’s exact test p
   value < 9.05e−04. b Enrichment analysis for chromosome 22q for
   differentially expressed genes. Red dashed lines mark the boundaries of
   22q11.2del region. c Differentially expressed genes correlated with
   different H3K27ac and H3K27me3 signal strengths in 22q11.2del versus
   control cell lines. A1 represents genes significantly upregulated in
   22q11.2del cell lines while B1 represents non-significantly upregulated
   genes; A2 represents genes significantly downregulated in 22q11.2del
   cell lines while B2 represents non-significantly downregulated genes.
   The y axis shows the Z-score transformed fold change of H3K27ac signals
   (left) or H3K27me3 signals (right) between 22q11.2del (n = 5) and
   control (n = 4) cell lines. d Differential H3K27ac and H3K27me3 signal
   strengths exhibited correlation with differential gene expression
   between 22q11.2del and control cell lines. A1 represents genes whose
   TSSs were significantly more marked by H3K27ac (left) or by H3K27me3
   (right) in 22q11.2del cell lines. B1 represents genes with TSSs
   non-significantly more marked by H3K27ac (left) or by H3K27me3 (right).
   A2 represents genes whose TSSs were significantly less marked by
   H3K27ac (left) or by H3K27me3 (right) in 22q11.2del cell lines. B2
   represents genes with TSSs non-significantly less marked by H3K27ac
   (left) or by H3K27me3 (right). The y axis shows the Z-score transformed
   fold change of genes’ FPKM in 22q11.2del (n = 5) versus control (n = 9)
   cell lines. In c, d, box represents quartiles, centre line denotes 50th
   percentile, and whiskers extend to the most extreme data points within
   1.5 times of the interquartile range

   We also carried out genome-wide analysis to determine whether there are
   entire genomic regions that are enriched for differentially expressed
   genes. As expected, the most significant signals for this analysis were
   located in the 22q11.2 region (Fig. [128]7b; Supplementary
   Fig. [129]10). No other regions genome wide achieved FDR-corrected
   significance (Supplementary Fig. [130]10).

Correlation between histone modification and gene expression

   To examine whether the gene expression changes are consistent with the
   histone modification changes, we assigned the H3K27ac and H3K27me3
   peaks to their nearest genes based on the distance to their
   transcription start sites (TSSs), and for each gene, we only retained
   the closest peak for both histone marks. We observed that significantly
   upregulated genes in 22q11.2DS cell lines exhibited significantly
   higher fold change of H3K27ac enrichment (permutation test p
   value = 0.0140) and significantly lower fold change of H3K27me3
   enrichment (permutation test p value = 0.0018) than genes
   non-significantly upregulated in 22q11.2DS cell lines (Fig. [131]7c).
   Consistently, significantly downregulated genes in 22q11.2DS cell lines
   exhibited significantly lower fold change of H3K27ac enrichment
   (permutation test p value = 0.0002) but significantly higher fold
   change of H3K27me3 enrichment (permutation test p value = 0.0227) than
   genes non-significantly downregulated in 22q11.2DS cell lines
   (Fig. [132]7c).

   Moreover, genes whose TSSs showed significantly upregulated binding by
   H3K27ac in 22q11DS cell lines exhibited significantly higher fold
   change of expression (permutation test p value < 2.2e−16) than those
   with non-significantly upregulated binding, while those TSSs showing
   significantly downregulated binding exhibited a significantly lower
   fold change of expression (permutation test p value = 0.0202) than
   those with non-significantly downregulated binding (Fig. [133]7d).
   Consistently, genes whose TSSs showed significantly upregulated binding
   by H3K27me3 in 22q11DS showed significantly lower fold change of
   expression (permutation test p value = 0.0037), whereas those with
   significantly downregulated binding exhibited significantly higher fold
   change of expression (permutation test p value = 0.0027) between
   22q11DS and control cell lines (Fig. [134]7d).

   We also calculated the correlation between gene expression and
   H3K27ac-binding affinity across the individuals for all the genes with
   TSSs bound by H3K27ac. In line with the above results, we found
   significantly higher Pearson’s correlation coefficients than the
   coefficients obtained from permutations (Wilcoxon rank sum test p
   value < 2.2e−16) (Supplementary Fig. [135]11a). This difference was
   even more significant when we included the differential expressed genes
   (absolute fold change >2) only (Wilcoxon rank sum test p
   value < 2.2e−16) or the differential H3K27ac-bound genes only into our
   analysis (Wilcoxon rank sum test p value < 2.2e−16) (Supplementary
   Fig. [136]11a). We also observed significantly higher Pearson’s
   correlation coefficients between gene expression and H3K27me3 binding
   than the coefficients obtained from permutations for the differential
   expressed genes (absolute fold change >2) only (Wilcoxon rank sum test
   p value = 9.72e−04) and the differential H3K27me3-bound genes only
   (Wilcoxon rank sum test p value = 3.71e−06) (Supplementary
   Fig. [137]11b). Together, our results demonstrate that gene expression
   changes are associated with histone modification changes in 22q11DS
   cell lines compared with control cell lines and that this phenomenon
   occurs across the entire genome in cells with the deletion in 22q11.2.

Effects of 1q21.1 deletion on chromosome folding

   To explore whether large CNVs other than the one on chromosome 22q11.2
   can lead to changes in the patterns of chromosome folding, we performed
   Hi-C on two LCLs with a heterozygous deletion of approximately 1.35 Mbp
   in size on chromosome 1q21.1 (1q21.1del). This deletion is strongly
   associated with the development of schizophrenia^[138]38–[139]40.
   Similar to what we had found for the 22q11.2 deletion, we observed that
   both cis- and trans-contacts between the 1q21.1 deletion regions and
   other regions were decreased in 1q21.1del cell lines relative to
   control cell lines (Fig. [140]8a, b). As observed in the 22q11DS cell
   lines, in the 1q21.2del cell lines there was an increase of
   intra-chromosomal contacts between the regions directly flanking the
   main CNV of 1q21.1 (Fig. [141]8a). Global changes of inter-chromosomal
   contacts were also observed (Supplementary Fig. [142]12), again
   consistent with the inter-chromosomal contact changes in 22q11.2 LCLs.
   Taken together, cells with the large deletion CNV on chromosome 1q21.1
   exhibited similar effects on chromosomal cis-contacts and
   trans-contacts as found in cells with the large deletion CNV on
   chromosome 22q11.2, which points toward our findings in 22q11DS being
   generalizable across large neuropsychiatric CNVs.

Fig. 8.

   [143]Fig. 8
   [144]Open in a new tab

   Effect of the large deletion CNV on 1q21.1 on chromosome conformation.
   a Fold change of cis-contacts of chromosome 1 in 1q21.1del (n = 2)
   versus control (n = 6) cell lines. Black boxes mark the regions of
   increased contacts between the proximal and distal flanking regions of
   the 1q21.1 deletion. b Fold change of trans-contacts between chromosome
   1 and 4 in 1q21.1del (n = 2) versus control (n = 6) cell lines. In a,
   b, black dashed lines mark the boundaries of the 1q21.1 deletion. Each
   cell in the heatmap represents the chromosomal contact level between
   two 500 kbp regions. The color scale goes from −1 (blue) to 0 (gray) to
   1 (red)

Discussion

   Large CNVs are an important feature of the genetic architecture of
   several major neurodevelopmental psychiatric disorders as well as of
   conditions involving aberrant morphology of many organ systems. Their
   effects on the level of phenotype are complex and the molecular
   mechanisms mediating these effects are very incompletely understood.
   While it is still a good assumption that a considerable portion of
   these mechanisms are a direct consequence of the copy number change of
   the genes within a given CNV’s boundaries and the resulting changes in
   expression levels for these genes, it also seems plausible to
   investigate whether additional levels of complexity exist regarding the
   effects of a large CNV across multiple layers of the control of gene
   activity. This plausibility stems from several observations and lines
   of reasoning. There are large numbers of genes affected genome wide and
   far distal from the main CNV. This leads us to consider the basic
   principles that govern organizational features of the nucleus and the
   chromatin such as regulatory domains being bounded by protein factors
   that recognize binding sites in the DNA sequence, which in turn could
   be affected (i.e., deleted or duplicated) by a CNV, or long-range
   chromosome contacts being also influenced by the length of the involved
   chromosomes (and that length being changed by large CNVs). Finally,
   there have been several recent reports about mutations in chromatin
   remodeling genes in the context of neuropsychiatric disorders pointing
   to the importance of proper molecular management on the epigenomic
   level in these conditions.

   Here we studied the effects of the important large CNV on chromosome
   22q11.2, and in a more limited fashion the effects of the large CNV on
   chromosome 1q21.2, on chromatin conformation including long-range
   chromosome contacts and domain formation, epigenetic profiles, and gene
   expression. We developed an approach that combined genome phasing and
   targeted fragment enrichment for very deep chromosome-wide Hi-C, which
   makes it possible to investigate the effects of the 22q11.2 deletion in
   a haplotype-specific fashion. We observed multiple effects of the large
   CNVs on long-range chromosome contacts, chromatin domains, and
   epigenetic profiles as well as on gene expression. More specifically,
   we found increased contacts between distal and proximal flanking
   regions of the 22q11.2 deletion in 22q11DS cell lines in contrast with
   control cell lines. Our haplotype-specific analyses of chromosomal
   interactions demonstrated that these increased contacts only occurred
   on the chromosome 22q carrying the deletion. Interestingly, both distal
   and proximal flanking regions of the 22q11.2 deletion were enriched
   with differential signals for the histone marks H3K27ac and H3K27me3
   while only the distal flanking regions were enriched with
   differentially binding sites of CTCF. At present, the reason for this
   discrepancy is not known. We note that only the deletion-distal
   flanking region is engaging in increased intra-chromosomal interactions
   with the telomeric end of chromosome 22q but the available data are not
   sufficient to conclude that differential binding of CTCF is causally
   involved in this phenomenon.

   We did not observe the end of chromosome 22 (50–51 Mbp) to be enriched
   with differentially expressed genes (2 out of the 24 genes) despite the
   Hi-C finding, validated by FISH, of significantly increased contacts
   with the distal flanking region of the 22q11.2 deletion in 22q11.2DS
   cell lines. Allele-specific expression analysis did not provide
   evidence for the existence of differential expression between the
   intact chromosome 22q and the chromosome with the 22q11.2 deletion
   (Supplementary Fig. [145]13), outside the CNV boundaries. Taken
   together, our results indicate that chromosome conformation changes
   caused by the 22q11.2 deletion contribute to gene expression changes
   and epigenetic profile changes between 22q11.2DS and control cell
   lines, but not always in a linear and clearly deterministic fashion.

   However, we did observe significant positive correlation between
   H3K27ac changes and gene expression changes but negative correlation
   between H3K27me3 and gene expression changes on the genome-wide level.
   These findings demonstrated that the epigenetic profiles were reshaped
   genome wide in correlation with extensive gene expression changes in
   22q11.2DS cell lines.

   Our pathway enrichment analysis of differentially expressed genes
   revealed candidate pathways that might be associated with various
   clinical phenotypes of 22q11DS, such as cardiac symptoms and
   neurodegenerative disorders (Fig. [146]7a), in addition to general
   molecular pathways, some with obvious potential association with
   22q11DS such as mitochondrial energy metabolism. It is important to
   remain cautious when interpreting these disease-specific results as
   they are based on data from a cell type that is not directly relevant
   to the organ systems in question. At the same time, many genes will of
   course be relevant in more than one cell type and organ, and their
   expression levels may be regulated by similar molecular mechanisms
   across cell types. With the proper caution in place, the findings about
   potential relevant pathways may be considered as an unexpected side
   product of this study that may point toward additional routes for
   further analysis. In any case, the pathway analyses do not affect the
   findings about effects on chromatin states and chromosome folding in
   22q11DS.

   Our study does not address the question of what the molecular causes
   for phenotypic variance between individual carriers of the 22q11.2
   deletion might be. This is a highly important question but one that,
   given the vast number in each individual genome of potential genetic
   modifiers, that together with environmental factors are the prime
   suspects for such variance, will need far larger cohort sizes of cell
   lines modeling the tissues where such variance is observed. Such
   cohorts are currently being assembled and will be the basis of highly
   interesting work in the coming years^[147]41.

   The effects of decreasing cis- and trans-contacts and the increased
   chromosomal contacts between distal and proximal deletion-flanking
   regions in 22q11Ds cell lines were also observed in cell lines with the
   1q21.1 deletion. That such similar results were obtained on this level
   of observation for both the 22q11.2 deletion and the 1q21.1 deletion
   could suggest that at least some of the changes caused by a large
   deletion CNV are generalizable for such chromosomal aberrations instead
   of specific to the 22q11.2 deletion.

   As a cautionary note on the technical level, we demonstrated here that
   for genomes with a large deletion CNV the appropriate normalization
   methods for Hi-C data have to be chosen with great care to avoid false
   findings. For instance, we would have reached the conclusion that the
   chromosomal contacts within the deletion regions are not decreased in
   cell lines with deletion compared with control cell lines if the hiclib
   software package^[148]23 had been used for normalization of Hi-C data,
   instead of the hicpipe package^[149]22.

   Lastly, we were able to generate haplotype-specific maps of chromosomal
   contacts by developing an approach that combines
   genome-sequencing-based phasing information and chromosome-wide capture
   of Hi-C fragments before sequencing. These haplotype-specific contact
   analyses enabled us to distinguish between the intact chromosome 22q
   and the chromosome 22q with deletion. Based on this, we derived
   chromosomal A/B compartments and topological domains for both
   homologous chromosomes 22q separately. There was no switch of A/B
   compartments on the chromosome 22q with the 22q11.2 deletion compared
   to the intact chromosome 22q in the patients. There also was no
   evidence for newly fused topological domains or changes of topological
   domain boundaries on the chromosome 22q with the 22q11.2 deletion.
   Earlier, Lupiáñez et al.^[150]42, using the 4C assay, demonstrated in a
   mouse model as well as in human lines from patients with rare
   malformations of the limbs, that large structural changes in the
   genomic sequence on mouse chromosome 1 and human chromosome 2 can
   affect the topological domain architecture that is situated directly on
   top of the sequence change. However, our results did not show
   comparable effects of the 22q11.2 deletion. Further studies will
   determine which factors, such as, for example, the size of a given CNV
   or its position relative to topological domain-defining sequence
   elements, might determine whether topological domains are altered or
   not. Interestingly, although the topological domains on or nearby the
   22q11.2 deletion themselves were not altered (rather, they were either
   deleted entirely or left unchanged), there was an unexpected pattern of
   increase in contact between the topological domains flanking the
   deletion, where the proximal topological domain not only increased its
   frequency of interaction with the nearest distal topological domain but
   also with the two topological domains following that.

   In summary, we found multiple effects of the large deletion CNV on
   chromosome 22q11.2 on long-range chromosome contacts, chromatin
   organization, epigenetic profiles, and gene expression. Extensive
   changes on these levels caused by the 22q11.2 deletion are global and
   seem to be rippling along the entire chromosome carrying the deletion
   as well as across the entire nucleus rather than being confined to the
   deletion region only. Such effects had never been shown before outside
   of cancer cell lines^[151]43. Furthermore, in contrast to the findings
   in cancer cells we used a larger cohort of individual patient cell
   lines for most of our analyses, with all lines carrying only one main
   large CNV that is clearly and strongly associated in a causative manner
   with a neurodevelopmental phenotype. The earlier study in cancer used
   only two cancer cell lines that each carried multiple large CNVs which
   could also have been a consequence rather than a cause of the disease
   phenotype. This makes it much more likely that the higher-order effects
   of the large CNVs that we observed may be contributing to the molecular
   etiology of the developmental disorders in question, a point which is
   further strengthened by another recent paper^[152]9, where the authors
   described studying the effects of large CNVs on chromosome 16p11 on
   chromosome interactions. The large CNVs on 16p11 are almost as strongly
   associated with neurodevelopmental disorders as the large CNV on
   22q11.2. While the study on the 16p11 CNVs used a somewhat smaller
   number of cell lines than our study and also used 4C as a method of
   discovery, which is, unlike the Hi-C method used by us, not able to
   detect changes in a global and unbiased fashion, it is one more
   independent piece of supporting evidence for the biological validity
   and general relevance of the findings which we describe here.

   While we were able to show possible correlations across several pairs
   of the molecular levels that we assayed in this study, there are other
   combinations of molecular levels that show no obvious connection to
   each other in our data. We believe that this could be a function of
   either the developmental time point or the cell type, or both, being
   removed from those where the 22q11.2 and 1q21.1 deletions most likely
   exert some of their strongest effects (e.g., during embryonal
   development and in cells of the developing central nervous system). The
   clear effects on several genomic and epigenomic levels that we were
   able to observe in LCLs could represent the afterglow or reflection of
   a molecular tragedy that played out earlier in the development of
   various organs in the patients carrying these deletions. At the same
   time, the strength of this afterglow would hint at the strength of the
   effects that impacted across various levels of molecular control and
   gene regulation by CNVs of this size.

Methods

Cell lines

   All cell lines were either acquired from the Coriell cell repository
   (cell lines IDs starting with GM or ID) or were taken from the
   Molecular Genetics of Schizophrenia (MGS) cohort (dbGaP Study
   Accession: phs000167.v1.p, cell lines 52425 and 82699) and were
   appropriately consented.

Hi-C libraries

   The Hi-C assay was carried out according to the original
   protocol^[153]20, with several modifications. Twenty-five million
   cultured Epstein–Barr virus-transformed lymphoblastoid cells were spun
   down, crosslinked by adding 9 ml fresh medium and 0.25 ml 37%
   formaldehyde and incubated for 10 min at room temperature (RT). The
   reaction was stopped by adding 0.5 ml 2.5 M glycine and incubating for
   5 min at RT. For cell lysis, 5 ml lysis buffer (500 μl 10 mM Tris-HCl
   pH8.0, 10 mM NaCl, 0.5% Ige cal CA630; 50 μl protease inhibitors
   (Sigma, St. Louis, MO)) was added to the cells followed by incubation
   on ice for at least 15 min, cells then were lysed with a Dounce
   homogenizer by moving the pestle A up and down 10 times, followed by
   incubating on ice for 1 min and 30–45 more strokes with pestle B. The
   resulting suspension was spun down, the supernatant was discarded and
   the pellet was washed twice with 500 μl ice-cold 1× NEBuffer 2 (NEB,
   Ipswich, MA). The pellet was then resuspended in 1× NEBuffer 2 in a
   total volume of 50 μl × n (n = number of tubes, at this point the
   sample can be split into multiple aliquots as needed). Next, 346 μl 1×
   NEBuffer 2 were added per tube. To remove proteins that were not
   crosslinked directly to the DNA, 1.9 μl of 20% sodium dodecyl sulfate
   was added per tube and the mixture was resuspended and incubated at
   65 °C for 20 min, then at 37 °C for 30 min with shaking. Tubes were put
   on ice and 44 μl 10% Triton X-100 was added and mixed, followed by
   shaking for 30 min at 37 °C. Chromatin was subsequently digested
   overnight at 37 °C by adding 600 Units HindIII (NEB).

   After digestion, the following several steps were carried out according
   to the original Hi-C publication^[154]20: marking of DNA ends and
   blunt-end ligation, DNA purification, removal of biotin from unligated
   ends, and shearing. After shearing, end repair, and “A” addition to the
   ends of DNA, 2 μl of Illumina adaptors (PE1.0 and PE2.0) were ligated
   to 1 μg of DNA using 2 μl T4 DNA ligase (Promega) and incubating at RT
   for 30 min. After purification with QIAquick spin columns, DNA
   fragments with adaptor attached were selected and enriched for
   biotin-labeling by biotin pull down using Streptavidin C1 beads. The
   resulting eluate of 50 μl DNA solution was divided into 20 individual
   PCR tubes with each containing 2.5 μl of the eluted DNA, 0.25 μl of
   Illumina paired end primers (PE1.0 and PE2.0), and 1× Phusion High
   Fidelity master mix with HF buffer (NEB) in a total volume of 25 μl.
   PCR was carried out with a temperature profile of 30 s at 98 °C
   followed by 9 cycles of 20 s at 98 °C, 30 s at 60 °C, 30 s at 72 °C,
   and a final 7-min extension at 72 °C. PCR products were pooled and
   purified by Qiagen MinElute column. DNA was then run on a 2% E-gel for
   the purpose of purification and size selection. DNA fragments between
   450 and 600 base pairs were excised and purified with a gel extraction
   kit (Qiagen). DNA concentration was measured with the Quant-iT assay
   (Invitrogen). Finally, the Hi-C library was sequenced on the Illumina
   HiSeq platform using paired-end sequencing.

Capture Hi-C libraries

   For capture Hi-C of chromosome 22q, we prepared separate in situ Hi-C
   libraries using the previously reported protocol^[155]32. Briefly, five
   million cells were crosslinked with formaldehyde and lysed with
   protease inhibitors (Sigma, P8340). Chromatins were subsequently
   digested by MboI restriction enzyme (NEB, R0147). Restriction fragment
   overhangs were filled and the DNA ends were marked with biotin. Next,
   biotinylated DNA were sheared to a size of 300–500 bp, whose ends were
   repaired subsequently. dATP were attached and Illumina indexed adapter
   (TrueSeq nano) were ligated. The DNA fragments were then amplified and
   purified. A total of 2.1 million oligomer capture probes that densely
   tile the sequence of chromosome 22q were designed using the
   NimbleDesign Software and ordered from Roche Sequencing (formerly
   NimbleGen-Roche, Pleasanton, CA). Roche Sequencing’s SeqCap EZ Choice
   XL Enrichment Kit was used to capture fragments from the in situ Hi-C
   libraries using the chromosome 22q-specific capture oligomers.

ChIP-Seq libraries

   ChIP-Seq was performed according to the protocols in previous
   studies^[156]44,[157]45. Briefly, per ChIP 2 × 10^7 LCLs were
   crosslinked in 1% formaldehyde for 10 min at RT. The reaction was
   stopped by adding glycine to a final concentration of 0.125 M and
   stirring for 5 min at RT. The cells were pelleted and washed with 1×
   phosphate-buffered saline (PBS) plus protease inhibitors to remove the
   crosslinking reagent. The cells were pelleted again and stored at
   −80 °C until further use.

   On the day of the immunoprecipitation, the pellets were thawed and
   washed with 1× PBS plus Protease Inhibitors. The cells were pelleted
   and exposed to hypotonic buffer and broken with a Dounce homogenizer.
   The cell nuclei were pelleted and lysed in RIPA buffer. Chromatin was
   prepared using a Branson 250 Sonifier (7 × 30 s, 100% duty cycle).
   During the procedure, the lysate was kept cold at all times.

   A sample of the chromatin lysate was set aside as input control. The
   rest of the chromatin lysate was used in the ChIP experiment using the
   following antibodies against H3K27Ac (Abcam ab4729, 1:110 dilution),
   H2K27me3 (Cell Signaling 9733, 1:175 dilution), and CTCF (Millipore
   07-729, 1:175 dilution). Control IgG from the corresponding species was
   used as negative control. The Chromatin–antibody complexes were pulled
   down using Protein G Dynabeads (Life Technologies). The isolated DNA
   was purified and tested for successful enrichment using quantitative
   PCR against known loci. Illumina sequencing libraries were prepared
   using Illumina TruSeq adapters and the enzymes specified in previous
   study^[158]45. Libraries of the size range of 350–650 bp were excised
   from an agarose gel and cleaned up using the Qiagen Gel Extraction Kit.
   Libraries were PCR amplified for 14 cycles. Four to five libraries were
   pooled on one HiSeq2000 lane and sequenced using 2 × 100 bp.

RNA-Seq libraries

   The RNA-Seq libraries were generated according as in a previous
   study^[159]46. Briefly, polyadenylated RNA fragments were purified
   using a Dynabeads mRNA Purification Kit, fragmented, and reverse
   transcribed into first-strand cDNA using random hexamer and Superscript
   II reverse transcriptase, followed by second-strand cDNA synthesis
   using RNaseH and DNA polymerase I. The resulting cDNA was end-repaired
   and a single “A” was added at the 3′ ends before ligating to Illumina
   paired-end sequencing adaptors. After running on an agarose gel, DNA
   fragments from 250 to 350 bp were cut out and extracted using a Qiagen
   MinElute Gel Purification Kit, and PCR amplified using Phusion
   High-Fidelity master mix and Illumina PE primers.

Hi-C data analysis

   All Hi-C data were produced using Illumina paired-end sequencing with a
   read length of 2 × 101 bp. As there might be ligation junctions
   present in the reads, we performed iterative mapping using bowtie2 as
   in Imakaev et al.^[160]23. Briefly, we computationally cut all the
   reads to 25 bp first and mapped them to human genome (hg19). Then we
   extended the non-uniquely mapped reads by 5–30 bp and mapped them
   again. This process was repeated until the read length was extended to
   101 bp. This iterative mapping did improve the mapping rate
   (Supplementary Table [161]5). Each read end was mapped separately using
   the single-end mode. Only uniquely mapping reads were used and PCR
   duplicate read pairs were removed. Self-ligation fragments and the read
   pairs whose sum of distances from mapped positions to the nearest
   restriction sites is larger than the length of the fragments in the
   Hi-C library were further removed by hicpipe^[162]22. We only included
   autosomes in our study. The filtered contact number is listed in
   Supplementary Table [163]6.

   We compared three different data normalization methods: hiclib^[164]23,
   hicpipe^[165]22, and HiCNorm^[166]24. All of the three tools were run
   using the default parameters except for the segment length threshold
   being set to 600 bp. We chose normalized metrics on hicpipe for the
   following analyses, using a bin size of 40 kbp for topological domain
   analysis and of 500 kbp for the other analyses. The total number of
   contacts was normalized for each sample before combining cell lines in
   each category (control, 22q11del and 1q21del, respectively). Fold
   changes of log2-transformed mean contacts between deletion cell lines
   and control cell lines were calculated by (deletion−control)/control.

   To identify the differential inter-chromosomal contacts, we only
   included contacts with at least one supporting read pair in each of the
   cell lines. Differential contacts analysis was conducted by two-sided
   Student’s t test with the “t.test” function in R using the normalized
   metrics. Fisher’s exact test was used to assess the enrichment of
   differential contacts within the top 5% strongest contacts. We also
   performed the same analysis by permuting the control and 22q11del
   status of the cell lines ten times. Comparison within control cell
   lines and within 22q11del cell lines were performed by randomly
   dividing the cell lines into two groups three times.

Generation of haplotype-specific chromosomal contact maps

   To phase the homologous chromosomes of two patient cell lines, we first
   carried out deep whole-genome sequencing (37.4× genome-wide coverage
   for ID00014 and 32.8× genome-wide coverage for ID00016), which yielded
   the required heterozygous SNV information. Haplotype phasing was then
   carried out by SLRH^[167]21 in combination with Mendelian inheritance
   patterns of informative SNVs, based on the knowledge that ID00014 is a
   parent of ID00016. Briefly, genomic DNA was sheared to fragments of
   about 10 kbp length, which were diluted as needed and pipetted into a
   384-well plate with 3000–6000 gDNA fragments in each well. Within each
   well, fragments were amplified, fragmented further, and ligated to
   barcodes unique to each well. Fragments from across all wells were
   pooled together and sequenced on Illumina HiSeq. Sequencing reads were
   mapped to the reference genome and assigned to their unique well based
   on their barcodes. For each well, reads were assembled into haplotype
   blocks based on their overlapping heterozygous SNVs. The haplotype
   blocks were then used to construct long haplotype contigs using
   Illumina’s haplotyping algorithm Prism (v2.2). The N50 length of the
   haplotype contigs was 492,634 bp and 453,807 bp for ID00014 and
   ID00016, respectively. SNVs that could be phased by Mendelian
   inheritance analysis (heterozygous in one individual but homozygous in
   the other) were assigned to the phased haplotype contigs to assemble
   these haplotype contigs into whole phased chromosomes. The phased
   genome data of two control cell lines from related donors had been
   previously reported^[168]21 and were included in the analysis. More
   details on this approach can be found in Supplementary
   [169]Information.

   Capture Hi-C reads were mapped by BWA-MEM^[170]47 in single-end mode
   after ligation junction removal using Cutadapt^[171]48. Reads spanning
   the position of phased SNVs were assigned to the corresponding
   homologous chromosome using custom-written scripts. Then HOMER^[172]27
   was used to pair up the reads, filter the read pairs assigned to
   haplotypes (duplicates, self-ligations, read pairs <1.5× the sequencing
   fragment length, or distance to restriction site >1.5× the fragment
   length), and generate contact matrices for each homologous chromosome
   separately. To normalize the contacts involving the 22q11.2 region to
   the comparable level with contacts involving other regions on the
   intact homologous chromosome 22 in the patients, we calculated the
   expected number of phased heterozygous SNVs in this 3 million bp
   region, based on the density of phased heterozygous SNVs on chromosome
   22q in each patient. Random single base pair positions in the 22q11.2
   region were sampled to the calculated expected number. Only the
   contacts involving these sampled positions were included in the
   analysis. We performed this random sampling for ten times and the
   pattern remained consistently similar. We adjusted for recombination
   events for the haplotypes of ID00014 on the grounds that
   intra-chromosomal contacts should be more prevalent than
   inter-chromosomal contacts.

   Identification of A and B compartments was performed as in
   Lieberman-Aiden et al.^[173]20 and topological domains were identified
   as in Dixon et al.^[174]33.

Three-dimensional FISH

   Two human DNA BACs (clones RP11-47L18 and RP11-125K3) covering two
   distinct regions of chromosome 22 were labeled with biotin or
   digoxigenin (DIG) by the Nick Translation Kit (Roche Applied Sciences)
   to make FISH probes (Roche). In situ hybridization was performed
   according to the method published by a previous study^[175]49, with
   several modifications. Briefly, GM12878 and GM06990 cells were
   immobilized on poly-l-lysine-coated slides for 1 h at 37 °C before
   being fixed with 4% paraformaldehyde and permeabilized with 0.05 %
   Triton X-100. Cells were hybridized with DIG-labeled and biotin-labeled
   FISH probes overnight at 37 °C. Hybridized samples were washed and then
   incubated with anti-DIG-Rhodamine (Roche, 1:100 dilution) and
   anti-biotin-Alexa488 (Invitrogen, 1:100 dilution) for 1 h at 37 °C. 3D
   two-color image stacks were captured by a Zeiss LSM510 confocal
   microscope. Distances between two FISH signals were measured by ImageJ.

3D FISH data analysis

   For all individual cell lines tested in the FISH experiment, we first
   masked their identities to allow us picking target cells for
   high-resolution 3D imaging in an unbiased fashion. Between 16 and 33
   cells were chosen randomly for each tested cell line, the basic
   condition for each cell being that 3 out of the 4 expected FISH signals
   had to be clearly visible. The distance between the targeted
   chromosomal regions was then determined by measuring the average
   distance between the green and red signals, normalized by the multiple
   of the shortest and longest radius of the nuclei.

   To determine whether the distance between FISH signals is different
   between the 22q11 deletion and control groups, we carried out an
   analysis of variance with consideration of gender difference among the
   samples and distinct cell line subjects in each group:
   [MATH: <mi mathvariant="normal">Distance</mi><mo>~</mo><mi
   mathvariant="normal">Deletion</mi><mo>+</mo><mi
   mathvariant="normal">Gender</mi><mo>+</mo><mi
   mathvariant="normal">Deletion</mi><mo>∕</mo><mi
   mathvariant="normal">Subject</mi><mo>.</mo> :MATH]

   The parameter Subject is a random effect variable that represents the
   cell lines tested in our analysis. The variable Deletion indicates
   whether the cell line contains the 22q11 deletion and was initially
   masked when picking cells for imaging and signal distance measurement.
   The Subject parameter is nested within the Deletion parameter, as each
   cell line either has the 22q11 deletion or carries the full-length
   chromosome 22q.

Differential expression analysis

   All RNA-seq data were generated using Illumina paired-end sequencing
   with read length 101 bp. Reads were mapped to hg19 and transcriptome
   reference with TopHat 2^[176]50. TopHat 2 was run using default
   parameters but with the coverage search being turned off. The mapped
   reads were analyzed by Cufflinks^[177]51. Differential expression was
   estimated with Cuffdiff 2^[178]52. We excluded the genes with low
   expression (FPKM < 0.5) from downstream analysis.

   Pathway analysis of significantly differential expressed genes was
   conducted with DAVID^[179]53 using all the expressed genes as
   background.

Allele-specific gene expression analysis

   Samtools mpileup (v0.1.19) and BCFtools (v0.1.19)^[180]54 were used to
   count the number of reads mapped from bam files of RNA-Seq data to each
   allele of the heterozygous SNVs. Binomial tests were performed to
   determine whether the percentage of RNA-Seq reads mapped to the
   alternative allele was significantly different from the mean frequency
   of the alternative allele of all heterozygous SNVs within each
   individual. Only heterozygous SNVs with read coverage >10 were included
   in the analysis.

ChIP-Seq data analysis

   All ChIP-Seq data were generated using Illumina paired-end sequencing
   with read length 101 bp. Reads were aligned to hg19 with BWA-MEM using
   default parameters^[181]47. Reads with low mapping quality (<30) were
   removed. PCR duplicate reads were removed using Picard
   ([182]http://broadinstitute.github.io/picard). As a quality control, we
   calculated the normalized strand cross-correlation coefficient (NSC)
   and relative strand correlation (RSC)^[183]55 to assess the
   signal-to-noise ratios. All the data showed higher NSC than RSC
   (Supplementary Fig. [184]14). Replicates for the same cell line on
   average showed higher correlation than datasets from different cell
   lines (Supplementary Fig. [185]15).

   For CTCF and H3K27ac, we used MACS2^[186]56 to call narrow peaks with
   default parameters. For H3K27me3, we used the broad peak calling in
   MACS2. For all peak calling, we used the corresponding whole-cell
   extract input library as background. For differential bound analysis,
   we used the R package DiffBind^[187]57 with the effective library size
   for read count normalization. Then DBA_DESEQ2 method was employed to
   conduct the differential bound analysis. Signal artifact blacklist
   regions were excluded from our analysis
   ([188]http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncode
   Mapability).

Enrichment analysis

   For the enrichment analysis of differentially expressed genes, we
   divided each chromosome into 500-kbp bins. Within each bin, we
   calculated the total number of expressed genes and the number of genes
   with significantly differential expression between 22q11del and control
   cell lines. Then we conducted Fisher’s exact test to identify bins
   enriched with significantly differentially expressed genes against the
   background of the whole genome.

   For the analysis of differentially enriched sites for CTCF, H3K27ac,
   and H3K27me3, we also used 500-kbp bins. Log2-transformed fold changes
   of normalized read numbers in binding sites between 22q11del and
   control cell lines were further transformed to Z-scores. We considered
   binding sites with Z-score >2 or <−2 as significantly bound sites.
   Then, within each 500-kbp bin, we calculated the total number of
   binding sites and the number of significantly differentially enriched
   sites between 22q11del and control cell lines. Then we conducted
   Fisher’s exact test to identify bins with significantly differentially
   enriched sites against the background of the whole genome.

Correlation between gene expression and histone modification

   To estimate the correlation between gene expression and histone
   modification, we assigned each binding site of H3K27ac and H3K27me3 to
   its nearest ENSEMBL TSS using the R package ChIPpeakAnno^[189]58. If a
   TSS was associated with multiple binding sites, only the nearest
   binding site was retained. We then used this assignment for downstream
   correlation analysis. To determine the cutoff for the distance in which
   binding sites are associated with TSSs, we plotted the distribution of
   distances between binding sites and their assigned TSSs (Supplementary
   Fig. [190]16). Based on the distribution, we set the cutoff to distance
   to TSS to ±1 kbp for H3K27ac and to ±5 kbp for H3K27me3.

   To investigate the effects of histone modifications on gene expression,
   we divided the genes into two categories based on the differential
   expression analysis: differentially expressed genes (FDR < 0.05)
   (category A) and not differentially expressed genes (FDR > 0.05)
   (category B). Within each category, we further categorized the genes
   into two groups: upregulated expressed genes (A1, B1) and downregulated
   expressed genes (A2, B2) in 22q11del cell lines relative to control
   cell lines. Then, for the genes within each of the four groups (A1, B1,
   A2, B2), we calculated the Z-score transformed fold changes of the
   normalized read counts in the TSS-binding sites of histone marks
   between 22q11del and control cell lines. To obtain the statistical
   significance of the fold change differences between A1 and B1 and
   between A2 and B2, we performed permutation tests with 9999
   permutations.

   We also carried out the reverse analysis. TSS-binding sites of histone
   marks were partitioned into two categories based on the differential
   bound analysis: differentially bound sites (|Z-score| > 1 for H3K27ac,
   |Z-score| > 2 for H3K27me3) (category A) and non-differentially bound
   sites (|Z-score| < 1 for H3K27ac, |Z-score| < 2 for H3K27me3) (category
   B). Within each category, we further categorized the binding sites into
   two groups: upregulated bound sites (A1, B1) and downregulated bound
   sites (A2, B2) in 22q11del cell lines. Then we calculated the fold
   changes of the genes’ FPKM between 22q11del and control cell lines
   within each group. Permutation tests were performed with 9999
   permutations to obtain statistical significance.

   To estimate the correlation between gene expression and histone
   modification in a direct way, we calculated Pearson’s correlation
   coefficient between gene’s FPKM and normalized read counts in the
   corresponding TSS-binding site for each gene across all of the cell
   lines. To obtain statistical significance, we first permuted genes’
   FPKM across the cell lines for each TSS ten times to assess the
   background correlation levels and then performed the Wilcoxon rank sum
   test between the observed correlation coefficients and the background
   correlation coefficients. We also performed the same analysis using the
   differentially expressed genes only and differentially enriched binding
   sites only.

Correlation analysis for epigenetic marks

   To assess the correlation of epigenetic marks binding between different
   regions on the same chromosomes, we divided the chromosomes into
   500-kbp bins. Within each bin, we calculated the mean value of the
   normalized read counts for all the binding sites of each epigenetic
   mark. Then we calculated the Pearson’s correlation coefficients of
   obtained mean values across the cell lines between any two bins on the
   same chromosomes.

Supplementary information

   [191]Supplementary Information^ (38.2MB, pdf)
   [192]peer review file^ (5.9MB, pdf)

Acknowledgements