Abstract

Background

   Previous studies using different cardiac phenotypes, technologies and
   designs suggest a burden of large, rare or de novo copy number variants
   (CNVs) in subjects with congenital heart defects (CHD). We sought to
   identify disease-related CNVs, candidate genes and functional pathways
   in a large number of cases with conotruncal and related defects that
   carried no known genetic syndrome.

Methods

   Cases and control samples were divided into two cohorts and genotyped
   in order to assess each subject’s CNV content. Analyses were performed
   to ascertain differences in overall CNV prevalence and to identify
   enrichment of specific genes and functional pathways in conotruncal
   cases relative to healthy controls.

Results

   Only findings replicated in both cohorts are reported. Among 973 total
   conotruncal cases, a burden of rare CNVs was detected in both cohorts.
   Candidate genes from rare CNVs found in both cohorts were identified
   based on their association with cardiac development or disease, and/or
   their reported disruption in published studies. Functional and pathway
   analyses revealed significant enrichment of terms involved in either
   heart or early embryonic development.

Conclusions

   Our study examined one of the largest cohorts assembled specifically for
   conotruncal and related cardiac defects. These results confirm and extend
   previous findings that CNVs contribute to disease risk for CHDs in
   general and conotruncal defects in particular. As disease heterogeneity
   renders identification of single recurrent genes or loci difficult,
   functional pathway and gene regulation network analyses appear to be
   more informative.

   Keywords: Congenital heart defects, conotruncal defects, copy number
   variants, CNVs, functional analysis, pathway analysis

Introduction

   Congenital heart defects (CHDs), the most common severe birth defect,
   occur in 4–9 per 1,000 live births and are thought to be
   caused by both genetic and environmental factors ([38]Pierpont et al.,
   2007). Conventional karyotyping detects chromosomal anomalies in
   approximately 13% of all CHD cases, most of which fall into aneuploidy
   syndromes (e.g. trisomy 18 or 21) (reviewed in [39]Hartman et al.,
   2011). Array-based technologies have revealed submicroscopic
   chromosomal deletions or duplications (copy number variants (CNVs)) in
   an additional 3–20% of CHD cases, with a higher frequency observed in
   those with syndromic or additional non-cardiac features (reviewed in
   [40]Andersen et al., 2014; [41]Lalani and Belmont, 2014). Despite
   differences in study cohort phenotypes and genomic surveillance
   approach, most studies report a significant burden of large, rare,
   and/or de novo CNVs in CHD cases ([42]Glessner et al., 2014;
   [43]Greenway et al., 2009; [44]Lalani et al., 2013; [45]Silversides et
   al., 2012; [46]Soemedi et al., 2012b; [47]Tomita-Mitchell et al.,
   2012). Some of these CNVs encompass genes usually disrupted by single
   nucleotide mutations for which CHD is part of the clinical spectrum,
   such as TBX1 (22q11.2 deletion, OMIM#188400, MIM:602054), EHMT1 (9q34.3
   deletion or the Kleefstra syndrome OMIM#610253, MIM:607001), GATA4
   (MIM:600576, mapping to the 8p23.1 deletion region), and other genes deemed
   critical for heart development (reviewed by [48]Andersen et al., 2014;
   [49]Lalani and Belmont, 2014). However, many of the newly discovered
   CNVs do not contain a well-established cardiac-related gene, and
   few are recurrent. We and others ([50]Glessner et al., 2014; [51]White
   et al., 2014) have therefore applied functional and pathway analyses to
   identify additional candidate genes, in order to establish mechanistic
   and/or developmental relationships between these rare events. To date,
   most studies have employed a limited repertoire of functional
   approaches and few have replicated findings from other studies
   ([52]Glessner et al., 2014; [53]Lalani et al., 2013; [54]Silversides et
   al., 2012).

   In an attempt to reduce disease heterogeneity, we sought to identify
   recurrent CNVs, candidate gene sets and developmental mechanisms
   associated with a specific subset of CHD, namely conotruncal and
   related defects. These defects are thought to share a common genetic
   etiology based on family and animal studies ([55]Digilio et al., 2000;
   [56]Gobel et al., 1993; [57]Miller and Smith, 1979). To that end we
   studied one of the largest cohorts to date with conotruncal defects
   whose cases did not carry a known genetic diagnosis, used denser
   SNP-based arrays to increase resolution in a subset of cases, applied a
   range of pathway and functional analyses, and compared our results to
   those previously published.

Methods

Study Cohorts

   This study was approved by The Children’s Hospital of Philadelphia
   (CHOP) Institutional Review Board. Study subjects and their parents
   were recruited, consented, and diagnosed in a uniform manner at the
   CHOP Cardiac Center. Study subjects were approached to participate if
   they had a conotruncal or related cardiac defect and had not been
   diagnosed with a recognized genetic syndrome upon review of their
   medical record (e.g. 22q11.2 deletion syndrome, Trisomy 21, Alagille
   syndrome). Reports from echocardiograms, cardiac catheterizations,
   cardiac magnetic resonance imaging or cardiac operative notes were
   reviewed to detail the cardiac anatomy. Medical records, including
   available consults performed by clinical geneticists, were reviewed to
   detail non-cardiac congenital anomalies. Family medical history was
   obtained by an interview conducted by a genetic counselor. DNA was
   extracted from whole blood collected from parents; proband DNA was
   either extracted from whole blood or, in certain cases, from an
   established lymphoblastoid cell line, using the Puregene DNA isolation
   kit (Gentra Systems Inc., Minneapolis, MN).

   Three independent groups of healthy controls were used in this study.
   The first group of healthy controls (N=4255, Healthy_CHOP) was recruited from
   well-child visits (ages 3–18 years) within CHOP’s healthcare network as
   previously described ([58]Glessner et al., 2009). All healthy control
   samples for this study were carefully examined by genotype and health
   record to exclude samples with any indications of CHD, evidence of
   chronic health issues, documented genetic abnormalities, or syndromic
   genomic diseases. Genomic DNA was obtained from whole blood using
   standard protocols.

   A second group of healthy adult controls (N=2156), which were part of a
   previously published study of candidate genes for ocular refraction in
   the Age Related Eye Diseases Study (AREDS), were downloaded from dbGaP
   (dbGaP Study Accession: phs000001.v3.p1) ([59]Wojciechowski et al.,
   2013).

   A third control cohort, 179 HapMap CEU samples genotyped using Illumina
   HumanOmni 2.5M Beadchip Array, was downloaded from the Illumina data
   depository (ftp.illumina.com).

Array Genotyping

   All CHOP samples, including all conotruncal patients and controls in
   the healthy CHOP cohort (N=4255), were genotyped following a consistent
   protocol at CHOP’s Center for Applied Genomics. The majority of
   conotruncal cases (n= 627) and all of the healthy controls were array
   genotyped on the Illumina Infinium™ II HumanHap550 v1 or v3, or
   BeadChip 610 array (Illumina, San Diego, CA) as previously described
   ([60]Elia et al., 2012). The remaining cases (n= 346) were array
   genotyped using the HumanOmni2.5-8 BeadChip array. The standard
   Illumina cluster file downloaded from the Illumina website was used for
   the analysis and for running the GenomeStudio clustering algorithm. Control
   samples from the AREDS study were genotyped using the Illumina
   HumanOmni2.5 Quad BeadChip array with the standard Illumina cluster
   file as previously described (dbGaP Study Accession: phs000429.v1.p1
   ([61]Simpson et al., 2013)).

Sample Quality Control

   Subject gender was verified by the CNV Workshop software package
   ([62]Gai et al., 2010; [63]Gai et al., 2012). Exclusion criteria for
   genotypes included SNP call rate <98%, probe intensity LRR ≥3 standard
   deviations from the cohort mean (0.36), excess of inheritance errors
   within trios, non-European ancestry as determined by Plink sample
   stratification ([64]Patterson et al., 2006; [65]Price et al., 2006;
   [66]Purcell et al., 2007), or gender inconsistencies between
   self-reported and genotype-derived values.

CNV detection and analysis

   We grouped cases and controls into two mutually exclusive cohorts.
   Cohort 1 included all cases and controls genotyped using the Illumina
   Infinium™ II HumanHap550 v1 or v3, or BeadChip 610 array. Cohort 2
   included cases and AREDS control samples genotyped using the Illumina
   2.5M BeadChip.

   In order to correct for differences in SNP probe content among all
   three SNP array versions used in Cohort 1, analysis was limited to the
   subset of SNPs shared by all three genotyping arrays (535,591 SNPs).
   CNV Workshop ([67]Gai et al., 2010; [68]Gai et al., 2012) and PennCNV
   ([69]Wang et al., 2007) were used to define CNV regions as previously
   described ([70]White et al., 2014).

   We applied the same approach for samples in Cohort 2 to adjust for the
   different versions of Illumina 2.5M BeadChip arrays between cases
   (Illumina HumanOmni2.5-8v1) and controls (Illumina HumanOmni2.5-4). For
   the 2.5M arrays, the subset of 2,332,843 SNPs in common between the two
   platforms was used to predict CNV regions in genotyped samples. In
   addition, we used 179 HapMap Caucasian samples that were genotyped
   using the HumanOmni2.5-8v1 BeadChip array (Illumina) to further reduce any
   systematic bias potentially introduced by different genotyping
   technologies used in Cohort 2. HapMap samples were processed in a
   manner consistent with the Cohort 2 cases. Quality-filtered CNV calls
   from HapMap samples were used as a validation set. Any genes,
   functional terms, or gene network clusters deemed significant by
   comparing HapMap samples to the AREDS cohort control samples (nominal
   p-value <0.05) were removed from further consideration, as these
   findings could be due to systematic bias.

   All of the analyses described below were performed in each cohort
   independently and repeated in the Combined Cohort, generated by merging
   Cohort 1 and Cohort 2.

CNV Quality Control

   CNV calls were considered for further review only if predicted by both
   algorithms for ≥60% of the predicted CNV span, with the exception of
   certain large CNVs as specified below. Subject genotypes with total CNV
   burden ≥3 standard deviations from the cohort mean were removed from
   further analysis ([71]Pankratz et al., 2011). To reduce the possibility
   of type I error, deletions spanning less than 5 consecutive SNPs and
   duplications spanning less than 10 consecutive SNPs in Cohort 1 were
   excluded. Given that Cohort 2 was genotyped on a higher density array,
   we adopted a higher threshold for Cohort 2 such that deletions spanning
   less than 10 consecutive SNPs and duplications spanning less than 20
   consecutive SNPs were excluded. In both cohorts, deletions spanning
   less than 10 kilobases and duplications spanning less than 20 kilobases
   were removed. CNV SNP and length thresholds were selected based upon
   previous studies from our group ([72]Elia et al., 2012; [73]Gai et al.,
   2012; [74]Shaikh et al., 2009; [75]White et al., 2014), examination of
   size-based concordance rates between the two algorithms ([76]White et
   al., 2014), and extensive experience with samples undergoing
   array-based clinical diagnostics at our institution ([77]Conlin et al.,
   2010).
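   The size thresholds above can be summarized as a small filter. This is
   a minimal sketch, not the pipeline used in the study: the CnvCall
   record, its field names, and the function name are hypothetical, and it
   assumes that 1 kilobase equals 1,000 bp and that CNV length is end
   minus start.

```python
from dataclasses import dataclass

# Minimum consecutive-SNP counts per cohort, taken from the text above.
MIN_SNPS = {
    1: {"deletion": 5, "duplication": 10},
    2: {"deletion": 10, "duplication": 20},
}
# Minimum CNV length in base pairs, applied to both cohorts.
MIN_LENGTH_BP = {"deletion": 10_000, "duplication": 20_000}


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical record for one predicted CNV."""
    kind: str    # "deletion" or "duplication"
    n_snps: int  # number of consecutive SNPs spanned by the call
    start: int   # genomic start coordinate (bp)
    end: int     # genomic end coordinate (bp)

    @property
    def length_bp(self) -> int:
        return self.end - self.start


def passes_size_filters(call: CnvCall, cohort: int) -> bool:
    """Keep a call only if it meets both the SNP-count and length minimums."""
    return (call.n_snps >= MIN_SNPS[cohort][call.kind]
            and call.length_bp >= MIN_LENGTH_BP[call.kind])
```

   Under these assumptions, a 5-SNP, 10 kb deletion would be kept in
   Cohort 1 but excluded in Cohort 2.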

   Additional CNV exclusion criteria included: CNVs with ≥50% overlap with
   centromere, telomere, and immunoglobulin variable regions; CNVs within
   olfactory receptor genes; and CNVs with SNP densities ≤ 1 SNP/30
   kilobases, as described in ([78]Hasin et al., 2008; [79]Hellemans et
   al., 2007; [80]Young et al., 2008). CNVs were considered equivalent if
   their genomic regions reciprocally overlapped for ≥60% of their length.
   Large CNVs were defined as those falling within the top 5% of CNVs
   observed in the corresponding control cohorts, inherited CNVs as
   equivalent CNVs identified in a subject and either parent, rare CNVs as
   being observed in one or fewer controls (<0.05% frequency in controls),
   and very rare CNVs as those not observed in the control cohort
   ([81]White et al., 2014). B-allele frequencies (BAF) and signal
   intensity Log R ratios (LRR) of large CNVs were also visually inspected
   in GenomeStudio (Illumina). Large CNVs within 10 kilobases of each
   other were also visually inspected in GenomeStudio, and if the BAF and
   LRR traces indicated likelihood of a single contiguous event, the CNV
   regions were merged. Predicted CNVs were annotated using the RefSeq
   gene list ([82]Pruitt et al., 2005), as represented in the UCSC Genome
   Browser ([83]Kent et al., 2002) (genome.ucsc.edu).
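   The reciprocal-overlap rule used to call two CNVs equivalent can be
   illustrated as follows. This is a sketch under stated assumptions, not
   the code used by CNV Workshop or PennCNV: CNVs are represented as
   hypothetical (chromosome, start, end) tuples with end greater than
   start, and the function names are illustrative.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions (0.0 if disjoint).

    Assumes end > start for both intervals.
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    frac_a = overlap / (a_end - a_start)
    frac_b = overlap / (b_end - b_start)
    return min(frac_a, frac_b)


def are_equivalent(cnv_a, cnv_b, threshold=0.60):
    """Treat two CNVs as equivalent if each covers >= threshold of the other."""
    chrom_a, start_a, end_a = cnv_a
    chrom_b, start_b, end_b = cnv_b
    if chrom_a != chrom_b:
        return False
    return reciprocal_overlap(start_a, end_a, start_b, end_b) >= threshold
```

   Taking the minimum of the two fractions means a small CNV nested inside
   a much larger one is not counted as equivalent, because the overlap
   covers only a small share of the larger event.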

Functional analysis

   Gene Ontology (GO) ([84]Ashburner et al., 2000) annotations were
   retrieved from Ensembl.org (useast.ensembl.org/index.html) using the
   BioMart data-mining tool ([85]Smedley et al., 2015). Mammalian
   Phenotype Ontology (MPO) term annotations were obtained from the
   Mammalian Genome Informatics resource (MGI)
   ([86]www.informatics.jax.org) ([87]Eppig et al., 2015). Functional
   annotation of Reactome ([88]www.reactome.org) ([89]Croft et al., 2014;
   [90]Milacic et al., 2012) and KEGG ([91]www.kegg.jp) ([92]Kanehisa and
   Goto, 2000; [93]Kanehisa et al., 2016) gene set collections were
   downloaded from the GSEA database
   ([94]www.broadinstitute.org/gsea/msigdb/index.jsp) ([95]Mootha et al.,
   2003). All annotations were studied to assess gene set enrichments in
   cases as compared to controls. Gene Ontology and Mammalian Phenotype
   Ontology analyses included child and antecedent parental terms
   associated with a given gene. The extent of statistical enrichment for
   each functional term was determined by applying Fisher’s Exact Test
   (two-sided), which directly compared the frequency of occurrence in
   case and control cohorts for each gene or CNV being considered. We
   applied the Benjamini-Hochberg False Discovery Rate procedure
   ([96]Benjamini and Hochberg, 1995) to limit false positive findings
   arising from multiple testing. For global CNV and gene analyses,
   amplification and deletion events were considered both in aggregate and
   separately at each locus considered. We reported a finding only when
   its nominal p-value was less than 0.05 in each cohort and
   the False Discovery Rate measured in the merged cohort was less than
   0.05 ([97]Figure 1).
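   The reporting rule above combines per-cohort nominal p-values with a
   Benjamini-Hochberg adjustment in the merged cohort. The following is a
   minimal pure-Python sketch of that logic, not the software used in the
   study; the function names and the way p-values are passed in are
   assumptions made for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the running minimum
    # keeps the adjusted values monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_reportable(p_cohort1, p_cohort2, fdr_combined, alpha=0.05):
    """Apply the rule: nominal p < alpha in each cohort and
    merged-cohort FDR < alpha."""
    return p_cohort1 < alpha and p_cohort2 < alpha and fdr_combined < alpha
```

   In this sketch the adjusted values would be computed once over all
   terms tested in the merged cohort, and each term's adjusted value is
   then passed to is_reportable together with its two per-cohort nominal
   p-values.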

FIGURE 1. Flow chart outlining process of data analysis.

   For CNV detection workflow refer to [100]White et al. (2014).

Knowledge-based Analysis

   A subset of genes of particular interest for cardiac development and
   congenital cardiac defects was compiled in an unsupervised manner by
   considering prior knowledge of the biomedical literature or expression
   status in heart tissue. Using natural language processing methods, we
   searched MEDLINE articles for 47 terms descriptive of conotruncal
   defects or general cardiac development. Each gene-cardiac term
   association was required to be supported by at least three articles in
   order to reduce type I error.

Gene network construction

   To construct a network among our genes of interest, especially rare
   genes among patient cohorts, we used the Cytoscape ReactomeFIViz Gene
   Set/Mutation Analysis application with default parameters (Cytoscape
   version 3.2, f1000research.com/articles/3–146/v2) ([101]Shannon et al.,
   2003; [102]Wu et al., 2014). Gene interaction networks obtained were
   clustered into modules using ReactomeFIViz’s Cluster FI Network
   function. A pathway enrichment analysis was employed on each individual
   network module using the Analyze Module Functions tool. Only pathways
   with an FDR <0.05 were reported in order to reduce family-wise type I
   error.

Cardiac Gene sets

   Two mouse gene expression profiles were compiled and tested for
   enrichment among our collection of case CNVs using Fisher’s Exact test.
   Known cardiac relevance was assayed by using previously reported gene
   lists that compiled mouse genes ranked by level of expression in the
   developing mouse heart at days E9.5 and E14.5 ([103]Zaidi et al.,
   2013). All mouse transcripts were converted to human gene homologs and
   subsequently ranked by their relative expression levels. The “high
   heart expressed 9.5” (HHE_9.5) list contains genes within the top
   quartile of expression levels (n = 4402) at E9.5, while the “high heart
   expressed 14.5” (HHE_14.5) list contains genes within the top quartile
   of expression levels at E14.5. Gene lists with expression levels ranked
   in the lowest quartile were also compiled (“low heart expressed 9.5”
   (LHE_9.5) and “low heart expressed 14.5” (LHE_14.5)). For each gene
   list, differing thresholds of inclusion were also explored to measure
   the trend of enrichments among conotruncal patient cohorts.
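   The construction of the high- and low-expression lists can be sketched
   as a ranking followed by a quartile cut. This is an illustration only:
   the input format (a mapping from human gene symbol to expression
   level), the function name, and the use of integer division to size the
   quartile are assumptions, not details taken from the original gene
   lists.

```python
from typing import Dict, List, Tuple


def quartile_gene_lists(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (top-quartile genes, bottom-quartile genes) by expression."""
    # Rank genes from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    quartile_size = len(ranked) // 4
    if quartile_size == 0:
        return [], []
    high = ranked[:quartile_size]   # analogous to an HHE list
    low = ranked[-quartile_size:]   # analogous to an LHE list
    return high, low
```

   Exploring other inclusion thresholds, as described above, would amount
   to replacing the divisor 4 with a different fraction of the ranked
   list.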

   We repeated our gene function and network studies restricting the gene
   list to those present in very rare CNVs and a third high-heart
   expressed gene list that combined HHE_9.5 and HHE_14.5 (HHE: combined
   HHE_14.5 and 9.5), given that HHE_9.5 and HHE_14.5 shared approximately
   80% gene identity. Selected genes were imported into the DAVID
   Bioinformatics website ([104]Huang da et al., 2009a; [105]b) and the
   Reactome FI application to evaluate gene functional and regulation
   network properties as previously described. We also repeated our
   analysis restricting the gene list to those present in very rare CNVs
   and the low-heart expressed gene list (LHE: combined LHE_14.5 and 9.5)
   to eliminate any false positive findings resulting from systematic gene
   set annotation bias by either DAVID Bioinformatics or Reactome FI.

Statistics Test Utility

   The Wilcoxon rank sum test, two-way ANOVA test (Type III Sums of
   Squares), or two-tailed Fisher’s Exact Test, as appropriate, was used
   to test significance in case-control CNV and gene enrichment analyses.
   The Benjamini-Hochberg False Discovery Rate (BH-FDR) procedure was
   applied to adjust for multiple hypothesis testing.

CNV validation

   Selected CNVs, based on likely candidacy, statistical likelihood, or
   putative function, were validated using TaqMan® copy number assays
   (Life Technologies, Grand Island, NY). Selection was based on CNV size
   (<100 kb) and on available human disease information (OMIM: omim.org).
   An RNAse P TaqMan assay was used as the internal control. Assays were
   performed on an ABI 7500 Fast Realtime PCR System (Life Technologies)
   using standard conditions and analyzed with the 7500 Fast System SDS
   v.1.4.0 software (Life Technologies). All samples were assayed in
   triplicate and negative results were verified at least twice in
   independent experiments.

Results

Study cohort

   A total of 973 cases (Cohort 1 + Cohort 2) with a definitive diagnosis
   of a conotruncal or related heart malformation who upon review of
   medical records did not carry the diagnosis of a known genetic syndrome
   were used for these analyses ([106]Table 1). All cases were recruited
   at the CHOP Cardiac Center and passed our stringent quality control process
   as detailed in Methods. Most cases were ascertained at less than one
   year of age (63% of Cohort 1, 52% Cohort 2, 59% overall), and 71% of
   cases were ascertained at less than five years of age. As such, while
   we divided the cohort into those with and without additional congenital
   anomalies for subgroup analyses, we could not consider the presence of
   neurodevelopmental disorders given the young age of the study
   population. A first-degree relative was reported to have CHD in 6%
   (n=59) of cases. Array genotyped parental samples were only available
   for Cohort 1 for which there were 367 complete case-parent trios (both
   parents and case) and 199 incomplete case-parent trios (one parent and
   case). The type, number, and frequency of specific cardiac
   abnormalities from both cohorts are listed in [107]Table 1. All Cohort
   1 (n=627) and Cohort 2 (n=346) cases were of European descent. The
   proband gender ratio did not differ significantly between the two
   cohorts: the male-to-female ratio was 1.5:1 (376 males) in Cohort 1
   and 1.34:1 (198 males) in Cohort 2
   (p-value=0.44, Fisher’s Exact Test). A total of 4833
   healthy subjects (2980 in Cohort 1 and 1853 in Cohort 2) passed our
   quality control steps outlined above and were used as controls as
   detailed in Methods.
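   The gender comparison above can be re-derived with a short sketch of a
   two-sided Fisher's Exact Test. This is a plain-Python illustration, not
   the software used in the study. The female counts (251 and 148) are
   not stated in the text; they are derived here as cohort size minus the
   number of males, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(total, 1.0)


# Rows: Cohort 1, Cohort 2. Columns: males, females (females derived).
p_gender = fisher_exact_two_sided(376, 627 - 376, 198, 346 - 198)
```

   The resulting p_gender can be compared with the reported p-value of
   0.44.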

TABLE 1.

   Phenotype distribution for both cohorts. Values are count (%).
     __________________________________________________________________

   Cardiac Lesion[108]^*                                         Cohort 1      Cohort 2
     __________________________________________________________________

   Tetralogy of Fallot                                           249 (39.7)    118 (34.1)
       Pulmonary Stenosis                                        195 (78.3)    79 (66.9)
       Pulmonary Atresia                                         41 (16.5)     27 (22.9)
       Absent Pulmonary Valve                                    6 (2.4)       1 (0.8)
       Unspecified Pulmonary Anatomy                             7 (2.8)       11 (9.3)
     __________________________________________________________________

   Ventricular Septal Defect[109]^†                              120 (19.1)    93 (26.9)
       Conoventricular                                           101 (84.2)    72 (77.4)
       Conal Septal Hypoplasia                                   5 (4.2)       4 (4.3)
       Malalignment                                              14 (11.7)     15 (16.1)
       Unspecified Type                                          0             2 (2.2)
     __________________________________________________________________

   D-Transposition of the Great Arteries                         124 (19.8)    68 (19.6)
       With Ventricular Septal Defect                            61 (49.2)     30 (44.1)
       Without Ventricular Septal Defect                         60 (48.4)     33 (48.5)
       Unspecified if Ventricular Septal Defect Present          3 (2.4)       5 (7.4)
   Transposition of the Great Arteries - other/unknown[110]^∼    6 (1)         4 (1.2)
     __________________________________________________________________

   Double Outlet Right Ventricle[111]^^                          68 (10.8)     19 (5.5)
       Pulmonary Stenosis/Atresia                                28 (41.2)     8 (42.1)
       Aortic Stenosis/Atresia                                   9 (13.2)      1 (5.3)
       Tricuspid Stenosis/Atresia                                8 (11.8)      2 (10.5)
       Mitral Stenosis/Atresia                                   26 (38.2)     5 (26.3)
       Common Atrioventricular Valve                             6 (8.8)       5 (26.3)
       Single Ventricle (Double Inlet Right or Left Ventricle)   1 (1.5)       1 (5.3)
     __________________________________________________________________

   Isolated Aortic Arch Anomaly                                  29 (4.7)      18 (5.2)
       Left Aortic Arch with Aberrant Right Subclavian Artery    1 (3.4)       4 (22.2)
       Right Aortic Arch with Mirror Image Branching             3 (10.3)      2 (11.1)
       Right Aortic Arch with Aberrant Left Subclavian Artery    9 (31.0)      7 (38.9)
       Double Aortic Arch                                        16 (55.2)     5 (27.8)
     __________________________________________________________________

   Truncus Arteriosus                                            18 (2.9)      16 (4.6)
       Type 1                                                    8 (44.4)      11 (68.8)
       Type 2                                                    6 (33.3)      4 (25.0)
       Type 3                                                    1 (5.6)       0
       Type 4                                                    1 (5.6)       0
       Type Unspecified                                          2 (11.1)      1 (6.3)
     __________________________________________________________________

   Interrupted Aortic Arch                                       12 (1.9)      8 (2.3)
       Type A                                                    3 (25.0)      1 (12.5)
       Type B                                                    8 (66.7)      7 (87.5)
       Type Unspecified                                          1 (8.3)       0
     __________________________________________________________________

   Other[112]^#                                                  1 (0.1)       2 (0.6)
     __________________________________________________________________

   Total                                                         627 (100)     346 (100)

   ^* 2.7% and 3.2% of the subjects were also diagnosed with heterotaxy in
   Cohort 1 and Cohort 2, respectively.

   ^† 17.5% and 14% of the subjects were also diagnosed with coarctation of
   the aorta in Cohort 1 and Cohort 2, respectively; and 9.2% and 6.5% had
   concurrent muscular VSDs in Cohort 1 and Cohort 2, respectively.

   ^∼ Cardiac segments SDL or unknown.

   ^^ Subsets are not mutually exclusive.

   ^# Single subjects with atrial septal defect and muscular VSD, muscular
   VSD, right ventricle aorta and pulmonary atresia.

CNV burden in conotruncal patient cohorts

   Structural variation content of the 627 cases in Cohort 1 totaled 2735
   CNVs, consisting of 553 duplications, 2083 heterozygous deletions, 90
   homozygous deletions, and 9 hemizygous deletions (deletions in male
   X-chromosome) ([114]Figure 2; [115]Supplemental Table S1a). Of these,
   1407 (51.4%) could be definitively identified as inherited (710
   maternal, 636 paternal, and 61 present in both parents), while 487 were
   present in neither parent and were thus suggestive of de novo events.
   Of these de novo CNVs, 145 were very rare (5.3% of total CNVs) and
   identified in 105 subjects (16.7% of subjects). Previous work using the
   protocol proposed by [116]Itsara et al. (2010) had established a bias
   towards Type II error. Therefore, some of these apparent de novo events
   were likely present in a parent but missed due to Type II error; those of
   interest were validated by quantitative PCR, as described in Methods.
   We detected no significant differences in the overall CNV frequency
   (P>0.05, case/control ratio=1.00) or CNV size (P>0.05, case/control
   ratio=1.05) between cases and controls. This absence of a difference
   persisted when considering only the subset of CNVs overlapping
   transcribed regions (P>0.05, mean case/control
   ratio=1.00 for CNV frequency, mean case/control ratio=1.08 for CNV
   size). The same result was obtained when we restricted the
   CNV-derived gene list to those overlapping with the HHE genes (CNV
   frequency: p-value>0.05, mean case/control ratio =1; CNV size: p-value
   >0.05, mean case/control ratio=1.04). When restricting CNV burden
   analysis to the 367 conotruncal trios, parental transmission of
   inherited CNVs to probands was found to be independent of parent gender
   (P>0.05; 654 maternal vs. 655 paternal).
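   The inheritance categories used above can be expressed as a simple
   decision rule. This is a sketch under stated assumptions, not the
   study's implementation: the function name, the use of None to mark an
   unavailable parent, and the label strings are hypothetical. The final
   lines re-check the inherited-CNV arithmetic reported in the text.

```python
from typing import Optional


def inheritance_status(in_mother: Optional[bool],
                       in_father: Optional[bool]) -> str:
    """Classify a proband CNV by whether an equivalent CNV is seen in each parent.

    None means that parent was not genotyped (incomplete trio).
    """
    if in_mother and in_father:
        return "both parents"
    if in_mother:
        # The father may be unavailable; the CNV is still maternally inherited.
        return "maternal"
    if in_father:
        return "paternal"
    if in_mother is None or in_father is None:
        return "undetermined"
    # Absent from both genotyped parents: suggestive of a de novo event,
    # subject to validation because a parental call may have been missed.
    return "putative de novo"


# Counts reported in the text for Cohort 1.
inherited = 710 + 636 + 61            # maternal + paternal + both parents
inherited_pct = round(100 * inherited / 2735, 1)
```

   With the counts given in the text, inherited evaluates to 1407 and
   inherited_pct to 51.4, matching the figures reported for Cohort 1.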

FIGURE 2. Flow chart depicting the distribution of CNVs in each cohort.

   The total count of CNVs and, in parentheses, the subset of CNVs
   containing genes are presented. Row I reports all CNVs; Row II
   describes inheritance status for Cohort 1; Rows III and IV report the
   number of rare and very rare CNVs as defined in Methods, respectively.

   We detected 3192 total CNVs from 346 singletons of Cohort 2, including
   2270 heterozygous deletions, 283 homozygous deletions, and 639
   duplications ([119]Supplemental Table S1b). We again detected no
   significant differences in the overall CNV frequency (P>0.05,
   case/control ratio=1.00) or CNV size (P>0.05, case/control ratio=1.05)
   between cases and controls in Cohort 2. As Cohort 2 had no trio data,
   we were unable to determine inheritance status.

   We defined rare CNVs as those present in less than 0.05% of healthy
   controls whether inherited or de novo. By this definition, Cohort 1
   contained 836 rare CNVs (263 duplications, 568 heterozygous deletions,
   and 5 hemizygous X chromosome deletions) and Cohort 2 contained 888
   rare CNVs (276 duplications, 611 heterozygous deletions, and one
   homozygous deletion). The overall distribution of CNVs in both cohorts
   is depicted in [120]Figure 2.

   The burden of rare CNVs was assessed in each cohort ([121]Table 2).
   Rare CNVs were significantly overrepresented in cases, both when
   comparing the proportion of subjects with rare CNVs or the frequency of
   rare CNVs in cases and controls. Rare CNV burden remained significant
   for overall large CNVs (CNVs with size larger than 3 times the standard
   deviation of mean CNV size in controls), suggesting similar overall
   CNV burden characteristics for each cohort. A subgroup analysis
   comparing the burden of rare CNVs in cases with and without additional
   non-cardiac anomalies showed significant enrichment as compared to
   controls ([122]Table 2) while there was no difference comparing one to
   the other ([123]Supplemental Table S2).
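   The per-subject burden values in Table 2 appear to correspond to the
   number of rare CNVs divided by the number of cases in the cohort. The
   sketch below checks that reading against the counts given in the text;
   the function name is hypothetical, and the interpretation of "CNV
   burden" as CNVs per subject is an assumption rather than a definition
   stated in the source.

```python
def cnv_burden(n_cnvs: int, n_subjects: int) -> float:
    """Average number of CNVs per subject."""
    return n_cnvs / n_subjects


# Case counts and rare CNV counts as reported in the text and Table 2.
COHORT_SIZES = {"Cohort 1": 627, "Cohort 2": 346}
RARE_CNVS = {
    "Cohort 1": {"duplications": 263, "deletions": 573, "all": 836},
    "Cohort 2": {"duplications": 276, "deletions": 612, "all": 888},
}

burdens = {
    cohort: {kind: cnv_burden(count, COHORT_SIZES[cohort])
             for kind, count in counts.items()}
    for cohort, counts in RARE_CNVS.items()
}
```

   Under this assumption the computed values agree with the "All
   patients" burden column of Table 2 to within rounding.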

TABLE 2.

   Rare CNV Burden Analysis

   All patients
     __________________________________________________________________

   CNV type              Count  CNV      Case/control  Significance   Significance
                                burden   CNV burden    (sample count  (CNV count
                                         odds ratio    based)[126]^*  based)[127]^*
   Cohort 1
       Duplications      263    0.420    1.462         1.13E-06       8.58E-10
       Deletions         573    0.914    1.460         1.53E-07       9.45E-22
       All CNVs          836    1.333    1.460         1.72E-09       1.89E-30
       Large CNVs
           Duplications  75     0.120    1.498         3.72E-03       2.01E-04
           Deletions     32     0.051    1.690         1.91E-02       9.55E-03
           All CNVs      107    0.171    1.551         1.90E-04       3.94E-06
   Cohort 2
       Duplications      276    0.798    1.668         3.03E-12       4.03E-47
       Deletions         612    1.769    1.847         6.28E-11       7.10E-34
       All CNVs          888    2.567    1.787         5.05E-15       1.10E-65
       Large CNVs
           Duplications  61     0.176    1.675         1.93E-04       2.47E-08
           Deletions     30     0.087    1.530         2.05E-01       6.36E-02
           All CNVs      91     0.263    1.625         1.47E-04       5.49E-09

   CTD[124]^# patients with no other anomalies
     __________________________________________________________________

   CNV type              Count  CNV      Case/control  Significance   Significance
                                burden   CNV burden    (sample count  (CNV count
                                         odds ratio    based)[128]^*  based)[129]^*
   Cohort 1
       Duplications      213    0.4235   1.4759        7.98E-06       2.52E-08
       Deletions         460    0.9145   1.4605        4.23E-06       2.58E-19
       All CNVs          673    1.338    1.4653        1.06E-07       8.65E-27
       Large CNVs
           Duplications  60     0.1193   1.4936        5.86E-03       7.27E-04
           Deletions     22     0.0437   1.4482        1.01E-01       3.02E-02
           All CNVs      82     0.163    1.4811        1.40E-03       5.58E-05
   Cohort 2
       Duplications      235    0.7993   1.6717        2.67E-10       8.69E-42
       Deletions         526    1.7891   1.8677        8.53E-10       9.60E-32
       All CNVs          761    2.5884   1.8025        3.72E-12       3.14E-60
       Large CNVs
           Duplications  50     0.1701   1.6161        1.38E-03       1.03E-07
           Deletions     23     0.0782   1.3806        5.83E-01       2.40E-01
           All CNVs      73     0.2483   1.5337        1.71E-03       1.70E-07

   CTD[125]^# patients with additional anomalies
     __________________________________________________________________

   CNV type              Count  CNV      Case/control  Significance   Significance
                                burden   CNV burden    (sample count  (CNV count
                                         odds ratio    based)[130]^*  based)[131]^*
   Cohort 1
       Duplications      49     0.3984   1.3885        3.05E-02       3.78E-03
       Deletions         113    0.9187   1.4672        3.01E-03       2.61E-05
       All CNVs          162    1.3171   1.4424        1.99E-03       6.70E-07
       Large CNVs
           Duplications  15     0.122    1.527         2.23E-01       8.23E-02
           Deletions     10     0.0813   2.692         1.44E-02       9.12E-02
           All CNVs      25     0.2033   1.8466        2.21E-02       1.42E-02
   Cohort 2
       Duplications      40     0.8      1.6731        5.36E-04       5.00E-09
       Deletions         82     1.64     1.7121        1.82E-02       4.61E-05
       All CNVs          122    2.44     1.6991        1.71E-04       5.29E-10
       Large CNVs
           Duplications  11     0.22     2.0906        1.29E-02       1.97E-02
           Deletions     6      0.12     2.1177        5.59E-02       1.45E-01
           All CNVs      17     0.34     2.1001        1.36E-02       3.13E-03

   ^# CTD: Conotruncal defect.

   ^* Fisher’s Exact Test, two-sided; bold type indicates significance.

Gene analysis

   In Cohort 1, a total of 1217 CNVs included one or more genes,
   collectively representing 1816 individual genes ([133]Supplemental
   Table S3). We determined that 314 of these genes were included in CNVs
   in two or more individuals; of these, only 42 genes were not included
   in CNVs in controls. In Cohort 2, 1412 CNVs included 1458 individual
   genes ([134]Supplemental Table S3). We determined that 364 of these
   genes were included in CNVs in two or more individuals; of these, only
   54 genes were not included in CNVs in controls. When combined, 55 genes
   were included in CNVs in both cohorts at least once but not in any
   controls (23 genes were in deletions in both cohorts, 22 genes were in
   duplications in both cohorts, and 10 genes were in different types of
   CNVs in the two case cohorts; [135]Supplemental Table S4).

   We performed a gene-based case-control enrichment analysis of
   conotruncal CNV-associated genes to determine if any genes were
   overrepresented in cases. No genes remained significantly enriched in
   our cases when all CNVs or only deletions or duplications were
   considered after correcting for multiple tests in the Combined Cohort
   (see [136]Figure 1). We obtained the same result when the analysis
   was restricted to the subset of HHE genes.

   We next restricted our analysis to include only a subset of genes (1534
   genes in total) previously implicated in cardiovascular development
   from the biomedical literature, as described in Methods. Using this
   process, we identified 37 such genes within 39 CNVs (10 duplications
   and 29 heterozygous deletions) in Cohort 1 and 40 genes within 89 CNVs
   (21 duplications and 68 heterozygous deletions) in Cohort 2. Among
   those CNVs, 29 of 39 were rare CNVs in Cohort 1 (7 duplications and 22
   deletions) and 27 of 89 were rare in Cohort 2 (10 duplications and 17
   deletions). Three of these rare CNVs were present in both Cohort 1 and
   2, all of which have been identified in other CHD studies. These
   included 2 very rare chromosome 1q21 deletions that overlapped with
   previously reported CNVs deleting the gene GJA5 ([137]Digilio et al.,
   2013; [138]Glessner et al., 2014; [139]Greenway et al., 2009;
   [140]Silversides et al., 2012; [141]Soemedi et al., 2012a;
   [142]Tomita-Mitchell et al., 2012; [143]Warburton et al., 2014). A
   smaller very rare CNV in the same region deleting only CHD1L was found
   in a single case from Cohort 2. The other two recurrent CNVs in our
   cohort disrupted genes ANGPT2 ([144]Silversides et al., 2012) and FLT4,
   respectively ([145]Serra-Juhe et al., 2012; [146]Soemedi et al., 2012b)
   ([147]Table 3). Several other rare CNVs found only in one of our
   cohorts were also reported in other CHD studies. These CNVs are listed
   in [148]Table 3 and overlapped genes of interest at 5q14.1 (SSBP2)
   ([149]Silversides et al., 2012; [150]Soemedi et al., 2012b), and 3q22.1
   (NPHP3) ([151]Tomita-Mitchell et al., 2012).

TABLE 3.

   Candidate Genes in Rare and Very Rare CNVs
   Hg19 Coordinate CNV Gene of Interest and [152]supporting Information
   Frequency
   (type if
   different) of
   Gene in
   controls
     __________________________________________________________________
     __________________________________________________________________

   Cyto
   band ID DX[153]^# start end Size
   (Kb) Type Inherited[154]^‡ Genes Gene(s) of
   interest Animal Model Human
   Phenotype Function
   and/or
   Expression Cohort References citing genes