Abstract

Background

Previous studies using different cardiac phenotypes, technologies and designs suggest a burden of large, rare or de novo copy number variants (CNVs) in subjects with congenital heart defects (CHD). We sought to identify disease-related CNVs, candidate genes and functional pathways in a large number of cases with conotruncal and related defects that carried no known genetic syndrome.

Methods

Cases and control samples were divided into two cohorts and genotyped in order to assess each subject’s CNV content. Analyses were performed to ascertain differences in overall CNV prevalence and to identify enrichment of specific genes and functional pathways in conotruncal cases relative to healthy controls.

Results

Only findings present in both cohorts are presented. From 973 total conotruncal cases, a burden of rare CNVs was detected in both cohorts. Candidate genes from rare CNVs found in both cohorts were identified based on their association with cardiac development or disease, and/or their reported disruption in published studies. Functional and pathway analyses revealed significant enrichment of terms involved in either heart or early embryonic development.

Conclusions

Our study tested one of the largest cohorts specifically with cardiac conotruncal and related defects. These results confirm and extend previous findings that CNVs contribute to disease risk for CHDs in general and conotruncal defects in particular. As disease heterogeneity renders identification of single recurrent genes or loci difficult, functional pathway and gene regulation network analyses appear to be more informative.

Keywords: Congenital heart defects, conotruncal defects, copy number variants, CNVs, functional analysis, pathway analysis

Introduction

Congenital heart defects (CHDs), which comprise the most common, severe birth defect, occur in 4–9 per 1,000 liveborn and are thought to be caused by both genetic and environmental factors (Pierpont et al., 2007).
Conventional karyotyping detects chromosomal anomalies in approximately 13% of all CHD cases, most of which fall into aneuploidy syndromes (e.g. trisomy 18 or 21) (reviewed in Hartman et al., 2011). Array-based technologies have revealed submicroscopic chromosomal deletions or duplications (copy number variants (CNVs)) in an additional 3–20% of CHD cases, with a higher frequency observed in those with syndromic or additional non-cardiac features (reviewed in Andersen et al., 2014; Lalani and Belmont, 2014). Despite differences in study cohort phenotypes and genomic surveillance approach, most studies report a significant burden of large, rare, and/or de novo CNVs in CHD cases (Glessner et al., 2014; Greenway et al., 2009; Lalani et al., 2013; Silversides et al., 2012; Soemedi et al., 2012b; Tomita-Mitchell et al., 2012). Some of these CNVs encompass genes usually disrupted by single nucleotide mutations for which CHD is part of the clinical spectrum, such as TBX1 (22q11.2 deletion, OMIM#188400, MIM:602054), EHMT1 (9q34.3 deletion or Kleefstra syndrome, OMIM#610253, MIM:607001), GATA4 (MIM:600576, mapping within the 8p23.1 deletion), and other genes deemed critical for heart development (reviewed by Andersen et al., 2014; Lalani and Belmont, 2014). However, many of the newly discovered CNVs do not contain a well-established cardiac-related gene, and few are recurrent. We and others (Glessner et al., 2014; White et al., 2014) have therefore applied functional and pathway analyses to identify additional candidate genes, in order to establish mechanistic and/or developmental relationships between these rare events. To date, most studies have employed a limited repertoire of functional approaches and few have replicated findings from other studies (Glessner et al., 2014; Lalani et al., 2013; Silversides et al., 2012).
In an attempt to reduce disease heterogeneity, we sought to identify recurrent CNVs, candidate gene sets and developmental mechanisms associated with a specific subset of CHD, namely conotruncal and related defects. These defects are thought to share a common genetic etiology based on family and animal studies (Digilio et al., 2000; Gobel et al., 1993; Miller and Smith, 1979). To that end we studied one of the largest cohorts to date with conotruncal defects whose cases did not carry a known genetic diagnosis, used denser SNP-based arrays to increase resolution in a subset of cases, applied a range of pathway and functional analyses, and compared our results to those previously published.

Methods

Study Cohorts

This study was approved by The Children’s Hospital of Philadelphia (CHOP) Institutional Review Board. Study subjects and their parents were recruited, consented, and diagnosed in a uniform manner at the CHOP Cardiac Center. Study subjects were approached to participate if they had a conotruncal or related cardiac defect and had not been diagnosed with a recognized genetic syndrome upon review of their medical record (e.g. 22q11.2 deletion syndrome, Trisomy 21, Alagille syndrome). Reports from echocardiograms, cardiac catheterizations, cardiac magnetic resonance imaging or cardiac operative notes were reviewed to detail the cardiac anatomy. Medical records, including available consults performed by clinical geneticists, were reviewed to detail non-cardiac congenital anomalies. Family medical history was obtained by an interview conducted by a genetic counselor. DNA was extracted from whole blood collected from parents; proband DNA was either extracted from whole blood or in certain cases, from an established lymphoblastoid cell line, using the Puregene DNA isolation kit (Gentra Systems Inc., Minneapolis, MN).

Three independent groups of healthy controls were used in this study.
Healthy control samples (N=4255, Healthy_CHOP) were recruited from well-child visits (ages 3–18 years) within CHOP’s healthcare network as previously described (Glessner et al., 2009). All healthy control samples for this study were carefully examined by genotype and health record to exclude samples with any indications of CHD, evidence of chronic health issues, documented genetic abnormalities, or syndromic genomic diseases. Genomic DNA was obtained from whole blood using standard protocols. A second group of healthy adult controls (N=2156), which were part of a previously published study of candidate genes for ocular refraction in the Age Related Eye Diseases Study (AREDS), were downloaded from dbGaP (dbGaP Study Accession: phs000001.v3.p1) (Wojciechowski et al., 2013). A third control cohort, 179 HapMap CEU samples genotyped using Illumina HumanOmni 2.5M Beadchip Array, was downloaded from the Illumina data depository (ftp.illumina.com).

Array Genotyping

All CHOP samples, including all conotruncal patients and controls in the healthy CHOP cohort (N=4255), were genotyped following a consistent protocol at CHOP’s Center for Applied Genomics. The majority of conotruncal cases (n= 627) and all of the healthy controls were array genotyped on the Illumina Infinium™ II HumanHap550 v1 or v3, or BeadChip 610 array (Illumina, San Diego, CA) as previously described (Elia et al., 2012). The remaining cases (n= 346) were array genotyped using the HumanOmni2.5-8 BeadChip array. The standard Illumina cluster file downloaded from the Illumina website was used for the analysis and running the GenomeStudio clustering algorithm. Control samples from the AREDS study were genotyped using the Illumina HumanOmni2.5 Quad BeadChip array with the standard Illumina cluster file as previously described (dbGaP Study Accession: phs000429.v1.p1 (Simpson et al., 2013)).
Sample Quality Control

Subject gender was verified by the CNV Workshop software package (Gai et al., 2010; Gai et al., 2012). Exclusion criteria for genotypes included SNP call rate <98%, probe intensity LRR ≥3 standard deviations from the cohort mean (0.36), excess of inheritance errors within trios, non-European ancestry as determined by Plink sample stratification (Patterson et al., 2006; Price et al., 2006; Purcell et al., 2007), or gender inconsistencies between self-reported and genotype-derived values.

CNV detection and analysis

We grouped cases and controls into two mutually exclusive cohorts. Cohort 1 included all cases and controls genotyped using the Illumina Infinium™ II HumanHap550 v1 or v3, or BeadChip 610 array. Cohort 2 included cases and AREDS control samples genotyped using the Illumina 2.5M BeadChip. In order to correct for differences in SNP probe content among all three SNP array versions used in Cohort 1, analysis was limited to the subset of SNPs shared by all three genotyping arrays (535,591 SNPs). CNV Workshop (Gai et al., 2010; Gai et al., 2012) and PennCNV (Wang et al., 2007) were used to define CNV regions as previously described (White et al., 2014). We applied the same approach for samples in Cohort 2 to adjust for the different versions of Illumina 2.5M BeadChip arrays between cases (Illumina HumanOmni2.5-8v1) and controls (Illumina HumanOmni2.5-4). For the 2.5M arrays, the subset of 2,332,843 SNPs in common between the two platforms was used to predict CNV regions in genotyped samples. In addition, we used 179 HapMap Caucasian samples that were genotyped using the HumanOmni2.5-8v1 BeadChip array (Illumina) to further reduce any systemic bias potentially introduced by the different genotyping technologies used in Cohort 2. HapMap samples were processed in a manner consistent with the Cohort 2 cases. Quality filtered CNV calls from HapMap samples were used as a validation set.
Any genes, functional terms, or gene network clusters deemed as significant by comparing HapMap samples to the AREDS cohort control samples (nominal p-value< 0.05) were removed from further consideration, as these findings could be due to systemic bias. All of the analyses described below were performed in each cohort independently and repeated in the Combined Cohort, generated by merging Cohort 1 and Cohort 2.

CNV Quality Control

CNV calls were considered for further review only if predicted by both algorithms for ≥60% of the predicted CNV span, with the exception of certain large CNVs as specified below. Subject genotypes with total CNV burden ≥3 standard deviations from the cohort mean were removed from further analysis (Pankratz et al., 2011). To reduce the possibility of type I error, deletions spanning fewer than 5 consecutive SNPs and duplications spanning fewer than 10 consecutive SNPs in Cohort 1 were excluded. Given that Cohort 2 was genotyped on a higher density array, we adopted a higher threshold for Cohort 2 such that deletions spanning fewer than 10 consecutive SNPs and duplications spanning fewer than 20 consecutive SNPs were excluded. In both cohorts, deletions spanning less than 10 kilobases and duplications spanning less than 20 kilobases were removed. CNV SNP and length thresholds were selected based upon previous studies from our group (Elia et al., 2012; Gai et al., 2012; Shaikh et al., 2009; White et al., 2014), examination of size-based concordance rates between the two algorithms (White et al., 2014), and extensive experience with samples undergoing array-based clinical diagnostics at our institution (Conlin et al., 2010).
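As an illustration only, the SNP-count and length thresholds above can be written as a small filter. The following Python sketch is not the pipeline used in this study; the `CnvCall` record and the function name `passes_size_filters` are hypothetical, and 1 kilobase is assumed to equal 1,000 base pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical record for one predicted CNV."""
    kind: str        # "deletion" or "duplication"
    n_snps: int      # number of consecutive SNPs spanned by the call
    length_bp: int   # genomic span in base pairs


# Minimum consecutive-SNP counts per cohort, as stated in the text.
MIN_SNPS = {
    1: {"deletion": 5, "duplication": 10},
    2: {"deletion": 10, "duplication": 20},
}

# Minimum lengths applied to both cohorts (assumes 1 kb = 1,000 bp).
MIN_LENGTH_BP = {"deletion": 10_000, "duplication": 20_000}


def passes_size_filters(call: CnvCall, cohort: int) -> bool:
    """Keep a call only if it meets both the SNP-count and length minimums."""
    enough_snps = call.n_snps >= MIN_SNPS[cohort][call.kind]
    long_enough = call.length_bp >= MIN_LENGTH_BP[call.kind]
    return enough_snps and long_enough


if __name__ == "__main__":
    example = CnvCall(kind="deletion", n_snps=5, length_bp=10_000)
    print(passes_size_filters(example, cohort=1))
    print(passes_size_filters(example, cohort=2))
```

Under these assumptions, the same 5-SNP, 10 kb deletion is retained in Cohort 1 but excluded in Cohort 2, which reflects the stricter SNP-count threshold adopted for the denser array.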
Additional CNV exclusion criteria included: CNVs with ≥50% overlap with centromere, telomere, and immunoglobulin variable regions; CNVs within olfactory receptor genes; and CNVs with SNP densities ≤ 1 SNP/30 kilobases, as described in (Hasin et al., 2008; Hellemans et al., 2007; Young et al., 2008). CNVs were considered equivalent if their genomic regions reciprocally overlapped for ≥60% of their length. Large CNVs were defined as those falling within the top 5% of CNVs observed in the corresponding control cohorts, inherited CNVs as equivalent CNVs identified in a subject and either parent, rare CNVs as being observed in one or fewer controls (<0.05% frequency in controls), and very rare CNVs as those not observed in the control cohort (White et al., 2014). B-allele frequencies (BAF) and signal intensity Log R ratios (LRR) of large CNVs were also visually inspected in GenomeStudio (Illumina). Large CNVs within 10 kilobases of each other were also visually inspected in GenomeStudio, and if the BAF and LRR traces indicated likelihood of a single contiguous event, the CNV regions were merged. Predicted CNVs were annotated using the RefSeq gene list (Pruitt et al., 2005), as represented in the UCSC Genome Browser (Kent et al., 2002) (genome.ucsc.edu).

Functional analysis

Gene Ontology (GO) (Ashburner et al., 2000) annotations were retrieved from Ensembl.org (huseast.ensembl.org/index.html) using the BioMart data-mining tool (Smedley et al., 2015). Mammalian Phenotype Ontology (MPO) term annotations were obtained from the Mammalian Genome Informatics resource (MGI) (www.informatics.jax.org) (Eppig et al., 2015).
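The CNV equivalence and rarity definitions given above under CNV Quality Control can be illustrated with a short Python sketch. This is a simplified reading, not the implementation used in this study: intervals are assumed to be half-open `(chromosome, start, end)` tuples, each control CNV is assumed to come from a different control subject, CNV type is ignored, and the function names and the labels returned by `rarity` are hypothetical.

```python
def overlap_fraction(a, b):
    """Smaller of the two fractions of each interval covered by the other.

    a and b are (chromosome, start, end) tuples with start < end.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def equivalent(a, b, threshold=0.6):
    """Reciprocal overlap of at least 60% of the length of both CNVs."""
    return overlap_fraction(a, b) >= threshold


def rarity(case_cnv, control_cnvs):
    """Classify a case CNV by how many control CNVs are equivalent to it.

    Very rare CNVs (no equivalent control CNV) are a subset of rare CNVs
    (one or fewer); the more specific label is returned.
    """
    n_controls = sum(equivalent(case_cnv, c) for c in control_cnvs)
    if n_controls == 0:
        return "very rare"
    if n_controls == 1:
        return "rare"
    return "not rare"


if __name__ == "__main__":
    case = ("chr1", 100, 200)
    print(overlap_fraction(case, ("chr1", 120, 220)))
    print(rarity(case, [("chr1", 100, 1000)]))
```

Requiring the overlap to hold in both directions prevents a small CNV nested inside a much larger one from being treated as the same event.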
Functional annotation of Reactome (www.reactome.org) (Croft et al., 2014; Milacic et al., 2012) and KEGG (www.kegg.jp) (Kanehisa and Goto, 2000; Kanehisa et al., 2016) gene set collections were downloaded from the GSEA database (www.broadinstitute.org/gsea/msigdb/index.jsp) (Mootha et al., 2003). All annotations were studied to assess gene set enrichments in cases as compared to controls. Gene Ontology and Mammalian Phenotype Ontology analyses included child and antecedent parental terms associated with a given gene. The extent of statistical enrichment for each functional term was determined by applying Fisher’s Exact Test (two-sided), which directly compared the frequency of occurrence in case and control cohorts for each gene or CNV being considered. We applied the Benjamini-Hochberg False Discovery Rate procedure (Benjamini and Hochberg, 1995) to further limit potential family-wise type I error. For global CNV and gene analyses, amplification and deletion events were considered both in aggregate and separately at each locus considered. We only reported a finding when the function’s nominal p-value was less than 0.05 in each cohort and the False Discovery Rate measured in the merged cohort was less than 0.05 (Figure 1).

FIGURE 1. Flow chart outlining process of data analysis. For CNV detection workflow refer to White et al. (2014).

Knowledge-based Analysis

A subset of genes of particular interest for cardiac development and congenital cardiac defects was compiled in an unsupervised manner by considering prior knowledge of the biomedical literature or expression status in heart tissue. We analyzed MEDLINE articles with natural language processing methods, using 47 terms descriptive of conotruncal defects or general cardiac development. Gene-Cardiac term pairs were required to be associated with at least three articles in order to reduce type I error.
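The enrichment test and reporting rule described above under Functional analysis can be sketched as follows. This is a minimal, standard-library Python illustration rather than the code used in this study: the function names are hypothetical, the two-sided p-value is assumed to sum the probabilities of all 2x2 tables that are no more likely than the observed one, and the counts in the usage lines are toy values, not study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Example layout: rows = cases / controls, columns = carries a CNV in
    the gene set / does not. Margins are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_reportable(p_cohort1, p_cohort2, fdr_combined, alpha=0.05):
    """Nominal p < alpha in each cohort and FDR < alpha in the merged cohort."""
    return p_cohort1 < alpha and p_cohort2 < alpha and fdr_combined < alpha


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))           # toy table
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    print(is_reportable(0.01, 0.03, 0.04))
```

Requiring nominal significance in both cohorts before applying the merged-cohort FDR threshold is what restricts reported findings to those replicated across the two genotyping platforms.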
Gene network construction

To construct a network among our genes of interest, especially rare genes among patient cohorts, we used the Cytoscape ReactomeFIViz Gene Set/Mutation Analysis application with default parameters (Cytoscape version 3.2, f1000research.com/articles/3–146/v2) (Shannon et al., 2003; Wu et al., 2014). The resulting gene interaction networks were clustered into modules using ReactomeFIViz’s Cluster FI Network function. A pathway enrichment analysis was performed on each individual network module using the Analyze Module Functions tool. Only pathways with an FDR <0.05 were reported in order to reduce family-wise type I error.

Cardiac Gene sets

Two mouse gene expression profiles were compiled and tested for enrichment among our collection of case CNVs using Fisher’s Exact test. Known cardiac relevance was assayed by using previously reported gene lists that compiled mouse genes ranked by level of expression in the developing mouse heart at days E9.5 and E14.5 (Zaidi et al., 2013). All mouse transcripts were converted to human gene homologs and subsequently ranked by their relative expression levels. The “high heart expressed 9.5” (HHE_9.5) list contains genes within the top quartile of expression levels (n = 4402) at E9.5, while the “high heart expressed 14.5” (HHE_14.5) list contains genes within the top quartile of expression levels at E14.5. Gene lists with expression levels ranked in the lowest quartile were also compiled (“low heart expressed 9.5” (LHE_9.5) and “low heart expressed 14.5” (LHE_14.5)). For each gene list, differing thresholds of inclusion were also explored to measure the trend of enrichments among conotruncal patient cohorts. We repeated our gene function and network studies restricting the gene list to those present in very rare CNVs and a third high-heart expressed gene list that combined HHE_9.5 and HHE_14.5 (HHE: combined HHE_14.5 and 9.5), given that HHE_9.5 and HHE_14.5 shared approximately 80% gene identity.
Selected genes were imported into the DAVID Bioinformatics website (Huang da et al., 2009a; b) and the Reactome FI application to evaluate gene functional and regulation network properties as previously described. We also repeated our analysis restricting the gene list to those present in very rare CNVs and the low-heart expressed gene list (LHE: combined LHE_14.5 and 9.5) to eliminate any false positive findings resulting from systemic gene set annotation bias by either DAVID Bioinformatics or Reactome FI.

Statistics Test Utility

The Wilcoxon rank sum test, two-way ANOVA test (Type III Sums of Squares), or two-tailed Fisher’s Exact Test, as appropriate, were used to test significance in case-control CNV and gene enrichment analyses. The Benjamini-Hochberg False Discovery Rate (BH-FDR) procedure was applied to adjust for family-wise multiple hypotheses testing.

CNV validation

Selected CNVs, based on likely candidacy, statistical likelihood, or putative function, were validated using TaqMan® copy number assays (Life Technologies, Grand Island, NY). Selection was based on CNV size (<100 kb) and on available human disease information (OMIM: omim.org). An RNAse P TaqMan assay was used as the internal control. Assays were performed on an ABI 7500 Fast Realtime PCR System (Life Technologies) using standard conditions and analyzed with the 7500 Fast System SDS v.1.4.0 software (Life Technologies). All samples were assayed in triplicate and negative results were verified at least twice in independent experiments.

Results

Study cohort

A total of 973 cases (Cohort 1 + Cohort 2) with a definitive diagnosis of a conotruncal or related heart malformation who upon review of medical records did not carry the diagnosis of a known genetic syndrome were used for these analyses (Table 1). All cases were recruited at the CHOP Cardiac Center and passed our rigid quality control process as detailed in Methods.
Most cases were ascertained at less than one year of age (63% of Cohort 1, 52% Cohort 2, 59% overall), and 71% of cases were ascertained at less than five years of age. As such, while we divided the cohort into those with and without additional congenital anomalies for subgroup analyses, we could not consider the presence of neurodevelopmental disorders given the young age of the study population. A first-degree relative was reported to have CHD in 6% (n=59) of cases. Array genotyped parental samples were only available for Cohort 1 for which there were 367 complete case-parent trios (both parents and case) and 199 incomplete case-parent trios (one parent and case). The type, number, and frequency of specific cardiac abnormalities from both cohorts are listed in Table 1. All Cohort 1 (n=627) and Cohort 2 (n=346) cases were of European descent. There was no gender difference between the two cohorts with a proband gender ratio of 1.5:1 (376 males) and 1.34:1 (198 males) in Cohort 1 and 2, respectively (p-value=0.44, Fisher’s Exact Test). A total of 4833 healthy subjects (2980 in Cohort 1 and 1853 in Cohort 2) passed our quality control steps outlined above and were used as controls as detailed in Methods.

TABLE 1.
Phenotype distribution for both cohorts

Cardiac Lesion^* | Cohort 1, Count (%) | Cohort 2, Count (%)
Tetralogy of Fallot | 249 (39.7) | 118 (34.1)
    Pulmonary Stenosis | 195 (78.3) | 79 (66.9)
    Pulmonary Atresia | 41 (16.5) | 27 (22.9)
    Absent Pulmonary Valve | 6 (2.4) | 1 (0.8)
    Unspecified Pulmonary Anatomy | 7 (2.8) | 11 (9.3)
Ventricular Septal Defect^† | 120 (19.1) | 93 (26.9)
    Conoventricular | 101 (84.2) | 72 (77.4)
    Conal Septal Hypoplasia | 5 (4.2) | 4 (4.3)
    Malalignment | 14 (11.7) | 15 (16.1)
    Unspecified Type | 0 | 2 (2.2)
D-Transposition of the Great Arteries | 124 (19.8) | 68 (19.6)
    With Ventricular Septal Defect | 61 (49.2) | 30 (44.1)
    Without Ventricular Septal Defect | 60 (48.4) | 33 (48.5)
    Unspecified if Ventricular Septal Defect Present | 3 (2.4) | 5 (7.4)
Transposition of the Great Arteries - other/unknown^∼ | 6 (1) | 4 (1.2)
Double Outlet Right Ventricle^^ | 68 (10.8) | 19 (5.5)
    Pulmonary Stenosis/Atresia | 28 (41.2) | 8 (42.1)
    Aortic Stenosis/Atresia | 9 (13.2) | 1 (5.3)
    Tricuspid Stenosis/Atresia | 8 (11.8) | 2 (10.5)
    Mitral Stenosis/Atresia | 26 (38.2) | 5 (26.3)
    Common Atrioventricular Valve | 6 (8.8) | 5 (26.3)
    Single Ventricle (Double Inlet Right or Left Ventricle) | 1 (1.5) | 1 (5.3)
Isolated Aortic Arch Anomaly | 29 (4.7) | 18 (5.2)
    Left Aortic Arch with Aberrant Right Subclavian Artery | 1 (3.4) | 4 (22.2)
    Right Aortic Arch with Mirror Image Branching | 3 (10.3) | 2 (11.1)
    Right Aortic Arch with Aberrant Left Subclavian Artery | 9 (31.0) | 7 (38.9)
    Double Aortic Arch | 16 (55.2) | 5 (27.8)
Truncus Arteriosus | 18 (2.9) | 16 (4.6)
    Type 1 | 8 (44.4) | 11 (68.8)
    Type 2 | 6 (33.3) | 4 (25.0)
    Type 3 | 1 (5.6) | 0
    Type 4 | 1 (5.6) | 0
    Type Unspecified | 2 (11.1) | 1 (6.3)
Interrupted Aortic Arch | 12 (1.9) | 8 (2.3)
    Type A | 3 (25.0) | 1 (12.5)
    Type B | 8 (66.7) | 7 (87.5)
    Type Unspecified | 1 (8.3) | 0
Other^# | 1 (0.1) | 2 (0.6)
Total | 627 (100) | 346 (100)

^* 2.7% and 3.2% of the subjects were also diagnosed with heterotaxy in Cohort 1 and Cohort 2, respectively.
^† 17.5% and 14% of the subjects were also diagnosed with coarctation of the aorta in Cohort 1 and Cohort 2, respectively; and 9.2% and 6.5% had concurrent muscular VSDs in Cohort 1 and Cohort 2, respectively.
^∼ Cardiac segments SDL or unknown
^^ Subsets are not mutually exclusive.
^# Single subjects with atrial septal defect and muscular VSD, muscular VSD, right ventricle aorta and pulmonary atresia.

CNV burden in conotruncal patient cohorts

Structural variation content of the 627 cases in Cohort 1 totaled 2735 CNVs, consisting of 553 duplications, 2083 heterozygous deletions, 90 homozygous deletions, and 9 hemizygous deletions (deletions in male X-chromosome) (Figure 2; Supplemental Table S1a). Of these, 1407 (51.4%) could be definitively identified as inherited (710 maternal, 636 paternal, and 61 present in both parents), while 487 were present in neither parent and were thus suggestive of de novo events. Of these de novo CNVs, 145 were very rare (5.3% of total CNVs) and identified in 105 subjects (16.7% of subjects). Previous work had established bias towards Type II error using the protocol proposed by Itsara et al. (2010). Therefore, certain of these de novo events were likely due to Type II error and present in a parent; those of interest were validated by quantitative PCR, as described in Methods. We detected no significant differences in the overall CNV frequency (P>0.05, case/control ratio=1.00) or CNV size (P>0.05, case/control ratio=1.05) between cases and controls.
This lack of correlation was upheld when considering only the subset of CNVs overlapping transcribed regions between cases and controls (P>0.05, mean case/control ratio=1.00 for CNV frequency, mean case/control ratio=1.08 for CNV size). The same conclusion was observed when we restricted the CNV-derived gene list to those overlapping with the HHE genes (CNV frequency: p-value>0.05, mean case/control ratio =1; CNV size: p-value >0.05, mean case/control ratio=1.04). When restricting CNV burden analysis to the 367 conotruncal trios, parental transmission of inherited CNVs to probands was found to be independent of parent gender (P>0.05; 654 maternal vs. 655 paternal).

FIGURE 2. Flow chart depicting the distribution of CNVs in each cohort. The total count of CNVs and, in parentheses, the subset of CNVs containing genes are presented. Row I reports all CNVs; Row II describes inheritance status for Cohort 1; Rows III and IV report the number of rare and very rare CNVs as defined in Methods, respectively.

We detected 3192 total CNVs from 346 singletons of Cohort 2, including 2270 heterozygous deletions, 283 homozygous deletions, and 639 duplications (Supplemental Table S1b). We again detected no significant differences in the overall CNV frequency (P>0.05, case/control ratio=1.00) or CNV size (P>0.05, case/control ratio=1.05) between cases and controls in Cohort 2. As Cohort 2 had no trio data, we were unable to determine inheritance status. We defined rare CNVs as those present in less than 0.05% of healthy controls whether inherited or de novo. By this definition, Cohort 1 contained 836 rare CNVs (263 duplications, 568 heterozygous deletions, and 5 hemizygous X chromosome deletions) and Cohort 2 contained 888 rare CNVs (276 duplications, 611 heterozygous deletions, and one homozygous deletion). The overall distribution of CNVs in both cohorts is depicted in Figure 2.
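The CNV counts reported above for the two cohorts can be cross-checked with simple arithmetic. The Python sketch below only re-adds the figures quoted in the text and recomputes the stated percentages; the variable and function names are illustrative and not part of the study's analysis.

```python
# CNV counts by type, as quoted in the text for each cohort.
cohort1_types = {
    "duplications": 553,
    "heterozygous deletions": 2083,
    "homozygous deletions": 90,
    "hemizygous deletions": 9,
}
cohort2_types = {
    "heterozygous deletions": 2270,
    "homozygous deletions": 283,
    "duplications": 639,
}

# Inherited CNVs in Cohort 1 by parent of origin.
cohort1_inherited = {"maternal": 710, "paternal": 636, "both parents": 61}

# Rare CNVs by type in each cohort.
cohort1_rare = {"duplications": 263, "heterozygous deletions": 568,
                "hemizygous X deletions": 5}
cohort2_rare = {"duplications": 276, "heterozygous deletions": 611,
                "homozygous deletions": 1}


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


cohort1_total = sum(cohort1_types.values())
cohort2_total = sum(cohort2_types.values())
inherited_total = sum(cohort1_inherited.values())

if __name__ == "__main__":
    print("Cohort 1 total CNVs:", cohort1_total)
    print("Cohort 2 total CNVs:", cohort2_total)
    print("Inherited share (%):", percent(inherited_total, cohort1_total))
    print("Very rare de novo share (%):", percent(145, cohort1_total))
    print("Subjects with very rare de novo CNVs (%):", percent(105, 627))
    print("Rare CNVs:", sum(cohort1_rare.values()), sum(cohort2_rare.values()))
```

The per-type counts add up to the reported totals for both cohorts, and the recomputed percentages agree with the values given in the text.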
The burden of rare CNVs was assessed in each cohort (Table 2). Rare CNVs were significantly overrepresented in cases, both when comparing the proportion of subjects with rare CNVs and when comparing the frequency of rare CNVs in cases and controls. Rare CNV burden remained significant for overall large CNVs (CNVs with size greater than 3 standard deviations above the mean CNV size in controls), suggesting similar overall CNV burden characteristics for each cohort. A subgroup analysis comparing the burden of rare CNVs in cases with and without additional non-cardiac anomalies showed significant enrichment as compared to controls (Table 2) while there was no difference comparing one to the other (Supplemental Table S2).

TABLE 2. Rare CNV Burden Analysis

All patients

CNV type | Count | CNV burden | Case/control CNV burden odds ratio | Significance (sample count based)^* | Significance (CNV count based)^*
Cohort 1
    Duplications | 263 | 0.420 | 1.462 | 1.13E-06 | 8.58E-10
    Deletions | 573 | 0.914 | 1.460 | 1.53E-07 | 9.45E-22
    All CNVs | 836 | 1.333 | 1.460 | 1.72E-09 | 1.89E-30
Cohort 1, Large CNVs
    Duplications | 75 | 0.120 | 1.498 | 3.72E-03 | 2.01E-04
    Deletions | 32 | 0.051 | 1.690 | 1.91E-02 | 9.55E-03
    All CNVs | 107 | 0.171 | 1.551 | 1.90E-04 | 3.94E-06
Cohort 2
    Duplications | 276 | 0.798 | 1.668 | 3.03E-12 | 4.03E-47
    Deletions | 612 | 1.769 | 1.847 | 6.28E-11 | 7.10E-34
    All CNVs | 888 | 2.567 | 1.787 | 5.05E-15 | 1.10E-65
Cohort 2, Large CNVs
    Duplications | 61 | 0.176 | 1.675 | 1.93E-04 | 2.47E-08
    Deletions | 30 | 0.087 | 1.530 | 2.05E-01 | 6.36E-02
    All CNVs | 91 | 0.263 | 1.625 | 1.47E-04 | 5.49E-09

CTD^# patients with no other anomalies

CNV type | Count | CNV burden | Case/control CNV burden odds ratio | Significance (sample count based)^* | Significance (CNV count based)^*
Cohort 1
    Duplications | 213 | 0.4235 | 1.4759 | 7.98E-06 | 2.52E-08
    Deletions | 460 | 0.9145 | 1.4605 | 4.23E-06 | 2.58E-19
    All CNVs | 673 | 1.338 | 1.4653 | 1.06E-07 | 8.65E-27
Cohort 1, Large CNVs
    Duplications | 60 | 0.1193 | 1.4936 | 5.86E-03 | 7.27E-04
    Deletions | 22 | 0.0437 | 1.4482 | 1.01E-01 | 3.02E-02
    All CNVs | 82 | 0.163 | 1.4811 | 1.40E-03 | 5.58E-05
Cohort 2
    Duplications | 235 | 0.7993 | 1.6717 | 2.67E-10 | 8.69E-42
    Deletions | 526 | 1.7891 | 1.8677 | 8.53E-10 | 9.60E-32
    All CNVs | 761 | 2.5884 | 1.8025 | 3.72E-12 | 3.14E-60
Cohort 2, Large CNVs
    Duplications | 50 | 0.1701 | 1.6161 | 1.38E-03 | 1.03E-07
    Deletions | 23 | 0.0782 | 1.3806 | 5.83E-01 | 2.40E-01
    All CNVs | 73 | 0.2483 | 1.5337 | 1.71E-03 | 1.70E-07

CTD^# patients with additional anomalies

CNV type | Count | CNV burden | Case/control CNV burden odds ratio | Significance (sample count based)^* | Significance (CNV count based)^*
Cohort 1
    Duplications | 49 | 0.3984 | 1.3885 | 3.05E-02 | 3.78E-03
    Deletions | 113 | 0.9187 | 1.4672 | 3.01E-03 | 2.61E-05
    All CNVs | 162 | 1.3171 | 1.4424 | 1.99E-03 | 6.70E-07
Cohort 1, Large CNVs
    Duplications | 15 | 0.122 | 1.527 | 2.23E-01 | 8.23E-02
    Deletions | 10 | 0.0813 | 2.692 | 1.44E-02 | 9.12E-02
    All CNVs | 25 | 0.2033 | 1.8466 | 2.21E-02 | 1.42E-02
Cohort 2
    Duplications | 40 | 0.8 | 1.6731 | 5.36E-04 | 5.00E-09
    Deletions | 82 | 1.64 | 1.7121 | 1.82E-02 | 4.61E-05
    All CNVs | 122 | 2.44 | 1.6991 | 1.71E-04 | 5.29E-10
Cohort 2, Large CNVs
    Duplications | 11 | 0.22 | 2.0906 | 1.29E-02 | 1.97E-02
    Deletions | 6 | 0.12 | 2.1177 | 5.59E-02 | 1.45E-01
    All CNVs | 17 | 0.34 | 2.1001 | 1.36E-02 | 3.13E-03

^# CTD: Conotruncal defect
^* Fisher’s Exact Test, two-sided; bold type indicates significance

Gene analysis

In Cohort 1, a total of 1217 CNVs included one or more genes, collectively representing
1816 individual genes (Supplemental Table S3). We determined that 314 of these genes were included in CNVs in two or more individuals; of these, only 42 genes were not included in CNVs in controls. In Cohort 2, 1412 CNVs included 1458 individual genes (Supplemental Table S3). We determined that 364 of these genes were included in CNVs in two or more individuals; of these, only 54 genes were not included in CNVs in controls. When combined, 55 genes were included in CNVs in both cohorts at least once but not in any controls (23 genes were in deletions in both cohorts, 22 genes were in duplications in both cohorts, and 10 genes were in different types of CNVs in the two case cohorts; Supplemental Table S4). We performed a gene-based case-control enrichment analysis of conotruncal CNV-associated genes to determine if any genes were overrepresented in cases. No genes remained significantly enriched in our cases when all CNVs or only deletions or duplications were considered after correcting for multiple tests in the Combined Cohort (see Figure 1). We observed the same conclusion when the analysis was restricted to the subset of HHE genes. We next restricted our analysis to include only a subset of genes (1534 genes in total) previously implicated in cardiovascular development from the biomedical literature, as described in Methods. Using this process, we identified 37 such genes within 39 CNVs (10 duplications and 29 heterozygous deletions) in Cohort 1 and 40 genes within 89 CNVs (21 duplications and 68 heterozygous deletions) in Cohort 2. Among those CNVs, 29 of 39 were rare CNVs in Cohort 1 (7 duplications and 22 deletions) and 27 of 89 were rare in Cohort 2 (10 duplications and 17 deletions). Three of these rare CNVs were present in both Cohort 1 and 2, all of which have been identified in other CHD studies.
These included 2 very rare chromosome 1q21 deletions that overlapped with previously reported CNVs deleting the gene GJA5 (Digilio et al., 2013; Glessner et al., 2014; Greenway et al., 2009; Silversides et al., 2012; Soemedi et al., 2012a; Tomita-Mitchell et al., 2012; Warburton et al., 2014). A smaller very rare CNV in the same region deleting only CHD1L was found in a single case from Cohort 2. The other two recurrent CNVs in our cohort disrupted genes ANGPT2 (Silversides et al., 2012) and FLT4, respectively (Serra-Juhe et al., 2012; Soemedi et al., 2012b) (Table 3). Several other rare CNVs found only in one of our cohorts were also reported in other CHD studies. These CNVs are listed in Table 3 and overlapped genes of interest at 5q14.1 (SSBP2) (Silversides et al., 2012; Soemedi et al., 2012b), and 3q22.1 (NPHP3) (Tomita-Mitchell et al., 2012).

TABLE 3. Candidate Genes in Rare and Very Rare CNVs

Column groups: Hg19 Coordinate | CNV | Gene of Interest and supporting Information | Frequency (type if different) of Gene in controls
Columns: Cyto band | ID | DX^# | start | end | Size (Kb) | Type | Inherited^‡ | Genes | Gene(s) of interest | Animal Model | Human Phenotype | Function and/or Expression | Cohort | References citing genes