Abstract

Background

   Copy number variants (CNVs) contribute to 3% to 10% of isolated
   congenital heart disease (CHD) cases, yet their pathogenic roles remain
   unclear. Diagnostic efforts have focused on protein‐coding genes,
   largely overlooking long noncoding RNAs (lncRNAs), which play key roles
   in development and disease.

Methods AND RESULTS

   We systematically analyzed lncRNAs overlapping clinically validated
   CNVs in 743 patients with CHD from the Cytogenomics of Cardiovascular
   Malformations Consortium. We identified heart‐expressed lncRNAs,
   constructed a gene regulatory network using weighted gene coexpression
   network analysis, and identified gene modules associated with heart
   development. Functional enrichment and network analyses were used to
   identify lncRNAs that may be involved in heart development and
   potentially contribute to CHD. The code is stably archived at
   [36]https://doi.org/10.5281/zenodo.13799779. We identified 18 lncRNA
   candidate genes within modules significantly correlated with heart
   tissue, highlighting their potential involvement in CHD pathogenesis.
   Notably, lncRNAs such as lnc‐STK32C‐3, lnc‐TBX20‐1, and CRMA
   demonstrated strong associations with known CHD genes. Strikingly,
   although only 7.6% of known CHD genes were affected by a CNV, 68.8% of
   the CNVs contained a lncRNA expressed in the heart.

Conclusions

   Using weighted gene coexpression network analysis, we identified
   CNV‐associated lncRNAs with potential relevance to CHD, underscoring
   the complexities of noncoding regions in disease pathogenesis. These
   findings suggest that lncRNAs may play a greater role in CHD than
   previously recognized, highlighting the need for broader genomic
   analyses that extend beyond protein‐coding genes. This study provides a
   foundation for further exploration of lncRNAs in CHD, with potential
   implications for improved genetic characterization and diagnosis.

   Keywords: congenital heart disease, copy number variants, genomic
   regulation in cardiovascular development, long noncoding RNAs, weighted
   gene coexpression network analysis

   Subject Categories: Computational Biology, Etiology, Genetics, Basic
   Science Research
     __________________________________________________________________

Nonstandard Abbreviations and Acronyms

   CCVM
          Cytogenomics of Cardiovascular Malformations

   CHDgene
          Congenital Heart Disease Gene Database

   CNV‐lncRNA
          candidate lncRNA within clinically validated copy number
          variants

   GO
          Gene Ontology

   LncExpDB
          Long Non‐Coding RNA Expression Database

   PCW
          postconception weeks

   ROH
          runs of homozygosity

   TOM
          topological overlap matrix

   TPM
          transcripts per million

   WGCNA
          weighted gene coexpression network analysis

Research Perspective.

What Is New?

     * Our study underscores the critical yet often overlooked role of
       heart‐expressed long noncoding RNAs in congenital heart disease,
       identifying 18 candidate long noncoding RNAs linked to key
       developmental pathways and shedding light on the previously
       underappreciated contributions of noncoding regions to congenital
       heart disease pathogenesis.

What Question Should Be Addressed Next?

     * Future studies should investigate the functional roles of these
       candidate long noncoding RNAs in heart development, including
       experimental validation of their regulatory interactions with known
       congenital heart disease genes and their potential as diagnostic
       biomarkers or therapeutic targets.

   Congenital heart disease (CHD) is one of the most common congenital
   malformations, with a birth prevalence of 1% worldwide.[37] ^1 In 60%
   to 75% of CHD cases, the genetic cause of the disease is unknown.[38]
   ^2 Copy number variants (CNVs), which are subchromosomal variations in
   the number of copies of DNA segments ranging in size from 1000 base
   pairs to several megabases (Mb), account for 3% to 10% of the
   pathogenic causes of isolated CHD.[39] ^2 Some recurrent CNVs are
   associated with a specific type of CHD, including 22q11.2 Deletion
   Syndrome (also known as DiGeorge syndrome, a ~3 Mb microdeletion
   affecting TBX1, a T‐box transcription factor required for normal
   development of the second heart field),[40] ^3 Williams–Beuren syndrome
   (deletion of 7q11.23 that leads to the haploinsufficiency of nearly 25
   genes, including the ELN gene that codes for elastin, important for
   blood vessel elasticity and associated with aortic and pulmonary
   stenosis)[41] ^4 and 1p36 deletion syndrome (the most common terminal
   deletion syndrome with cardiac findings that include congenital heart
   defects and cardiomyopathy).[42] ^5 In addition to these clinical
   syndromes, CNVs are linked with a considerably greater risk of
   nonsyndromic CHD.[43] ^6 Nonetheless, many distinct CNVs are observed
   in patients who have the same CHD diagnosis, and often, their
   contribution to CHD pathogenesis remains unclear. The precise
   mechanisms by which CNVs lead to nonsyndromic CHD are still largely
   unknown, necessitating further investigation into their functional
   impact and the underlying biological pathways involved.

   When considering the molecular genetics of CNVs in CHD, disease
   pathogenesis can be attributed to changes in the number of gene copies
   within the CNV. Although the established diagnostic approach in CHD
   primarily emphasizes well‐known protein‐coding genes, as evidenced by
   resources such as the CHDgene website (a curated list of 142
   high‐confidence protein‐coding CHD genes),[44] ^7 it is becoming
   increasingly apparent that variation in noncoding regions, which
   constitute about 99% of the genome, may contribute to the pathogenicity
   of CNVs.[45] ^8 Approximately 60% to 80% of the human genome can be
   transcribed into noncoding RNAs, including microRNAs (miRNAs), long
   noncoding RNAs (lncRNAs), small nucleolar RNAs, and several other
   classes of RNA molecules that provide various regulatory and functional
   roles in the genome.

   lncRNAs are RNA molecules longer than 200 nucleotides that usually do
   not have a coding sequence.[46] ^9 , [47]^10 , [48]^11 They are
   involved in diverse regulatory roles in gene expression, such as
   modifying chromatin, regulating transcription and posttranscriptional
   processes, and regulating RNA processing.[49] ^10 , [50]^11 lncRNAs
   originate from various genomic regions, including intergenic, intronic,
   and overlapping regions with other genes.[51] ^9 , [52]^12 At the DNA
   level, lncRNAs influence chromatin architecture by recruiting
   chromatin‐activating complexes or looping factors, thereby regulating
   target genes.[53] ^10 They also play significant roles in alternative
   splicing by interacting with splicing factors to control transcription
   rates and ensure proper gene expression.[54] ^12 Posttranscriptionally,
   lncRNAs interact with RNA‐binding proteins through sequence or
   structural motifs to regulate mRNA splicing. Furthermore, lncRNAs can
   act as miRNA sponges, binding to miRNAs and preventing their
   interaction with mRNAs, which is crucial for gene expression.[55] ^10
   Lastly, lncRNAs regulate protein stability and degradation by altering
   their structure to interact with proteins involved in key signaling
   pathways. Together, these roles make lncRNAs vital for differentiation
   and development, affecting gene regulation at the DNA, RNA, and protein
   levels.

   lncRNAs have been implicated in various biological processes and
   diseases. Their dysregulation can lead to a variety of conditions,
   including cardiovascular diseases,[56] ^13 , [57]^14 , [58]^15
   neurodegenerative disorders,[59] ^16 , [60]^17 , [61]^18 , [62]^19 and
   other complex diseases.[63] ^20 , [64]^21 , [65]^22 , [66]^23 Although
   the role of lncRNAs in CHD is emerging,[67] ^13 , [68]^14 , [69]^15
   their function as key regulators of gene expression during heart
   development suggests that they could be critical players in the
   pathogenesis of CHD. The lncRNA lnc‐STK32C/RP11‐432J24.5, associated
   with responses to stress, stimuli, and the immune system, is
   differentially expressed in ischemic hearts.[70] ^14 CRMA (also known
   as CARMA for Cardiomyocyte Maturation‐Associated lncRNA), previously
   known as MIR1‐1HG‐AS1, is an antisense lncRNA that regulates NOTCH1
   signaling by targeting the miRNAs MIR1‐1, MIR1‐133a2, and other
   noncoding RNAs.[71] ^24 Notably, in vitro knockdown of CRMA promotes
   cardiac commitment, suggesting that its downregulation is crucial for
   directing progenitor cells toward a cardiac fate. The intergenic lncRNA
   BANCR (BRAF‐Activated Nonprotein Coding RNAs; HSALNG0071734)[72] ^25
   has been reported to be upregulated in atherosclerotic plaques[73] ^26
   and may influence the development of cardiomyopathy.[74] ^13 This
   lncRNA encodes a short open reading frame that produces a microprotein
   that regulates various cellular functions, including contributing to
   heart muscle structural integrity and function. BANCR has been shown to
   facilitate cardiomyocyte migration and ventricular enlargement during
   heart formation in primates, with its expression confirmed to be higher
   during this critical period of heart development.[75] ^27 , [76]^28
   ClinVar, a database of genetic variants, contains several reports of
   pathogenic CNVs affecting the 9q21.11–21.13 region, which includes the
   gene BANCR.[77] ^29 , [78]^30 However, these CNVs affect several genes
   in the region, and BANCR has not yet been directly associated with CHD.

   Several CNVs affecting lncRNAs have been described in patients with
   CHD. In DiGeorge syndrome, the 22q11.2 deletion spans the antisense
   lncRNA lnc‐TSSK2‐8 (HSALNG0134007), which activates canonical Wnt
   signaling by protecting β‐catenin from degradation.[79] ^31 Previously,
   only TBX1 was thought to be the disease‐causing gene in 22q11.2, but
   evidence suggests that the loss of lnc‐TSSK2‐8 contributes to the
   development of CHD.[80] ^32 , [81]^33 Another intergenic lncRNA
   associated with CHD is lnc‐NIPA1‐4 (HSALNG0104472), located in the
   pathogenic CNV 15q11.2, where in vitro studies have shown that knockout
   of this gene significantly affects cardiomyocyte differentiation.[82]
   ^34

   Despite their significance, the role of most lncRNAs in human health
   and disease remains largely unknown. This study aims to identify novel
   lncRNAs that may play a role in the pathogenesis of CHD by examining a
   large cohort of patients with CHD with candidate lncRNAs within
   clinically validated CNVs (CNV‐lncRNAs). This study focuses on
   nonsyndromic CHD to reduce confounding factors associated with
   syndromic cases, which often present with additional systemic
   abnormalities. By narrowing the scope to nonsyndromic CHD, we aim to
   identify cardiac‐specific candidate genes and pathways, providing a
   clearer foundation for future research into the molecular basis of both
   syndromic and nonsyndromic CHD. By leveraging transcriptomic data from
   the developing heart and six other organs, we used a data mining method
   known as weighted gene coexpression network analysis (WGCNA) to assess
   the expression patterns, interaction networks, and potential mechanisms
   by which CNV‐lncRNAs may influence CHD. We identified several
   CNV‐lncRNAs with strong associations with key developmental pathways
   implicated in CHD. Furthermore, to promote transparency and
   reproducibility, we have made the software code and methodologies used
   in this analysis publicly available
   ([83]https://github.com/nch‐igm/lncCHDNet/). By providing our codebase
   as a reusable resource, we aim to contribute to a more comprehensive
   understanding of lncRNA function in CHD and pave the way for future
   discoveries in the field.

METHODS

Description and Structure of the CNV Data Set of Patients With CHD

   Figure [84]1 gives an overview of the process we followed to identify
   lncRNAs affected by CNVs that may play a role in the cause of CHD. To
   elucidate the potential role of copy number changes in lncRNA genes in
   abnormal heart development, we leveraged the recently published CHD
   cohort from the Cytogenomics of Cardiovascular Malformations (CCVM)
   Consortium (see Table [85]S1 contributing organizations that provided
   CNV data used in this study).[86] ^35 This data set contains 1363
   patients with CHD with both an abnormal echocardiogram and an abnormal
   finding on chromosomal microarray analysis (CMA). CMA is a genome‐wide
   technique that identifies genomic gains or losses (CNVs) and runs of
   homozygosity (ROH). It is recommended as a first‐tier test for patients
   with neurodevelopmental disorders and congenital anomalies[87] ^36 and
   has been integrated into routine practice at many pediatric cardiac
   centers for infants with severe CHD.[88] ^37 Using a centralized
   echocardiography review, the CCVM registry meticulously categorized
   each patient's CHD using a hierarchical classification to generalize
   the cardiac phenotypes.[89] ^35 Specific structural heart defects
   identified through clinical evaluations included, but were not limited
   to, septal defects (SDs), conotruncal heart defects (CTDs), and left
   and right‐sided obstructive lesions. The cohort included about an equal
   proportion of male and female patients. Race and ethnicity were
   self‐reported, with 60% of patients reported as White, 14% as Black, 8%
   of patients identifying as Hispanic or Latino, 3% as Asian, and 0.4% as
   Native American. These proportions align with the regional demographics
   of the participating centers. For more detailed information regarding
   the cohort demographics, please refer to the CCVM study.[90] ^35 All
   patients had clinically reported pathogenic CNVs as assessed by the
   American College of Medical Genetics and Genomics guidelines.[91] ^38
   Both syndromic and isolated cases of CHD are present in this cohort to
   allow for comprehensive phenotypic and genotypic correlations. The CNV
   data set applied in this study is available through the CCVM
   Consortium.

Figure 1. Workflow for identifying lncRNA candidates in CHD.

   Figure 1
   [92]Open in a new tab

   This figure provides an overview of the multistep methodological
   approach used in our analysis to investigate the roles of lncRNAs in
   CHD. In the first step, we integrate genomic information about lncRNAs
   from our cohort with expression data. In the second step, weighted gene
   coexpression network analysis is applied to identify modules that
   exhibit correlated expression across various tissues. The third step
   involves identifying hub genes, focusing on lncRNA candidates in the
   heart‐specific modules. Finally, the network is visualized to highlight
   the interrelationships among CHD‐related genes, lncRNAs, and
   protein‐coding genes, showcasing the comprehensive methods applied in
   our analysis. CNV indicates copy number variant; CNV‐lncRNA, candidate
   lncRNA within a clinically validated CNV; and lncRNA, long noncoding
   RNA.

Microarray Technology

   The CMA data were obtained from clinical‐grade, commercially available
   platforms, with each center following the respective manufacturer
   protocols. Platforms included oligo‐based and oligo+SNP arrays from
   Agilent, Roche, PerkinElmer, and Affymetrix, among others. Data
   processing followed clinical standards, ensuring equivalent resolution
   across platforms. Fluorescence in situ hybridization was occasionally
   used for confirmatory testing, as described in the initial CCVM
   study.[93] ^39

Ethics Approval and Consent to Participate

   The CCVM Consortium's collection of CMA and clinical data was approved
   by each clinical center's institutional review board and used a waiver
   of informed consent. All data were anonymized before access and we
   followed general ethical guidelines for data use, including respecting
   privacy, data integrity, and intellectual property rights.

Preprocessing of the CHD Cohort

   The CCVM data set underwent rigorous data wrangling and filtering to
   focus the analysis on isolated cases of CHD with CNVs. This filtering
   considered the CHD classification frequency, CNV length, location, and
   type (deletion, duplication, or ROH). The original data set includes
   CMA data for 1 363 patients with CHD. Of these, 386 patients were
   classified as having syndromic CHD, and 977 were categorized as
   nonsyndromic (Figure [94]S1). For this study, “syndromic” refers to
   patients with genomic disorders with well‐established associations with
   congenital heart defects, based on consensus opinions within the CCVM
   Consortium. The list of disorders considered syndromic was curated by
   the consortium and may not fully encompass all syndromic presentations.
   Across the patients who were nonsyndromic, a total of 2 416 CNVs were
   recorded, including heterozygous (n=901) and homozygous (n=16)
   deletions, duplications (n=770), and runs of homozygosity (ROH, n=729)
   (Figure [95]S2). Although most patients had a single CNV (n=963), 300
   had 2, and 100 had 3 or more CNVs listed on their clinical test report.

   The first stage of preprocessing the CHD cohort was to remove all CNV
   entries capturing ROH. This type of CNV was excluded from our analysis,
   as discerning the function of a lncRNA contained within these regions
   would be considerably more complex than events that resulted in a loss
   or gain of a lncRNA. For the remaining 1687 CNVs (duplications and
   deletions), the data set was next filtered based on the size of the CNV
   to exclude small CNVs <5 000 base pairs and large CNVs >5 000 000 base
   pairs (5 Mb). The lower limit for CNV length is determined by the size
   detection capabilities of the microarray technology. In contrast, the
   upper limit of 5 Mb is a commonly used cutoff to exclude aneuploidies
   and other large chromosomal rearrangements.[96] ^35 A summary of CNV
   length distributions for the filtered data set is as follows:
   minimum=424 bp, median size=0.64 Mb, mean size=7.11 Mb, and maximum of
   155.27 Mb (see Table [97]S2 for more details). The highly variable
   pseudoautosomal regions on the X chromosome, which exhibit high rates
   of recombination and structural variation, were also removed to ensure
   more consistent and interpretable results in the analysis.

   The final steps of preprocessing the data focused on the CHD category
   and defect classification. After removing ROH, small and large CNVs,
   and CNVs located in the pseudoautosomal regions, 1 126 patients
   remained with 1 404 CNVs. Of these patients (821 nonsyndromic; 305
   syndromic), we removed all 305 patients who were syndromic and those
   with a diagnosis of cardiomyopathy (n=10) as this is primarily a
   disease of the heart muscle that may not directly relate to structural
   congenital defects. Syndromic CHD cases were excluded from this
   analysis to focus on nonsyndromic cases, thereby reducing potential
   confounding from phenotypes unrelated to cardiac defects. This
   filtering step aimed to ensure the identification of CNV‐lncRNAs
   specifically associated with cardiac phenotypes.

   Patients with aortopathy (n=15) and arteriopathy (n=14) were removed
   due to the distinct pathophysiology of these conditions. Five other
   categories were removed due to their small sample size: CTD and
   atrioventricular septal defects (n=15), septal defect with right
   ventricular outflow tract obstruction (VOTO; n=11), other (n=9), patent
   ductus arteriosus (n=2), and single ventricle, other subtype (n=2). The
   exclusion of these rare phenotypes enabled us to focus the analysis on
   the recurrent cardiac malformations that constitute the majority of
   phenotypes observed in this cohort with CHD. The final data set
   comprised 743 patients with 938 CNVs.

RNA‐Seq Data Set and Processing for CNV‐lncRNA Analysis

   We used the human organ RNA‐seq time series of the development of seven
   major organs (ArrayExpress Accession Number E‐MTAB‐6814).[98] ^40 The
   data set covers 23 developmental stages across seven organs (n=313:
   brain/forebrain 55, hindbrain/cerebellum 59, heart 50, kidney 40, liver
   50, ovary 18, and testis 41), starting at 4 postconception weeks (PCW)
   until 58 to 63 years of age. Normalized expression levels of lncRNAs
   calculated from this time series, expressed as transcripts per million
   (TPM), were obtained from the publicly accessible Long Non‐Coding RNA
   Expression Database (LncExpDB, available at lncRNA Expression
   Database).[99] ^41 For a description of how LncExpDB processed the
   transcriptomic data, see Data [100]S1.

   The RNA‐seq data include gene expression levels for 101 293 lncRNAs and
   19 957 protein‐coding genes. When performing WGCNA, applying a minimum
   expression threshold of at least 1 TPM is recommended, ensuring that
   only genes expressed in at least 1 sample are included in the
   analysis.[101] ^34 Applying this threshold to all samples in the
   developmental time series, 18 317 protein‐coding genes had a minimum
   expression level of ≥1 TPM. For lncRNAs, we required a minimum
   expression level of ≥1 TPM in any of the 50 heart samples, as
   recommended for building the coexpression network. This filtering
   criterion identified 24 009 lncRNAs expressed in the heart at any
   developmental stage, along with corresponding expression levels for
   these genes across the other 6 organs.

Identifying the CNV‐lncRNA

   Next, we wanted to identify all lncRNAs that fell within the 938 CNVs
   in the 743 patients with isolated CHD in the processed CCVM cohort.
   Given the CNV coordinates of the original data were mapped to GRCh37, a
   liftover was performed to map regions to reference genome build GRCh38.
   The lncExpDB RNA‐seq data were generated using a custom LncBook gene
   transfer format file (LncBookv1.9_GENCODEv33_GRCh38.gtf). As such, this
   same file was used to identify CNV‐lncRNAs. Although protein‐coding
   genes are present in the gene transfer format file, only lncRNAs were
   mapped to the CNV coordinates. Coordinates of CNVs were intersected
   with the LncBook lncRNAs using the R package GenomicsRanges (v1.56.1).
   A total of 15 261 unique lncRNAs fell within clinically reported CNVs
   in the CCVM cohort. The median number of lncRNAs per CNV was 10, with
   an interquartile range of 24 and a maximum of 356 lncRNAs in a single
   CNV (Figure [102]S3 shows the distribution of lncRNAs within the CNVs).
   Of the original 743 patients, 720 had a CNV overlapping 1 or more
   lncRNAs.

   The list of 24 009 lncRNAs expressed in the heart at any developmental
   stage was intersected with the 15 262 unique lncRNAs falling within a
   CNV in the CCVM cohort. This resulted in a final list of 3 608
   CNV‐lncRNAs with evidence of heart expression, along with 18 317 unique
   protein‐coding genes expressed in any tissue at any development stage,
   that together can be further analyzed in the coexpression network
   (Figure [103]2E). The median number of expressed lncRNAs per CNV was 2,
   with an interquartile range of 6, and a maximum of 110 expressed
   lncRNAs in a single CNV (Figure [104]S4 shows the distribution of
   expressed lncRNAs within the CNVs). Of the original 743 patients, 557
   had a CNV overlapping 1 or more expressed lncRNAs.

Figure 2. Overview of CHD categories and CNVs in the CCVM cohort.

   Figure 2
   [105]Open in a new tab

   This figure details the selection of the CNV‐LncRNAs reported in the
   CCVM cohort of patients with isolated CHD. A, Schematic representation
   of the CNV filtering process that refined the data set for more
   targeted analysis. B, The bar plot shows CHD category distribution in
   patients with isolated (nonsyndromic) CHD across the CCVM study cohort.
   Diagnoses are ranked by patient count, with bars indicating totals.
   Bars are color‐coded: sky blue for patients included in the analysis
   and white for those excluded. C, Sankey diagram shows the distribution
   of isolated CHD cases in the cohort, detailing associated cardiac
   phenotypes and the total number of CNVs across all patients. Each node
   represents a category, diagnosis, or CNV type labeled with the total
   CNV count. The flow widths between nodes reflect the number of CNVs,
   highlighting the relative distribution of gains and losses across
   different CHD diagnoses in the cohort. D, The circos plot visualizes
   CNV gains and losses across the human genome (hg38). The outer ideogram
   track displays chromosome bands, with black lines indicating gene‐poor
   G‐bands and red lines marking centromeres. The blue histogram track
   represents CNV gains, and the red track represents CNV losses. Taller
   bars highlight regions with higher frequencies of CNVs, pinpointing the
   genomic areas most impacted by gains and losses in the study cohort. E,
   Venn diagram of expressed lncRNAs and CNV‐associated lncRNAs
   illustrating the overlap between lncRNAs expressed in any tissue
   (blue), heart‐expressed lncRNAs (red), and CNV‐associated lncRNAs
   (green). The overlap regions indicate the number of lncRNAs shared
   between these categories, with the 3608 heart‐expressed lncRNAs used in
   our analysis being a subset of the total expressed lncRNAs. APVR
   indicates anomalous pulmonary venous return; AVSD, atrioventricular
   septal defect; CCVM, Cytogenomics of Cardiovascular Malformations; CHD,
   congenital heart disease; CNV, copy number variant; CTD, conotruncal
   heart defects; HTX, heterotaxy; lncRNA, long noncoding RNA; LVOTO, left
   ventricular obstructive lesion; PDA, patent ductus arteriosus; RVOTO,
   right ventricular obstructive lesion; and SD, septal defects.

Network Construction

   Protein‐coding genes are well studied, and their functions are better
   understood than those of lncRNAs. Therefore, we can leverage the
   expression pattern of protein‐coding genes to infer the function of
   lncRNAs. The final gene set consisted of 21 925 total genes, of which
   18 317 are protein‐coding genes expressed in any tissue, and 3 608
   unique lncRNAs expressed in the human heart and falling within
   clinically reported CNVs in the CCVM cohort. We next constructed gene
   regulatory networks using lncRNAs and protein‐coding genes identified
   through WGCNA.[106] ^42 WGCNA is a systems biology method that
   identifies clusters, or modules, of genes that show high levels of
   correlation in their expression patterns. These modules are then
   summarized using module eigengenes. A module eigengene captures the
   average expression profile of a gene module.[107] ^42 This allows the
   module to be associated with different tissues, and hub genes
   specifically related to the heart to be identified. Building on this
   framework, we implemented stepwise module detection to refine our
   network analysis further, focusing on CNV‐lncRNA coexpression networks
   to identify novel lncRNA hub genes.

   In contrast to the block‐wise module detection approach taken by Lu
   et al.,[108] ^34 our study employs stepwise module detection for
   annotating CNV‐lncRNAs. This approach offers greater control and
   precision in network construction by allowing the merging of similar
   modules. In contrast, the block‐wise method does not and places genes
   into independent blocks, even when there may be potential
   similarities.[109] ^42 For module detection, an appropriate
   soft‐threshold power must be specified. WGCNA aims to create a network
   that follows a scale‐free topology, characterized by a structure in
   which a few nodes are highly connected, and most nodes have low
   connectivity. We evaluated a range of powers, selecting a power value
   of 9 (R^2=0.76), to ensure an optimal balance between achieving a
   scale‐free topology and maintaining network connectivity
   (Figure [110]S5).

   First, the topological overlap matrix (TOM) similarity was calculated
   with an unsigned network (see Data [111]S1). The unsigned network
   considers positive and negative gene correlations but treats them
   equivalently without distinguishing their specific contribution.[112]
   ^42 , [113]^43 We chose an unsigned network to capture both activators
   and repressors of the heart developmental process. Second, network
   modules were detected by hierarchical clustering on the distance matrix
   derived from the TOM, specifically using
   [MATH: <mrow><mrow><mn>1</mn><mo>−</mo><mi
   mathvariant="italic">TOM</mi></mrow></mrow> :MATH]
   to measure gene–gene dissimilarity (Figure [114]S6), enabling genes to
   be grouped into coexpression modules based on their correlation
   patterns.[115] ^42 Each module is assigned a unique identifier in the
   form of a color, which provides a visual and intuitive way to
   distinguish between different modules. These colors are arbitrary and
   serve as labels, but they remain consistent throughout the analysis,
   making it easy to track modules across different plots and analyses.
   For example, genes assigned to the “magenta” module exhibit similar
   expression patterns, whereas those in the “green” module form a
   distinct coexpression group. This color‐based labeling system
   facilitates the interpretation of network topology and gene clustering
   results. Third, each module was summarized by its eigengene.
   Hierarchical clustering of these eigengenes, using dissimilarity
   measures (1 – correlation), was used to refine module granularity and
   merge similar modules at a 0.1 dissimilarity threshold
   (Figure [116]S7).

Relate Modules to Organs and Developmental Stages

   The final steps in our analysis focused on identifying which modules
   correlated most closely with the developing heart and identifying hub
   genes. First, the module‐trait relationships were measured by
   calculating the Pearson correlation of the module eigengenes with the
   tissue type. To ensure robustness against Type I errors due to multiple
   comparisons across the 7 organs, we applied a stringent P value
   threshold of ≤0.001.[117] ^42 Modules with high correlations to
   particular tissues, such as the heart, were highlighted as
   tissue‐specific.

   In addition to using the Pearson correlations, we visualized these
   relationships through a dendrogram, traditionally used to cluster
   coexpressed modules based on their eigengenes. In this case, we treated
   tissue types as an additional external trait and extended the
   dendrogram to reflect the similarity of module eigengenes to these
   tissue traits (Figure [118]S8). To complement the dendrogram, we also
   used an adjacency heatmap, which provides a visual representation of
   the correlation strengths between modules and tissues (Figure [119]S9).
   We focused on identifying modules most strongly correlated with the
   heart (Figures [120]S10 through [121]S12), focusing on the 50 heart
   samples to developmental stages (Figures [122]S13 through [123]S15).

   An enrichment analysis was conducted on each WGCNA module to assess the
   representation of lncRNAs, protein‐coding genes, and CHD genes.
   Hypergeometric and Fisher's exact tests were employed to determine
   whether these gene types were significantly enriched or depleted within
   the modules. Fisher's exact test P values were adjusted for multiple
   testing, and the results were classified as “enriched,” “depleted,” or
   “not significant” (see Data [124]S1)

   Next, we performed Gene Ontology (GO) and pathway enrichment analysis
   on the protein‐coding genes within each module to identify
   overrepresented gene sets. This analysis focused on three GO
   categories: biological processes, molecular functions, and cellular
   components, adjusting P values for multiple testing (see Data [125]S1
   for details).

Identification of Hub Genes

   In WGCNA, a hub gene is a highly connected gene within a module,
   strongly correlated with the module eigengene, and often pivotal in the
   biological processes the module represents. Hub genes for modules of
   interest are identified by calculating the module memberships (MM) for
   all genes and selecting those with R^2≥0.80.[126] ^44 High MM values
   signify a strong and specific connection between a gene and a
   particular module. Low MM values indicate a weaker connection, showing
   that the gene's expression pattern does not align closely with the
   module's expression. In addition to MM, we calculated the correlation
   between each gene's expression and the heart trait to determine
   gene–trait significance (GS) values. Here, we refer to the “heart
   trait” as a variable representing the expression data specific to heart
   samples across the data set. GS represents how strongly each gene is
   associated with the heart trait, compared with other organs. Scatter
   plots were generated to visualize the relationship between MM and GS
   for each module, helping to identify hub genes with high MM and GS
   values. Gene aliases of the hub genes are identified by the LncExpDB
   database.[127] ^41 The classification of lncRNAs among the hub genes
   was obtained from LncBook (Version 2.00).[128] ^45 Additionally,
   LncExpDB was used to overview lncRNA tissue expression in organs not
   included in the developmental data set.

   To visualize the networks of correlated genes, the modules with
   significant association with the heart were exported to Cytoscape,[129]
   ^46 where nodes represent genes and edges represent gene coexpression
   derived from the TOM matrix. We applied an edge weight minimum of 0.3
   to refine the network, removing isolated nodes and small subnetworks
   for clarity. We focused on networks containing known CHD genes within
   the module of interest. To identify whether a protein‐coding gene was a
   CHD gene, we cross‐referenced our gene list with 142 high‐confidence
   CHD genes from the CHDgene database. The CHDgene website
   ([130]https://chdgene.victorchang.edu.au) is a valuable resource for
   clinicians and researchers, offering a curated list of genes with
   variants consistently shown to cause CHD in humans.

   Embryonic heart Gini coefficient scores were used to quantify
   tissue‐specific gene expression during human heart development (see
   Data [131]S1 for details).[132] ^28 VanOudenhove et al. proposed that a
   higher Gini coefficient (eg, >0.5) indicated a gene's expression was
   more specific to the embryonic heart, identifying genes enriched for
   heart‐related functions like TBX5, IRX4, and HAND1. To integrate this
   approach into the current analysis, LncBook identifiers were mapped to
   Ensembl gene accessions to retrieve Gini coefficients for genes with
   maximal expression in the embryonic heart.

RESULTS

   The CCVM cohort comprises 1 363 patients and includes both syndromic
   and nonsyndromic CHD (Figure [133]S1).[134] ^35 We focused our analysis
   on patients with nonsyndromic CHD, applying a filtering scheme that
   removed ROH, deletions <5 000 base pairs (Kb), aneuploidies, and CNVs
   falling within the pseudoautosomal region (Figure [135]2A). The
   filtered data set has 743 patients who are nonsyndromic with a total of
   938 CNVs >5 Kb, but <5 000 000 base pairs (Mb) (Figure [136]S2). Eight
   different subtypes of CHD were represented in the cohort: CTD, SD, left
   VOTO, right VOTO, SD + left VOTO, heterotaxy, atrioventricular septal
   defect, and anomalous pulmonary venous return (Figure [137]2B).

Identification of CNV‐Associated lncRNA Genes

   The CCVM CMA data were successfully lifted from GRCh37 to GRCh38 to
   annotate lncRNA and protein‐coding genes within each CNV using a
   comprehensive set of >100 000 putative lncRNA genes from LncExpDB.[138]
   ^41 Following the liftover process, some CNVs were split into multiple
   regions, resulting in a final data set of 953 CNVs. The frequency of
   CNVs for the 8 cardiac phenotypes is ranked from most to least in
   Figure [139]2C. There are 494 CNV gains (blue) and 459 losses (red). A
   total of 573 patients had 1 CNV, 140 had 2 CNVs, and 30 had ≥3 CNVs.
   The chromosomal distribution for gains and losses shows clinically
   reported CNVs in the CCVM patients with nonsyndromic CHD affect all
   chromosomes (Figure [140]2D). Overall, chromosome X had the most CNVs
   with 88 (60 gains and 28 losses). Chromosome Y had the fewest with 7
   CNVs (5 gains and 2 losses). Only 72 of the 953 CNVs (7.6%) affected a
   known CHD gene. Of the 142 known CHD genes in the current CHDgene list,
   33 (23%) were affected by a CNV in our data set. Notably, some of these
   known genes were recurrently affected, including MYH11 (13 CNVs), LZTR1
   (8 CNVs), and NOTCH1 (8 CNVs).

   To identify CNV‐lncRNAs, we used RNA‐seq data from a developmental time
   series of 7 major human organs. For the WGCNA analysis, we first
   identified the subset of protein‐coding genes expressed in any tissue
   across developmental stages. To focus on lncRNAs potentially involved
   in heart development and thus relevant to CHD, we restricted our
   analysis to lncRNAs expressed in the heart. We applied a minimum
   expression threshold of at least 1 TPM, resulting in 18 317
   protein‐coding genes and 57 627 lncRNAs that were expressed at any
   developmental time point across the seven different organs.

   Among the expressed lncRNAs, 24 009 (42%) were expressed in the heart
   at some developmental stage. Of these heart‐expressed lncRNAs, 3 608
   (15%) were located within clinically reported CNVs from the CCVM cohort
   (Figure [141]2E). Out of the original 743 patients, 557 had CNVs
   overlapping 1 or more of these 3 608 heart‐expressed lncRNAs, which we
   refer to as CNV‐lncRNAs. These CNV‐lncRNAs were used as input for the
   WGCNA analysis.

WGCNA‐Based Identification and Functional Analysis of Heart‐Specific Gene
Modules

   For the WGCNA network construction, the TOM adjacency matrix contained
   a pairwise comparison for 21 925 total genes: 18 317 protein‐coding
   genes expressed in any tissue and 3 608 CNV‐lncRNAs derived from the
   intersection of lncRNAs expressed in the heart and CNVs in the CCVM
   cohort. The dendrogram constructed from the hierarchical clustering of
   the distance TOM matrix contains information on the structure of gene
   expression correlations (Figure [142]S6). The hierarchical clustering
   constructs the modules for all genes. Each gene is tagged underneath by
   the color according to the module. For each module identified using the
   TOM‐based clustering, eigengenes were calculated as the first principal
   component of the gene expression profiles, summarizing the overall
   expression pattern within each module. The hierarchical clustering
   initially resulted in 53 modules (Figure [143]S7A). Merging similar
   modules below the dissimilarity threshold of 0.1 led to 40 modules
   (Figure [144]S7B). All 40 modules and their associated counts of
   protein‐coding genes, lncRNA genes, and known CHD genes are shown in
   Data Set [145]S1. Statistical analysis for gene enrichment of lncRNA
   genes and known CHD genes was performed using the hypergeometric
   distribution. Six modules (gray, magenta, paleturquoise, green,
   lightcyan1, and lightyellow) were enriched for lncRNAs. Four modules
   demonstrated enrichment of known CHD genes (darkturquoise,
   darkolivegreen, purple, and magenta). The magenta module (ME03) was the
   only module to display significant enrichment for both lncRNAs (P=8.09
   x 10^−08) and known CHD genes (P=0.0112).

   Pearson correlation values for the module eigengene‐organ association
   across the seven tissues for the 40 modules are depicted in the heatmap
   in Figure [146]3 (modules and their associated correlations and P
   values are available in Data Set [147]S2). The stronger the
   correlation, the higher the intensity of the color, with red
   representing positive correlations, blue representing negative
   correlations, and white representing no correlation. Modules with high
   positive or negative correlations (|r|>0.6) and P values <0.001 are
   considered tissue‐specific. For example, module ME40 (pink) is highly
   specific to the liver (r=0.96, P=5×10^−174), whereas module ME28
   (royalblue) is highly specific to the kidney (r=0.83, P=3×10^−82).

Figure 3. Module‐tissue association heatmap across seven organ systems.

   Figure 3
   [148]Open in a new tab

   This heatmap helps identify gene modules critical for the development
   and function of specific tissues. It illustrates the correlation
   between the modules identified by weighted gene coexpression network
   analysis and the different tissue types across the developmental time
   series. Each row represents a module eigengene, color‐coded according
   to the original module colors prefixed by “ME,” and each column
   corresponds to a specific tissue. Positive correlations are depicted in
   red, negative correlations in blue, and white indicates no correlation.
   The color intensity reflects the strength of the correlation, with
   numerical values in each cell displaying the correlation coefficient,
   followed by the P value in parentheses for statistical significance.
   Only significant correlations (P<0.001) are shown. Modules with high
   positive or negative correlations (>|0.6|) are considered
   tissue‐specific, highlighting potential tissue‐specific gene expression
   patterns. Module ME03 (magenta) notably shows the highest positive
   correlation (R=0.95) with heart tissue, suggesting a strong link to
   heart development and function.

   Hierarchical clustering of the modules demonstrated that the magenta
   module clustered most closely to the heart samples (Figure [149]S8).
   The magenta module (ME03), comprising 567 genes, demonstrated
   significant heart tissue specificity, with a positive correlation value
   of 0.95 and a P value of 4×10^−156 (Data Set [150]S3). This module
   showed no significant positive correlations in the other organs.
   Focusing on the heart samples, we compared each module's eigengene and
   developmental stage, using the age of the samples in postconception
   (Data Set [151]S4). Modules with high positive correlations (in red)
   indicate increasing expression with developmental stage, whereas
   negative correlations (in blue) suggest decreasing expression over time
   (Figure [152]S13). The magenta module had a significant negative
   correlation (r=−0.50, P=0.0002), clustering most closely with the
   mid‐evelopment (11–20 PCW) stage (Figures [153]S14 and [154]S15). In
   the magenta module, eigengene expression increased from conception,
   peaking around fetal middevelopment (10–20 PCW) (Figure [155]4).
   Following this peak, expression levels gradually declined, reaching
   their lowest levels in the mature heart. CNVs affecting 1 or more of
   the 567 genes (425 protein‐coding and 142 lncRNA genes) in the magenta
   module are found across all 8 CHD phenotypes (Data Set [156]S5). In
   total, 154 (21%) of patients with nonsyndromic CHD had a CNV affecting
   the module (Data Set [157]S6).

Figure 4. Magenta module eigengene expression across developmental stages in
the heart.

   Figure 4
   [158]Open in a new tab

   The eigengene expression for the magenta module (ME03) is shown across
   various developmental stages in the heart, plotted by age in weeks PCW.
   The red data points represent the average eigengene expression at each
   stage, with a locally estimated scatterplot smoothing‐smoothed line
   (blue) fitted to highlight the trend over time. The gray shading
   indicates the 95% CI, showing the range within which the trend line is
   expected to fall. Vertical dashed gray lines indicate key developmental
   milestones: 10 weeks, 20 weeks, birth (38 weeks), 1 year, and 5 years
   post conception. These milestones mark significant transitions in heart
   development, providing context for the observed changes in eigengene
   expression. The expression level increases during early development,
   peaking between 10 and 20 PCW and gradually decreasing. PCW indicates
   post conception.

   The functional associations of the magenta module were identified using
   GO terms to reinforce its connection to the heart. Functional
   enrichment of GO terms was performed using the 425 protein‐coding genes
   within the magenta module. The analysis focused solely on
   protein‐coding genes because lncRNA genes generally lack well‐defined
   functional annotations in existing GO databases, making them less
   suitable for GO enrichment analysis. After combining semantically
   similar terms, 159 GO terms were functionally enriched (Data
   Set [159]S7), with the top 10 for each category illustrated in
   Figure [160]S16. These significant GO terms predominantly relate to
   muscle structure and function, highlighting critical processes such as
   muscle cell development, differentiation, and assembly. The identified
   cellular components include key structural elements like the sarcomere,
   myofibril, and contractile fibers. Additionally, molecular functions
   such as actin‐binding, structural constituents of muscle, and electron
   transfer activity further emphasize the role of these genes in muscle
   contraction and energy transduction. Of the 159 enriched GO terms, 26
   were related to the heart and the cardiovascular organ system
   (Figure [161]5). These terms included biological processes related to
   cardiac tissue muscle development (P=4.41×10^−29), heart contraction
   (P=1.44×10^−21), heart morphogenesis (P=2.20×0^−16), cardiac chamber
   development (P=2.79×10^−13), vasculogenesis (P=6.44×10^−05) and
   embryonic heart tube development (P=0.0037).

Figure 5. Functional enrichment of heart‐related biological processes in the
magenta module.

   Figure 5
   [162]Open in a new tab

   This bar chart presents 26 significantly enriched GO BP terms related
   to the heart and cardiovascular system, derived from the protein‐coding
   genes in the magenta module. Semantically similar terms were collapsed,
   and the most significant terms were selected based on their lowest
   adjusted P values (Benjamini–Hochberg method), all <0.05. Each bar
   represents the negative base‐10 logarithm transformation of the
   adjusted P values (−log10(padj)), indicating the relative statistical
   significance. The analysis highlights the magenta module's enrichment
   for genes with key roles in heart development and function. BP
   indicates biological process; and GO, Gene Ontology.

Identification and Analysis of Hub Genes: Central Regulators of
Heart‐Specific Gene Networks

   Genes with high MM, indicating strong connectivity within the module,
   and high GS, indicating a strong association with the heart trait, were
   identified as hub genes (Figure [163]6). These genes are likely key
   drivers of the biological processes relevant to heart function and
   development. The correlation coefficients for the protein‐coding genes
   (turquoise, r=0.968) and lncRNA genes (purple, r=0.972) both indicate
   strong positive correlation with P value <0.001. The correlation cutoff
   for module membership of r≥0.80 led to the identification of 146 hub
   genes (25.7% of the total 567 genes in the module) (Data Set [164]S8).
   Within the hub genes, 18 were lncRNAs (12.3%, P=0.9326). Strikingly, 8
   of the 9 known CHD genes in the module were within the hub,
   representing a highly significant enrichment (P=4.787×10^−6). The
   lncRNA and known CHD hub genes and their corresponding MM and GS values
   are shown in Table [165]1. Our analysis identified 18 lncRNAs as
   potential clinical targets from the CCVM cohort of clinically validated
   CNVs associated with nonsyndromic CHD (Table [166]2).

Figure 6. Identification of hub genes in the magenta module.

   Figure 6
   [167]Open in a new tab

   This scatter plot illustrates the relationship between MM and GS for
   genes within the magenta module (ME03). MM reflects how strongly a gene
   is associated with the module eigengene, and GS measures the
   correlation between a gene's expression and the heart trait. Genes are
   categorized into lncRNAs (purple circles) and protein‐coding genes
   (teal squares). The top hub genes, which are highly connected within
   the module and strongly associated with the heart trait, are emphasized
   by a dashed vertical line at MM=0.8. The plot also includes correlation
   (R) and P values for lncRNA and protein‐coding genes, calculated
   separately and displayed on the graph to indicate the strength and
   significance of the correlation between MM and GS. This visualization
   is crucial for identifying lncRNA hub genes within the module, which
   may play essential roles in the heart‐related biological processes
   associated with the module. GS indicates gene significance; lncRNA,
   long noncoding RNA; and MM, module membership.

Table 1.

   Known CHD and lncRNA Hub Genes From the Magenta Module (ME03)
   Accession Gene symbols/synonyms MM.Cor MM.P value GS.Cor GS.P value
   Classification
   Know CHD protein‐coding genes
   ENSG00000183072 NKX2‐5 0.95 1.215E‐153 0.98 2.980E‐213 Protein‐coding
   ENSG00000134571 MYBPC3 0.93 9.593E‐138 0.93 1.460E‐140 Protein‐coding
   ENSG00000089225 TBX5 0.92 1.236E‐131 0.94 2.262E‐149 Protein‐coding
   ENSG00000092054 MYH7 0.92 7.219E‐130 0.95 5.668E‐156 Protein‐coding
   ENSG00000159251 ACTC1 0.91 3.250E‐119 0.90 1.605E‐113 Protein‐coding
   ENSG00000197616 MYH6 0.90 2.217E‐115 0.95 1.557E‐153 Protein‐coding
   ENSG00000113196 HAND1 0.85 2.988E‐87 0.85 1.218E‐87 Protein‐coding
   ENSG00000164532 TBX20 0.85 1.172E‐87 0.86 2.123E‐94 Protein‐coding
   lncRNA Genes
   HSALNG0131540

   MIR1‐1HG

   MIR133A2HG

   Lnc‐SLCO4A1–8
   0.93 2.219E‐140 0.95 9.435E‐165 Intergenic
   HSALNG0057201

   Lnc‐TBX20‐1

   ENSG00000226063
   0.89 1.122E‐110 0.90 4.646E‐117 Intergenic
   HSALNG0055689 HSALNG0055689 0.88 8.389E‐104 0.89 1.328E‐109 Antisense
   HSALNG0057200 ENSG00000289335 0.88 9.246E‐101 0.87 4.126E‐99 Intergenic
   HSALNG0081800

   Lnc‐STK32C‐3

   ENSG00000226900
   0.88 8.062E‐101 0.83 4.435E‐81 Intergenic
   HSALNG0047724

   Lnc‐ECI2‐2; LOC101927888

   ENSG00000231811
   0.87 2.538E‐98 0.90 2.27E‐112 Intergenic
   HSALNG0092764

   Lnc‐CCDC59‐5

   ENSG00000288941
   0.87 3.134E‐96 0.86 7.883E‐95 Intergenic
   HSALNG0057214 HSALNG0057214 0.85 9.024E‐87 0.82 9.348E‐77 Intergenic
   HSALNG0088492 lnc‐B4GALNT3–4 0.85 3.362E‐90 0.82 2.549E‐78 Antisense,
   Sense
   HSALNG0088499 LINC02455 0.85 7.368E‐88 0.82 6.253E‐77 Intergenic
   HSALNG0096048

   LINC02455

   ENSG00000256672
   0.85 1.990E‐89 0.83 1.008E‐80 Antisense
   HSALNG0062838 HSALNG0062838 0.83 8.777E‐82 0.81 3.391E‐73 Sense
   HSALNG0119665

   ENSG00000272625

   antisense to LPIN2
   0.82 8.212E‐76 0.72 2.238E‐51 Antisense, Sense
   HSALNG0062865 lnc‐RP11‐63E5.6.1–3 0.81 6.605E‐74 0.71 6.227E‐49
   Antisense
   HSALNG0114768

   Lnc‐PMP22‐2

   ENSG00000237377
   0.81 2.341E‐74 0.80 3.837E‐70 Intergenic
   HSALNG0054330 HSALNG0054330 0.80 6.152E‐70 0.76 3.558E‐59 Antisense
   HSALNG0069545 lnc‐DOCK8‐2 0.80 2.181E‐72 0.72 6.540E‐51 Sense
   HSALNG0131539

   CRMA

   ENSG00000174403

   C20orf166‐AS1

   MIR1‐1HG‐AS1
   0.80 1.225E‐70 0.74 4.564E‐56 Intergenic
   [168]Open in a new tab

   For each gene, the table provides the module membership correlation and
   its associated P value, which indicates the strength of each gene's
   association with the magenta module. Additionally, the table shows the
   gene significance correlation and its P value, reflecting the
   correlation between each gene and heart‐related traits. This summary
   highlights hub genes highly correlated with the module, including both
   protein‐coding CHD genes and lncRNAs. CHD indicates congenital heart
   disease; GS.Cor, gene significance correlation; lncRNA, long noncoding
   RNA; and MM.Cor, module membership correlation.

Table 2.

   lncRNA Hub Genes and Associated CNVs and Cardiac Phenotype
   LncBook identifier Overlapping CNV Cardiac phenotype CCVM patient(s)
   HSALNG0047724 6p25.1 loss CTD 003–0293
   HSALNG0054330

   10q23.1 gain

   6q25.1 gain
   CTD 000–0369
   HSALNG0055689 7p22.3p22.2 loss CTD 002–0019

   HSALNG0057200

   HSALNG0057201

   HSALNG0057214
   7p14.2 loss[169]^* CTD 003–0191
   HSALNG0062838 7q36.3 loss LVOTO 001–0182
   HSALNG0062865

   8p23.3 loss

   8p23.3 gain

   CTD

   LVOTO

   004–0065

   003–0116
   HSALNG0069545

   9p24.3 gain (2)

   9p24.3 loss

   CTD

   RVOTO

   LVOTO

   008–0053

   001–0078

   003–0375
   HSALNG0081800 10q26.3 loss (2) RVOTO (2)

   002–0008

   009–0009

   HSALNG0088492

   HSALNG0088499

   12p13.33 gain

   2p16.3 loss
   SD + LVOTO 007–0016
   HSALNG0092764

   12q21.31 loss

   16p13.11 gain
   SD 003–0440
   HSALNG0096048 13q12.3 gain CTD 007–0002
   HSALNG0114768

   2p21 gain

   5p15.2 loss

   17p12 gain
   Atrioventricular septal defect 002–0007
   HSALNG0119665

   18p11.32p11.31 loss

   18p11.32p11.31 gain

   Anomalous pulmonary venous return

   SD

   005–0036

   000–0224

   HSALNG0131539

   HSALNG0131540
   20q13.33 gain SD + LVOTO 001–0285
   [170]Open in a new tab

   Each lncRNA from the magenta module (ME03) is listed alongside the
   corresponding chromosomal region of the overlapping CNV (eg, gain or
   loss) and the associated cardiac phenotype. The table also includes
   patient identifiers and CHD classification from the CCVM cohort who
   exhibit the relevant CNV, directly linking the genetic variations and
   observed heart conditions. CCVM indicates Cytogenomics of
   Cardiovascular Malformations; CHD, congenital heart disease; CNV, copy
   number variant; CTD, conotruncal heart defects; lncRNA, long noncoding
   RNA; LVOTO, left ventricular obstructive lesion; RVOTO, right
   ventricular obstructive lesion; and SD, septal defects.
   ^*

   This CNV also contains the known CHD gene TBX20.

   The relationships within the magenta module were further explored
   through coexpression network visualization (Figure [171]7). To enhance
   the clarity of the network, we applied a stringent filter to the edges,
   retaining only those with a TOM value >0.30. This filtering resulted in
   a network comprising 67 nodes with high MM (MM.Cor ≥0.80) and strong
   coexpression interaction. Known CHD genes in the module were included
   in the network regardless of their edge threshold interaction value. As
   a result, the network included 8 CHD genes, 7 lncRNA genes, and 52
   protein‐coding genes. Notably, all genes in this network displayed a
   coexpression interaction with NKX2‐5, a critical regulator of heart
   development.

Figure 7. Gene network visualization of hub genes in the magenta module.

   Figure 7
   [172]Open in a new tab

   This figure visualizes the coexpression network of hub genes within the
   magenta module (ME03), generated using Cytoscape. Nodes represent hub
   genes, color‐coded by gene type: gold for known CHD genes, purple for
   lncRNAs, and light green for protein‐coding genes. Edges indicate
   coexpression interactions between genes based on the TOM matrix, with a
   threshold of TOM>0.30. Hub genes are those with high module membership
   (R≥0.80). Coexpression edges connecting to lncRNA genes are highlighted
   in red. The network is divided into 2 panels for clarity: A, Seven of
   the 18 CNV‐lncRNA hub genes are coexpressed with NKX2‐5, a critical
   transcription factor for early heart development. B, A highly
   interconnected network of known CHD genes and other protein‐coding
   genes, all coexpressed with NKX2‐5. For clarity, redundant edges were
   removed and indicated with an arrow and curly brace. CHD indicates
   congenital heart disease; CNV‐lncRNA, candidate lncRNA within a
   clinically validated CNV; lncRNA, long noncoding RNA; and TOM,
   topological overlap matrix.

   The Gini index of >36 000 genes from 31 tissues, developed using
   independent human heart RNA‐seq and GTEx data sets, was mapped to the
   hub genes.[173] ^28 The Gini coefficient with respect to the
   distribution of expression levels across samples identifies genes
   likely important in heart development by highlighting those with
   expression patterns specific to the heart during embryogenesis.
   VanOudenhove et al. proposed that candidate CHD genes have a Gini
   coefficient >0.5 and, of the tissues studied, the gene must be
   maximally expressed in the embryonic heart. Within the hub genes
   identified in the magenta module, all 8 known CHD genes and 7 of the 18
   lncRNAs genes had Gini scores from their study (Table [174]3). Four of
   the 7 lncRNA candidates met the CHD gene candidate criteria:
   lnc‐TBX20‐1 (Gini=0.959), lnc‐STK32C‐3 (Gini=0.867), CRMA (Gini=0.662),
   and lnc‐DOCK8‐2 (Gini=0.631). As was observed in the known CHD genes
   MYH6 and MYH7, the other candidate lncRNA genes had high expression in
   adult heart and muscle, or had a Gini score <0.5, thus not meeting
   VanOudenhove et al,'s suggested criteria for a CHD candidate gene.

Table 3.

   Gini Coefficient Analysis of Hub Genes in the Magenta Module
   Accession Symbol Type Gini coefficient Tissue with max expression Max
   expression
   ENSG00000226063 Lnc‐TBX20‐1 (HSALNG0057201) lncRNA 0.959 Embryonic
   heart 167
   ENSG00000164532 TBX20 CHD 0.948 Embryonic heart 5 117
   ENSG00000197616 MYH6 CHD 0.936 Heart 323 262
   ENSG00000183072 NKX2‐5 CHD 0.935 Embryonic heart 9 467
   ENSG00000134571 MYBPC3 CHD 0.934 Heart 91 682
   ENSG00000231811 Lnc‐ECI2‐2 (HSALNG0047724) lncRNA 0.93 Heart 85
   ENSG00000174407 MIR1‐1HG (HSALNG0131540) lncRNA 0.918 Muscle 5 447
   ENSG00000159251 ACTC1 CHD 0.915 Embryonic heart 290 948
   ENSG00000092054 MYH7 CHD 0.905 Muscle 457 406
   ENSG00000113196 HAND1 CHD 0.889 Embryonic heart 2 246
   ENSG00000089225 TBX5 CHD 0.873 Embryonic heart 5 400
   ENSG00000226900 Lnc‐STK32C‐3 (HSALNG0081800) lncRNA 0.867 Embryonic
   heart 567
   ENSG00000174403 CRMA/MIR1‐1HG‐AS1 (HSALNG0131539) lncRNA 0.662
   Embryonic heart 2167
   ENSG00000226403 lnc‐DOCK8‐2 (HSALNG0069545) lncRNA 0.631 Embryonic
   heart 398
   ENSG00000272625 antisense to LPIN2 (HSALNG0119665) lncRNA 0.251
   Embryonic heart 81
   [175]Open in a new tab

   This table presents the Gini coefficients for hub genes identified in
   the magenta module (ME03). The table includes hub genes that are either
   known CHD genes or lncRNAs. The Gini coefficient, a measure of
   tissue‐specific gene expression, is calculated for each gene, with
   higher values indicating tissue‐specific expression. Genes with a Gini
   coefficient>0.5 and maximal expression in the embryonic heart are
   considered to be candidate CHD genes. Out of 8 known CHD hub genes, 5
   were found to have significant embryonic heart‐specific expression
   (Gini coefficient >0.5). Among the 18 lncRNA hub genes, 7 of the 9 with
   Ensembl accessions have a Gini coefficient, and 5 are maximally
   expressed in the embryonic heart. Of these, 4 lncRNAs exhibit
   significant heart‐specific expression with a Gini coefficient >0.5. CHD
   indicates congenital heart disease; and lncRNA, long noncoding RNA.

DISCUSSION

   We applied WGCNA to cluster CNV‐lncRNAs using a similar approach to
   that described for the heart and kidney.[176] ^34 , [177]^47
   CNV‐associated lncRNAs were identified in clinically reported CNVs from
   a cohort of patients with isolated CHD. The use of standardized,
   clinical‐grade CMA platforms across multiple centers ensures the
   reliability of CNV detection and enhances the reproducibility of our
   results. Using human organ transcriptomic time‐series data for the
   development of 7 major organs, we clustered the gene expression of both
   protein‐coding and CNV‐lncRNA genes into modules. These modules were
   then correlated with specific organs, allowing us to identify a module
   of coexpressed genes highly specific to the developing heart. We
   identified 18 CHD‐candidate lncRNA genes within this heart‐specific
   module (the magenta module, ME03). Seven of these 18 lncRNAs had
   significant coexpression relationships with known CHD genes, and 4 were
   confirmed as candidate CHD genes in an orthogonal data set examining
   gene expression patterns specific to the heart during embryogenesis.
   Our study offers a well‐documented and reproducible programming
   pipeline that applies a widely used method in a novel way to identify
   candidate lncRNAs within a disease‐specific context and predict their
   functions. This approach empowers researchers to systematically study
   CNV‐lncRNAs, providing a comprehensive toolset for future
   investigations. Moreover, our work expands the analysis of CNV‐lncRNAs
   by using a larger and more recent data set of clinically confirmed CNVs
   from patients with CHD in the CCVM data set, further enhancing the
   scope and applicability of the research.

   Clinical genetics in CHD primarily emphasizes well‐known protein‐coding
   genes, as evidenced by resources such as the high‐confidence CHDgene
   list.[178] ^7 However, our analysis of the CCVM cohort revealed that
   only 72 of the 953 clinically reported CNVs (7.6%) affected 1 of these
   known CHD genes, suggesting that many potential pathogenic factors
   remain unidentified. By contrast, 656 of the 953 clinically reported
   CNVs (68.8%) contain 1 or more lncRNAs expressed in the heart, that is,
   CNV‐lncRNAs, underscoring the need to consider the noncoding genome as
   a crucial factor in the genetic cause of CHD.

   Through WGCNA, we identified a highly heart‐correlated module (the
   magenta module, ME03, Figure [179]3) consisting of 567 genes with
   significant heart tissue specificity, clustering most closely with the
   middevelopment stage (11–20 PCW). No significant positive correlations
   for this module were observed in other organs. CNVs affecting these
   genes were present across all 8 CHD phenotypes, with 21% of patients
   with nonsyndromic CHD carrying a CNV affecting the module. In this
   module, eigengene expression increased steadily from conception,
   peaking during fetal middevelopment (10–20 PCW) (Figure [180]4). After
   this peak, expression levels gradually declined, reaching their lowest
   point in the mature heart. This expression pattern suggests that the
   genes within the magenta module are most active during early to
   mid‐heart development, a critical period for structural refinement,
   including chamber septation, conduction system development, and the
   initiation of blood circulation. Functional enrichment analysis of the
   protein‐coding genes in this module identified multiple enriched GO
   terms directly linked to heart and cardiovascular development,
   highlighting processes such as cardiac muscle development, heart
   contraction, and morphogenesis (Figure [181]5).

   Additionally, the analysis revealed significant enrichment of genes
   involved in mitochondrial function, energy production, and metabolic
   processes, underscoring the critical role of cellular energy dynamics
   in heart development (Data set [182]S7 and Figure [183]S16). Given the
   essential role of mitochondria in providing the energy required for
   cardiac muscle contraction and overall cardiovascular function, this
   enrichment of mitochondrial and metabolic genes suggests that
   dysregulation in energy metabolism may contribute to CHD pathogenesis.
   Overall, the developmental gene expression pattern and the identified
   GO terms suggest that genes in this module likely play pivotal roles in
   heart development and may serve as candidate genes for CHD.

   Within the module of genes highly correlated with the developing heart,
   we identified 18 highly connected lncRNA hub genes strongly correlated
   with the module eigengene (Figure [184]6). These genes are likely key
   drivers of the biological processes relevant to heart function and
   development. However, the lack of Ensembl accession numbers for many of
   our candidate lncRNAs limited our search for external information.
   Notably, only 9 of these lncRNAs possess recognized Ensembl
   identifiers; the other 9 were putative lncRNAs identified by LncBook.
   This highlights significant constraints in current gene annotation
   methods, which are often too stringent for noncoding RNA. For instance,
   Ensembl's criteria exclude intergenic lncRNAs with open reading frames
   that exceed 35% of their length or contain protein domains.[185] ^48
   Consequently, our analysis and subsequent discussions were confined to
   those lncRNA candidates with Ensembl identifiers, limiting the full
   exploration of the potential roles of the remaining putative lncRNAs in
   heart development and emphasizing the need for more inclusive gene
   annotation frameworks for noncoding RNAs.

   Seven of the 18 CNV‐lncRNA hub genes were coexpressed with NKX2‐5, a
   well‐known transcription factor critical for early heart development:
   HSALNG0057201 (Lnc‐TBX20‐1), HSALNG0057200 (ENSG00000289335),
   HSALNG0055689, HSALNG0092764 (Lnc‐CCDC59‐5), HSALNG0096048
   (ENSG00000256672), HSALNG0047724 (Lnc‐ECI2‐2), and HSALNG0131540
   (MIR1‐1HG) (Figure [186]7A). NKX2‐5 is expressed in the first and
   second heart fields to maintain chamber‐specific identities.[187] ^49
   Pathogenic variants in NKX2‐5 are associated with a variety of CHDs,
   including atrial[188] ^50 and ventricular SDs,[189] ^51
   atrioventricular septal defects,[190] ^52 and others.[191] ^53 ,
   [192]^54 Notably, HSALNG0131540 or MIR1‐1HG was also significantly
   coexpressed with well‐established CHD genes, TBX5,[193] ^55 , [194]^56
   MYH6,[195] ^57 , [196]^58 and MYH7 [197]^59 , [198]^60
   (Figure [199]7B). The coexpression of these lncRNAs with NKX2‐5 and
   other known CHD genes underscores their potential roles in heart
   development and cardiac function. This suggests that they may be
   involved in similar developmental pathways and could contribute to the
   pathogenesis of congenital heart defects.

   Four of our CNV‐lncRNA hub genes—lnc‐TBX20‐1, lnc‐STK32C‐3, CRMA, and
   lnc‐DOCK8‐2—were also highly ranked as candidate CHD genes in an
   external validation (Table [200]3). These genes met the criteria of
   having the highest expression in the embryonic heart across 32
   different organs and tissues, along with a significant Gini index,
   indicating strong heart‐specificity.[201] ^28 Established CHD genes
   that meet the Gini index criteria include TBX20 (Gini=0.948) and NKX2‐5
   (Gini=0.935). Unexpectedly, the three known CHD genes—MYBPC3, MYH6, and
   MYH7—were excluded from VanOudenhove's candidate gene set because their
   expression extends beyond the embryonic heart into the adult heart and
   muscle. Despite the Gini index limitations to score all known CHD
   genes, this index did confirm lnc‐TBX20‐1, lnc‐STK32‐C, CRMA, and
   lnc‐DOCK8‐2—as potential contributors to heart‐specific gene regulation
   and strengthens their candidacy for further investigation in CHD
   research.

   The intergenic lncRNA lnc‐TBX20‐1 (HSALNG0057201) was particularly
   interesting due to its high Gini index and proximity to the known CHD
   gene TBX20.[202] ^61 Of the known CHD genes and CNV‐lncRNAs,
   lnc‐TBX20‐1 had the highest Gini index (0.959) and peak expression in
   the embryonic heart, making it a strong CHD candidate lncRNA. The CNV
   associated with lnc‐TBX20‐1 is a 350 Kb loss of 7p14.2 in a patient
   with CTD. This region contains the known CHD gene, TBX20, and 2 other
   lncRNA candidates, HSALNG0057200 and HSALNG0057214. The location of
   lncRNAs is important as lncRNAs are known to regulate nearby
   genes.[203] ^11 Although it is not possible to determine whether the
   effects of this pathogenic CNV are due to the loss of TBX20 or the
   lncRNA genes, the coexpression of lnc‐TBX20‐1 and HSALNG0057200 with
   NKX2‐5 is particularly intriguing. NKX2‐5 and TBX20 are transcription
   factors that interact in heart development regulation.[204] ^62 This
   raises the possibility that these lncRNAs may have regulatory roles in
   connection with TBX20 and potentially NKX2‐5, though experimental
   evidence is still needed to support this hypothesis.

   lncRNAs function in processes ranging from chromatin modification to
   nuclear organization and are emerging as critical players in
   cardiovascular development and disease. However, their complexity often
   extends beyond the traditional definitions of noncoding genes, as
   exemplified by lnc‐STK32C‐3's (HSALNG0081800) coding potential.[205]
   ^13 The CNVs associated with lnc‐STK32C‐3 are 2 separate 2.9 MB
   deletions of 10q26.3 observed in 2 CCVM patients with right VOTO. We
   identified this CNV‐lncRNA as a candidate CHD gene with the highest
   module membership in the magenta module, indicating specificity in
   cardiac development. This gene was also indicated as a candidate CHD
   gene by the Gini index with a highly significant Gini index of 0.867
   and is most highly expressed in the embryonic heart across the 32
   organs and tissues transcriptionally profiled.[206] ^28 We are the
   first to suggest that this lncRNA is involved in the initial formation
   of the heart. Previous research suggests that lnc‐STK32C‐3 is
   abundantly expressed in ischemic hearts, and it is suggested to be
   involved in myocardial injury.[207] ^14 This aligns with the broader
   understanding that genes can turn on and off in response to
   developmental cues and cardiac injury, as shown by the protein‐coding
   gene VEGF‐B.[208] ^63 Therefore, if protein‐coding genes can have dual
   roles under different conditions expressed by the heart, it is
   plausible that lncRNAs can, too. Lastly, although it is known that
   lnc‐STK32C‐3 encodes microproteins and secretes them in the adult
   heart,[209] ^27 whether it can also perform this function during
   cardiac development remains to be explored.

   CRMA is a known regulator of cardiac differentiation and has been
   experimentally validated as important for heart development.[210] ^24 ,
   [211]^64 Variants affecting CRMA are, therefore, likely pathogenic and
   could help explain the CHD phenotype. The CNV associated with CRMA is a
   2.9 MB gain in the 20q13.3 region in a patient with SD and left VOTO.
   In addition to CRMA, this CNV affects the known CHD gene GATA5 [212]^65
   , [213]^66 and the lncRNA MIR1‐1HG.[214] ^24 MIR1‐1HG is the precursor
   of 2 cardiogenic miRNAs, known as MIR1‐1 and MIR‐133a2. GATA5 has been
   implicated in CHD only through loss‐of‐function mutations in
   humans,[215] ^65 whereas in our cohort patients with 20q13.3
   amplification, it is possibly impacting CHD pathogenesis through a
   gain‐of‐function mutation. When CRMA is expressed, it downregulates
   MIR1‐1 and MIR133a2, pushing embryonic stem cells into a mature cardiac
   fate.[216] ^24 , [217]^64 Although the knockdown of CRMA promotes
   cardiac differentiation through increased expression of MIR‐133a2,
   MIR‐133a1 targets RBPJ to inhibit the NOTCH pathway.[218] ^24 We
   hypothesize that in a gain‐of‐function mutation, CRMA may suppress the
   cardiac miRNAs expression beyond normal levels, leading to
   dysregulation of the NOTCH pathway causing cardiac malformations.
   Together, these networks of genes are required for the coordinated
   regulation of cardiogenic differentiation in embryonic stem cells.

   Three CNVs affecting 9p24.3 were associated with lnc‐DOCK8‐2: a 305 Kb
   duplication in a patient with right VOTO, a 354 Kb duplication in a
   patient with CTD, and a 100 Kb deletion in a patient with left VOTO.
   Although the CNVs do not affect any known CHD genes, in addition to the
   lncRNA, these smaller CNVs affect 2 protein‐coding genes, DOCK8 and
   KANK1. DOCK8 is a known Mendelian disease gene, resulting in a rare
   autosomal recessive immunologic disorder characterized by recurrent
   staphylococcal infections of the skin and respiratory tract.[219] ^67
   KANK1 is an essential regulator of the actin cytoskeleton, similar to
   Rho GTPases, which also regulate actin dynamics through ROCK
   (Rho‐kinase), and is required for myogenic differentiation.[220] ^68
   Rho GTPase signaling is involved in cardiac looping and early
   development of the heart, and dysregulation of this pathway may lead to
   structural heart defects.[221] ^69 Whether lnc‐DOCK8‐2 regulates either
   of these genes is unknown, and like the majority of CNV‐lncRNAs that we
   identified, further studies are required.

   Reproducible bioinformatics methods and well‐documented software are
   still significant barriers to advancing biomedical research, often
   hindering the ability to replicate and build upon existing studies. To
   address this challenge, we developed comprehensive and well‐documented
   R notebooks that streamline our analytical workflow and fully automate
   the generation of all tables and figures, enabling easy replication and
   modification of the entire pipeline. This approach provides a robust
   and adaptable framework standardizing coexpression analysis to
   CNV‐lncRNAs that can be applied to other genomic data sets,
   facilitating broader investigations into the roles of lncRNAs in
   tissue‐specific and disease‐specific contexts. As annotations of
   noncoding RNAs continue to improve and new genomic studies of patients
   with CHD emerge, our methods can be reused and extended to uncover
   additional candidate genes and pathways involved in CHD.

   Although our study has identified several candidate lncRNAs associated
   with CHD, further experimental work is needed to validate these
   findings and explore the mechanistic roles of these lncRNAs. Future
   directions include verifying the expression and functional roles of all
   identified lncRNAs through experimental validation. This process should
   involve characterizing their spatial and temporal expression patterns
   in cardiomyocyte cells under both normal and pathological conditions.
   For instance, while the downregulation of lncRNA CRMA has been studied,
   it remains unexplored in terms of upregulation. Evaluating the effects
   of CRMA upregulation in cell culture systems could provide valuable
   insights into its role in cardiac development. Moreover, CRMA's known
   binding sites for miR‐133a and miR‐1‐1, 2 critical miRNAs in
   cardiogenesis, warrant further investigation to assess how its
   expression modulates these regulatory molecules.

   Another compelling candidate, lnc‐TBX20‐1, is coexpressed with NKX2‐5,
   a transcription factor pivotal for heart development. To better
   understand their relationship, future studies could use coexpression
   experiments in human cardiomyocyte cell lines, such as those with
   NKX2‐5‐tagged systems, to elucidate their potential regulatory
   interactions and their roles in CHD pathogenesis. For other novel
   lncRNAs identified in this study, it is essential first to characterize
   their expression in cardiomyocytes to determine their roles during
   heart development. Once expression patterns are established, functional
   studies could investigate the effects of upregulation and
   downregulation on cardiomyocyte processes such as differentiation,
   proliferation, and apoptosis.

   By pursuing these experimental directions, future research will
   complement the computational findings of this study, clarifying the
   molecular roles of lncRNAs in CHD. These efforts, along with the
   continued adaptation of our pipeline to incorporate new data sets and
   improved lncRNA annotations, will advance our understanding of CHD
   pathogenesis and may lead to novel therapeutic targets.

CONCLUSIONS

   In this study, we identified and characterized previously unrecognized
   lncRNAs associated with well‐established protein‐coding CHD genes,
   confirming prior findings while revealing new potential links, such as
   the involvement of lnc‐DOCK8‐2 and HSALNG0096048 in our cohort with
   CHD. By focusing on nonsyndromic CHD, this study aimed to reduce
   confounding factors introduced by systemic phenotypes common in
   syndromic cases. This approach allowed us to prioritize
   cardiac‐specific candidate genes, which are more likely to contribute
   directly to CHD pathogenesis. Although syndromic CHD remains an
   important area for future research, requiring multitissue analyses and
   broader data sets, we believe the insights gained from this study
   provide a foundation for future investigations into both syndromic and
   nonsyndromic CHD.

   This study specifically investigates structural CHD, as the CCVM
   Consortium study was designed to identify CNVs associated with
   structural abnormalities. Although nonstructural CHD, such as
   arrhythmias, is an important area of study, its genetic basis is
   typically linked to SNVs or small indels, which are better addressed
   through sequencing‐based techniques rather than chromosomal microarray
   analysis. Moreover, many nonstructural CHD cases are secondary to
   maternal conditions or metabolic disorders and are less likely to
   involve structural genomic changes. Future studies employing sequencing
   approaches may provide insights into the genetic underpinnings of these
   conditions.

   By applying the advanced bioinformatics approach of WGCNA, we uncovered
   insights into the complexities of CNV‐associated lncRNAs within a CHD
   context. These findings emphasize the potential pathogenic roles of
   noncoding regions, which are often overlooked by traditional genetic
   studies. Our work highlights the significant, yet underexplored,
   contribution of lncRNAs to the genomics of CHD, opening new pathways
   for diagnosing and understanding the genetic basis of CHD.

   In addition to these discoveries, we have made our fully documented R
   code publicly accessible, enabling other researchers to replicate and
   build upon our findings. This transparency facilitates further
   investigation into the uncharted territory of lncRNAs, their critical
   roles in heart development, and their probable link to CHD. As we
   continue to advance our understanding of noncoding RNAs and their role
   in CHD, we anticipate that this research will ultimately lead to better
   diagnostic tools and improved outcomes for individuals affected by CHD.
   As genome sequencing is incorporated into the standard of care for
   patients with critical CHDs, we encourage clinical labs to consider
   reporting CNVs and single nucleotide variants affecting lncRNAs.

APPENDIX

   Additional Cytogenomics of Cardiovascular Malformations (CCVM)
   Consortium Contributors: Lindsey R. Helvaty, Department of Pediatrics
   and Department of Medical and Molecular Genetics, Indiana University
   School of Medicine, Indianapolis, Indiana, United States of America
   [222]https://orcid.org/0000‐0002‐4234‐419X.

   Gabrielle C. Geddes, Department of Pediatrics and Department of Medical
   and Molecular Genetics, Indiana University School of Medicine,
   Indianapolis, Indiana, United States of America
   [223]https://orcid.org/0000‐0003‐2077‐4997.

   Jennelle C. Hodge, Department of Pediatrics and Department of Medical
   and Molecular Genetics, Indiana University School of Medicine,
   Indianapolis, Indiana, United States of America
   [224]https://orcid.org/0000‐0002‐1770‐4042.

   Vidu Garg, Center for Cardiovascular Research, Nationwide Children's
   Hospital, Columbus, Ohio, United States of America and Department of
   Pediatrics, The Ohio State University College of Medicine, Columbus,
   Ohio, United States of America
   [225]https://orcid.org/0000‐0002‐3778‐5927.

   Cecilia W. Lo, Department of Developmental Biology, University of
   Pittsburgh, Pittsburgh, Pennsylvania, United States of America.
   [226]https://orcid.org/0000‐0003‐4314‐3434.

   Svetlana A. Yatsenko, Department of Pathology, Stanford University,
   Stanford, United States of America
   [227]https://orcid.org/0000‐0003‐4809‐8601.

   Jiuann‐Huey Ivy Lin, Department of Critical Care Medicine, University
   of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
   [228]https://orcid.org/0000‐0003‐3029‐6079.

   Stephanie Burns Wechsler, Department of Pediatrics, Emory University
   School of Medicine, Atlanta, Georgia, United States of America
   [229]https://orcid.org/0000‐0002‐9321‐0220.

   Seema R. Lalani, Department of Molecular and Human Genetics, Baylor
   College of Medicine, Houston, Texas, United States of America
   [230]https://orcid.org/0000‐0003‐0707‐657X.

Disclosures

   None.

Source of Funding

   Research reported in this publication was supported by the American
   Heart Association Transformational Award AHA 19TPA34850054 (Stephanie
   M. Ware) and the National Heart, Lung, and Blood Institute (NHLBI)
   under Award Number F31HL168950 (Jacqueline S. Penaloza). The content is
   solely the authors' responsibility and does not necessarily represent
   the official views of these funding agencies.

Supporting information

   Supplemental Methods

   Tables S1–S2

   Figures S1–S16

   Data Sets S1–S8
   [231]JAH3-14-e039177-s001.zip^ (33.4MB, zip)

Acknowledgments