Abstract Background Copy number variants (CNVs) contribute to 3% to 10% of isolated congenital heart disease (CHD) cases, yet their pathogenic roles remain unclear. Diagnostic efforts have focused on protein‐coding genes, largely overlooking long noncoding RNAs (lncRNAs), which play key roles in development and disease. Methods AND RESULTS We systematically analyzed lncRNAs overlapping clinically validated CNVs in 743 patients with CHD from the Cytogenomics of Cardiovascular Malformations Consortium. We identified heart‐expressed lncRNAs, constructed a gene regulatory network using weighted gene coexpression network analysis, and identified gene modules associated with heart development. Functional enrichment and network analyses were used to identify lncRNAs that may be involved in heart development and potentially contribute to CHD. The code is stably archived at [36]https://doi.org/10.5281/zenodo.13799779. We identified 18 lncRNA candidate genes within modules significantly correlated with heart tissue, highlighting their potential involvement in CHD pathogenesis. Notably, lncRNAs such as lnc‐STK32C‐3, lnc‐TBX20‐1, and CRMA demonstrated strong associations with known CHD genes. Strikingly, although only 7.6% of known CHD genes were affected by a CNV, 68.8% of the CNVs contained a lncRNA expressed in the heart. Conclusions Using weighted gene coexpression network analysis, we identified CNV‐associated lncRNAs with potential relevance to CHD, underscoring the complexities of noncoding regions in disease pathogenesis. These findings suggest that lncRNAs may play a greater role in CHD than previously recognized, highlighting the need for broader genomic analyses that extend beyond protein‐coding genes. This study provides a foundation for further exploration of lncRNAs in CHD, with potential implications for improved genetic characterization and diagnosis. Keywords: congenital heart disease, copy number variants, genomic regulation in cardiovascular development, long noncoding RNAs, weighted gene coexpression network analysis Subject Categories: Computational Biology, Etiology, Genetics, Basic Science Research __________________________________________________________________ Nonstandard Abbreviations and Acronyms CCVM Cytogenomics of Cardiovascular Malformations CHDgene Congenital Heart Disease Gene Database CNV‐lncRNA candidate lncRNA within clinically validated copy number variants GO Gene Ontology LncExpDB Long Non‐Coding RNA Expression Database PCW postconception weeks ROH runs of homozygosity TOM topological overlap matrix TPM transcripts per million WGCNA weighted gene coexpression network analysis Research Perspective. What Is New? * Our study underscores the critical yet often overlooked role of heart‐expressed long noncoding RNAs in congenital heart disease, identifying 18 candidate long noncoding RNAs linked to key developmental pathways and shedding light on the previously underappreciated contributions of noncoding regions to congenital heart disease pathogenesis. What Question Should Be Addressed Next? * Future studies should investigate the functional roles of these candidate long noncoding RNAs in heart development, including experimental validation of their regulatory interactions with known congenital heart disease genes and their potential as diagnostic biomarkers or therapeutic targets. Congenital heart disease (CHD) is one of the most common congenital malformations, with a birth prevalence of 1% worldwide.[37] ^1 In 60% to 75% of CHD cases, the genetic cause of the disease is unknown.[38] ^2 Copy number variants (CNVs), which are subchromosomal variations in the number of copies of DNA segments ranging in size from 1000 base pairs to several megabases (Mb), account for 3% to 10% of the pathogenic causes of isolated CHD.[39] ^2 Some recurrent CNVs are associated with a specific type of CHD, including 22q11.2 Deletion Syndrome (also known as DiGeorge syndrome, a ~3 Mb microdeletion affecting TBX1, a T‐box transcription factor required for normal development of the second heart field),[40] ^3 Williams–Beuren syndrome (deletion of 7q11.23 that leads to the haploinsufficiency of nearly 25 genes, including the ELN gene that codes for elastin, important for blood vessel elasticity and associated with aortic and pulmonary stenosis)[41] ^4 and 1p36 deletion syndrome (the most common terminal deletion syndrome with cardiac findings that include congenital heart defects and cardiomyopathy).[42] ^5 In addition to these clinical syndromes, CNVs are linked with a considerably greater risk of nonsyndromic CHD.[43] ^6 Nonetheless, many distinct CNVs are observed in patients who have the same CHD diagnosis, and often, their contribution to CHD pathogenesis remains unclear. The precise mechanisms by which CNVs lead to nonsyndromic CHD are still largely unknown, necessitating further investigation into their functional impact and the underlying biological pathways involved. When considering the molecular genetics of CNVs in CHD, disease pathogenesis can be attributed to changes in the number of gene copies within the CNV. Although the established diagnostic approach in CHD primarily emphasizes well‐known protein‐coding genes, as evidenced by resources such as the CHDgene website (a curated list of 142 high‐confidence protein‐coding CHD genes),[44] ^7 it is becoming increasingly apparent that variation in noncoding regions, which constitute about 99% of the genome, may contribute to the pathogenicity of CNVs.[45] ^8 Approximately 60% to 80% of the human genome can be transcribed into noncoding RNAs, including microRNAs (miRNAs), long noncoding RNAs (lncRNAs), small nucleolar RNAs, and several other classes of RNA molecules that provide various regulatory and functional roles in the genome. lncRNAs are RNA molecules longer than 200 nucleotides that usually do not have a coding sequence.[46] ^9 , [47]^10 , [48]^11 They are involved in diverse regulatory roles in gene expression, such as modifying chromatin, regulating transcription and posttranscriptional processes, and regulating RNA processing.[49] ^10 , [50]^11 lncRNAs originate from various genomic regions, including intergenic, intronic, and overlapping regions with other genes.[51] ^9 , [52]^12 At the DNA level, lncRNAs influence chromatin architecture by recruiting chromatin‐activating complexes or looping factors, thereby regulating target genes.[53] ^10 They also play significant roles in alternative splicing by interacting with splicing factors to control transcription rates and ensure proper gene expression.[54] ^12 Posttranscriptionally, lncRNAs interact with RNA‐binding proteins through sequence or structural motifs to regulate mRNA splicing. Furthermore, lncRNAs can act as miRNA sponges, binding to miRNAs and preventing their interaction with mRNAs, which is crucial for gene expression.[55] ^10 Lastly, lncRNAs regulate protein stability and degradation by altering their structure to interact with proteins involved in key signaling pathways. Together, these roles make lncRNAs vital for differentiation and development, affecting gene regulation at the DNA, RNA, and protein levels. lncRNAs have been implicated in various biological processes and diseases. Their dysregulation can lead to a variety of conditions, including cardiovascular diseases,[56] ^13 , [57]^14 , [58]^15 neurodegenerative disorders,[59] ^16 , [60]^17 , [61]^18 , [62]^19 and other complex diseases.[63] ^20 , [64]^21 , [65]^22 , [66]^23 Although the role of lncRNAs in CHD is emerging,[67] ^13 , [68]^14 , [69]^15 their function as key regulators of gene expression during heart development suggests that they could be critical players in the pathogenesis of CHD. The lncRNA lnc‐STK32C/RP11‐432J24.5, associated with responses to stress, stimuli, and the immune system, is differentially expressed in ischemic hearts.[70] ^14 CRMA (also known as CARMA for Cardiomyocyte Maturation‐Associated lncRNA), previously known as MIR1‐1HG‐AS1, is an antisense lncRNA that regulates NOTCH1 signaling by targeting the miRNAs MIR1‐1, MIR1‐133a2, and other noncoding RNAs.[71] ^24 Notably, in vitro knockdown of CRMA promotes cardiac commitment, suggesting that its downregulation is crucial for directing progenitor cells toward a cardiac fate. The intergenic lncRNA BANCR (BRAF‐Activated Nonprotein Coding RNAs; HSALNG0071734)[72] ^25 has been reported to be upregulated in atherosclerotic plaques[73] ^26 and may influence the development of cardiomyopathy.[74] ^13 This lncRNA encodes a short open reading frame that produces a microprotein that regulates various cellular functions, including contributing to heart muscle structural integrity and function. BANCR has been shown to facilitate cardiomyocyte migration and ventricular enlargement during heart formation in primates, with its expression confirmed to be higher during this critical period of heart development.[75] ^27 , [76]^28 ClinVar, a database of genetic variants, contains several reports of pathogenic CNVs affecting the 9q21.11–21.13 region, which includes the gene BANCR.[77] ^29 , [78]^30 However, these CNVs affect several genes in the region, and BANCR has not yet been directly associated with CHD. Several CNVs affecting lncRNAs have been described in patients with CHD. In DiGeorge syndrome, the 22q11.2 deletion spans the antisense lncRNA lnc‐TSSK2‐8 (HSALNG0134007), which activates canonical Wnt signaling by protecting β‐catenin from degradation.[79] ^31 Previously, only TBX1 was thought to be the disease‐causing gene in 22q11.2, but evidence suggests that the loss of lnc‐TSSK2‐8 contributes to the development of CHD.[80] ^32 , [81]^33 Another intergenic lncRNA associated with CHD is lnc‐NIPA1‐4 (HSALNG0104472), located in the pathogenic CNV 15q11.2, where in vitro studies have shown that knockout of this gene significantly affects cardiomyocyte differentiation.[82] ^34 Despite their significance, the role of most lncRNAs in human health and disease remains largely unknown. This study aims to identify novel lncRNAs that may play a role in the pathogenesis of CHD by examining a large cohort of patients with CHD with candidate lncRNAs within clinically validated CNVs (CNV‐lncRNAs). This study focuses on nonsyndromic CHD to reduce confounding factors associated with syndromic cases, which often present with additional systemic abnormalities. By narrowing the scope to nonsyndromic CHD, we aim to identify cardiac‐specific candidate genes and pathways, providing a clearer foundation for future research into the molecular basis of both syndromic and nonsyndromic CHD. By leveraging transcriptomic data from the developing heart and six other organs, we used a data mining method known as weighted gene coexpression network analysis (WGCNA) to assess the expression patterns, interaction networks, and potential mechanisms by which CNV‐lncRNAs may influence CHD. We identified several CNV‐lncRNAs with strong associations with key developmental pathways implicated in CHD. Furthermore, to promote transparency and reproducibility, we have made the software code and methodologies used in this analysis publicly available ([83]https://github.com/nch‐igm/lncCHDNet/). By providing our codebase as a reusable resource, we aim to contribute to a more comprehensive understanding of lncRNA function in CHD and pave the way for future discoveries in the field. METHODS Description and Structure of the CNV Data Set of Patients With CHD Figure [84]1 gives an overview of the process we followed to identify lncRNAs affected by CNVs that may play a role in the cause of CHD. To elucidate the potential role of copy number changes in lncRNA genes in abnormal heart development, we leveraged the recently published CHD cohort from the Cytogenomics of Cardiovascular Malformations (CCVM) Consortium (see Table [85]S1 contributing organizations that provided CNV data used in this study).[86] ^35 This data set contains 1363 patients with CHD with both an abnormal echocardiogram and an abnormal finding on chromosomal microarray analysis (CMA). CMA is a genome‐wide technique that identifies genomic gains or losses (CNVs) and runs of homozygosity (ROH). It is recommended as a first‐tier test for patients with neurodevelopmental disorders and congenital anomalies[87] ^36 and has been integrated into routine practice at many pediatric cardiac centers for infants with severe CHD.[88] ^37 Using a centralized echocardiography review, the CCVM registry meticulously categorized each patient's CHD using a hierarchical classification to generalize the cardiac phenotypes.[89] ^35 Specific structural heart defects identified through clinical evaluations included, but were not limited to, septal defects (SDs), conotruncal heart defects (CTDs), and left and right‐sided obstructive lesions. The cohort included about an equal proportion of male and female patients. Race and ethnicity were self‐reported, with 60% of patients reported as White, 14% as Black, 8% of patients identifying as Hispanic or Latino, 3% as Asian, and 0.4% as Native American. These proportions align with the regional demographics of the participating centers. For more detailed information regarding the cohort demographics, please refer to the CCVM study.[90] ^35 All patients had clinically reported pathogenic CNVs as assessed by the American College of Medical Genetics and Genomics guidelines.[91] ^38 Both syndromic and isolated cases of CHD are present in this cohort to allow for comprehensive phenotypic and genotypic correlations. The CNV data set applied in this study is available through the CCVM Consortium. Figure 1. Workflow for identifying lncRNA candidates in CHD. Figure 1 [92]Open in a new tab This figure provides an overview of the multistep methodological approach used in our analysis to investigate the roles of lncRNAs in CHD. In the first step, we integrate genomic information about lncRNAs from our cohort with expression data. In the second step, weighted gene coexpression network analysis is applied to identify modules that exhibit correlated expression across various tissues. The third step involves identifying hub genes, focusing on lncRNA candidates in the heart‐specific modules. Finally, the network is visualized to highlight the interrelationships among CHD‐related genes, lncRNAs, and protein‐coding genes, showcasing the comprehensive methods applied in our analysis. CNV indicates copy number variant; CNV‐lncRNA, candidate lncRNA within a clinically validated CNV; and lncRNA, long noncoding RNA. Microarray Technology The CMA data were obtained from clinical‐grade, commercially available platforms, with each center following the respective manufacturer protocols. Platforms included oligo‐based and oligo+SNP arrays from Agilent, Roche, PerkinElmer, and Affymetrix, among others. Data processing followed clinical standards, ensuring equivalent resolution across platforms. Fluorescence in situ hybridization was occasionally used for confirmatory testing, as described in the initial CCVM study.[93] ^39 Ethics Approval and Consent to Participate The CCVM Consortium's collection of CMA and clinical data was approved by each clinical center's institutional review board and used a waiver of informed consent. All data were anonymized before access and we followed general ethical guidelines for data use, including respecting privacy, data integrity, and intellectual property rights. Preprocessing of the CHD Cohort The CCVM data set underwent rigorous data wrangling and filtering to focus the analysis on isolated cases of CHD with CNVs. This filtering considered the CHD classification frequency, CNV length, location, and type (deletion, duplication, or ROH). The original data set includes CMA data for 1 363 patients with CHD. Of these, 386 patients were classified as having syndromic CHD, and 977 were categorized as nonsyndromic (Figure [94]S1). For this study, “syndromic” refers to patients with genomic disorders with well‐established associations with congenital heart defects, based on consensus opinions within the CCVM Consortium. The list of disorders considered syndromic was curated by the consortium and may not fully encompass all syndromic presentations. Across the patients who were nonsyndromic, a total of 2 416 CNVs were recorded, including heterozygous (n=901) and homozygous (n=16) deletions, duplications (n=770), and runs of homozygosity (ROH, n=729) (Figure [95]S2). Although most patients had a single CNV (n=963), 300 had 2, and 100 had 3 or more CNVs listed on their clinical test report. The first stage of preprocessing the CHD cohort was to remove all CNV entries capturing ROH. This type of CNV was excluded from our analysis, as discerning the function of a lncRNA contained within these regions would be considerably more complex than events that resulted in a loss or gain of a lncRNA. For the remaining 1687 CNVs (duplications and deletions), the data set was next filtered based on the size of the CNV to exclude small CNVs <5 000 base pairs and large CNVs >5 000 000 base pairs (5 Mb). The lower limit for CNV length is determined by the size detection capabilities of the microarray technology. In contrast, the upper limit of 5 Mb is a commonly used cutoff to exclude aneuploidies and other large chromosomal rearrangements.[96] ^35 A summary of CNV length distributions for the filtered data set is as follows: minimum=424 bp, median size=0.64 Mb, mean size=7.11 Mb, and maximum of 155.27 Mb (see Table [97]S2 for more details). The highly variable pseudoautosomal regions on the X chromosome, which exhibit high rates of recombination and structural variation, were also removed to ensure more consistent and interpretable results in the analysis. The final steps of preprocessing the data focused on the CHD category and defect classification. After removing ROH, small and large CNVs, and CNVs located in the pseudoautosomal regions, 1 126 patients remained with 1 404 CNVs. Of these patients (821 nonsyndromic; 305 syndromic), we removed all 305 patients who were syndromic and those with a diagnosis of cardiomyopathy (n=10) as this is primarily a disease of the heart muscle that may not directly relate to structural congenital defects. Syndromic CHD cases were excluded from this analysis to focus on nonsyndromic cases, thereby reducing potential confounding from phenotypes unrelated to cardiac defects. This filtering step aimed to ensure the identification of CNV‐lncRNAs specifically associated with cardiac phenotypes. Patients with aortopathy (n=15) and arteriopathy (n=14) were removed due to the distinct pathophysiology of these conditions. Five other categories were removed due to their small sample size: CTD and atrioventricular septal defects (n=15), septal defect with right ventricular outflow tract obstruction (VOTO; n=11), other (n=9), patent ductus arteriosus (n=2), and single ventricle, other subtype (n=2). The exclusion of these rare phenotypes enabled us to focus the analysis on the recurrent cardiac malformations that constitute the majority of phenotypes observed in this cohort with CHD. The final data set comprised 743 patients with 938 CNVs. RNA‐Seq Data Set and Processing for CNV‐lncRNA Analysis We used the human organ RNA‐seq time series of the development of seven major organs (ArrayExpress Accession Number E‐MTAB‐6814).[98] ^40 The data set covers 23 developmental stages across seven organs (n=313: brain/forebrain 55, hindbrain/cerebellum 59, heart 50, kidney 40, liver 50, ovary 18, and testis 41), starting at 4 postconception weeks (PCW) until 58 to 63 years of age. Normalized expression levels of lncRNAs calculated from this time series, expressed as transcripts per million (TPM), were obtained from the publicly accessible Long Non‐Coding RNA Expression Database (LncExpDB, available at lncRNA Expression Database).[99] ^41 For a description of how LncExpDB processed the transcriptomic data, see Data [100]S1. The RNA‐seq data include gene expression levels for 101 293 lncRNAs and 19 957 protein‐coding genes. When performing WGCNA, applying a minimum expression threshold of at least 1 TPM is recommended, ensuring that only genes expressed in at least 1 sample are included in the analysis.[101] ^34 Applying this threshold to all samples in the developmental time series, 18 317 protein‐coding genes had a minimum expression level of ≥1 TPM. For lncRNAs, we required a minimum expression level of ≥1 TPM in any of the 50 heart samples, as recommended for building the coexpression network. This filtering criterion identified 24 009 lncRNAs expressed in the heart at any developmental stage, along with corresponding expression levels for these genes across the other 6 organs. Identifying the CNV‐lncRNA Next, we wanted to identify all lncRNAs that fell within the 938 CNVs in the 743 patients with isolated CHD in the processed CCVM cohort. Given the CNV coordinates of the original data were mapped to GRCh37, a liftover was performed to map regions to reference genome build GRCh38. The lncExpDB RNA‐seq data were generated using a custom LncBook gene transfer format file (LncBookv1.9_GENCODEv33_GRCh38.gtf). As such, this same file was used to identify CNV‐lncRNAs. Although protein‐coding genes are present in the gene transfer format file, only lncRNAs were mapped to the CNV coordinates. Coordinates of CNVs were intersected with the LncBook lncRNAs using the R package GenomicsRanges (v1.56.1). A total of 15 261 unique lncRNAs fell within clinically reported CNVs in the CCVM cohort. The median number of lncRNAs per CNV was 10, with an interquartile range of 24 and a maximum of 356 lncRNAs in a single CNV (Figure [102]S3 shows the distribution of lncRNAs within the CNVs). Of the original 743 patients, 720 had a CNV overlapping 1 or more lncRNAs. The list of 24 009 lncRNAs expressed in the heart at any developmental stage was intersected with the 15 262 unique lncRNAs falling within a CNV in the CCVM cohort. This resulted in a final list of 3 608 CNV‐lncRNAs with evidence of heart expression, along with 18 317 unique protein‐coding genes expressed in any tissue at any development stage, that together can be further analyzed in the coexpression network (Figure [103]2E). The median number of expressed lncRNAs per CNV was 2, with an interquartile range of 6, and a maximum of 110 expressed lncRNAs in a single CNV (Figure [104]S4 shows the distribution of expressed lncRNAs within the CNVs). Of the original 743 patients, 557 had a CNV overlapping 1 or more expressed lncRNAs. Figure 2. Overview of CHD categories and CNVs in the CCVM cohort. Figure 2 [105]Open in a new tab This figure details the selection of the CNV‐LncRNAs reported in the CCVM cohort of patients with isolated CHD. A, Schematic representation of the CNV filtering process that refined the data set for more targeted analysis. B, The bar plot shows CHD category distribution in patients with isolated (nonsyndromic) CHD across the CCVM study cohort. Diagnoses are ranked by patient count, with bars indicating totals. Bars are color‐coded: sky blue for patients included in the analysis and white for those excluded. C, Sankey diagram shows the distribution of isolated CHD cases in the cohort, detailing associated cardiac phenotypes and the total number of CNVs across all patients. Each node represents a category, diagnosis, or CNV type labeled with the total CNV count. The flow widths between nodes reflect the number of CNVs, highlighting the relative distribution of gains and losses across different CHD diagnoses in the cohort. D, The circos plot visualizes CNV gains and losses across the human genome (hg38). The outer ideogram track displays chromosome bands, with black lines indicating gene‐poor G‐bands and red lines marking centromeres. The blue histogram track represents CNV gains, and the red track represents CNV losses. Taller bars highlight regions with higher frequencies of CNVs, pinpointing the genomic areas most impacted by gains and losses in the study cohort. E, Venn diagram of expressed lncRNAs and CNV‐associated lncRNAs illustrating the overlap between lncRNAs expressed in any tissue (blue), heart‐expressed lncRNAs (red), and CNV‐associated lncRNAs (green). The overlap regions indicate the number of lncRNAs shared between these categories, with the 3608 heart‐expressed lncRNAs used in our analysis being a subset of the total expressed lncRNAs. APVR indicates anomalous pulmonary venous return; AVSD, atrioventricular septal defect; CCVM, Cytogenomics of Cardiovascular Malformations; CHD, congenital heart disease; CNV, copy number variant; CTD, conotruncal heart defects; HTX, heterotaxy; lncRNA, long noncoding RNA; LVOTO, left ventricular obstructive lesion; PDA, patent ductus arteriosus; RVOTO, right ventricular obstructive lesion; and SD, septal defects. Network Construction Protein‐coding genes are well studied, and their functions are better understood than those of lncRNAs. Therefore, we can leverage the expression pattern of protein‐coding genes to infer the function of lncRNAs. The final gene set consisted of 21 925 total genes, of which 18 317 are protein‐coding genes expressed in any tissue, and 3 608 unique lncRNAs expressed in the human heart and falling within clinically reported CNVs in the CCVM cohort. We next constructed gene regulatory networks using lncRNAs and protein‐coding genes identified through WGCNA.[106] ^42 WGCNA is a systems biology method that identifies clusters, or modules, of genes that show high levels of correlation in their expression patterns. These modules are then summarized using module eigengenes. A module eigengene captures the average expression profile of a gene module.[107] ^42 This allows the module to be associated with different tissues, and hub genes specifically related to the heart to be identified. Building on this framework, we implemented stepwise module detection to refine our network analysis further, focusing on CNV‐lncRNA coexpression networks to identify novel lncRNA hub genes. In contrast to the block‐wise module detection approach taken by Lu et al.,[108] ^34 our study employs stepwise module detection for annotating CNV‐lncRNAs. This approach offers greater control and precision in network construction by allowing the merging of similar modules. In contrast, the block‐wise method does not and places genes into independent blocks, even when there may be potential similarities.[109] ^42 For module detection, an appropriate soft‐threshold power must be specified. WGCNA aims to create a network that follows a scale‐free topology, characterized by a structure in which a few nodes are highly connected, and most nodes have low connectivity. We evaluated a range of powers, selecting a power value of 9 (R^2=0.76), to ensure an optimal balance between achieving a scale‐free topology and maintaining network connectivity (Figure [110]S5). First, the topological overlap matrix (TOM) similarity was calculated with an unsigned network (see Data [111]S1). The unsigned network considers positive and negative gene correlations but treats them equivalently without distinguishing their specific contribution.[112] ^42 , [113]^43 We chose an unsigned network to capture both activators and repressors of the heart developmental process. Second, network modules were detected by hierarchical clustering on the distance matrix derived from the TOM, specifically using [MATH: 1TOM :MATH] to measure gene–gene dissimilarity (Figure [114]S6), enabling genes to be grouped into coexpression modules based on their correlation patterns.[115] ^42 Each module is assigned a unique identifier in the form of a color, which provides a visual and intuitive way to distinguish between different modules. These colors are arbitrary and serve as labels, but they remain consistent throughout the analysis, making it easy to track modules across different plots and analyses. For example, genes assigned to the “magenta” module exhibit similar expression patterns, whereas those in the “green” module form a distinct coexpression group. This color‐based labeling system facilitates the interpretation of network topology and gene clustering results. Third, each module was summarized by its eigengene. Hierarchical clustering of these eigengenes, using dissimilarity measures (1 – correlation), was used to refine module granularity and merge similar modules at a 0.1 dissimilarity threshold (Figure [116]S7). Relate Modules to Organs and Developmental Stages The final steps in our analysis focused on identifying which modules correlated most closely with the developing heart and identifying hub genes. First, the module‐trait relationships were measured by calculating the Pearson correlation of the module eigengenes with the tissue type. To ensure robustness against Type I errors due to multiple comparisons across the 7 organs, we applied a stringent P value threshold of ≤0.001.[117] ^42 Modules with high correlations to particular tissues, such as the heart, were highlighted as tissue‐specific. In addition to using the Pearson correlations, we visualized these relationships through a dendrogram, traditionally used to cluster coexpressed modules based on their eigengenes. In this case, we treated tissue types as an additional external trait and extended the dendrogram to reflect the similarity of module eigengenes to these tissue traits (Figure [118]S8). To complement the dendrogram, we also used an adjacency heatmap, which provides a visual representation of the correlation strengths between modules and tissues (Figure [119]S9). We focused on identifying modules most strongly correlated with the heart (Figures [120]S10 through [121]S12), focusing on the 50 heart samples to developmental stages (Figures [122]S13 through [123]S15). An enrichment analysis was conducted on each WGCNA module to assess the representation of lncRNAs, protein‐coding genes, and CHD genes. Hypergeometric and Fisher's exact tests were employed to determine whether these gene types were significantly enriched or depleted within the modules. Fisher's exact test P values were adjusted for multiple testing, and the results were classified as “enriched,” “depleted,” or “not significant” (see Data [124]S1) Next, we performed Gene Ontology (GO) and pathway enrichment analysis on the protein‐coding genes within each module to identify overrepresented gene sets. This analysis focused on three GO categories: biological processes, molecular functions, and cellular components, adjusting P values for multiple testing (see Data [125]S1 for details). Identification of Hub Genes In WGCNA, a hub gene is a highly connected gene within a module, strongly correlated with the module eigengene, and often pivotal in the biological processes the module represents. Hub genes for modules of interest are identified by calculating the module memberships (MM) for all genes and selecting those with R^2≥0.80.[126] ^44 High MM values signify a strong and specific connection between a gene and a particular module. Low MM values indicate a weaker connection, showing that the gene's expression pattern does not align closely with the module's expression. In addition to MM, we calculated the correlation between each gene's expression and the heart trait to determine gene–trait significance (GS) values. Here, we refer to the “heart trait” as a variable representing the expression data specific to heart samples across the data set. GS represents how strongly each gene is associated with the heart trait, compared with other organs. Scatter plots were generated to visualize the relationship between MM and GS for each module, helping to identify hub genes with high MM and GS values. Gene aliases of the hub genes are identified by the LncExpDB database.[127] ^41 The classification of lncRNAs among the hub genes was obtained from LncBook (Version 2.00).[128] ^45 Additionally, LncExpDB was used to overview lncRNA tissue expression in organs not included in the developmental data set. To visualize the networks of correlated genes, the modules with significant association with the heart were exported to Cytoscape,[129] ^46 where nodes represent genes and edges represent gene coexpression derived from the TOM matrix. We applied an edge weight minimum of 0.3 to refine the network, removing isolated nodes and small subnetworks for clarity. We focused on networks containing known CHD genes within the module of interest. To identify whether a protein‐coding gene was a CHD gene, we cross‐referenced our gene list with 142 high‐confidence CHD genes from the CHDgene database. The CHDgene website ([130]https://chdgene.victorchang.edu.au) is a valuable resource for clinicians and researchers, offering a curated list of genes with variants consistently shown to cause CHD in humans. Embryonic heart Gini coefficient scores were used to quantify tissue‐specific gene expression during human heart development (see Data [131]S1 for details).[132] ^28 VanOudenhove et al. proposed that a higher Gini coefficient (eg, >0.5) indicated a gene's expression was more specific to the embryonic heart, identifying genes enriched for heart‐related functions like TBX5, IRX4, and HAND1. To integrate this approach into the current analysis, LncBook identifiers were mapped to Ensembl gene accessions to retrieve Gini coefficients for genes with maximal expression in the embryonic heart. RESULTS The CCVM cohort comprises 1 363 patients and includes both syndromic and nonsyndromic CHD (Figure [133]S1).[134] ^35 We focused our analysis on patients with nonsyndromic CHD, applying a filtering scheme that removed ROH, deletions <5 000 base pairs (Kb), aneuploidies, and CNVs falling within the pseudoautosomal region (Figure [135]2A). The filtered data set has 743 patients who are nonsyndromic with a total of 938 CNVs >5 Kb, but <5 000 000 base pairs (Mb) (Figure [136]S2). Eight different subtypes of CHD were represented in the cohort: CTD, SD, left VOTO, right VOTO, SD + left VOTO, heterotaxy, atrioventricular septal defect, and anomalous pulmonary venous return (Figure [137]2B). Identification of CNV‐Associated lncRNA Genes The CCVM CMA data were successfully lifted from GRCh37 to GRCh38 to annotate lncRNA and protein‐coding genes within each CNV using a comprehensive set of >100 000 putative lncRNA genes from LncExpDB.[138] ^41 Following the liftover process, some CNVs were split into multiple regions, resulting in a final data set of 953 CNVs. The frequency of CNVs for the 8 cardiac phenotypes is ranked from most to least in Figure [139]2C. There are 494 CNV gains (blue) and 459 losses (red). A total of 573 patients had 1 CNV, 140 had 2 CNVs, and 30 had ≥3 CNVs. The chromosomal distribution for gains and losses shows clinically reported CNVs in the CCVM patients with nonsyndromic CHD affect all chromosomes (Figure [140]2D). Overall, chromosome X had the most CNVs with 88 (60 gains and 28 losses). Chromosome Y had the fewest with 7 CNVs (5 gains and 2 losses). Only 72 of the 953 CNVs (7.6%) affected a known CHD gene. Of the 142 known CHD genes in the current CHDgene list, 33 (23%) were affected by a CNV in our data set. Notably, some of these known genes were recurrently affected, including MYH11 (13 CNVs), LZTR1 (8 CNVs), and NOTCH1 (8 CNVs). To identify CNV‐lncRNAs, we used RNA‐seq data from a developmental time series of 7 major human organs. For the WGCNA analysis, we first identified the subset of protein‐coding genes expressed in any tissue across developmental stages. To focus on lncRNAs potentially involved in heart development and thus relevant to CHD, we restricted our analysis to lncRNAs expressed in the heart. We applied a minimum expression threshold of at least 1 TPM, resulting in 18 317 protein‐coding genes and 57 627 lncRNAs that were expressed at any developmental time point across the seven different organs. Among the expressed lncRNAs, 24 009 (42%) were expressed in the heart at some developmental stage. Of these heart‐expressed lncRNAs, 3 608 (15%) were located within clinically reported CNVs from the CCVM cohort (Figure [141]2E). Out of the original 743 patients, 557 had CNVs overlapping 1 or more of these 3 608 heart‐expressed lncRNAs, which we refer to as CNV‐lncRNAs. These CNV‐lncRNAs were used as input for the WGCNA analysis. WGCNA‐Based Identification and Functional Analysis of Heart‐Specific Gene Modules For the WGCNA network construction, the TOM adjacency matrix contained a pairwise comparison for 21 925 total genes: 18 317 protein‐coding genes expressed in any tissue and 3 608 CNV‐lncRNAs derived from the intersection of lncRNAs expressed in the heart and CNVs in the CCVM cohort. The dendrogram constructed from the hierarchical clustering of the distance TOM matrix contains information on the structure of gene expression correlations (Figure [142]S6). The hierarchical clustering constructs the modules for all genes. Each gene is tagged underneath by the color according to the module. For each module identified using the TOM‐based clustering, eigengenes were calculated as the first principal component of the gene expression profiles, summarizing the overall expression pattern within each module. The hierarchical clustering initially resulted in 53 modules (Figure [143]S7A). Merging similar modules below the dissimilarity threshold of 0.1 led to 40 modules (Figure [144]S7B). All 40 modules and their associated counts of protein‐coding genes, lncRNA genes, and known CHD genes are shown in Data Set [145]S1. Statistical analysis for gene enrichment of lncRNA genes and known CHD genes was performed using the hypergeometric distribution. Six modules (gray, magenta, paleturquoise, green, lightcyan1, and lightyellow) were enriched for lncRNAs. Four modules demonstrated enrichment of known CHD genes (darkturquoise, darkolivegreen, purple, and magenta). The magenta module (ME03) was the only module to display significant enrichment for both lncRNAs (P=8.09 x 10^−08) and known CHD genes (P=0.0112). Pearson correlation values for the module eigengene‐organ association across the seven tissues for the 40 modules are depicted in the heatmap in Figure [146]3 (modules and their associated correlations and P values are available in Data Set [147]S2). The stronger the correlation, the higher the intensity of the color, with red representing positive correlations, blue representing negative correlations, and white representing no correlation. Modules with high positive or negative correlations (|r|>0.6) and P values <0.001 are considered tissue‐specific. For example, module ME40 (pink) is highly specific to the liver (r=0.96, P=5×10^−174), whereas module ME28 (royalblue) is highly specific to the kidney (r=0.83, P=3×10^−82). Figure 3. Module‐tissue association heatmap across seven organ systems. Figure 3 [148]Open in a new tab This heatmap helps identify gene modules critical for the development and function of specific tissues. It illustrates the correlation between the modules identified by weighted gene coexpression network analysis and the different tissue types across the developmental time series. Each row represents a module eigengene, color‐coded according to the original module colors prefixed by “ME,” and each column corresponds to a specific tissue. Positive correlations are depicted in red, negative correlations in blue, and white indicates no correlation. The color intensity reflects the strength of the correlation, with numerical values in each cell displaying the correlation coefficient, followed by the P value in parentheses for statistical significance. Only significant correlations (P<0.001) are shown. Modules with high positive or negative correlations (>|0.6|) are considered tissue‐specific, highlighting potential tissue‐specific gene expression patterns. Module ME03 (magenta) notably shows the highest positive correlation (R=0.95) with heart tissue, suggesting a strong link to heart development and function. Hierarchical clustering of the modules demonstrated that the magenta module clustered most closely to the heart samples (Figure [149]S8). The magenta module (ME03), comprising 567 genes, demonstrated significant heart tissue specificity, with a positive correlation value of 0.95 and a P value of 4×10^−156 (Data Set [150]S3). This module showed no significant positive correlations in the other organs. Focusing on the heart samples, we compared each module's eigengene and developmental stage, using the age of the samples in postconception (Data Set [151]S4). Modules with high positive correlations (in red) indicate increasing expression with developmental stage, whereas negative correlations (in blue) suggest decreasing expression over time (Figure [152]S13). The magenta module had a significant negative correlation (r=−0.50, P=0.0002), clustering most closely with the mid‐evelopment (11–20 PCW) stage (Figures [153]S14 and [154]S15). In the magenta module, eigengene expression increased from conception, peaking around fetal middevelopment (10–20 PCW) (Figure [155]4). Following this peak, expression levels gradually declined, reaching their lowest levels in the mature heart. CNVs affecting 1 or more of the 567 genes (425 protein‐coding and 142 lncRNA genes) in the magenta module are found across all 8 CHD phenotypes (Data Set [156]S5). In total, 154 (21%) of patients with nonsyndromic CHD had a CNV affecting the module (Data Set [157]S6). Figure 4. Magenta module eigengene expression across developmental stages in the heart. Figure 4 [158]Open in a new tab The eigengene expression for the magenta module (ME03) is shown across various developmental stages in the heart, plotted by age in weeks PCW. The red data points represent the average eigengene expression at each stage, with a locally estimated scatterplot smoothing‐smoothed line (blue) fitted to highlight the trend over time. The gray shading indicates the 95% CI, showing the range within which the trend line is expected to fall. Vertical dashed gray lines indicate key developmental milestones: 10 weeks, 20 weeks, birth (38 weeks), 1 year, and 5 years post conception. These milestones mark significant transitions in heart development, providing context for the observed changes in eigengene expression. The expression level increases during early development, peaking between 10 and 20 PCW and gradually decreasing. PCW indicates post conception. The functional associations of the magenta module were identified using GO terms to reinforce its connection to the heart. Functional enrichment of GO terms was performed using the 425 protein‐coding genes within the magenta module. The analysis focused solely on protein‐coding genes because lncRNA genes generally lack well‐defined functional annotations in existing GO databases, making them less suitable for GO enrichment analysis. After combining semantically similar terms, 159 GO terms were functionally enriched (Data Set [159]S7), with the top 10 for each category illustrated in Figure [160]S16. These significant GO terms predominantly relate to muscle structure and function, highlighting critical processes such as muscle cell development, differentiation, and assembly. The identified cellular components include key structural elements like the sarcomere, myofibril, and contractile fibers. Additionally, molecular functions such as actin‐binding, structural constituents of muscle, and electron transfer activity further emphasize the role of these genes in muscle contraction and energy transduction. Of the 159 enriched GO terms, 26 were related to the heart and the cardiovascular organ system (Figure [161]5). These terms included biological processes related to cardiac tissue muscle development (P=4.41×10^−29), heart contraction (P=1.44×10^−21), heart morphogenesis (P=2.20×0^−16), cardiac chamber development (P=2.79×10^−13), vasculogenesis (P=6.44×10^−05) and embryonic heart tube development (P=0.0037). Figure 5. Functional enrichment of heart‐related biological processes in the magenta module. Figure 5 [162]Open in a new tab This bar chart presents 26 significantly enriched GO BP terms related to the heart and cardiovascular system, derived from the protein‐coding genes in the magenta module. Semantically similar terms were collapsed, and the most significant terms were selected based on their lowest adjusted P values (Benjamini–Hochberg method), all <0.05. Each bar represents the negative base‐10 logarithm transformation of the adjusted P values (−log10(padj)), indicating the relative statistical significance. The analysis highlights the magenta module's enrichment for genes with key roles in heart development and function. BP indicates biological process; and GO, Gene Ontology. Identification and Analysis of Hub Genes: Central Regulators of Heart‐Specific Gene Networks Genes with high MM, indicating strong connectivity within the module, and high GS, indicating a strong association with the heart trait, were identified as hub genes (Figure [163]6). These genes are likely key drivers of the biological processes relevant to heart function and development. The correlation coefficients for the protein‐coding genes (turquoise, r=0.968) and lncRNA genes (purple, r=0.972) both indicate strong positive correlation with P value <0.001. The correlation cutoff for module membership of r≥0.80 led to the identification of 146 hub genes (25.7% of the total 567 genes in the module) (Data Set [164]S8). Within the hub genes, 18 were lncRNAs (12.3%, P=0.9326). Strikingly, 8 of the 9 known CHD genes in the module were within the hub, representing a highly significant enrichment (P=4.787×10^−6). The lncRNA and known CHD hub genes and their corresponding MM and GS values are shown in Table [165]1. Our analysis identified 18 lncRNAs as potential clinical targets from the CCVM cohort of clinically validated CNVs associated with nonsyndromic CHD (Table [166]2). Figure 6. Identification of hub genes in the magenta module. Figure 6 [167]Open in a new tab This scatter plot illustrates the relationship between MM and GS for genes within the magenta module (ME03). MM reflects how strongly a gene is associated with the module eigengene, and GS measures the correlation between a gene's expression and the heart trait. Genes are categorized into lncRNAs (purple circles) and protein‐coding genes (teal squares). The top hub genes, which are highly connected within the module and strongly associated with the heart trait, are emphasized by a dashed vertical line at MM=0.8. The plot also includes correlation (R) and P values for lncRNA and protein‐coding genes, calculated separately and displayed on the graph to indicate the strength and significance of the correlation between MM and GS. This visualization is crucial for identifying lncRNA hub genes within the module, which may play essential roles in the heart‐related biological processes associated with the module. GS indicates gene significance; lncRNA, long noncoding RNA; and MM, module membership. Table 1. Known CHD and lncRNA Hub Genes From the Magenta Module (ME03) Accession Gene symbols/synonyms MM.Cor MM.P value GS.Cor GS.P value Classification Know CHD protein‐coding genes ENSG00000183072 NKX2‐5 0.95 1.215E‐153 0.98 2.980E‐213 Protein‐coding ENSG00000134571 MYBPC3 0.93 9.593E‐138 0.93 1.460E‐140 Protein‐coding ENSG00000089225 TBX5 0.92 1.236E‐131 0.94 2.262E‐149 Protein‐coding ENSG00000092054 MYH7 0.92 7.219E‐130 0.95 5.668E‐156 Protein‐coding ENSG00000159251 ACTC1 0.91 3.250E‐119 0.90 1.605E‐113 Protein‐coding ENSG00000197616 MYH6 0.90 2.217E‐115 0.95 1.557E‐153 Protein‐coding ENSG00000113196 HAND1 0.85 2.988E‐87 0.85 1.218E‐87 Protein‐coding ENSG00000164532 TBX20 0.85 1.172E‐87 0.86 2.123E‐94 Protein‐coding lncRNA Genes HSALNG0131540 MIR1‐1HG MIR133A2HG Lnc‐SLCO4A1–8 0.93 2.219E‐140 0.95 9.435E‐165 Intergenic HSALNG0057201 Lnc‐TBX20‐1 ENSG00000226063 0.89 1.122E‐110 0.90 4.646E‐117 Intergenic HSALNG0055689 HSALNG0055689 0.88 8.389E‐104 0.89 1.328E‐109 Antisense HSALNG0057200 ENSG00000289335 0.88 9.246E‐101 0.87 4.126E‐99 Intergenic HSALNG0081800 Lnc‐STK32C‐3 ENSG00000226900 0.88 8.062E‐101 0.83 4.435E‐81 Intergenic HSALNG0047724 Lnc‐ECI2‐2; LOC101927888 ENSG00000231811 0.87 2.538E‐98 0.90 2.27E‐112 Intergenic HSALNG0092764 Lnc‐CCDC59‐5 ENSG00000288941 0.87 3.134E‐96 0.86 7.883E‐95 Intergenic HSALNG0057214 HSALNG0057214 0.85 9.024E‐87 0.82 9.348E‐77 Intergenic HSALNG0088492 lnc‐B4GALNT3–4 0.85 3.362E‐90 0.82 2.549E‐78 Antisense, Sense HSALNG0088499 LINC02455 0.85 7.368E‐88 0.82 6.253E‐77 Intergenic HSALNG0096048 LINC02455 ENSG00000256672 0.85 1.990E‐89 0.83 1.008E‐80 Antisense HSALNG0062838 HSALNG0062838 0.83 8.777E‐82 0.81 3.391E‐73 Sense HSALNG0119665 ENSG00000272625 antisense to LPIN2 0.82 8.212E‐76 0.72 2.238E‐51 Antisense, Sense HSALNG0062865 lnc‐RP11‐63E5.6.1–3 0.81 6.605E‐74 0.71 6.227E‐49 Antisense HSALNG0114768 Lnc‐PMP22‐2 ENSG00000237377 0.81 2.341E‐74 0.80 3.837E‐70 Intergenic HSALNG0054330 HSALNG0054330 0.80 6.152E‐70 0.76 3.558E‐59 Antisense HSALNG0069545 lnc‐DOCK8‐2 0.80 2.181E‐72 0.72 6.540E‐51 Sense HSALNG0131539 CRMA ENSG00000174403 C20orf166‐AS1 MIR1‐1HG‐AS1 0.80 1.225E‐70 0.74 4.564E‐56 Intergenic [168]Open in a new tab For each gene, the table provides the module membership correlation and its associated P value, which indicates the strength of each gene's association with the magenta module. Additionally, the table shows the gene significance correlation and its P value, reflecting the correlation between each gene and heart‐related traits. This summary highlights hub genes highly correlated with the module, including both protein‐coding CHD genes and lncRNAs. CHD indicates congenital heart disease; GS.Cor, gene significance correlation; lncRNA, long noncoding RNA; and MM.Cor, module membership correlation. Table 2. lncRNA Hub Genes and Associated CNVs and Cardiac Phenotype LncBook identifier Overlapping CNV Cardiac phenotype CCVM patient(s) HSALNG0047724 6p25.1 loss CTD 003–0293 HSALNG0054330 10q23.1 gain 6q25.1 gain CTD 000–0369 HSALNG0055689 7p22.3p22.2 loss CTD 002–0019 HSALNG0057200 HSALNG0057201 HSALNG0057214 7p14.2 loss[169]^* CTD 003–0191 HSALNG0062838 7q36.3 loss LVOTO 001–0182 HSALNG0062865 8p23.3 loss 8p23.3 gain CTD LVOTO 004–0065 003–0116 HSALNG0069545 9p24.3 gain (2) 9p24.3 loss CTD RVOTO LVOTO 008–0053 001–0078 003–0375 HSALNG0081800 10q26.3 loss (2) RVOTO (2) 002–0008 009–0009 HSALNG0088492 HSALNG0088499 12p13.33 gain 2p16.3 loss SD + LVOTO 007–0016 HSALNG0092764 12q21.31 loss 16p13.11 gain SD 003–0440 HSALNG0096048 13q12.3 gain CTD 007–0002 HSALNG0114768 2p21 gain 5p15.2 loss 17p12 gain Atrioventricular septal defect 002–0007 HSALNG0119665 18p11.32p11.31 loss 18p11.32p11.31 gain Anomalous pulmonary venous return SD 005–0036 000–0224 HSALNG0131539 HSALNG0131540 20q13.33 gain SD + LVOTO 001–0285 [170]Open in a new tab Each lncRNA from the magenta module (ME03) is listed alongside the corresponding chromosomal region of the overlapping CNV (eg, gain or loss) and the associated cardiac phenotype. The table also includes patient identifiers and CHD classification from the CCVM cohort who exhibit the relevant CNV, directly linking the genetic variations and observed heart conditions. CCVM indicates Cytogenomics of Cardiovascular Malformations; CHD, congenital heart disease; CNV, copy number variant; CTD, conotruncal heart defects; lncRNA, long noncoding RNA; LVOTO, left ventricular obstructive lesion; RVOTO, right ventricular obstructive lesion; and SD, septal defects. ^* This CNV also contains the known CHD gene TBX20. The relationships within the magenta module were further explored through coexpression network visualization (Figure [171]7). To enhance the clarity of the network, we applied a stringent filter to the edges, retaining only those with a TOM value >0.30. This filtering resulted in a network comprising 67 nodes with high MM (MM.Cor ≥0.80) and strong coexpression interaction. Known CHD genes in the module were included in the network regardless of their edge threshold interaction value. As a result, the network included 8 CHD genes, 7 lncRNA genes, and 52 protein‐coding genes. Notably, all genes in this network displayed a coexpression interaction with NKX2‐5, a critical regulator of heart development. Figure 7. Gene network visualization of hub genes in the magenta module. Figure 7 [172]Open in a new tab This figure visualizes the coexpression network of hub genes within the magenta module (ME03), generated using Cytoscape. Nodes represent hub genes, color‐coded by gene type: gold for known CHD genes, purple for lncRNAs, and light green for protein‐coding genes. Edges indicate coexpression interactions between genes based on the TOM matrix, with a threshold of TOM>0.30. Hub genes are those with high module membership (R≥0.80). Coexpression edges connecting to lncRNA genes are highlighted in red. The network is divided into 2 panels for clarity: A, Seven of the 18 CNV‐lncRNA hub genes are coexpressed with NKX2‐5, a critical transcription factor for early heart development. B, A highly interconnected network of known CHD genes and other protein‐coding genes, all coexpressed with NKX2‐5. For clarity, redundant edges were removed and indicated with an arrow and curly brace. CHD indicates congenital heart disease; CNV‐lncRNA, candidate lncRNA within a clinically validated CNV; lncRNA, long noncoding RNA; and TOM, topological overlap matrix. The Gini index of >36 000 genes from 31 tissues, developed using independent human heart RNA‐seq and GTEx data sets, was mapped to the hub genes.[173] ^28 The Gini coefficient with respect to the distribution of expression levels across samples identifies genes likely important in heart development by highlighting those with expression patterns specific to the heart during embryogenesis. VanOudenhove et al. proposed that candidate CHD genes have a Gini coefficient >0.5 and, of the tissues studied, the gene must be maximally expressed in the embryonic heart. Within the hub genes identified in the magenta module, all 8 known CHD genes and 7 of the 18 lncRNAs genes had Gini scores from their study (Table [174]3). Four of the 7 lncRNA candidates met the CHD gene candidate criteria: lnc‐TBX20‐1 (Gini=0.959), lnc‐STK32C‐3 (Gini=0.867), CRMA (Gini=0.662), and lnc‐DOCK8‐2 (Gini=0.631). As was observed in the known CHD genes MYH6 and MYH7, the other candidate lncRNA genes had high expression in adult heart and muscle, or had a Gini score <0.5, thus not meeting VanOudenhove et al,'s suggested criteria for a CHD candidate gene. Table 3. Gini Coefficient Analysis of Hub Genes in the Magenta Module Accession Symbol Type Gini coefficient Tissue with max expression Max expression ENSG00000226063 Lnc‐TBX20‐1 (HSALNG0057201) lncRNA 0.959 Embryonic heart 167 ENSG00000164532 TBX20 CHD 0.948 Embryonic heart 5 117 ENSG00000197616 MYH6 CHD 0.936 Heart 323 262 ENSG00000183072 NKX2‐5 CHD 0.935 Embryonic heart 9 467 ENSG00000134571 MYBPC3 CHD 0.934 Heart 91 682 ENSG00000231811 Lnc‐ECI2‐2 (HSALNG0047724) lncRNA 0.93 Heart 85 ENSG00000174407 MIR1‐1HG (HSALNG0131540) lncRNA 0.918 Muscle 5 447 ENSG00000159251 ACTC1 CHD 0.915 Embryonic heart 290 948 ENSG00000092054 MYH7 CHD 0.905 Muscle 457 406 ENSG00000113196 HAND1 CHD 0.889 Embryonic heart 2 246 ENSG00000089225 TBX5 CHD 0.873 Embryonic heart 5 400 ENSG00000226900 Lnc‐STK32C‐3 (HSALNG0081800) lncRNA 0.867 Embryonic heart 567 ENSG00000174403 CRMA/MIR1‐1HG‐AS1 (HSALNG0131539) lncRNA 0.662 Embryonic heart 2167 ENSG00000226403 lnc‐DOCK8‐2 (HSALNG0069545) lncRNA 0.631 Embryonic heart 398 ENSG00000272625 antisense to LPIN2 (HSALNG0119665) lncRNA 0.251 Embryonic heart 81 [175]Open in a new tab This table presents the Gini coefficients for hub genes identified in the magenta module (ME03). The table includes hub genes that are either known CHD genes or lncRNAs. The Gini coefficient, a measure of tissue‐specific gene expression, is calculated for each gene, with higher values indicating tissue‐specific expression. Genes with a Gini coefficient>0.5 and maximal expression in the embryonic heart are considered to be candidate CHD genes. Out of 8 known CHD hub genes, 5 were found to have significant embryonic heart‐specific expression (Gini coefficient >0.5). Among the 18 lncRNA hub genes, 7 of the 9 with Ensembl accessions have a Gini coefficient, and 5 are maximally expressed in the embryonic heart. Of these, 4 lncRNAs exhibit significant heart‐specific expression with a Gini coefficient >0.5. CHD indicates congenital heart disease; and lncRNA, long noncoding RNA. DISCUSSION We applied WGCNA to cluster CNV‐lncRNAs using a similar approach to that described for the heart and kidney.[176] ^34 , [177]^47 CNV‐associated lncRNAs were identified in clinically reported CNVs from a cohort of patients with isolated CHD. The use of standardized, clinical‐grade CMA platforms across multiple centers ensures the reliability of CNV detection and enhances the reproducibility of our results. Using human organ transcriptomic time‐series data for the development of 7 major organs, we clustered the gene expression of both protein‐coding and CNV‐lncRNA genes into modules. These modules were then correlated with specific organs, allowing us to identify a module of coexpressed genes highly specific to the developing heart. We identified 18 CHD‐candidate lncRNA genes within this heart‐specific module (the magenta module, ME03). Seven of these 18 lncRNAs had significant coexpression relationships with known CHD genes, and 4 were confirmed as candidate CHD genes in an orthogonal data set examining gene expression patterns specific to the heart during embryogenesis. Our study offers a well‐documented and reproducible programming pipeline that applies a widely used method in a novel way to identify candidate lncRNAs within a disease‐specific context and predict their functions. This approach empowers researchers to systematically study CNV‐lncRNAs, providing a comprehensive toolset for future investigations. Moreover, our work expands the analysis of CNV‐lncRNAs by using a larger and more recent data set of clinically confirmed CNVs from patients with CHD in the CCVM data set, further enhancing the scope and applicability of the research. Clinical genetics in CHD primarily emphasizes well‐known protein‐coding genes, as evidenced by resources such as the high‐confidence CHDgene list.[178] ^7 However, our analysis of the CCVM cohort revealed that only 72 of the 953 clinically reported CNVs (7.6%) affected 1 of these known CHD genes, suggesting that many potential pathogenic factors remain unidentified. By contrast, 656 of the 953 clinically reported CNVs (68.8%) contain 1 or more lncRNAs expressed in the heart, that is, CNV‐lncRNAs, underscoring the need to consider the noncoding genome as a crucial factor in the genetic cause of CHD. Through WGCNA, we identified a highly heart‐correlated module (the magenta module, ME03, Figure [179]3) consisting of 567 genes with significant heart tissue specificity, clustering most closely with the middevelopment stage (11–20 PCW). No significant positive correlations for this module were observed in other organs. CNVs affecting these genes were present across all 8 CHD phenotypes, with 21% of patients with nonsyndromic CHD carrying a CNV affecting the module. In this module, eigengene expression increased steadily from conception, peaking during fetal middevelopment (10–20 PCW) (Figure [180]4). After this peak, expression levels gradually declined, reaching their lowest point in the mature heart. This expression pattern suggests that the genes within the magenta module are most active during early to mid‐heart development, a critical period for structural refinement, including chamber septation, conduction system development, and the initiation of blood circulation. Functional enrichment analysis of the protein‐coding genes in this module identified multiple enriched GO terms directly linked to heart and cardiovascular development, highlighting processes such as cardiac muscle development, heart contraction, and morphogenesis (Figure [181]5). Additionally, the analysis revealed significant enrichment of genes involved in mitochondrial function, energy production, and metabolic processes, underscoring the critical role of cellular energy dynamics in heart development (Data set [182]S7 and Figure [183]S16). Given the essential role of mitochondria in providing the energy required for cardiac muscle contraction and overall cardiovascular function, this enrichment of mitochondrial and metabolic genes suggests that dysregulation in energy metabolism may contribute to CHD pathogenesis. Overall, the developmental gene expression pattern and the identified GO terms suggest that genes in this module likely play pivotal roles in heart development and may serve as candidate genes for CHD. Within the module of genes highly correlated with the developing heart, we identified 18 highly connected lncRNA hub genes strongly correlated with the module eigengene (Figure [184]6). These genes are likely key drivers of the biological processes relevant to heart function and development. However, the lack of Ensembl accession numbers for many of our candidate lncRNAs limited our search for external information. Notably, only 9 of these lncRNAs possess recognized Ensembl identifiers; the other 9 were putative lncRNAs identified by LncBook. This highlights significant constraints in current gene annotation methods, which are often too stringent for noncoding RNA. For instance, Ensembl's criteria exclude intergenic lncRNAs with open reading frames that exceed 35% of their length or contain protein domains.[185] ^48 Consequently, our analysis and subsequent discussions were confined to those lncRNA candidates with Ensembl identifiers, limiting the full exploration of the potential roles of the remaining putative lncRNAs in heart development and emphasizing the need for more inclusive gene annotation frameworks for noncoding RNAs. Seven of the 18 CNV‐lncRNA hub genes were coexpressed with NKX2‐5, a well‐known transcription factor critical for early heart development: HSALNG0057201 (Lnc‐TBX20‐1), HSALNG0057200 (ENSG00000289335), HSALNG0055689, HSALNG0092764 (Lnc‐CCDC59‐5), HSALNG0096048 (ENSG00000256672), HSALNG0047724 (Lnc‐ECI2‐2), and HSALNG0131540 (MIR1‐1HG) (Figure [186]7A). NKX2‐5 is expressed in the first and second heart fields to maintain chamber‐specific identities.[187] ^49 Pathogenic variants in NKX2‐5 are associated with a variety of CHDs, including atrial[188] ^50 and ventricular SDs,[189] ^51 atrioventricular septal defects,[190] ^52 and others.[191] ^53 , [192]^54 Notably, HSALNG0131540 or MIR1‐1HG was also significantly coexpressed with well‐established CHD genes, TBX5,[193] ^55 , [194]^56 MYH6,[195] ^57 , [196]^58 and MYH7 [197]^59 , [198]^60 (Figure [199]7B). The coexpression of these lncRNAs with NKX2‐5 and other known CHD genes underscores their potential roles in heart development and cardiac function. This suggests that they may be involved in similar developmental pathways and could contribute to the pathogenesis of congenital heart defects. Four of our CNV‐lncRNA hub genes—lnc‐TBX20‐1, lnc‐STK32C‐3, CRMA, and lnc‐DOCK8‐2—were also highly ranked as candidate CHD genes in an external validation (Table [200]3). These genes met the criteria of having the highest expression in the embryonic heart across 32 different organs and tissues, along with a significant Gini index, indicating strong heart‐specificity.[201] ^28 Established CHD genes that meet the Gini index criteria include TBX20 (Gini=0.948) and NKX2‐5 (Gini=0.935). Unexpectedly, the three known CHD genes—MYBPC3, MYH6, and MYH7—were excluded from VanOudenhove's candidate gene set because their expression extends beyond the embryonic heart into the adult heart and muscle. Despite the Gini index limitations to score all known CHD genes, this index did confirm lnc‐TBX20‐1, lnc‐STK32‐C, CRMA, and lnc‐DOCK8‐2—as potential contributors to heart‐specific gene regulation and strengthens their candidacy for further investigation in CHD research. The intergenic lncRNA lnc‐TBX20‐1 (HSALNG0057201) was particularly interesting due to its high Gini index and proximity to the known CHD gene TBX20.[202] ^61 Of the known CHD genes and CNV‐lncRNAs, lnc‐TBX20‐1 had the highest Gini index (0.959) and peak expression in the embryonic heart, making it a strong CHD candidate lncRNA. The CNV associated with lnc‐TBX20‐1 is a 350 Kb loss of 7p14.2 in a patient with CTD. This region contains the known CHD gene, TBX20, and 2 other lncRNA candidates, HSALNG0057200 and HSALNG0057214. The location of lncRNAs is important as lncRNAs are known to regulate nearby genes.[203] ^11 Although it is not possible to determine whether the effects of this pathogenic CNV are due to the loss of TBX20 or the lncRNA genes, the coexpression of lnc‐TBX20‐1 and HSALNG0057200 with NKX2‐5 is particularly intriguing. NKX2‐5 and TBX20 are transcription factors that interact in heart development regulation.[204] ^62 This raises the possibility that these lncRNAs may have regulatory roles in connection with TBX20 and potentially NKX2‐5, though experimental evidence is still needed to support this hypothesis. lncRNAs function in processes ranging from chromatin modification to nuclear organization and are emerging as critical players in cardiovascular development and disease. However, their complexity often extends beyond the traditional definitions of noncoding genes, as exemplified by lnc‐STK32C‐3's (HSALNG0081800) coding potential.[205] ^13 The CNVs associated with lnc‐STK32C‐3 are 2 separate 2.9 MB deletions of 10q26.3 observed in 2 CCVM patients with right VOTO. We identified this CNV‐lncRNA as a candidate CHD gene with the highest module membership in the magenta module, indicating specificity in cardiac development. This gene was also indicated as a candidate CHD gene by the Gini index with a highly significant Gini index of 0.867 and is most highly expressed in the embryonic heart across the 32 organs and tissues transcriptionally profiled.[206] ^28 We are the first to suggest that this lncRNA is involved in the initial formation of the heart. Previous research suggests that lnc‐STK32C‐3 is abundantly expressed in ischemic hearts, and it is suggested to be involved in myocardial injury.[207] ^14 This aligns with the broader understanding that genes can turn on and off in response to developmental cues and cardiac injury, as shown by the protein‐coding gene VEGF‐B.[208] ^63 Therefore, if protein‐coding genes can have dual roles under different conditions expressed by the heart, it is plausible that lncRNAs can, too. Lastly, although it is known that lnc‐STK32C‐3 encodes microproteins and secretes them in the adult heart,[209] ^27 whether it can also perform this function during cardiac development remains to be explored. CRMA is a known regulator of cardiac differentiation and has been experimentally validated as important for heart development.[210] ^24 , [211]^64 Variants affecting CRMA are, therefore, likely pathogenic and could help explain the CHD phenotype. The CNV associated with CRMA is a 2.9 MB gain in the 20q13.3 region in a patient with SD and left VOTO. In addition to CRMA, this CNV affects the known CHD gene GATA5 [212]^65 , [213]^66 and the lncRNA MIR1‐1HG.[214] ^24 MIR1‐1HG is the precursor of 2 cardiogenic miRNAs, known as MIR1‐1 and MIR‐133a2. GATA5 has been implicated in CHD only through loss‐of‐function mutations in humans,[215] ^65 whereas in our cohort patients with 20q13.3 amplification, it is possibly impacting CHD pathogenesis through a gain‐of‐function mutation. When CRMA is expressed, it downregulates MIR1‐1 and MIR133a2, pushing embryonic stem cells into a mature cardiac fate.[216] ^24 , [217]^64 Although the knockdown of CRMA promotes cardiac differentiation through increased expression of MIR‐133a2, MIR‐133a1 targets RBPJ to inhibit the NOTCH pathway.[218] ^24 We hypothesize that in a gain‐of‐function mutation, CRMA may suppress the cardiac miRNAs expression beyond normal levels, leading to dysregulation of the NOTCH pathway causing cardiac malformations. Together, these networks of genes are required for the coordinated regulation of cardiogenic differentiation in embryonic stem cells. Three CNVs affecting 9p24.3 were associated with lnc‐DOCK8‐2: a 305 Kb duplication in a patient with right VOTO, a 354 Kb duplication in a patient with CTD, and a 100 Kb deletion in a patient with left VOTO. Although the CNVs do not affect any known CHD genes, in addition to the lncRNA, these smaller CNVs affect 2 protein‐coding genes, DOCK8 and KANK1. DOCK8 is a known Mendelian disease gene, resulting in a rare autosomal recessive immunologic disorder characterized by recurrent staphylococcal infections of the skin and respiratory tract.[219] ^67 KANK1 is an essential regulator of the actin cytoskeleton, similar to Rho GTPases, which also regulate actin dynamics through ROCK (Rho‐kinase), and is required for myogenic differentiation.[220] ^68 Rho GTPase signaling is involved in cardiac looping and early development of the heart, and dysregulation of this pathway may lead to structural heart defects.[221] ^69 Whether lnc‐DOCK8‐2 regulates either of these genes is unknown, and like the majority of CNV‐lncRNAs that we identified, further studies are required. Reproducible bioinformatics methods and well‐documented software are still significant barriers to advancing biomedical research, often hindering the ability to replicate and build upon existing studies. To address this challenge, we developed comprehensive and well‐documented R notebooks that streamline our analytical workflow and fully automate the generation of all tables and figures, enabling easy replication and modification of the entire pipeline. This approach provides a robust and adaptable framework standardizing coexpression analysis to CNV‐lncRNAs that can be applied to other genomic data sets, facilitating broader investigations into the roles of lncRNAs in tissue‐specific and disease‐specific contexts. As annotations of noncoding RNAs continue to improve and new genomic studies of patients with CHD emerge, our methods can be reused and extended to uncover additional candidate genes and pathways involved in CHD. Although our study has identified several candidate lncRNAs associated with CHD, further experimental work is needed to validate these findings and explore the mechanistic roles of these lncRNAs. Future directions include verifying the expression and functional roles of all identified lncRNAs through experimental validation. This process should involve characterizing their spatial and temporal expression patterns in cardiomyocyte cells under both normal and pathological conditions. For instance, while the downregulation of lncRNA CRMA has been studied, it remains unexplored in terms of upregulation. Evaluating the effects of CRMA upregulation in cell culture systems could provide valuable insights into its role in cardiac development. Moreover, CRMA's known binding sites for miR‐133a and miR‐1‐1, 2 critical miRNAs in cardiogenesis, warrant further investigation to assess how its expression modulates these regulatory molecules. Another compelling candidate, lnc‐TBX20‐1, is coexpressed with NKX2‐5, a transcription factor pivotal for heart development. To better understand their relationship, future studies could use coexpression experiments in human cardiomyocyte cell lines, such as those with NKX2‐5‐tagged systems, to elucidate their potential regulatory interactions and their roles in CHD pathogenesis. For other novel lncRNAs identified in this study, it is essential first to characterize their expression in cardiomyocytes to determine their roles during heart development. Once expression patterns are established, functional studies could investigate the effects of upregulation and downregulation on cardiomyocyte processes such as differentiation, proliferation, and apoptosis. By pursuing these experimental directions, future research will complement the computational findings of this study, clarifying the molecular roles of lncRNAs in CHD. These efforts, along with the continued adaptation of our pipeline to incorporate new data sets and improved lncRNA annotations, will advance our understanding of CHD pathogenesis and may lead to novel therapeutic targets. CONCLUSIONS In this study, we identified and characterized previously unrecognized lncRNAs associated with well‐established protein‐coding CHD genes, confirming prior findings while revealing new potential links, such as the involvement of lnc‐DOCK8‐2 and HSALNG0096048 in our cohort with CHD. By focusing on nonsyndromic CHD, this study aimed to reduce confounding factors introduced by systemic phenotypes common in syndromic cases. This approach allowed us to prioritize cardiac‐specific candidate genes, which are more likely to contribute directly to CHD pathogenesis. Although syndromic CHD remains an important area for future research, requiring multitissue analyses and broader data sets, we believe the insights gained from this study provide a foundation for future investigations into both syndromic and nonsyndromic CHD. This study specifically investigates structural CHD, as the CCVM Consortium study was designed to identify CNVs associated with structural abnormalities. Although nonstructural CHD, such as arrhythmias, is an important area of study, its genetic basis is typically linked to SNVs or small indels, which are better addressed through sequencing‐based techniques rather than chromosomal microarray analysis. Moreover, many nonstructural CHD cases are secondary to maternal conditions or metabolic disorders and are less likely to involve structural genomic changes. Future studies employing sequencing approaches may provide insights into the genetic underpinnings of these conditions. By applying the advanced bioinformatics approach of WGCNA, we uncovered insights into the complexities of CNV‐associated lncRNAs within a CHD context. These findings emphasize the potential pathogenic roles of noncoding regions, which are often overlooked by traditional genetic studies. Our work highlights the significant, yet underexplored, contribution of lncRNAs to the genomics of CHD, opening new pathways for diagnosing and understanding the genetic basis of CHD. In addition to these discoveries, we have made our fully documented R code publicly accessible, enabling other researchers to replicate and build upon our findings. This transparency facilitates further investigation into the uncharted territory of lncRNAs, their critical roles in heart development, and their probable link to CHD. As we continue to advance our understanding of noncoding RNAs and their role in CHD, we anticipate that this research will ultimately lead to better diagnostic tools and improved outcomes for individuals affected by CHD. As genome sequencing is incorporated into the standard of care for patients with critical CHDs, we encourage clinical labs to consider reporting CNVs and single nucleotide variants affecting lncRNAs. APPENDIX Additional Cytogenomics of Cardiovascular Malformations (CCVM) Consortium Contributors: Lindsey R. Helvaty, Department of Pediatrics and Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America [222]https://orcid.org/0000‐0002‐4234‐419X. Gabrielle C. Geddes, Department of Pediatrics and Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America [223]https://orcid.org/0000‐0003‐2077‐4997. Jennelle C. Hodge, Department of Pediatrics and Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America [224]https://orcid.org/0000‐0002‐1770‐4042. Vidu Garg, Center for Cardiovascular Research, Nationwide Children's Hospital, Columbus, Ohio, United States of America and Department of Pediatrics, The Ohio State University College of Medicine, Columbus, Ohio, United States of America [225]https://orcid.org/0000‐0002‐3778‐5927. Cecilia W. Lo, Department of Developmental Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America. [226]https://orcid.org/0000‐0003‐4314‐3434. Svetlana A. Yatsenko, Department of Pathology, Stanford University, Stanford, United States of America [227]https://orcid.org/0000‐0003‐4809‐8601. Jiuann‐Huey Ivy Lin, Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America [228]https://orcid.org/0000‐0003‐3029‐6079. Stephanie Burns Wechsler, Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, United States of America [229]https://orcid.org/0000‐0002‐9321‐0220. Seema R. Lalani, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America [230]https://orcid.org/0000‐0003‐0707‐657X. Disclosures None. Source of Funding Research reported in this publication was supported by the American Heart Association Transformational Award AHA 19TPA34850054 (Stephanie M. Ware) and the National Heart, Lung, and Blood Institute (NHLBI) under Award Number F31HL168950 (Jacqueline S. Penaloza). The content is solely the authors' responsibility and does not necessarily represent the official views of these funding agencies. Supporting information Supplemental Methods Tables S1–S2 Figures S1–S16 Data Sets S1–S8 [231]JAH3-14-e039177-s001.zip^ (33.4MB, zip) Acknowledgments