Abstract Chickens are the most abundant agricultural animals globally, with controlling abdominal fat deposition being a key objective in poultry breeding. While GWAS can identify genetic variants associated with abdominal fat deposition, the precise roles and mechanisms of these variants remain largely unclear. Here, we use male chickens from two lines divergently selected for abdominal fat deposition as experimental models. Through the integration of genomic, epigenomic, 3D genomic, and transcriptomic data, we build a comprehensive chromatin 3D regulatory network map to identify the genetic regulatory mechanisms that influence abdominal fat deposition in chickens. Notably, we find that the rs734209466 variant functions as an allele-specific enhancer, remotely enhancing the transcription of IGFBP2 and IGFBP5 by the binding transcription factor IRF4. This interaction influences the differentiation and proliferation of preadipocytes, which ultimately affects phenotype. This work presents a detailed genetic regulatory map for chicken abdominal fat deposition, offering molecular targets for selective breeding. Subject terms: Animal breeding, Agricultural genetics, Gene regulation __________________________________________________________________ Abdominal fat deposition is an important trait in chickens farmed for food. Here, the authors study this trait by integrating -omics data from several sources to construct a 3D chromatin map and validate key variants. Introduction Chicken, being the most widely consumed meat globally, is valued for its high-quality protein content and cost-effectiveness^[44]1. However, traits related to abdominal fat deposition (AFD) significantly impact meat quality and production efficiency. Research indicates that excessive fat accumulation in chickens can notably decrease feed conversion efficiency^[45]2 and carcass lean meat rate^[46]3. Moreover, breeder chickens with excessive fat can negatively affect egg production rate, fertilization rate, and hatchability rate, potentially leading to increased mortality during the laying period^[47]4. Therefore, effectively controlling AFD in chickens is a crucial issue that requires urgent attention in chicken breeding worldwide. Furthermore, chickens are valuable model organisms in life sciences research, playing a significant role in the study of human obesity and related metabolic diseases^[48]5,[49]6. They exhibit characteristics similar to early stages of human type 2 diabetes such as high blood glucose and insulin resistance^[50]7, making them an ideal animal model for obesity research. For instance, the obesity-inducing virus SMAM-1, which leads to obesity in chickens, has been found to have similar effects in humans^[51]8. Moreover, several quantitative trait loci associated with obesity in chickens contain genes linked to human susceptibility to obesity^[52]9. Therefore, studying the regulatory mechanisms of AFD traits in chickens also offers valuable insights into understanding the molecular mechanisms of human obesity and related diseases. The AFD traits are a complex quantitative trait controlled by multiple genomic variants with intricate genetic regulatory mechanisms. Recent research has made significant progress in identifying genomic variants associated with AFD traits through genome-wide association studies (GWAS). Despite these advancements, the functional mechanisms of these variants, especially those located in non-coding regions, remain poorly understood, leading to a ‘black box’ scenario regarding their specific effects on phenotype. Traditional research methodologies, such as the ‘nearest gene model’ based on locational proximity, simply assign the nearest gene as the regulatory target of a genomic variant^[53]10. However, recent studies revealed that approximately 75–76% of genomic variants affect target genes through long–range interactions^[54]11, which challenges the effectiveness of the ‘nearest gene model’. To overcome these limitations, many researchers have turned to expression quantitative trait locus (eQTL) analysis, which aims to identify correlations between genomic variants and the expression levels of their corresponding target genes. International initiatives, such as the Farm Animal Genotype–Tissue Expression (FarmGTEx) project^[55]12–[56]14, have contributed to extensive eQTL datasets on adipose tissues from agricultural animals, enhancing our understanding of the genetic mechanisms underlying genomic variants associated with economic traits. However, eQTL analysis predominantly relies on statistical correlations and fails to directly elucidate the functionality of genomic variants. Emerging epigenetic techniques suggest that variants located in regulatory elements have functional implications^[57]15, but these methods are limited in explicating how functional variants exert their influence through specific gene regulation. In contrast, three–dimensional (3D) genomics approaches based on the frequency of physical chromatin interactions can predict target genes but often neglect the significant influence of epigenetic activity on variants^[58]11,[59]16,[60]17. Therefore, the integration of 3D genomics with multiple omics datasets is crucial for comprehensively understanding how noncoding variants influence transcriptional regulation^[61]18–[62]22. In this study, we used two Northeast Agricultural University Broiler Lines (NEAUHLF), which have been selected for divergent abdominal fat content since 1996, as our experimental populations. These two lines have proven to be ideal genetic models for investigating the underlying mechanisms of obesity^[63]23. Building upon this, we constructed an Integrative Multi–omics approach for Variant–Gene Interactions (IMVGI) that considers genomic, epigenomic, 3D genomic, and transcriptomic data and broadly analyzed how genomic variants systematically affect the transcription regulatory mechanisms of AFD traits. The strategy was to use whole–genome sequencing analysis of the NEAUHLF to identify genomic variants in the chicken genome, and selection signature analysis was used to identify variants associated with AFD traits. Subsequently, the functionality of these variants was annotated through epigenetic analysis, and their potential target genes were identified through 3D genomic analysis. By further employing weighted gene coexpression network analysis (WGCNA), we established correlations between these target genes and AFD traits. Finally, experimental validation revealed the role of crucial functional variants in AFD traits and their regulatory mechanisms. In summary, our research unveils a detailed framework that shows how genomic variants influence transcriptional regulation associated with AFD traits, providing insights into the genetic basis of these traits and identifying significant targets for the molecular breeding of chicken AFD traits. Results Analysis of genomic selection signatures uncovers candidate variants associated with fat deposition In this research, we conducted a thorough investigation to identify candidate variants related to abdominal fat deposition (AFD) traits. Our research focused on a unique population of 330 chickens from the Northeast Agricultural University broiler lines that were divergently selected for abdominal fat content (NEAUHLF) for 19^th generations (years). Despite their similar body weights at 7 weeks of age (BW7), chickens from the fat (FL) and lean (LL) lines displayed a notable 7.83–fold difference in abdominal fat weight (AFW) and a 7.51–fold difference in abdominal fat percentage (AFP = AFW/BW7) (Fig. [64]1a, b and Supplementary Fig. [65]1). This discrepancy underscores the significant phenotypic divergence between the FL and LL groups in terms of AFD traits. Fig. 1. Genomic analysis identifying candidate variants for fat deposition in broiler chicken lines. [66]Fig. 1 [67]Open in a new tab a Comparative visualization of fat (FL) and lean (LL) lines, highlighting the marked difference in abdominal fat tissue. b Graphical representation of body weight at 7 weeks of age (BW7), abdominal fat weight (AFW), and abdominal fat percentage (AFP = AFW/BW7) in the FL and LL groups. Data were shown as the mean ± SD, n = 160 (FL), 170 (LL), biologically independent animals. Statistical analysis was performed using an unpaired two-tailed Student’s t-test. The fold differences observed were as follows: BW7 in FL relative to LL is 1.04-fold (p < 0.0001); AFW is 7.83-fold (p < 0.0001); and AFP shows a 7.51-fold difference (p < 0.0001). c Circos plot displaying the distribution characteristics of genomic variants, including SNPs and InDels, across the whole genome. d Population structure analysis at K = 2, illustrating distinct genetic clusters. e Three–dimensional principal component analysis (PCA) plot. f Analysis of linkage disequilibrium (LD) decay distance, quantified by R^2 values. g Genome–wide F[ST] bar plot, with a red dashed line indicating the cutoff threshold (F[ST] > 0.2), highlighting regions of high genetic differentiation. h Genome–wide θπ ratio bar plot. The red dashed line indicates the cutoff thresholds for log2 (θπ ratios) > 0.3 or < −0.3, identifying areas under selection or with high genetic variation. i Scatter plot showing 2301 selected regions associated with abdominal fat deposition AFD traits identified through joint F[ST] and θπ ratios. j Venn diagram illustrating the overlap between the 2301 adipose–related selected regions and known adipose–related QTLs from the Chicken QTLdb. k Manhattan plot highlighting significant changes in allele frequency (AF) between the FL and LL groups, with regions under selection marked. l Pie chart categorizing the 311,947 candidate variants linked to AFD traits. Source data are provided as a Source Data file. Population structure analysis was subsequently conducted to estimate the genomic relationship between these two groups. Whole–genome sequencing (WGS) of these 330 individuals revealed 4,677,252 high–quality, filtered genomic variants (4,140,284 single nucleotide polymorphisms (SNPs) and 536,968 insertions/deletions (InDels)). Intriguingly, 20.01% of these variants were novel and were not recorded in the dbSNP database ([68]https://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi, last accessed on August 22, 2021) (Fig. [69]1c). Admixture analysis partitioned the population into FL and LL subpopulations at K = 2 (Fig. [70]1d), which was supported by principal component analysis (PCA) (Fig. [71]1e). Linkage disequilibrium (LD) analysis indicated rapid degradation, particularly in the LL subpopulation, suggesting intense selection pressure (Fig. [72]1f). The significant genetic differences between the FL and LL groups make them ideal subjects for studying traits influenced by artificial selection, such as AFD traits. To identify candidate variants associated with AFD traits, we employed selection signature analysis. We calculated the fixation indices (F[ST]) and θπ ratios across the genome, leading to the identification of 2301 regions selected for fat traits (Fig. [73]1g–i). Remarkably, 72.23% of these regions overlapped with known quantitative trait loci (QTLs) associated with the fat traits from the Chicken QTLdb^[74]24 (Fig. [75]1j), suggesting their potential involvement in AFD traits. By focusing on these regions, we identified variants that showed significant changes in allele frequency (|ΔAF | ≥ 0.5) between FL and LL as candidate variants linked to AFD traits. This process allowed us to identify 311,947 candidate variants associated with AFD traits, predominantly located in noncoding intergenic/intronic regions (94.42%), posing challenges in identifying the true functional variants and their target genes (Fig. [76]1k, l). Identification of potentially functional variants regulating fat deposition using epigenomic techniques To identify the potentially functional variants among the 311,947 variants associated with AFD traits, we employed a comprehensive annotation strategy. This methodology entailed the synthesis of data derived from transposase–accessible chromatin sequencing (ATAC–seq) and chromatin immunoprecipitation sequencing (ChIP–seq) of chicken abdominal fat tissues to increase the precision of our identification process. In our analysis of open chromatin regions (OCRs) using ATAC–seq data, we focused on identifying differentially open chromatin regions (DOCRs) in abdominal fat tissues between the FL and LL groups, as those showing statistically significant changes (p < 0.05, |log2 (fold change)| > 1). This approach led us to identify 128,090 OCRs, 4028 of which met the criteria for classification as DOCRs (Fig. [77]2a, b). These DOCRs exhibited distinct length characteristics and were predominantly enriched in regions flanking transcription start sites (TSSs) (Fig. [78]2c, d). We further classified these DOCRs into distinct regulatory elements (REs) based on their genomic features using ChIP-seq data. Specifically, we identified 218 promoters (TSS −4 kb/+2 kb, H3K4me3, H3K27ac), 1106 enhancers (H3K4me1, H3K27ac), 312 silencers (H3K27me3), 2337 CTCF-enriched regions (marked by CTCF peaks), and 1264 low-signal regions (regions lacking significant RE signals) (Fig. [79]2e, f). Notably, some DOCRs may exhibit multiple regulatory functions, leading to their classification as various types of REs (Supplementary Fig. [80]2a). These REs showed higher fold changes in activity signals between FL and LL compared to low-signal regions (Fig. [81]2g), and these low signal regions are highly conserved across all biological replicate samples (Supplementary Fig. [82]2b), thereby enhancing the credibility of our annotations. Fig. 2. Epigenomic screening of functional variants influencing fat deposition. [83]Fig. 2 [84]Open in a new tab a Circos plot showing the distribution and frequency of open chromatin regions (OCRs) across chromosomes identified by ATAC–seq, with each ring representing different sample data. b Volcano plot of 4028 differentially expressed OCRs between FL and LL, with significance thresholds. c Graphical representation of the size distribution of differential OCRs. d Enrichment analysis of ATAC–seq signals near transcription start sites (TSSs). e Annotation of DOCRs with regulatory elements (REs) using ChIP–seq data of histone modifications. f Number of annotated REs in DOCRs. g Comparative analysis of the activity signals of different REs. Violin plots provide a density estimation of the data distribution, and box plots display the median (central line), interquartile range (IQR, bounds), and whiskers representing the minimum and maximum values. Sample sizes are n = 1264 (Low signal), 218 (Promoter), 1106 (Enhancer), 312 (Silencer), and 2337 (CTCF), biologically independent samples. P values were obtained using one-way ANOVA followed by Dunnett’s multiple comparisons test. h Circular plot depicting the identification of superenhancers within DOCRs. i Circular plot showing the identification of supersilencers within DOCRs. j, k Comparative analysis of chromatin marks in enhancers and silencers, with box plots showing the median, interquartile range, and whiskers representing the range of data, and violin plots providing a density estimation of the data distribution. Data are plotted in box plots showing the median (central line) and IQR (bounds) with whiskers extending to the minimum and maximum values. n = 1106 (Typical enhancer), 280 (Super enhancer), 312 (Typical silencer) and 41 (Super silencer), biologically independent samples. P values were obtained using unpaired two-tailed Student’s t-test. l Bar and pie charts showing the distribution and proportion of functional variants across different REs. Source data are provided as a Source Data file. Intriguingly, in addition to typical REs, we also identified a distinct class of enhancers and silencers, termed superenhancers (SEs) and supersilencers (SSs), respectively. SEs were defined by applying the ranking of the ROSE pipeline^[85]25 to ATAC–seq and H3K4me1 signals, which revealed 280 SEs in DOCRs (Fig. [86]2h). The authors of a recent study referred to H3K27me3–rich genomic regions as SSs that promote gene repression through chromatin interactions^[87]26. Similar to the identification of SEs, we defined SSs by utilizing the ROSE algorithm to rank ATAC–seq signals and H3K27me3 signals. This method allowed for the recognition of 41 SSs in DOCRs (Fig. [88]2i). These SEs/SSs showed stronger activity signals than regions with typical REs (Fig. [89]2j, k), enhancing the credibility of our annotations. After applying the comprehensive annotation strategy using ATAC–seq and ChIP–seq data, we identified 2162 potentially functional variants embedded within annotated REs. The functional relevance of these variants is underscored by their distribution: approximately half were significantly enriched at CTCF–binding sites (50.66%), and the rest (45.24%) were nested within distal regulatory elements such as enhancers, SEs, silencers, and SSs (Fig. [90]2l). The localization of these 2162 variants within REs has potential functional significance, as they may influence the activity of these elements and consequently modify the transcriptional regulation of target genes. Identification of variant–gene interactions using three–dimensional genomes To investigate how 2162 potentially functional variants influence gene regulatory mechanisms, we utilized Hi–C technology to perform high–resolution genome–wide chromatin interaction mapping of the abdominal fat tissues of the FL and LL groups (Supplementary Table [91]1). Through this chromatin interaction mapping, we constructed chromatin interaction frequency plots, revealing complex chromatin structures (A/B compartments, topologically associating domains [TADs], and loops) in both the FL and LL genomes (Fig. [92]3a). Furthermore, we crafted individualized three–dimensional (3D) chromatin models for FL and LL, revealing marked structural differences between the two groups. Notably, the LL group exhibited a weaker chromatin conformation, as indicated by increased Von Neumann entropy^[93]27 and decreased interaction frequencies (Fig. [94]3b–d, Supplementary Figs. [95]3 and [96]4). Fig. 3. 3D genome mapping of variant–gene interactions in fat and lean broiler chicken lines. [97]Fig. 3 [98]Open in a new tab a Hi–C contact matrices at varying resolutions for chromosomes in the FL and LL genomes, with darker shades indicating higher chromatin interaction frequencies. b Three-dimensional average models of FL and LL chromatin structures, color-coded to represent different chromosomes at a 500 kb resolution, illustrating average inter-chromosomal spatial relationships rather than specific diploid configurations. c Box plots of Von Neumann entropy comparing chromatin conformational looseness in the FL and LL genomes. Boxplot center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range and data beyond that threshold indicated as outliers. P values were obtained using unpaired two-tailed Student’s t-test, n = 40 biologically independent samples. d Log–log plot of contact frequency versus distance, highlighting structural differences between the FL and LL genomic architectures. e Chromatin A/B compartment distribution maps for specific chromosomal regions in the FL and LL genomes at 100 kb resolution. f Frequency charts of A/B compartment transitions, showing the proportion of transitions between A/B compartments in the FL and LL genomes. g Aggregate peak analysis (APA) of all topologically associating domains (TADs) within the FL and LL genomes at 10 kb resolution. h Statistical summary of specific and shared TAD boundaries, including counts and percentages. i APA for a comparative analysis of all chromatin loops within the FL and LL genomes at resolutions of 1, 5, 10, and 25 kb. j Distribution charts summarizing the number and percentage of specific and shared chromatin loops. k Schematic diagrams illustrating how functional variants influence gene interactions through changes in proximal and distal RE activity and in TAD and loop structures. l Table detailing 2559 variant–gene interactions identified, categorized by interaction type, variant number, REs, and target genes involved. Source data are provided as a Source Data file. We conducted a comparative analysis of A/B compartments, TADs, and loop structures between the FL and LL groups. Our analysis at a 100 kb resolution revealed that only 6.16% of the total genomic regions showed transitions between the A and B compartments in the FL and LL groups, suggesting high conservation in these compartment regions (Fig. [99]3e, f). Furthermore, a more refined 10 kb resolution allowed us to identify a total of 3829 TAD boundaries, of which 2898 (75.69%) were shared by the two groups. Additionally, there were 483 (12.61%) FL–specific TAD boundaries and 448 (11.70%) LL–specific TAD boundaries (Fig. [100]3g, h). Further examination of loop structures at resolutions of 1, 5, 10, and 25 kb yielded a total of 4246 loops, with only 1243 (29.27%) loops being shared by the two groups. Additionally, there were 2100 (49.46%) FL–specific loops and 903 (21.27%) LL–specific loops, which was further confirmed by CTCF enrichment at loop anchors (Fig. [101]3i, j and Supplementary Fig. [102]4d). These findings underscored a notable disparity in TADs (24.31%) and loops (70.73%) structures between FL and LL, suggesting their potential regulatory roles. We annotated the 2162 potentially functional variants associated with AFD traits on the high–order structure of the 3D genome, focusing on TAD boundaries and loops, to identify target genes regulated by these variants. This process revealed four primary transcriptional regulatory mechanisms mediated by functional variants: (1) alterations in proximal RE activity, in which 552 functional variants modulate the activity of 168 proximal REs (promoters, nearby enhancers, and nearby silencers) to regulate the transcription of 154 target genes; (2) modifications in distal RE activity, in which 619 functional variants exert transcriptional control over 261 target genes by altering the activity of 160 REs (enhancers and silencers); (3) reconfiguration of the TAD structure, in which 40 functional variants impact the transcription of 18 target genes by modifying 8 TAD structures, particularly at CTCF binding sites located at TAD boundaries; and (4) loop structure alterations, in which 666 functional variants regulate 267 distant target genes by influencing 163 loop structures, especially at CTCF binding sites within loop anchor points (Fig. [103]3k, l). In total, our analysis identified a compendium of 1169 functional variants that regulate the expression of 500 target genes by influencing chromatin structures such as proximal and distal regulatory elements, TADs, and loop structures, resulting in a complex network of 2559 variant–gene interactions. Weighted gene coexpression network analysis reveals key variant–gene interactions related to fat deposition After identifying the target genes of potentially functional variants, we employed weighted gene coexpression network analysis (WGCNA) to explore the associations between the expression of target genes and fat traits, thereby establishing comprehensive variant–gene networks related to AFD traits. Upon detailed analytical consideration, a power beta value of 11 was chosen for WGCNA to ensure a scale–free network (Fig. [104]4a, b). Using this power beta value, 439 of the above 500 target genes were successfully grouped into four distinct functional modules (Fig. [105]4c). Specifically, the blue module encompassed 133 genes, the turquoise module contained 172 genes, the brown module comprised 86 genes, and the yellow module included 48 genes. The remaining 61 genes, which did not exhibit a sufficiently strong coexpression relationship with the others, have been allocated to the gray group in Fig. [106]4c. Fig. 4. Weighted gene coexpression network analysis (WGCNA) and functional enrichment analysis. [107]Fig. 4 [108]Open in a new tab a Scale independence (left) and mean connectivity (right) analysis for choosing the soft threshold in establishing the WGCNA network. b Gene clustering and identification of gene modules using WGCNA. c Heatmap illustrating the relationships between gene modules and traits. The module–trait relationships were assessed by calculating the Pearson correlation coefficients (r) between the module eigengenes and trait data (AFW and AFP). P values were calculated based on the Pearson correlation test, reflecting the statistical significance of the correlations. d Network diagram of adipose–related variant–gene interactions. e Gene Ontology (GO) term enrichment analysis for target genes. f Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of target genes, providing insight into biological pathways. KEGG pathway enrichment analysis was performed using a two-sided Fisher’s exact test. Multiple comparisons were adjusted using the Benjamini-Hochberg (BH) method to control the false discovery rate (FDR). Source data are provided as a Source Data file. The association analysis between the gene modules and phenotypes revealed a significant relationship between 439 target genes and AFD traits among the 1134 functional variants. Specifically, 305 target genes of 900 functional variants were associated to fat deposition. This finding is supported by the positive correlations observed in the blue module, which exhibited a strong positive correlation with AFW (r = 0.90, p = 4e − 04) and AFP (r = 0.87, p = 1e − 04), and in the turquoise module, which showed a similarly strong positive correlation with AFW (r = 0.98, p = 8e − 07) and AFP (r = 0.99, p = 1e − 07) (Fig. [109]4c). Conversely, another set of 134 target genes of 518 functional variants appeared to inhibit fat deposition. This observation was supported by the negative correlations in the brown module with AFW (r = −0.64, p = 0.01) and AFP (r = −0.65, p = 0.01) and in the yellow module with AFW (r = −0.53, p = 0.04) and AFP (r = −0.54, p = 0.04) (Fig. [110]4c). Notably, there is an overlap of 284 variants between the two sets, resulting in a total of 1134 unique variants (900 + 518 - 284) regulating the 439 genes associated with AFD traits. Additionally, 61 target genes of 35 functional variants, encompassing 304 variant–gene interactions, were identified within the grey module and were not associated with AFD traits. In total, our analysis successfully highlighted 439 significant adipose–related target genes influenced by 1134 functional variants, encompassing 2255 variant–gene interactions (Fig. [111]4d). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses further confirmed that these genes strongly correlated with lipid metabolism–related pathways, primarily signal transduction, metabolic processes, and cell growth and death (Fig. [112]4e, f and Supplementary Fig. [113]5d–f). Functional characterization of variant–gene interactions in relation to fat deposition In this study, we constructed an integrated framework, as described above, that combines genomic, epigenomic, 3D genomic, and transcriptomic data to unravel the mechanisms of transcriptional regulation involved in the modulation of AFD traits by functional variants; this framework is termed Integrative Multiomics for Variant–Gene Interactions (IMVGI). This holistic approach allowed a systematic analysis of functional variants and their impact on transcriptional regulatory mechanisms associated with AFD traits. (Fig. [114]5a). Using IMVGI, we identified 2255 adipose–related variant–gene interactions, including 1134 functional variants (90.48% SNPs and 9.52% InDels) and 439 target genes. These functional variants were predominantly localized in enhancer–associated regions (45.67%) and CTCF–associated regions (40.06%) (Fig. [115]5b and Supplementary Data [116]1–[117]3). On average, each functional variant was predicted to regulate 1.94 target genes, whereas each target gene was influenced by 4.23 functional variants. The average genomic distance of these interactions was approximately 201 kb, with about 71.00% of variants skipping over the nearest gene to regulate their target genes through long-range interactions (Fig. [118]5c–f and Supplementary Fig. [119]5a–c), highlighting the importance of integrating 3D genomic interactions in this study. Fig. 5. Functional analysis of variant–gene interactions in relation to AFD traits. [120]Fig. 5 [121]Open in a new tab a Flowchart model for Integrative Multi–omics for Variant–Gene Interactions (IMVGI) identification of functional variants and target genes related to AFD traits. b Functional annotations for IMVGI variants. c Cumulative fractions of target genes per IMVGI variant (mean = 1.94). d Cumulative fractions of IMVGI variants per target gene (mean = 4.23). e Cumulative fractions of genomic distances for identified variant–gene connections (mean = 201 kb). f The bar chart indicates the number of genes skipped by variants before interacting with their target genes. The x-axis represents the number of genes skipped, and the y-axis represents the number of variants. g Comparative analysis of the enrichment of non–IMVGI and IMVGI variants in active chromatin regions, including ATAC–seq, H3K27ac, H3K4me3, H3K4me1, and CTCF markers. Violin plots provide a density estimation of the data distribution, and box plots display the median (central line), interquartile range (IQR, bounds), and whiskers representing the minimum and maximum values. Sample sizes are n = 1134 biologically independent samples (unpaired two-tailed Student’s t-test). h Heatmaps showing transcription factor binding preferences for non–IMVGI vs.