Abstract Introduction Heart failure (HF) is the most common complication following myocardial infarction (MI) and frequently occurs during the postinfarction recovery phase. Despite the well-established association between HF and MI, the underlying molecular mechanisms driving their transition remain poorly understood. Methods The aim of this study was to identify key regulatory genes involved in this transition via advanced computational tools. We conducted a comprehensive analysis of differentially expressed genes (DEGs) via Limma software, leveraging five independent datasets retrieved from the Gene Expression Omnibus (GEO) database: [32]GSE59867, [33]GSE62646, [34]GSE168281, [35]GSE267644, and [36]GSE269269. Our multistep analytical pipeline included weighted gene coexpression network analysis (WGCNA) to map interacting genes, machine learning algorithms for robust classification, functional annotation via Kyoto Encyclopedia of Genes and Genomes (KEGG) to explore biological pathways, CIBERSORT correlation analysis linking hub genes with immune cell states, transcriptional regulation profiling of key hubs, and single-cell sequencing to assess the functional relevance of these hubs. Results Our findings revealed that 413 DEGs were significantly different between MI and HF. WGCNA identified 98 genes associated with both conditions. Machine learning filtering further prioritized 10 hub genes: GPER1, E2F5, DZIP3, CYLD, ADAMTS2, ZNF366, ST14, SNORD28, LHFPL2, and HIVEP2. These hubs were significantly associated with immune-related processes, suggesting their potential role in the pathogenesis of HF after MI. Single-cell transcriptomic analysis demonstrated that CYLD exhibited the strongest correlation with the transition from MI to HF; using random forest modelling, we validated its predictive value in this context. Discussion In conclusion, our study identified CYLD as a critical regulator of the transition from MI to HF. Our findings underscore the potential of hub genes as targets for novel therapeutic interventions aimed at mitigating MI-to-HF progression and improving patient outcomes. Keywords: heart failure, myocardial infarction, WGCNA, machine learning, single-cell sequencing analysis, hub genes 1 Introduction Myocardial infarction (MI) is a leading cause of morbidity and mortality worldwide, with heart failure (HF) being a common yet devastating complication ([37]Bahit et al., 2018; [38]Li et al., 2024). Despite advancements in treatment strategies, patients who experience MI remain at high risk for the development of HF, which significantly impacts their long-term prognosis ([39]Kaul et al., 2013). HF frequently follows MI, and its prevalence among hospitalized patients with acute MI varies widely across studies, ranging from 14% to 36% ([40]Bahit et al., 2018). HF significantly elevates the risk of mortality following MI. Over time, mortality rates after MI have decreased, primarily due to advancements in HF survival rates. This difference may be attributed to varying risk factors and underlying mechanisms that contribute to the development of HF at different time points following MI([41]Gerber et al., 2016; [42]Jenca et al., 2021). Current guidelines recommend that patients undergo early risk assessment after acute MI to ensure the provision of appropriate therapy. Early and accurate risk stratification is crucial for identifying patients at risk of developing HF post-MI, as it guides treatment and improves prognosis ([43]Ibanez et al., 2018; [44]Collet et al., 2021; [45]Jenca et al., 2021; [46]Akhtar et al., 2024). The American Heart Association and the European Society of Cardiology have long acknowledged the existence of an urgent public health need for the prevention of HF ([47]Schocken et al., 2008; [48]Ponikowski et al., 2014). The cohort of patients who have experienced an MI constitutes a high-risk demographic for the development of HF, rendering HF screening and prevention within this group particularly crucial. Failure to diagnose HF promptly or accurately can jeopardize patient outcomes and increase treatment expenses ([49]Faridi et al., 2016; [50]Jenca et al., 2021). If a biomarker is causally associated with HF, it may play a pivotal role in predicting the onset and progression of HF post-MI. This predictive capability not only aids in early diagnosis but also facilitates personalized treatment strategies; thus, these biomarkers may ultimately serve as a important targets for intervention. By identifying and targeting such biomarkers, healthcare providers can implement more effective preventive measures and therapeutic interventions, thereby improving patient outcomes and reducing the overall burden of HF. In pursuit of this goal, the identification of potential target genes involved in the development of HF post-MI is of paramount importance. To further elucidate the molecular mechanisms driving HF after MI, our study employed a comprehensive bioinformatics approach. Given the swift advancements in microarray and high-throughput sequencing technologies, bioinformatics approaches are frequently employed in disease research. Using weighted gene coexpression network analysis (WGCNA), we identified several modules enriched for immune-related processes that were significantly associated with the transition from MI to HF. These findings were validated via the CIBERSORT algorithm, which revealed specific signatures linked to unfavourable outcomes in high-risk patients. Single-cell transcriptomic data provided additional insights into the heterogeneity of HF progression. Our findings underscore the importance of early and accurate risk stratification for HF development in MI survivors. By identifying biomarkers that are causally associated with HF, we can develop targeted therapeutic strategies aimed at reversing or preventing disease progression. 2 Materials and methods 2.1 Data collection In this study, we accessed gene expression microarray data from the Gene Expression Omnibus (GEO) database ([51]http://www.ncbi.nlm.nih.gov/geo). The microarray data were obtained via multiple platforms: [52]GPL6244 with accession numbers [53]GSE59867 ([54]Maciejak et al., 2015) and [55]GSE62646 ([56]Kiliszek et al., 2012), an additional external validation set [57]GPL11154 with accession number [58]GSE168281, and single-cell gene expression datasets [59]GSE267644 ([60]Kneuer et al., 2024) and [61]GSE269269 ([62]Qian et al., 2024). 2.2 Identification of differentially expressed genes (DEGs) The initial expression matrix was normalized and processed via R software (version 4.4.1). To identify DEGs between the datasets [63]GSE59867 and [64]GSE62646, we utilized the “limma” package within the R environment. DEGs were selected based on an adjusted P value <0.05 and a fold-change threshold of |log FC| ≥ 0.5. 2.3 Construction of the WGCNA network WGCNA is a widely used bioinformatics tool for revealing gene coexpression patterns across multiple samples by clustering genes with similar expression profiles into distinct modules. These gene modules can be further correlated with external traits or phenotypes to explore potential functional associations ([65]Langfelder and Horvath, 2008). In this study, WGCNA was performed via the R WGCNA package. To construct a reliable coexpression network, we selected a soft-thresholding power of 11 based on the criterion for achieving a scale-free topology. Specifically, we evaluated a range of power values and identified the lowest power for which the scale-free topology fit index (R ^2) exceeded 0.80 while ensuring relatively high mean connectivity. This approach aligns with the principle that biological networks tend to exhibit scale-free properties. 2.4 Identification of shared genes To identify the core common genes, we performed a comprehensive analysis by first constructing Venn diagrams of genes identified through WGCNA and DEG analyses to identify overlapping genes. These shared genes were then selected for subsequent analysis. 2.5 Screening hub genes via the machine learning algorithm Following the identification of shared genes, we applied two complementary machine learning algorithms to identify potential hub genes with the highest discriminative power. First, we used support vector machine recursive feature elimination (SVM-RFE), which employs a linear kernel function to ensure model interpretability and computational efficiency ([66]Huang et al., 2014). Feature ranking was performed iteratively, and the optimal subset was selected based on the minimal cross-validation error. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was implemented via the glmnet package in R ([67]Li and Sillanpaa, 2012). We applied 10-fold cross-validation to identify the optimal value of the regularization parameter lambda (λ), defined as the value minimizing the mean cross-validated error (lambda.min). To validate the classification performance of the selected features, we constructed multiple supervised learning models—logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbour (KNN), support vector machine (SVM), random forest (RF), ridge regression, elastic net, naïve Bayes and XGBoost. Their predictive performance was evaluated via the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). The model yielding the best overall metrics was adopted for hub gene selection. 2.6 Correlation analysis between infiltrating immune cells and hub genes The CIBERSORT algorithm was employed for immune cell infiltration analysis. Spearman correlation analysis was performed to examine the relationships between infiltrating immune cells and hub genes, and the corrplot tool in R was used for visualization. 2.7 Candidate biomarker expression levels and diagnostic value The ggplot2 package in R was used to analyse the expression levels of candidate biomarkers. To assess their potential diagnostic performance, receiver operating characteristic (ROC) analysis was conducted, with the AUC serving as a key metric for evaluating the efficacy of these markers. 2.8 Construction of the nomogram model Based on the selected explanatory variables, we constructed a nomogram model employing the “rms” package. We calculated every item’s score by projecting upwards on a small scale (points) based on the characteristics of each variable of the patient. The total value was calculated by adding the scores of each item. A higher total value indicates a higher probability of HF. To assess the model’s accuracy, we conducted an analysis of the calibration curve, decision curve analysis (DCA) curve and clinical impact curve ([68]Iasonos et al., 2008). 2.9 Single-cell sequencing analysis The data were first subjected to standardized preprocessing via the Seurat software package ([69]Stuart et al., 2019). The t-SNE method was subsequently applied to model spatial relationships between cellular clusters. Cluster annotation was then performed via the celldex tool, with particular emphasis on key functional cell types. By implementing the FindAllMarkers function combined with a log2(FC). In the threshold thresholding approach, significant marker genes for individual cell subsets were identified from the single-cell expression data. 3 Results 3.1 Elucidating key regulatory genes in the MI-to-HF pathway via DEGs and WGCNA To identify the key regulatory genes associated with progression from MI-to-HF, we performed a genome-wide gene expression analysis using publicly available datasets from the GEO database ([70]GSE59867 and [71]GSE62646). These datasets compare gene expression profiles between HF patients and those who remained stable post-MI. The raw data were normalized via the “limma” package ([72]Supplementary Figure S1), and DEGs were identified under stringent criteria (P value <0.05 and log-fold change |logFC| ≥ 0.5). This analysis revealed 413 DEGs, including 250 upregulated and 163 downregulated genes ([73]Figure 1A). FIGURE 1. [74](A) Volcano plot showing gene expression changes with purple indicating downregulated, grey not changed, and yellow upregulated genes. (B) Line graphs depicting scale independence and mean connectivity against soft threshold power. (C) Histogram of k values and log-log scale plot for scale-free topology check. (D) Gene dendrogram with module colors. (E) Heatmap displaying module-trait relationships with correlation values. (F) Venn diagrams showing overlaps of upregulated and downregulated differentially expressed genes (DEGs) with positive and negative genes. [75]Open in a new tab Identification of differentially expressed genes. (A) Differential expression analysis between MI patients with HF and non-HF; (B) A scale-free network model was utilized to identify the optimal β value; (C) Network connectivity distribution analysis revealed a right-skewed histogram pattern; (D) WGCNA was employed to construct modules via hierarchical clustering; (E) A heat map was generated to visualize the association between WGCNA modules and enrollment characteristics; (F) Venn diagrams were used to illustrate the overlap of DEGs associated withincident HF risk across patient cohorts. For network construction, we employed the WGCNA approach. A soft threshold of 11 was selected to ensure network scalability ([76]Figures 1B,C), resulting in the identification of eight distinct modules, each represented by a unique colour ([77]Figure 1D; [78]Supplementary Figure S2A). Among the modules, the blue, pink, and green modules were significantly correlated with MI patients at high risk of HF progression ([79]Supplementary Figure S2B–D). Specifically, the blue module exhibited a positive correlation (r = 0.55, P = 0.02), as did the pink module (r = 0.56, P = 0.02), whereas the green module displayed a negative correlation (r = −0.5, P = 0.04), suggesting a potential protective role against HF progression ([80]Figure 1E). To identify candidate genes, we performed a Venn diagram analysis of the DEGs and module genes, which yielded 98 overlapping genes ([81]Figure 1F). These genes were identified as potential regulatory factors driving the MI-to-HF transition. 3.2 Hub gene identification and validation in MI-to-HF progression To identify hub genes with significant differences between HF patients and non-HF patients post-MI, we employed the SVM-RFE algorithm to select 10 candidate genes (DZIP3, HIVEP2, ZNF366, CYLD, SNORD28, GPER1, E2F5, LHFPL2, ADAMTS2 and ST14) ([82]Figure 2A). These genes were further refined via the LASSO algorithm, which identified three critical hub genes (ZNF366, HIVEP2, and E2F5) from an initial set of 98 common genes ([83]Figures 2B,C). The 10 candidate genes were integrated for subsequent analyses ([84]Supplementary Table S1). FIGURE 2. [85]Image consists of three graphs labeled A, B, and C. Graph A shows a line plot of RMSE (cross-validation) versus the number of variables, indicating a minimum at 10 variables. Graph B displays coefficient paths in a regularization model over L1 norm. Graph C presents binomial deviance against Log(lambda), highlighting minimum deviance at a specific lambda value. [86]Open in a new tab Features selection using machine learning algorithms. (A) Selection of characteristic genes via SVM-RFE algorithm; (B,C) Selection of characteristic genes via LASSO algorithm. Comparative analysis of hub gene expression revealed distinct patterns between HF patients and non-HF patients post-MI. DZIP3, HIVEP2, CYLD, SNORD28 and E2F5 were significantly downregulated in HF patients compared with non-HF patients ([87]Figure 3A). Among these genes, ZNF366, ST14 and LHFPL2 were positively correlated with MI-to-HF progression, whereas SNORD28, HIVEP2 and CYLD were negatively correlated ([88]Figure 3B). Notably, HIVEP2 and CYLD demonstrated the greatest functional importance in this progression ([89]Figure 3C). To validate these findings, we analysed the [90]GSE168281 dataset, which confirmed consistent expression trends and statistical significance (P < 0.05) ([91]Figure 3D), supporting the reliability of our results. FIGURE 3. [92]Four-panel figure illustrating gene expression analysis. Panel A shows box plots of gene expression for different genes, comparing HF and NHF groups. Panel B displays a forest plot with odds ratios and confidence intervals for various genes, indicating positive and negative effects. Panel C presents two scatter plots showing variable importance in models, labeled "%IncMSE" and "IncNodePurity." Panel D contains a bar graph comparing gene expression in the MI and HF groups using the GSE168281 dataset, with significant differences marked by asterisks. [93]Open in a new tab Differential expression and values of hub genes in the training and test groups. (A) Differential analysis of variables between HF patients and non-HF patients after MI; (B) Single-factor logistic regression analysis of variable associations with HF post-MI; (C) Feature importance assessment in the Random Forest model; (D) Expression levels of the 10 hub genes identified in [94]GSE168281 dataset. Statistical significance was assessed using the Wilcoxon rank-sum test (two-sided). *P < 0.05, **P < 0.01. 3.3 Immune cell infiltration and correlation with hub genes in the MI-to-HF progression To investigate the progression from MI to HF, we performed a KEGG enrichment analysis focusing on temporal changes in pathways following MI ([95]Supplementary Figure S3A-F). Heatmaps revealed dynamic pathway alterations, highlighting key processes such as the inflammatory response, interferon alpha signalling, E2F targets, epithelial‒mesenchymal transition and glycolysis ([96]Figure 4A). KEGG pathway analysis confirmed the involvement of immune-related pathways in the progression from MI to HF. Using CIBERSORT, we analysed immune cell infiltration and identified 16 immune cell subpopulations ([97]Figure 4B). Compared with the non-HF post-MI group, the HF post-MI group exhibited significant dynamic changes in dendritic cells (DCs) and naïve CD4^+ T cells. Hub genes, including CYLD and HIVEP2, exhibited strong correlations with immune cells in the HF group 1 day post-MI. These correlations persisted for most hub genes up to 6 days post-MI, except for ZNF366, SNORD28, GPER1, and ADAMTS2. At 30 days post-MI, the hub genes (excluding HIVEP2, ADAMTS2 and ST14) maintained close associations with immune cells. Notably, SNORD28, LHFPL2, ADAMTS2 and ST14 maintained their immune cell correlations 180 days post-MI ([98]Figure 5). FIGURE 4. [99]Two heatmaps labeled A and B display data with color gradients from red to blue. A includes gene pathways and immune responses with time points: 1 day, 4-6 days, 1 month, and 6 months. B includes immune cell types with different conditions at those same time intervals. Each heatmap uses clustering and color intensity to indicate data variations. [100]Open in a new tab KEGG enrichment analysis for the progression from MI to HF. (A) Heatmap depicting the alteration of KEGG pathways in the procession from MI-to-HF; (B) Heatmap showing the alteration of immune characteristics in HF and non-HF after MI by CIBERSORT. FIGURE 5. [101]Heatmaps labeled A, B, C, and D compare immune cell types against specific genes. Color gradients represent correlation values from red (positive) to blue (negative). Asterisks denote significance levels: *p<0.05, **p<0.01, ***p<0.001. Cells analyzed include neutrophils, T cells, and more, with genes like DLZIP3 and HVEP2. [102]Open in a new tab The relationship between hub gene expression and immune cell infiltration. (A) 1 days; (B) 4–6 days; (C) 30 days and (D) 180 days after MI. 3.4 Construction and evaluation of machine learning models We performed a rigorous comparative evaluation of machine learning algorithms for post-MI HF progression risk prediction, employing feature subsets derived from two distinct selection methodologies: SVM-RFE ([103]Table 1) and LASSO ([104]Table 2). Among the eight clinically relevant classifiers evaluated—LR, LDA, KNN, SVM, ridge regression, elastic net, NBC and XGBoost—the RF ensemble learning architecture demonstrated statistically significant superiority across four critical clinical performance metrics: sensitivity, specificity, positive predictive value and negative predictive value. This optimized RF implementation was consequently designated the principal predictive framework for subsequent mechanistic investigations of post-MI HF progression dynamics based on its balanced performance in terms of both statistical accuracy and clinical interpretability. TABLE 1. Predictive model performance for HF progression risk using SVM-RFE features. Term Training cohort (1 day after MI) Test cohort (4 days after MI) AUC Sensitivity Specificity PPV NPV AUC Sensitivity Specificity PPV NPV Logistic Regression 1.000 1.000 1.000 1.000 1.000 0.528 0.222 1.000 0.500 0.364 Linear Discriminant Analysis 1.000 1.000 1.000 1.000 1.000 0.704 0.889 0.500 0.750 0.455 KNN 0.938 0.875 1.000 1.000 0.900 0.750 0.500 1.000 1.000 0.750 SVM 0.938 0.875 1.000 1.000 0.900 0.778 0.667 0.889 0.800 0.800 Random Forest 1.000 1.000 1.000 1.000 1.000 0.926 0.889 0.833 1.000 0.545 Ridge 0.986 1.000 0.875 0.900 1.000 0.759 0.778 0.667 0.667 0.417 Elastric Net 1.000 1.000 1.000 0.900 1.000 0.741 0.889 0.667 0.667 0.417 Naïve Bayes Classifier 0.826 0.708 0.806 0.875 0.778 0.667 0.333 1.000 1.000 0.500 Xgboost 1.000 1.000 1.000 1.000 1.000 0.889 1.000 0.833 1.000 0.500 [105]Open in a new tab TABLE 2. Predictive model performance for HF progression risk using LASSO features. Term Training cohort (1 day after MI) Test cohort (4 days after MI) AUC Sensitivity Specificity PPV NPV AUC Sensitivity Specificity PPV NPV Logistic Regression 0.986 1.000 0.875 0.889 0.875 0.796 0.778 0.833 0.800 0.500 Linear Discriminant Analysis 0.986 1.000 0.875 0.875 0.778 0.815 0.889 0.667 1.000 0.500 KNN 0.938 0.875 1.000 1.000 0.900 0.722 0.611 0.733 0.667 0.778 SVM 0.882 0.875 0.889 0.875 0.889 0.667 0.625 0.664 0.571 0.750 Random Forest 1.000 1.000 1.000 1.000 1.000 0.787 1.000 0.500 1.000 0.462 Ridge 0.986 1.000 0.875 0.875 0.778 0.833 0.889 0.667 1.000 0.500 Elastric Net 0.986 1.000 0.875 0.875 0.778 0.815 0.889 0.667 1.000 0.500 Naïve Bayes Classifier 0.826 0.778 0.875 0.875 0.778 0.611 0.222 1.000 1.000 0.462 Xgboost 0.001 0.001 0.001 0.001 0.889 0.796 0.667 0.833 1.000 0.462 [106]Open in a new tab To validate the diagnostic potential of the 10 candidate hub genes, we conducted rigorous ROC curve analyses using independent validation cohorts. The biomarker panel demonstrated exceptional acute-phase discrimination capacity, achieving perfect classification (AUC = 1.000, 95% CI 1.000-1.000) at 24 h post-MI in the training cohort ([107]Figure 6A). Temporal performance analysis revealed peak diagnostic accuracy during the subacute phase (four to six days post-MI; AUC = 0.926, 95% CI 0.722–1.000), followed by gradual attenuation at chronic timepoints: 0.828 (95% CI 0.562–0.969) at 1 month and 0.812 (95% CI 0.547–0.984) at 6 months postinfarction ([108]Figure 6B‒D). This temporal performance pattern suggests strong utility for early-stage heart failure detection in acute MI settings while highlighting the potential need for complementary biomarkers in chronic phase monitoring. FIGURE 6. [109]Four ROC curve plots labeled A through D compare model performance. Plot A (Training Dataset) exhibits an ideal curve with an AUC of 1.00. Plots B, C, and D (Test Dataset) show varying performance. Plot B has an AUC of 0.926, plot C has an AUC of 0.828, and plot D has an AUC of 0.812. Each plot includes confidence intervals for sensitivity, specificity, PPV, NPV, and accuracy. [110]Open in a new tab Prognostic Capability of Random Forest Model in Predicting HF Post-MI. (A) Ten hub genes distinguish MI patients at risk of HF from non-progressors within 1 day post-MI (training cohort); (B–D) Prognostic accuracy of hub genes in test cohort at 4–6 days, 1 month and 6 months post-MI. We developed a multivariate prognostic nomogram integrating three critical components, (1) baseline risk score, (2) DCs abundance and (3) naïve CD4^+ T cell frequency, with each parameter assigned weighted scores based on multivariate Cox regression coefficients ([111]Figure 7A). The model demonstrated excellent calibration fidelity and clinical utility across the validation cohorts ([112]Figures 7B–D). Notably, the combined model incorporating DCs, naïve CD4^+ T cells, and hub gene expression profiles achieved exceptional diagnostic accuracy in non-HF patients (AUC = 0.976, 95% CI 0.901–1.000) and maintained robust performance in established HF patients (AUC = 0.852, 95% CI 0.684–0.957) ([113]Figure 7E‒F). Comparative analysis demonstrated significant predictive improvement when immune cell biomarkers were combined with genomic signatures versus individual modalities ([114]Supplementary Figure S4). FIGURE 7. [115]A composite image of multiple figures related to medical prediction modeling:A. A nomogram displaying points related to CD4 T cells, DCs, risk score, and total points, indicating disease risk.B. Calibration plot comparing predicted and actual probabilities, with lines representing apparent, bias-corrected, and ideal predictions.C. Decision curve analysis showing the number of high-risk cases across various high-risk thresholds and cost-benefit ratios.D. Net benefit curve with labels for different prediction models and threshold probabilities.E. Receiver Operating Characteristic (ROC) curve for a CD4 T cell and DCs signature, indicating sensitivity and specificity with associated statistics.F. ROC curve for a different signature, also showing sensitivity, specificity, and statistics. [116]Open in a new tab Prognostic performance of combined signatures in predicting HF progression post-MI. (A) Nomogram model integrating riskScore, DCs and CD4^+T cells; (B) Calibration plot assessing model accuracy by comparing predicted and observed probabilities; (C) Decision curve analysis: identification of high-risk individuals at varying thresholds; (D) Decision curve analysis: net benefit across threshold probabilities; (E) Prognostic performance of combined signatures in predicting non-HF progression post-MI; (F) Prognostic performance of combined signatures in predicting HF progression post-MI. The superior predictive capacity of this immune‒genomic combination model may reflect the dual role of DC-mediated inflammatory responses and CD4^+ T cell-regulated adaptive immunity in post-MI ventricular remodelling. Future single-cell sequencing studies could elucidate the spatial‒temporal interplay between these immune subsets and myocardial gene expression patterns during HF progression. 3.5 Single-cell transcriptomic profiling of the hub genes To elucidate the cellular specificity of the hub genes, we performed integrated analysis of single-cell RNA sequencing (scRNA-seq) data from two independent myocardial infarction cohorts ([117]GSE226794 and [118]GSE269269). Following batch effect correction via the Harmony integration algorithm ([119]Supplementary Figure S5), we identified six major cellular compartments: basal cells, epithelial cells, erythrocytes, immune cells, platelets and proliferating cells ([120]Figure 8A). Immune cell subclustering revealed seven functionally distinct subtypes, including neutrophils, γδ T cells, cytotoxic T cells, B cells, macrophages, DCs and plasma cells, with substantial compositional shifts between the MI and HF states ([121]Figures 8B–D). FIGURE 8. [122]Panel A and B show UMAP plots visualizing cell types and immune cells across dimensions. Panel C is a bar chart displaying cell type percentages across samples HC, HF, and MI. Panel D is a dot plot correlating cell identities with various features, indicated by dot size and color intensity. Panel E displays UMAP plots of CYLD expression in HC, HF, and MI groups, with a color gradient legend for expression levels. Panel F is a violin plot comparing CYLD expression levels across HC, HF, and MI, with significant differences marked by asterisks. [123]Open in a new tab Single-cell analysis of cell populations. (A) UMAP visualization of annotated cell types; (B) Subclustering and annotation of immune cells; (C) The relative proportions of immune cell subtypes; (D) Feature plots used for cell type annotation; (E) UMAP distribution of target gene CYLD expression across groups; (F) Quantitative analysis of CYLD expression levels. Statistical significance was assessed using the Wilcoxon rank-sum test (two-sided). ***P < 0.001. Hub gene expression mapping revealed cell type-specific enrichment patterns: CYLD exhibited predominant expression in immune subsets, whereas SNORD28 was excluded from further analysis due to a lack of cell type specificity ([124]Supplementary Figure S6). Strikingly, CYLD expression was markedly downregulated in HF-derived immune cells compared with that in MI samples ([125]Figures 8E,F). 3.6 Functional and diagnostic validation of the CYLD Gene Ontology (GO) enrichment analysis of CYLD-associated pathways revealed differential immune regulation between the MI and HF microenvironments. In MI samples, CYLD ^+ T cells were enriched in positive regulation of T-cell activation and the Wnt signalling pathway ([126]Figure 9A), whereas HF-associated CYLD ^+ T cells were enriched in small molecule catabolic processes ([127]Figure 9B). A similar dichotomy was observed in B-cell populations, with MI samples showing leukocyte cell‒cell adhesion versus HFs exhibiting lymphocyte differentiation ([128]Figure 9C‒D). FIGURE 9. [129]Four bubble plots labeled A, B, C, and D depict gene enrichment data. Each plot shows different pathways or processes on the y-axis versus GeneRatio on the x-axis. Bubble sizes indicate count, and colors represent p-value adjustments. Plot A focuses on Wnt signaling pathways, B on metabolic processes, C on leukocyte actions, and D on immune responses. The color gradient ranges from red to blue, indicating significance levels. [130]Open in a new tab Functional enrichment analysis of target genes in T cells and B cells. (A–B) GO enrichment analysis of CYLD in T cells: (A) MI group and (B) HF group; (C–D) GO enrichment analysis of CYLD in B cells: (C) MI group and (D) HF group. The diagnostic potential of CYLD was validated through receiver operating characteristic (ROC) analysis (AUC = 0.817, 95% CI 0.647–0.940) in an independent validation cohort ([131]Supplementary Figure S7). CYLD is a novel biomarker reflecting immune dysregulation. 4 Discussion Despite therapeutic advances, post-MI HF remains a critical clinical challenge, with 45% of cases emerging within the first year postevent ([132]Carberry et al., 2024). While early mortality rates have halved since the 1990s, chronic HF development continues to increase mortality risk compared with that of non-HF counterparts ([133]Docherty et al., 2023; [134]Carberry et al., 2024). These findings underscore the importance of early predictive biomarkers. We report that markers of HFs from MIs can be identified and that an HF warning can be issued. Through integrative bioinformatics combining differential expression analysis and WGCNA, we identified 98 candidate genes linked to post-MI HF. By utilizing the machine learning algorithms LASSO and SVM-RFE, we successfully identified hub genes, namely, GPER1, E2F5, DZIP3, CYLD, ADAMTS2, ZNF366, ST14, SNORD28, LHFPL2 and HIVEP2, from a panel of 98 candidate genes. By utilizing a random forest model, which has proven to be highly effective, we were able to successfully predict the progression of MI patients to HF. Notably, the hub genes demonstrated robust predictive accuracy for HF subsequent to MI, maintaining consistent performance across diverse time points. This consistency suggests that these hub genes play critical roles in the underlying biological processes leading to HF development post-MI. However, with increasing time post-MI, the model’s predictions for HF become increasingly less precise. The progression to HF following MI is attributed to a multitude of complex pathological mechanisms, which encompass myocardial cell necrosis, inflammatory responses, fibrosis and the regulation of myocardial repair ([135]Akhtar et al., 2024); therefore, this elevated level of uncertainty can be attributed to multiple factors, such as the intricacy of disease progression, alterations in patient health status over time, and the plausible interplay between diverse genetic and environmental factors. Nevertheless, our findings underscore the potential of hub genes as pivotal diagnostic tools for the early and precise identification of HF in patients with MI. In our pathway enrichment analysis, we examined the alterations in pathways at various time points post-MI and their possible associations with the progression of HF. Notably, the inflammatory response plays a pivotal role in this context. Consistent with the findings of the present study, research has shown that individual variations in the magnitude of the inflammatory response following acute MI, which involves one or multiple inflammatory-modulating pathways, may play pivotal roles in promoting adverse left ventricular remodelling ([136]Prabhu and Frangogiannis, 2016; [137]Kufazvinei et al., 2024). This maladaptive process is closely associated with the progressive development of HF([138]Westman et al., 2016; [139]Del Buono et al., 2022). Following MI, the resulting ischaemic injury elicits the mobilization and influx of a diverse spectrum of innate and adaptive immune cells to the infarcted myocardium, encompassing macrophages, DCs, T cells and various other cellular entities ([140]Kretzschmar et al., 2012; [141]Swirski and Nahrendorf, 2018; [142]Jung et al., 2022; [143]Delgobo et al., 2023). Simultaneously, inflammation plays a pivotal role in the progression of various aetiologies underlying HF([144]Carlisle et al., 2019; [145]Paulus and Zile, 2021). Despite the mixed outcomes observed in clinical trials, the modulation of inflammation continues to hold promise as a target for therapeutic intervention in the management of HF([146]Riehle and Bauersachs, 2019). Thus, it can be inferred that immune cells play crucial roles in the progression of HF following MI. Within the immunological axis, DCs, professional antigen-presenting cells expressing high levels of MHC class II, orchestrate post-MI adaptive immunity through three primary mechanisms: (1) neoantigen presentation via cross-dressing with cardiomyocyte-derived exosomes, (2) cytokine-mediated T-cell polarization, and (3) regulation of tertiary lymphoid structure formation in peri-infarct zones ([147]Balan et al., 2019). Single-cell transcriptomics revealed that DC subpopulation dynamics correlate with HF progression rates, suggesting that DC activation states may serve as early prognostic indicators. Their activation status and functional proficiency within the post-MI milieu may critically influence the trajectory of HF progression. In addition to DCs, naïve CD4^+ T cells are at the heart of adaptive immunity ([148]Luckheeram et al., 2012). Inflammation plays a dual role in regulating the progression of HF post-MI by either promoting or inhibiting the processes of tissue repair and fibrosis in the injured myocardium ([149]Lafuse et al., 2020; [150]Wang et al., 2023). CYLD is a specialized deubiquitinating enzyme with substrate specificity for lysine 63 (K63)- and methionine 1 (M1)-linked ubiquitin chains and exhibits minimal catalytic activity towards canonical K48-linked polyubiquitination marks critical for proteasomal degradation ([151]Sato et al., 2015). Mechanistic studies have established its role as a master regulator of inflammatory signalling through targeted deubiquitination of key adaptor proteins, including NF-κB essential modulator (NEMO/IKKγ), TNF receptor-associated factors (TRAF2/6) and receptor-interacting protein kinase 1 (RIPK1), thereby attenuating both NF-κB and MAPK pathway activation ([152]Brummelkamp et al., 2003; [153]Kovalenko et al., 2003; [154]Trompouki et al., 2003; [155]Wright et al., 2007). Emerging evidence further implicates CYLD in Wnt/β-catenin signalling modulation via dishevelled (DVL) protein deubiquitination, suggesting broad-spectrum regulatory functions across developmental and oncogenic pathways ([156]Tauriello et al., 2010; [157]Fernandez-Majada et al., 2016). This pleiotropic enzyme orchestrates a delicate balance between proinflammatory responses and cellular homeostasis, with dysregulation implicated in pathological processes spanning chronic inflammation, immune dysregulation, and tumorigenesis ([158]Marin-Rubio et al., 2023). The pleiotropic nature of these biomarkers, particularly CYLD’s dual regulation of inflammatory and fibrotic pathways, positions them as promising candidates for both diagnostic stratification and targeted therapeutic intervention. The compartment-specific dysregulation of CYLD, a known regulator of T-cell receptor signalling and NF-κB activation, may underlie the transition from acute inflammatory responses to maladaptive immune remodelling during HF progression; its dual role in cardiomyocyte survival pathways and lymphocyte activation could explain its superior diagnostic performance compared with that of conventional myocardial stress markers. Our findings indicate that the CYLD provides a more accurate prediction of the onset of HF after MI. However, our study has several limitations. Due to the challenges encountered in data collection and the practical difficulties associated with long-term patient follow-up for prognosis, our research results have not been validated in clinical patient samples. Furthermore, the mechanisms underlying the associations between these biomarkers and cellular targets in HF following MI remain to be fully elucidated. Future research should focus on understanding the biological pathways and interactions among these factors, as well as exploring potential therapeutic interventions that target these pathways. 5 Conclusion In conclusion, our integrative multiomics investigation identified CYLD as a critical determinant of post-MI HF pathogenesis, serving a role as a prognostic biomarker. The CYLD-centred 10-gene signature enables early HF risk stratification with >80% accuracy post-MI, outperforming conventional biomarkers. Single-cell resolution reveals the immune-modulatory function of CYLD. The developed nomogram, which integrates CYLD expression, DC activation states and naïve CD4^+ T cell counts, demonstrated superior predictive value. These findings fundamentally shift our understanding of post-MI, positioning CYLD-mediated ubiquitination as a linchpin mechanism governing the transition from MI to HF. Acknowledgments