Abstract Patients with rheumatoid arthritis (RA) have an increased risk of ischemic heart failure (IHF), but the shared mechanisms are unclear. This study analyzed RNA sequencing data from five RA and IHF datasets to identify common biological mechanisms and significant biomarkers. One hundred and seventy-seven common differentially expressed genes (CDEGs) were identified, with enrichment analysis highlighting pathways related to sarcomere organization, ventricular myocardial tissue morphogenesis, chondrocyte differentiation, prolactin signaling, hematopoietic cell lineage, and protein methyltransferases. Five hub genes (CD2, CD3D, CCL5, IL7R, and SPATA18) were identified through protein-protein interaction (PPI) network analysis and machine learning. Co-expression and immune cell infiltration analyses underscored the importance of the inflammatory immune response, with hub genes showing significant correlations with plasma cells, activated CD4^+ T memory cells, monocytes, and T regulatory cells. Single-cell RNA sequencing (scRNA-seq) confirmed hub gene expression primarily in T cells, activated T cells, monocytes, and NK cells. The findings underscore the critical roles of sarcomere organization, prolactin signaling, protein methyltransferase activity, and immune responses in the progression of IHF in RA patients. These insights not only identify valuable biomarkers and therapeutic targets but also offer promising directions for early diagnosis, personalized treatments, and preventive strategies for IHF in the context of RA. Moreover, the results highlight opportunities for repurposing existing drugs and developing new therapeutic interventions, which could reduce the risk of IHF in RA patients and improve their overall prognosis. Keywords: Rheumatoid arthritis, Ischemic heart failure, Microarray, Machine learning, Single-cell 1. Background Rheumatoid arthritis (RA) is an autoimmune disease characterized by peripheral, symmetric arthritis and extra-articular manifestations. As the most common autoimmune disease, RA usually requires lifelong treatment and may leave permanent disability [[37]1]. The risk of cardiovascular disease is significantly increased in the RA population due to the prolonged inflammatory response, as highlighted by the increased incidence of coronary atherosclerotic heart disease and heart failure (HF) in RA patients [[38]2,[39]3]. Extensive cohort studies have demonstrated that individuals with RA face a markedly elevated risk of myocardial infarction, comparable to that observed in diabetic patients [[40][4], [41][5], [42][6]]. Likewise, ischemic heart failure (IHF) frequently arises as a complication of acute coronary syndromes, with its prevalence notably higher among RA patients [[43]2]. As one of the most prevalent forms of heart failure, IHF affects approximately 64.3 million individuals globally [[44]7]. Myocardial necrosis due to ischemia is irreversible, and the risk of death from IHF remains high despite the increasing number of drugs used to treat HF in recent years [[45]8]. RA has been linked to a heightened risk of IHF. A cohort study involving 51,859 RA patients reported a hazard ratio (HR) of 2.28 (95 % CI: 2.06–2.53) for HF development within the first year of diagnosis compared to a control cohort. For RA patients without ischemic heart disease (IHD), the HR for newly diagnosed HF was 1.23 (95 % CI: 1.20–1.25), whereas RA patients with IHD exhibited an HR of 2.06 (95 % CI: 1.90–2.24) for new-onset HF [[46]9]. Another cohort study further revealed that RA patients had a 1.88-fold higher risk of IHF than non-RA individuals [[47]2]. Moreover, the occurrence of HF significantly amplifies the likelihood of subsequent cardiovascular events in RA patients [[48]10]. It is generally believed that the increase in circulating inflammatory products resulting from RA's autoimmune process accelerates atherosclerosis, leading to the development of IHD and subsequent IHF in RA patients [[49]11]. The study conducted by Mantel et al. [[50]2] notably revealed that individuals diagnosed with RA, even in the absence of prior IHD, faced a markedly elevated likelihood of developing HF. This finding implies the involvement of additional pathways in the progression of HF. With the rapid advancement of bioinformatics, a large amount of genetic data has become publicly accessible, providing unprecedented opportunities to explore previously unknown pathophysiological mechanisms and potential connections between different diseases. The processing and interpretation of extensive transcriptomic sequencing data pose significant difficulties for conventional analytical techniques. As an essential aspect of artificial intelligence, machine learning (ML) excels in capturing hierarchical and non-linear relationships within such data. ML is pivotal in identifying genes critical to biological functions using genetic and epigenetic data and supports disease diagnosis [[51]12,[52]13]. In a previous study, Li et al. [[53]14] combined multi-omics data with ML to identify genes such as CXCR2 that may be involved in the pathophysiology of HF. Similarly, research by Tran et al. [[54]15] found that tools developed using ML could predict subtypes of myopathies by mining transcriptomic data. Although the association between RA and the increased risk of IHF susceptibility has not yet been fully elucidated, both conditions, despite involving different physiological systems, exhibit significant immune-inflammatory characteristics. Understanding the molecular mechanisms linking the two diseases could not only provide new avenues for the early diagnosis, personalized treatment, and prevention of IHF but also open potential opportunities for the development of new therapeutic interventions. In this study, ML algorithms are combined with extensive bioinformatics approaches, including immune cell infiltration analysis, to investigate the molecular mechanisms shared by the two diseases. Through the identification of common differentially expressed genes (CDEGs) between RA and IHF and subsequent functional enrichment analyses, this work seeks to shed light on the shared pathogenesis of these conditions. Additionally, core modules within the protein-protein interaction (PPI) network are identified, and ML algorithms are applied to further screen for hub genes with potential diagnostic and therapeutic value. We tried to provide reference and direction for further study of the relationship between RA and IHF through the construction of the “transcription factor (TF)-mRNA-microRNA (miRNA)” network, hub gene co-expression network analysis, and immune cell infiltration correlation analysis. Finally, the cellular localization and differential expression of the hub genes were verified by single-cell analysis. The study flowchart is illustrated in [55]Fig. 1. Fig. 1. [56]Fig. 1 [57]Open in a new tab Study flowchart. RA, rheumatoid arthritis; IHF, ischemic heart failure; GSE, gene expression omnibus series; DEGs, differentially expressed gene; PPI, protein-protein interaction; RF, random forest algorithm; LASSO, the least absolute shrinkage and selection operator logistic regression; ROC, receiver operating characteristic; TF, transcription factor; MicroRNAs, miRNAs. 2. Methods 2.1. Data source Searches were performed in the Gene Expression Omnibus (GEO) database [[58]16] ([59]http://www.ncbi.nlm.nih.gov/geo/) using the search terms “RA” and “IHF” with the following filtering criteria: species source “homo sapiens,” sample source “tissue,” study type “microarray study,” and the dataset should contain both disease samples and the corresponding control samples. To minimize the impact of varying sampling sites on data analysis, we selected microarrays derived from the same tissue source. Compared to blood samples, tissue-specific samples provide a more accurate representation of the unique pathological changes associated with the disease. Ultimately, four datasets were chosen: [60]GSE77298 [[61]17] (synovial tissue samples from 16 RA patients and seven controls) and [62]GSE36700 [[63]18] (synovial tissue samples from seven RA patients and four controls) were utilized for RA expression profiles. For IHF, [64]GSE21610 [[65]19] (left ventricular tissue samples from nine IHF patients and eight controls) and [66]GSE57345 [[67]20] (left ventricular tissue samples from 95 IHF patients and 136 controls) were selected. In the annotation files for each microarray, we selected samples from heart failure of ischemic origin and synovial biopsies from RA for the disease group. This selection ensures that the disease states are significantly representative. We reviewed the annotation files of the datasets and the original literature to ensure that the selected samples met the diagnostic criteria for RA and that the IHF samples had confirmed diagnoses of IHD. [68]GSE77298 and [69]GSE21610 served as training datasets, while [70]GSE36700 and [71]GSE57345 were utilized for external validation. ScRNA-seq data for IHF and healthy controls are from [72]GSE145154 [[73]21]. In this dataset, we selected cardiac cells from the left ventricle of a healthy control and cardiac cells from the infarct zone of three IHF patients for subsequent analysis of scRNA-seq data. [74]Supplementary Table S1 presents information about the datasets used in this study. 2.2. Data processing First, the probe arrays are converted to gene symbols based on the chip's platform annotation, and log2 transformation is applied to matrix files that have not yet been converted. Duplicates are averaged and retained when multiple probes correspond to the same gene. After that, we divided the samples in the dataset into disease and control groups based on the clinical information of the samples. Using the “get_deg_all” function in the “tinyarray” R package. The “limma (voom)" algorithm [[75]22] was chosen for differential analysis to identify the DEGs for RA and IHF, using criteria of a fold change (FC) > 1.5 (log|FC| > 0.58) and P < 0.05. The limma (voom) algorithm enhances the reliability of DEG identification by integrating linear modeling techniques and offering robustness to small sample sizes and outliers. This makes it particularly well-suited for analyses involving relatively few samples [[76]23]. After obtaining the DEGs for RA and IHF, the up-and-down-regulated genes in the two datasets were intersected separately, and Venn diagrams were plotted. The final DEGs obtained were used as CDEGs for both diseases. 2.3. Functional annotation and enrichment analysis of CDEGs ClueGo is a Cytoscape plugin that integrates up-to-date information from GO terms and databases such as KEGG, WikiPathways, and Reactome pathway to support functional annotation of uploaded gene lists and visualization in network format. Notably, ClueGO employs kappa statistics to connect representative term subsets within the network. By computing a similarity matrix adjusted for chance, it evaluates the strength of associations between terms. Finally, an organic layout algorithm is employed to automatically arrange the visualization, facilitating an intuitive understanding of the functional relationships among different enriched entries [[77]24]. We imported the obtained CDEGs into ClueGo, restricted the species to “homo sapiens,” and performed gene ontology enrichment analysis (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analysis. GO includes Biological Processes (BP), Cellular Components (CC), and Molecular Functions (MF) [[78]25]. The results were visualized by merging GO parent-child terms for similarly associated genes, with a screening threshold of P < 0.05. 2.4. PPI network construction and key module identification To explore interactions among CDEGs, they were uploaded to the STRING online platform ([79]https://cn.string-db.org/) [[80]26], with the species limited to “homo sapiens” and confidence scores set above 0.4. A PPI network was generated and subsequently visualized using Cytoscape 3.7.2 [[81]27]. The confidence score indicates the reliability of protein-protein interactions, with higher scores suggesting more trustworthy interactions. Setting the confidence score to an intermediate value (>0.4) strikes a balance between data coverage and accuracy, ensuring the inclusion of a sufficient number of potential interactions without introducing excessive uncertainty. The MCODE plugin in Cytoscape is a clustering algorithm that can quickly find highly interconnected regions in a network and can be used to identify key protein clusters in PPI networks. To remove loosely connected nodes and identify tightly connected protein clusters, we established the following filtering parameters: a degree cutoff of 2, a node density cutoff of 0.1, a node score cutoff of 0.2, a k-core of 2, a maximum depth of 100, and the Haircut option enabled, which focuses on nodes with at least two connections. Cytohubba is a plugin for finding key nodes in the network and supports 11 topological analysis methods to predict key nodes in the network, with the MCC algorithm being more accurate [[82]28]. The MCODE plugin is utilized to identify top-ranked key modules within the PPI network, while the MCC algorithm in Cytohubba is employed to extract the top five key genes. The overlap between the predictions from both plugins is then taken to determine the key module genes in the network. 2.5. Hub gene screening and validation based on ML algorithms ML has been widely applied to the identification of characteristic genes. LASSO regression effectively addresses the issue of multicollinearity among variables, thereby reducing the risk of overfitting. It is one of the widely used ML algorithms [[83]29]. The RF algorithm, by integrating multiple decision trees, also achieves high predictive accuracy. Additionally, by introducing randomness, RF effectively mitigates the risk of overfitting [[84]30]. Both algorithms are suitable for analyzing small sample data. Therefore, this study selects the RF algorithm and LASSO regression as the two ML algorithms to further screen key candidate genes in CDEGs. LASSO regression analysis was performed on the [85]GSE77298 dataset using the ‘glmnet’ R package [[86]31], with 5-fold cross-validation employed to calculate the λ value. Key genes were selected based on the minimum λ. The RF model, built using the ‘randomforest’ R package, involved 500 trees and 5-fold cross-validation. Genes were ranked by importance, and the minimum number of genes associated with the model's output error rate was selected. The genes identified by both algorithms were intersected, and these intersected genes, along with the key module genes from the PPI network, were treated as candidate hub genes. Receiver operating characteristic (ROC) curves were plotted with the “pROC” R package [[87]32] to evaluate the diagnostic performance of candidate hub genes in the training sets. The accuracy of the screening results was validated in independent datasets by verifying the differential expression of candidate hub genes. The results were visualized with the ‘ggplot2′ R package, and genes showing consistent differential expression trends in validation sets were identified as Hub genes. Finally, ROC curves were plotted to assess the diagnostic potential of these Hub genes in the validation group. 2.6. Construction of hub gene co-expression network and TF-mRNA-miRNA regulatory network Hub genes were analyzed for co-expression and functional predictions using the GeneMANIA platform [[88]33] ([89]http://genemania.org). Predicted genes underwent KEGG pathway analysis via ClueGo to identify signaling pathways influenced by the hub gene co-expression network. TFs regulating hub genes were predicted using the TRRUST database ([90]https://ngdc.cncb.ac.cn/) [[91]34] and JASPAR ([92]http://jaspar.genereg.net/) [[93]35]. The miRwalk database ([94]http://mirwalk.umm.uni-heidelberg.de/) [[95]36] was employed to predict miRNAs targeting hub genes. Cytoscape was utilized to visualize the “TF-mRNA-miRNA” regulatory network. Additionally, potential drug targets for hub genes were identified using the DGIDB 3.0 database ([96]http://www.dgidb.org/) [[97]37], which provided summaries of known or predicted drug-gene interactions. Binding scores between genes and drugs were calculated to predict potential therapeutics. 2.7. Analysis of the correlation between hub genes and immune cell infiltration Analysis of hub gene correlation with immune cell infiltration was performed using CIBERSORT, which employs a deconvolution algorithm to estimate immune cell proportions from gene expression profiles [[98]38]. The relative proportions of 22 immune cell types in the training set were quantified using the “CIBERSORT” algorithm from the “IOBR” R package [[99]39], and differences between control and disease groups were evaluated. Pearson correlation coefficients were used to assess relationships between hub gene expression and the proportions of immune cells, ensuring optimal statistical precision. Normality tests were initially performed on the two datasets to confirm adherence to the assumptions of Pearson correlation analysis. Pearson correlation coefficients were then calculated using the “cor” function in R, with statistical significance defined as a P-value below 0.05. The results were finally visualized using the “ggplot2” package. 2.8. Single-cell data processing and clustering The raw data from [100]GSE145154 [[101]21] was obtained from the GEO database, converted to sparse matrix format using R studio (version 4.2.2), and analyzed with the “Seurat” R package (version 4.1.3) [[102]40]. Quality control for scRNA-seq data followed these criteria: (1) excluding cells with fewer than 500 genes (UMI >0), (2) removing cells with UMI counts below 800 or exceeding 8000, and (3) eliminating cells with mitochondrial UMI counts above 10 % [[103]21]. The data underwent log-normalization, and 2000 highly variable genes were identified using the “vst” method. These genes were scaled with the “ScaleData” function and dimensionally reduced via the “RunPCA” function. Appropriate PCs were selected using the “ElbowPlot” function, followed by constructing a shared nearest neighbor (SNN) graph of the top 30 PCs with the “FindNeighbors” function and clustering cells using the “FindClusters” function (Resolution = 0.5). The “RunUMAP” function was employed for visualization. Cell subpopulation annotations were derived from RAO et al. [[104]21] and supplemented with the Cellmaker 2.0 database ([105]http://CellMarker/index.html) [[106]41]. Specifically, the marker genes for monocytes are CD14, CD31, CEBPB, FCGR3A; T cells are CD3D; activated T cells are CCR7; NK cells are PRF1, GZMB, GZMK; bone marrow cells are LYZ; endothelial cells are FABP4; B cells are CD79A; and fibroblasts are LUM. The annotation was validated by cross-referencing the identified marker genes with known cell type-specific markers in the Cellmaker 2.0 database. Differential expression analysis was conducted to verify the uniqueness of each cell subpopulation. The “VlnPlot” function was employed to visualize the expression patterns and locations of hub genes across different cell types. DEGs between control and IHF patients were identified for each cell type using the “FindAllMakers” function with parameters set to “min. pct” = 0.25, “logfc. pct” = 0.25, and corrected p-value <0.05. This also determined the differential expression of hub genes within various cell subpopulations. 3. Results 3.1. Identification of CDEGs Following microarray normalization, 3466 DEGs were identified in the RA group (1501 up-regulated and 1965 down-regulated), while 1898 DEGs were identified in the IHF group (849 up-regulated and 1049 down-regulated) ([107]Fig. 2A and B). By intersecting up-regulated and down-regulated genes separately, 177 CDEGs were obtained, including 63 up-regulated and 114 down-regulated genes ([108]Fig. 2C). Fig. 2. [109]Fig. 2 [110]Open in a new tab Screening and enrichment analysis of CDEGs. (A) Cluster heat map, principal component analysis map, and DEGs volcano map of [111]GSE77298. (B) Cluster heat map, principal component analysis map, and DEGs volcano map of [112]GSE21610. (C) Venn diagram of DEGs. (D) The plot of enrichment analysis results for CDEGs. Different colors represent different clusters. C is the control group. P is the disease group. 3.2. Functional annotation and enrichment analysis of CDEGs ClueGo's enrichment analysis showed that 11 GO terms were enriched, including 9 BP and 2 CC terms, and 2 KEGG pathways ([113]Supplementary Table S2). The 13 enrichment results were divided into six groups according to Cohen's Kappa score. As shown in [114]Fig. 2D, different colors represent different groupings, and the entries with the most significant p-values in each group are bolded in the figure. The main enriched GO entries were “GO:0008276∼protein methyltransferase activity (P = 0.004)”, “GO:0055010∼ventricular cardiac muscle tissue morphogenesis (P = 0.004),” “GO:0045214∼sarcomere organization (P < 0.001),” “GO:0055001∼muscle cell development (P < 0.001)”. KEGG pathway as “KEGG:04640∼Hematopoietic cell lineage (P = 0.004)” and “KEGG:04917∼Prolactin signaling pathway (P = 0.011).” These pathways are proposed to be associated with the shared pathogenesis of RA and IHF. 3.3. PPI network construction and key module identification After constructing the PPI network of CDEGs and excluding genes that were not interconnected, the network contained 73 genes with interactions ([115]Fig. 3A). The results of the MCODE cluster analysis show that a module containing 12 edges and 4 nodes has the highest score (score = 4) ([116]Fig. 3B). These four genes are CD2 molecule (CD2), CD3 delta subunit of T-cell receptor complex (CD3D), interleukin 7 receptor (IL7R), and C-C motif chemokine ligand 5 (CCL5). Cytohubba obtained five genes, which were CD2, CD3D, IL7R, CCL5, and cofilin 2 (CFL2) ([117]Fig. 3C). Highly similar results were observed, so CD2, CD3D, IL7R, and CCL5 were included as key module genes. Fig. 3. [118]Fig. 3 [119]Open in a new tab PPI network and key module genes of CDEGs. (A) PPI networks for CDEGs. (B) MCODE clustering results. (C) cytohubba clustering results. 3.4. Hub gene screening and validation based on machine learning algorithms We further utilized two ML algorithms to screen key candidate genes among CDEGs. The results of LASSO regression indicated that when λmin was 0.041, there were 12 coefficients with non-zero values ([120]Fig. 4A). The results of RF showed that the minimum model output error rate was 0.130, corresponding to 33 selected genes ([121]Fig. 4B and C). A total of 18 genes were predicted by LASSO and 33 by RF, with eight overlapping genes identified at the intersection, namely spermatogenesis associated 18 (SPATA18), chromodomain helicase DNA binding protein 3 (CHD3), heparan sulfate-glucosamine 3-sulfotransferase 3A1 (HS3ST3A1), phosphatidylinositol specific phospholipase C X domain containing 1 (PLCXD1), chromosome 4 open reading frame 54 (C4orf54), long intergenic non-protein coding RNA 1165 (LINC01165), calponin 1 (CNN1), and MMEL1 antisense RNA 1 (MMEL1-AS1) ([122]Fig. 4D and [123]Supplementary Table S3). These eight genes and the four key module genes CD2, CD3D, IL7R, and CCL5 obtained from PPI network screening were combined as candidate hub genes, and ROC curves were drawn to assess their predictive power on [124]GSE77298 and [125]GSE21610. Analysis revealed that all 12 genes exhibited strong diagnostic performance in the training set, with an AUC exceeding 0.7 for each gene ([126]Supplementary Fig. S1). To assess the generalizability of these findings, the differential expression of these 12 genes was further examined in the validation set, and ROC curves were generated. Results indicated that CD2, SPATA18, CD3D, CCL5, and IL7R showed significantly different expression levels between disease and control groups in the validation set ([127]Supplementary Figure S2 A,B). Therefore, we identified these five genes as the final hub genes. The ROC curves showed good diagnostic results for all hub genes except for IL7R in [128]GSE57345 where the AUC was less than 0.6 ([129]Supplementary Figure S2 C,D). This suggests that the hub gene has the potential to be a marker gene. Fig. 4. [130]Fig. 4 [131]Open in a new tab Machine learning-based screening of hub genes. (A) Screening of hub genes using the LASSO regression algorithm. (B) Screening of hub genes using the RF algorithm. (C) Importance score graph of the top 20 genes predicted by the RF algorithm. (D) Venn diagram of intersecting genes predicted by LASSO regression and RF algorithm. 3.5. Construction of hub gene co-expression network and TF-mRNA-miRNA regulatory network GeneMANIA predicted 20 genes linked to the hub gene, along with 789 interaction links, highlighting a complex network between the hub gene and these associated genes. Enrichment analysis revealed processes related to cytokine activity, cellular response to chemokines, cytokine receptor binding, T cell differentiation, and lymphocyte migration. KEGG analysis indicated that enrichment was primarily focused on pathways such as cytokine-cytokine receptor interaction, chemokine signaling, hematopoietic cell lineage, T cell receptor signaling, as well as Th1, Th2, and Th17 cell differentiation ([132]Fig. 5A and [133]Supplementary Fig. S3). The “TF-mRNA-miRNA” regulatory network comprised 52 TFs regulating hub genes and 23 miRNAs targeted by hub genes, involving 75 interactions. Notably, several TFs in the network such as signal transducer and activator of transcription 3 (STAT3), GATA binding protein 2 (GATA2), forkhead box C1 (FOXC1), and paired box 2 (PAX2), can regulate multiple hub genes simultaneously ([134]Fig. 5B). DGIDB predictions show that ALEFACEPT [[135]42] and ALDESLEUKIN [[136]43] are FDA-approved drugs targeting CD2, while MUROMONAB-CD3 [[137]44] is an approved drug targeting CD3D. Fig. 5. [138]Fig. 5 [139]Open in a new tab (A) Co-expression network analysis of hub genes. (B) The “TF-mRNA-miRNA” regulatory network of hub genes. Hub genes are marked with gray circles; TFs are marked with blue octagons; miRNAs are marked with orange diamonds. 3.6. Analysis of the correlation between hub genes and immune cell infiltration Enrichment results of hub gene co-expression analysis and the hub genes themselves highlight the importance of immune and inflammatory responses. Immune cell infiltration analysis was conducted separately on the expression profiles of RA and IHF ([140]Fig. 6A–C), followed by an evaluation of the correlation between hub gene expression and immune cells. As shown in [141]Fig. 6B, in the RA group, patients with RA had a higher proportion of plasma cells, macrophages M0, and a lower proportion of B memory cells, activated NK cells, monocytes, resting dendritic cells, and resting mast cells compared with controls (all P values < 0.05). In the IHF group, patients with IHF had a higher proportion of resting CD4^+ T memory cells and a lower proportion of monocytes and macrophages M2 than controls (all P values < 0.05) ([142]Fig. 6D). Correlation analysis revealed that in the RA group, CD2, CD3D, CCL5, and IL7R expression was positively associated with plasma cells, activated CD4^+ T memory cells, and γδ T cells (P < 0.05) while negatively associated with resting mast cells, monocytes, activated NK cells, and T regulatory cells (Treg) (P < 0.05) ([143]Fig. 6E). In the IHF group, these genes were positively correlated with plasma cells, activated CD4^+ T memory cells, and macrophages M1 (P < 0.05) and negatively correlated with monocytes, naive CD4^+ T cells, and T regulatory cells (P < 0.05) ([144]Fig. 6F). These findings indicate that the interactions between hub genes and plasma cells, activated CD4^+ T memory cells, monocytes, and T regulatory cells may contribute to the development of IHF in RA patients. Fig. 6. [145]Fig. 6 [146]Open in a new tab Immune cell infiltration analysis. (A) Relative proportions of 22 immune cell subpopulations in the [147]GSE77298 dataset. (B) Comparison of immune cell ratios between RA and disease groups in the [148]GSE77298 dataset. (C) Relative proportions of 22 immune cell subpopulations in the [149]GSE21610 dataset. (D) Comparison of immune cell ratios between IHF and disease groups in the [150]GSE21610 dataset. (E) Heat map of the correlation between hub genes and immune cells in the [151]GSE77298 dataset. (F) Heat map of the correlation between hub genes and immune cells in the [152]GSE21610 dataset. C: control group, P: disease group. ∗, p < 0.05; ∗∗, p < 0.01; ∗∗∗, p < 0.001. 3.7. Single-cell analysis for the location of hub genes To understand the localization of hub genes in different cell subpopulations and their differential expression, we analyzed scRNA-seq data containing control and IHF groups. After completing quality control, we clustered 13,832 cell samples into 19 clusters. We then manually annotated the clusters, ultimately annotating the 19 clusters into eight classes of cell subpopulations (6377 monocytes, 3371 T cells, 1380 activated T cells, 1403 NK cells, 869 myeloid cells, 204 endothelial cells, 180 B cells, and 48 fibroblasts) ([153]Fig. 7A). Afterward, we visualized the localization of the hub gene in control and disease samples. The analysis revealed that CD2, CD3D, CCL5, and IL7R were predominantly expressed in T cells, activated T cells, monocytes, NK cells, and endothelial cells ([154]Fig. 7B), whereas SPATA18 showed no significant expression in these cells. Differential expression analysis indicated that CD2, CD3D, CCL5, and IL7R were expressed at significantly higher levels in T cells, activated T cells, monocytes, and NK cells compared to controls (adj P value < 0.05) ([155]Supplementary Table S4). These findings align with immune cell infiltration results, underscoring a potential link between CD2, CD3D, CCL5, and IL7R and immune cells, particularly T cells, in the pathogenesis of RA combined with IHF. Fig. 7. [156]Fig. 7 [157]Open in a new tab (A) UMAP plots of the cell types in the control and IHF groups. (B) Visualization of hub genes expression in control and IHF patients. The yellow dots indicate gene expression. 4. Discussion Patients with RA have an increased risk of developing coronary artery disease and HF [[158]45,[159]46]. Among the different subtypes of HF, IHF is directly associated with coronary arteriosclerotic heart disease [[160]47]. Symptoms such as limitation of movement due to joint deformity and pain in RA patients mask their HF symptoms to some extent, which leads to a possible underdiagnosis of HF in RA patients [[161]48]. In addition, the study found a significantly higher risk of unrecognized myocardial infarction in the RA population [[162]3]. Therefore, there is a need to raise awareness of RA combined with IHF. Unfortunately, the exact mechanisms that precede RA and IHF have not been fully elucidated. The development of microarray technology offers the possibility to identify DEGs in different disease states and to understand the complex biological processes behind the genes [[163]49]. To our knowledge, there are no reported studies on the relationship between RA and IHF at the genetic level using bioinformatics. This study represents the first attempt to integrate gene datasets from two diseases to uncover shared mechanisms and identify potential biomarkers. We obtained 177 CDEGs for RA and IHF. Functional enrichment analysis revealed that the enrichment results could be divided into six clusters, three of which were linked to each other: sarcomere organization-related pathways, ventricular myocardial tissue morphogenesis, and chondrocyte differentiation. In addition, the prolactin (PRL) signaling pathway, hematopoietic cell lineage, and protein methyltransferase activity were significantly enriched. Acute and chronic myocardial ischemia leads to massive myocardial cell necrosis when myocardial cells become thin and longitudinally elongated, and myocardial contractility decreases. Thus, IHF is characterized by systolic dysfunction [[164]50]. The sarcomere is the basic unit of myocardial contraction and consists of thick and thin myofilaments. Myocardial contractility is primarily driven by the interaction between myosin heads (thick filaments) and actin (thin filaments) [[165]51]. In RA, inflammatory mediators accumulate in the synovial joints, leading to chondrocyte dysfunction. Additionally, RA chondrocytes function as effector cells, undergoing changes that directly or indirectly contribute to joint damage in RA [[166]52]. The PRL signaling pathway and hematopoietic cell lineage are closely associated with the inflammatory immune response. Serum PRL levels were elevated in RA patients compared to healthy individuals [[167]53]. Adán et al. [[168]54] found that PRL can inhibit chondrocyte apoptosis through tumor protein p53 (TP53) and the Janus kinase 2 (JAK2)/STAT3 pathway, showing therapeutic potential for RA joint inflammation. A positive correlation has been identified between serum PRL levels and immune-inflammatory factors, including interleukin 6 (IL-6), interleukin 10 (IL-10), tumor necrosis factor-alpha (TNF-α), and intercellular adhesion molecule 1 (ICAM1), in patients with congestive HF. Additionally, serum PRL levels serve as an independent predictor of mortality or hospitalization in patients with advanced congestive HF [[169]55]. This suggests that high PRL levels in vivo during RA may be associated with developing IHF. Protein methyltransferases regulate gene expression by targeting histone molecules' arginine and lysine side chains. As a significant player in transcriptional regulation, protein methyltransferases are extensively involved in many processes, including differentiation, proliferation, and apoptosis of cardiac cells [[170]56,[171]57]. Our results suggest that the above pathways may be a potential mechanism for the development of IHF in RA patients. After exploring the common biological pathways of the two diseases, we further focused on identifying promising biomarkers. By screening the key modules in the PPI network and further ML algorithms to identify key genes, we finally obtained five hub genes, CD2, SPATA18, CD3D, CCL5, and IL7R. They were significantly differentially expressed in the training and validation sets and showed good diagnostic value. We found that, compared to the results from ML screening, the PPI network identified more hub genes. This suggests that the screening results based on protein-protein interactions may better reflect the crosstalk and common mechanisms between the two diseases, whereas ML algorithms focus more on the specific expression characteristics of individual genes. Previous studies have shown that CD2, IL-7R, and CD3D are involved to varying degrees in the onset and progression of RA, with their functions primarily related to regulating immune responses in RA [[172][58], [173][59], [174][60]]. However, there is limited research on these genes in IHF. Nonetheless, their importance in immune regulation suggests they may participate in the development of IHF by modulating cardiac immune responses. CD3D, a part of the T-cell receptor/CD3 complex (TCR/CD3 complex), is involved in T-cell development and signal transduction [[175]61]. Research suggests that CD3D could be a promising biomarker for early RA diagnosis, especially in patients negative for anti-citrullinated protein antibodies [[176]58]. Currently, substantial evidence suggests that CD3D could be a potential therapeutic target in immunotherapy [[177]62,[178]63]. Li et al. [[179]64] also identified CD3D as a promising RA therapeutic target using a module-based cumulative score method. CD2, a transmembrane glycoprotein belonging to the immunoglobulin superfamily, is broadly expressed on T cells, NK cells, thymocytes, and dendritic cells [[180]65]. Research indicates that CD2 enhances T-cell activation via co-stimulation of the TCR/CD3 complex [[181]66]. Two biologics targeting CD2, Alefacept and Siplizumab, have been used clinically. Clinical trials have demonstrated that Alefacept significantly improves the symptoms of psoriatic arthritis by inhibiting T-cell activation [[182]67]. Notably, CD2 expression is markedly elevated in the synovial tissues of RA patients compared to healthy synovia [[183]59], with CD2 receptors (CD2R) also showing high expression levels in the peripheral blood of RA patients [[184]68]. This suggests that Alefacept may have potential therapeutic applications in RA treatment. The IL-7R is a heterodimeric complex. Given the adverse effects of steroid therapy, such as metabolic disorders, secondary osteoporosis, and steroid resistance, increasing attention has been drawn to IL-7R antibody therapies as promising targets for autoimmune diseases [[185]69]. Studies have shown that IL-7R-positive lymphocytes can evade steroid-induced apoptosis, making IL-7R-targeted therapies a potential approach for overcoming steroid resistance, thus showing promise in RA treatment [[186]70]. Evidence suggests that IL-7R antibody treatment can alleviate RA disease activity by inhibiting the proliferation of collagen type II (CII)-specific CD4^+ T cells, reducing Th1 cell differentiation, and preventing the migration of CD4^+ T cells to joint damage sites [[187]60]. Inhibiting the IL-7/IL-7R pathway to regulate joint myeloid cell migration, pro-inflammatory reprogramming, and T effector cell polarization during angiogenesis offers a promising therapeutic approach for RA [[188]71]. This indicates that IL-7R-targeted therapy may also hold significant potential in preventing IHF in RA patients. CCL5 may be a promising target that drives both IHF and RA. CCL5 is essential for recruiting T cells, eosinophils, and basophils to inflammatory sites. Produced by circulating T cells, it actively enhances T cell chemotactic activity in RA [[189]72]. Studies have demonstrated elevated CCL5 levels in synovial tissue and fluid, with suppressing its transcriptional expression shown to alleviate RA progression [[190]73,[191]74]. Notably, CCR5 antagonists, targeting the receptor for CCL5, have been successfully utilized in HIV treatment [[192]75]. A Phase II clinical trial is currently underway to evaluate whether the CCR5 antagonist GSK706769 can maintain the clinical remission established by Enbrel in RA patients after Enbrel discontinuation. Notably, numerous studies have identified CCL5 as a potential biomarker for HF [[193]76,[194]77]. Montecucco et al. [[195]78] found that suppressing CCL5 expression significantly reduced infarct size and post-infarction HF in a chronic myocardial ischemia mouse model, a mechanism linked to reduced leukocyte recruitment. These findings suggest that CCL5-targeted therapies may be worth investigating for treating IHF in RA patients. SPATA18 is essential in maintaining mitochondrial mass by inducing lysosome-like cells in mitochondria to scavenge oxidized mitochondrial proteins [[196]79]. Currently, research on SPATA18 is mainly focused on oncology, and its relationship with RA and IHF requires further study. The results of multi-stage enrichment analysis and studies on Hub genes have highlighted the importance of inflammatory immune responses. To further explore the relationship between hub genes and immune cells, immune cell infiltration analysis was conducted. Previous studies have documented immune cell infiltration in RA and IHF [[197]80,[198]81]. Our analysis focused on the association between hub genes and immune cells in these conditions. The results revealed significant correlations between hub genes and plasma cells, activated CD4^+ T memory cells, monocytes, and T regulatory cells in both datasets. Hub genes showed positive associations with plasma cell and CD4^+ T memory cell activation but negative associations with Tregs and monocytes, indicating their potential involvement in the co-occurrence of RA and IHF. Plasma cells, which differentiate from B cells, primarily secrete autoantibodies involved in the pathogenesis of RA, such as rheumatoid factor (RF) and anti-citrullinated protein antibodies (ACPA) [[199]82]. Immune complexes formed by RF or ACPA can lead to joint destruction in RA patients through multiple mechanisms, including complement activation and osteoclast differentiation [[200]83]. Studies in humans have shown widespread deposition of immunoglobulin (IgG) in the myocardium of patients with end-stage heart failure with reduced ejection fraction (HFrEF) compared to controls [[201]84]. Although it remains unclear whether these antibodies play a direct pathogenic role, it has been observed in a coronary artery ligation-induced myocardial ischemia mouse model that IgG-deficient mice exhibited less severe myocardial injury [[202]85]. Furthermore, Hasham et al. [[203]86] found that antibody-dependent mechanisms contribute, at least in part, to lupus-induced myocarditis. Clinically, serum RF in RA patients has been linked to a higher risk of HF [[204]87], indicating that plasma cells could be a key mechanism underlying the increased susceptibility to IHF in RA patients. The observed positive correlation between hub genes and plasma cells highlights potential avenues for immunotherapeutic strategies that target plasma cells by modulating hub genes. Memory T cells are critical for maintaining immune memory. In RA, the sublining layer of the synovium is an important structure and one of the primary sites of inflammatory responses [[205]88]. Memory T cells are the predominant cell type in the sublining layer. These cells either infiltrate tissues diffusely or form ectopic germinal centers, ultimately leading to the production of large quantities of antibodies, such as RF or ACPA, which exacerbate RA progression [[206]89]. Studies have identified c-Met + memory T cells, expressing the c-Met receptor, as having a preferential ability to migrate to cardiac tissue during inflammation, exacerbating inflammatory responses [[207]90]. In mouse heart failure models, memory CD4^+ T cells have been linked to left ventricular dysfunction, fibrosis, and myocardial hypertrophy [[208]91]. The differentiation and maintenance of memory cells rely on cytokines, with IL-7 playing a key role in preserving the homeostasis of memory CD4^+ T cells [[209]92,[210]93], which may explain the observed positive correlation between IL-7R and memory cells. Tregs are essential for suppressing excessive immune responses and ensuring immune homeostasis. In RA patients, Treg levels are reduced [[211]94], potentially due to the chronic inflammatory state in RA that inhibits Treg formation and Foxp3 expression [[212]95]. A reduction in Treg numbers has also been reported in the peripheral blood of IHF patients [[213]96]. Impaired Treg cell numbers or functions amplify immune responses, leading to increased pro-inflammatory cytokine release and worsening RA [[214]97]. Hammer et al. [[215]98] reported a negative correlation between low circulating Treg proportions in IHF patients and mortality. Additionally, studies have shown that in vivo Treg expansion can mitigate adverse remodeling and enhance cardiac function after myocardial infarction by suppressing apoptosis, neutrophil, macrophage, and lymphocyte infiltration, and reducing inflammatory factors like TNF-α and interleukin 1 beta (IL1B) [[216]99]. The negative correlation between Hub genes and Tregs suggests that high expression of these genes may be associated with a reduction in Tregs. This implies that Hub genes may play a role in the immune inflammatory response during disease progression, affecting the generation and function of Tregs, and consequently leading to their decreased numbers. This finding provides a significant direction for future cellular studies targeting Hub genes. During inflammation, monocytes in the blood are recruited and differentiated into macrophages and dendritic cells. Recent studies have found that monocytes are essential in differentiating into osteoclasts and causing subsequent bone erosion in RA patients [[217]100]. The role of monocytes/macrophages in atherosclerotic plaque formation has been widely emphasized [[218]101]. In addition, during ischemia-induced myocardial injury, a sustained inflammatory response and infiltration of M1-type macrophages also lead to apoptosis, abnormal ventricular remodeling, and impaired systolic function [[219]102]. Since the CIBSORT algorithm is analyzed to derive the relative proportions of immune cells in different samples, we believe that the negative correlation between hub genes and monocytes is caused by the massive differentiation of monocytes into macrophages and osteoclasts. The single-cell analysis for the hub gene echoed the results of immune cell infiltration, again highlighting the crosstalk between the hub gene and immune cells. These results indicate that immune cells may play a role in the development of IHF in RA patients. In addition, we constructed a “TF-mRNA-miRNA” regulatory network of the hub gene and predicted therapeutic agents targeting the hub gene, which may provide a basis for further studies in the future. Notably, TFs such as STAT3, GATA2, FOXC1, and RELA can regulate more than two hub genes at the same time. This suggests that the role of the above TFs in the common mechanism of RA and IHF may be of interest. This study has certain limitations, primarily the reliance on data from the GEO database. Despite applying strict screening criteria and validating results with external datasets, the samples were derived from diverse regions and genetic backgrounds. Additionally, RA and IHF are highly heterogeneous diseases, and individual variations among patients may have influenced the results. Batch effects across datasets, sample size imbalances, and differences in age, sex, and environmental factors between the disease and control groups may also have introduced potential confounding biases. Due to data limitations, we were unable to fully adjust for these confounders, which could be addressed in future studies by incorporating larger sample sizes, multi-dimensional data, and integrating imaging data with deep learning techniques. Second, to avoid missing key but less significant genes, we set relatively lenient criteria for DEG selection. However, such an approach may have resulted in some false-positive findings. Additionally, in our immune cell infiltration analysis, the RA and IHF samples we selected came from different biological environments and tissue sources, potentially containing varied types and proportions of cellular components. Since CIBERSORT relies on gene expression data to infer immune cell proportions, the heterogeneity of samples from different sources could affect the accuracy of the results [[220]103]. Furthermore, systematic bias and technical variation in gene expression data across samples are inevitable, and even with standardization, heterogeneity may persist, impacting the accurate estimation of immune cell proportions [[221]104]. Thus, future validation using flow cytometry or mass cytometry is necessary. Third, studies have shown that heart failure with preserved ejection fraction (HFpEF) accounts for a considerable proportion of RA combined with HF [[222]48] and seems to be more closely associated with inflammatory responses compared to HFrEF [[223]105]. However, due to the lack of publicly available chip data for HFpEF patients, assessing the potential relationship between HFpEF and RA is equally important. Further elucidation of the molecular network changes in the RA-IHD-IHF pathway is crucial for revealing the complex relationships and potential mechanisms among these diseases. Lastly, while the identified Hub genes demonstrated high diagnostic value, the model's training and validation were based on small sample sizes and single-dimensional data. Therefore, the generalizability of the predictive results needs to be confirmed with future large-sample, multi-dimensional studies. Additionally, validation with in vivo and in vitro experiments is essential. Based on the comparison and discussion of existing research findings, our preliminary bioinformatics analysis suggests several potential directions for future in-depth studies on RA and IHF. Firstly, future research should focus on validating these biomarkers in clinical settings to determine their practical utility in diagnosing or treating RA and IHF. Additionally, exploring the specific molecular pathways regulated by these genes could provide deeper insights into the connections between the two diseases. Our study highlights CCL5 and the prolactin signaling pathway as the most promising biological targets and pathways. The significance of CCL5 in myocardial ischemia and RA suggests its potential as a common therapeutic target. The prolactin signaling pathway, closely associated with the immune inflammatory responses in RA and IHF, may offer therapeutic benefits when modulated. Immune cell infiltration and single-cell analysis further suggest that plasma cells, memory T cells, Tregs, and monocytes are promising target cell types. Lastly, the potential application of drugs predicted through target prediction, such as Alefacept and Aldesleukin, in RA or IHF could be a direction for future research. By integrating clinical and multi-omics data in the future, new directions for personalized treatment of RA and IHF comorbidity are expected to emerge. 5. Conclusions In summary, we spearheaded a bioinformatics study targeting the common patho-genesis and screening of key targets in RA and IHF. Our results suggest that sarcomere organization, PRL-related pathways, protein methyltransferase activity, and activation of immune-inflammatory responses may be involved in developing IHF in RA patients. Similarly, we propose five biomarkers with potential diagnostic and therapeutic value, CD2, SPATA18, CD3D, CCL5, and IL7R. These may provide further insight into the relationship between RA and IHF. These findings provide further insights into the relationship between RA and IHF. CRediT authorship contribution statement Ziyi Sun: Writing – review & editing, Writing – original draft, Visualization, Formal analysis, Data curation, Conceptualization. Jianguo Lin: Data curation, Conceptualization. Xiaoning Sun: Formal analysis, Data curation. Zhangjun Yun: Formal analysis, Data curation. Xiaoxiao Zhang: Formal analysis, Data curation. Siyu Xu: Formal analysis, Data curation. Jinlong Duan: Writing – review & editing. Kuiwu Yao: Writing – review & editing, Writing – original draft, Supervision, Project administration, Funding acquisition, Conceptualization. Ethics approval and consent to participate Review and/or approval by an ethics committee was not needed for this study. As our data originates from the GEO database, a public repository, all patients included in the database have received ethical approval. Researchers can freely access and utilize this data for studies and publications. Since our study relies on open-source data, there are no ethical concerns or conflicts of interest. Consent for publication Not applicable. Availability of data and materials The gene datasets used in this study, including [224]GSE77298, [225]GSE36700, [226]GSE21610, [227]GSE57345, and [228]GSE145154, are available in the Gene Expression Omnibus ([229]https://www.ncbi.nlm.nih.gov/geo/). Funding This study was supported by the General Program of the National Natural Science Foundation of China (81473466, 81873173) and the National Key Research and Development Program of China (2019YFC1708703). Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements