Abstract

   Patients with rheumatoid arthritis (RA) have an increased risk of
   ischemic heart failure (IHF), but the shared mechanisms are unclear.
   This study analyzed RNA sequencing data from five RA and IHF datasets
   to identify common biological mechanisms and significant biomarkers.
   One hundred and seventy-seven common differentially expressed genes
   (CDEGs) were identified, with enrichment analysis highlighting pathways
   related to sarcomere organization, ventricular myocardial tissue
   morphogenesis, chondrocyte differentiation, prolactin signaling,
   hematopoietic cell lineage, and protein methyltransferases. Five hub
   genes (CD2, CD3D, CCL5, IL7R, and SPATA18) were identified through
   protein-protein interaction (PPI) network analysis and machine
   learning. Co-expression and immune cell infiltration analyses
   underscored the importance of the inflammatory immune response, with
   hub genes showing significant correlations with plasma cells, activated
   CD4^+ T memory cells, monocytes, and T regulatory cells. Single-cell
   RNA sequencing (scRNA-seq) confirmed hub gene expression primarily in T
   cells, activated T cells, monocytes, and NK cells. The findings
   underscore the critical roles of sarcomere organization, prolactin
   signaling, protein methyltransferase activity, and immune responses in
   the progression of IHF in RA patients. These insights not only identify
   valuable biomarkers and therapeutic targets but also offer promising
   directions for early diagnosis, personalized treatments, and preventive
   strategies for IHF in the context of RA. Moreover, the results
   highlight opportunities for repurposing existing drugs and developing
   new therapeutic interventions, which could reduce the risk of IHF in RA
   patients and improve their overall prognosis.

   Keywords: Rheumatoid arthritis, Ischemic heart failure, Microarray,
   Machine learning, Single-cell

1. Background

   Rheumatoid arthritis (RA) is an autoimmune disease characterized by
   peripheral, symmetric arthritis and extra-articular manifestations. As
   the most common autoimmune disease, RA usually requires lifelong
   treatment and may leave permanent disability [[37]1]. The risk of
   cardiovascular disease is significantly increased in the RA population
   due to the prolonged inflammatory response, as highlighted by the
   increased incidence of coronary atherosclerotic heart disease and heart
   failure (HF) in RA patients [[38]2,[39]3]. Extensive cohort studies
   have demonstrated that individuals with RA face a markedly elevated
   risk of myocardial infarction, comparable to that observed in diabetic
   patients [[40][4], [41][5], [42][6]]. Likewise, ischemic heart failure
   (IHF) frequently arises as a complication of acute coronary syndromes,
   with its prevalence notably higher among RA patients [[43]2]. As one of
   the most prevalent forms of heart failure, IHF affects approximately
   64.3 million individuals globally [[44]7]. Myocardial necrosis due to
   ischemia is irreversible, and the risk of death from IHF remains high
   despite the increasing number of drugs used to treat HF in recent years
   [[45]8].

   RA has been linked to a heightened risk of IHF. A cohort study
   involving 51,859 RA patients reported a hazard ratio (HR) of 2.28 (95 %
   CI: 2.06–2.53) for HF development within the first year of diagnosis
   compared to a control cohort. For RA patients without ischemic heart
   disease (IHD), the HR for newly diagnosed HF was 1.23 (95 % CI:
   1.20–1.25), whereas RA patients with IHD exhibited an HR of 2.06 (95 %
   CI: 1.90–2.24) for new-onset HF [[46]9]. Another cohort study further
   revealed that RA patients had a 1.88-fold higher risk of IHF than
   non-RA individuals [[47]2]. Moreover, the occurrence of HF
   significantly amplifies the likelihood of subsequent cardiovascular
   events in RA patients [[48]10]. It is generally believed that the
   increase in circulating inflammatory products resulting from RA's
   autoimmune process accelerates atherosclerosis, leading to the
   development of IHD and subsequent IHF in RA patients [[49]11]. The
   study conducted by Mantel et al. [[50]2] notably revealed that
   individuals diagnosed with RA, even in the absence of prior IHD, faced
   a markedly elevated likelihood of developing HF. This finding implies
   the involvement of additional pathways in the progression of HF.

   With the rapid advancement of bioinformatics, a large amount of genetic
   data has become publicly accessible, providing unprecedented
   opportunities to explore previously unknown pathophysiological
   mechanisms and potential connections between different diseases. The
   processing and interpretation of extensive transcriptomic sequencing
   data pose significant difficulties for conventional analytical
   techniques. As an essential aspect of artificial intelligence, machine
   learning (ML) excels in capturing hierarchical and non-linear
   relationships within such data. ML is pivotal in identifying genes
   critical to biological functions using genetic and epigenetic data and
   supports disease diagnosis [[51]12,[52]13]. In a previous study, Li et
   al. [[53]14] combined multi-omics data with ML to identify genes such
   as CXCR2 that may be involved in the pathophysiology of HF. Similarly,
   research by Tran et al. [[54]15] found that tools developed using ML
   could predict subtypes of myopathies by mining transcriptomic data.
   Although the association between RA and the increased risk of IHF
   susceptibility has not yet been fully elucidated, both conditions,
   despite involving different physiological systems, exhibit significant
   immune-inflammatory characteristics. Understanding the molecular
   mechanisms linking the two diseases could not only provide new avenues
   for the early diagnosis, personalized treatment, and prevention of IHF
   but also open potential opportunities for the development of new
   therapeutic interventions. In this study, ML algorithms are combined
   with extensive bioinformatics approaches, including immune cell
   infiltration analysis, to investigate the molecular mechanisms shared
   by the two diseases. Through the identification of common
   differentially expressed genes (CDEGs) between RA and IHF and
   subsequent functional enrichment analyses, this work seeks to shed
   light on the shared pathogenesis of these conditions. Additionally,
   core modules within the protein-protein interaction (PPI) network are
   identified, and ML algorithms are applied to further screen for hub
   genes with potential diagnostic and therapeutic value. We tried to
   provide reference and direction for further study of the relationship
   between RA and IHF through the construction of the “transcription
   factor (TF)-mRNA-microRNA (miRNA)” network, hub gene co-expression
   network analysis, and immune cell infiltration correlation analysis.
   Finally, the cellular localization and differential expression of the
   hub genes were verified by single-cell analysis. The study flowchart is
   illustrated in [55]Fig. 1.

Fig. 1.

   [56]Fig. 1
   [57]Open in a new tab

   Study flowchart. RA, rheumatoid arthritis; IHF, ischemic heart failure;
   GSE, gene expression omnibus series; DEGs, differentially expressed
   gene; PPI, protein-protein interaction; RF, random forest algorithm;
   LASSO, the least absolute shrinkage and selection operator logistic
   regression; ROC, receiver operating characteristic; TF, transcription
   factor; MicroRNAs, miRNAs.

2. Methods

2.1. Data source

   Searches were performed in the Gene Expression Omnibus (GEO) database
   [[58]16] ([59]http://www.ncbi.nlm.nih.gov/geo/) using the search terms
   “RA” and “IHF” with the following filtering criteria: species source
   “homo sapiens,” sample source “tissue,” study type “microarray study,”
   and the dataset should contain both disease samples and the
   corresponding control samples. To minimize the impact of varying
   sampling sites on data analysis, we selected microarrays derived from
   the same tissue source. Compared to blood samples, tissue-specific
   samples provide a more accurate representation of the unique
   pathological changes associated with the disease. Ultimately, four
   datasets were chosen: [60]GSE77298 [[61]17] (synovial tissue samples
   from 16 RA patients and seven controls) and [62]GSE36700 [[63]18]
   (synovial tissue samples from seven RA patients and four controls) were
   utilized for RA expression profiles. For IHF, [64]GSE21610 [[65]19]
   (left ventricular tissue samples from nine IHF patients and eight
   controls) and [66]GSE57345 [[67]20] (left ventricular tissue samples
   from 95 IHF patients and 136 controls) were selected. In the annotation
   files for each microarray, we selected samples from heart failure of
   ischemic origin and synovial biopsies from RA for the disease group.
   This selection ensures that the disease states are significantly
   representative. We reviewed the annotation files of the datasets and
   the original literature to ensure that the selected samples met the
   diagnostic criteria for RA and that the IHF samples had confirmed
   diagnoses of IHD. [68]GSE77298 and [69]GSE21610 served as training
   datasets, while [70]GSE36700 and [71]GSE57345 were utilized for
   external validation. ScRNA-seq data for IHF and healthy controls are
   from [72]GSE145154 [[73]21]. In this dataset, we selected cardiac cells
   from the left ventricle of a healthy control and cardiac cells from the
   infarct zone of three IHF patients for subsequent analysis of scRNA-seq
   data. [74]Supplementary Table S1 presents information about the
   datasets used in this study.

2.2. Data processing

   First, the probe arrays are converted to gene symbols based on the
   chip's platform annotation, and log2 transformation is applied to
   matrix files that have not yet been converted. Duplicates are averaged
   and retained when multiple probes correspond to the same gene. After
   that, we divided the samples in the dataset into disease and control
   groups based on the clinical information of the samples. Using the
   “get_deg_all” function in the “tinyarray” R package. The “limma (voom)"
   algorithm [[75]22] was chosen for differential analysis to identify the
   DEGs for RA and IHF, using criteria of a fold change (FC) > 1.5
   (log|FC| > 0.58) and P < 0.05. The limma (voom) algorithm enhances the
   reliability of DEG identification by integrating linear modeling
   techniques and offering robustness to small sample sizes and outliers.
   This makes it particularly well-suited for analyses involving
   relatively few samples [[76]23]. After obtaining the DEGs for RA and
   IHF, the up-and-down-regulated genes in the two datasets were
   intersected separately, and Venn diagrams were plotted. The final DEGs
   obtained were used as CDEGs for both diseases.

2.3. Functional annotation and enrichment analysis of CDEGs

   ClueGo is a Cytoscape plugin that integrates up-to-date information
   from GO terms and databases such as KEGG, WikiPathways, and Reactome
   pathway to support functional annotation of uploaded gene lists and
   visualization in network format. Notably, ClueGO employs kappa
   statistics to connect representative term subsets within the network.
   By computing a similarity matrix adjusted for chance, it evaluates the
   strength of associations between terms. Finally, an organic layout
   algorithm is employed to automatically arrange the visualization,
   facilitating an intuitive understanding of the functional relationships
   among different enriched entries [[77]24]. We imported the obtained
   CDEGs into ClueGo, restricted the species to “homo sapiens,” and
   performed gene ontology enrichment analysis (GO) and Kyoto encyclopedia
   of genes and genomes (KEGG) pathway enrichment analysis. GO includes
   Biological Processes (BP), Cellular Components (CC), and Molecular
   Functions (MF) [[78]25]. The results were visualized by merging GO
   parent-child terms for similarly associated genes, with a screening
   threshold of P < 0.05.

2.4. PPI network construction and key module identification

   To explore interactions among CDEGs, they were uploaded to the STRING
   online platform ([79]https://cn.string-db.org/) [[80]26], with the
   species limited to “homo sapiens” and confidence scores set above 0.4.
   A PPI network was generated and subsequently visualized using Cytoscape
   3.7.2 [[81]27]. The confidence score indicates the reliability of
   protein-protein interactions, with higher scores suggesting more
   trustworthy interactions. Setting the confidence score to an
   intermediate value (>0.4) strikes a balance between data coverage and
   accuracy, ensuring the inclusion of a sufficient number of potential
   interactions without introducing excessive uncertainty. The MCODE
   plugin in Cytoscape is a clustering algorithm that can quickly find
   highly interconnected regions in a network and can be used to identify
   key protein clusters in PPI networks. To remove loosely connected nodes
   and identify tightly connected protein clusters, we established the
   following filtering parameters: a degree cutoff of 2, a node density
   cutoff of 0.1, a node score cutoff of 0.2, a k-core of 2, a maximum
   depth of 100, and the Haircut option enabled, which focuses on nodes
   with at least two connections. Cytohubba is a plugin for finding key
   nodes in the network and supports 11 topological analysis methods to
   predict key nodes in the network, with the MCC algorithm being more
   accurate [[82]28]. The MCODE plugin is utilized to identify top-ranked
   key modules within the PPI network, while the MCC algorithm in
   Cytohubba is employed to extract the top five key genes. The overlap
   between the predictions from both plugins is then taken to determine
   the key module genes in the network.

2.5. Hub gene screening and validation based on ML algorithms

   ML has been widely applied to the identification of characteristic
   genes. LASSO regression effectively addresses the issue of
   multicollinearity among variables, thereby reducing the risk of
   overfitting. It is one of the widely used ML algorithms [[83]29]. The
   RF algorithm, by integrating multiple decision trees, also achieves
   high predictive accuracy. Additionally, by introducing randomness, RF
   effectively mitigates the risk of overfitting [[84]30]. Both algorithms
   are suitable for analyzing small sample data. Therefore, this study
   selects the RF algorithm and LASSO regression as the two ML algorithms
   to further screen key candidate genes in CDEGs. LASSO regression
   analysis was performed on the [85]GSE77298 dataset using the ‘glmnet’ R
   package [[86]31], with 5-fold cross-validation employed to calculate
   the λ value. Key genes were selected based on the minimum λ. The RF
   model, built using the ‘randomforest’ R package, involved 500 trees and
   5-fold cross-validation. Genes were ranked by importance, and the
   minimum number of genes associated with the model's output error rate
   was selected. The genes identified by both algorithms were intersected,
   and these intersected genes, along with the key module genes from the
   PPI network, were treated as candidate hub genes. Receiver operating
   characteristic (ROC) curves were plotted with the “pROC” R package
   [[87]32] to evaluate the diagnostic performance of candidate hub genes
   in the training sets. The accuracy of the screening results was
   validated in independent datasets by verifying the differential
   expression of candidate hub genes. The results were visualized with the
   ‘ggplot2′ R package, and genes showing consistent differential
   expression trends in validation sets were identified as Hub genes.
   Finally, ROC curves were plotted to assess the diagnostic potential of
   these Hub genes in the validation group.

2.6. Construction of hub gene co-expression network and TF-mRNA-miRNA
regulatory network

   Hub genes were analyzed for co-expression and functional predictions
   using the GeneMANIA platform [[88]33] ([89]http://genemania.org).
   Predicted genes underwent KEGG pathway analysis via ClueGo to identify
   signaling pathways influenced by the hub gene co-expression network.
   TFs regulating hub genes were predicted using the TRRUST database
   ([90]https://ngdc.cncb.ac.cn/) [[91]34] and JASPAR
   ([92]http://jaspar.genereg.net/) [[93]35]. The miRwalk database
   ([94]http://mirwalk.umm.uni-heidelberg.de/) [[95]36] was employed to
   predict miRNAs targeting hub genes. Cytoscape was utilized to visualize
   the “TF-mRNA-miRNA” regulatory network. Additionally, potential drug
   targets for hub genes were identified using the DGIDB 3.0 database
   ([96]http://www.dgidb.org/) [[97]37], which provided summaries of known
   or predicted drug-gene interactions. Binding scores between genes and
   drugs were calculated to predict potential therapeutics.

2.7. Analysis of the correlation between hub genes and immune cell
infiltration

   Analysis of hub gene correlation with immune cell infiltration was
   performed using CIBERSORT, which employs a deconvolution algorithm to
   estimate immune cell proportions from gene expression profiles
   [[98]38]. The relative proportions of 22 immune cell types in the
   training set were quantified using the “CIBERSORT” algorithm from the
   “IOBR” R package [[99]39], and differences between control and disease
   groups were evaluated. Pearson correlation coefficients were used to
   assess relationships between hub gene expression and the proportions of
   immune cells, ensuring optimal statistical precision. Normality tests
   were initially performed on the two datasets to confirm adherence to
   the assumptions of Pearson correlation analysis. Pearson correlation
   coefficients were then calculated using the “cor” function in R, with
   statistical significance defined as a P-value below 0.05. The results
   were finally visualized using the “ggplot2” package.

2.8. Single-cell data processing and clustering

   The raw data from [100]GSE145154 [[101]21] was obtained from the GEO
   database, converted to sparse matrix format using R studio (version
   4.2.2), and analyzed with the “Seurat” R package (version 4.1.3)
   [[102]40]. Quality control for scRNA-seq data followed these criteria:
   (1) excluding cells with fewer than 500 genes (UMI >0), (2) removing
   cells with UMI counts below 800 or exceeding 8000, and (3) eliminating
   cells with mitochondrial UMI counts above 10 % [[103]21]. The data
   underwent log-normalization, and 2000 highly variable genes were
   identified using the “vst” method. These genes were scaled with the
   “ScaleData” function and dimensionally reduced via the “RunPCA”
   function. Appropriate PCs were selected using the “ElbowPlot” function,
   followed by constructing a shared nearest neighbor (SNN) graph of the
   top 30 PCs with the “FindNeighbors” function and clustering cells using
   the “FindClusters” function (Resolution = 0.5). The “RunUMAP” function
   was employed for visualization. Cell subpopulation annotations were
   derived from RAO et al. [[104]21] and supplemented with the Cellmaker
   2.0 database ([105]http://CellMarker/index.html) [[106]41].
   Specifically, the marker genes for monocytes are CD14, CD31, CEBPB,
   FCGR3A; T cells are CD3D; activated T cells are CCR7; NK cells are
   PRF1, GZMB, GZMK; bone marrow cells are LYZ; endothelial cells are
   FABP4; B cells are CD79A; and fibroblasts are LUM. The annotation was
   validated by cross-referencing the identified marker genes with known
   cell type-specific markers in the Cellmaker 2.0 database. Differential
   expression analysis was conducted to verify the uniqueness of each cell
   subpopulation. The “VlnPlot” function was employed to visualize the
   expression patterns and locations of hub genes across different cell
   types. DEGs between control and IHF patients were identified for each
   cell type using the “FindAllMakers” function with parameters set to
   “min. pct” = 0.25, “logfc. pct” = 0.25, and corrected p-value <0.05.
   This also determined the differential expression of hub genes within
   various cell subpopulations.

3. Results

3.1. Identification of CDEGs

   Following microarray normalization, 3466 DEGs were identified in the RA
   group (1501 up-regulated and 1965 down-regulated), while 1898 DEGs were
   identified in the IHF group (849 up-regulated and 1049 down-regulated)
   ([107]Fig. 2A and B). By intersecting up-regulated and down-regulated
   genes separately, 177 CDEGs were obtained, including 63 up-regulated
   and 114 down-regulated genes ([108]Fig. 2C).

Fig. 2.

   [109]Fig. 2
   [110]Open in a new tab

   Screening and enrichment analysis of CDEGs. (A) Cluster heat map,
   principal component analysis map, and DEGs volcano map of
   [111]GSE77298. (B) Cluster heat map, principal component analysis map,
   and DEGs volcano map of [112]GSE21610. (C) Venn diagram of DEGs. (D)
   The plot of enrichment analysis results for CDEGs. Different colors
   represent different clusters. C is the control group. P is the disease
   group.

3.2. Functional annotation and enrichment analysis of CDEGs

   ClueGo's enrichment analysis showed that 11 GO terms were enriched,
   including 9 BP and 2 CC terms, and 2 KEGG pathways ([113]Supplementary
   Table S2). The 13 enrichment results were divided into six groups
   according to Cohen's Kappa score. As shown in [114]Fig. 2D, different
   colors represent different groupings, and the entries with the most
   significant p-values in each group are bolded in the figure. The main
   enriched GO entries were “GO:0008276∼protein methyltransferase activity
   (P = 0.004)”, “GO:0055010∼ventricular cardiac muscle tissue
   morphogenesis (P = 0.004),” “GO:0045214∼sarcomere organization
   (P < 0.001),” “GO:0055001∼muscle cell development (P < 0.001)”. KEGG
   pathway as “KEGG:04640∼Hematopoietic cell lineage (P = 0.004)” and
   “KEGG:04917∼Prolactin signaling pathway (P = 0.011).” These pathways
   are proposed to be associated with the shared pathogenesis of RA and
   IHF.

3.3. PPI network construction and key module identification

   After constructing the PPI network of CDEGs and excluding genes that
   were not interconnected, the network contained 73 genes with
   interactions ([115]Fig. 3A). The results of the MCODE cluster analysis
   show that a module containing 12 edges and 4 nodes has the highest
   score (score = 4) ([116]Fig. 3B). These four genes are CD2 molecule
   (CD2), CD3 delta subunit of T-cell receptor complex (CD3D), interleukin
   7 receptor (IL7R), and C-C motif chemokine ligand 5 (CCL5). Cytohubba
   obtained five genes, which were CD2, CD3D, IL7R, CCL5, and cofilin 2
   (CFL2) ([117]Fig. 3C). Highly similar results were observed, so CD2,
   CD3D, IL7R, and CCL5 were included as key module genes.

Fig. 3.

   [118]Fig. 3
   [119]Open in a new tab

   PPI network and key module genes of CDEGs. (A) PPI networks for CDEGs.
   (B) MCODE clustering results. (C) cytohubba clustering results.

3.4. Hub gene screening and validation based on machine learning algorithms

   We further utilized two ML algorithms to screen key candidate genes
   among CDEGs. The results of LASSO regression indicated that when λmin
   was 0.041, there were 12 coefficients with non-zero values ([120]Fig.
   4A). The results of RF showed that the minimum model output error rate
   was 0.130, corresponding to 33 selected genes ([121]Fig. 4B and C). A
   total of 18 genes were predicted by LASSO and 33 by RF, with eight
   overlapping genes identified at the intersection, namely
   spermatogenesis associated 18 (SPATA18), chromodomain helicase DNA
   binding protein 3 (CHD3), heparan sulfate-glucosamine
   3-sulfotransferase 3A1 (HS3ST3A1), phosphatidylinositol specific
   phospholipase C X domain containing 1 (PLCXD1), chromosome 4 open
   reading frame 54 (C4orf54), long intergenic non-protein coding RNA 1165
   (LINC01165), calponin 1 (CNN1), and MMEL1 antisense RNA 1 (MMEL1-AS1)
   ([122]Fig. 4D and [123]Supplementary Table S3). These eight genes and
   the four key module genes CD2, CD3D, IL7R, and CCL5 obtained from PPI
   network screening were combined as candidate hub genes, and ROC curves
   were drawn to assess their predictive power on [124]GSE77298 and
   [125]GSE21610. Analysis revealed that all 12 genes exhibited strong
   diagnostic performance in the training set, with an AUC exceeding 0.7
   for each gene ([126]Supplementary Fig. S1). To assess the
   generalizability of these findings, the differential expression of
   these 12 genes was further examined in the validation set, and ROC
   curves were generated. Results indicated that CD2, SPATA18, CD3D, CCL5,
   and IL7R showed significantly different expression levels between
   disease and control groups in the validation set ([127]Supplementary
   Figure S2 A,B). Therefore, we identified these five genes as the final
   hub genes. The ROC curves showed good diagnostic results for all hub
   genes except for IL7R in [128]GSE57345 where the AUC was less than 0.6
   ([129]Supplementary Figure S2 C,D). This suggests that the hub gene has
   the potential to be a marker gene.

Fig. 4.

   [130]Fig. 4
   [131]Open in a new tab

   Machine learning-based screening of hub genes. (A) Screening of hub
   genes using the LASSO regression algorithm. (B) Screening of hub genes
   using the RF algorithm. (C) Importance score graph of the top 20 genes
   predicted by the RF algorithm. (D) Venn diagram of intersecting genes
   predicted by LASSO regression and RF algorithm.

3.5. Construction of hub gene co-expression network and TF-mRNA-miRNA
regulatory network

   GeneMANIA predicted 20 genes linked to the hub gene, along with 789
   interaction links, highlighting a complex network between the hub gene
   and these associated genes. Enrichment analysis revealed processes
   related to cytokine activity, cellular response to chemokines, cytokine
   receptor binding, T cell differentiation, and lymphocyte migration.
   KEGG analysis indicated that enrichment was primarily focused on
   pathways such as cytokine-cytokine receptor interaction, chemokine
   signaling, hematopoietic cell lineage, T cell receptor signaling, as
   well as Th1, Th2, and Th17 cell differentiation ([132]Fig. 5A and
   [133]Supplementary Fig. S3). The “TF-mRNA-miRNA” regulatory network
   comprised 52 TFs regulating hub genes and 23 miRNAs targeted by hub
   genes, involving 75 interactions. Notably, several TFs in the network
   such as signal transducer and activator of transcription 3 (STAT3),
   GATA binding protein 2 (GATA2), forkhead box C1 (FOXC1), and paired box
   2 (PAX2), can regulate multiple hub genes simultaneously ([134]Fig.
   5B). DGIDB predictions show that ALEFACEPT [[135]42] and ALDESLEUKIN
   [[136]43] are FDA-approved drugs targeting CD2, while MUROMONAB-CD3
   [[137]44] is an approved drug targeting CD3D.

Fig. 5.

   [138]Fig. 5
   [139]Open in a new tab

   (A) Co-expression network analysis of hub genes. (B) The
   “TF-mRNA-miRNA” regulatory network of hub genes. Hub genes are marked
   with gray circles; TFs are marked with blue octagons; miRNAs are marked
   with orange diamonds.

3.6. Analysis of the correlation between hub genes and immune cell
infiltration

   Enrichment results of hub gene co-expression analysis and the hub genes
   themselves highlight the importance of immune and inflammatory
   responses. Immune cell infiltration analysis was conducted separately
   on the expression profiles of RA and IHF ([140]Fig. 6A–C), followed by
   an evaluation of the correlation between hub gene expression and immune
   cells. As shown in [141]Fig. 6B, in the RA group, patients with RA had
   a higher proportion of plasma cells, macrophages M0, and a lower
   proportion of B memory cells, activated NK cells, monocytes, resting
   dendritic cells, and resting mast cells compared with controls (all P
   values < 0.05). In the IHF group, patients with IHF had a higher
   proportion of resting CD4^+ T memory cells and a lower proportion of
   monocytes and macrophages M2 than controls (all P values < 0.05)
   ([142]Fig. 6D). Correlation analysis revealed that in the RA group,
   CD2, CD3D, CCL5, and IL7R expression was positively associated with
   plasma cells, activated CD4^+ T memory cells, and γδ T cells (P < 0.05)
   while negatively associated with resting mast cells, monocytes,
   activated NK cells, and T regulatory cells (Treg) (P < 0.05) ([143]Fig.
   6E). In the IHF group, these genes were positively correlated with
   plasma cells, activated CD4^+ T memory cells, and macrophages M1
   (P < 0.05) and negatively correlated with monocytes, naive CD4^+ T
   cells, and T regulatory cells (P < 0.05) ([144]Fig. 6F). These findings
   indicate that the interactions between hub genes and plasma cells,
   activated CD4^+ T memory cells, monocytes, and T regulatory cells may
   contribute to the development of IHF in RA patients.

Fig. 6.

   [145]Fig. 6
   [146]Open in a new tab

   Immune cell infiltration analysis. (A) Relative proportions of 22
   immune cell subpopulations in the [147]GSE77298 dataset. (B) Comparison
   of immune cell ratios between RA and disease groups in the
   [148]GSE77298 dataset. (C) Relative proportions of 22 immune cell
   subpopulations in the [149]GSE21610 dataset. (D) Comparison of immune
   cell ratios between IHF and disease groups in the [150]GSE21610
   dataset. (E) Heat map of the correlation between hub genes and immune
   cells in the [151]GSE77298 dataset. (F) Heat map of the correlation
   between hub genes and immune cells in the [152]GSE21610 dataset. C:
   control group, P: disease group. ∗, p < 0.05; ∗∗, p < 0.01; ∗∗∗,
   p < 0.001.

3.7. Single-cell analysis for the location of hub genes

   To understand the localization of hub genes in different cell
   subpopulations and their differential expression, we analyzed scRNA-seq
   data containing control and IHF groups. After completing quality
   control, we clustered 13,832 cell samples into 19 clusters. We then
   manually annotated the clusters, ultimately annotating the 19 clusters
   into eight classes of cell subpopulations (6377 monocytes, 3371 T
   cells, 1380 activated T cells, 1403 NK cells, 869 myeloid cells, 204
   endothelial cells, 180 B cells, and 48 fibroblasts) ([153]Fig. 7A).
   Afterward, we visualized the localization of the hub gene in control
   and disease samples. The analysis revealed that CD2, CD3D, CCL5, and
   IL7R were predominantly expressed in T cells, activated T cells,
   monocytes, NK cells, and endothelial cells ([154]Fig. 7B), whereas
   SPATA18 showed no significant expression in these cells. Differential
   expression analysis indicated that CD2, CD3D, CCL5, and IL7R were
   expressed at significantly higher levels in T cells, activated T cells,
   monocytes, and NK cells compared to controls (adj P value < 0.05)
   ([155]Supplementary Table S4). These findings align with immune cell
   infiltration results, underscoring a potential link between CD2, CD3D,
   CCL5, and IL7R and immune cells, particularly T cells, in the
   pathogenesis of RA combined with IHF.

Fig. 7.

   [156]Fig. 7
   [157]Open in a new tab

   (A) UMAP plots of the cell types in the control and IHF groups. (B)
   Visualization of hub genes expression in control and IHF patients. The
   yellow dots indicate gene expression.

4. Discussion

   Patients with RA have an increased risk of developing coronary artery
   disease and HF [[158]45,[159]46]. Among the different subtypes of HF,
   IHF is directly associated with coronary arteriosclerotic heart disease
   [[160]47]. Symptoms such as limitation of movement due to joint
   deformity and pain in RA patients mask their HF symptoms to some
   extent, which leads to a possible underdiagnosis of HF in RA patients
   [[161]48]. In addition, the study found a significantly higher risk of
   unrecognized myocardial infarction in the RA population [[162]3].
   Therefore, there is a need to raise awareness of RA combined with IHF.
   Unfortunately, the exact mechanisms that precede RA and IHF have not
   been fully elucidated. The development of microarray technology offers
   the possibility to identify DEGs in different disease states and to
   understand the complex biological processes behind the genes [[163]49].
   To our knowledge, there are no reported studies on the relationship
   between RA and IHF at the genetic level using bioinformatics. This
   study represents the first attempt to integrate gene datasets from two
   diseases to uncover shared mechanisms and identify potential
   biomarkers.

   We obtained 177 CDEGs for RA and IHF. Functional enrichment analysis
   revealed that the enrichment results could be divided into six
   clusters, three of which were linked to each other: sarcomere
   organization-related pathways, ventricular myocardial tissue
   morphogenesis, and chondrocyte differentiation. In addition, the
   prolactin (PRL) signaling pathway, hematopoietic cell lineage, and
   protein methyltransferase activity were significantly enriched. Acute
   and chronic myocardial ischemia leads to massive myocardial cell
   necrosis when myocardial cells become thin and longitudinally
   elongated, and myocardial contractility decreases. Thus, IHF is
   characterized by systolic dysfunction [[164]50]. The sarcomere is the
   basic unit of myocardial contraction and consists of thick and thin
   myofilaments. Myocardial contractility is primarily driven by the
   interaction between myosin heads (thick filaments) and actin (thin
   filaments) [[165]51]. In RA, inflammatory mediators accumulate in the
   synovial joints, leading to chondrocyte dysfunction. Additionally, RA
   chondrocytes function as effector cells, undergoing changes that
   directly or indirectly contribute to joint damage in RA [[166]52]. The
   PRL signaling pathway and hematopoietic cell lineage are closely
   associated with the inflammatory immune response. Serum PRL levels were
   elevated in RA patients compared to healthy individuals [[167]53]. Adán
   et al. [[168]54] found that PRL can inhibit chondrocyte apoptosis
   through tumor protein p53 (TP53) and the Janus kinase 2 (JAK2)/STAT3
   pathway, showing therapeutic potential for RA joint inflammation. A
   positive correlation has been identified between serum PRL levels and
   immune-inflammatory factors, including interleukin 6 (IL-6),
   interleukin 10 (IL-10), tumor necrosis factor-alpha (TNF-α), and
   intercellular adhesion molecule 1 (ICAM1), in patients with congestive
   HF. Additionally, serum PRL levels serve as an independent predictor of
   mortality or hospitalization in patients with advanced congestive HF
   [[169]55]. This suggests that high PRL levels in vivo during RA may be
   associated with developing IHF. Protein methyltransferases regulate
   gene expression by targeting histone molecules' arginine and lysine
   side chains. As a significant player in transcriptional regulation,
   protein methyltransferases are extensively involved in many processes,
   including differentiation, proliferation, and apoptosis of cardiac
   cells [[170]56,[171]57]. Our results suggest that the above pathways
   may be a potential mechanism for the development of IHF in RA patients.

   After exploring the common biological pathways of the two diseases, we
   further focused on identifying promising biomarkers. By screening the
   key modules in the PPI network and further ML algorithms to identify
   key genes, we finally obtained five hub genes, CD2, SPATA18, CD3D,
   CCL5, and IL7R. They were significantly differentially expressed in the
   training and validation sets and showed good diagnostic value. We found
   that, compared to the results from ML screening, the PPI network
   identified more hub genes. This suggests that the screening results
   based on protein-protein interactions may better reflect the crosstalk
   and common mechanisms between the two diseases, whereas ML algorithms
   focus more on the specific expression characteristics of individual
   genes. Previous studies have shown that CD2, IL-7R, and CD3D are
   involved to varying degrees in the onset and progression of RA, with
   their functions primarily related to regulating immune responses in RA
   [[172][58], [173][59], [174][60]]. However, there is limited research
   on these genes in IHF. Nonetheless, their importance in immune
   regulation suggests they may participate in the development of IHF by
   modulating cardiac immune responses.

   CD3D, a part of the T-cell receptor/CD3 complex (TCR/CD3 complex), is
   involved in T-cell development and signal transduction [[175]61].
   Research suggests that CD3D could be a promising biomarker for early RA
   diagnosis, especially in patients negative for anti-citrullinated
   protein antibodies [[176]58]. Currently, substantial evidence suggests
   that CD3D could be a potential therapeutic target in immunotherapy
   [[177]62,[178]63]. Li et al. [[179]64] also identified CD3D as a
   promising RA therapeutic target using a module-based cumulative score
   method. CD2, a transmembrane glycoprotein belonging to the
   immunoglobulin superfamily, is broadly expressed on T cells, NK cells,
   thymocytes, and dendritic cells [[180]65]. Research indicates that CD2
   enhances T-cell activation via co-stimulation of the TCR/CD3 complex
   [[181]66]. Two biologics targeting CD2, Alefacept and Siplizumab, have
   been used clinically. Clinical trials have demonstrated that Alefacept
   significantly improves the symptoms of psoriatic arthritis by
   inhibiting T-cell activation [[182]67]. Notably, CD2 expression is
   markedly elevated in the synovial tissues of RA patients compared to
   healthy synovia [[183]59], with CD2 receptors (CD2R) also showing high
   expression levels in the peripheral blood of RA patients [[184]68].
   This suggests that Alefacept may have potential therapeutic
   applications in RA treatment. The IL-7R is a heterodimeric complex.
   Given the adverse effects of steroid therapy, such as metabolic
   disorders, secondary osteoporosis, and steroid resistance, increasing
   attention has been drawn to IL-7R antibody therapies as promising
   targets for autoimmune diseases [[185]69]. Studies have shown that
   IL-7R-positive lymphocytes can evade steroid-induced apoptosis, making
   IL-7R-targeted therapies a potential approach for overcoming steroid
   resistance, thus showing promise in RA treatment [[186]70]. Evidence
   suggests that IL-7R antibody treatment can alleviate RA disease
   activity by inhibiting the proliferation of collagen type II
   (CII)-specific CD4^+ T cells, reducing Th1 cell differentiation, and
   preventing the migration of CD4^+ T cells to joint damage sites
   [[187]60]. Inhibiting the IL-7/IL-7R pathway to regulate joint myeloid
   cell migration, pro-inflammatory reprogramming, and T effector cell
   polarization during angiogenesis offers a promising therapeutic
   approach for RA [[188]71]. This indicates that IL-7R-targeted therapy
   may also hold significant potential in preventing IHF in RA patients.
   CCL5 may be a promising target that drives both IHF and RA. CCL5 is
   essential for recruiting T cells, eosinophils, and basophils to
   inflammatory sites. Produced by circulating T cells, it actively
   enhances T cell chemotactic activity in RA [[189]72]. Studies have
   demonstrated elevated CCL5 levels in synovial tissue and fluid, with
   suppressing its transcriptional expression shown to alleviate RA
   progression [[190]73,[191]74]. Notably, CCR5 antagonists, targeting the
   receptor for CCL5, have been successfully utilized in HIV treatment
   [[192]75]. A Phase II clinical trial is currently underway to evaluate
   whether the CCR5 antagonist GSK706769 can maintain the clinical
   remission established by Enbrel in RA patients after Enbrel
   discontinuation. Notably, numerous studies have identified CCL5 as a
   potential biomarker for HF [[193]76,[194]77]. Montecucco et al.
   [[195]78] found that suppressing CCL5 expression significantly reduced
   infarct size and post-infarction HF in a chronic myocardial ischemia
   mouse model, a mechanism linked to reduced leukocyte recruitment. These
   findings suggest that CCL5-targeted therapies may be worth
   investigating for treating IHF in RA patients. SPATA18 is essential in
   maintaining mitochondrial mass by inducing lysosome-like cells in
   mitochondria to scavenge oxidized mitochondrial proteins [[196]79].
   Currently, research on SPATA18 is mainly focused on oncology, and its
   relationship with RA and IHF requires further study.

   The results of multi-stage enrichment analysis and studies on Hub genes
   have highlighted the importance of inflammatory immune responses. To
   further explore the relationship between hub genes and immune cells,
   immune cell infiltration analysis was conducted. Previous studies have
   documented immune cell infiltration in RA and IHF [[197]80,[198]81].
   Our analysis focused on the association between hub genes and immune
   cells in these conditions. The results revealed significant
   correlations between hub genes and plasma cells, activated CD4^+ T
   memory cells, monocytes, and T regulatory cells in both datasets. Hub
   genes showed positive associations with plasma cell and CD4^+ T memory
   cell activation but negative associations with Tregs and monocytes,
   indicating their potential involvement in the co-occurrence of RA and
   IHF.

   Plasma cells, which differentiate from B cells, primarily secrete
   autoantibodies involved in the pathogenesis of RA, such as rheumatoid
   factor (RF) and anti-citrullinated protein antibodies (ACPA) [[199]82].
   Immune complexes formed by RF or ACPA can lead to joint destruction in
   RA patients through multiple mechanisms, including complement
   activation and osteoclast differentiation [[200]83]. Studies in humans
   have shown widespread deposition of immunoglobulin (IgG) in the
   myocardium of patients with end-stage heart failure with reduced
   ejection fraction (HFrEF) compared to controls [[201]84]. Although it
   remains unclear whether these antibodies play a direct pathogenic role,
   it has been observed in a coronary artery ligation-induced myocardial
   ischemia mouse model that IgG-deficient mice exhibited less severe
   myocardial injury [[202]85]. Furthermore, Hasham et al. [[203]86] found
   that antibody-dependent mechanisms contribute, at least in part, to
   lupus-induced myocarditis. Clinically, serum RF in RA patients has been
   linked to a higher risk of HF [[204]87], indicating that plasma cells
   could be a key mechanism underlying the increased susceptibility to IHF
   in RA patients. The observed positive correlation between hub genes and
   plasma cells highlights potential avenues for immunotherapeutic
   strategies that target plasma cells by modulating hub genes. Memory T
   cells are critical for maintaining immune memory. In RA, the sublining
   layer of the synovium is an important structure and one of the primary
   sites of inflammatory responses [[205]88]. Memory T cells are the
   predominant cell type in the sublining layer. These cells either
   infiltrate tissues diffusely or form ectopic germinal centers,
   ultimately leading to the production of large quantities of antibodies,
   such as RF or ACPA, which exacerbate RA progression [[206]89]. Studies
   have identified c-Met + memory T cells, expressing the c-Met receptor,
   as having a preferential ability to migrate to cardiac tissue during
   inflammation, exacerbating inflammatory responses [[207]90]. In mouse
   heart failure models, memory CD4^+ T cells have been linked to left
   ventricular dysfunction, fibrosis, and myocardial hypertrophy
   [[208]91]. The differentiation and maintenance of memory cells rely on
   cytokines, with IL-7 playing a key role in preserving the homeostasis
   of memory CD4^+ T cells [[209]92,[210]93], which may explain the
   observed positive correlation between IL-7R and memory cells. Tregs are
   essential for suppressing excessive immune responses and ensuring
   immune homeostasis. In RA patients, Treg levels are reduced [[211]94],
   potentially due to the chronic inflammatory state in RA that inhibits
   Treg formation and Foxp3 expression [[212]95]. A reduction in Treg
   numbers has also been reported in the peripheral blood of IHF patients
   [[213]96]. Impaired Treg cell numbers or functions amplify immune
   responses, leading to increased pro-inflammatory cytokine release and
   worsening RA [[214]97]. Hammer et al. [[215]98] reported a negative
   correlation between low circulating Treg proportions in IHF patients
   and mortality. Additionally, studies have shown that in vivo Treg
   expansion can mitigate adverse remodeling and enhance cardiac function
   after myocardial infarction by suppressing apoptosis, neutrophil,
   macrophage, and lymphocyte infiltration, and reducing inflammatory
   factors like TNF-α and interleukin 1 beta (IL1B) [[216]99]. The
   negative correlation between Hub genes and Tregs suggests that high
   expression of these genes may be associated with a reduction in Tregs.
   This implies that Hub genes may play a role in the immune inflammatory
   response during disease progression, affecting the generation and
   function of Tregs, and consequently leading to their decreased numbers.
   This finding provides a significant direction for future cellular
   studies targeting Hub genes. During inflammation, monocytes in the
   blood are recruited and differentiated into macrophages and dendritic
   cells. Recent studies have found that monocytes are essential in
   differentiating into osteoclasts and causing subsequent bone erosion in
   RA patients [[217]100]. The role of monocytes/macrophages in
   atherosclerotic plaque formation has been widely emphasized [[218]101].
   In addition, during ischemia-induced myocardial injury, a sustained
   inflammatory response and infiltration of M1-type macrophages also lead
   to apoptosis, abnormal ventricular remodeling, and impaired systolic
   function [[219]102]. Since the CIBSORT algorithm is analyzed to derive
   the relative proportions of immune cells in different samples, we
   believe that the negative correlation between hub genes and monocytes
   is caused by the massive differentiation of monocytes into macrophages
   and osteoclasts. The single-cell analysis for the hub gene echoed the
   results of immune cell infiltration, again highlighting the crosstalk
   between the hub gene and immune cells. These results indicate that
   immune cells may play a role in the development of IHF in RA patients.

   In addition, we constructed a “TF-mRNA-miRNA” regulatory network of the
   hub gene and predicted therapeutic agents targeting the hub gene, which
   may provide a basis for further studies in the future. Notably, TFs
   such as STAT3, GATA2, FOXC1, and RELA can regulate more than two hub
   genes at the same time. This suggests that the role of the above TFs in
   the common mechanism of RA and IHF may be of interest.

   This study has certain limitations, primarily the reliance on data from
   the GEO database. Despite applying strict screening criteria and
   validating results with external datasets, the samples were derived
   from diverse regions and genetic backgrounds. Additionally, RA and IHF
   are highly heterogeneous diseases, and individual variations among
   patients may have influenced the results. Batch effects across
   datasets, sample size imbalances, and differences in age, sex, and
   environmental factors between the disease and control groups may also
   have introduced potential confounding biases. Due to data limitations,
   we were unable to fully adjust for these confounders, which could be
   addressed in future studies by incorporating larger sample sizes,
   multi-dimensional data, and integrating imaging data with deep learning
   techniques. Second, to avoid missing key but less significant genes, we
   set relatively lenient criteria for DEG selection. However, such an
   approach may have resulted in some false-positive findings.
   Additionally, in our immune cell infiltration analysis, the RA and IHF
   samples we selected came from different biological environments and
   tissue sources, potentially containing varied types and proportions of
   cellular components. Since CIBERSORT relies on gene expression data to
   infer immune cell proportions, the heterogeneity of samples from
   different sources could affect the accuracy of the results [[220]103].
   Furthermore, systematic bias and technical variation in gene expression
   data across samples are inevitable, and even with standardization,
   heterogeneity may persist, impacting the accurate estimation of immune
   cell proportions [[221]104]. Thus, future validation using flow
   cytometry or mass cytometry is necessary. Third, studies have shown
   that heart failure with preserved ejection fraction (HFpEF) accounts
   for a considerable proportion of RA combined with HF [[222]48] and
   seems to be more closely associated with inflammatory responses
   compared to HFrEF [[223]105]. However, due to the lack of publicly
   available chip data for HFpEF patients, assessing the potential
   relationship between HFpEF and RA is equally important. Further
   elucidation of the molecular network changes in the RA-IHD-IHF pathway
   is crucial for revealing the complex relationships and potential
   mechanisms among these diseases. Lastly, while the identified Hub genes
   demonstrated high diagnostic value, the model's training and validation
   were based on small sample sizes and single-dimensional data.
   Therefore, the generalizability of the predictive results needs to be
   confirmed with future large-sample, multi-dimensional studies.
   Additionally, validation with in vivo and in vitro experiments is
   essential.

   Based on the comparison and discussion of existing research findings,
   our preliminary bioinformatics analysis suggests several potential
   directions for future in-depth studies on RA and IHF. Firstly, future
   research should focus on validating these biomarkers in clinical
   settings to determine their practical utility in diagnosing or treating
   RA and IHF. Additionally, exploring the specific molecular pathways
   regulated by these genes could provide deeper insights into the
   connections between the two diseases. Our study highlights CCL5 and the
   prolactin signaling pathway as the most promising biological targets
   and pathways. The significance of CCL5 in myocardial ischemia and RA
   suggests its potential as a common therapeutic target. The prolactin
   signaling pathway, closely associated with the immune inflammatory
   responses in RA and IHF, may offer therapeutic benefits when modulated.
   Immune cell infiltration and single-cell analysis further suggest that
   plasma cells, memory T cells, Tregs, and monocytes are promising target
   cell types. Lastly, the potential application of drugs predicted
   through target prediction, such as Alefacept and Aldesleukin, in RA or
   IHF could be a direction for future research. By integrating clinical
   and multi-omics data in the future, new directions for personalized
   treatment of RA and IHF comorbidity are expected to emerge.

5. Conclusions

   In summary, we spearheaded a bioinformatics study targeting the common
   patho-genesis and screening of key targets in RA and IHF. Our results
   suggest that sarcomere organization, PRL-related pathways, protein
   methyltransferase activity, and activation of immune-inflammatory
   responses may be involved in developing IHF in RA patients. Similarly,
   we propose five biomarkers with potential diagnostic and therapeutic
   value, CD2, SPATA18, CD3D, CCL5, and IL7R. These may provide further
   insight into the relationship between RA and IHF. These findings
   provide further insights into the relationship between RA and IHF.

CRediT authorship contribution statement

   Ziyi Sun: Writing – review & editing, Writing – original draft,
   Visualization, Formal analysis, Data curation, Conceptualization.
   Jianguo Lin: Data curation, Conceptualization. Xiaoning Sun: Formal
   analysis, Data curation. Zhangjun Yun: Formal analysis, Data curation.
   Xiaoxiao Zhang: Formal analysis, Data curation. Siyu Xu: Formal
   analysis, Data curation. Jinlong Duan: Writing – review & editing.
   Kuiwu Yao: Writing – review & editing, Writing – original draft,
   Supervision, Project administration, Funding acquisition,
   Conceptualization.

Ethics approval and consent to participate

   Review and/or approval by an ethics committee was not needed for this
   study. As our data originates from the GEO database, a public
   repository, all patients included in the database have received ethical
   approval. Researchers can freely access and utilize this data for
   studies and publications. Since our study relies on open-source data,
   there are no ethical concerns or conflicts of interest.

Consent for publication

   Not applicable.

Availability of data and materials

   The gene datasets used in this study, including [224]GSE77298,
   [225]GSE36700, [226]GSE21610, [227]GSE57345, and [228]GSE145154, are
   available in the Gene Expression Omnibus
   ([229]https://www.ncbi.nlm.nih.gov/geo/).

Funding

   This study was supported by the General Program of the National Natural
   Science Foundation of China (81473466, 81873173) and the National Key
   Research and Development Program of China (2019YFC1708703).

Declaration of competing interest

   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.

Acknowledgements