Abstract
Background
MicroRNAs (miRNAs) play a crucial role in regulating adaptive and maladaptive responses in cardiovascular diseases, making them attractive candidate biomarkers. However, their potential as novel biomarkers for diagnosing cardiovascular diseases requires systematic evaluation.
Methods
In this study, we aimed to identify a key set of miRNA biomarkers using integrated bioinformatics and machine learning analysis. We combined and analyzed three gene expression datasets from the Gene Expression Omnibus (GEO) database containing peripheral blood mononuclear cell (PBMC) samples from patients with myocardial infarction (MI), patients with stable coronary artery disease (CAD), and healthy individuals. In addition to identifying differentially expressed miRNAs, we selected a second set of miRNAs based on their area under the receiver operating characteristic curve (AUC-ROC) for separating CAD and MI samples. We designed a two-layer architecture for sample classification: the first layer separates healthy from unhealthy samples, and the second layer classifies the unhealthy samples as stable CAD or MI. We trained different machine learning models on each biomarker set and evaluated their performance on a test set.
Results
We identified hsa-miR-21-3p, hsa-miR-186-5p, and hsa-miR-32-3p as differentially expressed miRNAs, and a set comprising hsa-miR-186-5p, hsa-miR-21-3p, hsa-miR-197-5p, hsa-miR-29a-5p, and hsa-miR-296-5p as the optimal set of miRNAs selected by AUC-ROC. Both biomarker sets distinguished healthy from unhealthy samples with perfect accuracy. The best performance for classifying CAD and MI was achieved by a support vector machine (SVM) model trained on the AUC-ROC-selected biomarker set, with an AUC-ROC of 0.96 and an accuracy of 0.94 on the test data.
Conclusions
Our study demonstrated that miRNA signatures derived from PBMCs could serve as valuable novel biomarkers for cardiovascular diseases.
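The AUC-ROC-based miRNA selection and the two-layer classification scheme summarized above can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' pipeline. The study trained models such as an SVM; here a simple nearest-centroid classifier is swapped in for both layers so the example runs without external libraries. Each miRNA is scored by its single-feature AUC-ROC using the Mann-Whitney pairwise formulation, and taking max(AUC, 1 - AUC) so that down-regulated miRNAs also rank highly is an assumption. The miRNA names (miR-a, miR-b, miR-c), helper names, and expression values are hypothetical toy data.

```python
from statistics import mean


def auc(pos, neg):
    """Single-feature AUC-ROC (Mann-Whitney formulation): fraction of
    (positive, negative) pairs where the positive value is higher; ties count 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def rank_by_auc(X, y, names, pos_label="MI", neg_label="CAD"):
    """Rank features by max(AUC, 1 - AUC) for separating pos_label from neg_label.
    Using max(...) so down-regulated features also rank highly is an assumption."""
    scores = {}
    for j, name in enumerate(names):
        pos = [x[j] for x, lab in zip(X, y) if lab == pos_label]
        neg = [x[j] for x, lab in zip(X, y) if lab == neg_label]
        a = auc(pos, neg)
        scores[name] = max(a, 1.0 - a)
    return sorted(names, key=scores.get, reverse=True), scores


class NearestCentroid:
    """Hypothetical stand-in for the SVM and other models used in the study."""

    def fit(self, X, y):
        self.centroids = {}
        for lab in set(y):
            rows = [x for x, l in zip(X, y) if l == lab]
            self.centroids[lab] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        # Assign the label whose centroid is closest in squared Euclidean distance.
        return min(self.centroids,
                   key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, self.centroids[lab])))


class TwoLayerClassifier:
    """Layer 1: healthy vs unhealthy. Layer 2: CAD vs MI, trained on unhealthy samples only."""

    def __init__(self, make_model=NearestCentroid):
        self.layer1, self.layer2 = make_model(), make_model()

    def fit(self, X, y):
        self.layer1.fit(X, ["healthy" if lab == "healthy" else "unhealthy" for lab in y])
        sick = [(x, lab) for x, lab in zip(X, y) if lab != "healthy"]
        self.layer2.fit([x for x, _ in sick], [lab for _, lab in sick])
        return self

    def predict(self, X):
        # Only samples that layer 1 calls "unhealthy" are passed to layer 2.
        return ["healthy" if self.layer1.predict_one(x) == "healthy"
                else self.layer2.predict_one(x) for x in X]


def select(X, idx):
    """Keep only the feature columns listed in idx (in that order)."""
    return [[x[j] for j in idx] for x in X]


# Hypothetical toy data: rows are samples, columns are three made-up miRNAs.
names = ["miR-a", "miR-b", "miR-c"]
X_train = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.2],   # healthy
           [3.0, 2.0, 2.1], [3.1, 2.2, 1.9],   # CAD
           [3.0, 4.0, 2.0], [2.9, 4.2, 2.1]]   # MI
y_train = ["healthy"] * 2 + ["CAD"] * 2 + ["MI"] * 2

ranked, scores = rank_by_auc(X_train, y_train, names)
top = [names.index(n) for n in ranked[:2]]  # keep the two best CAD-vs-MI separators

model = TwoLayerClassifier().fit(select(X_train, top), y_train)
X_test = [[1.1, 5.0, 2.0], [3.0, 2.1, 2.0], [3.0, 4.1, 2.0]]
predictions = model.predict(select(X_test, top))
print(ranked, predictions)
# ['miR-b', 'miR-a', 'miR-c'] ['healthy', 'CAD', 'MI']
```

Training layer 2 only on unhealthy samples mirrors the design described above: the CAD-versus-MI model never has to account for healthy profiles, and any sample layer 1 labels healthy is never passed to it.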
Keywords: MicroRNA, Machine learning, Myocardial infarction, Bioinformatics, Biomarker
Introduction
Cardiovascular diseases (CVDs) are the leading cause of human mortality, accounting for 32% of all global deaths. Approximately 85% of CVD deaths are estimated to be due to heart attack (myocardial infarction, MI) and stroke [1]. MI is a form of acute coronary syndrome characterized by sudden blockage or severe narrowing of a coronary artery and subsequent myocardial ischemia, leading to extensive cardiomyocyte damage and necrosis [2]. Over the last 50 years, numerous attempts have been made to use biomarkers to facilitate diagnosis, assess risk, monitor therapy, and determine therapeutic efficacy in patients with suspected CVD. According to current guidelines, cardiac troponins (cTns) are the preferred biomarkers for detecting MI because of their high sensitivity and accuracy. Despite these advantages, the high sensitivity of cTn-based assays has also led to more false-positive results [3], motivating the development of new diagnostic modalities with pathological value. To improve the diagnostic value of existing MI biomarkers, combining them with complementary markers, such as microRNAs (miRNAs) and other genetic factors, has been proposed. Previous research suggests that miRNAs have great potential as alternative biomarkers for CVD detection and follow-up [4]. miRNAs are small non-coding RNAs of approximately 18-22 nucleotides that play a crucial role in regulating gene expression. Evidence indicates that miRNAs are involved in the pathogenesis of cardiac tissue injury [5]. Several biological processes, such as angiogenesis, cardiomyocyte growth and contractility, lipid metabolism, plaque formation, and cardiac rhythm, are regulated by miRNAs [6].
Circulating and tissue-specific miRNAs have shown promise as diagnostic and prognostic biomarkers across a range of cardiovascular diseases, including MI and other conditions such as CAD, heart failure, atrial fibrillation, cardiac hypertrophy, and fibrosis [7, 8]. The use of miRNAs as diagnostic and prognostic biomarkers in CVDs is supported by their stability and their rapid release into the circulation after myocardial injury [7]. In CAD, altered expression of miRNAs such as miR-1, miR-133a, miR-208a/b, and miR-499, which are abundantly expressed in the heart, has been reported in patients compared with healthy controls. Additional miRNAs, including miR-21, miR-208a/b, miR-133a/b, and the miR-30 family, are frequently dysregulated in acute coronary syndrome (ACS) compared with stable CAD [9]. Furthermore, miR-3113-5p, miR-223-3p, miR-499a-5p, and miR-133a-3p show potential as biomarkers for identifying patients at risk of sudden cardiac death [10]. Specific miRNAs have also been linked to MI pathology. For instance, miR-21 has been associated with cardiac injury; elevated miR-21 levels have been observed in ACS patients and linked to cardiomyocyte apoptosis and cardiac hypertrophy. Both miR-21 and miR-26 have been implicated in the pathology and recurrence of MI [11]. In addition to their diagnostic potential, miRNAs have shown promise as prognostic biomarkers for adverse myocardial outcomes, sudden death, and risk assessment in MI and other CVDs. For example, miR-101 and miR-150 have been associated with impaired left ventricular contractility after MI, while miR-16 and miR-27a have been linked to an increased risk of adverse left ventricular remodeling [7, 9]. These miRNAs may provide valuable prognostic information and aid in risk stratification for post-MI complications.
Numerous studies have investigated the potential of miRNAs as biomarkers for MI, with promising findings. For instance, miR-1 has been proposed as a potential biomarker for MI [9]; its expression is increased in patients with MI, suggesting diagnostic value. Additionally, miR-19b-3p, miR-208a, miR-223-3p, miR-483-5p, and miR-499a-5p have shown promising diagnostic accuracy for MI within a short time window after symptom onset [10]. A recent systematic review compared the peak time and diagnostic accuracy of miRNAs and conventional biomarkers in MI. It found that miR-1-3p, miR-19b-3p, miR-208a, miR-223-3p, miR-483-5p, and miR-499a-5p peaked earlier (within 4 h) and had higher diagnostic accuracy than cTn and creatine kinase-MB (CK-MB), indicating their promise for early diagnosis. The strengths of these miRNAs included early peak expression, satisfactory sensitivity and specificity, and higher accuracy than conventional biomarkers, especially within the first few hours after symptom onset [12]. It has been postulated that the functional and diagnostic relevance of miRNAs in patients with CVD extends beyond the myocardium. Specifically, miRNA expression can vary across biofluids and cellular components, such as serum and peripheral blood mononuclear cells (PBMCs) [13]. PBMCs are the mononuclear fraction of white blood cells, including lymphocytes, monocytes, and other cells of the immune system [14]. Emerging data indicate that PBMCs can serve as a valid source of biomarkers for monitoring various pathological conditions. Notably, alterations in mRNA and miRNA levels under pathological conditions provide valuable information about different kinds of disorders. PBMCs can reflect the state of target tissues, making them a highly sensitive and specific source of biomarkers [15].
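The sensitivity, specificity, and accuracy figures used to compare miRNAs with conventional biomarkers are derived from a 2x2 table of test results against the true diagnosis. The short Python sketch below shows these standard definitions; the function name and the counts are hypothetical and are not taken from any cited study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic performance measures from confusion-table counts."""
    sensitivity = tp / (tp + fn)                # true-positive rate: MI cases correctly detected
    specificity = tn / (tn + fp)                # true-negative rate: non-MI cases correctly ruled out
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall fraction classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for a single biomarker test in 100 patients.
sens, spec, acc = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# sensitivity=0.90 specificity=0.80 accuracy=0.85
```

An assay with high sensitivity but lower specificity yields more false positives (fp), which is the trade-off described earlier for high-sensitivity cTn assays.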
Accordingly, these cells carry dysregulated gene and miRNA expression profiles in CVDs [14, 15]. In recent years, machine learning (ML) has opened exciting prospects for advancing scientific research. Although the concept of ML and its initial algorithms were conceived many years ago, recent improvements in computing power and access to vast amounts of data have shown that ML techniques can outperform classical statistical methods in various fields. Furthermore, progress in omics technologies has enabled the analysis of massive and intricate biological datasets comprising hundreds to thousands of samples, allowing ML to extract valuable biological insights from such data [16]. Consequently, ML offers innovative methods for integrating and interpreting diverse types of omics data, leading to the identification of new biomarkers. These biomarkers can aid in precise disease prediction, patient stratification, and the development of novel therapeutic approaches [17]. In this study, we aimed to identify potential miRNA biomarkers in patients with MI by combining and analyzing three different PBMC microarray datasets. Integrating omics data with bioinformatics and ML techniques is a promising approach for discovering new and more accurate biomarkers for monitoring MI. This approach can also deepen our understanding of the mechanisms underlying MI and support the development of valid diagnostic biomarkers and patient stratification.
Methods
Microarray data collection
Microarray datasets were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Robust classification of MI, CAD, and healthy control samples requires sufficiently large sample sizes for each group.
For this purpose, the GSE59867 dataset was selected because it contains sizable numbers of both MI and CAD samples. To provide an equally large set of healthy controls, the GSE56609 and GSE54475 datasets, which contain healthy samples, were also included. Combining these three datasets enabled comparative analysis of the MI, CAD, and healthy control groups with adequate statistical power. All samples were profiled on the Affymetrix Human Gene 1.0 ST Array platform (GPL6244), which contains 189 miRNA probes according to the GEO annotation data. Only healthy, CAD, and early-stage MI samples were selected from these datasets for further analysis. Early-stage MI samples were analyzed to enable detection of miRNA biomarkers specific to the initial ischemic and infarction event, before extensive myocardial necrosis and remodeling occur. Using samples from the early phase improves the identification of miRNA signals related to plaque rupture and MI onset, as opposed to stable CAD. Early-stage samples also allow investigation of the mechanisms that initiate myocardial injury. Basic information on the three datasets evaluated in this study is provided in Table 1. Bioinformatics analyses, including preprocessing, differential expression analysis, and functional and pathway enrichment analyses, were conducted using R, ver. 4.2.0 [18], and RStudio [19]. All plots and graphics in these sections were created using the ggplot2 R package [20].
Table 1. Sample information for the GEO microarray datasets
Dataset | Platform | Healthy | CAD | MI | References