Abstract Background Osteoarthritis (OA) is a chronic and complex degenerative joint disease that increasingly burdens and affects the elderly population. Abnormal energy metabolism is closely associated with the pathological mechanisms of OA. This study aims to identify key genes related to energy metabolism that are closely linked to the treatment and diagnosis of OA. Methods The transcriptomic data for OA were collected from the Gene Expression Omnibus (GEO), with [30]GSE51588 and [31]GSE63359 serving as the training and validation datasets, respectively. Differential expression analysis was conducted to identify key energy metabolism-related genes. Unsupervised clustering was performed to classify molecular subtypes. Three machine learning algorithms were employed to identify key diagnosis genes, specifically Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Random Forest (RF). We construct a comprehensive nomogram, and the diagnostic performance of both the nomogram and feature genes was evaluated using operating characteristic curve (ROC) and calibration curves. We assessed the immune infiltration levels in OA samples using the IOBR platform and the CIBERSORT algorithm. Results We classified OA patients into two molecular subtypes, which exhibited distinct differences in immune infiltration levels. Subsequently, we successfully identified two characteristic genes, NUP98 and RPIA, and demonstrated statistically significant differences (P < 0.05) and diagnostic performance in the validation cohort. Evaluation using ROC and calibration curves demonstrated that these characteristic genes exhibit robust diagnostic performance. Multiple immune cells may be involved in the development of OA, and all characteristic genes may be associated with immune cells to varying degrees. Conclusion In conclusion, NUP98 and RPIA have the potential to serve as diagnostic biomarkers for OA and may offer opportunities for therapeutic intervention in OA. Keywords: osteoarthritis, energy metabolism, diagnosis, immunology Introduction Osteoarthritis (OA) is a chronic degenerative joint disease and one of the leading causes of disability, affecting approximately 500 million people worldwide.[32]^1 The clinical pathological features of OA primarily include articular cartilage erosion, synovial hyperplasia, synovial inflammation, abnormal angiogenesis, subchondral bone disruption, ligament and tendon instability, and joint stiffness.[33]^2 Flaviu et al[34]^3 found that the postoperative neutrophil/lymphocyte ratio (NLR) and the aggregate inflammatory systemic index (AISI) are reliable inflammatory biomarkers involved in orthopedic pathological inflammatory processes. Additionally, some studies have shown that tissue inhibitor of metalloproteinases 3 (TIMP-3) can effectively inhibit cartilage degradation in osteoarthritis and may aid in predicting early disease progression and treatment response.[35]^4 Overall, identifying effective biomarker targets is crucial for early diagnosis and the development of molecular-level therapeutic strategies for OA patients. Metabolism is the process by which cells obtain energy from nutritional substrates and sustain vital cellular functions.[36]^5 Metabolic dysregulation can lead to various diseases, including cardiovascular diseases and OA.[37]^6^,[38]^7 Clinical evidence shows that OA frequently coexists with metabolic-related diseases, such as diabetes and cardiovascular conditions, which accelerate the rapid deterioration of OA-affected joints.[39]^8 Moreover, chondrocyte metabolic dysfunction has been demonstrated to play a pivotal role in cartilage degeneration and OA progression.[40]^9 Wu et al[41]^10 reported that chondrocyte energy metabolism disorders result in significantly reduced mitochondrial respiration, decreased ATP production, impaired mitochondrial membrane potential, and morphological damage, ultimately compromising the function of OA chondrocytes. Another study found that energy metabolism disruption in OA chondrocytes leads to the accumulation of reactive oxygen species (ROS), contributing to chondrocyte degradation and apoptosis, thus promoting OA development.[42]^11 In summary, metabolism plays a critical role in the onset and progression of OA, particularly energy metabolism, which is closely linked to mitochondrial dysfunction and is key in the degeneration of chondrocytes leading to OA. Based on this context, the present study utilizes bioinformatics and machine learning techniques to conduct a thorough analysis of the role of energy metabolism-related genes (EMRGs) in the development of OA. We developed a comprehensive nomogram and evaluated the diagnostic efficacy of the feature genes associated with OA. Furthermore, our research delves into how feature genes interact with immune cell infiltration. This study seeks to identify biomarkers that play a crucial role in OA pathogenesis and establish a gene signature based on molecular diagnostics for OA. Materials and Methods Accessing Analysing Data This is a data-driven study. In this research, gene expression profile data for subchondral bone were initially acquired from the Gene Expression Omnibus (GEO, [43]https://www.ncbi.nlm.nih.gov/geo/) database, categorized as the microarray dataset ([44]GSE51588, comprising 26 samples from the normal group and 40 from the abnormal group). Subsequently, genes linked to OA were sourced from the GeneCards database ([45]https://www.ncbi.nlm.nih.gov/geo/). By comparing the differentially expressed genes identified in the microarray data with the OA-related genes, a selection of the top 20 genes was made based on their relevance scores. For validation purposes, peripheral blood expression profile data from OA patients were utilized ([46]GSE63359, normal group: 26, abnormal group: 46). Moreover, the “msigdbr” package was harnessed to gather gene data pertinent to energy metabolism.[47]^12 Identification and Preliminary Analysis of EMRGs Associated with OA The analysis of differential expression for the [48]GSE51588 dataset was carried out using the “limma” package. We established our selection parameters at |log fold change (FC)|>0.585 and adjust.p<0.05, which allowed us to pinpoint the differentially expressed genes (DEGs, [49]Supplementary Table 1). To illustrate these findings, we created both volcano plots and heat maps. By comparing these DEGs with EMRGs, we discovered a subset of differentially expressed energy metabolism-related genes (DEEMRGs). We subsequently carried out enrichment analyses for Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) on this group of DEEMRGs. Furthermore, we constructed a protein-protein interaction (PPI) network for the DEEMRGs using the STRING database ([50]https://string-db.org/), applying a confidence score threshold greater than 0.4, and subsequently visualized the network with Cytoscape v3.10.2 software (The Cytoscape Consortium, USA). Consensus Cluster Analysis of DEEMRGs Consensus clustering analysis serves as a robust technique for identifying the ideal number of clusters, thereby enhancing the reliability and precision of the findings.[51]^13 In our study, we employed the “ConsensusClusterPlus” package to examine the expression profiles of DEEMRGs, allowing us to pinpoint specific OA subtypes. The assessment utilized Euclidean distance metrics, incorporating 1000 resampling iterations to bolster the analysis’s resilience. Principal Component Analysis (PCA) was conducted using the “FactoMineR” package to reduce data dimensionality and evaluate the clustering distribution of samples in high-dimensional space. Analysis and Identification of Differentially Expressed Genes in Molecular Subtypes This study carried out a differential expression analysis between the two subtypes employing the “limma” package, establishing criteria of |log FC| > 0.585 and adjust.p < 0.05 ([52]Supplementary Table 2). To visualize the genes that were differentially expressed, we created volcano and heat maps. Furthermore, we performed KEGG enrichment analysis utilizing the “clusterProfiler” package. The top 15 pathways, based on the number of differentially expressed genes, were subsequently depicted in a bubble plot for enhanced visualization. Single-Sample Gene Set Enrichment Analysis of Molecular Subtypes We kicked off our analysis by employing the single-sample Gene Set Enrichment Analysis (ssGSEA) method on the samples, utilizing the “GSVA” package for the task. For delving into the gene expression levels, we turned to the “ggpubr” package to parse the data and then depicted our findings through box plots. To ascertain the statistical significance of the disparities in expression levels, we resorted to the Wilcoxon test. Screening of Diagnostic Biomarker Signature Genes for OA The current research utilized three different machine learning techniques to pinpoint gene features that might act as diagnostic biomarkers for OA. In our quest to identify the most promising OA biomarkers, We employed Least Absolute Shrinkage and Selection Operator (LASSO) regression using the “glmnet” package to refine the dataset. ([53]Supplementary Table 3). Additionally, we executed Support Vector Machine Recursive Feature Elimination (SVM-RFE) analysis via the “caret” package ([54]Supplementary Table 4). To further our analysis, we also conducted a Random Forest (RF) evaluation utilizing the “randomForest” package, selecting the top 10 genes based on their MeanDecreaseGini importance rankings ([55]Supplementary Table 5). Receiver Operating Characteristic Curve and Expression Level Validation To evaluate the diagnostic potential of the feature genes, we utilized the “pROC” package to create ROC curves for these genes within both the training dataset ([56]GSE51588) and the validation dataset ([57]GSE63359). Additionally, we applied the Wilcoxon test to assess the statistical significance of variations in gene expression levels. Gene Set Enrichment Analysis (GSEA) of OA Characteristic Genes We performed pathway enrichment analysis on the characteristic genes utilizing the “clusterProfiler” package along with the reference dataset (c2.cp.kegg.v7.5.symbols.gmt). We set our criteria at adjust.p < 0.05 and |Normalized Enrichment Score (NES)| > 1. For every characteristic gene, we showcased the top five pathways that yielded the most significant P-values. Establishment of Competitive Endogenous RNA (ceRNA) Networks The present study employed the “multiMiR” package to pinpoint miRNAs that interact with specific genes, sifting through results from both TargetScan and miRDB. Next, we determined the overlap of predictions yielded by these two sources. Interaction data concerning lncRNAs and miRNAs was retrieved from the ENCORI database ([58]https://rnasysu.com/encori/), selecting lncRNAs with a clipExpNum greater than 10. Finally, the findings were illustrated using the “ggsankey” package. Construction of an OA Diagnostic Model Employing the “rms” package, we developed a nomogram and calibration curve for the feature genes. To visualize the performance of these nomograms and characteristic genes, ROC curves were created utilizing the “pROC” package. Furthermore, we produced decision curve analysis (DCA) curves for both the characteristic genes and the nomograms through the “rmda” package. Analysis of Immune Infiltration To evaluate the infiltration levels of immune cells in OA, we employed the “IOBR” package to create stacked bar charts illustrating the distribution of these immune cells. The CIBERSORT algorithm was applied to calculate immune infiltration levels, which were visualized using box plots. Furthermore, the “ggcorrplot” package was employed to visualize the correlations among differentially expressed immune cells. This approach also generated a heatmap that highlights the connections between characteristic genes and these immune cells. Statistical Analysis All analyses were conducted using R (version 4.1.3, R Foundation for Statistical Computing, Austria), and related packages. The Wilcoxon test was used to determine statistical differences between groups. ROC curves were plotted using the “pROC” package. Data visualization was primarily performed using the “ggplot2” package. P-values were summarized as follows: **** P < 0.0001; *** 0.0001 < P < 0.001; ** 0.001 < P < 0.01; * 0.01 < P < 0.05; ns P > 0.05. Results Identification and Preliminary Analysis of DEEMRGs To investigate the role of energy metabolism in OA, this study first identified 1561 DEGs from the [59]GSE51588 dataset by comparing normal and OA tissue samples. As shown in [60]Figure 1A, 815 genes were downregulated, while 746 were upregulated. [61]Figure 1B presents a heatmap of the top 20 DEGs with the highest relevance scores. By intersecting the DEGs with EMRGs, 49 DEEMRGs were obtained ([62]Figure 1C). A PPI network analysis of these DEEMRGs revealed 41 nodes and 157 edges, as illustrated in [63]Figure 1D. The darker color and larger size of the nodes indicate a higher number of interactions with other proteins, reflecting the tight connections among the DEEMRGs. GO enrichment analysis revealed that these DEEMRGs were significantly involved in biological processes such as the monosaccharide metabolic process, carbohydrate biosynthetic process, cellular carbohydrate metabolic process, hexose metabolic process, and ribose phosphate metabolic process ([64]Figure 1E). KEGG pathway analysis further demonstrated that these genes were primarily enriched in glycolysis/gluconeogenesis, carbon metabolism, the HIF-1 signaling pathway, the glucagon signaling pathway, and central carbon metabolism in cancer ([65]Figure 1F). Figure 1. [66]Figure 1 [67]Open in a new tab Identification and preliminary analysis of DEEMRGs. (A) The volcano plot illustrates the DEGs between normal tissues and OA tissues, with 815 downregulated genes and 746 upregulated genes. (B) The heatmap displays the expression levels of the top 20 DEGs by relevance score in different tissues, with red representing OA tissues and blue representing normal tissues. Values greater than 0 indicate high expression, while values less than 0 indicate low expression. (C) The Upset plot shows that the number of DEGs is 1,512, the number of EMRGs is 546, and the number of intersecting genes is 49. (D) The PPI network diagram of the 49 DEEMRGs. (E) GO enrichment analysis of the biological processes involving DEEMRGs. (F) KEGG pathway enrichment analysis of DEEMRGs. Consensus Clustering Analysis Based on DEEMRGs To further investigate the DEEMRGs, we conducted consensus clustering analysis using the “ConsensusClusterPlus” package. As illustrated in [68]Figure 2A–D, when K=2, the within-group variability was minimal, while the between-group differences were more pronounced. The changes in the cumulative distribution function (CDF) also provided clear insight into the quality of the clustering results. As shown in [69]Figure 2F, the smallest change in the CDF occurred at K=2, suggesting that dividing the samples into two subtypes was the most appropriate. This conclusion is further supported by [70]Figure 2E, which confirms that K=2 provided the optimal clustering effect. Consequently, the 40 OA samples in the training set were divided into two subtypes. The PCA plot of the subtypes, presented in [71]Figure 2G, demonstrates distinct separation between the two groups. Figure 2. [72]Figure 2 [73]Open in a new tab Consensus clustering analysis based on DEEMRGs. Consensus matrix heatmap at K=2 (A), K=3 (B), K=4 (C), K=5 (D). (E) Consensus cumulative distribution plot of the consensus clustering analysis. The X-axis represents the consensus index, and the Y-axis represents the change in the CDF value. CDF: cumulative distribution function. (F) Delta area plot of the consensus clustering analysis. K=2 indicates the optimal number of clusters. (G) The PCA plot of molecular subtypes shows significant differences between the two molecular subtypes. Blue represents molecular subtype group 1, and red represents molecular subtype group 2. Differential Expression Analysis and GSEA of Molecular Subtypes Following an in-depth analysis of differential expression between the two molecular subtypes, a total of 1,785 differential genes were identified. As illustrated in [74]Figure 3A, there are 930 genes upregulated in clust1 relative to clust2, and 855 genes downregulated in clust1 relative to clust2. [75]Figure 3B displays a heatmap of the expression levels for the top 20 differential genes based on relevance score. To further elucidate the molecular mechanisms underlying the subtypes, GSEA was conducted. The enrichment analysis results for upregulated differential genes, shown in [76]Figure 3C, indicate significant enrichment in pathways such as the cytoskeleton in muscle cells, human papillomavirus infection, cell cycle, motor proteins, and protein digestion and absorption. Conversely, downregulated genes were predominantly enriched in pathways associated with cytokine-cytokine receptor interaction, lipid metabolism and atherosclerosis, the MAPK signaling pathway, Chagas disease, and the chemokine signaling pathway ([77]Figure 3D). Figure 3. [78]Figure 3 [79]Open in a new tab Differential expression analysis and GSEA of molecular subtypes. (A) The volcano plot illustrates the genes that differ between the two molecular subtypes, revealing a total of 930 genes that are upregulated and 855 that are downregulated. (B) The heatmap shows the expression levels of the top 20 differential genes in the two molecular subtype groups. (C) KEGG pathway enrichment analysis results for upregulated genes. (D) Analysis of KEGG for downregulated genes. Immune Infiltration Analysis of Molecular Subtypes The level of immune cell infiltration is strongly associated with disease onset and progression. Thus, we utilized ssGSEA to analyze the abundance of immune cell infiltration between the two subtypes. The results, as shown in [80]Figure 4A, indicate that subtype 2 exhibits a significantly higher abundance of immune cell infiltration compared to subtype 1. To further explore the infiltration levels of immune cells, we applied the CIBERSORT algorithm to evaluate the infiltration of 22 types of immune cells in each subtype ([81]Figure 4B). The expression levels of these immune cells are presented in [82]Figure 4C, revealing nine differential immune cells. Specifically, Eosinophils, Macrophages M2, Mast cells activated, Monocytes, NK cells resting, and T cells CD4 memory activated are highly expressed in clust1, while B cells memory, Dendritic cells resting, and NK cells activated are more prevalent in clust2. Existing literature highlights the role of Macrophages in immune suppression, immune evasion, and resistance to immunotherapy, whereas NK cells activated are crucial for inhibiting disease progression.[83]^14^,[84]^15 Therefore, patients in subtype 2 are more likely to benefit from immune activation strategies to suppress disease development. To enhance our understanding of immune cell interactions and assist OA patients in preventing disease exacerbation, we performed Spearman correlation analysis of the differential immune cells between subtypes 1 and 2 and visualized the results using a heatmap. [85]Figure 4D illustrates a notable negative correlation between activated NK cells and resting NK cells. Meanwhile, resting dendritic cells express significant negative correlations with activated M2 macrophages, activated mast cells, and activated CD4 memory T cells. Eosinophils, on the other hand, display significant positive correlations with M2 macrophages and monocytes, whereas monocytes show strong positive correlations with activated CD4 memory T cells and resting NK cells. Figure 4. [86]Figure 4 [87]Open in a new tab Immune infiltration analysis of molecular subtypes. (A) The boxplot represents the ssGSEA scores of the two molecular subtypes. (B) The distribution plot of immune cell infiltration levels between molecular subtypes. (C) The boxplot shows the differential infiltration levels of immune cells between molecular subtypes. (D)The heatmap represents the correlation analysis of immune cell infiltration. For all figures: “ns” represents no significant difference, * represents P<0.05, ** represents P<0.01, and *** represents P<0.001. Screening Feature Genes Based on Machine Learning To identify diagnostic biomarkers for OA from EMRGs, we first applied LASSO analysis to eliminate highly correlated genes, resulting in the selection of 11 feature genes, as shown in [88]Figure 5A and [89]B. We then performed further analysis using RF. The residual distribution plot of the RF ([90]Figure 5C) shows the number of trees on the X-axis and the error on the Y-axis, indicating a gradual decrease in error as the number of trees increases. Based on the MeanDecreaseGini metric, we ranked the top 10 most important genes ([91]Figure 5D). Additionally, we used SVM-RFE to select core feature genes, with the results shown in [92]Figure 5E, identifying a classifier with the minimum cross-validation error of 3. Finally, by intersecting the genes selected through LASSO, SVM-RFE, and RF, we identified 2 key feature genes: NUP98 and RPIA ([93]Figure 5F). Figure 5. [94]Figure 5 [95]Open in a new tab Feature genes were selected based on machine learning. (A) Ten cross-validations for selecting adjustment parameters in the LASSO model. Each curve signifies a distinct gene. (B) LASSO coefficient evaluation. Vertical dashed lines indicate the optimal lambda. (C) The graph produced by the RF tree illustrates the correlation between the quantity of trees represented on the X-axis and the cross-validation error plotted on the Y-axis. (D) Feature importance plot of the RF. (E) The SVM-RFE feature selection result plot, with the horizontal axis representing variables and the vertical axis representing cross-validation error. (F) The Venn diagram illustrates the intersection of gene sets after applying three different machine learning methods. Diagnostic Value and Expression Level Assessment of Feature Genes Based on EMRGs We further investigated the diagnostic value of the feature genes NUP98 and RPIA for OA using ROC curves. In the training set [96]GSE51588, the AUC values for NUP98 and RPIA were 1 and 0.992, respectively ([97]Figure 6A). The results shown in [98]Figure 6B indicate that the expression levels of NUP98 and RPIA in OA tissues are significantly lower than those in normal tissues (P<0.001). In the validation set [99]GSE63359, the AUC values for the ROC curves were 0.748 for NUP98 and 0.703 for RPIA ([100]Figure 6C). Similarly, the expression levels of these feature genes in OA tissues were significantly lower than those in normal tissues ([101]Figure 6D). Additionally, [102]Table 1 presents the NPV, PPV, sensitivity, and specificity of the diagnostic gene ROC curves for both the training and validation sets, demonstrating the robust diagnostic performance of these two genes. The ROC curve analysis and expression patterns of these feature genes demonstrate their high accuracy and reliability in OA diagnosis. Notably, after analyzing the expression levels of these feature genes in the two molecular subtypes, we found that NUP98 and RPIA are significantly more expressed in subtype 1 compared to subtype 2 ([103]Figure 6E). [104]Figure 6F compares the expression levels of the feature genes between the subtypes and normal tissues, showing that NUP98 and RPIA are significantly more expressed in the control samples compared to subtype 1, and subtype 1 exhibits higher expression levels than subtype 2. The significant differences in NUP98 and RPIA expression between normal and OA tissues provide strong support for their potential as biomarkers for OA diagnosis. Figure 6. [105]Figure 6 [106]Open in a new tab Diagnostic value and expression level assessment of feature genes based on EMRGs. (A) In the [107]GSE51588 training set, the ROC curve AUC value for NUP98 is 1, while the AUC value for RPIA is 0.992. (B) Comparison of the expression levels of feature genes between normal and OA tissues in the training set. (C) In the [108]GSE63359 validation set, the ROC curve AUC value for NUP98 is 0.748, and for RPIA, it is 0.703. (D) Comparison of the expression levels of feature genes between normal and OA tissues in the validation set. (E) Analyzing the expression levels of characteristic genes across various molecular subtypes. (F) A thorough analysis of the expression levels of key genes in relation to the two molecular subtypes and normal tissues. For all figures: “ns” represents no significant difference, ** represents P<0.01, *** represents P<0.001, and **** represents P<0.0001. Table 1. Detailed Information of the GWAS in Our Analysis Datasets Gene AUC Sensitivity Specificity PPV NPV CutOff Training set NUP98 1 1 1 1 1 0.355 RPIA 0.992 1 0.95 1 0.833 0.298 Validation set NUP98 0.748 0.462 0.957 0.759 0.857 5.544 RPIA 0.703 0.731 0.696 0.821 0.576 6.422 [109]Open in a new tab Construction of an OA Diagnostic Signature Based on EMRGs To provide a clearer visualization of the diagnostic performance of the characteristic genes, we constructed a comprehensive nomogram ([110]Figure 7A). The nomogram, incorporating gene expression levels and risk scores, significantly improved our ability to predict OA risk. As shown in [111]Figure 7B, the AUC values for the ROC curves of the nomogram, NUP98, and RPIA were 1, 1, and 0.992, respectively, indicating excellent diagnostic accuracy. Furthermore, the calibration curve ([112]Figure 7C) demonstrated that the nomogram’s predictions closely aligned with actual observations. The DCA was employed to evaluate the clinical utility of this diagnostic model. As illustrated in [113]Figure 7D, the nomogram showed a substantial net benefit, highlighting its strong potential for use in clinical decision process. Overall, the diagnostic model based on EMRGs exhibited outstanding diagnostic efficacy and precision. Figure 7. [114]Figure 7 [115]Open in a new tab Establishment of an OA diagnostic model based on EMRGs. (A) A nomogram created from the two feature genes. (B) The results of the ROC curve show that the AUC values for the nomogram, NUP98, and RPIA are 1, 1, and 0.992, respectively. (C) The calibration curve: the horizontal axis represents the predicted probability of different clinical outcomes by the model, and the vertical axis represents the observed probability of patient clinical outcomes. The dashed line represents the ideal curve for comparison. (D) The DCA curve for the feature genes and diagnostic model: “None” represents the clinical outcome where none of the OA patients are diagnosed with the disease, and “All” represents the outcome where all OA patients are diagnosed. The X-axis indicates the threshold probability, while the Y-axis shows the net gain. GSEA Enrichment Analysis and Construction of the ceRNA Network for Characteristic Genes To further investigate the functional roles of the characteristic genes, we conducted GSEA analysis. The enrichment analysis of NUP98 ([116]Figure 8A) revealed that it is primarily involved in pathways such as hypertrophic cardiomyopathy, spliceosome, leishmania infection, receptor interaction, and olfactory transduction. Similarly, RPIA was found to participate in olfactory transduction, ribosome function, receptor interaction, basal cell carcinoma, and the hedgehog signaling pathway ([117]Figure 8B). Additionally, by constructing a ceRNA network, we identified 18 pairs of relationships involving 4 miRNAs and 8 lncRNAs. The interactions between the characteristic genes, lncRNAs, and miRNAs were highly interconnected ([118]Figure 8C). Figure 8. [119]Figure 8 [120]Open in a new tab GSEA enrichment analysis and construction of the ceRNA network for feature genes. (A) The GSEA enrichment results indicate the pathways involving NUP98. (B) The GSEA enrichment findings reveal the pathways associated with RPIA. (C) The Sankey diagram illustrates the ceRNA network of the feature genes. Analysis of the Immunological Landscape To clearly depict the immune landscape in OA patients, we analyzed the immune cell infiltration status in normal and OA tissues ([121]Figure 9A). Notably, an examination of immune cell infiltration through differential analysis showed considerable variations among 15 different immune cell types across the groups ([122]Figure 9B). Among these, naive B cells, CD8^+ T cells, naive CD4^+ T cells, Tregs, activated NK cells, M1 and M2 macrophages, and resting dendritic cells showed higher infiltration in OA tissues, whereas plasma cells, resting memory CD4^+ T cells, monocytes, activated dendritic cells, resting mast cells, activated mast cells, and neutrophils exhibited lower infiltration in OA tissues. We further conducted a detailed analysis of the interactions between these immune cells to uncover the relationships among immune infiltrates within the immune microenvironment ([123]Figure 9C). Crucially, we also explored the relationship between the expression levels of feature genes and the infiltration of immune cells. [124]Figure 9D indicates a notable negative correlation between NUP98 and naive B cells, CD8^+ T cells, Tregs, activated NK cells, as well as activated dendritic cells. Conversely, it shows a positive correlation with plasma cells, resting memory CD4^+ T cells, resting mast cells, and neutrophils. On the other hand, RPIA displayed a negative correlation with CD8^+ T cells, follicular helper T cells, Tregs, activated NK cells, M1 macrophages, and resting dendritic cells. However, it featured a positive correlation with plasma cells, resting memory CD4^+ T cells, activated memory CD4^+ T cells, monocytes, resting mast cells, and neutrophils. Figure 9. [125]Figure 9 [126]Open in a new tab Immune cell infiltration analysis. (A) A comparison of the infiltration levels of 22 different immune cell types in both normal and osteoarthritic tissues. (B) The box plot illustrates the comparative analysis of immune cell infiltration across various tissues. (C) The heatmap depicts the correlation analysis related to differences in immune cell infiltration. (D) The examination of the relationship between feature gene expression levels and immune cell infiltration. For all figures: “ns” represents no significant difference, * represents P<0.05, ** represents P<0.01, *** represents P<0.001, and **** represents P<0.0001. Discussion OA is a chronic neurodegenerative joint disorder that significantly affects the quality of life in middle-aged and elderly individuals.[127]^16 From a histopathological perspective, OA is marked by a gradual deterioration of the articular cartilage, the emergence of osteophytes, and the formation of subchondral cysts.[128]^16 Under adverse conditions, metabolic activity shifts from a resting state to a highly active one, playing a key role in combating disease onset and progression.[129]^17 Metabolic processes are crucial for the functioning of various organs, including cartilage and synovial joints.[130]^18 Glucose serves as the primary energy source for the body, and its breakdown provides the necessary energy to maintain normal physiological functions. Cartilage, in particular, relies on an adequate energy supply for its proper function.[131]^19 Disruptions in energy metabolism can lead to abnormalities in intracellular biological processes, laying the foundation for disease development.[132]^10^,[133]^12 The present study, therefore, investigates the diagnostic potential of EMRGs in OA, exploring their functional roles in OA progression and providing a theoretical basis for identifying diagnostic biomarkers and informing clinical decision making in OA. This study identified two key genes associated with OA progression, NUP98 and RPIA, from the EMRGs gene set using three machine learning methods: LASSO, SVM-RFE, and RF. Nucleoporin 98 (NUP98) encodes a protein that forms part of the nuclear pore complex.[134]^20 Research indicates that nuclear pore proteins (Nups) interact with chromatin to regulate gene expression and chromatin architecture.[135]^21 According to S. Capitanio et al, NUP98 interacts with several DExH/D-box proteins (DBPs) to modulate gene expression and RNA metabolism.[136]^22 RNA metabolism plays a pivotal role in chondrocyte senescence, a key process in OA pathology. Specifically, the long noncoding RNA ELDR is known to accelerate chondrocyte aging and OA progression.[137]^23 Abnormal gene expression forms the basis of disease development and is crucial for identifying disease biomarkers.[138]^24 Our findings demonstrate that NUP98 expression is significantly reduced in OA tissues. ROC and DCA curve analyses further confirmed the high diagnostic accuracy of NUP98 in OA. The another gene, ribose-5-phosphate isomerase A (RPIA), has been implicated in the development of several malignancies, including liver cancer[139]^25 and colorectal cancer.[140]^26 Our results suggest that RPIA holds promise as a potential diagnostic biomarker for OA. While RPIA is highly expressed in normal tissues, its levels are significantly decreased in OA tissues. Importantly, further validation through nomogram analysis and ROC curves confirmed the high efficiency of RPIA as a diagnostic marker for OA. OA is a complex disease driven by multiple factors, including inflammation.[141]^27 Immune cells, particularly macrophages, play a critical role as mediators of the inflammatory response.[142]^27 Macrophages are highly heterogeneous and can be classified into activated M1 and M2 macrophages, as well as classically activated macrophages.[143]^28 M1 macrophages are pro-inflammatory, secreting cytokines such as interleukin-6 (IL-6), inducible nitric oxide synthase (iNOS), and tumor necrosis factor-alpha (TNF-α).[144]^29 Conversely, M2 macrophages produce anti-inflammatory factors, including IL-10, IL-1 receptor antagonist, and arginase, which inhibit IL-1β and iNOS activity.[145]^29 Studies have shown that shifting macrophages from the M1 to the M2 phenotype may have therapeutic benefits for OA.[146]^7^,[147]^30^,[148]^31 In this study, ssGSEA analysis revealed elevated infiltration of both M1 and M2 macrophages in OA tissues, suggesting that targeted regulation of macrophage polarization could be a promising therapeutic strategy for OA. In addition, our analysis revealed a strong negative correlation between RPIA expression and the infiltration levels of M1 macrophages, which supports the notion of tailoring OA treatments to individual patients. However, there are several limitations in this study. First, this research was based on transcriptomic data obtained from publicly available databases, which lacks access to additional clinically relevant data from OA samples, potentially impacting the results. Secondly, further experimental validation is needed to confirm the inflammatory findings and strengthen our conclusions. In summary, our study provides valuable insights into targeting energy metabolism as a therapeutic approach for OA, and future research could expand and deepen the investigation in this area. Conclusion This study identified two diagnostic genes associated with EMRGs in OA through bioinformatics combined with machine learning algorithms, emphasizing the connection between energy metabolism and OA progression. The diagnostic accuracy of these feature genes was validated via the construction of nomogram and ROC curve. Furthermore, we examined the differences in immune cell infiltration between normal and OA tissues, and explored the correlation between feature genes and immune infiltration patterns. Collectively, this analysis underscores the critical role of energy metabolism in OA, providing theoretical support for personalized treatment strategies and clinical diagnostic decisions for OA patients. Funding Statement There is no funding to report. Data Sharing Statement The data and materials in the current study are available from the corresponding author on reasonable request. Ethics Approval and Consent to Participate This study involves publicly available human data and complies with the exemption criteria outlined in items 1 and 2 of Article 32 of the Measures for Ethical Review of Life Science and Medical Research Involving Human Subjects (issued on February 18, 2023, China). Specifically: 1. The research involves publicly accessible data that has been legally obtained and does not involve the collection of new biological samples or private information from participants. 2. The research poses no risk to the participants’ rights or health and does not violate ethical principles. Based on the above guidelines, this study is exempt from review by the Institutional Review Board (IRB) or local ethics committee. All data used in this study are de-identified and anonymized prior to public release, ensuring the protection of participants’ privacy. Author Contributions All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising, or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work. Disclosure The authors declare that they have no potential conflicts of interest. References