Abstract

Background

   Osteoarthritis (OA) is a chronic and complex degenerative joint disease
   that increasingly burdens and affects the elderly population. Abnormal
   energy metabolism is closely associated with the pathological
   mechanisms of OA. This study aims to identify key genes related to
   energy metabolism that are closely linked to the treatment and
   diagnosis of OA.

Methods

   The transcriptomic data for OA were collected from the Gene Expression
   Omnibus (GEO), with [30]GSE51588 and [31]GSE63359 serving as the
   training and validation datasets, respectively. Differential expression
   analysis was conducted to identify key energy metabolism-related genes.
   Unsupervised clustering was performed to classify molecular subtypes.
   Three machine learning algorithms were employed to identify key
   diagnosis genes, specifically Least Absolute Shrinkage and Selection
   Operator (LASSO), Support Vector Machine Recursive Feature Elimination
   (SVM-RFE), and Random Forest (RF). We construct a comprehensive
   nomogram, and the diagnostic performance of both the nomogram and
   feature genes was evaluated using operating characteristic curve (ROC)
   and calibration curves. We assessed the immune infiltration levels in
   OA samples using the IOBR platform and the CIBERSORT algorithm.

Results

   We classified OA patients into two molecular subtypes, which exhibited
   distinct differences in immune infiltration levels. Subsequently, we
   successfully identified two characteristic genes, NUP98 and RPIA, and
   demonstrated statistically significant differences (P < 0.05) and
   diagnostic performance in the validation cohort. Evaluation using ROC
   and calibration curves demonstrated that these characteristic genes
   exhibit robust diagnostic performance. Multiple immune cells may be
   involved in the development of OA, and all characteristic genes may be
   associated with immune cells to varying degrees.

Conclusion

   In conclusion, NUP98 and RPIA have the potential to serve as diagnostic
   biomarkers for OA and may offer opportunities for therapeutic
   intervention in OA.

   Keywords: osteoarthritis, energy metabolism, diagnosis, immunology

Introduction

   Osteoarthritis (OA) is a chronic degenerative joint disease and one of
   the leading causes of disability, affecting approximately 500 million
   people worldwide.[32]^1 The clinical pathological features of OA
   primarily include articular cartilage erosion, synovial hyperplasia,
   synovial inflammation, abnormal angiogenesis, subchondral bone
   disruption, ligament and tendon instability, and joint stiffness.[33]^2
   Flaviu et al[34]^3 found that the postoperative neutrophil/lymphocyte
   ratio (NLR) and the aggregate inflammatory systemic index (AISI) are
   reliable inflammatory biomarkers involved in orthopedic pathological
   inflammatory processes. Additionally, some studies have shown that
   tissue inhibitor of metalloproteinases 3 (TIMP-3) can effectively
   inhibit cartilage degradation in osteoarthritis and may aid in
   predicting early disease progression and treatment response.[35]^4
   Overall, identifying effective biomarker targets is crucial for early
   diagnosis and the development of molecular-level therapeutic strategies
   for OA patients.

   Metabolism is the process by which cells obtain energy from nutritional
   substrates and sustain vital cellular functions.[36]^5 Metabolic
   dysregulation can lead to various diseases, including cardiovascular
   diseases and OA.[37]^6^,[38]^7 Clinical evidence shows that OA
   frequently coexists with metabolic-related diseases, such as diabetes
   and cardiovascular conditions, which accelerate the rapid deterioration
   of OA-affected joints.[39]^8 Moreover, chondrocyte metabolic
   dysfunction has been demonstrated to play a pivotal role in cartilage
   degeneration and OA progression.[40]^9 Wu et al[41]^10 reported that
   chondrocyte energy metabolism disorders result in significantly reduced
   mitochondrial respiration, decreased ATP production, impaired
   mitochondrial membrane potential, and morphological damage, ultimately
   compromising the function of OA chondrocytes. Another study found that
   energy metabolism disruption in OA chondrocytes leads to the
   accumulation of reactive oxygen species (ROS), contributing to
   chondrocyte degradation and apoptosis, thus promoting OA
   development.[42]^11 In summary, metabolism plays a critical role in the
   onset and progression of OA, particularly energy metabolism, which is
   closely linked to mitochondrial dysfunction and is key in the
   degeneration of chondrocytes leading to OA.

   Based on this context, the present study utilizes bioinformatics and
   machine learning techniques to conduct a thorough analysis of the role
   of energy metabolism-related genes (EMRGs) in the development of OA. We
   developed a comprehensive nomogram and evaluated the diagnostic
   efficacy of the feature genes associated with OA. Furthermore, our
   research delves into how feature genes interact with immune cell
   infiltration. This study seeks to identify biomarkers that play a
   crucial role in OA pathogenesis and establish a gene signature based on
   molecular diagnostics for OA.

Materials and Methods

Accessing Analysing Data

   This is a data-driven study. In this research, gene expression profile
   data for subchondral bone were initially acquired from the Gene
   Expression Omnibus (GEO, [43]https://www.ncbi.nlm.nih.gov/geo/)
   database, categorized as the microarray dataset ([44]GSE51588,
   comprising 26 samples from the normal group and 40 from the abnormal
   group). Subsequently, genes linked to OA were sourced from the
   GeneCards database ([45]https://www.ncbi.nlm.nih.gov/geo/). By
   comparing the differentially expressed genes identified in the
   microarray data with the OA-related genes, a selection of the top 20
   genes was made based on their relevance scores. For validation
   purposes, peripheral blood expression profile data from OA patients
   were utilized ([46]GSE63359, normal group: 26, abnormal group: 46).
   Moreover, the “msigdbr” package was harnessed to gather gene data
   pertinent to energy metabolism.[47]^12

Identification and Preliminary Analysis of EMRGs Associated with OA

   The analysis of differential expression for the [48]GSE51588 dataset
   was carried out using the “limma” package. We established our selection
   parameters at |log fold change (FC)|>0.585 and adjust.p<0.05, which
   allowed us to pinpoint the differentially expressed genes (DEGs,
   [49]Supplementary Table 1). To illustrate these findings, we created
   both volcano plots and heat maps. By comparing these DEGs with EMRGs,
   we discovered a subset of differentially expressed energy
   metabolism-related genes (DEEMRGs). We subsequently carried out
   enrichment analyses for Gene Ontology (GO) and the Kyoto Encyclopedia
   of Genes and Genomes (KEGG) on this group of DEEMRGs. Furthermore, we
   constructed a protein-protein interaction (PPI) network for the DEEMRGs
   using the STRING database ([50]https://string-db.org/), applying a
   confidence score threshold greater than 0.4, and subsequently
   visualized the network with Cytoscape v3.10.2 software (The Cytoscape
   Consortium, USA).

Consensus Cluster Analysis of DEEMRGs

   Consensus clustering analysis serves as a robust technique for
   identifying the ideal number of clusters, thereby enhancing the
   reliability and precision of the findings.[51]^13 In our study, we
   employed the “ConsensusClusterPlus” package to examine the expression
   profiles of DEEMRGs, allowing us to pinpoint specific OA subtypes. The
   assessment utilized Euclidean distance metrics, incorporating 1000
   resampling iterations to bolster the analysis’s resilience. Principal
   Component Analysis (PCA) was conducted using the “FactoMineR” package
   to reduce data dimensionality and evaluate the clustering distribution
   of samples in high-dimensional space.

Analysis and Identification of Differentially Expressed Genes in Molecular
Subtypes

   This study carried out a differential expression analysis between the
   two subtypes employing the “limma” package, establishing criteria of
   |log FC| > 0.585 and adjust.p < 0.05 ([52]Supplementary Table 2). To
   visualize the genes that were differentially expressed, we created
   volcano and heat maps. Furthermore, we performed KEGG enrichment
   analysis utilizing the “clusterProfiler” package. The top 15 pathways,
   based on the number of differentially expressed genes, were
   subsequently depicted in a bubble plot for enhanced visualization.

Single-Sample Gene Set Enrichment Analysis of Molecular Subtypes

   We kicked off our analysis by employing the single-sample Gene Set
   Enrichment Analysis (ssGSEA) method on the samples, utilizing the
   “GSVA” package for the task. For delving into the gene expression
   levels, we turned to the “ggpubr” package to parse the data and then
   depicted our findings through box plots. To ascertain the statistical
   significance of the disparities in expression levels, we resorted to
   the Wilcoxon test.

Screening of Diagnostic Biomarker Signature Genes for OA

   The current research utilized three different machine learning
   techniques to pinpoint gene features that might act as diagnostic
   biomarkers for OA. In our quest to identify the most promising OA
   biomarkers, We employed Least Absolute Shrinkage and Selection Operator
   (LASSO) regression using the “glmnet” package to refine the dataset.
   ([53]Supplementary Table 3). Additionally, we executed Support Vector
   Machine Recursive Feature Elimination (SVM-RFE) analysis via the
   “caret” package ([54]Supplementary Table 4). To further our analysis,
   we also conducted a Random Forest (RF) evaluation utilizing the
   “randomForest” package, selecting the top 10 genes based on their
   MeanDecreaseGini importance rankings ([55]Supplementary Table 5).

Receiver Operating Characteristic Curve and Expression Level Validation

   To evaluate the diagnostic potential of the feature genes, we utilized
   the “pROC” package to create ROC curves for these genes within both the
   training dataset ([56]GSE51588) and the validation dataset
   ([57]GSE63359). Additionally, we applied the Wilcoxon test to assess
   the statistical significance of variations in gene expression levels.

Gene Set Enrichment Analysis (GSEA) of OA Characteristic Genes

   We performed pathway enrichment analysis on the characteristic genes
   utilizing the “clusterProfiler” package along with the reference
   dataset (c2.cp.kegg.v7.5.symbols.gmt). We set our criteria at adjust.p
   < 0.05 and |Normalized Enrichment Score (NES)| > 1. For every
   characteristic gene, we showcased the top five pathways that yielded
   the most significant P-values.

Establishment of Competitive Endogenous RNA (ceRNA) Networks

   The present study employed the “multiMiR” package to pinpoint miRNAs
   that interact with specific genes, sifting through results from both
   TargetScan and miRDB. Next, we determined the overlap of predictions
   yielded by these two sources. Interaction data concerning lncRNAs and
   miRNAs was retrieved from the ENCORI database
   ([58]https://rnasysu.com/encori/), selecting lncRNAs with a clipExpNum
   greater than 10. Finally, the findings were illustrated using the
   “ggsankey” package.

Construction of an OA Diagnostic Model

   Employing the “rms” package, we developed a nomogram and calibration
   curve for the feature genes. To visualize the performance of these
   nomograms and characteristic genes, ROC curves were created utilizing
   the “pROC” package. Furthermore, we produced decision curve analysis
   (DCA) curves for both the characteristic genes and the nomograms
   through the “rmda” package.

Analysis of Immune Infiltration

   To evaluate the infiltration levels of immune cells in OA, we employed
   the “IOBR” package to create stacked bar charts illustrating the
   distribution of these immune cells. The CIBERSORT algorithm was applied
   to calculate immune infiltration levels, which were visualized using
   box plots. Furthermore, the “ggcorrplot” package was employed to
   visualize the correlations among differentially expressed immune cells.
   This approach also generated a heatmap that highlights the connections
   between characteristic genes and these immune cells.

Statistical Analysis

   All analyses were conducted using R (version 4.1.3, R Foundation for
   Statistical Computing, Austria), and related packages. The Wilcoxon
   test was used to determine statistical differences between groups. ROC
   curves were plotted using the “pROC” package. Data visualization was
   primarily performed using the “ggplot2” package. P-values were
   summarized as follows: **** P < 0.0001; *** 0.0001 < P < 0.001; **
   0.001 < P < 0.01; * 0.01 < P < 0.05; ns P > 0.05.

Results

Identification and Preliminary Analysis of DEEMRGs

   To investigate the role of energy metabolism in OA, this study first
   identified 1561 DEGs from the [59]GSE51588 dataset by comparing normal
   and OA tissue samples. As shown in [60]Figure 1A, 815 genes were
   downregulated, while 746 were upregulated. [61]Figure 1B presents a
   heatmap of the top 20 DEGs with the highest relevance scores. By
   intersecting the DEGs with EMRGs, 49 DEEMRGs were obtained ([62]Figure
   1C). A PPI network analysis of these DEEMRGs revealed 41 nodes and 157
   edges, as illustrated in [63]Figure 1D. The darker color and larger
   size of the nodes indicate a higher number of interactions with other
   proteins, reflecting the tight connections among the DEEMRGs. GO
   enrichment analysis revealed that these DEEMRGs were significantly
   involved in biological processes such as the monosaccharide metabolic
   process, carbohydrate biosynthetic process, cellular carbohydrate
   metabolic process, hexose metabolic process, and ribose phosphate
   metabolic process ([64]Figure 1E). KEGG pathway analysis further
   demonstrated that these genes were primarily enriched in
   glycolysis/gluconeogenesis, carbon metabolism, the HIF-1 signaling
   pathway, the glucagon signaling pathway, and central carbon metabolism
   in cancer ([65]Figure 1F).

Figure 1.

   [66]Figure 1
   [67]Open in a new tab

   Identification and preliminary analysis of DEEMRGs. (A) The volcano
   plot illustrates the DEGs between normal tissues and OA tissues, with
   815 downregulated genes and 746 upregulated genes. (B) The heatmap
   displays the expression levels of the top 20 DEGs by relevance score in
   different tissues, with red representing OA tissues and blue
   representing normal tissues. Values greater than 0 indicate high
   expression, while values less than 0 indicate low expression. (C) The
   Upset plot shows that the number of DEGs is 1,512, the number of EMRGs
   is 546, and the number of intersecting genes is 49. (D) The PPI network
   diagram of the 49 DEEMRGs. (E) GO enrichment analysis of the biological
   processes involving DEEMRGs. (F) KEGG pathway enrichment analysis of
   DEEMRGs.

Consensus Clustering Analysis Based on DEEMRGs

   To further investigate the DEEMRGs, we conducted consensus clustering
   analysis using the “ConsensusClusterPlus” package. As illustrated in
   [68]Figure 2A–D, when K=2, the within-group variability was minimal,
   while the between-group differences were more pronounced. The changes
   in the cumulative distribution function (CDF) also provided clear
   insight into the quality of the clustering results. As shown in
   [69]Figure 2F, the smallest change in the CDF occurred at K=2,
   suggesting that dividing the samples into two subtypes was the most
   appropriate. This conclusion is further supported by [70]Figure 2E,
   which confirms that K=2 provided the optimal clustering effect.
   Consequently, the 40 OA samples in the training set were divided into
   two subtypes. The PCA plot of the subtypes, presented in [71]Figure 2G,
   demonstrates distinct separation between the two groups.

Figure 2.

   [72]Figure 2
   [73]Open in a new tab

   Consensus clustering analysis based on DEEMRGs. Consensus matrix
   heatmap at K=2 (A), K=3 (B), K=4 (C), K=5 (D). (E) Consensus cumulative
   distribution plot of the consensus clustering analysis. The X-axis
   represents the consensus index, and the Y-axis represents the change in
   the CDF value. CDF: cumulative distribution function. (F) Delta area
   plot of the consensus clustering analysis. K=2 indicates the optimal
   number of clusters. (G) The PCA plot of molecular subtypes shows
   significant differences between the two molecular subtypes. Blue
   represents molecular subtype group 1, and red represents molecular
   subtype group 2.

Differential Expression Analysis and GSEA of Molecular Subtypes

   Following an in-depth analysis of differential expression between the
   two molecular subtypes, a total of 1,785 differential genes were
   identified. As illustrated in [74]Figure 3A, there are 930 genes
   upregulated in clust1 relative to clust2, and 855 genes downregulated
   in clust1 relative to clust2. [75]Figure 3B displays a heatmap of the
   expression levels for the top 20 differential genes based on relevance
   score. To further elucidate the molecular mechanisms underlying the
   subtypes, GSEA was conducted. The enrichment analysis results for
   upregulated differential genes, shown in [76]Figure 3C, indicate
   significant enrichment in pathways such as the cytoskeleton in muscle
   cells, human papillomavirus infection, cell cycle, motor proteins, and
   protein digestion and absorption. Conversely, downregulated genes were
   predominantly enriched in pathways associated with cytokine-cytokine
   receptor interaction, lipid metabolism and atherosclerosis, the MAPK
   signaling pathway, Chagas disease, and the chemokine signaling pathway
   ([77]Figure 3D).

Figure 3.

   [78]Figure 3
   [79]Open in a new tab

   Differential expression analysis and GSEA of molecular subtypes. (A)
   The volcano plot illustrates the genes that differ between the two
   molecular subtypes, revealing a total of 930 genes that are upregulated
   and 855 that are downregulated. (B) The heatmap shows the expression
   levels of the top 20 differential genes in the two molecular subtype
   groups. (C) KEGG pathway enrichment analysis results for upregulated
   genes. (D) Analysis of KEGG for downregulated genes.

Immune Infiltration Analysis of Molecular Subtypes

   The level of immune cell infiltration is strongly associated with
   disease onset and progression. Thus, we utilized ssGSEA to analyze the
   abundance of immune cell infiltration between the two subtypes. The
   results, as shown in [80]Figure 4A, indicate that subtype 2 exhibits a
   significantly higher abundance of immune cell infiltration compared to
   subtype 1. To further explore the infiltration levels of immune cells,
   we applied the CIBERSORT algorithm to evaluate the infiltration of 22
   types of immune cells in each subtype ([81]Figure 4B). The expression
   levels of these immune cells are presented in [82]Figure 4C, revealing
   nine differential immune cells. Specifically, Eosinophils, Macrophages
   M2, Mast cells activated, Monocytes, NK cells resting, and T cells CD4
   memory activated are highly expressed in clust1, while B cells memory,
   Dendritic cells resting, and NK cells activated are more prevalent in
   clust2. Existing literature highlights the role of Macrophages in
   immune suppression, immune evasion, and resistance to immunotherapy,
   whereas NK cells activated are crucial for inhibiting disease
   progression.[83]^14^,[84]^15 Therefore, patients in subtype 2 are more
   likely to benefit from immune activation strategies to suppress disease
   development. To enhance our understanding of immune cell interactions
   and assist OA patients in preventing disease exacerbation, we performed
   Spearman correlation analysis of the differential immune cells between
   subtypes 1 and 2 and visualized the results using a heatmap. [85]Figure
   4D illustrates a notable negative correlation between activated NK
   cells and resting NK cells. Meanwhile, resting dendritic cells express
   significant negative correlations with activated M2 macrophages,
   activated mast cells, and activated CD4 memory T cells. Eosinophils, on
   the other hand, display significant positive correlations with M2
   macrophages and monocytes, whereas monocytes show strong positive
   correlations with activated CD4 memory T cells and resting NK cells.

Figure 4.

   [86]Figure 4
   [87]Open in a new tab

   Immune infiltration analysis of molecular subtypes. (A) The boxplot
   represents the ssGSEA scores of the two molecular subtypes. (B) The
   distribution plot of immune cell infiltration levels between molecular
   subtypes. (C) The boxplot shows the differential infiltration levels of
   immune cells between molecular subtypes. (D)The heatmap represents the
   correlation analysis of immune cell infiltration. For all figures: “ns”
   represents no significant difference, * represents P<0.05, **
   represents P<0.01, and *** represents P<0.001.

Screening Feature Genes Based on Machine Learning

   To identify diagnostic biomarkers for OA from EMRGs, we first applied
   LASSO analysis to eliminate highly correlated genes, resulting in the
   selection of 11 feature genes, as shown in [88]Figure 5A and [89]B. We
   then performed further analysis using RF. The residual distribution
   plot of the RF ([90]Figure 5C) shows the number of trees on the X-axis
   and the error on the Y-axis, indicating a gradual decrease in error as
   the number of trees increases. Based on the MeanDecreaseGini metric, we
   ranked the top 10 most important genes ([91]Figure 5D). Additionally,
   we used SVM-RFE to select core feature genes, with the results shown in
   [92]Figure 5E, identifying a classifier with the minimum
   cross-validation error of 3. Finally, by intersecting the genes
   selected through LASSO, SVM-RFE, and RF, we identified 2 key feature
   genes: NUP98 and RPIA ([93]Figure 5F).

Figure 5.

   [94]Figure 5
   [95]Open in a new tab

   Feature genes were selected based on machine learning. (A) Ten
   cross-validations for selecting adjustment parameters in the LASSO
   model. Each curve signifies a distinct gene. (B) LASSO coefficient
   evaluation. Vertical dashed lines indicate the optimal lambda. (C) The
   graph produced by the RF tree illustrates the correlation between the
   quantity of trees represented on the X-axis and the cross-validation
   error plotted on the Y-axis. (D) Feature importance plot of the RF. (E)
   The SVM-RFE feature selection result plot, with the horizontal axis
   representing variables and the vertical axis representing
   cross-validation error. (F) The Venn diagram illustrates the
   intersection of gene sets after applying three different machine
   learning methods.

Diagnostic Value and Expression Level Assessment of Feature Genes Based on
EMRGs

   We further investigated the diagnostic value of the feature genes NUP98
   and RPIA for OA using ROC curves. In the training set [96]GSE51588, the
   AUC values for NUP98 and RPIA were 1 and 0.992, respectively
   ([97]Figure 6A). The results shown in [98]Figure 6B indicate that the
   expression levels of NUP98 and RPIA in OA tissues are significantly
   lower than those in normal tissues (P<0.001). In the validation set
   [99]GSE63359, the AUC values for the ROC curves were 0.748 for NUP98
   and 0.703 for RPIA ([100]Figure 6C). Similarly, the expression levels
   of these feature genes in OA tissues were significantly lower than
   those in normal tissues ([101]Figure 6D). Additionally, [102]Table 1
   presents the NPV, PPV, sensitivity, and specificity of the diagnostic
   gene ROC curves for both the training and validation sets,
   demonstrating the robust diagnostic performance of these two genes. The
   ROC curve analysis and expression patterns of these feature genes
   demonstrate their high accuracy and reliability in OA diagnosis.
   Notably, after analyzing the expression levels of these feature genes
   in the two molecular subtypes, we found that NUP98 and RPIA are
   significantly more expressed in subtype 1 compared to subtype 2
   ([103]Figure 6E). [104]Figure 6F compares the expression levels of the
   feature genes between the subtypes and normal tissues, showing that
   NUP98 and RPIA are significantly more expressed in the control samples
   compared to subtype 1, and subtype 1 exhibits higher expression levels
   than subtype 2. The significant differences in NUP98 and RPIA
   expression between normal and OA tissues provide strong support for
   their potential as biomarkers for OA diagnosis.

Figure 6.

   [105]Figure 6
   [106]Open in a new tab

   Diagnostic value and expression level assessment of feature genes based
   on EMRGs. (A) In the [107]GSE51588 training set, the ROC curve AUC
   value for NUP98 is 1, while the AUC value for RPIA is 0.992. (B)
   Comparison of the expression levels of feature genes between normal and
   OA tissues in the training set. (C) In the [108]GSE63359 validation
   set, the ROC curve AUC value for NUP98 is 0.748, and for RPIA, it is
   0.703. (D) Comparison of the expression levels of feature genes between
   normal and OA tissues in the validation set. (E) Analyzing the
   expression levels of characteristic genes across various molecular
   subtypes. (F) A thorough analysis of the expression levels of key genes
   in relation to the two molecular subtypes and normal tissues. For all
   figures: “ns” represents no significant difference, ** represents
   P<0.01, *** represents P<0.001, and **** represents P<0.0001.

Table 1.

   Detailed Information of the GWAS in Our Analysis
      Datasets    Gene   AUC  Sensitivity Specificity  PPV   NPV  CutOff
   Training set
                  NUP98 1     1           1           1     1     0.355
                  RPIA  0.992 1           0.95        1     0.833 0.298
   Validation set
                  NUP98 0.748 0.462       0.957       0.759 0.857 5.544
                  RPIA  0.703 0.731       0.696       0.821 0.576 6.422
   [109]Open in a new tab

Construction of an OA Diagnostic Signature Based on EMRGs

   To provide a clearer visualization of the diagnostic performance of the
   characteristic genes, we constructed a comprehensive nomogram
   ([110]Figure 7A). The nomogram, incorporating gene expression levels
   and risk scores, significantly improved our ability to predict OA risk.
   As shown in [111]Figure 7B, the AUC values for the ROC curves of the
   nomogram, NUP98, and RPIA were 1, 1, and 0.992, respectively,
   indicating excellent diagnostic accuracy. Furthermore, the calibration
   curve ([112]Figure 7C) demonstrated that the nomogram’s predictions
   closely aligned with actual observations. The DCA was employed to
   evaluate the clinical utility of this diagnostic model. As illustrated
   in [113]Figure 7D, the nomogram showed a substantial net benefit,
   highlighting its strong potential for use in clinical decision process.
   Overall, the diagnostic model based on EMRGs exhibited outstanding
   diagnostic efficacy and precision.

Figure 7.

   [114]Figure 7
   [115]Open in a new tab

   Establishment of an OA diagnostic model based on EMRGs. (A) A nomogram
   created from the two feature genes. (B) The results of the ROC curve
   show that the AUC values for the nomogram, NUP98, and RPIA are 1, 1,
   and 0.992, respectively. (C) The calibration curve: the horizontal axis
   represents the predicted probability of different clinical outcomes by
   the model, and the vertical axis represents the observed probability of
   patient clinical outcomes. The dashed line represents the ideal curve
   for comparison. (D) The DCA curve for the feature genes and diagnostic
   model: “None” represents the clinical outcome where none of the OA
   patients are diagnosed with the disease, and “All” represents the
   outcome where all OA patients are diagnosed. The X-axis indicates the
   threshold probability, while the Y-axis shows the net gain.

GSEA Enrichment Analysis and Construction of the ceRNA Network for
Characteristic Genes

   To further investigate the functional roles of the characteristic
   genes, we conducted GSEA analysis. The enrichment analysis of NUP98
   ([116]Figure 8A) revealed that it is primarily involved in pathways
   such as hypertrophic cardiomyopathy, spliceosome, leishmania infection,
   receptor interaction, and olfactory transduction. Similarly, RPIA was
   found to participate in olfactory transduction, ribosome function,
   receptor interaction, basal cell carcinoma, and the hedgehog signaling
   pathway ([117]Figure 8B). Additionally, by constructing a ceRNA
   network, we identified 18 pairs of relationships involving 4 miRNAs and
   8 lncRNAs. The interactions between the characteristic genes, lncRNAs,
   and miRNAs were highly interconnected ([118]Figure 8C).

Figure 8.

   [119]Figure 8
   [120]Open in a new tab

   GSEA enrichment analysis and construction of the ceRNA network for
   feature genes. (A) The GSEA enrichment results indicate the pathways
   involving NUP98. (B) The GSEA enrichment findings reveal the pathways
   associated with RPIA. (C) The Sankey diagram illustrates the ceRNA
   network of the feature genes.

Analysis of the Immunological Landscape

   To clearly depict the immune landscape in OA patients, we analyzed the
   immune cell infiltration status in normal and OA tissues ([121]Figure
   9A). Notably, an examination of immune cell infiltration through
   differential analysis showed considerable variations among 15 different
   immune cell types across the groups ([122]Figure 9B). Among these,
   naive B cells, CD8^+ T cells, naive CD4^+ T cells, Tregs, activated NK
   cells, M1 and M2 macrophages, and resting dendritic cells showed higher
   infiltration in OA tissues, whereas plasma cells, resting memory CD4^+
   T cells, monocytes, activated dendritic cells, resting mast cells,
   activated mast cells, and neutrophils exhibited lower infiltration in
   OA tissues. We further conducted a detailed analysis of the
   interactions between these immune cells to uncover the relationships
   among immune infiltrates within the immune microenvironment
   ([123]Figure 9C). Crucially, we also explored the relationship between
   the expression levels of feature genes and the infiltration of immune
   cells. [124]Figure 9D indicates a notable negative correlation between
   NUP98 and naive B cells, CD8^+ T cells, Tregs, activated NK cells, as
   well as activated dendritic cells. Conversely, it shows a positive
   correlation with plasma cells, resting memory CD4^+ T cells, resting
   mast cells, and neutrophils. On the other hand, RPIA displayed a
   negative correlation with CD8^+ T cells, follicular helper T cells,
   Tregs, activated NK cells, M1 macrophages, and resting dendritic cells.
   However, it featured a positive correlation with plasma cells, resting
   memory CD4^+ T cells, activated memory CD4^+ T cells, monocytes,
   resting mast cells, and neutrophils.

Figure 9.

   [125]Figure 9
   [126]Open in a new tab

   Immune cell infiltration analysis. (A) A comparison of the infiltration
   levels of 22 different immune cell types in both normal and
   osteoarthritic tissues. (B) The box plot illustrates the comparative
   analysis of immune cell infiltration across various tissues. (C) The
   heatmap depicts the correlation analysis related to differences in
   immune cell infiltration. (D) The examination of the relationship
   between feature gene expression levels and immune cell infiltration.
   For all figures: “ns” represents no significant difference, *
   represents P<0.05, ** represents P<0.01, *** represents P<0.001, and
   **** represents P<0.0001.

Discussion

   OA is a chronic neurodegenerative joint disorder that significantly
   affects the quality of life in middle-aged and elderly
   individuals.[127]^16 From a histopathological perspective, OA is marked
   by a gradual deterioration of the articular cartilage, the emergence of
   osteophytes, and the formation of subchondral cysts.[128]^16 Under
   adverse conditions, metabolic activity shifts from a resting state to a
   highly active one, playing a key role in combating disease onset and
   progression.[129]^17 Metabolic processes are crucial for the
   functioning of various organs, including cartilage and synovial
   joints.[130]^18 Glucose serves as the primary energy source for the
   body, and its breakdown provides the necessary energy to maintain
   normal physiological functions. Cartilage, in particular, relies on an
   adequate energy supply for its proper function.[131]^19 Disruptions in
   energy metabolism can lead to abnormalities in intracellular biological
   processes, laying the foundation for disease
   development.[132]^10^,[133]^12 The present study, therefore,
   investigates the diagnostic potential of EMRGs in OA, exploring their
   functional roles in OA progression and providing a theoretical basis
   for identifying diagnostic biomarkers and informing clinical decision
   making in OA.

   This study identified two key genes associated with OA progression,
   NUP98 and RPIA, from the EMRGs gene set using three machine learning
   methods: LASSO, SVM-RFE, and RF. Nucleoporin 98 (NUP98) encodes a
   protein that forms part of the nuclear pore complex.[134]^20 Research
   indicates that nuclear pore proteins (Nups) interact with chromatin to
   regulate gene expression and chromatin architecture.[135]^21 According
   to S. Capitanio et al, NUP98 interacts with several DExH/D-box proteins
   (DBPs) to modulate gene expression and RNA metabolism.[136]^22 RNA
   metabolism plays a pivotal role in chondrocyte senescence, a key
   process in OA pathology. Specifically, the long noncoding RNA ELDR is
   known to accelerate chondrocyte aging and OA progression.[137]^23
   Abnormal gene expression forms the basis of disease development and is
   crucial for identifying disease biomarkers.[138]^24 Our findings
   demonstrate that NUP98 expression is significantly reduced in OA
   tissues. ROC and DCA curve analyses further confirmed the high
   diagnostic accuracy of NUP98 in OA. The another gene,
   ribose-5-phosphate isomerase A (RPIA), has been implicated in the
   development of several malignancies, including liver cancer[139]^25 and
   colorectal cancer.[140]^26 Our results suggest that RPIA holds promise
   as a potential diagnostic biomarker for OA. While RPIA is highly
   expressed in normal tissues, its levels are significantly decreased in
   OA tissues. Importantly, further validation through nomogram analysis
   and ROC curves confirmed the high efficiency of RPIA as a diagnostic
   marker for OA.

   OA is a complex disease driven by multiple factors, including
   inflammation.[141]^27 Immune cells, particularly macrophages, play a
   critical role as mediators of the inflammatory response.[142]^27
   Macrophages are highly heterogeneous and can be classified into
   activated M1 and M2 macrophages, as well as classically activated
   macrophages.[143]^28 M1 macrophages are pro-inflammatory, secreting
   cytokines such as interleukin-6 (IL-6), inducible nitric oxide synthase
   (iNOS), and tumor necrosis factor-alpha (TNF-α).[144]^29 Conversely, M2
   macrophages produce anti-inflammatory factors, including IL-10, IL-1
   receptor antagonist, and arginase, which inhibit IL-1β and iNOS
   activity.[145]^29 Studies have shown that shifting macrophages from the
   M1 to the M2 phenotype may have therapeutic benefits for
   OA.[146]^7^,[147]^30^,[148]^31 In this study, ssGSEA analysis revealed
   elevated infiltration of both M1 and M2 macrophages in OA tissues,
   suggesting that targeted regulation of macrophage polarization could be
   a promising therapeutic strategy for OA. In addition, our analysis
   revealed a strong negative correlation between RPIA expression and the
   infiltration levels of M1 macrophages, which supports the notion of
   tailoring OA treatments to individual patients.

   However, there are several limitations in this study. First, this
   research was based on transcriptomic data obtained from publicly
   available databases, which lacks access to additional clinically
   relevant data from OA samples, potentially impacting the results.
   Secondly, further experimental validation is needed to confirm the
   inflammatory findings and strengthen our conclusions. In summary, our
   study provides valuable insights into targeting energy metabolism as a
   therapeutic approach for OA, and future research could expand and
   deepen the investigation in this area.

Conclusion

   This study identified two diagnostic genes associated with EMRGs in OA
   through bioinformatics combined with machine learning algorithms,
   emphasizing the connection between energy metabolism and OA
   progression. The diagnostic accuracy of these feature genes was
   validated via the construction of nomogram and ROC curve. Furthermore,
   we examined the differences in immune cell infiltration between normal
   and OA tissues, and explored the correlation between feature genes and
   immune infiltration patterns. Collectively, this analysis underscores
   the critical role of energy metabolism in OA, providing theoretical
   support for personalized treatment strategies and clinical diagnostic
   decisions for OA patients.

Funding Statement

   There is no funding to report.

Data Sharing Statement

   The data and materials in the current study are available from the
   corresponding author on reasonable request.

Ethics Approval and Consent to Participate

   This study involves publicly available human data and complies with the
   exemption criteria outlined in items 1 and 2 of Article 32 of the
   Measures for Ethical Review of Life Science and Medical Research
   Involving Human Subjects (issued on February 18, 2023, China).
   Specifically:
    1. The research involves publicly accessible data that has been
       legally obtained and does not involve the collection of new
       biological samples or private information from participants.
    2. The research poses no risk to the participants’ rights or health
       and does not violate ethical principles.

   Based on the above guidelines, this study is exempt from review by the
   Institutional Review Board (IRB) or local ethics committee. All data
   used in this study are de-identified and anonymized prior to public
   release, ensuring the protection of participants’ privacy.

Author Contributions

   All authors made a significant contribution to the work reported,
   whether that is in the conception, study design, execution, acquisition
   of data, analysis and interpretation, or in all these areas; took part
   in drafting, revising, or critically reviewing the article; gave final
   approval of the version to be published; have agreed on the journal to
   which the article has been submitted; and agree to be accountable for
   all aspects of the work.

Disclosure

   The authors declare that they have no potential conflicts of interest.

References