Abstract Background Postmenopausal osteoporosis (PMOP) represents as a significant health concern, particularly as the population ages. Currently, there is a paucity of comprehensive descriptions regarding the immunoregulatory mechanisms and early diagnostic biomarkers associated with PMOP. This study aims to examine immune-related differentially expressed genes (IR-DEGs) in the peripheral blood mononuclear cells of PMOP patients to identify immunological patterns and diagnostic biomarkers. Methods The [37]GSE56815 dataset from the Gene Expression Omnibus (GEO) database was used as the training group, while the [38]GSE2208 dataset served as the validation group. Initially, differential expression analysis was conducted after data integration to identify IR-DEGs in the peripheral blood mononuclear cells of PMOP. Subsequently, feature selection of these IR-DEGs was performed using RF, SVM-RFE, and LASSO regression models. Additionally, the expression of IR-DEGs in distinct bone marrow cell subtypes was analyzed using single-cell RNA sequencing (scRNA-seq) datasets, allowing the identification of cellular communication patterns within various cell subgroups. Finally, molecular subtypes and diagnostic models for PMOP were constructed based on these selected IR-DEGs. Furthermore, the expression levels of characteristic IR-DEGs were examined in rat osteoporosis (OP) models. Results Using machine learning, six IR-DEGs (JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5) were identified. Subsequently, two molecular subtypes of PMOP (subtype 1 and subtype 2) were established, with subtype 1 exhibiting a higher proportion of M1 macrophage infiltration. Analysis of the scRNA-seq dataset revealed 11 distinct cell clusters. It was noted that JUN was significantly overexpressed in M1 macrophages, while HMOX1 showed a marked elevation in endothelial cells and M2 macrophages. Cell communication results suggested that the PMOP microenvironment features increased interactions among M2 macrophages, CD8^+ T cells, Tregs, and fibroblasts. The diagnostic model based on these six IR-DEGs demonstrated excellent diagnostic performance (AUC = 0.927). In the OP rat model, the expression of IL1R2 and TNFSF8 were significantly elevated. Conclusion JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5 may serve as promising molecular targets for diagnosing and subtyping patients with PMOP. These results offer novel perspectives on the early diagnosis of PMOP and the advancement of personalized immune-based therapies. Keywords: Postmenopausal osteoporosis, Immune, Diagnosis, Molecular subtype, Biomarkers 1. Introduction Osteoporosis is a bone disorder marked by diminished bone strength and loss of bone mass, leading to a substantially higher risk of fractures [[39]1]. Postmenopausal osteoporosis (PMOP) is the most prevalent form, primarily driven by an imbalance in bone homeostasis due to diminished estrogen secretion in postmenopausal women [[40]2]. Specifically, decreased ovarian function causes bone resorption to exceed bone formation during the remodeling process [[41]3,[42]4]. This imbalance leads to a rapid decline in both bone density and quality, thereby increasing the risk of fractures. PMOP represents a substantial global health concern, imposing significant physical and economic burdens on elderly women. In the United States, nearly 20 % of women aged 50 and above suffer from hip osteoporosis, with approximately 50 % experiencing a substantial reduction in hip bone mass [[43]5]. Another study indicates that the prevalence of osteoporosis in women escalates from 4 % at age 50 to a staggering 52 % at age 80 [[44]6]. Thus, early diagnosis and timely intervention for PMOP are of paramount importance. Currently, clinical diagnosis of PMOP heavily relies on bone mineral density (BMD) testing, considered the most critical indicator for predicting fracture risk. This includes assessing BMD through dual-energy X-ray absorptiometry (DXA) [[45]7,[46]8]. However, for most female PMOP patients, clinical symptoms are notably absent until explicit fractures occur. BMD testing only provides a limited reflection of bone quantity and poses challenges for precise and prompt early-stage PMOP diagnosis [[47]9]. Presently, there is a dearth of biomarkers for early PMOP diagnosis, which is essential for timely intervention and treatment to enhance patient outcomes. Estrogen deficiency is a significant etiological factor in PMOP, with its receptors widely distributed among osteoblasts, osteoclasts, and hematopoietic cells [[48]10]. Estrogen may suppress bone resorption by regulating osteoclast formation. Previous research has emphasized the strong connection between the pathophysiological mechanisms of PMOP and immune regulation within the body [[49]11,[50]12]. From an immunological perspective, as aging process intensifies, immune defenses weaken of body, maintaining a state of chronic inflammation that could activate osteoclasts and exacerbate bone loss [[51]13]. However, the mechanisms behind bone metabolism and immune regulation in PMOP remain unclear [[52]14]. Recent advancements in genomics provide promising prospects for exploring biomarkers in PMOP patients at a micro-level [[53]15]. Therefore, investigating the immune microenvironment of PMOP patients, identifying immune-related biomarkers for diagnosis and classification, could aid in early diagnosis and personalized immune-based therapies, offering molecular targets for PMOP patients. This study is the first to investigate the expression profile of immune-related genes and the immune microenvironment in PMOP patients from an immunological standpoint. Utilizing machine learning algorithms, distinctive biomarkers were identified, leading to the establishment of reliable immune-related molecular subtypes and diagnostic models ([54]Fig. 1). This provides a foundation for early PMOP diagnosis and personalized immunotherapy for patients. Fig. 1. [55]Fig. 1 [56]Open in a new tab The flowchart of this study. 2. Materials and methods 2.1. Data source Data from PMOP patients were sourced from the Gene Expression Omnibus (GEO) database ([57]https://www.ncbi.nlm.nih.gov/geo/), specifically from the [58]GSE56815 and [59]GSE2208 datasets. The [60]GSE56815 dataset, derived from array-based gene expression profiles, includes gene expression matrices of peripheral blood mononuclear cells from 20 PMOP patients and 20 healthy controls, all sequenced using the [61]GPL96 platform. Similarly, the [62]GSE2208 dataset contains 9 PMOP patients and 20 healthy controls, also sequenced on the [63]GPL96 platform. To ensure consistency in the training set results, [64]GSE56815 was used as the training set, while [65]GSE2208 served as the validation set for subsequent analysis. Furthermore, we utilized “Perl” software [[66]16] to convert probe names into gene symbols for further analysis. Detailed dataset information can be found in [67]Table 1. Table 1. Dataset source and information. Dataset Platform Sample Cell Type [68]GSE56815 [69]GPL96 20 PMOP vs 20 Control Mononuclear cells derived from peripheral blood [70]GSE2208 [71]GPL96 9 PMOP vs 20 Control Mononuclear cells derived from peripheral blood [72]GSE147287 [73]GPL24676 1 PMOP Mononuclear cells derived from bone marrow [74]Open in a new tab 2.2. Analysis of immune-related differentially expressed genes (IR-DEGs) A total of 2483 immune-related genes (see [75]Supplementary Table 1) were obtained from the ImmPort database ([76]https://immport.org). Subsequently, Using the “limma” package [[77]17], the expression matrix of immune-related genes from 20 PMOP and 20 healthy samples was extracted. Differential expression analysis was then conducted to identify significantly differentially expressed immune-related genes (IR-DEGs) between PMOP patients and healthy controls. Visualization was performed using differential heatmaps and box plots. The criteria for selection were set as log fold change (log|FC|)> 0.585 and adjusted P < 0.05. 2.3. Enrichment analysis To explore the potential molecular mechanisms of IR-DEGs between PMOP patients and healthy samples, Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway were analyzed. The GO functional analysis encompassed Molecular Functions (MF), Cellular Components (CC), and Biological Processes (BP). The top 5 enriched functions and the top 10 KEGG pathways were visualized using bar graphs in different colors. We used a significance threshold of P < 0.05 to determine which functions or pathways were significantly enriched. 2.4. Characteristic gene selection To identify key genes, we employed machine learning algorithms to select characteristic IR-DEGs. Initially, we compared the residual error values of two commonly used Support Vector Machine Recursive Feature Elimination (SVM-RFE) [[78]18] and Random Forest (RF) [[79]19] models, to determine the most suitable machine learning algorithm. Subsequently, a Least Absolute Shrinkage and Selection Operator (LASSO) [[80]20] regression was employed to identify the selection of characteristic genes. The final set of characteristic IR-DEGs was determined by taking the intersection of genes identified by both machine learning algorithms. Additionally, we analyzed the chromosomal locations of these characteristic IR-DEGs. 2.5. Single-cell data and analysis The ScRNA-seq dataset [81]GSE147287 of PMOP patients was obtained from the GEO database, comprising the raw matrix data of one PMOP patient. Data processing and filtering were conducted using the “Seurat” package in R software (version 4.3.2). Standard data preprocessing procedures involved determining the number of genes, cells, and total molecules. Cells with a detected gene count less than 500 or exceeding 7500 were discarded. Cells with high mitochondrial gene content (≥20 %) were also removed. Subsequently, data standardization was performed using the “ScaleData” function in “Seurat,” and 2000 highly variable genes (genes showing significant differences between some cells) were selected for principal component analysis (PCA) dimensionality reduction and cell identification across all samples. Appropriate PC values (PC = 10) and resolution (0.5) were chosen for further optimization of cell clustering. Visualization and storage of DEGs for each cluster were carried out. Additionally, each cluster was subsequently defined and annotated based on the specific expression patterns of marker genes. Finally, differential expression analysis was conducted for key IR-DEGs across different clusters. 2.6. Cell cycle and cell communication The cell cycle is a well-conserved and ongoing process. The “tricycle” package was employed to predict the specific positions of cells in the cell cycle based on data projection. The cell cycle stages were inferred as five phases: G1.S phase, S phase, G2 phase, G2/M phase, and M.G1 phase, represented by the UAMP of the data. The proportions of different cell cycle phases were further calculated for each distinct cluster. Additionally, the “CellChat” package was utilized to analyze chondrocyte-like samples from OA joints, calculating intercellular communication within each cell cluster of PMOP samples to reveal the communication strength of each signaling pathway. A significance threshold of P < 0.05 was applied, and the visualization was based on the number and intensity of connections between various cell clusters. 2.7. Determination of molecular subtypes in PMOP patients Conducting disease subtyping through clustering analysis is beneficial for personalized treatment of patients. Therefore, we utilized Consensus Clustering analysis to identify molecular subtypes in PMOP patients. Initially, PMOP patients were subjected to Consensus Clustering using the “ConsensusClusterPlus” package [[82]21,[83]22]. We generated eight clustering models (K = 2–9) based on different K values and visualized them using clustering heatmaps. The clustering parameters were set as follows: max K = 9, reps = 50, pFeature = 1, pItem = 0.8. Subsequently, we calculated the consensus scores for the K = 2–9 clustering models and created cumulative distribution functions (CDF) along with area under the curve (AUC) to identify the most appropriate clustering K value. Moreover, to confirm significant differences between PMOP molecular subtypes with distinct features, we conducted PCA [[84]23] for subtyping 20 PMOP patients. Subsequently, we examined the expression differences of key IR-DEGs across the identified molecular subtypes. 2.8. Analysis of immune microenvironment in PMOP patients The pathogenesis of PMOP may be associated with immune regulation. Therefore, we conducted a comprehensive exploration of the immune microenvironment in PMOP patients. Initially, we performed immune cell infiltration analysis and immunoscore calculation for each PMOP sample and healthy control using the “CIBERSORT” package [[85]24]. Subsequently, the infiltration levels of 22 immune cell types were compared between PMOP patients and control samples, with the significant differences visualized through cluster heatmaps and box plots. Finally, the variations in infiltration levels of these 22 immune cell types among different PMOP subtypes (subtype 1 and subtype 2) were visualized through histograms and box plots. 2.9. Immune correlation analysis of characteristic IR-DEGs We further explored the immunoregulatory relationship of characteristic IR-DEGs with PMOP patients from an immunological perspective. Subsequently, we conducted correlation analyses between these key IR-DEGs and the 22 immune cell types. A significance threshold was set at P < 0.05 and |R|>0.3 to identify genes with a pronounced correlation with immune cells. 2.10. Single-sample gene set enrichment analysis (ssGSEA) ssGSEA is a widely used method to assess the enrichment of target genes under specific biological conditions [[86]25]. In this study, PMOP samples from the training group were divided into low- and high-expression groups for each characteristic IR-DEG, based on the median expression level of these genes. This categorization allowed for the analysis of biologically significant functions or pathways that were significantly enriched for each f characteristic IR-DEG. A significance threshold of P < 0.05 was utilized to determine these enriched biological functions or pathways. 2.11. Diagnostic model based on characteristic IR-DEGs We further developed a diagnostic model for PMOP using the characteristic IR-DEGs, evaluating their ability to distinguish between PMOP patients and control samples. Specifically, we employed the “glmnet” and “pROC” packages [[87]26] to build ROC curves for the characteristic genes and performed ROC evaluations for each gene. Subsequently, we established a logistic regression model for the characteristic genes to generate an overall ROC curve and calculate the AUC values. Finally, we employed the independent [88]GSE2208 dataset as a validation set to validate the ROC curves constructed for the characteristic genes. 2.12. PPI and gene-gene interaction networks To investigate the potential interaction mechanisms among characteristic IR-DEGs, we constructed a Protein-Protein Interaction (PPI) network using the online STRING database ([89]https://string-db.org). Genes lacking interactions were excluded, and the selected minimum confidence score was set to 0.15. Furthermore, we established a gene-gene interaction network for characteristic IR-DEGs based on the GENEMANIA online database ([90]http://genemania.org), visually represented as a circular plot. Additionally, we created gene-miRNA and gene-disease regulatory networks through the NetworAnalyst database ([91]https://www.networkanalyst.ca). 2.13. Establishment of PMOP rat model and RT-qPCR Purchase 24 female SD rats, aged 4–5 months and weighing 400–500g, from the Guangdong Province Medical Experimental Animal Center. After a week of routine feeding in the lab, administer 2 % pentobarbital sodium intraperitoneally (40 mg/kg) for anesthesia, then prepare and disinfect the abdominal area. Randomly divide the 24 rats into two groups: 12 in the PMOP group and 12 in the sham group. The PMOP group undergoes bilateral ovariectomy, while the sham group only has the adipose tissue around the ovaries removed, followed by suturing and disinfection of the incisions. Resume normal diet post-operation and establish the PMOP model after 8 weeks. After the model was established, 24 rats were anesthetized with an intraperitoneal injection of sodium pentobarbital, and blood was collected via the jugular vein. The rats were then euthanized, and the right femoral tissue was extracted. Extract total RNA from femur tissue and peripheral blood using the Simply Total RNA Extraction Kit (BioFlux, Japan), according to the manufacturer's instructions. Transcribe cDNA using the SweScrip RT I First Strand Synthesis Kit (Service Bio, Guangzhou, China) as per the instructions provided. Employ the MyiQ PCR instrument to detect the expression of the genes JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5. Use GAPDH for normalization. Primer sequences are listed in [92]Table 2. The animal experiments conducted in this study were approved by the Medical Research and Clinical Trial Ethics Committee of the First People's Hospital of Huzhou (Approval No. 2022-SDYW-004). Table 2. The primers used for characteristic immune-related DEGs. Targeted genes Forward (5′–3′) Reverse (3′–5′) JUN AACTCGGACCTTCTCACGTC GGTCGGTGTAGTGGTGATGT CYSLTR2 TCATGCTCAACCTGGCCATT GGCCCAGTCCCCAAATATCC SSTR5 CCTCCACACCAAGCTGGAAT CAACAGGTAGAGCACAGGCA TNFSF8 ACACTGCAGCTCGTCATCAA TCCACCCGGACAGATATGGT IL1R2 TGTAAACGCCAGGCTGGAAA TTTGGTTTGGGCTGGAAGGG HMOX1 GCATGTCCCAGGATTTGTCC ACCAGCTTAAAGCCTTCCCT GAPDH TTCACCACCATGGAGAAGGC CTCGTGGTTCACACCCATCA [93]Open in a new tab 2.14. Statistics analysis For normally distributed data, comparisons between two groups were conducted using two-sample independent t-tests. In cases where normality assumptions were not met, the Wilcoxon test was employed for comparing two groups. Statistical significance was defined as a P-value less than 0.05. All gene expression matrices and statistical analyses were conducted using R version 4.1.3 ([94]https://www.R-project.org/) along with relevant packages. 3. Results 3.1. Identification and enrichment analysis of IR-DEGs The [95]GSE56815 dataset comprised 40 patients with low bone density and 40 patients with high bone density. After excluding premenopausal individuals with low bone density and postmenopausal individuals with high bone density, we obtained a control group consisting of 20 premenopausal individuals with high bone density and 20 patients with PMOP. Similarly, the [96]GSE2208 dataset included 10 patients with high bone density and 9 PMOP patients, resulting in a control group of 5 premenopausal individuals with high bone density and 9 PMOP patients after excluding 5 premenopausal individuals with low bone density. We utilized the “limma” package to extract immune-related genes from the data matrices and conducted differential expression analysis, identifying a total of 327 IR-DEGs (see [97]Supplementary Table 2). The top 10 IR-DEGs were visually represented through differential heatmaps and box plots ([98]Fig. 2A and B). Fig. 2. [99]Fig. 2 [100]Open in a new tab Identification and Functional Analysis of Immune-Related Differentially Expressed Genes (IR-DEGs). (A) Differential heat map of the top 10 IR-DEGs, with red representing upregulated genes and blue representing downregulated genes. (B) Differential box plot of the top 10 IR-DEGs, with red indicating the PMOP group and blue indicating the control group. (C) The top 10 significant enrichment of KEGG pathways. (D) The top 5 significantly enriched biological functions (BP) revealed by GO functional analysis, the top 4 enriched cellular components, and the top 5 enriched molecular functions. Significance: ∗∗P < 0.01, ∗∗∗P < 0.001. We performed functional and pathway enrichment analyses to investigate the possible biological roles and mechanisms of the IR-DEGs. The KEGG pathway analysis indicated the IR-DEGs were significantly enriched in pathways related to blood coagulation, wound healing, and hemostasis ([101]Fig. 2C). GO functional analysis revealed that these IR-DEGs were strongly linked to biological processes (BP) related to wound healing, lymphocyte differentiation, and coagulation. In terms of cellular components (CC), the IR-DEGs were primarily associated with the external side of the plasma membrane, the lumen of cytoplasmic vesicles, and secretory granules. Molecular functions (MF) were mainly enriched in phospholipid binding, DNA-binding transcription activator activity, and collagen binding ([102]Fig. 2D). These results suggest that IR-DEGs could contribute to the regulation of PMOP and may have an impact on the hematological system. 3.2. Selection of characteristic IR-DEGs To select characteristic genes, we initially compared the residual error values of RF and SVM-RFE. Using box plots and residual distribution curves, we determined that SVM-RFE was the appropriate selection method ([103]Fig. 3A and B). In the SVM-RFE analysis, cross-validation accuracy plots indicated that 17 features achieved an accuracy of 0.825, while cross-validation error plots showed that these 17 features had an error rate of 0.175 ([104]Fig. 3C and D). Consequently, 17 SVM-RFE-selected characteristic genes were obtained ([105]Supplementary Table 3). Next, we performed LASSO regression analysis with cross-validation on IR-DEGs, resulting in 13 characteristic genes ([106]Fig. 3E and F). By further intersecting these genes, we identified 6 characteristic IR-DEGs: JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5 ([107]Fig. 3G). Chromosome location analysis indicated that JUN, IL1R2, TNFSF8, CYSLTR2, SSTR5, and HMOX1 were situated on chromosomes 1, 2, 9, 13, 16, and 22, respectively ([108]Fig. 3H). Fig. 3. [109]Fig. 3 [110]Open in a new tab Identification of characteristic IR-DEGs using machine learning methods. (A and B) Residuals from the Random Forest (RF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) models. (C) Change curves in the predicted true values for each IR-DEG. (D) Change curves in the predicted error values for each IR-DEG. (E) Gene coefficient plots from the Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis. (F) Cross-validation curves. (G) Characteristic IR-DEGs obtained by the intersection of genes selected by LASSO and SVM-RFE. (H) Chromosomal positions of the 6 characteristic IR-DEGs. 3.3. Identification of different cell types, cell cycle and cell communication analysis To further investigate the occurrence and progression of PMOP, the immune microenvironment of PMOP patients was analyzed at the cellular level using scRNA-seq data set ([111]GSM4423510). After quality control, filtering, PCA dimensionality reduction, and t-SNE clustering analysis, a total of 12 different cell types were identified ([112]Fig. 4A). Using feature markers, these cells were defined as 11 different subtypes: M1 macrophage (FCGR3B), Treg (NRP1), CD8^+ T cell (GZMK and CD8A), Endothelial cell (PECAM1), M2 macrophage (CD163), RBC (HBD), B cell (CD37), lymphoid-primed multi-potential progenitors (LMPP) (CAMP), Fibroblast (FGF7), NK cell (NKG7), Epithelial_cell (CDH1) ([113]Fig. 4B and C). The heatmap shows significant differences in gene markers among different cell clusters ([114]Fig. 4D). Subsequently, UMAP was employed to visualize the expression of key genes (JUN, HMOX1, CYSLTR2, TNFSF8, and IL1R2) in cells ([115]Fig. 5A). Differential expression analysis of key IR-DEGs across cell clusters showed that JUN expression was markedly elevated in M1 macrophages compared to the other six cell types (Treg, CD8^+ T cell, Endothelial cell, M2 macrophage, RBC, and B cell). HMOX1 showed significantly higher expression in endothelial cells and M2 macrophages. CYSLTR2, TNFSF8, and IL1R2 were expressed less in various cell types and showed no significant differences ([116]Fig. 5B). Fig. 4. [117]Fig. 4 [118]Open in a new tab Cell clustering analysis and annotation using the tSNE method. (A) Identification of 12 cell clusters based on the tSNE method. (B) Determination of 11 cell types based on cell-specific markers. (C) Specific markers for the 11 cell types. (D) Heatmap illustrating the classification of the 11 cell types. Fig. 5. [119]Fig. 5 [120]Open in a new tab Cell cycle and cell communication analysis in PMOP patients. (A) Expression levels of five characteristic IR-DEGs in bone marrow cells. (B) Visualization of differential expression of five characteristic IR-DEGs among different cell types. (C) Proportions of different cell types in various cell cycles. (D) Visualization of the cell cycle distribution among different cell types. (E) Interactions among the 11 different cell types. Additionally, to explore potential connections and molecular mechanisms between different cell types, further analysis of the cell cycle and cell communication was conducted. M1 macrophages were found to be present in almost all cell cycles (M.G1, G2.M, G2, S, G1.S) and constituted a significant proportion, while Treg cells were mainly concentrated in the S and G1.S phases of the cell cycle ([121]Fig. 5C and D). Cell communication analysis revealed that M2 macrophages, CD8^+ T cells, Treg cells, and fibroblasts exhibited more interactive cell communication pairs in the PMOP microenvironment, with M1 macrophages having the highest number of interacting cells ([122]Fig. 5E). These findings suggest that macrophages may play a crucial role in the immune microenvironment of PMOP. 3.4. Determination of molecular subtypes in PMOP patients Subtyping PMOP patients helps in further analyzing the differences in the immune microenvironment between different subtypes, thereby aiding the development of personalized immunotherapy approaches. Based on six characteristic IR-DEGs, we employed consensus cluster analysis to identify molecular subtypes within PMOP patients. Specifically, based on the clustering matrix heatmaps for K = 2 to K = 4, K = 2 was considered the optimal clustering solution ([123]Fig. 6A). Subsequently, the Consensus Cluster Score results indicated that both subtypes achieved scores exceeding 0.80 for K = 2 ([124]Fig. 6B). Furthermore, the determination of the best K value was reinforced by the Consensus CDF plot and the area under the CDF curve ([125]Fig. 6C and D). Consequently, K = 2 was confirmed as the most appropriate clustering solution, revealing two subtypes, Subtype 1 (comprising 8 samples) and Subtype 2 (comprising 12 samples). Principal Component Analysis (PCA) results highlighted marked dissimilarities between Subtypes 1 and 2 ([126]Fig. 6E). Fig. 6. [127]Fig. 6 [128]Open in a new tab Construction of molecular subtypes in peripheral blood monocytes of PMOP patients based on characteristic IR-DEGs. (A) Heatmap of different K-means clustering matrices (K = 2–4). (B) Consistency clustering scores of different subtypes (K = 2–9). (C) Cumulative distribution function (CDF) plot determining the optimal K value (K = 2). (D) Delta area plot. (E) Principal component analysis (PCA) of PMOP subtypes 1 and 2. (F) Differential expression analysis of the 6 IR-DEGs between Subtypes 1 and 2. Significance: ∗∗P < 0.01, ∗∗∗P < 0.001. Additionally, we analyzed the differential expression of characteristic genes between the two subtypes ([129]Fig. 6F). The heatmap and boxplot results showed significant differences in HMOX1 and IL1R2 expression levels between Subtype 1 and Subtype 2, with both genes showing notably higher expression in Subtype 2. 3.5. Analysis of immune microenvironment in PMOP patients Previous research has indicated that alterations in the immune status of PMOP patients may contribute to ongoing bone destruction, emphasizing the interaction between immune cells and bone [[130]27,[131]28]. In this context, we performed an in-depth analysis of the immune microenvironment in PMOP patients. The “CIBERSORT” algorithm was employed to identify and score 22 distinct immune cell types in both PMOP patients and healthy controls. The results were visually represented through a heatmap and a boxplot ([132]Fig. 7A and B). The analysis revealed significant upregulation of resting dendritic cells and naive B cells in PMOP patients, while follicular helper T cells and naive CD4 T cells showed substantial downregulation in their infiltration levels within the PMOP patient group. These findings suggest a potential involvement of T cells and B cells in the immune regulation mechanisms underlying PMOP. Furthermore, within PMOP subtype 2, resting NK cells, and monocytes exhibited significantly elevated immune infiltration, while M1 macrophages, M2 macrophages, and eosinophils had higher infiltration levels in subtype 1 ([133]Fig. 7C). Fig. 7. [134]Fig. 7 [135]Open in a new tab Analysis of the immune microenvironment in PMOP patients. (A) Heatmap depicting the correlation between the infiltration abundance of 22 immune cell types between the PMOP group and control group. (B) Differential expression analysis of the infiltration abundance of 22 immune cell types between the PMOP group and control group. (C) Differential expression analysis of immune cell infiltration abundance between PMOP Subtypes 1 and 2. Significance: ∗P < 0.05, ∗∗P < 0.01, ∗∗∗P < 0.001. 3.6. Immune correlation analysis Additionally, we performed a correlation analysis between each key IR-DEG and the proportions of 22 immune cell types. Notably, HMOX1, IL1R2, and SSTR5 demonstrated significant correlations with immune cells ([136]Fig. 8A–C). Specifically, HMOX1 exhibited a significant negative correlation with resting mast cells (R = −0.53, P = 0.018), while IL1R2 showed a significant positive correlation with memory B cells (R = 0.51, P = 0.02). Additionally, SSTR5 exhibited a significant negative correlation with resting memory CD4 T cells (R = −0.55, P = 0.013). Fig. 8. [137]Fig. 8 [138]Open in a new tab Correlation analysis of key IR-DEGs with 22 immune cell types. (A) HMOX1 exhibits a significant negative correlation with resting Mast cells. (B) IL1R2 shows a significant positive correlation with activated dendritic cells. (C) SSTR5 demonstrates a significant negative correlation with resting T cells CD4 memory. 3.7. ssGSEA enrichment analyses Additionally, we investigated the potential molecular mechanisms and pathways related to the six IR-DEGs. ssGSEA-KEGG pathway enrichment analysis revealed the following associations: JUN was primarily enriched in pathways related to acute myeloid leukemia and galactose metabolism ([139]Fig. 9A–F). HMOX1 showed prominent enrichment in pathways associated with acute myeloid leukemia and focal adhesion. CYSLTR2 was notably enriched in pathways such as axon guidance, cytokine-cytokine receptor interaction, and drug metabolism. TNFSF8 was enriched primarily in pathways including focal adhesion, huntington's disease, and nod-like receptor signaling pathway. IL1R2 was enriched in pathways linked to acute myeloid leukemia and lysosome. SSTR5 was predominantly enriched pathways connected to aldosterone regulated sodium reabsorption and asthma. In summary, these IR-DEGs may influence the development of PMOP by regulating amino acid metabolism and potentially have close associations with acute myeloid leukemia. Fig. 9. [140]Fig. 9 [141]Open in a new tab Single-Sample Gene Set Enrichment Analysis (ssGSEA) of the 6 Key IR-DEGs. ssGSEA results for JUN (A), HMOX1 (B), CYSLTR2 (C), TNFSF8 (D), IL1R2 (E) and SSTR5 (F) between high-expression and low-expression groups. 3.8. Construction of PMOP diagnostic model To investigate the diagnostic value of the six IR-DEGs, the diagnostic models were constructed ([142]Fig. 10A and B). ROC curves for each gene were generated in the training set, demonstrated that, except for SSTR5 (AUC = 0.690), all other characteristic genes exhibited AUC values exceeding 0.700, with JUN (AUC = 0.700), HMOX1 (AUC = 0.772), CYSLTR2 (AUC = 0.725), TNFSF8 (AUC = 0.705), and IL1R2 (AUC = 0.780). The AUC value for the logistic regression model was 0.927 (95 % CI: 0.837–0.995). Furthermore, using another dataset ([143]GSE2208) for validation and ROC curves were established ([144]Fig. 10C and D). Due to the limited sample size, information for only three characteristic genes was available: JUN (AUC = 0.933), CYSLTR2 (AUC = 0.711), and IL1R2 (AUC = 0.733). All these AUC values exceeded 0.70. In the validation set, the AUC value for the logistic regression model was 0.956 (95 % CI: 0.800–1.000), indicating that the IR-DEGs hold substantial diagnostic value for PMOP. Fig. 10. [145]Fig. 10 [146]Open in a new tab Diagnostic value of 6 characteristic IR-DEGs in PMOP. (A) ROC curves for each of the 6 characteristic IR-DEGs (JUN, HMOX1, CYSLTR2, TNFSF8, IL1R2, and SSTR5) in the training dataset ([147]GSE56815). (B) ROC curve for the Logistic regression model constructed based on the 6 characteristic IR-DEGs in the [148]GSE56815 dataset. (C) ROC curves for each of the 3 characteristic IR-DEGs in the validation dataset ([149]GSE2208). (D) ROC curve for the Logistic regression model constructed based on the 3 characteristic IR-DEGs in the [150]GSE2208 dataset. 3.9. Construction of PPI and gene-gene interaction networks To investigate the potential molecular mechanisms of the six characteristic IR-DEGs, a PPI network was constructed using the STRING database ([151]Fig. 11A). This network consisted of 4 nodes and 6 edges. Subsequently, utilizing GENEMANIA database references, including