Abstract

Introduction

   Osteosarcoma, the prevailing primary bone malignancy among children and
   adolescents, is frequently associated with treatment failure primarily
   due to its pronounced metastatic nature.

Methods

   This study aimed to establish potential associations between hub genes
   and subtypes for the treatment of metastatic osteosarcoma.
   Differentially expressed genes were extracted from patients diagnosed
   with metastatic osteosarcoma and a control group of non-metastatic
   patients, using the publicly available gene expression profile
   ([33]GSE21257). The intersection of these gene sets was determined by
   focusing on endoplasmic reticulum (ER) stress-related genes sourced
   from the GeneCards database. We conducted various analytical
   techniques, including functional and pathway enrichment analysis, WGCNA
   analysis, protein-protein interaction (PPI) network construction, and
   assessment of immune cell infiltration, using the intersecting genes.
   Through this analysis, we identified potential hub genes.

Results

   Osteosarcoma subtype models were developed using molecular consensus
   clustering analysis, followed by an examination of the associations
   between each subtype and hub genes. A total of 138 potential
   differentially expressed genes related to endoplasmic reticulum (ER)
   stress were identified. These genes were further investigated using
   Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and
   Gene Set Enrichment Analysis (GSEA) pathways. Additionally, the PPI
   interaction network revealed 38 interaction relationships among the top
   ten hub genes. The findings of the analysis revealed a strong
   correlation between the extent of immune cell infiltration and both
   osteosarcoma metastasis and the expression of hub genes. Notably, the
   differential expression of the top ten hub genes was observed in
   osteosarcoma clusters 1 and 4, signifying their significant association
   with the disease.

Conclusion

   The identification of ten key genes linked to osteosarcoma metastasis
   and endoplasmic reticulum stress bears potential clinical significance.
   Additionally, exploring the molecular subtype of osteosarcoma has the
   capacity to guide clinical treatment decisions, necessitating further
   investigations and subsequent clinical validations.

   Keywords: Osteosarcoma, Metastasis, Endoplasmic reticulum stress,
   Subtype analysis

1. Introduction

   Osteosarcoma, a primary bone tumor prevalent among children and
   adolescents, exhibits an annual incidence of approximately 8–11/million
   [[34]1]. This malignancy is characterized by its highly aggressive
   nature, heterogeneity, and propensity for metastasis [[35]2], leading
   to a significant risk of disability and mortality following metastatic
   spread. Osteosarcoma commonly manifests in the vicinity of the
   metaphysis of long bones, including the distal femur, proximal tibia,
   and humerus [[36]3]. The precise etiology, progression, and metastasis
   of osteosarcoma have not been thoroughly investigated [[37]4,[38]5].
   However, the elevated prevalence among adolescents and its predilection
   for specific anatomical sites suggest a correlation with accelerated
   bone tissue growth [[39]6]. Presently, the primary therapeutic approach
   for osteosarcoma entails preoperative chemotherapy, surgical excision,
   and postoperative chemotherapy. Despite extensive removal of
   macroscopic tumor tissue during surgical intervention, recurrence
   frequently occurs [[40]7]. The implementation of neoadjuvant
   chemotherapy has led to a substantial enhancement in the 5-year
   survival rate among individuals diagnosed with osteosarcoma [[41]8].
   When neoadjuvant chemotherapy is combined with surgical resection, the
   5-year survival rate for patients with non-metastatic osteosarcoma is
   approximately 60% [[42]9]. However, the survival rate drops
   significantly to only 20–30% for patients with metastatic osteosarcoma
   [[43][9], [44][10], [45][11]]. The high incidence of metastasis in
   osteosarcoma is the primary determinant of the unfavorable prognosis
   [[46]12]. Presently, there is no treatment available that significantly
   improves the 5-year survival rate for patients with metastatic
   osteosarcoma [[47]13].

   The endoplasmic reticulum (ER) plays a crucial role in the processes of
   protein synthesis, folding, and structural maturation. ER stress, which
   occurs as a result of genetic or environmental factors, disrupts the
   ER's ability to properly modify, fold, and secrete proteins, leading to
   the accumulation of misfolded proteins within organelles [[48]14]. This
   phenomenon of ER stress is closely linked to the onset and progression
   of various diseases, including tumors, diabetes, and neurodegenerative
   disorders [[49][15], [50][16], [51][17]]. Unfavorable conditions within
   the tumor microenvironment, including nutritional deficiency, hypoxia,
   hypermetabolism, and oxidative stress, have been shown to disrupt
   protein folding by the endoplasmic reticulum (ER), leading to the
   persistent activation of “ER stress.” This phenomenon enhances the
   tumorigenic, metastatic, and drug-resistant properties of malignant
   cells [[52]14,[53]16,[54]18]. Osteosarcoma is characterized by
   extensive genetic variation, dysregulation of multiple signaling
   pathways, and genome instability. These factors are closely associated
   with the regulation of bone development, tumor microenvironment, genome
   homeostasis, cell cycle control, and cell signal transduction pathways
   [[55]16]. The correlation between ER stress-related genes and
   osteosarcoma metastasis has been established [[56]19], and the
   utilization of molecular subtype models is crucial for predicting tumor
   risk and evaluating prognosis [[57][20], [58][21], [59][22]].
   Nevertheless, there is currently a lack of research on molecular
   subtype models specifically pertaining to ER stress and osteosarcoma
   metastasis.

   In this study, the author developed a molecular subtype model for
   osteosarcoma AMI. By analyzing the gene data from 14 metastatic
   osteosarcoma patients and 19 non-metastatic osteosarcoma patients, the
   study identified co-expressed differential genes associated with
   osteosarcoma metastasis and ER stress. Subsequently, a validation set
   comprising 10 key genes was established, confirming the robustness of
   the molecular subtype model for osteosarcoma. These key genes hold
   promise as potential targets for targeted therapeutic drugs and
   molecular markers for prognostic evaluation.

2. Materials and methods

     * 1
       GEO data difference analysisC

   We performed an analysis on the osteosarcoma dataset. [60]GSE21257
   comes from the Gene Expression Omnibus (GEO) database [[61]23].
   [62]GSE21257 has 53 samples, including 14 samples of osteosarcoma
   metastases (metastases present at diagnosis) and 19 samples of
   non-metastases (no metastases). The chip platform is a [63]GPL10295
   Illumina human-6 v2. 0 expression beadchip (using nuIDs as
   identifiers).

   We used the standardized data expression matrix downloaded from GEO to
   match, select, and delete gene data based on the samples. The selected
   samples were grouped according to metastasis and non-metastasis for
   subsequent analysis. For the preprocessed data set, we drew a boxplot
   using the R language's boxplot function to observe the data
   distribution. Expression difference P-values and expression fold change
   values were calculated using the R package limma (Version 3.42.2)
   [[64]24]. We selected genes with a P-value <0.05 and |log2FC| > 0.263
   (1.2-fold relationship) as significantly differentially expressed
   mRNAs. The R package heatmap function (Version 1.0.12) [[65]25] was
   used to make differential heat maps and volcano maps for visualization.
     * 2
       Molecular subtype construction

   We calculated the AMI molecular subtypes with the ConsensusClusterPlus
   package [[66]26] and the Rtsne package [[67]27]. The ggplot2 package
   was used for molecular subtype visualization. Correlations between hub
   genes and molecular subtypes were plotted with the gspubr package.
     * 3
       Enrichment analysis of intersection genes

   The GeneCards database ([68]https://www.genecards.org/) [[69]28]
   provides annotated and predicted human genetic information. The
   database automatically integrated genetic data from approximately 150
   network sources, including genomics, transcriptomics, proteomics, and
   genetics, as well as clinical and functional information.

   In this analysis, ER stress-related genes were downloaded from the
   GeneCards database using “endoplasmic reticulum stress” as the search
   key. We took the intersection of differential genes and ER
   stress-related human genes to create a Venn diagram, which was
   constructed using the R-package Venn diagram function (Version 1.6.20)
   [[70]29].

   Gene Ontology (GO) [[71]30] describes our understanding of the field of
   biology based on three aspects, molecular function (MF), cellular
   components (CC), and biological processes (BP). GO enrichment analysis
   is typically used to explore the enrichment degree of GO terms
   associated with differentially expressed genes. Kyoto Encyclopedia of
   Genes and Genomes (KEGG) [[72]31] is a utility database resource for
   understanding advanced functions and biological systems (such as cells,
   organisms, and ecosystems) based on molecular-level information,
   especially genome sequencing and other high-throughput experimental
   techniques generated from large molecular datasets. KEGG was
   established in 1995 by Kanehisa's laboratory at the Center for
   Bioinformatics at Kyoto University, Japan.

   The analysis used in this study was based on the use of the R package
   clusterProfiler package (version 3.14.3) [[73]32] to perform GO
   function/pathway enrichment analysis on the intersection genes and the
   significance threshold was set to P ≤ 0.05. Bubble plots were created
   for visualization using the R package ggplot2 (Version 3.3.3) [[74]33].
   The pathway network diagram was drawn to visualize the pathway
   relationship using the cytoscape plug-in ClueGo [[75]34].

   We selected and downloaded the c2.cp.v7.2.symbols.gmt gene set data
   from the GSEA ([76]http://www.gsea-msigdb.org/gsea/index.jsp) [[77]35]
   database as the reference gene set and performed GSEA enrichment
   analysis on the two sets of data with the R clusterProfiler package.
   The GSEA statistical process was used to calculate the enrichment
   score, estimate the significance of the enrichment score, correct for
   multiple hypothesis testing, and select the enrichment results with
   P < 0.05 to draw the GSEA enrichment map.
     * 4
       WGCNA analysis

   The gene co-expression network was constructed using the WGCNA R
   package [[78]36]. By calculating the Pearson correlation coefficient
   between two genes, we used the expression data to create a similarity
   matrix and chose an appropriate soft threshold β to make the
   constructed network more compatible with the standard of a scale-free
   network. We transformed the adjacency matrix into a topological overlap
   matrix TOM and used hierarchical clustering to generate a hierarchical
   clustering tree of genes. The correlations between genes and clinical
   information were calculated, and the significant associations of
   modules with ER-related DEGs were analyzed.
     * 5
       PPI Interaction Network Construction

   The STRING (version 11.0, [79]http://www.string-db.org/) database
   [[80]37] was used to perform protein-protein interactions (PPI) on
   critical module genes from the WGCNA results and the ER stress-related
   differentially expressed genes. We selected Required Confidence
   (combined score) > 0.4 as the threshold value of the PPI relationship.
   Cytoscape software [[81]38] was used to construct the PPI network of
   interaction genes, and the top ten hub genes were extracted using the
   cytohbhs plug-in Ref. [[82]39].
     * 6
       Analysis of Immune Cell Infiltration

   CIBERSORT is based on the principle of linear support vector regression
   to deconvolve the transcriptome expression matrix and estimate the
   composition and abundance of immune cells in mixed cell populations
   [[83]40]. We used the gene expression matrix data to analyze the
   infiltration of 22 immune genes in the samples using the CIBERSORT
   algorithm. The samples with P < 0.05 were filtered to obtain the immune
   cell infiltration matrix. The corplot package [[84]41] was used to draw
   correlation heatmaps to visualize the correlations of the 22 immune
   cell infiltrations. The gepubr package
   (https:/CRAN.R-pnojiect.org/package-ggpuby) was used to draw violin
   plots to visualize the differences in the infiltration of the 22 immune
   cells. We plotted the correlations between the hub genes and immune
   cells.
     * 7
       Statistical analysis

   All statistical analyses were performed using means ± standard
   deviation. The data were analyzed using R software (version 3.6.3). A
   P-value <0.05 was considered statistically significant, and all
   statistical tests were two-sided. The overall analysis protocol used in
   this study is shown in [85]Fig. 1.

Fig. 1.

   [86]Fig. 1
   [87]Open in a new tab

   Flow chart.

3. Results

     * 1
       GEO data variance analysis

   The downloaded standardized data was preprocessed to obtain a boxplot
   of the data before and after preprocessing ([88]Fig. 2A and B). The
   results of the PCA analysis revealed specific differences between the
   metastatic and non-metastatic groups ([89]Fig. 2C). The [90]GSE21257
   data set screened the expression data for 24,994 genes with
   |logFC|>0.263, p.adj<0.05 as the threshold. Finally, 1960
   differentially expressed genes were identified, of which 1145 were
   up-regulated and 815 were down-regulated ([91]Fig. 2D and E). The ER
   stress-related genes were downloaded from the GeneCard database. Then,
   138 potential ER stress-related differentially expressed genes were
   identified by their intersection with the previously detected
   differentially expressed genes from the [92]GSE21257 data set ([93]Fig.
   2F).

Fig. 2.

   [94]Fig. 2
   [95]Open in a new tab
   Difference analysis (Pink is the metastatic group and green is the
   non-metastatic group): (A) Boxplot before [96]GSE21257 data processing.
   (B) Boxplots after [97]GSE21257 data processing. (C) PCA graph.
   (D)[98]GSE21257 dataset differential genes volcano plot. Red is
   up-regulated differential genes, blue is down-regulated differential
   genes, and grey is non-significant genes. (E) [99]GSE21257
   datasetdifferential genes heat map. Blue is the metastatic group, pink
   is the non-metastatic group, green is low expression, and red is high
   expression. (F) Venn diagram of the intersection of endoplasmic
   reticulum stress-related genes and differential genes. 138 potential ER
   stress-related differentially expressed genes. (For interpretation of
   the references to color in this figure legend, the reader is referred