Abstract Introduction Osteosarcoma (OS) is an invasive and lethal malignancy showing a low 5 year survival rate, underscoring the need for identifying new therapeutic targets and their inhibitors to enhance prevention and treatment strategies. Methods In this study, in vitro experiments including CCK-8 assay, anchorage-independent growth assays, and plate cloning assays were used to detect the anti-proliferation ability of natural compound tangeretin towards OS cells. An integrated approach was performed including WGCNA and network pharmacology to identify the key genes of tangeretin for the treatment of OS. Multigene diagnostic model, reverse transcription quantitative polymerase chain reaction (RT-qPCR) analysis along with molecular docking analysis were further conducted to validate the reliability of the targets obtained by bioinformatics methods. Single-cell and gene enrichment analyses were chosen to explore the mechanism of tangeretin in OS. Results Hub genes identified by the bioinformatics strategy included ABCC1, AKR1C3, BACE1, and CA12. RT-qPCR validation and molecular docking analysis confirmed that ABCC1 and BACE1 were the most likely potential targets. A multigene diagnostic model for OS demonstrated moderate accuracy of the hub genes. Single-cell sequencing results indicated that these two hub targets were closely related to OS and provided more potential mechanisms for targeting OS. Conclusion Our research highlights the therapeutic potential of the natural compound tangeretin and its antineoplastic mechanisms in OS. It offers new insights into the molecular mechanisms of tangeretin, paving the way for the development of effective OS treatments. Keywords: Osteosarcoma, Natural compound, Tangeretin, Therapeutic targets, Network pharmacology, WGCNA Introduction As a primary bone malignancy largely impacting adolescents and kids, Osteosarcoma (OS) represents the most prevalent form of bone cancer in this age group [[38]1]. It accounts for approximately 5% of all pediatric tumors and an alarming 20% of all bone malignancies [[39]1]. Even though chemotherapy, radiotherapy, and surgery are regarded as the most efficient clinical therapies, the overall survival proportion only reaches about 15–17% [[40]2]. Hence, it is highly indispensable to identify efficient molecular targets and their suppressors [[41]3–[42]5]. Natural compounds that were derived from fruits, vegetables, and medicinal plants were regarded as promising inhibitor resources for tumors [[43]6–[44]8]. Previous studies demonstrated that many flavonoids obtained from medicinal plants displayed promising anti-cancer properties through carcinogen inactivation, anti-proliferative effect, cell cycle arrestment, and apoptosis induction, etc. [[45]9, [46]10]. Tangeretin, which was found comprehensively in fruits of the citrus genus, pertains to a set of polymethoxylated flavonoids [[47]11]. Tangeretin elevated the Bax expression and decreased the expression of Bcl-2 in the BFTC-905 cell line, which also elicited premature and late apoptosis [[48]12]. Tangeretin could enhance the anticancer activity of metformin by improving the activation production of both AMPK and ROS [[49]13]. Nevertheless, little work has been done to value the anti-cancer activities and elucidate the underlying mechanisms of tangeretin on OS. This study first confirmed the inhibitory effect of tangeretin on OS and explored its potential mechanisms through a combined approach of bioinformatics and network pharmacology. A multigene diagnostic model was developed to assess the accuracy of four identified targets, with a moderate accuracy demonstrated ([AUC] = 0.575). ABCC1 and BACE1 were identified as the most likely targets through RT-qPCR and molecular docking analysis. We identified key core targets and elucidated the potential mechanisms of tangeretin in treating OS, providing new insights for both OS treatment and the clinical application of natural compounds. Materials and methods Cell lines and reagents Tangeretin (CAS# 481-53-8) was purchased from Shanghai Rechemscience (Shanghai, China). Cell substrate (RPMI1640, DMEM) and Fetal bovine serum (FBS) were supplied by BI (Shenzhen, China). RevertAid™ First Strand Scientific cDNA Synthesis Kit was supplied by Thermo Fisher Scientific (USA). Cell Counting Kit-8 Assay Kit was offered by Meilunbio (Dalian, China). Human OS cell lines 143B and KHOS were provided by the cell bank of the Chinese Academy of Sciences (Shanghai). 143B and KHOS cell lines were sustained in McCoy’s 5A medium (Corning) integrated with 1% streptomycin-penicillin (Gibco) and 10% FBS (Gibco). All cell substrates were preserved under 5% CO[2] at 37%. Cell proliferation Cell proliferation progress was assessed via Cell Counting Kit-8 Assay method. 143B cells and KHOS cells (1 × 10^3 each well) were fostered with 96-well plates all night. Concentrations of tangeretin (0,5,10,20,40 μM) were integrated into the above cells on next day. CCK-8 assay was dispensed into each well of the 96-well plates. The reaction time persisted for 1 h within a 5% CO[2] moist hatcher under light-exclusion conditions. Absorbance was then gauged spectrophotometrically at 450 nm. Anchorage-independent growth assays 143B and KHOS cells (8 × 10^3 each well) were suspended in Eagle’s Basal Medium (BME) comprising 0.33% agar and 10% fetal bovine serum (FBS) in the top layer, with discrepant doses of tangeretin (0, 5, 10, 20, 40 μM) were integrated into the blended agar in the top layer and base layer. The substrates were cultured in a hatcher containing 5% CO[2] for approximately 1 week at 37 ℃. Then, colonies were checked and counted through IN Cell Analyzer 6000. Plate cloning assays 143B cells and KHOS cells (800 each well) were fostered with 6-well plates for 8 days, and diverse concentrations of tangeretin (0, 5, 10, 20, 40 μM) were integrated into the above cells. When the clones had grown to the appropriate size, the cell culture medium was removed, and clones were subjected to staining with 0.5% crystal violet for a period of 4 min. Subsequently, clones were washed with deionized water, photographed, and then enumerated through IN Cell Analyzer 6000. Acquisition and analysis of the microarray data We have obtained two microarray data sets related to OS ([50]GSE14359, [51]GSE42572) from the Gene Expression Omnibus Database (GEO) ([52]https://www.ncbi.nlm.nih.gov/geo/) and their basic information was shown in Table [53]1. The R program inSilicoMerging was used to combine the datasets [54]GSE14359 and [55]GSE42572, and the batch effect [[56]14] was eliminated then using the technique developed by Johnson WE et al. [[57]15] to produce the final normalization profile. Table 1. Summary statistics Data sets number Platform information OS group Normal control Species [58]GSE14359 [59]GPL96 18 2 Homo sapien [60]GSE42572 [61]GPL13376 7 5 Homo sapien [62]Open in a new tab Differential expression analysis The analysis of differential expression was undertaken by the R package linear models for microarray (limma) (version 3.40.6), with the aim of identifying differential genes between normal controls and OS groups. Genes having an expression value of 0 for over 50% of the instances were removed in the acquired expression profiling datasets. Subsequently, the voom function was utilized to convert the data, and the lmFit function was further employed to carry out multiple linear regression. Log-odds of differential expression, moderated F-statistics and t-statistics were calculated by eBay's functionality via empirical Bayes mediacy of the standard errors with regard to a representative value, consequently achieving the distinctive significance of each gene. The genes that displayed differential expression genes (DEGs) between normal controls and OS groups were chosen via using a filtering criterion of |log 2 FC|> 1 and p < 0.05. Weighted gene co-expression network analysis (WGCNA) The Median Absolute Deviation (MAD) for individual genes was computed based on their respective expression profiles to assess the variability in gene expression across the samples. Initially, the upper 50% of genes exhibiting the highest MAD values were removed, as these genes exhibited extreme variability that could potentially represent noise rather than meaningful biological signals. Next, the goodSamplesGenes function from the WGCNA R package was employed to systematically identify and eliminate both outlier genes and samples that failed to meet quality control standards. This function helped to filter out any samples or genes with poor quality expression profiles, ensuring that only high-quality data were used for the downstream analysis. Subsequently, a scale-free gene co-expression network was constructed using the WGCNA method. The sensitivity parameter was set at 3, a value chosen to balance the trade-off between sensitivity and specificity in the co-expression network construction. This configuration aimed to capture the most relevant co-expressed gene pairs without introducing too much noise. A module dendrogram was then constructed through hierarchical clustering of genes based on their co-expression patterns, and a cut line was chosen to separate genes into distinct modules. The chosen threshold for module merging was set at a distance of 0.25, which meant that modules with a similarity distance lower than this value were merged into one co-expression module. This resulted in the identification of 23 unique co-expression modules. To assess the biological relevance of these modules, a correlation analysis was performed between each module and the clinical traits associated with OS, such as tumor progression, survival rates, and treatment response. The modules that exhibited a significant correlation with OS clinical scores were selected for further exploration. To identify the most relevant genes within these co-expression modules, Gene Significance (GS) and Module Membership (MM) values were calculated for each gene. GS measures the correlation between a gene’s expression profile and the clinical traits of interest, whereas MM indicates the strength of the association between a gene and its corresponding module. Genes with high GS and MM values were considered to be central to the module and relevant to the clinical traits of OS. A MM value greater than 0.8 and a GS value greater than 0.1 were established as the boundary criteria for selecting key OS-related genes. OS feature genes extraction and functional enrichment analysis DEGs were cross-referenced with the candidate genes identified through WGCNA to derive the OS-related feature genes [[63]16]. To investigate the signaling pathways and distinctive biological characteristics of the feature gene, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis were conducted based on the R package named clusterProfiler (version 3.14.3), with a significant set to  p < 0.05. Tangeretin targets prediction The prediction of tangeretin targets was performed using the Swiss Target Prediction database ([64]http://swisstargetprediction.ch/) and the PharmMapper database ([65]http://www.lilab-ecust.cn/pharmmaper/submitfile.html). The species Homo sapiens was selected for taxonomic categorization. Potential targets exhibiting a probability score > 0 were initially identified and subsequently processed through a deduplication procedure to obtain the final results of tangeretin-associated molecular targets. The target protein sequences were translated into corresponding gene identifiers, followed by the systematic exclusion of non-human genes through the utilization of the Uniprot database ([66]http://www.uniprot.org/). Establishment of drug-target-illness connection networks and analysis of functional enrichment The recognition of overlapping between tangeretin targets and OS feature genes was performed using Venny 2.1.0, an online bioinformatics tool (available at [67]http://bioinformatics.psb.ugent.be/webtools/Venn/). The protein–protein interaction network was established utilizing the STRING database ([68]http://string-db.org/) [[69]17]. The ClusterProfiler package (version 3.14.3) in R was used to conduct GO and KEGG enrichment analysis. Statistical significance was determined using a threshold of p < 0.05 for all enrichment analyses. Hub genes identification and GSEA analysis for tangeretin for the treatment of OS The identification of key regulatory genes associated with tangeretin treatment in OS was obtained through the integration of DEGs, WGCNA modules, and target gene datasets. The protein–protein interaction network was established using the STRING database. The identified hub genes comprise ABCC1, AKR1C3, BACE1, and CA12. To investigate the underlying signaling cascades and potential biological functions of the target genes, we performed a comprehensive Gene Set Enrichment Analysis (GSEA) [[70]18]. Genes were ranked based on differential expression and their enrichment across multiple important pathways was analyzed. The Enrichment Score (ES) and Normalized Enrichment Score (NES) were calculated for each gene set, with statistical significance determined by p-values and false discovery rates (FDR). Hub genes validation The transcriptomic profiling data, including raw read counts and associated clinical metadata for OS samples, were acquired from the TARGET database (accessible at [71]https://ocg.cancer.gov/programs/target) at level 3 RNA sequencing resolution. The statistical significance of survival differences among two or more distinct groups was assessed by the log-rank test within the Kaplan–Meier survival analysis. Subsequently, the predictive performance of the model was evaluated through time-dependent receiver operating characteristic (ROC) curve analysis. Feature extraction was performed by the Least Absolute Shrinkage and Selection Operator (LASSO) regression method, utilizing a tenfold cross-validation approach to ensure robust model evaluation. The glmnet package implemented in the R programming environment was performed to complete the analysis mentioned above. A predictive model was developed through multivariate Cox proportional hazards regression analysis. The analysis mentioned above was performed utilizing the survival package within the R programming environment. Initially, multivariate Cox proportional regression analysis was conducted utilizing the stepwise function to identify the ideal model, which will subsequently serve as the definitive predictive model. The statistical analysis of survival data was performed using Kaplan–Meier methodology. We acquired the p-value through the log-rank test, which was used to determine whether there was a significant difference between the survival curves of different groups. We also obtained the hazard ratio (HR) and its 95% confidence interval through univariate Cox regression, which was performed to assess the impact of a certain factor on survival risk and the uncertainty range of this impact. The statistical analyses were executed utilizing R packages based on the Foundation for R Statistical Computing (version 4.0.3 released in 2020). Statistical significance was determined using a threshold of  p < 0.05. A superior predictive performance was indicated by higher area under the curve (AUC) values along with lower p-values of Log-rank. We identified the genes demonstrating predictive performance with maximum AUC values, as determined through survival ROC analysis during the initial stage of model optimization. Computer docking model In silico docking was conducted using the Schrödinger Suite 2019 software. The three-dimensional (3D) structure of tangeretin was obtained from PubChem. The crystal composition of ABCC1, AKR1C3, BACE1, and CA12 were offered by the PDB database. The stimulation of docking as well as the choice of the genetic algorithm settings were performed by AutoDockTools software (version 1.5.7). The molecular docking analysis and preparation were conducted using Pymol software. Evaluating the expression of chosen genes through RT-qPCR TRIzol was utilized with the aim of isolating the overall mRNA of OS cells guided by the manufacture’s standard instructions. The appraisal of RNA sample quality and quantity was conducted through distinct analytical methods, utilizing conventional agarose gel electrophoresis for qualitative evaluation and NanoDrop spectrophotometric analysis for quantitative assessment. The transcription of complementary DNA (cDNA) from RNA was performed using the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific, USA), following the protocol provided by the manufacturer. RT-qPCR analysis was conducted utilizing the RealQ Plus Master Mix Green reagent in conjunction with the Applied Biosystems 7500 Real-Time PCR platform. The specific RT-qPCR primer sequences are presented in Table [72]2. GAPDH served as an internal reference gene to quantify and normalize target gene expression levels, as well as to normalized the relative mRNA expression levels for the purpose of data analysis. Next, the 2^−△△CT approach was performed to analyze the data. Each sample was tested three times to ensure the accuracy of the data. Table 2. The details of specific RT-qPCR primer sequences Gene name Forward primer (5’ to 3’) Reverse primer (5’ to 3’) ABCC1 AGGAATTGGTTGTATAGAAG AAGGATGTTAGAGGTGAT AKR1C3 ACCTCTATCTTATTCATTCTC ATCCTTACACTTCTCCAT BACE1 ACCTCTTCTATCTAATCCTT AACCACTCTGAATTATGTAA CA12 AACCTGGAATGGAGACTT GAATATGGAGTATCTTAGTAGCA [73]Open in a new tab Analyze the single-cell data using t-SNE method The single-cell datasets along with their annotation outcomes were obtained in h5 file format from the Tumor Immune Single-cell Hub (TISCH) database. The MAESTRO and Seura packages within the R programming environment were utilized in the aim of analyzing the single-cell RNA sequencing data. The cells were re-clustered using the t-SNE method. Statistical analysis GraphPad Prism 8.0 was used for all statistical analyses, with quantitative results presented as mean ± SD. To determine significant differences between groups, either the unpaired Student’s t-test or one-way analysis of variance (ANOVA) was employed, depending on the experimental design. Post-hoc tests were conducted where appropriate to further analyze multiple group comparisons. To ensure sufficient statistical power, power analysis was performed prior to the experiments to justify the sample sizes for the in vitro assays, including cell proliferation, colony formation, and migration assays. Power calculations were conducted based on effect size, variability, and significance thresholds to confirm that the study was sufficiently powered to detect meaningful differences. Specifically, power calculations were aimed to achieve a power of 80% or greater with a significance level of 0.05, ensuring that the study had a high probability of detecting true effects and minimizing the risk of type I and type II errors. Statistical significance was determined with the following thresholds: *p < 0.05, **p < 0.01, and ***p < 0.001. Results Tangeretin inhibits the proliferation of OS After preliminary screening of natural compounds, we found that tangeratin has an inhibitory effect on OS. In an effort to delve deeper into the effects of tangeretin, we evaluated the proliferation of 143B and KHOS cells following exposure to varying concentrations of tangeretin (0, 10, 20, and 40 μM) utilizing proliferation assays. Findings illustrated an important dose-dependent inhibition of proliferation in 143B and KHOS cells post-treatment with tangeratin (Fig. [74]1A–B). Quantitative analysis of plate clone formation assays revealed a marked suppression of clonogenic potential in cells exposed to tangeretin, as evidenced by both decreased colony diameter and lower colony counts. (Fig. [75]1C–D). Comparative analysis presented that tangeretin exhibited superior inhibition of colony formation in 143B and KHOS (Fig. [76]1E–F). These findings implied that tangeretin treatment efficiently hindered cellular proliferation and colony-forming capacity in a dose-dependent manner. Fig. 1. [77]Fig. 1 [78]Open in a new tab Tangeretin attenuates OS cell proliferation. A, B The influences of discrepant tangeretin concentrations on cell progress capacity of 143B and KHOS at 1 day, 2 days, and 3 days were examined via the CCK-8 assay.; C, D Colony formation cell progress trial was executed to appraise the impact of tangeretin (0, 5, 10, 20, 40 μM) on the progress of 143B (C) and KHOS (D). E, F Anchorage-independent cell progress trial was executed to appraise the impact of tangeretin (0, 5, 10, 20, 40 μM) on the progress of 143B (E) and KHOS (F). Statistical analysis was executed through the Students’ unpaired t-test. *p < 0.05, **p < 0.01, and ***p < 0.001 signify key disparities in the groups. The outcomes aligned with three isolated experiments Tangeretin target prediction To further investigate the mechanism of tangeratin in OS, we firstly utilized two computational prediction platforms to perform comprehensive target prediction analyses for tangeretin. The molecular structure of tangeretin, represented by either SMILES notation or SDF format, was entered into the two computational platforms: Swiss Target Prediction (accessible at [79]http://www.Swisstargetprediction.ch/) and the PharmMapper database ([80]http://www.lilab-ecust.cn/pharmmapper/submitfile.htlm). Human subjects were screened, and potential targets with a probability score greater than zero were identified. Duplicate entries were subsequently eliminated to derive a refined set of biologically relevant targets. The protein targets were mapped to their corresponding genes through the Uniprot database (available at [81]http://www.uniprot.org/), yielding a total of 345 target genes. This approach represents a vital step in elucidating the pharmacological profile and potential therapeutic applications of tangeretin at the molecular level. Differential genes identification between OS and normal controls We employed bioinformatics approaches to explore the molecular signatures of OS, with particular emphasis on identifying protein expression profiles that distinguish OS groups from normal controls. The integration of [82]GSE14359 and [83]GSE42572 datasets was conducted using the inSilicoMerging R package jointly with the approach described by Johnson et al. [[84]15]. This method enabled the effective removal of batch effects, resulting in a consolidated expression data profile comprising 32 distinct samples. Following batch effect elimination, it was apparent that the data distribution between each dataset was consistent according to the bar plot, with median values aligning along a single horizontal axis. This result indicated successful normalization of inter-batch variations, suggesting that the datasets were now comparable for subsequent analyses. (Fig. [85]2A and 2B). The expression density plot revealed significant heterogeneity in sample distributions across datasets before batch effect elimination, indicative of substantial technical variation. Following batch effect removal, the datasets demonstrated improved alignment, with comparable distribution patterns characterized by similar central tendencies and dispersion measures (Fig. [86]2C and 2D). The spatial distribution patterns of individual datasets were visualized using uniform manifold approximation and projection (UMAP) dimensionality reduction, with comparative analyses conducted both prior to and following batch effect elimination (Fig. [87]2E and 2F). Fig. 2. [88]Fig. 2 [89]Open in a new tab Normalization procedure for integrating multiple datasets from GEO. A Visualization of gene expression distributions across both datasets prior to normalization. B Comparative analysis of expression profiles following normalization procedures. C Density distribution patterns of gene expression levels in unnormalized datasets. D Normalized expression density profiles demonstrating data standardization effects. E UMAP dimensional reduction analysis of pre-normalized datasets. F UMAP visualization of normalized datasets illustrating clustering patterns post-normalization Following the normalization of the expression profile, differential gene expression analysis was performed using stringent criteria (|log2 fold change|> 1, p-value < 0.05), which identified a significant number of DEGs. The DEGs were graphically represented through a volcano plot (Fig. [90]3A), accompanied by a heatmap that highlighted the 50 most significantly elevated-regulated and the 50 most significantly reduced-regulated genes (Fig. [91]3B). Fig. 3. [92]Fig. 3 [93]Open in a new tab The DEG analysis resulted from the [94]GSE14359 and [95]GSE42572 datasets. A volcano plots where up-regulated genes are denoted in red, while down-regulated genes are marked in green. B Hierarchical clustering heatmap of the 50 most significantly differentially expressed genes between OS and normal tissue samples Weighted gene-co-expression network analysis (WGCNA) The MAD was computed for individual genes based on the normalized expression profile. Following this, half of the genes exhibiting the lowest MAD values were excluded from subsequent analyses. Outlier detection and sample filtering were performed by the Sample Genes algorithm implemented in the WGCNA R package. Subsequently, a scale-free gene co-expression network was constructed through WGCNA analysis, with the soft-thresholding power parameter determined at 3. This configuration yielded an average network connectivity of 5.87 and achieved a scale-free topology fit index of 0.95 (Fig. [96]4A, [97]B). With a module merging threshold of 0.25, sensitivity parameter adjusted to 3, and minimum module size established at 30, 23 unique co-expression modules were successfully identified (Fig. [98]4C). The parameter selection represents a balanced method between biological relevance and module specificity, as proved by the resulting module count and size distribution. Subsequently, through comprehensive correlation analyses between individual module and clinical traits (Fig. [99]4D), as well as the association between module membership (MM) and gene significance (GS), the turquoise module containing 178 genes emerged as the most significant candidate for subsequent exploration (correlation coefficient = -0.58, p = 4.7e^−4). The scatter plot analysis focusing on the correlation between GS and MM demonstrated a highly significant association, with a p-value of 2.1e-^67 (Fig. [100]4E). Fig. 4. [101]Fig. 4 [102]Open in a new tab Outcomes of the WGCNA. A Scale-free topology of the standardized expression profile. B Mean connectivity values derived from the standardized expression profile. C Hierarchical clustering dendrogram from gene expression profiles. D The association between distinct gene modules and clinical traits, where light blue and blue hues represent positive and negative correlations, respectively. E The association between the degree of module membership (MM) and gene significance (GS) within the turquoise module was examined Identification and functional annotation of OS related genes and pathway enrichment evaluation Through the intersection between the 178 genes identified by WGCNA and the 635 DEGs, a subset of 130 overlapping genes was obtained (Fig. [103]5A). These genes were subsequently selected as characteristic biomarkers for OS progression. Functional annotation and pathway enrichment analyses were conducted on the 130 candidate genes utilizing The GO and KEGG databases. Gene ontology biological process (GO-BP) analysis revealed essential enrichment of the identified feature genes in critical cellular pathways, particularly associated with cellular demise mechanisms, including both programmed cell death and apoptosis (Fig. [104]5B). The Gene Ontology Cellular Component (GO-CC) analysis unveiled significant enrichment primarily in extracellular compartments, including the extracellular matrix, vesicular structures, and specific subdomains of the extracellular region (Fig. [105]5C). The top three significantly enriched categories identified through Gene Ontology Molecular Function (GO-MF) analysis were associated with cell adhesion molecule binding, structural components of the extracellular matrix, and integrin-mediated binding interactions (Fig. [106]5D). Furthermore, according to KEGG pathway analysis, the signature genes were substantiated predominantly associated with key biological processes, including lysosomal function, autophagic pathways, and metabolic modulation of glycolysis/gluconeogenesis (Fig. [107]5E). Fig. 5. [108]Fig. 5 [109]Open in a new tab Identification of OS characteristic genes and their functional annotation. A The overlapping gene sets between WGCNA module genes and DEGs. B Bubble chart illustrating the KEGG pathway enrichment results, highlighting the top 10 significantly enriched pathways among OS-associated genes. C–E GO enrichment outcomes, depicting the premier 10 enriched biological procedures categories (C), premier 10 cellular component terms (D), and the premier 10 molecular functions categories (E). In all bubble plots, the color gradient and bubble diameter respectively indicate the statistical significance (p-values) and the number of genes involved in each term Tangeretin treatment OS potential targets acquisition and single gene GSEA analysis Through comprehensive analysis, four candidate target genes of tangeretin in OS were identified by intersecting 130 characteristic genes with 345 obtained tangeretin-associated target genes. This integrative approach revealed key molecular targets potentially mediating the therapeutic effects of tangeretin in OS (Fig. [110]6A). To elucidate the functional significance of target genes in OS progress, we conducted GSEA analysis at the individual gene level to identify potential pathways involved. The analysis implied that tangeretin predominantly modulates genes associated with critical biological processes, including cell cycle regulation, proteasomal degradation, meiotic division in oocytes, base excision repair mechanisms, and DNA replication pathways (Fig. [111]6C-D). Fig. 6. [112]Fig. 6 [113]Open in a new tab Identification and functional characterization of potential molecular targets associated with the therapeutic effects of tangeretin in OS. A The overlapping gene sets between OS-related molecular signatures and tangeretin targets. B Circular visualization of KEGG enrichment pathway enrichment analysis, highlighting the biological processes influenced by tangeretin in OS. C, D, E, F GSEA enrichment analysis of ABCC1 (C), AKR1C3 (D), BACE1 (E), and CA12 (F) Hub genes validation A predictive polygenic model was established through the implementation of logistic regression analysis, utilizing standardized gene expression profiles as input data. The model was trained on normalized transcriptomic data to establish robust associations between gene expression patterns and phenotypic outcomes. This computational framework employed statistical learning techniques to integrate multiple genetic factors and quantify their collective contribution to complex traits. The logistic regression approach provided a probabilistic framework for estimating the likelihood of specific phenotypic manifestations based on the combined effects of multiple genetic loci. As a result, four hub genes (ABCC1, AKR1C3, BACE1, CA12) were identified to construct the diagnostic model utilizing stepwise regression analysis. The findings indicated that the predictive model developed using these key target genes demonstrated significant discriminatory power in differentiating OS samples from normal controls, achieving an AUC value of 0.575 for the three-year survival analysis (Fig. [114]7A-D). Fig. 7. [115]Fig. 7 [116]Open in a new tab A The association between the expression levels of CA12, BACE1, AKR1C3, and ABCC1 with survival outcomes in the Target dataset. The upper panel displays a gradient of gene expression intensities, categorized by distinct color codes representing varying expression clusters. The central panel depicts the correlation between gene expression profiles and corresponding survival durations and statuses across different samples. The lower panel presents a heatmap visualization of gene expression patterns. B The KM survival analysis for CA12, BACE1, AKR1C3, and ABCC1 within the Target dataset. The log-rank test was employed to compare survival differences between expression groups, with HR denoting the hazard ratio for the high-expression cohort relative to the low-expression group. C The ROC curves as well as AUC values for CA12, BACE1, AKR1C3, and ABCC1 at different time points, where a higher AUC value indicates a stronger predictive ability of the gene Quantitative assessment of gene expression profiles for targeted genetic markers by RT-qPCR To confirm the findings from computational analyses, we implemented reverse RT-qPCR assays targeting ABCC1, AKR1C3, BACE1, and CA12 genes. These experiments were conducted using cDNA derived from OS cells that had been exposed to tangeretin treatment, along with corresponding untreated control samples. The results were shown in Fig. [117]8A-B, the expression of BACE1 and ABCC1 was significantly decreased relative to the control. Fig. 8. [118]Fig. 8 [119]Open in a new tab A, B RT-qPCR Analysis of BACE1 and ABCC1 Expression: The expression levels of BACE1 and ABCC1 were quantified using RT-qPCR in OS cells treated with tangeretin. Statistical significance was determined using the unpaired Student’s t-test, with *p < 0.05, **p < 0.01, and ***p < 0.001 indicating significant differences between the treatment and control groups. The data represent results from three independent experiments, ensuring reproducibility and reliability of the findings. C, D Computational Docking Models: Molecular docking simulations were performed to assess the binding affinity of tangeretin with BACE1 (C) and ABCC1 (D), providing insights into their potential interactions at the molecular level. The docking results were visualized, with binding energies indicating the strength of the ligand-receptor interactions. E, H t-SNE Clustering of Single-Cell Data for BACE1 (E) and ABCC1 (H): t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the clustering of single-cell data, highlighting the distinct populations of cells expressing BACE1 (E) and ABCC1 H. Each cell type was color-coded to show differentiation, helping to identify specific cell populations involved in OS. F, I Spatial Distribution of Gene Expression: t-SNE plots (F for BACE1 and I for ABCC1) showed the spatial distribution of BACE1 and ABCC1 gene expression across individual cells. The color gradient indicated the relative expression levels, with darker colors representing lower expression and brighter colors reflecting higher gene activity. G, J Quantitative Bar Graphs of Gene Expression Across Cell Types: Bar graphs (G for BACE1 and J for ABCC1) summarize the relative expression levels of these genes in different cell types. This comparative analysis provided a clear overview of how gene expression patterns varied across various cell populations, offering insights into their potential roles in OS progression and treatment response Molecular docking validation To further verify whether BACE1 and ABCC1 were therapeutic targets for OS, we employed molecular docking analysis to assess the binding affinity between ligand molecules and their corresponding receptor targets. The computational approach allowed for the systematic evaluation of molecular interactions and binding potential at the atomic level. The stability of ligand-receptor interactions was inversely correlated with binding energy values, where lower energy levels indicated stronger molecular associations. Molecular docking analyses revealed that binding affinities below −5.0 kcal/mol demonstrated optimal binding characteristics, whereas interactions with energy values exceeding −4.25 kcal/mol represented baseline binding capabilities [[120]19]. The two core proteins (ABCC1 and BACE1) were docked against tangeretin using Autodock software, and docking scores are shown in Fig. [121]8C, 8D. The docking results were analyzed by Pymol software (Fig. [122]8C, 8D). The results showed that the binding energy of two core proteins and tangeretin all was ≤ -5 kcal/mol. Analyze the single-cell data using t-SNE method To explore the cellular composition and molecular profiles linked to two candidate hub genes, we employed t-distributed Stochastic Neighbor Embedding (tSNE) for dimensionality reduction and visualized the single-cell transcriptomic data. This unsupervised approach enabled us to identify distinct cell populations and their associated gene expression patterns. The highest levels of BACE1 expression were observed in several cell types, including fibroblasts, malignant cells, monocyte-macrophages, endothelial cells, osteoblasts, conventional CD4 + T cells, plasma cells, and exhausted CD8 + T cells, all of which exhibit associations with OS (Fig. [123]8E–G). In parallel, the highest levels of ABCC1 expression were predominantly observed in monocyte-derived macrophages, CD8 + exhausted T cells, vascular endothelial cells, osteoblasts, fibroblast populations, malignant cells, and plasma cell lineages (Fig. [124]8H–J). This expression pattern suggested a potential role of ABCC1 in diverse cellular processes across multiple tissue types. The consistent upregulation in these specific cell populations might indicate functional significance in cellular transport mechanisms and drug resistance pathways. Notably, the expression profile demonstrated tissue-specific regulation, with particularly strong signals detected in both normal and transformed cellular compartments. These findings provided valuable insights into the potential biological functions and therapeutic implications of BACE1 and ABCC1 in OS. Discussion As the most prevalent primary bone malignancy, OS, presents a considerable challenge in enhancing patient survival rates and treatment globally [[125]20]. Traditional chemotherapy, relying on the same drugs since the 1980s, is often ineffective due to resistance and severe side effects. The high heterogeneity of OS cells further complicates the development of targeted therapies. As a result, there is a pressing need for new therapeutic strategies. Natural dietary compounds, such as tangeretin, have gained attention for their potential health benefits, including cancer prevention. However, the therapeutic efficacy and mechanisms of tangeretin in OS remain unclear. In this study, we applied Weighted Gene Co-expression Network Analysis (WGCNA) to reconstruct gene co-expression networks, aiming to reveal key genes and transcriptional regulation mechanisms involved in OS pathogenesis. ABCC1 (MRP1), a constituent of the ABC transporter superfamily, is initially characterized by Cole and colleagues in 1992 [[126]21]. This membrane-associated protein has been proven to confer cellular resistance to multiple chemotherapeutic agents, including etoposide, doxorubicin, and vincristine, through its drug efflux activity [[127]22]. The accessibility of the ABCC1 enhancer was significantly elevated in various malignancies including Clear Cell Renal Cell Carcinoma (ccRCC), Glioblastoma Multiforme (GBM), and Uterine Corpus Endometrial Carcinoma (UCEC), representing a subset of genes that exhibit downregulation following exposure to ultraviolet radiation [[128]23]. The ABCC1 gene is responsible for encoding multidrug-resistance protein-1 (MRP1), which plays a pivotal role in tumor progression by facilitating drug efflux mechanisms in neuroblastoma cells [[129]24] Additionally, ABCC1 contributes to tumorigenesis by modulating lipid-signaling pathways in uterine leiomyoma cells [[130]25]. Similar to other members of the ABC transporter family, ABCC1 functions as a membrane-bound efflux pump responsible for the cellular export of various chemotherapeutic compounds. Recent studies have demonstrated that the inhibition of specific protein kinases through tyrosine kinase inhibitors presents a promising novel therapeutic strategy for OS management [[131]26, [132]27]. It has also to be taken into account that ABC substrates play key roles among tyrosine kinase inhibitors [[133]28, [134]29]. Tetrandrine-mediated inhibition of ABC transporters has been demonstrated to enhance apoptotic processes in OS cells [[135]30], while also exhibiting the potential to eradicate OS-derived cancer stem cell populations. Moreover, Emerging evidence indicates that OS stem cells exhibit phenotypic and functional similarities with cancer stem cells (CSCs) derived from various malignancies [[136]31–[137]33]. Intrinsic chemoresistance in OS is largely driven by the upregulation of ABC transporter proteins, a mechanism observed across various tumor types. This suggests a shared survival strategy among CSCs, contributing to treatment resistance in OS. Based on bioinformatic analysis, ABCC1 emerged as a pivotal hub gene in OS, identified through its gene expression patterns, network centrality, and functional relevance. This highlights ABCC1 as a key player in OS progression, suggesting its potential as a therapeutic target for overcoming chemoresistance and eliminating OS stem cells. The β-site amyloid precursor protein cleaving enzyme 1 (BACE1) gene and its related proteins play a multifaceted role in cancer prevention and treatment, including regulating cell proliferation, affecting the tumor microenvironment, serving as therapeutic targets, and being associated with CSCs characteristics. The results indicate that BACE1 could potentially serve as a novel therapeutic target in cancer treatment strategies [[138]34]. Research investigates the radiosensitization effects of BACE1 functional inhibition in various cancer cell lines, suggesting BACE1 as a potential radiosensitization target for cancer radiotherapy[[139]35]. Elevated expression levels of BACE1 and BACE2 have been demonstrated to correlate with the progression of multiple human malignancies [[140]34]. Emerging evidence indicates that BACE1 plays a pivotal role in modulating various signaling cascades implicated in the development of malignancies. [[141]36]. In mammary carcinoma cells, the amyloid precursor protein (APP), which interacts with BACE1, has been proven to exert remarkably regulatory effects on cellular proliferation processes [[142]37]. The novel function of BACE1 in promoting cellular viability within pancreatic carcinoma cells was initially documented, alongside a comparative analysis of the suppressive impacts exerted by various inhibitors on cell survival [[143]38]. BACE1/2 fosters cancer progression by affecting tumor microenvironmental characteristics, particularly through their influence on neutrophil extracellular traps (NETs), and therefore, their inhibition has a regulatory effect on tumor growth [[144]39]. Research indicates that pharmacological inhibition of BACE1 using MK-8931 induces phenotypic reprogramming of pro-tumorigenic macrophages (pTAMs) into tumor-suppressive macrophages (sTAMs), thereby exerting anti-tumor effects in glioblastoma models [[145]40]. BACE1 has emerged as a promising therapeutic target in prostate cancer management. The expression of this gene has been detected in prostate cancer (PCa) tissue specimens, and treatment with the BACE1-specific pharmacological inhibitor MK-8931 has demonstrated significant inhibition of PCa cell proliferation [[146]41]. The experimental findings demonstrated that pharmacological inhibition of BACE1 or genetic knockout of BACE1 significantly altered TAM polarization. Specifically, these interventions facilitated the phenotypic transition from the protumoral M2-like state to the antitumoral M1-like state, thereby contributing to the observed tumor growth inhibition mediated by BACE1 modulation [[147]42]. The transcriptional abundance of BACE1-AS in hepatocellular carcinoma specimens demonstrates significant associations with clinicopathological parameters, with elevated BACE1-AS expression serving as an independent prognostic indicator for diminished overall survival and increased risk of disease recurrence in hepatocellular carcinoma patients [[148]43]. Targeting BACE1 holds therapeutic potential across various cancers, making it a promising molecular target for drug development. Its involvement in tumorigenesis warrants further preclinical and clinical exploration. While the study provides valuable insights into the effects of tangeretin on OS and identified key molecular targets, there are several limitations that should be considered. First, while the computational target prediction and pathway enrichment analyses suggest potential targets of tangeretin, the identification of these targets relies heavily on in silico methods, which may not fully capture the complexity of in vivo interactions. Additionally, the experimental validation of these targets was conducted only through in vitro cell line models, which may not entirely reflect the tumor microenvironment and the full range of genetic heterogeneity observed in OS patients. Moreover, although the RT-qPCR and molecular docking analyses confirmed the interaction between tangeretin and key targets like ABCC1 and BACE1, the study did not include detailed in vivo studies to assess the pharmacokinetics, toxicity, or long-term therapeutic efficacy of tangeretin, limiting the translation of these findings to clinical applications. Furthermore, the focus on a few hub genes may overlook other potential molecular players involved in tangeretin's mechanism of action, and the absence of clinical trial data leaves unanswered questions about the drug's relevance and effectiveness in human OS cases. Finally, while single-cell RNA sequencing data provided useful insights into the expression profiles of ABCC1 and BACE1 across different cell populations, further studies are needed to validate the functional significance of these findings in the context of OS treatment and progression. Conclusion In conclusion, our comprehensive investigation employing WGCNA, network pharmacology, RT-qPCR, and molecular docking has revealed that ABCC1 and BACE1 emerge as promising molecular targets implicated in the regulation of OS. These findings offer novel insights and potential avenues for the development of therapeutic interventions and pharmacological strategies targeting OS. Author contributions Conceptualization: J. T. R.Y. and Z. Y., Writing—Original Draft: R.Y.; Software and Formal analysis: X. L., H. Z. and Y. Y.; Investigation: R. S. and Y.W.; Writing—Review & Editing: All authors; Funding acquisition: R.Y. X.L. and Z.Y. Funding This work was supported by the program of TCM research in Henan Province, China (2022ZY1136, 2022ZY1123, 2024ZY2111), Medical Science and Technology Project of Henan Province, China (LHGJ20240454). Data availability The data and materials are all included in this manuscript. Declarations Ethics approval and consent to participate Not applicable. Consent for publication Not applicable. Competing interests The authors declare no competing interests. Footnotes Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Ruoping Yanzhang and Zhaojie Yang contributed equally to this work. References