Abstract

Objective
Metabolism-related lncRNAs may play a significant role in the occurrence and development of breast cancer. This study aims to identify metabolism-related lncRNAs with high predictive value for prognosis and to construct a model that can predict prognosis for individual patients with breast cancer.

Methods
Transcriptome data and clinical data of patients with breast cancer were retrieved from the TCGA database, and metabolism-related genes were sourced from the GSEA database. Metabolism-related lncRNAs in breast cancer were obtained through differential expression analysis and Pearson correlation analysis. Prognosis-related lncRNAs were further screened using Univariate Cox regression and LASSO regression. Kaplan–Meier survival analysis was performed and the survival curves of the high-risk and low-risk groups were drawn. Univariate and Multivariate Cox regression analyses were conducted to identify the independent prognostic factors, which were subsequently integrated into a nomogram for individualized prognostic prediction.

Results
Through differential analysis, 2135 differentially expressed lncRNAs were obtained, of which 231 were metabolism-related lncRNAs. Using Univariate Cox regression and LASSO regression, a risk prediction model incorporating 19 metabolism-related lncRNAs was constructed. The survival curves suggested that patients with high risk scores had a poorer prognosis than those with low risk scores (P < 0.05). Cox regression analysis further identified age, stage classification, distant metastasis and risk score as independent prognostic factors, which were used to construct a nomogram. KEGG pathway enrichment analysis revealed that the differentially expressed lncRNAs may be related to the JAK-STAT signaling pathway, the MAPK signaling pathway and the mTOR signaling pathway. Finally, based on the CIBERSORT algorithm, the lncRNAs used in the construction of the model had a strong correlation with CD8+ T cells, activated CD4+ T cells and the polarization of M2 macrophages.

Conclusion
Bioinformatics methods were utilized to identify metabolism-related lncRNAs associated with breast cancer prognosis, and a prognostic risk model was constructed, laying a solid foundation for the study of metabolism-related lncRNAs in breast cancer.

Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-02178-y.

Keywords: Breast cancer, Metabolism, LncRNA, LASSO, Prognostic model, CIBERSORT

Background

Breast cancer is the most common malignant tumor among women, and its incidence is increasing steadily each year. In 2022, there were more than 2.3 million new cases of breast cancer in women worldwide, accounting for 11.6% of all newly diagnosed cancers. Breast cancer caused more than 660,000 deaths, representing 6.9% of all cancer deaths [1]. Studies have revealed that breast cancer is a malignant tumor with a high degree of heterogeneity at the molecular level, which causes patients to differ significantly in clinical manifestation, treatment response, and survival [2, 3]. At present, prognosis is evaluated mainly from the clinical features of tumor morphology and pathological stage. Common prognostic indicators include tumor size, lymph node metastasis, histological grade, and clinicopathological stage, among others [4, 5]. However, a growing number of clinical cases show that treatment sensitivity and survival time differ significantly among patients with the same pathological type or clinical stage [6–8]. Therefore, investigating the relationship between the molecular heterogeneity of breast cancer and its occurrence and development at the genomic level, and identifying novel biomarkers closely related to early diagnosis, disease monitoring and prognosis evaluation, may provide a novel perspective for the precise diagnosis and personalized treatment of breast cancer.

Extensive evidence shows that tumor metabolism plays a key role in the occurrence and development of cancer. Tumor metabolism mainly includes sugar metabolism, nucleic acid metabolism, enzyme metabolism, and protein metabolism. Prognostic markers based on metabolism-related long non-coding RNA (lncRNA) have been shown to be useful for assessing overall survival in a variety of cancers, including melanoma, cervical cancer, and colon cancer, among others [9–12]. Among these, the metabolism-related lncRNA nuclear paraspeckle assembly transcript 1 (NEAT1) is important for promoting metabolic changes in breast cancer growth and metastasis. A study has shown that Pinin (PNN) mediates glucose-stimulated nuclear export of NEAT1, enabling isoform-specific and paraspeckle-independent functions. Depletion of Pinin leads to cytoplasmic depletion and aberrant nuclear accumulation of NEAT1 even upon glucose stimulation, and this phenotype was not associated with changes in NEAT1 expression [13].

This study analyzed the transcriptome sequencing data and clinical prognosis information of patients with breast cancer in The Cancer Genome Atlas (TCGA) to explore the relationship between the expression of metabolism-related lncRNAs and tumor prognosis in breast cancer. By screening the prognosis-related metabolic lncRNAs, a prognostic risk model that can accurately predict the prognosis of patients with breast cancer was constructed, providing guidance for prognosis evaluation. Additionally, this study explored the impact of metabolism-related lncRNAs on the immune microenvironment of breast cancer, offering new insights into their potential roles in tumor progression and immune regulation.

Materials and methods

Data resources
The research flowchart of this study is shown in Fig. 1.
Transcriptome data of breast cancer tissue samples were downloaded from TCGA (https://cancergenome.nih.gov); all samples contained complete high-throughput sequencing data. The lncRNA expression matrix was extracted with a Perl script. The clinical data of the same patients were downloaded as well, including age, TNM stage, tumor stage, overall survival time, and survival status. Patients with unclear clinical stage or unclear prognostic information were excluded. Metabolism-related gene sets were obtained from Gene Set Enrichment Analysis (GSEA, https://www.gsea-msigdb.org/gsea/index.jsp).

Fig. 1. Flowchart of this study

Differential expression analysis
The “limma” package in R software was used to perform differential analysis and to obtain the metabolism-related genes and lncRNAs that were differentially expressed between breast cancer tissues and adjacent tissues in TCGA. |log2FC| > 1 and false discovery rate (FDR) < 0.05 were set as the screening thresholds. Pearson correlation analysis was used to calculate the correlation coefficient (R^2) between lncRNAs and metabolism-related genes. lncRNAs with R^2 > 0.3 and P < 0.05 were defined as metabolism-related lncRNAs.

Construction of a prognosis model
Patients with breast cancer were divided into a training set and a validation set at a ratio of 2:1. Univariate Cox regression analysis of the metabolism-related differential lncRNAs was performed on the training set to obtain lncRNAs related to patient survival, and Least absolute shrinkage and selection operator (LASSO) regression analysis was then used to further determine the key prognosis-related metabolic lncRNAs. Multivariate Cox regression was applied to generate risk coefficients and construct a risk model. According to the model, the risk score of each patient was calculated.
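
The risk score calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the coefficient values, lncRNA names and patient identifiers are hypothetical, and the median cutoff follows the grouping rule reported in the Results. Assigning a patient whose score equals the median to the low-risk group is an assumption made here.

```python
from statistics import median


def risk_score(coefficients, expression):
    """Sum of (Cox coefficient * expression level) over the lncRNAs in the model."""
    return sum(coef * expression[name] for name, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    Assumption: a score exactly equal to the median is labelled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"lncRNA_A": 0.42, "lncRNA_B": -0.17}
patients = {
    "P1": {"lncRNA_A": 2.0, "lncRNA_B": 1.0},
    "P2": {"lncRNA_A": 0.5, "lncRNA_B": 3.0},
    "P3": {"lncRNA_A": 1.0, "lncRNA_B": 1.0},
}

scores = {pid: risk_score(coefficients, expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

A positive coefficient raises the score as expression increases, while a negative coefficient lowers it, so the sign of each coefficient indicates whether the lncRNA acts as a risk or protective factor in the model.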
The patients were then divided into high-risk and low-risk groups on the basis of their scores, and survival curves were used to compare the prognosis of the two groups. The area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve was used to evaluate the predictive performance of the model. The above analyses were repeated in the validation set.

Clinical subgroup analysis
Using the “survival” package in R software, subgroup analysis of the lncRNA risk model was carried out according to the clinical characteristics of patients (age, TNBC, TNM stage, clinical stage), to clarify the ability of the risk model to distinguish between high-risk and low-risk patients in different subgroups.

Construction and evaluation of the nomogram
Univariate and Multivariate Cox regression were used to analyze risk scores and clinical factors, and independent prognostic factors were screened out. We built a nomogram based on the results of the Multivariate Cox regression, including risk scores. The 3 year and 5 year Overall Survival (OS) of each patient was predicted with the nomogram. The C index, clinical decision curve, calibration curve and ROC curve were obtained to evaluate the predictive performance of the model. All of the above results were verified in the validation set to ensure stability.

GSEA enrichment analysis
KEGG analysis was performed on the selected metabolism-related lncRNAs with GSEA software to identify functionally enriched genes and classify gene clusters. The purpose of this analysis was to identify specific pathways that are significantly enriched in the selected lncRNAs and to uncover underlying biological processes that may be associated with metabolism. P < 0.05 was considered statistically significant. This approach provides insight into the potential roles of metabolism-related lncRNAs and their involvement in various cellular functions.
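
To make the enrichment step more concrete, the running-sum statistic at the core of GSEA can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the unweighted (Kolmogorov-Smirnov-like) form of the enrichment score and omits the weighting and permutation testing that the GSEA software performs. The function name and gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    Walking down the ranked list, the running sum increases by 1/n_hits for
    each gene in the set and decreases by 1/n_miss for each gene outside it.
    The score is the running-sum value with the largest absolute deviation
    from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (most up-regulated first) and pathway gene set.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
pathway = {"g1", "g2", "g4"}
es = enrichment_score(ranked, pathway)
print(es)
```

A positive score means the pathway genes cluster toward the top of the ranking and a negative score means they cluster toward the bottom; in practice, significance is assessed by comparing the observed score with scores from permuted data.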

Analysis of the correlation between prognostic lncRNAs and tumor immunity
The infiltration level of 22 immune cell types in patients with breast cancer was analyzed with Cell type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT). Correlation analysis was then used to clarify the relationship between metabolism-related lncRNAs and immune cell infiltration, in order to explore the mechanism by which prognostic lncRNAs affect breast cancer progression.

Statistical analysis
The analysis in this study was performed with R software version 4.0.4, in which the “limma” package was used to obtain differentially expressed genes, clusterProfiler and org.Hs.eg.db were used for functional enrichment analysis, and the “survival” package was used for Kaplan–Meier survival analysis. We also used R software to perform Univariate and Multivariate Cox regression, LASSO regression and ROC curve analysis, and evaluated the value of the obtained risk factors in predicting the prognosis of patients with breast cancer. All statistical tests were two-tailed, and P < 0.05 was considered significant.

Results

Metabolism-related lncRNAs obtained in differential expression analysis and correlation analysis
In order to identify the metabolism-related lncRNAs related to the prognosis of breast cancer, we first performed differential analysis on the expression data from 1226 cases of breast cancer tissues and normal breast tissues downloaded from the TCGA database. After eliminating samples with data defects, 896 patients remained. Metabolism-related genes were obtained from GSEA, and differential expression analysis identified a total of 214 metabolism-related genes that were differentially expressed between breast cancer and adjacent tissues, of which 130 genes were down-regulated and 84 genes were up-regulated (Fig. 2A, Supplementary Table 1).
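
The screening rule used for the differential expression analysis (|log2FC| > 1 and FDR < 0.05) can be illustrated with the short sketch below. It assumes that the FDR is obtained with the Benjamini-Hochberg adjustment, which is the default multiple-testing correction in limma; the paper does not state the adjustment method. The gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(genes, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Label genes passing both thresholds as 'up' or 'down'."""
    fdr = benjamini_hochberg([p for _, _, p in genes])
    result = {}
    for (name, log2fc, _), q in zip(genes, fdr):
        if abs(log2fc) > lfc_cutoff and q < fdr_cutoff:
            result[name] = "up" if log2fc > 0 else "down"
    return result


# Hypothetical (name, log2FC, raw p-value) triples, for illustration only.
genes = [("G1", 2.3, 0.001), ("G2", -1.8, 0.004), ("G3", 0.4, 0.03), ("G4", 1.5, 0.5)]
fdr_values = benjamini_hochberg([p for _, _, p in genes])
selected = classify(genes)
print(fdr_values)
print(selected)
```

In this toy input, G3 passes the FDR threshold but not the fold-change threshold, and G4 passes the fold-change threshold but not the FDR threshold, so only G1 and G2 are retained.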
A total of 2135 lncRNAs that were differentially expressed between breast cancer and adjacent tissues were obtained by differential expression analysis, of which 1061 lncRNAs were down-regulated and 1074 lncRNAs were up-regulated (Fig. 2B, Supplementary Table 2). Then, Pearson correlation analysis was performed on 216 differentially expressed metabolism-related genes and 1023 differentially expressed lncRNAs with |R^2| > 0.5 and P < 0.05 set as the thresholds, and 231 metabolism-related lncRNAs were obtained (Supplementary Table 3 and Supplementary Table 4).

Fig. 2. Volcano plots of the differentially expressed metabolism-related genes and lncRNAs in breast cancer tissues and adjacent tissues. Significantly up-regulated genes are highlighted in red and significantly down-regulated genes in green; genes with no significant differential expression are shown in black. A Differentially expressed metabolism-related genes. B Differentially expressed lncRNAs

Construction of the prognostic risk prediction model
In the training set, Univariate Cox regression analysis was conducted to evaluate the association between the selected metabolism-related lncRNAs and the survival outcomes of the study participants.
The expression level of each lncRNA was tested independently in relation to patient survival, and a total of 21 lncRNAs related to prognosis were found, namely LINC01614 (P = 0.001), AC021087.3 (P < 0.001), MIR193BHG (P = 0.016), SNHG26 (P = 0.041), LINC00667 (P = 0.045), AC092142.1 (P = 0.015), AC019131.2 (P = 0.017), AC061992.1 (P = 0.050), ST8SIA6-AS1 (P = 0.028), AL513283.1 (P = 0.026), ACP11503.2 (P = 0.026), LINC01615 (P = 0.048), AL445524.1 (P = 0.035), AP000851.2 (P = 0.045), AC108134.3 (P = 0.033), AC009119.1 (P = 0.017), AC036108.3 (P = 0.016), LINC00578 (P = 0.049), U62317.4 (P = 0.008) and AP003306.1 (P = 0.040) (Fig. 3A). These 21 lncRNAs were included in the LASSO regression analysis, which retained 19 prognostic metabolism-related lncRNAs (Fig. 3B, C). Subsequently, Multivariate Cox regression analysis was used to construct a prognostic risk prediction model (Risk score = $\sum_{i=1}^{n} (\mathrm{coef}_i \times \mathrm{exp}_i)$, where $\mathrm{coef}_i$ is the regression coefficient and $\mathrm{exp}_i$ the expression level of the $i$-th lncRNA, and $n$ is the number of lncRNAs in the model) (Supplementary Table 5). The risk score of each patient was calculated, and the patients were divided into a high-risk group (n = 298) and a low-risk group (n = 299) according to the median risk score. We used the same method to evaluate the prognostic performance of the risk scoring model in both the training set and the validation set (Supplementary Table 6). In the training set, the survival curve showed that the OS of the low-risk group was significantly higher than that of the high-risk group (P < 0.001, Fig. 4A). The scatter plot and risk curve showed that the risk coefficient and mortality of the low-risk group were lower than those of the high-risk group (Fig. 4B, C). The AUC of the ROC curve was 0.761 (Fig. 4D). Similar results were observed in the validation set (P < 0.001, Fig. 4E, F, G). The AUC of the ROC curve in the validation set was 0.704 (Fig.
4H), indicating that the model's predictions were reasonably reliable.

Fig. 3. Identification of metabolism-related lncRNAs with prognostic value for breast cancer. A The Univariate Cox regression forest plot showed that 21 metabolism-related lncRNAs were significantly associated with the OS of breast cancer. B The adjustment parameters of the LASSO regression model. C LASSO coefficient profiles of the prognosis-related lncRNAs

Fig. 4. Construction of the risk scoring model. A, E Kaplan–Meier survival analysis of patients with breast cancer in the training set and validation set showed that the prognosis of the high-risk group was significantly worse than that of the low-risk group. B, F The survival rate and survival status of patients with breast cancer in the training set and the validation set. C, G The distribution of risk scores of patients with breast cancer in the training set and the validation set. D, H The ROC curve of the OS of patients with breast cancer in the training set and the validation set

Subgroup survival analysis
In order to further evaluate the independent predictive ability of the risk score model, we stratified patients with breast cancer by common clinical prognostic factors (age, TNM stage, clinical stage, TNBC). Kaplan–Meier survival analysis showed significant differences in OS between the high-risk group and the low-risk group in multiple subgroups, with a lower OS rate in the high-risk group. The log-rank results were as follows: in the subgroups of older than 65 years, younger than 65 years, non-triple negative breast cancer (N-TNBC), T1-2, T3-4, N0, N1-3, M0, stage I-II and stage III-IV, the OS of the low-risk group was significantly higher than that of the high-risk group (P < 0.05).
However, in the triple negative breast cancer and M1 subgroups, there was no statistically significant difference in OS between the high-risk and low-risk groups (Fig. 5). These stratified survival analysis results indicate that the risk score model retained its predictive ability in most clinical subgroups, supporting its utility as an independent prognostic tool.

Fig. 5. Survival curves of the high-risk and low-risk groups of patients with breast cancer in each clinical subgroup. A, B Age. C, D Triple-negative breast cancer status. E, F T stage. G, H N stage. I, J M stage. K, L Clinical stage

Identification of independent prognostic factors
In the Univariate Cox regression analysis, we found that age, T, N, M stage, and risk score were related to the prognosis of patients (P < 0.05) (Fig. 6A). The results of Multivariate Cox regression further showed that only age, M stage, clinical stage and risk score were closely related to prognosis and were considered independent prognostic factors for patients with breast cancer (Fig. 6B).

Fig. 6. The prognostic value of clinicopathological characteristics and risk scores. A Univariate Cox regression analysis of patients with breast cancer. B Multivariate Cox regression analysis of patients with breast cancer

Construction of a nomogram
In order to provide better guidance for clinical decision-making, we used the “rms” package in the total samples to integrate the above 4 independent prognostic factors (age, clinical stage, M stage, and risk score), and constructed an individualized nomogram for predicting the prognosis of breast cancer (Fig. 7A). As the figure shows, the variable that contributed the most to the prognosis prediction was the risk score, followed by clinical stage and M stage, and the variable that contributed the least was age.
The C index of the nomogram was 0.800, and the AUCs for 3 year and 5 year OS were 0.757 and 0.709, respectively (Fig. 7B). The calibration curve fitted well with the 45° dashed line, indicating that the survival probability predicted by the nomogram was consistent with the actual survival probability calculated by Kaplan–Meier (Fig. 7D). The results of the clinical decision curve analysis (DCA) (Fig. 7F) showed that the model has good predictive power and can benefit patients. All results were validated in the validation set, supporting the reliability and applicability of the nomogram (Fig. 7C, E, G).

Fig. 7. Construction and evaluation of the nomogram. A The nomogram for predicting 3 year and 5 year OS in the training set. B, C ROC curves of the training set and the validation set. D, E Calibration curves of the nomogram in the training set and the validation set. F, G DCA clinical decision curves of the training set and the validation set

Pathway enrichment analysis
In order to explore the role of the above-mentioned prognosis-related lncRNAs in the occurrence and development of breast cancer, we performed KEGG pathway enrichment analysis on the differentially expressed metabolism-related lncRNAs. The results showed that the differential lncRNAs were mainly involved in the following pathways: DNA replication, JAK-STAT signaling pathway, MAPK signaling pathway, mTOR signaling pathway, Notch signaling pathway, oxidative phosphorylation, pentose phosphate pathway and proteasome pathway (Fig. 8).

Fig. 8. GSEA pathway enrichment analysis of metabolism-related lncRNAs

Analysis of the correlation between prognostic lncRNAs and infiltration of immune cells
Finally, the infiltration level of 22 immune cell types in patients with breast cancer was determined by CIBERSORT analysis.
The results showed that the lncRNAs used to construct the model had a strong correlation with B cells, CD8+ T cells and the polarization of M0 macrophages (Fig. 9), suggesting that these lncRNAs may play a pivotal role in modulating tumor growth and progression by affecting the tumor immune infiltration microenvironment.

Fig. 9. The correlation between the proportion of immune cells based on the CIBERSORT algorithm and the expression of the 19 lncRNAs used in the risk scoring model

Discussion

With the progressive advancement in breast cancer research, it has become increasingly evident that breast cancer is a highly genetically heterogeneous malignant tumor, and general clinicopathological evaluations cannot accurately predict the survival of patients with breast cancer [5]. Therefore, revealing the internal mechanism of the malignant transformation of breast cancer at the molecular level, and finding new tumor markers for early diagnosis, risk assessment and prognosis assessment of breast cancer, are crucial for enhancing clinical management and improving patient prognosis. Extensive studies have shown that epigenetics plays an important role in the occurrence, development and prognosis of diseases [14–16]. Among the key epigenetic regulators, the aberrant transcription of lncRNAs is recognized as one of the most common transcriptional changes in malignant tumors [9]. This type of non-coding RNA, which was originally regarded as “junk sequence” of the genome, has in recent years been found to be involved in the malignant progression of breast cancer through epigenetics, gene transcription, and post-translational modification [17, 18]. In addition, the specific expression patterns of lncRNAs in breast cancer suggest their potential as promising tumor biomarkers with significant clinical utility [19–21].
Researchers have found that lncRNA HOTAIR [22], lncRNA NEAT1 [21], OSTN-AS1 [23], MEG3 [24] and other metabolic lncRNAs are significantly related to postoperative metastasis, recurrence and overall survival of patients with breast cancer. However, the predictive ability of a single lncRNA is relatively limited. If the lncRNAs closely related to the prognosis of breast cancer can be screened from the transcriptome and integrated, the specificity and sensitivity of prognosis prediction will be greatly improved. In this study, using the transcriptome sequencing data and clinical prognosis information of patients with breast cancer in TCGA, a comprehensive analysis of 19 metabolic lncRNAs related to the prognosis of patients with breast cancer was carried out, and a prognostic model was constructed to better predict their survival and prognosis. Notably, the prognostic risk score generated by this model was confirmed to be an independent risk factor for the prognosis of patients with breast cancer.

Previous studies have shown that LINC01614 is highly expressed not only in breast cancer tissues overall, but also in multiple molecular subgroups such as ER+, PR+ and HER2+. It is also positively correlated with TGF-β1 signaling, CDH1 signaling, cell adhesion signaling and other gene sets [25]. MIR193BHG has been reported as a biomarker for ovarian cancer and lung adenocarcinoma [26, 27]. SNHG26 is considered to play a key role in the development of bladder cancer and rectal cancer [27, 28]. Wu et al. reported that LINC00667 can promote the growth of lung cancer cells via the LINC00667/miR-143-3p/RRM2 signaling pathway [29]. Bioinformatics analysis revealed that AC061992.1, LINC01615, AL445524.1, AP000851.2, LINC00578 and U62317.4 were related to tumor prognosis [30–34].
ST8SIA6-AS1 was reported to be highly expressed in liver cancer and to regulate HOXB6 by sponging miR-5195-3p, thereby promoting liver cancer [35]. However, AC021087.3, AC092142.1, AC019131.2, AL513283.1, ACP11503.2, AC108134.3, AC009119.1, AC036108.3 and AP003306.1 have not yet been studied, and their mechanisms in breast cancer remain unclear and need further study.

Through the pathway enrichment analysis of the differentially expressed metabolism-related lncRNAs, we found that they were mainly related to DNA replication, JAK-STAT signaling pathway, MAPK signaling pathway, mTOR signaling pathway, Notch signaling pathway, oxidative phosphorylation, pentose phosphate pathway, and proteasome pathway. Studies have shown that the cancer-promoting lncRNA SNHG26 can promote resistance to the breast cancer drug adriamycin by activating the downstream JAK-STAT signaling pathway through a feedback signal loop and RPA1 activation, thereby affecting sensitivity to adriamycin. In addition, lncRNA GAS5 has been found to have low expression in breast cancer tissues; it inhibits cell proliferation and induces apoptosis by regulating the PI3K/AKT/mTOR signaling pathway in triple-negative breast cancer, thereby inhibiting tumor progression [36].

Aberrant expression of lncRNAs has been shown to affect the occurrence, development and prognosis of a variety of immune system diseases. In order to further reveal the role of prognosis-related lncRNAs in breast cancer, we analyzed the correlation between lncRNAs in the expression matrix of breast cancer and the infiltration of 22 immune cell types based on the CIBERSORT algorithm. The results showed that the prognosis-related lncRNAs were strongly correlated with Tregs, follicular helper T cells, CD8+ T cells, activated memory CD4+ T cells, resting mast cells and the polarization of M2 macrophages. Huang et al.
found that in the microenvironment of breast cancer and lung cancer, tumor-specific cytotoxic T lymphocytes and TH1 cells were more sensitive to activation-induced cell death (AICD) than TH2 cells and regulatory T cells. lncRNA NKILA regulates the sensitivity of T cells to AICD by inhibiting the activity of NF-κB [37]. These results indirectly suggest that these metabolism-related lncRNAs may affect the prognosis of breast cancer by modulating the tumor immune microenvironment. However, the precise regulatory mechanism remains to be studied in depth.

Limitations

Despite the insights provided by our study, several limitations should be acknowledged. First, this is a retrospective study using data from the TCGA database, which lacked critical clinical information, such as smoking history and treatment regimens, potentially limiting the comprehensiveness of our findings. Second, this study relies only on database data without experimental verification, and further research is needed in the future.

Conclusion

In this study, we systematically screened and established a risk scoring model composed of 19 metabolism-related lncRNAs that were closely related to the prognosis of breast cancer. In addition, pathway enrichment analysis found that prognosis-related lncRNAs may participate in the occurrence and development of breast cancer by affecting DNA replication, oxidative phosphorylation, and the tumor immune microenvironment. These findings provide novel insights and potential therapeutic targets for improving breast cancer treatment strategies.

Supplementary Information

12672_2025_2178_MOESM1_ESM.xls (21KB, xls) Additional file 1. Differentially expressed metabolism-related genes between cancer and paracancer
12672_2025_2178_MOESM2_ESM.xls (223.2KB, xls) Additional file 2. Differentially expressed lncRNAs between cancer and paracancer
12672_2025_2178_MOESM3_ESM.xlsx (63.1KB, xlsx) Additional file 3.
Results of Pearson correlation analysis
12672_2025_2178_MOESM4_ESM.xlsx (16.7KB, xlsx) Additional file 4. Metabolism-related lncRNAs obtained by Pearson correlation analysis
12672_2025_2178_MOESM5_ESM.xlsx (11.1KB, xlsx) Additional file 5. The lncRNAs used to construct a prognostic risk prediction model
12672_2025_2178_MOESM6_ESM.docx (18.7KB, docx) Additional file 6. Baseline of training and validation sets

Acknowledgements