Abstract

Early diagnosis and treatment of bladder cancer are crucial. Because inflammation plays a role in all stages of bladder cancer, this study aims to develop a model based on inflammation-related genes to accurately predict patient prognosis. The data were first processed through differential analysis and prognostic correlation analysis; a least absolute shrinkage and selection operator (LASSO) regression model was then constructed in the M-cohort, and a nomogram was designed to improve the model’s readability. The T-cohort was used for internal validation, and the GSE32894 and IMvigor210 cohorts were used as external data to verify the model’s accuracy. The model’s predictive ability was verified for patients of different ages, genders, tumour stages, and tumour grades. The GSE3167 and GSE13507 datasets, the Gene Expression Profiling Interactive Analysis (GEPIA) database, and the Human Protein Atlas (HPA) database were used to verify the expression of the inflammation-related genes, which was confirmed by real-time polymerase chain reaction (PCR). A comprehensive analysis of the model’s inflammation-related genes, Gene Set Enrichment Analysis (GSEA), Gene Set Variation Analysis (GSVA), and immune-related analysis were also performed. Both internal and external validation confirmed that the developed model can accurately predict prognosis across different patient populations. Hierarchical validation results demonstrated that the model’s predictive power is reliable across patient stratifications. The expression of the inflammation-related genes was consistent across The Cancer Genome Atlas (TCGA) database, the GSE3167 and GSE13507 datasets, the GEPIA database, and the HPA database, and was further validated by real-time PCR.
Pathway enrichment analysis indicated that a variety of tumour-related pathways were enriched in the high-risk (H-risk) group. Patients in the low-risk (L-risk) group may be candidates for immunotherapy, whereas those in the H-risk group are more likely to benefit from chemotherapy. The model of inflammation-related genes can accurately predict bladder cancer patient prognosis, with MEST, FASN, KRT6B, and RGS2 anticipated to become new prognostic markers for bladder cancer.

Keywords: Bladder cancer, Bioinformatics, PCR, Inflammation, Prognostic model

Subject terms: Cancer, Computational biology and bioinformatics

Introduction

Bladder cancer is a common tumour of the urinary system^1,2 and is ranked among the top ten cancers globally^3. It is a distinctive tumour with a higher incidence in men in developed regions^4, and the number of new cases is increasing worldwide each year^5,6. Although most bladder cancers are non-muscle-invasive, the prevalence of bladder cancer has remained persistent because of its high recurrence rate^7. Once muscular invasion occurs, bladder cancer metastasizes easily and has a poor prognosis^8. At present, early bladder cancer can only be detected with invasive tests, but the development of bioinformatics and prognostic models is advantageous for predicting the prognosis of bladder cancer patients in the early stage of the disease^9. Approximately 25% of bladder cancer patients require systemic chemotherapy and/or immunotherapy, curative therapy, or palliative care^10; there is therefore an urgent need to develop a model that predicts patient prognosis to enable early intervention. The tumour microenvironment comprises numerous stromal tissues, inflammatory cells, and inflammatory mediators^11. Inflammation can induce tumorigenesis, and tumour sites initiate an inflammatory response that accelerates tumour progression^11,12.
For example, Helicobacter pylori infection is closely related to stomach cancer, schistosomiasis often leads to bladder cancer, and colitis may eventually lead to colon cancer^13,14. Anti-inflammatory therapy can reduce tumour mortality^15, and in recent years the relationship between chronic inflammation and tumours has received widespread attention^16. Related studies found that regulation of the JAK2/STAT3 pathway can affect immune evasion in hepatocellular carcinoma^17 and that long noncoding RNAs associated with inflammation aid in the typing of bladder cancer^18. Fibroblasts in anti-tumour immunity are also associated with inflammation. The inflammatory environment before surgery for bladder cancer is closely related to the prognosis after surgery^19, and many studies on the inflammatory characteristics of bladder cancer have shown that bladder cancer is inextricably linked to inflammation^12,18. Tumour development is therefore closely linked to inflammation.

This study aimed to develop a model based on inflammation-related genes to accurately predict the prognosis of bladder cancer patients. To enhance the model’s interpretability, a nomogram was constructed, and its reliability was validated using multiple datasets. Furthermore, the expression of the inflammation-related genes was confirmed. A comprehensive analysis of these genes identified novel biomarkers for bladder cancer. We present this article in accordance with the TRIPOD reporting checklist.

Methods

Data acquisition and preliminary analysis

The study process is illustrated in Fig. 1. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Inflammation-related genes used in this study were sourced from the GeneCards database (https://www.genecards.org/).
Transcriptome data from 412 bladder cancer patients and 19 normal samples were obtained from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) database to identify a gene signature in bladder cancer patients. Additionally, the copy number data and mutation data necessary for the analysis were acquired from the TCGA database.

Fig. 1. Flowchart of this study. HPA, Human Protein Atlas; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; GSE, Gene Series; GEPIA, Gene Expression Profiling Interactive Analysis; M-cohort, Model cohort; T-cohort, Test cohort; LASSO, Least absolute shrinkage and selection operator; PCR, Polymerase Chain Reaction.

IMvigor210 is an open-label, multicenter, single-arm phase II clinical study, and the IMvigor210 cohort was used as external data to verify the model’s accuracy. Furthermore, the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) database was searched using ‘bladder cancer’, and datasets with complete survival information and a sample size of at least 30 were selected for model construction. The GSE32894 dataset from the GEO database, which includes 308 bladder cancer samples, 224 of them with complete follow-up information, was used for external validation. Any data with incomplete survival information were excluded from all cohorts. The GSE3167 and GSE13507 datasets were also obtained from the GEO database and used to validate differential gene expression patterns. The GSE3167 dataset consisted of 60 samples (46 tumour samples and 14 normal samples), while the GSE13507 dataset comprised 68 normal samples and 188 tumour samples. Bladder cancer-related data from the Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/) database were used to further validate differential gene expression patterns.
Additionally, protein expression of the inflammation-related genes was examined through the Human Protein Atlas (HPA) (https://www.proteinatlas.org/) database. Preliminary data analysis in R identified differentially expressed inflammation-related genes (thresholds: logFC = 1, P = 0.05), and the data were normalized using the SVA (Surrogate Variable Analysis) software package to remove batch effects. In total, 790 differentially expressed inflammation-related genes associated with bladder cancer prognosis were identified, which were subsequently screened to obtain signatures of inflammation-related genes in bladder cancer patients (threshold: P = 0.05).

The LASSO regression model and nomogram construction

The caret package was used to randomly divide the TCGA cohort into two groups: the M-cohort (Model cohort) and the T-cohort (Test cohort). The M-cohort was used to build the LASSO regression model, which offers advantages over traditional linear regression models because it combines parameter selection with shrinkage. An optimized model was developed by progressively shrinking the coefficients of less important parameters, and cross-validation was then performed to select the most appropriate lambda. This optimal lambda was used to create the final model, which assigned a score to each standardized sample. All samples were divided into two groups based on the median score of the M-cohort samples: the H-risk (high-risk) group, with scores above the median, and the L-risk (low-risk) group, with scores below the median. A nomogram was constructed by combining model scores with common clinical indicators to enhance the interpretability of the model’s predictions. Principal Component Analysis (PCA) was performed to assess whether the model could effectively distinguish between patients in the H-risk and L-risk groups.
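The scoring and median split described above can be illustrated with a minimal sketch. It assumes the risk score is a linear combination of model coefficients and standardized expression values, which is the usual form for a LASSO-based signature but is not spelled out in this section. The gene names are those reported in the abstract; the coefficient values, the expression values, and the function names are hypothetical and chosen only for illustration. Assigning a score exactly equal to the median to the L-risk group is also an assumption.

```python
from statistics import median

# Hypothetical coefficients: the gene names appear in the abstract,
# but these numeric values are invented for illustration only.
COEFFICIENTS = {"MEST": 0.12, "FASN": 0.08, "KRT6B": 0.05, "RGS2": -0.10}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Return the sum of coefficient * standardized expression per gene."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def assign_risk_groups(scores, cutoff):
    """Label scores above the cutoff 'H-risk' and all others 'L-risk'.

    Placing a score equal to the cutoff in 'L-risk' is an assumption.
    """
    return ["H-risk" if score > cutoff else "L-risk" for score in scores]


# Toy standardized expression values (not real patient data).
m_cohort = [
    {"MEST": 1.0, "FASN": 0.5, "KRT6B": 0.2, "RGS2": -0.3},
    {"MEST": -0.8, "FASN": -0.2, "KRT6B": 0.1, "RGS2": 0.9},
    {"MEST": 0.1, "FASN": 0.0, "KRT6B": -0.4, "RGS2": 0.2},
]
t_cohort = [
    {"MEST": 1.5, "FASN": 1.0, "KRT6B": 0.3, "RGS2": -1.0},
    {"MEST": -1.0, "FASN": -0.5, "KRT6B": 0.0, "RGS2": 1.2},
]

m_scores = [risk_score(sample) for sample in m_cohort]
t_scores = [risk_score(sample) for sample in t_cohort]

# The cutoff is computed on the M-cohort only and reused for the T-cohort.
cutoff = median(m_scores)
m_groups = assign_risk_groups(m_scores, cutoff)
t_groups = assign_risk_groups(t_scores, cutoff)
```

Fixing the cutoff on the M-cohort and reusing it unchanged for other cohorts keeps the validation samples from influencing the threshold that is used to classify them.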
Finally, univariate and multivariate independent prognostic analyses were conducted to determine whether the risk scores and nomogram could serve as prognostic indicators independent of other factors. The model’s predictive performance was validated using survival analysis, the area under the Receiver Operating Characteristic (ROC) curve, and survival status distribution plots of M-cohort patients.

Verification of the model accuracy with TCGA internal data

The T-cohort samples were used for internal validation of the model. These samples were divided into two groups based on the median score of the M-cohort samples, and survival analysis and other analyses were performed to verify the model’s ability to predict patient prognosis. The accuracy of the nomogram was assessed by comparing the predicted results with the actual outcomes. The Area Under the Curve (AUC) was also used to evaluate the prediction accuracy of the nomogram. Decision Curve Analysis (DCA) is a widely used method for assessing clinical predictive models and offers unique advantages over the area under the ROC curve. DCA can incorporate the preferences of decision-makers into the analysis, making it a more