Abstract

   Early diagnosis and treatment of bladder cancer are crucial, and since
   inflammation plays a role in all stages of bladder cancer, this study
   aims to develop a model based on inflammation-related genes to
   accurately predict patient prognosis. The data were initially processed
   through differential analysis and prognostic correlation analysis, then
   a least absolute shrinkage and selection operator (LASSO) regression
   model was constructed using the M-cohort, and a nomogram was designed
   to improve the model’s readability. The T-cohort was used for internal
   validation, with the [36]GSE32894 and IMvigor210 cohorts used as
   external data to verify the model’s accuracy. The model’s predictive
   ability was verified for patients of different ages, genders, tumour
   stages, and tumour grades. The [37]GSE3167, [38]GSE13507, and Gene
   Expression Profiling Interactive Analysis (GEPIA) datasets and the
   Human Protein Atlas (HPA) database were used to verify the expression
   of the inflammation-related genes, which were confirmed by real-time
   Polymerase Chain Reaction (PCR). A comprehensive analysis of the
   model’s inflammation-related genes, Gene Set Enrichment Analysis
   (GSEA), Gene Set Variation Analysis (GSVA) enrichment analysis, and
   immune-related analysis were also performed. Both internal and external
   data validations confirmed that the developed model can accurately
   predict the prognosis across different patient populations.
   Hierarchical validation results demonstrated that the model’s
   predictive power is reliable for various patient stratifications. The
   expression of inflammation-related genes was consistent across The
   Cancer Genome Atlas (TCGA) database, [39]GSE3167 dataset, [40]GSE13507
   dataset, Gene Expression Profiling Interactive Analysis (GEPIA)
   database, and the Human Protein Atlas (HPA) database, and was further
   validated by real-time Polymerase Chain Reaction (PCR). Pathway
   enrichment analysis indicated that the high-risk (H-risk) group was
   enriched in pathways associated with a variety of tumours. Patients in
   the low-risk (L-risk) group may be candidates for immunotherapy,
   whereas those in the H-risk group are more likely to benefit from
   chemotherapy. The model of inflammation-related genes can accurately
   predict bladder cancer patient prognosis, with MEST, FASN, KRT6B, and
   RGS2 anticipated to become new prognostic bladder cancer markers.

   Keywords: Bladder cancer, Bioinformatics, PCR, Inflammation, Prognostic
   model

   Subject terms: Cancer, Computational biology and bioinformatics

Introduction

   Bladder cancer is a common tumour of the urinary system^[41]1,[42]2 and
   is ranked among the top ten cancers globally^[43]3. It is a distinctive
   tumour with a higher incidence in men in developed regions^[44]4, and
   the number of new cases is increasing worldwide every year^[45]5,[46]6.
   Although most bladder cancers are non-muscle-invasive, the prevalence
   of bladder cancer has persisted because of its high recurrence
   rate^[47]7. Once muscular invasion occurs, bladder cancer easily
   metastasizes and has a poor prognosis^[48]8. At present, early bladder
   cancer can only be detected with invasive tests, but the development
   of bioinformatics and prognostic models is advantageous for predicting
   the prognosis of bladder cancer patients in the early stage of the
   disease^[49]9. Approximately 25% of bladder cancer patients require
   systemic chemotherapy and/or immunotherapy, curative therapy, or
   palliative care^[50]10. There is therefore an urgent need to develop a
   model that predicts patient prognosis to enable early intervention.

   The tumour microenvironment comprises numerous stromal tissues,
   inflammatory cells, and inflammatory mediators^[51]11. Inflammation
   can induce tumorigenesis, and tumour sites in turn initiate an
   inflammatory response that accelerates tumour
   progression^[52]11,[53]12. For example, Helicobacter pylori infection
   is closely related to stomach cancer, schistosomiasis often leads to
   bladder cancer, and colitis may eventually lead to colon
   cancer^[54]13,[55]14. Anti-inflammatory therapy can reduce tumour
   mortality^[56]15, and in recent years the relationship between chronic
   inflammation and tumours has received widespread attention^[57]16.
   Related studies found that JAK2/STAT3 pathway regulation can affect
   immune evasion in hepatocellular carcinoma^[58]17 and that long
   noncoding RNAs associated with inflammation aid in the typing of
   bladder cancer^[59]18. The role of fibroblasts in anti-tumour immunity
   is also associated with inflammation. The inflammatory environment
   before surgery for bladder cancer is closely related to the prognosis
   after surgery^[60]19, and many studies on the inflammatory
   characteristics of bladder cancer show that bladder cancer is
   inextricably linked to inflammation^[61]12,[62]18.

   There is no doubt that tumor development is closely linked to
   inflammation. This study aimed to develop a model based on
   inflammation-related genes to accurately predict the prognosis of
   bladder cancer patients. To enhance the model’s interpretability, a
   nomogram was constructed, and its reliability was validated using
   multiple datasets. Furthermore, the expression of inflammation-related
   genes was confirmed. A comprehensive analysis of these genes identified
   novel biomarkers for bladder cancer.

   We present this article in accordance with the TRIPOD reporting
   checklist.

Methods

Data acquisition and preliminary analysis

   The study process is illustrated in Fig. [63]1. The study was conducted
   in accordance with the Declaration of Helsinki (as revised in 2013).
   Inflammation-related genes used in this study were sourced from the
   GeneCards database ([64]https://www.genecards.org/). Transcriptome data
   from 412 bladder cancer patients and 19 normal samples were obtained
   from The Cancer Genome Atlas (TCGA)
   ([65]https://portal.gdc.cancer.gov/) database to identify a gene
   signature in bladder cancer patients. Additionally, copy number data
   and mutation data necessary for the analysis were acquired from the
   TCGA database.

Fig. 1.

   Flowchart of this study. HPA, Human Protein Atlas; TCGA, The Cancer
   Genome Atlas; GEO, Gene Expression Omnibus; GSE, Gene Series; GEPIA,
   Gene Expression Profiling Interactive Analysis; M-cohort, Model cohort;
   T-cohort, Test cohort; LASSO, Least absolute shrinkage and selection
   operator; PCR, Polymerase Chain Reaction.

   IMvigor210 is an open-label, multicenter, single-arm phase II clinical
   study, and the IMvigor210 cohort was used as external data to verify
   the model’s accuracy. Furthermore, the Gene Expression Omnibus (GEO)
   ([68]https://www.ncbi.nlm.nih.gov/geo/) database was searched using
   the term ‘bladder cancer’, and datasets with complete survival
   information and a sample size of at least 30 were selected for model
   construction. The [69]GSE32894 dataset from the GEO database, which
   includes 308 bladder cancer samples, 224 of which have complete
   follow-up information, was used for external validation. Any data with
   incomplete survival information were excluded from all cohorts.

   The [70]GSE3167 and [71]GSE13507 datasets were also obtained from the
   GEO database and used to validate differential gene expression
   patterns. The [72]GSE3167 dataset consisted of 60 samples, including 46
   tumor samples and 14 normal samples, while the [73]GSE13507 dataset
   comprised 68 normal samples and 188 tumor samples. Bladder
   cancer-related data from the Gene Expression Profiling Interactive
   Analysis (GEPIA) ([74]http://gepia.cancer-pku.cn/) database were used
   to further validate differential gene expression patterns.
   Additionally, protein expression of the inflammation-related genes was
   examined through the Human Protein Atlas (HPA)
   ([75]https://www.proteinatlas.org/) database.

   Preliminary data analysis using R software identified differentially
   expressed inflammation-related genes (cut-offs: logFC = 1, P = 0.05);
   the data were normalized using the sva (Surrogate Variable Analysis)
   package to remove batch effects. In total, 790 differentially
   expressed inflammation-related genes associated with bladder cancer
   prognosis were identified, which were subsequently screened (cut-off:
   P = 0.05) to obtain signatures of inflammation-related genes in
   bladder cancer patients.
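
   The threshold-based screening step can be illustrated with a short
   sketch. This is a Python illustration only, not the R workflow used in
   the study. It assumes that a gene passes when its absolute logFC
   exceeds 1 and its P value is below 0.05; the direction of both
   cut-offs is an assumption, and the gene names and values are
   placeholders rather than study data.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log_fc: float   # log fold change, tumour vs normal
    p_value: float  # P value from the differential test


def filter_degs(stats, log_fc_cutoff=1.0, p_cutoff=0.05):
    """Keep genes with |logFC| > cutoff and P < cutoff.

    The direction of both comparisons is an assumption; the source only
    quotes the cut-off values.
    """
    return [
        s.gene
        for s in stats
        if abs(s.log_fc) > log_fc_cutoff and s.p_value < p_cutoff
    ]


# Placeholder records for illustration only; not data from the study.
example = [
    GeneStat("GENE_A", 2.3, 0.001),   # up-regulated, significant
    GeneStat("GENE_B", -1.8, 0.020),  # down-regulated, significant
    GeneStat("GENE_C", 0.4, 0.001),   # fold change too small
    GeneStat("GENE_D", 1.5, 0.300),   # not significant
]
selected_genes = filter_degs(example)
print(selected_genes)
```

   Taking the absolute value of logFC keeps both up-regulated and
   down-regulated genes, which is the usual convention for this kind of
   filter.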

The LASSO regression model and nomogram construction

   The caret package was used to randomly divide the TCGA cohort into two
   groups: the M-cohort (Model cohort) and the T-cohort (Test cohort). The
   M-cohort was utilized to build the LASSO regression model, which offers
   advantages over traditional linear regression models due to its
   variable selection and shrinkage capabilities. The model was optimized
   by shrinking the coefficients of less important variables towards
   zero, and cross-validation was performed to select the most
   appropriate penalty parameter (lambda). This optimal lambda was used
   to fit the final model, which assigned a risk score to each
   standardized sample.
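
   The two ideas in this paragraph, shrinkage with selection and a
   per-sample score, can be sketched in a few lines of Python. The sketch
   relies on a simplification: for an orthonormal design, the LASSO
   estimate equals the soft-thresholded least-squares estimate. It does
   not reproduce the fitting or the cross-validated choice of lambda used
   in the study. The gene names, coefficients, lambda value, and
   expression values are hypothetical, and the score is assumed to be the
   usual linear combination of coefficients and expression values.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def risk_score(coefficients, expression):
    """Linear score: sum of coefficient * expression over the retained genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


# Hypothetical unpenalized estimates; not coefficients from the study.
raw = {"GENE_A": 0.42, "GENE_B": -0.05, "GENE_C": -0.30}
lam = 0.10  # illustrative penalty; the study selected lambda by cross-validation

shrunk = {gene: soft_threshold(beta, lam) for gene, beta in raw.items()}
retained = {gene: beta for gene, beta in shrunk.items() if beta != 0.0}

# Hypothetical standardized expression values for one sample.
sample = {"GENE_A": 1.2, "GENE_B": 0.7, "GENE_C": -0.5}
score = risk_score(retained, sample)
print(retained, score)
```

   The small coefficient of GENE_B is set exactly to zero, which is how
   the penalty performs variable selection, while the larger coefficients
   are kept but reduced in magnitude.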

   All samples were divided into two groups based on the median M-cohort
   sample score: the H-risk (high risk) group, with scores above the
   median, and the L-risk (low risk) group, with scores below the median.
   A nomogram was constructed by combining model scores with common
   clinical indicators to enhance the interpretability of the model’s
   predictions. Principal Component Analysis (PCA) was performed to assess
   whether the model could effectively distinguish between patients in the
   H-risk and L-risk groups.
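
   The median split described above can be sketched as follows. The
   sample identifiers and scores are placeholders. The source does not
   state how a score exactly equal to the median is handled, so assigning
   it to the L-risk group here is an assumption.

```python
from statistics import median


def assign_risk_groups(scores, cutoff):
    """Label a sample H-risk if its score exceeds the cutoff, else L-risk.

    Scores equal to the cutoff go to L-risk; tie handling is an assumption.
    """
    return {
        sample: ("H-risk" if value > cutoff else "L-risk")
        for sample, value in scores.items()
    }


# Placeholder scores for illustration only; not data from the study.
m_cohort = {"M1": 0.2, "M2": 1.4, "M3": 0.9, "M4": 2.1}
t_cohort = {"T1": 0.5, "T2": 1.8}

cutoff = median(m_cohort.values())               # derived from the M-cohort only
m_groups = assign_risk_groups(m_cohort, cutoff)
t_groups = assign_risk_groups(t_cohort, cutoff)  # same cutoff reused elsewhere
print(cutoff, m_groups, t_groups)
```

   Deriving the cutoff from the M-cohort alone and reusing it for other
   cohorts keeps the validation samples out of the threshold definition.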

   Finally, univariate and multivariate independent prognostic analyses
   were conducted to determine whether the risk scores and nomogram could
   serve as independent prognostic indicators, separate from other
   factors. The model’s predictive performance was validated using
   survival analysis, the area under the Receiver Operating Characteristic
   (ROC) curve, and survival status distribution plots of M-cohort
   patients.
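
   The source does not name the survival estimator. Assuming the
   survival analysis refers to Kaplan-Meier curves compared between risk
   groups, which is a common choice, the estimator can be sketched as
   below. The follow-up times and event indicators are placeholders, and
   the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Placeholder follow-up data for illustration only.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

   Censored patients contribute to the number at risk until their last
   follow-up but never lower the survival estimate themselves.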

Verification of the model accuracy with TCGA internal data

   The T-cohort samples were used for internal validation of the model.
   These samples were divided into H-risk and L-risk groups using the
   median M-cohort sample score, and survival analysis and related
   analyses were performed to verify the model’s ability to predict
   patient prognosis. The accuracy of the nomogram was assessed by
   comparing the predicted results with the actual outcomes. The Area
   Under the Curve (AUC) was also used to evaluate the prediction
   accuracy of the nomogram.
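
   As a reminder of what the AUC measures, the sketch below computes it
   as the probability that a randomly chosen patient with the event
   receives a higher score than a randomly chosen patient without it.
   This is a simplified illustration: it treats the outcome as binary at
   a fixed time point and ignores censoring, which a time-dependent ROC
   analysis of survival data would need to handle. The labels and scores
   are placeholders.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Placeholder data: 1 = event by the chosen time point, 0 = no event.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

   A value of 0.5 corresponds to random ranking and 1.0 to perfect
   separation of the two outcome groups.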

   Decision Curve Analysis (DCA) is a widely used method for assessing
   clinical predictive models and offers advantages over the area under
   the Receiver Operating Characteristic (ROC) curve. DCA can incorporate
   the preferences of decision-makers into the analysis, making it a more