Abstract
Background
Bladder cancer (BLCA) is the ninth most common cancer globally, as well
as the fourth most common cancer in men, with an incidence of 7%.
However, few effective prognostic biomarkers or models of BLCA are
available at present.
Methods
The prognostic genes of BLCA were screened from one cohort of The
Cancer Genome Atlas (TCGA) database through univariate Cox regression
analysis and functionally annotated by Kyoto Encyclopedia of Genes and
Genomes (KEGG) pathway enrichment analysis. The intersecting genes of
the BLCA gene set and focal adhesion-related gene were obtained and
subjected to the least absolute shrinkage and selection operator
regression (LASSO) to construct a prognostic model. Gene set enrichment
analysis (GSEA) of high- and low-risk patients was performed to explore
further the biological process related to focal adhesion genes.
Univariate and multivariate Cox analysis, receiver operating
characteristic (ROC) curve analysis, and Kaplan–Meier survival analysis
(KM) were used to evaluate the prognostic model. DNA methylation
analysis was presented to explore the relationship between prognosis
and gene methylation. Furthermore, immune cell infiltration was
assessed by CIBERSORT, ESTIMATE, and TIMER. The model was verified in
an external [33]GSE32894 cohort of the Gene Expression Omnibus (GEO)
database, and the Prognoscan database presented further validation of
genes. The HPA database validated the related protein level, and
functional experiments verified significant risk factors in the model.
Results
VCL, COL6A1, RAC3, PDGFD, JUN, LAMA2, and ITGB6 were used to construct
a prognostic model in the TCGA-BLCA cohort and validated in the
[34]GSE32894 cohort. The 7-gene model successfully stratified the
patients into both cohorts’ high- and low-risk groups. The higher risk
score was associated with a worse prognosis.
Conclusions
The 7-gene prognostic model can classify BLCA patients into high- and
low-risk groups based on the risk score and predict the overall
survival, which may aid clinical decision-making.
Supplementary Information
The online version contains supplementary material available at
10.1186/s12885-022-10264-5.
Keywords: Bladder cancer, Computational biology, Risk signature, Immune
infiltration, Focal adhesion
Background
According to Cancer Statistics 2021, published by American Cancer
Society, bladder cancer (BLCA) is the ninth most commonly diagnosed
cancer globally and is the fourth most common malignancy in men
[[35]1]. Over 570,000 new cases of BLCA and 210,000 deaths were
recorded in 2020 alone, indicating poor prognosis [[36]2]. Men are at
four times the risk of developing BLCA than women [[37]3]. The
significant risk factors of BLCA are advanced age (between 70 and
84 years) and cigarette smoking. In fact, approximately 50% of BLCA
patients are smokers [[38]4]. Furthermore, almost 3/4^th of the
diagnosed cases are non-muscular invasive bladder cancer (NMIBC), often
treated with transurethral resection of bladder tumors (TRUBT) and
intravesical therapy. Muscular invasive bladder cancer (MIBC) is
relatively rare and is generally treated by radical cystectomy and
neoadjuvant chemotherapy [[39]5, [40]6]. Although pathological biopsies
and cystoscopies are routinely used to detect BLCA, these methods are
invasive and inconvenient. Although several urine biomarkers of BLCA
have been confirmed by the US Food and Drug Administration (FDA), they
lack the diagnostic accuracy to replace cystoscopy [[41]4]. Therefore,
this study aimed to identify novel, effective diagnostic biomarkers of
BLCA.
Focal adhesion (FA) is a group of macromolecular proteins that connect
the ends of specialized actin fibers to the extracellular matrix (ECM)
and enable cell migration, which is critical to the process of tumor
metastasis [[42]7]. FAs are frequently downregulated during cancer
metastasis, although some FA components are upregulated in some
invasive tumors [[43]8]. Thus, FAs are increasingly being considered
therapeutic targets of cancer.
Analysis of gene expression datasets from The Cancer Genome Atlas
(TCGA) and Gene Expression Omnibus (GEO) have helped identify
prognostic gene signatures of various cancers. For instance, a
predictive model consisting of hypoxia gene signatures was constructed
for BLCA based on TCGA and GEO databases [[44]9]. In addition, a risk
score model of epithelial-mesenchymal transition (EMT)-related gene
signature was also developed to predict BLCA prognosis based on the two
databases [[45]10]. A recent study established an 11-gene prognostic
signature of BLCA based on five cohorts from TGCA and GEO [[46]11].
However, the prognostic value of FAs has not been ascertained by any
study so far. Therefore, this study aimed to explore the relationship
between FAs and BLCA prognosis using the bioinformatics approach and
establish a predictive model based on the risk score.
Materials and methods
Data source and processing
TCGA and GEO databases were screened for BLCA cohorts with a sample
size > 150, including clinical data such as overall survival duration,
survival status, gender, age, histological grade, pathological stage,
TNM stage, and lymphatic stage metastasis. The GEO cohorts were further
screened based on additional requirements for the verification set. The
gene expression matrix dataset (HTSeq-FPKM) of bladder cancer (n = 430)
was downloaded from TCGA on UCSC Xena, and the clinical data were
obtained from cBioPortal ([47]http://www.cbioportal.org). The external
[48]GSE32894 cohort (n = 308) with expression matrix and clinical data
[[49]12] was acquired from the GEO database.
Selection of FA-related differentially expressed genes (DEGs)
TCGA-BLCA cohort was set as the training set, and prognostically
relevant genes were screened using the univariate Cox analysis with a
p-value < 0.01 as the criterion. R software package “limma” was used to
identify DEGs between the BLCA and normal bladder samples in the same
cohort from these selected genes [[50]13]. The threshold was set as
|log (fold change) |> 1, and the adjusted p-value < 0.01. The
significant DEGs related to BLCA prognosis intersected with 199
FA-related genes obtained from the Molecular Signatures Database
(MSigDB) of GSEA using keywords KEGG_FOCAL_ADHESION [[51]14] using a
Venn diagram.
Establishment of the predicting model related to risk score
A prognostic model was constructed with the intersecting genes
identified as above by LASSO regression using the R software packages
“glmnet” [[52]15] and “survival”. The “CV.glmnet” function can randomly
simulate 1000 times for k-fold cross-validation (k = 10). The dataset
was automatically divided into 10 equal portions in the tenfold
cross-validation. One random part was selected as the validation set,
and the remaining 9 parts as training sets. The deviance of the 10
tests was used to evaluate the accuracy of the tenfold CV, and minimum
deviance indicated the best performance of the model. The regression
coefficients of individual genes were determined, and genes with
regression coefficient approaching 0 with the increase in Lambda were
excluded. The remaining candidate genes were used to construct the
model, and the risk score of each patient in the TGCA-BLCA cohort was
calculated as ∑
[MATH: 7i :MATH]
(
[MATH: mi :MATH]
·
[MATH: ni :MATH]
), where 7 is the number of candidate genes,
[MATH: mi :MATH]
is the gene expression value and
[MATH: ni :MATH]
is regression coefficient.
KEGG enrichment analysis
The latest KEGG Pathway gene annotations were obtained through KEGG
rest API ( [53]https://www.kegg.jp/kegg/rest/keggapi.html) in the KEGG
official database [[54]16]. KEGG pathway enrichment analysis was then
performed using the R software package “clusterProfiler” through an
online tool called Sangerbox ([55]http://www.sangerbox.com/tool). The
threshold for statistical significance were p < 0.05 and FDR of < 0.1.
Gene set enrichment analysis (GSEA)
GSEA software and predefined gene set files were downloaded from
[56]https://www.gsea-msigdb.org, and the samples were divided into
high- and low-risk groups based on the risk score. The number of
permutations was set as 1000.
Univariate and multivariate Cox regression analyses
The R software package "survival" was used for univariate and
multivariate Cox regression analysis of the risk score and
clinicopathological factors, including age, gender, pathological stage,
T stage, histological grade, lymphatic metastasis, and angio-lymphatic
invasion. Only the statistically significant factors in the univariate
Cox analysis were included in the multivariate Cox model.
ROC analysis
The R software package “pROC” was used for receiver operating
characteristic (ROC) analysis. The area under the curve (AUC) was
obtained, and the confidence interval was evaluated. The cut-off values
for 1-, 3- and 5-year overall survival (OS) were calculated.
Kaplan–Meier analysis
The R software package “Survival” was used to integrate the OS rate and
duration with the gene expression data of both TCGA-BLCA and
[57]GSE32894 cohorts. The prognostic significance of each gene was
evaluated by the Cox method. The patients were divided into the high-
and low-risk groups using the cut-off value of 3-year OS. For the
subgroups based on clinical variables and the expression levels of the
7 candidate genes, the best cut-off value for the risk score was
calculated using the R software package “Maxstat”. The minimum sample
size was set at > 25%, and the maximum sample size at < 75%, and the
patients were divided into high- and low-risk groups.
DNA methylation analysis
It is believed that DNA methylation is responsible for influencing
prognosis in cancer development. An online tool MethSurv
([58]https://biit.cs.ut.ee/methsurv/) was used to explore the
prognostic patterns of single CpG methylation of the 7 genes in bladder
cancer [[59]17]. Only the most significant prognostic p-values were
selected (likelihood ratio (LR) test p-value).
Immune environment evaluation
The R packages “CIBERSORT” (used to calculate the cell composition as a
function of gene expression profile) and “ESTIMATE” (used to calculate
the fraction of stromal and immune cells according to gene expression
level) were used to calculate the number of infiltrating immune cells,
immune score, stromal score and tumor purity in each patient from
TCGA-BLCA cohort [[60]18, [61]19]. Twenty-two immune cell genotypes
were obtained by combining CIBERSORT with LM22, a gene matrix
downloaded from the CIBERSORT website
([62]https://cibersort.stanford.edu/), within the R software. The
differences between the risk groups were analyzed, and the immune score
and risk score were combined for survival analysis. Besides, the TIMER
platform ([63]http://timer.cistrome.org/) was also used to verify the
immune infiltration analysis completed by CIBERSORT. The “gene module”
of immune association was presented to evaluate the correlation between
immune cells and every 7 genes in the prognostic model [[64]20].
Verification of the prognostic model
The accuracy of the prognostic model was tested on the external
[65]GSE32894 dataset. Besides, the Prognoscan database
([66]www.prognoscan.org) was also applied to validate further the
correlation between gene expression and overall survival time [[67]21],
where [68]GSE5287 and [69]GSE13507 were utilized. The protein
expression of individual genes in the model in cancer and normal
tissues was also observed in the Human Protein Atlas (HPA) database
([70]http://www.proteinatlas.org/), so as further to validate the genes
in our model [[71]22].
Cell culture and small interfering RNA (siRNA) transfection
The human BLCA cell line T24 was used in this study, purchased from the
Chinese Academy of Sciences cell bank. T24 was cultured in RPMI-1640
medium (Procell) with 10% fetal bovine serum. The sequence of siRNA
targeting COL6A1 and LAMA2 purchased from JTSBIO Co., were listed in
Supplementary Table [72]5.
RNA extraction and quantitative real-time PCR (qRT-PCR)
Extraction of RNA was performed with RNAiso Plus (Takara). Prime Script
RT Master Mix (Takara) was used to reverse transcription then cDNA was
produced. The SYBR kit (Takara) was used to perform qRT-PCR. The
relative expression of the gene was calculated by the 2^−ΔΔCt method.
The primer sequences targeting COL6A1, LAMA2 and GAPDH were listed in
Supplementary Table [73]6.
Wound-healing assay
The BLCA cells were seeded in a six-well plate. When the density
reached more than 90%, a straight line was drawn with a 200-ul tip.
Cultivation of cells was continued with a low-serum medium containing
3% serum. Photographs were taken at 0 h and 48 h, and then the speed of
scratch healing was compared between the different groups.
Transwell assay
Six hundred ul of medium containing 10% serum was added to the lower
chamber of a 24-well plate. Each 200 ul BLCA cell suspension was
inoculated in the upper chamber. Transwell chambers with 8-μm-pore were
used for cell migration assay. Following incubation for 24 h, cells
beneath the membrane were stained with crystal violet, and cells above
the membrane were washed off and imaged by microscopy.
Statistical analysis
All the statistical analysis was completed by software R. The Logrank
test was used to assess the significance of prognostic differences
between different groups in the Kaplan–Meier analysis. The
Kruskal–Wallis rank sum test was used in multiple groups comparisons of
clinical sub-group analysis. Univariate analysis and multivariate
analysis were performed using Cox regression analysis with the R
package “survival”. The R package “limma” was used to identify DEGs
between the tumor and normal samples in the same TCGA-BLCA cohort from
these selected genes. The R package “glmnet” was used in LASSO
regression to establish the predicting model. A p-value < 0.05 was
considered as statistically significant.
Results
Data extraction
The gene expression and clinical data of BLCA samples were retrieved
from TCGA (n = 430) and [74]GSE32894 (n = 308). After filtering the
data, there were 403 cases in the TCGA-BLCA cohort and 224 cases in the
[75]GSE32894 cohort. The flow chart is shown in Fig. [76]1. Clinical
data regarding age, gender, histological grade, WHO grade, pathological
stage, T stage, lymphatic node metastasis, and angiolymphatic invasion
of the two cohorts are summarized in Table [77]1.
Fig. 1.
Fig. 1
[78]Open in a new tab
Flow chart of the study process
Table 1.
Clinical information table. The major clinical factors related to
prognosis were listed as following
Clinical factors TCGA-BLCA
n = 403% [79]GSE32894
n = 224%
Age
> 60 296 73.45 178 79.46
≤ 60 107 26.55 46 20.54
Gender
Male 298 73,95 163 72.77
Female 105 26.05 61 27.23
Pathological stage
Stage I&II 130 32.26 - -
Stage III 140 34.74 - -
Stage IV 131 32.51 - -
unknown 2 0.49 - -
T stage
≤ T2 121 30.02 216 96.43
T3 191 47.39 7 1.74
T4 57 14.14 1 0.25
unknown 34 8.44 - -
Lymphatic node metastasis
Negative 235 58.31 - -
Positive 125 31.02 - -
Unknown 43 10.67 - -
Angiolymphatic Invasion
Negative 127 31.52 - -
Positive 148 36.72 - -
unknown 128 31.76 - -
Histological grade
High 380 94.53 - -
Low 20 4.98 - -
Vital status
Alive 225 55.83 199 88.84
Dead 178 45.17 25 11.16
Grade (WHO1999)
G1 - - 45 20.09
G2 - - 84 37.50
G3 - - 93 41.52
[80]Open in a new tab
Establishment of a prognostic model based on the risk score
Univariate Cox analysis of the TCGA-BLCA cohort identified 2461 genes
(p < 0.01), of which 274 were differentially expressed between the
tumor and non-tumor samples (Fig. [81]2a). Sixteen DEGs intersected
with FA-related genes (Fig. [82]2b), and were functionally annotated by
KEGG pathway enrichment analysis (Fig. [83]2c). The detailed
information of these genes is listed in Supplementary Table [84]1. The
above DEGs were subjected to Lasso regression, and 7 genes with the
smallest deviance were included in the prognostic model (Figs. [85]3a,
b). The coefficient values and other details of these genes are listed
in Supplementary Table [86]2, and outcomes of univariate regression
analysis are summarized in Supplementary Table [87]3. The risk score
was calculated as VCL * 0.1452—ITGB6 * 0.0832 + COL6A1 * 0.0077 + RAC3
* 0.2404 + PDGFD * 0.0817 + JUN * 0.1192 + LAMA2 * 0.1927. Furthermore,
we performed a matrix correlation analysis to determine any collinear
relationship between these genes. As shown in Fig. [88]3c, apart from
COL6A1 and LAMA2, the co-expression indices of the other gene pairs
were all < 0.5.
Fig. 2.
[89]Fig. 2
[90]Open in a new tab
Identification of Focal Adhesion (FA)-related prognostic genes. a
Volcano plot showing the differentially expressed genes (DEGs) between
the tumor and normal tissue samples in the TCGA-BLCA cohort. The red
dots indicate a higher expression of genes in tumor tissues, and the
green dots indicate a lower expression of genes in tumor tissues. b
Venn diagram showing the intersection of FA-related genes and BLCA
DEGs. c Bubble plot showing the enriched KEGG pathways of DEGs
([91]www.kegg.jp/kegg/kegg1.html). The circle size indicates gene
ratio, and the color refers to the p-value
Fig. 3.
[92]Fig. 3
[93]Open in a new tab
LASSO Cox regression analysis. a Line graph of regression coefficients
and Lambda value. Different colors represented different genes. The
regression coefficients of some genes dropped to 0 with an increase in
Lambda, indicating that they may not affect the model. b The deviance
of the cross-validation. Minimum deviance indicates optimal model
performance. c The co-expression heat map of 11 genes. Red indicates
positive correlation, and green indicates negative correlation
Gene set enrichment analysis (GSEA)
The 403 samples in the TCGA-BLCA cohort were stratified into high- and
low-risk groups based on the 7-gene risk score. GSEA further indicated
that the high-risk group was significantly associated with biosynthesis
of unsaturated fatty acids, tight junction, lysine degradation, and
ubiquitin-mediated proteolysis (p < 0.005; Fig. [94]4).
Fig. 4.
[95]Fig. 4
[96]Open in a new tab
Gene set enrichment analysis (GSEA) of high and low-risk score groups.
a Enrichment plot of KEGG_BIOSYNTHESIS_OF_UNSATURATED_FATTY_ACIDS. b
Enrichment plot of KEGG_UBIQUITIN_MEDIATED_PROTEOLYSIS. c Enrichment
plot of KEGG_TIGHT_JUNCTION. (d) Enrichment plot of
KEGG_LYSINE_DEGRADATION
Kaplan–Meier analysis of 7 genes
The patients in the TCGA-BLCA cohort were stratified into high- and
low-expression subgroups for each of the 7 prognostic genes and
subjected to Kaplan–Meier analysis to determine their impact on
survival. These seven genes include COL6A1 (Fig. [97]5a), ITGB6
(Fig. [98]5b), JUN (Fig. [99]5c), LAMA2 (Fig. [100]5d), PDGFD
(Fig. [101]5e), RAC3 (Fig. [102]5f) and VCL (Fig. [103]5g). And the
[104]GSE32894 cohort was divided with the same criterion for
validation. These seven genes include VCL (Fig. [105]6a), COL6A1
(Fig. [106]6b), ITGB6 (Fig. [107]6c), JUN (Fig. [108]6d), LAMA2
(Fig. [109]6e), PDGFD (Fig. [110]6f) and RAC3 (Fig. [111]6g). The
heatmap demonstrated correlation between gene and survival
(Fig. [112]6h), except for JUN and PDGFD in [113]GSE32894; all genes
comprising the prognostic model were significantly associated with the
survival of BLCA patients (p < 0.05). The high expression of ITGB6
indicated a better prognosis in both the training and validation sets.
In contrast, increased expression of RAC3, COL6A1, and LAMA2 was
correlated with worse prognosis in both sets. Interestingly, high
expression of VCL was associated with a worse prognosis in the training
set but indicated a favorable prognosis in the validation set. Taken
together, ITGB6, RAC3, COL6A1, and LAMA2 could accurately predict
patient prognosis.
Fig. 5.
[114]Fig. 5
[115]Open in a new tab
Kaplan–Meier (KM) survival analysis of patients divided into the high-
and low-expression groups of the 7 prognostic genes in the TCGA-BLCA
cohort (training set). Blue represents high-risk group, red represents
low-risk group. a COL6A1 b ITGB6 c JUN d LAMA2 e PDGFD f RAC3 g VCL
Fig. 6.
[116]Fig. 6
[117]Open in a new tab
Kaplan–Meier (KM) survival analysis of patients divided into the high-
and low-expression groups of the 7 prognostic genes in the
[118]GSE32894 cohort (test set). Blue represents high-risk group, red
represents low-risk group. a VCL b COL6A1 c ITGB6 d JUN e LAMA2 f PDGFD
g RAC3 h Heatmap of gene expression level and prognosis. Yellow
indicates that high gene expression leads to worse prognosis, blue
indicates that low gene expression leads to worse prognosis, and grey
indicates a lack of significance (p > 0.05). The deeper color refers to
smaller p-value
Univariate and multivariate Cox regression analysis
Univariate and multivariate Cox regression analyses identified age,
angiolymphatic invasion, and the risk score as independent prognostic
factors of BLCA (Table [119]2). Furthermore, ROC curve analysis of the
three factors indicated that the AUC of the risk score was greater than
that of the other two factors (Supplementary Fig. [120]1), suggesting
more substantial predictive accuracy. Furthermore, higher risk scores
correlated with more advanced angiolymphatic invasion, T stage,
pathological stage, and lymphatic node metastasis. The patients were
divided into subgroups based on these clinical factors, and the
expression levels of the 7 prognostic genes were compared. As shown in
Fig. [121]7, the expression levels of COL6A1 and LAMA2 were
significantly different across all subgroups. Furthermore, the high-
and low-risk groups had very different survival rates in subgroups
demarcated by age, gender, histological grade, lymphatic node
metastasis, and T stage (Fig. [122]8), indicating that the risk score
can predict the prognosis in clinical sub-groups and may assist in
clinical decision making.
Table 2.
Univariate and multivariate Cox regression analysis of clinical
factors. (HR: hazard ratio; CI: confidence interval)
HR (95%CI) P value HR (95%CI) P value
TCGA
Age 1.035 (1.019 ~ 1.051) 1.82e-05* 1.039 (1.018 ~ 1.061) < 0.001*
Gender-male 0.872 (0.630 ~ 1.206) 0.407
Pathological Stage 1.703 (1.405 ~ 2.064) 5.63e-08* 1.236
(0.615 ~ 2.485) 0.552
T stage 1.702 (1.353 ~ 2.142) 5.79e-06* 1.173 (0.806 ~ 1.706) 0.405
Histological grade 2.915 (0.721 ~ 11.78) 0.133
Lymphatic metastasis 2.227 (1.625 ~ 3.051) 6.29e-07* 0.967
(0.385 ~ 2.427) 0.942
Angiolymphatic invasion 2.339 (1.601 ~ 3.416) 1.1e-05* 1.711
(1.054 ~ 2.777) 0.030*
Risk score 3.584 (2.455 ~ 5.233) 3.84e-11* 2.521 (1.455 ~ 4.367)
< 0.001*
[123]Open in a new tab
Fig. 7.
[124]Fig. 7
[125]Open in a new tab
Risk scores and expression levels of the 7 prognostic genes in the
different clinical sub-groups of TCGA-BLCA cohort. Risk scores in
subgroups of a pathological stages, c T stages, e angiolymphatic
invasion, and g lymphatic node metastasis. Gene expression levels in
subgroups of b pathological stages, d T stages, f angiolymphatic
invasion, and h lymphatic node metastasis
Fig. 8.
[126]Fig. 8
[127]Open in a new tab
Kaplan–Meier survival curves of high- and low-risk groups stratified
based on a-b gender, c-d angiolymphatic invasion, e–f lymphatic node
metastasis, g-h T stage, i-j age, and k histological grade. Blue high-
represents risk group, red represents low-risk group
DNA methylation analysis of 7 genes
We presented prognostic values of DNA methylation clustering the
expression levels of each 7 genes of the prognostic model in the
TCGA-BLCA cohort by KM curves (Supplementary Fig. [128]2). The CpG
islands corresponding to the smallest likelihood ratio (LR)
test p-value were chosen in all 7 genes to ensure the statistical
significance. The specific CpG resource of each gene was depicted in
the figure. Besides, the relationship between DNA methylation and the
prognosis of ITGB6 demonstrated its protective effect, displaying the
same trend as it does in the 7-gene prognostic model. prognosis of
ITGB6 demonstrated its protective effect, displaying the same trend as
in the 7-gene prognostic model.
Immune cell infiltration
The infiltration ratio of 22 immune cell types was analyzed in the
TGCA-BLCA cohort using CIBERSORT (Fig. [129]9a) and compared between
the high- and low-risk groups. As shown in Fig. [130]9b, the
predominant infiltrating immune cells in the high-risk groups were
activated CD4 memory T cells, resting dendritic cells (DCs), and
activated mast cells, whereas the M1 macrophages and activated DCs
showed higher infiltration in the low-risk group (p < 0.0001).
Kaplan–Meier survival analysis further showed that high infiltration of
resting DCs activated mast cells and activated CD4 memory T cells,
along with a high-risk score, which portended the worst prognosis. In
addition, low infiltration of M1 macrophages and activated DCs in the
high-risk group was associated with the worst prognosis (Supplementary
Fig. [131]3). The immune score and tumor purity in TCGA-BLCA cohort
were evaluated using the “ESTIMATE” R package (Supplementary Table
[132]4). The patients’ samples were divided into 4 clusters using the
median risk and immune scores. As shown in Fig. [133]9c, patients with
the lowest immune score and highest risk score had the worst prognosis.
Besides, M0, M2 macrophages, and neutrophils were also statistically
significant (p < 0.05), thus these immune cells were also selected for
further analysis on the TIMER platform. After filtering associated
immune cells, M2 macrophages were ultimately chosen to explore further
the relationship with the 7 genes (Supplementary Fig. [134]4). Apart
from ITGB6, which showed a negative correlation as a protective factor,
the other genes were all positively correlated with Macrophages
M2_CIBERSORT as risk factors, demonstrating the same trend as the
prognostic model. COL6A1, RAC3, LAMA2, and VCL showed significant
statistical meanings (p < 0.05).
Fig. 9.
[135]Fig. 9
[136]Open in a new tab
Immune infiltration status. a Bar chart showing the infiltration ratio
of 22 immune cell types in high and low-risk groups samples. b Violin
plot showing differences in immune cell types between the high and
low-risk groups. c KM survival curves of patients stratified by immune
score and risk score together
Verification of the prognostic model
The optimum cut-off value of the risk score for 3-year OS was -2.2174
and was used to divide the patients in the TCGA-BLCA cohort into the
high- and low-risk groups. Likewise, the optimum cut-off value of the
validation cohort ([137]GSE32894) was calculated as -0.1290. As shown
in Figs. [138]10a, b, patients in the high-risk group had significantly
worse survival than the low-risk group in both the training and
verification sets (p < 0.05). Furthermore, the AUC values of the risk
score for 1-, 3- and 5-year OS were respectively 0.66, 0.68, and 0.69
in the training set, and 0.66, 0.73, and 0.72 in the validation set
(Figs. [139]10c, d). We analyzed the relationship between the risk
scores and OS duration and the changes in the expression of various
genes in both cohorts (Figs. [140]10e, f). As expected, ITGB6 was
identified as a protective factor in both cohorts. It was downregulated
with the increasing risk score. Besides, limited by the sample size of
the Prognoscan online tool, RAC3 and COL6A1 were the only risk factors
that demonstrated a significant statistical correlation with the
overall survival time of patients (Supplementary Figs. [141]5 and
[142]6, p < 0.05) in external dataset [143]GSE13507 and [144]GSE5287,
respectively. Though KM curves of ITGB6 in the [145]GSE13507 cohort
(Supplementary Fig. [146]7) still showed a tendency as a protective
factor, the p-value was > 0.05, with no statistical significance. Taken
together, the 7-gene risk model can successfully stratify BLCA patients
into prognostic groups. Furthermore, In the validation of protein level
on the HPA database, VCL, COL6A1, RAC3, PDGFD and JUN showed higher
protein expression in BLCA tissue than in normal tissue. In comparison,
ITGB6 showed higher protein expression in normal tissue than in BLCA
tissue. In this database, LAMA2 expression was not detected in normal
or cancerous tissues (Supplementary Fig. [147]8).
Fig. 10.
[148]Fig. 10
[149]Open in a new tab
Kaplan–Meier survival curves of the low- and high-risk groups in a
TCGA-BLCA and b [150]GSE32894 cohorts. Blue represents high-risk, red
represents low-risk. The ROC curves and corresponding AUC values of the
risk score for 1-, 3- and 5-year OS in c TCGA-BLCA and d [151]GSE32894
cohorts. Risk score distribution, survival status, and gene expression
in e TCGA-BLCA and f [152]GSE32894 cohorts
COL6A1 and LAMA2 significantly promotes BLCA cell migration
To further investigate the effect of COL6A1 and LAMA2 on the biological
function of bladder cancer cells, a series of functional experiments
were performed to verify its effect. After knocking down the expression
of COL6A1 (Fig. [153]11a) and LAMA2 (Fig. [154]11b), the Transwell
assay showed that the migration ability of bladder cancer cells was
down-regulated (Figs. [155]11c, d). The result of the wound-healing
assay is consistent with that of the Transwell assay (Figs. [156]11e,
f). COL6A1 and LAMA2 were again validated to promote the migration of
bladder cancer cells.
Fig. 11.
[157]Fig. 11
[158]Open in a new tab
Validation of functional experiments on COL6A1 and LAMA2. a Relative
mRNA level of COL6A1 after being knocked down. b Relative mRNA level of
LAMA2 after being knocked down. c The number of migrated cells were
lower in Si-COL6A1 group than that in NC group. d The number of
migrated cells was lower in Si-LAMA2 group than that in NC group. e
Relative wound healing area was lower in Si-COL6A1 group than that in
NC group. f Relative wound healing area was lower in Si-LAMA2 group
than that in NC group
Discussion
Bladder cancer is a common malignancy of the urinary system with
unpredictive outcomes. Several bioinformatics studies in recent years
have established prognostic models, including an immune genes-related
model [[159]23] and an 11-gene model based on 5 cohorts [[160]11], for
clinical decision-making. Furthermore, hypoxia-related risk factors
[[161]9] and immune landscapes [[162]24] have also been associated with
bladder cancer prognosis. In this study, we successfully established a
reliable 7-gene focal adhesion-related prognostic model for BLCA using
RNA-seq data from the TCGA-BLCA cohort. We verified it on the external
[163]GSE32894 dataset. Given the regional differences between the two
datasets, the former being from Europe and the latter from North
America, we can conclude that the model can be applied universally.
The model comprises of 6 risk genes (VCL, COL6A1, RAC3, PDGFD, JUN,
LAMA2) and 1 protective gene (ITGB6), all of which are closely related
to focal adhesion. Among the 7 genes, COL6A1 and LAMA2 are the two most
significant genes in either KM analysis for predicting prognosis or
Kruskal–Wallis rank sum test for combined analysis of expression level
and clinical factors, thus, deserve to be further explored.
COL6A1, a gene encoding the collagen VI α1 chain, is widely present in
the connective tissues of vertebrates [[164]25]. Collagen VI is a major
extracellular matrix protein commonly used to support cell structures.
Some studies have shown that collagen VI can regulate cell migration,
apoptosis, and tumor progression [[165]26, [166]27]. Previous studies
tend to focus on the function of collagen VI itself, due to its
deficiency in myopathy and skeletal muscle diseases [[167]28].
However, COL6A1 has been shown to stimulate proliferation and prevent
apoptosis of cancer cells, which has also been found to be related to
different types of cancers. For instance, it was proved to be a
potential marker of cervical cancer progression in Kazobinka G et al.’s
research. The over-expression of COL6A1 was correlated with cervical
patients’ prognosis and cell biological functions [[168]29]. Besides,
the up-regulation of COL6A1 expression induces tumorigenesis in
prostate cancer cells in vivo has also been reported in a study about
castration-resistant prostate cancer [[169]30] and enhanced probability
of lung cancer cell metastasis in another research [[170]31]. Some
researchers also reported that the over-expression of COL6A1
contributed to poor prognosis of renal clear cell carcinoma and glioma
patients and enhanced probability of lung cancer cell metastasis
[[171]32, [172]33]. In addition, Snipstad K et al. have reported
an up-regulation of extracellular matrix proteins COL6A1 and LAMA4 in
rectal cancer after radio‐chemotherapy [[173]34]. All this evidence
indicated that COL6A1 has a close relationship with tumor progression
and was a novel biomarker of prognosis in different types of cancers,
not a simple gene related to collagen anymore.
As a member of the cell adhesion family, LAMA2 also encodes components
of extracellular matrix protein called laminin, a glycoprotein in the
connective tissue basement membrane, and promotes cell adhesion
[[174]35]. Laminin-α2, encoded by LAMA2, is abundant in skeletal
muscle, motor nerves, and the brain. It is a composition of trimeric
laminin-211 [[175]36] and is an essential constituent of tumor stromal,
which can be associated with the malignancy of the tumor. Since damage
to the basement membrane of tumor cells plays a vital role in tumor
invasion and transfer, many studies were conducted, and evidence has
shown that laminin expression was related to tumor progression
[[176]37]. LAMA2 belongs to the laminin family. However, seldom
researchers paid attention to the direct correlation between LAMA2
overexpression and tumor progression. Most studies focused on LAMA2
deficiency leading to muscular disease, and it is known that mutations
in LAMA2 produce a particularly severe type of congenital muscular
dystrophy, called LAMA2 chain deficient congenital muscular dystrophy
(LAMA2-CMD) [[177]38].
More importantly, seldom studies have explored the relationship between
COL6A1, LAMA2, and bladder cancer, thus of great significance in our
study.
Besides, other genes in the prognostic model are also related to
tumors. For instance, Cheng C et al. claimed RAC3 promoted
proliferation, migration, and invasion through PYCR1/JAK/STAT signaling
in bladder cancer [[178]39]. VCL has been reported as an important
prognostic biomarker in prostate cancer [[179]40]. Satow R et al.
reported that PDGFD promotes aggressiveness in prostate and colorectal
cancer [[180]41]. Previous research has reported that up-regulation of
JUN is associated with the invasiveness of colorectal cancer cells
[[181]42]. Singh A et al. reported in 2009 that ITGB6 correlated with
a well-differentiated K-Ras-driven cancer such as lung, pancreatic and
colon cancer [[182]43].
GSEA of the high- and low-risk groups indicated significant enrichment
of biological processes including biosynthesis of unsaturated fatty
acids, tight junction, lysine degradation and ubiquitin-mediated
proteolysis. proteolysis. Focal adhesion and tight junction commonly
belong to enriched cell adhesion/junction pathways [[183]44],
indicating that these biological processes can further explore the
relationship between tumorigenesis in the future. The risk score also
showed more substantial predictive power than multiple clinical
factors. Furthermore, patients with advanced clinical features had
higher risk scores. Survival analysis indicated that the prognosis of
the high-risk group was worse in both the training set and validation
set (p < 0.01), with respective 3-year AUC values of 0.68 and 0.73,
suggesting that the risk score model was capable of predicting BLCA
prognosis independent of the clinical factors. The risk score
distribution was also similar in both sets, which indicated good
consistency and universality of the risk model.
DNA methylation analysis of each 7 genes demonstrated a strong
correlation between the gene methylation and overall survival time in
the TCGA-BLCA cohort. ITGB6 was a protective factor in our prognostic
model and its protective effect was also provn by DNA methylation, as
the lower methylation level indicated a better prognosis, which
reflected the reliability of the 7-gene prognostic model to some
extent. Anuraga G et al. and Wang Z et al. also reported DNA
methylation analysis in their gene signature for predicting breast
cancer and lung adenocarcinoma via MethSurv [[184]45–[185]47]. The
validation of the HPA database on the protein level showed the same
tendency as the 7 genes in the prognostic model. Borowczak J et al.
reported CDK9 in bladder cancer via the HPA database [[186]48]. The
functional experiments successfully verified that COL6A1 promoted the
migration of bladder cancer cells as a risk factor, and bladder cancer
cells were down-regulated after knocking down the expression of COL6A1.
This reflected the reliability of our bioinformatic predicting model.
Previous studies have established tumor-infiltrating lymphocytes as one
of the immune-related prognostic factors [[187]49], and CD20 B cell has
been identified as a long-term survival factor in BLCA [[188]50]. In
this study, we found that both resting and activated DCs were relevant
to prognosis. High infiltration of DCs and a high-risk score indicate
the worst prognosis. Thus, the infiltration ratio of DCs is a potential
new prognostic factor that can be combined with the risk score for more
accurate prediction. Furthermore, the correlation between every single
gene in the model and Macrophage M2 replenished the evidence that
immune cells can influence clinical outcomes, assist in specific
immunotherapeutic responses, and help select suitable patients for
immunotherapy combined with risk score. Previous studies have reported
the relationship between gene and immune cells [[189]51, [190]52].
However, our results are limited because we only analyzed data from
TCGA and GEO databases. Though we validated the model on external
cohorts, only the [191]GSE32894 dataset showed an excellent result.
[192]GSE13507 and [193]GSE5287 did not establish statistically
significant relationships between the expression of each 7genes in the
model and survival time on the Prognoscan database. Previous
researchers have also drawn from such databases to determine the
relationship between genes and prognosis [[194]53, [195]54] Due to the
lack of clinical cohorts, the model’s reliability cannot be verified
clinically. And the best cut-off value was directly chosen as a 3-year
OS value of ROC without calculating in a more accurate method. The two
up-regulated key genes COL6A1 and LAMA2 were not confirmed for their
significant roles on the basic experimental level.
Conclusions
The 7-gene FA-related prognostic model can accurately predict the
prognosis of BLCA patients and aid in clinical decision-making. Further
studies are needed to amend its accuracy and stability for clinical
applications.
Supplementary Information
[196]12885_2022_10264_MOESM1_ESM.pdf^ (387.8KB, pdf)
Additional file 1: Supplementary Figure 1. The ROC curves of (a) risk
score, (b) angiolymphatic invasion and (c) age for 1-, 3- and 5-year OS
and the corresponding AUC values.
[197]12885_2022_10264_MOESM2_ESM.pdf^ (421.4KB, pdf)
Additional file 2: Supplementary Figure 2. Kaplan–Meier survival curves
of patients demarcated on the basis of high- and low-DNA methylation of
each 7 genes in the model. (a) COL6A1, (b) ITGB6, (c) JUN, (d) LAMA2
(e) PDGFD, (f)RAC3, (g)VCL.
[198]12885_2022_10264_MOESM3_ESM.pdf^ (474.3KB, pdf)
Additional file 3: Supplementary Figure 3. Kaplan–Meier survival curves
of high- and lowrisk patients demarcated on the basis of (a) activated
CD4 memory T cell, (b) dendritic cell, (c) activated mast cells, (d) M1
macrophages and (e) dendritic cell infiltration.
[199]12885_2022_10264_MOESM4_ESM.pdf^ (522.8KB, pdf)
Additional file 4: Supplementary Figure 4. Correlation scatter plot of
Macrophage M2 infiltration ration and expression level of each 7 genes
in the model. (a) COL6A1, (b) ITGB6, (c) JUN, (d) LAMA2, (e) PDGFD, (f)
RAC3, (g) VCL.
[200]12885_2022_10264_MOESM5_ESM.pdf^ (500.2KB, pdf)
Additional file 5: Supplementary Figure 5. Related validation plots on
Prognoscan platform of the RAC3 expression. (a) expression level
distribution plot, (b) expression level histogram plot, (c) p-value
distribution plot, (d) K-M curves of patients with high- and
low-expression of RAC3, (e) survival time distribution plot.
[201]12885_2022_10264_MOESM6_ESM.pdf^ (540.5KB, pdf)
Additional file 6: Supplementary Figure 6. Related validation plots on
Prognoscan platform of the COL6A1 expression. (a) expression level
distribution plot, (b) expression level histogram plot, (c) p-value
distribution plot, (d) K-M curves of patients with high- and
low-expression of COL6A1, (e) survival time distribution plot.
[202]12885_2022_10264_MOESM7_ESM.pdf^ (507.9KB, pdf)
Additional file 7: Supplementary Figure 7. Related validation plots on
Prognoscan platform of the ITGB6 expression. (a) expression level
distribution plot, (b) expression level histogram plot, (c) p-value
distribution plot, (d) K-M curves of patients with high- and
low-expression of ITGB6, (e) survival time distribution plot.
[203]12885_2022_10264_MOESM8_ESM.pdf^ (510.9KB, pdf)
Additional file 8: Supplementary Figure 8. Validation of the seven-mRNA
prognostic signature in the HPA database. The deeper the color, the
higher expression in the tissues. (a-b) COL6A1 expression was higher in
tumor tissues. (c-d) ITGB6 expression was higher in normal tissues.
(e-f) JUN expression was higher in tumor tissues. (g-h) LAMA2
expression differences was insignificant between normal and tumor
tissues. (i-j) PDGFD expression was higher in tumor tissues. (k-l) RAC3
expression was higher in tumor tissues. (m-n) VCL expression was higher
in tumor tissues.
[204]12885_2022_10264_MOESM9_ESM.xlsx^ (17KB, xlsx)
Additional file 9: Supplementary Table 1. Detailed genes table.
[205]12885_2022_10264_MOESM10_ESM.pdf^ (177.6KB, pdf)
Additional file 10: Supplementary Table 2. Detailed information of the
7 genes.
[206]12885_2022_10264_MOESM11_ESM.pdf^ (147.7KB, pdf)
Additional file 11: Supplementary Table 3. Univariate cox regression
analysis of 7 genes (HR: hazard ratio; CI: confidence interval).
[207]12885_2022_10264_MOESM12_ESM.xlsx^ (35.7KB, xlsx)
Additional file 12: Supplementary Table 4. “ESTIMATE” results.
[208]12885_2022_10264_MOESM13_ESM.pdf^ (96KB, pdf)
Additional file 13: Supplementary Table 5. Sequences of siRNA for
transfection.
[209]12885_2022_10264_MOESM14_ESM.pdf^ (119.5KB, pdf)
Additional file 14: Supplementary Table 6. Sequences of primer pair for
qPCR.
Acknowledgements