Abstract
Background
Circadian rhythm is an internal timing system generated by
circadian-related genes (CRGs). Disruption in this rhythm has been
associated with a heightened risk of breast cancer (BC) and regulation
of the immune microenvironment of tumors. This study aimed to
investigate the clinical significance of CRGs in BC and the immune
microenvironment.
Methods
CRGs were identified using the GeneCards and MSigDB databases. Through
unsupervised clustering, we identified two circadian-related subtypes
in patients with BC. We constructed a prognostic model and nomogram for
circadian-related risk scores using LASSO and Cox regression analyses.
Using multi-omics analysis, the mutation profile and immunological
microenvironment of tumors were investigated, and the immunotherapy
response in different groups of patients was predicted based on their
risk strata.
Results
The two circadian-related subtypes of BC that were identified differed
significantly in their prognoses, clinical characteristics, and tumor
immune microenvironments. Subsequently, we constructed a
circadian-related risk score (CRRS) model containing eight signatures
(SIAH2, EZR, GSN, TAGLN2, PRDX1, MCM4, EIF4EBP1, and CD248) and a
nomogram. High-risk individuals had a greater burden of tumor
mutations, richer immune cell infiltration, and higher expression of
immune checkpoint genes, than low-risk individuals, indicating a “hot
tumor" immune phenotype and a more favorable treatment outcome.
Conclusions
Two circadian-related subtypes of BC were identified and used to
establish a CRRS prognostic model and nomogram. These will be valuable
in providing guidance for forecasting prognosis and developing
personalized treatment plans for BC.
Keywords: Circadian rhythm, Circadian-related genes, Breast cancer,
Tumor immune microenvironment, Immunotherapy
1. Introduction
Breast cancer (BC) is the most prevalent malignancy in women and the
leading cause of cancer-related deaths. BC incidence is increasing
annually worldwide. Several epidemiological investigations have linked
circadian disruptions, such as jet lag, disrupted sleep, and shift
work, increasing the risk of BC [[35][1], [36][2], [37][3], [38][4]].
Notably, many studies have verified that the circadian-related genes
(CRGs) are a factor effecting the initiation, development, and
prognosis of BC, by regulating physiological processes such as cell
cycle control, cell proliferation, apoptosis, and DNA damage repair
[[39][5], [40][6], [41][7], [42][8], [43][9]].
Recent findings have shown that circadian disruption affects the
remodeling of the tumor immune microenvironment (TIME) and induces the
establishment of an immunosuppressive microenvironment [[44][10],
[45][11], [46][12]]. Historically, BC has been considered an
immunologically “cold" tumor [[47]13,[48]14]. However, high throughput
genomic and cellular profiling have indicated that the immunologic
environment of BC is dynamic and heterogeneous, with a tumor
microenvironment (TME) comprising unique functional immune infiltrates,
suggesting potential new immunotherapeutic strategies [[49][15],
[50][16], [51][17]]. Therefore, targeting analysis of the regulation of
circadian rhythms on the BC immune microenvironment may provide an
opportunity to diagnose and treat BC.
The circadian rhythm is a 24-h endogenous rhythm established by
organisms to adapt to the Earth's rotation. As an important regulatory
system for the maintenance of normal cell and tissue homeostasis,
circadian rhythm plays a key role in processes associated with cancer
[[52][18], [53][19], [54][20], [55][21]]. This rhythm depends on the
production of circadian genes and proteins regulated by the circadian
clock. These genes can be specifically divided into core clock genes
and clock control genes that interact with each other and affect the
operation and expression of the circadian system [[56][22], [57][23],
[58][24]]. Dysregulated or mutated circadian genes affect cancer
development through complex and precise regulatory mechanisms,
including regulation of the cell cycle, changes in metabolism,
epithelial–mesenchymal transition, proliferation, and metastasis
[[59]5,[60]25,[61]26]. For example, the overexpression of CLOCK and
BMAL1 in BC tissues promotes the proliferation and invasion of BC cells
[[62][27], [63][28], [64][29]]. Mutations in NPAS2 are associated with
BC risk [[65]30,[66]31].
Studies have shown that there is a strong association between CRGs and
immune checkpoint inhibitors that can affect both the infiltration of
immune cells into tumor patients and the TME [[67]32]. Using a
chronotherapy strategy to improve the benefit of immune checkpoint
inhibitors may be sufficient to advance BC immunotherapy. In this
strategy, CRGs could be a promising new set of biomarkers for BC risk
and prognosis. Further research on CRGs is needed to realize precise
and personalized medical regimens for BC.
In previous studies, Zhang et al. constructed a risk prediction model
to classify BC patients into high- and low-risk groups through CRGs,
which laid a certain foundation for our study [[68]33]. We included
more abundant CRGs and first identified the presence of different
circadian subtypes in BC by consensus cluster analysis, which provided
a reliable basis for model construction. A larger validation dataset
was used, and after re-statistical analysis, our BC risk prediction
model was constructed to distinguish between high-risk and low-risk
groups of BC. The immune microenvironmental landscape of BC was mapped
for a new perspective on circadian rhythms. In addition, the variance
in gene mutations and biological processes between patients with
different prognostic risks was further elucidated, which might provide
new strategies for individualized healthcare.
2. Materials and methods
2.1. Data sources
The Cancer Genome Atlas (TCGA, [69]https://tcga-data.nci.nih.gov/tcga/)
and Molecular Taxonomy of Breast Cancer International Consortium
(METABRIC, [70]http://www.cbioportal.org) databases were used to
describe the transcriptome and clinical characteristics of patients
with BC. TCGA data were used for training purposes. METABRIC data were
used as a validation set. To process the data, first, we removed the
normal samples from the dataset. Then, we screened only those samples
for which the survival time was >30 days, retaining only one sample per
patient. Finally, we preserved genes that were detected in over fifty
percent of the samples. CRGs were obtained from GeneCards [[71]34]
([72]http://www.genecards.org/) and MSigDB
([73]http://software.broadinstitute.org/gsea/msigdb/index.jsp/). Local
ethics committee approval was not required because the data were
obtained from public databases (TCGA and METABRIC). The study strictly
adhered to the data access policy and publication guidelines provided
by TCGA and METABRIC [[74]35,[75]36].
2.2. Construction of circadian subtypes in BC
According to the expression of CRGs, the “ConsensusClusterPlus" was
performed [[76]37]. It could identify different circadian associated
subtypes in BC. To assess the capacity of mRNA expression profiles in
discriminating between the two types associated with the circadian
rhythm, we conducted principal component analysis (PCA). The R packages
“survival” and “survminer” were used to conduct survival analysis. For
the identification of differentially expressed genes (DEGs) between the
two isoforms, the R package “Limma” was used [[77]38]. Technical
support for the Sankey diagram was provided by the SangerBox platform
([78]http://sangerbox.com/). Additionally, the association between
different circadian-related clusters and clinical features, including
age, survival status, and stage, was determined using the R package
“pheatmap”.
2.3. Enrichment analysis and immune landscape of circadian subtypes
In order to elucidate disparities in biological processes that exist in
the two identified circadian subtypes, we performed gene set variation
analysis (GSVA) using the “GSVA” R package [[79]39]. Subsequently, DEGs
were analyzed using Kyoto Encyclopedia of Genes and Genomes (KEGG),
gene ontology (GO), and gene set enrichment analysis (GSEA) using
“Clusterprofiler” [[80][40], [81][41], [82][42]]. We looked more
closely at how the infiltrating immune cells differed between the two
subtypes with the help of “GSVA” and “GSEABase” R packages. The
“ESTIMATE” calculated the immune, stromal, and estimated scores for the
patients with BC [[83]43]. To increase the precision of anticipating,
the tumor immune dysfunction and exclusion (TIDE) tool
([84]http://tide.dfci.harvard.edu/) was used to compare TIDE,
microsatellite instability (MSI), dysfunction, and exclusion scores
amongst various subtypes [[85]44]. In addition, the “pRRophetic” R
package screened commonly used drugs for BC to determine the potential
sensitivity of chemotherapy and endocrine therapy drugs for different
subtypes [[86]45].
2.4. Construction and validation of circadian-related risk score (CRRS) model
in BC
In breast cancer, core prognostic genes influence patient prognosis and
immune function by regulating circadian rhythms. When the expression of
these genes is dysregulated, circadian rhythm disruption may result. To
investigate prognostic CRGs, univariate Cox regression analysis was
performed utilizing the “survival”. These prognostic CRGs were further
analyzed. The “glmnet” was selected to perform a least absolute
shrinkage and selection operator (LASSO) penalized Cox regression
analysis [[87]46]. Our goal is to identify genes that are important in
predicting prognosis. The product of the expression of core
prognosis-related CRGs and their coefficients yielded the CRRS. The
formula for the risk score has been established as follows:
[MATH: CRRS=∑i<
mrow>Coefficientof(i)×Expressionofgene(i).
:MATH]
The coefficient of the gene (i) is the regression coefficient of the
gene (i), and the expression of the gene (i) is the expression value of
gene (i) for each patient. Using the risk score formulation described
above, patients in each cohort were classified into high- or low-risk
groups using the median risk score as the cutoff value. The impact of
multiple key CRGs on breast cancer prognosis was fully considered by
CRRS, which was calculated to differentiate different risk groups, and
to some extent visualized the status of circadian rhythm disruption in
patients. High-risk patients had dysregulated expression of prognostic
core genes, disrupted circadian rhythms, and worse prognosis compared
to low-risk patients. Simultaneously, PCA and t-distributed stochastic
neighbor embedding analyses were used to verify the reliability of the
CRRS model in discriminating between different risk groups using the
“Rtsne” R package. To check the reliability of the prediction model,
the receiver operating characteristic (ROC) curve was plotted in the
“survivalROC” R package. C-index was calculated by the R package
“survcomp”. Immunohistochemistry staining of prognostic CRGs was
validated using the Human Protein Atlas (HPA;
[88]http://www.proteinatlas.org/) [[89]47]. Risk scores, clinical
events and model genes were presented using the SangerBox online
platform ([90]http://sangerbox.com/) [[91]48].
2.5. Construction of the nomogram prediction model
Based on TCGA, the survival time, survival status, and other clinical
characteristics of breast cancer patients were sorted out. Using the R
packages “survival", “regplot” and “rms”, a nomogram was created that
is suitable for clinical use. After multifactorial Cox regression
analysis, we can assess the extent to each independent variable
(influencing factor) contributes to the outcome variables (e.g.,
survival time, survival status, etc.), i.e., the magnitude of the
regression coefficients. The nomogram takes into account factors such
as age, clinical stage and risk score. Points were assigned according
to the level at which each independent variable was taken, and these
scores were then summed to obtain a total points. To visualize these
predictors, we integrated them and used scaled line segments, plotted
on the same plane at a certain scale. Clinicians can use this intuitive
linear scale to predict a patient's probability of survival at
different time points, specifically 1-, 3-, and 5-year. Furthermore,
the ROC curve and decision analysis (DCA) are also applied employing
the “rmda” R package [[92]49].
2.6. Mutation analysis
Mutational information were retrieved from TCGA and cBioPortal for
Cancer Genomics ([93]http://www.cbioportal.org) databases. To evaluate
differences in the percentage of genomic modifications, mutation count,
and TMB, the “Maftools” was used [[94]50]. The median TMB was used to
stratify the specimens. Then, for the purpose of survival analysis, we
combined the risk score with the TMB score. In addition, the
relationship between risk score and TMB was investigated. Information
from the Gene Set Cancer Analysis (GSCA) website
([95]http://bioinfo.life.hust.edu.cn/GSCA/) was used to identify
mutations in core prognostic genes [[96]51].
2.7. Functional enrichment analysis and evaluation of TIME
Initially, GO and KEGG analyses and GSEA were performed on DEGs between
different risk groups. Based on potential gene signatures and the
amount of each type of immune cell, the study examined the association
between people at high risk and those at low risk using CIBERSORT.
Using Spearman's correlation analysis, the association between
signature genes and various immune cells was examined. The GSCA
database was used to extract data on GSVA scores for these genes. This
was then compared with immune functions, copy number variants (CNVs),
and single nucleotide site variants (SNVs). Finally, variations in
immune checkpoint gene expression and TIDE scores were compared between
the high-risk and low-risk groups.
2.8. Statistical analyses
R software (version 4.0.5) was used to conduct statistical analysis.
Groups were compared for differences in overall survival (OS) times
using a bilateral log-rank test and a survival curve was constructed.
To determine the P-values for the correlation study, Spearman's method
was applied. More than two groups were compared using the
Kruskal-Wallis test, whereas two groups were compared using the
Wilcoxon test. We used the R q value function (included in the R
package “q value”) to compute Q-values in order to solve the problem of
multiple hypothesis testing [[97]52]. When P < 0.05, the results were
deemed statistically significant.
3. Results
3.1. Identification of circadian subtypes in BC
In the end, there were 1823 samples in the validation set and 883
samples in the training set. After gathering 2796 and 343 CRGs from
GeneCards and MSigDB, respectively, duplicates were removed from the
union of these sets. By intersecting the resulting set with the genes
of patients with BC in TCGA, we identified 2655 CRGs. Unsupervised
clustering with clusters 1 and 2 (512 and 371 cases, respectively)
included in the training set was used to separate the two distinct
circadian-related subpopulations based on CRG expression levels
([98]Fig. 1A). PCA divided patients into two groups, confirming the
presence of two diverse circadian-related subtypes (Supplemental Fig.
1A). To determine the variations in survival between clusters, BC
cohorts in TCGA and METABRIC with relatively complete data on survival
were examined. The survival advantage of Cluster 2 was greater than
that of Cluster 1 ([99]Fig. 1B, Supplementary Fig. 1B). Subsequently,
associations between both groups and numerous clinical features were
assessed ([100]Fig. 1C). In contrast to Cluster 1, Cluster 2 had a
larger percentage of patients aged ≤55 years and survival outcomes.
Chi-square test findings revealed significant variations in many
clinical aspects ([101]Fig. 1D). These findings suggest that CRGs
influence tumor development. Using a heat map, we identified two
distinct circadian subtypes based on their transcriptomic profiles
([102]Fig. 1D).
Fig. 1.
[103]Fig. 1
[104]Open in a new tab
Consensus clustering of circadian-related genes in BC in TCGA. (A) TCGA
cohort patient consensus matrix. (B) KM curve for OS with circadian
classes. (C) Alluvial diagram showing changes in age, circadian
cluster, and survival status. (D) Unsupervised clustering of all
circadian-related genes.
3.2. Immune landscape of circadian subtypes
GSVA was used to ascertain how the pathway enrichment analysis differed
between the two circadian clusters in the TCGA cohort. According to the
enrichment map, Cluster 1 was mainly focused on metabolic pathways and
Cluster 2 was significantly enriched in immune and inflammatory
pathways (e.g., chemokine, T-cell receptor, B-cell receptor, and
Toll-like receptor) ([105]Fig. 2A). KEGG and GO analyses of the DEGs
between the two circadian subtypes also revealed enrichment in immune
function pathways ([106]Fig. 2B and C). With this finding further
confirmed by GSEA, Cluster 2 was significantly enriched in T-cell
receptor, cell adhesion molecules, and chemokine signaling pathways
([107]Fig. 3A and B). The ESTIMATE algorithm was used to further
analyze the immune microenvironment of the two subtypes of BC by
generating stromal, immune, and estimated scores. Cluster 2 had higher
scores than Cluster 1 in all categories ([108]Fig. 3C). Finally, we
tested whether circadian subtypes were significantly correlated with
the effects of immunotherapy. TIDE analysis revealed that Cluster 2 was
associated with higher TIDE, MSI, and T-cell dysfunction scores than
Cluster 1, while the opposite was observed for the T-cell exclusion
scores ([109]Fig. 3D). Due to the intriguing findings, we conducted
further analysis on the correlation between different clusters and
cancer-associated fibroblasts (CAFs). Additionally, Cluster 1 CAF
signature genes with elevated expression levels were included in the
investigation (Supplementary Fig. 2). Upon comparing the immune cell
infiltration of both kinds, Cluster 2 had a unique immunological
profile. Cluster 2 showed increased infiltration of natural killer
cells and activated dendritic cells compared to Cluster 1 ([110]Fig.
3E). Human leukocyte antigen (HLA) and immune checkpoint molecules play
crucial roles in immune function and have significant clinical
immunotherapy implications. Therefore, we also compared the expression
levels of HLA and immune checkpoint genes among the different subtypes.
Compared to Cluster 1, Cluster 2 has higher levels of HLA molecule
expression ([111]Fig. 3F). Comparative analysis of immune checkpoint
gene expression suggested that Cluster 2 may have a more positive
response to immunotherapy against the CTLA4, CD274, LAG3, TIGIT,
HAVCR2, PDCD1, and PDCD1LG2 immune checkpoints ([112]Fig. 3G).
Furthermore, the prediction of sensitivity to chemotherapeutic drugs in
different subtypes was performed in this study (Supplementary Fig. 3).
Among them, cyclophosphamide, paclitaxel, vinorelbine, gemcitabine,
cisplatin, 5-fluorouracil, epirubicin and docetaxel, which are commonly
used in breast cancer chemotherapy, showed more benefit in Cluster 1
(Supplementary Fig. 3A). Cluster 1 also has advantages in endocrine
therapy, such as tamoxifen and fulvestrant (Supplementary Fig. 3B).
Overall, our findings suggest that Cluster 2 patients, with increased
HLA molecule expression and a unique immunological profile, may respond
better to immunotherapy targeting specific immune checkpoints. These
findings further underline the importance of understanding the role of
the TIME in circadian-related disorders and their potential
implications for therapeutic strategies.
Fig. 2.
[113]Fig. 2
[114]Open in a new tab
Functional annotation analysis of DEGs between different circadian
subtypes in TCGA. (A) GSVA data of the biological pathways.
Immunologically relevant pathways: highlight in bold red font. Blue:
inhibition of biological pathways. Red: activated biological pathways.
(B) The analysis of KEGG enrichment in two circadian subtypes. (C) GO
analysis of differentially expressed genes between two circadian
subtypes. (For interpretation of the references to colour in this