Abstract

Background

   Circadian rhythm is an internal timing system generated by
   circadian-related genes (CRGs). Disruption in this rhythm has been
   associated with a heightened risk of breast cancer (BC) and regulation
   of the immune microenvironment of tumors. This study aimed to
   investigate the clinical significance of CRGs in BC and the immune
   microenvironment.

Methods

   CRGs were identified using the GeneCards and MSigDB databases. Through
   unsupervised clustering, we identified two circadian-related subtypes
   in patients with BC. We constructed a prognostic model and nomogram for
   circadian-related risk scores using LASSO and Cox regression analyses.
   Using multi-omics analysis, the mutation profile and immunological
   microenvironment of tumors were investigated, and the immunotherapy
   response in different groups of patients was predicted based on their
   risk strata.

Results

   The two circadian-related subtypes of BC that were identified differed
   significantly in their prognoses, clinical characteristics, and tumor
   immune microenvironments. Subsequently, we constructed a
   circadian-related risk score (CRRS) model containing eight signatures
   (SIAH2, EZR, GSN, TAGLN2, PRDX1, MCM4, EIF4EBP1, and CD248) and a
   nomogram. High-risk individuals had a greater burden of tumor
   mutations, richer immune cell infiltration, and higher expression of
   immune checkpoint genes, than low-risk individuals, indicating a “hot
   tumor" immune phenotype and a more favorable treatment outcome.

Conclusions

   Two circadian-related subtypes of BC were identified and used to
   establish a CRRS prognostic model and nomogram. These will be valuable
   in providing guidance for forecasting prognosis and developing
   personalized treatment plans for BC.

   Keywords: Circadian rhythm, Circadian-related genes, Breast cancer,
   Tumor immune microenvironment, Immunotherapy

1. Introduction

   Breast cancer (BC) is the most prevalent malignancy in women and the
   leading cause of cancer-related deaths. BC incidence is increasing
   annually worldwide. Several epidemiological investigations have linked
   circadian disruptions, such as jet lag, disrupted sleep, and shift
   work, increasing the risk of BC [[35][1], [36][2], [37][3], [38][4]].
   Notably, many studies have verified that the circadian-related genes
   (CRGs) are a factor effecting the initiation, development, and
   prognosis of BC, by regulating physiological processes such as cell
   cycle control, cell proliferation, apoptosis, and DNA damage repair
   [[39][5], [40][6], [41][7], [42][8], [43][9]].

   Recent findings have shown that circadian disruption affects the
   remodeling of the tumor immune microenvironment (TIME) and induces the
   establishment of an immunosuppressive microenvironment [[44][10],
   [45][11], [46][12]]. Historically, BC has been considered an
   immunologically “cold" tumor [[47]13,[48]14]. However, high throughput
   genomic and cellular profiling have indicated that the immunologic
   environment of BC is dynamic and heterogeneous, with a tumor
   microenvironment (TME) comprising unique functional immune infiltrates,
   suggesting potential new immunotherapeutic strategies [[49][15],
   [50][16], [51][17]]. Therefore, targeting analysis of the regulation of
   circadian rhythms on the BC immune microenvironment may provide an
   opportunity to diagnose and treat BC.

   The circadian rhythm is a 24-h endogenous rhythm established by
   organisms to adapt to the Earth's rotation. As an important regulatory
   system for the maintenance of normal cell and tissue homeostasis,
   circadian rhythm plays a key role in processes associated with cancer
   [[52][18], [53][19], [54][20], [55][21]]. This rhythm depends on the
   production of circadian genes and proteins regulated by the circadian
   clock. These genes can be specifically divided into core clock genes
   and clock control genes that interact with each other and affect the
   operation and expression of the circadian system [[56][22], [57][23],
   [58][24]]. Dysregulated or mutated circadian genes affect cancer
   development through complex and precise regulatory mechanisms,
   including regulation of the cell cycle, changes in metabolism,
   epithelial–mesenchymal transition, proliferation, and metastasis
   [[59]5,[60]25,[61]26]. For example, the overexpression of CLOCK and
   BMAL1 in BC tissues promotes the proliferation and invasion of BC cells
   [[62][27], [63][28], [64][29]]. Mutations in NPAS2 are associated with
   BC risk [[65]30,[66]31].

   Studies have shown that there is a strong association between CRGs and
   immune checkpoint inhibitors that can affect both the infiltration of
   immune cells into tumor patients and the TME [[67]32]. Using a
   chronotherapy strategy to improve the benefit of immune checkpoint
   inhibitors may be sufficient to advance BC immunotherapy. In this
   strategy, CRGs could be a promising new set of biomarkers for BC risk
   and prognosis. Further research on CRGs is needed to realize precise
   and personalized medical regimens for BC.

   In previous studies, Zhang et al. constructed a risk prediction model
   to classify BC patients into high- and low-risk groups through CRGs,
   which laid a certain foundation for our study [[68]33]. We included
   more abundant CRGs and first identified the presence of different
   circadian subtypes in BC by consensus cluster analysis, which provided
   a reliable basis for model construction. A larger validation dataset
   was used, and after re-statistical analysis, our BC risk prediction
   model was constructed to distinguish between high-risk and low-risk
   groups of BC. The immune microenvironmental landscape of BC was mapped
   for a new perspective on circadian rhythms. In addition, the variance
   in gene mutations and biological processes between patients with
   different prognostic risks was further elucidated, which might provide
   new strategies for individualized healthcare.

2. Materials and methods

2.1. Data sources

   The Cancer Genome Atlas (TCGA, [69]https://tcga-data.nci.nih.gov/tcga/)
   and Molecular Taxonomy of Breast Cancer International Consortium
   (METABRIC, [70]http://www.cbioportal.org) databases were used to
   describe the transcriptome and clinical characteristics of patients
   with BC. TCGA data were used for training purposes. METABRIC data were
   used as a validation set. To process the data, first, we removed the
   normal samples from the dataset. Then, we screened only those samples
   for which the survival time was >30 days, retaining only one sample per
   patient. Finally, we preserved genes that were detected in over fifty
   percent of the samples. CRGs were obtained from GeneCards [[71]34]
   ([72]http://www.genecards.org/) and MSigDB
   ([73]http://software.broadinstitute.org/gsea/msigdb/index.jsp/). Local
   ethics committee approval was not required because the data were
   obtained from public databases (TCGA and METABRIC). The study strictly
   adhered to the data access policy and publication guidelines provided
   by TCGA and METABRIC [[74]35,[75]36].

2.2. Construction of circadian subtypes in BC

   According to the expression of CRGs, the “ConsensusClusterPlus" was
   performed [[76]37]. It could identify different circadian associated
   subtypes in BC. To assess the capacity of mRNA expression profiles in
   discriminating between the two types associated with the circadian
   rhythm, we conducted principal component analysis (PCA). The R packages
   “survival” and “survminer” were used to conduct survival analysis. For
   the identification of differentially expressed genes (DEGs) between the
   two isoforms, the R package “Limma” was used [[77]38]. Technical
   support for the Sankey diagram was provided by the SangerBox platform
   ([78]http://sangerbox.com/). Additionally, the association between
   different circadian-related clusters and clinical features, including
   age, survival status, and stage, was determined using the R package
   “pheatmap”.

2.3. Enrichment analysis and immune landscape of circadian subtypes

   In order to elucidate disparities in biological processes that exist in
   the two identified circadian subtypes, we performed gene set variation
   analysis (GSVA) using the “GSVA” R package [[79]39]. Subsequently, DEGs
   were analyzed using Kyoto Encyclopedia of Genes and Genomes (KEGG),
   gene ontology (GO), and gene set enrichment analysis (GSEA) using
   “Clusterprofiler” [[80][40], [81][41], [82][42]]. We looked more
   closely at how the infiltrating immune cells differed between the two
   subtypes with the help of “GSVA” and “GSEABase” R packages. The
   “ESTIMATE” calculated the immune, stromal, and estimated scores for the
   patients with BC [[83]43]. To increase the precision of anticipating,
   the tumor immune dysfunction and exclusion (TIDE) tool
   ([84]http://tide.dfci.harvard.edu/) was used to compare TIDE,
   microsatellite instability (MSI), dysfunction, and exclusion scores
   amongst various subtypes [[85]44]. In addition, the “pRRophetic” R
   package screened commonly used drugs for BC to determine the potential
   sensitivity of chemotherapy and endocrine therapy drugs for different
   subtypes [[86]45].

2.4. Construction and validation of circadian-related risk score (CRRS) model
in BC

   In breast cancer, core prognostic genes influence patient prognosis and
   immune function by regulating circadian rhythms. When the expression of
   these genes is dysregulated, circadian rhythm disruption may result. To
   investigate prognostic CRGs, univariate Cox regression analysis was
   performed utilizing the “survival”. These prognostic CRGs were further
   analyzed. The “glmnet” was selected to perform a least absolute
   shrinkage and selection operator (LASSO) penalized Cox regression
   analysis [[87]46]. Our goal is to identify genes that are important in
   predicting prognosis. The product of the expression of core
   prognosis-related CRGs and their coefficients yielded the CRRS. The
   formula for the risk score has been established as follows:
   [MATH: <mrow><mtext>CRRS</mtext><mo
   linebreak="badbreak">=</mo><mrow><munder><mo>∑</mo><mi>i</mi></munder><
   mrow><mtext>Coefficient</mtext><mspace
   width="0.25em"></mspace><mtext>of</mtext><mspace
   width="0.25em"></mspace><mrow><mo stretchy="false">(</mo><mi
   mathvariant="normal">i</mi><mo stretchy="false">)</mo></mrow><mo
   linebreak="badbreak">×</mo><mtext>Expression</mtext><mspace
   width="0.25em"></mspace><mtext>of</mtext><mspace
   width="0.25em"></mspace><mtext>gene</mtext><mspace
   width="0.25em"></mspace><mrow><mo stretchy="false">(</mo><mi
   mathvariant="normal">i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mrow><mtext>.</mtext></mrow>
   :MATH]

   The coefficient of the gene (i) is the regression coefficient of the
   gene (i), and the expression of the gene (i) is the expression value of
   gene (i) for each patient. Using the risk score formulation described
   above, patients in each cohort were classified into high- or low-risk
   groups using the median risk score as the cutoff value. The impact of
   multiple key CRGs on breast cancer prognosis was fully considered by
   CRRS, which was calculated to differentiate different risk groups, and
   to some extent visualized the status of circadian rhythm disruption in
   patients. High-risk patients had dysregulated expression of prognostic
   core genes, disrupted circadian rhythms, and worse prognosis compared
   to low-risk patients. Simultaneously, PCA and t-distributed stochastic
   neighbor embedding analyses were used to verify the reliability of the
   CRRS model in discriminating between different risk groups using the
   “Rtsne” R package. To check the reliability of the prediction model,
   the receiver operating characteristic (ROC) curve was plotted in the
   “survivalROC” R package. C-index was calculated by the R package
   “survcomp”. Immunohistochemistry staining of prognostic CRGs was
   validated using the Human Protein Atlas (HPA;
   [88]http://www.proteinatlas.org/) [[89]47]. Risk scores, clinical
   events and model genes were presented using the SangerBox online
   platform ([90]http://sangerbox.com/) [[91]48].

2.5. Construction of the nomogram prediction model

   Based on TCGA, the survival time, survival status, and other clinical
   characteristics of breast cancer patients were sorted out. Using the R
   packages “survival", “regplot” and “rms”, a nomogram was created that
   is suitable for clinical use. After multifactorial Cox regression
   analysis, we can assess the extent to each independent variable
   (influencing factor) contributes to the outcome variables (e.g.,
   survival time, survival status, etc.), i.e., the magnitude of the
   regression coefficients. The nomogram takes into account factors such
   as age, clinical stage and risk score. Points were assigned according
   to the level at which each independent variable was taken, and these
   scores were then summed to obtain a total points. To visualize these
   predictors, we integrated them and used scaled line segments, plotted
   on the same plane at a certain scale. Clinicians can use this intuitive
   linear scale to predict a patient's probability of survival at
   different time points, specifically 1-, 3-, and 5-year. Furthermore,
   the ROC curve and decision analysis (DCA) are also applied employing
   the “rmda” R package [[92]49].

2.6. Mutation analysis

   Mutational information were retrieved from TCGA and cBioPortal for
   Cancer Genomics ([93]http://www.cbioportal.org) databases. To evaluate
   differences in the percentage of genomic modifications, mutation count,
   and TMB, the “Maftools” was used [[94]50]. The median TMB was used to
   stratify the specimens. Then, for the purpose of survival analysis, we
   combined the risk score with the TMB score. In addition, the
   relationship between risk score and TMB was investigated. Information
   from the Gene Set Cancer Analysis (GSCA) website
   ([95]http://bioinfo.life.hust.edu.cn/GSCA/) was used to identify
   mutations in core prognostic genes [[96]51].

2.7. Functional enrichment analysis and evaluation of TIME

   Initially, GO and KEGG analyses and GSEA were performed on DEGs between
   different risk groups. Based on potential gene signatures and the
   amount of each type of immune cell, the study examined the association
   between people at high risk and those at low risk using CIBERSORT.
   Using Spearman's correlation analysis, the association between
   signature genes and various immune cells was examined. The GSCA
   database was used to extract data on GSVA scores for these genes. This
   was then compared with immune functions, copy number variants (CNVs),
   and single nucleotide site variants (SNVs). Finally, variations in
   immune checkpoint gene expression and TIDE scores were compared between
   the high-risk and low-risk groups.

2.8. Statistical analyses

   R software (version 4.0.5) was used to conduct statistical analysis.
   Groups were compared for differences in overall survival (OS) times
   using a bilateral log-rank test and a survival curve was constructed.
   To determine the P-values for the correlation study, Spearman's method
   was applied. More than two groups were compared using the
   Kruskal-Wallis test, whereas two groups were compared using the
   Wilcoxon test. We used the R q value function (included in the R
   package “q value”) to compute Q-values in order to solve the problem of
   multiple hypothesis testing [[97]52]. When P < 0.05, the results were
   deemed statistically significant.

3. Results

3.1. Identification of circadian subtypes in BC

   In the end, there were 1823 samples in the validation set and 883
   samples in the training set. After gathering 2796 and 343 CRGs from
   GeneCards and MSigDB, respectively, duplicates were removed from the
   union of these sets. By intersecting the resulting set with the genes
   of patients with BC in TCGA, we identified 2655 CRGs. Unsupervised
   clustering with clusters 1 and 2 (512 and 371 cases, respectively)
   included in the training set was used to separate the two distinct
   circadian-related subpopulations based on CRG expression levels
   ([98]Fig. 1A). PCA divided patients into two groups, confirming the
   presence of two diverse circadian-related subtypes (Supplemental Fig.
   1A). To determine the variations in survival between clusters, BC
   cohorts in TCGA and METABRIC with relatively complete data on survival
   were examined. The survival advantage of Cluster 2 was greater than
   that of Cluster 1 ([99]Fig. 1B, Supplementary Fig. 1B). Subsequently,
   associations between both groups and numerous clinical features were
   assessed ([100]Fig. 1C). In contrast to Cluster 1, Cluster 2 had a
   larger percentage of patients aged ≤55 years and survival outcomes.
   Chi-square test findings revealed significant variations in many
   clinical aspects ([101]Fig. 1D). These findings suggest that CRGs
   influence tumor development. Using a heat map, we identified two
   distinct circadian subtypes based on their transcriptomic profiles
   ([102]Fig. 1D).

Fig. 1.

   [103]Fig. 1
   [104]Open in a new tab

   Consensus clustering of circadian-related genes in BC in TCGA. (A) TCGA
   cohort patient consensus matrix. (B) KM curve for OS with circadian
   classes. (C) Alluvial diagram showing changes in age, circadian
   cluster, and survival status. (D) Unsupervised clustering of all
   circadian-related genes.

3.2. Immune landscape of circadian subtypes

   GSVA was used to ascertain how the pathway enrichment analysis differed
   between the two circadian clusters in the TCGA cohort. According to the
   enrichment map, Cluster 1 was mainly focused on metabolic pathways and
   Cluster 2 was significantly enriched in immune and inflammatory
   pathways (e.g., chemokine, T-cell receptor, B-cell receptor, and
   Toll-like receptor) ([105]Fig. 2A). KEGG and GO analyses of the DEGs
   between the two circadian subtypes also revealed enrichment in immune
   function pathways ([106]Fig. 2B and C). With this finding further
   confirmed by GSEA, Cluster 2 was significantly enriched in T-cell
   receptor, cell adhesion molecules, and chemokine signaling pathways
   ([107]Fig. 3A and B). The ESTIMATE algorithm was used to further
   analyze the immune microenvironment of the two subtypes of BC by
   generating stromal, immune, and estimated scores. Cluster 2 had higher
   scores than Cluster 1 in all categories ([108]Fig. 3C). Finally, we
   tested whether circadian subtypes were significantly correlated with
   the effects of immunotherapy. TIDE analysis revealed that Cluster 2 was
   associated with higher TIDE, MSI, and T-cell dysfunction scores than
   Cluster 1, while the opposite was observed for the T-cell exclusion
   scores ([109]Fig. 3D). Due to the intriguing findings, we conducted
   further analysis on the correlation between different clusters and
   cancer-associated fibroblasts (CAFs). Additionally, Cluster 1 CAF
   signature genes with elevated expression levels were included in the
   investigation (Supplementary Fig. 2). Upon comparing the immune cell
   infiltration of both kinds, Cluster 2 had a unique immunological
   profile. Cluster 2 showed increased infiltration of natural killer
   cells and activated dendritic cells compared to Cluster 1 ([110]Fig.
   3E). Human leukocyte antigen (HLA) and immune checkpoint molecules play
   crucial roles in immune function and have significant clinical
   immunotherapy implications. Therefore, we also compared the expression
   levels of HLA and immune checkpoint genes among the different subtypes.
   Compared to Cluster 1, Cluster 2 has higher levels of HLA molecule
   expression ([111]Fig. 3F). Comparative analysis of immune checkpoint
   gene expression suggested that Cluster 2 may have a more positive
   response to immunotherapy against the CTLA4, CD274, LAG3, TIGIT,
   HAVCR2, PDCD1, and PDCD1LG2 immune checkpoints ([112]Fig. 3G).
   Furthermore, the prediction of sensitivity to chemotherapeutic drugs in
   different subtypes was performed in this study (Supplementary Fig. 3).
   Among them, cyclophosphamide, paclitaxel, vinorelbine, gemcitabine,
   cisplatin, 5-fluorouracil, epirubicin and docetaxel, which are commonly
   used in breast cancer chemotherapy, showed more benefit in Cluster 1
   (Supplementary Fig. 3A). Cluster 1 also has advantages in endocrine
   therapy, such as tamoxifen and fulvestrant (Supplementary Fig. 3B).
   Overall, our findings suggest that Cluster 2 patients, with increased
   HLA molecule expression and a unique immunological profile, may respond
   better to immunotherapy targeting specific immune checkpoints. These
   findings further underline the importance of understanding the role of
   the TIME in circadian-related disorders and their potential
   implications for therapeutic strategies.

Fig. 2.

   [113]Fig. 2
   [114]Open in a new tab

   Functional annotation analysis of DEGs between different circadian
   subtypes in TCGA. (A) GSVA data of the biological pathways.
   Immunologically relevant pathways: highlight in bold red font. Blue:
   inhibition of biological pathways. Red: activated biological pathways.
   (B) The analysis of KEGG enrichment in two circadian subtypes. (C) GO
   analysis of differentially expressed genes between two circadian
   subtypes. (For interpretation of the references to colour in this