Abstract Background Circadian rhythm is an internal timing system generated by circadian-related genes (CRGs). Disruption in this rhythm has been associated with a heightened risk of breast cancer (BC) and regulation of the immune microenvironment of tumors. This study aimed to investigate the clinical significance of CRGs in BC and the immune microenvironment. Methods CRGs were identified using the GeneCards and MSigDB databases. Through unsupervised clustering, we identified two circadian-related subtypes in patients with BC. We constructed a prognostic model and nomogram for circadian-related risk scores using LASSO and Cox regression analyses. Using multi-omics analysis, the mutation profile and immunological microenvironment of tumors were investigated, and the immunotherapy response in different groups of patients was predicted based on their risk strata. Results The two circadian-related subtypes of BC that were identified differed significantly in their prognoses, clinical characteristics, and tumor immune microenvironments. Subsequently, we constructed a circadian-related risk score (CRRS) model containing eight signatures (SIAH2, EZR, GSN, TAGLN2, PRDX1, MCM4, EIF4EBP1, and CD248) and a nomogram. High-risk individuals had a greater burden of tumor mutations, richer immune cell infiltration, and higher expression of immune checkpoint genes, than low-risk individuals, indicating a “hot tumor" immune phenotype and a more favorable treatment outcome. Conclusions Two circadian-related subtypes of BC were identified and used to establish a CRRS prognostic model and nomogram. These will be valuable in providing guidance for forecasting prognosis and developing personalized treatment plans for BC. Keywords: Circadian rhythm, Circadian-related genes, Breast cancer, Tumor immune microenvironment, Immunotherapy 1. Introduction Breast cancer (BC) is the most prevalent malignancy in women and the leading cause of cancer-related deaths. BC incidence is increasing annually worldwide. Several epidemiological investigations have linked circadian disruptions, such as jet lag, disrupted sleep, and shift work, increasing the risk of BC [[35][1], [36][2], [37][3], [38][4]]. Notably, many studies have verified that the circadian-related genes (CRGs) are a factor effecting the initiation, development, and prognosis of BC, by regulating physiological processes such as cell cycle control, cell proliferation, apoptosis, and DNA damage repair [[39][5], [40][6], [41][7], [42][8], [43][9]]. Recent findings have shown that circadian disruption affects the remodeling of the tumor immune microenvironment (TIME) and induces the establishment of an immunosuppressive microenvironment [[44][10], [45][11], [46][12]]. Historically, BC has been considered an immunologically “cold" tumor [[47]13,[48]14]. However, high throughput genomic and cellular profiling have indicated that the immunologic environment of BC is dynamic and heterogeneous, with a tumor microenvironment (TME) comprising unique functional immune infiltrates, suggesting potential new immunotherapeutic strategies [[49][15], [50][16], [51][17]]. Therefore, targeting analysis of the regulation of circadian rhythms on the BC immune microenvironment may provide an opportunity to diagnose and treat BC. The circadian rhythm is a 24-h endogenous rhythm established by organisms to adapt to the Earth's rotation. As an important regulatory system for the maintenance of normal cell and tissue homeostasis, circadian rhythm plays a key role in processes associated with cancer [[52][18], [53][19], [54][20], [55][21]]. This rhythm depends on the production of circadian genes and proteins regulated by the circadian clock. These genes can be specifically divided into core clock genes and clock control genes that interact with each other and affect the operation and expression of the circadian system [[56][22], [57][23], [58][24]]. Dysregulated or mutated circadian genes affect cancer development through complex and precise regulatory mechanisms, including regulation of the cell cycle, changes in metabolism, epithelial–mesenchymal transition, proliferation, and metastasis [[59]5,[60]25,[61]26]. For example, the overexpression of CLOCK and BMAL1 in BC tissues promotes the proliferation and invasion of BC cells [[62][27], [63][28], [64][29]]. Mutations in NPAS2 are associated with BC risk [[65]30,[66]31]. Studies have shown that there is a strong association between CRGs and immune checkpoint inhibitors that can affect both the infiltration of immune cells into tumor patients and the TME [[67]32]. Using a chronotherapy strategy to improve the benefit of immune checkpoint inhibitors may be sufficient to advance BC immunotherapy. In this strategy, CRGs could be a promising new set of biomarkers for BC risk and prognosis. Further research on CRGs is needed to realize precise and personalized medical regimens for BC. In previous studies, Zhang et al. constructed a risk prediction model to classify BC patients into high- and low-risk groups through CRGs, which laid a certain foundation for our study [[68]33]. We included more abundant CRGs and first identified the presence of different circadian subtypes in BC by consensus cluster analysis, which provided a reliable basis for model construction. A larger validation dataset was used, and after re-statistical analysis, our BC risk prediction model was constructed to distinguish between high-risk and low-risk groups of BC. The immune microenvironmental landscape of BC was mapped for a new perspective on circadian rhythms. In addition, the variance in gene mutations and biological processes between patients with different prognostic risks was further elucidated, which might provide new strategies for individualized healthcare. 2. Materials and methods 2.1. Data sources The Cancer Genome Atlas (TCGA, [69]https://tcga-data.nci.nih.gov/tcga/) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC, [70]http://www.cbioportal.org) databases were used to describe the transcriptome and clinical characteristics of patients with BC. TCGA data were used for training purposes. METABRIC data were used as a validation set. To process the data, first, we removed the normal samples from the dataset. Then, we screened only those samples for which the survival time was >30 days, retaining only one sample per patient. Finally, we preserved genes that were detected in over fifty percent of the samples. CRGs were obtained from GeneCards [[71]34] ([72]http://www.genecards.org/) and MSigDB ([73]http://software.broadinstitute.org/gsea/msigdb/index.jsp/). Local ethics committee approval was not required because the data were obtained from public databases (TCGA and METABRIC). The study strictly adhered to the data access policy and publication guidelines provided by TCGA and METABRIC [[74]35,[75]36]. 2.2. Construction of circadian subtypes in BC According to the expression of CRGs, the “ConsensusClusterPlus" was performed [[76]37]. It could identify different circadian associated subtypes in BC. To assess the capacity of mRNA expression profiles in discriminating between the two types associated with the circadian rhythm, we conducted principal component analysis (PCA). The R packages “survival” and “survminer” were used to conduct survival analysis. For the identification of differentially expressed genes (DEGs) between the two isoforms, the R package “Limma” was used [[77]38]. Technical support for the Sankey diagram was provided by the SangerBox platform ([78]http://sangerbox.com/). Additionally, the association between different circadian-related clusters and clinical features, including age, survival status, and stage, was determined using the R package “pheatmap”. 2.3. Enrichment analysis and immune landscape of circadian subtypes In order to elucidate disparities in biological processes that exist in the two identified circadian subtypes, we performed gene set variation analysis (GSVA) using the “GSVA” R package [[79]39]. Subsequently, DEGs were analyzed using Kyoto Encyclopedia of Genes and Genomes (KEGG), gene ontology (GO), and gene set enrichment analysis (GSEA) using “Clusterprofiler” [[80][40], [81][41], [82][42]]. We looked more closely at how the infiltrating immune cells differed between the two subtypes with the help of “GSVA” and “GSEABase” R packages. The “ESTIMATE” calculated the immune, stromal, and estimated scores for the patients with BC [[83]43]. To increase the precision of anticipating, the tumor immune dysfunction and exclusion (TIDE) tool ([84]http://tide.dfci.harvard.edu/) was used to compare TIDE, microsatellite instability (MSI), dysfunction, and exclusion scores amongst various subtypes [[85]44]. In addition, the “pRRophetic” R package screened commonly used drugs for BC to determine the potential sensitivity of chemotherapy and endocrine therapy drugs for different subtypes [[86]45]. 2.4. Construction and validation of circadian-related risk score (CRRS) model in BC In breast cancer, core prognostic genes influence patient prognosis and immune function by regulating circadian rhythms. When the expression of these genes is dysregulated, circadian rhythm disruption may result. To investigate prognostic CRGs, univariate Cox regression analysis was performed utilizing the “survival”. These prognostic CRGs were further analyzed. The “glmnet” was selected to perform a least absolute shrinkage and selection operator (LASSO) penalized Cox regression analysis [[87]46]. Our goal is to identify genes that are important in predicting prognosis. The product of the expression of core prognosis-related CRGs and their coefficients yielded the CRRS. The formula for the risk score has been established as follows: [MATH: CRRS=i< mrow>Coefficientof(i)×Expressionofgene(i). :MATH] The coefficient of the gene (i) is the regression coefficient of the gene (i), and the expression of the gene (i) is the expression value of gene (i) for each patient. Using the risk score formulation described above, patients in each cohort were classified into high- or low-risk groups using the median risk score as the cutoff value. The impact of multiple key CRGs on breast cancer prognosis was fully considered by CRRS, which was calculated to differentiate different risk groups, and to some extent visualized the status of circadian rhythm disruption in patients. High-risk patients had dysregulated expression of prognostic core genes, disrupted circadian rhythms, and worse prognosis compared to low-risk patients. Simultaneously, PCA and t-distributed stochastic neighbor embedding analyses were used to verify the reliability of the CRRS model in discriminating between different risk groups using the “Rtsne” R package. To check the reliability of the prediction model, the receiver operating characteristic (ROC) curve was plotted in the “survivalROC” R package. C-index was calculated by the R package “survcomp”. Immunohistochemistry staining of prognostic CRGs was validated using the Human Protein Atlas (HPA; [88]http://www.proteinatlas.org/) [[89]47]. Risk scores, clinical events and model genes were presented using the SangerBox online platform ([90]http://sangerbox.com/) [[91]48]. 2.5. Construction of the nomogram prediction model Based on TCGA, the survival time, survival status, and other clinical characteristics of breast cancer patients were sorted out. Using the R packages “survival", “regplot” and “rms”, a nomogram was created that is suitable for clinical use. After multifactorial Cox regression analysis, we can assess the extent to each independent variable (influencing factor) contributes to the outcome variables (e.g., survival time, survival status, etc.), i.e., the magnitude of the regression coefficients. The nomogram takes into account factors such as age, clinical stage and risk score. Points were assigned according to the level at which each independent variable was taken, and these scores were then summed to obtain a total points. To visualize these predictors, we integrated them and used scaled line segments, plotted on the same plane at a certain scale. Clinicians can use this intuitive linear scale to predict a patient's probability of survival at different time points, specifically 1-, 3-, and 5-year. Furthermore, the ROC curve and decision analysis (DCA) are also applied employing the “rmda” R package [[92]49]. 2.6. Mutation analysis Mutational information were retrieved from TCGA and cBioPortal for Cancer Genomics ([93]http://www.cbioportal.org) databases. To evaluate differences in the percentage of genomic modifications, mutation count, and TMB, the “Maftools” was used [[94]50]. The median TMB was used to stratify the specimens. Then, for the purpose of survival analysis, we combined the risk score with the TMB score. In addition, the relationship between risk score and TMB was investigated. Information from the Gene Set Cancer Analysis (GSCA) website ([95]http://bioinfo.life.hust.edu.cn/GSCA/) was used to identify mutations in core prognostic genes [[96]51]. 2.7. Functional enrichment analysis and evaluation of TIME Initially, GO and KEGG analyses and GSEA were performed on DEGs between different risk groups. Based on potential gene signatures and the amount of each type of immune cell, the study examined the association between people at high risk and those at low risk using CIBERSORT. Using Spearman's correlation analysis, the association between signature genes and various immune cells was examined. The GSCA database was used to extract data on GSVA scores for these genes. This was then compared with immune functions, copy number variants (CNVs), and single nucleotide site variants (SNVs). Finally, variations in immune checkpoint gene expression and TIDE scores were compared between the high-risk and low-risk groups. 2.8. Statistical analyses R software (version 4.0.5) was used to conduct statistical analysis. Groups were compared for differences in overall survival (OS) times using a bilateral log-rank test and a survival curve was constructed. To determine the P-values for the correlation study, Spearman's method was applied. More than two groups were compared using the Kruskal-Wallis test, whereas two groups were compared using the Wilcoxon test. We used the R q value function (included in the R package “q value”) to compute Q-values in order to solve the problem of multiple hypothesis testing [[97]52]. When P < 0.05, the results were deemed statistically significant. 3. Results 3.1. Identification of circadian subtypes in BC In the end, there were 1823 samples in the validation set and 883 samples in the training set. After gathering 2796 and 343 CRGs from GeneCards and MSigDB, respectively, duplicates were removed from the union of these sets. By intersecting the resulting set with the genes of patients with BC in TCGA, we identified 2655 CRGs. Unsupervised clustering with clusters 1 and 2 (512 and 371 cases, respectively) included in the training set was used to separate the two distinct circadian-related subpopulations based on CRG expression levels ([98]Fig. 1A). PCA divided patients into two groups, confirming the presence of two diverse circadian-related subtypes (Supplemental Fig. 1A). To determine the variations in survival between clusters, BC cohorts in TCGA and METABRIC with relatively complete data on survival were examined. The survival advantage of Cluster 2 was greater than that of Cluster 1 ([99]Fig. 1B, Supplementary Fig. 1B). Subsequently, associations between both groups and numerous clinical features were assessed ([100]Fig. 1C). In contrast to Cluster 1, Cluster 2 had a larger percentage of patients aged ≤55 years and survival outcomes. Chi-square test findings revealed significant variations in many clinical aspects ([101]Fig. 1D). These findings suggest that CRGs influence tumor development. Using a heat map, we identified two distinct circadian subtypes based on their transcriptomic profiles ([102]Fig. 1D). Fig. 1. [103]Fig. 1 [104]Open in a new tab Consensus clustering of circadian-related genes in BC in TCGA. (A) TCGA cohort patient consensus matrix. (B) KM curve for OS with circadian classes. (C) Alluvial diagram showing changes in age, circadian cluster, and survival status. (D) Unsupervised clustering of all circadian-related genes. 3.2. Immune landscape of circadian subtypes GSVA was used to ascertain how the pathway enrichment analysis differed between the two circadian clusters in the TCGA cohort. According to the enrichment map, Cluster 1 was mainly focused on metabolic pathways and Cluster 2 was significantly enriched in immune and inflammatory pathways (e.g., chemokine, T-cell receptor, B-cell receptor, and Toll-like receptor) ([105]Fig. 2A). KEGG and GO analyses of the DEGs between the two circadian subtypes also revealed enrichment in immune function pathways ([106]Fig. 2B and C). With this finding further confirmed by GSEA, Cluster 2 was significantly enriched in T-cell receptor, cell adhesion molecules, and chemokine signaling pathways ([107]Fig. 3A and B). The ESTIMATE algorithm was used to further analyze the immune microenvironment of the two subtypes of BC by generating stromal, immune, and estimated scores. Cluster 2 had higher scores than Cluster 1 in all categories ([108]Fig. 3C). Finally, we tested whether circadian subtypes were significantly correlated with the effects of immunotherapy. TIDE analysis revealed that Cluster 2 was associated with higher TIDE, MSI, and T-cell dysfunction scores than Cluster 1, while the opposite was observed for the T-cell exclusion scores ([109]Fig. 3D). Due to the intriguing findings, we conducted further analysis on the correlation between different clusters and cancer-associated fibroblasts (CAFs). Additionally, Cluster 1 CAF signature genes with elevated expression levels were included in the investigation (Supplementary Fig. 2). Upon comparing the immune cell infiltration of both kinds, Cluster 2 had a unique immunological profile. Cluster 2 showed increased infiltration of natural killer cells and activated dendritic cells compared to Cluster 1 ([110]Fig. 3E). Human leukocyte antigen (HLA) and immune checkpoint molecules play crucial roles in immune function and have significant clinical immunotherapy implications. Therefore, we also compared the expression levels of HLA and immune checkpoint genes among the different subtypes. Compared to Cluster 1, Cluster 2 has higher levels of HLA molecule expression ([111]Fig. 3F). Comparative analysis of immune checkpoint gene expression suggested that Cluster 2 may have a more positive response to immunotherapy against the CTLA4, CD274, LAG3, TIGIT, HAVCR2, PDCD1, and PDCD1LG2 immune checkpoints ([112]Fig. 3G). Furthermore, the prediction of sensitivity to chemotherapeutic drugs in different subtypes was performed in this study (Supplementary Fig. 3). Among them, cyclophosphamide, paclitaxel, vinorelbine, gemcitabine, cisplatin, 5-fluorouracil, epirubicin and docetaxel, which are commonly used in breast cancer chemotherapy, showed more benefit in Cluster 1 (Supplementary Fig. 3A). Cluster 1 also has advantages in endocrine therapy, such as tamoxifen and fulvestrant (Supplementary Fig. 3B). Overall, our findings suggest that Cluster 2 patients, with increased HLA molecule expression and a unique immunological profile, may respond better to immunotherapy targeting specific immune checkpoints. These findings further underline the importance of understanding the role of the TIME in circadian-related disorders and their potential implications for therapeutic strategies. Fig. 2. [113]Fig. 2 [114]Open in a new tab Functional annotation analysis of DEGs between different circadian subtypes in TCGA. (A) GSVA data of the biological pathways. Immunologically relevant pathways: highlight in bold red font. Blue: inhibition of biological pathways. Red: activated biological pathways. (B) The analysis of KEGG enrichment in two circadian subtypes. (C) GO analysis of differentially expressed genes between two circadian subtypes. (For interpretation of the references to colour in this