Abstract

Background
E2F transcription factors are crucial in various biological processes, including cell proliferation, differentiation, and apoptosis. However, the exact role of E2F target genes in breast cancer (BC), as well as their influence on survival and immune response, remains poorly understood.

Methods
To investigate the differential expression of E2F target genes and their relationship with patient prognosis and immune cell infiltration, transcriptomic data from The Cancer Genome Atlas database were analyzed. A risk model was developed to identify genes associated with survival. BC samples were clustered into high-expression (C1) and low-expression (C2) groups of E2F target genes. The correlation between gene expression and factors such as survival, immune cell infiltration (CD4+ and CD8+ T cells), and immune checkpoint ligands (PD-L1 and PD-L2) was analyzed. The link between clusters and clinical characteristics was assessed using the chi-squared test. For further investigation, single-cell data from GSE243526 were utilized. For validation, the expression levels of JPT1 and TBRG4 were assessed using RT-qPCR in clinical samples.

Results
E2F target genes such as AURKB, JPT1, TBRG4, and KIF4A showed increased expression linked to poor patient prognosis, regardless of clinical features. Kaplan-Meier survival analysis revealed that elevated expression of these genes correlated significantly with decreased survival rates and heightened mortality risk. Single-cell data confirmed that candidate genes exhibited higher expression in tumor-associated epithelial cells than in healthy epithelial cells. Furthermore, samples from group C1 exhibited a lower survival rate than those from C2. Immune cell infiltration analysis showed that high expression of E2F target genes in the C1 subgroup was associated with diminished T cell infiltration and increased PD-L1 and PD-L2 expression.
A strong and significant association was also identified between triple-negative breast cancer and the C1 cluster. RT-qPCR validation confirmed a significant elevation of JPT1 and TBRG4 expression levels in BC relative to adjacent healthy tissues.

Conclusion
These findings suggest that E2F target genes, including JPT1 and TBRG4, may act as prognostic biomarkers and contribute to immune evasion in BC. E2F target genes may also offer good potential for classifying and treating patients.

Keywords: E2F transcription factors, Breast cancer, Prognosis, Immune cell infiltration, Gene expression, Single cell

Introduction

E2F transcription factors are crucial for regulating genes essential for cell division, particularly during the transition from G1 to S-phase of the cell cycle [1]. E2F factors change from transcriptional activators to repressors through interactions with the retinoblastoma tumor suppressor protein (RB) and its partner proteins, p107 and p130 [2]. Dysregulation of E2F function has been associated with oncogenesis, highlighting its significance in cancer biology [3]. In addition to their acknowledged functions in the regulation of the cell cycle, E2Fs in breast cancer (BC) serve crucial roles as mediators of tumor growth and metastasis [4]. E2Fs regulate the growth and metastasis of tumors by promoting the expression of genes essential for DNA synthesis, replication, and cell cycle progression [5]. The CDK-RB-E2F pathway is a key regulator of E2Fs and is crucial for controlling gene expression throughout the cell cycle. In the classical model, RB is phosphorylated and inactivated by CDK4/6-cyclin D complexes that are stimulated by mitogenic signals. When RB is phosphorylated, E2F is released, increasing the expression of its target genes [6]. Reports indicate that the expression levels of certain E2F transcription factors are linked to poor prognosis in BC [7].
Some E2F transcription factors have also been linked to immune responses in different cancers, including BC [8, 9]. These findings suggest that the E2F family of transcription factors and their target genes significantly contribute to cancer development. Research has shown that E2F and its target genes play a pivotal role in the development and malignancy of cancers. This study used both in silico data and ex vivo experiments to clarify the function of E2F target genes in BC. We further investigated how their expression relates to patient prognosis and immune cell infiltration.

Materials and methods

Data sources

This study used transcriptomic data for BC from The Cancer Genome Atlas (TCGA) database and focused on identifying changes in the expression of E2F target genes. We downloaded the raw data and performed initial preprocessing following the methodologies detailed in our previous research [10]. The BC dataset from TCGA comprised 113 healthy samples and 1109 cancerous samples. The most recent clinical data were used to assess the clinical characteristics of TCGA cancer samples, such as stage, TNM.T, and TNM.N. BC subtypes, including triple-negative breast cancer (TNBC), Luminal A, Luminal B, and human epidermal growth factor receptor 2 positive (HER2+), were also identified in this analysis. The most recent gene expression profiles and clinical data from TCGA were incorporated into this research. Additionally, single-cell data from the GSE243526 dataset were assessed. This dataset included 12 tumor samples and four healthy samples.

Single-cell data analysis

The GSE243526 data were downloaded in raw format. The Seurat package (V 5.2) and other related packages were used to analyze the single-cell data. The mitochondrial percentage for each cell was calculated based on the expression of mitochondrial genes, and cells with a mitochondrial percentage greater than 10% were removed from the dataset.
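The mitochondrial filter just described can be illustrated with a minimal sketch. It is written in plain Python rather than the Seurat/R workflow the authors used, so it is not their implementation. The `MT-` gene-symbol prefix, the function names, and the toy counts are assumptions made for illustration only.

```python
def mito_percent(counts):
    """Percentage of a cell's total counts that come from mitochondrial genes.

    Assumption: mitochondrial genes are recognised by the 'MT-' symbol prefix.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene.upper().startswith("MT-"))
    return 100.0 * mito / total


def filter_cells(cells, max_mito_percent=10.0):
    """Keep cells whose mitochondrial percentage does not exceed the cutoff."""
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if mito_percent(counts) <= max_mito_percent
    }


# Hypothetical toy counts (gene symbol -> raw count) for two cells.
cells = {
    "cell_a": {"MT-CO1": 5, "ACTB": 60, "KRT8": 35},     # 5% mitochondrial
    "cell_b": {"MT-CO1": 20, "MT-ND1": 10, "ACTB": 70},  # 30% mitochondrial
}
kept = filter_cells(cells)
```

Here only `cell_a` survives the 10% cutoff; `cell_b` would be discarded as a likely damaged or dying cell.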
The data were normalized using the LogNormalize method, followed by scaling based on genes associated with cell proliferation. Significant principal components (PCs) were identified using the JackStraw package (V 1.3.17), and PCs with a p-value less than 0.05 were selected for clustering. The identified clusters were visualized using UMAP. The SingleR package (V 3.21) was used to determine the cell type in each cluster. Using the FindMarkers function, markers specific to each cluster were identified and manually validated against the CellMarker and Azimuth databases. The manual results were integrated with the outcomes from SingleR. Epithelial cells were chosen for downstream analysis because of their primary role in BC. We calculated the expression differences of candidate genes in epithelial cells from cancer samples versus those from healthy tissue.

Prognosis and risk assessment

Clinical data preprocessing followed established methodologies from previous studies [10]. Survival analyses were conducted using the survival package (V 3.8). Univariate Cox regression identified the association between candidate gene expression and patient prognosis. Furthermore, multivariate Cox regression analysis assessed whether the link between candidate gene expression and patient mortality remained independent of clinical characteristics. Risk scores based on the expression of candidate genes were calculated using the following formula:

Risk score = Exp(gene1) × Beta(gene1) + Exp(gene2) × Beta(gene2) + …

where Exp(gene) is the expression level of a gene and Beta(gene) is its Cox regression coefficient. Kaplan-Meier survival curve analysis was used to confirm the link between candidate gene expression, especially elevated levels of E2F target genes, and patient mortality rates.

Clustering and differential expression

BC samples from the TCGA database were divided into two categories based on E2F target gene expression: high-expression (C1) and low-expression (C2). The clustering and related analyses were conducted using the cluster (V 2.1.8) and NbClust (V 3.0.1) packages.
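As a concrete illustration of the risk score formula given in the prognosis and risk assessment subsection, the following sketch computes a weighted sum of expression values. It uses Python for illustration only; the authors worked in R. The beta values are the multivariate coefficients listed in Table 2. The patient expression values are hypothetical, and the median split into high- and low-risk groups is an assumption, since the text does not state which cutoff was used.

```python
from statistics import median

# Beta values as reported in Table 2 (multivariate Cox regression).
BETAS = {"AURKB": 0.38, "JPT1": 0.31, "TBRG4": 0.29, "KIF4A": 0.27}


def risk_score(expression, betas=BETAS):
    """Sum of expression x beta over the genes in the model."""
    return sum(expression[gene] * beta for gene, beta in betas.items())


# Hypothetical expression values for three patients (not from the study).
patients = {
    "P1": {"AURKB": 2.0, "JPT1": 1.0, "TBRG4": 1.0, "KIF4A": 1.0},
    "P2": {"AURKB": 5.0, "JPT1": 4.0, "TBRG4": 3.0, "KIF4A": 4.0},
    "P3": {"AURKB": 3.0, "JPT1": 2.0, "TBRG4": 2.0, "KIF4A": 2.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}

# Assumed cutoff: patients strictly above the median score are "high" risk.
cutoff = median(scores.values())
groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

In a real analysis the two groups would then be compared with Kaplan-Meier curves and a log-rank test.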
The k-means clustering algorithm was used for this categorization. The optimal number of clusters was determined with the Elbow method. Each k-means run used 20 iterations, with a maximum of 500 replications per run; increasing these parameters did not affect the clustering. Clustering quality was evaluated using silhouette scores calculated via the silhouette function from the cluster package in R. The silhouette value reflects the consistency of each sample within its cluster, balancing intra- and inter-cluster distances. The average silhouette width was used to summarize clustering performance. To examine the differential expression of E2F target genes, clinical data were used to separate samples into cancerous and healthy groups. Expression differences for all genes in group C1 relative to C2 were also computed. A linear model was used to assess differential expression, and a false discovery rate (FDR) threshold was applied to determine statistical significance.

Candidate genes, pathway enrichment, and immune cell infiltration

To identify E2F target genes, we used the MSigDB database (https://www.gsea-msigdb.org/gsea/msigdb) to extract the E2F target gene set. The KEGG database was used for enrichment analysis and for identifying pathways linked with differentially expressed genes. We used the Estimating the Proportions of Immune and Cancer cells (EPIC) algorithm to calculate each sample's immune cell infiltration levels from the RNA-seq data. Transcriptomic data were analyzed in TPM format. We applied the Wilcoxon test to evaluate the significance of infiltration differences between the C1 and C2 groups. We also assessed the expression levels of two key T-cell inhibitory ligands, Programmed Death-Ligand 1 (PD-L1) and Programmed Death-Ligand 2 (PD-L2), in the C1 and C2 groups.

Sample collection

For this study, 35 BC samples and their corresponding healthy tissues were sourced from the Iran Tumor Bank.
Table 1 summarizes the participants' clinical information. All ethical guidelines and protocols established by the Iranian Ministry of Health and Medical Education were followed. The samples were collected at Imam Khomeini Hospital in Tehran, where the hospital's ethics review board approved all protocols. Informed consent was obtained from every participant, and all samples were preserved in liquid nitrogen until needed.

Table 1. Summary of clinical information of BC patients participating in this study

| Subgroup | Number of samples | Stage I | Stage II | Stage III | Stage IV |
|---|---|---|---|---|---|
| HER2+ | 6 | 0 | 3 | 2 | 1 |
| Luminal A | 13 | 3 | 5 | 4 | 1 |
| Luminal B | 11 | 2 | 8 | 1 | 0 |
| TNBC | 5 | 0 | 2 | 3 | 0 |
| Healthy | 35 | - | - | - | - |

cDNA synthesis, primer design, and RT-qPCR

RNA was extracted from samples using TRIzol reagent per the manufacturer's protocol. RNA quality was assessed by measuring absorbance at 260 and 280 nm. DNA contamination was removed by DNase I (SinaClon, Iran) treatment. cDNA synthesis (Genius Gene, Iran) used oligo-dT primers, random hexamer primers, and reverse transcriptase. Primers were designed using the Primer-BLAST tool. The primer sequences were as follows: JPT1 (F: 5′-GCAGAGGAAGGCTTGGATGT-3′ and R: 5′-GAAGACCCGCTTCAGTGTGA-3′); TBRG4 (F: 5′-AGTACAAGCACCTGGCCTTC-3′ and R: 5′-AGGCGGTTCATTAGTGGCTC-3′); and β-actin (F: 5′-CGAGCACAGAGCCTCGC-3′ and R: 5′-GCGGCGATATCATCATCCAT-3′). Target gene expression levels were evaluated using specific primers and SYBR Green dye. β-actin expression was used as an internal control for normalization. Gene expression levels in each sample were calculated using the 2^−ΔCT method.

Statistics and software

All preprocessing and statistical analyses were conducted using the R programming language (V 4.4.2).
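The 2^−ΔCT normalization used for the RT-qPCR data lends itself to a short worked example. The sketch below is in Python for illustration only, since the authors' analyses were run in R and GraphPad Prism. The Ct values and the function name are hypothetical. ΔCT is taken as the target gene Ct minus the reference gene (β-actin) Ct.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^-dCt method, dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for one tumor sample and its adjacent healthy tissue.
tumor = relative_expression(ct_target=24.0, ct_reference=18.0)    # dCt = 6
healthy = relative_expression(ct_target=25.0, ct_reference=18.0)  # dCt = 7

# Ratio of the two relative expression values for this matched pair.
fold_change = tumor / healthy
```

With these made-up values, the target crosses the threshold one cycle earlier in the tumor sample at the same reference Ct, which corresponds to a twofold higher relative expression.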
False discovery rate (FDR) correction was used to determine statistical significance within the TCGA and single-cell data. For survival-related analyses, the log-rank test assessed significance levels. The sensitivity and specificity of candidate genes in distinguishing group C1 from group C2 were evaluated using receiver operating characteristic (ROC) curves, with the area under the curve (AUC) as the summary measure. The analysis of expression differences, ROC analysis, and visualization of ex vivo data were conducted using GraphPad Prism (V 8.4). A chi-squared test was used to examine the association of identified clusters with clinical characteristics.

Results

Significantly elevated levels of E2F target genes in BC and their link to poor prognosis

Figure 1 illustrates a flowchart outlining the study design and its sequential steps. Using data from the MSigDB database, we identified 200 genes that may serve as E2F targets. Of these, 87 genes showed more than two-fold overexpression in cancerous samples compared with healthy samples in the TCGA data (Fig. 2A, logFC > 1 and FDR < 0.01). Additionally, Cox regression analysis indicated that the expression levels of 31 of the 200 genes were linked to unfavorable patient prognosis (Fig. 2B, HR > 1 and log-rank P < 0.05). Notably, 24 of these genes exhibited both significant overexpression and a connection to poor prognosis (Fig. 2C).

Fig. 1. A flowchart of the overall study process is shown

Fig. 2. (A) Differential expression of E2F target genes in BC samples versus healthy tissues, according to TCGA data. A total of 87 genes exhibited significant overexpression (fold change > 2) in cancerous tissues. (B) The link between E2F target genes and patient prognosis is established.
(C) E2F target genes that were overexpressed and linked to unfavorable prognosis in BC patients were identified

Subsequently, the independent prognostic value of the 24 identified genes was evaluated using multivariate analysis that accounted for clinical characteristics. Among these, AURKB, JPT1, TBRG4, and KIF4A expression levels were independently associated with unfavorable prognosis (Table 2, HR > 1 and log-rank P < 0.05). A risk model was then built from the expression levels of these four genes. The model indicated that higher expression correlates with an elevated mortality rate (Fig. 3A and B, log-rank P = 0.0002). Kaplan-Meier survival curve analysis further validated the link between increased expression of these genes and elevated mortality rates (Fig. 4A–D, log-rank P < 0.05). Thus, these results indicate that E2F target genes could act as potential prognostic biomarkers in BC.

Table 2. Multivariate Cox regression analysis of the 24 candidate E2F target genes and their association with patient prognosis, accounting for clinical parameters

| Parameters | Univariate HR | Univariate P value | Univariate 95% CI | Multivariate HR | Multivariate P value | Multivariate 95% CI | Beta value |
|---|---|---|---|---|---|---|---|
| Pathological Stage (Stage III, IV vs. Stage I, II) | 2.54 | < 0.00001 | 1.65–3.71 | 1.82 | 0.00002 | 1.43–2.93 | 0.73 |
| TNM.T (T3,4 vs. T1,2) | 1.72 | < 0.00001 | 1.13–2.66 | 1.24 | 0.16 | 0.93–1.24 | 0.13 |
| TNM.N (N0 vs. N1,2,3) | 2.21 | < 0.00001 | 1.62–3.13 | 1.52 | 0.001 | 1.21–1.96 | 0.65 |
| Subtype (TNBC vs. HER2+, LumA and LumB) | 0.99 | 0.24 | 0.64–1.51 | - | - | - | - |
| Age (> 60 vs. < 60) | 0.97 | 0.63 | 0.83–1.13 | - | - | - | - |
| AURKB | 1.47 | 0.001 | 1.14–1.73 | 1.37 | 0.01 | 1.09–1.33 | 0.38 |
| JPT1 | 1.43 | 0.001 | 1.08–1.71 | 1.32 | 0.02 | 1.02–1.28 | 0.31 |
| TBRG4 | 1.32 | 0.006 | 1.03–1.64 | 1.24 | 0.03 | 1.01–1.26 | 0.29 |
| KIF4A | 1.36 | 0.004 | 1.01–1.82 | 1.23 | 0.04 | 1.01–1.24 | 0.27 |

Fig. 3. (A and B) Outcomes of the risk model developed using AURKB, JPT1, TBRG4, and KIF4A expression. An increase in risk was associated with an increase in patient mortality rates

Fig. 4. (A–D) Kaplan-Meier curves for the candidate genes. Elevated AURKB, JPT1, TBRG4, and KIF4A expression levels were associated with higher mortality rates

Increased expression of AURKB, JPT1, TBRG4, and KIF4A in epithelial cells derived from cancer samples

Single-cell data provided additional validation. After preprocessing the GSE243526 dataset, we identified nine distinct cell types (Fig. 5A). Subsequently, we focused solely on epithelial cells, since these are the primary source of BC cells. As displayed in Fig. 5B, the expression levels of the candidate genes AURKB, JPT1, TBRG4, and KIF4A were markedly elevated in epithelial cells from cancer samples compared with those from healthy samples (logFC > 1 and FDR < 0.01). These data indicate a marked increase in the expression of E2F target genes in primary cancer-related cells.

Fig. 5. (A) Clustering results are shown for single-cell data and different cell subgroups based on GSE243526 data. (B) Candidate gene expression levels in epithelial cells from cancer samples were assessed relative to healthy samples

Differential gene expression related to cell proliferation and p53 pathways is evident in samples with high E2F target gene expression

All 1,109 TCGA BC samples were clustered according to the 87 E2F target genes identified during the initial analysis. Figure 6A illustrates that the samples in group C1 (N = 568) displayed high expression of these E2F target genes, while group C2 (N = 541) samples showed reduced expression levels. The silhouette analysis for the 2-cluster solution yielded an average silhouette width of 0.34, indicating acceptable clustering quality (Fig. 6B).
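To make the average silhouette width reported above easier to interpret, the sketch below runs a two-cluster k-means on toy data and computes the same summary statistic. It is plain Python for illustration, not the authors' R code (they used the cluster and NbClust packages). The function names, the toy points, the Euclidean distance, and the restart and iteration settings are all assumptions.

```python
import random
from math import dist


def kmeans(points, k=2, n_starts=20, max_iter=500, seed=0):
    """Lloyd's k-means with random restarts; returns the labels of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        centers = rng.sample(points, k)
        for _ in range(max_iter):
            # Assign every point to its nearest center.
            labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
            # Recompute each center as the mean of its members.
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
                else:
                    new_centers.append(centers[j])  # keep the center of an empty cluster
            if new_centers == centers:
                break
            centers = new_centers
        # Within-cluster sum of squares decides which restart is kept.
        wss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or wss < best[0]:
            best = (wss, labels)
    return best[1]


def mean_silhouette(points, labels):
    """Average silhouette width: mean of (b - a) / max(a, b) over all points."""
    widths = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


# Hypothetical toy data: two well-separated groups of samples.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = kmeans(points)
avg_width = mean_silhouette(points, labels)
```

On cleanly separated toy data like this the average width is close to 1. A value such as 0.34 on real expression data signals a usable partition whose clusters still overlap.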
While a subset of samples demonstrated strong cluster membership (silhouette > 0.4), others exhibited lower scores, suggesting partial overlap between clusters. These results reflect the inherent heterogeneity of the dataset and support the choice of k = 2 as the most stable partitioning.

Fig. 6. (A) BC samples were categorized into two groups based on E2F target gene expression: high (C1, N = 568) and low (C2, N = 541). (B) The silhouette score for the clustered samples is shown

Patient survival rates in group C1 were notably lower than in group C2 (Fig. 7A, log-rank P < 0.0001). Furthermore, 435 genes had a logFC > 1 and an FDR < 0.01, indicating differential expression between the C1 and C2 groups (Fig. 7B). Pathway enrichment analysis of these 435 genes highlighted their participation in pathways related to cell proliferation, DNA replication, and p53 signaling (Fig. 7C, FDR < 0.01). These findings indicate that E2F target genes could play a significant role in these crucial pathways, emphasizing their possible involvement in the progression of BC.

The chi-squared test results suggested a significant association between the C1 cluster and clinical features such as Stage, TNM.T, and subtype (Table 3, P < 0.0001). TNM.T4 showed a strong association with cluster C1, occurring at a rate of 64% (Table 3). Among the BC subtypes, TNBC, HER2+, and Luminal B showed the highest association with the C1 cluster, with frequencies of 96%, 89%, and 88%, respectively (Table 3, P < 0.0001). These findings indicate that the elevated expression of E2F target genes is particularly pronounced in the TNBC subgroup, making them potentially more appropriate therapeutic targets in this group.

Fig. 7. (A) The C1 group exhibited significantly lower survival rates than the C2 group, highlighting the prognostic value of E2F target gene expression.
(B) A volcano plot displays all differentially expressed genes from the C1 group compared to C2. (C) Pathway enrichment analysis of the differentially expressed genes in the C1 group identified significant pathways, including cell proliferation, DNA replication, and the p53 signaling pathway. The enrichment of these pathways suggests their potential role in cancer progression

Table 3. Association of clinical features with the clusters defined by E2F target gene expression. A chi-squared test was used for the analysis

| Clinical feature | Category | Number in Cluster C1 (Frequency) | Number in Cluster C2 (Frequency) | χ² | P value |
|---|---|---|---|---|---|
| Stage | Stage I | 64 (35%) | 115 (65%) | 21.15 | 0.0001 |
| | Stage II | 339 (55%) | 276 (45%) | | |
| | Stage III | 128 (52%) | 117 (48%) | | |
| | Stage IV | 9 (56%) | 7 (44%) | | |
| TNM.T | T1 | 100 (37%) | 168 (63%) | 32.54 | 0 |
| | T2 | 353 (58%) | 263 (42%) | | |
| | T3 | 66 (48%) | 72 (52%) | | |
| | T4 | 21 (64%) | 12 (36%) | | |
| Subtype | TNBC | 182 (96%) | 8 (4%) | 545 | 0 |
| | Luminal A | 105 (19%) | 442 (81%) | | |
| | Luminal B | 178 (88%) | 24 (12%) | | |
| | HER2+ | 68 (89%) | 9 (11%) | | |
| TNM.N | N0 | 258 (51%) | 256 (49%) | 8 | 0.7 |
| | N1 | 181 (51%) | 172 (49%) | | |
| | N2 | 68 (53%) | 58 (47%) | | |
| | N3 | 33 (46%) | 39 (54%) | | |

Elevated expression of PD-L1 and PD-L2 in group C1 and decreased T-cell infiltration

The expression levels of the T-cell inhibitory genes PD-L1 and PD-L2 were analyzed in groups C1 and C2. Group C1 showed significantly higher levels of both genes than group C2 (Fig. 8A and B, FDR < 0.01). The EPIC algorithm was used to assess immune cell infiltration in the two subgroups. Cancer-associated fibroblast (CAF) infiltration was significantly lower in the C1 group (Fig. 8C, P < 0.0001). CD4+ T cell and CD8+ T cell infiltration were notably lower in samples from C1 than in those from C2 (Fig. 8D and E, P < 0.001). NK cell infiltration was also lower in the C1 group (Fig. 8F, P < 0.01).
These results imply a potential correlation between E2F target gene expression, increased expression of immune checkpoint ligands, and decreased immune cell infiltration in the tumor microenvironment.

Fig. 8. (A and B) Levels of the immune checkpoint ligands PD-L1 and PD-L2 were analyzed in the C1 (N = 568) and C2 (N = 541) groups. (C–F) Immune cell infiltration was analyzed in the same groups. The C1 group displayed decreased CD4+ and CD8+ T cell infiltration together with increased expression of PD-L1 and PD-L2, suggesting immune evasion in samples with high E2F target expression

Overexpression of JPT1 and TBRG4 in BC

The expression levels of JPT1 and TBRG4 were validated in BC samples relative to adjacent healthy tissues using RT-qPCR. These genes were chosen specifically because they are less explored in BC and have been linked to poorer patient outcomes in prior studies. As illustrated in Fig. 9A and B, the expression of both genes in the TCGA data was notably higher in group C1, with a twofold increase compared to group C2. ROC analysis revealed that JPT1 and TBRG4 expression levels effectively distinguished between groups C1 and C2, with strong sensitivity and specificity (Fig. 9C and D, P < 0.0001). Moreover, RT-qPCR detected substantial overexpression of JPT1 and TBRG4 in BC samples relative to adjacent healthy tissues (Fig. 9E and F, P < 0.001). These findings underscore their potential role as biomarkers for BC prognosis and classification.

Fig. 9. (A and B) The expression levels of JPT1 and TBRG4 were compared between clusters C1 and C2 based on TCGA data. (C and D) ROC curves and AUC values for JPT1 and TBRG4 expression in distinguishing the C1 subgroup from C2 are shown.
(E and F) JPT1 and TBRG4 overexpression in BC tissues (N = 35) was confirmed in comparison with adjacent healthy tissues (N = 35), as reflected in RT-qPCR results

Discussion

This study identified 200 potential E2F target genes using the MSigDB database. In the TCGA data, 87 of these genes exhibited significant overexpression in cancer samples compared with healthy tissues. These genes showed statistically significant upregulation, with a fold change greater than 2. Moreover, Cox regression analysis revealed that the expression levels of 31 genes were associated with a poor prognosis in BC patients. From this subset, 24 genes were selected for further study because of their substantial overexpression and strong association with adverse patient outcomes. Multivariate analysis that included clinical features indicated that the expression levels of the AURKB, JPT1, TBRG4, and KIF4A genes were independently linked to poor prognosis in patients. A risk model using the expression levels of these four genes was developed, showing that their concurrent upregulation was associated with a significantly increased mortality rate among patients. Furthermore, Kaplan-Meier survival analysis confirmed that elevated expression of these genes was associated with higher mortality rates. These results are consistent with earlier research, including a study by Huang et al., who reported that overexpression of AURKB is associated with poor survival in BC [11]. AURKB is a serine/threonine kinase that plays essential roles in cell cycle regulation. Increased expression and activity of AURKB can increase the proliferation and invasion of cancer cells [12]. It has also been shown in bladder cancer that AURKB can regulate p53 activity through MAD2L2 [13]. Additionally, it has been reported that the level of AURKB expression is associated with tumor immune responses in various cancers [14]. JPT1 was found to be overexpressed in a particular subset of BC, leading to heightened cell proliferation [15].
The role of JPT1 as an oncogene in endometrial and prostate cancers has been recognized [16, 17]. TBRG4 may influence the cell cycle by stabilizing regulatory proteins at the transcriptional level, a property attributed to its leucine zipper motif. Increasing its activity increases cell proliferation [18]. Elevated levels of TBRG4 have been observed in hepatocellular carcinoma, lung cancer, and pancreatic cancer, correlating with higher malignancy rates [18–20]. For example, in hepatocellular carcinoma, the knockdown of TBRG4 can decrease the proliferation, migration, and invasion of cancer cells through the TGF-β pathway [20]. KIF4A has been recognized as a prognostic and oncogenic biomarker in BC [21, 22]. KIF4A, a kinesin-4 protein, regulates chromosome condensation and segregation during mitosis [23]. Research has demonstrated that KIF4A influences stemness and metastasis pathways in lung cancer and glioma, with its elevated levels correlating with greater malignancy [24]. Furthermore, studies have demonstrated that reducing KIF4A expression can inhibit BC cell proliferation, migration, and invasion [22]. Fujiwara et al. reported similar findings, observing that high nuclear levels of E2F4 were associated with lower survival rates in BC patients [7]. Additionally, Zhang et al. confirmed that higher expression of E2F1 is linked to poor prognosis in various cancer types, including BC [25]. These studies support our conclusion that E2F target genes, particularly AURKB, JPT1, TBRG4, and KIF4A, could serve as valuable prognostic biomarkers for BC. Single-cell data analyses corroborated these findings.

In this study, we analyzed 1,109 BC samples from the TCGA database, stratifying them by the expression levels of the 87 E2F target genes identified in the initial phase. As shown in Fig. 6A, the C1 group exhibited high expression of E2F target genes, while the C2 group displayed low expression.
Notably, the C1 group had a lower survival rate than the C2 group. Subsequently, we identified 435 genes with significant differential expression (logFC > 1 and FDR < 0.01) between the C1 and C2 groups. Pathway enrichment analysis revealed that these genes are involved in critical cellular processes such as cell proliferation, DNA replication, and the p53 signaling pathway. This suggests that E2F target genes may play a role in these pathways and interact with them. The E2F family of transcription factors regulates genes crucial for cell cycle progression, DNA synthesis, and DNA replication [26]. E2F proteins recruit transcriptional activators to regulate the expression of these genes, thus impacting cell proliferation [27]. The p53 protein, an essential tumor suppressor, can block cell cycle progression by interacting with E2F proteins [28]. Upon DNA damage, p53 activates the transcription of p21, which inhibits cyclin-dependent kinases. This keeps RB in its active form and suppresses E2F-mediated transcription [29]. Our results are consistent with these established mechanisms, suggesting that the increased expression of E2F target genes in the C1 group could interfere with normal cell cycle regulation and the tumor-suppressive functions of p53, leading to a worse prognosis in BC patients [30, 31].

This study highlights the significance of E2F target genes and their therapeutic and prognostic potential in BC. To our knowledge, this is the first report of elevated JPT1 and TBRG4 expression in BC linked to a worse patient prognosis. The findings indicated that the TNBC, HER2+, and Luminal B subgroups exhibited greater expression of the E2F target genes (Cluster C1). Previous studies have also shown that the expression levels of genes related to cell proliferation can be higher in the TNBC subgroup than in other subgroups [10].
Our results also showed that T-cell infiltration may be lower in samples from subgroup C1. While the involvement of E2Fs in immune responses has been indicated in certain cancers, it remains underexplored [8, 32]. This study suggests that E2F target genes may affect the tumor microenvironment and the infiltration of immune cells, positioning E2Fs as potential therapeutic and diagnostic targets for BC.

A limitation of this study is that further in vitro and in vivo experiments are required to clarify the precise biological roles of JPT1 and TBRG4 in BC. Moreover, the retrospective nature of the TCGA and single-cell RNA-seq datasets may introduce inherent selection biases, thereby limiting the generalizability of the results. In addition, the relatively small sample size used for ex vivo RT-qPCR validation (n = 35 clinical samples) may limit the statistical power of the findings. Future studies should address these limitations using larger, independent cohorts and functional assays.

Conclusion

This study identified significant overexpression of 87 E2F target genes in BC, with over two-fold upregulation compared to normal tissues. Cox regression analysis showed that 31 genes are linked to poor prognosis; 24 of them are both overexpressed and strongly correlated with adverse outcomes. Notably, AURKB, JPT1, TBRG4, and KIF4A are independently associated with poor prognosis, even after adjusting for clinical features. A risk model based on these genes indicates increased mortality rates, confirmed by Kaplan-Meier analysis. Pathway analysis suggests these genes are involved in cell proliferation, DNA replication, and p53 signaling, indicating their regulatory role. Our results emphasize the potential of E2F target genes, particularly AURKB, JPT1, TBRG4, and KIF4A, as prognostic biomarkers for BC.

Acknowledgements