Abstract

Purpose
The aim of this study was to explore potential gene therapy targets for triple-negative breast cancer (TNBC).

Patients and Methods
Three gene expression profiles (GSE64790, GSE62931, and GSE38959) from the Gene Expression Omnibus (GEO) database were analyzed. The GEO2R analysis tool was used to screen for differentially expressed genes (DEGs) between TNBC and normal tissues, followed by Gene Ontology functional annotation and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of the DEGs. The protein–protein interaction network of DEGs was visualized using Metascape to identify the core genes. Subsequently, transcriptional data for the core genes in patients with breast cancer were investigated in the ONCOMINE database. Kaplan–Meier survival analysis was used to evaluate the prognostic value of core gene expression levels in patients with TNBC. Finally, the clinicopathological and long-term follow-up data of 39 patients with TNBC were retrospectively analyzed at the First Affiliated Hospital of the Bengbu Medical College between January 2014 and July 2020. Immunohistochemistry was used to evaluate the expression and subcellular localization of CCNB2 in TNBC tissues.

Results
A total of 66 DEGs were identified between TNBC and normal tissues, including 33 upregulated and 33 downregulated genes in TNBC. Furthermore, a potential protein complex was identified for five core genes. The high expression of these core genes, especially the overexpression of CCNB2, was correlated with a poor prognosis of patients with TNBC. The CCNB2 protein was expressed in the cytoplasm, and its expression was significantly higher in TNBC tissues than that in the adjacent nontumor tissues. Overall survival of patients was significantly correlated with the expression of CCNB2 (p < 0.05).

Conclusion
CCNB2 may play a crucial role in the development of TNBC and has the potential to be used as a prognostic biomarker for TNBC.
Keywords: triple-negative breast cancer, bioinformatics, prognosis, CCNB2, immunohistochemistry, overall survival

Introduction

Breast cancer has now surpassed lung cancer as the most commonly diagnosed cancer, with an estimated 2.3 million new cases. According to reports, the incidence of breast cancer has increased annually over the past few decades, and it has become the most prevalent malignancy among women. Despite advances in diagnosis and treatment, approximately 685,000 patients died from breast cancer worldwide in 2020.^1–3 Triple-negative breast cancer (TNBC) is a distinct clinical subtype of breast cancer characterized by negative expression of the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER-2); it accounts for 12% to 17% of all invasive breast cancers.^4 Because TNBC demonstrates highly malignant features, such as strong invasiveness, early metastasis, frequent recurrence, and short survival, it has attracted widespread attention.^5

Owing to the abundance of molecular information in public databases such as Gene Expression Omnibus (GEO) and ONCOMINE,^6 the mechanisms of cancer progression can now be investigated in unprecedented detail. In particular, the differentially expressed genes (DEGs) between cancer and normal tissues can be screened using bioinformatics analysis. The identification of these oncogenes or tumor suppressor genes may lead to the prediction of potential biomarkers and may provide new therapeutic strategies for cancers. Among various bioinformatics methods, DEG analysis is a widely used independent tool to study gene upregulation and downregulation.^7 In this study, we analyzed and validated cancer-related genes using bioinformatics methods to explore new therapeutic targets that could improve the overall prognosis of patients with TNBC.
Materials and Methods

Source of Data
The GEO database (https://www.ncbi.nlm.nih.gov/geo/) was used to download the original data, including the expression profiles of TNBC and non-TNBC tissues. A search for “TNBC” in the GEO datasets retrieved a total of 4442 results, from which three TNBC-related gene expression profiles (GSE64790, GSE62931, and GSE38959) were selected. This study did not involve any human or animal experiments.

Screening for DEGs
In each profile, the data were divided into TNBC and non-TNBC subsets. The online analysis tool GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to analyze the data. An adjusted p-value of < 0.05 and a |log10 fold change (FC)| of ≥ 1.5 were defined as meaningful differences. Statistical analysis was then performed on the three datasets, and an online Venn diagram tool (http://www.interactivenn.net/index.html) was used to determine the overlapping DEGs.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Enrichment Analyses
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to perform both GO functional annotation and KEGG pathway enrichment analysis. In GO analysis, terms with p < 0.01 and a count of ≥ 10 were considered significantly enriched. For KEGG pathway analysis, p < 0.01 was considered meaningful.

Protein–Protein Interaction (PPI) Network Construction and DisGeNET Analysis
Metascape (http://metascape.org/)^8,9 was applied to analyze the enriched pathways and processes of DEGs and their adjacent genes, including GO terms for cellular component (CC), biological process (BP), and molecular function (MF) categories and KEGG pathways. A p-value of < 0.01, enrichment factor of > 1.5, and minimum count of 3 were considered meaningful. A subset of enriched terms was selected, and a network plot was drawn to further determine the relationships among the terms.
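As a rough illustration of how the enrichment cutoffs stated above (p < 0.01, enrichment factor > 1.5, minimum count of 3) act on a single annotation term, the following Python sketch scores one term against a gene list. It assumes a hypergeometric model for the p-value and defines the enrichment factor as the ratio of observed to expected overlap. The gene identifiers, background size, and function names are hypothetical and chosen only for illustration; this is not the Metascape implementation, which includes further steps not reproduced here.

```python
from math import comb


def hypergeom_sf(k, background, term_size, list_size):
    """P(X >= k) for the overlap X between a random gene list of
    `list_size` genes and a term of `term_size` genes, both drawn
    from `background` genes (hypergeometric upper tail)."""
    upper = min(term_size, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def enrichment(term_genes, gene_list, background):
    """Return (count, enrichment factor, p-value) for one term."""
    count = len(term_genes & gene_list)
    expected = len(term_genes) * len(gene_list) / background
    factor = count / expected if expected > 0 else 0.0
    p_value = hypergeom_sf(count, background, len(term_genes), len(gene_list))
    return count, factor, p_value


def passes_cutoffs(count, factor, p_value,
                   p_cut=0.01, min_factor=1.5, min_count=3):
    """Apply the three cutoffs quoted in the text."""
    return p_value < p_cut and factor > min_factor and count >= min_count


# Toy data (hypothetical): 100 background genes, a 10-gene term,
# and a 10-gene input list sharing 5 genes with the term.
BACKGROUND_SIZE = 100
term_genes = {f"G{i}" for i in range(10)}
gene_list = {f"G{i}" for i in range(5)} | {f"G{i}" for i in range(50, 55)}

count, factor, p_value = enrichment(term_genes, gene_list, BACKGROUND_SIZE)
print(count, factor, p_value, passes_cutoffs(count, factor, p_value))
```

In this toy case the expected overlap is one gene, so five shared genes give an enrichment factor of 5; a term sharing only two genes with the list would be rejected by the minimum-count rule regardless of its p-value.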
The following databases were used for PPI enrichment analysis: BioGrid+, InWeb_IM+, and OmniPath+. Moreover, the molecular complex detection (MCODE) algorithm was used to identify tightly connected network components. The DisGeNET analysis provided by Metascape was used to study and predict human disease-related genes.

ONCOMINE Database Analysis
Using the ONCOMINE (www.oncomine.org)^10 database, we determined the mRNA expression levels of the SKA1, CCNB2, CENPF, CENPA, and BIRC5 genes in various cancers.

Kaplan–Meier Analysis
The Kaplan–Meier plotter (www.kmplot.com) was used to evaluate the prognostic value of DEGs in TNBC. The patient samples were divided into two groups (high and low expression) based on the median expression level. Using Kaplan–Meier survival plots, the relapse-free survival (RFS) of patients with TNBC was determined, and the hazard ratio was estimated, along with the 95% confidence interval (CI) and the log-rank p-value.

Immunohistochemistry
TNBC specimens confirmed by pathological diagnosis were obtained from the Pathology Department of the First Affiliated Hospital of the Bengbu Medical College. CCNB2 expression was detected by immunohistochemical staining. The tissue sections were deparaffinized and dehydrated following routine protocols, and endogenous peroxidase activity was inactivated with 3% H2O2 in methanol. The primary antibody against CCNB2 (ab185622, Abcam, Cambridge, UK) was diluted 1:100 in PBS. The tissue sections were then incubated with the relevant antibodies, stained with diaminobenzidine, and counterstained with hematoxylin. Except for the primary antibody, all reagents used in the immunohistochemical experiment were purchased from Fuzhou Maixin Biological Co. Ltd., Fujian Province, China. Positive cells were identified by distinct brown granules in the cell membrane or cytoplasm.

Statistical Analysis
The Kaplan–Meier method was used for univariate overall survival (OS) analysis.
OS was defined as the period from diagnosis to recurrence, metastasis, death, or the end of follow-up. The SPSS software version 22.0 (IBM, New York, NY, USA) was used for all statistical analyses. Statistical significance was set at p < 0.05.

Results

Identification of DEGs
In the GEO database, we selected three TNBC-related gene expression profiles (GSE64790, GSE62931, and GSE38959). The results showed that there were 600 DEGs in GSE62931, of which 269 were upregulated and 331 were downregulated in TNBC. GSE38959 had 1550 DEGs, of which 1010 were upregulated and 540 were downregulated in TNBC. In GSE64790, a total of 660 DEGs were detected, including 186 upregulated and 374 downregulated in TNBC. Venn diagram analysis identified a total of 66 overlapping DEGs, of which 33 were upregulated and 33 were downregulated in TNBC (Table 1, Figure 1).

Table 1. Statistics of the Three Microarray Datasets Selected from the Gene Expression Omnibus Database

Dataset ID   Triple-Negative Breast Cancer   Normal   Total
GSE64790     3                               3        6
GSE62931     47                              53       100
GSE38959     30                              13       43

Figure 1. Venn diagrams of DEGs common to three Gene Expression Omnibus datasets. (A) Total DEGs; (B) upregulated DEGs; (C) downregulated DEGs. Abbreviation: DEG, differentially expressed gene.

Functional Enrichment Analysis of DEGs in Patients with TNBC
The DEG list was submitted to DAVID for GO and KEGG pathway enrichment analyses. The enriched GO terms included the CC, BP, and MF categories. The results of GO analysis revealed that the DEGs were mainly enriched in BP terms related to mitosis and cell proliferation, CC terms related to the nucleus and nucleoplasm, and MF terms related to protein binding.
In addition, KEGG pathway analysis revealed that the DEGs were mainly enriched in pathways related to progesterone-mediated oocyte maturation, oocyte meiosis, and the cell cycle (Table 2).

Table 2. Significantly Enriched GO Terms and KEGG Pathways

Category       Term         Description                               Count   p-value
BP             GO:0007067   Mitotic nuclear division                  10      1.4E−7
BP             GO:0008283   Cell proliferation                        10      3.6E−6
CC             GO:0005634   Nucleus                                   32      4.3E−4
CC             GO:0005654   Nucleoplasm                               20      1.4E−3
MF             GO:0005515   Protein binding                           42      4.8E−3
KEGG pathway   hsa04914     Progesterone-mediated oocyte maturation   4       3.7E−3
KEGG pathway   hsa04114     Oocyte meiosis                            4       7.3E−3
KEGG pathway   hsa04110     Cell cycle                                4       9.8E−3

Abbreviations: BP, biological process; CC, cellular component; MF, molecular function; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.

The results of Metascape analysis showed that the DEGs and their neighboring genes were mainly enriched in cell division, mitotic nuclear division, and cell cycle phase transition (Figure 2A and B). Meanwhile, subnetwork analysis of the PPI network resulted in the identification of the potential protein complex for five core genes (CCNB2, BIRC5, CENPA, CENPF, and SKA1) (Figure 2C and D). Quality control and association analysis using DisGeNET showed that these DEGs were significantly related to the occurrence of invasive breast carcinoma, carcinoma of the male breast, malignant neoplasm of the male breast, and other diseases (p < 0.01) (Figure 2E).

Figure 2. Enrichment analysis of DEGs and neighboring genes in triple-negative breast cancer. (A) Heatmap of enriched GO and KEGG terms, colored based on p-values. (B) Network of enriched GO and KEGG terms, colored based on p-values (terms containing more genes tend to have a more significant p-value). (C) PPI network. (D) Five most significant MCODE components from the PPI network. (E) DisGeNET data for the DEGs.
Abbreviations: DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein–protein interaction.

Transcription Levels of DEGs in Patients with Breast Cancer
Figure 3 shows the number of datasets with statistically significant upregulation (red) or downregulation (blue) of the mRNA expression of the target genes. The thresholds were set to a p-value of 0.001 and an FC of 1.5. The transcription levels of the core genes in cancers were compared with those in normal tissues using the ONCOMINE database (Figure 4). The data showed that the mRNA expression of BIRC5, CCNB2, CENPA, CENPF, and SKA1 was upregulated in patients with breast cancer. In the Curtis dataset, BIRC5 was upregulated in medullary breast carcinoma, with an FC of 6.014 and a p-value of 9.13E−17. In the Turashvili dataset, CCNB2 was overexpressed in invasive ductal breast carcinoma, with an FC of 4.653 and a p-value of 6.05E−6. In the Curtis dataset, CENPA was overexpressed in invasive ductal breast carcinoma, with an FC of 2.183 and a p-value of 1.27E−115. In The Cancer Genome Atlas dataset, the transcription level of CENPF was significantly higher in patients with invasive lobular breast carcinoma than that in normal specimens, with an FC of 6.980 and a p-value of 1.31E−21. In the Turashvili dataset, the FC in the mRNA expression of SKA1 in invasive ductal breast carcinoma was 7.501, and the p-value was 2.48E−6.

Figure 3. Transcription levels of the core genes in different types of cancers in the ONCOMINE database (blue: low expression, red: high expression, comparison within the same line).

Figure 4. Expression of the core genes in different breast cancer research microarrays. (A) CENPF expression in TCGA breast (1: breast, 2: invasive lobular breast carcinoma).
(B) SKA1 expression in Turashvili breast (1: ductal breast cell, 2: lobular breast cell, 3: invasive ductal breast carcinoma). (C) BIRC5 expression in Curtis breast (1: breast, 2: medullary breast carcinoma). (D) CENPA expression in Curtis breast (1: breast, 2: invasive ductal breast carcinoma). (E) CCNB2 expression in Turashvili breast (1: ductal breast cell, 2: lobular breast cell, 3: invasive ductal breast carcinoma).

Association Between DEG Expression and Survival of Patients
Kaplan–Meier analysis revealed that the five core genes (CCNB2, CENPF, SKA1, CENPA, and BIRC5) were related to the RFS of patients with TNBC. Patients with higher expression levels had a worse RFS than those with lower expression levels. In particular, the overexpression of CCNB2 was the most unfavorable prognostic factor for RFS of patients with TNBC (hazard ratio = 1.98; 95% CI: 1.28–3.06; p = 0.0018; n = 255), consistent with the lowest log-rank p-value (Figure 5).

Figure 5. Prognostic values of mRNA expression levels of the core genes in patients with triple-negative breast cancer (Kaplan–Meier analysis). (A) Association of BIRC5 with RFS in TNBC. (B) Association of CENPA with RFS in TNBC. (C) Association of SKA1 with RFS in TNBC. (D) Association of CENPF with RFS in TNBC. (E) Association of CCNB2 with RFS in TNBC. Abbreviations: RFS, relapse-free survival; TNBC, triple-negative breast cancer.

CCNB2 Protein Expression in TNBC Tissues
The results of immunohistochemical staining showed that CCNB2 protein expression in TNBC tissues was significantly higher than that in adjacent nontumor tissues. The protein was localized to the cytoplasm, as indicated by brown-yellow granular staining in the TNBC tissues (Figure 6).

Figure 6. Immunohistochemical staining (EnVision Method). (A) Negative CCNB2 expression in the adjacent nontumor tissue (magnification, ×400).
(B) Positive CCNB2 expression in the triple-negative breast cancer tissue (magnification, ×400).

Follow-Up
All patients with TNBC were female and were followed up until July 2020. The median follow-up time was 44 months (range: 10–78 months). During follow-up, eight patients (20.5%) died. The median age of the patients was 47 years; 26 were aged 50 years or younger, and 13 were older than 50 years. The average diameter of the tumor was 2.7 cm; however, in two cases, the tumor was larger than 5 cm. In the 39 patients with complete follow-up data, Kaplan–Meier survival analysis showed that the expression of CCNB2 and the primary tumor (T) stage were significantly related to OS (p < 0.05). However, the patient’s age, tumor location, tumor diameter, lymph node metastasis, and distant metastasis were not associated with OS (p > 0.05) (Table 3 and Figure 7).

Table 3. Clinicopathological Characteristics and Kaplan–Meier Univariate Overall Survival Analysis of Patients with Triple-Negative Breast Cancer

Characteristic    Subgroup   Number of Samples   95% CI Lower Bound   95% CI Upper Bound   χ^2*    p
Age (years)       ≤ 50       26                  61.697               79.679               1.883   0.170
                  > 50       13                  46.244               69.827
Tumor size (cm)   ≤ 2        17                  45.493               74.502               2.140   0.143
                  > 2        22                  63.207               76.322
Location          Left       22                  54.224               75.853               0.478   0.489
                  Right      17                  56.994               77.265
T                 T1         17                  45.493               74.502               6.681   0.035
                  T2         20                  68.256               76.744
                  T3         2                   26.000               26.000
N                 N0         24                  59.740               75.413               1.780   0.619
                  N1         8                   37.780               68.554
                  N2         4                   63.859               83.474
                  N3         3                   30.456               41.544
M                 M0         35                  59.427               75.784               0.104   0.747
                  M1         4                   40.792               63.708
CCNB2             Positive   21                  45.131               68.858               5.265   0.022
                  Negative   18                  71.377               80.909

Abbreviations: CI, confidence interval; *χ^2, Log rank test chi-square value; CCNB2, cyclin B2.

Figure 7. Kaplan–Meier overall survival curves according to the (A) patient age (p = 0.170), (B) tumor size (p = 0.143), and (C) CCNB2 expression (p = 0.022).
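The χ^2 and p columns of Table 3 are log-rank test results. As a minimal sketch of what such a univariate comparison involves, the Python code below computes a Kaplan–Meier curve and a two-group log-rank statistic from scratch. The follow-up times and event flags are invented toy values, not the study's patient data, and the function names are illustrative assumptions; the authors report using SPSS, so this is not their code.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 for an observed event and 0 for censoring."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return chi2, chi2_1df_pvalue(chi2)


# Toy follow-up data in months (hypothetical, for illustration only).
positive_times = [10, 14, 20, 25, 31, 40]
positive_events = [1, 1, 1, 1, 0, 1]
negative_times = [30, 45, 50, 60, 66, 70]
negative_events = [0, 1, 0, 0, 0, 0]

print(kaplan_meier(positive_times, positive_events))
print(logrank(positive_times, positive_events, negative_times, negative_events))
```

For a two-group comparison the statistic has one degree of freedom, so `chi2_1df_pvalue(5.265)` gives a p-value of about 0.022, in line with the CCNB2 row of Table 3. The T and N rows compare three and four groups, respectively, which this two-group sketch does not cover.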
Discussion

Breast cancer is a malignant tumor that occurs in breast epithelial tissue.^11 Because breast cancer cells adhere only loosely to one another, they detach easily, and the free cancer cells can spread through the blood or lymph throughout the body, forming life-threatening metastases. All these factors make breast cancer a serious threat to women’s health. TNBC is a unique subtype of breast cancer. It does not express hormone receptors (ER and PR) or HER-2, which makes clinical targeted therapy and endocrine therapy ineffective.^12 Chemotherapy is currently the main adjuvant treatment for patients with TNBC. However, its efficacy is limited compared with that of comprehensive therapy, especially in patients who show resistance to chemotherapy drugs.^13 Therefore, reliable biomarkers and effective targets for TNBC are urgently needed to improve the prognosis of patients.

The development of second-generation sequencing technology and high-throughput sequencing platforms has generated a large amount of data, which an increasing number of researchers are interpreting using bioinformatics methods. In our study, gene and protein expression analysis based on publicly available bioinformatics databases was performed to screen for potential key genes related to TNBC. Using gene expression profiling data from the GEO database, 66 DEGs were identified between TNBC tissues and normal human breast tissues; these DEGs play an important role in cell proliferation and the cell cycle. By constructing a PPI network, a potential protein complex that was strongly linked to invasive breast cancer was identified. This protein complex was mainly composed of the protein products of the five core genes (BIRC5, CCNB2, CENPA, CENPF, and SKA1). The analysis of these five genes using the GEO and ONCOMINE databases showed that they were all expressed at significantly higher levels in breast cancer than in normal tissues (p < 0.05).
The Kaplan–Meier plotter showed that CCNB2 overexpression is an unfavorable prognostic factor for patients with TNBC. To strengthen the credibility of the results of the bioinformatics analysis, experimental verification was conducted. Immunohistochemical staining indicated the cytoplasmic localization of CCNB2 in TNBC tissue and demonstrated that CCNB2 expression in TNBC tissue was higher than that in the adjacent nontumor tissues. Clinicopathological correlation analysis further confirmed that high expression of CCNB2 in patients with TNBC was associated with poor survival.

Cyclin family proteins, including cyclins B1 (CCNB1) and B2 (CCNB2), regulate the activity of cyclin-dependent kinases (CDKs). Different cyclins are involved in specific phases of the cell cycle,^14–16 and CCNB2 plays an important role in the regulation of the cell cycle. During interphase and mitosis, CCNB2 is located in the Golgi apparatus and participates in its disassembly.^17 According to previous reports, CCNB2 usually triggers the G2/M phase transition by activating CDK1, and downregulation of CCNB2 inhibits cell proliferation and promotes cell cycle arrest in the G2/M phase.^18–20 One study has shown that metformin downregulates the expression of CCNB2 to increase the rates of apoptosis and cell cycle arrest.^21 A high level of CCNB2 is positively correlated with poor differentiation, tumor size, lymph node metastasis, distant metastasis, and clinical stage. In the past few years, the overexpression of CCNB2 in tumor tissues has been shown to be an unfavorable prognostic biomarker in many human cancers, including gastric cancer,^22 breast cancer,^23 pituitary adenoma,^24 nasopharyngeal carcinoma,^25 and adrenocortical carcinoma.^26 The results of our bioinformatics analysis showed that mRNA expression of CCNB2 was significantly associated with the prognosis of patients with TNBC.
This study is limited by the small number of TNBC samples. In the future, we will continue to collect clinical sample information, and we hope that the findings of this study can provide a direction for future TNBC research and clinical treatment. In this study, BIRC5, SKA1, CENPA, and CENPF were highly expressed in TNBC compared to their expression in normal breast tissues, and CENPA, CENPF, and SKA1 expression levels were significantly correlated with a poor RFS (log-rank p < 0.05). However, the role of these genes in TNBC remains unclear, and more studies are needed.

Conclusions

In summary, CCNB2 protein expression was significantly increased in TNBC tissues and was related to the malignant status and prognosis of patients. The clinical value of CCNB2 has yet to be confirmed by further studies, and the regulatory mechanisms of CCNB2-related signaling pathways in TNBC remain to be investigated. Nevertheless, CCNB2 has broad potential as a therapeutic target and prognostic factor for TNBC.

Acknowledgments