Abstract
   Background: DNA methylation played essential roles in regulating gene
   expression. The impact of DNA methylation status on the occurrence and
   development of cancers has been well demonstrated. However, little is
   known about its prognostic role in breast cancer (BC).
   Materials: The Illumina Human Methylation450 array (450k array) data of
   BC was downloaded from the UCSC xena database. Transcriptomic data of
   BC was downloaded from the Cancer Genome Atlas (TCGA) database.
   Firstly, we used univariate and multivariate Cox regression analysis to
   screen out independent prognostic CpGs, and then we identified
   methylation-associated prognosis subgroups by consensus clustering.
   Next, a methylation prognostic model was developed using multivariate
   Cox analysis and was validated with the Illumina Human Methylation27
   array (27k array) dataset of BC. We then screened out differentially
   expressed genes (DEGs) between methylation high-risk and low-risk
   groups and constructed a methylation-based gene prognostic signature.
   Further, we validated the gene signature with three subgroups of the
   TCGA-BRCA dataset and an external dataset [41]GSE146558 from the Gene
   Expression Omnibus (GEO) database.
   Results: We established a methylation prognostic signature and a
   methylation-based gene prognostic signature, and there was a close
   positive correlation between them. The gene prognostic signature
   involved six genes: IRF2, KCNJ11, ZDHHC9, LRP11, PCMT1, and TMEM70. We
   verified their expression in mRNA and protein levels in BC. Both
   methylation and methylation-based gene prognostic signatures showed
   good prognostic stratification ability. The AUC values of 3-years,
   5-years overall survival (OS) were 0.737, 0.744 in the methylation
   signature and 0.725, 0.715 in the gene signature, respectively. In the
   validation groups, high-risk patients were confirmed to have poorer OS.
   The AUC values of 3 years were 0.757, 0.735, 0.733 in the three
   subgroups of TCGA dataset and 0.635 in [42]GSE146558 dataset.
   Conclusion: This study revealed the DNA methylation landscape and
   established promising methylation and methylation-based gene prognostic
   signatures that could serve as potential prognostic biomarkers and
   therapeutic targets.
   Keywords: DNA methylation, gene expression, breast cancer, prognostic
   signature, the cancer genome atlas (TCGA)
Introduction
   Breast cancer (BC) is the most prevalent cancer and the leading cause
   of cancer mortality in women worldwide ([43]Freddie et al., 2020). In
   the United States, it is estimated that around 13% of women will suffer
   from BC in their lifetime ([44]DeSantis et al., 2019). In recent years,
   the mortality of BC has gradually declined and the 5-years OS rate has
   reaches 90% attributed to early detection and improved treatment
   ([45]Allemani et al., 2018).
   Breast cancers are categorized into ER+, ER+/HER2-, HER2+ and
   Triple-negative subtypes based on the expression of estrogen receptor
   (ER), progesterone receptor (PR) and human epidermal growth factor
   receptor 2 (HER2). Similarly, gene expression analysis of these
   receptors further recognizes four subsets of BC: luminal A, luminal B,
   HER2-enriched (HER2-E) and Basal-like ([46]Parker et al., 2009). These
   classification systems not only help predict the prognosis of BC
   patients, but also guide treatment choices. Conventional therapies such
   as surgery, radiotherapy, and chemotherapy form the basis of BC
   treatment. In addition, endocrine therapy for hormone receptor-positive
   BC and anti-HER2 treatment for HER2 expressing BC have greatly improved
   the prognosis of patients. Unfortunately, triple negative breast cancer
   (TNBC) still lacks effective therapeutic targets. Recent studies
   demonstrated that poly ADP-ribose polymerase 1 (PARP1) inhibitors and
   immune checkpoint inhibitors (ICIs) showed potential effect in TNBC
   ([47]Mittendorf et al., 2020), ([48]Schmid et al., 2020). Despite the
   great achievements in treatment, about 25–40% of BC patients will
   develop metastases ([49]Siegel et al., 20172017). Among them, bone
   metastases are the most common, and approximately 75% of late-stage BC
   patients are diagnosed with bone metastases ([50]Tulotta and Ottewell,
   2018). Moreover, 5–20% of BC patients would have brain metastases
   ([51]Achrol et al., 2019). Once the patient develops metastasis, the
   prognosis is poor, with the median OS of only 1–2 years ([52]Redig and
   McAllister, 2013), ([53]Martin et al., 2017). Therefore, it is urgent
   to find potential prognosis-related biomarkers to accurately predict
   the prognosis of BC patients.
   Epigenetics events such as DNA methylation, histone modifications,
   chromatin remodeling, and non-coding RNAs play essential roles in the
   regulation of gene expression and actively participate in the
   development and progression of cancers. DNA methylation, which affects
   gene expression without changing the DNA sequence, is the most common
   epigenetic modification ([54]Nakao, 2001), ([55]Strahl and Allis,
   2000). Stefansson et al. demonstrated that abnormal methylation of CpG
   islands in the promoter regions might activate proto-oncogenes or
   silence tumor suppressor genes, thereby contributing to the occurrence
   and development of tumors ([56]Stefansson and Esteller, 2013).
   Accumulating evidences showed that decreased levels of genome-wide
   methylation were a critical sign of early cancers and were related to
   cancer grade and metastasis ([57]Yang and Schwartz, 2011), ([58]Ding et
   al., 2019). Indeed, DNA methylation was associated with most
   malignancies including bladder cancer ([59]Chen et al., 2020a), lung
   cancer ([60]Liang et al., 2019), and gastrointestinal tumors ([61]Woo
   et al., 2018), ([62]Huang et al., 2018).
   Emerging studies have revealed the important roles of DNA methylation
   in BC ([63]Pasculli et al., 2018; [64]Kresovich et al., 2019; [65]Xu et
   al., 2020). For instance, distinct DNA methylation patterns and
   associated gene expression profiles were found in different molecular
   subtypes of BC. SFRP1, a tumor suppressor gene, was down-regulated by
   hypermethylation in ER + breast cancer, leading poor prognosis
   ([66]Stefansson et al., 2015), ([67]Park et al., 2012). Other genes
   such as BRCA1, CDH1, and PTEN, were also abnormally methylated in BC.
   These events could serve as potential therapeutic and prognostic
   biomarkers ([68]FitzGerald et al., 1998; [69]Pharoah et al., 2001;
   [70]King et al., 2003; [71]Walsh et al., 2006; [72]Suijkerbuijk et al.,
   2008; [73]Luo et al., 2016). However, the prognostic role of DNA
   methylation in BC remains incompletely demonstrated.
   In this study, we used bioinformatics methods to determine the
   prognostic role of DNA methylation and constructed
   methylation-associated prognostic signatures for BC. This study will
   help unveil the significance of DNA methylation in BC and might help
   discover novel prognostic biomarkers.
Materials and Methods
Data Acquisition and Processing
   RNA-seq data in fragments per kilobase of transcript per million mapped
   reads (FPKM) form and clinical information of BC were downloaded from
   the Cancer Genome Atlas (TCGA: [74]https://portal.gdc.cancer.gov/)
   database. Illumina Human Methylation450 BeadChip array (450k array) and
   Illumina Human Methylation27 BeadChip array (27k array) data of TCGA
   database were downloaded from UCSC xena ([75]https://xenabrowser.net/)
   ([76]Goldman et al., 2020). DNA Methylation levels were evaluated by
   the β value, which ranged from 0 to 1 (0 means unmethylated and 1 means
   fully methylated). Probes with over 70% of missing values and probes
   located at chromosomes X and Y were removed. The missing values of the
   remaining probes were imputed using the k-nearest neighbours (knn)
   imputation algorithm of the impute R package. Since DNA methylation in
   promoter regions would strongly influence gene expression, we focused
   on the methylation probes in promoter regions defined as 2.0 kb
   upstream to 0.5 kb downstream from transcription start sites (TSS).
   Batch effects were removed by the ComBat algorithm of the sva R package
   ([77]Leek et al., 2012). Ultimately, 560 patients including methylation
   data (from 450k array) and corresponding clinical data, 986 patients
   (from TCGA database) containing both gene expression data and
   corresponding clinical data were used for mainly analysis. And we
   obtained 557 overlapping patients (from the above two datasets) with
   complete gene expression data, methylation data and clinical data.
   Moreover, RNA-seq data and clinical information of 106 samples from
   [78]GSE146558 were downloaded from NCBI (GEO:
   [79]https://www.ncbi.nlm.nih.gov/) as external validation dataset. The
   mRNA expression profile from GEO dataset was normalized by the Robust
   Multichip Average (RMA) algorithm with background adjustment, quantile
   normalization, and final summarization. The workflow of our study was
   illustrated in [80]Figure 1.
FIGURE 1.
   [81]FIGURE 1
   [82]Open in a new tab
   The workflow for this study. DEGs, differentially expressed genes.
Independent Prognostic CpG Loci Screening
   Univariate Cox regression analysis was performed to screen out the
   prognosis-related CpGs of the 560 BC patients. In our study, we used
   the OS as clinical parameter of prognosis. Next, all these CpGs were
   subjected to multivariate Cox regression analysis, with age,
   pathological stage, and TNM stage as covariates to identify independent
   prognostic CpGs.
GO and KEGG Analysis
   The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes
   (KEGG) ([83]http://www.genome.jp/kegg/) ([84]Kanehisa et al., 2017)
   analysis were performed using the R ClusterProfiler package ([85]Yu et
   al., 2012). A p value < 0.05 was set as the cut-off value for both GO
   and KEGG analyses in our study.
Consensus Clustering and Evaluation of CpG-Related Subtypes
   Consensus clustering ([86]Wilkerson and Hayes, 2010) was performed to
   determine subgroups of different methylation characteristics of the 560
   BC patients based on the independent prognostic CpG loci using the
   ConsensusClusterPlus package of R. The criteria to determine the number
   of clusters were: ([87]Freddie et al., 2020) The consistency within the
   cluster was relatively high; ([88]DeSantis et al., 2019) There was no
   significant increase in the area under the CDF curve; ([89]Allemani et
   al., 2018) The relative change in area under CDF curve tended to be
   stable. We then generated a consensus matrix to better visualize and
   help determine the number of clusters.
Construction and Validation of the Methylation Prognostic Model
   Differentially methylated independent prognostic CpG loci between the
   different prognosis clusters were screened out using Wilcoxon test. The
   filtering conditions were false discovery rate (FDR) < 0.05 and | log2
   (fold change) | >0.585. On the basis of these differentially methylated
   CpG loci, a methylation prognostic model of 560 BC patients was
   constructed using multivariate Cox analysis. The formula of the risk
   score was as follows:
   [MATH: Risk score=∑coefiβi :MATH]
   where coef[i] was the multivariate Cox regression coefficient, and βi
   was the corresponding methylation β value. According to the median risk
   score ([90]Chen et al., 2020b), ([91]Shen et al., 2019), patients were
   divided into methylation low-risk (n = 280) and high-risk (n = 280)
   groups. Survival curves were employed to compare the OS of the two
   groups. Risk curve was plotted to visualize the relationship of the
   risk score, survival status and the methylated levels of the six
   signature CpG loci. Univariate and multivariate Cox regression analysis
   were performed to explore whether the risk score could be an
   independent predictor of OS. The sensitivity and specificity of the
   methylation prognostic model were evaluated by calculating the area
   under the curve (AUC) of the receiver operating characteristic (ROC)
   curve.
Construction and Evaluation of Gene Prognostic Model
   Differentially expressed genes (DEGs) were identified from methylation
   high-risk and low-risk groups (of 557 BC patients’ dataset) with the
   filter conditions of adjusted p < 0.05 and | log2 (fold change) | >1.
   And then we further extracted the above-mentioned DEGs from the
   transcriptomic profiles of 986 BC patients’ dataset for subsequent
   analysis. Kaplan-Meier analysis and univariate Cox regression analysis
   were employed to investigate prognosis-related DEGs of the 986 samples.
   Similarly, a gene prognostic signature was constructed by multivariate
   Cox regression analysis. And the risk formula was:
   [MATH:
   Risk score=∑i=1<
   /mn>n(ex
   pressio<
   /mi>ni∗coefi)
   mrow> :MATH]
   Patients were also categorized into low-risk (n = 493) and high-risk (n
   = 493) groups based on the median risk score. Kaplan-Meier survival
   curve, risk curve, ROC curve, univariate and multivariate Cox
   regression analysis were also used for evaluating and validating the
   prognostic signature. Besides, two subgroups from 986 BC patients, the
   557 patients’ dataset and [92]GSE146558 dataset were used to validate
   the prognostic value of the gene signature.
Antitumor Drug Sensitivity Analysis
   CellMiner ([93]https://discover.nci.nih.gov/cellminer/home.do) is a
   robust, user-friendly online database that integrates drug sensitivity
   and genomic data ([94]Reinhold et al., 2017), ([95]Wang et al., 2016).
   Anti-tumor activity data obtained from NCI-60 tumor cell line panel of
   the developmental therapeutics program (DTP) and RNA-seq data for the
   60 cell lines of the NCI DTP drug screen were downloaded from this
   website. Subsequently, correlation between the sensitivity of
   anti-tumor drugs and the signature genes was analyzed.
Statistical Analysis
   R 3.6.3 (version 3.6.3,
   [96]https://pan.baidu.com/s/1sufVf2lmoj9GYG_j5_fJKQ) was used for
   statistical analysis and plotting. Consensus clustering was performed
   using the ConsensusClusterPlus package of R; COX regression analysis
   was performed with the coxph function in survival package of R
   ([97]Zhang, 2016); Kaplan-Meier curve was plotted using the survival
   and survminer packages of R; Pheatmap was plotted using the pheatmap
   package of R; The forest plots were plotted by the forestplot package
   of R; ROC curve was plotted by the survival ROC package of R. GO and
   KEGG analyses were performed using the ClusterProfiler package of R.
   Mann-Whitney test was used to estimate the statistical significance of
   two groups of skewed distributed continuous variables, and
   Kruskal-Wallis test was used to evaluate the statistical significance
   of multiple groups of skewed distributed continuous variables (with
   Bonferroni correction for pairwise comparisons among multiple groups).
   All tests were two-sided and for all statistical tests, p < 0.05 was
   considered to be statistically significant unless otherwise specified.
Results
Screening of Prognosis-Related CpG Loci in Breast Cancer
   In our study, 450k array dataset was defined as train group and 27k
   array dataset was defined as test group ([98]Table 1). Firstly, 144 CpG
   loci with p < 0.001 by univariate Cox analysis were screened out and
   identified as prognosis-related CpG loci. Using age, pathological
   staging and TNM staging as covariates, 66 CpG loci (49 CpG loci
   associated with favorable prognosis, 17 CpG loci associated with poor
   prognosis) with p < 0.001 by multivariate Cox analysis were further
   selected and used as the methylation classification features
   ([99]Figure 2).
TABLE 1.
   General clinical characteristics of 560 BC patients.
   Parameters               Methylation train group Methylation test group
                                   (n = 560)              (n = 278)
   Age                                 —                      —
    ≤65 years                     438 (78.3)              191 (68.7)
    >65 years                     122 (21.7)              87 (31.3)
   Pathologic stage[100] ^a            —                      —
    I/II                          418 (74.6)              215 (77.3)
    III/IV                        142 (25.4)              57 (20.5)
    unknow                             0                   6 (2.2)
   T stage[101] ^a                     —                      —
    T1/2                          479 (85.5)              238 (85.6)
    T3/4                           81 (14.5)              40 (14.4)
   N stage[102] ^a                     —                      —
    N0                            254 (45.4)              143 (51.4)
    N1/3                          306 (54.6)              128 (46.1)
    unknow                             0                   7 (2.5)
   [103]Open in a new tab
   ^a
   Staging according to Seventh Edition AJCC, Guidelines (Edge SB, Byrd
   DR, Compton CC, Fritz AG, Greene FL, Trotti A, eds. AJCC, Cancer
   Staging Manual. Seventh ed New York, NY: springer; 2010).
FIGURE 2.
   [104]FIGURE 2
   [105]Open in a new tab
   Significance and hazard ratio values of 66 independent
   prognosis-related CpG loci obtained from multivariate Cox regression
   analysis.
Identification of DNA Methylation-Based Prognosis Subgroups
   Then, the 560 patients were categorized into clusters of different
   methylation characteristics with consensus clustering based on the
   methylation of the 66 independent prognostic CpG loci. When the
   patients were assigned to 5 categories, the consistency within the
   clusters was high, the area under the cumulative distribution function
   (CDF) curve began to stabilize, and the relative change in area under
   CDF curve tended to be stable ([106]Figures 3A,B). A consensus matrix
   representing the consensus for k = 5 also displayed a well-defined
   5-block structure ([107]Figure 3C). Accordingly, the optimal number of
   clusters was determined to be 5.
FIGURE 3.
   [108]FIGURE 3
   [109]Open in a new tab
   The methylated levels and prognosis of consensus clustering subgroups
   of breast cancer. (A,B) Consensus clustering cumulative distribution
   function for k = 2–9. (C) Consensus clustering matrix for k = 5. (D)
   Heatmap of the differentially methylated levels of the five subgroups.
   (E) Methylated levels of the 66 independent prognosis-related CpG loci
   among the five clusters. (F) Survival analysis of the five subgroups.
   Subsequently, we conducted a subgroup analysis for the 5 clusters.
   Firstly, we compared the methylation levels of these 66 independent
   prognostic methylation loci among the five clusters. As illustrated in
   [110]Figures 3D,E, Cluster 5 had the lowest methylation levels,
   followed by Clusters 2 and 4, Clusters 1 and 3. And the methylated
   difference between Cluster 5 and each of the remaining clusters was
   statistically significant. To explore the prognostic significance of
   the five clusters, Kaplan-Meier survival analysis was performed. We
   found that the prognosis was statistically significantly different
   among the 5 clusters, where Cluster 1 and Cluster 3 had the best OS,
   while Cluster 5 had the worst ([111]Figure 3F).
Construction and Evaluation of Methylation Prognosis Model
   The five clusters were significantly prognosis-associated, and
   therefore were used to identify potential prognostic biomarkers. Given
   that Cluster 5 had the lowest methylation level and the worst OS, it
   was reasonable to be selected as the reference cluster. Next, 20
   differentially methylated independent prognostic CpG loci were
   identified from Cluster 5 and the rest clusters ([112]Table 2).
   Ultimately, a methylation prognostic model was constructed which
   included six CpG loci (cg00945507, cg05406101, cg10092957, cg13060154,
   cg14992108, cg18678121) determined by multivariate Cox analysis
   ([113]Table 3). Kaplan-Meier analysis showed that cg00945507,
   cg05406101, cg10092957, cg14992108, cg18678121 were associated with
   improved survival, and cg13060154 was associated with poor survival
   ([114]Figure 4).
TABLE 2.
   Characteristics of the differential prognosis-related CpGs by wilcoxon
   rank-sum test (cluster 5 vs. the rest clusters).
   CpGs         Log2 FC      p value       FDR
   cg00945507 −1.078388031  6.76E-07    1.35E-06
   cg02022375 −1.043706338  1.58E-12    1.30E-11
   cg02630888 −0.851143204  1.04E-12    1.15E-11
   cg05406101 −1.143837139  1.23E-15    8.12E-14
   cg06646021 −0.804998375  5.45E-10    1.89E-09
   cg10092957 −1.057286596  1.11E-07    2.44E-07
   cg11072113 −0.594310951  1.86E-11    1.02E-10
   cg13060154  0.6679634   0.030369072 0.04090528
   cg14992108 −0.673767411  2.43E-10    1.07E-09
   cg15798153 −0.959416408  2.90E-14    9.57E-13
   cg16466334 −0.985246418  4.22E-12    3.09E-11
   cg16522484 −0.696605561  9.22E-14    2.03E-12
   cg18678121 −0.729884099  1.14E-07    2.44E-07
   cg19094438 −1.478722276  5.50E-12    3.63E-11
   cg21032583 −1.145936985  2.48E-13    3.28E-12
   cg22197830 −1.050980669  1.22E-12    1.15E-11
   cg24194539 1.144078012  0.010277521 0.014432263
   cg26065841 −0.62551186   2.25E-13    3.28E-12
   cg26147480 −0.695572224  1.37E-10    6.47E-10
   cg27653134 −0.904508613  3.80E-11    1.93E-10
   [115]Open in a new tab
TABLE 3.
   Formula of Methylation prognostic model.
   CpG loci     Coef      HR
   cg00945507 −4.01607 0.018024
   cg05406101 1.117929 3.058513
   cg10092957 −1.78259 0.168202
   cg13060154 1.655685 5.236664
   cg14992108 −1.28305 0.27719
   cg18678121 −1.1362  0.321038
   [116]Open in a new tab
FIGURE 4.
   [117]FIGURE 4
   [118]Open in a new tab
   The survival difference of hypermethylation and hypomethylation of the
   six signature CpG loci.
   Then we explored the mechanisms by which these signature CpG loci might
   act on BC. The six CpG loci, cg00945507, cg05406101, cg10092957,
   cg14992108, cg18678121, and cg13060154, were located at gene promoter
   regions of SEC61G, RWDD2B, NCCRP1, SNTB1, SEC61A2, DAB2IP,
   respectively. We firstly analyzed the correlation of these CpG loci and
   their corresponding target genes. The methylation of cg10092957,
   cg05406101, cg18678121, cg00945507 were moderately negatively
   correlated with the expression of their target genes. Whereas
   cg13060154 was weakly positively correlated with its corresponding gene
   ([119]Supplementary Figures S1A–F). Consistent with the above results,
   the increased expression of NCCRP1, RWDD2B, SEC61A2, and SEC61G was
   associated with the decreased β values of cg10092957, cg05406101,
   cg18678121, and cg00945507, respectively. To the opposite of them,
   DAB2IP had higher expression in the presence of hypermethylated
   cg13060154 ([120]Supplementary Figures S1G–L). However, there was no
   relationship between the cg14992108 and its target gene SNTB1. In
   addition, we took advantage of TCGA Wanderer, an interactive viewer
   exploring DNA methylation and gene expression data in human cancer
   ([121]http://maplab.imppc.org/wanderer/) ([122]Díez-Villanueva et al.,
   2015), to explore the methylated difference of the six CpG loci between
   breast cancer and normal tissues. We found that the methylation levels
   of cg05406101, cg18678121, cg14992108 were higher in normal tissues.
   However, the methylation levels of cg13060154 and cg10092957 were
   higher in breast cancer ([123]Supplementary Figures S1M–R). We also
   explored the prognostic roles of the six CpG loci in breast cancer
   through the public database MethSurv ([124]Modhukur et al., 2018)
   ([125]https://biit.cs.ut.ee/methsurv/). High methylation levels of
   cg00945507, cg05406101, cg10092957, cg14992108, cg18678121 were
   associated with favorable prognosis. On the contrary, high methylation
   level of cg13060154 was associated with poor survival
   ([126]Supplementary Figures S1S–X).
   On the basis of multivariate Cox regression, we developed the following
   risk model:
   Risk score = −4.016 × cg00945507 + 1.117 × cg05406101−1.78 × cg10092957
   + 1.655 × cg13060154−1.283 × cg14992108−1.136 × cg18678121 ([127]Table
   3).
   According to the formula, we computed the risk score of each BC patient
   in the train group (n = 560) and assigned them into high-risk (n = 280)
   and low-risk groups (n = 280) with reference to the median risk score.
   The methylation levels of the six CpG loci between the high-risk group
   and low-risk group were showed in [128]Supplementary Figures S2A.
   Kaplan-Meier curve indicated that the high-risk group had significantly
   poorer OS than the low-risk group ([129]Figure 5A). To confirm the
   methylation risk score could effectively predict the BC patients’
   prognosis, we plotted the ROC curve. Notably, we observed that the risk
   score had the highest prediction performance of prognosis compared with
   the conventional clinical features, with the 3-years and 5-years AUC
   values being 0.739 and 0.744 ([130]Figures 5B,C). The relationship of
   methylation risk score, survival status and methylation levels of the
   six signature CpG loci was shown in [131]Figures 5D–F. Univariate Cox
   analysis indicated that age, stage, T stage, N stage and risk score
   were significantly associated with OS. However, when they were
   introduced into multivariate Cox analysis, only age [hazard ratio,
   1.026 (95% CI, 1.008–1.045), p = 0.005], and risk score [hazard ratio,
   2.823 (95% CI, 2.131–3.741), p < 0.001] remained as independent
   prognostic predictors ([132]Figures 5G,H). Similar prognostic
   significance was observed in the validation cohort (27k array dataset),
   with AUC values of 0.603 and 0.657 for 3 and 5 years, respectively
   ([133]Supplementary Figures S2B–D).
FIGURE 5.
   [134]FIGURE 5
   [135]Open in a new tab
   Methylation prognostic model assessment in 560 breast cancer samples.
   (A) Survival analysis between the high-risk and low-risk groups. (B,C)
   The time-dependent receiver operating characteristic (ROC) curves at 3
   and 5 years. (D) The risk score distribution. (E) Survival status
   scatter plots. (F) Heatmap of the six signature CpG loci. (G)
   Univariate Cox regression analysis. (H) Multivariate Cox regression
   analysis.
Identification of Differentially Expressed Genes From the Methylation
High-Risk and Low-Risk Groups
   Using the thresholds of adjusted p < 0.05 and | log2 (fold change) |
   >1, a total of 413 differentially expressed genes (DEGs) between
   methylation high-risk and low-risk groups were obtained. The volcano
   and heatmap visually displayed the DEGs ([136]Figures 6A,B). To further
   investigate the biological characteristics of the DEGs, function and
   pathway annotations were performed. GO analysis indicated that these
   genes were involved in the regulation of cell cycle processes and
   mitotic cell cycle phase transition. KEGG analysis showed that these
   DEGs were mainly enriched in p53 and TGF-β signaling pathways
   ([137]Figures 6C–F).
FIGURE 6.
   [138]FIGURE 6
   [139]Open in a new tab
   (A) Volcano plot of differentially expressed genes (DEGs); (B) Heatmap
   of the DEGs; (C–F) GO and KEGG enrichment analysis for DEGs. KEGG
   pathway (C), BP (D), CC (E), and MF (F).
Construction and Evaluation of Gene Prognostic Signature
   To explore the prognostic values of the identified 413 DEGs, we
   extracted the expression of these DEGs from the transcriptomic profiles
   of 986 BC patients in the TCGA-BRCA database for subsequent analysis.
   Firstly, 50 DEGs significantly related with OS were selected by
   Kaplan-Meier analysis (p < 0.05). All of these prognostic genes were
   subjected to univariate Cox regression analysis and 13 of them with p <
   0.05 were further identified as prognosis-associated genes
   ([140]Supplementary Table S1). Ultimately, six prognosis-associated
   genes, namely IRF2, KCNJ11, ZDHHC9, LRP11, PCMT1, and TMEM70, were
   included in developing a gene prognostic signature by multivariate Cox
   regression analysis. Among them, four genes (ZDHHC9, LRP11, PCMT1,
   TMEM70) were related with poor survival and two genes (IRF2, KCNJ11)
   were associated with good survival ([141]Supplementary Figures S3). The
   risk formula was as follows:
   Risk score = −(0.06128 × IRF2) − (0.0342 × KCNJ11) + (0.0228 × ZDHHC9)
   + (0.01257 × LRP11) + (0.01082 × PCMT1) + (0.02917 × TMEM70).
   Gene expression analysis of the six signature genes in the Oncomine
   database ([142]https://www.oncomine.org) revealed that ZDHHC9, LRP11,
   PCMT1, TMEM70 were highly expressed in breast cancer, and IRF2, KCNJ11
   were highly expressed in normal tissues ([143]Figure 7A). Then we
   explored the protein levels of these six genes between breast cancer
   and normal tissues in the Human Protein Atlas database ([144]Uhlen et
   al., 2010) (HPA:
   [145]https://www.proteinatlas.org/humanproteome/pathology). In
   accordance with the gene expression levels, the protein levels of
   ZDHHC9, PCMT1, TMEM70 were significantly higher in breast cancer, and
   the protein levels of IRF2, KCNJ11 were higher in normal tissues
   ([146]Figure 7B). Moreover, we further checked the prognostic values of
   our six genes in the public database TCGA portal (version 1.0)
   ([147]http://tcgaportal.org/TCGA/Breast_TCGA_BRCA/process.php), and we
   found that ZDHHC9, LRP11, PCMT1, TMEM70 were associated with poor
   prognosis, while IRF2 and KCNJ11 were related with good prognosis of BC
   patients ([148]Figure 7C).
FIGURE 7.
   [149]FIGURE 7
   [150]Open in a new tab
   Validation of the six signature genes. (A) The expression of the six
   signature genes in breast cancer and normal tissue in Oncomine
   database. (B) The protein expression levels of the six prognostic genes
   in Human Protein Atlas database. (C) Survival analysis of the six
   prognostic genes based on TCGA portal.
Association of Methylation Prognostic Model and Methylation-Based Gene
Prognostic Model
   Perhaps not surprisingly, a positive and significant correlation was
   observed between the two prognostic signatures, which was mainly
   reflected at the following aspects: firstly, a moderate correlation was
   found between the six signature CpG loci and six signature genes.
   Specifically, the expression of IFR2 was positively related with the
   methylation of cg00945507, cg05406101, cg14992108, and cg18678121,
   above of which were all good prognostic factors, while IFR2 expression
   was negatively related with the methylation of poor prognostic CpG
   locus: cg13060154; And KCNJ11 expression was positively correlated with
   the methylation of favorable prognostic CpG loci: cg05406101 and
   cg10092957, and negatively related with the methylation of cg13060154;
   To the opposite of these two genes, ZDHHC9 and TMEM70 expression were
   negatively related with the methylation of cg05406101 and cg18678121,
   and positively associated with the methylation of cg13060154;
   Similarly, the expression of PCMT1 was negatively correlated with the
   methylation of cg05406101, cg14992108, cg18678121; And LRP11 expression
   was negatively related with the methylation of cg00945507, cg05406101,
   cg18678121 ([151]Supplementary Figures S4A, [152]Supplementary Figures
   S4I).
   Subsequently, we examined the correlation between the six CpG loci and
   the gene risk score, and we observed that except for cg13060154 having
   the trend being positively correlated with the risk score, cg05406101
   (R2 = −0.34, p < 0.001), cg18678121 (R2 = −0.28, p < 0.001), cg14992108
   (R2 = −0.26, p < 0.001), cg00945507 (R2 = −0.26, p < 0.001), and
   cg10092957 (R2 = −0.11, p < 0.01) were all negatively correlated with
   the risk score ([153]Supplementary Figures S4B–G, [154]Supplementary
   Figures S4I ). Besides, we also explored the relationship between
   methylation risk score and gene risk score. Interestingly, we found
   that these two established risk scores were positively correlated with
   each other (R2 = 0.34, p < 0.001) ([155]Supplementary Figures S4H,I).
Evaluation and Validation of the Gene Prognostic Signature
   The expression of the six signature genes between the gene high-risk
   and low-risk groups was shown in [156]Supplementary Figures S5A.
   Kaplan-Meier analysis showed that survival probability in the low-risk
   group was higher ([157]Figure 8A). The AUC values of 3-years and
   5-years OS were 0.725 and 0.715 ([158]Figures 8B,C). The risk score
   distribution, the survival status, and the expression of the six genes
   of 986 BC patients were visualized in [159]Figures 8D–F. Univariate and
   multivariate Cox regression analyses indicated that the risk score was
   associated with OS and could be an independent prognostic predictor,
   with univariate hazard ratio, 1.187 (95% CI, 1.082–1.302, p < 0.001),
   multivariate hazard ratio, 1.199 (95% CI, 1.094–1.314), p < 0.001,
   respectively ([160]Figures 8G,H).
FIGURE 8.
   [161]FIGURE 8
   [162]Open in a new tab
   Gene prognostic signature assessment of 986 breast cancer samples. (A)
   Survival analysis between the high-risk and low-risk groups. (B,C) The
   time-dependent receiver operating characteristic (ROC) curves at 3 and
   5 years. (D) The risk score distribution. (E) Survival status scatter
   plot. (F) Heatmap of the six prognostic genes. (G) Univariate Cox
   regression analysis. (H) Multivariate Cox regression analysis.
   To confirm the prognostic value of the six-gene signature, we tested it
   with the validation subgroup comprising of 557 BC patients. The results
   were consistent with our previous findings. Specifically, the OS in the
   high-risk group was poorer ([163]Supplementary Figures S5B), and the
   AUC values of 3 and 5 years were 0.735 and 0.696 ([164]Supplementary
   Figures S5C). Univariate and multivariate COX regression analysis also
   showed that the risk score was an independent prognostic predictor of
   BC ([165]Supplementary Figures S5D,E). Subsequently, the 986 BC
   patients’ dataset was randomly assigned into two test subgroups [test
   group one (n = 494) and test group two (n = 492)] which were with
   balanced baseline characteristics ([166]Table 4), and both of them were
   used for validating the gene signature. The two subgroups could also
   distinguish the favorable OS patients from the poor OS patients
   ([167]Supplementary Figures S5F,G). AUC values of 1 year, 3 years,
   5 years were 0.711, 0.757, 0.721 in test group one, and 0.864, 0.733,
   0.702 in test group two, respectively ([168]Supplementary Figures
   S5I,J). Moreover, the external dataset [169]GSE146558 further confirmed
   our gene prognostic signature. In high-risk group of [170]GSE146558
   dataset, patients were with poorer OS ([171]Supplementary Figures S5H).
   And AUC value of 3 years was 0.634 ([172]Supplementary Figures S5K).
TABLE 4.
   Clinical characteristics of the two validation subgroups from the
   dataset of 986 samples.
   Covariates Type Total Test group one Test group two p Value
   Gender Female 986 (100%) 492 (100%) 494 (100%) 0.9492
   Age ≤65 716 (72.62%) 358 (72.76%) 358 (72.47%) 0.9742
   — >65 270 (27.38%) 134 (27.24%) 136 (27.53%) —
   Pathologic stage[173] ^a Stage I-II 727 (73.73%) 355 (72.15%) 372
   (75.3%) 0.1261
   — Stage III-IV 239 (24.24%) 131 (26.63%) 108 (21.86%) —
   — Unknow 20 (2.03%) 6 (1.22%) 14 (2.83%) —
   T stage[174] ^a T1-2 829 (84.08%) 404 (82.11%) 425 (86.03%) 0.1253
   — T3-4 154 (15.62%) 86 (17.48%) 68 (13.77%) —
   — unknow 3 (0.3%) 2 (0.41%) 1 (0.2%) —
   M stage[175] ^a M0 811 (82.25%) 406 (82.52%) 405 (81.98%) 0.2694
   — M1 20 (2.03%) 7 (1.42%) 13 (2.63%) —
   — Unknow 155 (15.72%) 79 (16.06%) 76 (15.38%) —
   N stage[176] ^a N0 451 (45.74%) 215 (43.7%) 236 (47.77%) 0.1875
   — N1-3 518 (52.54%) 270 (54.88%) 248 (50.2%) —
   — Unknow 17 (1.72%) 7 (1.42%) 10 (2.02%) —
   [177]Open in a new tab
   ^a
   Staging according to Seventh Edition AJCC, Guidelines (Edge SB, Byrd
   DR, Compton CC, Fritz AG, Greene FL, Trotti A, eds. AJCC, Cancer
   Staging Manual. Seventh ed New York, NY: springer; 2010).
   In addition, we confirmed the prognostic value of the six-gene
   prognostic signature in the subgroups of BC patients presented with
   different clinical features (age (<65 and≥65), T staging (T1-2 and
   T3-4), N staging [N0 and N1-3) and stage (stage I-II and stage III-IV)]
   ([178]Supplementary Figure S6).
   Considering the important roles of BRCA1, BRCA2, CDH1, PTEN, TP53,
   PIK3CA in BC, we also evaluated these gene expression between gene
   high-risk and low-risk groups, and observed that the expression of
   oncogenes such as BRCA1, BRCA2 and CDH1 were significantly higher in
   high-risk group. On the other hand, the expression of the tumor
   suppressor gene PTEN was significantly higher in low-risk group
   ([179]Figure 9).
FIGURE 9.
   [180]FIGURE 9
   [181]Open in a new tab
   The differential expression of breast cancer-associated genes in gene
   high-risk and low-risk groups.
Gene Set Enrichment Analysis
   To investigate potential functions and signaling pathways related to
   the six-prognosis signature, we performed Gene Set Enrichment Analysis
   (GSEA: [182]http://www.gsea-msigdb.org/gsea/index.jsp). Notably, we
   found that more tumor-related GO terms and KEGG pathways were
   associated with low-risk group ([183]Figure 10A). In detail, low-risk
   group was mainly associated with the function of regulating epithelial
   and endothelial cell migration, and high-risk group was related with
   nuclear chromosome condensing, protein folding. Pathway enrichment
   analysis indicated that JAK/STAT signaling pathway, cell adhesion
   molecule signaling pathway, VEGF signaling pathway, and MAPK signaling
   pathway were active in the low-risk group. On the other hand, P53
   signaling pathway was active in the high-risk group ([184]Figure 10B).
FIGURE 10.
   [185]FIGURE 10
   [186]Open in a new tab
   GSEA for the gene prognostic signature. (A) The significant enrichment
   of the top 5 tumor-related GO terms in high-risk group and low-risk
   group. (B) The significant enrichment of the top 5 tumor-related
   pathways in high-risk group and low-risk group.
Correlation Analysis of the Six Signature Genes and the Sensitivity of
Anti-Tumor Drugs
   Correlation analysis between the expression of the six prognosis genes
   and the sensitivity of anti-tumor drugs was performed based on the
   CellMiner database ([187]https://discover.nci.nih.gov/cellminer/), and
   the results indicated that our signature genes were moderately
   correlated with the response of some common anti-tumor drugs such as
   PARP inhibitor (Olaparlib), chemotherapy drugs (Fluorouracil,
   Decitabine, Oxaliplatin), which might imply potential value in
   anti-tumor therapy ([188]Figure 11).
FIGURE 11.
   [189]FIGURE 11
   [190]Open in a new tab
   Correlation of the expression of the six prognostic genes and the
   sensitivity of anti-tumor drugs.
Discussion
   With the advent of next-generation sequencing, genome-wide DNA
   methylation profile analysis has become possible. Multiple studies have
   suggested that DNA methylation plays an important role in early
   detection, improved molecular classification, prognosis prediction of
   BC. Moreover, numerous studies have demonstrated that DNA methylation
   could regulate immune-related gene expression, thereby affecting the
   response of anti-tumor immunotherapy and BC patients’ prognosis. For
   examples, increasing researches have reported that the expression of
   immune genes such as CD3D, CD6, and HLA-A was found to be negatively
   correlated with DNA methylation, and was related with a better
   prognosis in BC ([191]Győrffy et al., 2016). Potential targets for
   immunotherapy are still being explored. Recent studies have shown that
   immune cell infiltration might be a biomarker for immunotherapy.
   Importantly, the methylation of immune genes could also highly
   sensitively reflect the presence of tumor infiltrating lymphocytes.
   Thus, DNA methylation profiles could be used to predict the proportion
   of all kinds of immune cells in the tumor microenvironment
   ([192]Győrffy et al., 2016), ([193]Jeschke et al., 2015). Given the
   important role of DNA methylation, it is not surprising that a better
   understanding of the DNA methylation and the exploration of the
   interaction mechanism between genes and methylation are crucial for BC
   patients.
   DNA methylation has a substantial impact on gene expression, and
   affects the prognosis of different subtypes of BC patients (44). In
   this study, we obtained six prognosis-related CpG loci, cg00945507,
   cg05406101, cg10092957, cg14992108, cg18678121, cg13060154,
   respectively targeting SEC61G, RWDD2B, NCCRP1, SNTB1, SEC61A2, DAB2IP
   genes. SEC61G was found to be overexpressed in BC and might co-amplify
   with epidermal growth factor receptor (EGFR) ([194]Reis-Filho et al.,
   2006). Lu et al. reported that the expression of SEC61G in BC was
   negatively correlated with its promoter methylation ([195]Lu et al.,
   2021). In our research, similar trend could be found that the
   expression of SEC61G was negatively related with the methylation of
   cg00945507. Moreover, the methylation level of SEC61G was positively
   correlated with the prognosis of patients with glioma ([196]Liu et al.,
   2019a). Miwa et al. proved that NCCRP1 transcription was inhibited by
   promoter hypermethylation in esophageal squamous cell carcinoma
   ([197]Miwa et al., 2017). And high expression of NCCRP1 in patients
   with pancreatic cancer was associated with a poor prognosis ([198]Zuo
   et al., 2020). In our study, we observed that the expression of NCCRP1
   was negatively correlated with the methylation of cg10092957. DAB2IP
   was a candidate tumor suppressor gene and its expression
   down-regulation mechanism was mainly through the promoter
   hypermethylation ([199]Qiu et al., 2007). Demethylation of DAB2IP gene
   weakened the EMT process and suppressed hepatocellular carcinoma growth
   ([200]Liu et al., 2019b). However, we observed a weak positive
   correlation between DAB2IP expression and cg13060154 methylation in our
   study. Regrettably, studies on the correlation between the expression
   of SEC61A2, RWDD2B, SNTB1 and DNA methylation in tumors were
   insufficient.
   On the bias of the six prognostic CpG loci, we developed a methylation
   risk model that could accurately classify BC patients with different
   death risk. Subsequently, we identified 413 DEGs from the methylation
   high-risk and low-risk groups. Function enrichment analysis indicated
   that these DEGs were related with cell cycle checkpoint, ubiquitin-like
   protein ligase binding. KEGG pathway analysis showed these genes were
   mainly enriched in p53 signaling pathway, and TGF-β signaling pathway.
   The above functions and pathways were common and critical in tumor
   proliferation, invasion, and metastasis. And then, we further extracted
   the expression of the 413 DEGs from the transcriptomic profiles of 986
   BC patients to evaluate the prognostic roles of these DEGs in TCGA-BRCA
   dataset. The prognostic value of individual genes or gene signatures
   has been extensively studied in cancers ([201]Parker et al., 2009).
   Herein, we got a methylation-based gene prognostic signature using
   multivariate Cox analysis.
   The six signature genes were composed of IRF2, ZDHHC9, KCNJ11, LRP11,
   PCMT1, and TMEM70. IRF2, a transcription factor in the interferon gamma
   signal transduction pathway, was different expression in breast cancer
   and normal tissues. Kriegsman et al. found that IRF2, which positively
   regulated the MHC class I pathway and negatively regulated PD-L1
   expression, had good implications for immunotherapy and prognosis of BC
   ([202]Kriegsman et al., 2019). ZDHHC9, one of risk genes of BC, was
   found to participate in palmitoylating PD-L1 to keep its protein
   stability, leading to immune escape. Inhibiting the ZDHHC9 expression
   made breast cancer cells susceptible to T cell killing and inhibited
   tumor growth. Thus, ZDHHC9 could be a biomarker of immunotherapy
   response ([203]Yang et al., 2019). KCNJ11 played a key role in
   glucose-stimulated insulin secretion ([204]Cook and Hales, 1984). It is
   well established that diabetes is closely related to a variety of
   tumors ([205]Giovannucci et al., 2010), and the mortality is higher
   among women with longer diabetes duration in BC ([206]Lega et al.,
   2018). Therefore, diabetes-related genes KCNJ11 may also be a potential
   prognostic biomarker of BC. Yet, the relationship between KCNJ11 and
   breast cancer has not been systematically reported. PCMT1 has gradually
   been considered as a risk gene in tumors. Study demonstrated that BC
   patients with higher PCMT1 expression had a poorer prognosis ([207]Dong
   et al., 2021). Furthermore, the different expression of the six
   signature genes in mRNA and protein levels was validated in public
   databases.
   The methylation-based gene signature could also distinguish BC patients
   with a significantly increased risk from those with a decreased risk.
   Moreover, correlation analysis showed that the methylation of the six
   signature CpGs were closely correlated with the expression of the six
   signature genes, and the established gene risk score was significantly
   positively correlated with the methylation risk score.
   Multigene analysis has been popularized to predict the response of
   anti-tumor therapy and prognosis in BC. For instance, the EndoPredict
   score (12-gene molecular signature) has been used to predict the
   survival without distant recurrence up to 15 years after diagnosis.
   Recently, the 12-gene MS has also been proven to predict the response
   to neoadjuvant chemotherapy (NaCT) and neoendocrine therapy (NET) in
   HR+, her2- BC patients, with AUC values being 0.736 for NaCT, 0.726 for
   NET ([208]Dubsky et al., 2020). Other widely used multigene assays
   involve Oncotype Dx, MammaPrint, and PAM50 which have been validated to
   predict the treatment response, recurrence, prognosis in BC patients.
   Among these multi-gene tests, MammaPrint has the best predictive
   performance (AUC = 0.88), following by Oncotype Dx (AUC = 0.76), PAM50
   risk of relapse based on subtype (ROR-S) (AUC = 0.68) and the PAM50
   risk of relapse based on subtype and proliferation (ROR-P) (AUC = 0.55)
   ([209]Grimm and Mazurowski, 2020). Our study developed methylation and
   methylation-based prognostic signatures, both of which had excellent
   performance in predicting the prognosis of BC patients with 3-years,
   5-years AUC values being 0.739, 0.744 for methylation signature, and
   0.725, 0.715 for methylation-based gene signature. Three TCGA-BRCA
   subgroups were used to validate the gene prognostic signature and all
   of them showed powerful prediction effects with 3-years AUC values of
   0.757, 0.735, 0.733, respectively. Moreover, the external dataset
   [210]GSE146558 was also used to validate our gene prognostic signature.
   Due to small sample size (n = 106) and inter-dataset heterogeneity, we
   could not obtain a higher AUC value in the validation set, although the
   3-years AUC value being 0.634 was still statistically significant.
   Increasing researches reported that CDH1, BRCA2, and BRCA1 were
   susceptibility genes for BC ([211]Petridis et al., 2019) and around
   60–70% of women with BRCA1 or BRCA2 gene mutations would be suffered
   with BC in her lifetime ([212]Antoniou et al., 2003). Besides, BRCA2
   mutation carriers were more likely to develop brain metastases than
   non-carriers ([213]Song et al., 2020). PTEN is a tumor suppressor gene
   in BC, and researches proved that lack or decrease of PTEN expression
   might be associated with poor prognosis in BC ([214]Luen et al., 2018).
   Then we examined the expression alteration of these genes in gene
   high-risk and low-risk groups to understand the contribution of the
   gene signature to the carcinogenesis of BC. As expected, the expression
   of proto-oncogenes BRCA1, BRCA2, and CDH1 was significantly higher in
   high-risk group. Conversely, expression of tumor suppressor gene PTEN
   was significantly higher in low-risk group.
   There were some advantages of our study. First of all, we were the
   first one to discuss the prognostic roles of CpG loci in breast cancer,
   and constructed methylation-associated signatures. Secondly, these two
   prognostic signatures were positively correlated with each other and
   both of them could accurately discriminate breast cancer patients with
   different death risk. Besides, three subgroups of TCGA dataset and an
   external dataset [215]GSE146558 were verified the prognostic value of
   our gene signature. Finally, the above results, together with risk gene
   expression verification, GSEA, drug sensitivity analysis, might provide
   novel treatment and prognosis biomarkers for breast cancer patients. We
   believe with the advent of the era of precision medicine, clinical
   trials could be designed using gene signature-based risk scores to
   select the patients most likely to develop poor prognosis in which to
   develop novel or more intensive postoperative therapies in future.
   One major limitation for our study was that data in our study was
   downloaded from the public databases, the mechanism of the six
   signature genes and six signature CpGs affecting the occurrence and
   development of breast cancer still needs to be further verified by vivo
   and vitro experiments. Even, prospective clinical trials are needed to
   check the prognostic values of these two signatures.
Conclusion
   Taken together, we proposed two methylation-related prognostic
   signatures. These two signatures were significantly positively
   correlated with each other and both of them could predict the prognosis
   of BC patients more accurately than traditional clinical predictors.
   Importantly, the six key genes (IRF2, ZDHHC9, KCNJ11, LRP11, PCMT1,
   TMEM70) of gene prognostic signature may act as potential prognostic
   biomarkers and therapeutic targets.
Data Availability Statement
   The original contributions presented in the study are included in the
   article/[216]Supplementary Material, further inquiries can be directed
   to the corresponding authors.
Ethics Statement
   Ethical review and approval was not required for the study on human
   participants in accordance with the local legislation and institutional
   requirements. Written informed consent for participation was not
   required for this study in accordance with the national legislation and
   the institutional requirements.
Author Contributions
   Conceptualization, CZ, QW, and YZ; methodology, CZ, QW, and YZ;
   Software, CZ and QW; Validation, CZ, ZZ, SZ, and DL; formal analysis,
   CZ, and QW; investigation, CZ, QW, and SZ; resources, CZ, QW, and DL;
   data curation, CZ and QW.; Visualization, CZ, QW, and NY.;
   Writing—Original Draft Preparation, CZ, QW, and YZ; Writing—Review and
   Editing, CZ, QW, and YZ; Supervision, CZ, QW, and YZ; Project
   Administration, CZ, QW and YZ; All authors have read and agreed to the
   published version of the manuscript.
Funding
   This study was supported by a grant from the Leading Discipline
   Construction Project of Oncology of Zhongnan Hospital of Wuhan
   University, a grant from the Science, Technology and Innovation Seed
   Fund of Zhongnan Hospital of Wuhan University (Grant No. znpy2018123),
   and a grant from the National Natural Science Foundation of China
   (81472799).
Conflict of Interest
   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.
Publisher’s Note
   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.
Supplementary Material
   The Supplementary Material for this article can be found online at:
   [217]https://www.frontiersin.org/articles/10.3389/fgene.2021.742578/ful
   l#supplementary-material
   Supplementary Figure S1
   (A–F) The correlation between the six signature CpGs and their target
   genes. (A) DAB2IP, (B) NCCRP1, (C) RWDD2B, (D) SEC61A2, (E) SEC61G, (F)
   SNTB1; (G–L) Differential expression of target genes in the
   hypermethylation and hypomethylation of the six signature CpGs. (G)
   DAB2IP, (H) NCCRP1, (I) RWDD2B, (J) SEC61A2, (K) SEC61G, (L) SNTB1;
   (M–R) The methylated levels of the CpGs of the target genes in breast
   cancer and normal tissue in MethSurv. (M) cg13060154, (N) cg14992108,
   (O) cg18678121, (P) cg10092957, (Q) cg05406101, (R) cg00945507; (S–X)
   The survival analysis of the high- and low-methylation levels of the
   CpGs of target genes in MethSurv. (S) SEC61A2, (T) RWDD2B, (U) DAB2IP,
   (V) SEC61G, (W) NCCRP1, (X) SNTB1.
   Supplementary Figure S2
   (A) The methylated levels of the six signature CpGs between methylation
   high-risk and low-risk groups of 560 breast cancer samples. (B–D)
   Methylation prognostic model assessment in the testing group of 278
   breast cancer samples. (B) Survival analysis between the high-risk and
   low-risk groups. (C,D) The time-dependent receiver operating
   characteristic (ROC) curves at 3 and 5 years.
   Supplementary Figure S3
   Survival analysis of the six signature genes.
   Supplementary Figure S4
   (A–I) The association of the methylation prognostic model and the gene
   prognostic signature. (A) The correlation of the six signature genes
   and the six signature CpGs. (B–G) The correlation between the β value
   of the six CpGs and the gene risk scores. (H) The correlation between
   methylation risk scores and gene risk scores. (I) The correlation of
   the methylation prognostic model and gene prognostic model.
   Supplementary Figure S5
   (A–I) (A) The expression of the 6 signature genes between gene
   high-risk and low-risk groups of 986 breast cancer samples; (B–E)
   Validation of the gene prognostic signature in a BRCA-TCGA subgroups
   composed of 557 samples, (B) Survival analysis, (C) ROC curve, (D)
   Univariate Cox regression analysis, (e) Multivariate Cox regression
   analysis; (F–H) Survival analysis in the other two BRCA-TCGA subgroups
   and the [218]GSE146558 dataset, (F) survival analysis in the subgroups
   composed of 494 samples, (G) Survival analysis in the subgroups
   composed of 492 samples, (H) Survival analysis in the [219]GSE146558
   dataset; (I–K) ROC curves in the other two BRCA-TCGA subgroups and the
   [220]GSE146558 dataset, (I) ROC curve in the subgroups composed of 494
   samples, (J) ROC curve the subgroups composed of 492 samples, (K) ROC
   curve in the [221]GSE146558 dataset.
   Supplementary Figure S6
   The overall survival analysis of breast cancer patients with different
   clinical characteristics in gene high-risk and low-risk groups.
   [222]Click here for additional data file.^ (38MB, zip)
References