Abstract
Long non-coding RNAs (lncRNAs) play critical roles in tumor immunity;
however, the functional roles of immune-related lncRNAs in breast
cancer (BC) remain elusive. To further explore the immune−related
lncRNAs in BC, whole genomic expression data and corresponding clinical
information were obtained from multiple BC datasets. Based on
correlation with the immune-related genes within the training set, we
screened out the most promising immune-related lncRNAs. Subsequently,
Lasso penalized Cox regression analysis followed by stepwise
multivariate Cox regression analysis identified six survival-related
lncRNAs ([29]AC116366.1, [30]AC244502.1, [31]AC100810.1, MIAT,
[32]AC093297.2, and [33]AL356417.2) and constructed a prognostic
signature. The cohorts in the high−risk group had significantly poor
survival time compared to those in the low−risk group. In addition, a
nomogram integrated with clinical features and the prognostic signature
was developed on the basis of the training set. Importantly, all the
findings had a similar performance in three validated datasets. In the
following studies, our integrative analyses indicated that the
infiltration of CD8-positive (CD8) T cells associated with a good
prognosis was strikingly activated in the low−risk group. To further
provide an interpretation of biological mechanisms for the prognostic
signature, we performed weighted gene co−expression network analysis
(WGCNA) followed by KEGG pathway-enrichment analysis. Our results
showed that the antigen presentation pathway involved in protein
processing in endoplasmic reticulum and antigen processing and
presentation was markedly altered in the high-risk group, which might
promote tumor immune evasion and associate with poor clinical outcomes
in BC patients with high risk scores. In conclusion, we aimed to take
advantage of bioinformatics analyses to explore immune−related lncRNAs,
which could function as prognostic indicators and promising therapeutic
targets for BC patients.
Keywords: breast cancer, immune infiltration, long non-coding RNA, gene
expression omnibus, the cancer genome atlas, prognostic signature
Introduction
Breast cancer (BC) is one of the most frequent malignant tumors and a
leading cause of cancer death among females ([34]Siegel et al., 2019).
Although comprehensive treatments including surgery, radiotherapy, and
chemotherapy have been well supplied, the prognosis of BC patients is
not satisfactory ([35]Schwartz and Erban, 2017; [36]Emens, 2018).
Therefore, it is urgent to have an insight into novel prognostic and
diagnostic markers to further develop novel therapeutic approaches for
BC patients.
The tumor microenvironment (TME) is a complicated system, consisting of
not only tumor cells but also tumor-associated normal epithelial and
stromal cells, immune cells, and vascular cells ([37]Junttila and de
Sauvage, 2013). Emerging evidence has identified that dysfunctional
immune status in TME is a hallmark of cancers ([38]Yu et al., 2018).
Infiltrating immune cells in TME play critical roles in the initiation
and progression of cancer, and positively or negatively influence
patient prognosis, which is primarily due to tumor heterogeneity and
distinct populations of tumor-infiltrating immune cells ([39]Talmadge
et al., 2007). The infiltration of regulatory T (Treg) cells inhibits
endogenous immune responses against tumors and correlates with poor
prognosis ([40]Joshi et al., 2015), whereas dendritic cell (DC)
infiltration is typically associated with a positive clinical outcome
associated with their capacity to initiate and regulate both innate and
adaptive immunity ([41]Saxena and Bhardwaj, 2018). Myeloid-derived
suppressor cells (MDSCs) could prevent cytotoxic T lymphocytes (CTLs)
from binding to the peptide–MHC complex and therefore inhibit antitumor
activity ([42]Nagaraj et al., 2007). In contrast, infiltration of
CD8-positive cytotoxic T lymphocytes (CD8 T cells) into solid tumors is
associated with good prognosis in various types of cancer, including BC
([43]Jiang and Shapiro, 2014; [44]Li et al., 2019). CD4-positive T
cells, namely, T follicular helper (TFH) cells, have two predominant
cell subtypes, Th1 and Th2. In general, infiltration of TH1 cells
predicts a positive outcome, whereas TH2 cells predict a negative
outcome ([45]Giraldo et al., 2014; [46]Pradhan et al., 2014). Natural
killer (NK) cells, lymphocytes of the innate immune system, are
essential for defense against infectious pathogens and nascent
malignant cells. Classically, the CD56dim NK cell subset is considered
to mediate antitumor responses, whereas the CD56bright NK cell subset
is involved in immunomodulation. Nevertheless, current studies
demonstrated that brief priming with IL-15 strikingly enhanced the
antitumor response of CD56 bright NK cells ([47]Wagner et al., 2017).
Given that clinical outcomes of cancer patients were frequently
associated with the infiltration of immune cells in TME, current
diagnostic and therapeutic strategies are mainly focused on the
analysis and manipulation of infiltrating immune cellular populations
([48]Talmadge et al., 2007; [49]Tamborero et al., 2018).
Long non-coding RNAs (lncRNAs), longer than 200 nucleotides, play
crucial roles in the occurrence and progression of cancers ([50]Iyer et
al., 2015). Previous studies have shown that lncRNAs could interact
with other biomolecules, such as proteins, regulatory DNA regions, and
miRNAs, and therefore regulate gene expression at multiple levels,
including epigenetic, transcriptional, and post-transcriptional ([51]Wu
et al., 2018). Recently, mounting studies found that the lncRNAs also
play a pivotal role in cancer immunity ([52]Hu et al., 2013; [53]Heward
and Lindsay, 2014). For instance, lincRNA-Cox2, proximal to the
prostaglandin-endoperoxide synthase 2 (Ptgs2/Cox2) gene, regulated both
the activation and repression of diverse types of immune genes
([54]Carpenter et al., 2013; [55]Hu et al., 2016). LncRNA FENDRR had an
impact on the immune escape of hepatocellular carcinoma cells through
interaction with miR-423-5p and GADD45B ([56]Yu et al., 2019). Another
research revealed that lncRNA NKILA overexpression in tumor-specific
CTLs and TH1 promoted their apoptosis and contributed to shorter
survival time in patients with BC or lung cancer ([57]Huang et al.,
2018). However, the immune−related lncRNAs associated with BC
progression remain poorly understood.
In this study, we integrated bioinformatics analyses and took
advantages of lncRNA expression profiles to explore novel prognostic
marks and therapeutic targets for BC patients.
Materials and Methods
Sample Datasets and Data Processing
The [58]GSE20685 dataset was downloaded from the GENE EXPRESSION
OMNIBUS (GEO) which contains 327 BC cases. This dataset was based on
the [59]GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array).
Meanwhile, patients’ clinical information was also obtained from the
GEO database, including age, overall survival time (OS), survival
status, and clinical stage. We re-annotated probe sets of Affymetrix
Hg-U133 Plus 2.0 array by mapping all probes to the human genome
(GRCh37) using SeqMap ([60]Jiang and Wong, 2008). In the analysis, we
ensure that those probes mapped to the genome were unique with no
mismatch, and all probes mapped to pseudogene transcripts were removed.
The analysis procedure was conducted by the previous study ([61]Du et
al., 2013).
Construction and Validation of Immune-Related LncRNAs Prognostic Signature
Firstly, the immune−related genes were extracted from immune system
process and immune response gene sets sourced from the Molecular
Signatures Database (MSigDB,^[62]1) ([63]Liberzon et al., 2015). Then,
a cohort of immune−related lncRNAs was identified according to Pearson
correlation analysis between immune genes and lncRNA expression level
in BC samples (|r| > 0.5, p < 0.001). Next, the univariate Cox
regression analysis and Kaplan–Meier method were separately performed
to sort out survival−related lncRNAs with significant prognostic value
(p < 0.05) on the basis of the training set. Following these two
independent analyses, the overlapping lncRNAs with significant
difference (p < 0.05) were sent for further analyses. Finally, Lasso
penalized Cox regression analysis followed by stepwise multivariate Cox
regression analysis identified the most promising survival-related
lncRNAs and constructed a prognostic signature.
The enrichment scores (ES) of immune genes associated with the
prognostic signature were calculated using the Single-sample Gene Set
Enrichment Analysis (ssGSEA) in Gene Set Variation Analysis (GSVA)
package ([64]Hanzelmann et al., 2013), and then the immune signature
based on enrichment scores was developed. The Pearson correlation
analysis was employed to evaluate the relationship between the
prognostic signature and the immune signature. Subsequently, random
forest (RF) and logistic regression (LR), two well machine learning
algorithms, were applied to evaluate the performance of the six-lncRNA
signature for stratifying the immune status based on the median
enrichment score of immune signature according to the area under the
ROC curve (AUC) through five-fold cross validation ([65]Han et al.,
2014).
Risk Score Construction
To confirm the reliability of the detected immune−related lncRNAs, the
risk score was designed to establish a unitive signature for the
prognostic analysis, and the coefficient for each lncRNA was obtained
through the multivariate cox regression following bidirectional
stepwise selection. A risk score formula was established as follows:
[MATH: ∑iCoefficient(IncRNAi)×Expression(IncRNAi) :MATH]
Risk score = (−0.468 × expression of [66]AC116366.1) + (−0.260 ×
expression of [67]AC244502.1) + (−0.279 × expression of [68]AC100810.1)
+ (−0.162 × expression of MIAT) + (−0.312 × expression of
[69]AC093297.2) + (0.173 × expression of [70]AL356417.2).
Nomogram Construction
A nomogram integrated clinical features and the six-lncRNA signature
was constructed using the “rms” R package. Calibration curve were
performed to assess the predictive accuracy of the nomogram. The
predicted outcomes and actual observed outcomes of the nomogram were
presented in the calibrate curve, and the 45° line represents the best
prediction.
Gene Set Enrichment Analysis
To further explore the relationship between immune-related gene sets
and the six-lncRNA signature, GSEA was performed with the JAVA program.
Genes were ranked on the basis of differential significance between the
high- and low- risk groups, which were stratified via the median risk
score in BC patients. After performing 1000 permutations, gene sets
enrichment with nominal p < 0.05 and FDR < 0.25 were considered
significant.
Weighted Gene Co-expression Network Analysis (WGCNA)
The WGCNA package ([71]Langfelder and Horvath, 2008) was applied for
further analysis on the basis of the training set, and the analysis
procedure was conducted as previously described ([72]Giulietti et al.,
2016). In brief, two weighted gene co-expression networks were
constructed via the WGCNA package according to quartile of risk scores,
with the bottom group serving as the reference network and the top
group serving as the test network. The conservation of gene pairs
between two networks was evaluated by module preservation function from
WGCNA. In order to make analysis more accurate, 1000 times of
permutation were performed. Subsequently, median rank was used to
confirm non-preservative modules and Z-summary score (Z-score) was
applied to evaluate the significance of non-preservative modules. Note
that Z-scores greater than 10 indicate a high preservation and Z-scores
less than 10 indicate a weak preservation. Finally, the genes in
non-preservation modules were enrolled to send for DAVID tool (version
6.8) ([73]Huang et al., 2007) to map these hub genes to KEGG pathways.
Enriched KEGG pathways with p-values < 0.05 were extracted, and then
the enrichment results were visualized by ggraph package.
External Data Validation
To further validate predictive capability of the six-lncRNA signature,
we enrolled three independent datasets, including [74]GSE21653,
[75]GSE88770, and TCGA with a total of 245, 117, and 888 cases,
respectively. [76]GSE21653 and [77]GSE88770 were based on the platform
[78]GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). The TCGA
dataset (level3, raw count) was obtained from The Cancer Genome Atlas
(TCGA) via the TCGAbiolinks package. Patients without complete
information in training and validation sets had been excluded from this
study. The inclusion patient ids are listed in [79]Supplementary Table
S1.
Statistical Analysis
In Kaplan–Meier survival analysis, the “surv_cutpoint” function in
survminer package was applied to determine the optimal cutoff value of
risk scores. Note that the amount of samples in any group could not be
less than 30% of the total samples. According to the optimal cutoff
value, BC patients were divided into high− and low−risk groups to
perform survival analysis. The time−dependent receiver operating
characteristic (ROC) analysis was performed to investigate the
prognostic accuracy of the six−lncRNA signature using the survivalROC
package. After the ssGSEA method calculated the enrichment scores of 24
immune cell types ([80]Bindea et al., 2013), Pearson correlation
analysis was performed to evaluate the relationship between risk scores
and infiltrating immune cells. Logistic regression analysis was carried
out to evaluate whether combination of the six lncRNAs have higher
predictive accuracy for infiltrating status of CD8 T-cells than each
predictor alone. Differences with p < 0.05 were considered significant.
Results
Construction of Immune-Related LncRNAs Prognostic Signature
A flow chart described our analysis procedure was presented in
[81]Figure 1A. In our study, 327 samples sourced from [82]GSE20685
served as the training set. A total of 313 immune−related genes were
obtained from the Molecular Signatures Database, which were associated
with immune response and immune system process ([83]Supplementary Table
S2). Next, 82 immune−related lncRNAs were extracted using Pearson
correlation analysis (|r| > 0.5, p < 0.001) ([84]Supplementary Table
S3). Univariate Cox regression analysis and Kaplan–Meier method was
separately performed to identify survival-related lncRNAs (p < 0.05),
of which 13 were overlapped ([85]Figure 1B and [86]Supplementary Table
S4). After primary filtration, Lasso penalized Cox regression analysis
following stepwise multivariate Cox regression analysis was carried out
to sort out six survival-related lncRNAs and developed a prognostic
signature ([87]Figures 1C,D; [88]Supplementary Table S5). Moreover, a
regulatory network associated with 6 lncRNAs and 86 signature-related
immune genes was established and visualized using Cytoscape3.7.1
([89]Supplementary Figure S1).
FIGURE 1.
[90]FIGURE 1
[91]Open in a new tab
Construction of Immune–related LncRNAs Prognostic Signature (A) Flow
chart indicating the analysis procedure. (B) Survival analysis was
separately performed by univariate Cox regression analysis and
Kaplan–Meier method, of which 13 survival-related lncRNAs were
overlapped. (C) The selection of the tuning parameter via 10 times
cross–validation in the Lasso model. (D) Lasso coefficient profiles of
13 lncRNAs. (E) The predictive efficacy of the six-lncRNA signature in
classification of immune signature. RF: random forest; LR: logistic
regression. (F) The correlation identified by Pearson correlation
analysis among immune signature, six-lncRNA signature, and six lncRNAs.
Based on 86 model-related immune genes ([92]Supplementary Table S6), we
constructed an immune signature via the ssGSEA method. The enrichment
scores of the immune signature are listed in [93]Supplementary Table
S7. Subsequently, random forest (RF) and logistic regression (LR)
revealed that the prognostic signature could significantly stratify two
immune subgroups on the basis of the median enrichment score (LR, AUC
score = 0.919, RF, AUC score = 0.898) ([94]Figure 1E). Moreover, the
interplay among immune signature, prognostic signature, and six lncRNAs
was evaluated by Pearson correlation analysis ([95]Figure 1F;
[96]Supplementary Table S8). We observed that the six-lncRNA signature
was negatively associated with the immune signature (r = -0.342, p =
1.99E-10). In the prognostic signature, [97]AC116366.1, [98]AC244502.1,
[99]AC100810.1, MIAT, and [100]AC093297.2 acted as the protective
factors with an HR value less than 1, and [101]AL356417.2 acted as the
deleterious factor with an HR value greater than 1 ([102]Table 1).
TABLE 1.
Six immune-related lncRNAs of the prognostic signature.
Pearson correlation analysis
__________________________________________________________________
Univariate Cox regression analysis
__________________________________________________________________
lncRNAs Ensemble_ID R p-value HR HR.95L HR.95H p-value
[103]AC116366.1 ENSG00000234290.2 0.462 1.12E−18 0.427 0.280 0.649
6.76E−5
[104]AC244502.1 ENSG00000251002.6 0.786 6.59E−70 0.777 0.660 0.915
0.0025
[105]AC100810.1 ENSG00000253982.1 –0.195 0.0004 0.705 0.563 0.882
0.0023
MIAT ENSG00000225783.5 0.540 3.54E−26 0.853 0.746 0.975 0.0202
[106]AC093297.2 ENSG00000272335.1 –0.446 2.04E−17 0.757 0.628 0.912
0.0034
[107]AL356417.2 ENSG00000233682.3 0.140 0.0112 1.319 1.079 1.613 0.0070
[108]Open in a new tab
R: Correlation of six lncRNAs and the immune signature.
Primary Evaluation of the Six-LncRNA Signature in the Training Set
Overall survival curves of six immune-related lncRNAs are shown in
[109]Figure 2A. Risk scores, survival status, OS period, and six-lncRNA
expression level of each patient are presented in [110]Figure 2B.
Moreover, patients in the low−risk group showed increased survival time
in contrast to those in the high−risk group ([111]Figure 2C).
Univariate following multivariate Cox regression analysis was conducted
to identify that N-stage and the six-lncRNA signature functioned as
independent prognostic factors ([112]Supplementary Figures S2,
[113]S3). The predictive accuracy of the six-lncRNA signature was
assessed by time−dependent ROC curves at 1, 3, and 5 years on the basis
of AUC scores. The AUC scores at 1, 3, and 5 years were 0.88, 0.758,
and 0.725, respectively, which indicated that the prognostic signature
had a moderate sensitivity and specificity in the training set
([114]Figure 2D).
FIGURE 2.
[115]FIGURE 2
[116]Open in a new tab
Primary evaluation of the six-LncRNA Signature in the Training Set. (A)
Overall survival curves of six lncRNAs. (B) From top to bottom are the
risk score, survival status distribution, and six-lncRNA expression
level of each patient. (C) Overall survival curves of the prognostic
signature, in which the blue line represents the high-risk subgroup and
the yellow line represents the low-risk subgroup. (D) The 1-, 3-, and
5-year survival receiver operating characteristic curves.
Further Evaluation of the Six-LncRNA Signature in Validation Sets
To further evaluate the predictive efficacy of the six−lncRNA
signature, three independent external datasets ([117]GSE21653,
[118]GSE88770, and TCGA) were used to validate our findings.
Kaplan–Meier analysis showed that patients in the low-risk group had
evidently longer survival time than those in the high-risk group in
three validation sets ([119]Figures 3A–C), which was consistent with
the results from the training set. The ROC curves at 1, 3, and 5 years
indicated that the signature also had a good predictive performance in
validation sets ([120]Figures 3D–F).
FIGURE 3.
[121]FIGURE 3
[122]Open in a new tab
Further evaluation of the six-LncRNA Signature in Validation Sets.
(A–C) Survival curves of the prognostic signature in three validation
sets. [123]GSE21653: Disease-free survival (DFS) curve. [124]GSE88770
and TCGA: Overall survival (OS) curve. (D–F) Survival receiver
operating characteristic curves in three validation sets.
Construction and Evaluation of Nomogram
To enhance the predictive accuracy, four clinical features and the
six-lncRNA signature were integrated and a nomogram was constructed on
the basis of the training set ([125]Figure 4A). The calibration curves
presented a good agreement between prediction by nomogram and actual
observation for predicting survival probability at 3 and 5 years in
training and validation datasets ([126]Figures 4B–E).
FIGURE 4.
[127]FIGURE 4
[128]Open in a new tab
Construction and Evaluation of Nomogram. (A) Nomogram to predict
survival probability. (B–E) Three- and 5-year nomogram calibration
curves in training and validation sets. The training set:
[129]GSE20685. The validation sets: [130]GSE21653, [131]GSE88770, and
TCGA.
Immune Status Analysis for High- and Low-Risk Groups
In the following study, GSEA was applied to show that immune system
process and immune response gene sets were significantly enriched in
the low−risk group ([132]Figures 5A,B). The ssGSEA followed by Pearson
correlation analysis was conducted to validate the relationships
between risk scores and infiltration immune cells. As presented in
[133]Figure 5C, risk scores had the strongest correlation with CD8 T
cells compared to other types of immune cells. Based on the median risk
score, BC patients in the training set were stratified into two
subgroups, high- and low-risk groups. High-risk cohorts displayed lower
enriched levels of the immune signature and CD8 T cells compared to
low-risk cohorts ([134]Figure 5D). Meanwhile, the immune signature was
strikingly positively related with CD8 T cells on the basis of Pearson
correlation analysis (r = 0.72, p = 1.54E-53; [135]Supplementary Figure
S4). Survival curve revealed that infiltration of CD8 T cells was
associated with a favorable prognosis in BC patients ([136]Figure 5E).
FIGURE 5.
[137]FIGURE 5
[138]Open in a new tab
Immune status analysis for high– and low–risk groups. (A,B) Gene set
enrichment analysis (GSEA) indicated a significant enrichment of
immune–related phenotype in the low–risk group. (C) Pearson correlation
analysis between risk scores of the six-lncRNA signature and enrichment
scores of the immune cells. The results with p-value < 0.05 were
extracted. (D) Enrichment scores of immune signature and CD8 T cells in
high- and low-risk cohorts. The patients were divided into two
subgroups on the basis of the median risk score. The enrichment score
was scaled to be between 0 and 1 in the plot. Orange: immune signature;
blue: CD8 T cells; red: high-risk patients; green: low-risk patients.
(E) Kaplan–Meier curve associated with CD8 T cells.
In training and validation sets, risk scores were negatively related
with enrichment scores of CD8 T cells on the basis of Pearson
correlation analysis ([139]Figures 6A,D). To further identify
predictive efficacy of the signature for infiltration of CD8 T cells,
the patients of training and validation sets were divided into high-
and low-infiltration groups according to the median enrichment score of
CD8 T cells. Logistic regression analysis showed that the combination
of six lncRNAs had higher predictive efficacy than each predictor alone
([140]Figures 7A–D).
FIGURE 6.
[141]FIGURE 6
[142]Open in a new tab
The correlation of the six-lncRNA signature and CD8 T cells. (A–D)
Pearson correlation analysis was conducted to identify the relationship
between risk scores of the prognostic signature and enrichment scores
of CD8 T cells in training and validation sets.
FIGURE 7.
[143]FIGURE 7
[144]Open in a new tab
The predictive efficacy of the six-lncRNA signature for infiltration of
CD8 T cells. (A–D) Logistic regression was conducted to evaluate the
predictive efficacy of the combination of the six lncRNAs and each
lncRNA alone on the basis of the AUC value in training and validation
sets.
Construction of Weighted Gene Co-expression Network and Identification of Key
Modules
In the following study, we applied the WGCNA method to identify
non-preservation modules, whose characteristics could distinguish the
test network from the reference network. Firstly, we confirmed a soft
threshold power value of 16 as best optimum index in the high-risk
group ([145]Supplementary Figure S5) and 2 as the best optimum index in
the low-risk group ([146]Supplementary Figure S6). Next, we identified
6 modules in the high-risk group and 22 modules in the low-risk group
except for the gray module via hierarchical clustering. Finally, the
module preservation function in the WGCNA package was applied to
identify three non-preservation modules, the light green module
(Z-score = 8.2), the midnight blue module (Z-score = 9.8) and the tan
module (Z-score = 6.0) ([147]Figure 8A).
FIGURE 8.
[148]FIGURE 8
[149]Open in a new tab
Construction of weighted co–expression network and identification of
key modules. (A) Each module is represented by its color-code and name.
Left plot shows the preservation median rank, which tends to be
independent from module size with high median ranks indicating low
preservation. Right plot shows Z-score value. The green dotted line
indicates that Z-score is equal to 10. Z-score < 10 represents weak
preservation. (B) Enriched KEGG pathways with p-values < 0.05 were
presented. KME value (eigengene connectivity) of module internal hub
genes was indicated by dots’ size, respectively.
To further investigate the biological mechanisms of non-preservation
modules, we mapped internal genes of non-preservation modules to KEGG
pathways using the DAVID tool (version 6.8) ([150]Huang et al., 2007).
Enriched KEGG pathways with p values < 0.05 were extracted. KEGG
pathway-enrichment analysis demonstrated that the midnight blue module
was mainly involved in protein processing in endoplasmic reticulum and
antigen processing and presentation; the light green module was mainly
involved in sphingolipid signaling pathway, hippo signaling pathway,
and tight junction; and the tan module was mainly involved in circadian
rhythm ([151]Figure 8B). In summary, the midnight blue module could be
a crucial module, whose disorganized functions affected immune cell
infiltration and contribute to shorter survival time in patients with
high risk scores.
Discussion
Infiltration of immune cells in TME has dual roles on cancer
progression and patient prognosis, and deciphering their complex
interplay is of seminal importance for exploring novel prognostic
markers and therapeutic targets in cancer patients ([152]Fridman et
al., 2011; [153]Klemm and Joyce, 2015). LncRNAs have been reported to
function as key regulators in immune response, inflammation, and
distinct immune cell types ([154]Das et al., 2018; [155]Li et al.,
2018; [156]Wang et al., 2018; [157]Xu and Cao, 2019), and play crucial
roles in cancer malignant progression ([158]Bartonicek et al., 2016;
[159]Jiang et al., 2016; [160]Schmitt and Chang, 2016). Therefore,
insight into immune-related lncRNAs could bring profound effects on
cancer-related therapeutics and survival prognosis prediction
([161]Chen et al., 2017).
Since the lncRNA expression level and biological functions were
markedly associated with immune-related genes ([162]Roux et al., 2017),
we employed immune-associated gene sets to screen out highly
immune-related lncRNAs by co-expression analysis. Subsequently, a
series of bioinformatics analyses were integrated to identify a
six-immune-related lncRNA signature, which could well stratify patients
into favorable or unfavorable prognoses. Moreover, the six-lncRNA
signature was considered as an independent risk factor by multivariate
Cox regression analysis. To further improve the predictive accuracy, a
nomogram consisting of age, TMN stage, and the six-lncRNA signature was
constructed. A good performance of the nomogram was identified by the
calibration curve. Meanwhile, our findings were validated in multiple
BC datasets. Additionally, functional annotation conducted by GSEA
revealed that the immune-related gene sets, the immune system process,
and immune response were significantly activated in the low-risk group.
Previous studies have shown that different types of infiltrating immune
cells have distinct effects on tumor progression ([163]Fridman et al.,
2012; [164]Bindea et al., 2013). In order to investigate the
association between infiltration immune cells and the lncRNAs
signature, the ssGSEA method was used to calculate enrichment scores of
24 immune cell types ([165]Bindea et al., 2013), and then Pearson
correlation analysis was conducted to evaluate their correlations. Our
integrated analyses indicated that risk scores were significantly
associated with the enrichment level of CD8 T cells. In addition, the
combination of six lncRNAs had higher prognostic accuracy for CD8 T
cell infiltrating status than each predictor alone, indicating that the
signature could provide promising immunotherapeutic targets for the
treatment of BC. CD8 T cells function as a tumor suppressor and play
crucial roles in cancer progression ([166]Barry and Bleackley, 2002;
[167]Vesely et al., 2011; [168]Jansen et al., 2019). In BC, previous
studies have demonstrated that CD8 T cells are the primary effector
immune cells and associated with favorable clinical outcomes of BC
patients ([169]Mahmoud et al., 2011). Taken together, dysregulation of
CD8 T cells might be a key mechanism associated with a higher mortality
of patients in the high-risk group.
To further provide an interpretation of biological mechanisms for the
six-lncRNA signature, we performed WGCNA analysis to explore
non-preservative modules in the high-risk group compared with the
low-risk group. Subsequent KEGG pathway-enrichment analysis
demonstrated that the midnight blue module was strongly associated with
immune-related pathways mainly involved in protein processing in
endoplasmic reticulum and antigen processing and presentation. In the
immune system, the antigen presentation pathway plays central roles for
the immune surveillance and elimination of malignant transformations
([170]Hu et al., 2019). Nascent malignant cells may reduce antigenicity
so that anti-tumor lymphocytes fail to detect transformed cells to
escape immune surveillance ([171]Hu et al., 2019). Endoplasmic
reticulum (ER) could process and present mutation-derived tumor-related
nascent antigens to CD8 T cells, which contribute to tumor suppression
([172]Johnsen et al., 1999; [173]Hu et al., 2019). In our study, the
antigen presentation pathway involved in protein processing in
endoplasmic reticulum and antigen processing and presentation was
disorganized in the high-risk group, which might promote tumor immune
evasion and associate with poor clinical outcomes in the patients with
high risk scores.
In the study, we integrated bioinformatics analyses to provide a solid
theoretical basis for potential value of the six-lncRNA signature.
However, in the process of public database mining, some unknown factors
are usually difficult to extrapolate. In subsequent studies, we will
further demonstrate and improve our findings by more rigorous
experimental evidence in immune-related lncRNAs associated with BC.
Conclusion
In conclusion, we integrated informatics analyses to identify a six
immune-related lncRNAs signature. Moreover, CD8 T cell infiltration was
significantly activated in the low-risk group, which could contribute
to a better prognosis of BC patients. Our findings aim to help improve
clinical outcome prediction and provide promising immunotherapeutic
targets for BC patients.
Data Availability Statement
Publicly available datasets were analyzed in this study, these can be
found in The Cancer Genome Atlas ([174]http://cancergenome.nih.gov/)
and the NCBI Gene Expression Omnibus
([175]https://www.ncbi.nlm.nih.gov/geo/) ([176]GSE20685, [177]GSE21653,
and [178]GSE88770).
Author Contributions
QY conceived and directed the project. ZL designed the study and
analyzed the data. ZL and YL wrote the manuscript. XW reviewed the
data. All authors have read and approved the final manuscript for
publication.
Conflict of Interest
The authors declare that the research was conducted in the absence of
any commercial or financial relationships that could be construed as a
potential conflict of interest.
Abbreviations
AUC
the area under the ROC curve
BC
breast cancer
CD8 T cells
CD8-positive cytotoxic T lymphocytes
DFS
disease-free survival
FDR
false discovery rate
GEO
the Gene Expression Omnibus
GSEA
gene set enrichment analysis
GSVA
gene set variation analysis
HR
hazard ratio
KEGG
Kyoto Encyclopedia of Genes and Genomes
lncRNAs
long non-coding RNAs
LR
logistic regression analysis
MSigDB
the Molecular Signatures Database
OS
overall survival
RF
random forest
ROC curve
receiver operating characteristic curve
ssGSEA
single sample gene set enrichment analysis
TCGA
The Cancer Genome Atlas
TME
the tumor microenvironment
WGCNA
weighted gene co-expression network analysis.
Funding. This work was supported by the National Natural Science
Foundation of China (Nos. 81672613 and 81874119), the Special
Foundation for Taishan Scholars (No. ts20190971), the Special Support
Plan for National High Level Talents (Ten Thousand Talents Program
W01020103), the National Key Research and Development Program (No.
2018YFC0114705), and the Qilu Hospital Clinical New Technology
Developing Foundation (Nos. 2018-7 and 2019-3).
^1
[179]https://www.gsea-msigdb.org/gsea/msigdb/
Supplementary Material
The Supplementary Material for this article can be found online at:
[180]https://www.frontiersin.org/articles/10.3389/fgene.2020.00680/full
#supplementary-material
[181]Click here for additional data file.^ (884.3KB, JPEG)
[182]Click here for additional data file.^ (719.8KB, JPEG)
[183]Click here for additional data file.^ (718.1KB, JPEG)
[184]Click here for additional data file.^ (599.7KB, JPEG)
[185]Click here for additional data file.^ (371.1KB, JPEG)
[186]Click here for additional data file.^ (370.8KB, JPEG)
[187]Click here for additional data file.^ (34.5KB, XLSX)
[188]Click here for additional data file.^ (15.2KB, XLSX)
[189]Click here for additional data file.^ (12.1KB, XLSX)
[190]Click here for additional data file.^ (9.6KB, XLSX)
[191]Click here for additional data file.^ (9.2KB, XLSX)
[192]Click here for additional data file.^ (10.7KB, XLSX)
[193]Click here for additional data file.^ (60.5KB, XLSX)
[194]Click here for additional data file.^ (11.2KB, XLSX)
References