Abstract
Globally, non-small cell lung cancer (NSCLC) is the most common
malignancy and its prognosis remains poor because of the lack of
reliable early diagnostic biomarkers. The competitive endogenous RNA
(ceRNA) network plays an important role in the tumorigenesis and
prognosis of NSCLC. Tumor immune microenvironment (TIME) is valuable
for predicting the response to immunotherapy and determining the
prognosis of NSCLC patients. To understand the TIME-related ceRNA
network, the RNA profiling datasets from the Genotype-Tissue Expression
and The Cancer Genome Atlas databases were analyzed to identify the
mRNAs, microRNAs, and lncRNAs associated with the differentially
expressed genes. Weighted gene co-expression network analysis revealed
that the brown module of mRNAs and the turquoise module of lncRNAs were
the most important. Interactions among microRNAs, lncRNAs, and mRNAs
were prognosticated using miRcode, miRDB, TargetScan, miRTarBase, and
starBase databases. A prognostic model consisting of 13 mRNAs was
established using univariate and multivariate Cox regression analyses
and validated by the receiver operating characteristic (ROC) curve. The
22 immune infiltrating cell types were analyzed using the CIBERSORT
algorithm, and results showed that the high-risk score of this model
was related to poor prognosis and an immunosuppressive TIME. A
lncRNA–miRNA–mRNA ceRNA network that included 69 differentially
expressed lncRNAs (DElncRNAs) was constructed based on the five mRNAs
obtained from the prognostic model. ROC survival analysis further
showed that the seven DElncRNAs had a substantial prognostic value for
the overall survival (OS) in NSCLC patients; the area under the curve
was 0.65. In addition, the high-risk group showed drug resistance to
several chemotherapeutic and targeted drugs including cisplatin,
paclitaxel, docetaxel, gemcitabine, and gefitinib. The differential
expression of five mRNAs and seven lncRNAs in the ceRNA network was
supported by the results of the HPA database and RT-qPCR analyses. This
comprehensive analysis of a ceRNA network identified a set of
biomarkers for prognosis and TIME prediction in NSCLC.
Introduction
Lung cancer is the leading cause of cancer-related deaths [[44]1].
Non-small cell lung cancer (NSCLC) accounts for approximately 83% of
all lung cancers [[45]2]; its two dominant histological phenotypes
include lung adenocarcinoma (LUAD, ~50%) and lung squamous cell
carcinoma (LUSC, ~40%) [[46]3]. Although remarkable advances have been
made for lung cancer diagnoses and treatment strategies, 60-month
overall survival (OS) rate and 5-year survival rates remain poor (68%
and 0%–10% at stages IB and IV, respectively) [[47]4]. Thus, accurate
detection of NSCLC at an early stage can provide a good prognosis.
However, localized diseases at stages I or II are diagnosed in only 16%
of patients [[48]5]. A low-dose computerized tomography scan is
currently the most practical method for the early diagnosis of lung
cancer [[49]6]. However, its high false-positive rate of up to 96.4%
[[50]7] requires greater medical attention than needed and is an
unnecessary psychological burden for the patients. Diagnostic precision
can be enhanced by developing biomarkers that can accurately classify
the patients according to their probable disease risk, diagnosis, and
prognosis or response to treatment [[51]8]. In addition, functional
biomarkers with known underlying mechanisms related to the disease can
be used as potential therapeutic targets [[52]8]. For instance,
identification of epidermal growth factor receptor (EGFR), anaplastic
lymphoma kinase (ALK), c-ros proto-oncogene 1 receptor tyrosine kinase
(ROS1), kirsten rat sarcoma viral oncogene homolog (KRAS),
serine/threonine-protein kinase B-Raf (BRAF), mesenchymal-epithelial
transition factor (MET), proto-oncogene tyrosine-protein kinase
receptor Ret (RET), human epidermal growth factor receptor 2 (HER2),
and neurotrophic receptor tyrosine kinase (NTRK) tyrosine kinase
inhibitors (TKIs) have improved the outcomes for oncogene-predisposed
NSCLC patients [[53]9]. However, patient responses to TKIs are usually
temporary because tumors eventually develop resistance to targeted
therapies through on- or off-target mechanisms [[54]10]. NSCLC patients
show a positive response to immune checkpoint inhibitors (ICIs) that
target programmed cell death-1 (PD-1)/programmed cell death ligand-1
(PD-L1) interaction [[55]11]. Nevertheless, most NSCLC patients do not
show any initial responses to ICIs, whereas others who respond for a
certain period relapse due to an immunosuppressive microenvironment
[[56]12]. Thus, future studies should focus on possible options such as
combination strategies involving an agent that can address a specific
on- or off-target resistance mechanism upfront or during disease
progression [[57]12,[58]13]. Therefore, the identification of novel
molecular network biomarkers for early screen to improve prognosis and
treatment in NSCLC is needed.
Cellular and molecular immune markers in the tumor immune
microenvironment (TIME) play an important role in NSCLC development.
The density and localization of tumor-infiltrating immune cells greatly
affect the OS in NSCLC patients [[59]14]. In NSCLC patients, a high
intra-tumoral density of immature dendritic cells, regulatory T cells,
and M2 macrophages has been associated with poor prognosis, whereas a
high intra-tumoral density of CD8+ T cells, CD4+ T cells, M1
macrophages, mature dendritic cells, and natural killer cells has been
correlated with better prognosis [[60]14]. PD-1 is the main immune
checkpoint for immunotherapy in NSCLC. Several meta-analyses have
recently shown a possible link between the PD-1/PD-L1 axis and the
prognosis of NSCLC patients [[61]15]. The expression of chemokines,
including CXCL9, CXCL -10, and CXCL -16, is also correlated with the
prognosis of these patients [[62]16]. Thus, the complex regulation of
TIME in NSCLC must be examined to identify effective immune targets for
accurate prognosis and increase the efficacy of clinical immunotherapy.
Pandolfi et al. [[63]17] first proposed the competing endogenous RNA
(ceRNA) hypothesis. Accordingly, a post-transcriptional regulatory
network allows all RNA transcripts including mRNAs, long non-coding
RNAs (lncRNAs), and circular RNAs, to regulate each other, by competing
for shared microRNAs (miRNAs). miRNA response elements (MREs) are the
keystone for the competitive binding of miRNAs with lncRNAs and mRNAs
[[64]18]. ceRNAs participate in the pathogenesis of multiple cancers,
including NSCLC, and their abnormal expression is substantially
associated with the prognosis of the patients [[65]19,[66]20]. Jiang et
al. [[67]21] constructed a ceRNA network to identify a prognostic
signature in bladder cancer. Liu et al. [[68]22] developed the
seven-lncRNA prognostic signature in melanoma through an integrative
analysis of the DElncRNA–DEmiRNA–DEmRNA ceRNA network. For the ceRNA
network of NSCLC, through gene expression studies, some biomarkers
associated with prognosis and tumor progression have been identified
[[69]23,[70]24]. Based on the alternative splicing events, Li et al.
[[71]25] developed a high-performance prognostic predictor model for
risk stratification in patients with NSCLC. The above studies have
focused on the comparative analyses between limited peritumoral samples
and paired tumor samples from The Cancer Genome Atlas (TCGA) database.
However, this paradigm is far from ideal because of the differences in
the transcriptomic profiles between para-tumor and healthy tissues
[[72]26]. The Genotype-Tissue Expression (GTEx) dataset is a
large-scale public resource for genome-wide association studies in
healthy human tissues [[73]27]. GTEx and TCGA databases allow
unprecedented studies of gene expression to address the limitation
[[74]28]. Weighted gene co-expression network analysis (WGCNA) is an
effective approach to examine the intrinsic links between clinical
outcomes and co-expression gene modules [[75]29]; however, only a few
studies have employed this method to analyze the relevance of
co-expression patterns for ceRNAs.
In this study, a TIME-associated ceRNA regulatory network for the
prognostic prediction of NSCLC was constructed using the data from TCGA
and GTEx databases by multiple bioinformatic analytic methods including
WGCNA and The Cell Type Identification by Estimating Relative Subsets
of RNA Transcripts (CIBERSORT).
Results
Analysis of differentially expressed genes between TCGA and GTEx samples
[76]Fig 1 depicts the study design. The RNA expression levels in 944
NSCLC samples and 578 normal lung samples were evaluated. A total of
1838 significantly up-regulated mRNAs and 3097 down-regulated mRNAs
were identified and termed as differentially expressed mRNAs (DEmRNAs)
in NSCLC; their distribution is shown in [77]Fig 2A. Gene Ontology (GO)
analysis was performed to identify the enriched biological mechanisms
underlying DEmRNAs, and the GO enrichment results are shown in [78]Fig
2B. In the biological process (BP), DEmRNAs were significantly enriched
in cornification, epidermis development, skin development,
keratinization, extracellular structure organization, and keratinocyte
differentiation. The Z-score of GO enrichment was closer to blue
module, which indicated that most of the BPs were likely
under-represented ([79]Fig 2C). Gene Set Enrichment Analysis (GSEA)
showed that the N-glycan biosynthesis, p53 signaling pathway, cell
cycle, and DNA replication-related genes were up-regulated, whereas
calcium, VEGF, mTOR, and MAPK signaling pathways were significantly
down-regulated ([80]Fig 2D).
Fig 1. Flow diagram of study process.
[81]Fig 1
[82]Open in a new tab
Fig 2. Analysis of DEmRNAs from TCGA and GTEx.
[83]Fig 2
[84]Open in a new tab
(A) Volcano map of important DEmRNAs including 1838 up-regulated and
3097 down-regulated mRNAs. (B) GO analysis showing the involvement of
BP, CC, and MF in DEmRNAs. (C) Distribution of DEmRNAs in BP. Red
circles display up-regulated genes, and blue ones show down-regulated
genes in the scatter plot of outer circle. (D) KEGG-GSEA was applied to
the signaling pathway enrichment of DEmRNAs. TCGA, the Cancer Genome
Atlas database; GTEx, the Genotype–Tissue Expression database; DEmRNAs,
differentially expressed mRNAs; BP, biological process; CC, cellular
component; MF, molecular function; GO, Gene Ontology.
Analyses of mRNA modules by WGCNA
The top 80% mRNAs (14428 genes) obtained from variance comparison were
used to construct the gene modules using the WGCNA algorithm.
Soft-thresholding was set at β = 8 and module size cut-off at 25 to
characterize a scale-free network. After the co-expressed genes were
categorized by average linkage clustering, a total of 35 gene color
modules were identified ([85]Fig 3A). Unassigned genes were categorized
into the gray module. A total of 1000 genes from the mRNA color modules
were randomly chosen to plot the network heatmap and visualize the
topological overlap matrix (TOM) ([86]Fig 3B). Among the 35 color
modules, the module eigengene (ME) of the brown module showed the
highest association with NSCLC and normal traits ([87]Fig 3C). Thus,
the brown module containing 9041 genes was considered as the key module
for NSCLC. GO term enrichment analysis was performed for the mRNAs in
the brown module and the functional enrichment and gene interactions in
BPs were identified ([88]Fig 3D and 3E). These genes were mainly
related to the developmental processes in reproduction, multi-organism
reproductive processes, and cell cycle regulation; these were clustered
in focal adhesion, protein processing in the endoplasmic reticulum,
mTOR signaling pathway, and autophagy according to the Kyoto
Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis
([89]Fig 3F).
Fig 3. Analysis of mRNA modules by WGCNA.
[90]Fig 3
[91]Open in a new tab
(A) Dendrogram of mRNAs with top 80% variances clustered based on
topological overlap. (B) TOM heatmap plot of selected genes. Light
colors denote low topological overlap, and dark colors denote high
topological overlap. (C) Correlation of gene color modules between
NSCLC and normal samples. The ME of the brown module was the most
positively related to NSCLC. (D–E) GO-GSEA showing the gene symbols and
gene interaction in the brown module. (F) KEGG investigating the signal
pathway enrichment in the brown module. WGCNA, weighted gene
co-expression network analysis; TOM, topological overlap matrix; NSCLC,
non-small cell lung cancer; ME, module eigengene; GO, Gene Ontology;
KEGG, Kyoto Encyclopedia of Genes and Genomes pathway analyses; GSEA,
Gene Set Enrichment Analysis.
Analysis of lncRNA modules by WGCNA and prediction of target genes
The constructed lncRNA network was used to identify the hub modules.
The top 60% lncRNAs (8102 genes) obtained from variance comparison were
analyzed by WGCNA. Soft-thresholding was set at β = 3 and module size
cut-off at 25. A total of 23 co-expressed lncRNA modules were
identified ([92]Fig 4A). Among them, the turquoise module, which
included 1587 lncRNAs as shown in [93]Fig 4B, was the most
significantly associated with NSCLC (r = 0.92). Furthermore, lncRNAs in
the turquoise module had the most remarkable correlation with gene
significance for NSCLC in all modules ([94]Fig 4C, cor = 0.96). A total
of 316 differentially expressed miRNAs (DEmiRNAs) were obtained from
the comparison of TCGA miRNA-seq data between 961 NSCLC and 90 normal
lung tissue samples ([95]Fig 4D). The lncRNA–miRcode–miRNA relationship
between 1587 differentially expressed lncRNAs (DElncRNAs) in the
turquoise module and 316 DEmiRNAs was evaluated using the online
miRcode tool. Overlapping miRNAs (60) between lncRNA–miRcode–miRNA
(326) and DEmiRNAs (316) were then obtained. Using miRNA target
prediction tools, 60 common miRNAs from the intersection were used for
subsequent analyses. A total of 8704 predicted target mRNAs were
acquired ([96]Fig 4E). As shown in [97]Fig 4F, 1094 mRNAs were obtained
through the interactions of the 8704 predicted mRNAs and 4935 DEmRNAs
(1838 up-regulated and 3097 down-regulated mRNAs) with 9041 mRNAs in
the brown module. The 1094 genes in 1577 samples are shown using an
expression heatmap in [98]Fig 4G.
Fig 4. Analysis of lncRNA modules by WGCNA and prediction of target genes.
[99]Fig 4
[100]Open in a new tab
(A) Cluster dendrogram of lncRNAs with top 60% variances clustered
based on topological overlap. (B) Correlation of lncRNA modules between
NSCLC and normal samples. The ME of the turquoise module was the most
positively related to traits. (C) Relationship between gene
significance and module membership in the turquoise module. (D) Volcano
map of 316 DEmiRNAs (|logFC| > 1, P < 0.05) in NSCLC. (E) Flow chart of
target mRNA prediction. (F) Overlapped target mRNAs were analyzed by
the predicted target mRNAs, WGCNA-brown mRNAs, and the significantly
up- or down-regulated mRNAs. (G) Expression heatmap displaying the 1094
predicted target mRNAs in NSCLC. WGCNA, weighted gene co-expression
network analysis; NSCLC, non-small cell lung cancer; ME, module
eigengene.
Construction of 13-mRNA-based prognostic model to predict chemotherapeutic
response
TCGA patient data including OS and clinical features were obtained for
944 NSCLC patients. The clinicopathological characteristics of NSCLC
patients are shown in [101]Table 1. A univariate regression analysis
was performed to investigate the relevance of the expression levels of
the 1094 target genes with OS in NSCLC. The cut-off for significant
association was set at p < 0.05. As a result, 31 genes were obtained
and further included in the subsequent multivariate cox regression
analysis ([102]Table 2). A prognostic model for OS was constructed
using the following 13 genes: FSTL3, CPS1, PTPN21, DEPDC1B, COL9A3,
DSG2, LAMB1, STYK1, RBM6, DEPDC1, GTSE1, NAV3, and FKBP5. As shown in
[103]Fig 5A, DEPDC1, COL9A3, STYK1, DSG2, GTSE1, and DEPDC1B were
up-regulated, whereas RBM6, LAMB1, CPS1, NAV3, FKBP5, FSTL3, and PTPN21
were down-regulated in patients with NSCLC. [104]Fig 5B displays the
correlation between the expressions of each gene in the 13 gene-based
model. The risk assessment scoring was computed for each NSCLC patient,
and the threshold for risk score was set at 1.006. Accordingly, the
patients were divided into high- (n = 471) and low-risk (n = 473)
groups as shown in [105]Table 3. The accuracy of the 13 gene-based
prognostic model in predicting NSCLC clinical outcomes were estimated
by Kaplan–Meier (K–M) survival analysis, Cox proportional hazard
regression model, and ROC. As shown in [106]Fig 5C, the K–M survival
curves showed that OS in the high-risk group was significantly shorter
(P < 0.001) than that for patients with low-risk prediction. [107]Fig
5D shows the comparison between the risk score distribution, survival
status, and expression heatmap for 13 genes between the high- and
low-risk groups. According to the univariate analysis, OS was
significantly associated with the risk score and the TNM, T, N, and M
stages (P<0.001) ([108]Fig 5E). However, multivariate analysis
indicated that among the above-mentioned prognostic factors, the risk
score was the only independent prognostic predictor (HR = 1.960, 95% CI
= 1.689–2.275, P<0.001) ([109]Fig 5F). The area under the curve (AUC)
for the risk score was 0.690 ([110]Fig 5G); higher than that for other
clinical factors. The relationship between the clinicopathological
parameters and the 13 gene-based prognostic model is shown in [111]Fig
5H. The values for the prognostic prediction model were higher in T3-4
than in T1-2 stages (P = 0.011); in N1-3 than in N0 (P < 0.001), and in
stages III–IV than in stages I–II (P < 0.001). The estimated
half-maximal inhibitory concentrations (IC50) of three commonly used
NSCLC chemotherapeutic agents, including, cisplatin, docetaxel, and
paclitaxel, were compared between the low- and high-risk groups of
patients using the pRRophetic algorithm. The three drugs showed higher
IC50 values and lower sensitivity in the high-risk as compared to
low-risk groups (all P < 0.01) ([112]Fig 5I). Thus, the result
suggested that high-risk patients with NSCLC exhibit resistance to
cisplatin, docetaxel, and paclitaxel. Taken together, the prognostic
model was effective in predicting the therapeutic response of high- and
low-risk NSCLC patients towards the three drugs.
Table 1. The clinicopathological characteristics of 944 NSCLC patients.
Parameters LUSC (n = 489) LUAD (n = 455) Total(n = 944)
Gender
FEMALE 127 (25.97%) 248 (54.51%) 375 (39.7%)
MALE 362 (74.03%) 207 (45.49%) 569 (60.3%)
Smoking history
No 75 (15.34%) 142 (31.21%) 217 (23.0%)
Yes 414 (84.66%) 313 (68.79%) 727 (77.0%)
Age
Mean(SD) 66.4847 (9.3926) 65.778 (9.318) 66.1441 (9.3584)
Median[Min, Max] 67 [33, 88] 67 [38, 90] 67 [33, 90]
Survival status
Dead 278 (56.85%) 282 (61.98%) 560 (59.3%)
Alive 211 (43.15%) 173 (38.02%) 384 (40.7%)
TNM Stage
Stage I 239 (48.88%) 246 (54.07%) 485 (51.4%)
Stage II 158 (32.31%) 110 (24.18%) 268 (28.4%)
Stage III 85 (17.38%) 75 (16.48%) 160 (16.9%)
Stage IV 7 (1.43%) 24 (5.27%) 31 (3.3%)
[113]Open in a new tab
Table 2. Univariate and multivariate cox proportional hazard regression
analysis of 31 genes.
Gene Univariate Multivariate
HR(95%CI) P HR(95%CI) P
FSTL3 1.195(1.109–1.288) <0.0001 1.178(1.078–1.287) 0.000294 **
ANLN 1.156(1.067–1.252) 0.0004
HMMR 1.162(1.060–1.273) 0.0014
PRR11 1.154(1.057–1.260) 0.0014
ARNTL2 1.099(1.034–1.169) 0.0024
DSG2 1.158(1.053–1.273) 0.0025 1.118(0.993–1.259) 0.065513
CPS1 1.048(1.016–1.080) 0.0028 1.053(1.018–1.089) 0.002604 **
DEPDC1B 1.119(1.035–1.209) 0.0046 1.226(1.046–1.437) 0.012009 *
PKM 1.219(1.062–1.399) 0.0049
DCBLD2 1.131(1.037–1.232) 0.0053
NAV3 1.104(1.028–1.186) 0.0067 1.066(0.981–1.158) 0.130179
KIF4A 1.115(1.029–1.207) 0.0076
AURKA 1.137(1.034–1.250) 0.0080
GTSE1 1.112(1.026–1.205) 0.0096 1.154(0.979–1.361) 0.087804
NGEF 1.081(1.018–1.149) 0.0117
PTPN21 1.150(1.030–1.283) 0.0129 1.203(1.044–1.386) 0.010568 *
FKBP5 1.115(1.022–1.218) 0.0149 1.075(0.977–1.183) 0.137657
CLSPN 1.104(1.018–1.198) 0.0164
DEPDC1 1.094(1.016–1.178) 0.0167 0.864(0.733–1.019) 0.083033
LAMB1 1.118(1.017–1.230) 0.0211 0.890(0.785–1.009) 0.068225
STYK1 1.109(1.015–1.212) 0.0218 1.092(0.989–1.206) 0.081164
NDC80 1.095(1.010–1.187) 0.0271
ITIH4 0.922(0.855–0.993) 0.0320
RBM6 0.856(0.741–0.989) 0.0346 0.860(0.726–1.019) 0.082
CDC6 1.087(1.005–1.175) 0.0364
ADAMTS6 1.095(1.005–1.193) 0.0376
COL9A3 1.055(1.002–1.111) 0.0401 1.067(1.009–1.129) 0.024166 *
AASS 0.917(0.845–0.996) 0.0408
MCM10 1.077(1.002–1.158) 0.0436
NEXMIF 0.956(0.915–0.999) 0.0455
SCIN 0.952(0.907–1.000) 0.0489
[114]Open in a new tab
Fig 5. Construction of the 13-mRNA-based prognostic model to predict
chemotherapeutic response.
[115]Fig 5
[116]Open in a new tab
(A) Differential expression of 13 selected genes between NSCLC and
normal samples. (B) Relationship of each gene in the 13 gene-based
model. (C) Risk score distribution in patients with NSCLC, survival
status in different groups, and heatmap of the expression of 13 genes.
(D) K–M plot represents that patients in the high-risk group had
significantly shorter OS than those in the low-risk group. (E) Forest
plot of univariate Cox regression analysis showing the association
between risk factors and NSCLC survival. (F) Forest plot of
multivariate Cox regression analysis showing that the 13-mRNA signature
was an independent predictor of prognosis in NSCLC. (G) ROC curve
showed that AUC of 13-mRNA prognostic model was 0.690. (H)
Clinicopathological significance of 13 gene-related prognostic model in
NSCLC. (I) Relation of the risk score to the estimated IC50 of
cisplatin, paclitaxel, and docetaxel. NSCLC, non-small cell lung
cancer; K-M, Kaplan–Meier survival analysis; ROC, receiver operating
characteristic curve analysis; AUC, the area under the curve; IC50,
half maximal inhibitory concentration.
Table 3. The association between risk score of the 13-gene-based prognostic
model and clinicopathological characteristics.
Parameters Low risk (n = 473) High risk (n = 471) χ^2 P Value
Age
≤ 65 215 (52.31%) 196 (47.69%) 1.4163 0.234
> 65 258 (48.41%) 275 (51.59%)
Gender
FEMALE 210 (56.00%) 165 (44.00%) 8.6454 0.0033 **
MALE 263 (46.22%) 306 (53.78%)
Subtype
LUSC 224 (45.81%) 265 (54.19%) 7.4972 0.0062 **
LUAD 249 (54.73%) 206 (45.27%)
Smoking history
No 119 (54.84%) 98 (45.16%) 2.5246 0.1121
Yes 354 (48.69%) 373 (51.31%)
T stage
T1-2 406 (51.20%) 387 (48.80%) 2.3649 0.1241
T3-4 67 (44.37%) 84 (55.63%)
N stage
N0 333 (53.54%) 289 (46.46%) 8.5866 0.0034 **
N1-3 140 (43.48%) 182 (56.52%)
TNM stage
Stage I-II 393 (52.19%) 360 (47.81%) 6.4734 0.0109 *
Stage III-IV 80 (41.88%) 111 (58.12%)
[117]Open in a new tab
Construction of a ceRNA network
[118]Fig 1 shows the established lncRNA–miRNA–mRNA ceRNA network
obtained through the prognostic mRNA signatures, predicted miRNAs, and
DElncRNAs. First, 29 miRNAs were obtained through prediction using the
13 mRNAs. After the analysis of lncRNA expression data from TCGA
database, 69 DElncRNAs [|log (fold change)| (|logFC|) > 4, P < 0.05]
were identified through the empirical analysis of digital gene
expression in R (edgeR) ([119]Fig 6A). However, only three miRNAs were
further selected. These three miRNAs were retained after overlapping
with the 29 predicted miRNAs. Thus, five mRNAs were obtained from the
13 prognostic mRNAs which corresponded to the three miRNAs ([120]Fig
6B). Finally, a ceRNA network containing five mRNAs, three miRNAs, and
69 lncRNAs was constructed as shown in [121]Fig 6C. The relationships
between clinicopathological parameters and expression of the five mRNAs
in the ceRNA network are shown in [122]Fig 6D.
Fig 6. Construction of a lncRNA–miRNA–mRNA ceRNA network.
[123]Fig 6
[124]Open in a new tab
(A) Volcano map of 69 significant DElncRNAs. (B) Correlation between
the five prognostic mRNAs and their corresponding miRNAs. (C) A ceRNA
network was constructed containing five mRNAs, three miRNAs, and 69
lncRNAs for NSCLC prognosis. (D) Clinicopathological significance of
FKBP5, DEPDC1, CPS1, COL9A3, and PTPN21 in NSCLC. ceRNA, Competing
endogenous RNA; DElncRNAs, differentially expressed lncRNAs; NSCLC,
non-small cell lung cancer.
Relationship between the risk score from ceRNA network and tumor immune
microenvironment
The differences in the 22 infiltrating immune cell types were
determined between low- and high-risk NSCLC patients using the
CIBERSORT algorithm. [125]Fig 7A summarizes the proportion of immune
cell types in 944 patients from the TCGA database. The high-risk
patients had higher infiltration levels of activated CD4 memory and
follicular helper T cells ([126]Fig 7B and 7D), whereas greater
infiltration of CD4 memory resting T cells, M0 macrophages, resting
dendritic cells, and neutrophils were observed in low-risk patients
([127]Fig 7B and 7E–7H). Given the important role of chemokines and
immune checkpoints in TIME and immune response, we analyzed the
correlation between the expression of these molecules and the risk
score. The expression of PD-1, positively associated with the risk
score, was significantly upregulated in the high-risk group as compared
to the low-risk group ([128]Fig 7I and 7J). In addition, the expression
of chemokines for immune activation (i.e., CXCL9, CXCL10, and CXCL16)
was significantly lower in the high-risk group than in the low-risk
group ([129]Fig 7K). These results indicated that ceRNA risk signatures
may be implicated in the NSCLC immunosuppressive microenvironment.
Fig 7. Relationships between the risk score of ceRNA network and tumor immune
microenvironment.
[130]Fig 7
[131]Open in a new tab
(A) Landscape of immune cell infiltration between low- and high-risk
patients with NSCLC. (B) Fractions of significantly differential immune
cell types between low- and high-risk patients with NSCLC. (C–H)
Correlations between risk score and infiltration levels of
significantly different immune cell types. (I) Comparison of PD-1
expression level between low- and high-risk patients with NSCLC. (J)
Correlations between risk score and PD-1 expression levels. (K)
Comparison of expression level of chemokines between low- and high-risk
patients with NSCLC. NSCLC, non-small cell lung cancer; PD-1,
programmed cell death protein 1.
Construction of seven-lncRNA-based prognostic model from ceRNA network to
predict drug resistance in cancer
The results of lncRNA analysis were similar to the findings through
mRNA analysis. Univariate regression analysis was performed to evaluate
the relevance of the expression of the 69 lncRNAs from the ceRNA
network with OS. Nine lncRNAs were obtained by setting the cut-off for
significant association at P < 0.05. These were further included for
the subsequent multivariate cox regression analysis ([132]Table 4). A
prognosis model for OS was constructed with the following seven
lncRNAs: LINC00707, LINC00460, FEZF1-AS1, LINC00593, OTX2-AS1,
LINC01833, and CASC8. The risk score was calculated for each NSCLC
patient, and the threshold was set at 0.9755. Accordingly, the patients
were classified into high- (n = 471) and low-risk (n = 473) groups.
[133]Fig 8A shows the distribution of risk scores for NSCLC patients,
the survival status in different groups, and the heatmap of the
expression of seven lncRNAs. The accuracy of the seven-lncRNA-based
prognostic model in predicting NSCLC clinical outcomes was estimated by
K–M survival analysis, Cox regression model, and ROC. As shown in
[134]Fig 8B, the K–M survival curves showed that OS in the high-risk
group was significantly shorter than that in patients with low
predicted risk (P < 0.001). According to the univariate analysis, the
risk score and TNM, T, N, and M stages were significantly correlated
with OS in NSCLC (P<0.001) ([135]Fig 8C). However, multivariate
analysis results showed that among them, only the risk score was an
independent prognostic predictor (HR = 2.050, 95% CI = 1.702–2.470,
P<0.001) ([136]Fig 8D). The AUC for risk score was 0.650 ([137]Fig 8E);
higher than those for other clinical factors. [138]Fig 8F illustrates
the relationship between the clinicopathological parameters and the
seven-lncRNA-based model. Higher prognostic prediction values were
found in T3-4 than in T1-2 (P = 0.007) stages; in N1-3 than in N0 (P =
0.012), and in stages III–IV than in stages I–II (P = 0.013).
Gefitinib, gemcitabine, and cisplatin are drugs commonly used in NSCLC
treatment [[139]30]. The estimated IC50 values of these drugs for the
low- and high-risk patients were compared using the pRRophetic
algorithm. All three drugs had higher IC50 values and lower sensitivity
in the high-risk patients as compared to those for low-risk patients
(all P < 0.01) ([140]Fig 8G). This result indicated that high-risk
NSCLC patients were resistant to gefitinib, gemcitabine, and cisplatin.
Based on the seven-lncRNA signature, a prognostic ceRNA network
involving five mRNAs of the 13 gene-based prognostic model and three
miRNAs were extracted ([141]Fig 8H).
Table 4. Univariate and multivariate cox proportional hazard regression
analysis of 9 lncRNAs.
lncRNA Univariate Multivariate
HR(95%CI) P HR(95%CI) P
LINC00707 1.092(1.045–1.141) 0.0001 1.051(1.004–1.099) 0.0316 *
LINC00460 1.073(1.033–1.115) 0.0003 1.044(1.004–1.085) 0.0314 *
FEZF1-AS1 0.944(0.911–0.980) 0.0021 0.964(0.927–1.002) 0.0650
LINC00593 0.937(0.898–0.977) 0.0024 0.943(0.902–0.986) 0.0094 **
OTX2-AS1 0.907(0.850–0.968) 0.0034 0.902(0.842–0.968) 0.0039 **
LINC02535 1.079(1.025–1.136) 0.0039
LINC01833 1.050(1.015–1.087) 0.0049 1.055(1.020–1.092) 0.0021 **
CASC8 1.061(1.014–1.110) 0.0097 1.054(1.004–1.106) 0.0331 *
LINC00601 1.059(1.001–1.120) 0.0465
[142]Open in a new tab
Fig 8. Construction of the seven-lncRNA-based prognostic model from ceRNA
network to predict resistance to cancer drugs.
[143]Fig 8
[144]Open in a new tab
(A) Risk score distribution in patients with NSCLC, survival status in
different groups, and heatmap of the expression of seven lncRNAs. (B)
K–M plot represents that patients in the high-risk group had
significantly shorter OS than those in the low-risk group. (C) Forest
plot of univariate Cox regression analysis showed the association
between risk factors and survival of NSCLC. (D) Forest plot of
multivariate Cox regression analysis showing that the seven-lncRNA
signature was an independent predictor of prognosis in NSCLC. (E) ROC
curve showing that the AUC of seven-lncRNA prognostic model was 0.650.
(F) Clinicopathological significance of seven-lncRNA-related prognostic
model in NSCLC. (G) Relation between the lncRNA risk score and the
estimated IC50 of gefitinib, gemcitabine, and cisplatin. (H) Hub ceRNAs
from coregulatory network involving five mRNAs from the 13gene-based
prognostic model and three miRNAs from the seven-lncRNA signature.
ceRNA, Competing endogenous RNA; NSCLC, non-small cell lung cancer;
K-M, Kaplan–Meier survival analysis; ROC, receiver operating
characteristic curve analysis; AUC, the area under the curve.
Validation of the risk mRNAs and risk lncRNAs in ceRNA network
We analyzed the immunohistochemical staining data obtained from the
Human Protein Atlas (HPA) database to compare the expression levels of
five risk genes (FSTL3, CPS1, PTPN21, DEPDC1B, and COL9A3) from the
seven-lncRNA-based ceRNA network. The results showed that the
expression at the protein level for two risk genes (DEPDC1B and COL9A3)
was significantly higher in LUSC or LUAD tissues than in normal lung
tissues ([145]Fig 9D and 9E); two genes (FSTL3 and PTPN21) showed an
opposing trend ([146]Fig 9A and 9C), which was also consistent at the
transcriptional level ([147]Fig 5A). Only CPS1 expression at the
protein level was not detected in both the normal lung and NSCLC
tissues in the HPA database ([148]Fig 9B).
Fig 9. Validation of the risk mRNAs in ceRNA network by using the human
protein atlas database.
[149]Fig 9
[150]Open in a new tab
(A-E) Validation of FSTL3, CPS1, PTPN21, DEPDC1B, and COL9A3 by The
Human Protein Atlas database (IHC) at the protein level. ceRNA,
Competing endogenous RNA; IHC, immunohistochemistry.
In the GTEx and TCGA databases, the gene expression levels of six
OS-related lncRNAs (LINC00707, LINC00460, FEZF1-AS1, LINC00593,
OTX2-AS1, LINC01833, and CASC8) were significantly lower in NSCLC as
compared to the normal lung tissues ([151]Fig 6A). The results of
reverse transcription-quantitative polymerase chain reaction (RT-qPCR)
showed that the gene expression levels of LINC00707, LINC00460,
OTX2-AS1, LINC01833, LINC00593, and CASC8 were significantly lower in
PC-9, H1650, and H1975 cells as compared to the BEAS-2B cell line
([152]Fig 10A–10F).
Fig 10. Validation of lncRNAs by RT-qPCR.
[153]Fig 10
[154]Open in a new tab
(A-F) Relative expression levels of LINC00707, LINC00460, LINC00593,
LINC01833, and CASC8 in normal and NSCLC cell lines. Experiments were
performed in triplicate, * P < 0.05, ** P < 0.01, *** P < 0.001, **** P
< 0.0001 by ANOVA. RT-qPCR, quantitative reverse transcription
polymerase chain reaction; NSCLC, non-small cell lung cancer.
Discussion
NSCLC is one of the leading causes of cancer-related death, worldwide
[[155]31]. Despite the rapid development in immunotherapy and targeted
therapy for NSCLC, the 5-year overall survival of NSCLC patients
remains low, due to the lack of availability of effective biomarkers
for early diagnosis and a poor understanding of the pathogenesis of
NSCLC. Powerful computational models to predict potential
disease-related non-coding RNAs for experimental validation are helpful
for in-depth interpretation of the pathogenesis and processes
underlying NSCLC development and improving related treatment
strategies, and may dramatically decrease the time and expenditure on
biological experiments [[156]32–[157]35]. The ceRNA network hypothesis
was proposed to describe a new RNA regulatory crosstalk where different
RNA transcripts can potentially regulate each other through MREs
[[158]17]. This concept unfolds a novel paradigm for the
lncRNA–miRNA–mRNA regulatory network that may explain the mechanism
underlying tumorigenesis. Recently, the advancement in interaction
prediction studies using computational biology provides valuable
insights into the development of lncRNA-mediated ceRNA network
[[159]36–[160]38]. TIME composition is strongly associated with
neoplastic progression, anti-tumor immune response, and clinical
outcome in NSCLC [[161]14,[162]39]. Therefore, an immune-related ceRNA
network must be explored to improve the accuracy of prognostic
prediction and clinical response to immunotherapy. To the best of our
knowledge, this is the first study that establishes an immune-related
ceRNA prognostic model based on GTEx-TCGA data, WGCNA, pRRophetic
algorithm, and TIME in NSCLC.
In this study, the RNA-Seq data from 944 NSCLC and 578 normal lung
specimens were acquired from TCGA and GTEx databases, respectively. Key
mRNAs and lncRNA signatures associated with NSCLC prognosis were
identified using bioinformatic methods, including edgeR, WGCNA, GO term
enrichment analysis, GSEA, KEGG pathway enrichment analysis, ceRNA
network construction, and Cox regression analyses. A ceRNA network
containing five mRNA signatures, seven lncRNA signatures, and three
miRNAs were significantly related to OS in NSCLC patients.
Additionally, correlations between the risk score of ceRNA prognostic
model and TIME, and with sensitivity towards cancer drugs were
investigated. Finally, immunohistochemical data from the HPA database
and RT-qPCR were used to validate the findings from this study.
The lncRNA–miRNA–mRNA ceRNA network has an important influence on NSCLC
pathogenesis and prognosis. Wang et al. [[163]23] investigated a ceRNA
network including 113 DElncRNAs, 12 DEmiRNAs, and 36 DEmRNAs with
significant association with NSCLC. Wang et al. [[164]40] constructed a
ceRNA network involving 155 lncRNAs, 30 miRNAs, and 68 mRNAs to
identify novel targets and pathways in NSCLC. In both these studies,
the tumor and non-tumor samples were obtained from the same patients
from the TCGA database to accurately elucidate the ceRNA interactive
mechanism underlying NSCLC biology. Their findings highlight the
prognostic value of ceRNA for NSCLC patients. However, the number of
normal samples in the TCGA-NSCLC cohort is limited. Biomolecules
secreted by cancer cells can be engulfed by the neighboring normal
cells, thereby altering the gene expression and biological processes in
these two cells [[165]41]. The idea that cancer-related genetic
aberrancy also exists in para cancer tissues adjacent to the highly
genetically abnormal tumor tissues may be biased [[166]42]. Hence, the
use of a large number of normal samples from individuals with no tumors
is necessary [[167]43]. In the present work, edgeR was used to identify
the DEmRNAs between NSCLC and normal lung samples. KEGG–GSEA results
showed that DEmRNAs were enriched in the cell cycle, DNA
replication-related genes, and the mTOR pathway. Further WGCNA analysis
showed that the brown gene modules were significantly related to NSCLC.
GO term enrichment results showed that the mRNAs in the brown modules
were involved in cell cycle regulation, and KEGG analysis showed that
they were clustered in the mTOR signaling pathway and DNA replication
processes. PI3K/Akt/mTOR signaling commonly occurs in tumors and
controls DNA replication and replisome stability, thereby regulating
cell cycle progression [[168]44]. Therefore, PI3K/Akt/mTOR signaling
may be an essential mechanism underlying NSCLC initiation and
progression.
Only a few ceRNA network studies discuss the relationship of expression
patterns among ceRNAs associated with NSCLC. WGCNA was used to identify
the co-expression modules in NSCLC using the data from the TCGA and
GTEx consortia. GTEx provides abundant data on gene expression from
healthy people, thus benefitting the tumor purity quantification of the
samples [[169]41]. The WGCNA-brown mRNA module and the WGCNA-turquoise
lncRNA module were the key modules for NSCLC. Among the predicted
mRNAs, DEmRNAs, and mRNAs in the brown module, 1094 overlapping mRNAs
were identified. Furthermore, a survival prognostic model with 13 genes
(FSTL3, CPS1, PTPN21, DEPDC1B, COL9A3, DSG2, LAMB1, STYK1, RBM6,
DEPDC1, GTSE1, NAV3, and FKBP5) was constructed for NSCLC through K–M
estimator and Cox proportional hazard regression model.
Non-coding RNAs can promote PD-L1/PD-1 expression through ceRNA
regulation of NSCLC mechanisms [[170]45]. However, given the complexity
of the immune system, knowledge of a single biomarker that can
precisely identify patients who can potentially benefit from ICIs is
lacking. Immune-related gene signatures, immune infiltrating cells, and
chemokines are important in predicting the clinical response to
immunotherapy and prognosis of NSCLC patients [[171]14]. A previous
study shows that activated memory CD4 T cells and helper T cells are
positively correlated with improvement in survival AMONG lung cancer
patients [[172]14]. By contrast, suppressive immune cells, including
resting dendritic cells, M0 macrophages, and neutrophils, form an
immunosuppressive microenvironment [[173]46] and indicate a poorer
prognosis. Integrated analysis of the correlation between ceRNA and
TIME is necessary to improve the accuracy of prognostic predictions.
The ceRNA signature in NSCLC is related to immune infiltrating cell
types and PD-1 expression. Results from the execution of the CIBERSORT
algorithm showed that the proportion of activated CD4 memory T cells
and follicular helper T cells was positively correlated with the risk
score of 13 gene-based ceRNA network, and the proportion of CD4 memory
resting T cells, M0 macrophages, resting dendritic cells, and
neutrophils was negatively associated with the 13 gene-based ceRNA risk
score. Chemokines, especially CXCL9, CXCL-10, and CXCL11/CXCR3 axis,
play key roles in TIME and immune response [[174]39]. Reduced levels of
CXCL9 and CXCL10 are correlated with the immunosuppressive
microenvironment, negative responses to ICIs, and poor prognosis among
cancer patients [[175]47]. In the present study, CXCL9 and CXCL10 were
downregulated in the high-risk group, which further promoted
immunosuppressive TIME. Taken together, our ceRNA risk model may
effectively predict the TIME in NSCLC patients.
Furthermore, five mRNAs were obtained from the 13 gene-based prognostic
model which corresponded to the three miRNAs selected from the
prediction from among the 69 DElncRNAs. Based on these five mRNAs
(FKBP5, DEPDC1, CPS1, COL9A3, and PTPN21), a ceRNA network that
included 69 DElncRNAs and three DEmiRNAs (miR-17-5p, miR-20b-5p, and
miR-429) was further established. Survival analysis revealed that seven
DElncRNAs (LINC00707, LINC00460, FEZF1-AS1, LINC00593, OTX2-AS1,
LINC01833, and CASC8) in the ceRNA network were associated with the OS
in NSCLC patients. Finally, the prognostic ceRNAs from a coregulatory
network involving five mRNAs from the 13-gene-based prognostic model
and three DEmiRNAs were identified based on the seven-lncRNA signature.
Although chemotherapy and targeted anti-cancer therapies have improved
the outcomes in NSCLC patients, resistance to chemotherapeutic and
targeted drugs pose a serious challenge. The mechanisms of drug
resistance remain unclear owing to the lack of approaches to accurately
predict clinical responses to drugs. In this study, we predicted the
association between agents and signatures in the ceRNA network using
the pRRophetic algorithm. Our results demonstrated that the signatures
in the ceRNA network were correlated with resistance to
chemotherapeutic and targeted drugs, including cisplatin, paclitaxel,
docetaxel, gemcitabine, and gefitinib, thus allowing prediction of drug
response for developing personalized treatment of NSCLC patients.
In the five-gene signatures in the ceRNA network, the expression levels
of DEPDC1, CPS1, COL9A3, and PTPN21 were significantly different
between early- and advanced-stage tumor tissues. Meanwhile, FKBP5 was
associated with a subtype of NSCLC. In progressive stages of NSCLC,
DEPDC1, CPS1, and COL9A3 were overexpressed, while PTPN21 was
down-regulated. DEPDC1 is a putative oncogene [[176]48] that can
function as a transcriptional co-repressor during transcriptional
regulation [[177]49]. The results of a previous study strongly indicate
that DEPDC1 expression is positively related to the poor prognosis in
NSCLC [[178]50], which is consistent with the findings of the present
study. Another previous study reports that DEPDC1 promotes angiogenesis
and invasion by activating chemokines in hepatocellular carcinoma
[[179]51]. CPS1 is the mitochondrial enzyme involved in the first
committed step of the urea cycle [[180]52]. Its expression is
positively correlated with tumor growth and is also associated with
poor NSCLC prognosis [[181]53]. COL9A3 encodes instructions for
generating one of the three alpha chains of type IX collagen [[182]54]
and is highly expressed in salivary adenoid cysts, breast basal-like
carcinomas, and melanoma [[183]55]. However, its function in cancer
remains unknown. The correlation between COL9A3 and NSCLC has not yet
been reported. The present study is the first to show that abnormal
COL9A3 expression is associated with the tumor size in NSCLC patients.
PTPN21, also called PTPD-1, is a member of the protein tyrosine
phosphatase (PTP) family and controls cell growth, migration, and
oncogenic transformation [[184]56]. Given its inverse correlation with
tumor invasion in bladder cancer tissues [[185]57] and its
immunosuppressive function [[186]58], PTPN21 may influence the early
stages of tumor progression by inhibiting the immunosuppressive TIME
and tumor invasion. This phenomenon could explain the high expression
of PTPN21 in the N0 stage.
miR-17-5p, miR-20b-5p, and miR-429 are all involved in lung cancer
[[187]59–[188]61]. miR-429, a member of the miRNA-200 family, functions
as a tumor promoter in human NSCLC cells [[189]62]. It is also
positively correlated with the expression of immune checkpoints in
NSCLC patients [[190]63]. miR-429 is known to regulate six lncRNAs
(LINC00707, LINC00460, FEZF1-AS1, LINC00593, LINC01833, and CASC8) out
of the seven lncRNAs in the proposed prognostic model. Among them,
LINC00707 is a novel long intergenic non-coding RNA that enhances
sensitivity towards cisplatin in NSCLC cells [[191]64], regulates mRNA
stability, and can function as a potential diagnostic and prognostic
marker in gastric cancer [[192]65]. LINC00460 promotes cell migration
and invasion in NSCLC; it is a potential prognostic marker and a
treatment target for patients with NSCLC [[193]66]. FEZF1-AS1 can
promote the proliferation and invasion in tumor cells by suppressing
Wnt/β-catenin signaling in NSCLC [[194]67]. lncRNA CASC8A is a
potential diagnostic biomarker for cancer [[195]68] which can predict
treatment response to platinum-based therapy and toxicity in patients
with lung cancer [[196]69]. Previous research findings corroborate our
prediction of the ceRNA network and the prognostic models. Using the
13-mRNA-based prediction model and the seven-lncRNA-based prognostic
model, we estimated the risk scores of NSCLC patients. The AUC of the
ROC curve was > 0.65, which indicated that the prediction value had
high accuracy in these prognostic models. Patients with NSCLC belonging
to the high-risk group exhibited a high risk of progression according
to the ceRNA network. Thus, these results provide a strong basis for
the potential use of these novel predictive biomarkers for NSCLC
prognosis and may guide molecular treatment approaches and further
clinical research in NSCLC.
However, there were some inherent limitations in this study. First, was
the lack of robust experimental verification. The function and
mechanism of key ceRNAs should be further explored in the in vivo and
in vitro settings. Second, was the need for prospective clinical trials
to validate the prognosticators of the five-mRNA and seven-lncRNA
signatures in the ceRNA network. Third, some superior computational
algorithms could be used for biomarker identification [[197]32–[198]35]
and prediction of miRNA-lncRNA interactions [[199]36–[200]38] in NSCLC.
Finally, only the correlation between the 13-gene-based ceRNA
prognostic model and NSCLC TIME was investigated, but further research
on the predicted effect of the relevant lncRNAs in the microenvironment
remains unknown. Despite these limitations, this study provided
reliable prognostic markers that could be used to predict TIME and
outcome in NSCLC patients. A multi-dimensional perspective was also
proposed for the molecular regulatory mechanism of NSCLC through the
TIME-related ceRNA network.
Materials and methods
TCGA and GTEx RNA sequencing datasets
The RNA-seq data for 455 LUAD samples and 489 LUSC samples were
obtained from the TCGA dataset ([201]https://gdc.cancer.gov/). All data
for normal tissue samples were obtained from the GTEx Analysis V8
release version ([202]https://gtexportal.org/home/datasets) which
included a total of 578 lung samples. The miRNA sequence data of 455
tumor tissues with 45 control samples from LUAD patients and 489 tumor
tissues with 45 adjacent non-cancerous lung tissues from LUSC patients
were retrieved from the TCGA database.
Identification of differentially expressed genes
The ensemble IDs of NSCLC samples were transformed into gene symbols
using The Human GENCODE Gene Set-Release 34 version
([203]https://www.gencodegenes.org/human/). The trimmed mean of the
M-values was used to normalize the raw RNA-seq data. DEmRNAs,
DElncRNAs, and DEmiRNAs between NSCLC and normal lung tissues were
identified using the edgeR package in the Bioconductor project (version
3.11; [204]http://www.bioconductor.org/). Absolute |logFC| > 1.5 and
statistically adjusted P value of < 0.05 (P < 0.05) were considered as
statistically significant DEmRNAs; |LogFC| > 1 and P < 0.05 were
considered as statistically significant DEmiRNAs, and |LogFC| > 4 and P
< 0.05 were considered as statistically significant DElncRNAs. The
volcano map of DEmRNA and DEmiRNA expressions was visualized using the
ggplot2 package (version 3.3.1,
[205]https://github.com/tidyverse/ggplot2) in R.
Gene function and pathway enrichment analysis
ClusterProfiler R package [[206]70] was used to perform GO functional
annotation, KEGG pathway enrichment, and GSEA analyses for DEmRNAs. GO
analysis was conducted to describe gene functions including BPs,
cellular components (CCs), and molecular functions (MFs). KEGG-GSEA was
performed for identification of the enriched signaling pathways, and P
< 0.05 was considered as statistically significant enrichment.
Co-expression network construction by WGCNA
The WGCNA package in R [[207]71] was utilized to establish a
co-expression network and identify the co-expression modules of mRNAs
and lncRNAs. First, the variances of mRNAs and lncRNAs were computed by
analyzing the variance across LUAD and LUSC samples. The top 80% mRNAs
and the top 60% lncRNAs with the highest variances in WGCNA were
chosen. Second, an appropriate soft-thresholding power (β) and a
scale-free fit R^2 > 0.9 were set to establish a weighted adjacency
matrix based on the scale-free topology criterion. After the modules
with the highest correlation coefficient were analyzed, the adjacency
matrix was transformed into a topological overlap matrix [[208]72].
Co-expression modules were obtained using the dynamicTreeCut package in
R, and their expression was analyzed by ME. After merging the highly
correlated modules, the gene’s connectivity was calculated using the
WGCNA package in R. Finally, GO-GSEA and KEGG analyses showed the
functional enrichment of mRNAs in key modules.
Prediction of target mRNAs
miRcode ([209]http://www.mircode.org/) was used to predict interactions
between the DEmiRNAs and the lncRNAs in the key modules that were most
relevant to NSCLC. StarBase ([210]http://starbase.sysu.edu.cn/),
miRTarBase ([211]http://mirtarbase.mbc.nctu.edu.tw/), miRDB
([212]http://www.mirdb.org/), and TargetScan
([213]http://www.targetscan.org/) databases were used to identify the
target DEmRNAs from the predicted miRNAs.
Construction of target mRNAs-related prognostic model
Clinical data of NSCLC patients were obtained to validate the potential
prognostic applications of the target genes. The correlation between
gene expression and patient survival was identified through univariate
cox proportional hazard regression analysis using the “survival”
package in R. Multivariate Cox proportional hazards regression model
was subsequently utilized to construct the prognostic gene-based model.
The risk score was calculated for each patient using the following
formula:
[MATH: Riskscore=expmRNA1×βmRNA1+expmRNA2×βmRNA2+⋯+expmRNAn×βmRNAn :MATH]
A high-risk score indicated a high risk of poor prognosis. The
coefficient of genes, “β”, was calculated using the multivariate Cox
regression model, and the expression of the corresponding mRNAs was
represented as “exp” [[214]73]. Patients with NSCLC were categorized
into high- and low-risk groups with the medium value set as the
demarcation point. Survival differences between the two groups were
examined by K–M analysis and log-rank test. A ROC curve of risk score
and the clinical features, including age, gender, and T, N, M, and TNM
stages were plotted using the “survival ROC” package in R. The accuracy
of the predictions of the prognostic model was evaluated using ROC
curves. Associations between risk score or identified genes and
clinical features were evaluated by stratified analysis. R software
(version 3.6.1) was used for all statistical analyses.
Analyses of tumor-infiltrating immune cells, PD-1, and chemokines
The fractions of 22 types of infiltrating immune cells, PD-1, and
chemokines were identified using the gene expression data of NSCLC
patients from the TCGA database to establish the relationship between
ceRNA network and TIME. The differential immune infiltration levels
were compared between the low- and high-risk groups using the CIBERSORT
algorithm as previously described [[215]74]. Only immune cells with a
CIBERSORT P < 0.05 were selected for subsequent correlation analysis.
The correlation coefficient between risk mRNAs in the ceRNA network and
immune cell infiltration was evaluated through Pearson analysis.
Construction of the ceRNA co-expression network
Competing endogenous RNA (ceRNA) networks were constructed using
prognostic mRNA signatures, predicted miRNAs, and DElncRNAs. mRNA–miRNA
interactions were identified using the miRDB, miRTarBase, StarBase, and
TargetScan databases. The potential target lncRNAs for DEmiRNAs were
identified using miRcode. Overlaps between mRNA–miRNA and lncRNA–miRNA
interactions with prognostic mRNAs, DEmiRNAs, and DElncRNAs were
further used to construct the ceRNA network. The open-source Cytoscape
platform ([216]https://cytoscape.org/) was used to visualize the
regulatory relationships in the ceRNA network.
Cox regression analyses for lncRNAs
A Cox proportional hazard regression analysis was performed for lncRNAs
and mRNAs. Relevant associations between the lncRNAs in the ceRNA
network and the OS in NSCLC patients were evaluated through the
univariate Cox proportional hazard regression model. Prognostic lncRNAs
were identified by the multivariate Cox regression model. The risk
score formula for lncRNA was identical to that of mRNA and was as
follows:
[MATH: Riskscore=expmRNA1×βmRNA1+expmRNA2×βmRNA2+…+expmRNAn×βmRNAn :MATH]
A high-risk score indicated a high risk of poor prognosis. The
regression coefficient of lncRNAs, “β”, was obtained through the
multivariate Cox regression model. The expression of the corresponding
lncRNAs was represented as “exp” [[217]73]. ROC curve and
clinicopathological correlation analyses were used to estimate the
potential applicability of the prognostic signatures for predicting the
NSCLC outcomes.
Estimation of the sensitivity towards chemotherapeutic and targeted molecular
agents
The IC50 values and sensitivity of chemotherapeutic and targeted
therapeutic drugs were compared between the NSCLC subgroups classified
by risk scores using the “pRRophetic” package in R. TCGA mRNA and
lncRNA expression levels were analyzed based on the mRNA and lncRNA
risk scores, respectively.
Validation of the risk signatures from ceRNA network at protein level
The HPA database was used to validate the protein level expression of
five risk mRNAs and compare them at the gene transcriptional level.
Cell culture, RNA extraction, and RT-qPCR
The normal human bronchial epithelial cell line, BEAS-2B [[218]75] was
obtained from the State Key Laboratory of Oncology in South China. The
cells were cultured in RPMI Medium 1640 (C11875500BT, Gibco, USA)
supplemented with 10% fetal bovine serum (A31008-02, Gibco, USA) and 1%
penicillin-streptomycin ([219]PB180120, Procell Life Science &
Technology Co. Ltd., China) in a 5% CO[2] incubator at 37 °C. NSCLC
cell lines, including PC-9 [[220]76] (ZQ0487, Zhong Qiao Xin Zhou
Biotechnology Co. Ltd., China), H1650 [[221]77] (CL- 0166, Procell Life
Science & Technology Co. Ltd., China), and H1975 [[222]78] (CL-0298,
Procell Life Science & Technology Co. Ltd., China), were cultured in
Dulbecco’s Modified Eagle Medium (C11995500BT, Procell Life Science
&Technology Co. Ltd., China) supplemented with 10% fetal bovine serum
(A31008-02, Gibco, USA) and 1% penicillin-streptomycin ([223]PB180120,
Procell Life Science & Technology Co. Ltd., China) in a 5% CO[2]
incubator at 37 °C. Cells in the logarithmic growth phase were
harvested for RNA extraction. Total RNA was isolated from the cells
using the TRIzol reagent (15596026, Invitrogen, USA) and reverse
transcribed into cDNA using the first-strand cDNA template through the
PrimeScript RT reagent Kit with gDNA Eraser (RR047A, TaKaRa, Japan).
SYBR Premix Ex Taq II (RR820A, TaKaRa, Japan) was used for
amplification on the CXF96 System (BioRad, United States). ACTB served
as the endogenous control. The primer designs are listed in [224]S1
Table. lncRNA expression was normalized to that of ACTB and calculated
using the 2^−ΔΔCq method.
Statistical analysis
Statistical analyses for bioinformatic experiments were performed using
the previously mentioned R packages in the R software (v.3.6.3).
Differential expression levels between groups through RT-qPCR analyses
were compared using the parametric one-way ANOVA in GraphPad Prism
version 8.0 (GraphPad Software) and SPSS 23.0 software (IBM).
Statistical significance was considered at P < 0.05.
Supporting information
S1 Table. Real-time quantitative PCR primer sequences used in this
study.
(DOCX)
[225]Click here for additional data file.^ (15.4KB, docx)
Acknowledgments