Abstract
Extracellular vesicles (EVs) mediate intercellular communications in
the tumor microenvironment and contribute to the aggressive phenomenon
of cancers. Although EVs in body fluids are supposed to be ideal
biomarkers for cancer diagnosis and prognosis, it remains difficult to
distinguish the tumor-derived EVs from those released by other tissues.
We hypothesized that analyzing the EV-related molecules in tumor
tissues would help to estimate the prognostic value of tumor-specific
EVs. Here, we investigate the expression of coding genes of proteins
carried by small EVs (sEVs) in primary lung adenocarcinoma. Based on
the protein-protein interaction network, we identified three network
modules (3-PPI-Mod) as a signature that could predict recurrence. This
signature was validated in three independent datasets and demonstrated
better prognostic value than signature generated from gene expression
alone. Meanwhile, the high-risk subgroup assigned by the signature
could benefit from adjuvant chemotherapy, although it was not
beneficial in unselected patients. Two out of three modules were
enriched by proteins identified in sEVs from non-small-cell lung cancer
cells. Furthermore, the two modules were remarkably correlated with
intratumoral hypoxia score. These results suggest that the 3-PPI-Mod
signature was enriched in tumor-derived sEVs and could serve as a
prognostic and predictive biomarker for lung adenocarcinoma.
Keywords: exosome, NSCLC, microenvironment, gene expression, adjuvant
chemotherapy
Introduction
Lung cancer is one of the most common malignancies worldwide.[43]^1
According to pathomorphology, nearly 85% of all lung cancers are
non-small-cell lung cancers (NSCLCs).[44]^2 Lung adenocarcinoma (ADC)
accounts for 35% of NSCLCs. Despite the improvement of treatment, the
5-year overall survival of NSCLC is below 20%.[45]^3 Loco-regional and
distant relapse are the major causes of cancer-related death for NSCLC.
Even in stage I NSCLC, about 30% of patients would suffer recurrent
disease, and many patients would die of it after surgical
treatment.[46]3, [47]4 Although adjuvant chemotherapy (ACT) improves
outcome in late-stage NSCLC, the survival benefit of ACT in early-stage
patients remains controversial.[48]^5 Thus, exploring novel prognostic
biomarkers would be helpful to select patients for ACT and improve
prognosis in early-stage NSCLCs.
Extracellular vesicles (EVs) are cell-released membrane particles that
regulate intercellular communication by transporting functional
molecules (e.g., proteins, nucleic acids, and lipids) to recipient
cells.[49]^6 Cells can secret diverse types of EVs with a wide range of
sizes and different cellular origins. Exosomes mainly range from
50–150 nm in diameter, which are generated inside multivesicular
endosomes.[50]^7 Other EVs (with a diameter from 100 to 1,000 nm), such
as microvesicles (MVs), ectosomes, or microparticles, are directly
budded from cellular membrane.[51]^7 Given that the specific
subcellular origin was not strictly demonstrated in many literatures,
the generic term EVs is used in this study, as recommended by the
International Society for Extracellular Vesicles (ISEV).[52]^8
In recent years, increasing evidences have suggested that EVs
play an important role in cancer development and progression.
Cancer-derived EVs are implicated in various carcinogenesis processes,
including malignant transformation, angiogenesis, immunosuppression,
invasion, and treatment resistance.[53]9, [54]10, [55]11 On the other
hand, EVs released by the tumor microenvironment could also influence
the traits of cancer cells.[56]^12 Hypoxia, a common feature of tumor
microenvironment, was demonstrated to induce cancer EVs that transport
aggressive and metastatic phenomenon to the recipient cells.[57]13,
[58]14, [59]15, [60]16 Moreover, cancer-released EVs could modify the
distant microenvironment to form a pre-metastatic niche to facilitate
the formation of metastatic lesions, suggesting that cancer EVs could
act as both local and distant effects.[61]^17
Cancer-derived EVs detected in various body fluids are proposed as
novel noninvasive biomarkers. For instance, circulating small EVs
(sEVs, with a diameter less than 200 nm) carried microRNAs
(miRNAs)[62]18, [63]19 and proteins,[64]20, [65]21, [66]22 which are
promising diagnostic and prognostic biomarkers for NSCLC. Park
et al.[67]^23 examined the proteomics of sEVs isolated in malignant
pleural effusions (PEs) from metastatic lung ADCs. The study identified
a lot of well-known cancer-related proteins, such as EGFR, RAS, and
Src-family kinase.[68]^23 However, the sEVs identified in body fluids
may represent a mixed population released by both tumors and other
tissues. To date, it remains a great challenge to distinguish
tumor-specific sEVs from those released by other tissues due to the
lack of specific biomarkers.[69]^24 It was reported that the protein
levels in sEVs released to body fluids was consistent with protein
expression of the NSCLC tissues.[70]^25 Therefore, we proposed that
integrative analysis of primary tumors with secreted sEVs may be
feasible to identify tumor-specific sEV-related biomarkers for NSCLC.
In this study, we analyzed the intratumoral gene-expression profile of
the coding genes of sEV proteins identified in malignant PEs from
advanced lung ADC.[71]^23 Based on a protein-protein interaction (PPI)
network approach, a sEV-associated gene signature was established for
recurrence prediction. This signature was further validated for its
prognostic value in three retrospective datasets and one prospective
dataset from the public database ([72]Table S1). Moreover, the gene
signature was further compared with proteomics of sEVs isolated from
conditional medium of NSCLC cells as well as peripheral blood from
NSCLC patients. The association between the signature and tumor
microenvironment, including hypoxic status and stromal cell abundance,
were also investigated.
Results
Identifying Relapse-Associated PPI Modules from sEV Proteins
[73]Figure 1 depicts an overall flow chart of this study. For the sEV
proteins identified in malignant PEs, 568 proteins were mapped onto the
reference PPI network. The sEV-associated PPI network was integrated
with the gene-expression profile of the training dataset. Next, the
module searching procedure identified 144 modules, which showed locally
maximal discriminative scores for recurrent status. Three permutation
methods were used to estimate the significance of the discriminative
score for each module. As a result, 21 modules showed significantly
higher discriminative scores for recurrence than those by chance (p <
0.001 for all three methods, [74]Figure S1). Details of 21 candidate
modules are presented in [75]Figure S2.
Figure 1.
[76]Figure 1
[77]Open in a new tab
The Workflow of This Study
The PPI network of sEV proteins isolated from PEs was integrated with
the gene expression profile of the training dataset. Candidate modules
with locally maximal relapse scores were identified by a greedy
searching approach. Random forest algorithm was used to establish a
network-based signature for relapse risk. The signature 3-PPI-Mod was
further validated for prognosis in three independent datasets.
Moreover, the predictive value for adjuvant chemotherapy in ADCs was
evaluated in a prospective dataset conducted from the JBR.10 trial.
Biological validation was performed by comparing the signature with
various databases. Abbreviations: PPI, protein-protein interaction;
sEVs, small extracellular vesicles; 3-PPI-Mod, 3 PPI modules; ADC,
adenocarcinoma; RFS, relapse-free survival; DSS, disease-free survival;
OS, overall survival.
The expression score profiles of the 21 PPI modules are presented in
[78]Figure 2A. All modules were clustered into two major clusters by
unsupervised clustering algorithm. One cluster with 14 modules was more
likely upregulated in recurrent tumors, and another cluster with 7
modules was inversely correlated with recurrence. Patients were also
grouped into two major subgroups, which were significantly associated
with recurrent status (adjusted p < 0.001) and EGFR mutation (adjusted
p = 0.036) ([79]Figure 2A). However, there was no statistically
significant association of patient subgroups with other
clinicopathological factors, such as age, sex, smoking index, tumor
stage, KRAS status, or TP53 status.
Figure 2.
[80]Figure 2
[81]Open in a new tab
Establishing a Prognostic Signature Based on a sEV-Associated Network
(A) Expression score profiles of candidate PPI modules. The upper
heatmap shows the expression scores of the 21 PPI modules. Unsupervised
hierarchical clustering analysis was performed for modules (rows) and
samples (columns). The lower panel indicates clinicopathological
variables of the tumor samples. The p value was calculated by
chi-square test and then adjusted by FDR for multiple testing. (B)
Feature selection based on RF algorithm. The x axis indicates the
number of used variables (modules). The y axis is the out-of-bag (OOB)
error of each prediction model. The arrow indicates that the
combination of 3 PPI modules achieved the selection criteria with the
smallest feature number and maximal prediction accuracy. (C) Expression
scores of the 3 modules that were selected in (B). The p value was
calculated by t test. (D) The recurrent risk predicted by 3-PPI-Mod was
used to perform ROC analysis. AUROC and 95% CI are presented. (E)
Scatterplotting of the predictive risk of recurrence. The x axis
indicates patient index. The y axis indicates the predictive risk of
recurrence (OOB). A cutoff of 0.5 was used to classify patients into
low- and high-risk subgroups. Red and green colors indicate patients
with and without recurrent disease, respectively. (F and G) Survival
analysis for 3-PPI-Mod in all patients (F) and in stage I only (G).
Kaplan-Meier curves and log-rank test were used to compare OS between
different risk subgroups identified in (E). Abbreviations: PPI,
protein-protein interaction; ADC, adenocarcinoma; FDR, false discovery
rate; RF, random forest; ROC, receiver operating characteristic curve;
AUROC, area under the receiver operating characteristic curve; CI,
confident interval; OS, overall survival.
Establishing a Predictive Signature for Cancer Recurrent Risk
We then thought to establish an optimal model to predict recurrent risk
for patients using the 21 PPI models. By using the random forest (RF)
algorithm, we found that the combination of three PPI modules achieved
the criteria of the smallest feature number and the best prediction
accuracy ([82]Figure 2B, out-of-bag [OOB] error = 0.216 ± 0.038). Two
out of three modules (Mod_8 and Mod_10) were upregulated, and the last
module (Mod_137) was downregulated in recurrent tumors ([83]Figure 2C).
Next, the three PPI modules were used to construct a prediction model,
which was named 3-PPI-Mod signature. The recurrent risk of patients was
predicted by the signature, with an area under the receiver operating
characteristic curve (AUROC) of 0.84 (95% confident interval [CI],
0.77–0.91) ([84]Figure 2D). Using 0.5 as the cut-off, patients were
divided into two groups with high or low risk for recurrence,
respectively ([85]Figure 2E). Overall, 78.4% (91/116) of patients were
correctly classified when compared with the true recurrent status.
Kaplan-Meier curves confirmed that the 3-PPI-Mod signature could
predict overall survival (OS) for all patients and those with stage I
disease (both p < 0.001, [86]Figures 2F and 2G). Multivariate Cox
regression demonstrated that the 3-PPI-Mod signature was an independent
prognostic factor for OS (adjusted hazard ratio [HR] = 6.58, 95% CI,
3–14.4, p < 0.001, [87]Table S2).
Within the 3-PPI-Mod signature, 5, 6, and 5 genes were included in
Mod_8, Mod_10, and Mod_137, respectively. One gene (Cofilin-1 [CFL1])
was overlapped by Mod_8 and Mod_10. Mod_10 was connected with Mod_8 and
Mod_137 within the PPI network ([88]Figure 3A). Gene Ontology (GO) and
pathway enrichment analysis showed that all three modules were
significantly correlated with focal adhesion ([89]Figure 3B). Notably,
genes in Mod_8 and Mod_10 were enriched in extracellular exosome
([90]Figure 3B). Mod_137 showed association with diverse biological
processes, including protein kinase binding, membrane raft, and T cell
receptor signaling pathway ([91]Figure 3B).
Figure 3.
[92]Figure 3
[93]Open in a new tab
Biological Characteristics of the 3-PPI-Mod Signature
(A) The view of interaction of 3-PPI-Mod signature. The presented
network was generated by mapping the signature proteins on the
reference PPI network. Three modules are shown by different colors. (B)
Enrichment analysis for three modules. The significant GO terms or KEGG
pathways were listed on the left. Bars indicate the enrichment
significance (log10-transformed FDR). The color legend indicates
different modules. Abbreviations: PPI, protein-protein interaction;
3-PPI-Mod, 3 PPI modules; GO, gene ontology; FDR, false discovery rate.
Validation of the 3-PPI-Mod Signature in Independent Cohorts
For each validation dataset, the recurrent risks of patients were
predicted by the 3-PPI-Mod signature, which was established in the
training dataset. Kaplan-Meier curves demonstrated that patients who
were classified as high risk had significantly shorter relapse-free
survival (RFS) time than those classified as low risk (log-rank test:
Gene Expression Omnibus [GEO]: [94]GSE50081, p = 0.005; GEO:
[95]GSE30219, p = 0.003; GEO: [96]GSE31210, p < 0.001) ([97]Figures
4A–4C). The adjusted HRs (95% CI) for high-risk versus low-risk
subgroups were 2.24 (1.07–4.67), 3.74 (1.54–9.07), and 3.34 (1.81–6.15)
in GEO: [98]GSE50081, [99]GSE30219, and [100]GSE31210, respectively
([101]Table S3). Similarly, the high-risk subgroup was significantly
correlated with unfavorable OS compared with the low-risk subgroup
(GEO: [102]GSE50081, HR [95% CI] = 1.92 [1.03–3.58], p < 0.001; GEO:
[103]GSE30219, HR [95% CI] = 2.41[1.25–4.63], p = 0.006; GEO:
[104]GSE31210, HR [95% CI] = 2.46[1.09–5.58], p = 0.001) ([105]Figures
4D–4F; [106]Table S3). Subgroup analysis revealed that the 3-PPI-Mod
signature remained significant in distinguishing RFS and OS for
patients with stage I disease ([107]Figures S3A–S3F).
Figure 4.
[108]Figure 4
[109]Open in a new tab
Survival Analysis of 3-PPI-Mod Signature in Three Validation Datasets
(A–F) In each dataset, patients were predicted to be in low- or
high-risk subgroups by the signature. Kaplan-Meier curves and log-rank
test were used to compare RFS (A–C) and OS (D–F) of patients in
different risk subgroups: GEO: [110]GSE50081 (A and D), [111]GSE30219
(B and E), [112]GSE31210 (C and F). Survival data within 10 years are
shown. Abbreviations: 3-PPI-Mod, 3 PPI modules; RFS, relapse-free
survival; OS, overall survival.
We further validated the 3-PPI-Mod signature in a prospective dataset
conducted from the JBR.10 clinical trial, which was designed to
evaluate the advantage of ACT in stage IB to II NSCLCs. When all ADC
patients were taken into consideration, the ACT arm showed no
significant improvement of disease-specific survival (DSS) compared
with the observation (OBS) arm ([113]Figure 5A, log-rank test, p =
0.503). The 3-PPI-Mod signature classified 32 and 39 of 71 ADC patients
as low- and high-risk subgroups, respectively. Although ACT did not
improve outcome in the low-risk subgroup ([114]Figure 5B, log-rank
test, p = 0.189), it significantly prolonged DSS in the high-risk
subgroup ([115]Figure 5C, log-rank test, p = 0.019). Similar results
were observed in OS analysis (ACT versus OBS, log-rank test: all ADCs,
p = 0.907; low-risk subgroup, p = 0.115; high-risk subgroup, p = 0.064)
([116]Figures 5D–5F). Multivariate Cox’s model revealed a significant
interaction effect between risk groups and ACT (test for interaction:
DSS, p = 0.011; OS, p = 0.013) ([117]Table S4). For non-ADC patients,
however, there was no evidence demonstrating that ACT could improve DSS
or OS, even if in subgroups stratified by the 3-PPI-Mod signature
([118]Figure S4; [119]Table S4).
Figure 5.
[120]Figure 5
[121]Open in a new tab
Predictive Value of 3-PPI-Mod Signature for ACT Benefit in ADC Patients
from the JBR.10 Dataset
(A–F) Results with lung ADC patients are presented. Postoperative DSS
(A–C) and OS (D–F) between ACT arm and observation arm (OBS) were
compared by Kaplan-Meier curves and log-rank test. Patients were
stratified by 3-PPI-Mod signature. The survival comparisons were
performed in all patients (A and D), low risk subgroup (B and E), and
high risk subgroup (C and F), respectively. Abbreviations: ACT,
adjuvant chemotherapy; 3-PPI-Mod, 3 PPI modules; DSS, disease-specific
survival; OS, overall survival; OBS, observation.
The sEV-Associated Signature Exhibited Better Prognostic Value Than Signature
Derived from Gene Expression Alone
The sEV-associated 3-PPI-Mod signature was compared with an 8-gene
signature, which was constructed based on gene expression alone
([122]Figures S5A–S5D). The 8-gene signature was established by using
the same algorithm as used for the 3-PPI-Mod signature. In the training
dataset, the 8-gene signature showed similar prediction performance as
the 3-PPI-Mod signature in distinguishing relapse and OS ([123]Figures
S5E and S5F). In the validation datasets, however, the C-index of
3-PPI-Mod signature was higher than that of the 8-gene signature in
predicting both RFS and OS ([124]Figures S5G and S5H). Kaplan-Meier
analysis also confirmed that the 8-gene signature could not
consistently distinguish RFS and OS across different validation cohorts
when considering neither all patients nor the stage I subgroup
([125]Figures S6A and S6B). Meanwhile, the 8-gene signature could not
predict ACT benefit in the JBR.10 dataset ([126]Figure S7). These
findings suggest that the sEV-associated signature exhibited improved
prognostic value compared with signature derived from gene expression
alone, especially for independent cohorts.
The 3-PPI-Mod Reflects Tumor-Stroma Interaction and Hypoxic Tumor
Microenvironment
The proteins within the 3-PPI-Mod signature were further validated by
the proteomics of sEVs isolated from NSCLC cell supernatants[127]^26
and serums of NSCLC patients.[128]^22 Overall, 10 and 8 proteins of the
signature were overlapped with those identified in sEVs from NSCLC
cells and circulating serums, respectively (both p < 0.001, by
hypergeometric test) ([129]Figures 6A and 6B). More than 90% of the
overlapped proteins came from Mod_8 and Mod_10. For Mod_137, only
one protein (CTNNB1) was overlapped with proteomics of NSCLC-derived
sEVs and none with that of circulating sEVs ([130]Figures 6A and 6B).
Figure 6.
[131]Figure 6
[132]Open in a new tab
The 3-PPI-Mod Reflects Interaction between Cancer Cells and Tumor
Microenvironment
(A and B) Comparison of 3-PPI-Mod signature with other proteomic
databases of sEVs from NSCLCs. The venn diagram shows the overlapping
of proteins of three modules with those identified in sEVs from NSCLC
cell supernatants (A) and circulating serum (B). (C) Correlation
analysis of the signature with stromal cell abundance and
microenvironment status. The abundance of 8 immune cells and 2
non-immune cells were calculated by MCP-counter. The microenvironment
status for hypoxia, angiogenesis, and inflammation were estimated as
described in [133]Materials and Methods. Circles in the matrix indicate
the correlation coefficient r, and red and blue colors indicate
positive and negative correlations, respectively. A larger circle size
represents a higher correlation. The raw p value was adjusted by FDR.
Correlations with FDR > 0.05 were not presented. (D) GSEA of hypoxia
metagene against the risk subgroups classified by 3-PPI-Mod. The NES
and q value (adjusted p value) are presented. (E) Diagram shows
Pearson’s correlation between hypoxic score and HIF1A expression level.
(F) Boxplot shows intratumoral hypoxic scores between low- and
high-risk subgroups. The p value was calculated by t test. (G) Boxplot
shows HIF1A expression levels between tumors with low and high risk.
The p value was calculated by t test. Abbreviations: PPI,
protein-protein interaction; 3-PPI-Mod, 3 PPI modules; FDR, false
discovery rate; NSCLC, non-small cell lung cancer; MCP,
microenvironment cell populations; GSEA, gene set enrichment analysis;
NES, new enrichment score.
We hypothesized that Mod_137 may be associated with non-cancer cells.
The abundance of different stromal cells was estimated by
microenvironment cell population (MCP)-counter.[134]^27 Surprisingly, a
higher expression score of Mod_137 was correlated with elevated
abundance of fibroblasts, endothelial cells, and diverse types of
immune cells ([135]Figure 6C). In contrast, Mod_8 and Mod_10 expression
scores did not show a remarkably positive correlation with those of the
stromal components. Similar results were observed when analysis was
conducted by using individual genes within each module
([136]Figure 6C).
The association of 3-PPI-Mod signature with tumor microenvironment
status was analyzed. We found that Mod_8 and Mod_10 were positively
correlated with hypoxia, while Mod_137 was associated with angiogenesis
and inflammation ([137]Figure 6C). The hypoxia metagene was
significantly enriched in high-risk tumors predicted by the 3-PPI-Mod
signature (new enrichment score [NES] = 1.88, q value = 0.005)
([138]Figure 6D). There was a significant correlation between
intratumoral hypoxic score and hypoxia inducible factor 1 (HIF1)
expression level, which were both upregulated in the high-risk subgroup
identified by the 3-PPI-Mod signature ([139]Figures 6E–6G). However,
the angiogenesis and inflammation status was not associated with
3-PPI-Mod subgroups, based on gene set enrichment analysis (GSEA)
([140]Figures S8A and S8B) or expression scores ([141]Figures S8C and
S8D).
Discussion
In the present study, we evaluate the intratumoral gene expression
level of coding genes of sEV proteins in lung ADC. In the training
dataset, we identified a 3-PPI-Mod signature that could distinguish
patients with high risk of recurrence. This signature was also
validated in three independent datasets and demonstrated better
prognostic value than the gene signature established by using gene
expression alone. Moreover, the patients assigned as high risk by the
signature could benefit from ACT, although no significant benefit was
achieved for unselected patients. Further analysis revealed that the
sEV-associated signature is highly correlated with intratumoral hypoxic
status and might be derived from both cancer cells and stromal cells.
To the best of our knowledge, this is the first comprehensive analysis
of sEV-associated genes in predicting outcome of lung ADC based on
large-number populations from several independent cohorts.
To date, it remains a great challenge to discriminate exosomes from
other sEVs such as MVs.[142]^24 Exosomes are derived from endosomal
multivesicular bodies that fused with the cytoplasma membrane, while
MVs are usually considered to bud directly from the cytoplasma
membrane.[143]^9 The PE-derived “MVs”[144]^23 and NSCLC cell-released
“exosomes”[145]^26 were isolated by using differential ultracentrifuge
followed by density gradient methodology, which recovers mixed EV
populations. For these EVs, the multivesicular body origin or purity
were not further demonstrated. Therefore, we used the general term sEVs
in this study to refer to these EVs, which were characterized with
diameters less than 200 nm.[146]23, [147]26
The sEV cargos are suggested as promising cancer biomarkers for
non-invasive detection due to the presence in various body fluids
(e.g., plasma, PEs, sputum, and urine). However, it remains difficult
to evaluate the role of tumor-specific sEVs, which may be influenced by
those released from other tissues.[148]^24 It was reported that the
protein expression of sEVs released to body fluids was consistent with
that of the tumor tissues, suggesting feasibility for analyzing primary
tumors to reflect the status of circulating sEVs.[149]^25 In this
study, by investigating the intratumoral expression of coding genes of
proteins carried within sEVs,[150]^23 we identified a prognostic gene
signature for lung ADC. Most proteins of the signature were confirmed
by sEV proteomics from NSCLC cells,[151]^26 suggesting that the
signature may be enriched by tumor-specific sEVs. It is possible that
the gene expression in tumor tissues could influence cancer
progression. Nevertheless, secretion of the gene products to
circulating blood through sEVs may further promote recurrence or
distant metastasis.[152]17, [153]28 This may be explainable by our
observation that the sEV-associated signature exhibited better
prognostic value than signature that was established from gene
expression alone. Therefore, this study provides a promising strategy
to evaluate the prognostic value of tumor-specific sEVs from complex
body fluids.
The human PPI network has been demonstrated to be informative in
biomarker development for cancer when being integrated with
gene-expression profiles.[154]^29 Based on the PPI network approach, we
identified three network modules that were highly correlated with
cancer recurrence. Although these modules were established in the
training dataset, their prognostic value was confirmed in external
cohorts with hundreds of patients. As we previously reported, the PPI
network approach could identify vital biological themes that reflect
mechanisms underlying cancer behaviors.[155]^30 In this study, most of
the modules (e.g., Mod_8 and Mod_10) were associated with exosome,
focal adhesion, and cytoskeleton regulation, which are involved in
cancer invasion and metastasis.[156]^31 These results suggest that the
3-PPI-Mod signature represents malignant traits of cancer and is robust
in outcome prediction across different cohorts.
Hypoxia, a frequent feature in solid tumors, could induce lung cancer
cells to release more sEVs that promote migration via miR-23a
delivery.[157]^16 In our signature, Mod_8 and Mod_10 were positively
correlated with a hypoxia score that could predict prognosis of NSCLC
as reported previously.[158]^32 This was consistent with our findings
that Mod_8 and Mod_10 were associated with higher recurrent risk. CFL1,
a common gene between Mod_8 and Mod_10, was correlated with
invasiveness and unfavorable outcome in NSCLC.[159]33, [160]34
Functional analysis revealed that CFL1 participates in actin dynamics
and contributes to membrane extension direction in cell
migration.[161]^35 Moreover, many other genes (WYHAG, NCKAP1, ACTB,
GNA13, and RDX) identified in Mod_8 and Mod_10 also play an important
role in cancer invasion and metastasis.[162]36, [163]37, [164]38,
[165]39, [166]40 However, the association with hypoxia of these genes
in NSCLC remains unclear. We found that most of these genes were
positively correlated with intratumoral hypoxic score. Thus, we propose
that the expression and prognostic role of the sEV-associated signature
may be promoted by hypoxic microenvironment, although extended
experimental validation is required.
Unlike Mod_8 and Mod_10, Mod_137 had only one protein that was
presented in sEVs from NSCLC cells. By analyzing the microenvironment
cell populations,[167]^27 we found that four proteins in Mod_137 were
correlated with diverse stromal cells. This observation was supported
by the fact that FYN, PTPRC (CD45), and THY1 (CD90) might be involved
in lymphocytes,[168]41, [169]42 macrophages,[170]^43 and
fibroblasts[171]^44 in tumor microenvironment. Moreover, PTPRC was also
identified in sEVs released by dentritic cells.[172]^45 Interestingly,
four proteins in Mod_137 were significantly correlated with endothelial
cell abundance, which was consistent with the fact that Mod_137 was
associated with angiogenesis score. Overall, the sEV proteins derived
from cancer cells together with stromal cells may contribute to
aggressive microenvironment and influence postoperative recurrence in
lung ADC.
One limitation of this study is the relative small sample set of the
sEVs cohort, which was used to establish the gene signature. To address
this issue, the 3-PPI-Mod signature was further compared with
proteomics of circulating sEVs from another cohort with 26 NSCLC
patients.[173]^22 The results confirmed that most proteins of Mod_8 and
Mod_10 were presented in circulating sEVs, which was isolated by
anti-CD9 antibodies.[174]^22 Notably, none of the proteins in Mod_137
were identified in these circulating CD9+ sEVs. This could be explained
by the fact that Mod_137 is probably a representative for sEVs from
non-cancer cells. Meanwhile, it was reported that a subpopulation of
major histocompatibility complex (MHC) class II-bearing sEVs from
dentritic cells could escape from anti-CD9 capture.[175]^45 Taken
together, these findings suggest that the 3-PPI-Mod signature is
consistently present in different sEV proteomic datasets of NSCLC.
In summary, this study developed a sEV-associated gene-expression
signature (3-PPI-Mod) that could predict prognosis and guide ACT
benefit for lung ADC. The signature was enriched by cancer-secreted
sEVs and was remarkably associated with intratumoral hypoxic status.
This study provided a promising strategy to evaluate the prognostic
value of sEV-related molecules in tumor tissues and shed new light on
the mechanisms underlying metastasis and recurrence of lung ADC.
Materials and Methods
Patients and Clinical Characteristics
The proteomic database of sEVs isolated from malignant PEs from three
patients with lung ADC was obtained from Park et al.’s [176]^23 study.
There were two females and one male. All of the patients suffered stage
IV disease, and two patients had metastases at distant organs. Another
proteomic database of serum sEVs from 26 NSCLC patients was employed
for validation, including 14 cases with lung ADC.[177]^22
Gene expression profiles of primary lung ADCs from 623 patients were
collected from five independent datasets in GEO. These datasets were
characterized with geographical and technical diversity
([178]Table S1). GEO: [179]GSE14814 was conducted from the JBR.10
trial, which included NSCLC patients who were randomly assigned into
the ACT and OBS arms. For datasets with multiple pathological types of
NSCLC, we analyzed the subgroup of lung ADC only. The clinical
parameters and follow-up information of each cohort were obtained from
GEO. Cases without prognostic information were excluded from analysis.
Finally, there were 116 lung ADC samples in GEO: [180]GSE13213, 127
lung ADC samples in GEO: [181]GSE50081, 83 lung ADC samples in GEO:
[182]GSE30219, 226 lung ADC samples in GEO: [183]GSE31210, and 71 lung
ADC samples in GEO: [184]GSE14814. Clinical characteristics of patients
from each dataset are summarized in [185]Table 1. The involvement of
human subjects from publicly available resources is in accordance with
the Declaration of Helsinki.
Table 1.
Clinical Characteristics in Five Datasets of Lung ADC
Training Validation
Variables GEO: [186]GSE13213 GEO: [187]GSE50081 GEO: [188]GSE30219 GEO:
[189]GSE31210 GEO: [190]GSE14814
__________________________________________________________________
Age
__________________________________________________________________
< 60 ys 43 (37.07) 19 (14.96) 46 (55.42) 96 (42.48) 35 (49.3)
≥ 60 ys 73 (62.93) 108 (85.04) 37 (44.58) 130 (57.52) 36 (50.7)
__________________________________________________________________
Gender
__________________________________________________________________
Female 57 (49.14) 62 (48.82) 18 (21.69) 121 (53.54) 34 (47.89)
Male 59 (50.86) 65 (51.18) 65 (78.31) 105 (46.46) 37 (52.11)
__________________________________________________________________
Smoking
__________________________________________________________________
Never 56 (48.28) 23 (18.11) – 115 (50.88) –
Ever 60 (51.72) 92 (72.44) – 111 (49.12) –
N/A 0 (0) 12 (9.45) – 0 (0) –
__________________________________________________________________
Stage
__________________________________________________________________
I 78 (67.24) 92 (72.44) 79 (95.18) 168 (74.34) 42 (59.15)
II 13 (11.21) 35 (27.56) 3 (3.61) 58 (25.66) 29 (40.85)
III 25 (21.55) 0 (0) 1 (1.20) 0 (0) 0 (0)
__________________________________________________________________
EGFR
__________________________________________________________________
Mutation 45 (38.79) – – 127 (56.19) –
Wild type 71 (61.21) – – 99 (43.81) –
__________________________________________________________________
KRAS
__________________________________________________________________
Mutation 14 (12.07) – – – –
Wild type 102 (87.93) – – – –
__________________________________________________________________
TP53
__________________________________________________________________
Mutation 38 (32.76) – – – –
Wild type 77 (66.38) – – – –
N/A 1 (0.86) – – – –
__________________________________________________________________
Postsurgery Recurrence
__________________________________________________________________
No 58 (50) 87 (68.50) 56 (67.47) 162 (71.68) –
Yes 58 (50) 37 (29.13) 27 (32.53) 64 (28.32) –
N/A 0 (0) 3 (2.36) 0 (0) 0 (0) –
__________________________________________________________________
ACT
__________________________________________________________________
No – – – – 32 (45.07)
Yes – – – – 39 (54.93)
[191]Open in a new tab
The table shows the number (%) of patients in different groups. ys,
years; ACT, adjuvant chemotherapy; N/A, not available.
Proteomic Database of sEVs
The proteomics of sEVs isolated from malignant PEs,[192]^23 serum
samples,[193]^22 and supernatant of NSCLC cells HCC827 and A549[194]^26
were profiled by mass spectrometer. These proteins were mapped to a PPI
network and gene expression profiles by Entrez gene IDs.
Gene-Expression Data Processing
Raw data from the training dataset (GEO: [195]GSE13213) with the
Agilent platform were processed with “median-scaled” normalization by R
package “limma”. Raw data fom four validation datasets (GEO:
[196]GSE50081, [197]GSE30219, [198]GSE31210, and [199]GSE14814) with
the Affymetrix platform were downloaded from GEO. These data were
processed and normalized with “RMA” method by R package “affy”. Probe
annotation of each dataset was downloaded from the GEO database. All
gene expression profiles were log[2]-transformed.
Processing of PPI Databases
Three public databases were employed to construct the reference PPI
network in this study. PPI network from the Human Protein Reference
Database (HPRD, release 9.0)[200]^46 was composed of 9,466 proteins and
36,895 interactions. Protein-protein relationships labeled with
“Compound/Binding” in Kyoto Encyclopedia of Genes and Genomes (KEGG)
pathways were extracted, resulting in 2,986 proteins and 31,047
interactions. Protein-protein relationships labeled with “Direct
Complex” in Reactome pathways were extracted (4,885 proteins and 22,763
interactions). These databases were merged into a reference PPI
network, with 11,591 proteins and 77,059 interactions (after removing
duplicated interactions and protein-self loops). Network mapping across
different databases was done by Entrez gene IDs.
Identification of Relapse-Associated PPI Modules
The sEV proteins from PEs were mapped to the reference PPI network. The
largest component of the mapped network was used for subsequent
analysis, including 568 proteins and 1,632 interactions. This network
was integrated with the gene expression profile of the training dataset
(GEO: [201]GSE13213). PPI modules (sub-networks) that could
discriminate recurrent status were used as previously
described.[202]^29 In a given module M with m genes, the expression
score (e) of M in sample j was defined as:
[MATH: ej=∑i∈mZij<
msqrt>m, :MATH]
where Z[ij] is the Z-transformed gene expression value of gene i.
Then, the discriminative score S(M) of module M was defined as the
mutual information (MI) between e and relapse class (c):
[MATH:
S(M)
=MI(e',c)=∑x∈e'∑y∈cp<
/mi>(x,y)logp(x,y)p(x)p(y)
, :MATH]
where e’ represents the discretized form of e, by discretizing
expression score into 8
[MATH:
(log2<
/mn>(n)+1
mn>) :MATH]
(where n is sample numbers) equally spaced bins as described
previously.[203]^47
A greedy searching program was performed to identify modules with
locally maximal recurrent score S(M) using the similar parameters as in
the previous study.[204]^29 Briefly, each protein in the PPI network
(568 proteins) was seeded and iteratively expanded with a specified
distance d (d = 2 in this study). For each expansion, the protein that
yielded the largest improvement of S(M) among all candidate proteins
was added to the module. The module expansion stopped when further
protein addition did not increase the S(M) by a specified improvement
rate of 10%. Modules with only one protein were removed, and modules
with >80% common proteins were merged.
Three permutation methods were utilized to estimate the significance of
a candidate module M (with m genes): (1) stimulating random PPI modules
with m proteins in the network; (2) shuffling the expression value of m
genes; (3) shuffling the recurrent class of samples. Each method was
performed by 10,000 times. The random discriminative scores were used
as null distribution of S(M). PPI modules with a p value < 0.001 by all
three methods were considered to be significant. The gene expression
score of candidate modules was used in the following modeling process.
Development of Predictive Signature for Recurrence
To construct a PPI network-based gene signature, the gene expression
score (as defined above) of candidate modules was used. Feature
selection and modeling was performed based on RF algorithm using the R
package “varSelRF”.[205]^48 In detail, the predictive importance for
recurrence of each candidate module was estimate by an initial RF with
5,000 trees. A stepwise backward selection procedure was used to
identify the optimal combination of candidate modules for recurrence
prediction. In each iteration, 10% of features were excluded, and the
remaining features were used to build an RF with 3,000 trees. This
program was stopped when there were only two features left. The final
RF model was selected with the smallest feature number that achieved
the minimum OOB error among all iterative results. The recurrence risk
of patients was defined by the OOB probability in the final RF model.
To assess the efficiency of the PPI network-based methodology, we
established a gene-based RF prediction model using the same program and
parameters as used in the sEV network-based method. The predictive
results of the two methods were compared by ROC curve analysis. The
AUROCs with 95% confidence interval (CI) were calculated by the R
package “pROC”.[206]^49 Also, the two methods were compared by the
Harrell’s C-index in predicting RFS and OS in the independent datasets.
A higher C-index means more accuracy in survival predicting.
Estimation of Tumor Microenvironment Status
A hypoxia metagene across different cancer types was obtained from a
previous study.[207]^32 A core angiogenesis signature for primary tumor
was identified by Masiero et al.[208]^50 A set of inflammatory
cytokines[209]^51 was used to estimate the intratumoral inflammation
level. The intratumoral hypoxia, angiogenesis, and inflammation scores
were calculated by averaging the Z-normalized expression value of the
corresponding signature genes as we described previously.[210]^52 The
abundance of immune and non-immune cells in the tumor microenvironment
was calculated by MCP-counter, a robust estimator for
tissue-infiltrating cell populations by gene expression
profiles.[211]^27
Other Statistical Methods
GO and KEGG pathway enrichment analysis for PPI modules was performed
by the online tool DAVID.[212]^53 The significance of overlapping
between two gene sets was calculated by hypergeometric test. The
association between two category variables of patients was calculated
by Chi-squared test or Fisher’s exact test. GSEA of was used to compare
interested gene sets against the patient subgroups classified by
3-PPI-Mod.[213]^54 The correlations of PPI modules with hypoxia,
angiogenesis, and inflammation scores as well as the stromal cell
abundance were calculated by Pearson’s correlation. Multiple testing
was adjusted by false discovery rate (FDR) using the Benjamini and
Hochberg’s method. A comparison of patient survival between low- and
high-risk groups assigned by 3-PPI-Mod was performed by Kaplan-Meier
curve and log-rank test. DSS and OS of patients in the ACT arm and the
OBS were compared by Kaplan-Meier curve and log-rank test, with
stratification by the 3-PPI-Mod signature. Multivariate Cox’s models
were performed to assess the prognostic and predictive value of
3-PPI-Mod signature. All statistical analyses were performed by R
software (version 3.3.1). A p value < 0.05 was considered to be
significant.
Author Contributions
Conception and design: J.L. and Q.L.; Development of methodology: B.C.
and W.D.; Acquisition of data: S.M., Q.W., M.L., H.L., and T.C.;
Analysis and interpretation of data: B.C., W.D., and S.M.; Writing,
review, and/or revision of the manuscript: B.C., J.L., Q.L., G.Z., and
X.Y.; Administrative, technical, or material support: G.Z. and X.Y.;
Study supervision: Q.W., M.L., and H.L. All authors read and approved
the final manuscript.
Conflicts of Interest
The authors declare no competing interests.
Acknowledgments