Abstract

   Identification of clinically applicable molecular subtypes
   of pancreatic ductal adenocarcinoma (PDAC) is crucial to improving
   patient outcomes. However, the traditional tissue-dependent
   transcriptional subtyping strategies are invasive and not amenable to
   routine clinical evaluation. In this study, we developed a circulating
   extracellular vesicle (cEV) long RNA (exLR)-based PDAC subtyping method
   and provided exLR-derived signatures for predicting immunogenic
   features and clinical outcomes in PDAC. We enrolled 426 individuals,
   among which 227 PDACs served as an internal cohort, 118 PDACs from two
   other medical centers served as an independent validation cohort, and
   81 healthy individuals served as the control. ExLR sequencing was
   performed on all plasma samples. We found that PDAC could be
   categorized into three subtypes based on plasma exLR profiles. Each
   subpopulation showed its own molecular features and was associated with
   patient clinical prognosis. The immunocyte-derived cEV fractions were
   altered among PDAC subtypes and interconnected with tumor-infiltrating
   lymphocytes in cancerous tissue. Additionally, we found a significant
   concordance of immunoregulators between tissue and blood EVs, and we
   harvested potential PDAC therapeutic targets. Most importantly, we
   constructed a nine exLR-derived, tissue-applicable signature for
   prognostic assessment of PDAC. The circulating exLR-based features may
   offer an attractive platform for personalized treatment and predicting
   patient outcomes in multiple types of cancer.

   Keywords: circulating EVs, extracellular vesicle long RNA,
   tissue-cellular origin, immune checkpoint genes, tumor microenvironment

Graphical abstract

   graphic file with name fx1.jpg
   [47]Open in a new tab
     __________________________________________________________________

   This is the first study to use large-scale cEV transcriptome analysis
   for cancer risk stratification and uncover a complex interaction
   network of immunogenic components with clinical implications between
   circulatory particles and primary nidus from PDAC cases.

Introduction

   Pancreatic ductal adenocarcinoma (PDAC) is currently the fourth leading
   cause of cancer-related death worldwide.[48]^1 The treatment efficiency
   of PDAC is restricted by the aggressive tumor-intrinsic characteristics
   and suppressive tumor microenvironment (TME).[49]^2 Previous studies
   showed that personalized treatment approaches based on PDAC genotypic
   and phenotypic characteristics may potentially improve patient
   outcome.[50]^3^,[51]^4 However, most of these reports focused on gene
   subtyping and exploring biomarkers in resectable tumors. Resection is
   possible only in a minority of patients, as most patients present at an
   advanced stage at the time of diagnosis,[52]^5 and safely obtaining
   sufficient quantities of pancreatic tumor tissue for molecular analysis
   is difficult at unresectable stages. Therefore, the development of
   specific, robust, and noninvasive molecular subtyping strategies for
   the stratification and prognostic evaluation of PDAC are required.

   Extracellular vesicles (EVs) have become a hotspot in liquid biopsy for
   cancer precision medicine. EVs are nanometer-sized membrane-bound
   particles that are shed from both tumor and nonneoplastic cells into
   the peripheral circulation. EVs deliver a milieu of cargo material,
   such as proteins, genomic DNA, and lipids, as well EV long RNAs
   (exLRs), from parent cells to receptive cells.[53]^6 Circulating EVs
   (cEVs) contain approximately 10,000 exLRs, mainly mRNAs, long
   non-coding RNAs, and circular RNAs, which are stably expressed and have
   shown clinical significance.[54]^7 Tissue- and tumor-originated exLRs
   are specifically enriched in cEVs and reflect the biological
   activities, metabolic conditions, and immunogenic features of their
   host cells.[55]^8 Previous studies showed that PDAC tumor-derived EVs
   regulate the immunosuppressive TME and deliver anti-tumoral
   transcriptomic signals, and that plasma-originated exLRs are critical
   indicators of PDAC antitumor immune activities.[56]^9^,[57]^10
   Additionally, exLRs are involved in critical processes of PDAC
   carcinogenesis, including tumor growth, metastasis, drug resistance,
   and immune modulation.[58]^11^,[59]^12 Therefore, the mixture of
   non-enriched cEVs containing both tumor-derived and non-tumor-derived
   exLRs[60]^13 may reflect tumor-intrinsic and non-tumor signatures, and
   evaluation of exLRs may represent a new strategy for noninvasive
   molecular stratification, prognostic evaluation, and treatment
   monitoring in PDAC.

   We previously determined the abundance of exLRs in the plasma of cancer
   patients[61]^14 and developed a strategy for exLR sequencing (exLR-seq)
   in human plasma.[62]^15 Our study indicated that cEVs contain
   approximately 10,000 exLRs, mainly mRNAs, which are stably expressed,
   harbor tissue-cellular specificity, and could serve as biomarkers for
   PDAC early screening.[63]^13

   In this study, we evaluated the potential usefulness of the circulating
   exLR profile for PDAC subtyping. We traced the tissue-cellular source
   contribution of cEVs and explored the correlation of the immune
   landscape between tumors and cEVs. Our study established a prognostic
   signature enabling noninvasive prediction of immunogenic features and
   clinical outcome for PDAC.

Results

Circulating exLR-based subtyping of PDAC

   The exLR-seq approach was implemented on plasma samples from an
   internal cohort of 227 PDAC patients ([64]Table S1). Approximately
   14,000 exLRs were reliably detected, including protein-coding genes
   (10,253, 74.91%), circRNAs (1,322, 9.66%), pseudogenes (1,218, 8.9%),
   lncRNA genes (764, 5.58%), and other small non-coding RNAs (130, 0.95%)
   ([65]Figure S1A). We included the top 60% aberrantly expressed exLRs as
   candidate targets to explore underlying exLR-based PDAC subtypes. We
   assessed clustering stability using the consensus-clustering algorithm,
   which indicated the existence of three major clusters ([66]Figures
   S1B–S1F; see [67]Materials and methods); 60, 97, and 70 cases were
   stratified to clusters C-I, C-II, and C-III, respectively
   ([68]Figure 1A). Principal-component analysis demonstrated a clear
   distinction between each subgroup ([69]Figure 1B). The number of
   detected exLR species shows a significant difference between three
   clusters. In particular, the amount of enriched protein-coding genes in
   C-III was greater than in other group sets (C-I and C-II), which
   indicates that plasma EVs of the third cases were additionally enriched
   with exLRs, which could conduct exclusive biological functions
   ([70]Figure 1C).

Figure 1.

   [71]Figure 1
   [72]Open in a new tab

   PDAC prognostic subtyping based on circulating exLR-seq profile

   (A) Consensus clustering matrix of exLR-seq data from the internal
   cohort for aggregating number k (k = 3), representing the clustering
   stability using 500 iterations of the hierarchical clustering approach.
   The heatmap depicts the relative abundance of signature exLRs. (B)
   Two-dimensional scatterplot of principal-component analysis (PCA);
   first two principal components derived from expression intensities of
   variable exLRs, with samples connected by centroids according to
   exLR-based PDAC subsets (red, C-I; blue, C-II; yellow, C-III), are
   shown. The ellipse indicates the 90% confidence interval for each
   clustered group. (C) The relative distribution of exLR species per
   sample among PDAC clusters. The diagram shows exLRs reliably enriched
   in plasma vesicles, with a frequency > 0.1 and mean expression value >
   1. The center of the boxplots indicates median values; the boundaries
   of the boxes show 25% and 75% quantiles. (D) Alluvial diagram shows
   differences among PDAC clusters according to overall survival (OS),
   patient status, tumor stage, and carbohydrate antigen 19-9 (CA19-9).
   CA19-9 level greater than 100 U/mL was defined as high. ∗∗p < 0.01;
   n.s., not significant. (E) Kaplan-Meier curves for OS in the 227 PDAC
   patients in the internal exLR-seq cohort indicate that exLR-based
   phenotypes strongly correlated with PDAC prognosis (log-rank test, p <
   0.001).

   Evaluation of the clinical characteristics of the three clusters
   revealed a significantly increased number of patients with advanced
   tumor stage and shorter overall survival (OS) time in C-III; the other
   characteristics (age, sex, and CA19-9) were not associated with
   exLR-based stratification ([73]Figure 1D). Survival analysis showed
   that C-III correlated with the shortest OS, while C-I correlated with
   the longest OS (log-rank test, p < 0.001; [74]Figure 1E). Therefore,
   our results suggested that circulating exLR-based subtyping enables
   noninvasive prediction of clinical outcome for pancreatic cancer.

Molecular features and pathway annotation of exLRs in exLR-based clusters

   To assemble these dysregulated circulating exLRs and explore their
   underlying biological functions in each exLR-based cluster, the
   weighted correlation network analysis (WGCNA) approach was used to
   generate co-expressing exLR modules based on circulating exLR-seq
   expression profiles, and it correlated each module with phenotypes and
   clinical parameters (see [75]Materials and methods and [76]Figure S2A).
   The variably expressed exLRs were distributed in eight independent
   modules according to their degree of connectivity. Six gene modules
   (M2–M7) were significantly associated with C-III and one module (M1)
   was particularly referred to C-I (Spearman’s correlation analysis, p <
   0.001, Rho > 0.45; [77]Figures 2A and [78]S2B). The exLRs that could
   not be contained in any modules were placed into M8. All C-III-related
   modules were positively correlated with tumor aggressiveness, whereas
   the inverse was observed with the C-I-specific module; none of the
   modules showed any demographic bias ([79]Figure 2A).

Figure 2.

   [80]Figure 2
   [81]Open in a new tab

   Systematic annotation of molecular features and pathway activation of
   exLR-based clusters

   (A) The module-trait correlating matrix displays the correlation
   coefficients and significance levels of PDAC clusters and clinical and
   demographical factors for each exLR module. (B) Systematic Gene
   Ontology (GO) annotation network for genes in the eight modules in
   biological process, cellular component, and molecular function aspects.
   We filtered the top 10 significant annotating terms of each module and
   displayed these terms on the network diagram. The annotated GO terms
   are shown on each ellipse; the size of the ellipse reflects the degree
   of enrichment (−log10[p value]), and the transparency of the text
   indicates the number of exLR genes from each enriched GO term (log[2]
   transferred). The gray connection communicates each gene module with
   its corresponding annotation information. (C) Unsupervised clustering
   heatmap of ssGSEA enrichment score of 191 cluster-specific signaling
   pathway signatures in the exLR-seq data from the internal cohort.
   Hierarchical clustering was implemented with Euclidean distance and
   Ward linkage. Rows represent the signaling pathways derived from the
   current MSigDB 7.2 database that includes three major gene set
   collections (Hallmark, Reactome, and KEGG sets), and columns represent
   plasma samples. The right panels display the average normalized
   enrichment score (NES) of represented signaling pathways in PDAC
   clusters I and III. (D) The scatterplot demonstrates the tissue-cEV
   correlated signaling pathways from 61 PDAC patients with cEV and
   matched tissue transcriptomic expression profiles.

   Gene Ontology (GO) enrichment analysis was conducted to annotate exLR
   genes in each module ([82]Figure S2B). The exLRs of M1 (corresponding
   to C-I) were involved in the adhesion and migration of hemopoietic
   cells and represented favorable indicators for PDAC prognosis
   ([83]Figures 2B and [84]S2C). Conversely, the exLRs of M2–M7 (C-III)
   were involved in the activation of tumoral and immunogenic features,
   including RNA anabolic and catabolic processes, energy metabolism,
   cellular and immune responses, and DNA replication ([85]Figures 2B and
   [86]S2C; [87]Tables S2–S8). The pre-ranked gene set enrichment analysis
   (GSEA) strategy identified 191 pathways that were significantly
   different in each cluster (C-I, n = 57; C-II, n = 42; C-III, n = 92;
   false discovery rate [FDR] < 0.1, normalized enrichment score > 1.5;
   [88]Table S9). Unsupervised hierarchical clustering of the absolute
   enrichment score determined two distinct pathway signatures enriched in
   C-I and C-III samples ([89]Figure 2C). C-I was exclusively enriched in
   vesicle-mediated molecule transport, GPR signal transduction, and cell
   motility-related pathways, while C-III was enriched in cell
   proliferation and metastasis, metabolism, and immune-related pathways
   ([90]Table S9). The activation of chemokines, T cell receptor (TCR)/B
   cell receptor (BCR) and antigen processing and presentation signaling,
   as well as complement component genes overexpressed in cEVs suggests
   that patients in the C-III group showed primary and adaptive immune
   reactions in response to PDAC development.

   Correlation analysis revealed 77 signaling pathways that showed similar
   patterns in cEVs and tissue profiles, including PDAC- and
   immune-related hallmark signatures, such as JAK-STAT, transforming
   growth factor β (TGF-β), and WNT/β-catenin pathways and TLR-9/10
   cascades ([91]Figure 2D; [92]Table S10). These results suggest that
   circulating exLRs serve as noninvasive indicators to jointly reflect
   tumor-intrinsic and antitumor immune signatures.

Comprehensive landscape of tissue-cellular origins for cEVs

   We previously developed a deconvoluting approach, termed EV-origin, as
   a computational framework to infer the tissue-cellular origins of cEVs
   and cell type-specific genes from transcriptomic profiles of
   cEVs.[93]^16 In this study, we expanded the number of traceable
   hemopoietic cell fractions. A total of 39 types of tissue-cellular
   fractions were relatively or absolutely quantified, including 23
   hematopoietic components and 16 types of tissue constituents. Detailed
   information of the candidate tissue/blood components with their
   hierarchical relationships are shown in [94]Table S11. Using the
   updated EV-origin strategy, we portrayed an integrated atlas of a
   traceability system of cEVs in exLR-seq samples from the internal
   cohort. In evaluation of hemopoietic components, cEVs predominantly
   originated from the hematopoietic stem cell fraction (platelets, 57%;
   red blood cells, 13%), followed by immunocyte components (30%)
   ([95]Figure 3A, left). Concerning the proportion of immune cell
   sources, lymphocytes accounted for 81%, and the remaining myeloid
   component was 19% ([96]Figure 3A, right). These immune cell
   constituents were categorized into adaptive (71%) and innate (29%)
   immune-related components ([97]Figure 3A, right; [98]Table S11).
   CD8/CD4^+ T cells accounted for 66% of all immunocyte-derived
   constituents, and there was also a considerable abundance of
   immunosuppressive cell components (regulatory T cells, 5%; monocytes,
   16%). We also evaluated the abundance of tissue/solid organ-originated
   cEVs in PDAC plasma. Adipose tissue was the most prominent tissue
   contributor to plasma EVs (72%), followed by muscle (10%), lung (4%),
   and liver (4%); the pancreas accounted for only 0.24% ([99]Figure 3B;
   [100]Table S8).

Figure 3.

   [101]Figure 3
   [102]Open in a new tab

   Comprehensive landscape of tissue-cellular origins for cEVs

   (A) Relative distribution of all blood cell components estimated using
   the EV-origin approach from exLR-seq data from the internal PDAC
   cohort. The bar chart on the left displays the relative abundance of
   cEVs from erythrocytes, immune cells, and platelets. The multilayer pie
   chart on the right shows the relative proportions of
   immunocyte-originated cEV fractions. The diagram represents the
   categorization of immune cells belonging to each hemopoietic
   constituent and indicates the relative proportion of each subset in the
   total immune fraction. (B) Relative distribution of tissue/solid
   organ-derived EV fractions from the internal PDAC exLR-seq cohort.
   n.s., not significant. (C) Integral comparison of hemopoietic- and
   tissue-derived cEV absolute fractions from healthy control and PDAC
   plasma exLR-seq samples. The mean and upper and lower standard
   deviation values are indicated by bars in each panel; the pink and
   yellow boxes show the absolute scores of hemopoietic- and
   tissue-derived EV fractions for healthy control and PDAC exLR-seq
   samples, respectively. (D) Overview of the results of tissue-cellular
   originated cEVs in the PDAC and healthy control exLR-seq samples. The y
   axis shows the mean fold change of all absolute fractions for PDAC
   versus healthy individuals.

   We additionally evaluated the cEV fractions from 81 healthy
   individuals. In the comparison of absolute fractions of components,
   there was no statistically significant difference in blood
   cell/tissue-derived fractions between healthy and PDAC individuals
   (Mann-Whitney U test, p > 0.05; [103]Figure 3C). Source contribution
   heterogeneity analysis revealed that 80% (12/15) of the tissue-derived
   components were increased in the PDAC group, while 65% (15/23) of
   hemopoietic-sourced fractions were highly enriched in the healthy
   cohort (absolute score, mean fold change > 1; [104]Figure 3C). The
   enriched level of the pancreatic tissue-sourced cEV fraction
   demonstrated a modest upregulation in the PDAC cohort ([105]Figure 3D).

Immunocyte-derived EVs perturbed in PDAC subtypes and interconnected with
tumor-infiltrating lymphocytes (TILs) in tissue

   We next investigated the underlying heterogeneity of cEVs and their
   immunogenic source contribution among the PDAC phenotypes. The
   immunocyte-derived EV fractions can be categorized into innate and
   adaptive immunity categories to decode complex TME components. The
   three clusters showed significant differences in four types of innate
   immune constituents (i.e., monocytes, natural killer [NK] cells,
   dendritic cells [DCs], and γδ T cells). C-III exhibited the highest
   proportion of all four innate immune cell types while C-I showed the
   lowest proportions (Kruskal-Wallis rank-sum test, p < 0.05;
   [106]Figure 4A, top). Among the adaptive immunity constituents, T
   helper (Th)2 and Th17 cell fractions were upregulated in C-I, while
   high proportions of T regulatory (Treg) cells, naive Th cells, central
   memory CD8 T cells, memory B cells, and plasmablasts were observed in
   C-III ([107]Figure 4A, bottom). We further examined the absolute
   numbers to determine the true abundance of cEVs derived from each
   immune cell population. The immunogenic composition of cEVs gradually
   increased from C-I to C-III (mean absolute score: C-I, 0.38; C-II,
   0.40; C-III, 0.51; [108]Figure 4B). The levels of innate and adaptive
   immunogenic cell fractions were both significantly upregulated in C-III
   compared with C-I and C-II (Mann-Whitney U test, p < 0.05;
   [109]Figure 4B).

Figure 4.

   [110]Figure 4
   [111]Open in a new tab

   Immunocyte-derived EVs perturbed in PDAC subtypes and interconnect with
   TILs in tissue

   (A) The graph indicates the relative enrichment results of EV fractions
   from 21 types of immunocytes in the three PDAC subtypes from the
   discovery exLR-seq dataset. The graph shows 7 types of innate immunity
   components on the top and 14 adaptive immune subsets on the bottom. The
   significantly different immune constituents are highlighted in red
   color. (B) Comparison of the absolute fractions of cEVs from innate and
   adaptive immunocytes in the three exLR-based subtypes. The points and
   error bars indicate the mean value and standard deviation of the
   absolute score, respectively. ∗p < 0.05; n.s., not significant. (C) OS
   in high- and low-enriched groups for absolute immunocyte fractions by
   calculating the hazard ratio (HR) through Cox regression analysis. High
   and low expression was defined by the optimal cutoff of the
   quantitative score. The dots represent the HR of univariate Cox test
   and the error bars show the 95% confidence intervals. (D) The panel
   displays the correlation between exLR-based PDAC classifications and
   immune cell-originated EV fractions. The dots show the odds ratios of
   univariate logistic regression for immune cells in innate and adaptive
   categories. The error bars show 95% confidence levels of odds ratios.
   (D) Venn diagram illustrating the number of immunocyte-originated EV
   constituents that are significantly perturbed in PDAC subtypes and may
   function as prognostic indicators for PDAC. The bottom panel lists the
   three hub immune cell components along with their cell origin and
   prognostic impact. (F) Co-enrichment network of tissue/cEV
   immune-relevant signatures. The three candidate immunocyte-originated
   EV constituents are shown in the inside track (yellow triangles). The
   outside panel refers to the TILs in paired PDAC tissue. Each line
   represents the correlation coefficient of the immune cell signature
   between tissue and circulating vesicles among the same PDAC cases. The
   bottom panel shows the prognostic value (high level, orange line; low
   level, purple line) of the three hub TIL signatures in TCGA cohort.

   In addition, we dichotomized the absolute enrichment score of each
   fraction and identified six innate immune subpopulations, including
   five risk components (monocytes, basophils, DCs, NK cells, and
   neutrophils; hazard ratio [HR] > 1) and one potential protective factor
   (γδ T cells; HR < 1), that were significantly related to OS in PDAC
   (p < 0.05; [112]Figure 4C, left). In addition, 8 of 12 adaptive immune
   populations were significantly correlated with OS in PDAC, including
   five unfavorable (naive Th cells, Treg cells, plasmablasts, effector
   memory CD8 T cells, and terminal effector CD8 T cells) and three
   protective cell subsets (memory B, naive CD8 T, and Th2 cells)
   ([113]Figure 4C, right). Univariate regression analysis indicated that
   4 of 7 innate cell fractions and 9 of 12 adaptive-lymphocyte-derived
   components were likely to be associated with exLR-based PDAC phenotypes
   (p < 0.05; [114]Figure 4D). We then used a stepwise variable selection
   procedure from Cox’s model to identify the most prognostic immunogenic
   fractions as variables, including Th2 cells, monocytes, Treg cells,
   neutrophils, and effector memory CD8 T cells. By taking the
   intersection of the variables and the significant cell sets with regard
   to OS ([115]Figure 4C) and subtypes ([116]Figure 4D), we identified
   three hub immunocyte-originated cEV fractions, including two adaptive
   immune components (Treg and Th2 cells) and one innate immune fraction
   (monocytes) that displayed critical roles for PDAC clinical outcome,
   and these components showed significant differences among PDAC subsets
   ([117]Figure 4E).

   The enrichment results of immunogenic cEVs raises questions regarding
   whether EVs from blood could represent the abundance of TILs in PDAC
   tissue. By deciphering the detailed portraits of immunocyte
   disaggregation from transcriptomic profiles of cEVs and paired tissues,
   we generated a full-scale correlation network between TILs and hub cEV
   fractions ([118]Figure 4F). The Treg cell-originated EV component
   positively correlated with DC and CD8^+ T cell infiltration, whereas it
   was inversely associated with Th2 cell abundance in paired tissue.
   There was a parallel enrichment pattern between the congenital immunity
   cell-derived vesicle components (monocytes) and the infiltration of NKT
   cells in tissues.

   We externally validated the prognostic significance of the three hub
   constituents on PDAC patient survival using The Cancer Genome Atlas
   (TCGA) dataset. The enrichment of immunogenic signatures from Treg and
   monocyte cells exhibited equivalent prognostic value in tissue and
   cEVs, which is consistent with their immunosuppressive properties,
   whereas the Th2 cell-derived fraction showed opposite results
   ([119]Figure 4F, bottom). The two candidate immune cell (Treg cells and
   monocytes)-originated cEV components could jointly serve as
   unprotective factors to forecast the clinical outcome for PDAC. Taken
   together, these data suggested perturbed immunogenic cEVs that were
   associated with immunosuppressive TILs in the TME.

Concordance of immunoregulatory targets in cEVs and cancer tissues

   Immunomodulators (IMs) are critical regulators for cancer
   chemoradiotherapy and immunotherapy, and numerous IM agonists and
   antagonists have been explored and applied in clinical
   oncology.[120]^17^,[121]^18 We collected a list of 76 IMs from the
   literature that exhibit multiple effects on the tumor immunoreaction,
   with functions in antigen presentation, cell adhesion, and immune
   checkpoint responses[122]^19 ([123]Table S12). The exLR level of IMs
   perturbed and unsupervised clustering results demonstrated that the IM
   profiles in cEVs could distinctly segregate patients into three groups,
   which precisely matched the exLR subtypes aforementioned in this study
   ([124]Figure 5A). Eight immunogenic exLRs that are upregulated in C-I
   show antitumor effects such as major histocompatibility complex (MHC)
   class I antigen presentation and stimulatory immune checkpoint
   responses. C-III was over-enriched with both antitumor and
   immunosuppressive IMs, with an expression pattern that was completely
   different from that of C-I. In addition, we compared candidate
   immune-related targets in paired cEV and tumor transcriptomes from 61
   patients and observed close correlations of expression between the two
   IM profiles (Rho = 0.65; [125]Figures 5B and 5D, top two tracks).

Figure 5.

   [126]Figure 5
   [127]Open in a new tab

   Concordance of gene expression of immunoregulatory targets in cEVs and
   neoplastic tissues

   (A) The heatmap of the expressional heterogeneity of immunomodulator
   (IM) genes in exLR-seq data from the internal PDAC cohort, scaled by
   row. Each row refers to an immune-related gene and columns represent
   exLR-seq samples. Detailed classification and underlying functions of
   these immunoregulatory targets are listed in [128]Table S12. (B)
   Scatterplot displaying the relationship of IM gene expression values
   between cEVs and paired tissue in 61 PDAC patients. (C) Correlation of
   ligand-receptor immune checkpoint genes in cEVs and corresponding
   paired tissues. (D) Circular chart represents the differently expressed
   IMs among the three PDAC subtypes. From the outside to the inside, the
   diagram displays the average gene expression level of IMs in cEVs and
   matched tumor tissue, the mean fold-change enrichment of IMs among PDAC
   subtypes, the p value (−log transformed), and the hazard ratio from
   univariate Cox analysis. The innermost arc-shaped curve highlights the
   prognostically consistent targets for tissue and plasma vesicles. (E)
   Bubble plot indicates the prognostic concordance of 17 IM genes in
   cEVs, paired tumor tissue, and TCGA expression profiles. Three of the
   IMs (dotted box marked) predicted PDAC prognosis (p < 0.05) in both
   blood and tissue datasets.

   Considering that the ligand-receptor pairs of immune checkpoint genes
   (ICGs) are key factors that reflect the magnitude and duration of the
   immune checkpoint response,[129]^20 we evaluated the expression
   correlation of 19 ICG pairs in cEVs and matched cancer tissue. All 19
   pairs were detected in tumors and cEVs ([130]Table S13), and 7 of 19
   ICG pairs showed coincident expression patterns (p < 0.01;
   [131]Figure 5C). The expression levels of nine ligands displayed a
   significantly positive correlation with their receptors in cEV profiles
   (p < 0.05; [132]Figure 5C). Overall, four ligand-receptor ICG pairs
   were activated both in cEVs and tissue, including three inhibitory
   pairs (LGALS9-HAVCR2, CD80-CTLA4, and PVR-TIGIT) and one stimulatory
   pair (TNFRSF14-CD160) ([133]Table S14). No significant expression
   correlation was observed between CD274 (which encodes PD-L1) and
   PDCD1LG2 (PD-L2) and their co-receptor PDCD1 (PD1) in cEVs
   ([134]Figure 5C). The inhibitory immune checkpoint receptor HAVCR2
   (TIM-3) and its ligand LGALS9 (GAL9) represented the most significant
   co-expressed targets activated both in cancer tissue and cEVs
   ([135]Figures 5C, [136]S3A, and S3B).

   We next investigated whether these immunogenic targets can be
   noninvasively used as biomarkers to track patient clinical outcomes. We
   identified 52 IM targets that were differentially expressed among the
   three PDAC clusters (Kruskal-Wallis rank-sum test, p < 0.05;
   [137]Figure 4A) and we correlated their expression level with OS in
   PDAC using HRs from univariate Cox regression model. Higher levels of
   34 (65%) IMs indicated an increased risk of death in PDAC (HR > 1),
   whereas 18 (35%) IMs showed the opposite prognostic trend (HR < 1;
   [138]Figure 5D, last track). Notably, HAVCR2 (TIM-3) was significantly
   upregulated in C-III and represented the best survival predictor (p <
   0.05; [139]Figures 5D and [140]S3C). [141]Figure 5E summarizes the risk
   assessment power of all IMs in PDCA from both internal and independent
   cohorts. Seventeen IMs showed similar prognostic patterns between cEVs
   and cancer tissue. Among the 17 IMs, two immunosuppressive targets,
   TIM-3 and VEGFA, were significantly correlated with poor prognosis,
   while one stimulatory target, SELP, was correlated with good prognosis
   ([142]Figures 5D, [143]S3D, and S3E).

Construction of a tissue/cEV prognostic signature for tracking PDAC clinical
outcome

   We next tried to simplify the above stratification strategy. The sample
   sizes of our included internal and external cohorts enable adequate
   power (1-β, more than 0.9) to predict patient survival and prognosis
   (see [144]Materials and methods). The workflow used to identify a
   tissue/cEV prognostic signature for PDAC is shown in [145]Figure 6A. We
   first eliminated exLRs with an expression frequency less than 25% and
   evaluated the differentially expressed genes among the three PDAC
   subtypes from exLR-seq profiles in the internal cohort (Kruskal-Wallis
   rank-sum test, FDR < 0.05; [146]Table S15). We further harvested
   markers with prognostic concordance in blood and lesions; the
   univariate Cox approach was used to investigate the blood-tissue
   prognostic valuable markers in patients with cEV and paired-tissue
   transcriptomic profiles (p < 0.05, see [147]Materials and methods). We
   obtained 986 exLRs, most of which (n = 802) were associated with poor
   prognosis of PDAC (HR > 1); a small number of candidate exLRs (n = 184)
   were indicated as protective prognostic factors (HR < 1) ([148]Table
   S16). GO enrichment analysis revealed that the exLRs were involved in
   regulation of lymphocyte activation and differentiation,
   innate/adaptive immune responses, RNA transport, and activation of
   cancer-related cell signaling ([149]Table S17).

Figure 6.

   [150]Figure 6
   [151]Open in a new tab

   Construction of a tissue/cEV prognostic signature for tracking PDAC
   clinical outcome

   (A) Workflow of prognostic marker selection and prognostic model
   construction. DEG, differentially expressed gene. (B) Characteristics
   of the nine markers with their coefficient and hazard ratios of a
   multivariate Cox model for PDAC prognosis prediction. (C) Comparison of
   cp-scores among the exLR-based PDAC subtypes. (D and E) OS curves of
   PDAC patients were stratified according to high- and low-cp-score in
   the discovery (D) and independent datasets (E). (F) Multivariate Cox
   analysis of clinicopathologic factors and cp-score with OS in the
   entire cohort. Variables with independently prognostic significance are
   highlighted in red. (G) Analysis of the tissue prognostic score
   (tp-score) using the paired tissue of the internal cohort; the number
   of patients in the high-tp-score and low-tp-score groups are n = 31 and
   n = 30, respectively. (H and I) Kaplan-Meier curve shows the OS (H) and
   time to recurrence (I) in patients according to tp-score in the
   TCGA-PDAC cohort. All of the high and low classifications were
   determined for each patient based on the dichotomized p-score.

   We further conducted a variable constriction in the internal exLR-seq
   dataset; the random-forest algorithm and the LASSO-Cox method were used
   to reduce the dimensionality. Nine exLR targets (HAVCR2, ADK, TP53I11,
   TBL3, ACD, LGALS9, CANT1, LYL1, and PKIG) were used to construct a
   model for predicting PDAC prognosis. The immunoregulatory factors
   HAVCR2 and LGALS9, which were identified as receptor-ligand ICGs, were
   also selected as candidate prognostic factors ([152]Figure 6B). Using
   the multivariate Cox algorithm, we established a prognostic model and
   generated an aggregate circulating exLR-based prognostic score for PDAC
   (cp-score; [153]Figure 6B, left). C-III exhibited a high median
   cp-score compared with the other exLR-based clusters (C-I, −0.39; C-II,
   −0.05; CIII, 0.40; Kruskal-Wallis test, p < 0.001). Kaplan-Meier curves
   were constructed for PDAC patients separated into high- and low-risk
   subgroups relative to the optimal cutoff for cp-score (see
   [154]Materials and methods). The median OS in the low-risk group was
   significantly higher than that of the high-risk group in the internal
   and external independent exLR-seq cohorts (p < 0.001, internal cohort,
   n = 227; p = 0.013, external independent cohort, n = 118; log-rank
   test; [155]Figures 6D and 6E). Multivariate Cox regression analysis
   showed that the cp-score was highly associated with the risk of death
   and was an independent prognostic factor of survival in all exLR-seq
   datasets (p < 0.001; [156]Figure 6F).

   We next explored whether the cp-score can be used for evaluation of
   prognosis using resected tissue. Using the same markers, we
   reconstructed a prognostic model and generated a tissue-specific
   p-score (tp-score) from internal paired tissue profiles. The nine
   candidates showed equivalent power for risk estimation in tissue
   consistent with that in cEVs ([157]Figure 6B, right). Patients with a
   high tp-score showed a shortened survival time (p < 0.001, HR = 2.9;
   95% confidence interval [CI], 1.66–5.09; [158]Figure 6G). The
   predictive ability of the tp-score was externally validated in tissue
   transcriptomic profiles from three independent datasets (TCGA-PDAC, p <
   0.01, HR = 2.08; ICGC-PDAC-AU, p < 0.05, HR = 2.99; ICGC-PDAC-CA, p <
   0.05, HR = 1.64; [159]Figures 6H, [160]S4A, and S4B).

   In addition, we evaluated whether the p-score could predict PDAC
   therapeutic effect by assigning the patients into high or low p-score
   groups both in cEVs and tissue datasets. The cp-score predicted tumor
   recurrence for resectable PDAC (stage I/II) cases after surgery (TTR,
   HR, 1.71; 95% CI, 1.02–2.85; p < 0.05; [161]Figure S4C). Consistent
   with these findings, a low tp-score in patients showed significant
   association with prolonged TTR both in the internal (HR, 2.69; 95% CI,
   1.49–4.83; p < 0.001; [162]Figure S4D) and independent tissue cohort
   (HR, 1.94; 95% CI, 1.17–3.22; p < 0.01; TCGA dataset; [163]Figure 6I).
   Collectively, our results suggested that the cEV-derived prognostic
   signature harbor equivalent as that of tumor tissue derived for
   predicting PDAC clinical outcome, including for predicting risk of
   recurrence after surgery.

Discussion

   PDAC is characterized by a high degree of intra-tumoral heterogeneity
   and an immunosuppressive TME, which represent the main obstacles to
   accurate stratification and treatment efficiency.[164]^21^,[165]^22 The
   difficulty in obtaining tumor specimens from most PDAC patients as well
   as the molecular heterogeneity of PDAC tumors make liquid biopsy-based
   biomarker assays crucially important to individualize
   management.[166]^5^,[167]^13^,[168]^23 In this study, we deciphered the
   cEV transcriptomic information using exLR-seq profiles from 345 PDAC
   plasma samples, representing the first study of noninvasive prognostic
   subtyping of PDAC based on large-scale plasma EV transcriptome
   analysis. We comprehensively characterized the heterogeneity of
   tissue-cellular origins and the immunomodulatory landscape from cEV
   profiles, identified novel therapeutic targets, and constructed a nine
   exLR-based prognostic signature for PDAC.

   We identified three PDAC subtypes based on plasma exLR profiles. Each
   subtype has its own molecular features and is related to patient
   prognosis. C-III patients had the worst prognosis and exhibited high
   expression of multiple cancer/immune-related exLRs in plasma. In
   addition, many signaling pathways shared similar concordance both in
   cEVs and cancer tissue, which includes PDAC- and immune-related
   hallmark signatures. Wnt/β-catenin signaling plays a critical role in
   pancreatic carcinogenesis and therapeutic resistance and regulates cell
   cycle progression, apoptosis, epithelial-mesenchymal transition,
   angiogenesis, stemness, and the tumor immune microenvironment.[169]^24
   TGF-β signaling pathways play a central role in the immunosuppressive
   response to cancer.[170]^25 PDAC-derived EVs could transform TGF-β
   signaling relative signatures into Kupffer cells and are involved in
   liver pre-metastatic niche formation and TME re-construction to
   increase PDAC metastatic burden.[171]^9 The crosstalk between TGF-β and
   WNT/β-catenin signaling in EVs promotes PDAC carcinogenesis. Thus,
   targeting these signaling pathways is an actively pursued strategy in
   PDAC treatment.[172]26, [173]27, [174]28

   We previously demonstrated that plasma exLRs derived from pancreatic
   cancer were significantly over-enriched in PDAC cases compared with
   benign samples and healthy controls.[175]^13 Based on the enumerating
   results of our EV-origin approach, the quantification of cEVs from
   specific tissue sources may reflect organ status during disease
   progression. In this study, cEVs in PDAC patients had significantly
   dysregulated tissue/organ sources, such as pancreas and liver, and were
   more likely severely affected by neoplastic invasion from primary and
   metastatic lesions, suggesting that the tissue/solid organ tracing
   strategy of our approach may noninvasively monitor the occurrence and
   development of PDAC in real time. We also observed that the cytotoxic
   immune cell components with a tumor-killing effect in PDAC were
   significantly reduced while immunosuppressive-related components were
   increased, reflecting neoplastic education to the immune system in
   response to carcinogenesis.[176]29, [177]30, [178]31

   PDAC comprises tumor cells within a microenvironment of extracellular
   matrix, stroma cells, endothelial cells, and immune cells.[179]^32 The
   therapeutic limitations of chemotherapy and immunotherapy have been
   attributed to the PDAC-specific TME.[180]^21 We hypothesize that
   analysis of the immune cell source contribution of cEVs could partially
   trace the body’s anti-tumor immune response and the dynamic alternation
   of the TME. A previous study reported that the PDAC-specific TME was
   predominantly affected by immunosuppressive cells, which may lead to
   the dysfunction of infiltrated CD8^+ T cells.[181]^33 These findings
   are consistent with our results and show that although cEV components
   from anti-tumor immune cells (such as terminal effector CD8/CD4^+
   T cells) were enriched in C-III, the patient prognosis and clinical
   outcome were still dismal. At the same time, monocyte-derived
   suppressor components, as well as Treg cells, are partnering factors
   that jointly contribute to local immunosuppression and tumor immune
   escape.[182]^34^,[183]^35 In agreement with this, we found that
   monocyte- and Treg cell-originated EV fractions were significantly
   associated with aggressive phenotype and indicated poor patient
   outcome. In addition, pancreatic neoplasia cells may produce high
   levels of cytokines to recruit Treg cells, which in turn inhibit the
   anti-tumor effects of cytotoxic T lymphocytes.[184]^36 Treg cells
   engage in extended interactions with tumor-infiltrated DCs and restrain
   their immunogenic function by suppressing the expression of
   costimulatory ligands necessary for CD8^+ T cell activation.[185]^37
   Consistent with the results observed in the TME, we found that the
   abundance of EVs derived from Treg cells was positively correlated with
   DCs and CD8 T cells infiltrating in tissues, suggesting that these
   interactions might be relevant to the immunosuppressive function of
   Treg cells. The immunosuppressive cEV components partially explain the
   poor prognosis in C-III. Therefore, exploration of these immune
   cell-derived components can provide a noninvasive and joint approach to
   dissect TME alterations and provide evidence for PDAC treatment.

   Advances in immune checkpoint blockade have brought a renewed
   appreciation of the value for reversing the immunosuppressive
   TME,[186]^38 and recent studies have focused on targeting PD-1/PDL1 and
   CTLA-4 in cancer treatment.[187]^39^,[188]^40 However, it appears that
   other key regulators also suppress the host’s immune system and
   function in immunotherapy resistance in cancers, including PDAC. By
   determining the heterogeneous expression pattern of ICGs within PDAC
   subtypes, we identified inhibitory immune checkpoint receptors,
   including TIM-3 and TIGIT, which were upregulated in cEVs and cancer
   tissue. TIM-3 and TIGIT expressed on the surface of T lymphocytes are
   bound by their ligands, Galectin-9 (encoded by LGALS9) and CD155 (PVR),
   to induce T cell exhaustion and immune escape.[189]^41^,[190]^42
   Several anti-TIM-3 agents have been evaluated in clinical trials for
   solid and hemopoietic malignances.[191]^43 Targeting the TIM-3/LGALS9
   and PVR/TIGIT axis may represent a promising approach for improvement
   of current immune checkpoint therapies.[192]^44^,[193]^45 We confirmed
   that this pattern of co-expression activation of ligand-receptor pairs
   can be detected in cEVs and harbored prognostic significance as in
   tissue. In addition, these suppressive ICGs were strictly enriched in
   the aggressive subsets, which may also explain why most patients with
   advanced PDAC are resistant to immunotherapies.[194]^46

   Our study had several strengths. First, this is the first PDAC
   subtyping study using cEV long RNA profiling. Second, the EV-origin
   deconvolution study has provided the first comprehensive landscape of
   tissue-cellular origins for cEVs in PDAC and suggested that immune
   cell-derived cEV components could reflect the TME heterogeneity, thus
   allowing dynamic monitoring of TME remodeling in a real-time,
   noninvasive fashion. However, this study has some limitations. The
   subtyping study was based on retrospective cohorts, and therefore most
   patients did not have enough gene mutation and cancer tissue-based
   molecular information to annotate exLR-based molecular subtypes.
   Moreover, the tumor samples matched with plasma EVs included in our
   study were derived from early resectable PDAC tumor tissue, which may
   insufficiently explain the characteristic correlation in advanced PDAC.

Conclusions

   Our findings demonstrated the potential of circulating exLR for PDAC
   subtyping. Based on a deconvolution study and paired cancer tissue
   profiling, we identified a profound correlation of the immune landscape
   between tumors and cEVs and identified novel PDAC therapeutic targets.
   Importantly, we constructed a robust nine exLR-based tissue-applicable
   signature for prognostication assessment of PDAC. Therefore, cEV long
   RNA-based subtyping and deconvolution enable noninvasive prediction of
   immunogenic signatures and clinical outcomes for PDAC.

Materials and methods

Study design

   This study employed a two-phase design. First, the subtyping,
   deconvoluting, and biomarker discovery phase was conducted, where we
   comprehensively analyzed the circulating exLR-seq transcriptomic data
   from patients with PDAC in the internal cohort for the identification
   of exLR-based PDAC phenotypes and clinical significance. We
   deconvoluted the tissue-cellular origin of cEVs between healthy
   controls and PDAC samples and investigated the potential clinical
   efficacy of immunogenic components at tissue and circulatory levels. In
   the second phase, we identified a clinically translatable signature
   from cEVs that predicts patient survival and tumor relapse. We
   performed validation of the discovered biomarkers both in cEVs and
   tissue profiles from multiple clinical cohorts of patients with PDAC.

Patient cohorts, sample characteristics, and data generation

   A total of 426 individuals were enrolled in this study, including 345
   patients with pathologically confirmed PDAC and 81 healthy controls
   receiving routine healthcare. Among the 345 PDAC cases, 227 plasma
   specimens were collected from Fudan University Shanghai Cancer Center
   (Shanghai, China) (as an internal cohort) between January 31, 2012 and
   June 6, 2017, and 118 PDAC cases (as an external independent cohort)
   were recruited from Shanghai Hospital of the Second Military Medical
   University (Shanghai, China; n = 100) and Xi’an Jiaotong University
   Affiliated Medical Center (Xi’an, China; n = 18) between September 11,
   2018 and April 2, 2019. Plasma was collected from PDAC patients before
   surgery for patients with resectable tumors and before chemotherapy for
   patients with inoperable tumors. The available plasma EV transcriptomic
   profiles originated from our previously published study[195]^13 and can
   be downloaded from the Gene Expression Omnibus (GEO) dataset with the
   access no. GEO: [196]GSE133684. For tissue specimen collection, we
   obtained 61 formalin-fixed, paraffin-embedded exLR-seq-matched PDAC
   tumor tissues from the internal cohort. The transcriptomic expression
   profiles of plasma EVs and matched tissue specimens from 61 PDAC
   patients from the internal cohort with OS and TTR information were
   determined. We downloaded tissue transcriptomic profiles from three
   external PDAC datasets (TCGA-PDAC, n = 150; ICGC-PDAC-CA, n = 131;
   ICGC-PDAC-AU, n = 73) to further validate the candidate exLR signature
   at the tissue level. The demographics and clinicopathologic
   characteristics for all patient cohorts are shown in [197]Table S1.

   We isolated plasma EVs by affinity-based binding to a spin column using
   the exoRNeasy serum/plasma kit (QIAGEN, Hilden, Germany). A SMARTer
   stranded total RNA-seq kit (Clontech, Palo Alto, CA, USA) was used to
   prepare the strand-specific RNA-seq libraries. The library quality of
   the 10 plasma exLR-seq samples per lane was sequenced with adding 20%
   PhiX to control base balance. The previously published Assembling
   Splice Junctions Analysis (ASJA) approach was used to conduct quality
   control, reads mapping, and transcripts quantification.[198]^47 The raw
   count matrixes were transformed into transcripts per kilobase million
   (TPM) values with gencodeV29 annotation to make gene expression
   profiles more comparable between cEVs and paired tissues. The
   expression of a circRNA gene for each exLR-seq sample was calculated as
   spliced reads per billion mapping (SRPBM = number of circular
   reads/total mapped reads [units in billion]). For the external
   TCGA-PDAC cohort, the RNA sequencing data and updated
   clinicopathological information were downloaded from UCSCXenaTools
   package ([199]https://github.com/cran/UCSCXenaTools).

Digital quantification of tissue-cellular-originated cEVs and TILs

   We used the EV-origin approach to quantify the abundance of EV
   fractions from blood and tissue origins from plasma exLR expression
   data.[200]^16 The EV-origin method was constructed from two
   representative signature matrices and a core SVM regression algorithm
   to deconvolute 7 hemopoietic cells and 16 tissue components from
   exLR-seq transcriptomic profiles. In this study, we expanded the number
   of traced hemopoietic fractions to 23, including 21 types of
   immunocytes and 2 hematopoietic stem cell constituents (see [201]Table
   S11). The new hemopoietic signature matrix was constructed by the
   normalized expression profiles of isolated blood cells from two
   datasets.[202]^16^,[203]^48 We selected cell-specific genes and
   re-built the comparable blood signature matrix by the same method as
   EV-origin. The relative proportions and absolute fractions of 23
   hemopoietic cells and 16 types of tissue/solid organ-derived cEV
   constituents were detected in each exLR-seq sample. To determine the
   abundance of infiltrating immune cells in PDAC tissue, we used a
   single-sample GSEA (ssGSEA) integrated with TIL gene
   signatures,[204]^49^,[205]^50 which allows for reliably discriminating
   the enrichment of 17 human immune cell phenotypes from tissue
   expression profiles.

exLR-based PDAC subtyping

   We used the ConsensusClusterPlus package to select the optimal cluster
   number and determined the clustering stability for the internal
   cohort.[206]^51 To capture stabilized exLR-based subtypes of PDAC, we
   first removed the exLRs with expression frequency no more than 50% to
   reduce the influence of low-frequency noises. Then we used a median
   absolute deviation (MAD) algorithm (a robust measure of sample bias for
   univariate numerical data) to fully characterized the exLR expressional
   viability in the internal exLR-seq cohort. We finally harvested 8,922
   exLR targets (top 60% targets of MAD quantitative ranking) as input
   markers for consensus clustering. These exLR-seq samples were grouped
   using hierarchical agglomerative clustering based on Manhattan distance
   and the strategy of partition around medoids (PAM). We selected the
   iteration number as 500 and the subsampling ratio as 0.8 to identify
   optimal clusters from a consensus similarity matrix generated by
   consensus clustering. Three completely segregated subtypes from the
   exLR-seq data in the internal cohort were obtained. These discovered
   clusters were used as representative subpopulations for further
   analysis.

Building a predictive model for PDAC clinical outcome

   On the basis of the exLR-seq analysis, we developed a combined
   circulating prognostic score (cp-score) for noninvasively estimating
   PDAC prognosis. We used the 227 cases in the internal cohort for
   discovery and 118 cases in the external independent cohort for
   validation. In the exLR-seq data from the internal cohort, we removed
   exLRs with a frequency less than 25% to obtain reliably expressed
   variables. A Kruskal-Wallis test was used to assess the differential
   expression of exLRs among the three clusters, and the p value of each
   marker was adjusted by the Benjamini-Hochberg method to control the
   FDR.[207]^52 We retained exLRs with an FDR less than 0.05 and carried
   out a consecutive model-based variable selection approach to identify
   prognostic valuable markers for predicting survival outcome.[208]^53 We
   first performed a univariate pre-screening procedure to remove
   excessive noise and obtain cancer-related targets to facilitate the
   subsequent computational analysis. We used whole-transcriptome markers
   as covariates, and the univariate Cox proportional hazards model was
   fitted to explore prognostic targets both in cEVs and paired tumor
   tissue profiles. Only markers with a p value <0.05 from the Wald
   statistic were retained. Overall, we obtained 986 exLRs that
   concurrently showed prognostic concordance in blood and tissue, in
   which the expression level indicates the same prognostic risk both in
   patient cEVs and tissues. We then used LASSO-Cox and the random forest
   algorithm to further shrink the marker number to a reasonable
   range.[209]^54^,[210]^55 For the LASSO-Cox constraint, 80% of exLR-seq
   samples from the training cohort were randomly subsampled to conduct
   LASSO regularization in 1,000 repetitions. We used 10-fold
   cross-validation and the Akaike information criterion to select an
   optimal value of “1-se” lambda parameter to construct an adaptive
   prognostic model for marker selection. The Boruta algorithm (a wrapper
   built around the random forest classification algorithm) was used to
   concurrently conduct dimension reduction and screen prognostically
   significant exLRs. This process generated nine final markers for a
   prognostic signature in the internal and validation cohorts. By fitting
   a multivariable Cox proportional hazards model on these nine targets
   from the internal EVs and tissue dataset, we determined the
   coefficients of each marker and generated two combined prognostic
   scores (p-score, designated as the cp-score for blood EVs and the
   tp-score for PDAC tumor tissue) for each individual. We then validated
   the significance of the prognostic scores in the independent datasets
   for the assessment of clinical outcome of PDAC.

Functional and pathway enrichment analysis

   Weighted gene co-expression network analysis (WGCNA) was performed to
   construct co-expressed networks and identify co-expression of exLR
   modules in the exLR-seq data from the internal cohort.[211]^56 We used
   median absolute deviation to estimate the degree of variation for exLR
   expression and applied WGCNA based on the exLR-seq expression data
   constructed by the top 5,000 aberrantly expressed exLRs. We first
   created a matrix of adjacencies by conducting the Pearson correlation
   analysis between gene pairs to determine the concordances of gene
   expression; this matrix was then transformed into a topological overlap
   matrix. The resulting topological overlap is a biologically meaningful
   measure of gene similarity based on co-expression relationships for the
   comparison of each gene pair. To identify the relatively balanced scale
   independence and mean connectivity of the WGCNA, we carefully selected
   the scale-free topology fit index to find an optimal soft-thresholding
   power. As shown in [212]Figure S2A, the power value 16 was used to
   produce a hierarchical clustering dendrogram of 5,000 variable exLR
   genes. The module eigengenes were determined as the primary key
   components computed by principal-component analysis that represents the
   manifestation of exLRs of a functional module into a unitary
   characteristic quantitative value. The manifestation configurations of
   exLR modules related to the traits of internal individuals were
   quantified by gene significance (GS) and module significance (MS). The
   GS measure was defined as the results of the Spearman correlation
   analysis among the ith gene profile xi and the sample trait T, and MS
   was defined as the average GS for all genes in the module:
   [MATH: <mrow><mi>G</mi><mi>S</mi><mi>i</mi><mtext> </mtext><mo
   linebreak="badbreak">=</mo><mrow><mo>|</mo><mrow><mi>c</mi><mi>o</mi><m
   i>r</mi><mrow><mo>(</mo><mrow><mi>x</mi><mi>i</mi><mo>,</mo><mspace
   width="0.25em"></mspace><mi>T</mi></mrow><mo>)</mo></mrow></mrow><mo>|<
   /mo></mrow><mo>.</mo></mrow> :MATH]

   The exLR genes were separated into eight co-expression modules
   according to their degree of connectivity. Each subunit was
   autonomously authenticated with another, which revealed a high level of
   individuality among the modules and the comparative independence of
   gene manifestation in each module. The exLRs that could not be
   contained in any modules were placed into module 8, which was removed
   in the subsequent GO analysis. All modules revealed a high correlation
   with PDAC phenotypes (C-I and C-III) and stage ([213]Figure 2A). We
   represented seven modules (M1–M7) as the most significant modules to
   reflect cEV inner transcriptomic heterogeneity among PDAC clusters.
   Gene annotation enrichment results were estimated from the
   clusterProfiler package[214]^57 for each module-specific exLR gene. We
   used a strict cutoff of FDR <0.05 to identify GO terms that were
   significantly enriched in each module ([215]Tables S2–S8). We
   identified the aberrantly enriched signaling pathways among PDAC
   subtypes by running a pre-ranked GSEA of the expression data for exLR
   genes. Gene sets were downloaded from the MSigDB platform of the Broad
   Institute that includes current annotations of Hallmark, Reactome, and
   KEGG datasets (Molecular Signatures Database v7.2).[216]^58^,[217]^59
   The p value calculations were based on 5,000 permutations of GSEA, and
   the multiple testing was subsequently adjusted by Benjamini-Hochberg
   method to regulate the FDR.[218]^52

Statistical analysis

   We used the Shapiro-Wilk normality test to examine the normality of
   quantitative variables. For the comparison of variables in two groups,
   the statistical significance for normally distributed variates was
   evaluated by a two-side Student’s t tests, and the group with
   non-normal distributions was tested by a Mann-Whitney U test (Wilcoxon
   rank-sum test). For the comparison of unreplicated data in more than
   two groups, the Kruskal-Wallis test and Friedman rank-sum test were
   implemented as non-parametric statistical approaches. We applied the
   Benjamini-Hochberg strategy to adjust the p value of multiple DEG
   analysis to control the FDR.[219]^52 The logistic regression was used
   to determine the statistical significance of the correlation between
   qualitative variables and PDAC subtypes. The optimal cutoff values of
   each dataset were calculated based on the correlation between patient
   survival data and continuous variables in each separate dataset using
   the survminer package. The Kaplan-Meier approach was used to construct
   cumulative survival curves for subgroups of each cohort, and the
   log-rank test was implemented to determine the statistical difference
   between groups. We used the univariate Cox proportional hazards
   regression model to analyze HRs for univariate prognostic analysis. We
   hypothesized that the power of prognostic evaluation could be more than
   0.9 with the introduction of exLRs, EV-derived components, and
   p-signature. With a two-side alpha (significance level) at 0.05, an
   overall event rate set at 0.93 for the internal cohort and 0.72 for the
   independent cohort, and an HR at 1.5, the included sample size was
   sufficient in the internal and validation cohorts to ensure a power of
   more than 95% for predicting PDAC prognosis (according to the PASS 11.0
   software of power [1−β] calculation for the Cox proportional hazards
   regression analysis with covariates).[220]^60 The prognostic
   independence of the p-score was determined by the multivariate Cox
   regression model. All statistical analyses were conducted by R program
   (v3.6.1, [221]https://www.r-project.org/). The p values were two-sided,
   and a p value less than 0.05 was considered statistically significant.

Ethical approval and consent to participate

   This study was approved by the Ethics Committee of the Fudan University
   Shanghai Cancer Center, Shanghai, China (approval no. 050432-4-1212B),
   and written informed consent was obtained from each participant in
   accordance with institutional guidelines.

Consent for publication

   All of the authors have read and approved the final manuscript for
   publication.

Availability of supporting data

   The transcriptomic data have been deposited at the GEO. The raw
   sequencing reads and integrated count matrix can be fully accessed
   under GEO: [222]GSE133684 and [223]GSE181273.

Acknowledgments