Abstract
Tumor tissue collections are used to uncover pathways associated with
disease outcomes that can also serve as targets for cancer treatment,
ideally by comparing the molecular properties of cancer tissues to
matching normal tissues. The quality of such collections determines the
value of the data and information generated from their analyses
including expression and modifications of nucleic acids and proteins.
These biomolecules are dysregulated upon ischemia and decompose once
the living cells start to decay into inanimate matter. Therefore,
ischemia time before final tissue preservation is the most important
determinant of the quality of a tissue collection. Here we show the
impact of ischemia time on tumor and matching adjacent normal tissue
samples for mRNAs in 1664, proteins in 1818, and phosphosites in 1800
cases (tumor and matching normal samples) of four solid tumor types
(CRC, HCC, LUAD, and LUSC NSCLC subtypes). In CRC, ischemia times
exceeding 15 min impacted 12.5% (mRNA), 25% (protein), and 50%
(phosphosites) of differentially expressed molecules in tumor versus
normal tissues. This hypoxia- and decay-induced dysregulation increased
with longer ischemia times and was observed across tumor types.
Interestingly, the proteomics analysis revealed that specimen ischemia
time above 15 min is mostly associated with a dysregulation of proteins
in the immune-response pathway and less so with metabolic processes. We
conclude that ischemia time is a crucial quality parameter for tissue
collections used for target discovery and validation in cancer
research.
Subject terms: Cancer, Translational research
Introduction
One of the major concepts in identifying functional proteins as
oncogenic targets is the detection of alterations that lead to aberrant
signaling in cancer-relevant pathways. A core principle of
onco-pharmacological targeting approach is detecting and applying
therapeutics that antagonize oncogenic protein function and thereby
significantly improve overall survival [[44]1]. An important strategy
to systematically identify new target proteins for cancer treatment is
to build a cancer registry combining tissue and metadata collection, a
strategy pursued, for example, by the TCGA consortium
([45]https://www.cancer.gov/tcga). To this end, cancer and ideally also
matching normal adjacent tissue are collected and analyzed to compare
the molecular characteristics of the two tissue types. The two
parameters which have the strongest impact on the quality of tissue
registries are the asservation method and the so-called cold ischemia
time (in short ‘ischemia time’), which denotes the time it takes to
freeze the tissue for permanent storage after its removal from the body
during surgery.
There is an ongoing debate about the consequences of ischemia on the
quality of biomolecules, especially in the context of characterizing
diverse molecular entities of tissues or cells (so-called ‘omics’). The
corresponding findings vary widely [[46]2–[47]4] in part due to
insufficient numbers of samples and lack of annotation for ischemia
time leading to an insufficient power to detect the impact of ischemia
time on the molecular composition of the materials. Hence, the impact
of ischemia time on the molecular characteristics of the tissue under
investigation remains unclear. Here, we use a large set of samples
annotated for ischemia time to investigate the impact on candidate
target discovery.
We focus on the impact of ischemia time on differential expression of
mRNAs, proteins and phosphoproteins using a carefully curated
collection of patient-derived fresh-frozen tissue samples and related
multiomic data. This database contains the breadth of molecular
characteristics of tumor and normal adjacent tissues from multiple
cancer types. Specimens were analyzed at the genomic, transcriptomic
(TRX), proteomic (PTX) and phosphoproteomic (PPX) level with focus on
colon cancer (CRC). Analyses were expanded to hepatocellular carcinoma
(HCC) and non-small cell lung epithelial cancer (NSCLC), covering lung
adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC).
To generate a baseline in our studies we set an initial filter to
capture biomolecules differentially expressed in the group of samples
with the shortest ischemia time available (<10 min.). We then analyze
the changes in expression of these biomolecules over time with an
emphasis on the impact of ischemia time on the target identification
process, rather than trying to model the tissue decay under ischemia.
We found that DNA sequences are to a large extent unaffected even by
the longest ischemia times of samples collected. In contrast, the
relationship between tumor and normal tissue mRNA, protein and
phosphoprotein expression is affected at increasing severity (mRNA <
protein < phosphoproteins). Based on the analyses we propose an
ischemia time cut-off threshold at 12 min that enables a sufficient
amount of tissues to be collected while avoiding a dilution of signals
for biomolecules of interest.
Results
Cancer samples and differential expression
Datasets and samples used in the analysis are shown in Table [48]1,
tumor stages are shown in Table [49]2. Details on the samples can be
found in the Methods section.
Table 1.
Number of tumor samples per ischemia time interval (min.) with specific
omic data.
Cohort T[1]: t < 10 T[2]: 10 ≤ t ≤ 12 T[3]: 13 ≤ t ≤ 15 T[4]: 16 ≤
t ≤ 18 T[5]: 19 ≤ t ≤ 20 t > 20 Total
TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX
TRX PTX PPX
CRC 181 188 188 223 223 217 111 102 101 44 47 47 18 19 19 36 54 54 613
633 626
HCC 48 41 41 35 36 36 11 9 9 15 15 16 6 6 6 32 38 38 147 145 146
LUAD 215 258 255 119 152 153 90 95 94 51 50 48 17 26 26 35 44 44 527
625 620
LUSC 134 157 153 95 103 102 56 58 58 40 37 35 20 23 23 32 37 37 377 415
408
[50]Open in a new tab
Table 2.
Number of tumor samples per cancer stage group with specific omic data.
I II III IV NA Total
Cohort TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX
PTX PPX
CRC 65 60 60 229 251 248 195 201 199 121 118 116 3 3 3 613 633 626
HCC 31 33 34 18 22 22 18 17 17 7 7 7 73 66 66 147 145 146
LUAD 232 293 291 129 145 144 133 143 141 24 33 33 9 11 11 527 625 620
LUSC 126 151 147 125 130 130 116 119 116 6 8 8 4 7 7 377 415 408
[51]Open in a new tab
We defined the ischemia time reference group as the group of samples
with ischemia times less than 10 min, the shortest time possible for
the collection and storage of a sufficient number of samples. To obtain
differential biomolecule expression for mRNAs, proteins and
phosphoproteins for the shortest ischemia time group, we normalized the
expression data and selected the differentially expressed biomolecules
using non-parametric Wilcoxon tests for paired samples to take into
account the non-normal distribution of the data [[52]5] as described in
the Methods part. We adjusted the resulting P-values for multiple
testing and selected the biomolecule sets for mRNA, protein, and
phosphoprotein using an α[fdr] level of 0.01 and an effect boundary
using the 5th and 95th percentiles of the effect distribution. For the
CRC cohort this yields 1948 differentially expressed mRNAs, 794
proteins, and 1846 phosphosites. Analogously, we identified 1870
differentially expressed mRNAs, 523 proteins, and 388 phosphosites in
the HCC cohort. In the LUAD cohort, we found 1951 differentially
expressed mRNAs, 805 proteins and 2217 phosphosites, and in the LUSC
cohort, we found 1950 differentially expressed mRNAs, 798 proteins and
1919 phosphosites on which we focussed in the subsequent analyses.
Differential expression over time in CRC
We performed a detailed analysis of differential (tumor versus normal
adjacent tissue) expression over time in CRC, and then applied the most
relevant analyses to the other cancer types listed in Table [53]1.
To evaluate whether DNA sequences remain unaffected by ischemia times
as reported by others [[54]6], we analyzed protein sequences for
affecting mutations (PAM), the presence or absence of gene deletions or
amplification as well as gene truncations by comparing the shortest
available ischemia time tissue group with the longer ischemia time
groups for each genomic locus (see Methods). At α[fdr] = 0.01, only
0.09% of the proteins showed a signal in the PAM submodality, and no
change was found in the other DNA-derived submodalities. This is likely
based on random somatic difference in the genomic sequence of the
patients of the various time groups (see section “Discussion”).
To obtain an overview of the effect of ischemia time on the selected
biomolecules, we initially partitioned the samples into groups of T[1]:
t < 10, T[2]: 10 ≤ t ≤ 14, T[3]: 15 ≤ t ≤ 19, T[4]: 20 ≤ t ≤ 24, T[5]:
t ≥ 25 min of ischemia time. We then used hierarchical clustering to
assess the changes in biomolecule expression for the three omic
modalities as described in “Material and methods” section. Figure [55]1
shows the results for the phosphoprotein modality for 10 clusters (the
corresponding plots for the other two expression modalities are shown
in Supplementary Figure [56]S1).
Fig. 1. Differential expression of the selected phosphosites in 5-min
intervals (abscissa) grouped into ten clusters (ordinate) in CRC.
Fig. 1
[57]Open in a new tab
Colors indicate log[2]-fold mean expression differences
(phosphosite-wise standardized) between tumor and normal tissue. Red:
upregulation in tumor, blue: downregulation.
The effect magnitude and significance seem to increase for times over
20 min. To confirm this impression, we tested for differences in
differential expression over time in each modality using a 2 × 5
factorial ANOVA-like design with the tissue types (tumor and normal) as
first factor and the time groups T[1] … T[5] as second factor using the
non-parametric Scheirer–Ray–Hare test also assessing interaction
effects between the two main effects. Table [58]3A shows the
phosphosites per cluster with significant tissue, time or interaction
effects. The three effects on display are the two main effects of
tissue and time difference and the interaction effect of the two
variables.
Table 3.
Alterations in phosphosite expression over time across tissue types.
A: Long duration ischemia time groups B: Refined ischemia time groups
Significant (α[fdr] = 0.05) effect for Significant (α[fdr] = 0.05)
effect for
Cluster Tissue Time Interaction Cluster Tissue Time Interaction
1a 147 45 0 1b 115 16 0
2a 169 63 3 2b 293 51 2
3a 177 61 6 3b 283 35 2
4a 369 129 2 4b 95 11 0
5a 317 101 12 5b 136 25 0
6a 188 68 8 6b 89 10 0
7a 54 11 2 7b 442 80 1
8a 52 16 0 8b 189 21 2
9a 97 35 0 9b 74 15 0
10a 242 76 6 10b 105 20 2
Totals 1812 605 39 Totals 1821 284 9
[59]Open in a new tab
Table [60]3 confirms that across all clusters (note that the clusters
in both panels do not contain identical phosphosites since the data
vectors per phosphosite differ between the panels) of the original time
groups (panel A), 33% (605/1814) of the phosphosites display a
significant time effect. We therefore repartitioned the available
samples into the shortest ischemia group (samples with ischemia time
shorter than 10 min), and into further groups of 3 min intervals from
t ≥ 10 to 20 min (T[1] … T[5]) as shown in Table [61]1 in order to
quantify which ischemia time point to use as cut-off (we call these
refined time groups). With these refined groups (Table [62]3B) which
exclude longer ischemia times, the time effect halves to 16% (284/1823)
of the phosphosites. Note that the differing denominator between the
two groupings is due to reassigned samples and a subsequently altered
pattern of missing values. There are also fewer interaction effects
when excluding longer ischemia times. The trends for mRNA and protein
modalities are comparable, though less pronounced (see Supplementary
Fig. [63]S1 and Supplementary Tables [64]S1 and [65]S2). But even
though the effect is least pronounced in mRNA, to perform valid
multiomics analyses, we need an ischemia time limit that is the same
for all modalities. Thus, we aim for a limit that is optimal for the
most sensitive type of molecule (phosphate groups), which must be under
20 min.
Before investigating differential biomolecule expression, we analyzed
the distribution of the underlying expression levels in normal and
cancer tissue samples of the groups T[1] … T[5]. We specifically
focused on biomolecules with extreme expression values which could play
an important role in cancer. We regarded biomolecules with expression
values outside the (Q[2.5], Q[97.5]) interval as extremely expressed.
We determined the relative loss of such extreme biomolecules of
reference group T[1] in other ischemia time groups. In detail, we
counted how many biomolecules with extreme expression values outside
(Q[2.5], Q[97.5]) of T[1] were not detected outside the (Q[2.5],
Q[97.5]) intervals of the other groups T[2] … T[5] any more. For
example for T[1] and T[2], this relative loss is
[MATH: ℒ=|δ(T1)\δ(T2)||δ(T1)|⋅100
:MATH]
where δ computes the biomolecules outside the interval (Q[2.5],
Q[97.5]) and | · · · | gives the set size. Table [66]4 shows the loss
rates for tumor and normal tissues for the time group average. The
vanishing biomolecules fall from outside (Q[2.5], Q[97.5]) into the
interval in the time groups T[2] … T[5].
Table 4.
Loss L of biomolecules outside the (Q[2.5], Q[97.5]) interval.
Modality Tissue Biomolecule loss [%] vs. T[1]
T[2] T[3] T[4] T[5]
mRNA Normal 27 32 39 21
Tumor 23 28 45 27
Protein Normal 24 25 36 56
Tumor 20 23 27 42
Phosphoprotein Normal 49 64 72 94
Tumor 46 60 64 81
[67]Open in a new tab
The loss of biomolecule sets in group T[2] is striking for all types of
analyzed biomolecules, but especially for phosphosites. Notably, the
loss of the most highly expressed mRNAs is slower than for proteins and
even more so than for phosphoproteins, where almost 50% of the most
highly expressed sites are lost in a short interval time between 10 and
12 min of ischemia. In mRNA, there seems to be a recovery of the
initial expression pattern after the longest ischemia duration, an
artifact that may be explained by the degradation and altered detection
of the molecules. This has to be taken into account when looking at
differential biomolecule expression.
Temporal patterns of refined time series
To evaluate the influence of ischemia on differential expression
patterns, we next analysed the shorter interval groups T[1] … T[5]
using a Dirichlet process Gaussian process mixture model (DPGP, details
see “Materials and methods” section). This non-parametric time series
analysis technique jointly models data clusters with a Dirichlet
process and temporal dependencies with a Gaussian process. Note that
this is a pseudo-time series.
Figure [68]2 shows the eleven clusters obtained in CRC for the protein
modality, which we selected out of the three available expression
modalities because it allows for a deeper biological interpretation
(analogous time series analyses were generated for each modality and
tumor type but we did not gain additional biological insights).
Fig. 2. Differential expression of the selected proteins in refined time
groups (T[1]… T[5] as 3-min intervals on the abscissa) grouped into 11
clusters.
[69]Fig. 2
[70]Open in a new tab
The ordinates of each cluster show the normalized differential
expression. The blue line shows the cluster mean expression. The red
lines indicate the individual protein expression levels. The shaded
blue area indicates the cluster mean ±2 · σ.
We expect various patterns of protein expression variance due to
ischemia with proteins losing or gaining abundance of expression due to
the effect of the ischemia.
Biological cluster interpretation
To interpret the biological meaning of the protein expression patterns,
we used a Wilcoxon test to identify proteins with differential ex-
pression from one time group to another (µ[T2] − µ[T1],…,
[MATH:
μT<
mi>k+1 :MATH]
−
[MATH:
μT<
mi>k :MATH]
, k = 2 … 4) excluding effects inside the interval (2^−1/2, 2^1/2).
We then mapped the genes related to these differentially expressed
proteins to KEGG ([71]https://www.genome.jp/kegg/mapper/search.html)
and performed a pathway enrichment analysis. We could not find
cluster-specific relevant biological patterns, maybe because ischemia
regulation cannot be revealed by DPGP-clustering on our pseudo-time
series, but we still found interesting overall patterns. The analysis
revealed acute inflammatory response and metabolism as the most
significantly up- and downregulated pathways, respectively. Table [72]5
shows the number of significantly differentially expressed proteins per
pathway.
Table 5.
Pathways with differential protein expression T[k+1] − T[k].
Pathways Number of proteins
Upregulated Downregulated
Immune response 9 2
Metabolic processes 5 8
Regulation of cell signaling 7 2
Transport of molecules 2 3
Cell structure and adhesion 1 2
Total 24 17
[73]Open in a new tab
The data show an upregulation of proteins which are involved in
immune-response but also in regulation of various cellular pathways and
metabolism. Specifically, proteins involved in lipid metabolism are
upregulated, namely ACSL4, FMO5, MOGAT2 and SULT2B1. That is most
likely to supply for the high energy requirements induced by oxidative
stress. Interestingly, 20% of the proteins in Table [74]5 are already
deregulated between subsequent time points below 16 min (group T[2] and
T[3]), indicating that alterations can occur rapidly under ischemic
conditions. All of the deregulated proteins and their expression
profiles are listed in Supplementary Table [75]S3. There is almost no
differential expression effect in this group based on a tissue effect.
These pathways reveal a coping mechanism involving a decrease in
expression of proteins that are either not survival-critical under
metabolic stress induced by ischemia (such as proteins involved in drug
metabolism like UGT2B17 or UGT1A8) or those that are energy intensive
for the cell (ribosome biogenesis). Ischemia very interestingly appears
to slowly breach through this coping mechanism by causing a decrease in
tumor survival-critical proteins such as NUDT1, ELOVL5, HPGDS or DZIP3
[[76]7–[77]10].
Confounder analysis
We next analyzed the influence of ischemia time on differential
biomolecule expression compared to other independent variables of known
clinical importance by performing a classical confounder analysis using
multiple linear regression. We computed a multi-variate linear model
with a Gaussian error distribution at the biomolecule level for each
omic type and each of the differentially expressed biomolecule sets
obtained from the initial selection step described in subsection
“Cancer samples and differential expression”.
The variation of the fold-change between tumor and normal tissue
expression was modeled as dependent variable to be explainable by the
independent variables (details see section “Material and methods“).
This analysis is only a rough approximation because at the biomolecule
level both statistical assumptions for Gaussian linear models of normal
error distribution and linear variable relations are not fully met as
shown by statistical testing (not shown). Nevertheless, the analysis
revealed informative trends when we determined the distributions of
variable effect estimates for biomolecules with any statistically
significant (p < 0.01) variable. The corresponding plots of regression
coefficient estimate means, split by positive and negative effects, for
the CRC cohort are shown in Fig. [78]3. Note that variables which did
not have any significant effect on any biomolecule are not shown in the
confounder plots (details see section “Material and methods”).
Fig. 3. Mean regression coefficient estimates and standard deviation of
important predictors (linear model T-statistic with p < 0.01) for the
differentially expressed biomolecules in CRC.
Fig. 3
[79]Open in a new tab
Numbers related to each variable denote the number of biomolecules with
any significant positive (red) or negative (blue) effect. For
categorical variables, the reference variables of the linear model are
indicated in section 4. For example, T[1] is the reference group for
the other ischemia times. A Transcriptome, B Proteome, C
Phosphoproteome.
The figure shows that for CRC, disregarding the ischemia time variable
effects (for which T[1] is the reference variable), the tumor grade and
stage as well as the alcohol consumption status of the patients
relative to their respective references are the strongest predictors of