Abstract Tumor tissue collections are used to uncover pathways associated with disease outcomes that can also serve as targets for cancer treatment, ideally by comparing the molecular properties of cancer tissues to matching normal tissues. The quality of such collections determines the value of the data and information generated from their analyses including expression and modifications of nucleic acids and proteins. These biomolecules are dysregulated upon ischemia and decompose once the living cells start to decay into inanimate matter. Therefore, ischemia time before final tissue preservation is the most important determinant of the quality of a tissue collection. Here we show the impact of ischemia time on tumor and matching adjacent normal tissue samples for mRNAs in 1664, proteins in 1818, and phosphosites in 1800 cases (tumor and matching normal samples) of four solid tumor types (CRC, HCC, LUAD, and LUSC NSCLC subtypes). In CRC, ischemia times exceeding 15 min impacted 12.5% (mRNA), 25% (protein), and 50% (phosphosites) of differentially expressed molecules in tumor versus normal tissues. This hypoxia- and decay-induced dysregulation increased with longer ischemia times and was observed across tumor types. Interestingly, the proteomics analysis revealed that specimen ischemia time above 15 min is mostly associated with a dysregulation of proteins in the immune-response pathway and less so with metabolic processes. We conclude that ischemia time is a crucial quality parameter for tissue collections used for target discovery and validation in cancer research. Subject terms: Cancer, Translational research Introduction One of the major concepts in identifying functional proteins as oncogenic targets is the detection of alterations that lead to aberrant signaling in cancer-relevant pathways. A core principle of onco-pharmacological targeting approach is detecting and applying therapeutics that antagonize oncogenic protein function and thereby significantly improve overall survival [[44]1]. An important strategy to systematically identify new target proteins for cancer treatment is to build a cancer registry combining tissue and metadata collection, a strategy pursued, for example, by the TCGA consortium ([45]https://www.cancer.gov/tcga). To this end, cancer and ideally also matching normal adjacent tissue are collected and analyzed to compare the molecular characteristics of the two tissue types. The two parameters which have the strongest impact on the quality of tissue registries are the asservation method and the so-called cold ischemia time (in short ‘ischemia time’), which denotes the time it takes to freeze the tissue for permanent storage after its removal from the body during surgery. There is an ongoing debate about the consequences of ischemia on the quality of biomolecules, especially in the context of characterizing diverse molecular entities of tissues or cells (so-called ‘omics’). The corresponding findings vary widely [[46]2–[47]4] in part due to insufficient numbers of samples and lack of annotation for ischemia time leading to an insufficient power to detect the impact of ischemia time on the molecular composition of the materials. Hence, the impact of ischemia time on the molecular characteristics of the tissue under investigation remains unclear. Here, we use a large set of samples annotated for ischemia time to investigate the impact on candidate target discovery. We focus on the impact of ischemia time on differential expression of mRNAs, proteins and phosphoproteins using a carefully curated collection of patient-derived fresh-frozen tissue samples and related multiomic data. This database contains the breadth of molecular characteristics of tumor and normal adjacent tissues from multiple cancer types. Specimens were analyzed at the genomic, transcriptomic (TRX), proteomic (PTX) and phosphoproteomic (PPX) level with focus on colon cancer (CRC). Analyses were expanded to hepatocellular carcinoma (HCC) and non-small cell lung epithelial cancer (NSCLC), covering lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). To generate a baseline in our studies we set an initial filter to capture biomolecules differentially expressed in the group of samples with the shortest ischemia time available (<10 min.). We then analyze the changes in expression of these biomolecules over time with an emphasis on the impact of ischemia time on the target identification process, rather than trying to model the tissue decay under ischemia. We found that DNA sequences are to a large extent unaffected even by the longest ischemia times of samples collected. In contrast, the relationship between tumor and normal tissue mRNA, protein and phosphoprotein expression is affected at increasing severity (mRNA < protein < phosphoproteins). Based on the analyses we propose an ischemia time cut-off threshold at 12 min that enables a sufficient amount of tissues to be collected while avoiding a dilution of signals for biomolecules of interest. Results Cancer samples and differential expression Datasets and samples used in the analysis are shown in Table [48]1, tumor stages are shown in Table [49]2. Details on the samples can be found in the Methods section. Table 1. Number of tumor samples per ischemia time interval (min.) with specific omic data. Cohort T[1]: t < 10 T[2]: 10 ≤ t ≤ 12 T[3]: 13 ≤ t ≤ 15 T[4]: 16 ≤ t ≤ 18 T[5]: 19 ≤ t ≤ 20 t > 20 Total TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX CRC 181 188 188 223 223 217 111 102 101 44 47 47 18 19 19 36 54 54 613 633 626 HCC 48 41 41 35 36 36 11 9 9 15 15 16 6 6 6 32 38 38 147 145 146 LUAD 215 258 255 119 152 153 90 95 94 51 50 48 17 26 26 35 44 44 527 625 620 LUSC 134 157 153 95 103 102 56 58 58 40 37 35 20 23 23 32 37 37 377 415 408 [50]Open in a new tab Table 2. Number of tumor samples per cancer stage group with specific omic data. I II III IV NA Total Cohort TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX TRX PTX PPX CRC 65 60 60 229 251 248 195 201 199 121 118 116 3 3 3 613 633 626 HCC 31 33 34 18 22 22 18 17 17 7 7 7 73 66 66 147 145 146 LUAD 232 293 291 129 145 144 133 143 141 24 33 33 9 11 11 527 625 620 LUSC 126 151 147 125 130 130 116 119 116 6 8 8 4 7 7 377 415 408 [51]Open in a new tab We defined the ischemia time reference group as the group of samples with ischemia times less than 10 min, the shortest time possible for the collection and storage of a sufficient number of samples. To obtain differential biomolecule expression for mRNAs, proteins and phosphoproteins for the shortest ischemia time group, we normalized the expression data and selected the differentially expressed biomolecules using non-parametric Wilcoxon tests for paired samples to take into account the non-normal distribution of the data [[52]5] as described in the Methods part. We adjusted the resulting P-values for multiple testing and selected the biomolecule sets for mRNA, protein, and phosphoprotein using an α[fdr] level of 0.01 and an effect boundary using the 5th and 95th percentiles of the effect distribution. For the CRC cohort this yields 1948 differentially expressed mRNAs, 794 proteins, and 1846 phosphosites. Analogously, we identified 1870 differentially expressed mRNAs, 523 proteins, and 388 phosphosites in the HCC cohort. In the LUAD cohort, we found 1951 differentially expressed mRNAs, 805 proteins and 2217 phosphosites, and in the LUSC cohort, we found 1950 differentially expressed mRNAs, 798 proteins and 1919 phosphosites on which we focussed in the subsequent analyses. Differential expression over time in CRC We performed a detailed analysis of differential (tumor versus normal adjacent tissue) expression over time in CRC, and then applied the most relevant analyses to the other cancer types listed in Table [53]1. To evaluate whether DNA sequences remain unaffected by ischemia times as reported by others [[54]6], we analyzed protein sequences for affecting mutations (PAM), the presence or absence of gene deletions or amplification as well as gene truncations by comparing the shortest available ischemia time tissue group with the longer ischemia time groups for each genomic locus (see Methods). At α[fdr] = 0.01, only 0.09% of the proteins showed a signal in the PAM submodality, and no change was found in the other DNA-derived submodalities. This is likely based on random somatic difference in the genomic sequence of the patients of the various time groups (see section “Discussion”). To obtain an overview of the effect of ischemia time on the selected biomolecules, we initially partitioned the samples into groups of T[1]: t < 10, T[2]: 10 ≤ t ≤ 14, T[3]: 15 ≤ t ≤ 19, T[4]: 20 ≤ t ≤ 24, T[5]: t ≥ 25 min of ischemia time. We then used hierarchical clustering to assess the changes in biomolecule expression for the three omic modalities as described in “Material and methods” section. Figure [55]1 shows the results for the phosphoprotein modality for 10 clusters (the corresponding plots for the other two expression modalities are shown in Supplementary Figure [56]S1). Fig. 1. Differential expression of the selected phosphosites in 5-min intervals (abscissa) grouped into ten clusters (ordinate) in CRC. Fig. 1 [57]Open in a new tab Colors indicate log[2]-fold mean expression differences (phosphosite-wise standardized) between tumor and normal tissue. Red: upregulation in tumor, blue: downregulation. The effect magnitude and significance seem to increase for times over 20 min. To confirm this impression, we tested for differences in differential expression over time in each modality using a 2 × 5 factorial ANOVA-like design with the tissue types (tumor and normal) as first factor and the time groups T[1] … T[5] as second factor using the non-parametric Scheirer–Ray–Hare test also assessing interaction effects between the two main effects. Table [58]3A shows the phosphosites per cluster with significant tissue, time or interaction effects. The three effects on display are the two main effects of tissue and time difference and the interaction effect of the two variables. Table 3. Alterations in phosphosite expression over time across tissue types. A: Long duration ischemia time groups B: Refined ischemia time groups Significant (α[fdr] = 0.05) effect for Significant (α[fdr] = 0.05) effect for Cluster Tissue Time Interaction Cluster Tissue Time Interaction 1a 147 45 0 1b 115 16 0 2a 169 63 3 2b 293 51 2 3a 177 61 6 3b 283 35 2 4a 369 129 2 4b 95 11 0 5a 317 101 12 5b 136 25 0 6a 188 68 8 6b 89 10 0 7a 54 11 2 7b 442 80 1 8a 52 16 0 8b 189 21 2 9a 97 35 0 9b 74 15 0 10a 242 76 6 10b 105 20 2 Totals 1812 605 39 Totals 1821 284 9 [59]Open in a new tab Table [60]3 confirms that across all clusters (note that the clusters in both panels do not contain identical phosphosites since the data vectors per phosphosite differ between the panels) of the original time groups (panel A), 33% (605/1814) of the phosphosites display a significant time effect. We therefore repartitioned the available samples into the shortest ischemia group (samples with ischemia time shorter than 10 min), and into further groups of 3 min intervals from t ≥ 10 to 20 min (T[1] … T[5]) as shown in Table [61]1 in order to quantify which ischemia time point to use as cut-off (we call these refined time groups). With these refined groups (Table [62]3B) which exclude longer ischemia times, the time effect halves to 16% (284/1823) of the phosphosites. Note that the differing denominator between the two groupings is due to reassigned samples and a subsequently altered pattern of missing values. There are also fewer interaction effects when excluding longer ischemia times. The trends for mRNA and protein modalities are comparable, though less pronounced (see Supplementary Fig. [63]S1 and Supplementary Tables [64]S1 and [65]S2). But even though the effect is least pronounced in mRNA, to perform valid multiomics analyses, we need an ischemia time limit that is the same for all modalities. Thus, we aim for a limit that is optimal for the most sensitive type of molecule (phosphate groups), which must be under 20 min. Before investigating differential biomolecule expression, we analyzed the distribution of the underlying expression levels in normal and cancer tissue samples of the groups T[1] … T[5]. We specifically focused on biomolecules with extreme expression values which could play an important role in cancer. We regarded biomolecules with expression values outside the (Q[2.5], Q[97.5]) interval as extremely expressed. We determined the relative loss of such extreme biomolecules of reference group T[1] in other ischemia time groups. In detail, we counted how many biomolecules with extreme expression values outside (Q[2.5], Q[97.5]) of T[1] were not detected outside the (Q[2.5], Q[97.5]) intervals of the other groups T[2] … T[5] any more. For example for T[1] and T[2], this relative loss is [MATH: =|δ(T1)\δ(T2)||δ(T1)|100 :MATH] where δ computes the biomolecules outside the interval (Q[2.5], Q[97.5]) and | · · · | gives the set size. Table [66]4 shows the loss rates for tumor and normal tissues for the time group average. The vanishing biomolecules fall from outside (Q[2.5], Q[97.5]) into the interval in the time groups T[2] … T[5]. Table 4. Loss L of biomolecules outside the (Q[2.5], Q[97.5]) interval. Modality Tissue Biomolecule loss [%] vs. T[1] T[2] T[3] T[4] T[5] mRNA Normal 27 32 39 21 Tumor 23 28 45 27 Protein Normal 24 25 36 56 Tumor 20 23 27 42 Phosphoprotein Normal 49 64 72 94 Tumor 46 60 64 81 [67]Open in a new tab The loss of biomolecule sets in group T[2] is striking for all types of analyzed biomolecules, but especially for phosphosites. Notably, the loss of the most highly expressed mRNAs is slower than for proteins and even more so than for phosphoproteins, where almost 50% of the most highly expressed sites are lost in a short interval time between 10 and 12 min of ischemia. In mRNA, there seems to be a recovery of the initial expression pattern after the longest ischemia duration, an artifact that may be explained by the degradation and altered detection of the molecules. This has to be taken into account when looking at differential biomolecule expression. Temporal patterns of refined time series To evaluate the influence of ischemia on differential expression patterns, we next analysed the shorter interval groups T[1] … T[5] using a Dirichlet process Gaussian process mixture model (DPGP, details see “Materials and methods” section). This non-parametric time series analysis technique jointly models data clusters with a Dirichlet process and temporal dependencies with a Gaussian process. Note that this is a pseudo-time series. Figure [68]2 shows the eleven clusters obtained in CRC for the protein modality, which we selected out of the three available expression modalities because it allows for a deeper biological interpretation (analogous time series analyses were generated for each modality and tumor type but we did not gain additional biological insights). Fig. 2. Differential expression of the selected proteins in refined time groups (T[1]… T[5] as 3-min intervals on the abscissa) grouped into 11 clusters. [69]Fig. 2 [70]Open in a new tab The ordinates of each cluster show the normalized differential expression. The blue line shows the cluster mean expression. The red lines indicate the individual protein expression levels. The shaded blue area indicates the cluster mean ±2 · σ. We expect various patterns of protein expression variance due to ischemia with proteins losing or gaining abundance of expression due to the effect of the ischemia. Biological cluster interpretation To interpret the biological meaning of the protein expression patterns, we used a Wilcoxon test to identify proteins with differential ex- pression from one time group to another (µ[T2] − µ[T1],…, [MATH: μT< mi>k+1 :MATH] − [MATH: μT< mi>k :MATH] , k = 2 … 4) excluding effects inside the interval (2^−1/2, 2^1/2). We then mapped the genes related to these differentially expressed proteins to KEGG ([71]https://www.genome.jp/kegg/mapper/search.html) and performed a pathway enrichment analysis. We could not find cluster-specific relevant biological patterns, maybe because ischemia regulation cannot be revealed by DPGP-clustering on our pseudo-time series, but we still found interesting overall patterns. The analysis revealed acute inflammatory response and metabolism as the most significantly up- and downregulated pathways, respectively. Table [72]5 shows the number of significantly differentially expressed proteins per pathway. Table 5. Pathways with differential protein expression T[k+1] − T[k]. Pathways Number of proteins Upregulated Downregulated Immune response 9 2 Metabolic processes 5 8 Regulation of cell signaling 7 2 Transport of molecules 2 3 Cell structure and adhesion 1 2 Total 24 17 [73]Open in a new tab The data show an upregulation of proteins which are involved in immune-response but also in regulation of various cellular pathways and metabolism. Specifically, proteins involved in lipid metabolism are upregulated, namely ACSL4, FMO5, MOGAT2 and SULT2B1. That is most likely to supply for the high energy requirements induced by oxidative stress. Interestingly, 20% of the proteins in Table [74]5 are already deregulated between subsequent time points below 16 min (group T[2] and T[3]), indicating that alterations can occur rapidly under ischemic conditions. All of the deregulated proteins and their expression profiles are listed in Supplementary Table [75]S3. There is almost no differential expression effect in this group based on a tissue effect. These pathways reveal a coping mechanism involving a decrease in expression of proteins that are either not survival-critical under metabolic stress induced by ischemia (such as proteins involved in drug metabolism like UGT2B17 or UGT1A8) or those that are energy intensive for the cell (ribosome biogenesis). Ischemia very interestingly appears to slowly breach through this coping mechanism by causing a decrease in tumor survival-critical proteins such as NUDT1, ELOVL5, HPGDS or DZIP3 [[76]7–[77]10]. Confounder analysis We next analyzed the influence of ischemia time on differential biomolecule expression compared to other independent variables of known clinical importance by performing a classical confounder analysis using multiple linear regression. We computed a multi-variate linear model with a Gaussian error distribution at the biomolecule level for each omic type and each of the differentially expressed biomolecule sets obtained from the initial selection step described in subsection “Cancer samples and differential expression”. The variation of the fold-change between tumor and normal tissue expression was modeled as dependent variable to be explainable by the independent variables (details see section “Material and methods“). This analysis is only a rough approximation because at the biomolecule level both statistical assumptions for Gaussian linear models of normal error distribution and linear variable relations are not fully met as shown by statistical testing (not shown). Nevertheless, the analysis revealed informative trends when we determined the distributions of variable effect estimates for biomolecules with any statistically significant (p < 0.01) variable. The corresponding plots of regression coefficient estimate means, split by positive and negative effects, for the CRC cohort are shown in Fig. [78]3. Note that variables which did not have any significant effect on any biomolecule are not shown in the confounder plots (details see section “Material and methods”). Fig. 3. Mean regression coefficient estimates and standard deviation of important predictors (linear model T-statistic with p < 0.01) for the differentially expressed biomolecules in CRC. Fig. 3 [79]Open in a new tab Numbers related to each variable denote the number of biomolecules with any significant positive (red) or negative (blue) effect. For categorical variables, the reference variables of the linear model are indicated in section 4. For example, T[1] is the reference group for the other ischemia times. A Transcriptome, B Proteome, C Phosphoproteome. The figure shows that for CRC, disregarding the ischemia time variable effects (for which T[1] is the reference variable), the tumor grade and stage as well as the alcohol consumption status of the patients relative to their respective references are the strongest predictors of