Abstract For patients with locally advanced rectal cancer (LARC) treated with preoperation chemoradiation (pCRT), identifying differentially expressed (DE) genes between non-responders and responders is a common approach for investigating mechanisms of chemoradiation resistance. However, some of such DE genes might be irrelevant to cancer itself but simply reflect the pharmacokinetic differences of the normal tissues. In this study, we adopted the RankComp algorithm to identify DE genes for each of LARC sample compared with its own normal state. Then, we identified genes with significantly different deregulation frequencies between the non-responders and responders, defined as cancer-related pCRT-response genes. Pathway enrichment and protein-protein interaction analyses showed that these genes specifically and intensively interacted with currently known effective genes of pCRT, involving in DNA replication, cell cycle and DNA repair. In contrast, after excluding the cancer-related pCRT-response genes, the other DE genes between non-responders and responders were enriched in many pathways of drug and protein metabolisms and transports, and interacted with both the known effective genes and pharmacokinetic genes. Hence, these two types of DE genes should be distinguished for investigating mechanisms of pCRT response in LARCs. __________________________________________________________________ Currently, neoadjuvant preoperation chemoradiation (pCRT) with 5-fluorouracil (5-FU)-based regimens followed by surgical resection is extensively employed for the locally advanced rectal cancers (LARCs)[44]^1. Usually, about 10–25% of LARCs have pathologic complete response, whereas most patients cannot benefit from the therapy but suffer delayed toxicity risk[45]^2. Although many factors mitigating the pCRT response are known, we are still unable to identify patients who will be susceptible to pCRT and benefit from this therapy[46]^3. Because response to pCRT correlates to positive survival outcomes[47]^4, it is necessary to investigate the mechanisms of pCRT response in LARCs. A common approach for investigating mechanisms of pCRT response is firstly identifying differentially expressed (DE) genes between the non-responders and responders[48]^5,[49]^6,[50]^7,[51]^8,[52]^9,[53]^10,[54]^11,[55]^12,[5 6]^13,[57]^14,[58]^15,[59]^16,[60]^17. However, such DE genes may have various origins. Some of such DE genes might be relevant to LARCs but deregulated with different patterns in non-responders and responders compared with normal rectal tissues. The other of such DE genes might be deregulated in neither non-responders nor responders, compared with normal rectal tissues, and thus be irrelevant to LARCs but simply reflect the pharmacokinetic differences between the non-responders and responders. Thus, it should be of great interest to clarify the different origins and roles of DE genes between the non-responders and responders. However, it is difficult to identify DE genes in responders and/or non-responders compared with normal controls because currently few data for studying LARCs response to chemoradiation include normal rectal samples. Although there are some available data of normal rectal samples generated by different laboratories, they cannot be compared directly due to experimental batch effects[61]^18,[62]^19 and current methods for adjusting batch effects could distort biological signals[63]^20. Recently, we have reported an interesting biological phenomenon that the within-sample relative expression orderings (REOs) of genes within each of a particular type of normal tissues are highly stable but widely disrupted in the corresponding cancer tissues[64]^21. Based on that, RankComp was proposed to detect the deregulated genes in a disease sample through comparing the REOs in this sample with the stable REOs in normal human tissues[65]^21. Because all comparisons of REOs occur within individual samples, which obviates the need of data normalization, we will be able to utilize multiple datasets for the rectal normal samples to identify deregulated genes in cancer samples at the individual level. In this study, using data of 34 normal rectal tissue samples from three datasets, we extracted gene pairs with identical REOs in all the samples as the landscape of normal rectal tissue. Then, we used RankComp to identify DE genes for each of 38 non-responders and 34 responders of LARCs. Subsequently, we identified 186 genes that had significantly different deregulation frequencies between the non-responders and responders, defined as cancer-related pCRT-response genes. Among these genes, 57 genes were shared with DE genes between the non-responders and responders identified with RankProduct[66]^22. After excluding the cancer-related pCRT-response genes, the other DE genes between the non-responders and responders were defined as cancer-unrelated pCRT-response genes. Finally, by pathway enrichment and protein-protein inteaction analyses, we found that the cancer-related and cancer-unrelated pCRT-response genes tend to play different roles in mechanisms of chemoradiation resistance. The framework of our research is described in [67]Fig. 1. Figure 1. The flowchart of the analysis procedure. [68]Figure 1 [69]Open in a new tab DE genes between the non-responders and responders includes 57 cancer-related pCRT-response genes. After excluding these 57 genes, the other DE genes between the non-responders and responders were defined as the cancer-unrelated pCRT-response genes. Results Cancer-related pCRT-response genes Because only a limited number of normal rectal samples measured by a particular gene expression profiling platform could be obtained, we collected 34 normal rectal samples from three datasets measured by the Affymetrix, Illumina and Agilent platforms (see Methods), respectively, to construct cross-platform stable relative expression orderings (REOs) of gene pairs in normal rectal tissues. As reported in our recent paper[70]^23, based on gene pairs with stable REOs across normal samples measured by several platforms, RankComp could accurately detect DE genes in individual cancer samples measured by any of these platforms. Firstly, we evaluated the stability of REOs using a set of 21 normal tissue samples from [71]GSE68204 measured by the Agilent platform and another set of 13 samples from two datasets ([72]GSE9254 and [73]GSE75548 measured by Affymetrix and Illumina platforms, respectively). In each set, we identified gene pairs which showed identical REOs in all the normal samples. A total 37,811,288 gene pairs were identified in both of the two sample sets, among which 92.9% had the same REO patterns across the two sets of samples. This was highly unlikely to happen by chance (Binomial test, p-value = 1.0E-100), suggesting that the REOs of these gene pairs are highly stable in normal rectal tissues measured by the three different platform. Thus, taking the 35,127,930 gene pairs with identical REOs in the 34 normal rectal tissue samples measured by the three platforms as the normal REOs landscape of rectal tissues, we could apply RankComp to detect DE genes in rectal cancer samples measured by any of these platforms[74]^23. Then, we identified DE genes in each of the 72 samples of rectal cancer collected from two datasets ([75]GSE35452 and [76]GSE53781) with RankComp through comparing with the stable REO normal landscape of rectal tissue[77]^21 (see Methods). Averagely, 1596 DE genes per sample were identified, among which 186 genes had significantly different deregulation frequencies between the non-responders and responders (Fisher exact test, FDR < 0.05). These 186 genes, referred to as cancer-related pCRT-response genes, were significantly enriched in cancer-related pathways including DNA replication[78]^24, cell cycle[79]^25 and mismatch repair[80]^26 (Hypergeometric distribution model, FDR < 0.2). The pathways were discribed in [81]Table 1. Notably, we found no significant pathways with a stricter FDR control level of 0.05, possibly due to the insufficient power of pathway enrichment analysis based on only 186 cancer-related pCRT-response genes found in the data. Thus, we detected significant pathways with a loose FDR control level of 0.2 to provide potentially functional clues of the cancer-related pCRT-response genes. Table 1. The pathways enriched with cancer-related and cancer-unrelated pCRT-response genes, respectively. Genes of pCRT-response KEGG Pathway P-value Cancer-related DNA replication 5.24E-04 Mismatch repair 1.52E-03 Cell cycle 1.80E-03 Cancer-unrelated Metabolism of xenobiotics by cytochrome P450 9.45E-03 Cysteine and methionine metabolism 2.74E-03 Metabolic pathways 2.32E-03 Ribosome 1.72E-08 Proteasome 9.74E-07 Protein digestion and absorption 4.42E-07 ECM-receptor interaction 1.65E-04 Oxidative phosphorylation 1.53E-08 [82]Open in a new tab Cancer-unrelated pCRT-response genes Firstly, we identified 1288 and 805 DE genes between the non-responders and responders in the [83]GSE35452 and [84]GSE53781 datasets, respectively (RankProduct, FDR < 5%). The two lists of DE genes shared 101 genes, of which 80% showed consistent deregulation directions (up- or down-regulations) in the non-responders compared with responders across the two datasets. This consistency score was unlikely to be observed by random chance(Binomial distribution test, p-value = 1.3E-10), indicating that at least the overlaps of the two lists of DE genes between the non-responders and responders were significantly reproducible in the two datasets. Notably, the percentage of overlapping genes between the two lists of DE genes was apparently low, which indicated that only a small percentage of DE genes could be found in each of the two datasets due to insufficient statistical power[85]^27,[86]^28. Considering the insufficient power of DE genes detection, we merged the two lists of DE genes, excluding inconsistent DE genes between the two datasets, and obtained a list of 1976 DE genes. These 1976 DE genes, referred to as pCRT-response genes for simplicity, included 57 of the cancer-related pCRT-response genes (as shown in [87]Supplementary Table 2). After excluding these 57 genes, the other 1919 DE genes were defined as the cancer-unrelated pCRT-response genes, which were significantly enriched in various metabolic pathways including metabolism of xenobiotics by cytochrome P450[88]^29, and other pathways including oxidative phosphorylation and extracellular matrix receptor interaction (Hypergeometric distribution model, FDR < 0.2). The pathways were shown in [89]Table 1. PPI network analysis of the two types of pCRT-response genes We collected 113 genes known to be involved in LARCs response to pCRT, including 28 genes participating in metabolisms and transports of drug, 47 genes participating in purine and pyrimidine metabolism, 17 downstream effective genes of 5-FU which play roles in DNA repair, cell cycle arrest and apoptosis[90]^30 and 21 radio-response genes playing roles in DNA-damage related function[91]^31. Because the metabolism genes of purine and pyrimidine participate in DNA-damage-related function[92]^30, we classified them as effective genes together with the downstream effective genes of 5-FU and the radio-response genes, including a total 85 genes. The other 28 genes participating in drug metabolism and transports were referred to as the pharmacokinetic genes. The effective genes and pharmacokinetics genes are described in [93]Supplementary Table 1. Then, through the human protein-protein interaction (PPI) network[94]^32, we showed that cancer-related pCRT-response genes tend to closely connected with the effective genes only, whereas the cancer-unrelated pCRT-response genes tend to interact with the pharmacokinetic genes as well as the effective genes. To be more specific, for the 186 cancer-related and 1919 cancer-unrelated pCRT-response genes defined in the above section, 124 and 1405 genes were mapped in the human PPI network, respectively. The 124 cancer-related pCRT-response genes had 117 and zero direct interactions with the effective genes and pharmacokinetic genes, respectively (See [95]Fig. 2). And the 1405 cancer-unrelated pCRT-response genes had 672 and 54 direct interactions with the effective genes and the pharmacokinetic genes, respectively. Notably, the average number of the direct interactions between a cancer-related pCRT-response gene and the effective genes was 0.94, which was significantly more than the corresponding average number of 0.48 for the 1405 cancer-unrelated pCRT-response genes (Rank sum test, p-value = 2.2E-2). Figure 2. The direct PPI links between the cancer-related pCRT-response genes and the known effective genes of pCRT. [96]Figure 2 [97]Open in a new tab The red (circular) nodes denote the known effective genes of pCRT. The green (diamond-shaped) nodes denote the cancer-related pCRT-response genes. The yellow (oval) nodes denote the genes overlapped between the cancer-related pCRT-response genes and the known effective genes of pCRT. MCM3, detected as a cancer-related pCRT-response gene, interacts with MYC, CHEK1 and ATR. Discussion In this work, we proposed to distinguish cancer-related and cancer-unrelated pCRT-response genes for genes differentially expressed between non-responders and responders of LARCs. We showed that these two types of genes affect LARCs response to pCRT in totally different ways. Notably, according to the framework of cancer hallmarks network[98]^33, the pathways enriched with the cancer-related pCRT-response genes are related to the survival network which presents the cancer traits of resistance to cell death, sustaining chronic proliferation and blocking signals of inhibitory growth. Furthermore, PPI network analyses revealed that the cancer-related pCRT-response genes specifically and intensively interact with the known effective genes of pCRT, mostly conducting the functions in well known pCRT-response related pathways including DNA replication, cell cycle and DNA repair[99]^30,[100]^31. For example, MCM3 was found as a cancer-related pCRT-response gene and it interacts with MYC, CHEK1 and ATR, which are all the known effective genes of pCRT-response[101]^30,[102]^31 (See [103]Fig. 2). In contrast, the cancer-unrelated pCRT-response genes were significantly enriched in typical metabolism pathways related to drug metabolisms such as cytochrome P450 which contributes to multidrug resistance in tumor[104]^34 and other diseases[105]^35. Outstandingly, we found that the cancer-unrelated pCRT-response genes tend to interact with the known pharmacokinetic genes (See [106]Fig. 3). Together, these results suggested that the cancer-unrelated pCRT-response genes may determine the metabolism characteristics to shape LARCs response. For example, ATIC was detected as a cancer-unrelated pCRT-response gene and it interacts with MTR, AMT, MTHFD1/2, SHMT1/2, FTCD and DHFR, which are all involved in the inhibition of thymidylate synthase[107]^30,[108]^31 (See [109]Fig. 3). And genetic variant of ATIC has been confirmed to be a pharmacokinetics marker of methotrexate[110]^36. On the other hand, our results showed that the cancer-unrelated pCRT-response genes also interact with effective genes of pCRT. However, this result should be explained carefully because some of the cancer-unrelated pCRT-response genes could be mistakenly identified due to the insufficient power of the identification of cancer-related pCRT-response genes. The number of available samples of normal rectal tissues was only 34, which could be insufficient to detect all the genes pairs with stable REOs in normal rectal tissues and the individual-level DE genes for each LARC. Therefore, it is important that the cancer-related pCRT-response genes should be further studied using larger samples of normal and LARCs. Figure 3. The direct PPI links between the cancer-unrelated pCRT-response genes and the known pharmacokinetics genes of 5-FU. [111]Figure 3 [112]Open in a new tab The red (circular) nodes denote the known pharmacokinetics genes of 5-FU. The green (diamond-shaped) nodes denote the cancer-unrelated pCRT-response genes. The yellow (oval) nodes denote the genes overlapped between the cancer-unrelated pCRT-response genes and the known pharmacokinetics genes of 5-FU. ATIC, detected as a cancer-unrelated pCRT-response gene, has the largest number of interaction links with the known pharmacokinetics genes of 5-FU. In summary, the two types of pCRT-response genes should be distinguished in studying mechanisms of LARCs response to pCRT. Materials and Methods Data and pre-processing Five microarray datasets for rectal normal samples and LARCs were downloaded from Gene Expression Omnibus[113]^37 (GEO, [114]http://www.ncbi.nlm.nih.gov/geo/), as described in detail in [115]Table 2 and [116]3. For the data from [117]GSE9254 and [118]GSE35452 measured by the Affymetrix platform, the raw data was processed for background adjustment via the Robust Multi-array Average algorithm[119]^38 without quantile normalization because all comparisons of gene relative orderings occurred within individual samples. For the data derived from [120]GSE68204 measured by the Agilent platform, the raw fluorescent signal intensity values of the channel (gMedianSignal or rMedianSignal) for normal samples minus the corresponding background signal intensities as the probe-expression matrix. For the data of [121]GSE53781 and [122]GSE75548 measured by CodeLink bioarrays and Illumina platform, we directly downloaded the processed data. Then, each probe-set ID was mapped to its Entrez gene ID with the corresponding custom CDF files. If several probesets were mapped to a gene, the expression value for the gene was defined as the arithmetic mean of the values of the multiple probesets (on the log2 scale). Table 2. The data of normal rectal samples used for identifying stable gene pairs. Accession number Platforms Number of genes Number of samples References