Graphical abstract graphic file with name fx1.jpg [39]Open in a new tab Highlights * • Semi-supervised learning to derive clinically relevant biomarkers * • A new 10-gene signature refines chemotherapy decisions in stage II/III colon cancer * • The signature shows the strong prognostic value and potential for treatment benefits * • Validation using FFPE samples to ensure clinical applicability __________________________________________________________________ Xu et al. develop and validate a 10-gene signature that refines chemotherapy decisions for stage II and III colon cancer, demonstrating its prognostic value and potential to predict treatment benefits using extensive dataset analysis and validation with FFPE samples. Introduction Colon cancer, a disease characterized by its heterogeneity, is the fourth deadliest cancer worldwide, causing nearly 900,000 deaths each year.[40]^1 Colon cancer has been categorized as either localized to bowel (stage I to II), regionally spread to lymph nodes (stage III), or distantly spread to the liver, lung, peritoneum, brain, or distant lymph nodes (stage IV). The five-year survival rates for patients with stage I colon cancer are approximately 90% without chemotherapy but range from 10% to 50% for stage IV disease despite treatment with chemotherapy.[41]^2^,[42]^3 Five-year disease-free survival rates in patients with high-risk stage II colon cancer were reported to be 80.7%–83.9%, and the three-year disease-free survival rate among patients with stage III colon cancer was from 73.4% to 76.3%.[43]^4^,[44]^5 Generally, patients with stage II and III colon cancer can be considered as clinically relevant groups for risk stratification of disease recurrence.[45]^6^,[46]^7^,[47]^8 For some patients with high-risk stage II colon cancer, and the majority of patients with stage III colon cancer, adjuvant chemotherapy (ACT) consisting of fluoropyrimidine with or without oxaliplatin has been routinely used as an important treatment following surgery with the intent to cure.[48]^9 However, the decision about ACT for stage II colon cancer has been challenging and controversial; the results from clinical trials are not supportive of ACT for all patients with stage II colon cancer.[49]^10^,[50]^11 Although ACT has been recommended for all patients with stage III colon cancer, only ∼20% of patients receiving such therapy will benefit from the treatment in terms of a survival advantage.[51]^12 Progress in treating this disease has been hampered by the inability to accurately determine which patients are most likely to benefit and which are least likely to benefit from the current systemic treatment options. It is therefore critical to discover better predictive biomarkers to identify the stage II and III patients who will most likely benefit from ACT. Also, it is important to avoid unnecessary toxicity for the rest of the patients who are not as likely to benefit from that therapy and/or to stimulate further research into more effective adjuvant therapy options. Several biological biomarkers, including microsatellite instability (MSI) and loss of heterozygosity at chromosome 18q (18q LOH), have historically been proposed for prognostic and predictive purposes in colon cancer.[52]^13^,[53]^14 However, only a minority of patients with early-stage colon cancer (less than 15%) are MSI-high (MSI-H), and the prognostic significance of 18q LOH has not been conclusively validated in clinical trials.[54]^15^,[55]^16 Moreover, the currently available commercial gene signature panels for colon or colorectal cancer (CRC), such as Oncotype DX colon cancer,[56]^17 ColoPrint,[57]^18 Veridex,[58]^19 and GeneFx colon,[59]^20 have primarily been developed with a focus on prognosis. While these panels have undergone external validation, their results have been somewhat inconsistent, and there is a lack of concrete evidence supporting their predictive value specifically for ACT benefit.[60]^21^,[61]^22 In the realm of colon cancer ACT benefit prediction, a few gene signatures have been proposed.[62]^23^,[63]^24^,[64]^25 However, compared to general prognostic signatures, these specific signatures targeting chemotherapy benefit prediction have not undergone extensive independent validation. This gap highlights a critical need for reliable ACT benefit biomarkers, not only for stage II and III colon cancer but also for practical implementation in clinical settings using formalin-fixed paraffin-embedded (FFPE) samples. Our study aims to address this gap by focusing on the development and validation of gene signatures tailored for predicting ACT benefits, thereby contributing to the refinement of personalized treatment strategies in colon cancer. To further improve the identification of prognosis/prediction signatures, we assembled a comprehensive colon cancer stage II and III gene expression dataset, which included 933 samples from six datasets. We then applied an innovative semi-supervised machine learning approach to identify an 18-gene prognosis signature and a 10-gene chemotherapy benefit prediction signature set ([65]Figure 1). The prognosis and chemotherapy-benefit prediction signatures were validated in five and two testing datasets, respectively, including targeted NanoString nCounter data we generated from FFPE samples. We showed that patients predicted to benefit from ACT based on the 10-gene signature had significantly better survival outcomes than those predicted not to benefit using both fresh-frozen and FFPE samples. Additionally, the 10-gene signature displayed the potential to predict the utility of cancer immunotherapy in an adjuvant setting in selected patients. Figure 1. [66]Figure 1 [67]Open in a new tab The workflow chart Schematic of the study design for (A) the development and validation of the 18-hub-gene prognosis signature; and (B) the development and validation of the 10-gene chemotherapy-benefit signature. Results Semi-supervised gene network analysis identified an 18-hub-gene set We first selected [68]GSE39582 , which was our only discovery dataset with survival outcomes and is one of the largest gene expression datasets for stage II and III colon cancer, to identify genes associated with survival outcome. A total of 3,091 genes with significant nominal p values (p ≤ 0.05) were selected as candidates. Next, to further pare down a more focused gene set from these survival-associated genes, these survival-associated genes in [69]GSE39582 were combined with five gene expression datasets without survival outcome to construct the large gene network in stage II and III colon cancer. Specifically, we combined [70]GSE39582 with five additional stage II and III colon cancer expression datasets ([71]GSE27854, [72]GSE26906, [73]GSE21510, [74]GSE18088, and [75]GSE2109) to form a large gene expression dataset with 933 samples and 3,091 genes. The SPACE algorithm was then used to construct a gene network, which included 504 nodes and 737 edges ([76]Figure 2A). Since hub genes with more connections in the gene network are more biologically meaningful compared with the p value-ranked gene list of meta-analysis,[77]^26^,[78]^27 we selected the top 18 hub genes with more than 11 connections (degrees) in the hub gene list ([79]Table S2), which included THY1, BUB1, SDPR, NAP1L3, DEPDC1, STON1, DLGAP5, FBN1, ATAD2, HSD17B2, KIF23, CHAC2, FRMD6, MCM10, TPX2, AURKB, CYR61, and FAM84A genes ([80]Figure 2B). The description of gene functions and pathway analysis results are listed in [81]Table S3 and [82]Figure S1, respectively. Figure 2. [83]Figure 2 [84]Open in a new tab Identification of the 18-hub-gene signature and validation of the Random Survival Forests (RSF) prognosis model (A) The topology of the constructed survival-related gene network in colon cancer. A total of 504 genes (nodes) with at least 11 connections with other genes were identified as hub genes and labeled in enlarging. (B) Composition of the 18 survival-associated hub genes. Degree refers to the number of genes that the hub gene has a direct connection with based on the constructed gene network in Figure A. GISTIC ([85]http://portals.broadinstitute.org/tcga/home) records the information of somatic copy-number variation: “+” indicates the genes with significant amplification and “−” indicates significant deletion in CRC. MethHC ([86]http://methhc.mbc.nctu.edu.tw/php/index.php) records the information on DNA methylation: “+” indicates the genes with significant hypermethylation and “−” indicates significant hypomethylation in CRC. DisGeNET ([87]https://www.disgenet.org/home/) records the information on human gene-disease associations (GDAs): “+” indicates the genes related to CRC. The gene symbols of the ten genes with more than two pieces of evidence that occurred in CRC were highlighted in red. (C) Maximally selected log rank statistics were applied to the in-bag predicted value of the RSF prognosis model. The optimal cutoff point was determined as 14.38. (D–G) Validation of the 18-hub-gene signature in 4 independent datasets, including [88]GSE17538 , [89]GSE33113, [90]GSE37892, and [91]GSE38832. The high- and low-risk groups were defined based on the 18-hub-gene signature, which was derived from the [92]GSE39582 cohort. The optimal cutoff point, which was determined by the maximally selected rank statistics (maxstat), was used to divide the patients into high- and low-risk groups. Red and dark blue lines indicate predicted high- and low-relapse risk groups. HR compares the RFS of the low- and high-relapse risk groups. p values were obtained by the log rank test. Prognosis model construction and performance evaluation To determine if the discovered 18-hub-gene set could predict colon cancer prognosis, we constructed a prognosis prediction model using [93]GSE39582 as our training dataset. The optimal cutoff point based on maximally selected rank statistics was determined as 14.38 for the high- and low-risk group ([94]Figure 2C).[95]^28 Next, we evaluated this prognostic model using four additional independent expression datasets with available survival information ([96]GSE17538, [97]GSE33113, [98]GSE37892, and [99]GSE38832). In each testing dataset, the samples were also divided into high- and low-relapse risk groups using the optimal cutoff point in [100]Figure 2C. The hazard ratio (HR) and log rank p value of predicted risk groups in the five datasets were 3.324 (p = 0.005), 3.933 (p = 0.003), 11.543 (p < 0.0001), and 4.745 (p = 0.032), respectively ([101]Figures 2D–2G). Furthermore, the univariable Cox proportional hazard regression analysis was performed and the results suggested that the 18-hub-gene set was significantly associated with relapse-free survival (RFS) outcome in [102]GSE17538 (HR = 3.336, p = 0.008), [103]GSE33113 (HR = 3.967, p = 0.006), [104]GSE37892 (HR = 12.05, p = 3.79 × 10^−^9), and [105]GSE38832 (HR = 4.795, p = 0.051) ([106]Table S4). Also, the multivariable Cox proportional hazard regression analysis with additional clinical variables including age, gender, and stage was completed, and the results showed that the 18-hub-gene set was an independent factor for RFS outcome in [107]GSE17538 (HR = 4.136, p = 0.003), [108]GSE33113 (HR = 4.117, p = 0.005), [109]GSE37892 (HR = 8.238, p = 2.73 × 10^−^6), and [110]GSE38832 (HR = 4.577, p = 0.058) ([111]Figure S2A; [112]Table S4). These results demonstrated the 18-hub-gene set provides an effective gene signature for colon cancer prognosis. 18-Hub-gene prognostic model outperformed existing signatures In the construction of clinical prognostic models, the efficient selection of a gene signature set from a high-dimensional gene list is a crucial step. Typically, in a p value-based gene selection approach, the set of genes with the most significant p values is selected to build the prognostic prediction model. To compare the performances of the p value-based gene selection approaches with our network-based gene selection approach, we chose 18 genes with the most significant p values from the Cox regression analysis and re-constructed a prognosis model of RFS using the same five independent GEO datasets described earlier. We found that the prediction performance of the 18-hub genes obtained with our semi-supervised approach had better performance than those based on the 18 most significant genes. The median AUC of the 18-hub-gene set and the 18-most-significant gene set for predicting 5-year RFS was 0.732 and 0.591, respectively ([113]Figure 3A). Additionally, analysis using pairwise mutual information distance measuring association showed that the 18-hub-gene signature set generated by our semi-supervised approach had much lower redundant information than the 18-top-ranked gene set (two-sample Kolmogorov-Smirnov test, p value < 0.00001, [114]Figure 3A). Moreover, principal component analysis also demonstrated that the 18-hub-gene set captured more variation in patients with colon cancer than the 18-top-ranked gene set ([115]Figure 3B). Figure 3. [116]Figure 3 [117]Open in a new tab Comparison of the 18-hub-gene set and 18-most-significant gene set (A) Boxplot of median AUC for predicting 5-year RFS in four GEO testing datasets ([118]GSE17538 , [119]GSE33113, [120]GSE37892, and [121]GSE38832), and pairwise mutual information distance based on expression values in the [122]GSE39582 dataset. (B) Expression variation across the samples in the [123]GSE39582 dataset based on the principal component analysis. (C) Gene expression patterns of the two gene sets in the [124]GSE39582 dataset. Using the expression dataset of 469 stage II and III colon cancer samples in [125]GSE39582 , we next compared the expression patterns of two gene sets and found the average expression level of the 18-hub-gene group was significantly higher than the expression level of the 18-top-ranked group (t test, p value < 2.2 × 10^−^16), demonstrating the effectiveness of our approach for optimally identifying a more relevant colon cancer-related gene signature set for prediction analysis ([126]Figure 3C). Furthermore, we compared the prediction performance of the 18-hub-gene set with other established gene sets for colon cancer prognosis prediction, including the Oncotype DX,[127]^29 ColoGuidePro,[128]^30 ColoFinder,[129]^31 the 8-gene signature of Peng et al.,[130]^32 the 13-gene signature of Tian et al.,[131]^33 the 3-gene signature of Cheng et al.,[132]^34 the 15-gene signature of Rokavec et al.,[133]^35 and the 12-gene signature of Fang et al.[134]^36 For each prognostic signature generated by these alternative methods, we also calculated HR, log rank test p values, and 5-year AUC for the four GEO validation cohorts ([135]GSE17538 , [136]GSE33113, [137]GSE37892, and [138]GSE38832). The results showed the 18-hub-gene set generated by our semi-supervised approach had the best performance in prognostic prediction ([139]Table S5). Identification and validation of 5-FU-based ACT chemotherapy-benefit gene signature 5-Fluorouracil (5-FU)-based ACT refers to patients who have received 5-FU treatment alone (either intravenous 5-FU or oral capecitabine) or in combination with other medications, including folinic acid and oxaliplatin. Among all available colon cancer datasets, two datasets ([140]GSE39582 and [141]GSE17538) included 142 and 80 primary tumor samples from patients who subsequently received adjuvant 5-FU-based treatments. We next tried to determine a gene signature that can more accurately predict patients with colon cancer who are likely to benefit from ACT therapy and thus demonstrate improved survival relative to 5-FU-based ACT from the primary 18-hub-gene set. To this end, we identified potential cancer driver genes with either significant genomic aberration or DNA methylation alteration between CRC and normal samples using the GISTIC tool[142]^37 for The Cancer Genome Atlas copy number and MethHC database[143]^38 (see [144]Figure 1B) and manually interrogated the 18-hub-gene set with these two databases. To not miss some crucial genes that play an important role in colon cancer, we also verified the 18-hub genes in the DisGeNET database,[145]^39 which is the largest disease-associated gene database. Finally, the genes occurring more than two times in the GISTIC, MethHC, or DisGeNET databases ([146]Figure 2B) were selected and used to construct a 10-gene chemotherapy-benefit signature set. An RSF chemotherapy-benefit model was re-constructed using these 10 genes, along with 142 samples with 5-FU-based ACT information in the [147]GSE39582 dataset. The prognostic performance of the 10-gene chemotherapy-benefit signature set was evaluated in the four GEO colon cancer cohorts ([148]GSE17538 , [149]GSE33113, [150]GSE37892, and [151]GSE38832). The HRs and log rank test p values calculated to evaluate the prediction performance in four datasets were 2.97 (p = 0.002), 4.467 (p = 0.004), 5.963 (p < 0.0001), and 2.642 (p = 0.122), respectively ([152]Figures S3A–S3D). The results suggested that the 10-gene signature set demonstrated a prognostic performance similar to the 18-hub-gene prognosis signature set. We then assessed which patients demonstrated an association with survival outcomes relative to 5-FU-based ACT by the 10-gene chemotherapy-benefit prediction model. According to the maximally selected rank statistics (maxstat) provided in the “survminer” package, the samples were divided into ACT-benefit and non-benefit groups using the optimal cutoff point ([153]Figure 4A). The HR and p value validated by another colon cancer cohort, [154]GSE17538 with ACT information, were HR = 3.884 and p = 0.002 ([155]Figure 4B), which demonstrated significant prediction of RFS outcome for the 10-gene chemotherapy-benefit signature set in stage II and III colon cancer. Furthermore, the results of univariable Cox proportional hazard regression analysis suggested that the 10-gene chemotherapy-benefit signature set was significantly associated with RFS of patients with 5-FU-based ACT in [156]GSE17538 (HR = 3.933, p = 0.004) ([157]Table S6); and the results of multivariable Cox proportional hazard regression analysis also showed that the 10-gene chemotherapy-benefit signature set was a significant independent factor for RFS of patients with 5-FU-based ACT in [158]GSE17538 (HR = 4.476, p = 0.003) ([159]Figure S2B; [160]Table S6). Furthermore, we conducted a control experiment in which we randomly selected 5–15 genes 1,000 times and compared their prediction performance with the 10-gene set in [161]GSE17538. As shown in the results ([162]Figure 4C), the AUC of the 10-gene set for predicting 5-year RFS was 0.756, significantly outperforming the random gene set (p = 0.009). These data demonstrate that the 10-gene chemotherapy-benefit signature set selected by the genomic event could more accurately predict patients with colon cancer who would more likely benefit from ACT. Figure 4. [163]Figure 4 [164]Open in a new tab Validation of the RSF chemotherapy-benefit model (A) Maximally selected log rank statistics were applied to the in-bag predicted value of the RSF chemotherapy-benefit model. The optimal cutoff point was determined as 8.88. (B) Validation of the 10-gene chemotherapy-benefit signature in another independent dataset, [165]GSE17538 , which contains only samples with 5-FU-based ACT. The ACT-benefit and non-benefit groups were defined based on the 10-gene chemotherapy-benefit signature, which was derived from samples with 5-FU-based ACT in the [166]GSE39582 cohort. The optimal cutpoint, which was determined by the maximally selected rank statistics (maxstat), was used to divide the patients into the ACT-benefit and non-benefit groups. Dark blue and red lines indicate predicted ACT-benefit and non-benefit groups. HR compares the RFS of the ACT-benefit and non-benefit group. p values were obtained by the log rank test. (C) The density plot illustrated the AUC values of 1,000 randomly selected gene sets for predicting 5-year RFS. The red dotted line represents an AUC of 0.756 for the 10-gene set in predicting 5-year RFS. (D and E) The predictive performance evaluation of the prognosis model and chemotherapy-benefit model in the VUMC data. The 18-gene prognosis model divided high- and low-relapse risk groups in stage II to III patients in (D), and the 10-gene chemotherapy-benefit model divided ACT-responsive and non-responsive groups in stage II to III patients in (E). (F) The mutation pattern of KRAS, BRAF, PIK3CA, PTEN, NRAS, and AKT1 between VUMC ACT-benefit and non-benefit patients. (G) The boxplot of the 18-gene score and 10-gene score between the different locations of lesions in the VUMC cohort. (H) The bar plot of the 18-gene and 10-gene groupings according to the patients with MSI-H in the VUMC cohort. Validation of 18- and 10-gene signatures using FFPE samples To verify the utility of our 18-gene prognosis model and 10-gene chemotherapy-benefit model, we collected 109 stage II/III colon cancer specimens and detected the expression of 18-hub-gene by the NanoString nCounter platform, and 56 of them had received 5-FU-based ACT ([167]Table 1; [168]Table S8). The 18-gene prognosis model was used to divide these 109 specimens into high- and low-relapse risk groups. The HR and log rank p values were 3.542 (p = 0.009) ([169]Figure 4D). The results of univariable (p = 0.014, HR = 3.559) ([170]Table S4) and multivariable (p = 0.0262, HR = 4.46) ([171]Figure S2A; [172]Table S4) Cox regression analyses suggested that the 18-gene prognosis model is an independent prognosis factor. Then the 10-gene chemotherapy-benefit prediction model was used to divide 56 stage II to III specimens with 5-FU-based ACT into benefit and non-benefit groups. Those patients predicted to be in the non-benefit group showed worse RFS after 5-FU-based ACT. The HR was 2.735 and log rank test p value was p = 0.058 ([173]Figure 4E). The HR and p value of univariable and multivariable Cox regression analyses were 2.765 (p = 0.069) and 4.469 (p = 0.074) ([174]Figure S2B; [175]Table S6). We also investigated the mutation pattern of KRAS, BRAF, PIK3CA, PTEN, NRAS, and AKT1 in the Vanderbilt University Medical Center (VUMC) analysis group of patients within the ACT-benefit and non-benefit groups. The results suggested that the mutation frequencies of KRAS and PIK3CA in the ACT-non-benefit group (40% and 20%) were higher than in the ACT-benefit group (9.8% and 4.9%) ([176]Figure 4F). In addition, we analyzed the distribution of the location of lesions and MSI-H status according to the 18- and 10-gene signature groupings in the VUMC cohort. The results suggested that the 18-gene score and 10-gene score show no significant difference between left- and right-sided lesions in the VUMC cohort; the p value (Wilcoxon rank-sum test) was 0.548 and 0.82, respectively ([177]Figure 4G). For MSI status, the VUMC cohort has 10 patients with MSI-H. Intriguingly, MSI-H status is equally distributed among the high- (n = 5) and low (n = 5)-risk groups according to the 18-gene signature ([178]Figure 4H). However, MSI-H status is distributed only according to the ACT-benefit grouping (n = 6) of the 10-gene signature ([179]Figure 4H). This observation suggests that the 10-gene signature may have implications for identifying patients who are more likely to benefit from ACT, specifically within the MSI-H subgroup. Table 1. Patient and tumor characteristics of the different sets Characteristics Type Mean age (sd[180]^a, range), years Gender ^a(M/F) TNM stage __________________________________________________________________ Treatment with 5-FU[181]^a Median follow-up[182]^a (sd, range), months Relapse[183]^a __________________________________________________________________ 0 I II III IV NA Yes No N/A [184]GSE39582 (n = 566)∗ mRNA, independent 66.9 (13.3, 22–97) 310/256 4 33 264∗ 205∗ 60 0 142∗ 48 (40.4, 0–201) 139 322 8 [185]GSE27854 (n = 115)∗ mRNA, independent – – 0 16 41∗ 35∗ 23 0 – – – – – [186]GSE26906 (n = 90)∗ mRNA, independent 66.5 (12.6, 31–94) 43/47 0 0 90∗ 0∗ 0 0 – – – – – [187]GSE21510 (n = 123)∗ mRNA, independent – – 0 15 46∗ 39∗ 23 0 – – – – – [188]GSE18088 (n = 53)∗ mRNA, independent 65.4 (12.2, 34–85) 26/27 0 0 53∗ 0∗ 0 0 – – – – – [189]GSE2109 (n = 276)∗ mRNA, independent – 138/137 2 38 87∗ 73∗ 36 40 – – – – – [190]GSE17538 (n = 232)∗ mRNA, independent 64.7 (13.4, 23–94) 122/110 0 27 72∗ 76∗ 56 0 80∗ 40.7 (26.6, 0.4–118.6) 34 111 3 [191]GSE33113 (n = 90)∗ mRNA, independent 70.4 (13, 35–95) 42/48 0 0 90∗ 0∗ 0 0 0 38.8 (26.5, 1.6–118.3) 18 71 1 [192]GSE37892 (n = 130)∗ mRNA, independent 68.3 (12.7, 22–97) 69/61 0 0 73∗ 57∗ 0 0 0 42.9 (23.4, 0.6–103.1) 37 93 0 [193]GSE38832 (n = 122)∗ mRNA, independent – – 0 18 35∗ 39∗ 30 0 0 41.2 (34.1, 0.3–111.4) 9 65 0 VUMC (n = 136/144)∗ mRNA, independent 65.5 (11.5, 42–86) – 0 1 65∗ 44∗ 26 0 56∗ 49.9 (51.8, 0.1–241.6) 20 89 0 [194]Open in a new tab N/A, not available; sd, standard deviation; ∗ the datasets used to derive and validate the signature. ^a Among patients with stage II to III colon cancer. Drug sensitivity analysis for 5-FU-based ACT chemotherapy-benefit gene signature The 10-gene chemotherapy-benefit signature was applied to predict the 5-FU benefit of NCI-60 cell lines, which were collected and downloaded from CellMiner.[195]^40 We then assigned 60 cell lines into 14 5-FU-sensitive and 46 5-FU-resistant groups by the 10-gene prediction model, which included 2 and 5 colon cell lines, respectively ([196]Figure 5A). Meanwhile, 1,886 5-FU-related NCI-60 studies were collected from CellMiner, The Wilcoxon rank-sum test for −log10(GI 50) values between 2 5-FU-resistant colon cell lines and 5 5-FU-sensitive colon cell lines was applied. Interestingly, we found the median of GI 50 values in the 5 5-FU-sensitive colon cell lines was significantly higher than that in the 2 5-FU-resistant colon cell lines (p value < 2.2 × 10^−^16) ([197]Figure 5B), suggesting that 5 ACT-beneficial colon cell lines were more sensitive to 5-FU. In summary, the 10-gene chemotherapy-benefit signature set has a good performance for predicting 5-FU sensitivity vs. resistance for colon cancer cell lines from the NCI-60 data. Figure 5. [198]Figure 5 [199]Open in a new tab The performance of the 10-gene chemotherapy-benefit signature set for predicting 5-FU resistance for 60 cancer cell lines from the NCI-60 data (A) The heatmap plot of NCI-60 cell lines, which were clustered into 22 resistant and 38 sensitive groups based on the drug activity Z scores of the 10-gene chemotherapy-benefit signature. (B) The violin plot of −log10(GI 50) values between resistant and sensitive groups. Abbreviations are as follows: BR, breast; CNS, central nervous system; CO, colon; LC, non-small cell lung; LE, leukemia; ME, melanoma; OV, ovarian; PR, prostate; and RE, renal. Pathway enrichment analysis for 10-gene chemotherapy-benefit signature-associated gene networks To better understand the biology underlying the 10-gene chemotherapy-benefit signature, we next performed pathway enrichment of the ten genes (THY1, BUB1, DEPDC1, STON1, ATAD2, HSD17B2, TPX2, AURKB, CYR61, and FAM84A), along with the 135 genes connected to them on the SPACE gene network ([200]Figure 2A). We tested 743 canonical pathways collected in MSigDB (Molecular Signature Database from Broad Institute) that contained genes overlapping with the 145 genes described earlier ([201]Table S7). The results showed a total of 147 pathways with nominal p values less than 0.05. Next, we computed a pathway enrichment score for each sample using ssGSEA (single-sample gene set enrichment analysis) and compared these scores for the ACT-benefit and non-benefit groups in the testing cohort [202]GSE17538 ([203]Table S7).[204]^41 [205]Figure S4 shows a heatmap including pathways with significantly different ssGSEA scores between the two groups (adjusted p value less than 0.1). Among these significant pathways, 49 pathways showed high expression activities in the ACT-non-benefit group, and notably, many of them are tumor microenvironment-related pathways, such as REACTOME_INTEGRIN_CELL_SURFACE_INTERACTIONS, REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION, and others. The potential predicted response to immune checkpoint blockade of 10-gene chemotherapy-benefit signature A study explored the roles of fatty acid metabolism in CRC and developed a prognostic risk score model. The study utilized the Tumor Immune Dysfunction and Exclusion (TIDE) score alongside a SubMap algorithm to demonstrate that high-risk patients, as identified by the model, are more suitable for immunotherapy. This suggests that the TIDE score can help identify patients with CRC who might benefit from specific immunotherapies.[206]^42 Given our findings relative to the patients with MSI-H and the 10-gene signature, we investigated whether the 10-gene chemotherapy-benefit signature could also predict response to immune checkpoint blockade (ICB). In our study, we assessed the relationship between the TIDE score[207]^43 and the 10-gene chemotherapy-benefit score for each sample in our datasets. We then examined the Pearson correlation between these two scores. Our analysis, involving 442 tumor samples across four datasets ([208]GSE17538 , [209]GSE33113, [210]GSE37892, and [211]GSE38832), revealed a significant yet moderate positive correlation between the TIDE score and the 10-gene chemotherapy-benefit score (r = 0.33, p = 4.97 × 10^−^13, [212]Figure 6A). However, this correlation varied across datasets. Specifically, in the [213]GSE37892 dataset, the correlation was relatively weak and not statistically significant (r = 0.13, p = 0.13, [214]Figure 6D), while in the [215]GSE17538 (r = 0.42, p = 1.05 × 10^−^6, [216]Figure 6B), [217]GSE33113 (r = 0.42, p = 3.77 × 10^−^5, [218]Figure 6C), and [219]GSE38832 (r = 0.53, p = 9.2 × 10^−^7, [220]Figure 6E) datasets, a moderate and significant positive correlation was observed. Our results, therefore, underscore the need for further investigation, particularly in clinical settings where immunotherapy is administered, to better understand and validate the potential predictive role of our 10-gene signature in the context of ICB. Figure 6. [221]Figure 6 [222]Open in a new tab Association of the 10-gene chemotherapy-benefit signature with TIDE score The Pearson correlation between the TIDE score and the 10-gene chemotherapy-benefit score in the merged four datasets (A), [223]GSE17538 (B), [224]GSE33113 (C), [225]GSE37892 (D), and [226]GSE38832 (E). Discussion In our pursuit of personalized cancer therapy, identifying patients with stage II and III colon cancer most likely to benefit from 5-FU-based ACT remains a crucial goal. However, developing gene signatures related to prognosis or ACT benefit poses challenges due to limited training datasets, variability in gene expression data across different platforms, and heterogeneity in patient stages, treatments, and tumor molecular characteristics. To overcome these challenges, we employed a semi-supervised machine learning approach, identifying genes significantly associated with survival and highly interconnected within gene networks. Utilizing a discovery dataset of 933 stage II and III colon cancer samples, the largest of its kind, we aimed to maximize the robustness of our model. Gene regulatory network modeling, which studies how genes interact with each other and work together as modules to perform biological functions, has been extensively applied to genomic data analysis and shown to be an effective approach for understanding complex biological systems.[227]^44^,[228]^45^,[229]^46^,[230]^47^,[231]^48 It also has been demonstrated that compared with the genes selected by meta-analysis p values, highly connected hub genes in the network have more biological functional importance and are more predictive of outcome.[232]^27 The hub genes have been used as effective biomarkers for the prognosis and prediction of survival in different cancer types.[233]^49^,[234]^50^,[235]^51^,[236]^52^,[237]^53^,[238]^54 To select relevant genes for the construction of a gene network, we first applied the Cox regression model to select 3,091 genes significantly associated with patient survival outcomes in a subset of the discovery cohort with available survival information (n = 566) and then constructed a gene network using the survival-associated genes with all 933 samples in the discovery cohort. The 18-hub-gene selected from our semi-supervised gene network, in which each gene has more than 11 connections with its neighbors in the gene co-expression network, led to coordinated changes at the system level. We found that compared to the 18-top-ranked gene set, our 18-hub-gene set has higher information content as well as more robust prognostic performances across different datasets. Further genomic and DNA methylation analysis also developed a 10-gene chemotherapy-benefit signature, which was successfully validated in another independent fresh-frozen tissues dataset we generated previously ([239]GSE17538). Moreover, we demonstrated both 18- and 10-gene signatures to have independent prognostic and predictive value beyond the clinical variables including tumor stage. These results suggested that the gene signatures we developed were not just prognostic for patients with colon cancer but, more importantly, were also predictive of patients’ benefit from 5-FU-based chemotherapy in terms of RFS as in previous work.[240]^55^,[241]^56 The pathway analyses of the 10-ACT benefit gene signature and its network revealed that for the ACT-non-benefit group, a series of the tumor microenvironment and cell surface interaction-related pathways such as integrin 5 and integrin 2 pathways are activated, which might provide a survival advantage for tumor cells against chemotherapy treatment.[242]^57^,[243]^58^,[244]^59 It is also well known that increased TGFβ signaling in the microenvironment is associated with adverse outcomes in patients with CRC.[245]^60^,[246]^61 On the other hand, the ACT-benefit group is enriched with cell cycle-related pathways. Thus, the 10-ACT benefit gene signature captures the distinct biological functions and activities associated with chemotherapy responses. ICB is an effective strategy for enhancing anti-tumor T cell activity and therapeutic effects. ICB therapy has resulted in unprecedented and durable responses in a sizable proportion of patients with diverse cancers such as melanoma, lung, kidney, bladder, and others.[247]^62^,[248]^63^,[249]^64 ICB drugs such as pembrolizumab and nivolumab, which are inhibitors of programmed cell death protein 1 (PD-1), have been authorized by the Food and Drug Administration to treat patients with metastatic CRC whose tumors are MSI-H or deficient mismatch repair (dMMR). Recently PD-1 blockade was applied to patients with locally advanced rectal cancer with dMMR in a neoadjuvant setting, and all 12 patients achieved a striking clinical complete response.[250]^65 However, MSI-H/dMMR CRC only comprises around 15% of CRC, and patients with microsatellite stable (MSS)/mismatch repair-proficient (pMMR) CRC show poor responses to ICB treatments due to “immune-cold” tumor immune microenvironment.[251]^66^,[252]^67^,[253]^68^,[254]^69^,[255]^70 Although overall patients with MSS/pMMR CRC lack strong responses to ICB therapy, the heterogeneity nature of CRC indicates that the different subgroups may have divergent treatment responses.[256]^71^,[257]^72^,[258]^73 A recent large-scale transcriptome analysis identified a subgroup of “immune-hot” MSS/pMMR rectal cancer with favorable outcomes to neoadjuvant therapy, which highlights the potential for effective response to ICB therapy.[259]^74 On the other hand, there is a small yet notable fraction of patients with MSI-H/dMMR metastatic CRC who unfortunately show early disease progression after undergoing ICB treatment.[260]^75^,[261]^76 Therefore, there is an urgent need to identify potential biomarkers for ICB response. TIDE score has been shown to capture tumor immune evasion by combining gene signatures estimating T cell dysfunction and exclusion to predict ICB response.[262]^43 However, TIDE score calculation may need a large number of genes and is focused on the state of T cells. We observed a significant, although moderate, correlation (r = 0.33, p = 4.97E−13, [263]Figure 6A) between the 10-gene signature and TIDE score using 442 colon tumors. While this finding is intriguing, we acknowledge that it does not strongly indicate a predictive capability for checkpoint inhibition response. This association, therefore, should be interpreted with caution, and we do not assert it as a definitive predictor of immunotherapy benefit. Further investigation in populations receiving immunotherapy is required to explore the potential relevance of this correlation. One of the key bottlenecks for translating biomarkers discovered from the microarray/RNA sequencing transcriptomic data based on fresh-frozen tissue to routine clinical practice is the development of accurate and robust clinical tests, especially on FFPE samples. Previously, we systematically evaluated the gene expression pattern of colon cancer prognostic gene signature between Affymetrix microarray data on fresh-frozen tissue and NanoString nCounter data on matched FFPE samples and showed the moderate correlation between these two measurements, which indicated the feasibility of translating the discovery from fresh-frozen tissues to clinical tests using FFPE samples for colon cancer gene signatures.[264]^77^,[265]^78 In this study, we developed the clinical assay containing 18 genes using NanoString nCounter platform and tested the 18- and 10-gene signatures in VUMC stage II to III colon cancer samples that were sectioned from FFPE tissue blocks. The 18-gene prognosis signature performed exceptionally well in the FFPE validation cohort (log rank p value = 0.009). Despite the relatively small sample size of the chemotherapy-benefit validation cohort (56 patients with 13 recurrent events), the 10-gene chemotherapy-benefit signature also achieved marginal significance (log rank p value = 0.0583), indicating a promising trend that distinguishes 5-FU-based ACT-beneficial and non-beneficial groups in terms of survival outcomes. These results demonstrated that our 10-gene signature set could provide useful information for identifying patients with colon cancer who are most likely to benefit from 5-FU-based ACT. The development of our custom NanoString panel represents a significant advancement in translating our gene signature from the research setting into clinical practice. This innovative approach enables the practical application of our 10-gene signature, potentially changing how patients with stage II and III colon cancer could be managed in terms of 5-FU-based ACT. By providing a robust and clinically applicable tool, we anticipate that our signature could aid in stratifying patients more accurately, thereby enabling personalized treatment strategies. The panel’s utility in distinguishing patients likely to benefit from ACT could lead to more precise and effective treatments, reducing unnecessary exposure to chemotherapy and its associated side effects for those unlikely to benefit. Furthermore, the integration of our panel into clinical workflows necessitates a series of next steps to fully realize its potential. Prospective clinical trials are imperative to validate the predictive power of the 10-gene signature in a real-world setting and assess its impact on patient outcomes. These trials should also explore the utility of the signature in conjunction with other emerging diagnostic tools, like circulating tumor DNA, to further refine treatment strategies. Additionally, the broader applicability of the signature in various stages of colon cancer and in different treatment scenarios, such as immune checkpoint blockade therapy, should be investigated. This expanded research could uncover more comprehensive applications of the signature, thereby contributing to the overarching goal of precision oncology. The economic aspects of implementing this technology in routine clinical settings also warrant thorough evaluation to ensure its cost-effectiveness and accessibility. Ultimately, these steps will be crucial in transitioning our research findings from the bench to the bedside, offering new hope for optimized cancer treatment strategies. In our study, we also conducted a simulation analysis to assess the adequacy of our sample size for validating the 10-gene signature for prospective studies. Adhering to the methodological framework suggested by Riley et al.,[266]^79 we targeted a calibration slope with narrow confidence. Our simulations across varying sample sizes (N = 250 to 3,000) demonstrated a consistent trend of decreasing SE with increasing sample size, underscoring the enhancement in the precision of our model’s calibration ([267]Table S9). Notably, the SE decreased from 0.259 at N = 250 to 0.075 at N = 3,000, highlighting the improved precision in larger cohorts. This analysis not only substantiates the reliability of our gene signature in smaller cohorts but also suggests increased robustness in larger ones. These findings, pivotal in contextualizing the clinical applicability of our model, support the notion that while larger sample sizes offer enhanced precision, even smaller sample sizes in our study provide valuable insights, contributing significantly to the nuanced understanding of gene signature validations in oncological research. This alignment of our results with statistical principles and clinical relevance underlines the robustness of our study’s findings and reinforces its potential impact in guiding ACT decisions in colon cancer. Our 10-gene signature offers promise, especially with its strong prognostic value and hints at predicting the benefits of ACT. Pathway analysis of this signature has uncovered specific biological functions related to chemotherapy responses, highlighting its potential to deepen our understanding of treatment impacts. In summary, our research introduces a 10-gene signature that could play a significant role in predicting ACT benefits. Looking ahead, its value in clinical decision-making could greatly enhance how we tailor ACT and immunotherapy for patients with colon cancer. The next steps involve more extensive validation through prospective clinical trials to firmly establish its predictive power. Limitations of the study Our study presents a 10-gene signature for predicting ACT benefit in stage II and III colon cancer, yet it is not without limitations. The primary challenge was our inability to benchmark this signature against others due to the unique focus on chemotherapy response, a contrast to the prevalent prognostic nature of existing signatures. This limitation, combined with the absence of comparable signatures and suitable datasets, restricts direct comparative analysis. In addition, future validation of the 10-gene signature is essential to confirm its clinical utility. This should involve prospective clinical trials and diverse datasets to ensure its efficacy across various patient groups. Exploring and benchmarking against alternative computational models could enrich the understanding of its relative performance. Moreover, it is important to note that our study was based solely on transcriptome data. While our approach provides valuable insights into gene expression profiles related to chemotherapy response, it does not capture the full spectrum of biological mechanisms that might influence treatment outcomes. The generation of multi-omics data from future clinical trials could offer a more comprehensive understanding of the biological landscape of colon cancer. Incorporating data from genomics, proteomics, metabolomics, and other omics layers will not only enhance our biological insight but also facilitate the discovery of biomarkers from different omics data, potentially leading to more robust and predictive biomarkers for chemotherapy benefit. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Software and algorithms __________________________________________________________________ R (v3.6.3) R CRAN [268]https://cran.r-project.org/ frma R package (v1.38.0) McCall et al. (2010)[269]^80 [270]https://git.bioconductor.org/packages/frma sva R package (v3.34.0) Leek et al. (2012)[271]^81 [272]https://git.bioconductor.org/packages/sva space R package (v 0.1-1.1) Peng et al. (2009)[273]^82 [274]https://cran.r-project.org/src/contrib/Archive/space/ randomForestSRC R package (v2.7.0) Ishwaran et al. (2008)[275]^83 [276]https://cran.r-project.org/web/packages/randomForestSRC/index.html clusterProfiler R package (v3.14.3) Yu et al. (2012)[277]^84 [278]https://git.bioconductor.org/packages/clusterProfiler GSVA R package (v1.34.0) Hanzelmann et al. (2013)[279]^85 [280]https://git.bioconductor.org/packages/GSVA hgu133plus2.db R package (v3.2.3) Carlson (2016)[281]^86 [282]https://bioconductor.org/packages/release/data/annotation/html/hgu 133plus2.db.html survminer R package (v0.4.9) Alboukadel et al. (2021)[283]^87 [284]https://cran.r-project.org/web/packages/survminer/index.html NanoStringNorm R package (v1.2.1) Waggott et al. (2012)[285]^88 [286]https://cran.r-project.org/src/contrib/Archive/NanoStringNorm/ GISTIC Mermel et al. (2011)[287]^37 [288]https://portals.broadinstitute.org/tcga/home MethHC Huang et al. (2021)[289]^38 [290]https://methhc.mbc.nctu.edu.tw/php/index.php DisGeNET Pinero et al. (2020)[291]^39 [292]https://www.disgenet.org/home/ CellMiner Reinhold et al. (2012)[293]^40 [294]https://discover.nci.nih.gov/cellminer/ TIDE Jiang et al. (2018)[295]^43 [296]https://tide.dfci.harvard.edu/ __________________________________________________________________ Deposited data __________________________________________________________________ Microarray of colon cancer expression profiles Marisa et al. (2013)[297]^89 GEO: [298]GSE39582 Microarray of colon cancer expression profiles Smith et al. (2010)[299]^90 [300]GSE17538 Microarray of colon cancer expression profiles Tripathi et al. (2014)[301]^91 [302]GSE38832 Microarray of colon cancer expression profiles De Sousa et al. (2011)[303]^92 [304]GSE33113 Microarray of colon cancer expression profiles Laibe et al. (2012)[305]^93 [306]GSE37892 Microarray of colon cancer expression profiles Kikuchi et al. (2012)[307]^94 [308]GSE27854 Microarray of colon cancer expression profiles Birnbaum et al. (2011)[309]^95 [310]GSE26906 Microarray of colon cancer expression profiles Tsukamoto et al. (2011)[311]^96 [312]GSE21510 Microarray of colon cancer expression profiles Gröne et al. (2011)[313]^97 [314]GSE18088 Microarray of colon cancer expression profiles Expression Project for Oncology ([315]http://intgen.org) [316]GSE2109 The VUMC NanoString nCounter® data This study [317]https://github.com/PoShine/CRC-5-FU [318]Open in a new tab Resource availability Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, X. Steven Chen (steven.chen@miami.edu). Materials availability This study did not generate new unique reagents. Data and code availability * • All public datasets used in this study are available in the Gene Expression Omnibus ([319]https://www.ncbi.nlm.nih.gov/geo/). NanoString nCounter® data is available in the Supplementary [320]Data S1. * • The R code for the computational refinement of gene signatures was deployed [321]https://github.com/PoShine/CRC-5-FU. The DOI at Zenodo was [322]https://doi.org/10.5281/zenodo.10748699. * • Any additional information required to reanalyze the data reported in this paper is available from the [323]lead contact upon request. Experimental model and study participant details Patients and ethics statements The FFPE samples of surgically resected tumor tissues were obtained from colon cancer patients undergoing surgery at Vanderbilt University Medical Center (Nashville, USA) from 1990 to 2013 after informed consent was provided. All cases were confirmed by histopathology. This research was approved by the Institutional Review Board (IRB) and Ethics Committee of Vanderbilt University Medical Center (Nashville, USA). Each patient provided written informed consent. The patient demographic information is available in the Supplementary [324]Data S1. Method details Patients and tumor samples To ensure the most comprehensive coverage of probesets in the human genome, as well as concordance in the microarray platform, currently available colon cancer expression profiles detected by "Affymetrix Human Genome U133 Plus 2.0 Array" platform ([325]GPL570), were collected and used for analysis. All gene expression data of stage I and IV patients were removed before signature derivation/verification. As a result, a total of 1375 stage II to III colon cancer frozen tissue specimens from 10 independent gene expression datasets were collected from the Gene Expression Omnibus (GEO) microarray data repository, including [326]GSE39582 , [327]GSE17538, [328]GSE38832, [329]GSE33113, [330]GSE37892, [331]GSE27854, [332]GSE26906, [333]GSE21510, [334]GSE18088, and [335]GSE2109 ([336]Table 1). Among them, the first five datasets provide clinical information about relapse-free survival (RFS) time ([337]Table 1); and [338]GSE39582 and [339]GSE17538 contain the samples of patients who had received 5-fluorouracil-based (5-FU) adjuvant chemotherapy ([340]Table S8). Data pre-processing First, we applied the Frozen Robust Multi-array Analysis (fRMA) algorithm to each microarray dataset for data normalization.[341]^80 The probeset identifiers (IDs) were mapped to gene symbols using the R package "hgu133plus2.db". When multiple probesets correspond to one gene, the probeset with the most significant p-value associated with survival outcome was selected to represent the gene. For the discovery cohort, batch effects within [342]GSE39582 and among six different GEO datasets ([343]GSE39582, [344]GSE27854, [345]GSE26906, [346]GSE21510, [347]GSE18088, and [348]GSE2109) were removed using the ComBat method as implemented in the R package 'sva'.[349]^81 Construction of survival-related gene network and identification of hub genes Currently, for colon cancer, there are limited publicly available gene expression datasets with complete clinical annotation, such as survival outcomes. To maximally leverage information in gene expression datasets without survival information, we developed and validated our prognosis prediction models using a new semi-supervised gene-network based strategy. Within the largest stage II to III colon cancer collection, [350]GSE39582 was the only dataset with RFS information. We first used [351]GSE39582 to select genes significantly associated with RFS outcome based on the Cox proportional hazards model (p < 0.05). Next, we further filtered these genes using a network-based strategy. We hypothesized that hub genes, which have many interactions with other genes in the survival-related network, are more likely to be predictive of prognosis. To obtain a robust survival-related network, we combined all stage II and III colon cancer samples in [352]GSE39582, together with the other five expression datasets: [353]GSE27854, [354]GSE26906, [355]GSE21510, [356]GSE18088, and [357]GSE2109, all of which did not have RFS information, into an aggregated gene expression matrix. In the combined gene expression dataset, survival-associated genes were used to construct the survival-related gene network using the Sparse Partial Correlation Estimation (SPACE) algorithm,[358]^82 which was shown to be one of the best-performing gene network approaches to discover hub genes.[359]^98 Finally, we selected genes with more than 11 connections to their neighbor nodes in the network as hub genes. Prognosis model construction and evaluation Given the selected set of hub genes, we next used the "randomForestSRC" R package to construct the random survival forest prognosis model, using [360]GSE39582 which has RFS time as our training samples, and validated the prediction model in four additional independent datasets with available RFS time, including [361]GSE17538, [362]GSE33113, [363]GSE37892, and [364]GSE38832. The random survival forests approach is particularly advantageous for its ability to handle complex interactions between genes and its robustness in managing high-dimensional genomic data, making it an excellent tool for survival analysis without the need for assumptions on the data’s distribution.[365]^83^,[366]^99 The maximally selected rank statistic (maxstat) provided in the "survminer" package was used to determine the optimal cut-off point for discriminating high-risk vs. low-risk groups. The prediction performance of the RSF prognosis model was assessed using the hazard ratio (HR) and log rank test p-value. The detailed workflow for our proposed semi-supervised gene network-based prognosis model-building strategy is illustrated in [367]Figure 1A. The other existing colon cancer prognosis signatures, including OncotypeDX,[368]^29 ColoGuidePro,[369]^30 and ColoFinder,[370]^31 were used to compare the 18-gene prognosis model. Determining a robust chemotherapy-benefit signature set from adjuvant chemotherapy (ACT) samples To additionally identify a gene signature set for stage II to III colon cancer patients who are most likely to benefit from the 5-FU-based ACT, we further refined the 18-hub-gene set and investigated its ability to predict colon cancer outcomes following 5-FU-based ACT. More specifically, we identified potential cancer driver genes with either significant genomic aberration or DNA methylation alteration between colon cancer and normal samples using the GISTIC tool[371]^37 for TCGA copy number and MethHC database[372]^38 (see [373]Figure 1B). We also verified the 18-hub-genes in the DisGeNET database.[374]^39 An RSF chemotherapy-benefit model based on these 10 genes was then re-constructed using 142 samples with 5-FU-based ACT information in the [375]GSE39592 dataset. Finally, we accessed the prediction performance of this chemotherapy-benefit signature using two datasets generated from our group-microarray data based on fresh frozen tissues in [376]GSE17538 and NanoString nCounter data using FFPE samples from Vanderbilt University Medical Center (VUMC), which included expression data of colon cancer patients who received 5-FU-based ACT samples ([377]Figure 1B). NanoString nCounter assay development To facilitate the clinical application of the gene signatures, we designed a custom nCounter assay (NanoString Technologies, Seattle, WA) for quantitative assessment of the expression of 25 gene elements using a multiple probe approach. The final codeset contained the 18-hub-gene set identified from the discovery cohort based on the microarray platform using fresh frozen tissue. Seven housekeeping genes derived from the least variant expression elements in pilot nCounter studies were also included. To provide a more robust measurement, three probes were designed for each of the 18 hub genes and 6 of the housekeeping genes. Only two probes were designed for RPL38 due to proximity issues between the probes. A full list of gene symbols and the probe design for these 25 genes can be found in [378]Table S1. This multiplexed assay is reported to detect expression of up to 800 transcripts at very low mRNA concentrations (0.1fM/1 copy per cell) even in RNA samples that are significantly degraded, as long as at least 20% of the sample has RNA fragments of 300 base pairs or longer. Therefore, the 144 CRC samples with 20%–76% (median value is 43%) of the RNA sample containing fragments of 300 base pairs or longer were hybridized to the custom nCounter assay. Evaluation of the predictive performance in the VUMC NanoString nCounter data To analyze stage II to III colon cancer specimens using the NanoString nCounter platform, total RNAs were isolated from 144 FFPE preserved colon cancer specimens (with associated RFS prognosis information), of which 109 specimens were stage II to III colon cancer patients, 56 of which had received 5-fluorouracil-based (5-FU) adjuvant chemotherapy ([379]Table 1). We used the R package 'NanoStringNorm' to normalize the raw expression data, summarized the CodeCount (positive) and SampleContent (housekeeping) controls by the geometric mean, and used the 'mean' option for background correction. Normalized data were log2-transformed for further analyses. Batch effects were removed using the ComBat method as implemented in the R package 'sva'.[380]^81 Finally, when multiple probesets were targeted to the same gene, the probeset with the maximum expression value was used to represent the gene. A total of 109 stage II to III samples were utilized to assess the RFS prognosis model constructed by the 18-hub-gene set, and 56 samples that received 5-fluorouracil-based (5-FU) adjuvant chemotherapy were utilized to validate the chemotherapy-benefit 10-gene prediction model. Because of the difference in microarray platforms, when the NanoString data were utilized to evaluate the efficiency of our prediction model, for each gene, we computed Z score transformations for samples in the training set and samples in the testing set separately. The NanoString data is available in the Supplementary [381]Data S1. Drug sensitivity analyses Gene transcript z scores data for the chemotherapy-benefit gene signature set and their corresponding -log10(GI 50) values treated by 5-FU in NCI-60 cell lines were obtained from CellMiner database ([382]https://discover.nci.nih.gov/cellminer/).[383]^40 GI 50 represents the concentration of a drug required to reduce cancer cell proliferation by 50%. Using Z score data from the chemotherapy-benefit gene signature set, complete-linkage hierarchical clustering was performed to categorize all cell lines into resistant and sensitive groups. The Wilcoxon rank-sum test was then carried out to evaluate the difference between resistant and sensitive cell lines by their median -log10(GI 50) values. Pathway analysis The five pathway collections including BioCarta, KEGG, PID, Reactome, and WikiPathways from MSigDB C2 Canonical pathway database were used for pathway analysis. We used the enricher function in the Bioconductor package clusterProfiler for pathway enrichment analysis.[384]^84 The ssGSEA method was implemented in the GSVA R package and was applied to compare ACT benefit and non-benefit groups and heatmap visualization for samples from the [385]GSE17538 dataset.[386]^85 Tumor immune dysfunction and exclusion (TIDE) analysis The TIDE score computed for each tumor sample can serve as a surrogate biomarker to predict response to immune checkpoint blockade, including anti-PD1 and anti-CTLA.[387]^43 We utilized the TIDE online tool ([388]http://tide.dfci.harvard.edu/) to calculate the TIDE score for each tumor sample in four datasets: [389]GSE17538 , [390]GSE33113, [391]GSE37892, and [392]GSE38832. Then, we calculated a 10-gene chemotherapy-benefit score for each sample in the four datasets, irrespective of whether the sample had received 5FU chemotherapy. The TIDE scores and the 10-gene chemotherapy-benefit scores were further compared. Quantification and statistical analysis The two-sample Kolmogorov-Smirnov test or Wilcoxon rank-sum test was appropriately used to compare groups. The Kaplan–Meier method was used to analyze the RFS of patients with different risk stratification. All analyses were performed using R software (version 3.6.3). Acknowledgments