Abstract One major topic in colorectal cancer (CRC) research is the role of immune cells against cancer cells. The association of single-nucleotide polymorphisms (SNPs) and polygenic risk scores (PRS) with CRC was examined and their functional properties were identified using a gene-gene interaction network. 960 CRC patients at Seoul National University Hospital (SNUH, discovery) and 6,627 CRC patients at Chonnam National University Hospital (CNNUH, validation) were enrolled. SNPs were genotyped using the Korean Biobank Array. 2,729 immune-related genes were selected from the Ensembl, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes, and 37,398 SNPs were mapped. PRS were categorized into tertiles. Cox proportional hazard models were fitted for overall survival (OS) and progression-free survival (PFS). A gene-gene interaction network was analyzed. Among CRC patients from SNUH, 154 (16.0%) died, while 245 (25.5%) had progression. In CNNUH, 3,537 (53.4%) died. For OS, the most significant association was observed for rs117322760 (8q23.1, PKHD1L1, hazard ratio (HR) = 4.58, p-value = 1.40 × 10^− 6). For PFS, it was observed in rs143531681 (7q36.1, NOS3, HR = 4.67, p-value = 9.72 × 10^− 8). For PRS, the highest tertile group showed an increased risk for OS (HR = 59.58, p-value = 9.20 × 10^–48) and PFS (HR = 9.81, p-value = 1.69 × 10^–23). Significant interactions were observed between PIK3R2 and PIK3CA for OS and ALOX5 and COTL1 for PFS. This study presented novel genetic variants associated with OS and PFS in CRC patients, and notable findings from the analysis of PRS and the gene-gene interaction. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-025-13819-4. Keywords: Colorectal cancer, Immune-related gene, Single-nucleotide polymorphism, Polygenic risk score, Gene-gene interaction network, Survival Introduction In South Korea, colorectal cancer (CRC) is a leading malignancy, ranking third in incidence among men and women [[42]1]. Moreover, it stands as the third cause of cancer-related deaths in men and women [[43]1]. Owing to the early detection of nationwide medical screenings and the improvements in cancer treatments, the survival rate of CRC has improved from 56.2% (1993–1995) to 74.3% (2016–2020) [[44]1]. Nonetheless, patients with a late diagnosis of CRC persistently face an elevated mortality risk [[45]1]. This assertion is substantiated by data indicating that the five-year survival probabilities for CRC at stages 1 and 4 are 91.9% and 30.4%, respectively [[46]2]. The role of immune cells in cancer therapy is a critical area of research and new anticancer therapies based on the immune system are being developed [[47]3]. Immune checkpoint inhibitors (ICIs) targeting PD-L1, PD-1, CAR-T cells, and novel ICIs targeting unchallenged targets have emerged as breakthroughs in this field [[48]4]. This trend is evident in CRC treatment, particularly in late-stage cases, where an effective immunological response can significantly improve overall survival (OS) and progression-free survival (PFS) by affecting metastatic features [[49]4, [50]5]. Immunity-related biological pathways are gaining attention in CRC research. Pathways such as Interleukin-1 signaling, CLEC7A in innate immunity, and B cell receptor signaling significantly impact the survival and clinical outcomes of patients with CRC [[51]6]. Several genetic association studies have been conducted on immune-related genes and survival outcomes of association studies. IL23A, LIF, VGF, SLIT2, and FGF18 have a negative effect on the survival outcome of patients with CRC [[52]7]. Additionally, rs4464148 (18q21.1, SMAD7) and rs6983267 (8q24.21, CCAT2) are associated with poorer OS outcomes in patients with CRC [[53]8]. One of the limitations of previous candidate gene approaches and genome-wide association studies (GWAS) is that they did not apply functional analysis to provide biological interpretations and clinical implications. Therefore, functional analyses such as polygenic risk scoring and gene network analysis could be utilized to connect genetic association results with additional beneficial biological and clinical effects. The role of polygenic risk scores (PRS) was emphasized in the genetic analysis of cancer. The importance of PRS in cancer genetics for estimating genetic risk and enabling precision medicine has been highlighted [[54]9]. Polygenic risk score-based genetic screening of patients with CRC could yield more accurate results than the current guidelines [[55]10]. Moreover, applying the concept of PRS to CRC revealed that the cumulative burden of CRC-associated common genetic variants is more strongly associated with early-onset CRC compared with late-onset CRC [[56]11]. Additionally, network-based functional analysis in cancer research is being extensively studied for a detailed biological understanding of SNP analysis outcomes [[57]12–[58]14]. Therefore, this study aimed to identify the susceptibility of single nucleotide polymorphisms (SNPs) and genes using genome-wide immune-related gene association studies to uncover the clinical importance of the immune system for survival outcomes in patients with CRC. Additionally, we utilized PRS and gene-gene interaction networks to reveal the clinical and biological impacts of susceptible loci in CRC. Methods Study population and follow-up Patients with CRC were recruited in 2002 and underwent surgical resection at the Seoul National University Hospital (SNUH), Korea. This study included patients diagnosed with CRC and who underwent surgery in 2013.Among 1,099 selected patients, we have performed genome-wide SNP arrays in 960 patients. In this study, all patients underwent surgical treatment between May 2014 and June 2017 and were prospectively followed up until April 14, 2023. Additional details of the patient selection process were presented in Supplementary Fig. [59]1, Additional file [60]1. For the validation study, patients with CRC from Chonnam National University Hospital (CNNUH), Korea, were utilized as described in detail elsewhere [[61]15]. These patients were diagnosed with CRC between 2004 and 2014 and were prospectively followed up until December 31, 2020. They were divided into two groups (MEGA array and ONCO array), with each group having 3,465 and 3,162 patients, respectively. Further details regarding the validation study population are provided in Supplementary Table [62]1, Additional file [63]2. Overall and progression-free survival The definitions of OS and PFS were used to assess the clinical outcomes in patients with CRC. Overall survival was defined as the time elapsed from surgical treatment to death (event) or the end of the follow-up period if the patient was alive. In the SNUH patient group, death certificate data provided by the Korean Statistical Information Service [[64]16] were linked to the patient with CRC data. Furthermore, the cause and date of death in the CNNUH patient group provided by the National Statistical Office were merged with the patient with CRC data. Progression-free survival was defined as the time elapsed from surgical treatment to disease progression (event, death, or relapse), or the end of the follow-up period if the patients did not have any progression events. Gene selection We utilized three biological databases to obtain immune-related gene data: ENSEMBL [[65]17], Kyoto Encyclopedia of Genes and Genomes (KEGG) [[66]18], and Gene Ontology (GO) [[67]19]. In ENSEMBL, we searched for immune-related genes using the keyword “immune” and obtained the resulting data in XML format. In KEGG, we selected all pathways related to the “5.1 Immune System” category. In GO, we selected the “Immune System Process” category through the AmiGO2 search browser and obtained result genes. Through integration of genes from three databases, we identified 3,523 non-redundant immune-related genes. (Supplementary Fig. [68]3, Additional file [69]1) Genotyping, SNP selection, and quality controls Single nucleotide polymorphism data were obtained using the Axiom Korean chip 1.1 [[70]20] with 960 blood samples. Sample quality control and genotype calling were performed using Affymetrix Power Tools (APT) and the AxiomGT1 BRLMM-P Algorithm. As a result, we obtained 800,493 genome-wide SNPs Furthermore, we selected immune-related SNPs within ± 200 kb from the position of 3,523 immune-related genes using the bioMart [[71]21] and SnpMatrix R package [[72]22]. Finally, we assembled 63,459 SNPs within or near 3,017 immune-related genes. We applied four exclusion criteria that were not mutually exclusive to the flowchart of the study design (Supplementary Fig. [73]3): (1) minor allele frequency (MAF) < 0.01 (N = 23,049), (2) Hardy–Weinberg equilibrium (HWE) p-value < 1 × 10^− 6 (N = 2,493), (3) mono-allelic (N = 10,823), and (4) call rate < 98% (N = 5,435). We used the criteria from previous quality control (QC) procedures [[74]23] but there were no additional excluded SNPs from this procedure. In total, 37,398 genotyped SNPs were identified. Additionally, genotype imputation was performed using the Michigan Imputation Server [[75]24]. We used MINIMAC 4 and selected 1000 genomes [[76]25] as the reference panel from East Asia for the subpopulation. For QC, we selected an r-squared statistic (rsq) filter of 0.3, which removed over 70% of poorly imputed SNPs. Subsequently, we implemented standard settings of QC by the Michigan Imputation Server, such as minor-allele frequency < 1%, call rate > 90%, and removal of duplicate SNPs. Finally, 2,514 SNPs were excluded from the imputed dataset for two reasons (invalid alleles, 942 SNPs; allele mismatch, 1,503 SNPs), leaving 445,947 imputed SNPs. Accordingly, 450,667 genotyped and imputed SNPs within or near 3,017 immune-related genes were selected. Single nucleotide polymorphism association analysis We utilized the “gwasurvivr” R package [[77]26] to identify statistically significant SNPs associated with time-to-clinical outcomes (OS and PFS) based on the Cox proportional hazards model (“plinkCoxSurv” function) with selected six covariates (sex, age, BMI, tumor location, histologic grade, and TNM stage), as described in the previous section. For the SNP association analysis, we set the cut-off values for the p-values at 1.0 × 10^− 4 (extended suggestive p-value), 1.0 × 10^− 5 (suggestive p-value), and 1.34 × 10^− 6 (Bonferroni-corrected p-value calculated from 0.05 divided by 37,398, total number of genotyped immune-related SNPs). To describe independent index SNPs among linkage disequilibrium (LD), we performed a pairwise LD analysis between two SNPs and calculated the r^2 values using the “LDmatrix” function in the “LDlink” web-based service [[78]27]. We chose the SNP with the smallest p-value among the SNPs with an r^2 value equal to or greater than 0.6. To visualize the results of previous steps, we used the “CMplot” R package [[79]28], which generated Manhattan plots of significant SNPs linked to clinical outcomes and differentiated them by distinctive colors. Furthermore, significant SNPs from the discovery study results were used for the validation study with the same covariates as in the discovery study. The p-value threshold for the validation study was 0.05. The validation study population did not have relapse data for the PFS outcome; therefore, it was replaced with the OS outcome. We performed meta-analysis utilizing “meta” R package [[80]29] and extracted heterogeneous p-values and I^2 values to combine the SNP effects on survival of discovery and validation studies. We selected random-effect p-values for SNPs that showed heterogeneous p-values > 0.05, and fixed-effect p-values for SNPs that did not show heterogeneous p-values > 0.05. Polygenic risk score analysis We followed the standard protocol outlined by Choi et al. and the clinical applications described by Lewis et al. to calculate the PRS of the study participants (Supplementary Fig. [81]4, Additional file [82]1) [[83]30, [84]31]. Briefly, the effect size of each SNP and allelic dosage of the effect allele were multiplied and summed individually. For normalization, it was divided by the number of non-missing SNPs in each individual multiplied by the ploidy, which is two in humans. We used z-scores as the effect size of each SNP since we used a Cox proportional hazards model. We analyzed the PRS tertile groups stratified by the lowest, middle, and highest levels to evaluate the clinical significance of PRS in patients with CRC. Each group was designated as “Low,” “High,” and “Middle.” We utilized the “survminer” and “survival” R packages to demonstrate the distinctive survival outcomes by PRS [[85]32, [86]33]. The “coxph” function was used to perform PRS survival analysis with the selected six covariates (sex, age, BMI, tumor location, histologic grade, and TNM stage) from the previous SNP analysis. The extended suggestive p-value of 1.0 × 10^− 4 was used for PRS analysis. Subsequently, we performed pathway enrichment analysis using the Reactome Knowledgebase [[87]34] on gene sets matched to selected SNPs. Statistically significant pathways were identified using a Bonferroni-corrected p-value threshold, calculated as 0.05 divided by the total number of pathways identified in each analysis. Gene-gene network analysis We utilized GeneMANIA within Cytoscape software 3.10.0 to analyze and visualize the gene-gene interaction network of selected genes matched to suggestive SNPs [[88]35, [89]36]. All annotation databases (co-expression, co-localization, genetic interactions, pathways, physical interactions, predicted, and shared protein domains) were selected. The tables of edges (connected nodes) and nodes (gene names) were extracted. Finally, the extended suggestive p-value of 1.0 × 10^− 4 was used for gene-gene interaction network analysis. Statistical analysis Confounding factors We selected six variables–sex, age, body mass index (BMI), tumor location, histological grade, and Tumor Nodes Metastasis (TNM) stage–as potential confounding factors. Specifically, age, histological grade, and tumor location were statistically significant. Although sex, BMI, and age were not statistically significant within the discovery study population, extensive research has demonstrated their association with poor CRC outcomes. Therefore, we additionally selected sex, BMI and age in our analysis despite the lack of statistical significance in the study population. However, we did not select “adjuvant chemotherapy” because this factor was statistically associated with and predominantly determined by the TNM stage. This relationship between adjuvant chemotherapy and TNM stage was revealed through Pearson’s correlation test with a correlation coefficient of 0.514 and p-value less than 2.2 × 10^–16, which indicates a statistically significant positive correlation. We subsequently conducted a multicollinearity test using the variance inflation factor (VIF) method (Supplementary Fig. [90]2, Additional file [91]1). The VIF values for each confounding factor ranged from 1.002695 to 1.350784, indicating that the selected confounding factors were independent [[92]37]. Results Study population characteristics The characteristics of the discovery study population are described in Table [93]1. The non-event group in OS had a mean age of 61.9 years with a standard deviation of 10.5 years, whereas the event group had a mean age of 64.4 years with a standard deviation of 12.4 years. In PFS, the non-event group had a mean age of 62.0 years with a standard deviation of 10.7 years, and the event group had a mean age of 63.8 years with a standard deviation of 11.7 years. Regarding sex category, the non-event group in OS had 57.5% of males and 42.5% of females, while the event group had 61.2% of males and 38.8% of females. In PFS, the non-event group had 56.6% of males and 43.4% of females, while the event group had 62.1% of males and 37.9% of females. Details of the remaining clinical variables (BMI, tumor location, and so on) are represented in Table [94]1. Table 1. Characteristics of participants included in the discovery study Characteristics OS PFS Non-event (N = 718) Event (N = 242) p-value^a Non-event (N = 643) Event (N = 317) p-value^a Follow-up time (months) Median [range] 84 [3–96] 31 [1–93] 71 [1–96] 14 [1–93] Age (year) 0.006 0.02  Mean (SD) 61.9 (10.5) 64.4 (12.4) 62.0 (10.7) 63.8 (11.7) Sex 0.359 0.117  Male 413 (57.5%) 148 (61.2%) 364 (56.6%) 197 (62.1%)  Female 305 (42.5%) 94 (38.8%) 279 (43.4%) 120 (37.9%) BMI (kg/m^2) 0.158 0.221  Underweight  (BMI < 18.0) 30 (4.2%) 14 (5.8%) 25 (3.9%) 19 (6.0%)  Normal  (18.0 ≤ BMI < 23.0) 259 (36.1%) 101 (41.7%) 233 (36.2%) 127 (40.1%)  Overweight  (23.0 ≤ BMI < 25.0) 174 (24.2%) 58 (24.0%) 158 (24.6%) 74 (23.3%)  Obese  (25.0 ≤ BMI) 255 (35.5%) 69 (28.5%) 227 (35.3%) 97 (30.6%) Synchronous other cancer 1 0.96  Positive 703 (97.9%) 237 (97.9%) 629 (97.8%) 311 (98.1%)  Negative 15 (2.1%) 5 (2.1%) 14 (2.2%) 6 (1.9%) Tumor location 0.02 0.167  Colon cancer 500 (69.6%) 154 (63.6%) 449 (69.8%) 205 (64.7%)  Rectal cancer 199 (27.7%) 73 (30.2%) 175 (27.2%) 97 (30.6%)  Multiple cancer 19 (2.6%) 15 (6.2%) 19 (3.0%) 15 (4.7%) Histologic grade < 0.001 < 0.001  High 44 (6.1%) 37 (15.3%) 40 (6.2%) 41 (12.9%)  Low 625 (87.0%) 195 (80.6%) 554 (86.2%) 266 (83.9%)  Unknown 49 (6.8%) 10 (4.1%) 49 (7.6%) 10 (3.2%) TNM Stage < 0.001 < 0.001  0 24 (3.3%) 1 (0.4%) 24 (3.7%) 1 (0.3%)  1 162 (22.6%) 13 (5.4%) 154 (24.0%) 21 (6.6%)  2 222 (30.9%) 36 (14.9%) 207 (32.2%) 51 (16.1%)  3 267 (37.2%) 70 (28.9%) 235 (36.5%) 102 (32.2%)  4 41 (5.7%) 121 (50.0%) 21 (3.3%) 141 (44.5%)  Unknown 2 (0.3%) 1 (0.4%) 2 (0.3%) 1 (0.3%) Adjuvant chemotherapy 0.1 < 0.001  Positive 470 (65.5%) 173 (71.5%) 405 (63.0%) 238 (75.1%)  Negative 248 (34.5%) 69 (28.5%) 238 (37.0%) 79 (24.9%) [95]Open in a new tab Abbreviations: OS (Overall survival), PFS (Progression-free survival), BMI (Body mass index), TNM Stage (Tumor Node Metastasis stage), SD (Standard deviation) ^aContinuous variables are performed by linear-rank test and categorical variables are performed by log-rank test The median OS for the non-event group was 84 months, with a minimum and maximum of 3 and 96 months, respectively. For the event group, the median OS was 31 months, with a minimum and maximum of 1 and 93 months, respectively. The median PFS in the non-progression group was 71 months, with minimum and maximum values of 1 and 96 months, respectively. In the progression group, the median PFS was 14 months, with minimum and maximum PFS of 1 and 93 months, respectively. Single nucleotide polymorphism association on overall and progression-free survival Associations between immune-related SNPs and survival are represented as Manhattan plots in Fig. [96]1. The cluster of SNPs from all stages of OS highlights important positions that have risen above the suggestive p-value (1.0 × 10^− 5) (Fig. [97]1A). One SNP (rs75663950) on chromosome 9 was in LD with rs1414944 (Fig. [98]1A). In this analysis, four SNPs (rs1414944 (PIP5K1B, 9q21.11, hazard ratio (HR) = 1.5, p-value = 3.39 × 10^− 6), rs34009001 (ADCY7, 16q12.1, HR = 3.28, p-value = 4.38 × 10^− 6), rs2967871 (COTL1, 16q24.1, HR = 1.58, p-value = 6.07 × 10^− 6), and rs1082967 (PRKCI, 3q26.2, HR = 1.57, p-value = 7.86 × 10^− 6) represented p-values lower than the suggestive p-value (Table [99]2). Although none of these SNPs reached a Bonferroni-corrected p-value of 1.34 × 10^− 6, they suggested an association with OS in CRC. Fig. 1. [100]Fig. 1 [101]Open in a new tab Manhattan plots of immune-related SNP associations with survival. (A) Overall survival for all stages. (B) Overall survival for stages 1–3. (C) Progression-free survival for all stages. (D) Progression-free survival for stage 1–3 Each dot represents the test results for a single SNP. The x-axis represents the genomic location along each chromosome and the y-axis represents the log10 p-value. The solid horizontal line and the dotted horizontal line correspond to a p-value of 1.34 × 10^− 6 and 1.00 × 10^− 5, respectively Table 2. Top SNPs associated with survival among CRC patients with suggestive statistical significance OS Category SNP Chr Pos Reference allele Alternate allele Band Matched gene Hazard ratio p-value p-value (fixed) p-value (hetero) I^2 (%) All stages rs1414944 9 68,841,464 G A q21.11 PIP5K1B 1.5 3.39E-06 rs34009001 16 50,287,266 C T q12.1 ADCY7 3.28 4.38E-06 6.19E-06 5.17E-03 87.21 rs2967871 16 84,614,385 C T q24.1 COTL1 1.58 6.07E-06 rs1082967 3 170,230,623 C T q26.2 PRKCI 1.57 7.86E-06 Stage 1–3 rs117322760 8 109,495,997 G C, T q23.1 PKHD1L1 4.58 1.40E-06 rs117802394 7 80,873,623 T C q21.11 SEMA3C 6.29 1.47E-06 rs79798148 1 206,824,216 G A q32.1 IL19 2.5 3.67E-06 rs142842205 7 5,620,896 C G p22.1 RNF216 4.29 5.74E-06 rs147127574 12 103,667,078 A G q23.3 STAB2 3.82 7.82E-06 rs74996929 9 20,503,822 C A p21.3 MLLT3 2.91 7.90E-06 PFS Category SNP Chr Pos Reference allele Alternate allele Band Matched gene Hazard ratio p-value p-value (fixed) p-value (hetero) I^2 (%) All stages rs76181827 11 120,404,330 C G q23.3 ARHGEF12 2.77 1.03E-06 rs896531 11 3,902,679 A G p15.4 STIM1 2.03 4.99E-06 rs116332704 2 80,124,577 G A, C,T p12 CTNNA2 3.83 9.16E-06 Stage 1–3 rs143531681 7 151,004,636 T C q36.1 NOS3 4.67 9.72E-08 rs145049833 2 175,145,457 T C q31.1 ATF2 4.04 2.83E-07 rs76181827 11 120,404,330 C G q23.3 ARHGEF12 3.36 5.90E-07 [102]Open in a new tab Abbreviations: SNP, single nucleotide polymorphism; Chr, chromosome; Pos, position; fixed, fixed-effect; hetero, heterogenous; All stages, the discovery data set of 960 CRC patients with all stages in OS or PFS; Stage 1–3, the discovery data set of 795 CRC patients with stage 1–3 in OS or PFS; Furthermore, we conducted a stratification analysis of stages 1–3 in OS. The cluster of SNPs from stages 1–3 highlights important positions that have risen above the suggestive p-value (1.00 × 10^− 5) (Fig. [103]1B). No SNPs were observed in LD above the suggested p-value threshold in the stratification analysis by stage in contrast to the analysis of all stages based on OS. Six SNPs had p-values lower than the suggestive p-value: rs117322760 (PKHD1L1, 8q23.1, HR = 4.58, p-value = 1.40 × 10^− 6), rs117802394 (SEMA3C, 7q21.11, HR = 6.29, p-value = 1.47 × 10^− 6), rs79798148 (IL19, 1q32.1, HR = 2.5, p-value = 3.67 × 10^− 6), rs142842205 (RNF216, 7p22.1, HR = 4.29, p-value = 5.74 × 10^− 6), rs147127574 (STAB2, 12q23.3, HR = 3.82, p-value = 7.82 × 10^− 6), and rs74996929 (MLLT3, 9p21.3, HR = 2.91, p-value = 7.90 × 10^− 6) (Table [104]2). None of these SNPs reached a Bonferroni-corrected p-value of 1.34 × 10^− 6 from all stages. Information regarding all SNPs above the suggestive p-value (1.00 × 10^− 5) threshold is discussed in Supplementary Table [105]2, Additional file [106]2. For the meta-analysis of all stages of OS, only SNP (rs34009001) was selected from the validation study because it achieved a p-value below 0.05. This SNP showed an I^2 value of 87% and a heterogeneous p-value of 5.17 × 10^− 3 and its fixed-effect p-value was 6.19 × 10^− 6. Similarly, only one SNP (rs1111691) was selected from the validation study in the meta-analysis of stages 1–3 in OS, and it showed an I^2 value of 87%, a heterogeneous p-value of 5.62 × 10^− 3, and a fixed-effect p-value of 2.82 × 10^− 4. Although the validation study revealed high heterogeneity among the discovery and validation studies, the fixed effects p-values were < 0.05. The cluster of SNPs from all stages in PFS highlights important positions that have risen above the suggestive p-value (1.00 × 10^− 5) (Fig. [107]1C). None of the SNPs were in LD above the suggested p-value (Fig. [108]1C). In this analysis, three SNPs (rs76181827 (ARHGEF12, 11q23.3, HR = 2.77, p-value = 1.03 × 10^− 6), rs896531 (STIM1, 11p15.4, HR = 2.03, p-value = 4.99 × 10^− 6), and rs116332704 (CTNNA2, 2p12, HR = 3.83, p-value = 9.16 × 10^− 6) had p-values lower than the suggested p-value (Table [109]2). In contrast to previous results for OS, one SNP (rs76181827) exceeded the Bonferroni-corrected p-value of 1.34 × 10^− 6. Additionally, we conducted a stratification analysis of stage 1–3 of PFS. The cluster of SNPs highlights important positions that have risen above the suggestive p-value (1.00 × 10^− 5) (Fig. [110]1D). None of the SNPs were in LD above the suggested p-value (Fig. [111]1D). Three SNPs had p-values lower than the suggestive p-value: rs143531681 (NOS3, 7q36.1, HR = 4.67, p-value = 9.72 × 10^− 8), rs145049833 (ATF2, 2q31.1, HR = 4.04, p-value = 2.83 × 10^− 7), and rs76181827 (ARHGEF12, 11q23.3, HR = 3.36, p-value = 5.90 × 10^− 7) (Table [112]2). All three SNPs exceeded the Bonferroni-corrected p-values of 1.34 × 10^− 6. For the meta-analysis of all stages of PFS, none of the SNPs were selected from the validation study because none of the significant SNPs in the discovery study achieved a p-value lower than 0.05 in the validation study. However, two SNPs (rs11906029 and rs4616886) were selected from the validation study for the meta-analysis of stages 1–3 of PFS. They showed I^2 values of 91% and 90%, respectively, heterogeneous p-values of 8.58 × 10^− 4 and 1.76 × 10^− 3, respectively, and their fixed-effect p-values were 2.58 × 10^− 3 and 1.32 × 10^− 3, respectively. The fixed effects p-values were < 0.05 although the validation study revealed high heterogeneity among the discovery and validation studies. Polygenic risk score association on overall and progression-free survival Initially, the PRS for the non-event and event groups was the same at 0.051 in the analysis of all stages of OS. The PRS distribution overlapped in the density and box plots (Fig. [113]2A). Second, the PRS for the non-event and event groups were 0.020 and 0.421, respectively in the analysis of stage 1–3 OS. This result was significantly different from the analysis of all stages (Fig. [114]2B). The PRS was not feasible as a biomarker for analyzing all stages of OS according to the survival plot and HR (Fig. [115]2A). This contrasted with the analysis of OS at stages 1–3, which significantly differed (Fig. [116]2B). The difference in the survival curve plot is consistent with the HR for both groups. The HR for high and middle-grade subgroups of all stages were 1.07 (p-value 0.7) and 1.03 (p-value, 0.9), respectively with the low-grade subgroup assigned as a reference. In contrast, the HR for subgroups of stages 1–3 was 59.58 and 9.69, respectively (p-values < 0.001 for both) using the low-grade subgroup as the reference. Additionally, the concordance index (C-index) of all stages of OS with PRS was 0.78, which is equal to the value obtained without PRS. Moreover, the C-index of stages 1–3 of OS with PRS was 0.77, slightly higher than 0.76 observed from stages 1–3 of OS without PRS. In the analysis of all stages of PFS, the PRS for the non-event and event groups were 0.078 and 0.082, respectively. The PRS distribution overlapped in the density and box plots (Fig. [117]2C). The PRS for the non-event and event groups were 0.032 and 0.050 in the analysis of stages 1–3 of PFS. The results of this analysis significantly differed from those of the analysis at all the stages (Fig. [118]2D). Only the high PRS differed between the middle- and low-grade groups according to the survival plot and HR (Fig. [119]2C). According to the results for stages 1–3, PRS is a biomarker with significantly distinct survival curves across each group. The differences in the survival curve plot were consistent with the HR results for both groups. The HR for high- and middle-grade subgroups of all stages were 1.83 and 1.17, as the low-grade subgroup was assigned to a reference. Additionally, the p-value for the middle-grade subgroup was 0.3 and the p-value for the high-grade subgroup was < 0.001. The results for stages 1–3 were slightly different from the previous results for all stages. The HR for stages 1–3 were 9.61 and 4.21, as the low-grade subgroup was assigned to a reference. The p-values for this outcome were < 0.001 for both groups. Furthermore, the C-index of all stages of PFS with PRS was 0.82, higher than the 0.70 observed from all stages of PFS without PRS. Lastly, the C-index of stages 1–3 of PFS with PRS was 0.78, higher than the corresponding value of 0.67 without PRS. Fig. 2. [120]Fig. 2 [121]Open in a new tab Polygenic risk score (PRS) distribution and associations with survival. (A) Overall survival for all stages. (B) Overall survival for stages 1–3. (C) Progression-free survival for all stages. (D) Progression-free survival for stage 1–3 The PRS value was used as a continuous variable for distribution and box plots; its grade was used as a categorical variable for the survival curve and forest plot Gene-gene interaction network on overall and progression-free survival The gene-gene interaction network from all stages of OS is depicted in Fig. [122]3A and the information for both the top edge and nodes is discussed in Table [123]3. Orange-colored nodes are query genes matched to significant SNPs and light-yellow nodes are predicted genes from GeneMANIA. Each node had 45.95% and 54.05%. For each gene, the individual scores ranged from 0.006 to 0.039 for the predicted genes, and 0.497–0.931 for the query genes. Additionally, there were four categories of edges (co-expression, genetic interactions, pathways, and physical interactions), with the proportions of edges equal to 10.10%, 57.58%, 26.26%, and 6.06%, respectively. The edges showed normalized max weights from 1.09 × 10^− 6 to 0.087 and “Pathway” usually showed larger weights than other types of edges (Supplementary Table [124]3, Additional file [125]2). DTX1 (predicted gene) and ADAM10 (query gene) had the largest weights. Fig. 3. [126]Fig. 3 [127]Open in a new tab Gene-gene interaction networks on survival. (A) Overall survival for all stages. (B) Overall survival for stages 1–3. (C) Progression-free survival for all stages. (D) Progression-free survival for stage 1–3 Orange nodes represent the query genes. Light-yellow nodes indicate predicted genes. Each edge has a different value for the relationship between two nodes (genes) and is distinctively colored. The thickness of each edge indicated its weight Table 3. Top edge and node in gene-gene interaction network OS Top edge Top node Data type Name Normalized max weight Gene Name Score Category All stages Pathway H__sapiens__1_-779,769| H__sapiens__1_-780,309| Pathway 0.087 DTX1 H__sapiens__1_-779,769 0.039 predicted ADAM10 H__sapiens__1_-780,309 0.749 query Stage 1–3 Physical Interactions H__sapiens__2_-775,897| H__sapiens__2_-777,953| Physical Interactions 0.123 PIK3R2 H__sapiens__2_-775,897 0.035 predicted PIK3CA H__sapiens__2_-777,953 0.469 query PFS Top edge Top node Data type Name Normalized max weight Gene Name Score Category All stages Predicted H__sapiens__3_-773,060| H__sapiens__3_-775,518| Predicted 0.139 ALOX5 H__sapiens__3_-773,060 0.089 predicted COTL1 H__sapiens__3_-775,518 0.761 query Stage 1–3 Shared protein domains H__sapiens__5_-774,226| H__sapiens__5_-774,129| Shared protein domains 0.011 DCT H__sapiens__3_-774,226 0.038 predicted TYR H__sapiens__3_-774,129 0.751 query [128]Open in a new tab Abbreviations: All stages, the discovery data set of 960 CRC patients with all stages in OS or PFS; Stage 1–3, the discovery data set of 795 CRC patients with stage 1–3 in OS or PFS; The gene-gene interaction network from stages 1–3 in the OS is depicted in Fig. [129]3B and detailed information for both the top edges and nodes is discussed in Table [130]3. Query genes accounted for 59.18%, and predicted genes accounted for 40.82%. For each gene, the individual scores ranged from 0.009 to 0.041 for the predicted genes and from 0.469 to 0.991 for the query genes. In addition to genes, there were three categories of edges (co-expression, pathway, and physical interactions), and the proportions of edges were 53.26%, 37.68%, and 8.70%, respectively. The edges showed normalized maximum weights from 8.00 × 10^− 6 to 0.123, and physical Interactions usually showed larger weights than other types of edges (Supplementary Table [131]3, Additional file [132]2). PIK3R2 (predicted gene) and PIK3CA (query gene) had the largest weights. The gene-gene interaction network from all stages in PFS is depicted in Fig. [133]3C and the information for both the top edge and nodes is discussed in Table [134]3. Query genes accounted for 55.56%, and predicted genes accounted for 44.44%. For each gene, the individual scores ranged from 0.030 to 0.089 for the predicted genes and 0.473 to 0.968 for the query genes. Additionally, there were five categories of edges (co-expression, genetic Interactions, physical interactions, predicted, and co-localization), and the proportions of edges were 25.41%, 49.18%, 4.92%, 15.57%, and 4.92%, respectively. Furthermore, the edges showed normalized max weights from 2.36 × 10^− 6 to 0.139, and “predicted” usually showed larger weights than other types of edges (Supplementary Table [135]3, Additional file [136]2). ALOX5 (predicted gene) and COTL1 (query gene) had the greatest weight. The gene-gene interaction network from stages 1–3 in PFS is depicted in Fig. [137]3D and detailed information for the top edge and nodes is discussed in Table [138]3. Query genes and predicted genes accounted for 63.64% and 36.36%, respectively. For each gene, the individual scores ranged from 0.003 to 0.009 for the predicted genes and from 0.482 to 0.964 for the query genes. Seven categories of edges were observed (co-expression, genetic interactions, pathways, physical interactions, shared protein domains, co-localization, and predicted), and the proportions of edges were 60.40%, 20.23%, 3.70%, 5.13%, 8.55%, 1.71%, and 0.28%, respectively. The edges showed normalized max weights from 1.71 × 10^− 7 to 0.012, and the shared protein domains usually showed larger weights than other types of edges (Supplementary Table [139]3, Additional file [140]2). DCT (predicted gene) and TYR (query gene) had the greatest weight. Pathway enrichment analysis of PRS gene sets on overall and progression-free survival The pathway enrichment analysis results from all stages and stages 1–3 of OS, PFS is discussed in Supplementary Table [141]4. From the results for all stages in OS, we identified ‘RUNX3 regulates RUNX1-mediated transcription’ with RUNX1 and ‘BH3-only proteins associate with and inactivate anti-apoptotic BCL-2 members’ with BCL2. Each pathway showed p-values of 1.35 × 10^− 5 and 1.01 × 10^− 4 respectively. Additionally, for stage 1–3 in OS, none of the pathway results were statistically significant. Furthermore, from the results for all stages in PFS, we identified ‘RUNX2 regulates genes involved in differentiation of myeloid cells’ and ‘RUNX2 regulates chondrocyte maturation’ with RUNX2, with p-values of 7.58 × 10^− 5 and 1.03 × 10^− 4. Lastly, from the results for stage 1–3 in PFS, we identified ‘Insulin-like Growth Factor-2 mRNA Binding Proteins (IGF2BPs/IMPs/VICKZs) bind RNA’ with CD44, ‘Signaling by VEGF’ and ‘VEGFA-VEGFR2 Pathway’ with four genes (PIK3CA, NOS3, ITPR1, PRKCZ, VEGFA), and ‘VEGFR2 mediated cell proliferation’ with three genes (ITPR1, PRKCZ, VEGFA). Each pathway presented p-values of 7.04 × 10^–10, 4.93 × 10^− 6, 5.18 × 10^− 5 and 1.31 × 10^− 4 respectively. Discussion This study investigated the association between SNPs in immune-related genes and the clinical outcomes of patients with CRC in South Korea. We identified 15 independent immune-related SNPs that were significantly associated with OS and PFS. These findings indicate the potential use of novel biomarkers for CRC prognosis. In the analysis of OS, four SNPs (rs1414944, rs34009001, rs2967871, and rs1082967) were associated with outcomes in all stages, and six SNPs (rs117322760, rs117802394, rs79798148, rs142842205, rs147127574, and rs74996929) in stage 1–3. In the PFS analysis, we observed that three SNPs (rs76181827, rs896531, and rs116332704) were associated with outcome in all stages and three SNPs were associated in stages 1–3 (rs143531681, rs145049833, and rs76181827). One of the significant differences in the OS results was that several SNPs exceeded the Bonferroni-corrected p-value threshold (rs76181827 in all stages and rs143531681, rs145049833, and rs76181827 in stage 1–3). In the PRS analysis of OS, higher PRS grades were associated with poorer survival outcomes at all stages and stages 1–3. Moreover, we uncovered the potential for a new interpretation of elements that were deemed less significant in the individual SNP analysis by conducting a gene-gene interaction network analysis. The PRS analysis of PFS showed a similar trend. A higher PRS grade is associated with poorer progression outcomes at all stages and stage 1–3. However, this differed from the OS results. As more SNPs showed significance in the PFS analysis, the role of PRS as a biomarker appeared to be better in PFS than in OS. Although none of the SNPs exceeded the Bonferroni-corrected p-value threshold, we performed additional analysis with extended suggestive SNPs for PRS and gene-gene interaction networks. In the gene-gene interaction network of OS at all stages, the pathway edge between DTX1 and ADAM10 had the largest weight. A correlation exists between both genes in the Notch signaling pathway [[142]38], which is commonly mutated in various types of cancer. Although no direct connection between ADAM10 and DTX1 has been identified, individual studies on both genes have revealed their correlation with CRC survival. Several studies on ADAM10 and CRC show that higher ADAM10 expression is associated with higher tumor stage or poorer survival outcomes (metastasis) in CRC [[143]39, [144]40]. Increased DTX1 expression is associated with a poor prognosis for CRC [[145]41]. Additionally, PIK3R2 and PIK3CA held the highest weights for physical interactions in the OS of stages 1–3. Both genes encode the PI3K familial proteins, which have been extensively studied for anticancer therapy [[146]42]. Both genes have been studied for their association with CRC. Differential expression analysis of the miRNA profile showed that PIK3R2 is significantly enriched in the CRC pathway [[147]43]. For PIK3CA, mutations in the PIK3CA gene can lead to increased PI3K activity, which promotes cell growth, survival, and proliferation in CRC [[148]44]. The predicted edge between ALOX5 and COTL1 had the largest weight in the PFS for all stages. Both genes are involved in the association in the lipoxygenase pathway, which is significant in cancer progression and metastasis [[149]45]. MiR-216a-3p inhibits CRC cell proliferation and targeted ALOX5 [[150]46]. COTL1 has not been reported to cause CRC. However, the differential expression level of COTL1 correlates with the metastatic features of non-small cell lung cancer (NSCLC) [[151]47]. Further research on COTL1 may reveal its correlation with survival outcomes considering that ALOX5 strongly correlates with CRC. Moreover, the shared protein domain edge between DCT and TYR showed the greatest weight in the PFS of stages 1–3. Dopachrome tautomerase (DCT) correlates with CRC since it promotes COX-2 expression through β-catenin mediated mechanisms and these two factors are critical contributors to CRC [[152]48]. Although TYR is unreported in CRC, tyrosinase suppresses vasculogenic mimicry in human melanoma cells [[153]49]. An alteration of rs149180340 matched to TYR could be interpreted as a risk factor for CRC because this result suggests the anti-oncogenic effects of TYR. Moreover, it is essential to acknowledge the strengths and limitations of this study. First, we identified novel SNPs that were not previously reported to be associated with CRC prognosis. These SNPs were associated with OS and PFS and would serve as potential biomarkers for CRC prognosis and personalized medicine. Second, we discovered a significant implication of the PRS as a biomarker of CRC prognosis. Previous studies have used the PRS in CRC case-control studies and normal populations. However, our study utilized PRS to analyze OS and PFS in CRC, providing a comprehensive approach to clinical outcomes. Third, we performed a network-based analysis of post-GWAS data that allowed us to uncover novel gene-gene interactions. This provides a deeper understanding of genetic interactions in CRC. However, this study has several limitations. First, the most significant limitation was its small sample size. This limitation may have contributed to the failure of the validation study, the large I^2 values, and the low statistical significance. Further studies with larger populations are needed to confirm our findings and to provide a more robust understanding of the association between SNPs matched to immune-related genes and CRC prognosis. Second, we could not retrieve p-values exceeding the Bonferroni-corrected threshold for certain SNPs. This limitation suggests that the association of these SNPs with CRC prognosis may not be as strong as initially planned. Conclusions In summary, we identified novel predictive biomarker genetic variants of PKHD1L1 and NOS3 associated with OS and PFS, respectively, among patients with CRC. Additionally, we identified PRS as a potential biomarker for survival outcomes and suggested that a gene-gene interaction network is involved in CRC progression. Future studies including larger and more racially and ethnically diverse populations and integrating multi-omics data (such as RNA-seq and metabolomics profiling) will promote a deeper understanding of CRC survival and the development of immune-related therapeutic interventions for patients with CRC. Electronic supplementary material Below is the link to the electronic supplementary material. [154]12885_2025_13819_MOESM1_ESM.pdf^ (614.3KB, pdf) Additional file 1 included supplementary figures. Supplementary Fig. 1. Flowchart of the study population selection. Supplementary Fig. 2. Variance inflation factor estimation to detect multicollinearity of potential confounding factors. Supplementary Fig. 3. Flowchart of gene and SNP selection. Supplementary Fig. 4. Formula for calculating polygenic risk score (PRS) [155]12885_2025_13819_MOESM2_ESM.xlsx^ (137KB, xlsx) Additional file 2 included supplementary tables. Supplementary Table 1. Participant characteristics in the validation study. Supplementary Table 2. Additional information on extended suggestive SNPs. Supplementary Table 3. Additional information about the network. Supplementary Table 4. Pathway enrichment analysis results of PRS gene sets Acknowledgements