Abstract This study aimed to explore the underlying molecular mechanisms of colorectal cancer (CRC) using bioinformatics analysis. Using [30]GSE4107 datasets downloaded from the Gene Expression Omnibus, the differentially expressed genes (DEGs) were screened by comparing the RNA expression from the colonic mucosa between 12 CRC patients and ten healthy controls using a paired t-test. The Gene Ontology (GO) functional and pathway enrichment analyses of DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) software followed by the construction of a protein–protein interaction (PPI) network. In addition, hub gene identification and GO functional and pathway enrichment analyses of the modules were performed. A total of 612 up- and 639 downregulated genes were identified. The upregulated DEGs were mainly involved in the regulation of cell growth, migration, and the MAPK signaling pathway. The downregulated DEGs were significantly associated with oxidative phosphorylation, Alzheimer’s disease, and Parkinson’s disease. Moreover, FOS, FN1, PPP1CC, and CYP2B6 were selected as hub genes in the PPI networks. Two modules (up-A and up-B) in the upregulated PPI network and three modules (d-A, d-B, and d-C) in the downregulated PPI were identified with the threshold of Molecular Complex Detection (MCODE) Molecular Complex Detection (MCODE) score ≥4 and nodes ≥6. The genes in module up-A were significantly enriched in neuroactive ligand–receptor interactions and the calcium signaling pathway. The genes in module d-A were enriched in four pathways, including oxidative phosphorylation and Parkinson’s disease. DEGs, such as FOS, FN1, PPP1CC, and CYP2B6, may be used as potential targets for CRC diagnosis and treatment. Keywords: molecular mechanisms, network module, enrichment analysis Introduction Colorectal cancer (CRC) is a leading cause of cancer-related deaths worldwide, accounting for more than 1 million cases and 600,000 deaths every year.[31]^1 CRC is a cancer mostly due to uncontrolled cell growth in the colon, rectum, or appendix. Many factors may cause CRC including smoking, obesity, alcohol, older age, high fat intake, and a lack of physical exercise.[32]^2 For patients in the age range of 55–64 years, flexible sigmoidoscopic screening has been used to treat CRC and could reduce its incidence and mortality by 33% and 43%, respectively.[33]^3 Although colonoscopic screening has become a reliable screening tool for treatment of CRC, it has the shortcomings of bleeding and perforations, as well as a tendency to cause acute diverticulitis.[34]^4 Therefore, the identification of reliable, accurate, and sensitive biomarkers to improve the detection of CRC is imperative. Over the past decades, various genetic studies have been established, which are associated with the accumulated process and progress of CRC. For example, a recent study identified that the microRNA-126 that targets CXCR4 was downregulated in CRC tissue and was involved in distant metastasis, clinical tumor–node–metastasis staging, and poor survival.[35]^5 Xu et al[36]^6 reported that downregulated microRNA-375 in tissue and plasma was matched in CRC and associated with some critical signal pathways, such as MAPK, Wnt, and TGF-beta signal pathways in the development and progression of CRC. In addition, based on gene/exon/network-level analysis tools, ATP8B1 was found to be a novel gene associated with CRC, which changed at the cytogenetic, gene, and exon levels.[37]^7 After implementing a computational compilation of protein conformational changes and the probable CRC-associated phenotype, Kumar et al[38]^8 found that the E403K mutation in the mitotic centromere-associated kinesis protein could produce CRC-associated phenotypic affects by producing a major impact on the structural conformation of the protein. Recent evidence generated from an experimental interleukin-10- deficient mouse model using microbial genomic analysis showed that inflammation played an important role in bacteria-induced CRC.[39]^9 Although these expanded efforts studied the genetic bases of CRC, the molecular mechanisms of the development and progress are not fully understood. Recently, gene expression profile data associated with CRC have been studied by many researchers. For example, Liu et al[40]^10 downloaded the microarray data of [41]GSE4107 and [42]GSE8671, screened common differentially expressed genes (DEGs) between CRC and healthy controls (HC), and performed Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. However, they did not perform further study of DEGs, such as the construction of a protein–protein interaction (PPI) network and module analysis of the PPI network. In the present study, using the same microarray data ([43]GSE4107), we compared RNA expression from colonic mucosa between 12 patients with CRC and ten HC samples to identify the DEGs. In addition, we constructed the PPI network of the DEGs and performed hub gene identification and module analysis of the PPI network. The findings of this study may provide an improved understanding of and lead to an improved diagnosis of CRC. Materials and methods Affymetrix chip data The Gene Expression Omnibus (GEO) ([44]http://www.ncbi.nlm.nih.gov/geo/) database at the National Center for Biotechnology Information (NCBI) is currently the largest fully public gene expression resource, which includes 214,268 samples and 4,500 platforms.[45]^11 The chip dataset [46]GSE4107, which included 12 CRC patients and ten HC samples, was downloaded from GEO.[47]^12 The HC and CRC samples were age- (50 years or less), ethnicity- (Chinese), and tissue-matched. The RNA extracted from the colonic mucosa of the HC and CRC samples was analyzed using the [48]GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array. Data preprocessing and analysis of DEGs The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays.[49]^13 Data preprocessing was performed using robust multi-array average in the affy package in R, including background adjustment, normalization, and log transformation of the values.[50]^14 Then, the original CEL file data were transformed to probe-level data. Further, the probe-level data were converted to gene symbols using the Perl procedure given that when several probes corresponded to one gene symbol, the gene expression value was the mean of the probes. Multiple testing was corrected using the Benjamini–Hochberg[51]^15 procedure to obtain the adjusted P-value. Then, the log2-fold change (log2FC) was calculated. Only genes with an adjusted P-value <0.05 and |log2FC| >1 were regarded as DEGs. Functional and pathway enrichment analysis As a comprehensive set of functional annotation tools, the Database for Annotation, Visualization and Integrated Discovery (DAVID) has been used for systematic and integrative analysis of large gene lists.[52]^16 GO terms are significantly overrepresented in a set of genes from three aspects, including the cellular component, molecular function, and biological process.[53]^17 In our work, the significant GO biological process terms and KEGG pathway enrichment analyses of the identified DEGs were performed using DAVID with the thresholds of P-value <0.05 and enrichment gene count >2. PPI network construction The Search Tool for the Retrieval of Interacting Genes (STRING) database ([54]http://www.bork.embl-heidelberg.de/STRING/)[55]^18 is a pre-computed global resource for the exploration and analysis of PPI information. In the present study, the STRING 9.0 online tool was used to screen the PPI of the DEGs. The DEGs with required confidence (combined score) >0.4 were selected, and then the PPI network was constructed and visualized using Cytoscape.[56]^19 Given that most of the networks were scale-free, the hub genes were then selected with a connectivity degree >10. Module analysis of the PPI network The module analysis was performed on the PPI network using the Molecular Complex Detection (MCODE) in Cytoscape with a degree cutoff =2, node score cutoff =0.2, k-core =2, and max depth =100.[57]^20 Then, the significant modules with MCODE score ≥4 and node ≥6 were selected. Next, GO functional and KEGG pathway enrichment analyses of the most significant module were performed with a threshold of P<0.05. Results Identification of DEGs Using bioinformatics approaches, a total of 1,251 DEGs between ten HC and 12 CRC samples were identified, including 612 upregulated and 639 downregulated genes. GO functional and pathway enrichment analysis Based on DAVID software, a total of 29 GO functions were enriched. Among them, the upregulated DEGs were mainly enriched in 15 GO functions and the downregulated DEGs were mainly enriched in 14 GO functions. The top five GO terms in each are shown in [58]Table 1. The most significant GO function in the upregulated DEGs was the regulation of cell growth, and the most significant one in the downregulated DEGs was Golgi vesicle transport. Table 1. GO functional enrichment analysis of up- and downregulated DEGs (top five in each) Category Term Description Count P-value Size Upregulated  BP GO:0001558 Regulation of cell growth 20 3.38E-04 193  BP GO:0016477 Cell migration 24 6.74E-04 270  BP GO:0044057 Regulation of system process 26 7.02E-04 305  BP GO:0040008 Regulation of growth 27 1.31E-03 336  BP GO:0008361 Regulation of cell size 19 1.81E-03 205 Downregulated  BP GO:0048193 Golgi vesicle transport 16 3.24E-05 131  BP GO:0015031 Protein transport 45 2.18E-04 752  BP GO:0045184 Establishment of protein localization 45 2.63E-04 759  BP GO:0046907 Intracellular transport 40 3.21E-04 652  BP GO:0070727 Cellular macromolecule localization 28 6.28E-04 409 [59]Open in a new tab Notes: Category: GO function. Count: the number of DEGs. The P-value was adjusted using the Benjamini–Hochberg method. Size: the total number of genes in the GO BP. Abbreviations: BP, biological process; DEGs, differentially expressed genes; GO, Gene Ontology. After KEGG pathway enrichment analysis, the upregulated DEGs were mainly enriched in seven KEGG pathways including vascular smooth muscle contraction, axon guidance, and the MAPK signaling pathway. The downregulated DEGs were mainly enriched in ten pathways, such as oxidative phosphorylation and Parkinson’s disease ([60]Table 2). Table 2. The enriched KEGG pathway of DEGs Category Term Description Count P-value Size Upregulated  KEGG hsa04270 Vascular smooth muscle contraction 10 3.45E-02 111  KEGG hsa04360 Axon guidance 11 3.50E-02 129  KEGG hsa04010 MAPK signaling pathway 18 3.93E-02 266 Downregulated  KEGG hsa00190 Oxidative phosphorylation 18 6.70E-06 119  KEGG hsa05012 Parkinson’s disease 17 1.59E-05 114  KEGG hsa05010 Alzheimer’s disease 19 7.04E-05 156  KEGG hsa05016 Huntington’s disease 20 7.42E-05 171  KEGG hsa00970 Aminoacyl-tRNA biosynthesis 8 1.33E-03 41  KEGG hsa00830 Retinol metabolism 8 3.83E-03 49  KEGG hsa00860 Porphyrin and chlorophyll metabolism 6 7.30E-03 30  KEGG hsa00280 Valine, leucine, and isoleucine degradation 6 3.49E-02 44  KEGG hsa04664 Fc epsilon RI signaling pathway 8 3.80E-02 76  KEGG hsa04666 Fc gamma R-mediated phagocytosis 9 4.13E-02 94 [61]Open in a new tab Notes: Count: the number of DEGs. Size: the total number of genes in the pathway. Abbreviations: DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes. PPI network construction and hub gene identification After the upregulated and downregulated DEGs were submitted into STRING, a total of 998 upregulated and 900 downregulated PPI relationships were obtained, respectively. Then, the hub genes or proteins in the networks with connectivity degree >10 were identified ([62]Table 3). A total of 44 hub genes were selected from the upregulated PPI network, which included FBJ murine osteosarcoma viral oncogene homolog (FOS), fibronectin 1 (FN1), and CREB binding protein (CREBBP). Meanwhile, 43 hub genes, such as protein phosphatase 1, catalytic subunit, gamma isozyme (PPP1CC); ATP synthase, H+ transporting, mitochondrial Fo complex, subunit B1 (ATP5F1); and cytochrome P450, family 2, subfamily B, polypeptide 6 (CYP2B6), were identified from the downregulated PPI network. Table 3. The hub proteins in the upregulated and downregulated protein–protein interaction network (top 20 in each) ID Degree ID Degree ID Degree ID Degree Upregulated  FOS 41 IGF1 25 DES 19 CALD1 16  FN1 34 CD44 25 EGR1 19 CAV1 16  CREBBP 33 COL14A1 21 TAGLN 17 RET 16  ESR1 32 MGP 20 ATF3 17 ACTA2 15  DCN 26 GSK3B 20 MYH11 16 RUNX1T1 15 Downregulated  PPP1CC 24 HDAC1 22 SOD2 20 ATP5O 18  ATP5F1 24 GAPDH 22 PPARA 19 RAC1 17  CYP2B6 23 MAPK1 22 UQCRH 19 COX4I1 17  ATP5J 23 ATP5L 21 HINT1 18 C14orf2 16  DHX15 23 UQCR11 20 ATP5G3 18 COX7C 15 [63]Open in a new tab Note: Degree: the connectivity of the protein. Module analysis Two modules with MCODE scores ≥4 and nodes ≥6 were selected in the upregulated PPI networks: up-A (MCODE score =6.429) with 29 nodes and up-B (MCODE score =4.522) with 24 nodes ([64]Figure 1). Meanwhile, there were three modules in the downregulated PPI network: d-A (MCODE score =14) with 15 nodes, d-B (MCODE score =7.75) with nine nodes, and d-C (MCODE score =7.714) with eight nodes ([65]Figure 2). Figure 1. [66]Figure 1 [67]Open in a new tab The significant modules in the upregulated protein–protein interaction network with MCODE score ≥4 and node ≥6. Notes: (A) Module up-A. (B) Module up-B. The node stands for the protein (gene); yellow stands for seed nodes; the width of edge was determined according to the combined score of the protein–protein interaction relationship. Abbreviation: MCODE, Molecular Complex Detection. Figure 2. [68]Figure 2 [69]Open in a new tab The significant modules in the downregulated protein–protein interaction network with MCODE score ≥4 and node ≥6. Notes: (A) Module d-A. (B) Module d-B. (C) Module d-C. The node stands for the protein (gene); yellow stands for seed nodes; the width of edge was determined according to the combined score of the protein–protein interaction relationship. Abbreviation: MCODE, Molecular Complex Detection. The GO functional enrichment scores of modules up-A and d-A were 3.54 and 15.55, respectively. Module up-A was enriched in six functions such as behavior, taxis, and chemotaxis; module d-A was enriched in four functions such as oxidative phosphorylation and phosphorus metabolic process ([70]Table 4). In addition, the KEGG pathway analysis revealed that the genes in module up-A were significantly enriched in pathways including neuroactive ligand–receptor interactions and the calcium signaling pathway; the genes in module d-A were significantly enriched in oxidative phosphorylation, Parkinson’s disease, Alzheimer’s disease, and Huntington’s disease ([71]Table 5). Table 4. The GO functional enrichment analysis of DEGs in the up-A and d-A modules with the threshold of P<0.05 Category Term Description Count P-value Size Module up-A Enrichment score:3.54  BP GO:0007610 Behavior 11 2.57E-08 458  BP GO:0042330 Taxis 5 3.45E-04 154  BP GO:0006935 Chemotaxis 5 3.45E-04 154  BP GO:0006952 Defense response 7 1.53E-03 587  BP GO:0007626 Locomotory behavior 5 2.56E-03 264  BP GO:0006955 Immune response 5 4.88E-02 632 Module d-A Enrichment score:15.55  BP GO:0006119 Oxidative phosphorylation 9 2.53E-15 91  BP GO:0016310 Phosphorylation 9 8.11E-08 781  BP GO:0006793 Phosphorus metabolic process 9 3.72E-07 950  BP GO:0006796 Phosphate metabolic process 9 3.72E-07 950 [72]Open in a new tab Notes: Category: GO function. Count: the number of DEGs. Enrichment score: the score of GO enrichment. Size: the total number of genes in the GO BP. Abbreviations: BP, biological process; DEGs, differentially expressed genes; GO, Gene Ontology. Table 5. The KEGG pathway enrichment analysis of DEGs in the up-A and d-A modules with the threshold of P<0.05 Category Term Description Count P-value Size Module up-A  Pathway hsa04080 Neuroactive ligand–receptor interaction 7 1.16E-04 254  Pathway hsa04020 Calcium signaling pathway 6 2.07E-04 175 Module d-A  Pathway hsa00190 Oxidative phosphorylation 13 3.75E-20 119  Pathway hsa05012 Parkinson’s disease 12 1.17E-17 114  Pathway hsa05010 Alzheimer’s disease 12 4.20E-16 156  Pathway hsa05016 Huntington’s disease 12 1.19E-15 171 [73]Open in a new tab Notes: Category: KEGG pathway. Count: the number of DEGs. Size: the total number of genes in the pathway. Abbreviations: DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes. Discussion The analysis of gene expression profiling has been widely used to reveal the abnormally expressed genes associated with CRC and has enabled the identification of targets for therapeutic strategies. In our study, the CRC mechanism was analyzed by bioinformatics, including DEGs screening, PPI network construction, hub gene identification, and module analysis of the PPI network. Based on these results, the underlying molecular mechanisms of CRC were explored on genetic and molecular levels, which provided new insights into CRC diagnosis and treatment. After analyzing the hub genes from the upregulated PPI network, FOS and FN1 were selected with the high connective degree. The FOS gene family was demonstrated to encode leucine zipper proteins and act as a regulator of cell proliferation, differentiation, and transformation.[74]^21 FOS was upregulated in the mucosa of CRC and participated in the biological processes of transcription regulation, cell growth, and inflammatory response in early-onset CRC.[75]^16 In addition, the overexpression of Fra-1, a member of the FOS family, played an essential role in tumorigenesis and was required for the motility and invasion of CRC cells through the downregulation of β1-integrin, resulting in decreased cell-substratum adhesion and cell inactivation of the Pho-ROCK pathway.[76]^22^,[77]^23 FN1 has been found to play a major role in cell adhesion, growth, migration, differentiation, and wound healing.[78]^24 Increased expression of FN1 was found in node-positive CRC and mediated migration of CRC cells in vitro.[79]^25 Additionally, it is well established that FN expression is involved in the tissue remodeling processes that are associated with CRC progression.[80]^26 Together with the genes that participate as mediators of cytoskeletal organization and integrin signaling, they form a pattern of colon cancer-specific changes and particularly impact cell motility.[81]^27 In our study, FOS and FN1 were upregulated in the PPI network in CRC, suggesting that FOS and FN1 may be enrolled in the process and progress of CRC. However, further studies are needed to verify our hypothesis. Additionally, PPP1CC and CYP2B6 were found as two hub genes in the downregulated PPI network. PPP1CC belongs to the protein phosphatase family, protein phosphatase 1 (PP1) subfamily. After analyzing the microarray data of colon cancer patients by using a Takagi-Sugeno-Kang (TSK)-type recurrent neural fuzzy network, Vineetha et al[82]^28 revealed that PPP1CC is associated with the GO term of proteolysis involved in cellular protein catabolic process and the insulin signaling pathway in CRC-related canonical pathways. In addition, PP1 was found to play an essential role in negatively regulating the extracellular signal-regulated kinase pathway in CRC cells.[83]^29 CYP2B6 encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 enzymes play an important role in tumor development through their metabolism of many carcinogens.[84]^30 Several cytochrome P450s were found via immunohistochemistry to have increased expression in CRC and may act as independent markers of prognosis.[85]^31 Compounds associated with the etiology of CRC contain polycyclic aromatic hydrocarbons, in particular, heterocyclic amines, which require metabolic activation by the cytochrome P450 enzymes before exerting their genotoxic effect.[86]^32 In addition, many of the cytochrome P450 enzyme genes such as CYP1A2 and CYP1B1 play essential roles in CRC susceptibility.[87]^33 Taken together, these data support the hypothesis that PPP1CC and CYP2B6 may act as candidate molecular markers associated with CRC. In the current study, our data were downloaded from the gene expression profile dataset [88]GSE4107, which has been reported by Hong et al.[89]^12 As in Hong et al’s study, we also analyzed the DEGs between CRC samples and HC and performed GO functional and pathway enrichment analyses to explore the potential molecular mechanism of CRC. However, there are still some discrepancies. Based on Hong et al’s study, further studies using bioinformatics were performed, such as the construction of a PPI network, hub gene analysis, and module analysis of the PPI network. Then, 998 upregulated and 900 downregulated PPI relationships were obtained followed by the identification of 44 hub genes from the upregulated PPI network and 43 from the downregulated PPI network. Additionally, two modules (up-A and up-B) in the upregulated PPI network and three modules (d-A, d-B, and d-C) in the downregulated PPI network were identified. However, there are some limitations in our study. First, the small amount of data used in the current study were downloaded from the GEO database, not generated by us. Because GEO is a huge data repository, a meta-analysis of the relevant datasets of CRC may be performed in future studies. Second, the results were web-based and were not verified by biological experiments. Thus, further experimental studies based on our findings are still needed. In summary, our study provides a comprehensive bioinformatics analysis of DEGs which may be involved in the progression and development of CRC. The findings of this study may provide a basis for understanding the underlying molecular mechanisms of CRC. In addition, DEGs such as FOS, FNI, PPP1CC, and CYP2B6 may be used as potential therapeutic targets for CRC. However, further studies are necessary for improving the diagnosis and treatment of CRC through regulating DEGs. Footnotes Disclosure The authors report no conflicts of interest in this work. References