Abstract Background Colorectal cancer (CRC) is one of the most common malignant tumors. In the present study, the expression profile of human multistage colorectal mucosa tissues, including healthy, adenoma, and adenocarcinoma samples was downloaded to identify critical genes and potential drugs in CRC. Methods Expression profiles, [32]GSE33113 and [33]GSE44076, were integrated using bioinformatics methods. Differentially expressed genes (DEGs) were analyzed by R language. Functional enrichment analyses of the DEGs were performed using the Database for Annotation, visualization, and integrated discovery (DAVID) database. Then, the search tool for the retrieval of interacting genes (STRING) database and Cytoscape were used to construct a protein–protein interaction (PPI) network and identify hub genes. Subsequently, survival analysis was performed among the key genes using Gene Expression Profiling Interactive Analysis (GEPIA). Connectivity Map (CMap) was used to query potential drugs for CRC. Results A total of 428 upregulated genes and 751 downregulated genes in CRC were identified. The functional changes of these DEGs were mainly associated with cell cycle, oocyte meiosis, DNA replication, p53 signaling pathway, and progesterone‐mediated oocyte maturation. A PPI network was identified by STRING with 482 nodes and 2,368 edges. Survival analysis revealed that high mRNA expression of AURKA, CCNB1, CCNF, and EXO1 was significantly associated with longer overall survival. Moreover, CMap predicted a panel of small molecules as possible adjuvant drugs to treat CRC. Conclusion Our study found key dysregulated genes involved in CRC and potential drugs to combat it, which may provide novel insights and potential biomarkers for prognosis, as well as providing new CRC treatments. Keywords: colorectal cancer, bioinformatics, biomarker, drug 1. INTRODUCTION Colorectal cancer (CRC) is one of the most common cancers with high morbidity worldwide. Despite advances in screening detection and new treatment strategies, CRC remains a leading cause of cancer‐associated mortality, due to the lack of effective diagnostic methods at the early stage and reduce sensitivity to chemotherapy (Weiser, [34]2018). Therefore, it is crucial to understand the precise molecular mechanisms involved in the carcinogenesis and thus develop promising prognostic biomarkers and potential therapeutic targets. Colorectal cancer is a complex bioprocess following the adenomacarcinoma multistage sequence (Testa, Pelosi, & Castelli, [35]2018). Thus, molecular dysregulations during the process of carcinogenesis, particularly during the precancerous stage, should be considered as the most important risk factors contributing to the progression of CRC. Several studies have used gene expression profiling to identify key genes between CRC samples and normal samples. Huang, Yang, and Huang ([36]2018) identified hundreds of CRC associated differentially expressed genes (DEGs) based on the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) database. An, Zhao, Yu, and Yang ([37]2019) found 35 genes significantly associated with patient survival based on the transcription profile of multi‐stage carcinogenesis and bioinformatics analysis. Hou et al. ([38]2018) found a collection of DEGs and DNA methylation aberrations in CRC. Hundreds of DEGs were detected. However, DEGs are different in different studies with only some of them consistently detected. Therefore, the discovery of novel effective therapeutic targets against CRC is urgently required. In this study, we selected the following microarray datasets [39]GSE33113 (de Sousa et al., [40]2011; Kemper et al.., [41]2012) and [42]GSE44076 (Closa et al., [43]2014; Cordero et al., [44]2014; Sanz‐Pamplona et al., [45]2014; Solé et al., [46]2014) from the GEO database to identify DEGs. Kyoto encyclopedia of genes and genomes (KEGG) and gene ontology (GO) pathway analysis were used to investigate DEGs. Then we identified hub genes from the common DEGs by constructing protein–protein interaction (PPI) network. Furthermore, the survival analysis was used on the gene expression profiling interactive analysis (GEPIA) website. Candidate small molecules were identified for their potential use in the treatment of CRC. 2. MATERIALS AND METHODS 2.1. Data resources To investigate the differential gene expression between CRC and normal samples, [47]GSE33113 and [48]GSE44076 microarray datasets were downloaded from the GEO website ([49]http://www.ncbi.nlm.nih.gov/geo/). These RNA profiles were provided on platform [50]GPL570 (Affymetrix Human Genome U113 Plus 2.0 Array) ([51]GSE33113) and [52]GPL13667([HG‐U219] Affymetrix Human Genome U219 Array) ([53]GSE44076). A total of 188 CRC samples and 154 normal samples were obtained in our study, including 40 tumor samples and 6 normal samples in [54]GSE33113 profile, 98 tumor samples, and 148 normal samples in [55]GSE44076 profile. 2.2. Identification of DEGs The original CIMFast Event Language files were downloaded and classified as CRC and normal groups. The raw data were standardized and transformed into expression values using the affy package of Bioconductor (Bioconductor, [56]http://www.bioconductor.org/). The significance analysis of the empirical bayes methods within limma package was applied to identify DEGs between CRC samples and normal tissue sample (Ritchie et al., [57]2015). Adj.p value <0.01 and |logFC| > 1 were set as the cutoff criteria to select the significant DEGs. 2.3. KEGG and GO enrichment analyses of DEGS To explore potential biological process (BP), molecular functions (MF), and cellular components (CC) related to the overlap DEGs, the Database for Annotation, Visualization, and Integrated Discovery (DAVID; [58]http://david.ncifcrf.gov) (version 6.7) was introduced to perform functional annotation and pathway enrichment analysis, including GO and KEGG pathway analysis (Ashburner et al., [59]2000; Dennis et al., [60]2003; "The Gene Ontology (GO) project," [61]2006; Huang, Sherman, & Lempicki, [62]2009; Kanehisa & Goto, [63]2000). GO is a major bioinformatics tool to annotate genes and analyze BP of these genes. KEGG is a database resource for understanding high‐level functions and biological systems from large‐scale molecular datasets generated by high‐throughput experimental technologies. p value <0.05 was considered statistically significant. 2.4. Protein–protein interaction (PPI) network construction and module analysis The Search Tool for the Retrieval of Interacting Genes database (STRING, [64]https://string-db.org/) is an online tool designed to analyze the PPI information (Damian et al., [65]2015). Analyzing the functional interactions between proteins may provide insights into the mechanisms of generation or development of diseases. In the present study, all the previously identified DEGs were submitted to STRING database for exploring their potential interactions. The interactions with a combined score >0.4 were considered significant and extracted for constructing the PPI networks through the Cytoscape software that is an open source bioinformatics software platform for visualizing molecular interaction networks (Bandettini et al., [66]2012). Subsequently, Molecular Complex Detection (MCODE) was used to screen significant modules from the PPI network with degree cutoff = 2, node score cutoff = 0.2, k‐core = 2, and max depth = 100 (Smoot, Ono, Ruscheinski, Wang, Ono, Ruscheinski, Wang, & Ideker, [67]2011). The functional and pathway enrichment analysis of for the significant modules was also performed. The networks gene oncology tool (BiNGO) plugin of Cytoscape was used to perform and visualize the BP analysis of the hub genes (Maere, Heymans, & Kuiper, [68]2005). 2.5. Analysis and validation of hub genes A network of module genes and their co‐expression genes was established by cBioPortal online platform ([69]http://www.cbioportal.org). To confirm the reliability of hub genes from our detection, we analyzed their prognostic and expression in CRC using Gene Expression Profiling Interactive Analysis (GEPIA), an interactive web application tool for gene expression analysis, containing 8,587 normal samples and 9,736 tumors samples from the Genotype‐Tissue Expression databases and TCGA databases (Cerami et al., [70]2012; Gao et al., [71]2013; Tang et al., [72]2017). Then the survival curve and box plot were performed to visualize the relationships. In addition, the protein expression of the hub genes between CRC and normal tissues was determined by the human protein atlas (HPA, [73]www.proteinatlas.org) database, an online tool for analyzing protein level from clinical samples. 2.6. Identification of small molecules The Connectivity Map (CMap, [74]http://www.broadinstitute.org/cmap/) was used to predict potential small active molecular that may induce or reverse the current biological status encoded by a particular gene expression signature (Lamb et al., [75]2006). We contrasted the DEGs with those participating in small active molecular interference in the CMap database to find potential small molecular related to these DEGs. First, the overlaps DEGs were divided into upregulated and downregulated groups. Then, these probe sets were used to query the CMap database. Finally, the enrichment score representing similarity was calculated, ranging from −1 to 1. A positive connectivity value (closer to +1) indicated the small molecules could induce the state of CRC cells, whereas a negative connectivity value (closer to −1) indicated greater similarity among the genes and the small molecules could reverse the above cancer cell status. 3. RESULTS 3.1. Identification of DEGs in CRC Analyzed with the Limma package, a total of 1,186 overlap DEGs expressed in CRC samples were extracted from the [76]GSE33113 and [77]GSE44076 datasets. The volcano plot of DEGs of CRC in each dataset was presented in Figure [78]1a. The Venn diagrams showed the 1,186 overlap DEGs among the three datasets (Figure [79]1bA) including 428 significantly upregulated genes (Figure [80]1bB) and 751 downregulated genes (Figure [81]1bC). Figure 1. Figure 1 [82]Open in a new tab (a) Volcano plot of gene expression profile data between CRC and normal tissues in each dataset. Red dots: significantly upregulated genes in CRC; Green dots: significantly downregulated genes in CRC; Black dots: nondifferentially expressed genes. Adj. p < 0.01 and |log2 FC| > 1 were considered as significant. (b) A. Venn diagram of 1,186 overlapping DEGs from [83]GSE33113 and [84]GSE44076 datasets including 428 upregulated DEGs and 751 downregulated DEGs. CRC, Colorectal cancer; DEGs, Differentially expressed genes 3.2. Enrichment analyses To explore the biological functions of identified DEGs, we performed functional and pathway enrichment analyses using DAVID. For biological processes, GO analysis results indicated that upregulated and downregulated DEGs were significantly enriched in xenobiotic glucuronidation, cellular glucuronidation, flavonoid glucuronidation, negative regulation of cellular glucuronidation, and negative regulation of glucuronosyl transferase activity. Cell component analysis showed that these DEGs were particularly involved in extracellular exosome, extracellular space, extracellular matrix, proteinaceous extracellular matrix, and cell surface. Similarly, changes in molecular function of DEGs were significantly enriched in glucuronosyl transferase activity, retinoic acid binding, chemokine activity, transferase activity, transferring hexosyl groups, and CXCR chemokine receptor binding. Additionally, the results of KEGG pathway analysis revealed that these DEGs were mainly enriched in Pentose and glucuronate interconversions, Ascorbate and aldarate metabolism, Drug metabolism‐cytochrome P450, Retinol metabolism, and Steroid hormone biosynthesis (Figure [85]2 & Table [86]1). Figure 2. Figure 2 [87]Open in a new tab Functional and signaling pathway analysis of the overlapped DEGs in CRC. (a) Biological processes (b) Cellular components (c) Molecular function (d) KEGG pathway. CRC, Colorectal cancer; DEGs, Differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes Table 1. Functional and pathway enrichment analysis of the overlap DEGs Category Term p Value Genes GOTERM_BP_DIRECT GO:0052697~xenobiotic glucuronidation 2.85E‐09 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT1A8, UGT1A3, UGT1A5, UGT1A4, UGT1A1 GOTERM_BP_DIRECT GO:0052695~cellular glucuronidation 7.93E‐09 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A4, UGT2A3, UGT2B15, UGT1A1 GOTERM_BP_DIRECT GO:0052696~flavonoid glucuronidation 3.33E‐08 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A5, UGT1A4, UGT2A3, UGT2B15, UGT1A1 GOTERM_BP_DIRECT GO:2001030~negative regulation of cellular glucuronidation 3.88E‐08 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT1A8, UGT1A3, UGT1A4, UGT1A1 GOTERM_BP_DIRECT GO:1904224~negative regulation of glucuronosyltransferase activity 3.88E‐08 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT1A8, UGT1A3, UGT1A4, UGT1A1 GOTERM_CC_DIRECT GO:0070062~extracellular exosome 2.01E‐14 LTBP4, PTGS1, CCT2, SELENBP1, EDIL3, OGN, ACTG2, CD44, CD46, SERPINE1, CFH, COL12A1, GNG2, CFI, GUCA2A, CFD, GUCA2B, RAB27A, GNG7, ACAA2, BSG, CLCA4, CRYAB, RAN, CDHR2, KRT12, LIFR, CDHR5, COLEC12, GLTP, THY1, PRELP, NAPEPLD, TACSTD2, PGM1, RAB15, GCNT3, AHCY, GNAI1, MME, SERPINH1, EPHB4, ACAT1, DAAM2, KRT24, GPD1L, SERINC2, SIAE, FGL2, FAM162A, SPP1, BMP3, MGAT4A, CFB, MAOB, IL1RN, GARS, S100A11, C16ORF89, HBA2, HBA1, LGALS9, GART, PPA1, GGT6, CD55, CKMT1A, SFRP1, FCGBP, PLAU, CKMT1B, NAAA, ANPEP, SLC26A2, SPINK5, CKB, HSPH1, SLC23A1, SMPDL3B, SMPDL3A, GPX3, SLC22A5, FAM129A, CEACAM1, KCNMA1, CCT6A, FLNB, TRAP1, MTMR11, PFDN2, CHMP1B, BGN, SERPINB5, CLIC5, CLIC6, RUVBL1, CHL1, C7, SORD, UGDH, CLDN11, ITM2C, CCL28, ITM2A, GLIPR2, GPM6A, CD177, AGT, SULT1C2, C2, THBS1, CD27, TMPRSS2, CES3, BCAS1, CES2, SHMT2, TRHDE, CAPN5, SI, APPL2, PCK1, FBLN1, PROM2, GLA, PLSCR4, TOM1L2, PTP4A1, EEF1E1, GFRA1, CP, PSAT1, PAICS, HPGD, CDH11, GNPDA1, TSPAN1, NIT1, THRB, TSPAN3, MMP9, GNA11, IQGAP2, SLC7A5, CXCL12, CMBL, PRKAR2B, ASPA, SLC16A1, TGFBI, IL1B, PRKACB, SLC4A4, NEGR1, MB, COCH, SPARCL1, RBKS, METTL7A, NEBL, PCOLCE2, TRIM36, AKR1B10, COL1A2, SNRPB, MFAP4, DST, SRI, WNT5A, FXYD3, PPIL1, ITLN1, CLU, PSMA7, TIMP1, B3GNT8, REG1A, CSE1L, OSBPL1A, ENTPD5, ANGPTL1, NDRG2, DEFB1, FN1, CPNE8, S100P, CPNE5, MYO1D, EPHX2, SLC6A14, SPPL2A, ATAD2, ABCB1, ENDOD1, SLC6A19, MUC4, LCN2, TST, LYVE1, PLCG2, MYH11, FABP4, SLC13A2, HSPD1, ANTXR1, MYLK, PRPH, FIGNL1, SLC5A1, IGFBP6, CLDN5, PRDX4, FAM63A, AMN, DES, GSN, ACOT11, PDE8A, RHOF, AHNAK, AKR1C1, CDK1, ACADM, CILP, MPP6, PADI2, PIGR, MAN1A1, PRKCB, CFL2, SCIN, CA4, WASL, CA2, CA1, CTSG, UGP2, CPM, ADH6, OAF, PBLD, UGT1A6, MTM1, UGT1A9, HSPA2, ATIC, PAFAH1B3, PLCD1, SCNN1G, NEDD4L, SLC39A5, SCNN1B, COL8A1, HBB, MYOC, DPT, GSTA1, SECTM1, RNASE1, TNXB, OLR1, TNXA, SLC12A2, KL, NFASC, KLK1, GDPD3, FUCA1, ANXA3, SMPD1, DSC2, GDF15, ATP8A1 GOTERM_CC_DIRECT GO:0005615~extracellular space 2.08E‐12 EDN3, CTHRC1, RETNLB, MMP9, LTBP4, SELENBP1, MMP3, CXCL11, CXCL12, CXCL10, REG3A, ACTG2, OGN, SERPINE2, SOSTDC1, FAP, TGFBI, SERPINE1, CFH, COL12A1, IL1B, CFI, CFD, LGI1, MTUS1, SPON1, SPARCL1, C10ORF99, PRELP, SCGB2A1, TNFAIP6, ADAMTS9, CD36, TACSTD2, VEGFA, COL1A2, STC1, RELN, COL1A1, WNT5A, ODAM, ENPP2, CLU, CCL8, FAM132A, CHEK1, GREM2, SERPINH1, ABI3BP, TIMP1, SIAE, ANGPTL1, DEFB1, OLFM1, SPP1, FN1, BMP3, BMP2, CFB, LGALS4, IL1RN, S100A11, LMCD1, TINAG, LGALS9, MUC4, C2ORF40, TST, LCN2, LAMA1, TNFSF11, NPY, SRPX2, S100B, SFRP1, SFRP4, NLGN4X, POP1, HSPD1, PYY, AREG, TPSAB1, SST, PLAU, IGFBP6, PRDX4, SPINK2, ANPEP, AMN, CKB, MTHFD2, SMPDL3B, GSN, SMPDL3A, GPX3, SEMA3D, LOXL2, ANGPT2, GHR, EGFL6, CILP, PIGR, IL6R, SLIT3, CHGA, SEMA4G, SERPINB5, CFL2, VCAN, CA2, CTSG, CXCL1, CPM, SORD, CXCL5, VPREB3, TNC, CXCL3, CXCL2, CXCL9, CXCL6, KIT, CCL28, PCSK2, CCL23, CCL20, AGT, PTN, PCSK9, C2, THBS1, MYOC, SCG2, DPT, CES3, SECTM1, CES2, TNXB, TNFSF4, KL, SPARC, GCG, DKK2, FBLN1, CCL13, SULF1, SMPD1, CMTM7, MEP1B, CP, GDF15 GOTERM_CC_DIRECT GO:0031012~extracellular matrix 3.99E‐09 FGFR2, LTBP4, TNC, CLU, CCT2, EDIL3, MMRN1, ABI3BP, MMP1, OGN, SERPINE2, TGFBI, SERPINE1, COL12A1, COL8A1, LOXL2, THBS1, THBS2, MYOC, DPT, SPON1, FN1, COCH, HAPLN1, TNXB, COL4A1, RAN, CILP, LMCD1, CCT6A, MMP14, COL5A2, FLNB, MMP11, PRELP, LAMA1, ADAMTS9, FBLN1, BGN, SFRP1, ZG16, COL1A2, VCAN, COL1A1, HSPD1, TPSAB1, MFAP4, CTSG GOTERM_CC_DIRECT GO:0005578~proteinaceous extracellular matrix 1.12E‐07 WNT5A, CTHRC1, MAMDC2, MMP9, LTBP4, MMP28, MMP3, MMP1, TIMP1, OGN, TGFBI, MYOC, DPT, SPON1, FN1, COCH, HAPLN1, TNXB, TNXA, SPARCL1, OLFML2B, CILP, SPARC, COL5A2, COL4A6, MMP12, MUC4, SLIT3, MMP11, COL4A5, PRELP, LAMA1, ADAMTS9, FBLN1, BGN, SFRP1, ZG16, VEGFA, COL1A2, VCAN, RELN, CHL1 GOTERM_CC_DIRECT GO:0009986~cell surface 8.51E‐07 TLR2, TLR3, IQGAP2, LPAR1, SRPX, CD44, ANK3, CD46, FAP, RSPO2, CHRNA1, CEACAM1, GHR, TMEM206, CRYAB, NRXN1, IL6R, SLC7A11, ADAMTS9, BGN, CD36, TNFRSF10B, BACE2, VEGFA, CA4, SLITRK6, CTSG, AOC3, WNT5A, FGFR2, CPM, FGFR3, TNFRSF12A, CLU, HMMR, SLC11A2, FOLR2, HSPA2, P2RY1, PCSK9, PTN, SLC39A6, THBS1, FCER1A, BMP2, CAPN5, TNFSF4, MET, ITGA2, ABCB1, SPARC, FZD5, CD1D, CD55, PROM2, SRPX2, SFRP1, NLGN4X, SULF1, SFRP4, AREG, ANTXR1, HSPD1, SCARA5, PLAU GOTERM_MF_DIRECT GO:0015020~glucuronosyltransferase activity 8.82E‐08 CSGALNACT1, UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A5, UGT1A4, UGT2A3, UGT2B15, UGT1A1 GOTERM_MF_DIRECT GO:0001972~retinoic acid binding 6.55E‐06 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A4, UGT2B15, UGT1A1 GOTERM_MF_DIRECT GO:0008009~chemokine activity 8.97E‐06 CXCL1, CCL13, CCL23, CXCL5, CCL20, CXCL3, CXCL2, CXCL9, CCL8, CXCL6, CXCL11, CXCL12, CCL28, CXCL10 GOTERM_MF_DIRECT GO:0016758~transferase activity transferring hexosyl groups 7.61E‐05 UGT1A7, UGT1A6, MGAT4A, UGT1A9, UGT1A8, UGT1A3, UGT1A4, UGT2A3, UGT2B15, UGT1A1 GOTERM_MF_DIRECT GO:0045236~CXCR chemokine receptor binding 1.11E‐04 CXCL1, CXCL5, CXCL3, CXCL2, CXCL6, CXCL12 KEGG_PATHWAY hsa00040:Pentose and glucuronate interconversions 3.08E‐10 SORD, KL, UGDH, UGT1A1, UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT2B17, UGT1A8, UGT1A3, AKR1B10, UGT1A5, UGT1A4, UGT2A3, UGT2B15, UGP2 KEGG_PATHWAY hsa00053:Ascorbate and aldarate metabolism 1.83E‐07 UGT1A7, UGT1A10, UGT1A6, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A5, UGT1A4, UGDH, UGT2A3, UGT2B15, UGT1A1 KEGG_PATHWAY hsa00982:Drug metabolism ‐ cytochrome P450 3.48E‐07 GSTA1, MAOA, MAOB, ADH1C, ADH1B, ADH6, UGT1A1, FMO4, UGT1A7, UGT1A10, UGT1A6, FMO5, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A5, UGT1A4, UGT2A3, UGT2B15 KEGG_PATHWAY hsa00830:Retinol metabolism 3.13E‐06 ADH1C, ADH1B, ADH6, DHRS9, UGT1A1, RDH5, UGT1A7, UGT1A6, UGT1A10, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A5, UGT1A4, UGT2A3, UGT2B15, RETSAT KEGG_PATHWAY hsa00140:Steroid hormone biosynthesis 1.70E‐‐05 HSD3B2, HSD17B2, UGT1A1, UGT1A7, UGT1A6, UGT1A10, UGT1A9, UGT2B17, UGT1A8, UGT1A3, UGT1A5, UGT1A4, HSD11B2, UGT2A3, UGT2B15, AKR1C1 [88]Open in a new tab Abbreviations: CRC, Colorectal cancer; DEGs, Differentially expressed genes 3.3. PPI network construction and module analysis Based on the STRING database, a PPI network of DEGs with 482 nodes and 2,368 interactions was established using the Cytoscape software (Figure [89]3). The most significant modules were extracted from this PPI network by MCODE (Figure [90]4a). The results of signaling pathway enrichment analysis suggested that the module genes were mainly enriched in cell cycle, DNA replication, oocyte meiosis, p53 signaling pathway, and progesterone‐mediated oocyte maturation (Table [91]2). The BP analysis showed that the module genes were significantly related to DNA replication, DNA strand elongation, and DNA‐dependent DNA replication (Figure [92]4b) Figure 3. Figure 3 [93]Open in a new tab Protein–protein interaction networks construction Figure 4. Figure 4 [94]Open in a new tab (a)The top three most significant modules extracted from PPI network and KEGG pathway analysis of module genes. (b)The biological process of module genes analyzed by BiNGO. The color depth of nodes represents the corrected p‐value. The size of nodes represents the number of genes involved. PPI, protein–protein interaction; KEGG, Kyoto Encyclopedia of Genes and Genomes Table 2. The pathway enrichment analysis of module genes pathway ID pathway description false discovery rate hsa4110 Cell cycle 2.87E‐07 hsa3030 DNA replication 4.61E‐05 hsa4114 Oocyte meiosis 0.00258 hsa4115 p53 signaling pathway 0.0128 hsa4914 Progesterone‐mediated oocyte maturation 0.0175 [95]Open in a new tab 3.4. Analysis and validation of hub genes In order to validate the correlation between hub genes expression and the clinical characteristics of CRC, HPA database, and the cBioPortal for Cancer Genomics database were used for further analysis. The mining of GEPIA database also demonstrated that DEGs exhibited significant differences in expression between CRC and normal tissues. This result further confirmed that the expression level of these hub genes was closely correlated with the onset of CRC. A total of 270 CRC patients were available from GEPIA database for the overall survival analysis and divided into high expression and low expression groups. It was found that all the four hub genes were upregulated (Figure [96]5a). Together, the expression level of AURKA, CCNB1, CCNF, and EXO1 could represent the important prognostic biomarkers for predicting the survival of CRC patients (Figure [97]5b). AURKA, CCNB1, CCNF, and EXO1 were selected as hub genes for further analysis. The full names and function roles for these hub genes were shown in Table [98]3. The immunohistochemical staining results from HPA database indicated significantly higher positivity for AURKA, CCNB1, CCNF, and EXO1(Not found in HPA database) in cancer tissues than in adjacent normal tissues (Figure [99]6a). A network of the module genes and their co‐expression genes was constructed using cBioPortal online platform (Figure [100]6b). Figure 5. Figure 5 [101]Open in a new tab The expression level and prognostic value of hub genes according to the GEPIA database. GEPIA, Gene Expression Profiling Interactive Analysis Table 3. The full name, functional roles, p‐value, and LogFC of hub genes gene symbol Summary [102]GSE33113 [103]GSE44076 AURKA The protein encoded by this gene is a cell cycle‐regulated kinase that appears to be involved in microtubule formation and/or stabilization at the spindle pole during chromosome segregation. The encoded protein is found at the centrosome in interphase cells and at the spindle poles in mitosis. This gene may play a role in tumor development and progression. A processed pseudogene of this gene has been found on chromosome 1, and an unprocessed pseudogene has been found on chromosome 10. Multiple transcript variants encoding the same protein have been found for this gene. p value = 3.87e‐07, LogFC = 1.60 p Value = 4.11e‐51, LogFC = 1.65 CCNB1 The protein encoded by this gene is a regulatory protein involved in mitosis. The gene product complexes with p34(cdc2) to form the maturation‐promoting factor (MPF). The encoded protein is necessary for proper control of the G2/M transition phase of the cell cycle. p Value = 2.41e−05, LogFC = 1.31 p Value = 3.56e−37, LogFC = 1.71 CCNF This gene encodes a member of the cyclin family. Cyclins are important regulators of cell cycle transitions through their ability to bind and activate cyclin‐dependent protein kinases. This member also belongs to the F‐box protein family which is characterized by an approximately 40 amino acid motif, the F‐box. The F‐box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1‐cullin‐F‐box), which function in phosphorylation‐dependent ubiquitination. The F‐box proteins are divided into three classes: Fbws containing WD−40 domains, Fbls containing leucine‐rich repeats, and Fbxs containing either different protein–protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and it was one of the first proteins in which the F‐box motif was identified. p Value = 1.80e‐06, LogFC = 1.03 p Value = 2.58e‐44, LogFC = 1.32 EXO1 This gene encodes a protein with 5' to 3' exonuclease activity as well as an RNase H activity. It is similar to the Saccharomyces cerevisiae protein Exo1 which interacts with Msh2 and which is involved in mismatch repair and recombination. Alternative splicing of this gene results in three transcript variants encoding two different isoforms. p value = 4.39e‐10, LogFC = 2.43 p value = 1.20e‐43, LogFC = 1.23 [104]Open in a new tab Figure 6. Figure 6 [105]Open in a new tab (a) Representative immunohistochemistry staining results reveal the protein level expression of hub genes in CRC and normal tissues. (b) The network of module genes and their co‐expression genes constructed by cBioPortal. Nodes with thick outline: hub genes; Nodes with thin outline: co‐expression genes. (c) Pop plot of top 20 identified small molecules that could reverse the gene expression of CRC. CRC, Colorectal cancer 3.5. Identification of related active small molecules To screen and identify candidate small molecules for potential therapeutic drugs in CRC, we uploaded upregulated and downregulated DEGs groups into the CMap database for Gene Set Enrichment Analysis and then matched them with small molecule treatment. This procedure aimed to find some small molecules that could reverse the changes of gene expression in CRC. Table [106]4 and Figure [107]6c listed the top 15 most significant small molecules with their enrichment scores and p value. Among these molecules, DL‐thiorphan, repaglinide, MS‐275, and quinostatin showed higher negative correlation and the potential to treat CRC. Table 4. List of the top 15 most significant small molecule drugs that could reverse the tumoral status of colorectal cancer (CRC) cmap name enrichment p DL‐thiorphan −0.932 0.00972 repaglinide −0.864 0.00062 MS–275 −0.857 0.04066 quinostatin −0.853 0.04298 1,4‐chrysenequinone −0.85 0.04519 triflusal −0.847 0.00717 trazodone −0.82 0.0117 piperidolate −0.817 0.0121 latamoxef −0.816 0.01232 ronidazole −0.765 0.02646 pargyline −0.76 0.00676 gliclazide −0.751 0.00774 tyloxapol −0.746 0.00824 acepromazine −0.744 0.00851 methylergometrine −0.741 0.00881 [108]Open in a new tab 4. DISCUSSION Although many studies on CRC development are available, more efforts are needed to identify driver genes and candidate drugs. This study integrated two gene profile datasets and utilized bioinformatics methods to analyze these datasets, and identified 1,179 commonly changed DEGs (428 upregulated and 751 downregulated). Pathway enrichment analysis indicated that that the module genes were mainly enriched in cell cycle, DNA replication, oocyte meiosis, p53 signaling pathway, and progesterone‐mediated oocyte maturation. The PPI network was constructed including 482 nodes and 2,368 edges. AURKA, CCNB1, CCNF, and EXO1 were clearly related to the prognosis of patients. In addition, small molecules that can provide new insights in CRC therapeutic studies were identified. Many researchers have found that four key genes were involved in cell cycle, participating in tumorigenesis and tumor proliferation. AURKA has been studied in a wide range of human malignancies and is associated with poor prognosis in several malignancies, including cervical squamous cell carcinoma, hepatocellular carcinoma, and nonsmall cell lung carcinoma (Ma et al., [109]2017; Wang et al., [110]2018; Zheng et al., [111]2018). Besides, Yang et al., ([112]2017) reported that AURKA as a transactivating co‐factor in the induction of the c‐Myc oncoprotein in breast cancer stem cells (BCSCs). In CRC, AURKA is critical for chromosome 20q amplification‐associated malignant transformation in colorectal adenomas (Chuang et al., [113]2016). CCNB1 acts as a central protein of cell cycle. Owing to its function, it is found that CCNB1 is generally abnormal in tumors. CCNB1 was associated with pathologic grade and metastasis of tumors in cases of human breast and ovarian cancer (Fei et al., [114]2018). Several literatures found that the expression of CCNB1 was significantly associated with pathologic grade, metastasis, and prognosis of tumors (Ding et al., [115]2018; Zuryn, Krajewski, Klimaszewska‐Wisniewska, Grzanka, & Grzanka, [116]2019). CCNF, capable of forming Skp1‐Cul1‐F‐box protein ubiquitin ligase complex, is implicated in controlling centrosome duplication and preventing genome instability. Gagat, Krajewski, Grzanka, and Grzanka ([117]2018) revealed that high expression of CCNF in melanoma patients was associated with worse overall survival. Additionally, with gene network reconstruction, CCNF was regarded as one of the main drivers in cell cycle network in gastric cancer (Zhao et al., [118]2019). But Fu et al. ([119]2013) found that CCNF was downregulated in HCC, which was an independent poor prognostic marker for overall survival. However, the function of CCNF in CRC is not clear. The contribution of EXO1 in the safeguarding stability of the genome during DNA replicative and postreplicative processes is well‐established. EXO1 activity contributes to several DNA repair processes. EXO1 has been associated with different types of cancers owning to its mutations, including colon, breast, ovarian, lung, pancreatic, and gastric tract cancer (Bau et al., [120]2009; Hansen et al., [121]2015; Jin et al., [122]2008; Sun, Zheng, & Shen, [123]2002). Nevertheless, the overexpression of EXO1 has also been reported in several other cancers associated with poor prognosis, which in part is related to increased DNA repair activity (Axelsen, Lotem, Sachs, & Domany, [124]2007; Dai et al., [125]2018; de Sousa et al., [126]2017; Muthuswami et al., [127]2013). In this study, AURKA, CCNB1, CCNF, and EXO1 were significantly upregulated in CRC compared with normal samples, and in CRC patients, the survival rate was positively correlated with the high expression of these genes. Several small molecules with potential therapeutic efficacy against CRC were identified. The most significant small molecules in our result have been reported to display anticancer activity. DL‐thiorphan is served as the specific neutral endopeptidase (NEP) inhibitor, which is widely used to differentiate NEP enzyme activity. NEP enzyme is a membrane‐bound metallopeptidase that plays key roles in wound repair (Muangman, Tamura, & Gibran, [128]2005). DL‐thiorphan may be the target and candidate agent for Type 2 Diabetes Treatment (Wang, Zhao, Shang, & Xia, [129]2014). Besides, thiorphan binding to CD10 might interfere with the processing of neuropeptide hemoregulatory factors and thus influence the progenitor cell proliferation in acute leukemia (Feng et al., [130]2011). However, there are insufficient evidences indicating DL‐thiorphan can be directly used in anticancer. Moreover, repaglininde is a new class of oral antidiabetic agents, which stimulates insulin release within a few minutes by inhibiting ATP‐sensitive potassium channels of the beta‐cell membrane via binding to a receptor distinct from that of sulphonylureas. A previous study revealed that repaglininde may have direct antitumor effects and have been shown to suppress various types of cancer cells in cell culture and in animal models (Stanovic et al., [131]2000). On the other hand, El Sharkawi et al reported that repaglininde may have cytotoxic effects against hepatic, breast, and cervical carcinoma cells (El Sharkawi, Shemy, & Khaled, [132]2014). Thus, we might suppose that these identified drugs could play certain roles to combat CRC. However, further studies were still required to clarify the role of these candidate small molecules in the pathogenesis of CRC. 5. CONCLUSION Using bioinformatics analysis, 1,179 DEGs were identified, which were significantly enriched in several pathways, mainly associated with cell cycle, oocyte meiosis, DNA replication, p53 signaling pathway, and progesterone‐mediated oocyte maturation. We also identified key genes including AURKA, CCNB1, CCNF, and EXO1 that might play important roles in CRC and that might represent novel biomarkers in CRC prognosis and therapy. Additionally, a group of small molecules was identified that might be exploited as adjuvant drugs for improved therapeutics for CRC. However, further investigations are required to validate the predicted drugs. CONFLICT OF INTEREST The authors report no conflict of interest in this work. ACKNOWLEDGMENT