Abstract

   Esophageal squamous cell carcinoma (ESCC) is one of the most common
   malignant tumors. However, the mechanisms underlying ESCC tumorigenesis
   have not been fully elucidated. Thus, we aimed to determine the key
   genes involved in ESCC tumorigenesis. The following bioinformatics
   analyses were performed: identification of differentially expressed
   genes (DEGs); gene ontology and Kyoto Encyclopedia of Genes and Genomes
   pathway enrichment analysis; integrated analysis of the protein–protein
   interaction network and Gene Expression Profiling Interactive Analysis
   database for validation of hub genes. Finally, western blotting and
   qPCR were used to explore the expression of cell division cycle 6
   (CDC6) in ESCC cell lines. Immunohistochemistry analysis of ESCC
   samples from patients and matched clinical characteristics was used to
   determine the effects of CDC6. A total of 494 DEGs were identified, and
   functional enrichment was mainly focused on cell cycle and DNA
   replication. Biological pathway analysis of the hub genes was closely
   related to the cell cycle. We found that CDC6 was upregulated in ESCC
   cell lines and patient tissues and was related to the
   clinicopathological characteristics of ESCC. In conclusion, this study
   identified hub genes and crucial biological pathways related to ESCC
   tumorigenesis and integrated analyses indicated that CDC6 may be a
   novel diagnostic and therapeutic target for ESCC.

   Keywords: ESCC, CDC6, GEO, Bioinformatics analysis

   Subject terms: Cancer, Computational biology and bioinformatics

Introduction

   Esophageal carcinoma (EC) is one of the most malignant tumors with a
   low 5-year survival rate and is the sixth leading cause of cancer death
   worldwide^[30]1. The two most common EC histological subtypes are
   esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma
   (ESCC). ESCC accounts for 90% of all EC cases and is the predominant
   histological type in Eastern Asia^[31]2. New developments have been
   made in the multimodal treatment of several cancers; however, EC is
   still associated with a poor prognosis and high mortality rate. In most
   cases, EC is diagnosed at an advanced stage, disqualifying patients
   from surgical resection. Additionally, owing to chemotherapy
   resistance, EC recurrence is associated with poorer prognosis and
   shorter survival^[32]3. Therefore, identifying new biomarkers and
   targets for patients with ESCC is crucial.

   The tumorigenesis of ESCC is related to genetic and environmental
   factors as well as individual lifestyle factors, including smoking and
   the consumption of hot drinks^[33]4. Gene mutations promote ESCC
   carcinogenesis via complex biological processes and multiple molecular
   interactions. For example, MTA1, SOX2, and FXYD-3 upregulation is
   associated with proliferation and angiogenesis in ESCC^[34]5–[35]7.
   With the use of next-generation gene sequencing, novel genes and
   biomarkers have been identified^[36]8. Numerous differentially
   expressed genes (DEGs), critical for ESCC progression, have been
   reported^[37]9–[38]11. However, the underlying mechanisms of ESCC
   tumorigenesis remain elusive and only a few hub genes are identified as
   ESCC biomarkers^[39]12.

   Therefore, in this study, we aimed to identify DEGs and hub genes, and
   the underlying molecular mechanisms that contribute to ESCC progression
   through bioinformatics analyses of gene expression profiles obtained
   from four GEO databases.

Results

Identification of common DEGs

   Gene expression data was obtained from four GEO datasets ([40]GSE23400,
   [41]GSE38129, [42]GSE20347, and [43]GSE29001). Using differential gene
   expression and Venn analyses, we identified 494 overlapping DEGs. Of
   the 494 genes, 348 were upregulated, and 146 were downregulated
   (Fig. [44]1A,[45]B; Table [46]1).

Figure 1.

   [47]Figure 1
   [48]Open in a new tab

   Identification of common DEGs, as well as GO and KEGG pathway
   enrichment analyses of identified DEGs. (A) Volcano plot of detectable
   genome-wide mRNA profiles in four independent GEO datasets, Red
   indicates DGEs with p < 0.05 and logFC (fold change) > 1. (B) Different
   color areas represented different datasets (blue oval represents
   [49]GSE23400; yellow represents [50]GSE38129; green represents
   [51]GSE20347; pink represents [52]GSE29001). Overlapping area in Venn
   diagrams represents overlapping DEGs among the four datasets. (C) GO
   analysis of DEGs in ESCC, including biological process, cell component,
   and molecular function. (D) KEGG pathway enrichment analysis of
   upregulated DEGs. DEGs, differentially expressed genes; GO, Gene
   Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Table 1.

   348 upregulated and 146 downregulated DEGs were identified from the
   four GEO datasets of ESCC.
   DEGs Genes name
   Upregulated (348) ORC6, MYBL2, TTI1, RFC2, PDPN, ZNF200, CENPN, NEK2,
   RRS1, BID, SLC7A7, AURKA, CDCA3, COLGALT1, NCAPG, MTERF3, CENPM, HJURP,
   TROAP, ECT2, RBM28, LEPREL4, CAD, TTF2, FGD6, ACTL6A, APMAP, RPL39L,
   HOXD11, CDCA8, HSPBAP1, MELK, CBX3, CDK1, DLGAP5, POLE2, KIF14, ADSL,
   HOMER3, MMP11, KIF18B, TBCCD1, STIL, MAD2L1, GINS1, P4HA1, CHST12,
   RUVBL1, RFC4, FANCI, HSPE1, C5AR1, SAC3D1, GMNN, RAD54L, FCGR2A,
   RACGAP1, AURKB, OIP5, RAD54B, ATP2C1, ASAP1, TRMT12, MMP3, AUNIP,
   COL11A1, FSCN1, CCT6A, KIF11, TMEM185B, DBF4, ATAD2, ANP32E, IMPAD1,
   APOC1, SNAI2, DNAJC2, CEP55, COL1A1, DNA2, RAD51AP1, CDC45, MMP1,
   TOPBP1, ADAMDEC1, DTL, BIRC5, FAP, POPDC3, LACTB2, MCM10, ZNF146, MCM5,
   KIF2C, MEST, MCM6, TPX2, NCBP2, PFDN2, PARP1, E2F3, ASPM, FZD6,
   DPY19L4, SOAT1, SNX10, SLC38A6, MINPP1, LPCAT1, LAPTM4B, AHR, PLAU,
   PGAP1, SNRPG, CST4, RNASEH2A, F2R, RAI14, CCL18, SLC20A1, CKS1B, RCN1,
   RNF7, FJX1, HAT1, SOX4, TNFRSF12A, MLF1, POT1, ARPC1B, EFHD2, NCAPD2,
   CDK4, PRC1, YEATS2, DNMT1, HMGB3, DDX60, KIF20A, PCNA, HOXC10, KIF4A,
   PBK, SULF1, SLC16A1, TOP2A, TFRC, TMEM38B, GGH, SLC33A1, SERPINE1,
   NUP155, MCM4, LAMB3, INHBB, CALU, SERPINH1, VCAN, VOPP1, POLR2H,
   COL7A1, NUDT1, HPRT1, LSG1, PRKDC, MARCKSL1, GTF3C3, MCM2, CMC2, LOX,
   SLC25A32, COL3A1, FST, RPA1, CDC20, TTK, MAGEF1, CSGALNACT2, NUP107,
   CDC25B, ACVR1, U2SURP, NUSAP1, CDC6, PLAG1, MMP9, H2AFZ, CXCL1, NDC80,
   BGN, OLFML2B, MMP12, CCNE2, COL5A1, PANX1, CERS2, COL10A1, ADA, DDX39A,
   KREMEN2, NETO2, MCM7, FANCL, HSPD1, FXR1, NCF2, MFAP2, NREP, SNAPC1,
   ZWINT, RSAD2, SPP1, GINS2, PRKRIR, SMYD3, CDC7, MMD, DONSON, CTPS1,
   COL5A2, LMNB1, IGF2BP2, MSN, SP140L, JAG2, PTHLH, KIAA0101, HELLS,
   PLOD3, CKS2, TBL1XR1, SLC39A6, CCNB2, PCOLCE, COTL1, GINS4, DSG2, EZH2,
   TMEM97, CDH3, CCNB1, LUM, HSPH1, GALNT2, VRK1, USP18, TGIF1, CBS,
   MTHFD2, CXCL10, THY1, LY96, HEY1, TGFBI, SH3BP4, CXCL6, NID2, MAGEA12,
   ITGA6, TRIP13, ARMC1, PTPRK, SEL1L3, EPCAM, CCNB1IP1, EN1, COL1A2,
   HMGA2, PUS7, POSTN, FOXM1, AMIGO2, IFI44, IRS1, SSFA2, COL4A1, DFNA5,
   ATP1B3, LOC100129361, MFHAS1, PXDN, MMP10, UBE2C, MMP13, LAMB1, ISG15,
   ANKRD10, COL6A3, CCT2, TIMP1, RRM2, GPNMB, PRSS21, TUSC3, APOE, FAT1,
   ITGAV, APOL1, IFI6, BUB1B, CXCL8, SLC39A14, IFIT1, LAMA3, GCLM, TNC,
   ASPN, SPARC, FADD, LAMC2, PFN2, CDKN3, LYN, GALNT6, ODC1, ANO1, IGFBP3,
   CHST2, HLTF, PPFIA1, SLC6A8, KRT8, PSMB9, CYP24A1, PPAP2C, ALCAM,
   NELL2, ACKR3, TMEM45A, APOBEC3B, SLC7A11, MMP2, CALB1, COL4A2, IFI44L,
   KRT75, MAGEA6, BST2, MMP7, LGALS1, CXCL9, MAGEA11, RBP1, COL4A5, LAMP3,
   GREM1, CXCL11, NTS
   Downregulated (146) SCIN, PAIP2B, CAPN5, CYP4F12, CXCR2, GALNT12,
   CYP2J2, ACPP, MPC1, CWH43, COBL, UPK1A, ZNF426, ABLIM3, ABLIM1, UBL3,
   CPEB3, CEACAM1, FMO2, ID4, FLG, SASH1, GMDS, NUCB2, ENDOU, KAT2B,
   ADIRF, HPGD, EHD3, GPR110, MGLL, RRAD, SLC24A3, GPD1L, PPP1R3C, CRISP3,
   ECHDC2, TMPRSS2, BEX4, HLF, PDZD2, SLURP1, MAL, EPS8L1, GDPD3, STK39,
   IL1RN, SCEL, FUT3, CYP4B1, RORA, GYS2, CYP2C18, ADH1B, CLIC3, GPX3,
   BBOX1, EPB41L3, CES2, BLNK, TTC9, SULT2B1, TM7SF2, ANXA9, EPHX3,
   TRIP10, CCNG2, KLK13, CRNN, SPINK5, PPL, KANK1, SCNN1B, HSPB8, KLF4,
   TMOD3, C1orf116, CYP3A5, GLTP, EMP1, DHRS9, TGM3, FUT6, MXD1, CITED2,
   PITX1, RANBP9, LPIN1, PSCA, TMPRSS11E, ECM1, CEACAM6, OR7E14P, CLCA4,
   CRCT1, CRYAB, IL36A, TMPRSS11D, DUSP5, CRABP2, UPK3B, SERPINB13,
   SERPINB1, IL18, GABRP, IVL, HOPX, EHF, ZNF365, ALOX12, PRSS3, C2orf54,
   SYNPO2L, EPS8L2, NMU, RHCG, DHRS1, RAB11FIP1, ERO1L, OBFC1, CEACAM7,
   EVPL, KRT4, TGM1, SPRR2C, MALL, PLAC8, KRT13, ZNF185, CD24, SERPINB2,
   SPRR3, LYPD3, PTK6, AQP3, EREG, KLK12, AIM1, KRT24, CDA, KLK11, ABCA8,
   SPRR2B, PDZK1IP1, SLPI, LCN2
   [53]Open in a new tab

Gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) pathway
enrichment analyses of the common DEGs

   We attempted to determine the roles of the identified DEGs in the
   progression of ESCC. Functional and pathway enrichment analyses were
   performed using Database for Annotation, Visualization, and Integrated
   Discovery (DAVID) software. GO analysis revealed that the biological
   processes of the DEGs (Supplementary Material [54]1) were significantly
   enriched in extracellular matrix organization, mitotic cell cycle phase
   transition, nuclear division, DNA replication, and chromosome
   segregation. Enriched cell components of the DEGs were primarily
   involved in the extracellular matrix, chromosomal region, spindle,
   nuclear chromosome, and endoplasmic reticulum lumen. Changes in
   molecular function were mainly enriched in extracellular matrix
   structural constituents, structural molecule activity, DNA replication
   origin binding, and DNA helicase activity (Fig. [55]1C). KEGG pathway
   analysis of the upregulated genes (Supplementary Material [56]2)
   revealed involvement in the cell cycle, DNA replication, protein
   digestion and absorption, extracellular matrix (ECM)–receptor
   interaction, small cell lung cancer, amoebiasis, p53 signaling pathway,
   cancer pathway, IL-17 signaling pathway, and AGE-RAGE signaling pathway
   in diabetic complications^[57]13–[58]16(Fig. [59]1D).

Protein–protein interaction (PPI) network and module analysis

   The PPI network of the DEGs was constructed using the STRING database
   and Cytoscape software. The top three significant modules
   (Supplementary Material [60]3) were selected using the MCODE plug-in in
   Cytoscape (Module 1: MCODE score = 66.056; Module 2: MCODE
   score = 21.758; Module 3: MCODE score = 8.621) (Fig. [61]2A). We then
   analyzed the functions of these modules. The enriched KEGG pathway
   showed that Module 1 consisted of 73 nodes and 2378 edges mainly
   enriched in the cell cycle, DNA replication, and oocyte meiosis. Module
   2, containing 34 nodes and 359 edges, was primarily correlated with
   protein digestion and absorption by the ECM receptor. Module 3, which
   included 30 nodes and 125 edges, primarily correlated with
   cytokine–cytokine receptor interactions, chemokine signaling pathways,
   and IL-17 signaling pathways^[62]13–[63]16 (Fig. [64]2B).

Figure 2.

   [65]Figure 2
   [66]Open in a new tab

   Significant modules and enrichment analyses of DEGs. (A) Top three
   modules from PPI network of DEGs. (B) Results of KEGG pathway analysis
   of DEGs in these three modules.

Expression level of hub genes

   Based on the degree of connectivity, the top 10 significant hub genes
   were filtered (Table [67]2). We created a heatmap of the 10 most
   essential hub genes, based on four data series ([68]GSE23400,
   [69]GSE38129, [70]GSE20347, and [71]GSE29001) and found that the hub
   genes could distinguish ESCC samples from normal samples (Fig. [72]3A).
   The PPI network of the 10 hub genes was predicted using the STRING
   database and constructed using Cytoscape (Fig. [73]3B). The biological
   pathways for the top 10 hub genes were cell cycle, mitosis, cell cycle
   checkpoints, G2/M checkpoints, G0 and Early G1, DNA replication, and
   Mitotic G1-G1-/S phases (Fig. [74]3C). Furthermore, we constructed a
   network of hub and co-expressed genes (Fig. [75]3D) and performed gene
   co-expression analysis of the hub genes using the STRING database and
   FunRich software (version 3.1.3; [76]http://www.funrich.org/), which
   indicated that the hub genes may strongly interact with each other
   (Fig. [77]3E).

Table 2.

   Top ten hub genes identified by MCODE scores with Degree method.
   Gene Symbol Description                                       MCODE Score
   CDK1        Cyclin-dependent kinase 1                         104
   CCNB1       Cyclin B1                                         97
   RFC4        Replication factor C subunit 4                    94
   TOP2A       Topoisomerase (DNA) II alpha                      94
   CDC6        Cell division cycle 6                             89
   MAD2L1      Mitotic arrest deficient 2 like 1                 89
   AURKA       Aurora kinase A                                   89
   BUB1B       BUB1 mitotic checkpoint serine/threonine kinase B 88
   PCNA        Proliferating cell nuclear antigen                88
   MCM4        Minichromosome maintenance complex component 4    87
   [78]Open in a new tab

Figure 3.

   [79]Figure 3
   [80]Open in a new tab

   Comprehensive analysis of hub genes. (A) Hierarchical clustering
   heatmap of hub genes from four databases. (B). PPI network of the top
   10 hub genes through the STRING database. (C) Biological pathway for
   the top 10 hub genes. (D). Network of the top 10 hub genes and the
   other 50 frequently altered genes; red nodes represent hub genes and
   green nodes represent the co-expression genes. (E). Results of
   co-expression analysis of the top 10 hub genes.

Expression verification of hub genes and genetic alteration

   To investigate the transcriptional expression difference of the hub
   genes between tumor and normal tissues, we analyzed the datasets from
   the Gene Expression Profiling Interactive Analysis (GEPIA) database
   using one-way analysis of variance (ANOVA) (cut-off criteria:
   |log2FC|= 1 and p = 0.01). As shown in Fig. [81]4A, compared with
   normal tissues, the genes were significantly upregulated in tumor
   tissues. We then investigated the genetic alterations in the hub genes,
   and found that RFC4 and CDC6 had the top two frequencies of alteration
   (35% and 4%, respectively), including missense mutations, truncation
   mutations, amplifications, and deep deletions (Fig. [82]4B). These
   results suggest that RFC4 and CDC6 may play roles in the development
   and progression of ESCC and remained of great significance to further
   research.

Figure 4.

   [83]Figure 4
   [84]Open in a new tab

   Validation of the top 10 gene expression levels based on the GEPIA. (A)
   Expression levels of top 10 hub genes in human ESCC. Gray and red boxes
   represent normal and tumor tissues, respectively. The results were
   consistent with the preceding consequence of this study. (B) Summary of
   genetic alternation of the top 10 hub genes in ESCC. *p < 0.05.

Receiver operating characteristic (ROC) analysis of hub genes and drug–hub
gene interactions

   ROC curve analysis was performed to assess the diagnostic value of the
   top ten hub genes for ESCC (Fig. [85]5A). The expression levels of hub
   genes were extracted from four databases ([86]GSE23400, [87]GSE38129,
   [88]GSE20347, and [89]GSE29001). The area under the curve (AUC) of
   these genes ranged from 0.8353 to 1.000. Furthermore, we identified
   promising drugs associated with these hub genes. By searching druggable
   gene categories in DGIdb, we found that most hub genes matched with
   tumor suppressor genes (Table [90]3). Finally, 35 Food and Drug
   Administration (FDA) approved drugs (Table [91]4) targeting CDK1,
   TOP2A, AURKA, and PCNA were identified, with a large majority of the
   drugs inhibiting TOP2A. Furthermore, a downstream network of the 10 hub
   genes was constructed using the STITCH database (Fig. [92]5B). These
   modules showed that the hub genes were mainly associated with CDK, CDC,
   and MCM family members, and drugs including doxorubicin, rapamycin,
   etoposide, paclitaxel, amsacrine, levofloxacin, alisertib, MLN8054,
   aminopurvalano, dexrazoxane, MgATP, and MgADP.

Figure 5.

   [93]Figure 5
   [94]Open in a new tab

   (A) ROC curves for ESCC diagnosis according to the AUC, red curve
   represents the [95]GSE23400 dataset; green the [96]GSE38129 dataset;
   blue the [97]GSE20347 dataset; and purple the [98]GSE29001 dataset. (B)
   Networks of drug–hub gene interactions. ROC, Receiver operator
   characteristic and AUC, area under the curve.

Table 3.

   Druggable categories of hub genes of ESCC.
   Druggable gene category Matching gene count* Matching gene(s)
   Non-matching gene(s)
   Tumor suppressor 8 CDK1, CCNB1, TOP2A, CDC6, MAD2L1, AURKA, BUB1B, PCNA
   RFC4
   Druggable genome 5 CDK1, CCNB1, TOP2A, AURKA, BUB1B
   RFC4, CDC6, MAD2L1, PCNA
   Kinase 5 CDK1, CCNB1, CDC6, AURKA, BUB1B RFC4, TOP2A, MAD2L1, PCNA
   Serine threonine kinase 5 CDK1, CCNB1, CDC6, AURKA, BUB1B
   RFC4, TOP2A, MAD2L1, PCNA
   DNA repair 3 CDK1, RFC4, PCNA CCNB1, TOP2A, CDC6, MAD2L1, AURKA, BUB1B
   Histone modification 3 CDK1, CCNB1, AURKA
   RFC4, TOP2A, CDC6, MAD2L1, BUB1B, PCNA
   Clinically actionable 2 TOP2A, AURKA
   CDK1, CCNB1, RFC4, CDC6, MAD2L1, BUB1B, PCNA
   Drug resistance 2 CDK1, CCNB1
   RFC4, TOP2A, CDC6, MAD2L1, AURKA, BUB1B, PCNA
   [99]Open in a new tab

   *MCM4 gene category was not included on DGIdb website.

Table 4.

   FDA approved drugs targeting hub genes of ESCC.
   Hub gene Drug                         Interaction type Score
   CDK1     Eltrombopag                  Agonist          1
   CDK1     Romiplostim                  Agonist          1
   TOP2A    Mitoxantrone                 Inhibitor        13
   TOP2A    Teniposide                   Inhibitor        12
   TOP2A    Amsacrine                    Inhibitor        12
   TOP2A    Etoposide                    Inhibitor        10
   TOP2A    Podofilox                    Inhibitor        9
   TOP2A    Valrubicin                   Inhibitor        6
   TOP2A    Epirubicin                   Inhibitor        6
   TOP2A    Doxorubicin                  Inhibitor        4
   TOP2A    Enoxacin                     Inhibitor        4
   TOP2A    Daunorubicin                 –                3
   TOP2A    Vincristine                  –                2
   TOP2A    Norfloxacin                  Inhibitor        2
   TOP2A    Levofloxacin                 Inhibitor        2
   TOP2A    Ofloxacin                    Inhibitor        2
   TOP2A    Pefloxacin                   Inhibitor        2
   TOP2A    Dexrazoxane                  –                2
   TOP2A    Lomefloxacin                 Inhibitor        2
   TOP2A    Dactinomycin                 –                2
   TOP2A    Finafloxacin                 Inhibitor        2
   TOP2A    Idarubicin                   –                2
   TOP2A    Hydroquinone                 –                2
   TOP2A    Doxorubicin hydrochloride    Inhibitor        1
   TOP2A    Etoposide phosphate          Inhibitor        1
   TOP2A    Paclitaxel                   –                1
   TOP2A    Idarubicin hydrochloride     Inhibitor        1
   TOP2A    Daunorubicin hydrochloride   Inhibitor        1
   TOP2A    Mitoxantrone dihydrochloride Inhibitor        1
   TOP2A    Daunorubicin citrate         Inhibitor        1
   AURKA    Paclitaxel                   –                2
   AURKA    Tamoxifen                    –                2
   AURKA    Fluorouracil                 –                2
   PCNA     Capsaicin                    –                2
   PCNA     Pentoxifylline               –                2
   [100]Open in a new tab

High expression levels of CDC6 and its clinicopathologic characteristics in
ESCC

   Through extensive literature review, we found that RFC4, with the
   highest mutation frequency, has been reported in ESCC, whereas
   few studies have focused on CDC6 in ESCC. To determine whether CDC6
   plays a crucial role in ESCC, we evaluated the expression levels of
   CDC6 in ESCC cell lines. Both qPCR and western blotting showed that,
   compared to that in human esophageal epithelial cells, CDC6 was
   upregulated in ESCC cell lines (Fig. [101]6A,B). Furthermore, to detect
   the expression of CDC6 in ESCC samples, we performed
   immunohistochemical staining. Of the 35 samples, 26 exhibited high CDC6
   expression levels, while the remaining samples showed low CDC6
   expression (Fig. [102]6C,D). Statistical analysis of the H-score showed
   that, compared to normal esophageal tissues, CDC6 was upregulated in
   ESCC tissues (Fig. [103]6E). Therefore, we aimed to determine the
   relationship between CDC6 and the clinicopathological characteristics
   of ESCC using CDC6 H-scores (Supplementary Material [104]4). We
   selected five terms: age, grade, tumor size, lymph node metastasis, and
   stage. We found that the gene expression level of CDC6 was related to
   tumor size, age, lymph node metastasis, and stage, whereas it was not
   associated with tumor grade (Fig. [105]6F,G).

Figure 6.

   [106]Figure 6
   [107]Open in a new tab

   Verification of the CDC6 expression levels in ESCC and associated
   clinicopathologic characteristics. (A–B) mRNA and protein levels of
   CDC6 were detected in human esophageal epithelial cells (Het-1A) and
   ESCC cell lines using qPCR (A) and western blotting (B) (Het-1A vs.
   TE1, p < 0.0001; Het-1A vs. TE10, p = 0.0002; Het-1A vs. KYSE30,
   p < 0.0001; Het-1A vs. KYSE150, p = 0.0010; Het-1A vs. KYSE450,
   p < 0.0001; Het-1A vs. KYSE510, p < 0.0001). C. Immunohistochemical
   staining intensity scores of normal esophageal and tumor tissues
   ranging from 0 to 3 (× 200, 50µm). (D) Immunohistochemical staining for
   CDC6 expression in representative ESCC tissues and matched adjacent
   normal tissues (× 200, 50 µm). E. Compared to normal tissues, CDC6 was
   significantly upregulated in ESCC tissues. F–G. Correlation of the CDC6
   H-score reflects the expression levels and clinicopathological
   characteristics, including age, grade, tumor size, lymph node
   metastasis, and stage. qPCR Quantitative polymerase chain reaction
   *p < 0.05, **p < 0.01, ***p < 0.001.

Discussion

   Although many studies have focused on ESCC tumorigenesis^[108]17, the
   incidence of ESCC remains high and its prognosis poor. Therefore, in
   this study, using expression data obtained from four datasets, we aimed
   to identify the critical genes and biological pathways involved in ESCC
   tumorgenesis. We identified 490 DEGs, and GO enrichment analysis showed
   that the identified DEGs were mainly enriched in the mitotic cell cycle
   phase transition, DNA replication origin binding, and biological
   pathways primarily enriched in the cell cycle, DNA replication, and
   protein digestion and absorption, indicating that the DEGs may be
   associated with the progression of ESCC.

   We constructed a PPI network and selected the top three valuable
   modules. The biological pathway analyses of Module 1 revealed that the
   modules were mainly related to cell cycle and DNA replication,
   indicating that the genes may participate in the regulation of ESCC
   proliferation. Deregulation in proliferation remains is key to tumor
   development^[109]18. We found that part of Module 1 is related to the
   cell cycle and DNA replication, and previously, researchers have found
   that RFC4 is associated with increased DNA copy number alterations in
   ESCC^[110]19. CDK1, CCNB1, and CCNB2 are related to the cell cycle and
   are altered in ESCC^[111]20. Matrix metalloproteinases (MMPs) and
   collagen family members constitute the majority of Module 2. MMPs are
   involved in tumor invasion and metastasis. For instance, MMP13
   participates in the proliferation and invasion of ESCC^[112]21. MMP9
   can degrade Collagen IV in the basement membrane, which could promote
   invasion^[113]22. The ectopic expression of the collagen family is
   associated with the prognosis of some cancers^[114]23–[115]26. The
   collagen family is associated with invasion and migration, which
   participate in epithelial to mesenchymal transition (EMT) and
   extracellular matrix remodeling^[116]27. Several genes of the collagen
   family have been identified as potential diagnostic and prognostic
   biomarkers for ESCC; however, the underlying molecular mechanisms need
   to be further expored^[117]28,[118]29. The PPI network of Module 3 was
   mainly connected to the chemokine signaling pathway, cytokine–cytokine
   receptor interaction, and the IL-17 signaling pathway. Chemokine
   signaling pathways cause differences in the proliferation,
   angiogenesis, EMT, and metastasis of various cancers. CXCL[1]6/CXCR6
   chemokine signaling mediates breast cancer^[119]30. IL-8 and CXCR-1 are
   involved in the EMT in colon carcinoma^[120]31. GROalpha-CXCR2 and
   GRObeta-CXCR2 signaling play crucial roles in ESCC cell
   proliferation^[121]32.

   Based on the degree of connectivity, we considered the top 10 genes as
   hub genes. Both the heatmap and the results from the GEIPA database
   showed that the expression levels of the genes were significantly
   higher in tumor samples than in normal tissues, indicating that the
   genes may play key a role in the development of ESCC. The biological
   pathways of the 10 hub genes were mainly associated with cell cycle
   transition, indicating that the genes may affect the proliferation of
   tumor cells. Cyclin B1(CCNB1) silencing inhibited cell proliferation
   and facilitated senescence in pancreatic cancer^[122]33. CCNB1 is
   involved in the pathogenesis of esophagus carcinoma, and the CCNB1
   upregulation is associated with a poor prognosis in patients with
   ESCC^[123]34. PCNA serves as a moving platform that allows DNA- and
   chromatin-interacting proteins to operate at the fork in a DNA
   sequence-independent manner, and targeting PCNA-1 can inhibit the
   proliferation of lung cancer cells^[124]28,[125]35,[126]36. Thus, the
   functions of the genes correspond to their respective biological
   pathways. Additionally, we constructed an interactive network including
   the 10 hub genes and the top 60 frequently altered neighboring genes in
   ESCC. Several significant genes, such as TP53, EGFR, and
   PARP1^[127]37,[128]38 had direct or indirect connections to the 10 hub
   genes. These interactions indicate that hub genes may participate in
   ESCC tumorigenesis. The molecular mechanisms underlying these
   correlations warrant further study.

   Using the hub genes, we searched for candidate drugs and 35 drugs with
   therapeutic efficacy against ESCC were identified. Four of the 10 hub
   genes, CDK1, TOP2A, AURKA, and PCNA, may be the potential drug targets.
   Notably, TOP2A is a promising target for ESCC^[129]20. In addition to
   doxorubicin, rapamycin, paclitaxel, and etoposide, we identified new
   drugs, such as levofloxacin, dexrazoxane, and amsacrine, that have not
   been used in ESCC. Further studies and clinical trials are required to
   explore their effects in ESCC. Most of the hub genes interacted with
   MgATP, MgADP, and phosphate. Intracellular ATP is critical for
   chemoresistance in colon cancer cells^[130]39 and the effect of
   drug–gene interactions on ESCC chemoresistance requires further
   research.

   Alterations in hub genes included missense mutations, truncating
   mutations, amplification, and deep deletions. RFC4, with the highest
   mutation frequency, was upregulated in the early stage and was
   associated with early nodal metastases and tumor immunity in
   ESCC^[131]19. CDC6, with the second highest alteration frequency, has
   rarely been reported in ESCC. CDC6 forms part of the pre-replication
   complex that controls DNA replication licensing in the cell
   cycle^[132]40. Previous studies have shown that CDC6 was overexpressed
   in other tumors^[133]41–[134]43. The expression level of CDC6 in
   ovarian cancer tissues was significantly higher than that in adjacent
   tissues and it was related to tumor stage, differentiation degree,
   lymph node metastasis, ascites and prognosis, which was an independent
   factor of ovarian cancer patients^[135]44. Mahadevappa et al. suggested
   that CDC6 expression was significantly increased in breast cancer
   tissues and correlated with poor prognosis of patients^[136]45. Further
   studies showed that down-regulation of CDC6 expression in bladder
   cancer could significantly inhibit a variety of malignant phenotypes of
   tumor cells^[137]46. After knocking out CDC6, the proliferation of
   tongue squamous cell carcinoma cells was inhibited^[138]47.

   Additionally, CDC6 upregulation represses E-cadherin correlates with
   EMT^[139]48. CDC6 is also involved in the progression of ESCC induced
   by RFBP-and circular RNA circNELL2^[140]49,[141]50. Consistent with the
   results from the bioinformatics analysis, CDC6 was upregulated in the
   ESCC cell lines. Furthermore, we explored the clinical significance of
   the findings by analyzing the CDC6 H-scores. The tumor samples had
   higher H-scores than normal samples. CDC6 expression was associated
   with tumor size, lymph node metastasis, and disease stage, indicating
   that CDC6 may promote the proliferation and invasion of tumor cells and
   serves as a novel ESCC therapeutic target. This study has some
   limitations. First, the number of clinical samples used to investigate
   the expression level of CDC6 was relatively small. Additionally, the
   effect of CDC6 on the prognosis of patients with ESCC remains unclear.
   The molecular mechanisms and biological effects of CDC6 in ESCC have
   not been fully elucidated.

Conclusions

   Using comprehensive bioinformatics analysis and in vitro assays, we
   demonstrated that the CDC6 gene plays a key role in the progression of
   ESCC and serves as a novel potential biomarker and therapeutic target
   for ESCC.

Methods

Data collection and DEGs extraction

   Gene expression profiles of ESCC and adjacent normal tissues
   ([142]GSE23400, [143]GSE38129, [144]GSE20347, and [145]GSE29001) were
   downloaded from the GEO database
   ([146]http://www.ncbi.nlm.nih.gov/geo/). The [147]GSE23400 dataset
   consisted of 53 pairs of tumor tissues and matched noncancerous
   samples. [148]GSE38129 consisted of 30 pairs of tumor and normal tissue
   samples. [149]GSE20347 consisted of 17 pairs of tumor and normal
   esophageal tissue samples. [150]GSE29001 consisted of 21 tumor samples
   and 24 normal tissues. DEGs were identified using R, and the
   significance procedures were as follows: the Affy package was used to
   perform background corrections and normalize the data, and then the
   Limma package was used to perform differential expression analysis.
   p < 0.05 and logFC (fold change) > 1 were set as cut-off criteria.
   Volcano plots for each dataset was drawn in R using the ggplot2
   package, and the overlapping DEGs among the four microarrays were
   visualized using the Venn diagram tool
   ([151]https://bioinfogp.cnb.csic.es/tools/venny/).

Functional enrichment analyses

   GO enrichment and KEGG pathway analyses were performed using the DAVID
   (available online: [152]http://david.ncifcrf.gov). The GO results of
   crucial terms for cellular component (CC), biological process (BP), and
   molecular function (MF) were obtained by importing the DEGs into DAVID,
   and p < 0.01 was considered statistically significant.

PPI network construction and analysis

   PPI network comprising DEGs and hub genes were constructed using
   Metascape online analyses
   ([153]https://metascape.org/gp/index.html#/main/step1), which could
   predict the interactions of proteins, and a combined score > 0.4 was
   considered statistically significant. Molecular interaction networks
   were constructed using Cytoscape (version 3.1.2;
   [154]https://cytoscape.org/release_notes_3_2_1.html). The three most
   significant modules were identified by Molecular Complex Detection
   (MCOD, plug-in in Cytoscape software), which had MCODE scores > 3,
   false degree cut-off = 2, node score cut-off = 0.2, maximum
   depth = 100, and false k-score = 2. Functional enrichment analysis of
   each module was conducted using Metascape.

Identification, analysis, and validation of hub genes

   The top ten genes with the highest degrees of connectivity were
   selected as hub genes. The hub genes within the four ESCC databases
   were visualized using cluster heatmaps drawn using GraphPad Prism
   heatmap (version 6.0; [155]http://uone-tech.cn/graphpad-prism.html).
   The biological pathway and co-expression analyses of the ten hub genes
   was conducted using STRING ([156]https://string-db.org/). We validated
   the expression levels and connections of the ten hub genes in ESCC
   tissues and matched normal tissues using GEPIA
   ([157]http://gepia.cancer-pku.cn/index.html), an online database for
   analyzing gene expression profiles of tumors. The genetic alterations
   of the ten selected hub genes in ESCC were determined using cancer
   genomics prolifers obtained from the cBioCancer Genomic Portal
   ([158]http://www.cbioportal.org/), which contains a large number of
   cancer genomics datasets.

Analysis of drug–hub gene interactions

   The 10 hub genes served as the promising targets for searching for
   candidate drugs through the DGIdb
   ([159]http://dgidb.genome.wustl.edu/), drugs from more than one
   database, or PubMed references that are approved by the FDA. The target