Abstract Background Esophageal squamous cell carcinoma (ESCC) is one of leading malignant cancers of gastrointestinal tract worldwide. Until now, the involved mechanisms during the development of ESCC are largely unknown. This study aims to explore the driven-genes and biological pathways in ESCC. Methods mRNA expression datasets of [49]GSE29001, [50]GSE20347, [51]GSE100942, and [52]GSE38129, containing 63 pairs of ESCC and non-tumor tissues data, were integrated and deeply analyzed. The bioinformatics approaches include identification of differentially expressed genes (DEGs) and hub genes, gene ontology (GO) terms analysis and biological pathway enrichment analysis, construction and analysis of protein–protein interaction (PPI) network, and miRNA–gene network construction. Subsequently, GEPIA2 database and qPCR assay were utilized to validate the expression of hub genes. DGIdb database was performed to search the candidate drugs for ESCC. Results Finally, 120 upregulated and 26 downregulated DEGs were identified. The functional enrichment of DEGs in ESCC were mainly correlated with cell cycle, DNA replication, deleted in colorectal cancer (DCC) mediated attractive signaling pathway, and Netrin-1 signaling pathway. The PPI network was constructed using STRING software with 146 nodes and 2392 edges. The most significant three modules in PPI were filtered and analyzed. Totally ten genes were selected and considered as the hub genes and nuclear division cycle 80 (NDC80) was closely related to the survival of ESCC patients. DGIdb database predicted 33 small molecules as the possible drugs for treating ESCC. Conclusions In summary, the data may provide new insights into ESCC pathogenesis and treatments. The candidate drugs may improve the efficiency of personalized therapy in future. Electronic supplementary material The online version of this article (10.1186/s12935-019-0854-6) contains supplementary material, which is available to authorized users. Keywords: Esophageal squamous cell carcinoma, Bioinformatics, Hub genes, Cell cycle, Differentially expressed genes, Drug Background Esophageal cancer (EC) ranks seventh in terms of incidence and sixth in cancer deaths worldwide, responsible for about 572,000 new cases and 509,000 deaths last year [[53]1]. Although we have made great progress on the early diagnosis and novel therapy, EC still is one of challengeable diseases in Eastern Asian [[54]1]. Generally, EC includes two most common histologic subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) [[55]2]. ESCC comprises over 90% of all EC cases [[56]1]. And risk factors, such as smoking and hot drinks, are closely related to the initiation of ESCC [[57]1, [58]2]. However, the underlying mechanisms of ESCC are not well understood. And due to the lack of specific biomarkers, most ESCC patients are diagnosed at a late stage, leading to particularly poor outcomes of patients [[59]3]. Even worse, some of ESCC patients suffer from tumor recurrence due to the chemotherapy resistance [[60]3]. Therefore, it is of paramount importance to find novel biomarkers and effective targets for ESCC patients. Recently, gene profile and gene chip have been extensively applied in the field of scientific researches [[61]4, [62]5]. Gene expression analysis based on these methods can quickly detect the differentially expressed genes (DEGs) that may have a strong influence on cancer progression [[63]6]. However, most of the gene chip or gene profile data have been only deposited in public databases. And re-analyzing these data can be an efficient way to provide the new insights into further studies. So far, many studies have used gene chip or gene profile to identify key genes for ESCC, and numerous DEGs have been detected [[64]7]. Nevertheless, the results may be inconsistent and variable because of the existence of tumor heterogeneity. To date, few reliable biomarkers and therapeutic targets have been identified for ESCC [[65]8]. Thus, it’s urgent to discover new markers and therapeutic targets for ESCC patients. Many chemotherapeutic drugs have shown activity against ESCC, including docetaxel [[66]9–[67]11], cisplatin [[68]10, [69]11], fluorouracil [[70]9–[71]11], and nedaplatin [[72]9]. Moreover, the combinations of these agents are also recommended because of the existence of chemotherapy resistance. A recent study found that concurrent chemoradiotherapy (CCRT) with 5-fluorouracil plus cisplatin were more effective and less toxic than CCRT with the docetaxel plus cisplatin as the first-line treatment for ESCC patients [[73]11]. However, the progression-free survival and overall survival (OS) of ESCC patients remained short, highlighting the importance of developing some molecular drugs. In the study, four mRNA expression profiles were downloaded ([74]GSE29001 [[75]12], [76]GSE20347 [[77]13], [78]GSE100942 [[79]14], and [80]GSE38129 [[81]15]) from GEO database, from which there are 63 pairs of ESCC and non-tumor tissues data available. Integrated analyses included identifying DEGs using the GEO2R tool, overlapping four datasets using a Venn diagram tool, GO terms analysis, biological pathway enrichment analysis, PPI construction, hub genes identification and verification, miRNA–hub genes network construction, and exploration of the candidate small molecular drugs for ESCC. Materials and methods Data collection ESCC and adjacent normal tissue gene expression profiles of [82]GSE20347 [[83]13], [84]GSE29001 [[85]12], [86]GSE100942 [[87]14], and [88]GSE38129 [[89]15] were downloaded from GEO ([90]http://www.ncbi.nlm.nih.gov/geo/) database [[91]16]. The microarray data of [92]GSE29001 was based on [93]GPL571 Platforms (Affymetrix Human Genome U133A 2.0 Array) and included 12 pairs of ESCC and non-tumor tissues (Submission date: May 02, 2011). The [94]GSE20347 data was based on [95]GPL571 Platforms (Affymetrix Human Genome U133A 2.0 Array) and included 17 ESCC tissues and 17 normal tissues (Submission date: Feb 16, 2010). The [96]GSE100942 data was based on [97]GPL570 Platforms (Affymetrix Human Genome U133 Plus 2.0 Array) and included 4 ESCC tissues and 4 non-tumor tissues (Submission date: Jul 07, 2017). The [98]GSE38129 data was based on [99]GPL571 Platforms (Affymetrix Human Genome U133A 2.0 Array) and included 30 pairs of ESCC and non-tumor tissues (Submission date: May 22, 2012). The above datasets met the following criteria: (1) they used tissue samples from human ESCC tissues and paired adjacent or non-tumor tissues; (2) each dataset involved more than eight samples. DEGs identification GEO2R ([100]https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to pick out the DEGs in ESCC tissues and adjacent non-tumor tissues [[101]17]. p < 0.05 and |logFC| > 1 were set as the cut-off criterion to select DEGs for every dataset microarray respectively [[102]7, [103]17]. Finally, the overlapping DEGs among the four datasets was identified by Venn diagram tool ([104]http://bioinfogp.cnb.csic.es/tools/venny/). Cell culture, RNA extraction and quantitative PCR (qPCR) Human ESCC cell line EC109 and human esophageal squamous epithelial cell line Het-1A were cultured in RPMI-1640 medium (Gibco) with 10% fetal bovine serum (Gibco) at 37 °C in a humidified atmosphere with 5% CO[2]. Total RNA was extracted from cells using the E.Z.N.A.™ Total RNA Kit I (OMEGA). PrimeScript™ RT Master Mix (Perfect Real Time) was used for RNA reverse transcription. SYBR Premix Ex Taq (TaKaRa) was employed to conduct qPCR assay. PCR primers were designed and synthesized by TaKaRa (Additional file [105]1: Table S1). The experiments were performed in three times. GAPDH was used as the internal control. GO and biological pathway enrichment analysis GO terms analysis of selected DEGs were performed using the DAVID database ([106]https://david.ncifcrf.gov/; version: 6.8) [[107]18]. We submitted the DEGs, including 120 upregulated genes and 26 downregulated genes, into DAVID with p < 0.05 as the cut-off criterion. The GO results of significant terms for cellular component (CC), biological process (BP), and molecular function (MF) were ranked by p-value and exhibited as bar charts. The FunRich tool (version: 3.0) was mainly used for analyzing the functional enrichment and interaction networks of genes and proteins [[108]19]. In this study, the FunRich was used to analyze the biological pathways of DEGs. Finally, the top 10 biological pathways of upregulated genes and downregulated genes were presented as bar charts, respectively. p-value < 0.05 was considered as statistically significant. PPI network construction and analysis PPI networks are the networks of protein complexes formed as the results of biochemical or electrostatic forces [[109]20]. PPI network is crucial for molecular processes, and abnormal PPI is the basis of many diseases, including tumors [[110]21]. In this study, the Search Tool for the Retrieval of Interacting Genes (STRING) database ([111]https://string-db.org/cgi/input.pl; version: 11.0) [[112]20], Cytoscape software (version: 3.6.1) [[113]22], and FunRich were utilized to construct PPI networks. Cytoscape and FunRich tool were applied to present the PPI networks with the cut-off criterion as confidence score ≥ 0.4, maximum number of interactors = 0. The Molecular Complex Detection (MCODE) plug-in of Cytoscape tool was employed to visualize the significant gene modules in ESCC with degree cutoff = 2, node score cutoff = 0.2, k-core = 2, and max. depth = 100. The criteria for selecting the top 3 significant modules were set as follows: MCODE scores ≥ 4 and number of nodes ≥ 4 [[114]23]. FunRich tool was performed to do the functional enrichment for each module. 10 hub genes with high degree of connectivity were selected and mapped into PPI based on STRING following confidence score ≥ 0.4, maximum number of interactors ≤ 5. Furthermore, STRING was used to perform the co-expression analysis of hub genes. Validation of the hub genes The GEPIA2 ([115]http://gepia2.cancer-pku.cn/#index) is an online database for analyzing gene expression profiles of 9736 tumors and 8587 normal samples from the Cancer Genome Atlas (TCGA) and the genotype-tissue expression (GTEx) projects [[116]24]. Thus, we can validate the expression levels and genes correlations of hub genes in ESCC tissues and normal tissues. The cBio Cancer Genomics Portal ([117]http://www.cbioportal.org/; version: 2.2.0) is an open access tool which provides analysis, visualization, and downloads of cancer genomics datasets of many types of tumors [[118]25]. Complex cancer genomics profiles are accessible from the cBioPortal tool, thus enabling us to compare the genetic alterations of the selected ten hub genes in ESCC. miRNA–hub gene network The targeted miRNAs of hub genes were predicted by four established miRNA target prediction databases [miRanda, PITA, PicTar, and TargetScan (version: 3.1)]. The miRNAs predicted by at least two programs were selected as the targeted miRNAs of hub genes. A co-expression network based on correlation analysis of hub genes and miRNAs associated with cancer was constructed by Cytoscape software. In the network, a green circular node represented the miRNA and a red circular node represented the hub gene, their interaction was represented by an arrow. The numbers of arrows in the networks indicated the contribution of one miRNA to the surrounding genes, and the higher the degree, the more central the hub gene was within the network. Drug-hub gene interaction The 10 hub genes were also served as the promising targets for searching drugs through the DGIdb ([119]http://dgidb.genome.wustl.edu/; version: 3.0.2—sha1 ec916b2) [[120]26]. This database contains drug-gene interaction data from 30 disparate sources including ChEMBL, DrugBank, Ensembl, NCBI Entrez, PharmGKB, PubChem, clinical trial databases, and literature in NCBI PubMed. The results of this process were arranged so that each entry was a specific drug-gene interaction associated with its source link [[121]27]. Drugs supported by more than one databases or PubMed references were selected as the potential