Abstract

Background

   Esophageal squamous cell carcinoma (ESCC) is one of leading malignant
   cancers of gastrointestinal tract worldwide. Until now, the involved
   mechanisms during the development of ESCC are largely unknown. This
   study aims to explore the driven-genes and biological pathways in ESCC.

Methods

   mRNA expression datasets of [49]GSE29001, [50]GSE20347, [51]GSE100942,
   and [52]GSE38129, containing 63 pairs of ESCC and non-tumor tissues
   data, were integrated and deeply analyzed. The bioinformatics
   approaches include identification of differentially expressed genes
   (DEGs) and hub genes, gene ontology (GO) terms analysis and biological
   pathway enrichment analysis, construction and analysis of
   protein–protein interaction (PPI) network, and miRNA–gene network
   construction. Subsequently, GEPIA2 database and qPCR assay were
   utilized to validate the expression of hub genes. DGIdb database was
   performed to search the candidate drugs for ESCC.

Results

   Finally, 120 upregulated and 26 downregulated DEGs were identified. The
   functional enrichment of DEGs in ESCC were mainly correlated with cell
   cycle, DNA replication, deleted in colorectal cancer (DCC) mediated
   attractive signaling pathway, and Netrin-1 signaling pathway. The PPI
   network was constructed using STRING software with 146 nodes and 2392
   edges. The most significant three modules in PPI were filtered and
   analyzed. Totally ten genes were selected and considered as the hub
   genes and nuclear division cycle 80 (NDC80) was closely related to the
   survival of ESCC patients. DGIdb database predicted 33 small molecules
   as the possible drugs for treating ESCC.

Conclusions

   In summary, the data may provide new insights into ESCC pathogenesis
   and treatments. The candidate drugs may improve the efficiency of
   personalized therapy in future.

Electronic supplementary material

   The online version of this article (10.1186/s12935-019-0854-6) contains
   supplementary material, which is available to authorized users.

   Keywords: Esophageal squamous cell carcinoma, Bioinformatics, Hub
   genes, Cell cycle, Differentially expressed genes, Drug

Background

   Esophageal cancer (EC) ranks seventh in terms of incidence and sixth in
   cancer deaths worldwide, responsible for about 572,000 new cases and
   509,000 deaths last year [[53]1]. Although we have made great progress
   on the early diagnosis and novel therapy, EC still is one of
   challengeable diseases in Eastern Asian [[54]1]. Generally, EC includes
   two most common histologic subtypes: esophageal squamous cell carcinoma
   (ESCC) and esophageal adenocarcinoma (EAC) [[55]2]. ESCC comprises over
   90% of all EC cases [[56]1]. And risk factors, such as smoking and hot
   drinks, are closely related to the initiation of ESCC [[57]1, [58]2].
   However, the underlying mechanisms of ESCC are not well understood. And
   due to the lack of specific biomarkers, most ESCC patients are
   diagnosed at a late stage, leading to particularly poor outcomes of
   patients [[59]3]. Even worse, some of ESCC patients suffer from tumor
   recurrence due to the chemotherapy resistance [[60]3]. Therefore, it is
   of paramount importance to find novel biomarkers and effective targets
   for ESCC patients.

   Recently, gene profile and gene chip have been extensively applied in
   the field of scientific researches [[61]4, [62]5]. Gene expression
   analysis based on these methods can quickly detect the differentially
   expressed genes (DEGs) that may have a strong influence on cancer
   progression [[63]6]. However, most of the gene chip or gene profile
   data have been only deposited in public databases. And re-analyzing
   these data can be an efficient way to provide the new insights into
   further studies. So far, many studies have used gene chip or gene
   profile to identify key genes for ESCC, and numerous DEGs have been
   detected [[64]7]. Nevertheless, the results may be inconsistent and
   variable because of the existence of tumor heterogeneity. To date, few
   reliable biomarkers and therapeutic targets have been identified for
   ESCC [[65]8]. Thus, it’s urgent to discover new markers and therapeutic
   targets for ESCC patients.

   Many chemotherapeutic drugs have shown activity against ESCC, including
   docetaxel [[66]9–[67]11], cisplatin [[68]10, [69]11], fluorouracil
   [[70]9–[71]11], and nedaplatin [[72]9]. Moreover, the combinations of
   these agents are also recommended because of the existence of
   chemotherapy resistance. A recent study found that concurrent
   chemoradiotherapy (CCRT) with 5-fluorouracil plus cisplatin were more
   effective and less toxic than CCRT with the docetaxel plus cisplatin as
   the first-line treatment for ESCC patients [[73]11]. However, the
   progression-free survival and overall survival (OS) of ESCC patients
   remained short, highlighting the importance of developing some
   molecular drugs.

   In the study, four mRNA expression profiles were downloaded
   ([74]GSE29001 [[75]12], [76]GSE20347 [[77]13], [78]GSE100942 [[79]14],
   and [80]GSE38129 [[81]15]) from GEO database, from which there are 63
   pairs of ESCC and non-tumor tissues data available. Integrated analyses
   included identifying DEGs using the GEO2R tool, overlapping four
   datasets using a Venn diagram tool, GO terms analysis, biological
   pathway enrichment analysis, PPI construction, hub genes identification
   and verification, miRNA–hub genes network construction, and exploration
   of the candidate small molecular drugs for ESCC.

Materials and methods

Data collection

   ESCC and adjacent normal tissue gene expression profiles of
   [82]GSE20347 [[83]13], [84]GSE29001 [[85]12], [86]GSE100942 [[87]14],
   and [88]GSE38129 [[89]15] were downloaded from GEO
   ([90]http://www.ncbi.nlm.nih.gov/geo/) database [[91]16]. The
   microarray data of [92]GSE29001 was based on [93]GPL571 Platforms
   (Affymetrix Human Genome U133A 2.0 Array) and included 12 pairs of ESCC
   and non-tumor tissues (Submission date: May 02, 2011). The [94]GSE20347
   data was based on [95]GPL571 Platforms (Affymetrix Human Genome U133A
   2.0 Array) and included 17 ESCC tissues and 17 normal tissues
   (Submission date: Feb 16, 2010). The [96]GSE100942 data was based on
   [97]GPL570 Platforms (Affymetrix Human Genome U133 Plus 2.0 Array) and
   included 4 ESCC tissues and 4 non-tumor tissues (Submission date: Jul
   07, 2017). The [98]GSE38129 data was based on [99]GPL571 Platforms
   (Affymetrix Human Genome U133A 2.0 Array) and included 30 pairs of ESCC
   and non-tumor tissues (Submission date: May 22, 2012). The above
   datasets met the following criteria: (1) they used tissue samples from
   human ESCC tissues and paired adjacent or non-tumor tissues; (2) each
   dataset involved more than eight samples.

DEGs identification

   GEO2R ([100]https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to pick
   out the DEGs in ESCC tissues and adjacent non-tumor tissues [[101]17].
   p < 0.05 and |logFC| > 1 were set as the cut-off criterion to select
   DEGs for every dataset microarray respectively [[102]7, [103]17].
   Finally, the overlapping DEGs among the four datasets was identified by
   Venn diagram tool ([104]http://bioinfogp.cnb.csic.es/tools/venny/).

Cell culture, RNA extraction and quantitative PCR (qPCR)

   Human ESCC cell line EC109 and human esophageal squamous epithelial
   cell line Het-1A were cultured in RPMI-1640 medium (Gibco) with 10%
   fetal bovine serum (Gibco) at 37 °C in a humidified atmosphere with 5%
   CO[2]. Total RNA was extracted from cells using the E.Z.N.A.™ Total RNA
   Kit I (OMEGA). PrimeScript™ RT Master Mix (Perfect Real Time) was used
   for RNA reverse transcription. SYBR Premix Ex Taq (TaKaRa) was employed
   to conduct qPCR assay. PCR primers were designed and synthesized by
   TaKaRa (Additional file [105]1: Table S1). The experiments were
   performed in three times. GAPDH was used as the internal control.

GO and biological pathway enrichment analysis

   GO terms analysis of selected DEGs were performed using the DAVID
   database ([106]https://david.ncifcrf.gov/; version: 6.8) [[107]18]. We
   submitted the DEGs, including 120 upregulated genes and 26
   downregulated genes, into DAVID with p < 0.05 as the cut-off criterion.
   The GO results of significant terms for cellular component (CC),
   biological process (BP), and molecular function (MF) were ranked by
   p-value and exhibited as bar charts. The FunRich tool (version: 3.0)
   was mainly used for analyzing the functional enrichment and interaction
   networks of genes and proteins [[108]19]. In this study, the FunRich
   was used to analyze the biological pathways of DEGs. Finally, the top
   10 biological pathways of upregulated genes and downregulated genes
   were presented as bar charts, respectively. p-value < 0.05 was
   considered as statistically significant.

PPI network construction and analysis

   PPI networks are the networks of protein complexes formed as the
   results of biochemical or electrostatic forces [[109]20]. PPI network
   is crucial for molecular processes, and abnormal PPI is the basis of
   many diseases, including tumors [[110]21]. In this study, the Search
   Tool for the Retrieval of Interacting Genes (STRING) database
   ([111]https://string-db.org/cgi/input.pl; version: 11.0) [[112]20],
   Cytoscape software (version: 3.6.1) [[113]22], and FunRich were
   utilized to construct PPI networks. Cytoscape and FunRich tool were
   applied to present the PPI networks with the cut-off criterion as
   confidence score ≥ 0.4, maximum number of interactors = 0. The
   Molecular Complex Detection (MCODE) plug-in of Cytoscape tool was
   employed to visualize the significant gene modules in ESCC with degree
   cutoff = 2, node score cutoff = 0.2, k-core = 2, and max. depth = 100.
   The criteria for selecting the top 3 significant modules were set as
   follows: MCODE scores ≥ 4 and number of nodes ≥ 4 [[114]23]. FunRich
   tool was performed to do the functional enrichment for each module. 10
   hub genes with high degree of connectivity were selected and mapped
   into PPI based on STRING following confidence score ≥ 0.4, maximum
   number of interactors ≤ 5. Furthermore, STRING was used to perform the
   co-expression analysis of hub genes.

Validation of the hub genes

   The GEPIA2 ([115]http://gepia2.cancer-pku.cn/#index) is an online
   database for analyzing gene expression profiles of 9736 tumors and 8587
   normal samples from the Cancer Genome Atlas (TCGA) and the
   genotype-tissue expression (GTEx) projects [[116]24]. Thus, we can
   validate the expression levels and genes correlations of hub genes in
   ESCC tissues and normal tissues. The cBio Cancer Genomics Portal
   ([117]http://www.cbioportal.org/; version: 2.2.0) is an open access
   tool which provides analysis, visualization, and downloads of cancer
   genomics datasets of many types of tumors [[118]25]. Complex cancer
   genomics profiles are accessible from the cBioPortal tool, thus
   enabling us to compare the genetic alterations of the selected ten hub
   genes in ESCC.

miRNA–hub gene network

   The targeted miRNAs of hub genes were predicted by four established
   miRNA target prediction databases [miRanda, PITA, PicTar, and
   TargetScan (version: 3.1)]. The miRNAs predicted by at least two
   programs were selected as the targeted miRNAs of hub genes. A
   co-expression network based on correlation analysis of hub genes and
   miRNAs associated with cancer was constructed by Cytoscape software. In
   the network, a green circular node represented the miRNA and a red
   circular node represented the hub gene, their interaction was
   represented by an arrow. The numbers of arrows in the networks
   indicated the contribution of one miRNA to the surrounding genes, and
   the higher the degree, the more central the hub gene was within the
   network.

Drug-hub gene interaction

   The 10 hub genes were also served as the promising targets for
   searching drugs through the DGIdb ([119]http://dgidb.genome.wustl.edu/;
   version: 3.0.2—sha1 ec916b2) [[120]26]. This database contains
   drug-gene interaction data from 30 disparate sources including ChEMBL,
   DrugBank, Ensembl, NCBI Entrez, PharmGKB, PubChem, clinical trial
   databases, and literature in NCBI PubMed. The results of this process
   were arranged so that each entry was a specific drug-gene interaction
   associated with its source link [[121]27]. Drugs supported by more than
   one databases or PubMed references were selected as the potential