Abstract

   Background: Non-small-cell lung cancer (NSCLC) remains the leading
   cause of cancer morbidity and mortality worldwide. In the present
   study, we identified novel biomarkers associated with the pathogenesis
   of NSCLC aiming to provide new diagnostic and therapeutic approaches
   for NSCLC.

   Methods: The microarray datasets of [50]GSE18842, [51]GSE30219,
   [52]GSE31210, [53]GSE32863 and [54]GSE40791 from Gene Expression
   Omnibus database were downloaded. The differential expressed genes
   (DEGs) between NSCLC and normal samples were identified by limma
   package. The construction of protein–protein interaction (PPI) network,
   module analysis and enrichment analysis were performed using
   bioinformatics tools. The expression and prognostic values of hub genes
   were validated by GEPIA database and real-time quantitative PCR. Based
   on these DEGs, the candidate small molecules for NSCLC were identified
   by the CMap database.

   Results: A total of 408 overlapping DEGs including 109 up-regulated and
   296 down-regulated genes were identified; 300 nodes and 1283
   interactions were obtained from the PPI network. The most significant
   biological process and pathway enrichment of DEGs were response to
   wounding and cell adhesion molecules, respectively. Six DEGs (PTTG1,
   TYMS, ECT2, COL1A1, SPP1 and CDCA5) which significantly up-regulated in
   NSCLC tissues, were selected as hub genes according to the results of
   module analysis. The GEPIA database further confirmed that patients
   with higher expression levels of these hub genes experienced a shorter
   overall survival. Additionally, CMap predicted the 20 most significant
   small molecules as potential therapeutic drugs for NSCLC. DL-thiorphan
   was the most promising small molecule to reverse the NSCLC gene
   expression.

   Conclusions: Based on the gene expression profiles of 696 NSCLC samples
   and 237 normal samples, we first revealed that PTTG1, TYMS, ECT2,
   COL1A1, SPP1 and CDCA5 could act as the promising novel diagnostic and
   therapeutic targets for NSCLC. Our work will contribute to clarifying
   the molecular mechanisms of NSCLC initiation and progression.

   Keywords: non-small-cell lung cancer, novel biomarkers, candidate small
   molecules, prognosis, bioinformatics analysis

Introduction

   Lung cancer remains the leading cause of cancer morbidity and mortality
   worldwide. In 2018, there are 234,030 newly diagnosed lung cancer
   patients, accounting for 13.5% of all types of malignant tumors. In
   addition, lung cancer results in approximately 154,050 death cases each
   year, accounting for 25.3% of all cancer-related deaths; 80–85% of all
   lung cancer patients are diagnosed with non-small-cell lung cancer
   (NSCLC) subtype and 80% lung cancer-associated deaths are caused by
   NSCLC. The subtypes of NSCLC mainly consist of lung adenocarcinoma,
   lung squamous cell carcinoma and large cell lung cancer based on
   histological sub-classification.[55]^1^–[56]^4 Although great advance
   has been made in the therapeutic methods for NSCLC such as surgical
   resection, chemotherapy, radiotherapy and targeted therapy, the
   patients’ prognosis is still far from ideal, with 5-year survival rate
   less than 20%. For advanced patients who are inoperable, chemotherapy
   such as platinum still remains the most ideal and important treatment
   strategy for NSCLC.[57]^5^–[58]^7 The poor long-term survival rate of
   NSCLC patients is mainly attributed to the lack of specific symptoms
   and effective diagnostic methods at an early stage. In additionally,
   high metastasis rate and drug resistance are also vital factors that
   can not be ignored.[59]^8^–[60]^10 In recent years, with our increased
   understanding of molecular characterization of NSCLC, molecular
   targeting therapies especially individualized precision treatment have
   undergone remarkable developments.[61]^11^,[62]^12 Despite the
   prominent progress in the molecular diagnosis and treatment for NSCLC,
   substantive breakthroughs have not yet been made in patients’
   survival.[63]^13 Therefore, there is still an urgent demand to identify
   the novel biomarkers correlated with NSCLC diagnosis and prognosis to
   elucidate the precise molecular mechanism of NSCLC occurrence and
   progression. In this study, the microarray data of [64]GSE18842,
   [65]GSE30219, [66]GSE31210, [67]GSE32863 and [68]GSE40791 from Gene
   Expression Omnibus (GEO) database was used to identify the differential
   expressed genes (DEGs) between NSCLC and adjacent normal tissues. Gene
   Ontology (GO) and pathway enrichment analysis were performed to better
   understand the biological functions of these DEGs. We also established
   a protein–protein interaction (PPI) network associated the DEGs.
   Furthermore, we also identified potential candidate small molecules for
   a better treatment of NSCLC. Six novel biomarkers were found to be
   related to the pathogenesis and prognosis of NSCLC. In summary, this
   study aimed to exploit promising novel biomarkers for NSCLC diagnosis,
   prognosis and molecular targeting therapies from new insights.
   [69]Figure 1 shows the workflow of our study.

Figure 1.

   Figure 1
   [70]Open in a new tab

   The workflow of this study.

Materials and methods

Data resources

   Series matrix files of [71]GSE18842, [72]GSE30219, [73]GSE31210,
   [74]GSE32863 and [75]GSE40791 were downloaded from GEO database
   ([76]http://www.ncbi.nlm.nih.gov/geo/). The platforms were based on
   [77]GPL9948 (Agilent Human 0.6 K miRNA Microarray G4471A; Agilent
   Technologies, Santa Clara, CA, USA) ([78]GSE32863) and [79]GPL570
   (Affymetrix Human Genome U113 Plus 2.0 Array) ([80]GSE18842,
   [81]GSE30219, [82]GSE31210 and [83]GSE40791). A total of 696 NSCLC
   samples and 237 normal samples were included in our study, of which 46
   tumor samples and 45 normal samples were in [84]GSE18842 profile, 272
   tumor samples and 14 normal samples in [85]GSE30219 profile, 226 tumor
   samples and 20 normal samples in [86]GSE31210 profile, 58 tumor samples
   and 58 normal samples in [87]GSE32863 profile, and 94 tumor samples and
   100 normal samples in [88]GSE40791 profile.

Screeningfor DEGs

   The matrix data of each dataset was performed log2 conversion and
   normalization using limma package of R/Bioconductor software.[89]^14
   The limma package was also utilized to screen and identify the DEGs
   between NSCLC samples and normal tissue sample. Adjust P-value <0.05
   and |log2FC| >1 were considered the statistical significance of
   differential expression.

Functional enrichment analysis

   GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway
   enrichment analysis were performed to determine the biological
   functions of the overlapping DEGs. GO enrichment analysis is an
   extensively used method to investigate the molecular function (MF),
   cell component (CC) and biological process (BP) of genes or gene
   products. KEGG is a widely used database for systematic analysis of
   high-level gene functions. In this study, we carried out GO function
   and KEGG pathway enrichment analysis based on the platform of Database
   for Annotation Visualization and Integrated Discovery (DAVID,
   [90]http://david.ncifcrf.gov), an online database rich in comprehensive
   annotation information of gene and protein functions. P-value <0.05 was
   considered statistically significant.[91]^15^–[92]^19

Protein–protein interaction (PPI) network construction and module analysis

   We used the online database STRING (Search Tool for the Retrieval of
   Interacting Genes, [93]https://string-db.org/) to better illustrate the
   potential interactive relationships among the overlapping DEGs.[94]^20
   Then the Cytoscape software was utilized for analyzing the interactions
   with a combined score >0.4 ([95]http://www.cytoscape.org/).[96]^21
   Finally, the plug-in MCODE (Molecular Complex Detection) was used to
   filter the significant modules from the PPI network for the selection
   of hub genes (degree cutoff = 2, node score cutoff = 0.2, k-core = 2,
   and max. depth = 100).[97]^22 We also performed functional and pathway
   enrichment analysis for the genes in the significant modules. The heat
   map of module genes was constructed using UCSC Cancer Genomics Browser
   ([98]http://genome-cancer.ucsc.edu). The Networks Gene Oncology tool
   (BiNGO), a plugin in Cytoscape, was used to explore and visualize the
   BP of the selected hub genes.[99]^23

Survival analysis and validation of hub genes

   To further explore the roles of module genes in the NSCLC occurrence
   and development, we predicted the co-expression genes of module genes
   and the co-expression network was constructed by cBioPortal online
   platform ([100]http://www.cbioportal.org).[101]^24^,[102]^25 The Gene
   Expression Profiling Interactive Analysis (GEPIA) database was utilized
   to assess the impact of hub genes on the patients’ prognosis.[103]^26
   The NSCLC patients were divided into high expression and low expression
   groups according to the median expression levels of hub genes. The
   hazard ratio (HR) with 95% CI of overall survival was calculated for
   each group. And the GEPIA platform was applied to further verify the
   expression level of hub genes between NSCLC and normal samples. We
   analyzed the protein expression of hub genes by using the human protein
   atlas (HPA, [104]www.proteinatlas.org) database considering that gene
   expression was not always consistent with its protein level.[105]^27

Identification of small molecules

   The CMap database ([106]http://www.broadinstitute.org/cmap/) was used
   to explore potential small molecule drugs for use in patients based on
   the gene signature of NSCLC. CMap collects >7,000 gene expression
   profile changes induced by various small molecular agents.[107]^28 The
   overlapping differently expressed probesets among five datasets were
   classified into up-regulated and down-regulated groups. Then, these
   probesets from the two groups were uploaded into CMap database to match
   corresponding active small molecules. Finally, the enrichment scores
   between −1 and 1, which represent similarity, were calculated. A
   positive connectivity score (closer to +1) indicated the corresponding
   small molecule is able to induce the state of NSCLC cells, whereas a
   negative connectivity score (closer to −1) demonstrate greater
   similarity between the genes. We investigated and calculated negative
   connectivity scores with potential therapeutic value.

Real-time quantitative PCR

   Total RNA from tumor tissues and non-tumorous tissues was extracted
   with Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the
   protocol. cDNA was synthesized using an Omniscript Reverse
   Transcription kit (Qiagen, Valencia, CA, USA). Quantitative real-time
   PCR (qPCR) assays were performed using EvaGreen Master Mix (Biotium
   Inc., Hayward, CA, USA). The conditions for qPCR amplification were as
   follows: 95°C for 120 s followed by 40 cycles of 95°C for 15 s,
   annealing temperature for 45 s. Each sample was run in triplicate.
   Relative expression level for each target gene was normalized by the Ct
   value of GAPDH (endogenous reference) using a 2^−ΔΔCt relative
   quantification method. The primers are as follows:

   PTTG1 gene 5ʹ-GACTCAGGCTGGAAGATTTG-3ʹ (sense) and 5ʹ-
   GGGAAGGTGGGAGAAGC-3ʹ (anti-sense).

   CDCA5 gene 5ʹ-TTTTCAGTTCCGTGGGTTTC-3ʹ (sense) and
   5ʹ-CCCAACTAAGGCTCCCTACAT-3ʹ (anti-sense).

   TYMS gene 5ʹ-AGCGAGAACCCAGACCTT-3ʹ (sense) and
   5ʹ-AATAGTTGGATGCGGATTGTA-3ʹ (anti-sense).

   ECT2 gene 5ʹ-AGGCGGAATGAACAGGA-3ʹ (sense) and 5ʹ-TTCATCTCCAAGCGGTAAA-3ʹ
   (anti-sense)

   COL1A1 gene 5ʹ-CAAGGTGTTGTGCGATGACG-3ʹ (sense) and
   5ʹ-CGACGCCGGTGGTTTCTT-3ʹ (anti-sense)

   SPP1 gene 5ʹ-CTGCCAGCAACCGAAGT-3ʹ (sense) and
   5ʹ-GTGATGTCCTCGTCTGTAGC-3ʹ (anti-sense);

   All reactions were performed on the Eppendorf Mastercycler ep realplex
   (2S; Eppendorf, Hamburg, Germany). using following cycling parameters,
   95°C for 2 min, followed by 40 cycles of 95°C for 15 s, 60°C for 45 s.

Ethics statement

   This study was performed with the approval of the institutional ethics
   committee of the Affiliated Hospital of Nantong University. And written
   informed consent had been provided for the NSCLC patients included in
   the present study, which was conducted in accordance with the
   Declaration of Helsinki.

Results

Identification of DEGs in NSCLC

   After integrated bioinformatical analysis for [108]GSE18842,
   [109]GSE30219, [110]GSE31210, [111]GSE32863 and [112]GSE40791 datasets,
   a total of 408 overlapping genes were found to be differentially
   expressed. The volcano plot showed the up-regulated and down-regulated
   DEGs in each dataset with the cutoff criterion of P<0.05 and |log2FC|
   >1. The Venn diagrams showed the 408 overlap DEGs among the three
   datasets ([113]Figure 2Ba) including 109 significantly up-regulated
   genes ([114]Figure 2Bb) and 296 down-regulated genes ([115]Figure 2Bc).

Figure 2.

   [116]Figure 2
   [117]Open in a new tab

   (A) Volcano plot of gene expression profile data between NSCLC and
   normal tissues in each dataset. Red dots: significantly up-regulated
   genes in NSCLC; green dots: significantly down-regulated genes in
   NSCLC; black dots: non-differentially expressed genes. Adj. P<0.01 and
   |log[2] FC|>1 were considered as significant. (Ba) Venn diagram of 408
   overlapping DEGs from [118]GSE18842, [119]GSE30219, [120]GSE31210,
   [121]GSE32863 and [122]GSE40791 datasets. (Bb) Up-regulated DEGs (Bc)
   Down-regulated DEGs.

Enrichment analyses

   In order to investigate the biological functions of these DEGs in
   NSCLC, GO and KEGG pathway, enrichment analysis was performed using
   DAVID. For BPs, GO analysis results indicated that up-regulated and
   down-regulated DEGs were significantly enriched in response to
   wounding, negative regulation of cell proliferation, regulation of cell
   proliferation, cell adhesion and response to steroid hormone stimulus.
   CC analysis showed that these DEGs were particularly involved in
   extracellular region part, extracellular region, extracellular space,
   proteinaceous extracellular matrix and extracellular matrix. Similarly,
   changes in MF of DEGs were significantly enriched in carbohydrate
   binding, growth factor binding, glycosaminoglycan binding, transforming
   growth factor beta binding and pattern binding. Furthermore, KEGG
   pathway enrichment analysis revealed that these DEGs were mainly
   enriched in cell adhesion molecules (CAMs), leukocyte transendothelial
   migration, TGF-beta signaling pathway, complement and coagulation
   cascades and ECM-receptor interaction ([123]Figure 3 and [124]Table 1).

Figure 3.

   [125]Figure 3
   [126]Open in a new tab

   Functional and signaling pathway analysis of the overlapped DEGs in
   NSCLC. (A) Biological processes, (B) cellular components, (C) molecular
   function and (D) KEGG pathway.

Table 1.

   Functional and pathway enrichment analysis of the overlap DEGs
   Category Term Count PValue
   GOTERM_BP_FAT GO:0009611~response to wounding 39 3.00E-09
   GOTERM_BP_FAT GO:0008285~negative regulation of cell proliferation 31
   5.60E-09
   GOTERM_BP_FAT GO:0048545~response to steroid hormone stimulus 22
   1.08E-08
   GOTERM_BP_FAT GO:0042127~regulation of cell proliferation 48 1.34E-08
   GOTERM_BP_FAT GO:0007155~cell adhesion 44 2.70E-08
   GOTERM_CC_FAT GO:0044421~extracellular region part 78 5.55E-20
   GOTERM_CC_FAT GO:0005576~extracellular region 102 5.06E-12
   GOTERM_CC_FAT GO:0005615~extracellular space 52 5.87E-12
   GOTERM_CC_FAT GO:0005578~proteinaceous extracellular matrix 32 2.19E-10
   GOTERM_CC_FAT GO:0031012~extracellular matrix 33 3.36E-10
   GOTERM_MF_FAT GO:0030246~carbohydrate binding 28 1.64E-07
   GOTERM_MF_FAT GO:0019838~growth factor binding 14 1.59E-06
   GOTERM_MF_FAT GO:0005539~glycosaminoglycan binding 16 1.64E-06
   GOTERM_MF_FAT GO:0050431~transforming growth factor beta binding 6
   1.92E-06
   GOTERM_MF_FAT GO:0001871~pattern binding 16 5.41E-06
   KEGG_PATHWAY hsa04514:Cell adhesion molecules (CAMs) 13 6.98E-04
   KEGG_PATHWAY hsa04670:Leukocyte transendothelial migration 10 0.010253
   KEGG_PATHWAY hsa04350:TGF-beta signaling pathway 8 0.017312
   KEGG_PATHWAY hsa04610:Complement and coagulation cascades 7 0.019019
   KEGG_PATHWAY hsa04512:ECM-receptor interaction 7 0.044383
   [127]Open in a new tab

PPI network construction and module analysis

   The STRING database and Cytoscape were used to construct a PPI network
   of the potential interactions between the overlapping DEGs. As
   presented in [128]Figure 4, there were 300 nodes and 1283 interactions
   found in the network. The top three significant modules were detected
   by MCODE ([129]Figure 5). Pathway enrichment analysis suggested that
   the module1 genes were mainly enriched in DNA replication, cell cycle
   and oocyte meiosis ([130]Figure 5A). The genes in module 2 were mainly
   enriched in tumor necrosis factor signaling pathway, CAMs and African
   trypanosomiasis ([131]Figure 5B). The genes in module 3 were
   significantly enriched in PPAR signaling pathway, ECM-receptor
   interaction and Focal adhesion ([132]Figure 5C) ([133]Table 2). The
   heat map clearly showed the significant difference of module genes
   between cancer tissues and adjacent tissues([134]Figure 6A and B).
   Macromolecular complex subunit organization, S phase and cell cycle
   phase were the main BP of module genes ([135]Figure 6C). Among the
   module genes, we selected PTTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5
   with high degree of connectivity as hub genes ([136]Table 3). The
   expression of hub genes in NSCLC tissues was significantly up-regulated
   compared to normal tissues.

Figure 4.

   [137]Figure 4
   [138]Open in a new tab

   The protein–protein interaction networks of overlapping DEGs.

Figure 5.

   [139]Figure 5
   [140]Open in a new tab

   The three most significant modules extracted from PPI network and KEGG
   pathway analysis of module genes.

Table 2.

   The pathway enrichment analysis of module genes
   Module pathway ID pathway description observed gene count false
   discovery rate
   Module1 hsa3030 DNA replication 3 0.00264
   Module1 hsa4110 Cell cycle 4 0.00264
   Module1 hsa4114 Oocyte meiosis 3 0.0336
   Module2 hsa4668 TNF signaling pathway 3 0.00213
   Module2 hsa4514 Cell adhesion molecules (CAMs) 3 0.00225
   Module2 hsa5143 African trypanosomiasis 2 0.00672
   Module3 hsa3320 PPAR signaling pathway 3 0.00117
   Module3 hsa4512 ECM-receptor interaction 3 0.00117
   Module3 hsa4510 Focal adhesion 3 0.0103
   [141]Open in a new tab

Figure 6.

   [142]Figure 6
   [143]Open in a new tab

   (A, B) The heatmap of module genes between NSCLC (LUAD and LUSC) and
   normal samples. (C) The BiNGO revealed the biological process of module
   genes. The color depth of nodes represents the corrected P-value. The
   size of nodes represents the number of genes involved.

Table 3.

   The full name and functional roles of hub genes
   Gene symbol Full name Funcution
   PTTG1 Pituitary Tumor-Transforming 1 The encoded protein is a homolog
   of yeast securin proteins, which prevent separins from promoting sister
   chromatid separation. It is an anaphase-promoting complex (APC)
   substrate that associates with a separin until activation of the APC.
   The gene product has transforming activity in vitro and tumorigenic
   activity in vivo, and the gene is highly expressed in various tumors.
   The gene product contains 2 PXXP motifs, which are required for its
   transforming and tumorigenic activities, as well as for its stimulation
   of basic fibroblast growth factor expression. It also contains a
   destruction box (D box) that is required for its degradation by the
   APC. The acidic C-terminal region of the encoded protein can act as a
   transactivation domain. The gene product is mainly a cytosolic protein,
   although it partially localizes in the nucleus. Three transcript
   variants encoding the same protein have been found for this gene.
   CDCA5 Cell Division Cycle Associated 5 CDCA5 (Cell Division Cycle
   Associated 5) is a Protein Coding gene. Diseases associated with CDCA5
   include Cornelia De Lange Syndrome. Among its related pathways are Cell
   Cycle, Mitotic and MicroRNAs in cancer. Gene Ontology (GO) annotations
   related to this gene include chromatin binding.
   TYMS Thymidylate Synthetase Thymidylate synthase catalyzes the
   methylation of deoxyuridylate to deoxythymidylate using,
   10-methylenetetrahydrofolate (methylene-THF) as a cofactor. This
   function maintains the dTMP (thymidine-5-prime monophosphate) pool
   critical for DNA replication and repair. The enzyme has been of
   interest as a target for cancer chemotherapeutic agents. It is
   considered to be the primary site of action for 5-fluorouracil,
   5-fluoro-2-prime-deoxyuridine, and some folate analogs. Expression of
   this gene and that of a naturally occurring antisense transcript,
   mitochondrial enolase superfamily member 1 (GeneID:55556), vary
   inversely when cell-growth progresses from late-log to plateau phase.
   Polymorphisms in this gene may be associated with etiology of
   neoplasia, including breast cancer, and response to chemotherapy.
   ECT2 Epithelial Cell Transforming 2 The protein encoded by this gene is
   a guanine nucleotide exchange factor and transforming protein that is
   related to Rho-specific exchange factors and yeast cell cycle
   regulators. The expression of this gene is elevated with the onset of
   DNA synthesis and remains elevated during G2 and M phases. In situ
   hybridization analysis showed that expression is at a high level in
   cells undergoing mitosis in regenerating liver. Thus, this protein is
   expressed in a cell cycle-dependent manner during liver regeneration,
   and is thought to have an important role in the regulation of
   cytokinesis. Several transcript variants encoding different isoforms
   have been found for this gene.
   COL1A1 Collagen Type I Alpha 1 Chain This gene encodes the pro-alpha1
   chains of type I collagen whose triple helix comprises two alpha1
   chains and one alpha2 chain. Type I is a fibril-forming collagen found
   in most connective tissues and is abundant in bone, cornea, dermis and
   tendon. Mutations in this gene are associated with osteogenesis
   imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos
   syndrome Classical type, Caffey Disease and idiopathic osteoporosis.
   Reciprocal translocations between chromosomes 17 and 22, where this
   gene and the gene for platelet-derived growth factor beta are located,
   are associated with a particular type of skin tumor called
   dermatofibrosarcoma protuberans, resulting from unregulated expression
   of the growth factor. Two transcripts, resulting from the use of
   alternate polyadenylation signals, have been identified for this gene.
   SPP1 Secreted Phosphoprotein 1 The protein encoded by this gene is
   involved in the attachment of osteoclasts to the mineralized bone
   matrix. The encoded protein is secreted and binds hydroxyapatite with
   high affinity. The osteoclast vitronectin receptor is found in the cell
   membrane and may be involved in the binding to this protein. This
   protein is also a cytokine that upregulates expression of
   interferon-gamma and interleukin-12. Several transcript variants
   encoding different isoforms have been found for this gene.
   [144]Open in a new tab

Analysis and validation of hub genes

   The prognostic information of PTTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5
   was freely obtained from GEPIA database. A total of 962 NSCLC patients
   were available for survival analysis. It was found that the high
   expression level of PTTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5 was
   markedly associated with worse overall survival for NSCLC patients
   ([145]Figures 7and [146]8A). This finding further confirmed the key
   role of these hub genes in the onset of NSCLC. Based on the
   immunohistochemical staining results from HPA database, the protein
   expression level of PTTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5 was
   consistent with their gene expression, that is, the protein levels of
   hub genes were also in a higher expression state in NSCLC tissues
   compared to normal tissues ([147]Figure 7B). In addition, we
   established a network of module genes and their co-expression genes
   ([148]Figure 9A). In summary, PTTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5
   could represent the important diagnostic and prognostic biomarkers for
   NSCLC.

Figure 7.

   [149]Figure 7
   [150]Open in a new tab

   The expression level of hub genes according to the GEPIA database.

Figure 8.

   [151]Figure 8
   [152]Open in a new tab

   (A) The survival analysis for hub genes according to the GEPIA
   database. (B) The protein level expression of hub genes in NSCLC and
   normal tissues using immunohistochemistry.

Figure 9.

   [153]Figure 9
   [154]Open in a new tab

   (A) The network of module genes and their co-expression genes
   constructed by cBioPortal. Nodes with thick outline: hub genes; nodes
   with thin outline: co-expression genes. (B) The 20 most small molecules
   drugs identified by CMap database. (C) qPCR validation of these six hub
   genes in seven paired NSCLC samples. *P<0.05.

Identification of related active small molecules

   To identify candidate small molecule drugs targeting the gene
   expression of NSCLC, all the overlapping DEGs, which were divided into
   up-regulated and down-regulated groups, were submitted to the CMap
   database. The 20 most significant small molecules matched to the NSCLC
   gene expression changes are listed in [155]Table 4 and [156]Figure 9B.
   Among these small molecules, DL-thiorphan (enrichment score = −0.826)
   and phenoxybenzamine (enrichment score = −0.823) showed a highly
   significant negative correlation and have the potential to reverse the
   tumoral status of NSCLC. This analysis provided novel insights into the
   treatment of NSCLC. However, further studies were still needed to
   explore the molecular mechanism of these small molecules in NSCLC. To
   futher investigate the moleuclar mechinism of the hub genes in NSCLCS,
   we predicted potential transcription factors (Figure S1) and
   constructed a regulatory network of lncRNA, miRNA and mRNA (Figure S2)
   by Gene-Cloud Biotechnology Information (GCBI) database.

Table 4.

   List of the 20 most significant small molecule drugs
          cmap name         enrichment    p
   DL-thiorphan             -0.92      0.01312
   phenoxybenzamine         -0.9       0.00016
   sanguinarine             -0.844     0.04873
   blebbistatin             -0.826     0.05964
   repaglinide              -0.797     0.00334
   medrysone                -0.795     0.00016
   8-azaguanine             -0.792     0.00378
   menadione                -0.778     0.09664
   milrinone                -0.774     0.02354
   Y-27632                  -0.774     0.10102
   4,5-dianilinophthalimide -0.757     0.11651
   puromycin                -0.748     0.00806
   5255229                  -0.748     0.12557
   pyrvinium                -0.736     0.0007
   rottlerin                -0.736     0.03758
   5224221                  -0.736     0.13802
   ginkgolide A             -0.733     0.01011
   0297417-0002B            -0.72      0.04521
   estriol                  -0.718     0.01281
   sulconazole              -0.71      0.01428
   [157]Open in a new tab

Evaluation of gene expression in NSCLC

   To further verify the expression of PTTG1, CDCA5, TYMS, ECT2, COL1A1
   and SPP1 genes in NSCLC tissues and corresponding adjacent normal
   tissues, we choose seven pairs of tumor tissues and corresponding
   adjacent tissues. Relative expression of PTTG1, CDCA5, TYMS, ECT2,
   COL1A1 and SPP1 mRNA in NSCLC and adjacent non-tumorous tissues were
   quantified by qPCR. The results showed that the average PTTG1, CDCA5,
   TYMS, ECT2, COL1A1 and SPP1 mRNA expression level in NSCLC tissues was
   significantly higher compared with non-tumorous tissues (*p<0.05,
   compared with adjacent non-tumorous tissues, [158]Figure 9C).

Discussion

   Recently, the rapid advance in microarray and high-throughput
   technologies has expanded the application biomedicine in clinical
   practice, such as cancer early diagnosis, novel targeted drug discovery
   and prognosis prediction. GEO database, as a public repository for
   archiving high-throughput microarray experimental data, has provided
   the powerful tools to determine key genes and pathways associated with
   the pathogenesis of tumors.[159]^29^,[160]^30 In the present study,
   based on the GEO database, five gene expression profiles including 696
   NSCLC samples and 237 normal samples were integrated for a
   comprehensive bioinformatics analysis. The aim of our study was to find
   the potential small molecule drugs for the treatment of NSCLC and to
   identify the novel biomarkers correlated with the pathogenesis and
   prognosis of NSCLC. A total of 408 overlapping DEGs between tumor
   tissues and corresponding adjacent normal tissues were identified,
   which consisted of 109 up-regulated genes and 296 down-regulated genes.
   For a better in-depth understanding of these overlapping DEGs, the GO
   function and KEGG pathway enrichment for these DEGs were performed. GO
   term analysis was carried out via the following aspects: BP, MF and CC.
   The BP analysis showed that these DEGs were mainly enriched in response
   to wounding, negative regulation of cell proliferation and regulation
   of cell proliferation. MF analysis indicated that these DEGs were
   significantly associated with carbohydrate binding, growth factor
   binding and glycosaminoglycan binding. Changes in CC were mainly
   enriched in extracellular region part, extracellular region and
   extracellular space. The KEGG pathway enrichment analysis revealed nine
   significant signaling pathways including CAMs, leukocyte
   transendothelial migration, TGF-beta signaling pathway, complement and
   coagulation cascades and ECM-receptor interaction. Multiple CAM are
   involved in the tumor growth, metastasis and angiogenesis. Vascular
   CAM-1 was first noted as an endothelial cell adhesion receptor for more
   than two decades, which plays a key role in leukocyte recruitment in
   cellular immune responses. The L1 cell adhesion molecule (L1CAM), as
   neural adhesion molecules, extensively participates in the progression
   of human malignant tumors.[161]^31^,[162]^32 Targeting the TGFβ pathway
   has been used for various cancer therapy.[163]^33^,[164]^34 An
   increasing number of studies reveal that the ECM-receptor interaction
   pathway is significantly associated with the various cancer cells
   proliferation and invasion. Zhang et al demonstrated that Twist2 is
   involved in the proliferation and invasion of kidney cancer cells
   through regulating the expression of two molecules in the ECM-receptor
   interaction pathway: ITGA6 and CD44.[165]^35 In summary, the above
   theories strongly supported our findings from bioinformatics analysis.
   The PPI network complex based on DEGs-encoding proteins was constructed
   and 300 nodes with 1283 interactions were obtained. The MCODE plug-in
   extracted three modules with the most significant degree from the PPI
   network. TTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5 with high degree of
   connectivity were selected as hub genes. Survival analysis for 962
   NSCLC patients from GEPIA database showed that patients with high
   expression levels of PTTG1, TYMS, ECT2, COL1A1, SPP1 and CDCA5
   experienced a worse prognosis than those with low expression. To
   validate the results of bioinformatics analysis, we performed qPCR
   analysis to evaluate the expression of hub genes expression in seven
   paired NSCLC tissues. The qPCR analysis showed the same gene expression
   trend as found in the GEO database, thereby verifying the reliability
   of our results. Additionally, the establishment of a network of
   lncRNA–miRNA–mRNA and the prediction of transcription factors will
   enhance our understanding of the mechanism of PTTG1, TYMS, ECT2,
   COL1A1, SPP1 and CDCA5 in the pathogenesis of NSCLC. Previous studies
   have proved that PTTG1 protein is abundantly expressed in various
   invasive tumors and hematopoietic malignant tumors, but its expression
   level is low or undetectable in most normal tissues. Several studies
   have further emphasized the role of PTTG1 in the growth and metastasis
   of tumors. They have shown that ectopic expression of PTTG1 enhances
   proliferation or invasiveness in various histologically derived cancer
   cell lines, whereas silencing of PTTG1 produces the opposite
   result.[166]^36^,[167]^37 CDCA5 plays a key role in ensuring the
   accurate separation of sister chromatids in S and G2/M phases of cell
   cycle by interacting with coherents and cdk1. Additionally, CDCA5 also
   interacts with the key regulatory factors ERK and cyclin E1 of
   G1/Smitotic checkpoint. Recent studies have shown that the expression
   of CDCA5 in oral squamous cell carcinoma, urothelial cell carcinoma and
   gastric cancer, which is related to tumorigenesis and tissue
   invasion.[168]^38^,[169]^39 ECT2 is a guanine nucleotide exchange
   factor (GEF), which is related to tumor cell differentiation, TNM
   stage, prognosis and lymph node metastasis, such as breast cancer,
   osteosarcoma cells, gastric cancer, and gliomas.[170]^40^,[171]^41
   COL1A1 is a target gene of miR-133a-3p in oral squamous cell carcinoma
   and miR-129-5p in gastric cancer.[172]^42^,[173]^43 However, no studies
   have reported the potential mechanism of ITGB5 and RGS4 in the
   initiation and progression of NSCLC. The above studies indicated PTTG1,
   TYMS, ECT2, COL1A1, SPP1 and CDCA5 may also play an important role in
   the occurrence and development of NSCLC.

   In additionally, analyzed with the overlapping DEGs and CMap database,
   we determined a set of small molecule drugs that had potential to
   reverse the gene expression changes of NSCLC. The small molecules with
   a highly significant negative enrichment value may become new targeted
   drugs for the treatment of NSCLC. DL-thiorphan, as the most significant
   small molecule (enrichment score = −0.826), was the most promising
   small molecule to reverse the abnormal NSCLC gene expression. It is
   worth noting that so far no research has focused on the potential role
   of this small molecule in NSCLC. Similarly, the relationship between
   phenoxybenzamine (enrichment score = −0.823) and NSCLC was also not
   investigated. This information is beneficial for the development of
   novel targeted drugs for the treatment of NSCLC. Given the emergence of
   these candidate biomarkers in silico, in vitro studies (with cell
   lines, etc.) and then in vivo experiments could be worth of interest in
   functional validation.

Conclusion

   In this study, six key genes were identified for the first time in
   NSCLC by integrated bioinformatics analysis. Survival analysis revealed
   that high expression levels of PTTG1, TYMS, ECT2, COL1A1, SPP1 and
   CDCA5 were markedly associated with worse prognosis of patients. These
   hub genes could act as the promising novel biomarkers for the
   diagnosis, prognosis and treatment of NSCLC. We also revealed several
   crucial signaling pathways correlated with the NSCLC initiation and
   progression. Furthermore, we identified a set of candidate small
   molecule drugs which could reverse the abnormal gene expression of
   NSCLC. We hope the present study could provide powerful evidence for
   the future development of genomic individualized treatment in NSCLC.

Disclosure

   The authors report no conflicts of interest in this work.

Supplementary materials

Figure S1.

   [174]Figure S1
   [175]Open in a new tab

   The potential transcription factors associated with the expression of
   hub genes.

Figure S2.

   [176]Figure S2
   [177]Open in a new tab

   A regulatory network of lncRNA-miRNA-mRNA constructed by GCBI. Purple
   nodes: related lncRNA; Blue nodes: targeted miRNA.

References