Abstract

Differential expression of certain genes during tumorigenesis may serve to identify novel, clinically manageable targets. In this work, we used an integrated bioinformatics approach to analyze public microarray datasets from the Gene Expression Omnibus (GEO) and to explore the key differentially expressed genes (DEGs) in non-small cell lung cancer (NSCLC). We identified a total of 984 common DEGs in 252 healthy and 254 NSCLC gene expression samples. The top 10 DEGs emerging from pathway enrichment and protein–protein interaction analysis were further investigated for their prognostic performance. Among these, high expression of CDC20, AURKA, CDK1, EZH2, and CDKN2A was associated with significantly poorer overall survival in NSCLC patients. In contrast, high mRNA expression of CBL, FYN, LRRK2, and SOCS2 was associated with a significantly better prognosis. Furthermore, our drug target analysis for these hub genes suggests a potential use of the antineoplastic agents Trichostatin A, Pracinostat, TGX-221, PHA-793887, AG-879, and IMD0354 to reverse the expression of these DEGs in NSCLC patients.

Keywords: Non-small cell lung cancer, differential gene expressions, integrative bioinformatics analysis, drug target potential analysis

Introduction

Lung cancer is one of the deadliest diseases worldwide. The GLOBOCAN estimates by the International Agency for Research on Cancer predict approximately 13 million new cancer cases by the year 2040. Currently, lung cancer is the most commonly diagnosed form of cancer (11.6% of total cases) and the leading cause of cancer deaths (18.4% of total deaths).^1 Histologically, lung cancer can be divided into non-small cell lung cancer (NSCLC), which accounts for approximately 85% of lung cancer cases, and small cell lung cancer.^2 The lack of early-stage symptoms and of effective diagnostic markers limits treatment success in NSCLC.
Thus, most patients are diagnosed at an advanced stage, and half of them have distant metastatic disease at initial diagnosis.^3

Over the last decade, there have been considerable improvements in chemotherapy, radiation therapy, surgery, and targeted therapy for lung cancer. With the recent advances in molecular biology in particular, significant progress has been made through molecule-targeted therapy in NSCLC. For instance, about 20% of Caucasian and 50% of Asian NSCLC patients have been shown to carry mutations in the EGFR (epidermal growth factor receptor) gene.^4 With the application of the EGFR-targeting small-molecule tyrosine kinase inhibitors (EGFR-TKIs) erlotinib and gefitinib, both the response rate and the median survival of these patients were found to improve.^5 Similarly, approximately 7% of NSCLC patients bear an activated ALK gene, and treatment with the ALK inhibitor crizotinib has been shown to improve both response and 6-month progression-free survival rates in these patients.^6 Nonetheless, the 5-year survival rate of NSCLC patients remains low, and the prognosis poor, because of intrinsic or acquired chemoresistance against therapeutic drugs.^7

Non-small cell lung cancer results from the accumulation of several genetic and epigenetic modifications, which may arise from multiple causes.^8 The discovery and characterization of new prognostic or diagnostic markers, together with enhanced therapeutic approaches for NSCLC, are a top priority for the successful treatment of this disease. Unfortunately, the heterogeneous nature of the tumor and the factors involved in NSCLC tumor development are far from fully understood. Therefore, it is highly important to shed light on the molecular mechanisms governing the pathogenesis of NSCLC and to identify effective diagnostic and/or prognostic biomarkers for novel treatment options.

High-throughput technologies such as microarrays, combined with integrated bioinformatics methods, are used to detect gene alterations during tumorigenesis and to identify novel prognostic markers in patients with cancer.^9,10 For instance, in a recent study, Huang and Gao^11 demonstrated that the CDC20, CENPF, KIF2C, and ZWINT genes were differentially expressed in NSCLC tissues. Similarly, Xiao et al^12 identified CCNB1, CCNA2, CEP55, PBK, and HMMR as hub genes and key differentially expressed genes (DEGs) associated with NSCLC by bioinformatics analyses. Interestingly, Wang et al^13 showed, through a gene set enrichment analysis, that CCND1 is the most enriched gene and a potential prognostic biomarker in NSCLC. More recently, Zhang et al^14 identified TOP2A, CCNB1, BIRC5, and TTK, as well as miR-21-5p and miR-31-5p, as significantly associated with NSCLC prognosis through an integrative analysis of mRNA and miRNA expression profiles. Nevertheless, genes discovered in one cohort may be difficult to identify in other cohorts.^15 For this reason, it is essential to validate genes in several independent studies.

In this study, we sought to identify potential therapeutic targets or prognostic biomarkers among the DEGs associated with NSCLC through an integrated bioinformatics approach. For this purpose, we retrieved 4 different microarray datasets from the Gene Expression Omnibus (GEO) database and screened for DEGs between NSCLC tumor and neighboring normal tissues. After gene set enrichment analysis to identify the associated biological processes, a protein–protein interaction (PPI) network analysis was performed to elucidate potential key DEGs. We also explored the significance of the candidate key DEGs and their correlation with patient prognosis through survival analysis.
Finally, potential therapeutic drugs that may target and reverse the expression of these key DEGs were predicted using the L1000CDS2 signature search engine.

Materials and Methods

Microarray datasets

A comprehensive search of the public GEO database was conducted to identify appropriate datasets that include NSCLC tumor tissue and matched adjacent normal samples.^16 To avoid differences between microarray platforms, only datasets originating from Affymetrix microarrays utilizing Human Genome U133 Plus 2.0 chips (Thermo Fisher Scientific, Inc., Waltham, MA, USA) were selected. Four datasets with accession numbers GSE18842, GSE19804, GSE27262, and GSE102287 were identified and downloaded for further analysis (Table 1). GSE19804^18 included 60 pairs of NSCLC tumor and matched adjacent normal lung tissue, GSE18842^17 comprised 46 NSCLC tumors and 45 paired controls, GSE27262^19 contained 25 pairs of tumor and normal tissue from stage I lung adenocarcinoma, and GSE102287^20 contained 66 matched NSCLC tumor and normal tissues. A total of 506 gene expression samples, comprising 252 healthy and 254 NSCLC tissues, were evaluated in this study.

Table 1. Transcriptome datasets employed in the present study.
Source-ID Purpose No. of tumor samples No. of control samples References