Abstract

   Background: Despite the significant survival benefits of
   anti-PD-1/PD-L1 immunotherapy, non-small cell lung cancer (NSCLC)
   remains one of the most common tumors and major causes of
   cancer-related deaths worldwide. Thus, there is an urgent need to
   identify new therapeutic targets for this refractory disease.

   Methods: In this study, microarray datasets [39]GSE27262, [40]GSE75037,
   [41]GSE102287, and [42]GSE21933 were integrated by Venn diagram. We
   performed functional clustering and pathway enrichment analyses using
   R. Through the STRING database and Cytoscape, we conducted
   protein-protein interaction (PPI) network analysis and identified the
   key genes, which were verified by the GEPIA2 and UALCAN portal.
   Validation of actin-binding protein anillin (ANLN) was performed by
   quantitative real-time polymerase chain reaction and Western blotting.
   Additionally, survival analyses were performed using the Kaplan-Meier
   method.

   Results: In total, 126 differentially expressed genes were identified,
   which were enriched in mitotic nuclear division, mitotic cell cycle
   G2/M transition, vasculogenesis, spindle, and peroxisome
   proliferator-activated receptor signaling pathway. Twelve central node
   genes were identified in the PPI network. Survival analysis revealed
   that high transcriptional levels of these genes were associated with
   inferior survival in NSCLC patients. The clinical implication of ANLN was
   further explored; its protein expression showed a gradually increasing
   trend from grade I to III.

   Conclusion: These key genes may be involved in the carcinogenesis and
   progression of NSCLC and may serve as useful targets for NSCLC
   diagnosis and treatment.

   Keywords: non-small cell lung cancer, gene expression omnibus,
   differentially expressed genes, protein-protein interaction, biological
   process

Introduction

   Lung cancer is the most common cause of cancer-related death worldwide,
   wherein NSCLC accounts for 85% of lung cancer cases ([43]Sung et al.,
   2021). An increased understanding of the biology and pathogenic genomic
   changes in NSCLC has led to advances and developments in its treatment.
   Particularly, the emergence of molecularly targeted therapies and
   immunotherapy has fundamentally changed the way NSCLC patients are
   treated ([44]Jordan et al., 2017). A large number of genes have been
   recognized as drug targets, and their molecular alterations, including
   epidermal growth factor receptor (EGFR) mutations, ROS proto-oncogene 1
   receptor tyrosine kinase (ROS1) rearrangements, anaplastic lymphoma
   kinase (ALK) rearrangements, and BRAF V600E mutations, can predict the
   response to treatment ([45]Stella et al., 2013). Testing for these
   genes is becoming increasingly routine and has yielded encouraging
   results.

   However, the incidence of rearrangement, fusion, or over-expression of
   these genes in NSCLC patients is very low, which limits the
   availability of molecularly targeted therapies for these genes. For
   example, aberrant activation of ALK was found in approximately 4% of
   NSCLC tumors, and chromosomal rearrangement of ROS1 has been identified
   in approximately 1% of NSCLC patients ([46]Wong et al., 2009;
   [47]Gainor and Shaw, 2013). EGFR somatic activating mutations were
   found in approximately 20% of advanced NSCLC patients, and represented
   a paradigm for the use of tyrosine kinase inhibitors for subsets of
   cancer treatment. However, acquired resistance inevitably occurs in
   these cases ([48]Yu et al., 2015). In addition, very few drug targets
   are currently available for lung cancer subtypes other than
   adenocarcinoma, such as squamous cell and large cell carcinoma.
   Furthermore, targeted drugs developed for lung adenocarcinoma are
   largely ineffective for lung squamous cell carcinoma ([49]Rekhtman et
   al., 2012). As for immunotherapy, the
   improvement in survival of lung cancer patients by blocking the immune
   checkpoint PD-1/PD-L1 is encouraging. However, only about 20% of
   patients benefit, and resistance is likely to develop after the initial
   response ([50]Topalian et al., 2019). Thus, identifying potential gene
   targets or pathway alterations in this refractory disease is urgently
   needed.

   Currently, the availability of information about the human genome and
   proteome, especially those that assist in the development of new
   anti-cancer agents, is largely dependent on advances in bioinformatics.
   As an enabling technology, bioinformatics bridges the gap between
   sequence information and clinical practice, and it has evolved into
   multiple ways to enable us not only to identify “driver” and
   “passenger” genes toward neoplasia, but also to comprehend genetic
   alterations and mechanisms in cancer ([51]Mount and Pandey, 2005).

   In this study, four microarrays, namely [52]GSE27262, [53]GSE75037,
   [54]GSE102287, and [55]GSE21933, were integrated and analyzed.
   Differentially expressed genes (DEGs) between NSCLC samples and
   corresponding normal specimens were identified. Subsequently, Gene
   Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
   pathway enrichment analyses of the DEGs were performed. The PPI network
   was constructed using the Search Tool for the Retrieval of Interacting
   Genes (STRING) database. We screened out the key genes with the highest
   connectivity in the network and evaluated their prognostic value, which
   may be helpful for further development of prognostic biomarkers and
   novel therapeutic targets for NSCLC patients.

Materials and methods

Microarray datasets information

   The National Center for Biotechnology Information Gene Expression
   Omnibus (GEO) is an open-access database for data regarding
   next-generation sequencing, microarray, and other forms of
   high-throughput gene data ([56]Barrett et al., 2013), from which the
   microarray datasets of lung cancer samples and adjacent non-malignant
   samples ([57]GSE27262, [58]GSE75037, [59]GSE102287, and [60]GSE21933)
   were downloaded. Gene expression profiles of [61]GSE27262 and
   [62]GSE102287 were based on platform [63]GPL570 [HG-U133_Plus_2]
   Affymetrix Human Genome Array, with 25 lung adenocarcinoma tissues
   versus 25 adjacent normal specimens and 32 NSCLC samples versus 34
   normal samples, respectively. [64]GSE75037 was based on platform
   [65]GPL6884 HumanWG-6 v3.0 expression beadchip, including 83
   adenocarcinomas and 83 adjacent normal samples. [66]GSE21933 was based
   on platform [67]GPL6254 Phalanx Human OneArray, including 21 NSCLC
   tissues and 21 matched adjacent non-malignant tissues.

Data analysis

   GEO2R, a web application based on R that uses Bioconductor packages to
   analyze GEO data, was used to identify DEGs between lung cancer and
   adjacent non-malignant specimens. DEGs were defined by the selection
   criteria |logFC| > 2.0 and adjusted p < 0.05. We analyzed each dataset
   separately and intersected the resulting DEG lists using Venn
   diagrams.
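
   The filter-and-intersect step described above can be sketched in a
   few lines. This is only an illustration in Python, not the authors'
   GEO2R workflow: the gene names (G1 to G4), the toy values, and the
   helper names select_degs and intersect_degs are hypothetical, and the
   per-direction intersection is an assumption based on the separate
   Venn diagrams for up- and downregulated genes.

```python
# Illustrative sketch only: the study used GEO2R; this mirrors the
# threshold filter and the cross-dataset intersection with toy data.
from typing import Dict, List, Set, Tuple

# Each record: (gene symbol, log2 fold change, adjusted p value).
Record = Tuple[str, float, float]


def select_degs(rows: List[Record], lfc_cut: float = 2.0,
                p_cut: float = 0.05) -> Dict[str, Set[str]]:
    """Split one dataset's results into up- and downregulated DEG sets."""
    up = {g for g, lfc, p in rows if lfc > lfc_cut and p < p_cut}
    down = {g for g, lfc, p in rows if lfc < -lfc_cut and p < p_cut}
    return {"up": up, "down": down}


def intersect_degs(per_dataset: List[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Keep genes that pass the filter, in the same direction, in every dataset."""
    return {
        direction: set.intersection(*(d[direction] for d in per_dataset))
        for direction in ("up", "down")
    }


# Hypothetical placeholder genes and made-up statistics.
toy_a = [("G1", 2.6, 0.001), ("G2", 2.2, 0.010), ("G3", -3.1, 0.0005), ("G4", 0.4, 0.60)]
toy_b = [("G1", 2.1, 0.020), ("G2", 1.5, 0.001), ("G3", -2.4, 0.0100), ("G4", 2.5, 0.20)]

shared = intersect_degs([select_degs(toy_a), select_degs(toy_b)])
print(shared)
```

   In this toy input, G2 fails the fold-change cutoff in the second
   dataset and G4 fails the adjusted p cutoff in both, so only G1 (up)
   and G3 (down) survive the intersection.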

GO and KEGG pathway enrichment analysis

   The GO knowledgebase is composed of an ontology and its annotations.
   As of 2018, there were approximately 45,000 terms in GO, including
   cellular component (CC), biological process (BP), and molecular
   function (MF) terms ([68]The Gene Ontology Consortium, 2017). R
   software version 4.0.3 (clusterProfiler and ggplot2 packages) was used
   for gene classification and for GO and KEGG pathway enrichment
   analyses. Statistical significance was set at p < 0.05.
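
   Over-representation analysis of this kind is commonly based on a
   hypergeometric test. The sketch below shows that calculation in plain
   Python as an illustration; it is not the clusterProfiler code, and
   the function name and the background and term sizes in the example
   are assumptions chosen only for demonstration.

```python
# Minimal sketch of a hypergeometric over-representation test.
# Assumption: all counts refer to genes within a common background set.
from math import comb


def hypergeom_enrichment_p(n_background: int, n_term: int,
                           n_degs: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, n_term of which are annotated to the term."""
    total = comb(n_background, n_degs)
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 126 DEGs, a term with 200 annotated genes,
# a background of 20,000 genes, and 10 DEGs annotated to the term.
p_value = hypergeom_enrichment_p(20000, 200, 126, 10)
print(p_value < 0.05)
```

   With about 1.26 annotated genes expected by chance in this example,
   an overlap of 10 gives a very small p value. In practice the p values
   for all tested terms would also be corrected for multiple testing.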

PPI network visualization

   STRING v11, an online resource with currently the largest number of
   proteins (24.6 million) and broad data sources ([69]Szklarczyk et al.,
   2019), was employed to explore protein-protein associations among the
   DEGs. In addition, Cytoscape software was used to visualize the
   protein interaction network and to analyze the interactions among the
   proteins encoded by the candidate DEGs in NSCLC. The top 12
   molecules with the strongest connectivity in the network were
   identified as key genes by CytoHubba, a plug-in of Cytoscape.
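
   As a rough illustration of ranking nodes by connectivity, the sketch
   below counts the degree of each node in an undirected edge list and
   returns the top k. Degree is only one of several ranking methods that
   CytoHubba offers, and using it here is an assumption; the node labels
   and the function name top_hubs are hypothetical.

```python
# Minimal sketch: rank nodes of an undirected network by degree.
from collections import Counter
from typing import Iterable, List, Tuple


def top_hubs(edges: Iterable[Tuple[str, str]], k: int) -> List[Tuple[str, int]]:
    """Return the k nodes with the highest degree (ties broken by name)."""
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree: Counter = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy network; ("C", "A") duplicates ("A", "C").
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "A")]
hubs = top_hubs(toy_edges, 2)
print(hubs)
```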

Genetic alteration analysis and enrichment analysis of the key gene-related
drugs

   Using lung adenocarcinoma and lung squamous cell carcinoma data from
   the TCGA project and the Sangerbox platform, we obtained the mutation
   profile of the 12 key genes in NSCLC. Furthermore, we performed
   enrichment analysis of drugs related to these key genes using the
   Enrichr platform ([70]https://maayanlab.cloud/Enrichr/). The
   Diseases/Drugs and DSigDB modules were used for cluster analysis.

Survival analysis

   To evaluate the effect of the 12 key genes on prognosis of NSCLC
   patients, we used the Kaplan–Meier plotter
   ([71]http://kmplot.com/analysis/), an interactive database for
   validation of prognostic biomarkers that contains mRNA, miRNA, protein
   data, and clinical information from a variety of cancer patients.
   Patients with NSCLC were grouped based on their mRNA levels, and hazard
   ratios and the respective p values were calculated. In addition, we
   verified the survival analysis using the TCGA database
   ([72]https://portal.gdc.cancer.gov). We downloaded and collated lung
   adenocarcinoma and squamous cell carcinoma RNAseq data and clinical
   data from the TCGA database. The survival package of R was used to
   test the proportional hazards assumption, and the results were
   visualized using the survminer and ggplot2 packages.
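
   For readers unfamiliar with the estimator behind these curves, the
   following is a minimal Kaplan-Meier sketch in plain Python. It is not
   the code used by the Kaplan–Meier plotter or the R survival package,
   and the follow-up times and event flags are made up. In the analysis
   described above, one such curve would be estimated for each
   expression group (high versus low).

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator.
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (e.g. death) was observed at times[i]
    and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
print(curve)
```

   Censored patients leave the risk set without lowering the curve,
   which is why the estimate drops only at the three observed event
   times in this example.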

Expression analysis and clinicopathological association

   The expression validation of the key genes was performed based on RNA
   sequencing data produced by The Cancer Genome Atlas (TCGA) and
   Genotype-Tissue Expression (GTEx) project. Tissue-wise expression
   analyses of key genes between 969 NSCLC samples and 685 non-malignant
   samples from TCGA and the GTEx project were profiled using GEPIA2
   ([73]http://gepia2.cancer-pku.cn/).

   Clinicopathologic features of patients with NSCLC, including pathologic
   stage, tumor grade, age, gender, living status, and body weight, were
   obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC)
   Confirmatory/Discovery dataset. Proteomic analyses of lung cancer and
   normal samples were performed using UALCAN, an open network repository
   for investigation on gene expression and its disease association
   ([74]Chandrashekar et al., 2017). Furthermore, we obtained
   immunohistochemical results for ANLN from the Human Protein Atlas
   (HPA) database ([75]https://www.proteinatlas.org/).

Cell culture

   Lung cancer cell lines NCI-H1975, NCI-H1650, A549, and NCI-H1299, and
   the normal lung epithelial cell line BEAS-2B were purchased from the
   Shanghai Cell Bank and ICELL Company. Cells were cultured in Roswell
   Park Memorial Institute-1640 (RPMI-1640; Solarbio, Beijing, China) and
   Dulbecco’s modified Eagle’s medium (DMEM; Solarbio, Beijing, China)
   supplemented with 10% fetal bovine serum (FBS; Gemini, California,
   United States) and maintained in a humidified incubator at 37 °C with
   5% CO2.

RNA extraction and qRT-PCR

   Total RNA was extracted from NSCLC cell lines and the normal lung
   epithelial cell line with an RNA extraction kit (Axygen; Silicon
   Valley, United States), and reverse transcription was performed using a
   cDNA Synthesis Kit (Vazyme Biotech, Nanjing, China). Quantitative
   real-time polymerase chain reaction (qRT-PCR) was carried out using
   Bio-Rad CFX96 Touch with ChamQ SYBR® Green qRT-PCR Master Mix. All
   qRT-PCRs were performed three times, and relative expression was
   calculated using the 2^−ΔΔCT method. Primer sequences were as follows:
   ANLN, Forward: TCTTCGTGGCCGATTTGACA, Reverse: TGGACTTACCACACCAACTGT;
   GAPDH, Forward: CGAGCCACATCGCTCAGACA, Reverse:
   GTGGTGAAGACGCCAGTGGA.
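
   The 2^−ΔΔCT calculation can be written out as a short sketch. The CT
   values below are invented for illustration and are not measurements
   from this study; GAPDH is taken as the reference gene and BEAS-2B as
   the control sample, as described above, and the function name is
   hypothetical.

```python
# Minimal sketch of relative quantification by the 2^-ddCT method.
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_sample: Sequence[float],
                     reference_sample: Sequence[float],
                     target_control: Sequence[float],
                     reference_control: Sequence[float]) -> float:
    """Relative expression of the target gene in a sample versus a control.

    Each argument holds replicate CT values; replicates are averaged first.
    """
    d_ct_sample = mean(target_sample) - mean(reference_sample)
    d_ct_control = mean(target_control) - mean(reference_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Made-up triplicate CT values: target gene and reference gene (GAPDH)
# in a tumor cell line and in the control line.
fold = fold_change_ddct(
    target_sample=[24.1, 23.9, 24.0],
    reference_sample=[18.2, 17.8, 18.0],
    target_control=[27.1, 26.9, 27.0],
    reference_control=[18.0, 18.1, 17.9],
)
print(round(fold, 2))
```

   Here the target gene reaches threshold about three cycles earlier in
   the tumor line after normalization, which corresponds to roughly an
   eightfold higher relative expression.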

Western blotting

   We extracted proteins for Western blotting using RIPA lysis buffer
   (Solarbio, Beijing, China; R0010) and phenylmethylsulfonyl fluoride
   protease inhibitors (Solarbio, Beijing, China; IP0280). The BCA Protein
   Assay Kit (Vazyme Biotech, Nanjing, China; E112-02) was used for
   protein concentration determination. The Western blot system was
   established using the Bio-Rad Bis-Tris Gel system according to the
   manufacturer’s instructions. Proteins separated by SDS-PAGE were
   electroblotted onto polyvinylidene fluoride membranes and incubated
   with a primary antibody (dilution: 1:1250) overnight in a shaker at
   4°C. They were then incubated in a shaker for 1 h with horseradish
   peroxidase labeled secondary antibody (dilution: 1:20000) at 25°C.
   After rinsing, a multi-functional chemiluminescent imaging system
   (Analytik-Jena, United States) was used for development.

Results

Screening of DEGs

   Four microarray datasets ([76]GSE27262, [77]GSE75037, [78]GSE102287,
   and [79]GSE21933) were selected in this study, and their statistics are
   shown in [80]Table 1. Clustering of all overlapping DEGs is shown in
   the heatmap ([81]Figure 1). In accordance with the selection criteria,
   |logFC| > 2.0 and adjusted p < 0.05, a total of 445, 845, 891, and 794
   DEGs were identified from the [82]GSE27262, [83]GSE75037,
   [84]GSE102287, and [85]GSE21933 microarrays, respectively, as shown in
   the volcano plots ([86]Figure 2A). After intersecting the DEGs of the
   four datasets, 126 DEGs, including 37 upregulated genes and 89
   downregulated genes, were found to be significant in all four
   microarray datasets ([87]Figure 2B).

TABLE 1.

   The composition of four different gene expression omnibus datasets.
   GEO series    NSCLC Normal Total number
   [88]GSE75037  83    83     166
   [89]GSE21933  21    21     42
   [90]GSE27262  25    25     50
   [91]GSE102287 32    34     66

   Abbreviations: NSCLC, non-small cell lung cancer; GEO, gene expression
   omnibus.

FIGURE 1.

   Screening of DEGs in four gene expression datasets (|logFC| > 2 and p <
   0.05). Heatmap of all overlapping DEGs. Upregulated DEGs, orange;
   Downregulated DEGs, blue; logFC, log fold change; DEG, differentially
   expressed gene.

FIGURE 2.

   The overlapping DEGs of the four gene expression datasets. (A) Volcano
   plots of each gene expression profile in NSCLC and normal tissues. (B)
   Venn diagrams of DEGs. The left diagram shows the 37 upregulated DEGs;
   the right diagram shows the 89 downregulated DEGs. NSCLC, non-small
   cell lung cancer; DEG, differentially expressed gene.

Functional annotation and pathway enrichment analyses

   GO and KEGG pathway analyses were conducted using R 4.0.3
   (clusterProfiler, org.Hs.eg.db, and ggplot2 packages). DEGs were
   mainly enriched in mitotic nuclear division, cell cycle G2/M phase
   transition, vasculogenesis, and G2/M transition of mitotic cell cycle
   in biological process (BP) terms; spindle, midbody, and condensed
   chromosome outer kinetochore in cellular component (CC) terms; and
   growth factor binding, G protein-coupled peptide receptor activity, and
   peptide receptor activity in molecular function (MF) terms ([97]Figure
   3). The
   KEGG pathway analysis found that the DEGs were predominantly involved
   in the peroxisome proliferator-activated receptors (PPAR) signaling
   pathway, cell cycle, and ECM-receptor interaction pathway ([98]Figure
   4A).

FIGURE 3.

   Gene ontology analysis of DEGs. (A) Biological process terms of DEGs.
   (B) Cellular component terms of DEGs. (C) Molecular function terms of
   DEGs. DEG, differentially expressed gene.

FIGURE 4.

   KEGG pathway analysis, protein-protein interaction network
   construction, and module analysis. (A) Significantly enriched KEGG
   pathway terms of DEGs in NSCLC. (B) DEGs protein–protein interaction
   network complex. Red nodes refer to upregulated genes. Green nodes
   refer to downregulated genes. Edges represent protein-protein
   associations. (C) Top 12 key genes with high connectivity in the
   network. The shade of the color indicates the strength of the
   connection. NSCLC, non-small cell lung cancer; DEG, differentially
   expressed gene.

PPI network construction and key gene identification

   There were 126 nodes and 1,054 edges in the PPI network with an
   enrichment p-value of <1.0e-16 ([103]Figure 4B). Twelve central node
   genes, including ANLN, cyclin-dependent kinase inhibitor 3 (CDKN3),
   kinesin family member 4A (KIF4A), centrosomal protein 55 kDa (CEP55),
   G2/mitotic-specific cyclin-B1 (CCNB1), kinesin family member 11
   (KIF11), G2/mitotic-specific cyclin-B2 (CCNB2), maternal embryonic
   leucine zipper kinase (MELK), hyaluronan-mediated motility receptor
   (HMMR), abnormal spindle-like microcephaly associated protein (ASPM),
   centromere protein F (CENPF), and checkpoint serine/threonine-protein
   kinase (BUB1), were identified among the 126 nodes by using CytoHubba
   of Cytoscape ([104]Figure 4C). Furthermore, ANLN was the top gene in
   the network with the highest connectivity and maximum neighborhood
   component (Table S1).

Transcriptional level validation of the 12 key genes

   We profiled the tissue-wise expression of key genes in NSCLC tissues
   and normal tissues using GEPIA2. The results revealed that the mRNA
   expression levels of the 12 key genes in the NSCLC samples were
   significantly higher than those in normal samples ([105]Figure 5).

FIGURE 5.

   mRNA expression of the key genes (A‐L) in NSCLC and normal samples from
   TCGA and GTEx. *p < 0.01. The red box refers to the tumor group, blue
   box refers to normal group. NSCLC, non-small cell lung cancer.

Genetic alteration analysis and key gene-related drugs enrichment analysis

   We observed the mutation status of these key genes in different NSCLC
   samples of TCGA. As shown in [108]Figure 6A, ASPM had the highest
   mutation frequency, followed by CENPF. CCNB2 and CDKN3 had the lowest
   mutation frequency. Missense mutation and frame-shift mutation were the
   most common types of mutations, while in-frame internal deletion was
   rare. To explore drugs associated with the key genes, we used the
   Enrichr platform to perform cluster analysis and the UMAP algorithm to
   draw a scatter plot of all corresponding terms in the DSigDB gene set
   database. We found that the terms enriched for these key genes were
   correlated with the antitumor drugs etoposide and methotrexate, as well
   as non-antitumor drugs such as lucanthone, troglitazone, testosterone,
   calcitriol, and piroxicam ([109]Figure 6).

FIGURE 6.

   Genetic alteration analysis and enrichment analysis of the key
   gene-related drugs. (A) Mutation profile of the 12 key genes in NSCLC.
   Enrichment analysis of the key gene-related drugs by Enrichr platform
   and shown by bar chart (B), heat map (C) and scatterplot (D). NSCLC,
   non-small cell lung cancer.

Prognostic role of key genes

   To evaluate the prognostic values of the 12 key genes, we used the
   Kaplan–Meier plotter, an online database that contained transcriptomic
   data of 3,452 NSCLC patients. Just as PD-L1 expression and tumor
   mutational burden can be used to predict immune checkpoint inhibitor
   outcomes, the key molecules we identified can predict survival outcomes
   in patients with NSCLC. Overall survival (OS) and first-progression
   (FP) survival curves are shown in [112]Figure 7 and [113]Supplementary
   Figure S1. High transcriptional levels of the 12 key genes (ANLN,
   CDKN3, KIF4A, CEP55, CCNB1, KIF11, CCNB2, MELK, HMMR, ASPM, CENPF, and
   BUB1) were all significantly related to poorer OS (all p < 0.001) and
   FP survival (all p < 0.01) in NSCLC. We verified the overall survival
   analysis of NSCLC patients through the TCGA database, and the
   conclusion reached was consistent with that obtained by the
   Kaplan-Meier plotter analysis ([114]Figure 8).

FIGURE 7.

   Kaplan–Meier overall survival analyses of the 12 key genes (A‐L) in
   NSCLC patients. NSCLC, non-small cell lung cancer.

FIGURE 8.

   The overall survival analyses of the 12 key genes (A‐L) performed by R
   software using the RNAseq data of NSCLC in the TCGA database. NSCLC,
   non-small cell lung cancer.

In vitro verification of ANLN and the relationship between its protein
expression and the clinicopathologic parameters of NSCLC patients

   To verify the transcription level and protein expression level of ANLN,
   qRT-PCR and Western blotting assays were performed in BEAS-2B and four
   NSCLC cell lines. We found that both mRNA and protein levels of ANLN in
   the four NSCLC cell lines were significantly higher than those in
   BEAS-2B ([119]Figures 9A, B). Through the HPA database, we found that
   ANLN was strongly positive in the immunohistochemical test of lung
   cancer tissues ([120]Figure 9C). In addition, we further investigated
   the relationship of ANLN and various clinicopathological parameters of
   NSCLC and its gene expression profile in different cancer types. There
   was a gradually increasing trend in the protein expression of ANLN
   from grade I to grade III, whereas ANLN protein expression did not
   differ significantly among age, weight, or tumor stage groups
   ([121]Figures 9D–I). Interestingly, we also found that the ANLN level
   was higher in male patients than in female patients ([122]Figure 9E,
   p < 0.01). As shown in [123]Supplementary Figure S1, ANLN is elevated
   in various TCGA and GTEx tumors, including hepatocellular carcinoma,
   pancreatic adenocarcinoma, and breast carcinoma, compared with paired
   normal tissues.

FIGURE 9.

   Validation of ANLN mRNA and protein expression and its association with
   different clinicopathological parameters in NSCLC patients. (A) qRT-PCR
   analysis of ANLN in four NSCLC cell lines and normal lung epithelial
   cell line. (B) Western blotting of ANLN in four NSCLC cell lines and
   normal lung epithelial cell line. (C) Immunohistochemical result of
   ANLN in lung cancer tissues in HPA database. (D–I) Diverse
   clinicopathological parameters: Sample types, patients' gender, age,
   weight, pathologic stage and tumor grade. *p < 0.05, **p < 0.01, ***p <
   0.001. NSCLC, non-small cell lung cancer.

Discussion

   With rapidly increasing morbidity and mortality, the 5-year survival of
   lung cancer patients varies from 4% to 17%, depending on the region and
   stage ([126]Hirsch et al., 2017). Substantial progress has been made in
   NSCLC treatment in recent years, but long-term effective responses are
   still rare for most patients ([127]Herbst et al., 2018). It remains
   critical to explore the underlying pathogenesis of lung cancer and
   achieve more precise treatment. A number of researchers have made
   impressive progress in this area, exploring the microenvironment of
   tumors, looking for biomarkers and individual targeted treatment
   strategies ([128]Guo et al., 2022; [129]Jiang et al., 2022).

   Rather than focusing on a single cohort study, we integrated four
   cohorts of microarray databases and identified 126 overlapping DEGs (37
   upregulated and 89 downregulated) in this study. Through further
   functional clustering and enrichment analyses, we found that these
   genes were mainly enriched in the mitotic nuclear division, cell cycle
   G2/M phase transition, and PPAR signaling pathway. Mitotic nuclear
   division, a biological process that is complementary to but opposite to
   apoptosis, plays a crucial part in carcinogenesis, tumor cell
   maintenance, and tumor progression ([130]Sinha et al., 2019). Given
   that cancer is a cell cycle disease, the progression of the cell cycle
   is inextricably linked to the proliferation and activation of cancer
   cells. The progression of the cell cycle is coordinated by the
   continuous activation of cyclin-dependent kinases through their
   corresponding cyclin chaperone ([131]Malumbres, 2014). Some tumor
   suppressor genes and drug molecules can inhibit tumor cell
   proliferation and invasion by arresting the cell in the G2/M phase
   transition ([132]Song et al., 2009). PPARs have three subtypes (PPAR-α,
   PPAR-β and PPAR-γ), which exhibit diverse roles in vertebrates. PPAR-α
   mainly plays a role in removing circulating lipids or cell lipids,
   PPAR-β is involved in lipid oxidation and cell proliferation, while
   PPAR-γ activation enhances the proliferation of cancer cells and
   promotes brain metastasis ([133]Bougarne et al., 2018; [134]Magadum and
   Engel, 2018; [135]Zou et al., 2019). To further explore the internal
   interactions of the overlapping DEGs, a PPI network was constructed.
   Twelve genes with the strongest connectivity in the network were identified.
   High transcriptional levels of these genes were significantly
   correlated with poor prognosis, which reveals their potential
   prognostic value.

   ANLN, the top gene in our modules, is a unique scaffolding protein that
   was first isolated from Drosophila melanogaster embryos and was mainly
   associated with cytokinesis ([136]Zhang and Maddox, 2010). ANLN has been
   reported to be overexpressed in many tumors. It is involved in the
   progression of pancreatic, brain, breast, and lung cancers ([137]Hall
   et al., 2005; [138]Olakowski et al., 2009; [139]Magnusson et al., 2016;
   [140]Long et al., 2018), which is consistent with our experimental
   results. Evidence has shown that ANLN promotes cell proliferation, and
   the loss of ANLN prevents the cancer cells from dividing and reduces
   their migration and invasion ([141]Wang et al., 2019). Furthermore,
   there is also evidence showing that ANLN expression correlates with
   lung adenocarcinoma metastasis ([142]Xu et al., 2019). In breast
   cancer, ANLN was found to be an alternative marker for Ki-67 (cell
   proliferation index), which is consistent with our findings
   ([143]Figure 6F). Based on the evidence supporting the correlation of
   ANLN with acknowledged features of cancer, ANLN should be considered as
   a novel target for cancer therapy.

   CDKN3 has been reported to be overexpressed in glioma and cervical
   cancer, and its over-expression is associated with inferior survival
   ([144]Yu et al., 2007; [145]Espinosa et al., 2013). CDKN3
   transcription and protein levels fluctuate throughout the cell cycle,
   reaching a peak during mitosis. Because rapidly proliferating tumors
   contain more mitotic cells, high mitotic CDKN3 expression is the most
   likely mechanism for the frequent CDKN3 mRNA over-expression in
   human cancer ([146]Fan et al., 2015). The cell cycle-dependent elements
   of CCNB1 and CCNB2 are essential for meiotic resumption. CCNB1 has been
   observed to expedite tumor cell division, cell proliferation, and tumor
   growth in colorectal and pancreatic cancers ([147]Fang et al., 2014;
   [148]Zhang et al., 2018). CCNB2 is also correlated with cancer
   progression and inferior prognosis in breast cancer, hepatocellular
   carcinoma and NSCLC ([149]Qian et al., 2015; [150]Li et al., 2019;
   [151]Jayanthi et al., 2020). KIF4A, kinesin family member 4A, plays
   a key role in the process of DNA replication and repair. It promotes
   cell proliferation, correlates with tumor size in oral carcinoma,
   and serves as a potential prognostic indicator in various solid tumors
   ([152]Wu et al., 2008; [153]Rouam et al., 2010). KIF11 (Eg5) and
   MELK have been identified as oncogenes in multiple tumors, and
   inhibitors targeting them have entered phase I/II clinical
   trials with encouraging results ([154]Ganguly et al., 2014;
   [155]Garcia-Saez and Skoufias, 2021). As of now, nine clinical trials
   targeting KIF11 have been completed, and five clinical trials targeting
   MELK are ongoing or completed, according to [156]ClinicalTrials.gov
   ([157]https://clinicaltrials.gov/). These drugs are used alone or in
   combination with other medicines to treat patients with refractory
   cancers.

   CEP55 was identified as an ideal cancer vaccine candidate ([158]Inoda
   et al., 2011) and a marker for predicting cancer invasion risk,
   metastasis, and therapeutic outcome ([159]Tandon and Banerjee, 2020).
   HMMR, alternatively called RHAMM or CD168, is a microtubule-associated
   protein that regulates mitosis and meiosis ([160]Chen et al., 2018).
   Abnormal expression of HMMR disrupts the microtubule process during
   cell division and leads to abnormalities in the mitotic spindle,
   altering the fate of progenitor cells and leading to genomic
   instability ([161]Pujana et al., 2007). HMMR has been reported to be
   closely linked to cancer risk and progression in various tumor types
   ([162]Rein et al., 2003). Currently, research on the role of ASPM in
   tumors is limited. Recently, it has been reported as a new
   predictor of tumor aggressiveness and prognosis in bladder, prostate,
   and endometrial cancers ([163]Pai et al., 2019; [164]Saleh et al.,
   2020; [165]Zhou et al., 2020). The prenylated protein CENPF has been
   used clinically as a proliferative marker for malignant tumor cell
   growth ([166]Varis et al., 2006). BUB1, a serine/threonine-protein
   kinase, plays a crucial part in oncogenesis, chromosome arrangement,
   and spindle assembly ([167]Bolanos-Garcia and Blundell, 2011).

   Finally, we profiled the tissue-specific expression of the key genes in
   NSCLC and normal specimens from the TCGA database and found that their
   mRNA levels were significantly higher in tumor tissues than in adjacent
   non-tumor tissues. We further explored the clinical implication of ANLN, and its
   protein expression showed a gradually increasing trend from grade I to
   III, revealing its association with tumor aggressiveness.

Conclusion

   Through multiple microarray datasets and integrated bioinformatics
   analysis, we identified key genes and pathways that may be involved in
   NSCLC carcinogenesis, which are mainly associated with mitosis,
   vasculogenesis, and G2/M transition of the mitotic cell cycle. These
   findings provide new insights and opportunities for further development
   of prognostic biomarkers and therapeutic targets for NSCLC patients.

Funding Statement

   This work was funded by the National Natural Science Foundation of
   China (No. 81560410); Science and Technology Project of Jiangxi
   Provincial Administration of Traditional Chinese Medicine (No.
   2020B0365); Science and Technology Planning Project of Jiangxi
   Provincial Health Department (No. 20203122).

Data availability statement

   The original contributions presented in the study are included in the
   article/[168]Supplementary Material; further inquiries can be directed
   to the corresponding authors.

Author contributions

   J-HM and S-SW designed the study. Y-HR, Y-JH, YUL, and YIL collected
   the data. L-TL performed the experiments and wrote the article. Data
   analysis and interpretation were performed by L-TL and Y-JH. All
   authors read and approved the submitted version.

Conflict of interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Publisher’s note

   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.

Supplementary material

   The Supplementary Material for this article can be found online at:
   [169]https://www.frontiersin.org/articles/10.3389/fgene.2023.1139994/full#supplementary-material

References