Abstract

   Lung cancer is one of the leading causes of cancer mortality worldwide
   and non–small cell lung cancer (NSCLC) accounts for the most part.
   NSCLC can be further divided into adenocarcinoma (ACA) and squamous
   cell carcinoma (SCC). It is of great value to distinguish these two
   subgroups clinically. In this study, we compared the genome-wide copy
   number alterations (CNAs) patterns of 208 early stage ACA and 93 early
   stage SCC tumor samples. As a result, 266 CNA probes stood out for
   better discrimination of ACA and SCC. It was revealed that the genes
   corresponding to these 266 probes were enriched in lung cancer related
   pathways and enriched in the chromosome regions where CNA usually occur
   in lung cancer. This study sheds lights on the CNA study of NSCLC and
   provides some insights on the epigenetic of NSCLC.

Introduction

   Lung cancer is one of the leading cause of cancer mortality worldwide
   [29][1]. Basing on the 2011 International Association for the Study of
   Lung Cancer/American Thoracic Society/European Respiratory Society
   (IASLC/ATS/ERS) lung adenocarcinoma classification, it is now
   classified into 5 different subtypes: Atypical adenomatous hyperplasia
   (AAH), Adenocarcinoma in situ (AIS) (nonmucinous, mucinous, or mixed
   nonmucinous/mucinous), Minimally invasive adenocarcinoma (MIA) (≤3 cm
   lepidic predominant tumor with ≤5 mm invasion), Invasive
   adenocarcinoma, and variants of invasive adenocarcinoma, and each of
   them has its own histological feature [30][2]. Non–small cell lung
   cancer (NSCLC) accounts for 85% of all lung cancers. The most frequent
   histologic subtypes of NSCC is adenocarcinoma (ACA) and squamous cell
   carcinoma (SCC), accounting for 50% and 30% of NSCLC cases,
   respectively [31][3]. ACA is the most common histologic subtype
   reported with lung cancer in the never smokers (LCINS) [32][4], which
   is a cancer of an epithelium which originates in glandular tissue. SCC
   is a cancer of squamous epithelial cell, which arises most often in
   segmental bronchi and related to lobar and main stem bronchus occurs by
   its extension [33][5], and its incidence is correlated with smoking
   period [34][6] compared with ACA. Historically, well differentiated SCC
   cells include the morphologic features such as intercellular bridging,
   squamous pearl formation and individual cell keratinization [35][5].
   Nowadays, medicine development in NSCLC has introduced histologic
   subtyping, the differentiation of ACA from SCC in biopsy specimens, as
   an important factor for effective treatment choice and molecular
   therapy target. For example pemetrexed, antifolate agent, is effective
   in the treatment of patients with non-squamous NSCLC but should not be
   recommended for the treatment of squamous cell carcinoma [36][7].
   Bevacizumab, combined with paclitaxel/carboplatin, has excessive toxic
   effects in squamous-cell carcinoma [37][8], while it could
   significantly increase overall survival rate of patients with cancers
   of non-squamous histology [38][9], [39][10]. Traditional diagnosis
   method to distinguish adenocarcinoma from squamous cell carcinoma, is
   based on the histologic section and patients' smoking habit. However,
   because of the individual heterogeneity of lung cancer, this method
   cannot correctly distinguish ACA and SCC in some cases efficiently.
   Recently, immunohistochemistry is being used in biopsy and cytology
   material [40][11] as a complement, and several genes have been
   discovered as the immunohistochemical marker. Kargi et al. found
   thyroid transcription factor-1 (TTF-1) is a marker in immunostaining
   for ACA, while p63 and cytokeratins (CK) 5/6 are marks for SCC
   [41][12]. Moreover, molecular targeted therapy has been more and more
   used in NSCLC as the promising treatment strategy in recent years. It
   is demonstrated that superior efficacy of tyrosine kinase inhibitors
   (TKIs) as compared to standard chemotherapy for patients with
   EGFR-mutant tumors [42][13]. Kwak et al. also explored the
   small-molecule inhibitor of the ALK tyrosine kinase could be used as
   the efficacious therapy in advanced ALK-positive tumors in an
   early-phase clinical trial [43][14]. Therefore, it is meaningful to
   identifying genes which have distinct genetics features in ACA and SCC
   that could be used as prognostic factor or potential target for medical
   therapy.

   Previous analysis has showed CNAs are common in almost all human
   cancers [44][15], [45][16]. In NSCLC, CNAs increase with disease
   progression and CNAs are both positionally and functionally clustered
   [46][17]. Furthermore, Giovanni Tonon el at. found despite their
   distinct histopathological phenotypes, ACA and SCC genomic profiles
   showed a nearly complete overlap, with only one clear SCC-specific
   amplicon on 3q26–29 [47][18].

   In this study, to figure out the key genes distinguishing ACA and SCC
   from each other, we compare the genome-wide copy number alterations
   (CNAs) patterns of 208 early stage ACA and 93 early stage SCC tumor
   samples. By means of the feature selection and analysis methods,
   including the Maximum Relevance Minimum Redundancy method (mRMR) and
   the Incremental Feature Selection (IFS) method, 266 optimal CNA probes
   were selected for the discrimination of ACA and SCC. The classification
   model was built with Nearest Neighbor Algorithm (NNA). As a result, the
   classifier achieved a overall MCC of 0.6616. Further analysis on the
   266 CNA related genes showed that they were closely associated with
   lung cancer.

Materials and Methods

Dataset

   We used the copy number alterations data from the non-small cell lung
   cancer study of Huang et al. [48][19]. In their study, a series of 301
   snap-frozen tumor samples from NSCLC patients was collected during
   surgery or biopsy from the Massachusetts General Hospital (MGH),
   Boston, MA and the National Institute of Occupational Health, Oslo,
   Norway. The clinical information of these 301 samples was given in
   [49]File S1. The copy number profiling of 208 early stage
   adenocarcinoma tumors (ACA) samples and 93 early stage squamous cell
   carcinoma tumors (SCC) were retrieved from NCBI Gene Expression Omnibus
   (GEO) with the accession number of [50]GSE34140. The copy number
   profile was obtained using the using Affymetrix 250 K Nsp GeneChip.
   Only 256,554 probes on somatic chromosomes were analyzed. The SNP
   probes were mapped to the RefSeq genes with 2 kb extension both
   upstream and downstream using the UCSC Genome Browser. Among the
   256,554 probes on somatic chromosomes, 104,256 probes were mapped to
   11,700 genes [51][19].

mRMR method

   We used Maximum Relevance Minimum Redundancy (mRMR) method to rank the
   importance of the probes [52][20]. mRMR method could rank probes based
   on both their relevance to the class of samples and the redundancy
   among probes. A smaller index of a probe denotes that it has a better
   trade-off between maximum relevance to class of samples and minimum
   redundancy.

   Both relevance and redundancy were quantified by mutual information
   (MI), which estimates how much one vector is related to another. The MI
   equation was defined as below:
   graphic file with name pone.0088300.e001.jpg (1)

   In [53]equation (1), Inline graphic , Inline graphic are vectors,
   Inline graphic is their joint probabilistic density, and Inline graphic
   and Inline graphic are the marginal probabilistic densities.

   Let Inline graphic denote the whole probe set, Inline graphic denote
   the already-selected probe set containing m probes and Inline graphic
   denote the to-be-selected probe set containing n probes. The relevance
   Inline graphic between a probe Inline graphic in Inline graphic and the
   class of sample Inline graphic can be calculated by:
   graphic file with name pone.0088300.e014.jpg (2)

   The redundancy Inline graphic between a probe Inline graphic in Inline
   graphic and all the probes in Inline graphic can be calculated by:
   graphic file with name pone.0088300.e019.jpg (3)

   To get the probe Inline graphic in Inline graphic with maximum
   relevance and minimum redundancy, the mRMR function combines
   [54]equation (2) and [55]equation (3) and is defined as below:
   graphic file with name pone.0088300.e022.jpg (4)

   The mRMR probe rating would be executed N rounds when given a probe set
   with N (N = m+n) probes. After N rounds of execution, a probe set
   Inline graphic is produced:
   graphic file with name pone.0088300.e024.jpg (5)

   In Inline graphic , index h indicates at which round that the probe is
   selected. The smaller the index h is, the earlier the probe satisfies
   [56]equation (4) and the better the probe is.

Nearest neighbor algorithm (NNA)

   Nearest Neighbor Algorithm (NNA) [57][21], [58][22], which has been
   widely used in bioinformatics and computational biology [59][23],
   [60][24], [61][25], [62][26], [63][27], was adopted to predict the
   class of samples. The “nearness” was calculated according to the
   following equation
   graphic file with name pone.0088300.e026.jpg (6)

   where Inline graphic and Inline graphic are two vectors representing
   two samples, Inline graphic is their dot product, Inline graphic and
   Inline graphic are their moduluses. The smaller the Inline graphic ,
   the more similar the two samples are.

   For an intuitive illustration of how NNA works, see Fig.5 of [64][28].

Jackknife Cross-Validation Method

   Jackknife Cross-Validation Method [65][23], [66][24], [67][29],
   [68][30] (also called the Leave-one-out cross-validation, LOOCV) was
   used to evaluate the performance of a classifier. In Jackknife
   Cross-Validation Method, every sample is tested by the predictor that
   is trained with all the other samples. Let TP denotes true positive. TN
   denotes true negative. FP denotes false positive and FN denotes false
   negative. To evaluate the performance of our predictor, the prediction
   accuracy, specificity, sensitivity and MCC (Matthews's correlation
   coefficient) were calculated as below:
   graphic file with name pone.0088300.e033.jpg (7)

Incremental Feature Selection (IFS)

   Based on the ranked probes rated by mRMR evaluation, we used
   Incremental Feature Selection (IFS) [69][31], [70][32], [71][33] to
   determine the optimal number of probes. During IFS procedure, probes in
   the ranked probe set are added one by one from higher to lower rank. A
   new probe set is composed when one probe is added. Thus N probe sets
   would be composed given N ranked probes. The i-th probe set is:
   graphic file with name pone.0088300.e034.jpg (8)

   For each of the N probe sets, an NNA predictor was constructed and
   tested using LOOCV. With N prediction accuracies, sensitivities,
   specificities and MCCs calculated, we obtain an IFS table with one
   column being the index i and the other columns to be the prediction
   accuracy, sensitivity, specificity and MCC. The optimal probe set (
   Inline graphic ) is the one, using which the predictor achieves the
   best prediction performance.

Functional enrichment analysis of CNAs genes

   Functional annotation tool of GATHER [72][34] was used for KEGG
   pathway, GO and chromosome region enrichment analysis. All the genes in
   the human genome were selected as background during the enrichment
   analysis.

Results and Discussion

The mRMR Result

   Listed in the [73]File S2 are two kinds of outcomes obtained by running
   the mRMR software: one is called the “MaxRel feature list” that ranked
   all the probes according to their relevance to the class of samples;
   the other one is the “mRMR feature list” that ranked the probes
   according to the criteria of maximum relevance and minimum redundancy.
   In the mRMR probe list, the smaller the index of a probe was, the more
   important the probe would be for the discrimination of two kinds of
   NSCLC. Accordingly, the mRMR feature list could be used to establish
   the optimal feature set in the IFS procedure.

IFS and Final Optimal Feature Set

   Based on these two tables, 1000 feature subsets were constructed
   according to [74]Eq.8. An NNA predictor was modeled for each subset and
   was evaluated by LOOCV. Shown in [75]Fig.1 is the IFS curve plotted
   based on the data in [76]File S3. The x-axis is the number of probes
   used for the classification, and the y-axis is the MCC values of
   classifiers evaluated by LOOCV. The maximum MCC was 0.6616 when 266
   probes were utilized. With such a classifier, the prediction
   sensitivity, specificity and accuracy were 0.9567, 0.6452 and 0.8605,
   respectively. These 266 probes were regarded as the optimal biomarkers
   for the discrimination of two kinds of NSCLC. The information of these
   266 probes were given in [77]File S4. Shown in [78]Fig.2 is the heatmap
   based on these 266 probes. It can be seen that most of the 208 ACA
   samples and 93 SCC samples can be distinguished.

Figure 1. IFS curve for the adenocarcinoma (ACA) and squamous cell carcinoma
(SCC) samples classification.

   [79]Figure 1
   [80]Open in a new tab

   The IFS curves were drawn based on the data in [81]File S3. The MCC
   reached the peak when the number of probes was 266. The 266 probes thus
   obtained were used to compose the optimal probe set for discrimination
   of adenocarcinoma (ACA) and squamous cell carcinoma (SCC).

Figure 2. Heatmapof 208 adenocarcinoma (ACA) samples and 93 squamous cell
carcinoma (SCC) samples with 266 selected probes.

   [82]Figure 2
   [83]Open in a new tab

   Samples are arranged along the X axis and probes along the Y axis. Each
   square represents the copy number of a given probe in an individual
   sample. Red is increased copy number and blue is decreased copy number
   relative to the mean- and sample-centered scaled copy number across the
   samples. Adenocarcinoma (ACA) and squamous cell carcinoma (SCC) samples
   were presented with green and blue, respectively.

KEGG and GO enrichment results of CNAs genes

   The KEGG pathway enrichment analysis of CNAs genes indicated that they
   were enriched in Wnt signaling pathway, Focal adhesion, ECM-receptor
   interaction and so on ([84]Table 1). It is reported Wnt signaling
   pathway is activated during the carcinogenesis of NSCLC [85][35], and
   inhibition of Wnt-2-mediated signaling could induce non-small-cell lung
   cancer cells apoptosis [86][36]. Focal adhesion and ECM-receptor
   interaction are pathways in the biological processes interactions of
   cells with extracellular matrix (ECM), which play crucial roles in cell
   motility, cell proliferation, cell differentiation, regulation of gene
   expression and cell survival [87][37], [88][38]. The proteins of these
   pathways are up-regulated in NSCLCs [89][39], and take part in the
   activation of local invasion and distant metastasis of cancer cells
   [90][40]. As the KEGG pathway enrichment result, the GO enrichment
   result of these CNAs genes also shows enrichment in the terms of cell
   adhesion and intracellular signaling cascade. The GO enrichment result
   of these CNAs genes were listed in [91]File S5.

Table 1. KEGG enrichment result of CNAs genes.

   Pathway KEGG ID Your Genes (With Ann) Your Genes (No Ann) Genome (With
   Ann) Genome (No Ann) P-value
   Wnt signaling pathway hsa04310 6 32 141 2951 0.0077
   Focal adhesion hsa04510 7 31 227 2865 0.0204
   ECM-receptor interaction hsa04512 4 34 82 3010 0.0193
   [92]Open in a new tab

   Your Genes (With Ann): The number of genes from your list with the
   annotation.

   Your Genes (No Ann): The number of genes from your list without the
   annotation.

   Genome (With Ann): The number of genes in the genome (excluding those
   in your list) with the annotation.

   Genome (No Ann): The number of genes in the genome (excluding those in
   your list) without the annotation.

   P-value: The negative logarithm of the p value calculated using a
   Fisher's exact test.

Chromosome region enrichment result of CNAs genes

   It is reported copy number gain in region 3q26 [93][18], [94][41] and
   in region 8p12 [95][42] seem to be more common in squamous histology
   compared with adenocarcinoma. The analysis of our result shows that
   including these two regions, copy number alterations of 2q34, 10p15,
   18q11, 8p23, 3p21, 3q27, 22q12, Xq13, 2q36, 10p11, 10p12 also have the
   significance in discrimination between SCC and ACA, and deserved
   further researches on them ([96]Table 2).

Table 2. Chromosome region enrichment result of CNAs genes.

   Chromosome region Your Genes (With Ann) Your Genes (No Ann) Genome
   (With Ann) Genome (No Ann) P-value
   2q34 5 162 24 30139 5.09E-07
   10p15 5 162 55 30108 2.04E-05
   18q11 4 163 46 30117 0.0002
   3q26 5 162 105 30058 0.0004
   8p23 6 161 174 29989 0.0005
   3p21 7 160 251 29912 0.0006
   3q27 4 163 72 30091 0.0008
   22q12 5 162 142 30021 0.0014
   Xq13 4 163 100 30063 0.0027
   2q36 3 164 51 30112 0.0033
   10p11 3 164 62 30101 0.0056
   10p12 3 164 63 30100 0.0058
   [97]Open in a new tab

   Your Genes (With Ann): The number of genes from your list with the
   annotation.

   Your Genes (No Ann): The number of genes from your list without the
   annotation.

   Genome (With Ann): The number of genes in the genome (excluding those
   in your list) with the annotation.

   Genome (No Ann): The number of genes in the genome (excluding those in
   your list) without the annotation.

   P-value: The negative logarithm of the p value calculated using a
   Fisher's exact test.

CNAs genes identified in this study

   In this study, we identified several candidate genes corresponding to
   266 CNAs probes that can be used to distinguish two kinds of NSCLC. 50
   of them also has a significant correlation to the Smoking Pack-year
   including TP63, SOX2 and PPP2R2B (see [98]File S4). With literature
   retrieval of gene function and significance comparison by p-value, we
   focused on 8 genes which are most probably related to distinguish ACA
   and SCC from each other. Among them, TP63 has been reported as a
   biomarker to discriminate between SCC and ACA, and it is listed top in
   our result. Some of other genes are reported to have different gene
   expression level in ACA and SCC or in patients with distinct smoking
   habits. In accord with the KEGG and GO enrichment result, PPP2R2B is a
   gene in wnt signaling pathway, while ITGA9 takes a part in focal
   adhesion and ECM-receptor interaction. All above illustrates that our
   result is biologically significant and the 8 genes may be candidate
   biomarkers for distinguishing ACA and SCC from each other and deserved
   further studies on them. Below, we will briefly discuss their
   relationships with NSCLC.

   TP63 (Tumor protein 63) is listed top one in the optimal probe set with
   a CNA fold change of 0.7827 comparing ACA with SCC. It is a tumor
   suppressor p53 homologue and essential for p53 dependent apoptosis in
   response to DNA damage [99][43]. Mi Jin Kim et al. found P63 is a
   useful immunohistochemical panel in differentiating ACA from SCC of the
   lung with the positive rate 91% of SCC and 9% of ACA in their studies
   [100][44]. The chromosome location of TP63 is 3q27–29. Therefore, our
   result is coincide with former researches and TP63 may play a key role
   in distinguish ACA and SCC from each other.

   EPHA4 (Ephrin type-A receptor 4) is related to the fourth probe in our
   optimal probe set with a CNA fold change of 1.0846 comparing ACA with
   SCC, and is a member of the Eph receptor family, the largest receptor
   tyrosine kinase family of transmembrane proteins with their ligands,
   the ephrins, affecting the growth, migration and invasion of cancer
   cells in culture as well as tumor growth, invasiveness, angiogenesis
   and metastasis in vivo [101][45]. Junya Fukai et al. found EphA4
   promotes cell proliferation and migration through a novel EphA4-FGFR1
   signaling pathway in the human glioma U251 cell line [102][46]. One of
   the Eph receptors EphA2 is reported over expression in smokers and
   predicts poor survival in non-small cell lung cancer [103][47]. A
   mutation in EphA2 (G391R) was identified in two of 28 squamous cell
   lung cancers (7%), but not in any adenocarcinomas or large-cell lung
   carcinomas [104][48]. These all indicate that EphA4 may be a candidate
   biomarker for distinguishing ACA and SCC from each other and deserved
   further studies on it.

   PPP2R2B (Serine/threonine-protein phosphatase 2A 55 kDa regulatory
   subunit B beta isoform) is related to the fifth probe in our optimal
   probe set with a CNA fold change of 1.0781 comparing ACA with SCC. It
   is the regulatory subunit B beta isoform of PP2A, and is implicated in
   the negative control of cell growth and division [105][49]. Recently
   genome-wide association study (GWAS) of lung cancer in the Chinese
   population revealed that chromosome 5q32 (rs2895680 in
   PPP2R2B-STK32A-DPYSL3, P = 6.60×10−9) was lung cancer susceptibility
   loci and interacted with smoking dose [106][50]. As well as PPP2R2B is
   on the top of our result, the contribution of it in the NSCLC is worthy
   to be further elucidated.

   ITGA9 (Integrin alpha-9) is related to the twelfth probe in our optimal
   probe set with a CNA fold change of 1.1034 comparing ACA with SCC,
   which belongs to the integrin family and is expressed on a wide range
   of cell types. It interacts with many ligands for example fibronectin,
   tenascin-C and ADAM12, and takes part in several processes such as cell
   adhesion, migration, lung development, lymphatic and venous valve
   development, and in wound healing [107][51]. ITGA9 has been found down
   expression in NSCLC [108][52], and exhibiting strong cell growth
   inhibition activity [109][53]. Statistical analysis of Alexey a.
   Dmitriev et al. suggested that the methylation/deletion level of ITGA9
   has significant changes in ACA and SCC [110][53]. Our analysis
   presented the gene copy number of ITGA9 is dissimilar in NSCLC
   subtypes, implying ITGA9 as a candidate molecular to discriminate
   between SCC and ACA.

   SOX2 (Sex-determining region Y-Box 2) is related to the nineteenth
   probe in our optimal probe set with a CNA fold change of 0.7790
   comparing ACA with SCC, and has been reported to be differentially
   expressed between ACA and SCC. It is located at chromosome 3q26 and
   high-level amplification of SOX2 have been reported in approximately
   20% of lung squamous cell carcinomas [111][54], [112][55]. SOX2 is a
   transcription factor controlling the expression of a number of genes
   involved in embryonic development and keeps neural cells
   undifferentiated [113][56]. Suppression of SOX2 in amplified SOX2 cells
   has greater antiproliferative effects compared with other genes on
   3q26.33 including PIK3CA and TP63.

   FHIT (fragile histidine triad) is related to the thirty-third probe in
   our optimal probe set with a CNA fold change of 1.1110 comparing ACA
   with SCC, and behaves in vitro as a typical diadenosine triphosphate
   hydrolase cleaving A-5′-PPP-5′A to yield AMP and ADP [114][57], but
   little is known about its physiological function. It is considered as a
   tumor suppressor in many human cancers and its restoration in
   Fhit-negative cancer cell lines suppresses tumorigenicity and induces
   apoptosis [115][58]. Jennifer E. Tseng el at. found that the frequency
   of loss of FHIT expression is related with smoking habit in Stage I
   Non-Small Cell Lung Cancer [116][59]. In the studies of Gemma Toledo et
   al. FHIT expression was related to tumor histology: 52 of 54 (96.3%)
   SCC and 20 of 44 (45.5%) ACA were negative for FHIT (P<0.0001)
   [117][60]. As SCC is closely correlated with a history of tobacco
   smoking [118][6], and our results show the copy number of FHIT is
   significantly lower in SCC, FHIT may be a possible biomarker for NSCLC
   diagnosis and would be a potential medical target for cancer therapy.

   RBBP8 (Retinoblastoma-binding protein 8) is a ubiquitously expressed
   nuclear protein which is binding to the tumor suppressor proteins RB
   [119][61] and CtBP [120][62]. It is also interacting with BRCA1
   [121][63] and is thought to regulate the functions of BRCA1 in
   transcriptional regulation, DNA damage repair, and G2/M cell cycle
   checkpoint control [122][64], [123][65]. RBBP8 is required for DNA
   double-strand break (DSB) resection, and thereby for recruitment of the
   protein kinase ATR and replication protein A to DSBs, and promotes ATR
   activation and homologous recombination [124][66]. It is reported that
   DNA repair components were significantly up-regulated including
   retinoblastoma-binding protein 8 (RBBP8), in lung SCC compared with
   normal lung tissue, but such up-regulation was not found in lung ACA
   [125][67]. As an essential molecular in the cell process DNA damage
   repair and cell cycle control, RBBP8 has the potential to be a
   biomarker and therapy target for NSCLC and the mechanism of its
   distinct expression profile in SCC and ACA deserves further study.

   GPC5 (Glypican-5) is a member of the glypican gene family, which is a
   family of heparan sulphate proteoglycans that are linked to the
   exocytoplasmic surface of the plasma membrane via glycosyl
   phosphatidylinositol [126][68]. The expression level of GPC5 was
   significantly lower in lung adenocarcinoma tissue than in matched
   normal lung tissue in never smokers [127][69]. Yang et al. found
   deceased expression of GPC5 is correlated with reduced survival in ACA
   but not in SCC [128][70]. These all indicate that GPC5 may be a
   potential tumor suppressor gene in NSCLC, and a candidate bio-marker to
   discriminate between SCC and ACA.

Conclusion

   In this study, we constructed a classifier based on copy number
   alterations (CNA) to distinguish two subgroups of NSCLC. As a result,
   266 CNA probes were selected as the best discriminators. Analysis of
   genes corresponding to these 266 CNA probes indicate that they were
   enriched in lung cancer related pathways and enriched in the chromosome
   regions where CNA usually occur in lung cancer. Some of these genes,
   such as TP63, SOX2, EPHA4, PPP2R2B, ITGA9, FHIT, RBBP8 and GPC5 are
   closely related to lung cancer and these candidate genes may provide
   clues for further research and experiment validation.

Supporting Information

   File S1

   Clinical information of adenocarcinoma (ACA) and squamous cell
   carcinoma (SCC) samples.

   (DOCX)
   [129]Click here for additional data file.^ (13.8KB, docx)
   File S2

   mRMR result for classification. This file contains two sheets. The
   first one is the MaxRel feature table, which ranked the top 1000 probes
   according to the relevance between features and class of the samples.
   The second one is the mRMR feature table, which ranked these 1000
   probes according to the redundancy and relevance criteria.

   (XLSX)
   [130]Click here for additional data file.^ (73.5KB, xlsx)
   File S3

   The sensitivity (Sn), specificity (Sp), accuracy (Ac), Matthews
   correlation coefficient (MCC) of each run of IFS for classification.

   (XLSX)
   [131]Click here for additional data file.^ (43.2KB, xlsx)
   File S4

   The annotation of the 266 selected probes.

   (XLSX)
   [132]Click here for additional data file.^ (109.4KB, xlsx)
   File S5

   The GO enrichment result of CNAs genes.

   (XLSX)
   [133]Click here for additional data file.^ (13KB, xlsx)

Acknowledgments