Abstract Background Although screening programmes of smokers have detected resectable early lung cancers more frequently than expected, their efficacy in reducing mortality remains debatable. To elucidate the biological features of computed tomography (CT) screening detected lung cancer, we examined the mRNA signatures on tumours according to the year of detection, stage and survival. Methods Gene expression profiles were analysed on 28 patients (INT–IEO training cohort) and 24 patients of Multicentre Italian Lung Detection (MILD validation cohort). The gene signatures generated from the training set were validated on the MILD set and a public deposited DNA microarray data set ([42]GSE11969). Expression of selected genes and proteins was validated by real-time RT-PCR and immunohistochemistry. Enriched core pathway and pathway networks were explored by GeneSpring GX10. Findings A 239-gene signature was identified according to the year of tumour detection in the training INT–IEO set and correlated with the patients' outcomes. These signatures divided the MILD patients into two distinct survival groups independently of tumour stage, size, histopathological type and screening year. The signatures can also predict survival in the clinically detected cancers ([43]GSE11969). Pathway analyses revealed tumours detected in later years enrichment of the PI3K/PTEN/AKT pathway, with up-regulation of PDPK1, ITGB1 and down-regulation of FOXO1A. Analysis of normal lung tissue from INT–IEO cohort produced signatures distinguishing patients with early from late detected tumours. Interpretation The distinct pattern of “indolent” and “aggressive” tumour exists in CT-screening detected lung cancer according to the gene expression profiles. The early development of an aggressive phenotype may account for the lack of mortality reduction by screening observed in some cohorts. Keywords: Gene signature, CT screening, Lung cancer, Indolent, Aggressive, PI3K/PTEN/AKT signalling pathway Highlights * • Lung cancers detected in the first two years of screening program are less aggressive than the ones identified subsequently. * • A mRNA gene signature separates tumours from the initial and late years of screening and correlated with outcomes. * • Early stage lung cancers can be divided into “indolent” and “aggressive” with distinct biological basis. 1. Introduction Lung cancer screening detects early cancers in a higher number than expected but nonetheless there is still no consensus on both efficacy in reducing mortality and safety ([44]Melamed et al., 1984, [45]Anon., 2014, [46]Marcus et al., 2006, [47]Bach et al., 2003). Three European randomized CT screening clinical trials have so far failed to achieve a mortality reduction ([48]Infante et al., 2009, [49]Saghir et al., 2012, [50]Pastorino et al., 2012).This is apparently in contrast to the results from two larger America based studies: the International ELCAP group published a positive report on the efficacy of CT screening, with an estimated 10-year survival of 80% overall and 92% in clinical stage I cancers undergoing surgery ([51]Henschke et al., 2006)and the National Lung Screening Trial reported a 20% reduction in mortality in the LDCT group compared to chest X-rays group([52]The National Lung Screening Trial Research Team, 2011). To explain these apparently contrasting findings, it has been raised a hypothesis that early-stage tumours found at baseline screening are mostly indolent tumours and removing them does little to reduce development of fatal, fast growing cancers which develop and are picked up later on in the screening programme ([53]Bach, 2008, [54]Pastorino, 2006). The evidences so far suggested that most advanced cancers do not reflect slow evolution of indolent carcinomas, but instead are fast-growing carcinomas with a de novo aggressive phenotype. Remarkably, the year of detection has been associated with the clinical outcomes, as the tumours detected during the first two years had a better prognosis than those identified in later years of screening. Boeri et al. analysed microRNA expression of tissue and plasma of early detected lung cancer and found that specific signatures could distinguish tumours by the year of detection and predict their clinical outcome ([55]Boeri et al., 2011, [56]Sozzi et al., 2014). In order to gain further insights on the biological features of CT screening detected tumours, in the present study we analysed the gene expression profile on two sets of spiral CT screening detected tumours: the training set from the pilot trial (INT–IEO) ([57]Pastorino et al., 2003) and the validation set from the prospective randomized Multicentric Italian Lung Detection trial (MILD) ([58]Bianchi et al., 2004). We compared the differential gene expression according to the year of the cancer detection on the training set then we validated the gene signatures on the validation set and on an independent public deposit data set ([59]Takeuchi et al., 2006). 2. Methods 2.1. Sample 28 lung cancers and 23 non-adjacent normal lung tissues from INT–IEO cohort plus 24 tumours identified in the MILD trial were selected ([60]Supplementary Table 1). Recruitment and diagnostic imaging workup have been previously described ([61]Pastorino et al., 2003, [62]Bianchi et al., 2004). Tumours detected during the first two years of screening are defined as CT1–2 those detected between the 3rd and the 5th year as CT3–5. For association analyses the following clinical parameters were considered: CT year, pathological stage and histopathology type. The χ^2 test was used to examine the associations between predictor variables. Further details were in Supplementary material and methods. 2.2. RNA Extraction, Sample Selection Criteria and mRNA Amplification Total RNA was extracted using TRIzol (Invitrogen) and residual DNA removed by RNeasy (Qiagen). RNA quality was checked by Agilent bioanalyzer (Agilent Inc). The concentration was determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies). An RNA sample was further processed only if: (1) the ratio of A[260]/A[280] between 1.7–2.0; (2) concentration within the range of 0.5–10 μg/ml; (3) displaying two distinct peaks corresponding to the 28S and 18S ribosomal RNA bands at the ratio of 28S/18S > 0.5 with no degradation. One μg of total RNA was amplified with Amino Allyl MessageAmp™ aRNA Kit (Ambion, Austin, Texas) and indirectly labelled with Cy3 monofunctional dye (Amersham Biosciences UK Ltd., Bucks, United Kingdom) for the sample RNA and Cy5 monofunctional dye for the reference RNA (Stratagene®, Amsterdam, The Netherlands) and then co-hybridised onto the microarray. 2.3. cDNA Microarray Analysis The Human Exonic Evidence-Based Oligonucleotide (HEEBO) array containing 44,544 of 70-mer probes (Stanford Functional Genomics Facility, Stanford University) was used. Microarray hybridization and processing were performed according to Stanford protocols ([63]www.microarray.org/sfgf). After 20 h of incubation in a 42 °C hybridization oven, the microarray slides were washed with series of SSC and SDS and immediately scanned with a GenePix 4000B microarray scanner (Axon Instruments Inc., Union City, CA). The image QC (flag) set up as: SNR532 > 3, SNR635 > 3, mean of median background < 500, median PC > B + 1SD > 90%, feature variation < 1, background variation < 1, and feature with saturated pixels < 1%. Data was then background subtracted and normalised by the globe intensity correction factor normalisation (LOWESS). The data were then imported into GeneSpring™ GX10 software (Agilent Inc, California) followed by log transformation and LOWESS normalisation. Quality control was performed considering values between 20th and 100th percentile. The data has been through the completed MIAME guideline checklist and is being deposited with Gene expression Omnibus (GEO accession no. [64]GSE29827). 2.4. Statistical Analysis 28 INT–IEO and 24 MILD samples were analysed as training and validation set respectively. In the training set, the gene expression comparison between CT1–2 (n = 17)vs. CT3–5 (n = 11) and stage I (n = 19) vs. stages II–IV (n = 9) were carried out by parametric permutative (permutation times > 1000) t-test (detailed description in supplementary method) to generate the significance threshold necessary for the recognition of genes differentially expressed in the two groups (p < 0.01). We then combined the cases according to the CT year and stage. 14 of them were CT1–2 at stage I, while six were CT3–5 at stages II–IV. The 14 of CT1–2/stage I and the 6 of CT3–5/stages II–IV were each compared to the respective normal samples, using the same statistical method with a number of permutations set > 1000 and p < 0.01. Unsupervised hierarchical clustering by Pearson's distance measure on average linkage was followed to assess the distribution of each patient based on their similarities measured over the significantly expressed genes by the supervised filtered method. Overall survival time for lung cancer patients was calculated from the diagnosis of the disease until death or by censoring at the last follow-up date. Survival curves were estimated using the product-limit method of Kaplan–Meier and were compared using the Log-rank test. Statistical analyses were carried out using SAS® (SAS Institute Inc., Cary, NC, USA) and R (URL: [65]http://www.r-project.org/, last access Feb 8th 2010) software. Two-sided p values below 0.05 were considered statistically significant. 2.5. Biological Interpretation of the Microarray Data To elucidate the gene signature as a pathway and its possible biological function, we employed an on-line method, DAVID bioinformatics resources (National Institute of Allergy and Infectious Diseases, NIH), a public Database for Annotation, Visualization and Integrated Discovery ([66]http://david.abcc.ncifcrf.gov/) ([67]Huang Da et al., 2009). For pathway analysis, gene list from each comparison was imputed to GSGX10 which provides two approaches: enrichment analysis and network analysis. The former uses BioPAX (Biology Pathway Exchange) format that allows import pathway data from KEGG, Reactome. The latter uses the network database of biological associations extracted from up-to-date scientific literature (NLP) to construct the overall network and interaction such as activation, inhibition and binding to each other, and to identify the most enriched significant pathways in a given gene set. 2.6. Validation of Microarray Data The 24-patient MILD cohort was kept as blind when doing microarray analysis. Validation of the signatures generated on the training set was performed on this data set as well as on an independent public dataset of clinically detected lung cancer ([68]Takeuchi et al., 2006) (GEO accession number: [69]GSE11969). Common features between different platforms (for the [70]GSE11969 data set) were used for clustering analysis with centred correlation and complete linkage. Distribution of patients according to the signature were compared to clinical–pathological characteristics: year of screening (MILD set only), status, stage and histotype with a 2 × 2 contingency table and two-tailed Fisher's exact test. Kaplan–Meier survival plots with Log-rank test (HR and 95% CI) were used to compare survival. Results were considered significant at p-value ≤ 0.05 2.7. TaqMan Real-Time Quantitative PCR The microarray levels of expression of four selected genes ITGB1 (Hs01127543_m1), FOXO1A (Hs01054576_m1), SERPINA3 (Hs01038298_m1) and SELP (Hs00927900_m1) were validated using TaqMan qRT-PCR. From each samples, 0.5 μg of total RNA was converted to cDNA by the RetroScript kit (Applied Biosystems, Foster City, CA) using a random decamer as the primer in a 20-μl reaction according to the manufacturer's protocol. cDNA were then diluted 1:25 and five μl were used for qRT-PCR in a total volume of 20 μl containing 1 × TaqMan gene expression master mix and one selected primer. Results of triplicate assays were log-transformed and mean expression values calculated. Relative expression for each gene was assessed based on real-time PCR data normalised to the control gene ACTB (Hs99999903_m1) by the ΔCt method. Relative fold-change was calculated by 2^− ΔΔCt and compared with log-transformed microarray data. 2.8. Immunohistochemistry Paraffin-embedded, formalin-fixed tissues were sectioned (5-mm) onto glass slides. A monoclonal antibody against FOXO1A (Millipore clone 2H8.2) was used and immunostains and scoring (percentage of positive cells and intensity of staining) were performed as previously described ([71]Leek et al., 2000). 3. Results 3.1. Comparison Between Patients Detected in the First Two (CT1–2) and the Last Three (CT3–5) Years of Screening 239 genes were differentially expressed between 17 CT1–2 and 11 CT3–5 tumours from the INT–IEO training set ([72]Table 1 for selected genes and [73]Supplementary Table 2 for the full list), 110 genes were up-regulated and 129 down-regulated in CT1–2 compared to CT3–5 by parametric permutative (permutation times > 1000) t-test (p < 0.01). 153 genes were differentially expressed in 19 stage I compared to 9 stages II–IV patients (selected genes in [74]Table 2 and full list in [75]Supplementary Table 3). Combined comparison of 14 CT1–2/stage I vs. 6 CT3–5/stages II–IV patients identified 218 differentially expressed genes (selected genes in [76]Table 3 and full list in [77]Supplementary Table 4). Tumours of CT3–5 or stages II–IV or combined CT3–5/stages II–IV had higher expression of genes associated with metastasis and high cell mobility. Table 1. Selected genes of 239 differentially expressed by parametric permutative t-test between tumours detected in years 1–2 and in years 3–5 involved in metastases and tumour aggressiveness. Gene ID Fold p Value Direction Biological functions PCDHGB3 4.78 0.001 Up years 1–2 Neural cadherin-like adhesion proteins likely to play a role in the establishment of cell–cell connections in the brain CLDN16 3.89 0.003 Up years 1–2 Tight junction. Associated with less aggressive, reduced tumour volume and lack of metastases in breast. PLAC1 1.38 0.008 Up years 1–2 Associated with proliferation, motility, migration and invasion. FOXO1A 1.4 0.009 Up years 1–2 Forkhead box O1A (rhabdomyosarcoma) ITGB1 0.79 0.008 Down years 1–2 Controls invasion via regulation of MMP-2 collagenase expression through PI-3K, Akt, and Erk protein kinases and the c-Jun or via direct recruitment of MMP-2 to the cell surface MFI2 0.77 0.008 Down years 1–2 Melanoma progression and metastases. PECAM1 0.76 0.002 Down years 1–2 Endothelium. Associated with metastases. PAPPA2 0.68 0.01 Down years 1–2 Metalloproteinase. Detected in some invasive extravillous trophoblasts in the first trimester POSTN 0.63 0.008 Down years 1–2 Associated with aggressive metastatic tumours. PTP4A3 0.61 0.006 Down years 1–2 Associated with cell proliferation and metastases F8 0.59 0.004 Down years 1–2 Associated with cell proliferation and metastases SERPINA3 0.58 0.009 Down years 1–2 Associated with cell proliferation and metastases CCL25 0.57 0.002 Down years 1–2 Associated with metastatic melanoma. OPHN1 0.53 0.007 Down years 1–2 Rho-GTPase-activating protein that promotes GTP hydrolysis of Rho subfamily members. Promotes cell migration. AAMP 0.47 0.009 Down years 1–2 Heparin binding. Expressed strongly in metastatic colon adenocarcinoma cells in lymphatics. CLCA2 0.47 0.01 Down years 1–2 It may serve as adhesion molecule for lung metastatic cancer cells, mediating vascular arrest and colonization. MIZF 0.43 0.004 Down years 1–2 Transcription repressor. Possibly associated with invasiveness. CTSB 0.34 0.002 Down yrs 1–2 Associated with metastases. RLN2 0.32 0.005 Down yrs 1–2 Increase Cyclic AMP. Gs-adenylate cyclase and b-catenin pathway. Increases cell invasion and proliferation. SELP 0.32 0.002 Down years 1–2 Associated with metastases. MYO7B 0.3 0.001 Down years 1–2 Cell motility SFSCN2 0.21 0.001 Down years 1–2 Associated with metastases. CTSL1 0.18 0.001 Down years 1–2 Associated with metastases. Induced by hypoxia. FZD5 0.18 0.002 Down years 1–2 Canonical WNT pathway. Lung oncogenesis, increases cell migration. Involves b-catenin. [78]Open in a new tab Table 2. Selected genes of 153 genes differentially expressed by parametric permutative t-test between tumours stage 1 and stages 2–4 involving in metastases and tumour aggressiveness. Gene ID Fold p Value Direction Biological functions SUSD5 3.2 0.001 Up stage 1 Cell adhesion LAMC3 1.94 0.003 Up stage 1 Stability of basement membranes and of cellular attachments to basement membranes, VASP 1.39 0.002 Up stage 1 Associated with cell invasiveness NPEPPS 1.32 0.003 Up stage 1 Associated with proliferation, migration, and invasion PCLKC 0.74 0.006 Down stage 1 Cell adhesion. Tumour contact inhibition DDC 0.5 0.009 Down stage 1 Associated with metastatic neuroblastoma. GNE 0.47 0.01 Down stage 1 Induces sLex. Sialic acid — dependent processes in adhesion, signalling, differentiation, and metastasis. NEXN 0.45 0.005 Down stage 1 F-actin associated. Induces cell migration and adhesion. AAMP 0.38 0.001 Down stage 1 Heparin binding. Expressed strongly in metastatic colon adenocarcinoma cells in lymphatics. CSPG3 0.35 0.003 Down stage 1 Thought to be involved in the modulation of cell adhesion and migration EBAG9 0.34 0.005 Down stage 1 Associated with metastatic disease [79]Open in a new tab Table 3. Selected genes of 218 genes differentially expressed by parametric permutative t-test between tumours of CT1–2/stage 1 and CT3–5/stages 2–4 involved in metastases and tumour aggressiveness. Gene ID Fold p Value Direction Biological functions S100A2 3.3 0.009 Up yr 1–2/s 1 Down regulated in metastatic Head Neck carcinoma CD151 2.23 0.001 Up yr 1–2/s 1 Enhances cell motility, invasion and metastasis MITF 1.89 0.009 Up yr 1–2/s 1 Associated with suppression of metastases. CDK5 1.62 0.006 Up yr 1–2/s 1 Induces cell migration and apoptosis TACSTD2 1.52 0.009 Up yr 1–2/s 1 Associated with metastases PLAC1 1.51 0.01 Up yr 1–2/s 1 Associated with proliferation, motility, migration and invasion. VASP 1.43 0.006 Up yr 1–2/s 1 Associated with cell invasiveness ITGB1 0.70 0.003 Down yr 1–2/s 1 Controls invasion via regulation of MMP-2 collagenase expression through PI-3K, Akt, and Erk protein kinases and the c-Jun or via direct recruitment of MMP-2 to the cell surface PCLKC 0.69 0.003 Down yr 1–2/s 1 Cell adhesion. Tumour contact inhibition LAMA3 0.63 0.005 Down yr 1–2/s 1 Loss of expression associated with advanced stage. GPR68 0.62 0.009 Down yr 1–2/s 1 Metastases suppressor gene. Inhibits cell migration. CCL25 0.52 0.002 Down yr 1–2/s 1 Associated with metastatic melanoma. ANXA9 0.39 0.006 Down yr 1–2/s 1 Cell–cell adhesion NEXN 0.38 0.006 Down yr 1–2/s 1 F-actin associated. Induces cell migration and adhesion. CLCA2 0.35 0.01 Down yr 1–2/s 1 It may serve as adhesion molecule for lung metastatic cancer cells, mediating vascular arrest and colonization. RLN2 0.32 0.007 Down yr 1–2/s 1 Increase Cyclic AMP. Gs-adenylate cyclase and b-catenin pathway. Increases cell invasion and proliferation. SELP 0.31 0.001 Down yr 1–2/s 1 Associated with metastases. EBAG9 0.28 0.005 Down yr 1–2/s 1 Associated with metastatic disease PDPK1 0.25 0.007 Down yr 1–2/s 1 Cell–matrix adhesion and oncogenesis AAMP 0.24 0.001 Down yr 1–2/s 1 Heparin binding. Expressed strongly in metastatic colon adenocarcinoma cells in lymphatics. [80]Open in a new tab Unsupervised hierarchical clustering of the 239-gene showed the separation of CT1–2 and CT3–5 ([81]Fig. 1). Similarly the method applied to 153 genes distinguished stage I from stages II–IV and 218 genes of CT1–2/stage I vs. CT years3–5/stage II–IV tumours respectively ([82]Suppl. Fig. 1, [83]Suppl. Fig. 2). Fig. 1. [84]Fig. 1 [85]Open in a new tab Unsupervised hierarchical clustering of 17 tumours detected in years 1 and 2 (T) and 11 cases detected in years 3, 4 and 5 (T*) using 239 differentially expressed genes. graphic file with name fx1.gif We then analysed gene expression profiles in the normal lung from the same cohort using the same permutative t-test: 203 genes were differentially expressed between these histologically normal tissues of CT1–2 (n = 15) and CT3–5 (n = 8) patients (p < 0.01, [86]Supplementary Table 5). Unsupervised hierarchical clustering separated the normal tissues according to the year of tumour detection ([87]Supplementary Fig. 3). We also compared 17 tumours with 15 normal tissues of CT1–2 and 11 tumours with 8 normal lung tissues of CT3–5 using the similar parametric permutative t-test (p < 0.01);a larger amount of differentially expressed genes were identified respectively ([88]Supplementary Tables 6 &[89]7). GO pathway analysis was performed by importing the differentially expressed genes lists to the DAVID online database. [90]Table 4 shows the top annotation clusters in the three lists of 239, 153 and 218 genes of tumour comparison according to the different criteria plus the 203 genes from normal tissue comparison (full lists of annotation in [91]Suppl. Table 8, [92]Suppl. Table 9, [93]Suppl. Table 10, [94]Suppl. Table 11). Table 4. The top alteration in 4 genes lists of tumour & normal comparisons. From tumour of CT1–2 vs. CT3–5 __________________________________________________________________ Top annotation group Annotation cluster terms Enrichment score Count % FDR 1 Cysteine-type endopeptidase activity 1.82 6 0.0089 2 Endopeptidase activity 1.78 7 0.001 3 Glycosylation 1.51 42 0.0089 4 G-protein coupled receptor 0.89 11 0.009 5 Cell adhesion 0.87 10 0.0089 

 From tumour of stage I vs. stages II–IV __________________________________________________________________ Top annotation group Annotation cluster terms Enrichment score Count % FDR __________________________________________________________________ 1 Extracellular space 1.52 9 0.001 2 Jak-STAT signalling pathway 1.51 4 0.0064 3 Cell adhesion 0.92 7 0.001 

 From tumour of CT1–2/stage I vs. CT3–5/stages II–IV __________________________________________________________________ Top annotation group Annotation cluster terms Enrichment score Count % FDR __________________________________________________________________ 1 Cell motion 2.45 12 0.0077 2 Regulation of cell motion 1.55 12 0.0083 3 Cell adhesion 1.54 13 0.0092 4 Cell-substrate adherens junction 1.37 6 0.0066 5 Sodium ion binding 1.05 6 0.001 

 From normal of CT1–2 vs. CT3–5 __________________________________________________________________ Top annotation group Annotation cluster terms Enrichment score Count % FDR __________________________________________________________________ 1 Chromatin assembly 1.36 5 0.098 2 DNA binding, high mobility group 1.30 3 0.0023 3 Regulation of cell morphogenesis involved in differentiation 1.29 4 0.0096 4 Endoplasmic reticulum 1.17 12 0.0067 5 RAS GTPase mediated signal transduction 0.97 12 0.0044 [95]Open in a new tab We also listed functional annotation clustering from the genes in the tumours vs. same period normal in [96]Suppl. Table 12, [97]Suppl. Table 13. By focusing on a significant enrichment score ≥ 1, analysing the genes differentially expressed between tumour and normal of CT1–2, differences in cell adhesion, cell growth, ribosome activity, cell motility and regulation of kinase activity were identified. On the other hand, the same analysis on tumour vs. normal CT3–5, it is shown that different biological activities are altered for instance in regulation of protein modification & metabolic process, protein folding/chaperone and oxidoreductase activity. We tested the 239-gene signature differentially expressed between CT1–2 and CT3–5 on an independent cohort of 24 spiral-CT detected lung cancer from the MILD trial. Using complete linkage and centred correlation, the signature was able to divide the patients into two distinct groups independent from the tumour status, stage, histotype and the year of screening as shown in [98]Table 5, with the overall reproducibility of R-index = 0.612 and D-index = 6.835 ([99]Supplementary Tables 14). Survival analysis showed that despite a Hazard Ratio (HR) of 4.6 (95%CI 0.9–23.4), there was no significant association (Log-rank test p = 0.06) with the overall survival ([100]Fig. 2a), but it became significant (p = 0.03), with 5.2 HR (95%CI 1.2–23.1), when considering disease free survival ([101]Fig. 2b). Table 5. Patients' distribution of the MILD trial and the public GEO dataset [102]GSE11969 by 239-gene signature. MILD[103]^⁎ __________________________________________________________________ [104]GSE11969[105]^# __________________________________________________________________ 24 patients __________________________________________________________________ 79 patients __________________________________________________________________ Group 1 Group 2 p-Value[106]^# Group 1 Group 2 p-Value[107]^# Alive 7 11 0.15 16 21 0.01 Dead 5 1 30 12 Stage I 7 10 0.37 24 16 1.00 Stages II–IV 5 2 22 17 ADK 6 7 1.00 29 16 0.25[108]^## Other 6 5 17 17 CT1–2 8 6 1.00 CT3–5 4 6 [109]Open in a new tab Clustering experiment using centred correlation, complete linkage and cutting dendrograms at 2 clusters. ^⁎ 231/239 features in common. ^# 118/239 features in common. ^## Two-tailed Fisher's exact test. Fig. 2. [110]Fig. 2 [111]Open in a new tab Survival analysis of 24 MILD patients grouped according to the signature of CT-year of screening. These survival curves are based on (a) overall survival analysis and (b) disease free survival analysis. The (two-sided) p value is from by Log-rank (Mantel–Cox) test. On the other hand, the signatures of stage generated comparing tumour stage I vs. II–IV in the training set, were ineffective in this validation sets ([112]Supplementary Table 15). We then tested the same signature on a deposited independent NSCLC data set with annotated clinical information ([113]GSE11969) containing the expression profiles of 79 clinically detected lung adenocarcinomas and squamous cell carcinomas ([114]Takeuchi et al., 2006). There were 118 features comparable with the dataset and able to divide the patients into two distinct survival groups independently from tumour stage and histotype (p = 0.013, [115]Table 5). Moreover, the Kaplan–Meier curves showed clear differences in overall survival (Log-rank test p = 0.02) with 2.1 of HR (95%CI 1.1–3.9) as shown in [116]Fig. 3a. When the analysis was restricted to 40 stage I tumours, the features also discriminated this early stage tumour with distinct clinical outcomes (p = 0.03 and HR = 2.9, 95%CI 1.1–7.8), suggesting that also clinically detected stage I tumours are a heterogeneous category comprising indolent and aggressive tumours ([117]Fig. 3b). However, the stage generated signatures from the training set were ineffective in [118]GSE11969 validation sets ([119]Supplementary Table 15). Fig. 3. [120]Fig. 3 [121]Open in a new tab Overall survival analysis of in silico data ([122]GSE11969) considering (a) all the 79 patients or (b) the 40 stage I alone, grouped according to the signature of CT-year of screening. The (two-sided) p value is from by Log-rank (Mantel–Cox) test. 3.2. Building up Pathway-Based Gene Signatures 3.2.1. Pathway Network Model 1: Tumour of CT1–2 vs. CT3–5 and stage 1 vs. stages 2–4 We performed network modelling using GeneSpring GX10 on the 239-gene signature of differentially expressed by CT1–2 vs CT3–5. The results showed that the tumours detected in later years have an increased expression of the genes ITGB1, SELP, PECAM1, F8, SERPINA3 with loss of FOXO1A as hub nodes in the transcriptome regulatory network ([123]Fig. 4). Further pathway enrichment analysis ([124]Table 6) revealed in functional terms that these data would predict the aggressive tumours form later years more angiogenesis (ITGB1) and metastases (ITGB1, PECAM1, SELP), increased proliferation and diminished apoptosis following loss of FOXP1 transcription but the activation of the of PI3K/PTEN/AKT pathway([125]Fig. 5). Fig. 4. [126]Fig. 4 [127]Open in a new tab The 239-gene of differentially expressed genes according to the CT screening year were imported into the GeneSpring GX10 for searching the common regulators of these genes. The connection between these genes was built up and unlinked nodes (genes) were removed. Blue lines and squares signify that a defined regulatory relationship exits between genes. Grey lines and squares signify that a putative regulatory relationship between genes has been identified but not biochemically defined. +, positive regulation; −, negative regulation. Table 6. Pathway enrichment analysis. 239 genes tumours detected in years 1–2 vs years 3–5 __________________________________________________________________ Object identifier Common object Size Name 0 1625598 1 73 Long-term potentiation 1 1625611 2 82 ERK-PI3K (collagen) signalling 2 1625626 2 86 Integrin signalling 3 1625629 2 44 PTEN signalling 4 1625630 2 63 AKT signalling 5 1625632 2 32 ACH-R apoptosis signalling 6 1625637 1 144 Apoptosis 7 1625640 1 24 Interferon-alpha signalling 

 153 genes stage 1 vs stages 2–4 Object identifier Common object Size Name __________________________________________________________________ 0 1625597 1 130 SAPK–JNK signalling 1 1625605 1 72 Alzheimer's disease 

 218 genes combined ct/stage t1 vs t2 Object identifier Common object Size Name __________________________________________________________________ 0 1625598 1 73 Long-term potentiation 1 1625605 1 72 Alzheimer's disease 2 1625611 1 82 ERK–PI3K (collagen) signalling 3 1625626 1 86 Integrin signalling 4 1625629 2 44 PTEN signalling 5 1625630 1 63 AKT signalling 6 1625632 1 32 ACH-R apoptosis signalling 7 1625633 1 94 Wnt signalling (Calcium) [128]Open in a new tab Fig. 5. Fig. 5 [129]Open in a new tab Pathway enrichment analysis reveals PI3K/PTEN/AKT signalling and apoptosis pathway as the most significantly related pathway. ITGB1 has higher level of expression in the late year tumours while FOXO1A has higher levels in the tumours detected at first two years. When analysing 153 genes of stage I vs. stages II–IV tumours, it is shown that SAPK/JNK and Alzheimer pathways as the most significantly related networks, both regulating AKT phosphorylation, which is known to be involved in the regulation of apoptosis and cell cycle activity, mostly through ITGB1. Enrichment pathways on the genes differentially expressed between CT1–2/stage I and CT3–5/stages II–IV showed similar patterns ([130]Table 6). 3.2.2. Pathway Network Model 2: Pathway Alteration Events in Normal Tissues Similar pathway analysis approaches were applied to the gene list of 203 genes of normal tissues comparison according to the tumour detection time. The direct interaction of 203 genes revealed the direct connection of PSG1, PIK3R4, GRP, PSEN1, CEBPB and HIST1H2BK ([131]supplementary Fig. 5). Further biological process analysis revealed that this network mainly functions in activation of MAPK activity, regulation of angiogenesis, keratinocyte proliferation, chromatin remodelling and assembly ([132]supplementary Fig. 6). 3.3. Validation of Selected Genes TaqMan QRT-PCR analysis of ITGB1, SERPINA3, FOXO1A and SELP expression in 28 cases showed similar expression patterns to those found in microarray analysis ([133]supplementary Fig. 7). 3.4. Frequency and Expression Pattern of FOXO1A Immunohistochemistry demonstrated nuclear expression of FOXO1A in 81% (14/17) of the INT–IEO CT1–2 and in 60% (6/10) of the CT3–5 cases and cytoplasmic staining in 76% (13/17) and 50% (5/10) tumours respectively ([134]supplementary Fig. 8). 4. Discussion Fifteen years ago, we launched a prospective early-detection trial with spiral CT, positron emission tomography and molecular markers in a cohort of 1035 heavy smokers (INT–IEO set). A second prospective randomized trial, MILD, including 4099 participants was launched in 2005. A study to compare the efficacy and cost-effectiveness of low-dose spiral CT screening in four published randomized trials concluded that there was no overall mortality difference in the CT arms compared with the control arms ([135]Pastorino et al., 2012). The molecular findings including gene expression in the INT–IEO set and micoRNA signatures in both INT–IEO and MILD sets were previously reported ([136]Boeri et al., 2011, [137]Bianchi et al., 2004). Based on those findings, we proposed that the lung cancer natural history in early detection by CT screening can be stratified by their molecular signatures. Indeed, we find here that there are two clinically distinct types of early-detected lung cancer: “indolent” and “aggressive”. It is widely accepted that lung cancer progresses from pre-neoplastic to clinically detected diseases by accrual of genetic and epigenetic alterations, becoming metastatic in a later phase ([138]Goldstraw et al., 2007). Strategies to reduce mortality have focused on early diagnosis to eradicate lesions before metastases occur. While the ability of these strategies to reduce overall mortality remains debatable, there is agreement that a higher number of tumours than predicted were found during screening programmes, with some tumours having a poor outcome while others behave in an indolent manner ([139]Bach, 2008). A similar scenario is known to occur in other types of tumours e.g., non-Hodgkin's lymphomas which can develop as either indolent or de novo aggressive, with some indolent cases transforming into high-grade malignancies after several years ([140]Horning and Rosenberg, 1984). Boeri and colleagues analysed the role of microRNAs as biomarkers identifying a signature able to distinguish the indolent from the aggressive tumours ([141]Boeri et al., 2011). In this study we have further demonstrated by mRNA profiling, that considerable differences between indolent and aggressive early detected tumours suggested that the latter accumulate unexpectedly fatal genetic/epigenetic aberrations over a rather short period of time. Both gene functional annotation and topological pathway network analyses indicated a significant overrepresentation of genes associated with peptidase activity, response to wound healing, cell adhesion, signal transducer activity and, ultimately, metastases in the aggressive early detected tumours. Further pathway enrichment analysis, whatever the classification criteria applied (year of detection, stage or both), reflected the activation of the PI3K/PDPK1/ITGB1/AKT pathway involving ITGB1 and FOXO1A in the most aggressive tumours ([142]Testa and Bellacosa, 2001). The increase in ITGB1 levels is consistent with its known association with metastatic ability ([143]Akiyama et al., 1995, [144]Basson, 2008, [145]Ritzenthaler et al., 2008). The findings overall are also consistent with the overexpression of mir-128a, which is predicted to target the 3′-untranslated region of FOXO1A eventually regulating AKT signalling ([146]Sozzi et al., 2014), in the later year aggressive tumours of the same cohort ([147]Boeri et al., 2011). Interestingly, mir-128 was also found overexpressed in endometrial cancer with concomitant repression of FOXO1A expression ([148]Myatt et al., 2010). Thus, profiling studies on the INT/IEO training cohort by two approaches (miRNA and mRNA expression) in two different laboratories (Milan and Oxford) revealed FOXO1A/AKT pathway differential expression. [149]Balsara et al. (2004) reported that phosphorylation of FOXO1A and AKT correlates in “in situ” lung lesions, possibly leading to invasiveness, while [150]Maekawa et al. (2009) showed that the loss of expression of total FOXO1A is associated with advanced stage tumours. The association between phosphorylation or loss of FOXO1A and more aggressive disease has been also reported in prostate ([151]Li et al., 2007), colorectal ([152]Bravou et al., 2006) breast adenocarcinoma ([153]Li et al., 2009) and acute myeloid leukaemia ([154]Cheong et al., 2003). Note that PDPK1/AKT1 and the ITGB1 pathway plus FOXO1A has recently been identified as targets for treatment in light of their association with increased proliferation, metastasis and decreased apoptosis ([155]Cen et al., 2007, [156]Bloom and Calabro, 2009, [157]Fang et al., 2008). We found that not only the tumours, but also the normal lung tissue from patients identified in the first two years differ from the normal tissue of patients identified in the last three years, consistent with findings in microRNA expression analysis ([158]Sozzi et al., 2014) and supporting the notion that the tumour aggressiveness is likely conditioned by the underlying “field cancerization” ([159]Lochhead et al., 2015). These findings suggest the possibility of grouping patients as low or high risk by gene profiling the normal tissue. We validated our signature on two independent validation sets: early detected tumours from the MILD trial and a cohort of patients with clinically detected lung cancer for which microarray data was available. In the MILD trial, the 239-gene signature from the training set was able to distinguish the patients by their outcome independent of tumour stage, histotype and year of screening, indicating that this signature is specific for discriminating between indolent and aggressive tumours. This finding was confirmed in a second validation set of a cohort of clinically detected lung cancers. The signature was not only effective in predicting the outcome when applied to all patients but was able to identify those patients who will survive more than 5 years and those who survived less than 2 years within stage I. This suggests that, as in the screening trial, stage 1 clinically detected tumours are also a mixture of already highly aggressive and indolent diseases. This study conveys the translational significance in two aspects: firstly, CT screening detected early lung cancer is a pool of heterogeneous early tumours, not only from the histological type point of view, but a mixture of aggressive and indolent tumours. The gene signatures provided the molecular clue of disease outcomes independent from the cancer histopathology type. Our next step is to identify smaller, manageable signatures for personalised diagnostic purpose. It would be of value to be able to distinguish stage I tumours with aggressive characters which need targeted treatment to prevent metastatic relapse; secondly, the confirmed findings of “field cancerization” suggests the possibility to divide patients at low or high risk categories by gene profiling the normal mucosa, therefore to revise the protocol for recruiting the normal participants entering the screening trial. In conclusion, we provide molecular evidence that both screening and early stage clinically detected non-small cell lung cancers can be divided into “indolent” and “aggressive” tumours suggesting that the metastatic phenotype appears much earlier than previously thought. The 239-gene signature may serve as risk stratification biomarkers to distinguish these clinically different tumours for personalised management, although it cannot be excluded that some indolent cancers will eventually become aggressive. Due to the limitation of this study, larger studies are needed to confirm that this categorization based on mRNA and miRNA expression signatures from screening detected tumour also fits with clinically detected cases, particularly in stage I lung cancer. Since these two groups of patients could be identified by non-invasive tools, such as PET/SUV ([160]Pastorino et al., 2009) or circulating biomarkers, new prospective trials to improve the treatment of lung cancer may be conceived on this basis. The following are the supplementary data related to this article. Supplementary Material. [161]mmc1.docx^ (21.9KB, docx) Suppl. Table 1 28 cases for CT year and stage. [162]mmc2.xls^ (21.5KB, xls) Suppl. Table 2 239 genes differentially expressed between CT years 1–2 vs CT year 3–5 in 28 cases. [163]mmc3.xls^ (48.5KB, xls) Suppl. Table 3 153 genes stage 1 vs 2–4 in 28 cases by parametric permutative (permutation times > 1000) t-test p < 0.01. [164]mmc4.xls^ (37KB, xls) Suppl. Table 4 218 genes combined CT/stage 14t1 vs 6t2 fold change by parametric permutative (permutation times > 1000) t-test. [165]mmc5.xls^ (46.5KB, xls) Suppl. Table 5 203 gene in normal tissues of CT years 1–2 vs CT years 3–5 by parametric permutative (permutation times > 1000) t-test p < 0.01. [166]mmc6.xlsx^ (23.3KB, xlsx) Suppl. Table 6 1686 genes CT years 1–2 tumour vs. normal by parametric permutative (permutation times > 1000) t-test p < 0.01. [167]mmc7.xlsx^ (186.7KB, xlsx) Suppl. Table 7 1594 genes CT3–5 tumour vs. normal by parametric permutative (permutation times > 1000) t-test p < 0.01. [168]mmc8.xlsx^ (176.6KB, xlsx) Suppl. Table 8 239 genes CT12 vs CT35 annotation clustering. [169]mmc9.xlsx^ (38.5KB, xlsx) Suppl. Table 9 153 genes stage I vs stages II–IV annotation clustering. [170]mmc10.xlsx^ (28.2KB, xlsx) Suppl. Table 10 218 genes combined CT1–2/stage I vs CT3–5/stages II–IV annotation cluster. [171]mmc11.xlsx^ (48.7KB, xlsx) Suppl. Table 11 203 normal CT1/2 vs normal CT3/5 functional annotation clustering. [172]mmc12.xlsx^ (52KB, xlsx) Suppl. Table 12 1686 genes CT1/2 tumour vs normal functional annotation clustering. [173]mmc13.xlsx^ (53KB, xlsx) Suppl. Table 13 1594 genes CT3/5 tumour vs normal functional annotation clustering. [174]mmc14.xlsx^ (35.2KB, xlsx) Suppl. Table 14 Cluster-specific reproducibility. [175]mmc15.docx^ (13.5KB, docx) Suppl. Table 15 Patient distribution of the MILD trial and the public GEO dataset [176]GSE11969 by stage gene signature. [177]mmc16.pdf^ (92.8KB, pdf) Suppl. Fig. 1 Clustering heatmap of 153 genes of stage 1 vs 2–4. [178]mmc17.docx^ (414.3KB, docx) Suppl. Fig. 2 Clustering heatmap of 218 genes CT1–2/stage1 vs CT3–5/stages 2–4. [179]mmc18.docx^ (416.7KB, docx) Suppl. Fig. 3 Clustering heatmap of 203 genes normal 1–2 vs normal 3–5. [180]mmc19.docx^ (775.9KB, docx) Suppl. Fig. 4 Signature of tumour stage. [181]mmc20.docx^ (71.4KB, docx) Suppl. Fig. 5 203 genes network direct connection. [182]mmc21.docx^ (311.8KB, docx) Suppl. Fig. 6 203 genes network biological process. [183]mmc22.docx^ (341.5KB, docx) Suppl. Fig. 7 Comparison of expression level by TaqMan QRT-PCR and microarray in 4 selected genes. [184]mmc23.docx^ (61.6KB, docx) Suppl. Fig. 8 FOXO1a cyto score v stage & CT year; FOXO1a nuc score v stage & CT year; FOXO1a immunostaining. [185]mmc24.pdf^ (43.1KB, pdf) [186]mmc25.pdf^ (59.5KB, pdf) [187]mmc26.doc^ (5.4MB, doc) Author Contributions J.H., G.S., A.M. designed the study. U.P. initiated this work. J.H., G.S., D.L., and M.B. performed the research and obtained data. J.H., G.S., M.B., A.M., F.P., L.R., and G.P. analysed and interpreted the results. J.H., F.P., K.G., U.P. wrote the paper. All authors read, gave comments and approved the final version of the manuscript. All authors had full access to the data in the study and take responsibility for the integrity of the results. Declaration The authors have no conflict of interests related to this article. Acknowledgements