Graphical abstract graphic file with name fx1.jpg [57]Open in a new tab Highlights * • 4q12 deletion contributes to lung cancer progression via downregulation of SPATA18 * • Proteomics reveals diverse enriched pathways in different stages of lung adenocarcinoma * • Three proteomic subtypes linked to lung adenocarcinoma progression are identified * • Four immune clusters are illustrated, characterized by discernible immune signatures __________________________________________________________________ Zhang et al. conduct multi-omic profiling on both pre-invasive and invasive stages of lung adenocarcinoma. Their investigations elucidate the molecular architecture linked to the progression of lung adenocarcinoma and reveal the proteomic subtypes. This work establishes a valuable community resource for subsequent research on tumorigenesis and progression. Introduction Lung cancer remains the leading cause of cancer deaths worldwide.[58]^1^,[59]^2 Along with the wide application of low-dose computed tomography (LDCT) screening, there is an increasing detection rate of early-stage lung adenocarcinoma,[60]^3^,[61]^4^,[62]^5 which is predominantly attributed to rising cases of adenocarcinoma in situ (AIS) and minimally invasive adenocarcinoma (MIA). AIS is defined as a localized small (≤3 cm) adenocarcinoma with growth restricted to neoplastic cells along pre-existing alveolar structures without invasion, while MIA is a small adenocarcinoma (≤3 cm) with a predominantly lepidic pattern and ≤5 mm invasion in the greatest dimension in any one focus.[63]^6^,[64]^7 Pathologically, lung adenocarcinoma is generally considered to follow a stepwise progression from AIS/MIA to invasive lesions.[65]^8 Both AIS and MIA are classified as the pre-invasive stage of lung adenocarcinoma, whereas invasive adenocarcinoma (IAC) represents the invasive stage. Correspondingly, the 5-year recurrence-free survival (RFS) rate for patients with AIS/MIA after surgical resection could be as high as 100%,[66]^9 while it decreases dramatically for those with IAC.[67]^10 Therefore, it is critical to identify the molecular architecture and hallmarks contributing to the progression from AIS/MIA to IAC. Several studies have elucidated molecular changes during lung cancer progression at the dimension of the genome.[68]^11^,[69]^12^,[70]^13^,[71]^14^,[72]^15 However, due to the small diameter and limited tissues of pre-invasive lesions, the multi-omics analysis is yet lacking. Our previous study demonstrated that TP53 mutation, arm-level copy number alterations (CNAs), and HLA loss of heterozygosity were increased in frequency during the development of lung adenocarcinoma.[73]^16 Nevertheless, genomic changes need to be translated into alterations at the RNA and protein levels to affect phenotypes, and there remains a gap in the deep investigation of lung adenocarcinoma progression at the multi-omics level. To decipher the proteogenomic trajectories from normal lung tissues to AIS, MIA, and ultimately IAC, we performed an integrated proteogenomic characterization of pre-invasive and invasive lung adenocarcinoma as well as paired adjacent normal tissues. This proteogenomic study elucidates key molecular events driving the tumorigenesis and progression of lung adenocarcinoma, providing a resource for the scientific community to further delineate the underlying process. Results Proteogenomic landscape of AIS/MIA and IAC We prospectively collected 197 lung tumors and paired normal adjacent tissues (NATs) from east-Asian patients, including 98 AIS/MIAs and 99 IACs ([74]Figures 1A and [75]S1A). No discernible differences were observed between AIS and MIA subtypes ([76]Figures S1B and S1C), and thus AIS/MIA subtypes were classified as one group in this study. The prospective cohort consisted of 72.6% non-smoking and 27.4% smoking patients ([77]Figure S1D), and diverse clinical characteristics are listed in [78]Table S1A. Pathological diagnosis of pre-invasive or invasive lung adenocarcinoma was evaluated by at least two experienced pulmonary pathologists. There was no significant difference in tumor purity evaluated by the ESTIMATE algorithm[79]^17 (Wilcoxon signed-rank test, p = 0.15; [80]Figure S1E) or pathologists (Wilcoxon signed-rank test, p = 0.26; [81]Figure S1F) between AIS/MIA and IAC. Integrative profiling consisting of whole-exome sequencing (WES), RNA sequencing (RNA-seq), and proteomic and phosphoproteomic analyses was conducted ([82]Figure 1A). Figure 1. [83]Figure 1 [84]Open in a new tab Proteogenomic overview of AIS/MIA and IAC cohorts (A) Schematic summary and omics data of AIS/MIA and IAC cohorts. (B) Overview of top 10 altered genes stratified by invasive extent of tumor. (C) Lollipop plots showing details of EGFR mutations in the Fudan cohort (all tumors), as well as the AIS/MIA and IAC cohorts. (D) Tumor mutation burden in the Fudan cohort and other public cohorts. (E) Numbers of gene mutations (left) and tumor size (right) in AIS/MIA and IAC. (F) Cumulative protein number (left) and absolute protein number (right) identified by proteomic analysis in normal tissues and tumors of the AIS/MIA cohort. (G) Cumulative protein number (left) and absolute protein number (right) identified by proteomic analysis in normal tissues and tumors of the IAC cohort. The asterisk was defined as follows: ∗∗∗∗p < 1.0E−4, ∗∗∗p < 1.0E−3, ∗∗p < 0.01, ∗p < 0.05. The landscapes of somatic mutations, deletions, and insertions are presented in [85]Figure 1B ([86]Table S2A). Comparative analysis showed that lower tumor mutation burden (TMB) was detected in the AIS/MIA group ([87]Figures 1C, 1D, and 1E) and among the Fudan cohort and other studies.[88]^12^,[89]^18 In addition, the top-ranked mutation of EGFR was detected as early as in the AIS/MIA subtype and lasted into the IAC subtype ([90]Figures 1C and [91]S1G), indicating that EGFR mutation was a key event in the early stage during lung cancer progression. Mutually exclusive analysis is generally used to identify driver genes and pathways and investigate their functional relationships.[92]^19 The top-ranked mutation of EGFR was mutually exclusive of other driver gene mutations (e.g., BRAF, ERBB2) ([93]Figure S1H), in accordance with results from the Cancer Genome Atlas,[94]^20 TRACERx,[95]^21 and the MSKCC cohort.[96]^22 TP53 was the most frequently mutated tumor suppressor in our cohort, accounting for 6.12% of AIS/MIA patients and 37.37% of IAC patients ([97]Figure S1I). Taken together, EGFR mutation might already exist in AIS/MIA, whereas TP53 mutation was prominent in IAC. To investigate the effect of mutation signatures in AIS/MIA and IAC, mutation spectra were decomposed and compared according to known signatures derived from the Catalogue of Somatic Mutations in Cancer (COSMIC).[98]^23 Interestingly, cosine similarity of single base substitution (SBS) 1 and SBS2 was higher in IAC compared with AIS/MIA ([99]Figure S1J). Notably, SBS2, attributed to APOBEC activity,[100]^24 was associated with tumor progression from AIS/MIA to IAC. This is in line with the previous study revealing the association of APOBEC activity and invasiveness in distinct subtypes of invasive lung adenocarcinoma.[101]^25 These findings indicated the exact difference between AIS/MIA and IAC at the DNA level. Smoking is widely recognized as a significant promoter of lung cancer. In our study, the smoking group exhibited a higher TMB ([102]Figure S1K). Furthermore, we observed a lower prevalence of EGFR mutation and a higher prevalence of TP53 mutation in the former/current smoking group (Wilcoxon signed-rank test, p = 0.0087) ([103]Figure S1L). Proteomic analysis was performed using a label-free quantification strategy.[104]^26^,[105]^27 The average (Spearman’s) coefficients were 0.77 in the AIS/MIA cohort and 0.76 in the IAC cohort ([106]Figure S1M), allowing the comparison between samples during lung cancer progression. High consistency of mass spectrometry (MS) data for the standards (HEK293T cell lysates) was observed among different batches of HEK293T lysates ([107]Figure S1N), indicating the consistent stability of our MS platform. A total of 11,708 proteins were identified in the AIS/MIA cohort, and 19,629 proteins were detected in the IAC cohort with 1% false discovery rate (FDR) ([108]Figures 1F, 1G, and [109]S1O; [110]Tables S3A and S3B). At the phosphoprotein level, 11,852 phosphoprotein sites were discovered in our cohort ([111]Figure S1P; [112]Table S3C). Notably, more proteins and phosphoprotein sites were detected in the tumor tissues compared with paired normal tissues ([113]Figures 1F, 1G, and [114]S1P). The deletion of chr4q12 was the key event in the transition process from AIS/MIA to IAC As CNAs play critical parts in activating oncogenes and deactivating tumor suppressor genes during the progression of tumors, deletion peaks connecting to survival outcomes (log rank test, p < 0.05) were further investigated. As a result, deletions in chr4q12 and chr5q21.1 were two main alterations associated with clinical outcomes ([115]Figures 2A, 2B, 2C, and [116]S2A). The chr4q12 deletion was detected only in IAC, indicating that the crucial molecular events happened during chr4q12 deletion (Fisher’s exact test p = 1.5E−6; [117]Figure 2B; [118]Tables S2B and S2C). Figure 2. [119]Figure 2 [120]Open in a new tab Copy number alterations (CNAs) associated with the progression from AIS/MIA to IAC (A) Deletion peaks related to survival in patients with lung adenocarcinoma. (B) Distribution of two most significant deletion peaks (chr5q21.1 and chr4q12 deletion) in AIS/MIA and IAC (Fisher’s exact test). (C) Kaplan-Meier plot comparing overall survival (OS) stratified by chr4q12 deletion in IAC patients receiving whole-exon sequencing. Log rank test was used to calculate the p value. (D) cis CNA-protein and CNA-RNA correlations of genes located in chr4q12 and chr5q21.1. (E) Association between SPATA18 deletion and protein-level expression (Pearson’s correlation test). (F) Comparison of SPATA18 protein and RNA expression in AIS/MIA and IAC. (G) Kaplan-Meier plot comparing OS stratified by SPATA18 deletion or not in patients receiving whole-exome sequencing. (H) Kaplan-Meier plot comparing OS stratified by the protein level of SPATA18 in patients receiving proteomic analysis. (I) Heatmap visualizing the correlation between SPATA18 deletion and genes correlated to lysosome-like organelle and mitochondria markers. (J) Cell proliferation analysis was conducted with CCK-8 assays in A549 transfected with shSPATA18 or shNC. The experiment was repeated three times. (K) Cell-cycle analysis was performed in A549 transfected with shSPATA18 or shNC. The experiment was repeated three times. (L) The effect of downregulated SPATA18 on wound healing. The left shows typical images of wound healing in three groups, and the right shows the quantified results of gap area. The experiment was repeated three times. (M) The effect of downregulated SPATA18 on migration and invasion. The left shows typical images of migration and invasion in three groups, and the right shows quantified results of cell number. The experiment was repeated three times. (N) Immunofluorescence images of LysoSensor and MitoTracker in A549 transfected with overexpressed SPATA18 or shSPATA18. The density of LysoSensor indicates the pH of the lysosome, and density of MitoTracker represents mitochondrial membrane potential. The experiment was repeated three times. (O) The effect of SPATA18 on A549 cell xenografts in nude mice. (P) Diagram showing associations among chr4q12 deletion, SPATA18 downregulation, decreased mitophagy, and tumor growth. The asterisk was defined as follows: ∗∗∗∗p < 1.0E−4, ∗∗∗p < 1.0E−3, ∗∗p < 0.01, ∗p < 0.05. To investigate the biological effects of chr4q12 and chr5q21.1 deletions, we undertook investigations into genes displaying cis-regulatory effects of somatic CNAs (SCNAs) and downregulation from AIS/MIA to IAC at both the protein and the RNA levels. To this end, only Spermatogenesis Associated 18 (SPATA18, also known as Mieap) was identified at both levels ([121]Figure 2D), and showed cis effects on its counterpart protein ([122]Figure 2E). Compared with AIS/MIA, the expression of SPATA18 was lower in IAC at the RNA and protein levels ([123]Figure 2F). In addition, SPATA18 deletion event and low expression of SPATA18 protein were correlated with worse overall survival (OS) (log rank test, p < 1.0E−4 for deletion event, p = 3.2E−3 for protein level) ([124]Figures 2G and 2H). As well, we performed immunohistochemistry (IHC) analysis of SPATA18 expression in 15 tumors with AIS/MIA and 16 with IAC ([125]Figure S2B), and significantly lower protein level of SPATA18 in IAC was observed (t test, p = 3.0E−4; [126]Figure S2C). To further validate the prognostic effect of SPATA18 protein, we collected a new cohort (validation cohort) consisting of 270 patients with IAC, and performed IHC analysis. As a result, low expression of SPATA18 significantly indicated worse OS (log rank test, p = 2.5E−3) and RFS (log rank test, p < 1.0E−4) in patients with lung adenocarcinoma ([127]Figures S2D and S2E). SPATA18 is a novel p53-induced protein that has been reported to affect the biological functions of mitochondria.[128]^28 SPATA18 is able to induce lysosome-like organelles to eliminate oxidized mitochondrial proteins, which are essential for mitochondrial quality control.[129]^29^,[130]^30 Our analysis showed that SPATA18 deletion was associated with downregulation of lysosome-like organelle markers as well as upregulation of mitochondrial membrane molecules and oxidative phosphorylation (OXPHOS) ([131]Figure 2I). Lysosome-like organelle markers (e.g., VPS11 and SORL1) positively correlated with SPATA18 at the protein level ([132]Figures S2F and S2G). Furthermore, negative correlation was noted between SPATA18 and TOP2A as well as DNA replication ([133]Figures S2H and S2I), suggesting the pro-tumor role of SPATA18 downregulation. Two typical markers for mitochondrial membrane, TIMM17B and GHITM, were confirmed to be higher in IAC and affect survival negatively ([134]Figure S2J). In addition, the expression of proliferative markers, such as MCM4 and TOP2A, was higher in IAC and showed negative correlation with the OS of IAC patients ([135]Figure S2K). SPATA18 has been proven to be an important factor for mitophagy.[136]^31 We also performed in vitro and in vivo experiments to confirm the biological effect of SPATA18. Downregulation of SPATA18 could significantly promote cell growth and cell cycle ([137]Figures 2J and 2K). Migration and invasion are recognized as critical steps in which tumor cells detach from the primary tumor site and infiltrate lymphatic and blood vessels, ultimately contributing to tumor progression.[138]^32 Given the potential role of SPATA18 in the progression from AIS/MIA to IAC, we performed wound healing, migration, and invasion assays. The results demonstrated that the downregulation of SPATA18 enhanced the ability of migration and invasion in lung adenocarcinoma ([139]Figures 2L and 2M). Next, we investigated the effect of SPATA18 on mitophagy. Following transfection with two short hairpin RNAs (shRNAs) targeting shSPATA18, a notable decrease in fluorescence density was observed for LysoSensor, and an increase was detected for MitoTracker ([140]Figure 2N). These findings collectively suggested that the inhibition of mitophagy occurred as a consequence of SPATA18 downregulation. Animal experiments also confirmed the pro-tumor effect of SPATA18 downregulation ([141]Figure 2O). Taken together, the chr4q12 deletion might be the key event in the transition process from AIS/MIA to IAC, in which the SPATA18 deletion advanced tumor growth by suppressing mitophagy ([142]Figure 2P). This study provided evidence for using SPATA18 as a therapy target for lung adenocarcinoma. Proteomic characterization of normal lung tissues, AIS/MIA, and IAC Principal-component analysis (PCA) showed the differentiated profiles of normal lung tissues, AIS/MIA, and IAC, at the RNA and protein levels ([143]Figures 3A and [144]S3A–S3C; [145]Tables S3A and [146]S4), which indicated intrinsic differences in distinct stages of lung cancer progression. To understand the association between transcriptomic and proteomic processes, the correlation of mRNA and protein was calculated for NATs and tumor tissues. NATs displayed a fair correlation of 0.20, while tumors had a higher value of 0.39 ([147]Figure S3D). Among tumor tissues, we observed correlation coefficients of 0.36 for AIS/MIA and 0.40 for IAC, indicating similar correlations of mRNA and protein between these two groups ([148]Figure S3E). Figure 3. [149]Figure 3 [150]Open in a new tab Proteogenomic characterization of normal tissues, AIS/MIA, and IAC (A) Principal-component analysis (PCA) of proteomic data in normal tissues (n = 126), AIS/MIA (n = 59), and IAC (n = 75). (B) Heatmap of enriched pathways using proteomic data in normal tissues, AIS/MIA, and IAC. (Top) Green box, pathways related to normal tissues; red box, pathways related to AIS/MIA; blue box, pathways related to IAC. (C) Protein levels and fold changes in RNA in genes belonging to complement activation and MAPK cascade. Squares, log[2] (standard protein expression in normal tissues, AIS/MIA, and IAC); circles, log[2](fold changes in RNA levels in IAC compared with those in AIS/MIA). (D) Kaplan-Meier plots comparing overall survival (OS) stratified by protein levels of genes belonging to cell adhesion, complement activation, cell cycle, and FGFR signaling in patients receiving proteomic analysis. Log rank test was used to calculate the p value. (E) Gene set enrichment analysis (GSEA) based on proteomic data stratified by EGFR mutation status. (F) RNA and protein expression levels of genes related to NOTCH signaling pathway and citrate cycle TCA cycle stratified by EGFR mutation status. (G) Kaplan-Meier plot comparing OS stratified by the protein level of NOTCH1, a key molecule of the NOTCH signaling pathway, in patients receiving proteomic analysis. (H) A network model showing interactions between EGFR mutation and altered pathways. The asterisk was defined as follows: ∗∗∗∗p < 1.0E−4, ∗∗∗p < 1.0E−3, ∗∗p < 0.01, ∗p < 0.05. To explore the molecular characterization of the tumorigenesis and progression of lung adenocarcinoma, gene set enrichment analysis (GSEA) was performed ([151]Figures 3B, 3C, and [152]S3F). As a result, we found that normal lung tissues were predominantly associated with cell adhesion, oxygen transportation, and retinoid metabolic process ([153]Figure 3B), which could be partially explained by physiological functions of lungs, such as pulmonary gas exchange.[154]^33 Specifically, the highly expressed proteins in the normal lung tissues, such as LAMC2 and CDH5 (from cell adhesion), showed positive association with the prognosis of lung cancer patients ([155]Figure 3D). AIS/MIA had upregulated signatures of blood coagulation, complement activation, cytolysis, and fatty acid β-oxidation ([156]Figures 3B and 3C). In IAC, the predominant pathways consisted of MAPK cascade, small-GTPase-mediated signal transduction, cell cycle, NIK/NF-κB signaling, ephrin receptor signaling pathway, FGFR signaling pathway, and fatty acid biosynthetic process ([157]Figures 3B and 3C). We also validated our findings in a dataset[158]^34 that investigated the transcriptomic differences in distinct growth patterns within single lung tumors. Similar enriched pathways were also detected ([159]Figures S3G and S3H). Meanwhile, both AIS/MIA- and IAC-enriched proteins were correlated with worse prognosis in patients with lung adenocarcinoma, including CFB, C2 (from complement activation), MCM2, YWHAB (from cell cycle), TIAL1, and GRB2 (from FGFR signaling) ([160]Figure 3D). To provide potential targets for future studies, we also presented upregulated markers in AIS/MIA, which were associated with survival outcomes ([161]Table S1D). Taken together, these findings revealed the characterization during carcinogenesis and development of lung adenocarcinoma at the proteomic level. Proteomic characterization of lung adenocarcinoma with EGFR mutation Target therapy toward EGFR mutation has revolutionized treatments for lung cancer.[162]^35^,[163]^36 In the Fudan cohort, EGFR mutation stood out as the top-ranked frequent mutation, co-occurrent with TP53 and RBM10 mutations ([164]Figure S1H). To access the impact of EGFR mutation in tumorigenesis and progression, we then investigated significantly dysregulated hallmarks associated with EGFR mutation. GSEA showed that the citrate cycle TCA cycle (p = 0.047) and NOTCH signaling pathway (p = 0.012) were significantly enriched in patients with EGFR mutation at the protein level ([165]Figures 3E and 3F). Similar trends were also observed at the RNA level ([166]Figures 3F, [167]S3I, and S3J). Higher enrichment scores of NOTCH signaling were also detected in the EGFR mutation group from three other datasets ([168]Figure S3K).[169]^34^,[170]^37^,[171]^38 A key regulatory protein from NOTCH signaling, NOTCH1, had negative effects on the survival of patients with lung adenocarcinoma (log rank p < 1.0E−4) ([172]Figure 3G). The prognostic effect existed in both the EGFR mutation group (log rank p = 5.6E−4) and the EGFR wild-type (WT) group (log rank p < 1.0E−4) ([173]Figure S3L). Moreover, the association between EGFR mutation and downregulation of complement and coagulation cascades reported by the Clinical Proteomic Tumor Analysis Consortium (CPTAC)[174]^38 was also confirmed in the Fudan cohort ([175]Figure S3M). Specifically, high expression of C1QB and SERPINA1 was associated with poor prognosis in all the patients with lung adenocarcinoma and in the subgroup of patients with EGFR mutation ([176]Figure S3N). Collectively, our results delineated the landscape of downstream proteomic characterization of EGFR mutation ([177]Figure 3H). Proteomic evolution from AIS/MIA to IAC under the background of EGFR mutation EGFR mutation is an essential molecular event in the change from normal lung tissue to early-stage lung cancer according to our study, but the development trajectory of EGFR-mutant lung adenocarcinoma remains largely unknown. Therefore, the intrinsic proteomic difference between EGFR-mutant AIS/MIA and EGFR-mutant IAC was further investigated ([178]Figure S4A). A higher proportion of patients over 60 years of age was observed in the EGFR-mutant IAC (chi-squared test, p = 0.0132) ([179]Figure S4B). Extracellular matrix (ECM)-receptor interaction, complement and coagulation cascades, and endocytosis were enriched in EGFR-mutant AIS/MIA ([180]Figures 4A, 4B, and [181]S4C). When the disease stage switched to IAC, typical downstream pathways of EGFR signaling were activated, such as PI3K-Akt, MAPK, and mTOR signaling pathways ([182]Figures 4A, 4B, and [183]S4C). Similar trends of enriched pathways were also observed in other lung cancer cohorts, including the Tavernari cohort ([184]Figure S4D).[185]^34 Subsequently, we compared kinase abundance between the two groups. Despite harboring EGFR mutations, AIS/MIA had higher expression of MAPK7, while IAC was associated with overexpressed CSNK2A1 and MTOR ([186]Figure 4C). The downstream phosphorylation analyses of enriched kinases revealed that cell cycle was nominated in AIS/MIA with EGFR mutation. As the disease stage progressed to the IAC stage, DNA replication and autophagy were shown to be activated ([187]Figures 4D and [188]S4E), and the related phosphorylated proteins (e.g., SSB_S366 and ULK1_S638) showed negative association with the prognosis of lung cancer patients (log rank p = 0.033 for SSB_S366, log rank p = 0.0013 for ULK1_S638) ([189]Figure 4E). As a highly expressed kinase in AIS/MIA and IAC, CDK7 was negatively correlated with survival (log rank p = 0.048) ([190]Figure S4F). Collectively, EGFR mutation played distinct roles in AIS/MIA and IAC, providing possible explanations for the different biological behaviors of EGFR-mutant AIS/MIA and IAC ([191]Figure 4F). Figure 4. [192]Figure 4 [193]Open in a new tab Proteomic characterization from EGFR-mutant AIS/MIA to EGFR-mutant IAC (A) Heatmap of enriched pathways using proteomic data in AIS/MIA and IAC under the background of EGFR mutation. (Top) Red box, pathways related to AIS/MIA; blue box, pathways related to IAC. Chi-squared test was used to calculate p value for the association between baselines and pathological results. The Benjamini-Hochberg method was used to adjust p values of differentially expressed proteins between EGFR^mut AIS/MIA and IAC samples. (B) Overview of dysregulated pathways related to EGFR mutation in AIS/MIA and IAC. (C) Protein kinases with aberrant expression in EGFR-mutant AIS/MIA and IAC. (D) Protein level of upregulated kinases and their downstream phosphorylation sites. (E) Kaplan-Meier plot comparing overall survival stratified by the phosphorylation levels of SSB_S366 and ULK1_S638. (F) A model depicting the distinct roles of EGFR mutation in AIS/MIA and IAC. The asterisk was defined as follows: ∗∗∗∗p < 1.0E−4, ∗∗∗p < 1.0E−3, ∗∗p < 0.01, ∗p < 0.05. Proteomic subtypes of lung adenocarcinoma Consensus clustering was performed to specify proteomic subtypes, and three subtypes were identified (S-I, n = 27; S-II, n = 59; and S-III, n = 48) ([194]Figures 5A and [195]S5A). Patients classified as S-III exhibited worse RFS and OS, whereas those classified as S-I demonstrated improved RFS and OS ([196]Figures 5B and [197]S5B). S-I mainly consisted of patients with AIS/MIA (81.5%), while S-III was predominantly composed of patients with IAC (87.5%) ([198]Figures S5C and S5D). S-II was the intermediate group between S-I and S-III. The mutation signatures of the three subtypes were presented ([199]Figure S5E). A previous study also proposed several clusters for lung cancer,[200]^37^,[201]^38^,[202]^39^,[203]^40 and we further compared our proteomic subtypes with those from published studies. There were some overlaps between ours and theirs ([204]Figures S5F and S5G). Given the much higher proportion of AIS/MIA in our study compared with previous studies, we identified S-I as the subtype representing the pre-invasive stage ([205]Figures S5F and S5G). Patients with S-I in our study could not be identified by other clustering systems, highlighting the originality of our proteomic subtypes. Figure 5. [206]Figure 5 [207]Open in a new tab Proteomic subtypes for lung adenocarcinoma (A) Heatmap showing proteomic subtypes annotated with clinical features. (B) Kaplan-Meier plot comparing recurrence-free survival in different proteomic subtypes. Log rank test was used to calculate the p value. (C) Enriched pathways in three proteomic subtypes. The enrichment score was calculated by gene set variation analysis (GSVA). (D) Heatmap of differentially expressed proteins among the proteomic subtypes. (E) Kaplan-Meier plot comparing overall survival (OS) in patients with 7p11.2 amplification or not. (F) Copy number variation (CNV)-protein correlation for genes located in 7p11.2 (Pearson’s correlation test). (G) SCNA, RNA, and protein levels of SUMF2, EGFR, and GBAS in three proteomic subtypes. (H) Kaplan-Meier plot comparing OS in patients stratified by the protein level of EGFR. (I) Enriched pathways related to the protein level of EGFR (Pearson’s correlation test). (J) A working model of 7p11.2 amplification promoting tumor progression via EGFR upregulation. The asterisk was defined as follows: ∗∗∗∗p < 1.0E−4, ∗∗∗p < 1.0E−3, ∗∗p < 0.01, ∗p < 0.05. Subtype-specific pathway enrichment was further conducted to disclose different molecular features among the three subgroups. Extracellular matrix organization, cell adhesion, proteolysis, leukocyte migration, and neutrophil degranulation were enriched in S-I ([208]Figures 5C and 5D). S-II was predominantly associated with extracellular matrix organization, cell adhesion, leukocyte migration, neutrophil degranulation, Rho protein signal transduction, LDL particle-mediated signaling, and negative regulation of complement activation ([209]Figures 5C and 5D). S-III was characterized by high levels of Wnt signaling pathway, proteasome-mediated ubiquitin-dependent protein catabolic process, DNA repair, MAPK cascade, autophagy, integrin-mediated signaling pathway, cell cycle, fatty acid metabolic process, TOR signaling, Toll-like receptor signaling pathway, and ERBB2 signaling pathway ([210]Figures 5C and 5D). We also investigated the mutation status in the three subtypes and found that TP53 was more frequently mutated in S-III than in S-I and S-II (chi-squared test, p = 0.041) ([211]Figure S5H). Taken together, the results show that the three proteomic subtypes had distinct molecular backgrounds and represented different stages of tumor progression. Subsequently, difference in copy number variation was further investigated among the three subtypes. As a result, we found the chr1q21.1, chr5q35.1, chr7p11.2, and chr14q13.3 amplifications differentially occurred in the three subtypes ([212]Figures 5A and [213]S5I), and the appearance of these amplifications indicated worse survival ([214]Figures 5E and [215]S5J). The chr7p11.2 amplification also had a prognostic role in each proteomic subtype (log rank p = 1.7E−4) ([216]Figure S5K). SUMF2, EGFR, and GBAS, located at chr7p11.2, were the top three proteins that had significant correlation with the copy number ([217]Figure 5F). SCNA, RNA, and protein levels of these three genes increased from S-I to S-III, and EGFR stood out for its potential role in CRISPR and RNAi databases[218]^41^,[219]^42 ([220]Figure 5G). The protein level of EGFR was negatively associated with clinical outcome ([221]Figure 5H). To verify the effect of the chr7p11.2 amplification, the downstream pathway alterations related to the high expression of EGFR were further studied. Wnt signaling, ubiquitin-mediated proteolysis, and cell cycle were upregulated by EGFR ([222]Figures 5I and [223]S5L). Consequently, the chr7p11.2 amplification could lead to the upregulation of EGFR, ultimately promoting tumor progression of lung adenocarcinoma ([224]Figure 5J). Immune clusters and tumor microenvironment in AIS/MIA and IAC To gain a deeper understanding of immune characterization in AIS/MIA and IAC, we accessed xCell ([225]https://xcell.ucsf.edu)[226]^43 with the transcriptomic and proteomic data, which were then analyzed by the consensus clustering ([227]Table S5). Four distinct subtypes were identified among 186 paired tumor samples ([228]Figures 6A and [229]S6A). Based on a Sankey diagram, we found a positive association between proteomic clusters and immune clusters (Fisher’s exact test, p = 0.03) ([230]Figure S6B). Kaplan-Meier analysis implicated worse survival in patients with the immune depression subtype compared with the other three subtypes (log rank p = 1E−3 for OS, p < 1E−4 for RFS) ([231]Figures 6B and [232]S6C). The majority of patients with the endothelial cell subtype had AIS/MIA, whereas the majority of patients with the immune depression subtype had IAC ([233]Figure 6C). We further investigated the prevalence of the four immune clusters stratified by EGFR mutation status. There was no distributional difference in EGFR mutation among the four subtypes ([234]Figures 6A and [235]S6D). The endothelial cell subtype had the highest proportion of AIS/MIA in both EGFR mutation and WT groups, and the immune depression subtype had the highest proportion of IAC in both groups ([236]Figure S6E). In the endothelial cell subtype, immune signatures including endothelial cells and neutrophils were prominent, as evidenced by the high expression of CD34/VWF/KDR/FLT1/LYVE1 and CD16B ([237]Figure 6D). The B cell subtype displayed more B cells, as evidenced by the overrepresented CD19, CD79B, and CD22 ([238]Figure 6D). In the dendritic cell (DC) subtype, macrophages, monocytes, and DCs were predominant, and CD209, CD83, CD33, and CD68 were highly expressed ([239]Figure 6D). The immune depression subtype was distinct from the others, with low expression of all immune cells and CD molecules ([240]Figure 6D). Figure 6. [241]Figure 6 [242]Open in a new tab Immune clusters of lung adenocarcinoma (A) Heatmap showing proteogenomic profiles comprising xCell signatures, CD molecules (RNA), potentially targetable immune checkpoints (RNA), and enriched pathways (proteome) in four distinct immune clusters. (B) Kaplan-Meier plot comparing overall survival for the four immune clusters. Log rank test was used to calculate the p value. (C) Proportions of AIS/MIA and IAC in the four immune clusters. (D) Contour plot of density based on stroma score and endothelial cell signature. For each cluster, key upregulated molecules and pathways, as well as potentially targetable immune checkpoints, were reported based on transcriptome (R) or proteome (P). Red, endothelial cells; green, B cells; purple, DCs; blue, immune depression. (E) Distribution of enrichment scores of G2M checkpoint and E2F targets signaling calculated by gene set variation analysis (GSVA) based on global proteomic and transcriptomic data in the four immune clusters (Kruskal-Wallis test). (F) Proportions of TP53 mutation in the four RNA-based immune clusters (chi-squared test). (G) Gene set enrichment analysis (GSEA) based on proteomic data stratified by TP53 mutation status. (H) Molecular profiling of RNA and protein in genes from DNA replication and base excision repair stratified by TP53 mutation. The asterisk was defined as follows: ∗∗∗∗p < 1.0E−4, ∗∗∗p < 1.0E−3, ∗∗p < 0.01, ∗p < 0.05. Next, we investigated the prognostic significance of these immune cells within the entire cohort. The results revealed that elevated levels of endothelial cells and DCs were significantly linked to improved survival ([243]Figure S6F). Neutrophils were found to exhibit a pro-tumoral role in most studies primarily focused on advanced tumors.[244]^44 However, contrasting evidence from other investigations has underscored the anti-tumoral potential of neutrophils, particularly in the context of early-stage cancer, such as at the N0 stage.[245]^45^,[246]^46^,[247]^47 Consistently, we also found that the high expression of neutrophils in patients with N0 diseases indicated better survival ([248]Figure S6G). To investigate biological landscapes and functional processes in each subtype, we conducted gene set variation analysis (GSVA) using transcriptomic and proteomic data. TNFA signaling via NF-κB was mostly enriched in the endothelial cell subtype, allograft rejection and interferon gamma response in the B cell subtype, IL-2 STAT5 signaling and coagulation in the DC subtype, and pathways related to DNA repair, hypoxia, G2M checkpoint, E2F targets, MYC targets, and glycolysis in the immune depression subtype ([249]Figures 6D and 6E). Given the fact that the G2M checkpoint, E2F target, and MYC target pathways were closely correlated with cell proliferation and cell-cycle activity,[250]^48^,[251]^49 drugs targeting the cell cycle, including CDK1 inhibitors (e.g., RO-3306) and CDK4/6 inhibitors (e.g., palbociclib, abemaciclib),[252]^50^,[253]^51 might be effective alternatives for the immune depression subtype. Upregulation of DNA replication was also observed in the immune depression subtype ([254]Figure S6H), indicating the potential use of PARP inhibitors in this special subtype.[255]^52^,[256]^53 We also conducted analyses to confirm the predictive value of these immune signatures in patients receiving immunotherapy.[257]^54 Compared with patients receiving atezolizumab (PD-L1 inhibitor) with short OS (<6 months), the signatures of B cells, plasma cells, and DCs were highly expressed in patients with long OS (>12 months) with atezolizumab ([258]Figure S6I). Moreover, the corresponding markers of these immune signatures were also overrepresented in patients with long OS (>12 months) on atezolizumab ([259]Figure S6J). Importantly, those immune signatures exhibited a positive association with prognosis in patients receiving immunotherapy ([260]Figure S6K). Furthermore, DCs were correlated with improved survival in patients receiving docetaxel ([261]Figure S6I), and the upregulation of CD83 was observed in patients with longer OS ([262]Figure S6J). Taken together, B cells and plasma cells could predict the benefits of immunotherapy in patients with lung adenocarcinoma. Interestingly, TP53 mutation and patients over 60 years of age were prominent in the immune depression subtype ([263]Figures 6A, 6F, and [264]S6L). We further investigate the impacts of TP53 mutation. As a result, GSEA showed that TP53 mutation had positive effects on DNA replication and base excision repair at the RNA and protein levels ([265]Figures 6G, 6H, and [266]S6M). Furthermore, MCM3, MCM6, and RFC3, the overrepresented proteins from the DNA replication pathway, had a significantly negative effect on the survival of patients with lung adenocarcinoma (log rank p < 1.0E−4 for MCM3, p < 1.0E−4 for MCM6, and p < 1.0E−4 for RFC3) ([267]Figure S6N). Discussion Lung adenocarcinoma accounts for 40%–50% of non-small cell lung cancer.[268]^55^,[269]^56 Investigations on carcinogenesis and progression of lung adenocarcinoma enable the better understanding of this disease. Although some advancements have been achieved in genomic profiling in lung pre-neoplasia and IAC,[270]^12^,[271]^57 gaps between genomic alterations and oncogenic proteins still exist. To fulfill the above limitations, we conducted proteogenomic analysis on 197 tumors and matched NATs, comprising 98 lung pre-invasive (AIS/MIA) adenocarcinomas and 99 IACs. This is a comprehensive proteogenomic study to delineate the molecular architecture and hallmarks of carcinogenesis and tumor progression from AIS/MIA to IAC. Our analysis provides a comprehensive multi-omic landscape for pre-invasive and invasive lung adenocarcinoma, encompassing somatic gene mutations and signatures, arm-level events, proteome-based pathways, and the immune microenvironment. Our integrated proteogenomic characterization of AIS/MIA, IAC, and their paired NATs serves as a public resource for the scientific community, paving the way for future studies regarding carcinogenesis and progression of lung adenocarcinoma. As the utilization of LDCT continues to gain popularity as a screening tool for lung cancer, the proportion of patients with ground-glass opacity (GGO)-featured lung cancer is increasing.[272]^5 If we can uncover the intrinsic mechanisms that drive tumorigenesis and development from pre-invasive to invasive lung adenocarcinoma, interventions targeting these mechanisms may aid in reducing the prevalence of lung cancer and preventing pre-invasive adenocarcinoma from progressing into IAC. Our study presents the proteogenomic landscape from pre-invasive to invasive lung adenocarcinoma, assisting in the identification of treatment targets for early-stage lung cancer. However, there are also some extrinsic factors that contribute to this process. Recently, William Hill et al. discovered that air pollutants can promote lung cancer by acting on cells that already harbor pre-existing oncogenic mutations in healthy lung tissue.[273]^58 Oncogenic EGFR driver mutations were detected in 18% of healthy tissue samples using ultradeep mutational sequencing.[274]^58 These findings supported that EGFR mutations could be an early molecular event in lung cancer tumorigenesis. Further investigation is warranted to explore other extrinsic factors and the interplay between external and inherent factors. Moreover, our analysis revealed an interaction between EGFR mutation and NOTCH pathways, which has been nominated in neural stem cells.[275]^59 Traditionally, the NOTCH signaling cascade has a crucial role in shaping development and homeostasis in diverse organs and tissues.[276]^60 Recent studies report that NOTCH signaling could act as an oncogene or a tumor suppressor in distinct cancers.[277]^61 In non-small cell lung cancer, NOTCH signaling exhibits an oncogenic effect and contributes to drug resistance to tyrosine kinase inhibitors of EGFR.[278]^61^,[279]^62 Our study indicated that the NOTCH pathway was downstream of the EGFR receptor, which requires experimental validation. In addition, a previous study has reported the association between 4q deletion and micrometastasis of lung cancer.[280]^63 Our analysis ascertained that chr4q12 deletion is a crucial arm-level event in the progression from AIS/MIA to IAC, resulting in a low protein expression level of p53-induced SPATA18. Downregulation of SPATA18 diminished lysosome-like organelles, causing decreased mitophagy and tumor growth of the lung. Our study put forward a possible explanation of how chr4q12 deletion leads to the progression from AIS/MIA to IAC. Since AIS and MIA represent the indolent type of lung cancer, chr4q12 deletion might be part of the intrinsic differences between indolent and aggressive tumor behavior. Consistent with previous studies,[281]^37^,[282]^64 the correlations between transcriptome and proteome of our study were 0.39 for all the tumor tissues and 0.20 for NATs. Specifically, the study by Xu et al. reported correlations of 0.28 for lung tumor tissues and 0.07 for NATs.[283]^37 In the study by Chen et al., they documented a correlation of 0.31.[284]^64 The proteomic data identified distinct pathways in NATs, AIS/MIA, and IAC. Blood coagulation and fatty acid β-oxidation, linked to tumor growth and dissemination in previous reports,[285]^65^,[286]^66 are upregulated in the pre-invasive stage of lung adenocarcinoma, implying that targeting the above two pathways might be alternative treatments for AIS/MIA. In addition, complement activation was also enriched in AIS/MIA. The complement pathway is an essential part of the innate immune system and interacts with the adaptive immune system to eliminate abnormal cells. Nevertheless, recent studies reveal that complement activation acts as a “double-edged sword” and enables cancer progression.[287]^67 Future studies are warranted to investigate the actual effect of complement activation in AIS/MIA. In IAC, some classic oncogenic pathways (e.g., cell-cycle and FGFR signaling pathways) were enhanced. Taken together, our findings provide insights into the proteomic characterization of pre-neoplasia of lung cancer and pave the way for future studies of lung cancer carcinogenesis. The unsupervised clustering analysis identified three proteomic subtypes, which differed from the clusters reported in previous studies.[288]^37^,[289]^38^,[290]^39^,[291]^40 Due to the substantial proportion of pre-invasive lesions in our cohort, our proteomic subtypes exhibited a significant association with the invasiveness of lung adenocarcinoma. Specifically, S-I predominantly comprised AIS/MIA, while the majority of S-III was formed by IAC cases. Understanding these proteomic subtypes and their association with tumor invasiveness can offer valuable insights for personalized treatment decisions and prognosis evaluation in patients with lung adenocarcinoma. Immune escape is recognized as a significant cause of cancer formation and progression,[292]^68 making immunotherapy a promising treatment approach for most types of cancer.[293]^69 Malignant cells need to overcome multiple obstacles to achieve the successful formation of a tumor, and it is a significant step to hijack microenvironmental processes to support the initiation and development of tumors. The lesion is less compromised by immune evasion during the initiation, and upregulation of genes involved in immunosuppression was observed as the lesions progressed.[294]^70^,[295]^71 Compared with ovarian cancer and other “cold” tumors,[296]^72 lung adenocarcinoma is not a cold tumor, in general, and has better response to immunotherapy.[297]^73 Several studies using IHC and single-cell RNA-seq have confirmed the low infiltration of CD4^+ and CD8^+ in early-stage lung cancer.[298]^74^,[299]^75 However, some researchers think that PD-L1-mediated immune evasion is already present in the pre-invasive stage,[300]^74 indicating that PD-1 antibodies might be effective in that stage. Several studies have tried to answer this question, and some of them revealed limited response to immunotherapy for GGO-featured lung cancer.[301]^76 However, some other studies draw different conclusions.[302]^77^,[303]^78^,[304]^79 A clinical trial investigated the use of sintilimab (PD-1 inhibitor) in patients with GGOs and found that 15% of patients had a major pathologic response and 46.2% of patients had a partial pathologic response.[305]^79 Therefore, the actual efficacy of immunotherapy in pre-invasive lung cancer still warrants further investigation. Our study also unveiled that B cells and plasma cells were able to predict the efficacy of immunotherapy in patients with non-small cell lung cancer. This might be useful for the selection of patients receiving immunotherapy. Recently, several studies have confirmed the pro-tumoral role of neutrophils.[306]^44^,[307]^80 However, the anti-tumoral effect of neutrophils has also been observed, particularly at the early stage of cancer.[308]^45^,[309]^46^,[310]^47 For instance, a study focusing on head-and-neck cancer demonstrated that neutrophils exhibited an antigen-presenting phenotype and activated T cells at the N0 stage, while at the N1-3 stage, neutrophils displayed immunosuppressive effects and repressed T cell response.[311]^47 In our study, neutrophils were found to be upregulated at the pre-invasive stage, suggesting a potential anti-tumoral role. In patients with N0 lung adenocarcinoma, high levels of neutrophils indicated improved survival. However, if lymph node metastases occur, neutrophils might switch to a pro-tumoral function. The dual role of neutrophils highlights their complexity and the importance of understanding their distinct functions at different stages of cancer progression. In summary, our study depicts the comprehensive genomic, proteomic, and phosphoproteomic map from lung neoplasia to IAC, enabling a deep understanding of the molecular architecture and hallmarks of early-stage lung adenocarcinoma. Specifically, we presented a preliminary exploration of multi-omics in the multi-stage carcinogenesis of lung cancer, which can be used to model the key events in lung tumorigenesis. The top-ranked EGFR mutation happened as early as in AIS/MIA. The chr4q12 deletion was the key event in the transition process from AIS/MIA to IAC, and the protein-level expression of SPATA18 was decreased, leading to tumor cell proliferation and poor prognosis in patients with lung adenocarcinoma. The data from this project serve as a valuable resource for the scientific community to explore the tumorigenesis and progression of lung adenocarcinoma, guiding the way to a deeper understanding of this lethal disease. Limitations of the study Although this is a large-scale proteogenomic study on the initiation and development of lung adenocarcinoma, there are several inherent limitations in this study. First, our cohort included east-Asian patients with lung adenocarcinoma, characterized by the high prevalence of females and never-smokers. Consequently, a higher frequency of EGFR mutation was observed compared with Caucasian cohorts. It warrants further investigation as to whether the findings could expand to other races. Second, despite rich resources for the investigation of tumorigenesis and development in lung cancer provided by this study, the hypotheses and conclusions need experimental validation from cell lines, animal models, organoids, or even clinical trials. Third, spatial information was missing due to the nature of the bulk sequencing method. Future analysis incorporating spatial transcriptome into other ’omics might provide a deeper understanding. Fourth, our study utilized NATs as the normal controls for DNA analysis, which may result in slight differences from blood samples. Future researchers should be aware of this issue when incorporating data from our cohort into their analyses. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Antibodies __________________________________________________________________ Rabbit monoclonal anti-SPATA18 Abcam Catalog: ab180154; RRID: AB_3076533 __________________________________________________________________ Biological samples __________________________________________________________________ Lung adenocarcinoma tissues (bulk DNA, RNA, protein) The Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai, China This paper __________________________________________________________________ Chemical, peptides, and recombinant proteins __________________________________________________________________ Phenylmethylsulfonyl fluoride (PMSF) AMRESCO Catalog: M145-5G (2-carboxyl)-phosphine hydrochloride (TCEP) ALDRICH Catalog: 4706-10G Deoxycholic acid sodium salt (DOC) Solarbio Catalog: D8330 2-chloroacetamide (CAA) ALDRICH Catalog: 22790-250G-F 1M tris-phosphine hydrochloride (Tris) AMRESCO Catalog: 0497 Trypsin Promega Catalog: V528A HPLC-grade water J.T.Baker Catalog: 4218-03 Acetonitrile (ACN) J.T. Baker Catalog: 9829-03 Formic acid (FA) Sigma Catalog: F0507 Methanol J.T. Baker Catalog: 9830-03 Phosphatase inhibitor cocktail 3 Sigma Catalog: P0044 NH3.H2O Aladdin Catalog:1336-21-6 C18 resin 3M Catalog: 2215-C18 SepPark C18 cartridges This paper N/A (homemade) Xbridge C18 column This paper N/A (homemade) Phosphopeptide Enrichment Kit Thermo Catalog: A32992 Camptothecin Sigma-Aldrich Catalog: C9911 __________________________________________________________________ Critical commercial assays __________________________________________________________________ BCA protein assay kit Beyotime Catalog: P0011 DNA Assay kit Life Technologies Catalog: [312]Q10212 TRIzol reagent Invitrogen Catalog: 15596026 Phosphopeptide Enrichment Kit Thermo Catalog: A32992 ClonExpress MultiS One Step Cloning Kit Vazyme Biotech Co., Ltd Catalog: C113-02 __________________________________________________________________ Deposited data __________________________________________________________________ Proteomics data This paper iProx: IPX0003921002 Phosphoproteomics data This paper iProx: IPX0003921003 Genomics data (WES) This paper EGA: EGAS00001004006 Transcriptomic data (RNA) This paper EGA: EGAS00001004006 Images of hematoxylin-eosin staining This paper Zenodo: [313]https://doi.org/10.5281/zenodo.7993327 __________________________________________________________________ Software and algorithms __________________________________________________________________ R (version 3.5.1) The R Project for Statistical Computing [314]https://www.r-project.org/ BWA-MEM Li et al.[315]^81 [316]http://bio-bwa.sourceforge.net/ Picard Broad Institute [317]http://broadinstitute.github.io/picard/ Genome Analysis Toolkit DePristo et al.[318]^82 [319]https://gatk.broadinstitute.org/hc/en-us MuTect Broad Institute [320]https://github.com/broadinstitute/mutect MuTect2 Broad Institute[321]^83 [322]https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mut ect2 Strelka (version 2.0.13) Saunders et al.[323]^84 [324]https://github.com/Illumina/strelka Oncotator (version 1.9.1) Ramos et al.[325]^85 [326]https://software.broadinstitute.org/cancer/cga/oncotator MutSig2CV Lawrence et al.[327]^86 [328]https://github.com/getzlab/MutSig2CV Firmiana platform Feng et al.[329]^87 [330]https://phenomics.fudan.edu.cn/firmiana/gardener/ multiomicsViz (version 1.6.0) Wang[331]^88 [332]https://git.bioconductor.org/packages/multiOmicsViz GISTIC2.0 Mermel et al.[333]^89 [334]https://www.genepattern.org/modules/docs/GISTIC_2.0 Maftools Mayakonda et al.[335]^90 [336]https://bioconductor.org/packages/release/bioc/vignettes/maftools/ inst/doc/maftools.html KEGG database Kanehisa et al.[337]^91 [338]https://www.phosphosite.org/homeAction Reactome database Croft et al.[339]^92 [340]https://reactome.org ConsensusclusterPlus (version 1.46.0) Wilkerson et al.[341]^93 N/A ESTIMATE (version 1.0.11) Yoshihara et al.[342]^94 [343]https://bioinformatics.mdanderson.org/estimate/ xCell Aran et al.[344]^43 [345]https://xcell.ucsf.edu PhosphoSitePlus Hornbeck et al.[346]^95 [347]https://www.phosphosite.org/homeAction NetworKIN 3.0 Horn et al.[348]^96 [349]http://networkin.info Kinase-Substrate Enrichment Analysis (KSEA) algorithm Casado et al.[350]^97 N/A ImageQuant TL GE Healthcare [351]https://imagequanttl.updatestar.com/ MaxQuant Cox et al.[352]^98 [353]http://www.coxdocs.org/doku.php?id=maxquant:start [354]Open in a new tab Resource availability Lead contact Further information and requests for resources and reagents should be directed to the lead contact, Dr. Haiquan Chen (hqchen1@yahoo.com) Materials availability This study did not generate new unique reagents. Data and code availability Raw data from WES and RNA-seq of AIS/MIA and IAC have been deposited at European Genome-phenome Archive (EGA) under the accession code EGAS00001004006. The mass spectrometry proteomics and phosphoproteomics data have been deposited to the iProx Consortium ([355]https://www.iprox.org/) with the subproject ID (IPX0003921002 and IPX0003921003, respectively). The images of HE slides have been uploaded to Zenodo ([356]https://doi.org/10.5281/zenodo.7993326). This paper did not report original codes and algorithms. Any additional information required to reanalyze the data reported in this work paper is available from the lead contact upon request. Experimental model and study participant details Human samples The specimens of primary tumor and normal adjacent tissue (NAT) were prospectively collected from patients, who underwent surgical resection without neoadjuvant treatments in the Department of Thoracic Surgery, Fudan University Shanghai Cancer Center (FUSCC). This study included patients, diagnosed with adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), or lung invasive adenocarcinoma (IAC). Tumor tissues and paired NATs, cut from specimens immediately after surgical resection, were moved into liquid nitrogen within thirty minutes. NATs were defined as lung tissues at least 3cm away from tumor margin, consistent with previous studies[357]^37. A total of 197 patients were enrolled in this study, including 98 with AIS/MIA and 99 with IAC. Patient characteristics, such as age, gender, smoking history, and pathological diagnosis, were also acquired prospectively during sample collection. The study was approved by the Institutional Review Board of FUSCC (IRB#090977-1), and informed consents for sample donation to the tissue bank of FUSCC were provided by patients or their direct relatives. Method details Pathological review All the pathological diagnoses were made by at least two experienced pulmonary pathologists independently according to lung adenocarcinoma classification released by WHO[358]^7. PCA of the proteome in AIS and MIA revealed no discernible differences between the two subtypes ([359]Figure S1B). No significant difference was observed in tumor mutation burden (TMB) between AIS and MIA (Wilcoxon signed-rank test, p = 0.51) ([360]Figure S1C). Given the above findings and the excellent survival of AIS and MIA, we decided to classify the AIS and MIA as one group. Tumors were staged using the eighth edition of TNM Classification for Lung Cancer[361]^10. Tumor purity was evaluated by hematoxylin-eosin (HE) staining, and only tumors with purity >=50% were further included. HE staining was perform to confirm the normal status for NATs. Each sample was homogenized and subsequently divided into three parts for different analyses if possible: the first part was used for whole-exome sequencing (WES), the second part was applied to RNA sequencing, and the remaining part was prepared for proteomic profiling and phosphoproteomic profiling. The tumor tissue was prioritized for DNA and RNA sequencing before being utilized for protein analysis. Whole exome sequencing (WES) Genomic DNA was isolated from frozen tumors and paired NATs using QIAamp DNA Mini Kit (QIAGEN, German) according to the manufacturer’s protocols, as previously reported[362]^16. The SureSelect XT Target Enrichment System (Agilent Technologies, USA) was used to construct exon libraries. A total of 1-3μg genomic DNA was sheared into fragmentations with the average size of approximately 200 bp. SureSelect XT reagents were applied to capture DNA and generate target-enriched library. Constructed libraries were run on the Illumina HiSeq X Ten platform using paired-end methods. The average mappable depth was 200X for tumor tissues and 100X for paired NATs. RNA sequencing Total RNA from tumors and paired NATs was extracted using NucleoZOL (Macherey-Nagel, German) and NucleoSpin RNA Set for NucleoZOL (Macherey-Nagel, German) according to the manufacturer’s protocols. RNA contamination and degradation were assessed on 1% agarose gels. NanoVue Spectrophotometer (General Electric, USA) was used to determine the concentration of total RNA. Epicenter Ribo-Zero Gold Kits (Epicenter, USA) were applied to remove ribosomal RNA. Quality of RNA was evaluated using the NanoDrop 8000 and the Agilent Bioanalyzer, and RNA integrity number (RIN) score for each specimen was generated. Sequencing libraries were constructed using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer’s instructions. Subsequently, libraries were sequenced on the Illumina HiSeq X Ten platform using paired-end method. WES alignment and somatic mutation calling BWA-MEM was applied to align sequencing reads to the reference human genome (hg19)[363]^81. PCR duplicates were marked using the Picard tools ( [364]https://broadinstitute.github.io/picard/ ). Base quality was recalibrated and local indel was realigned using the Genome Analysis Toolkit[365]^82. Single nucleotide variants were called by MuTect and MuTect2[366]^83, and indels were called by MuTect2 and Strelka v2.0.13[367]^84. To get reliable results, variants were excluded if found in only one in silico method. Somatic mutations were annotated using Oncotator v1.9.1[368]^85. MutSig2CV was used to identify significantly mutated genes[369]^86. RNA-seq quantification STAR v2.5.3 was used to align transcriptomic reads to the reference human genome (hg19)[370]^99. RNA-level expression was normalized to transcripts per million (TPM) using RSEM v1.3.0[371]^100. GISTIC and MutSig analysis The Genomic Identification of Significant Targets in Cancer (GISTIC2.0)[372]^89 method was used to identify significantly amplified or deleted focal-level and arm-level events, with Q values smaller than 0.25 considered significant. Cox proportional hazard models were also used to estimate (HRs) with 95% confidence intervals (CIs) for univariate and multivariate analyses. We screened the most significant amplification and deletion peaks that were associated with hazard ratios to identified. Finally, we filtered the genes by their CN-protein correlation to keep the genes with significant CN cis-effect (p value < 0.05, Spearman’s correlation). Protein Extraction and tryptic digestion According to the assessment of the mass of prepared sample pouch and the corresponding (phospho)peptide mass for the mass spectrometry (MS) analysis, 50 μg (∼ getting 2 μg peptide mass for MS analysis) and 100 μg (∼ getting 500 ng phosphopeptide mass for MS analysis) mass were used for proteomic profiling and phosphoproteomic profiling, respectively. We previously have achieved deep proteome and phosphoproteome coverage of the tiny samples in early-stage gastrointestinal cancer[373]^101^,[374]^102. For clinical sample preparation, samples were macro-dissected, deparaffinized with xylene, and washed with ethanol. 50 μL of TCEP buffer (2% deoxycholic acid sodium salt (Solarbio, Catalog: D8330), 40 mM 2-chloroacetamide (ALDRICH, Catalog: 22790-250G-F), 100 mM tris-phosphine hydrochloride (AMRESCO, Catalog: 0497), 10 mM (2-carboxyl)-phosphine hydrochloride (ALDRICH, Catalog: 4706-10G), 1 mM phenylmethylsulfonyl fluoride (AMRESCO, Catalog: M145-5G) mixed with MS water (J.T. Baker, Catalog: 4218–03), pH 8.8) was added into 1.5 mL EP tubes with prepared samples, and then heated in a 99°C metal bath for 30 min. Cool to room temperature, 5.5 μg trypsin (Promega, Catalog: V528A) was added into each tube and digested for 18 h in a 37 °C incubator. Then, 13 μL of 10% formic acid (FA) (Sigma, Catalog: F0507) was added into each tube and vortexed for 3 min, followed by sedimentation for 5 min (12,000 g). After that, a new 1.5 mL tube with 350 μL buffer (0.1% FA in 50% acetonitrile [ACN] (J.T. Baker, Catalog: 9830–03)) was needed to collect the supernatant for extraction (vortex for 3 min, and then 12,000 g sedimentation for 5 min). Then, the supernatant was transferred into a new tube for drying in a 60 °C vacuum drier. First dimensional reversed-phase separation The dried peptides were redissolved with 0.1%FA and loaded into a homemade Durashell Reverse Phase column [2 mg packing (3 μm, 150 Å, Agela) in a 200 μl tip], then eluted sequentially with nine gradient elution buffer that contains mobile phases A (2 % acetonitrile (ACN), adjusted pH to 10.0 using NH3.H2O (Aladdin, Catalog:1336-21-6) and 6 %, 9 %, 12 %, 15 %, 18 %, 21 %, 25 %, 30 %, 35 % mobile phase B (98 % ACN, adjusted pH to 10.0 using NH3.H2O). The nine fractions then were combined into three groups (6 % + 15 % + 25 %, 9 % + 18 % + 30 %, 12 % + 21 % + 35 %) and dried under vacuum for sub-sequential MS analysis. Phosphopeptide enrichment For the phosphoproteomic analysis, peptides were extracted from the specimens after trypsin digestion using the methods described as Protein Extraction and tryptic digestion. The tryptic peptides were then enriched with the High-Select™ Fe-NTA Phosphopeptide Enrichment Kit (Thermo Scientific cat. A32992), following the manufacturer’s recommendations. Briefly, peptides were suspended in binding/wash buffer (contained in the enrichment kit), mixed with the equilibrated resins, and incubated at 21–25 °C for 30 min. After incubation, the resins were washed thrice with binding/wash buffer and twice with water. The enriched peptides were eluted with elution buffer (contained in the enrichment kit) and dried in a 30 °C vacuum drier. Proteome analysis For the proteome profiling samples, peptides were analyzed on a Q Exactive HF-X Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific, Rockford, IL, USA) coupled with a high-performance liquid chromatography system (EASY nLC 1200, Thermo Fisher). Dried peptide samples re-dissolved in Solvent A (0.1 % formic acid in water) were loaded to a 2-cm self-packed trap column (100-μm inner diameter, 3 μm ReproSil-Pur C18-AQ beads, Dr Maisch GmbH) using Solvent A and separated on a 150-μm-inner-diameter column with a length of 15 cm (1.9 μm ReproSil-Pur C18-AQ beads, Dr Maisch GmbH) over a 75 min gradient (Solvent A: 0.1 % Formic acid in water; Solvent B: 0.1 % Formic acid in 80 % ACN) at a constant flow rate of 600 nL/min (0-75 min, 0 min, 4 % B; 0-10 min, 4-15 % B; 10-60 min, 15-30 % B; 60-69 min, 30-50 % B; 69-70 min, 50-100 % B; 70-75 min, 100 % B). The eluted peptides were ionized under 2 kiloVolts and introduced into mass spectrometry. Mass spectrometry was operated under a data-dependent acquisition mode. For the MS1 Spectra full scan, ions with m/z ranging from 300 to 1,400 were acquired by Orbitrap mass analyzer at a high resolution of 120,000. The automatic gain control (AGC) Target value was set as 3E+06. The maximal ion injection time was 80 ms. MS2 Spectra acquisition was performed in the ion trap in a rapid speed mode. Precursor ions were selected and fragmented with higher energy collision dissociation (HCD) with normalized collision energy of 27 %. Fragment ions were analyzed by ion trap mass analyzer with AGC target at 5E+04. The maximal ion injection time of MS2 was 20 ms. Peptides that triggered MS/MS scans were dynamically excluded from further MS/MS scans for 12 s. In our study, sample numbers were randomly assigned and subsequently subjected to mass spectrometry analysis. For the purpose of quality control, the HEK293T cell (National Infrastructure Cell Line Resource) lysates were measured as the standards every 15 samples injection to assess the stability of the mass spectrometry performance. PCA indicated that there was no discernible boundary or clustering pattern between samples injected at different profiling time ([375]Figure S3B) or between samples collected at different time points ([376]Figure S3C). Phosphoproteome analysis For the phosphoproteomic samples, peptides were analyzed on an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Fisher Scientific, Rockford, IL, USA) equipped with an Easy nLC-1000 (Thermo Fisher Scientific, Rockford, IL, USA) and a Nanoflex source (Thermo Fisher Scientific, Rockford, IL, USA). Dried peptide samples re-dissolved in Solvent A (0.1 % formic acid in water) were loaded to a 2-cm self-packed trap column (100-μm inner diameter, 3 μm ReproSil-Pur C18-AQ beads, Dr Maisch GmbH) using Solvent A and separated on a 150-μm-inner-diameter column with a length of 30 cm (1.9 μm ReproSil-Pur C18-AQ beads, Dr Maisch GmbH) over a 150 min gradient (buffer A: 0.1 % Formic acid in water; buffer B: 0.1 % Formic acid in 80 % ACN) at a constant flow rate of 600 nL/min (0-150 min, 0min, 4 % B; 0-10 min, 4-15 % B; 10-125 min, 15-30 % B; 125-140 min, 30-50 % B; 140-141 min, 50-100 % B; 141-150 min, 100 % B). The eluted phosphopeptides were ionized and detected by an Orbitrap Fusion mass spectrometry. Mass spectra were acquired over the scan range of m/z 350-1500 at a resolution of 120,000 (AUG target value of 5E+05 and max injection time 50 ms). For the MS2 scan, the higher-energy collision dissociation fragmentation was performed at a normalized collision energy of 30 %. The MS2 AGC target was set to 1E+04 with a maximum injection time of 10 ms, peptide mode was selected for monoisotopic precursor scan and charge state screening was enabled to reject unassigned 1+, 7+, 8+, and > 8+ ions with a dynamic exclusion time of 45 s to discriminate against previously analyzed ions between ± 10 ppm. Peptide and protein identification MS raw files were processed with the “Firmiana” (a one-stop proteomic cloud platform)[377]^87 against the human RefSeq protein database (updated on 04-07-2013, 32,015 entries) maintained by the National Center for Biotechnology Information (NCBI). The maximum number of missed cleavages was set to 2. Mass tolerances of 20 ppm for precursor and 0.5 Da for production were allowed. The fixed modification was cysteine carbamidomethylation and the variable modifications were N-acetylation and oxidation of methionine. For the quality control of proteins identification, the target-decoy-based strategy was applied to confirm that the FDR (False Discovery Rate) of both peptide and protein were lower than 1 %. The program percolator was used to obtain the probability value (q-value), and showed that the FDR (measured by the decoy hits) of every peptide-spectrum match (PSM) was lower than 1 %. Then all peptides shorter than seven amino acids were removed. The cutoff ion score for peptide identification was 20. All of the PSMs in all fractions were combined for protein quality control, which was a stringent quality control strategy. The q-values of both target and decoy peptide sequences were dynamically increased until the corresponding protein FDR was less than 1 % employing the parsimony principle. Finally, to reduce the false positive rate, the proteins with at least one unique peptide were selected for further investigation. Label-free-based MS quantification of proteins The one-stop proteomic cloud platform “Firmiana” was further employed for protein quantification. Identification results and the raw data from mzXML file was loaded. Then for each identified peptide, the XIC (extracted-ion chromatogram) was extracted by searching against the MS1 based on its identification information, and the abundance was estimated by calculating the area under the extracted XIC curve. For protein abundance calculation, the nonredundant peptide list was used to assemble proteins following the parsimony principle. Then, the protein abundance was estimated with a traditional label-free, intensity-based absolute quantification (iBAQ) algorithm, which divided the protein abundance (derived from identified peptides’ intensities) by the number of theoretically observable peptides. Then the fraction of total (FOT), a relative quantification value which was defined as a protein’s iBAQ divided by the total iBAQ of all identified proteins in one experiment, was calculated as the normalized abundance of a particular protein among experiments. Finally, the FOT was further multiplied by 1E6 for the ease of presentation and FOTs less then 1E6 were replaced with 1E6 to adjust extremely small values. Match-between-runs (MBR) is applied where peptides identified by tandem mass spectra in one run are transferred to another by inference based on m/z, charge state, retention time, and ion mobility when applicable, was applied to mitigate the missing value problem[378]^103. Differential protein analysis The proteomic data filtered with no missing values (n = 12293 genes) in all of samples (n = 260) were used as input data for differential expression analysis. A non-paired Mann-Whitney U test was performed on overlapping samples to determine differential abundance of proteins between AIS/MIA, IAC tumor and their matched adjacent tissues, respectively. Upregulated or downregulated proteins in tumors were defined as proteins differentially expressed in tumors compared with their matched adjacent tissues (T/N > 2 or < 1/2, Mann-Whitney U test, Benjamini-Hochberg adjusted p < 0.05). Wilcoxon signed-rank test was used to examine whether proteins were differentially expressed between EGFR^Mut AIS/MIA patients and EGFR^Mut IAC patients (T/N > 1.5 or < 0.67, Benjamini-Hochberg adjusted p < 0.05). Pathway enrichment analysis Differentially expressed proteins identified in AIS/MIA, IAC tumor and their matched adjacent were subjected to Gene Ontology and KEGG pathway enrichment analysis in DAVID[379]^104 with a p value/FDR < 0.05. We used gene sets of molecular pathways from the KEGG[380]^91 /Hallmark[381]^105 /Reactome[382]^92 databases to compute pathways for both tumor and non-tumor. Gene Set Enrichment Analysis (GSEA) GSEA was performed by the GSEA software ([383]http://software.broadinstitute.org/gsea/index.jsp)[384]^106. Gene sets including KEGG, Reactome and HALLMARK downloaded from the Molecular Signatures Database (MSigDB v7.1, [385]http://software.broadinstitute.org/gsea/msigdb/index.jsp) were set as background. Principal component analysis We performed PCA on 59 AIS/MIA tumor samples, 75 IAC samples and 126 normal adjacent samples to illustrate the global proteomic difference between tumor and NAT samples. The PCA function under the FactoMineR R package (version 2.4) was implemented for unsupervised clustering analysis with the parameter ‘n_components = 3’ on the expression matrix of global proteomic data containing 12,293 proteins. The 95 % confidence coverage was represented by a colored ellipse for each group, which was calculated based on the mean and covariance of points in each specific group. The transcriptomic difference between AIS/MIA and IAC samples were also illustrated by PCA on 68 AIS/MIA tumor samples, 82 IAC samples. The PCA function under the FactoMineR R package (version 2.4) was implemented for unsupervised clustering analysis with the parameter ‘n_components = 2’ on the expression matrix of global proteomic data containing 20,208 proteins. The 95 % confidence coverage was represented by a colored ellipse for each group, which was calculated based on the mean and covariance of points in each specific group. Correlation between subtype and clinical features In the measurements of correlations between clinical subtypes, immune clusters and clinical features, Fisher’s exact test was performed on variables: BMI, age, gender, progression stage, status of mutation, peak events, and immune clusters. mRNA-protein correlation in tumors and normal samples Genes, which were detected in more than 30% of tumors or adjacent in both RNA-Seq and MS data, were enrolled to evaluate mRNA-protein correlation. Pearson correlation coefficient between paired mRNA expression and protein abundance was measured. In addition, p value corresponding to the correlation coefficient was calculated and adjusted by the FDR correction. Functional pathways were enriched by GSEA enrichment analysis using the correlation-ranked list of proteins. Immune deconvolution Tumor purity, immune, stroma scores were inferred by R package ESTIMATE v1.0.11[386]^17 using proteome/mRNA data. Though ESTIMATE algorithm was a tool designed to analyze transcriptome data, some studies have used it for proteome analysis[387]^107^,[388]^108. To evaluate the immune characterization of the lung cancer samples, the abundances of 64 different cell types for 196 paired Lung cancer (Tumor /NATs) samples were computed via xCell ([389]https://xcell.ucsf.edu/)[390]^43, in which 126 paired and 186 paired lung cancer samples were conducted for proteomic and transcriptomic data, respectively. Based on these 64 cell signatures, consensus clustering was performed to identify subtypes of the samples with the same immune/stromal characteristics, and then all the samples were profiled into 4 immune clusters at the RNA and protein levels. Each cell signature and immune/stromal/environmental score were normalized to Z-score and then modeled as function a. In addition, we performed the differential analysis of the identified CD/MHC molecules based on global transcriptomic and proteomic data. In addition, pathway scores were computed using ssGSEA[391]^109 via Gene Set Variation Analysis (GSVA) package[392]^110. Immune consensus clustering analysis The protein expression matrix of the 186 and 126 paired lung cancer samples were used to identify the transcriptomic and proteomic subtypes, respectively, using the consensus cluster method. Consensus clustering was performed using the ConsensusClusterPlus R package[393]^93 with the 64 cell signatures from xCell ([394]https://xcell.ucsf.edu/)[395]^43. Consensus Cluster Plus parameters were reps=1,000, pItem = 0.8, pFeature = 1, clusterAlg = “hc”, distance = “euclidean”, plot=”PDF”. Euclidean distance and 1,000 resampling repetitions in the range of 2 to 10 clusters. As a result, the consensus matrix of k = 4 showed clear separation among clusters. The empirical cumulative distribution function (CDF) plot initially showed optimal separation. Clustering by k = 4 had the lowest proportion of ambiguous clustering (PAC) and showed significant association with the patients’ survival. A consensus matrix with k = 4 appeared to have the clearest cut between clusters and showed significant association with the patients’ survival. Taken together, proteomic and transcriptomic clusters were defined using k-means consensus clustering with k = 4. As summarized in [396]Figure S6A, the clustering analysis of the tumors (vertical column) by protein/TPM abundance (horizontal rows) divided all tumors into 4 proteomic/transcriptomic subgroups defined by silhouette analyses. A consensus matrix with k = 4 appeared to have the clearest cut between clusters and showed significant association with the patients’ survival. Database searching of MS phosphoproteomic data MS phosphoproteomic raw data were processed with firmiana platform against the human RefSeq protein database (updated on 04-07-2013, 32,015 entries) in the National Center for Biotechnology Information (NCBI). Kinase activity prediction and phosphopeptide analysis The phospho-proteome data were searched against the same database with MaxQuant. The phosphorylation of S or T or Y was set as variable modification, in which three mis-cleavages were allowed, with a minimum Andromeda score of 40 for spectra matches. The ratios of identified phosphorylation sites of all samples were used to estimate the kinase activities by Kinase-Substrate Enrichment Analysis (KSEA) algorithm[397]^97. The information of kinase-substrate relationships was obtained from publicly available databases, including PhosphoSite[398]^111, Phospho.ELM[399]^112 and PhosphoPOINT[400]^113. The information of substrate motifs was obtained either from the literature[401]^114 or from an analysis of the KSEA dataset with Motif (sP)[402]^115. The kinase-substrate-motif network analysis was referenced from PhosphoSitePlus (PSP, [403]https://www.phosphosite.org/homeAction)[404]^95 and NetworKIN 3.0[405]^96. Statistical analysis was performed in R (version 3.5.1) with Kruskal-Wallis test. Immunohistochemistry Distinct protein expressions of SPATA18 in AIS/MIA and IAC were validated through immunohistochemistry (IHC) of SPATA18. FPPE tumor tissues were sliced into 4-μm slices, which were stained with SPATA18 antibody (dilution 1:100, Abcam, Catalog: ab180154) using Anti-mouse/rabbit Immunohistochemistry Detection Kit (Proteintech, USA, Cat No. PK10006) according the manufacture’s protocol. To explore survival effect of SPATA18 expression, IHC on tissue microarrays were also conducted. For the assessment of immunostaining intensity, a semi-quantitative method, Histoscore (H-score, 0-300) was calculated by multiplying the intensity of positive staining (no staining = 0, weak staining = 1, moderate staining = 2, and strong staining = 3) by the proportion of positive cells (0-100%)[406]^64^,[407]^116^,[408]^117. Two independent pathologists, who were blinded to the pathological results and survival outcomes, scored H-score. For cases with conflicting results, consensuses were reached through discussion. In vitro and in vivo experiments The A549 human LUAD cell line was purchased from the the Cell Bank of the Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences. The A549 cells were cultured in 1640 medium (Thermo Fisher Scientific, USA) containing 10% fetal bovine serum (Gibco, USA) and 100 units penicillin–streptomycin in a humidified atmosphere with 5% CO2 at 37°C. Construction of plasmids and transfection PLL3.7 plasmids encoding SPATA18-specifc shRNAs [GCAACAGTTAGACTCTAAT (shSPATA18-1) and GGAACTCACTCAAGCAGAA (shSPATA18-2)] or a control scrambled shRNA (shNC) were developed. Cells were transfected with the plasmids using Lipofectamine 3000 (Thermo Fisher Scientific, Tokyo, Japan), in accordance with the manufacturer’s instructions. Cells stably expressing the SPATA18 shRNA or control shRNA were selected using puromycin. Cell viability assay The cell viability assays were performed with a Cell Counting Kit-8 Assay (DOJINDO Laboratories, Japan). The LUAD cells were seeded into 96-well plates (1 × 10^3 cells per well) in 100 μl medium incubated at 37 °C, 5%CO[2] in humidified incubator. After the indicated number of days, the supernatants were removed. 90 μL medium and 10 μL CCK-8 were added to each well, and the plates were then incubated for 1 h. The absorbance at 450 nm was detected with a microplate reader (BioTek, USA). Wound-healing assay Cells were cultured until confluence and then wounded using a 200μl pipette tip. Cells were treated with Oncostatin M (R&D systems) at 100-200ng/mL every 48h during 4 days. Three wounds were made for each sample, and migration distance was photographed and measured at time 0h and 24h. The 24-well transwell plates (BD, USA) with 8 μm pore filters were used for measuring cell migration and invasion. According to the manufactures’ instructions, 1 × 10^6 LUAD cells per well were seeded in the serum-free medium into upper insert, with (invasion assay) or without (migration assay) Matrigel Matrix (BD, USA), and the subjacent compartment was incubated with medium, with 10% FBS at 37 °C in an incubator. After incubation for 24 h, LUAD cells migrating to the lower surface were stained with 1.0% crystal violet and photographed. The LUAD cells were collected and fixed using 75% pre-cooled ethanol at 4 °C overnight to perform cell cycle analysis. After being washed three times with phosphate buffered saline (PBS) and resuspended with 500 μL PI staining solution (Vazyme Biotech, China) at room temperature for 30 min. The cell cycle analysis was detected via a flow cytometry (FACS LSRII, BD Bioscience, USA). To confuct Confocal laser microscopy of mitochondrial markers, cells (5×10^5 /sample) were seeded in 35-mm dishes and then stained with 1 μM LysoSensor Green DND-189 (L7535; Invitrogen) to detect the pH of lysosomes; 50 nM MitoTracker Red FM ([409]M22425; Invitrogen) to stain active mitochondria and measure mitochondria membrane potential. Images were acquired using an LSM-880 confocal laser microscope (Carl Zeiss). Animal studies were performed in accordance with the guidelines approved by the Fudan University Shanghai Cancer Center (FUSCC-IACUC-S2023-0021). All mice were maintained in a standard cage on a 12-h light-dark cycle, in a temperature-controlled room and were allowed access to a standard chow diet and water. A549 cells with SPATA18 ablation were trypsinized and resuspended in sterile 1×phosphate-buffered saline (PBS). Cells were counted and diluted with 1×PBS to a concentration of 1×10^7 cells/mL, and then 100 μL containing 1×10^6 total cells were subcutaneously injected to BALB/c nude mice. Tumors were measured for the longest and shortest diameters. Quantification and statistical analysis Wilcoxon signed-rank test was used to as a nonparametric test to test the difference of variables with nonnormal distribution. Benjamini-Hochberg method was used to adjust p values for multiple testing. Kaplan-Meier survival curves (log-rank test) were used for overall survival or recurrence-free survival (RFS) of the molecular abundance and patients. Coefficient value, which equals to ln (HR), was calculated from Cox proportional hazards regression analysis. P-values less than 0.05 were considered as significantly different and were chosen to enter Cox regression multivariate analysis. Prior to the log-rank test of a given protein, phosphoprotein, or phosphosite, survminer R package (version:0.4.9) with maxstat (maximally selected rank statistics) ([410]http://r-addict.com/2016/11/21/Optimal-Cutpoint-maxstat.html) was used to determine the optimal cut-off point for the selected samples according to the previous study[411]^118. RFS or OS curves were then calculated (Kaplan-Meier analysis, log-rank test) based on the optimal cut-off point. Proteins associated with survival outcomes were also presented ([412]Tables S1B and S1C). Acknowledgments