Abstract Lung cancer is one of the most common human cancers both in incidence and mortality, with prognosis particularly poor in metastatic cases. Metastasis in lung cancer is a multifarious process driven by a complex regulatory landscape involving many mechanisms, genes, and proteins. Membrane proteins play a crucial role in the metastatic journey both inside tumor cells and the extra-cellular matrix and are a viable area of research focus with the potential to uncover biomarkers and drug targets. In this work we performed membrane proteome analysis of highly and poorly metastatic lung cells which integrated genomic, proteomic, and transcriptional data. A total of 1,762 membrane proteins were identified, and within this set, there were 163 proteins with significant changes between the two cell lines. We applied the Tied Diffusion through Interacting Events method to integrate the differentially expressed disease-related microRNAs and functionally dys-regulated membrane protein information to further explore the role of key membrane proteins and microRNAs in multi-omics context. Has-miR-137 was revealed as a key gene involved in the activity of membrane proteins by targeting MET and PXN, affecting membrane proteins through protein–protein interaction mechanism. Furthermore, we found that the membrane proteins CDH2, EGFR, ITGA3, ITGA5, ITGB1, and CALR may have significant effect on cancer prognosis and outcomes, which were further validated in vitro. Our study provides multi-omics-based network method of integrating microRNAs and membrane proteome information, and uncovers a differential molecular signatures of highly and poorly metastatic lung cancer cells; these molecules may serve as potential targets for giant-cell lung metastasis treatment and prognosis. Keywords: membrane proteome, lung cancer metastasis, multi-omics analysis, microRNA, prognostic Introduction Lung cancer remains the leading cause of cancer-related mortality in the world, with an overall 5-year survival rate of 18% ([37]Siegel et al., 2017). In 2016, over 155,000 people died from lung cancer in the United States alone ([38]Siegel et al., 2017). These low survival rates are partly due to the fact that over 50% of patients are diagnosed at a later stage, for which the 5-year survival is only 4%. Approximately 80–85% of lung cancer cases are non-small cell lung cancer (NSCLC), and the remaining 15–20% are small-cell lung cancer cases ([39]Giaccone and Zucali, 2008). In NSCLC, it is estimated that over 40% of patients have metastases at the time of diagnosis ([40]Waqar et al., 2018). The prognosis is poor in metastatic cases -only ∼1% of such NSCLC patients will survive five or more years ([41]Borghaei et al., 2017). Cancer metastasis involves tumor cell invasion across interstitial tissues and basement membranes ([42]Jiang et al., 2015; [43]Qiao et al., 2019). Abnormal expression of membrane proteins in cancer tissues and cells has been shown to play a key role in cancer occurrence and metastasis ([44]Lethlarsen et al., 2009; [45]Kampen, 2011). In a recent study of hepatocellular carcinoma (HCC), golgi membrane protein 1 (GOLM1) was shown as a key target of miR-382, HCC cells metastasis status was inhibited when GOLM1 is down-regulated in HCC cells ([46]Zhang et al., 2018). N-cadherin (CDH2), a direct target of miR-145, is a cell-cell adhesion molecule that contributes to the invasive/metastatic phenotype in many cancers such as gastric cancer, breast cancer, and lung cancer ([47]Lei et al., 2017; [48]Mo et al., 2017; [49]Ye et al., 2018). β-catenin (CTNNB1), involved in the regulation of cell adhesion, promote ovarian cancer metastasis and liver cancer ([50]Arend et al., 2013; [51]Ding et al., 2014). Mucin 1(MUC1) protein, a membrane-tethered mucin glycoprotein, is also associated with poor prognosis and enhanced metastasis in human pancreatic cancers ([52]Wu et al., 2018). In addition, microRNAs have important roles in cancer metastasis ([53]Nicoloso et al., 2009), and multiple microRNAs, such as hsa-miR-1, hsa-miR-217, hsa-miR-206, and has-miR-577 were previously shown to play key roles in cancer metastasis ([54]Liu et al., 2015; [55]Chen et al., 2017; [56]Samaeekia et al., 2017). The complex regulatory landscape of cancer metastasis underscores the need of integrative approaches in cancer research. Multi-omics computational studies are an active area of investigation and perform analysis of genomic, proteomic, and transcriptional data combined with prior knowledge of regulatory relationships to uncover clinically relevant discovery such as biomarkers, therapeutically targets, and outcome prediction ([57]Liu et al., 2017; [58]Jiang et al., 2019; [59]Xu et al., 2020). One such recently proposed method – the Tied Diffusion Through Interacting Events (TieDIE) algorithm uses a network diffusion approach to connect genomic perturbations to transcriptional changes ([60]Paull et al., 2013). With the help of the TieDIE algorithm, a contributing factor (small GTPase RHEB) to the differences observed between BRAF and RAS mutants was discovered; in another cancer study, researchers combined transcriptional regulators, mutated genes, and differentially expressed kinases with TieDIE and synthesized a robust signaling network which consists of drug-able kinase pathways ([61]Drake et al., 2016). Although both membrane proteome and microRNA have been shown of great importance previously, there was no systematic combined analysis using membrane proteome together with microRNA data on lung squamous cell carcinoma (LUSC). In this study, we utilized quantitative membrane proteome and microRNA expression together with multiple regulation networks to perform comparative analysis of highly and poorly metastatic lung cancer cell lines (95C and 95D). In the following, we described the methodology used for experiment and computational analysis; differentially expressed membrane proteins are identified, then using joint analysis method to integrate microRNA expression data. Finally the significance of the study is discussed. Materials and Methods The methods utilized in the work aim to discover biomarkers which are associated with disease outcomes measured by overall survival (OS). We integrated experimental and computational approaches, proteomics and genomics (microRNA) data, and bioinformatics analysis to drive bio-medical discovery. The overall work-flow is depicted in [62]Figure 1. FIGURE 1. [63]FIGURE 1 [64]Open in a new tab Overall workflow of multi-omics integrative analysis. Experimental Data Collection – Protein Identification and Quantification Cell Culture and Membrane Protein Preparation Human giant cell lung cancer cell lines of poorly (95C) and highly (95D) metastatic potential were purchased from the Institute of Biochemistry and Cell Biology of the Chinese Academy of Sciences (Shanghai, China). Cells were cultured in DMEM and RPMI 1640 medium supplemented with 10% fetal bovine serum for 95D and 95C cells respectively at 37°C and 5% CO[2] incubator. All culture media were supplemented with 100 U/mL penicillin and 100 mg/mL streptomycin sulfate. All cultured cells were tested for mycoplasma contamination before use. Membrane proteins isolation was performed with the Pierce^® Cell Surface Protein Isolation Kit (Pierce, Thermo Fisher Scientific, United States), and followed the protocol described by [65]de Wit et al. (2012). All reagents were cooled to 4°C before protein biotinylation. The cells were washed twice with ice-cold phosphate buffered saline (PBS) followed by incubation with 0.25 mg/mL Sulfo-NHS-SS-biotin (Pierce) in 48 mL ice-cold PBS per flask on orbital shaker for 30 min at 4°C. Then, 500 μL of quenching solution were added to each flask to quench the reaction. After being washed with ice-cold PBS, harvested by gentle scraping, and pelleted by centrifugation, the cells were lysed using the Lysis Buffer (Pierce) which was added with protease inhibitors for 30 min on ice with vortexing every 5 min for 5 s. The cell lysates were centrifuged at 10,000 × g for 2 min at 4°C to remove cell remnants. Before clarified supernatant was used to purify biotinylated proteins on NeutrAvidin Agarose (Pierce), 500 μL of NeutrAvidin Agarose slurry was added and centrifuged 1 min at 1,000 × g and the flow-through was discarded followed by washing with Pierce Wash Buffer in a provided column (Pierce) trice. The clarified supernatant was added and incubated for 2 h at 4°C using an end-over-end tumbler to mix vigorously and allow the biotinylated proteins to bind to the NeutrAvidin Agarose slurry. Unbound proteins were removed by washing with 1% Non-idet-P40 and 0.1% SDS in 500 μL PBS thrice and then by washing with 0.1% Non-idet-P40 and 0.5 M NaCl in 500 μL PBS trice. Finally, the captured proteins were eluted from the biotin-NeutrAvidin Agarose and were collected by column centrifugation at 1,000 × g for 2 min. Three biological replicates were obtained for both cell lines. All protein concentrations were quantified using the BCA protein Assay Kit (Pierce) and the lysates were stored at −20°C for further analysis. Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) Equal amount of proteins was digested overnight at 37°C by trypsin (Promega, Madison, WI, United States) using the FASP approach. Briefly, 30 μg membrane proteins were used and three biological repeats of each cell line were prepared. Equal amount of peptide was injected into Easy-nLC 1,000 m (Thermo Scientific) coupled with a Q-Exactive mass spectrometer (Thermo Fisher Scientific) ([66]Michalski et al., 2011; [67]Kelstrup et al., 2012). Peptides were eluted to analytical column (75 μm × 15 cm) packed with Jupiter Proteo resin (3 μm, C18, 300 Å, Phenomenex, Torrance, CA, United States). The mobile phase consisted of buffer A (2% acetonitrile and 0.1% formic acid in water) and buffer B (0.1% formic acid in 95% acetonitrile). A flow rate of 250 nL/min and 60 min of the gradient from 12% B to 32% B was applied for the separation of peptides. MS Scan range was from 300 to 1,600 m/z with the resolution of 70,000. For MS/MS, scan range was from 200 to 2,000 m/z with the resolution of 17,500. We performed full MS scan in a positive mode and then selected the five most dominant icons from the initial MS scan for collision-induced dissociation. Protein Identification and Quantitation To identify proteins from the acquired data, MS/MS spectra were searched against the Human SwissProt database (548, 208 sequences) ([68]Jungo et al., 2012) using the MASCOT software (version 2.0) (Matrix Science, London, United Kingdom). SwissProt database is a high quality manually annotated and non-redundant protein sequence database, and now more and more bioinformatics data mining algorithms are designed using this database ([69]Le et al., 2017a,[70]2019a). The parameters for searching were the MASCOT defaults – enzyme of trypsin, two missed cleavage, fixed modifications of carbamidomethyl (C), and variable modifications of oxidation (M). We set mass tolerance to 20 ppm for MS precursors and 0.05 Da for fragment ions, and then peptide charges of +2, +3, and +4 were retained. For protein identification, we used p-value less than 0.05 and false discovery rate (FDR) less than 0.01 at the protein level as the criteria to distinguish two peptides. Label-free quantification was performed by intensity-based absolute quantification (iBAQ), which was based on at least two unique peptides to quantify the different protein profiling in the 95C cell and 95D cell membrane. Quantile normalization was performed to ensure that each sample had the same distribution, the two-fold change and p-value less than 0.01 cut-off was set up for the screening of differentially expressed proteins. Bioinformatics Analysis Protein Subcellular Localization Annotation and Transmembrane Domain Prediction The first step of the bioinformatics analysis aimed to identify membrane proteins. Here, we used the following process to annotate the membrane proteins. First, Gene Ontology (GO) cellular component annotation of all identified proteins was performed by the R go.db package. The GO is a human and machine readable gene annotation resource, which has been widely used to enable computational discovery in diverse areas such as protein function identification ([71]Le et al., 2017b,[72]2019b), text mining in life sciences ([73]Przybyla et al., 2016), and underlying molecular disease mechanisms ([74]Kramarz et al., 2020). Second, additional subcellular location information was downloaded from the UniProt database ([75]Jungo et al., 2012) and added to the protein subcellular localization annotation. Third, transmembrane domains in all identified membrane proteins were predicted by TMHMM^[76]1 Serve v.2.0 ([77]Krogh et al., 2001). This gave us a list of 3,240 membrane proteins which were utilized in the downstream analysis. Differential Expression Analysis and Enrichment Analysis In the second step of the bioinformatics analysis we searched for protein differential expression and enrichment. Utilizing the Student’s t-test, we tested the 3,240 membrane proteins for differential expression between poorly metastatic 95C and highly metastatic 95D cell lines. Proteins with a p-value <0.05 and |log[2]FC| > 1 were considered to be significantly differentially expressed and included in the reduced set (n = 163) for further analysis. GO and KEGG pathway enrichment analysis by the clusterprofiler R package ([78]Yu et al., 2012) was performed on the differentially expressed membrane proteins in order to understand which function they may affect. Gene set enrichment analysis (GSEA) were performed by the GSEA software v.3.0 ([79]Subramanian et al., 2005) using the molecular signatures database MSigDB ([80]Liberzon et al., 2011). At the end of this step we obtained a list of 163 differentially expressed proteins and 87 enriched genes. Multi-Omics Data – MicroRNA To integrate genomics data into our analysis, microRNA expression array data of 95D and 95C cells ([81]GSE47788) was downloaded from the Gene Expression Omnibus (GEO) database ([82]Edgar et al., 2002; [83]Wang X. M. et al., 2013). We directly used the differential expression results provided by the study. For the 64 differentially expressed microRNAs, we applied the miRWALK2.0 ([84]Dweep and Gretz, 2015) software to build a list of microRNAs and their gene targets. Generation of Biological Pathway Subnetwork Connecting MicroRNA and Enriched Genes Having obtained (1) the set of 87 enriched genes and (2) the set 64 of differentially expressed microRNA and their gene targets, we build a sub-network which significantly close-connects these genes and microRNAs. We used the Tied Diffusion through Interacting Events (TieDIE) software ([85]Paull et al., 2013; [86]Drake et al., 2016). The Multinet pathway database ([87]Brown et al., 2013) together with the validated microRNA-target pairs selected by miRWALK2.0 in the previous step served as the background network of the TieDIE program. 32 membrane protein genes and 180 linker genes were selected in this step. Cancer Hallmarks Enrichment Calculation The set of genes in the sub-network built with TieDIE in the previous step were used to perform cancer hallmark enrichment which is now a popular functional analysis method for a gene clustering ([88]Drake et al., 2016; [89]Ge et al., 2020). Cancer hallmark definitions were also downloaded from the MSigDB database, and the cancer hallmark pathway enrichment were performed by calculating the probability of overlap between input genes and the hallmark pathway gene sets and evaluated by hyper geometric test with [90]Benjamini and Hochberg (1997) correction of p-values. The source code for cancer hallmark enrichment analysis is publicly available at: [91]https://github.com/YankongSJTU/CHEA. Survival Analysis Integrating the above-obtained gene lists with survival data, we performed patient survival analysis to determine if the selected genes have impact on cancer related outcomes. We downloaded mRNA expression data (FPKM values) for the 32 membrane protein genes from the Human Protein Atlas^[92]2 ([93]Uhlen et al., 2010) of lung adenocarcinoma (LAC) and LUSC patients ([94]Tomczak et al., 2015), 925 samples in total. From the same source we also obtained patient OS data. For each gene, FPKM values from the 20th to 80th percentiles were used to group the patients; significant differences in the survival outcomes of the groups were examined and the value yielding the lowest log-rank p-value was set to be the best cut-off value. Patients were classified into two groups: group 1 with values above the cutoff (high expression level group) and group 2 with values below the cutoff (low expression level group). The outcome differences for each group were calculated using the Log-rank test by the Kaplan–Meier method ([95]Bland and Altman, 1998). Prognostic analysis was performed by using the R packages KMsurv, survival, and survminer ([96]Latouche and Aurelien, 2019). All p-values were derived from two-tailed statistical tests, and p-value <0.05 was considered as statistically significant. At the completion of the bioinformatics analysis, six significant gene biomarkers significantly correlated with OS were determined. We also constructed a prognostic risk index with these selected membrane proteins: [MATH: RiskIndex=expression⁢< /mo>(Pi )*HR imaximum(expression( Pi)) :MATH] (1) where P[i] (i = 1,2,3,…,K) means the selected K membrane proteins. Samples are grouped according to the risk factor levels, and the prognosis differences are compared. Experimental Verification The cellular expression of the six candidate proteins were verified experimentally via Western Blot analysis. Total cell lysates were obtained using RIPA buffer (Thermo Fisher Scientific, United States). Proteins were separated by SDS-polyacrylamide gel electrophoresis (SDS–PAGE) and transferred to polyvinylidene fluoride (PVDF) membrane (Millipore, Burlington, MA, United States). The membranes were blocked in PBS, 10% (w/v) skim milk for 1 h in phosphate buffer saline-Tween 20 (PBS-T), and incubated for 3 h at RT in 5% milk in PBS-T with primary antibodies: CDH2, EGFR, ITGA3, ITGA5, ITGB1, and CALR (Abcam, Cambridge, United Kingdom). Then, after washing, the PVDF membrane was incubated with secondary antibody (The Jackson Laboratory, United States) for 40 min at RT. Thermo Scientific SuperSignal West Pico PLUS Chemiluminescent Substrate kit (Thermo Fisher Scientific, United States) was used for visualization. Results and Discussion Protein Identification and Differential Expression Analysis We first sought to determine the full set of membrane proteins detected by mass spectrometry and then identify the subset of differentially expressed ones. A total of 3,241 unique proteins were identified and quantified. A total of 3,107 proteins (95.9%) were annotated by GO cellular component analysis and 2,887 proteins (89.1%) were annotated by using the UniProt subcellular localization database. Subcellular localization annotation analysis predicted that 1,762 proteins were membrane proteins (54.7%), whereas the TMHMM algorithm predicted that 590 proteins (18.3%) had a transmembrane domain. Significant differences between 95D and 95C cells were observed in the expression of 163 membrane proteins (|log[2]FC| > 1 and p-value < 0.05). We provide detailed subcellular localization annotations together with TMHMM results of all proteins in [97]Supplementary Table S1. All differential expression results are listed in [98]Supplementary Table S2. Functional Characterization of Differentially Expressed Membrane Proteins Differential expressed membrane proteins may play a key role in tumor metastasis. Membrane proteins as well as extra-cellular matrix (ECM) molecules, cell adhesion molecules and adhesion receptors form into functional complex units and maintain cell–cell adhesions. These complexes, once disassembled, will increase tumor metastasis and invasion ([99]Gumbiner, 1996; [100]Vadakekolathu et al., 2018). To further examine the mechanistic role in cancer metastasis of differentially expressed membrane proteins, we performed two-step functional enrichment analysis. First, GO and KEGG functional enrichment analysis revealed that differentially expressed membrane proteins were mainly centered on focal adhesion and cell-substrate adherens junctions, and many metabolic pathways. For example, the three most significantly enriched GO terms were GO:0005925∼focal adhesion (q-value = 7.2e−43), GO:0005924∼cell-substrate adherens junction (q-value = 7.2e−43), and cell-substrate junction (q-value = 1.2e−42), and the three most significantly enriched KEGG pathway terms were hs03010∼Ribosome (q-value = 1.07e−16), hsa05412∼Arrhythmogenic right ventricular cardiomyopathy (ARVC) (q-value = 1.65e−4), and hsa05416∼Viral myocarditis (q-value = 3.22e−3). All detailed enrichment results are shown in [101]Supplementary Table S4. In the second step of this analysis, GSEA revealed that differentially expressed membrane proteins were enriched in 13 GO categories (MSigDB c6 Gene Ontology or GO categories) including of GO_POSITIVE_REGULATION_OF_LOCOMOTION and GO_CELL_JUNCTION_ORGANIZATION. KEGG_FOCAL_ADHESION pathway was also enriched. Similar molecular terms, “KEGG_FOCAL_ ADHESION” and “REACTOME_HEMOSTASIS” were also identified using other functional gene sets (MSigDB c6 KEGG or Reactome categories) ([102]Table 1). Of all 87 enriched genes, 21 were considered as a core enrichment set, i.e., genes which were considered to contribute the most to the enrichment result According to the differential expression analysis, all core enrichment genes were significantly up-regulated, including ALCAM, LDOA, TP1A1, TP1B1, TP1B3, CDH2, CTNNB1, CXADR, EGFR, EPHA2, GLG1, ITGA3, ITGA5, ITGB1, ITGB3, JAM3, MPZL1, NEGR1, PARK7, PTPRF, and SLC16A3 (see [103]Supplementary Table 5). TABLE 1. Gene set enrichment result of differentially expressed membrane proteins. GS(follow, link, to, MSigDB) Size ES NES NOM, p-value FDR, q-value GO_POSITIVE_REGULATION_OF_DEVELOPMENTAL_PROCESS 17 0.49060193 2.2359705 0 0.044999983 GO_POSITIVE_REGULATION_OF_LOCOMOTION 15 0.52440554 2.2129607 0 0.044999983 GO_CELL_PROJECTION_PART 18 0.46777532 2.201335 0 0.044999994 GO_PLASMA_MEMBRANE_REGION 24 0.5549008 2.1945791 0 0.044999983 GO_CELLULAR_COMPONENT_MORPHOGENESIS 16 0.51833624 2.1498182 0 0.044999994 KEGG_FOCAL_ADHESION 17 0.45756185 1.9556208 0 0.044999994 REACTOME_HEMOSTASIS 26 0.38783315 1.9427094 0 0.045000017 GO_CELL_JUNCTION_ORGANIZATION 16 0.48022297 1.9331897 0 0.044999983 GO_REGULATION_OF_CELL_DIFFERENTIATION 18 0.49768242 1.9189458 0 0.044999994 GO_MEMBRANE_MICRODOMAIN 15 0.5177578 1.9009451 0 0.044999994 GO_TISSUE_DEVELOPMENT 32 0.38096464 1.8797828 0 0.045 GO_ANATOMICAL_STRUCTURE_FORMATION_INVOLVED_IN_MORPHOGENESIS 18 0.3993144 1.8518116 0 0.044999994 GO_REGULATION_OF_PHOSPHORUS_METABOLIC_PROCESS 23 0.42798108 1.8506685 0 0.045000024 GO_REGULATION_OF_PROTEIN_MODIFICATION_PROCESS 21 0.46240425 1.8467267 0 0.045000017 GO_ENZYME_LINKED_RECEPTOR_PROTEIN_SIGNALING_PATHWAY 15 0.5241594 1.8411065 0 0.04499999 GO_RECEPTOR_BINDING 24 0.39534718 1.8257784 0 0.044999983 GO_MEMBRANE_REGION 31 0.420572 1.8051884 0 0.051817805 GO_PROTEIN_COMPLEX_BINDING 28 0.36254478 1.787877 0 0.051439002 GO_REGULATION_OF_CELLULAR_COMPONENT_MOVEMENT 24 0.39252102 1.7859758 0 0.05110011 [104]Open in a new tab ES, NES, and FDR are enrichment score, normalized ES, and false discovery rate, respectively. MicroRNAs and Significant Sub-Network Derived From TieDIE Program MicroRNAs are non-coding RNAs which participate in cellular activity by regulating target genes. Increasing number of studies have reported that microRNAs are frequently differentially expressed in numerous types of human cancer and play an important role in the progression and development of NSCLC ([105]Inamura and Ishikawa, 2016; [106]Berrout et al., 2017; [107]Chang et al., 2017; [108]Zhang et al., 2017; [109]Iqbal et al., 2018). Although microRNAs act only in the cell where they are synthesized, they can also influence the functions of neighboring cells or play a role in the tumor micro-environment by modulating the ECM state ([110]Rutnam et al., 2013). Utilizing the TieDIE program (a pathway-based multi-omics method which extends on the heat diffusion strategies and uses a network diffusion approach to connect proteins and genes related to diseases) we were able to link differentially expressed microRNAs with membrane proteins. First, 64 differentially expressed microRNAs were obtained directly from a previous study ([111]Supplementary Table S2). We successfully found 300 targets of 64 microRNAs with the help of MIRWALK 2.0 and synthesized a validated microRNA-regulated target network ([112]Supplementary Table S3) which served as background database together with the Multinet database ([113]Khurana et al., 2013). The input membrane proteins and microRNAs were found to be significantly close (p < 0.001) in a pathway space based on a background model generated by 1,000 permutations of the data ([114]Paull et al., 2013), where each input set (membrane proteins, microRNAs) was swapped with genes of similar network connectivity while the other one was fixed ([115]Figure 2A). FIGURE 2. [116]FIGURE 2 [117]Open in a new tab Characterization of the TieDIE network. (A) The distribution of background scores and real score. The distribution of background scores is shown as blue bars, while the green line represents the real score. (B) Compact subnetwork after manually deleting linker proteins whose degree was one from the raw subnetwork constructed by the TieDIE program. The rhombus nodes represent for microRNAs, the circle nodes represent for linker proteins, while the square nodes represent differentially expressed membrane proteins. The node color changes according to the fold change values. (C) PPI subnetwork of 32 differentially expressed membrane proteins. (D) Wheel plot of cancer hallmark enrichment of the TieDIE subnetwork. We selected a compact sub-network with a high level of specificity, which consisted of 216 nodes – four microRNAs (hsa-miR-137, hsa-miR-483-5p, hsa-miR-638, and hsa-miR-127-3p), 32 differentially expressed membrane proteins (HSPA5, CANX, TJP1, FN1, ITGA3, ITGA4, ITGA5, XPO1, JAM3, F11R, SDC1, FLNB, DCTN2, VDAC2, ALCAM, RAB10, SRC, LDLR, CDH2, EGFR, CTNNB1, ITGB1, ITGB3, CD44, CD47, HGS, ARHGEF7, HGF, LMNA, CALR, PTPRF, and GNA13) and 180 linking proteins connected by 244 edges ([118]Supplementary Table S6). We manually deleted those linker proteins whose degree was 1 and constructed a sub-network consisting of four microRNAs, 32 membrane proteins, and 38 linker proteins with 101 edges ([119]Figure 2B). Focusing on differentially expressed membrane proteins, we found 32 membrane proteins which were connected through protein–protein interactions (PPI) directly ([120]Figure 2C). Regarding the four microRNAs in the sub-network (hsa-miR-137, hsa-miR-483-5p, hsa-miR-638, and hsa-miR-127-3p) – only hsa-miR-137 was up-regulated in the 95D cell line and the other three were down-regulated in the 95D cell lines as comparing with the 95C cell lines. We also found that the four sub-network microRNAs were not independent of each other but connected by at least one linker protein. The three down-regulated microRNAs were previously reported to be associated with tumor metastasis in recent studies, and to influence the expression of both XPO1 and ALCAM ([121]Ma et al., 2014; [122]Herr et al., 2017; [123]Shi et al., 2018; [124]Yue et al., 2018). We focused on has-miR-137, which was up-regulated in the high-metastatic (95D) cell line and increases invasion and metastasis of NSCLC cells ([125]Chang et al., 2017) Four up-regulated membrane proteins, HGF, CTNNB1, ITGB1, and ITGA4, involved in focal adhesion pathway, were linked to has-miR-137 by two mediation genes – PXN and MET ([126]Figure 2B). Paxillin (PXN), whose expression was negatively correlated with has-miR-137 ([127]Jiang et al., 2018), encodes a focal adhesion-associated protein and plays an important role in signal transduction, regulation of migration, proliferation and apoptosis. MET encodes tyrosine-protein kinase Met (c-Met) which possesses tyrosine kinase activity and is a well-characterized driver of oncogenesis occurs in multiple cancers include of NSCLC ([128]Gentile et al., 2008). In this study, we found that MET and PXN, which are regulated by has-miR-137, may affect membrane proteins through PPI. Although proto-oncogene tyrosine-protein kinase SRC was found to be down regulated in the high-metastasis (95D) cell line in this study, many other research indicated SRC was highly expressed in NSCLC ([129]Giaccone and Zucali, 2008; [130]Rothschild et al., 2010). Cancer Hallmark Pathway Enrichment We show that our TieDIE sub-network is significantly close in the pathway space. Considering the cancer hallmark pathway enrichment of all proteins involved in the TieDIE sub-network (32 differentially expressed membrane proteins and 180 linker proteins), we found the five main cancer hallmark pathway categories were significantly enriched, including of cell cycle pathway category (p-value = 0.0154), inflammatory response pathway category (p-value = 0.0105), metabolism pathway category (p-value = 0.0196), migration and invasion pathway category (p-value = 5.8380e−06), and PI3K/AKT mTOR pathway category (p-value = 0.0292) ([131]Table 2). TABLE 2. Cancer hallmark pathway enrichment result of all proteins involved in the TieDIE sub-network. Hallmark main category Hallmark sub-categories p-value Enrichment ratio Gene number Cell cycle E2F Targets, G2M Checkpoint, p53 Pathway, Mitotic Spindle, Apoptosis 0.015443819 1.35328 29 DNA repair PI3K/AKT/mTOR signaling, MTORC1 signaling 0.914616875 0.584921 7 Inflammatory response IL2/STAT5 signaling, IL6/JAK/STAT3 signaling, inflammatory response, interferon alpha response, interferon gamma response, TNF alpha signaling via NFKB 0.010537424 1.56327 19 Metabolism Hedgehog signaling, Myc targets, notch signaling, TGF beta signaling, WNT beta-catenin signaling 0.019642139 1.4941 18 Migration and invasion Apical junction, epithelial/mesenchymal transition 5.84E−06 2.35103 24 Nuclear receptor signaling DNA repair, UV response 0.199991976 1.16037 12 PI3K AKT mTOR cholesterol homeostasis, fatty acid metabolism, glycolysis, reactive oxygen species pathway 0.029191187 1.58529 12 Stemness Angiogenesis, hypoxia 0.302419603 1.06367 11 Tumor microenvironment Androgen response, estrogen response early and late (merged to become nuclear receptor response) 0.401740261 0.966142 6 [132]Open in a new tab In addition, 10 cancer related hallmark pathways were also significantly enriched, including apoptosis pathway (p-value = 0.0221, cholesterol homeostasis pathway (p-value = 0.04152), epithelial/mesenchymal transition pathway (p-value = 0.0048), estrogen response early and late pathway (p-value = 0.04203), G2M checkpoint pathway (p-value = 0.0322), glycolysis pathway (p-value = 0.0048), Il6/Jak/Stat3 signaling pathway (p-value = 0.0067), mitotic spindle pathway (p-value = 0.0130), Mtorc1 signaling pathway (p-value = 0.0131) and TGF beta signaling pathway (p-value = 0.0500). Hallmark “wheels” were colored proportionally to the negative log transformed p-values returned by the hypergeometric test ([133]Supplementary Table S7) ([134]Figure 2D). These results further demonstrate the importance of our selected membrane proteins. Prognostic Value of Differentially Expressed Membrane Proteins in TieDIE Sub-Network (Overall Survival) The OS prognostic value of the 32 membrane proteins in the TieDIE sub-network was evaluated by performing log-rank test in the TCGA lung cancer cohort ([135]Supplementary Table S8). Based on the best cut-off of each gene, Kaplan–Meier (KM) curves were generated for the high expression level group and the low expression level group. The expression group definition is described in section “Survival Analysis.” High expression of CDH2 (HR = 1.2874, 95% CI: 1.0425–1.5898, p = 0.0279), EGFR (HR = 1.2927, 95% CI: 1.0496–1.5921, p = 0.0166), ITGA3 (HR = 1.2965, 95% CI: 1.0202–1.6477, p = 0.0379), ITGB1 (HR = 1.5930, 95% CI: 1.2906–1.9663, p = 2.1922e−05), and ITGA5 (HR = 1.4656, 95% CI: 1.1836–1.8148, p = 5.8933e−04) was negatively associated with OS in NSCLC patients. Low expression of CALR (HR = 0.7930, 95% CI: 0.6454–0.9744, p = 0.0279) was associated with worse OS for NSCLC patients ([136]Figure 3). According to the differential expression analysis in this work; CDH2, EGFR, ITGA3, ITGA5, and ITGB1 were all highly expressed in the high- metastatic cell lines while CALR has low expression (see [137]Supplementary Figure S1). This is consistent with the widely accepted fact regarding NSCLC – that patients with metastasis have a poor prognosis. FIGURE 3. [138]FIGURE 3 [139]Open in a new tab Kaplan-Meier curves of overall survival for CHE2, EGFR, ITGA3, ITGA5, ITGB1, and CALR. A Risk Index was computed for each sample by applying the formula described in the section “Materials and Methods.” When comparing the prognosis differences between the high and low risk factor groups, we found that the high risk group showed poor prognosis (p-value < 1e−05) (see [140]Supplementary Figure S2). However, not all prognosis-associated genes match this pattern. We also found five other proteins – CD47, FN1, VDAC2, HGF, and ITGA4, which showed significant correlation with OS of NSCLC patients, however, the direction of the correlation was non-intuitive. HGF, ITGA4, and CD47 were also associated with increase in OS and we can observe that patients may live longer when these genes are highly expressed. However, high expression of HGF, ITGA4, and CD47 is observed in 95D cell lines (high metastasis cell lines) as compared with low metastasis cell lines. Finally, FN1 and VDAC2 are down regulated in the 95D cell lines but lower expression level of these proteins will result in longer OS ([141]Supplementary Figure S1 – KM curve). A potential mechanistic explanation is due to inconsistent expression of protein level and gene level ([142]Vogel and Marcotte, 2013). Validation of Altered Protein Expression To validate the quantitative differences observed by mass spectrometry and bioinformatics analysis, the expression levels of six proteins were verified in the two (95C and 95D) cell lines through Western blot. The experimental results confirmed our prediction ([143]Figure 4). Calreticulin (CALR gene production) showed low expression levels in high-metastasis cell lines (95D), while the other five proteins (CDH2, EGFR, ITGA3, ITGB1, and ITGA5) were highly expressed. The high expression of CDH2, EGFR, ITGA3, ITGA5, and ITGB1 and the low expression of CALR in lung cancer have been validated by experimental techniques (Western blot and/or immunohistochemistry) in previous studies ([144]Boelens et al., 2007; [145]Grinberg-rashi et al., 2009; [146]Wang H. et al., 2013; [147]Liu et al., 2015; [148]Wu et al., 2016; [149]Du et al., 2017). Combined with the results of this study, we can infer that from the onset of lung cancer, the high expression level of the five proteins (CDH2, EGFR, ITGA3, ITGB1, and ITGA5) and the low expression of ACALR may serve as biomarkers to determine whether the tumor has a high metastasis potential. FIGURE 4. [150]FIGURE 4 [151]Open in a new tab Western blot verification results. (A) Western blot assays of the protein level of CDH2, EGFR, ITGA3, TIGB1, TIGA5, and CALR. (B) The gray levels of Western blotting are shown by bar graph. Besides biochemical validation of the six proteins (CDH2, EGFR, ITGA3, ITGB1, and ITGA5), we also performed literature verification. N-cadherin (CDH2), which is a member of the cadherin family and is involved in EMT and cancer metastasis ([152]Hazan et al., 2004), has been reported as being highly expressed in LAC tissues. It was further revealed that LAC migration and invasion are suppressed after knocking down CDH2 ([153]Zhao et al., 2013). Regarding epidermal growth factor receptor (EGFR), its gene amplification was shown as significantly increased in tumor cells and it was closely related to metastasis and TNM stage ([154]Jia et al., 2015). Previous studies have verified the prognostic value of ITGA3, ITGA5, and ITGB1 expression on relapse and metastasis in lung cancer ([155]Zheng et al., 2016). In addition, CALR was shown to be an independent prognostic factor for lung cancer ([156]Liu et al., 2012), and the reduction in the expression of CALR was associated with an increased rate of proliferation ([157]Bergner et al., 2009). Conclusion The high metastatic status of giant cell lung cancer is strongly associated with the abnormal expression of membrane proteins, and microRNAs play a key role in regulation of expression. From this study, we conclude that the high expression of has-miR-137 and its indirect targets-CDH2, EGFT, ITGA3, ITGB1, ITGA5 and the low expression of CALR serve as markers of high-metastasis status of giant cell lung cancer. Our study provides a new approach to the analysis of integrated proteome and microRNAs and the synthesized sub-network provides candidate targets for giant-cell lung metastasis treatment. Data Availability Statement The mass spectrometry proteomics data generated for this study was deposited into the ProteomeXchange Consortium via the PRIDE ([158]Perez-Riverol et al., 2019) ([159]https://www.ebi.ac.uk/pride/) partner repository with the dataset identifier [160]http://www.ebi.ac.uk/pride/archive/projects/PXD016912. Author Contributions YK participated in the data acquisition, performed the statistical analysis, and drafted the manuscript. ZQ and YR conceived of the study and participated in its design. GG contributed to the data interpretation and manuscript writing. MG conducted western-blot experiment and summarized results. All authors read and approved the final manuscript. All aspects of the study were supervised by HX, HL, and HZ. Conflict of Interest The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Funding. This work is supported by National Key R&D Program of China 2018YFC0910500, SJTU-Yale Collaborative Research Seed Fund, and the Neil Shen’s SJTU Medical Research Fund. ^1 [161]http://www.cbs.dtu.dk/services/TMHMM/ ^2 [162]https://www.proteinatlas.org Supplementary Material The Supplementary Material for this article can be found online at: [163]https://www.frontiersin.org/articles/10.3389/fgene.2020.01023/full #supplementary-material FIGURE S1 The KM curve of 5 genes on TCGA lung cancer cohort. [164]Click here for additional data file.^ (1.3MB, TIF) FIGURE S2 The KM curve of risk index on TCGA lung cancer cohort. [165]Click here for additional data file.^ (666.5KB, TIF) TABLE S1 Identification results of membrane proteins. [166]Click here for additional data file.^ (3.9MB, XLS) TABLE S2 Different expression analysis results between high and low metastatic cell lines. [167]Click here for additional data file.^ (3.9MB, XLS) TABLE S3 Validated microRNA-regulated targets network. [168]Click here for additional data file.^ (1.4MB, XLS) TABLE S4 GO and KEGG pathway the enrichment results. [169]Click here for additional data file.^ (254.5KB, XLS) TABLE S5 Gene set enrichment analysis results. [170]Click here for additional data file.^ (122.5KB, XLS) TABLE S6 Compact subnetwork predicted by TieDIE program. [171]Click here for additional data file.^ (43KB, XLS) TABLE S7 cancer hallmark pathway enrichment analysis of genes involved in subnetwork. [172]Click here for additional data file.^ (112KB, XLS) TABLE S8 Raw clinical data and 32 membrane protein expression data on TCGA lung cancer cohort. [173]Click here for additional data file.^ (425.5KB, XLS) References