Abstract

   Lung cancer is one of the most common human cancers both in incidence
   and mortality, with prognosis particularly poor in metastatic cases.
   Metastasis in lung cancer is a multifarious process driven by a complex
   regulatory landscape involving many mechanisms, genes, and proteins.
   Membrane proteins play a crucial role in the metastatic journey both
   inside tumor cells and the extra-cellular matrix and are a viable area
   of research focus with the potential to uncover biomarkers and drug
   targets. In this work we performed membrane proteome analysis of highly
   and poorly metastatic lung cells which integrated genomic, proteomic,
   and transcriptional data. A total of 1,762 membrane proteins were
   identified, and within this set, there were 163 proteins with
   significant changes between the two cell lines. We applied the Tied
   Diffusion through Interacting Events method to integrate the
   differentially expressed disease-related microRNAs and functionally
   dys-regulated membrane protein information to further explore the role
   of key membrane proteins and microRNAs in multi-omics context.
   Has-miR-137 was revealed as a key gene involved in the activity of
   membrane proteins by targeting MET and PXN, affecting membrane proteins
   through protein–protein interaction mechanism. Furthermore, we found
   that the membrane proteins CDH2, EGFR, ITGA3, ITGA5, ITGB1, and CALR
   may have significant effect on cancer prognosis and outcomes, which
   were further validated in vitro. Our study provides multi-omics-based
   network method of integrating microRNAs and membrane proteome
   information, and uncovers a differential molecular signatures of highly
   and poorly metastatic lung cancer cells; these molecules may serve as
   potential targets for giant-cell lung metastasis treatment and
   prognosis.

   Keywords: membrane proteome, lung cancer metastasis, multi-omics
   analysis, microRNA, prognostic

Introduction

   Lung cancer remains the leading cause of cancer-related mortality in
   the world, with an overall 5-year survival rate of 18% ([37]Siegel et
   al., 2017). In 2016, over 155,000 people died from lung cancer in the
   United States alone ([38]Siegel et al., 2017). These low survival rates
   are partly due to the fact that over 50% of patients are diagnosed at a
   later stage, for which the 5-year survival is only 4%. Approximately
   80–85% of lung cancer cases are non-small cell lung cancer (NSCLC), and
   the remaining 15–20% are small-cell lung cancer cases ([39]Giaccone and
   Zucali, 2008). In NSCLC, it is estimated that over 40% of patients have
   metastases at the time of diagnosis ([40]Waqar et al., 2018). The
   prognosis is poor in metastatic cases -only ∼1% of such NSCLC patients
   will survive five or more years ([41]Borghaei et al., 2017).

   Cancer metastasis involves tumor cell invasion across interstitial
   tissues and basement membranes ([42]Jiang et al., 2015; [43]Qiao et
   al., 2019). Abnormal expression of membrane proteins in cancer tissues
   and cells has been shown to play a key role in cancer occurrence and
   metastasis ([44]Lethlarsen et al., 2009; [45]Kampen, 2011). In a recent
   study of hepatocellular carcinoma (HCC), golgi membrane protein 1
   (GOLM1) was shown as a key target of miR-382, HCC cells metastasis
   status was inhibited when GOLM1 is down-regulated in HCC cells
   ([46]Zhang et al., 2018). N-cadherin (CDH2), a direct target of
   miR-145, is a cell-cell adhesion molecule that contributes to the
   invasive/metastatic phenotype in many cancers such as gastric cancer,
   breast cancer, and lung cancer ([47]Lei et al., 2017; [48]Mo et al.,
   2017; [49]Ye et al., 2018). β-catenin (CTNNB1), involved in the
   regulation of cell adhesion, promote ovarian cancer metastasis and
   liver cancer ([50]Arend et al., 2013; [51]Ding et al., 2014). Mucin
   1(MUC1) protein, a membrane-tethered mucin glycoprotein, is also
   associated with poor prognosis and enhanced metastasis in human
   pancreatic cancers ([52]Wu et al., 2018). In addition, microRNAs have
   important roles in cancer metastasis ([53]Nicoloso et al., 2009), and
   multiple microRNAs, such as hsa-miR-1, hsa-miR-217, hsa-miR-206, and
   has-miR-577 were previously shown to play key roles in cancer
   metastasis ([54]Liu et al., 2015; [55]Chen et al., 2017; [56]Samaeekia
   et al., 2017).

   The complex regulatory landscape of cancer metastasis underscores the
   need of integrative approaches in cancer research. Multi-omics
   computational studies are an active area of investigation and perform
   analysis of genomic, proteomic, and transcriptional data combined with
   prior knowledge of regulatory relationships to uncover clinically
   relevant discovery such as biomarkers, therapeutically targets, and
   outcome prediction ([57]Liu et al., 2017; [58]Jiang et al., 2019;
   [59]Xu et al., 2020). One such recently proposed method – the Tied
   Diffusion Through Interacting Events (TieDIE) algorithm uses a network
   diffusion approach to connect genomic perturbations to transcriptional
   changes ([60]Paull et al., 2013). With the help of the TieDIE
   algorithm, a contributing factor (small GTPase RHEB) to the differences
   observed between BRAF and RAS mutants was discovered; in another cancer
   study, researchers combined transcriptional regulators, mutated genes,
   and differentially expressed kinases with TieDIE and synthesized a
   robust signaling network which consists of drug-able kinase pathways
   ([61]Drake et al., 2016).

   Although both membrane proteome and microRNA have been shown of great
   importance previously, there was no systematic combined analysis using
   membrane proteome together with microRNA data on lung squamous cell
   carcinoma (LUSC). In this study, we utilized quantitative membrane
   proteome and microRNA expression together with multiple regulation
   networks to perform comparative analysis of highly and poorly
   metastatic lung cancer cell lines (95C and 95D). In the following, we
   described the methodology used for experiment and computational
   analysis; differentially expressed membrane proteins are identified,
   then using joint analysis method to integrate microRNA expression data.
   Finally the significance of the study is discussed.

Materials and Methods

   The methods utilized in the work aim to discover biomarkers which are
   associated with disease outcomes measured by overall survival (OS). We
   integrated experimental and computational approaches, proteomics and
   genomics (microRNA) data, and bioinformatics analysis to drive
   bio-medical discovery. The overall work-flow is depicted in [62]Figure
   1.

FIGURE 1.

   [63]FIGURE 1
   [64]Open in a new tab

   Overall workflow of multi-omics integrative analysis.

Experimental Data Collection – Protein Identification and Quantification

Cell Culture and Membrane Protein Preparation

   Human giant cell lung cancer cell lines of poorly (95C) and highly
   (95D) metastatic potential were purchased from the Institute of
   Biochemistry and Cell Biology of the Chinese Academy of Sciences
   (Shanghai, China). Cells were cultured in DMEM and RPMI 1640 medium
   supplemented with 10% fetal bovine serum for 95D and 95C cells
   respectively at 37°C and 5% CO[2] incubator. All culture media were
   supplemented with 100 U/mL penicillin and 100 mg/mL streptomycin
   sulfate. All cultured cells were tested for mycoplasma contamination
   before use. Membrane proteins isolation was performed with the Pierce^®
   Cell Surface Protein Isolation Kit (Pierce, Thermo Fisher Scientific,
   United States), and followed the protocol described by [65]de Wit et
   al. (2012). All reagents were cooled to 4°C before protein
   biotinylation. The cells were washed twice with ice-cold phosphate
   buffered saline (PBS) followed by incubation with 0.25 mg/mL
   Sulfo-NHS-SS-biotin (Pierce) in 48 mL ice-cold PBS per flask on orbital
   shaker for 30 min at 4°C. Then, 500 μL of quenching solution were added
   to each flask to quench the reaction. After being washed with ice-cold
   PBS, harvested by gentle scraping, and pelleted by centrifugation, the
   cells were lysed using the Lysis Buffer (Pierce) which was added with
   protease inhibitors for 30 min on ice with vortexing every 5 min for 5
   s. The cell lysates were centrifuged at 10,000 × g for 2 min at 4°C to
   remove cell remnants. Before clarified supernatant was used to purify
   biotinylated proteins on NeutrAvidin Agarose (Pierce), 500 μL of
   NeutrAvidin Agarose slurry was added and centrifuged 1 min at 1,000 × g
   and the flow-through was discarded followed by washing with Pierce Wash
   Buffer in a provided column (Pierce) trice. The clarified supernatant
   was added and incubated for 2 h at 4°C using an end-over-end tumbler to
   mix vigorously and allow the biotinylated proteins to bind to the
   NeutrAvidin Agarose slurry. Unbound proteins were removed by washing
   with 1% Non-idet-P40 and 0.1% SDS in 500 μL PBS thrice and then by
   washing with 0.1% Non-idet-P40 and 0.5 M NaCl in 500 μL PBS trice.
   Finally, the captured proteins were eluted from the biotin-NeutrAvidin
   Agarose and were collected by column centrifugation at 1,000 × g for 2
   min. Three biological replicates were obtained for both cell lines. All
   protein concentrations were quantified using the BCA protein Assay Kit
   (Pierce) and the lysates were stored at −20°C for further analysis.

Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS)

   Equal amount of proteins was digested overnight at 37°C by trypsin
   (Promega, Madison, WI, United States) using the FASP approach. Briefly,
   30 μg membrane proteins were used and three biological repeats of each
   cell line were prepared. Equal amount of peptide was injected into
   Easy-nLC 1,000 m (Thermo Scientific) coupled with a Q-Exactive mass
   spectrometer (Thermo Fisher Scientific) ([66]Michalski et al., 2011;
   [67]Kelstrup et al., 2012). Peptides were eluted to analytical column
   (75 μm × 15 cm) packed with Jupiter Proteo resin (3 μm, C18, 300 Å,
   Phenomenex, Torrance, CA, United States). The mobile phase consisted of
   buffer A (2% acetonitrile and 0.1% formic acid in water) and buffer B
   (0.1% formic acid in 95% acetonitrile). A flow rate of 250 nL/min and
   60 min of the gradient from 12% B to 32% B was applied for the
   separation of peptides. MS Scan range was from 300 to 1,600 m/z with
   the resolution of 70,000. For MS/MS, scan range was from 200 to 2,000
   m/z with the resolution of 17,500. We performed full MS scan in a
   positive mode and then selected the five most dominant icons from the
   initial MS scan for collision-induced dissociation.

Protein Identification and Quantitation

   To identify proteins from the acquired data, MS/MS spectra were
   searched against the Human SwissProt database (548, 208 sequences)
   ([68]Jungo et al., 2012) using the MASCOT software (version 2.0)
   (Matrix Science, London, United Kingdom). SwissProt database is a high
   quality manually annotated and non-redundant protein sequence database,
   and now more and more bioinformatics data mining algorithms are
   designed using this database ([69]Le et al., 2017a,[70]2019a).

   The parameters for searching were the MASCOT defaults – enzyme of
   trypsin, two missed cleavage, fixed modifications of carbamidomethyl
   (C), and variable modifications of oxidation (M). We set mass tolerance
   to 20 ppm for MS precursors and 0.05 Da for fragment ions, and then
   peptide charges of +2, +3, and +4 were retained. For protein
   identification, we used p-value less than 0.05 and false discovery rate
   (FDR) less than 0.01 at the protein level as the criteria to
   distinguish two peptides. Label-free quantification was performed by
   intensity-based absolute quantification (iBAQ), which was based on at
   least two unique peptides to quantify the different protein profiling
   in the 95C cell and 95D cell membrane. Quantile normalization was
   performed to ensure that each sample had the same distribution, the
   two-fold change and p-value less than 0.01 cut-off was set up for the
   screening of differentially expressed proteins.

Bioinformatics Analysis

Protein Subcellular Localization Annotation and Transmembrane Domain
Prediction

   The first step of the bioinformatics analysis aimed to identify
   membrane proteins. Here, we used the following process to annotate the
   membrane proteins. First, Gene Ontology (GO) cellular component
   annotation of all identified proteins was performed by the R go.db
   package. The GO is a human and machine readable gene annotation
   resource, which has been widely used to enable computational discovery
   in diverse areas such as protein function identification ([71]Le et
   al., 2017b,[72]2019b), text mining in life sciences ([73]Przybyla et
   al., 2016), and underlying molecular disease mechanisms ([74]Kramarz et
   al., 2020). Second, additional subcellular location information was
   downloaded from the UniProt database ([75]Jungo et al., 2012) and added
   to the protein subcellular localization annotation. Third,
   transmembrane domains in all identified membrane proteins were
   predicted by TMHMM^[76]1 Serve v.2.0 ([77]Krogh et al., 2001). This
   gave us a list of 3,240 membrane proteins which were utilized in the
   downstream analysis.

Differential Expression Analysis and Enrichment Analysis

   In the second step of the bioinformatics analysis we searched for
   protein differential expression and enrichment. Utilizing the Student’s
   t-test, we tested the 3,240 membrane proteins for differential
   expression between poorly metastatic 95C and highly metastatic 95D cell
   lines. Proteins with a p-value <0.05 and |log[2]FC| > 1 were considered
   to be significantly differentially expressed and included in the
   reduced set (n = 163) for further analysis. GO and KEGG pathway
   enrichment analysis by the clusterprofiler R package ([78]Yu et al.,
   2012) was performed on the differentially expressed membrane proteins
   in order to understand which function they may affect. Gene set
   enrichment analysis (GSEA) were performed by the GSEA software v.3.0
   ([79]Subramanian et al., 2005) using the molecular signatures database
   MSigDB ([80]Liberzon et al., 2011). At the end of this step we obtained
   a list of 163 differentially expressed proteins and 87 enriched genes.

Multi-Omics Data – MicroRNA

   To integrate genomics data into our analysis, microRNA expression array
   data of 95D and 95C cells ([81]GSE47788) was downloaded from the Gene
   Expression Omnibus (GEO) database ([82]Edgar et al., 2002; [83]Wang X.
   M. et al., 2013). We directly used the differential expression results
   provided by the study. For the 64 differentially expressed microRNAs,
   we applied the miRWALK2.0 ([84]Dweep and Gretz, 2015) software to build
   a list of microRNAs and their gene targets.

Generation of Biological Pathway Subnetwork Connecting MicroRNA and Enriched
Genes

   Having obtained (1) the set of 87 enriched genes and (2) the set 64 of
   differentially expressed microRNA and their gene targets, we build a
   sub-network which significantly close-connects these genes and
   microRNAs. We used the Tied Diffusion through Interacting Events
   (TieDIE) software ([85]Paull et al., 2013; [86]Drake et al., 2016). The
   Multinet pathway database ([87]Brown et al., 2013) together with the
   validated microRNA-target pairs selected by miRWALK2.0 in the previous
   step served as the background network of the TieDIE program. 32
   membrane protein genes and 180 linker genes were selected in this step.

Cancer Hallmarks Enrichment Calculation

   The set of genes in the sub-network built with TieDIE in the previous
   step were used to perform cancer hallmark enrichment which is now a
   popular functional analysis method for a gene clustering ([88]Drake et
   al., 2016; [89]Ge et al., 2020). Cancer hallmark definitions were also
   downloaded from the MSigDB database, and the cancer hallmark pathway
   enrichment were performed by calculating the probability of overlap
   between input genes and the hallmark pathway gene sets and evaluated by
   hyper geometric test with [90]Benjamini and Hochberg (1997) correction
   of p-values. The source code for cancer hallmark enrichment analysis is
   publicly available at: [91]https://github.com/YankongSJTU/CHEA.

Survival Analysis

   Integrating the above-obtained gene lists with survival data, we
   performed patient survival analysis to determine if the selected genes
   have impact on cancer related outcomes. We downloaded mRNA expression
   data (FPKM values) for the 32 membrane protein genes from the Human
   Protein Atlas^[92]2 ([93]Uhlen et al., 2010) of lung adenocarcinoma
   (LAC) and LUSC patients ([94]Tomczak et al., 2015), 925 samples in
   total. From the same source we also obtained patient OS data. For each
   gene, FPKM values from the 20th to 80th percentiles were used to group
   the patients; significant differences in the survival outcomes of the
   groups were examined and the value yielding the lowest log-rank p-value
   was set to be the best cut-off value. Patients were classified into two
   groups: group 1 with values above the cutoff (high expression level
   group) and group 2 with values below the cutoff (low expression level
   group). The outcome differences for each group were calculated using
   the Log-rank test by the Kaplan–Meier method ([95]Bland and Altman,
   1998). Prognostic analysis was performed by using the R packages
   KMsurv, survival, and survminer ([96]Latouche and Aurelien, 2019). All
   p-values were derived from two-tailed statistical tests, and p-value
   <0.05 was considered as statistically significant. At the completion of
   the bioinformatics analysis, six significant gene biomarkers
   significantly correlated with OS were determined.

   We also constructed a prognostic risk index with these selected
   membrane proteins:
   [MATH: <mrow><mrow><mpadded
   width="+2.8pt"><mi>Risk</mi></mpadded><mo>⁢</mo><mi>Index</mi></mrow><m
   o>=</mo><mrow><mo
   movablelimits="false">∑</mo><mrow><mrow><mtext>expression</mtext><mo>⁢<
   /mo><mrow><mo>(</mo><msub><mtext>P</mtext><mrow><mtext>i</mtext></mrow>
   </msub><mo>)</mo></mrow></mrow><mo>*</mo><mfrac><msub><mtext>HR</mtext>
   <mrow><mtext>i</mtext></mrow></msub><mrow><mtext>maximum</mtext><mo>⁢</
   mo><mrow><mo>(</mo><mrow><mtext>expression</mtext><mo>⁢</mo><mrow><mo>(
   </mo><msub><mtext>P</mtext><mrow><mtext>i</mtext></mrow></msub><mo>)</m
   o></mrow></mrow><mo>)</mo></mrow></mrow></mfrac></mrow></mrow></mrow>
   :MATH]
   (1)

   where P[i] (i = 1,2,3,…,K) means the selected K membrane proteins.
   Samples are grouped according to the risk factor levels, and the
   prognosis differences are compared.

Experimental Verification

   The cellular expression of the six candidate proteins were verified
   experimentally via Western Blot analysis. Total cell lysates were
   obtained using RIPA buffer (Thermo Fisher Scientific, United States).
   Proteins were separated by SDS-polyacrylamide gel electrophoresis
   (SDS–PAGE) and transferred to polyvinylidene fluoride (PVDF) membrane
   (Millipore, Burlington, MA, United States). The membranes were blocked
   in PBS, 10% (w/v) skim milk for 1 h in phosphate buffer saline-Tween 20
   (PBS-T), and incubated for 3 h at RT in 5% milk in PBS-T with primary
   antibodies: CDH2, EGFR, ITGA3, ITGA5, ITGB1, and CALR (Abcam,
   Cambridge, United Kingdom). Then, after washing, the PVDF membrane was
   incubated with secondary antibody (The Jackson Laboratory, United
   States) for 40 min at RT. Thermo Scientific SuperSignal West Pico PLUS
   Chemiluminescent Substrate kit (Thermo Fisher Scientific, United
   States) was used for visualization.

Results and Discussion

Protein Identification and Differential Expression Analysis

   We first sought to determine the full set of membrane proteins detected
   by mass spectrometry and then identify the subset of differentially
   expressed ones. A total of 3,241 unique proteins were identified and
   quantified. A total of 3,107 proteins (95.9%) were annotated by GO
   cellular component analysis and 2,887 proteins (89.1%) were annotated
   by using the UniProt subcellular localization database. Subcellular
   localization annotation analysis predicted that 1,762 proteins were
   membrane proteins (54.7%), whereas the TMHMM algorithm predicted that
   590 proteins (18.3%) had a transmembrane domain. Significant
   differences between 95D and 95C cells were observed in the expression
   of 163 membrane proteins (|log[2]FC| > 1 and p-value < 0.05). We
   provide detailed subcellular localization annotations together with
   TMHMM results of all proteins in [97]Supplementary Table S1. All
   differential expression results are listed in [98]Supplementary Table
   S2.

Functional Characterization of Differentially Expressed Membrane Proteins

   Differential expressed membrane proteins may play a key role in tumor
   metastasis. Membrane proteins as well as extra-cellular matrix (ECM)
   molecules, cell adhesion molecules and adhesion receptors form into
   functional complex units and maintain cell–cell adhesions. These
   complexes, once disassembled, will increase tumor metastasis and
   invasion ([99]Gumbiner, 1996; [100]Vadakekolathu et al., 2018). To
   further examine the mechanistic role in cancer metastasis of
   differentially expressed membrane proteins, we performed two-step
   functional enrichment analysis.

   First, GO and KEGG functional enrichment analysis revealed that
   differentially expressed membrane proteins were mainly centered on
   focal adhesion and cell-substrate adherens junctions, and many
   metabolic pathways. For example, the three most significantly enriched
   GO terms were GO:0005925∼focal adhesion (q-value = 7.2e−43),
   GO:0005924∼cell-substrate adherens junction (q-value = 7.2e−43), and
   cell-substrate junction (q-value = 1.2e−42), and the three most
   significantly enriched KEGG pathway terms were hs03010∼Ribosome
   (q-value = 1.07e−16), hsa05412∼Arrhythmogenic right ventricular
   cardiomyopathy (ARVC) (q-value = 1.65e−4), and hsa05416∼Viral
   myocarditis (q-value = 3.22e−3). All detailed enrichment results are
   shown in [101]Supplementary Table S4.

   In the second step of this analysis, GSEA revealed that differentially
   expressed membrane proteins were enriched in 13 GO categories (MSigDB
   c6 Gene Ontology or GO categories) including of
   GO_POSITIVE_REGULATION_OF_LOCOMOTION and GO_CELL_JUNCTION_ORGANIZATION.
   KEGG_FOCAL_ADHESION pathway was also enriched. Similar molecular terms,
   “KEGG_FOCAL_ ADHESION” and “REACTOME_HEMOSTASIS” were also identified
   using other functional gene sets (MSigDB c6 KEGG or Reactome
   categories) ([102]Table 1). Of all 87 enriched genes, 21 were
   considered as a core enrichment set, i.e., genes which were considered
   to contribute the most to the enrichment result According to the
   differential expression analysis, all core enrichment genes were
   significantly up-regulated, including ALCAM, LDOA, TP1A1, TP1B1, TP1B3,
   CDH2, CTNNB1, CXADR, EGFR, EPHA2, GLG1, ITGA3, ITGA5, ITGB1, ITGB3,
   JAM3, MPZL1, NEGR1, PARK7, PTPRF, and SLC16A3 (see [103]Supplementary
   Table 5).

TABLE 1.

   Gene set enrichment result of differentially expressed membrane
   proteins.
   GS(follow, link, to, MSigDB) Size ES NES NOM, p-value FDR, q-value
   GO_POSITIVE_REGULATION_OF_DEVELOPMENTAL_PROCESS 17 0.49060193 2.2359705
   0 0.044999983
   GO_POSITIVE_REGULATION_OF_LOCOMOTION 15 0.52440554 2.2129607 0
   0.044999983
   GO_CELL_PROJECTION_PART 18 0.46777532 2.201335 0 0.044999994
   GO_PLASMA_MEMBRANE_REGION 24 0.5549008 2.1945791 0 0.044999983
   GO_CELLULAR_COMPONENT_MORPHOGENESIS 16 0.51833624 2.1498182 0
   0.044999994
   KEGG_FOCAL_ADHESION 17 0.45756185 1.9556208 0 0.044999994
   REACTOME_HEMOSTASIS 26 0.38783315 1.9427094 0 0.045000017
   GO_CELL_JUNCTION_ORGANIZATION 16 0.48022297 1.9331897 0 0.044999983
   GO_REGULATION_OF_CELL_DIFFERENTIATION 18 0.49768242 1.9189458 0
   0.044999994
   GO_MEMBRANE_MICRODOMAIN 15 0.5177578 1.9009451 0 0.044999994
   GO_TISSUE_DEVELOPMENT 32 0.38096464 1.8797828 0 0.045
   GO_ANATOMICAL_STRUCTURE_FORMATION_INVOLVED_IN_MORPHOGENESIS 18
   0.3993144 1.8518116 0 0.044999994
   GO_REGULATION_OF_PHOSPHORUS_METABOLIC_PROCESS 23 0.42798108 1.8506685 0
   0.045000024
   GO_REGULATION_OF_PROTEIN_MODIFICATION_PROCESS 21 0.46240425 1.8467267 0
   0.045000017
   GO_ENZYME_LINKED_RECEPTOR_PROTEIN_SIGNALING_PATHWAY 15 0.5241594
   1.8411065 0 0.04499999
   GO_RECEPTOR_BINDING 24 0.39534718 1.8257784 0 0.044999983
   GO_MEMBRANE_REGION 31 0.420572 1.8051884 0 0.051817805
   GO_PROTEIN_COMPLEX_BINDING 28 0.36254478 1.787877 0 0.051439002
   GO_REGULATION_OF_CELLULAR_COMPONENT_MOVEMENT 24 0.39252102 1.7859758 0
   0.05110011
   [104]Open in a new tab

   ES, NES, and FDR are enrichment score, normalized ES, and false
   discovery rate, respectively.

MicroRNAs and Significant Sub-Network Derived From TieDIE Program

   MicroRNAs are non-coding RNAs which participate in cellular activity by
   regulating target genes. Increasing number of studies have reported
   that microRNAs are frequently differentially expressed in numerous
   types of human cancer and play an important role in the progression and
   development of NSCLC ([105]Inamura and Ishikawa, 2016; [106]Berrout et
   al., 2017; [107]Chang et al., 2017; [108]Zhang et al., 2017; [109]Iqbal
   et al., 2018). Although microRNAs act only in the cell where they are
   synthesized, they can also influence the functions of neighboring cells
   or play a role in the tumor micro-environment by modulating the ECM
   state ([110]Rutnam et al., 2013).

   Utilizing the TieDIE program (a pathway-based multi-omics method which
   extends on the heat diffusion strategies and uses a network diffusion
   approach to connect proteins and genes related to diseases) we were
   able to link differentially expressed microRNAs with membrane proteins.

   First, 64 differentially expressed microRNAs were obtained directly
   from a previous study ([111]Supplementary Table S2). We successfully
   found 300 targets of 64 microRNAs with the help of MIRWALK 2.0 and
   synthesized a validated microRNA-regulated target network
   ([112]Supplementary Table S3) which served as background database
   together with the Multinet database ([113]Khurana et al., 2013). The
   input membrane proteins and microRNAs were found to be significantly
   close (p < 0.001) in a pathway space based on a background model
   generated by 1,000 permutations of the data ([114]Paull et al., 2013),
   where each input set (membrane proteins, microRNAs) was swapped with
   genes of similar network connectivity while the other one was fixed
   ([115]Figure 2A).

FIGURE 2.

   [116]FIGURE 2
   [117]Open in a new tab

   Characterization of the TieDIE network. (A) The distribution of
   background scores and real score. The distribution of background scores
   is shown as blue bars, while the green line represents the real score.
   (B) Compact subnetwork after manually deleting linker proteins whose
   degree was one from the raw subnetwork constructed by the TieDIE
   program. The rhombus nodes represent for microRNAs, the circle nodes
   represent for linker proteins, while the square nodes represent
   differentially expressed membrane proteins. The node color changes
   according to the fold change values. (C) PPI subnetwork of 32
   differentially expressed membrane proteins. (D) Wheel plot of cancer
   hallmark enrichment of the TieDIE subnetwork.

   We selected a compact sub-network with a high level of specificity,
   which consisted of 216 nodes – four microRNAs (hsa-miR-137,
   hsa-miR-483-5p, hsa-miR-638, and hsa-miR-127-3p), 32 differentially
   expressed membrane proteins (HSPA5, CANX, TJP1, FN1, ITGA3, ITGA4,
   ITGA5, XPO1, JAM3, F11R, SDC1, FLNB, DCTN2, VDAC2, ALCAM, RAB10, SRC,
   LDLR, CDH2, EGFR, CTNNB1, ITGB1, ITGB3, CD44, CD47, HGS, ARHGEF7, HGF,
   LMNA, CALR, PTPRF, and GNA13) and 180 linking proteins connected by 244
   edges ([118]Supplementary Table S6). We manually deleted those linker
   proteins whose degree was 1 and constructed a sub-network consisting of
   four microRNAs, 32 membrane proteins, and 38 linker proteins with 101
   edges ([119]Figure 2B). Focusing on differentially expressed membrane
   proteins, we found 32 membrane proteins which were connected through
   protein–protein interactions (PPI) directly ([120]Figure 2C). Regarding
   the four microRNAs in the sub-network (hsa-miR-137, hsa-miR-483-5p,
   hsa-miR-638, and hsa-miR-127-3p) – only hsa-miR-137 was up-regulated in
   the 95D cell line and the other three were down-regulated in the 95D
   cell lines as comparing with the 95C cell lines.

   We also found that the four sub-network microRNAs were not independent
   of each other but connected by at least one linker protein. The three
   down-regulated microRNAs were previously reported to be associated with
   tumor metastasis in recent studies, and to influence the expression of
   both XPO1 and ALCAM ([121]Ma et al., 2014; [122]Herr et al., 2017;
   [123]Shi et al., 2018; [124]Yue et al., 2018). We focused on
   has-miR-137, which was up-regulated in the high-metastatic (95D) cell
   line and increases invasion and metastasis of NSCLC cells ([125]Chang
   et al., 2017) Four up-regulated membrane proteins, HGF, CTNNB1, ITGB1,
   and ITGA4, involved in focal adhesion pathway, were linked to
   has-miR-137 by two mediation genes – PXN and MET ([126]Figure 2B).
   Paxillin (PXN), whose expression was negatively correlated with
   has-miR-137 ([127]Jiang et al., 2018), encodes a focal
   adhesion-associated protein and plays an important role in signal
   transduction, regulation of migration, proliferation and apoptosis. MET
   encodes tyrosine-protein kinase Met (c-Met) which possesses tyrosine
   kinase activity and is a well-characterized driver of oncogenesis
   occurs in multiple cancers include of NSCLC ([128]Gentile et al.,
   2008). In this study, we found that MET and PXN, which are regulated by
   has-miR-137, may affect membrane proteins through PPI. Although
   proto-oncogene tyrosine-protein kinase SRC was found to be down
   regulated in the high-metastasis (95D) cell line in this study, many
   other research indicated SRC was highly expressed in NSCLC
   ([129]Giaccone and Zucali, 2008; [130]Rothschild et al., 2010).

Cancer Hallmark Pathway Enrichment

   We show that our TieDIE sub-network is significantly close in the
   pathway space. Considering the cancer hallmark pathway enrichment of
   all proteins involved in the TieDIE sub-network (32 differentially
   expressed membrane proteins and 180 linker proteins), we found the five
   main cancer hallmark pathway categories were significantly enriched,
   including of cell cycle pathway category (p-value = 0.0154),
   inflammatory response pathway category (p-value = 0.0105), metabolism
   pathway category (p-value = 0.0196), migration and invasion pathway
   category (p-value = 5.8380e−06), and PI3K/AKT mTOR pathway category
   (p-value = 0.0292) ([131]Table 2).

TABLE 2.

   Cancer hallmark pathway enrichment result of all proteins involved in
   the TieDIE sub-network.
   Hallmark main category Hallmark sub-categories p-value Enrichment ratio
   Gene number
   Cell cycle E2F Targets, G2M Checkpoint, p53 Pathway, Mitotic Spindle,
   Apoptosis 0.015443819 1.35328 29
   DNA repair PI3K/AKT/mTOR signaling, MTORC1 signaling 0.914616875
   0.584921 7
   Inflammatory response IL2/STAT5 signaling, IL6/JAK/STAT3 signaling,
   inflammatory response, interferon alpha response, interferon gamma
   response, TNF alpha signaling via NFKB 0.010537424 1.56327 19
   Metabolism Hedgehog signaling, Myc targets, notch signaling, TGF beta
   signaling, WNT beta-catenin signaling 0.019642139 1.4941 18
   Migration and invasion Apical junction, epithelial/mesenchymal
   transition 5.84E−06 2.35103 24
   Nuclear receptor signaling DNA repair, UV response 0.199991976 1.16037
   12
   PI3K AKT mTOR cholesterol homeostasis, fatty acid metabolism,
   glycolysis, reactive oxygen species pathway 0.029191187 1.58529 12
   Stemness Angiogenesis, hypoxia 0.302419603 1.06367 11
   Tumor microenvironment Androgen response, estrogen response early and
   late (merged to become nuclear receptor response) 0.401740261 0.966142
   6
   [132]Open in a new tab

   In addition, 10 cancer related hallmark pathways were also
   significantly enriched, including apoptosis pathway (p-value = 0.0221,
   cholesterol homeostasis pathway (p-value = 0.04152),
   epithelial/mesenchymal transition pathway (p-value = 0.0048), estrogen
   response early and late pathway (p-value = 0.04203), G2M checkpoint
   pathway (p-value = 0.0322), glycolysis pathway (p-value = 0.0048),
   Il6/Jak/Stat3 signaling pathway (p-value = 0.0067), mitotic spindle
   pathway (p-value = 0.0130), Mtorc1 signaling pathway (p-value = 0.0131)
   and TGF beta signaling pathway (p-value = 0.0500). Hallmark “wheels”
   were colored proportionally to the negative log transformed p-values
   returned by the hypergeometric test ([133]Supplementary Table S7)
   ([134]Figure 2D). These results further demonstrate the importance of
   our selected membrane proteins.

Prognostic Value of Differentially Expressed Membrane Proteins in TieDIE
Sub-Network (Overall Survival)

   The OS prognostic value of the 32 membrane proteins in the TieDIE
   sub-network was evaluated by performing log-rank test in the TCGA lung
   cancer cohort ([135]Supplementary Table S8). Based on the best cut-off
   of each gene, Kaplan–Meier (KM) curves were generated for the high
   expression level group and the low expression level group. The
   expression group definition is described in section “Survival
   Analysis.”

   High expression of CDH2 (HR = 1.2874, 95% CI: 1.0425–1.5898, p =
   0.0279), EGFR (HR = 1.2927, 95% CI: 1.0496–1.5921, p = 0.0166), ITGA3
   (HR = 1.2965, 95% CI: 1.0202–1.6477, p = 0.0379), ITGB1 (HR = 1.5930,
   95% CI: 1.2906–1.9663, p = 2.1922e−05), and ITGA5 (HR = 1.4656, 95% CI:
   1.1836–1.8148, p = 5.8933e−04) was negatively associated with OS in
   NSCLC patients. Low expression of CALR (HR = 0.7930, 95% CI:
   0.6454–0.9744, p = 0.0279) was associated with worse OS for NSCLC
   patients ([136]Figure 3). According to the differential expression
   analysis in this work; CDH2, EGFR, ITGA3, ITGA5, and ITGB1 were all
   highly expressed in the high- metastatic cell lines while CALR has low
   expression (see [137]Supplementary Figure S1). This is consistent with
   the widely accepted fact regarding NSCLC – that patients with
   metastasis have a poor prognosis.

FIGURE 3.

   [138]FIGURE 3
   [139]Open in a new tab

   Kaplan-Meier curves of overall survival for CHE2, EGFR, ITGA3, ITGA5,
   ITGB1, and CALR.

   A Risk Index was computed for each sample by applying the formula
   described in the section “Materials and Methods.” When comparing the
   prognosis differences between the high and low risk factor groups, we
   found that the high risk group showed poor prognosis (p-value < 1e−05)
   (see [140]Supplementary Figure S2). However, not all
   prognosis-associated genes match this pattern. We also found five other
   proteins – CD47, FN1, VDAC2, HGF, and ITGA4, which showed significant
   correlation with OS of NSCLC patients, however, the direction of the
   correlation was non-intuitive. HGF, ITGA4, and CD47 were also
   associated with increase in OS and we can observe that patients may
   live longer when these genes are highly expressed. However, high
   expression of HGF, ITGA4, and CD47 is observed in 95D cell lines (high
   metastasis cell lines) as compared with low metastasis cell lines.
   Finally, FN1 and VDAC2 are down regulated in the 95D cell lines but
   lower expression level of these proteins will result in longer OS
   ([141]Supplementary Figure S1 – KM curve). A potential mechanistic
   explanation is due to inconsistent expression of protein level and gene
   level ([142]Vogel and Marcotte, 2013).

Validation of Altered Protein Expression

   To validate the quantitative differences observed by mass spectrometry
   and bioinformatics analysis, the expression levels of six proteins were
   verified in the two (95C and 95D) cell lines through Western blot. The
   experimental results confirmed our prediction ([143]Figure 4).
   Calreticulin (CALR gene production) showed low expression levels in
   high-metastasis cell lines (95D), while the other five proteins (CDH2,
   EGFR, ITGA3, ITGB1, and ITGA5) were highly expressed. The high
   expression of CDH2, EGFR, ITGA3, ITGA5, and ITGB1 and the low
   expression of CALR in lung cancer have been validated by experimental
   techniques (Western blot and/or immunohistochemistry) in previous
   studies ([144]Boelens et al., 2007; [145]Grinberg-rashi et al., 2009;
   [146]Wang H. et al., 2013; [147]Liu et al., 2015; [148]Wu et al., 2016;
   [149]Du et al., 2017). Combined with the results of this study, we can
   infer that from the onset of lung cancer, the high expression level of
   the five proteins (CDH2, EGFR, ITGA3, ITGB1, and ITGA5) and the low
   expression of ACALR may serve as biomarkers to determine whether the
   tumor has a high metastasis potential.

FIGURE 4.

   [150]FIGURE 4
   [151]Open in a new tab

   Western blot verification results. (A) Western blot assays of the
   protein level of CDH2, EGFR, ITGA3, TIGB1, TIGA5, and CALR. (B) The
   gray levels of Western blotting are shown by bar graph.

   Besides biochemical validation of the six proteins (CDH2, EGFR, ITGA3,
   ITGB1, and ITGA5), we also performed literature verification.
   N-cadherin (CDH2), which is a member of the cadherin family and is
   involved in EMT and cancer metastasis ([152]Hazan et al., 2004), has
   been reported as being highly expressed in LAC tissues. It was further
   revealed that LAC migration and invasion are suppressed after knocking
   down CDH2 ([153]Zhao et al., 2013). Regarding epidermal growth factor
   receptor (EGFR), its gene amplification was shown as significantly
   increased in tumor cells and it was closely related to metastasis and
   TNM stage ([154]Jia et al., 2015). Previous studies have verified the
   prognostic value of ITGA3, ITGA5, and ITGB1 expression on relapse and
   metastasis in lung cancer ([155]Zheng et al., 2016). In addition, CALR
   was shown to be an independent prognostic factor for lung cancer
   ([156]Liu et al., 2012), and the reduction in the expression of CALR
   was associated with an increased rate of proliferation ([157]Bergner et
   al., 2009).

Conclusion

   The high metastatic status of giant cell lung cancer is strongly
   associated with the abnormal expression of membrane proteins, and
   microRNAs play a key role in regulation of expression. From this study,
   we conclude that the high expression of has-miR-137 and its indirect
   targets-CDH2, EGFT, ITGA3, ITGB1, ITGA5 and the low expression of CALR
   serve as markers of high-metastasis status of giant cell lung cancer.
   Our study provides a new approach to the analysis of integrated
   proteome and microRNAs and the synthesized sub-network provides
   candidate targets for giant-cell lung metastasis treatment.

Data Availability Statement

   The mass spectrometry proteomics data generated for this study was
   deposited into the ProteomeXchange Consortium via the PRIDE
   ([158]Perez-Riverol et al., 2019) ([159]https://www.ebi.ac.uk/pride/)
   partner repository with the dataset identifier
   [160]http://www.ebi.ac.uk/pride/archive/projects/PXD016912.

Author Contributions

   YK participated in the data acquisition, performed the statistical
   analysis, and drafted the manuscript. ZQ and YR conceived of the study
   and participated in its design. GG contributed to the data
   interpretation and manuscript writing. MG conducted western-blot
   experiment and summarized results. All authors read and approved the
   final manuscript. All aspects of the study were supervised by HX, HL,
   and HZ.

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

   Funding. This work is supported by National Key R&D Program of China
   2018YFC0910500, SJTU-Yale Collaborative Research Seed Fund, and the
   Neil Shen’s SJTU Medical Research Fund.
   ^1

   [161]http://www.cbs.dtu.dk/services/TMHMM/
   ^2

   [162]https://www.proteinatlas.org

Supplementary Material

   The Supplementary Material for this article can be found online at:
   [163]https://www.frontiersin.org/articles/10.3389/fgene.2020.01023/full
   #supplementary-material
   FIGURE S1

   The KM curve of 5 genes on TCGA lung cancer cohort.
   [164]Click here for additional data file.^ (1.3MB, TIF)
   FIGURE S2

   The KM curve of risk index on TCGA lung cancer cohort.
   [165]Click here for additional data file.^ (666.5KB, TIF)
   TABLE S1

   Identification results of membrane proteins.
   [166]Click here for additional data file.^ (3.9MB, XLS)
   TABLE S2

   Different expression analysis results between high and low metastatic
   cell lines.
   [167]Click here for additional data file.^ (3.9MB, XLS)
   TABLE S3

   Validated microRNA-regulated targets network.
   [168]Click here for additional data file.^ (1.4MB, XLS)
   TABLE S4

   GO and KEGG pathway the enrichment results.
   [169]Click here for additional data file.^ (254.5KB, XLS)
   TABLE S5

   Gene set enrichment analysis results.
   [170]Click here for additional data file.^ (122.5KB, XLS)
   TABLE S6

   Compact subnetwork predicted by TieDIE program.
   [171]Click here for additional data file.^ (43KB, XLS)
   TABLE S7

   cancer hallmark pathway enrichment analysis of genes involved in
   subnetwork.
   [172]Click here for additional data file.^ (112KB, XLS)
   TABLE S8

   Raw clinical data and 32 membrane protein expression data on TCGA lung
   cancer cohort.
   [173]Click here for additional data file.^ (425.5KB, XLS)

References