Abstract
Gastric cancer (GC) is one of the most common malignancies, and its prognosis is extremely poor. This study identifies a novel oncogene in GC, microfibrillar-associated protein 2 (MFAP2). Through integrative reanalysis of transcriptomic data, we identified MFAP2 as a GC prognosis-related gene and examined its aberrant expression in GC samples. Subsequent experiments indicated that both silencing MFAP2 and exogenous MFAP2 affected the motility of cancer cells. The inhibition caused by silencing MFAP2 could be rescued by another FAK activator, fibronectin. MFAP2 probably acts by affecting the activation of the focal adhesion process via modulating ITGB1 and ITGA5. MFAP2 regulated integrin expression through ERK1/2 activation. Silencing MFAP2 by shRNA inhibited tumorigenicity and metastasis in nude mice. We also revealed that MFAP2 is a novel target of microRNA-29 (miR-29). In conclusion, our data identify MFAP2 as a novel oncogene in GC and reveal that miR-29/MFAP2/integrin α5β1/FAK/ERK1/2 could be an important oncogenic pathway in GC progression.

Subject terms: Gastric cancer, Cancer microenvironment

Introduction
Gastric cancer (GC) is one of the most common and lethal cancers worldwide, particularly in East Asian and South American countries^1. Surgery is the optimal treatment for patients with GC; unfortunately, surgical resection is of limited use, because most patients are diagnosed at an advanced stage of the disease^2. Moreover, many cases of GC are not sensitive to chemotherapy or radiotherapy, which makes the situation more severe^2.
Recent years have witnessed great progress in targeted cancer therapies; however, for GC patients, only trastuzumab, a monoclonal antibody against human epidermal growth factor receptor 2, and ramucirumab, a monoclonal antibody against vascular endothelial growth factor receptor 2, have proved to have certain therapeutic effects and are widely applied in the clinic^3,4. Current treatment regimens for GC are still inadequate. Researchers are trying to clarify the biological mechanisms underlying the tumorigenesis and progression of GC, aiming to provide novel clues to fight this fatal disease. With the rapid development of high-throughput detection techniques, gene expression data are accumulating rapidly in public repositories, and a large number of differentially expressed genes (DEGs) between GC and normal tissue have been identified in several studies^5–9. Many DEGs have been validated as oncogenes or tumor suppressors, which affect different malignant phenotypes of GC, including proliferation, angiogenesis, metastasis, and chemoresistance, by activating or inactivating multiple downstream signaling pathways^5–9. However, owing to differences in sample sources, experimental techniques, and bioinformatics algorithms, the results of these studies diverge greatly, and there is still no widely accepted factor dominating the malignant transformation and progression of GC. Integrative reanalysis of independent transcriptomic data may reveal common and remarkable changes during GC progression. In this study, by integrative analysis of datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, we unveiled a set of DEGs that were invariably dysregulated in each cohort.
Intriguingly, the intersecting DEGs were significantly enriched in terms such as extracellular space, extracellular matrix (ECM) organization, extracellular exosome, collagen catabolic process, and ECM–receptor interaction. The ECM provides both the structure and the signals that modulate the biological behavior of cells, and recent studies have established the importance of ECM remodeling in cancer progression^10,11. Our results implied that matrix remodeling is a hallmark of GC that was probably underestimated in the past. To further verify the crucial role of matrix remodeling in GC progression, we conducted survival analysis and obtained 14 genes associated with the prognosis of GC patients: SPARC, MFAP2, SERPINE1, LOX, PDGFRB, OLFML2B, VCAN, COL18A1, SPON2, COL4A2, CDH11, NRP1, NREP, and COL4A5. Consistent with our expectations, most of them were important components of the ECM or important modulators of matrix remodeling. This provided further evidence for the crucial role of the ECM in GC progression. Among the 14 genes, we were particularly interested in MFAP2 (microfibrillar-associated protein 2), which is also named microfibril-associated glycoprotein 1 (MAGP1). It is a 183-amino acid protein composed of two domains: a proline- and glutamine-rich region in the amino-terminal half and a 54-amino acid region in the carboxy-terminal half that targets the protein to the ECM^12,13. Its extracellular form binds to fibrillin, collagen VI, tropoelastin, decorin, and biglycan^14, and the intracellular form of MFAP2 upregulates the expression of downstream genes linked to cell adhesion, motility, and matrix remodeling^13. Recently, the function of MFAP2 in metabolic disease has attracted a lot of attention.
Previous studies demonstrated that, in adipose tissue, MFAP2 had high affinity for members of the transforming growth factor (TGF)-β superfamily, and in the absence of MFAP2, there was an increase in basal TGF-β activity^15,16. However, its role in cancer biology is still obscure. In this study, we validated that MFAP2 was upregulated in GC tissue and that it was implicated in the malignant behavior of GC cells, such as proliferation, migration, and invasion. We also demonstrated that it activated focal adhesion kinase (FAK), paxillin, and extracellular signal-regulated kinase 1/2 (ERK1/2) through the MFAP2/integrin α5β1/FAK/ERK1/2 pathway. Furthermore, we explored the mechanisms of its dysregulated expression in GC. Loss of microRNA-29 (miR-29) is a known mechanism of fibrosis; we found that MFAP2 was a target of the miR-29 family and that its aberrantly high expression was probably due to the absence or inhibition of miR-29 family members. In general, we reveal a set of GC-related genes that are potential diagnostic biomarkers and therapeutic targets. We also demonstrate that the novel oncogene MFAP2 promotes the malignant behavior of cancer cells by activating integrin signaling. Finally, we provide evidence that miR-29 family members have the potential to inhibit MFAP2 and at least partly reverse the aberrant matrix status of GC.

Materials and methods

Study strategy
The workflow of data mining and the number of candidate genes remaining at each step are shown in Fig. 1.

Fig. 1. Workflow of data mining. Gastric cancer (GC)-related RNA sequence data were used to screen differentially expressed genes (DEGs) between GC and normal gastric tissues. After taking the intersection across cohorts, DEGs were further screened to identify prognosis-associated genes. The number of candidate genes remaining at each step is shown.
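The cross-cohort step of this workflow amounts to a set intersection over per-cohort gene lists. The following is a minimal Python sketch of that idea, assuming each cohort's DEGs are already available as a set of gene symbols. The function name, cohort labels, and gene lists are hypothetical placeholders chosen for illustration; they are not the study's actual data or code.

```python
from functools import reduce


def intersect_cohorts(gene_sets):
    """Return the genes that appear in every cohort's gene set."""
    sets = [set(s) for s in gene_sets]
    if not sets:
        return set()
    return reduce(set.intersection, sets)


# Hypothetical placeholder DEG lists for three cohorts (not real results)
deg_by_cohort = {
    "cohort_A": {"MFAP2", "SPARC", "LOX", "GENE_X"},
    "cohort_B": {"MFAP2", "SPARC", "LOX", "GENE_Y"},
    "cohort_C": {"MFAP2", "SPARC", "GENE_Z"},
}

# Genes dysregulated in every cohort
common_degs = intersect_cohorts(deg_by_cohort.values())
print(sorted(common_degs))
```

The same helper could be reused for the later prognosis step, where the survival-associated genes of two cohorts are intersected.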
Patients and gene expression data
In this study, five cohorts of patients with GC (GSE29272, GSE79973, GSE62254, GSE15459, and TCGA) were used for identifying and validating prognostic biomarkers. A description of these cohorts is presented in the Supporting Information.

Identification of DEGs
DEGs between matched GC and adjacent normal gastric tissues were identified using TwoClassDif^17,18. Briefly, we first filtered DEGs with a fold-change (Tumor/Normal) of >1.5 or <0.67. Next, we confirmed the DEGs with the random variance model-modified t test to reduce statistical errors. Venn diagrams were drawn with the online BioVenn website (http://www.biovenn.nl/index.php).

Functional annotation
The DAVID database was used for Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses to explore the potential biological functions of the intersecting DEGs from different cohorts^19. A P value < 0.01 and a false discovery rate (FDR) < 0.25 were set as the cutoff criteria.

Identification of prognostic genes
For each intersecting DEG in turn, patients were classified into a high-expression group (above the median expression level of that gene) or a low-expression group (below the median expression level). Univariate analyses of overall survival (OS) were performed with the two-sided log-rank test to compare the two groups. Kaplan–Meier plots were made using an online tool (http://www.kmplot.com)^20 with the data of GSE15459 and GSE62254. The analysis was performed using both disease-free survival (DFS) and OS information of patients. Patients were split by the median.

Lentivirus transfection
To knock down the expression of MFAP2, we infected AGS and HGC-27 cells with the MFAP2–short hairpin RNA (shRNA) recombinant lentivirus (Genepharma, Suzhou, China). The detailed protocol of lentivirus transfection is presented in the Supporting Information.
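The two threshold rules described under "Identification of DEGs" and "Identification of prognostic genes" can be illustrated with a short Python sketch. It assumes linear-scale (not log-transformed) mean expression values per gene and one expression value per patient; all function names, gene labels, and numbers are hypothetical. It covers only the fold-change filter and the median split, and does not implement the random variance model-modified t test or the log-rank test. Patients whose value equals the median are left unassigned, following the "more than"/"less than" wording.

```python
from statistics import median


def filter_by_fold_change(tumor_mean, normal_mean, up=1.5, down=0.67):
    """Label genes whose Tumor/Normal ratio is > up or < down."""
    calls = {}
    for gene, tumor in tumor_mean.items():
        normal = normal_mean.get(gene)
        if normal is None or normal <= 0:
            continue  # ratio undefined for this gene; skip it
        ratio = tumor / normal
        if ratio > up:
            calls[gene] = "up"
        elif ratio < down:
            calls[gene] = "down"
    return calls


def split_by_median(expression):
    """Assign patients to 'high' (> median) or 'low' (< median) groups."""
    cutoff = median(expression.values())
    groups = {"high": [], "low": []}
    for patient, value in expression.items():
        if value > cutoff:
            groups["high"].append(patient)
        elif value < cutoff:
            groups["low"].append(patient)
    return groups


# Hypothetical mean expression values (linear scale)
tumor = {"GENE_A": 30.0, "GENE_B": 5.0, "GENE_C": 11.0}
normal = {"GENE_A": 10.0, "GENE_B": 10.0, "GENE_C": 10.0}
calls = filter_by_fold_change(tumor, normal)

# Hypothetical per-patient expression of a single gene
expression = {"P1": 2.0, "P2": 8.0, "P3": 5.0, "P4": 9.0}
groups = split_by_median(expression)

print(calls)
print(groups)
```

In a real analysis the two groups returned by `split_by_median` would then be compared with a log-rank test from a survival-analysis package.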
Cell proliferation assay
The MTT assay was performed using Thiazolyl Blue Tetrazolium Bromide (MTT, M2128, Sigma) following the manufacturer’s recommendations. Cell viability was measured with a multifunctional microplate reader at 490 nm after the cells had been incubated for 2 h at 37 °C. The relative absorbance value was normalized and compared to the control group.

In vitro migration and invasion assays
In the migration assay, cells were plated into the upper chamber of 8-μm-pore-size Transwell chambers (Corning, Corning, NY). Dulbecco’s modified Eagle’s medium containing 10% fetal bovine serum was added to the lower chamber. The chambers were then incubated at 37 °C for 48 h. Cells in the upper chamber were removed, and the cells on the bottom surface of the membranes were stained with 0.1% crystal violet and counted. In the invasion assay, Matrigel (Clontech, Madison, WI) was used in the Transwell chambers (Corning). Cell migration and invasion were quantified by counting six random fields under a microscope.

Immunofluorescence
AGS cells were grown to confluency on glass coverslips. Cells were fixed with 3.7% paraformaldehyde in phosphate-buffered saline for 20 min, permeabilized with 0.1% Triton X-100 for 5 min at 4 °C, and then blocked with 5% bovine serum albumin in TBST for 1 h. Samples were incubated with primary antibodies overnight at 4 °C and then with appropriate secondary antibodies. Samples were mounted onto slides with mounting medium, and images were acquired using a fluorescence microscope. Images were processed using the Photoshop software (Adobe).

Luciferase assay
Cells were plated into 24-well plates and cotransfected with 200 ng of psiCHECK-2 plasmids and 50 nmol/l of miR-29a (or NC microRNA) for 48 h. Luciferase activities were then measured using the Dual-Luciferase Reporter Assay system (Promega, Madison, WI). Renilla luciferase activity was normalized to firefly luciferase activity.
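The normalization in the luciferase assay is a simple ratio calculation. The sketch below, in Python, divides the Renilla signal by the firefly signal as stated above. Expressing the result relative to the NC microRNA well is an added assumption (a common reporting convention), not something the text specifies; the function name and readings are hypothetical.

```python
def relative_luciferase(renilla, firefly, nc_renilla, nc_firefly):
    """Renilla/firefly ratio of a sample, relative to the NC control ratio.

    Dividing by the NC ratio is an assumption made for illustration.
    """
    if firefly <= 0 or nc_firefly <= 0 or nc_renilla <= 0:
        raise ValueError("luciferase readings must be positive")
    sample_ratio = renilla / firefly        # normalized reporter signal
    control_ratio = nc_renilla / nc_firefly  # same ratio in the NC well
    return sample_ratio / control_ratio


# Hypothetical luminescence readings (arbitrary units)
relative_activity = relative_luciferase(
    renilla=500.0, firefly=1000.0, nc_renilla=1000.0, nc_firefly=1000.0
)
print(relative_activity)
```

A value below 1 under these assumptions would indicate a lower normalized reporter signal in the miR-29a well than in the NC well.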
In vivo assays
Animal protocols were approved by the Institutional Animal Care and Use Committee of the Renmin Hospital of Wuhan University. Nude mice (4–5 weeks old) were raised in a specific pathogen-free environment at the experimental animal center of the Renmin Hospital of Wuhan University. Xenograft tumor growth models were established by subcutaneous injection of MFAP2 knockdown cells and NC cells (2 × 10^6 cells) into the right dorsal flank. Tumor growth in the nude mice was observed for 28 days. Tumor volume (V, cm^3) was calculated from tumor length (L) and width (W) with the following formula: V = 1/2 × L × W^2. To test how MFAP2 affects tumor metastasis, we established a metastatic tumor model by giving intravenous tail vein injections of 1 × 10^5 MFAP2 knockdown cells to two groups of mice. After 7 weeks, the mice were sacrificed, and the tumor nodules formed on the lung and liver surfaces were counted. The tumors were embedded in paraffin for further study. All animal studies were conducted with the approval of the Institutional Animal Care and Use Committee of the Renmin Hospital of Wuhan University.

Statistical analysis
The correlation between gene expression and the clinicopathologic features was analyzed by the Chi-square test using SPSS 20.0 (International Business Machines, Armonk, NY, USA). Three independent experiments were conducted in cellular studies, and results were analyzed using the two-tailed, unpaired Student’s t test. The mean and standard deviation (SD) of three independent experiments were determined. Results were expressed as mean ± S.E.M. P < 0.05 was considered statistically significant.

Results

Identification of DEGs in GC
From the expression profile datasets GSE29272 (n = 134), GSE79973 (n = 10), and TCGA (n = 374), we extracted 1352, 2845, and 3453 DEGs, respectively. Two-dimensional hierarchical clustering showed a marked difference in the expression modules of the DEGs (Fig. 2a–c).
Taking the intersection of DEGs from the three datasets, we extracted 279 genes differentially expressed in GC tissues compared to normal tissues, including 171 upregulated and 108 downregulated genes (Fig. 2d, Table S1).

Fig. 2. Identification of differentially expressed genes (DEGs) and prognosis-associated genes. a–c Using two-dimensional hierarchical clustering, 1352, 2845, and 3453 DEGs were identified from the expression profile datasets GSE29272 (n = 134), GSE79973 (n = 10), and TCGA (n = 374), respectively. d Taking the intersection of DEGs from the three datasets, 279 DEGs were extracted between GC and normal gastric tissues. e DEGs in the intersection were mapped onto the DAVID database and subjected to Gene Ontology (GO) functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. GO function and KEGG pathway analyses show that the DEGs in the intersection are significantly associated with the matrix remodeling process. f Using data with clinical information from GSE62254 (n = 300) and TCGA (n = 374), the log-rank test was performed to explore the prognostic value of the intersecting DEGs. Ninety-two and 29 genes were closely related to patients’ overall survival in GSE62254 and TCGA, respectively. Taking the intersection of the two datasets, 14 prognostic biomarkers were obtained. g Among the 14 prognostic biomarkers, we are most interested in microfibrillar-associated protein 2 (MFAP2). Kaplan–Meier analysis of overall survival (OS) and disease-free survival (DFS) of GC patients was performed. OS (P = 0.009) and DFS (P = 0.008) of GC patients in GSE15459 were significantly negatively associated with the expression of MFAP2. h OS (P = 0.027) and DFS (P = 0.019) of GC patients in GSE62254 were significantly negatively associated with the expression of MFAP2.

The DEGs in intersection are significantly associated with matrix remodeling process
As shown in Fig.
2e, in GO functional analysis, the 279 DEGs were found to be enriched mainly in the extracellular space (P = 1.12 × 10^−16), ECM organization (P = 3.20 × 10^−15), extracellular exosome (P = 7.84 × 10^−15), ECM (P = 5.12 × 10^−14), etc. In KEGG pathway analysis, ECM–receptor interaction (P = 3.56 × 10^−7), protein digestion and absorption (P = 3.03 × 10^−6), cell cycle (P = 1.59 × 10^−5), and focal adhesion (P = 1.63 × 10^−4) were identified as significant pathways. Collectively, these results implied that the dysregulation of ECM-related proteins is a common feature of the different cohorts.

Identification of prognostic genes among the DEGs
Using data with clinical information from GSE62254 (n = 300) and TCGA (n = 374), we further explored the prognostic value of the 279 DEGs. As shown in Fig. 2f, 92 and 29 genes were closely related to patients’ OS in GSE62254 and TCGA, respectively. Taking the intersection of the two datasets, 14 prognostic biomarkers were obtained (Fig. 2f, Tables 1, 2). Most of the 14 genes were closely related to matrix remodeling, which further supported the view that matrix remodeling is a crucial feature of GC progression. Most of the 14 genes have been reported in GC, such as the well-known oncogenes PDGFRB^21, VCAN^22, and COL18A1^23, whereas four genes, MFAP2, OLFML2B, NREP, and COL4A5, have never been studied in GC. We are especially interested in MFAP2. Kaplan–Meier analysis in cohorts GSE15459 and GSE62254 showed that increased MFAP2 expression was associated with poor OS and DFS in GC patients (Fig. 2g, h). Clinical pathology analysis showed that the expression level of MFAP2 was positively correlated with venous invasion and local invasion (Table S2).

Table 1. P value of the 14 prognosis genes in survival analysis.
Expression in GC   Gene symbol   P value of log-rank test
                                 GSE62254    TCGA
Up                 SPARC         0.004       0.011
                   MFAP2         0.006       0.002
                   SERPINE1      0.020       0.001
                   LOX           0.044       0.002
                   PDGFRB        0.004       0.000
                   OLFML2B       0.004       0.033
                   VCAN          0.039       0.000
Down               COL18A1       <0.001      0.001
                   SPON2         0.023       0.012
                   COL4A2        0.000       0.013
                   CDH11         0.046       0.043
                   NRP1          0.002       0.009
                   NREP          <0.001      0.006
                   COL4A5        0.000       0.029

Table 2. Functions of the 14 prognosis-related genes in gastric cancer.
Gene symbol Description Roles in matrix remodeling Roles in cancer Roles in GC References