Abstract

Gastric adenocarcinoma (GAC), also known as stomach adenocarcinoma (STAD), is one of the most lethal malignancies in the world. It is vital to identify and classify the hub genes and key pathways involved in the initiation and progression of GAC. In this study, we collected 15 pairs of GAC tumor tissues and adjacent normal tissues and profiled their gene expression. Differentially expressed genes (DEGs) were identified, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) analyses were used to annotate the biological significance and important pathways of the enriched DEGs. Moreover, we constructed the protein-protein interaction (PPI) network with Cytoscape and conducted KEGG enrichment analysis of the prime module. We further used the TCGA database to perform survival analysis of the hub genes by Kaplan-Meier estimates. In total, we obtained 233 DEGs, consisting of 64 up-regulated and 169 down-regulated genes. GO enrichment analysis found that the DEGs were most significantly enriched in single organism process, extracellular region, and extracellular region part. KEGG pathway enrichment analysis suggested that the DEGs were most significantly enriched in protein digestion and absorption, gastric acid secretion, and ECM-receptor interaction. Furthermore, the PPI network showed that the top 10 hub genes in GAC were IL8, COL1A1, MMP9, SST, COL1A2, TIMP1, FN1, SPARC, ALDH1A1, and SERPINE1. The prime gene interaction module in the PPI network was enriched in protein digestion and absorption, ECM-receptor interaction, the PI3K-Akt signaling pathway, and pathways in cancer. Survival analysis based on the TCGA database found that the expression of FN1, SERPINE1, and SPARC significantly predicted poor prognosis of GAC.
Collectively, we identified several hub genes and key pathways associated with GAC initiation and progression by analyzing microarray data on DEGs, which provides a more detailed view of the molecular mechanisms underlying GAC occurrence and progression.

Subject terms: Gastric cancer, Prognostic markers, Oncogenes

Introduction

Gastric cancer is the fifth most common cancer after lung, breast, colorectal and prostate cancers^1, and it imposes a considerable health burden worldwide. Gastric adenocarcinoma (GAC), also known as stomach adenocarcinoma (STAD), is the most common histological type (~90–95%)^2. In 2015, GAC was expected to account for nearly 777,000 new cases and 350,000 deaths worldwide^3. Although diagnostic and therapeutic techniques have advanced over recent decades, the mortality rate of GAC is still high and global 5-year survival rates remain unsatisfactory^4.

It is well known that cancer is usually characterized by abnormal cell cycle activity, which generally results from either mutations in upstream or downstream signaling pathways or genetic lesions in protein-encoding genes involved in the cell cycle. The highly organized and regulated mammalian cell cycle ensures normal and accurate gene duplication, cell division and cell apoptosis^5. Microarrays can be used to probe key biomarkers and provide a better understanding of the molecular mechanisms involved in GAC. To date, clinically applicable biomarkers are still lacking. Therefore, it remains imperative to explore novel and effective molecular biomarkers that can point to therapeutic targets for GAC patients.

In this study, we focused on the differences in expression pattern between GAC tumor tissues and matched normal tissues. To discover the hub genes and key pathways associated with the initiation and progression of GAC, we applied differential gene expression analysis and functional enrichment analysis.
In conclusion, we identified a set of hub genes that participate in several cancer-relevant pathways and whose abnormal expression is correlated, by overall survival analysis, with the clinical prognosis of GAC patients.

Methods and Materials

Patients and samples

Tumor and matched normal tissue samples were obtained from GAC patients at the Affiliated Hospital of Xuzhou Medical University in 2014. The tissues were stored in RNAlater (Ambion, Life Technologies, ThermoFisher Scientific, Waltham, MA, USA) at 4 °C until the RNAlater had fully penetrated them and were then transferred to −80 °C for storage. The selection criteria were as follows: (1) the subject was diagnosed with GAC and had no history of other tumors; (2) complete demographic and clinical data, including age, gender, clinical manifestations, tumor size, extent of resection, and date of relapse and/or death, had been collected. Written informed consent for the surgical procedures and for the use of the resected tissues was provided by the legal surrogates of the participants. The National Regulations on the Use of Clinical Samples in China served as the guideline for human tissue acquisition and legitimate use. This study was approved by the Medical Ethics Committee of the Affiliated Hospital of Xuzhou Medical University. The demographic and clinical features of the patients are summarized in Table 1.

Table 1. The demographic and clinical features of the patients.

Gender    Male          20 (~67%)
          Female        10 (~33%)
Age       Median        43
          Range         23–66
Race      Ethnic Han
Stage     I             5 (~17%)
          II            8 (~27%)
          III           11 (~37%)
          IV            6 (~20%)

Profiling of gene expression

Total RNA was extracted from each frozen tissue separately using the EZNA® HP Tissue RNA Kit (Omega Bio-Tek Inc., Norcross, GA, USA) according to the manufacturer's recommended procedure. A Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA, USA) was used to assess the quality and quantity of the total RNA.
Affymetrix microarrays were used for mRNA profiling, which was performed by GeneChem (Shanghai Genechem Co., Ltd.). Briefly, after rRNA removal, biotinylated aRNA (cRNA) was prepared according to the manufacturer's protocol (3′ IVT Express Kit, Affymetrix 901228). PrimeView Human Gene Expression Arrays (cat. no. 901838; Affymetrix; Thermo Fisher Scientific, Inc.) were hybridized and scanned according to standard Affymetrix protocols. All samples were processed in technical duplicate. A GeneChip Scanner 3000 (Affymetrix) was used to scan the completed arrays. Images were extracted with Affymetrix GeneChip Command Console (AGCC) and analyzed using Expression Console Software (Affymetrix, CA, USA). Data were deposited in the Gene Expression Omnibus database (accession GSE118916; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118916).

Data preprocessing and identification of differentially expressed genes

Probes were converted to gene symbols according to the platform annotation information of the raw data. When a gene was mapped by multiple probes, its expression value was taken as the maximum value among those probes. Invalid probes without any gene information were removed. The raw CEL data then underwent background correction, normalization, and expression calculation using the R package "affy". The R package "limma" (http://www.R-project.org) was used to normalize the data by log2 transformation. We then applied "limma" to identify differentially expressed genes (DEGs) using the following criteria: (I) |logFC| > 2; (II) P-value < 0.05; and (III) false discovery rate (FDR) < 0.05.
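The probe-collapsing rule and the three DEG criteria above can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used the R packages "affy" and "limma"; it only applies the stated rules to per-gene statistics that are assumed to have been computed already. The function names, the input layout, the use of the Benjamini-Hochberg procedure for the FDR, and the convention that a positive logFC means higher expression in tumor tissue are all assumptions made for illustration.

```python
def collapse_probes(probe_values, probe_to_gene):
    """Keep the maximum expression value among probes mapped to one gene.

    Probes missing from probe_to_gene (no gene annotation) are dropped.
    Both argument layouts are hypothetical, chosen for illustration.
    """
    gene_values = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if not gene:
            continue  # invalid probe without gene information
        if gene not in gene_values or value > gene_values[gene]:
            gene_values[gene] = value
    return gene_values


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, lfc_cutoff=2.0, p_cutoff=0.05, fdr_cutoff=0.05):
    """Apply |logFC| > 2, P < 0.05 and FDR < 0.05 to precomputed statistics.

    results: list of (gene, log2_fold_change, p_value) tuples.
    Assumption: positive logFC means up-regulated in tumor tissue.
    """
    fdr = benjamini_hochberg([p for _, _, p in results])
    degs = []
    for (gene, lfc, p), q in zip(results, fdr):
        if abs(lfc) > lfc_cutoff and p < p_cutoff and q < fdr_cutoff:
            degs.append((gene, "up" if lfc > 0 else "down", lfc, p, q))
    return degs
```

Called on a list of per-gene statistics, `select_degs` returns only the genes that pass all three thresholds, each labeled "up" or "down". The moderated test statistics themselves would still come from a tool such as limma; this sketch covers only the filtering step.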
Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis

Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted using clusterProfiler^6 to reveal the biological significance of the DEGs and the key pathways associated with GAC; terms with a P-value < 0.05 were considered significantly enriched. Fisher's exact test^7 was used to identify the significant GO terms and pathways, and corrected P-values were obtained with the Benjamini and Hochberg (BH) false discovery rate (FDR) algorithm. Cytoscape^8, Enrichment Map^9, and Gephi^10 were used to visualize the network.

The protein-protein interaction network construction

The Search Tool for the Retrieval of Interacting Genes (STRING v10)^11 (http://string-db.org/) was used to analyze the interactive relationships among the DEGs and to construct the protein-protein interaction (PPI) network; only experimentally validated interactions with a combined score >0.4 were selected as significant. Cytoscape was used to construct the PPI network and Gephi was used for network visualization. The plug-in Molecular Complex Detection (MCODE) was used to select the prime module from the PPI network. The criteria were set as follows: MCODE scores >2 and number of nodes >5. KEGG pathway enrichment analysis of the DEGs from the module was then conducted. P < 0.05 was considered significant.

TCGA data acquisition and processing

We searched The Cancer Genome Atlas (TCGA) database^12 for GAC cases with both clinical information and gene expression profiles, using the R package "OIsurv". The expression value of each hub gene was defined as either high (expression value >median value) or low (expression value [...]

[...] 0.4 were selected as significant. The nodes were colored according to whether they belong to the up- or down-regulated genes. The thickness of each edge reflects the combined score.
The size of each node is proportional to its number of connections, that is, its degree. (B) The expression heatmap of the top 10 hub genes.

Figure 5. The prime module from the PPI network. (A) The sub-network of the main module. (B) The enriched pathways of the module.

TCGA dataset analysis

Analysis of TCGA data can relate extensive studies of human gene expression to a specific disease. GAC cases were divided into high expression (expression value >median value) and low expression (expression value