Abstract

Background
Gastric cancer (GC) is the fifth most common cancer and the second leading cause of cancer-related deaths worldwide. Owing to the lack of specific markers, the early diagnosis rate of GC is very low, and most patients with GC are diagnosed at advanced stages. The aim of this study was to identify key biomarkers of GC and to elucidate GC-associated immune cell infiltration and related pathways.

Methods
Gene microarray data associated with GC were downloaded from the Gene Expression Omnibus (GEO). Differentially expressed genes (DEGs) were analyzed using Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Set Enrichment Analysis (GSEA) and protein-protein interaction (PPI) networks. Weighted gene coexpression network analysis (WGCNA) and the least absolute shrinkage and selection operator (LASSO) algorithm were used to identify hub genes for GC, and the diagnostic accuracy of the GC hub markers was assessed using receiver operating characteristic (ROC) curves. In addition, the infiltration levels of 28 immune cell types in GC and their relationship with the hub markers were analyzed using single-sample GSEA (ssGSEA). The hub genes were further validated by RT-qPCR.

Results
A total of 133 DEGs were identified. The biological functions and signaling pathways most closely associated with GC involved inflammatory and immune processes. WGCNA yielded nine expression modules, of which the pink module had the highest correlation with GC; intersecting the module genes with the DEGs yielded 13 overlapping genes. Subsequently, LASSO regression followed by verification in a validation set identified three hub genes as potential biomarkers of GC. In the immune cell infiltration analysis, infiltration of activated CD4 T cells, macrophages, regulatory T cells and plasmacytoid dendritic cells was more pronounced in GC. RT-qPCR validation showed that the three hub genes were expressed at lower levels in GC cells.
Conclusion
Combining WGCNA with the LASSO algorithm to identify hub biomarkers closely related to GC can help to elucidate the molecular mechanisms of GC development and is important for finding new immunotherapeutic targets and for disease prevention.

Keywords: gastric cancer (GC), hub markers, immune cell infiltration, WGCNA, LASSO

1. Introduction
GC is one of the most common malignancies of the human digestive tract. According to Global Cancer Statistics, GC is the fifth most frequently diagnosed cancer and the third leading cause of cancer deaths, making it a major global health problem (1). In China, there were 478,000 new cases of GC in 2020, ranking second in incidence among malignant tumors, and 373,000 deaths, ranking third in mortality among malignant tumors (2). These figures show that GC is highly malignant, has a low survival rate and a poor prognosis, and poses a serious threat to human health and life. GC is a malignant disease caused by a combination of factors, such as Helicobacter pylori infection, unhealthy lifestyle, genetics and immune cell imbalance. Its pathogenesis is not yet fully understood: activation of proto-oncogenes by the abovementioned oncogenic factors is an important molecular mechanism, but the molecular mechanisms involved still need to be further elucidated. Clinical treatments for GC, based on surgical resection, chemotherapy, radiotherapy or their combination with targeted therapies, have difficulty completely removing tumor lesions; the tumor is prone to progression or recurrence, the treatments have high toxicity, and the 5-year survival rate of patients is as low as 10% to 15% (3–5). Importantly, GC is usually asymptomatic in its early stages, and some patients already have advanced disease when diagnosed, with a survival rate of only 24% (6).
Therefore, it is important to develop effective biomarkers for the prognosis and targeted therapy of GC. Because of its key role in cancer progression and drug resistance, the tumor microenvironment (TME) has emerged as a potential immunotherapeutic target in a variety of malignancies, including GC. The TME consists of diverse components, including immune and inflammatory cells (lymphocytes and macrophages), stromal cells (fibroblasts, adipocytes and pericytes), small cellular organelles, RNA, blood vessels and lymphatic vessels, extracellular matrix (ECM) and secreted proteins. The immune cells that infiltrate the GC microenvironment are called tumor-infiltrating immune cells (TIICs) (7). According to the results of the CheckMate 649 trial presented at the European Society for Medical Oncology (ESMO) 2020 Virtual Congress, immunotherapy improves survival in patients with advanced GC (8, 9). However, recent studies have found that abnormal activation of the immune system may also be a key factor in the development of GC (10). Identifying immune cell-related targets is therefore an effective approach to optimizing tumor immunotherapy. With advances in genomic technology, bioinformatic analysis of gene expression profiles has become increasingly popular in studies of molecular mechanisms and plays an increasingly important role in the discovery of disease-specific biomarkers. Weighted gene coexpression network analysis, proposed by Zhang and Horvath in 2005, is a systematic algorithm widely applied to bioinformatics data: it groups genes with highly correlated expression into modules and relates these modules to clinical traits. It thereby avoids a drawback of traditional differential gene screening methods, which tend to miss core molecules in regulatory processes and make it difficult to study the biological system as a whole, and it has been widely used to screen molecular diagnostic markers or therapeutic targets for complex diseases (11, 12).
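To make the coexpression-network idea concrete, the following Python sketch computes two core quantities used in WGCNA: an unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^β, and the topological overlap matrix (TOM), TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums the products of the adjacencies that genes i and j share with every other gene and k_i is the connectivity of gene i. It is a minimal illustrative sketch, not the pipeline used in this study: the power β = 6, the four-gene toy expression matrix and the helper names (pearson, soft_threshold_adjacency, topological_overlap) are hypothetical, and the module-detection step (hierarchical clustering of 1 - TOM with dynamic tree cutting, as performed by the WGCNA R package) is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.

    beta = 6 is an assumed soft-thresholding power; in practice it is chosen
    so that the network approximates scale-free topology. The diagonal is
    left at 0 so that connectivities exclude self-connections.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene (diagonal is 0)
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            # l_ij: strength of the neighbours shared by genes i and j
            shared = sum(adj[i][u] * adj[u][j] for u in range(g) if u not in (i, j))
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


# Hypothetical toy data: 4 genes (rows) measured in 6 samples (columns).
expr = [
    [1, 2, 3, 4, 5, 6],    # gene A
    [2, 4, 6, 8, 10, 12],  # gene B: scaled copy of A (cor = 1)
    [6, 5, 4, 3, 2, 1],    # gene C: mirror image of A (cor = -1)
    [1, 3, 1, 3, 1, 3],    # gene D: only weakly related to the others
]
adj = soft_threshold_adjacency(expr, beta=6)
tom = topological_overlap(adj)
diss = [[1.0 - t for t in row] for row in tom]  # dissimilarity used for clustering

for row in tom:
    print([round(t, 3) for t in row])
```

In this toy example, genes A, B and C obtain a mutual TOM close to 1 because their expression profiles are perfectly correlated in absolute value (the unsigned adjacency ignores the sign), whereas gene D stays near 0. Raising correlations to the power β suppresses weak correlations such as those involving D, which is why soft thresholding yields cleaner modules than using raw correlations; in a real analysis, genes with high mutual TOM would be clustered into the same module.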
WGCNA thus provides a new way to predict the function of coexpressed genes and to find genes that play key roles in human disease. LASSO is a penalized regression method that shrinks the regression coefficients of the variables and sets those of uninformative variables to exactly zero, which allows more accurate variable selection (13). Many studies, both in China and abroad, have screened GC biomarkers using bioinformatics methods, but they are limited by small sample sizes, reliance on a single data analysis method and a lack of further experimental verification (14–16). This article therefore combines various bioinformatics methods to integrate and analyze gene datasets from multiple platforms, enlarges the sample size and validates the findings with in vitro cellular experiments. The aims are to improve the rigor of the bioinformatics analysis, to explore the pathogenesis and therapeutic targets of GC more accurately, and to provide a molecular biological basis as well as new ideas and directions for subsequent experimental research. Specifically, this study used the GSE54129 and GSE65801 datasets to construct a weighted gene coexpression network with the WGCNA algorithm, screened out the pivotal modules most relevant to the development of GC, analyzed their biological functions, screened key genes with a LASSO regression model and validated them with the GSE118916 dataset, and then identified important prognostic molecular markers and assessed the extent of associated immune cell infiltration, with a view to providing new references for studying the development of GC, potential molecular