Abstract

Background
Stomach adenocarcinoma (STAD) is one of the most common malignancies. Helicobacter pylori (H. pylori) infection is a major risk factor for the development of STAD. This study constructed a risk model based on H. pylori-related macrophages for predicting STAD prognosis.

Methods
A single-cell RNA sequencing (scRNA-seq) dataset, together with clinical information and RNA-seq datasets of STAD patients, was collected for establishing and validating a prognostic model. The “Seurat” and “harmony” packages were used to process the scRNA-seq data. Key gene modules were selected using the “limma” and “WGCNA” packages. Kaplan-Meier (KM) and receiver operating characteristic (ROC) curve analyses were performed with the “survminer” package. The “GSVA” package was employed for single-sample gene set enrichment analysis (ssGSEA). Cell migration and invasion were measured by wound healing and trans-well assays.

Results
A total of 17397 cells were screened and classified into 8 cell type clusters, among which the macrophage cluster was closely associated with H. pylori infection. Macrophages were further categorized into four subtypes (C1, C2, C3, and C4), and the highly variable genes of macrophage subtype C4 could serve as an indicator of STAD prognosis. Subsequently, we developed a RiskScore model based on six H. pylori-associated genes (TNFRSF1B, CTLA4, ABCA1, IKBIP, AKAP5, and NPC2) and observed that high-risk patients exhibited poor prognosis and higher suppressive immune infiltration and were closely associated with cancer activation-related pathways. Furthermore, a nomogram incorporating the RiskScore was developed to accurately predict the survival of STAD patients. ABCA1, one of the genes in the RiskScore model, significantly affected the migration and invasion of tumor cells.

Conclusion
The gene expression profile served as an indicator of survival for patients with STAD and addressed the clinical significance of using H.
pylori-associated genes to treat STAD. The current findings provided novel understandings for the clinical evaluation and management of STAD.

Keywords: Stomach adenocarcinoma (STAD), Single cell RNA-Seq profile, RiskScore, H. pylori infection phenotype, ssGSEA, WGCNA

1. Introduction

Stomach adenocarcinoma (STAD) is a gastrointestinal cancer with the fifth highest incidence and the third highest mortality rate worldwide. Early diagnosis and treatment can increase the 5-year survival rate to 90–97 %, but this figure drops below 30 % for patients with advanced or metastatic STAD [1,2]. A previous study classified gastric carcinomas (GC) into intestinal and diffuse types, both of which are adenocarcinomas [3] but have different pathological and epidemiological features [4]. The World Health Organization (WHO) also provides a histopathology-based classification guideline that divides GC into mucinous, papillary, tubular and poorly cohesive carcinoma [5]. Although these classification methods are simple to use, pathological results may vary because of complex backgrounds (e.g. different pathogen infections) and subjective discrimination factors [6]. Currently, molecular subtypes classified based on the biological behaviors of STAD are poorly studied.

Studies have shown that H. pylori is a gram-negative bacterium that colonizes the gastric mucosal environment of 60.3 % of the world's population, especially in underdeveloped countries [7]. This is partly due to poor sanitation and high population density in these countries, which can easily lead to oral and fecal-oral transmission of H. pylori [8]. H. pylori is largely associated with an increased risk of gastroduodenal disturbances (such as gastroesophageal reflux disease, peptic ulcer disease, and inflammatory bowel disease), particularly mucosa-associated lymphoid tissue lymphoma, STAD and peptic ulcer [8,9]. The H.
pylori infects the normal mucosa and induces its transformation into chronic superficial gastritis, which further evolves from chronic gastritis through metaplasia and dysplasia to invasive GC [10]. In addition, factors such as high dietary salt intake, drinking and smoking [11], and genetically high gastric acid secretion all increase the risk of developing GC [12,13], but the inflammatory response to persistent H. pylori infection is a major etiological cause of carcinogenesis [14]. Patients with polymorphisms in the genes encoding tumor necrosis factor-α, interleukin-1β (IL-1β) and IL-1β receptor antagonist [15,16] are at higher risk of gastric atrophy expansion, hypochlorhydria and GC after H. pylori infection [17,18]. Owing to a high mutation rate, H. pylori populations are genetically highly diverse, carrying a large number of deletions, insertions and chromosomal rearrangements that affect the function of housekeeping genes [19]. H. pylori strains vary in their pathogenic properties, which may reflect genetic variation in virulence factors and may explain why some infected individuals are asymptomatic and do not have severe pathological changes, while others exhibit peptic ulcer disease or cancers [20,21]. As STAD is often diagnosed at a late stage, it is essential to identify factors that lead to the occurrence of the malignancy [22,23].

In addition, next generation sequencing (NGS) has been widely used in molecular profiling research of many tumors. Fabio et al. revealed the genetic and epigenetic heterogeneity of STAD by conducting large-scale genomic and transcriptomic studies [24]. Single-cell RNA sequencing (scRNA-seq) characterizes the transcriptional states of single cells [25], allowing unbiased analysis of cellular characteristics in tumor tissues for exploring the tumor ecosystem [26] and cancer-immune heterogeneity [27].
This study performed an unbiased and systematic transcriptomic profile analysis using data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. The scRNA-seq landscape showed that macrophages were closely associated with the phenotype of H. pylori infection. We further used the differentially expressed genes (DEGs) and a macrophage-related module to develop a reliable prognosis model [28] by applying WGCNA and univariate, multivariate and LASSO Cox regression analyses. The current findings provided novel insights into the single-cell molecular landscape, clinical diagnosis and prognosis of STAD.

2. Material and methods

2.1. Data acquisition and preprocessing

The RNA-seq data and clinical follow-up information of STAD patients were downloaded from TCGA (https://portal.gdc.cancer.gov/) through the TCGA GDC API tool [29]. After excluding patients without survival time or status from the TCGA-STAD expression profile, a total of 353 STAD samples and 53 para-cancer control samples were obtained. The expression matrix was converted to TPM format and log2-transformed, and protein-coding genes were selected. Subsequently, GSE66229, containing the expression data of 300 STAD samples, was retrieved from GEO (https://www.ncbi.nlm.nih.gov/geo/) [30]. The probes were mapped to genes based on the corresponding annotation information, and the mean expression value was taken when multiple probes matched one gene. In addition, by searching the keywords KEGG_EPITHELIAL_CELL_SIGNALING_IN_HELICOBACTER_PYLORI_INFECTION and HP_HELICOBACTER_PYLORI_INFECTION, 73 H. pylori infection-related genes and the cancer-related Hallmark gene sets were collected from The Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb) [31].

2.2. Single-cell RNA profile analysis and enrichment analysis

The scRNA-seq expression dataset (GSE167297) of STAD, containing 10 tumor samples, was collected from GEO [30].
Genes expressed in fewer than three individual cells and cells containing fewer than 200 or more than 2000 genes were removed, and cells with over 15 % mitochondrial gene expression were likewise excluded. Subsequently, the scRNA-seq data were normalized using the “Seurat” R package [32], and the FindVariableFeatures function was used to search for highly variable genes (logfc.threshold = 0.5 and min.pct = 0.25) [32]. Then, the ScaleData function was used to scale all genes, and principal component analysis (PCA) was performed to find the clustering anchors (dim = 10). Batch effects among samples were removed using the “harmony” R package, followed by the RunTSNE function to further reduce dimensionality [33]. Finally, the cells were clustered by the FindNeighbors and FindClusters functions at resolution = 0.2, and different cell types were annotated according to marker genes. Each cell was assigned an enrichment score for H. pylori infection using the “AUCell” package [34], with a higher AUCell score indicating a closer biological relevance between the cell cluster and H. pylori infection. The “CellChat” R package was used for inferring cell-to-cell communication. Finally, to assess the prognostic differences between patients with low and high scores, ssGSEA was performed using the “GSVA” package [35] to compute the scores for macrophage subpopulations.

2.3. Weighted gene co-expression network analysis (WGCNA)

DEG analysis between tumor and para-cancer control samples in the TCGA cohort was performed using the “limma” R package [32] (p < 0.05 and |log2 fold change| > log2(1.5)). WGCNA was performed with the “WGCNA” R package to determine the gene module associated with the phenotype of H. pylori infection. The “pickSoftThreshold” function in the “WGCNA” R package was applied to determine the soft threshold β (with the sensitivity set to 2 and the module merge threshold set to 0.3).
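
The DEG cut-offs given above (p < 0.05 and |log2 fold change| > log2(1.5)) amount to a two-condition filter on the differential expression results. The following is a minimal Python sketch of that filter only; the study itself used the “limma” R package, and the function name `filter_degs`, the record layout and the toy genes are hypothetical and serve purely as illustration.

```python
import math


def filter_degs(results, p_cutoff=0.05, fc_cutoff=1.5):
    """Return genes with p < p_cutoff and |log2FC| > log2(fc_cutoff).

    `results` is a list of dicts with the (hypothetical) keys
    "gene", "log2fc" and "p_value".
    """
    threshold = math.log2(fc_cutoff)  # about 0.585 for a 1.5-fold change
    return [
        row["gene"]
        for row in results
        if row["p_value"] < p_cutoff and abs(row["log2fc"]) > threshold
    ]


# Toy differential expression table (made-up values, not study data).
toy_results = [
    {"gene": "GENE_A", "log2fc": 1.20, "p_value": 0.001},   # up, significant
    {"gene": "GENE_B", "log2fc": -0.80, "p_value": 0.030},  # down, significant
    {"gene": "GENE_C", "log2fc": 0.30, "p_value": 0.001},   # fold change too small
    {"gene": "GENE_D", "log2fc": 2.00, "p_value": 0.200},   # not significant
]

degs = filter_degs(toy_results)
print(degs)
```

Taking the absolute value of the log2 fold change keeps both up- and down-regulated genes, which is why GENE_B passes the filter despite its negative value.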
Gene modules with at least 50 genes were selected [32]. The correlation between the modules and the phenotype was analyzed using the Pearson method. The “clusterProfiler” R package [36] was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis.

2.4. Identifying risk genes for developing a risk model

Univariate Cox regression analysis was used to identify significant prognostic genes (p < 0.05). Then LASSO Cox regression analysis was performed with the “glmnet” R package to shrink the number of candidate genes [32]. Multivariate Cox regression analysis with a stepwise algorithm was employed to further reduce the candidate genes and calculate the regression coefficients. The risk model was established according to the formula

$$\mathrm{RiskScore} = \sum_{i} \left( \mathrm{coefficient}_i \times \mathrm{expression}_i \right),$$

where $\mathrm{coefficient}_i$ is the Cox regression coefficient of risk gene $i$ and $\mathrm{expression}_i$ is its expression level.

2.5. Verification of model prognostic value

The RiskScore was calculated for all patients, who were classified into low- and high-risk groups according to the median RiskScore value. KM survival analysis and ROC analysis with the area under the curve (AUC) were performed using the “survminer” R package [32].

2.6. Analysis of immune infiltration

CIBERSORT is an algorithm that estimates the abundance of different cell types in mixed tissues based on gene expression profiles [37]. The TIMER algorithm is mainly used to quantify various types of immune cells in tumors [38]. This study used both the CIBERSORT and TIMER algorithms to assess the differences in immune cell infiltration between patients from different risk groups.

2.7. Cell culture and transfection

The GC cell line HGC27 and the gastric epithelial cell line GES-1 were obtained from the American Type Culture Collection (ATCC).
DMEM medium (Invitrogen) containing 10 % fetal bovine serum (Gibco, Thermo Fisher Scientific), 50 μg/ml streptomycin and 100 U/ml penicillin was used for cell culture in 5 % CO2 at 37 °C [39]. ABCA1 was silenced using an siRNA reagent (Sangon, Shanghai, China) with the forward sequence 5′-GCGACTCCACATAGAAGAC-3′ and the reverse sequence 5′-GACGTATGTGCAGATCATA-3′. Lipofectamine 3000 (Invitrogen) was used for cell transfection. After incubation for 12 h, the cell samples were harvested for qPCR detection. Briefly, total RNA was extracted using TRIzol Reagent (Invitrogen), and cDNA was synthesized with the ReverTra qPCR RT Master Mix kit (TOYOBO). Then SYBR Green PCR Master Mix (Biosystems) was used for qRT-PCR on a LightCycler 96 (Roche) according to the manufacturer's specification. Target gene expression [40], measured in three sample replicates with technical replicates, was calculated by the 2^-ΔΔCT method, with β-actin as the reference gene. The specific primers are listed in Table S1.

2.8. Cell migration and invasion assays

Cell migration was measured by wound healing assay. A total of 4 × 10^6 cells were seeded into a 6-well plate (Corning) and incubated until confluent, and then a straight scratch was made with a 100-μL pipette tip. After 24 h, the cells were fixed with 4 % paraformaldehyde for 15 min and stained with 0.1 % crystal violet (Servicebio) for another 15 min. Next, wound closure was photographed with an inverted microscope (Leica) [40]. For invasion assays, a total of 4 × 10^4 cells were plated into the upper chamber of 24-well plates (Corning, 8-μm pore) containing 200 μL serum-free DMEM, while the lower chamber was supplemented with 800 μL of DMEM containing 20 % FBS (Thermo Fisher Scientific). After 48 h of incubation, the migrated cells were fixed with 4 % paraformaldehyde, stained with 0.1 % crystal violet for 15 min and then imaged with an inverted microscope [40].

2.9.
Statistical analysis

All statistical analyses and data visualization were performed in R (version 4.3.1). The Pearson method was used for correlation analysis. A p-value < 0.05 was defined as statistically significant. SangerBox (http://sangerbox.com/home.html) was used for part of the data analysis.

3. Results

3.1. The landscape of single-cell RNA of STAD samples

After cell filtering (Figs. S1A and B), normalization and dimensionality reduction clustering (Fig. S1C), a total of 17397 cells from the scRNA-seq expression dataset were grouped into 8 clusters (Fig. 1A), including B cells, epithelial cells, T cells, dendritic cells, fibroblasts, macrophages, endothelial cells, and mast cells (Fig. 1B), according to the expression of marker genes (Fig. 1C, Table S2). T cells and dendritic cells accounted for a higher proportion in each sample (Fig. 1D). Analysis of the H. pylori infection-related genes showed that macrophages had the highest AUCell score (Fig. 1E), suggesting that macrophages were closely associated with H. pylori infection in STAD.

Fig. 1. Single cell atlas of stomach adenocarcinoma. (A) t-SNE plot of single cell clustering. (B) t-SNE plot of the annotated cell clusters. (C) Bubble plot of the expression of marker genes in each cell cluster. (D) The proportion of each cell cluster in different samples. (E) The AUCell score of H. pylori infection in each cell cluster.

3.2. Identifying H. pylori infection-associated macrophage subtypes

The macrophage population was selected for t-distributed stochastic neighbor embedding (t-SNE) clustering (resolution = 0.4). The macrophages were divided into 6 subclusters (cluster1-6, Fig. 2A), among which the C4 cluster exhibited the highest AUCell score for H. pylori infection (Fig. 2B).
Subsequently, we identified the highly variable genes among these macrophage populations and found that several chemokine genes, including CCL3L3, CXCL5 and CCL3, were highly expressed in the C4 subpopulation (Fig. 2C). These protein-coding genes promote the inflammatory response by regulating the migration and activation of leukocytes. KEGG pathway enrichment analysis further showed that these genes were enriched in inflammatory response, cytokine-cytokine receptor interaction, and the TNF and IL-17 signaling pathways (Fig. 2D).

Fig. 2. Macrophage clustering. (A) t-SNE plot of macrophage clustering. (B) The AUCell score of H. pylori infection in different macrophage clusters. (C) Bubble plot of the expression of marker genes in different macrophage clusters. (D) KEGG analysis of the highly variable genes of macrophage C4.

3.3. C4 subcluster of macrophage-mediated cell communication

In multicellular organisms, cell communication plays an important part in cellular activity. The enrichment analysis suggested that the C4 macrophages played a crucial role in regulating STAD progression. Further cell communication analysis revealed that macrophages interacted markedly with other cell clusters (Fig. 3A), with the C4 macrophages showing greater interaction intensity with other cell clusters (Fig. 3B). Further analysis of ligand-receptor information between different cell clusters showed that the C4 macrophages affected other cell clusters, especially some immune cell clusters (T cells and mast cells), through the SPP1-CD44 and MIF-(CD74+CD44) interactions (Fig. 3C), while epithelial cell and fibroblast clusters affected the C4 macrophages through the MDK-SDC2 interaction (Fig. 3D).

Fig. 3. Cell communication analysis. (A) The interaction relationship between the macrophages and other cell clusters.
(B) The interaction strength analysis between the macrophages and other cell clusters. (C) The receptor-ligand interactions from macrophages to other cell clusters. (D) The receptor-ligand interactions from immune cell clusters to macrophages.

3.4. Identifying gene module related to C4 macrophages

We calculated the ssGSEA score in the TCGA cohort according to the expression of the highly variable genes of C4 macrophages and found that STAD patients with a higher score tended to have a worse prognosis (Fig. 4A). Next, WGCNA was used to identify the gene module related to C4 macrophages based on the DEGs. The soft threshold β was set at 12 to ensure a scale-free network (Fig. 4B). After hierarchical clustering and module merging, a total of 5 co-expression modules were obtained (Fig. 4C). As the grey module could not be merged with other modules, it was considered an invalid module. Further correlation analysis showed that the green module had a strong correlation with the C4 subcluster (Fig. 4D). KEGG enrichment analysis revealed that the genes in the green module were enriched in cytokine receptor interaction, inflammatory response pathways, and cell adhesion molecules (Fig. S2A). GO analysis showed that these genes were closely related to immune cell activation and immune response regulation pathways in the biological process category (Fig. S2B), protein and immune complex pathways in the cellular component category (Fig. S2C) and receptor binding and activity regulation pathways in the molecular function category (Fig. S2D).

Fig. 4. WGCNA for macrophage (C4)-related gene module searching. (A) KM survival analysis of patients with different AUCell scores of the highly variable genes of macrophage C4. (B) Analysis of the mean connectivity for various soft-thresholding powers for WGCNA. (C) Dendrogram of genes clustered based on a dissimilarity measure (1-TOM). (D) The correlation between module and feature.

3.5.
Establishment of a risk classification model

To establish the risk model, the samples in the TCGA-STAD cohort were divided into a training set and a test set at a ratio of 7:3, with the GSE66229 cohort serving as an independent validation set. The clinical information of the training set and test set is listed in Table 1. The chi-square test showed no significant difference between the clinical groups, indicating that our grouping was random and reliable. Univariate, LASSO and multivariate Cox regression analyses filtered 6 key prognostic genes, which were used to establish a risk model:

$$\mathrm{RiskScore} = (-0.25 \times \mathrm{TNFRSF1B}) + (-0.2 \times \mathrm{CTLA4}) + (0.384 \times \mathrm{ABCA1}) + (0.343 \times \mathrm{IKBIP}) + (0.564 \times \mathrm{AKAP5}) + (0.4 \times \mathrm{NPC2})$$

Table 1. The clinical information of training set and test set.

| Characteristics | Train cohort (N = 247) | Test cohort (N = 106) | Total (N = 353) | p value | FDR |
|---|---|---|---|---|---|
| Age | | | | | |
| Mean ± SD | 65.52 ± 10.53 | 65.50 ± 10.88 | 65.51 ± 10.62 | | |
| Median [min-max] | 67.00 [35.00, 90.00] | 68.00 [41.00, 90.00] | 67.00 [35.00, 90.00] | | |
| Gender | | | | 0.63 | 1 |
| Female | 85 (24.08 %) | 40 (11.33 %) | 125 (35.41 %) | | |
| Male | 162 (45.89 %) | 66 (18.70 %) | 228 (64.59 %) | | |
| AJCC stage | | | | 0.43 | 1 |
| I | 29 (8.22 %) | 19 (5.38 %) | 48 (13.60 %) | | |
| II | 74 (20.96 %) | 35 (9.92 %) | 109 (30.88 %) | | |
| III | 107 (30.31 %) | 39 (11.05 %) | 146 (41.36 %) | | |
| IV | 25 (7.08 %) | 10 (2.83 %) | 35 (9.92 %) | | |
| Unknown | 12 (3.40 %) | 3 (0.85 %) | 15 (4.25 %) | | |
| Grade | | | | 0.54 | 1 |
| G1 | 8 (2.27 %) | 1 (0.28 %) | 9 (2.55 %) | | |
| G2 | 86 (24.36 %) | 42 (11.90 %) | 128 (36.26 %) | | |
| G3 | 147 (41.64 %) | 60 (17.00 %) | 207 (58.64 %) | | |
| Unknown | 6 (1.70 %) | 3 (0.85 %) | 9 (2.55 %) | | |
| Status | | | | 0.92 | 1 |
| Alive | 146 (41.36 %) | 64 (18.13 %) | 210 (59.49 %) | | |
| Death | 101 (28.61 %) | 42 (11.90 %) | 143 (40.51 %) | | |
| OS.time | | | | | |
| Mean ± SD | 608.00 ± 527.65 | 625.69 ± 594.02 | 613.31 ± 547.63 | | |
| Median [min-max] | 476.00 [3.00, 3540.00] | 406.00 [20.00, 3720.00] | 468.00 [3.00, 3720.00] | | |

3.6. Validation of the effectiveness of the model classification

Based on the optimal cutoff point, the patients in the training set were divided into high-risk and low-risk groups. It was found that high-risk patients had a poor prognosis (Fig.
5A), with AUC values of 0.67, 0.72, 0.71 and 0.65 for 1-, 2-, 3- and 4-year survival, respectively, which indicated the predictive accuracy of the RiskScore in both short- and long-term prediction and classification (Fig. 5A). Similar survival results were also observed in the test set (Fig. 5B) and the TCGA cohort (Fig. 5C), and ABCA1, IKBIP and NPC2 were highly expressed in the high-risk group (Fig. 5C). We further evaluated the accuracy and robustness of the model in the GSE66229 validation set. The results showed that patients with a higher RiskScore had a worse prognosis and shorter survival time, with AUCs for 1- to 5-year survival higher than 0.65 (Fig. 5D), which demonstrated that the RiskScore was effective in predicting the short- and long-term prognosis of STAD (Fig. 5D).

Fig. 5. Validation of model prognostic performance. (A) KM survival and ROC analysis of patients in the training set. (B) KM survival and ROC analysis of patients in the test set. (C) KM survival, ROC and survival time analysis of patients in the TCGA cohort. (D) KM survival, ROC and survival time analysis of patients in the validation cohort.

3.7. Identifying independent prognostic factors and establishing a nomogram

The RiskScore and other clinical factors were incorporated into univariate Cox regression analysis, which showed that age, American Joint Committee on Cancer (AJCC) stage and RiskScore were significant factors influencing STAD prognosis (p < 0.05, Fig. 6A). Multivariate Cox regression analysis also showed that these three factors were independent prognostic factors (p < 0.05, Fig. 6B). To further improve risk assessment and survival prediction for STAD, age, AJCC stage and RiskScore were combined to develop a nomogram model (Fig. 6C). The results showed that the RiskScore had the greatest influence on predicting patients’ survival (Fig. 6C).
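
For illustration, the RiskScore of Section 3.5 and the median split described in Section 2.5 can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code: the negative signs of the TNFRSF1B and CTLA4 coefficients follow the Discussion, which describes both genes as having coefficients below zero; the patient labels and expression values are hypothetical; and Section 3.6 reports an optimal cutoff point rather than the median for the training set.

```python
from statistics import median

# Coefficient magnitudes from Section 3.5; the signs of TNFRSF1B and CTLA4
# are an assumption taken from the Discussion (coefficient < 0).
COEFFICIENTS = {
    "TNFRSF1B": -0.25,
    "CTLA4": -0.2,
    "ABCA1": 0.384,
    "IKBIP": 0.343,
    "AKAP5": 0.564,
    "NPC2": 0.4,
}


def risk_score(expression):
    """Weighted sum of gene expression values using the Cox coefficients."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median RiskScore."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles (arbitrary units, not study data).
zero = dict.fromkeys(COEFFICIENTS, 0.0)
patients = {
    "P1": dict.fromkeys(COEFFICIENTS, 1.0),
    "P2": dict(zero),
    "P3": {**zero, "TNFRSF1B": 2.0, "CTLA4": 2.0},
    "P4": {**zero, "ABCA1": 2.0, "AKAP5": 2.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

In this toy input, the patient expressing only the two negatively weighted genes (P3) receives the lowest score, while the patient expressing only ABCA1 and AKAP5 (P4) receives the highest.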
The calibration curves of the nomogram for 1-, 3- and 5-year survival were close to the standard curve (Fig. 6D), suggesting that the nomogram model had good prognostic prediction performance. In the decision curve analysis (DCA), the net benefit of the nomogram and RiskScore was clearly higher than that of the extreme curves (Fig. 6E), indicating that the current prognostic model had strong survival prediction ability.

Fig. 6. Independent prognostic factors and nomogram development. (A) Univariate Cox regression analysis for significant prognostic factors. (B) Multivariate Cox regression analysis for significant independent prognostic factors. (C) The nomogram. (D) Calibration curve of the nomogram model. (E) Decision curve analysis (DCA) of the nomogram model.

3.8. Immune infiltration and pathway activation difference

Differences in the tumor microenvironment (TME) were compared based on the immune infiltration score. CIBERSORT analysis showed that naïve B cells, activated memory CD4^+ T cells and CD8^+ T cells were enriched in the low-risk group, while neutrophils and M2 macrophages were enriched in the high-risk group (Fig. S3A). Higher M2 macrophage infiltration suggests the presence of an immunosuppressive TME [41]. Similar results were observed in the TIMER analysis, as CD4^+ T cells, B cells, and CD8^+ T cells were enriched in the low-risk group (Fig. S3B). Further analysis showed that the RiskScore was positively correlated with tumorigenesis and development pathways such as angiogenesis, hypoxia, Notch signaling and epithelial-mesenchymal transition (EMT) and negatively correlated with immune and cell cycle pathways (Fig. S3C). This suggested that STAD patients with a higher RiskScore were more prone to activate these typical cancer activation-related pathways.

3.9.
ABCA1 mediated the migration and invasion of tumor cells

Finally, we examined the expression of the model genes and the role of ABCA1 in cell migration and invasion. The qPCR results showed that ABCA1, CTLA4, IKBIP, NPC2 and TNFRSF1B were significantly overexpressed in HGC27 cancer cells (p < 0.05, Fig. 7A–E), while AKAP5 was significantly downregulated in HGC27 cells compared with the epithelial cell line GES-1 (Fig. 7F). In addition, the invasion ability of cancer cells was greatly inhibited after ABCA1 silencing, and the number of blue migrated cells in the si-ABCA1 group was significantly lower than that in the si-NC group (p < 0.05, Fig. 7G). The wound healing assay revealed that silencing ABCA1 significantly reduced the rate of wound closure in the si-ABCA1 group (Fig. 7H).

Fig. 7. qPCR, wound healing and invasion assay. (A–F) The expression of model genes in cancer cells and epithelial cells. (G) Wound healing assay for cell migration. (H) Trans-well assay for cell invasion. *p < 0.05, **p < 0.01, ***p < 0.001.

4. Discussion

Infection with H. pylori can easily cause corpus-predominant gastritis and has been considered a high-risk factor that significantly promotes the occurrence of STAD [42]. Some infected patients develop gastritis, while others may develop gastric cancer [43]. Many patients with gastric inflammation are asymptomatic, and in certain patients gastritis symptoms are more persistent or recur after eradication treatment [44]. Varied clinical outcomes could be explained by multiple factors [28] such as virulence factors, but studies have demonstrated that the pathogenic mechanisms of H. pylori infection are more complex than generally accepted [45]. This study established a risk prognosis model according to the H.
pylori infection phenotype to explore the potential pathogenic mechanism of H. pylori as a risk factor at the molecular level, and the model was able to achieve risk stratification for STAD patients. The nomogram model developed based on the RiskScore, AJCC stage and age can be used to assess the survival probability of patients with STAD; meanwhile, the RiskScore also acted as an indicator of cancer activation-related pathways. Our findings could assist the clinical diagnosis and treatment of STAD patients.

Using the AUCell algorithm, the current results demonstrated that H. pylori infection was closely associated with macrophages. The interaction between H. pylori and macrophages plays a significant role in the progression, pathogenesis, and suppression of the infection [46]. In a mouse model, Zhuang et al. found that the NF-kappa B pathway is involved in the macrophage response to H. pylori and produces IL-6, IL-23, and CCL20, which in turn inhibit the NF-kappa B pathway in macrophages and ultimately lead to reduced differentiation of Th17 cells [47]. Wen et al. discovered that, in a co-culture model of macrophages and H. pylori, inhibiting Notch signaling with a γ-secretase inhibitor downregulates the expression of inducible nitric oxide synthase (iNOS) and its product, nitric oxide (NO). Such an intervention reduces the secretion of pro-inflammatory cytokines and suppresses the phagocytic and bactericidal functions of macrophages against H. pylori [48]. These results further supported a strong association between H. pylori and macrophages in STAD, providing new insights into the clinical management of STAD patients.

Based on the coefficients of the genes in the RiskScore model, TNFRSF1B and CTLA4 were regarded as protective factors (coefficient < 0), while ABCA1, IKBIP, AKAP5 and NPC2 were regarded as risk factors (coefficient > 0).
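
This sign convention can be made concrete with the standard Cox proportional hazards relation, in which the hazard ratio (HR) for a one-unit increase in a gene's expression is the exponential of its coefficient. The worked example below is illustrative only: it assumes the coefficient magnitudes reported in Section 3.5 together with the signs stated above, and the symbols $\beta_i$ and $x_i$ (coefficient and expression of gene $i$) are introduced here for convenience.

```latex
\begin{align*}
h(t \mid x) &= h_0(t)\,\exp\Big(\sum_i \beta_i x_i\Big),
  & \mathrm{HR}_i &= e^{\beta_i} \\
\beta_{\mathrm{TNFRSF1B}} &= -0.25
  & \mathrm{HR} &= e^{-0.25} \approx 0.78 < 1 \quad \text{(protective)} \\
\beta_{\mathrm{AKAP5}} &= 0.564
  & \mathrm{HR} &= e^{0.564} \approx 1.76 > 1 \quad \text{(risk)}
\end{align*}
```

A negative coefficient therefore lowers the modelled hazard as expression rises, and a positive coefficient raises it, with the other genes held fixed.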
TNFRSF1B is a tumor necrosis factor receptor that recognizes its cognate ligand (TNF) and promotes the differentiation, clonal expansion and survival of antigen-primed CD8 and CD4 T cells, mediating adaptive immunity to kill cancer cells [49]. Cytotoxic T lymphocyte antigen 4 (CTLA4) is an immune checkpoint molecule. Targeting CTLA4 is widely used to activate the anti-cancer immune response by stimulating T cell activation [50]. Guan et al. indicated that CTLA4 enhances macrophage recruitment and increases the macrophage proportion in glioblastoma [51]. This indicates that CTLA4 may influence the progression of STAD by affecting macrophage polarization and, in turn, tumor immunosuppression. ATP binding cassette protein A1 (ABCA1) is a crucial molecule in cholesterol homeostasis, and the expression of ABCA1 is upregulated during the EMT of breast cancer to promote the metastatic capacity of the tumor [52]. We also found that ABCA1 was overexpressed in gastric cancer cells and that its silencing affected the migration and invasion of tumor cells. Notably, the significantly positive correlation between the RiskScore and EMT suggested that EMT increased the spread and metastasis of STAD and that H. pylori infection further exacerbated the EMT process by activating EMT-related signaling pathways, with ABCA1 as a crucial contributor during the process. I kappa B kinase interacting protein (IKBIP) is a biomarker that maintains abnormal proliferation of glioblastoma cells by suppressing the ubiquitination and degradation of CDK4 [53]. IKBIP is also an immunosuppressive microenvironment biomarker of digestive system malignancies [54]. Zhong et al. revealed that the expression of protein kinase A-anchoring protein 5 (AKAP5) is upregulated and closely associated with clinical stage in STAD, whereas low AKAP5 expression can act as a protective factor [55].
In addition, some researchers found that the recruitment of immature macrophages to the TME in lung cancer is inhibited by NPC2 and confirmed that NPC2 is secreted by tumor cells and absorbed by immature macrophages [56]. Macrophages play a dual role in H. pylori-induced gastritis, ulcers and gastric cancer. On the one hand, H. pylori induces macrophage polarization to promote inflammatory responses and eliminate H. pylori; on the other hand, macrophages are key cells in the primary immune response against H. pylori infection [46]. Analytical method validation is the process used to prove that a test method consistently does what it is expected to do, and its purpose is to establish that an accurate, precise, and rugged method has been developed [57]. Further experiments showed that ABCA1 promoted cancer cell migration and invasion. Taken together, the six prognostic genes identified in this study could influence the disease process by promoting macrophage-tumor cell interactions and their interactions with other immune cells.

Previous studies have developed long non-coding RNA (lncRNA) signatures related to H. pylori infection for predicting the prognosis of gastric cancer patients [58]. Zheng et al. integrated multiple gastric cancer cohorts and identified 28 key prognostic genes for evaluating patient risk [59]. In contrast, our model contained only 6 prognostic genes while retaining high classification effectiveness, suggesting greater potential for clinical application.

Although the current model had certain prognostic value, there were also some limitations. Firstly, all the datasets were obtained from public databases, and more sample data from different populations are required to achieve greater generalization. Secondly, the mechanisms through which these model genes affect macrophages in the STAD microenvironment and their role in H. pylori infection have not been validated.
Therefore, in the future, in vivo experiments should be carried out to explore the specific biological functions of these genes. In addition, bias was inevitable owing to the heterogeneity of individual tumors; therefore, large-scale clinical trials are needed to provide individualized treatment for STAD patients.

5. Conclusion

This study analyzed the single-cell profile of gastric cancer and annotated a total of 8 cell clusters, among which T cells and dendritic cells accounted for the highest proportion in tumor tissues and the C4 macrophages were closely associated with the H. pylori infection phenotype. Furthermore, a risk prognostic model related to H. pylori infection was developed based on the DEGs and the module genes related to the C4 subtype. Our model exhibited short- and long-term prognostic value, and patients with a higher RiskScore were prone to activate typical cancer activation-related pathways. A nomogram model was developed to predict the survival probability of STAD patients. The present discoveries provided new insights for the diagnosis and management of patients with STAD.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and material

The datasets generated and/or analyzed during the current study are available in the GEO repository under the accessions GSE66229 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66229) and GSE16279 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16279).

Funding

This work was supported by Shaanxi Provincial Health Research Fund (2021D036).

CRediT authorship contribution statement

Jing Zhou: Writing – review & editing, Writing – original draft, Software, Resources, Project administration, Formal analysis, Data curation, Conceptualization. Li Guo: Writing – review & editing, Writing – original draft, Validation, Software, Resources, Methodology, Investigation.
Yuzhen Wang: Supervision, Resources, Project administration, Investigation, Formal analysis. Lina Li: Validation, Resources, Project administration, Methodology, Formal analysis. Yahuan Guo: Visualization, Validation, Resources, Methodology, Investigation. Lian Duan: Visualization, Software, Project administration, Methodology, Investigation. Mi Jiao: Visualization, Supervision, Project administration, Investigation, Formal analysis, Data curation. Pan Xi: Visualization, Supervision, Resources, Methodology, Formal analysis, Data curation. Pei Wang: Writing – review & editing, Visualization, Validation, Supervision, Software, Funding acquisition, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements