Abstract Septicemia triggers a profound systemic inflammatory response, ultimately leading to cellular senescence. Understanding the mechanisms underlying intercellular interactions associated with sepsis is crucial for developing therapeutic strategies targeting sepsis and its associated complications. Neonatal septicemia (NS) is a critical condition characterized by systemic infection in newborns. Currently, there are no effective indicators for the early identification of sepsis and precise screening of newborns who truly require antibiotic treatment, which results in delayed diagnosis of neonatal sepsis and the overuse of antibiotics. We aimed to elucidate cellular senescence-related genes (CSRGs) in the context of NS and to investigate their potential regulatory networks and immune infiltration patterns. Our analysis identified 391 DEGs, including 301 upregulated and 90 downregulated genes and 15 differentially expressed CSRGs (CSRDEGs), including STAT3, MAPK14, IGFBP7, PPARG, and ETS2. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses revealed enrichment in biological processes, including cellular senescence, and pathways involving PD-L1 expression in cancer. Gene Set Enrichment Analysis highlighted significant pathways, including selenoamino acid metabolism and neutrophil degranulation. Protein–protein interaction network analysis identified eight hub genes: STAT3, MAPK14, CEBPB, TLR2, ETS1, JUNB, PPARG, and MAP3K5. Regulatory network analysis revealed interactions between CSRDEGs and multiple transcription factors/miRNAs. Immune infiltration analysis revealed profound differences in the composition of immune cells between the normal and NS groups. Our study findings offer valuable information for future research on therapeutic targets and diagnostic markers of NS. Keywords: Biomarkers, Immune infiltration, Neonatal sepsis, Disease pathology Subject terms: Molecular biology, Medical research Introduction Neonatal septicemia (NS) is a life-threatening condition that affects newborns and is characterized by systemic infections and substantial morbidity and mortality rates. According to recent epidemiological data, the incidence of NS fluctuates between one and eight cases per 1,000 live births, with mortality rates ranging from 15 to 50%, depending on the clinical setting and region^[36]1. Early diagnosis and timely intervention are essential to improve patient outcomes; however, current diagnostic methods, which include blood cultures and biomarkers such as C-reactive protein and procalcitonin (PCT), frequently exhibit insufficient sensitivity and specificity, leading to delayed or inaccurate diagnoses^[37]2. The early symptoms of neonatal sepsis are typically uncharacteristic and nonspecific^[38]3. Owing to the absence of specific and reliable sensitive diagnostic methods, there is a propensity for the excessive use of broad-spectrum antibiotics or delayed antibiotic therapy, thereby jeopardizing the safety of these fragile lives. The shortcomings of current diagnostic methodologies highlight the urgent need for innovative biomarkers and therapeutic targets to improve the management of NS. Cellular senescence, a state of irreversible cell cycle arrest, has been implicated in many pathological conditions, including cancer, cardiovascular diseases, and infectious diseases^[39]4. Senescent cells display a unique secretory profile, referred to as senescence-associated secretory phenotype, which modulates the tissue microenvironment and influences disease progression^[40]5. Recent studies have revealed that cellular senescence-related genes (CSRGs) play critical roles in immune responses, inflammation, and tissue repair processes, suggesting their potential involvement in the pathophysiology of sepsis and other infectious diseases^[41]6. For example, in adult sepsis, alterations in the expression of senescence markers are associated with disease severity and outcome, indicating that CSRGs may serve as valuable biomarkers and therapeutic targets^[42]7. Given the potential significance of CSRGs in infectious diseases, there is a compelling rationale to investigate their roles in NS. Previous studies have shown that cellular senescence contributes to the dysregulation of the immune response in sepsis, leading to persistent inflammation and immune suppression^[43]8. However, the specific contributions of CSRGs to NS remain unclear. This study aims to explore the differential expression of CSRGs in NS and elucidate their regulatory networks and influence on immune cell infiltration in neonates. Our findings provide novel insights into the molecular mechanisms governing NS and emphasize the potential for CSRGs as both diagnostic biomarkers and therapeutic targets. By advancing our understanding of the role of cellular senescence in NS, this study may aid the development of more effective diagnostic and therapeutic strategies for this devastating condition. Results Study flow chart Figure [44]1 describes the study workflow and analysis. Fig. 1. Fig. 1 [45]Open in a new tab Workflow diagram. CSRGs, cellular senescence-related genes; DEGs, differentially expressed genes; GSEA, Gene Set Enrichment Analysis; CSRDEGs, cellular senescence-related differentially expressed genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein–protein interaction; TF, transcription factor; ROC, receiver operating characteristic; CIBERSORT, Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts. Merging of neonatal sepsis datasets In this study, we used the R package sva to mitigate batch effects in the NS datasets [46]GSE69686 and [47]GSE25504, which resulted in the construction of a combined GEO dataset. Eliminating batch effects in the dataset through the use of distribution boxplots (Fig. [48]2A,B) and PCA (Principal Component Analysis) plots (Fig. [49]2C,D). Fig. 2. [50]Fig. 2 [51]Open in a new tab Batch effects removal of [52]GSE69686 and [53]GSE25504. (A) Before batch processing of the integrated GEO dataset (Combined Datasets) distribution box plot. (B) Post-batch integrated GEO Dataset (Combined Datasets) distribution boxplots. C. PCA plot of the datasets before de-batching. D. PCA map of the combined GEO datasets after batch processing. PCA, principal component analysis; NS, NS. Blue indicates NS data set [54]GSE69686, yellow indicates NS dataset [55]GSE25504. NS, neonatal septicemia. Differentially expressed genes related to cell senescence in the NS The data from the aggregated GEO datasets were classified into NS samples and normal samples. To analyze gene expression within the integrated GEO dataset in the context of sepsis, we examined the differences between NS samples and normal controls. In the combined GEO dataset, 391 differentially expressed genes (DEGs) were identified, meeting the criteria of |logFC|> 0.5 and adj. P < 0.05 threshold. A volcano plot was constructed based on the results of the differential analysis of this dataset (Fig. [56]3A). Fig. 3. [57]Fig. 3 [58]Open in a new tab Differential gene expression analysis. (A) Volcano plot of DEGs in septicemia (NS) samples and control (Normal) samples from the combined GEO datasets. (B) Venn diagram of DEGs and cellular senescence-related genes (CSRGs) in the integrated GEO Datasets (Combined Dataset). (C) Heat map of DEGs related to cellular senescence (CSRDEGs) in the integrated GEO Datasets (Combined Dataset). (D) Chromosomal mapping of CSRDEGs. NS, neonatal septicemia; DEGs, differentially expressed genes; CSRGs, cellular senescence-related genes; CSRDEGs, cellular senescence-related differentially expressed genes. Yellow indicates NS samples, and blue indicates Normal samples. Red indicates high expression, and blue indicates low expression in the heatmap. To derive the CSRDEGs, all DEGs with |logFC|> 0.5 and adj.p were intersected with CSRGs, using a Venn diagram (Fig. [59]3B). Ultimately, 15 CSRDEGs were identified, STAT3, MAPK14, IGFBP7, PPARG, ETS2, MAP3K5, MAP2K6, PRKCD, CEBPB, TLR2, JUNB, CTSD, ARG2, THBS1, and ETS1. Based on the results of the intersection, the expression variations of these CSRDEGs across various sample groups within the combined GEO datasets were meticulously examined using the R package heatmap to produce heatmaps that illustrate the analysis outcomes (Fig. [60]3C). Finally, the chromosomal locations of the 15 CSRDEGs were examined using the R package RCircos, which facilitated the generation of a chromosome localization map (Fig. [61]3D). Chromosomal mapping revealed that several CSRDEGs were predominantly located on chromosomes 3, 4, 6, 11 and 17. Specifically, PPARG and PRKCD were positioned on chromosome 3; IGFBP7 and TLR2 were located on chromosome 4; MAPK14 and MAP3K5 were located on chromosome 6; ETS1 and CTSD were found on chromosome 11; and STAT3 and MAP2K6 were located on chromosome 17 (Table [62]1). Table 1. Characteristics of the GEO Microarray Chip datasets. [63]GSE69686 [64]GSE25504 Platform [65]GPL20292 [66]GPL6947 Species Homo sapiens Homo sapiens Tissue Blood Blood Samples in NS group 64 26 Samples in Normal group 85 37 Reference PMID: 26,052,715 PMID: 25,120,092 PMID: 27,114,524 PMID: 26,484,146 [67]Open in a new tab GEO gene expression omnibus, ns – Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses Using GO and KEGG enrichment analyses, the associations among the 15 CSRGs relevant to NS were explored in terms of biological processes (BP), cellular components (CC), molecular functions (MF), and biological pathways (KEGG), with detailed results presented in Table [68]2. The findings of the enrichment analyses of GO and KEGG are visualized through a bubble chart (Fig. [69]4A).Presentation of the results of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis in network format(Fig. [70]4B–E). Table 2. Results of GO and KEGG Enrichment Analyses of CSRDEGs. Ontology ID Description Gene Ratio Bg Ratio P value p.adjust Q value BP GO:0,090,398 Cellular senescence 5/15 91/18,800 6.87205e−09 5.78093e−06 2.27961e−06 BP GO:0,030,099 Myeloid cell differentiation 7/15 391/18,800 8.89374e−09 5.78093e−06 2.27961e−06 BP GO:0,043,405 Regulation of MAP kinase activity 5/15 183/18,800 2.29584e−07 9.94862e−05 3.92306e−05 BP GO:0,050,727 Regulation of inflammatory response 6/15 394/18,800 3.47919e−07 0.000113074 4.45885e−05 BP GO:0,002,573 Myeloid leukocyte differentiation 5/15 210/18,800 4.54619e−07 0.000118201 4.66104e−05 BP GO:0,060,397 Growth hormone receptor signaling pathway via JAK-STAT 1/15 10/18,800 0.007952034 0.034690081 0.013679409 CC GO:0,090,575 RNA polymerase II transcription regulator complex 4/15 230/19,594 2.28076e−05 0.001168539 0.000804258 CC GO:0,034,774 Secretory granule lumen 4/15 322/19,594 8.46819e−05 0.001168539 0.000804258 CC GO:0,060,205 Cytoplasmic vesicle lumen 4/15 325/19,594 8.77781e−05 0.001168539 0.000804258 CC GO:0,031,983 Vesicle lumen 4/15 327/19,594 8.98876e−05 0.001168539 0.000804258 CC GO:0,005,667 Transcription regulator complex 4/15 483/19,594 0.000401016 0.004170563 0.002870428 MF GO:0,061,629 RNA polymerase II-specific DNA-binding transcription factor binding 6/15 348/18,410 1.89412e−07 2.08353e−05 6.97832e−06 MF GO:0,140,297 DNA-binding transcription factor binding 6/15 470/18,410 1.10372e−06 6.07046e−05 2.03317e−05 MF GO:0,004,708 MAP kinase kinase activity 2/15 18/18,410 9.40925e−05 0.003450057 0.001155522 MF GO:0,106,310 Protein serine kinase activity 4/15 360/18,410 0.000165479 0.00455066 0.001524144 MF GO:0,019,903 Protein phosphatase binding 3/15 148/18,410 0.00021579 0.004747384 0.001590033 KEGG hsa05417 Lipid and atherosclerosis 6/14 215/8164 7.83359e−07 4.67497e−05 2.54535e−05 KEGG hsa04668 TNF signaling pathway 5/14 112/8164 8.06029e−07 4.67497e−05 2.54535e−05 KEGG hsa05235 PD-L1 expression and PD-1 checkpoint pathway in cancer 4/14 89/8164 1.21543e−05 0.000469966 0.00025588 KEGG hsa05145 Toxoplasmosis 4/14 112/8164 3.02235e−05 0.000876482 0.000477213 KEGG hsa04935 Growth hormone synthesis, secretion and action 4/14 120/8164 3.96597e−05 0.000920106 0.000500965 KEGG hsa04657 IL-17 signaling pathway 2/14 94/8164 0.010908783 0.036154823 0.019685021 [71]Open in a new tab GO gene ontology, BP biological process, CC cellular component, MF molecular function, KEGG Kyoto encyclopedia of genes and genomes, CSRDEGs cellular senescence-related differentially expressed genes. Fig. 4. [72]Fig. 4 [73]Open in a new tab GO and KEGG enrichment analysis for CSRDEGs. Through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, we further explored the biological processes (BP), cellular components (CC), molecular functions (MF), and biological pathways (KEGG) associated with 15 cell senescence-related differentially expressed genes (CSRDEGs) in the context of neonatal sepsis (NS). Enrichment analysis of these 15 CSRDEGs yielded specific results, as shown in Table [74]2. The results revealed that these genes were primarily enriched in biological processes (BP) such as cellular senescence, regulation of MAP kinase activity, myeloid leukocyte differentiation, regulation of the inflammatory response, and the growth hormone receptor signaling pathway via JAK-STAT. They were also enriched in cellular components (CC) including RNA polymerase II transcription regulator complex, secretory granule lumen, cytoplasmic vesicle lumen, vesicle lumen, and transcription regulator complex. Regarding molecular functions (MF), the enrichment included RNA polymerase II-specific DNA-binding transcription factor binding, DNA-binding transcription factor binding, MAP kinase kinase activity, protein serine kinase activity, and protein phosphatase binding. Additionally, the genes were enriched in biological pathways (KEGG), including lipid and atherosclerosis, PD-L1 expression and PD-1 checkpoint pathway in cancer, TNF signaling pathway, toxoplasmosis, growth hormone synthesis, secretion and action, and IL-17 signaling pathway. The results of the GO and KEGG pathway enrichment analyses are visualized in a bubble chart (A). Gene set enrichment analysis (GSEA) To ascertain the conformity of the GEO dataset (Combined Datasets) with respect to the influence of gene expression levels on NS, GSEA was conducted, integrating the expression of all genes involved in BP within the dataset (Fig. [75]5A).And the detailed results are shown in Table [76]3. Figure [77]5B–E show that GSEA reveals significant gene enrichment in the following categories: Selenoamino Acid Metabolism (B), Neutrophil Degranulation (C), Metabolism of RNA (D), and Metabolism of Amino Acids and Derivatives (E). Fig. 5. [78]Fig. 5 [79]Open in a new tab GSEA for the combined GEO datasets. To determine the impact of all gene expression levels in the integrated GEO dataset (Combined Datasets) on neonatal sepsis (NS), we conducted Gene Set Enrichment Analysis (GSEA) to investigate the relationship between the expression of all genes in the integrated GEO dataset and the biological processes, affected cellular components, and molecular functions they are involved in (A), with specific results detailed in Table [80]3. The results indicated that all genes in the integrated GEO dataset (Combined Datasets) were significantly enriched in biologically relevant functions and signaling pathways, including Selenoamino Acid Metabolism (B), Neutrophil Degranulation (C), Metabolism of RNA (D), and Metabolism of Amino Acids and Derivatives (E). Table 3. Results of GSEA for the combined datasets. ID Set Size Enrichment Score NES P value p.adjust q value REACTOME_SELENOAMINO_ACID_METABOLISM REACTOME_SELENOAMINO_ACID_METABOLISM 83 0.772020154 2.891204038 1.00e−10 1.14e−08 REACTOME_NEUTROPHIL_DEGRANULATION REACTOME_NEUTROPHIL_DEGRANULATION 383 0.680330109 2.619954525 1.00e−10 1.14e−08 REACTOME_METABOLISM_OF_RNA REACTOME_METABOLISM_OF_RNA 495 0.522819036 2.38028149 1.00e−10 1.14e−08 REACTOME_METABOLISM_OF_AMINO_ACIDS_AND_DERIVATIVES REACTOME_METABOLISM_OF_AMINO_ACIDS_AND_DERIVATIVES 250 0.455746719 1.967564627 3.24e−10 3.36e−08 REACTOME_SIGNALING_BY_INTERLEUKINS REACTOME_SIGNALING_BY_INTERLEUKINS 357 0.485613461 1.861779367 2.75e−09 2.39e−07 REACTOME_INTERLEUKIN_1_FAMILY_SIGNALING REACTOME_INTERLEUKIN_1_FAMILY_SIGNALING 121 0.558216915 1.922232058 2.33e−06 0.000112592 REACTOME_COSTIMULATION_BY_THE_CD28_FAMILY REACTOME_COSTIMULATION_BY_THE_CD28_FAMILY 55 0.607779554 2.145728987 3.36e−06 0.000149073 KEGG_STARCH_AND_SUCROSE_METABOLISM KEGG_STARCH_AND_SUCROSE_METABOLISM 26 0.769200975 2.064260912 1.42e−05 0.000531023 REACTOME_METABOLISM_OF_LIPIDS REACTOME_METABOLISM_OF_LIPIDS 474 0.389511408 1.513189834 2.24e−05 0.000785358 REACTOME_INTERLEUKIN_4_AND_INTERLEUKIN_13_SIGNALING REACTOME_INTERLEUKIN_4_AND_INTERLEUKIN_13_SIGNALING 91 0.567115141 1.880050909 2.90e−05 0.000983446 BIOCARTA_IL1R_PATHWAY BIOCARTA_IL1R_PATHWAY 23 0.776695565 2.048122401 4.16e−05 0.001273441 WP_SIGNAL_TRANSDUCTION_THROUGH_IL1R WP_SIGNAL_TRANSDUCTION_THROUGH_IL1R 25 0.749596701 2.003179444 5.96e−05 0.001773345 REACTOME_INTERLEUKIN_1_SIGNALING REACTOME_INTERLEUKIN_1_SIGNALING 98 0.540107568 1.81069726 7.85e−05 0.002158606 PID_IL1_PATHWAY PID_IL1_PATHWAY 26 0.725129571 1.945988991 0.000139077 0.003472124 NABA_ECM_AFFILIATED NABA_ECM_AFFILIATED 83 0.55804809 1.835226291 0.00015728 0.003795691 BIOCARTA_IL17_PATHWAY BIOCARTA_IL17_PATHWAY 11 0.836089474 1.958474087 0.000172686 0.00412169 PID_IL8_CXCR1_PATHWAY PID_IL8_CXCR1_PATHWAY 20 0.7773685 1.984546617 0.000247197 0.005651715 PID_IL8_CXCR2_PATHWAY PID_IL8_CXCR2_PATHWAY 27 0.716271318 1.928189902 0.000253198 0.005728607 BIOCARTA_NO2IL12_PATHWAY BIOCARTA_NO2IL12_PATHWAY 14 0.791224293 2.001387677 0.00025959 0.005762246 REACTOME_METABOLISM_OF_CARBOHYDRATES REACTOME_METABOLISM_OF_CARBOHYDRATES 195 0.45793099 1.662635052 0.000259991 0.005762246 WP_CHOLESTEROL_METABOLISM_WITH_BLOCH_AND_KANDUTSCHRUSSELL_PATHWAYS WP_CHOLESTEROL_METABOLISM_WITH_BLOCH_AND_KANDUTSCHRUSSELL_PATHWAYS 38 0.652566577 1.890098992 0.000368999 0.007490329 REACTOME_INTERLEUKIN_10_SIGNALING REACTOME_INTERLEUKIN_10_SIGNALING 33 0.665735351 1.863220862 0.000420897 0.008387044 PID_IL6_7_PATHWAY PID_IL6_7_PATHWAY 44 0.606999254 1.797108281 0.000538896 0.01035824 KEGG_GLYCEROPHOSPHOLIPID_METABOLISM KEGG_GLYCEROPHOSPHOLIPID_METABOLISM 44 0.598932723 1.773226159 0.000831556 0.014110458 KEGG_GALACTOSE_METABOLISM KEGG_GALACTOSE_METABOLISM 21 0.729930291 1.890105844 0.001188736 0.018445194 KEGG_GLYCOLYSIS_GLUCONEOGENESIS KEGG_GLYCOLYSIS_GLUCONEOGENESIS 47 0.582545195 1.748863452 0.001915364 0.026839813 [81]Open in a new tab GSEA gene set enrichment analysis. PPI network construction and screening of key genes A PPI analysis was conducted, leading to the construction of a PPI network comprising 15 genes considered CSRDEGs, using the STRING database (Fig. [82]6A). According to the PPI network, the 14 CSRDEGs were JUNB, MAP3K5, MAP2K6, STAT3, THBS1, TLR2, PPARG, ARG2, PRKCD, MAPK14, CEBPB, CTSD, ETS1, and ETS2. Utilizing the CytoHubba plugin of Cytoscape software, five algorithms were employed to identify differentially expressed genes associated with cellular senescence, followed by the construction of a protein–protein interaction network. The algorithms used are as follows: MCC (Fig. [83]6C), NewSun Focus (Fig. [84]6D), Degree (Fig. [85]6E), EPC (Fig. [86]6F), and Closeness (Fig. [87]6G). Finally, the intersections were determined, and a Venn diagram (Fig. [88]6B) was generated.The intersecting genes identified by the algorithms constituted the hub genes associated with NS, including the following eight hub genes: STAT3, MAPK14, CEBPB, TLR2, ETS1, JUNB, PPARG, and MAP3K5. Fig. 6. [89]Fig. 6 [90]Open in a new tab PPI network and hub gene analysis. (A) The STRING database was used to determine the CSRDEG-associated PPI network. (B) CytoHubba plug-in of the algorithm of cells of the top 10 CSRDEGs is shown as a Venn diagram. Five algorithms were used for (C) to (F). CytoHubba plug-in calculated cells of the top 10 CSRDEGs in the PPI network, including the MCC (C), newsun focus (D), Degree (E), EPC (F), and Closeness (G). NS, neonatal septicemia; PPI, protein–protein interactions; CSRDEGs, cellular senescence-related differentially expressed genes. Construction of the control network To begin with, access to the ChIPBase database facilitated the examination of cellular genetic variations associated with senescence in conjunction with transcription factors (TFs). This led to the construction of an mRNA-TF regulatory network, which was visualized using Cytoscape (Fig. [91]7A). This network comprises three genetic variations related to cellular senescence among the identified CSRDEGs and 34 TFs; for specific details, see Supplementary Table S2. Fig. 7. [92]Fig. 7 [93]Open in a new tab Regulatory network of CSRDEGs. (A) CSRDEGs–TF regulatory network. (B) CSRDEGs mRNA–miRNA regulatory network. CSRDEGs, cellular senescence-related differentially expressed genes; TF: transcription factor. Blue indicates mRNA, green indicates TF, and pink indicates miRNA. Subsequently, the StarBase v3.0, database ([94]https://starbase.sysu.edu.cn/) was used to investigate the miRNA variations associated with cellular senescence and CSRDEGs. An mRNA–miRNA regulatory network was constructed and visualized using Cytoscape (Fig. [95]7B). This network included four CSRDEGs and 15 miRNAs (Supplementary Table 3). Validation of differentially expressed key genes and ROC curve analysis To investigate the differences in the expression of hub genes within the combined GEO datasets, a comparative grouping chart (Fig. [96]8A) was constructed that illustrates eight hub genes in the integrated GEO dataset, contrasting the NS group with the normal group. The analysis revealed significant differences in expression between high and low expression groups. The ROC curve (Fig. [97]8B–D) indicated that the expression levels of the following hub genes were accurately determined (0.7 < area under the curve (AUC) < 0.9) in the classification of sepsis (NS) and normal groups: STAT3, MAPK14, CEBPB, TLR2, ETS1, JUNB, PPARG, and MAP3K5. Fig. 8. [98]Fig. 8 [99]Open in a new tab Differential expression validation and ROC curve analysis. (A) Group comparison plots of hub genes in the NS and Normal groups of the Combined GEO Datasets. (B–D) Hub Genes (hub genes) STAT3, MAPK14, CEBPB (B), TLR2, JUNB, ETS1 (C), MAP3K5, and PPARG (D) in the integration of the GEO dataset (Combined Datasets) in the ROC curve. *p value < 0.05, statistically significant; **p value < 0.01, highly statistically significant; ***p value < 0.001, very highly statistically significant. AUC > 0.5 indicates that the expression of the molecule tends to promote the occurrence of the event, and the closer the AUC is to 1, the better the diagnostic effect. AUC between 0.5 and 0.7 had low accuracy, AUC between 0.7 and 0.9 had moderate accuracy, and AUC above 0.9 had high accuracy. NS, neonatal septicemia; CSRDEGs, cellular senescence-related differentially expressed genes; ROC, receiver operating characteristic; AUC, area under the curve; TPR, true positive rate; FPR, false positive rate; GEO, Gene Expression Omnibus. Blue indicates Normal group, and yellow indicates NS group. Immune infiltration analysis using cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT) By employing the integrated combined GEO datasets, we utilized the CIBERSORT algorithm to calculate the abundance of 22 types of immune cells within the dataset and conducted a statistical analysis of the data, revealing that differences in the expression of 12 immune cell types in both the NS and the normal groups were statistically significant (p < 0.05) (Fig. [100]9). Notably, activated natural killer cells exhibited the strongest positive correlation with CD8^+ T cells (r = 0.43). Conversely, the most pronounced negative correlation was observed between regulatory T cells (Tregs) and resting memory CD4^+ T cells (r value = − 0.76). Moreover, the correlation bubble plot demonstrated a significant positive correlation between ETS1 and naïve CD4^+ T cells (r > 0.0, p < 0.05), in addition to a considerable negative correlation between TLR2 and resting memory CD4^+ T cells (r < 0.0, p < 0.05). Fig. 9. [101]Fig. 9 [102]Open in a new tab Combined dataset immune infiltration analysis using CIBERSORT. Using the integrated GEO dataset (Combined Datasets), we calculated the infiltration abundance of 22 immune cell types with the CIBERSORT algorithm. Based on the results of the immune infiltration analysis, we created a bar chart to illustrate the proportions of immune cells within the integrated GEO dataset (A). We then displayed the differences in immune cell infiltration abundance between the neonatal sepsis (NS) group and the normal (Normal) group in the integrated GEO dataset through a comparative chart (B). The results indicated that the expression levels of 12 immune cell types exhibited statistically significant differences between the neonatal sepsis (NS) group and the normal (Normal) group in the integrated GEO dataset (p value < 0.05). These cell types included activated dendritic cells, resting dendritic cells, M0 macrophages, neutrophils, activated NK cells, plasma cells, resting memory CD4 T cells, naive CD4 T cells, CD8 T cells, and regulatory T cells (Tregs). Subsequently, we presented a correlation heatmap showing the correlation results of the infiltration abundance of 10 immune cell types (C). The results revealed that activated NK cells and CD8 T cells displayed the strongest positive correlation (r value = 0.43), while regulatory T cells (Tregs) and resting memory CD4 T cells showed the strongest negative correlation (r value = − 0.76). Finally, we illustrated the correlation between hub genes and immune cell infiltration abundance in the integrated GEO dataset using a correlation bubble chart (D). The correlation bubble chart indicated that the hub gene ETS1 demonstrated a significant positive correlation with naive CD4 T cells (r value > 0.0, p value < 0.05), whereas the hub gene TLR2 exhibited a significant negative correlation with resting memory CD4 T cells (r value < 0.0, p value < 0.05). Discussion NS is a grave and potentially fatal condition that affects newborns and is characterized by systemic infection and inflammation. NS is a significant contributor to neonatal morbidity and mortality, particularly in developing nations, and leads to severe complications, including septic shock, multiorgan failure, and neurodevelopmental impairments^[103]9. Consequently, deepening our understanding of the molecular mechanisms that underlie NS is necessary to formulate precise diagnostic tools and effective therapeutic strategies. In this study, we focused on the CSRDEGs in NS and explored their potential regulatory networks and patterns of immune infiltration. Cell senescence is a state of irreversible cell cycle arrest that may be induced by various stressors and is associated with a multitude of diseases, including infectious and inflammatory disorders^[104]10. Our findings indicate that these genes are markedly differentially expressed in neonatal sepsis and play crucial roles in vital biological processes and pathways. Furthermore, established PPI networks and regulatory factors may function as novel biomarkers or therapeutic targets. Analysis of immune infiltration additionally uncovered significant disparities in immune cell composition between healthy and septic samples, offering valuable information on the immune landscape of NS and its possible ramifications for disease progression and treatment response. ROC curve analysis revealed that core gene expression (STAT3, MAPK14, CEBPB, TLR2, ETS1, JUNB, PPARG, and MAP3K5) exhibited a notable degree of accuracy (0.7 < AUC < 0.9) in differentiating between the neonatal sepsis and normal groups. STAT3 (AUC = 0.875) and MAPK14 (AUC = 0.857) showed exceptional classification accuracy, indicating their potentially critical roles in diagnosing sepsis.These results are comparable to commonly used biomarkers such as C-reactive protein (CRP) and procalcitonin (PCT), which have AUC values ranging from 0.70 to 0.85 in the diagnosis of neonatal sepsis, and may even be slightly superior. Additionally, genes such as CEBPB (AUC = 0.799), ETS1 (AUC = 0.789), and TLR2 (AUC = 0.765) exhibited notable classification abilities. These genes maintain relatively high sensitivity and specificity, thereby reinforcing their possible applications in the context of sepsis. Using ROC curve analysis, this study confirms the significant potential of eight hub genes in differentiating between septic and healthy populations. Particularly, the high AUC values of STAT3 and MAPK14 suggest that these genes could function as novel biomarkers, thereby establishing a foundation for clinical applications. Our study findings provide a solid foundation for future research and encourage further exploration of the roles that these genes might play in sepsis. Functional enrichment analysis revealed that these CSRDEGs were significantly involved in biological processes, such as cellular senescence, regulation of MAP kinase activity, and myeloid leukocyte differentiation. Moreover, KEGG pathway analysis revealed their engagement in essential pathways, including lipid metabolism and atherosclerosis, as well as PD-L1 expression and the PD-1 checkpoint pathway in cancer. The cellular senescence pathway is crucial, as it is associated with the cessation of cell division, which could influence the immune response and inflammation, key factors in the pathophysiology of sepsis^[105]10. MAP kinase (MAPK) activity regulation is another significant process as MAPKs are involved in transmitting extracellular signals to the nucleus, affecting gene expression and cellular responses, which are vital in the inflammatory response observed in sepsis^[106]11. In NS, the upregulation of MAPK14 suggests its involvement in the inflammatory cascade and cellular stress responses that are characteristic of the disease. The activation of the MAPK pathway could lead to the production of cytokines such as TNF-α and IL-1β, which are critical mediators of the septic response. Furthermore, the differential expression of MAP3K5 suggests its involvement in the inflammatory and apoptotic pathways characteristic of the disease^[107]12,[108]13. Myeloid leukocyte differentiation is essential for the immune system to respond to infections, and its dysregulation can lead to impaired immune responses in patients with sepsis^[109]14. The involvement of CSRDEGs in lipid and atherosclerosis pathways suggests a potential link between lipid metabolism and sepsis, which could influence the inflammatory state and immune response in NS^[110]15. The initiation of atherosclerosis is marked by infiltration of inflammatory cells into the intima, during which activated endothelial cells express leukocyte adhesion molecules that capture circulating monocytes. This intricate process catalyzes the synthesis of pro-inflammatory mediators, reactive oxygen species, and tissue factor procoagulants, thereby exacerbating localized inflammation and promoting thrombotic complications. T-helper cells type 1 (TH1) release the hallmark cytokine interferon-gamma (IFN-γ), which serves to activate vascular wall cells and macrophages, thus amplifying and perpetuating the inflammatory response within the intima. Meanwhile, regulatory T cells (Treg) produce interleukin-10 (IL-10) and transforming growth factor-beta (TGF-β), both of which are recognized for their anti-inflammatory properties. The primary role of PD-1 is to constrain the activity of peripheral T cells during the inflammatory response to infections and alleviate autoimmune disorders. The expression of PD-L1 and the PD-1 checkpoint pathway are crucial for regulating the immune system, and their dysregulation may result in immune exhaustion, a phenomenon frequently encountered in sepsis^[111]16. GSEA further supported these findings by showing significant enrichment in pathways such as selenoamino acid metabolism, neutrophil degranulation, and RNA metabolism. Selenoamino acid metabolism is important for antioxidant defense, which could be compromised in sepsis, leading to increased oxidative stress and cellular damage^[112]17. Neutrophil degranulation plays a crucial role in the innate immune response, and its dysregulation can exacerbate the inflammatory response observed in sepsis^[113]8,[114]18. RNA metabolism is crucial for maintaining cellular function and stress responses, and its alteration can affect the overall cellular response to septic conditions^[115]19. Our PPI network analysis revealed eight significant hub genes (STAT3, MAPK14, CEBPB, TLR2, ETS1, JUNB, PPARG, and MAP3K5) that are central to the NS regulatory network. In-depth mechanistic analysis revealed that STAT3, as a core transcription factor in the JAK-STAT signaling pathway, plays a crucial role in inflammation and immune regulation. It promotes the expression of pro-inflammatory factors, such as IL-6 and TNF-α, thereby exacerbating the inflammatory storm associated with sepsis (reference). MAPK14 (p38 MAPK), as a stress kinase, regulates cytokine synthesis and apoptosis, participating in pathogen-induced immune responses. CEBPB regulates myeloid cell differentiation and inflammatory factor production, significantly impacting immune cell function and the inflammatory response. TLR2, as a pattern recognition receptor, mediates the recognition of pathogen-associated molecular patterns (PAMPs), activates downstream NF-κB and MAPK signaling pathways, and promotes the release of inflammatory factors. Transcription factors such as ETS1 and JUNB are involved in T cell differentiation and the transcriptional regulation of inflammatory genes, while PPARG influences the immune microenvironment by regulating lipid metabolism and inhibiting inflammatory responses. MAP3K5 (ASK1) plays a role in cellular stress and apoptosis, and its activation leads to immune cell dysfunction. These hub genes collaboratively drive the onset and progression of neonatal sepsis by regulating inflammation, immune cell differentiation, and cellular senescence. In line with our findings, these genes and their regulatory networks not only help to uncover the molecular mechanisms of neonatal sepsis but also provide new insights for early diagnosis and targeted therapy of the disease. Immune infiltration analysis revealed notable disparities in the abundance of 12 distinct immune cell types between the NS and normal groups. In particular, the number of activated dendritic cells (DCs) showed a marked increase in the NS group. DCs serve as essential antigen-presenting cells that link innate and adaptive immunity through the capture, processing, and presentation of antigens to T cells, thereby initiating and modulating immune responses^[116]20. The elevated proportion of activated DCs in patients with NS suggests enhanced antigen-presenting activity, which may be a response to the systemic infection characteristic of sepsis. Furthermore, our analysis identified a positive correlation between ETS1 and naïve CD4^+ T cells. ETS1 functions as a transcription factor that plays a pivotal role in the regulation of T cell development and differentiation^[117]21. In NS, the upregulation of ETS1 suggests a role in the inflammatory and immune regulatory pathways of the disease. ETS1 could influence the expression of cytokines and other inflammatory mediators, contributing to the pathogenesis of septicemia^[118]22,[119]23. Conversely, TLR2 levels were negatively correlated with resting memory CD4^+ T cells. TLR2 is a pattern recognition receptor that plays a pivotal role in the recognition of pathogen-associated molecular patterns and the induction of inflammatory responses^[120]24. Increased expression of ETS1 in conjunction with naïve CD4 + T cells suggests its potential role in maintaining or expanding the naïve T cell pool, which is essential for mounting effective immune responses against new antigens encountered during sepsis. The inverse relationship between TLR2 and resting memory CD4 + T cells may indicate a shift from a memory to an active immune state, reflecting heightened immune activation and inflammation observed in sepsis. These findings highlight the intricate interplay between various types of immune cells and their regulatory genes in the NS. The differential expression and correlation of key immune-related genes, such as ETS1 and TLR2, with specific immune cell subsets highlight the potential mechanisms driving immune dysregulation observed in NS. Understanding these interactions will provide valuable insights into the pathophysiology of sepsis and may inform the development of targeted immunomodulatory therapies to improve clinical outcomes in neonates with sepsis. Immune infiltration analysis revealed notable discrepancies in the composition and abundance of immune cell populations between septic and normal samples. Notably, activated DCs were significantly different, indicating their potential role in the immune dysregulation observed in NS. Correlation analysis showed that ETS1 was positively associated with naïve CD4^+ T cells, whereas TLR2 was negatively associated with dormant memory CD4^+ T cells, suggesting a complex interplay between these hub genes and immune cell types in NS^[121]25. Limitations of the study Despite its comprehensive analysis and significant findings, this study has several limitations. First, the study relied solely on bioinformatic approaches without incorporating wet lab experiments, which could provide more direct evidence to support the computational predictions. Second, although the sample size was comparatively substantial, it may have been insufficient to comprehensively encompass the diversity of NS. Third, the lack of clinical affirmation means that the diagnostic and therapeutic potential of the identified hub genes remain speculative until further clinical trials are conducted. Furthermore, the amalgamation of diverse datasets, while enhancing the reliability of the findings, presents the possibility of batch effects that may not be entirely eliminated, even with the application of advanced normalization methodologies. Conclusions This investigation effectively identified CSRDEGs in NS and elucidated their potential regulatory networks and patterns of immune infiltration, highlighting several key biological processes and pathways significantly associated with NS. The identified hub genes exhibited remarkable diagnostic efficacy, indicating their promise as biomarkers for early diagnosis and targets for therapeutic intervention. Future research should concentrate on substantiating these findings through clinical trials and wet lab experiments to fully harness their potential for clinical applications. The insights gained from this study will pave the way for further targeted and effective interventions in the management of NS and provide a comprehensive analysis of the role of CSRDEGs in NS, highlighting their involvement in critical biological processes and pathways. These findings provide novel insights into the molecular mechanisms underlying NS and suggest promising biomarkers and therapeutic targets for this devastating condition. Methods Database sources We download NS datasets [122]GSE69686^[123]26 and [124]GSE25504^[125]27,[126]28 from GEO. All were human samples, and the tissue of origin was blood. The chip platform used for the [127]GSE69686 dataset was [128]GPL20292, and [129]GPL6947 was used for the [130]GSE25504 dataset; detailed information is presented in Table [131]1. [132]GSE69686 included data corresponding to 64 septicemia (NS) samples and 85 normal (Normal) samples; [133]GSE25504 included data corresponding to 26 NS samples and 37 Normal samples. All NS and normal samples (Normal) were integrated into this study. GeneCards^[134]29 ([135]https://www.genecards.org/) is a database that hosts information regarding human genes, including cellular aging- and cellular senescence-related genes. After using the term “Cellular Senescence” as a search keyword and filtering only CSRGs with the tags, “Protein Coding” and “Relevance Score > 2,” a total of 469 CSRGs were identified after merging and duplicate removal; specific details are presented in Supplementary Table S1. The datasets [136]GSE69686 and [137]GSE25504 were batch-corrected using the R package sva7 (Version 3.50.0) to generate the integrated GEO dataset (Combined Datasets). This integrated GEO dataset (Combined Datasets) includes 90 sepsis (NS) samples and 122 normal (Normal) samples. Subsequently, the integrated GEO dataset (Combined Datasets) was normalized using the R package limma8 R^[138]1–[139]4(Version 3.58.1), which included probe annotation and other normalization procedures. Principal component analysis (PCA) was then performed on the expression matrices both before and after batch effect removal to validate the effectiveness of the batch correction. Principal component analysis (PCA) is a dimensionality reduction technique that extracts feature vectors (components) from high-dimensional data, transforms them into lower-dimensional data, and visualizes these features in two-dimensional or three-dimensional plots. DEGs associated with cell senescence in NS Based on the sample grouping of the integrated GEO dataset (Combined Datasets), the samples were classified into sepsis (NS) samples and normal (Normal) samples. Differential expression analysis of genes within both the sepsis (NS) and normal (Normal) samples was performed using the R package limma8 (v.3.58.1)^[140]30.The thresholds for identifying differentially expressed genes (DEGs) were set at |logFC|> 0.5 and adj. p < 0.05. Specifically, genes with logFC > 0.5 and adj. p < 0.05 were classified as up-regulated genes, while those with logFC < − 0.5 and adj. p < 0.05 were categorized as down-regulated genes. The p-value correction method employed was the Benjamini-Hochberg (BH) procedure. The results of the differential analysis were visualized using a volcano plot generated by the R package ggplot2 (Version 3.4.4). To identify cellular senescence-related differentially expressed genes (CSRDEGs) associated with neonatal sepsis (NS), the differentially expressed genes (DEGs) obtained from the integrated GEO dataset (Combined Datasets), which met the criteria of |logFC|> 0.5 and adj. p < 0.05, were intersected with cellular senescence-related genes (CSRGs). A Venn diagram was created to illustrate this intersection, leading to the identification of CSRDEGs. Additionally, a heatmap illustrating these genes was generated using the R package pheatmap (Version 1.0.12), and a chromosome localization diagram was constructed using the R package RCircos10 (Version 1.2.2). GO and KEGG pathway enrichment analysis Gene Ontology (GO) analysis^[141]5 is a widely used method for large-scale functional enrichment studies, encompassing Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) The Kyoto Encyclopedia of Genes and Genomes (KEGG)^[142]6 is a well-established database that provides information on genomes, biological pathways, diseases, and drugs. We conducted GO and KEGG pathway enrichment analyses on the cellular senescence-related differentially expressed genes (CSRDEGs) using the R package clusterProfiler (Version 4.10.0). The criteria for entry selection were set at adj. p < 0.05 and an FDR value (q-value) < 0.25, with the p-value correction method being Benjamini-Hochberg (BH). Finally, the results of the KEGG pathway enrichment analysis were visualized through pathway diagrams generated using the R package Pathview (Version 1.42.0). GSEA Gene Set Enrichment Analysis (GSEA)^[143]31 is a method used to evaluate the distribution trends of genes within predefined gene sets, ranked according to their correlation with a phenotype, in order to determine their contribution to that phenotype. In this study, genes from the integrated GEO dataset (Combined Datasets) were initially sorted based on logFC values. Subsequently, GSEA was performed on all genes in the integrated GEO dataset using the R package clusterProfiler (Version 4.10.0). The parameters for the GSEA were as follows: the seed was set to 2022, the number of permutations was set to 1000, and each gene set included a minimum of 10 genes and a maximum of 500 genes. Gene sets were sourced from the Molecular Signatures Database (MSigDB) as c2.cp.all.v2022.1.Hs.symbols.gmt [All Canonical Pathways] (3050). The selection criteria for GSEA were established as adj. p < 0.05 and an FDR value (q-value) < 0.25, with the p-value correction method being Benjamini–Hochberg (BH). Protein–protein interaction network and screening of key genes The protein–protein interaction (PPI) network helps identify proteins that engage in interactions with one another, serving a crucial function in biological signaling and the modulation of gene expression. In this study, the STRING database was used to construct the PPI network. We selected PPI interactions in a network that incorporates interactions with other genes for subsequent analysis. Within the PPI network, the scores of CSRDEGs were computed, leading to the selection of the top 10 CSRDEGs based on these scores. Ultimately, the genes identified through five distinct algorithms were compared and analyzed using a Venn diagram. The genes at the intersection of these algorithms were recognized as hub genes associated with cellular senescence. Construction of a regulatory network TFs govern the expression of CSRGs by engaging with CSRDEGs during the post-transcriptional phase. By leveraging the information hosted on the ChIPBase database ([144]http://rna.sysu.edu.cn/chipbase/)^[145]7, we retrieved the TFs of interest and analyzed their regulatory functions in the context of genetic variations associated with cellular aging, i.e., these were our CSRDEGs. The Cytoscape^[146]8 software was used to illustrate the mRNA-TF Regulatory Network. Differential expression verification and receiver operating curve analysis of key genes To further investigate the expression variations in hub genes between the NS and Normal groups within the integrated GEO datasets, a comparative map was constructed based on the expression of the hub genes identified in the analysis above. Subsequently, the R package pROC^[147]32 (Version 1.18.5) was used to generate the ROC curve for the hub genes and calculate the AUC value. Immune infiltration analysis of neonatal sepsis (CIBERSORT) CIBERSORT^[148]26 is a method grounded in the principles of linear support vector regression, designed to perform deconvolution on transcriptome expression matrices and estimate the composition and abundance of immune cells within mixed cell populations. By employing the CIBERSORT algorithm in conjunction with the LM22 feature gene matrix and filtering out data with immune cell enrichment scores greater than zero, we obtained specific results for the immune cell infiltration matrix in the integrated GEO dataset (Combined Datasets), along with a bar chart to visualize the proportions. Subsequently, group comparison plots were generated using the R package ggplot2 (Version 3.4.4) to illustrate the expression differences of LM22 immune cells between the Sepsis (NS) and Normal (Normal) groups within the integrated GEO dataset (Combined Datasets). Immune cells exhibiting significant differences between the two groups were selected for further analysis. The correlation among immune cells was assessed using the Spearman algorithm, and a correlation heatmap was created with the R package pheatmap (Version 1.0.12) to present the results of the immune cell correlation analysis. Additionally, the correlation between Hub Genes and immune cells was calculated based on the Spearman algorithm, retaining results with a p-value < 0.05. A correlation bubble plot was subsequently generated using the R package ggplot2 (Version 3.4.4) to visualize the relationship between Hub Genes and immune cells. The immune cell infiltration analysis revealed significant differences in the abundance of 12 distinct immune cell types between the NS group and the normal group. During the data processing phase, we effectively eliminated batch effects across different datasets using the sva package, thereby minimizing systematic bias in subsequent analyses. However, it is important to note that CIBERSORT analysis relies on a reference immune signature matrix, which may not fully capture the dynamics and developmental characteristics of the neonatal immune system. Regardless, we employed CIBERSORT for this preliminary exploration of immune-related mechanisms. Given the constraints of time and resources, we will consider incorporating additional analytical methods in future studies. Statistical analysis All data processing and analyses in this study were conducted using R software (v.4.2.2). For the comparative assessment of continuous variables between the two groups, the statistical significance of normally distributed variables was determined using the independent Student’s t-test unless otherwise indicated. The Mann–Whitney U test (Wilcoxon rank sum test) was used to examine differences between variables that did not conform to a normal distribution. The Kruskal–Wallis test was used to compare three or more groups. Spearman’s correlation analysis was performed to calculate the correlation coefficients between different molecules. Author contributions Conception and design, JZ and SJ; Data analysis, YL and YP; Statistical evaluation, HH and RH; Manuscript writing, JZ; Review and editing, XZ and YL JZ: Jenny Zhou; HH: Huiyi Huo; RH: Ruth He; YL: Yongxue Lu; YP: Yunju Peng; XZ: Xiaoping Zou; SJ: Suhua Jiang. Data availability The datasets ([149]GSE69686 and [150]GSE25504) that support the conclusions of this article can be found in the GEO database ([151]https://www.ncbi.nlm.nih.gov/geo/). Declarations Competing interest The authors declare no competing interests. Footnotes Publisher’s notes Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. References