Abstract Objective Allergic asthma is driven by an antigen-specific immune response. This study aimed to identify immune-related differentially expressed genes in childhood asthma and establish a classification diagnostic model based on these genes. Methods [31]GSE65204 and [32]GSE19187 were downloaded and served as training set and validation set. The immune cell composition was evaluated with ssGSEA algorithm based on the immune-related gene set. Modules that significantly related to the asthma were selected by WGCNA algorithm. The immune-related differentially expressed genes (DE-IRGs) were screened, the protein-protein interaction network and diagnostic model of DE-IRGs was constructed. The pathway and immune correlation analysis of hub DE-IRGs was analyzed. Results Eight immune cell types exhibited varying levels of abundance between the asthma and control groups. A total of 112 differentially expressed immune-related genes (DE-IRGs) was identified. Through the application of four ranking methods (MCC, MNC, DEGREE, and EPC), 17 hub DE-IRGs with overlapping significance were further selected. Subsequently, 8 optimized were identified using univariate logistic regression analysis and the LASSO regression algorithm, based on which a robust diagnostic model was constructed. Notably, TNF and CD40LG emerged as direct participants in asthma-related signaling pathways, displaying a positive correlation with the immune cell types of immature B cells, activated B cells, activated CD8 T cells, activated CD4 T cells, and myeloid-derived suppressor cells. Conclusion The diagnostic model constructed using the DE-IRGs (CCL5, CCR5, CD40LG, CD8A, IL2RB, PDCD1, TNF, and ZAP70) exhibited high and specific diagnostic value for childhood asthma. The diagnostic model may contribute to the diagnosis of childhood asthma. Keywords: Childhood asthma, Diagnosis, Immune cells, Pathways Highlights * • The immune-related differentially expressed genes were analyzed for pathways and immune cells. * • CCL5, CCR5, CD40LG, CD8A, IL2RB, PDCD1, TNF, and ZAP70 were novel candidates for childhood asthma. * • A classification model for the accurate diagnosis of childhood asthma was obtained. 1. Introduction Asthma is the most prevalent chronic respiratory condition during childhood [[33]1]. Manifestations of asthma encompass recurring wheezing, coughing, chest constriction, and breathlessness, often accompanied by nocturnal and early morning symptoms, which can significantly impact one's quality of life [[34]2]. Recently, there has been a sharp increase in the overall prevalence of asthma among children worldwide, particularly among preschool-aged children [[35]3]. This not only profoundly affects the physical and mental well-being of children but also imposes substantial emotional and economic burdens on families and society at large [[36]4]. Asthma symptoms may manifest early in life, with approximately one-third of children experiencing wheezing within their first three years [[37]5]. Although the majority of these children will cease to wheeze by the age of six, 40 % will continue to experience wheezing, either having already developed asthma or developing it later in life [[38]6]. Depending on the methodology used for assessment, up to 10–15 % of children may exhibit asthma symptoms by school age [[39]7]. Nevertheless, the underlying molecular mechanisms responsible for the occurrence and progression of childhood asthma remain elusive. Hence, elucidating the molecular mechanisms underlying childhood asthma could enhance early diagnosis, prognostic evaluation, and disease management. With the advancement of bioinformatic methodologies, high-throughput analysis have become extensively utilized to furnish pivotal insights into the realm of molecular mechanisms [[40]8]. Bioinformatic scrutiny can unveil an intricate biological progression that traverses diverse functional networks across distinct disease models [[41]9]. Regrettably, in the realm of asthma-related investigations, only a handful of studies have delved into the gene expression profiles. For instance, gene expression analyses have identified disparate expression patterns of CXCR2, ALPL, CLC, CPA3, DNASEIL3, and IL1B in the pathogenesis of asthma [[42]10,[43]11]. Katayama et al. have discerned a distinctive gene module exclusive to acute wheezing in the blood samples of preschool wheezers [[44]12]. This module exhibited associations with vitamin D levels and asthma medication. Shi et al. found that TNF and HLA-DPA1 were associated with immune responses in childhood atopic asthma [[45]13]. Moreover, a mounting body of evidence suggests that innate immune cells play indispensable roles in the development of various asthma phenotypes [[46]14]. Immune cells have been implicated in the pathogenic progression of asthma, which is characterized by granulocytic inflammation infiltrating the airways [[47]15,[48]16]. Nonetheless, the bioinformatics analysis investigating the correlation between infiltrating immune cells and pivotal genes in childhood asthma remains largely understudied. In this paper, immunity was integrated into the whole genome analysis for genomics and predictive model of asthma in children. The distribution of immune cells in samples was evaluated, and then WGCNA algorithm was used, the hub genes related to immunity in children's asthma were obtained by constructing the network, and the candidate genes closely linked to immunity in childhood asthma were identified. Finally, the asthma classification diagnostic model was constructed based on the gene expression level. The results may help for early diagnosis and effective treatment of childhood asthma. 2. Materials and methods 2.1. GEO data download [49]GSE65204 and [50]GSE19187 datasets were downloaded from NCBI Gene Expression Omnibus (GEO) [[51]17] ([52]http://www.ncbi.nlm.nih.gov/geo/) on March 25, 2023. The [53]GSE65204 dataset [[54]18] was used as the training set, encompassing 69 samples of nasal epithelial tissue from children, consisting of 33 normal control samples and 36 asthma samples. The detection platform was Agilent-028,004 SurePrint G3 Human Ge 8 × 60K Microarray. The [55]GSE19187 dataset [[56]19] consists of 38 samples of nasal epithelial tissue from children, of which 24 relevant samples (11 normal control tissues and 13 asthma disease samples) were selected as validation set. The detection platform was Affymetrix Human Gene 1.0 St Array. 2.2. Analysis of immune cell infiltration To evaluate the type of immune infiltration of samples, the immunologic signature gene sets were downloaded from the Gene Set Enrichment Analysis (GSEA) database [[57]20]. Subsequently, the samples in the [58]GSE65204 dataset were assessed for immune infiltration types using the single sample gene set enrichment analysis (ssGSEA) algorithm [[59]21], based on the GSVA package (Version 1.36.3) [[60]22] ([61]http://www.bioconductor.org/Packages/bioc/GSVA.html) in the R3.6.1 language. Next, the Kruskal-Wallis method was used to compare the differences in the proportions of immune cells between the control and asthma samples. 2.3. Weighted gene correlation Network Analysis Weighed Gene Co-expression Network Analysis (WGCNA) is a bioinformatics algorithm utilized to construct co-expression networks, aiming to identify modules associated with diseases and select crucial pathological mechanisms or potential therapeutic targets [[62]23]. Based on the gene expression levels in [63]GSE65204, the R3.6.1 WGCNA package Version 1.61 [[64]24] ([65]https://cran.r-project.org/web/packages/wgcna/index.html) was used to screen for modules significantly correlated with sample grouping. The threshold for module was: at least 100 genes in the module set and cutheight = 0.995. Subsequently, the associations between each module and disease status, age, gender, and differentially expressed immune cells were calculated. Modules with p-values less than 0.05 and correlation coefficients exceeding 0.3 in absolute value were selected as modules associated with asthma. The genes within these modules that are significantly correlated with asthma were identified as candidate genes related to the disease and immune cells. 2.4. Screen of immune-related differentially expressed genes (DEGs) First, immune-related genes (IRGs) were downloaded from The Immunology DataBase and Analysis Portal (Immport) database [[66]25] ([67]https://www.immport.org/home). The DEGs between control and asthma samples in [68]GSE65204 were then screened using the R3.6.1 limma package Version 3.34.7 [[69]26] ([70]https://bioconductor.org/Packages/Release/BIOC/HTML/Limma.html) with screening thresholds of FDR less than 0.05 and log2 of absolute fold-change (log[2]FC) > 0.263. Two-directional hierarchical clustering [[71]27] of DEGs was performed on expression values based on Euclidean distance [[72]28] by the pheatmap package (Version 1.0.8) [[73]29] ([74]https://cran.r-project.org/package=pheatmap). After that, the screened DEGs were compared with the IRGs, and the overlapping genes were preserved as DE-IRGs. Using DAVID version 6.8 [[75]30] ([76]https://david.ncifCrf.gov/), the GO function and KEGG signaling pathway enrichment analysis of DE-IRGs were performed with FDR less than 0.05 as screening threshold. Finally, the DE-IRGs were compared with genes obtained by WGCNA algorithm, the overlapped genes were retained as asthma-related DE-IRGs for the next analysis. 2.5. Construction of protein-protein interaction (PPI) network and identification of hub genes Using the Retrieval of Interacting Genes (STRING) [[77]31] (version: 11.0, [78]http://string-db.org/) database, the interaction relationship between asthma-related DE-IRGs were searched to build an interaction network, and the network was visualized through Cytoscape Version 3.9.0 [[79]32] ([80]http://www.cytoscape.org/). Then, CytoHubba Version 0.1 [[81]33] in Cytoscape3.9.0 was used to identify hub genes in PPI network based on the topology analysis algorithm of MCC, MNC, DEGREE, and EPC. The overlapping hub genes screened under 4 algorithms were as the final hub DE-IRGs. 2.6. Construction of diagnostic model Based on the expression levels of the hub DE-IRGs in the [82]GSE65204 set, an univariate logistic regression analysis was performed using rms version 6.3-0 ([83]https://cran.r-project.org/web/packages/index.html) [[84]34] in R3.6.1, and the genes with p-values less than 0.05 were retained for further analysis. Next, the optimized DE-IRGs were screened using the LASSO algorithm in the R3.6.1 language lars package Version 1.2 [[85]35] ([86]https://cran.r-project.org/web/packages/lars/index.html). The risk value of each sample was calculated based on the LASSO coefficients of the obtained optimized DE-IRGs and the Kruskal- Wallis was used to compare the risk values between disease and normal control samples. In [87]GSE65204, The support vector machine (SVM) approach was utilized to construct a disease diagnostic classifier (Core: Sigmoid Kernel; Cross: 100-Fold Cross Validation) using R3.6.1 e1071 version 1.6–8 [[88]36] ([89]https://cran.r-project.org/web/packages/e1071). The sensitivity and specificity of the receiver operating characteristic (ROC) curve calculated using R 3.6.1 pROC version 1.12.1 [[90]37] ([91]https://cran.r-project.org/web/packages/proc/index.html) were used to evaluate the performance of the diagnostic model in the [92]GSE65204 and [93]GSE19187 dataset. 2.7. Analysis of the pathway and immune correlation analysis of hub DE-IRGs Based on the expression levels of hub DE-IRGs in [94]GSE65204, the correlation between their expression levels and immune cells was calculated using the “cor” function in the R 3.6.1 language. Subsequently, the hub DE-IRGs were subjected to KEGG enrichment analysis using the KEGG Orthology Based Annotation System 3.0 (KOBAS 3.0) database [[95]38] ([96]http://kobas.cbi.pku.cn/index.php.). Furthermore, asthma-related KEGG signaling pathways were downloaded from the Comparative Toxicogenomics Database 2023 update database ([97]http://ctd.mdibl.org/) and compared with the significantly correlated KEGG signaling pathways identified earlier among the hub DE-IRGs, retaining the overlapping portions. Finally, by integrating the immune cell-DE-IRGs-asthma related KEGG signaling pathway connections, a immune cell-DE-IRGs-asthma related KEGG signaling pathway relationship network was constructed. 2.8. Statistical analysis and workflow All the statistical analysis carried out in this study were performed using R 3.6.1. A p value < 0.05 was considered statistically significant (*p < 0.05, **p < 0.01, and ***p < 0.001). The flow chart of the study was described in [98]Fig. 1. Fig. 1. [99]Fig. 1 [100]Open in a new tab The flow chart of the study. 3. Results 3.1. Evaluation of immune cell infiltration The relative infiltration levels of 28 immune cells in the asthma group compared with control group were obtained by ssGSEA algorithm analysis ([101]Fig. 2). The results showed that Type 2 T helper cell, Mast cell, Activated dendritic cell, Immature B cell, Activated CD8 T cell, Activated CD4 T cell, Myeloid derived suppressor cell and Activated B cell were significantly different between asthma and control groups. Fig. 2. [102]Fig. 2 [103]Open in a new tab Heat map showing the distribution of each type of immune cell in two groups. *: p < 0.05, **: p < 0.01, and ***: p < 0.005. AS: asthma; and CTRL: control. 3.2. Construction of weight gene co-expression network and identification of the key module To elucidate the link between functional modules and immune cell infiltration in asthma patients, we performed WGCNA analysis. The results demonstrated that when squared correlation coefficients (r^2) = 0.9, the power value was 12, and the average node connectivity of the constructed co-expression network was 1 ([104]Fig. 3A and B). Then, we set the minimum number of genes per module to 100, and Cutheight = 0.995, a total of 11 modules were obtained ([105]Fig. 3C). Multi-dimensional scaling of expression data of all genes in these modules demonstrated that the genes contained in the various different modules tend to be distributed in the same regions ([106]Fig. 3D). Correlation analysis between the eigengenes of each module and immune cell is presented in [107]Fig. 4. Subsequently, four modules (brown, green, purple and red) were screened according to p < 0.05 and |r|>3, which contained 542, 293, 152, and 234 genes respectively. These genes are thought to be asthma-related genes. Fig. 3. [108]Fig. 3 [109]Open in a new tab A: The adjacency matrix of a weighted graph. The horizontal axis represents the weight parameter power, and the vertical axis represents the square of the correlation coefficient between log(k) and log(p(k)) in the network. The red line represents the standard line where the square of the correlation coefficient reaches 0.9. B: Schematic diagram of the average connectivity of genes under different power parameters. The red line represents the value of the average connectivity of network nodes under the power value of A, the average connectivity of network nodes was 1. C. Module division tree. Each color represents a different module. D. MDS map of genes in each module. (For interpretation of the references to color in this figure legend, the