Abstract Background Eosinophilic esophagitis (EoE) is a complex allergic condition frequently accompanied by various atopic comorbidities in children, which significantly affects their life qualities. Therefore, this study aimed to evaluate pivotal molecular markers that may facilitate the diagnosis of EoE in pediatric patients. Methods Three available EoE-associated gene expression datasets in children: [32]GSE184182, GSE 197702, [33]GSE55794, along with [34]GSE173895 were downloaded from the GEO database. Differentially expressed genes (DEGs) identified by “limma” were intersected with key module genes identified by weighted gene co-expression network analysis (WGCNA), and the shared genes went through functional enrichment analysis. The protein–protein interaction (PPI) network and the machine learning algorithms: least absolute shrinkage and selection operator (LASSO), random forest (RF), and XGBoost were used to reveal candidate diagnostic markers for EoE. The receiver operating characteristic (ROC) curve showed the efficacy of differential diagnosis of this marker, along with online databases predicting its molecular regulatory network. Finally, we performed gene set enrichment analysis (GSEA) and assessed immune cell infiltration of EoE/control samples by using the CIBERSORT algorithm. The correlations between the key diagnostic biomarker and immune cells were also investigated. Results The intersection of 936 DEGs and 1446 key module genes in EoE generated 567 genes, which were primarily enriched in immune regulation. Following the construction of the PPI network and filtration by machine learning, CXCR2 served as a potential diagnostic biomarker of pediatric EoE with a perfect diagnostic efficacy (AUC = ~1.00) in regional tissue/peripheral whole blood samples. Multiple infiltrated immune cells were observed to participate in disrupting the homeostasis of esophageal epithelium to varying degrees. Conclusion The immune-correlated CXCR2 gene was proved to be a promising diagnostic indicator for EoE, and dysregulated regulatory T cells (Tregs)/neutrophils might play a crucial role in the pathogenesis of EoE in children. Keywords: Eosinophilic esophagitis, pediatrics, bioinformatic analysis, machine-learning, potential diagnostic/therapeutic biomarker Introduction Eosinophilic esophagitis (EoE) is an immune-mediated disease mainly featured by esophageal eosinophilic infiltration histologically as well as by esophageal dysfunction (ie, dysphagia and gastroesophageal reflux for children and adolescence) clinically.[35]^1^,[36]^2 Current epidemiological evidence demonstrated an increasing incidence and prevalence of EOE among children in developed countries, from 7.3 per 100,000 per year in 1995–2004 to 50.5 per 100,000 across the US for 2009–2011,[37]^3^,[38]^4 and from no cases in 1993–2003 to 42.8 per 100,000 in 2007–2009 according to European cohorts.[39]^5^,[40]^6 The estimated 1/2000 incidence rate was similar to that of pediatric Crohn’s disease.[41]^7 An inflammatory phenotype in childhood would progressively develop into a fibro-stenotic situation over time and an average delay of diagnosis of 3–5 years normally found in pediatric patients leads to elevated risk of esophageal stricture,[42]^8 for they are neither unaware of the illness nor acceptant to repeated endoscopy inspection burden.[43]^9 Moreover, eosinophilia in EoE is frequently patch, and at least four biopsies would be required.[44]^7 Therefore, identifying sensitive and specific diagnostic biomarkers for early and accurate EoE diagnosis prior to reaching an irreversible fibrosis stage is important to improving outcomes in the long run. Previous studies have indicated the involvement of impaired esophageal epithelial barrier functions and abnormal T-helper cell type 2 (Th2)-correlated immune pathways, particularly those involving interleukin (IL)-4/IL-13 signaling.[45]^10^,[46]^11 Besides, it has been shown that EoE has an intimate association with atopy, which, on the one hand, refers to that atopic comorbidities’ rates are higher in EoE patients than in the normal population, with 26%–50% of EoE patients having asthma, 30%–90% having allergic rhinitis, and 19%–55% having atopic dermatitis,[47]^12^,[48]^13 and, on the other hand, refers to the characterized chronic allergen-induced mucosal inflammation and potentiated fibroblast activation.[49]^14 Notably, the susceptibility genes of EoE largely overlap with atopic diseases rather than with other prevalent inflammatory/neoplastic gastrointestinal diseases.[50]^15 Nowadays, high-throughput sequencing techniques could detect parallel expression levels for thousands of genes in tissues and assist in identifying disease-correlated genes to screen out novel diagnostic or therapeutic biomarkers.[51]^16 Utilizing either supervised or unsupervised methods, machine learning algorithms could help dig to find the underlying connections in high-dimensional transcriptomic data,[52]^17–19 which has been widely adopted in previous analyses of multiple diseases. Therefore, it is plausible to identify important biomarkers for predicting EoE through bioinformatical and machine-learning analysis. In addition, the emerging role of non-coding RNAs (ncRNAs) in regulating gene expression has been identified and these ncRNAs consist of: [1] long ncRNAs (lncRNAs) as well as [2] short ncRNAs (mainly microRNAs [miRNAs], 18–22 nucleotides).[53]^20 Using new-level sequencing methods reported lncRNAs as the most common form of genomic sequences transcription,[54]^21 supporting the most widely accepted proposal of competitive endogenous RNA (ceRNA) model that referring to lncRNAs competing with mRNAs for binding with miRNAs.[55]^22 To the best of our knowledge, available studies of pediatric EoE mainly concentrated on specific immunological or molecular mechanisms in the blood, and were mainly based on small-scale cohort analysis.[56]^9^,[57]^23 The definite cause of EoE still remains uncertain so far. Several genetic risk loci of EoE (ie, CAPN14, DEX1 and TSLP) have been determined by genome-wide association study (GWAS) or meta-analysis,[58]^24–26 while their underlying associations remain an open issue. In the present study, we utilized publicly open datasets from the GEO database, the “Limma” method, the “weighted gene co-expression network analysis” (WGCNA), and machine-learning algorithms to identify pivotal EoE-correlated diagnostic markers, which in combination with immune infiltration analysis and ceRNA regulatory network prediction could, to some extent, facilitate a deeper understanding of potential regulatory molecular mechanisms during the pathogenesis of EoE in children. Furthermore, the Connectivity Map (CMAP) database contained 6100 instances of 1309 small molecule reagents, with each containing gene expression profiles upon intervention of specific reagents.[59]^27 We imported 300 genes with the most difference in their expression level (150 upregulated genes and 150 downregulated genes) between the EoE and normal esophageal tissues to the CMAP database to predict promising molecular reagents for EoE treatment, which may inspire future clinical practices. Materials and Methods Data Collection and Processing To acquire EoE-correlated gene expression data, we searched the Gene Expression Omnibus (GEO) Database ([60]https://www.ncbi.nlm.nih.gov/geo/)[61]^28 through the keyword: “Eosinophilic esophagitis” [MeSH Terms] AND “Homo sapiens” [porgn: txid9606]. The [62]GSE184182[63]^9 and [64]GSE197702[65]^29 datasets containing samples from the healthy or diseased children (without any other disease except for EoE) were obtained and we performed the principal component analysis of the [66]GSE184182 dataset to demonstrate their distribution. Their expression matrix was processed following the criteria: [1] Removing empty probes that have no matched gene; [2] Deleting probes that match multiple genes; [3] Selecting the probe with the highest expression level if a gene corresponds to multiple probes; [4] Calculating the median values as the expression level if multiple probes identify the same gene. The log2 transformation was applied for the dataset [67]GSE197702. Subsequently, the [68]GSE184182 dataset functioned as the discovery dataset as well as the [69]GSE197702 dataset worked as the validation dataset. To further evaluate the diagnostic efficacy of identified biomarker, we additionally chosen [70]GSE156651, [71]GSE41687, [72]GSE148381, and [73]GSE55794 datasets, of which the detailed information is listed in [74]Table 1. Table 1. Detailed Information of Utilized GEO Datasets GSE Series Sample Counts Source Type Platform Group EoE non-EoE [75]GSE184182 n = 10 n = 10 Esophagus [76]GPL21185 Discovery [77]GSE197702 n = 20 n = 7 Esophagus [78]GPL20301 Validation [79]GSE156651 n = 5 n = 5 Whole blood [80]GPL16791 Validation n = 5 n = 5 Esophagus [81]GPL16791 Validation [82]GSE41687 n = 6 n = 12 Esophagus [83]GPL9115 Validation [84]GSE148381 n = 10 n = 41 Esophagus [85]GPL24676 Validation [86]GSE55794 n = 5 n = 6 Esophagus [87]GPL16142 Validation [88]Open in a new tab Identification of Differentially Expressed Genes (DEGs) The “Limma” R package was utilized to identify DEGs between the pediatric EoE group and control group in the discovery dataset,[89]^30 based on |log[2](Foldchange)| > 1 and adjusted P value < 0.05 (adopting Benjamini and Hochberg’s method to control the false discovery rate). A volcano plot was generated to show DEGs, and a heatmap was drawn to depict the distribution pattern of DEGs via using the “Pheatmap” R package. Gene Set Enrichment Analysis (GSEA) GSEA was performed via using the “cluster Profiler” R package,[90]^31^,[91]^32 and the “c2. cp.kegg.v7.4.symbols.gmt” from ([92]https://www.gsea-msigdb.org/gsea/msigdb/index.jsp)[93]^33 was utilized as the reference gene set. GSEA was carried out in children either in EoE or healthy states to determine whether the preset gene set was enriched at the top/bottom of the sequencing table and reveal several shared biological signaling pathways. The results were visualized by the “ggplot2” R package.[94]^34 P value < 0.05 was statistically significant. Identification of Hub Modules by WGCNA The weighted gene co-expression network (WGCNA) is a systematic approach for identifying gene clusters co-expressed by many genes and investigating the correlations within this gene co-expression network.[95]^35 First, the median absolute deviation (MAD) of the gene was computed and the 50% of genes with the smallest MAD were cancelled. Then, the DEGs expression matrix was filtered by the goodSamplesGenes functions to diminish unqualified genes and samples. After that, a scale-free co-expression network was constructed. Next, the co-expression similarity-derived “soft” thresholding power (β) was adopted to compute the adjacency, which was transformed into a topological overlap matrix (TOM). The fourth procedure was detecting modules via using hierarchical clustering and a dynamic tree cut function. Genes with identical expression patterns were divided into modules via average linkage hierarchical clustering, with a TOM-based dissimilarity metric. A minimum gene group at the size of n = 50 was set. Finally, we calculated the module eigengenes dissimilarity and the eigengene network was visualized. We utilized the “VennDiagram” R package to identify shared genes between DEGs and the above selected hub modules. DEGs with gene significance (GS) larger than 0.2, as well as module membership (MM) larger than 0.8 were designated as hub genes. GS and MM were indications of clinical relevance and a highly connected module.[96]^35 Gene Ontology (GO) and Pathway Enrichment Analysis The ClueGO plug-in of Cytoscape 3.8.2 was used to reveal the interaction network of the “biological processes” enriched by up-regulated and down-regulated DEGs. Those presented GO terms met the threshold of P value < 0.01. GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were taken via utilizing the “cluster Profiler” R package, with P value < 0.05 statistically significant. GO and functional processes are divided into 3 aspects: biological process (BP), cellular component (CC), and molecular function (MF).[97]^36 KEGG as well as Reactome databases were utilized as the source of pathway annotations.[98]^36 Construction of protein‒protein Interaction (PPI) Network The STRING platform ([99]http://string-db.org) was used to analyze the PPI network,[100]^37 including 567 DEGs with the MM > 0.8, and GS > 0.2. A minimum required interaction score > 0.4 (the medium confidence) was set as the cut-off criterium, along with “Homo sapiens” as the organism. The MCODE and CytoHubba plug-ins of the Cytoscape 3.8.2 were, respectively, utilized to screen out the core modules. Their default parameters were as follows (for the MCODE plug-in: degree cutoff ≥ 10, node score cutoff ≥ 0.2, K-core ≥ 5, and max depth = 100; for the CytoHubba plug-in: the top 10 nodes chosen by MCC). Network visualization of core clusters was conducted via utilizing the Cytoscape 3.8.2,[101]^38 an open-source software for analyzing and visualizing networks. The above PPI network of proteins encoded by common DEGs depicted how these proteins interacted with each other physically and functionally, on the basis of evidence from observations and verified experimental results including text-mining, experiments, databases, co-expression, neighborhood, gene fusion, and co-occurrence information.[102]^39 Identification of Hub Diagnostic Biomarkers by Machine-Learning Algorithms In order to further select feature genes for inclusion in an optimal diagnostic model, various machine-learning algorithms: random forests (RF), the least absolute shrinkage and selection operator (LASSO), and eXtreme Gradient Boosting (XGBoost) were used to identify important markers of pediatric EoE.[103]^40–42 RF is a class of integrated classifiers, which demonstrates a strong predictive power that could build decision-tree forests and prevent overfitting.[104]^43^,[105]^44 We built the RF model and acquired feature genes via using the “randomForest” function of the “randomForest” R package. LASSO logistic regression is a method of penalty regression that can effectively recognize feature genes from high-dimensional data, of which the procedures were finished using the “cv.glmnet” function of the “glmnet” R package, with a minimal lambda regarded as optimal.[106]^45 The penalized term was selected using a 10-fold cross-validation method. This technique was reported to be able to evaluate the strongest association with the outcomes among various factors, without being impacted by confounding factors.[107]^46 XGBoost is a GBDT-based algorithm that could determine key features according to feature importance ranking and recursive elimination.[108]^47 The model of XGBoost was developed by using the “xgboost” R package. These algorithms contained strong power in carrying out a binary classification of data. The shared gene by all of the three classification models was selected as core diagnostic gene and then used during the following procedures. The validation dataset for evaluating the usefulness of the selected biomarkers was [109]GSE197702. The method of receiver operating characteristic (ROC) curve was adopted and the area under the curve (AUC) was calculated to measure the diagnostic capability of the model. The closer the value of AUC approximated “1.0”, the better the diagnostic performance of the model. Feature genes with AUC values exceeding 0.7 were regarded to have a relatively good predictive capacity. Moreover, we showed the expression level of feature genes between the EoE group and control group via violin plots. Immune Cell Infiltration Analysis Twenty-two types of immune cell matrices were filtered by using the CIBERSORT algorithm (P < 0.05).[110]^48 The Spearman correlation was calculated between the diagnostic biomarker and infiltrated immune cells. The “ggplot2” R package illustrates the results of the infiltration abundance of 21 immune cell types in EoE patients in the [111]GSE184182 dataset. Construction of the lncRNA-miRNA-mRNA Regulatory Network We predicted miRNAs that regulated the above selected diagnostic biomarkers via intersecting results of the miRWalk database ([112]http://mirwalk.umm.uni-heidelberg.de/),[113]^49 miRTarBase database ([114]https://mirtarbase.cuhk.edu.cn/),[115]^50 and the [116]GSE55794 dataset to improve the accuracy of prediction. The lncRNA-miRNA-mRNA network was visualized by Cytoscape 3.8.2. The binding lncRNAs of selected miRNAs were predicted using the Encyclopedia of RNA Interactomes (ENCORI) database ([117]http://starbase.sysu.edu.cn/).[118]^51 Prediction of the Therapeutic Drugs of Pediatric EoE The most differentially DEGs’ matrix of pediatric EoE was imported into the CMAP database ([119]https://clue.io/) to predict potential drugs that can reverse this gene expression pattern. The top ten small molecules were acquired by the rank of enrichment scores. Statistical Analysis R software version 4.2.1, GraphPad Prism Version 9.4.0 (GraphPad Software, San Diego, CA, USA), and SPSS Version 26.0 (IBM Corporation, Armonk, NY, USA) were used to perform statistical analysis. Using the Student’s t-test, continuous variables were compared between two groups. For correlation analysis, the Spearman correlation was utilized to analyze the relationships between core genes and immune cells. The receiver operating characteristic (ROC) curves, and the area under the curve (AUC) values were computed using the “pROC” R package. The results were presented as mean ± standard deviation (SD). P value < 0.05 was statistically significant. The study design is illustrated in [120]Figure 1. Figure 1. [121]Figure 1 [122]Open in a new tab Study flowchart. The flowchart showing analysis procedures of this study. The screened-out biomarker CXCR2 is presented in bold font. Abbreviations: GEO, gene expression omnibus; DEGs, differentially expressed genes; WGCNA, weighted gene co-expression network; GSEA, gene set enrichment analysis; PPI, protein–protein interaction; LASSO, the least absolute shrinkage and selection operator; XGBoost, eXtreme Gradient Boosting; CIBERSORT, cell-type identification by estimating relative subsets of RNA transcripts. Results Identifying DEGs in Pediatric EoE-Related Dataset The principal component analysis (PCA) of the [123]GSE184182 dataset was executed, and each point in the scatter plot represented a sample ([124]Figure 2A). After a differential expression analysis, a total of 936 screened DEGs (including 420 up-regulated and 516 down-regulated genes) between groups were acquired and shown by different colors in a volcano plot ([125]Figure 2B). A heatmap indicated the relative similar distribution pattern of DEGs in the two groups ([126]Figure 2C). Figure 2. [127]Figure 2 [128]Open in a new tab Detection of DEGs of the pediatric EoE dataset. (A) 3D PCA analysis of the dataset. (B) A volcano plot depicting the distribution pattern of all genes in the dataset. (C) A heatmap comparing the distribution of the top 50 DEGs in pediatric patients with EoE or controls. (D) The top three KEGG terms of the dataset enriched by GSEA, p adjusted < 0.05. (E) A Venn diagram showing overlapping genes between DEGs and key modules identified by WGCNA. Abbreviations: DEGs, differentially expressed genes; KEGG, Kyoto Encyclopedia of Genes and Genomes;; WGCNA, weighted gene co-expression network; GSEA, gene set enrichment analysis; EoE, eosinophilic esophagitis. GSEA results GSEA was performed on pediatric healthy subjects and patients with EoE to figure out biologically important signaling pathways. The top three terms identified by KEGG analysis of pediatric EoE patients are shown by [129]Figure 2D, which included the “JAK-STAT signaling pathway”, “antigen processing and presentation”, and “leukocyte trans-endothelial migration”. Through using a Venn diagram, 567 genes were overlapped from DEGs and genes of the key modules screened out subsequently ([130]Figure 2E). WGCNA Analysis and Identification of EoE-Related Hub Gene Modules WGCNA was applied to identify the most correlated gene modules in the pediatric EoE dataset. The clustering dendrogram ([131]Figure 3A) of samples from EoE and control groups showed no prominent outliers. Based on scale independence of > 0.85, we chose β = 4 as the soft thresholding power to establish a scale-free network, of which the scale independence and average connectivity are exhibited in [132]Figure 3B. According to this power, 17 gene co-expression modules were generated via dynamic branch cut methods for gene dendrograms and exhibited by different colors ([133]Figure 3C). [134]Figure 3D shows the heatmap of eigengene adjacency. Finally, the associations between EoE and its gene co-expression modules were analyzed, of which the “lightyellow” module exhibited the highest positive correlation with pediatric EoE (correlation coefficient = 0.65, P = 2.1*e^−3), while the “blue” module had the opposite correlation (correlation coefficient = −0.82, P = 9.0*e-^6) ([135]Figure 3E). Both modules were regarded as core modules for subsequent analysis and significant positive correlations between module membership (MM) and gene significance (GS) were observed in both modules for EoE (as for the “lightyellow” module, r = 0.65; for the “blue” module, r = −0.82). Finally, based on the cut-off criteria of (|MM| > 0.8 and |GS| > 0.2), a total of 1446 genes with high connectivity in clinically significant modules were preserved as hub genes for following analysis. Figure 3. [136]Figure 3 [137]Open in a new tab WGCNA analysis of the pediatric EoE dataset. (A) The clustering dendrogram of samples in the discovery dataset ([138]GSE184182) to detect outliers. (B) The soft power plot showing that soft-threshold β = 4 (scale free R2 > 0.85); Analysis of the mean connectivity at various soft-threshold powers and when β = 4. The star symbol (the upper one) labels the value of the minimum soft-threshold power β, when the value of scale free R2 conforming to > 0.85 for the first time (R2 = 0.86, β = 4); The lower one labels the value of the mean connectivity when β = 4. (C) The gene dendrogram generated by average linkage hierarchical clustering. (D) The heatmap of eigengene adjacency. (E) The module–trait relationships. Abbreviations: WGCNA, weighted gene co-expression network; EoE, eosinophilic esophagitis. Functional Enrichment Analysis of DEGs and Construction of the PPI Network We then performed functional enrichment analyses to have a deeper understanding of their biological functions. Through utilizing the ClueGO plug-in of Cytoscape 3.8.2 to analyze enriched biological processes (BP), up-regulated DEGs exhibited involvement in the processes: “regulation of immune effector process”, “positive regulation of leukocyte differentiation”, “myeloid leukocyte migration”, “interferon-gamma mediated signaling pathway”, whereas down-regulated DEGs were highly enriched in “regulation of integrin biosynthetic process” that was closely correlated with esophageal epithelium morphogenesis and homeostasis, as well as “eicosanoid metabolic process” ([139]Figure 4A). Moreover, enriched cellular components (CC) terms were located in “mast cell granule” and “collagen-containing extracellular matrix”, while collected molecular function (MF) terms included “cytokine activity”, “superoxide-generating NADPH oxidase activity”, and “arachidonate 12-lipoxygenase activity” ([140]Figure 4B). Enriched KEGG pathways contained “JAK-STAT signaling pathway”, “NF-kappa B signaling pathway”, “MAPK signaling pathway”, “Th1 and Th2 cell differentiation”, and “arachidonic acid metabol-ism”, and meanwhile the Reactome enrichment analysis showed the enrichment of “IL4 and IL13 signaling pathway”, “biosynthesis of specialized proresolving mediators” and “activation of matrix metalloproteinases” ([141]Figure 4C and [142]D). Figure 4. [143]Figure 4 [144]Open in a new tab Functional enrichment analysis of DEGs. (A) GO enrichment analysis of up-regulated and down-regulated DEGs by the ClueGO plug-in of Cytoscape 3.8.2 in aspects of biological process. The “diamond” symbol in the line connecting the node (the enriched pathways) “regulation of production of molecular mediators of immune response” and node “production of molecular mediators of immune response” indicating the meaning of “REGULATE”. All presented terms meeting P value < 0.01. (B) GO enrichment analysis of cellular component and molecular function. (C) KEGG pathway enrichment analysis. (D) Reactome pathway enrichment analysis. (E) PPI network and hub modules selected by the CytoHubba-MCC and MCODE plug-ins of Cytoscape 3.8.2. Abbreviations: GO, gene ontology; DEGs, differentially expressed genes; PPI, protein‒protein interaction; KEGG, Kyoto encyclopedia of genes and genomes. To better identify interactions between such 567 overlapping genes, a PPI network was constructed by the STRING database and it contained 326 nodes (proteins encoded by genes) and 1426 edges (interactions between proteins). Additionally, we utilized the MCODE and CytoHubba plug-in of Cytoscape 3.8.2 to, respectively, screen out the most important modules of the PPI interaction network. The results showed two but the same core clusters of ten nodes screened respectively based on the K-core = 5 and their values of Maximal Clique Centrality (MCC) ([145]Figure 4E). In the [146]GSE184182 and [147]GSE197702 dataset IL13, CCR3, ICAM1, TIMP1, GZMB, CXCL1, and CD86 were up-regulated, while IL18, CXCR2, and ARG1 were reduced. Biological functions of these core genes were significantly enriched in “regulation of type 2 immune response (IL18, CD86, and ARG1)”, “response to dexamethasone (ARG1, TIMP1, and ICAM1)”, “activated T cell proliferation (IL18, ARG1, and TIMP1)”, along with “positive regulation of tyrosine- phosphorylation of STAT protein (IL13, IL18, and TIPM1)” ([148]Figure S1). Identification of Hub Pediatric EoE Diagnostic Biomarkers by Machine-Learning Algorithms For the above ten selected genes, three machine-learning algorithms were used to screen out characteristic genes of pediatric EoE. By utilizing the LASSO algorithm, we identified three potential diagnostic biomarkers from statistically significant univariate variables, with Lambda (λ) = 0.17 ([149]Figure 5A). Likewise, we adopted the RF algorithm to select the top genes and displayed them in the descending order of relative relevance ([150]Figure 5B). In addition, we used the XGBoost algorithm to identify the feature marker at the threshold of the turning point ([151]Figure 5C). Such algorithms collectively enriched the CXCR2 gene ([152]Figure 5D), of which the area under the ROC curve (AUC) indicated its well diagnostic performance in both the discovery (AUC = 0.990) and the validation (AUC = 0.986) dataset. The expression level of CXCR2 in EoE samples was lower than that of control samples significantly ([153]Figure 5E). Moreover, the analysis of correlations between these ten genes suggested a significant result, of which CXCR2 had a relatively high absolute value of total scores ([154]Figure 5F), indicating its significant roles. Figure 5. [155]Figure 5 [156]Open in a new tab Identification of hub diagnostic markers by machine-learning algorithms. (A) LASSO logistic regression algorithm screening out potential diagnostic biomarkers with Lambda (λ) = 0.17; Different colors representing different genes. (B) RF algorithm identifying diagnostic biomarkers. (C) XGBoost identifying diagnostic biomarkers. The CXCR2 labelled by the red square shows the highest feature importance in differentiating by XGBoost machine-learning algorithm. (D) A Venn diagram displaying the intersected diagnostic biomarker of the three algorithms. (E) Violin plots demonstrating the expression level of key marker between the pediatric EoE and control groups in both discovery and validation datasets; ROC curves showing its diagnostic efficacy. (F) Correlations between the ten genes of the selected cluster in the last procedure. Abbreviations: LASSO, least absolute shrinkage and selection operator; RF, random forest; XGBoost, eXtreme Gradient Boosting; ROC, receiver operating characteristic curve; EoE, eosinophilic esophagitis; CXCR2, CXC chemokine receptor 2. Immune Cell Infiltration Abundance and Correlation Analysis Immune infiltration analysis was carried out to clarify the immune-relation of EoE in pediatric patients by the CIBERSORT algorithm. Regarding EoE and control samples, the proportion of 22 types of immune cells was calculated and visualized by [157]Figure 6A. Compared with healthy control samples, children with EoE generally had a higher level of eosinophils, resting mast cells, and Tregs (P < 0.05) ([158]Figure 6B). Moreover, analysis of the correlations between infiltrating immune cells showed that Tregs were positively associated with resting mast cells (r = 0.56) and eosinophils (r = 0.51); resting NK cells had a negative correlation with M2 macrophages (r = −0.49); memory B cells were also negatively associated with M0 macrophages (r = −0.52); gamma delta T cells displayed a negative association with activated dendritic cells (r = −0.50) ([159]Figure 6C). Moreover, correlation analysis indicated significant associations of the expression level of CXCR2 and the abundance of various immune cells, including a negative correlation with Tregs (correlation coefficient = −0.51), and a positive correlation with neutrophils (correlation coefficient = 0.49) via the Spearman method ([160]Figure 6D and [161]E). These immune cells might promote the progression of the immune microenvironment of EoE in children. Figure 6. [162]Figure 6 [163]Open in a new tab Immune cell infiltration analysis. (A) The boxplot showing the relative proportions of 21 types of immune cells calculated by the “CIBERSORT” algorithm in each sample. (B) The boxplot comparing the relative abundance of 21 immune cell types between EoE and healthy groups. “*”, P value < 0.05; “**”, P value < 0.01. (C) The Spearman correlations between immune cell types. (D-E) The Spearman correlations of infiltrating immune cells and the CXCR2 gene. The red boxes in D label the most significant positive correlation coefficient with neutrophils (the upper one) and the most significant negative correlation coefficient with Tregs (the lower one). (F) The diagnostic performance of CXCR2 in blood and esophageal tissue samples in [164]GSE156651 dataset. (G) The diagnostic performance of CXCR2 in differential diagnosis under different pathologies, including GERD and EoE-like esophagitis. Abbreviations: GERD, gastroesophageal reflux disease; EoE, eosinophilic esophagitis; CXCR2, CXC chemokine receptor 2. The Differential Diagnosis Efficacy and GSEA Results of CXCR2 The analysis result of whole blood samples in the [165]GSE156651 dataset suggested a moderate diagnostic efficacy of CXCR2 (AUC = 0.750), and the result of tissue samples demonstrated a great performance (AUC = 1.000) ([166]Figure 6F). Besides, the application of CXCR2 as a biomarker of differential diagnosis could be extended to those who have gastroesophageal reflux disease (GERD), which leads to an AUC = 0.944 in [167]GSE41687. In [168]GSE148381, the value for lymphocytic esophagitis = 0.820, for EoE-like esophagitis = 0.908, and for non-specific esophagitis = 0.790 ([169]Figure 6G). Furthermore, we performed GSEA analysis, and the results suggested that samples expressing lower levels of CXCR2 were highly enriched in inflammatory response: IL2-STAT5 signaling pathway and interferon gamma (IFN-γ) response than those with high expression of CXCR2 ([170]Figure 7A). Figure 7. [171]Figure 7 [172]Open in a new tab The regulatory network of CXCR2. (A) GSEA showing enriched functions in low and high CXCR2-expressing EoE groups. (B)The Venn diagram showing the intersected miRNAs between the [173]GSE55794 dataset, miRWalk database and miRTarBase database. (C) The lncRNA-miRNA-gene (ceRNA) regulatory network of CXCR2. The pink color refers to five lncRNAs (NEAT1, SNHG14, PWAR5, GUSBP11, and XIST) that participate in the regulation of both miRNAs, while the green color refers to those lncRNAs regulating any single miRNA. (D) The expression level of the above selected miRNA (hsa-miR-28-5p and hsa-miR-296-3p) in the [174]GSE55794 dataset obtained from pediatric esophageal tissues. (E) The diagnostic performance of CCL26 upon different situations. (F) The significant inverse correlations between CXCR2 and Th2 inflammatory factors: IL13 and IL5. Abbreviations: EoE, eosinophilic esophagitis; GSEA, gene set enrichment analysis. Prediction of the lncRNA-miRNA-CXCR2 mRNA Regulatory Network To identify potential regulatory mechanisms of CXCR2, its targeted miRNAs were predicted by integrating data from the miRWalk database, the miRTarBase database and the [175]GSE55794 dataset, which in total intersected two unique miRNAs: the up-regulated hsa-miR-28-5p and down-regulated hsa-miR-296-3p in the esophagus of pediatric EoE subjects compared to normal controls ([176]Figure 7B, [177]Table 2). Subsequently, we analyzed potential lncRNAs for these miRNAs in the ENCORI database, and the ceRNA network contained CXCR2, two miRNAs, and fifty-four lncRNAs ([178]Figure 7C). Five lncRNAs: NEAT1, SNHG14, PWAR5, GUSBP11, and XIST participated in the regulation of both miRNAs. The expression levels of hsa-miR-28-5p and hsa-miR-296-3p are shown by [179]Figure 7D. Understanding the role of miRNAs in the gene expression regulatory network may help obtain a novel insight into disease progression and treatment. Table 2. Predicted Regulatory miRNAs of CXCR2 mRNA in Pediatric EoE Patients ID p value Log[2][Foldchange] hsa-miR-28-5p 9.89E-3 0.867 hsa-miR-296-3p 2.63E-2 −0.997 [180]Open in a new tab Abbreviation: miRNA, micro-RNA. Prediction of Therapeutic Reagents for Pediatric EoE Potential therapeutic reagents for EoE therapy were predicted by the CMAP database, and the results suggested that ingenol, homatropine, colforsin, tienilic-acid, scoulerine, vinblastine, mifobate, fulvestrant, parecoxib along with cefotaxime having high potentials ([181]Table 3). Table 3. Predicted Therapeutic Small Molecules for Pediatric EoE Name Score Description Target Ingenol 84.5 PKC activator PRKCD, PRKCE Homatropine 78.43 Acetylcholine receptor antagonist CES1, CHRM1 Colforsin 76.51 Adenylyl cyclase activator ADCY2, ADCY5, GNAS Tienilic-acid 61.8 Sodium/potassium/chloride transporter inhibitor CYP2C9, SLC12A1, SLC22A12 Scoulerine 59.74 Adrenergic receptor antagonist ADRA1D, ADRA2A, GABRA1 Vinblastine 52.55 Microtubule inhibitor TUBB, JUN, TUBA1A, TUBD1, TUBE1 Mifobate 52.48 PPAR receptor antagonist PPARG, REN Fulvestrant 52.36 Estrogen receptor antagonist ESR1, EPHX2, ESR2, GPER1 Parecoxib 50.67 Cyclooxygenase inhibitor LTF, PTGS2 Cefotaxime 50.63 Bacterial cell wall synthesis inhibitor – [182]Open in a new tab Discussion Deeper understanding of the molecular mechanism of pediatric EoE could contribute to developing novel approaches to alleviate such an atopic disease.[183]^52 Among chronic pediatric diseases, EoE leads to one of the lowest qualities of life,[184]^53 and up to 10% of patients did not present with canonical endoscopic signs.[185]^54 Nowadays, it ranks the second cause of chronic esophagitis after GERD and is prominently refractory to proton pump inhibitors (PPIs) therapy (accounting for 10–30% patients).[186]^55 Prior studies identified several biomarkers for EoE diagnosis, including altered eosinophils biomarkers and a single/combination of miRNA(s) from esophagus/blood in distinguishing EoE.[187]^56^,[188]^57 As a member of the eotaxin family of chemokines in EoE, Eotaxin-3 (CCL26) is the most highly induced biomarker in EoE patients in comparison with those healthy individuals;[189]^58 However, cohort analysis showed a weaker performance of CCL26 in differentiating non-specific esophagitis (AUC = 0.680), GERD (AUC = 0.567), EoE-like esophagitis (AUC= 0.827), and lymphocytic esophagitis (AUC = 0.760) from EoE. Similarly, the eosinophils-related cytokine receptor IL5RA showed worse performance than CXCR2[190]^35 ([191]Figure 7E). The AUC value ≥ 0.7 of the ROC suggested the acceptable prediction value of CXCR2.[192]^60 A novel molecular diagnostic panel relying on the TaqMan-qPCR-based low-density array system could detect EoE patients via an algorithm of 96 genes; however, it has to combine with pH-impedance testing to distinguish EoE samples from acid-induced pathologies, as the data in detecting GERD samples shows.[193]^61 Besides, a recent work reported a failure in searching for available serum miRNAs to differentiate pediatric EoE cases.[194]^62 There is still little evidence showing circulating miRNAs in the serum of EoE subjects.[195]^62 In addition to the above-mentioned attempts, previous studies aiming at discovering common non-invasive indicators for EoE diagnosis, either failed in population testing,[196]^9^,[197]^63 scarcely tested in children,[198]^64–66 or still staged in laboratory (ie, exosomes, esophageal microbiome, or non-targeted metabolomics).[199]^67 For example, a panel of 8 cytokines (IL-4, IL-13, IL-5, IL-6, IL-12p70, CD40L, IL-1α, and IL-17) in plasma for identifying EoE has been tested but merely with a 61% sensitivity and 83% specificity.[200]^59 Two studies explored dysregulated 15(S)-hydroxyeicosatetraenoic acid, a metabolite elevated by 2.4-fold in the serum of EoE patients, as well as increased plasma urea cycle metabolites (ie, putrescine, N-acetylputrescine, and dimethylarginine) in differentiating affected/normal subjects.[201]^68^,[202]^69 The former index is not tested in children, and PPI use influences urea cycle metabolites.[203]^69 As to the microbial components of esophageal mucosa, healthy flora mainly consisted of: Streptococci, while flora in EoE enriched Neisseria, Corynebacterium, and Haemophilus, and downregulated Porphyromonas.[204]^67 Furthermore, former attempts testing fractionated exhaled nitric oxide testing (FeNO) and salivary miRNA levels (miR-26b-5p, miR-27b-3p, Let-7i-5p, miR-142-5p, miR-30a-5p, and miR-205-5p), performed not well in differentiating EoE and healthy samples,[205]^70^,[206]^71 with the alliance of the above-mentioned six miRNAs reached a 70% sensitivity and 68% specificity in children despite that miR-205-5p had the largest difference between the EoE and non-EoE groups.[207]^71 Other works determined some candidate salivary miRNAs in adults, such as miR-4668-5p, while their efficacy in pediatric samples required evaluation.[208]^72 Though greater efforts paid to seeking non-invasive markers of EoE, a robust candidate has not been screened out.[209]^73 In the present study, we integrated comprehensive bioinformatical methods to screen out CXCR2 as an independent diagnostic marker of pediatric EoE. ROC curves suggested its great performance in diagnosing EoE using biopsy and the validation result from whole blood samples further supported its value in children as an available and acceptant clinical testing (merely need to extract peripheral blood).[210]^74 Moreover, the evidence demonstrating the highly conserved transcriptomic signature of EoE in children subjects across different countries, genders, age, allergic status, and triggers, to some extent, favored the credibility of our conclusion.[211]^9^,[212]^75^,[213]^76 The fact that EoE and GERD patients shared some similar clinical manifestations (ie, dysphagia, epigastric pain, food impactions, or prominent esophageal eosinophilia infiltration) and diagnosing EoE by the presence of ≥ 15 eos/high power field (x400) in the esophageal tissue biopsy or other eosinophilic biomarkers are not that solid,[214]^7 makes this biomarker more important.[215]^77 Prior studies have confirmed that pediatric EoE patients showed higher levels of Tregs in their esophageal epithelium than GERD patients and healthy children,[216]^78–80 which was in opposite to adult EoE patients that had impaired induction of Tregs.[217]^81 Such difference indicated different pathophysiological mechanisms of EoE in adults and children, stressing the importance of investigating the disease in pediatric populations. Notably, CXCR2 is an independent index from eosinophils, as its expression level is negatively associated with Tregs while the number of Tregs shows no significant correlation with that of eosinophils (r = −0.04, P = 0.90).[218]^82^,[219]^83 As a G protein-coupled receptor (GPCR) for cellular signal transduction,[220]^84 CXCR2 is scarcely expressed by eosinophils but by other cell types related with chronic inflammation, such as neutrophils, dendritic cells, macrophages, mast cells, endothelial cells, and lymphocytes,[221]^85 mainly serving as one key mediator of neutrophils migration.[222]^86 Though there is a lack of researches focusing on the role of neutrophils in EoE, currently published literatures validated their unequivocal deficiency in other atopic diseases and roles in eliciting IgG-mediated anaphylaxis in patients with food allergy (one important etiology of EoE).[223]^87^,[224]^88 Lower levels of CXCR2 can be attributed to decreased expression of the receptor by neutrophils,[225]^86 potentially induced by overexpressed IFN-γ.[226]^89 Recently, it has been observed that neutrophils could produce platelet-activating factors in allergic conditions,[227]^88 while platelets interact with neutrophils, monocytes as well as eosinophils via forming platelet-leukocyte complexes.[228]^90 Subsequently, the impact of immune cell infiltration on pediatric EoE was studied, and GSEA analysis of whole transcriptomic features of pediatric EoE samples indicated the tendency towards antigen processing and presentation, conferring to food allergens- induced Th2 inflammation of EoE that could be ameliorated by food antigen avoidance.[229]^7 This process relied on migrated leukocytes, particularly the activated Th2 lymphocytes, mast cells, basophils, invariant natural killer cells (iNKTs), and eosinophils to their site.[230]^7 Functions of important inflammatory mediators, such as IL13, depend on the activation of STAT1 and STAT6 that belong to the enriched JAK-STAT signaling pathway.[231]^91 Also, KEGG enrichment analysis recognized this crucial type 2 cytokine signaling pathway.[232]^92 Besides, GO analysis of DEGs indicated their enrichment in inflammatory and immune responses. For MF, DEGs were enriched in 12-lipoxygenase (12-LOX) activity. Studies have suggested dysregulated arachidonic acid (AA) metabolism in eosinophilic allergic diseases and that 12-LOX exerted regulatory roles in eosinophilic airway inflammation, along with 15-lipoxygenase.[233]^93 Besides, the Reactome enrichment analysis enriched the pathway of biosynthesis of specialized pro-resolving mediators (SPMs) that was carried out by 12-LOX/15-LOX.[234]^94 In addition, enriched terms of NF-kappa B family members-mediated signaling in BP corresponded to the NF-kappa B signaling pathway in KEGG, and its members (NFKB1, NFKB2, RELA, REL, and RELB) were identified to be main regulatory TFs in the upper stream of key module genes of the PPI network. Their down stream-modulated IL4 and IL13 signaling could redirect Th1 and Th2 cells polarization towards Th2 cells.[235]^92^,[236]^95 Moreover, we observed that the DEGs got involved in regulating signaling by interleukins, cytokine signaling in immune system, and G protein-coupled receptor signaling pathway, which further strengthened the potentiality of CXCR2 as a diagnostic biomarker for EoE. The correlation heatmap has indicated that CXCR2 had a high (the second highest) sum of correlation coefficients. The correlation analysis also indicated a strong correlation between CXCR2 and type 2 inflammatory signature genes (ie, IL5 and IL13) in the esophageal epithelium ([237]Figure 7F). Treatment merely targeting at infiltrating eosinophils appeared to be not sufficient, as anti-IL5 monoclonal antibody (mAb) depleting eosinophils did not reduce symptoms or any other chronic inflammatory manifestations in esophagus.[238]^96^,[239]^97 This might suggest that elevated eosinophils were recruited due to Th2 inflammation and not necessary for maintaining such eosinophilic process. Besides, a recent study suggested the utilization of CXCR4 antagonist to protect mice models from such inflammation, which raised the interest whether targeting CXCR2 could achieve a similar effect. Moreover, prediction from the CMAP database indicated potential therapeutic molecules reversing the change trend of pediatric EoE transcriptome. Finally, an eight-axis ceRNA network consisting of the hub diagnostic marker, two miRNAs, and five lncRNA was built to reveal the regulatory mechanism of CXCR2, in which lncRNAs NEAT1, SNHG14, PWAR5, GUSBP11, and XIST competed with both miRNAs and hence had the potential to be core targets in the treatment of pediatric EoE. miRNA is a small non-coding RNA molecule (containing about 22 nucleotides) mainly functions to inhibit target mRNA translation via binding 3’ untranslated region (UTR).[240]^98 LncRNAs act on miRNAs to further regulate mRNA, forming lncRNA/miRNA/mRNA regulatory axis. The hsa-miR-28-5p was involved with immune tolerance regulation of transplanted organs and its upregulation in EoE might lead to reduced expression levels of CXCR2[241]^99 and therefore serves as a therapeutic target for pediatric EoE. It should be indicated that our conclusions came from previously published datasets. The suggested pathogenic mechanisms of miR-28-5p and CXCR2 related to EoE require validation in cells/animal models in the future. Limitations The above results had several limitations: [1] Limited sample size of pediatric EoE patients and a general lack of corresponding clinical data; [2] This study was limited to the transcriptomic level and current findings are awaited to be validated by prospective clinical cohorts or basic experiments; [3] Certain evaluations of the efficacy of CXCR2 in differential diagnosis were performed in adults’ cohort due to a lack of corresponding data of pediatric patients; [4] The efficacy of CXCR2 on differential diagnosis in atopic conditions were not assessed for a lack of data of EoE patients with atopic comorbidities on the same platform; [5] The therapeutic effects of targeting at CXCR2 remain uncertain; [6] The diagnostic/therapeutic efficacy of the miR-28-5p still requires to be investigated; [7] The prognostic value of CXCR2/miR-28-5p is still awaited to be revealed. Therefore, we recommend establishing multi-center, large-scale, prospective pediatric EoE patients’ cohorts to collect tissue samples, preserve clinical data, and follow-up outcomes to further validate our results, as well as exploring more accurate and applicable biomarkers of EoE. Conclusion This study applied systematical methodologies to reveal that immune-related gene CXCR2 is a potential diagnostic biomarker of EoE in pediatric patients. In addition, the CIBERSORT algorithm used to explore dysregulated proportions of subtype of immune cells provided novel insights into optimized immunomodulatory therapies against EoE. Acknowledgments