Abstract Background: Atopic dermatitis (AD) is a common inflammatory skin condition with complex origins. Current treatments often yield suboptimal results due to an incomplete understanding of its underlying mechanisms. This study aimed to identify pathway and gene signatures that distinguish between lesional AD, non-lesional AD, and healthy skin. Method: We conducted differential gene expression and co-expression network analyses to identify differentially co-expressed genes (DCEGs) in lesional AD vs. healthy skin, lesional vs. non-lesional AD, and non-lesional AD vs. healthy skin. Modules associated with lesional and non-lesional AD were identified based on the correlation coefficients between module eigengenes and clinical phenotypes (|R| ≥ 0.5, p-value < 0.05). Subsequently, we employed Ingenuity Pathway Analysis (IPA) on the identified DCEGs, followed by machine learning (ML) analysis within the pathway expression framework. The ML analysis of pathway expressions, selected by IPA and derived from gene expression data, identified relevant pathway signatures, which were validated using an independent dataset and correlated with AD severity measures (EASI and SCORAD). Results: We identified 975, 441, and 40 DCEGs in lesional vs. healthy skin, lesional vs. non-lesional, and non-lesional vs. healthy skin, respectively. IPA and ML analyses revealed 25 relevant pathway signatures, including wound healing, glucocorticoid receptor signaling, and S100 gene family signaling pathways. Validation confirmed the significance of 10 pathway signatures, which were correlated with the AD severity measures. DCEGs such as MMP12 and S100A8 demonstrated high diagnostic efficacy (AUC > 0.70) in both the discovery and validation datasets. Conclusions: Differential gene expression, co-expression networks and ML analyses of pathway expression have unveiled relevant pathways and gene signatures that distinguish between lesional, non-lesional, and healthy skin, providing valuable insights into AD pathogenesis. Keywords: atopic dermatitis (AD), weighted co-expression network analysis, pathway expression analysis, machine learning, diagnostic biomarkers 1. Introduction Atopic dermatitis (AD) is a multifactorial and complex skin disease that affects approximately 30% of children [[30]1] and 3–5% of adults globally [[31]2]. AD presents as a highly heterogeneous phenotype and can manifest in both lesional and non-lesional skin forms [[32]3]. While lesional AD skin is characterized by pronounced inflammation, increased water loss, and compromised barrier function, non-lesional AD skin appears visually healthy but harbors subtle features typical of diseased skin [[33]4,[34]5,[35]6]. Non-lesional areas may play a role in the development of active lesions and increased skin permeability over time. Notably, the trans-epidermal water loss (TEWL) values in non-lesional skin are relatively higher than those in healthy skin, although the differences are not significant [[36]7]. In contrast, lesional AD skin consistently shows significantly elevated TEWL values compared to both non-lesional and healthy skin [[37]8]. Consequently, numerous studies have aimed to discriminate between lesional AD and non-lesional skin [[38]9,[39]10]. In recent years, with the development of high-throughput omics technologies, large-scale transcriptomic AD data can help to understand the underlying molecular mechanisms of AD [[40]11]. Several studies have explored potential biomarkers by comparing gene expression patterns in lesional and non-lesional skin of AD patients. Dyjack et al. investigated skin tape strip samples and found overexpression of IL-13, IL-4R, CCL22, and CCR4 in severe AD [[41]9]. Using transcriptome data from non-lesional skin allowed stratification of patients based on disease severity. Several biomarkers distinguishing between lesional and non-lesional AD samples have also been proposed [[42]10,[43]12,[44]13]. In addition, dysregulated genes in enriched pathways, including atherosclerosis signaling [[45]14], have been implicated in AD pathogenesis, reflecting immune and inflammatory processes [[46]15]. Recent research highlights the activation of the TH22, TH17/IL-23, and TH1 pathways in AD skin subtyping [[47]16]. Despite recent breakthroughs, biological therapies for diagnosing non-lesional and lesional AD remain less understood. AD diagnosis currently relies on patient history and visual assessment, with management involving topical treatments (moisturizers, corticosteroids, or calcineurin inhibitors) that lack specificity for AD [[48]17]. While clinical manifestations of AD skin have been well documented, the molecular distinctions between lesional and non-lesional AD skin remain incompletely studied. Identifying pathways involved in the transition from non-lesional to lesional AD skin is crucial. The recent literature explored ML models for biomarker discovery in AD, leveraging omics data. Martínez et al. developed ML methods to examine four skin diseases, including AD, using gene expression data and differentiating non-lesional samples from normal skin [[49]10]. Zhong et al. used the LASSO method to discover three biomarkers distinguishing lesional from non-lesional AD skin [[50]18]. A recent study used random forest (RF) to identify two AD endotypes, including eosinophil low and eosinophil high, using blood transcriptome data [[51]12]. Several studies have conducted biological pathway analysis using biological relationships between genes, usually using enrichment analysis. However, ML- analysis of expression data at the pathway level or pathway expression analysis to distinguish lesional and non-lesional skin in AD has not been explored. Pathway expression analysis leverages the biological interconnectedness of genes to convert gene expression data into pathway expression data, facilitating the identification of pathway signatures [[52]19,[53]20]. In our study, we used an integrated analysis, including differential gene expression analysis, co-expression network analysis, and IPA and ML analysis of pathway expression to select pathway and gene signatures that discriminate three comparisons: lesional AD vs. heathy skin, lesional vs. non-lesional AD and non-lesional AD vs. healthy skin. We used gene expression datasets, including [54]GSE121212, [55]GSE107361 and [56]GSE130588, from the Gene Expression Omnibus (GEO) database. Potential DCEGs in three pairwise group comparisons were first identified. Then, we identified the underlaying major significant pathways associated with the DCEGs and constructed pathway expression data matrices using the annotated DCEGs in the pathways. The RF method and importance score were used to identify and prioritize pathways that discriminate lesional, non-lesional AD and healthy skin in the [57]GSE121212 dataset. The [58]GSE107361 and [59]GSE130588 datasets were used to validate the pathways and gene signatures associated with lesional, non-lesional and AD severity indices, including EASI and SCORAD. Overall, our integrative analyses were able to provide novel insights to identify important pathways and gene signatures that discriminate lesional vs. healthy, lesional vs. non-lesional and non-lesional vs. healthy skin and provide potential therapeutic targets. 2. Materials and Methods 2.1. Dataset Used in the Study In our study, we harnessed gene expression data and clinical information from publicly available databases to explore the transcriptomic landscape of AD. Our inclusion criteria included any dataset that (1) studies Homo sapiens, (2) has a sample size ≥ 50, and (3) contains the gene expression profiles of lesional, non-lesional and healthy skin samples. Using these inclusion criteria, we found three relevant Gene Expression Omnibus (GEO) datasets: [60]GSE121212, [61]GSE107361, and [62]GSE130588 to understand the molecular underpinnings of lesional and non-lesional AD. We leveraged the RNA-sequencing (RNA-seq) [63]GSE121212 dataset from 92 skin samples, including 27 non-lesional, 27 lesional AD and 38 healthy skins, as the discovery dataset to identify genes and pathway signatures associated with lesional and non-lesional AD. The details of the subject recruitment and study procedure were described in previous studies [[64]21,[65]22]. Dermatologists diagnosed AD based on standardized criteria, and high-depth RNA-seq allowed us to unravel the disease mechanisms. The other two datasets, [66]GSE107361 and [67]GSE130588, which consist of gene expression profiles and the corresponding clinical data, were used for the validation of the results. The [68]GSE107361 dataset consists of 39 healthy, 40 non-lesional and 29 lesional AD skin samples with microarray gene expression profiles. This dataset was preprocessed and normalized in the original study [[69]23]. The [70]GSE130588 dataset consists of 51 lesional, 41 non-lesional AD and 20 healthy skin biopsy specimens along with the disease severity indices EASI and SCORAD collected at baseline in the original study. Data preprocessing and normalization were described in a previous study [[71]24]. The [72]GSE130588 dataset was used to examine the association of the pathway expression and disease severity index measures. A summary of the datasets used in our study is shown in [73]Supplementary Table S1. 2.2. Differential Expression Genes (DEGs) Analysis In our study, we applied rigorous data preprocessing steps to the RNA-seq discovery data. Low-abundance genes (<10 reads) were removed to enhance the robustness of the RNA-seq discovery data [[74]25]. Using the DESeq2 [[75]26] method with variance stabilizing transformation, we performed pairwise differential expression analysis across different groups (lesional AD vs. healthy, lesional vs. non-lesional AD, and non-lesional AD vs. healthy). Genes with adjusted p-values (Benjamini–Hochberg) < 0.05 and an absolute log2 fold change > 1 were deemed significant. 2.3. Weighted Gene Co-Expression Network Analysis (WGCNA) In our study, we focused on identifying co-expressed genes alongside individual differentially expressed genes (DEGs) in AD. By calculating the coefficient of variation (CV) for each gene in the variance-stabilized and normalized RNA-seq data, we pinpointed hypervariable genes (CV > 15%) across three pairwise group comparisons: healthy vs. lesional, healthy vs. non-lesional, and lesional vs. non-lesional in the discovery cohort [[76]26]. Next, we conducted gene co-expression network analysis using the “WGCNA” package [[77]27]. Initially, we constructed a similarity matrix based on the pairwise correlation [MATH: Sij :MATH] = [MATH: cor (xi, < msub>xj :MATH] ), where [MATH: xi :MATH] and [MATH: xj :MATH] represent the ith row and the jth row of hypervariable gene normalized RNA-seq data matrix X, respectively, and then we transformed the similarity matrix into an adjacency matrix, denoted by [MATH: Aij=cor ( xi, xj )β :MATH] , where β represents the suitable soft-thresholding power (ranges from 1–20) and the suitable powers were estimated using the index value in the discovery dataset of each comparison group based on the pick Soft Threshold function [[78]27]. Next, we transformed the adjacency gene network into a topological overlap matrix (TOM) and corresponding dissimilarity (1-TOM) matrix. The WGCNA used hierarchical clustering to identify gene modules/clusters of densely interconnected genes in many samples and color to denote the modules. The minimum number of genes in each module was set to 50 [[79]28]. 2.4. Identifying Modules Correlated with Lesional and Non-Lesional Skin Here, we employed the Spearman correlation coefficient between the clinical phenotypes (lesional AD, non-lesional AD, and healthy skin) and the module eigengenes (MEs), the principal component of a module, to identify crucial gene modules. Genes in the modules correlated with the clinical phenotypes (absolute Spearman correlation coefficient (|R|) > 0.5 and p-value < 0.05) and DEGs obtained from differential analysis were then overlapped, pinpointing differentially co-expressed genes (DCEGs) or hub genes across three group comparisons. 2.5. Functional Enrichment Analysis The pathway enrichment analyses were conducted using the Ingenuity Pathway Analysis software (IPA, QIAGEN, Redwood City, CA, USA) [[80]29] to explore the pathway behaviors of genes within modules correlated with the AD clinical phenotypes and DEGs. For this analysis, we focused on gene signatures overlapping between DEGs meeting specific criteria: absolute log2 fold-change values > 1 and adjusted p-value < 0.05 and DCEGs in modules with an absolute Spearman correlation coefficient (|R|) > 0.5 and p-value < 0.05. IPA pathways with an adjusted p-value < 0.05 were considered in pathway expression analysis below. 2.6. Machine Learning Analysis of Pathway Expression and Statistical Analyses In this study, we utilized pathway expression analysis instead of gene expression level analysis approach [[81]19,[82]20] to identify key pathways. For the pathway expression analysis, we first constructed a pathway expression matrix from the gene expression matrix in the discovery and validation datasets. The general transformation of the gene expression matrix into the pathway expression matrix is shown in [83]Supplementary Figure S1. The pathway expression matrix was then defined as the average expression level of all the annotated genes for each subject and each significant pathway. These pathway expression matrices served as features for the supervised RF analysis, and relevant pathways that discriminated between lesional, non-lesional, and healthy skin were identified. Relevant pathways with an importance score, measured by a mean decrease in accuracy > 4 in classifying the AD clinical phenotype, including lesional, non-lesional AD and healthy skin, were selected using the RandomForests R package. The heatmaps of the AD clinical phenotype discriminatory pathways were visualized using the pheatmap R package. The mean pathway expression across the pairwise comparisons including lesional vs. healthy, non-lesional vs. lesional, and non-lesional vs. healthy skin were compared using t-tests in the discovery and validation normalized gene expression datasets. Pearson correlation (R) analysis was applied to examine the correlation between the pathway expression data and the numerical AD disease severity measures, including EASI and SCORAD, in the validation dataset. An area under the receiver operating characteristics (ROC) curve using a logistic regression model was used for assessing the diagnostic efficiency of the pathway expression and gene signatures in discriminating lesional vs. healthy, lesional vs. non-lesional and non-lesional vs. healthy skin in the discovery and validation datasets. All the statistical analyses were conducted using R v4.3.2. A p-value < 0.05 was used to define statistical significance, unless otherwise stated. 3. Results 3.1. Identification of Differentially Expressed Genes in Lesional and Non-Lesional AD Skin Differential gene expression analysis of the lesional (n = 27), non-lesional (n = 27) AD and healthy control (n = 38) samples revealed 1243 DEGs (544 up- and 699 down-regulated) in lesional vs. healthy, 801 DEGs (355 up- and 446 down-regulated) in lesional vs. non-lesional AD, and 42 DEGs (36 up- and 6 down-regulated) in non-lesional AD vs. healthy skin ([84]Figure 1A–C) in the discovery dataset. Notably, we identified 33 common DEGs (27 up-regulated and 6 down-regulated genes) across three pairwise comparisons, as shown in [85]Figure 1A–C and [86]Supplementary Figure S2A. Unsupervised cluster analysis of these DEGs in the three comparison groups successfully discriminated between lesional, non-lesional, and healthy skin ([87]Supplementary Figure S2B). The observed higher absolute log2 fold change in lesional vs. healthy skin, compared to lesional vs. non-lesional and non-lesional vs. healthy skin (as shown in [88]Supplementary Figure S2B), suggests that several genes are significantly dysregulated in lesional AD. These include up-regulated genes such as SERPINB4, S100A9, S100A8, SPRR2A, S100A7, S100A7A, SPRR2B, KRT16, HEPHL1, and DEFB4A, as well as down-regulated genes such as KRT77, BTC, WIF1, FABP7, CHRM4, ALOX15, SOCS3, NELL2, TMPRSS4, and MMP12), which may enhance our understanding of the AD disease mechanisms and contribute to personalized medicine. Figure 1. [89]Figure 1 [90]Open in a new tab Identification of lesional- and non-lesional-associated genes in the discovery dataset. (A–C) Volcano plot indicating 1243 differentially expressed genes (DEGs with 544 up- and 699 down-regulated) in lesional AD compared to healthy skin, 801 DEGs (355 up- and 446 down-regulated) in lesional compared to non-lesional AD skin and, 42 DEGs (36 up- and 6 down-regulated) in non-lesional AD compared to healthy skin, respectively, in the discovery RNA-seq dataset. A total of 33 common DEGs were identified across three comparisons, as highlighted in the volcano plots. Red dots indicate genes with an adjusted p-value < 0.05 and log2 fold change (log2FC) > 1; green dots indicate genes with an adjusted p-value < 0.05 and log2FC < 1; DEGs, differentially expressed genes. (D) The correlation between the module eigengenes and the clinical phenotype: lesional AD vs. healthy skin. (E) The correlation between the module eigengenes and the clinical phenotype: lesional vs. non-lesional AD skin. (F) The correlation between the module eigengenes and the clinical phenotype: non-lesional vs. healthy skin. Key module genes associated with lesional and non-lesional skin were selected with the following criteria: absolute correlation coefficient, |R| > 0.5 and p-value < 0.05. LS-lesional, NL-non-lesional, H-healthy skin, and R-Spearman’s correlation coefficient. 3.2. Identification of Differential Co-Expression Modules Associated with Lesional and Non-Lesional AD To explore the similarity and heterogeneity of biological molecules in lesional and non-lesional AD, we first identified 4905 hypervariable genes using coefficient of variation analysis across three pairwise comparisons in the discovery dataset. Next, co-expression network analyses of these 4905 hypervariable genes were conducted across the three pairwise comparisons using appropriate soft threshold powers (β = 5, 10, and 5, with scale-free R^2 ≥ 0.85) for the three comparisons ([91]Supplementary Figure S3A–C) to investigate their gene regulatory networks. We identified a total of 9, 12, and 10 co-expression modules in lesional vs. healthy, lesional vs. non-lesional, and non-lesional vs. healthy comparisons ([92]Supplementary Figure S4A–C), respectively. Of these, the key gene modules correlated with lesional and non-lesional AD (|R| > 0.5 and p-value < 0.05) were selected for further analysis ([93]Figure 1D–F and [94]Supplementary Table S2). For the lesional AD and healthy skin comparison, the blue module with 1171 genes is positively correlated with lesional AD, while the brown module with 637 genes is negatively correlated with lesional AD ([95]Figure 1D). In the lesional and non-lesional AD comparison, the blue module (601 genes) and brown module (459 genes) are positively correlated with lesional AD, whereas the black module (236 genes) and red module (312 genes) are negatively correlated with lesional AD ([96]Figure 1E). For the non-lesional AD and healthy skin comparison, the blue module (765 genes) is positively correlated and the magenta module (68 genes) is negatively correlated with non-lesional AD ([97]Figure 1F). After identifying the co-expressed genes/module genes, we conducted an overlapping analysis between the DEGs and the module genes in three pairwise comparisons to screen DCEGs in lesional and non-lesional AD ([98]Supplementary Table S2). For lesional AD and healthy skin, we identified 338 DCEGs in the blue module and 244 DCEGs in the brown module. For lesional AD and non-lesional AD, we identified 240 DCEGs in the blue module, 32 DCEGs in the brown module, 73 DCEGs in the black module, and 96 DCEGs in the red module. For non-lesional AD and healthy skin, we identified 35 DCEGs in the blue module and 5 in the magenta module. Overall, our analyses revealed 975, 441, and 40 DCEGs in the lesional vs. healthy, lesional vs. non-lesional, and non-lesional vs. healthy skin comparisons, respectively. Notably, there were more DCEGs in lesional AD vs. healthy skin, suggesting that more gene regulatory networks are involved in lesional AD. 3.3. Functional Enrichment Analysis of the DCEGs in Lesional and Non-Lesional Correlated Modules To further understand the biological function of DCEGs within modules correlated with non-lesional and lesional AD, we performed a canonical pathway enrichment analysis. We identified a total of 108 over-represented pathways (p-value < 0.05, [99]Supplementary Tables S3–S7). The 338 DCEGs in the blue module derived from lesional AD and healthy skin are involved in several pathways, including granulocyte adhesion and diapedesis, atherosclerosis signaling, granulocyte adhesion and diapedesis, pathogen-induced cytokine storm signaling pathway, interferon signaling and S100 family signaling pathways ([100]Supplementary Table S3). The 244 DCEGs in the brown module derived from lesional and healthy skin are enriched in 10 pathways, including the 3 most significant pathways, the role of osteoblasts in rheumatoid arthritis signaling, neurovascular coupling signaling and iron homeostasis signaling ([101]Supplementary Table S4). The 240 DECGs in the blue module derived from lesional and non-lesional AD are enriched in 51 pathways, with the 5 most significant pathways included atherosclerosis signaling, interferon signaling, S100 family signaling, IL-17 signaling, and differential regulation of cytokine production in intestinal epithelial cells by IL-17A and IL-17F ([102]Table S5). The 32 DCEGs in the brown module derived from lesional and non-lesional are involved in several pathways, including granulocyte adhesion and diapedesis, agranulocyte adhesion and diapedesis, atherosclerosis signaling, and pathogenesis of multiple sclerosis ([103]Supplementary Table S6). The 35 DCEGs in the blue module derived from non-lesional and healthy skin are enriched in 9 pathways, including the IL-13 signaling pathway, pathogen-induced cytokine storm signaling pathway, and granulocyte adhesion and diapedesis pathways ([104]Supplementary Table S7). 3.4. Identification of Lesional and Non-Lesional AD Discriminatory Pathway Signatures As shown in the functional enrichment analysis, we found several significant pathways enriched in lesional and non-lesional AD. To prioritize and identify important pathways that differentiate lesional and non-lesional AD, we first transformed the gene expression data into pathway expression-level data by calculating the average gene expression levels within each of the 108 enriched pathways. Then, we used RF algorithms and selected 25 pathway expression signatures based on their discriminatory importance for lesional, non-lesional AD, and healthy skin ([105]Figure 2A,B). These pathways exhibited differential modulation across the comparisons, with higher expression profiles in lesional AD compared to non-lesional AD and healthy skin ([106]Figure 2C–E and [107]Supplementary Figure S5). Figure 2. [108]Figure 2 [109]Open in a new tab Identification of the pathway signatures discriminating lesional AD, non-lesional AD and healthy skin in the discovery dataset. (A) Most important pathway signatures selected by the random forest method. (B) The heatmap showing the mean normalized expression level of 25 pathway signatures among three groups. (C–E) Comparison of the mean normalized expression level of the pathway signatures in three pairwise comparisons. The significance levels of the comparison pathway mean differential modulation across the pairwise group comparisons are indicated by the stars (**** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05). LS-lesional, NL-non-lesional, H-healthy skin. 3.5. Enriched Pathway Signatures Were Correlated with AD Clinical Severity Measures After identifying the pathway expression signatures, we validated the pathway expression signatures using independent datasets. Specifically, we analyzed the [110]GSE107361 dataset to confirm the differential modulation of ten pathway expression signatures across lesional, non-lesional, and healthy skin. Notably, ten pathway expression signatures, including wound healing, glucocorticoid receptor signaling and S100 family signaling, were differentially modulated among lesional, non-lesional AD and healthy skin ([111]Figure 3A–D and [112]Supplementary Figure S6). Genes implicated in wound healing (CXCL8, HBEGF, IL36A, IL36G, KRT16, KRT17, KRT6A, KRT6B, KRT6C, MMP1 and PGF), genes implicated in glucocorticoid receptor signaling (CCL2, CXCL8, IL4R, KRT16, KRT17, KRT6A, KRT6B, KRT6C, MMP1, PLA2G2F, PLA2G and SELE) and genes implicated in S100 family signaling (CCL20, CCR7, CXCL8, MMP1, MMP12, S100A12, and S100A8) and other pathways showed a higher pathway expression pattern in lesional AD compared to healthy skin and non-lesional AD skin ([113]Figure 3A–D and [114]Figure S6). Interestingly, multiple genes were involved in the same pathways ([115]Supplementary Figure S7), highlighting that targeted research on these pathways may enhance our understanding and lead to new therapies for lesional and non-lesional AD. Additionally, we performed Pearson correlation analysis between the pathway expression levels and the disease severity measured by the SCORAD and EASI indices to identify pathway expression associated with the disease severity index. The results showed that the pathway expression signatures, including wound healing, glucocorticoid receptor, and S100 family signaling ([116]Figure 4A–F and [117]Supplementary Figure S8), were correlated with SCORAD and EASI. Figure 3. [118]Figure 3 [119]Open in a new tab Validation of the pathway signatures in discriminating lesional, non-lesional and healthy skin in the independent validation dataset. (A) The heatmap showing the mean normalized expression level of the 25 pathway signatures among three groups. (B–D) Comparison of the mean normalized expression level of the pathway signatures in three pairwise comparisons. The significance levels of the comparison pathway mean differential modulation across pairwise group comparisons were indicated by the stars (**** p < 0.0001, *** p < 0.001, ** p < 0.01, ns: not significant). LS-lesional, NL-non-lesional, H-healthy skin. Figure 4. [120]Figure 4 [121]Open in a new tab Validation of the association of the pathway mean expression levels and the disease severity measures, including the SCORAD and EASI, in lesional AD skin. (A) The correlation between the mean normalized expression of genes in the wound-healing signaling pathway and the SCORAD score. (B) The correlation between the mean normalized expression of genes in the glucocorticoid receptor signaling pathway and the SCORAD score. (C) The correlation between the mean normalized expression of genes in the S100 family signaling pathway and the SCORAD score. (D) The correlation between the mean normalized expression of genes in the wound-healing signaling pathway and the EASI score. (E) The correlation between the mean normalized expression of genes in the glucocorticoid receptor signaling pathway and the EASI score. (F) The correlation between the mean normalized expression of genes in the S100 family signaling pathway and the EASI score. 3.6. Validation of Pathways and Gene Signatures Discriminating Lesional, Non-Lesional AD and Healthy Skin After identifying the relevant pathways and gene signatures that differentiate lesional, non-lesional and healthy skin, we examined the diagnostic performance of the pathways in the discovery and validation datasets ([122]Figure 5A). There were 33 key genes involved in the AD severity-associated pathways, of which 18 genes showed high diagnostic performance in discriminating lesional AD vs. non-lesional AD as well as non-lesional AD vs. health skin, as shown [123]Figure 5B. More specifically, the glucocorticoid receptor and IL-33 signaling-related gene SELE showed high diagnostic efficacy in the [124]GSE121212 (AUC = 0.947) and [125]GSE107361 (AUC = 0.727) datasets between lesional AD vs. non-lesional AD. The atherosclerosis and S100 family signaling-related gene S100A8 showed high diagnostic ability in the [126]GSE121212 (AUC = 0.933), and [127]GSE107361 (AUC = 0.846) datasets between lesional and non-lesional AD. The gene OAS2, related to the role of pattern recognition receptors in recognizing bacteria and viruses, had high diagnostic performance in the [128]GSE121212 (AUC = 0.922) and [129]GSE107361 (AUC = 0.679) datasets between lesional and non-lesional AD. The gene MMP12, associated with multiple pathways such as wound healing, glucocorticoid receptor, and IL-33 signaling, demonstrated high diagnostic ability in discriminating lesional and non-lesional AD in the [130]GSE121212 (AUC = 0.892) and [131]GSE107361 (AUC = 0.729) datasets. Moreover, differential regulation of cytokine production in intestinal epithelial cells by the IL-17A and IL-17F and IL-13 signaling- associated gene DEFB4A had high diagnostic performance in the [132]GSE121212 (AUC = 0.775) and [133]GSE107361 (AUC = 0.808) datasets, followed by OAS2 in the [134]GSE121212 (AUC = 0.76) and [135]GSE107361 (AUC = 0.588) and MMP1 in the [136]GSE121212 (AUC = 0.729) and [137]GSE107361 (AUC = 0.875) datasets in classifying non-lesional AD from healthy skin. Overall, integrative analysis using differential analysis, co-expression networks, and pathway expression ranking via ML approaches identified relevant pathways and gene signatures that discriminate between lesional, non-lesional, and healthy skin. Figure 5. [138]Figure 5 [139]Open in a new tab The diagnostic performance of the pathways and gene signatures. (A) The AUC values of the pathway signatures with selected representative annotated genes in discriminating lesional AD vs. healthy skin, lesional AD vs. non-lesional AD skin, and non-lesional AD and healthy skin in the discovery and validation datasets. (B) The AUC value of the gene signatures in discriminating lesional AD vs. healthy skin, lesional AD vs. non-lesional AD skin, and non-lesional AD and healthy skin in the discovery and validation datasets. LS-lesional skin, NL-non-lesional, H-healthy skin. [140]GSE121212-discovey dataset, [141]GSE107361-validation dataset. 4. Discussion In the transcriptome-wide pairwise comparison of lesional, non-lesional and healthy skin by integrating differential expression analysis and WGCNA, we identified 975, 441, and 40 DCEGs in lesional vs. healthy skin, lesional vs. non-lesional, and non-lesional vs. healthy skin, respectively, highlighting several clusters of densely interconnected genes involved in lesional skin relative to non-lesional and healthy skin. These DCEGs were enriched in several significant pathways, of which RF prioritized and validated 10 pathway expression signatures, including wound healing (CXCL8 and MMP1), glucocorticoid receptor signaling (CCL2, CXCL8, MMP1and SELE), and S100 family signaling (CXCL8, MMP1 and MMP12), that differentiate lesional, non-lesional, and healthy skin. The pathway expression signatures were correlated with clinical severity measures, including the EASI and SCORAD. Overall, our study demonstrates that high throughput integrative transcriptomics analysis differentiates lesional and non-lesional skin from heathy skin. Identifying pathways and gene signatures that differentiate lesional, non-lesional, and healthy skin is crucial for AD therapeutics. Previous studies have used differential gene expression analysis to differentiate lesional and non-lesional AD [[142]30,[143]31]. A recent study by Martínez et al. developed an ML method to discriminate between lesional and non-lesional inflammatory skin diseases, including AD, using expression data [[144]10]. Another study by Zhong et al. employed the LASSO method to discover three biomarkers that differentiate non-lesional and lesional AD [[145]18]. However, few studies used combinatorial approaches, including differential analysis, co-expression network analysis, and pathway expression analysis-based on an ML method, to identify pathways and gene signatures that discriminate lesional, non-lesional AD, and healthy skin. These methods can uncover connection networks and hidden biological models crucial to disease pathogenesis [[146]32]. Constructing networks of co-expressed genes identifies highly correlated modules, associates them with phenotypic traits, and elucidates complex gene interactions [[147]33]. This integrative approach demonstrates robustness against noise and variability, offering profound insights into gene regulation and potential therapeutic targets [[148]34]. Therefore, in this study, we used a combinatorial approach, including differential analysis, co-expression network analysis, and machine learning based on pathway expression, to help us select relevant pathways and gene signatures using RNA-seq data from diseased lesional, non-lesional, and healthy skin. More fundamentally, integrated approaches to identify relevant pathway signatures that can predict AD disease severity are crucial. Univariate statistical tests, such as the empirical Bayes-moderated t-statistics test, treat genes as independent from one another and are often poorly reproducible in independent datasets [[149]35]. Therefore, our study used more comprehensive methods to identify the DCEGs in lesional and non-lesional AD, showed these DCEGs were enriched in pathways and gene signatures that discriminated lesional, non-lesional and healthy skin. The 975 DCEGs in lesional vs. healthy, 441 DCEGs in non-lesional vs. healthy, and 40 DCEGs in non-lesional vs. healthy skin were enriched in 108 pathways. Using the RF machine learning algorithm, we showed that the pathway expression levels of 25 identified potential pathways clearly distinguished lesional, non-lesional and healthy skin. Validation analysis confirmed that ten pathway expressions, including wound-healing signaling, glucocorticoid receptor signaling, macrophage alternative activation signaling, IL-33 signaling, S100 family signaling, MIF-mediated glucocorticoid regulation, serotonin receptor signaling, MIF regulation of innate immunity, VEGF family ligand–receptor interactions and tumor microenvironment, showed significant differences among lesional, non-lesional and healthy skin. Remarkably, this study validated that the ten pathway signatures are correlated with the disease severity, including the EASI and SCORAD indices. This may suggest that these pathway trajectories contribute to the progression from non-lesional to lesional AD skin disease. After identifying pathways associated with lesional and non-lesional skin, we examined the diagnostic relevance of individual genes involved in these pathways. Most of the genes, including MMP12, DEFB103A/B, DEFB4A/B, S100A12, S100A8, CCL22, CCR7, CXCL1, CXCL10, MMP1 and SELE, in these pathways showed high diagnostic efficacy in discriminating lesional from non-lesional skin. In addition, genes such as DEFB4A, OAS2, MMP12 and S100A8 showed high diagnostic performance in classifying non-lesional and healthy samples with an AUC > 0.70 in both discovery and validated datasets. Dysregulated pathways such as the S100 family, tumor microenvironment and IL-33 signaling that contain MMP12 with high diagnostic ability may have functional relevance in AD’s pathology. The wound-healing-related gene signatures such as CXCL8 and MMP1 that were up-regulated in lesional AD compared with non-lesional and healthy skin showed high diagnostic performance in distinguishing lesional AD from non-lesional AD and healthy skin. Interestingly, previous studies indicated that wound healing is a complex process that involves the interaction between different cell types, growth hormones, cytokines, antioxidants, and a stable supply of metal ions [[150]36], and our identified genes may have functional relevance in this complex AD pathology. We also identified three genes, including MMP12, DEFB4A, and S100A8, with high diagnostic value in discriminating lesional AD vs. non-lesional AD as well as non-lesional AD vs. healthy skin. Overall, several genes with high diagnostic value in discriminating lesional, non-lesional and healthy skin might be involved in the disease pathogenesis and could be used as potential biomarkers for the diagnosis of AD and as therapeutic targets of AD. Despite the mechanisms that underlie AD progression still being unclear, in this study, we identified several DCEGs associated with lesional and non-lesional AD skin, and several of the identified genes showed a consistent direction of regulation in lesional and non-lesional AD compared with heathy skin, which were also consistent with previous studies. For example, a previous study identified MMP12 as a potential biomarker of both lesional and non-lesional AD skin [[151]37,[152]38]. A recent study by Guttman-Yassky et al. revealed that a 12-week treatment with abrocitinib, a Janus kinase 1-selective inhibitor, down-regulated the expression of several gene biomarkers, including MMP12 and S100A8, in patients with moderate-to-severe AD skin associated with inflammation, epidermal hyperplasia, and Th2 and Th22 immune responses, highlighting that genes found in our study have potential as biomarkers for monitoring and managing AD disease progression [[153]39]. In addition, another study showed that genes encoding epidermal components such as DEFB4A are up-regulated in lesional skin compared with non-lesional AD skin [[154]14]. Although there are numerous reports of gene expression in healthy skin, less is known about the architecture of clinically uninvolved non-lesional skin as compared to healthy skin. Examination of non-lesional skin in AD could provide previously unknown insights into the molecular processes operating in uninvolved skin and suggest a unique preclinical set of healthy skin in non-lesional condition. Non-lesional skin may be more useful in identifying the driving features underlying pathogenesis because the inflammatory milieu among diseases becomes more similar during the lesional stage. Our study indicated that non-lesional skin samples are extremely informative about the underlying disease process and could be used to create patient subsets for future clinical trials or de-risk clinical trials, as non-lesional skin is reported to be an effective marker of the treatment response [[155]40]. Our study has several strengths. Integrated analysis of the correlation network analysis and ML method identified disease-relevant pathways such as MIF regulation of innate immunity and IL-33 signaling, inflammatory-related pathway–wound healing and metabolic-related pathways–serotonin signaling, which can be used to distinguish lesional from non-lesional and healthy skin. Pathway prioritization in discriminating lesional, nonregional and healthy skin based on the ML approach showed little similarity to pathway prioritization based on the traditional method, which indicates that ML-based pathway prioritization may provide an alternative approach to identify relevant pathways associated with the disease phenotype. This study highlights pathways correlated with disease activity, suggesting their relevance to disease pathogenesis. However, the study faces limitations such as the small sample sizes in the gene expression datasets and the absence of detailed clinical variables. The relatively small sample sizes in some public datasets may hinder the identification of potential association pathways and gene signatures related to AD severity. Future research should address these limitations by increasing the sample sizes, incorporating clinical factors, and conducting functional experiments to validate the findings. 5. Conclusions In summary, we presented evidence supporting the predictive roles of pathway and gene expression signatures in distinguishing between lesional and non-lesional AD through an integrative analysis of networks and pathway expressions based on a machine learning approach. Notably, the pathway signatures were closely correlated with the AD severity measures, including the EASI and SCORAD indices. These findings may have practical implications for clinical practice, as these signatures could enhance the accuracy of predicting AD skin involvement, thereby informing more effective treatment strategies for AD disease. Supplementary Materials The following supporting information can be downloaded at [156]https://www.mdpi.com/article/10.3390/jpm14090960/s1. [157]jpm-14-00960-s001.zip^ (2.4MB, zip) Author Contributions T.B.M. conceived the study. E.Y.D. and T.B.M. designed the study. E.Y.D. wrote the manuscript with support from T.B.M., L.S., L.D. and E.Y.D. performed the statistical analyses. All authors have read and agreed to the published version of the manuscript. Institutional Review Board Statement Not applicable. Informed Consent Statement Not applicable. Data Availability Statement All the transcriptomics datasets used in our study are freely accessible at NCBI GEO ([158]https://www.ncbi.nlm.nih.gov/geo/) with accession numbers [159]GSE121212, [160]GSE107361 and [161]GSE130588, accessed on 5 March 2024. Conflicts of Interest The authors declare no competing interests. Funding Statement This work was supported by the National Institutes of Health (NIH) NHGRI (R01 HG011411)] grant support. Footnotes Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. References