Abstract
Background: Atopic dermatitis (AD) is a common inflammatory skin
condition with complex origins. Current treatments often yield
suboptimal results due to an incomplete understanding of its underlying
mechanisms. This study aimed to identify pathway and gene signatures
that distinguish between lesional AD, non-lesional AD, and healthy
skin. Method: We conducted differential gene expression and
co-expression network analyses to identify differentially co-expressed
genes (DCEGs) in lesional AD vs. healthy skin, lesional vs.
non-lesional AD, and non-lesional AD vs. healthy skin. Modules
associated with lesional and non-lesional AD were identified based on
the correlation coefficients between module eigengenes and clinical
phenotypes (|R| ≥ 0.5, p-value < 0.05). Subsequently, we employed
Ingenuity Pathway Analysis (IPA) on the identified DCEGs, followed by
machine learning (ML) analysis within the pathway expression framework.
The ML analysis of pathway expressions, selected by IPA and derived
from gene expression data, identified relevant pathway signatures,
which were validated using an independent dataset and correlated with
AD severity measures (EASI and SCORAD). Results: We identified 975,
441, and 40 DCEGs in lesional vs. healthy skin, lesional vs.
non-lesional, and non-lesional vs. healthy skin, respectively. IPA and
ML analyses revealed 25 relevant pathway signatures, including wound
healing, glucocorticoid receptor signaling, and S100 gene family
signaling pathways. Validation confirmed the significance of 10 pathway
signatures, which were correlated with the AD severity measures. DCEGs
such as MMP12 and S100A8 demonstrated high diagnostic efficacy (AUC >
0.70) in both the discovery and validation datasets. Conclusions:
Differential gene expression, co-expression networks and ML analyses of
pathway expression have unveiled relevant pathways and gene signatures
that distinguish between lesional, non-lesional, and healthy skin,
providing valuable insights into AD pathogenesis.
Keywords: atopic dermatitis (AD), weighted co-expression network
analysis, pathway expression analysis, machine learning, diagnostic
biomarkers
1. Introduction
Atopic dermatitis (AD) is a multifactorial and complex skin disease
that affects approximately 30% of children [[30]1] and 3–5% of adults
globally [[31]2]. AD presents as a highly heterogeneous phenotype and
can manifest in both lesional and non-lesional skin forms [[32]3].
While lesional AD skin is characterized by pronounced inflammation,
increased water loss, and compromised barrier function, non-lesional AD
skin appears visually healthy but harbors subtle features typical of
diseased skin [[33]4,[34]5,[35]6]. Non-lesional areas may play a role
in the development of active lesions and increased skin permeability
over time. Notably, the trans-epidermal water loss (TEWL) values in
non-lesional skin are relatively higher than those in healthy skin,
although the differences are not significant [[36]7]. In contrast,
lesional AD skin consistently shows significantly elevated TEWL values
compared to both non-lesional and healthy skin [[37]8]. Consequently,
numerous studies have aimed to discriminate between lesional AD and
non-lesional skin [[38]9,[39]10].
In recent years, with the development of high-throughput omics
technologies, large-scale transcriptomic AD data can help to understand
the underlying molecular mechanisms of AD [[40]11]. Several studies
have explored potential biomarkers by comparing gene expression
patterns in lesional and non-lesional skin of AD patients. Dyjack et
al. investigated skin tape strip samples and found overexpression of
IL-13, IL-4R, CCL22, and CCR4 in severe AD [[41]9]. Using transcriptome
data from non-lesional skin allowed stratification of patients based on
disease severity. Several biomarkers distinguishing between lesional
and non-lesional AD samples have also been proposed
[[42]10,[43]12,[44]13]. In addition, dysregulated genes in enriched
pathways, including atherosclerosis signaling [[45]14], have been
implicated in AD pathogenesis, reflecting immune and inflammatory
processes [[46]15]. Recent research highlights the activation of the
TH22, TH17/IL-23, and TH1 pathways in AD skin subtyping [[47]16].
Despite recent breakthroughs, biological therapies for diagnosing
non-lesional and lesional AD remain less understood. AD diagnosis
currently relies on patient history and visual assessment, with
management involving topical treatments (moisturizers, corticosteroids,
or calcineurin inhibitors) that lack specificity for AD [[48]17]. While
clinical manifestations of AD skin have been well documented, the
molecular distinctions between lesional and non-lesional AD skin remain
incompletely studied. Identifying pathways involved in the transition
from non-lesional to lesional AD skin is crucial.
The recent literature explored ML models for biomarker discovery in AD,
leveraging omics data. Martínez et al. developed ML methods to examine
four skin diseases, including AD, using gene expression data and
differentiating non-lesional samples from normal skin [[49]10]. Zhong
et al. used the LASSO method to discover three biomarkers
distinguishing lesional from non-lesional AD skin [[50]18]. A recent
study used random forest (RF) to identify two AD endotypes, including
eosinophil low and eosinophil high, using blood transcriptome data
[[51]12]. Several studies have conducted biological pathway analysis
using biological relationships between genes, usually using enrichment
analysis. However, ML- analysis of expression data at the pathway level
or pathway expression analysis to distinguish lesional and non-lesional
skin in AD has not been explored. Pathway expression analysis leverages
the biological interconnectedness of genes to convert gene expression
data into pathway expression data, facilitating the identification of
pathway signatures [[52]19,[53]20].
In our study, we used an integrated analysis, including differential
gene expression analysis, co-expression network analysis, and IPA and
ML analysis of pathway expression to select pathway and gene signatures
that discriminate three comparisons: lesional AD vs. heathy skin,
lesional vs. non-lesional AD and non-lesional AD vs. healthy skin. We
used gene expression datasets, including [54]GSE121212, [55]GSE107361
and [56]GSE130588, from the Gene Expression Omnibus (GEO) database.
Potential DCEGs in three pairwise group comparisons were first
identified. Then, we identified the underlaying major significant
pathways associated with the DCEGs and constructed pathway expression
data matrices using the annotated DCEGs in the pathways. The RF method
and importance score were used to identify and prioritize pathways that
discriminate lesional, non-lesional AD and healthy skin in the
[57]GSE121212 dataset. The [58]GSE107361 and [59]GSE130588 datasets
were used to validate the pathways and gene signatures associated with
lesional, non-lesional and AD severity indices, including EASI and
SCORAD. Overall, our integrative analyses were able to provide novel
insights to identify important pathways and gene signatures that
discriminate lesional vs. healthy, lesional vs. non-lesional and
non-lesional vs. healthy skin and provide potential therapeutic
targets.
2. Materials and Methods
2.1. Dataset Used in the Study
In our study, we harnessed gene expression data and clinical
information from publicly available databases to explore the
transcriptomic landscape of AD. Our inclusion criteria included any
dataset that (1) studies Homo sapiens, (2) has a sample size ≥ 50, and
(3) contains the gene expression profiles of lesional, non-lesional and
healthy skin samples. Using these inclusion criteria, we found three
relevant Gene Expression Omnibus (GEO) datasets: [60]GSE121212,
[61]GSE107361, and [62]GSE130588 to understand the molecular
underpinnings of lesional and non-lesional AD. We leveraged the
RNA-sequencing (RNA-seq) [63]GSE121212 dataset from 92 skin samples,
including 27 non-lesional, 27 lesional AD and 38 healthy skins, as the
discovery dataset to identify genes and pathway signatures associated
with lesional and non-lesional AD. The details of the subject
recruitment and study procedure were described in previous studies
[[64]21,[65]22]. Dermatologists diagnosed AD based on standardized
criteria, and high-depth RNA-seq allowed us to unravel the disease
mechanisms. The other two datasets, [66]GSE107361 and [67]GSE130588,
which consist of gene expression profiles and the corresponding
clinical data, were used for the validation of the results. The
[68]GSE107361 dataset consists of 39 healthy, 40 non-lesional and 29
lesional AD skin samples with microarray gene expression profiles. This
dataset was preprocessed and normalized in the original study [[69]23].
The [70]GSE130588 dataset consists of 51 lesional, 41 non-lesional AD
and 20 healthy skin biopsy specimens along with the disease severity
indices EASI and SCORAD collected at baseline in the original study.
Data preprocessing and normalization were described in a previous study
[[71]24]. The [72]GSE130588 dataset was used to examine the association
of the pathway expression and disease severity index measures. A
summary of the datasets used in our study is shown in [73]Supplementary
Table S1.
2.2. Differential Expression Genes (DEGs) Analysis
In our study, we applied rigorous data preprocessing steps to the
RNA-seq discovery data. Low-abundance genes (<10 reads) were removed to
enhance the robustness of the RNA-seq discovery data [[74]25]. Using
the DESeq2 [[75]26] method with variance stabilizing transformation, we
performed pairwise differential expression analysis across different
groups (lesional AD vs. healthy, lesional vs. non-lesional AD, and
non-lesional AD vs. healthy). Genes with adjusted p-values
(Benjamini–Hochberg) < 0.05 and an absolute log2 fold change > 1 were
deemed significant.
2.3. Weighted Gene Co-Expression Network Analysis (WGCNA)
In our study, we focused on identifying co-expressed genes alongside
individual differentially expressed genes (DEGs) in AD. By calculating
the coefficient of variation (CV) for each gene in the
variance-stabilized and normalized RNA-seq data, we pinpointed
hypervariable genes (CV > 15%) across three pairwise group comparisons:
healthy vs. lesional, healthy vs. non-lesional, and lesional vs.
non-lesional in the discovery cohort [[76]26]. Next, we conducted gene
co-expression network analysis using the “WGCNA” package [[77]27].
Initially, we constructed a similarity matrix based on the pairwise
correlation
[MATH:
Sij :MATH]
=
[MATH:
cor (xi, <
msub>xj :MATH]
), where
[MATH:
xi
:MATH]
and
[MATH:
xj
:MATH]
represent the ith row and the jth row of hypervariable gene normalized
RNA-seq data matrix X, respectively, and then we transformed the
similarity matrix into an adjacency matrix, denoted by
[MATH:
Aij=cor (
xi, xj
)β :MATH]
, where β represents the suitable soft-thresholding power (ranges from
1–20) and the suitable powers were estimated using the index value in
the discovery dataset of each comparison group based on the pick Soft
Threshold function [[78]27]. Next, we transformed the adjacency gene
network into a topological overlap matrix (TOM) and corresponding
dissimilarity (1-TOM) matrix. The WGCNA used hierarchical clustering to
identify gene modules/clusters of densely interconnected genes in many
samples and color to denote the modules. The minimum number of genes in
each module was set to 50 [[79]28].
2.4. Identifying Modules Correlated with Lesional and Non-Lesional Skin
Here, we employed the Spearman correlation coefficient between the
clinical phenotypes (lesional AD, non-lesional AD, and healthy skin)
and the module eigengenes (MEs), the principal component of a module,
to identify crucial gene modules. Genes in the modules correlated with
the clinical phenotypes (absolute Spearman correlation coefficient
(|R|) > 0.5 and p-value < 0.05) and DEGs obtained from differential
analysis were then overlapped, pinpointing differentially co-expressed
genes (DCEGs) or hub genes across three group comparisons.
2.5. Functional Enrichment Analysis
The pathway enrichment analyses were conducted using the Ingenuity
Pathway Analysis software (IPA, QIAGEN, Redwood City, CA, USA) [[80]29]
to explore the pathway behaviors of genes within modules correlated
with the AD clinical phenotypes and DEGs. For this analysis, we focused
on gene signatures overlapping between DEGs meeting specific criteria:
absolute log2 fold-change values > 1 and adjusted p-value < 0.05 and
DCEGs in modules with an absolute Spearman correlation coefficient
(|R|) > 0.5 and p-value < 0.05. IPA pathways with an adjusted p-value <
0.05 were considered in pathway expression analysis below.
2.6. Machine Learning Analysis of Pathway Expression and Statistical Analyses
In this study, we utilized pathway expression analysis instead of gene
expression level analysis approach [[81]19,[82]20] to identify key
pathways. For the pathway expression analysis, we first constructed a
pathway expression matrix from the gene expression matrix in the
discovery and validation datasets. The general transformation of the
gene expression matrix into the pathway expression matrix is shown in
[83]Supplementary Figure S1. The pathway expression matrix was then
defined as the average expression level of all the annotated genes for
each subject and each significant pathway. These pathway expression
matrices served as features for the supervised RF analysis, and
relevant pathways that discriminated between lesional, non-lesional,
and healthy skin were identified. Relevant pathways with an importance
score, measured by a mean decrease in accuracy > 4 in classifying the
AD clinical phenotype, including lesional, non-lesional AD and healthy
skin, were selected using the RandomForests R package. The heatmaps of
the AD clinical phenotype discriminatory pathways were visualized using
the pheatmap R package. The mean pathway expression across the pairwise
comparisons including lesional vs. healthy, non-lesional vs. lesional,
and non-lesional vs. healthy skin were compared using t-tests in the
discovery and validation normalized gene expression datasets. Pearson
correlation (R) analysis was applied to examine the correlation between
the pathway expression data and the numerical AD disease severity
measures, including EASI and SCORAD, in the validation dataset. An area
under the receiver operating characteristics (ROC) curve using a
logistic regression model was used for assessing the diagnostic
efficiency of the pathway expression and gene signatures in
discriminating lesional vs. healthy, lesional vs. non-lesional and
non-lesional vs. healthy skin in the discovery and validation datasets.
All the statistical analyses were conducted using R v4.3.2. A
p-value < 0.05 was used to define statistical significance, unless
otherwise stated.
3. Results
3.1. Identification of Differentially Expressed Genes in Lesional and
Non-Lesional AD Skin
Differential gene expression analysis of the lesional (n = 27),
non-lesional (n = 27) AD and healthy control (n = 38) samples revealed
1243 DEGs (544 up- and 699 down-regulated) in lesional vs. healthy, 801
DEGs (355 up- and 446 down-regulated) in lesional vs. non-lesional AD,
and 42 DEGs (36 up- and 6 down-regulated) in non-lesional AD vs.
healthy skin ([84]Figure 1A–C) in the discovery dataset. Notably, we
identified 33 common DEGs (27 up-regulated and 6 down-regulated genes)
across three pairwise comparisons, as shown in [85]Figure 1A–C and
[86]Supplementary Figure S2A. Unsupervised cluster analysis of these
DEGs in the three comparison groups successfully discriminated between
lesional, non-lesional, and healthy skin ([87]Supplementary Figure
S2B). The observed higher absolute log2 fold change in lesional vs.
healthy skin, compared to lesional vs. non-lesional and non-lesional
vs. healthy skin (as shown in [88]Supplementary Figure S2B), suggests
that several genes are significantly dysregulated in lesional AD. These
include up-regulated genes such as SERPINB4, S100A9, S100A8, SPRR2A,
S100A7, S100A7A, SPRR2B, KRT16, HEPHL1, and DEFB4A, as well as
down-regulated genes such as KRT77, BTC, WIF1, FABP7, CHRM4, ALOX15,
SOCS3, NELL2, TMPRSS4, and MMP12), which may enhance our understanding
of the AD disease mechanisms and contribute to personalized medicine.
Figure 1.
[89]Figure 1
[90]Open in a new tab
Identification of lesional- and non-lesional-associated genes in the
discovery dataset. (A–C) Volcano plot indicating 1243 differentially
expressed genes (DEGs with 544 up- and 699 down-regulated) in lesional
AD compared to healthy skin, 801 DEGs (355 up- and 446 down-regulated)
in lesional compared to non-lesional AD skin and, 42 DEGs (36 up- and 6
down-regulated) in non-lesional AD compared to healthy skin,
respectively, in the discovery RNA-seq dataset. A total of 33 common
DEGs were identified across three comparisons, as highlighted in the
volcano plots. Red dots indicate genes with an adjusted p-value < 0.05
and log2 fold change (log2FC) > 1; green dots indicate genes with an
adjusted p-value < 0.05 and log2FC < 1; DEGs, differentially expressed
genes. (D) The correlation between the module eigengenes and the
clinical phenotype: lesional AD vs. healthy skin. (E) The correlation
between the module eigengenes and the clinical phenotype: lesional vs.
non-lesional AD skin. (F) The correlation between the module eigengenes
and the clinical phenotype: non-lesional vs. healthy skin. Key module
genes associated with lesional and non-lesional skin were selected with
the following criteria: absolute correlation coefficient, |R| > 0.5 and
p-value < 0.05. LS-lesional, NL-non-lesional, H-healthy skin, and
R-Spearman’s correlation coefficient.
3.2. Identification of Differential Co-Expression Modules Associated with
Lesional and Non-Lesional AD
To explore the similarity and heterogeneity of biological molecules in
lesional and non-lesional AD, we first identified 4905 hypervariable
genes using coefficient of variation analysis across three pairwise
comparisons in the discovery dataset. Next, co-expression network
analyses of these 4905 hypervariable genes were conducted across the
three pairwise comparisons using appropriate soft threshold powers (β =
5, 10, and 5, with scale-free R^2 ≥ 0.85) for the three comparisons
([91]Supplementary Figure S3A–C) to investigate their gene regulatory
networks. We identified a total of 9, 12, and 10 co-expression modules
in lesional vs. healthy, lesional vs. non-lesional, and non-lesional
vs. healthy comparisons ([92]Supplementary Figure S4A–C), respectively.
Of these, the key gene modules correlated with lesional and
non-lesional AD (|R| > 0.5 and p-value < 0.05) were selected for
further analysis ([93]Figure 1D–F and [94]Supplementary Table S2). For
the lesional AD and healthy skin comparison, the blue module with 1171
genes is positively correlated with lesional AD, while the brown module
with 637 genes is negatively correlated with lesional AD ([95]Figure
1D). In the lesional and non-lesional AD comparison, the blue module
(601 genes) and brown module (459 genes) are positively correlated with
lesional AD, whereas the black module (236 genes) and red module (312
genes) are negatively correlated with lesional AD ([96]Figure 1E). For
the non-lesional AD and healthy skin comparison, the blue module (765
genes) is positively correlated and the magenta module (68 genes) is
negatively correlated with non-lesional AD ([97]Figure 1F).
After identifying the co-expressed genes/module genes, we conducted an
overlapping analysis between the DEGs and the module genes in three
pairwise comparisons to screen DCEGs in lesional and non-lesional AD
([98]Supplementary Table S2). For lesional AD and healthy skin, we
identified 338 DCEGs in the blue module and 244 DCEGs in the brown
module. For lesional AD and non-lesional AD, we identified 240 DCEGs in
the blue module, 32 DCEGs in the brown module, 73 DCEGs in the black
module, and 96 DCEGs in the red module. For non-lesional AD and healthy
skin, we identified 35 DCEGs in the blue module and 5 in the magenta
module.
Overall, our analyses revealed 975, 441, and 40 DCEGs in the lesional
vs. healthy, lesional vs. non-lesional, and non-lesional vs. healthy
skin comparisons, respectively. Notably, there were more DCEGs in
lesional AD vs. healthy skin, suggesting that more gene regulatory
networks are involved in lesional AD.
3.3. Functional Enrichment Analysis of the DCEGs in Lesional and Non-Lesional
Correlated Modules
To further understand the biological function of DCEGs within modules
correlated with non-lesional and lesional AD, we performed a canonical
pathway enrichment analysis. We identified a total of 108
over-represented pathways (p-value < 0.05, [99]Supplementary Tables
S3–S7). The 338 DCEGs in the blue module derived from lesional AD and
healthy skin are involved in several pathways, including granulocyte
adhesion and diapedesis, atherosclerosis signaling, granulocyte
adhesion and diapedesis, pathogen-induced cytokine storm signaling
pathway, interferon signaling and S100 family signaling pathways
([100]Supplementary Table S3). The 244 DCEGs in the brown module
derived from lesional and healthy skin are enriched in 10 pathways,
including the 3 most significant pathways, the role of osteoblasts in
rheumatoid arthritis signaling, neurovascular coupling signaling and
iron homeostasis signaling ([101]Supplementary Table S4). The 240 DECGs
in the blue module derived from lesional and non-lesional AD are
enriched in 51 pathways, with the 5 most significant pathways included
atherosclerosis signaling, interferon signaling, S100 family signaling,
IL-17 signaling, and differential regulation of cytokine production in
intestinal epithelial cells by IL-17A and IL-17F ([102]Table S5). The
32 DCEGs in the brown module derived from lesional and non-lesional are
involved in several pathways, including granulocyte adhesion and
diapedesis, agranulocyte adhesion and diapedesis, atherosclerosis
signaling, and pathogenesis of multiple sclerosis ([103]Supplementary
Table S6). The 35 DCEGs in the blue module derived from non-lesional
and healthy skin are enriched in 9 pathways, including the IL-13
signaling pathway, pathogen-induced cytokine storm signaling pathway,
and granulocyte adhesion and diapedesis pathways ([104]Supplementary
Table S7).
3.4. Identification of Lesional and Non-Lesional AD Discriminatory Pathway
Signatures
As shown in the functional enrichment analysis, we found several
significant pathways enriched in lesional and non-lesional AD. To
prioritize and identify important pathways that differentiate lesional
and non-lesional AD, we first transformed the gene expression data into
pathway expression-level data by calculating the average gene
expression levels within each of the 108 enriched pathways. Then, we
used RF algorithms and selected 25 pathway expression signatures based
on their discriminatory importance for lesional, non-lesional AD, and
healthy skin ([105]Figure 2A,B). These pathways exhibited differential
modulation across the comparisons, with higher expression profiles in
lesional AD compared to non-lesional AD and healthy skin ([106]Figure
2C–E and [107]Supplementary Figure S5).
Figure 2.
[108]Figure 2
[109]Open in a new tab
Identification of the pathway signatures discriminating lesional AD,
non-lesional AD and healthy skin in the discovery dataset. (A) Most
important pathway signatures selected by the random forest method. (B)
The heatmap showing the mean normalized expression level of 25 pathway
signatures among three groups. (C–E) Comparison of the mean normalized
expression level of the pathway signatures in three pairwise
comparisons. The significance levels of the comparison pathway mean
differential modulation across the pairwise group comparisons are
indicated by the stars (**** p < 0.0001, *** p < 0.001, ** p < 0.01, *
p < 0.05). LS-lesional, NL-non-lesional, H-healthy skin.
3.5. Enriched Pathway Signatures Were Correlated with AD Clinical Severity
Measures
After identifying the pathway expression signatures, we validated the
pathway expression signatures using independent datasets. Specifically,
we analyzed the [110]GSE107361 dataset to confirm the differential
modulation of ten pathway expression signatures across lesional,
non-lesional, and healthy skin. Notably, ten pathway expression
signatures, including wound healing, glucocorticoid receptor signaling
and S100 family signaling, were differentially modulated among
lesional, non-lesional AD and healthy skin ([111]Figure 3A–D and
[112]Supplementary Figure S6). Genes implicated in wound healing
(CXCL8, HBEGF, IL36A, IL36G, KRT16, KRT17, KRT6A, KRT6B, KRT6C, MMP1
and PGF), genes implicated in glucocorticoid receptor signaling (CCL2,
CXCL8, IL4R, KRT16, KRT17, KRT6A, KRT6B, KRT6C, MMP1, PLA2G2F, PLA2G
and SELE) and genes implicated in S100 family signaling (CCL20, CCR7,
CXCL8, MMP1, MMP12, S100A12, and S100A8) and other pathways showed a
higher pathway expression pattern in lesional AD compared to healthy
skin and non-lesional AD skin ([113]Figure 3A–D and [114]Figure S6).
Interestingly, multiple genes were involved in the same pathways
([115]Supplementary Figure S7), highlighting that targeted research on
these pathways may enhance our understanding and lead to new therapies
for lesional and non-lesional AD. Additionally, we performed Pearson
correlation analysis between the pathway expression levels and the
disease severity measured by the SCORAD and EASI indices to identify
pathway expression associated with the disease severity index. The
results showed that the pathway expression signatures, including wound
healing, glucocorticoid receptor, and S100 family signaling
([116]Figure 4A–F and [117]Supplementary Figure S8), were correlated
with SCORAD and EASI.
Figure 3.
[118]Figure 3
[119]Open in a new tab
Validation of the pathway signatures in discriminating lesional,
non-lesional and healthy skin in the independent validation dataset.
(A) The heatmap showing the mean normalized expression level of the 25
pathway signatures among three groups. (B–D) Comparison of the mean
normalized expression level of the pathway signatures in three pairwise
comparisons. The significance levels of the comparison pathway mean
differential modulation across pairwise group comparisons were
indicated by the stars (**** p < 0.0001, *** p < 0.001, ** p < 0.01,
ns: not significant). LS-lesional, NL-non-lesional, H-healthy skin.
Figure 4.
[120]Figure 4
[121]Open in a new tab
Validation of the association of the pathway mean expression levels and
the disease severity measures, including the SCORAD and EASI, in
lesional AD skin. (A) The correlation between the mean normalized
expression of genes in the wound-healing signaling pathway and the
SCORAD score. (B) The correlation between the mean normalized
expression of genes in the glucocorticoid receptor signaling pathway
and the SCORAD score. (C) The correlation between the mean normalized
expression of genes in the S100 family signaling pathway and the SCORAD
score. (D) The correlation between the mean normalized expression of
genes in the wound-healing signaling pathway and the EASI score. (E)
The correlation between the mean normalized expression of genes in the
glucocorticoid receptor signaling pathway and the EASI score. (F) The
correlation between the mean normalized expression of genes in the S100
family signaling pathway and the EASI score.
3.6. Validation of Pathways and Gene Signatures Discriminating Lesional,
Non-Lesional AD and Healthy Skin
After identifying the relevant pathways and gene signatures that
differentiate lesional, non-lesional and healthy skin, we examined the
diagnostic performance of the pathways in the discovery and validation
datasets ([122]Figure 5A). There were 33 key genes involved in the AD
severity-associated pathways, of which 18 genes showed high diagnostic
performance in discriminating lesional AD vs. non-lesional AD as well
as non-lesional AD vs. health skin, as shown [123]Figure 5B. More
specifically, the glucocorticoid receptor and IL-33 signaling-related
gene SELE showed high diagnostic efficacy in the [124]GSE121212 (AUC =
0.947) and [125]GSE107361 (AUC = 0.727) datasets between lesional AD
vs. non-lesional AD. The atherosclerosis and S100 family
signaling-related gene S100A8 showed high diagnostic ability in the
[126]GSE121212 (AUC = 0.933), and [127]GSE107361 (AUC = 0.846) datasets
between lesional and non-lesional AD. The gene OAS2, related to the
role of pattern recognition receptors in recognizing bacteria and
viruses, had high diagnostic performance in the [128]GSE121212 (AUC =
0.922) and [129]GSE107361 (AUC = 0.679) datasets between lesional and
non-lesional AD. The gene MMP12, associated with multiple pathways such
as wound healing, glucocorticoid receptor, and IL-33 signaling,
demonstrated high diagnostic ability in discriminating lesional and
non-lesional AD in the [130]GSE121212 (AUC = 0.892) and [131]GSE107361
(AUC = 0.729) datasets. Moreover, differential regulation of cytokine
production in intestinal epithelial cells by the IL-17A and IL-17F and
IL-13 signaling- associated gene DEFB4A had high diagnostic performance
in the [132]GSE121212 (AUC = 0.775) and [133]GSE107361 (AUC = 0.808)
datasets, followed by OAS2 in the [134]GSE121212 (AUC = 0.76) and
[135]GSE107361 (AUC = 0.588) and MMP1 in the [136]GSE121212 (AUC =
0.729) and [137]GSE107361 (AUC = 0.875) datasets in classifying
non-lesional AD from healthy skin. Overall, integrative analysis using
differential analysis, co-expression networks, and pathway expression
ranking via ML approaches identified relevant pathways and gene
signatures that discriminate between lesional, non-lesional, and
healthy skin.
Figure 5.
[138]Figure 5
[139]Open in a new tab
The diagnostic performance of the pathways and gene signatures. (A) The
AUC values of the pathway signatures with selected representative
annotated genes in discriminating lesional AD vs. healthy skin,
lesional AD vs. non-lesional AD skin, and non-lesional AD and healthy
skin in the discovery and validation datasets. (B) The AUC value of the
gene signatures in discriminating lesional AD vs. healthy skin,
lesional AD vs. non-lesional AD skin, and non-lesional AD and healthy
skin in the discovery and validation datasets. LS-lesional skin,
NL-non-lesional, H-healthy skin. [140]GSE121212-discovey dataset,
[141]GSE107361-validation dataset.
4. Discussion
In the transcriptome-wide pairwise comparison of lesional, non-lesional
and healthy skin by integrating differential expression analysis and
WGCNA, we identified 975, 441, and 40 DCEGs in lesional vs. healthy
skin, lesional vs. non-lesional, and non-lesional vs. healthy skin,
respectively, highlighting several clusters of densely interconnected
genes involved in lesional skin relative to non-lesional and healthy
skin. These DCEGs were enriched in several significant pathways, of
which RF prioritized and validated 10 pathway expression signatures,
including wound healing (CXCL8 and MMP1), glucocorticoid receptor
signaling (CCL2, CXCL8, MMP1and SELE), and S100 family signaling
(CXCL8, MMP1 and MMP12), that differentiate lesional, non-lesional, and
healthy skin. The pathway expression signatures were correlated with
clinical severity measures, including the EASI and SCORAD. Overall, our
study demonstrates that high throughput integrative transcriptomics
analysis differentiates lesional and non-lesional skin from heathy
skin.
Identifying pathways and gene signatures that differentiate lesional,
non-lesional, and healthy skin is crucial for AD therapeutics. Previous
studies have used differential gene expression analysis to
differentiate lesional and non-lesional AD [[142]30,[143]31]. A recent
study by Martínez et al. developed an ML method to discriminate between
lesional and non-lesional inflammatory skin diseases, including AD,
using expression data [[144]10]. Another study by Zhong et al. employed
the LASSO method to discover three biomarkers that differentiate
non-lesional and lesional AD [[145]18]. However, few studies used
combinatorial approaches, including differential analysis,
co-expression network analysis, and pathway expression analysis-based
on an ML method, to identify pathways and gene signatures that
discriminate lesional, non-lesional AD, and healthy skin. These methods
can uncover connection networks and hidden biological models crucial to
disease pathogenesis [[146]32]. Constructing networks of co-expressed
genes identifies highly correlated modules, associates them with
phenotypic traits, and elucidates complex gene interactions [[147]33].
This integrative approach demonstrates robustness against noise and
variability, offering profound insights into gene regulation and
potential therapeutic targets [[148]34].
Therefore, in this study, we used a combinatorial approach, including
differential analysis, co-expression network analysis, and machine
learning based on pathway expression, to help us select relevant
pathways and gene signatures using RNA-seq data from diseased lesional,
non-lesional, and healthy skin. More fundamentally, integrated
approaches to identify relevant pathway signatures that can predict AD
disease severity are crucial. Univariate statistical tests, such as the
empirical Bayes-moderated t-statistics test, treat genes as independent
from one another and are often poorly reproducible in independent
datasets [[149]35]. Therefore, our study used more comprehensive
methods to identify the DCEGs in lesional and non-lesional AD, showed
these DCEGs were enriched in pathways and gene signatures that
discriminated lesional, non-lesional and healthy skin. The 975 DCEGs in
lesional vs. healthy, 441 DCEGs in non-lesional vs. healthy, and 40
DCEGs in non-lesional vs. healthy skin were enriched in 108 pathways.
Using the RF machine learning algorithm, we showed that the pathway
expression levels of 25 identified potential pathways clearly
distinguished lesional, non-lesional and healthy skin. Validation
analysis confirmed that ten pathway expressions, including
wound-healing signaling, glucocorticoid receptor signaling, macrophage
alternative activation signaling, IL-33 signaling, S100 family
signaling, MIF-mediated glucocorticoid regulation, serotonin receptor
signaling, MIF regulation of innate immunity, VEGF family
ligand–receptor interactions and tumor microenvironment, showed
significant differences among lesional, non-lesional and healthy skin.
Remarkably, this study validated that the ten pathway signatures are
correlated with the disease severity, including the EASI and SCORAD
indices. This may suggest that these pathway trajectories contribute to
the progression from non-lesional to lesional AD skin disease.
After identifying pathways associated with lesional and non-lesional
skin, we examined the diagnostic relevance of individual genes involved
in these pathways. Most of the genes, including MMP12, DEFB103A/B,
DEFB4A/B, S100A12, S100A8, CCL22, CCR7, CXCL1, CXCL10, MMP1 and SELE,
in these pathways showed high diagnostic efficacy in discriminating
lesional from non-lesional skin. In addition, genes such as DEFB4A,
OAS2, MMP12 and S100A8 showed high diagnostic performance in
classifying non-lesional and healthy samples with an AUC > 0.70 in both
discovery and validated datasets. Dysregulated pathways such as the
S100 family, tumor microenvironment and IL-33 signaling that contain
MMP12 with high diagnostic ability may have functional relevance in
AD’s pathology. The wound-healing-related gene signatures such as CXCL8
and MMP1 that were up-regulated in lesional AD compared with
non-lesional and healthy skin showed high diagnostic performance in
distinguishing lesional AD from non-lesional AD and healthy skin.
Interestingly, previous studies indicated that wound healing is a
complex process that involves the interaction between different cell
types, growth hormones, cytokines, antioxidants, and a stable supply of
metal ions [[150]36], and our identified genes may have functional
relevance in this complex AD pathology. We also identified three genes,
including MMP12, DEFB4A, and S100A8, with high diagnostic value in
discriminating lesional AD vs. non-lesional AD as well as non-lesional
AD vs. healthy skin. Overall, several genes with high diagnostic value
in discriminating lesional, non-lesional and healthy skin might be
involved in the disease pathogenesis and could be used as potential
biomarkers for the diagnosis of AD and as therapeutic targets of AD.
Despite the mechanisms that underlie AD progression still being
unclear, in this study, we identified several DCEGs associated with
lesional and non-lesional AD skin, and several of the identified genes
showed a consistent direction of regulation in lesional and
non-lesional AD compared with heathy skin, which were also consistent
with previous studies. For example, a previous study identified MMP12
as a potential biomarker of both lesional and non-lesional AD skin
[[151]37,[152]38]. A recent study by Guttman-Yassky et al. revealed
that a 12-week treatment with abrocitinib, a Janus kinase 1-selective
inhibitor, down-regulated the expression of several gene biomarkers,
including MMP12 and S100A8, in patients with moderate-to-severe AD skin
associated with inflammation, epidermal hyperplasia, and Th2 and Th22
immune responses, highlighting that genes found in our study have
potential as biomarkers for monitoring and managing AD disease
progression [[153]39]. In addition, another study showed that genes
encoding epidermal components such as DEFB4A are up-regulated in
lesional skin compared with non-lesional AD skin [[154]14]. Although
there are numerous reports of gene expression in healthy skin, less is
known about the architecture of clinically uninvolved non-lesional skin
as compared to healthy skin. Examination of non-lesional skin in AD
could provide previously unknown insights into the molecular processes
operating in uninvolved skin and suggest a unique preclinical set of
healthy skin in non-lesional condition. Non-lesional skin may be more
useful in identifying the driving features underlying pathogenesis
because the inflammatory milieu among diseases becomes more similar
during the lesional stage. Our study indicated that non-lesional skin
samples are extremely informative about the underlying disease process
and could be used to create patient subsets for future clinical trials
or de-risk clinical trials, as non-lesional skin is reported to be an
effective marker of the treatment response [[155]40].
Our study has several strengths. Integrated analysis of the correlation
network analysis and ML method identified disease-relevant pathways
such as MIF regulation of innate immunity and IL-33 signaling,
inflammatory-related pathway–wound healing and metabolic-related
pathways–serotonin signaling, which can be used to distinguish lesional
from non-lesional and healthy skin. Pathway prioritization in
discriminating lesional, nonregional and healthy skin based on the ML
approach showed little similarity to pathway prioritization based on
the traditional method, which indicates that ML-based pathway
prioritization may provide an alternative approach to identify relevant
pathways associated with the disease phenotype. This study highlights
pathways correlated with disease activity, suggesting their relevance
to disease pathogenesis. However, the study faces limitations such as
the small sample sizes in the gene expression datasets and the absence
of detailed clinical variables. The relatively small sample sizes in
some public datasets may hinder the identification of potential
association pathways and gene signatures related to AD severity. Future
research should address these limitations by increasing the sample
sizes, incorporating clinical factors, and conducting functional
experiments to validate the findings.
5. Conclusions
In summary, we presented evidence supporting the predictive roles of
pathway and gene expression signatures in distinguishing between
lesional and non-lesional AD through an integrative analysis of
networks and pathway expressions based on a machine learning approach.
Notably, the pathway signatures were closely correlated with the AD
severity measures, including the EASI and SCORAD indices. These
findings may have practical implications for clinical practice, as
these signatures could enhance the accuracy of predicting AD skin
involvement, thereby informing more effective treatment strategies for
AD disease.
Supplementary Materials
The following supporting information can be downloaded at
[156]https://www.mdpi.com/article/10.3390/jpm14090960/s1.
[157]jpm-14-00960-s001.zip^ (2.4MB, zip)
Author Contributions
T.B.M. conceived the study. E.Y.D. and T.B.M. designed the study.
E.Y.D. wrote the manuscript with support from T.B.M., L.S., L.D. and
E.Y.D. performed the statistical analyses. All authors have read and
agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All the transcriptomics datasets used in our study are freely
accessible at NCBI GEO ([158]https://www.ncbi.nlm.nih.gov/geo/) with
accession numbers [159]GSE121212, [160]GSE107361 and [161]GSE130588,
accessed on 5 March 2024.
Conflicts of Interest
The authors declare no competing interests.
Funding Statement
This work was supported by the National Institutes of Health (NIH)
NHGRI (R01 HG011411)] grant support.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data
contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI
and/or the editor(s) disclaim responsibility for any injury to people
or property resulting from any ideas, methods, instructions or products
referred to in the content.
References