Abstract

   Atopic dermatitis (AD) and type 2 diabetes mellitus (T2DM) may appear
   clinically and pathophysiologically unrelated. AD is a common skin
   disease characterized by chronic inflammation and skin barrier
   dysfunction, whereas T2DM is a metabolic disorder marked by
   hyperglycemia and chronic inflammation, which further exacerbates
   insulin resistance (IR) through the release of systemic inflammatory
   factors. Despite these apparent differences, the molecular mechanisms
   shared between AD and T2DM remain relatively unexplored. In this study,
   we integrated transcriptomic data from both AD and T2DM using
   differential gene expression (DEG) analysis, gene set variation
   analysis (GSVA), and machine learning algorithms to uncover common
   features of these diseases. We identified several characteristic genes,
   including LTF, LTB4R, and CCR1, which are significantly upregulated in
   both conditions and may serve as potential biomarkers. Furthermore,
   virtual screening revealed that Dioscin, Camptothecin, and Albamycin
   exhibit strong affinity for the CCR1 binding site, indicating their
   potential as therapeutic candidates. In summary, this study elucidates
   the shared molecular mechanisms of AD and T2DM and introduces new
   potential targets and drugs for the diagnosis and treatment of these
   diseases.

   Keywords: Atopic dermatitis, Type 2 diabetes mellitus, Integrated
   transcriptomic analysis, Machine learning, CCR1

   Subject terms: Computational biology and bioinformatics, Immunology

Introduction

   Type 2 diabetes mellitus (T2DM) is a metabolic disease characterized
   by chronic hyperglycemia caused by insulin resistance and β-cell
   dysfunction, and it accounts for 90–95% of diabetes cases
   worldwide^[29]1. It often leads to serious complications, such as
   cardiovascular disease, neuropathy, kidney disease and retinopathy,
   which are usually associated with chronic inflammation^[30]2. Meanwhile,
   obesity and insulin resistance often trigger systemic low-grade
   inflammation, with elevated levels of inflammatory factors such as
   TNF-α and IL-6, which in turn impair insulin signaling^[31]3.
   Atopic dermatitis (AD), characterized by chronic inflammation, intense
   pruritus, and eczema-like skin lesions, affects approximately 15–20%
   of children and 1–3% of adults worldwide. Its pathogenesis involves a
   complex interplay of genetic susceptibility, environmental triggers,
   immune dysregulation, and skin barrier dysfunction, giving rise to
   elevated immunoglobulin E (IgE), allergic reactions, and systemic
   inflammation^[32]4,[33]5. AD and T2DM both significantly affect
   patients' health and quality of life. In AD, immune cells such as T
   cells and mast cells are hyperactive. In T2DM, low-grade inflammation
   triggered by obesity and insulin resistance impairs insulin
   signaling^[34]6,[35]7. Both diseases are thus intertwined with immune
   system imbalance and chronic inflammation^[36]8,[37]9. However, many
   questions remain unresolved about the causative genes or
   transcription factors shared by AD and T2DM, as well as potential
   common therapeutic targets.

   A recent study found an association between AD and type 2 diabetes
   (T2D). According to a survey of South Korean national health data
   from 2002 to 2015, the risk of subsequent T2D is significantly
   increased in patients with AD^[38]10. A Mendelian randomization study
   further explored the causal relationship between AD, type 1 diabetes
   (T1D) and T2D, and showed that genetically predicted AD significantly
   increases the risk of T2D^[39]9. Both AD and diabetes involve immune
   dysregulation and share common inflammatory pathways, suggesting the
   existence of overlapping mechanisms. The immune system may serve as a
   bridge linking the pathogenesis of these two diseases. For example, the
   actions of cytokines and immune cells in AD may affect metabolic
   pathways involved in T2D. Obesity is an important risk factor for T2D,
   and studies have shown that obesity may lead to more frequent
   occurrences of AD and exacerbate its symptoms^[40]11. The prevalence of
   obesity is higher in AD patients compared to the general population,
   and its effects are particularly pronounced in children, with obesity
   before the age of 5 significantly increasing the risk of developing
   AD^[41]12. A large-scale study involving 2,090 adult patients further
   confirmed the clear association between obesity and AD^[42]13. In
   experimental models, the ear tissue of obese AD mice was 2–4 times
   thicker than that of non-obese AD mice, indicating that the
   persistent inflammation caused by obesity exacerbates AD severity,
   even when the obese mice reach a body weight similar to that of the
   control group^[43]14,[44]15. However, a cross-sectional study from the
   “Canadian Tomorrow Study” found a negative association between AD and
   T2D, with AD linked to a lower risk of T2D (OR: 0.78, 95% CI:
   0.71–0.84) and reduced risks of hypertension, myocardial infarction,
   and stroke^[45]16. Given these contrasting findings, it remains unclear
   whether AD directly contributes to the development of T2D. Therefore,
   it is hypothesized that transcriptome data of AD and T2D could help
   elucidate the shared molecular mechanisms between the two diseases,
   identify potential biomarkers associated with their development, and
   suggest therapeutic targets to address both conditions.

   Moreover, two-disease model analysis has proven to be effective in
   elucidating commonalities between various chronic diseases. For
   example, rheumatoid arthritis (RA) and cardiovascular disease (CVD)
   share the inflammatory pathways, especially nuclear factor-kappa B
   (NF-κB) signaling pathway and tumor necrosis factor (TNF) pathway.
   Chronic obstructive pulmonary disease (COPD) and lung cancer share
   environmental and genetic risk factors, such as tobacco exposure and
   TP53 gene mutations. Alzheimer’s disease and T2DM share dysregulation
   of insulin signaling pathways, including the insulin receptor substrate
   (IRS) pathway and the Phosphoinositide 3-kinase (PI3K)/protein kinase B
   (Akt) pathway^[46]6. However, atopic dermatitis and T2DM, both closely
   related to the inflammatory response, remain common and difficult
   health challenges. It is therefore of great significance to explore
   the transcriptional features and gene expression pathways shared by
   the T2DM and atopic dermatitis groups.

   This study aims to integrate the transcriptome data of AD and T2DM to
   elucidate the shared molecular mechanisms between these two diseases,
   explore potential biomarkers associated with the development of AD
   and T2DM, and screen potential therapeutic drugs. To this end,
   transcriptome data related to AD and T2DM were obtained from the Gene
   Expression Omnibus (GEO) database. Differential gene expression
   analysis and weighted gene co-expression network analysis (WGCNA)
   were used to identify the key genes and modules of each disease.
   Gene set variation analysis (GSVA) was used to calculate the
   differences in enrichment scores between each disease and normal
   tissue samples, and GO and KEGG enrichment analyses were
   conducted^[47]17–[48]19. Diagnostic genes shared by the two diseases
   were identified, including LTF, LTB4R, and CCR1, and their good
   performance was validated with external datasets. Further, we
   analyzed the infiltration levels of 22 types of immune cells in AD
   skin tissue and T2D blood samples, and used single-cell RNA
   sequencing to localize gene expression to specific cell types.
   Virtual screening was used to identify potential therapeutic
   compounds targeting the shared genes. According to the results, the
   comorbidity mechanism of AD and T2DM may be related to CCR1, and
   Dioscin, Camptothecin, and Albamycin were identified as the three
   compounds with the highest affinity. In conclusion, this study
   reveals a shared molecular mechanism of AD and T2DM and highlights
   CCR1 as a potential therapeutic target for both diseases. Dioscin,
   Camptothecin, and Albamycin show good affinity for the CCR1 binding
   site, supporting the potential of the shared genes as diagnostic
   markers and of these compounds as therapeutic candidates.

Methods

Data preparation

   The data preparation and analysis workflow is outlined in
   Fig. [49]1. In this study,
   skin tissue microarray data for atopic dermatitis, including
   [50]GSE6012, [51]GSE16161, and [52]GSE182740, were initially obtained
   from the GEO database. Additionally, blood sample microarray data for
   type 2 diabetes, including [53]GSE15932, [54]GSE156993, and
   [55]GSE250283, were also retrieved. The preprocessing stage involved
   the use of the ComBat function from the sva package (version 3.44.0;
   Leek et al., 2012) to remove batch effects from the samples. This step
   is crucial as it eliminates technical differences between different
   batches, allowing for a more accurate comparison of biological
   differences between samples. Subsequently, the processed data were
   merged and normalized. Normalization can eliminate differences in data
   scale and dimension, enabling a fairer comparison between samples.
   Finally, transcriptome data for 25 normal samples and 39 atopic
   dermatitis samples were obtained, along with transcriptome data for 29
   normal and 61 type 2 diabetes patient blood samples.

Fig. 1.

   Comprehensive outline of the data preparation and analysis workflow.

Differentially expressed genes (DEGs) analysis

   Differential gene expression analysis of the transcriptome data was
   performed using the limma package (version 3.54.0; Ritchie et al.,
   2015). A linear model was fitted to evaluate changes in gene
   expression, and an empirical Bayes method was applied to stabilize
   variance estimates, effectively managing data noise and uncertainty.
   For the atopic dermatitis data, the thresholds were |LogFC| > 0.5 and
   p-value < 0.05. For the blood transcriptome data of type 2 diabetes,
   the thresholds were |LogFC| > 0.3 and p-value < 0.05. All DEG results
   were visualized with volcano plots and heatmaps.
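
   The thresholding step can be illustrated with a minimal Python sketch.
   It is not the limma workflow used in this study; it only shows how the
   |LogFC| and p-value cutoffs partition a results table. The function
   name select_degs and all gene statistics are hypothetical placeholders.

```python
def select_degs(results, lfc_cutoff, p_cutoff=0.05):
    """Split genes passing |logFC| > lfc_cutoff and p < p_cutoff by direction.

    results maps gene symbol -> (log fold change, p-value).
    """
    up, down = [], []
    for gene, (log_fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Placeholder statistics, NOT values from the study.
toy_results = {
    "GENE_A": (0.8, 0.001),
    "GENE_B": (-0.6, 0.02),
    "GENE_C": (0.4, 0.01),
    "GENE_D": (1.2, 0.20),
}

# Skin (atopic dermatitis) cutoff of 0.5, blood (type 2 diabetes) cutoff of 0.3.
skin_up, skin_down = select_degs(toy_results, lfc_cutoff=0.5)
blood_up, blood_down = select_degs(toy_results, lfc_cutoff=0.3)
```

   With the looser blood cutoff, GENE_C is retained in addition to GENE_A,
   which shows how the tissue-specific thresholds change the DEG list.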

Gene set variation analysis (GSVA)

   The pathway gene set list was taken from the KEGG_MEDICUS subset of
   CP (c2.cp.kegg_medicus.v2023.2.Hs.symbols.gmt). Differences
   in enrichment scores between each disease and normal tissue sample were
   calculated using the GSVA package (version 1.46.0; Hänzelmann et al.,
   2013) within the R 4.3.2 environment. This analysis aids in
   interpreting gene expression patterns under various disease states and
   their impact on specific biological pathways. For visual
   representation, box plots were utilized.

Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG)
enrichment analysis

   The DEGs from both atopic dermatitis and type 2 diabetes were
   intersected to obtain common differential genes for further functional
   and signaling pathway enrichment analysis. Enrichment analysis for GO and
   KEGG was conducted using the “clusterProfiler” package. Enrichment
   results with a p-value less than 0.05 were selected and visualized in
   the form of bar charts and bubble charts.
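
   As a rough illustration of the intersection and enrichment steps, the
   Python sketch below intersects two DEG lists and computes a one-sided
   hypergeometric p-value, the test commonly used for over-representation
   analysis. It is a simplified stand-in for clusterProfiler, not its
   implementation; the function names, gene labels, and counts are
   hypothetical.

```python
from math import comb


def shared_degs(ad_degs, t2dm_degs):
    """Genes that are differentially expressed in both diseases."""
    return sorted(set(ad_degs) & set(t2dm_degs))


def enrichment_p_value(n_universe, n_term, n_query, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_term, n_query).

    n_universe: background genes; n_term: genes annotated to the term;
    n_query: genes in the query list; n_overlap: query genes in the term.
    """
    upper = min(n_term, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


# Placeholder gene labels and counts, NOT values from the study.
common = shared_degs(["GENE_A", "GENE_B", "GENE_C"],
                     ["GENE_B", "GENE_C", "GENE_D"])
p = enrichment_p_value(n_universe=10, n_term=5, n_query=5, n_overlap=5)
```

   A term would be reported when its p-value falls below the 0.05 cutoff
   described above.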

Machine learning

   To screen for characteristic genes of atopic dermatitis and type 2
   diabetes, we employed two methods: Support Vector Machine-Recursive
   Feature Elimination (SVM-RFE) and Random Forest (RF). Initially,
   transcriptomic data from atopic dermatitis skin tissue and type 2
   diabetes blood samples were normalized to ensure data comparability.
   Subsequently, SVM-RFE was used to select features by recursively
   reducing the size of the feature set. At each step, SVM-RFE removes
   the feature contributing the least to the model until a predetermined
   number of features is reached or optimal model performance is
   achieved. In the RF model, each decision tree is trained on a random
   subset of the dataset, which enhances model diversity and reduces the
   risk of overfitting. By aggregating feature importance across the
   decision trees, we determined which features are most important for
   classification. Using these two methods, we screened genes that are
   commonly upregulated in the disease groups and ultimately obtained a
   set of characteristic genes that are highly important in both
   diseases.
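
   The recursive elimination loop can be sketched in plain Python. This
   is an illustrative stand-in, not the e1071-based analysis used in this
   study: the linear SVM is trained by sub-gradient descent on the hinge
   loss without a bias term, features are ranked by squared weight, and
   the function names, hyperparameters, and toy data are all assumptions.

```python
def train_linear_svm(rows, labels, epochs=50, lr=0.1, lam=0.01):
    """Fit weights of an L2-regularized linear SVM (hinge loss, no bias)."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for j in range(n_features):
                # Sub-gradient: regularizer plus hinge term if margin < 1.
                grad = lam * w[j] - (y * x[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
    return w


def svm_rfe(rows, labels, names, n_keep=1):
    """Repeatedly retrain and drop the feature with the smallest w**2."""
    active = list(range(len(names)))
    eliminated = []
    while len(active) > n_keep:
        subset = [[row[j] for j in active] for row in rows]
        w = train_linear_svm(subset, labels)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(names[active.pop(worst)])
    return [names[j] for j in active], eliminated


# Toy data (hypothetical): one informative feature and two uninformative ones.
names = ["g_signal", "g_weak", "g_flat"]
rows = [
    [2.0, 0.5, 0.1], [2.2, -0.5, 0.1], [1.8, 0.4, 0.1],
    [-2.0, -0.4, 0.1], [-2.1, 0.5, 0.1], [-1.9, -0.5, 0.1],
]
labels = [1, 1, 1, -1, -1, -1]  # +1 = disease, -1 = normal
kept, dropped = svm_rfe(rows, labels, names, n_keep=1)
```

   In practice the number of retained features would be chosen by
   cross-validated performance rather than fixed in advance.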

Receiver operating characteristic (ROC) analysis

   First, the diagnostic genes identified for atopic dermatitis and type
   2 diabetes were intersected to derive the genes common to the two
   conditions. Gene expression data and sample classification
   information were then extracted from the dataset to construct a new
   data frame. Finally, the plot.roc function was used within the R 4.3.2
   environment to generate the ROC curves.
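
   For readers unfamiliar with the metric, the area under the ROC curve
   (AUC) equals the probability that a randomly chosen disease sample has
   a higher expression score than a randomly chosen normal sample. The
   Python sketch below computes it from that definition; it is not the R
   code used here, and the function name and expression values are
   hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (disease, normal) pairs ranked correctly.

    labels: 1 = disease, 0 = normal. Ties count as half a correct pair.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Placeholder expression values for one gene, NOT data from the study.
perfect = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
partial = roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0])
```

   An AUC of 1.0 means the gene separates the groups completely, while
   0.5 corresponds to chance-level discrimination.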

Immune cell infiltration analysis

   We employed a deconvolution method based on transcriptomic data to
   conduct an in-depth analysis of immune cell infiltration in each
   disease sample and normal sample. Using the CIBERSORT tool, which has
   been integrated into the Ecotyper platform
   ([58]https://ecotyper.stanford.edu/; accessed October 1, 2023), we
   analyzed the infiltration levels of 22 types of immune cells in atopic
   dermatitis skin tissue and type 2 diabetes blood samples.

Single-cell data analysis

   The single-cell transcriptomic datasets [59]GSE222840, related to
   atopic dermatitis, and [60]GSE244515, related to blood samples from
   type 2 diabetes patients, underwent quality control based on the
   following criteria: nFeature_RNA > 500, percent.mt < 20,
   percent.HB < 3, and nCount_RNA > 1000. All samples were integrated
   using the Harmony algorithm to remove batch effects. The RunUMAP
   function was applied for dimensionality reduction of the integrated
   dataset. The plot_density function was used to visualize the
   expression levels of the co-diagnosis genes across different cell
   populations.
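
   The quality-control criteria amount to a per-cell filter, sketched
   below in Python. The metadata keys mirror the criteria listed above,
   but the function name and the cell records are hypothetical, and the
   snippet is not the pipeline used in this study.

```python
def passes_qc(cell):
    """Apply the four per-cell thresholds stated in the text."""
    return (
        cell["nFeature_RNA"] > 500      # detected genes
        and cell["nCount_RNA"] > 1000   # total counts
        and cell["percent.mt"] < 20     # mitochondrial read percentage
        and cell["percent.HB"] < 3      # hemoglobin read percentage
    )


# Placeholder per-cell metadata, NOT data from the study.
cells = [
    {"id": "cell_1", "nFeature_RNA": 1200, "nCount_RNA": 3500,
     "percent.mt": 5.0, "percent.HB": 0.2},
    {"id": "cell_2", "nFeature_RNA": 300, "nCount_RNA": 800,
     "percent.mt": 4.0, "percent.HB": 0.1},
    {"id": "cell_3", "nFeature_RNA": 900, "nCount_RNA": 2000,
     "percent.mt": 35.0, "percent.HB": 0.0},
]
kept_ids = [cell["id"] for cell in cells if passes_qc(cell)]
```

   Here cell_2 fails on gene and count depth and cell_3 on mitochondrial
   content, so only cell_1 is carried forward to integration.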

CellChat analysis

   First, we used the built-in database, CellChatDB.human, to create a
   CellChat object. Subsequently, we employed the
   identifyOverExpressedGenes and identifyOverExpressedInteractions
   functions to identify overexpressed genes and potential receptor-ligand
   pairs in each cell subpopulation. These receptor-ligand pairs were then
   mapped onto the protein-protein interaction network with the projectData
   function. Next, the computeCommunProbPathway function was used to
   calculate the communication probabilities between cell subpopulations
   and infer cell signaling at the pathway level. Finally, the
   communication between various cell types was visualized as a network
   graph.

Virtual screening

   Initially, the 7VL9.pdb file, which contains the crystal structure
   information of the CCR1 protein at a resolution of 2.6 Å, was
   retrieved from the PDB database. Subsequently, AutoDock Vina was
   employed for virtual screening. Before screening, we prepared the
   three-dimensional structure files of all 323 natural products and
   preprocessed all input files, for example by removing water molecules
   and adding hydrogen atoms. Then, the AutoDock Vina parameters were
   configured, including the size and location of the search box. For
   each compound, AutoDock Vina reports an affinity value that reflects
   the predicted binding affinity of the compound to the protein.
   Finally, we used PyMOL to visually analyze the potential active
   compounds that exhibited high affinity.
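
   Because AutoDock Vina reports affinity in kcal/mol, with more negative
   values indicating stronger predicted binding, selecting top candidates
   reduces to sorting compounds in ascending order. The Python sketch
   below shows this ranking step only; the function name, compound
   labels, and scores are hypothetical placeholders, not screening
   results.

```python
def top_hits(affinities, n=3):
    """Return the n compounds with the most negative affinity (kcal/mol)."""
    ranked = sorted(affinities.items(), key=lambda item: item[1])
    return ranked[:n]


# Placeholder scores, NOT results from the study.
toy_affinities = {
    "compound_A": -7.2,
    "compound_B": -9.4,
    "compound_C": -6.1,
    "compound_D": -8.8,
}
best = top_hits(toy_affinities, n=3)
best_names = [name for name, _ in best]
```

   The resulting shortlist is what would then be inspected visually for
   plausible binding poses.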

Data integration and processing

   To address the challenges inherent in integrating data derived from
   different platforms (Affymetrix, Illumina, and NGS), we implemented a
   rigorous workflow designed to ensure consistency, comparability, and
   reliability across datasets. This process involved data preprocessing,
   batch effect correction, differential gene expression analysis, and
   external validation, detailed as follows: During the data preprocessing
   step, raw data from each platform underwent stringent quality control
   to remove low-quality samples and genes with low expression, ensuring
   high data quality across the dataset. Platform-specific preprocessing
   methods were applied to correct for background noise and normalize the
   data, mitigating systematic biases and aligning the datasets to a
   comparable scale. Specifically, the Robust Multi-array Average (RMA)
   method was used for Affymetrix data, and Quantile Normalization was
   applied to Illumina and NGS data. To reconcile differences in probe
   sets across platforms, probe IDs were mapped to a unified set of gene
   symbols using external databases such as ENSEMBL and NCBI Gene,
   enabling accurate integration at the gene level. To address potential
   batch effects due to platform-specific differences, the ComBat method
   (via the ComBat function from the sva package) was applied to correct
   for these effects, thereby enhancing inter-platform comparability.
   Following data integration, a robust differential gene expression
   analysis was conducted with stringent quality control and significance
   thresholds to ensure reliable and consistent results. Finally, an
   external validation dataset was employed to confirm the integrity of
   the integrated dataset, verifying the accuracy and stability of the
   results post-integration.
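
   Quantile normalization, mentioned above for the Illumina and NGS data,
   forces every sample to share the same empirical distribution. A
   minimal Python sketch follows, assuming one list of expression values
   per sample and breaking ties by position rather than averaging them.
   The function name and the numbers are hypothetical, and this is not
   the implementation used in this study.

```python
def quantile_normalize(samples):
    """Replace each value by the mean of all samples' values at that rank.

    samples: list of per-sample lists, all covering the same genes in the
    same order. Ties are resolved by position (a simplification).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(values) for values in samples]
    rank_means = [
        sum(values[rank] for values in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for values in samples:
        order = sorted(range(n_genes), key=lambda gene: values[gene])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Placeholder expression values for four genes in three samples.
toy_samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(toy_samples)
```

   After normalization each sample contains the same set of values, and
   only the assignment of those values to genes differs between samples.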

Other tools

   For functional and signal pathway enrichment analysis, the
   clusterProfiler package was used (Yu et al., 2012), while e1071 and
   randomForest packages were employed for machine learning processes in
   the screening of characteristic genes. Version details and references