Abstract
Atopic dermatitis (AD) and type 2 diabetes mellitus (T2DM) may appear clinically and pathophysiologically unrelated: AD is a common skin disease characterized by chronic inflammation and skin barrier dysfunction, whereas T2DM is a metabolic disorder marked by hyperglycemia and chronic inflammation, which further exacerbates insulin resistance (IR) through the release of systemic inflammatory factors. Despite these apparent differences, the molecular mechanisms shared between AD and T2DM remain relatively unexplored. In this study, we integrated transcriptomic data from both AD and T2DM using differential gene expression (DEG) analysis, gene set variation analysis (GSVA), and machine learning algorithms to uncover common features of these diseases. We identified several characteristic genes, including LTF, LTB4R, and CCR1, which are significantly upregulated in both conditions and may serve as potential biomarkers. Furthermore, virtual screening revealed that Dioscin, Camptothecin, and Albamycin exhibit strong affinity for the CCR1 binding site, indicating their potential as therapeutic candidates. In summary, this study elucidates the shared molecular mechanisms of AD and T2DM and introduces new potential targets and drugs for the diagnosis and treatment of these diseases.
Keywords: Atopic dermatitis, Type 2 diabetes mellitus, Integrated transcriptomic analysis, Machine learning, CCR1
Subject terms: Computational biology and bioinformatics, Immunology
Introduction
Type 2 diabetes mellitus (T2DM), a metabolic disease characterized by chronic hyperglycemia caused by insulin resistance and β-cell dysfunction, accounts for 90–95% of diabetes cases worldwide^1. It often leads to serious complications, such as cardiovascular disease, neuropathy, kidney disease, and retinopathy, which are usually associated with chronic inflammation^2.
Meanwhile, obesity and insulin resistance often trigger systemic low-grade inflammation, with elevated levels of inflammatory factors such as TNF-α and IL-6, which in turn impair insulin signaling^3. Atopic dermatitis (AD), characterized by chronic inflammation, intense pruritus, and eczema-like skin lesions, affects approximately 15–20% of children and 1–3% of adults worldwide. Its pathogenesis involves a complex interplay of genetic susceptibility, environmental triggers, immune dysregulation, and skin barrier dysfunction, giving rise to elevated immunoglobulin E (IgE), allergic reactions, and systemic inflammation^4,5. AD and T2DM both significantly affect patients’ health and quality of life. In AD, immune cells such as T cells and mast cells are hyperactive. In T2DM, low-grade inflammation triggered by obesity and insulin resistance impairs insulin signaling^6,7. Both diseases are thus intertwined with immune system imbalance and chronic inflammation^8,9. However, many questions remain unresolved about the causative genes or transcription factors shared by AD and T2DM, as well as potential common therapeutic targets. A recent study found an association between AD and type 2 diabetes (T2D): according to a survey using national health data from 2002 to 2015 in South Korea, the risk of subsequent T2D is significantly increased in patients with AD^10. A Mendelian randomization study further explored the causal relationship between AD, type 1 diabetes (T1D), and T2D and showed that genetically predicted AD significantly increases the risk of T2D^9. Both AD and diabetes involve immune dysregulation and share common inflammatory pathways, suggesting the existence of overlapping mechanisms. The immune system may serve as a bridge linking the pathogenesis of these two diseases. For example, the actions of cytokines and immune cells in AD may affect metabolic pathways involved in T2D.
Obesity is an important risk factor for T2D, and studies have shown that obesity may lead to more frequent occurrences of AD and exacerbate its symptoms^11. The prevalence of obesity is higher in AD patients than in the general population, and its effects are particularly pronounced in children, with obesity before the age of 5 significantly increasing the risk of developing AD^12. A large-scale study involving 2,090 adult patients further confirmed the clear association between obesity and AD^13. In experimental models, obese AD mice exhibited ear tissue 2–4 times thicker than that of non-obese AD mice, indicating that the persistent inflammation caused by obesity exacerbates AD severity, even when the obese mice reach a body weight similar to that of the control group^14,15. However, a cross-sectional study from the “Canadian Tomorrow Study” found a negative association between AD and T2D, with AD linked to a lower risk of T2D (OR: 0.78, 95% CI: 0.71–0.84) and reduced risks of hypertension, myocardial infarction, and stroke^16. Given these contrasting findings, it remains unclear whether AD directly contributes to the development of T2D. We therefore hypothesize that transcriptome data of AD and T2D could help elucidate the shared molecular mechanisms between the two diseases, identify potential biomarkers associated with their development, and suggest therapeutic targets to address both conditions. Moreover, two-disease model analysis has proven effective in elucidating commonalities between various chronic diseases. For example, rheumatoid arthritis (RA) and cardiovascular disease (CVD) share inflammatory pathways, especially the nuclear factor-kappa B (NF-κB) signaling pathway and the tumor necrosis factor (TNF) pathway. Chronic obstructive pulmonary disease (COPD) and lung cancer share environmental and genetic risk factors, such as tobacco exposure and TP53 gene mutations.
Alzheimer’s disease and T2DM share dysregulation of insulin signaling pathways, including the insulin receptor substrate (IRS) pathway and the phosphoinositide 3-kinase (PI3K)/protein kinase B (Akt) pathway^6. However, atopic dermatitis and T2DM, both closely related to the inflammatory response, remain common and insurmountable health challenges. Therefore, it is of great significance to explore the transcriptional features and gene expression pathways common to T2DM and AD. This study aims to integrate the transcriptome data of AD and T2DM, so as to elucidate the shared molecular mechanisms between these two diseases, explore potential biomarkers associated with the development of AD and T2DM, and screen potential therapeutic drugs. To this end, transcriptome data related to AD and T2DM are integrated from the Gene Expression Omnibus (GEO) database. In addition, differential gene expression analysis and weighted gene co-expression network analysis (WGCNA) are used to identify the key genes and modules of each disease. Through gene set variation analysis (GSVA), differences in enrichment scores between each disease and normal tissue samples are calculated, and GO and KEGG enrichment analyses are conducted^17–19. Diagnostic genes shared by the two diseases are identified, including LTF, LTB4R, and CCR1. Their good performance was validated with external datasets. Further, we analyzed the infiltration levels of 22 types of immune cells in AD skin tissue and T2D blood samples, and used single-cell RNA sequencing to localize the expression of the shared genes to specific cell types. Virtual screening was used to identify potential therapeutic compounds targeting the shared genes. According to the results, the comorbidity mechanism of AD and T2DM may be related to CCR1; Dioscin, Camptothecin, and Albamycin are identified as the three compounds with the highest affinity.
In conclusion, this study reveals a shared molecular mechanism of AD and T2DM, highlighting CCR1 as a potential therapeutic target for these two diseases. Dioscin, Camptothecin, and Albamycin show good affinity for the CCR1 binding site, which supports the potential of the shared genes as diagnostic markers and of these compounds as therapeutic candidates.
Methods
Data preparation
The data preparation and analysis workflow of this study is outlined in Fig. 1. Skin tissue microarray data for atopic dermatitis, including GSE6012, GSE16161, and GSE182740, were obtained from the GEO database. Blood sample microarray data for type 2 diabetes, including GSE15932, GSE156993, and GSE250283, were also retrieved. During preprocessing, the ComBat function from the sva package (version 3.44.0; Leek et al., 2012) was used to remove batch effects from the samples. This step eliminates technical differences between batches, allowing a more accurate comparison of biological differences between samples. Subsequently, the processed data were merged and normalized. Normalization removes differences in data scale and dimension, enabling a fairer comparison between samples. Finally, transcriptome data for 25 normal samples and 39 atopic dermatitis samples were obtained, along with transcriptome data for 29 normal and 61 type 2 diabetes patient blood samples.
Fig. 1. Comprehensive outline of the data preparation and analysis workflow.
Differentially expressed genes (DEGs) analysis
Differential gene expression analysis of the transcriptome data was performed using the limma package (version 3.54.0; Ritchie et al., 2015). Changes in gene expression were evaluated by fitting a linear model, and an empirical Bayesian method was applied to stabilize the variance estimates, which helps manage data noise and uncertainty.
For the atopic dermatitis data, thresholds of |LogFC| > 0.5 and p-value < 0.05 were used. For the blood transcriptome data of type 2 diabetes, the thresholds were set at |LogFC| > 0.3 and p-value < 0.05. All DEG results were visualized with volcano plots and heatmaps.
Gene set variation analysis (GSVA)
The pathway gene set list was taken from the KEGG_MEDICUS subset of CP (c2.cp.kegg_medicus.v2023.2.Hs.symbols.gmt). Differences in enrichment scores between each disease and normal tissue samples were calculated using the GSVA package (version 1.46.0; Hänzelmann et al., 2013) within the R 4.3.2 environment. This analysis aids in interpreting gene expression patterns under various disease states and their impact on specific biological pathways. The results were visualized with box plots.
Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analysis
The DEGs from atopic dermatitis and type 2 diabetes were intersected to obtain common differential genes for further functional and signaling pathway enrichment analysis. GO and KEGG enrichment analysis was conducted using the “clusterProfiler” package. Enrichment results with a p-value less than 0.05 were selected and visualized as bar charts and bubble charts.
Machine learning
To screen for characteristic genes of atopic dermatitis and type 2 diabetes, we employed two methods: Support Vector Machine-Recursive Feature Elimination (SVM-RFE) and Random Forest (RF). First, transcriptomic data from atopic dermatitis skin tissue and type 2 diabetes blood samples were normalized to ensure data comparability. The SVM-RFE method was then used to select features by recursively reducing the size of the feature set. At each step, SVM-RFE removes the feature contributing the least to the model until a predetermined number of features is reached or optimal model performance is achieved.
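The recursive elimination loop described above can be illustrated with a minimal sketch. It is not the implementation used in this study, which relied on the e1071 package in R: the linear SVM here is a plain subgradient-descent stand-in, the function names `train_linear_svm` and `svm_rfe` are hypothetical, and the toy matrix contains made-up values chosen only so that one column is informative.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (hinge loss + L2 penalty) by plain subgradient descent.

    X is a list of samples (lists of floats); y holds labels +1 / -1.
    Illustrative stand-in only, not the e1071 solver.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            violated = yi * score < 1  # sample lies inside the margin
            for j in range(n_features):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def svm_rfe(X, y, n_keep):
    """Repeatedly drop the feature with the smallest squared SVM weight."""
    remaining = list(range(len(X[0])))  # column indices still in the model
    eliminated = []                     # column indices in order of removal
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Toy expression matrix (hypothetical values): column 0 separates the classes,
# column 1 is balanced noise, column 2 is constant.
X_toy = [
    [2.0, 0.1, 0.0], [2.2, -0.1, 0.0], [1.8, 0.0, 0.0],     # disease
    [-2.0, 0.1, 0.0], [-1.9, -0.1, 0.0], [-2.1, 0.0, 0.0],  # control
]
y_toy = [1, 1, 1, -1, -1, -1]

kept, dropped = svm_rfe(X_toy, y_toy, n_keep=1)
print("kept feature indices:", kept)
```

Ranking by squared weight is the usual criterion for a linear SVM-RFE. In practice the stopping point would typically be chosen by cross-validated error rather than a fixed `n_keep`.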
In the RF model, each decision tree is trained on a random subset of the dataset, which increases model diversity and reduces the risk of overfitting. By comparing the performance of different decision trees, we can determine which features are most important for distinguishing between the two diseases. Using these two methods, we screened genes that are commonly upregulated in the disease group and obtained a set of characteristic genes that are highly important in both diseases.
Receiver operating characteristic (ROC) analysis
First, the diagnostic genes identified for atopic dermatitis and type 2 diabetes were intersected to derive the genes common to the two conditions. Gene expression data and sample classification information were then extracted from the dataset to construct a new data frame. The plot.roc function was then used within the R 4.3.2 environment to generate the ROC curves.
Immune cell infiltration analysis
We employed a deconvolution method based on transcriptomic data to analyze immune cell infiltration in each disease sample and normal sample. Using the CIBERSORT tool, which has been integrated into the Ecotyper platform (https://ecotyper.stanford.edu/; accessed October 1, 2023), we analyzed the infiltration levels of 22 types of immune cells in atopic dermatitis skin tissue and type 2 diabetes blood samples.
Single-cell data analysis
The single-cell transcriptomic datasets GSE222840, related to atopic dermatitis, and GSE244515, related to blood samples from type 2 diabetes patients, underwent quality control based on the following criteria: nFeature_RNA > 500, percent.mt < 20, percent.HB < 3, and nCount_RNA > 1000. All samples were integrated using the Harmony algorithm to remove batch effects. The RunUMAP function was applied for dimensionality reduction of the integrated dataset.
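The quality-control criteria listed above amount to a simple per-cell filter, sketched below for illustration. This is not the pipeline used in the study, where the metric names suggest the filtering was done on a Seurat-style object in R. The `CellQC` class, its field names, and the example cells are hypothetical; only the four thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell quality-control metrics (field names are illustrative)."""
    barcode: str
    n_feature_rna: int   # number of detected genes (nFeature_RNA)
    n_count_rna: int     # total counts per cell (nCount_RNA)
    percent_mt: float    # % of counts from mitochondrial genes (percent.mt)
    percent_hb: float    # % of counts from hemoglobin genes (percent.HB)


def passes_qc(cell):
    """Apply the four thresholds stated in the text; all must hold."""
    return (
        cell.n_feature_rna > 500
        and cell.n_count_rna > 1000
        and cell.percent_mt < 20
        and cell.percent_hb < 3
    )


def filter_cells(cells):
    """Keep only the cells that satisfy every QC criterion."""
    return [c for c in cells if passes_qc(c)]


# Made-up example cells, one per failure mode.
cells = [
    CellQC("cell_a", 1200, 3500, 5.0, 0.1),   # passes all criteria
    CellQC("cell_b", 450, 3000, 4.0, 0.0),    # too few detected genes
    CellQC("cell_c", 900, 2500, 25.0, 0.2),   # mitochondrial fraction too high
    CellQC("cell_d", 800, 1000, 3.0, 0.5),    # counts not strictly above 1000
]
kept_barcodes = [c.barcode for c in filter_cells(cells)]
print(kept_barcodes)
```

All comparisons are strict, matching the inequalities as written, so a cell with exactly 1000 counts is excluded.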
The plot_density function was used to visualize the expression levels of the co-diagnosis genes across different cell populations.
Cell chat analysis
First, we used the built-in database, CellChatDB.human, to create a CellChat object. We then used the identifyOverExpressedGenes and identifyOverExpressedInteractions functions to identify overexpressed genes and potential receptor-ligand pairs in each cell subpopulation. These receptor-ligand pairs were mapped onto the protein-protein interaction network with the projectData function. Next, the computeCommunProbPathway function was used to calculate the communication probabilities between cell subpopulations and infer cell signaling at the pathway level. Finally, the communication between cell types was visualized as a network graph.
Virtual screening
The 7VL9.pdb file, which contains the crystal structure information of the CCR1 protein at a resolution of 2.6 Å, was retrieved from the PDB database. AutoDock Vina was then used for virtual screening. Before screening, we prepared the three-dimensional structure files of all 323 natural products and preprocessed all input files, for example by removing water molecules and adding hydrogen atoms. The AutoDock Vina parameters were then configured, including the size and location of the screening box. For each compound, AutoDock Vina reports an affinity value that reflects the binding affinity of the compound to the protein. Finally, we used PyMOL to visually analyze the potential active compounds with high affinity.
Data integration and processing
To address the challenges inherent in integrating data derived from different platforms (Affymetrix, Illumina, and NGS), we implemented a rigorous workflow designed to ensure consistency, comparability, and reliability across datasets.
This process involved data preprocessing, batch effect correction, differential gene expression analysis, and external validation, detailed as follows:
During the data preprocessing step, raw data from each platform underwent stringent quality control to remove low-quality samples and genes with low expression, ensuring high data quality across the dataset. Platform-specific preprocessing methods were applied to correct for background noise and normalize the data, mitigating systematic biases and aligning the datasets to a comparable scale. Specifically, the Robust Multi-array Average (RMA) method was used for Affymetrix data, and quantile normalization was applied to Illumina and NGS data.
To reconcile differences in probe sets across platforms, probe IDs were mapped to a unified set of gene symbols using external databases such as ENSEMBL and NCBI Gene, enabling accurate integration at the gene level.
To address potential batch effects due to platform-specific differences, the ComBat method (via the ComBat function from the sva package) was applied to correct for these effects, thereby enhancing inter-platform comparability.
Following data integration, a robust differential gene expression analysis was conducted with stringent quality control and significance thresholds to ensure reliable and consistent results.
Finally, an external validation dataset was employed to confirm the integrity of the integrated dataset, verifying the accuracy and stability of the results post-integration.
Other tools
For functional and signal pathway enrichment analysis, the clusterProfiler package was used (Yu et al., 2012), while e1071 and randomForest packages were employed for machine learning processes in the screening of characteristic genes. Version details and references