Abstract Objective The incidence of Crohn's disease (CD) and rheumatoid arthritis (RA) co-morbidity, as well as the number of individuals affected, is on the rise due to their shared molecular and cellular factors. This study aimed to identify potential therapeutic targets and medicines for comorbid CD and RA. Methods We integrated single-cell RNA sequencing, Mendelian randomization, and colocalization analysis results from public databases to analyse immune cell subgroups in CD and RA patients and identify candidate therapeutic targets. We further screened potential medicines for the identified candidate targets using the Comparative Toxicogenomics Database (CTD) and molecular docking and molecular dynamics simulations. Results The proportion of CD8 effector memory T cells (Tem) was consistently elevated in the peripheral blood mononuclear cells (PBMCs) of both CD and RA patients. MYBL1 had a causal effect on the onset of both CD (OR = 1.23; 95 % CI, 1.05–1.45; P = 0.046) and RA (OR = 1.45; 95 % CI, 1.14–1.85; P = 0.04). Four potential therapeutic molecules were retrieved from the CTD database, among which tretinoin (docking score: −6.3 kcal/mol) showed the best potential. Conclusion Our comprehensive analysis suggests that CD8 Tem cells are a key cell group in comorbid RA and CD and that MYBL1 has a causal effect. Tretinoin was identified as a potential targeted therapeutic drug, which is of great clinical research value. Keywords: Crohn's disease, Rheumatoid arthritis, Mendelian randomization, Drug target Highlights * • CD8 Tem cells are involved in the pathogenesis of comorbid CD and RA. * • MYBL1 is a potential drug target for comorbid CD and RA. * • Tretinoin is a potential drug for the treatment of CD and RA comorbidity. Abbreviations (if any) ANXA1 annexin A1 BH Benjamini-Hochberg CD crohn's disease CI confidence interval CT control CTD comparative toxicogenomics database CTSW contrary cathepsin W DC dendritic cell DEGs differentially expressed genes DMAIDs anti-inflammatory bowel disease medicines DMARDs disease-modifying anti-rheumatic medicines EBI European Bioinformatics Institute eQTL: expression quantitative trait loci FKBP11 FK506 binding protein 11 GBP5 guanylate binding protein 5 GEO gene expression omnibus GO gene ontology GWAS genome-wide association study GZMB granzyme B GZMH granzyme H ITGA4 integrin subunit alpha 4 IVW inverse variance-weighted KEGG Kyoto Encyclopedia of Genes and Genomes KLRD1 lectin-like receptor D1 MIAT myocardial infarction-associated transcript MIF migration inhibitory factor MM/GBSA molecular mechanics/generalized born surface area MR mendelian randomization MRPL10 mitochondrial ribosomal protein L10 MYBL1 myb protooncogene like 1 NK natural killer NSAIDs nonsteroidal anti-inflammatory medicines OR odds ratio PBMCs peripheral blood mononuclear cells PCA principal component analysis PTPN4 protein tyrosine phosphatase nonreceptor type 4 RA rheumatoid arthritis RMSD root mean square deviation RMSF root mean square fluctuation RNA-seq transcriptome sequencing scRNA-seq single-cell RNA sequencing SNP single nucleotide polymorphism SUN2 sad1 and UNC84 domain containing 2 Tef effector T cells Tem effector memory T cells TIGIT T cell immunoreceptor with Ig and ITIM domains Tn naive T cells TNFAIP3 tumor necrosis factor alpha-induced protein 3 UMAP uniform manifold approximation and projection 1. Introduction Crohn's disease (CD) and rheumatoid arthritis (RA) are both chronic autoimmune diseases that can result in debilitating complications affecting the gastrointestinal tract and joints, respectively [[35]1,[36]2]. Over the past 30 years, the incidence of both diseases has steadily increased with projections for continued rise [[37]3,[38]4]. These diseases share common pathogenic mechanisms, including dysregulated immune responses and genetic predisposition. Importantly, a growing body of research indicates a high likelihood of comorbidity between these two disorders. Studies have shown that patients with CD demonstrate a 3.35 times higher risk of developing RA compared to individuals without CD [[39]5]. Meanwhile, RA is associated with an 11 % increased risk of CD [[40]6]. The increasing prevalence and burden of this comorbidity highlight the urgent need for effective therapeutic strategies, which we aim to address in this study. Presently, pharmaceutical interventions serve as the first line of defence against these two diseases, with clinical remission as the primary goal. In recent years, the use of anti-inflammatory bowel disease medicines (DMAIDs) and disease-modifying anti-rheumatic medicines (DMARDs) has shown favourable outcomes in the treatment of CD and RA [[41]7,[42]8]. However, in populations with comorbid CD and RA, there are limited safe and effective therapeutic alternatives. While certain immunosuppressants and biologic agents demonstrate therapeutic efficacy in both CD and RA, their long-term use may increase susceptibility to infections and tumors. Moreover, despite the significant potency of these biologic agents, they are expensive and also may trigger severe allergic reactions. Furthermore, some commonly used medicines for the treatment of RA, such as penicillamine, gold preparations, and nonsteroidal anti-inflammatory medicines (NSAIDs), may cause various gastrointestinal complications or even worsen CD [[43]9]. Peripheral blood mononuclear cells (PBMCs) play a pivotal role in modulating the body's immune response. Interacting with PBMCs can effectively modulate the body's immune response and inhibit inflammatory responses, making them a highly promising therapeutic target. In recent years, an increasing number of studies have focused on investigating the significance of PBMCs in diagnosing, monitoring disease activity, and evaluating treatment response [[44]4]. Pu demonstrated that the differentiation and development of distinct PBMC subpopulations are crucial for maintaining immunological homeostasis, suppressing inflammatory responses, and improving the prognosis of CD [[45]10]. Recent studies have shown that monocytes in PBMCs can migrate to the synovial tissue in response to chemokines, differentiate into macrophages, and secrete proinflammatory factors that promote inflammatory responses in RA [[46]11]. Therefore, conducting an in-depth examination of the role of PBMCs in CD and RA progression could potentially lead to the discovery of more effective interventions aimed at improving patients' quality of life. While traditional methods including observational studies and animal models have been instrumental in disease research, they are often limited by confounding factors and lack the capability to address cell heterogeneity. Single-cell RNA sequencing (scRNA-seq) has emerged as a potent tool that enables an in-depth exploration of immune cell subpopulations within PBMCs, thereby unveiling the complexity of biological systems. When combined with scRNA-seq, several complementary technologies greatly help reveal the molecular mechanisms of complex diseases. Expression quantitative trait loci (eQTL) analysis aids in understanding the genetic regulation of gene expression, while Mendelian randomization helps to determine the causal relationships between genes and diseases [[47]12,[48]13]. Colocalization analysis assists in identifying common genetic variants associated with multiple traits or diseases [[49]14]. Transcriptome sequencing (RNA-seq) allows for a comprehensive analysis of gene expression patterns. Molecular docking and molecular dynamics simulation are helpful in studying the interactions between drug targets and ligands and the resulting conformational changes. Despite these advances, comprehensive understanding of comorbidities, particularly between CD and Rheumatoid RA, remains a significant gap in literature. This study addresses this issue by integrating advanced technologies to analyse PBMC sequencing data, aiming to identify potential therapeutic targets and treatments for comorbid CD and RA. The study workflow is illustrated in [50]Fig. 1. Fig. 1. [51]Fig. 1 [52]Open in a new tab Study design for identification of genes causally associated with CD and RA. 2. Materials and methods 2.1. Data source and preparation The scRNA-seq data (10X Genomics data) from PBMC of CD ([53]GSE163314) and RA ([54]GSE159117) patients were initially retrieved and downloaded from the Gene Expression Omnibus (GEO) database. Detailed information about these datasets can be found in [55]Table 1. To analyse the scRNA-seq data, an initial screening process was conducted using the Seurat R package [[56]15]. During this screening, only genes expressed in a minimum of 3 cells and cells expressing at least 200 genes were retained. The PercentageFeatureSet function was utilized to assess the ratio of mitochondrial DNA to rRNA. Additionally, we selected cells with feature counts ranging from 100 to 4000 and excluded cells with mitochondrial gene counts exceeding 10 %. Ultimately, our analysis included a total of 35,282 single-cell data points, comprising 7428 from normal controls (CT), 17,690 from CD patients and 10,164 from RA patients. Table 1. Detailed information about the data. ID Type Descriptive [57]GSM4976997 scRNA-seq scRNA-seq for PBMC from healthy control [58]GSM4977001 scRNA-seq scRNA-seq for PBMC from healthy control [59]GSM4976999 scRNA-seq scRNA-seq for PBMC from CD patient [60]GSM4977003 scRNA-seq scRNA-seq for PBMC from CD patient [61]GSE159117 scRNA-seq scRNA-seq for PBMC from RA patient ebi-a-GCST003044 GWAS Including 5956 CD cases and 21,770 population controls of European descent with 110,583 SNPs. ebi-a-GCST005569 GWAS Including 11,475 RA cases and 15,870 controls of European descent with 129,464 SNPs. [62]Open in a new tab The eQTL data used in this study are derived from an extensive dataset made publicly accessible by Vosa et al. [[63]16]. The genetic variants that exhibited a substantial association with CD were obtained from a comprehensive genome-wide association study (GWAS) carried out by the European Bioinformatics Institute (EBI), involving 5956 CD patients and 14,927 controls (CTs) [[64]17]. Similarly, the genetic variants significantly associated with RA were extracted from another large GWAS that was also conducted by the EBI and encompassed a cohort of 13,838 individuals diagnosed with RA and 33,742 CTs [[65]18]. To enhance the credibility of our results, we performed an external validation utilizing GWAS provided by the University of Bristol, which comprised 17,897 CD patients and 33,977 CTs [[66]17]. 2.2. Sequencing data analysis First, we applied log normalization to normalize the gene expression of individual cells. Subsequently, we identified 2000 highly variable genes using the FindVariableFeatures function. We then performed principal component analysis (PCA) to downscale the data, effectively addressing batch effects between samples by applying the Harmony method. Next, we utilized the uniform manifold approximation and projection (UMAP) nonlinear dimensionality reduction technique and visualized the results through the DimPlot function. We repeated these steps after extracting T-cell clusters from PBMCs. We employed the Slingshot package to infer cell lineage and pseudotemporal relationships [[67]19]. Additionally, the CellChat package was utilized to infer intercellular communication and identify significant pathways based on gene expression interactions, signalling ligands, receptors, and cofactors [[68]20]. Then, highly differentially expressed genes (DEGs) (at least 2-fold) in CD8 Tem cells relative to other cells were identified using the FindMarkers function. Finally, GO and KEGG enrichment analysis of genes specifically highly expressed in the screened CD8 Tem cells was performed using the clusterProfiler package [[69]21]. 2.3. Mendelian randomization analysis and Bayesian colocalization analysis In this study, we conducted MR analysis of the DEGs identified through single-cell analysis using the “TwoSampleMR” package. Only genes with significant eQTL associations (P < 5 × 10^−8) were considered, using CD and RA as outcome indicators. Furthermore, phenotypic scanning was performed using the “phenoscanner”, and SNPs associated with known risk factors for CD or RA were excluded. When a given gene contained only one eQTL, the Wald ratio was employed. Inverse variance-weighted MR (MR-IVW) was used as multiple genetic instrumental variables were present, and heterogeneity analysis was performed. SNPs in the exposure and outcome GWAS were directionally harmonized and Steiger filtered. Subsequently, common causative genes of CD and RA were selected for reverse causality analysis. In addition, Bayesian colocalization analysis was employed using the “coloc” package to evaluate the correlation between the two traits. And, determined from previous articles, when PPH_4 > 80 % was defined as the evidence of colocalization of genes [[70]12]. 2.4. Role of pathogenic genes at the scRNA-seq To investigate the involvement of pathogenic genes in T cells, we initially conducted weighted kernel density estimation using the Nebulosa package. This analysis allowed us to determine the densities and weights of disease-causative genes across different T cell subpopulations. Subsequently, GeneSwitches was employed to calculate the switching probabilities of genes over pseudo time, which provided insights into the activation and repression patterns of genes over their developmental trajectories. Based on the above analyses, T cell subpopulations exhibiting elevated levels of pathogenic genes in both CD and RA were classified according to the switching behaviour of the pathogenic genes as positive gene populations or negative cell populations. Subsequently, the intercellular communication and metabolic activities between these cell populations were further investigated with the aim of revealing their functional interactions. 2.5. Drug prediction We used MYBL1, CD, and RA as keywords to construct the disease–gene–drug linkage network using the Comparative Toxicogenomics Database (CTD) and screened for drugs targeting candidates that play key roles in disease development [[71]22]. Furthermore, the structures of small molecule drugs and targets were retrieved from the PDB and AlphaFold protein structure database, and molecular docking between the two was performed using AutoDock software [[72]23,[73]24]. Subsequently, molecular dynamics simulation of the small molecule–protein complexes was carried out using GROMACS software with energy minimization and simulation times up to 100 ns, and the root mean square deviation (RMSD), root mean square fluctuation (RMSF), and binding free energy between the protein and the small molecule were calculated [[74]25,[75]26]. 2.6. Statistical analysis All statistical analyses were executed in R (version 4.3.0). The Wilcoxon test was employed as the primary method for assessing differences in gene expression of sequencing data. In MR studies, multiple comparisons were rigorously controlled using the Benjamini-Hochberg (BH) procedure. The MR effect size was expressed using odds ratio (OR) with a 95 % confidence interval (95 % CI). 3. Results 3.1. Screening unique cell populations and DEGs in scRNA-seq First, we identified five major cell types from the PBMCs of CT, CD, and RA patients based on marker genes such as MS4A1, PPBP, FCER1A, GNLY, CD3E, and CD14: natural killer (NK) cells, T cells, monocytes, B cells, and dendritic cells (DCs). Among them, T cells were the most prevalent in PBMCs, and their proportion increased significantly in the disease groups ([76]Fig. 2A). Therefore, we focused on T cells for further classification. According to previous research by Zhang Zemin's team, T cells can be categorized into CD4 naive T cells (Tn), CD4 effector T cells (Tef), CD8 Tn cells, CD8 effector memory T cells (Tem), and NK and T cells [[77]27]. We found that the proportion of CD8 Tem cells was significantly increased in both diseases ([78]Fig. 2B). The results of T cell lineage analysis and pseudotime inference are depicted in [79]Fig. 2C, where all trajectories pass through CD8 Tem cells, suggesting an active proliferation phase of CD8 Tem cells during the development of both diseases. As shown in [80]Fig. 2D, CD8 Tem cells interact with four other cell types via four signalling pathways, with macrophage migration inhibitory factor (MIF) exhibiting the strongest outward signalling in CD patients. [81]Fig. 2E illustrates that CD8 Tem cells interact with five other cell types in RA patients through five signalling pathways, again with strong outward signalling via the MIF pathway. However, compared to those in CD patients, CD8 Tem cells in individuals with RA did not show an association with CD8 Tn cells but could regulate NK cells as well as their own population. We collected and organized a total of 138 DEGs in CD8 Tem cells ([82]Supplementary Table 1). Subsequently, we conducted GO and KEGG pathway enrichment analyses of the DEGs. The GO enrichment analysis revealed significant involvement in various biological processes, including lymphocyte differentiation, T cells differentiation, and immune activation. Cellular components included the plasma membrane signalling receptor complex, cytoplasmic side of the membrane, and cytoplasmic side of the plasma membrane. The associated molecular functions included protein self-association, protein tyrosine phosphatase activity, and MHC protein binding, as shown in [83]Fig. 2F. Furthermore, the KEGG pathway enrichment analysis indicated strong associations with pathways such as the T cell receptor pathway, NK cell receptor pathway, and MHC protein binding. Notably, the analysis revealed enrichment in the inflammatory bowel disease-related pathway, aligning with the purpose of our research ([84]Fig. 2G). Fig. 2. [85]Fig. 2 [86]Open in a new tab scRNA-seq analysis in the PBMC of CT, RA and CD patients. (A) UMAP plot and proportion of distinct cell subsets in PBMC; (B) UMAP plot and proportion of various T cell subgroups; (C) Pseudotime analysis of T cells inferred by Slingshot; (D) Intercellular ligand–receptor prediction among CD8 Tem and other PBMC cells revealed by CellChat in CD patient; (E) Intercellular ligand–receptor pre-diction among CD8 Tem and other PBMC cells revealed by CellChat in RA patient; (F) GO enrichment analysis of specifically highly expressed genes in CD8 Tem; (G) KEGG pathway enrichment of specifically highly expressed genes in CD8 Tem. 3.2. Screening the DEGs for comorbidity-causing genes The results of the genetic variation instrument variables for DEGs, variable harmonization, and Steiger filtering are presented in [87]Supplementary Table 2, ensuring alignment between the exposure and outcome instrumental variables. After performing BH correction, a total of 12 gene-CD pairs were obtained ([88]Fig. 3A, [89]Fig. 4A and [90]Supplementary Table 3). Of these, four demonstrated a protective effect on disease development, including annexin A1 (ANXA1, OR = 0.53; 95%CI, 0.43–0.66; P = 5.39E−07), granzyme H (GZMH, OR = 0.53; 95%CI, 0.41–0.68; P = 1.20E−05), myocardial infarction-associated transcript (MIAT, OR = 0.43; 95%CI, 0.31–0.58; P = 1.49E−06), and T cell immunoreceptor with Ig and ITIM domains (TIGIT, OR = 0.63; 95%CI, 0.48–0.82; P = 0.004). Conversely, the remaining eight exacerbate disease progression, including guanylate binding protein 5 (GBP5, OR = 1.43; 95%CI, 1.26–1.62; P = 5.39E−07), integrin subunit alpha 4 (ITGA4, OR = 1.14; 95%CI, 1.03–1.27; P = 0.046), mitochondrial ribosomal protein L10 (MRPL10, OR = 1.11; 95%CI, 1.03–1.19; P = 0.022), myb proto-oncogene like 1 (MYBL1, OR = 1.23; 95%CI, 1.05–1.45; P = 0.046), protein tyrosine phosphatase non-receptor type 4 (PTPN4, OR = 2.07; 95%CI, 1.58–2.71; P = 1.49E−06), sad1 and UNC84 domain containing 2 (SUN2, OR = 1.07; 95%CI, 1.02–1.13; P = 0.04), syne-2 (SYNE2, OR = 2.37; 95%CI, 1.72–3.26; P = 1.49E−06) and tumor necrosis factor alpha-induced protein 3 (TNFAIP3, OR = 1.41; 95%CI, 1.15–1.72; P = 0.06). Additionally, the P values of the MR analysis results for RA were also adjusted, and 5 causal genes remained ([91]Figs. 3B and [92]4B and [93]Supplementary Table 3), among which overexpression of FK506 binding protein 11 (FKBP11, OR = 5.01; 95%CI, 2.95, 8.5; P = 1.67E−07) and MYBL1 (OR = 1.45; 95%CI, 1.14–1.85; P = 0.04) increased the risk of RA, while on the contrary cathepsin W (CTSW, OR = 0.33; 95%CI, 0.21–0.52; P = 6.57E-05), granzyme B (GZMB, OR = 0.44; 95%CI, 0.29, 0.69; P = 0.005) and killer cell lectin-like receptor D1 (KLRD1, OR = 0.35; 95%CI, 0.22–0.57; P = 0.001) reduced the associated risk. Both MR results revealed that MYBL1 is a pathogenic gene shared between CD and RA. Meanwhile, no heterogeneity was detected for the genes analysed in the primary analysis ([94]Supplementary Table 4). Fig. 3. [95]Fig. 3 [96]Open in a new tab MR results for DEGs and the risk of CD and RA. Volcano plots of the MR results for DEGs on the risk of CD (A) and RA (B), re-spectively. The size of the points on the graph is determined by the PVE value. Dashed horizontal black line corresponded to ad just P = 0.05 (BH). ln: natural logarithm; PVE: proportion of variance explained. Fig. 4. [97]Fig. 4 [98]Open in a new tab Forest Plots of the results of the MR analysis for CD (A) and RA (B), respectively. The P values were adjusted by the BH method. SNP: single nucleotide polymorphism. Subsequently, since MYBL1 is a common causative gene for both CD and RA, bidirectional MR analysis was conducted for MYBL1 individually and in relation to comorbidities, revealing no evidence of a causal effect between disease and MYBL1 ([99]Supplementary Table 5). However, phenotypic scanning identified a between MYBL1 (rs1060242) and daytime napping ([100]Supplementary Table 6). Unexpectedly, however, the Bayesian colocalization analysis showed that MYBL1 did not share genetic variants with CD (PPH_4 = 0.199) and RA (PPH_4 = 0.384), as detailed in [101]Fig. 5 and [102]Supplementary Table 7. Fig. 5. [103]Fig. 5 [104]Open in a new tab Regional association plots of MYBL1 and CD (A) and RA (B), respectively. Diamond purple points represented the SNP that with the minimal sum of P value in corresponded gene GWAS and disease GWAS. 3.3. Role of MYBL1 in CD8 Tem As shown in [105]Fig. 6A, MYBL1 gene expression in PBMCs occurs predominantly in CD8 Tem cells. During the maturation process of CD8 Tem cells, MYBL1 gradually becomes downregulated, leading us to categorize them into MYBL1+ and MYBL1- CD8 Tem subpopulations ([106]Supplementary Table 8 and [107]Supplementary Fig. 1). In CD patients, MYBL1- CD8 Tem cells exhibited stronger interactions with other cell types than did MYBL1+ CD8 Tem cells. MYBL1-cells achieved these interactions primarily through MIF, while MYBL1+ cells utilized IL16–CD4 ([108]Fig. 6B). On the other hand, in RA patients, we observed similar intercellular interactions between the MYBL1+ and MYBL1-cell populations, both mediated by MIF. However, the MYBL1+ cell population was capable of interacting with both the MYBL1+ and MYBL1-cell populations ([109]Fig. 6C). In terms of cellular metabolic activities, the MYBL1+ cell population displayed heightened activity in drug metabolism involving other enzymes, whereas the MYBL1-cell population exhibited increased vigour in xenobiotic metabolism mediated by cytochrome P450 ([110]Fig. 6D and E). Fig. 6. [111]Fig. 6 [112]Open in a new tab The role of MYBL1 in T cells. (A) Nebulosa expression density for MYBL1 in T cells. (B) Intercellular ligand–receptor prediction among MYBL1(±) CD8 Tem and other PBMC cells revealed by CellChat in CD patient. (C) Intercellular ligand–receptor prediction among MYBL1(±) CD8 Tem and other PBMC cells revealed by CellChat in RA patient. (D)The metabolic activity of PBMC in CD patients. (E)The metabolic activity of PBMC in RA patients. 3.4. Molecular docking and dynamics simulatuons Four potential drug treatments, namely, arsenic trioxide, azathioprine, cyclosporine, and tretinoin, were screened from the CTD database as candidate inhibitors of MYBL1 expression in both CD and RA ([113]Supplementary Table 9). Subsequently, two of these small molecule medicines, azathioprine (PubChem CID: 2265, docking score: −5.5 kcal/mol) and Tretinoin (PubChem CID: 444795, binding energy: −6.3 kcal/mol), were selected for molecular docking with MYBL1 individually ([114]Supplementary Fig. 2 and [115]Fig. 7A). The results revealed that of these two drugs, tretinoin exhibited a stronger affinity towards MYBL1. Molecular dynamics simulations were then employed to validate the binding capability of tretinoin with MYBL1. Fig. 7. [116]Fig. 7 [117]Open in a new tab Molecular docking (A) and dynamics simulation between tretinoin and MYBL1. RMSD (B) of the tretinoin and MYBL1. RMSF (C) of the MYBL1 in the complex. RMSD: root mean square deviation; RMSAF: root mean square fluctuation. As shown in [118]Fig. 7B, over the initial 100 ns of the simulation, the RMSD value between the Tretinoin/MYBL1 complexes gradually stabilized within a narrow range, indicating a more stable binding interaction. The RMSF values, shown in [119]Fig. 7C illustrates the flexibility of each residue within the molecule. Moreover, the MM/GBSA analysis yielded an average binding free energy value of −25.56 ± 2.68 kcal/mol, suggesting that tretinoin can form a stable binding complex with MYBL1, thereby exerting pharmacological effects. 4. Discussion To the best of our knowledge, this is the first study to explore the comorbidity of CD and RA by integrating scRNA-seq data from PBMCs with eQTL, MR, and colocalization analyses. Importantly, we identified MYBL1 as a critical pathogenic target, which is highly expressed in CD8 Tem cells. To substantiate our findings, we employed molecular docking, and molecular dynamics simulation, which further corroborated the reliability of the identified potential targets and therapeutic agents. This comprehensive and integrative approach not only enhances our understanding of the shared pathogenesis of CD and RA but also paves the way for the development of novel therapeutic strategies for these debilitating diseases. T cells, a critical component of PBMCs, play an essential role in maintaining immune homeostasis and combating various diseases. Notably, the dysfunction of T cells is associated with numerous diseases, including autoimmune disorders, infectious diseases, and cancer, underscoring their crucial role in immune balance and disease development [[120]28]. Our study aligns with previous research indicating an elevated trend of CD8 Tem in various diseases, underscoring their pivotal role in adaptive responses and long-term immunity [[121]29]. Moreover, we found that both CD4 Tn and CD8 Tn cells can be converted to CD8 Tem cells through a series of regulatory mechanisms involving antigen expression, activation, differentiation, contraction, and survival. Interestingly, our study highlights the significant role of the MIF pathway in the interaction between CD8 Tem cells and multiple other immune cell types, which expands previous research finding that CD8 Tem cells can interact with multiple immune cell types [[122]30]. GO and KEGG enrichment analyses of DEGs specifically overexpressed in CD8 Tem cells showed that these DEGs were mainly involved in pathways related to lymphocyte cell differentiation, regulation, migration, apoptosis, autophagy, and inflammatory diseases, which is consistent with their biological significance. Future research on CD8 Tem cells could pave the way for novel diagnostic and therapeutic strategies targeting these cells, potentially expanding treatment options in clinical settings. In addition, DEGs in CD8 Tem cells are expected to be new therapeutic targets, so we combined MR and colocalization approaches to comprehensively analyse the causal proteins of CD and RA comorbidity towards a clinical translation of previous GWAS findings. To avoid genetic confounding due to reverse causality and linkage disequilibrium inherent in MR, we employed Steiger directionality filtering and reverse MR analysis. We identified MYBL1 as a causal gene in the comorbidity of CD and RA, with colocalization results supporting this genetic variation as shared by both disorders. Phenotypic scanning revealed an association between an MYBL1-related SNP (rs1060242) and daytime napping, a seemingly unrelated trait. However, recent prospective cohort studies have linked daytime napping with inflammatory bowel disease [[123]31]. Furthermore, daytime napping has been associated with pain, depression, anxiety, fatigue, memory difficulties, and insomnia, all of which are relevant to CD and RA [[124]32,[125]33]. MYBL1, a member of the MYB transcription factor family, plays crucial roles in cell cycle progression, differentiation, and apoptosis. Its aberrant expression is closely associated with human cancers, and it has been proposed as a novel therapeutic target in renal cancer through remodelling of the tumour microenvironment [[126]34]. However, the role of MYBL1 in autoimmune diseases has been largely overlooked. We further investigated the role of MYBL1 in T cells. Utilizing a gene-weighted 2D kernel density plot, we observed a distinct overexpression of MYBL1 in CD8 Tem cells, with its regulation in these cells negatively correlated with developmental time. Based on the regulatory status of MYBL1, we stratified CD8 Tem cells into MYBL1+ and MYBL1-subgroups. We hypothesized that MYBL1+ CD8 Tem cells could be the pathogenic cell subgroup involved in this comorbidity. Cell interaction analysis revealed reduced interactions between MYBL1+ CD8 Tem cells and other immune cells, and distinct metabolic activities were observed between the MYBL1+ and MYBL1-subgroups. Subsequent whole-blood transcriptomic analysis showed a significant overexpression of MYBL1 in the disease group, indirectly supporting our hypothesis. However, it is worth noting that although we identified MYBL1 as a potential therapeutic target for comorbidity, its role in the development or maintenance of these disorders requires further interpretation. After identifying this potential target, we mined four potentially valuable medicines from the CTD database. Based on the compounds’ biological activity, safety, reuse potential, and molecular binding energy, we selected tretinoin for further analysis. Tretinoin, a derivative of vitamin A, is clinically used for the treatment of acne and acute promyelocytic leukemia [[127]35]. However, emerging evidence suggests that it may have broader potential applications. Studies have indicated that tretinoin can modulate T cell function and differentiation, playing a potential role in various diseases [[128]36]. However, its precise mechanisms of action remain unclear. In our study, we attempted to elucidate these mechanisms using molecular docking and dynamic simulations. We propose that tretinoin may exert its therapeutic effects in comorbid conditions by inhibiting the expression of MYBL1, thereby regulating the function of CD8 Tem cells. Future work will involve biological experiments to further validate these findings. Additionally, we plan to rationally modify the structure of tretinoin or design suitable drug carriers to enhance its efficacy and minimize side effects. 5. Limitations This study had some limitations that should be acknowledged. First, our PBMC scRNA-seq data were obtained from public databases, which were of limited sample size and lacked data from patients codiagnosed with CD and RA. Second, both the eQTL and GWAS data were based on studies conducted in populations of European ancestry, limiting the generalizability of our findings to other ethnic groups. Further research in non-European populations is warranted to facilitate the clinical translation of these findings. Last, although our results suggest that tretinoin could be a potential therapeutic agent for comorbidity, the molecular docking and molecular dynamics simulation results are suggestive rather than conclusive. Further in-depth studies are needed to validate these preliminary findings. In the future, conducting additional animal studies and preclinical trials based on the findings of this study will aid in further elucidating the pathogenesis and therapeutics of RA and CD comorbidity. 6. Conclusions In conclusion, our comprehensive analysis indicates that CD8 Tem cells play a role in the pathogenesis of comorbid CD and RA. Specifically, the MYBL1 gene is causally associated with the risk of comorbid CD and RA and represents a potential drug target. Moreover, tretinoin shows promise as a potential treatment for comorbid CD and RA. This study contributes to laying the groundwork for a more profound understanding of comorbid CD and RA by offering a unique and comprehensive perspective into the sequencing data of PBMCs, as well as potential therapeutic avenues. Ethical statement Review and/or approval by an ethics committee was not needed for this study because all the data utilized in this study were obtained from previously published research and publicly accessible databases. Funding This research was supported by the Guangdong Medical Science and Technology Research Foundation (Grant no. A2023145), the Fundamental Research Funds for the Central Universities (Grant no. 21623311), Guangzhou Basic and Applied Basic Research (Grant no. SL2024A04J00960). Data availability statement GWAS data can be downloaded from the GWAS Catalog using the accession numbers ebi-a-GCST003044 and ebi-a-GCST005569. Sequencing data can be accessed from the GEO database using [129]GSM4976997, [130]GSM4977001, [131]GSM4976999, [132]GSM4977003, and [133]GSE159117. CRediT authorship contribution statement Qiu Dong: Writing – original draft, Data curation, Conceptualization. Xiaoting Wu: Investigation, Funding acquisition, Formal analysis. Tsz Mok: Project administration, Methodology. Gaohan Cai: Software, Resources. Zhengang Zha: Supervision, Software. Guorong She: Visualization, Validation. Junyuan Chen: Writing – review & editing, Writing – original draft. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements