Abstract The interplay between transcription factors (TFs) and regulatory elements (REs) drives gene transcription, forming gene regulatory networks (GRNs). Advances in single-cell technologies now enable simultaneous measurement of RNA expression and chromatin accessibility, offering unprecedented opportunities for GRN inference at single-cell resolution. However, heterogeneity across omics layers complicates regulatory feature extraction. We present scTFBridge, a multi-omics deep generative model for GRN inference. scTFBridge disentangles latent spaces into shared and specific components across omics layers. By integrating TF-motif binding knowledge, scTFBridge aligns shared embeddings with specific TF regulatory activities, enhancing biological interpretability. Using explainability methods, scTFBridge computes regulatory scores for REs and TFs, enabling robust GRN inference. Our results further demonstrate that scTFBridge can identify cell-type-specific susceptibility genes and distinct regulatory programs, providing insights into gene regulation mechanisms at the single-cell level. Subject terms: Computational models, Gene regulatory networks __________________________________________________________________ Recent work has enabled the simultaneous analysis of RNA levels and chromatin accessibility for gene regulatory network (GRN) inference, but heterogeneity across omics layers remains a barrier in these approaches. Here, the authors present scTFBridge, a deep-generative model for GRN inference, whereby latent spaces are disentangled into shared and specific components across omics layers. Introduction Transcription of genes is primarily regulated by TFs, proteins that bind to specific DNA sequences to trigger diverse gene regulatory functions^[34]1. These regulatory elements (REs) are typically located in accessible chromatin regions, such as promoters, enhancers, and silencers^[35]2,[36]3. The binding of TFs to REs and their spatial chromatin interaction with target genes (TGs) ultimately determines specific gene transcription processes^[37]4. The complex interplay among TFs, REs, and TGs can be represented as interconnected GRNs, the foundational elements of gene regulation in single cells^[38]5–[39]7. Understanding the regulatory mechanisms within these GRNs is pivotal for elucidating how cellular identities are established and maintained. Such insights have far-reaching implications, from guiding cell fate engineering to informing strategies for disease prevention and treatment^[40]8–[41]10. Recent advances in single-cell high-throughput sequencing techniques have enabled the exploration of gene regulatory processes at single-cell resolution^[42]11–[43]13. For example, single-cell RNA sequencing (scRNA-seq) facilitates the inference of cell-type-specific trans-regulatory interactions through computational tools such as DeepSEM^[44]14, PIDC^[45]15, GENIE3^[46]16, SupirFactor^[47]17, and SCENIC^[48]18. Similarly, single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) provides insights into chromatin accessibility at specific gene regions (Peaks)^[49]19, enabling the identification of TF-RE interactions related to gene regulation through methods like DeepTFni^[50]20, and DirectNet^[51]21. However, these distinct omics layers are typically measured in separate cells, complicating efforts to directly link RNA expression and chromatin accessibility within the same cell^[52]22. Unpaired data integration methods, such as scJoint^[53]23 and scGLUE^[54]24, integrate these datasets but are primarily optimized for tasks like cell-type annotation and harmonized data representation. Alternatively, cross-modality generation approaches, such as BABEL^[55]25 and UnpairReg^[56]26, predict gene expression from scATAC-seq data to enable deeper gene regulation analysis. Nevertheless, accurately capturing complex regulatory relationships within gene regulatory networks (GRNs) at single-cell resolution remains constrained by the availability of paired data and data quality. The emergence of single-cell multi-omics technologies, such as 10x Multiome^[57]27 and SHARE-seq^[58]28, has transformed the field by enabling simultaneous profiling of paired RNA-seq and ATAC-seq data from the same cell. This advancement facilitates more accurate GRN inference and a deeper understanding of gene regulation at the single-cell level. However, integrating these high-dimensional and heterogeneous multi-omics datasets remains a significant challenge^[59]29,[60]30. A key objective is to encode distinct omics layers into a unified latent space for a cohesive representation, as achieved by methods like MultiVI^[61]31, UnitedNet^[62]32. Additionally, specialized approaches, including LINGER^[63]33, scMultiomeGRN^[64]34, scMTNI^[65]35, scButterfly^[66]36, and SCENIC+^[67]37 have been developed to elucidate gene regulation across single-cell RNA-seq and ATAC-seq data. Despite these advances, a persistent hurdle in data integration is the modality gap^[68]38,[69]39. This gap arises from the intrinsic heterogeneity of different omics layers, where each modality captures distinct aspects of complex biological processes^[70]40. As a result, modality-specific private information can hinder the creation of a shared multi-omics latent space and impair accurate data reconstruction from the integrated embedding^[71]41,[72]42. Bridging this modality gap is essential for advancing multi-omics data integration and enabling more precise GRN analyses. In this study, we propose scTFBridge, an interpretable deep generative model for multi-omics integration and GRN inference at the single-cell level. By leveraging mutual information theory^[73]43–[74]45, scTFBridge minimizes mutual information and disentangles both RNA-seq and ATAC-seq data into modality-shared and modality-private subspaces. The modality-shared representations are further aligned through contrastive learning to construct a unified latent space. Incorporating biological priors of TF-motif bindings, scTFBridge constrains shared latent variables to represent specific TF regulatory activities, effectively acting as a bridge between RNA-seq and ATAC-seq modalities. Using deep learning explainability techniques, we derive regulatory scores linking REs and latent TF variables to TG expressions during the generation of TG expression. Our results demonstrate that scTFBridge effectively disentangles modality-shared and modality-private information, yielding biologically interpretable embeddings that capture cell-type-specific TF regulatory activities. Benchmarking against state-of-the-art methods on single-cell cis- and trans-regulation datasets, scTFBridge outperforms most baseline approaches in regulatory feature extraction across diverse cell groups. In summary, scTFBridge provides an efficient computational framework for exploring phenotype-specific gene regulatory mechanisms at single-cell resolution. It advances multi-omics data integration and enhances our understanding of gene regulation dynamics in complex cellular systems. Results The framework of scTFBridge scTFBridge is a biologically interpretable deep generative model designed for integrating single-cell multi-omics data and inferring cell-specific GRNs (Fig. [75]1a). It leverages a hybrid variational autoencoder^[76]46 (VAE) architecture that decomposes latent variables into modality-shared and modality-private components for each omics layer (Fig. [77]1b). Through mutual information theory and contrastive learning regularizations, scTFBridge effectively disentangles shared and private representations while aligning the shared latent space to capture common regulatory signals (Fig. [78]1c, Methods). To enhance biological interpretability, scTFBridge incorporates prior knowledge by constraining the weights of the shared ATAC-seq decoder (Fig. [79]1d). The shared latent variables were denoted as specific TF activities that contributed to modality reconstruction. Further enrichment analysis revealed the differential biological function of private/shared sets in a cell-specific manner (Fig. [80]1e). Using model interpretation techniques, scTFBridge enables the extraction of RE and TF regulatory scores for TGs, enabling the establishment of single-cell-specific GRNs (Fig. [81]1f, g). Fig. 1. Overview of the scTFBridge framework. [82]Fig. 1 [83]Open in a new tab a Schematic representation of RNA transcription, highlighting the cooperative roles of transcription factors (TFs) and regulatory elements (REs) in driving RNA synthesis. b The model architecture of scTFBridge features a hybrid multi-modal variational autoencoder (VAE) designed for latent space dismantlement and representation of mutual information (MI) learning. c Cross-mutual-information module for aligning modality-shared latent spaces and reducing MI between modality-shared and modality-private embeddings. d Regulatory module in the ATAC decoder, where TF–motif binding constrains the connections between TF variables and regulatory elements. e Definition of private/shared peak/gene sets according to their information flow from the input layer to disentangled latent space. Further enrichment analysis reveals the differential biological functions between private/shared sets in a cell-specific manner. f Cis-regulation inference depicted for target gene (TG) expression prediction based on ATAC data, where the objective function minimizes the mean squared error (MSE) between original and reconstructed gene expression levels. g Trans-regulation inference depicted for TG expression prediction using latent variables, employing the same computational strategy as in cis-regulation inference. Panel a was created with BioRender.com. In detail, we developed a Cross Mutual Information (CMI) module and contrastive learning (CL) loss to create a structured latent space that effectively separates and aligns information from both modalities (Fig. [84]1c). The CMI module employs independent neural networks to impose an upper bound^[85]47 on the MI (Methods), minimizing MI between shared and private embeddings within each modality, as well as between private embeddings across modalities. This forces common information into shared components. The CL loss pulls shared embeddings from the same cell closer together in the latent space, while pushing them away from the unrelated cells. This process aligns the two modalities, creating a unified space where the shared embeddings for a given cell are interchangeable, regardless of modality. Additionally, we employed a Product of Experts (PoE) module to integrate and maintain the final shared latent embedding. These approaches enable our model to learn a robust, shared representation that captures the common biology between RNA-seq and ATAC-seq while explicitly filtering out modality-specific noise. To preserve TF-RE regulatory relationships, we developed a regulatory module in scTFBridge that incorporates known TF-motif binding affinities to single-cell multi-omics data. In the shared latent embedding, latent variables are assigned to specific TFs. Prior knowledge of TF-motif binding affinities constrains the neural network weights, linking these TF latent variables to reconstructed REs (Fig. [86]1d, Methods). Concurrently, the same TF latent variables feed into the TG decoder network to capture paired gene expression information. The PoE module and the weight regularization ensure that the shared embeddings from RNA-seq and ATAC-seq data are aligned and interpretable at the latent variable level, enabling direct analysis of associations between multi-omics features and corresponding TFs in the shared latent space. To enhance clarity and precision in describing gene regulatory mechanisms, we employed SHAP (Shapley Additive Explanations) to quantify the contribution of input REs or shared latent TF variables to the TG expression reconstruction^[87]48 (Fig. [88]1f, g). SHAP scores were computed based on the mean squared error (MSE) between imputed and observed gene expression levels under specific feature perturbations (Methods). The biologically constrained model framework allows these SHAP scores to serve as regulatory scores for each RE and TF associated with TGs. For GRN inference, we derived both cis-regulation (RE-TG) and trans-regulation (TF-TG) interactions across diverse cell types using these scores. scTFBridge facilitates cell-type-specific disentangled embedding learning We applied scTFBridge to a single-cell multi-omics dataset comprising paired RNA-seq and ATAC-seq measurements of bone marrow mononuclear cells (BMMCs) from healthy donors across multiple batches^[89]49. The learned latent embeddings of RNA-private and ATAC-private networks enabled visualization of domain-specific cell representations, alongside shared cell embeddings integrating both modalities. Uniform manifold approximation and projection (UMAP) analysis of the embeddings revealed distinct cellular information within each domain. In the RNA-private space, most cell types were poorly separated, with the clear exception of erythroblasts and normoblasts (Fig. [90]2a). Conversely, the ATAC-private embedding captured greater cellular heterogeneity (Supplementary Fig. [91]1). To investigate the limited resolution of RNA-private embeddings, we estimated the MI score between RNA-private, RNA-shared, and scRNA-seq embeddings generated by scVI^[92]50 (Supplementary Data [93]1). The results indicated that RNA-private embeddings contained minimal unique information compared to RNA-shared embeddings, with substantial overlap between RNA-seq and ATAC-seq data. This overlap likely stems from the use of single-nucleus RNA sequencing (snRNA-seq), which predominantly captures nuclear transcripts. Additionally, we performed Gene Set Enrichment Analysis (GSEA)^[94]51 to access functional roles of genes in RNA-private embeddings based on their contribution ranks to shared or private embeddings (Supplementary Fig. [95]2). The analysis revealed that RNA-private genes in erythroblasts and normoblasts were significantly enriched in the Heme Metabolism pathway, indicating that their gene expression during erythroblast differentiation is regulated independently of ATAC-seq signals. Fig. 2. Interpretable TF-informed embedding learning of scTFBridge in BMMCs. [96]Fig. 2 [97]Open in a new tab a UMAP visualization of modality-shared and RNA-private embeddings in BMMCs. b Comparison between ground-truth target gene (TG) RNA expression and scTFBridge-based reconstruction from ATAC-seq data. c Quantification and ratio of annotated promoter peaks across shared and private peak sets for different cell types. d Regulatory scores and target gene counts for cell-type-specific TF variables identified through biological interpretation analysis. e Visualization of cell-type-specific gene regulons derived from high-ranked target genes (TGs) associated with TF variables. The sample size of the box plot is the number of cells for each cell type, ranging from 171 to 2168 cells. Note that all boxplots in this study present minima and maxima, the smallest and largest value that is not considered an outlier; center, median; bounds of box, 25th (Q1) to 75th (Q3) percentile; whiskers, 1.5 times the (Q3–Q1). Source data are provided as a Source Data file. To evaluate the scTFBridge’s capacity to learn cross-modal interactions, we performed a TG expression reconstruction experiment. In detail, we used the RNA-shared embedding inferred from ATAC-seq data, concatenated with a randomly sampled RNA-private embedding for TG reconstruction. The model’s performance was then quantified by the concordance between the clustering result of generated TG expression data and the ground-truth cell annotations (Methods). Compared to scButterfly^[98]36, scTFBridge achieved superior TG expression reconstruction accuracy and preserved cell-specific information more effectively (Fig. [99]2b, Supplementary Fig. [100]3). These results demonstrate that paired ATAC-seq data successfully capture key gene expression signals. Additionally, ablation experiments revealed that both MI and CL modules significantly contributed to RNA-seq data reconstruction and information retention (Supplementary Data [101]2). We further assessed the contributions of different ATAC-seq data Peak features to the ATAC-shared and ATAC-private embeddings based on the information flow through different encoders. Peaks were categorized into shared or private groups by their relative contributions to shared or private embeddings (Methods). Comparing the number and ratio of promoter Peaks in each group, our result revealed that the modality-shared embedding primarily enriched gene regulatory signals (Fig. [102]2c). We then explored the cell-type-specific gene regulatory elements through biological interpretation analysis using scTFBridge (Methods). Our results identified cell-type specific TFs, such as ESR in cDC cells, TCF7 in CD4 naïve T cells, and SOX4 in B cells (Fig. [103]2d). Additionally, high-ranked TGs of TF variables revealed cell-type-specific gene regulons (Fig. [104]2e). These findings demonstrate scTFBridge’s capacity for biologically disentangled learning and its effectiveness in elucidating potential cellular gene regulatory mechanisms. scTFBridge reveals heritability enrichment of peaks in rheumatoid arthritis We applied scTFBridge to a single-cell multi-omics dataset from profiled immune cells from rheumatoid arthritis tissues^[105]52. To visualize the latent representations, we used UMAP to project the modality-shared, RNA-private, and ATAC-private representations into a two-dimensional space (Fig. [106]3a). In the RNA-private embedding, immune cells formed close clusters, suggesting that most of the gene regulatory information for these cells in the RNA-seq data is captured by modality-shared embeddings, as compared to the ATAC-seq data. The MI scores between RNA-shared and ATAC-shared representations was higher than the MI between RNA-shared and RNA-private or ATAC-shared and RNA-private, demonstrating that scTFBridge effectively disentangles modality-private and modality-shared components (Fig. [107]3b). Fig. 3. Identifying cell-specific modality-shared and private regulatory elements and functions in rheumatoid arthritis. [108]Fig. 3 [109]Open in a new tab a UMAP visualization of modality-shared, RNA-private, and ATAC-private embedding in the dataset of rheumatoid arthritis. b Mutual information (MI) comparison between different representations. R and A denote RNA and ATAC modalities, respectively; s and p denote shared and private embeddings, respectively. c Gene Ontology (GO) pathway analysis comparing shared and private gene sets across different cell types. The enrichment statistical significance was assessed using the hypergeometric test, with -log[10](p-value) indicated. d Quantification and ratio of annotated promoter peaks across shared and private peak sets in different cell types. e Analysis of shared/private and dynamic/condition-specific (cs)-invariant peaks across cell types. Statistical significance was assessed using the one-sided Chi-square test, with p-values indicated. f Heritability enrichment analysis of shared and private peak sets of single-cells from RA samples. The x-axis shows the -log[10] enrichment p-value, and the dot size reflects the h^[110]2 effect. Heritability enrichment p-values were calculated using Stratified LD Score S-LDSC. A Z-test was performed on the regression coefficient (τ) for each annotation to determine if its contribution to heritability was significantly different from zero. Source data are provided as a Source Data file. To uncover deeper biological insights from distinct RNA-seq embeddings, we quantified the normalized absolute importance scores for each gene across different embeddings and identified the top 100 genes as their predominant gene programs (Methods). Gene Ontology (GO) pathway enrichment analysis revealed that the shared gene program primarily contributed to immune activation and receptor signaling, while the private gene program was associated with cell cycle regulation (Fig. [111]3c, Supplementary Data [112]3). Similarly, we assessed the enrichment of promoter peaks into modality-shared and ATAC-private embeddings. The results confirmed that the modality-shared embedding of scTFBridge captured a higher number of promoter peaks (Fig. [113]3d), highlighting its superior ability to capture gene regulatory signals in single-cell Multiome datasets. To further evaluate the scTFBridge model’s ability to characterize crucial regulatory elements, we compared the shared- and ATAC-private predominated REs identified by scTFBridge with the dynamic and cell-state-invariant (cs-invariant) peaks identified in Gupta et al. ^[114]52. Dynamic peaks were defined as those whose variation could be explained by paired RNA expression data, representing stronger gene regulatory effects^[115]52. In this comparison, we examined the overlap of peaks across two distinct groups (Fig. [116]3e). The results demonstrated that shared peaks were significantly enriched in the dynamic peak group (Chi-squared test, Methods). Additionally, we employed the stratified LD-score regression (S-LDSC)^[117]53 method to assess the heritability enrichment of shared and ATAC-private REs concerning GWAS-annotated summary statistics for rheumatoid arthritis (Methods). The enrichment scores revealed that shared peaks were more effective in capturing causal variants associated with the disease (Fig. [118]3f, Supplementary Data [119]4). These findings suggest that the scTFBridge model provides a more refined characterization of gene regulatory signals. Benchmarking analysis of cell-type-specific cis-regulatory inference Accurate GRN inference remains challenging across different cell types and functional states due to the limited availability of gold-standard datasets and potential experimental biases. To assess the cell-type-specific cis-regulatory inference capabilities of scTFBridge, we integrated promoter-capture Hi-C (pcHi-C) data from three primary blood cell types^[120]54 and expression quantitative trait loci (eQTL) data from eight immune cell types^[121]55,[122]56. These external validation datasets include SNPs significantly associated with gene expression, serving as known cis-regulatory RE-TG pairs for performance benchmarking (Supplementary Data [123]5, Methods). We compared scTFBridge’s performance in cis-regulatory inference against established methods, including PCC, Distance, Random, and SCENIC+^[124]37 (Methods). Since the distance from transcription start sites (TSS) of TG is a critical factor in cis-regulatory inference, we categorized predicted RE-TG pairs into six distance groups, ranging from 0–5 kb to 100–200 kb, for evaluation. Using single-cell eQTL data as the ground truth, scTFBridge achieved the highest area under the curve (AUC) and area under the precision-recall curve (AUPR) ratios across all distance groups in naive CD4 cells (Fig. [125]4a, b, Methods), whereas other methods performed comparably to random. Since SCENIC+ only provides final RE-TG pairs without relevance scores, we used the F1 score to assess its performance (Methods). scTFBridge outperformed SCENIC+ in F1 scores across all distance groups (Fig. [126]4c). In other cell types, due to the limited availability of eQTL pairs, all RE-TG pairs were treated as positive labels without further distance-based stratification. scTFBridge consistently outperformed baseline methods in both AUC and AUPR ratios across all evaluated cell types, achieving the highest overall ranking (Fig. [127]4d, e) and significantly higher F1 scores compared to SCENIC+ (Fig. [128]4f). Additionally, when using pcHi-C data as ground truth, scTFBridge also achieved the highest AUC across all distance groups in naive CD4 T cells (Fig. [129]4g). Fig. 4. Benchmarking analysis of cell-type-specific cis-regulatory inference in PBMCs. [130]Fig. 4 [131]Open in a new tab a, b Benchmark performance (AUC and AUPR ratio) of different methods for cis-regulatory inference in naïve CD4+ T cells, evaluated using eQTL data as ground truth. c F1 score comparison between scTFBridge and SCENIC+ for cis-regulatory inference in naïve CD4+ T cells. Statistical significance was assessed using a two-sided paired Student’s t-test, with *** indicating a p-value < 0.001. The sample size is the number of TSS distance groups with 6. The p-value is 0.00051. d, e Benchmark performance (AUC and AUPR ratio) of different methods of cis-regulatory inference across other cell types. f F1 score comparison between scTFBridge and SCENIC+ in the cis-regulatory inference across other cell types. Statistical significance was evaluated with a two-sided paired Student’s t-test, where ** indicates a p-value < 0.01. The sample size if the number of other cell types with 7. The p-value is 0.0016. g Benchmark performance (AUC) of different methods in the cis-regulatory inference of naïve CD4 + T cells, using the pcHi-C data as ground truth. h scTFBridge-identified cis-regulatory interactions of AIF1 in CD14 Mono, CD4+ T central memory (TCM) cells, and conventional dendritic cells (cDC), supported by eQTL evidence. The sample size of box plot is the number of cells for each cell type: cDC (n = 36), CD4 TCM (n = 223) and CD14 Mono (n = 511). Source data are provided as a Source Data file. To further demonstrate its utility, we examined cis-regulatory interactions identified by scTFBridge that align with known gene regulatory mechanisms across various cell types. For instance, AIF1 gene expression varies among cell types, being highly expressed in CD14 monocytes and cDCs, but with lower expression in T and B cells. To illustrate scTFBridge’s ability to capture cell-type-specific cis-regulatory interactions for AIF1, we analyzed a 200 kb region upstream and downstream region of its TSS, annotating cell-type-specific peak accessibility, eQTL SNP loci, and inferred RE-TG pairs predicted by scTFBridge (Fig. [132]4h). Using CD4 TCM cells as a reference, scTFBridge successfully identified the majority of known cis-regulatory elements in CD14 Mono and cDCs, supported by single-cell eQTL data. Additionally, scTFBridge uncovered potential regulatory elements that may be specific to these cell types (Fig. [133]4h). These results underscore the heterogeneity of cis-regulatory mechanisms across cell types and highlight scTFBridge’s potential to enhance and refine cell-type-specific GRN inference. Benchmarking analysis of cell-type-specific trans-regulatory inference To evaluate the single-cell trans-regulatory inference capability of scTFbridge, we leveraged pre-identified TF-TG regulatory pairs from Chromatin Immunoprecipitation sequencing (ChIP-seq) data in the Cistrome database as the ground truth^[134]57. Specifically, we compiled regulatory data for 17 TFs and their corresponding putative target genes across four blood cell types (Supplementary Data [135]6). We benchmarked scTFBridge’s performance against baseline methods designed for cell-type-specific GRN inference employing single-cell multi-omics data, including LINGER^[136]33, scMTNI^[137]35, scMultiomeGRN^[138]34, and SCENIC+^[139]37. As a case study, we analyzed the trans-regulatory scores of STAT1 across all single cells (Fig. [140]5a) and examined its paired trans-regulatory TGs in CD14 monocytes. scTFBridge outperformed all baseline methods except LINGER, achieving an AUC score of 0.693, and an AUPR ratio of 2.221 (Fig. [141]5b, c). Across the benchmarking dataset, scTFBridge consistently surpassed the other baseline methods, yielding higher AUC scores and AUPR ratios for all tested TFs (Fig. [142]5d, e). Fig. 5. Benchmarking analysis of cell-type-specific trans-regulatory inference in PBMCs. [143]Fig. 5 [144]Open in a new tab a UMAP visualization of modality-shared embedding of PBMCs. b, c Receiver operating characteristic curve and precision-recall curve of trans-regulatory potential inference of STAT1 in CD14 Mono cells. d, e Box plots comparing AUC and AUPR ratio scores across various methods for identifying putative TF target genes using ChIP-seq data. Statistical significance was evaluated with two-sided paired Student’s t-test, where *** indicates a p-value < 0.001. The sample size is the number of putative TFs with 17. For d, p-values are 7.2e^-3, 4.4e^-5, 1.2e^-5, and 4.8e^−6 for LINGER, SCENIC+, scMTNI, and scMultiomeGRN. For e, p-values are 9.1e^-3, 1.1e^-4, 9e^-6 and 4.4e^−6, respectively. f Top TF-regulated gene expression validated by ChIP-seq data across different cell types, including ETS1 in CD4+ T cells, STAT1 in CD14+ monocytes, and PAX5 in naïve B cells. g Violin plots showing cell-level trans-regulatory scores of specific TF-TG pairs across different cell types. The sample size is the number of cells for each cell type, ranging from 25 to 511 cells. h Network representations of inferred TF-TG pairs in CD14 Mono and naïve B cells. Node color corresponds to gene expression levels, while edge color represents regulatory scores. Source data are provided as a Source Data file. Unlike LINGER, which achieves high performance through pre-training on extensive external bulk-seq datasets and using pre-trained gene regulatory scores for GRN inference. scTFBridge operates as a self-contained, self-supervised framework requiring no external training data. This streamlined and versatile approach makes scTFBridge highly adaptable to datasets without high-quality bulk data. We conducted a comprehensive performance comparison of specific TFs and their top-ranked target genes (Supplementary Data [145]7 and Methods). For gene sets within the top 150 ranks, LINGER and scTFBridge showed comparable performance. However, for certain TFs (IRF1, SPI1, BATF, and RUNX3), scTFBridge exhibited superior performance across most ranges (Supplementary Fig. [146]7), demonstrating its robustness and broad applicability across diverse TFs. Next, we explored trans-regulatory associations in other single-cell groups. Based on annotated TF-TG associations, we selected ETS1, STAT1, and PAX5 as the dominant regulatory TF in CD4 T cells, monocytes, and B cells, respectively, along with their most associated TGs. Comparing TG expression levels revealed cell-type-specific patterns of gene expression for these TF-TG pairs (Fig. [147]5f). We further evaluated the regulatory scores predicted by scTFBridge and found that the model effectively captured cell-type-specific trans-regulatory associations (Fig. [148]5g). To visualize these relationships, we reconstructed the inferred trans-regulatory networks for cell-type-dominant TFs and their associated TGs. For instance, we identified the STAT1-ZEB2 regulatory link in CD14 Monocytes and BATF-BANK1 regulatory link in B cells (Fig. [149]5h). These results underscore scTFBridge’s ability to resolve cell-type-specific trans-regulatory activities at single-cell resolution. scTFBridge uncovers cell-type-specific gene regulations in lung cancer susceptibility To explore disease-specific single-cell gene regulation, we applied scTFBridge to a single-cell multi-omics dataset derived from lung cancer^[150]58. This dataset comprises paired RNA-seq and ATAC-seq data collected from tumor-distant normal lung tissues of ever- and never-smokers (n = 8 per group). The modality-shared embedding of these single cells revealed a distinct separation based on annotated cell types from Long et al. (Fig. [151]6a). In contrast, the RNA-private and ATAC-private embeddings exhibited less cellular information, with the RNA-private embedding showing particularly reduced information content (Supplementary Fig. [152]8). Fig. 6. Cell-type-specific gene regulations in lung cancer susceptibility uncovered by scTFBridge. [153]Fig. 6 [154]Open in a new tab a UMAP visualization of the modality-shared embedding of lung single-cell dataset. b Regulation scores of tier-6 linkage variants and candidate cis-regulatory elements associated with target genes across different cell types. Dot color indicates cell-type categories, and dot size represents the regulation scores. c UMAP visualization of NRG1 gene expression of lung single-cell dataset as in (a). d UMAP visualization of the regulation scores for the variant rs4236709-linked peak across different cell types as in (a). e Visualization of peak accessibility and regulation scores within the NRG1 locus across various cell types. Dot color corresponds to the regulatory score of a specific peak to NRG1. Source data are provided as a Source Data file. Long et al. previously defined high-confidence RE-TG pairs (tier 6) that included lung cancer candidate causal variants (CCVs) colocalized REs, and their TGs. This group proposed potential gene regulatory patterns contributing to lung cancer susceptibility. To validate scTFBridge’s ability to uncover cell-type-specific cancer susceptibility cis-regulations, we compared the inferred regulatory scores of identified RE-TG pairs across predefined calls in cell subtypes and other cell types (Fig. [155]6b). The results demonstrated that scTFBridge effectively captures higher cell-specific cis-regulations for these RE-TG pairs. Specifically, we visualized the expression of the target gene NRG1 across all single cells and evaluated the regulatory score inferred by scTFBridge for a CCV-specific peak (chr8: 32552186–32552663) (Fig. [156]6c). The findings confirmed that scTFBridge accurately reflects cell-type-specific cis-regulation. Furthermore, a comparison of the inferred regulatory scores across peaks within the NRG1 gene region also indicated that the CCV-colocalized peak exhibited the highest regulatory potential (Fig. [157]6d, e). To further investigate context-dependent gene regulation, we extended our analysis to smoking-responsive CREs, the results confirmed reduced regulatory scores for selected RE-TG pairs in cells from the ever-smoker group (Supplementary Fig. [158]9). These results underscore the capability of the scTFBridge model as a powerful tool for identifying gene regulatory events underlying human disease susceptibility. Discussion Inferring dynamic gene regulatory processes at single-cell resolution advances our understanding of complex cellular regulation during development and disease, while also facilitating the discovery of genotype-phenotype associations^[159]14,[160]59. Although multiple methods have been developed to infer GRNs using scRNA-seq or scATAC-seq datasets^[161]37,[162]60, these approaches are often limited by the unpaired nature of the datasets and the high sparsity of single-cell data, making it challenging to accurately characterize complex cell-type-specific GRNs. The advent of single-cell multi-omics techniques allows for the simultaneous profiling of individual cells at both the transcriptomic (RNA-seq) and epigenomic (ATAC-seq) levels. This comprehensive characterization necessitates the development of computational methods tailored for single-cell multi-omics profiling. These datasets facilitate learning joint embeddings of single-cell features^[163]2, establishing associations between REs and TG expression^[164]61, and inferring GRNs^[165]33,[166]37. Additionally, advances in biological deep learning for single-cell omics further provide more accurate representations of single-cell data, enhancing interpretability and offering deeper insights into cellular regulatory mechanisms^[167]17,[168]24,[169]62. In this study, we introduced scTFBridge, an interpretable deep generative model designed to infer GRNs from single-cell multi-omics data. Leveraging mutual information theory, scTFBridge effectively separates the latent variable space into modality-shared and modality-private subspaces, addressing key challenges posed by modality gaps and information redundancy in multi-omics integration^[170]39,[171]63. Furthermore, scTFBridge incorporates prior knowledge of TF-motif binding to regularize the biological learning of modality-shared embedding. This approach enables the model to capture TF-RE interactions within the shared embeddings, enhancing interpretability. Notably, each latent variable in the modality-shared embedding corresponds to a specific TF, forming a bridge between RNA-seq and ATAC-seq data to support accurate GRN inference. This architecture not only improves GRN inference accuracy but also ensures that the representations remain biologically interpretable. We applied scTFBridge to single-cell multi-omics datasets derived from BMMCs^[172]49, rheumatoid arthritis cells^[173]52, PBMCs, and lung cells^[174]58 to evaluate its performance across diverse gene regulatory tasks. Leveraging a deep-learning model interpretation framework, scTFBridge effectively identified cell-type-specific TFs and uncovered the contributions of various TFs and REs to TG regulation in different cell types. Using promoter-capture Hi-C (pcHi-C), eQTL, and ChIP-seq data from blood as ground truth, scTFBridge demonstrated superior performance compared to most baseline methods in inferring cis- and trans-regulatory interactions in the PBMC dataset. Furthermore, analysis of a lung cell multi-omics dataset demonstrated scTFBridge’s ability to identify candidate susceptibility genes associated with causal variants within cell-specific REs. These results highlight scTFBridge’s efficacy in disentangling complex regulatory features, thereby advancing single-cell GRN inference. Despite its contributions, this study has several limitations. Unlike LINGER^[175]33, Enformer^[176]64, and NvwaCE^[177]65 models, which leverage large-scale pretrained data, scTFBridge was trained from scratch, potentially limiting its ability to uncover established gene regulatory mechanisms. The benchmarking analysis primarily used blood-derived datasets due to the limited availability of standardized ground truth data for other tissues and cell types, reducing the method’s generalizability. Furthermore, our analysis concentrated on a simplified model of TF and RE binding to disentangle each latent variable in modality-shared space, potentially limiting its applicability to more complex TF regulatory contexts. Although scTFBridge effectively extracts latent variables for RNA-seq and ATAC-seq data, it does not account for regulatory mechanisms involving microRNAs, non-coding RNAs, or private peaks, necessitating further exploration and complementary methodologies^[178]2. While the CMI module’s capacity for information disentanglement was evaluated, achieving complete and accurate information separation in multi-omics data remains a significant challenge. To address this, researchers can adapt the modality-reconstruction task^[179]32 to evaluate the information constraints within specific latent embeddings and conduct latent space independence tests to evaluate the information redundancy^[180]66. A promising approach involves using a formal information-theoretic decomposition framework to quantify multimodal interactions, independent of prior biological understanding^[181]67. This method employs mathematical estimators to measure redundancy (shared information), uniqueness (private information), and synergy (emergent information), offering a rigorous, quantitative way to evaluate the separation of shared and private components in multimodal data. In summary, scTFBridge provides a computational framework for inferring both cis- and trans-regulatory GRNs from single-cell multi-omics data. The core principle of biologically informed multi-modal disentanglement can be adapted to other single-cell multi-omics applications. This work emphasizes the importance of developing interpretable, biologically grounded multi-modal integration methods that advance our understanding of molecular regulatory processes at the single-cell level. Methods Framework of scTFBridge scTFBridge is a deep learning-based computational framework designed to integrate single-cell multi-omics data and infer cell-type-specific GRNs. The framework utilizes a hybrid multimodal VAE architecture to decompose the data into shared and private latent spaces. The shared latent space captures regulatory interactions among TFs, REs, and TGs, and the private latent space encodes modality-specific information, such as non-TF regulatory factors or other cellular-specific features unique to each data modality. Disentangled Multi-Modal VAE For a given cell [MATH: n :MATH] , the observed multi-omics input data is denoted as [MATH: x=xr,xa :MATH] , where [MATH: xr :MATH] represents RNA-seq data and [MATH: xa :MATH] represent ATAC-seq data. we define a low-dimension latent variable [MATH: z=zr,za :MATH] to represent the observed latent omics. The latent space embedding [MATH: z :MATH] is divided into modality-shared and private representations, factorized as [MATH: zr={zsr,zpr} :MATH] , [MATH: za={zsa,zpa} :MATH] . For a single modality (e.g., RNA-seq), the goal of the deep generative model is to maximize the likelihood of the observed data conditioned on the latent variables. This likelihood is expressed as Eq. ([182]1): [MATH: px;θ=pxz;θpzdz :MATH] 1 Here, [MATH: pz :MATH] is the prior distribution, typically set as a standard Gaussian distribution, and [MATH: pxz;θ :MATH] represents the process of generating the observed data from latent variable, and [MATH: θ :MATH] denotes the learnable parameters in the decoder. Direct computation of [MATH: px;θ :MATH] is challenging, so we introduced a variational posterior [MATH: qzx;ϕ :MATH] to approximate the true poster distribution [MATH: pzx :MATH] , where [MATH: ϕ :MATH] are the parameters of data encoder. Using this approximation, Eq. ([183]1) can be rewritten as: [MATH: logpx;θ=logpxz;θpzdz=logqzx;ϕpxz;θpzqzx;ϕdz :MATH] 2 Using Jensen’s inequality, this leads to the evidence lower bound (ELBO): [MATH: logpx;θ=Eqzx;ϕlogpz,xqzx;ϕ+KL(q( zx;ϕ)p(zx)) :MATH] 3 Since [MATH: KL(q( zx;ϕ)p(zx))0 :MATH] , the ELBO is given by: [MATH: ELBO=Eqzx;ϕ[logp( xz;θ)]KL(q( zx;ϕ)p(z)) :MATH] 4 Here, [MATH: KL(q< mo>∣p) :MATH] represents the Kullback–Leibler divergence between [MATH: qzx;ϕ :MATH] and prior [MATH: pz :MATH] . To maximize the log-likelihood, we can maximize the ELBO. To disentangle the latent variables and address batch effects, the learning objective for each omics layer represents as follows: [MATH: ELBO=Eqzs,zpx;ϕ[logp(xzs,zp,c;θ)](KL(q( zsx;ϕ< mi>s) p(zs))+KL(q( zpx;ϕ< mi>p) p(zp))) :MATH] 5 Here, [MATH: c :MATH] is the batch label used as a conditional variable in the generative process. For the modality-shared representation, we use a product-of-experts (PoE) model to compute the joint variational posterior distribution for each omics modality: [MATH: qzsx=pzsi=1Nqzsixi :MATH] 6 The prior [MATH: pzs :MATH] is modeled as a Gaussian distribution [MATH: Nzsμ0,δ0 :MATH] , and each modality-specific posterior [MATH: q(zxixi) :MATH] is also Gaussian, with mean [MATH: μsi :MATH] and variance [MATH: δsi :MATH] . The joint distribution is defined by: [MATH: δs=δ01+ir,aδi11 :MATH] 7 [MATH: μs=μ0δ01+ir,aμiδi1δs :MATH] 8 The overall learning objective for multi-modal integration during training is defined as: [MATH: iϵr,a([Eq(zsixi;ϕs i),q(zpixi;ϕp i)[logp( xizpi,zsi,c;θ< mi>i)] KL(q( zsixi;ϕs i)p(zsi) )KL(q( zpixi;ϕp i)(p(zpi) ))]) :MATH] 9 Cross mutual information Block To achieve a well-disentangled multi-modal latent space, we incorporate mutual information (MI) theory to decouple shared and private representations. The loss function for this block is defined as in Eq. ([184]10): [MATH: LCID=i∈< /mo>r,aIzpi,zsi+Izpr,zpa :MATH] 10 Here, [MATH: I(zpi,zsi) :MATH] measures the MI between the private and shared latent embeddings within a single modality [MATH: i :MATH] , [MATH: I(zpr,zpa) :MATH] measures the MI between private embeddings across modalities. The goal is to minimize the MI between modality-shared and private embeddings within a modality and between private embeddings across modalities. The MI between two variables [MATH: x :MATH] and [MATH: y :MATH] is defined as in Eq. ([185]11): [MATH: Ix;y=Epx,ylogpx,ypxpy :MATH] 11 Since mutual information is generally intractable, we use an upper bound known as the Contrastive Log-ratio Upper Bound (CLUB). This approximation for [MATH: ICLUB< /mi>x,y :MATH] is described as in Eq. ([186]12): [MATH: ICLUB< /mi>x,y=Epx,ylogpyxEpxEpylogpyx :MATH] 12 To validate that [MATH: ICLUB< /mi>x,y :MATH] is an upper bound of [MATH: Ix;y :MATH] , we compute the difference [MATH: Δ=ICLUBI< /mi>x;y :MATH] as in Eq. ([187]13): [MATH: Δ=Ep(y)[logp(y) Ep(x)[logp(yx)]] :MATH] 13 Using the marginal distribution [MATH: logp(y)=log p(yx)p(< /mo>x)dx=log(Epx[p(yx)]) :MATH] and Jensen’s inequality, we derive: [MATH: logpy=logEpxpyxEpxlogpyx :MATH] 14 This shows that [MATH: Δ0 :MATH] , confirming that [MATH: ICLUB< /mi>x,y :MATH] is indeed an upper bound of [MATH: Ix;y :MATH] which can be computed as: [MATH: ICLUB< /mi>=1Ni=1Nlogpyixi1N 2i=1Nj=1Nlogpyixi :MATH] 15 This simplifies to: [MATH: ICLUB< /mi>=1< mrow>N2i=1Nj=1Nlogpyixilogpyjxi :MATH] 16 We employ a nonlinear neural network to estimate [MATH: pyx :MATH] , with the log-likelihood computed as: [MATH: i=1Nlogpyixi;ϕ :MATH] 17 During each training epoch, the parameters of the distribution estimator are first updated to maximize Eq. ([188]17), improving the accuracy of conditional distribution estimation. Subsequently, other parameters of scTFBridge are updated to minimize [MATH: ICLUB< /mi> :MATH] , thereby reducing the mutual information between the two variables. TF-Binding informed ATAC-seq Decoder To ensure that the joint shared embedding represents transcription factor (TF) regulatory activities, we incorporated prior knowledge of TF-motif binding scores into the architecture of the ATAC-seq decoder. We utilized 713 TF position weight matrices (PWMs) for known motifs, sourced from the PECA2 GitHub repository ([189]https://github.com/SUwonglab/PECA). This repository aggregates motif data from widely used databases, including JASPAR^[190]68, TRANSFAC^[191]69, and UniPROBE^[192]70. From the single-cell RNA-seq dataset, the 128 most highly expressed TFs were selected as dimensions of the modality-shared embedding. Each TF corresponds to a single latent variable in the shared embedding. Regulatory elements (REs) were scanned for TF binding motifs using HOMER. This produced a binding score matrix [MATH: B :MATH] , which quantifies the interaction strength between each TF and the REs. The ATAC-seq reconstruction decoder is implemented as a single-layer neural network. The decoder receives the combined latent variables as input and reconstructs ATAC-seq signals (peaks) constrained by biological priors. The binding score matrix [MATH: B :MATH] serves as a mask to constrain the weight matrix [MATH: W :MATH] of the decoder. [MATH: B :MATH] ensures that connections from TF latent embeddings to ATAC-seq peaks reflect known TF-RE interactions. For each connection in the decoder, the weight matrix [MATH: W :MATH] is modulated by the binding score matrix [MATH: B :MATH] , as follows: [MATH: Wi,j=Wi,j*B i,j,i0,N,j0,K :MATH] 18 Where [MATH: N :MATH] is the number of TF dimensions and [MATH: K :MATH] is the number of REs in ATAC-seq. Cross modal inference In addition to modality-private self-reconstruction, the scTFBridge training process incorporates cross-modal generation to learn cis-regulatory relationships between ATAC-seq and RNA-seq. This feature enables inference across modalities, allowing the model to predict missing data in one modality using information from the other. When RNA-seq data is missing, the Product-of-Experts (PoE) approach is employed to use the shared latent distribution of ATAC-seq [MATH: q(zsaxa,c;ϕ) :MATH] as the multi-modal joint distribution for cross-omics generation. The missing modality-private latent variable [MATH: zpr :MATH] is reparametrized from a prior normal Gaussian distribution [MATH: N(zprμ0,δ0) :MATH] . Using ( [MATH: zpr,zs :MATH] ), the model recovers the RNA-seq data by optimizing the following objective: [MATH: Ezpr~Nμ0,δ0,qzsxa;ϕ[log p(xrzpr,zs,c;θ)]KL(q( zsxa;ϕ)p (zs)) :MATH] 19 Similarly, we can derive the objective function for generating ATAC-seq data from RNA-seq: [MATH: Ezpa~Nμ0,δ0,qzsxr;ϕ[log p(xazpa,zs,c;θ)]KL(q( zsxr;ϕ)p (zs)) :MATH] 20 Learning Object In general, the final optimizing objective of scTFBridge can be expressed as follows: [MATH: L=LVA< /mi>E+β*< mi>LMI+γ *Lcr< mi>ossmodal :MATH] 21 The first term optimizes the disentangled VAE for learning omics-specific representations and accurate data reconstruction, it measures how well the model reconstructs each modality from the latent representations and regularizes the latent space by minimizing the divergence between the latent distributions and the prior Gaussian distribution. The second term evaluates and promotes the disentanglement of the shared and private latent representations by minimizing the MI between them. The third term ensures that the model can perform effective cross-modal inference, using the shared and reparametrized latent variables. These terms collectively ensure that scTFBridge achieves biologically meaningful integration, disentanglement, and cross-omics prediction. Single cell multi-omics data preprocessing In this study, we implemented a single-cell multi-omics data preprocessing workflow using the Scanpy package^[193]71. For the scRNA-seq data, TFs were isolated, normalized, and log-transformed. We then selected the top 128 highly variable TFs for focused embedding in the shared TF space. Additionally, the top 3000 highly expressed genes were identified as (TGs). These TGs were normalized and log-transformed to serve as input for scTFBridge. For scATAC-seq data, chromatin accessibility peaks were binarized and filtered to retain peaks with open accessibility in at least 1% of all cells. To ensure accurate integration, only cells represent in both datasets were included in the analysis. SHAP value for cell-level and cell-type-specific GRN inference SHAP values provide a robust framework for quantifying the contributions of features in deep learning models and are widely used for biologically interpretable biomarker discovery. Building on prior studies^[194]48, we employed the reconstruction loss between original and imputed TG expression values as the objective for computing SHAP values. for input features, focusing on their contribution to the reconstruction rather than direct expression effects. By focusing on the reconstruction error, scTFBridge highlights features that are challenging for the model to learn, revealing complex or subtle regulatory relationships in cell-level and cell-type-specific GRNs. For cis-regulatory contributions, we calculated regulatory elements (REs) associated with TGs via cross-omics generation. The cis-regulatory score for a RE-TG pair was defined as: [MATH: REscorei,j=ρi,jOjEied i,j< msub>d0 :MATH] 22 Where [MATH: ρi,j :MATH] is the SHAP value of RE [MATH: j :MATH] and TG [MATH: i :MATH] ; [MATH: Oj :MATH] is the average chromatin accessibility of this cell type; [MATH: Ei :MATH] is the average expression of TG [MATH: i :MATH] in the cell type; [MATH: di,j :MATH] is the distance between genomic locations of [MATH: j :MATH] and [MATH: i :MATH] ; and [MATH: d0 :MATH] is the normalization factor for genomic distance, set to 30,000 bp in this study. For trans-regulatory contributions, we quantified the influence of each TF variable in the shared embedding space on TGs. The trans-regulatory score was directly defined as the SHAP value: [MATH: TFscorek,j=αk,i :MATH] 23 Where [MATH: αk,i :MATH] represent the SHAP value quantifying the contribution of TF [MATH: k :MATH] to TG [MATH: i :MATH] . To infer regulatory networks at the individual cell level, we extended the computation of SHAP values to account for unique cell-level characteristics. Instead of averaging values across a specific cell type, we calculated SHAP values using the average gene expression and chromatin accessibility across all cell types. This method enabled the reconstruction of cell-specific GRNs, providing a fine-grained view of regulatory interactions that account for each cell’s unique features. Latent embedding contributions estimation To evaluate the contributions of modality-private and shared components in the generative process of RNA-seq data from latent variables, we employed SHAP values. For each latent dimension, the average of the absolute SHAP values was computed across all TGs to derive global contribution scores. The overall contributions of the private and shared embedding spaces were then estimated by summing the contribution scores of all latent variables within their respective spaces. To ensure comparability, the two contribution scores (private and shared) were normalized to the range [0,1] by dividing each score by the combined sum of both scores. These normalized proportions were then ranked to determine the relative influence of private and shared components on gene regulation. Using these normalized contribution proportions, we distinguished genes primarily regulated by private information from those predominantly influenced by shared features. To investigate the biological significance of these findings, we performed Gene Ontology biological process (GO:BP) enrichment analysis using the gprofiler2^[195]72 website with selected top 100 ranked genes from each group. Additionally, we conducted Gene Set Enrichment Analysis (GSEA) using the gesapy package^[196]73 with the ranked gene lists. Benchmarking analysis of GRN inference To evaluate the performance of scTFBridge in predicting cis- and trans-regulation, we conducted a benchmarking analysis against established methods using consistent dataset splits to ensure a fair comparison. For cis-regulation prediction, we compared scTFBridge with PCC, TSS distance, random prediction, and SCENIC + ^[197]37. For trans-regulation prediction, we evaluated scTFBridge against baseline methods for cell-type-specific GRN inference from single-cell multi-omics data: LINGER^[198]33, scMTNI^[199]35, scMultiomeGRN^[200]34, and SCENIC+^[201]37. To address variations in gene filtering criteria across methods, we used the intersection of genes retained by all methods for final metric calculations, ensuring a fair and consistent evaluation. For GRN inference, performance was assessed by comparing the top-k predicted target genes of specific TF from each method against the intersection of target genes in the Cistrome database^[202]57. The overlapping genes and their associated scores were used for performance evaluation. Heritability enrichment analysis We utilized stratified linkage disequilibrium score regression (S-LDSC)^[203]53 to assess the heritability enrichment of various peak sets. First, we converted the genomic coordinates of peaks from hg38 to hg19 to align with the GWAS summary statistics of RA. SNPs with a minor allele frequency (MAF) ≥ 0.05 associated with the peak sets of interest were selected. Each 200 bp peak window was extended by 500 bp on either side, resulting in a 1.2 kb genomic interval per peak. Using S-LDSC, we then calculated population-specific linkage disequilibrium (LD) scores and quantified the proportion of RA heritability attributable to modality-shared and modality-private peak sets from different single-cell populations in the RA dataset, as analyzed using scTFBridge. Evaluation metric AUPR Ratio The AUPR Ratio measures the improvement of a predictor over random chance in terms of precision-recall performance. It is defined as the ratio of the Area Under the Precision-Recall (AUPR) curve of a method to that of a random predictor. For a random predictor, the AUPR is equal to the fraction of positive samples in the dataset [MATH: P :MATH] , which is defined as: [MATH: AUPRRatio=AUPRmetho< mi>dP :MATH] 24 AUC The AUC score evaluates the performance of a predictor in distinguishing between positive and negative samples. It is derived from the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1-specificity) across thresholds. F1 Score The F1 Score assesses the balance between precision and recall, particularly useful when dealing with imbalanced datasets. It is computed as the harmonic mean of precision and recall, which is defined as: [MATH: F1Score=2Pre cision⋅< /mo>Recall Precisi on+Reca< /mi>ll :MATH] 25 Implementation details of scTFBridge We implemented scTFBridge using the PyTorch deep learning library in Python. The dimensions of modality-shared and private embedding were empirically set to 128 in scTFBridge. The model was trained with the Adam optimizer, employing an initial learning rate of 1e-3 and weight decay of 1e-5, reduced by epoch. Model training was conducted with a batch size of 128 over 150 epochs with early stopping triggered after 10 steps of no improvement. Hyperparameters used to balance the weights of each training objective, with example value was provided in Supplementary Data [204]8. To ensure robustness, the datasets were divided into five folds for cross-validation, with each fold split into 64% for training, 16% for validation and 20% for testing. The held-out validation set was used to identify the best model checkpoint in early stopping and optimize hyperparameters. The SHAP values for GRN inference were computed using the test datasets. All experiments were conducted on a workstation equipped with two Intel Xeon Silver 4210 R CPUs and eight NVIDIA GeForce RTX 4090 GPUs. Reporting summary Further information on research design is available in the [205]Nature Portfolio Reporting Summary linked to this article. Supplementary information [206]Supplementary Information^ (2.8MB, pdf) [207]Supplementary Data 1^ (9.1KB, xlsx) [208]Supplementary Data 2^ (9.4KB, xlsx) [209]Supplementary Data 3^ (100.3KB, xlsx) [210]Supplementary Data 4^ (9.3KB, xlsx) [211]Supplementary Data 5^ (9.1KB, xlsx) [212]Supplementary Data 6^ (9.3KB, xlsx) [213]Supplementary Data 7^ (15.8KB, xlsx) [214]Supplementary Data 8^ (9.3KB, xlsx) [215]Supplementary Data 9^ (9.2KB, xlsx) [216]Supplementary Data 10^ (8.9KB, xlsx) [217]Reporting Summary^ (265.9KB, pdf) [218]41467_2025_64227_MOESM13_ESM.pdf^ (189.6KB, pdf) Description of Additional Supplementary Files [219]Transparent Peer Review file^ (7.4MB, pdf) Source data [220]Source Data^ (390.6KB, zip) Acknowledgements