Abstract
The interplay between transcription factors (TFs) and regulatory
elements (REs) drives gene transcription, forming gene regulatory
networks (GRNs). Advances in single-cell technologies now enable
simultaneous measurement of RNA expression and chromatin accessibility,
offering unprecedented opportunities for GRN inference at single-cell
resolution. However, heterogeneity across omics layers complicates
regulatory feature extraction. We present scTFBridge, a multi-omics
deep generative model for GRN inference. scTFBridge disentangles latent
spaces into shared and specific components across omics layers. By
integrating TF-motif binding knowledge, scTFBridge aligns shared
embeddings with specific TF regulatory activities, enhancing biological
interpretability. Using explainability methods, scTFBridge computes
regulatory scores for REs and TFs, enabling robust GRN inference. Our
results further demonstrate that scTFBridge can identify
cell-type-specific susceptibility genes and distinct regulatory
programs, providing insights into gene regulation mechanisms at the
single-cell level.
Subject terms: Computational models, Gene regulatory networks
__________________________________________________________________
Recent work has enabled the simultaneous analysis of RNA levels and
chromatin accessibility for gene regulatory network (GRN) inference,
but heterogeneity across omics layers remains a barrier in these
approaches. Here, the authors present scTFBridge, a deep-generative
model for GRN inference, whereby latent spaces are disentangled into
shared and specific components across omics layers.
Introduction
Transcription of genes is primarily regulated by TFs, proteins that
bind to specific DNA sequences to trigger diverse gene regulatory
functions^[34]1. These regulatory elements (REs) are typically located
in accessible chromatin regions, such as promoters, enhancers, and
silencers^[35]2,[36]3. The binding of TFs to REs and their spatial
chromatin interaction with target genes (TGs) ultimately determines
specific gene transcription processes^[37]4. The complex interplay
among TFs, REs, and TGs can be represented as interconnected GRNs, the
foundational elements of gene regulation in single cells^[38]5–[39]7.
Understanding the regulatory mechanisms within these GRNs is pivotal
for elucidating how cellular identities are established and maintained.
Such insights have far-reaching implications, from guiding cell fate
engineering to informing strategies for disease prevention and
treatment^[40]8–[41]10.
Recent advances in single-cell high-throughput sequencing techniques
have enabled the exploration of gene regulatory processes at
single-cell resolution^[42]11–[43]13. For example, single-cell RNA
sequencing (scRNA-seq) facilitates the inference of cell-type-specific
trans-regulatory interactions through computational tools such as
DeepSEM^[44]14, PIDC^[45]15, GENIE3^[46]16, SupirFactor^[47]17, and
SCENIC^[48]18. Similarly, single-cell assay for transposase-accessible
chromatin using sequencing (scATAC-seq) provides insights into
chromatin accessibility at specific gene regions (Peaks)^[49]19,
enabling the identification of TF-RE interactions related to gene
regulation through methods like DeepTFni^[50]20, and DirectNet^[51]21.
However, these distinct omics layers are typically measured in separate
cells, complicating efforts to directly link RNA expression and
chromatin accessibility within the same cell^[52]22. Unpaired data
integration methods, such as scJoint^[53]23 and scGLUE^[54]24,
integrate these datasets but are primarily optimized for tasks like
cell-type annotation and harmonized data representation. Alternatively,
cross-modality generation approaches, such as BABEL^[55]25 and
UnpairReg^[56]26, predict gene expression from scATAC-seq data to
enable deeper gene regulation analysis. Nevertheless, accurately
capturing complex regulatory relationships within gene regulatory
networks (GRNs) at single-cell resolution remains constrained by the
availability of paired data and data quality.
The emergence of single-cell multi-omics technologies, such as 10x
Multiome^[57]27 and SHARE-seq^[58]28, has transformed the field by
enabling simultaneous profiling of paired RNA-seq and ATAC-seq data
from the same cell. This advancement facilitates more accurate GRN
inference and a deeper understanding of gene regulation at the
single-cell level. However, integrating these high-dimensional and
heterogeneous multi-omics datasets remains a significant
challenge^[59]29,[60]30. A key objective is to encode distinct omics
layers into a unified latent space for a cohesive representation, as
achieved by methods like MultiVI^[61]31, UnitedNet^[62]32.
Additionally, specialized approaches, including LINGER^[63]33,
scMultiomeGRN^[64]34, scMTNI^[65]35, scButterfly^[66]36, and
SCENIC+^[67]37 have been developed to elucidate gene regulation across
single-cell RNA-seq and ATAC-seq data. Despite these advances, a
persistent hurdle in data integration is the modality
gap^[68]38,[69]39. This gap arises from the intrinsic heterogeneity of
different omics layers, where each modality captures distinct aspects
of complex biological processes^[70]40. As a result, modality-specific
private information can hinder the creation of a shared multi-omics
latent space and impair accurate data reconstruction from the
integrated embedding^[71]41,[72]42. Bridging this modality gap is
essential for advancing multi-omics data integration and enabling more
precise GRN analyses.
In this study, we propose scTFBridge, an interpretable deep generative
model for multi-omics integration and GRN inference at the single-cell
level. By leveraging mutual information theory^[73]43–[74]45,
scTFBridge minimizes mutual information and disentangles both RNA-seq
and ATAC-seq data into modality-shared and modality-private subspaces.
The modality-shared representations are further aligned through
contrastive learning to construct a unified latent space. Incorporating
biological priors of TF-motif bindings, scTFBridge constrains shared
latent variables to represent specific TF regulatory activities,
effectively acting as a bridge between RNA-seq and ATAC-seq modalities.
Using deep learning explainability techniques, we derive regulatory
scores linking REs and latent TF variables to TG expressions during the
generation of TG expression. Our results demonstrate that scTFBridge
effectively disentangles modality-shared and modality-private
information, yielding biologically interpretable embeddings that
capture cell-type-specific TF regulatory activities. Benchmarking
against state-of-the-art methods on single-cell cis- and
trans-regulation datasets, scTFBridge outperforms most baseline
approaches in regulatory feature extraction across diverse cell groups.
In summary, scTFBridge provides an efficient computational framework
for exploring phenotype-specific gene regulatory mechanisms at
single-cell resolution. It advances multi-omics data integration and
enhances our understanding of gene regulation dynamics in complex
cellular systems.
Results
The framework of scTFBridge
scTFBridge is a biologically interpretable deep generative model
designed for integrating single-cell multi-omics data and inferring
cell-specific GRNs (Fig. [75]1a). It leverages a hybrid variational
autoencoder^[76]46 (VAE) architecture that decomposes latent variables
into modality-shared and modality-private components for each omics
layer (Fig. [77]1b). Through mutual information theory and contrastive
learning regularizations, scTFBridge effectively disentangles shared
and private representations while aligning the shared latent space to
capture common regulatory signals (Fig. [78]1c, Methods). To enhance
biological interpretability, scTFBridge incorporates prior knowledge by
constraining the weights of the shared ATAC-seq decoder (Fig. [79]1d).
The shared latent variables were denoted as specific TF activities that
contributed to modality reconstruction. Further enrichment analysis
revealed the differential biological function of private/shared sets in
a cell-specific manner (Fig. [80]1e). Using model interpretation
techniques, scTFBridge enables the extraction of RE and TF regulatory
scores for TGs, enabling the establishment of single-cell-specific GRNs
(Fig. [81]1f, g).
Fig. 1. Overview of the scTFBridge framework.
[82]Fig. 1
[83]Open in a new tab
a Schematic representation of RNA transcription, highlighting the
cooperative roles of transcription factors (TFs) and regulatory
elements (REs) in driving RNA synthesis. b The model architecture of
scTFBridge features a hybrid multi-modal variational autoencoder (VAE)
designed for latent space dismantlement and representation of mutual
information (MI) learning. c Cross-mutual-information module for
aligning modality-shared latent spaces and reducing MI between
modality-shared and modality-private embeddings. d Regulatory module in
the ATAC decoder, where TF–motif binding constrains the connections
between TF variables and regulatory elements. e Definition of
private/shared peak/gene sets according to their information flow from
the input layer to disentangled latent space. Further enrichment
analysis reveals the differential biological functions between
private/shared sets in a cell-specific manner. f Cis-regulation
inference depicted for target gene (TG) expression prediction based on
ATAC data, where the objective function minimizes the mean squared
error (MSE) between original and reconstructed gene expression levels.
g Trans-regulation inference depicted for TG expression prediction
using latent variables, employing the same computational strategy as in
cis-regulation inference. Panel a was created with BioRender.com.
In detail, we developed a Cross Mutual Information (CMI) module and
contrastive learning (CL) loss to create a structured latent space that
effectively separates and aligns information from both modalities
(Fig. [84]1c). The CMI module employs independent neural networks to
impose an upper bound^[85]47 on the MI (Methods), minimizing MI between
shared and private embeddings within each modality, as well as between
private embeddings across modalities. This forces common information
into shared components. The CL loss pulls shared embeddings from the
same cell closer together in the latent space, while pushing them away
from the unrelated cells. This process aligns the two modalities,
creating a unified space where the shared embeddings for a given cell
are interchangeable, regardless of modality. Additionally, we employed
a Product of Experts (PoE) module to integrate and maintain the final
shared latent embedding. These approaches enable our model to learn a
robust, shared representation that captures the common biology between
RNA-seq and ATAC-seq while explicitly filtering out modality-specific
noise.
To preserve TF-RE regulatory relationships, we developed a regulatory
module in scTFBridge that incorporates known TF-motif binding
affinities to single-cell multi-omics data. In the shared latent
embedding, latent variables are assigned to specific TFs. Prior
knowledge of TF-motif binding affinities constrains the neural network
weights, linking these TF latent variables to reconstructed REs
(Fig. [86]1d, Methods). Concurrently, the same TF latent variables feed
into the TG decoder network to capture paired gene expression
information. The PoE module and the weight regularization ensure that
the shared embeddings from RNA-seq and ATAC-seq data are aligned and
interpretable at the latent variable level, enabling direct analysis of
associations between multi-omics features and corresponding TFs in the
shared latent space.
To enhance clarity and precision in describing gene regulatory
mechanisms, we employed SHAP (Shapley Additive Explanations) to
quantify the contribution of input REs or shared latent TF variables to
the TG expression reconstruction^[87]48 (Fig. [88]1f, g). SHAP scores
were computed based on the mean squared error (MSE) between imputed and
observed gene expression levels under specific feature perturbations
(Methods). The biologically constrained model framework allows these
SHAP scores to serve as regulatory scores for each RE and TF associated
with TGs. For GRN inference, we derived both cis-regulation (RE-TG) and
trans-regulation (TF-TG) interactions across diverse cell types using
these scores.
scTFBridge facilitates cell-type-specific disentangled embedding learning
We applied scTFBridge to a single-cell multi-omics dataset comprising
paired RNA-seq and ATAC-seq measurements of bone marrow mononuclear
cells (BMMCs) from healthy donors across multiple batches^[89]49. The
learned latent embeddings of RNA-private and ATAC-private networks
enabled visualization of domain-specific cell representations,
alongside shared cell embeddings integrating both modalities. Uniform
manifold approximation and projection (UMAP) analysis of the embeddings
revealed distinct cellular information within each domain. In the
RNA-private space, most cell types were poorly separated, with the
clear exception of erythroblasts and normoblasts (Fig. [90]2a).
Conversely, the ATAC-private embedding captured greater cellular
heterogeneity (Supplementary Fig. [91]1). To investigate the limited
resolution of RNA-private embeddings, we estimated the MI score between
RNA-private, RNA-shared, and scRNA-seq embeddings generated by
scVI^[92]50 (Supplementary Data [93]1). The results indicated that
RNA-private embeddings contained minimal unique information compared to
RNA-shared embeddings, with substantial overlap between RNA-seq and
ATAC-seq data. This overlap likely stems from the use of single-nucleus
RNA sequencing (snRNA-seq), which predominantly captures nuclear
transcripts. Additionally, we performed Gene Set Enrichment Analysis
(GSEA)^[94]51 to access functional roles of genes in RNA-private
embeddings based on their contribution ranks to shared or private
embeddings (Supplementary Fig. [95]2). The analysis revealed that
RNA-private genes in erythroblasts and normoblasts were significantly
enriched in the Heme Metabolism pathway, indicating that their gene
expression during erythroblast differentiation is regulated
independently of ATAC-seq signals.
Fig. 2. Interpretable TF-informed embedding learning of scTFBridge in BMMCs.
[96]Fig. 2
[97]Open in a new tab
a UMAP visualization of modality-shared and RNA-private embeddings in
BMMCs. b Comparison between ground-truth target gene (TG) RNA
expression and scTFBridge-based reconstruction from ATAC-seq data. c
Quantification and ratio of annotated promoter peaks across shared and
private peak sets for different cell types. d Regulatory scores and
target gene counts for cell-type-specific TF variables identified
through biological interpretation analysis. e Visualization of
cell-type-specific gene regulons derived from high-ranked target genes
(TGs) associated with TF variables. The sample size of the box plot is
the number of cells for each cell type, ranging from 171 to 2168 cells.
Note that all boxplots in this study present minima and maxima, the
smallest and largest value that is not considered an outlier; center,
median; bounds of box, 25th (Q1) to 75th (Q3) percentile; whiskers, 1.5
times the (Q3–Q1). Source data are provided as a Source Data file.
To evaluate the scTFBridge’s capacity to learn cross-modal
interactions, we performed a TG expression reconstruction experiment.
In detail, we used the RNA-shared embedding inferred from ATAC-seq
data, concatenated with a randomly sampled RNA-private embedding for TG
reconstruction. The model’s performance was then quantified by the
concordance between the clustering result of generated TG expression
data and the ground-truth cell annotations (Methods). Compared to
scButterfly^[98]36, scTFBridge achieved superior TG expression
reconstruction accuracy and preserved cell-specific information more
effectively (Fig. [99]2b, Supplementary Fig. [100]3). These results
demonstrate that paired ATAC-seq data successfully capture key gene
expression signals. Additionally, ablation experiments revealed that
both MI and CL modules significantly contributed to RNA-seq data
reconstruction and information retention (Supplementary Data [101]2).
We further assessed the contributions of different ATAC-seq data Peak
features to the ATAC-shared and ATAC-private embeddings based on the
information flow through different encoders. Peaks were categorized
into shared or private groups by their relative contributions to shared
or private embeddings (Methods). Comparing the number and ratio of
promoter Peaks in each group, our result revealed that the
modality-shared embedding primarily enriched gene regulatory signals
(Fig. [102]2c). We then explored the cell-type-specific gene regulatory
elements through biological interpretation analysis using scTFBridge
(Methods). Our results identified cell-type specific TFs, such as ESR
in cDC cells, TCF7 in CD4 naïve T cells, and SOX4 in B cells
(Fig. [103]2d). Additionally, high-ranked TGs of TF variables revealed
cell-type-specific gene regulons (Fig. [104]2e). These findings
demonstrate scTFBridge’s capacity for biologically disentangled
learning and its effectiveness in elucidating potential cellular gene
regulatory mechanisms.
scTFBridge reveals heritability enrichment of peaks in rheumatoid arthritis
We applied scTFBridge to a single-cell multi-omics dataset from
profiled immune cells from rheumatoid arthritis tissues^[105]52. To
visualize the latent representations, we used UMAP to project the
modality-shared, RNA-private, and ATAC-private representations into a
two-dimensional space (Fig. [106]3a). In the RNA-private embedding,
immune cells formed close clusters, suggesting that most of the gene
regulatory information for these cells in the RNA-seq data is captured
by modality-shared embeddings, as compared to the ATAC-seq data. The MI
scores between RNA-shared and ATAC-shared representations was higher
than the MI between RNA-shared and RNA-private or ATAC-shared and
RNA-private, demonstrating that scTFBridge effectively disentangles
modality-private and modality-shared components (Fig. [107]3b).
Fig. 3. Identifying cell-specific modality-shared and private regulatory
elements and functions in rheumatoid arthritis.
[108]Fig. 3
[109]Open in a new tab
a UMAP visualization of modality-shared, RNA-private, and ATAC-private
embedding in the dataset of rheumatoid arthritis. b Mutual information
(MI) comparison between different representations. R and A denote RNA
and ATAC modalities, respectively; s and p denote shared and private
embeddings, respectively. c Gene Ontology (GO) pathway analysis
comparing shared and private gene sets across different cell types. The
enrichment statistical significance was assessed using the
hypergeometric test, with -log[10](p-value) indicated. d Quantification
and ratio of annotated promoter peaks across shared and private peak
sets in different cell types. e Analysis of shared/private and
dynamic/condition-specific (cs)-invariant peaks across cell types.
Statistical significance was assessed using the one-sided Chi-square
test, with p-values indicated. f Heritability enrichment analysis of
shared and private peak sets of single-cells from RA samples. The
x-axis shows the -log[10] enrichment p-value, and the dot size reflects
the h^[110]2 effect. Heritability enrichment p-values were calculated
using Stratified LD Score S-LDSC. A Z-test was performed on the
regression coefficient (τ) for each annotation to determine if its
contribution to heritability was significantly different from zero.
Source data are provided as a Source Data file.
To uncover deeper biological insights from distinct RNA-seq embeddings,
we quantified the normalized absolute importance scores for each gene
across different embeddings and identified the top 100 genes as their
predominant gene programs (Methods). Gene Ontology (GO) pathway
enrichment analysis revealed that the shared gene program primarily
contributed to immune activation and receptor signaling, while the
private gene program was associated with cell cycle regulation
(Fig. [111]3c, Supplementary Data [112]3). Similarly, we assessed the
enrichment of promoter peaks into modality-shared and ATAC-private
embeddings. The results confirmed that the modality-shared embedding of
scTFBridge captured a higher number of promoter peaks (Fig. [113]3d),
highlighting its superior ability to capture gene regulatory signals in
single-cell Multiome datasets.
To further evaluate the scTFBridge model’s ability to characterize
crucial regulatory elements, we compared the shared- and ATAC-private
predominated REs identified by scTFBridge with the dynamic and
cell-state-invariant (cs-invariant) peaks identified in Gupta et al.
^[114]52. Dynamic peaks were defined as those whose variation could be
explained by paired RNA expression data, representing stronger gene
regulatory effects^[115]52. In this comparison, we examined the overlap
of peaks across two distinct groups (Fig. [116]3e). The results
demonstrated that shared peaks were significantly enriched in the
dynamic peak group (Chi-squared test, Methods). Additionally, we
employed the stratified LD-score regression (S-LDSC)^[117]53 method to
assess the heritability enrichment of shared and ATAC-private REs
concerning GWAS-annotated summary statistics for rheumatoid arthritis
(Methods). The enrichment scores revealed that shared peaks were more
effective in capturing causal variants associated with the disease
(Fig. [118]3f, Supplementary Data [119]4). These findings suggest that
the scTFBridge model provides a more refined characterization of gene
regulatory signals.
Benchmarking analysis of cell-type-specific cis-regulatory inference
Accurate GRN inference remains challenging across different cell types
and functional states due to the limited availability of gold-standard
datasets and potential experimental biases. To assess the
cell-type-specific cis-regulatory inference capabilities of scTFBridge,
we integrated promoter-capture Hi-C (pcHi-C) data from three primary
blood cell types^[120]54 and expression quantitative trait loci (eQTL)
data from eight immune cell types^[121]55,[122]56. These external
validation datasets include SNPs significantly associated with gene
expression, serving as known cis-regulatory RE-TG pairs for performance
benchmarking (Supplementary Data [123]5, Methods). We compared
scTFBridge’s performance in cis-regulatory inference against
established methods, including PCC, Distance, Random, and
SCENIC+^[124]37 (Methods).
Since the distance from transcription start sites (TSS) of TG is a
critical factor in cis-regulatory inference, we categorized predicted
RE-TG pairs into six distance groups, ranging from 0–5 kb to
100–200 kb, for evaluation. Using single-cell eQTL data as the ground
truth, scTFBridge achieved the highest area under the curve (AUC) and
area under the precision-recall curve (AUPR) ratios across all distance
groups in naive CD4 cells (Fig. [125]4a, b, Methods), whereas other
methods performed comparably to random. Since SCENIC+ only provides
final RE-TG pairs without relevance scores, we used the F1 score to
assess its performance (Methods). scTFBridge outperformed SCENIC+ in F1
scores across all distance groups (Fig. [126]4c). In other cell types,
due to the limited availability of eQTL pairs, all RE-TG pairs were
treated as positive labels without further distance-based
stratification. scTFBridge consistently outperformed baseline methods
in both AUC and AUPR ratios across all evaluated cell types, achieving
the highest overall ranking (Fig. [127]4d, e) and significantly higher
F1 scores compared to SCENIC+ (Fig. [128]4f). Additionally, when using
pcHi-C data as ground truth, scTFBridge also achieved the highest AUC
across all distance groups in naive CD4 T cells (Fig. [129]4g).
Fig. 4. Benchmarking analysis of cell-type-specific cis-regulatory inference
in PBMCs.
[130]Fig. 4
[131]Open in a new tab
a, b Benchmark performance (AUC and AUPR ratio) of different methods
for cis-regulatory inference in naïve CD4+ T cells, evaluated using
eQTL data as ground truth. c F1 score comparison between scTFBridge and
SCENIC+ for cis-regulatory inference in naïve CD4+ T cells. Statistical
significance was assessed using a two-sided paired Student’s t-test,
with *** indicating a p-value < 0.001. The sample size is the number of
TSS distance groups with 6. The p-value is 0.00051. d, e Benchmark
performance (AUC and AUPR ratio) of different methods of cis-regulatory
inference across other cell types. f F1 score comparison between
scTFBridge and SCENIC+ in the cis-regulatory inference across other
cell types. Statistical significance was evaluated with a two-sided
paired Student’s t-test, where ** indicates a p-value < 0.01. The
sample size if the number of other cell types with 7. The p-value is
0.0016. g Benchmark performance (AUC) of different methods in the
cis-regulatory inference of naïve CD4 + T cells, using the pcHi-C data
as ground truth. h scTFBridge-identified cis-regulatory interactions of
AIF1 in CD14 Mono, CD4+ T central memory (TCM) cells, and conventional
dendritic cells (cDC), supported by eQTL evidence. The sample size of
box plot is the number of cells for each cell type: cDC (n = 36), CD4
TCM (n = 223) and CD14 Mono (n = 511). Source data are provided as a
Source Data file.
To further demonstrate its utility, we examined cis-regulatory
interactions identified by scTFBridge that align with known gene
regulatory mechanisms across various cell types. For instance, AIF1
gene expression varies among cell types, being highly expressed in CD14
monocytes and cDCs, but with lower expression in T and B cells. To
illustrate scTFBridge’s ability to capture cell-type-specific
cis-regulatory interactions for AIF1, we analyzed a 200 kb region
upstream and downstream region of its TSS, annotating
cell-type-specific peak accessibility, eQTL SNP loci, and inferred
RE-TG pairs predicted by scTFBridge (Fig. [132]4h). Using CD4 TCM cells
as a reference, scTFBridge successfully identified the majority of
known cis-regulatory elements in CD14 Mono and cDCs, supported by
single-cell eQTL data. Additionally, scTFBridge uncovered potential
regulatory elements that may be specific to these cell types
(Fig. [133]4h). These results underscore the heterogeneity of
cis-regulatory mechanisms across cell types and highlight scTFBridge’s
potential to enhance and refine cell-type-specific GRN inference.
Benchmarking analysis of cell-type-specific trans-regulatory inference
To evaluate the single-cell trans-regulatory inference capability of
scTFbridge, we leveraged pre-identified TF-TG regulatory pairs from
Chromatin Immunoprecipitation sequencing (ChIP-seq) data in the
Cistrome database as the ground truth^[134]57. Specifically, we
compiled regulatory data for 17 TFs and their corresponding putative
target genes across four blood cell types (Supplementary Data [135]6).
We benchmarked scTFBridge’s performance against baseline methods
designed for cell-type-specific GRN inference employing single-cell
multi-omics data, including LINGER^[136]33, scMTNI^[137]35,
scMultiomeGRN^[138]34, and SCENIC+^[139]37. As a case study, we
analyzed the trans-regulatory scores of STAT1 across all single cells
(Fig. [140]5a) and examined its paired trans-regulatory TGs in CD14
monocytes. scTFBridge outperformed all baseline methods except LINGER,
achieving an AUC score of 0.693, and an AUPR ratio of 2.221
(Fig. [141]5b, c). Across the benchmarking dataset, scTFBridge
consistently surpassed the other baseline methods, yielding higher AUC
scores and AUPR ratios for all tested TFs (Fig. [142]5d, e).
Fig. 5. Benchmarking analysis of cell-type-specific trans-regulatory
inference in PBMCs.
[143]Fig. 5
[144]Open in a new tab
a UMAP visualization of modality-shared embedding of PBMCs. b, c
Receiver operating characteristic curve and precision-recall curve of
trans-regulatory potential inference of STAT1 in CD14 Mono cells. d, e
Box plots comparing AUC and AUPR ratio scores across various methods
for identifying putative TF target genes using ChIP-seq data.
Statistical significance was evaluated with two-sided paired Student’s
t-test, where *** indicates a p-value < 0.001. The sample size is the
number of putative TFs with 17. For d, p-values are 7.2e^-3, 4.4e^-5,
1.2e^-5, and 4.8e^−6 for LINGER, SCENIC+, scMTNI, and scMultiomeGRN.
For e, p-values are 9.1e^-3, 1.1e^-4, 9e^-6 and 4.4e^−6, respectively.
f Top TF-regulated gene expression validated by ChIP-seq data across
different cell types, including ETS1 in CD4+ T cells, STAT1 in CD14+
monocytes, and PAX5 in naïve B cells. g Violin plots showing cell-level
trans-regulatory scores of specific TF-TG pairs across different cell
types. The sample size is the number of cells for each cell type,
ranging from 25 to 511 cells. h Network representations of inferred
TF-TG pairs in CD14 Mono and naïve B cells. Node color corresponds to
gene expression levels, while edge color represents regulatory scores.
Source data are provided as a Source Data file.
Unlike LINGER, which achieves high performance through pre-training on
extensive external bulk-seq datasets and using pre-trained gene
regulatory scores for GRN inference. scTFBridge operates as a
self-contained, self-supervised framework requiring no external
training data. This streamlined and versatile approach makes scTFBridge
highly adaptable to datasets without high-quality bulk data. We
conducted a comprehensive performance comparison of specific TFs and
their top-ranked target genes (Supplementary Data [145]7 and Methods).
For gene sets within the top 150 ranks, LINGER and scTFBridge showed
comparable performance. However, for certain TFs (IRF1, SPI1, BATF, and
RUNX3), scTFBridge exhibited superior performance across most ranges
(Supplementary Fig. [146]7), demonstrating its robustness and broad
applicability across diverse TFs.
Next, we explored trans-regulatory associations in other single-cell
groups. Based on annotated TF-TG associations, we selected ETS1, STAT1,
and PAX5 as the dominant regulatory TF in CD4 T cells, monocytes, and B
cells, respectively, along with their most associated TGs. Comparing TG
expression levels revealed cell-type-specific patterns of gene
expression for these TF-TG pairs (Fig. [147]5f). We further evaluated
the regulatory scores predicted by scTFBridge and found that the model
effectively captured cell-type-specific trans-regulatory associations
(Fig. [148]5g). To visualize these relationships, we reconstructed the
inferred trans-regulatory networks for cell-type-dominant TFs and their
associated TGs. For instance, we identified the STAT1-ZEB2 regulatory
link in CD14 Monocytes and BATF-BANK1 regulatory link in B cells
(Fig. [149]5h). These results underscore scTFBridge’s ability to
resolve cell-type-specific trans-regulatory activities at single-cell
resolution.
scTFBridge uncovers cell-type-specific gene regulations in lung cancer
susceptibility
To explore disease-specific single-cell gene regulation, we applied
scTFBridge to a single-cell multi-omics dataset derived from lung
cancer^[150]58. This dataset comprises paired RNA-seq and ATAC-seq data
collected from tumor-distant normal lung tissues of ever- and
never-smokers (n = 8 per group). The modality-shared embedding of these
single cells revealed a distinct separation based on annotated cell
types from Long et al. (Fig. [151]6a). In contrast, the RNA-private and
ATAC-private embeddings exhibited less cellular information, with the
RNA-private embedding showing particularly reduced information content
(Supplementary Fig. [152]8).
Fig. 6. Cell-type-specific gene regulations in lung cancer susceptibility
uncovered by scTFBridge.
[153]Fig. 6
[154]Open in a new tab
a UMAP visualization of the modality-shared embedding of lung
single-cell dataset. b Regulation scores of tier-6 linkage variants and
candidate cis-regulatory elements associated with target genes across
different cell types. Dot color indicates cell-type categories, and dot
size represents the regulation scores. c UMAP visualization of NRG1
gene expression of lung single-cell dataset as in (a). d UMAP
visualization of the regulation scores for the variant rs4236709-linked
peak across different cell types as in (a). e Visualization of peak
accessibility and regulation scores within the NRG1 locus across
various cell types. Dot color corresponds to the regulatory score of a
specific peak to NRG1. Source data are provided as a Source Data file.
Long et al. previously defined high-confidence RE-TG pairs (tier 6)
that included lung cancer candidate causal variants (CCVs) colocalized
REs, and their TGs. This group proposed potential gene regulatory
patterns contributing to lung cancer susceptibility. To validate
scTFBridge’s ability to uncover cell-type-specific cancer
susceptibility cis-regulations, we compared the inferred regulatory
scores of identified RE-TG pairs across predefined calls in cell
subtypes and other cell types (Fig. [155]6b). The results demonstrated
that scTFBridge effectively captures higher cell-specific
cis-regulations for these RE-TG pairs.
Specifically, we visualized the expression of the target gene NRG1
across all single cells and evaluated the regulatory score inferred by
scTFBridge for a CCV-specific peak (chr8: 32552186–32552663)
(Fig. [156]6c). The findings confirmed that scTFBridge accurately
reflects cell-type-specific cis-regulation. Furthermore, a comparison
of the inferred regulatory scores across peaks within the NRG1 gene
region also indicated that the CCV-colocalized peak exhibited the
highest regulatory potential (Fig. [157]6d, e). To further investigate
context-dependent gene regulation, we extended our analysis to
smoking-responsive CREs, the results confirmed reduced regulatory
scores for selected RE-TG pairs in cells from the ever-smoker group
(Supplementary Fig. [158]9). These results underscore the capability of
the scTFBridge model as a powerful tool for identifying gene regulatory
events underlying human disease susceptibility.
Discussion
Inferring dynamic gene regulatory processes at single-cell resolution
advances our understanding of complex cellular regulation during
development and disease, while also facilitating the discovery of
genotype-phenotype associations^[159]14,[160]59. Although multiple
methods have been developed to infer GRNs using scRNA-seq or scATAC-seq
datasets^[161]37,[162]60, these approaches are often limited by the
unpaired nature of the datasets and the high sparsity of single-cell
data, making it challenging to accurately characterize complex
cell-type-specific GRNs. The advent of single-cell multi-omics
techniques allows for the simultaneous profiling of individual cells at
both the transcriptomic (RNA-seq) and epigenomic (ATAC-seq) levels.
This comprehensive characterization necessitates the development of
computational methods tailored for single-cell multi-omics profiling.
These datasets facilitate learning joint embeddings of single-cell
features^[163]2, establishing associations between REs and TG
expression^[164]61, and inferring GRNs^[165]33,[166]37. Additionally,
advances in biological deep learning for single-cell omics further
provide more accurate representations of single-cell data, enhancing
interpretability and offering deeper insights into cellular regulatory
mechanisms^[167]17,[168]24,[169]62.
In this study, we introduced scTFBridge, an interpretable deep
generative model designed to infer GRNs from single-cell multi-omics
data. Leveraging mutual information theory, scTFBridge effectively
separates the latent variable space into modality-shared and
modality-private subspaces, addressing key challenges posed by modality
gaps and information redundancy in multi-omics
integration^[170]39,[171]63. Furthermore, scTFBridge incorporates prior
knowledge of TF-motif binding to regularize the biological learning of
modality-shared embedding. This approach enables the model to capture
TF-RE interactions within the shared embeddings, enhancing
interpretability. Notably, each latent variable in the modality-shared
embedding corresponds to a specific TF, forming a bridge between
RNA-seq and ATAC-seq data to support accurate GRN inference. This
architecture not only improves GRN inference accuracy but also ensures
that the representations remain biologically interpretable.
We applied scTFBridge to single-cell multi-omics datasets derived from
BMMCs^[172]49, rheumatoid arthritis cells^[173]52, PBMCs, and lung
cells^[174]58 to evaluate its performance across diverse gene
regulatory tasks. Leveraging a deep-learning model interpretation
framework, scTFBridge effectively identified cell-type-specific TFs and
uncovered the contributions of various TFs and REs to TG regulation in
different cell types. Using promoter-capture Hi-C (pcHi-C), eQTL, and
ChIP-seq data from blood as ground truth, scTFBridge demonstrated
superior performance compared to most baseline methods in inferring
cis- and trans-regulatory interactions in the PBMC dataset.
Furthermore, analysis of a lung cell multi-omics dataset demonstrated
scTFBridge’s ability to identify candidate susceptibility genes
associated with causal variants within cell-specific REs. These results
highlight scTFBridge’s efficacy in disentangling complex regulatory
features, thereby advancing single-cell GRN inference.
Despite its contributions, this study has several limitations. Unlike
LINGER^[175]33, Enformer^[176]64, and NvwaCE^[177]65 models, which
leverage large-scale pretrained data, scTFBridge was trained from
scratch, potentially limiting its ability to uncover established gene
regulatory mechanisms. The benchmarking analysis primarily used
blood-derived datasets due to the limited availability of standardized
ground truth data for other tissues and cell types, reducing the
method’s generalizability. Furthermore, our analysis concentrated on a
simplified model of TF and RE binding to disentangle each latent
variable in modality-shared space, potentially limiting its
applicability to more complex TF regulatory contexts. Although
scTFBridge effectively extracts latent variables for RNA-seq and
ATAC-seq data, it does not account for regulatory mechanisms involving
microRNAs, non-coding RNAs, or private peaks, necessitating further
exploration and complementary methodologies^[178]2. While the CMI
module’s capacity for information disentanglement was evaluated,
achieving complete and accurate information separation in multi-omics
data remains a significant challenge. To address this, researchers can
adapt the modality-reconstruction task^[179]32 to evaluate the
information constraints within specific latent embeddings and conduct
latent space independence tests to evaluate the information
redundancy^[180]66. A promising approach involves using a formal
information-theoretic decomposition framework to quantify multimodal
interactions, independent of prior biological understanding^[181]67.
This method employs mathematical estimators to measure redundancy
(shared information), uniqueness (private information), and synergy
(emergent information), offering a rigorous, quantitative way to
evaluate the separation of shared and private components in multimodal
data.
In summary, scTFBridge provides a computational framework for inferring
both cis- and trans-regulatory GRNs from single-cell multi-omics data.
The core principle of biologically informed multi-modal disentanglement
can be adapted to other single-cell multi-omics applications. This work
emphasizes the importance of developing interpretable, biologically
grounded multi-modal integration methods that advance our understanding
of molecular regulatory processes at the single-cell level.
Methods
Framework of scTFBridge
scTFBridge is a deep learning-based computational framework designed to
integrate single-cell multi-omics data and infer cell-type-specific
GRNs. The framework utilizes a hybrid multimodal VAE architecture to
decompose the data into shared and private latent spaces. The shared
latent space captures regulatory interactions among TFs, REs, and TGs,
and the private latent space encodes modality-specific information,
such as non-TF regulatory factors or other cellular-specific features
unique to each data modality.
Disentangled Multi-Modal VAE
For a given cell
[MATH: n :MATH]
, the observed multi-omics input data is denoted as
[MATH: x=xr,xa
mfenced> :MATH]
, where
[MATH: xr :MATH]
represents RNA-seq data and
[MATH: xa :MATH]
represent ATAC-seq data. we define a low-dimension latent variable
[MATH: z=zr,za
mfenced> :MATH]
to represent the observed latent omics. The latent space embedding
[MATH: z :MATH]
is divided into modality-shared and private representations, factorized
as
[MATH: zr={zsr,zpr} :MATH]
,
[MATH: za={zsa,zpa} :MATH]
. For a single modality (e.g., RNA-seq), the goal of the deep
generative model is to maximize the likelihood of the observed data
conditioned on the latent variables. This likelihood is expressed as
Eq. ([182]1):
[MATH: px;θ=∫px∣z;θpzdz :MATH]
1
Here,
[MATH: pz :MATH]
is the prior distribution, typically set as a standard Gaussian
distribution, and
[MATH: px∣z;θ :MATH]
represents the process of generating the observed data from latent
variable, and
[MATH: θ :MATH]
denotes the learnable parameters in the decoder. Direct computation of
[MATH: px;θ :MATH]
is challenging, so we introduced a variational posterior
[MATH: qz∣x;ϕ :MATH]
to approximate the true poster distribution
[MATH: pz∣x :MATH]
, where
[MATH: ϕ :MATH]
are the parameters of data encoder. Using this approximation, Eq.
([183]1) can be rewritten as:
[MATH: logpx;θ=log∫px∣z;θpzdz=log∫qz∣x;ϕpx∣z;θpzqz∣x;ϕ
mfrac>dz :MATH]
2
Using Jensen’s inequality, this leads to the evidence lower bound
(ELBO):
[MATH: logpx;θ=Eqz∣x;ϕ
msub>logpz,xqz∣x;ϕ
mfrac>+KL(q(
z∣x;ϕ)∣∣p(z∣x)) :MATH]
3
Since
[MATH: KL(q(
z∣x;ϕ)∣∣p(z∣x))≥0 :MATH]
, the ELBO is given by:
[MATH: ELBO=Eqz∣x;ϕ
msub>[logp(
x∣z;θ)
mrow>]−KL(q(
z∣x;ϕ)∣∣p(z)) :MATH]
4
Here,
[MATH: KL(q∣<
mo>∣p) :MATH]
represents the Kullback–Leibler divergence between
[MATH: qz∣x;ϕ :MATH]
and prior
[MATH: pz :MATH]
. To maximize the log-likelihood, we can maximize the ELBO.
To disentangle the latent variables and address batch effects, the
learning objective for each omics layer represents as follows:
[MATH:
ELBO=Eqzs,zp∣x;ϕ
mfenced>[logp(x∣zs,zp,c;θ)
mrow>]−(KL(q(
zs∣x;ϕ<
mi>s)∣∣
p(zs))+KL(q(
zp∣x;ϕ<
mi>p)∣∣
p(zp))) :MATH]
5
Here,
[MATH: c :MATH]
is the batch label used as a conditional variable in the generative
process.
For the modality-shared representation, we use a product-of-experts
(PoE) model to compute the joint variational posterior distribution for
each omics modality:
[MATH: qzs∣x=pzs
mfenced>∏i=1Nqzsi∣xi
mfenced> :MATH]
6
The prior
[MATH: pzs
mfenced> :MATH]
is modeled as a Gaussian distribution
[MATH: Nzs∣μ0,δ0
mrow> :MATH]
, and each modality-specific posterior
[MATH: q(zxi∣xi) :MATH]
is also Gaussian, with mean
[MATH: μsi :MATH]
and variance
[MATH: δsi :MATH]
. The joint distribution is defined by:
[MATH: δs=δ0−1+∑i∈r,a
munder>δi−1−
mo>1 :MATH]
7
[MATH: μs=μ0δ0−1+∑i∈r,a
munder>μiδi−1δs
:MATH]
8
The overall learning objective for multi-modal integration during
training is defined as:
[MATH: ∑iϵr,a
munder>([Eq(zsi∣xi;ϕs
i),q(zpi∣xi;ϕp
i)
msub>[logp(
xi∣zpi,zsi,c;θ<
mi>i)]
−KL(q(
zsi∣xi;ϕs
i)∣∣p(zsi)
)−KL(q(
zpi∣xi;ϕp
i)∣∣(p(zpi)
))]) :MATH]
9
Cross mutual information Block
To achieve a well-disentangled multi-modal latent space, we incorporate
mutual information (MI) theory to decouple shared and private
representations. The loss function for this block is defined as in Eq.
([184]10):
[MATH:
LCID=∑i∈<
/mo>r,a
munder>Izpi,zsi+Izpr,zpa :MATH]
10
Here,
[MATH: I(zpi,zsi) :MATH]
measures the MI between the private and shared latent embeddings within
a single modality
[MATH: i :MATH]
,
[MATH: I(zpr,zpa) :MATH]
measures the MI between private embeddings across modalities. The goal
is to minimize the MI between modality-shared and private embeddings
within a modality and between private embeddings across modalities. The
MI between two variables
[MATH: x :MATH]
and
[MATH: y :MATH]
is defined as in Eq. ([185]11):
[MATH: Ix;y=E
mi>px,ylogpx,ypxpy :MATH]
11
Since mutual information is generally intractable, we use an upper
bound known as the Contrastive Log-ratio Upper Bound (CLUB). This
approximation for
[MATH:
ICLUB<
/mi>x,y :MATH]
is described as in Eq. ([186]12):
[MATH:
ICLUB<
/mi>x,y=E
mi>px,ylogpy∣x−EpxEpylogpy∣x :MATH]
12
To validate that
[MATH:
ICLUB<
/mi>x,y :MATH]
is an upper bound of
[MATH: Ix;y :MATH]
, we compute the difference
[MATH: Δ=ICLUB−I<
/mi>x;y :MATH]
as in Eq. ([187]13):
[MATH: Δ=Ep(y)[logp(y)−
Ep(x)[logp(y∣x)]] :MATH]
13
Using the marginal distribution
[MATH: logp(y)=log
∫p(y∣x)p(<
/mo>x)dx=log(Epx[p(y∣x)]) :MATH]
and Jensen’s inequality, we derive:
[MATH: logpy=logEpxpy∣x≥Epxlogpy∣x :MATH]
14
This shows that
[MATH: Δ≥0 :MATH]
, confirming that
[MATH:
ICLUB<
/mi>x,y :MATH]
is indeed an upper bound of
[MATH: Ix;y :MATH]
which can be computed as:
[MATH:
ICLUB<
/mi>=1N
mi>∑i=1Nlogpyi∣xi
mfenced>−1N
2∑i=1N∑j=1Nlogpyi∣xi
mfenced> :MATH]
15
This simplifies to:
[MATH:
ICLUB<
/mi>=1<
mrow>N2∑i=1N∑j=1Nlogpyi∣xi
mfenced>−logpyj∣xi
mfenced> :MATH]
16
We employ a nonlinear neural network to estimate
[MATH: py∣x :MATH]
, with the log-likelihood computed as:
[MATH: ∑i=1Nlogpyi∣xi;ϕ :MATH]
17
During each training epoch, the parameters of the distribution
estimator are first updated to maximize Eq. ([188]17), improving the
accuracy of conditional distribution estimation. Subsequently, other
parameters of scTFBridge are updated to minimize
[MATH:
ICLUB<
/mi> :MATH]
, thereby reducing the mutual information between the two variables.
TF-Binding informed ATAC-seq Decoder
To ensure that the joint shared embedding represents transcription
factor (TF) regulatory activities, we incorporated prior knowledge of
TF-motif binding scores into the architecture of the ATAC-seq decoder.
We utilized 713 TF position weight matrices (PWMs) for known motifs,
sourced from the PECA2 GitHub repository
([189]https://github.com/SUwonglab/PECA). This repository aggregates
motif data from widely used databases, including JASPAR^[190]68,
TRANSFAC^[191]69, and UniPROBE^[192]70. From the single-cell RNA-seq
dataset, the 128 most highly expressed TFs were selected as dimensions
of the modality-shared embedding. Each TF corresponds to a single
latent variable in the shared embedding. Regulatory elements (REs) were
scanned for TF binding motifs using HOMER. This produced a binding
score matrix
[MATH: B :MATH]
, which quantifies the interaction strength between each TF and the
REs.
The ATAC-seq reconstruction decoder is implemented as a single-layer
neural network. The decoder receives the combined latent variables as
input and reconstructs ATAC-seq signals (peaks) constrained by
biological priors. The binding score matrix
[MATH: B :MATH]
serves as a mask to constrain the weight matrix
[MATH: W :MATH]
of the decoder.
[MATH: B :MATH]
ensures that connections from TF latent embeddings to ATAC-seq peaks
reflect known TF-RE interactions. For each connection in the decoder,
the weight matrix
[MATH: W :MATH]
is modulated by the binding score matrix
[MATH: B :MATH]
, as follows:
[MATH:
Wi,j=Wi,j*B
i,j,i∈
mo>0,…N,j∈0,…K :MATH]
18
Where
[MATH: N :MATH]
is the number of TF dimensions and
[MATH: K :MATH]
is the number of REs in ATAC-seq.
Cross modal inference
In addition to modality-private self-reconstruction, the scTFBridge
training process incorporates cross-modal generation to learn
cis-regulatory relationships between ATAC-seq and RNA-seq. This feature
enables inference across modalities, allowing the model to predict
missing data in one modality using information from the other. When
RNA-seq data is missing, the Product-of-Experts (PoE) approach is
employed to use the shared latent distribution of ATAC-seq
[MATH: q(zsa∣xa,c;ϕ)
:MATH]
as the multi-modal joint distribution for cross-omics generation. The
missing modality-private latent variable
[MATH: zpr :MATH]
is reparametrized from a prior normal Gaussian distribution
[MATH: N(zpr∣μ0,δ0
mrow>) :MATH]
. Using (
[MATH: zpr,zs :MATH]
), the model recovers the RNA-seq data by optimizing the following
objective:
[MATH: Ezpr~Nμ0,δ0
mrow>,qzs∣xa;ϕ[log
p(xr∣zpr,zs,c;θ)
mrow>]−KL(q(
zs∣xa;ϕ)∣∣p
(zs)) :MATH]
19
Similarly, we can derive the objective function for generating ATAC-seq
data from RNA-seq:
[MATH: Ezpa~Nμ0,δ0
mrow>,qzs∣xr;ϕ[log
p(xa∣zpa,zs,c;θ)
mrow>]−KL(q(
zs∣xr;ϕ)∣∣p
(zs)) :MATH]
20
Learning Object
In general, the final optimizing objective of scTFBridge can be
expressed as follows:
[MATH:
L=LVA<
/mi>E+β*<
mi>LMI+γ
*Lcr<
mi>ossmodal
mi> :MATH]
21
The first term optimizes the disentangled VAE for learning
omics-specific representations and accurate data reconstruction, it
measures how well the model reconstructs each modality from the latent
representations and regularizes the latent space by minimizing the
divergence between the latent distributions and the prior Gaussian
distribution. The second term evaluates and promotes the
disentanglement of the shared and private latent representations by
minimizing the MI between them. The third term ensures that the model
can perform effective cross-modal inference, using the shared and
reparametrized latent variables. These terms collectively ensure that
scTFBridge achieves biologically meaningful integration,
disentanglement, and cross-omics prediction.
Single cell multi-omics data preprocessing
In this study, we implemented a single-cell multi-omics data
preprocessing workflow using the Scanpy package^[193]71. For the
scRNA-seq data, TFs were isolated, normalized, and log-transformed. We
then selected the top 128 highly variable TFs for focused embedding in
the shared TF space. Additionally, the top 3000 highly expressed genes
were identified as (TGs). These TGs were normalized and log-transformed
to serve as input for scTFBridge. For scATAC-seq data, chromatin
accessibility peaks were binarized and filtered to retain peaks with
open accessibility in at least 1% of all cells. To ensure accurate
integration, only cells represent in both datasets were included in the
analysis.
SHAP value for cell-level and cell-type-specific GRN inference
SHAP values provide a robust framework for quantifying the
contributions of features in deep learning models and are widely used
for biologically interpretable biomarker discovery. Building on prior
studies^[194]48, we employed the reconstruction loss between original
and imputed TG expression values as the objective for computing SHAP
values. for input features, focusing on their contribution to the
reconstruction rather than direct expression effects. By focusing on
the reconstruction error, scTFBridge highlights features that are
challenging for the model to learn, revealing complex or subtle
regulatory relationships in cell-level and cell-type-specific GRNs.
For cis-regulatory contributions, we calculated regulatory elements
(REs) associated with TGs via cross-omics generation. The
cis-regulatory score for a RE-TG pair was defined as:
[MATH:
REscorei,j=ρi,jOjEie−d
i,j<
msub>d0 :MATH]
22
Where
[MATH:
ρi,j :MATH]
is the SHAP value of RE
[MATH: j :MATH]
and TG
[MATH: i :MATH]
;
[MATH: Oj
:MATH]
is the average chromatin accessibility of this cell type;
[MATH: Ei
:MATH]
is the average expression of TG
[MATH: i :MATH]
in the cell type;
[MATH:
di,j :MATH]
is the distance between genomic locations of
[MATH: j :MATH]
and
[MATH: i :MATH]
; and
[MATH: d0
:MATH]
is the normalization factor for genomic distance, set to 30,000 bp in
this study.
For trans-regulatory contributions, we quantified the influence of each
TF variable in the shared embedding space on TGs. The trans-regulatory
score was directly defined as the SHAP value:
[MATH:
TFscorek,j=αk,i :MATH]
23
Where
[MATH:
αk,i :MATH]
represent the SHAP value quantifying the contribution of TF
[MATH: k :MATH]
to TG
[MATH: i :MATH]
.
To infer regulatory networks at the individual cell level, we extended
the computation of SHAP values to account for unique cell-level
characteristics. Instead of averaging values across a specific cell
type, we calculated SHAP values using the average gene expression and
chromatin accessibility across all cell types. This method enabled the
reconstruction of cell-specific GRNs, providing a fine-grained view of
regulatory interactions that account for each cell’s unique features.
Latent embedding contributions estimation
To evaluate the contributions of modality-private and shared components
in the generative process of RNA-seq data from latent variables, we
employed SHAP values. For each latent dimension, the average of the
absolute SHAP values was computed across all TGs to derive global
contribution scores. The overall contributions of the private and
shared embedding spaces were then estimated by summing the contribution
scores of all latent variables within their respective spaces. To
ensure comparability, the two contribution scores (private and shared)
were normalized to the range [0,1] by dividing each score by the
combined sum of both scores. These normalized proportions were then
ranked to determine the relative influence of private and shared
components on gene regulation.
Using these normalized contribution proportions, we distinguished genes
primarily regulated by private information from those predominantly
influenced by shared features. To investigate the biological
significance of these findings, we performed Gene Ontology biological
process (GO:BP) enrichment analysis using the gprofiler2^[195]72
website with selected top 100 ranked genes from each group.
Additionally, we conducted Gene Set Enrichment Analysis (GSEA) using
the gesapy package^[196]73 with the ranked gene lists.
Benchmarking analysis of GRN inference
To evaluate the performance of scTFBridge in predicting cis- and
trans-regulation, we conducted a benchmarking analysis against
established methods using consistent dataset splits to ensure a fair
comparison. For cis-regulation prediction, we compared scTFBridge with
PCC, TSS distance, random prediction, and SCENIC + ^[197]37. For
trans-regulation prediction, we evaluated scTFBridge against baseline
methods for cell-type-specific GRN inference from single-cell
multi-omics data: LINGER^[198]33, scMTNI^[199]35,
scMultiomeGRN^[200]34, and SCENIC+^[201]37. To address variations in
gene filtering criteria across methods, we used the intersection of
genes retained by all methods for final metric calculations, ensuring a
fair and consistent evaluation. For GRN inference, performance was
assessed by comparing the top-k predicted target genes of specific TF
from each method against the intersection of target genes in the
Cistrome database^[202]57. The overlapping genes and their associated
scores were used for performance evaluation.
Heritability enrichment analysis
We utilized stratified linkage disequilibrium score regression
(S-LDSC)^[203]53 to assess the heritability enrichment of various peak
sets. First, we converted the genomic coordinates of peaks from hg38 to
hg19 to align with the GWAS summary statistics of RA. SNPs with a minor
allele frequency (MAF) ≥ 0.05 associated with the peak sets of interest
were selected. Each 200 bp peak window was extended by 500 bp on either
side, resulting in a 1.2 kb genomic interval per peak. Using S-LDSC, we
then calculated population-specific linkage disequilibrium (LD) scores
and quantified the proportion of RA heritability attributable to
modality-shared and modality-private peak sets from different
single-cell populations in the RA dataset, as analyzed using
scTFBridge.
Evaluation metric
AUPR Ratio
The AUPR Ratio measures the improvement of a predictor over random
chance in terms of precision-recall performance. It is defined as the
ratio of the Area Under the Precision-Recall (AUPR) curve of a method
to that of a random predictor. For a random predictor, the AUPR is
equal to the fraction of positive samples in the dataset
[MATH: P :MATH]
, which is defined as:
[MATH: AUPRRatio
mi>=AUPRmetho<
mi>dP :MATH]
24
AUC
The AUC score evaluates the performance of a predictor in
distinguishing between positive and negative samples. It is derived
from the Receiver Operating Characteristic (ROC) curve, which plots the
true positive rate (sensitivity) against the false positive rate
(1-specificity) across thresholds.
F1 Score
The F1 Score assesses the balance between precision and recall,
particularly useful when dealing with imbalanced datasets. It is
computed as the harmonic mean of precision and recall, which is defined
as:
[MATH: F1Score
mi>=2⋅Pre
cision⋅<
/mo>Recall
Precisi
on+Reca<
/mi>ll :MATH]
25
Implementation details of scTFBridge
We implemented scTFBridge using the PyTorch deep learning library in
Python. The dimensions of modality-shared and private embedding were
empirically set to 128 in scTFBridge. The model was trained with the
Adam optimizer, employing an initial learning rate of 1e-3 and weight
decay of 1e-5, reduced by epoch. Model training was conducted with a
batch size of 128 over 150 epochs with early stopping triggered after
10 steps of no improvement. Hyperparameters used to balance the weights
of each training objective, with example value was provided in
Supplementary Data [204]8. To ensure robustness, the datasets were
divided into five folds for cross-validation, with each fold split into
64% for training, 16% for validation and 20% for testing. The held-out
validation set was used to identify the best model checkpoint in early
stopping and optimize hyperparameters. The SHAP values for GRN
inference were computed using the test datasets. All experiments were
conducted on a workstation equipped with two Intel Xeon Silver 4210 R
CPUs and eight NVIDIA GeForce RTX 4090 GPUs.
Reporting summary
Further information on research design is available in the [205]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[206]Supplementary Information^ (2.8MB, pdf)
[207]Supplementary Data 1^ (9.1KB, xlsx)
[208]Supplementary Data 2^ (9.4KB, xlsx)
[209]Supplementary Data 3^ (100.3KB, xlsx)
[210]Supplementary Data 4^ (9.3KB, xlsx)
[211]Supplementary Data 5^ (9.1KB, xlsx)
[212]Supplementary Data 6^ (9.3KB, xlsx)
[213]Supplementary Data 7^ (15.8KB, xlsx)
[214]Supplementary Data 8^ (9.3KB, xlsx)
[215]Supplementary Data 9^ (9.2KB, xlsx)
[216]Supplementary Data 10^ (8.9KB, xlsx)
[217]Reporting Summary^ (265.9KB, pdf)
[218]41467_2025_64227_MOESM13_ESM.pdf^ (189.6KB, pdf)
Description of Additional Supplementary Files
[219]Transparent Peer Review file^ (7.4MB, pdf)
Source data
[220]Source Data^ (390.6KB, zip)
Acknowledgements