Abstract Cellular heterogeneity within cancer tissues determines cancer progression and treatment response. Single‐cell RNA sequencing (scRNA‐seq) has provided a powerful approach for investigating the cellular heterogeneity of both cancer cells and stroma cells in the tumor microenvironment. However, the common practice to characterize cell identity based on the similarity of their gene expression profiles may not really indicate distinct cellular populations with unique roles. Generally, the cell identity and function are orchestrated by the expression of given specific genes tightly regulated by transcription factors (TFs). Therefore, deciphering TF activity is essential for gaining a better understanding of the uniqueness and functionality of each cell type. Herein, metaTF, a computational framework designed to infer TF activity in scRNA‐seq data, is introduced and existing methods are outperformed for estimating TF activity. It presents the improved effectiveness in characterizing cell identity during mouse hematopoietic stem cell development. Furthermore, metaTF provides a superior characterization of the functional identity of breast cancer epithelial cells, and identifies a novel subset of neural‐regulated T cells within the tumor immune microenvironment, which potentially activates BCL6 in response to neural‐related signals. Overall, metaTF enables robust TF activity analysis from scRNA‐seq data, significantly enhancing the characterization of cell identity and function. Keywords: cell identity, scRNA‐seq, transcription factor activity, tumor immune microenvironment __________________________________________________________________ MetaTF is a new computational framework that uses single‐cell RNA sequencing (scRNA‐seq) data to infer transcription factor activity with the aim to help in deciphering cells identity starting from a heterogeneous population, and apply it to characterize mouse hematopoietic stem cells and breast cancer epithelial cells, thus identifying a novel subset of neural‐regulated T cells within the tumor microenvironment. graphic file with name ADVS-12-e10745-g003.jpg 1. Introduction Single‐cell RNA sequencing (scRNA‐seq) has become the state‐of‐the‐art approach for investigating the cellular heterogeneity and dynamics of various biological systems, ranging from cell development to disease progression.^[ [54]^1 , [55]^2 , [56]^3 ^] Leveraging the power of scRNA‐seq, researchers have successfully applied it for the identification of novel cell types and subtypes, the refinement of developmental lineage trajectories, and more.^[ [57]^4 , [58]^5 , [59]^6 ^] Regardless of what objective is pursued, a fundamental step consistently involves the establishment of cell identity present within a given tissue sample.^[ [60]^7 , [61]^8 ^] A common practice for identifying cell types is by clustering cells based on the similarity of their gene expression profiles.^[ [62]^9 , [63]^10 ^] However, transcriptional differences that separate cells into distinct groups may not necessarily indicate functionally distinct cell populations.^[ [64]^11 , [65]^12 , [66]^13 ^] Given the additional layers of complexity in cell identity, accurately defining cell identity remains a substantial challenge. At the core of cell identity and function lies the intricate interplay of gene expression programs, which ultimately dictate the unique characteristics of each cell type. Central to this regulatory landscape are the core regulatory circuits, orchestrated by tightly regulated transcription factors (TFs).^[ [67]^7 , [68]^14 ^] Understanding TF activity is paramount for unraveling the essence and functionality of distinct cell types. While recent years have seen the emergence of several algorithms for analyzing TF activity in scRNA‐seq data, they often grapple with limitations affecting their accuracy or robustness.^[ [69]^15 , [70]^16 , [71]^17 , [72]^18 ^] For instance, the algorithms solely relying on TF expression levels struggle to faithfully capture their biological activity. Similarly, those scrutinizing both TFs and their direct target expression encounter challenges in identifying genuine targets due to the complexities of regulation. These methods rely heavily on experimentally measured or predicted TF binding information, such as motif enrichment, thereby limiting their scope to TFs with available data. Moreover, current TF binding site (TFBS) annotation assays, such as chromatin immunoprecipitation coupled with deep sequencing (ChIP‐seq), ChIP‐exo, and CUT&RUN, necessitate high‐affinity antibodies and large amounts of homogeneous cells, posing challenges for diverse tissues. Additionally, these assays assess only one TF at a time, hampering comprehensive TF binding analysis. Conversely, motif‐based predictions, while not require cell‐specific data, rely on predefined motifs available for only a subset of TFs, leading to potential false positives. Recognizing these limitations, the compilation of TF–target interactions from diverse data sources to construct a comprehensive static TF–target interaction network, integrated it with dynamic gene regulatory networks (GRNs) from single cells, provide a better understanding of regulatory relationships within the sample's organization under specific conditions. Furthermore, a notable observation emerges; there exists an underutilization of TF activity in characterizing cell identity and function, especially in contexts like cell development and disease progression.^[ [73]^19 , [74]^20 ^] This underutilized potential underscores the urgent need for enhanced algorithms for inferring TF activity in scRNA‐seq data and their practical application. In this context, we introduce metaTF ([75]https://github.com/wanglabsmu/metaTF), a computational framework tailored for comprehensive TF activity analysis in scRNA‐seq data. MetaTF addresses the limitations of traditional methods by constructing comprehensive static prior networks through the integration of diverse data sources, and it does not require training on a specific scRNA‐seq dataset. It enhances TF regulation portrayal by amalgamating dynamic regulatory networks from scRNA‐seq data with static prior networks and optimizing existing algorithms to provide accurate and robust TF activity predictions and gene regulatory network analysis. Benchmarking across diverse types of scRNA‐seq data, including TF knockout, and TF overexpression data, demonstrates that metaTF consistently outperforms other widely used tools, affirming its remarkable accuracy and resilience in inferring TF activity from scRNA‐seq data. Single‐cell Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) screening is revolutionizing the field of immuno‐oncology by enabling the precise mapping of GRNs and the identification of critical targets. Specifically, the knockout of TFs enables the determination of the perturbation effects in finely regulated biological processes, which in turn influences cellular functions. In the realm of CD8^+ cytotoxic T‐cell differentiation,^[ [76]^21 ^] metaTF has not only proven its exceptional capability in the context of gene regulatory network analysis but has also displayed its prowess by showcasing its ability to predict the effects of TFs’ perturbations with remarkable accuracy. These underscore the enhanced capability of metaTF to effectively interpret novel forward genetic screens data, highlighting its advanced analytical prowess in decoding the complex genetic interactions. MetaTF provides an effective and applicable solution to elucidate perturbation function and biologic circuits by a model‐based quantitative analysis of single‐cell‐based CRISPR screening data. As an extension of its application, metaTF can quantify the relationships between various perturbations. Its application to mouse hematopoietic stem cell (HSC) data^[ [77]^22 ^] illustrates how TF activity enhances the accurate identification of cell identity. Additionally, metaTF provides a deeper understanding of the functional identity of epithelial cells in breast cancer (BRCA), unveiling the presence of a novel neural‐regulated T‐cell subset within the tumor immune microenvironment. In summary, metaTF addresses the need for methods capable of integrated analysis of diverse data types by offering a systematic integration method that is broadly applicable. By leveraging various data sources and dimensions of TF biology, metaTF provides a more comprehensive and versatile solution for TF activity inference and gene regulatory network analysis. This integrated analysis is crucial for unraveling the complexities of gene regulation and holds immense potential for researchers across various domains. 2. Results 2.1. Overview of the MetaTF Framework MetaTF is an integrated framework for inferring TF activity from scRNA‐seq data. This workflow consists of five key components: data import, GRN inference, construction of a prior network, regulon (TF–targets) identification, and estimation of TF activity (Figure [78]1 ). Specifically, metaTF supports flexible data input, including count matrices from droplet or full‐length protocols,^[ [79]^23 ^] or preprocessed data objects from software such as CellRanger,^[ [80]^24 ^] Seurat,^[ [81]^25 ^] and SingleCellExperiment.^[ [82]^26 ^] GRN inference utilizes percentage of unique information contribution (PUIC), an entropy‐based algorithm, to estimate the importance of regulator–target relationships by quantifying the unique informational contribution from the mutual information (MI) between each gene triplet (Figure [83]1; Experimental Section; Figure [84]S1a, Supporting Information). Notably, integration with a weighted prior TF–target network constructed from literature‐curated, ChIP‐seq, TF knockout, and tissue‐specific interaction information further enhances robust regulon identification for each TF (Experimental Section; Figure [85]S1b, Supporting Information). The Virtual Inference of Protein‐activity by Enriched Regulon analysis (VIPER)^[ [86]^17 ^] algorithm then estimates TF activity based on the enrichment score of the identified regulons. Implementation of metaTF as an R package enables leveraging the extensive SingleCellExperiment ecosystem for convenience and scalability. Meanwhile, parallel workflows allow comparisons of alternative methods for GRN inference (such as Gene Network Inference by Ensemble of Trees (GENIE3)^[ [87]^27 ^] and PCOR (R Package for a Fast Calculation to Semi‐partial Correlation Coefficients)^[ [88]^28 ^] and estimation of TF activity [such as single‐sample Gene Set Enrichment Analysis (ssGSEA)^[ [89]^29 ^] and AUCell^[ [90]^18 ^] (a tool to quantify the enrichment of gene sets in single cells)]. Downstream analyses include cell‐type‐specific activated TF identification, pathway enrichment, and the construction of visual aids, including Radviz plots,^[ [91]^30 ^] Sankey diagrams, and heatmaps (see online vignette). Optimization via vectorization and multithreading ensures user‐friendly and timely analysis. Figure 1. Figure 1 [92]Open in a new tab Overview of metaTF framework. MetaTF is built upon the S4 class SingleCellExperiment package and accepts various types of inputs, including count matrices and S4 objects generated from SingleCellExperiment or Seurat. Subsequent analyses include GRN inference, prior‐network construction, regulon identification, and TF activity estimation. Moreover, metaTF provides several additional functions, including cell type‐specific activated TF identification, regulon pathway enrichment analysis, and the construction of various visual aids, such as dot plots, Radviz plots, Sankey diagrams, and heatmaps. 2.2. Benchmarking Highlights MetaTF's Superior Performance for Inferring TF Activity We systematically benchmarked metaTF against the state‐of‐the‐art algorithms Single‐cell Regulatory Network Inference and Clustering (SCENIC)^[ [93]^18 ^] and DoRothEA.^[ [94]^15 ^] SCENIC uses machine learning (GENIE3/GRNBoost) to infer GRNs and refines them via motif enrichment analysis, retaining only TF–target pairs with significant motif presence in the promoters of target genes. This approach reduces indirect interactions but is limited by the availability of predefined motifs and may yield false positives. DoRothEA relies on a precompiled TF–target interaction database, bypassing GRN inference from single‐cell data. Although efficient, it may include context–irrelevant interactions, leading to false positives. By using diverse scRNA‐seq datasets, metaTF showed its superior performance in several scenarios. In CRISPRi perturbation data targeting 40 TFs in human embryonic stem cells (ESCs),^[ [95]^31 ^] metaTF outperformed SCENIC and DoRothEA at inferring TF activities (Figure [96]2a; Experimental Section; Figure [97]S2a and Tables [98]S1 and [99]S2, Supporting Information). Since dropouts pose challenges for TF activity analysis, we evaluated their performance under varying dropout levels. Despite observing declining performance with increased dropout rates, it is noteworthy that metaTF maintains competitive performance compared to other tools (Experimental Section; Figure [100]S2b, Supporting Information), demonstrating the robustness of metaTF with dropouts. This is further supported by results from low‐coverage 10× Genomics scRNA‐seq data^[ [101]^32 ^] (Experimental Section; Figure [102]S3, Supporting Information). In primary cells with single TF knockout, metaTF achieved Area Under the Receiver Operating Characteristic Curve (AUROC) values of 0.87 and 0.68 for Nkx2‐1 knockout droplet‐based scRNA‐seq data^[ [103]^33 ^] and Cebpa knockout MARS‐seq data,^[ [104]^34 ^] respectively, which were higher than those produced by the alternatives (Figure [105]2b,c; Experimental Section; Figure [106]S4a–d and Table [107]S3, Supporting Information). In overexpression data, metaTF again excelled with AUROC values of 0.63 and 0.97 separately for the Hoxa9 and Runx1 overexpression Smart‐seq2 data^[ [108]^35 ^] (Figure [109]2d,e; Experimental Section; Figure [110]S4e,g and Tables [111]S4 and [112]S5, Supporting Information). These results affirm the remarkable accuracy and resilience of metaTF in conducting TF activity inference for a wide range of scRNA‐seq datasets. Specifically, when applied to CRISPR droplet sequencing (CROP‐seq) data from HEK293T cells with TF perturbations,^[ [113]^36 ^] PUIC outshines the other two methods (GENIE3 and PCOR) in more than half of the TFs, and maintains a faster computational speed (Figure [114]2f; Experimental Section; Figure [115]S4h, Supporting Information). Furthermore, the integration of GRN inferred by PUIC and the prior TF–target network significantly contributed to the enhanced performance of metaTF (Experimental Section; Figure [116]S4i,j, Supporting Information). Lastly, in terms of TF activity estimation, VIPER was proven to be superior to ssGSEA and AUCell (Experimental Section; Figure [117]S4k and Table [118]S6, Supporting Information). Besides, combinatorial analysis on CRISPRi data showed that the default PUIC and VIPER methods in metaTF yielded optimal performance (Figure [119]2g), achieving the highest AUROC values for 23 out of 40 TFs (Table [120]S7, Supporting Information). Meanwhile, to elucidate the pivotal contributions of two key steps in inferring TF activity using metaTF, we recalculated the activities of 35 TFs by separately removing the steps involving PUIC for GRN calculation and the prior network. The findings revealed that, in comparison to PUIC alone, the inclusion of the prior network led to superior results for metaTF. Simultaneously, when excluding the PUIC calculation of the GRN step and retaining only the prior network, metaTF achieved results comparable to SCENIC (Figure [121]S5a,b, Supporting Information). In summary, systematic benchmarking validates metaTF as a superior algorithm for scRNA‐seq TF activity inference. Figure 2. Figure 2 [122]Open in a new tab Assessment of metaTF using various types of scRNA‐seq data. a) Line plot on the left depicting the AUROC for 40 perturbated TFs, calculated with three methods including metaTF, SCENIC, and DoRothEA. The distribution curve on the right illustrates the AUROC score distributions for these three methods. The Kolmogorov–Smirnov (KS) test was used to determine the differences (*P < 0.05). b) Receiver operating characteristic (ROC) curves for Nkx2‐1 activity estimation by metaTF, SCENIC, and DoRothEA. These estimations were conducted on Nkx2‐1 conditional knockout (cKO) epithelial cells from Little et al.,^[ [123]^33 ^] with the AUROC values indicated in brackets. FPR represents the false‐positive rate, and TPR represents true‐positive rate. c) ROC curves for Cebpa activity estimation by the same three methods applied to Cebpa cKO myeloid progenitors from Paul et al.^[ [124]^34 ^] d) ROC curves of Hoxa9 activity estimation by the same three methods applied to Hoxa9 overexpression HECs from Guo et al.^[ [125]^35 ^] e) ROC curves of Runx1 activity estimation by the same three methods applied to Runx1 overexpression HECs from Guo et al. f) Line plot showing the AUC values for 14 perturbed TFs, calculated using GRNs inferred by PUIC, GENIE3, and PCOR in the CROP‐seq dataset. g) Line plot showing the AUROC values for 40 perturbed TFs, calculated using various combinations of GRN inference methods and TF activity enrichment methods within the metaTF framework. h) The histogram displays the distribution of differences in the AUROC values obtained from metaTF, DoRothEA, and SCENIC for each transcription factor, providing a visual summary of their comparative performance on data of single‐cell CRISPR screens from CD8^+ cytotoxic T‐cell differentiation. Single‐cell CRISPR screening facilitates the mapping of gene regulatory networks, uncovering pivotal targets in immuno‐oncology. Specifically, the knockout of TFs enables the determination of the perturbation effects in finely regulated biological processes, which in turn influences cellular functions. In the realm of CD8^+ cytotoxic T‐cell differentiation,^[ [126]^21 ^] single‐cell CRISPR screens with knockout of TFs provide a more effective method for interrogating the regulatory landscape of these cells. Our metaTF method demonstrated superior performance in analyzing these datasets with single TF knockout experiments, achieving an average AUROC of 0.69 across 109 TFs. Notably, metaTF outperformed established methods DoRothEA and SCENIC in 80 and 66 TFs, respectively (Figure [127]2h; Experimental Section; Figure [128]S6a, Supporting Information). As an extension of its application, we initially identified eight distinct pairs of TFs that exhibited divergent activity trajectories following double knockout experiments, highlighting the intricate regulatory interdependencies within CD8^+ T‐cell regulation (Figure [129]S6b,c, Supporting Information). This comprehensive analysis unveiled the complex nature of TF interactions and their roles in the cellular response. These results not only validate metaTF's robustness in interpreting single‐cell CRISPR screen data but also provide insights into the intricate TF networks governing T‐cell fate decisions. We also validated similar conclusions using single‐cell CRISPR screening data from the cortical development of the mouse brain (Experimental Section; Figure [130]S7, Supporting Information). By elucidating both individual TF activities and their functional interplay, our approach offers a framework for deciphering GRN complexity in various biological contexts, potentially informing targeted approaches in immuno‐oncology. 2.3. TF Activity Profiles Outperform Expression Profiles for Dissecting Heterogeneous Cell Populations To explore the potential of TF activity in capturing cellular heterogeneity, we utilized six sequentially acquired cell populations during the development of mouse embryonic HSCs in our previous study,^[ [131]^22 ^] including inferred TF activity profile, TF expression profile, and highly variable gene expression profiles (Experimental Section). Dimensionality reduction analysis visualizes exceptional capability in effectively discerning diverse populations based on TF activity profiles while maintaining high level of internal consistency within each population (Figure [132]3a; Figures [133]S8a and [134]S9a, Supporting Information). Nonetheless, cell clustering based on TF expression or even highly variable gene expression faced challenges, particularly in the CD45^− T1 and CD45^+ T2 pre‐HSC populations, which were also observed in the results obtained from employing a classic random forest classification model to fit TF activity profiles and gene expression profiles (Experimental Section; Figure [135]S9b, Supporting Information). This indicates that aggregating signals from regulons in TF activity potentially delivers more resilient information content versus expression alone. To further confirm this notion, we gauged the cell‐type‐specific action and within cell‐type variation of TFs by contrasting activity and expression profiles (Experimental Section). In line with our hypothesis, we observed that TFs exhibited relatively higher cell type specificity (CTS, Figure [136]3b; Table [137]S8, Supporting Information) and lower within cell‐type variance (Figure [138]S8a, Supporting Information) in their activity versus expression profiles. As an example, the Hes family bHLH transcription factor 1 (Hes1), known for its role in the maintenance of the HSC pool in the bone marrow under stress,^[ [139]^37 ^] emerged as an adult HSC‐specific TF in the activity profile, while Hes1 expression was confined to a small subset of adult HSCs, exhibiting high within‐cell‐type variation (Figure [140]S8b,c, Supporting Information). Figure 3. Figure 3 [141]Open in a new tab Comparison between the TF expression and TF activity profiles. a) t‐SNE plots visualizing cluster assignments of cells based on TF expression profile (left), highly variable gene expression profile (middle) and TF activity profile (right). The colors indicate the categorization into six sequential cell populations during HSC development: arterial endothelial cells (AEC), hemogenic endothelial cells (HEC), T1 (type 1 pre‐HSCs), T2 (type 2 pre‐HSCs), fetal liver HSCs (FL), and Adult (adult HSCs). b) Bar plot representing the distribution of TFs at both the expression and activity levels, categorized by cell‐type‐specific score. The Kolmogorov–Smirnov (KS) test was used to determine the differences. c) Radviz plot showing the distribution of CTS TFs at the expression (left) and activity (right) levels, with colors representing the six cell types. d) The pie chart on the top displaying the number of shared TFs identified at both expression and activity levels, while the pie chart on the bottom displaying the number of TFs exclusively identified at the activity level. e) Violin plot demonstrating the expression of Rela in the six sequential populations. f) Violin plot demonstrating the activity score of Rela in the six sequential populations. Significance levels are indicated as ***, with a P‐value of <0.001. g) GSEA showing the relative enrichment of target genes regulated by Rela between WT (wild‐type) and Rela KO (knockout) samples. The normalized enrichment score was used to quantify the enrichment level. Robust cell state characterization using metaTF enables the discovery of additional CTS determinants. We further identified 414 CTS TFs, with the highest number in the T1 and T2 pre‐HSC stages (Figure [142]3c; Figure [143]S8d,e, Supporting Information). While 60% (248) of CTS TFs were identified in both the activity and expression profiles (Figure [144]3d, upper panel), the activity profile uncovered 161 additional unique CTS TFs undetected by the expression profile (Figure [145]3c,d, lower panel), accounting for 40% of the total. Nearly half (80/161) of these TFs were activated specifically in adult HSCs, like Rela (Figure [146]3e,f) and Sp1 despite negligible expression differences (Figure [147]S8f,g, Supporting Information). We confirmed their activation in HSCs using independent Rela knockout microarray data^[ [148]^38 ^] (Figure [149]3g) and Sp1 knockout microarray data^[ [150]^39 ^] (Experimental Section; Figure [151]S8h, Supporting Information). Additionally, by analyzing scRNA‐seq data from healthy human bone marrow plasma cells mapped to the T2T‐CHM13 reference assembly,^[ [152]^40 ^] we validated clustering patterns of Immunoglobulin A (IgA)/ Immunoglobulin G (IgG) plasma cells using TF activity analysis, revealing subtype‐specific TF activation and supporting previous observations of tissue‐origin transcriptional retention in bone marrow plasma cells (Experimental Section; Figure [153]S10, Supporting Information). In summary, the TF activity profile outperforms the expression profile in enabling a refined dissection of heterogeneous cell populations and identification of novel cell‐type‐specific regulators. 2.4. MetaTF Unveils the Heterogeneity of Human Breast Cancer Epithelial Cells Cellular heterogeneity within cancer tissues determines both cancer progression and treatment response. Breast cancer stands out as the first malignancy to be recognized for its molecular‐level heterogeneity, leading to the development of tailored treatment strategies for different molecular subtypes. This progress has been further propelled by the recent availability and accumulation of a wealth of scRNA‐seq data, which includes both normal and malignant breast epithelial cells, as well as immune cells within the breast cancer microenvironment.^[ [154]^41 , [155]^42 , [156]^43 , [157]^44 ^] Therefore, we utilized metaTF to explore cell identity in breast cancer development and the immune microenvironment. We reanalyzed an scRNA‐seq dataset of human breast cancer epithelial cells^[ [158]^45 ^] ([159]GSE176078), maintaining the original classification of seven cell populations: Cancer Basal, Cancer Her2, Cancer LumA, Cancer LumB, Luminal Progenitors, Mature Lumial, and Myoepithelial (Table [160]S9, Supporting Information). Four of these populations were composed of cancer cells: Cancer LumA (predominantly Estrogen Receptor (ER^+) clinical subtype, 98.9%), Cancer LumB (mainly ER^+ clinical subtype, 94.5%), Basal (primarily triple‐negative breast cancer (TNBC) clinical subtype, 96.8%), and Cancer Her2 (present across clinical subtypes, comprising 8.8% ER^+, 36.1% HER2^+, and 55.1% TNBC). Our analysis used highly variable TFs (top 500) and genes (top 2000), and found that TF activity outperformed gene expression in distinguishing cell populations, particularly evident in T‐distributed stochastic neighbor embedding (t‐SNE) visualization (Figure [161]4a; Experimental Section). Assessment metrics consistently demonstrated that TF activity more accurately grouped and separated cell types^[ [162]^46 ^] (Figure [163]4b; Experimental Section; Table [164]S10, Supporting Information). To delve into clinical subtype‐specific regulation, we focused on three cancer cell types (Cancer LumA, Cancer LumB, and Cancer Basal). Differential analysis identified 53, 58, and 65 subtype‐specific activated TFs in Cancer LumA, Cancer LumB, and Cancer Basal, respectively. Among these, there were known regulators such as HOXB2 ^[ [165]^47 ^] and KLF4 ^[ [166]^48 ^] in Cancer LumA, as well as HMGA1 ^[ [167]^49 ^] in Cancer Basal (Figure [168]4c; Table [169]S11, Supporting Information). Pathway analysis for TFs activated in specific clinical subtypes, along with their targets, revealed common properties associated with malignant proliferation, such as epithelial cell differentiation, regulation of transcription from RNA polymerase II promoter in response to stress, and regulation of cytoplasmic translation pathways. Each subtype also exhibits unique processes. For example, Cancer LumA displayed involvement in the apoptotic signaling pathway, negative regulation of programmed cell death, and response to estrogen.^[ [170]^45 ^] Cancer LumB showed an association with incorrect protein metabolic‐related pathways,^[ [171]^45 ^] while Cancer Basal exhibited enrichment in Messenger Ribonucleic Acid (mRNA) processing and RNA stability‐related pathways^[ [172]^50 ^] (Figure [173]4d; Table [174]S12, Supporting Information). Furthermore, we identified 8, 22, and 24 Cancer Her2‐specific activated TFs in these three clinical subtypes, respectively (Figure [175]S11a and Table [176]S13, Supporting Information). These TFs were predominantly associated with glycolytic process, protein localization, and pre‐miRNA processing related pathways (Figure [177]S11b and Table [178]S14, Supporting Information). We validated these clinical subtype‐specific TFs using an expression‐based AUCell approach,^[ [179]^18 ^] which demonstrated strong consistency with the metaTF results (Figure [180]S11c–e, Supporting Information). To ascertain the effectiveness of metaTF in identifying clinical subtype‐specific activated TFs, we obtained the BRCA RNA‐seq dataset from The Cancer Genome Atlas (TCGA)^[ [181]^51 ^] and clinical subtyping based on Berger et al.’s work.^[ [182]^52 ^] Subsequently, gene set variation analysis (GSVA)^[ [183]^53 ^] revealed significant differences in the activity of 15 out of 53 TFs in LumA, 24 out of 58 TFs in LumB, and 28 out of 65 TFs in Basal, compared to other clinical subtypes (Figure [184]S12a,b,d,f and Table [185]S15, Supporting Information). The gene set enrichment analysis (GSEA) results further confirmed the specificity of these TFs to cancer subtypes (Figure [186]S12c,e,g, Supporting Information). To further validate the effectiveness of subtype‐specific activated TFs identified by metaTF, the potential roles of the top 15 Basal‐activated TFs in the proliferation, migration, and invasion of Basal subtype of breast cancer cells were analyzed. We knockdown the expression of these 15 TFs in two basal breast cancer cell lines (Hs578T and BT549 cells) (Figures [187]S13a and [188]S14a and Table [189]S16, Supporting Information). Among these TFs, SOX6, SOX15, CEBPG, KLF13, NFAT5, and SMAD1 were experimentally validated for their ability to affect the migration and invasion of breast cancer cells, the transwell assay results showed that knockdown of these TFs, respectively, significantly decreased the migration (Figures [190]S13b and [191]S14b, Supporting Information) and invasion (Figures [192]S13c and [193]S14c, Supporting Information) of both cancer cells. In addition, we also detected the effects of 15 TFs on the proliferation function of breast cancer cells, among which knockdown of SOX6, SOX15, or CEBPG significantly inhibited the proliferation of Hs578T and BT549 cells, respectively (Figures [194]S13d and [195]S14d, Supporting Information). Furthermore, the functional study demonstrated that these six TFs have no significant impact on the migration of T47D, a luminal A type of breast cancer cell line (Figure [196]S15, Supporting Information). This result substantiates the effectiveness of the subtype‐specific activated transcription factors identified by metaTF. Additionally, to validate whether 6/15 is a remarkable percentage, we randomly selected 15 expressed TFs and performed migration assays in Basal‐like breast cancer cell lines, and showed that only two out of the randomly selected 15 TFs significantly inhibited the migration of BT549 cells (Figure [197]S16a,b, Supporting Information). When incorporating the impact of these TFs on the proliferation of BT549 cells into consideration, none of these 15 randomly selected TFs significantly affect both proliferation and migration of BT549 cells (Figure [198]S16c, Supporting Information). Biological replicate experiments have yielded results that are consistent with the previous conclusions (Figures [199]S17 and [200]S18, Supporting Information). These results suggest that activated TFs identified by metaTF somehow play critical roles in the progression of breast cancer. As cancer progresses and the patient's condition worsens, TF activities change with disease advancement. Analyzing activity dynamics across cancer stages unveiled dramatic changes in TF activity throughout disease progression (Figure [201]4e–g). For instance, KDM5B ^[ [202]^54 ^] and SNAI1 ^[ [203]^55 ^] activities exhibited gradually increased in ER^+ cells, aligning with their known roles in breast cancer metastasis. In contrast, MYB activity gradually declined, consistent with its suppressive role in breast cancer metastasis.^[ [204]^56 ^] To validate the repeatability of the findings in human breast cancer epithelial cells reported by metaTF, additional data^[ [205]^57 ^] ([206]GSE180286) analysis was conducted (Experimental Section). Consistently, metaTF demonstrated superior capability in distinguishing cancer cells from normal cells. Furthermore, through a further analysis, 47, 63, and 57 subtype‐specific activated TFs were identified in Cancer LumA, Cancer LumB, and Cancer Basal, respectively. As depicted in Figure [207]S19 (Supporting Information), there is substantial overlap between these TFs and the previous analysis results. Meanwhile, to demonstrate the broad applicability of metaTF, when applied metaTF to prostate cancer epithelial cells^[ [208]^58 ^] ([209]GSE176031), the results also demonstrated that clustering based on TF activity outperformed clustering based on gene expression. Likewise, random forest classification yielded similar outcomes (Experimental Section; Figure [210]S20, Supporting Information). In summary, analyzing TF activity has provided novel insights into the heterogeneity of breast cancer epithelial cells, enhancing our understanding of the regulatory mechanisms driving cancer initiation and progression. Figure 4. Figure 4 [211]Open in a new tab MetaTF results for the human breast cancer epithelial cells single‐cell RNA‐seq dataset. a) t‐SNE plots visualizing the cluster assignments of human breast cancer epithelial cells based on TF activity profile (left) and highly variable gene expression profile (right). Meanwhile, clustering based on gene expression revealed that 89 genes exhibited uniquely high expression specifically in Cancer Basal cells, which contributed to the improved separation of these cells from other cell types. b) Bar plot displaying the values of three evaluation indicators for the two clustering results. c) Heatmap plot displaying the activity scores of the top 30 specifically activated TFs in three distinct cancer epithelial cell populations. d) Radar chart showing Gene Ontology (GO) biological process enrichment results for specific activated TFs and their expressed targets in the three cancer epithelial cell populations. Common pathways are depicted in gray text, while cell type‐specific pathways are indicated in text colors corresponding to panel (a). Dot plots illustrating the degree of TF activation that significantly changed as the disease progressed (pathological tumor stage) in the three cancer epithelial cell populations: e) Cancer LumA, f) Cancer LumB, and g) Cancer Basal. BRCA, breast cancer. TNBC, triple‐negative breast cancer. 2.5. Discovering New Neural‐Regulated T‐Cell Subsets in the Breast Tumor Microenvironment through MetaTF The intricate interplay between the immune system and cancer has become a prominent focus in cancer biology research. Immune cells possess the ability to recognize and eliminate nascent tumor cells during immune surveillance. Paradoxically, they can also act as supporters of tumor growth and progression through immunoediting. Excitingly, clinical advancements in immunosuppressive therapies and cellular immunotherapies have demonstrated promise in targeting these dual actions of immune cells within the tumor microenvironment.^[ [212]^59 , [213]^60 , [214]^61 ^] Nonetheless, our understanding of the underlying transcriptional mechanisms governing pro‐ and antitumor immunity remains incomplete.^[ [215]^62 ^] To shed light on the transcriptional regulation of T‐cell states within the tumor microenvironment, we applied metaTF to analyze two scRNA‐seq concerning breast cancer datasets.^[ [216]^63 ^] Initial clustering based on highly variable (top 2000) gene expression profile indicated that there was no clear separation of T‐cell subpopulations (Figure [217]5a, right). However, when using TF activity (top 500) profile, a pronounced separation of T‐cell subpopulations was presented, particularly in the case of CD8^+ T‐effector memory (T[EM]) cells, which were divided into two distinct clusters, denoted as CD8^+ T[EM] C1 and C2 (Figure [218]5a, left). The classification accuracy for the newly identified CD8^+ T[EM] C1 and C2 was confirmed through the expression of their respective marker genes (Figure [219]5b). Differential analysis revealed 268 genes that were upregulated in CD8^+ T[EM] C1 and 153 genes in C2 (Figure [220]5c; Table [221]S17, Supporting Information). C1 displayed higher expression of activation/proliferation markers, like NFKBIA ^[ [222]^64 ^] as well as genes related to CD8^+ T‐cell response to acute infection, memory generation, and activation stress.^[ [223]^65 , [224]^66 ^] In contrast, C2 displayed elevated expression of DOCK2, which has the potential to impair the immune function of CD8^+ T cells.^[ [225]^67 ^] Intriguingly, pathway analysis revealed that C1 was enriched in functions associated with T‐cell activation, regulation of protein serine/threonine kinase activity, interleukin‐12‐mediated signaling, and notably, neuronal/synaptic processes such as axonogenesis (Figure [226]5d, upper panel; Table [227]S18, Supporting Information). Given emerging evidence indicating the influence of nerves within the tumor microenvironment on cancer progression,^[ [228]^68 , [229]^69 ^] we first confirmed the protein level of nerve fibers marker (antineurofilament heavy,^[ [230]^70 ^] NF) in breast cancer tissue (Figure [231]S21d, Supporting Information), subsequently examined the neuroreceptor expression level in scRNA‐seq data and discovered that C1 specifically expressed adrenergic receptors (ARs) ADRB2 and ADRA2B (Figure [232]5e). By analyzing publicly available spatial transcriptomics data, we confirmed co‐expression of CD8A and ADRB2 within breast tumors^[ [233]^41 , [234]^71 ^] (Figure [235]5g; Figure [236]S22, Supporting Information). Validation through immunofluorescence in breast cancer patient samples confirmed the expression of ADRB2 on tumor‐infiltrating T cells (Figure [237]5h; Figure [238]S21i and Table [239]S19, Supporting Information). To further validate these findings, we isolated tumor‐infiltrating lymphoid T cells (CD8^+ cells) from breast cancer patients using flow cytometry (Figure [240]S21e,f, Supporting Information). Subsequently, we treated these T cells with epinephrine^[ [241]^72 ^] (an adrenergic receptor agonist, 0.1 ng mL^−1) to activate neurotransmitter receptors and assessed downstream TF expression and activity through RNA sequencing (Figure [242]S21g, Supporting Information). After performing GSEA analysis, the results showed that BCL6, TBX21, and ATF3, along with their respective downstream targets, exhibited significant activation in the epinephrine treatment group (Figure [243]5i,j; Figure [244]S21h and Table [245]S20, Supporting Information). These results are consistent with the findings of metaTF activity analysis in CD8^+ T[EM] C1 (Figure [246]5f; Figure [247]S21a–c, Supporting Information). This suggests that CD8^+ T[EM] C1 cells respond to neuro‐related signals by activating these transcription factors and their downstream targets. To examine the consistency of our findings in other breast cancer dataset, we reanalyzed an scRNA‐seq dataset of human breast cancer T cells^[ [248]^73 ^] ([249]GSE110686), maintaining the original classification of 11 cell populations. Similarly, we employed metaTF to dissect the heterogeneity of these 11 cell clusters, and the results aligned with previous findings. CD8^+ T[EM] cells were stratified into two distinct clusters, with one subset specifically expressing adrenergic receptor ADRB2, while the other subset did not (Figure [250]S23a, Supporting Information). Differential TF activation analysis revealed that BCL6 exhibited specificity in activation within ADRB2^+ CD8^+ T[EM] cells, consistent with earlier conclusions. However, slightly divergent from previous findings, TBX21 and ATF3 were not exclusively activated within ADRB2^+ CD8^+ T[EM] cells (Figure [251]S23b,c, Supporting Information). In summary, our analysis confirms the existence of a previously unrecognized subset of CD8^+ T[EM] cells with specialized functionality, characterized by the expression of neural receptors. Figure 5. Figure 5 [252]Open in a new tab metaTF reveals TF activity heterogeneity in human breast cancer T cells. a) UMAP plots visualizing cluster assignments of human breast cancer T cells based on TF activity profile (left) and highly variable gene expression profile (right). Cells are color‐coded to represent different human breast cancer T‐cell populations, including T[EM] (effector/memory T cells), T[EX] (experienced T cells), T[N] (naive T cells), T[REG] (regulatory T cells), natural killer (NK) cells), and Proliferating (proliferating T cells). b) UMAP plot of CD8^+ T[EM] C1 and C2 marker genes expression. c) Heatmap plot showing the expression of the top 50 differentially expressed genes between CD8^+ T[EM] C1 and C2 cell populations. d) GO biological process enrichment results for the differentially expressed genes between CD8^+ T[EM] C1 (upper panel) and C2 (lower panel) cell populations respectively. e) Dot plot showing the significant expression of β[2]‐adrenergic receptor (ADRB2) and adrenoceptor alpha 2B (ADRA2B) primarily in the CD8^+ T[EM] C1 cell population. f) Dot plot indicating that BCL6 was specifically activated in the CD8^+ T[EM] C1 cell population. g) Spatial co‐mapping of CD8 and ADRB2 gene expression by spatial transcriptomics of a sample from a breast cancer patient recorded in a public dataset ([253]GSE195665, patient35, right panel). The left and middle panels depict expression levels in fresh frozen sections. h) Expression of CD8 and ADRB2 in breast cancer tissue, and the colocalization (Pearson's coefficient) between CD8 and ADRB2 is shown, n = 12. i) Upregulated targets of BCL6 in the epinephrine treatment group compared to the control group from in vitro cytological experiments. j) In vitro cytological experiments confirmed that candidate TFs and their targets are highly expressed in the epinephrine‐treated group through GSEA prerank analysis. 3. Conclusion We introduce here metaTF, a computational framework designed for analyzing TF activity from scRNA‐seq data. MetaTF offers a robust assessment of TF activities, independent of experimental methods, by integrating prior TF–target networks and expression‐based gene regulatory networks. The novelties of metaTF encompass several key aspects. First, metaTF starts by building comprehensive prior networks that compile TF–target interactions from various sources, including curated databases, ChIP‐seq analyses, and experimental perturbations (e.g., TF knockout/knockdown studies). This approach incorporates both strongly and weakly supported TF–target relationships, mitigates the biases associated with relying on a single data modality. By integrating diverse dimensions of TF biology, such as chromatin context and regulatory dynamics, metaTF provides a more holistic view of TF–target interactions that surpasses conventional methodologies dependent solely on structural motifs or DNA‐binding domains. Second, one of the critical advancements of metaTF is its ability to integrate and analyze multiple types of data, a capability that previous algorithms lacked. MetaTF employs a biologically driven network integration technique to merge the dynamic regulatory networks derived from scRNA‐seq data with the static prior networks. This integration enhances the accuracy of TF regulation portrayal by incorporating real‐time, cell‐specific regulatory interactions. This comprehensive integration approach enables metaTF to capture both canonical and cell‐specific regulatory events, providing deeper insights into gene regulatory networks. Moreover, by analyzing data from CD8^+ cytotoxic T‐cell differentiation, metaTF has not only demonstrated exceptional performance in analyzing gene regulatory networks but has also shown its ability to predict the effects of TF perturbations with remarkable accuracy, emphasizing that it provides an effective and applicable solution for quantitative analysis of single‐cell‐based CRISPR screening data. Certainly, limitations include reduced performance with higher dropout rates and a focus on inferring activated rather than repressed regulatory edges. Potential solutions involve assigning greater weight to prior knowledge and utilizing methods like PCOR or scLink, which support the calculation of negative edges. Although metaTF demonstrated superior performance over SCENIC and DoRothEA in identifying perturbation‐affected target genes for most TFs, discrepancies arose in regulon composition, particularly for TFs like FOXQ1, where nondifferentially expressed targets (e.g., RBBP5,^[ [254]^74 ^] SOX12,^[ [255]^75 ^] and SIRT1 ^[ [256]^76 ^]) reduced metaTF prediction accuracy (Figure [257]2a; Figure [258]S24, Supporting Information). These targets likely reflect regulatory complexities, such as epigenetic modifications or super enhancers, highlighting the necessity for metaTF to incorporate multiomics data to achieve more robust inference of TF activity. Additionally, we are actively compiling and curating similar TF regulatory networks for other model organisms, such as Drosophila^[ [259]^77 ^] and zebra fish,^[ [260]^78 ^] to expand the application of metaTF. In our study, we used 10× Genomics scRNA‐seq data from melanoma cultures to evaluate how low coverage affects metaTF performance. We simulated synthetic dropout in cells treated with SOX10 knockdown or nontargeting Small Interfering Ribonucleic Acid (siRNA). As dropout rates increased, both the number of detected genes and metaTF performance decreased. When the median number of detected genes per cell fell below 2500, metaTF's predictive performance markedly declined, and its accuracy was around 50% when this number dropped below 2000. This highlights the challenges of low‐coverage data and suggests that imputation methods like MAGIC^[ [261]^79 ^] and DeepImpute^[ [262]^80 , [263]^81 ^] could be useful for improving data quality and metaTF performance in such scenarios. Cancer represents an intricate and exceedingly heterogeneous class of diseases, and this heterogeneity presents itself in various manifestations. First, patients with the same type of cancer often exhibit varying prognostic outcomes and responses to treatment. Second, within the tumor microenvironment of an individual cancer patient, both the composition and biological behaviors of cancer cells and stromal cells undergo dynamic changes as the disease progresses. TFs play pivotal roles in orchestrating how cancer cells respond to changes in their microenvironment. Notably, the solid tumor microenvironment is usually characterized by hypoxia and cancer‐associated inflammation, as elucidated in previous studies.^[ [264]^82 , [265]^83 , [266]^84 ^] Hypoxia‐induced factor 1α (HIF1α) is a key TF that mediates cancer cells' adaptation to hypoxia stimuli. Interestingly, it is noted that the mRNA level of HIF1α remains largely unchanged during hypoxia compared to normoxia. In contrast, HIF1α protein is hydroxylated and degraded under normoxia, but rapidly accumulates and forms a heterodimer with HIF1β to regulate gene expression in response to the hypoxic microenvironment.^[ [267]^85 ^] When stimulated by inflammatory mediators or cytokines secreted by immune cells in the tumor microenvironment, the inflammation‐associated TF Nuclear Factor kappa‐B (NF‐κB) in cancer cells is activated, leading to the upregulation of genes that regulate cancer proliferation and metastasis. However, the activation of NF‐κB occurs through its translocation from the cytoplasm to the nucleus, rather than an increase in its mRNA or protein levels.^[ [268]^86 ^] These observations underscore the importance of investigating TF activities to understand the mechanisms driving cancer progression. In this study, we utilized metaTF to discern subtype‐specific activated TFs within various molecular subtypes of breast cancer. Our findings unveiled a plethora of subtype‐specific TFs with heightened activities in breast cancer. Delving deeper into these hitherto overlooked TFs will not only shed light on the regulatory mechanisms underlying diverse clinical subtypes and stages of breast cancer but also provide promising targets for therapeutic interventions in the management of breast cancer. Additionally, it is noteworthy that in addition to the inherent heterogeneity observed in cancer cells, immune cells infiltrating the tumor microenvironment also exhibit dynamic and diverse characteristics. These tumor‐infiltrated immune cells can lose their antitumor efficacy, and, in some cases, even adopt procancer functions within the tumor microenvironment.^[ [269]^86 , [270]^87 , [271]^88 , [272]^89 ^] In light of these considerations, we propose that delineating cell identities based on TF activity profiles offers distinct advantages over conventional strategies that rely on gene expression profiles when investigating the functional roles of immune cells. Therefore, utilizing metaTF can facilitate the discovery of new subpopulations, essential subtype‐specific regulators, and TF‐to‐TF regulatory circuitry. In this study, metaTF uncovered a previously unrecognized subset of CD8^+ T[EM] cells characterized by the expression of neural receptors within the microenvironment of breast cancer. These cells exhibit distinct transcriptional profiles and respond to neuro‐related signals, suggesting a potential role for neural signaling in modulating T‐cell function within breast cancer. Certainly, the exact mechanistic details of how these TFs influence immune responses and tumor behavior are not yet fully elucidated. Further investigations into the downstream effects and signaling pathways involved in CD8^+ T[EM] C1 cells would provide a deeper understanding of the mechanisms at play. Moreover, exploring clinical correlations and outcomes related to the presence and function of CD8^+ T[EM] C1 cells in breast cancer patients can enhance the clinical relevance of these findings. In addition, longitudinal studies tracking the dynamics of CD8^+ T[EM] C1 cells over time during cancer progression and in response to treatments would provide insights into their role in disease progression and therapy resistance. In this study, metaTF uncovered a previously unrecognized subset of CD8^+ T[EM] cells characterized by the expression of neural receptors within the microenvironment of breast cancer. These cells exhibit distinct transcriptional profiles and respond to neuro‐related signals, suggesting a potential role for neural signaling in modulating T‐cell function within breast cancer. Catecholamines bind to adrenergic receptors on immune cells, including T cells, initiating complex and diverse signaling pathways. Continuous stimulation of nor‐adrenaline in the tumor microenvironment triggers the canonical pathway of β2‐ARs expressed in T lymphocytes.^[ [273]^90 ^] β2‐AR triggers a cascade involving G protein dissociation, Cyclic Adenosine Monophosphate (cAMP) production, and Protein Kinase A (PKA) activation, which subsequently modulates key signaling molecules like Lck and ZAP70. This ultimately impacts T‐cell activation, proliferation, and function.^[ [274]^91 ^] On the other hand, β2‐AR upregulates the expression of checkpoint receptor PD‐1, which further inhibits CD28‐mediated T‐cell activation signaling.^[ [275]^92 ^] Additionally, the adrenergic system can modulate T‐cell apoptosis in a PKA‐independent manner by engaging the Src family tyrosine kinase Lck.^[ [276]^93 ^] These studies suggest that β2‐AR activation in T cells generally suppresses their immunosurveillance functions. Our study using metaTF on scRNA‐seq breast cancer datasets found that CD8^+ T[EM] cells could be divided into two clusters based on TF activity, with the ADRB2^+ CD8^+ T[EM] subset showing higher expression of activation and proliferation markers and genes related to immune responses, and validation experiments demonstrated that ADRB2^+ CD8^+ T[EM] cells respond to neuro‐related signals by activating key transcription factors such as BCL6, TBX21, and ATF3, along with their downstream targets. Considering that BCL6 has been recognized to play important roles in generation and proliferation of CD8^+ memory T cells,^[ [277]^94 ^] TBX21 may be necessary for the terminal differentiation of CD8^+ T cells,^[ [278]^95 ^] and ATF3 may promote the infiltration of functional CD8^+ T cells,^[ [279]^96 ^] thus we believed that we identified a novel subtype of CD8^+ T cells and activation of β2‐AR signaling by epinephrine may activate the cytotoxic activity of this type of CD8^+ T cells. Certainly, the exact mechanistic details of how these TFs influence immune responses and tumor behavior are not yet fully elucidated. Further investigations into the downstream effects and signaling pathways involved in CD8^+ T[EM] C1 cells would provide a deeper understanding of the mechanisms at play. In summary, metaTF provides an integrated framework to reconstruct GRNs and analyze TF activity from scRNA‐seq data. This framework will aid biologists in elucidating the transcriptional regulation of cell identities, states, and functions. Continued advancements in network inference and activity analysis methods will further strengthen the interpretation of single‐cell genomics data. 4. Experimental Section The MetaTF Workflow The metaTF workflow consists of five main steps: data preprocessing, GRN inference, network integration, regulon identification, and TF activity estimation. The standard protocol is as follows: 1. Data Import: The raw count matrix from multiple sources was summarized into a SingleCellExperiment object, subsequently, log2 normalization and filtration were performed to remove genes expressed in fewer than five cells usually. 2. GRN Inference: PUIC, an entropy‐based method, was used to infer regulator–target relationships by quantifying the unique contribution from the mutual information between each gene triplet.^[ [280]^97 ^] Suppose a target gene T and a vector of regulator genes are given. Then, the relationships between R and T can be estimated with MI. However, one target gene can be simultaneously regulated by multiple regulators, indicating that the MI between every two R and T pairs may be dependent. Therefore, it was needed to decompose the partial information that was uniquely contributed by the indicated regulator. Here, the non‐negative decomposition of multivariate information (NDMI) algorithm^[ [281]^98 ^] was utilized to decompose the MI into unique (Unq), redundant (Rdn), and synergistic (Syn) information. Specifically, the simplest form of NDMI with only three variables was considered: one target gene T, and two regulator genes {R [1], R [2]}. The MI I(T;  R [1], R [2]) can be decomposed as [MATH: IT;R1,R2=UnqT;R1+UnqT;R2+RdnT;R1,R2< mtd columnalign="right">+SynT;R1,R2 :MATH] (1) where the MI between T and R [1] is [MATH: IT;R1=UnqT;R1+RdnT;R1,R2 :MATH] (2) And the same with R [2] [MATH: IT;R2=UnqT;R2+RdnT;R1,R2 :MATH] (3) Here, a gene regulatory network was considered with n genes. Given a pair of genes R and T, there are n − 2 possible partners [MATH: R2=P=< /mo>{P1,P2,,< msub>Pn2} :MATH] involved in the triple gene pairs. Since the MI between R and T is unaffected by any choice of a third partner [MATH: P={P1,P2,,< msub>Pn2} :MATH] , the unique information Unq(T;  R) could accurately measure the individual contribution of R to T. Therefore, the total PUIC of all n − 2 triple gene pairs was used to estimate the importance of a regulator R to a target T as [MATH: PUCR;T=i=< /mo>1n2UnqpiR;TIR;T :MATH] (4) To calculate the unique information between regulator R and target T within each triple gene pair, the redundancy should first be measured by specific information I [spec](T  =  t;  R) as [MATH: IspecT=t;R=rRpr|tlog1ptlog1pt|r :MATH] (5) where [MATH: T={t1,t2,,< msub>tk} :MATH] and [MATH: R={r1,r2,,< msub>rk} :MATH] are nonempty subsets of target T and regulator R, respectively. In the context of gene expression data, t and r represent the discrete bins in which a gene's expression value fell in. In this research, the number of bins (k) of each gene was set as the nearest integer of the square root of the cell number. Then, the redundancy information of the triple gene pair was calculated with one regulator R, one partner P, and one target T by comparing the specific information from each bin within {R, P} about each bin t of target T as [MATH: RdnT;R,P< /mfenced>=i TptminR,PIspeT=t;R,P< /mfenced> :MATH] (6) Next, the MI between regulator R and target T was be calculated as [MATH: IR;T=HR+HTHR,T< /mrow> :MATH] (7) H(R) or H(T) represents the entropy that quantifies the uncertainty of the gene expression level as [MATH: HR=r Rprlogpr :MATH] (8) To calculate the frequency r in each bin within {R}, the gene expression was first dispersed into [MATH: k=m :MATH] bins according to the range of expression levels, while m represents the total number of cells and r represents the number of cells falling into each bin. It was noticed that the MI was symmetric but the redundancy was asymmetric, so the PUC score was an asymmetric measure that was suitable for transcriptional regulation since this is a directional process. However, for a pair of genes that exist both in the regulator and target genesets, the final score was calculated as the average PUC in both directions, as follows [MATH: PUCR,T< /msub>=PUCR;T+PUCT;R< mn>2,RT< /mtr>PUCR;T,RT :MATH] (9) The final PUC score for each gene pair was further normalized based on its average cumulative distribution in the regulator and its target set as [MATH: WR,T =FRPUCR,T< /msub>+FTPUCR,T< /msub>2 :MATH] (10) where the F[R] (PUC[ R,T ]) is the cumulative distribution of all PUC scores involving the regulator R, F[T] (PUC[ R,T ]) is the cumulative distribution of all PUC scores involving the target T, and W [R,T ]represents the weighted regulatory relationships between regulators and targets. The weighted GRN was constructed for subsequent analysis. Here, classic information‐theoretic approaches use the on‐off binary approximation, which is appropriate for scRNA‐seq data given that it mostly yields data about transcript presence or absence, on–off binary approximation‐based methods^[ [282]^99 , [283]^100 ^] were also incorporated into the single‐cell GRN inference framework. 1. Prior‐Network Construction: The TF–target relationships inferred from PUIC algorithm are only based on gene expression, they might include many false‐positive targets. To reduce the effect of the false‐positive connections, a prior TF–target network (knowledge‐based information) was constructed. TFs were obtained from the HumanTFs website^[ [284]^101 ^] ([285]http://humantfs.ccbr.utoronto.ca/) and constructed the human prior TF–target networks based on four main resources: manually curated datasets, ChIP‐seq binding datasets, TF knockout experiment, and tissue‐specific transcriptional regulatory networks (TRNs). 1) The manually curated TF–target relationships were collected from the following databases and literature: TRRUST,^[ [286]^102 ^] INTACT,^[ [287]^103 ^] TRANSFAC,^[ [288]^104 ^] FANTOM4,^[ [289]^105 ^] TFe,^[ [290]^106 ^] MSIGDB,^[ [291]^107 ^] PAZAR,^[ [292]^108 ^] TFactS,^[ [293]^109 ^] HTRIDB,^[ [294]^110 ^] TRED,^[ [295]^111 ^] TRRD,^[ [296]^112 ^] and Neph2012.^[ [297]^113 ^] 2) The ChIP‐seq associated TF–target networks were collected from two ChIP‐seq related databases (ChEA^[ [298]^114 ^] and ENCODE^[ [299]^115 ^]) and two motif‐based databases (HOCOMOCO^[ [300]^116 ^] and JASPAR^[ [301]^117 ^]) by predicting the TF binding sites. 3) The TF knockout/knockdown information was downloaded from the KnockTF^[ [302]^112 ^] database. First, the datasets that knockout/knockdown TFs were significantly downregulated (P < 0.05 and log‐transformed fold changes (logFC) < −1) compared to the control group were only retained. Then, the differentially expressed genes (P < 0.05 and |logFC| > 1) were regarded as targets for the knockout/knockdown TF, and the weight of interactions was calculated as log[10] (P). 4) Tissue‐specific and blood‐specific TF‐target networks were constructed based on 32 tissue datasets^[ [303]^118 ^] and six blood datasets ([304]https://zenodo.org/record/8355476), respectively. 2. Regulon Identification: First, the prior TF–target networks and the weighted GRN were integrated using the following equation [MATH: WR,T =11+< msup>ea0+i=1n< mrow>aiwR,Ti :MATH] (11) 1. TF Activity Estimation: The above filtered regulons were used to calculate the relative enrichment score (VIPER^[ [305]^17 ^] by default) against the log2‐normalized expression data and integrate the TF–target networks. VIPER is a statistical framework that was developed to estimate protein activity from gene expression data using enriched regulon analysis performed by the algorithm aREA. It requires information about interactions between a protein and its transcriptional targets and the likelihood of their interaction. Here, the log2 normalized expression data and integrated TF target network were inputted to VIPER to generate the final TF activity matrix. Moreover, several functions were added to the metaTF package for statistical analysis and visualization, such as cell‐type‐specific activated TF identification, regulon pathway enrichment analysis, and data visual aids (dot plot, Radviz plot, Sankey diagram, and heatmap). where n represents the number of sources, a^i controls the importance of the source i ^th, a ^0 is a constant parameter, and [MATH: wR,Ti :MATH] denotes the weight of the TF–target in source i ^th. Moreover, the mouse prior TF–target networks in the current research were transformed from the human networks using orthologous genes. Then, a normal distribution was fitted to the weights of targets for each TF in the integrated TF–target network, and the truncation value q = 0.01 was used to infer the threshold for these targets. For each TF, only targets with weights greater than the threshold were retained as a gene list and inputted together with the previously inferred GRN into the fgseaMultilevel method (fgsea, [306]https://github.com/ctlab/fgsea/) to do preranked gene set enrichment analysis. The leading edge of each enrichment result was considered as positively regulated targets for each TF. Finally, TFs and their positively regulated targets were identified as regulons. Data Processing for Prior‐Network Construction The files “trrust_rawdata.human.tsv” and “edge.GoldStd_TF.tbl.txt” were downloaded from TRRUST and FANTOM4, respectively. From INTACT, a curated dataset, specifically focusing on human protein–DNA interactions, was acquired. From TFe, “Targets” and “interaction” were taken as input content and download relevant data. From TFactS, the “Catalogues.xls” file, which only preserves the interaction pairs of human species and contains relevant literature, was downloaded. From HTRIDB, only the results detected by small and medium‐sized technologies such as chromatin immunoprecipitation and linker chromatin immunoprecipitation were retained. For TRED, the BiPAX file was downloaded from RegNetwork and relevant TF interaction information was extracted. For TRRD, it was ultimately extracted from the “Catalogues.xls” file of TFactS. From MSIGDB, the regulatory target gene set collection was downloaded. For Neph2012, it was downloaded directly from its R package. From TRANSFAC, the transcription factor regulatory gene sets were downloaded. For PAZAR, it was downloaded from the ORegAnno database. Finally, all the interaction relationships together without redundancy were integrated. The integration of the prior associations between transcription factors and target genes from the aforementioned sources involved the utilization of MI scores for weighting. These scores, which were normalized weighted counts, take into account independent interaction evidence and relevant experimental methods to quantify the relationships accurately [MATH: WeightMI=Kp×Spn+Km×Smcv+Kt×StcvKp+Km+Kt :MATH] (12) Here, the weights in three dimensions, namely number of publications, experimental detection method, and interaction type, are denoted as K [p],K [m,] and K [t] respectively. Experimental methods are typically classified into three groups: small and mid‐scale techniques such as chromatin immunoprecipitation, concatenate chromatin immunoprecipitation, Cytosine‐phosphate‐Guanine (CpG) chromatin immunoprecipitation, DNA affinity chromatography, DNA affinity precipitation assay, DNase I footprinting, electrophoretic mobility shift assay, southwestern blotting, streptavidin chromatin immunoprecipitation, surface plasmon resonance, yeast one‐hybrid assay; high‐throughput techniques like ChIP‐chip or ChIP‐seq, and unknown. Interaction types include physical interaction, direct interaction, interaction, activation, inhibition, and unknown. The merged ChIP‐seq binding peaks were obtained from ReMap. Notably, only regulatory proteins excluding transcription factors were considered in this analysis. Using the bedtools closest function, each binding site was matched with the nearest transcription start site (TSS) for every TF. In cases where genes had multiple transcripts, the closest binding site–transcript pair was selected. This enabled the assignment of each binding site to a gene, allowing for the possibility of a gene having no or multiple binding sites for a given TF. Furthermore, a score ranging from 0 to 1 was assigned to each binding site–gene pair based on the distance between the binding site and the TSS [MATH: ChIpTFG=i< mi mathvariant="normal">edi k×DTF :MATH] (13) Here D [TF] stands for the median distance separating the TSS and binding sites associated with all TFs, whereas d[i] refers to the distance between the binding site within a TF and the TSS of the gene g. The score attributed to a TF target is the sum of the weights assigned to all binding sites of that TF along with the TSS of the target. The analysis involves scanning TFBS in the promoter sequences of each transcript using the human genome reference sequence (GRCh38 version), encompassing 1000 bp upstream (5′ end) and 200 bp downstream (3′ end). Position Weight Matrix (PWMs were sourced from the HOCOMOCO and JASPAR repositories. Employing the Find Individual Motif Occurrences (FIMO) tool in the Multiple EM for Motif Elicitation (MEME) software suite (MEME's FIMO) motif discovery tool with default parameters, putative TFBS were identified in the promoters, focusing on FIMO predictions with a P‐value of <0.0001 and removing duplicate matches. It was then proceeded with the annotation of conservation and epigenetic regulatory properties related to TFBS. Essential phase Cons and phyloP scores were sourced from the CellBase database for this purpose. The final weight calculation involved multiplying the phasCons score by the phyloP score (with −log(P) value)) while taking the square root into account. Performance of GRN Inference Methods To evaluate the performance of GRN inference methods on single‐cell data, CROP‐seq data of HEK293T cells were downloaded from the Gene Expression Omnibus (GEO, [307]GSE92872^[ [308]^36 ^]). CROP‐seq enables pooled CRISPR knockout screening of several TFs simultaneously. Ideally, it was considered that the targets of the TFs were activated in wild‐type (WT) cells but not in perturbed cells. To ensure successful perturbations in the CROP‐seq experiments, samples were retained only when targeted TF was significantly downregulated in perturbed cells. Significant differential expression between stimulated and unstimulated cells was determined using a two‐tailed T‐test, and only the samples where the target TF was significantly downregulated (P < 0.05 and logFC < −1) were retained. A total of 1143 cells including 14 target TFs were retained for further evaluation (Figure [309]S4f, Supporting Information). Next, genes expressed in <10 cells were filtered out. Three different methods, PUIC, GENIE3,^[ [310]^27 ^] and PCOR,^[ [311]^28 ^] were used for GRN inference. For GENIE3, the above 14 target TFs were only used as input regulators considering the computational time cost. To evaluate these three methods, the targets of each TF were first pre‐ranked according to their edge weights, and then the area under the recovery curve (AUC) was calculated using independent gene sets from DoRothEA^[ [312]^15 ^] (ABCDE) as the ground truth. The AUC value of each TF reflects the enrichment of its targets in inferred GRNs. Additionally, to evaluate computational time consumption, expression data were simulated by subsampling different numbers of genes (100, 200, 400, 600, 800, and 1000) within the 1143 cells. The running time was then calculated for GRN inference steps using each simulated expression matrix. The results showed that PUIC outperformed the two other methods in over half of the TFs (Figure [313]2f). Meanwhile, PUIC is much more efficient than GENIE3 (Figure [314]S4h, Supporting Information). Performance of the Integrated TF–Target Network To evaluate the performance of regulons from different sources, regulons were collected from the above resources in multiple ways: downloaded from DoRothEA (DoRothEA regulons), generated from RcisTarget (RcisTarget regulons), prior‐network and integrated TF‐target network from prior network and the weighted GRN. For DoRothEA regulons, only human regulons named “dorothea_hs” were used. RcisTarget regulons were obtained using the simplified SCENIC workflow described above. To generate GRN‐based regulons, GRNs were first inferred from single‐cell CRISPRi data using the PUIC method with all of the genes as input, and then ARACNE^[ [315]^119 ^] was used to remove the weakest connections within the GRNs to generate regulons. Integrated regulons were generated by combining the prior pan‐tissue network and GRNs inferred from single‐cell CRISPRi data of human ESCs.^[ [316]^31 ^] Only 40 TFs were retained, corresponding to TFs perturbed in the CRISPRi experiment. TF activities for the above TFs were estimated using the standard metaTF pipeline. Performance was evaluated by calculating AUC using the TF activities estimated as described above. Paired sample T‐tests were performed on the AUC values to determine the difference between any two groups of regulons (Figure [317]S4j, Supporting Information). The results showed that the integrated regulons performed better (an average AUROC of 0.62) than the regulons from other sources (Figure [318]S4i, Supporting Information). It was also noticed that the regulons directly downloaded from the database without considering the expression data only performed slightly better than random levels (an average AUROC of 0.52). Performance of MetaTF To evaluate the performance of metaTF on TF activity estimation, the activity scores were transformed into a binary setup based on the perturbation type of the experiment (knockout/overexpression). For knockout experiments, cells with targeted TF knockout were labeled as the negative class, while control cells were labeled as the positive class. For overexpression experiments, the class labels were reversed. The receiver operating characteristic (ROC) and AUC analyses were then performed using the R package ROCR^[ [319]^120 ^] (version 1.0‐11). Using the positive and negative classes, the true positive rate (TPR) and false‐positive rate (FPR) were calculated at different thresholds of TF activity to generate the ROC curves (Figure [320]S1c, Supporting Information). Three commonly used single‐sample activity estimation methods including VIPER, ssGSEA, and AUCell, were benchmarked using the same integrated regulons. It was found that both VIPER and ssGSEA showed significantly higher performance than AUCell (Figure [321]S4k, Supporting Information). Implementation of Simplified SCENIC Workflow To efficiently infer the GRNs using the SCENIC^[ [322]^18 ^] workflow, a protocol that contains the main SCENIC steps was implemented. In the simplified SCENIC workflow, GENIE3 (version 1.14.0) was first used for GRN reconstruction with potential TFs. Next, the targets of each TF were preranked based on their edge weights in the GRNs, and the top‐ranked (e.g., 5%) targets of each TF were selected as putative regulons. The putative regulons were further trimmed using the R package RcisTarget^[ [323]^121 ^] (version 1.12.0) combined with cis‐regulatory DNA‐motif information downloaded from [324]https://resources.aertslab.org/cistarget/. By default, the motif database and TF annotation used were from version 9 of cistarget ([325]https://resources.aertslab.org/cistarget/, hg38 and mm10, with a distance 500 bp Up 100 Dw for humans and mice, respectively). All enriched genes were retained only if the TF of the putative regulon was identified as either high or low confidence. Then, the enriched targets from different motifs of the same TF were merged as the final regulons. The TF activity was estimated using the R package AUCell (version 1.14.0) with the final regulons against the log2‐normalized expression data. Finally, the above steps were summarized and implemented as a function named “runSCENIC” in the R package metaTF. Preprocessing of Single‐Cell CRISPRi Data To evaluate the performance of TF regulon analysis methods, scRNA‐seq‐based CRISPRi screening data on human ESCs were retrieved from GEO ([326]GSE127202^[ [327]^31 ^]). Samples with fewer than ten cells and cells with multiple guide RNAs were removed. A T‐test was then performed between knockout and control samples, leaving only samples with significant downregulation (P < 0.05 and logFC < −1; Figure [328]S2a, Supporting Information). After preprocessing, an expression matrix of 4325 cells containing 40 targeted TFs across 106 samples was retained for further analysis. The overall performance of metaTF was compared with SCENIC and DoRothEA on CRISPRi data with 40 target TFs. MetaTF identified all the target TFs and performed best in nearly all TFs (Figure [329]2a). To elucidate the pivotal contributions of two key steps in inferring TF activity using metaTF, the activities of 35 TFs were recalculated by separately removing the steps involving PUIC for GRN calculation and the prior network. The findings revealed that, compared with PUIC alone, the inclusion of the prior network resulted in 32 TFs achieving higher AUROC values, indicating that the inclusion of the prior network enables metaTF to achieve better performance. Simultaneously, when excluding the PUIC calculation of the GRN step and retaining only the prior network, metaTF achieved results comparable to SCENIC (Figure [330]S5a,b, Supporting Information). In Figure [331]2a, SOX17 has AUROC values much higher than other factors. The estimation of SOX17 activity using metaTF and SCENIC indicated that the knockout of SOX17 led to a significant decrease in the expression of its target genes, including GATA4, GATA6, FOXA2, and SOX7 (Figure [332]S5c, Supporting Information). The downregulation of these TFs subsequently affects the expression of other downstream genes, leading to substantial changes at the transcriptional level. These changes are particularly beneficial for accurately estimating SOX17 activity, thereby enhancing the precision of SOX17 activity estimation. Consistent with previous analysis of human single‐cell CRISPR screening data, the prior knowledge‐based network was replaced with a mouse TF‐target network to assess the activity of these four TFs in perturbed cells and control cells from mouse brain cortical development^[ [333]^122 ^] using three methods: metaTF, SCENIC, and DoRothEA. The AUROC values were calculated for each method (Figure [334]S7, Supporting Information). The results showed that metaTF achieved the highest AUROC values for Foxg1, Nr2f1, and Tcf4, indicating that the TF regulons identified by metaTF in mouse can reflect TF activity with a relatively high degree of accuracy. TF Activity Inference in Bone Marrow Plasma Cells with MetaTF and T2T‐CHM13 Alignment The T2T‐CHM13 reference genome, a significant improvement over GRCh38,^[ [335]^123 ^] was used to analyze scRNA‐seq data from healthy human bone marrow plasma cells.^[ [336]^40 ^] Sequencing reads were aligned to T2T‐CHM13 to generate a gene expression matrix, which was filtered based on original study criteria. TF activities were inferred using metaTF with a blood‐associated regulatory network as the prior. Highly variable genes and TFs (top 500 by coefficient of variation (CV)) were selected for Uniform Manifold Approximation and Projection (UMAP) clustering. Gene expression clustering confirmed IgA and IgG plasma cell populations, though isotype segregation was less distinct (Figure [337]S10a, Supporting Information). TF activity analysis revealed three major clusters (Figure [338]S10b, Supporting Information), with specific TFs showing elevated activity (Figure [339]S10c, Supporting Information). These findings support the hypothesis that bone marrow plasma cells retain tissue‐of‐origin transcriptional programs, reflecting their diverse tissue origins. Preprocessing of Single‐Cell CRISPR Data with Double‐Knockout TFs In CD8^+ Cytotoxic T‐Cell Differentiation As described by Zhou et al.[340] ^21 , they re‐engineered a dual‐guide, direct‐capture lentiviral single guide RNA (sgRNA) vector to generate a modified Ametrine‐expressing retroviral vector that effectively transduced primary CD8^+ T cells. The dataset was downloaded from single‐cell CRISPR screens pertaining to the differentiation of CD8^+ cytotoxic T cells. Data from cells were retained with only one type of sgRNA, those with two types of sgRNAs, as well as data from control cells. sgRNA with fewer than ten cells were removed. A T‐test was then performed between knockout and control samples, leaving only samples with significant downregulation (P < 0.05 and logFC < −1). After preprocessing, an expression matrix of 53 774 cells containing 109 single‐targeted and 426 double‐targeted TFs was retained for further analysis. The overall performance of metaTF with DoRothEA on CRISPRi data was compared with 109 single‐targeted TFs. For double‐targeted TFs, the target TFs metaTF activity scores were compared with the control cells. Evaluating the Robustness of TF Activity Estimation Methods through Synthetic Dropout Simulation Single‐cell CRISPRi screening data were used to evaluate the robustness of TF activity estimation methods through synthetic dropout simulation. The expression matrix was randomly zeroed out at various dropout ratios (5%, 10%, 20%, and 30% of total values) to create sparser matrices. For each simulated expression dataset, TF activities were inferred using metaTF, SCENIC, and DoRothEA. Specifically, metaTF integrated the prior‐based network with the GRN‐based network using the “integraNets” method with parameters “max_target_num = 2000” and “maxInter = 10.” For SCENIC, the “runSCENIC” method was used with default parameters. For DoRothEA, the “run_viper” method with built‐in data from the DoRothEA package was used as the default parameter. Three gene activity inference methods (VIPER, ssGSEA, and AUCell) were assessed using the same regulons identified by metaTF. For each method and dropout level, the AUC value of each TF was calculated. The impact of low coverage on metaTF performance was further assessed using 10× Genomics scRNA‐seq data from melanoma cultures ([341]GSE134432^[ [342]^32 ^]). Synthetic dropout was simulated to evaluate metaTF's robustness under varying dropout levels. Single melanoma sample MM074 cells were treated with SOX10 perturbation (knockdown group: MM074_SOX10_72 h) or nontargeting siRNA (control group: MM074_NTC; Figure [343]S3a–c, Supporting Information). SOX10 activity was predicted in knockdown cells after applying random zeroing at dropout ratios from 5% to 50% (Figure [344]S3d, Supporting Information). MetaTF was used to infer SOX10 activities, integrating prior‐based and GRN‐based networks via “integraNets” with parameters “max_target_num = 2000” and “maxInter = 10.” As expected, with increasing dropout rates, the number of detected genes progressively decreases, and the performance of metaTF also declines gradually (Figure [345]S3e, Supporting Information). When the median number of detected genes per cell falls below 2500, there is a significant decline in the predictive performance of metaTF. Specifically, when the median number of detected genes drops below 2000, the accuracy of metaTF's predictions hovers around 50%. At this point, imputation methods (e.g., MAGIC,^[ [346]^79 ^] DeepImpute^[ [347]^80 ^]) can be employed to enhance data quality. Evaluating the Performance of MetaTF Using In Vivo Nkx2‐1 Knockout Data Droplet‐based scRNA‐seq data consisting of lung cells from Nkx2‐1 ^CKO/CKO; Aqp5 ^Cre ^/+ mutant mice and littermate controls were downloaded from GEO ([348]GSE129584^[ [349]^33 ^]). Since the Nkx2‐1 conditional knockout (cKO) primarily affected alveolar type 1 (AT1) cells, the analysis exclusively focused on epithelial cells, in which Nkx2‐1 plays a key role. t‐SNE analysis on mutant and control cells was first performed using the R package scater,^[ [350]^124 ^] clustering them into 28 clusters (Figure [351]S4a, Supporting Information). Clusters with over 50% of cells expressing Cdh1, an epithelial marker gene, were further retained (Figure [352]S4b, Supporting Information). This resulted in 2085 control and 2484 Nkx2‐1 ^CKO/CKO; Aqp5 ^Cre ^/+ mutant epithelial cells. The t‐SNE results showed that Nkx2‐1 was only expressed in WT cells, indicating a successful knockout of Nkx2‐1 in AT1 cells (Figure [353]S4c, Supporting Information). For the benchmarking process, TF activity matrices for these epithelial cells were calculated using standard metaTF, simplified SCENIC, and standard DoRothEA workflows. The ROC and AUC values for each method were calculated as previously described. Evaluating the Performance of MetaTF Using In Vivo Cebpa Knockout Data Massively parallel single‐cell RNA sequencing (MARS‐seq) data consisting of myeloid progenitors from Cebpa conditional knockout (Cebpa ^flox/flox Mx1‐Cre) and matching control (Cebpa ^flox/flox) mice were downloaded from GEO ([354]GSE72857^[ [355]^34 ^]). After quality control, a total of 1152 control and 2304 Cebpa mutant cells were retained for analysis (Figure [356]S4d, Supporting Information). An analogous benchmarking process to the Nkx2‐1 knockout data was employed, with the distinction being that the prior GRNs were based on “myeloid_leukocyte”‐specific data. The results showed that metaTF still performed better (AUROC of 0.68) than other tools (Figure [357]2b). Evaluating the Performance of MetaTF Using TF Overexpression Data Single‐cell Smart‐seq2 data from induced hemogenic endothelial cells (iHECs) with exogenous Runx1 and Hoxa9 overexpressions were downloaded from GEO ([358]GSE128738^[ [359]^35 ^]). After quality control, a total of 26 embryonic HECs and 65 iHECs were retained for analysis. An analogous benchmarking process to the Nkx2‐1 knockout data was employed, with the distinction being that the prior GRNs were based on blood‐associated data, and cells with TF overexpression were considered as the positive class when computing ROC and AUC values. Pathway Enrichment Analysis of Regulons To identify the biological pathways associated with each regulon, pathway enrichment analysis was performed using the Jaccard test.^[ [360]^125 ^] This test evaluated the similarity and statistical significance between the regulon gene set and pathway gene lists based on their shared genes. Notably, the initial enriched pathway terms for each regulon contained redundancies. To address this issue, Jaccard similarity coefficients between all pairs of enriched terms were calculated and then clustered into nonredundant groups using unsupervised hierarchical clustering. The enriched term with the smallest P‐value in each group was selected to represent that group. By default, the Reactome pathway database^[ [361]^126 ^] (version 77) covering 16 species was used for enrichment analysis. Only terms containing 10–2000 genes were included. This was allowed for inferring key biological pathways linked to each regulon in a nonredundant manner. Analysis of Single‐Cell RNA Sequencing Data during HSC Development The raw single‐cell Smart‐seq2 data from six sequential cell populations (of note, E12 and E14 FL HSCs were combined) during HSC development were sourced from the GEO datasets ([362]GSE153653 and [363]GSE67120) derived from the recently published study.^[ [364]^22 , [365]^127 ^] Initially, adaptor and low‐quality base trimming were performed through trimmomatic^[ [366]^128 ^] (version 0.36) using the parameters “LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 HEADCROP:20 CROP:101 MINLEN:101.” Subsequently, gene expression was quantified as counts using Salmon^[ [367]^129 ^] (version 1.3.0) with clean data derived from trimmomatic output, where reads were mapped to the mouse GENCODE^[ [368]^130 ^] transcriptome (mm10) and gene annotation (vM25, primarily assembled). Log normalization was performed using the “logNormCounts” function from the R package scater. TF activities were estimated with metaTF, using blood‐associated networks as prior network. The CV of all genes and TFs was then calculated using normalized counts or activity scores, and the top 500 highly variable genes/TFs were retained for t‐SNE analysis. To confirm the efficacy of TF activity profiles in more effectively classifying distinct cell types compared to gene expression profiles, a classic machine learning approach, random forest classification model, was employed on mouse embryonic HSCs cell populations. Separate models were fitted using TF activity profiles and gene expression profiles, and models were trained in Python package scikit‐learn with “RandomForestClassifier” method. Cell‐type specificity of TFs was determined at both the expression and activity levels using the “findAllCTSRegulons” method in the R package metaTF. TFs with adjusted P < 0.01 and cell‐type‐specific score (CSS) > 3 were considered significant cell‐type‐specific regulators. The “RadvizEnrichPlot” function in the R package metaTF was used to visualize cell‐type‐specific TFs, and the R package UpSetR^[ [369]^30 ^] (version 1.4.0) was used to generate Venn diagrams. Rela loss impairs HSC function and promotes hematopoietic stem and progenitor cell (HSPC) cycling. To confirm that the target genes of Rela are indeed activated in HSCs, the enrichment of these targets was estimated using an independent dataset with Rela knockout (TF knockout/knockdown data from mouse bone marrow HSCs,^[ [370]^38 ^] [371]GSE45755). Ideally, Rela knockout would lead to the downregulation of its target genes, so GSEA was performed using Rela’s target genes on logFCs calculated from Rela knockout and control samples. The R package fgsea was used to calculate the normalized enrichment score and adjusted P‐values. The enrichment plot showed that these target genes were significantly enriched in control samples (Figure [372]3g), indicating that the target genes of Rela identified with metaTF can be confirmed in other independent datasets. The same confirmation was also performed on another TF, Sp1, using microarray data with Sp1 knockout^[ [373]^39 ^] ([374]GSE52497). Analysis of Breast Cancer Epithelial Cell Single‐Cell RNA Sequencing Data Breast cancer patient scRNA‐seq data from the GEO ([375]GSE176087) were analyzed.^[ [376]^131 ^] Gene expression was log normalized, and cells with fewer than 500 expressed genes were filtered out, resulting in 13 062 genes across 25 605 epithelial cells. The Seurat v3.0.0 method was employed in R (v3.5.0) for data normalization, dimensionality reduction, and clustering. Specifically, to recluster epithelial lineages, cell signatures were utilized for breast epithelial subsets described by Lim et al.^[ [377]^132 ^] as input features. Subsequently, the FindIntegrationAnchors step was utilized to identify anchors, followed by alignment with 30 dimensions (IntegrateData step) within Seurat. The final clustering results were visualized using t‐SNE. TF activities were estimated using metaTF with stranded pipeline except that tissue‐associated networks were used as prior networks. The CV of each gene/TF was then calculated using normalized counts or activity scores, and only the top 500 highly variable TFs or genes were retained for dimensionality reduction analysis (t‐SNE). The cell‐type specificity of TFs at the expression and activity levels was estimated using the “findAllCTSRegulons” method in the metaTF package. TFs with adjusted P < 0.05, CSS > 0.3, and percent > 0.1 for each stage were considered significant. Three evaluation indicators (Between/within, Calinski–Harabasz index, 1/Davies–Bouldin index (1/DBI)) for clustering results were calculated to show that using TF activities can more closely cluster cell populations and separate different types of cells as much as possible. For gene set activity analysis in scRNA‐seq data, the AUCell method from the AUCell Bioconductor package was used to calculate AUCell scores. The raw single‐cell count matrix data were downloaded from GEO ([378]GSE180286). Then, all cells expressing <300 genes were removed, and >20% mitochondrial counts. Genes expressed in fewer than three cells were likewise removed. The Seurat default parameters were used unless stated otherwise. For the clustering of all cell types, 2000 variable genes were identified, and principal component analysis (PCA) was applied to the dataset to reduce dimensionality after regressing the percentage of mitochondrial genes. Data integration was performed using the Seurat functions FindIntegrationAnchors and IntegrateData. Clustering was conducted using the FindClusters function with 30 PCA components and a resolution parameter set to 1.2 and visualized using UMAP with the RunUMAP function. Cell types were annotated based on canonical cell‐type markers. Cells annotated as epithelial cells in each sample were extracted into a subset and separately re‐clustered, using the methods described above and the following parameters: n [features] = 2000, npcs = 20, and resolution = 0.5. Malignant epithelial cells were identified based on genome‐wide copy number profiles computed from the gene expression Unique Molecular Identifier (UMI) matrix using the Bayesian segmentation approach, CopyKAT^[ [379]^133 ^] (v1.0.5). Then, malignant epithelial cell types were annotated with scSubtype. Normal epithelial cell types were annotated based on canonical cell‐type markers. TF activities were estimated using metaTF with a stranded pipeline, except that tissue‐associated networks were used as prior networks. Validation on TCGA BRCA RNA‐Seq Dataset The RNA‐seq data from breast cancer patients were downloaded from the TCGA database ([380]https://portal.gdc.cancer.gov/), and subtyping was performed based on Berger et al.’s work^[ [381]^52 ^] (subtype PAM50). This dataset includes 575 LumA BRCA clinical subtype samples, 211 LumB BRCA clinical subtype samples, 198 Basal BRCA clinical subtype samples, 82 Her2 BRCA clinical subtype samples, 40 Normal‐like BRCA clinical subtype samples, 113 BRCA tumor adjacent normal tissue samples. Subsequently, the genesets of cancer cell‐specific activated regulons which were identified by metaTF in BRCA epithelial cell scRNAseq data were used as input for GSVA.^[ [382]^53 ^] GSVA scores were calculated, and differentially activated regulons were determined using the limma R package with the parameter “adjusted P‐value of <0.01” (one versus the rest). Analysis of Prostate Cancer Epithelial Cell Single‐Cell RNA Sequencing Data Human prostate cancer single‐cell RNA‐seq data ([383]GSE176031), excluding samples related to prostate epithelial organoids, were downloaded and epithelial cells were extracted while retaining the original cell type annotations. Subsequently, clustering analysis was performed on the five epithelial cell populations using metaTF and Seurat. In brief, gene expression was log‐normalized, and cells with fewer than 500 expressed genes were filtered out, resulting in 14 546 epithelial cells. TF activities were estimated using metaTF with a stranded pipeline, except that tissue‐associated networks were used as prior networks. The CV of each gene/TF was then calculated using normalized counts or activity scores, and only the top 500 highly variable TFs or genes were retained for dimensionality reduction analysis (UMAP). To validate that TF activity can more effectively distinguish different epithelial cell populations, all epithelial cell TF activity profiles, gene expression profiles, and cell type annotations were inputted into a random forest classification model. After model training (using the RandomForestClassifier method from the scikit – learn Python package and specifying ‘max_depth = 5′ as one of the parameters), precision and recall values were calculated for each cell population. Total RNA Extraction and Quantitative Real‐Time PCR Total RNA was extracted from tissues or cells with Trizol reagent (Invitrogen, United States). RNA was reverse‐transcribed into Complementary DNA (cDNA) using reverse Transcriptase M‐MLV (Takara, Dalian, China). The relative expression of genes was detected with Hieff Quantitative Polymerase Chain Reaction (qPCR) SYBR Green Master Mix (#11201ES08 Yeasen Biotechnology, Shanghai, China). Cell Cultures The human TNBC cell lines BT549 and Hs578T were cultured in Dulbecco's Modified Eagle Medium (DMEM) containing 10% fetal bovine serum (Procell Life Science & Technology, China.) at 37 °C in a 5% CO[2] humidified atmosphere. Transfection of siRNAs The siRNAs targeting 15 TFs and negative control were synthesized by Sangon Biotech (Shanghai, China). The sequences of these siRNAs are included in Table [384]S16 (Supporting Information). The transfection was mediated by jetPRIME transfection reagent (Polyplus‐transfection, France) according to the manufacturer's protocol. Cell Migration and Invasion Assay Transwell chambers (6.5 mm insert, 24‐well plate, 8.0 µm polycarbonate membrane, Corning Incorporated, Corning, NY, USA) with or without Matrigel were used to measure the invasion and migration ability of breast cancer cells in vitro. Briefly, BT549 and HS578T cells transfected with siRNAs were seeded in the upper chamber of transwell at the density of 5 × 10^4 cells (for the migration assay) or 8 × 10^4 cells (for the invasion assay) in serum‐free DMEM medium, and DMEM containing 10% fetal bovine serum was added to the lower chamber. After culturing for 24 h (for the migration assay) or 48 h (for the invasion assay), the upper chamber cells were wiped by using cotton swabs, and fixed with crystal violet. The number of cells across the filter was counted by microscopic imaging system. Cell Proliferation Assay Breast cancer cells (BT549 and Hs578T) were plated in 96‐well plates at a density of 5000 cells per well. At 0, 24, 48, and 72 h, 20 µL of 3‐(4,5‐Dimethylthiazol‐2‐yl)‐2,5‐diphenyltetrazolium bromide (MTT) reagent (BS186, Biosharp) was added to each well and mixed thoroughly. Following incubation for 4 h at 37 °C, Dimethyl Sulfoxide (DMSO) was mixed with the cell–MTT suspension for 10 s using a plate shaker, the conversion of substrate to chromogenic product by live cells was detected using a microplate reader to analyze cell viability. Analysis of Breast Cancer Lymphoid Cell Single‐Cell RNA Sequencing Data Two breast cancer patient scRNA‐seq datasets from the 10× Genomics ([385]https://www.10xgenomics.com/resources/datasets) were analyzed. The data were integrated in R using Seurat v3.1.1 with the following steps: i) identified 2000 integration anchors using Seurat::FindIntegrationAnchors, ii) integrated the objects with Seurat::IntegrateData, iii) scaled the data and regressed out the nCount_RNA using all genes, and iv) calculated the top 30 PCs for dimensional reduction. TF activities were estimated with metaTF using tissue‐associated networks as prior network. The CV values for all gene/TFs were calculated using normalized counts or activity scores, and then the top 2000 highly variable genes or the top 500 highly variable TFs were retained for UMAP analysis. Differentially expressed genes between CD8^+ T[EM] C1 and C2 populations were identified using Seurat::FindMarkers function with the parameters “min.pct = 0.3, log.fold.change = 0.3.” To examine the consistency of the findings in other breast cancer dataset, an scRNA‐seq dataset of human breast cancer T cells ([386]GSE110686) was reanalyzed, and the same processing steps were performed. For the epinephrine RNA‐seq data, reads were aligned to the human reference genome (Gencode^[ [387]^130 ^] v43) with HISAT2^[ [388]^134 ^] v2.0.52. Gene counts were derived with featureCounts^[ [389]^135 ^] v2.0.3. Differential expression analysis was performed using edgeR^[ [390]^136 ^] with the parameters dispersion = 0.4 and P < 0.05. Figures were generated in the R package ggplot2 ([391]https://github.com/tidyverse/ggplot2). The Extraction of Tumor‐Infiltrating Lymphocytes Tumor tissue samples were obtained by surgical excision and dissociated into single‐cell suspensions for 3 h using 1 mg mL^−1 collagenase IV. Density gradient centrifugation was performed using lymphocyte separation medium to separate lymphocytes from other cell types. Isolated lymphocytes were washed with Phosphate Buffered Saline (PBS) to remove residual cell debris and surface staining was performed with Fluorescein Isothiocyanate (FITC) anti‐human CD8 antibody (1:20 dilution, OKT8, Biolegend) at 4 °C in the dark for 30 min. After staining, cells were resuspended with PBS and filtrated through a mesh filter (40 µm), and then CD8^+ T lymphocytes were specifically isolated by BD FACSAria II cell sorter. Tumor‐infiltrating T cells were cultured in CST AIM V Medium (Gibco, Thermo Fisher Scientific) supplemented with 10% FBS (Gibco, Thermo Fisher Scientific) and Human Recombinant IL‐2 (500 units mL^−1, #78220, STEMCELL Technologies) for 15 days. The cells were treated with epinephrine (0.1 ng mL^−1, Sigma‐Aldrich) for 24 h followed by RNA sequencing.^[ [392]^71 ^] Immunofluorescence Staining and Confocal Laser Scanning Microscopy Human breast cancer and tumor‐infiltrating T cells were fixed with 4% cool paraformaldehyde for 10 min and blocked with 1% Bovine Serum Albumin (BSA) for 30 min. After washing in PBS, breast cancer tissues were stained with NF‐H Polyclonal antibody (1:100 dilution, 21471‐1‐AP, Proteintech) at 4 °C overnight in the dark, followed by FITC goat aAtirabbit IgG (H+L) (1:100 dilution, ZF‐0311, ZSGB‐BIO) and tumor‐infiltrating T cells were stained with anti‐ADRB2 antibody (1:100 dilution, AF6117, Affinity) at 4 °C overnight in the dark, followed by Rhodamine (TRITC) goat antiRabbit IgG (H+L) (1:100 dilution, AS040, ABclonal) at room temperature in the dark for 1 h. After three washes in PBS, the samples were stained with anti‐CD8 Mouse mAb (FITC Conjugate) (1:400 dilution, 55397S, Cell Signaling Technology) at room temperature in the dark for 3 h, and then nuclei were stained with 4',6‐Diamidino‐2‐Phenylindole (DAPI) (D3571, Invitrogen) for 5 min in the dark. Finally, the samples were sealed with sealing agent. The expressions of ADRB2 and CD8 in breast cancer tissues were detected by polychromatic immunofluorescence staining. Polychromatic immunofluorescence staining was performed by using the three‐color multiple fluorescent immunohistochemical staining kit (HYDS0023, Guduo Technology, China) based on the tyramide signal amplification (TSA) technique following the manufacturer's manual. Briefly, deparaffinize and rehydrate breast cancer tissue sections, and then quench endogenous enzyme activity by incubating 30 min in 0.3% hydrogen peroxide and block nonspecific binding sites with goat serum for 10 min. The sections were incubated with primary antibodies against CD8 (AF5126, Affinity) at 4 °C overnight and incubated with secondary antibody labeled with Horseradish Peroxidase (HRP) provided with the kit for 50 min, followed by GD‐520 fluorescent dye diluted by the signal amplification reagent provided with the kit at room temperature in the dark for 15 min. The steps above were repeated, including the following: antigen retrieval and endogenous peroxidase blocking, the sections were incubated with primary antibodies against ADRB2 (AF6117, Affinity) at 4 °C overnight and incubated with secondary antibody labeled with HRP provided with the kit for 50 min, followed by GD‐570 fluorescent dye diluted by the signal amplification reagent provided with the kit at room temperature in the dark for 15 min. Finally, sections were counterstained with DAPI, sealed with glycerol and imaged using a ZEISS LSM 900 confocal microscope. RNA Extraction, Library Construction, and Sequencing Total RNA was extracted using Trizol reagent (15596018, Thermo Fisher) following the manufacturer's procedure and the quantity and purity of RNA were detected by Bioanalyzer 2100 and RNA 6000 Nano LabChip Kit (5067‐1511, Agilent). After mRNA was purified by using Dynabeads Oligo (dT) (Thermo Fisher) and was fragmented into short fragments, and then the RNA fragments were reverse‐transcribed to create the cDNA by SuperScript II Reverse Transcriptase (18064014, Invitrogen), the products were amplified by PCR. Finally the high‐throughput sequencing (PE150) was performed on the Illumina Novaseq 6000 sequencer (LC‐Bio Technology Co., Ltd., Hangzhou, China). Cell‐Type Specificity Estimation To evaluate the cell‐type specificity of TFs (or genes), an entropy‐based measure was used that quantifies the similarity between a TF activity and another predefined pattern, which represents the extreme state in which the TF is only activated in an indicated cell type. The specificity metric mostly relied on the Jensen–Shannon divergence^[ [393]^137 ^] (JSD), which was calculated as follows [MATH: JSDg,g=Hg+g2Hg+Hg2< /mfrac> :MATH] (14) where g and g′ represent two discrete probability vectors (such as the TF activity score) with equal length n, and H represent the entropy of discrete probability vectors as [MATH: Hp=i< /mi>=1np< mi>ilogpi,p=p1,p< /mi>2,,pn,0pi1,andi=1npi=1 :MATH] (15) Then, the distance between two TF activity patterns, r ^1 and r ^2 was defined as [MATH: JSDdistr1,r< /mi>2=JSDr1,r< /mi>2 :MATH] (16) The CSS of a TF, [MATH: r=(r1,r2,,< msup>rm) :MATH] , in an indicated cell type was calculated as [MATH: CSSr=1JSDdistr,rc :MATH] (17) where r ^c represents a predefined pattern of the extreme case in which the TF is only activated in this cell type and was defined as [MATH: rc=r1c,< /mo>r2c,,rnc,s.t.rie=1,ic0,ic :MATH] (18) In the context of TF activity data, the i represent each cell and the n represent the total number of cells. Meanwhile, for TF activity scores with negative values, and these were normalized into ranges from 0 to 1 as [MATH: xxminxmaxxmin :MATH] . Next, to evaluate the significance of a TF that is specifically activated in an indicated cell type, the P‐value of TF was estimated using a bootstrap strategy. Specifically, for a TF in an indicated cell type, the cell labels were shuffled and the simulated CSS were calculated 10 000 times. Next, the P‐value was calculated as the percentage of the simulated CSSs that were greater than the actual CSS within all 10 000 simulated CSSs, and the Benjamini & Hochberg method was used to adjust the P‐values. Furthermore, the cell‐type‐specific methods were also suitable for gene expression. Statistical Analysis To compare the means of three or more independent groups (Figure [394]S4i,b, Supporting Information), a one‐way Analysis of Variance (ANOVA) was employed using the aov function from the stats package in R. If the overall ANOVA was significant (*P < 0.05), post‐hoc comparisons were performed using Tukey's Honestly Significant Difference (HSD) test to identify specific group differences. Thresholds for statistical significance are denoted by *P < 0.05, **P < 0.01, and ***P < 0.001. Conflict of Interest The authors declare no conflict of interest. Author Contributions Y.H., Y.Z., G.T., M.S., and P.T. contributed equally to this work. D.W., X.b.L., and Z.L. conceived the study and designed and supervised the study. D.W., X.b.L., Y.Z., and M.S. performed the experiments. Y.H., G.T., and P.T. performed bioinformatic analyses. D.W., X.b.L., Y.H., and Z.L. wrote the manuscript with input from all of the authors. Y.Y., X.Z., M.L., X.n.L., L.W., J.C., H.Z., Y.H., and Y.H. revised the manuscript. Supporting information Supporting Information [395]ADVS-12-e10745-s004.docx^ (50.9MB, docx) Supplemental Table 1 [396]ADVS-12-e10745-s008.xlsx^ (19KB, xlsx) Supplemental Table 2 [397]ADVS-12-e10745-s007.xlsx^ (10.4KB, xlsx) Supplemental Table 3 [398]ADVS-12-e10745-s019.xlsx^ (197.1KB, xlsx) Supplemental Table 4 [399]ADVS-12-e10745-s010.xlsx^ (17.6KB, xlsx) Supplemental Table 5 [400]ADVS-12-e10745-s016.xlsx^ (18.4KB, xlsx) Supplemental Table 6 [401]ADVS-12-e10745-s014.xlsx^ (11.4KB, xlsx) Supplemental Table 7 [402]ADVS-12-e10745-s009.xlsx^ (14.5KB, xlsx) Supplemental Table 8 [403]ADVS-12-e10745-s006.xlsx^ (295KB, xlsx) Supplemental Table 9 [404]ADVS-12-e10745-s021.xlsx^ (13.5KB, xlsx) Supplemental Table 10 [405]ADVS-12-e10745-s003.xlsx^ (10.7KB, xlsx) Supplemental Table 11 [406]ADVS-12-e10745-s015.xlsx^ (18.8KB, xlsx) Supplemental Table 12 [407]ADVS-12-e10745-s020.xlsx^ (22.5KB, xlsx) Supplemental Table 13 [408]ADVS-12-e10745-s012.xlsx^ (12.4KB, xlsx) Supplemental Table 14 [409]ADVS-12-e10745-s001.xlsx^ (22.1KB, xlsx) Supplemental Table 15 [410]ADVS-12-e10745-s017.xlsx^ (15.5KB, xlsx) Supplemental Table 16 [411]ADVS-12-e10745-s018.xlsx^ (11.4KB, xlsx) Supplemental Table 17 [412]ADVS-12-e10745-s011.xlsx^ (44.7KB, xlsx) Supplemental Table 18 [413]ADVS-12-e10745-s005.xlsx^ (48.5KB, xlsx) Supplemental Table 19 [414]ADVS-12-e10745-s013.xlsx^ (11.6KB, xlsx) Supplemental Table 20 [415]ADVS-12-e10745-s002.xlsx^ (2.2MB, xlsx) Acknowledgements