Abstract

   Cellular heterogeneity within cancer tissues determines cancer
   progression and treatment response. Single‐cell RNA sequencing
   (scRNA‐seq) has provided a powerful approach for investigating the
   cellular heterogeneity of both cancer cells and stroma cells in the
   tumor microenvironment. However, the common practice to characterize
   cell identity based on the similarity of their gene expression profiles
   may not really indicate distinct cellular populations with unique
   roles. Generally, the cell identity and function are orchestrated by
   the expression of given specific genes tightly regulated by
   transcription factors (TFs). Therefore, deciphering TF activity is
   essential for gaining a better understanding of the uniqueness and
   functionality of each cell type. Herein, metaTF, a computational
   framework designed to infer TF activity in scRNA‐seq data, is
   introduced and existing methods are outperformed for estimating TF
   activity. It presents the improved effectiveness in characterizing cell
   identity during mouse hematopoietic stem cell development. Furthermore,
   metaTF provides a superior characterization of the functional identity
   of breast cancer epithelial cells, and identifies a novel subset of
   neural‐regulated T cells within the tumor immune microenvironment,
   which potentially activates BCL6 in response to neural‐related signals.
   Overall, metaTF enables robust TF activity analysis from scRNA‐seq
   data, significantly enhancing the characterization of cell identity and
   function.

   Keywords: cell identity, scRNA‐seq, transcription factor activity,
   tumor immune microenvironment
     __________________________________________________________________

   MetaTF is a new computational framework that uses single‐cell RNA
   sequencing (scRNA‐seq) data to infer transcription factor activity with
   the aim to help in deciphering cells identity starting from a
   heterogeneous population, and apply it to characterize mouse
   hematopoietic stem cells and breast cancer epithelial cells, thus
   identifying a novel subset of neural‐regulated T cells within the tumor
   microenvironment.

   graphic file with name ADVS-12-e10745-g003.jpg

1. Introduction

   Single‐cell RNA sequencing (scRNA‐seq) has become the state‐of‐the‐art
   approach for investigating the cellular heterogeneity and dynamics of
   various biological systems, ranging from cell development to disease
   progression.^[ [54]^1 , [55]^2 , [56]^3 ^] Leveraging the power of
   scRNA‐seq, researchers have successfully applied it for the
   identification of novel cell types and subtypes, the refinement of
   developmental lineage trajectories, and more.^[ [57]^4 , [58]^5 ,
   [59]^6 ^] Regardless of what objective is pursued, a fundamental step
   consistently involves the establishment of cell identity present within
   a given tissue sample.^[ [60]^7 , [61]^8 ^] A common practice for
   identifying cell types is by clustering cells based on the similarity
   of their gene expression profiles.^[ [62]^9 , [63]^10 ^] However,
   transcriptional differences that separate cells into distinct groups
   may not necessarily indicate functionally distinct cell populations.^[
   [64]^11 , [65]^12 , [66]^13 ^] Given the additional layers of
   complexity in cell identity, accurately defining cell identity remains
   a substantial challenge.

   At the core of cell identity and function lies the intricate interplay
   of gene expression programs, which ultimately dictate the unique
   characteristics of each cell type. Central to this regulatory landscape
   are the core regulatory circuits, orchestrated by tightly regulated
   transcription factors (TFs).^[ [67]^7 , [68]^14 ^] Understanding TF
   activity is paramount for unraveling the essence and functionality of
   distinct cell types. While recent years have seen the emergence of
   several algorithms for analyzing TF activity in scRNA‐seq data, they
   often grapple with limitations affecting their accuracy or
   robustness.^[ [69]^15 , [70]^16 , [71]^17 , [72]^18 ^] For instance,
   the algorithms solely relying on TF expression levels struggle to
   faithfully capture their biological activity. Similarly, those
   scrutinizing both TFs and their direct target expression encounter
   challenges in identifying genuine targets due to the complexities of
   regulation. These methods rely heavily on experimentally measured or
   predicted TF binding information, such as motif enrichment, thereby
   limiting their scope to TFs with available data. Moreover, current TF
   binding site (TFBS) annotation assays, such as chromatin
   immunoprecipitation coupled with deep sequencing (ChIP‐seq), ChIP‐exo,
   and CUT&RUN, necessitate high‐affinity antibodies and large amounts of
   homogeneous cells, posing challenges for diverse tissues. Additionally,
   these assays assess only one TF at a time, hampering comprehensive TF
   binding analysis. Conversely, motif‐based predictions, while not
   require cell‐specific data, rely on predefined motifs available for
   only a subset of TFs, leading to potential false positives.

   Recognizing these limitations, the compilation of TF–target
   interactions from diverse data sources to construct a comprehensive
   static TF–target interaction network, integrated it with dynamic gene
   regulatory networks (GRNs) from single cells, provide a better
   understanding of regulatory relationships within the sample's
   organization under specific conditions. Furthermore, a notable
   observation emerges; there exists an underutilization of TF activity in
   characterizing cell identity and function, especially in contexts like
   cell development and disease progression.^[ [73]^19 , [74]^20 ^] This
   underutilized potential underscores the urgent need for enhanced
   algorithms for inferring TF activity in scRNA‐seq data and their
   practical application.

   In this context, we introduce metaTF
   ([75]https://github.com/wanglabsmu/metaTF), a computational framework
   tailored for comprehensive TF activity analysis in scRNA‐seq data.
   MetaTF addresses the limitations of traditional methods by constructing
   comprehensive static prior networks through the integration of diverse
   data sources, and it does not require training on a specific scRNA‐seq
   dataset. It enhances TF regulation portrayal by amalgamating dynamic
   regulatory networks from scRNA‐seq data with static prior networks and
   optimizing existing algorithms to provide accurate and robust TF
   activity predictions and gene regulatory network analysis. Benchmarking
   across diverse types of scRNA‐seq data, including TF knockout, and TF
   overexpression data, demonstrates that metaTF consistently outperforms
   other widely used tools, affirming its remarkable accuracy and
   resilience in inferring TF activity from scRNA‐seq data.

   Single‐cell Clustered Regularly Interspaced Short Palindromic Repeats
   (CRISPR) screening is revolutionizing the field of immuno‐oncology by
   enabling the precise mapping of GRNs and the identification of critical
   targets. Specifically, the knockout of TFs enables the determination of
   the perturbation effects in finely regulated biological processes,
   which in turn influences cellular functions. In the realm of CD8^+
   cytotoxic T‐cell differentiation,^[ [76]^21 ^] metaTF has not only
   proven its exceptional capability in the context of gene regulatory
   network analysis but has also displayed its prowess by showcasing its
   ability to predict the effects of TFs’ perturbations with remarkable
   accuracy. These underscore the enhanced capability of metaTF to
   effectively interpret novel forward genetic screens data, highlighting
   its advanced analytical prowess in decoding the complex genetic
   interactions. MetaTF provides an effective and applicable solution to
   elucidate perturbation function and biologic circuits by a model‐based
   quantitative analysis of single‐cell‐based CRISPR screening data. As an
   extension of its application, metaTF can quantify the relationships
   between various perturbations.

   Its application to mouse hematopoietic stem cell (HSC) data^[ [77]^22
   ^] illustrates how TF activity enhances the accurate identification of
   cell identity. Additionally, metaTF provides a deeper understanding of
   the functional identity of epithelial cells in breast cancer (BRCA),
   unveiling the presence of a novel neural‐regulated T‐cell subset within
   the tumor immune microenvironment. In summary, metaTF addresses the
   need for methods capable of integrated analysis of diverse data types
   by offering a systematic integration method that is broadly applicable.
   By leveraging various data sources and dimensions of TF biology, metaTF
   provides a more comprehensive and versatile solution for TF activity
   inference and gene regulatory network analysis. This integrated
   analysis is crucial for unraveling the complexities of gene regulation
   and holds immense potential for researchers across various domains.

2. Results

2.1. Overview of the MetaTF Framework

   MetaTF is an integrated framework for inferring TF activity from
   scRNA‐seq data. This workflow consists of five key components: data
   import, GRN inference, construction of a prior network, regulon
   (TF–targets) identification, and estimation of TF activity (Figure
   [78]1 ). Specifically, metaTF supports flexible data input, including
   count matrices from droplet or full‐length protocols,^[ [79]^23 ^] or
   preprocessed data objects from software such as CellRanger,^[ [80]^24
   ^] Seurat,^[ [81]^25 ^] and SingleCellExperiment.^[ [82]^26 ^] GRN
   inference utilizes percentage of unique information contribution
   (PUIC), an entropy‐based algorithm, to estimate the importance of
   regulator–target relationships by quantifying the unique informational
   contribution from the mutual information (MI) between each gene triplet
   (Figure [83]1; Experimental Section; Figure [84]S1a, Supporting
   Information). Notably, integration with a weighted prior TF–target
   network constructed from literature‐curated, ChIP‐seq, TF knockout, and
   tissue‐specific interaction information further enhances robust regulon
   identification for each TF (Experimental Section; Figure [85]S1b,
   Supporting Information). The Virtual Inference of Protein‐activity by
   Enriched Regulon analysis (VIPER)^[ [86]^17 ^] algorithm then estimates
   TF activity based on the enrichment score of the identified regulons.
   Implementation of metaTF as an R package enables leveraging the
   extensive SingleCellExperiment ecosystem for convenience and
   scalability. Meanwhile, parallel workflows allow comparisons of
   alternative methods for GRN inference (such as Gene Network Inference
   by Ensemble of Trees (GENIE3)^[ [87]^27 ^] and PCOR (R Package for a
   Fast Calculation to Semi‐partial Correlation Coefficients)^[ [88]^28 ^]
   and estimation of TF activity [such as single‐sample Gene Set
   Enrichment Analysis (ssGSEA)^[ [89]^29 ^] and AUCell^[ [90]^18 ^] (a
   tool to quantify the enrichment of gene sets in single cells)].
   Downstream analyses include cell‐type‐specific activated TF
   identification, pathway enrichment, and the construction of visual
   aids, including Radviz plots,^[ [91]^30 ^] Sankey diagrams, and
   heatmaps (see online vignette). Optimization via vectorization and
   multithreading ensures user‐friendly and timely analysis.

Figure 1.

   Figure 1
   [92]Open in a new tab

   Overview of metaTF framework. MetaTF is built upon the S4 class
   SingleCellExperiment package and accepts various types of inputs,
   including count matrices and S4 objects generated from
   SingleCellExperiment or Seurat. Subsequent analyses include GRN
   inference, prior‐network construction, regulon identification, and TF
   activity estimation. Moreover, metaTF provides several additional
   functions, including cell type‐specific activated TF identification,
   regulon pathway enrichment analysis, and the construction of various
   visual aids, such as dot plots, Radviz plots, Sankey diagrams, and
   heatmaps.

2.2. Benchmarking Highlights MetaTF's Superior Performance for Inferring TF
Activity

   We systematically benchmarked metaTF against the state‐of‐the‐art
   algorithms Single‐cell Regulatory Network Inference and Clustering
   (SCENIC)^[ [93]^18 ^] and DoRothEA.^[ [94]^15 ^] SCENIC uses machine
   learning (GENIE3/GRNBoost) to infer GRNs and refines them via motif
   enrichment analysis, retaining only TF–target pairs with significant
   motif presence in the promoters of target genes. This approach reduces
   indirect interactions but is limited by the availability of predefined
   motifs and may yield false positives. DoRothEA relies on a precompiled
   TF–target interaction database, bypassing GRN inference from
   single‐cell data. Although efficient, it may include context–irrelevant
   interactions, leading to false positives. By using diverse scRNA‐seq
   datasets, metaTF showed its superior performance in several scenarios.
   In CRISPRi perturbation data targeting 40 TFs in human embryonic stem
   cells (ESCs),^[ [95]^31 ^] metaTF outperformed SCENIC and DoRothEA at
   inferring TF activities (Figure [96]2a; Experimental Section; Figure
   [97]S2a and Tables [98]S1 and [99]S2, Supporting Information). Since
   dropouts pose challenges for TF activity analysis, we evaluated their
   performance under varying dropout levels. Despite observing declining
   performance with increased dropout rates, it is noteworthy that metaTF
   maintains competitive performance compared to other tools (Experimental
   Section; Figure [100]S2b, Supporting Information), demonstrating the
   robustness of metaTF with dropouts. This is further supported by
   results from low‐coverage 10× Genomics scRNA‐seq data^[ [101]^32 ^]
   (Experimental Section; Figure [102]S3, Supporting Information). In
   primary cells with single TF knockout, metaTF achieved Area Under the
   Receiver Operating Characteristic Curve (AUROC) values of 0.87 and 0.68
   for Nkx2‐1 knockout droplet‐based scRNA‐seq data^[ [103]^33 ^] and
   Cebpa knockout MARS‐seq data,^[ [104]^34 ^] respectively, which were
   higher than those produced by the alternatives (Figure [105]2b,c;
   Experimental Section; Figure [106]S4a–d and Table [107]S3, Supporting
   Information). In overexpression data, metaTF again excelled with AUROC
   values of 0.63 and 0.97 separately for the Hoxa9 and Runx1
   overexpression Smart‐seq2 data^[ [108]^35 ^] (Figure [109]2d,e;
   Experimental Section; Figure [110]S4e,g and Tables [111]S4 and [112]S5,
   Supporting Information). These results affirm the remarkable accuracy
   and resilience of metaTF in conducting TF activity inference for a wide
   range of scRNA‐seq datasets. Specifically, when applied to CRISPR
   droplet sequencing (CROP‐seq) data from HEK293T cells with TF
   perturbations,^[ [113]^36 ^] PUIC outshines the other two methods
   (GENIE3 and PCOR) in more than half of the TFs, and maintains a faster
   computational speed (Figure [114]2f; Experimental Section; Figure
   [115]S4h, Supporting Information). Furthermore, the integration of GRN
   inferred by PUIC and the prior TF–target network significantly
   contributed to the enhanced performance of metaTF (Experimental
   Section; Figure [116]S4i,j, Supporting Information). Lastly, in terms
   of TF activity estimation, VIPER was proven to be superior to ssGSEA
   and AUCell (Experimental Section; Figure [117]S4k and Table [118]S6,
   Supporting Information). Besides, combinatorial analysis on CRISPRi
   data showed that the default PUIC and VIPER methods in metaTF yielded
   optimal performance (Figure [119]2g), achieving the highest AUROC
   values for 23 out of 40 TFs (Table [120]S7, Supporting Information).
   Meanwhile, to elucidate the pivotal contributions of two key steps in
   inferring TF activity using metaTF, we recalculated the activities of
   35 TFs by separately removing the steps involving PUIC for GRN
   calculation and the prior network. The findings revealed that, in
   comparison to PUIC alone, the inclusion of the prior network led to
   superior results for metaTF. Simultaneously, when excluding the PUIC
   calculation of the GRN step and retaining only the prior network,
   metaTF achieved results comparable to SCENIC (Figure [121]S5a,b,
   Supporting Information). In summary, systematic benchmarking validates
   metaTF as a superior algorithm for scRNA‐seq TF activity inference.

Figure 2.

   Figure 2
   [122]Open in a new tab

   Assessment of metaTF using various types of scRNA‐seq data. a) Line
   plot on the left depicting the AUROC for 40 perturbated TFs, calculated
   with three methods including metaTF, SCENIC, and DoRothEA. The
   distribution curve on the right illustrates the AUROC score
   distributions for these three methods. The Kolmogorov–Smirnov (KS) test
   was used to determine the differences (*P < 0.05). b) Receiver
   operating characteristic (ROC) curves for Nkx2‐1 activity estimation by
   metaTF, SCENIC, and DoRothEA. These estimations were conducted on
   Nkx2‐1 conditional knockout (cKO) epithelial cells from Little
   et al.,^[ [123]^33 ^] with the AUROC values indicated in brackets. FPR
   represents the false‐positive rate, and TPR represents true‐positive
   rate. c) ROC curves for Cebpa activity estimation by the same three
   methods applied to Cebpa cKO myeloid progenitors from Paul et al.^[
   [124]^34 ^] d) ROC curves of Hoxa9 activity estimation by the same
   three methods applied to Hoxa9 overexpression HECs from Guo et al.^[
   [125]^35 ^] e) ROC curves of Runx1 activity estimation by the same
   three methods applied to Runx1 overexpression HECs from Guo et al. f)
   Line plot showing the AUC values for 14 perturbed TFs, calculated using
   GRNs inferred by PUIC, GENIE3, and PCOR in the CROP‐seq dataset. g)
   Line plot showing the AUROC values for 40 perturbed TFs, calculated
   using various combinations of GRN inference methods and TF activity
   enrichment methods within the metaTF framework. h) The histogram
   displays the distribution of differences in the AUROC values obtained
   from metaTF, DoRothEA, and SCENIC for each transcription factor,
   providing a visual summary of their comparative performance on data of
   single‐cell CRISPR screens from CD8^+ cytotoxic T‐cell differentiation.

   Single‐cell CRISPR screening facilitates the mapping of gene regulatory
   networks, uncovering pivotal targets in immuno‐oncology. Specifically,
   the knockout of TFs enables the determination of the perturbation
   effects in finely regulated biological processes, which in turn
   influences cellular functions. In the realm of CD8^+ cytotoxic T‐cell
   differentiation,^[ [126]^21 ^] single‐cell CRISPR screens with knockout
   of TFs provide a more effective method for interrogating the regulatory
   landscape of these cells. Our metaTF method demonstrated superior
   performance in analyzing these datasets with single TF knockout
   experiments, achieving an average AUROC of 0.69 across 109 TFs.
   Notably, metaTF outperformed established methods DoRothEA and SCENIC in
   80 and 66 TFs, respectively (Figure [127]2h; Experimental Section;
   Figure [128]S6a, Supporting Information). As an extension of its
   application, we initially identified eight distinct pairs of TFs that
   exhibited divergent activity trajectories following double knockout
   experiments, highlighting the intricate regulatory interdependencies
   within CD8^+ T‐cell regulation (Figure [129]S6b,c, Supporting
   Information). This comprehensive analysis unveiled the complex nature
   of TF interactions and their roles in the cellular response. These
   results not only validate metaTF's robustness in interpreting
   single‐cell CRISPR screen data but also provide insights into the
   intricate TF networks governing T‐cell fate decisions. We also
   validated similar conclusions using single‐cell CRISPR screening data
   from the cortical development of the mouse brain (Experimental Section;
   Figure [130]S7, Supporting Information). By elucidating both individual
   TF activities and their functional interplay, our approach offers a
   framework for deciphering GRN complexity in various biological
   contexts, potentially informing targeted approaches in immuno‐oncology.

2.3. TF Activity Profiles Outperform Expression Profiles for Dissecting
Heterogeneous Cell Populations

   To explore the potential of TF activity in capturing cellular
   heterogeneity, we utilized six sequentially acquired cell populations
   during the development of mouse embryonic HSCs in our previous study,^[
   [131]^22 ^] including inferred TF activity profile, TF expression
   profile, and highly variable gene expression profiles (Experimental
   Section). Dimensionality reduction analysis visualizes exceptional
   capability in effectively discerning diverse populations based on TF
   activity profiles while maintaining high level of internal consistency
   within each population (Figure [132]3a; Figures [133]S8a and [134]S9a,
   Supporting Information). Nonetheless, cell clustering based on TF
   expression or even highly variable gene expression faced challenges,
   particularly in the CD45^− T1 and CD45^+ T2 pre‐HSC populations, which
   were also observed in the results obtained from employing a classic
   random forest classification model to fit TF activity profiles and gene
   expression profiles (Experimental Section; Figure [135]S9b, Supporting
   Information). This indicates that aggregating signals from regulons in
   TF activity potentially delivers more resilient information content
   versus expression alone. To further confirm this notion, we gauged the
   cell‐type‐specific action and within cell‐type variation of TFs by
   contrasting activity and expression profiles (Experimental Section). In
   line with our hypothesis, we observed that TFs exhibited relatively
   higher cell type specificity (CTS, Figure [136]3b; Table [137]S8,
   Supporting Information) and lower within cell‐type variance (Figure
   [138]S8a, Supporting Information) in their activity versus expression
   profiles. As an example, the Hes family bHLH transcription factor 1
   (Hes1), known for its role in the maintenance of the HSC pool in the
   bone marrow under stress,^[ [139]^37 ^] emerged as an adult
   HSC‐specific TF in the activity profile, while Hes1 expression was
   confined to a small subset of adult HSCs, exhibiting high
   within‐cell‐type variation (Figure [140]S8b,c, Supporting Information).

Figure 3.

   Figure 3
   [141]Open in a new tab

   Comparison between the TF expression and TF activity profiles. a) t‐SNE
   plots visualizing cluster assignments of cells based on TF expression
   profile (left), highly variable gene expression profile (middle) and TF
   activity profile (right). The colors indicate the categorization into
   six sequential cell populations during HSC development: arterial
   endothelial cells (AEC), hemogenic endothelial cells (HEC), T1 (type 1
   pre‐HSCs), T2 (type 2 pre‐HSCs), fetal liver HSCs (FL), and Adult
   (adult HSCs). b) Bar plot representing the distribution of TFs at both
   the expression and activity levels, categorized by cell‐type‐specific
   score. The Kolmogorov–Smirnov (KS) test was used to determine the
   differences. c) Radviz plot showing the distribution of CTS TFs at the
   expression (left) and activity (right) levels, with colors representing
   the six cell types. d) The pie chart on the top displaying the number
   of shared TFs identified at both expression and activity levels, while
   the pie chart on the bottom displaying the number of TFs exclusively
   identified at the activity level. e) Violin plot demonstrating the
   expression of Rela in the six sequential populations. f) Violin plot
   demonstrating the activity score of Rela in the six sequential
   populations. Significance levels are indicated as ***, with a P‐value
   of <0.001. g) GSEA showing the relative enrichment of target genes
   regulated by Rela between WT (wild‐type) and Rela KO (knockout)
   samples. The normalized enrichment score was used to quantify the
   enrichment level.

   Robust cell state characterization using metaTF enables the discovery
   of additional CTS determinants. We further identified 414 CTS TFs, with
   the highest number in the T1 and T2 pre‐HSC stages (Figure [142]3c;
   Figure [143]S8d,e, Supporting Information). While 60% (248) of CTS TFs
   were identified in both the activity and expression profiles
   (Figure [144]3d, upper panel), the activity profile uncovered 161
   additional unique CTS TFs undetected by the expression profile
   (Figure [145]3c,d, lower panel), accounting for 40% of the total.
   Nearly half (80/161) of these TFs were activated specifically in adult
   HSCs, like Rela (Figure [146]3e,f) and Sp1 despite negligible
   expression differences (Figure [147]S8f,g, Supporting Information). We
   confirmed their activation in HSCs using independent Rela knockout
   microarray data^[ [148]^38 ^] (Figure [149]3g) and Sp1 knockout
   microarray data^[ [150]^39 ^] (Experimental Section; Figure [151]S8h,
   Supporting Information).

   Additionally, by analyzing scRNA‐seq data from healthy human bone
   marrow plasma cells mapped to the T2T‐CHM13 reference assembly,^[
   [152]^40 ^] we validated clustering patterns of Immunoglobulin A (IgA)/
   Immunoglobulin G (IgG) plasma cells using TF activity analysis,
   revealing subtype‐specific TF activation and supporting previous
   observations of tissue‐origin transcriptional retention in bone marrow
   plasma cells (Experimental Section; Figure [153]S10, Supporting
   Information). In summary, the TF activity profile outperforms the
   expression profile in enabling a refined dissection of heterogeneous
   cell populations and identification of novel cell‐type‐specific
   regulators.

2.4. MetaTF Unveils the Heterogeneity of Human Breast Cancer Epithelial Cells

   Cellular heterogeneity within cancer tissues determines both cancer
   progression and treatment response. Breast cancer stands out as the
   first malignancy to be recognized for its molecular‐level
   heterogeneity, leading to the development of tailored treatment
   strategies for different molecular subtypes. This progress has been
   further propelled by the recent availability and accumulation of a
   wealth of scRNA‐seq data, which includes both normal and malignant
   breast epithelial cells, as well as immune cells within the breast
   cancer microenvironment.^[ [154]^41 , [155]^42 , [156]^43 , [157]^44 ^]
   Therefore, we utilized metaTF to explore cell identity in breast cancer
   development and the immune microenvironment.

   We reanalyzed an scRNA‐seq dataset of human breast cancer epithelial
   cells^[ [158]^45 ^] ([159]GSE176078), maintaining the original
   classification of seven cell populations: Cancer Basal, Cancer Her2,
   Cancer LumA, Cancer LumB, Luminal Progenitors, Mature Lumial, and
   Myoepithelial (Table [160]S9, Supporting Information). Four of these
   populations were composed of cancer cells: Cancer LumA (predominantly
   Estrogen Receptor (ER^+) clinical subtype, 98.9%), Cancer LumB (mainly
   ER^+ clinical subtype, 94.5%), Basal (primarily triple‐negative breast
   cancer (TNBC) clinical subtype, 96.8%), and Cancer Her2 (present across
   clinical subtypes, comprising 8.8% ER^+, 36.1% HER2^+, and 55.1% TNBC).
   Our analysis used highly variable TFs (top 500) and genes (top 2000),
   and found that TF activity outperformed gene expression in
   distinguishing cell populations, particularly evident in T‐distributed
   stochastic neighbor embedding (t‐SNE) visualization (Figure [161]4a;
   Experimental Section). Assessment metrics consistently demonstrated
   that TF activity more accurately grouped and separated cell types^[
   [162]^46 ^] (Figure [163]4b; Experimental Section; Table [164]S10,
   Supporting Information). To delve into clinical subtype‐specific
   regulation, we focused on three cancer cell types (Cancer LumA, Cancer
   LumB, and Cancer Basal). Differential analysis identified 53, 58, and
   65 subtype‐specific activated TFs in Cancer LumA, Cancer LumB, and
   Cancer Basal, respectively. Among these, there were known regulators
   such as HOXB2 ^[ [165]^47 ^] and KLF4 ^[ [166]^48 ^] in Cancer LumA, as
   well as HMGA1 ^[ [167]^49 ^] in Cancer Basal (Figure [168]4c; Table
   [169]S11, Supporting Information). Pathway analysis for TFs activated
   in specific clinical subtypes, along with their targets, revealed
   common properties associated with malignant proliferation, such as
   epithelial cell differentiation, regulation of transcription from RNA
   polymerase II promoter in response to stress, and regulation of
   cytoplasmic translation pathways. Each subtype also exhibits unique
   processes. For example, Cancer LumA displayed involvement in the
   apoptotic signaling pathway, negative regulation of programmed cell
   death, and response to estrogen.^[ [170]^45 ^] Cancer LumB showed an
   association with incorrect protein metabolic‐related pathways,^[
   [171]^45 ^] while Cancer Basal exhibited enrichment in Messenger
   Ribonucleic Acid (mRNA) processing and RNA stability‐related pathways^[
   [172]^50 ^] (Figure [173]4d; Table [174]S12, Supporting Information).
   Furthermore, we identified 8, 22, and 24 Cancer Her2‐specific activated
   TFs in these three clinical subtypes, respectively (Figure [175]S11a
   and Table [176]S13, Supporting Information). These TFs were
   predominantly associated with glycolytic process, protein localization,
   and pre‐miRNA processing related pathways (Figure [177]S11b and Table
   [178]S14, Supporting Information). We validated these clinical
   subtype‐specific TFs using an expression‐based AUCell approach,^[
   [179]^18 ^] which demonstrated strong consistency with the metaTF
   results (Figure [180]S11c–e, Supporting Information). To ascertain the
   effectiveness of metaTF in identifying clinical subtype‐specific
   activated TFs, we obtained the BRCA RNA‐seq dataset from The Cancer
   Genome Atlas (TCGA)^[ [181]^51 ^] and clinical subtyping based on
   Berger et al.’s work.^[ [182]^52 ^] Subsequently, gene set variation
   analysis (GSVA)^[ [183]^53 ^] revealed significant differences in the
   activity of 15 out of 53 TFs in LumA, 24 out of 58 TFs in LumB, and 28
   out of 65 TFs in Basal, compared to other clinical subtypes (Figure
   [184]S12a,b,d,f and Table [185]S15, Supporting Information). The gene
   set enrichment analysis (GSEA) results further confirmed the
   specificity of these TFs to cancer subtypes (Figure [186]S12c,e,g,
   Supporting Information). To further validate the effectiveness of
   subtype‐specific activated TFs identified by metaTF, the potential
   roles of the top 15 Basal‐activated TFs in the proliferation,
   migration, and invasion of Basal subtype of breast cancer cells were
   analyzed. We knockdown the expression of these 15 TFs in two basal
   breast cancer cell lines (Hs578T and BT549 cells) (Figures [187]S13a
   and [188]S14a and Table [189]S16, Supporting Information). Among these
   TFs, SOX6, SOX15, CEBPG, KLF13, NFAT5, and SMAD1 were experimentally
   validated for their ability to affect the migration and invasion of
   breast cancer cells, the transwell assay results showed that knockdown
   of these TFs, respectively, significantly decreased the migration
   (Figures [190]S13b and [191]S14b, Supporting Information) and invasion
   (Figures [192]S13c and [193]S14c, Supporting Information) of both
   cancer cells. In addition, we also detected the effects of 15 TFs on
   the proliferation function of breast cancer cells, among which
   knockdown of SOX6, SOX15, or CEBPG significantly inhibited the
   proliferation of Hs578T and BT549 cells, respectively (Figures
   [194]S13d and [195]S14d, Supporting Information). Furthermore, the
   functional study demonstrated that these six TFs have no significant
   impact on the migration of T47D, a luminal A type of breast cancer cell
   line (Figure [196]S15, Supporting Information). This result
   substantiates the effectiveness of the subtype‐specific activated
   transcription factors identified by metaTF. Additionally, to validate
   whether 6/15 is a remarkable percentage, we randomly selected 15
   expressed TFs and performed migration assays in Basal‐like breast
   cancer cell lines, and showed that only two out of the randomly
   selected 15 TFs significantly inhibited the migration of BT549 cells
   (Figure [197]S16a,b, Supporting Information). When incorporating the
   impact of these TFs on the proliferation of BT549 cells into
   consideration, none of these 15 randomly selected TFs significantly
   affect both proliferation and migration of BT549 cells (Figure
   [198]S16c, Supporting Information). Biological replicate experiments
   have yielded results that are consistent with the previous conclusions
   (Figures [199]S17 and [200]S18, Supporting Information). These results
   suggest that activated TFs identified by metaTF somehow play critical
   roles in the progression of breast cancer. As cancer progresses and the
   patient's condition worsens, TF activities change with disease
   advancement. Analyzing activity dynamics across cancer stages unveiled
   dramatic changes in TF activity throughout disease progression
   (Figure [201]4e–g). For instance, KDM5B ^[ [202]^54 ^] and SNAI1 ^[
   [203]^55 ^] activities exhibited gradually increased in ER^+ cells,
   aligning with their known roles in breast cancer metastasis. In
   contrast, MYB activity gradually declined, consistent with its
   suppressive role in breast cancer metastasis.^[ [204]^56 ^] To validate
   the repeatability of the findings in human breast cancer epithelial
   cells reported by metaTF, additional data^[ [205]^57 ^]
   ([206]GSE180286) analysis was conducted (Experimental Section).
   Consistently, metaTF demonstrated superior capability in distinguishing
   cancer cells from normal cells. Furthermore, through a further
   analysis, 47, 63, and 57 subtype‐specific activated TFs were identified
   in Cancer LumA, Cancer LumB, and Cancer Basal, respectively. As
   depicted in Figure [207]S19 (Supporting Information), there is
   substantial overlap between these TFs and the previous analysis
   results. Meanwhile, to demonstrate the broad applicability of metaTF,
   when applied metaTF to prostate cancer epithelial cells^[ [208]^58 ^]
   ([209]GSE176031), the results also demonstrated that clustering based
   on TF activity outperformed clustering based on gene expression.
   Likewise, random forest classification yielded similar outcomes
   (Experimental Section; Figure [210]S20, Supporting Information). In
   summary, analyzing TF activity has provided novel insights into the
   heterogeneity of breast cancer epithelial cells, enhancing our
   understanding of the regulatory mechanisms driving cancer initiation
   and progression.

Figure 4.

   Figure 4
   [211]Open in a new tab

   MetaTF results for the human breast cancer epithelial cells single‐cell
   RNA‐seq dataset. a) t‐SNE plots visualizing the cluster assignments of
   human breast cancer epithelial cells based on TF activity profile
   (left) and highly variable gene expression profile (right). Meanwhile,
   clustering based on gene expression revealed that 89 genes exhibited
   uniquely high expression specifically in Cancer Basal cells, which
   contributed to the improved separation of these cells from other cell
   types. b) Bar plot displaying the values of three evaluation indicators
   for the two clustering results. c) Heatmap plot displaying the activity
   scores of the top 30 specifically activated TFs in three distinct
   cancer epithelial cell populations. d) Radar chart showing Gene
   Ontology (GO) biological process enrichment results for specific
   activated TFs and their expressed targets in the three cancer
   epithelial cell populations. Common pathways are depicted in gray text,
   while cell type‐specific pathways are indicated in text colors
   corresponding to panel (a). Dot plots illustrating the degree of TF
   activation that significantly changed as the disease progressed
   (pathological tumor stage) in the three cancer epithelial cell
   populations: e) Cancer LumA, f) Cancer LumB, and g) Cancer Basal. BRCA,
   breast cancer. TNBC, triple‐negative breast cancer.

2.5. Discovering New Neural‐Regulated T‐Cell Subsets in the Breast Tumor
Microenvironment through MetaTF

   The intricate interplay between the immune system and cancer has become
   a prominent focus in cancer biology research. Immune cells possess the
   ability to recognize and eliminate nascent tumor cells during immune
   surveillance. Paradoxically, they can also act as supporters of tumor
   growth and progression through immunoediting. Excitingly, clinical
   advancements in immunosuppressive therapies and cellular
   immunotherapies have demonstrated promise in targeting these dual
   actions of immune cells within the tumor microenvironment.^[ [212]^59 ,
   [213]^60 , [214]^61 ^] Nonetheless, our understanding of the underlying
   transcriptional mechanisms governing pro‐ and antitumor immunity
   remains incomplete.^[ [215]^62 ^] To shed light on the transcriptional
   regulation of T‐cell states within the tumor microenvironment, we
   applied metaTF to analyze two scRNA‐seq concerning breast cancer
   datasets.^[ [216]^63 ^] Initial clustering based on highly variable
   (top 2000) gene expression profile indicated that there was no clear
   separation of T‐cell subpopulations (Figure [217]5a, right). However,
   when using TF activity (top 500) profile, a pronounced separation of
   T‐cell subpopulations was presented, particularly in the case of CD8^+
   T‐effector memory (T[EM]) cells, which were divided into two distinct
   clusters, denoted as CD8^+ T[EM] C1 and C2 (Figure [218]5a, left). The
   classification accuracy for the newly identified CD8^+ T[EM] C1 and C2
   was confirmed through the expression of their respective marker genes
   (Figure [219]5b). Differential analysis revealed 268 genes that were
   upregulated in CD8^+ T[EM] C1 and 153 genes in C2 (Figure [220]5c;
   Table [221]S17, Supporting Information). C1 displayed higher expression
   of activation/proliferation markers, like NFKBIA ^[ [222]^64 ^] as well
   as genes related to CD8^+ T‐cell response to acute infection, memory
   generation, and activation stress.^[ [223]^65 , [224]^66 ^] In
   contrast, C2 displayed elevated expression of DOCK2, which has the
   potential to impair the immune function of CD8^+ T cells.^[ [225]^67 ^]
   Intriguingly, pathway analysis revealed that C1 was enriched in
   functions associated with T‐cell activation, regulation of protein
   serine/threonine kinase activity, interleukin‐12‐mediated signaling,
   and notably, neuronal/synaptic processes such as axonogenesis
   (Figure [226]5d, upper panel; Table [227]S18, Supporting Information).
   Given emerging evidence indicating the influence of nerves within the
   tumor microenvironment on cancer progression,^[ [228]^68 , [229]^69 ^]
   we first confirmed the protein level of nerve fibers marker
   (antineurofilament heavy,^[ [230]^70 ^] NF) in breast cancer tissue
   (Figure [231]S21d, Supporting Information), subsequently examined the
   neuroreceptor expression level in scRNA‐seq data and discovered that C1
   specifically expressed adrenergic receptors (ARs) ADRB2 and ADRA2B
   (Figure [232]5e). By analyzing publicly available spatial
   transcriptomics data, we confirmed co‐expression of CD8A and ADRB2
   within breast tumors^[ [233]^41 , [234]^71 ^] (Figure [235]5g; Figure
   [236]S22, Supporting Information). Validation through
   immunofluorescence in breast cancer patient samples confirmed the
   expression of ADRB2 on tumor‐infiltrating T cells (Figure [237]5h;
   Figure [238]S21i and Table [239]S19, Supporting Information). To
   further validate these findings, we isolated tumor‐infiltrating
   lymphoid T cells (CD8^+ cells) from breast cancer patients using flow
   cytometry (Figure [240]S21e,f, Supporting Information). Subsequently,
   we treated these T cells with epinephrine^[ [241]^72 ^] (an adrenergic
   receptor agonist, 0.1 ng mL^−1) to activate neurotransmitter receptors
   and assessed downstream TF expression and activity through RNA
   sequencing (Figure [242]S21g, Supporting Information). After performing
   GSEA analysis, the results showed that BCL6, TBX21, and ATF3, along
   with their respective downstream targets, exhibited significant
   activation in the epinephrine treatment group (Figure [243]5i,j; Figure
   [244]S21h and Table [245]S20, Supporting Information). These results
   are consistent with the findings of metaTF activity analysis in CD8^+
   T[EM] C1 (Figure [246]5f; Figure [247]S21a–c, Supporting Information).
   This suggests that CD8^+ T[EM] C1 cells respond to neuro‐related
   signals by activating these transcription factors and their downstream
   targets. To examine the consistency of our findings in other breast
   cancer dataset, we reanalyzed an scRNA‐seq dataset of human breast
   cancer T cells^[ [248]^73 ^] ([249]GSE110686), maintaining the original
   classification of 11 cell populations. Similarly, we employed metaTF to
   dissect the heterogeneity of these 11 cell clusters, and the results
   aligned with previous findings. CD8^+ T[EM] cells were stratified into
   two distinct clusters, with one subset specifically expressing
   adrenergic receptor ADRB2, while the other subset did not (Figure
   [250]S23a, Supporting Information). Differential TF activation analysis
   revealed that BCL6 exhibited specificity in activation within ADRB2^+
   CD8^+ T[EM] cells, consistent with earlier conclusions. However,
   slightly divergent from previous findings, TBX21 and ATF3 were not
   exclusively activated within ADRB2^+ CD8^+ T[EM] cells (Figure
   [251]S23b,c, Supporting Information). In summary, our analysis confirms
   the existence of a previously unrecognized subset of CD8^+ T[EM] cells
   with specialized functionality, characterized by the expression of
   neural receptors.

Figure 5.

   Figure 5
   [252]Open in a new tab

   metaTF reveals TF activity heterogeneity in human breast cancer T
   cells. a) UMAP plots visualizing cluster assignments of human breast
   cancer T cells based on TF activity profile (left) and highly variable
   gene expression profile (right). Cells are color‐coded to represent
   different human breast cancer T‐cell populations, including T[EM]
   (effector/memory T cells), T[EX] (experienced T cells), T[N] (naive T
   cells), T[REG] (regulatory T cells), natural killer (NK) cells), and
   Proliferating (proliferating T cells). b) UMAP plot of CD8^+ T[EM] C1
   and C2 marker genes expression. c) Heatmap plot showing the expression
   of the top 50 differentially expressed genes between CD8^+ T[EM] C1 and
   C2 cell populations. d) GO biological process enrichment results for
   the differentially expressed genes between CD8^+ T[EM] C1 (upper panel)
   and C2 (lower panel) cell populations respectively. e) Dot plot showing
   the significant expression of β[2]‐adrenergic receptor (ADRB2) and
   adrenoceptor alpha 2B (ADRA2B) primarily in the CD8^+ T[EM] C1 cell
   population. f) Dot plot indicating that BCL6 was specifically activated
   in the CD8^+ T[EM] C1 cell population. g) Spatial co‐mapping of CD8 and
   ADRB2 gene expression by spatial transcriptomics of a sample from a
   breast cancer patient recorded in a public dataset ([253]GSE195665,
   patient35, right panel). The left and middle panels depict expression
   levels in fresh frozen sections. h) Expression of CD8 and ADRB2 in
   breast cancer tissue, and the colocalization (Pearson's coefficient)
   between CD8 and ADRB2 is shown, n = 12. i) Upregulated targets of BCL6
   in the epinephrine treatment group compared to the control group from
   in vitro cytological experiments. j) In vitro cytological experiments
   confirmed that candidate TFs and their targets are highly expressed in
   the epinephrine‐treated group through GSEA prerank analysis.

3. Conclusion

   We introduce here metaTF, a computational framework designed for
   analyzing TF activity from scRNA‐seq data. MetaTF offers a robust
   assessment of TF activities, independent of experimental methods, by
   integrating prior TF–target networks and expression‐based gene
   regulatory networks. The novelties of metaTF encompass several key
   aspects. First, metaTF starts by building comprehensive prior networks
   that compile TF–target interactions from various sources, including
   curated databases, ChIP‐seq analyses, and experimental perturbations
   (e.g., TF knockout/knockdown studies). This approach incorporates both
   strongly and weakly supported TF–target relationships, mitigates the
   biases associated with relying on a single data modality. By
   integrating diverse dimensions of TF biology, such as chromatin context
   and regulatory dynamics, metaTF provides a more holistic view of
   TF–target interactions that surpasses conventional methodologies
   dependent solely on structural motifs or DNA‐binding domains. Second,
   one of the critical advancements of metaTF is its ability to integrate
   and analyze multiple types of data, a capability that previous
   algorithms lacked. MetaTF employs a biologically driven network
   integration technique to merge the dynamic regulatory networks derived
   from scRNA‐seq data with the static prior networks. This integration
   enhances the accuracy of TF regulation portrayal by incorporating
   real‐time, cell‐specific regulatory interactions. This comprehensive
   integration approach enables metaTF to capture both canonical and
   cell‐specific regulatory events, providing deeper insights into gene
   regulatory networks. Moreover, by analyzing data from CD8^+ cytotoxic
   T‐cell differentiation, metaTF has not only demonstrated exceptional
   performance in analyzing gene regulatory networks but has also shown
   its ability to predict the effects of TF perturbations with remarkable
   accuracy, emphasizing that it provides an effective and applicable
   solution for quantitative analysis of single‐cell‐based CRISPR
   screening data. Certainly, limitations include reduced performance with
   higher dropout rates and a focus on inferring activated rather than
   repressed regulatory edges. Potential solutions involve assigning
   greater weight to prior knowledge and utilizing methods like PCOR or
   scLink, which support the calculation of negative edges. Although
   metaTF demonstrated superior performance over SCENIC and DoRothEA in
   identifying perturbation‐affected target genes for most TFs,
   discrepancies arose in regulon composition, particularly for TFs like
   FOXQ1, where nondifferentially expressed targets (e.g., RBBP5,^[
   [254]^74 ^] SOX12,^[ [255]^75 ^] and SIRT1 ^[ [256]^76 ^]) reduced
   metaTF prediction accuracy (Figure [257]2a; Figure [258]S24, Supporting
   Information). These targets likely reflect regulatory complexities,
   such as epigenetic modifications or super enhancers, highlighting the
   necessity for metaTF to incorporate multiomics data to achieve more
   robust inference of TF activity. Additionally, we are actively
   compiling and curating similar TF regulatory networks for other model
   organisms, such as Drosophila^[ [259]^77 ^] and zebra fish,^[ [260]^78
   ^] to expand the application of metaTF.

   In our study, we used 10× Genomics scRNA‐seq data from melanoma
   cultures to evaluate how low coverage affects metaTF performance. We
   simulated synthetic dropout in cells treated with SOX10 knockdown or
   nontargeting Small Interfering Ribonucleic Acid (siRNA). As dropout
   rates increased, both the number of detected genes and metaTF
   performance decreased. When the median number of detected genes per
   cell fell below 2500, metaTF's predictive performance markedly
   declined, and its accuracy was around 50% when this number dropped
   below 2000. This highlights the challenges of low‐coverage data and
   suggests that imputation methods like MAGIC^[ [261]^79 ^] and
   DeepImpute^[ [262]^80 , [263]^81 ^] could be useful for improving data
   quality and metaTF performance in such scenarios.

   Cancer represents an intricate and exceedingly heterogeneous class of
   diseases, and this heterogeneity presents itself in various
   manifestations. First, patients with the same type of cancer often
   exhibit varying prognostic outcomes and responses to treatment. Second,
   within the tumor microenvironment of an individual cancer patient, both
   the composition and biological behaviors of cancer cells and stromal
   cells undergo dynamic changes as the disease progresses. TFs play
   pivotal roles in orchestrating how cancer cells respond to changes in
   their microenvironment. Notably, the solid tumor microenvironment is
   usually characterized by hypoxia and cancer‐associated inflammation, as
   elucidated in previous studies.^[ [264]^82 , [265]^83 , [266]^84 ^]
   Hypoxia‐induced factor 1α (HIF1α) is a key TF that mediates cancer
   cells' adaptation to hypoxia stimuli. Interestingly, it is noted that
   the mRNA level of HIF1α remains largely unchanged during hypoxia
   compared to normoxia. In contrast, HIF1α protein is hydroxylated and
   degraded under normoxia, but rapidly accumulates and forms a
   heterodimer with HIF1β to regulate gene expression in response to the
   hypoxic microenvironment.^[ [267]^85 ^] When stimulated by inflammatory
   mediators or cytokines secreted by immune cells in the tumor
   microenvironment, the inflammation‐associated TF Nuclear Factor kappa‐B
   (NF‐κB) in cancer cells is activated, leading to the upregulation of
   genes that regulate cancer proliferation and metastasis. However, the
   activation of NF‐κB occurs through its translocation from the cytoplasm
   to the nucleus, rather than an increase in its mRNA or protein
   levels.^[ [268]^86 ^] These observations underscore the importance of
   investigating TF activities to understand the mechanisms driving cancer
   progression. In this study, we utilized metaTF to discern
   subtype‐specific activated TFs within various molecular subtypes of
   breast cancer. Our findings unveiled a plethora of subtype‐specific TFs
   with heightened activities in breast cancer. Delving deeper into these
   hitherto overlooked TFs will not only shed light on the regulatory
   mechanisms underlying diverse clinical subtypes and stages of breast
   cancer but also provide promising targets for therapeutic interventions
   in the management of breast cancer.

   Additionally, it is noteworthy that in addition to the inherent
   heterogeneity observed in cancer cells, immune cells infiltrating the
   tumor microenvironment also exhibit dynamic and diverse
   characteristics. These tumor‐infiltrated immune cells can lose their
   antitumor efficacy, and, in some cases, even adopt procancer functions
   within the tumor microenvironment.^[ [269]^86 , [270]^87 , [271]^88 ,
   [272]^89 ^] In light of these considerations, we propose that
   delineating cell identities based on TF activity profiles offers
   distinct advantages over conventional strategies that rely on gene
   expression profiles when investigating the functional roles of immune
   cells. Therefore, utilizing metaTF can facilitate the discovery of new
   subpopulations, essential subtype‐specific regulators, and TF‐to‐TF
   regulatory circuitry. In this study, metaTF uncovered a previously
   unrecognized subset of CD8^+ T[EM] cells characterized by the
   expression of neural receptors within the microenvironment of breast
   cancer. These cells exhibit distinct transcriptional profiles and
   respond to neuro‐related signals, suggesting a potential role for
   neural signaling in modulating T‐cell function within breast cancer.
   Certainly, the exact mechanistic details of how these TFs influence
   immune responses and tumor behavior are not yet fully elucidated.
   Further investigations into the downstream effects and signaling
   pathways involved in CD8^+ T[EM] C1 cells would provide a deeper
   understanding of the mechanisms at play. Moreover, exploring clinical
   correlations and outcomes related to the presence and function of CD8^+
   T[EM] C1 cells in breast cancer patients can enhance the clinical
   relevance of these findings. In addition, longitudinal studies tracking
   the dynamics of CD8^+ T[EM] C1 cells over time during cancer
   progression and in response to treatments would provide insights into
   their role in disease progression and therapy resistance.

   In this study, metaTF uncovered a previously unrecognized subset of
   CD8^+ T[EM] cells characterized by the expression of neural receptors
   within the microenvironment of breast cancer. These cells exhibit
   distinct transcriptional profiles and respond to neuro‐related signals,
   suggesting a potential role for neural signaling in modulating T‐cell
   function within breast cancer. Catecholamines bind to adrenergic
   receptors on immune cells, including T cells, initiating complex and
   diverse signaling pathways. Continuous stimulation of nor‐adrenaline in
   the tumor microenvironment triggers the canonical pathway of β2‐ARs
   expressed in T lymphocytes.^[ [273]^90 ^] β2‐AR triggers a cascade
   involving G protein dissociation, Cyclic Adenosine Monophosphate (cAMP)
   production, and Protein Kinase A (PKA) activation, which subsequently
   modulates key signaling molecules like Lck and ZAP70. This ultimately
   impacts T‐cell activation, proliferation, and function.^[ [274]^91 ^]
   On the other hand, β2‐AR upregulates the expression of checkpoint
   receptor PD‐1, which further inhibits CD28‐mediated T‐cell activation
   signaling.^[ [275]^92 ^] Additionally, the adrenergic system can
   modulate T‐cell apoptosis in a PKA‐independent manner by engaging the
   Src family tyrosine kinase Lck.^[ [276]^93 ^] These studies suggest
   that β2‐AR activation in T cells generally suppresses their
   immunosurveillance functions. Our study using metaTF on scRNA‐seq
   breast cancer datasets found that CD8^+ T[EM] cells could be divided
   into two clusters based on TF activity, with the ADRB2^+ CD8^+ T[EM]
   subset showing higher expression of activation and proliferation
   markers and genes related to immune responses, and validation
   experiments demonstrated that ADRB2^+ CD8^+ T[EM] cells respond to
   neuro‐related signals by activating key transcription factors such as
   BCL6, TBX21, and ATF3, along with their downstream targets. Considering
   that BCL6 has been recognized to play important roles in generation and
   proliferation of CD8^+ memory T cells,^[ [277]^94 ^] TBX21 may be
   necessary for the terminal differentiation of CD8^+ T cells,^[ [278]^95
   ^] and ATF3 may promote the infiltration of functional CD8^+ T cells,^[
   [279]^96 ^] thus we believed that we identified a novel subtype of
   CD8^+ T cells and activation of β2‐AR signaling by epinephrine may
   activate the cytotoxic activity of this type of CD8^+ T cells.
   Certainly, the exact mechanistic details of how these TFs influence
   immune responses and tumor behavior are not yet fully elucidated.
   Further investigations into the downstream effects and signaling
   pathways involved in CD8^+ T[EM] C1 cells would provide a deeper
   understanding of the mechanisms at play.

   In summary, metaTF provides an integrated framework to reconstruct GRNs
   and analyze TF activity from scRNA‐seq data. This framework will aid
   biologists in elucidating the transcriptional regulation of cell
   identities, states, and functions. Continued advancements in network
   inference and activity analysis methods will further strengthen the
   interpretation of single‐cell genomics data.

4. Experimental Section

The MetaTF Workflow

   The metaTF workflow consists of five main steps: data preprocessing,
   GRN inference, network integration, regulon identification, and TF
   activity estimation. The standard protocol is as follows:
    1. Data Import: The raw count matrix from multiple sources was
       summarized into a SingleCellExperiment object, subsequently, log2
       normalization and filtration were performed to remove genes
       expressed in fewer than five cells usually.
    2. GRN Inference: PUIC, an entropy‐based method, was used to infer
       regulator–target relationships by quantifying the unique
       contribution from the mutual information between each gene
       triplet.^[ [280]^97 ^] Suppose a target gene T and a vector of
       regulator genes are given. Then, the relationships between R and T
       can be estimated with MI. However, one target gene can be
       simultaneously regulated by multiple regulators, indicating that
       the MI between every two R and T pairs may be dependent. Therefore,
       it was needed to decompose the partial information that was
       uniquely contributed by the indicated regulator. Here, the
       non‐negative decomposition of multivariate information (NDMI)
       algorithm^[ [281]^98 ^] was utilized to decompose the MI into
       unique (Unq), redundant (Rdn), and synergistic (Syn) information.
       Specifically, the simplest form of NDMI with only three variables
       was considered: one target gene T, and two regulator genes {R
       [1], R [2]}. The MI I(T;  R [1], R [2]) can be decomposed as

   [MATH: <mrow><mtable displaystyle="true"><mtr><mtd
   columnalign="right"><mrow><mi>I</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub><mo>,</mo><msu
   b><mi>R</mi><mn>2</mn></msub></mrow></mfenced><mo
   linebreak="badbreak">=</mo><mi>Unq</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub></mrow></mfenc
   ed><mo linebreak="goodbreak">+</mo><mi>Unq</mi><mfenced separators=""
   open="(" close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>2</mn></msub></mrow></mfenc
   ed><mo linebreak="goodbreak">+</mo><mi>Rdn</mi><mfenced separators=""
   open="(" close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub><mo>,</mo><msu
   b><mi>R</mi><mn>2</mn></msub></mrow></mfenced></mrow></mtd></mtr><mtr><
   mtd columnalign="right"><mrow><mo
   linebreak="badbreak">+</mo><mi>Syn</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub><mo>,</mo><msu
   b><mi>R</mi><mn>2</mn></msub></mrow></mfenced></mrow></mtd></mtr></mtab
   le></mrow> :MATH]
       (1)

   where the MI between T and R [1] is
   [MATH: <mrow><mrow><mi>I</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub></mrow></mfenc
   ed><mo linebreak="badbreak">=</mo><mi>Unq</mi><mfenced separators=""
   open="(" close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub></mrow></mfenc
   ed><mo linebreak="goodbreak">+</mo><mi>Rdn</mi><mfenced separators=""
   open="(" close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub><mo>,</mo><msu
   b><mi>R</mi><mn>2</mn></msub></mrow></mfenced></mrow></mrow> :MATH]
   (2)

   And the same with R [2]
   [MATH: <mrow><mrow><mi>I</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>2</mn></msub></mrow></mfenc
   ed><mo linebreak="badbreak">=</mo><mi>Unq</mi><mfenced separators=""
   open="(" close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>2</mn></msub></mrow></mfenc
   ed><mo linebreak="goodbreak">+</mo><mi>Rdn</mi><mfenced separators=""
   open="(" close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><msub><mi>R</mi><mn>1</mn></msub><mo>,</mo><msu
   b><mi>R</mi><mn>2</mn></msub></mrow></mfenced></mrow></mrow> :MATH]
   (3)

   Here, a gene regulatory network was considered with n genes. Given a
   pair of genes R and T, there are n − 2 possible partners
   [MATH:
   <mrow><mrow><msub><mi>R</mi><mn>2</mn></msub><mo>=</mo><mi>P</mi><mo>=<
   /mo><mrow><mo
   stretchy="false">{</mo><mrow><msub><mi>P</mi><mn>1</mn></msub><mo>,</mo
   ><msub><mi>P</mi><mn>2</mn></msub><mo>,</mo><mtext>…</mtext><mo>,</mo><
   msub><mi>P</mi><mrow><mi>n</mi><mo>−</mo><mn>2</mn></mrow></msub></mrow
   ><mo stretchy="false">}</mo></mrow></mrow></mrow> :MATH]
   involved in the triple gene pairs. Since the MI between R and T is
   unaffected by any choice of a third partner
   [MATH: <mrow><mrow><mi>P</mi><mo>=</mo><mo
   stretchy="false">{</mo><mrow><msub><mi>P</mi><mn>1</mn></msub><mo>,</mo
   ><msub><mi>P</mi><mn>2</mn></msub><mo>,</mo><mtext>…</mtext><mo>,</mo><
   msub><mi>P</mi><mrow><mi>n</mi><mo>−</mo><mn>2</mn></mrow></msub></mrow
   ><mo stretchy="false">}</mo></mrow></mrow> :MATH]
   , the unique information Unq(T;  R) could accurately measure the
   individual contribution of R to T. Therefore, the total PUIC of all n −
   2 triple gene pairs was used to estimate the importance of a regulator
   R to a target T as
   [MATH: <mrow><mrow><mi>PUC</mi><mfenced separators="" open="("
   close=")"><mrow><mi>R</mi><mo>;</mo><mspace
   width="0.28em"></mspace><mi>T</mi></mrow></mfenced><mo
   linebreak="badbreak">=</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=<
   /mo><mn>1</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>2</mn></mrow></munde
   rover><mfrac><mrow><mi>Un</mi><msub><mi
   mathvariant="normal">q</mi><msub><mi>p</mi><mi>i</mi></msub></msub><mfe
   nced separators="" open="(" close=")"><mrow><mi>R</mi><mo>;</mo><mspace
   width="0.28em"></mspace><mi>T</mi></mrow></mfenced></mrow><mrow><mi>I</
   mi><mfenced separators="" open="("
   close=")"><mrow><mi>R</mi><mo>;</mo><mspace
   width="0.28em"></mspace><mi>T</mi></mrow></mfenced></mrow></mfrac></mro
   w></mrow> :MATH]
   (4)

   To calculate the unique information between regulator R and target T
   within each triple gene pair, the redundancy should first be measured
   by specific information I [spec](T  =  t;  R) as
   [MATH: <mrow><mrow><msub><mi>I</mi><mi>spec</mi></msub><mfenced
   separators="" open="(" close=")"><mrow><mi>T</mi><mspace
   width="0.28em"></mspace><mo linebreak="goodbreak">=</mo><mspace
   width="0.28em"></mspace><mi>t</mi><mo>;</mo><mspace
   width="0.28em"></mspace><mi>R</mi></mrow></mfenced><mspace
   width="0.28em"></mspace><mo linebreak="badbreak">=</mo><mspace
   width="0.28em"></mspace><munder><mo>∑</mo><mrow><mi>r</mi><mo>∈</mo><mi
   >R</mi></mrow></munder><mi>p</mi><mfenced separators="" open="("
   close=")"><mrow><mi>r</mi><mo
   stretchy="false">|</mo><mi>t</mi></mrow></mfenced><mfenced
   separators="" open="["
   close="]"><mrow><mi>log</mi><mfrac><mn>1</mn><mrow><mi>p</mi><mfenced
   open="("
   close=")"><mi>t</mi></mfenced></mrow></mfrac><mo>−</mo><mi>log</mi><mfr
   ac><mn>1</mn><mrow><mi>p</mi><mfenced separators="" open="("
   close=")"><mrow><mi>t</mi><mo
   stretchy="false">|</mo><mi>r</mi></mrow></mfenced></mrow></mfrac></mrow
   ></mfenced></mrow></mrow> :MATH]
   (5)

   where
   [MATH: <mrow><mrow><mi>T</mi><mo>=</mo><mo
   stretchy="false">{</mo><mrow><msub><mi>t</mi><mn>1</mn></msub><mo>,</mo
   ><msub><mi>t</mi><mn>2</mn></msub><mo>,</mo><mtext>…</mtext><mo>,</mo><
   msub><mi>t</mi><mi>k</mi></msub></mrow><mo
   stretchy="false">}</mo></mrow></mrow> :MATH]
   and
   [MATH: <mrow><mrow><mi>R</mi><mo>=</mo><mo
   stretchy="false">{</mo><mrow><msub><mi>r</mi><mn>1</mn></msub><mo>,</mo
   ><msub><mi>r</mi><mn>2</mn></msub><mo>,</mo><mtext>…</mtext><mo>,</mo><
   msub><mi>r</mi><mi>k</mi></msub></mrow><mo
   stretchy="false">}</mo></mrow></mrow> :MATH]
   are nonempty subsets of target T and regulator R, respectively. In the
   context of gene expression data, t and r represent the discrete bins in
   which a gene's expression value fell in. In this research, the number
   of bins (k) of each gene was set as the nearest integer of the square
   root of the cell number. Then, the redundancy information of the triple
   gene pair was calculated with one regulator R, one partner P, and one
   target T by comparing the specific information from each bin within {R,
   P} about each bin t of target T as
   [MATH: <mrow><mrow><mi>Rdn</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mspace
   width="0.28em"></mspace><mfenced separators="" open="{"
   close="}"><mrow><mi>R</mi><mo>,</mo><mi>P</mi></mrow></mfenced></mrow><
   /mfenced><mo
   linebreak="badbreak">=</mo><munder><mo>∑</mo><mrow><mi>i</mi><mo>∈</mo>
   <mi>T</mi></mrow></munder><mi>p</mi><mfenced open="("
   close=")"><mi>t</mi></mfenced><munder><mi>min</mi><mfenced
   separators="" open="{"
   close="}"><mrow><mi>R</mi><mo>,</mo><mi>P</mi></mrow></mfenced></munder
   ><msub><mi>I</mi><mi>spe</mi></msub><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo
   linebreak="goodbreak">=</mo><mi>t</mi><mo>;</mo><mfenced separators=""
   open="{"
   close="}"><mrow><mi>R</mi><mo>,</mo><mi>P</mi></mrow></mfenced></mrow><
   /mfenced></mrow></mrow> :MATH]
   (6)

   Next, the MI between regulator R and target T was be calculated as
   [MATH: <mrow><mrow><mi>I</mi><mfenced separators="" open="("
   close=")"><mrow><mi>R</mi><mo>;</mo><mspace
   width="0.28em"></mspace><mi>T</mi></mrow></mfenced><mo
   linebreak="badbreak">=</mo><mi>H</mi><mfenced open="("
   close=")"><mi>R</mi></mfenced><mo
   linebreak="goodbreak">+</mo><mi>H</mi><mfenced open="("
   close=")"><mi>T</mi></mfenced><mo
   linebreak="goodbreak">−</mo><mi>H</mi><mfenced separators="" open="("
   close=")"><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow></mfenced></mrow><
   /mrow> :MATH]
   (7)

   H(R) or H(T) represents the entropy that quantifies the uncertainty of
   the gene expression level as
   [MATH: <mrow><mrow><mi>H</mi><mfenced open="("
   close=")"><mi>R</mi></mfenced><mo
   linebreak="badbreak">=</mo><mo>−</mo><munder><mo>∑</mo><mrow><mi>r</mi>
   <mo>∈</mo><mi>R</mi></mrow></munder><mi>p</mi><mfenced open="("
   close=")"><mi>r</mi></mfenced><mi>log</mi><mspace
   width="0.28em"></mspace><mi>p</mi><mfenced open="("
   close=")"><mi>r</mi></mfenced></mrow></mrow> :MATH]
   (8)

   To calculate the frequency r in each bin within {R}, the gene
   expression was first dispersed into
   [MATH:
   <mrow><mrow><mi>k</mi><mo>=</mo><msqrt><mi>m</mi></msqrt></mrow></mrow>
   :MATH]
   bins according to the range of expression levels, while m represents
   the total number of cells and r represents the number of cells falling
   into each bin.

   It was noticed that the MI was symmetric but the redundancy was
   asymmetric, so the PUC score was an asymmetric measure that was
   suitable for transcriptional regulation since this is a directional
   process. However, for a pair of genes that exist both in the regulator
   and target genesets, the final score was calculated as the average PUC
   in both directions, as follows
   [MATH: <mrow><mrow><mi>PU</mi><msub><mi
   mathvariant="normal">C</mi><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow><
   /msub><mo linebreak="badbreak">=</mo><mfenced separators="" open="{"
   close=""><mtable><mtr><mtd><mrow><mfrac><mrow><mi>PUC</mi><mfenced
   separators="" open="("
   close=")"><mrow><mi>R</mi><mo>;</mo><mi>T</mi></mrow></mfenced><mo>+</m
   o><mi>PUC</mi><mfenced separators="" open="("
   close=")"><mrow><mi>T</mi><mo>;</mo><mi>R</mi></mrow></mfenced></mrow><
   mn>2</mn></mfrac><mo>,</mo><mi>R</mi><mo>∈</mo><mi>T</mi></mrow></mtd><
   /mtr><mtr><mtd><mrow><mi>P</mi><mi>U</mi><mi>C</mi><mfenced
   separators="" open="("
   close=")"><mrow><mi>R</mi><mo>;</mo><mi>T</mi></mrow></mfenced><mo>,</m
   o><mi>R</mi><mo>∉</mo><mspace
   width="0.28em"></mspace><mi>T</mi></mrow></mtd></mtr></mtable></mfenced
   ></mrow></mrow> :MATH]
   (9)

   The final PUC score for each gene pair was further normalized based on
   its average cumulative distribution in the regulator and its target set
   as
   [MATH:
   <mrow><mrow><msub><mi>W</mi><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow>
   </msub><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mi>F</mi><mi>R</mi></msu
   b><mfenced separators="" open="(" close=")"><mrow><mi>PU</mi><msub><mi
   mathvariant="normal">C</mi><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow><
   /msub></mrow></mfenced><mo>+</mo><msub><mi>F</mi><mi>T</mi></msub><mfen
   ced separators="" open="(" close=")"><mrow><mi>PU</mi><msub><mi
   mathvariant="normal">C</mi><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow><
   /msub></mrow></mfenced></mrow><mn>2</mn></mfrac></mrow></mrow> :MATH]
   (10)

   where the F[R] (PUC[ R,T ]) is the cumulative distribution of all PUC
   scores involving the regulator R, F[T] (PUC[ R,T ]) is the cumulative
   distribution of all PUC scores involving the target T, and W [R,T
   ]represents the weighted regulatory relationships between regulators
   and targets. The weighted GRN was constructed for subsequent analysis.
   Here, classic information‐theoretic approaches use the on‐off binary
   approximation, which is appropriate for scRNA‐seq data given that it
   mostly yields data about transcript presence or absence, on–off binary
   approximation‐based methods^[ [282]^99 , [283]^100 ^] were also
   incorporated into the single‐cell GRN inference framework.
    1. Prior‐Network Construction: The TF–target relationships inferred
       from PUIC algorithm are only based on gene expression, they might
       include many false‐positive targets. To reduce the effect of the
       false‐positive connections, a prior TF–target network
       (knowledge‐based information) was constructed. TFs were obtained
       from the HumanTFs website^[ [284]^101 ^]
       ([285]http://humantfs.ccbr.utoronto.ca/) and constructed the human
       prior TF–target networks based on four main resources: manually
       curated datasets, ChIP‐seq binding datasets, TF knockout
       experiment, and tissue‐specific transcriptional regulatory networks
       (TRNs). 1) The manually curated TF–target relationships were
       collected from the following databases and literature: TRRUST,^[
       [286]^102 ^] INTACT,^[ [287]^103 ^] TRANSFAC,^[ [288]^104 ^]
       FANTOM4,^[ [289]^105 ^] TFe,^[ [290]^106 ^] MSIGDB,^[ [291]^107 ^]
       PAZAR,^[ [292]^108 ^] TFactS,^[ [293]^109 ^] HTRIDB,^[ [294]^110 ^]
       TRED,^[ [295]^111 ^] TRRD,^[ [296]^112 ^] and Neph2012.^[ [297]^113
       ^] 2) The ChIP‐seq associated TF–target networks were collected
       from two ChIP‐seq related databases (ChEA^[ [298]^114 ^] and
       ENCODE^[ [299]^115 ^]) and two motif‐based databases (HOCOMOCO^[
       [300]^116 ^] and JASPAR^[ [301]^117 ^]) by predicting the TF
       binding sites. 3) The TF knockout/knockdown information was
       downloaded from the KnockTF^[ [302]^112 ^] database. First, the
       datasets that knockout/knockdown TFs were significantly
       downregulated (P < 0.05 and log‐transformed fold changes (logFC) <
       −1) compared to the control group were only retained. Then, the
       differentially expressed genes (P < 0.05 and |logFC| > 1) were
       regarded as targets for the knockout/knockdown TF, and the weight
       of interactions was calculated as log[10] (P). 4) Tissue‐specific
       and blood‐specific TF‐target networks were constructed based on 32
       tissue datasets^[ [303]^118 ^] and six blood datasets
       ([304]https://zenodo.org/record/8355476), respectively.
    2. Regulon Identification: First, the prior TF–target networks and the
       weighted GRN were integrated using the following equation

   [MATH:
   <mrow><mrow><msub><mi>W</mi><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow>
   </msub><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><
   msup><mi>e</mi><mrow><mo>−</mo><mfenced separators="" open="("
   close=")"><mrow><msup><mi>a</mi><mn>0</mn></msup><mo>+</mo><msubsup><mo
   >∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><
   mrow><msup><mi>a</mi><mi>i</mi></msup><msubsup><mi>w</mi><mrow><mi>R</m
   i><mo>,</mo><mi>T</mi></mrow><mi>i</mi></msubsup></mrow></mrow></mfence
   d></mrow></msup></mrow></mfrac></mrow></mrow> :MATH]
       (11)

    1. TF Activity Estimation: The above filtered regulons were used to
       calculate the relative enrichment score (VIPER^[ [305]^17 ^] by
       default) against the log2‐normalized expression data and integrate
       the TF–target networks. VIPER is a statistical framework that was
       developed to estimate protein activity from gene expression data
       using enriched regulon analysis performed by the algorithm aREA. It
       requires information about interactions between a protein and its
       transcriptional targets and the likelihood of their interaction.
       Here, the log2 normalized expression data and integrated TF target
       network were inputted to VIPER to generate the final TF activity
       matrix. Moreover, several functions were added to the metaTF
       package for statistical analysis and visualization, such as
       cell‐type‐specific activated TF identification, regulon pathway
       enrichment analysis, and data visual aids (dot plot, Radviz plot,
       Sankey diagram, and heatmap).

   where n represents the number of sources, a^i controls the importance
   of the source i ^th, a ^0 is a constant parameter, and
   [MATH:
   <mrow><msubsup><mi>w</mi><mrow><mi>R</mi><mo>,</mo><mi>T</mi></mrow><mi
   >i</mi></msubsup></mrow> :MATH]
   denotes the weight of the TF–target in source i ^th. Moreover, the
   mouse prior TF–target networks in the current research were transformed
   from the human networks using orthologous genes. Then, a normal
   distribution was fitted to the weights of targets for each TF in the
   integrated TF–target network, and the truncation value q = 0.01 was
   used to infer the threshold for these targets. For each TF, only
   targets with weights greater than the threshold were retained as a gene
   list and inputted together with the previously inferred GRN into the
   fgseaMultilevel method (fgsea, [306]https://github.com/ctlab/fgsea/) to
   do preranked gene set enrichment analysis. The leading edge of each
   enrichment result was considered as positively regulated targets for
   each TF. Finally, TFs and their positively regulated targets were
   identified as regulons.

Data Processing for Prior‐Network Construction

   The files “trrust_rawdata.human.tsv” and “edge.GoldStd_TF.tbl.txt” were
   downloaded from TRRUST and FANTOM4, respectively. From INTACT, a
   curated dataset, specifically focusing on human protein–DNA
   interactions, was acquired. From TFe, “Targets” and “interaction” were
   taken as input content and download relevant data. From TFactS, the
   “Catalogues.xls” file, which only preserves the interaction pairs of
   human species and contains relevant literature, was downloaded. From
   HTRIDB, only the results detected by small and medium‐sized
   technologies such as chromatin immunoprecipitation and linker chromatin
   immunoprecipitation were retained. For TRED, the BiPAX file was
   downloaded from RegNetwork and relevant TF interaction information was
   extracted. For TRRD, it was ultimately extracted from the
   “Catalogues.xls” file of TFactS. From MSIGDB, the regulatory target
   gene set collection was downloaded. For Neph2012, it was downloaded
   directly from its R package. From TRANSFAC, the transcription factor
   regulatory gene sets were downloaded. For PAZAR, it was downloaded from
   the ORegAnno database. Finally, all the interaction relationships
   together without redundancy were integrated.

   The integration of the prior associations between transcription factors
   and target genes from the aforementioned sources involved the
   utilization of MI scores for weighting. These scores, which were
   normalized weighted counts, take into account independent interaction
   evidence and relevant experimental methods to quantify the
   relationships accurately
   [MATH: <mrow><mrow><mi>Weigh</mi><msub><mi
   mathvariant="normal">t</mi><mi>MI</mi></msub><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mi>K</mi><mi
   mathvariant="normal">p</mi></msub><mo>×</mo><msub><mi>S</mi><mi
   mathvariant="normal">p</mi></msub><mfenced open="("
   close=")"><mi>n</mi></mfenced><mo>+</mo><msub><mi>K</mi><mi
   mathvariant="normal">m</mi></msub><mo>×</mo><msub><mi>S</mi><mi
   mathvariant="normal">m</mi></msub><mfenced separators="" open="("
   close=")"><mrow><mi>c</mi><mi>v</mi></mrow></mfenced><mo>+</mo><msub><m
   i>K</mi><mi
   mathvariant="normal">t</mi></msub><mo>×</mo><msub><mi>S</mi><mi
   mathvariant="normal">t</mi></msub><mfenced separators="" open="("
   close=")"><mrow><mi>c</mi><mi>v</mi></mrow></mfenced></mrow><mrow><msub
   ><mi>K</mi><mi
   mathvariant="normal">p</mi></msub><mo>+</mo><msub><mi>K</mi><mi
   mathvariant="normal">m</mi></msub><mo>+</mo><msub><mi>K</mi><mi
   mathvariant="normal">t</mi></msub></mrow></mfrac></mrow></mrow> :MATH]
   (12)

   Here, the weights in three dimensions, namely number of publications,
   experimental detection method, and interaction type, are denoted as K
   [p],K [m,] and K [t] respectively. Experimental methods are typically
   classified into three groups: small and mid‐scale techniques such as
   chromatin immunoprecipitation, concatenate chromatin
   immunoprecipitation, Cytosine‐phosphate‐Guanine (CpG) chromatin
   immunoprecipitation, DNA affinity chromatography, DNA affinity
   precipitation assay, DNase I footprinting, electrophoretic mobility
   shift assay, southwestern blotting, streptavidin chromatin
   immunoprecipitation, surface plasmon resonance, yeast one‐hybrid assay;
   high‐throughput techniques like ChIP‐chip or ChIP‐seq, and unknown.
   Interaction types include physical interaction, direct interaction,
   interaction, activation, inhibition, and unknown.

   The merged ChIP‐seq binding peaks were obtained from ReMap. Notably,
   only regulatory proteins excluding transcription factors were
   considered in this analysis. Using the bedtools closest function, each
   binding site was matched with the nearest transcription start site
   (TSS) for every TF. In cases where genes had multiple transcripts, the
   closest binding site–transcript pair was selected. This enabled the
   assignment of each binding site to a gene, allowing for the possibility
   of a gene having no or multiple binding sites for a given TF.
   Furthermore, a score ranging from 0 to 1 was assigned to each binding
   site–gene pair based on the distance between the binding site and the
   TSS
   [MATH: <mrow><mrow><mi>ChIp</mi><mfenced separators="" open="("
   close=")"><mrow><mi>TF</mi><mo>−</mo><mi>G</mi></mrow></mfenced><mo
   linebreak="badbreak">=</mo><munder><mo>∑</mo><mi>i</mi></munder><msup><
   mi
   mathvariant="normal">e</mi><mrow><mo>−</mo><mfrac><msub><mi>d</mi><mi>i
   </mi></msub><mrow><mi>k</mi><mo>×</mo><msub><mi>D</mi><mi>TF</mi></msub
   ></mrow></mfrac></mrow></msup></mrow></mrow> :MATH]
   (13)

   Here D [TF] stands for the median distance separating the TSS and
   binding sites associated with all TFs, whereas d[i] refers to the
   distance between the binding site within a TF and the TSS of the gene
   g. The score attributed to a TF target is the sum of the weights
   assigned to all binding sites of that TF along with the TSS of the
   target.

   The analysis involves scanning TFBS in the promoter sequences of each
   transcript using the human genome reference sequence (GRCh38 version),
   encompassing 1000 bp upstream (5′ end) and 200 bp downstream (3′ end).
   Position Weight Matrix (PWMs were sourced from the HOCOMOCO and JASPAR
   repositories. Employing the Find Individual Motif Occurrences (FIMO)
   tool in the Multiple EM for Motif Elicitation (MEME) software suite
   (MEME's FIMO) motif discovery tool with default parameters, putative
   TFBS were identified in the promoters, focusing on FIMO predictions
   with a P‐value of <0.0001 and removing duplicate matches. It was then
   proceeded with the annotation of conservation and epigenetic regulatory
   properties related to TFBS. Essential phase Cons and phyloP scores were
   sourced from the CellBase database for this purpose. The final weight
   calculation involved multiplying the phasCons score by the phyloP score
   (with −log(P) value)) while taking the square root into account.

Performance of GRN Inference Methods

   To evaluate the performance of GRN inference methods on single‐cell
   data, CROP‐seq data of HEK293T cells were downloaded from the Gene
   Expression Omnibus (GEO, [307]GSE92872^[ [308]^36 ^]). CROP‐seq enables
   pooled CRISPR knockout screening of several TFs simultaneously.
   Ideally, it was considered that the targets of the TFs were activated
   in wild‐type (WT) cells but not in perturbed cells. To ensure
   successful perturbations in the CROP‐seq experiments, samples were
   retained only when targeted TF was significantly downregulated in
   perturbed cells. Significant differential expression between stimulated
   and unstimulated cells was determined using a two‐tailed T‐test, and
   only the samples where the target TF was significantly downregulated (P
   < 0.05 and logFC < −1) were retained. A total of 1143 cells including
   14 target TFs were retained for further evaluation (Figure [309]S4f,
   Supporting Information). Next, genes expressed in <10 cells were
   filtered out. Three different methods, PUIC, GENIE3,^[ [310]^27 ^] and
   PCOR,^[ [311]^28 ^] were used for GRN inference. For GENIE3, the above
   14 target TFs were only used as input regulators considering the
   computational time cost. To evaluate these three methods, the targets
   of each TF were first pre‐ranked according to their edge weights, and
   then the area under the recovery curve (AUC) was calculated using
   independent gene sets from DoRothEA^[ [312]^15 ^] (ABCDE) as the ground
   truth. The AUC value of each TF reflects the enrichment of its targets
   in inferred GRNs. Additionally, to evaluate computational time
   consumption, expression data were simulated by subsampling different
   numbers of genes (100, 200, 400, 600, 800, and 1000) within the 1143
   cells. The running time was then calculated for GRN inference steps
   using each simulated expression matrix. The results showed that PUIC
   outperformed the two other methods in over half of the TFs
   (Figure [313]2f). Meanwhile, PUIC is much more efficient than GENIE3
   (Figure [314]S4h, Supporting Information).

Performance of the Integrated TF–Target Network

   To evaluate the performance of regulons from different sources,
   regulons were collected from the above resources in multiple ways:
   downloaded from DoRothEA (DoRothEA regulons), generated from RcisTarget
   (RcisTarget regulons), prior‐network and integrated TF‐target network
   from prior network and the weighted GRN. For DoRothEA regulons, only
   human regulons named “dorothea_hs” were used. RcisTarget regulons were
   obtained using the simplified SCENIC workflow described above. To
   generate GRN‐based regulons, GRNs were first inferred from single‐cell
   CRISPRi data using the PUIC method with all of the genes as input, and
   then ARACNE^[ [315]^119 ^] was used to remove the weakest connections
   within the GRNs to generate regulons. Integrated regulons were
   generated by combining the prior pan‐tissue network and GRNs inferred
   from single‐cell CRISPRi data of human ESCs.^[ [316]^31 ^] Only 40 TFs
   were retained, corresponding to TFs perturbed in the CRISPRi
   experiment. TF activities for the above TFs were estimated using the
   standard metaTF pipeline. Performance was evaluated by calculating AUC
   using the TF activities estimated as described above. Paired sample
   T‐tests were performed on the AUC values to determine the difference
   between any two groups of regulons (Figure [317]S4j, Supporting
   Information). The results showed that the integrated regulons performed
   better (an average AUROC of 0.62) than the regulons from other sources
   (Figure [318]S4i, Supporting Information). It was also noticed that the
   regulons directly downloaded from the database without considering the
   expression data only performed slightly better than random levels (an
   average AUROC of 0.52).

Performance of MetaTF

   To evaluate the performance of metaTF on TF activity estimation, the
   activity scores were transformed into a binary setup based on the
   perturbation type of the experiment (knockout/overexpression). For
   knockout experiments, cells with targeted TF knockout were labeled as
   the negative class, while control cells were labeled as the positive
   class. For overexpression experiments, the class labels were reversed.
   The receiver operating characteristic (ROC) and AUC analyses were then
   performed using the R package ROCR^[ [319]^120 ^] (version 1.0‐11).
   Using the positive and negative classes, the true positive rate (TPR)
   and false‐positive rate (FPR) were calculated at different thresholds
   of TF activity to generate the ROC curves (Figure [320]S1c, Supporting
   Information). Three commonly used single‐sample activity estimation
   methods including VIPER, ssGSEA, and AUCell, were benchmarked using the
   same integrated regulons. It was found that both VIPER and ssGSEA
   showed significantly higher performance than AUCell (Figure [321]S4k,
   Supporting Information).

Implementation of Simplified SCENIC Workflow

   To efficiently infer the GRNs using the SCENIC^[ [322]^18 ^] workflow,
   a protocol that contains the main SCENIC steps was implemented. In the
   simplified SCENIC workflow, GENIE3 (version 1.14.0) was first used for
   GRN reconstruction with potential TFs. Next, the targets of each TF
   were preranked based on their edge weights in the GRNs, and the
   top‐ranked (e.g., 5%) targets of each TF were selected as putative
   regulons. The putative regulons were further trimmed using the R
   package RcisTarget^[ [323]^121 ^] (version 1.12.0) combined with
   cis‐regulatory DNA‐motif information downloaded from
   [324]https://resources.aertslab.org/cistarget/. By default, the motif
   database and TF annotation used were from version 9 of cistarget
   ([325]https://resources.aertslab.org/cistarget/, hg38 and mm10, with a
   distance 500 bp Up 100 Dw for humans and mice, respectively). All
   enriched genes were retained only if the TF of the putative regulon was
   identified as either high or low confidence. Then, the enriched targets
   from different motifs of the same TF were merged as the final regulons.
   The TF activity was estimated using the R package AUCell (version
   1.14.0) with the final regulons against the log2‐normalized expression
   data. Finally, the above steps were summarized and implemented as a
   function named “runSCENIC” in the R package metaTF.

Preprocessing of Single‐Cell CRISPRi Data

   To evaluate the performance of TF regulon analysis methods,
   scRNA‐seq‐based CRISPRi screening data on human ESCs were retrieved
   from GEO ([326]GSE127202^[ [327]^31 ^]). Samples with fewer than ten
   cells and cells with multiple guide RNAs were removed. A T‐test was
   then performed between knockout and control samples, leaving only
   samples with significant downregulation (P < 0.05 and logFC < −1;
   Figure [328]S2a, Supporting Information). After preprocessing, an
   expression matrix of 4325 cells containing 40 targeted TFs across 106
   samples was retained for further analysis. The overall performance of
   metaTF was compared with SCENIC and DoRothEA on CRISPRi data with 40
   target TFs. MetaTF identified all the target TFs and performed best in
   nearly all TFs (Figure [329]2a). To elucidate the pivotal contributions
   of two key steps in inferring TF activity using metaTF, the activities
   of 35 TFs were recalculated by separately removing the steps involving
   PUIC for GRN calculation and the prior network. The findings revealed
   that, compared with PUIC alone, the inclusion of the prior network
   resulted in 32 TFs achieving higher AUROC values, indicating that the
   inclusion of the prior network enables metaTF to achieve better
   performance. Simultaneously, when excluding the PUIC calculation of the
   GRN step and retaining only the prior network, metaTF achieved results
   comparable to SCENIC (Figure [330]S5a,b, Supporting Information). In
   Figure [331]2a, SOX17 has AUROC values much higher than other factors.
   The estimation of SOX17 activity using metaTF and SCENIC indicated that
   the knockout of SOX17 led to a significant decrease in the expression
   of its target genes, including GATA4, GATA6, FOXA2, and SOX7 (Figure
   [332]S5c, Supporting Information). The downregulation of these TFs
   subsequently affects the expression of other downstream genes, leading
   to substantial changes at the transcriptional level. These changes are
   particularly beneficial for accurately estimating SOX17 activity,
   thereby enhancing the precision of SOX17 activity estimation.

   Consistent with previous analysis of human single‐cell CRISPR screening
   data, the prior knowledge‐based network was replaced with a mouse
   TF‐target network to assess the activity of these four TFs in perturbed
   cells and control cells from mouse brain cortical development^[
   [333]^122 ^] using three methods: metaTF, SCENIC, and DoRothEA. The
   AUROC values were calculated for each method (Figure [334]S7,
   Supporting Information). The results showed that metaTF achieved the
   highest AUROC values for Foxg1, Nr2f1, and Tcf4, indicating that the TF
   regulons identified by metaTF in mouse can reflect TF activity with a
   relatively high degree of accuracy.

TF Activity Inference in Bone Marrow Plasma Cells with MetaTF and T2T‐CHM13
Alignment

   The T2T‐CHM13 reference genome, a significant improvement over
   GRCh38,^[ [335]^123 ^] was used to analyze scRNA‐seq data from healthy
   human bone marrow plasma cells.^[ [336]^40 ^] Sequencing reads were
   aligned to T2T‐CHM13 to generate a gene expression matrix, which was
   filtered based on original study criteria. TF activities were inferred
   using metaTF with a blood‐associated regulatory network as the prior.
   Highly variable genes and TFs (top 500 by coefficient of variation
   (CV)) were selected for Uniform Manifold Approximation and Projection
   (UMAP) clustering. Gene expression clustering confirmed IgA and IgG
   plasma cell populations, though isotype segregation was less distinct
   (Figure [337]S10a, Supporting Information). TF activity analysis
   revealed three major clusters (Figure [338]S10b, Supporting
   Information), with specific TFs showing elevated activity (Figure
   [339]S10c, Supporting Information). These findings support the
   hypothesis that bone marrow plasma cells retain tissue‐of‐origin
   transcriptional programs, reflecting their diverse tissue origins.

Preprocessing of Single‐Cell CRISPR Data with Double‐Knockout TFs In CD8^+
Cytotoxic T‐Cell Differentiation

   As described by Zhou et al.[340] ^21 , they re‐engineered a dual‐guide,
   direct‐capture lentiviral single guide RNA (sgRNA) vector to generate a
   modified Ametrine‐expressing retroviral vector that effectively
   transduced primary CD8^+ T cells. The dataset was downloaded from
   single‐cell CRISPR screens pertaining to the differentiation of CD8^+
   cytotoxic T cells. Data from cells were retained with only one type of
   sgRNA, those with two types of sgRNAs, as well as data from control
   cells. sgRNA with fewer than ten cells were removed. A T‐test was then
   performed between knockout and control samples, leaving only samples
   with significant downregulation (P < 0.05 and logFC < −1). After
   preprocessing, an expression matrix of 53 774 cells containing 109
   single‐targeted and 426 double‐targeted TFs was retained for further
   analysis. The overall performance of metaTF with DoRothEA on CRISPRi
   data was compared with 109 single‐targeted TFs. For double‐targeted
   TFs, the target TFs metaTF activity scores were compared with the
   control cells.

Evaluating the Robustness of TF Activity Estimation Methods through Synthetic
Dropout Simulation

   Single‐cell CRISPRi screening data were used to evaluate the robustness
   of TF activity estimation methods through synthetic dropout simulation.
   The expression matrix was randomly zeroed out at various dropout ratios
   (5%, 10%, 20%, and 30% of total values) to create sparser matrices. For
   each simulated expression dataset, TF activities were inferred using
   metaTF, SCENIC, and DoRothEA. Specifically, metaTF integrated the
   prior‐based network with the GRN‐based network using the “integraNets”
   method with parameters “max_target_num = 2000” and “maxInter = 10.” For
   SCENIC, the “runSCENIC” method was used with default parameters. For
   DoRothEA, the “run_viper” method with built‐in data from the DoRothEA
   package was used as the default parameter. Three gene activity
   inference methods (VIPER, ssGSEA, and AUCell) were assessed using the
   same regulons identified by metaTF. For each method and dropout level,
   the AUC value of each TF was calculated.

   The impact of low coverage on metaTF performance was further assessed
   using 10× Genomics scRNA‐seq data from melanoma cultures
   ([341]GSE134432^[ [342]^32 ^]). Synthetic dropout was simulated to
   evaluate metaTF's robustness under varying dropout levels. Single
   melanoma sample MM074 cells were treated with SOX10 perturbation
   (knockdown group: MM074_SOX10_72 h) or nontargeting siRNA (control
   group: MM074_NTC; Figure [343]S3a–c, Supporting Information). SOX10
   activity was predicted in knockdown cells after applying random zeroing
   at dropout ratios from 5% to 50% (Figure [344]S3d, Supporting
   Information). MetaTF was used to infer SOX10 activities, integrating
   prior‐based and GRN‐based networks via “integraNets” with parameters
   “max_target_num = 2000” and “maxInter = 10.” As expected, with
   increasing dropout rates, the number of detected genes progressively
   decreases, and the performance of metaTF also declines gradually
   (Figure [345]S3e, Supporting Information). When the median number of
   detected genes per cell falls below 2500, there is a significant
   decline in the predictive performance of metaTF. Specifically, when the
   median number of detected genes drops below 2000, the accuracy of
   metaTF's predictions hovers around 50%. At this point, imputation
   methods (e.g., MAGIC,^[ [346]^79 ^] DeepImpute^[ [347]^80 ^]) can be
   employed to enhance data quality.

Evaluating the Performance of MetaTF Using In Vivo Nkx2‐1 Knockout Data

   Droplet‐based scRNA‐seq data consisting of lung cells from Nkx2‐1
   ^CKO/CKO; Aqp5 ^Cre ^/+ mutant mice and littermate controls were
   downloaded from GEO ([348]GSE129584^[ [349]^33 ^]). Since the Nkx2‐1
   conditional knockout (cKO) primarily affected alveolar type 1 (AT1)
   cells, the analysis exclusively focused on epithelial cells, in which
   Nkx2‐1 plays a key role. t‐SNE analysis on mutant and control cells was
   first performed using the R package scater,^[ [350]^124 ^] clustering
   them into 28 clusters (Figure [351]S4a, Supporting Information).
   Clusters with over 50% of cells expressing Cdh1, an epithelial marker
   gene, were further retained (Figure [352]S4b, Supporting Information).
   This resulted in 2085 control and 2484 Nkx2‐1 ^CKO/CKO; Aqp5 ^Cre ^/+
   mutant epithelial cells. The t‐SNE results showed that Nkx2‐1 was only
   expressed in WT cells, indicating a successful knockout of Nkx2‐1 in
   AT1 cells (Figure [353]S4c, Supporting Information). For the
   benchmarking process, TF activity matrices for these epithelial cells
   were calculated using standard metaTF, simplified SCENIC, and standard
   DoRothEA workflows. The ROC and AUC values for each method were
   calculated as previously described.

Evaluating the Performance of MetaTF Using In Vivo Cebpa Knockout Data

   Massively parallel single‐cell RNA sequencing (MARS‐seq) data
   consisting of myeloid progenitors from Cebpa conditional knockout
   (Cebpa ^flox/flox Mx1‐Cre) and matching control (Cebpa ^flox/flox) mice
   were downloaded from GEO ([354]GSE72857^[ [355]^34 ^]). After quality
   control, a total of 1152 control and 2304 Cebpa mutant cells were
   retained for analysis (Figure [356]S4d, Supporting Information). An
   analogous benchmarking process to the Nkx2‐1 knockout data was
   employed, with the distinction being that the prior GRNs were based on
   “myeloid_leukocyte”‐specific data. The results showed that metaTF still
   performed better (AUROC of 0.68) than other tools (Figure [357]2b).

Evaluating the Performance of MetaTF Using TF Overexpression Data

   Single‐cell Smart‐seq2 data from induced hemogenic endothelial cells
   (iHECs) with exogenous Runx1 and Hoxa9 overexpressions were downloaded
   from GEO ([358]GSE128738^[ [359]^35 ^]). After quality control, a total
   of 26 embryonic HECs and 65 iHECs were retained for analysis. An
   analogous benchmarking process to the Nkx2‐1 knockout data was
   employed, with the distinction being that the prior GRNs were based on
   blood‐associated data, and cells with TF overexpression were considered
   as the positive class when computing ROC and AUC values.

Pathway Enrichment Analysis of Regulons

   To identify the biological pathways associated with each regulon,
   pathway enrichment analysis was performed using the Jaccard test.^[
   [360]^125 ^] This test evaluated the similarity and statistical
   significance between the regulon gene set and pathway gene lists based
   on their shared genes. Notably, the initial enriched pathway terms for
   each regulon contained redundancies. To address this issue, Jaccard
   similarity coefficients between all pairs of enriched terms were
   calculated and then clustered into nonredundant groups using
   unsupervised hierarchical clustering. The enriched term with the
   smallest P‐value in each group was selected to represent that group. By
   default, the Reactome pathway database^[ [361]^126 ^] (version 77)
   covering 16 species was used for enrichment analysis. Only terms
   containing 10–2000 genes were included. This was allowed for inferring
   key biological pathways linked to each regulon in a nonredundant
   manner.

Analysis of Single‐Cell RNA Sequencing Data during HSC Development

   The raw single‐cell Smart‐seq2 data from six sequential cell
   populations (of note, E12 and E14 FL HSCs were combined) during HSC
   development were sourced from the GEO datasets ([362]GSE153653 and
   [363]GSE67120) derived from the recently published study.^[ [364]^22 ,
   [365]^127 ^] Initially, adaptor and low‐quality base trimming were
   performed through trimmomatic^[ [366]^128 ^] (version 0.36) using the
   parameters “LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 HEADCROP:20
   CROP:101 MINLEN:101.” Subsequently, gene expression was quantified as
   counts using Salmon^[ [367]^129 ^] (version 1.3.0) with clean data
   derived from trimmomatic output, where reads were mapped to the mouse
   GENCODE^[ [368]^130 ^] transcriptome (mm10) and gene annotation (vM25,
   primarily assembled). Log normalization was performed using the
   “logNormCounts” function from the R package scater. TF activities were
   estimated with metaTF, using blood‐associated networks as prior
   network. The CV of all genes and TFs was then calculated using
   normalized counts or activity scores, and the top 500 highly variable
   genes/TFs were retained for t‐SNE analysis. To confirm the efficacy of
   TF activity profiles in more effectively classifying distinct cell
   types compared to gene expression profiles, a classic machine learning
   approach, random forest classification model, was employed on mouse
   embryonic HSCs cell populations. Separate models were fitted using TF
   activity profiles and gene expression profiles, and models were trained
   in Python package scikit‐learn with “RandomForestClassifier” method.
   Cell‐type specificity of TFs was determined at both the expression and
   activity levels using the “findAllCTSRegulons” method in the R package
   metaTF. TFs with adjusted P < 0.01 and cell‐type‐specific score (CSS) >
   3 were considered significant cell‐type‐specific regulators. The
   “RadvizEnrichPlot” function in the R package metaTF was used to
   visualize cell‐type‐specific TFs, and the R package UpSetR^[ [369]^30
   ^] (version 1.4.0) was used to generate Venn diagrams.

   Rela loss impairs HSC function and promotes hematopoietic stem and
   progenitor cell (HSPC) cycling. To confirm that the target genes of
   Rela are indeed activated in HSCs, the enrichment of these targets was
   estimated using an independent dataset with Rela knockout (TF
   knockout/knockdown data from mouse bone marrow HSCs,^[ [370]^38 ^]
   [371]GSE45755). Ideally, Rela knockout would lead to the downregulation
   of its target genes, so GSEA was performed using Rela’s target genes on
   logFCs calculated from Rela knockout and control samples. The R package
   fgsea was used to calculate the normalized enrichment score and
   adjusted P‐values. The enrichment plot showed that these target genes
   were significantly enriched in control samples (Figure [372]3g),
   indicating that the target genes of Rela identified with metaTF can be
   confirmed in other independent datasets. The same confirmation was also
   performed on another TF, Sp1, using microarray data with Sp1 knockout^[
   [373]^39 ^] ([374]GSE52497).

Analysis of Breast Cancer Epithelial Cell Single‐Cell RNA Sequencing Data

   Breast cancer patient scRNA‐seq data from the GEO ([375]GSE176087) were
   analyzed.^[ [376]^131 ^] Gene expression was log normalized, and cells
   with fewer than 500 expressed genes were filtered out, resulting in 13
   062 genes across 25 605 epithelial cells. The Seurat v3.0.0 method was
   employed in R (v3.5.0) for data normalization, dimensionality
   reduction, and clustering. Specifically, to recluster epithelial
   lineages, cell signatures were utilized for breast epithelial subsets
   described by Lim et al.^[ [377]^132 ^] as input features. Subsequently,
   the FindIntegrationAnchors step was utilized to identify anchors,
   followed by alignment with 30 dimensions (IntegrateData step) within
   Seurat. The final clustering results were visualized using t‐SNE. TF
   activities were estimated using metaTF with stranded pipeline except
   that tissue‐associated networks were used as prior networks. The CV of
   each gene/TF was then calculated using normalized counts or activity
   scores, and only the top 500 highly variable TFs or genes were retained
   for dimensionality reduction analysis (t‐SNE). The cell‐type
   specificity of TFs at the expression and activity levels was estimated
   using the “findAllCTSRegulons” method in the metaTF package. TFs with
   adjusted P < 0.05, CSS > 0.3, and percent > 0.1 for each stage were
   considered significant. Three evaluation indicators (Between/within,
   Calinski–Harabasz index, 1/Davies–Bouldin index (1/DBI)) for clustering
   results were calculated to show that using TF activities can more
   closely cluster cell populations and separate different types of cells
   as much as possible. For gene set activity analysis in scRNA‐seq data,
   the AUCell method from the AUCell Bioconductor package was used to
   calculate AUCell scores.

   The raw single‐cell count matrix data were downloaded from GEO
   ([378]GSE180286). Then, all cells expressing <300 genes were removed,
   and >20% mitochondrial counts. Genes expressed in fewer than three
   cells were likewise removed. The Seurat default parameters were used
   unless stated otherwise. For the clustering of all cell types, 2000
   variable genes were identified, and principal component analysis (PCA)
   was applied to the dataset to reduce dimensionality after regressing
   the percentage of mitochondrial genes. Data integration was performed
   using the Seurat functions FindIntegrationAnchors and IntegrateData.
   Clustering was conducted using the FindClusters function with 30 PCA
   components and a resolution parameter set to 1.2 and visualized using
   UMAP with the RunUMAP function. Cell types were annotated based on
   canonical cell‐type markers. Cells annotated as epithelial cells in
   each sample were extracted into a subset and separately re‐clustered,
   using the methods described above and the following parameters: n
   [features] = 2000, npcs = 20, and resolution = 0.5. Malignant
   epithelial cells were identified based on genome‐wide copy number
   profiles computed from the gene expression Unique Molecular Identifier
   (UMI) matrix using the Bayesian segmentation approach, CopyKAT^[
   [379]^133 ^] (v1.0.5). Then, malignant epithelial cell types were
   annotated with scSubtype. Normal epithelial cell types were annotated
   based on canonical cell‐type markers. TF activities were estimated
   using metaTF with a stranded pipeline, except that tissue‐associated
   networks were used as prior networks.

Validation on TCGA BRCA RNA‐Seq Dataset

   The RNA‐seq data from breast cancer patients were downloaded from the
   TCGA database ([380]https://portal.gdc.cancer.gov/), and subtyping was
   performed based on Berger et al.’s work^[ [381]^52 ^] (subtype PAM50).
   This dataset includes 575 LumA BRCA clinical subtype samples, 211 LumB
   BRCA clinical subtype samples, 198 Basal BRCA clinical subtype samples,
   82 Her2 BRCA clinical subtype samples, 40 Normal‐like BRCA clinical
   subtype samples, 113 BRCA tumor adjacent normal tissue samples.
   Subsequently, the genesets of cancer cell‐specific activated regulons
   which were identified by metaTF in BRCA epithelial cell scRNAseq data
   were used as input for GSVA.^[ [382]^53 ^] GSVA scores were calculated,
   and differentially activated regulons were determined using the limma R
   package with the parameter “adjusted P‐value of <0.01” (one versus the
   rest).

Analysis of Prostate Cancer Epithelial Cell Single‐Cell RNA Sequencing Data

   Human prostate cancer single‐cell RNA‐seq data ([383]GSE176031),
   excluding samples related to prostate epithelial organoids, were
   downloaded and epithelial cells were extracted while retaining the
   original cell type annotations. Subsequently, clustering analysis was
   performed on the five epithelial cell populations using metaTF and
   Seurat. In brief, gene expression was log‐normalized, and cells with
   fewer than 500 expressed genes were filtered out, resulting in 14 546
   epithelial cells. TF activities were estimated using metaTF with a
   stranded pipeline, except that tissue‐associated networks were used as
   prior networks. The CV of each gene/TF was then calculated using
   normalized counts or activity scores, and only the top 500 highly
   variable TFs or genes were retained for dimensionality reduction
   analysis (UMAP). To validate that TF activity can more effectively
   distinguish different epithelial cell populations, all epithelial cell
   TF activity profiles, gene expression profiles, and cell type
   annotations were inputted into a random forest classification model.
   After model training (using the RandomForestClassifier method from the
   scikit – learn Python package and specifying ‘max_depth = 5′ as one of
   the parameters), precision and recall values were calculated for each
   cell population.

Total RNA Extraction and Quantitative Real‐Time PCR

   Total RNA was extracted from tissues or cells with Trizol reagent
   (Invitrogen, United States). RNA was reverse‐transcribed into
   Complementary DNA (cDNA) using reverse Transcriptase M‐MLV (Takara,
   Dalian, China). The relative expression of genes was detected with
   Hieff Quantitative Polymerase Chain Reaction (qPCR) SYBR Green Master
   Mix (#11201ES08 Yeasen Biotechnology, Shanghai, China).

Cell Cultures

   The human TNBC cell lines BT549 and Hs578T were cultured in Dulbecco's
   Modified Eagle Medium (DMEM) containing 10% fetal bovine serum (Procell
   Life Science & Technology, China.) at 37 °C in a 5% CO[2] humidified
   atmosphere.

Transfection of siRNAs

   The siRNAs targeting 15 TFs and negative control were synthesized by
   Sangon Biotech (Shanghai, China). The sequences of these siRNAs are
   included in Table [384]S16 (Supporting Information). The transfection
   was mediated by jetPRIME transfection reagent (Polyplus‐transfection,
   France) according to the manufacturer's protocol.

Cell Migration and Invasion Assay

   Transwell chambers (6.5 mm insert, 24‐well plate, 8.0 µm polycarbonate
   membrane, Corning Incorporated, Corning, NY, USA) with or without
   Matrigel were used to measure the invasion and migration ability of
   breast cancer cells in vitro. Briefly, BT549 and HS578T cells
   transfected with siRNAs were seeded in the upper chamber of transwell
   at the density of 5 × 10^4 cells (for the migration assay) or 8 × 10^4
   cells (for the invasion assay) in serum‐free DMEM medium, and DMEM
   containing 10% fetal bovine serum was added to the lower chamber. After
   culturing for 24 h (for the migration assay) or 48 h (for the invasion
   assay), the upper chamber cells were wiped by using cotton swabs, and
   fixed with crystal violet. The number of cells across the filter was
   counted by microscopic imaging system.

Cell Proliferation Assay

   Breast cancer cells (BT549 and Hs578T) were plated in 96‐well plates at
   a density of 5000 cells per well. At 0, 24, 48, and 72 h, 20 µL of
   3‐(4,5‐Dimethylthiazol‐2‐yl)‐2,5‐diphenyltetrazolium bromide (MTT)
   reagent (BS186, Biosharp) was added to each well and mixed thoroughly.
   Following incubation for 4 h at 37 °C, Dimethyl Sulfoxide (DMSO) was
   mixed with the cell–MTT suspension for 10 s using a plate shaker, the
   conversion of substrate to chromogenic product by live cells was
   detected using a microplate reader to analyze cell viability.

Analysis of Breast Cancer Lymphoid Cell Single‐Cell RNA Sequencing Data

   Two breast cancer patient scRNA‐seq datasets from the 10× Genomics
   ([385]https://www.10xgenomics.com/resources/datasets) were analyzed.
   The data were integrated in R using Seurat v3.1.1 with the following
   steps: i) identified 2000 integration anchors using
   Seurat::FindIntegrationAnchors, ii) integrated the objects with
   Seurat::IntegrateData, iii) scaled the data and regressed out the
   nCount_RNA using all genes, and iv) calculated the top 30 PCs for
   dimensional reduction. TF activities were estimated with metaTF using
   tissue‐associated networks as prior network. The CV values for all
   gene/TFs were calculated using normalized counts or activity scores,
   and then the top 2000 highly variable genes or the top 500 highly
   variable TFs were retained for UMAP analysis. Differentially expressed
   genes between CD8^+ T[EM] C1 and C2 populations were identified using
   Seurat::FindMarkers function with the parameters “min.pct = 0.3,
   log.fold.change = 0.3.” To examine the consistency of the findings in
   other breast cancer dataset, an scRNA‐seq dataset of human breast
   cancer T cells ([386]GSE110686) was reanalyzed, and the same processing
   steps were performed.

   For the epinephrine RNA‐seq data, reads were aligned to the human
   reference genome (Gencode^[ [387]^130 ^] v43) with HISAT2^[ [388]^134
   ^] v2.0.52. Gene counts were derived with featureCounts^[ [389]^135 ^]
   v2.0.3. Differential expression analysis was performed using edgeR^[
   [390]^136 ^] with the parameters dispersion = 0.4 and P < 0.05. Figures
   were generated in the R package ggplot2
   ([391]https://github.com/tidyverse/ggplot2).

The Extraction of Tumor‐Infiltrating Lymphocytes

   Tumor tissue samples were obtained by surgical excision and dissociated
   into single‐cell suspensions for 3 h using 1 mg mL^−1 collagenase IV.
   Density gradient centrifugation was performed using lymphocyte
   separation medium to separate lymphocytes from other cell types.
   Isolated lymphocytes were washed with Phosphate Buffered Saline (PBS)
   to remove residual cell debris and surface staining was performed with
   Fluorescein Isothiocyanate (FITC) anti‐human CD8 antibody (1:20
   dilution, OKT8, Biolegend) at 4 °C in the dark for 30 min. After
   staining, cells were resuspended with PBS and filtrated through a mesh
   filter (40 µm), and then CD8^+ T lymphocytes were specifically isolated
   by BD FACSAria II cell sorter. Tumor‐infiltrating T cells were cultured
   in CST AIM V Medium (Gibco, Thermo Fisher Scientific) supplemented with
   10% FBS (Gibco, Thermo Fisher Scientific) and Human Recombinant IL‐2
   (500 units mL^−1, #78220, STEMCELL Technologies) for 15 days. The cells
   were treated with epinephrine (0.1 ng mL^−1, Sigma‐Aldrich) for 24 h
   followed by RNA sequencing.^[ [392]^71 ^]

Immunofluorescence Staining and Confocal Laser Scanning Microscopy

   Human breast cancer and tumor‐infiltrating T cells were fixed with 4%
   cool paraformaldehyde for 10 min and blocked with 1% Bovine Serum
   Albumin (BSA) for 30 min. After washing in PBS, breast cancer tissues
   were stained with NF‐H Polyclonal antibody (1:100 dilution, 21471‐1‐AP,
   Proteintech) at 4 °C overnight in the dark, followed by FITC goat
   aAtirabbit IgG (H+L) (1:100 dilution, ZF‐0311, ZSGB‐BIO) and
   tumor‐infiltrating T cells were stained with anti‐ADRB2 antibody (1:100
   dilution, AF6117, Affinity) at 4 °C overnight in the dark, followed by
   Rhodamine (TRITC) goat antiRabbit IgG (H+L) (1:100 dilution, AS040,
   ABclonal) at room temperature in the dark for 1 h. After three washes
   in PBS, the samples were stained with anti‐CD8 Mouse mAb (FITC
   Conjugate) (1:400 dilution, 55397S, Cell Signaling Technology) at room
   temperature in the dark for 3 h, and then nuclei were stained with
   4',6‐Diamidino‐2‐Phenylindole (DAPI) (D3571, Invitrogen) for 5 min in
   the dark. Finally, the samples were sealed with sealing agent.

   The expressions of ADRB2 and CD8 in breast cancer tissues were detected
   by polychromatic immunofluorescence staining. Polychromatic
   immunofluorescence staining was performed by using the three‐color
   multiple fluorescent immunohistochemical staining kit (HYDS0023, Guduo
   Technology, China) based on the tyramide signal amplification (TSA)
   technique following the manufacturer's manual. Briefly, deparaffinize
   and rehydrate breast cancer tissue sections, and then quench endogenous
   enzyme activity by incubating 30 min in 0.3% hydrogen peroxide and
   block nonspecific binding sites with goat serum for 10 min. The
   sections were incubated with primary antibodies against CD8 (AF5126,
   Affinity) at 4 °C overnight and incubated with secondary antibody
   labeled with Horseradish Peroxidase (HRP) provided with the kit for 50
   min, followed by GD‐520 fluorescent dye diluted by the signal
   amplification reagent provided with the kit at room temperature in the
   dark for 15 min. The steps above were repeated, including the
   following: antigen retrieval and endogenous peroxidase blocking, the
   sections were incubated with primary antibodies against ADRB2 (AF6117,
   Affinity) at 4 °C overnight and incubated with secondary antibody
   labeled with HRP provided with the kit for 50 min, followed by GD‐570
   fluorescent dye diluted by the signal amplification reagent provided
   with the kit at room temperature in the dark for 15 min. Finally,
   sections were counterstained with DAPI, sealed with glycerol and imaged
   using a ZEISS LSM 900 confocal microscope.

RNA Extraction, Library Construction, and Sequencing

   Total RNA was extracted using Trizol reagent (15596018, Thermo Fisher)
   following the manufacturer's procedure and the quantity and purity of
   RNA were detected by Bioanalyzer 2100 and RNA 6000 Nano LabChip Kit
   (5067‐1511, Agilent). After mRNA was purified by using Dynabeads Oligo
   (dT) (Thermo Fisher) and was fragmented into short fragments, and then
   the RNA fragments were reverse‐transcribed to create the cDNA by
   SuperScript II Reverse Transcriptase (18064014, Invitrogen), the
   products were amplified by PCR. Finally the high‐throughput sequencing
   (PE150) was performed on the Illumina Novaseq 6000 sequencer (LC‐Bio
   Technology Co., Ltd., Hangzhou, China).

Cell‐Type Specificity Estimation

   To evaluate the cell‐type specificity of TFs (or genes), an
   entropy‐based measure was used that quantifies the similarity between a
   TF activity and another predefined pattern, which represents the
   extreme state in which the TF is only activated in an indicated cell
   type. The specificity metric mostly relied on the Jensen–Shannon
   divergence^[ [393]^137 ^] (JSD), which was calculated as follows
   [MATH: <mrow><mrow><mi>JSD</mi><mfenced separators="" open="("
   close=")"><mrow><mi>g</mi><mo>,</mo><msup><mi>g</mi><mo>′</mo></msup></
   mrow></mfenced><mo linebreak="badbreak">=</mo><mi>H</mi><mfenced
   separators="" open="("
   close=")"><mfrac><mrow><mi>g</mi><mo>+</mo><msup><mi>g</mi><mo>′</mo></
   msup></mrow><mn>2</mn></mfrac></mfenced><mo
   linebreak="goodbreak">−</mo><mfrac><mrow><mi>H</mi><mfenced open="("
   close=")"><mi>g</mi></mfenced><mo>+</mo><mi>H</mi><mfenced
   separators="" open="("
   close=")"><msup><mi>g</mi><mo>′</mo></msup></mfenced></mrow><mn>2</mn><
   /mfrac></mrow></mrow> :MATH]
   (14)

   where g and g′ represent two discrete probability vectors (such as the
   TF activity score) with equal length n, and H represent the entropy of
   discrete probability vectors as
   [MATH: <mrow><mtable displaystyle="true"><mtr><mtd
   columnalign="right"><mrow><mi>H</mi><mfenced open="("
   close=")"><mi>p</mi></mfenced><mo
   linebreak="badbreak">=</mo><mo>−</mo><munderover><mo>∑</mo><mrow><mi>i<
   /mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msub><mi>p</mi><
   mi>i</mi></msub><mi>log</mi><mfenced separators="" open="("
   close=")"><msub><mi>p</mi><mi>i</mi></msub></mfenced><mo>,</mo><mi>p</m
   i><mo linebreak="goodbreak">=</mo><mfenced separators="" open="("
   close=")"><mrow><msub><mi>p</mi><mn>1</mn></msub><mo>,</mo><msub><mi>p<
   /mi><mn>2</mn></msub><mo>,</mo><mtext>…</mtext><mo>,</mo><msub><mi>p</m
   i><mi>n</mi></msub></mrow></mfenced><mo>,</mo><mn>0</mn><mo
   linebreak="goodbreak">≤</mo><msub><mi>p</mi><mi>i</mi></msub><mspace
   width="0.28em"></mspace><mo
   linebreak="goodbreak">≤</mo><mn>1</mn><mo>,</mo><mi>and</mi><munderover
   ><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munde
   rover><msub><mi>p</mi><mi>i</mi></msub><mo
   linebreak="goodbreak">=</mo><mn>1</mn></mrow></mtd></mtr></mtable></mro
   w> :MATH]
   (15)

   Then, the distance between two TF activity patterns, r ^1 and r ^2 was
   defined as
   [MATH: <mrow><mrow><mi>JS</mi><msub><mi
   mathvariant="normal">D</mi><mi>dist</mi></msub><mfenced separators=""
   open="("
   close=")"><mrow><msup><mi>r</mi><mn>1</mn></msup><mo>,</mo><msup><mi>r<
   /mi><mn>2</mn></msup></mrow></mfenced><mo
   linebreak="badbreak">=</mo><msqrt><mrow><mi>JSD</mi><mfenced
   separators="" open="("
   close=")"><mrow><msup><mi>r</mi><mn>1</mn></msup><mo>,</mo><msup><mi>r<
   /mi><mn>2</mn></msup></mrow></mfenced></mrow></msqrt></mrow></mrow>
   :MATH]
   (16)

   The CSS of a TF,
   [MATH: <mrow><mrow><mi>r</mi><mo>=</mo><mo
   stretchy="false">(</mo><mrow><msup><mi>r</mi><mn>1</mn></msup><mo>,</mo
   ><msup><mi>r</mi><mn>2</mn></msup><mo>,</mo><mtext>…</mtext><mo>,</mo><
   msup><mi>r</mi><mi>m</mi></msup></mrow><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   , in an indicated cell type was calculated as
   [MATH: <mrow><mrow><mi>CSS</mi><mfenced open="("
   close=")"><mi>r</mi></mfenced><mo
   linebreak="badbreak">=</mo><mn>1</mn><mo
   linebreak="goodbreak">−</mo><mi>JS</mi><msub><mi
   mathvariant="normal">D</mi><mi>dist</mi></msub><mfenced separators=""
   open="(" close=")"><mrow><mi>r</mi><mo>,</mo><msup><mi>r</mi><mi
   mathvariant="normal">c</mi></msup></mrow></mfenced></mrow></mrow>
   :MATH]
   (17)

   where r ^c represents a predefined pattern of the extreme case in which
   the TF is only activated in this cell type and was defined as
   [MATH: <mrow><mrow><msup><mi>r</mi><mi>c</mi></msup><mo
   linebreak="badbreak">=</mo><mfenced separators="" open="("
   close=")"><mrow><msubsup><mi>r</mi><mn>1</mn><mi>c</mi></msubsup><mo>,<
   /mo><msubsup><mi>r</mi><mn>2</mn><mi>c</mi></msubsup><mo>,</mo><mtext>…
   </mtext><mo>,</mo><msubsup><mi>r</mi><mi>n</mi><mi>c</mi></msubsup></mr
   ow></mfenced><mo>,</mo><mi>s</mi><mo>.</mo><mi>t</mi><mo>.</mo><msubsup
   ><mi>r</mi><mi>i</mi><mi>e</mi></msubsup><mo
   linebreak="goodbreak">=</mo><mfenced separators="" open="{"
   close=""><mtable><mtr><mtd><mrow><mn>1</mn><mo>,</mo><mi>i</mi><mo>∈</m
   o><mspace
   width="0.28em"></mspace><mi>c</mi></mrow></mtd></mtr><mtr><mtd><mrow><m
   n>0</mn><mo>,</mo><mi>i</mi><mo>∉</mo><mspace
   width="0.28em"></mspace><mi>c</mi></mrow></mtd></mtr></mtable></mfenced
   ></mrow></mrow> :MATH]
   (18)

   In the context of TF activity data, the i represent each cell and the n
   represent the total number of cells. Meanwhile, for TF activity scores
   with negative values, and these were normalized into ranges from 0 to 1
   as
   [MATH: <mrow><mfrac><mrow><mi>x</mi><mspace
   width="0.28em"></mspace><mo>−</mo><mspace
   width="0.28em"></mspace><msub><mi>x</mi><mi>min</mi></msub></mrow><mrow
   ><msub><mi>x</mi><mi>max</mi></msub><mspace
   width="0.28em"></mspace><mo>−</mo><mspace
   width="0.28em"></mspace><msub><mi>x</mi><mi>min</mi></msub></mrow></mfr
   ac></mrow> :MATH]
   .

   Next, to evaluate the significance of a TF that is specifically
   activated in an indicated cell type, the P‐value of TF was estimated
   using a bootstrap strategy. Specifically, for a TF in an indicated cell
   type, the cell labels were shuffled and the simulated CSS were
   calculated 10 000 times. Next, the P‐value was calculated as the
   percentage of the simulated CSSs that were greater than the actual CSS
   within all 10 000 simulated CSSs, and the Benjamini & Hochberg method
   was used to adjust the P‐values. Furthermore, the cell‐type‐specific
   methods were also suitable for gene expression.

Statistical Analysis

   To compare the means of three or more independent groups (Figure
   [394]S4i,b, Supporting Information), a one‐way Analysis of Variance
   (ANOVA) was employed using the aov function from the stats package in
   R. If the overall ANOVA was significant (*P < 0.05), post‐hoc
   comparisons were performed using Tukey's Honestly Significant
   Difference (HSD) test to identify specific group differences.
   Thresholds for statistical significance are denoted by *P < 0.05, **P <
   0.01, and ***P < 0.001.

Conflict of Interest

   The authors declare no conflict of interest.

Author Contributions

   Y.H., Y.Z., G.T., M.S., and P.T. contributed equally to this work.
   D.W., X.b.L., and Z.L. conceived the study and designed and supervised
   the study. D.W., X.b.L., Y.Z., and M.S. performed the experiments.
   Y.H., G.T., and P.T. performed bioinformatic analyses. D.W., X.b.L.,
   Y.H., and Z.L. wrote the manuscript with input from all of the authors.
   Y.Y., X.Z., M.L., X.n.L., L.W., J.C., H.Z., Y.H., and Y.H. revised the
   manuscript.

Supporting information

   Supporting Information
   [395]ADVS-12-e10745-s004.docx^ (50.9MB, docx)

   Supplemental Table 1
   [396]ADVS-12-e10745-s008.xlsx^ (19KB, xlsx)

   Supplemental Table 2
   [397]ADVS-12-e10745-s007.xlsx^ (10.4KB, xlsx)

   Supplemental Table 3
   [398]ADVS-12-e10745-s019.xlsx^ (197.1KB, xlsx)

   Supplemental Table 4
   [399]ADVS-12-e10745-s010.xlsx^ (17.6KB, xlsx)

   Supplemental Table 5
   [400]ADVS-12-e10745-s016.xlsx^ (18.4KB, xlsx)

   Supplemental Table 6
   [401]ADVS-12-e10745-s014.xlsx^ (11.4KB, xlsx)

   Supplemental Table 7
   [402]ADVS-12-e10745-s009.xlsx^ (14.5KB, xlsx)

   Supplemental Table 8
   [403]ADVS-12-e10745-s006.xlsx^ (295KB, xlsx)

   Supplemental Table 9
   [404]ADVS-12-e10745-s021.xlsx^ (13.5KB, xlsx)

   Supplemental Table 10
   [405]ADVS-12-e10745-s003.xlsx^ (10.7KB, xlsx)

   Supplemental Table 11
   [406]ADVS-12-e10745-s015.xlsx^ (18.8KB, xlsx)

   Supplemental Table 12
   [407]ADVS-12-e10745-s020.xlsx^ (22.5KB, xlsx)

   Supplemental Table 13
   [408]ADVS-12-e10745-s012.xlsx^ (12.4KB, xlsx)

   Supplemental Table 14
   [409]ADVS-12-e10745-s001.xlsx^ (22.1KB, xlsx)

   Supplemental Table 15
   [410]ADVS-12-e10745-s017.xlsx^ (15.5KB, xlsx)

   Supplemental Table 16
   [411]ADVS-12-e10745-s018.xlsx^ (11.4KB, xlsx)

   Supplemental Table 17
   [412]ADVS-12-e10745-s011.xlsx^ (44.7KB, xlsx)

   Supplemental Table 18
   [413]ADVS-12-e10745-s005.xlsx^ (48.5KB, xlsx)

   Supplemental Table 19
   [414]ADVS-12-e10745-s013.xlsx^ (11.6KB, xlsx)

   Supplemental Table 20
   [415]ADVS-12-e10745-s002.xlsx^ (2.2MB, xlsx)

Acknowledgements