Abstract

   RNA velocity is closely related with cell fate and is an important
   indicator for the prediction of cell states with elegant physical
   explanation derived from single-cell RNA-seq data. Most existing RNA
   velocity models aim to extract dynamics from the phase delay between
   unspliced and spliced mRNA for each individual gene. However,
   unspliced/spliced mRNA abundance may not provide sufficient signal for
   dynamic modeling, leading to poor fit in phase portraits. Motivated by
   the idea that RNA velocity could be driven by the transcriptional
   regulation, we propose TFvelo, which expands RNA velocity concept to
   various single-cell datasets without relying on splicing information,
   by introducing gene regulatory information. Our experiments on
   synthetic data and multiple scRNA-Seq datasets show that TFvelo can
   accurately fit genes dynamics on phase portraits, and effectively infer
   cell pseudo-time and trajectory from RNA abundance data. TFvelo opens a
   robust and accurate avenue for modeling RNA velocity for single cell
   data.

   Subject terms: Computational models, Genome informatics, RNA
   sequencing, Data processing, Gene regulatory networks
     __________________________________________________________________

   Most RNA velocity models extract dynamics from the phase delay between
   unspliced and spliced mRNA for each gene. Here, authors propose TFvelo,
   broadening RNA velocity beyond splicing information to include gene
   regulation. TFvelo accurately models genes dynamics and infers cell
   pseudo-time from RNA abundance data.

Introduction

   Single-cell RNA sequencing (scRNA-seq)^[30]1–[31]3 provides a wealth of
   information about the gene expression profile of individual cells. To
   expand the scope of scRNA-seq analysis beyond a static snapshot and
   capture cellular dynamics without tracking alive individual cells over
   time, many trajectory inference (TI) approaches, like PAGA^[32]4,
   Monocle^[33]5, Slingshot^[34]6 and Palantir^[35]7 have been developed.
   While these TI methods enable pseudotime analysis at both the cell and
   gene levels, they typically require the annotation of initial
   cells^[36]8. In recent years, RNA velocity^[37]9 theory was proposed,
   which describes the time derivative of gene abundance in a
   physical-informed approach, by modeling the relationship between the
   unspliced (immature) and spliced (mature) mRNAs. Combining velocities
   across multiple genes, velocity-based pseudotime and cell fate can be
   inferred from the transition probability between cells. To estimate the
   RNA velocity, velocyto^[38]9 is introduced as the vanilla approach with
   a steady state assumption. ScVelo^[39]10 models the dynamics without
   the steady state assumption and employs an Expectation-Maximization
   (EM) approach for better reconstructing the underlying kinetics.

   More recently, several approaches have been developed to improve the
   estimation of RNA velocity. For instance, VeloAE^[40]11 utilized an
   auto-encoder and low-dimensional space to smooth RNA velocity. Still in
   the unspliced/spliced space, DeepCycle^[41]12 used an autoencoder to
   map cell cycle-related genes. UniTVelo^[42]13 directly designed a
   function of time to model the spliced RNA level. In contrast to
   deterministic models, VeloVI^[43]8, Pyrovelocity^[44]14 and
   veloVAE^[45]15 estimated RNA velocity using Bayesian inference
   frameworks. LatentVelo^[46]16 embeds the unspliced and spliced
   expression into the latent space with a variational auto-encoder, so
   that a low-dimensional representation of RNA velocity could be
   obtained. CellDancer^[47]17 models the transcription, splicing and
   degradation rate of a gene as a function of its unspliced and spliced
   counts with a neural network. Furthermore, apart from the
   unspliced/spliced data, dynamics can also be modeled with additional
   information. For instance, Dynamo^[48]18 improve the RNA velocity model
   by using the new/total labeled RNA-seq^[49]19. Taking the advantage of
   single-cell multi-omics technology^[50]20,[51]21, RNA velocity analysis
   can be expanded to protein abundances (protaccel^[52]22) and
   single-cell ATAC-seq (MultiVelo^[53]23) datasets. However, these
   additional omics data or labeling may be cost or even not available in
   most single cell datasets.

   While RNA velocity theory has improved the inference of single-cell
   trajectories, pseudo-time, and gene regulation inference in numerous
   studies^[54]24, current RNA velocity models still face severe
   challenges. Firstly, RNA velocity models always treat each gene
   independently and do not take the underlying regulation into
   consideration^[55]25, although it is expected to be helpful for cell
   fate inference by integrating gene regulatory mechanism analysis.
   Secondly, these models can not fit the gene dynamics well on most
   genes, which might be because that the transcriptional dynamics of mRNA
   splicing may not provide sufficient signal in single cell
   resolution^[56]25. Most conventional approaches assume that the joint
   plot between unspliced and spliced expression should form a clockwise
   curve on the phase portrait because of the phase delay within
   them^[57]9,[58]10,[59]13,[60]26, which is, nevertheless, rarely
   observed from the data. The high sparsity and noise nature of unspliced
   mRNA counts^[61]27,[62]28 could be one significant reason. The short
   time scale of splicing process could also make it hard to extract
   dynamics from the delay between unspliced and spliced RNA. Last but not
   least, the RNA velocity models can only be applied to scRNA-seq
   datasets with unspliced/spliced or new/total labels. However, the
   abundance of unspliced/spliced data may not be available in certain
   datasets, such as those obtained through Fluorescence In Situ
   Hybridization (FISH) technologies^[63]29, and some human sequencing
   data due to privacy restrictions. These challenges motivate us to
   construct the gene dynamics based on regulation, instead of only
   relying on the unspliced/spliced counts.

   The gene regulation has been explored a lot in the field of single cell
   research^[64]30–[65]34. In this study, we aim at inferring the gene
   dynamics and cell fate based on the underlying regulatory among genes.
   To this end, we investigate the relationship between gene regulatory
   patterns and RNA velocity, which reveals that the RNA velocity can be
   approximately estimated as a linear combination of the expression
   levels of transcription factors (TFs). Notably, the TFs with the
   non-zero weights are significantly enriched with TF set functionally
   linked to the target gene. In addition, we find that the joint
   distribution of expressions between a TF and its target could form a
   clockwise curve on the co-expression plot^[66]35, indicating the phase
   delay between them. Those findings imply that the abundance of TFs
   could also be used to construct dynamics model of RNA velocity.

   In this study, we propose TFvelo to model the RNA velocity based on the
   expression levels of TFs, where the velocity refers to the changing
   rate of RNA abundance. The computational framework of TFvelo relies on
   a generalized EM algorithm, which iteratively updates the learned
   representation of TFs, the latent time of cells, and the parameters in
   the dynamic equation. TFvelo can accomplish analysis performed by
   previous RNA velocity studies, such as gene-specific phase portrait
   fitting, velocity modeling, inferring pseudotime without annotation of
   initial states, and visualizing cell fate with the velocity-based
   transition stream plot. Unlike methods that depend on splicing
   information, TFvelo can robustly work on genes with sparse unspliced
   counts and even datasets without splicing information. Using both
   synthetic data and multiple real scRNA-seq datasets, we show that the
   TFvelo model can capture the dynamics on phase portraits of TFs-target,
   accurately infer the pseudo-time and the developmental trajectories of
   cells and help explore biological findings.

Results

Findings: RNA velocity can be approximately estimated with TFs’ abundance

   The fact that RNA velocity can be modelled by the transcriptional
   regulation is supported by the following two findings. We find that the
   RNA velocity model by previous approach can be approximately estimated
   as a linear combination of TFs’ abundance. On the scRNA-Seq pancreas
   dataset^[67]9,[68]36, given a target gene, Least Absolute Shrinkage and
   Selection Operator (LASSO) regression is applied to predict RNA
   velocity modelled by scVelo, based on the expression level of TFs and
   the target gene itself (Fig. S[69]1a). Here we run scVelo pipeline on
   pancreas dataset with the default parameters to get the velocities, and
   only apply LASSO regression to those genes which are modelled by
   scVelo. We feed all TFs^[70]37 into LASSO, and get a sparse
   weights-vector where only some TFs have non-zero weights. As a result,
   the RNA velocity of gene
   [MATH: <mi>g</mi> :MATH]
   can be estimated as:
   [MATH:
   <msub><mrow><mi>v</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>=</mo><m
   under><mrow><mo>∑</mo></mrow><mrow><msub><mrow><mi>T</mi><mi>F</mi></mr
   ow><mrow><mi>i</mi></mrow></msub><mo>∈</mo><msub><mrow><mi>S</mi></mrow
   ><mrow><mi>g</mi></mrow></msub></mrow></munder><msub><mrow><mi>w</mi></
   mrow><mrow><mi>g</mi><mo>,</mo><msub><mrow><mi>T</mi><mi>F</mi></mrow><
   mrow><mi>i</mi></mrow></msub></mrow></msub><msub><mrow><mi>e</mi></mrow
   ><mrow><msub><mrow><mi>T</mi><mi>F</mi></mrow><mrow><mi>i</mi></mrow></
   msub></mrow></msub><mo>+</mo><msub><mrow><mi>w</mi></mrow><mrow><mi>g</
   mi><mo>,</mo><mi>g</mi></mrow></msub><msub><mrow><mi>e</mi></mrow><mrow
   ><mi>g</mi></mrow></msub> :MATH]
   1

   where
   [MATH: <msub><mrow><mi>S</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   represents the set of TFs,
   [MATH:
   <msub><mrow><mi>e</mi></mrow><mrow><msub><mrow><mi>T</mi><mi>F</mi></mr
   ow><mrow><mi>i</mi></mrow></msub></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><mi>e</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   are the expression level of
   [MATH:
   <msub><mrow><mi>T</mi><mi>F</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   and gene
   [MATH: <mi>g</mi> :MATH]
   respectively,
   [MATH:
   <msub><mrow><mi>w</mi></mrow><mrow><mi>g</mi><mo>,</mo><msub><mrow><mi>
   T</mi><mi>F</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></msub>
   :MATH]
   and
   [MATH:
   <msub><mrow><mi>w</mi></mrow><mrow><mi>g</mi><mo>,</mo><mi>g</mi></mrow
   ></msub> :MATH]
   are the weights of
   [MATH:
   <msub><mrow><mi>T</mi><mi>F</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   and the target gene itself.

   Cells are separated into training and test sets to evaluate the LASSO
   model. The regression results on the test set shows a high correlation
   between the predicted and labeled velocity, as demonstrated in Fig.
   S[71]1b,c, which indicates that velocity of the target gene can be
   predicted based on the expressions of its TFs. In addition, as shown in
   Table [72]S1, the TFs with non-zero weights are significantly enriched
   in TFs set functionally linked to the target gene (according to the
   ENCODE TF-target database^[73]38) with p value of 6.66e-06 (one-sided
   t-test), further suggesting that velocity could be modelled by TFs.
   Secondly, due to regulatory relationship between TF and target, the
   expression levels of them change asynchronously, which could result in
   a clockwise curve on the phase portraits. And as expected, we find the
   desired clockwise curve on the co-expression plots between some
   TF-target pairs^[74]35 (Fig. S[75]2), which is similar to the
   theoretical prediction of existing velocity models on unspliced-spliced
   space. These findings suggest that it is possible to construct a novel
   RNA velocity model based on the expression of TFs, instead of using
   unspliced/spliced RNA counts or any additional experimental omics data
   and labels.

Estimate RNA velocity with TFvelo

   Here we report TFvelo to estimate the RNA velocity of a target gene
   using the abundance of the target gene itself and its potential TFs.
   Figure [76]1a provides an example of phase portrait fitting on a gene
   from pancreas dataset, to explain how TFvelo outperforms splicing-based
   methods. Previous methods focus on modeling the dynamics between the
   spliced and unspliced RNAs, while transcriptional dynamics of RNA
   splicing may not provide sufficient signal^[77]25, and the 2D
   unspliced-spliced RNA level could be too noisy or sparse to fit. By
   comparison, TFvelo considers the relationship between the target gene
   and multiple TFs, by learning a representation of regulation, allowing
   for more robust and accurate modeling based on gene regulatory network.

Fig. 1. The overview of TFvelo.

   [78]Fig. 1
   [79]Open in a new tab

   a The comparison among constructed dynamics on LITAF gene in Pancreas
   dataset by TFvelo and previous approaches. Cells are colored in the
   same way as those in Fig. [80]2a. b The workflow of TFvelo. c–e The
   phase portrait plot on WX-y space, the plot of WX with respect to
   pseudotime, and the plot of y with respect to pseudotime. The color bar
   indicating the RNA velocity is share by the three panels. For
   simplification,
   [MATH:
   <msub><mrow><mi>W</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><m
   sub><mrow><mi>X</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><msu
   b><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><mi>t</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   , are written as
   [MATH: <mi mathvariant="normal">W</mi><mo>,</mo><mi
   mathvariant="normal">X</mi><mo>,</mo><mi mathvariant="normal">y</mi>
   :MATH]
   and
   [MATH: <mi>t</mi> :MATH]
   , respectively. f–h The steps within each iteration of the generalized
   EM algorithm. Here the purple line represents the current prediction by
   the model. f Assignment of latent time to each cell (in green) to find
   the corresponding target point (shown in red), which is the nearest
   point to the cell on the purple line. g Optimization of the weights
   [MATH: <mi>W</mi> :MATH]
   to move all cells along the
   [MATH: <mi>W</mi><mi>X</mi> :MATH]
   axis, minimizing the mean loss over all cells. h Optimization of the
   parameters in the dynamical function, to move the target point on the
   curve closer to each cell. i–k Visualization and analysis of TFvelo’s
   results, including phase portrait fitting pseudo-time inference, and
   cell trajectory prediction. l–o Simulation on synthetic dataset. l An
   example of the generated synthetic dynamics. m Reconstructed dynamics
   by TFvelo of the sample shown in (l) with MSE loss. n The joint plot of
   ground truth velocity and inferred velocity with their spearman
   correlation. The red reference line refers to that the ground truth is
   equal to the inferred value. o The joint plot of ground truth weights
   and inferred weights with their spearman correlation. Source data are
   provided as a Source Data file.

   For a target gene
   [MATH: <mi>g</mi> :MATH]
   , TFvelo explores the relationships between its expression level
   [MATH: <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   and the regulation of TFs on
   [MATH: <mi>g</mi> :MATH]
   . In TFvelo, the RNA velocity
   [MATH:
   <mfrac><mrow><mi>d</mi><msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></m
   row></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac> :MATH]
   , which is defined as the time derivative of RNA abundance, is
   determined by the abundance of involved TFs
   [MATH: <msub><mrow><mi mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub> :MATH]
   and itself,
   [MATH:
   <mfrac><mrow><mi>d</mi><msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></m
   row></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac><mo>=</mo><mi>h</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mi
   mathvariant="normal">;</mi><msub><mrow><mi>Ψ</mi></mrow><mrow><mi>g</mi
   ></mrow></msub></mrow></mfenced> :MATH]
   2

   Using a top-down strategy, which can relax the gene dynamics to more
   flexible profiles^[81]13, TFvelo directly designs a profile function of
   target gene’s expression level,
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>=</mo><mi>f</mi><mfenced
   close=")" open="("><mrow><mi>t</mi><mi
   mathvariant="normal">;</mi><msub><mrow><mi>Φ</mi></mrow><mrow><mi>g</mi
   ></mrow></msub></mrow></mfenced> :MATH]
   3

   where
   [MATH: <msub><mrow><mi>Ψ</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   and
   [MATH: <msub><mrow><mi>Φ</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   are two sets of gene-specific time-invariant parameters, which describe
   the influence of TFs on the target gene, and the shape of phase
   portraits, respectively. Considering that linear models have been
   employed to represent the gene regulatory in previous
   studies^[82]34,[83]39–[84]41,
   [MATH: <mi>h</mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>,</mo><msub><mrow><mi>y</
   mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>;</mo><msub><mrow><mi>Ψ</
   mi></mrow><mrow><mi>g</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   is implemented with a linear model
   [MATH:
   <mfrac><mrow><mi>d</mi><msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></m
   row></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac><mo>=</mo><msub><mrow><mi
   mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>−</mo><msub><mrow><mi>γ</
   mi></mrow><mrow><mi>g</mi></mrow></msub><msub><mrow><mi>y</mi></mrow><m
   row><mi>g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced> :MATH]
   . The profile function of
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")" open="("><mrow><mi>t</mi></mrow></mfenced> :MATH]
   can be chosen flexibly from a series of second-order differentiable
   functions^[85]13, which is designed as a

   function in implementation (Methods). This is because sine functions
   can model the nonmonotonic dynamics start from both up-regulation and
   down-regulation (e.g., gene MGST3 in Fig. S[86]8), which is not
   feasible for those methods relying on one switching time
   point^[87]9,[88]10,[89]18.

   For optimization, all learnable parameters are divided into three
   groups, which are cell-specific latent time
   [MATH: <mi>t</mi> :MATH]
   , TFs’ weights
   [MATH: <msub><mrow><mi mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub> :MATH]
   and the shape parameters of phase portraits
   [MATH: <msub><mrow><mi
   mathvariant="normal">ψ</mi></mrow><mrow><mi>g</mi></mrow></msub> :MATH]
   , in which
   [MATH:
   <msub><mrow><mi>n</mi></mrow><mrow><mi>c</mi><mi>e</mi><mi>l</mi><mi>l<
   /mi></mrow></msub> :MATH]
   and
   [MATH:
   <msub><mrow><mi>n</mi></mrow><mrow><mi>T</mi><mi>F</mi></mrow></msub>
   :MATH]
   represent number of cells and number of genes, respectively. A
   generalized EM^[90]42,[91]43 algorithm is adopted to minimize the loss
   function in each iteration, by updating the three groups of parameters
   alternately (Fig. [92]1f–h). Based on the optimized models on all
   genes, we can obtain a transition matrix and calculate the pseudotime
   and velocity stream for cell trajectory analysis. Please find the
   details in Methods.

TFvelo can reconstruct the dynamic model and detect the regulation
relationship on synthetic datasets

   To validate that TFvelo can detect the underlying dynamic from data, we
   firstly test it on a synthetic dataset (see Methods), where a target
   gene is regulated by 10 TFs, and the corresponding weights are either
   positive or negative, representing that TF may up-regulate or
   down-regulate the target gene, respectively. We randomly generate 200
   synthetic gene dynamics on 1000 cells with different ground truth
   parameters. For each of them, the TFvelo reconstructs the dynamic
   function and estimates the weight of each TF, based on the simulated
   expressions of TFs and target.

   The performance of TFvelo was evaluated using the spearman correlation
   between the ground truth and the reconstructed values. Under 20
   iterations for optimization, the spearman correlation coefficient
   between the ground truth weights and inferred weights of TFs is 0.823,
   and that for velocity estimation is 0.894 (Fig. [93]1n, o). The high
   consistence shows that TFvelo can effectively reconstruct the
   underlying dynamics. In addition, the high F1 score (0.944) and high
   ratio of area under the receiver operating characteristic (ROC) curve
   (0.962) also demonstrate that TFvelo can correctly recognize whether
   the weight is positive or negative. In addition, we employ a vanilla EM
   approach as baseline here, by removing the optimization step on TF
   weight, as well as the strategy of multiple points initialization.
   After 20 optimization iterations, the performance of this baseline is
   much poorer than that of TFvelo (Fig. S[94]4). Please see Supplementary
   Information Section 3 for details on the synthetic dataset.

TFvelo can model cell differentiation process on Pancreas dataset

   To evaluate TFvelo on real scRNA-seq data, we first apply it to the
   dataset of E15.5 mouse pancreas^[95]9, which has been widely adopted in
   previous RNA velocity studies. The pseudo-time and trajectory predicted
   by TFvelo can identify the differentiation process (Fig. [96]2a, b).
   Compared with previous methods, TFvelo shows advantage on phase
   portrait fitting. Example genes are shown in Figs. [97]2c, S[98]7.
   Figure [99]2c compares the phase portrait fitting of different methods,
   and Fig. S[100]7 provides the gene expression dynamics resolved along
   gene-specific latent time obtained by scVelo and TFvelo. The expression
   of H19, which plays an important role in early development^[101]44, is
   found to increase at the beginning and then decrease at the stage of
   Ngn3 high EP by TFvelo. For MAML3, only TFvelo can correctly detect the
   process that starts with up-regulation then turns to down-regulation.
   Other methods can not correctly detect the order between pre-endocrine
   cells (in green) and those endocrine populations, including alpha,
   beta, delta and epsilon cells (in blue, light blue, purple and pink).

Fig. 2. Results of TFvelo on pancreas dataset.

   [102]Fig. 2
   [103]Open in a new tab

   a Pseudo-time inferred by TFvelo in the UMAP-based embedding space. b
   The stream plot projected to UMAP. The Ngn3 low/high EP denotes the
   neurogenin3 low/high epithelial cells. c Two example genes for
   illustrating the dynamics fitting in phase portrait. The cells are
   colored in the same way as panel (b). d Comparison on the intra class
   distance in phase portrait. Two-sided t-test is applied without
   adjustment. e Comparison on the inter class distance in phase portrait.
   Two-sided t-test is applied without adjustment. f Comparison on the
   inter-cluster coherence. g The cosine similarity between the velocity
   vectors obtained by scVelo and TFvelo. For the boxplots in panels (d,
   e, f, h) the down, central and up hinges correspond to the first
   quartile, median value and third quartile, respectively. The whiskers
   extend to 1.5× the interquartile range of the distribution from the
   hinge. The number of samples is 405 for boxplots in panel d, e and f.
   The number of samples for each boxplot (from top to down) in panel g is
   916, 262, 642, 592, 591, 481, 70 and 142, respectively. Source data are
   provided as a Source Data file.

   To quantitatively evaluate the phase portrait fitting, we propose three
   metrics, including: (1) The intra-class distance on phase portrait,
   which reflects how cells within the same type gather on the phase
   portrait (the lower the better). (2) The inter-class distance on phase
   portrait, which reflects how cells from different types are separated
   well on the phase portrait (the higher the better). (3) The fitting
   error on phase portrait, which measures the normalized distance between
   each cell to the constructed model on the phase portrait. We calculate
   these metrics based on the genes that can be commonly modeled by all
   methods. For the “intra-class distance” and “inter class distance”, we
   take the Paired Samples T Test between the value obtained on the same
   gene by TFvelo’s phase portrait and the un/spliced phase portrait,
   which is shared by all baseline approaches, including scVelo, Dynamo,
   UniTVelo and cellDancer. The TFvelo results (Fig. [104]2d, e) show
   advantage with extreme high significance (p = 4.23e-51 and
   p = 5.65e-117). Compared with those un/spliced based approaches, TFvelo
   can achieve lower intra-class distance, higher inter-class distance and
   lower fitting error (Fig. S[105]5h). To draw a conclusion, TFvelo can
   address the current issue caused by noisy un/spliced data for RNA
   velocity modeling, by using the learned feature with transcription
   regulation.

   To quantitatively evaluate the learned streams, we employ two metrics
   adopted in VeloAE^[106]11 and UniTVelo^[107]13, which are “Cross
   Boundary Direction Correctness (CBDir)” and “In Cluster Coherence
   (ICCoh)”. CBDir assesses the correctness of transitions from one cell
   type to the next, utilizing boundary cells with ground truth
   annotations. ICCoh is computed using cosine similarity among cells
   within the same cluster, measuring the smoothness of velocity in
   high-dimensional space within clusters. As shown in Fig. [108]2f,
   S[109]5i, TFvelo achieves similar median value compared to UniTVelo,
   and significantly outperforms the other methods. Also TFvelo can even
   achieve a high velocity consistency (Fig. S[110]5j), which indicates a
   higher consistency of RNA velocities across neighboring cells in the
   UMAP space^[111]10. The direct reason for the higher consistency scores
   achieved by TFvelo and UniTVelo is that they can generate smoother
   velocity streams across neighboring cells (Fig. S[112]5b, d). This
   might be potentially attributed to the similarity in the models of
   TFvelo and UniTVelo. Both methods directly model gene abundance as
   high-order differentiable functions with respect to time, without
   assuming a switching point that could disrupt the smoothness of the
   model. In addition, to show how the velocity inferred by TFvelo and
   scVelo correlated with each other, we calculate the cosine similarity
   of velocity vectors obtained through both methods, using the commonly
   modeled genes. The distribution of cosine similarity on each cell type
   is shown in Fig. [113]2g.

   Then we conduct KEGG pathways enrichment analysis based on the best
   fitted genes, and observe a strong enrichment associated with insulin
   secretion and the glucagon signaling pathway (Fig. [114]3a).
   Additionally, we find that REST and HMGN3 consistently exhibit high
   absolute weights on modeling most target genes (Fig. [115]3b). When
   examining the UMAP distribution, it is clear that REST expression
   decreases during differentiation, while HMGN3 expression increases
   (Fig. [116]3c). Previous research has established REST as a key
   negative regulator of endocrine differentiation during pancreas
   organogenesis^[117]45,[118]46. Earlier studies have also report HMGN3
   to be a regulator for insulin secretion in pancreatic cells^[119]46. We
   further analyze the weights of these two TFs on modeling genes within
   the insulin secretion pathway. REST consistently has a negative weight,
   while HMGN3 consistently exhibits a positive weight (Fig. [120]3d),
   which is consistent with the previous findings.

Fig. 3. Biological analysis based on the results of TFvelo on pancreas
dataset.

   [121]Fig. 3
   [122]Open in a new tab

   a The KEGG pathway enrichment from the genes best fitted by TFvelo on
   pancreas dataset. The adjusted p-value is computed using the Fisher
   exact test. b List of TFs having high weights on multiple target genes.
   The counts mean the number of target genes where the TF have a
   normalized weight whose absolute value is larger than 0.5. c The
   distribution of REST and HMGN3 on UMAP. d The weights distribution of
   TFs REST and HMGN3 on modeling in the genes that in the insulin
   secretion pathway. The down, central and up hinges correspond to the
   first quartile, median value and third quartile, respectively. The
   whiskers extend to 1.5× the interquartile range of the distribution
   from the hinge. The numbers of samples for the boxplots corresponding
   to REST and HMGN3 are 7 and 5, respectively. e Phase analysis on two
   example genes. On each row, from the left to the right: the phase
   portrait fitting, the distribution of abundance on UMAP, the value of
   learned TFs representation (WX) and the target gene along with
   pseudotime, and the abundance of a TF and the target gene along with
   pseudotime. f In simulation, the visualization of two variables where
   the time delay between them is very short or relatively long. The
   colorbar reflects pseudotime. g Comparison of the phase portrait
   fitting and expression level along pseudotime obtained by scVelo and
   TFvelo on two example genes. Source data are provided as a Source Data
   file.

   The comparison of phase between TFs and target are shown in
   Fig. [123]3e, which shows clear phase delay between the learned
   representation of TFs (
   [MATH: <mi mathvariant="bold">WX</mi> :MATH]
   ) and target gene’s expression. We also analyze the phase delay between
   a TF and the individual target. Among the two genes, HMGN3 has a
   positive weight on modeling SURF, where a phase delay between them can
   be observed. On the contrary, REST has a negative one on modeling ECE1,
   where their changing shows negative correlation.

   In addition to the sparsity and noise in unspliced data (Fig. S[124]6),
   the time scale of splicing dynamics might also make it hard to extract
   dynamics from the delay between unspliced and spliced RNA. It has been
   reported that splicing can be accomplished within 30 s^[125]47, a
   duration much shorter than the timescale of the entire differentiation
   process described in scRNA-seq datasets. Consequently, the phase delay
   between the unspliced and spliced mRNA might be too brief to be
   captured in the phase portrait. This might be one reason why the
   theoretical curve cannot be observed in the unspliced-spliced phase
   portrait (Fig. [126]3f, g). Figure [127]3f provide simulations to
   illustrate that, even with the same level of noise, a shorter phase
   delay between variables could make fitting the phase portrait more
   challenging. We also observed phenomena from the scRNA-seq dataset
   supporting our hypothesis. Figure [128]3g shows that the spliced and
   unspliced changes almost synchronously on some genes, while TFvelo
   could detect a clearer phase delay for dynamic modeling.

TFvelo can model cell differentiation process on gastrulation erythroid
dataset

   The Gastrulation Erythroid dataset is selected from the transcriptional
   profile for mouse embryos^[129]48, and has be adopted in RNA velocity
   studies^[130]13. Using TFvelo, the pseudo-time and cell development
   flow visualized by streamlines are consistent with the development in
   the erythroid lineage, from blood progenitors to erythroid cells
   (Figs. [131]4a, b). By comparison, several of the previous approaches
   cannot capture such developmental dynamics, as shown in Fig. [132]4d.

Fig. 4. Results of TFvelo on gastrulation erythroid dataset.

   [133]Fig. 4
   [134]Open in a new tab

   a Pseudo-time inferenced by TFvelo projected into a UMAP-based
   embedding. b The stream plot obtained by TFvelo. c The sparsity for
   unspliced, spliced and total mRNA counts. The sparsity of a gene is
   defined as:
   [MATH: <mi mathvariant="normal">Sparsity</mi><mo>=</mo><mfrac><mrow><mi
   mathvariant="normal">The</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">number</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">of</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">cells</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">that</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">the</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">count</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">of</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">this</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">gene</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">is</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">zero</mi></mrow><mrow><mi
   mathvariant="normal">The</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">total</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">number</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">of</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">cells</mi></mrow></mfrac> :MATH]
   . d The stream plot obtained by baseline approaches. Each plot is drawn
   according to the method’s tutorial. e The phase portrait fitting on
   genes with sparse unspliced counts. f The phase portrait fitting on
   TACC1 and HSP90AB1 with sparse unspliced counts. g The GO term
   biological process enrichment based on the best fitted genes. The
   adjusted p-value is computed using the Fisher exact test. h The number
   of target genes where the absolute value of the weight larger than 0.5
   for each TF. Source data are provided as a Source Data file.

   The sparsity in data is a common challenge for scRNA-seq studies.
   Considering that only about 20% of reads contained unspliced intronic
   sequences^[135]9, the sparse unspliced abundance is a large obstacle
   for dynamics fitting in phase portrait. Our quantitative analysis of
   the sparsity in unspliced, spliced and total mRNA counts is shown in
   Fig. [136]4c, which verifies the high sparsity in unspliced counts.
   Figure [137]4e shows the comparison on two genes with very sparse
   unspliced counts to illustrate the advantage of TFvelo for addressing
   the issue of sparsity. Although these genes can pass the filtering and
   selection during preprocessing, they are still too sparse to provide
   sufficient information for fitting the spiced-based models well.
   Figure [138]4f provides more example genes of the comparison between
   TFvelo and baselines. TACC1 is annotated as a gene involved in
   promoting cell division prior to the formation of differentiated
   tissues^[139]49. In our experiment, the expression of TACC1 initially
   increases, reaching its highest value in blood progenitors 2, which is
   just before differentiation into erythroid cells. Subsequently, TACC1’s
   expression decreases with pseudo-time from erythroid cells 1 to
   erythroid cells 3. By comparison, previous approaches fail to detect a
   dynamic starting from blood progenitors 1 (in red). As for HSP90AB1,
   only TFvelo can capture the correct dynamics from the learned phase
   portrait.

   We next take the GO term enrichment analysis based on the best fitted
   genes, and find that the most significant GO terms include
   porphyrin-containing compound biosynthetic process (GO:0006779) and
   heme biosynthetic process (GO:0006783), which are directly related to
   erythroid development (Fig. [140]4g). In addition, the TFs having high
   weights on most target genes are GATA1, GATA2 and LMO2 (Fig. [141]4h),
   all of which has been reported to be involved in erythroid
   differentiation. GATA1 plays a significant role in regulating the
   transcriptional aspects of erythroid maturation and function^[142]50.
   GATA2 is verified to regulates hematopoietic stem cell proliferation
   and differentiation^[143]51. LMO2 is a bridging molecule assembling an
   erythroid, DNA-binding complex^[144]52. These findings demonstrate the
   potential of TFvelo in detecting important regulatory genes from single
   cell data.

TFvelo can achieve competitive performance compared to Multivelo using only
scRNA-seq data

   Next we apply TFvelo to a 10x multi-omics embryonic mouse brain
   dataset, where both Assay for Transposase-Accessible Chromatin with
   sequencing (ATAC-Seq)^[145]53 and scRNA-seq are available. This dataset
   was employed in Multivelo study, which is an RNA velocity model
   designed for multi-omics datasets, capturing dynamics between chromatin
   accessibility and unspliced mRNA^[146]12. On this dataset, as described
   in the Multivelo study, Radial glia (RG) cells in the outer
   subventricular zone can generate neurons, astrocytes, and
   oligodendrocytes. The development of cortical layers follows an
   inside-out pattern during neuronal migration, with younger cells
   ascending to upper layers while older cells remain in deeper layers. RG
   cells can divide, producing intermediate progenitor cells (IPC) that
   act as neural stem cells and give rise to various mature excitatory
   neurons in cortical layers.

   The pseudotime inferred by TFvelo closely resemble Multivelo’s results
   (Fig. [147]5a, b), and both methods correctly identify the
   differentiation direction in most cells. Notably, TFvelo achieves this
   performance without using ATAC-seq data. Figure [148]5c shows the phase
   portrait fitting of both methods, where Multivelo could show an
   additional c-u phase portrait, reflecting the joint plot between
   ATAC-seq and unspliced mRNA.

Fig. 5. Results of TFvelo on 10x multi-omics embryonic mouse brain dataset.

   [149]Fig. 5
   [150]Open in a new tab

   a Pseudo-time and trajectory inferenced by Multivelo projected into a
   UMAP-based embedding. b Pseudo-time and trajectory inferenced by TFvelo
   projected into a UMAP-based embedding. c The comparison between
   Multivelo and TFvelo on three example genes, which are AHI1, NTRK2 and
   GRIN2B, respectively. For Multivelo plot, (c) means the chromatin
   accessibility in ATAC-seq.

   Regarding the AHI1 gene, Multivelo mistakenly constructs a cyclic
   dynamic, in contrast, TFvelo correctly captures the dynamic, with
   expression increasing consistently from V-SVZ cells (in green). For the
   NTRK2 gene, TFvelo captures such dynamics correctly, while Multivelo
   misses the decreasing process at the beginning. On the GRIN2B gene,
   both TFvelo and Multivelo detect the same dynamics, with expression
   increasing from the beginning to the terminal stage. These results
   suggest that by extracting features from multiple TFs, TFvelo can
   achieve a better fit for the dynamics of individual genes than
   Multivelo, without using ATAC-seq.

   Although ATAC-seq could provide additional information for RNA dynamics
   modeling, it could be challenging due to multiple issues. For instance,
   the high sparse and near binary ATAC-seq data^[151]54 could lead to the
   mixing of cells from different types, posing a challenge for phase
   portrait fitting. As shown in the first column at Fig. [152]5c, even
   the ATAC/unspliced/spliced 3D phase portrait can not provide enough
   information for dynamic modeling. Specifically, the c-axis (ATAC) could
   not help separate cells of different types (second column in
   Fig. [153]5c). Additionally, peaks obtained by ATAC are more
   challenging to directly link to a specific gene. By comparison, TFvelo
   can directly link multiple TFs into modeling a target gene. As a
   result, TFvelo could extract dynamics that align with the
   differentiation process well in TFs-target phase portraits (fourth
   column in Fig. [154]5c).

TFvelo can predict the cell fate on dataset without splicing information

   Next, TFvelo was applied to another single-cell RNA-seq dataset
   comprising 1,529 cells obtained from 88 human preimplantation embryos
   ranging from embryonic day 3 to 7^[155]55. Since the raw sequencing
   file is not provided on the online dataset server (see Data
   Availability Statement), those RNA velocity methods relying on the
   spliced/unspliced information are not feasible. By contrast, TFvelo
   still successfully identifies the pseudo-time and developmental
   trajectory of the cells, as depicted in Fig. [156]6a, b. Additionally,
   dynamic models successfully fit the expression patterns of target
   genes, which are illustrated in Fig. [157]6d as examples. In details,
   the dynamics predicted on PPP1R14A reveals a clockwise curve on
   TFs-target plot, and finds the changing point between the increasing
   and decreasing of expression level at day 5 post-fertilization.
   Consistent with the TFvelo results, it has been reported that the
   expression of PPP1R14A can be detected with high dynamics that it
   increases at the beginning and declines throughout the development in
   the early stage of zebrafish embryos^[158]56. TFvelo finds that the
   expression of RGS2 is the highest at the beginning, and decreases with
   the pseudo-time, which can be supported by the early findings that RGS2
   plays a critical role in early embryo development^[159]57.

Fig. 6. Results of TFvelo on dataset without splicing information.

   [160]Fig. 6
   [161]Open in a new tab

   a Pseudo-time inferenced by TFvelo projected into a UMAP-based
   embedding. b The streamplot of UMAP. c The heatmap which shows the
   dynamics of gene expression resolved along pseudo-time in the top 300
   likelihood-ranked genes. d The constructed dynamic models on three
   example genes with high likelihoods, which are PPP1R14A, RGS2 and
   GSTP1, respectively. From the left to the right within each row, the
   three subfigures show phase portrait fitting, distribution of velocity
   on UMAP, distribution of expression on UMAP, and the dynamics of
   expression resolved along with pseudo-time, respectively.

Discussion

   The recent development in RNA velocity approaches provide new insights
   for temporal analysis of single cell data. However, the existing
   methods often fails to fit the cell dynamics well in the
   unspliced/spliced space, which is partially because of the insufficient
   signal and high sparsity and noise. Motivated by the insight that the
   phase delay between TFs and target can also provide temporal
   information, we propose TFvelo to estimate the gene dynamics based on
   TFs abundance. Different from previous approaches which require the
   splicing information from the data, TFvelo models the dynamics of a
   target gene with the expression level of TFs and the target gene
   itself. In the computation framework, a generalized EM algorithm is
   adopted, so that the latent-time, the weight of TFs and those learnable
   parameters in the dynamic function can be updated alternately in each
   iteration.

   Experiments on a synthetic dataset validate that TFvelo could detect
   the underlying dynamics from the data. Results on various scRNA-seq
   datasets and a 10× multi-omics dataset further demonstrate that TFvelo
   can fit the gene dynamics well, showing a desired clockwise curve on
   the phase portrait between the TFs representation and the target.
   Furthermore, TFvelo can be used to infer the pseudo time, cell
   trajectory and detect key TFs which play important role in the
   differentiation. In addition, compared with the previous RNA velocity
   model relying on splicing information, TFvelo can achieve better
   performance on those scRNA-seq datasets, and be flexibly applied to
   more categories of datasets without splicing information.

   Due to the requirement of learning a representation of TFs, the time
   efficiency of TFvelo is lower than baseline approaches in the same
   framework, e.g., scVelo. This could be a weakness when being applied to
   the large-scale datasets. Performing TFvelo with more CPUs in parallel
   could reduce the running time. Meanwhile, TFvelo still cannot fit the
   dynamics of some genes well (Fig. S[162]14). Improving the modeling for
   more genes remains a challenging task that requires further
   exploration. In the future, more complex models may be explored to
   replace the current linear model in TFvelo for estimating the weight of
   each TFs. The employed sine function for describing expression dynamics
   could also be replaced by other functional forms flexibly. In addition,
   those recently proposed ideas for improving unspliced/spliced-based RNA
   velocity models can also be adopted in the further development of
   TFvelo, including stochastics modeling with Bayesian inference^[163]14,
   multi-omics integration^[164]22 and universe time inference^[165]13.

Methods

Data preprocessing

   We utilize the data preparation procedure with the default setting in
   scVelo^[166]10, where the difference is that TFvelo is applied to the
   total mRNA counts, which is the sum of unspliced and spliced counts.
   Additionally, after filtering out those genes that can be detected from
   fewer than 2% of cells, and selecting the top 2000 highly variable
   genes (HVGs), we normalize the expression profile of these HVGs by the
   total count in each cell. A nearest-neighbor graph (with 30 neighbors
   by default) was calculated based on Euclidean distances in principal
   component analysis space (with 30 principal components by default) on
   log-transformed spliced counts. After that, first-order and
   second-order moments (means and uncentered variances) are computed for
   each cell across its nearest neighbors. These steps are implemented in
   the same way as scv.pp.filter_and_normalize() and scv.pp.moments().

   Additionally, to find potential transcriptional relationship within the
   selected 2000 highly variable genes, we refer to the TF-target pairs
   annotated in ENCODE TF-target dataset^[167]38 and ChEA TF-target
   dataset^[168]58. If the regulatory relationship between a TF-target
   pair is labeled in at least one of these two datasets, the TF will be
   included in the involved TFs set when modeling the dynamic of the
   target gene.

The computational framework of TFvelo

   Given a gene
   [MATH: <mi>g</mi> :MATH]
   , TFvelo assumes that the RNA velocity
   [MATH:
   <mfrac><mrow><mi>d</mi><msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></m
   row></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac> :MATH]
   is determined by the expression level of involved TFs
   [MATH: <msub><mrow><mi mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub> :MATH]
   and the target gene itself
   [MATH: <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   , which is
   [MATH:
   <mfrac><mrow><mi>d</mi><msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></m
   row></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac><mo>=</mo><mi>h</mi><mrow><mo>(</mo><mrow><msub><mro
   w><mi mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>,</mo><msub><mrow><mi>y</
   mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>;</mo><msub><mrow><mi>Ψ</
   mi></mrow><mrow><mi>g</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   . Using a top-down strategy^[169]13, TFvelo directly designs a profile
   function of target gene’s expression level to relax the gene dynamics
   to more flexible profiles, that is
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>=</mo><mi>f</mi><mrow><mo
   >(</mo><mrow><mi>t</mi><mo>;</mo><msub><mrow><mi>Φ</mi></mrow><mrow><mi
   >g</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   .
   [MATH: <msub><mrow><mi>Ψ</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   and
   [MATH: <msub><mrow><mi>Φ</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   are two sets of gene-specific, time-invariant parameters, which control
   the influence of TFs on the target gene, and the shape of target gene’s
   phase portraits, respectively. Our findings and early studies^[170]39
   have shown that it is feasible to infer the regulation relationship
   according to the learned parameter in linear regression models based on
   expression data. Additionally, linear models always have high
   interpretability, and rarely suffer from over-fitting. As a result, to
   show the capability of modeling RNA velocity based on gene regulatory
   relationship, we simply adopt a linear model as
   [MATH: <mi>h</mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>,</mo><msub><mrow><mi>y</
   mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>;</mo><msub><mrow><mi>Ψ</
   mi></mrow><mrow><mi>g</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   , which can map the expression levels of multiple TFs
   [MATH: <msub><mrow><mi mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mo>∈</mo><msup><mrow><mi
   >R</mi></mrow><mrow><msub><mrow><mi>n</mi></mrow><mrow><mi>T</mi><mi>F<
   /mi></mrow></msub></mrow></msup> :MATH]
   from
   [MATH:
   <msub><mrow><mi>n</mi></mrow><mrow><mi>T</mi><mi>F</mi></mrow></msub>
   :MATH]
   -dimensional space to 1-dimensional representation
   [MATH: <msub><mrow><mi mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub> :MATH]
   with a weight vector
   [MATH: <msub><mrow><mi mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub> :MATH]
   , where
   [MATH:
   <msub><mrow><mi>n</mi></mrow><mrow><mi>T</mi><mi>F</mi></mrow></msub>
   :MATH]
   represents the number of involved TFs. As a result, the dynamic
   equation between TFs and the target gene
   [MATH: <mi>g</mi> :MATH]
   can be written as
   [MATH:
   <mfrac><mrow><mi>d</mi><msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></m
   row></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac><mo>=</mo><mi>h</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>,</mo><msub><mrow><mi>y</
   mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mi
   mathvariant="normal">;</mi><msub><mrow><mi>Ψ</mi></mrow><mrow><mi>g</mi
   ></mrow></msub></mrow></mfenced><mo>=</mo><msub><mrow><mi
   mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>−</mo><msub><mrow><mi>γ</
   mi></mrow><mrow><mi>g</mi></mrow></msub><msub><mrow><mi>y</mi></mrow><m
   row><mi>g</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>,</mo> :MATH]
   4

   where
   [MATH: <mi>t</mi><mo>∈</mo><mrow><mo>[</mo><mrow><mi
   mathvariant="normal">0, 1</mi></mrow><mo>)</mo></mrow> :MATH]
   is the latent time assigned to each cell,
   [MATH: <msub><mrow><mi>γ</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   is the degradation rate.

   Besides, The profile function of
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")" open="("><mrow><mi>t</mi></mrow></mfenced> :MATH]
   can be chosen flexibly from those second-order differentiable
   functions^[171]13. For simplification,
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")" open="("><mrow><mi>t</mi></mrow></mfenced> :MATH]
   is designed with a sine function in our implementation, which is
   high-order differentiable and suitable for capture the curves on phase
   portrait. In addition, different with those models which assume the
   gene expression should follow a pattern of initial improvement followed
   by a decrease (or part of this whole process), TFvelo is more flexible
   and can also model gene dynamics where expression initially decreases
   and later increases with the sine function.
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>=</mo><mi>f</mi><mfenced
   close=")" open="("><mrow><mi>t</mi><mi
   mathvariant="normal">;</mi><msub><mrow><mi>Φ</mi></mrow><mrow><mi>g</mi
   ></mrow></msub></mrow></mfenced><mo>=</mo><msub><mrow><mi>α</mi></mrow>
   <mrow><mi>g</mi></mrow></msub><mi>sin</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>ω</mi></mrow><mrow><mi>g</mi></mrow></ms
   ub><mi>t</mi><mo>+</mo><msub><mrow><mi>θ</mi></mrow><mrow><mi>g</mi></m
   row></msub></mrow></mfenced><mo>+</mo><msub><mrow><mi>β</mi></mrow><mro
   w><mi>g</mi></mrow></msub><mo>,</mo> :MATH]
   5

   where
   [MATH:
   <msub><mrow><mi>α</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><m
   sub><mrow><mi>β</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><msu
   b><mrow><mi>θ</mi></mrow><mrow><mi>g</mi></mrow></msub> :MATH]
   are gene-specific parameters to be learned.
   [MATH: <msub><mrow><mi>ω</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   is fixed as
   [MATH: <mn>2</mn><mi mathvariant="normal">π</mi> :MATH]
   , so that the
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>g</mi></mrow></msub><mfenced
   close=")" open="("><mrow><mi>t</mi></mrow></mfenced> :MATH]
   is unimodal within
   [MATH: <mi>t</mi><mo>∈</mo><mrow><mo>[</mo><mrow><mi
   mathvariant="normal">0, 1</mi></mrow><mo>)</mo></mrow> :MATH]
   , which follows the common assumption shared by most current
   approaches. By employing the
   [MATH: <mi>sin</mi> :MATH]
   function, TFvelo can model the dynamics of each gene without defining a
   switching time point between the of increasing and decreasing
   periods^[172]9,[173]10,[174]14,[175]18. To optimize the parameters in
   TFvelo, a generalized EM^[176]42,[177]43 algorithm is adopted, as
   discussed in the following.

Optimization of generalized EM algorithm

   To simplify the mathematical representation, here we denote the
   [MATH: <msub><mrow><mi mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mo>,</mo><msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold-italic">g</mi></mrow></msub><mo>,</mo><msub><mrow><mi
   >y</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><msub><mrow><mi>α
   </mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><msub><mrow><mi>β</
   mi></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><msub><mrow><mi>θ</mi
   ></mrow><mrow><mi>g</mi></mrow></msub><mo>,</mo><msub><mrow><mi>γ</mi><
   /mrow><mrow><mi>g</mi></mrow></msub> :MATH]
   [MATH: <mi>a</mi><mi>n</mi><mi>d</mi> :MATH]
   [MATH: <msub><mrow><mi>ω</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
   as
   [MATH: <mi mathvariant="bold">W</mi><mo>,</mo><mi
   mathvariant="bold">X</mi><mo>,</mo><mi>y</mi><mo>,</mo><mi>α</mi><mo>,<
   /mo><mi>β</mi><mo>,</mo><mi>θ</mi><mo>,</mo><mi>γ</mi> :MATH]
   and
   [MATH: <mi>ω</mi> :MATH]
   , respectively. As a result, the Eqs. ([178]5, [179]4) can be written
   as
   [MATH: <mi>y</mi><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>=</mo><mi>α</mi><mi>sin</
   mi><mfenced close=")"
   open="("><mrow><mn>2</mn><mi>π</mi><mi>t</mi><mo>+</mo><mi>θ</mi></mrow
   ></mfenced><mo>+</mo><mi>β</mi><mo>,</mo> :MATH]
   6
   [MATH: <mfrac><mrow><mi>d</mi><mi>y</mi><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced></mrow><mrow><mi>d</mi><mi>t<
   /mi></mrow></mfrac><mo>=</mo><mi
   mathvariant="bold">WX</mi><mrow><mo>(</mo><mrow><mi>t</mi></mrow><mo>)<
   /mo></mrow><mo>−</mo><mi>γ</mi><mi>y</mi><mrow><mo>(</mo><mrow><mi>t</m
   i></mrow><mo>)</mo></mrow><mo>.</mo> :MATH]
   7

   From the dynamics model in Eqs. ([180]6, [181]7), we can derive that,
   [MATH: <mi mathvariant="bold">WX</mi><mfenced close=")"
   open="("><mrow><mi>t</mi></mrow></mfenced><mo>=</mo><mi>α</mi><msqrt><m
   row><mn>4</mn><msup><mrow><mi>π</mi></mrow><mrow><mn>2</mn></mrow></msu
   p><mo>+</mo><msup><mrow><mi>γ</mi></mrow><mrow><mn>2</mn></mrow></msup>
   </mrow></msqrt><mi>sin</mi><mfenced close=")"
   open="("><mrow><mn>2</mn><mi>π</mi><mi>t</mi><mo>+</mo><mi>θ</mi><mo>+<
   /mo><mi>ϕ</mi></mrow></mfenced><mo>+</mo><mi>β</mi><mi>γ</mi><mo>,</mo>
   <mspace
   width="0.25em"></mspace><mi>ϕ</mi><mo>=</mo><mi>arctan</mi><mfenced
   close=")"
   open="("><mrow><mfrac><mrow><mn>2</mn><mi>π</mi></mrow><mrow><mi>γ</mi>
   </mrow></mfrac></mrow></mfenced><mo>,</mo> :MATH]
   8

   For modeling a gene, suppose observations
   [MATH: <msubsup><mrow><mi mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold">c</mi></mrow><mrow><mi
   mathvariant="bold">obs</mi></mrow></msubsup> :MATH]
   and
   [MATH:
   <msubsup><mrow><mi>y</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>o</mi>
   <mi>b</mi><mi>s</mi></mrow></msubsup> :MATH]
   be the normalized expression levels of TFs and the target gene within a
   cell
   [MATH: <mi>c</mi> :MATH]
   ,
   [MATH:
   <msubsup><mrow><mi>s</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>o</mi>
   <mi>b</mi><mi>s</mi></mrow></msubsup> :MATH]
   be the state space representation of the cell
   [MATH: <mi>c</mi> :MATH]
   , and
   [MATH: <mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover><mr
   ow><mo>(</mo><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow>
   </msub></mrow><mo>)</mo></mrow> :MATH]
   be the modeled state at time
   [MATH: <msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></msub>
   :MATH]
   , we have
   [MATH:
   <msubsup><mrow><mi>s</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>o</mi>
   <mi>b</mi><mi>s</mi></mrow></msubsup><mo>=</mo><mrow><mo>[</mo><mrow><m
   i mathvariant="bold">W</mi><msubsup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold">c</mi></mrow><mrow><mi
   mathvariant="bold">obs</mi></mrow></msubsup><mo>,</mo><msubsup><mrow><m
   i>y</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>o</mi><mi>b</mi><mi>s</
   mi></mrow></msubsup></mrow><mo>]</mo></mrow> :MATH]
   and
   [MATH: <mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover><mf
   enced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced><mo>=</mo><mfenced close="]" open="["><mrow><mi
   mathvariant="bold">WX</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced><mo>,</mo><mi>y</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced></mrow></mfenced><mo>.</mo> :MATH]
   Aiming at finding the phase trajectory
   [MATH: <mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover><mr
   ow><mo>(</mo><mrow><mi>t</mi></mrow><mo>)</mo></mrow> :MATH]
   that best fits the observations for all cells, we define the loss
   function as:
   [MATH:
   <mi>L</mi><mo>=</mo><munder><mo>∑</mo><mrow><mi>c</mi></mrow></munder><
   mrow><mi
   mathvariant="normal">||</mi><msubsup><mrow><mi>s</mi></mrow><mrow><mi>c
   </mi></mrow><mrow><mi
   mathvariant="italic">obs</mi></mrow></msubsup><mo>−</mo><mover
   accent="true"><mrow><mi>s</mi></mrow><mo>ˆ</mo></mover><mrow><mfenced
   close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced></mrow><mi
   mathvariant="normal">|</mi><msup><mrow><mi
   mathvariant="normal">|</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow>
   <mo>=</mo><munder><mo>∑</mo><mrow><mi>c</mi></mrow></munder><mrow><msup
   ><mrow><mo stretchy="false">(</mo><mi
   mathvariant="bold">W</mi><msubsup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="bold">c</mi></mrow><mrow><mi
   mathvariant="bold">obs</mi></mrow></msubsup><mo>−</mo><mi
   mathvariant="bold">WX</mi><mrow><mfenced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced></mrow><mo
   stretchy="false">)</mo></mrow><mrow><mn>2</mn></mrow></msup><mo>+</mo><
   msup><mrow><mo
   stretchy="false">(</mo><msubsup><mrow><mi>y</mi></mrow><mrow><mi>c</mi>
   </mrow><mrow><mi
   mathvariant="italic">obs</mi></mrow></msubsup><mo>−</mo><mi>y</mi><mrow
   ><mfenced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced></mrow><mo
   stretchy="false">)</mo></mrow><mrow><mn>2</mn></mrow></msup></mrow><mo>
   ,</mo> :MATH]
   9

   The residuals of the observations are defined as
   [MATH:
   <msub><mrow><mi>e</mi></mrow><mrow><mi>c</mi></mrow></msub><mo>=</mo><m
   i>s</mi><mi>i</mi><mi>g</mi><mi>n</mi><mrow><mo>(</mo><mrow><msubsup><m
   row><mi>y</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>o</mi><mi>b</mi><
   mi>s</mi></mrow></msubsup><mo>−</mo><mi>y</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced></mrow><mo>)</mo></mrow><mi
   mathvariant="normal">∣∣</mi><msubsup><mrow><mi>s</mi></mrow><mrow><mi>c
   </mi></mrow><mrow><mi>o</mi><mi>b</mi><mi>s</mi></mrow></msubsup><mo>−<
   /mo><mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover><mf
   enced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub></mrow></mfenced><mi mathvariant="normal">∣</mi><msup><mrow><mi
   mathvariant="normal">∣</mi></mrow><mrow><mn>2</mn></mrow></msup> :MATH]
   , which is assumed to follow a normal distribution
   [MATH:
   <msub><mrow><mi>e</mi></mrow><mrow><mi>c</mi></mrow></msub><mo>~</mo><m
   i>N</mi><mrow><mo>(</mo><mrow><mn>0</mn><mo>,</mo><msup><mrow><mi>σ</mi
   ></mrow><mrow><mn>2</mn></mrow></msup></mrow><mo>)</mo></mrow> :MATH]
   , where the gene-specific
   [MATH: <mi>σ</mi> :MATH]
   is constant across cells, and the observations are independent and
   identically distributed. Then we can arrive at the likelihood function,
   [MATH: <mi class="MJX-tex-caligraphic"
   mathvariant="script">L</mi><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">W</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,<
   /mo><mi>θ</mi><mo>,</mo><mi>γ</mi></mrow></mfenced><mo>=</mo><mfrac><mr
   ow><mn>1</mn></mrow><mrow><msqrt><mrow><mn>2</mn><mi>π</mi></mrow></msq
   rt><mi>σ</mi></mrow></mfrac><mo>*</mo><mi>exp</mi><mfenced close=")"
   open="("><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn>
   <msup><mrow><mi>σ</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></mfr
   ac><munder><mrow><mo>∑</mo></mrow><mrow><mi>c</mi></mrow></munder><mi
   mathvariant="normal">∣∣</mi><msubsup><mrow><mi>s</mi></mrow><mrow><mi>c
   </mi></mrow><mrow><mi>o</mi><mi>b</mi><mi>s</mi></mrow></msubsup><mrow>
   <mo>(</mo><mrow><mi
   mathvariant="bold">W</mi></mrow><mo>)</mo></mrow><mo>−</mo><msub><mrow>
   <mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></msub></
   mrow></msub><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">W</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,<
   /mo><mi>θ</mi><mo>,</mo><mi>γ</mi></mrow></mfenced><mi
   mathvariant="normal">∣</mi><msup><mrow><mi
   mathvariant="normal">∣</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow>
   </mfenced><mo>,</mo> :MATH]
   10

   Where
   [MATH: <msub><mrow><mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></msub></
   mrow></msub><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">W</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,<
   /mo><mi>θ</mi><mo>,</mo><mi>γ</mi></mrow></mfenced> :MATH]
   represents
   [MATH: <mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover><mf
   enced close=")"
   open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></ms
   ub><mo>;</mo><mi
   mathvariant="bold">W</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,<
   /mo><mi>θ</mi><mo>,</mo><mi>γ</mi></mrow></mfenced> :MATH]
   . Finally, the optimization algorithm aims to minimize the negative
   log-likelihood function:
   [MATH: <mi>l</mi><mrow><mo>(</mo><mrow><mi
   mathvariant="bold">W</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,<
   /mo><mi>θ</mi><mo>,</mo><mi>γ</mi></mrow><mo>)</mo></mrow><mo>=</mo><mo
   >−</mo><mi>log</mi><mfenced close=")"
   open="("><mrow><mfrac><mrow><mn>1</mn></mrow><mrow><msqrt><mrow><mn>2</
   mn><mi>π</mi></mrow></msqrt><mi>σ</mi></mrow></mfrac></mrow></mfenced><
   mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn><msup><mrow><mi>
   σ</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></mfrac><munder><mrow
   ><mo>∑</mo></mrow><mrow><mi>c</mi></mrow></munder><mi
   mathvariant="normal">∣∣</mi><msubsup><mrow><mi>s</mi></mrow><mrow><mi>c
   </mi></mrow><mrow><mi>o</mi><mi>b</mi><mi>s</mi></mrow></msubsup><mrow>
   <mo>(</mo><mrow><mi
   mathvariant="bold">W</mi></mrow><mo>)</mo></mrow><mo>−</mo><msub><mrow>
   <mover
   accent="true"><mrow><mi>s</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>c</mi></mrow></msub></
   mrow></msub><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">W</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,<
   /mo><mi>θ</mi><mo>,</mo><mi>γ</mi></mrow></mfenced><mi
   mathvariant="normal">∣</mi><msup><mrow><mi
   mathvariant="normal">∣</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>,</
   mo> :MATH]
   11

   In standard EM algorithms, the E-step computes the expected value of
   likelihood function given the observed data and the current estimated
   parameters to update the latent variables, and the M-step consists of
   maximizing that expectation computed in E step by optimizing the
   parameters. While under some circumstances, there is an intractable
   problem in M step, which could be addressed by generalized EM
   algorithms. Expectation conditional maximization is one form of
   generalized EM algorithms, which makes multiple constraint
   optimizations within each M step^[182]42. In detail, the parameters can
   be separated into more subsets, and the M step consists of multiple
   steps as well, each of which optimizes one subset of parameters with
   the reminder fixed^[183]43.

   In TFvelo, all learnable parameters can be divided into three groups,
   which are cell-specific latent time
   [MATH:
   <mi>t</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><msub><mrow><mi>
   n</mi></mrow><mrow><mi>c</mi><mi>e</mi><mi>l</mi><mi>l</mi></mrow></msu
   b></mrow></msup> :MATH]
   , weight of each TF
   [MATH: <mi
   mathvariant="bold">W</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><
   msub><mrow><mi>n</mi></mrow><mrow><mi>T</mi><mi>F</mi></mrow></msub></m
   row></msup> :MATH]
   and scalars [
   [MATH:
   <mi>α</mi><mo>,</mo><mi>β</mi><mo>,</mo><mi>θ</mi><mo>,</mo><mi>γ</mi>
   :MATH]
   ] respectively, where
   [MATH:
   <msub><mrow><mi>n</mi></mrow><mrow><mi>c</mi><mi>e</mi><mi>l</mi><mi>l<
   /mi></mrow></msub> :MATH]
   and
   [MATH:
   <msub><mrow><mi>n</mi></mrow><mrow><mi>T</mi><mi>F</mi></mrow></msub>
   :MATH]
   represent the number of cells and the number of involved TFs.
   Generalized EM algorithm is adopted so that the three group of
   parameters will be updated alternately to minimize the loss function in
   each iteration:
    1. Assign
       [MATH: <mi>t</mi> :MATH]
       within [0, 1] for each cell by grid search. For each cell, on the
       phase plot, the closest point located on the curve described by the
       dynamic model is defined as the target point. The step (b) and (c)
       aim to minimize the average distance between cells and their
       corresponding target points.
    2. Update
       [MATH: <mi>W</mi> :MATH]
       by linear regression with bounds on these weights to minimize the
       loss function. Trust region reflective algorithm^[184]59, which is
       an interior-point-like method is adopted to solve this linear
       least-squares problem with inequality constraint. In our
       implementation, the weights are bounded within (−20, 20) by
       default.
    3. Update [
       [MATH:
       <mi>α</mi><mo>,</mo><mi>β</mi><mo>,</mo><mi>θ</mi><mo>,</mo><mi>γ</
       mi> :MATH]
       ] with constraints of
       [MATH: <mi>α</mi><mo>∈</mo> :MATH]
       (0, ∞),
       [MATH: <mi>θ</mi><mo>∈</mo> :MATH]
       (−π, π) and
       [MATH: <mi>γ</mi><mo>∈</mo> :MATH]
       (0, ∞) to minimize the loss function.

   By iteratively running the three steps, parameters of TFvelo for each
   target gene can be optimized. Furthermore, multiple downstream analysis
   could be conducted by combining the learned dynamics of all target
   genes.

Initialization of the generalized EM algorithm

   Since EM algorithms could be trapped in local minimum, parameters are
   initialized in the following steps. Firstly, those cells in which
   expression of target gene is 0 are filtered out. Then
   [MATH: <mi mathvariant="bold">X</mi><mo>,</mo><mi>y</mi> :MATH]
   are normalized by
   [MATH: <mi mathvariant="bold">X</mi><mo>=</mo><mfrac><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi
   mathvariant="normal">var</mi><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">X</mi></mrow></mfenced></mrow></mfrac><mo>,</mo><mi>
   y</mi><mo>=</mo><mi>y</mi><mo>/</mo><mi
   mathvariant="normal">var</mi><mrow><mo>(</mo><mrow><mi>y</mi></mrow><mo
   >)</mo></mrow> :MATH]
   , where
   [MATH: <mi
   mathvariant="normal">var</mi><mrow><mo>(</mo><mrow><mo>⋅</mo></mrow><mo
   >)</mo></mrow> :MATH]
   represents the standard deviation function. The weights
   [MATH: <mi mathvariant="bold">W</mi> :MATH]
   is initialized with the value of spearman correlation of each TF-target
   pair. For parameters
   [MATH: <mi>α</mi><mo>,</mo><mi>β</mi> :MATH]
   and
   [MATH: <mi>γ</mi> :MATH]
   , according to the steady-state assumption that
   [MATH: <mover
   accent="true"><mrow><mi>y</mi></mrow><mrow><mo>°</mo></mrow></mover><ms
   ub><mrow><mi
   mathvariant="normal">∣</mi></mrow><mrow><mi>y</mi><mo>=</mo><mi>max</mi
   ><mi mathvariant="normal"></mi><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">y</mi></mrow><mo>)</mo></mrow></mrow></msub><mo>=<
   /mo><mn>0</mn> :MATH]
   ,
   [MATH: <mi>γ</mi> :MATH]
   can be initialized by
   [MATH: <mi>γ</mi><mo>=</mo><mfrac><mrow><mi
   mathvariant="bold">WX</mi></mrow><mrow><mi>y</mi></mrow></mfrac><mspace
   width="0.25em"></mspace><msub><mrow><mi
   mathvariant="normal">∣</mi></mrow><mrow><mi>y</mi><mo>→</mo><msub><mrow
   ><mi>y</mi></mrow><mrow><mi>max</mi></mrow></msub></mrow></msub> :MATH]
   . At the points of maximal and minimal
   [MATH: <mi>y</mi> :MATH]
   value, we have
   [MATH: <msub><mrow><mi>y</mi></mrow><mrow><mi>m</mi><mi
   mathvariant="normal">a</mi><mi>x</mi></mrow></msub><mo>=</mo><mi>α</mi>
   <mo>+</mo><mi>β</mi> :MATH]
   and
   [MATH: <msub><mrow><mi
   mathvariant="normal">y</mi></mrow><mrow><mi>min</mi></mrow></msub><mo>=
   </mo><mo>−</mo><mi mathvariant="normal">α</mi><mo>+</mo><mi>β</mi>
   :MATH]
   . As a result,
   [MATH: <mi>α</mi> :MATH]
   and
   [MATH: <mi>β</mi> :MATH]
   can be initialized by
   [MATH:
   <mi>α</mi><mo>=</mo><mfrac><mrow><msub><mrow><mi>y</mi></mrow><mrow><mi
   >max</mi></mrow></msub><mo>−</mo><msub><mrow><mi>y</mi></mrow><mrow><mi
   >min</mi></mrow></msub></mrow><mrow><mn>2</mn></mrow></mfrac> :MATH]
   and
   [MATH: <mi>β</mi><mo>=</mo><mfrac><mrow><msub><mrow><mi
   mathvariant="normal">y</mi></mrow><mrow><mi
   mathvariant="normal">m</mi><mi>a</mi><mi>x</mi></mrow></msub><mo>+</mo>
   <msub><mrow><mi mathvariant="normal">y</mi></mrow><mrow><mi
   mathvariant="normal">m</mi><mi>i</mi><mi>n</mi></mrow></msub></mrow><mr
   ow><mn>2</mn></mrow></mfrac> :MATH]
   , respectively. These assumptions are only used for initialization and
   in the following optimization process, all parameters will be updated
   iteratively. The hyper-parameter
   [MATH: <mi>ω</mi> :MATH]
   is set as
   [MATH: <mn>2</mn><mi>π</mi> :MATH]
   and parameter
   [MATH: <mi>θ</mi> :MATH]
   isintialized with various values, so that the generalized EM algorithm
   will run at different start points. After that, the model with the
   lowest loss is finally selected.

Inference of Pseudo-time

   After constructing the dynamic model for each gene, TFvelo will infer a
   gene-agnostic pseudo-time for each cell based on the combination of RNA
   velocities of all genes. For this purpose, we first compute the cosine
   similarities between velocities and potential cell state transitions to
   get the transition matrix. Then the end points and root cells are
   obtained as stationary states of the velocity-inferred transition
   matrix and its transposed, respectively. After inferring the root
   cells’ location on the embedding space, velocity pseudo-time, which
   measures the average number of steps it takes to reach a cell after
   start walking from the root cell, can be computed. The strategy of
   calculating pseudo-time is the similar with that in scVelo^[185]10. The
   different is that in scVelo, the transition matrix is obtained based on
   the abundance and velocity of spliced mRNA, while in TFvelo that is
   based on the abundance and velocity of total mRNA. Details are provided
   in the Supplementary Information section 6 titled “The root and end
   cells detection”.

Velocity streams on 2D embedding space

   To illustrate cell transitions, we create stream plots in a 2D space
   using UMAP visualization as the default. To this end, we select genes
   whose loss is low and gene-specific time aligns with the pseudotime to
   construct velocity streams.

   For the single cell datasets, we first select genes with modeling error
   in the bottom 50%. Since sine functions exhibit periodicity, we need to
   determine the order of cells instead of using latent time directly. We
   achieve this by analyzing the histogram of latent time. If there are
   several consecutive blank bins in the cell distribution of latent time,
   we set the minimum value of the next non-blank bin as the initial state
   (normalized latent time = 0). Conversely, the maximum value of the last
   non-blank bin will be set as the final state (normalized latent
   time = 1). The normalized latent time of all other cells will be mapped
   within the range of 0 to 1. Subsequently, we select genes where the
   normalized latent time aligns with the pseudotime based on the Spearman
   correlation coefficient between them.

Metrics for evaluating in phase portrait

   We propose three metrics for phase portrait fitting.
    1. The intra-class distance on phase portrait, which measures the
       normalized distance between cells within the same type on the phase
       portrait. This reflects whether cells within the same type gather
       on the phase portrait, which is the lower the better. On
       unspliced/spliced based methods, the intra-class distance is
       defined as
       [MATH:
       <mi>D</mi><mi>i</mi><mi>s</mi><mi>t</mi><mo>=</mo><msub><mrow><mo>∑
       </mo></mrow><mrow><mi>k</mi></mrow></msub><msub><mrow><mo>∑</mo></m
       row><mrow><mi>j</mi><mo>∈</mo><msub><mrow><mi>t</mi><mi>y</mi><mi>p
       </mi><mi>e</mi></mrow><mrow><mi>k</mi></mrow></msub></mrow></msub><
       mfenced close="]" open="["><mrow><msup><mrow><mfenced close=")"
       open="("><mrow><mfrac><mrow><msub><mrow><mi>u</mi></mrow><mrow><mi>
       j</mi></mrow></msub><mo>−</mo><msubsup><mrow><mi>u</mi></mrow><mrow
       ><mi>c</mi></mrow><mrow><mi>k</mi></mrow></msubsup></mrow><mrow><mi
       >s</mi><mi>t</mi><mi>d</mi><mfenced close=")"
       open="("><mrow><mi>u</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup><mo>+</mo><msup><mro
       w><mfenced close=")"
       open="("><mrow><mfrac><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>
       j</mi></mrow></msub><mo>−</mo><msubsup><mrow><mi>s</mi></mrow><mrow
       ><mi>c</mi></mrow><mrow><mi>k</mi></mrow></msubsup></mrow><mrow><mi
       >s</mi><mi>t</mi><mi>d</mi><mfenced close=")"
       open="("><mrow><mi>s</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup></mrow></mfenced>
       :MATH]
       , where
       [MATH:
       <mrow><mo>(</mo><mrow><msubsup><mrow><mi>u</mi></mrow><mrow><mi>c</
       mi></mrow><mrow><mi>k</mi></mrow></msubsup><mo>,</mo><msubsup><mrow
       ><mi>s</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>k</mi></mrow></m
       subsup></mrow><mo>)</mo></mrow> :MATH]
       means the center of cell type k, std(.) means the standard variance
       and
       [MATH: <mi>j</mi> :MATH]
       means the index of a cell. For TFvelo, the distance can be defined
       by substituting
       [MATH: <mi>u</mi> :MATH]
       with
       [MATH: <mi>W</mi><mi>X</mi> :MATH]
       and
       [MATH: <mi>s</mi> :MATH]
       with
       [MATH: <mi>y</mi> :MATH]
       .
    2. The inter-class distance on phase portrait, which measures the
       normalized distance between the distribution centers of different
       cell types on the phase portrait, which is defined as
       [MATH:
       <mi>D</mi><mi>i</mi><mi>s</mi><mi>t</mi><mo>=</mo><msub><mrow><mo>∑
       </mo></mrow><mrow><mi>k</mi><mn>1</mn><mo>≠</mo><mi>k</mi><mn>2</mn
       ></mrow></msub><mfenced close="]"
       open="["><mrow><msup><mrow><mfenced close=")"
       open="("><mrow><mfrac><mrow><msubsup><mrow><mi>u</mi></mrow><mrow><
       mi>c</mi></mrow><mrow><mi>k</mi><mn>1</mn></mrow></msubsup><mo>−</m
       o><msubsup><mrow><mi>u</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>
       k</mi><mn>2</mn></mrow></msubsup></mrow><mrow><mi>s</mi><mi>t</mi><
       mi>d</mi><mfenced close=")"
       open="("><mrow><mi>u</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup><mo>+</mo><msup><mro
       w><mfenced close=")"
       open="("><mrow><mfrac><mrow><msubsup><mrow><mi>s</mi></mrow><mrow><
       mi>c</mi></mrow><mrow><mi>k</mi><mn>1</mn></mrow></msubsup><mo>−</m
       o><msubsup><mrow><mi>s</mi></mrow><mrow><mi>c</mi></mrow><mrow><mi>
       k</mi><mn>2</mn></mrow></msubsup></mrow><mrow><mi>s</mi><mi>t</mi><
       mi>d</mi><mfenced close=")"
       open="("><mrow><mi>s</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup></mrow></mfenced>
       :MATH]
       . This reflects whether cells from different types are separated
       well on the phase portrait, which is the higher the better.
    3. The fitting error on phase portrait, which measures the normalized
       distance between each cell to the constructed model on the phase
       portrait. This reflects the fitting accuracy and calculated in the
       following way. On the phase portrait of each selected gene,
       calculate the normalized mean square error of the distance between
       each cell to the inferred model curve. For the TFvelo data,
       [MATH:
       <mi>E</mi><mi>r</mi><mi>r</mi><mo>=</mo><msubsup><mrow><mo>∑</mo></
       mrow><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>N</mi></m
       row></msubsup><mfenced close="]"
       open="["><mrow><msup><mrow><mfenced close=")"
       open="("><mrow><mfrac><mrow><msub><mrow><mi
       mathvariant="bold">WX</mi></mrow><mrow><mi
       mathvariant="bold">j</mi></mrow></msub><mo>−</mo><mi
       mathvariant="bold">WX</mi><mfenced close=")"
       open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>j</mi></mrow>
       </msub></mrow></mfenced></mrow><mrow><mi>s</mi><mi>t</mi><mi>d</mi>
       <mfenced close=")" open="("><mrow><mi
       mathvariant="bold">WX</mi></mrow></mfenced></mrow></mfrac></mrow></
       mfenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup><mo>+</mo><msup><mro
       w><mfenced close=")"
       open="("><mrow><mfrac><mrow><msub><mrow><mi>y</mi></mrow><mrow><mi>
       j</mi></mrow></msub><mo>−</mo><mi>y</mi><mfenced close=")"
       open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>j</mi></mrow>
       </msub></mrow></mfenced></mrow><mrow><mi>s</mi><mi>t</mi><mi>d</mi>
       <mfenced close=")"
       open="("><mrow><mi>y</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup></mrow></mfenced><mo
       >,</mo> :MATH]
       where
       [MATH: <mi>j</mi> :MATH]
       means the index of a cell, t[j] means the latent time modeled for
       cell
       [MATH: <mi>j</mi> :MATH]
       and N is the total number of cells.
       [MATH: <msub><mrow><mi mathvariant="bold">WX</mi></mrow><mrow><mi
       mathvariant="bold-italic">j</mi></mrow></msub> :MATH]
       and
       [MATH: <msub><mrow><mi>y</mi></mrow><mrow><mi>j</mi></mrow></msub>
       :MATH]
       refer to the location of cell on the phase portrait.
       [MATH: <mi
       mathvariant="bold">WX</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>t</
       mi></mrow><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow>
       :MATH]
       and
       [MATH:
       <mi>y</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>t</mi></mrow><mrow>
       <mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
       refers to the model given by TFvelo. Similarly, the error for
       un/spliced data-based methods is defined as
       [MATH:
       <mi>E</mi><mi>r</mi><mi>r</mi><mo>=</mo><msubsup><mrow><mo>∑</mo></
       mrow><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>N</mi></m
       row></msubsup><mfenced close="]"
       open="["><mrow><msup><mrow><mfenced close=")"
       open="("><mrow><mfrac><mrow><msub><mrow><mi>u</mi></mrow><mrow><mi>
       j</mi></mrow></msub><mo>−</mo><mi>u</mi><mfenced close=")"
       open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>j</mi></mrow>
       </msub></mrow></mfenced></mrow><mrow><mi>s</mi><mi>t</mi><mi>d</mi>
       <mfenced close=")"
       open="("><mrow><mi>u</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup><mo>+</mo><msup><mro
       w><mfenced close=")"
       open="("><mrow><mfrac><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>
       j</mi></mrow></msub><mo>−</mo><mi>s</mi><mfenced close=")"
       open="("><mrow><msub><mrow><mi>t</mi></mrow><mrow><mi>j</mi></mrow>
       </msub></mrow></mfenced></mrow><mrow><mi>s</mi><mi>t</mi><mi>d</mi>
       <mfenced close=")"
       open="("><mrow><mi>s</mi></mrow></mfenced></mrow></mfrac></mrow></m
       fenced></mrow><mrow><mspace width="0.3em"></mspace><mspace
       width="0.3em"></mspace><mn>2</mn></mrow></msup></mrow></mfenced>
       :MATH]
       .

Metrics for evaluating the velocity stream

    1. Cross boundary direction correctness (CBDir) and within-cluster
       velocity coherence (ICVCoh). According to the definition given in
       VeloAE^[186]11 and UniTVelo^[187]13, CBDir evaluates the accuracy
       of transitions from a source cluster to a target cluster based on
       the information provided by boundary cells, which requires the
       ground truth annotation. ICVCoh is computed with a cosine
       similarity scoring function between cell velocities within the same
       cluster. We directly run the functions provided in UniTVelo^[188]13
       to obtain the of CBDir and ICVCoh.
    2. Velocity consistency. We executed scVelo’s velocity_confidence()
       function and interpreted the outcomes as velocity consistency. This
       is because that it indeed measures the consistency of velocities
       within neighboring cells, as opposed to the statistical definition
       of “confidence”.

GO term and KEGG pathway enrichment analysis

   To perform the GO term and KEGG pathway enrichment analysis, we
   utilized the “gseapy.enrichr()” function in Python package “gseapy”
   with using Fisher’s exact test by default. For each dataset, the
   background gene list consists of all genes within the original dataset
   before preprocessing.

Weights normalization and TFs selection

   Considering that the abundance of different TFs could vary a lot, the
   quantitative comparison between the weights of TFs is conducted after
   normalizing them. For each TF on modeling a target, we multiply the
   mean expression level by the learned weights to get the average
   influence of the TF to the target gene. For instance, the average
   influence of TF
   [MATH: <mi>i</mi> :MATH]
   is
   [MATH:
   <msub><mrow><mi>w</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>*</mo><m
   sub><mrow><mover
   accent="true"><mrow><mi>x</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi
   >i</mi></mrow></msub> :MATH]
   . Then the normalized weights are defined as:
   [MATH: <msub><mrow><mover
   accent="true"><mrow><mi>w</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >i</mi></mrow></msub><mo>=</mo><mfrac><mrow><msub><mrow><mi>w</mi></mro
   w><mrow><mi>i</mi></mrow></msub><mo>*</mo><msub><mrow><mover
   accent="true"><mrow><mi>x</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi
   >i</mi></mrow></msub></mrow><mrow><msub><mrow><mi>max</mi></mrow><mrow>
   <mi>i</mi></mrow></msub><mrow><mo>(</mo><mrow><mo>∣</mo><msub><mrow><mi
   >w</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>*</mo><msub><mrow><move
   r
   accent="true"><mrow><mi>x</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi
   >i</mi></mrow></msub><mo>∣</mo></mrow><mo>)</mo></mrow></mrow></mfrac>
   :MATH]
   . The TFs with high weights are select according to these normalized
   weights with default threshold of
   [MATH: <mn>0.5</mn> :MATH]
   .

Statistics & reproducibility

   Four datasets are utilized in this manuscript, which is a proper number
   compared with relevant RNA velocity studies. No statistical method was
   used to predetermine sample size. No data were excluded from the
   analyses. The experiments were not randomized. The Investigators were
   not blinded to allocation during experiments and outcome assessment.

   To reproduce the results shown in this paper, please use the demo code
   with default parameters at the GitHub repository,
   [189]https://github.com/xiaoyeye/TFvelo.

Reporting summary

   Further information on research design is available in the [190]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [191]Supplementary Information^ (12MB, pdf)
   [192]Peer Review File^ (3.6MB, pdf)
   [193]Reporting Summary^ (234.3KB, pdf)

Source data

   [194]Source Data^ (14.6MB, xlsx)

Acknowledgements