Abstract

   The remarkable capacity of the liver to regenerate its lost mass after
   resection makes living donor liver transplantation a successful
   treatment option. However, donor heterogeneity significantly influences
   recovery trajectories, highlighting the need for individualized
   monitoring. With the rising incidence of liver diseases, safer
   transplant procedures and improved donor care are urgently needed.
   Current clinical markers provide only limited snapshots of recovery,
   making it challenging to predict long-term outcomes. Following partial
   hepatectomy, precise liver mass recovery requires tightly regulated
   hepatocyte proliferation. We identified distinct gene expression
   patterns associated with liver regeneration by analyzing blood-derived
   gene expression measurements from twelve donors followed over a year.
   Using a deep learning-based framework, we integrated these patterns
   with a mathematical model of hepatocyte transitions to develop a
   personalized, progressive mechanistic digital twin—a virtual liver
   model that predicts donor-specific recovery trajectories. Central to
   our approach is a mechanistically identifiable latent space, defined by
   variables derived from a physiologically grounded differential equation
   model of liver regeneration, which enables biologically interpretable,
   bidirectional mapping between gene expression data and model dynamics.
   This approach integrates clinical genomics and computational modeling
   to enhance post-surgical care, ensuring safer transplants and improved
   donor recovery.

   Keywords: liver regeneration, partial hepatectomy, mathematical
   modeling, deep learning, digital twin, living donor liver transplant
   (LDLT)

Introduction

   The liver has the ability to regenerate and restore its functional mass
   following injury, disease, or surgical removal, such as partial
   hepatectomy. This regenerative capacity is vital for maintaining the
   body’s homeostasis, given its central role in numerous physiological
   processes, including the detoxification of harmful substances, the
   metabolism of nutrients and drugs, and the synthesis of essential
   proteins such as albumin and clotting factors [[30]1–4]. The
   proliferation of mature hepatocytes, the functional cells of the liver,
   primarily drives liver regeneration. These hepatocytes, generally in a
   quiescent state (the G0 phase of the cell cycle), can re-enter the cell
   cycle and proliferate in response to injury or a loss of liver mass.
   This proliferation is a meticulously regulated process, ensuring that
   the liver regenerates to its original size without excessive or
   insufficient growth [[31]5–7]. Apart from the hepatocytes, supporting
   endothelial cells, hepatic stellate cells, and Kupffer cells play
   critical roles in liver regeneration. Endothelial cells facilitate the
   restoration of the liver vascular network, ensuring adequate blood flow
   and nutrient delivery during the regenerative process. Hepatic stellate
   cells contribute by producing extracellular matrix (ECM) components and
   growth factors, while Kupffer cells, the liver’s resident macrophages,
   secrete cytokines and chemokines that prime hepatocytes for
   regeneration [[32]8–14]. This collective and coordinated interplay
   among hepatocytes and their supporting cells enables the capacity of
   the liver to regenerate while maintaining its essential functions. This
   capability of the liver to continue its normal functioning even during
   the regeneration process permits living donor liver transplantation
   (LDLT) [[33]15].

   Liver regeneration is influenced by multiple factors, including the
   host’s metabolic environment, graft type, and hepatectomy volume.
   Studies have shown that left lobe grafts regenerate faster than right
   lobes, with recipients exhibiting higher regeneration rates than donors
   [[34]16]. Additionally, minor resections primarily rely on hypertrophy,
   whereas extensive resections involve both hypertrophy and hepatocyte
   proliferation, often leading to increased cell ploidy [[35]16–18].
   These complexities make it challenging to predict individual recovery
   trajectories following LDLT. Inter-donor variability significantly
   impacts recovery outcomes in LDLT, with factors such as genetic makeup,
   physiological status, pre-existing liver conditions, and immune
   responses playing crucial roles. For example, high intra-donor
   variability in tacrolimus levels has been associated with adverse
   outcomes, including graft rejection and the development of
   donor-specific antibodies [[36]19]. Furthermore, the specialized nature
   of LDLT limits data availability, as it represents only a tiny fraction
   of total liver transplants. A study analyzing 73 681 adult recipients
   found that only 4% underwent LDLT, with substantial variation across
   transplant centers [[37]20]. This scarcity of cases, combined with
   limited longitudinal follow-up data, poses significant challenges in
   developing clinically relevant predictive models for liver
   regeneration.

   The process of liver regeneration is orchestrated by dynamic
   transitions of hepatocytes through quiescent, primed, and replicative
   states, regulated by intricate signaling pathways [[38]21–23].
   Understanding molecular drivers, including gene activation and
   suppression, is critical for modeling and predicting these regenerative
   dynamics [[39]24–27]. Building on a foundation of experimental
   observations, we developed the first mathematical model to predict
   liver volume after partial hepatectomy (PHx) [[40]28, [41]29], offering
   a quantitative framework to predict regenerative trajectories. To make
   such predictive models clinically relevant, we must integrate real-time
   donor-specific data with the dynamical framework provided by our
   dynamical model [[42]30, [43]31]. This is essential for capturing the
   complex interplay of molecular, cellular, and systemic factors that
   drive liver regeneration [[44]1, [45]22, [46]32].

   However, challenges such as significant inter-donor variability, the
   nonlinear nature of liver recovery, and the integration of
   heterogeneous datasets complicate the development of personalized
   predictive models [[47]31]. To address these complexities, this study
   introduces the personalized progressive mechanistic digital twin
   (PePMDT), a donor-specific digital twin for the livers of LDLT donors.
   By integrating donor-specific clinical and molecular data with deep
   learning techniques, PePMDT enables precise simulation of regeneration
   dynamics. Similar deep learning ideas have also been applied in systems
   biology and multi-omics integration to uncover functional relationships
   between gene regulatory networks and physiological outcomes
   [[48]33–36]. The digital twin framework begins with collecting blood
   samples from a specific LDLT donor after partial hepatectomy.
   Whole-transcriptomic analysis is then performed, and the identified
   genes are clustered into various functional groups. These grouped genes
   are subsequently mapped onto our mechanistic mathematical model that
   describes liver regeneration processes. A key novelty of our approach
   lies in the construction of a mechanistically identifiable latent space
   between the encoder and decoder networks, composed of variables derived
   from physiological model of liver regeneration [[49]28, [50]29]. This
   biologically interpretable latent space allows bidirectional mapping
   between gene expression profiles and mechanistic variables, making our
   digital twin framework fundamentally different from black-box deep
   learning approaches. The model predicts dynamic variables at future
   time points, which are then used to forecast future gene expression
   patterns, enabling a data-driven approach to personalized liver
   recovery assessment, as depicted in [51]Fig. 1. This personalized
   framework has the potential to optimize post-transplant care, enhance
   recovery outcomes, and advance the field of regenerative medicine.

Figure 1.

   [52]Figure 1.
   [53]Open in a new tab

   Framework of the digital twin. The figure depicts the general framework
   of the digital twin for liver regeneration. Blood samples are collected
   from a specific LDLT donor after partial hepatectomy, followed by whole
   transcriptomic analysis. Identified genes are clustered into functional
   groups and mapped onto a mechanistic mathematical model describing
   liver regeneration. The model predicts dynamic variables at future time
   points, which are then used to forecast future gene expression
   patterns.

Methods

Streamlining gene expression profiles

   The dataset consists of whole-transcriptome RNA sequencing data from 12
   healthy LDLT donors, collected at 14 predefined time points over a
   1-year follow-up period. All LDLT donors underwent partial hepatectomy,
   and blood samples were collected at specific intervals post-surgery,
   including 5 min, 30 min, 1 h, 2 h, 3 h, 4 h, 1 day, 2 days, 3 days,
   4 days, 10 days, 3 months, 6 months, and 12 months. The aim was to
   explore the dynamic transcriptional changes associated with liver
   regeneration post-PHx. This comprehensive dataset provided a unique
   opportunity to capture the liver’s gene expression landscape during
   recovery, from the immediate post-operative phase to the long-term
   restoration of liver function. [54]Figure 2 presents a visual
   representation of the data structure, providing context for how the
   data were organized and utilized in this analysis. To ensure accurate
   downstream analysis and model predictions, the RNA-seq data underwent
   several refinement steps aimed at standardizing gene expression values
   across different donors and time points. These steps were essential to
   handle potential technical and biological noise inherent in
   high-throughput sequencing data.

Figure 2.

   [55]Figure 2.
   [56]Open in a new tab

   Data structure. The dataset consists of samples obtained from 12
   healthy LDLT donors, collected at 14 distinct time points over the
   course of 1 year following partial hepatectomy. Filled color boxes
   indicate the presence of data, while blank spaces represent missing
   data. Donors are arranged in descending order based on the number of
   available time points. The number of available data points out of the
   14 total time points is indicated on the right side for each donor. The
   top two-thirds of the dataset is used for training, while the remaining
   one-third serves as the testing dataset. The lower portion of the
   dataset was chosen for testing because these samples lack gene
   expression values in the later stages, providing an opportunity for
   predictive modeling.

Missing value imputation

   Due to the nature of RNA-seq data, some gene expression values were
   missing or below detection limits. To address this, we filled the
   missing values by averaging the corresponding expression values from
   other donors. This approach ensured that the imputed values were
   biologically plausible and aligned with the collective data structure.
   By using donor-averaged imputation, we preserved the continuity of
   expression profiles across time points and minimized biases in
   downstream analyses, ensuring robust and reliable integration of the
   data into our model.

Correction of negative expression values

   In some instances, negative expression values were observed due to
   noise in low-expressed genes. These negative values were biologically
   unrealistic and could interfere with further analysis. To address this
   issue, we replaced all negative values with the average of the
   corresponding expression values from other donors. This replacement
   aims to ensure that no low or negative value artifacts skew the
   results, particularly in downstream clustering or modeling.

Log-transformation

   Gene expression values were log-transformed to reduce the skewness of
   the data and to stabilize the variance across time points and donors.
   Log-transformation compresses the range of gene expression values,
   potentially revealing more patterns in genes with both high and low
   expression. This transformation is especially useful in dynamic
   biological processes like liver regeneration, where gene expression
   changes can span several orders of magnitude over time. The
   transformation used was:
   [MATH:
   <mrow><mtext>Log</mtext><mo> </mo><mi> </mi><mtext>transformed</mtext><
   mi> </mi><mtext>data</mtext><mo>=</mo><mrow><msub><mrow><mrow><mo> </mo
   ><mtext>log</mtext><mo> </mo></mrow></mrow><mn>2</mn></msub></mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>+</mo><mtext>data</mtext><mo
   stretchy="false">)</mo></mrow> :MATH]

   This pipeline made the RNA-seq data ready for subsequent analyses,
   including clustering, temporal trend analysis, and mapping to the
   mathematical model variables.

Gene co-expression analysis: Identification of temporal gene modules

   The dataset contains 16 735 RNA gene transcripts in the blood, each
   expressed at least once during post-surgical recovery. Among these,
   6649 genes exhibited a differential expression of 2-fold or more across
   at least two consecutive time points compared to the preoperative
   baseline following liver resection. Additionally, 1110 genes showed
   expression changes of 1.5-fold or more at pre-resection time points,
   likely due to nonspecific surgical effects from the abdominal incision.
   These genes were excluded to focus on those potentially specific to
   liver regeneration. This left 5876 genes that were identified as
   differentially expressed genes (DEGs) and specific to liver
   regeneration.

   Weighted gene co-expression network analysis (WGCNA) [[57]37] was
   employed on the DEGs to identify groups of genes that exhibited
   distinct temporal patterns during liver regeneration. These groups were
   further refined using self-organizing maps (SOM), which organized the
   genes into clusters based on their temporal expression profiles. SOM is
   particularly useful for visualizing time-series data, as it provides a
   low-dimensional representation of complex patterns, helping to reveal
   temporal trends in gene expression [[58]38, [59]39]. These methods
   collectively identify clusters of highly correlated genes, which are
   believed to be involved in shared biological processes. Each cluster is
   assigned a unique name and represents a set of genes whose expression
   patterns are highly similar across donors and time points. This method
   allows for the identification of clusters without prior knowledge of
   the biological pathways involved, making it an unbiased tool for
   analyzing dynamic gene expression data [[60]37, [61]40].

Mathematical and molecular framework of liver regeneration

   Liver regeneration following partial hepatectomy involves a series of
   biological processes, including hepatocyte proliferation, ECM
   degradation, and tissue remodeling. These processes are regulated by
   dynamic interactions between genes and the tissue environment, which
   can be modeled mathematically. Our previously developed mathematical
   model [[62]28, [63]29] simulates liver regeneration by incorporating
   key processes such as liver volume recovery, hepatocyte proliferation
   rates, ECM turnover, and tissue remodeling.

   The liver regeneration process is characterized by a coordinated
   transition of hepatocytes through distinct states—quiescent (Q), primed
   (P), and replicating (R). These transitions are illustrated in
   [64]Fig. 3A. While the total cell count consists of Q, P, and R cells,
   the overall liver volume increases due to the hypertrophy of primed and
   replicating cells [[65]18]. The reduction in hepatocyte number
   following resection increases the metabolic load per hepatocyte (M/N),
   triggering the expression of growth factors (GF) and tumor necrosis
   factor (TNF). TNF promotes ECM degradation via MMPs and induces IL-6
   expression, which activates the JAK-STAT3 signaling pathway. STAT3, in
   turn, drives the transcription of SOCS3 and immediate early (IE) genes,
   shifting hepatocytes into a primed state. In this state, hepatocytes
   remain metabolically active but do not yet proliferate. Cytokines such
   as IL-6 and TNF-
   [MATH: <mi>α</mi> :MATH]
   , along with growth factors like HGF and EGF, further reinforce this
   priming by upregulating transcriptional programs involving immediate
   early genes, such as c-Jun and c-Fos. This prepares hepatocytes to
   respond to mitogenic signals and transition into the replicating state,
   where DNA synthesis and cell division occur to restore liver mass. As
   regeneration progresses, TNF levels decrease, ECM reformation occurs,
   and GF are inactivated, for example by uptake into the ECM, ultimately
   halting the primed-to-replicating transition and promoting the shift
   from replication back to quiescence. The molecular interactions
   underlying this process are depicted in [66]Fig. 3B. A detailed
   mathematical framework is given in [[67]28, [68]29].

Figure 3.

   [69]Figure 3.
   [70]Open in a new tab

   Schematic overview of liver regeneration. (A) The cell cycle dynamics
   during liver regeneration, highlighting changes in cell numbers. (B)
   The biochemical pathways involved in liver regeneration, illustrating
   key molecular interactions driving hepatocyte proliferation and tissue
   remodeling.

Constructing the digital twin

   Integration of gene expression data with mechanistic modeling
   represents a potentially transformative approach to predicting
   personalized liver recovery. In this study, we constructed a digital
   twin, PePMDT, through a two-step mapping process combined with
   mathematical model simulations (see [71]Fig. 4). In the first step, we
   map gene expression data onto key physiological parameters of the
   mathematical model. This transformation enables us to represent
   biological states in terms of mathematically defined variables
   governing liver regeneration dynamics. Once the gene expression data
   has been mapped, we use the mathematical model to predict liver
   regeneration dynamics at future time points. These simulations predict
   the evolution of key physiological parameters, capturing hepatocyte
   proliferation, ECM turnover, and tissue remodeling over time. In the
   final step, we reverse-map the simulated mathematical model variables
   back to gene expression values at future time points. This process
   gives predicted gene expression profiles, allowing us to estimate the
   molecular responses of LDLT donors over time based on their initial
   gene expression states. These forward and reverse mappings are
   described in the following subsections.

Figure 4.

   [72]Figure 4.
   [73]Open in a new tab

   Schematic representation of the digital twin framework. The process
   involves a two-step mapping approach combined with mathematical model
   simulations. First, gene expression data are mapped onto key
   physiological parameters of the mathematical model. The model then
   simulates liver regeneration dynamics over future time points. Finally,
   the simulated mathematical values are reverse-mapped to reconstruct
   gene expression values, enabling the prediction of molecular responses
   in LDLT donors over time.

Mapping gene expression profiles to mathematical model variables

   The mathematical model [[74]28, [75]29] describes hepatocyte state
   transitions in response to molecular signals, providing insights into
   liver recovery and potential therapeutic targets. However, they lack
   direct links to gene regulation. To bridge this gap, we used RNA-seq
   data from LDLT donors to map gene expression onto dynamic model
   variables. Using a feedforward neural network (FNN), we mapped gene
   expression data into key physiological parameters, enabling a
   gene-driven understanding of liver regeneration. The FNN consists of
   three fully connected layers, each followed by ReLU activation
   functions to introduce nonlinearity, potentially allowing the model to
   capture complex relationships between gene expression patterns and
   liver regeneration states. Dropout layers (0.2 probability) are
   incorporated to prevent overfitting, aiming to ensure robust
   generalization across different gene expression profiles. The input
   layer receives gene expression data clustered into biologically
   relevant groups (15 clusters), which is then transformed through
   progressively wider layers (64 and 128 neurons) to extract meaningful
   patterns. The final layer outputs a lower-dimensional representation
   consisting of mathematical model variables corresponding to the
   measured gene expression levels computed based on donor-specific
   factors, such as the extent of liver resection, which plays a crucial
   role in the mathematical modeling framework. This parameter
   significantly influences the simulation of liver regeneration dynamics.
   The FNN is trained using a supervised learning approach, where the loss
   function (mean squared error, MSE) measures the difference between
   predicted and actual physiological states. FNN parameters are optimized
   using backpropagation and the Adam optimizer [[76]41].

Reverse mapping: Predicting gene expression from model variables

   After mapping gene expression data to mathematical model variables, the
   model simulates liver regeneration dynamics over future time points. A
   reverse mapping step is required to translate these simulated variables
   back into gene expression values, predicting future molecular
   responses. This transformation is performed using an FNN as well. The
   architecture mirrors the FNN used for the forward mapping, consisting
   of three fully connected layers with ReLU activation functions to
   introduce nonlinearity and dropout layers (0.2 probability) to prevent
   overfitting. The input layer receives simulated model variables,
   processes them through progressively wider layers (64 and 128 neurons),
   and outputs gene expression values corresponding to biologically
   relevant clusters. The model is optimized using the Adam optimizer,
   with MSE as the loss function to minimize the discrepancy between
   predicted and actual gene expression values.

Results

The genetic blueprint of liver regeneration: Clustering analysis reveals
temporal gene modules in liver regeneration

   The RNA-seq data from 12 healthy LDLT donors, collected over a 1-year
   follow-up period, were obtained from Lawrence et al. [[77]42]. The
   donors were divided into two sets: two-thirds were used as training
   donors, and one-third were used as test donors. These data underwent
   several refinement steps to standardize them, as detailed in the
   Methods section. To identify temporally regulated gene modules
   representing distinct biological processes across different phases of
   liver regeneration, we applied a combination of WGCNA and SOM
   [[78]37–40, [79]43–46]. This approach enabled the identification of
   gene clusters exhibiting dynamic regulation, capturing key patterns of
   gene expression over time. These identified modules form the foundation
   for understanding the molecular mechanisms underlying liver
   regeneration. Moreover, this clustering approach revealed gene networks
   with synchronized behavior, providing insights into the transcriptional
   programs that drive recovery after partial hepatectomy.

   After applying WGCNA and SOM (for details see Methods section), the
   genes were clustered into 15 distinct groups (see [80]Fig. 5)—a higher
   number than the eight clusters used by Lawrence et al. [[81]42]. This
   increased number of clusters allowed for a more precise differentiation
   of DEGs based not only on their temporal expression patterns but also
   on their expression values, which proved beneficial for subsequent
   analyses. To assess the biological significance of the clusters, we
   compared our results with those of Lawrence et al. [[82]42]. Consistent
   with their findings, we also identified that the clusters can be
   grouped into three prominent functional temporal gene modules
   corresponding to the main phases of liver regeneration: early response
   (Clusters 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, and 15), proliferation
   (Clusters 3 and 6), and long-term recovery (Clusters 1 and 2). Each
   module comprises genes with specific functions that are activated
   during distinct stages of the recovery process of the liver. SOM
   generates results based on random seeds, meaning that the outcomes may
   vary slightly for individual clusters with each run. However, this does
   not affect the overall functional modules of regeneration. The key
   processes associated with each module were determined using functional
   enrichment analysis using the Enrichr database [[83]47] and are
   outlined to highlight their relevance to the overall biological
   context. The detailed figure of the gene ontology (GO) and pathway
   enrichment analysis is provided in the [84]Supplementary Fig. 1. All
   gene clusters and their corresponding top driver genes are provided in
   the [85]Supplementary Materials.

Figure 5.

   [86]Figure 5.
   [87]Open in a new tab

   Temporal gene modules. A visual representation of the temporal gene
   expression patterns across 15 distinct clusters, which were
   subsequently grouped into three modules, during the liver resection and
   regeneration process. The bold lines represent the mean log2 fold
   change relative to preoperative values, and the shaded regions indicate
   the standard deviations.
     * Early-response module (Clusters 4, 5, 7, 8, 9, 10, 11, 12, 13, 14,
       and 15): This module consists of genes that peak within the first
       24 h post-PHx. These genes are primarily involved in the
       inflammatory response and early ECM degradation and remodeling.
       This triggers immune cell recruitment and matrix breakdown, which
       are crucial for initiating the regenerative process. Key genes in
       this module include SLC39A1, which is involved in zinc transport
       and immune regulation; IFNGR2, critical for immune response and
       inflammation; CMTM4, which modulates cytokine signaling; SEMA6C,
       guiding immune cell migration and tissue repair; IGKV1-33, playing
       a role in immune response through immunoglobulin synthesis; LGMN, a
       protease responsible for ECM degradation; CCR6, mediating immune
       cell recruitment to injury sites; FES, a tyrosine kinase
       participating in ECM remodeling; ADM, a cytokine involved in
       vasodilation and inflammation; and WNT2B, part of the Wnt signaling
       pathway that promotes cell proliferation and tissue regeneration.
       The temporal peak of this module aligns with the critical early
       stages of liver regeneration, where tissue damage needs to be
       repaired to create a favorable environment for cell proliferation.
     * Proliferation module (Clusters 3 and 6): Active between 1 and 10
       days post-surgery, this module is enriched for genes related to
       cell cycle progression, hepatocyte proliferation, and DNA
       replication. During this phase, the liver undergoes rapid growth as
       hepatocytes proliferate to replace the lost tissue. This process is
       tightly regulated by growth factors and cell cycle regulators,
       ensuring controlled proliferation. Genes in this module include
       cyclins, cyclin-dependent kinases (CDKs), and growth factor
       receptors. For example, Cluster 3 contains TFDP1, a key
       transcription factor in cell cycle regulation, along with COLGALT1,
       HOMER3, ITGAD, and ANG, which are associated with chromatin
       remodeling, cell signaling, and angiogenesis. Cluster 6 includes
       immune-regulatory and proliferation-associated genes such as IFIT1,
       CXCL8, CXCL6, IFIT3, and AHRR, which are known to be involved in
       inflammation-driven proliferation and tissue remodeling. The peak
       of this module corresponds to the proliferative burst that occurs
       during liver regeneration, with hepatocyte replication reaching its
       highest levels between 3 and 10 days.
     * Long-term recovery module (Clusters 1 and 2): This module is
       characterized by genes that show sustained upregulation beyond 3
       months post-PHx. These genes are associated with ECM organization,
       tissue remodeling, and the restoration of normal liver function. As
       the liver nears the completion of regeneration, it enters a phase
       of tissue stabilization, where the newly formed ECM becomes
       organized, and liver architecture is restored to support long-term
       functional recovery. Key genes in this module include PTGS1, TSTA3,
       MAP1A, and CTSE, which are involved in extracellular matrix
       remodeling. Genes such as ESAM, PTGFRN, and CORIN are associated
       with angiogenesis and vascular functions, facilitating proper blood
       vessel formation and remodeling. DAP and SIAH2 contribute to tissue
       remodeling and maintaining homeostasis. Additionally, FAXDC2,
       ACCSL, and possibly GPR146 play roles in liver function and lipid
       metabolism, supporting the restoration of metabolic activity during
       the late phase of liver regeneration.

   These three modules capture the temporal dynamics of liver regeneration
   and also reflect distinct biological processes that are activated
   sequentially.

PepMDT predicts donor-specific future gene expression patterns related to
liver mass recovery with precision

   Comparisons of predicted gene expression from PePMDT with actual gene
   expression profiles across multiple time points and donors demonstrated
   a strong alignment between the two datasets. In this analysis, the
   initial gene expression profiles of test donors (before surgery values)
   were input into the PePMDT model, which then predicted gene expression
   values for each cluster at subsequent time points. [88]Figure 6
   illustrates the results for all test donors: donors 5, 6, 10, and 11.

Figure 6.

   [89]Figure 6.
   [90]Open in a new tab

   Comparison of original and predicted gene expression profiles. The plot
   illustrates the temporal gene expression patterns for individual test
   donors—donor 5 (A), donor 6 (B), donor 10 (C), and donor 11 (D). Bold
   lines represent the mean log2 fold change relative to preoperative
   values, with shaded regions indicating standard deviations. Red lines
   denote the predicted values, while blue lines represent the original
   (true) gene expression values. Open circles indicate time points where
   data are missing in the original dataset.

   The cluster-wise MSE and Pearson correlation coefficient between the
   true and predicted gene expression profiles were computed to evaluate
   prediction accuracy. For the majority of clusters, the MSE was found to
   be less than 0.3, with an average MSE across all test donors below
   0.15. The Pearson correlation coefficient consistently exceeded 0.7,
   with the lowest average for a donor being 0.872, indicating a strong
   correlation between the predicted and true gene expression profiles.
   These results are depicted in [91]Fig. 7, where Panel A shows the
   cluster-wise MSE and Pearson correlation, and Panel B presents the
   averaged MSE and Pearson correlation across clusters. The very low
   P-values associated with the correlation coefficients further confirm
   the statistical significance of the observed agreement between true and
   predicted gene expression profiles.

Figure 7.

   [92]Figure 7.
   [93]Open in a new tab

   Donor-specific prediction accuracy. Comparison of prediction accuracy
   across individual test donors for the initial time point (before
   surgery). Panel A shows the MSE and Pearson correlation coefficients
   between the true and predicted gene expression profiles for each
   cluster; the corresponding P-values are color-coded in the right
   figure. Panel B displays the averaged MSE and averaged Pearson
   correlation across clusters, with the corresponding P-values also
   indicated.

   We extended our analysis to two additional time points: 5 and 30 min
   post-surgery, to assess whether selecting a later time point as the
   initial point improves prediction accuracy. However, we did not explore
   time-points beyond these, as the primary focus of our work is
   predicting future time-points based on earlier stages. We compared the
   results for these initial time-points and observed that the prediction
   accuracy remained consistent across all test donors. This comparison is
   shown in [94]Fig. 8 for donor 5, while the results for the remaining
   test donors are provided in the [95]Supplementary Fig. S2. The error
   bars represent the MSE between the true and predicted values across
   different time points for each of the 15 clusters. These results
   demonstrate that prediction accuracy does not significantly vary with
   different initial time-points. Although there is a small variation at
   the immediate next time-point for some clusters, no significant
   differences are observed at later time points. This indicates that our
   model is capable of reliably predicting future gene expression values,
   even when the gene expression profile is provided as early as the
   pre-surgery time point. To evaluate prediction accuracy for the
   post-surgical initial time points (5 and 30 min), we computed the
   cluster-wise and average MSE and Pearson correlation coefficients
   between true and predicted gene expression profiles. The results, shown
   in [96]Supplementary Fig. 3, indicate strong correlations, reinforcing
   the model’s reliability in forecasting future gene expression patterns
   regardless of the initial time point.

Figure 8.

   [97]Figure 8.
   [98]Open in a new tab

   Comparison of prediction accuracy with different initial time points.
   The figure depicts error bars representing the mean squared errors
   between the true values and the predicted values for each time point
   for donor 5. The red, green, and blue bars correspond to the initial
   time-points chosen as before surgery, 5 min post-surgery, and 30 min
   post-surgery, respectively. This comparison is shown for all 15
   clusters.

Influence of mathematical model parameters and gene clusters on liver volume
recovery

   We analyzed the effects of perturbations in both mechanistic model
   parameters and input gene expression levels to better understand the
   drivers of liver volume recovery. This dual analysis aimed to identify
   the most sensitive factors—at both the gene and the mechanistic
   levels—that influence liver regeneration outcomes. The effect of
   perturbations in mechanistic model parameters can be assessed by
   analyzing their sensitivity to the model output, specifically liver
   volume recovery. Global sensitivity analysis (GSA) was performed using
   Latin hypercube sampling (LHS) and partial rank correlation coefficient
   (PRCC) to evaluate the time-dependent sensitivity of the mathematical
   model parameters [[99]48–51]. Sensitive parameters were identified
   based on the PRCC values, as indicated by the color bar in [100]Fig. 9.
   A higher positive or negative PRCC value signifies greater sensitivity
   of the parameter with respect to changes in liver volume. Asterisks
   denote statistically significant PRCC values (P < 0.05).

Figure 9.

   [101]Figure 9.
   [102]Open in a new tab

   Global sensitivity analysis. The y-axis represents the mathematical
   model parameters, and the x-axis represents time points. The color bar
   indicates the PRCC values, which measure the sensitivity of each
   parameter. Asterisks denote statistically significant P-values: * for
   P < 0.05, ** for P < .01, and *** for P < .001.

   The GSA revealed a subset of parameters that remained consistently
   sensitive across all timepoints. These include metabolic load (M), TNF
   production rate (
   [MATH: <mrow><mrow><msub><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="italic">TNF</mi></mrow></msub></mrow></mrow> :MATH]
   ), JAK degradation rate (
   [MATH: <mrow><mrow><msub><mrow><mi>κ</mi></mrow><mrow><mi
   mathvariant="italic">JAK</mi></mrow></msub></mrow></mrow> :MATH]
   ), SOCS3 induction rate (
   [MATH: <mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><mi
   mathvariant="italic">SOCS</mi><mn>3</mn></mrow></msub></mrow></mrow>
   :MATH]
   ), its half-saturation constant (
   [MATH: <mrow><mrow><msubsup><mrow><mi>k</mi></mrow><mi>M</mi><mrow><mi
   mathvariant="italic">SOCS</mi><mn>3</mn></mrow></msubsup></mrow></mrow>
   :MATH]
   ), SOCS3 degradation rate (
   [MATH: <mrow><mrow><msub><mrow><mi>κ</mi></mrow><mrow><mi
   mathvariant="italic">SOCS</mi><mn>3</mn></mrow></msub></mrow></mrow>
   :MATH]
   ), immediate early gene expression rate (
   [MATH:
   <mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><mi>I</mi><mi>E</mi></mr
   ow></msub></mrow></mrow> :MATH]
   ), its half-saturation constant (
   [MATH:
   <mrow><mrow><msubsup><mrow><mi>k</mi></mrow><mi>M</mi><mrow><mi>I</mi><
   mi>E</mi></mrow></msubsup></mrow></mrow> :MATH]
   ), IE gene degradation rate (
   [MATH:
   <mrow><mrow><msub><mrow><mi>κ</mi></mrow><mrow><mi>I</mi><mi>E</mi></mr
   ow></msub></mrow></mrow> :MATH]
   ), and the transition rate from quiescent to priming hepatocytes (
   [MATH:
   <mrow><mrow><msub><mrow><mi>k</mi></mrow><mi>Q</mi></msub></mrow></mrow
   > :MATH]
   ). Conversely, some parameters displayed strong time-dependent
   sensitivity. For example, JAK activation (
   [MATH: <mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><mi
   mathvariant="italic">JAK</mi></mrow></msub></mrow></mrow> :MATH]
   ), its half-saturation constant (
   [MATH: <mrow><mrow><msubsup><mrow><mi>k</mi></mrow><mi>M</mi><mrow><mi
   mathvariant="italic">JAK</mi></mrow></msubsup></mrow></mrow> :MATH]
   ), the concentration of monomeric STAT3 ([proSTAT3]), and its
   half-saturation constant (
   [MATH:
   <mrow><mrow><msubsup><mrow><mi>k</mi></mrow><mi>M</mi><mrow><mi>S</mi><
   mi>T</mi><mn>3</mn></mrow></msubsup></mrow></mrow> :MATH]
   ) were predominantly influential during the early priming phase. The
   sigmoid parameter for re-quiescence (
   [MATH: <mrow><mrow><msub><mrow><mi>θ</mi></mrow><mrow><mi
   mathvariant="italic">req</mi></mrow></msub></mrow></mrow> :MATH]
   ) showed increased sensitivity in the mid-proliferation phase. In the
   later stages of regeneration, parameters related to growth factor and
   extracellular matrix (ECM) dynamics became more dominant. These include
   growth factor production (
   [MATH:
   <mrow><mrow><msub><mrow><mi>k</mi></mrow><mrow><mi>G</mi><mi>F</mi></mr
   ow></msub></mrow></mrow> :MATH]
   ), its decay rate (
   [MATH:
   <mrow><mrow><msub><mrow><mi>κ</mi></mrow><mrow><mi>G</mi><mi>F</mi></mr
   ow></msub></mrow></mrow> :MATH]
   ), ECM decay (
   [MATH: <mrow><mrow><msub><mrow><mi>κ</mi></mrow><mrow><mi
   mathvariant="italic">ECM</mi></mrow></msub></mrow></mrow> :MATH]
   ), TNF-mediated ECM degradation (
   [MATH: <mrow><mrow><msub><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="italic">deg</mi></mrow></msub></mrow></mrow> :MATH]
   ), and transition rates between cellular states—replicating to
   quiescent (
   [MATH:
   <mrow><mrow><msub><mrow><mi>k</mi></mrow><mi>R</mi></msub></mrow></mrow
   > :MATH]
   ), proliferation to replicating (
   [MATH:
   <mrow><mrow><msub><mrow><mi>k</mi></mrow><mi>p</mi></msub></mrow></mrow
   > :MATH]
   ), proliferation rate (
   [MATH: <mrow><mrow><msub><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="italic">prol</mi></mrow></msub></mrow></mrow> :MATH]
   ), and re-quiescence rate (
   [MATH: <mrow><mrow><msub><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="italic">req</mi></mrow></msub></mrow></mrow> :MATH]
   ). These parameters grew increasingly sensitive during the late-middle
   to terminal phases of regeneration.

   Next, we systematically perturbed the expression levels of each gene
   cluster across a range of
   [MATH: <mo>±</mo> :MATH]
   100% to investigate how changes in gene expression within specific
   clusters influence liver volume recovery. This analysis was performed
   for four test donors (IDs 5, 6, 10, and 11), with results for donor 5
   shown in [103]Fig. 10; the remaining donors are presented in the
   [104]Supplementary Fig. 4. Perturbations were applied at three early
   timepoints (before surgery, 5 and 30 min), while liver volume was
   assessed at a representative late timepoint (3 months) during liver
   regeneration. Using donor-specific mean gene expression values and
   standard deviations, we perturbed each gene cluster independently while
   keeping all others constant. For each perturbation level, the modified
   gene expression profile was passed through the trained model to predict
   liver volume. These predicted volumes were then compared to the actual
   (from donor-specific mechanistic model) liver volume, and the impact of
   each cluster perturbation was visualized across timepoints.

Figure 10.

   [105]Figure 10.
   [106]Open in a new tab

   Effect of gene perturbation on liver volume recovery. For the test
   donor 5, the expression levels of individual gene clusters were
   systematically perturbed from −100% to +100% at three early time
   points: Panel A—Before surgery, Panel B—5 min post-surgery, and Panel
   C—30 min post-surgery. The corresponding liver volume was predicted at
   a representative late time point (3 months). The x-axis indicates the
   percentage change in gene expression, and the y-axis shows the
   predicted liver volume fraction. Each colored line with dots represents
   a different gene cluster, illustrating how specific clusters influence
   liver volume recovery.

   The predicted liver volumes for unperturbed gene expression profiles
   (green dashed lines in [107]Fig. 10) closely matched the actual
   observed volumes (red dashed lines), demonstrating the accuracy of our
   PePMDT model and providing a robust baseline for assessing the effects
   of gene perturbations. When individual gene clusters were perturbed,
   their impact on predicted liver volume varied across timepoints,
   highlighting dynamic and time-specific sensitivities in the
   regenerative process. Although the overall liver volume recovery did
   not exhibit dramatic shifts, specific gene clusters influenced the
   model’s predictions more strongly at certain stages. For instance,
   clusters 5 and 6 showed notable effects before surgery, while clusters
   8 and 14 had greater influence at 5 and 30 min post-surgery. This
   evidence supports the hypothesis that different clusters play
   temporally distinct roles in liver regeneration. Furthermore, the
   pattern of these effects differed between donors, underscoring the
   personalized nature of the model and its capacity to capture individual
   variability in regenerative dynamics.

   Perturbation of individual gene clusters revealed distinct,
   time-dependent sensitivities in liver volume predictions, with certain
   clusters exerting stronger effects at specific post-surgical phases.
   Insights from GSA further support a temporal shift in dominant
   regulatory mechanisms—from immune signaling and early stress responses
   during the initial stages to growth factor signaling and cell cycle
   regulation in later phases. This evolving sensitivity landscape
   highlights the critical role of temporal context in understanding
   regenerative control. Together, these analyses identify both
   consistently and phase-specifically sensitive regulatory modules,
   offering promising candidates for further mechanistic investigation and
   potential clinical intervention strategies to support and modulate
   liver regeneration. These findings underscore the utility of the
   digital twin framework in capturing the dynamic and donor-specific
   regulatory landscape underlying liver regeneration.

Discussion

   Digital twin models have gained traction in healthcare, offering
   donor-specific simulations of disease progression and treatment
   responses [[108]31]. Notable applications include heart failure
   management [[109]52], personalized cancer treatment [[110]53–57], organ
   transplantation [[111]58], and liver regeneration [[112]59, [113]60].
   However, many existing models focus on static representations of
   disease progression, limiting their ability to capture the dynamic
   nature of biological processes. A notable exception is Ref. [[114]61],
   which developed a framework for modeling personalized bladder cancer
   progression and optimizing adaptive multi-drug treatment strategies
   using deep reinforcement learning. However, no similar studies have
   been found for liver regeneration.

   In this study, we developed PePMDT, a digital twin model that
   integrates deep learning with mathematical modeling to predict
   donor-specific gene expression dynamics during liver regeneration. Our
   approach successfully reconstructed future postoperative gene
   expression profiles using very early postoperative data, including
   pre-resection data and data from 5 to 30 min after resection,
   demonstrating high predictive accuracy. PePMDT offers a dynamic,
   personalized framework that seamlessly integrates gene expression data
   with mechanistic simulations to predict long-term liver function
   recovery.

   The novelty of our study lies in the development of a hybrid modeling
   framework that uniquely integrates mechanistic mathematical modeling
   with deep learning-based autoencoders to construct a biologically
   interpretable PePMDT for liver regeneration. While autoencoders have
   been utilized in prior time-series prediction tasks, our framework
   embeds a mechanistically constrained latent space—comprising
   physiologically meaningful variables defined by a differential equation
   model of liver regeneration—between the encoder and decoder. This
   architecture facilitates biologically grounded, bidirectional
   translation between longitudinal gene expression data and mechanistic
   model dynamics. Unlike conventional deep learning methods that operate
   as black boxes, our model offers transparent, mechanism-informed
   predictions of donor-specific liver regeneration. To the best of our
   knowledge, this is the first instance where such an integrated digital
   twin, trained on real longitudinal transcriptomic data from LDLT
   donors, has been applied to model liver recovery. By bridging
   molecular-scale fluctuations with organ-level regenerative responses,
   PePMDT sets a new direction in interpretable and personalized
   predictive modeling for regenerative medicine.

   We evaluated its performance on donor-specific data to assess the
   predictive accuracy of PePMDT. Each donor exhibited unique gene
   expression patterns, underscoring the personalized nature of liver
   regeneration. Our model accurately predicted gene expression profiles
   across all donors, while also capturing substantial variability in
   recovery trajectories. Quantitatively, we assessed prediction accuracy
   using mean squared error and Pearson correlation coefficients across
   gene clusters, with the lowest average correlation across all donors
   being 0.83, indicating strong alignment between observed and predicted
   values. Despite perturbing individual gene clusters by up to
   [MATH: <mo>±</mo> :MATH]
   100%, the predicted liver volume fractions exhibited relatively modest
   changes. This indicates that the neural network model has not
   overfitted to specific gene expression patterns but has instead learned
   a generalized and robust mapping from gene clusters to liver volume
   dynamics. The model’s stability in the face of extreme perturbations
   highlights its ability to capture underlying biological principles
   rather than spurious correlations, further supporting its potential for
   predictive and mechanistic insights in diverse donor contexts.

   Beyond its predictive capability, PePMDT enables what-if simulations,
   allowing the exploration of liver recovery under different clinical
   scenarios, such as varying degrees of liver damage or resections. This
   predictive power provides a valuable tool for understanding liver
   regeneration at a molecular level and optimizing treatment strategies.
   By constructing a computational model informed by real donor gene
   expression data, PePMDT facilitates personalized medicine applications,
   allowing clinicians to anticipate complications and tailor
   interventions based on a donor’s molecular profile. Moreover,
   simulations of gene expression dynamics over time validate the model’s
   utility in predicting individual recovery trajectories. By accurately
   forecasting future gene expression states based on early molecular
   patterns, PePMDT bridges high-throughput genomic data with
   physiologically relevant mathematical modeling, offering a scalable and
   robust framework for precision medicine in liver regeneration.

Supplementary Material

   bpaf037_Supplementary_Data
   [115]bpaf037_supplementary_data.zip^ (28.1MB, zip)

Contributor Information

   Suvankar Halder, Laboratory of Biological Modeling, National Institutes
   of Diabetes and Digestive and Kidney Diseases, National Institutes of
   Health, Bethesda, MD, 20892, United States.

   Michael C Lawrence, Baylor Scott & White Research Institute, Dallas,
   TX, 75204, United States.

   Giuliano Testa, Annette C. and Harold C. Simmons Transplant Institute,
   Baylor University Medical Center, Dallas, TX, 76104, United States.

   Vipul Periwal, Laboratory of Biological Modeling, National Institutes
   of Diabetes and Digestive and Kidney Diseases, National Institutes of
   Health, Bethesda, MD, 20892, United States.

Author contributions

   Suvankar Halder (Formal analysis [lead], Writing—original draft
   [lead]), Michael C. Lawrence (Data curation [lead]), and Giuliano Testa
   (Data curation [lead]), Vipul Periwal (Conceptualization [lead],
   Supervision [lead], Writing—review & editing [lead])

Supplementary data

   [116]Supplementary data is available at Biology Methods and Protocols
   online.

Conflict of interest

   The authors declare no competing interests.

Funding

   This research was supported by the Intramural Research Program of the
   NIH, the National Institute of Diabetes and Digestive and Kidney
   Diseases (NIDDK, Project Z01-DK075059).

Data availability

   All code and data required to replicate the analysis presented in this
   study are available at [117]https://github.com/nihcompmed/PePMDT.

References