Abstract The remarkable capacity of the liver to regenerate its lost mass after resection makes living donor liver transplantation a successful treatment option. However, donor heterogeneity significantly influences recovery trajectories, highlighting the need for individualized monitoring. With the rising incidence of liver diseases, safer transplant procedures and improved donor care are urgently needed. Current clinical markers provide only limited snapshots of recovery, making it challenging to predict long-term outcomes. Following partial hepatectomy, precise liver mass recovery requires tightly regulated hepatocyte proliferation. We identified distinct gene expression patterns associated with liver regeneration by analyzing blood-derived gene expression measurements from twelve donors followed over a year. Using a deep learning-based framework, we integrated these patterns with a mathematical model of hepatocyte transitions to develop a personalized, progressive mechanistic digital twin—a virtual liver model that predicts donor-specific recovery trajectories. Central to our approach is a mechanistically identifiable latent space, defined by variables derived from a physiologically grounded differential equation model of liver regeneration, which enables biologically interpretable, bidirectional mapping between gene expression data and model dynamics. This approach integrates clinical genomics and computational modeling to enhance post-surgical care, ensuring safer transplants and improved donor recovery. Keywords: liver regeneration, partial hepatectomy, mathematical modeling, deep learning, digital twin, living donor liver transplant (LDLT) Introduction The liver has the ability to regenerate and restore its functional mass following injury, disease, or surgical removal, such as partial hepatectomy. This regenerative capacity is vital for maintaining the body’s homeostasis, given its central role in numerous physiological processes, including the detoxification of harmful substances, the metabolism of nutrients and drugs, and the synthesis of essential proteins such as albumin and clotting factors [[30]1–4]. The proliferation of mature hepatocytes, the functional cells of the liver, primarily drives liver regeneration. These hepatocytes, generally in a quiescent state (the G0 phase of the cell cycle), can re-enter the cell cycle and proliferate in response to injury or a loss of liver mass. This proliferation is a meticulously regulated process, ensuring that the liver regenerates to its original size without excessive or insufficient growth [[31]5–7]. Apart from the hepatocytes, supporting endothelial cells, hepatic stellate cells, and Kupffer cells play critical roles in liver regeneration. Endothelial cells facilitate the restoration of the liver vascular network, ensuring adequate blood flow and nutrient delivery during the regenerative process. Hepatic stellate cells contribute by producing extracellular matrix (ECM) components and growth factors, while Kupffer cells, the liver’s resident macrophages, secrete cytokines and chemokines that prime hepatocytes for regeneration [[32]8–14]. This collective and coordinated interplay among hepatocytes and their supporting cells enables the capacity of the liver to regenerate while maintaining its essential functions. This capability of the liver to continue its normal functioning even during the regeneration process permits living donor liver transplantation (LDLT) [[33]15]. Liver regeneration is influenced by multiple factors, including the host’s metabolic environment, graft type, and hepatectomy volume. Studies have shown that left lobe grafts regenerate faster than right lobes, with recipients exhibiting higher regeneration rates than donors [[34]16]. Additionally, minor resections primarily rely on hypertrophy, whereas extensive resections involve both hypertrophy and hepatocyte proliferation, often leading to increased cell ploidy [[35]16–18]. These complexities make it challenging to predict individual recovery trajectories following LDLT. Inter-donor variability significantly impacts recovery outcomes in LDLT, with factors such as genetic makeup, physiological status, pre-existing liver conditions, and immune responses playing crucial roles. For example, high intra-donor variability in tacrolimus levels has been associated with adverse outcomes, including graft rejection and the development of donor-specific antibodies [[36]19]. Furthermore, the specialized nature of LDLT limits data availability, as it represents only a tiny fraction of total liver transplants. A study analyzing 73 681 adult recipients found that only 4% underwent LDLT, with substantial variation across transplant centers [[37]20]. This scarcity of cases, combined with limited longitudinal follow-up data, poses significant challenges in developing clinically relevant predictive models for liver regeneration. The process of liver regeneration is orchestrated by dynamic transitions of hepatocytes through quiescent, primed, and replicative states, regulated by intricate signaling pathways [[38]21–23]. Understanding molecular drivers, including gene activation and suppression, is critical for modeling and predicting these regenerative dynamics [[39]24–27]. Building on a foundation of experimental observations, we developed the first mathematical model to predict liver volume after partial hepatectomy (PHx) [[40]28, [41]29], offering a quantitative framework to predict regenerative trajectories. To make such predictive models clinically relevant, we must integrate real-time donor-specific data with the dynamical framework provided by our dynamical model [[42]30, [43]31]. This is essential for capturing the complex interplay of molecular, cellular, and systemic factors that drive liver regeneration [[44]1, [45]22, [46]32]. However, challenges such as significant inter-donor variability, the nonlinear nature of liver recovery, and the integration of heterogeneous datasets complicate the development of personalized predictive models [[47]31]. To address these complexities, this study introduces the personalized progressive mechanistic digital twin (PePMDT), a donor-specific digital twin for the livers of LDLT donors. By integrating donor-specific clinical and molecular data with deep learning techniques, PePMDT enables precise simulation of regeneration dynamics. Similar deep learning ideas have also been applied in systems biology and multi-omics integration to uncover functional relationships between gene regulatory networks and physiological outcomes [[48]33–36]. The digital twin framework begins with collecting blood samples from a specific LDLT donor after partial hepatectomy. Whole-transcriptomic analysis is then performed, and the identified genes are clustered into various functional groups. These grouped genes are subsequently mapped onto our mechanistic mathematical model that describes liver regeneration processes. A key novelty of our approach lies in the construction of a mechanistically identifiable latent space between the encoder and decoder networks, composed of variables derived from physiological model of liver regeneration [[49]28, [50]29]. This biologically interpretable latent space allows bidirectional mapping between gene expression profiles and mechanistic variables, making our digital twin framework fundamentally different from black-box deep learning approaches. The model predicts dynamic variables at future time points, which are then used to forecast future gene expression patterns, enabling a data-driven approach to personalized liver recovery assessment, as depicted in [51]Fig. 1. This personalized framework has the potential to optimize post-transplant care, enhance recovery outcomes, and advance the field of regenerative medicine. Figure 1. [52]Figure 1. [53]Open in a new tab Framework of the digital twin. The figure depicts the general framework of the digital twin for liver regeneration. Blood samples are collected from a specific LDLT donor after partial hepatectomy, followed by whole transcriptomic analysis. Identified genes are clustered into functional groups and mapped onto a mechanistic mathematical model describing liver regeneration. The model predicts dynamic variables at future time points, which are then used to forecast future gene expression patterns. Methods Streamlining gene expression profiles The dataset consists of whole-transcriptome RNA sequencing data from 12 healthy LDLT donors, collected at 14 predefined time points over a 1-year follow-up period. All LDLT donors underwent partial hepatectomy, and blood samples were collected at specific intervals post-surgery, including 5 min, 30 min, 1 h, 2 h, 3 h, 4 h, 1 day, 2 days, 3 days, 4 days, 10 days, 3 months, 6 months, and 12 months. The aim was to explore the dynamic transcriptional changes associated with liver regeneration post-PHx. This comprehensive dataset provided a unique opportunity to capture the liver’s gene expression landscape during recovery, from the immediate post-operative phase to the long-term restoration of liver function. [54]Figure 2 presents a visual representation of the data structure, providing context for how the data were organized and utilized in this analysis. To ensure accurate downstream analysis and model predictions, the RNA-seq data underwent several refinement steps aimed at standardizing gene expression values across different donors and time points. These steps were essential to handle potential technical and biological noise inherent in high-throughput sequencing data. Figure 2. [55]Figure 2. [56]Open in a new tab Data structure. The dataset consists of samples obtained from 12 healthy LDLT donors, collected at 14 distinct time points over the course of 1 year following partial hepatectomy. Filled color boxes indicate the presence of data, while blank spaces represent missing data. Donors are arranged in descending order based on the number of available time points. The number of available data points out of the 14 total time points is indicated on the right side for each donor. The top two-thirds of the dataset is used for training, while the remaining one-third serves as the testing dataset. The lower portion of the dataset was chosen for testing because these samples lack gene expression values in the later stages, providing an opportunity for predictive modeling. Missing value imputation Due to the nature of RNA-seq data, some gene expression values were missing or below detection limits. To address this, we filled the missing values by averaging the corresponding expression values from other donors. This approach ensured that the imputed values were biologically plausible and aligned with the collective data structure. By using donor-averaged imputation, we preserved the continuity of expression profiles across time points and minimized biases in downstream analyses, ensuring robust and reliable integration of the data into our model. Correction of negative expression values In some instances, negative expression values were observed due to noise in low-expressed genes. These negative values were biologically unrealistic and could interfere with further analysis. To address this issue, we replaced all negative values with the average of the corresponding expression values from other donors. This replacement aims to ensure that no low or negative value artifacts skew the results, particularly in downstream clustering or modeling. Log-transformation Gene expression values were log-transformed to reduce the skewness of the data and to stabilize the variance across time points and donors. Log-transformation compresses the range of gene expression values, potentially revealing more patterns in genes with both high and low expression. This transformation is especially useful in dynamic biological processes like liver regeneration, where gene expression changes can span several orders of magnitude over time. The transformation used was: [MATH: Log transformed< mi> data=log2(1+data) :MATH] This pipeline made the RNA-seq data ready for subsequent analyses, including clustering, temporal trend analysis, and mapping to the mathematical model variables. Gene co-expression analysis: Identification of temporal gene modules The dataset contains 16 735 RNA gene transcripts in the blood, each expressed at least once during post-surgical recovery. Among these, 6649 genes exhibited a differential expression of 2-fold or more across at least two consecutive time points compared to the preoperative baseline following liver resection. Additionally, 1110 genes showed expression changes of 1.5-fold or more at pre-resection time points, likely due to nonspecific surgical effects from the abdominal incision. These genes were excluded to focus on those potentially specific to liver regeneration. This left 5876 genes that were identified as differentially expressed genes (DEGs) and specific to liver regeneration. Weighted gene co-expression network analysis (WGCNA) [[57]37] was employed on the DEGs to identify groups of genes that exhibited distinct temporal patterns during liver regeneration. These groups were further refined using self-organizing maps (SOM), which organized the genes into clusters based on their temporal expression profiles. SOM is particularly useful for visualizing time-series data, as it provides a low-dimensional representation of complex patterns, helping to reveal temporal trends in gene expression [[58]38, [59]39]. These methods collectively identify clusters of highly correlated genes, which are believed to be involved in shared biological processes. Each cluster is assigned a unique name and represents a set of genes whose expression patterns are highly similar across donors and time points. This method allows for the identification of clusters without prior knowledge of the biological pathways involved, making it an unbiased tool for analyzing dynamic gene expression data [[60]37, [61]40]. Mathematical and molecular framework of liver regeneration Liver regeneration following partial hepatectomy involves a series of biological processes, including hepatocyte proliferation, ECM degradation, and tissue remodeling. These processes are regulated by dynamic interactions between genes and the tissue environment, which can be modeled mathematically. Our previously developed mathematical model [[62]28, [63]29] simulates liver regeneration by incorporating key processes such as liver volume recovery, hepatocyte proliferation rates, ECM turnover, and tissue remodeling. The liver regeneration process is characterized by a coordinated transition of hepatocytes through distinct states—quiescent (Q), primed (P), and replicating (R). These transitions are illustrated in [64]Fig. 3A. While the total cell count consists of Q, P, and R cells, the overall liver volume increases due to the hypertrophy of primed and replicating cells [[65]18]. The reduction in hepatocyte number following resection increases the metabolic load per hepatocyte (M/N), triggering the expression of growth factors (GF) and tumor necrosis factor (TNF). TNF promotes ECM degradation via MMPs and induces IL-6 expression, which activates the JAK-STAT3 signaling pathway. STAT3, in turn, drives the transcription of SOCS3 and immediate early (IE) genes, shifting hepatocytes into a primed state. In this state, hepatocytes remain metabolically active but do not yet proliferate. Cytokines such as IL-6 and TNF- [MATH: α :MATH] , along with growth factors like HGF and EGF, further reinforce this priming by upregulating transcriptional programs involving immediate early genes, such as c-Jun and c-Fos. This prepares hepatocytes to respond to mitogenic signals and transition into the replicating state, where DNA synthesis and cell division occur to restore liver mass. As regeneration progresses, TNF levels decrease, ECM reformation occurs, and GF are inactivated, for example by uptake into the ECM, ultimately halting the primed-to-replicating transition and promoting the shift from replication back to quiescence. The molecular interactions underlying this process are depicted in [66]Fig. 3B. A detailed mathematical framework is given in [[67]28, [68]29]. Figure 3. [69]Figure 3. [70]Open in a new tab Schematic overview of liver regeneration. (A) The cell cycle dynamics during liver regeneration, highlighting changes in cell numbers. (B) The biochemical pathways involved in liver regeneration, illustrating key molecular interactions driving hepatocyte proliferation and tissue remodeling. Constructing the digital twin Integration of gene expression data with mechanistic modeling represents a potentially transformative approach to predicting personalized liver recovery. In this study, we constructed a digital twin, PePMDT, through a two-step mapping process combined with mathematical model simulations (see [71]Fig. 4). In the first step, we map gene expression data onto key physiological parameters of the mathematical model. This transformation enables us to represent biological states in terms of mathematically defined variables governing liver regeneration dynamics. Once the gene expression data has been mapped, we use the mathematical model to predict liver regeneration dynamics at future time points. These simulations predict the evolution of key physiological parameters, capturing hepatocyte proliferation, ECM turnover, and tissue remodeling over time. In the final step, we reverse-map the simulated mathematical model variables back to gene expression values at future time points. This process gives predicted gene expression profiles, allowing us to estimate the molecular responses of LDLT donors over time based on their initial gene expression states. These forward and reverse mappings are described in the following subsections. Figure 4. [72]Figure 4. [73]Open in a new tab Schematic representation of the digital twin framework. The process involves a two-step mapping approach combined with mathematical model simulations. First, gene expression data are mapped onto key physiological parameters of the mathematical model. The model then simulates liver regeneration dynamics over future time points. Finally, the simulated mathematical values are reverse-mapped to reconstruct gene expression values, enabling the prediction of molecular responses in LDLT donors over time. Mapping gene expression profiles to mathematical model variables The mathematical model [[74]28, [75]29] describes hepatocyte state transitions in response to molecular signals, providing insights into liver recovery and potential therapeutic targets. However, they lack direct links to gene regulation. To bridge this gap, we used RNA-seq data from LDLT donors to map gene expression onto dynamic model variables. Using a feedforward neural network (FNN), we mapped gene expression data into key physiological parameters, enabling a gene-driven understanding of liver regeneration. The FNN consists of three fully connected layers, each followed by ReLU activation functions to introduce nonlinearity, potentially allowing the model to capture complex relationships between gene expression patterns and liver regeneration states. Dropout layers (0.2 probability) are incorporated to prevent overfitting, aiming to ensure robust generalization across different gene expression profiles. The input layer receives gene expression data clustered into biologically relevant groups (15 clusters), which is then transformed through progressively wider layers (64 and 128 neurons) to extract meaningful patterns. The final layer outputs a lower-dimensional representation consisting of mathematical model variables corresponding to the measured gene expression levels computed based on donor-specific factors, such as the extent of liver resection, which plays a crucial role in the mathematical modeling framework. This parameter significantly influences the simulation of liver regeneration dynamics. The FNN is trained using a supervised learning approach, where the loss function (mean squared error, MSE) measures the difference between predicted and actual physiological states. FNN parameters are optimized using backpropagation and the Adam optimizer [[76]41]. Reverse mapping: Predicting gene expression from model variables After mapping gene expression data to mathematical model variables, the model simulates liver regeneration dynamics over future time points. A reverse mapping step is required to translate these simulated variables back into gene expression values, predicting future molecular responses. This transformation is performed using an FNN as well. The architecture mirrors the FNN used for the forward mapping, consisting of three fully connected layers with ReLU activation functions to introduce nonlinearity and dropout layers (0.2 probability) to prevent overfitting. The input layer receives simulated model variables, processes them through progressively wider layers (64 and 128 neurons), and outputs gene expression values corresponding to biologically relevant clusters. The model is optimized using the Adam optimizer, with MSE as the loss function to minimize the discrepancy between predicted and actual gene expression values. Results The genetic blueprint of liver regeneration: Clustering analysis reveals temporal gene modules in liver regeneration The RNA-seq data from 12 healthy LDLT donors, collected over a 1-year follow-up period, were obtained from Lawrence et al. [[77]42]. The donors were divided into two sets: two-thirds were used as training donors, and one-third were used as test donors. These data underwent several refinement steps to standardize them, as detailed in the Methods section. To identify temporally regulated gene modules representing distinct biological processes across different phases of liver regeneration, we applied a combination of WGCNA and SOM [[78]37–40, [79]43–46]. This approach enabled the identification of gene clusters exhibiting dynamic regulation, capturing key patterns of gene expression over time. These identified modules form the foundation for understanding the molecular mechanisms underlying liver regeneration. Moreover, this clustering approach revealed gene networks with synchronized behavior, providing insights into the transcriptional programs that drive recovery after partial hepatectomy. After applying WGCNA and SOM (for details see Methods section), the genes were clustered into 15 distinct groups (see [80]Fig. 5)—a higher number than the eight clusters used by Lawrence et al. [[81]42]. This increased number of clusters allowed for a more precise differentiation of DEGs based not only on their temporal expression patterns but also on their expression values, which proved beneficial for subsequent analyses. To assess the biological significance of the clusters, we compared our results with those of Lawrence et al. [[82]42]. Consistent with their findings, we also identified that the clusters can be grouped into three prominent functional temporal gene modules corresponding to the main phases of liver regeneration: early response (Clusters 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, and 15), proliferation (Clusters 3 and 6), and long-term recovery (Clusters 1 and 2). Each module comprises genes with specific functions that are activated during distinct stages of the recovery process of the liver. SOM generates results based on random seeds, meaning that the outcomes may vary slightly for individual clusters with each run. However, this does not affect the overall functional modules of regeneration. The key processes associated with each module were determined using functional enrichment analysis using the Enrichr database [[83]47] and are outlined to highlight their relevance to the overall biological context. The detailed figure of the gene ontology (GO) and pathway enrichment analysis is provided in the [84]Supplementary Fig. 1. All gene clusters and their corresponding top driver genes are provided in the [85]Supplementary Materials. Figure 5. [86]Figure 5. [87]Open in a new tab Temporal gene modules. A visual representation of the temporal gene expression patterns across 15 distinct clusters, which were subsequently grouped into three modules, during the liver resection and regeneration process. The bold lines represent the mean log2 fold change relative to preoperative values, and the shaded regions indicate the standard deviations. * Early-response module (Clusters 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, and 15): This module consists of genes that peak within the first 24 h post-PHx. These genes are primarily involved in the inflammatory response and early ECM degradation and remodeling. This triggers immune cell recruitment and matrix breakdown, which are crucial for initiating the regenerative process. Key genes in this module include SLC39A1, which is involved in zinc transport and immune regulation; IFNGR2, critical for immune response and inflammation; CMTM4, which modulates cytokine signaling; SEMA6C, guiding immune cell migration and tissue repair; IGKV1-33, playing a role in immune response through immunoglobulin synthesis; LGMN, a protease responsible for ECM degradation; CCR6, mediating immune cell recruitment to injury sites; FES, a tyrosine kinase participating in ECM remodeling; ADM, a cytokine involved in vasodilation and inflammation; and WNT2B, part of the Wnt signaling pathway that promotes cell proliferation and tissue regeneration. The temporal peak of this module aligns with the critical early stages of liver regeneration, where tissue damage needs to be repaired to create a favorable environment for cell proliferation. * Proliferation module (Clusters 3 and 6): Active between 1 and 10 days post-surgery, this module is enriched for genes related to cell cycle progression, hepatocyte proliferation, and DNA replication. During this phase, the liver undergoes rapid growth as hepatocytes proliferate to replace the lost tissue. This process is tightly regulated by growth factors and cell cycle regulators, ensuring controlled proliferation. Genes in this module include cyclins, cyclin-dependent kinases (CDKs), and growth factor receptors. For example, Cluster 3 contains TFDP1, a key transcription factor in cell cycle regulation, along with COLGALT1, HOMER3, ITGAD, and ANG, which are associated with chromatin remodeling, cell signaling, and angiogenesis. Cluster 6 includes immune-regulatory and proliferation-associated genes such as IFIT1, CXCL8, CXCL6, IFIT3, and AHRR, which are known to be involved in inflammation-driven proliferation and tissue remodeling. The peak of this module corresponds to the proliferative burst that occurs during liver regeneration, with hepatocyte replication reaching its highest levels between 3 and 10 days. * Long-term recovery module (Clusters 1 and 2): This module is characterized by genes that show sustained upregulation beyond 3 months post-PHx. These genes are associated with ECM organization, tissue remodeling, and the restoration of normal liver function. As the liver nears the completion of regeneration, it enters a phase of tissue stabilization, where the newly formed ECM becomes organized, and liver architecture is restored to support long-term functional recovery. Key genes in this module include PTGS1, TSTA3, MAP1A, and CTSE, which are involved in extracellular matrix remodeling. Genes such as ESAM, PTGFRN, and CORIN are associated with angiogenesis and vascular functions, facilitating proper blood vessel formation and remodeling. DAP and SIAH2 contribute to tissue remodeling and maintaining homeostasis. Additionally, FAXDC2, ACCSL, and possibly GPR146 play roles in liver function and lipid metabolism, supporting the restoration of metabolic activity during the late phase of liver regeneration. These three modules capture the temporal dynamics of liver regeneration and also reflect distinct biological processes that are activated sequentially. PepMDT predicts donor-specific future gene expression patterns related to liver mass recovery with precision Comparisons of predicted gene expression from PePMDT with actual gene expression profiles across multiple time points and donors demonstrated a strong alignment between the two datasets. In this analysis, the initial gene expression profiles of test donors (before surgery values) were input into the PePMDT model, which then predicted gene expression values for each cluster at subsequent time points. [88]Figure 6 illustrates the results for all test donors: donors 5, 6, 10, and 11. Figure 6. [89]Figure 6. [90]Open in a new tab Comparison of original and predicted gene expression profiles. The plot illustrates the temporal gene expression patterns for individual test donors—donor 5 (A), donor 6 (B), donor 10 (C), and donor 11 (D). Bold lines represent the mean log2 fold change relative to preoperative values, with shaded regions indicating standard deviations. Red lines denote the predicted values, while blue lines represent the original (true) gene expression values. Open circles indicate time points where data are missing in the original dataset. The cluster-wise MSE and Pearson correlation coefficient between the true and predicted gene expression profiles were computed to evaluate prediction accuracy. For the majority of clusters, the MSE was found to be less than 0.3, with an average MSE across all test donors below 0.15. The Pearson correlation coefficient consistently exceeded 0.7, with the lowest average for a donor being 0.872, indicating a strong correlation between the predicted and true gene expression profiles. These results are depicted in [91]Fig. 7, where Panel A shows the cluster-wise MSE and Pearson correlation, and Panel B presents the averaged MSE and Pearson correlation across clusters. The very low P-values associated with the correlation coefficients further confirm the statistical significance of the observed agreement between true and predicted gene expression profiles. Figure 7. [92]Figure 7. [93]Open in a new tab Donor-specific prediction accuracy. Comparison of prediction accuracy across individual test donors for the initial time point (before surgery). Panel A shows the MSE and Pearson correlation coefficients between the true and predicted gene expression profiles for each cluster; the corresponding P-values are color-coded in the right figure. Panel B displays the averaged MSE and averaged Pearson correlation across clusters, with the corresponding P-values also indicated. We extended our analysis to two additional time points: 5 and 30 min post-surgery, to assess whether selecting a later time point as the initial point improves prediction accuracy. However, we did not explore time-points beyond these, as the primary focus of our work is predicting future time-points based on earlier stages. We compared the results for these initial time-points and observed that the prediction accuracy remained consistent across all test donors. This comparison is shown in [94]Fig. 8 for donor 5, while the results for the remaining test donors are provided in the [95]Supplementary Fig. S2. The error bars represent the MSE between the true and predicted values across different time points for each of the 15 clusters. These results demonstrate that prediction accuracy does not significantly vary with different initial time-points. Although there is a small variation at the immediate next time-point for some clusters, no significant differences are observed at later time points. This indicates that our model is capable of reliably predicting future gene expression values, even when the gene expression profile is provided as early as the pre-surgery time point. To evaluate prediction accuracy for the post-surgical initial time points (5 and 30 min), we computed the cluster-wise and average MSE and Pearson correlation coefficients between true and predicted gene expression profiles. The results, shown in [96]Supplementary Fig. 3, indicate strong correlations, reinforcing the model’s reliability in forecasting future gene expression patterns regardless of the initial time point. Figure 8. [97]Figure 8. [98]Open in a new tab Comparison of prediction accuracy with different initial time points. The figure depicts error bars representing the mean squared errors between the true values and the predicted values for each time point for donor 5. The red, green, and blue bars correspond to the initial time-points chosen as before surgery, 5 min post-surgery, and 30 min post-surgery, respectively. This comparison is shown for all 15 clusters. Influence of mathematical model parameters and gene clusters on liver volume recovery We analyzed the effects of perturbations in both mechanistic model parameters and input gene expression levels to better understand the drivers of liver volume recovery. This dual analysis aimed to identify the most sensitive factors—at both the gene and the mechanistic levels—that influence liver regeneration outcomes. The effect of perturbations in mechanistic model parameters can be assessed by analyzing their sensitivity to the model output, specifically liver volume recovery. Global sensitivity analysis (GSA) was performed using Latin hypercube sampling (LHS) and partial rank correlation coefficient (PRCC) to evaluate the time-dependent sensitivity of the mathematical model parameters [[99]48–51]. Sensitive parameters were identified based on the PRCC values, as indicated by the color bar in [100]Fig. 9. A higher positive or negative PRCC value signifies greater sensitivity of the parameter with respect to changes in liver volume. Asterisks denote statistically significant PRCC values (P < 0.05). Figure 9. [101]Figure 9. [102]Open in a new tab Global sensitivity analysis. The y-axis represents the mathematical model parameters, and the x-axis represents time points. The color bar indicates the PRCC values, which measure the sensitivity of each parameter. Asterisks denote statistically significant P-values: * for P < 0.05, ** for P < .01, and *** for P < .001. The GSA revealed a subset of parameters that remained consistently sensitive across all timepoints. These include metabolic load (M), TNF production rate ( [MATH: kTNF :MATH] ), JAK degradation rate ( [MATH: κJAK :MATH] ), SOCS3 induction rate ( [MATH: VSOCS3 :MATH] ), its half-saturation constant ( [MATH: kMSOCS3 :MATH] ), SOCS3 degradation rate ( [MATH: κSOCS3 :MATH] ), immediate early gene expression rate ( [MATH: VIE :MATH] ), its half-saturation constant ( [MATH: kMI< mi>E :MATH] ), IE gene degradation rate ( [MATH: κIE :MATH] ), and the transition rate from quiescent to priming hepatocytes ( [MATH: kQ :MATH] ). Conversely, some parameters displayed strong time-dependent sensitivity. For example, JAK activation ( [MATH: VJAK :MATH] ), its half-saturation constant ( [MATH: kMJAK :MATH] ), the concentration of monomeric STAT3 ([proSTAT3]), and its half-saturation constant ( [MATH: kMS< mi>T3 :MATH] ) were predominantly influential during the early priming phase. The sigmoid parameter for re-quiescence ( [MATH: θreq :MATH] ) showed increased sensitivity in the mid-proliferation phase. In the later stages of regeneration, parameters related to growth factor and extracellular matrix (ECM) dynamics became more dominant. These include growth factor production ( [MATH: kGF :MATH] ), its decay rate ( [MATH: κGF :MATH] ), ECM decay ( [MATH: κECM :MATH] ), TNF-mediated ECM degradation ( [MATH: kdeg :MATH] ), and transition rates between cellular states—replicating to quiescent ( [MATH: kR :MATH] ), proliferation to replicating ( [MATH: kp :MATH] ), proliferation rate ( [MATH: kprol :MATH] ), and re-quiescence rate ( [MATH: kreq :MATH] ). These parameters grew increasingly sensitive during the late-middle to terminal phases of regeneration. Next, we systematically perturbed the expression levels of each gene cluster across a range of [MATH: ± :MATH] 100% to investigate how changes in gene expression within specific clusters influence liver volume recovery. This analysis was performed for four test donors (IDs 5, 6, 10, and 11), with results for donor 5 shown in [103]Fig. 10; the remaining donors are presented in the [104]Supplementary Fig. 4. Perturbations were applied at three early timepoints (before surgery, 5 and 30 min), while liver volume was assessed at a representative late timepoint (3 months) during liver regeneration. Using donor-specific mean gene expression values and standard deviations, we perturbed each gene cluster independently while keeping all others constant. For each perturbation level, the modified gene expression profile was passed through the trained model to predict liver volume. These predicted volumes were then compared to the actual (from donor-specific mechanistic model) liver volume, and the impact of each cluster perturbation was visualized across timepoints. Figure 10. [105]Figure 10. [106]Open in a new tab Effect of gene perturbation on liver volume recovery. For the test donor 5, the expression levels of individual gene clusters were systematically perturbed from −100% to +100% at three early time points: Panel A—Before surgery, Panel B—5 min post-surgery, and Panel C—30 min post-surgery. The corresponding liver volume was predicted at a representative late time point (3 months). The x-axis indicates the percentage change in gene expression, and the y-axis shows the predicted liver volume fraction. Each colored line with dots represents a different gene cluster, illustrating how specific clusters influence liver volume recovery. The predicted liver volumes for unperturbed gene expression profiles (green dashed lines in [107]Fig. 10) closely matched the actual observed volumes (red dashed lines), demonstrating the accuracy of our PePMDT model and providing a robust baseline for assessing the effects of gene perturbations. When individual gene clusters were perturbed, their impact on predicted liver volume varied across timepoints, highlighting dynamic and time-specific sensitivities in the regenerative process. Although the overall liver volume recovery did not exhibit dramatic shifts, specific gene clusters influenced the model’s predictions more strongly at certain stages. For instance, clusters 5 and 6 showed notable effects before surgery, while clusters 8 and 14 had greater influence at 5 and 30 min post-surgery. This evidence supports the hypothesis that different clusters play temporally distinct roles in liver regeneration. Furthermore, the pattern of these effects differed between donors, underscoring the personalized nature of the model and its capacity to capture individual variability in regenerative dynamics. Perturbation of individual gene clusters revealed distinct, time-dependent sensitivities in liver volume predictions, with certain clusters exerting stronger effects at specific post-surgical phases. Insights from GSA further support a temporal shift in dominant regulatory mechanisms—from immune signaling and early stress responses during the initial stages to growth factor signaling and cell cycle regulation in later phases. This evolving sensitivity landscape highlights the critical role of temporal context in understanding regenerative control. Together, these analyses identify both consistently and phase-specifically sensitive regulatory modules, offering promising candidates for further mechanistic investigation and potential clinical intervention strategies to support and modulate liver regeneration. These findings underscore the utility of the digital twin framework in capturing the dynamic and donor-specific regulatory landscape underlying liver regeneration. Discussion Digital twin models have gained traction in healthcare, offering donor-specific simulations of disease progression and treatment responses [[108]31]. Notable applications include heart failure management [[109]52], personalized cancer treatment [[110]53–57], organ transplantation [[111]58], and liver regeneration [[112]59, [113]60]. However, many existing models focus on static representations of disease progression, limiting their ability to capture the dynamic nature of biological processes. A notable exception is Ref. [[114]61], which developed a framework for modeling personalized bladder cancer progression and optimizing adaptive multi-drug treatment strategies using deep reinforcement learning. However, no similar studies have been found for liver regeneration. In this study, we developed PePMDT, a digital twin model that integrates deep learning with mathematical modeling to predict donor-specific gene expression dynamics during liver regeneration. Our approach successfully reconstructed future postoperative gene expression profiles using very early postoperative data, including pre-resection data and data from 5 to 30 min after resection, demonstrating high predictive accuracy. PePMDT offers a dynamic, personalized framework that seamlessly integrates gene expression data with mechanistic simulations to predict long-term liver function recovery. The novelty of our study lies in the development of a hybrid modeling framework that uniquely integrates mechanistic mathematical modeling with deep learning-based autoencoders to construct a biologically interpretable PePMDT for liver regeneration. While autoencoders have been utilized in prior time-series prediction tasks, our framework embeds a mechanistically constrained latent space—comprising physiologically meaningful variables defined by a differential equation model of liver regeneration—between the encoder and decoder. This architecture facilitates biologically grounded, bidirectional translation between longitudinal gene expression data and mechanistic model dynamics. Unlike conventional deep learning methods that operate as black boxes, our model offers transparent, mechanism-informed predictions of donor-specific liver regeneration. To the best of our knowledge, this is the first instance where such an integrated digital twin, trained on real longitudinal transcriptomic data from LDLT donors, has been applied to model liver recovery. By bridging molecular-scale fluctuations with organ-level regenerative responses, PePMDT sets a new direction in interpretable and personalized predictive modeling for regenerative medicine. We evaluated its performance on donor-specific data to assess the predictive accuracy of PePMDT. Each donor exhibited unique gene expression patterns, underscoring the personalized nature of liver regeneration. Our model accurately predicted gene expression profiles across all donors, while also capturing substantial variability in recovery trajectories. Quantitatively, we assessed prediction accuracy using mean squared error and Pearson correlation coefficients across gene clusters, with the lowest average correlation across all donors being 0.83, indicating strong alignment between observed and predicted values. Despite perturbing individual gene clusters by up to [MATH: ± :MATH] 100%, the predicted liver volume fractions exhibited relatively modest changes. This indicates that the neural network model has not overfitted to specific gene expression patterns but has instead learned a generalized and robust mapping from gene clusters to liver volume dynamics. The model’s stability in the face of extreme perturbations highlights its ability to capture underlying biological principles rather than spurious correlations, further supporting its potential for predictive and mechanistic insights in diverse donor contexts. Beyond its predictive capability, PePMDT enables what-if simulations, allowing the exploration of liver recovery under different clinical scenarios, such as varying degrees of liver damage or resections. This predictive power provides a valuable tool for understanding liver regeneration at a molecular level and optimizing treatment strategies. By constructing a computational model informed by real donor gene expression data, PePMDT facilitates personalized medicine applications, allowing clinicians to anticipate complications and tailor interventions based on a donor’s molecular profile. Moreover, simulations of gene expression dynamics over time validate the model’s utility in predicting individual recovery trajectories. By accurately forecasting future gene expression states based on early molecular patterns, PePMDT bridges high-throughput genomic data with physiologically relevant mathematical modeling, offering a scalable and robust framework for precision medicine in liver regeneration. Supplementary Material bpaf037_Supplementary_Data [115]bpaf037_supplementary_data.zip^ (28.1MB, zip) Contributor Information Suvankar Halder, Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, United States. Michael C Lawrence, Baylor Scott & White Research Institute, Dallas, TX, 75204, United States. Giuliano Testa, Annette C. and Harold C. Simmons Transplant Institute, Baylor University Medical Center, Dallas, TX, 76104, United States. Vipul Periwal, Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, United States. Author contributions Suvankar Halder (Formal analysis [lead], Writing—original draft [lead]), Michael C. Lawrence (Data curation [lead]), and Giuliano Testa (Data curation [lead]), Vipul Periwal (Conceptualization [lead], Supervision [lead], Writing—review & editing [lead]) Supplementary data [116]Supplementary data is available at Biology Methods and Protocols online. Conflict of interest The authors declare no competing interests. Funding This research was supported by the Intramural Research Program of the NIH, the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK, Project Z01-DK075059). Data availability All code and data required to replicate the analysis presented in this study are available at [117]https://github.com/nihcompmed/PePMDT. References