Abstract
The remarkable capacity of the liver to regenerate its lost mass after
resection makes living donor liver transplantation a successful
treatment option. However, donor heterogeneity significantly influences
recovery trajectories, highlighting the need for individualized
monitoring. With the rising incidence of liver diseases, safer
transplant procedures and improved donor care are urgently needed.
Current clinical markers provide only limited snapshots of recovery,
making it challenging to predict long-term outcomes. Following partial
hepatectomy, precise liver mass recovery requires tightly regulated
hepatocyte proliferation. We identified distinct gene expression
patterns associated with liver regeneration by analyzing blood-derived
gene expression measurements from twelve donors followed over a year.
Using a deep learning-based framework, we integrated these patterns
with a mathematical model of hepatocyte transitions to develop a
personalized, progressive mechanistic digital twin—a virtual liver
model that predicts donor-specific recovery trajectories. Central to
our approach is a mechanistically identifiable latent space, defined by
variables derived from a physiologically grounded differential equation
model of liver regeneration, which enables biologically interpretable,
bidirectional mapping between gene expression data and model dynamics.
This approach integrates clinical genomics and computational modeling
to enhance post-surgical care, ensuring safer transplants and improved
donor recovery.
Keywords: liver regeneration, partial hepatectomy, mathematical
modeling, deep learning, digital twin, living donor liver transplant
(LDLT)
Introduction
The liver has the ability to regenerate and restore its functional mass
following injury, disease, or surgical removal, such as partial
hepatectomy. This regenerative capacity is vital for maintaining the
body’s homeostasis, given its central role in numerous physiological
processes, including the detoxification of harmful substances, the
metabolism of nutrients and drugs, and the synthesis of essential
proteins such as albumin and clotting factors [[30]1–4]. The
proliferation of mature hepatocytes, the functional cells of the liver,
primarily drives liver regeneration. These hepatocytes, generally in a
quiescent state (the G0 phase of the cell cycle), can re-enter the cell
cycle and proliferate in response to injury or a loss of liver mass.
This proliferation is a meticulously regulated process, ensuring that
the liver regenerates to its original size without excessive or
insufficient growth [[31]5–7]. Apart from the hepatocytes, supporting
endothelial cells, hepatic stellate cells, and Kupffer cells play
critical roles in liver regeneration. Endothelial cells facilitate the
restoration of the liver vascular network, ensuring adequate blood flow
and nutrient delivery during the regenerative process. Hepatic stellate
cells contribute by producing extracellular matrix (ECM) components and
growth factors, while Kupffer cells, the liver’s resident macrophages,
secrete cytokines and chemokines that prime hepatocytes for
regeneration [[32]8–14]. This collective and coordinated interplay
among hepatocytes and their supporting cells enables the capacity of
the liver to regenerate while maintaining its essential functions. This
capability of the liver to continue its normal functioning even during
the regeneration process permits living donor liver transplantation
(LDLT) [[33]15].
Liver regeneration is influenced by multiple factors, including the
host’s metabolic environment, graft type, and hepatectomy volume.
Studies have shown that left lobe grafts regenerate faster than right
lobes, with recipients exhibiting higher regeneration rates than donors
[[34]16]. Additionally, minor resections primarily rely on hypertrophy,
whereas extensive resections involve both hypertrophy and hepatocyte
proliferation, often leading to increased cell ploidy [[35]16–18].
These complexities make it challenging to predict individual recovery
trajectories following LDLT. Inter-donor variability significantly
impacts recovery outcomes in LDLT, with factors such as genetic makeup,
physiological status, pre-existing liver conditions, and immune
responses playing crucial roles. For example, high intra-donor
variability in tacrolimus levels has been associated with adverse
outcomes, including graft rejection and the development of
donor-specific antibodies [[36]19]. Furthermore, the specialized nature
of LDLT limits data availability, as it represents only a tiny fraction
of total liver transplants. A study analyzing 73 681 adult recipients
found that only 4% underwent LDLT, with substantial variation across
transplant centers [[37]20]. This scarcity of cases, combined with
limited longitudinal follow-up data, poses significant challenges in
developing clinically relevant predictive models for liver
regeneration.
The process of liver regeneration is orchestrated by dynamic
transitions of hepatocytes through quiescent, primed, and replicative
states, regulated by intricate signaling pathways [[38]21–23].
Understanding molecular drivers, including gene activation and
suppression, is critical for modeling and predicting these regenerative
dynamics [[39]24–27]. Building on a foundation of experimental
observations, we developed the first mathematical model to predict
liver volume after partial hepatectomy (PHx) [[40]28, [41]29], offering
a quantitative framework to predict regenerative trajectories. To make
such predictive models clinically relevant, we must integrate real-time
donor-specific data with the dynamical framework provided by our
dynamical model [[42]30, [43]31]. This is essential for capturing the
complex interplay of molecular, cellular, and systemic factors that
drive liver regeneration [[44]1, [45]22, [46]32].
However, challenges such as significant inter-donor variability, the
nonlinear nature of liver recovery, and the integration of
heterogeneous datasets complicate the development of personalized
predictive models [[47]31]. To address these complexities, this study
introduces the personalized progressive mechanistic digital twin
(PePMDT), a donor-specific digital twin for the livers of LDLT donors.
By integrating donor-specific clinical and molecular data with deep
learning techniques, PePMDT enables precise simulation of regeneration
dynamics. Similar deep learning ideas have also been applied in systems
biology and multi-omics integration to uncover functional relationships
between gene regulatory networks and physiological outcomes
[[48]33–36]. The digital twin framework begins with collecting blood
samples from a specific LDLT donor after partial hepatectomy.
Whole-transcriptomic analysis is then performed, and the identified
genes are clustered into various functional groups. These grouped genes
are subsequently mapped onto our mechanistic mathematical model that
describes liver regeneration processes. A key novelty of our approach
lies in the construction of a mechanistically identifiable latent space
between the encoder and decoder networks, composed of variables derived
from physiological model of liver regeneration [[49]28, [50]29]. This
biologically interpretable latent space allows bidirectional mapping
between gene expression profiles and mechanistic variables, making our
digital twin framework fundamentally different from black-box deep
learning approaches. The model predicts dynamic variables at future
time points, which are then used to forecast future gene expression
patterns, enabling a data-driven approach to personalized liver
recovery assessment, as depicted in [51]Fig. 1. This personalized
framework has the potential to optimize post-transplant care, enhance
recovery outcomes, and advance the field of regenerative medicine.
Figure 1.
[52]Figure 1.
[53]Open in a new tab
Framework of the digital twin. The figure depicts the general framework
of the digital twin for liver regeneration. Blood samples are collected
from a specific LDLT donor after partial hepatectomy, followed by whole
transcriptomic analysis. Identified genes are clustered into functional
groups and mapped onto a mechanistic mathematical model describing
liver regeneration. The model predicts dynamic variables at future time
points, which are then used to forecast future gene expression
patterns.
Methods
Streamlining gene expression profiles
The dataset consists of whole-transcriptome RNA sequencing data from 12
healthy LDLT donors, collected at 14 predefined time points over a
1-year follow-up period. All LDLT donors underwent partial hepatectomy,
and blood samples were collected at specific intervals post-surgery,
including 5 min, 30 min, 1 h, 2 h, 3 h, 4 h, 1 day, 2 days, 3 days,
4 days, 10 days, 3 months, 6 months, and 12 months. The aim was to
explore the dynamic transcriptional changes associated with liver
regeneration post-PHx. This comprehensive dataset provided a unique
opportunity to capture the liver’s gene expression landscape during
recovery, from the immediate post-operative phase to the long-term
restoration of liver function. [54]Figure 2 presents a visual
representation of the data structure, providing context for how the
data were organized and utilized in this analysis. To ensure accurate
downstream analysis and model predictions, the RNA-seq data underwent
several refinement steps aimed at standardizing gene expression values
across different donors and time points. These steps were essential to
handle potential technical and biological noise inherent in
high-throughput sequencing data.
Figure 2.
[55]Figure 2.
[56]Open in a new tab
Data structure. The dataset consists of samples obtained from 12
healthy LDLT donors, collected at 14 distinct time points over the
course of 1 year following partial hepatectomy. Filled color boxes
indicate the presence of data, while blank spaces represent missing
data. Donors are arranged in descending order based on the number of
available time points. The number of available data points out of the
14 total time points is indicated on the right side for each donor. The
top two-thirds of the dataset is used for training, while the remaining
one-third serves as the testing dataset. The lower portion of the
dataset was chosen for testing because these samples lack gene
expression values in the later stages, providing an opportunity for
predictive modeling.
Missing value imputation
Due to the nature of RNA-seq data, some gene expression values were
missing or below detection limits. To address this, we filled the
missing values by averaging the corresponding expression values from
other donors. This approach ensured that the imputed values were
biologically plausible and aligned with the collective data structure.
By using donor-averaged imputation, we preserved the continuity of
expression profiles across time points and minimized biases in
downstream analyses, ensuring robust and reliable integration of the
data into our model.
Correction of negative expression values
In some instances, negative expression values were observed due to
noise in low-expressed genes. These negative values were biologically
unrealistic and could interfere with further analysis. To address this
issue, we replaced all negative values with the average of the
corresponding expression values from other donors. This replacement
aims to ensure that no low or negative value artifacts skew the
results, particularly in downstream clustering or modeling.
Log-transformation
Gene expression values were log-transformed to reduce the skewness of
the data and to stabilize the variance across time points and donors.
Log-transformation compresses the range of gene expression values,
potentially revealing more patterns in genes with both high and low
expression. This transformation is especially useful in dynamic
biological processes like liver regeneration, where gene expression
changes can span several orders of magnitude over time. The
transformation used was:
[MATH:
Log transformed<
mi> data= log 2(1+data) :MATH]
This pipeline made the RNA-seq data ready for subsequent analyses,
including clustering, temporal trend analysis, and mapping to the
mathematical model variables.
Gene co-expression analysis: Identification of temporal gene modules
The dataset contains 16 735 RNA gene transcripts in the blood, each
expressed at least once during post-surgical recovery. Among these,
6649 genes exhibited a differential expression of 2-fold or more across
at least two consecutive time points compared to the preoperative
baseline following liver resection. Additionally, 1110 genes showed
expression changes of 1.5-fold or more at pre-resection time points,
likely due to nonspecific surgical effects from the abdominal incision.
These genes were excluded to focus on those potentially specific to
liver regeneration. This left 5876 genes that were identified as
differentially expressed genes (DEGs) and specific to liver
regeneration.
Weighted gene co-expression network analysis (WGCNA) [[57]37] was
employed on the DEGs to identify groups of genes that exhibited
distinct temporal patterns during liver regeneration. These groups were
further refined using self-organizing maps (SOM), which organized the
genes into clusters based on their temporal expression profiles. SOM is
particularly useful for visualizing time-series data, as it provides a
low-dimensional representation of complex patterns, helping to reveal
temporal trends in gene expression [[58]38, [59]39]. These methods
collectively identify clusters of highly correlated genes, which are
believed to be involved in shared biological processes. Each cluster is
assigned a unique name and represents a set of genes whose expression
patterns are highly similar across donors and time points. This method
allows for the identification of clusters without prior knowledge of
the biological pathways involved, making it an unbiased tool for
analyzing dynamic gene expression data [[60]37, [61]40].
Mathematical and molecular framework of liver regeneration
Liver regeneration following partial hepatectomy involves a series of
biological processes, including hepatocyte proliferation, ECM
degradation, and tissue remodeling. These processes are regulated by
dynamic interactions between genes and the tissue environment, which
can be modeled mathematically. Our previously developed mathematical
model [[62]28, [63]29] simulates liver regeneration by incorporating
key processes such as liver volume recovery, hepatocyte proliferation
rates, ECM turnover, and tissue remodeling.
The liver regeneration process is characterized by a coordinated
transition of hepatocytes through distinct states—quiescent (Q), primed
(P), and replicating (R). These transitions are illustrated in
[64]Fig. 3A. While the total cell count consists of Q, P, and R cells,
the overall liver volume increases due to the hypertrophy of primed and
replicating cells [[65]18]. The reduction in hepatocyte number
following resection increases the metabolic load per hepatocyte (M/N),
triggering the expression of growth factors (GF) and tumor necrosis
factor (TNF). TNF promotes ECM degradation via MMPs and induces IL-6
expression, which activates the JAK-STAT3 signaling pathway. STAT3, in
turn, drives the transcription of SOCS3 and immediate early (IE) genes,
shifting hepatocytes into a primed state. In this state, hepatocytes
remain metabolically active but do not yet proliferate. Cytokines such
as IL-6 and TNF-
[MATH: α :MATH]
, along with growth factors like HGF and EGF, further reinforce this
priming by upregulating transcriptional programs involving immediate
early genes, such as c-Jun and c-Fos. This prepares hepatocytes to
respond to mitogenic signals and transition into the replicating state,
where DNA synthesis and cell division occur to restore liver mass. As
regeneration progresses, TNF levels decrease, ECM reformation occurs,
and GF are inactivated, for example by uptake into the ECM, ultimately
halting the primed-to-replicating transition and promoting the shift
from replication back to quiescence. The molecular interactions
underlying this process are depicted in [66]Fig. 3B. A detailed
mathematical framework is given in [[67]28, [68]29].
Figure 3.
[69]Figure 3.
[70]Open in a new tab
Schematic overview of liver regeneration. (A) The cell cycle dynamics
during liver regeneration, highlighting changes in cell numbers. (B)
The biochemical pathways involved in liver regeneration, illustrating
key molecular interactions driving hepatocyte proliferation and tissue
remodeling.
Constructing the digital twin
Integration of gene expression data with mechanistic modeling
represents a potentially transformative approach to predicting
personalized liver recovery. In this study, we constructed a digital
twin, PePMDT, through a two-step mapping process combined with
mathematical model simulations (see [71]Fig. 4). In the first step, we
map gene expression data onto key physiological parameters of the
mathematical model. This transformation enables us to represent
biological states in terms of mathematically defined variables
governing liver regeneration dynamics. Once the gene expression data
has been mapped, we use the mathematical model to predict liver
regeneration dynamics at future time points. These simulations predict
the evolution of key physiological parameters, capturing hepatocyte
proliferation, ECM turnover, and tissue remodeling over time. In the
final step, we reverse-map the simulated mathematical model variables
back to gene expression values at future time points. This process
gives predicted gene expression profiles, allowing us to estimate the
molecular responses of LDLT donors over time based on their initial
gene expression states. These forward and reverse mappings are
described in the following subsections.
Figure 4.
[72]Figure 4.
[73]Open in a new tab
Schematic representation of the digital twin framework. The process
involves a two-step mapping approach combined with mathematical model
simulations. First, gene expression data are mapped onto key
physiological parameters of the mathematical model. The model then
simulates liver regeneration dynamics over future time points. Finally,
the simulated mathematical values are reverse-mapped to reconstruct
gene expression values, enabling the prediction of molecular responses
in LDLT donors over time.
Mapping gene expression profiles to mathematical model variables
The mathematical model [[74]28, [75]29] describes hepatocyte state
transitions in response to molecular signals, providing insights into
liver recovery and potential therapeutic targets. However, they lack
direct links to gene regulation. To bridge this gap, we used RNA-seq
data from LDLT donors to map gene expression onto dynamic model
variables. Using a feedforward neural network (FNN), we mapped gene
expression data into key physiological parameters, enabling a
gene-driven understanding of liver regeneration. The FNN consists of
three fully connected layers, each followed by ReLU activation
functions to introduce nonlinearity, potentially allowing the model to
capture complex relationships between gene expression patterns and
liver regeneration states. Dropout layers (0.2 probability) are
incorporated to prevent overfitting, aiming to ensure robust
generalization across different gene expression profiles. The input
layer receives gene expression data clustered into biologically
relevant groups (15 clusters), which is then transformed through
progressively wider layers (64 and 128 neurons) to extract meaningful
patterns. The final layer outputs a lower-dimensional representation
consisting of mathematical model variables corresponding to the
measured gene expression levels computed based on donor-specific
factors, such as the extent of liver resection, which plays a crucial
role in the mathematical modeling framework. This parameter
significantly influences the simulation of liver regeneration dynamics.
The FNN is trained using a supervised learning approach, where the loss
function (mean squared error, MSE) measures the difference between
predicted and actual physiological states. FNN parameters are optimized
using backpropagation and the Adam optimizer [[76]41].
Reverse mapping: Predicting gene expression from model variables
After mapping gene expression data to mathematical model variables, the
model simulates liver regeneration dynamics over future time points. A
reverse mapping step is required to translate these simulated variables
back into gene expression values, predicting future molecular
responses. This transformation is performed using an FNN as well. The
architecture mirrors the FNN used for the forward mapping, consisting
of three fully connected layers with ReLU activation functions to
introduce nonlinearity and dropout layers (0.2 probability) to prevent
overfitting. The input layer receives simulated model variables,
processes them through progressively wider layers (64 and 128 neurons),
and outputs gene expression values corresponding to biologically
relevant clusters. The model is optimized using the Adam optimizer,
with MSE as the loss function to minimize the discrepancy between
predicted and actual gene expression values.
Results
The genetic blueprint of liver regeneration: Clustering analysis reveals
temporal gene modules in liver regeneration
The RNA-seq data from 12 healthy LDLT donors, collected over a 1-year
follow-up period, were obtained from Lawrence et al. [[77]42]. The
donors were divided into two sets: two-thirds were used as training
donors, and one-third were used as test donors. These data underwent
several refinement steps to standardize them, as detailed in the
Methods section. To identify temporally regulated gene modules
representing distinct biological processes across different phases of
liver regeneration, we applied a combination of WGCNA and SOM
[[78]37–40, [79]43–46]. This approach enabled the identification of
gene clusters exhibiting dynamic regulation, capturing key patterns of
gene expression over time. These identified modules form the foundation
for understanding the molecular mechanisms underlying liver
regeneration. Moreover, this clustering approach revealed gene networks
with synchronized behavior, providing insights into the transcriptional
programs that drive recovery after partial hepatectomy.
After applying WGCNA and SOM (for details see Methods section), the
genes were clustered into 15 distinct groups (see [80]Fig. 5)—a higher
number than the eight clusters used by Lawrence et al. [[81]42]. This
increased number of clusters allowed for a more precise differentiation
of DEGs based not only on their temporal expression patterns but also
on their expression values, which proved beneficial for subsequent
analyses. To assess the biological significance of the clusters, we
compared our results with those of Lawrence et al. [[82]42]. Consistent
with their findings, we also identified that the clusters can be
grouped into three prominent functional temporal gene modules
corresponding to the main phases of liver regeneration: early response
(Clusters 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, and 15), proliferation
(Clusters 3 and 6), and long-term recovery (Clusters 1 and 2). Each
module comprises genes with specific functions that are activated
during distinct stages of the recovery process of the liver. SOM
generates results based on random seeds, meaning that the outcomes may
vary slightly for individual clusters with each run. However, this does
not affect the overall functional modules of regeneration. The key
processes associated with each module were determined using functional
enrichment analysis using the Enrichr database [[83]47] and are
outlined to highlight their relevance to the overall biological
context. The detailed figure of the gene ontology (GO) and pathway
enrichment analysis is provided in the [84]Supplementary Fig. 1. All
gene clusters and their corresponding top driver genes are provided in
the [85]Supplementary Materials.
Figure 5.
[86]Figure 5.
[87]Open in a new tab
Temporal gene modules. A visual representation of the temporal gene
expression patterns across 15 distinct clusters, which were
subsequently grouped into three modules, during the liver resection and
regeneration process. The bold lines represent the mean log2 fold
change relative to preoperative values, and the shaded regions indicate
the standard deviations.
* Early-response module (Clusters 4, 5, 7, 8, 9, 10, 11, 12, 13, 14,
and 15): This module consists of genes that peak within the first
24 h post-PHx. These genes are primarily involved in the
inflammatory response and early ECM degradation and remodeling.
This triggers immune cell recruitment and matrix breakdown, which
are crucial for initiating the regenerative process. Key genes in
this module include SLC39A1, which is involved in zinc transport
and immune regulation; IFNGR2, critical for immune response and
inflammation; CMTM4, which modulates cytokine signaling; SEMA6C,
guiding immune cell migration and tissue repair; IGKV1-33, playing
a role in immune response through immunoglobulin synthesis; LGMN, a
protease responsible for ECM degradation; CCR6, mediating immune
cell recruitment to injury sites; FES, a tyrosine kinase
participating in ECM remodeling; ADM, a cytokine involved in
vasodilation and inflammation; and WNT2B, part of the Wnt signaling
pathway that promotes cell proliferation and tissue regeneration.
The temporal peak of this module aligns with the critical early
stages of liver regeneration, where tissue damage needs to be
repaired to create a favorable environment for cell proliferation.
* Proliferation module (Clusters 3 and 6): Active between 1 and 10
days post-surgery, this module is enriched for genes related to
cell cycle progression, hepatocyte proliferation, and DNA
replication. During this phase, the liver undergoes rapid growth as
hepatocytes proliferate to replace the lost tissue. This process is
tightly regulated by growth factors and cell cycle regulators,
ensuring controlled proliferation. Genes in this module include
cyclins, cyclin-dependent kinases (CDKs), and growth factor
receptors. For example, Cluster 3 contains TFDP1, a key
transcription factor in cell cycle regulation, along with COLGALT1,
HOMER3, ITGAD, and ANG, which are associated with chromatin
remodeling, cell signaling, and angiogenesis. Cluster 6 includes
immune-regulatory and proliferation-associated genes such as IFIT1,
CXCL8, CXCL6, IFIT3, and AHRR, which are known to be involved in
inflammation-driven proliferation and tissue remodeling. The peak
of this module corresponds to the proliferative burst that occurs
during liver regeneration, with hepatocyte replication reaching its
highest levels between 3 and 10 days.
* Long-term recovery module (Clusters 1 and 2): This module is
characterized by genes that show sustained upregulation beyond 3
months post-PHx. These genes are associated with ECM organization,
tissue remodeling, and the restoration of normal liver function. As
the liver nears the completion of regeneration, it enters a phase
of tissue stabilization, where the newly formed ECM becomes
organized, and liver architecture is restored to support long-term
functional recovery. Key genes in this module include PTGS1, TSTA3,
MAP1A, and CTSE, which are involved in extracellular matrix
remodeling. Genes such as ESAM, PTGFRN, and CORIN are associated
with angiogenesis and vascular functions, facilitating proper blood
vessel formation and remodeling. DAP and SIAH2 contribute to tissue
remodeling and maintaining homeostasis. Additionally, FAXDC2,
ACCSL, and possibly GPR146 play roles in liver function and lipid
metabolism, supporting the restoration of metabolic activity during
the late phase of liver regeneration.
These three modules capture the temporal dynamics of liver regeneration
and also reflect distinct biological processes that are activated
sequentially.
PepMDT predicts donor-specific future gene expression patterns related to
liver mass recovery with precision
Comparisons of predicted gene expression from PePMDT with actual gene
expression profiles across multiple time points and donors demonstrated
a strong alignment between the two datasets. In this analysis, the
initial gene expression profiles of test donors (before surgery values)
were input into the PePMDT model, which then predicted gene expression
values for each cluster at subsequent time points. [88]Figure 6
illustrates the results for all test donors: donors 5, 6, 10, and 11.
Figure 6.
[89]Figure 6.
[90]Open in a new tab
Comparison of original and predicted gene expression profiles. The plot
illustrates the temporal gene expression patterns for individual test
donors—donor 5 (A), donor 6 (B), donor 10 (C), and donor 11 (D). Bold
lines represent the mean log2 fold change relative to preoperative
values, with shaded regions indicating standard deviations. Red lines
denote the predicted values, while blue lines represent the original
(true) gene expression values. Open circles indicate time points where
data are missing in the original dataset.
The cluster-wise MSE and Pearson correlation coefficient between the
true and predicted gene expression profiles were computed to evaluate
prediction accuracy. For the majority of clusters, the MSE was found to
be less than 0.3, with an average MSE across all test donors below
0.15. The Pearson correlation coefficient consistently exceeded 0.7,
with the lowest average for a donor being 0.872, indicating a strong
correlation between the predicted and true gene expression profiles.
These results are depicted in [91]Fig. 7, where Panel A shows the
cluster-wise MSE and Pearson correlation, and Panel B presents the
averaged MSE and Pearson correlation across clusters. The very low
P-values associated with the correlation coefficients further confirm
the statistical significance of the observed agreement between true and
predicted gene expression profiles.
Figure 7.
[92]Figure 7.
[93]Open in a new tab
Donor-specific prediction accuracy. Comparison of prediction accuracy
across individual test donors for the initial time point (before
surgery). Panel A shows the MSE and Pearson correlation coefficients
between the true and predicted gene expression profiles for each
cluster; the corresponding P-values are color-coded in the right
figure. Panel B displays the averaged MSE and averaged Pearson
correlation across clusters, with the corresponding P-values also
indicated.
We extended our analysis to two additional time points: 5 and 30 min
post-surgery, to assess whether selecting a later time point as the
initial point improves prediction accuracy. However, we did not explore
time-points beyond these, as the primary focus of our work is
predicting future time-points based on earlier stages. We compared the
results for these initial time-points and observed that the prediction
accuracy remained consistent across all test donors. This comparison is
shown in [94]Fig. 8 for donor 5, while the results for the remaining
test donors are provided in the [95]Supplementary Fig. S2. The error
bars represent the MSE between the true and predicted values across
different time points for each of the 15 clusters. These results
demonstrate that prediction accuracy does not significantly vary with
different initial time-points. Although there is a small variation at
the immediate next time-point for some clusters, no significant
differences are observed at later time points. This indicates that our
model is capable of reliably predicting future gene expression values,
even when the gene expression profile is provided as early as the
pre-surgery time point. To evaluate prediction accuracy for the
post-surgical initial time points (5 and 30 min), we computed the
cluster-wise and average MSE and Pearson correlation coefficients
between true and predicted gene expression profiles. The results, shown
in [96]Supplementary Fig. 3, indicate strong correlations, reinforcing
the model’s reliability in forecasting future gene expression patterns
regardless of the initial time point.
Figure 8.
[97]Figure 8.
[98]Open in a new tab
Comparison of prediction accuracy with different initial time points.
The figure depicts error bars representing the mean squared errors
between the true values and the predicted values for each time point
for donor 5. The red, green, and blue bars correspond to the initial
time-points chosen as before surgery, 5 min post-surgery, and 30 min
post-surgery, respectively. This comparison is shown for all 15
clusters.
Influence of mathematical model parameters and gene clusters on liver volume
recovery
We analyzed the effects of perturbations in both mechanistic model
parameters and input gene expression levels to better understand the
drivers of liver volume recovery. This dual analysis aimed to identify
the most sensitive factors—at both the gene and the mechanistic
levels—that influence liver regeneration outcomes. The effect of
perturbations in mechanistic model parameters can be assessed by
analyzing their sensitivity to the model output, specifically liver
volume recovery. Global sensitivity analysis (GSA) was performed using
Latin hypercube sampling (LHS) and partial rank correlation coefficient
(PRCC) to evaluate the time-dependent sensitivity of the mathematical
model parameters [[99]48–51]. Sensitive parameters were identified
based on the PRCC values, as indicated by the color bar in [100]Fig. 9.
A higher positive or negative PRCC value signifies greater sensitivity
of the parameter with respect to changes in liver volume. Asterisks
denote statistically significant PRCC values (P < 0.05).
Figure 9.
[101]Figure 9.
[102]Open in a new tab
Global sensitivity analysis. The y-axis represents the mathematical
model parameters, and the x-axis represents time points. The color bar
indicates the PRCC values, which measure the sensitivity of each
parameter. Asterisks denote statistically significant P-values: * for
P < 0.05, ** for P < .01, and *** for P < .001.
The GSA revealed a subset of parameters that remained consistently
sensitive across all timepoints. These include metabolic load (M), TNF
production rate (
[MATH: kTNF :MATH]
), JAK degradation rate (
[MATH: κJAK :MATH]
), SOCS3 induction rate (
[MATH: VSOCS3
:MATH]
), its half-saturation constant (
[MATH: kMSOCS3
:MATH]
), SOCS3 degradation rate (
[MATH: κSOCS3
:MATH]
), immediate early gene expression rate (
[MATH:
VIE :MATH]
), its half-saturation constant (
[MATH:
kMI<
mi>E :MATH]
), IE gene degradation rate (
[MATH:
κIE :MATH]
), and the transition rate from quiescent to priming hepatocytes (
[MATH:
kQ :MATH]
). Conversely, some parameters displayed strong time-dependent
sensitivity. For example, JAK activation (
[MATH: VJAK :MATH]
), its half-saturation constant (
[MATH: kMJAK :MATH]
), the concentration of monomeric STAT3 ([proSTAT3]), and its
half-saturation constant (
[MATH:
kMS<
mi>T3 :MATH]
) were predominantly influential during the early priming phase. The
sigmoid parameter for re-quiescence (
[MATH: θreq :MATH]
) showed increased sensitivity in the mid-proliferation phase. In the
later stages of regeneration, parameters related to growth factor and
extracellular matrix (ECM) dynamics became more dominant. These include
growth factor production (
[MATH:
kGF :MATH]
), its decay rate (
[MATH:
κGF :MATH]
), ECM decay (
[MATH: κECM :MATH]
), TNF-mediated ECM degradation (
[MATH: kdeg :MATH]
), and transition rates between cellular states—replicating to
quiescent (
[MATH:
kR :MATH]
), proliferation to replicating (
[MATH:
kp :MATH]
), proliferation rate (
[MATH: kprol :MATH]
), and re-quiescence rate (
[MATH: kreq :MATH]
). These parameters grew increasingly sensitive during the late-middle
to terminal phases of regeneration.
Next, we systematically perturbed the expression levels of each gene
cluster across a range of
[MATH: ± :MATH]
100% to investigate how changes in gene expression within specific
clusters influence liver volume recovery. This analysis was performed
for four test donors (IDs 5, 6, 10, and 11), with results for donor 5
shown in [103]Fig. 10; the remaining donors are presented in the
[104]Supplementary Fig. 4. Perturbations were applied at three early
timepoints (before surgery, 5 and 30 min), while liver volume was
assessed at a representative late timepoint (3 months) during liver
regeneration. Using donor-specific mean gene expression values and
standard deviations, we perturbed each gene cluster independently while
keeping all others constant. For each perturbation level, the modified
gene expression profile was passed through the trained model to predict
liver volume. These predicted volumes were then compared to the actual
(from donor-specific mechanistic model) liver volume, and the impact of
each cluster perturbation was visualized across timepoints.
Figure 10.
[105]Figure 10.
[106]Open in a new tab
Effect of gene perturbation on liver volume recovery. For the test
donor 5, the expression levels of individual gene clusters were
systematically perturbed from −100% to +100% at three early time
points: Panel A—Before surgery, Panel B—5 min post-surgery, and Panel
C—30 min post-surgery. The corresponding liver volume was predicted at
a representative late time point (3 months). The x-axis indicates the
percentage change in gene expression, and the y-axis shows the
predicted liver volume fraction. Each colored line with dots represents
a different gene cluster, illustrating how specific clusters influence
liver volume recovery.
The predicted liver volumes for unperturbed gene expression profiles
(green dashed lines in [107]Fig. 10) closely matched the actual
observed volumes (red dashed lines), demonstrating the accuracy of our
PePMDT model and providing a robust baseline for assessing the effects
of gene perturbations. When individual gene clusters were perturbed,
their impact on predicted liver volume varied across timepoints,
highlighting dynamic and time-specific sensitivities in the
regenerative process. Although the overall liver volume recovery did
not exhibit dramatic shifts, specific gene clusters influenced the
model’s predictions more strongly at certain stages. For instance,
clusters 5 and 6 showed notable effects before surgery, while clusters
8 and 14 had greater influence at 5 and 30 min post-surgery. This
evidence supports the hypothesis that different clusters play
temporally distinct roles in liver regeneration. Furthermore, the
pattern of these effects differed between donors, underscoring the
personalized nature of the model and its capacity to capture individual
variability in regenerative dynamics.
Perturbation of individual gene clusters revealed distinct,
time-dependent sensitivities in liver volume predictions, with certain
clusters exerting stronger effects at specific post-surgical phases.
Insights from GSA further support a temporal shift in dominant
regulatory mechanisms—from immune signaling and early stress responses
during the initial stages to growth factor signaling and cell cycle
regulation in later phases. This evolving sensitivity landscape
highlights the critical role of temporal context in understanding
regenerative control. Together, these analyses identify both
consistently and phase-specifically sensitive regulatory modules,
offering promising candidates for further mechanistic investigation and
potential clinical intervention strategies to support and modulate
liver regeneration. These findings underscore the utility of the
digital twin framework in capturing the dynamic and donor-specific
regulatory landscape underlying liver regeneration.
Discussion
Digital twin models have gained traction in healthcare, offering
donor-specific simulations of disease progression and treatment
responses [[108]31]. Notable applications include heart failure
management [[109]52], personalized cancer treatment [[110]53–57], organ
transplantation [[111]58], and liver regeneration [[112]59, [113]60].
However, many existing models focus on static representations of
disease progression, limiting their ability to capture the dynamic
nature of biological processes. A notable exception is Ref. [[114]61],
which developed a framework for modeling personalized bladder cancer
progression and optimizing adaptive multi-drug treatment strategies
using deep reinforcement learning. However, no similar studies have
been found for liver regeneration.
In this study, we developed PePMDT, a digital twin model that
integrates deep learning with mathematical modeling to predict
donor-specific gene expression dynamics during liver regeneration. Our
approach successfully reconstructed future postoperative gene
expression profiles using very early postoperative data, including
pre-resection data and data from 5 to 30 min after resection,
demonstrating high predictive accuracy. PePMDT offers a dynamic,
personalized framework that seamlessly integrates gene expression data
with mechanistic simulations to predict long-term liver function
recovery.
The novelty of our study lies in the development of a hybrid modeling
framework that uniquely integrates mechanistic mathematical modeling
with deep learning-based autoencoders to construct a biologically
interpretable PePMDT for liver regeneration. While autoencoders have
been utilized in prior time-series prediction tasks, our framework
embeds a mechanistically constrained latent space—comprising
physiologically meaningful variables defined by a differential equation
model of liver regeneration—between the encoder and decoder. This
architecture facilitates biologically grounded, bidirectional
translation between longitudinal gene expression data and mechanistic
model dynamics. Unlike conventional deep learning methods that operate
as black boxes, our model offers transparent, mechanism-informed
predictions of donor-specific liver regeneration. To the best of our
knowledge, this is the first instance where such an integrated digital
twin, trained on real longitudinal transcriptomic data from LDLT
donors, has been applied to model liver recovery. By bridging
molecular-scale fluctuations with organ-level regenerative responses,
PePMDT sets a new direction in interpretable and personalized
predictive modeling for regenerative medicine.
We evaluated its performance on donor-specific data to assess the
predictive accuracy of PePMDT. Each donor exhibited unique gene
expression patterns, underscoring the personalized nature of liver
regeneration. Our model accurately predicted gene expression profiles
across all donors, while also capturing substantial variability in
recovery trajectories. Quantitatively, we assessed prediction accuracy
using mean squared error and Pearson correlation coefficients across
gene clusters, with the lowest average correlation across all donors
being 0.83, indicating strong alignment between observed and predicted
values. Despite perturbing individual gene clusters by up to
[MATH: ± :MATH]
100%, the predicted liver volume fractions exhibited relatively modest
changes. This indicates that the neural network model has not
overfitted to specific gene expression patterns but has instead learned
a generalized and robust mapping from gene clusters to liver volume
dynamics. The model’s stability in the face of extreme perturbations
highlights its ability to capture underlying biological principles
rather than spurious correlations, further supporting its potential for
predictive and mechanistic insights in diverse donor contexts.
Beyond its predictive capability, PePMDT enables what-if simulations,
allowing the exploration of liver recovery under different clinical
scenarios, such as varying degrees of liver damage or resections. This
predictive power provides a valuable tool for understanding liver
regeneration at a molecular level and optimizing treatment strategies.
By constructing a computational model informed by real donor gene
expression data, PePMDT facilitates personalized medicine applications,
allowing clinicians to anticipate complications and tailor
interventions based on a donor’s molecular profile. Moreover,
simulations of gene expression dynamics over time validate the model’s
utility in predicting individual recovery trajectories. By accurately
forecasting future gene expression states based on early molecular
patterns, PePMDT bridges high-throughput genomic data with
physiologically relevant mathematical modeling, offering a scalable and
robust framework for precision medicine in liver regeneration.
Supplementary Material
bpaf037_Supplementary_Data
[115]bpaf037_supplementary_data.zip^ (28.1MB, zip)
Contributor Information
Suvankar Halder, Laboratory of Biological Modeling, National Institutes
of Diabetes and Digestive and Kidney Diseases, National Institutes of
Health, Bethesda, MD, 20892, United States.
Michael C Lawrence, Baylor Scott & White Research Institute, Dallas,
TX, 75204, United States.
Giuliano Testa, Annette C. and Harold C. Simmons Transplant Institute,
Baylor University Medical Center, Dallas, TX, 76104, United States.
Vipul Periwal, Laboratory of Biological Modeling, National Institutes
of Diabetes and Digestive and Kidney Diseases, National Institutes of
Health, Bethesda, MD, 20892, United States.
Author contributions
Suvankar Halder (Formal analysis [lead], Writing—original draft
[lead]), Michael C. Lawrence (Data curation [lead]), and Giuliano Testa
(Data curation [lead]), Vipul Periwal (Conceptualization [lead],
Supervision [lead], Writing—review & editing [lead])
Supplementary data
[116]Supplementary data is available at Biology Methods and Protocols
online.
Conflict of interest
The authors declare no competing interests.
Funding
This research was supported by the Intramural Research Program of the
NIH, the National Institute of Diabetes and Digestive and Kidney
Diseases (NIDDK, Project Z01-DK075059).
Data availability
All code and data required to replicate the analysis presented in this
study are available at [117]https://github.com/nihcompmed/PePMDT.
References