Abstract
Quiescence, a temporary withdrawal from the cell cycle, plays a key
role in tissue homeostasis and regeneration. Quiescence is increasingly
viewed as a continuum between shallow and deep quiescence, reflecting
different potentials to proliferate. The depth of quiescence is altered
in a range of diseases and during aging. Here, we leveraged
genome-scale metabolic modeling (GEM) to define the metabolic and
epigenetic changes that take place with quiescence deepening. We
discovered contrasting changes in lipid catabolism and anabolism and
diverging trends in histone methylation and acetylation. We then built
a multi-cell type machine learning model that accurately predicts
quiescence depth in diverse biological contexts. Using both machine
learning and genome-scale flux simulations, we performed
high-throughput screening of chemical and genetic modulators of
quiescence and identified novel small molecule and genetic modulators
with relevance to cancer and aging.
Keywords: quiescence deepening, genome-scale models, machine learning,
cancer, aging
__________________________________________________________________
Significance Statement.
Quiescence is a sleep-like cell state important for tissue
regeneration. Quiescence exists on a spectrum, from shallow to deep
quiescence, based on a cell's ability to proliferate. Through
computational modeling, we simulated metabolic changes associated with
quiescence deepening. We also developed a machine learning model that
accurately predicts quiescence depth in multiple cell types and
quiescence induction methods. Our models predicted genes and small
molecules capable of tuning the depth of quiescence. These modulating
factors show a striking association with cancer initiation and aging.
Introduction
Cellular quiescence is a state of reversible dormancy in which cell
division is halted but able to resume if needed. As a common somatic
cell state, appropriate transition into and out of quiescence is
essential for proper tissue homeostasis ([26]1, [27]2). Quiescence is
induced in stem cells, for instance, where it shields cells from
environmental stress. In response to tissue injury, however, activation
of quiescent stem cells is key to regenerating damaged tissue ([28]3,
[29]4).
Rather than a uniform state, quiescence is increasingly viewed as a
continuum between shallow and deep quiescence, akin to light and deep
sleep, requiring different potencies of growth stimulation to induce
proliferation. Cells in shallower quiescence respond more easily to
growth signals, whereas cells in deeper quiescence are more difficult
to activate ([30]5). This phenomenon is observed in a wide range of
contexts. Cells cultured under serum starvation or contact inhibition
become progressively desensitized to growth signals ([31]6). Likewise,
quiescent liver cells display impaired regenerative capacity with age,
reflective of a deeper quiescence ([32]7).
Tuning the depth of quiescence is critical in a range of physiological
contexts. Muscle stem cells, for instance, shift to a shallower state
of quiescence in response to tissue injury or stress ([33]8), and
certain neural stem cells (NSCs) exit quiescence upon nutritional
stimulus ([34]9). Additionally, quiescence depth is dysregulated in a
range of diseases and during aging. In cancer, malignant cells can
become quiescent, where they evade most cancer treatments ([35]10), and
with age, quiescence deepens across various tissues and cell types,
impairing tissue regeneration and compromising cellular resilience
([36]5). In light of this, better understanding the factors modulating
quiescence would shed light on an important facet of quiescent cell
biology and may have therapeutic relevance to the variety of disease
states in which quiescence depth is dysregulated. These factors,
however, remain poorly defined. In this study, we take a multifaceted
approach to uncover modulators of quiescence depth encompassing
genome-scale metabolic model (GEM) and machine learning predictors of
quiescence.
Genome-scale metabolic models are mathematical representations of
metabolic networks encompassing an organism's entire genome ([37]11).
By incorporating constraints, including physicochemical and
environmental limitations, GEMs can simulate fluxes through metabolic
reactions and accurately predict responses to genetic perturbations
([38]12). GEMs can also incorporate omics data to further constrain the
set of allowable fluxes ([39]13). For instance, Frezza et al. ([40]14)
created a tissue-specific metabolic model using transcriptomics data,
and then accurately predicted how metabolism was rewired in the absence
of a key citric acid cycle enzyme. GEMs have also been used to define
the metabolic signature of quiescent cancer cells through omics data
incorporation ([41]15).
Here, we leveraged metabolic models to simulate the metabolic changes
underlying quiescence deepening and perform in silico gene knockouts to
identify genes capable of shifting quiescence depth toward a shallower
or deeper state. In doing so, we highlighted metabolic and epigenetic
trends associated with quiescence deepening and built a metabolic
flux-based quiescence depth predictor that accommodates multiple types
of input data. Metabolism and epigenetics are profoundly important to
cellular biology, yet the metabolic and epigenetic trends associated
with quiescence deepening remain unexplored, so this analysis provides
a window into critical yet undefined aspects of quiescence.
In addition, we developed a novel transcriptomics-based quiescence
depth predictor built using multiple quiescence induction methods and
multiple cell types and validated its accuracy in diverse contexts.
Using this predictor, we performed high-throughput screening of
chemical and genetic perturbations to identify interventions that
robustly altered quiescence depth. We also explored quiescence depth in
embryogenesis and stem cell aging at the single-cell level. We
validated our predictions using cancer genomics data (Fig. [42]1).
Fig. 1.
[43]Fig. 1.
[44]Open in a new tab
Graphical overview. In this article, we take a multifaceted approach to
identifying modulators of quiescence depth: (i) metabolic modeling and
flux simulations that enable in silico genetic knockouts and (ii) a
multi-cell type transcriptomics quiescence depth predictor that
facilitates screening chemical and genetic perturbations for their
effect on quiescence depth. In doing so, we simulate metabolic and
epigenetic features of quiescence deepening and explore quiescence
depth in embryogenesis and aging. Predictions are validated using a
cancer genomics database.
Results
Metabolic modeling simulates metabolic fluxes during quiescence deepening
First, we simulated metabolic fluxes for progressively deeper stages of
quiescence by incorporating transcriptomics from nine stages of
quiescence depth into a GEM containing 2,953 genes and 8,377
metabolites organized into 13,028 reactions ([45]16). Quiescence
deepening was induced in rat fibroblasts through serum starvation, and
quiescence depth was quantified as the days since serum starvation
began, denoted as quiescence depth score (QDS) ([46]5). Each quiescence
depth score had three replicates for a total of 27 samples. To
incorporate transcriptomics, we used Integrative Network Inference for
Tissues (INIT), a well-established tool for creating
transcriptomics-constrained metabolic models ([47]17). Then we randomly
sampled from the flux solution space (Fig. [48]2A). Similar results
were obtained using linear Integrative Metabolic Analysis Tool (iMAT)
([49]18) and parsimonious flux balance analysis (see Methods).
Fig. 2.
[50]Fig. 2.
[51]Open in a new tab
Metabolic pathway analysis of quiescence deepening. A) Workflow for
simulating metabolic reaction fluxes. Transcriptomics from nine stages
of quiescence depth were integrated into a metabolic model (Rat-GEM) to
constrain the set of allowable fluxes (using INIT). Then, we used CHRR
to randomly sample from the set of potential fluxes for each quiescence
depth. B) Top subsystems associated with quiescence depth. For each
subsystem, each reaction in the subsystem was correlated with
quiescence depth and the average Pearson correlation was computed.
Subsystems with the highest average Pearson correlation coefficients
are shown here (Q-value < 1E-5, one-sample t test). A positive mean
correlation coefficient implies a positive association with quiescence
depth while a negative mean correlation coefficient indicates a
negative association with quiescence depth. C) Subsystem enrichment
analysis of reactions negatively correlating with QDS at a P-value of
<0.15. Subsystems (pathways) ranked by fold enrichment are shown here
along with an enrichment P-value calculated using the hypergeometric
function and corrected with the Benjamini–Hochberg adjustment (Q-value
< 0.01 subsystems shown here). D) Subsystem enrichment analysis of
reactions positively correlating with QDS at a P-value of <0.15.
Subsystems (pathways) ranked by fold enrichment are shown here along
with an enrichment P-value calculated using the hypergeometric function
and corrected with the Benjamini–Hochberg adjustment (Q-value < 0.01
subsystems shown here). Top 10 subsystems meeting criteria are shown
here, ranked by fold enrichment. E) Normalized reaction fluxes across
quiescence depths for all reactions negatively correlating with QDS at
a P-value of <0.15. F) Normalized reaction fluxes across quiescence
depths for all reactions positively correlating with QDS at a P-value
of <0.15.
We then identified which metabolic reactions corresponded most strongly
to quiescence deepening by correlating reaction fluxes with quiescence
depth and highlighting the reactions with the strongest association.
The fluxes of these reactions show a strong correspondence with
quiescence depth (Fig. [52]2E and F). To better understand the
metabolic pathways represented by these reactions, each reaction's
association with a subsystem in the metabolic model was leveraged.
Using this information, we performed an enrichment analysis to identify
which metabolic subsystems were overrepresented in the reactions
positively correlating to quiescence depth (increasing with quiescence
depth) and the reactions negatively correlating to quiescence depth
(decreasing with quiescence depth) (Fig. [53]2C and D). The basis for
this fold enrichment is the frequency of a given subsystem in the
differential fluxes compared to the frequency of a given subsystem in
all reactions in the metabolic model. Glycosaminoglycan, arachidonic
acid, and linoleate metabolism were a few of the subsystems found to
rise with quiescence deepening, whereas fatty acid biosynthesis,
prostaglandin biosynthesis, and omega-3 metabolism were predicted to
fall.
A further pathway analysis was then performed in which each metabolic
subsystem's average correlation with quiescence depth was calculated,
and subsystems with the highest absolute correlation were identified
(Fig. [54]2B). This revealed many of the same trends, with lipid
metabolism subsystems conspicuous.
Given the enrichment of lipid metabolism pathways in these subsystems,
we decided to delve deeper into these pathways specifically. To do so,
we examined a comprehensive set of lipid-related metabolic subsystems
and corroborated the trends using transcriptomics data.
Lipid anabolism and catabolism display largely opposing trends during
quiescence deepening
The most striking metabolic pattern observed during quiescence
deepening was the diverging trends of lipid anabolism and catabolism.
As quiescence deepens, beta-oxidation fluxes substantially increase
while fatty acid and cholesterol biosynthesis fluxes decline, albeit in
a less uniform fashion (Fig. [55]3A and B), suggesting a metabolic
shift from fatty acid biosynthesis to fatty acid oxidation takes place
during quiescence deepening. We highlight representative metabolic
subsystems and reactions in Fig. [56]3C and D for lipid anabolism and
catabolism, respectively, using Escher plots ([57]19). The full-size
metabolic network plots are shown in Figs. [58]S2 and S3.
Fig. 3.
[59]Fig. 3.
[60]Open in a new tab
Lipid anabolism and lipid catabolism display largely opposing trends
during quiescence deepening. A) Mean Pearson correlation coefficient of
all reactions with QDS for lipid anabolism subsystems. Only subsystems
with a mean correlation coefficient different from zero (Q-value <
0.05, one-sided t test) are shown here. B) Mean Pearson correlation
coefficient of all reactions with QDS for lipid catabolism subsystems.
Only subsystems with a mean correlation coefficient different from zero
(Q-value < 0.05, one-sided t test) are shown here. C) Metabolic pathway
plot for example lipid anabolism subsystem: fatty acid biosynthesis
(odd-chain). Reactions are colored by the Pearson correlation
coefficient with quiescence depth. D) Metabolic pathway plot for
example lipid catabolism subsystem: beta oxidation of di-unsaturated
fatty acids (N-6) (mitochondrial). Reactions are colored by the Pearson
correlation coefficient with quiescence depth. E) Distribution of
Pearson correlation coefficients with quiescence depth for transcripts
belonging to the GO biological process “lipid biosynthesis.” Only
correlation coefficients with Q-value < 0.05 are shown here. F)
Distribution of Pearson correlation coefficients with quiescence depth
for transcripts belonging to the GO biological process “lipid
catabolism.” Only correlation coefficients with Q-value < 0.05 are
shown here.
This metabolic shift aligns strongly with experimental metabolomics
data from Yao et al. ([61]20) in which quiescent cells displayed higher
rates of carnitine-shuttle-dependent fatty acid oxidation and lower
rates of lipid biosynthesis compared to proliferating controls. It also
aligns with transcriptomics data used for model reduction. To
corroborate the trends with transcriptomics data, we identified genes
related to lipid biosynthesis and lipid catabolism, and then computed
their correlation with quiescence depth (Fig. [62]3E and F). For lipid
biosynthesis genes, we see a modest majority decline with quiescence
deepening whereas for lipid catabolism, a more substantial majority
increases with quiescence depth, both aligning with the trends seen in
the fluxes.
Epigenetic simulations reveal diverging trends in histone acetylation and
methylation as quiescence deepens
Epigenetic modifications play a profound role in regulating a cell's
gene expression, and their nature is highly dependent on the metabolic
state of the cell. The metabolite acetyl-CoA, for instance, serves as
the substrate for histone acetylation, a histone modification that
activates gene transcription, while the metabolite S-adenosylmethionine
(SAM) serves as the carbon donor in histone methylation ([63]21,
[64]22) (Fig. [65]4A). Given the interplay between metabolism and
epigenetics, metabolic models enable the simulation of histone
modifications based on the abundance and flux of metabolic precursors
([66]23–25).
Fig. 4.
[67]Fig. 4.
[68]Open in a new tab
Quiescence deepening is associated with diverging trends in epigenetic
histone acetylation and methylation. A) Diagram of chromatin
highlighting histone post-translational modifications and their
relationship to metabolism. SAM is the substrate for histone
methyltransferases (HMT) to methylate histone residues and convert SAM
to S-adenosylhomocysteine (SAH). Acetyl-CoA is the substrate for
histone acetyltransferases (HAT) to acetylate histone residues with CoA
as a byproduct. Citrate is converted to acetyl-CoA by ATP citrate lyase
(ACLY). These reactions are simulated in the metabolic model. B)
Simulated mean nuclear histone acetylation flux vs. quiescence depth
reveals a downward trend in histone acetylation as quiescence deepens.
Pearson correlation performed between quiescence depth and mean nuclear
histone acetylation flux. C) Top subsystems whose knockout reverse the
acetylation trend, ranked by Pearson correlation between QDS and
histone acetylation upon subsystem knockout. D) Simulated mean nuclear
histone methylation flux vs. quiescence depth reveals an upward trend
in histone methylation with increasing quiescence depth. Pearson
correlation performed between quiescence depth and mean nuclear histone
methylation flux. E) Top subsystems whose knockout reverses the
methylation trend, ranked by Pearson correlation between QDS and
histone methylation upon subsystem knockout.
Leveraging this ability, we simulated histone acetylation and
methylation as quiescence transitions from a shallow to a deep state.
Remarkably, we observed diverging trends in these two histone
modifications: histone acetylation is predicted to fall during
quiescence deepening (Fig. [69]4B) while histone methylation is
predicted to rise (Fig. [70]4D). In agreement with these results,
histone acetylation is known to fall with quiescence induction
([71]26).
What metabolic pathways could be driving these trends? To answer this
question, we knocked out each subsystem in the metabolic model and
observed the resulting epigenetic trend. For histone acetylation, the
downward trend was abolished most strongly by knocking out fatty acid
activation and carnitine-shuttle-related reactions—pathways known to
impact histone acetylation in other contexts ([72]27, [73]28) (Fig.
[74]4C). Given the steady increase in carnitine-shuttle-dependent fatty
acid oxidation predicted during quiescence deepening, this suggests
that increased fatty acid oxidation may be responsible for mediating
the downward trend in histone acetylation. For histone methylation,
knockdown of peptide metabolism and several lipid-related
subsystems—both anabolic and catabolic—had the strongest impact in
abolishing the upward trend (Fig. [75]4E), implying a link may exist
between rising histone methylation and the predicted shift from fatty
acid anabolism to catabolism during quiescence deepening.
In silico gene knockout simulations predict a range of quiescence
depth-altering genes that align with experimental data
Metabolic models enable the large-scale simulation of genetic
perturbations by applying known information about an organism's
metabolic networks and their relationship to genes ([76]13). Leveraging
this functionality, we performed a gene knockout screening to identify
genes that may modulate quiescence depth.
To accomplish this, we employed the algorithm robust metabolic
transformation algorithm (rMTA), which identifies genes whose knockout
shifts the metabolic state of a cell from a source state to a target
state ([77]29). We simulated metabolic fluxes for shallow quiescence
and deep quiescence through flux sampling, then performed two
screenings: one for genes whose knockout shifts the cell from shallow
to deep quiescence and another for genes whose knockout shifts the cell
from deep to shallow quiescence (Fig. [78]5A).
Fig. 5.
[79]Fig. 5.
[80]Open in a new tab
Gene knockout simulations predict a range of quiescence depth
modulators in alignment with experimental data. A) Overview of gene
knockout simulations. Transcriptomics from shallow and deep quiescence
states were used to constrain a metabolic model and perform flux
simulations, enabling the simulation of the metabolic state of a cell
in shallow and deep quiescence. Then, we used the rMTA to predict gene
knockouts that could move a cell to (i) a shallower quiescence state or
(ii) a deeper quiescence state. B) Accuracy of gene perturbation
simulations for top 50 hits. To validate these results experimentally,
we used a cancer genomic database called CancerMine. The CancerMine
database validation relied on the following principle: genes whose
knockdown lowered quiescence depth was considered validated if they
were annotated as tumor suppressors (genes whose downregulation
promotes cancer growth), genes whose knockdown raised quiescence were
considered validated if they were annotated as oncogenes (genes whose
downregulation lowers cancer growth). C) Enrichment analysis. For
quiescence deepeners, we calculated the fold enrichment and P-value of
tumor suppressors among the top 50 hits compared to the background list
of all genes knocked out with rMTA. For quiescence reducers, we
calculated the fold enrichment of cancer oncogenes or drivers compared
to the background set of genes. P-values are computed using the
hypergeometric function and listed above the bar chart bars. **P-value
< 0.01. D) Top 20 hits of the gene perturbation simulation for
quiescence reducers: The top 20 genes whose knockout is predicted to
move the cells toward a deeper quiescence state are shown here (ranked
by rMTA robust transformations core). Here, the CancerMine annotation
described in B) is shown for each prediction. E) Top 20 hits of the
gene perturbation simulation for quiescence deepeners: The top 20 genes
whose knockout is predicted to move the cells toward a shallower
quiescence state are shown here (ranked by rMTA robust transformations
core). As before, the CancerMine annotation described in B) is shown
here for each prediction. The gene MPC1 is highlighted. This gene has
experimental evidence for its inhibition lowering quiescence depth (as
predicted here).
To validate the results of these predictions, we compared the
predictions with known oncogenes and tumor suppressors from the
CancerMine database ([81]30). This comparison relied on the following
principle: genes whose knockdown lowered quiescence depth (“quiescence
deepeners”) were considered validated if they were annotated in the
database as tumor suppressors (genes whose downregulation promotes
cancer growth) and genes whose knockdown deepened quiescence depth
(“quiescence reducers”) were considered validated if they were
annotated as oncogenes or cancer drivers (genes whose downregulation
lowers cancer growth). This validation approach was used because of the
sparsity of studies screening genes directly for quiescence depth and
proliferation's strong association with quiescence depth—shallower
quiescent cells are more easily able to proliferate and several genes
that alter quiescence depth also affect proliferation in nonquiescent
cells, such as genes in the Rb-E2F pathway ([82]31, [83]32).
For genes whose knockdown was predicted to shift the cell to deeper
quiescence (“quiescence reducers”), a remarkable percentage agree with
experimental data. Virtually, all these genes with CancerMine
annotations are considered cancer oncogenes or drivers (19/20 of the
top 20 predictions with annotations, 23/25 of the top 25 predictions
with annotations), which aligns with a role in driving quiescence to a
shallower state with more proliferative potential (Fig. [84]5B). The
enrichment for cancer driver genes in these predictions out of all
genes tested is shown in Fig. [85]5C along with the P-value calculated
using the hypergeometric function. Examining genes in this category
reveals that many are involved in lipid metabolism and cancer
progression. The genes fatty acid synthase (FASN) and fatty acid
elongase 1 (ELOVL1), for instance, are both involved in fatty acid
biosynthesis and cancer initiation ([86]33, [87]34), and the genes
cholestenol delta-isomerase (EBP), squalene epoxidase (SQLE), and
24-dehydrocholesterol reductase (DHCR24) represent a diverse set of
cholesterol biosynthesis-related oncogenes ([88]35).
For genes whose knockdown is predicted to shift the cell toward
shallower quiescence (“quiescence deepeners”), 8 of the top 21
predictions with annotations are classified as tumor suppressors. While
this enrichment does not reach statistical significance out of all
tested genes, one prediction is especially noteworthy for its alignment
with experimental data: mitochondrial pyruvate carrier 1 (MPC1), which
is listed among the top 20 genes whose knockout reduces quiescence
mostly strongly out of 2,953 tested genes. This gene has direct
experimental evidence for its inhibition lowering quiescence depth.
Both pharmacological and genetic inhibition of MPC1 activates quiescent
NSCs by shifting metabolism away from oxidative phosphorylation and
toward glycolysis ([89]36), in alignment with the simulated results.
Exploring other predictions in this category shows that several genes
are involved in bile acid synthesis, including 3 beta-hydroxysteroid
dehydrogenase type 7 (HSD3B7) and cholesterol 7α-hydroxylase (CYP7A1)
([90]37), and several are involved in nucleotide metabolism such as
purine nucleoside phosphorylase (PNP), guanylate kinase 1 (GUK1), and
adenosine deaminase (ADA) ([91]38–40).
The top 20 hits for both quiescence reducers and drivers along with
their CancerMine annotation are shown in Fig. [92]5D and E. The top 50
predictions for both quiescence reducers and deepeners are listed in
Table [93]S1.
Flux-based quiescence depth predictor captures metabolic trends associated
with quiescence deepening for multiple types of input data
To synthesize the metabolic variation underlying quiescence deepening
into a single score, we built a machine-learning model to predict QDS
using reaction fluxes. This model captures the metabolic trends behind
quiescence deepening in an easily interpretable manner. It also
accommodates multiple types of input data. An elastic net architecture
was chosen to build the most parsimonious model and strike a balance
between underfitting and overfitting the reaction coefficients (i.e.
too many and too few model coefficients).
The model's accuracy in predicting quiescence depth was validated both
internally and externally. Based on internal k-fold cross-validation,
the model accurately predicted quiescence depth across the entire range
of values when trained on 90% of the data and tested on the remaining
10% (Fig. [94]6C). To validate the accuracy of the model externally in
independent datasets, we examined six datasets involving nine distinct
experimental conditions for proliferating vs. quiescent cells
([95]42–45). Given that shallower quiescent cells are closer to
proliferation than deeper quiescent cells, we reasoned that quiescent
cells should have a higher QDS relative to proliferating cells. Here,
we observed a trend toward higher quiescence depth in seven of the nine
datasets and a higher quiescence depth in four of the nine datasets at
a P-value threshold of 0.1 (Fig. [96]6D) despite vastly different
experimental conditions than the data the model was trained on,
including interleukin 3 (IL-3) starvation in murine lymphocytes (Fig.
[97]6D, iii) and serum starvation in human cancer cells (Fig. [98]6D,
iv). The model, however, does not generalize to all forms of
quiescence, such as Mek inhibition and CDK4/6 inhibition, likely
because these are pharmacologically induced. It also does not
generalize to all cell types, such as liposarcoma cells (Fig. [99]6D,
vi), likely because of the vast differences in metabolism between
fibroblasts and adipocytes ([100]47).
Fig. 6.
[101]Fig. 6.
[102]Open in a new tab
Flux-based machine learning model captures metabolic signatures of
quiescence deepening for multiple types of input data. A) Diagram of
the machine learning model. Reaction fluxes are inputted, and the model
is trained to predict QDS (days since serum starvation) using elastic
net regression. B) Top subsystems enriched in the model's positive
coefficients (up with quiescence deepening) and negative coefficients
(down with quiescence deepening). The top 10 subsystems with Q-value <
0.05 for the hypergeometric fold enrichment test are shown here. C)
K-fold cross-validation accuracy plot. The model is trained and
optimized on 90% of the data and tested in the remaining 10% for 10
random subsets of the data. Note: all parameter are un-optimized in
training set except for fold-change threshold, which is set close to
one to maximize the impact of genes on fluxes. D) External validation
shows that the model predicts deeper quiescence (higher QDS) for cells
in a quiescent phase vs. a proliferative phase for multiple types of
quiescence and proliferation induction and multiple types of input
omics data (transcriptomics, translatomics, and proteomics). Fluxes are
generated using genes upregulated in quiescence relative to
proliferation vs. fluxes generated using genes upregulated in
proliferation relative to quiescence with a fold change threshold of
1.00. (i) Proliferation is induced using epidermal growth factor (EGF),
horse serum, insulin, and hydrocortisone. Quiescence is induced with
serum starvation, Mek inhibition, CDK46 inhibition, and contact
inhibition. Transcriptomics measured. Comparisons were performed with a
two-sample t test. Source: [103]GSE122927 ([104]41). (ii) Proliferation
is induced using EGF or tissue plasminogen activator (TPA). Quiescence
induced with serum starvation. Transcriptomics measured. Comparisons
were performed with a two-sample t test. Source: Ref. ([105]42). (iii)
QDS for proliferating vs. quiescent cells. Proliferation is induced by
IL-3. Quiescence is induced with IL-3 starvation. Proteomics measured.
Comparisons were performed with a paired t test. Source: Ref.
([106]43). (iv) QDS for proliferating vs. quiescent cells.
Proliferation is induced using EGF. Quiescence induced with serum
starvation. Translatomics measured. Comparisons were performed with a
paired t test. Source: [107]GSE112295 ([108]44). (v) Proliferation is
induced using fetal bovine serum (FBS). Quiescence induced with 7-day
contact-inhibition. Transcriptomics measured. Comparisons were
performed with a two-sample t test. Source: [109]GSE117444 ([110]45).
(vi) Proliferation is induced using FBS. Quiescence induced with serum
starvation. Transcriptomics measured. Comparisons were performed with a
two-sample t test. Source: [111]GSE74620 ([112]46).
One of the advantages of this flux-based approach is its ability to
accommodate multiple types of omics data. Using transcriptomics,
translatomics, and proteomics data as inputs, external validation
accurately predicted quiescent cells to have a deeper quiescence than
proliferating cells (Fig. [113]6D i, iii, iv). Features in the model
largely aligned with metabolic trends highlighted in Fig. [114]2:
reactions corresponding to heme metabolism, carnitine shuttle, and
fatty acid oxidation were enriched in the positive coefficients (up
with quiescence deepening) while reactions associated with fatty acid
biosynthesis were overrepresented in the negative coefficients (down
with quiescence deepening) (Fig. [115]6B). The full set of features is
listed in Table [116]S2.
Multi-cell type quiescence depth predictor accurately predicts quiescence
depth in a wide range of contexts
While the flux-based model provides both mechanistic interpretability
and flexibility in terms of data types that can be inputted, the large
number of computations it requires limits its application in external
datasets. Hence a second, complementary model that directly uses
transcriptomics data to predict QDS was also developed to facilitate
high-throughput screening of perturbations to modify quiescence depth
and explore quiescence in large external datasets. To build the model,
we compiled quiescence-deepening gene expression datasets from three
experimental conditions including rat embryo fibroblasts (REF)
([117]5), human retinal pigment epithelial cells (RPEs) ([118]48), and
human dermal fibroblasts (HFB) ([119]48) (Fig. [120]7A). Taken
together, these datasets encompass multiple species (rat and human),
multiple cell types (epithelial and fibroblast), and multiple
quiescence induction methods (serum starvation and contact inhibition).
Similar to before, quiescence depth was denoted as the days since
quiescence induction began, and each quiescence depth was normalized to
the maximum of each dataset. Transcriptomics gene expression was set as
the input to the model and the output was set as the normalized
quiescence depth. Like before, an elastic net architecture was chosen
to balance underfitting and overfitting the model.
Fig. 7.
[121]Fig. 7.
[122]Open in a new tab
Multi-cell type machine learning model accurately predicts QDS in a
wide range of contexts. A) Diagram of the machine learning model.
Transcriptomics from three datasets measuring quiescence deepening is
inputted, and the model is trained to predict QDS (days since serum
starvation, normalized to the maximum of each dataset) using elastic
net regression. B) GO pathway enrichment analysis for positive and
negative coefficients of model. The top eight biological processes for
negative coefficients (down in quiescence deepening, Q-value < 0.65)
and positive coefficients (up in quiescence deepening, Q-value < 0.05)
as ranked by Q-value of enrichment. C) K-fold cross-validation accuracy
plot. The model is trained and optimized on 90% of the data and tested
in the remaining 10% for 10 random subsets of the data. D) External
validation: model predictions for proliferating vs. quiescent cells in
various cell types and experimental conditions. (i) Proliferation is
induced using FBS. Quiescence induced with 7-day contact-inhibition.
Comparisons were performed with a two-sample t test. Source:
[123]GSE117444 ([124]45). (ii) Proliferation is induced using FBS.
Quiescence induced with serum starvation. Comparisons were performed
with a two-sample t test. Source: [125]GSE74620 ([126]46). (iii)
Proliferation is induced using EGF, horse serum, insulin, and
hydrocortisone. Quiescence is induced with serum starvation, Mek
inhibition, CDK46 inhibition, and contact inhibition. Comparisons were
performed with a two-sample t test. Source: [127]GSE122927 ([128]41).
(iv) QDS for proliferating vs. quiescent cells. Proliferation is
induced using EGF. Quiescence induced with serum starvation.
Comparisons were performed with a two-sample t test. Source:
[129]GSE112295 ([130]44). (v) Proliferation is induced using EGF or
TPA. Quiescence induced with serum starvation. Comparisons were
performed with a two-sample t test. Source: Ref. ([131]42).
As before, the model's accuracy was validated internally and
externally. For k-fold cross-validation, the model accurately predicted
quiescence depth across the entire range of values when trained on 90%
of the data and tested on the final 10% (Fig. [132]7C). To validate the
accuracy of the model in independent datasets, we examined five
external gene expression datasets involving eight distinct experimental
conditions for proliferating vs. quiescent cells ([133]42–45).
Strikingly, quiescent cells were accurately predicted to have a deeper
quiescence than proliferating controls for six of the eight conditions
tested. This was also observed across multiple types of growth
stimulation and multiple types of quiescence induction, including
contact inhibition, serum starvation, Mek inhibition, and CDK4/6
inhibition (Fig. [134]7D).
To gain a better understanding of the pathways captured in this model,
we performed a gene ontology (GO) enrichment analysis for the negative
features (down in quiescence deepening) and the positive features (up
in quiescence deepening) (Fig. [135]7B). Various ion transport pathways
are enriched in the positive model coefficients whereas several
immune-related and Wnt-related pathways are enriched in the negative
features. The full set of features is listed in Table [136]S3.
Changes in quiescence depth are predicted during stem cell aging and
embryogenesis
Fujimaki et al. ([137]5) observed a deepening of quiescence during stem
cell aging. Exploring this area with our multi-cell type machine
learning model, we examined quiescence depth across age for both
hematopoietic stem cells (HSCs) ([138]49, [139]50) and NSCs ([140]51).
For datasets with significant missing features, we reconstructed the
model using genes shared between that dataset and the training gene set
(see Methods).
For both NSCs and HSCs, we observed a trend reflecting a deepening of
quiescence with age (Fig. [141]8A and B). For HSCs, we applied the
model to a single-cell transcriptomics dataset measuring gene
expression in long-term (LT) HSCs, short-term (ST) HSCs, and
multipotent progenitor cells (MPPs). Intriguingly, we see a deepening
of quiescence in LT and ST HSCs with age but not MPPs, suggesting aging
is associated with cell-type-specific changes in quiescence depth. For
NSCs, the trend of increasing quiescence depth followed by a reversal
matches the nonmonotonic changes in NSC behavior with aging observed by
Apostolopoulou et al. ([142]51) (Fig. [143]8B).
Fig. 8.
[144]Fig. 8.
[145]Open in a new tab
Changes in quiescence depth are associated with stem cell aging and
embryogenesis. A) HSC aging: (i) Older HSCs have a trend toward higher
QDS than young HSCs. Source: [146]GSE109546. (ii) Older LT HSCs and ST
HSCs have higher QDS than young cells of the same type. No aging trend
was observed in MPPs. Single-cell RNA-seq of aging HSCs. Source:
[147]GSE59114. B) Median QDS and age correlate in mouse subventricular
zone (SVZ) neural stem cell (NSC) niche (Pearson correlation). Source:
[148]GSE104651. C) During embryogenesis, quiescence depth is predicted
to fall and then rise between the zygote and 32-cell state. Single cell
RNA-seq measured for preimplantation mouse blastomeres. Source:
[149]GSE136714. D) GO transcriptomic analysis of proliferation markers.
Shown here are the mean z-score-normalized transcript levels for
transcripts belonging to GO cell growth and GO stem cell proliferation.
While GO cell growth-related genes show no definitive pattern, stem
cell proliferation-related genes rise and then fall in a pattern that
(except for the zygote stage) aligns with the fall and rise in QDS seen
in early embryogenesis.
Examining quiescence in a different biological context, we investigated
quiescence depth during embryogenesis, specifically focusing on the
zygote to 32-cell state. Here, the model predicted a fall and then a
rise in quiescence depth between the zygote and 32-cell (Fig. [150]8C).
This suggests that during embryogenesis, cells are most proliferative
around the four-cell state and less so before and after. To corroborate
these findings, we examined the normalized expression levels of genes
belonging to two proliferation-related gene sets (Fig. [151]8D). While
cell growth gene expression shows no definitive trends, stem cell
proliferation gene expression (aside from the zygote stage) rises and
then falls during early embryogenesis, which aligns with the changes in
quiescence depth predicted by the machine learning model. Given that
embryonic cells are a type of stem cell, the stem cell proliferation
gene set serves as a more relevant validation of these predictions.
Taken together, our model implies that quiescence depth regulation is a
critical factor in both development and aging.
Various small molecules and genes predicted to alter quiescence align
strongly with experimental results
We next applied the multi-cell type QDS predictor to screen a range of
perturbations for their ability to modify quiescence depth. As input to
the model, we used data measured as part of the library of integrated
cellular signatures (LINCS) program that measures gene expression in
response to small molecule and genetic perturbations using the L1000
assay ([152]52) (Fig. [153]9A). Importantly, we reconstructed the model
using genes shared between the high confidence “landmark” L1000 gene
set and the training gene set.
Fig. 9.
[154]Fig. 9.
[155]Open in a new tab
Small molecules and gene knockouts predicted to alter quiescence align
strongly with experimental results. A) Overview of perturbation
screening: L1000 data measured as part of the library of integrated
network-based cellular signatures program for two classes of
perturbations (small molecules and gene knockouts) is inputted into the
QDS predictor to screen perturbations for their effect on quiescence
depth. B) Top small molecules whose knockout is predicted to alter
quiescence depth. For each of the 1,713 small molecules tested,
conditions are controlled to a 24-h timepoint and 10 μM dose across
multiple cell lines. Top 25 predictions with Q-value < 0.05 shown here,
as ranked by magnitude of quiescence depth score modulation. Q-value is
the Benjamin–Hochberg-corrected P-value of a two-sample t test between
predicted QDS for the control condition (drug vehicle) and treatment
condition (small molecule compound). C) Top genes whose knockout is
predicted to alter quiescence depth (Q-value < 0.05). D) Enrichment of
tumor suppressors among quiescence deepeners (genes whose knockout
lowers QDS and increases proliferative potential), and enrichment of
oncogenes/cancer drivers among quiescence reducers (genes whose
knockout raises QDS and decreases proliferative potential). P-value
calculated using the hypergeometric function. Results are shown at
three Q-value thresholds for selecting top gene knockout hits: 0.05,
0.1, and 0.85. Cancer annotations sourced from the CancerMine database.
Genes with a majority of tumor suppressor annotations in CancerMine are
classified as tumor suppressors, and genes with a majority of cancer
driver or oncogene annotations are classified as cancer drivers.
First, we screened all 1,713 small molecules in the LINCS dataset for
their ability to alter quiescence depth. For each molecule, QDS was
predicted and compared to QDS values predicted for a vehicle control
with a two-sample t test. The ratio between the average QDS in the
treatment group and the average QDS in the control group was then
computed. At a false discovery rate (FDR) of 5%, 488 small molecules
were predicted to raise or lower quiescence depth. The top 25
predictions are highlighted in Fig. [156]8B, and the full list of
compounds is listed in Table [157]S4.
Among the top compounds predicted to deepen quiescence,
cyclin-dependent kinase (CDK), histone deacetylase (HDAC), and
topoisomerase II (TOPII) inhibitors are common (Fig. [158]9B). In
alignment with their predicted role as quiescence deepeners, many of
the identified small molecules predicted to deepen quiescence have
direct experimental evidence for inducing quiescence through cell cycle
arrest, including the CDK inhibitors R-547 ([159]53), AT-7519
([160]54), and AZD-5438 ([161]55). Others may be novel compounds to
alter quiescence. Of the 488 small molecules predicted to alter
quiescence, 51 are predicted to lower quiescence depth. While several
of these compounds have antiproliferative activity and are thus likely
false positives, the hormone estradiol is among the compounds predicted
to lower quiescence, which aligns with the proliferation-promoting
effects of estrogen ([162]56).
Next, we screened all 53 gene knockouts in the LINCS dataset for their
ability to modulate quiescence. QDS values predicted for each gene
knockout were compared to QDS predicted for an empty gene vector
control. As with the metabolic model knockouts, we used the CancerMine
database to validate our predictions. Here, for both directions of QDS
modulation, a striking percentage of genes were validated with
experimental cancer data. At an FDR of 5%, of the genes with
annotations, all genes whose knockout lowered QDS are annotated as
tumor suppressors, and all genes whose knockout raised QDS are
considered cancer drivers (Fig. [163]9C).
For the genes whose knockout is predicted to lower quiescence depth,
the strongest prediction was made for tumor protein p53 (TP53), a gene
famous for its tumor-suppressive and antiproliferative properties
([164]57). At more lenient differential QDS thresholds, three of the
top five genes whose knockout lowers QDS are tumor suppressors, an
enrichment of 3.86-fold over the background gene set (P-value 0.021).
For genes whose knockout is predicted to deepen quiescence, results are
similarly striking. At Q-value thresholds up to 0.3, all genes whose
knockout is predicted to raise quiescence depth are considered
oncogenes or cancer drivers. Among the top predictions are several
notorious oncogenes including proto-oncogene C-Myc (MYC), which is
dysregulated in over 50% of human cancers ([165]58), and proto-oncogene
B-Raf (BRAF), whose mutation is associated with nearly 7% of all human
cancers ([166]59). The top gene knockout hits for both directions of
QDS alteration are shown in Fig. [167]9C along with their CancerMine
annotation. The full list of genes is listed in Table [168]S5.
Discussion
Here, we employed two distinct yet complementary approaches to identify
modulators of quiescence depth: in silico gene knockouts in metabolic
models, and a multi-cell type quiescence depth predictor that
facilitates perturbation screening. In doing so, we also simulated
metabolic and epigenetic changes underlying quiescence deepening,
revealing substantial shifts in lipid metabolism and epigenetic
modifications as quiescence shifts from a shallow to a deep state. For
both screening approaches, top quiescence-altering candidates align
strongly with experimental data and suggest various pathways capable of
shifting quiescence toward a shallower or deeper state.
The most striking metabolic trends identified during quiescence
deepening involve lipid metabolism, namely the fall in various kinds of
lipid anabolism and the rise of lipid catabolism, particularly fatty
acid oxidation. In alignment with these results, fatty acid oxidation
is a key hallmark of quiescence across multiple cell types. Comparing
the metabolomics of proliferating vs. quiescent fibroblasts, Yao et al.
([169]20) observed that quiescent fibroblasts showed higher levels of
carnitine-shuttle-dependent fatty acid oxidation. Further, they
observed lower rates of complex-lipid biosynthesis in quiescent cells.
Additional studies have shown that in both fibroblasts ([170]60) and
NSCs ([171]61), quiescent cells are so reliant upon fatty acid
oxidation for energy production that inhibiting the pathway impairs
cell viability. Both lend experimental evidence to a metabolic shift
from fatty acid anabolism to catabolism as cells transition from
proliferation to progressively deeper stages of quiescence.
Based on epigenetic simulations, similarly strong trends were
identified in histone modifications with acetylation falling and
methylation rising as quiescence deepens. Quiescence is typically
associated with reduced gene expression and more compact chromatin
architecture. It is also associated with changes in histone
methylation, although the direction of this change is variable and
depends on the specific histone residue ([172]26). For acetylation, a
clearer pattern is evident. In agreement with the simulated results,
quiescence induction is associated with a reduction in histone
acetylation across multiple cell types and histone residues. Since
acetylation promotes a more accessible chromatin structure, this
reduction—in part—mediates the transition to lower levels of gene
expression during the transition from proliferation to quiescence
([173]26). An investigation into the metabolic pathways driving the
trends in histone modifications revealed that shifts in fatty acid
metabolism—higher fatty acid oxidation, lower fatty acid
biosynthesis—in quiescence could explain the observed trends and
highlight the strong interplay between metabolism and epigenetics.
Through our modeling, we showed how quiescence-inducing perturbations
such as serum starvation can result in metabolic changes that influence
epigenetic modifications, which alter gene expression and ultimately
result in withdrawal from the cell cycle.
Given the alteration in quiescence depth observed in aging and disease
([174]5, [175]60, [176]62), identifying chemical and genetic modulators
of quiescence may be key to addressing these conditions. In this study,
we identified potential modulators of quiescence depth using two
approaches. Using rMTA, we identified genes whose knockout could
robustly shift the metabolic state of a cell to a deeper or shallower
state of quiescence. Particularly for quiescence reducers, results
align well with experimental data. Notably, several genes predicted to
affect quiescence depth here are involved in lipid metabolism, aligning
with the shifts in lipid metabolism predicted in quiescence deepening.
We also developed two complementary quiescence depth predictors: a
flux-based predictor that captures metabolic variations in quiescence
deepening and a gene expression-based predictor based on multiple cell
types and quiescence induction methods. Both models accurately
predicted quiescent cells to have higher quiescence depth than
proliferating cells for multiple types of growth induction, multiple
types of quiescence induction, and multiple types of cells, suggesting
they are generalizable to contexts beyond the datasets they were
trained. Applying the multi-cell type model to screen L1000
perturbations generated predictions that align well with experimental
data for both small molecules and gene knockouts. It also predicted
changes in proliferative ability during embryogenesis, suggesting
quiescence depth is an important biological factor from the earliest
stages of development onto later aging processes. Interestingly,
several small molecules whose knockout is predicted to affect
quiescence depth target histone deacetylases (HDACs), underscoring the
link between epigenetics and quiescence deepening, and connecting the
metabolic modeling and knockout screening.
Several limitations to this study should be noted. Metabolic
simulations are a powerful hypothesis-generating tool, and although
several predictions align with experimental datasets, more
comprehensive experimental validation is necessary to confirm their
accuracy. Moreover, while GEMs are a well-established tool to study
metabolism, their predictions are limited to the genes and metabolic
pathways that are well described in literature, which represents only a
subset of mammalian metabolism. The small molecule and genetic
perturbations will also require further validation to confirm their
ability to alter quiescence depth.
In sum, we leverage two approaches to identify perturbations with
strong potential to alter quiescence depth and with relevance to cancer
and aging. We also show that GEMs can capture known aspects of
quiescence deepening across metabolic and epigenetic levels and
simulate the interplay between metabolic networks and epigenetic
modifications. Our multi-cell type quiescence depth predictor is
generalizable to a wide range of contexts. This study underscores the
power of metabolic modeling and machine learning to characterize and
predict modulators of cell states.
Methods
Differential expression analysis
To identify differentially expressed genes found in omics datasets for
integration into the metabolic model, we employed a fold change-based
approach. For this approach, the ratio between two groups was computed
and transcripts with a fold change above a certain threshold were
considered upregulated and those with a fold change below a certain
threshold downregulated. A fold change threshold of 1.00 was used
unless otherwise noted.
Flux prediction
To simulate metabolic reaction fluxes, we used two complementary
approaches: flux sampling and a linear version of the iMAT algorithm
([177]18, [178]24). For both approaches, we used the Rat-GEM metabolic
model ([179]16), which was chosen to match the biology of the rat
embryonic fibroblast cells in which the quiescence deepening
transcriptomics was measured. This model contains 2,953 genes, 8,377
metabolites, and 13,028 reactions.
For flux sampling, quiescence depth-specific metabolic models were
created for nine stages of quiescence depth with three samples each by
incorporating highly and lowly expressed genes to constrain the models
using INIT ([180]17, [181]63). Highly expressed genes were defined as
genes whose quantile-normalized, log2-normalized value was at least the
mean plus alpha times the standard deviation in at least beta percent
of the three quiescence depth replicates ([182]Eq. 1). Lowly expressed
genes were defined similarly ([183]Eq. 2).
[MATH: Geneup≥sum(Log2(Geneabun.+
1)≥(mean+alpha*st.dev.))≥beta*3. :MATH]
(1)
[MATH: Genedown≥sum(Log2(Geneabun.+
1)≤(mean−alpha*st.dev.))≥beta*3.<
/mtable> :MATH]
(2)
Alpha was chosen as the minimum value that generated metabolic models
that could obtain a feasible flux solution (described in the next step)
through flux sampling for all nine quiescence depths. Alpha was raised
in increments of 0.1 until a value of 1.1 was determined. Beta was
chosen as 0.6 (meaning genes had to meet the specified criteria in at
least two of the three replicates) to balance capturing changes in
transcripts over time and still imposing a robust threshold for
selecting up and down genes.
We then used coordinate hit-and-run with rounding (CHRR) ([184]64) to
perform flux sampling on the quiescence-specific metabolic models for
each stage of quiescence depth. 1,000 flux samples were generated for
each quiescence depth. The Gurobi mathematical solver (version 9.5,
Linux) was used to solve for the reaction fluxes on a high-performance
computing cluster ([185]65).
While flux sampling has advantages, including the absence of a
presupposed objective function and a more comprehensive accounting of
alternate flux solutions, it also takes considerable computational
resources and time. Because of this, we also simulated reaction fluxes
with a simpler approach using a linear version of the iMAT algorithm
and compared the results to flux sampling (Fig. [186]S4). This approach
assumes a steady-state metabolism and solves for the reaction fluxes by
optimizing an objective function ([187]66). Here, reactions associated
with upregulated genes are assigned higher flux values while reactions
associated with downregulated genes are assigned lower values. Up and
down genes are determined using the fold change approach described
above based on differential expression compared to the lowest
quiescence depth (QDS = 2); hence, eight quiescence depth scores were
used with three replicates each (since the first quiescence depth was
used as the reference). The Gurobi mathematical solver (version 9.0,
macOS) was used here to solve for the reaction fluxes ([188]65). Based
on a subsystem analysis, we found that the predicted metabolic changes
during quiescence deepening were largely similar between the linear
iMAT algorithm and flux sampling (Fig. [189]S4).
The linear version of the iMAT algorithm included three input
parameters: kappa (weight for downregulated genes), rho (weight for
upregulated genes), and epsilon (minimum absolute value for upregulated
flux). To determine which parameter values to use, different ranges of
values were tested to determine which parameter set minimized
three-fold cross-validated mean squared error in predicting quiescence
depth through elastic net regression. The same approach was used to
choose the objective function to optimize for during flux balance
analysis (FBA) (biomass, adenosine triphosphate [ATP], nicotinamide
adenine dinucleotide [NADH], and glutathione were tested, and biomass
was chosen) and the optimal fold change threshold for selecting
upregulated and downregulated genes from the quiescence deepening
transcriptomics to input into iMAT (1.008695652). The bounds of the
fold change threshold optimization range were set close to 1 (1 to 1.2)
to maximize the impact of genes on the fluxes and separate conditions
in external datasets.
Histone methylation and acetylation simulation
To simulate histone methylation and acetylation, reactions were added
to the rat-GEM metabolic model representing these histone modifications
as described in Fig. [190]4A ([191]24, [192]25). This simulation
modeled the substrate metabolite abundance to predict histone
modification flux. Then the objective value for this reaction was
modified to a nonzero value and flux predictions were made for each QDS
using the linear version of the iMAT algorithm described above. Linear
iMAT was chosen over flux sampling as quiescence-associated fluxes were
similar between flux sampling and iMAT and because this approach
requires an objective function value for the histone-modification
reaction to be a nonzero value (since the histone-modification reaction
does not contribute to biomass optimization). The exact objective
function value for the histone-modification reaction was determined by
identifying the range of values that separated the histone-modification
fluxes across QDS (i.e. fluxes that were not all identical) and
resulted in biomass flux values that were on the same order of
magnitude or higher than the flux through the histone-modification
reaction (meaning the histone objective was not overoptimized relative
to biomass). Figure [193]S5 shows histone-modification fluxes and
biomass fluxes meeting these criteria for multiple objective function
values.
Subsystem enrichment analysis
To perform the metabolic subsystem enrichment analysis, the proportion
of reactions belonging to a particular subsystem within the highly
correlating-with-quiescence depth reactions was divided by the
proportion of reactions associated with that particular subsystem out
of the total set of reactions. This was done separately for the
positively correlating and negatively correlating reaction sets. The
hypergeometric function was used to calculate the P-value for the
enrichment of each subsystem. These P-values were corrected with the
Benjamini–Hochberg FDR adjustment.
rMTA gene knockout screening
For the rMTA gene knockout screening ([194]29, [195]64), we generated
metabolic models for shallow quiescence (QDS of 2 to 3 days) and deep
quiescence (QDS of 12 to 14 days) using INIT and selecting up and down
genes as described above in the Flux prediction section. We performed
two rounds of rMTA: one in which the source state was shallow
quiescence and the target state was deep quiescence and another in
which the source state was deep quiescence and the target state was
shallow quiescence. For each round, the following procedure was used:
1. For the source state, we generated fluxes through flux sampling
with CHRR using the same parameters described in the Flux
prediction section ([196]30, [197]64).
2. Using differential transcriptomics, we determined the transcripts
that changed between the source state and target state at an FDR
value of 0.05 (Benjamini–Hochberg-corrected P-value of two-sample t
test between the source and target state).
3. We mapped these changes in transcripts to changes in metabolic
reactions using gene-reaction mapping in the metabolic model.
4. We knocked out each gene in the metabolic model and determined its
potential to shift these metabolic reactions from the source to the
target state.
The rMTA screening was performed using the Gurobi mathematical solver
(version 9.5, Linux) with default parameters on a high-performance
computing cluster ([198]65). Gene knockout predictions were validated
using CancerMine as described below in CancerMine validation.
CancerMine validation
For the CancerMine database approach, gene annotation as a tumor
suppressor, oncogene, or cancer driver was used to validate predictions
([199]30).
To validate genes whose knockdown was predicted to lower quiescence
depth (i.e. raise proliferative potential), the following criteria were
used:
1. In the CancerMine database, a majority of annotations for this gene
were tumor suppressors (genes whose downregulation increases cancer
growth)
To validate genes whose knockdown was predicted to raise quiescence
depth (i.e. lower proliferative potential), the following criteria were
used:
1. In the CancerMine database, a majority of annotations for this gene
were either oncogenes or cancer drivers (genes whose downregulation
lowers cancer growth)
If no annotations were present or if annotations were split 50% tumor
suppressor, 50% oncogene, or driver, validation for that gene was
considered inconclusive.
Machine learning model creation and optimization
To build the flux-based QDS model, all GEM reaction fluxes generated by
the linear version of iMAT were set as the input, and the output was
set as QDS (days since serum starvation) without normalization to [0,
1]. The linear version of iMAT was chosen as the input because its
predictions aligned well with flux sampling (see Fig. [200]S4) while
using considerably fewer computational resources. An elastic net
architecture was chosen to strike a balance between underfitting and
overfitting the reaction coefficients. Underfitting would impair the
accuracy of the model in the training dataset and overfitting would
hinder the generalizability of the model in external datasets, both
outcomes we sought to avoid. The elastic net architecture accomplishes
this by minimizing model error while also minimizing both the sum of
the magnitude of coefficients and the sum of the square of coefficients
(so-called L1 and L2 penalties, also known as lasso and ridge
penalties). This ensures that only the most salient and generalizable
features are incorporated into the model.
The model was implemented in MATLAB with the lasso function and
accepted two key parameters: lambda, the regularization coefficients,
and alpha, the ratio between L1 and L2 optimization (or lasso and ridge
optimization). Both alpha and lambda were optimized to minimize
three-fold cross-validated mean squared error in predicting QDS. The
fold change threshold, objective function, and inputs to the linear
version of iMAT were optimized similarly, as described in the Flux
prediction section (Table [201]S6). In total, 838 metabolic reactions
were selected by elastic net regression in this model (i.e. they had
nonzero coefficient values) out of 13,028 total reactions used in the
training set.
To ensure the flux-based model was robust to the data integration
method, we predicted quiescence depth across the entire range of values
for fluxes generated via INIT and flux sampling (Fig. [202]S7) as
described in the Flux prediction section. Here, the model accurately
predicted a deepening of quiescence for these fluxes despite vastly
different transcriptomics incorporation methods.
To build the multi-cell type transcriptomics QDS predictor, we
aggregated three transcriptomics datasets capturing quiescence
deepening over time ([203]5, [204]48, [205]67). Each dataset was
converted to transcripts per million (TPM) and quantile-normalized to
align the distributions between datasets (Fig. [206]S6). A count filter
was applied to omit any transcripts with a coverage of <10.
Proliferating controls were also omitted. Then, principal component
analysis was performed to identify outlier conditions, and any outliers
were removed (Fig. [207]S6). Finally, each transcript was z-score
normalized across conditions. This resulted in 79 total samples across
the three datasets. Normalized gene expression was set as the input to
the model and QDS was set as the output.
For each sample, QDS was defined as the days since quiescence induction
began. Quiescence depth was normalized to the maximum of each dataset,
so all values were in the range of 0 to 1 (Fig. [208]S6).
To construct the machine learning model, an elastic net regression
model was built and optimized to predict QDS based on transcript
levels. As before, an elastic net architecture was chosen to strike a
balance between underfitting and overfitting the model coefficients.
The model was also implemented in MATLAB with the lasso function and
accepted two key inputs: lambda, the regularization coefficients, and
alpha, the ratio between ridge and lasso optimization. Both alpha and
lambda were optimized to minimize 10-fold cross-validated mean squared
error in predicting QDS (Table [209]S7). In total, 1,306 genes were
selected by elastic net regression in this model out of 9,304 genes in
the training set.
To evaluate both models internally, we performed 10-fold
cross-validation in which a random 90% of the data was used to train
and validate the model, and the remaining 10% of the data was used as a
test set to evaluate the model's accuracy. The trained and optimized
model had never seen this last 10%. This process was repeated 10 times
until all samples had been in the test group.
Enrichment analysis of transcriptomic elastic net model coefficients
To perform GO enrichment of the multi-cell type elastic net model QDS
predictor, we used the R package clusterProfiler ([210]68). We searched
for GO biological processes that were enriched in the positive
coefficients (up with quiescence deepening) and negative coefficients
(down with quiescence deepening). For the background gene set, we used
the list of genes used for training the elastic net model.
Application of models to external datasets
For external datasets, raw counts were filtered to omit genes with a
count of <10, and each condition was size-normalized. For single-cell
RNA-sequencing data, low coverage values were imputed using the
algorithm Markov affinity-based graph imputation of cells (MAGIC)
([211]69).
If fewer than 40% of model coefficient genes were found in a dataset
after filtering, the elastic net model was reconstructed using genes
shared between the training set and the external dataset using the same
approach described in the Machine learning model creation and
optimization section. This procedure was performed only for the dataset
[212]GSE109546.
L1000 perturbation screening
Level 5 (consensus z-score signature) L1000 gene expression data from
the LINCS phase II ([213]52) dataset was read into MATLAB using
CMAP-associated scripts ([214]70). Genes were filtered to include only
the directly measured landmark gene set. The multi-cell type QDS
predictor was reconstructed using only the genes common to the training
set and the landmark gene set with the same approach described in the
Machine learning model creation and optimization section.
Small molecule screening
One thousand seven hundred and thirteen small molecules were tested,
and conditions were controlled to a 24-h
post-small-molecule-administration timepoint and 10 μM dose. Multiple
cell lines were used for each small molecule. QDS predictions were made
with the multi-cell type QDS predictor for each small molecule and
compared to QDS predictions for a vehicle control with a two-sample t
test. Small molecules with a significant difference in mean QDS between
the treatment and control group were highlighted.
Gene knockout screening
Fifty-three gene knockouts were tested, and all conditions were
measured 96 h postgene-knockout. Multiple cell lines were used for each
gene knockout. QDS predictions were made with the multi-cell type QDS
predictor for each gene and compared to QDS predictions for an empty
gene vector control with a two-sample t test. Gene knockouts with a
significant difference in the mean QDS between the treatment and
control group were highlighted and validated as described in the
CancerMine validation section.
Quantification and statistical analysis
Statistical tests were performed in MATLAB (version R2022A and R2022B)
and R (version 4.3.0). All correlations were computed with the Pearson
method. A two-tailed t test was used to determine the statistical
significance of differences between groups based on the independence of
observations in each sample, the normality of samples, similarity in
sample variance, and random sampling of data. The sample size was
determined based on the number of replicates in publicly available
datasets and listed in figure captions for each comparison (no power
analyses were performed). A paired t test was used to compare
quiescence depth scores generated through the fold change approach (as
these were paired samples). Enrichment significance was calculated
using the hypergeometric test. P-values were corrected for multiple
hypotheses through the Benjamini–Hochberg FDR correction. A P-value of
0.05 was used as a threshold for significance.
Supplementary Material
pgae013_Supplementary_Data
[215]Click here for additional data file.^ (9.8MB, zip)
Contributor Information
Alec Eames, Department of Biomedical Engineering, University of
Michigan, Ann Arbor, MI 48109, USA.
Sriram Chandrasekaran, Department of Biomedical Engineering, University
of Michigan, Ann Arbor, MI 48109, USA; Program in Chemical Biology,
University of Michigan, Ann Arbor, MI 48109, USA; Rogel Cancer Center,
University of Michigan Medical School, Ann Arbor, MI 48109, USA; Center
for Bioinformatics and Computational Medicine, University of Michigan,
Ann Arbor, MI 48109, USA.
Supplementary Material
[216]Supplementary material is available at PNAS Nexus online.
Funding
This work was supported by faculty start-up funds from the University
of Michigan (UM), Camille and Henry Dreyfus Foundation, and the
National Institute of General Medical Sciences (NIGMS) R35 GM13779501
to S.C.
Author Contributions
A.E. and S.C. conceived the study and wrote the manuscript. A.E.
performed research with supervision by S.C.
Data Availability
All datasets used in this study were downloaded from public
repositories. Transcriptomics data used to generate fluxes and train
the quiescence depth predictors are available from GEO under accession
numbers [217]GSE124109 ([218]5), [219]GSE129964 ([220]67), and
[221]GSE179848 ([222]48). Omics data used to validate the models,
explore quiescence in other contexts, and perform perturbation
screening are available under the following GEO accession numbers:
[223]GSE117444 ([224]45), [225]GSE74620 ([226]46), [227]GSE122927
([228]41), [229]GSE112295 ([230]44), [231]GSE109546 ([232]49),
[233]GSE104651 ([234]51), [235]GSE136714 ([236]71), [237]GSE59114
([238]50), and [239]GSE70138 ([240]52). All code used to generate the
results of this study is available at the following Github repository:
[241]https://github.com/sriram-lab/quiescence-depth-modulation.
References