Abstract
Altered metabolism is a hallmark of cancer, but little is still known
about its regulation. In this study, we measure transcriptomic,
proteomic, phospho-proteomic and fluxomics data in a breast cancer
cell-line (MCF7) across three different growth conditions. Integrating
these multiomics data within a genome scale human metabolic model in
combination with machine learning, we systematically chart the
different layers of metabolic regulation in breast cancer cells,
predicting which enzymes and pathways are regulated at which level. We
distinguish between two types of reactions, directly and indirectly
regulated. Directly-regulated reactions include those whose flux is
regulated by transcriptomic alterations (~890) or via proteomic or
phospho-proteomics alterations (~140) in the enzymes catalyzing them.
We term the reactions that currently lack evidence for direct
regulation as (putative) indirectly regulated (~930). Many metabolic
pathways are predicted to be regulated at different levels, and those
may change at different media conditions. Remarkably, we find that the
flux of predicted indirectly regulated reactions is strongly coupled to
the flux of the predicted directly regulated ones, uncovering a tiered
hierarchical organization of breast cancer cell metabolism.
Furthermore, the predicted indirectly regulated reactions are
predominantly reversible. Taken together, this architecture may
facilitate rapid and efficient metabolic reprogramming in response to
the varying environmental conditions incurred by the tumor cells. The
approach presented lays a conceptual and computational basis for
mapping metabolic regulation in additional cancers.
Subject terms: Cancer genomics, Computational models, Regulatory
networks
Introduction
Cancer cells adapt their metabolism to facilitate biomass formation to
support their rapid proliferation. Transcriptional regulation alone
does not account for many of the metabolic alterations observed in
cancer^[42]1,[43]2, suggesting that post-transcriptional,
post-translational and protein phosphorylation mechanisms may play an
important role in modulating cancer metabolism and determining cancer
cell phenotypes^[44]3–[45]6. Here we aim to chart the transcriptional,
post-transcriptional and post-translational regulation of MCF7 breast
cancer cell metabolism on a genome scale. This is performed via
measurements of multi-omics data employing MCF7 breast cancer cells
under three different in vitro growth conditions, and its analysis via
an integration of this data within a genome scale metabolic model
(GSMM) of human metabolism. Our approach is inspired by previous
large-scale omics studies of the multi-level regulation of bacterial
metabolism^[46]7–[47]9 and yeast^[48]10, which have advanced our
understanding of the organization and regulation of metabolism in these
organisms.
Genome scale metabolic modeling is an increasingly widely used
computational framework for studying metabolism. Given the GSMM of a
species alongside contextual information such as growth media and omics
data, it has been shown that one can fairly reliably predict numerous
metabolic phenotypes, including cells’ growth rates, metabolite uptake
and secretion rates and internal fluxes, gene essentiality, and more.
Over the last few years, GSMMs have successfully served as a basis for
many computational studies of cancer, e.g.^[49]11–[50]16. GSMMs have
also been used to predict post-transcriptional regulation of metabolic
enzymes in healthy tissues^[51]17 but going beyond that to
systematically analyze metabolic regulation in cancer is addressed here
for the first time to the best of our knowledge.
Results
Data collection and preliminary model-free analysis
We collected omics measurements in MCF7, a breast cancer cell line,
grown under three different conditions: (1) Minimum Essential Medium
(MEM) with glucose and without glutamine (MEM-Gln), (2) MEM with
glucose and glutamine (MEM) and (3) MEM with glucose, glutamine and
supplemented with Oligomycin – an inhibitor of ATP synthase that
inhibits cell respiration (MEM+Oli). The media were chosen because they
reflect multiple stress conditions for the cell: one media (glutamine
deprivation) is chosen because MCF7 cells rely on glutamine as the main
source of energy, and the other media (supplement of Oligomycin) is
chosen because it emulates tumor hypoxic conditions.
The measurements were repeated twice under each condition at two time
points - after 8 and 24 hours, resulting in overall 6 × 2 multi-omics
datasets. Each such dataset includes the gene-expression of 1372
metabolic genes, proteomics for 486 metabolic enzymes (~97% of the
measured enzymes have gene expression values), phosphorylation values
for 71 phosphorylation sites on metabolic enzymes, and flux
measurements of 44 metabolic reactions (see methods). To obtain flux
measurements, we fitted all the data obtained through
spectrophotometric measurements and ^13C assisted metabolomics
experiments using our in-house developed software that simulates
dynamics of metabolites ^13C labeling, Isodyn^[52]18–[53]22. Fitting
the data allows determining the metabolic flux profiles of MCF7 breast
cancer cells under three different growth conditions (see methods).
Figure [54]1 summarizes the qualitative changes in the metabolites and
their analysis using Isodyn. The analysis demonstrates a decrease in
the fluxes of glycolysis, lactate production, pentose phosphate pathway
(PPP) activity, tricarboxylic acid cycle (TCA) cycle utilization and
fatty acid synthesis when the cells are at MEM-Gln growth condition
compared to MEM. Moreover, increased pyruvate cycle, which is the
conversion of pyruvate to oxaloacetate via pyruvate carboxylase
followed by its conversion to malate and consequently back to pyruvate
via malic enzyme, occurs mainly in MCF7 cells at MEM-Gln condition
compared to the MEM growth condition. On the other hand, in the MEM+Oli
growth condition, increased glycolysis, lactic acid fermentation and
pyruvate cycle are observed compared to the MEM growth condition,
together with decreased TCA cycle activity, PPP and lipogenesis. All
measured and estimated fluxes and their values are listed in SI
Table [55]1.
Figure 1.
[56]Figure 1
[57]Open in a new tab
Metabolic flux map of MCF7 breast cancer cells under MEM-Gln or MEM+Oli
growth conditions compared to MEM condition. The fluxes were estimated
by using Isodyn software. In each growth condition, the calculated flux
was normalized against the flux of MEM growth condition in order to
calculate the net change.
To obtain a genome wide view of pathway-level differences in the
transcriptional data across the different growth conditions, we first
compared (using a t-test) the metabolic gene expression values between
the different growth conditions to identify metabolic pathways that
were significantly up or down regulated in any of these conditions
compared to the others. We found that upon oligomycin treatment,
carnitine shuttle pathway is downregulated compared to the other growth
conditions, as well as the urea cycle/amino group metabolism pathway.
On the other hand, fatty acid activation and C5-Branched dibasic acid
metabolism (among other pathways) were found to be elevated upon such
treatment - a full table listing the significant growth
condition-specific changes is provided in SI Table [58]2(a–c), all
p-values were FDR corrected for 0.05). A similar analysis of the
proteomics data revealed different results. While carnitine shuttle
pathway activation was consistent with the gene expression analysis,
the fatty acid pathways (activation, elongation and oxidation) were now
found to be downregulated upon Oligomycin treatment. These results,
consistent with previous observations both in yeast^[59]23,[60]24 and
in human^[61]25,[62]26, point to the significant differences between
the mRNA and protein levels of many metabolic enzymes and call for a
systematic study of their potential functional regulatory implications.
Overview of the metabolic modeling based analysis
Our main goal in this study is to use the measured multi-omics data to
systematically chart the different layers of metabolic regulation in
breast cancer cells that orchestrate the actual metabolic flux across
the network’s reactions occurring in each growth condition. Ideally,
measuring the actual fluxes in each condition directly via tracing
experiments would be adequate, but obviously, this can currently be
done only for a small number of fluxes that are mainly involving
central cell metabolism. Hence, alternatively, we integrated the
various omics data measured in each growth condition within a genome
scale model of human metabolism^[63]27 to infer the likely metabolic
fluxes given these data in a genome wide manner. After an initial
validation of these predictions, we proceeded to compare the flux
predictions of the resulting reactions to the corresponding enzymes’
omics data to identify their regulation. This is performed in a
stepwise manner as follows (Fig. [64]2):
1. GSMM based identification of transcriptional and translational
directly regulated reactions: We first identify reactions that are
directly regulated – that is, reactions whose model-based predicted
flux alterations across the different conditions studied can be
accounted for by molecular alterations at any one of the levels
measured: those include reactions that are primarily
transcriptionally regulated and primarily translationally
regulated. These assignments are done in a mutually exclusive
manner, as follows: (1) transcriptionally regulated reactions (TR)
are those reactions whose enzymes’ gene expression levels match the
predicted fluxes. (2) translationally regulated reactions (TL) are
those reactions whose predicted flux levels do not match their gene
expression levels, but they match the protein levels of their
enzymes.
2. GSMM based identification of post-translationally directly
regulated reactions: Post-translationally regulated reactions’
(PTL) assignments are given to the reactions where both the
enzymes’ gene expression and proteomics levels do not match the
predicted flux levels but the predicted flux levels across the
different growth condition can be significantly associated with
changes in the phosphorylation levels of the enzymes.
3. Building machine learning predictors of additional directly
regulated reactions: For the majority of the metabolic reactions,
however, we did not find omics evidence testifying that they are
directly regulated at any of these three levels. One major reason
for that may be the limited scope of the proteomics and
phospho-proteomics measurements. We, therefore, built machine
learning based predictors of TR and TL regulation based on the
reactions that have already been labeled as such via the
model-based analysis in step (1). Then, we applied these predictors
in a genome wide manner to further identify sets of reactions that
are predicted to be TR or TL regulated (detailed below). We then
performed various genome wide analyses to further test and validate
the veracity of these predictions.
4. Identifying stochiometrically coupled, indirectly regulated
reactions: Finally, even after this prediction step, a large set of
reactions still remains unassigned and are labeled as indirectly
regulated. A major likely source of such indirect regulation is
metabolic regulation^[65]28, which manifests itself in the
stoichiometric coupling of the fluxes of different reactions across
the metabolic network, and which we study further using the human
metabolic model.
Figure 2.
[66]Figure 2
[67]Open in a new tab
Systematic identification of reactions’ regulation: Step 1: Using
gene-expression and proteomics data to predict transcriptionally and
translationally regulated reactions. Step 2: Using phospho-proteomic
data to predict post-translationally regulated reactions. Step 3: Based
on the results of step 1, build predictors of TR and TL regulation.
Step 4: Identifying indirectly regulated reactions that are
metabolically regulated via stoichiometric coupling.
Below we provide a detailed description of each of these four steps and
the results they uncover.
Step 1: Identifying transcriptionally regulated (TR) and translationally
regulated (TL) reactions
We first aimed to predict the fluxes of the reactions in each
condition, to determine which reactions are directly regulated and at
what level they are regulated. To this end, we used iMAT (the
integrative Metabolic Analysis Tool)^[68]17, a computational method
that systematically predicts metabolic fluxes in a GSMM by
incorporating omics data (transcriptomics and/or proteomics) that
represent the activity level of the metabolic enzymes. iMAT considers
the gene expression or protein levels as cues for the likelihood that
the enzymes in question carry a metabolic flux in their associated
reactions. It then leverages the GSMM to accumulate these cues into a
global flux distribution that is stochiometrically consistent and
maintains mass balance across the entire metabolic network (see
methods).
To this end we first tested if the above described procedure yields
flux predictions that are in accordance with those quantified with ^13C
Metabolic Flux Analysis (^13C MFA). To this end, we combined both mRNA
and protein expression measurements and used iMAT, a tool that extends
upon the standard flux balance analysis (FBA) to predict the flux
distribution that is the most likely given both types of data. Briefly,
following a procedure already established and validated by^[69]17, the
activity level of an enzyme was set according to the proteomics data
when these data were available and according to the gene-expression
otherwise, leaving the activity level unconstrained when large
disparities existed between the gene expression and the proteomics data
(see methods). Reassuringly, the accuracy of predicting the
experimentally measured fluxes was significant across all growth
conditions (Spearman correlation coefficient across all growth
conditions = 0.42 p-values < 8.9671e-25, see Fig. [70]3 for the
correlations obtained at each of the three different growth
conditions).
Figure 3.
[71]Figure 3
[72]Open in a new tab
Scatter plot depicting the association between the measured and
predicted fluxes in each of the three media conditions. Flux
predictions were obtained by integrating the transcriptomics and
proteomics data within the human metabolic model, as described in the
main text.
Given these network wide flux predictions, we next set to identify the
reactions that are transcriptionally regulated (TR). To this end we
discretized the gene expression measurements and the predicted fluxes
into three levels of activity: low (TR-low), moderate (TR-moderate) and
high (TR-high)). We then compared predicted flux level of each reaction
to the discretized gene-expression level of the pertaining enzymes (see
methods). Reactions whose predicted flux levels matched gene expression
levels of their enzymes across the different measurements were
considered to be TR. For the three conditions (MEM-Gln, MEM and
MEM+Oli), 562, 550 and 556 reactions (approximately 28% of the model
reactions) were identified as TR, respectively. Supporting these
predictions, we found that the group of predicted TR reactions is
enriched with transcription factor binding sites (using ENRICHR
tool^[73]29,[74]30, we calculated the enrichment according to several
databases: Jaspar^[75]31 and Transfar^[76]32 (hyper-geometric
p-value = 9.5892e-05), ChEA^[77]33 (hyper-geometric
p-value = 1.2819e-10) and ENCODE^[78]34,[79]35 (hyper-geometric
p-value = 0.0029)) (see methods).
To predict translational regulation (TL), we searched for reactions
whose (discretized) predicted flux activity levels were different from
the transcriptomic levels of their enzymes. Such transcriptomic/flux
‘discordant’ reactions whose activity levels were high (low) according
to the gene expression of their enzymes but low (high) according to the
flux predictions are considered to be post-transcriptionally
down-(up-)regulated. The correlation between the proteomics data and
the predicted fluxes for this subset of TL predicted reactions was high
and significant (rho = 0.75, 0.6, 0.5, for the 3 growth conditions, all
p-values < 0.0071), as would be expected (SI Fig. [80]1). It is
important to note that in order to avoid circularity, this correlation
was calculated in a cross-validation manner only for sub-group which
was not constrained in the algorithm input. Among the reactions
identified as post-transcriptionally regulated, we denoted the subset
of reactions whose predicted flux state highly matches the proteomics
(discretized) levels in a given growth condition as translationally
(TL)-regulated. Among those, about 15 reactions are predicted to be
TL-upregulated (the discretized flux/proteomics activity state is
higher than the discretized transcriptomics state), and about 35 are
predicted to be TL-downregulated (the discretized flux/proteomics
activity state is lower than the discretized transcriptomics state) (SI
Table [81]3). The specific pathways that are predicted to be TR
(high/low/moderate) and TL (up/down) regulated are listed in SI
Table [82]6(a–c).
Step 2: Identifying post-translational (PTL) regulated reactions
To identify the reactions that are post-translationally (PTL)
regulated, we used the fluxes predicted in the previous step as a
reference point. That is, reactions whose predicted flux activity
markedly differed both from their transcriptomics and proteomics
expression levels (that are hence not predicted to be TR or TL
regulated) may be post-translationally (PTL)-regulated. Overall, 34,
39, 42 such reactions have at least one measured phosphorylation site
in MEM, MEM-Gln and MEM+Oli, respectively. We next inferred the impact
of each of the measured phosphorylation sites on enzyme activity. The
phosphorylation data included 56 metabolic enzymes phosphorylated at 71
different phosphorylation sites catalyzing 164 metabolic reactions. For
each of the reactions, we computed the Spearman rank correlation
between the predicted flux (computed via integrating the pertaining
transcriptomics and proteomics data) and the corresponding site
phosphorylation levels across all growth conditions and time points
measured (SI Fig. [83]2). 19 reactions manifested a significant p-value
(<0.05) with a strong correlation (Spearman rho > |0.6|). These 19
reactions have 13 different phosphorylation sites (SI, Fig. [84]3).
The functional impact of phosphorylation is currently known from the
literature for only two of these enzymes: (1) phosphorylation of S1859
in carbamoyl-phosphate synthetase 2 (CAD) enhances its in vivo^[85]36
activity, and (2) phosphorylation on S293 causes pyruvate dehydrogenase
(PDHA1) enzyme inactivation^[86]37. Our predictions match both; for the
CAD enzyme, we detected a high positive correlation (0.718) and for
PDHA1 we obtained a strong negative correlation of −0.6. To test and
validate these predictions in our cells further, we performed western
blot experiments for both proteins (CAD and PDH together with their
phosphorylated forms). We observed a marked phosphorylation of PDH in
the predicted conditions for MEM-Gln and MEM+Oli compared to MEM growth
condition, indicating its reduced activity under these conditions
(Fig. [87]4). This is additionally confirmed via flux measurements
through ^13C MFA (SI, Table [88]1). On the other hand, we observed a
decreased phosphorylation at CAD protein, indicating a decrease at its
activity at MEM-Gln and MEM+Oli conditions, as predicted (Fig. [89]4).
Figure 4.
Figure 4
[90]Open in a new tab
Phosphorylation of the indicated proteins (PDH and CAD) at MEM-Gln and
MEM+Oli conditions were detected by western blot analysis.
Step 3: Genome wide prediction of TR and TL regulation of breast cancer
metabolism
In the previous steps, we have identified about 500 reactions that are
directly regulated at one of the three regulatory levels described
above (TR, TL or PTL). In these reactions, the predicted flux changes
were significantly associated with molecular alterations in the
pertaining enzymes. However, this leaves a large number of about 1450
reactions that were not assigned to any of these direct regulatory
levels, which can be attributed to the limited scope of our
measurements. In order to predict additional reactions that are likely
to be directly regulated at TR or TL level, we built five Support
Vector Machine (SVM) classifiers for five different direct regulation
levels: TR-high, TR-low, TR-moderate, TL-up and TL-down. The goal of
each classifier is to predict whether a reaction is regulated at one of
these levels or not. The classifier was trained and evaluated using the
reactions that have already been labeled as TR or TL regulated in the
previous analysis at step (1), using a standard train and test 5-fold
cross validation. The classifier input features included the gene
expression, proteomics, predicted fluxes and metabolic network
characteristics (reversibility information, number of participating
metabolites, index of the relevant pathway, and more) of the given
reactions, and the TR/TL labels already assigned in the previous steps
(see methods). The accuracy of the classifier was measured by comparing
the predicted labels against the known labels. The resulting
classifiers achieved a high cross validation prediction accuracy (mean
AUC > 0.946 for all classifiers, all values are presented in
Fig. [91]5a; recall and precision values are presented in Fig. [92]5b).
Applying this to predict the direct regulation of the 1450 remaining
reactions, ~450 additional reactions were predicted to be regulated at
exactly one of the TR/TL levels (in MEM, MEM-Gln and MEM+Oli, see
Fig. [93]5c for their subdivision in each of the regulation groups).
The predicted TR group is enriched with transcription factor binding
sites (hyper-geometric p-value = 6.236e-119, see methods. Similarly,
the predicted TL group has a significantly higher number of
flux/proteomic states matches compared to the randomly selected sets
(empiric p-value = 0.04). It is important to note that the very small
numbers of predicted PTL reactions did not enable us to build reliable
predictors of regulation at this level. Interestingly, adding the new
set of predicted reactions which are directly regulated to those
reactions which are previously identified as directly regulated by
model based integration uncovers a large number of new pathways that
now become enriched in directly regulated reactions (SI Table [94]6).
Figure 5.
[95]Figure 5
[96]Open in a new tab
(a) AUC curves for each of the direct regulation SVM classifiers; (b)
mean precision and recall values for each of the SVM classifiers; (c)
number of reactions that have been uniquely predicted to be directly
regulated by one of the classifiers.
Step 4: Studying the reactions that are indirectly regulated via
stoichiometric coupling
After the predictions we performed at step 1–3, around 1000 reactions
yet remained not to be predicted as directly regulated, some of which
are likely to be further identified as regulated with more extensive
data. However, many of these remaining unassigned reactions may still
be truly indirectly regulated (IR) reactions where their flux may be
primarily metabolically-regulated by changes in their substrate and
product levels due to changes in the flux activities of other reactions
in the metabolic network. That is, their flux may be stoichiometric
coupling (SC-regulated) to the flux of other reactions in the metabolic
network^[97]38–[98]40.
In the framework of MCA (Metabolic Control Analysis), it has been
established that network structure is an important determinant of
metabolic control^[99]41. Accordingly, a perturbation in enzyme
abundance or activity can be propagated through reactions stoichiometry
coupled to the reaction catalyzed by such enzyme. To study such
dependencies on a genome-scale, we used flux sampling to quantify the
pairwise stoichiometric couplings between all the metabolic reactions
in the human network, identifying for each reaction how tightly its
flux is coupled to the flux of each of the other reactions, in each of
the different conditions (see Methods).
Remarkably, we found that the ~1000 ‘unassigned’ indirect reactions
have significantly higher stoichiometric couplings to the TL and PTL
directly regulated reactions than among themselves across the different
growth conditions (using one sided Wilcoxon test,
p-values = 6.9163e-158 and 2.945e-14, respectively). These findings
point out that the regulation of cellular metabolism may be governed in
a hierarchical manner where the flux of many indirectly regulated
reactions is determined via stoichiometric coupling to the flux of
others, directly regulated reactions. Finally, we found that the group
of ~1000 indirectly regulated reactions is highly enriched with
bi-directional reactions (hyper-geometric p-value = 1.15e-28, 2.21e-32,
5.54e-32 for each condition, see Methods). This observation can be
explained by metabolic control analysis (MCA)^[100]42 theory: In the
framework of MCA, enzyme activities catalyzing reversible reactions,
which often are in rapid equilibrium, usually have low flux control
coefficients and hence are poor targets of direct regulation. Indeed,
the combination of the ‘directional flexibility’ of candidate
SC-regulated reactions with their enhanced coupling to other
directly-regulated reactions is likely to facilitate the formation of
stoichiometrically feasible flux distributions across the metabolic
network, providing a way for efficiently regulating the metabolic state
with minimal cellular costs in terms of transcriptomics, proteomics and
phospho-proteomics regulation.
Discussion
This study integrates transcriptomics, proteomics, phospho-proteomics
and fluxomics data with metabolic modeling to provide the first chart
of metabolic regulation in MCF7 breast cancer cells on genome scale. We
classified the metabolic enzymes as those that are predicted to be
directly regulated at three distinct levels (TR, TL, and PTL) and those
that are predicted to be indirectly regulated, given the current
coverage of omics data. As expected, we found that citric acid cycle is
generally upregulated both on the transcription and translational
level. Interestingly, while on the transcriptional level fatty acid
oxidation was found to be generally down-regulated, it is up-regulated
on the translational level. In addition, oxidative phosphorylation –
another hallmark of cancer, was found to be up-regulated only on the
translational level (not including MEM+Oli medium). These findings
further highlight the pivotal role of translational regulation in
cancer and the importance of obtaining higher coverage of proteomic
data, whenever possible.
Remarkably, we found that the flux of the indirectly regulated
reactions is coupled to the flux of directly regulated ones. We also
found that the indirectly regulated reactions are enriched with
bi-directional reactions. These findings might open an opportunity for
further research to determine an extent by which their activity levels
are set by other reactions. Taken all together, these findings suggest
that the regulation of breast cancer cell metabolism is controlled in a
hierarchical manner where the direct regulation of about half of the
reactions suffices to orchestrate the flux regulation through the whole
metabolic network via flux coupling.
Like almost any other computational, genome scale investigation, our
approach has quite a few limitations. First, the data itself, is still
limited and noisy, and the coverage of different layers of omics data
is uneven, due to obvious technical limitations. Second, guided by the
data we collected, we focused here on studying post-translational
modifications mediated by phosphorylation. However, obviously,
post-translational modifications occur via a variety of additional
mechanisms, including, e.g., acetylation, glycosylation and allosteric
regulation^[101]43,[102]44. Consequently, the machine learning
predictors built for predicting transcriptional regulation and
post-transcriptional regulation, but not post-translational regulation.
Fourthly, as we employ coarse discretization to overcome some of the
noise in the data, we only identify regulatory alterations in reactions
that are differentially active across the conditions of study. This
limitation is partly mitigated, however, by analyzing three very
distinct metabolic states. Future work should aim to address these
limitations by incorporating data sets covering more conditions,
measuring a wider range of omics data with higher coverage, and
ideally, move to perform such measurements in patients’ tumor data.
With the advent of omics technologies such data may become readily
available soon and may be benefit from the conceptual and computational
framework laid out in the current study.
Although we analyzed multiple layers of omics data, their coverage has
been limited: while we had gene expression data for all 1372 metabolic
genes, the coverage of our cutting-edge proteomics measurements
provided data for only 486 metabolic enzymes and 71 of their
phosphorylation sites. Flux measurements using ^13C labeling are
understandably even more limited in their scope, covering only central
carbon metabolism. Aiming to make the best use of the available data
and to obtain a genome-wide view of breast cancer cell metabolism, we
used a modeling approach to integrate the data and infer the most
likely genome-scale flux distributions. Additional work aiming to deal
with the limited coverage problem was carried out via creating SVM
predictors that used the known network properties together with
measurements with high coverage and helped us extend the scope of the
study to the utmost. With rapid advancement of high-throughput
technology and accumulation of more comprehensive omics data across
additional cellular conditions, the conceptual and computational
framework exhibited here lays the methodological foundations for
gradually obtaining a more comprehensive view of metabolic regulation
in both breast cancer and other cancer types.
Materials and Methods
Cell culture
Breast cancer cell line, MCF7 was purchased from ATCC and cultured in
MEM without phenol red (Gibco, Thermo Fisher Scientific Inc., Waltham,
MA, USA) containing 10% Fetal Bovine Serum (Sigma), 10 mM d-glucose
(Sigma-Aldrich), 1 mM sodium pyruvate (Biological Industries), 2 mM
glutamine (Gibco), 0.1% antibiotic (penicillin 10 Units/ml-streptomycin
10 Units/ml, Gibco), 0.01 mg/ml insulin (Sigma), and 1% non-essential
amino acids (Biological Industries). The cells were maintained at 37 °C
with 5% CO[2] and saturated humidity. Growth medium was replaced every
2–3 days.
Many breast cancer cells, including MCF7, display glutamine addiction
habits; that is, they rely on glutamine as the main source of energy
rather than glucose^[103]45. Besides that, they also have elevated
mitochondrial activity, and considering that hypoxia is a common
condition in the tumor microenvironment, the study of metabolism in the
presence of strong stress condition such as hypoxia is also
particularly interesting. Therefore; to study the regulation of breast
cancer cells, we applied these two perturbations; glutamine deprivation
and mitochondrial inhibition by oligomycin.
For the experiments, MCF7 cells were seeded and 48 h later, the medium
was exchanged with an adaptation medium, MEM without phenol red (Gibco)
containing 10% dialyzed Fetal Bovine Serum (Sigma) and the
above-mentioned supplements. For the metabolomics experiments, after
24 h of incubation with adaptation medium, for the MEM systems, the
medium was exchanged with the same medium containing 10 mM
[1,2-^13C[2]]-glucose (Sigma) or 4 mM [U-^13C[5]]-glutamine (Sigma)
with or without oligomycin (1 µM). For the MEM-Gln systems the replaced
growth medium did not contain glutamine but only 10 mM
[1,2-^13C[2]]-glucose (Sigma). The cells were counted at 0 h, 8 h and
24 h after tracer introduction, and cell pellet and media were
immediately frozen to use in later analysis. For the proteomic
experiments heavy labeled MCF7 cells were used as an internal standard.
To obtain complete labeling, cells were cultured in DMEM deprived of
lysine and arginine, and supplemented with the heavy versions of these
amino acids, ^13C[6]^15N[2]-lysine (Lys8) and ^13C[6]^15N[4]-arginine
(Arg10). After ten cell doublings, complete labeling was achieved and
validated by mass spectrometric analysis.
Biochemical assays
Glucose, lactate, glutamine and glutamate concentrations were
determined by spectrophotometry (COBAS Mira Plus, Horiba ABX) from
frozen cell culture medium as previously described^[104]46–[105]48.
Briefly, extracellular glucose was measured by calculating the NAD(P)H
concentration decrease after the conversion of total glucose by
hexokinase and conversion of resulting glucose-6-phosphate into
D-gluconate-6-phosphate by G6PDH using coupled enzymatic reactions (ABX
Pentra Glucose HK CP, HORIBA ABX, Montpellier, France). Lactate
concentration was determined by lactate dehydrogenase (LDH) reaction
and measurement of NADH change. Similarly, the glutamate concentration
was determined by glutamate dehydrogenase (GDH) reaction and
measurement of NADH change. To measure glutamine concentration,
glutamine was first converted to glutamate by glutaminase (GLS)
reaction and then glutamate concentration was quantified as described
above. Consumption and production rates of metabolites in the cells
were analyzed by measuring the decrease or increase in concentration of
the extracellular metabolites in the media at 8 h or 24 h compared to
the initial concentration of the metabolite, with respect to the total
cell number at each time point.
13C Assisted metabolomics
Isotopologue distribution analysis of intracellular and extracellular
metabolites was performed by gas chromatography coupled to mass
spectrometry (GC-MS). All GC-MS analysis was carried out using an
Agilent 7890 A GC equipped with HP5 capillary column connected to an
Agilent 5975 C MS. GC-MS analysis of fatty acids was carried out using
a GCMS-QP 2012 Shimadzu coupled with bpx70 (SGE) column. For all
measurements, 1 µL of sample was injected at 250 °C, helium as the
carrier gas, at a flow rate of 1 mL per minute. Each metabolite or
metabolite set had different isolation, derivatization and detection
procedures as explained in^[106]49–[107]51. Raw mass spectra of
metabolites were corrected for natural abundance of ^13C, ^29Si, ^30Si,
^33S, ^34S to compute the fractions of ^13C incorporated into the
analyzed metabolic products from artificially labeled substrates. Data
are available via Metabolights with identifier MTBLS183.
([108]https://www.ebi.ac.uk/metabolights)
13C Metabolic flux analysis (13C MFA)
Our in-house developed software, Isodyn
[[109]https://github.com/seliv55/isodyn], was used to simulate the
transfer of the tracers from [1,2-^13C[2]]-glucose or
[U-^13C[5]]-glutamine medium into intracellular metabolites. Isodyn is
a program written in C++ and designed to simulate the dynamics of
metabolite labeling by stable isotopic tracers^[110]18–[111]22. This
program automatically constructs and solves a large system of ordinary
differential equations which describe the evolution of isotopologue
concentrations of metabolites produced in glycolysis, TCA cycle and
PPP. Initially, all the metabolites except for introduced labeled
substrates with known isotopologue composition in the medium are
considered to be non-labeled and initial total concentrations of
intracellular metabolites are calculated as a function of model
parameters assuming a steady state at the initial moment. There is a
function designed specifically for each type of reaction (i.e.
carboxylation, decarboxylation) and these functions simulate
transformation of carbon skeleton (specific transition of labeled
carbon) and consumption and production rates of each isotopologue in
the considered system. These transformations redistribute ^13C isotopes
in all metabolites, so that, individual rates which determine the
values of the derivatives for the isotopologues are calculated for each
isotopologue. To solve this system, a method of numerical integration
is chosen arbitrarily (Runge-Kutta, BDF, Dassl). Isodyn simulates a
real-time course of label propagation starting from the initial values
of experimental conditions of incubation. As it compares the
experimental and computed data for corresponding time points, reaching
an isotopic steady state is not necessary.
Western blot
Cell extracts were obtained from fresh plates 24 h after incubation
with the corresponding growth medium. Then, cells were incubated for
30 min on ice with lysis buffer, scraped, sonicated and centrifuged at
15,000 g for 20 minutes at 4 °C. Supernatants were recovered and the
protein content was quantified by the BCA kit (Pierce Biotechnology).
Western blot analysis was carried out size-separating an equal amount
of protein by electrophoresis on SDS polyacrylamide gels, and then the
proteins were electroblotted onto polyvinylidene fluoride transfer
membranes (PVDF) (Bio-Rad Laboratories, Hercules, CA, USA). The
membranes were blocked with 5% of non-fat dry milk in PBS with 0.1%
Tween, and then incubated with specific primary antibodies overnight at
4 °C. Next, membranes were treated with the appropriate secondary
antibody for 1 hour at room temperature. All blots were visualized on
Fujifilm X-ray (VWR International, Radnor, PA, USA) with
chemiluminescence detection using Immobilon ECL Western Blotting
Detection Kit Reagent (EMD Millipore, Billerica, MA, USA). The
antibodies used were CAD (Santacruz Biotechnology), CAD-P (Cell
Signaling), PDH (Merck Millipore) PDH-P (Cell signaling) and β-actin
(MP Biomedicals). Also, anti-mouse (Dako) and, Anti-rabbit (Amersham
Biosciences) secondary antibodies were used.
Transcriptomics analysis
mRNA was extracted from cells using GeneAll Hybrid miRNA kit according
to manufacturer instructions. mRNA was then processed on Atlas machine
using Affymetrix Human Gene 2.1 ST Array Strip and WT expression kit.
CEL files were analyzed using Affymetrix Expression Console software.
The data were converted to log2 RMA values.
Proteomics and phosphoproteomics analysis
MCF7 cells were lysed in buffer containing 4% SDS, 100 mM DTT in
Tris-HCl pH 7.5. Equal protein amounts were combined with the SILAC
standard and 5–10 mg proteins were digested using the FASP
protocol^[112]52. From each sample, 10 ug were taken for proteomic
analysis, and the rest was used for phospho-peptide enrichment with
IMAC. Single runs were performed for each proteomic and
phospho-proteomic sample.
MS analysis was performed on the EASY-nLC1000 nano-HPLC coupled to the
Q-Exactive MS (Thermo Scientific). Peptides were separated on PepMap
C18 columns using 200 min gradients. Raw MS files were analyzed with
MaxQuant. Database search was performed with the Andromeda search
engine using the Uniprot database. A decoy database was used to
determine a 1% FDR cutoff on the peptide and protein levels. For
phospho-proteomic analysis, the database search included p(STY) sites
as variable modifications. Data are available via ProteomeXchange with
identifier PXD006449 ([113]http://www.proteomexchange.org/)
Genome-scale metabolic modeling (GSMM)
A metabolic network consisting of m metabolites and n reactions can be
represented by a stoichiometric matrix S, where the entry S[ij]
represents the stoichiometric coefficient of metabolite i in reaction
j^[114]53. A GSMM model imposes mass balance, directionality, and flux
capacity constraints on the space of possible fluxes in the metabolic
network’s reactions through a set of linear equations:
[MATH: S⋅v=0 :MATH]
[MATH: vmin≤v≤vmax :MATH]
where v stands for the flux vector for all of the reactions in the
model (i.e. the flux distribution). The exchange of metabolites with
the environment is represented as a set of exchange (transport)
reactions, enabling a pre-defined set of metabolites to be either taken
up or secreted from the growth media. The steady-state assumption
represented in equation (1) constrains the production rate of each
metabolite to be equal to its consumption rate. Enzymatic
directionality and flux capacity constraints define lower and upper
bounds on the fluxes and are embedded in equation (2).
In the following, flux vectors satisfying these conditions will be
referred to as feasible steady-state flux distributions.
Pathway enrichment analysis
Based on iMAT results, which was used to predict the regulation of the
reactions in the metabolic model, a hypergeometric p-value was computed
for each pathway in the model for being enriched with reactions that
are regulated in each level. Data for reactions and their pathways were
taken from BIGG database^[115]54. A correction for multiple hypotheses
was done using false discovery rate method of 0.05.
Using iMAT with transcriptomics and proteomics as its input
We first employed a discrete representation of significantly high or
low enzyme-expression levels across tissues. Gene expression and
proteomics levels were discretized to highly (1), lowly (-1), or
moderately (0) expressed, for each sample. This discretization was
conducted as follows: the 1/3 of the proteomics with the highest values
to be considered as highly expressed, and vice versa for lowly
expressed. When proteomics data was not available, transcriptomics data
was used (again – top 1/3 as lowly expressed, and vice versa). One
could argue that the different levels of coverage between
transcriptomics and proteomics could suggest using different thresholds
for determining ‘active’ and ‘inactive’ genes in the respective
analysis; To keep a systematic approach, here we opted to treat both
data measurements in the same, uniform, way (but other approaches may
be taken in the future. Lastly, in order to avoid direct effect of the
coverage differences between proteomics and transcriptomics, we
determined a moderate expression level for genes whose level according
to the gene expression was high (low) and according to the proteomics
low (high), and left their corresponding enzymes/reactions
unconstrained. In iMAT analysis, the discretized gene expression levels
were incorporated into the metabolic model to predict a set of high and
low activity reactions. Network integration was done by mapping the
genes to the reactions according to the metabolic model (see methods),
and by solving a constraint-based modeling (CBM) optimization problem
to find a steady-state metabolic flux distribution. CBM models the cell
as a network of metabolic reactions controlled by hundreds of genes and
enables the prediction of feasible metabolic behavior under different
genetic and environmental conditions, that are expressed as constraints
in the network^[116]55,[117]56. By using the CBM approach, we assign
permissible flux ranges to all the reactions in the network, in a way
that satisfies the stoichiometric and thermodynamic constraints
embedded in the model and maximizes the number of reactions whose
activity is consistent with their expression state. iMAT’s solution may
not be unique as a space of alternative optimal solutions (in terms of
its objective function) may exist. Therefore, we sampled 2,000
different flux distributions that are all consistent with the
reactions’ state of activity or inactivity defined in one of iMAT’s
optimal solutions. To address the potential degeneracy of the CBM
solutions, we used the artificial-center-hit-and-run (ACHR) sampling
approach^[118]57 which is an efficient sampling approach for a linearly
constrained space^[119]58 (mean, min and max flux and flux range for
each reaction is provided in SI). The mean flux distribution obtained
over the 2,000 samples then serves as an approximation of the source
metabolic state.
Gene to reaction mapping
To map the gene expression to expression on the reaction level, we used
the boolean gene-protein-reaction (GPR) associations available in the
H. sapiens recon1 metabolic model, downloaded from the BIGG database
(52). These rules indicate which genes need to be expressed using the
two Boolean operators “and” and “or”. An example of such a rule is the
following:
R1 = (g1 or g2) and g3 (indicating that either gene 1 or gene 2 (or
both) need to be expressed in combination with gene 3 to allow reaction
1 activity.
OR rules were converted to the maximum transcription level of either of
the genes, i.e. (g1 or g2) was converted to max(g1, g2)
AND rules were converted to the minimum transcription level of either
of the genes, i.e. (g1 and g2) was converted to min(g1, g2).
Bi-directional reactions
Bi-directional reactions are those that can potentially carry flux in
both directions (this information is provided in the human GSMM model).
Identifying TR/TL reactions
We compared the discretized gene expression measurements to the
activity levels of the predicted fluxes; we took 1/3 of the reactions
with the highest flux values to be considered as highly active, and
vice versa for lowly active reactions. The rest of the reactions
considered to be moderately active. If the activity level of a reaction
matches the discretized value according to the gene expression, in at
least 3 out of the 4 cell line replicates, the reaction is considered
to be TR. For the rest of the reactions, if the activity level of a
reaction matches the discretized value according to the proteomics, the
reaction is considered to be TL.
Identifying PTL reaction
Among the reactions that haven’t been classified as TR or TL in the way
that mentioned above, we found the sub group of reactions that were
associated with at least one phosphorylation site. Reactions whose
predicted flux activity markedly differed from their transcriptomics or
proteomics expression levels, and that were associated with at least
one phosphorylation site in 3 of the 4 cell line replications, were
predicted to be potentially post-translationally (PTL) regulated.
Finding transcription factor enrichment
First, we found the reactions that were predicted to be TR in all
condition. Then, using the reaction-gene matrix, we found the list of
genes that catalyze this group of reactions. Using ENRICHR
tool^[120]29,[121]30, we found how many of the genes have (at least
one) TFs that bind to their promoter region, from exploring
Jaspar^[122]31, Transfar^[123]32, ChEA^[124]33 and
ENCODE^[125]34,[126]35 databases. Same for all model genes. These
values were used in the hypergeometric calculation.
Support vector machine (SVM) classification
We built and trained five SVMs classifiers (representing 5 “classes” of
regulation, as described in main text). We applied an SVM classifier
with a quadratic kernel for each classifier, with the following
features:
(1–4) gene expression measurements under 4 data points
(5–8) predicted fluxes under 4 data points
(9) A binary integer indicating if the reaction is reversible.
(10) An integer value associated with a unique metabolic pathway.
(11) The total number of metabolites participating in the reaction.
(12) The total number of substrates participating in the reaction.
(13) The total number of products participating in the reaction.
For the labels, we used the classification of the reactions from the
previous steps (1 if it’s regulated at that level, 0 otherwise). All
SVM classifiers were trained on part of this data, and later tested on
all data (mean recall and precision values presented in the text).
Cross-validation was performed by setting aside one fifth of the
regulated-predicted reactions in the training set. The classifier was
trained on the remaining four. The classifier’s accuracy was measured
by comparing the predicted labels against the known labels.
Computing pairwise flux correlations
For each growth condition, we found 2000 different flux distributions
using flux balance analysis. Then, for each pair of reactions, we
calculated the Spearman correlation between their flux values. For the
coupling calculations, we used the absolute values of these
correlations (as coupling between reactions can be either positive or
negative).
Multiple hypotheses correction
Throughout our paper P-values were filtered by False Discovery Rate
(FDR) to correct for multiple testing^[127]59. More specifically,
first, all the p-values were sorted in increasing order, P1, P2,.., Pn.
Next, we filtered p-values
[MATH: pi :MATH]
:
[MATH: pi :MATH]
>
[MATH: in :MATH]
* 0.05.
Supplementary information
[128]Supplementary information^ (672KB, docx)
[129]Supplementary Dataset 1^ (1.8MB, xlsx)
Acknowledgements