Abstract
Recent research has shown that many types of cancers take control of
specific metabolic processes. We compiled metabolic networks
corresponding to four healthy and cancer tissues, and analysed the
healthy–cancer transition from the metabolic flux change perspective.
We used a Probabilistic Minimum Dominating Set (PMDS) model, which
identifies a minimum set of nodes that act as driver nodes and control
the entire network. The combination of control theory with flux
correlation analysis shows that flux correlations substantially
increase in cancer states of breast, kidney and urothelial tissues, but
not in lung. No change in the network topology between healthy and
cancer networks was observed, but PMDS analysis shows that cancer
states require fewer controllers than their corresponding healthy
states. These results indicate that cancer metabolism is characterised
by more streamlined flux distributions, which may be focused towards a
reduced set of objectives and controlled by fewer regulatory elements.
Subject terms: Cancer metabolism, Biochemical reaction networks,
Probabilistic data networks
__________________________________________________________________
Metabolic rewiring is a feature of many cancers. Here, the authors
combine control theory and flux correlation analysis to study the
transition of healthy metabolic networks to cancer states, and find
that cancer metabolism is characterized by more streamlined flux
distributions.
Introduction
Metabolic pathways are essential chemical processes that catalyse
complex reactions indispensable for development and life. Recent
research has shown that many types of cancers take control of specific
metabolic processes^[30]1. In particular, cancer pathogenesis has
recently been closely associated to serine, glycine and one-carbon
metabolism^[31]2. Preclinical analyses have shown that by adopting a
specific diet that limits the amount of serine and glycine, tumour
growth was severely inhibited.
Understanding how metabolic networks are controlled is a challenging
question though. Metabolic control analysis was originally developed to
evaluate how metabolic fluxes depend on kinetic parameters of enzymatic
reactions or metabolite concentrations. The level of control is
quantified by calculating control or elasticity
coefficients^[32]3,[33]4. However, these approaches are not easily
applicable to large systems since they rely on a kinetic model of the
metabolic pathways. More recently, an approach was developed that uses
flux couplings between metabolic reactions^[34]5. Flux couplings
identify constrained relations between fluxes of different reactions,
which arise from the network topology and mass conservation^[35]6. Five
types of couplings were identified, namely directional, partial, full,
anti and inhibitive couplings. The networks were analysed using a
minimum dominating set (MDS) approach to identify potential driver
nodes reactions^[36]5. The MDS approach was proposed by Nacher and
Akutsu^[37]7 and has been used to analyse many biological systems and
identify proteins associated to cancer^[38]8–[39]12. However, the flux
coupling approach uses a discrete definition of coupling between
reactions and does not take into account the full range of possible
flux distributions that the metabolic system is able to support. For
example, two reactions are defined as anti-coupled if when one of them
is inactive then the other carries a non-zero flux (in a non-zero
steady state), but this definition does not take into account the
relation between both fluxes in situations where both reactions are
active.
Here, we argue that in order to investigate the distributed control of
metabolic flux in large-scale metabolic networks, a different approach
is needed. Coupling should be measured by a continuous value
characterising the relations between reaction fluxes over all feasible
steady states, instead of a set of binary values that represent
particular subsets of steady states. In order to distinguish this
continuous measure from previously used definitions of coupling, we
hereon use the term correlation. Such a measure already exists: as
demonstrated by Poolman et al.^[40]13, it can be obtained from the
angles between the vectors forming an orthonormal basis of the
null-space (or kernel) of the stoichiometric matrix defining the
metabolic network. The cosine of this angle precisely represents
Pearson’s correlation coefficient between the fluxes carried by a pair
of reactions over all possible steady states of the system, and was
therefore named reaction correlation coefficient. Hence, the reaction
correlation coefficient ϕ[ij] of a pair of reactions i and j is a
continuous value comprised between –1 and 1, where ϕ[ij] = 0 indicates
that the fluxes of both reactions are completely independent, ϕ[ij] = 1
indicates that they are perfectly correlated and ϕ[ij] = −1 indicates
that they are perfectly anti-correlated.
In this work, we compiled metabolic networks corresponding to four
healthy and cancer tissues, namely breast, lung, kidney and urothelial
cancer. This data allows us to analyse the transition from healthy to
cancer states from the metabolic flux perspective (Fig. [41]1). We
first assembled a metabolic flux correlation network obtained from
biochemical metabolic pathways in healthy and cancer tissues. Indeed,
the above-mentioned flux correlation coefficient can be interpreted as
a failure probability, which suggests that a probabilistic model can be
suitable to address flux control. For example, a positive correlation
interaction with value 1 could be understood as an interaction with
failure probability of zero. Similarly, an absence of correlation with
value zero would suggest a failure probability of 1. These concepts
pave the way to the applicability of a probabilistic control theory for
complex metabolic networks. Here, we used a Probabilistic Minimum
Dominating Set (PMDS) model that can identify a minimum set of nodes
that act as driver nodes and control the entire network in a context of
probabilistic interaction failures^[42]14. In this network
representation, the nodes are interpreted as metabolic reactions and
the weighted edges correspond to the probabilistic flux exchanged among
reactions. The application of PMDS in the context of metabolic fluxes
makes it possible to connect controllability theory with flux
correlations.
Fig. 1.
[43]Fig. 1
[44]Open in a new tab
Method for the application of probabilistic control theory to metabolic
flux analysis. The stoichiometric matrix of the metabolic network is
constructed and used to compute the orthonormal kernel matrix. The
reaction correlation coefficient ϕ[ij] is the cosine of the angle
between the reaction rows in the kernel and represents the strength of
control between both reactions. The failure probability ρ[ij] is
defined as 1−|ϕ[ij]|, which is used to determine the probabilistic
minimum dominating set. We require that each node (reaction) is covered
by multiple nodes in PMDS so that the probability that at least one
edge (incoming flux) is active is greater than the threshold Θ
Results
Cancer metabolic reaction fluxes show higher correlations
Frequency distributions of all correlations in models of healthy and
cancer cells corresponding to four different tissues were computed and
the results are shown in Fig. [45]2. In general, we observed that the
correlations are stronger in the cancer state than the healthy state
for breast, kidney and urothelial tissues. However, for lung cancer the
results show an opposite tendency with a weaker correlation
distribution for the cancer state.
Fig. 2.
[46]Fig. 2
[47]Open in a new tab
Probability distribution of reaction correlations. Probability
distribution of reaction correlations in healthy (blue) and cancer
(purple) states of four human tissues
Statistical analysis of genome-scale metabolic flux networks
The clustering degree in all networks shows, however, a similar local
structure (Table [48]1). This indicates that the differences observed
in cancer states are not due to changes in the network structure, but
to changes in the relations between fluxes.
Table 1.
Topological properties of the complete metabolic networks
Network Clustering coefficient Connected components
Breast healthy 0.780 4
Breast cancer 0.766 7
Lung healthy 0.772 5
Lung cancer 0.771 8
Kidney healthy 0.770 6
Renal cancer 0.764 5
Urothelial healthy 0.780 6
Urothelial cancer 0.769 8
[49]Open in a new tab
We also determined the degree distributions in all these networks and
found that they are similar and adhere to the classical scale-free
distribution observed in most biological networks, including metabolic
networks (Fig. [50]3). It is interesting that in spite of large
metabolic flux changes, the global statistical pattern of the network
remains conserved.
Fig. 3.
[51]Fig. 3
[52]Open in a new tab
Topology of healthy and cancer networks. a Cumulative degree
distributions of healthy and cancer high flux correlation networks. b
Visual representation of high flux correlation networks; the edge
colour represents the flux correlation ranging from 0.5 (red) to 1.0
(green)
However, the specificity of lung cancer appears in the fraction of
nodes connected to the giant connected component (Table [53]2 and
Fig. [54]4). Breast, urothelial and renal cancers show larger giant
components than the corresponding healthy states. In contrast, healthy
lung tissue shows a larger giant component than the cancer state. This
may imply a higher correlation between fluxes in the cancer state for 3
out of 4 cancer types.
Table 2.
Properties of high flux correlation networks in healthy and cancer
states of human tissues
Network Total network size Size of giant component Fraction of giant
component
Breast healthy 3729 611 0.164
Breast cancer 3741 1103 0.295
Lung healthy 3510 1091 0.311
Lung cancer 3350 598 0.179
Kidney healthy 3986 711 0.178
Renal cancer 3444 677 0.197
Urothelial healthy 3890 660 0.170
Urothelial cancer 3618 1078 0.298
[55]Open in a new tab
Fig. 4.
Fig. 4
[56]Open in a new tab
Fraction of giant component in healthy and cancer tissues. In these
networks, nodes represent reactions and two reactions are connected if
the absolute flux correlation between them is higher than 0.5, as
described in Methods. The giant component is the largest connected
component in each network; the fraction was calculated by dividing the
number of nodes in the giant component through the total number of
nodes in each network
Cancer states require fewer controllers except in lung cancer
Based on the PMDS model described in the Methods section, we computed
the number of reactions necessary to achieve full control of each
network in a probabilistic context of flux correlations. We calculated
the PMDS for each network using different values of Θ (threshold
probability). The results (Fig. [57]5) show that the fraction of driver
nodes is smaller in all cancer tissues compared to their corresponding
healthy states, suggesting that it may be theoretically easier to
control these cancer states. This change is consistent with an increase
in flux correlations observed in three out of four cancer states. In
lung the PMDS fraction is still smaller in the cancer state, even
though the healthy state shows higher flux correlations than the cancer
state. The size of the largest component is higher in the healthy state
for lung, which may explain the observed higher correlations.
Fig. 5.
[58]Fig. 5
[59]Open in a new tab
Fraction of PMDS in healthy and cancer tissues. a–d The PMDS fraction
in four healthy (blue) and cancer (red) tissues is displayed for
different values of the threshold Θ
To account for possible inaccuracies in the original models, we
verified that these results are stable with respect to small
alterations in the topologies of the four healthy and cancer networks.
For each tissue type and for each value of Θ, we constructed 10
randomised networks by rewiring 1%, 5% and 10% of the edges,
respectively. In order to conserve the topological properties of the
networks during randomisation, we applied an edge rewiring algorithm
that preserves the degree distribution (see Methods). Even with a
moderate perturbation of 10%, the average PMDS fraction only marginally
increased and the PMDS fraction in the healthy state always remained
larger than in the corresponding cancer state (Supplementary
Figs [60]1, [61]2 and [62]3).
Distribution of controllers in metabolic pathways
In order to compare the controller and non-controller nodes from a
perspective of biological significance, we analysed the distribution of
PMDS and non-PMDS nodes among metabolic pathways from the KEGG
database. Since we are using metabolic models, a comparison to
metabolic pathways is more appropriate than for example the Gene
Ontology, which contains other types of biological functions not
included in the data. We found that some central metabolic pathways are
enriched in PMDS nodes across cancer and non-cancer networks: these
pathways include glycolysis (p-values from 0.009 to 0.06; all p-values
derived from two-tailed Fisher exact tests) and pyruvate metabolism
(p-values from 0.003 to 0.06, except renal cancer p = 0.12). The
citrate cycle is significantly enriched in PMDS nodes in breast and
renal tissues (p-values from 0.01 to 0.05) but not in urothelial and
lung; inositol phosphate metabolism was enriched in healthy breast
(p = 0.03). Conversely, aminoacyl-tRNA biosynthesis was significantly
depleted in PMDS nodes in all cancer and non-cancer networks (p-values
from 3e−5 to 0.05), as well as N-glycan biosynthesis (p-values from
3e−5 to 0.03). Overall, the fact that a higher proportion of controller
nodes are found in central metabolic pathways is consistent with
expectations, since these pathways distribute fluxes towards other
parts of the metabolic network, and this property is maintained between
healthy and cancer cells. Additionally, these results confirm that each
cancer type has distinctive characteristics in terms of pathways
enriched in PMDS nodes, with lung cancer being more distinct than the
three other types.
Folate cycle subnetwork
There is growing evidence of relations between metabolic perturbations
and cancer. In particular, serine and glycine pathways were found to be
associated with oncogenesis^[63]2. These amino acids feed into the
folate cycle, whose outputs feed into the synthesis of nucleotides and
phospholipids. To illustrate how control analysis can shed light on
changes occurring in metabolic pathways, we show the results obtained
on a subnetwork centred on the folate cycle (Fig. [64]6).
Fig. 6.
[65]Fig. 6
[66]Open in a new tab
Flux correlation changes in the folate cycle. a Simplified
representation of the folate cycle. In this representation, nodes
represent metabolites and edges represent metabolic reactions. b–i
Reaction correlation coefficients in the metabolic system depicted in A
for four types of healthy and cancer tissues. In these representations,
nodes represent metabolic reactions and the edge colour represents the
flux correlation value between the reactions connected by the edge; the
colour bar represents the correlations value from red (0.5) to green
(1.0). Abbreviations: THF, tetrahydrofolate; MeTHF,
5–10-methylene-tetrahydrofolate; mTHF, 5-methyl-tetrahydrofolate; DHFR,
dihydrofolate reductase; GLDC, glycine decarboxylase; MS, methionine
synthase; MTHFR, methylene-tetrahydrofolate reductase; SHMT, serine
hydroxymethyl transferase; TS, thymidylate synthase
In the healthy states, we observe that SHMT and TS are strongly coupled
(Fig. [67]6b, d, f, h) indicating that the production of THF is
strongly coupled to its methylation into MeTHF. In addition, there is a
strong positive correlation between TS and DHFR, which means that THF
production from folic acid is also coupled to the THF cycle. In the
cancer states though, these correlations become weaker (Fig. [68]6c, e,
g, i) and several new connections appear. In breast and lung cancer,
MTHFR becomes strongly coupled to SHMT and MS, suggesting that the
folate cycle uses mTHF as an intermediate rather than direct
transformation of MeTHF to THF. Overall, the cancer states induce more
flux interactions between reactions of the THF system. This increases
the complexity of the relations and at the same time opens more
possibilities to control the fluxes in the system.
Discussion
Links between metabolism and cancer have been known for a long time, as
sustained aerobic glycolysis is well known to occur in cancer
cells^[69]15. How these changes are connected to cell proliferation and
accumulation is not well understood though. It was also widely observed
that obesity increases cancer risk, which raises questions about the
existence of causal links between increased metabolic activity and
tumour progression. Conversely, large animals with low metabolic rate
are generally found to have lower incidence of cancer than smaller
animals with faster metabolic rate, which strengthens the view that
metabolic activity could be a factor favouring cancer, at least as
important as mutations^[70]16. Several oncogenes are known to stimulate
metabolic pathways, in particular glucose and glutamine
metabolism^[71]17. Alternative carbon sources such as acetate were
found to be better utilised by tumour cells than normal tissue^[72]18.
Metabolic flux distributions in tumour cells were observed to correlate
with increased lactate production^[73]19 and to maintain levels of
NADPH, allowing cells to better resist to oxidative
stress^[74]20,[75]21.
Our results show that, in the four tissue types for which comparative
models of healthy and cancer metabolic networks are available, namely
breast, lung, kidney and urothelial, controllability in cancer
metabolism is easier than that in healthy metabolism. A generic
interpretation of this result can be found by combining knowledge from
Fig. [76]3. The network visualization shows significant flux
correlation changes. However, the cumulative degree distribution shows
that, in spite that many reactions change flux correlations, the global
statistical degree pattern of the network does not change and still
follows a power-law for the degree of nodes. This is important
technically because in general the PMDS can be computed faster in
networks that have a power-law distribution, i.e. scale-free networks.
However, the new flux distribution in cancer state seems to be more
highly correlated (lower failure probabilities) as shown in Fig. [77]2,
which makes cancer states easier to control from a PMDS view point
since key flux routes have lower probability of failure. Indeed, these
results are based on comparative analyses of metabolic reconstructions
of cancer and healthy tissues and remain dependent on the quality of
the models used. Nevertheless, we verified that these results are
preserved under moderate alterations of the networks, and as the
quality of metabolic models increases and new models become available
for other tissue types, these properties can be further tested in the
future. It is worth mentioning that other types of analyses not related
to controllability may be able to identify differences between cancer
and healthy metabolism. For example null-space analysis can
characterise properties of genome-scale metabolic networks based on
stoichiometry alone^[78]13 and other types of constraints such as
thermodynamics may be taken into consideration.
Nevertheless, specific differences in metabolic flux correlations were
observed in lung cancer. These differences were also reflected in the
number of reactions assembled in the main connected component of the
network. Heterogeneity in metabolic pathway activity has been reported
before, not only between different types of cancers^[79]22 but also
between different types of lung tumours^[80]23,[81]24. In spite of the
heterogeneous metabolic flux response observed in cancers and in
particular a large fluctuation of active correlations (i.e. number of
links) among reactions in lung cancer, the number of necessary
reactions to be controlled in cancer does not largely change. More
importantly, the metabolic flux space in cancer remains easier to
control than that in healthy state.
The number of active correlations in lung cancer is twice as large as
that number in breast cancer. Counterintuitively, the PMDS size of both
lung and breast cancer is lower than that in the corresponding healthy
state. This indicates that the size of the reaction control backbone of
both lung and breast cancer is very similar. The increase in
correlations observed in lung cancer, consistent with the already
reported variability on cancer metabolism, might only perform
peripheral metabolic functions without critical control roles.
Glutamine metabolism is altered in many types of cancer cells, but the
consumption of glutamine by lung cancer cells is higher than in other
cancer types^[82]25. The glutamine metabolic pathway is tightly
interconnected with the mTOR signalling pathway, which promotes cell
survival and is activated by glutamine efflux; this particular feature
is being investigated for potential therapeutic applications^[83]26.
The tyrosine kinase epidermal growth factor receptor (EGFR) is
frequently mutated in non-small cell lung cancer and has strong
interrelations to several metabolic pathways. It was shown that EGFR
signalling promotes not only glucose consumption and lactate
production, but also de novo pyrimidine synthesis, therefore it plays a
major regulatory role on global metabolism^[84]27. Different KRAS
mutations are also found in lung cancer, which do not have the same
cellular activity. It was shown that they affect different metabolic
pathways, with distinguishing effects in particular in the
glutaminolysis and glutathione pathways^[85]28. These examples show
that, in addition to strong metabolic effects shared with other types
cancers such as aerobic glycolysis, lung cancers can also be
characterised by original relations with metabolic pathways. The full
extent of these relations is still poorly understood but is an active
area of research towards new therapies^[86]25.
Methods
Computation of reaction correlation coefficients
The computation starts by constructing the stoichiometric matrix of the
network, S, whose elements are the stoichiometric coefficients of each
metabolite in each reaction. The kernel matrix K is defined by vectors
constituting a basis of the null-space of S, such that S K^T = 0.
In most cases, K is not unique. However, if the vectors of K form an
orthonormal basis, which means K K^T = I, then the angles θ[ij] between
the row vectors of K are invariant. The reaction correlation
coefficient ϕ[ij] is defined as the cosine of ϑ[ij]:
[MATH:
ϕij=kikjTkikiTkjkjT=
mo>cos(ϑij) :MATH]
1
where k[i] and k[j] are the row vectors of K corresponding to reactions
i and j respectively. As demonstrated in Poolman et al.^[87]13, ϕ[ij]
represents the Pearson correlation coefficient between the fluxes
carried by reactions i and j over all possible steady states of the
system. The mathematical demonstration is based on introducing a random
matrix R that contains all possible steady states, s being the number
of steady states which can be arbitrarily large. Then the Pearson
correlation r of this distribution is defined, and when s tends towards
infinite it is shown that r tends towards a product of k vectors, which
represents the cosine of the angle.
We calculated K using the null function in Matlab and verified that it
meets the orthonormality condition for each metabolic network. After
obtaining ϕ[ij] for each pair of reactions, the probability of failure
between reactions i and j was defined as ρ[ij] = 1 − abs(ϕ[ij]).
Probabilistic control model
Natural and engineered complex networks are composed of thousands of
nodes and tens of thousands of links. These links representing
regulatory interactions or transmission lines suffer from probabilistic
failures. The flux correlation coefficient allows us to define the
probability of failure between reactions i and j (ρ[ij]) and integrate
it into a probabilistic control model. We want each node (reaction) to
be covered by multiple nodes in MDS so the probability that at least
one edge (incoming flux) is active is at least Θ. This problem can be
formulated as a probabilistic minimum dominated (PMDS) as follows. Let
S be a dominating set, then we require S to satisfy:
[MATH: 1-∏
i∈Sρ
mi>ij
≥Θ,∀j∈V :MATH]
2
which can be rewritten as:
[MATH:
∑i∈S-lnρij<
/mi>≥-ln1-Θ :MATH]
3
where ρ[ij] indicates the probability of failure between reactions (i,
j).
The standard MDS problem is formalized by the following integer linear
programming (ILP) problem:
Minimise
[MATH:
∑i∈V
xi :MATH]
subject to:
[MATH:
xi+
∑(j,i
)∈Exj≥1, :MATH]
4
[MATH:
xi∈0,1
,∀i∈V,
mo> :MATH]
where an MDS is obtained by the set
[MATH: x∣xi=1, :MATH]
V indicates the set of reaction nodes in the network and E denotes the
set of undirected edges between reaction nodes. Then, inserting the
probabilistic condition shown in Eqs [88]2 into [89]3 leads to the PMDS
formalized as the following ILP:
Minimise
[MATH:
∑i∈V
xi, :MATH]
subject to:
[MATH:
xj≥1,∀j∈Vsuchthatdegj=0
, :MATH]
[MATH: -ln1-Θxj+
∑(i,j
)∈E-ln(ρij)xi≥-ln1-Θ,∀j∈Vsuchthatdegj>0
, :MATH]
5
[MATH:
xi∈0,1
,∀v∈V :MATH]
where deg(j) denotes the degree of reaction node j. In the above
expression, the first term
[MATH: - ln1-Θxj :MATH]
is added because if node j is included in the PMDS, the inequality
needs to be hold. In the implementation of the ILP-based method, a
small number (10^−6) is added to
[MATH:
ρij
:MATH]
to avoid infinity occurrence at
[MATH:
ρij=0 :MATH]
. The ILP problem was solved using the IBM ILOG CPLEX Optimiser package
version 12.0.
Construction of cancer networks
Genome-scale models representing human metabolic pathways in healthy
and cancer states were compiled from Basler et al.^[90]5 and Gatto et
al.^[91]29 Supplementary data. Then, flux correlation networks were
constructed by computing the reaction correlation coefficients. To
create healthy and cancer networks a threshold of 0.5 was used, which
means that absolute correlations smaller than 0.5 were deleted, or
conversely that failure probabilities higher than 0.5 were deleted; in
the resulting networks, nodes represent reactions and two reactions are
connected if their correlation is higher than 0.5. This means that we
are considering the high flux correlation networks in our analysis.
Construction of randomised networks
For each tissue type and for each value of Θ, we constructed 10
randomised networks by rewiring 1%, 5% and 10% of the edges,
respectively. We used the rewire function from the R igraph package,
together with the keeping_degseq function that preserves the original
network’s degree distribution, in order to conserve the topological
properties of the networks.
Pathway enrichment analysis
We analysed the distribution of PMDS and non-PMDS nodes across all
metabolic pathways of the KEGG database in order to determine whether
some pathways are significantly enriched or depleted in controller
nodes. Enzyme Commission numbers associated to reactions were extracted
from the raw metabolic models, then were mapped to PMDS and non-PMDS
node lists in the same conditions as described above. These node lists
were searched against KEGG pathways using the KEGG Mapper tool^[92]30
in order to obtain the number of PMDS and non-PMDS nodes in each
pathway, then two-tailed Fisher exact tests were conducted using R in
order to determine significant enrichment or depletion.
Reporting summary
Further information on research design is available in the [93]Nature
Research Reporting Summary linked to this article.
Supplementary information
[94]Supplementary Information^ (126.3KB, pdf)
[95]Peer Review File^ (173.9KB, pdf)
[96]Reporting Summary^ (78.7KB, pdf)
Acknowledgements