Abstract

   Type 2 diabetes mellitus (T2DM) is a challenging and progressive
   metabolic disease caused by insulin resistance. Skeletal muscle is the
   major insulin-sensitive tissue that plays a pivotal role in blood sugar
   homeostasis. Dysfunction of muscle metabolism is implicated in the
   disturbance of glucose homeostasis, the development of insulin
   resistance, and T2DM. Understanding metabolism reprogramming in newly
   diagnosed patients provides opportunities for early diagnosis and
   treatment of T2DM as a challenging disease to manage. Here, we applied
   a system biology approach to investigate metabolic dysregulations
   associated with the early stage of T2DM. We first reconstructed a human
   muscle-specific metabolic model. The model was applied for personalized
   metabolic modeling and analyses in newly diagnosed patients. We found
   that several pathways and metabolites, mainly implicating in amino
   acids and lipids metabolisms, were dysregulated. Our results indicated
   the significance of perturbation of pathways implicated in building
   membrane and extracellular matrix (ECM). Dysfunctional metabolism in
   these pathways possibly interrupts the signaling process and develops
   insulin resistance. We also applied a machine learning method to
   predict potential metabolite markers of insulin resistance in skeletal
   muscle. 13 exchange metabolites were predicted as the potential
   markers. The efficiency of these markers in discriminating
   insulin-resistant muscle was successfully validated.

Introduction

   Type 2 diabetes mellitus is a complex progressive metabolic disease
   with a heterogeneous etiology. The disease has a high global prevalence
   and is among the top ten causes of death worldwide. According to the
   latest international diabetes federation report, the global prevalence
   of diabetes in 2019 is estimated to be 463 million people. In addition,
   374 million people are at greater risk of developing T2DM [[30]1].
   Therefore, this disease has become a major concern in global health.

   Insulin resistance is the main pathophysiologic feature in T2DM. The
   key insulin-sensitive tissues include skeletal muscle, liver, and
   adipose tissues. Among these, skeletal muscle is responsible for the
   clearance of over 75% of glucose from the bloodstream. Thus this tissue
   plays a great role in lowering the blood glucose level [[31]2].
   Dysfunction of muscle metabolism is associated with disturbance in
   glucose hemostasis and implicated in obesity, insulin resistance, and
   T2DM [[32]3]. Therefore, the study of the molecular mechanisms
   underlying insulin resistance in skeletal muscle will improve our
   understanding and treatments for T2DM.

   In complex and multifactorial diseases such as diabetes, systems
   biology provides a holistic view to study dysregulated subsystems
   associated with the disease in a cell. Meanwhile, genome-scale
   metabolic modeling serves as a context for in silico exploration of
   metabolism through a systems biology approach. Since there is no
   one-to-one relationship between transcription and translation, the
   study of gene expression data alone does not provide an accurate
   understanding of cellular metabolism; while, metabolic network
   simulation gives an in-depth insight into the molecular mechanisms
   involved in cellular processes. Generic metabolic networks are
   reconstructed based on the information encoded in the genome. This
   information includes all possible reactions that can occur in all types
   of human cells. By having gene expression data from a particular cell
   or disease state and integrating it into the generic model, active
   reactions are identified and a context-specific model can be
   reconstructed. Some efforts have been made to study the molecular
   mechanisms contributing to muscle insulin resistance using genome-scale
   metabolic models (GEMs). Bordbar et.al. have reconstructed a
   multi-tissue type GEMs and integrated gene expression data of obese and
   type 2 obese gastric bypass patients to study metabolic activity
   between these states [[33]4]. Nogiec et.al. have applied flux balance
   analysis to model muscle insulin resistance metabolism in fasted to fed
   states. In addition, they tried to identify reactions that reproduce
   key features of insulin resistance by perturbing the muscle metabolic
   network. Nevertheless, they did not apply the integration of
   high-throughput data to the model [[34]5]. Varemo et.al. have
   reconstructed a myocyte metabolic network and have utilized it to apply
   a gene-set analysis. They have found reporter metabolites and the
   metabolic pathways affected by differentially expressed genes in T2DM
   [[35]6]. Previous studies were either small scale or no comprehensive
   simulation-based metabolic analysis was employed. Also, there is not
   any study that demonstrated metabolic disturbance at the early stages
   of diabetes. Since T2DM is a progressive disease, identification of
   disorders in newly diagnosed patients can provide significant
   information about the disease. It may bring us closer to identifying
   the main causes triggering the onset of disease progression.

   In this study, the system-level metabolic analysis on the most
   comprehensive data from early-stage T2DM patients was performed. The
   samples include muscle gene expression data from newly diagnosed
   diabetic patients [[36]7]. We attempted to identify early metabolic
   alterations associated with the disease. We first reconstructed the
   functional muscle-specific metabolic model. The model was employed for
   metabolic analyses in T2DM. We applied personalized metabolic modeling
   along with the constraint-based approach to study metabolic
   reprogramming in diabetic muscles. Moreover, potential metabolite
   markers related to muscle insulin resistance were predicted using a
   machine learning approach. Finally, we validated these markers with
   gene expression data from a separate study [[37]8]. The overall
   workflow of this study is shown in [38]Fig 1.

Fig 1. The workflow of study design.

   [39]Fig 1
   [40]Open in a new tab

   (A) The muscle-specific metabolic model was reconstructed and was
   employed for our following analyses: (B) Study of metabolism
   reprogramming in newly diagnosed T2DM patients using the personalized
   metabolic modeling with the constraint-based approach. (C) Prediction
   of potential metabolite markers of insulin resistance in muscle by
   applying a machine-learning technique.

Materials and methods

Ethics statement

   In this study, we utilized transcriptome data from the dbGaP database
   with the accession code phs001068.v1.p1, which was originally published
   in reference [[41]7]. The authors of reference [[42]7] obtained written
   informed consent from all subjects and obtained approval from the
   coordinating ethics committee of the Hospital District of Helsinki and
   Uusimaa. For validation of metabolite markers, we used muscle gene
   expression data of healthy and diabetic subjects from the GEO
   repository with the accession numbers [43]GSE63887 and [44]GSE81965,
   which were obtained from reference 8. The authors of reference [[45]8]
   stated that ethical approval for the human studies, from which
   satellite cells were obtained, was provided by the Ethics Committee of
   Copenhagen and Frederiksberg Council, Denmark (KF 01-141/04). The human
   volunteers were given oral and written information about the
   experimental procedures, and all subjects provided written consent
   before participation. The study was carried out in accordance with the
   principles of the Declaration of Helsinki as revised in 2000.

   As we did not directly perform any experiments on human samples and
   solely used publicly available data, we deemed it unnecessary to seek
   additional ethical approval for our study.

Data

   We searched transcriptome databases for skeletal muscle data from
   healthy and T2DM individuals. We found the most comprehensive dataset
   from the dbGaP database with the accession code phs001068.v1.p1. The
   gene expression data were obtained from vastus lateralis muscle using
   high-throughput RNA-Seq technology. The data comprise 91 healthy and 63
   newly diagnosed subjects [[46]7]. The transcriptome data were
   downloaded and were used for our analyses.

   For validation of metabolite markers, the muscle gene expression data
   of healthy and diabetic subjects were taken from the GEO repository
   with the accession numbers [47]GSE63887 and [48]GSE81965 [[49]8]. This
   data contains RNA-Seq samples from 24 individuals in four equal
   following groups: normal glucose tolerant (NGT)/non-obese, NGT/obese,
   T2DM/non-obese, and T2DM/obese.

Muscle-specific metabolic model reconstruction

   The muscle-specific metabolic model was reconstructed by integration of
   gene expression data to the generic model. Here, we used the HMR2 model
   as the generic model [[50]9, [51]10]. The gene-protein-reaction (GPR)
   relationships in HMR2 were corrected using the Metabolic Atlas database
   [[52]11] and BiGG model [[53]12]. Moreover, the myocyte biomass
   reaction from the Bordbar muscle metabolic model was added to our model
   [[54]4]. Muscle gene expression profiles of healthy individuals [[55]7]
   were used. For the preprocessing, low expressed genes (< 5 read counts
   in less than 25 percentages of samples) were filtered and
   between-sample normalization according to the DESeq2 normalization
   method with gene length adjustment and log2 (gene expression+1)
   transformation were applied [[56]13]. E-Flux algorithm [[57]14] was
   employed to integrate gene-expression data to the HMR2 model and to
   reconstruct the context-specific metabolic model. To build a fully
   functional model we removed dead-end reactions.

Constraint-based personalized metabolic modeling

   Personalized GEMs were reconstructed by integrating gene expression
   data from each individual into the muscle-specific metabolic model.
   Pre-processed gene expression data and E-Flux algorithm were used here.
   Media conditions were set by body fluid metabolites [[58]15]. We used
   maximizing flux through the production of mitochondrial ATP as the
   objective function. In addition, to ensure that the simulated cell is
   biologically relevant and functional,we set the lower bound of biomass
   reaction to 80% of the maximum biomass production by the healthy model
   [[59]16]. For personalized metabolism simulation, flux variability
   analysis was applied. FVA is a common constraint-based and linear
   programming method to find the possible range of flux through each
   reaction while a set of constraints are satisfied [[60]17]. The minimum
   (min) and maximum (max) possible fluxes through each reaction in each
   personalized model were obtained using FVA in Cobra Toolbox [[61]18].
   The glpk has been used for solving the linear programming problem. To
   find perturbed reactions in the T2DM state, the two-sample t-test was
   applied on the min and max fluxes between healthy and T2DM groups. The
   Benjamini–Hochberg procedure was used to apply multiple testing
   corrections and reactions with false discovery rates less than 0.1 were
   regarded as significant perturbed reactions.

Enrichment analyses

   To looks for significantly dysregulated pathways and metabolites,
   enrichment analyses were performed. Significant perturbed reactions
   were used as the reaction set to be enriched. Pathway-reaction set and
   metabolite-reaction sets were created using our reconstructed muscle
   metabolic model. Metabolites were assigned to the reaction if it
   participated in the corresponding reaction as either substrate or
   products. For enrichment analysis, hypergeometric 1-sided test with FDR
   correction for multiple testing correction was used.

Metabolite-centric network

   Metabolite-centric is one of the most common representations of the
   metabolic network. Metabolites are considered as the nodes of the
   network. Then two nodes are linked if both of them participate in the
   reaction, one as the substrate and the other as the product. We
   reconstructed the metabolite-centric network from perturbed reactions.
   For simplicity, transport and exchange reactions were excluded and
   unique metabolites instead of the compartmentalized ones were used. We
   also excluded the currency metabolites. To find the significant
   connected components of dysregulated metabolites, metabolites with an
   adjusted p-value greater than 0.05 were filtered. Adjusted p-values
   were obtained from metabolite enrichment analysis. The
   metabolite-centric subnetwork was visualized using Cytoscape version
   3.8.2 [[62]19].

Classification and feature selection

   An SVM classifier with a polynomial kernel of order 2 was used for
   classification. The classifier was evaluated by 10-fold stratified
   cross-validation and analysis of accuracy, sensitivity, and specificity
   were reported.
   [MATH:
   <mi>A</mi><mi>c</mi><mi>c</mi><mi>u</mi><mi>r</mi><mi>a</mi><mi>c</mi><
   mi>y</mi><mo>=</mo><mfrac><mrow><mi>T</mi><mi>P</mi><mo>+</mo><mi>T</mi
   ><mi>N</mi></mrow><mrow><mi>T</mi><mi>P</mi><mo>+</mo><mi>T</mi><mi>N</
   mi><mo>+</mo><mi>F</mi><mi>P</mi><mo>+</mo><mi>F</mi><mi>N</mi></mrow><
   /mfrac> :MATH]
   [MATH:
   <mi>S</mi><mi>e</mi><mi>n</mi><mi>s</mi><mi>i</mi><mi>t</mi><mi>i</mi><
   mi>v</mi><mi>i</mi><mi>t</mi><mi>y</mi><mo>=</mo><mfrac><mrow><mi>T</mi
   ><mi>P</mi></mrow><mrow><mi>T</mi><mi>P</mi><mo>+</mo><mi>F</mi><mi>N</
   mi></mrow></mfrac> :MATH]
   [MATH:
   <mi>S</mi><mi>p</mi><mi>e</mi><mi>c</mi><mi>i</mi><mi>f</mi><mi>i</mi><
   mi>c</mi><mi>i</mi><mi>t</mi><mi>y</mi><mo>=</mo><mfrac><mrow><mi>T</mi
   ><mi>N</mi></mrow><mrow><mi>T</mi><mi>N</mi><mo>+</mo><mi>F</mi><mi>P</
   mi></mrow></mfrac> :MATH]

   True Positives (TP) refers to the T2DM individuals that are correctly
   classified as patients. False Positives (FP) refers to the healthy
   individuals that are incorrectly classified as patients. True Negatives
   (TN) is the number of healthy individuals that correctly classified as
   healthy, and False Negatives (FN) is the number of T2DM individuals
   that are incorrectly classified as healthy.

   For feature selection, a feature selection method, based on a hybrid of
   genetic algorithm (GA) and support vector machine was employed. In the
   GA, the feature subset is encoded as the candidate solution on a
   chromosome-like structure. The population is formed from a set of
   chromosomes that mutation and crossing over in it occur to generate the
   next generation. A fitness score is calculated for each chromosome
   representing adaptation of it to the environment and better feature
   subsets have more chance of reproducing the next generation. Creating
   the next generations will be continued until a stopping criterion is
   satisfied.

   Here, a binary GA was employed that the binary values of 1 and 0
   represent the presence or absence of a specific feature at the
   particular chromosome, respectively. The chromosome length was set to
   the number of features. The population size and the maximal number of
   generations were set to 300 chromosomes and 100 generations,
   respectively. We used the accuracy of the SVM classifier as the fitness
   score. The GA was stopped if the fitness score was reached an accuracy
   higher than 90 percent or the maximum number of generations was
   reproduced.

Results

Comprehensive high-throughput RNA-Seq data from skeletal muscle

   We looked for the most comprehensive skeletal muscle gene expression
   profiles. We queried the transcriptome repositories for individuals’
   expression profiles with the following criteria: 1) newly diagnosed
   with T2DM. 2) Taking no medication for diabetes. 3) Do not have any
   other disease affecting the analysis. This resulted in data with the
   accession code phs001068.v1.p1 in the dbGaP database. Diabetic patients
   in this sample have been newly diagnosed as T2DM patients, and in most
   of them, the fasting blood glucose concentration was around seven
   mmol/l. [63]Table 1 shows the characteristics of the subjects of the
   samples used in this study. Investigation of metabolism reprogramming
   in these patients who are at the early stages of T2DM has significant
   value in identifying the underlying disorders in T2DM. We used gene
   expression data of healthy individuals in this sample to reconstruct
   the muscle-specific metabolic model.

Table 1. Subjects characteristics of the individuals in the dataset [[64]7].

   Means ± standard deviation values for fasting plasma glucose, fasting
   serum insulin, body mass index (BMI), and waist/hip ratio (WHR) in each
   diabetic and healthy group are shown.
   Sample  Glucose (mmol/L) Insulin (mu/l) BMI          WHR
   T2DM    7.17 ± 0.66      10.68 ± 6.94   29.37 ± 5.04 0.99 ± 0.07
   Healthy 5.62 ± 0.33      6.87 ± 3.34    26.35 ± 3.47 0.92 ± 0.08
   [65]Open in a new tab

Human muscle-specific metabolic model reconstruction

   Reconstruction of the muscle-specific metabolic model requires the
   human generic metabolic model as the template. Here, the Human
   Metabolic Reaction 2 (HMR2) [[66]10] was employed as the template
   metabolic model. The gene expression data are integrated into the model
   using GPR relationships. Proteins/protein complexes are linked to the
   corresponding genes by GPRs that are typically described with Boolean
   rules. An ’AND’ logic operator between two genes implies the required
   expression of both of them for a protein to have a function; an ‘OR’
   operator denotes any of the related genes that can produce protein. In
   the HMR2 model, however, each reaction is assigned to the corresponding
   genes but the Boolean operators were all OR annotated. Since the right
   GPR rules are crucial for metabolic simulation, we manually curated
   them using the Metabolic Atlas database [[67]11] and BiGG models
   [[68]12]. The updated HMR2 model was used for subsequent steps.

   The biomass reaction is used in metabolic models to represent all
   metabolites required for cell growth and maintenance. We included the
   biomass function in the template model to ensure that the simulated
   fluxes also assert muscle maintenance and viability. The
   muscle-specific biomass maintenance function was built using the
   information obtained from the experimental measurements and literature
   [[69]4]. The process of formulating muscle-specific biomass functions
   has been described in their publication [[70]4]. The biomass reaction
   was constructed and added to the template model.

   The process of reconstructing the muscle-specific metabolic model
   requires the expression data to infer active reactions from the
   template model. The preprocessed transcriptome data from the healthy
   group were used to integrate into the model. E-flux algorithm [[71]14]
   was applied to build the muscle-specific metabolic model. Reactions
   with dead-end metabolites were detected and removed from the model to
   have a fully functional model. Dead-end metabolites are those that are
   only consumed or produced within the network. Reactions with dead-end
   metabolites can not carry flux and are blocked. The final model
   comprises 3560 metabolites, 5714 reactions and, 2704 genes. [72]Table 2
   shows the characteristics of our muscle-specific metabolic model in
   comparison with HMR2. The reconstructed muscle metabolic model in this
   study is freely available at
   [73]https://github.com/Maryamkhn/Muscle_MetNet_CBB.

Table 2. Comparison of HMR2 metabolic model with reconstructed
muscle-specific metabolic model in the present study.

               HMR2 model Muscle model
   Reactions   8181       5714
   Metabolites 6006       3560
   Genes       3765       2704
   [74]Open in a new tab

   For quality controls and validation, we first checked for biomass
   production. This asserts that all the required reactions to having an
   alive model are satisfied. Then, 56 metabolic tasks known to occur in
   all types of human cells were checked; Important pathways including
   glycolysis, TCA cycle, pentose phosphate, cellular respiration,
   synthesis of nucleotides and lipids, oxidation of fatty acids were
   checked. The complete list of these metabolic tasks can be found in
   [[75]20]. Moreover, for muscle-specific validation, same as Bordbar
   et.al. [[76]4], we tested the ability of the model to produce ATP from
   several different sources like glucose, fatty acids, glycogen,
   branched-chain amino acids, and ketone bodies. The model successfully
   passed all these tests. The reconstructed muscle-specific metabolic
   model was employed for further analyses.

Metabolic reprogramming in the muscle at the early stage of T2DM

   The reconstructed muscle metabolic model can be employed as a context
   to investigate metabolic reprogramming in diabetic muscles. Integrating
   gene expression data with the metabolic model can unravel pathways
   implicated in the disease. Here, we applied constraint-based
   personalized metabolic modeling. We reconstructed personalized
   metabolic models for healthy and T2DM individuals. Then, using a
   constraint-based approach, metabolism was simulated at the personalized
   level. In this approach, the metabolic model is converted to the
   mathematical format, and several constraints (e.g. adjusting lower and
   upper bound of fluxes in each reaction) were imposed on the model.
   Combining high-throughput transcriptional data and constraint-based
   models provides an opportunity to infer cellular metabolism at the
   relevant physiological state [[77]21]. Flux variability analysis (FVA)
   is a well-known constraint-based approach, which can provide allowable
   flux ranges in the metabolic model through linear programming based
   strategy [[78]17].

   This analysis resulted in perturbed reactions involving in several
   pathways. The perturbations in the glucose to glucose-6-phosphate
   conversion reaction and the glucose exchange reaction were also
   observed. Many reactions contributing to fatty acids and lipid
   metabolism were dysregulated. Also, changes in fluxes of several
   reactions implicated in inositol phosphate, chondroitin, and heparin
   sulfate metabolism were observed. To find statistically significant
   enriched pathways, pathway enrichment analysis was employed. [79]Fig 2
   shows enrichment scores of these significant pathways. This score is
   calculated as the percentages of the perturbed reactions in each
   pathway divided by the total number of reactions in the corresponding
   pathway. For simplicity, carnitine shuttles, transport, and exchange
   reactions are not shown. Keratan sulfate, chondroitin/heparan sulfate
   biosynthesis, and glycerolipid metabolism have the most enrichment
   scores. Moreover, the distribution of perturbed reactions in these
   significant pathways is shown in [80]Fig 3. Formation and hydrolysis of
   cholesterol esters, glycerolipid metabolism, fatty acid activation,
   keratan sulfate biosynthesis, chondroitin/heparan sulfate biosynthesis,
   glycerophospholipid metabolism, and inositol phosphate metabolism are
   the top enriched pathways in this figure.

Fig 2. Pathway enrichment analysis.

   [81]Fig 2
   [82]Open in a new tab

   Perturbed reactions obtained from metabolic analysis in T2DM were used
   as reaction set to be enriched. The X-axis indicates the enrichment
   score. This score is calculated as the percentages of the perturbed
   reactions in each pathway divided by the total number of reactions in
   the corresponding pathway. Colors also show adjusted p-value obtained
   from pathway enrichment analysis.

Fig 3. The distribution of perturbed reactions in enriched pathways.

   [83]Fig 3
   [84]Open in a new tab

   These perturbed reactions obtained from metabolic analysis in T2DM.
   Numbers indicate the percentage of reactions that belong to each
   pathway.

   To reveal metabolic alterations at the subcellular compartment level,
   we classified each perturbed reaction based on the compartment the
   reaction occurs inside it (([85]Fig 4A). As is shown, cytosol,
   lysosome, and endoplasmic reticulum are the top dysregulated
   compartments. Cytosol due to the high number of cytosolic reactions has
   the largest amount of perturbed reaction. To understand which
   compartment is the most dysregulated ones we adjusted perturbed
   reactions in each compartment by the total number of reactions in the
   corresponding compartment (transport reactions were excluded). [86]Fig
   4B shows the result of this analysis. This figure demonstrates that
   Golgi, lysosome, and endoplasmic reticulum were mostly affected. Golgi
   has mainly encountered dysmetabolism in chondroitin, heparin, and
   keratan biosynthesis. Perturb reactions implicated in the formation and
   hydrolysis of cholesterol esters, heparan sulfate degradation, and
   chondroitin sulfate degradation comprise the major dysmetabolism in the
   lysosome. In addition, the endoplasmic reticulum is mostly affected by
   perturbation in the carnitine shuttle subsystem.

Fig 4. Dysmetabolism at the subcellular level.

   [87]Fig 4
   [88]Open in a new tab

   A) Distribution of perturbed reactions based on the subcellular
   compartments. B) The percentage of perturbed reactions in each
   compartment.

   We also provided enrichment analysis of significant perturbed reactions
   in the context of metabolites. Several important metabolites mainly
   implicated in amino acids, fatty acids, and lipid metabolism were
   enriched; Metabolites including glucose, mitochondrial ATP, CoA, fatty
   acids, activated fatty acids, cholesterol, amino acids, pyruvate, AKG,
   succinate, homoserine, NADH, FADH, glutathione, Acetyl-CoA,
   CMP-N-acetylneuraminate, THF, 5,10-methenyl-THF, and 10-formyl-THF.
   [89]Fig 5 shows the top 40 significant enriched metabolites with
   corresponding enrichment scores (currency metabolites are excluded).
   The complete list of the enriched metabolites in unique and
   compartmentalized form can be found in Additional File 1, S1 and S2
   Tables in [90]S1 File.

Fig 5. Metabolites enrichment analysis.

   [91]Fig 5
   [92]Open in a new tab

   The X-axis indicates the enrichment score. This score is calculated as
   the percentage of perturbed reactions to the total number of reactions
   in which the corresponding metabolite involved. Colors also show
   adjusted p-values obtained from enrichment analysis.

   We constructed a metabolite-centric network from the perturbed
   metabolic network to find the connected component of significantly
   dysregulated metabolites. Enriched metabolites with adjusted p-value
   lower than 0.05 were preserved and other metabolites were removed from
   the metabolite-centric network. We used Cytoscape biological network
   tool [[93]19] to visualize the subnetwork. [94]Fig 6 shows the two
   biggest connected components as significantly dysregulated ones; One is
   related to the glycerolipid metabolism and the other related to the
   metabolism of the amino acids. We also found tetrahydrofolate (THF) and
   glutathione (GSH) connected to the amino acids related component.

Fig 6. Connected components of significantly dysregulated metabolites.

   [95]Fig 6
   [96]Open in a new tab

   The metabolite-centric network was reconstructed from the perturbed
   reactions in T2DM. Insignificant enriched metabolites were removed from
   the network. The two biggest connected components were found. The right
   component mainly consists of amino acids. The left component comprises
   metabolites involved in the glycerolipid metabolism.

Potential metabolite markers prediction

   The genome‐scale model of human metabolism provides an opportunity to
   predict potential biofluid markers associated with the diseases. The
   overall approach involves identifying exchange metabolites that their
   flux range in disease state is shifted compared to healthy ones. The
   successful performance of this method has already been evaluated and
   validated [[97]22]. Here, we applied this method to find metabolite
   markers of insulin resistance in muscle. Two generic metabolic models
   of healthy and diabetic muscles were reconstructed using average gene
   expression data of each group and the E-Flux method; Then, FVA analysis
   was applied to them. The exchange reactions in which the flux range was
   shifted relative to the normal state were selected. This approach
   resulted in 32 exchange reactions. In order to evaluate the
   discriminative capability of these reactions at the individual level, a
   machine learning approach was employed. To do this, the generated
   minimum and maximum fluxes of these reactions obtained from
   personalized FVA analysis were used as the features for classification
   with SVM. This evaluation resulted in 72.22% accuracy, 81.11%
   specificity, and 60.32% sensitivity. We found that these exchange
   metabolites altered at the individuals’ level.

   To achieve a fewer and better feature subset with improvement in
   classification accuracy, a wrapper feature selection method was
   employed that comprises a combination of GA and SVM as the feature
   selector and classifier, respectively. Several subsets of features with
   which the SVM classifier can discriminate diabetic patients from
   healthy ones with approximately 90 percentages accuracy were found.
   This GA-SVM procedure was iterated 100 times resulting in 100 feature
   subsets; then, the observation frequency of each feature in these
   iterations was calculated. These frequencies were considered as
   indicative of the importance of features and features with at least 80%
   frequency regarded as top-ranked features. The top-ranked features were
   related to 13 reactions exchanging glucose, alanine, aspartate, sodium,
   hyaluronate, galactose, GQ1b, acetaldehyde, deoxyribose,
   4-OH-Estradiol, hypoxanthine, retinoate, and methylglyoxal.
   Subsequently, the SVM classification performance with the top-ranked
   features was assessed. Our analysis revealed that using these
   top-ranked features improved classification accuracy to 81.04% with
   73.02% sensitivity and 86.67% specificity. Since the classifier
   performance will vary depending on which samples are assigned to the
   training set and which ones to the test set, we repeated the 10-fold
   cross-validation 100 times and evaluated the SVM performance. The
   boxplot of this evaluation is shown in [98]Fig 7.

Fig 7. Boxplot of SVM evaluation using 13 metabolite markers as features.

   [99]Fig 7
   [100]Open in a new tab

   10-fold cross-validation was repeated 100 times and the SVM performance
   was evaluated.

   For further validation, we assessed the performance of these top-ranked
   features with data obtained from another study [[101]8]. This data
   comprises RNA-Seq samples from the vastus lateralis muscle of 24
   participants, divided into four following groups: normal glucose
   tolerant (NGT)/non-obese, NGT/obese, T2DM/non-obese, and T2DM/obese.
   After preprocessing of the read counts, personalized metabolic models
   were reconstructed and FVA analysis was performed. Associated fluxes of
   top-ranked features were obtained and were used as the features for the
   SVM classifier.

   We divided these people into NGT and diabetic groups regardless of the
   obesity state and the generated fluxes of marker reactions in these 24
   participants were used for validation with SVM. This analysis resulted
   in 49.46% accuracy, 42.25% specificity, and 56.67% sensitivity, which
   is remarkably low, probably due to the presence of normoglycemic obese
   people in the healthy group. Although the blood glucose level in these
   individuals is in the normal range, high levels of fasting blood
   insulin level (60 ± 33 pmol/l in NGT/obese vs 25 ±7 pmol/l in
   NGT/non-obese) indicates insulin resistance in this group, which is
   compensated by more insulin secretion. Thus, we decided to remove data
   from NGT/obese individuals and re-evaluated markers in this set. The
   result significantly altered with 78.50% accuracy, 70.17% specificity,
   and 82.67% sensitivity. Therefore, the metabolites markers are related
   to insulin-resistant metabolism and can successfully discriminate
   insulin-resistant individuals from healthy ones.

Discussions

   Here, we aimed to provide a holistic view of the metabolism remodeling
   in skeletal muscle of newly diagnosed T2DM patients. Since diabetes is
   a progressive metabolic disease, our goal was to identify the metabolic
   alterations at the early stages of the disease. We used the gene
   expression data from the most comprehensive human skeletal muscle
   transcriptome study to date. The diabetic sample consists of
   participants who have newly been diagnosed with diabetes (with an
   average blood glucose concentration of ~ 7.2 mmol/L) and had not taken
   any medications. For understanding metabolism alterations in T2DM, we
   first reconstructed our functional muscle-specific metabolic model. We
   could not utilize Bordbar [[102]4] or Nogiec [[103]5] muscle metabolic
   models due to them being small-scale. Varemo model [[104]6] was a
   complete muscle metabolic model, but because of the wrong GPRs, it was
   not suitable for simulation. Thus, we reconstructed our comprehensive
   skeletal muscle metabolic model. The model quality was successfully
   validated. This model is freely available and can be used for other
   muscle metabolic researches.

   We employed the reconstructed model for system-level investigation of
   dysmetabolism in T2DM. Personalized metabolic models with
   constraint-based simulations were applied to quantitatively compare
   metabolic capabilities between healthy and newly diagnosed diabetic
   patients. Perturbed reactions and subsequently affected pathways and
   metabolites were identified. We found that significant metabolic
   alterations occurred in several pathways. Pathways implicated in
   membrane and ECM biosynthesis were dysregulated. Perturbations in the
   metabolism of keratin sulfate, chondroitin sulfate, and heparan
   sulfate, inositol phosphate, glycerolipid, glycerophospholipid, and
   sphingolipid metabolism were observed. Chondroitin sulfate, keratin
   sulfate, heparan sulfate, and hyaluronic acid are glycosaminoglycans
   (GAGs). GAGs play a vital role in cell physiology including cell
   signaling, proliferation, and cell adhesion. The insulin-sensitizing
   and anti-diabetic impacts of some GAGs have been reported [[105]23,
   [106]24]. Perturbations in GAGs related pathways and sphingolipid
   metabolism imply the role of the ECM in insulin resistance, which is
   involved in the regulation of insulin action. Inositol mediates insulin
   signal transduction, is associated with glucose uptake, and plays a
   vital role in oxidative stress and inflammation. Administration of
   inositol supplements improves glucose metabolism and insulin resistance
   [[107]25].

   Significant alterations in lipid metabolism including glycerolipid,
   glycerophospholipid, sphingolipid, and glycosphingolipid metabolism
   were found in our analyses. These pathways mediate the production of
   major constituents of the plasma membrane. These metabolites act as
   structural components of membranes and as signaling molecules.
   Dysfunctional lipid metabolism can lead to membrane rearrangement,
   changing its integrity, and interrupt cell signaling. Perturbation in
   these pathways also triggers feedback loops resulting in mitochondrial
   dysfunction, reactive oxygen species production, and inflammation.
   Studies have found several intermediates in these pathways implicated
   in insulin resistance [[108]26].

   Enrichment analysis also revealed perturbation in amino acids. Previous
   studies have demonstrated associations between amino acids and T2DM. In
   particular, the various pieces of evidence have implied the implication
   of branched-chain amino acids (BCAAs) in diabetes. Studies have shown
   that dysregulation of BCAAs metabolism results in the serine
   phosphorylation of insulin receptor substrates and subsequent
   uncoupling of insulin signaling [[109]27]. Here, in this analysis BCAAs
   and all other amino acids were affected by the metabolic reprogramming
   in T2DM. Since amino acids also circulate in blood plasma,
   dysmetabolism can possibly be reflected in serum. In recent years,
   amino acids have been considered as a source of prognostic markers in
   T2DM [[110]28]. Here, dysregulation of amino acids in the early stages
   of T2DM was observed which can support the significance of these
   metabolites in the early diagnosis of the disease. Also,
   tetrahydrofolic acid is connected to the amino acids in the
   metabolite-centric network. THF is a cofactor in the anabolism of amino
   acids. THF and its derivatives also contribute to purine, pyrimidine
   and nucleotide synthesis, and DNA methylation [[111]29]. Alteration in
   folate metabolism can disrupt downstream processes. Limited research
   regarding the role of folate and its supplementation has been
   performed. Folate intakes have shown an inverse association with the
   incidence of T2DM [[112]30–[113]33]. It improves insulin resistance by
   modulating DNA methylation of genes associated with obesity and insulin
   resistance [[114]34]. The role of folate metabolism in muscle insulin
   resistance could be the subject of further empirical researches.

   Analysis of metabolic alteration at the sub-cellular level revealed
   that Golgi is mainly affected by metabolic remodeling in T2DM. This
   compartment implicates in building ECM metabolites. Many studies have
   been performed on the role of mitochondria and endoplasmic reticulum in
   the development of insulin resistance [[115]35]; whereas, our knowledge
   about the role of Golgi is very limited. Based on the observations in
   the present study, we suppose the role of Golgi in insulin resistance
   should be further investigated.

   Taken together, the main dysmetabolism was observed in ECM components,
   lipids, and amino acids related pathways. Muscle dysmetabolism changes
   the abundance of metabolites involved in the membrane and ECM. This
   metabolic reprogramming can interrupt the process of sensing insulin
   and transmitting signals into the cell. Decreased insulin sensitivity
   results in lower expression of insulin-responsive genes, reduced
   glucose uptake, and consequently changes in energy, glucose, fatty
   acids, and amino acid metabolisms.

   As the final analysis, potential metabolite markers were predicted. Our
   approach led to the identification of 13 exchange metabolites that
   could discriminate healthy individuals from T2DM patients with 81%
   accuracy. We validated these markers using separate gene expression
   data from the Varemo et.al. study [[116]8]. They have shown that muscle
   transcriptional reprogramming in obesity is similar to that which occur
   in T2DM [[117]8]. Here, we found that using all data from this study,
   including normoglycemic obese individuals for validation, the accuracy
   of our proposed marker was notably low (~ 50%). We thought this low
   accuracy may be due to the metabolic similarities between obese and
   diabetic individuals. In fact, as noted in the Varemo et.al. study,
   obese and diabetic individuals have shown similar gene expression
   patterns [[118]8] that can lead to similar metabolisms. Moreover, obese
   individuals had high levels of fasting blood insulin, which demonstrate
   insulin resistance in this group. To test this issue, we removed obese
   healthy individuals from the validation set. This improved accuracy to
   78.50%. Therefore, this analysis confirmed both the Varemo et.al. claim
   about the similarity of the gene expression pattern between obese and
   diabetic individuals and the appropriate efficiency of our proposed
   markers in identifying insulin-resistant individuals. Although it is
   reasonable to assume that our model predicts insulin resistance
   biomarkers rather than obesity-related metabolic changes based on the
   fact that the diabetic samples in the test group included both
   non-obese and obese individuals, further research is needed to confirm
   that these biomarkers do not overlap with metabolic changes related to
   obesity. To achieve this, muscle gene expression data is required from
   individuals with normal blood sugar and insulin levels in both obese
   and non-obese groups, without drug use or confounding factors.
   Important metabolites such as methylglyoxal, hyaluronan, retinoic acid
   (vitamin A), sodium, alanine, and aspartate are present in the
   predicted markers. Notably, this method also successfully identified
   glucose as one of the markers. Methylglyoxal, a glycolytic by-product,
   is a toxic and highly reactive compound involved in cellular
   dysregulations. This compound modifies nucleotides, proteins, and
   lipids producing advanced glycation end products, contributing to
   diabetes complications. Methylglyoxal is associated with oxidative
   stress, cellular inflammation, and age-related disease such as diabetes
   [[119]36]. Several studies have revealed the impact of methylglyoxal on
   insulin signaling pathways and insulin resistance [[120]37, [121]38]
   and recently this metabolite has been introduced as an emerging marker
   for T2DM diagnosis [[122]36]. Hyaluronan is an anionic GAG metabolite
   implicated in several functions like cell signaling, proliferation and
   migration, and angiogenesis. This metabolite also contributes to the
   inflammation and pathogenesis of T2DM [[123]39, [124]40]. Studies have
   demonstrated that hyaluronan increases in the serum and skeletal muscle
   of T2DM subjects [[125]41, [126]42]. Several analyses have shown the
   possible roles of vitamin A in glucose metabolism, and the progression
   of insulin resistance [[127]43–[128]45]. Association of purine
   metabolites such as xanthine and hypoxanthine with the risk of T2DM
   incidence and complications has been reported [[129]46, [130]47]. Also,
   change in serum concentration of amino acids [[131]48], and sodium
   [[132]49] is associated with obesity and T2DM. In addition, the
   implication of gangliosides in insulin resistance has been shown
   [[133]50–[134]52]. In our study, GQ1b ganglioside was predicted in the
   top-ranked metabolites list that can be considered for future analysis.
   We also checked metabolomics-based studies of T2DM and the Human
   Metabolome Database for these metabolites [[135]53]. We found that
   glucose, hypoxanthine, alanine, aspartate, galactose, hyaluronate, and
   methylglyoxal levels have been reported to be associated with T2DM
   [[136]46, [137]53, [138]54]. This evidence confirms the efficiency of
   our method for identifying metabolite markers. These metabolite markers
   can be used for a further empirical investigation to verify their
   prognostic and diagnostic values in insulin resistance.

Supporting information

   S1 File. The complete list of the enriched metabolites in the unique
   and compartmentalized forms obtained from metabolite enrichment
   analysis of perturbed reactions is provided in S1 and S2 Tables,
   respectively.

   (XLSX)
   [139]Click here for additional data file.^ (74.9KB, XLSX)

Acknowledgments