ABSTRACT

   Context-specific genome-scale model (CS-GSM) reconstruction is becoming
   an efficient strategy for integrating and cross-comparing experimental
   multi-scale data to explore the relationship between cellular
   genotypes, facilitating fundamental or applied research discoveries.
   However, the application of CS modeling for non-conventional microbes
   is still challenging. Here, we present a graphical user interface that
   integrates COBRApy, EscherPy, and RIPTiDe, Python-based tools within
   the BioUML platform, and streamlines the reconstruction and
   interrogation of the CS genome-scale metabolic frameworks via Jupyter
   Notebook. The approach was tested using -omics data collected for
   Methylotuvimicrobium alcaliphilum 20Z^R, a prominent microbial chassis
   for methane capturing and valorization. We optimized the previously
   reconstructed whole genome-scale metabolic network by adjusting the
   flux distribution using gene expression data. The outputs of the
   automatically reconstructed CS metabolic network were comparable to
   manually optimized iIA409 models for Ca-growth conditions. However, the
   CS model questions the reversibility of the phosphoketolase pathway and
   suggests higher flux via primary oxidation pathways. The model also
   highlighted unresolved carbon partitioning between assimilatory and
   catabolic pathways at the formaldehyde-formate node. Only a very few
   genes and only one enzyme with a predicted function in C1 metabolism, a
   homolog of the formaldehyde oxidation enzyme (fae1-2), showed a
   significant change in expression in La-growth conditions. The CS-GSM
   predictions agreed with the experimental measurements under the
   assumption that the Fae1-2 is a part of the tetrahydrofolate-linked
   pathway. The cellular roles of the tungsten (W)-dependent formate
   dehydrogenase (fdhAB) and fae homologs (fae1-2 and fae3) were
   investigated via mutagenesis. The phenotype of the fdhAB mutant
   followed the model prediction. Furthermore, a more significant
   reduction of the biomass yield was observed during growth in
   La-supplemented media, confirming a higher flux through formate. M.
   alcaliphilum 20Z^R mutants lacking fae1-2 did not display any
   significant defects in methane or methanol-dependent growth. However,
   contrary to fae1, the fae1-2 homolog failed to restore the
   formaldehyde-activating enzyme function in complementation tests.
   Overall, the presented data suggest that the developed computational
   workflow supports the reconstruction and validation of CS-GSM networks
   of non-model microbes.

IMPORTANCE

   The interrogation of various types of data is a routine strategy to
   explore the relationship between genotype and phenotype. An efficient
   approach for integrating and cross-comparing experimental multi-scale
   data in the context of whole-genome-based metabolic network
   reconstruction becomes a powerful tool that facilitates fundamental and
   applied research discoveries. The present study describes the
   reconstruction of a context-specific (CS) model for the
   methane-utilizing bacterium, Methylotuvimicrobium alcaliphilum 20Z^R.
   M. alcaliphilum 20Z^R is becoming an attractive microbial platform for
   the production of biofuels, chemicals, pharmaceuticals, and
   bio-sorbents for capturing atmospheric methane. We demonstrate that
   this pipeline can help reconstruct metabolic models that are similar to
   manually curated networks. Furthermore, the model is able to highlight
   previously overlooked pathways, thus advancing fundamental knowledge of
   non-model microbial systems or promoting their development toward
   biotechnological or environmental implementations.

   KEYWORDS: context-specific genome-scale metabolic modeling,
   methanotrophy, methane-utilizing bacteria, Methylotuvimicrobium
   alcaliphilum 20Z^R

INTRODUCTION

   Complete and accurate reconstruction of metabolic networks is central
   for advancing fundamental discoveries or rebuilding microbial systems
   toward desired outcomes ([38]1[39]–[40]3). Model reconstruction
   includes numerous steps, but in general, success relies on the
   efficient integration of multi-scale experimental data, such as
   structural and functional organization of genomes, transcriptomes,
   proteomes, and metabolomes. The integration requires additional
   mathematical model developments that can capture changes in intra- and
   extracellular metabolites of a cell in response to genetic and/or
   environmental perturbation ([41]4). The model is expected to provide in
   silico and yet reliable solutions that reproduce the system’s
   behaviors, such as growth rate or cell yields, and predict genetic
   alterations required to improve desired outcomes (i.e., product yield)
   under defined growth conditions ([42]5, [43]6).

   Advances in next-generation sequencing approaches and automatic
   reconstruction of genome-scale metabolic (GSM) models have led to
   active application of network-guided metabolic engineering of non-model
   organisms, often with unique metabolisms like methane utilization and
   lithotrophy. Numerous models are now available in BiGG ([44]7),
   BioModels ([45]8), and MetaFishNet ([46]9). GSM reconstruction can
   further facilitate developments in non-model microbial systems via the
   addition of omics-driven constraints on the distribution of fluxes to
   realistically represent metabolic capabilities of the organism of
   interest ([47]10, [48]11).

   The present study describes the computational framework for the
   semi-automated reconstruction of a context-specific (CS) model of
   non-model organisms. The workflow includes a set of Python programs,
   including COBRApy ([49]12), EscherPy ([50]13), and RIPTiDe ([51]14)
   integrated into the Jupyter Notebook environment on the BioUML platform
   ([52]15). The pipeline was tested for the CS-GSM reconstruction of
   methane-utilizing bacteria (known as methanotrophs). There is a
   significant interest in compiling the unique metabolic capabilities of
   methanotrophic bacteria for biotechnological and/or environmental
   applications. We anticipate that the computational framework presented
   below can expedite the implementation of GSM models for such
   advancements. We selected Methylotuvimicrobium alcaliphilum 20Z^R, as
   over the years, the Methylotuvimicrobium spp. has become a testable
   microbial platform for methane capturing and valorization
   ([53]16[54]–[55]20). The metabolic flux balance models of the whole
   genome have been generated and manually curated ([56]21[57]–[58]23),
   allowing for a direct comparison between expert-curated and automated
   optimization of GSMs for non-model microbes.

MATERIALS AND METHODS

Strain, growth media, and -omics data sets

Strains

   The lab strain of Methylotuvimicrobium alcaliphilum 20Z^R is a
   Gram-negative, gamma proteobacterium that can grow on either methane or
   methanol as its main source of carbon ([59]24). The cultures of
   Methylobacterium extorquens AM1 and Methylobacterium extorquens AM1
   ∆fae1 were kindly provided by Dr. N. Cecilia Martinez-Gomez (UC
   Berkeley) (also see Table 3). The following Escherichia coli strains
   were used for cloning and gene transfer experiments: E. coli S17-1 (lab
   stock) and E. coli DH10B (NEB Labs, Catalog Number C3019H:
   Δ(ara-leu)7697 araD139 fhuA ΔlacX74 galK16 galE15 e14-ϕ80dlacZΔM15
   recA1 relA1 endA1 nupG rpsL (Str^R) rph spoT1 Δ(mrr-hsdRMS-mcrBC).

Growth media

   M. alcaliphilum 20Z^R strains were routinely grown in Pi[3%] nitrate
   mineral media supplemented with methanol (0.2%, vol/vol) or methane as
   described previously ([60]25). Methane (25–50 mL) was introduced into
   125–250 mL vials with 25–50 mL media using a syringe to keep a
   methane:air ratio of 20:80 in headspace. Plates were incubated in
   anaerobic jars and refilled using an Anoxomat system programmed to
   provide an atmosphere of 12%–25% methane and 75%–88% air. M. extorquens
   strains were grown using Hypho media ([61]26), supplemented with
   methanol (0.2%, vol/vol) or succinate (0.4%, vol/vol). Liquid cultures
   were incubated at 30°C with shaking at 200 rpm. Plates were kept in
   30°C incubators. E. coli strains were grown using LB broth or LB agar
   media (Miller, Difco BD Life Sciences) and incubated at 37°C. The
   following antibiotics were applied when required: kanamycin (100 µg/mL)
   and rifamycin (50 µg/mL).

Gene expression studies

   The cultivation parameters of M. alcaliphilum 20Z^R, culture growth
   rates, and sequencing details were previously described ([62]21) and
   are summarized in [63]Table 1. Briefly, the cultivation parameters and
   cell samples for RNA sequencing were collected from steady-state
   bioreactor cultures growing in nitrate mineral media at pH 8.7–9 with
   or without La supplementation. The parameters for each bioreactor unit
   (250 mL vessel with 125–150 mL of media) were set with agitation at 500
   rpm, temperature kept at 30°C, and gas inflow rate of 0.3 sL/hour.

TABLE 1.

   Summary of RNA-seq transcriptomic data sets used for the optimization
   of the GSM model
   Sample name Sample ID Raw reads Aligned reads Data set ID CH[4]/O[2]
   consumption Growth rate (hour^−1)
   Ca-BR1 Ca-CH4-BR1_S1_L005_R1_001 22,124,770 10,140,552 [64]GSM8020389
   1.12 ± 0.09 0.05
   CaBR2 Ca-CH4-BR2_S1_L005_R1_001 17,884,237 16,955,446 [65]GSM8020390
   1.12 ± 0.09 0.05
   LaBR1 La-CH4-BR1_S1_L005_R1_001 18,542,763 8,409,524 [66]GSM8020393
   1.28 ± 0.01 0.07
   LaBR2 La-CH4-BR2_S1_L005_R1_001 16,381,413 12,029,600 [67]GSM8020394
   1.28 ± 0.01 0.07
   [68]Open in a new tab

   The RNA-seq reads were mapped to the M. alcaliphilum 20Z^R reference
   genome (ASM96853v1) using the Bowtie2 algorithm version 2.5.1 ([69]27).
   Illumina standard adapters were removed using Trim Galore ([70]28)
   software (when necessary) before mapping. The mapped reads were
   quantified with FeatureCounts ([71]29) using gene features from the
   RefSeq annotation (GCF_000968535.2). To extend the functional
   annotation of mapped genes, a comparison of three annotations (old and
   new RefSeq as well as Genoscope annotations) for this genome was
   carried out ([72]Table S1). Subsequently, DESeq2 ([73]30) was used to
   normalize and perform differential expression analysis. Genes are
   considered to be differentially expressed if they have |log2(fold
   change)| > log2(1.5) at adjusted P-value < 0.05. The EnhancedVolcano
   R-package was used to visualize the obtained results.

   For KEGG signaling pathways enrichment analysis, we used a homebrew
   R-script. Specifically, the KEGGREST R-package was applied to match
   genes to KEGG signaling pathways. Then, Wilcoxon rank-sum test was
   performed for enrichment testing. The analysis was performed
   independently on either the full set of genes or specifically on genes
   considered in gene-protein-reaction (GPR) associations of the model.

   The results of the analyses are stored on the BioUML platform ([74]15)
   and available at the following link:
   [75]https://gitlab.sirius-web.org/RSF/20ZR_CS_GSM_model. The RNA-seq
   data (counts and differentially expressed genes) were submitted to the
   NCBI Gene Expression Omnibus database under accession number
   [76]GSE253414.

Extension of the iIA409 model, flux balance analysis with COBRApy, and model
visualization with EscherPy

   To integrate transcriptomic data into the mathematical model, a
   published and experimentally verified mathematical model for 20Z^R,
   iIA409 ([77]21) was harnessed. The model was modified to account for
   the presence of metals either in the medium or transported into the
   cell that were not previously considered explicitly. In addition,
   different scenarios for the formaldehyde partitioning between
   tetrahydrofolate (H[4]F) and tetrahydromethanopterin (H4MPT) pathways
   and phosphoketolase (EC4.1.2.22 and EC 4.1.2.9) were evaluated and
   compared with the original model. Modifications of the original (i.e.,
   iIA409) model, as well as a subsequent flux balance analysis of the
   modified iIA409-based models and CS models reconstructed using the
   RIPTiDe algorithm ([78]14), were made via the COBRApy library (version
   0.25.0 [[79]12]) using Jupyter widgets. The program code is given at
   the link
   [80]https://gitlab.sirius-web.org/RSF/20ZR_CS_GSM_model/-/tree/master/D
   ata/CS_GSM_model/file_collection.files.

   The EscherPy library (version 1.7.3 [[81]13]) was employed to visualize
   flux distributions predicted by the modified and CS models. The final
   network is available at the following link:
   [82]https://gitlab.sirius-web.org/RSF/20ZR_CS_GSM_model/-/tree/master/D
   ata/CS_GSM_model/file_collection.files.

   The differential reaction fluxes (DRFs) between two model predictions
   were visualized on the metabolic map if the ratio of fluxes for a
   reaction fell within the boundaries of 0.67 < DRF < 1.5 (i.e., the
   change in the reaction flux occurs more than 1.5 times). Additionally,
   reactions that were activated (On) or turned off (Off) in the CS model
   but not in the original were assigned to a ratio value of 1,000 (for
   On) or 2,000 (for Off). To visually distinguish these turned-on/off
   reactions from other DRFs, a specific color assignment was applied:
   pink for the value of 1,000 (On reactions) and aqua for the value of
   2,000 (Off reactions). The visualization program code is given under
   the link in the Jupyter Notebook called “20Z_transcript” in sections
   “Escher visualization” and “DRF analysis”
   ([83]https://gitlab.sirius-web.org/RSF/20ZR_CS_GSM_model/-/tree/master/
   Data/CS_GSM_model/file_collection.files).

Integration of transcriptomics data into the model via RIPTiDe

   The RIPTiDe algorithm (version 3.4.79 [[84]14]) was used to integrate
   transcriptomic data. Counts reflecting expression levels for each gene
   in transcripts per million in the corresponding growth conditions
   ([85]Table 1) were used as a source of transcriptomic data for the
   construction of CS models. Right before data integration, the selected
   changes were made to the model to match the selected cultivation
   conditions, and, if necessary, the biomass metal content was modified
   using the original code implemented within the Jupyter Notebook using
   the Jupyter widgets modules, which greatly simplified the process of
   the model modification. The stoichiometric coefficients for La^3+ and
   W^4+ in the biomass equation were set at experimentally measured
   concentrations (summarized in [86]Table 2). Also, the transcriptomic
   data were prepared using the RIPTiDe algorithm. Because we used data
   normalized by DESeq2, the normalization step with RIPTiDe was skipped.
   The file for the algorithm was converted to TSV format, containing a
   column with gene IDs (the same as those used in GPR rules in the iIA409
   model) and columns with normalized counts for each gene. For the direct
   reconstruction of CS models, the maxfit module was used, which analyzes
   models with different proportions of the original biomass in the model
   and selects the one in which the predicted distribution of flux had the
   highest level of correlation with transcriptomic data ([87]Table 2).
   RIPTiDe contextualize module was applied to reanalyze steps that were
   not calculated by original maxfit. All codes for the model modification
   and transcriptomic data integration are provided within the BioUML
   platform and available at
   [88]https://gitlab.sirius-web.org/RSF/20ZR_CS_GSM_model/-/tree/master/D
   ata/CS_GSM_model/file_collection.files and at
   [89]https://github.com/mkulyashov/20ZR_CS_GSM_model_mSystems.

TABLE 2.

   Modifications and in silico growth conditions used for the
   reconstruction of CS models via RIPTiDe algorithm
   Context-specific condition Model modification RIPTiDe parameters
   CH[4] (+Ca) Consumption CH[4] −11.7 ⇒ −2.85 mmol·gDCW^−1·hour^−1
   Consumption W^4+ lower bound: 0 ⇒ −1,000
   W^4+ added to biomass equation as 0.01 mmol/gDCW maxfit default
   parameters, frac_min = 0.3
   CH[4] (+La) Consumption CH[4] −11.7 ⇒ −6.84 mmol·gDCW^−1·hour^−1
   MXALa_for lower bound: 0 ⇒ 1.71; upper bound: 1,000 ⇒ 1.71
   FALDtpc lower bound: 0 ⇒ 5.13; upper bound: 1,000 ⇒ 5.13
   Consumption La^3+ lower bound: 0 ⇒ −1,000
   Consumption W^4+ lower bound: 0 ⇒ −1,000
   Consumption Ca^2+ lower bound: −1,000 ⇒ 0
   Ca^2+ removed from the biomass equation
   W^4+ added to biomass equation as 0.01 mmol/gDCW
   La^3+ added to biomass equation as 0.01 mmol/gDCW maxfit default
   parameters, frac_min = 0.5
   CH[4] (+La), Scenario 2 Consumption CH[4] −11.7 ⇒ −6.84
   mmol·gDCW^−1·hour^−1
   MXALa_for lower bound: 0 ⇒ 1.71; upper bound: 1,000 ⇒ 1.71
   FALDtpc lower bound: 0 ⇒ 5.13; upper bound: 1,000 ⇒ 5.13
   Faeii lower bound: −1,000; upper bound: 1,000
   Consumption La^3+ lower bound: 0 ⇒ −1,000
   Consumption W^4+ lower bound: 0 ⇒ −1,000
   Consumption Ca^2+ lower bound: −1,000 ⇒ 0
   Ca^2+ removed from the biomass equation
   W^4+ added to biomass equation as 0.01 mmol/gDCW
   La^3+ added to biomass equation as 0.01 mmol/gDCW maxfit default
   parameters, frac_min = 0.7
   [90]Open in a new tab

Fae phylogeny, construction of mutant strains, and phenotyping

Phylogeny

   More than 560 formaldehyde-activating enzymes (Fae) and Fae-like
   proteins with some homology to the Fae protein from M. alcaliphilum
   20Z^R (Gene ID: 2540616361; MEALZ_RS11875) were identified in the
   UniProt database. Of the 560 sequences, 78 were selected for
   phylogenetic comparison. The sequences represent 39 species—3 archaea
   and 36 bacteria, including planctomycetes, firmicutes, actinobacteria,
   and proteobacteria, and a representative from the NC10 phylum. The
   evolutionary history was inferred by using the maximum likelihood
   method and Le and Gascuel model ([91]31). The tree with the highest log
   likelihood (−12,851.33) is shown. Initial trees for the heuristic
   search were obtained automatically by applying Neighbor-Join and BioNJ
   algorithms to a matrix of pairwise distances estimated using the JTT
   model and then selecting the topology with superior log likelihood
   value. A discrete gamma distribution was used to model evolutionary
   rate differences among sites (five categories [+G, parameter =
   1.2236]). Evolutionary analyses were conducted in MEGA X ([92]32,
   [93]33).

Complementation tests

   The schematic for plasmid construction and integration of 20Z^Rfae into
   Methylobacterium extorquens AM1Δfae1 is presented in [94]Fig. S1. To
   construct fae-expression vectors, the fae genes from 20Z^R and
   pAWP78(P[89]) plasmid were amplified by PCR using Flash Phusion master
   mix (Thermo Fisher) using M. alcaliphilum DNA. The fae genes were then
   integrated into the pAWP78 vector under P[89] promoter by Gibson
   assembly using NEBuilder (NEB Labs) and transformed into E. coli
   DH10B-competent cells. Promoter P[89] is a truncated hybrid version of
   the P[tac] promoter, with a fragment of the P[trp], lac-operator, and a
   mxaF ribosome binding site (AGGAAA). The promoter is known to provide
   constitutive expression in methylotrophic bacteria ([95]34). The
   correct assemblies of the genetic constructs were validated by
   sequencing. The vectors were subcloned into E. coli S17-1. The plasmids
   were then transferred via conjugation into an M. extorquens AM1Δfae1
   mutant via bi-parental mating. pAWP78 empty vector was cloned into
   wild-type M. extorquens AM1 and the mutant strain AM1Δfae1 to construct
   control variants ([96]Table 3). Lawns of M. extorquens AM1Δfae1
   supplemented with succinate (1–2 day growth) and lawns of pAWP78-
   P[89]-20Z^Rfae1 in E. coli S17-1 (overnight) were prepared for mating
   on plates with Hypho media supplemented with 5% Nutrient Broth (NB
   Difco, BD Life Sciences). Plates were incubated for 1 day at 30°C. The
   biomass was collected and spread onto Hypho-agar plates supplemented
   with succinate and kanamycin for selection. Kanamycin-resistant clones
   were observed after 3–4 days of incubation. Clones were transferred
   into Hypho-agar plates with kanamycin and rifamycin for E. coli counter
   selection.

TABLE 3.

   Strains used in this study
   Strain Description Reference/source
   AM1 WT Methylobacterium extorquens AM1 wild type Dr. N. C.
   Martinez-Gomes (UC Berkeley)
   AM1Δfae1 M. extorquens AM1 lacking fae1 gene Dr. N. C. Martinez-Gomes
   (UC Berkeley) ([97]35)
   AM1-WT- EV M. extorquens AM1 wild type harboring pAWP78 empty vector.
   Kan^R This study
   AM1Δfae1-EV M. extorquens AM1Δfae1 harboring pAWP78 empty vector. Kan^R
   This study
   AM1Δfae:20Z^Rfae M. extorquens AM1Δfae harboring
   pAWP79-P[89]-20Z^Rfae1. Kan^R This study
   AM1Δfae:20Z^Rfae1-2 M. extorquens AM1Δfae harboring
   pAWP78-P[89]-20Z^Rfae1-2. Kan^R This study
   AM1Δfae:20Z^Rfae3 M. extorquens AM1Δfae harboring
   pAWP78-P[89]-20Z^Rfae3. Kan^R This study
   20Z^R Methylotuvimicrobium alcaliphilum 20Z^R WT, selected
   rifamycin-resistant variant Kalyuzhnaya Lab (SDSU)
   20Z^RΔfae1 M. alcaliphilum 20Z^R lacking canonical fae gene This study
   20Z^RΔfae1-2 M. alcaliphilum 20Z^R lacking fae1-2 gene, a homolog of
   fae This study
   20Z^RΔfae3 M. alcaliphilum 20Z^R lacking fae3 gene This study
   20Z^RΔfdhAB M. alcaliphilum 20Z^R lacking canonical fae gene This study
   DH10B Chemically competent E. coli cells, used for gene cloning NEB
   Labs ([98]https://www.neb.com)
   S17-1 Chemically competent E. coli cells, used for conjugation NEB Labs
   ([99]https://www.neb.com)
   [100]Open in a new tab

   The M. extorquens AM1Δfae1-mutant complementation tests were conducted
   using (i) control: Hypho medium supplemented with succinate (0.4%);
   (ii) sensitivity test: Hypho medium supplemented with succinate (0.4%)
   and methanol (0.2%); (iii) growth test: Hypho medium supplemented with
   methanol (0.2%). Cultures were incubated at 30°C and evaluated after
   1–2 days (succinate plates) and after 3–4 days (methanol or methanol +
   succinate plates).

   Liquid batch culture experiments were carried out in flasks containing
   25 mL of Hypho and supplemented with succinate (0.4%) and/or methanol
   (0.2%). At least three biological replicates with two technical
   replicates each per culture were tested. Each biological replicate was
   inoculated from a single colony. Growth measurements were taken
   starting from OD 0.1 until reaching the stationary phase and were used
   to calculate growth rates, and standard deviations were calculated to
   produce a graphic representation of the acquired data.

Mutagenesis

   All mutants were constructed using the allelic exchange vector pCM433
   ([101]36). Upstream and downstream flanking regions of fae genes or
   fdhAB-gene cluster (MEALZ_1882-MEALZ_1883) genes from M. alcaliphilum
   20Z^R were amplified using 2× Fusion Phlash (Thermo Fisher). The pCM433
   vector was amplified in the same way using pCM433-F and pCM433-R
   primers ([102]25). The PCR products were run on a gel and purified. The
   flanking regions were integrated into pCM433 vector using a NEBuilder
   HiFi DNA assembly master mix. Reactions were transformed into E. coli
   DH10B-competent cells. Kanamycin-resistant colonies were selected.
   Plasmids were purified using Zymoclean Gel DNA recovery kit (Zymo
   Research, USA). The constructs were verified by PCR, and sequences were
   confirmed by sequencing. The resulted plasmids were transformed into E.
   coli S17-1 and mated with M. alcaliphilum 20Z^R. Kanamycin-resistant
   colonies were subjected for negative selection on sucrose. The genotype
   of resulted clones was confirmed by PCR and PCR-fragment sequencing.
   All constructed strains are listed in [103]Table 3.

RESULTS

   Below, we summarize the main steps of the reconstruction and analysis
   of a CS GSM ([104]Fig. 1) using the previously developed GSM model of
   M. alcaliphilum 20Z^R ([105]21) and the RIPTiDe algorithm ([106]14).

Fig 1.

   [107]Gene-to-metabolite mapping depicts methane metabolism to biomass
   via formaldehyde pathways. Flux analysis visualizes context-specific
   gene models with RIPTiDe and CobraPy. Metabolic flux maps highlight
   pathway activity with colored intensity scales.
   [108]Open in a new tab

   A schematic representation of the developed pipeline to reconstruct and
   analyze the CS GSM model for M. alcaliphilum 20Z^R. At the first step,
   the original model was extended using COBRApy ([109]12) to account for
   the presence in the medium and transport into the cell of some metals
   (see Materials and Methods). Thereafter, the transcriptomics data for
   different growth conditions (or contexts) were integrated into the
   extended model to reconstruct a number of CS models using the RIPTiDe
   tool ([110]14) based on the GPR presented in the model. Finally, flux
   balance analysis (FBA) was conducted in COBRApy, and predicted
   distributions of fluxes were displayed on the metabolic maps using the
   Escher web tool ([111]13). All steps of the pipeline are performed in
   the corresponding Jupyter Notebook (see Data availability statement).

Construction of CS-GSM model for aerobic methanotroph M. alcaliphilum 20Z^R

   To build a workflow for the CS modeling with GSM and available -omics
   data, we applied Jupyter Notebook ([112]37, [113]38). The computational
   platform included Python-based tools such as COBRApy ([114]12),
   EscherPy ([115]13), and RIPTiDe ([116]14) to simplify the process of
   handling GSM models.

   To reconstruct CS-GSMs for M. alcaliphilum 20Z^R, the original GSM
   model was extended. Specifically, the Ca^2+ transport reactions were
   added as an environmental control of the enzymatic reaction catalyzed
   by Ca^2+-dependent methanol dehydrogenase (Ca-MDH, mxaFI) ([117]Fig.
   2A). Similarly, the transport reaction for La^3+ was added and linked
   with the La-dependent methanol dehydrogenase (La-MDH, xoxF) reaction
   ([118]Fig. 2B). Since it has been shown that La-MDHs can oxidize
   methanol to formate ([119]39), the corresponding reaction was added in
   addition to the methanol to formaldehyde conversion. These
   modifications enabled tunable control of the methanol conversion steps
   depending on the cultivation conditions. This modification also allows
   control of the methanol flux via Ca-MDH or La-MDH according to their
   gene expression levels. Additionally, a transport reaction for tungsten
   (W) was added and integrated with the reaction catalyzed by W-dependent
   formate dehydrogenase ([120]Fig. 2C).

Fig 2.

   [121]Metabolic pathway diagrams illustrate modifications for Ca²⁺,
   La³⁺, and W⁴⁺ transport reactions. Arrows denote import, export, and
   transport processes through specific pathways, with key nodes
   representing reaction intermediates and transporters.
   [122]Open in a new tab

   Visualization of the reactions that were added to the model for calcium
   (A), lanthanum (B), and tungsten (C) via Escher tool ([123]13). (A)
   Modified Ca^2+ transport reactions. (B) Added exchange and transport
   reactions for La^3+. (C) Added exchange and transport reactions for
   W^4+.

   The growth rates predicted by the iIA409 and optimized CS models were
   similar, but both were below the experimentally observed rates for
   Ca-growth conditions ([124]Table S2). However, the predicted O[2]:CH[4]
   ratios were similar to the experimental measurements (1.22 and 1.273
   for iIA409 and CS model, respectively, versus 1.12 ± 0.09 obtained for
   chemostat cultures).

   Similarly to the original model, the predicted O[2] consumption
   reaction rate was significantly higher than experimental measurements
   for the La-growth conditions too (see details in [125]Table S2). The
   introduction of O[2] uptake constraints to the CS model tuned the in
   silico prediction toward measured parameters. Despite similar outputs,
   some differences between the two types of M. alcaliphilum genome-scale
   metabolic models were observed, in particular, differences in predicted
   growth rates, suggesting variations in carbon flux distributions. To
   investigate this further, we carried out a comparative analysis of the
   fluxes and applied network visualization (Escher tool) to highlight the
   most significant divergences.

CS-GMS highlights unresolved carbon distribution during growth in the
presence of lanthanides

   The Ca- and La-growth conditions were investigated separately. Detailed
   Escher metabolic maps are presented in [126]Fig. S1.2 and 1.3 for
   Ca-growth condition and in [127]Fig. S1.5 and 1.6 for La growth
   condition. The CS model predicts a slightly higher activity of the
   H4MPT pathway (from 1.07 to 1.25 mmol/gDCW/hour compared to the
   original model) and activation of the uphill electron transport
   mechanism unlike the original model for Ca-growth ([128]Fig. S1.2 to
   1.4).

   The flux distribution predicted by the CS model for growth with La
   differed more substantially from that predicted by the original model.
   To quantitatively identify metabolic differences between the original
   model and the CS model, the metabolic map was constructed using DRFs
   (see Materials and Methods). The constructed map for La growth
   conditions showed differences in the value of fluxes of reactions
   related to the RuMP pathway: reactions associated with
   ribulose-5-phosphate regeneration are less active in the CS model
   simulation (decreased flux through transaldolase and transketolase
   reactions [TKT1 and TKT2] by 1.37 times, from 1.28 to 0.933 and 1.31 to
   0.952 mmol/gDCW/hour, respectively). A similar trend was observed for
   pentose-5-phosphate 3-epimerase reaction from 2.59 to 1.88
   mmol/gDCW/hour. Carbon flux via the EMP and EDD pathways also decreased
   compared to the original model predictions by 1.37 times. The decreased
   flux of CO[2] production reaction (by 1.67 times, from 3.55 to 2
   mmol/gDCW/hour) was accompanied by lower CO[2] production via the
   formate dehydrogenase (FDH) reaction by 1.9 times (from 2.95 to 1.56
   mmol/gDCW/hour), similar to the Ca-growth condition but to a higher
   degree ([129]Fig. S1.5 to 1.7).

   At the same time, the carbon flow via the C1 transfer pathways was
   increased. The high flux via the H[4]MTP pathway can be linked with a
   very high-level expression of fae1-2, a homolog of canonical FAE, also
   known as 5,6,7,8-tetrahydromethanopterin hydro-lyase ([130]35). fae1-2
   expression was more than four times higher during La-supplemented
   growth, while the expression of other genes in the H[4]MPT pathway did
   not significantly change ([131]Fig. S1.8; [132]Table 4). Furthermore, a
   twofold reduction was observed for the canonical FAE1. The differential
   expression of the fae genes between Ca and La growth conditions has
   been previously noted, and while the alternative metabolic function for
   fae1-2 has been speculated, it remains unresolved ([133]21).

TABLE 4.

   Comparison of transcription level for FAE genes (fae1, fae1-2, and
   fae3) between Ca- and La-present conditions
   Gene ID Number of normalized counts, +Ca Number of normalized counts,
   +La log2fold
   change P [adj]
   MEALZ_RS11875
   (fae1) 22,233 21,377 14,086 13,351 −0.431 0.001
   MEALZ_RS04100 (fae1-2) 2,560 1,834 40,690 24,419 4.094 7.01E^−17
   MEALZ_RS07105 (fae3) 6,392 5,327 3,519 3,245 −0.55 0.00026
   [134]Open in a new tab

   To evaluate the fae homolog’s contribution to carbon flux deterring
   from the RuMP pathway toward the primary oxidation pathway, we
   reconstructed the CS model, which assumes different roles for the
   Fae1-2, from its contribution to Fae1 function to the carbon flow into
   the H[4]F pathway by adding the reaction FAEii, which is the
   condensation reaction of formaldehyde with tetrahydrofolate and
   considering different ratios of formate and formaldehyde produced by
   La-MDH (xoxF) ([135]Fig. 3; [136]Table 5).

Fig 3.

   [137]Metabolic flux maps depict methane conversion to formaldehyde via
   Cu-pMMO and subsequent pathways to acetyl-CoA through the RuMP cycle,
   serine cycle, and TCA cycle. Arrows indicate flux intensity,
   color-coded by mmol/gDCW/hour.
   [138]Open in a new tab

   The schematic representation of the distribution of fluxes predicted
   (A) by original iIA409 model for growth on CH[4] in the presence of La,
   W, and Cu with FAEii in H4MTP mode and (B) by CS model based on
   transcriptomic count data for growth on CH[4] in the presence of La, W,
   and Cu with FAEii in H4Folate mode. The color indicates the extent of
   the flux rate through the reaction in mmol/gDCW/hour. Cu-pMMO, a
   reaction catalyzed by Cu-dependent pMMO; La-MDH and Ca-MDH, reactions
   with activities of two MDH isoforms (La- and Ca-dependent,
   respectively); W-FDH, the enzymatic reaction catalyzed by W-dependent
   FDH. FAEii, reaction that describes the potential role of fae1-2 in the
   condensation of formaldehyde and tetrahydrofolate.

TABLE 5.

   Predicted flux via the cofactor (H[4]F or H[4]MTP) mediated C[1]
   transfer for growth with La
   Scenario MXA-La type Flux through FAEii Model growth rate (hour^−1)
   Model O[2]:CH[4] ratio Experimental growth rate (hour^−1) Experimental
   O[2]:CH[4] ratio
   iIA409 H4MTP MXALa:MXALa_for
   = 0.75:0.25[139]^a 1.29 0.74 1.3 0.07 1.28 ± 0.01
   MXALa:MXALa_for
   = 1.0:0.0[140]^b 2.57 0.0822 1.22
   iIA409 H4F MXALa:MXALa_for 0.75:0.25[141]^a 1.17 0.0763 1.285
   MXALa:MXALa_for
   = 1.0:0.0[142]^b 2.33 0.0869 1.185
   CS H4MTP MXALa:MXALa_for
   = 0.75:0.25[143]^a 2.34 0.0537 1.31
   MXALa:MXALa_for
   = 1.0:0.0[144]^b 2.78 0.0778 1.27
   CS H4F MXALa:MXALa_for
   = 0.75:0.25[145]^a 1.17 0.0763 1.285
   MXALa:MXALa_for
   = 1.0:0.0[146]^b 2.59 0.0657 1.17
   [147]Open in a new tab
   ^^a

   MXALa_for bounded conditions, in which the proportion of formate and
   formaldehyde produced by La-MDH (xoxF) was added manually by
   constriction of flux through MXALa and MXALa_for reactions in
   proportion 0.75:0.25.
   ^^b

   MXALa_for unbounded conditions, in which proportion of formate and
   formaldehyde produced by La-MDH (xoxF) was not manually restricted and
   bounds for MXALa and MXALa_for reactions were the same (0;1,000
   mmol/gDCW) for both.

   We also investigated an opportunity of the reverse function for Fae1-2
   (hydrolysis instead of condensation). The assumption of Fae1-2 role in
   the H[4]Folate pathway provided the best fit to the experimental data
   ([148]Table 5) for both the original and CS models. Furthermore, the CS
   model in this condition or the context predicts O[2] consumption rate
   without any restrictions on its reactions, unlike CS model without
   FAEii reaction. Considering that La-growth contributes to a higher
   formate pool, the essentiality of the Fae1-2 function becomes more
   apparent. To further investigate the prediction, additional targeted
   mutagenesis studies were carried out. First, we investigated the impact
   of the formate dehydrogenase knockout on cell growth with Ca versus La.
   Then, we investigated the phenotypes of fae1-2 knockouts. The
   functional characterization of Fae1-2 was supplemented by phylogenetic
   and complementation studies.

Mutagenesis

   A knockout strain 20Z^R ∆fdhAB was constructed as described in
   Materials and Methods. The growth phenotypes of the mutant were tested
   using growth media containing either La or Ca to investigate the
   impacts of the La-switch (i.e., XoxF methanol dehydrogenase instead of
   Ca-dependent MxaFI). The 20Z^R∆fdh knockouts displayed growth defects
   when compared to the wild type, and the growth reduction was more
   severe during La supplementation ([149]Fig. 4). The growth rate was
   almost fivefold lower in the fdhAB mutant compared to the wild type. No
   significant growth defects were observed in media supplemented with
   methanol (data not shown), suggesting that the FDH is more critical for
   methane utilization.

Fig 4.

   Left graph depicts OD600 growth curves for 20ZR-WT and 20ZRΔFDH strains
   on Ca and La over time. Right bar chart compares growth rates,
   highlighting significantly higher rates for WT strains on CH₄. Error
   bars represent standard deviations.
   [150]Open in a new tab

   Growth parameters of M. alcaliphilum 20Z^R WT and 20Z^R∆fdhAB grown
   with methane using growth media supplemented with Ca or La.

   We also constructed M. alcaliphilum 20Z^R mutants lacking fae1-2 or
   fae3; however, the mutants did not display any growth defects (data not
   shown). We were not able to generate fae1 knockouts, which indicates
   its essentiality for the growth.

Complementation tests

   In order to investigate the functional activity of the Fae1 and Fae1-2
   homologs, complementation studies were carried out. M. extorquens
   AM1Δfae1, the null mutant strain, does not have the ability to use
   methanol as a carbon source and is highly sensitive to methanol
   ([151]35). The strain AM1 has been used for investigating the function
   of Fae-like sequences obtained from pure cultures as well as
   metagenomic sequences ([152]40, [153]41). In this study, three
   plasmids, harboring fae1 (MEALZ_RS11875), fae1-2 (MEALZ_RS04100), and
   fae3 (MEALZ_RS07105) homologs from M. alcaliphilum 20Z^R, were
   constructed and integrated into M. extorquens AM1Δfae1. All strains
   have the ability to grow on succinate ([154]Fig. 5A and B). When tested
   on solid media, both the sensitivity test and the growth test revealed
   that the integration of fae1 and fae1-2 homologs into AM1Δfae1 relieved
   methanol sensitivity through complementation. The strain
   AM1Δfae1:pAWP78-P[89]-20Z^Rfae3 displayed similar sensitivity to
   methanol as the parental AM1Δfae1 strain, while both AM1Δfae1:
   pAWP78-P[89]-20Z^Rfae1 and AM1Δfae1:pAWP78-P[89]-20Z^Rfae1-2 became
   resistant to methanol. The fae1 and fae1-2 genes also rescued the
   AM1Δfae1 growth on methanol on solid media.

Fig 5.

   [155]Bar charts and growth curves compare AM1-WT and Δfae strains with
   fae variants for growth rates and OD600 over time under different
   conditions. Growth condition table summarizes strain performance on
   succinate and methanol.
   [156]Open in a new tab

   Growth parameters (growth rate and growth curve) of the M. extorquens
   AM1 wild-type (WT) and AM1Δfae strains expressing fae1 and fae homologs
   on succinate (A and B) and methanol (C and D). (E) Complementation
   results obtained on solid media. *Complementation of the AM1Δfae-mutant
   growth on solid media with methanol is highlighted in gray.

   The growth of the strains was also investigated in liquid culture and
   produced surprisingly different outcome for the Fae1-2 construct. While
   both fae1 and fae1-2 restored growth on methanol on solid plates, only
   fae1 is able to restore the AM1∆fae-mutant phenotype. AM1Δfae:20Zfae1-2
   is able to relieve methanol sensitivity on agar medium but is not able
   to grow in liquid culture supplemented with methanol ([157]Fig. 5C
   through E). AM1Δfae1, AM1Δfae:20Z^Rfae1-2, and AM1Δfae1:20Z^Rfae3 do
   not grow in liquid cultures supplemented with methanol.

Phylogenetic analyses of Fae homologs

   To analyze the functional roles of the fae homologs, and more
   specifically the La-induced fae1-2 gene product, we carried out
   phylogenetic analyses. Fae and Fae-like proteins are homologous across
   different groups of bacteria, such as the firmicutes, actinomycetes,
   planctomycetes proteobacteria, and NC10 phylum, as well as archaea. The
   phylogenetic tree we constructed includes 78 Fae sequences that
   represent different microbial species, including 34 bacterial and 5
   archaeal species. Geoglobus ahangarhi (Gene ID: 2810347888;
   Ga0325130_111788) roots the tree. Five main groups were identified.
   Only sequences from group 1 (bacterial Fae) and group 5 (archaeal Fae)
   have defined cellular function ([158]Fig. 6). Group 1 Fae proteins
   capture formaldehyde generated from primary C[1] oxidation (methanol to
   formaldehyde). Group 5 Fae enzymes contribute to capturing formaldehyde
   produced by a reverse hexulose phosphate synthase activity during the
   conversion of C6 sugars into C5 sugars for nucleotide biosynthesis
   ([159]42, [160]43). Both Group 1 and Group 5 Fae proteins catalyze the
   formation of methylene-H[4]MPT from H[4]MPT and formaldehyde and thus
   serve as the entry point for the formaldehyde oxidation pathway. The
   percent identity shared for archaeal sequences in Group 5 with
   canonical Fae from M. extorquens AM1 (Gene ID: 644811916;
   MexAM1_META1p1766) is between 50% and 51%.

Fig 6.

   [161]Phylogenetic tree categorizes Fae enzymes into five groups based
   on function and taxonomy, including conventional, unknown,
   Gram-positive, and archaeal Fae. Colored arcs represent groups, with
   branch labels detailing species and strains.
   [162]Open in a new tab

   Phylogenetic analysis of Fae homologs in bacteria and archaea. Protein
   sequences of Fae from M. alcaliphilum 20Z^R are highlighted in bold.
   Filled circles on nodes represent >70% bootstrap support. Open circles
   on nodes represent >50% bootstrap support. The tree is drawn to scale,
   with branch lengths measured in the number of substitutions per site.
   This analysis involved 78 amino acid sequences. A total of 452
   positions were used in the final data set.

   The cellular functions of Fae-like proteins from Groups 2, 3, and 4 are
   not known. Suggested functions include regulatory or sensing roles or a
   possible contribution to tetrahydrofolate-linked C-transfer ([163]44,
   [164]45).

DISCUSSION AND CONCLUSIONS

   In this study, we implemented the Python-based pipeline to reconstruct
   the context-specific metabolic network of M. alcaliphilum 20Z^R, a
   model methanotrophic bacterium. The initial goal of the study was to
   streamline metabolic network reconstruction for non-model microbial
   systems with unique metabolic capabilities, i.e., methanotrophy. The
   model reconstruction was followed by flux balance analysis for 20Z^R
   growing on methane with Ca or La supplementation. Considering that a
   great deal of information is available for the strain, the
   reconstructed model was compared to previously expert-curated
   reconstructions (iIA409 model [[165]21]).

   The CS model predictions were comparable to the original iIA409 model
   simulation with a few exceptions. The CS model indicated that reverse
   electron transfer through the cytochrome bc1 complex is essential for
   supporting methane oxidation, whereas the original model suggested a
   more direct redox input from a reduced cytochrome cL, a product of
   methanol oxidation. Therefore, the CS model predicted the necessity of
   the reverse electron transfer systems without requiring additional
   expert-curated constraints, which were necessary for the original model
   ([166]46). It should be mentioned, however, that both models do not
   support the redox model of methane oxidation. The CS model also ruled
   out a possibility of carbon flux from acetyl-P toward pentose phosphate
   pathway intermediates via the phosphoketolase pathway. Finally, CS
   model suggested higher carbon flux via formaldehyde-formate nodes. The
   direct conversion to formate due to XoxF activity and the H[4]MPT
   pathway activation led to the enhanced production of formate compared
   to the value predicted by the original model. The increase in the
   expression of the gene encoding FDH has been confirmed by the analysis
   of DEGs between La and Ca conditions ([167]47). Follow-up studies with
   the formate dehydrogenase knockouts further confirm the essentiality of
   the formate oxidation reaction.

   The CS model predictions for the La-dependent growth condition
   demonstrated a set of metabolic rearrangements in comparison to the
   flux distributions obtained by the simulation of the original model.
   The key difference in the carbon flux distribution predicted by the CS
   versus the original models is the increased role of the C1 transfer
   pathway. It has been shown previously that La supplementation results
   in the reduction of gene expression for a canonical fae1, while
   inducing its close homolog, named here as fae1-2. We interrogated the
   possible functional role of fae1-2 using the CS model. The model
   highlights a possibility that Fae1-2 is an enzyme that contributes to
   formaldehyde condensation with H[4]folate. Genetic tests further
   indicate that the Fae1-2 function differs substantially from the
   canonical Fae1.

   In summary, we demonstrate that this pipeline can help reconstruct
   metabolic models that are similar to manually curated networks. The
   pipeline for CS model development implemented in the BioUML platform
   can be used and easily tuned for the investigation of metabolic changes
   in other growth conditions of the methanotrophic strain. Furthermore,
   the reconstructed models should be able to highlight previously
   overlooked pathways, thus advancing fundamental knowledge of non-model
   microbial systems and promoting their development toward
   biotechnological or environmental implementations.

Supplementary Material

   Reviewer comments
   [168]reviewer-comments.pdf^ (555.8KB, pdf)

ACKNOWLEDGMENTS