Abstract

   Extensive microdiversity within Prochlorococcus, the most abundant
   marine cyanobacterium, occurs at scales from a single droplet of
   seawater to ocean basins. To interpret the structuring role of
   variations in genetic potential, as well as metabolic and physiological
   acclimation, we developed a mechanistic constraint-based modeling
   framework that incorporates the full suite of genes, proteins,
   metabolic reactions, pigments, and biochemical compositions of 69
   sequenced isolates spanning the Prochlorococcus pangenome. Optimizing
   each strain to the local, observed physical and chemical environment
   along an Atlantic Ocean transect, we predicted variations in
   strain-specific patterns of growth rate, metabolic configuration, and
   physiological state, defining subtle niche subspaces directly
   attributable to differences in their encoded metabolic potential.
   Predicted growth rates covaried with observed ecotype abundances,
   affirming their significance as a measure of fitness and inferring a
   nonlinear density dependence of mortality. Our study demonstrates the
   potential to interpret global-scale ecosystem organization in terms of
   cellular-scale processes.
     __________________________________________________________________

   Resolving the nano-niche: Simulations of metabolism and physiology
   across pangenomes link the molecular and ecosystem scales.

INTRODUCTION

   The cyanobacterium Prochlorococcus is the most abundant photosynthetic
   organism in the ocean, contributing most of the primary production in
   vast regions of the nutrient-starved subtropical gyres ([40]1–[41]3).
   The dominance of Prochlorococcus in these “ocean deserts” has been
   attributed to their small size ([42]4), streamlined genomes ([43]5),
   and their microdiversity ([44]6, [45]7). At the subspecies clade level,
   “ecotypes” of Prochlorococcus partition with depth in the water column
   and along temperature gradients to span the full euphotic layer from
   about 40°N to 40°S ([46]8–[47]11) and exhibit differential activity
   within the total Prochlorococcus assemblage (population, herein)
   ([48]12). To maintain access to a low but variable supply of many
   organic and inorganic nutrient resources, it may be that
   Prochlorococcus shifts the costs of housing a broad metabolic
   repertoire from the individual cell to the population. While a typical
   strain contains fewer than 2000 protein-coding genes, the pangenome of
   Prochlorococcus may carry more than 80,000 protein-coding genes
   ([49]13), a vast library of genetic potential distributed among a
   nonredundant, diverse minority. Extensive metagenomic surveys and
   single-cell sequencing have provided a glimpse into the scope of the
   pangenome ([50]14) and the co-occurrence of many strains ([51]6,
   [52]12), but it is unclear how metabolic and physiological
   microdiversity influences the Prochlorococcus population fitness across
   the ocean.

   Prochlorococcus populations make use of a broad repertoire of
   strategies to respond to changes in nutrient availability and light
   conditions through its genetic diversity ([53]15), photoadaptation and
   photoacclimation ([54]16, [55]17), flexibility in nutrient uptake
   kinetics ([56]18), and changes in elemental stoichiometry ([57]19,
   [58]20). Although much has been learned about the biology of
   Prochlorococcus, its physiology, genetic diversity, and adaptations to
   its spartan lifestyle, the current generation of statistical and
   dynamic models used to interpret and predict Prochloroccus biogeography
   and biogeochemistry ([59]21–[60]23) reflects little of the depth of
   diversity or current mechanistic understanding. Drawing from extensive
   genomic and molecular datasets, we present a model that spans the
   genetic microdiversity of Prochlorococcus and resolves physiological
   and metabolic acclimation to local environments. It can provide a
   framework to simulate and interpret the essential diversity and
   flexibility of this organism and the organization of its populations at
   the ecosystem scale.

RESULTS AND DISCUSSION

The Microbial Simulation Environment

   We developed the Microbial Simulation Environment (MSE;
   [61]http://github.com/jrcasey/mse_AMT; [62]Fig. 1), a pipeline to
   simulate the steady-state growth, metabolism, and physiology of both
   sequenced isolates and in silico strains, each acclimated to a
   particular environment. Briefly, a pangenome-scale metabolic model
   (PanGEM) was reconstructed, drawing from a database of 866,894 protein
   coding sequences across 647 sequenced isolates, metagenome-assembled
   genomes, and single cell–amplified genomes spanning the Prochlorococcus
   phylogeny (DOI:[63]10.5281/zenodo.4477905; data S1). The PanGEM
   consists of 1117 orthologous gene clusters, 1484 reactions, and 1282
   metabolites and was manually curated and annotated with multiple
   bioinformatics, cheminformatics, and systems biology databases and
   tools (Materials and Methods and table S1). On the basis of a power law
   regression ([64]24) of pangenome rarefaction curves generated for both
   homologous genes and orthologous gene clusters annotated in the Kyoto
   Encyclopedia of Genes and Genomes (KEGG) (fig. S1), the genetic
   diversity remains open (α = 0.8) but the functionally annotated
   metabolic subset appears to be closed (α = 1.9), suggesting that much
   of the genetic diversity yet to be cataloged will be associated with
   regulatory, repair, and other nonmetabolic functions. Even so, gap
   filling of 36 reactions (2.4% of all reactions) lacking an annotated
   gene was still necessary for biomass synthesis in the PanGEM
   reconstruction, so additional functional annotation efforts are needed
   to provide evidence for these missing gene-protein-reaction
   associations.

Fig. 1. Schematic of metabolic and physiological acclimation processes
simulated by MSE.

   [65]Fig. 1.
   [66]Open in a new tab

   A single cell is shown at the center with several membrane transporters
   (colored rectangles) and macromolecular pools (graded circles). Four
   key modules are shown in expanded detail, described clockwise. (Top
   left) Metabolic networks differ for each of the 69 strains simulated. A
   multipartite graph of genes, reactions, and metabolites is shown for a
   particular strain, SS120. (Top right) Transporter abundances (n) are
   optimized according to a mechanistic model of substrate transport that
   resolves several different limitation regimes (diffusion limitation,
   growth limitation, porter limitation, and surface area limitation).
   Competition for resources between transporters results in deviations
   from the optimum n^* across the range of ambient substrate
   concentrations S[∞]. (Bottom right) The downwelling irradiance spectrum
   selects the optimal distribution of pigments on the basis of absorption
   (Abs) spectra and the cellular energy demand, within experimental
   constraints. Excitons are shuttled to photosystems I and II from each
   electronic state of each light-harvesting pigment [divinylchlorophyll a
   (DV-Chl a), divinylchlorophyll b (DV-Chl b), and α-carotene (α-Car)].
   Under excess light conditions, the photoprotective pigment zeaxanthin
   (Zeax) acts both to dissipate excess photons from light-harvesting
   pigments to heat and to quench singlet oxygen generated by the
   relaxation of triply excited chlorophylls a and b. (Bottom left)
   Macromolecular compositions are optimized within experimental
   constraints to maximize growth given the availability of nutrient and
   energy resources. Differences between environments in the size of each
   graded circle are intended to reflect changes in the levels of each
   macromolecular pool. Carbs, carbohydrates; dMets, dissolved (free)
   metabolites.

   Using the PanGEM as a template, feasible and stoichiometrically
   balanced strain-specific genome-scale models (GEMs) for each sequenced
   organism (referred to as in silico strains) are generated using a
   compressed sensing (CS)–based approach (PanGEM Toolbox;
   [67]http://github.com/jrcasey/PanGem; Materials and Methods and fig.
   S2) and quality tested using a community standard ([68]25). Before
   simulations, each in silico strain acclimates to a given environment
   (defined by in situ temperature, nutrient concentrations, and light
   spectra) by optimally adjusting the cell size, nutrient transporters,
   pigments, biochemical compositions, and metabolic fluxes, accomplished
   with a sequence of experimentally and empirically constrained bilevel
   flux balance analysis (FBA) optimizations (called OptTrans and PhysOpt;
   Materials and Methods). Metabolic rates are then adjusted for
   strain-specific temperature dependence, predicted from proteome
   sequences with an amended machine learning algorithm [based on
   ([69]26)] (Materials and Methods). Together, PanGEM Toolbox and MSE are
   intended to be suitably flexible for implementation in other microbial
   systems and are documented to guide users through reconstruction and
   simulation.

   MSE simulations were implemented for 69 sequenced isolates representing
   all five major ecotypes of Prochlorococcus (data S1) on the Atlantic
   Meridional Transect cruise (AMT-13) in late summer of 2003. A suite of
   66 physical, chemical, and biological measurements from satellite
   optics, shipboard sensors, and discrete samples were interpolated at
   10-m depth intervals from 10 to 200 m at 32 stations from 48°N to 45°S,
   passing through the Senegalo-Mauritanian upwelling region ([70]Fig. 2A
   and data S2). A subset of variables including temperature, solar
   irradiance, and nutrient concentrations were used as inputs to each
   simulation, while other variables were used for context and validation
   (figs. S3 to S5).

Fig. 2. Summary of AMT-13 cruise simulation.

   [71]Fig. 2.
   [72]Open in a new tab

   (A) Cruise track of the AMT-13 expedition during the fall of 2003. (B)
   Meridional sections of observed temperature (white contours),
   photosynthetically active radiation PAR; (yellow contours,
   logarithmically spaced), and dissolved inorganic nitrogen (DIN;
   heatmap). (C) Meridional sections of observed Prochlorococcus
   abundances (white contours, logarithmically spaced) and predicted
   population growth rates (heatmap). (D) Observed Prochlorococcus
   population abundance plotted against predicted population growth rates
   (black markers). Red and blue markers correspond to previously
   published datasets. (E to I) Meridional sections of measured
   Prochlorococcus ecotype abundances (white contours, logarithmically
   spaced) and predicted growth rates (heatmaps). Although the LLII/III
   ecotype was nearly absent from the entire transect, Johnson et al.
   ([73]9) note that quantitative polymerase chain reaction primers for
   this ecotype may have been contaminated.

   Leveraging a synoptic dataset of Prochlorococcus ecotype abundances
   along the transect ([74]9), predicted growth rates, metabolic rates,
   and biochemical compositions at the strain level were scaled up to
   populations. By assuming that each ecotype could be represented by its
   fittest strain in a particular sample location, the rates and
   compositions associated with each ecotype were weighted by their
   observed in situ abundances (Materials and Methods) to directly compare
   against measured, volumetric quantities.

Prochlorococcus population ecology and biogeochemistry

   The predicted vertical structure in total Prochlorococcus population
   growth rates ([75]Fig. 2C and fig. S6) was consistent with observed
   abundance profiles ([76]2, [77]27), generally declining with depth and
   light intensity and in the top few meters because of photoinhibition
   and nutrient scarcity. While our approach considers only bottom-up
   controls, there was a remarkable correspondence at the population level
   between predicted growth rates and observed abundances throughout the
   transect ([78]Fig. 2D). Overall, a nonlinear relationship emerged
   between observed abundances and predicted growth rates; a power law (of
   the form
   [MATH: <mrow><mrow><mfrac><mi mathvariant="italic">dN</mi><mi
   mathvariant="italic">dt</mi></mfrac><mo>=</mo><mi
   mathvariant="normal">μ</mi><mi>B</mi><mo>−</mo><mi>m</mi><msup><mi>B</m
   i><mi>x</mi></msup></mrow></mrow> :MATH]
   ) provided a better fit (R^2 = 0.79; df = 638) than a linear form (R^2
   = 0.40), suggesting a nonlinear density dependence of mortality. Direct
   taxon-specific in situ phytoplankton growth rate measurements are
   scarce ([79]1, [80]2, [81]27) but follow a qualitatively similar
   pattern ([82]Fig. 2D).

   Within the total Prochlorococcus population, the predicted growth rates
   of each ecotype showed distinct environmental niches that corresponded
   to observed abundances ([83]Fig. 2, E to I; shading shows growth rate
   and contours show observed population density). This correspondence
   supports the notion that growth rate is important for relative fitness.
   Generally, growth of HLII ecotype strains outpaced others in the
   warmest environments where inorganic nitrogen concentrations were
   lowest and light was highest, while HLI strains were fittest in the
   midlatitudes. LLI and LLII/III ecotypes overlapped in their optimal
   niche space around the deep chlorophyll maximum depth, below which
   growth rates were typically highest for strains belonging to the
   larger, higher–gene content ecotype LLIV.

   At high latitudes, the temperatures, nutrient concentrations, and light
   levels near the surface correspond to those typical of the deep
   chlorophyll maximum in the central gyres, resulting in an outcropping
   of low light–adapted ecotypes. In the northern hemisphere, LLI was the
   dominant ecotype at the surface, both in our simulations of growth
   rates and in observed abundance. Breaking the general trend of
   correspondence, we predicted high growth rates of low light–adapted
   ecotypes in the southernmost stations of the transect where the
   observed population density was low. This might reflect a dominant
   top-down control of populations in that region ([84]28) that cannot be
   captured by the bottom-up growth model. It might also be noted that the
   AMT-13 transect was conducted during the late boreal summer and austral
   winter, and mixed layer depths at stations south of 22°S were greater
   than 140 m (based on the Δσ[t] = 0.125 kg m^−3 criterion). Hence,
   potentially important light history dynamics associated with deep
   convective mixing may be improperly captured by our steady-state model.

   At stations where these direct comparisons were possible, the
   contribution of Prochlorococcus to total depth-integrated gross oxygen
   production (fig. S7) followed an expected trend, contributing less (15
   ± 14%) in high-chlorophyll regions (>15 mg chlorophyll a m^−2) and more
   (52 ± 31%) in the oligotrophic regions, in line with previous reports
   ([85]1, [86]3). Although a direct comparison was not possible, a
   subsurface maximum in the f-ratio (the proportion of nitrogen uptake
   supplied by nitrate) ([87]29) was predicted along the top of the deep
   chlorophyll maximum near the equator and lower subtropics (fig. S8), a
   hypothesis with implications for the contribution of Prochlorococcus to
   the biological carbon pump and that could conceivably be tested in situ
   ([88]30).

   The elemental composition and energy content of phytoplankton constrain
   marine food webs and the cycles of carbon, nitrogen, phosphorus, and
   other bioelements ([89]31). Predicted carbon-to-nitrogen molar ratios
   of Prochlorococcus populations varied from 5.4 to 6.8 along the
   meridional transect (fig. S9), with higher values associated with
   deeper or higher-latitude samples. This range and pattern is consistent
   with previous measurements of Prochlorococcus elemental stoichiometry
   in the Sargasso Sea (5.4 to 7.4) ([90]20). The energy available for
   consumers (here quantified by the enthalpy of combustion of biomass)
   also varied with light (68% of the predicted variance), with dry
   biomass spanning the range from 27.4 KJ gDW^−1 near the surface to 30.3
   KJ gDW^−1 below the euphotic depth (fig. S10). Population elemental
   stoichiometry and energy content reflect both the strain composition of
   each population and the phenotypic states of those strains that, in
   turn, reflect a great many underlying physiological processes,
   including alterations to macromolecular compositions and
   photoacclimation.

Physiological and metabolic acclimation

   Departures in environmental variables from ideal laboratory growth
   conditions were reflected in changes to optimal cell physiology and
   metabolic flux distributions. Acclimation strategies involved altering
   uptake kinetics through changes in transporter abundances and cell
   size, altering elemental stoichiometry and growth yields through
   changes in macromolecular compositions, and altering photosynthetic
   performance through changes in pigmentation and electron flow pathways.
   The effect of allowing for physiological acclimation can be appreciated
   by monitoring phenotypic changes that give rise to fitness gains.
   Compared with strains prevented from acclimating beyond their
   nutrient-replete batch culture phenotype, transporter acclimation
   increased growth rates by 43 ± 26% and macromolecular and
   photoacclimation provided further growth rate gains of 47 ± 25% (fig.
   S11). Especially under low-light conditions, physiological acclimation
   was necessary to acquire sufficient reductant to exceed nongrowth
   associated maintenance adenosine triphosphate (ATP) demands. For
   example, the threshold for net positive growth of a high light–adapted
   strain (MIT 9312) acclimated to ideal laboratory conditions required a
   minimum flux of 17.7 μmol photons m^−2 s^−1 but was able to acclimate
   to as low as 2.1 μmol photons m^−2 s^−1. Photoacclimation may therefore
   provide an explanation for HLII cells found deeper in the water column
   ([91]9, [92]10) than might be expected from laboratory isolates
   ([93]17). Thus, in addition to providing fitness gains within the
   feasible growth niche of each strain, physiological acclimation
   effectively extends the depth horizon and geographic range for growth.

   The abundances of key nutrient transporters (e.g., ammonia, nitrite,
   nitrate, and phosphate) are regulated by a trade-off between synthesis
   costs, available space on the membrane, and uptake rates. OptTrans
   explicitly accounts for these trade-offs using a mechanistic model that
   combines quantitative proteomics, molecular modeling, and FBA to
   predict the optimal abundance of each transporter and the optimal cell
   size (Materials and Methods) ([94]32). For any modeled strain, optimal
   cell sizes increased in proportion with external resource
   concentrations, but average population cell sizes (fig. S12) were
   driven more by community composition than by acclimation processes
   alone, indicating the importance of cell size as an adaptive trait. Our
   model explicitly differentiates states of nutrient limitation:
   Diffusion limitation occurs when external substrate concentrations
   limit the flux toward the cell below demands, and membrane surface area
   limitation occurs when the space afforded to transporters is exhausted,
   resulting in suboptimal uptake. When external substrates exceed a
   critical concentration S*, uptake rates are held at the level demanded
   by some other limiting resource (e.g., another limiting nutrient or the
   maximum growth rate). As expected, diffusion limitation was prevalent
   for nitrite and nitrate transport in the lower latitudes above the
   nutricline, where concentrations were routinely in the low-nanomolar
   range. Observed ammonium concentrations were predominantly just below
   the predicted S* (fig. S13), and transport was often limited by
   membrane surface area. It is possible that additional nitrogen sources
   that were not resolved in our simulation, especially urea and amino
   acids, supply the remaining deficit in nitrogen demanded by maximal
   growth rates; however, the competition for membrane space among
   multiple transporters would be even more prevalent. Phosphate
   concentrations were mostly above S^* throughout the transect, with the
   notable exception of the surface layer between 20°N and 30°N where
   concentrations approached a diffusion-limited regime, a region known to
   exhibit seasonal phosphorus limitation ([95]33).

   Among strains capable of transporting and assimilating all three,
   nitrogen was preferentially taken up in the order as follows: ammonia,
   nitrite, and nitrate. As a consequence of the high energy costs
   associated with nitrate and nitrite reduction, preference for reduced
   nitrogen was most pronounced in low-light environments, where a
   sensitivity analysis (Materials and Methods) indicated energy
   limitation, although there is also a small effect (in the same order)
   on encounter rates due to differences in molecular diffusivity. Several
   low light–adapted strains acclimated to growth in these low-light,
   high-inorganic nitrogen environments were further able to reduce the
   energy demands of nitrogen assimilation by redirecting a substantial
   portion (up to 48%) of the flux from the ATP-dependent glutamine
   synthetase–glutamate synthase complex (glutamate aminating) to the
   reduced form of nicotinamide adenine dinucleotide phosphate
   (NADP^+)–dependent glutamate dehydrogenase (GDH; 2-oxoglutarate
   aminating). The auxiliary role of GDH in nitrogen assimilation under
   low-light, high-nitrogen conditions has been recognized in the model
   freshwater cyanobacterium Synechocystis ([96]34) but not in
   Prochlorococcus ([97]35).

   Under resource-limiting conditions, growth rates can be further
   increased through the rebalancing of macromolecular pools to reduce the
   quota of that resource, thereby increasing yields. Particularly, in
   regions where inorganic nitrogen sources were scarce (e.g., in the
   upper water column of the subtropical gyres), protein content was
   reduced, leaving the remaining cellular mass to be partitioned
   predominantly between either carbohydrate or lipid stores. Because the
   energy requirements for de novo synthesis of lipids is considerably
   higher than of carbohydrates, allocating C to lipids was favored near
   the surface, while carbohydrates were relatively enriched just above
   the deep chlorophyll maximum depth interval. This solution points to
   lipid storage as an auxiliary electron sink to cope with high-light,
   low-nitrogen conditions. The resulting changes in the energy content of
   biomass can therefore be decoupled from synoptic changes in cellular
   C:N ratios, a prediction with implications for food web dynamics and
   trophic transfer efficiency ([98]36). The PhysOpt algorithm also
   predicted subtle changes to other macromolecular pools, including
   reductions in P-containing compounds (found in RNA, in cell
   wall–associated lipopolysaccharides, and in other metabolites,
   vitamins, and cofactors) in the upper water column near 26°N where
   phosphate was most depleted (≤20 nM).

   Changes in photophysiology and photosynthetic performance were
   predicted across latitudinal and vertical light gradients. In regions
   with high light and low nitrogen, the photoprotective pigment
   zeaxanthin increased while light-harvesting pigments α-carotene and
   divinylchlorophylls a and b decreased, providing a mechanism for
   nonphotochemical quenching of excess photons to be dissipated as heat.
   α-Carotene contributed a modest 22 ± 9% of total light-harvesting
   pigment absorption across all strains but appears to be somewhat
   favored by the HLII ecotype (28 ± 7%). These predictions are consistent
   with observations of Prochlorococcus isolates grown under variable
   light intensity and quality ([99]37, [100]38). Spectral tuning also
   influenced optimal pigmentation; the gradual shift in downwelling
   irradiance spectra favors the de novo synthesis of divinylchlorophyll b
   over divinychlorophyll a in deeper samples, a well-documented
   phenomenon in naturally occurring Prochlorococcus populations ([101]17,
   [102]39, [103]40) and phytoplankton more broadly (fig. S14) ([104]41).

   Strain-specific differences in key photosynthesis parameters were
   derived from modeled photosynthesis versus irradiance curves at each
   station. The predicted meridional variation in the initial slope (α),
   assimilation number (
   [MATH:
   <mrow><mrow><msubsup><mi>P</mi><mtext>max</mtext><mi>B</mi></msubsup></
   mrow></mrow> :MATH]
   ), and the photosynthetic saturation parameter (E[k]) was remarkably
   consistent with both in situ observations ([105]42) and laboratory
   experiments ([106]Fig. 3) ([107]17). The quantum yield (QY) of
   photosynthesis, defined here as the moles of carbon fixed by ribulose
   bisphosphate carboxylase per mole of photons absorbed by the
   light-harvesting pigments, is a holistic measure of photophysiological
   states. In general, QY approached (but did not reach) a theoretical
   limit of 0.125 mol C (mol photons)^−1 and declined because of the
   temperature dependence of biosynthetic rates at lower irradiances that
   covaried with in situ temperatures ([108]Fig. 3). Subsurface maxima in
   the QY clustered primarily by ecotype, with overlapping optima from
   high to low light in the order HLII, HLI, LLI, LLII/III, and LLIV. The
   same general order was found for predicted optimal growth temperatures
   (OGTs) (data S4 and fig. S15), perhaps because of the covariation of
   temperature and light. Notably, this same order describes the
   phylogenetic evolution of the Prochlorococcus lineage, with
   deep-branching LLIV strains occupying the lower euphotic zone and more
   recently diverged ecotypes expanding their niche range toward the sea
   surface ([109]15).

Fig. 3. Photoacclimation and photosynthesis.

   [110]Fig. 3.
   [111]Open in a new tab

   (A) Predicted QY of photosynthesis (moles C fixed by ribulose
   1,5-bisphosphate carboxylase per mole photons absorbed by
   light-harvesting pigments) over the range of in situ irradiance levels
   for all strains (small markers), ecotypes (large markers), and the
   total population (large black markers) across all samples. (B)
   Predicted photosynthetic rate normalized to the predicted chlorophyll
   against irradiance for all strains, ecotypes, and the total population
   across all samples. Box-and-whisker plots of meridional variations in
   (C) the assimilation number (
   [MATH:
   <mrow><mrow><msubsup><mi>P</mi><mi>m</mi><mi>B</mi></msubsup></mrow></m
   row> :MATH]
   ) and (D) the photosynthetic saturation irradiance (E[k]), derived from
   least-squares regression of predicted photosynthesis-versus-irradiance
   curves at each station.

   Several alternative electron flow pathways were active under
   high-light, low-nitrogen conditions, including the Mehler reaction,
   pseudo-cyclic electron flow around photosystem I, and ferredoxin-NADP^+
   reductase. In some strains, excess reductant is dissipated by the
   plastoquinone terminal oxidase (PTOX) reaction, while in other strains
   lacking PTOX, the generation of singlet oxygen and other free oxygen
   radicals is unavoidable, requiring glutathione or thioredoxin
   peroxidases. Partial recovery of the photorespiratory by-product
   2-phosphoglycolate was only possible for a subset of the LLIV strains,
   while other ecotypes are known to excrete substantial amounts of the
   hydroxy acid glycolate ([112]43). Although a comprehensive analysis of
   all strains across the transect was not possible, flux variability
   analysis (Materials and Methods) ([113]44) indicated that glycolate
   excretion was favored by strains lacking a complete photorespiratory
   pathway in high-light, rather than low-light, conditions, supporting
   the notion of photorespiration as an additional alternative electron
   flow pathway ([114]45, [115]46).

Metabolic microdiversity and the “nano-niche”

   A survey of growth capabilities and pathway enrichment analysis among
   the core and accessory reaction subsets of the Prochlorococcus
   pangenome revealed the commonality of metabolic features at different
   taxonomic resolutions. The core subset, reactions common to all
   strains, included much of the photosynthetic, respiratory, and central
   carbon metabolism pathways, as well as the anabolic pathways for amino
   acids, lipids, light-harvesting pigments, vitamins, and cofactors. The
   accessory subset, reactions present in only some strains, was more
   widely distributed among pathways, especially those that enable access
   to nutrient sources in specific environments (fig. S15). For example,
   accessory genes associated with the LLIV clade included an enrichment
   in chitin-degrading and other polysaccharide-degrading enzymes and
   transporters, resources that are expected to supplement their
   energy-limited niche below the deep chlorophyll maximum depth. While
   our metabolic networks resolve the diversity of mixotrophic potential
   across the Prochlorococcus pangenome, comprehensive determinations of
   compound-specific dissolved organic matter concentrations in the marine
   environment remains an analytical challenge. Although exogenous organic
   substrate uptake was not included in our AMT simulations,
   ecotype-specific patterns were evident from an analysis of growth
   capabilities on various sole N, P, and S sources. For example, LLI and
   LLIV strains were universally able to grow on nitrite as a sole N
   source, while only four strains from other ecotypes shared that
   capability (fig. S16).

   While some metabolic strategies could be generalized at the ecotype
   level, even closely related strains with only subtle differences in
   their metabolic networks occupied different niches in physical and
   chemical space, coined a nano-niche ([116]47). To illustrate this
   point, a head-to-head comparison was conducted between two HLII
   strains, MIT 9314 and SB, which are closely related both
   phylogenetically ([117]13) and in terms of overlap in their metabolic
   networks (Jaccard similarity of 97.3%; fig. S2). Predicted growth rates
   of both strains were similar near the surface in the subtropics, but
   MIT 9314 outpaced SB in the surface layer of the equatorial region and
   SB outpaced MIT 9314 at higher latitudes and in deeper samples. A
   metabolite sensitivity analysis (Materials and Methods), used in this
   context as diagnostics of nutritional and energy status in key
   metabolites, indicated that the transition depth from nutrient
   limitation to light limitation occurred higher in the water column for
   SB, whereas MIT 9314 was more pervasively nitrogen limited ([118]Fig.
   4). Inspection of the subsets of reactions unique to each strain
   provided the mechanistic basis for their niche differentiation. MIT
   9314 supplemented the tricarboxylic acid cycle intermediate
   oxaloacetate via the anapleurotic carbon fixation enzyme
   phosphoenolpyruvate carboxykinase (PEPCKase), increasing its relative
   fitness with respect to SB in the surface layer of the equatorial
   region where carbon was essentially growth limiting for this strain.
   The coupling of PEPCKase to a truly incomplete tricarboxylic acid cycle
   in Prochlorococcus ([119]48) is potentially unique and has not been
   demonstrated in vivo. Near the deep chlorophyll maximum and at higher
   latitudes, SB leveraged its access to nitrate and nitrite transport and
   reduction to outpace MIT 9314, which relied on ammonia as its sole
   nitrogen source. The two strains converged in fitness below the 1%
   light depth because of the diminishing returns of oxidized nitrogen
   assimilation. Reactions attributed to the difference in niche optima
   between the two strains were validated with simulated knock-in mutants
   expressing the missing subsets in each. The resulting offset in the
   depth intervals for optimal growth were also reflected in their
   predicted OGTs (29.06°C for MIT 9314 and 27.86°C for SB; data S4).

Fig. 4. Metabolic microdiversity.

   [120]Fig. 4.
   [121]Open in a new tab

   (A) Meridional section of the relative growth rate of strains SB and
   MIT 9314. A positive value indicates the percentage increase in growth
   rate for strain SB relative to strain MIT 9314. (B and C) Meridional
   sections of resources limiting the growth of strains SB and MIT 9314.
   Warmer colors correspond to increasingly nitrogen-limited phenotypes,
   while cooler colors correspond to increasingly energy (light)–limited
   phenotypes.

   Mechanistic, genome-scale modeling of cellular metabolism and
   physiology offers readily interpretable and, importantly, directly
   testable hypotheses in unprecedented detail. Moreover, models that
   resolve the molecular scale are needed to interface with the deluge of
   environmental omics data. However, the adoption of GEMs and
   constraint-based methods as prognostic tools in environmental
   microbiology and microbial ecology has been limited thus far
   ([122]49–[123]51). The traditional flux balance approach has routinely
   been applied to predict growth rates and yields of monoclonal isolates
   in chemostat, given a priori knowledge of not only internal
   stoichiometry and external resource availability but also, crucially,
   resource acquisition rates and biomass compositions, both of which vary
   widely in natural systems. Explicit, mechanistic modeling of
   acclimation processes (nutrient acquisition, photosynthetic
   architecture, and biochemical compositions) are therefore an important
   component of the model and a key factor in capturing the flexibility of
   relative fitness. By canvasing the microdiversity of physiological and
   metabolic traits within populations, our constraint-based optimization
   approach enables a bottom-up perspective on microbial population
   physiology, metabolism, and growth in diverse environments at multiple
   taxonomic resolutions. In addition to the many testable predictions of
   cell stoichiometry, nutrient uptake, metabolic rates, and
   photophysiology, the correspondence of the predicted optimal growth
   rate patterns with observed abundances is intriguing, and the emergent
   relationship could provide a calibration for a density-dependent
   mortality closure with broader application.

   Implementation of this framework in regions beyond the AMT transect,
   where appropriate environmental data are already available, would
   provide testable predictions on the relative abundance, composition,
   and activity of Prochlorococcus ecotypes. However, because of the
   streamlined nature of Prochlorococcus, much of the metabolic diversity
   within the accessory subset of the pangenome is associated with the
   transport and metabolism of exogenous organic substrates that are
   neither routinely measured in situ nor resolved in our current
   implementation. Exchange of these resources within Prochlorococcus
   populations may provide stability to strain diversity through
   cross-feeding and division of labor. This is one avenue in which
   pangenome-scale modeling may contribute in the future, because many
   specific metabolite exchanges could, in theory, be resolved. Allowing
   the virtual organisms to feed back on their resource environment could
   lead to prognostic applications ([124]52), perhaps embedded in a
   coarser-grained general population. Although currently still
   computationally prohibitive, with effort on code development and
   optimization, this is a plausible goal. The degree of metabolic detail
   resolved makes these tools a good prospect for exploring community
   interactions.

MATERIALS AND METHODS

Environmental data

   In situ and remote sensing data were used as inputs for simulations. A
   suite of 66 measured in situ variables from the 2003 AMT-13 cruise,
   including temperature, nutrients, cell abundances, and optical
   properties, are described in detail in the Supplementary Materials
   (section S1.1) and are provided as a gridded dataset (data S2). To
   compute downwelling irradiance, we combined in situ optical profiler
   diffuse spectral attenuation coefficients with synoptic measurements of
   satellite cloud cover as a mask over a model of clear-sky direct and
   diffuse spectral radiation. Simulations conducted in this study were
   restricted to a subset of 32 stations (from a total of 78 casts at 54
   stations) for which ecotype abundance data ([125]9) were available.

Biochemical compositions

   Biochemical compositions, physiological traits, and constraints on
   their ranges were assigned to each strain. Initial compositions
   represent the nutrient-replete batch-acclimated state of isolates as a
   starting point for acclimation. Data assigned to each strain were
   derived both from literature sources and from measurements taken for
   this study. Metabolomic data were collected for five Prochlorococcus
   strains under optimal batch conditions: MED4 (HLI), MIT 9312 (HLII),
   NATL2A (LLI), MIT 1304 (LLII/III), and MIT 9313 (LLIV). Strain MED4 was
   analyzed for macromolecular compositions (carbohydrate, lipid, protein,
   DNA, RNA, and divinylchlorophyll a) as well as elemental composition
   (particulate C, N, and P) and dry weight as in ([126]53) (with
   modifications) under nutrient-replete and nitrogen and phosphorus
   stress conditions. Detailed analytical methods and a summary of
   literature datasets used in this study are provided in the
   Supplementary Materials (section S1.5 and data S7).

Pangenome-scale metabolic network assembly

   An aggregate GEM was reconstructed to represent the full set of
   metabolic functions encoded in the pangenome of Prochlorococcus
   (PanGEM). The current pangenome consists of 77 sequenced isolates,
   seven metagenome-assembled genomes, and 564 single-cell amplified
   genomes, comprising 866,894 unique open reading frames. Translated
   coding sequences were mapped to KEGG orthologs (KOs) using a
   bidirectional blast with KEGG’s kofamKOALA, pairing the number of
   unique KOs to 2084. A database containing nucleotide sequences,
   translated amino acid sequences, header strings, database identifiers
   (National Center for Biotechnology Information, Joint Genome Institute,
   and KEGG), metadata, and descriptive statistics for each genome was
   compiled; this database was the basis for draft reconstruction of the
   PanGEM network. The draft reconstruction was then manually curated
   following a standard process (section S1.3) ([127]54).

Semiautomatic strain GEM reconstruction

   A novel approach to automatically extract functional,
   stoichiometrically consistent strain GEMs from the PanGEM was
   developed. By definition, any strain is a subset of the PanGEM;
   however, simply excluding the remaining reactions often results in
   lethal deletions. In lieu of manual curation for each strain, we use a
   CS algorithm to identify an essential subset of reactions that are
   required to synthesize all biomass components de novo and satisfy the
   growth objective. From this essential reaction subset, we append
   reactions specific to each strain. A detailed description of PanGEM and
   the CS algorithm is provided in the Supplementary Materials (section
   S1.4).

Flux balance analysis

   A chemical flux balance can be derived from the law of mass action
   (section S1.2). Assuming a steady state, a set of optimal flux
   distributions can be predicted that maximizes an objective function by
   solving the linear program
   [MATH: <mrow><mrow><mtable><mtr columnalign="left"><mtd><mrow><mover
   accent="true"><mrow><mi mathvariant="bold">v</mi><mo>∈</mo><msup><mi
   mathvariant="normal">ℝ</mi><mi>n</mi></msup></mrow><mtext>maximize</mte
   xt></mover></mrow></mtd><mtd><mrow><msup><mi
   mathvariant="bold">c</mi><mi>T</mi></msup><mi
   mathvariant="bold">v</mi></mrow></mtd></mtr><mtr
   columnalign="left"><mtd><mtext>subject
   to</mtext></mtd><mtd><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd><mrow><mi
   mathvariant="bold">Sv</mi><mo>=</mo><mn>0</mn><mo>,</mo></mrow></mtd></
   mtr><mtr columnalign="left"><mtd><mrow><msup><mi
   mathvariant="bold">v</mi><mi
   mathvariant="bold">lb</mi></msup><mo>≤</mo><mi
   mathvariant="bold">v</mi><mo>≤</mo><msup><mi
   mathvariant="bold">v</mi><mi
   mathvariant="bold">ub</mi></msup></mrow></mtd></mtr></mtable></mrow></m
   td></mtr></mtable></mrow></mrow> :MATH]
   (1)

   where S is the stoichiometric matrix, v is the vector of fluxes and its
   respective lower and upper bounds (v^lb and v^ub), and c is a
   predefined coefficient vector whose entries define the weighted set of
   reactions that determine fitness. Here, the objective function is
   defined as growth rate, the rate of synthesis of 1 g of ash-free dry
   biomass of specified composition. Several variations of FBA and
   sensitivity analyses are used throughout this article, as described in
   the Supplementary Materials (section S1.2).

Microbial Simulation Environment

   Two sequential acclimation steps, OptTrans and PhysOpt, are performed
   on each strain GEM at each environmental grid point. Physiological data
   specific to each strain, or to its closest phylogenetic relative, are
   initially assigned to the corresponding GEM to represent its phenotype
   when grown under optimal laboratory conditions. Physiological data,
   collected as a part of this study (data S4 to S7) and from the
   literature (data S8), include cell size spectra, biochemical
   compositions (macromolecular pools, pigments, and metabolites), maximum
   carbon fixation rates, and maintenance requirements. Boundary
   constraints for all parameters (e.g., molecular compositions, cell
   size, and metabolic rates) are assigned to each strain GEM by compiling
   experimental data collected under a wide range of growth conditions.
   The first bilevel optimization, OptTrans, searches for the optimal
   distribution of metabolic fluxes, transporter abundances, and cell size
   that maximizes fitness, given the external concentration of each
   nutrient (ammonia, nitrite, nitrate, and phosphate). Quantitative
   proteomics data and molecular modeling are used to derive catalytic
   rates and transmembrane domain dimensions for each transporter,
   necessary to predict external substrate concentration–dependent uptake
   rates ([128]32). The calculated optimal uptake rates, cell sizes, and
   transporter abundances are then used to constrain the second bilevel
   optimization, PhysOpt. Given our experimentally determined constraints
   on cellular biochemical compositions (data S5), PhysOpt searches for
   the optimal biomass composition of several macromolecular pools (e.g.,
   protein, DNA, RNA, cell wall, lipid, carbohydrates, minerals, vitamins,
   osmolytes, and other metabolites). Simultaneously, pigment compositions
   (the light-harvesting pigments divinylchlorophylls a and b, α-carotene,
   and the photoprotective pigment zeaxanthin) are optimized according to
   the alignment of pigment-specific absorption spectra with the observed
   in situ downwelling irradiance spectrum, again with the objective of
   maximizing fitness. This coupling therefore takes into account the
   costs of synthesizing each macromolecular pool or pigment de novo, in
   the holistic context of total cellular nutritional status, photon flux
   demands, metabolic capabilities, and biomass composition, specific to
   each strain. The entire MSE procedure is intended to mechanistically
   represent the processes by which microbes acclimate to their
   environment at the molecular scale, while being faithful to the
   trade-offs of altering their physiology. A more thorough discussion and
   accompanying illustrations for OptTrans, PhysOpt, and the transport and
   photosynthesis models are provided in the Supplementary Materials
   (section S1.6).

Temperature dependence of metabolic rates

   Growth rates and metabolic fluxes were adjusted for temperature effects
   by comparing in situ growth temperatures to a model of
   temperature-dependent growth, parameterized for each strain. Our
   approach provides a strain-specific parameterization while avoiding the
   effects of adaptive laboratory evolution, which can be substantial
   ([129]55). First, the Arrhenius activation energy E[a] was calculated
   on the basis of growth rates determined in batch cultures of 12
   Prochlorococcus strains ([130]9, [131]56). A modification of a
   previously proposed model ([132]57), but which preserves the Arrhenius
   parameterization, was used to capture the dynamics of a dimensionless
   growth rate constant that was then applied to metabolic rates and
   growth rates. The OGT parameter was predicted for each strain from
   proteome sequences using the machine learning algorithm TOME-cool.
   Among several algorithms, a support vector machine regression achieved
   the best performance (coefficient of determination score, R^2 = 0.88).
   The OGT dataset from TOME 1.0 ([133]26) was expanded to include
   additional psychrophilic and psychrotolerant taxa ([134]58, [135]59),
   resulting in a training dataset of 6020 microorganisms
   ([136]https://github.com/EngqvistLab/tome_cool; section S1.7).

Acknowledgments