Abstract Metabolism is controlled to ensure organismal development and homeostasis. Several mechanisms regulate metabolism, including allosteric control and transcriptional regulation of metabolic enzymes and transporters. So far, metabolism regulation has mostly been described for individual genes and pathways, and the extent of transcriptional regulation of the entire metabolic network remains largely unknown. Here, we find that three‐quarters of all metabolic genes are transcriptionally regulated in the nematode Caenorhabditis elegans. We find that many annotated metabolic pathways are coexpressed, and we use gene expression data and the iCEL1314 metabolic network model to define coregulated subpathways in an unbiased manner. Using a large gene expression compendium, we determine the conditions where subpathways exhibit strong coexpression. Finally, we develop “WormClust,” a web application that enables a gene‐by‐gene query of genes to view their association with metabolic (sub)‐pathways. Overall, this study sheds light on the ubiquity of transcriptional regulation of metabolism and provides a blueprint for similar studies in other organisms, including humans. Keywords: gene regulation, metabolic network model, metabolism, systems biology, transcription Subject Categories: Chromatin, Transcription & Genomics; Computational Biology; Metabolism __________________________________________________________________ Systems‐level analyses in C. elegans show widespread transcriptional regulation of metabolic genes, coexpression of metabolic (sub)‐pathways, and activation/repression conditions of (sub)‐pathways. graphic file with name MSB-19-e11443-g009.jpg Introduction All organisms regulate their metabolism during development and to maintain homeostasis under fluctuating dietary and environmental conditions. In humans, failure to maintain homeostasis can lead to a variety of metabolic disorders such as inborn errors in human metabolism, obesity, hypertension, and diabetes (Sharma et al, [34]2008; DeBerardinis & Thompson, [35]2012). Metabolism can be regulated through different mechanisms. One well‐known mechanism is allostery, a fast‐acting mechanism where metabolites directly modulate enzyme activity. For instance, the enzyme phosphofructokinase, which regulates the conversion of fructose 6‐phosphate to fructose 1,6‐biphosphate, is allosterically regulated during glycolysis. This reaction is coupled to ATP hydrolysis where ATP binding to phosphofructokinase inhibits enzyme activity by decreasing its affinity for fructose 6‐phosphate, while conversion to AMP reverses the inhibitory effect and increases the activity of the enzyme (Blangy et al, [36]1968; Schirmer & Evans, [37]1990). Metabolism can also be regulated transcriptionally by activating or repressing the expression of genes encoding metabolic enzymes or transporters. This mechanism is relatively slow and allows the organism to adapt to changing cellular or environmental conditions. Well‐known examples of the transcriptional regulation of metabolism include induction of the lac operon in Escherichia coli in response to a switch from glucose to lactose as a carbon source (Jacob & Monod, [38]1961; Gilbert & Muller‐Hill, [39]1966); the Leloir pathway in Saccharomyces cerevisiae, which is transcriptionally activated by galactose (Caputto et al, [40]1949; Hopper et al, [41]1978); and mammalian cholesterol biosynthesis genes, which are activated by the transcription factor (TF) SREBP when cholesterol levels are low (Brown & Goldstein, [42]1997; DeBose‐Boyd & Ye, [43]2018). Another example of transcriptional rewiring of metabolism involves propionate degradation in the nematode Caenorhabditis elegans. Like humans, C. elegans utilizes a vitamin B12‐dependent pathway to break down this short‐chain fatty acid. When dietary vitamin B12 is low, propionate metabolism is transcriptionally rewired to an alternative degradation pathway referred to as the propionate shunt, thereby preventing toxic propionate accumulation (Watson et al, [44]2014, [45]2016; Bulcha et al, [46]2019). The contribution of transcriptional regulation of metabolism has mostly been studied at a systems, or network, level, in single‐cell organisms such as E. coli and S. cerevisiae and to a lesser extent in plants (Ihmels et al, [47]2004; Kharchenko et al, [48]2005; Seshasayee et al, [49]2009; Ledezma‐Tejeida et al, [50]2017; Tang et al, [51]2021). However, the extent to which overall metabolic activity is under transcriptional control in animals remains unclear. C. elegans is an excellent multicellular animal model to study the transcriptional regulation of metabolism at a systems level: Its fixed lineage of 959 somatic cells was fully described (Sulston & Horvitz, [52]1977), its metabolism shows extensive conservation with human metabolism (Lai et al, [53]2000; Shaye & Greenwald, [54]2011), many gene expression datasets are available, and a genome‐scale metabolic network model (MNM) has been reconstructed (Yilmaz & Walhout, [55]2016). The most up‐to‐date MNM, iCEL1314, contains 907 metabolites, 2,230 reactions and 1,314 genes (Yilmaz et al, [56]2020). By using flux balance analysis (FBA), iCEL1314 can be used to gain insight into the metabolic state of C. elegans during different nutritional conditions or in different tissues. An additional set of metabolic genes has been predicted based on homology with known metabolic enzymes in other organisms or based on the presence of domains found in metabolic enzymes (Yilmaz & Walhout, [57]2016; Bhattacharya et al, [58]2022). Guilt‐by‐association is a powerful concept in systems biology that can be used to identify genes with shared functions. One way this can be done is by coexpression analysis where a functional association can be predicted when genes are coexpressed in many transcriptomic datasets (Eisen et al, [59]1998; Hughes et al, [60]2000; Kim et al, [61]2001; Segal et al, [62]2003; Stuart et al, [63]2003). In C. elegans, coexpression analysis has been used to study global, temporal, and spatial gene expression (Reinke et al, [64]2000; Kim et al, [65]2001, [66]2016; Spencer et al, [67]2011; Liu et al, [68]2018). Here, we investigated the extent of transcriptional regulation of C. elegans metabolism. We developed a computational pipeline to identify genes of which the corresponding mRNA varies significantly during development, in different tissues, and across a gene expression compendium consisting of different conditions. Using both a supervised and a semisupervised method, we identified coexpressed metabolic pathways and subpathways. Overall, we found that three‐quarters of metabolic genes exhibit variation in expression, which is comparable to the proportion in nonmetabolic genes. Further, we found that most annotated metabolic pathways contain genes that are significantly coexpressed. With a custom‐made semisupervised method, we identified clusters of genes that define coexpressed subpathways or combinations of subpathways that likely form functional metabolic units. We extracted conditions where coexpressed clusters of genes are coordinately activated or repressed, revealing how these clusters may contribute to metabolic homeostasis. We developed a web application we named “WormClust” that is available on WormFlux website (Yilmaz & Walhout, [69]2016). WormClust enables querying of C. elegans genes to identify metabolic (sub‐) pathways with which these genes are coexpressed. Altogether, our findings show that transcriptional regulation of metabolic genes and pathways is ubiquitous in C. elegans, indicating that this principle is broadly conserved from single‐cell organisms to metazoa. Finally, our analyses and tools provide a platform for similar studies in other organisms, including humans. Results Three‐Quarter of metabolic genes are transcriptionally regulated mRNA levels are determined by a combination of synthesis and degradation. Here, we used variation in mRNA levels as a first approximation for transcriptional regulation. We evaluated the expression of metabolic genes during development, in different tissues, and under different conditions to identify metabolic genes that are highly variant and therefore likely transcriptionally regulated. We used all annotated metabolic genes (Yilmaz & Walhout, [70]2016; Yilmaz et al, [71]2020) and grouped them into four classes based on current annotation (Dataset [72]EV1): Class A, iCEL1314 genes (N = 1,308; after removal of six pseudogenes, see [73]Materials and Methods); class B, genes annotated to reactions that cannot yet be connected to the iCEL1314 model (N = 192); class C, genes encoding proteins with homology to metabolic enzymes in other organisms (N = 860); and class D, genes encoding proteins with a domain found in known metabolic enzymes (N = 132). Hereafter, we refer to the 1,308 genes in class A as “iCEL1314 genes” and the remaining 1,184 as “other metabolic genes”. We first identified metabolic genes that vary in expression during larval development by using a high‐quality postembryonic time‐resolved RNA‐seq dataset, hereafter referred to as the “development dataset” (Kim et al, [74]2013; Figs [75]1A and [76]EV1A, and Dataset [77]EV2). Briefly, this dataset contains expression profiles of stage‐synchronized animals that were collected every 2 h after hatching for 48 h at 20°C. In the original paper, genes were grouped into 12 clusters based on similarity in developmental expression profiles. One of these clusters contains 5,045 genes, including 995 metabolic genes, with relatively invariant temporal expressions. We will refer to this cluster as the “flat cluster.” However, although the expression levels of most of the flat cluster genes are relatively stable during development, we noticed that some did exhibit considerable variation. Additionally, many invariant genes from other clusters were not included in the flat cluster. Therefore, we used an unbiased statistical method, called variation score (VS) to stringently define variation in developmental gene expression. This included calculating deviation from the flat cluster genes' expression and then empirically establishing a conservative VS threshold (0.169; Fig [78]EV1A and [79]B, see details in [80]Materials and Methods). We excluded 3,552 genes, including 213 metabolic genes, because they were expressed at levels too low for variability analysis. For the remaining metabolic genes, we found that 754 (31.4%, VS ≥ 0.169) are highly variant, and 98 were invariant (4%, VS = 0; Fig [81]1B). The remaining 1,332 metabolic genes (0 < VS <0.169) were annotated as moderately variant (Figs [82]1B and [83]EV1C). About a quarter of iCEL1314 genes (329, or 26%) are highly variant, which is lower than the proportion of other metabolic genes (37%) and nonmetabolic genes (41%; Fig [84]EV1C and [85]D, and Dataset [86]EV2). The percentage of highly variant metabolic genes is lower than that of nonmetabolic genes across most VS thresholds (Fig [87]1C). Figure 1. Analysis of metabolic gene expression during development, in different tissues and in a gene expression compendium. Figure 1 [88]Open in a new tab 1. Computational pipeline to identify C. elegans metabolic genes that change in expression during development, across tissues, and compendium of multiple conditions. Statistically significant differences in gene expression were calculated in the developmental dataset using a variation score (VS), in the tissue dataset using coefficient of variation (CV) and by number of datasets with CV ≥ 0.75 in the compendium (collection of 177 datasets). 2. Pie charts of metabolic and nonmetabolic gene expression variation in the three different datasets: development, tissue, and compendium separately and combined. Bar graph shows metabolic genes in iCEL1314 and other (predicted) metabolic genes. 3. Comparison of percentage of highly variant metabolic versus nonmetabolic genes at different VS thresholds. 4. Comparison of percentage of highly variant metabolic versus nonmetabolic genes at different CV thresholds. 5. Scatter plot of VS (development) versus CV (tissue) of metabolic genes. The plot is divided into four quadrants: Q1 with moderate/low VS and moderate/low CV; Q2 with high VS and moderate/low CV; Q3 with moderate/ low VS and high CV; and Q4 with high VS and high CV. The Pearson and Spearman correlation coefficients and the corresponding P‐values are indicated. Examples of Q4 genes that are highly variant both during development and in different tissues include ugt‐13, ugt‐18, and ugt‐34 (UGT enzymes); gcy‐3 (guanylate cyclases); and ddo‐3, gln‐2, argk‐1, and phy‐3 (amino acid metabolism). 6. Comparison of percentage of highly variant metabolic versus nonmetabolic genes at different cutoffs of number of datasets with high CV (≥ 0.75). 7. Comparison of percentage of highly variant metabolic versus nonmetabolic genes at different CV cutoffs in at least three datasets in the compendium. 8. Comparison of percentage of invariant metabolic versus nonmetabolic genes at different cutoffs of the fraction of datasets with low CV (< 0.3). 9. Venn diagram of highly variant metabolic genes in the different datasets. Figure EV1. Transcriptional regulation of metabolism during development. Figure EV1 [89]Open in a new tab 1. Computational pipeline to identify highly variant metabolic genes during development. Genes that showed either moderate or high expression in at least one time point were selected, reducing the number of genes from 18,113 (2,397 metabolic) to 14,561 (2,184 metabolic). For each gene, VS was calculated using the deviation from a flat reference profile at each time point (see [90]Materials and Methods). The red line indicates the mean value and light red shaded area the standard deviation of the flat reference profile. The profile of acdh‐1 is plotted in blue as an example of a developmentally regulated gene. d[5] denotes the deviation of acdh‐1 expression from the flat reference profile at the 5^th data point. 2. Distribution of VS of genes belonging to the flat cluster versus those belonging to other clusters. The vertical lines at 1 and 3 represent the iCEL1314 genes nduf‐7 and ugt‐42, which have the lowest and highest VS of the flat set, respectively. The vertical line at 2 indicates the gene C27A7.3 with selected threshold of VS at the 97% quantile of the flat cluster (VS = 0.169). 3. Diagram showing criterion of categorizing metabolic and nonmetabolic genes into four categories across development: lowly expressed, invariant, moderately variant and highly variant. 4. Bar graph shows the distinction of low expressed, invariant, moderately variant and highly variant genes during development in iCEL1314 and other metabolic genes. Color legend as indicated in (C). To identify metabolic genes that exhibit differential expression across tissues, we selected a high‐quality single‐cell RNA sequencing dataset that measured gene expression during L2 stage of C. elegans across seven major tissues: body wall muscle, glia, gonad, hypodermis, intestine, neurons, and pharynx (Cao et al, [91]2017; Figs [92]1A and [93]EV2A, and Dataset [94]EV3). This dataset is hereafter referred to as the “tissue dataset.” Unlike the development dataset, the tissue dataset does not have a defined cluster of invariant genes. Therefore, we used the less sophisticated coefficient of variation (CV) measure to identify variation in gene expression across the seven tissues (Fig [95]EV2A). We previously found that the five genes comprising the propionate shunt are differentially expressed in different tissues (Watson et al, [96]2016; Yilmaz et al, [97]2020), and each of these genes had a CV greater than 0.75 (Fig [98]EV2B). Visual inspection of genes with CV values from 0.15 to 1.2 indicates that CV = 0.75 provides a clear, yet conservative threshold to annotate highly variant genes across tissues (Fig [99]EV2C). We further classified genes with a CV less than 0.75 but greater than or equal to 0.3 as moderately variant and genes with a CV less than 0.3 as invariant (Fig [100]EV2D, see Fig [101]EV2B and [102]C for examples). A total of 6,370 genes, including 348 metabolic genes, were not included in this analysis because they are expressed at low levels (Yilmaz et al, [103]2020; details in [104]Materials and Methods). We identified ~60 and ~25% of metabolic genes as highly and moderately variant, respectively. These include 781 highly variant and 405 moderately variant iCEL1314 genes (Figs [105]1B and [106]EV2E). A very small number of metabolic genes (13, or 1%) were invariant across tissues. Even though the analysis of the different datasets used a different statistical approach, these results suggest that more metabolic genes are variant and, therefore, likely transcriptionally regulated in different tissues than during development (Fig [107]1B). In contrast to development, the percentage of metabolic genes that are highly variant across tissues at any CV cutoff is greater than nonmetabolic genes (Fig [108]1D), indicating that metabolic processes exhibit a relatively high level of tissue specificity. Figure EV2. Transcriptional regulation of metabolic genes across different tissues. Figure EV2 [109]Open in a new tab 1. Computational pipeline to determine highly variant metabolic genes across tissues. Genes that show either moderate or high expression in at least one tissue are selected for analysis reducing the number of C. elegans genes from 19,675 (2,491 metabolic) to 13,305 (2,143 metabolic). The coefficient of variation (CV) of each gene was calculated by dividing the standard deviation of expression across tissues by the mean expression. Different thresholds of CV were titrated to select a stringent CV to categorize genes as variant and therefore potentially transcriptionally regulated. Examples for each threshold are provided in (B). 2. Tissue expression and CV values of propionate shunt genes. 3. Bar graphs showing expression profiles of example genes across tissues with CV 0.15, 0.3, 0.45, 0.6, 0.75, 0.9, 1.05 and 1.2. Numbering of examples is according to the corresponding threshold lines in (A). 4. Diagram showing criterion of categorizing metabolic and nonmetabolic genes into four categories across tissues: lowly expressed, invariant, moderately variant and highly variant. 5. Bar graph shows the distinction of low expressed, invariant, moderately variant and highly variant genes across tissues in iCEL1314 and other metabolic genes. Color legend as indicated in (D). Overall, metabolic gene expression showed higher variation across tissues at a fixed time point (L2) than larval development. However, because we used two different statistical methods for the development and tissue datasets, we confirmed that it held true when we applied the same CV measure we used in the tissue dataset to the development dataset (Appendix Fig [110]S1A). The two datasets also have different resolution: the development dataset has great temporal but no spatial resolution because it was measured by bulk RNA‐seq while the tissue dataset, which was measured by single‐cell RNA‐seq, has great spatial but no temporal resolution. Therefore, we examined genes that are highly tissue‐specific, because they are highly expressed in a single tissue in the tissue dataset, and found that only 57% of these are also highly variant in the development dataset (Appendix Fig [111]S1B). Therefore, we conclude that transcriptional regulation of metabolic genes more frequently establishes spatial than temporal gene expression patterns. To directly compare metabolic gene expression in tissues and development, we plotted VS values of metabolic genes across development versus CV values across tissues and found that these two parameters are moderately correlated (Fig [112]1E). We divided the scatter plot into four quadrants, based on the thresholds used in each dataset (Dataset [113]EV4). To determine whether there are any functional enrichments, we performed pathway enrichment analysis (PEA) on the metabolic genes for each quadrant using the tool provided on the WormFlux website (Yilmaz & Walhout, [114]2016). The first quadrant (Q1) consists of genes with moderate/low developmental variation and moderate/low tissue variation. It has 595 metabolic genes, including 385 iCEL1314 genes that are enriched in several metabolic pathways, such as the electron transport chain (ETC), aminoacyl‐tRNA biosynthesis, the tricarboxylic acid (TCA) cycle, the pentose phosphate pathway and glycolysis/gluconeogenesis (Appendix Fig [115]S2A). The second quadrant (Q2), with high developmental variation and moderate/low tissue variation, consists of only 176 genes, including 68 iCEL1314 genes that are enriched in sulfur, cysteine, and methionine metabolism (Appendix Fig [116]S2A). The 891 genes in the third quadrant (Q3) consist of genes with moderate/low developmental variation and high tissue variation. They include 504 iCEL1314 genes that are highly enriched in lipid metabolism. Notably, genes involved in peroxisomal fatty acid (FA) metabolism vary more in expression than mitochondrial FA degradation (Appendix Fig [117]S2A). Finally, the 577 genes in the fourth quadrant (Q4) show high developmental variation and high tissue variation. They include 261 iCEL1314 genes, which are enriched in UDP‐glucuronosyltransferases (UGT) enzymes, guanylate cyclases, glyoxylate and dicarboxylate metabolism, and amino acid metabolism, such as arginine and proline metabolism and glutamate/glutamine metabolism (Appendix Fig [118]S2A). Interestingly, there are differences among different types of metabolic genes. For instance, amino acid metabolism genes are variant in both development and in tissues, while lipid metabolism genes are mostly variant in tissues, and growth and energy metabolism are relatively invariant in both development and in tissues. To evaluate metabolic gene expression more broadly, we combined 177 expression profiling datasets into an expression compendium, an earlier version of which we have used to study TF paralogs (Reece‐Hoyes et al, [119]2013; Figs [120]1A and [121]EV3A, Datasets [122]EV5 and [123]EV6, see [124]Materials and Methods). Using a CV threshold ≥ 0.75 in at least three datasets, we found that 775 of the 2,492 metabolic genes (~31%), including 284 iCEL1314 genes, are highly variant in the compendium, which is lower than nonmetabolic genes (44%, Figs [125]1B and [126]EV3B). This difference holds true for different cutoffs of the number of datasets showing high variation (Fig [127]1F) and across different CV thresholds (Fig [128]1G). However, the percentage of invariant genes is similar between metabolic and nonmetabolic genes using different CV cutoffs (Fig [129]1H). Figure EV3. Transcriptional regulation of metabolic genes across a compendium. Figure EV3 [130]Open in a new tab 1. Computational pipeline to determine transcriptionally regulated metabolic genes across the gene expression compendium. Datasets were evaluated for batch effects using correlation distribution. Some skewed datasets were corrected for batch effect by splitting into separate datasets, while some were removed if the source of skewness was not clear. This resulted in 177 datasets for analyses. CV of each gene was calculated. The criterion of classifying genes is based on the number of datasets with high CV and fraction of datasets with low CV. 2. Bar graph showing low expressed, invariant, moderately variant and highly variant genes in the compendium in iCEL1314 and other metabolic genes. Color legend as indicated in (A). When we compared highly variant genes in development, tissue, and compendium, we found that a total of 1,867 metabolic genes (75%) are highly variant in at least one of the three datasets and that 283 metabolic genes are highly variant across all three datasets (Fig [131]1B and [132]I). Using phenotypes provided in WormBase WS282 (Harris et al, [133]2020b), we found that the 75% highly variant metabolic genes are enriched in conditional response variants such as chemical and pathogen response, and depleted in essential phenotypes such as lethality, larval arrest, slow growth, and sterility (Appendix Fig [134]S2B). Finally, the remaining 624 (25%) of metabolic genes that are not highly variant in any dataset are similar in pathway enrichment as the Q1 genes discussed previously and are enriched in the essential phenotypes (Appendix Fig [135]S2C and D). Altogether, our analyses indicate that at least 75% (1,867 out of 2,492) of metabolic genes vary in mRNA levels and are therefore likely transcriptionally regulated, including 902 iCEL1314 genes (~69%; Fig [136]1B). This is similar to the proportion of varying nonmetabolic genes (~79%), indicating that metabolic genes are overall at least as much under transcriptional control as other genes (Fig [137]1B). A supervised approach shows widespread Coexpression of genes comprising metabolic pathways We previously found that the five genes comprising the propionate shunt pathway are coordinately activated in response to propionate accumulation (Watson et al, [138]2016; Bulcha et al, [139]2019). In addition, we found strong coexpression of genes functioning in the methionine/S‐adenosylmethionine (Met/SAM) cycle, for instance when flux through this pathway is perturbed (Giese et al, [140]2020). To systematically test which C. elegans metabolic pathways exhibit coexpression, we developed a custom pathway enrichment analysis pipeline (Fig [141]2A) based on gene set enrichment analysis (GSEA; see [142]Materials and Methods; Subramanian et al, [143]2005) and applied it to the compendium. We ran this pipeline using metabolic pathways, enzyme complexes, and enzyme families as defined in WormPaths (Walker et al, [144]2021). Henceforth, we use “category” to refer to a group of metabolic genes that best fit in an enzyme complex or related set of enzymes such as aminoacyl‐tRNA synthetases. We calculated an enrichment score (ES) that defines the enrichment of relatively high coexpression within that set. A normalized ES (NES) indicates relative strength of this enrichment compared with randomized tests, the significance of which is measured as a false discovery rate (FDR; Fig [145]2A). With an FDR cutoff of ≤ 0.05, 52 of 84 metabolic pathways or categories (~61%) exhibit coexpression, which is significantly more than expected by chance (Fig [146]2B and [147]C, Dataset [148]EV7, Appendix Fig [149]S3). As expected, the 52 coexpressed metabolic pathways and categories include the propionate shunt and the Met/SAM cycle (Fig [150]2D and [151]E). When we examined coexpression in pathways and categories separately, we found that 78% of categories showed significant coexpression compared to 58% of pathways (Fig [152]2B). Examples of metabolic pathways that exhibit high coexpression include peroxisomal FA degradation and starch and sucrose metabolism (Fig [153]2F and [154]G). Examples of coexpressed categories include vacuolar ATPases, ETC complex I, and aminoacyl‐tRNA synthetases (Appendix Fig [155]S3). There are 32 categories and pathways that do not exhibit self‐enrichment, including pantothenate and CoA biosynthesis and mevalonate metabolism (Fig [156]2H and [157]I, and Appendix Fig [158]S3). Such pathways may either not be regulated at all, may be regulated by allostery, or only one or a few genes in these pathways are transcriptionally regulated and may therefore function as key regulatory genes. Alternatively, these pathways maybe coregulated in conditions that were not yet profiled and therefore are not included in the compendium. Figure 2. Supervised approach to investigate coexpression of metabolic pathways. Figure 2 [159]Open in a new tab * A Custom computational pathway enrichment analysis pipeline that determines coexpressed genes functioning in the same metabolic pathway. Pairwise coexpression was based on the gene expression compendium. For every annotated metabolic pathway, the coexpression of pathway genes (columns) to all metabolic genes (rows) was extracted. A ranked list of genes was obtained for each pathway by taking the mean of coexpression values in rows while ignoring self‐correlations. Weighted gene set enrichment analysis was then performed to find significantly enriched pathways. If a pathway is self‐enriched with FDR ≤ 0.05, it is annotated as coexpressed. * B Bar graph indicating the percentage of metabolic pathways and categories that show significant coexpression compared with ones that are not self‐enriched for coexpression. * C Histogram denoting the number of significantly coexpressed metabolic pathways obtained by 1,000 randomizations while maintaining the structure of the data. * D–I Mountain plots showing self‐enrichment of (D) propionate shunt pathway, (E) Met/SAM cycle, (F) peroxisomal fatty acid degradation pathway, (G) starch and sucrose metabolism, (H) pantothenate and CoA biosynthesis, and (I) mevalonate metabolism. * J Metabolic pathways often consist of reactions catalyzed by single genes, OR genes and AND genes. All genes involved in the same pathway are collectively annotated as all pathway genes. Genes that are associated with distinct reactions are annotated as PW genes. PW gene pairs exclude AND and OR gene pairs. Met/SAM cycle pathway, which consists of 13 metabolic genes, is shown as an example. * K Percentages of pairs of AND genes, OR genes, other paralogs, operon genes, and random metabolic genes categorized as coexpressed using different coexpression values as cutoffs. Coexpression values are based on the gene expression compendium. * L Percentage of random, all pathway, PW, and PO gene pairs categorized as coexpressed using different coexpression values as cutoffs. Coexpression values are based on the gene expression compendium. The extent of within‐pathway coexpression of metabolic genes can be potentially confounded because metabolic reactions are often associated with multiple genes in gene‐protein‐reaction (GPR) annotations (Kim et al, [160]2008; Thiele & Palsson, [161]2010). There are two reasons for this. First, some metabolic reactions are catalyzed by enzyme complexes comprising two or more proteins. In such cases, all genes need to be expressed for the reaction to take place and are therefore annotated here as “AND” genes. Second, some genes are part of larger families (paralogs) that encode isozymes or highly similar proteins. Metabolic network reconstruction efforts use protein sequence homology to associate genes with reactions (Thiele & Palsson, [162]2010; Yilmaz & Walhout, [163]2016, [164]2017). As a result, multiple highly homologous paralogs may be associated with the same metabolic reaction. Such paralogs are annotated here as “OR” genes. Some reactions are associated with a combination of AND and OR genes (Fig [165]EV4A). Finally, for some gene families it may be that one member catalyzes one reaction and another member catalyzes another. Paralogs that are associated with distinct reactions are referred to here as “other paralogs” (Fig [166]EV4B). Pathways can be associated with multiple types of AND and OR genes (Fig [167]2J). Figure EV4. Reaction‐level analysis of metabolic pathways. Figure EV4 [168]Open in a new tab * A The conversion of succinate to fumarate, which is part of complex II of the ETC and of the TCA cycle, is carried out by succinate dehydrogenase. Diagram showing that succinate dehydrogenase is composed of the OR genes sdha‐1 and sdha‐2 that each function together with the rest of the genes as AND genes. The GPR of this reaction is noted as [(sdha‐1 | sdha‐2) & sdhb‐1] & mev‐1 & sdhd‐1]. Color legend is indicated. * B Example of a gene family (aagr) where members occur as paralogous OR gene pairs in separate reactions. Pairs of paralogs associated with different reactions are called other paralogs. * C Violin plot comparing coexpression for different populations of gene pairs including random, AND, OR, operon and other paralogs gene pairs in compendium of expression datasets. * D, E Percentages of AND, OR, other paralogs, operon genes and random metabolic gene pairs categorized as coexpressed using different coexpression values as cutoffs across developmental time (D) and tissues (E). Color legend as indicated in (D). * F–H Violin plots showing coexpression of random gene pairs, all pathway genes, PW genes and PO genes across compendium (F), different developmental stages (G) and tissues (H). AND genes encode proteins that function together in complexes, and such genes are often strongly coexpressed (Jansen et al, [169]2002). For example, genes encoding ETC complex members are coexpressed and coregulated (van Waveren & Moraes, [170]2008). Therefore, we wondered whether this holds true for AND genes in iCEL1314 and, if so, whether this would inflate pathway coexpression enrichment. To test this, we systematically assessed coexpression of different types of gene pairs. As expected, we found that AND genes are significantly more coexpressed than random gene pairs, OR genes and other paralogs (Figs [171]2K and [172]EV4C–E). Both OR genes and other paralogs are also more coexpressed than random metabolic gene pairs in all three datasets (Figs [173]2K and [174]EV4C–E). Surprisingly, OR genes are more coexpressed than other paralogs across tissues (Figs [175]2K and [176]EV4C–E). In C. elegans, ~18% of genes are transcribed from operons (Blumenthal et al, [177]2002). In total, 26% of metabolic genes occur in operons. However, they most frequently occur as a pair with a nonmetabolic gene. In total, 242 metabolic genes (~10% of all metabolic genes) occur in a pair with another metabolic gene in an operon (Dataset [178]EV1). As expected, these operon gene pairs are more coexpressed than random gene pairs, OR genes and other paralogs, thus serving as a validation for our coexpression analysis. However, these pairs are less coexpressed than AND genes and there is no overlap between the two categories. This shows that enzyme complexes are strongly coregulated and their coregulation mechanism is largely independent of operonic organization (Figs [179]2K and [180]EV4C–E). Based on the analysis of AND, OR and operon genes, it is difficult to determine the contribution of the coexpression of such gene pairs to pathway enrichment. Therefore, we examined coexpression of gene pairs that are annotated with distinct reactions in a pathway, which we refer to as pathway (PW) genes (Fig [181]2J). We found that PW gene pairs are significantly more coexpressed than random gene pairs (Figs [182]2L and [183]EV4F–H). We also examined coexpression of gene pairs that are not part of an operon, which we refer to as pathway excluding operon (PO) genes. There are only three pathway gene pairs that are part of operon; hence, there is no significant difference between pathway genes and PO genes coexpression (Figs [184]2L and [185]EV4F–H). Therefore, pathway coexpression is not just driven by AND, OR and operon genes, indicating that pathway genes' coexpression is a true feature of many metabolic pathways. A Semisupervised approach extracts coexpressed subpathways Our finding that metabolic pathways and categories exhibit extensive coexpression was based on previously annotated pathways (Walker et al, [186]2021). However, these pathways connect into the larger metabolic network and the definition of the start and ending of each pathway is somewhat arbitrary. Since there is extensive coexpression of genes that function together in predefined pathways, we reasoned that we may be able to use coexpression to extract metabolic (sub)‐pathways in an unbiased manner. To specifically focus on metabolic genes that function in connected reactions in the metabolic network, we developed a “coflux” metric that calculates flux dependency between metabolic genes using the network model (see details in [187]Materials and Methods; Dataset [188]EV8). Reactions in linear pathways have complete flux dependence (i.e., coflux = 1), while in branched pathways flux dependency may be partial (coflux = between 0 and 1), and in uncoupled reactions, there is no dependence (coflux = 0). We then used a custom semisupervised approach that multiplies coflux and coexpression values and clustered the resulting product matrix with a relatively stringent set of parameters (Fig [189]3A and [190]B, Dataset [191]EV9, Appendix Fig [192]S4A, see [193]Materials and Methods for details). Figure 3. Semisupervised approach to extract coexpressed and flux‐dependent metabolic genes. Figure 3 [194]Open in a new tab 1. Computational pipeline to extract tightly coregulated units in the metabolic network: Functional relationships are provided through theoretical flux associations (coflux) calculated using C. elegans metabolic network model iCEL1314, while expression correlations come from the compendium of 177 expression datasets. Hierarchical clustering on the product of coflux and coexpression matrix gives coexpressed metabolic pathways. 2. Heatmaps showing coflux and coexpression of iCEL1314 genes and clustered heatmap showing added modularity to coexpression space by product of coexpression and coflux. Color legend is indicated. 3. Distinct clusters denoted by clustered heatmap of genes in canonical and shunt pathways of propionate degradation were extracted using dynamic cut tree algorithm with stringent parameters (deepSplit = 2, minClusterSize = 3). Color legend as indicated in (B). As expected, the propionate shunt pathway genes formed a tight cluster (Fig [195]3C and Appendix Fig [196]S4B). Interestingly, while the first four genes, acdh‐1, ech‐6, hach‐1, and hphd‐1, occurred closely together, the fifth gene, alh‐8, was not part of the same cluster. This could be explained in two ways. First, alh‐8 encodes an enzyme that functions at a junction in the pathway where its substrate malonate semialdehyde is converted either to acetyl‐coa or, potentially, to beta‐alanine. Therefore, metabolic flux is divided in two directions and is not linearly coupled with the shunt pathway flux like the first four reactions. Second, alh‐8 is annotated to another reaction where 2‐methyl‐3‐oxopropionate is converted to propionyl‐CoA (Fig [197]3C). This approach also revealed another cluster comprising the canonical, vitamin B12‐dependent propionate degradation pathway, indicating that, like the propionate shunt, this pathway may also be transcriptionally activated or repressed under specific conditions (Fig [198]3C, Dataset [199]EV9). The semisupervised approach reveals pathway boundaries The propionate shunt example above shows that the semisupervised approach can extract subpathways (e.g., the propionate shunt) from previously defined pathways (e.g., propionate degradation) based on coexpression and coflux. We therefore used other clusters defined by the semisupervised approach to better define starts and ends of different pathways. An example of a pre‐annotated WormPaths pathway that was fully captured with the semisupervised approach is peroxisomal FA degradation (Fig [200]4A, Dataset [201]EV9). The first reaction in peroxisomal beta‐oxidation is catalyzed by acyl‐CoA oxidases (encoded by acox genes). Only acox‐1.1 and acox‐3 in the acox family genes are coexpressed with the other peroxisomal FA oxidation genes, indicating that they are more likely to function in this pathway than the other acox genes, which are coexpressed with each other, and with mitochondrial FA degradation genes (Dataset [202]EV9). Figure 4. Semisupervised approach defines metabolic pathway boundaries. Figure 4 [203]Open in a new tab 1. Clustered heatmap denoting a distinct cluster consisting of at least one gene from every reaction in peroxisomal fatty acid degradation. Heatmap genes are shown in bold font. Color legend, indicated here, applies to all panels. 2. Clustered heatmap showing a distinct cluster formed by the tyrosine degradation genes separate from the rest of the tyrosine metabolism. 3. Clustered heatmap showing a distinct cluster formed by a boundary within the histidine degradation pathway. 4. Clustered heatmap showing a distinct cluster formed by genes traversing pathway boundaries that are parts of propionate, alanine, and pyrimidine metabolism. Examples where only a subset of annotated pathway genes clustered together include tyrosine metabolism and histidine degradation. Tyrosine can be metabolized via different reactions, in different pathway branches (Fig [204]4B). In one pathway branch, tyrosine is degraded in five steps to produce fumarate and acetoacetate. The genes in this branch; gst‐43, C31H2.4, Y53G8B.1, hpd‐1, hgo‐1, fah‐1, and gst‐42 form a tight cluster (Fig [205]4B, Dataset [206]EV9). This cluster consists of OR genes hpd‐1 OR C31H2.4; and Y53G8B.1 OR gst‐42 OR gst‐43, which suggests that these genes are correctly annotated to this pathway branch. Histidine can also be degraded via two pathway branches: one converting histidine to glutamate through N‐formyl‐L‐glutamate and the other converting histidine to 3‐methylimidazoleacetic acid. The four genes associated with the conversion of histidine to N‐formyl‐L‐glutamate; haly‐1, Y51H4A.7, amdh‐1, and cpin‐1, form one of the top‐ranked clusters (Fig [207]4C, Dataset [208]EV9). However, the genes in the other branch are not coexpressed. We also found clusters consisting of genes that traversed different pathways. For instance, alh‐8 and gta‐1, which are functionally associated but not strongly coexpressed with the propionate shunt (Fig [209]3D), cluster with T09B4.8 (alanine metabolism) and four other genes belonging to pyrimidine metabolism: dpyd‐1, dhp‐1, dhp‐2 and upb‐1 (Fig [210]4D, Dataset [211]EV9). This observation functionally connects genes in what were heretofore separately annotated pathways, that is, pyrimidine, alanine, and propionate metabolism. The coexpression of these genes suggests that thymine is degraded, leading to the formation of propionyl‐CoA, L‐3‐amino‐isobutanoate and acetyl‐CoA. This observation also suggests that alh‐8 levels have a stronger functional role in the conversion of 2‐methyloxopropanoate to propionyl‐CoA than in the propionate shunt. Remarkably, this further indicates that alh‐8 may participate in both the generation and degradation of propionyl‐CoA. The stringent cluster derivation parameters we used favor small clusters, and as a result, interconnections between different pathways may be lost in the analysis. To unveil such connections, we relaxed the parameters to allow the derivation of larger clusters of coexpressed genes (Dataset [212]EV10, Appendix Fig [213]S4C). With these settings, the propionate shunt cluster expanded and included bckd‐1A, bckd‐1B, dbt‐1, Y43F4A.4, ard‐1, acdh‐3, acdh‐9, and B0250.5, which are annotated to branched‐chain amino acids (BCAA) isoleucine and valine degradation pathways, but not alh‐8 or gta‐1 (Fig [214]EV5A, Appendix Fig [215]S4D, Dataset [216]EV10). Propionyl‐CoA, the starting metabolite of the propionate shunt, is produced by the breakdown of valine and isoleucine. We recently proposed that the propionate shunt not only functions to detoxify excess propionate but also to produce acetyl‐CoA for ketone body and energy production (Ponomarova et al, [217]2022). The coexpression of valine and isoleucine breakdown genes with the propionate shunt indicates a functional connection between these pathways to produce energy. Figure EV5. Semisupervised clustering of product matrix using relaxed parameters. Figure EV5 [218]Open in a new tab 1. Distinct cluster of known coregulated metabolic pathway propionate shunt genes together with isoleucine and valine degradation genes obtained using relaxed clustering parameters (deepSplit = 3, minClusterSize = 6) with the dynamic cut tree algorithm. 2. Clusters of Met/SAM cycle genes and genes of related pathways obtained using relaxed (deepSplit = 3, minClusterSize = 6) and stringent (deepSplit = 2, minClusterSize = 3) parameters. Stringent clusters are shown near the drawn pathways of clustered genes. 3. Pathway boundary (green shade) within selenocompound metabolism defined by genes in a high‐quality cluster obtained by relaxed parameters and shown by the clustered heatmap (left). Mountain plots showing comparison of self‐enrichment analysis statistics (NES, ES and FDR) of selenocompound metabolism from WormPaths with that of selenocompound cluster obtained by the semisupervised approach (right). The Met/SAM cycle provides another example of different degrees of clustering that can be unveiled with different parameter settings (Fig [219]EV5B). The smaller clusters with stringent clustering captured different parts of one‐carbon metabolism with their connections to Met/SAM cycle, while relaxed clustering combined these genes into one single cluster. This cluster acts as a subsystem that connects the Met/SAM cycle on one side with glycerophospholipid metabolism, specifically phosphatidylcholine biosynthesis, which depends on methylation reactions using SAM (Walker et al, [220]2011), as well as with purine metabolism (Ducker & Rabinowitz, [221]2017). Second, Met/SAM cycle genes are highly coexpressed with the folate cycle gene which produces the methyl group that is used to convert homocysteine into methionine in the Met/SAM cycle (Ducker & Rabinowitz, [222]2017; Giese et al, [223]2020). Together, these results confirm that the Met/SAM cycle is overall coexpressed (Giese et al, [224]2020) and show that additional co‐functioning genes can be identified. The semisupervised approach also identified gene clusters that are not part of any coexpressed pathway identified by the supervised method above. An example is selenocompound metabolism, where a set of seven genes form a highly coexpressed cluster (FDR = 0.025, NES = 1.7; Fig [225]EV5C). In comparison, the respective FDR and NES values for self‐enrichment of the WormPaths pathway of selenocompound metabolism were 0.75 and 0.98 (Dataset [226]EV7). Altogether, these results illustrate that not all genes in a pathway are coexpressed and further indicate that a subset of a pathway or a combination of subsets from multiple pathways may be under transcriptional control, illustrating the utility of semisupervised approach as an addition to the predefined metabolic pathways in WormPaths. Metabolic pathway communities reveal coexpression among complexes and pathways To explore additional coexpression clusters than those that were captured by the relaxed settings described previously, we visually inspected the product matrix and extracted three clusters we refer to as metabolic pathway “communities” (Fig [227]5A). We analyzed these communities by WormPaths PEA (Walker et al, [228]2021). The first community is enriched in ETC complexes I, III, and IV, indicating broad transcriptional control of energy production (Fig [229]5B, Dataset [230]EV11). The second community is enriched in mitochondrial and peroxisomal FA degradation, FA biosynthesis, ascaroside biosynthesis, and BCAA degradation (Fig [231]5C, Dataset [232]EV11). The connection between FA metabolism and BCAA degradation may reflect the fact that some FAs are synthesized from BCAA breakdown products. For instance, branched‐chain fatty acids (BCFAs) are synthesized from the branched‐chain alpha‐keto acids of valine, leucine, and isoleucine such as isovaleryl‐CoA and isobutyryl‐CoA after their decarboxylation and further chain elongation (Daschner et al, [233]2001; Jia et al, [234]2016; Wallace et al, [235]2018). The third community is enriched in aminoacyl‐tRNA biosynthesis, N‐glycan biosynthesis, collagen biosynthesis, iron metabolism, and mevalonate metabolism, all of which produce biomass precursors. While aminoacyl‐tRNA synthetases play a major role in protein biosynthesis by linking amino acids to their cognate transfer RNAs (tRNAs), mevalonate metabolism provides precursors for glycan, collagen biosynthesis provides collagen for the formation of the cuticle and other extracellular matrices, and iron metabolism is important for many aspects of metabolism, including the production of heme groups of heme proteins. This result points toward the possibility that growth is transcriptionally regulated by a central mechanism controlling pathways that produce biomass precursors and assemble biomass (Fig [236]5D, Dataset [237]EV11). Taken together, we confirmed the coexpression of metabolic pathways and revealed coexpressed subpathways, as well as coexpression among pathways. Figure 5. Extraction of metabolic communities. Figure 5 [238]Open in a new tab * A Clustered heatmap indicating communities formed by multiplying coflux and coexpression values of iCEL1314 genes. (B, C, and D) define three major communities shown in the respective parts of this fig. * B–D PEA of communities B (B), C (C), and D (D). Metabolic subpathways are activated or repressed under different conditions The gene expression compendium is comprised of 177 expression profiling datasets that measure relative mRNA levels in a variety of experimental conditions and genotypes. Therefore, we next asked whether we could identify specific conditions in which different metabolic gene clusters are activated or repressed. Using a custom computational pipeline (Fig [239]6A), we first identified those datasets that best represent the coexpression of a particular cluster. We then manually investigated the top datasets for each cluster (see [240]Materials and Methods). To validate this approach, we first examined the expression of propionate shunt cluster genes in top‐scoring datasets. We previously showed that propionate shunt genes are repressed in animals fed Comamonas aquatica DA1877, a bacterium that (unlike the standard E. coli OP50 diet) produces vitamin B12, thus enabling flux through the canonical propionate degradation pathway (MacNeil et al, [241]2013; Watson et al, [242]2013, [243]2014, [244]2016). The dataset from that study, labeled as dataset 15 in the compendium, scored as most significant for propionate shunt gene coexpression, where the genes are expressed in animals fed E. coli OP50, but not in animals fed C. aquatica (Fig [245]6B, Dataset [246]EV4). Interestingly, propionate shunt genes are also highly coexpressed in a dataset that measured expression in spr‐5 mutants versus wild‐type animals across 1(f1), 13(f13) and 26(f26) generations (Fig [247]6B, dataset 139). Propionate shunt genes are more highly expressed in the N2 reference strain compared with spr‐5 mutant animals (Fig [248]6B). Since spr‐5 encodes a histone demethylase, this may indicate that the expression of shunt genes is regulated not only by TFs but also by epigenetic mechanisms. Figure 6. Condition analysis of metabolic gene coexpression. Figure 6 [249]Open in a new tab * A Computational pipeline to extract activating or repressing conditions of metabolic clusters. The mean coexpression of all gene pairs in each cluster in each dataset is calculated separately. To rank datasets that showed highest coexpression uniquely for each cluster, these mean coexpression values are normalized using z‐scoring across each dataset (as shown by heatmap). Thirty best datasets that potentially represent activation/repression conditions of each cluster are identified by the z‐score values of mean coexpression. Then, for each cluster, datasets are manually inspected in the order of decreasing mean coexpression along with its associated published paper and the heatmap to understand activation/repression conditions. * B–F Mean coexpression of the 30 best datasets for clusters of propionate shunt (B), peroxisomal fatty acid degradation (C), histidine degradation (D), tyrosine degradation (E), and canonical propionate degradation (F), followed by heatmap examples from selected datasets as indicated by bold‐blue dataset numbers. Color bar and heat map legend as indicated in (B). Peroxisomal FA degradation genes were most significantly coexpressed in dataset 132, which measured gene expression in precisely staged embryos during the first quarter of embryonic development (Fig [250]6C). The time course included a stage of exclusively maternal transcripts (four‐cell), the transition to zygotic transcription (28‐cell), and the presumptive commitment to the major cell fates (55‐, 95‐, and 190‐cell stages; Yanai & Hunter, [251]2009). Peroxisomal FA degradation genes were lowly expressed in four‐cell embryos and their expression increased in later embryonic stages, which may reflect a change in carbon source for energy and biomass generation prior to hatching and feeding. Peroxisomal FA degradation genes are also upregulated in animals fed P. aeruginosa compared with the standard E. coli OP50 diet (Fig [252]6C, dataset 51). This may reflect the high energy demand during infection (Nhan et al, [253]2019). Inspection of expression of the histidine degradation cluster discussed above revealed that it is highly expressed in animals fed C. aquatica (Fig [254]6D, dataset 15). However, these genes were not affected in animals fed E. coli supplemented with vitamin B12 (Bulcha et al, [255]2019), suggesting that the effect of C. aquatica on this cluster may be independent of this cofactor. Animals showed lower expression of this cluster upon exposure to UV treatment (Fig [256]6D, dataset 33). In humans, UV converts cis‐uruconate (the product of first reaction of histidine degradation, Fig [257]4C) to trans‐uruconate, which has been proposed to play a protective role in skin (Brosnan & Brosnan, [258]2020). Our result indicates that, in C. elegans, UV exposure rewires metabolic flux to avoid histidine degradation, for instance to preserve histidine that could be converted to trans‐uruconate. The tyrosine degradation cluster was most significantly coexpressed in dataset 11, wherein expression profiles of gas‐1 mutant animals, which are deficient in mitochondrial respiration, were compared with wild‐type animals (Falk et al, [259]2008; Fig [260]6E). This study revealed that free tyrosine levels are decreased in gas‐1 mutants. In addition, there is a failure of NAD^+‐dependent ketoacid oxidation in mitochondrial respiratory chain mutants (Falk et al, [261]2008). To compensate for this respiratory dysfunction, multiple pathways are upregulated, including the TCA cycle and ketone body metabolism (Falk et al, [262]2008). Since the end products of tyrosine degradation pathway cluster are the TCA cycle intermediate fumarate and the ketone body acetoacetate (Fig [263]4B), the function of the upregulation of tyrosine degradation during mitochondrial dysfunction may be to supply metabolites for compensatory pathways. Altogether, these results show that the gene expression compendium can be used to gain insight into the conditions that most greatly affect the activation or repression of different metabolic gene clusters. However, one needs to be careful to manually inspect the conditions of interest because sometimes coexpression can be biased by an outlier experiment. An example of this is dataset 47, one of the top datasets in which canonical propionate degradation pathway genes are coexpressed. Even though the conditions in this dataset are not related to canonical propionate breakdown, this dataset falsely appears as one of the top datasets due to coexpression driven by one bad outlier sample. It is however also possible that this outlier sample was unknowingly contaminated to change the nutritional or environmental state, hence driving the variable expression of these genes (Fig [264]6F). Overall, our systematic analysis revealed specific conditions of when metabolic gene clusters are activated or repressed, reinforcing our overall finding that transcriptional regulation plays an important role in the control of metabolism. WormClust web application enables gene‐by‐gene query to identify coexpression with metabolic (sub)‐pathways A major premise of this study is the assumption that variance in mRNA levels results, at least in part, from transcriptional regulation, which in turn suggests that genes are coexpressed because they are coregulated. In reverse engineering of gene regulatory networks, coexpression of TFs with their target genes has been used to define causal relationships (Segal et al, [265]2004; MacNeil & Walhout, [266]2011). To make our data available to the community as well as to enable the easy identification of TFs and other C. elegans genes that are coexpressed with metabolic (sub)‐pathways, we developed a web application named WormClust, which is available on the WormFlux website (Yilmaz & Walhout, [267]2016). This tool takes any C. elegans gene as input and evaluates its coexpression with metabolic (sub)‐pathways. If the query gene is an iCEL1314 gene, the output is a clustered heatmap of the coexpressed genes in the model based on product matrix, and according to the selected level of stringency, that is, relaxed, or stringent. If the query gene is not an iCEL1314 gene, then an association of the gene with annotated metabolic pathways is provided. The threshold for this association can be based on FDR and/or NES (Fig [268]7A). Figure 7. WormClust: a web application that enables querying of genes to identify coexpression with metabolic (sub)‐pathways. Figure 7 [269]Open in a new tab 1. Diagram showing the workflow of WormClust. A C. elegans gene is taken as input. If the gene is part of iCEL1314, a clustered heatmap of closely associated genes in product matrix of coflux and coexpression is displayed, based on stringency level of clustering. If the input gene is not an iCEL1314 gene, enrichment bar graphs of the gene to annotated metabolic pathways are displayed, based on selected FDR and NES thresholds. 2. Bar graph of pathways that are significantly coexpressed with nhr‐68 with NES ≥ 2 and FDR ≤ 0.05. 3. Plot showing significant coexpression of nhr‐31 with vacuolar ATPases (FDR ≤ 0.05 and NES ≥ 2). 4. Plot showing significant coexpression of nhr‐79 with peroxisomal fatty acid degradation (FDR ≤ 0.05 and NES ≥ 2). 5. Plot showing significant coexpression of vha‐20 with vacuolar ATPases (FDR ≤ 0.05 and NES ≥ 2). 6. Bar graph of pathways that are significantly coexpressed with acdh‐11 with NES ≥ 2 and FDR ≤ 0.05 (left). Examples of specific reactions in metabolic network model where acdh‐11 can be annotated as OR gene (right). We, and others, previously found that nuclear hormone receptor (NHRs) TFs frequently associate with metabolic genes in different types of assays and dataset (Van Gilst et al, [270]2005; Arda et al, [271]2010; Mori et al, [272]2017; Bhattacharya et al, [273]2022). To illustrate the utility of WormClust, we tested three NHRs with known metabolic pathway associations for coexpression with annotated metabolic pathways in the compendium of 177 C. elegans expression datasets. All of these showed coexpression with their target metabolic pathways (Fig [274]7B–D): nhr‐68 was highly coexpressed with the propionate shunt (FDR = 0.02 and NES = 2.1; Bulcha et al, [275]2019), nhr‐31 associated with vacuolar ATPases (FDR = 0.036, NES = 2.04; Hahn‐Windgassen & Van Gilst, [276]2009), and nhr‐79 is coexpressed with peroxisomal FA degradation (FDR = 0.019, NES = 2.3; Zeng et al, [277]2021). In addition to pathways, we also performed enrichment of clusters or subpathways from our semisupervised analysis with TFs. We found that nhr‐79 is enriched to cluster 16 (stringent), which contains peroxisomal FA degradation genes; nhr‐31 shows enrichment to cluster 5 (relaxed) that consists of vacuolar ATPases; and nhr‐68 shows enrichment to cluster 12 (stringent) which contains propionate shunt genes, albeit with higher FDR. In addition, nhr‐68 shows enrichment to cluster 40 consisting of mans‐2, hex‐2, and fut‐8 (N‐glycan biosynthesis), and cluster 51 consisting of bgal‐1, gana‐1 (galactose metabolism) and hex‐1 (sphingolipid metabolism; Appendix Fig [278]S5A–C, Dataset [279]EV9). This observation suggests that nhr‐68 may play a broader role in the regulation of metabolic gene expression. WormClust also provides an opportunity to annotate new metabolic genes. For example, vha‐20, a vacuolar ATPase that is not part of the original iCEL model because it was only recently annotated by WormBase and KEGG, shows significant enrichment to vacuolar ATPases (Fig [280]7E). This result suggests that WormClust can be used to “deorphan” unannotated metabolic genes. As another example, we found that acdh‐11 is highly coexpressed with mitochondrial fatty acid degradation and isoleucine degradation genes (Fig [281]7F). Therefore, we propose that acdh‐11 can now be added to the iCEL model as another OR gene to reactions in these pathways. We envision future systematic studies of orphan metabolic genes and TFs to increase the annotation of metabolic genes and elucidate the transcriptional mechanisms that regulate their expression. Discussion In this study, we performed a systems‐level analysis of mRNA level variation as a proxy for the transcriptional regulation in C. elegans. Even with our conservative approach, we found that most metabolic genes are under transcriptional control. By a combination of supervised and semisupervised methods, we found that genes in many metabolic pathways are coexpressed and identified gene clusters that represent parts of pathways, or combinations of pathways with strong coexpression. These results build upon earlier work in single‐cell organisms such as E. coli and S. cerevisiae (Ihmels et al, [282]2004; Kharchenko et al, [283]2005; Seshasayee et al, [284]2009; Ledezma‐Tejeida et al, [285]2017; Tang et al, [286]2021) implying that coexpression of metabolic genes that function together is a principle that is evolutionarily conserved. Our data indicate metabolic genes are more subject to transcriptional regulation across tissues than during development. What is the purpose of transcriptional activation or repression of metabolic genes? We propose that the transcriptional regulation of metabolic genes can serve different purposes. First, there is extensive transcriptional regulation of metabolic genes in different tissues. This can be viewed as the setup of metabolic network functionality depending on a tissue's needs. Indeed, tissues have different needs. For instance, the C. elegans intestine serves as the entry point of bacterial nutrients and requires the expression of enzymes that aid digestion and metabolite transport to other tissues. Similarly, the animal's muscle needs to produce energy to support movement and is therefore highly catabolic. Second, metabolic genes are transcriptionally regulated during development. This likely reflects the need for different aspects of metabolism as tissues differentiate and grow and as different metabolic functions are necessary. We refer to the expression of different metabolic genes and pathways in different tissues and different developmental stages as “metabolic network wiring.” A third function of metabolic gene activation or repression is under different conditions that dictate the need for different metabolic functions. This can be for the breakdown of different nutrients, for example, carbohydrates versus fats, versus protein, or to rewire metabolic pathways when others are perturbed. An example of this is the propionate shunt, which is transcriptionally activated when flux through the preferred, vitamin B12‐dependent pathway is perturbed (Watson et al, [287]2016; Bulcha et al, [288]2019). We refer to the transcriptional rerouting of metabolism as “metabolic rewiring.” Finally, metabolic genes can be transcriptionally activated when flux through the pathway in which they function is hampered. An example of this is the Met/SAM cycle in C. elegans, which is transcriptionally activated when flux through the cycle is low, for instance under low vitamin B12 dietary conditions (Giese et al, [289]2020). For genes encoding enzymes that function in multiple reactions and metabolic pathways (e.g., alh‐8), we can learn with which pathways they are more strongly coexpressed. Our study provides a facile portal to investigate the tissues, developmental stages, or conditions under which particular metabolic genes and pathways are highly coexpressed, which will help to formulate hypotheses for detailed follow‐up studies. Genes that are coexpressed often function together and are frequently coexpressed with their transcriptional regulators (Eisen et al, [290]1998; Hughes et al, [291]2000; Kim et al, [292]2001; Segal et al, [293]2003; Stuart et al, [294]2003). We used this principle to develop WormClust with which any C. elegans gene, including TFs, can be used to search for metabolic pathways with which it is coexpressed. However, it is important to note that the most critical regulators, those that respond to the initial information, are often not coexpressed with their target genes. Indeed, nhr‐10, which is essential for activation of the propionate shunt in response to high levels of propionate, does not change much in expression under relevant conditions (Bulcha et al, [295]2019). To identify such “first responders,” it will be useful to employ promoter‐reporter strains with large‐scale RNAi screens (MacNeil et al, [296]2015; Bhattacharya et al, [297]2022). Further, at least half of all C. elegans metabolic genes are not yet associated with reactions or pathways (Yilmaz et al, [298]2020). We have already shown the utility of WormClust in “deorphaning” genes and connecting them to metabolic network such as in case of vha‐20 and acdh‐11 (Fig [299]7E and [300]F). We propose that more such genes may be “deorphaned” in the future. Longer term, we envision that association of other types of regulators, such as RNA binding proteins and microRNAs, with metabolic pathways can be used to gain broader insights into the functional connections among different biological processes. Finally, the approaches used here should be broadly applicable to any organism, including humans, for which large gene expression profile compendia and high‐quality metabolic network models are available. By applying these approaches, deeper insights into the transcriptional control of metabolism will be obtained, as well as insights into the conditions under which metabolic genes and pathways are activated or repressed. Materials and Methods Reagents and Tools table Reagent/Resource Reference or Source Identifier or Catalog Number Software Python 3.6.6 [301]https://www.python.org N/A MATLAB 2019a [302]https://www.mathworks.com N/A GSEApy 0.1.18 [303]https://doi.org/10.1093/bioinformatics/btac757 N/A dynamicTreeCut 0.1.0 [304]https://CRAN.R‐project.org/package=dynamicTreeCut (Langfelder et al, [305]2008) N/A Numpy 1.18.3 Harris et al ([306]2020a) N/A Pandas 1.1.0 [307]https://pandas.pydata.org/ N/A Matplotlib 3.2.1 [308]https://matplotlib.org/ N/A Scikit‐learn 0.20.3 Pedregosa et al ([309]2011) N/A Scipy 1.4.1 Virtanen et al ([310]2020) N/A Seaborn 0.9 [311]https://joss.theoj.org/papers/10.21105/joss.03021 N/A Statannot 0.2.3 [312]https://doi.org/10.5281/zenodo.7213391 N/A Sleipnir Huttenhower et al ([313]2008) N/A [314]Open in a new tab Methods and Protocols Preprocessing of genes The master list of C. elegans genes considered for analysis was downloaded from WormBase public ftp site (release WS282; Harris et al, [315]2020b). The genes were filtered out to obtain only live and protein‐coding genes, which amounted to 19,985 genes in total. Development dataset Postembryonic expression profiles were based on published RNA‐seq data (Kim et al, [316]2013). Briefly, the authors measured the transcriptome of wild‐type (N2) animals from hatching to 48‐h posthatching every 2 h. This dataset includes 21,714 protein‐coding genes, including 2,405 metabolic genes. Genes were classified into 12 clusters based on their expression profiles (Kim et al, [317]2013). We refer to the cluster showing relatively invariant expression as the “flat cluster” and the genes within this cluster as “flat genes.” Selecting only live protein‐coding genes (WS282) resulted in a total of 18,113 genes, including 2,397 metabolic genes. Of these, 4,689 are flat genes, including 995 metabolic genes. We generated a histogram of average gene expression across development using the logarithm of reads per kilobase per transcript (RPKM) values at base 2. This resulted in a bimodal expression distribution that was fitted by two superposed Gaussian curves, representing a high‐expression subpopulation and a low‐expression subpopulation (LES). Genes that showed expression values less than the mean plus the standard deviation of LES at all the time points were filtered out to avoid false fluctuations in gene expression. After this step, a total of 14,561 genes were left, including 2,184 metabolic genes. The number of flat genes was reduced to 4,646, including 986 metabolic genes. Tissue dataset Tissue‐level expression profiles were based on a single‐cell RNA‐seq dataset of animals at the second larval stage (L2; Cao et al, [318]2017). This dataset provides gene expression as transcripts per million (TPM) for 20,271 protein‐coding genes including 2,506 metabolic genes across seven major tissues: body wall muscle, glia, gonad, hypodermis, intestine, neurons, and pharynx. Selecting only live genes (WS282) reduced the number of genes to 19,675, including 2,491 metabolic genes. The dataset was previously processed to label gene expression in every tissue according to the level of expression into four categories: high, moderate, low, and rare (Yilmaz et al, [319]2020). Genes that showed rare or low expression in all seven tissues were filtered out in this study, resulting in 13,305 genes including 2,143 metabolic genes. Gene expression compendium A compendium of gene expression datasets was generated using a combination of public datasets. First, 374 microarray, RNA‐Seq, and tiling array datasets related to C. elegans were downloaded from WormBase (Harris et al, [320]2020b). Then, only those datasets that consisted of at least 10 conditions were selected, resulting in 169 datasets. These datasets were individually examined for batch effects, since many were obtained from multiple microarray experiments where total RNA was not normalized. Initially, histograms of expression values were analyzed, and 16 microarray datasets that displayed abnormal distributions where the correlation distributions were skewed toward +1, hence suggesting that the data may consist of samples that are highly distinctive from each other or are from separate experiments altogether. Such datasets were selected for further examination (Fig [321]EV3A). Twelve of these datasets were found to be composed of two subsets of data, where all genes in one subset were up or downregulated with respect to the other except for a few. The samples forming each subset were independent of the other and seemed to have different amounts of total RNA or a similar batch effect. Therefore, these datasets were divided into two separate datasets to correct for batch effects. The remaining four datasets were removed since the source of abnormalities in their distributions of expression was not clear. This processing resulted in a total of 177 datasets. For each dataset, the expression of every gene was normalized by converting the expression values to z‐scores based on expression across all conditions using the Normalizer function of Sleipnir library (Huttenhower et al, [322]2008). Once all the datasets were z‐normalized, they were combined to form a compendium with 4,796 conditions (sum of multiple conditions within 177 datasets) using the Combiner function of the Sleipnir library (Huttenhower et al, [323]2008), which took a union set of all genes across the different datasets and converted missing values to NaN (not a number) for subsequent processes. Calculation of variation score in the development dataset We define variation score (VS) as a measure of the deviation of a gene's expression profile from a flat reference over time in the development dataset (Kim et al, [324]2013). Prior to any analysis, expression values of every gene were normalized by total expression in all time points using equation ([325]1), [MATH: xinorm=xi1ixi, :MATH] (1) where x [i ]indicates the expression value of gene x at time i. To define a reference profile of invariant expression, a line was constructed in time by joining the mean normalized expression value of the flat cluster at every time point. An envelope around this line was then defined by adding and subtracting the standard deviation of each point. A deviation from this envelope, referred as variation score (VS), was then computed by taking the average distance between an individual gene profile and the flat reference profile according to equation ([326]2), [MATH: VSg=dinwithdi=0, if xinorm < msub>μi ± σi else minxinorm μi + σi,xinorm μi σi :MATH] (2) where n is the number of observations for the gene g, d [i ]is the distance at time i between the normalized level of expression of the gene [MATH: xinorm :MATH] and the closest border of the reference flat profile, and μ[ i ]and σ[ i ]are mean and standard deviations of normalized expression values of flat genes at time point i, respectively. A graphical example of this calculation is provided in Fig [327]EV1A. With this definition, a VS = 0 means that the profile of a given gene stays within the envelope of the flat cluster and is therefore perfectly flat, or invariant. To define highly variable genes, we empirically established a conservative VS threshold value of 0.169 based on the distribution of VS between flat genes and all other genes, such that 97% of flat genes were not annotated as variant (Fig [328]EV1B, see details in [329]Materials and Methods). Calculation of coefficient of variation Coefficient of variation (CV) is a statistical measure that is used to calculate the dispersion of data. For every gene, CV was calculated by dividing standard deviation of expression across different samples (σ) (e.g., different tissues in case of tissue dataset) to the mean of expression across samples (μ). CV was empirically thresholded using the CV of known propionate shunt genes to keep the approach conservative. [MATH: CV=σμ :MATH] (3) Categorizing genes based on expression variation in compendium To be consistent with the approach used with the tissue dataset and to be conservative in our assessment, a CV threshold of 0.75 was selected and required that highly variant genes had a CV greater than or equal to this threshold in at least three datasets. Genes that showed CV < 0.3 in at least 95% of the datasets were labeled as invariant, and genes that fit into neither category were annotated as moderately variant (Fig [330]EV3A). It was further assumed that genes that are not present in a dataset are lowly expressed, and were removed from the analysis. Calculation of coexpression of gene pairs The correlation in expression of metabolic gene pairs during development, across tissues, and across the compendium of gene expression studies was calculated based on Pearson correlation coefficient (PCC) using the Distancer function of Sleipnir library (Huttenhower et al, [331]2008). These correlations defined pairwise coexpression. Differences in the distribution of coexpression values between random, AND genes, OR genes, other paralogs, all pathway genes, and PW genes were evaluated using Mann–Whitney U test (Fay & Proschan, [332]2010). Custom pathway enrichment analysis pipeline Pathway‐to‐gene annotations from level 4 of WormPaths (Walker et al, [333]2021) were used as input gene sets. Each metabolic pathway (or category such as an enzyme complex; hereafter referred to as pathway for simplicity) consists of two or more annotated metabolic genes. The coexpression of genes in each metabolic pathway with all other genes in the metabolic network was extracted from the compendium. Subsequently, the mean of the correlations of pathway genes with all other metabolic genes (excluding the self‐correlations) was calculated. The mean values were used to define a ranked list of metabolic genes for every metabolic pathway. GSEA was then performed on the preranked list of each pathway using the PreRank module (Subramanian et al, [334]2005). Enrichment score (ES) is the degree to which the genes in a gene set are overrepresented at the top or bottom of the entire ranked list of genes. Since the genes that are functionally related are mostly positively correlated, we only consider the genes at the top of the list, hence the ones positively contributing to ES. Leading edge subset enlists the gene hits before the peak while calculating ES, therefore consisting of genes that contribute the most to the enrichment score. NES was derived for each pathway by normalizing the ES values to mean ES for all permutations of the gene sets. This accounts for differences in gene set size. FDR is the estimated probability that a gene set with a given NES represents a false‐positive finding. The significance cutoff for the GSEA was set at an FDR value of ≤ or =0.05. If a pathway was found to be significantly self‐enriched, it was categorized as coexpressed. We validated this result by running the custom pathway enrichment analysis pipeline on 1,000 randomized gene sets. For these randomizations, the structure of the data was maintained such that the correlation matrix, the number of genes in each pathway, and the number of times an individual gene was repeated across pathways all remained the same. Semisupervised approach that combines Coflux and Coexpression The first part of the approach involves using flux balance analysis (FBA) to simulate reaction rates (fluxes) in the metabolic network and then using a flux dependency metric, referred to as coflux, to measure pairwise associations of genes in the C. elegans metabolic network model (Yilmaz & Walhout, [335]2016; Yilmaz et al, [336]2020). We examined all reaction pairs to see whether constraining the flux of one of the reactions to zero reduces the flux of the other (see below for the algorithm). The coflux value is zero for independent reactions and one for reactions that are fully coupled. Reactions that are connected by a junction to another reaction are usually partially dependent. After generating coflux values for each pair of reactions, we converted the reaction matrix to a pairwise gene coflux matrix using GPR associations. A high coflux value for a gene pair indicates that the genes encode enzymes acting in the same metabolic process. For the second part of the approach, a coexpression matrix was derived from the C. elegans gene expression compendium described above. All negative correlations were converted to zero to be consistent with the coflux matrix. The coflux and coexpression matrices were multiplied to obtain a product matrix. Since both coexpression and coflux values are between 0 and 1, a product takes a high value only if both coflux and coexpression values are high. Hierarchical clustering was then performed on the product matrix using dynamic cut tree algorithm using cutreehybrid package (Langfelder et al, [337]2008). Coflux algorithm The coflux value for each gene pair was calculated using FBA with iCEL1314 (Yilmaz et al, [338]2020). First, the standard bacterial diet was amended with a minimum set of nutrients (i.e., by allowing uptake through exchange reactions in the model as indicated in Dataset [339]EV8) that warranted nonzero flux in all reactions of the model. Then, the following steps were taken to calculate coflux values: * For every irreversible reaction i, + Calculate v[max,i ], the maximum flux that can be achieved with the intact network. + For every reaction j, calculate v[max,ij ], which is the maximum flux observed in reaction i when reaction j is constrained to a flux of zero. If i is included in a predefined set of 15 redundant reaction pairs (i.e., reactions with similar reactants and products except for differences such as the use of NADP instead of NAD as electron carrier, Dataset [340]EV8), the flux of the corresponding reaction in the pair was also constrained to zero. + For every reaction j, calculate the coflux with i (c [ij ]) using equation ([341]4). [MATH: cij=Vmax< /mi>,iVmax,ijV max,i :MATH] (4) * For every reversible reaction i, + For every reaction j, repeat the above steps to calculate the coflux with i in forward direction (c [ij,forward]). + Calculate v[min,i ], the minimum (i.e., the most negative, as negative flux indicates flux in reverse direction) flux that can be achieved with the intact network. + For every reaction j, calculate v[min,ij ], which is the minimum flux observed in reaction i when reaction j is constrained to a flux of zero. Once again, the reaction redundant with reaction i is also constrained to zero flux, if applicable. + For every reaction j, calculate the coflux with the reverse direction of i (c [ij,reverse]) using equation ([342]5). [MATH: cij,reverse=Vmin< /mi>,iV< mi>min,ijV min,i :MATH] (5) * For every reaction j, calculate final coflux with i as the maximum of c [ij,forward] and c [ij,reverse] (equation [343]6). [MATH: cij=maxcij,forwardabscij,reverse :MATH] (6) * Since c [ij ]and c [ji ]are not necessarily equal, calculate final coflux value for every reaction pair using equation ([344]7). [MATH: cij,final=maxcijcji :MATH] (7) * Convert the reaction coflux matrix to a gene coflux matrix based on gene–reaction associations. If a gene pair is associated through multiple reaction pairs (i.e., when at least one of the genes is associated with multiple reactions), take the maximum of coflux values between reactions to calculate gene coflux. Hierarchical clustering Hierarchical clustering was performed using average method of linkage on the dissimilarity matrix generated by 1 minus the product matrix value (coflux*coexpression). Dynamic cut tree algorithm from cutreehybrid package was used to cut the dendrogram generated by this clustering with stringent parameters deepSplit = 2 and minClusterSize = 3 and relatively relaxed parameters deepSplit = 3 and minClusterSize = 6 (Langfelder et al, [345]2008). The stringent setting was thresholded based on the occurrence of propionate shunt genes together in one cluster while keeping the size of the smallest cluster to be at least 3. The relaxed setting was chosen to capture larger clusters, such as the Met/SAM cycle genes in a single cluster. Quantifying cluster quality through Silhouette score Silhouette score determines the quality of clustering by measuring the cohesiveness of genes within the same cluster and separateness from the genes in the neighboring clusters (Rousseeuw, [346]1987). It was calculated using scikit‐learn package (Pedregosa et al ([347]2011). We first calculated silhouette score of each metabolic gene based on its placement in each cluster and then calculated mean silhouette score (MSS) for every cluster. Stringent clustering led to 197 clusters, ranging in size from three to 27 genes. We ranked these stringent clusters using MSS, where few of the top‐ranked were inspected in more detail (Appendix Fig [348]S3A). For relaxed clustering, we followed the same approach (Appendix Fig [349]S3C). Finding activation and repression conditions of metabolic clusters To find activation and repression conditions of each cluster, we first calculated the mean coexpression of all gene pairs in that cluster in each dataset separately. To rank datasets that showed highest coexpression uniquely for each cluster, we normalized these mean coexpression values of all clusters using z‐scoring across each dataset. We then identified the 30 best datasets that potentially represent activation/repression conditions of each cluster by the z‐score values of mean coexpression. After this, we manually inspected each dataset in the order of decreasing mean coexpression, its associated published paper, and the heatmap to understand activation/repression conditions. Gene‐centric coexpression with metabolic (sub)‐pathways by WormClust We developed a custom computational pipeline that identifies coexpression of C. elegans genes with metabolic genes used in this study. The pathway gene sets were generated using WormPaths as a GMT (Gene Matrix Transposed) file, a tab‐delimited file of gene sets (Walker et al, [350]2021). The ranked coexpression list of metabolic genes was extracted for each queried gene, from the global coexpression matrix generated using compendium of 177 datasets. The ranked list of each queried gene was used to run Gene Set Enrichment Analysis (GSEA) on the custom metabolic pathway gene sets. Author contributions Albertha JM Walhout: Conceptualization; supervision; funding acquisition; writing – original draft; project administration; writing – review and editing. Shivani Nanda: Conceptualization; software; formal analysis; investigation; methodology; writing – original draft; writing – review and editing. Marc‐Antoine Jacques: Conceptualization; investigation; methodology; writing – review and editing. Wen Wang: Methodology. Chad L Myers: Supervision; methodology; writing – review and editing. L Safak Yilmaz: Conceptualization; formal analysis; supervision; investigation; methodology; writing – review and editing. Disclosure and competing interests statement The authors declare that they have no conflict of interest. AJMW is an editorial advisory board member. This has no bearing on the editorial consideration of this article for publication. Supporting information Appendix [351]Click here for additional data file.^ (6.4MB, pdf) Expanded View Figures PDF [352]Click here for additional data file.^ (474.8KB, pdf) Dataset EV1 [353]Click here for additional data file.^ (371.7KB, xlsx) Dataset EV2 [354]Click here for additional data file.^ (1.1MB, xlsx) Dataset EV3 [355]Click here for additional data file.^ (1.2MB, xlsx) Dataset EV4 [356]Click here for additional data file.^ (177.7KB, xlsx) Dataset EV5 [357]Click here for additional data file.^ (104.9KB, xlsx) Dataset EV6 [358]Click here for additional data file.^ (2.9MB, xlsx) Dataset EV7 [359]Click here for additional data file.^ (88.2KB, xlsx) Dataset EV8 [360]Click here for additional data file.^ (176.4KB, xlsx) Dataset EV9 [361]Click here for additional data file.^ (47.3KB, xlsx) Dataset EV10 [362]Click here for additional data file.^ (37.6KB, xlsx) Dataset EV11 [363]Click here for additional data file.^ (80.2KB, xlsx) PDF+ [364]Click here for additional data file.^ (10.5MB, pdf) Acknowledgements