Abstract

   Soybean is highly sensitive to flooding and extreme rainfall. The
   phenotypic variation of flooding tolerance is a complex quantitative
   trait controlled by many genes and their interaction with environmental
   factors. We previously constructed a gene-pool relevant to soybean
   flooding-tolerant responses from integrated multiple omics and
   non-omics databases, and selected 144 prioritized flooding tolerance
   genes (FTgenes). In this study, we proposed a comprehensive framework
   at the systems level, using competitive (hypergeometric test) and
   self-contained (sum-statistic, sum-square-statistic) pathway-based
   approaches to identify biologically enriched pathways through
   evaluating the joint effects of the FTgenes within annotated pathways.
   These FTgenes were significantly enriched in 36 pathways in the Gene
   Ontology database. These pathways were related to plant hormones,
   defense-related, primary metabolic process, and system development
   pathways, which plays key roles in soybean flooding-induced responses.
   We further identified nine key FTgenes from important subnetworks
   extracted from several gene networks of enriched pathways. The nine key
   FTgenes were significantly expressed in soybean root under flooding
   stress in a qRT-PCR analysis. We demonstrated that this systems biology
   framework is promising to uncover important key genes underlying the
   molecular mechanisms of flooding-tolerant responses in soybean. This
   result supplied a good foundation for gene function analysis in further
   work.

   Subject terms: Plant sciences, Systems biology

Introduction

   Soybean [Glycine max (L.) Merr] provides abundant flavonoids,
   plant-based proteins and lipids. It is the major protein source for
   vegetarians. Soybean is nutritious for their isoflavones and
   anthocyanins belonging to flavonoid compounds^[36]1. Isoflavones, of
   which soybean has higher content, generally exist in many kinds of
   plants^[37]2. Isoflavones have been functionally linked to
   anti-oxidation, reduction in inflammation, inhibition of free radicals,
   and cancer prevention^[38]3–[39]5. Anthocyanin and its main
   constituents, such as cyaniding-3-O-glucoside, present in soybeans can
   effectively inhibit lipopolysaccharide, hydrogen peroxide, and
   pro-inflammatory cytokines, which are a natural source of antioxidants
   and anti-inflammatory^[40]6–[41]8. Hence, soybeans could be used to
   boost the nutritional content, nutraceutical products, and potential
   therapeutic agents for some pathological diseases.

   Soybeans are highly sensitive to growth conditions, particularly in
   flooding environments^[42]9,[43]10. In recent years, global agriculture
   damage and losses from changing climate (e.g. flooding) have
   increased^[44]11,[45]12. Extreme torrential rain or momentary heavy
   rain brought by strong southwesterly air currents or jet streams
   induced by typhoons has caused severe flooding during soybean
   (including edamame) autumn seedlings in southern and western areas of
   Taiwan. In the United States, flooding can occur sequentially during a
   single crop cycle or independently in the same fields during different
   years. Over the past 15 years, flooding resulted in $6.2 billion worth
   of soybean production losses. Loss of soil, nutrients, and pesticides
   to waterways is a major problem in high agricultural production areas
   such as the Mid-western United States^[46]13,[47]14. In China, the
   flooding stress of soybean is associated with excess irrigation that
   impairs water uptake, and soil waterlogging is largely affected by the
   season^[48]15. The total summer crop sown area in 2020 is 26.17 million
   hectares; therefore, the floods affected 23% of the planted area of
   summer crops and caused 4.3% of crop failure. Facing such high
   uncertainty climate change, we need a systematical and comprehensive
   method to find the whole picture of defense mechanisms against flooding
   for breeding stress-tolerant cultivars.

   There is general recognition that flooding can be classified into
   waterlogging, when the water covers only the root system, and
   submergence, when the water covers both the shoot and the root system,
   according to water levels above the soil surface^[49]16. The present
   study mainly focuses on submergence. Abiotic stresses can disturb plant
   growth and adversely affect growth characteristics, for example, leaf
   etiolation and the number of pods per plant^[50]17–[51]19. Under the
   flooding stress, the contents of flavonoid compounds in soybean
   increase significantly, but the yields decrease
   simultaneously^[52]20–[53]22. Also, cell wall maturation, cell wall
   formation, and plant development will be seriously changed during
   flooding^[54]23–[55]26. Thus, a better understanding of the
   physiological mechanisms involved in flooding-induced response and
   tolerance of soybeans is needed for breeding work.

   Mechanisms related to flooding tolerance or response have been
   investigated and reviewed^[56]27,[57]28. At initial flooding stress of
   soybean, ATP-citrate lyase and xylosidase decrease while alcohol
   dehydrogenases and calreticulin increase^[58]29. These enzymes are
   related to the tricarboxylic acid cycle, cell wall maturation, alcohol
   fermentation, and calcium homeostasis^[59]26,[60]30,[61]31. Prolonged
   submergence caused a significant decline in photosynthesis, stomatal
   conductance, and the nutrition absorption of leaves^[62]32. Soybean
   produces abscisic acid (ABA) to regulate protein kinases under
   hypoxia^[63]33. These protein kinases are related to pathways including
   glycolysis, cell organization, and vesicle
   transport^[64]20,[65]22,[66]33. The proteomic analyses have found that
   excessive water supply for soybean roots induces anthocyanin 5-aromatic
   acyltransferase, anthocyanin malonyltransferase, and isoflavone
   reductase to increase^[67]20,[68]34,[69]35. These protein kinases
   facilitate isoflavones and anthocyanins to increase the survival rate
   after flooding. Although many molecular and physiological mechanisms
   were reported, mechanisms of flooding-induced response and tolerance
   have yet to be fully clarified for soybean. No studies were reported on
   the enhancement of pathway analysis for flooding tolerance and
   response, a polygenetic trait, by introducing multigenes selected from
   an integrated knowledge framework in a systematic and comprehensive
   design^[70]36.

   Flooding tolerance is a complex quantitative (or polygenic) trait,
   which is regulated through several biological pathways that are
   controlled by a number of genes (i.e., polygenes). Many functional
   mechanisms studies for flooding tolerance in soybean have been
   reported^[71]20,[72]33,[73]37. Most of the studies were based on
   selected candidate genes that were hypothesis-driven, such as
   text-mining-based^[74]38 and meta-analysis-based^[75]39. However, these
   mechanisms may only partially explain flooding due to a limited
   understanding of the genetic make-up of a polygenic trait, particularly
   flooding tolerance. Furthermore, potential biases might have affected
   the results using the hypothesis-free approach, for example,
   genome-wide association study (GWAS), because it is challenging to
   account for variations between germplasms and quantitative
   trait^[76]40. It is also challenging to balance the results between
   false positives and false negatives in GWAS^[77]41. Determining the
   genetic makeup underlying flooding tolerance in soybean is crucial to
   precisely identifying mechanisms related to flooding tolerance or
   responding to stress. Hence, applying pathway-based analysis to
   selected candidate genes can systematically integrate prior biological
   knowledge of gene regulating functions and biological pathway
   information or functional categories to figure out the whole picture of
   physiological mechanisms for flooding tolerance in soybean. This can
   reveal a more comprehensive picture at the molecular level than a
   single marker-based or gene-level analysis.

   The main purpose of system biology is to precisely explore the unknown
   mechanisms in experimental data containing implicit biological
   information^[78]42. Through systematical methods, pathway enrichment
   analysis, and network analysis, for example, enable us to understand
   the signal transmission of responses biologically in a plant cell being
   stimulated by an environmental factor. These signals are complicated,
   information-worthless in a single signal but information-valuable in
   systematic manners^[79]43. Pathway enrichment analysis, a
   knowledge-based approach, provide biological insights into molecular
   responses to a trait of interest from integrated omics and non-omics
   (OnO) data^[80]44. Pathway enrichment analysis detects whether
   particular biological pathways or molecular groups are significantly
   overrepresented. Networks have successfully carried on the idea of
   graph theory and probability theory to succinctly represent a
   mathematical structure of biological components using a group of nodes
   (e.g. proteins, genes, pathways) and links (e.g. genetic and/or
   functional interactions)^[81]45. Using available biological knowledge
   and candidate genes selected from integrated OnO data for network
   analysis provides a great potential to uncover novel information on
   complex biological networks^[82]46.

   Methods of pathway enrichment analysis in systems biology can be
   generalized into, but not limited to, competitive and self-contained
   method^[83]47. In the competitive method, it compares associations
   between two gene sets (i.e., genes in a specific pathway versus genes
   not in that pathway) and traits, such as a hypergeometric test^[84]48.
   However, the self-contained method only considers associations between
   the genes in a specific pathway and traits, such as sum-statistic (e.g.
   SUMSTAT) and sum-square (e.g. SUMSQ) statistic^[85]49. There are
   several examples that successfully applied pathway-based analysis to
   explore potential mechanisms and biological functions for important
   traits in plants, including cytoplasmic male sterile in soybean^[86]50,
   comparison between a mutant gene and wild-type in soybean^[87]51, or
   high temperature in soybean^[88]52. Recently, Naithani et al.^[89]53
   developed the Plant Reactome, a knowledgebase and resource for
   pathway-based analysis in plants to address important biological
   questions and regulatory mechanisms. Many open-access knowledgebase
   data such as Gene Ontology (GO, [90]http://geneontology.org/) and Kyoto
   Encyclopedia of Genes Genomes (KEGG,
   [91]https://www.genome.jp/kegg/kegg2.html) are commonly used worldwide.
   These functional annotations provide opportunities to access the whole
   map underlying a specific trait via systematically testing unknown
   functional gene sets by statistical model^[92]54. The networks
   integrate biological information (e.g. proteins, molecules, pathways),
   and quantify nucleic acid information, providing information on the
   associations between several genetic loci, and how genes and pathways
   interact with each other (i.e., gene modules) to regulate traits. The
   association between genes can be visualized by the network composed of
   nodes and edges, making the complex associations between genes
   presented in a simple and trivial way^[93]55,[94]56. It is practical
   and efficient way to reveal enriched pathways and networks for
   flooding-tolerant responses using candidate genes prioritized from
   integrated OnO databases^[95]57.

   We previously developed a comprehensive framework to integrate OnO data
   that is relevant to flood-tolerant responses in soybean. A total of
   36,705 genes were collected and prioritized according to their
   magnitude of association with flooding-tolerant responses^[96]36. In
   this study, we introduced a systems biology framework (Fig. [97]1),
   through the pathway enrichment analysis (both the competitive and the
   self-contained methods) and network analysis to combine the joint
   effects of the 144 prioritized flooding tolerance genes (i.e., FTgenes)
   (Fig. [98]2) to uncover the molecular mechanisms underlying
   flooding-tolerant responses in soybean. The strategies proposed in this
   study can better understanding in how flooding-tolerance genes act
   against a flooding event and protect soybean plants from floods in
   complex biological systems.

Figure 1.

   [99]Figure 1
   [100]Open in a new tab

   The study pipelines. This pipeline consists of six steps. The first
   step is the GO annotations filtering. The second step is gene-wise
   statistic score calculation. The third step is the pathway enrichment
   analysis. The fourth step is the gene network construction. The fifth
   step is the discovery of key genes. The final step is the validation
   study using the qRT-PCR experiments.

Figure 2.

   [101]Figure 2
   [102]Open in a new tab

   Manhattan plot of the flooding-tolerance genes in soybean. The dashed
   line is the cut-off score of 42. Dots colored in red are the FTgenes.

Results

Gene-pathway mapping

   A total of 14,772, 17,017, 19,060, and 18,889 expression data (Step 1
   in Fig. [103]1) from soybean roots after 3, 6, 12, and 24 h (h) of
   submergence treatments in the RNA-seq database^[104]58 were used as the
   test sets to conduct pathway enrichment analyses. Only pathways (i.e.
   GO terms) containing at least one FTgenes (Fig. [105]2) were
   considered, resulting in 417 annotated pathways (Step 1 in Fig. [106]1)
   for pathway enrichment analysis.

Gene-wise statistic values

   For gene score calculation, we transformed expression-level statistics
   (i.e. p-values) using 10-based logarithms into gene-wise statistic
   scores (Step 2 in Fig. [107]1) to measure changes in gene expression in
   roots flooded after 3, 6, 12, and 24 h. The distributions of gene-wise
   statistic scores were highly skewed to the right (Fig. [108]3), as seen
   in the microarray data. The expression skewness of each dataset was
   4.82, 4.31, 4.91, and 4.24, respectively, indicating the expression
   skewness has the potential to reveal new insights into the FTgenes
   (Fig. [109]2) in the analyses of pathway enrichment and gene network.

Figure 3.

   [110]Figure 3
   [111]Open in a new tab

   Expression skewness of gene-wise statistic scores. The p-values were
   transformed into 10-based logarithms for capturing changes in
   expressions of soybean roots flooded after (A) 3 h, (B) 6 h, (C) 12 h,
   and (D) 24 h. The red solid dots represent the FTgenes with a combined
   score greater than 42. The dashed line represents − log(p-value) = 3.

Competitive method (hypergeometric test) revealed the mechanisms of
flooding-tolerant responses

   Using the hypergeometric model test (Step 3 in Fig. [112]1), we
   initially found 27 pathways (Fig. [113]4) with at least one nominal
   p-value less than 1.00 × 10^–4 that were enriched with flooding
   tolerance or response to the stress in the gene expression data from
   soybean roots after 3, 6, 12, and 24 h of submergence treatments. Among
   them, 24 pathways were significantly enriched at all four-time points
   after submergence treatments. Table [114]1 demonstrated detailed
   information on significantly enriched pathways overrepresented in the
   gene expression dataset. The top five pathways included ‘abscisic acid
   mediated signaling pathway’, ‘response to ethylene stimulus’, ‘ethylene
   biosynthetic process’, ‘hyperosmotic salinity response’, and ‘response
   to the jasmonic acid stimulus’. Two pathways, ‘abscisic acid mediated
   signaling pathway’ and ‘response to ethylene stimulus’, were the most
   significantly enriched at 3 h after submergence treatments. The pathway
   of ‘abscisic acid mediated signaling pathway’ was the most
   significantly enriched at 6 and 12 h after submergence treatments. The
   top five pathways were the most significantly enriched at 24 h after
   submergence treatments.

Figure 4.

   [115]Figure 4
   [116]Open in a new tab

   Pathway enrichment analysis of the FTgenes using the competitive
   method. Enriched GO pathways were identified by using the
   hypergeometric test. Software R v3.6.2
   ([117]https://www.r-project.org/) was used to create image.

Table 1.

   Significantly enriched pathways in gene expression data for
   flooding-tolerance using hypergeometric test.
   Annotated pathway N[G] 3 h 6 h 12 h 24 h
   N[FT] p-value^a N[FT] p-value^a N[FT] p-value^a N[FT] p-value^a
   Abscisic acid mediated signaling pathway 630 28  < 1.00 × 10^–16 29
    < 1.00 × 10^–16 28  < 1.00 × 10^–16 33  < 1.00 × 10^–16
   Response to ethylene stimulus 723 22  < 1.00 × 10^–16 19 7.86 × 10^–11
   21 1.42 × 10^–13 24  < 1.00 × 10^–16
   Ethylene biosynthetic process 268 15 1.33 × 10^–15 16 1.11 × 10^–16 11
   2.22 × 10^–10 17  < 1.00 × 10^–16
   Hyperosmotic salinity response 459 17 1.25 × 10^–14 17 2.31 × 10^–14 15
   2.32 × 10^–12 20  < 1.00 × 10^–16
   Response to jasmonic acid stimulus 754 18 3.25 × 10^–12 20
   4.80 × 10^–14 21 1.22 × 10^–15 23  < 1.00 × 10^–16
   Response to water deprivation 883 19 4.50 × 10^–12 19 8.70 × 10^–12 18
   3.06 × 10^–11 23 1.33 × 10^–15
   Response to chitin 1130 21 4.41 × 10^–12 22 1.06 × 10^–12 16
   7.46 × 10^–8 19 9.72 × 10^–10
   Intracellular signal transduction 476 15 5.37 × 10^–12 14 1.17 × 10^–10
   14 5.40 × 10^–11 17 7.02 × 10^–14
   Systemic acquired resistance, salicylic acid mediated signaling pathway
   684 17 7.53 × 10^–12 19 9.99 × 10^–14 17 5.34 × 10^–12 17 2.23 × 10^–11
   Response to fungus 310 12 8.21 × 10^–11 11 1.90 × 10^–9 12
   6.25 × 10^–11 11 2.62 × 10^–9
   Response to abscisic acid stimulus 1249 20 2.20 × 10^–10 22
   7.58 × 10^–12 19 1.11 × 10^–9 23 1.84 × 10^–12
   Regulation of hydrogen peroxide metabolic process 532 14 3.13 × 10^–10
   14 4.97 × 10^–10 12 2.73 × 10^–8 12 7.24 × 10^–8
   Jasmonic acid mediated signaling pathway 797 16 7.62 × 10^–10 16
   1.29 × 10^–9 16 5.51 × 10^–10 15 1.62 × 10^–8
   Ethylene mediated signaling pathway 309 10 1.88 × 10^–8 10 2.59 × 10^–8
   9 2.00 × 10^–7 12 1.68 × 10^–10
   Response to wounding 1025 16 2.67 × 10^–8 17 6.43 × 10^–9 18
   3.40 × 10^–10 20 2.37 × 10^–11
   MAPKKK cascade 572 12 7.82 × 10^–8 15 1.17 × 10^–10 12 6.02 × 10^–8 13
   1.80 × 10^–8
   Response to hypoxia 194 8 8.54 × 10^–8 8 1.10 × 10^–7 9 3.66 × 10^–9 9
   7.87 × 10^–9
   Regulation of plant-type hypersensitive response 1015 15 1.57 × 10^–7
   16 3.86 × 10^–8 15 1.16 × 10^–7 15 3.70 × 10^–7
   Protein targeting to membrane 1016 15 1.59 × 10^–7 16 3.91 × 10^–8 15
   1.18 × 10^–7 15 3.74 × 10^–7
   Negative regulation of defense response 762 12 1.63 × 10^–6 14
   4.68 × 10^–8 12 1.27 × 10^–6 12 3.19 × 10^–6
   Signal transduction 1301 16 6.76 × 10^–7 16 1.08 × 10^–6 18
   1.43 × 10^–8 20 1.53 × 10^–9
   Response to auxin stimulus 1009 12 2.77 × 10^–5 12 3.86 × 10^–5 14
   6.87 × 10^–7 16 5.50 × 10^–8
   Jasmonic acid biosynthetic process 415 10 2.94 × 10^–7 9 3.73 × 10^–6
   11 2.12 × 10^–8 12 4.72 × 10^–9
   Negative regulation of programmed cell death 523 10 2.37 × 10^–6 13
   4.33 × 10^–9 11 2.18 × 10^–7 13 6.26 × 10^–9
   Respiratory burst involved in defense response 418 9 3.01 × 10^–6 8
   3.25 × 10^–5 7 1.64 × 10^–4 11 5.64 × 10^–8
   Regulation of multi-organism process 262 9 6.10 × 10^–8 9 8.11 × 10^–8
   7 8.56 × 10^–6 6 1.47 × 10^–4
   Detection of biotic stimulus 276 9 2.56 × 10^–5 9 3.72 × 10^–5 7
   3.80 × 10^–3 6 6.16 × 10^–2
   [118]Open in a new tab

   N[G] total number of genes in a specific pathway (i.e. GO term), N[FT]
   number of overlap genes between FTgenes and pathway.

   ^aPathways with p-values, calculated using competitive method
   (hypergeometric test), less than 1.00 × 10^–4 were significantly
   enriched with flooding-tolerant responses in soybean.

   Significant values are in bold.

Self-contained methods (SUMSTAT, SUMSQ) revealed the mechanisms of
flooding-tolerant responses

   The 144 FTgenes (Fig. [119]2) were significantly enriched in fourteen
   GO pathways (14 in SUMSTAT and 1 in SUMSQ) after controlling the false
   discovery rate at the 0.05 level in the self-contained approaches (Step
   3 in Figs. [120]1, [121]5). Among them, only one GO pathway, ‘response
   to hypoxia’, was found at all four-time points in both methods. Tables
   [122]2 and [123]3 demonstrated detailed information of significantly
   enriched pathways overrepresented in the gene expression dataset using
   SUMSTAT and SUMSQ, respectively. Five GO pathways were significantly
   enriched after submergence treatments at all four-time points. The top
   five pathways included ‘response to hypoxia’, ‘response to cadmium
   ion’, ‘systemic acquired resistance’, ‘salicylic acid mediated
   signaling pathway’, ‘regulation of hydrogen peroxide metabolic
   process’, and ‘glycolysis’. One pathway, ‘response to hypoxia’, was the
   most significantly enriched at 3 h after submergence treatments. Three
   pathways, including ‘response to hypoxia’, ‘systemic acquired
   resistance, salicylic acid mediated signaling pathway’, and ‘response
   to cadmium ion’ were the most significantly enriched at both 6 and 12 h
   after submergence treatments. The ‘response to wounding’ pathway and
   the top five pathways were the most significantly enriched at 24 h
   after submergence treatments. Six pathways (‘response to wounding’,
   ‘ethylene mediated signaling pathway’, ‘carboxy-lyase activity’,
   ‘thiamine pyrophosphate binding’, ‘response to cold’, and ‘cell wall’)
   were not enriched at 3 h at the beginning but enriched later during
   6–24 h after submergence treatments.

Figure 5.

   [124]Figure 5
   [125]Open in a new tab

   Pathway enrichment analysis of the FTgenes using the self-contained
   methods. Enriched GO pathways were identified based on 10,000
   permutations by using (A) SUMSTAT and (B) SUMSQ statistics. Software R
   v3.6.2 ([126]https://www.r-project.org/) was used to create image.

Table 2.

   Significantly enriched pathways in gene expression data for
   flooding-tolerance using SUMSTAT statistic.
   Pathway N[G] 3 h 6 h 12 h 24 h
   N[FT] p-value^a N[FT] p-value^a N[FT] p-value^a N[FT] p-value^a
   Response to hypoxia 194 8  < 1.00 × 10^–7 8  < 1.00 × 10^–7 9
    < 1.00 × 10^–7 9  < 1.00 × 10^–7
   Response to cadmium ion 1301 10 8.00 × 10^–6 9  < 1.00 × 10^–7 9
    < 1.00 × 10^–7 9  < 1.00 × 10^–7
   Systemic acquired resistance, salicylic acid mediated signaling pathway
   684 17 1.40 × 10^–5 19  < 1.00 × 10^–7 17  < 1.00 × 10^–7 17
    < 1.00 × 10^–7
   Regulation of hydrogen peroxide metabolic process 532 14 6.00 × 10^–6
   14 2.00 × 10^–6 12  < 1.00 × 10^–7 12  < 1.00 × 10^–7
   Glycolysis 658 5 3.00 × 10^–5 6  < 1.00 × 10^–7 6 8.00 × 10^–6 6
    < 1.00 × 10^–7
   Response to wounding 1025 16 1.00 17 4.40 × 10^–5 18 2.84 × 10^–3 20
    < 1.00 × 10^–7
   Magnesium ion binding 247 5 1.04 × 10^–4 5 1.80 × 10^–5 5 1.52 × 10^–4
   5 4.00 × 10^–6
   Gluconeogenesis 462 5 2.60 × 10^–4 5 8.80 × 10^–5 5 8.56 × 10^–4 5
   3.20 × 10^–5
   Pyruvate decarboxylase activity 18 3 1.22 × 10^–3 3 5.80 × 10^–5 3
   1.20 × 10^–4 3 1.60 × 10^–5
   Ethylene mediated signaling pathway 309 10 1.00 10 2.00 × 10^–4 9
   1.76 × 10^–4 12 4.00 × 10^–6
   Carboxy-lyase activity 38 3 1.00 3 3.32 × 10^–4 3 4.08 × 10^–4 3
   9.80 × 10^–5
   Thiamine pyrophosphate binding 29 3 1.00 3 1.04 × 10^–4 3 4.34 × 10^–4
   3 4.60 × 10^–5
   Response to cold 1113 11 1.00 13 1.65 × 10^–3 15 2.92 × 10^–3 16
   2.20 × 10^–5
   Cell wall 1355 6 1.00 6 3.00 × 10^–5 6 1.26 × 10^–4 5 1.49 × 10^–3
   [127]Open in a new tab

   N[G] total number of genes in a specific pathway (i.e. GO term), N[FT]
   number of overlap genes between FTgenes and pathway.

   ^aPathways with p-values, calculated based on 10,000 permutations using
   self-contained method (SUMSTAT statistic), less than 1.00 × 10^–4 were
   significantly enriched with flooding-tolerant responses in soybean.

   Significant values are in bold.

Table 3.

   Significantly enriched pathways in gene expression data for
   flooding-tolerance using SUMSQ statistic.
   Pathway N[G] 3 h 6 h 12 h 24 h
   N[FT] p-value^a N[FT] p-value^a N[FT] p-value^a N[FT] p-value^a
   Response to hypoxia 194 8 6.00 × 10^–5 8 4.20 × 10^–5 9 1.00 × 10^–5 9
    < 1.00 × 10^–6
   [128]Open in a new tab

   N[G] total number of genes in a specific pathway (i.e. GO term), N[FT]
   number of overlap genes between FTgenes and pathway.

   ^aPathways with p-values, calculated based on 10,000 permutations using
   self-contained method (SUMSQ statistic), less than 1.00 × 10^–4 were
   significantly enriched with flooding-tolerant responses in soybean.

   We found that four pathways (response to hypoxia, systemic acquired
   resistance, salicylic acid mediated signaling pathway, regulation of
   hydrogen peroxide metabolic process, and response to wounding) were
   reported in both competitive and self-contained approaches. However,
   there was no overlap among the top five pathways in both approaches.

Gene network analysis selects the key genes relevant to flooding-tolerant
responses

   Since many FTgenes were involved in flooding-tolerant responses, we
   conducted a functional gene network analysis (Step 4 in Fig. [129]1) to
   better understand how these FTgenes work together. Among the 144
   FTgenes, 103 were found to have protein–protein interactions (PPIs) in
   the soybean interactome. Using the functional modules analytic tool in
   SoyNet, we successfully constructed a gene network specific to
   flooding-tolerant responses in soybean (Fig. [130]6; Sheet 1 in
   Supplementary Table [131]1). This gene network contained 103 FTgenes
   and 70 intermediate genes that were highly connected nodes (hubs) in
   the reference network and hence recruited in the gene network. The
   degree values of the 173 genes ranged from 0 to 66, with an average
   degree of 7.32. Of which, 110 genes (degree values between 0 and 2) and
   13 genes (degree values between 3 and 10) had a low degree of
   centrality and were hence excluded from the gene network. Figure [132]6
   demonstrated a dense gene network containing 50 genes, of which 23
   genes had degree values between 20 and 30, demonstrating a high degree
   of centrality in the gene network. The 23 genes (highlighted in
   yellow), including eight FTgenes and 15 significant intermediate genes,
   had an average degree of 25.5. Among them, the eight FTgenes
   (Glyma.02g222400, Glyma.18g009700, Glyma.13g231700, Glyma.13g361900,
   Glyma.15g012000, Glyma.07g153100, Glyma.01g118000, and
   Glyma.15g011900), reported in both competitive and self-contained
   pathway analytic strategies (Table [133]4), demonstrated a high degree
   of centrality ranged between 20 and 30 (the average degree was 26). The
   eight FTgenes are mainly related to signal transduction
   (Glyma.13g361900, Glyma.15g011900, and Glyma.15g012000),
   energy-producing (Glyma.02g222400, Glyma.13g361900, and
   Glyma.18g009700), and plant hormone regulation (Glyma.15g011900 and
   Glyma.15g012000), indicating they play important roles in
   flooding-tolerant responses in soybean.

Figure 6.

   [134]Figure 6
   [135]Open in a new tab

   The gene network analysis of the FTgenes. 103 FTgenes were selected
   from enriched pathways to construct the gene network. Software
   Cytoscape v3.9.0 ([136]https://cytoscape.org/) was used to create
   image.

Table 4.

   Contributions of the FTgenes in three pathway analysis.
   Gene N^pw N^method Gene size (kb) Z-score^a Annotation Source
   Glyma.02g195300 23 3 2.723 10.62 Intracellular protein transport
   Uniprot GO
   Glyma.05g021100 20 3 6.774 0
   Glyma.08g218600 20 3 2.408 19.59 Regulation of defense response
   Arabidopsis GO
   Glyma.14g102900 20 3 2.732 0
   Glyma.13g234500 19 3 1.059 0 Cell redox homeostasis AgriGO-BP, Uniprot
   GO
   Glyma.13g361900 19 3 6.667 37.14 Abscisic acid transport; response to
   ethylene; ATP catabolic process Arabidopsis GO, Uniprot GO
   Glyma.14g127800 19 3 8.822 18.72 ATP catabolic process Uniprot GO
   Glyma.15g011900 19 3 6.771 33.59 Abscisic acid transport; response to
   ethylene; ATP catabolic process Arabidopsis GO, Uniprot GO
   Glyma.15g012000 19 3 8.106 33.24 Abscisic acid transport; response to
   ethylene; ATP catabolic process Arabidopsis GO, Uniprot GO
   Glyma.13g279900 18 3 1.663 0 Jasmonic acid mediated signaling pathway
   Arabidopsis GO
   Glyma.03g148300 17 3 2.556 0 Response to gibberellin Uniprot GO,
   Arabidopsis GO
   Glyma.03g112400 16 2 0.611 NA
   Glyma.02g005600 16 3 2.498 6.31
   Glyma.04g044900 16 3 2.290 36.94 Photosynthesis; response to oxidative
   stress Arabidopsis GO
   Glyma.06g045400 16 3 2.250 4.37 Photosynthesis; regulation of root
   development; negative regulation of transcription, DNA-templated
   Arabidopsis GO, Uniprot GO
   Glyma.10g180800 16 3 2.372 20.14
   Glyma.17g236200 16 3 1.335 20.55 Negative regulation of transcription,
   DNA-templated; regulation of root development; response to abscisic
   acid Arabidopsis GO, Uniprot GO
   Glyma.09g153900 14 3 4.318 30.18 Glycolysis AgriGO-BP, Uniprot GO
   Glyma.16g204600 14 3 4.389 32.62 Glycolysis AgriGO-BP, Uniprot GO
   Glyma.02g054200 13 1 2.652 0 Methylation Uniprot GO
   Glyma.19g013700 13 2 2.764 32.28 Response to abscisic acid Uniprot GO
   Glyma.07g153800 13 3 4.777 0
   Glyma.18g009700 12 2 4.273 26.73 Gluconeogenesis; oxidation–reduction
   process; glucose metabolic process Arabidopsis GO, Uniprot GO
   Glyma.02g268200 12 3 1.844 1.84
   Glyma.05g123900 12 3 1.228 0
   Glyma.05g124000 12 3 1.304 0
   Glyma.08g050400 12 3 1.838 NA Ethylene biosynthetic process Arabidopsis
   GO
   Glyma.14g049000 12 3 8.736 NA
   Glyma.14g049200 12 3 1.773 0
   Glyma.02g009800 11 3 1.645 0
   Glyma.11g181200 10 2 2.255 5.85
   Glyma.07g153100 10 3 3.256 20.67 Response to anoxia Arabidopsis GO
   Glyma.12g149100 10 3 2.385 0 Jasmonic acid mediated signaling pathway
   Arabidopsis GO
   Glyma.01g206600 9 2 1.517 5.91 Positive regulation of transcription,
   DNA-templated; regulation of transcription, DNA-templated Arabidopsis
   GO
   Glyma.11g036500 9 2 1.653 6.05 Positive regulation of transcription,
   DNA-templated; regulation of transcription, DNA-templated Arabidopsis
   GO
   Glyma.17g145400 9 2 1.383 2.43 Positive regulation of transcription,
   DNA-templated Arabidopsis GO
   Glyma.09g276600 8 1 1.786 3.30 Signal transduction AgriGO-BP, Uniprot
   GO
   Glyma.14g203000 8 1 3.251 0 Protein phosphorylation; response to pH
   Arabidopsis GO
   Glyma.18g212700 8 1 2.046 3.30 Signal transduction AgriGO-BP
   Glyma.13g231700 8 3 4.293 24.27
   Glyma.05g150100 7 1 0.532 0
   Glyma.08g106900 7 1 0.539 1.85
   Glyma.11g180500 7 1 2.800 33.04 Nuclear-transcribed mRNA poly(A) tail
   shortening Arabidopsis GO
   Glyma.12g187400 7 1 1.473 18.24 Defense response to bacterium; response
   to wounding Uniprot GO
   Glyma.13g314100 7 1 1.223 10.21 Nuclear-transcribed mRNA poly(A) tail
   shortening Arabidopsis GO
   Glyma.13g005100 7 2 1.603 16.83
   Glyma.20g064500 7 2 0.701 0
   Glyma.04g240800 7 3 2.998 0 Response to osmotic stress; cellular
   respiration Arabidopsis GO
   Glyma.14g121200 7 3 3.604 4.05 Response to osmotic stress; response to
   hypoxia Arabidopsis GO, Uniprot GO
   Glyma.11g121900 6 1 1.129 0
   Glyma.13g239000 6 1 5.001 0 Fatty acid biosynthetic process Uniprot GO
   Glyma.04g190900 6 3 2.889 3.01 Abscisic acid-activated signaling
   pathway Arabidopsis GO
   Glyma.02g211600 5 1 4.720 17.25
   Glyma.10g195700 5 1 3.781 0 Response to jasmonic acid Uniprot GO
   Glyma.04g231400 5 3 1.856 0
   Glyma.06g133800 5 3 1.767 0
   Glyma.08g176300 5 3 1.748 0 Response to osmotic stress Uniprot GO
   Glyma.11g222600 5 3 2.090 3.99 Response to abscisic acid Arabidopsis GO
   Glyma.03g173300 4 1 0.887 42.51 Regulation of root development;
   regulation of transcription, DNA-templated Arabidopsis GO
   Glyma.07g105700 4 1 1.375 8.28 MAPK cascade; ethylene biosynthetic
   process Arabidopsis GO
   Glyma.09g172500 4 1 1.443 3.85 Ethylene biosynthetic process; systemic
   acquired resistance, salicylic acid mediated signaling pathway
   Arabidopsis GO
   Glyma.09g250700 4 1 2.983 0 Dephosphorylation Uniprot GO
   Glyma.12g093100 4 1 1.787 30.92 Nuclear-transcribed mRNA poly(A) tail
   shortening Arabidopsis GO
   Glyma.13g032100 4 1 2.468 0
   Glyma.15g045600 4 1 2.655 0
   Glyma.18g042100 4 1 1.831 15.71 Protein ubiquitination AgriGO-BP,
   Uniprot GO
   Glyma.19g174200 4 1 0.805 25.18 Regulation of root development;
   regulation of transcription, DNA-templated Arabidopsis GO
   Glyma.01g118000 4 2 3.537 18.13 Response to anoxia Arabidopsis GO
   Glyma.02g222400 4 2 3.121 11.54 Glycolysis; response to cadmium ion
   Glyma.13g270100 4 2 2.011 5.43
   Glyma.08g133600 4 3 3.2 Response to osmotic stress Arabidopsis GO
   Glyma.11g149900 4 3 2.784 2.38
   Glyma.11g255000 4 3 2.822 0 Response to osmotic stress Arabidopsis GO
   Glyma.04g092100 3 1 1.16 11.90
   Glyma.08g128500 3 1 6.27 0 Cellular metabolic process AgriGO-BP,
   Uniprot GO
   Glyma.17g164100 3 2 3.264 0 Negative regulation of catalytic activity
   AgriGO-BP
   Glyma.08g128100 3 3 1.163 0
   Glyma.12g150500 3 3 2.312 1.79
   Glyma.12g222400 3 3 2.147 2.38
   Glyma.01g037200 2 1 1.785 0
   Glyma.02g148200 2 1 1.675 3.56
   Glyma.05g108900 2 1 2.098 3.44 Ethylene biosynthetic process
   Arabidopsis GO
   Glyma.13g208000 2 1 1.889 1.87
   Glyma.17g158100 2 1 2.109 3.44 Ethylene biosynthetic process; cell
   division Arabidopsis GO
   Glyma.11g055700 2 3 8.44 0 Abscisic acid biosynthetic process
   Arabidopsis GO, Uniprot GO
   Glyma.17g174500 2 3 12.853 0 Abscisic acid biosynthetic process
   Arabidopsis GO, Uniprot GO
   Glyma.02g134500 1 1 1.662 0
   Glyma.03g015800 1 1 0.831 6.58 Negative regulation of defense response
   Arabidopsis GO
   Glyma.04g136600 1 1 0.864 3.70
   Glyma.06g100900 1 1 1.206 4.36
   Glyma.10g073600 1 1 1.716 4.86 Anaerobic respiration Arabidopsis GO
   Glyma.11g028200 1 1 2.816 0 Cellular amino acid metabolic process
   Uniprot GO
   Glyma.18g206000 1 1 4.469 3.99 Defense response; abscisic
   acid-activated signaling pathway AgriGO-BP, Arabidopsis GO
   Glyma.18g238100 1 1 3.714 NA
   Glyma.19g069200 1 1 3.582 0 Protein dephosphorylation Uniprot GO
   Glyma.19g213300 1 1 4.844 0 Negative regulation of ethylene-activated
   signaling pathway Arabidopsis GO
   Glyma.07g031400 1 2 5.646 0 ATP catabolic process Uniprot GO
   Glyma.08g199800 1 2 5.023 0 Glycolysis AgriGO-BP, Uniprot GO
   Glyma.09g149200 1 2 2.164 0 Gibberellin biosynthetic process
   Arabidopsis GO
   Glyma.13g250400 1 2 4.118 0
   Glyma.07g049900 1 3 4.782 3.37 Glucan biosynthetic process Uniprot GO
   Glyma.16g018500 1 3 4.614 3.37 Glucan biosynthetic process Uniprot GO
   Glyma.20g218100 1 3 4.614 0 Glucan biosynthetic process Uniprot GO
   [137]Open in a new tab

   N^pw number of enriched pathways, N^method number of pathway analytic
   approaches, NA not available, GO gene ontology.

   ^aZ-score value was calculated using a binomial proportions test.

   Significant values are in bold.

   We selected 77 FTgenes from 24 significantly enriched pathways reported
   in the hypergeometric test to compute node edges in SoyNet and
   construct a gene network in Cytoscape. As a result, 103 genes (74
   FTgenes and 29 intermediate genes) were retained, with degree values
   ranging from 0 to 35 (the average degree was 9.68). For simplicity, we
   further grouped 60 genes having a row degree of centrality into a node
   (named as Group0_2), and included the node with the remaining 43 genes
   (15 FTgenes and 28 intermediate genes) to form a gene network
   (Fig. [138]7; Sheet 2 in Supplementary Table [139]1). Of those, 20
   genes (highlighted in yellow) having higher degree values between 20
   and 30, with an average degree of 28.2, demonstrated to interact with
   each other more closely in the gene network. Among them, one Ftgenes
   (Glyma.14g127800) is mainly related to plant hormone transport that
   contributes to flooding-tolerant responses.

Figure 7.

   [140]Figure 7
   [141]Open in a new tab

   The gene network analysis of FTgenes associated with enriched pathways
   in the hypergeometric test. A total of 77 FTgenes were selected from 24
   significantly enriched pathways in hypergeometric tests in four
   time-point databases to construct the gene network. Software Cytoscape
   v3.9.0 ([142]https://cytoscape.org/) was used to create image.

   Another 34 Ftgenes from 5 significantly enriched pathways reported in
   SUMSTAT were being computed edges in SoyNet, resulting in 65 genes (32
   Ftgenes and 33 intermediate genes), with degree values ranging from 0
   to 71 (the average degree was 17.94). We further constructed a gene
   network in Cytoscape (Fig. [143]8; Sheet 3 in Supplementary Table
   [144]1), and observed that 29 genes (highlighted in yellow) were highly
   connected to form a dense module, with higher degree values between 20
   and 29 (the average degree was 25.5). Among them, 6 Ftgenes are mainly
   related to signal transduction (Glyma.15g011900, Glyma.15g012000),
   plant hormone transport (Glyma.14G127800, Glyma.18G009700), and enzyme
   catalytic activity (Glyma.13G231700, Glyma.07G153100).

Figure 8.

   [145]Figure 8
   [146]Open in a new tab

   The gene network analysis of FTgenes associated with enriched pathways
   in SUMSTAT test. A total of 34 FTgenes were selected from 5
   significantly enriched pathways in SUMSTAT in four time-point databases
   to construct the gene network. Software Cytoscape v3.9.0
   ([147]https://cytoscape.org/) was used to create image.

   The above results show closely connected PPIs in the soybean
   interactome by entering the 144 FTgenes, 24 enriched pathways at all
   four-time points (3, 6, 12, and 24 h) in the hypergeometric test, and 5
   enriched pathways at all four-time points in the SUMSTAT method,
   respectively. These genes were first compared to 23 (Fig. [148]6), 20
   (Fig. [149]7), and 29 (Fig. [150]8) selected important genes (including
   the FTgenes and the intermediate genes) to examine their topological
   characteristics. Our results showed that these important genes had
   higher degree values in all comparisons, suggesting high degree of
   centrality. These FTgenes (103, 77, and 34 FTgenes) were further
   compared to the intermediate genes (70, 29, and 33 genes) and the
   remaining genes (14,599, 16,911, and 18,993 genes), respectively. We
   found that the FTgenes and the intermediate genes in the corresponding
   gene network more frequently received small p-values at all four-time
   points in gene expression datasets.

   To further explore the key FTgenes, we selected 25 FTgenes from 5
   significantly enriched pathways that overlapped in both the
   hypergeometric test and the SUMSTAT method to compute node edges in
   SoyNet and construct a gene network in Cytoscape. As a result, a gene
   network containing 25 FTgenes and 26 intermediate genes was obtained
   (Fig. [151]9; Sheet 4 in Supplementary Table [152]1), with degree
   values ranged from 0 to 28 (the average degree was 13.68). Of them, 26
   genes (highlighted in yellow) were closely linked, having higher degree
   values between 20 and 30, with an average degree of 25.3. Among them,
   four FTgenes (Glyma.13g361900, Glyma.15g012000, Glyma.15g011900, and
   Glyma.14g127800) exhibited higher degree values ranged between 26 and
   28, with an average degree value of 26.3. The four FTgenes are mainly
   related to signal transduction (Glyma.13g361900, Glyma.15g011900,
   Glyma.15g012000) and plant hormone transport (Glyma.14g127800) to play
   key roles in flooding-tolerant responses in soybean.

Figure 9.

   [153]Figure 9
   [154]Open in a new tab

   The gene network analysis of FTgenes associated with enriched pathways
   in hypergeometric test and SUMSTAT. A total of 25 FTgenes were selected
   from 5 enriched pathways overlapped in both hypergeometric test and
   SUMSTAT to construct the gene network. Software Cytoscape v3.9.0
   ([155]https://cytoscape.org/) was used to create image.

   More importantly, all these FTgenes in the corresponding gene networks
   had significantly larger mean scores (or smaller p-values) in the
   corresponding gene expression datasets (p-values < 0.001) compared to
   the remaining genes (Fig. [156]10). Similar scenarios were also
   observed in the intermediate genes, although they did not reach
   significance level at 0.05. In particular, we further selected 8, 1,
   and 6 key FTgenes from the corresponding gene network, respectively,
   and found that these key FTgenes significantly outperformed all gene
   groups (Step 5 in Figs. [157]1, [158]10). The nine key FTgenes
   (Fig. [159]11A) were involving with signal transduction
   (Glyma.15g012000 and Glyma.15g011900), energy (Glyma.02g222400,
   Glyma.18g009700, Glyma.13g361900, and Glyma.14g127800), enzyme activity
   (Glyma.07g153100 and Glyma.13g231700), and unknown function
   (Glyma.01g118000), which were significantly related to abscisic acid
   transport and terpenoid transport.

Figure 10.

   [160]Figure 10
   [161]Open in a new tab

   Validation study of the FTgenes (and the key FTgenes) compared to the
   intermediate genes and the remaining genes using an independent RNA-seq
   data. (A) 103 FTgenes were selected from enriched pathways. (B) 77
   FTgenes were selected from 24 significantly enriched pathways in the
   hypergeometric test. (C) 34 FTgenes were selected from 5 significantly
   enriched pathways in SUMSTAT. The dark yellow and red dotted line
   presents the significance level of the key FTgenes and the FTgenes
   against other gene sets at 0.05, respectively. The red star represents
   that the key FTgenes have a significantly higher mean score than all
   other gene sets. The dark grey star represents that the FTgenes have a
   significantly higher mean score than the remaining genes. The threshold
   of statistical significance level: *p-value < 0.05, **p-value < 0.01,
   ***p-value < 0.001.

Figure 11.

   [162]Figure 11
   [163]Open in a new tab

   Summary information of nine key FTgenes. (A) The sunburst chart of the
   nine key FTgenes and their involved gene functions. (B) The evidence
   was obtained from different layers for the nine key FTgenes.

   To validate the flooding stress responses in the plant cell, a
   real-time quantitative reverse transcription polymerase chain reaction
   (qRT-PCR) was used to measure the level of the nine key FTgenes
   expressions in soybean root under flooding stress (Step 6 in Figs.
   [164]1, [165]12). Our results revealed that four energy involved genes
   (Glyma.02g222400, Glyma.18g009700, Glyma.13g361900, and
   Glyma.14g127800) were significantly upregulated from 3 to 24 h except
   for Glyma.14g127800 which showed downregulation in all conditions
   compared with the control (i.e. untreated condition). In the enzyme
   activity involved genes (Glyma.07g153100 and Glyma.13g231700), the
   highest expression was found at 12 h after treatment. For those of
   signal transduction involved genes, the transcript level of
   Glyma.15g011900 and Glyma.15g012000 was significantly higher than the
   control from 3–24 h to 6–24 h, respectively. Interestingly, the
   Glyma.01g118000 which is an unknown function gene exhibited around
   330–7000 times higher expression level than the control, and when
   compared with the other, this gene also has the highest relative gene
   expression.

Figure 12.

   [166]Figure 12
   [167]Open in a new tab

   Relative gene expression of 9 key FTgenes in soybean root under
   flooding stress. Actin was used as a reference gene, and untreated was
   used as the reference condition (ctrl). Error bars indicate ± standard
   error. Different lowercase letters above columns indicate significant
   differences at p < 0.05.

Discussion

   Understanding genetic backgrounds and molecular mechanisms underlying
   flooding-tolerant responses is imperative for soybean breeding.
   However, the success in identifying candidate genes for
   flooding-tolerant responses in soybean has been limited because of the
   complex nature of abiotic stresses. The present study introduced
   systems biology methods using pathway enrichment analysis (both
   competitive and self-contained approaches were considered) and gene
   network analysis to evaluate the joint effects of multi-genes (in this
   context, FTgenes) within annotated GO pathways. Most importantly, the
   FTgenes^[168]36 (Fig. [169]2) used in this study were prioritized from
   multiple OnO databases integrated from experimental and computational
   studies that have been made available in the last decades. In
   particular, several data-ensemble approaches were performed, including
   data cleaning, data harmonization, data heterogeneity, and data
   mapping, to remove unwanted data and inaccurate data. Through the
   process of gene prioritization, the uncertainties, noise, biases, and
   false positives raised from the data itself and statistical approaches
   could be reduced effectively.

   Systems biology often requires sophisticated computational models and
   simulations to understand the larger picture of the biological systems
   by studying interactions among a set of candidate genes^[170]59.
   Integrative pathway and network analysis marry the idea of mathematical
   graph theory and data-driven approach (e.g. multiple omics data, OnO
   data integration) to efficiently uncover the genotype–phenotype
   relationship at the systems level by integrating knowledge of gene
   regulation and function. In an attempt to integrate OnO data with
   mathematical graph theory, we introduced an integrative pathway and
   network approach to construct a comprehensive view of the biological
   mechanisms for flooding-tolerant responses in soybean. To the best of
   our knowledge, this is the first work on the pathway and network
   analyses using candidate genes prioritized from multiple OnO data
   integration algorithms. Our results reveals novel molecular pathways
   and functional relationships of the FTgenes to better understand their
   biological implications in the regulatory system for further
   validation.

   Skewness is widely used to measure the degree of asymmetry in
   expression data. Expression skewness can identify novel molecular
   pathways and key genes via the systems biology approaches (e.g.
   enrichment pathway analysis and functional network analysis), which is
   a valuable way to capture meaningful outliers (i.e. the greatest
   variation between samples with and without submergence) and
   asymmetrical behavior in the whole genome expression dataset^[171]60.
   Our results demonstrated a high degree of skewness (Fig. [172]3) that
   was appropriate for pathway and network analyses. In addition, our
   results may provide valuable insights into exploring mechanisms
   underlying flooding-tolerant responses in soybean.

   In the studies of crops, pathway analysis is merely used for the
   exploration of candidate genes focusing on specific
   traits^[173]58,[174]61. In general, pathway analysis can be
   distinguished into two different approaches, the competitive and the
   self-contained, according to their null hypothesis^[175]48. In
   practical applications, however, two different approaches often
   generated inconsistent results^[176]62 due to distinct null hypotheses.
   The competitive methods can potentially exclude confounding effects and
   provide biological relevance to the analysis^[177]63. The
   self-contained methods have the greater power to identify feature-set
   (i.e. GO pathways), and the outcomes are highly reproducible^[178]64.
   Both approaches have their strengths and limitations. Therefore, a
   suitable way to gain better insights into the data is to perform the
   competitive and the self-contained approaches simultaneously for
   feature-set (i.e. GO pathways) testing. This could reduce the
   likelihood of false-positive results and gain biological relevance to
   the analysis.

   In this study, we identified 36 overrepresented GO pathways (Tables
   [179]1, [180]2, [181]3) in the independent RNA-seq databases of
   submergence treatments in soybean. The most frequently shared FTgenes
   among enriched pathways were Glyma.02g195300 (functioning in 23
   pathways), Glyma.05g021100 (functioning in 20 pathways),
   Glyma.08g218600 (functioning in 20 pathways), and Glyma.14g102900
   (functioning in 20 pathways), which were found in two or more
   pathway-based methods (Table [182]4). Many of these FTgenes (6, 18, 10,
   and 12 FTgenes in 3, 6, 12, and 24 h, respectively) were not
   significantly overrepresented at the single gene-level in the RNA-seq
   databases (Fig. [183]3); however, they were enriched (p-values < 0.001)
   with flood-tolerant responses at the systems level using pathway-based
   analytic approaches. For instance, Glyma.17g236200, Glyma.19g013700,
   and Glyma.11g180500 gene did not reach genome-wide significant
   association, but were found at the systems level in our approaches. In
   particular, Arabidopsis GO and Uniprot GO databases provide
   opportunities to access a better understanding of how these FTgenes
   participate in flooding activities. Under flooding conditions, the
   Glyma.17g236200 gene regulates root development to prevent from
   wounding, and the Glyma.19g013700 gene mediates the transpiration
   efficiency by regulating ABA to control stomata closure. In further,
   the Glyma.11g180500 gene participates in RNA regulation, producing
   factors to control where plant hormones should work. Evidence from
   previous studies confirmed the roles of these important FTgenes and
   pathways identified in this study for the complex mechanisms of
   flooding-tolerant responses in soybean. These findings indicate that
   systems biology methods can boost the power to reveal the potential
   roles of FTgenes in uncovering the molecular mechanisms and biological
   novelties for studying flooding-tolerant responses in soybean.

   The hypothesis and the model of different categories of pathway
   analysis are distinct; hence, the results are also different. In this
   study, we compared the results across the hypergeometric test, the
   SUMSTAT method, and the SUMSQ method. In total, 27, 14, and 1 enriched
   pathway(s) were identified in the hypergeometric test (Table [184]1,
   Fig. [185]4), the SUMSTAT method (Table [186]2, Fig. [187]5A), and the
   SUMSQ method (Table [188]3, Fig. [189]5B), respectively. The three
   methods found only one pathway, ‘response to hypoxia’ in all four-time
   points (3, 6, 12, and 24 h) of gene expression data. Under flooding
   conditions, the response to hypoxia begins with low-oxygen stimulation,
   followed by activates the transcription of plant hormone genes. Plant
   hormones, such as ABA, ethylene, and salicylic acid, are involved in
   participating in roots recovery^[190]65. Of which five pathways were
   consistently reported by both the competitive and the self-contained
   approaches, even a more stringent threshold was applied to correct for
   multiple testing. The ‘systemic acquired resistance, salicylic acid
   mediated signaling pathway’ is responsible for regulating the
   biosynthesis, the perception, and the signal mediating^[191]66. When a
   plant suffers from flooding, hydrogen peroxide begins to express in
   roots to remove some harmful chemicals from flooding stress, and
   salicylic acid mediates a series of signals in producing hydrogen
   peroxide^[192]66,[193]67. Evidence shows the important fact that the
   regulation of hydrogen peroxide interacts with salicylic acid by
   signaling series forms to eliminate fatal chemicals in roots cell under
   flooding stress^[194]66–[195]68. Thus, ‘regulation of hydrogen peroxide
   metabolic process’ and ‘systemic acquired resistance, salicylic acid
   mediated signaling pathway’ are evidenced to be linked to
   flooding-tolerant responses in soybean. Besides, rice was also
   evidenced to be involved with these two pathways under flooding
   stress^[196]69,[197]70. In soybean roots, cell wall and aerenchyma will
   swell under flooding. The response to wounding in roots leads to many
   salicylic acid signals activating and interacting with other plant
   hormones in order to restore the wounds^[198]71. After wounding, the
   soybean’s adventitious roots will grow against hypoxia environments.
   The energy from glycolysis and pyruvate-phosphorylation is consumed
   when soybean grows adventitious. Evidence showed that ‘response to
   wounding’ and plant hormone-related pathways may play key roles in
   flooding-tolerant responses in soybean. In addition, the
   gluconeogenesis and glycolysis, which can synthesize or degrade
   carbohydrates, make crops gain and store adequate ATPs in order to get
   more energy^[199]20–[200]22,[201]33. All the evidence suggested that
   ‘glycolysis’, ‘gluconeogenesis’, ‘pyruvate decarboxylase activity’,
   ‘abscisic acid mediated signaling pathway’, and ‘regulation of hydrogen
   peroxide metabolic process’ were found to be linked to
   flooding-tolerant responses in soybean, which were in line with the
   previous studies^[202]20–[203]22,[204]33. All these pathways mentioned
   above play key roles in the physiological mechanisms underlying
   flooding-tolerant responses. Our results demonstrated that the pathways
   we found differ considerably between distinct types of pathway
   analyses. Hence, combining distinct pathway-based analyses with
   considering different hypotheses and models can provide comprehensive,
   precise, and reliable results.

   Ethylene is important to protein phosphorylation in the mechanisms of
   the initial stage of flooding stress, especially in root tips. Evidence
   shows that roots recovery needs more ATP to provide energy and protein
   phosphorylation to develop the cell
   tissue^[205]25,[206]28,[207]29,[208]37. At the initial stage of
   flooding, root cells are stimulated by ethylene, and a series of
   mediated signaling produce more ethylene. The evidence proves that
   ‘response to ethylene stimulus’, ‘ethylene biosynthetic process’, and
   ‘ethylene mediated signaling pathway’ are important to flooding stress.

   Our results also showed 22 (Table [209]1, Fig. [210]4) and 9 (Table
   [211]2, Fig. [212]5A) enriched pathways specific to the hypergeometric
   test and the SUMSTAT method, respectively. Without comparing the
   results of two different approaches, we might obtain false-positive and
   false-negative results. For instance, two pathways, ‘glycolysis’ and
   ‘gluconeogenesis’, were evidenced^[213]19–[214]21,[215]33 and found in
   the SUMSTAT approach but not in the hypergeometric test, and hence they
   are false-negative results; the ‘abscisic acid mediated signaling
   pathway’ pathway was evidenced^[216]22,[217]25,[218]28,[219]33 and
   reported in the hypergeometric test but not in the SUMSTAT, and hence
   it is a false-negative result. Two pathways, ’response to chitin’ and
   ‘response to fungus’, were significantly enriched in the hypergeometric
   test, but did not reach the significance in the SUMSTAT approach.
   Besides, the two pathways were evidenced to be related to biotic
   stress^[220]72. Hence, the two pathways might be false-positive results
   or novel findings that need further validation. Again, combining the
   competitive and the self-contained methods is a promising approach to
   better understanding a given candidate genes for a trait of interest.

   Benefits from combining advantages of pathway enrichment analysis and
   network analysis, our study not only discovers new novelty about the
   flooding mechanisms in soybean but also captures more information of
   biological systems. For instance, We finally selected nine key FTgenes,
   four of these genes (Glyma.13g361900, Glyma.14g127800, Glyma.15g012000,
   and Glyma.15g011900) were recorded in DNA, RNA, protein, function, and
   homologs layer; one gene (Glyma.18g009700) was recorded in RNA and
   protein layer; and four genes (Glyma.02g222400, Glyma.07g153100,
   Glyma.13g231700, and Glyma.01g118000) were recorded in RNA, protein,
   and homolog layer^[221]36 (Fig. [222]11B). These key genes may play
   important roles in coordinating physiological mechanisms under
   flooding-tolerant responses in soybean.

   The systems biology framework proposed in this study demonstrated the
   power in identifying the nine key FTgenes in a rigorous and efficient
   manner. To validate the 9 key FTgenes, in planta FTgenes expression
   analysis was performed. Our qRT-PCR results (Fig. [223]12) revealed
   that eight key FTgenes were upregulated and one FTgene was
   downregulated after exposed to flooding stress. The results
   demonstrated the unique and differential response of soybean leaf
   tissue under flooding, offering the evidence of the real response to
   flooding in genetics and molecular biology. Our results can be supplied
   as a good foundation for the gene function analysis underlying
   flooding-tolerant responses in further work.

   Although systems biology takes advantage of a comprehensive and
   systematic understanding of the FTgenes in flooding-tolerant responses
   in soybean, there still are some limitations and considerations in this
   study. First, pathway and network analyses were built on the basis of
   gene and pathway annotation completeness. In the application of
   scientific research, the research team of GO and PlantRegMap updates
   the databases and maintains the website annually. It ensures the
   databases are complete, accurate, and persuasive. Second, the accuracy
   of pathway and network analyses relied on the accuracy and the
   completeness of the FTgenes. Fortunately, our FTgenes were selected
   from a comprehensive framework consisting of omics and non-omics data
   integration and gene prioritization algorithm. Several data quality
   control processes were done during the data-ensemble step to
   effectively reduce potential uncertainties, noise, and false positive
   results. Although our FTgenes are informative, more validation
   experiments are required.

   Flooding-tolerant responses are a quantitative trait regulated by
   polygenes; thus, many traditional single-marker methods, such as
   association mapping, linkage mapping, and genome-wide association
   study, have no power to uncover the whole picture of how these genes
   interact with each other to regulate traits. Our proposed systems
   biology framework can efficiently integrate gene information with
   annotated GO database biologically to boost the power of identifying
   key FTgenes and their underlying molecular pathways or mechanisms. This
   provides an opportunity to better understanding complex
   flooding-tolerant responses that should be noted. These findings
   present a wealth of information for future validation.

Methods

   We developed an integrative systems biology framework to explore
   insights into the FTgenes underlying flooding-tolerant responses in
   soybean. Six-step pipelines (Fig. [224]1) were proposed to select the
   key FTgenes, including the GO annotations filtering, gene-wise
   statistic scores calculation, pathway enrichment analysis, functional
   network analysis, validation study, and the key gene selection.
   Detailed methods and materials used in this study are described below.

Candidate genes for flooding-tolerance (FTgenes)

   We previously proposed a comprehensive multiple OnO data mining,
   integration, and prioritization framework^[225]36. All genetic data
   (SNPs, genes, SSRs, QTLs) and bioinformatics information (trait index,
   variety, biochemical, statistical values) that were relevant to
   flooding-tolerant responses in soybean were collected and defined as a
   flooding-tolerance gene pool (containing 36,705 genes). These OnO data
   were integrated from multidimensional data platforms, including
   association mapping and GWAS, linkage mapping, gene expression, pathway
   regulatory, network analysis, protein–protein interaction, proteomic
   analysis, and model plants. Through the systems biology framework, a
   total of 144 prioritized FTgenes (Fig. [226]2), based on the cut-off
   score of 42, were selected from the gene pool^[227]36. The FTgenes were
   defined to be significantly associated or enriched with flood-tolerance
   or flood-response after flooding treatment (i.e., submergence) was
   conducted during the germination and vegetative growth stages of
   soybean. The study framework and the prioritized results, the data of
   which are used here, are provided elsewhere^[228]36.

Gene expression dataset and gene-wise statistic values

   The gene expression dataset (whole genome expression database) of
   soybean seedling submergence was accessed through the database of
   Genotypes and Phenotypes (dbGaP,
   [229]https://www.ncbi.nlm.nih.gov/gap/) that was published by Lin et
   al.^[230]58. They used cultivar Qihuang 34, a flooding-resistant
   variety, for submergence experiments, and recorded gene expression
   changes in roots after 3, 6, 12, and 24 h of submergence treatments.
   All four-time periods of RNA expression data were obtained from the
   dbGaP repository. We used p-values, of which genes under the null
   hypothesis of no differential gene expression, to present gene-level
   statistic values of flooding-tolerance in soybean. To obtain gene-level
   significance, we used 10-based logarithms to transform p-values into
   gene-wise statistic scores to capture information for gene expression
   changes in roots flooded after 3, 6, 12, and 24 h.

Pathway annotations

   To perform mapping for functional pathway analysis, we used
   GO^[231]73,[232]74 ([233]http://geneontology.org/) annotations.
   GO-based functional annotation in soybean contains 4896 terms covering
   48,606 unique genes, mapped in the Williams 82 reference genome version
   2 (Glycine max Wm82.a2.v1). These annotated GO gene sets (i.e.
   pathways) systematically provide a standard catalogue to classify
   functional genes into biological functions and molecular mechanisms.
   Pathways with overly limited information (< 6 genes) were removed, as
   well as substantially large (> 1500 genes) pathways. As a result, a
   total of 2926 pathways, which consist of 916 cellular components, 762
   biological processes, and 1248 molecular functions, remained for
   pathway analysis. In pathway analysis, we used the negative logarithm
   of these 2926 pathways’ p-values as our statistic.

Statistical methods for pathway enrichment analysis

   We utilized two different strategies, the competitive method, and the
   self-contained method^[234]47, to test for significantly enriched
   pathways for the trait of flooding-tolerance in soybean. Three
   statistical methods, including the hypergeometric test (competitive
   method), SUMSTAT, and SUMSQ (self-contained methods), were used to
   discover the significance of enriched pathways. The former method
   compares two gene sets in terms of association with a phenotypic trait
   based on a statistical probability model, and the latter two methods
   only test the association between a phenotypic trait and genes in
   pathways.

   The hypergeometric test, assuming an experimentally-derived gene list
   is randomly conditional on a fixed pathway, is a widely utilized
   competitive method for pathway analysis^[235]75. The null hypothesis of
   the test is that genes in a pathway are more strongly associated with
   the phenotypic trait than those outside the pathway. The main idea of
   the test is to sample randomly, without replacement, from a finite
   population, calculating the statistic of characteristic (here is
   flooding-tolerance) of interest. Hence, this method aims to test
   whether annotated pathways (i.e. biological functions or processes),
   which are functionally related, are enriched or over-represented in a
   list of important genes (i.e. FTgenes) with the trait of interest. The
   p-value can be computed by
   [MATH: <mrow><mrow><mi mathvariant="normal">p</mi><mo>-</mo><mi
   mathvariant="normal">value</mi></mrow><mo>=</mo><munderover><mo>∑</mo><
   mrow><mi>x</mi><mo>=</mo><mi>g</mi></mrow><mi>S</mi></munderover><mfrac
   ><mrow><mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mi>S</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>x</mi></mrow></mtd></mtr></mtable></mrow></mfenced><mf
   enced close=")"
   open="("><mrow><mtable><mtr><mtd><mrow><mi>L</mi><mo>-</mo><mi>S</mi></
   mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow><mi>M</mi><mo>-</mo><mi>x
   </mi></mrow></mtd></mtr></mtable></mrow></mfenced></mrow><mfenced
   close=")"
   open="("><mrow><mtable><mtr><mtd><mi>L</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>M</mi></mrow></mtd></mtr></mtable></mrow></mfenced></m
   frac><mo>,</mo></mrow> :MATH]

   where L is the total number of genes in a finite population, M is the
   size of important genes, S is the number of genes in a specific
   pathway, x is the number of important genes in a specific pathway, and
   g is the number of genes in M.

   The idea of the self-contained methods is to use permutations to
   generate a huge number of null distributions. We compared genes in a
   specific pathway with random sets sampled from the hull distributions
   and calculated an empirical p-value for pathway analysis. The tests
   ignored genes not in the pathways. The present study applied two
   self-contained methods, SUMSTAT and SUMSQ^[236]76. Under the null
   hypothesis (the pathway is unrelated to the trait), we tested whether
   the observed gene set (i.e. pathway) outperforms the random gene sets
   generated by permutations. The enrichment score (ES) calculation of
   SUMSTAT and SUMSQ can be expressed as
   [MATH: <mrow><mrow><mo stretchy="false">(</mo><mi
   mathvariant="normal">SUMSTAT</mi><mo
   stretchy="false">)</mo></mrow><mspace width="1em"></mspace><msub><mi
   mathvariant="normal">ES</mi><mi
   mathvariant="normal">SUMSTAT</mi></msub><mo>=</mo><munderover><mo>∑</mo
   ><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>S</mi></munderover><msu
   b><mi>t</mi><mi>i</mi></msub><mo>,</mo></mrow> :MATH]
   [MATH: <mrow><mfenced close=")" open="("><mi
   mathvariant="normal">SUMSQ</mi></mfenced><mspace
   width="1em"></mspace><msub><mi mathvariant="normal">ES</mi><mi
   mathvariant="normal">SUMSQ</mi></msub><mo>=</mo><munderover><mo>∑</mo><
   mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>S</mi></munderover><msubs
   up><mi>t</mi><mrow><mi>i</mi></mrow><mn>2</mn></msubsup><mo>,</mo></mro
   w> :MATH]

   where t[i] is the i-th value of the statistic (in this context,
   expression metrics e.g. p-value, fold-change) of FTgenes, and S is the
   number of genes in a specific pathway.

   The analysis pipelines of SUMSTAT and SUMSQ consist of calculating the
   statistics ES of observed gene sets of soybeans, random permutations of
   statistics calculated from gene expression data, calculating permuted
   ES and association p-value. The ES represents association signals for
   each of annotated pathways, and the calculation of ES[SUMSTAT] and
   ES[SUMSQ] is to sum over all statistics and all squared statistics of a
   gene set (i.e. GO pathway) containing S FTgenes, respectively. We
   randomly shuffled the statistics calculated from gene expression data
   for each pathway and followed the same receipt above to calculate a
   permuted ES. Then, we normalized the ES by subtracting the mean of
   permutated ESs, and divided it by the standard deviation of permuted
   ESs. Finally, we calculated empirical p-values by comparing the
   observed ES and the permuted ES in 10,000 permutations for all
   pathways.

Functional gene network analysis

   A graphical model of a network composes of nodes and edges. Nodes can
   be defined as genes, proteins, metabolites, and annotated pathways.
   Edges are typically presented by connections between nodes. In network
   analysis, the degree is the most widely used measure to describe the
   connections of the nodes in a network. In this study, we defined nodes
   as the FTgenes, and calculated edges using the sum of the
   log-likelihood score in SoyNet functional gene network tool
   ([237]https://www.inetbio.org/soynet/Network_nfm_form_conv.php). For
   detailed steps of network links calculation, please refer to Berger et
   al.^[238]77 and Kim et al.^[239]78. We further used Cytoscape
   v3.9.0^[240]79 to integrate molecular interaction network data to
   visualize the graphical model of the network.

Multiple testing correction

   To account for multiple testing problems in pathway analysis, we
   applied both the Benjamini–Hochberg correction method^[241]80 and the
   Bonferroni correction method to balance false positive and false
   negative results. The procedure controls the false discovery rate at
   0.05 level in the current study, assuming p-values are independently
   distributed under the null hypothesis. Only pathways reaching
   genome-wide significance threshold of p-value less than 1.00 × 10^–4
   were considered significantly enriched.

Validation for the key FTgenes

   Soybean seeds of Chiangmai 60 cultivar were obtained from Thanya Farm
   Co., Ltd., Nonthaburi, Thailand. The seeds were surface-sterilized in
   1% sodium hypochlorite and rinsed with distilled water 3 times. The
   seeds culture and stress conditions were done following Lin et
   al.^[242]58 with some modification. Ten seeds were sown on the sandy
   soil in a plastic pot (240-mm length × 240-mm width × 190-mm depth). A
   total of eight pots were sowed. Five seedlings soybean with the same
   size were retained in the pot, when two true leaves were fully unfolded
   (~ 8 days), the seedlings pot was transferred into new plastic
   containers filled with water. The samples were collected at 3, 6, 12,
   and 24 h, respectively. The untreated plants were used as the control.
   The root was collected and immediately frozen in liquid nitrogen for
   RNA extraction.

   Total RNA was isolated from root and shoot with TRIzol reagent
   according to the manufacturer’s protocol. Subsequently, 5 µg of the
   total RNA was mixed with 500 ng of oligo(dT)[18] and 200 U Superscript™
   III reverse transcriptase (Invitrogen), and the mixture was reverse
   transcribed at 50 °C for 60 min. The real-time PCR was done following
   the manufacturer’s protocol of the Luna^® Universal qPCR Master Mix
   (NEB) with the gene-specific primers listed in Supplementary Table
   [243]2. After the PCR had finished, the PCR specificity was examined
   using 2% agarose gel and the relative gene expression ratios were
   calculated using the 2^−ΔΔCT method with untreated plants cDNA as the
   reference sample and actin as the reference gene. All experiments were
   done in biological triplicates.

   To validate the significant differences between the transcript
   quantities of FTgenes under flooding stress, statistical analysis was
   performed using one-way ANOVA followed by Tukey’s HSD test method
   facilitated by the IBM SPSS statistics software. p-values less than
   0.05 were considered as statistically significant difference.

Conclusions

   This study shed new light on the effectiveness of the systems biology
   framework based on the FTgenes selected from the integrated OnO data
   and gene prioritization algorithm to uncover the mechanisms behind
   flooding-tolerant responses in soybean. We proposed a computational
   systems biology pipeline to discover enriched pathways and nine key
   genes that were real responses to flooding stress in our qRT-PCR
   experiments. This work suggests that the integrative pathway and
   network framework at systems biology level can be a good foundation for
   key genes discovery and gene function analysis for further work. In
   addition, this pipeline can minimize potential uncertainties and false
   positives and gain valuable insights into mechanisms underlying
   flooding-tolerant responses in soybean. The framework presented in this
   work can be applied to other complex traits in important crops.

Supplementary Information

   [244]Supplementary Table 1.^ (19KB, xlsx)
   [245]Supplementary Table 2.^ (15.4KB, docx)

Acknowledgements