Abstract

   Functional diversity rather than species richness is critical for the
   understanding of ecological patterns and processes. This study aimed to
   develop novel integrated analytical strategies for the functional
   characterization of fish diversity based on the quantification,
   prediction and integration of the chemical and physical features in
   fish muscles. Machine learning models with an improved random forest
   algorithm applied on 1867 muscle nuclear magnetic resonance spectra
   belonging to 249 fish species successfully predicted the mobility
   patterns of fishes into four categories (migratory, territorial,
   rockfish, and demersal) with accuracies of 90.3–95.4%. Markov
   blanket-based feature selection method with an
   ecological–chemical–physical integrated network based on the Bayesian
   network inference algorithm highlighted the importance of nitrogen
   metabolism, which is critical for environmental adaptability of fishes
   in nutrient-rich environments, in the functional characterization of
   fish biodiversity. Our study provides valuable information and
   analytical strategies for fish home-range assessment on the basis of
   the chemical and physical characterization of fish muscle, which can
   serve as an ecological indicator for fish ecotyping and human impact
   monitoring.

   Subject terms: Data mining, Machine learning, Network topology,
   Metabolomics

Introduction

   With the growing challenges in global food security, it is critical to
   achieve a sustainable protein supply from both environmental and
   ecological perspectives. Livestock products are almost completely grown
   under artificial conditions. However, the feeding and cultivation of
   livestock might suffer from the inefficient use of energy and water. In
   contrast, fishes are the most diverse vertebrates with more than 34,000
   species^[34]1. Approximately 30% of the marine biomass is composed of
   fish species, which provide a critical food source and animal protein
   to meet the nutritional demands of billions of people^[35]2. For
   example, there is growing interest in sashimi and sushi products, which
   are traditional Japanese seafoods consisting of fresh raw fish, as they
   are low-calorie and healthy foods. The biodiversity of fish has been
   shown to be an important factor in promoting the production of fish
   biomass^[36]3. In the marine environment, fishes quickly respond to
   multiple environmental stresses, such as climate change and pollution,
   causing disturbances in species composition, diversity, and ecosystem
   function^[37]4. There is growing recognition that the quantification of
   the functional and phenotypic variations among species, such as the
   measures of functional diversity, rather than species richness, is
   critical for measuring species diversity and understanding ecological
   patterns and processes^[38]5. Functional traits are defined as any
   measurable features of an individual who could affect the persistence
   and performance of a species and its ecological interactions within a
   community, which could be physical, biochemical, behavioral, or
   phenological, among other effects^[39]6. Recent studies have provided
   evidence that in comparison with taxonomic diversity, the functional
   diversity of fish could be more sensitive to environmental stresses and
   might serve as a better predictor of ecosystem function^[40]7. However,
   given the large number of species and the complicated species
   interactions and ecosystem dynamics, it is still challenging to predict
   and quantitatively characterize the functional diversity of fish at a
   large scale^[41]8. A metabolome is a collection of the final product of
   all cellular activities, and it reflects the complex interactions among
   endogenous biological processes and environmental stimuli. The changes
   in metabolites are often more sensitive than those in RNA transcripts
   or proteins to biological and environmental changes. This difference
   makes metabolomics a relevant tool for environmental monitoring and
   safety assessments^[42]9. The nondestructive profiling of the chemical
   matrix using nuclear magnetic resonance (NMR) spectroscopy, which does
   not require chemical separation, provides an ideal approach to explore
   the complicated interactions of genes, growth stages and the
   surrounding environment of wild fish in their natural
   state^[43]10–[44]13. Advances in data science and big data
   technologies, such as machine learning, deep learning and artificial
   intelligence, have enabled the extraction of useful information by
   evaluating the relative importance of each feature according to model
   coefficients and the establishment of simulation models to predict
   ecosystem dynamics on the basis of high dimensionality and large data
   sets in which raw data are largely unlabeled and
   uncategorized^[45]12,[46]14–[47]16. Successful applications of big data
   analysis have mainly involved nonbiological data, such as in the fields
   of atmospheric physical movement (weather forecasting) and market
   analysis. Notably, studies have been limited by ease of access to
   massive amounts of data; for example, electronic sensors can be easily
   used to record physical environmental information such as the
   temperature, humidity, wind velocity and direction, and precipitation,
   and trade details and market transaction information can be obtained
   from the internet^[48]17. However, applications of big data analysis in
   computational ecological modeling are still limited. The main
   bottleneck is the difficulty related to large-scale biological sampling
   and analysis under identical conditions. Indeed, it is well known that
   machine learning approaches are quite beneficial for learning from
   large volumes of unsupervised data. Biochemical and physical processes
   consist of systems of interacting molecules and macromolecules. One
   excellent approach that can be used to integrate individual features
   into a meaningful network that might represent physical interactions
   and remove transitive relationships is to infer a network by modeling
   the dependencies among variables^[49]18. There is growing interest in
   the application of the Bayesian network (BN) inference algorithm in
   environmental modeling and management, such as in predicting population
   dynamics for a single species and assessing the functional
   relationships between species and habitats within
   ecosystems^[50]19,[51]20. The BN inference algorithm, which represents
   the joint posterior probability distribution over the whole set of
   variables in a system, is considered more biologically interpretable
   than correlation networks that consider each relationship
   independently^[52]21. In comparing algorithm results with observational
   data from traditional field studies, the BN inference algorithm has
   proven to be a powerful tool in ecosystem analysis that can accurately
   reveal known relationships and identify key features with high
   connectivity within an ecosystem by inferring the network
   structure^[53]22. Therefore, a BN-inferred functional network
   architecture provides a valuable solution to process and interpret the
   outcomes of machine learning models from biological
   perspectives^[54]23. Collectively, the main motivation for the present
   study is the lack of a perspective on the functional diversity of fish
   and the corresponding relationship with the ecological characteristics
   of the natural state over a large scale. Therefore, this study aimed to
   develop novel integrated analytical strategies for the functional
   characterization of fish diversity based on the quantification,
   prediction and integration of the chemical and physical variations in
   fish muscles (Fig. [55]1). Fish sampling was conducted in Japan at a
   nationwide scale, large-scale biochemical fish muscle data were
   generated using NMR, and physical fish muscle data were generated by
   examining muscle strength using stress testing, an autograph and
   observations of intact textural features with one-dimensional magnetic
   resonance imaging (1D MRI). Machine learning models run with the
   improved random forest (RF) algorithm based on 1867 muscle NMR spectra
   for 249 fish species successfully predicted the mobility patterns of
   fishes into four categories (migratory, territorial, rockfish, and
   demersal) with accuracies of 90.3–95.4%. Eleven important muscle
   metabolites, including creatine, histidine, lactate, inosine,
   glutamine, N-acetyl-glutamate (NAG), carnitine, taurine, alanine,
   proline, and uridine diphosphate (UDP)-glucose, were extracted
   according to the Gini index. Then, ecological category-dependent
   metabolic networks of the machine-learned chemical features and Markov
   blanket-based feature selection for an ecological–chemical–physical
   integrated network were established with the BN inference algorithm to
   mine the functional connections among the high-dimensional data
   factors. Collectively, our study provides valuable information and
   analytical strategies for fish home-range assessment on the basis of
   the chemical and physical characterization of fish muscle, which can
   serve as an ecological indicator for fish ecotyping and human impact
   monitoring.

Figure 1.

   [56]Figure 1
   [57]Open in a new tab

   The scheme for the present study. Fish sampling was conducted in Japan
   at a nationwide scale with Tokyo Bay as the center, covering a wide
   range of seas and rivers from Hokkaido to Okinawa. A total of 1867
   fishes belonging to 249 species were collected from 2011 to 2019. Fish
   muscle was dissected and prepared for analysis under identical
   conditions. The functional traits of fish muscle were evaluated with
   multiple methods, including the biochemical profiling of water-soluble
   small-molecule metabolites and macromolecules with NMR spectra,
   physical characteristic assessment with autographs, and intact
   phenotype observation with 1D MRI. Two data mining strategies were
   conducted to evaluate fish diversity in this study: (1) machine
   learning based on the chemical profiles of 1867 fish muscle samples to
   establish a predictive model for the ecological characterization of the
   movement patterns and home ranges of fish and (2) the use of
   BN-algorithm-inferred ecological category-dependent metabolic networks
   of machine-learned chemical features combined with Markov blanket-based
   feature selection for an integrated network of chemical (NMR-based
   small-molecular metabolites and macromolecule composition profiles),
   physical (stress testing) and phenotypic (1D MRI-based intact
   observations of texture features) data and ecological categories
   (migratory, territorial, rockfish, and demersal) to extract the hidden
   patterns and interactions related to the functional diversity of fish.

Results and discussion

Fish sampling and data generation

   Fish sampling centered in Tokyo Bay was conducted from 2011 to 2019,
   and it covered a wide range of seas and rivers from Hokkaido to
   Okinawa. A total of 1867 fishes belonging to 249 species were collected
   with no restrictions on fish species or environmental conditions (Table
   [58]S1). Fish muscle was dissected and prepared for analysis under
   identical conditions. Fish muscle was the focus of this study because
   it is the main edible part of fish and reflects the athletic ability of
   fish, which is closely related to the ecological characteristics of
   their habitat and life habits^[59]14; additionally, muscles are the
   main protein source of fish. Large-scale data sets of the chemical
   composition, physical characteristics and phenotypical microstructure
   of fish muscle were generated by NMR spectroscopy, autography and 1D
   MRI, respectively. The impacts of diverse factors in the system were
   examined at the level of the data structure to filter out noise and
   mine meaningful information/patterns of interest. This data-driven
   model was constructed by filtering the noise from the data structure
   without the need for additional measurements. Therefore, the technical
   demands on the experimental system were not considerable, making it
   possible to apply the method under nonhypothetical conditions to a wide
   range of objects/systems, including ecosystems. Two data mining
   strategies were implemented to evaluate fish diversity in this study:
   (1) machine learning based on the chemical profiles of 1867 fish muscle
   samples to establish a predictive model for the ecological
   characterization of movement patterns and habitat use of fish and (2)
   the use of BN-algorithm-inferred ecological category-dependent
   metabolic networks of machine-learned chemical features combined with
   Markov blanket-based feature selection for an integrated network of
   chemical (NMR-based small-molecular metabolites and macromolecule
   composition profiles), physical (stress testing) and phenotypic (1D
   MRI-based intact observations of texture features) data and ecological
   categories (migratory, territorial, rockfish, and demersal) to extract
   the hidden patterns and interactions related to the functional
   diversity of fish.

Machine learning-based ecological prediction using large-scale metabolomic
data

   Unsupervised principal component analysis (PCA) revealed that the most
   important factor that affects the metabolic profile of fish was
   environmental salinity, followed by the growth stage and habitat depth,
   as shown in Fig. [60]S1. The fish mobility pattern and habitat use are
   critical functional traits that are characterized by ecological
   features^[61]24. The fishes used in this study were divided into four
   categories according to their mobility pattern: migratory, territorial,
   rockfish, and demersal (Table [62]S1). Muscles are the peripheral
   structures of the motor system and are used in exercise. The chemical
   composition and structural characteristics of muscles represent the
   result of ecological adaptation. Next, based on the water-soluble
   metabolic profiles of all 1867 fish muscle samples, a large-scale
   machine learning model was established using RF methodology; this
   method is an ensemble learning algorithm used for classification,
   regression, and clustering^[63]25 and was applied for the prediction of
   the mobility patterns of fish ecological characteristics. Due to the
   unbalanced distribution of fishes in the four ecological categories
   (migratory: 93 fishes; territorial, 736 fishes; rockfish: 102 fishes;
   and demersal: 936 fishes), RF predictive modeling for categories with a
   small number of samples might lead to inadequate learning, subsequently
   resulting in the prediction accuracy being largely dependent on the
   distribution of learning samples (Fig. [64]S2). To address this issue,
   we improved the RF algorithm following the scheme shown in Fig. [65]2A.
   In detail, the data were divided into two modules: the test module and
   learning module. After assessing the prediction accuracy in the test
   module, the bias of the sample was adjusted by randomly duplicating the
   data in the learning module. Notably, the improved RF model effectively
   enhanced the accuracy of the prediction of the ecological
   characteristics of fish to as high as 90% in all four categories based
   on the water-soluble metabolic profiles of fish muscle (Fig. [66]2B).
   Accordingly, the impact of each NMR peak on the predictive power of the
   RF model was evaluated based on the Gini index ranking (Fig. [67]2C).
   Peak assignment revealed that fish muscle metabolites such as creatine,
   histidine, lactate, inosine, glutamine, and NAG had high Gini scores,
   suggesting that these metabolites are important in the prediction of
   the mobility patterns of fish. For example, high levels of metabolites
   involved in energy metabolism, such as creatine, lactate, and
   UDP-glucose, were observed in the muscles of oceanodromous fish species
   (Fig. [68]S3). A relatively high distribution of histidine, which is an
   intracellular proton buffering constituent required for anaerobic
   performance, such as burst swimming in fishes^[69]26, was observed in
   the muscles of migratory fishes (Fig. [70]S3).

Figure 2.

   [71]Figure 2
   [72]Open in a new tab

   Machine learning for ecological prediction according to the NMR-based
   chemical profiling of fish muscle. (A) Conceptual diagram of the
   improved random forest (RF) algorithm used in the present study. After
   the division and reservation of the test data, the data for each class
   used for modeling were randomly duplicated to reach the maximum sample
   number for classes 1 to 4 to eliminate bias. The generated data with
   equal sample numbers for each class were used as the modeling data. RF
   calculations were performed 5 times each with the test data for k1 to
   k5. The result files of RF (prediction accuracy and important variables
   identified by the Gini index) were generated, and the average values of
   RF1 to RF5 were calculated and used as the final RF results. (B)
   Accuracy of the prediction of fish ecotype using NMR-based machine
   learning. (C) The most important metabolites in the discrimination of
   fish ecotypes ranked by the Gini index were δ and the ^1H chemical
   shift.

Ecological category-dependent network of machine-learned chemical features
using the BN inference algorithm

   To recover meaningful networks of functional relationships from
   machine-learned metabolic data, BN analysis was performed on the
   important muscle metabolites (creatine, histidine, lactate, inosine,
   glutamine, NAG, carnitine, taurine, alanine, proline, and UDP-glucose)
   with high Gini scores in each ecological category using the
   hill-climbing algorithm in the bnlearn R package. The reconstructed BN
   structures involved circular network diagrams that displayed all
   network edges with a conditional dependence level greater than 0.5
   between any of the muscle metabolites (Fig. [73]3A–D). Topological
   analysis demonstrated structural diversity among the four ecological
   categories. Notably, the creatine node with the highest Gini score in
   the RF model was located at the top of the network hierarchy for
   migratory, territorial, and rockfish fishes but was located in the
   center of the network for demersal fish (Fig. [74]3A–D). Creatine is a
   key product of the energy compound creatine phosphate in the so-called
   Lohmann reaction and is an indicator of muscle performance^[75]27.
   Creatine-related network topological diversity might reflect the
   functional differences in movement patterns between demersal fishes and
   other fishes. Next, the importance of each variable was calculated as
   the sum of all the probabilistic strengths of the variables connected
   to the corresponding parent (red) and child (blue) nodes
   (Fig. [76]3E–H). A comparison of importance ranks using Kendall’s
   coefficient of concordance showed large diversity across the four
   ecological categories (Kendall’s W = 0.34, P = 0.19). Of interest, low
   rankings for glutamine were observed in migratory, territorial, and
   rockfish species, while a relatively high rank for glutamine was
   observed in demersal fish. To further investigate the biological basis
   of fish biodiversity, metabolic pathway analysis was performed on the
   top-6-ranked metabolites of each ecological category using the online
   platform Metaboloanalyst^[77]28. Pathway enrichment analysis using
   Metaboloanalyst showed that “ammonia recycling” was the most
   significantly enriched metabolite set in the muscle of demersal fish
   but not that of the other three fish types (Fig. [78]S4). These data
   highlighted the importance of nitrogen metabolism in the functional
   characterization of fish biodiversity. Notably, the ammonia
   detoxification ability of fish is critical for environmental
   adaptability in ammonia-rich environments that are affected by various
   industries and human activities^[79]29. Coastal hypoxia, which is
   induced by an increase in nutrient inputs attributable to anthropogenic
   origins, fundamentally alters the diversity and functionality of
   coastal fishes across the land–sea interface^[80]30. It is reasonable
   to consider that variations in the functional trials of fish species,
   especially demersal fish species in nutrient-rich environments
   attributable to human activities, are largely dependent on the level of
   human impact. The above data suggested that the BN structures exhibited
   preserved patterns and complicated relationships among metabolites
   belonging to the same ecological category, which inspired us to examine
   whether the dependence among metabolites might be related to fish
   biodiversity. To explore this hypothesis, 200 fish individuals were
   randomly selected 200 times to generate 200 metabolic datasets for BN
   analysis. For each dataset, the strength of the probabilistic
   dependence between each of the 11 important muscle metabolites and the
   Shannon diversity index across the four ecological categories
   (migratory, territorial, rockfish, and demersal) were calculated. To
   achieve the most discrete distribution of fish diversity, the 10
   datasets with the largest and smallest Shannon index values were
   selected. Then, the metabolic dependence related to fish biodiversity
   was selected according to the Pearson correlation between the strength
   of the conditional dependence of each BN edge and the Shannon index.
   Hierarchical clustering demonstrated the strength of the highly
   correlated metabolic dependence (Fig. [81]S5). Notably, the strength of
   the conditional dependence between histidine and lactate was retained
   across the diverse datasets. The serine–histidine–aspartate motif is
   one of the most thoroughly characterized catalytic motifs in
   biochemistry. The aspartate hydrogen bonded to histidine, which
   increases the pK[a] of imidazole nitrogen from 7 to approximately 12,
   results in histidine acting as a powerful general base and activator of
   other nucleophiles, such as serine or cysteine^[82]31. In addition, the
   strength of conditional dependence associated with ornithine-urea cycle
   (OUC) metabolites, such as NAG and glutamine, increased with the
   Shannon index. Since the joint probability of BNs indicates the
   likelihood of two events occurring together and equivalently
   disappearing together, these data suggest that nitrogen
   metabolism-related dependence might be a determinant factor in the
   functional characterization of fish diversity.

Figure 3.

   [83]Figure 3
   [84]Open in a new tab

   Ecological category-dependent (A)–(D) directed acyclic graphs (DAGs)
   and (E)–(H) the rank of variable importance using the BN inference
   algorithm. DAGs were reconstructed with the machine-learned chemical
   features of migratory, territorial, rockfish and demersal fish,
   respectively. (A)–(D): the circular network diagrams display all the
   network edges with a strength of probabilistic dependence greater than
   0.5 between any of the muscle metabolites. Red: beginning nodes without
   parent nodes; blue: end nodes without any child nodes. (E)–(H): the
   importance of each variable was calculated as the sum of all the
   probabilistic strengths of each variable connected to the corresponding
   parent (in black) and child (in gray) nodes.

Multidimensional profiling of fish muscles in species with different
ecological characteristics

   Then, the major soluble macromolecule composition, physical
   characteristics and phenotypic textural features of fish muscles were
   examined, and the corresponding distributions were evaluated among
   different ecological categories. We extracted the signal profiles of
   the major water-soluble macromolecular components from the performance
   baselines using an integrated analytical strategy that combined
   covariation peak separation and matrix decomposition and identified two
   major macromolecules, lipids and collagens, in the fish muscle
   extracts^[85]32. Here, we examined the distribution of these
   macromolecules among the four ecological categories (Fig. [86]4A).
   Relatively high distributions of lipids and collagens were observed in
   the muscles of migratory fish and demersal fish, respectively. Low
   contents of both lipids and collagens were observed in the muscles of
   rockfish. Next, to gain insight into the physical properties of fish
   muscle, the compressive force was examined, and exponential fitting was
   performed on the force–stroke curves (Fig. [87]4B). Three exponential
   function parameters, the y-intercept parameter a and the
   appropriateness factor b of the natural exponential base, and the
   maximum force value (max) of force–distance curve were used to describe
   the trends of the force–stroke curves and enable an accurate
   characterization of the physical properties of muscle tissues in
   response to external cutting forces. The plot of the fitting
   coefficients for a and b indicated a strong correlation between the
   physical properties of fish muscle and the mobility patterns. The
   muscles of migratory fish were characterized by an exponential curve
   with a relatively small a value and a relatively large b value (the
   stress curve grew at a late point and then quickly increased),
   suggesting easy compression but lack of complete cutting. In contrast,
   the muscles of rockfish were characterized by an exponential curve with
   a relatively large a value and a relatively small b value (the stress
   curve quickly increased in the early stage and then slowly increased),
   suggesting that the muscle was easily cut but not easily compressed.
   Therefore, parameters a and b could reflect the physical
   characteristics of fish meat in terms of elasticity (a rubbery
   mouthfeel) and softness (a residual chewy mouthfeel). Then, intact
   phenotypic observations were performed on fish muscle using 1D
   MRI^[88]33. Unlike the averaged signals in the conventional NMR
   spectra, changes in the tissue characteristics could be detected in 1D
   MRI based on the spatial position of the Z-axis (Figs. [89]4C, [90]S6).
   We performed statistical processing for the average (AVE), standard
   deviation (SD), and first derivative (number of edges; EDGE) of the
   variation curves of the transverse relaxation time T[2] and diffusion
   coefficients (D), which depended on the spatial position of the Z-axis
   (Fig. [91]4C). As a result, demersal fish and rockfish muscle had
   relatively large AVE and SD values for T[2]. Rockfish muscle had
   relatively small EDGE values for T[2]. Demersal fish muscle had
   relatively high D and D EDGE values, and rockfish muscle had the
   smallest D EDGE value. Collectively, the biochemical, physiological and
   structural properties of fish muscles were highly correlated with
   ecological characteristics, such as the movement patterns and habitat
   use of fish.

Figure 4.

   [92]Figure 4
   [93]Open in a new tab

   Multidimensional profiling of fish muscles in species with different
   ecological characteristics. Distribution of (A) the major
   macromolecules, (B) index of the exponential function of the fitted
   cut-off stress curve, and (C) 1D MRI features of fish muscle samples
   among the four ecological categories of fish: migratory, territorial,
   rockfish, and demersal. The insert in (A) indicates the average value
   of each ecological category. The error bar presents standard error. The
   left panel in (B) shows a representative photo and the force–time curve
   of fish muscle stress testing using an autograph. The three exponential
   function parameters, the y-intercept parameter a and the
   appropriateness factor b of the natural exponential base, and the
   maximum force value (max) of force–distance curve are highlighted in
   red. The right panel in (B) shows the distribution of the parameters a
   and b of each ecological category, which are scaled based on the
   parameter max values. The insert of the right panel in (B) shows the
   average value of each ecological category. The left panel in (C) shows
   the drilled cylindrical muscle was inserted into a 5-mm NMR tube filled
   with KPi buffer. The observation area of the Z-axis gradient was
   approximately 150 mm of the equipped probe used in the present study.
   1D MRI signals of proton density, T[2] and diffusion coefficients were
   observed. After calculating the moving averages of ± 100 data points,
   the first derivative of the curve was calculated to evaluate the “EDGE”
   (number of zero points in the first derivative) of the 1D imaging data.
   The right panel in (C) shows the distribution of 1D MRI features of
   fish muscle samples of each ecological category.

Markov blanket-based feature selection from a BN-inferred
ecological–chemical–physical integrated network

   As a probabilistic modeling approach, the BN inference algorithm
   enables the integration of complex heterogeneous data, including
   categorical and continuous data, from different sources to assess the
   relationships between multiple environmental factors and ecological
   indicators^[94]34. Therefore, all the data obtained from multiple
   techniques and ecological categories were integrated using the BN
   inference algorithm to extract the hidden patterns and interactions
   related to the functional diversity of fish. The generated
   ecological–chemical–physical integrated network with a score-based
   hill-climbing algorithm identified multiple highly correlated features
   that might play dominant roles in shaping fish diversity (Fig. [95]5).
   Overall, the integrated network demonstrated that NAGs were at the
   central location in the network and had the largest number of
   connections. An integrated network analysis of multidimensional data
   demonstrated the critical role of nitrogen metabolism in the functional
   characterization of fish biodiversity. In addition, the integrated
   network provided other valuable clues for the functional understanding
   of fish diversity. Taurine is known to play a critical nutritional role
   in the growth and development of marine fish. Dietary taurine
   supplementation might increase the growth rates of fish and decrease
   motility due to its role in hemolytic suppression through
   osmoregulation and biomembrane stabilization in fish^[96]35. The
   taurine contents in muscle were correlated with the maximum value of
   the force–distance curve, suggesting that the distribution of taurine
   in natural fishes is closely related to muscle strength. It is possible
   that fish species with high muscle strength would require more muscle
   taurine than other species to maintain the stabilization of muscle cell
   membranes. According to the force–distance curve of fish muscle, the
   parameter a value will increase at an early stage, reflecting a rubbery
   mouthfeel. In contrast, the parameter b value will increase in a later
   stage, reflecting a residual chewy mouthfeel. Our integrated network
   analysis showed that parameter b was correlated with parameter a and
   the SD of the position-dependent diffusion curve (D. SD) obtained from
   the 1D MRI of fish muscle. This result suggests that the degree of
   fluctuation of the apparent diffusion coefficient (ADC), which measures
   the rate of diffusion of water molecules within a tissue^[97]36, is
   related to parameter b. The ADC in biological tissues is determined by
   multiple factors, such as the cell type and density. Therefore, the
   fluctuation in the ADC, which is presented as the SD value of ADC,
   could reflect the degree of heterogeneity in the samples. Collectively,
   the force–distance curve of fish muscle showed that the rubbery
   mouthfeel of fish muscle is probably related to the strength of the
   muscle cell membrane, while the residual chewy mouthfeel of fish muscle
   might be related to the connective tissue interspersed between
   myocytes. Finally, to further understand and characterize the
   biodiversity of fish at multiple levels, Markov blanket-based feature
   selection was applied at the node level in each ecological category
   (Fig. [98]6). A Markov blanket was defined as the union of the parents
   (nodes connected above), children (nodes connected below), and other
   parents of those children for identifying redundant and irrelevant
   features of the node of interest in the reconstructed BN^[99]37.
   Remarkably, the Markov blanket of migratory fish was connected to
   metabolites involved in energy metabolism, such as lactate, UDP-glucose
   and muscle lipids, representing advanced swimming ability during
   migratory movement. In addition, the Markov blanket of dermal fish was
   connected to the OUC metabolites NAG and glutamine, which was in
   accordance with our above observations that demersal fish were
   characterized by their ammonia detoxification ability in nutrient-rich
   environments. These data suggest that the Markov blanket-based feature
   selection of BN structures provides a powerful approach for extracting
   fundamental knowledge about the functional characterization of fish
   biodiversity from multidimensional data.

Figure 5.

   Figure 5
   [100]Open in a new tab

   Integrative network of chemical, physical and phenotypic profiles of
   fish muscle and ecological categories according to the BN inference
   algorithm. NMR-based low-molecular-weight chemical data were used to
   establish continuous variables. The discretized data for the major
   soluble macromolecule composition, physical characteristics and 1D
   MRI-based phenotypic textural features for fish in the four ecological
   categories (migratory, territorial, rockfish, and demersal) were used
   as categorical variables. The integrated network was generated using
   the score-based hill-climbing learning algorithm in the bnlearn R
   package. The length of node bars was scaled according to the number of
   connected edges. The nodes were colored according to the measurement
   techniques and ecological categories. “PD”: 1D MRI factor for the
   proton density; “T[2]”: spin–spin relaxation time; “D”: diffusion
   coefficient; “SD”: standard deviation; “EDGE”: number of zero points in
   the first derivative of 1D MRI data; and “CSI”: chemical shift imaging,
   as described in Fig. [101]S6.

Figure 6.

   [102]Figure 6
   [103]Open in a new tab

   Markov blanket-based feature selection from the BN-inferred
   ecological–chemical–physical integrated network. Markov blankets of the
   nodes (A) migratory, (B) territorial, (C) rockfish and (D) demersal
   fish represent the union of the parents, children, and other parents of
   those children. The arrows are colored by parents. The node size is
   based on edge weights according to the integrative network of
   Fig. [104]5.

   In summary, the biochemical, physical and phenotypic profiles of fish
   muscles from 1867 individuals of 235 species were comprehensively
   evaluated using multiple techniques, including NMR, autograph testing
   and 1D MRI. Consequently, the diversity of fishes in terms of their
   mobility pattern and home range was predictively and functionally
   characterized based on big data collection with PCA, machine learning,
   and BN-inferred network analysis methods. The NMR-based chemical
   profiling of fish muscles was performed to classify the fishes
   according to their habitat features, such as salinity and depth.
   Machine learning models with nonbiased adjusted RF algorithms
   successfully predicted the mobility patterns of fishes into four
   categories (migratory, territorial, rockfish, and demersal) with
   accuracies of 90.3–95.4% based on the metabolic profiles of fish
   muscles. Muscle metabolites such as creatine, histidine, lactate,
   inosine, glutamine, NAG, carnitine, taurine, alanine, proline, and
   UDP-glucose were identified as the most important factors according to
   the Gini index for the prediction of fish mobility patterns. Then, the
   functional features of each ecological category were extracted from the
   BN-inferred ecological category-dependent metabolic network of the
   machine-learned chemical features and the ecological–chemical–physical
   integrated network for Markov blanket-based feature selection. Notably,
   our findings distinguished demersal fish from those in other ecological
   categories, highlighting the critical roles of nitrogen metabolism and
   ammonia detoxification in the functional characterization of fish
   biodiversity. In addition, ammonia detoxification is closely related to
   the environmental adaptability of fishes in nutrient-rich environments
   attributable to human activities, which can be used for assessing the
   consequences of global changes resulting from human activities and
   maintaining seafood sustainability for humans. Collectively, our study
   provides valuable information and analytical strategies for fish
   home-range assessment on the basis of the chemical and physical
   characterization of fish muscle, which can serve as an ecological
   indicator for fish ecotyping and human impact monitoring.

Materials and methods

Sample collection and preprocessing

   Natural fish samples (n = 1867 individuals belonging to 2 classes, 23
   orders, 82 families, 171 genera, and 249 species) were collected over a
   period of nine years from May 2011 to August 2019 from 33 inland,
   estuarine and coastal regions of the water ecosystem in Japan, as shown
   in Table [105]S1. No specific permission was required at any of the
   sampling points, as they were all in public areas. The animal
   experiments were performed in accordance with protocols approved by the
   Institutional Committee of Animal Experiments of RIKEN and adhered to
   the guidelines of the Institutional Regulation for Animal Experiments
   and Fundamental Guidelines for the Proper Conduct of Animal Experiments
   and Related Activities in Academic Research Institutions under the
   jurisdiction of the Ministry of Education, Culture, Sports, Science and
   Technology, Japan. This study was carried out in compliance with the
   ARRIVE guidelines ([106]http://www.nc3rs.org.uk/page.asp?id=1357).
   After dissection, the fish muscle was collected and prepared for
   different analyses. For compressive force measurements, fish muscle
   above the anal fin was picked and cut into slices (approximately 5 mm
   thick and 10 mm wide, as shown in Fig. [107]3B). For intact
   observations by 1D MRI, a stainless pipe (Φ[inside] = 4 mm, as shown in
   Fig. [108]S6) was used as a cutter. The drilled cylindrical muscle was
   inserted into a 5-mm NMR tube filled with 100 μl of KPi (0.1 M
   K[2]HPO[4]/KH[2]PO[4] in D2O, pH 7.0) with 1 mM of sodium
   2,2-dimethyl-2-silapentane-5-sulfonate (DSS) as an internal standard
   and centrifuged at 3000 rpm for 10 min for degasification. For NMR
   observations of metabolites and major macromolecules, the remaining
   fish muscle was lyophilized and crushed into powder. Eighteen
   milligrams of each powdered sample was extracted with 600 μl of KPi at
   65℃ for 15 min and centrifuged at 14,000 rpm for 5 min. The supernatant
   with 1 mM of DSS was transferred to a 5-mm NMR tube.

Physical property analysis

   Stress testing was performed using a multipurpose stretching tester
   that included an autograph (EZ-L, Shimadzu Co. Ltd., Kyoto, Japan) with
   a wedge-shaped cutter bit, as shown in Fig. [109]3B. The loading rate
   was 2 mm min^−1, the total distance was 5 mm, and the force–time curves
   were recorded. The force–time curves for fish muscle were exponentially
   fit in Microsoft Excel using the following equation:
   [MATH:
   <mrow><mi>y</mi><mo>=</mo><mi>a</mi><msup><mrow><mi>e</mi></mrow><mrow>
   <mi mathvariant="italic">bx</mi></mrow></msup><mo>,</mo></mrow> :MATH]
   1

   where x represents the time from the starting point, y represents the
   compressive force, and a and b represent the fitting coefficients.

NMR-based metabolomics

   For metabolome observations, two-dimensional J-resolved (2DJ) spectra
   (pulse sequence of jresgpprqf) were acquired at 298 K using a Bruker
   AVANCE II 700 spectrometer equipped with a ^1H inverse triple-resonance
   cryogenically cooled probe with Z-axis gradients (Bruker BioSpin GmbH,
   Rheinstetten, Germany). The parameters were as follows: data points,
   16 K (F2) and 16 (F1); number of scans, 32; spectral widths, 12,500 Hz
   (F2) and 50 Hz (F1); and acquisition time, 0.66 s (F2) and 0.32 s (F1).
   The 1D projection of F2 was obtained with Topspin 4.0.6 (Bruker BioSpin
   GmbH, Rheinstetten, Germany). All baseline 1D projections were
   collected, and the peaks were identified by rNMR^[110]38 on the R
   platform (v. 3.4.4). The peak intensity matrix was normalized by
   probabilistic quotient normalization (PQN)^[111]39, scaled, centered
   using R and then sorted as a basic data matrix.

NMR observations of major soluble macromolecules

   For water-soluble macromolecule observations, a diffusion-edited pulse
   sequence (ledbpgp2s1d) was used, and the parameters were as follows:
   gradient strength, 36.6% of the maximum gradient strength (48.15 G/cm);
   little delta (δ), 1.5 ms; big delta (Δ), 120 ms; gradient recovery
   delay, 200 μs; data points, 16 K; number of scans, 128; spectral width,
   11,160.71 Hz; and acquisition time, 0.73 s. Information on major
   soluble macromolecules was extracted based on peak separation^[112]14
   and Moore–Penrose pseudoinversion^[113]32.

Intact observation based on 1D MRI

   All MRI experiments were performed at 298 K using a Bruker Avance III
   HD-500 spectrometer equipped with a ^1H inverse probe with triple
   resonance (Bruker BioSpin GmbH, Rheinstetten, Germany). The imaging
   area (range of the Z-gradient region) was detected as approximately
   150 mm using a series of layered solutions, as shown in Figs. [114]3C
   and [115]S6C. As shown in Fig. [116]S7, the pulse sequences for imaging
   were modified by embedding the Z-gradient into standard sequences of
   the acquisitions^[117]33. Proton density profiles were obtained with
   the spin echo imaging scheme depicted in Fig. [118]S7A. The strength of
   the magnetic gradients was set at 19.26 G/cm (40% of the maximum 48.15
   G/cm); spectral width, 250,000 Hz; and acquisition time, 0.001948 s.
   The approach used to measure the spin–spin relaxation time (T[2]) and
   the diffusion coefficient (D) of the fish muscle along the
   depth-concentration profile is schematically represented in Fig.
   [119]S6B,C. The parameters were as follows: spectral width of
   250,000 Hz and acquisition time of 0.001948 s. To analyze such a series
   of data, Dynamic Center (Ver 2.5.4, Bruker BioSpin GmbH, Rheinstetten,
   Germany) was used to calculate T[2] and D in the depth profiles. All
   data points in the imaging area (approximately 4000 data points) were
   used, and T[2] and D were fitted using the following relations in
   Dynamic Center:
   [MATH: <mrow><mi>f</mi><mfenced close=")"
   open="("><mi>t</mi></mfenced><mo>=</mo><msub><mi>I</mi><mn>0</mn></msub
   ><mrow></mrow><mo>∗</mo><msup><mrow><mi>e</mi></mrow><mrow><mo>-</mo><m
   frac><mi>t</mi><mrow><mi>T</mi><mn>2</mn></mrow></mfrac></mrow></msup><
   mo>,</mo></mrow> :MATH]
   2
   [MATH: <mrow><mi>f</mi><mfenced close=")"
   open="("><mi>g</mi></mfenced><mo>=</mo><msub><mi>I</mi><mn>0</mn></msub
   ><mrow></mrow><mo>∗</mo><msup><mrow><mi>e</mi></mrow><mrow><mo>-</mo><m
   sup><mrow><mi>γ</mi></mrow><mn>2</mn></msup><mrow></mrow><mo>∗</mo><msu
   p><mrow><mi>g</mi></mrow><mn>2</mn></msup><mrow></mrow><mo>∗</mo><msup>
   <mrow><mi>δ</mi></mrow><mn>2</mn></msup><mrow></mrow><mo>∗</mo><mfenced
   close=")" open="("><mi
   mathvariant="normal">Δ</mi><mo>-</mo><mfrac><mi>δ</mi><mn>3</mn></mfrac
   ></mfenced><mrow></mrow><mo>∗</mo><mi>D</mi></mrow></msup><mo>,</mo></m
   row> :MATH]
   3

   As shown in Figs. [120]3C and [121]S6, for the extraction of muscle
   texture features in the fish microstructure, the average value (AVE)
   and SD of T[2] and D in the imaging area were calculated to evaluate
   the proton mobility level and variation in depth. The moving average
   value and first derivative were calculated, and the number of zero
   points in the first derivative was obtained; this value was named the
   “EDGE” of T[2] and D from the imaging of fish muscle. The spectra of 2D
   chemical shift imaging (CSI)^[122]40 were observed using the pulse
   sequences in Fig. [123]S7. The parameters were as follows: spectral
   width of 14,098 Hz and acquisition time of 0.29 s. Spectral data were
   processed using SMOOSY software developed by our team, and the
   projections of F2 were used for analysis.

Machine learning, BN inference and statistical analysis

   PCA, machine learning, bnlearn and correlation analysis were performed
   on the R platform. Machine learning was performed with the RF method.
   The RF models were evaluated with fivefold cross-validation. The RF
   models were evaluated with fivefold cross-validation. After the
   division and reservation of the test data, the data for each class used
   for modeling were randomly duplicated to reach the maximum sample
   number for classes 1 to 4 to eliminate bias. The generated data with
   equal sample numbers for each class were used as the modeling data. RF
   calculations were performed 5 times each with the test data for k1 to
   k5. The result files of RF (prediction accuracy and important variables
   identified by the Gini index) were generated, and the average values of
   RF1 to RF5 were calculated and used as the final RF results. BN
   inference was performed with the score-based hill-climbing learning
   algorithm implemented in the R package bnlearn. For the ecological
   category-dependent network, RF-learned metabolites with high Gini index
   were used as the input continuous variables. For the integrative
   network, RF-learned metabolites with high Gini index, the major soluble
   macromolecules, physical characteristics and 1D MRI-based phenotypic
   textural features were used as the input continuous variables, while
   the four ecological categories were used as input categorical
   variables. As the output data, all the probabilistic strengths between
   two variables were calculated. The connection with a probabilistic
   strength more than 0.5 was used in BN network construction. The
   importance of each variable was calculated as the sum of all the
   probabilistic strengths of each variable connected to the corresponding
   parent and child nodes. Data discretization was performed with
   Hartemink’s information-preserving discretization algorithm in the
   infotheo R package. The network was visualized using Gephi software
   ([124]https://gephi.org/). Metabolic pathway enrichment analysis was
   performed using the free-web software MetaboAnalyst 4.0
   ([125]www.metaboanalyst.ca).

Supplementary Information

   [126]Supplementary Table.^ (102.9KB, xlsx)
   [127]Supplementary Figures.^ (1.7MB, pdf)

Acknowledgements