Abstract

   The human biological system uses ‘inter-organ’ communication to achieve
   a state of homeostasis. This communication occurs through the response
   of receptors, located on target organs, to the binding of secreted
   ligands from source organs. Despite years of research, the roles these
   receptors play in tissues are only partially understood. This work
   presents a new methodology based on the enrichment analysis scores of
   co-expression networks fed into support vector machines (SVMs) and k-NN
   classifiers to predict the tissue-specific metabolic roles of
   receptors. The approach is primarily based on the detection of
   coordinated patterns of receptor expression. These patterns and the
   enrichment analysis scores of their co-expression networks were used to
   analyse ~700 receptors and predict metabolic roles of receptors in
   subcutaneous adipose. To facilitate supervised learning, a list of
   known metabolic and non-metabolic receptors was constructed using a
   semi-supervised approach following literature-based verification. Our
   approach confirms that pathway enrichment scores are good signatures
   for correctly classifying the metabolic receptors in adipose. We also
   show that the k-NN method outperforms the SVM method in classifying
   metabolic receptors. Finally, we predict novel metabolic roles of
   receptors. These predictions can enhance biological understanding and
   the development of new receptor-targeting metabolic drugs.

   Subject terms: Systems biology, Computational biology and
   bioinformatics, Data processing, Functional clustering, Machine
   learning

Introduction

   The human body, like any other biological system, continually strives
   to maintain homeostasis and responds to changing conditions by
   activating feedback control loops between its sub-systems, organs and
   tissues. For example, to ensure survival of the whole organism, the
   endocrine system maintains long feedback loops of ligand secretion and
   receptor binding that preserve glucose or energy balance. Ligands are
   molecules secreted into the bloodstream from source organs; they bind
   to receptors located on the cell surface or within the cells of target
   organs. This complex network of whole-body ligand–receptor
   interactions serves as the information transducer of these feedback
   loops. Understanding the roles of these receptors is pivotal to
   modern medicine. Receptor dysregulation underlies the etiology of many
   human diseases (e.g., diabetes^[24]1), and prescription drugs are
   designed to affect the regulation of receptors, e.g., by disrupting
   their interaction with the ligand, and thereby produce therapeutic
   changes in the function of related biological systems^[25]1,[26]2.
   Moreover, receptors serve as entry points for viruses invading cells,
   e.g., the ACE-2 receptor mediates the entry of SARS-CoV-2, the virus
   that causes COVID-19, into lung cells^[27]3. Despite years of
   research, our present-day understanding of the tissue-specific
   functions of many receptors and their ligand intercellular signalling
   networks is still incomplete. Developing drugs continues to be a
   challenge, as advances in scientific knowledge of receptors have been
   relatively slow, relying on laborious experimentation that typically
   tests one or two receptors at a time in one or two tissues.

   The advent of ultrahigh-throughput sequencing technologies, together
   with algorithmic advances, now enables us to investigate hundreds of
   receptor-encoding genes systematically and simultaneously. A recent
   computational work^[28]4 defined cross-tissue expression of
   ligand–receptor pairs by merely measuring the expression levels of
   ligands and receptors across 144 cell types. A common task in the
   analysis of gene expression data is to detect gene–gene co-expression
   networks. These networks are based on the “guilt by association”
   concept, i.e., the observation that functionally related genes tend to
   be co-expressed^[29]5. Such networks are used to infer the functional
   roles of genes whose function is unknown by relating their
   co-expression networks to known biological processes.
   For example, Horan et al. annotated genes of known and unknown function
   by large-scale co-expression analysis^[30]6. Weighted Gene
   Co-expression Network Analysis (WGCNA)^[31]7 is a widely used
   algorithm for constructing co-expression networks. The algorithm groups
   related genes into gene modules (clusters) based on their co-expression
   patterns and topological similarity to neighbour genes in the network.
   Machine learning approaches are gaining popularity for gene expression
   analysis^[32]8,[33]9, and support vector machines (SVMs) are among
   the most widely used types of machine learning algorithms for solving
   binary classification problems^[34]10. SVMs have successfully
   classified functional modules and protein interaction networks from
   gene expression data^[35]8,[36]9. The binary SVM classifier defines a
   hyperplane in feature space (the space spanned by the properties of
   the data) that separates the positively labeled data (e.g., metabolic
   receptors) from the negatively labeled data (e.g., non-metabolic
   receptors). The k-NN (k-nearest neighbours) algorithm is a
   distance-based approach that classifies each data point according to
   the known classification of its nearest neighbours^[37]11.
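
   As a minimal sketch of the two classifier types described above (not
   the authors' implementation), the following Python code implements a
   k-NN majority vote and a linear SVM. Several things here are
   assumptions made for illustration only: the SVM is linear and is
   trained by plain subgradient descent on the regularised hinge loss,
   the function names and hyperparameters are hypothetical, and the
   feature values are made-up toy numbers rather than real enrichment
   scores.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def train_linear_svm(train_x, train_y, lam=0.01, lr=0.1, epochs=200):
    """Fit a separating hyperplane (w, b) by subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(train_x[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(train_x, train_y):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Point is inside the margin or misclassified: move the
                # hyperplane towards classifying it correctly.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: each receptor is described by two made-up
# enrichment scores; +1 = metabolic, -1 = non-metabolic.
features = [(3.0, 2.5), (2.8, 3.1), (3.5, 2.9),
            (0.2, 0.4), (0.5, 0.1), (0.3, 0.6)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(features, labels)
knn_high = knn_predict(features, labels, (3.0, 3.0))
knn_low = knn_predict(features, labels, (0.4, 0.3))
svm_high = svm_predict(w, b, (3.0, 3.0))
svm_low = svm_predict(w, b, (0.4, 0.3))
```

   The SVM summarises the training data in a single decision boundary,
   whereas k-NN keeps all labeled examples and decides locally, so the
   two can disagree on points that lie far from any labeled neighbour.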

   The GTEx project^[38]20 includes a unique collection of thousands of
   samples of RNA-seq gene expression data across multiple tissues
   collected from hundreds of donors. Using these data and focusing on
   metabolic receptors and adipose tissue, we ask two questions: (1)
   Is the expression of receptor-encoding genes widely correlated within
   tissues, and in adipose in particular? (2) How can we use these data
   to infer the metabolic roles of receptors in tissues and to detect new
   metabolic receptors that are not regarded as members of a specific,
   classically defined metabolic system? Together, answers to these
   questions can begin to delineate a comprehensive view of the metabolic
   signalling network.

   Here we present a new computational methodology to predict
   tissue-specific receptor metabolic functionality, which we applied to
   subcutaneous adipose. The methodology comprises three steps, A, B and
   C (see Fig. [39]1), and is based on our new finding that metabolic
   receptors are co-expressed, both among themselves and with other
   genes. In Step A, an annotated list of metabolic and non-metabolic
   receptors in adipose was constructed using a semi-supervised approach
   and literature-based validation. In Step B, we used the WGCNA
   algorithm^[40]7 for co-expression network analysis to generate gene
   modules (clusters) in subcutaneous adipose, followed by pathway
   enrichment analysis of these modules. In Step C, we used the
   enrichment scores to train SVM and k-nearest neighbour (k-NN)
   classifiers and compared their performance. Finally, we used the
   classifiers to predict new metabolic receptors, i.e., receptors with
   previously unknown metabolic functions, in adipose. We used an
   extensive list of ~700 receptors for the full analyses and
   predictions.
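
   To illustrate how a co-expression network can be turned into a
   pathway enrichment feature for a receptor, the following Python
   sketch may help. It rests on stated assumptions: a hard Pearson
   correlation threshold stands in for WGCNA (which uses soft
   thresholding and topological overlap), and the enrichment score is
   taken to be the negative log10 of a hypergeometric p-value, a common
   choice that is not necessarily the score used in this work. Gene
   names, pathway membership, the threshold and the universe size are
   all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def coexpressed_partners(expr, gene, threshold=0.8):
    """Genes whose profile correlates with `gene` at or above `threshold`.
    A hard threshold is a simplification; it is not the WGCNA procedure."""
    return {
        other
        for other, profile in expr.items()
        if other != gene and pearson(expr[gene], profile) >= threshold
    }


def enrichment_score(module, pathway, universe_size):
    """-log10 of the hypergeometric probability of seeing at least the
    observed overlap between `module` and `pathway` by chance."""
    overlap = len(module & pathway)
    n, k = len(module), len(pathway)
    tail = sum(
        math.comb(k, i) * math.comb(universe_size - k, n - i)
        for i in range(overlap, min(n, k) + 1)
    )
    return -math.log10(tail / math.comb(universe_size, n))


# Hypothetical expression profiles across five samples.
expr = {
    "RECEPTOR_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_X": [2.0, 4.0, 6.0, 8.0, 10.5],
    "GENE_Y": [5.0, 3.0, 4.0, 1.0, 2.0],
    "GENE_Z": [1.1, 2.2, 2.9, 4.1, 5.0],
}
# Hypothetical pathway gene set in a universe of 20 genes.
lipid_pathway = {"GENE_X", "GENE_Z", "GENE_W"}

partners = coexpressed_partners(expr, "RECEPTOR_A")
score = enrichment_score(partners, lipid_pathway, universe_size=20)
```

   Computing one such score per pathway gives each receptor a feature
   vector that a classifier can be trained on.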

Figure 1.

   Schematic view of the new computational methodology.

Results

   The new computational methodology predicts tissue-specific roles of
   metabolic receptors in subcutaneous adipose and comprises the following
   steps.

Step A: Subcutaneous adipose receptor labeled list

   Supervised learning requires an initial labeled list of known metabolic
   (positive examples) and non-metabolic (negative examples) receptors in
   a tissue for the training, performance evaluation and construction of
   the classifier.

   We chose to study adipose tissue^[43]13 since it is a highly active
   endocrine and metabolically important organ, with the ability to
   modulate glucose homeostasis, energy expenditure, lipid metabolism, and
   peripheral inflammation. In addition, existing knowledge about the
   roles of its metabolic receptors is extensive, and these roles have
   been tested experimentally more robustly than in other tissues.

   A main challenge was to distinguish the receptors that exhibit
   metabolic roles in adipose from those that do not. We note that we use
   the term metabolic receptors to include receptors related to the
   metabolic/endocytosis/growth regulation system^[44]14–[45]16. This
   knowledge is not easily available since public databases, such as KEGG,
   do not include a metabolic receptor classification in general or a
   tissue-specific metabolic receptors classification in particular. For
   example, the KEGG database includes the “Neuroactive ligand-receptor
   interaction” pathway that consists of a combination of metabolic and
   non-metabolic receptors. The insulin receptor is included in its own
   pathway, the KEGG insulin signalling pathway. Moreover, beyond the
   "pure" metabolic receptors, a receptor may play different roles in
   different parts of the body; e.g., a known inflammation-related
   cytokine receptor, which we might label as a non-metabolic (negative)
   example, may exhibit metabolic roles in adipose. An example is the
   cytokine receptor TNFRSF21 (tumor necrosis factor receptor superfamily
   member 21), which is included in the KEGG “Cytokine-cytokine receptor
   interaction” pathway but is also related to the “regulation of lipid
   metabolic process” term in GO (Gene Ontology)^[46]17,[47]18.

   To construct the initial list of positively labeled receptors, we
   gathered 33 metabolic receptors known from the literature to be
   related to the regulation of growth, endocytosis and
   metabolism^[48]14–[49]16. The reader is directed to Supplemental Table
   [50]S1 for this list and additional references for the metabolic