Abstract

   To achieve homeostasis, the human biological system relies on the
   interaction between organs through the binding of ligands secreted from
   source organs to receptors located on destination organs. Currently,
   the changing roles that receptors perform in tissues are only partially
   understood. Recently, a methodology based on receptor co-expression
   patterns to classify their tissue-specific metabolic functions was
   suggested. Here we present an advanced framework to predict an
   additional class of inflammatory receptors that use a feature space of
   biological pathway enrichment analysis scores of co-expression networks
   and their eigengene correlations. These are fed into three machine
   learning classifiers–eXtreme Gradient Boosting (XGBoost), Support
   Vector Machines (SVM), and K-Nearest Neighbors (k-NN). We applied our
   methodology to subcutaneous and visceral adipose gene expression
   datasets derived from the GTEx (Genotype-Tissue Expression) project and
   compared the predictions. The XGBoost model demonstrated the best
   performance in predicting the pre-labeled receptors, with an accuracy
   of 0.89/0.8 in subcutaneous/visceral adipose. We analyzed ~700
   receptors to predict eight new metabolic and 15 new inflammatory
   functions of receptors and four new metabolic functions for known
   inflammatory receptors in both adipose tissues. We cross-referenced
   multiple predictions using the published literature. Our results
   establish a picture of the changing functions of receptors for two
   adipose tissues that can be beneficial for drug development.

Introduction

   As the human system instinctively and continuously aims to maintain a
   steady state, the biological system reacts to different conditions by
   activating feedback control loops between the cells in tissues, which
   is manifested through the binding of chemical structures called
   receptors to their ligands [[26]1]. These receptors are proteins,
   usually cell surface receptors, which bind to their ligands and cause a
   required response in the cell. When a ligand binds to its corresponding
   receptor, it activates or inhibits the receptor’s associated
   biochemical pathway. Receptors can control membrane channels, induce
   cell growth, division, and death [[27]1]. For example, insulin is a
   metabolic hormone ligand that is secreted from pancreatic cells into
   the bloodstream to bind distant insulin receptors located on various
   cell types [[28]2]. Upon insulin binding, the insulin receptors start a
   cascade of molecular events that result in, among other developments,
   glucose absorption by the cells [[29]3]. Another example is cytokines,
   which are ligands that serve as immunomodulating agents [[30]4]. As
   such, they have immune-signaling and inflammatory receptors that
   respond to circulating levels of proinflammatory cytokines, adipokines
   and other immune markers and trigger immune and inflammatory signaling
   pathways that are found in various cell types, including immune cells
   and non-immune cells[[31]5].

   Since receptors play an important role in signal transduction within
   the cell, many drugs are designed to target receptors [[32]6,[33]7] and
   understanding the functions these receptors fulfill in different
   tissues is crucial in the development of these drugs. Despite years of
   biological experimental research, the current knowledge and
   understanding of the functions in general and the tissue-specific
   functions of many receptors specifically are lacking.

   High-throughput sequencing technologies generate gene expression data
   that measure the expression level of thousands of genes from a single
   experiment. Today, these technologies and algorithmic advancements
   enable us to research simultaneously hundreds of genes coded to
   receptors. A common task of gene expression analysis is the detection
   of gene–gene co-expression networks. The most popular method for
   specifying co-expression networks is Weighted Gene Co-Expression
   Network Analysis (WGCNA) [[34]8]. The WGCNA algorithm gathers together
   into gene modules (networks) related genes based on their co-expression
   patterns and topological closeness to neighbor genes in the network.
   The main concept behind WGCNA is that genes with similar functions
   might be co-expressed [[35]9] and thus co-expression networks are used
   to identify and categorize the functional roles of genes whose function
   is unknown.

   Supervised learning methods have been used to predict protein functions
   from gene expression data and gene co-expression [[36]10–[37]17]. For
   example, Support Vector Machines (SVMs) have successfully classified
   functional modules and protein interaction networks from gene
   expression data [[38]18]. Brown et al. [[39]10] proposed a method for
   functionally categorizing genes based on gene expression data. The
   authors examined numerous SVM models as well as other supervised
   learning approaches such as Parzen windows (a nonparametric method for
   estimating continuous density function), Fisher’s linear discriminant
   analysis (LDA), two decision tree classifiers (C4.5 and MOC1), and SVMs
   with different kernels. They discovered that SVM models with a radial
   kernel were the best at identifying groupings of genes with a common
   function using expression data according to the cost function the
   authors defined, trying to minimize the false-positive (FP) and
   false-negative (FN) errors (when giving double weight to the FN
   mistakes). The authors used log-transformed gene expression levels,
   i.e., DNA microarray hybridization experiments data, as the features’
   space for the SVM models and measured their performances compared to
   the null learning producer that classifies all test examples as
   negative (a dummy classifier that always predicts “negative” for all
   examples). The results showed that SVM can recognize some functional
   classes successfully and outperformed the other examined models.
   Furthermore, based on their expression data, they employed SVMs to
   infer new functional functions for unannotated yeast genes.

   Kiliç et al. [[40]11] tested the SVM model for semi-supervised positive
   unlabeled (PU) learning (further discussed in the Methods section) as
   part of a survey of PU learning algorithms. The authors used
   Escherichia coli gene expression data combined with known protein
   interactions in such a way that if two proteins are known to be
   interacting, the example consisting of their expression profiles is a
   positive example and all other protein pairs are treated as (unlabeled)
   negative examples. SVM showed promising results for protein interaction
   predictions. Although clusters of gene expression profiles can be
   informative about function, they might not always be coherent, as
   pointed out by Zhou et al [[41]12]. The latter authors investigated a
   graph-theoretic approach, in which genes are encoded as nodes, and
   edges connect genes with correlated expression profiles—a co-expression
   network. They carry on conducting a simple experiment in which the
   shortest path between genes with the same GO (gene ontology) term is
   analyzed to determine whether genes in the path belong to the same GO
   term or GO terms that are ancestors or descendants in the ontology. Wu
   et al. [[42]13] showed that this method can be used to predict the
   function of unknown genes from known genes that are part of the same
   shortest path with good accuracy for several types of genes
   (mitochondrial and cytoplasmic) but with medium accuracy for nuclear
   genes. Romero et al. [[43]14] proposed a method that combines cluster
   analysis with hierarchical multi-label classification (HMC) in which
   examples may belong to more than one class at each hierarchical level
   at the same time. They employed spectral clustering to extract novel
   features from the gene co-expression network (GCN) to enhance the
   function prediction job. To generate consistent predictions, they
   emphasized the need to develop new characteristics that indicate the
   GCN structural qualities and the hierarchical structure of biological
   processes. Obregón et al. [[44]15] used the gene’s location in the
   genomes to which they belong to predict their function. They executed
   machine learning models and trained them using attributes derived from
   the location of genes in the genomes to which they belong to predict
   thousands of gene functions. The authors demonstrated that, in some
   situations, gene location alone can be more valuable than sequencing in
   determining gene function. Peng et al. [[45]16] used network
   correlation to create a semi-supervised autoencoder approach for
   integrating various networks and generating a low-dimensional feature
   representation. The authors used multi-network embedding using a
   semi-Auto Encoder to map input networks into a non-linear and
   low-dimension space. A convolutional neural network based on those
   integrated features’ embeddings was used to identify unlabeled gene
   functions. Both yeast and human datasets were evaluated, and the
   approach outperformed three other methods. Tahzeeb et al. [[46]17]
   examined the ability of several neural networks to predict protein
   function using Gene Ontology terms. Each protein instance was
   associated with several Gene Ontology (GO) terms of molecular function,
   resulting in a multilabel classification of protein functions using a
   dataset of reviewed protein entries from nine bacterial phyla. In
   addition to the association of each protein to multiple terms of GO
   molecular function, the dataset includes features such as the sequence
   of amino acids that make up the corresponding protein, compositions of
   amino acids, dipeptides, and tripeptides; compositions of five groups
   of amino acids, namely aliphatic, aromatic, positively charged,
   negatively charged, and uncharged, and various structural and
   physiochemical properties derived from the amino acid sequence. The
   researchers found that single-layer neural networks with a small number
   of neurons outperformed multi-layer neural networks.

   The GTEx project [[47]19] includes a unique collection of more than
   8000 samples of RNA-seq gene expression data across multiple tissues
   collected from ~1000 donors. Using this data and focusing on metabolic
   and inflammatory roles of receptors, we ask the following question: How
   can we use gene expression data to predict the function of genes
   corresponding to proteins that represent receptors? Specifically, we
   focus on predicting two receptor functions: (1) metabolic functions
   that are related to the metabolic/endocytosis/growth regulation systems
   [[48]20–[49]22] and trigger various metabolic signaling pathways within
   the cell and (2) inflammatory functions that respond to circulating
   levels of proinflammatory cytokines, adipokines and other inflammatory
   markers and trigger inflammatory signaling pathways within the cell.

   Somekh [[50]23] suggested an approach for predicting the
   tissue-specific metabolic functions of receptor proteins based on gene
   expression data. The method was based on detecting receptor expression
   coordination patterns for over 700 receptors and predicting the
   metabolic roles of receptors in subcutaneous adipose tissue. The
   enrichment analysis scores of the receptor’s co-expression networks
   were fed as an input to SVM and k-NN classifiers. Using a
   semi-supervised technique and literature survey, Somekh [[51]23]
   compiled a list of known metabolic and non-metabolic receptors. Pathway
   enrichment scores were found by the authors to be highly successful
   indicators of correctly categorizing metabolic receptors in the
   subcutaneous adipose tissue.

   Here we extend and refine this previous work [[52]23] that predicted
   metabolic receptors in adipose subcutaneous by offering (1) an
   additional class of inflammatory receptors to classify three receptor
   classes–“metabolic”, “inflammatory”, and “other” class (neither
   metabolic nor inflammation-related), (2) an additional visceral adipose
   tissue, (3) an additional machine learning model–the XGBoost, and (4) a
   new feature for each tested receptor, based on the correlations between
   the receptor’s composing co-expression module eigengene and the
   correlations between this eigengene and the rest of the co-expression
   modules’ eigengenes. We add this feature to account for the modules’
   connectivity, e.g., to include data on “close” metabolic modules that
   might be positively correlated and may add more knowledge on receptor
   roles using its co-expression.

Results

   Our methodology classifies three classes of receptors applying to two
   adipose tissues. We validated our approach on the known labeled
   tissue-specific functions of receptors and further used our approach to
   predict new tissue-specific “metabolic”, “inflammatory”, and “other”
   functions of receptors in subcutaneous and visceral adipose. Our
   methodology is detailed in the following sections and a schematic view
   is presented in [53]Fig 1.

Fig 1. Schematic workflow of data preprocessing and the proposed methodology.

   [54]Fig 1
   [55]Open in a new tab

Receptor labeled lists

   To construct a machine learning model, a labeled list of known receptor
   classes in each tissue is required. As mentioned above, we focused on
   three classes of receptors–“metabolic”, “inflammatory” and “other”. By
   “metabolic” we refer to receptors related to the metabolic,
   endocytosis, or growth regulation systems [[56]20–[57]22]. By
   “inflammatory” we refer to receptors that are activated as a result of
   immune-related stimuli such as inflammation or chronic disease by known
   immune-related ligands such as cytokines and chemokines. By “other” we
   refer to a group of receptors that are neither metabolic nor
   inflammatory.

   Receptor labeling was challenging since tissue-specific labeling of
   receptors is not an established knowledge and we had to generate it.
   For the classification of metabolic receptors, we used the “metabolic”
   labeling created by Somekh et al. [[58]23], which was based on a
   literature review and semi-supervised learning. We generated the
   “inflammatory” labeling using the known cytokine receptors derived from
   the KEGG (Kyoto Encyclopedia of Genes and Genomes) [[59]24] database,
   the “cytokine-cytokine receptor interaction” KEGG pathway. The “other”
   class was inferred using semi-supervised learning (see details in the
   [60]Methods section). Examples of receptors that were labeled as
   "other" are—GFRA2, HTR1F, KCNA3, ADCY7 and CATSPER1. GFRA2 gene encodes
   to a potent neurotropic factor and a receptor of Neurturin (NRTN) which
   regulate the survival and function of neurons [“GFRA2 GDNF family
   receptor alpha 2 [Homo sapiens (human)]”. NCBI. Retrieved 21 July
   2022]. HTR1F gene encodes serotonin 5-TH 1F receptor that bind to the
   [61]endogenous neurotransmitter serotonin and mediate [62]inhibitory
   [63]neurotransmission [[64]"HTR2C 5-hydroxytryptamine receptor 2C [Homo
   sapiens (human)]". NCBI. Retrieved 21 July 2022]. KCNA3 gene encodes
   the Potassium voltage-gated channel, shaker-related subfamily, member 3
   protein. Potassium channels represent the most complex class of
   voltage-gated ion channels from both functional and structural
   standpoints. Their diverse functions include regulating
   neurotransmitter release, heart rate, insulin secretion, neuronal
   excitability, epithelial electrolyte transport, smooth muscle
   contraction, and cell volume [“KCNA3 potassium voltage-gated channel
   subfamily A member 3 [Homo sapiens (human)]”, NCBI. Retrieved 21 July
   2022]. Adenylate Cyclase 7 (ADCY7) encodes a membrane-bound adenylate
   cyclase that catalyzes the formation of cyclic AMP from ATP and is
   inhibitable by calcium [“ADCY7 adenylate cyclase 7 [ Homo sapiens
   (human)]”. NCBI. Retrieved 21 July 2022]. The Cation Channel Sperm
   Associated 1 (CATSPER1) plays a central role in calcium-dependent
   physiological responses essential for successful fertilization, such as
   sperm hyperactivation, acrosome reaction and chemotaxis towards the
   oocyte [“CATSPER1 Cation Channel Sperm Associated 1 [Homo sapiens
   (human)]”. NCBI. Retrieved 21 July 2022].

   This way we labeled three known receptor classes: “metabolic”,
   “inflammatory” and “other”. After labelling and processing, we had 44
   “metabolic” receptors, 40 “inflammatory” receptors, and 50 “other”
   receptors for subcutaneous adipose and 45 “metabolic” receptors, 47
   “inflammatory” receptors, and 48 “other” receptors for visceral
   adipose.

Data preparation

   The GTEx subcutaneous and visceral adipose gene expression data were
   filtered, pre-processed, and corrected for batch effects (see the
   [65]Methods section). After filtering, we were left with 656 samples
   and 16,058 genes for subcutaneous adipose and 486 samples, and 16,091
   genes for visceral adipose.

Co‑expression module construction and annotation

   We utilized the WGCNA [[66]8] algorithm to generate 61 subcutaneous
   adipose co-expression networks and 38 visceral adipose co-expression
   networks (see [67]Methods section). Module (cluster) dendrograms for
   subcutaneous and visceral adipose can be found in S3 and S4 Figs in
   [68]S1 File, respectively. Following the construction of the modules,
   we executed KEGG pathway enrichment analysis for each module to
   generate their enrichment scores [[69]25]. The annotations of modules
   that include multiple known labeled receptors are demonstrated in
   [70]Fig 2. The figure presents a heatmap of ten representative WGCNA
   co-expression modules and their enrichment scores (-log(p-value) for
   p-values < 0.01) for KEGG’s biological pathways for both subcutaneous
   and visceral adipose. It can be seen that the modules that are enriched
   with multiple known metabolic receptors (highlighted as “Metabolic” on
   the x-axis), are enriched with metabolic biological pathways. For
   example, module #1 in subcutaneous adipose, which includes 52% of our
   “metabolic” labeled receptors (see S4 Table in [71]S1 File), is highly
   enriched with metabolic KEGG pathways that are classified as a
   “Metabolism” class according to the BRITE classification (shown in blue
   on the annotation column to the left). The modules that include
   multiple inflammatory receptors (highlighted as “Immune” on the x-axis)
   are significantly enriched with multiple pathways that are classified
   as “Human Diseases” (highlighted in green on the annotation column to
   the left). Heatmaps that include the full list of generated modules and
   their significantly enriched pathways are presented in S1 and S2 Figs
   in [72]S1 File for subcutaneous and visceral adipose, respectively. S4
   and S5 Tables in [73]S1 File show that many receptors with similar
   functions tend to be clustered together across several main modules and
   present the percentages of labeled receptors from each class within the
   WGCNA modules. The full distribution of labeled receptors into the
   different WCGNA-generated modules can be found in S7 Table in [74]S1
   File. Nevertheless, there are receptors, e.g., metabolic receptors,
   that are clustered in distinct modules and can be detected as metabolic
   only by using the new feature of module correlations and enrichment
   scores that are fed into the machine learning classifiers.

Fig 2. Pathway enrichment analysis of ten representative WGCNA modules.

   [75]Fig 2
   [76]Open in a new tab

   The significantly enriched KEGG pathways (adjusted p-values < = 0.01)
   and enrichment scores calculated for subcutaneous and visceral adipose
   are presented. The columns present the modules (that include many
   labeled receptors) in each tissue and the rows represent the
   significantly enriched KEGG biological pathways. The matrix cells
   present the enrichment scores of the pathways for each module. The
   pathways are classified into six classes according to the BRITE
   classification and highlighted in the annotation column to the left
   (see the y-axis). For example, you can see that the modules that
   include many metabolic receptors (highlighted as “Metabolic” on the
   x-axis) are highly enriched with the “Metabolism” classification
   (colored in blue on the y-axis).

Machine learning model construction and validation

   We employed the XGBoost, linear SVM, and k-NN models to tackle the
   problem of multiclass classification of receptors in subcutaneous and
   visceral adipose tissues (see the [77]Methods section). All models
   utilized the following feature space per receptor: (1) the enrichment
   scores of the KEGG pathways applied to each receptor’s module, and (2)
   the receptor’s module eigengene correlations with other modules. To
   assess the performance of our classifiers, we utilized tenfold
   cross-validation (see the [78]Methods section).

   [79]Table 1 shows the performance of the classifiers for each adipose
   tissue in the three-class experiment. It can be seen that the XGB
   classifier outperforms the SVM and k-NN classifiers for both tissues,
   with accuracies of 0.89 and 0.8 for subcutaneous adipose and visceral
   adipose, correspondingly.

Table 1. Performance evaluation for predicting “inflammatory”, “metabolic”
and “other” receptor types in subcutaneous and visceral adipose.

     Method Adipose tissue Accuracy Precision Recall  F1
   1  XGB   Subcutaneous    0.89      0.91    0.88   0.87
   2  SVM    Subcutaneous    0.85     0.85     0.84  0.82
   3  k-NN   Subcutaneous    0.87     0.89     0.87  0.86
   1  XGB     Visceral       0.8      0.83     0.8   0.79
   2  SVM      Visceral      0.69     0.74     0.69  0.66
   3  k-NN     Visceral      0.79     0.81     0.79  0.78
   [80]Open in a new tab

   We then investigated the receptors that were misclassified by our
   models, i.e., their known functions and the predicted functions were
   not identical. [81]Table 2 presents the FP and FN misclassified
   receptors and highlights in bold the common misclassified receptors in
   both tissues. Interestingly, the EPOR, TNFRSF21, TNFRSF25, and IFNGR1
   receptors that we labeled as inflammatory (based on KEGG’s
   cytokine-cytokine receptor interaction pathway) are predicted to be
   “metabolic” for both adipose tissues. LDLR and TFRC that we labeled as
   metabolic are both predicted to be “inflammatory” by the model for both
   adipose tissues. Some of the predictions are consistent for both
   tissues. For example, the inflammatory labeled receptors EPOR,
   TNFRSF21, IFNGR1, and TNFRSF25, as noted before, are predicted by the
   models to be metabolic (and not “other”) in both tissues. Indeed, we
   found experimental validation supporting our predictions
   (misclassifications) which we elaborate on in the discussion section.

Table 2. Misclassified receptors in subcutaneous and visceral adipose with
their true/predicted labels and probabilities.

   Gene Symbol Tissue True Label Predicted Label inflammatory probability
   Metabolic probability Other probability
   1 LTBR SA inflammatory Metabolic 0.027 0.953 0.019
   2 EPOR SA/VA Inflammatory Metabolic 0.014/0.013 0.982/0.982 0.004/0.005
   3 TNFRSF25 SA/VA Inflammatory Metabolic 0.140/0.124 0.843/0.867
   0.017/0.009
   4 TNFRSF21 SA/VA Inflammatory Metabolic 0.004/0.008 0.992/0.989
   0.004/0.004
   5 IL17RA SA Inflammatory Metabolic 0.026 0.968 0.006
   6 IFNAR1 SA Inflammatory Metabolic 0.039 0.947 0.014
   7 IFNGR1 SA/VA Inflammatory Metabolic 0.022/0.030 0.973/0.968
   0.005/0.001
   8 CCR2 SA/VA Inflammatory Other 0.010/0.014 0.003/0.002 0.987/0.984
   9 XCR1 SA Inflammatory Other 0.157 0.012 0.831
   10 IL10RA SA/VA Inflammatory Other 0.032/0.003 0.010/0.001 0.958/0.996
   11 IL12RB1 SA/VA Inflammatory Other 0.005/0.005 0.007/0.001 0.987/0.995
   12 IL2RG SA/VA Inflammatory Other 0.004/0.014 0.004/0.001 0.993/0.986
   13 F3 SA Metabolic Inflammatory 0.919 0.076 0.004
   14 TFRC SA/VA Metabolic Inflammatory 0.603/0.776 0.340/0.215
   0.057/0.009
   15 LDLR SA/VA Metabolic Inflammatory 0.877/0.986 0.071/0.008
   0.052/0.007
   16 LEPR SA Metabolic Inflammatory 0.897 0.097 0.005
   17 DRD4 SA Metabolic Inflammatory 0.962 0.027 0.010
   18 ADRA2B SA Metabolic Other 0.061 0.129 0.810
   19 TFR2 SA Metabolic Other 0.011 0.002 0.987
   20 CD28 SA Other Inflammatory 0.704 0.014 0.282
   21 MPL VA Inflammatory Metabolic 0.019 0.979 0.002
   22 TNFRSF10D VA Inflammatory Metabolic 0.133 0.856 0.012
   23 FAS VA Inflammatory Metabolic 0.061 0.934 0.005
   24 IL22RA1 VA Inflammatory Metabolic 0.435 0.559 0.006
   25 IL3RA VA Inflammatory Metabolic 0.028 0.970 0.003
   26 CD27 VA Inflammatory Other 0.115 0.001 0.884
   27 CSF2RB VA Inflammatory Other 0.035 0.002 0.963
   28 CCR6 VA Inflammatory Other 0.211 0.008 0.781
   28 IL2RB VA Inflammatory Other 0.110 0.007 0.883
   30 IL10RB VA Inflammatory Other 0.004 0.001 0.995
   31 EDNRB VA Metabolic Inflammatory 0.970 0.025 0.005
   32 NOTCH4 VA Metabolic Inflammatory 0.732 0.263 0.005
   33 ADRB2 VA Metabolic Inflammatory 0.815 0.177 0.008
   34 FGFR2 VA Metabolic Inflammatory 0.865 0.132 0.003
   35 S1PR4 VA Metabolic Inflammatory 0.988 0.007 0.005
   36 CATSPER1 VA Other Inflammatory 0.573 0.006 0.421
   37 KCNN4 VA Other Inflammatory 0.660 0.004 0.336
   38 CD5 VA Other Inflammatory 0.664 0.004 0.332
   39 NCR3 VA Other Inflammatory 0.804 0.014 0.182
   40 CD48 VA Other Inflammatory 0.503 0.013 0.484
   41 ITGAL VA Other Inflammatory 0.900 0.008 0.092
   [82]Open in a new tab

   Receptors predicted as FN/FP in both adipose tissues are shown in bold.
   “SA” represents subcutaneous adipose and “VA” visceral adipose.

Feature analysis for detecting significant biological pathways

   We used feature analysis with the SHapley Additive exPlanations (SHAP)
   [[83]26] to find the most predictive features, i.e., KEGG pathways
   significantly enriched with genes that are included in the receptor’s
   module and that drive the prediction of receptors (see the full
   description in the Experimental Design section). The SHAP values of
   each feature (KEGG pathway) represent the feature’s impact on the model
   output/classification of receptors. [84]Fig 3A and 3B show the ten most
   important features for subcutaneous and visceral adipose tissues,
   respectively. It shows the average SHAP influence on the magnitude of
   model output in absolute values for the top ten features in our
   three-class model. It can be seen that the most important feature
   affecting the “metabolic” classification of receptors (highlighted in
   pink in [85]Fig 3A) is the “Diabetic cardiomyopathy” pathway. Diabetic
   cardiomyopathy is defined as left ventricular dysfunction that occurs
   among patients with diabetes mellitus independent of a recognized cause
   such as coronary artery disease or hypertension
   [[86]https://www.kegg.jp/entry/hsa05415] and is characterized by
   insulin and metabolic resistance genes [[87]27]. The enrichment score
   of the “Necroptosis” pathway (third from the top in [88]Fig 3A), which
   is related to cell apoptosis and death, is most significant for
   classifying inflammatory receptors. We also highlighted KEGG’s BRITE
   hierarchy for annotation of these top KEGG pathways. It can be seen
   that many of the top pathways that drive the classification of
   metabolic/inflammatory/other receptors are annotated as “Metabolism”
   (shown using green dots) and “Immune system/disease” (shown using red
   dots). For example, see [89]Fig 3A where the “Linoleic acid
   metabolism”, “Glycosaminoglycan biosynthesis chondroitin sulfate”,
   “Phenylalanine” and “alpha-Linoleic acid metabolism” pathways are
   annotated as “Metabolism” (colored in green) for subcutaneous adipose
   and [90]Fig 3B where the top five “Immune system/disease” annotated
   pathways for adipose visceral are shown in red. We note that the values
   are absolute and the interpretation is not always straightforward,
   meaning that the combination of distinct features between the three
   classes and the metabolically annotated pathways (highlighted in green)
   are presented by the model as important features to distinguish the
   “other” type of receptors from the “metabolic” and “inflammatory”
   classifications. To get a better understanding of what we were seeing,
   we used a directional SHAP analysis that can only be generated for two
   types of classes (since it shows direction). We analyzed the metabolic
   against the inflammatory classes and used the SHAP method to
   investigate the direction of each feature’s contribution to the class
   classification in subcutaneous adipose (see [91]Fig 4).

Fig 3. Top 10 features (pathways) and their average SHAP influence (absolute
values) on the magnitude of the model prediction.

   [92]Fig 3
   [93]Open in a new tab

   A. Calculated for subcutaneous adipose. B. Calculated for visceral
   adipose. The SHAP method illustrates the magnitude of the effect of
   each feature (KEGG pathway) on each classification. The colored dots to
   the left highlight KEGG’s BRITE annotations of the pathways.

Fig 4. SHAP subcutaneous adipose “metabolic” class variable significance
plot.

   [94]Fig 4
   [95]Open in a new tab

   The values show the impact of the feature on model output (prediction).
   The plot is composed of all receptors in the training data. SHAP values
   indicate how much the feature contributes to the classification.

   [96]Fig 4 shows the total magnitudes of the SHAP values over all
   samples as a plot of features sorted in descending order by their
   relevance and uses the SHAP values to highlight the distribution of
   their impact on the model output (metabolic) prediction. Here we
   analyze and show how the value of the feature affects the “metabolic”
   class as opposed to the “inflammatory” class. The horizontal position
   indicates the influence of each feature, i.e., whether that value’s
   effect is related to a greater or lower prediction for the metabolic
   class. The coloring corresponds to each feature’s original values
   across samples and indicates whether that feature value (pathway
   enrichment score) is high (red) or low (blue) for that observation. The
   SHAP values of each feature are represented on the x-axis and represent
   the feature’s impact on the model output; the features (e.g., KEGG
   pathways) are on the y-axis. For example, a high value (red colored
   dots) of the enrichment score for “Diabetic cardiomyopathy” (the third
   from the bottom) has a negative impact (a negative SHAP value on the
   x-axis) on the “metabolic” type receptor prediction. In other words, a
   higher enrichment score for this pathway drives a metabolic prediction
   in most cases, increasing the probability of the receptor being
   categorized as belonging to the metabolic receptor group. An additional
   analysis of the effect (direction) of each feature’s contribution to
   the metabolic, inflammatory or other class classification is presented
   in S5 and S6 Figs in [97]S1 File.

Prediction

   Finally, we chose the XGB model, which outperformed the other models,
   to use for predicting the unlabeled receptors. Out of the 692 known
   receptors list derived from Ramilowski et al. [[98]28] we retained the
   receptors that were included in the GTEx dataset, 594 and 600 for
   subcutaneous and visceral adipose respectively. From these, we retained
   the receptors that were included in the co-expression modules, 446 and
   485 for subcutaneous and visceral adipose, respectively. We used 134
   and 140 labeled receptors, for subcutaneous and visceral adipose
   respectively, for the training and testing phases (see [99]methods).
   Finally, we executed the model to predict the function of the remaining
   312 and 345 unlabeled receptors for subcutaneous and visceral adipose
   respectively. The XGB model predicted 96 and 46 new unknown
   inflammatory receptors and 24 and 22 new unknown metabolic receptors
   (with probability > 0.85) for subcutaneous and visceral adipose,
   respectively. These full lists of newly predicted receptors for
   subcutaneous and visceral adipose can be found in S1 and S2 Tables in
   [100]S1 File, respectively. The receptors that were classified in the
   same way in both adipose tissues (classification probability > 0.85)
   are presented in [101]Table 3. We surveyed the literature for relevant
   wet lab experiments in support of our predictions. Column 5 in
   [102]Table 3 include the literature verification of the predictions,
   describing the experimental summary and the manuscript reference. We
   note that several inflammatory receptors (e.g., TNFRSF1B in row 9,
   [103]Table 3) which were included in the KEGG “Cytokine-cytokine
   signaling” pathway, were previously filtered out by us (see
   [104]Methods) since they are related to metabolic functions by GO.
   Nevertheless, these receptors are predicted by our classifiers to
   trigger inflammatory functions in adipose tissues. In addition,
   receptors that are strongly predicted to change functions (the
   predictions’ probability absolute difference is > 0.85) between the two
   tissues (e.g., predicted to be “metabolic” in one tissue and
   “inflammatory” in the other) are presented in S6 Table in [105]S1 File.

Table 3. Receptors with unknown metabolic and inflammatory functions, which
we predicted to have similar functions in both subcutaneous and visceral
adipose tissues, and references supporting our predictions.