Abstract To achieve homeostasis, the human biological system relies on the interaction between organs through the binding of ligands secreted from source organs to receptors located on destination organs. Currently, the changing roles that receptors perform in tissues are only partially understood. Recently, a methodology based on receptor co-expression patterns to classify their tissue-specific metabolic functions was suggested. Here we present an advanced framework to predict an additional class of inflammatory receptors that use a feature space of biological pathway enrichment analysis scores of co-expression networks and their eigengene correlations. These are fed into three machine learning classifiers–eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and K-Nearest Neighbors (k-NN). We applied our methodology to subcutaneous and visceral adipose gene expression datasets derived from the GTEx (Genotype-Tissue Expression) project and compared the predictions. The XGBoost model demonstrated the best performance in predicting the pre-labeled receptors, with an accuracy of 0.89/0.8 in subcutaneous/visceral adipose. We analyzed ~700 receptors to predict eight new metabolic and 15 new inflammatory functions of receptors and four new metabolic functions for known inflammatory receptors in both adipose tissues. We cross-referenced multiple predictions using the published literature. Our results establish a picture of the changing functions of receptors for two adipose tissues that can be beneficial for drug development. Introduction As the human system instinctively and continuously aims to maintain a steady state, the biological system reacts to different conditions by activating feedback control loops between the cells in tissues, which is manifested through the binding of chemical structures called receptors to their ligands [[26]1]. These receptors are proteins, usually cell surface receptors, which bind to their ligands and cause a required response in the cell. When a ligand binds to its corresponding receptor, it activates or inhibits the receptor’s associated biochemical pathway. Receptors can control membrane channels, induce cell growth, division, and death [[27]1]. For example, insulin is a metabolic hormone ligand that is secreted from pancreatic cells into the bloodstream to bind distant insulin receptors located on various cell types [[28]2]. Upon insulin binding, the insulin receptors start a cascade of molecular events that result in, among other developments, glucose absorption by the cells [[29]3]. Another example is cytokines, which are ligands that serve as immunomodulating agents [[30]4]. As such, they have immune-signaling and inflammatory receptors that respond to circulating levels of proinflammatory cytokines, adipokines and other immune markers and trigger immune and inflammatory signaling pathways that are found in various cell types, including immune cells and non-immune cells[[31]5]. Since receptors play an important role in signal transduction within the cell, many drugs are designed to target receptors [[32]6,[33]7] and understanding the functions these receptors fulfill in different tissues is crucial in the development of these drugs. Despite years of biological experimental research, the current knowledge and understanding of the functions in general and the tissue-specific functions of many receptors specifically are lacking. High-throughput sequencing technologies generate gene expression data that measure the expression level of thousands of genes from a single experiment. Today, these technologies and algorithmic advancements enable us to research simultaneously hundreds of genes coded to receptors. A common task of gene expression analysis is the detection of gene–gene co-expression networks. The most popular method for specifying co-expression networks is Weighted Gene Co-Expression Network Analysis (WGCNA) [[34]8]. The WGCNA algorithm gathers together into gene modules (networks) related genes based on their co-expression patterns and topological closeness to neighbor genes in the network. The main concept behind WGCNA is that genes with similar functions might be co-expressed [[35]9] and thus co-expression networks are used to identify and categorize the functional roles of genes whose function is unknown. Supervised learning methods have been used to predict protein functions from gene expression data and gene co-expression [[36]10–[37]17]. For example, Support Vector Machines (SVMs) have successfully classified functional modules and protein interaction networks from gene expression data [[38]18]. Brown et al. [[39]10] proposed a method for functionally categorizing genes based on gene expression data. The authors examined numerous SVM models as well as other supervised learning approaches such as Parzen windows (a nonparametric method for estimating continuous density function), Fisher’s linear discriminant analysis (LDA), two decision tree classifiers (C4.5 and MOC1), and SVMs with different kernels. They discovered that SVM models with a radial kernel were the best at identifying groupings of genes with a common function using expression data according to the cost function the authors defined, trying to minimize the false-positive (FP) and false-negative (FN) errors (when giving double weight to the FN mistakes). The authors used log-transformed gene expression levels, i.e., DNA microarray hybridization experiments data, as the features’ space for the SVM models and measured their performances compared to the null learning producer that classifies all test examples as negative (a dummy classifier that always predicts “negative” for all examples). The results showed that SVM can recognize some functional classes successfully and outperformed the other examined models. Furthermore, based on their expression data, they employed SVMs to infer new functional functions for unannotated yeast genes. Kiliç et al. [[40]11] tested the SVM model for semi-supervised positive unlabeled (PU) learning (further discussed in the Methods section) as part of a survey of PU learning algorithms. The authors used Escherichia coli gene expression data combined with known protein interactions in such a way that if two proteins are known to be interacting, the example consisting of their expression profiles is a positive example and all other protein pairs are treated as (unlabeled) negative examples. SVM showed promising results for protein interaction predictions. Although clusters of gene expression profiles can be informative about function, they might not always be coherent, as pointed out by Zhou et al [[41]12]. The latter authors investigated a graph-theoretic approach, in which genes are encoded as nodes, and edges connect genes with correlated expression profiles—a co-expression network. They carry on conducting a simple experiment in which the shortest path between genes with the same GO (gene ontology) term is analyzed to determine whether genes in the path belong to the same GO term or GO terms that are ancestors or descendants in the ontology. Wu et al. [[42]13] showed that this method can be used to predict the function of unknown genes from known genes that are part of the same shortest path with good accuracy for several types of genes (mitochondrial and cytoplasmic) but with medium accuracy for nuclear genes. Romero et al. [[43]14] proposed a method that combines cluster analysis with hierarchical multi-label classification (HMC) in which examples may belong to more than one class at each hierarchical level at the same time. They employed spectral clustering to extract novel features from the gene co-expression network (GCN) to enhance the function prediction job. To generate consistent predictions, they emphasized the need to develop new characteristics that indicate the GCN structural qualities and the hierarchical structure of biological processes. Obregón et al. [[44]15] used the gene’s location in the genomes to which they belong to predict their function. They executed machine learning models and trained them using attributes derived from the location of genes in the genomes to which they belong to predict thousands of gene functions. The authors demonstrated that, in some situations, gene location alone can be more valuable than sequencing in determining gene function. Peng et al. [[45]16] used network correlation to create a semi-supervised autoencoder approach for integrating various networks and generating a low-dimensional feature representation. The authors used multi-network embedding using a semi-Auto Encoder to map input networks into a non-linear and low-dimension space. A convolutional neural network based on those integrated features’ embeddings was used to identify unlabeled gene functions. Both yeast and human datasets were evaluated, and the approach outperformed three other methods. Tahzeeb et al. [[46]17] examined the ability of several neural networks to predict protein function using Gene Ontology terms. Each protein instance was associated with several Gene Ontology (GO) terms of molecular function, resulting in a multilabel classification of protein functions using a dataset of reviewed protein entries from nine bacterial phyla. In addition to the association of each protein to multiple terms of GO molecular function, the dataset includes features such as the sequence of amino acids that make up the corresponding protein, compositions of amino acids, dipeptides, and tripeptides; compositions of five groups of amino acids, namely aliphatic, aromatic, positively charged, negatively charged, and uncharged, and various structural and physiochemical properties derived from the amino acid sequence. The researchers found that single-layer neural networks with a small number of neurons outperformed multi-layer neural networks. The GTEx project [[47]19] includes a unique collection of more than 8000 samples of RNA-seq gene expression data across multiple tissues collected from ~1000 donors. Using this data and focusing on metabolic and inflammatory roles of receptors, we ask the following question: How can we use gene expression data to predict the function of genes corresponding to proteins that represent receptors? Specifically, we focus on predicting two receptor functions: (1) metabolic functions that are related to the metabolic/endocytosis/growth regulation systems [[48]20–[49]22] and trigger various metabolic signaling pathways within the cell and (2) inflammatory functions that respond to circulating levels of proinflammatory cytokines, adipokines and other inflammatory markers and trigger inflammatory signaling pathways within the cell. Somekh [[50]23] suggested an approach for predicting the tissue-specific metabolic functions of receptor proteins based on gene expression data. The method was based on detecting receptor expression coordination patterns for over 700 receptors and predicting the metabolic roles of receptors in subcutaneous adipose tissue. The enrichment analysis scores of the receptor’s co-expression networks were fed as an input to SVM and k-NN classifiers. Using a semi-supervised technique and literature survey, Somekh [[51]23] compiled a list of known metabolic and non-metabolic receptors. Pathway enrichment scores were found by the authors to be highly successful indicators of correctly categorizing metabolic receptors in the subcutaneous adipose tissue. Here we extend and refine this previous work [[52]23] that predicted metabolic receptors in adipose subcutaneous by offering (1) an additional class of inflammatory receptors to classify three receptor classes–“metabolic”, “inflammatory”, and “other” class (neither metabolic nor inflammation-related), (2) an additional visceral adipose tissue, (3) an additional machine learning model–the XGBoost, and (4) a new feature for each tested receptor, based on the correlations between the receptor’s composing co-expression module eigengene and the correlations between this eigengene and the rest of the co-expression modules’ eigengenes. We add this feature to account for the modules’ connectivity, e.g., to include data on “close” metabolic modules that might be positively correlated and may add more knowledge on receptor roles using its co-expression. Results Our methodology classifies three classes of receptors applying to two adipose tissues. We validated our approach on the known labeled tissue-specific functions of receptors and further used our approach to predict new tissue-specific “metabolic”, “inflammatory”, and “other” functions of receptors in subcutaneous and visceral adipose. Our methodology is detailed in the following sections and a schematic view is presented in [53]Fig 1. Fig 1. Schematic workflow of data preprocessing and the proposed methodology. [54]Fig 1 [55]Open in a new tab Receptor labeled lists To construct a machine learning model, a labeled list of known receptor classes in each tissue is required. As mentioned above, we focused on three classes of receptors–“metabolic”, “inflammatory” and “other”. By “metabolic” we refer to receptors related to the metabolic, endocytosis, or growth regulation systems [[56]20–[57]22]. By “inflammatory” we refer to receptors that are activated as a result of immune-related stimuli such as inflammation or chronic disease by known immune-related ligands such as cytokines and chemokines. By “other” we refer to a group of receptors that are neither metabolic nor inflammatory. Receptor labeling was challenging since tissue-specific labeling of receptors is not an established knowledge and we had to generate it. For the classification of metabolic receptors, we used the “metabolic” labeling created by Somekh et al. [[58]23], which was based on a literature review and semi-supervised learning. We generated the “inflammatory” labeling using the known cytokine receptors derived from the KEGG (Kyoto Encyclopedia of Genes and Genomes) [[59]24] database, the “cytokine-cytokine receptor interaction” KEGG pathway. The “other” class was inferred using semi-supervised learning (see details in the [60]Methods section). Examples of receptors that were labeled as "other" are—GFRA2, HTR1F, KCNA3, ADCY7 and CATSPER1. GFRA2 gene encodes to a potent neurotropic factor and a receptor of Neurturin (NRTN) which regulate the survival and function of neurons [“GFRA2 GDNF family receptor alpha 2 [Homo sapiens (human)]”. NCBI. Retrieved 21 July 2022]. HTR1F gene encodes serotonin 5-TH 1F receptor that bind to the [61]endogenous neurotransmitter serotonin and mediate [62]inhibitory [63]neurotransmission [[64]"HTR2C 5-hydroxytryptamine receptor 2C [Homo sapiens (human)]". NCBI. Retrieved 21 July 2022]. KCNA3 gene encodes the Potassium voltage-gated channel, shaker-related subfamily, member 3 protein. Potassium channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume [“KCNA3 potassium voltage-gated channel subfamily A member 3 [Homo sapiens (human)]”, NCBI. Retrieved 21 July 2022]. Adenylate Cyclase 7 (ADCY7) encodes a membrane-bound adenylate cyclase that catalyzes the formation of cyclic AMP from ATP and is inhibitable by calcium [“ADCY7 adenylate cyclase 7 [ Homo sapiens (human)]”. NCBI. Retrieved 21 July 2022]. The Cation Channel Sperm Associated 1 (CATSPER1) plays a central role in calcium-dependent physiological responses essential for successful fertilization, such as sperm hyperactivation, acrosome reaction and chemotaxis towards the oocyte [“CATSPER1 Cation Channel Sperm Associated 1 [Homo sapiens (human)]”. NCBI. Retrieved 21 July 2022]. This way we labeled three known receptor classes: “metabolic”, “inflammatory” and “other”. After labelling and processing, we had 44 “metabolic” receptors, 40 “inflammatory” receptors, and 50 “other” receptors for subcutaneous adipose and 45 “metabolic” receptors, 47 “inflammatory” receptors, and 48 “other” receptors for visceral adipose. Data preparation The GTEx subcutaneous and visceral adipose gene expression data were filtered, pre-processed, and corrected for batch effects (see the [65]Methods section). After filtering, we were left with 656 samples and 16,058 genes for subcutaneous adipose and 486 samples, and 16,091 genes for visceral adipose. Co‑expression module construction and annotation We utilized the WGCNA [[66]8] algorithm to generate 61 subcutaneous adipose co-expression networks and 38 visceral adipose co-expression networks (see [67]Methods section). Module (cluster) dendrograms for subcutaneous and visceral adipose can be found in S3 and S4 Figs in [68]S1 File, respectively. Following the construction of the modules, we executed KEGG pathway enrichment analysis for each module to generate their enrichment scores [[69]25]. The annotations of modules that include multiple known labeled receptors are demonstrated in [70]Fig 2. The figure presents a heatmap of ten representative WGCNA co-expression modules and their enrichment scores (-log(p-value) for p-values < 0.01) for KEGG’s biological pathways for both subcutaneous and visceral adipose. It can be seen that the modules that are enriched with multiple known metabolic receptors (highlighted as “Metabolic” on the x-axis), are enriched with metabolic biological pathways. For example, module #1 in subcutaneous adipose, which includes 52% of our “metabolic” labeled receptors (see S4 Table in [71]S1 File), is highly enriched with metabolic KEGG pathways that are classified as a “Metabolism” class according to the BRITE classification (shown in blue on the annotation column to the left). The modules that include multiple inflammatory receptors (highlighted as “Immune” on the x-axis) are significantly enriched with multiple pathways that are classified as “Human Diseases” (highlighted in green on the annotation column to the left). Heatmaps that include the full list of generated modules and their significantly enriched pathways are presented in S1 and S2 Figs in [72]S1 File for subcutaneous and visceral adipose, respectively. S4 and S5 Tables in [73]S1 File show that many receptors with similar functions tend to be clustered together across several main modules and present the percentages of labeled receptors from each class within the WGCNA modules. The full distribution of labeled receptors into the different WCGNA-generated modules can be found in S7 Table in [74]S1 File. Nevertheless, there are receptors, e.g., metabolic receptors, that are clustered in distinct modules and can be detected as metabolic only by using the new feature of module correlations and enrichment scores that are fed into the machine learning classifiers. Fig 2. Pathway enrichment analysis of ten representative WGCNA modules. [75]Fig 2 [76]Open in a new tab The significantly enriched KEGG pathways (adjusted p-values < = 0.01) and enrichment scores calculated for subcutaneous and visceral adipose are presented. The columns present the modules (that include many labeled receptors) in each tissue and the rows represent the significantly enriched KEGG biological pathways. The matrix cells present the enrichment scores of the pathways for each module. The pathways are classified into six classes according to the BRITE classification and highlighted in the annotation column to the left (see the y-axis). For example, you can see that the modules that include many metabolic receptors (highlighted as “Metabolic” on the x-axis) are highly enriched with the “Metabolism” classification (colored in blue on the y-axis). Machine learning model construction and validation We employed the XGBoost, linear SVM, and k-NN models to tackle the problem of multiclass classification of receptors in subcutaneous and visceral adipose tissues (see the [77]Methods section). All models utilized the following feature space per receptor: (1) the enrichment scores of the KEGG pathways applied to each receptor’s module, and (2) the receptor’s module eigengene correlations with other modules. To assess the performance of our classifiers, we utilized tenfold cross-validation (see the [78]Methods section). [79]Table 1 shows the performance of the classifiers for each adipose tissue in the three-class experiment. It can be seen that the XGB classifier outperforms the SVM and k-NN classifiers for both tissues, with accuracies of 0.89 and 0.8 for subcutaneous adipose and visceral adipose, correspondingly. Table 1. Performance evaluation for predicting “inflammatory”, “metabolic” and “other” receptor types in subcutaneous and visceral adipose. Method Adipose tissue Accuracy Precision Recall F1 1 XGB Subcutaneous 0.89 0.91 0.88 0.87 2 SVM Subcutaneous 0.85 0.85 0.84 0.82 3 k-NN Subcutaneous 0.87 0.89 0.87 0.86 1 XGB Visceral 0.8 0.83 0.8 0.79 2 SVM Visceral 0.69 0.74 0.69 0.66 3 k-NN Visceral 0.79 0.81 0.79 0.78 [80]Open in a new tab We then investigated the receptors that were misclassified by our models, i.e., their known functions and the predicted functions were not identical. [81]Table 2 presents the FP and FN misclassified receptors and highlights in bold the common misclassified receptors in both tissues. Interestingly, the EPOR, TNFRSF21, TNFRSF25, and IFNGR1 receptors that we labeled as inflammatory (based on KEGG’s cytokine-cytokine receptor interaction pathway) are predicted to be “metabolic” for both adipose tissues. LDLR and TFRC that we labeled as metabolic are both predicted to be “inflammatory” by the model for both adipose tissues. Some of the predictions are consistent for both tissues. For example, the inflammatory labeled receptors EPOR, TNFRSF21, IFNGR1, and TNFRSF25, as noted before, are predicted by the models to be metabolic (and not “other”) in both tissues. Indeed, we found experimental validation supporting our predictions (misclassifications) which we elaborate on in the discussion section. Table 2. Misclassified receptors in subcutaneous and visceral adipose with their true/predicted labels and probabilities. Gene Symbol Tissue True Label Predicted Label inflammatory probability Metabolic probability Other probability 1 LTBR SA inflammatory Metabolic 0.027 0.953 0.019 2 EPOR SA/VA Inflammatory Metabolic 0.014/0.013 0.982/0.982 0.004/0.005 3 TNFRSF25 SA/VA Inflammatory Metabolic 0.140/0.124 0.843/0.867 0.017/0.009 4 TNFRSF21 SA/VA Inflammatory Metabolic 0.004/0.008 0.992/0.989 0.004/0.004 5 IL17RA SA Inflammatory Metabolic 0.026 0.968 0.006 6 IFNAR1 SA Inflammatory Metabolic 0.039 0.947 0.014 7 IFNGR1 SA/VA Inflammatory Metabolic 0.022/0.030 0.973/0.968 0.005/0.001 8 CCR2 SA/VA Inflammatory Other 0.010/0.014 0.003/0.002 0.987/0.984 9 XCR1 SA Inflammatory Other 0.157 0.012 0.831 10 IL10RA SA/VA Inflammatory Other 0.032/0.003 0.010/0.001 0.958/0.996 11 IL12RB1 SA/VA Inflammatory Other 0.005/0.005 0.007/0.001 0.987/0.995 12 IL2RG SA/VA Inflammatory Other 0.004/0.014 0.004/0.001 0.993/0.986 13 F3 SA Metabolic Inflammatory 0.919 0.076 0.004 14 TFRC SA/VA Metabolic Inflammatory 0.603/0.776 0.340/0.215 0.057/0.009 15 LDLR SA/VA Metabolic Inflammatory 0.877/0.986 0.071/0.008 0.052/0.007 16 LEPR SA Metabolic Inflammatory 0.897 0.097 0.005 17 DRD4 SA Metabolic Inflammatory 0.962 0.027 0.010 18 ADRA2B SA Metabolic Other 0.061 0.129 0.810 19 TFR2 SA Metabolic Other 0.011 0.002 0.987 20 CD28 SA Other Inflammatory 0.704 0.014 0.282 21 MPL VA Inflammatory Metabolic 0.019 0.979 0.002 22 TNFRSF10D VA Inflammatory Metabolic 0.133 0.856 0.012 23 FAS VA Inflammatory Metabolic 0.061 0.934 0.005 24 IL22RA1 VA Inflammatory Metabolic 0.435 0.559 0.006 25 IL3RA VA Inflammatory Metabolic 0.028 0.970 0.003 26 CD27 VA Inflammatory Other 0.115 0.001 0.884 27 CSF2RB VA Inflammatory Other 0.035 0.002 0.963 28 CCR6 VA Inflammatory Other 0.211 0.008 0.781 28 IL2RB VA Inflammatory Other 0.110 0.007 0.883 30 IL10RB VA Inflammatory Other 0.004 0.001 0.995 31 EDNRB VA Metabolic Inflammatory 0.970 0.025 0.005 32 NOTCH4 VA Metabolic Inflammatory 0.732 0.263 0.005 33 ADRB2 VA Metabolic Inflammatory 0.815 0.177 0.008 34 FGFR2 VA Metabolic Inflammatory 0.865 0.132 0.003 35 S1PR4 VA Metabolic Inflammatory 0.988 0.007 0.005 36 CATSPER1 VA Other Inflammatory 0.573 0.006 0.421 37 KCNN4 VA Other Inflammatory 0.660 0.004 0.336 38 CD5 VA Other Inflammatory 0.664 0.004 0.332 39 NCR3 VA Other Inflammatory 0.804 0.014 0.182 40 CD48 VA Other Inflammatory 0.503 0.013 0.484 41 ITGAL VA Other Inflammatory 0.900 0.008 0.092 [82]Open in a new tab Receptors predicted as FN/FP in both adipose tissues are shown in bold. “SA” represents subcutaneous adipose and “VA” visceral adipose. Feature analysis for detecting significant biological pathways We used feature analysis with the SHapley Additive exPlanations (SHAP) [[83]26] to find the most predictive features, i.e., KEGG pathways significantly enriched with genes that are included in the receptor’s module and that drive the prediction of receptors (see the full description in the Experimental Design section). The SHAP values of each feature (KEGG pathway) represent the feature’s impact on the model output/classification of receptors. [84]Fig 3A and 3B show the ten most important features for subcutaneous and visceral adipose tissues, respectively. It shows the average SHAP influence on the magnitude of model output in absolute values for the top ten features in our three-class model. It can be seen that the most important feature affecting the “metabolic” classification of receptors (highlighted in pink in [85]Fig 3A) is the “Diabetic cardiomyopathy” pathway. Diabetic cardiomyopathy is defined as left ventricular dysfunction that occurs among patients with diabetes mellitus independent of a recognized cause such as coronary artery disease or hypertension [[86]https://www.kegg.jp/entry/hsa05415] and is characterized by insulin and metabolic resistance genes [[87]27]. The enrichment score of the “Necroptosis” pathway (third from the top in [88]Fig 3A), which is related to cell apoptosis and death, is most significant for classifying inflammatory receptors. We also highlighted KEGG’s BRITE hierarchy for annotation of these top KEGG pathways. It can be seen that many of the top pathways that drive the classification of metabolic/inflammatory/other receptors are annotated as “Metabolism” (shown using green dots) and “Immune system/disease” (shown using red dots). For example, see [89]Fig 3A where the “Linoleic acid metabolism”, “Glycosaminoglycan biosynthesis chondroitin sulfate”, “Phenylalanine” and “alpha-Linoleic acid metabolism” pathways are annotated as “Metabolism” (colored in green) for subcutaneous adipose and [90]Fig 3B where the top five “Immune system/disease” annotated pathways for adipose visceral are shown in red. We note that the values are absolute and the interpretation is not always straightforward, meaning that the combination of distinct features between the three classes and the metabolically annotated pathways (highlighted in green) are presented by the model as important features to distinguish the “other” type of receptors from the “metabolic” and “inflammatory” classifications. To get a better understanding of what we were seeing, we used a directional SHAP analysis that can only be generated for two types of classes (since it shows direction). We analyzed the metabolic against the inflammatory classes and used the SHAP method to investigate the direction of each feature’s contribution to the class classification in subcutaneous adipose (see [91]Fig 4). Fig 3. Top 10 features (pathways) and their average SHAP influence (absolute values) on the magnitude of the model prediction. [92]Fig 3 [93]Open in a new tab A. Calculated for subcutaneous adipose. B. Calculated for visceral adipose. The SHAP method illustrates the magnitude of the effect of each feature (KEGG pathway) on each classification. The colored dots to the left highlight KEGG’s BRITE annotations of the pathways. Fig 4. SHAP subcutaneous adipose “metabolic” class variable significance plot. [94]Fig 4 [95]Open in a new tab The values show the impact of the feature on model output (prediction). The plot is composed of all receptors in the training data. SHAP values indicate how much the feature contributes to the classification. [96]Fig 4 shows the total magnitudes of the SHAP values over all samples as a plot of features sorted in descending order by their relevance and uses the SHAP values to highlight the distribution of their impact on the model output (metabolic) prediction. Here we analyze and show how the value of the feature affects the “metabolic” class as opposed to the “inflammatory” class. The horizontal position indicates the influence of each feature, i.e., whether that value’s effect is related to a greater or lower prediction for the metabolic class. The coloring corresponds to each feature’s original values across samples and indicates whether that feature value (pathway enrichment score) is high (red) or low (blue) for that observation. The SHAP values of each feature are represented on the x-axis and represent the feature’s impact on the model output; the features (e.g., KEGG pathways) are on the y-axis. For example, a high value (red colored dots) of the enrichment score for “Diabetic cardiomyopathy” (the third from the bottom) has a negative impact (a negative SHAP value on the x-axis) on the “metabolic” type receptor prediction. In other words, a higher enrichment score for this pathway drives a metabolic prediction in most cases, increasing the probability of the receptor being categorized as belonging to the metabolic receptor group. An additional analysis of the effect (direction) of each feature’s contribution to the metabolic, inflammatory or other class classification is presented in S5 and S6 Figs in [97]S1 File. Prediction Finally, we chose the XGB model, which outperformed the other models, to use for predicting the unlabeled receptors. Out of the 692 known receptors list derived from Ramilowski et al. [[98]28] we retained the receptors that were included in the GTEx dataset, 594 and 600 for subcutaneous and visceral adipose respectively. From these, we retained the receptors that were included in the co-expression modules, 446 and 485 for subcutaneous and visceral adipose, respectively. We used 134 and 140 labeled receptors, for subcutaneous and visceral adipose respectively, for the training and testing phases (see [99]methods). Finally, we executed the model to predict the function of the remaining 312 and 345 unlabeled receptors for subcutaneous and visceral adipose respectively. The XGB model predicted 96 and 46 new unknown inflammatory receptors and 24 and 22 new unknown metabolic receptors (with probability > 0.85) for subcutaneous and visceral adipose, respectively. These full lists of newly predicted receptors for subcutaneous and visceral adipose can be found in S1 and S2 Tables in [100]S1 File, respectively. The receptors that were classified in the same way in both adipose tissues (classification probability > 0.85) are presented in [101]Table 3. We surveyed the literature for relevant wet lab experiments in support of our predictions. Column 5 in [102]Table 3 include the literature verification of the predictions, describing the experimental summary and the manuscript reference. We note that several inflammatory receptors (e.g., TNFRSF1B in row 9, [103]Table 3) which were included in the KEGG “Cytokine-cytokine signaling” pathway, were previously filtered out by us (see [104]Methods) since they are related to metabolic functions by GO. Nevertheless, these receptors are predicted by our classifiers to trigger inflammatory functions in adipose tissues. In addition, receptors that are strongly predicted to change functions (the predictions’ probability absolute difference is > 0.85) between the two tissues (e.g., predicted to be “metabolic” in one tissue and “inflammatory” in the other) are presented in S6 Table in [105]S1 File. Table 3. Receptors with unknown metabolic and inflammatory functions, which we predicted to have similar functions in both subcutaneous and visceral adipose tissues, and references supporting our predictions.