Abstract

   Type 2 diabetes (T2D) is known as a disease caused by gene alterations
   characterized by insulin resistance, thus the insulin-responsive
   tissues are of great interest for T2D study. It’s of great relevance to
   systematically investigate commonalities and specificities of T2D among
   those tissues. Here we establish a multi-level comparative framework
   across three insulin target tissues (white adipose, skeletal muscle,
   and liver) to provide a better understanding of T2D. Starting from the
   ranks of gene expression, we constructed the ‘disease network’ through
   detecting diverse interactions to provide a well-characterization for
   disease affected tissues. Then, we applied random walk with restart
   algorithm to the disease network to prioritize its nodes and edges
   according to their association with T2D. Finally, we identified a
   merged core module by combining the clustering coefficient and Jaccard
   index, which can provide elaborate and visible illumination of the
   common and specific features for different tissues at network level.
   Taken together, our network-, gene-, and module-level characterization
   across different tissues of T2D hold the promise to provide a broader
   and deeper understanding for T2D mechanism.

   Keywords: type 2 diabetes, gene pairwise expression, dysfunctional
   interactions, multi-level analysis, random walk with restart

Introduction

   Type 2 diabetes (T2D) is one of the leading complex diseases. It is
   most commonly seen in older adults, but it is increasingly seen in
   children, adolescents, and younger adults due to rising levels of
   obesity, physical inactivity, and poor diet ([27]International Diabetes
   Federation [IDF], 2017). T2D is mainly a glucose metabolism disorder,
   and is currently believed to be a heterogeneous disease. One important
   fact is that the development of T2D involves multiple tissues
   ([28]Zhong et al., 2010; [29]Camastra et al., 2011; [30]Petersen et
   al., 2012; [31]Tang et al., 2018). Those tissues include white adipose
   tissue, skeletal muscle, and liver (shortly written as adipose, muscle,
   and liver hereafter). Each tissue has its own characteristics induced
   by T2D. One important and challenging problem is to explore the cross
   talk among multiple tissues since T2D is a systemic disease involving
   complicated synergy and regulation among different tissues. Exploring
   its underlying mechanisms across multiple tissues will be helpful for
   personalized therapy and precision medicine in T2D treatment. However,
   most existing studies focused on single tissue only ([32]Li et al.,
   2015; [33]Lee and Kim, 2016; [34]Alghamdi et al., 2017).

   On the other hand, it has been accepted that although molecules are
   basic components of cellular machinery, a complex disease is generally
   caused not from the malfunction of individual molecules but from the
   interplay of a group of correlated molecules or a network. Thus, with
   the development of bioinformatics and high-throughput data,
   network-based characterization of complex diseases, including T2D, has
   been invaluable to integrate and interpret functional genomics datasets
   and identify new biomarkers or modules to better classify patients into
   subtypes. Such approaches are much more powerful than approaches that
   examine a single gene at a time ([35]Barabási et al., 2011; [36]Hofree
   et al., 2013; [37]Zhang et al., 2014; [38]Liu et al., 2017; [39]Zhang
   and Zhang, 2017; [40]Hwang et al., 2018; [41]Zou et al., 2018).
   However, most methods tended to characterize the complex programs using
   sets of genes, while the interaction (described as edge in biological
   network) information was not fully utilized in final prediction or
   analysis. Therefore, such methods cannot inform us on the functional
   discrepancies between gene pairs that are perturbed under disease.

   Here, we propose a multi-level comparative framework mostly focusing on
   interactions across different representative tissues to gain a broader
   and further understanding of T2D. The whole framework can be divided
   into two phases and each phase is comprised of several major steps
   ([42]Figure 1). The first phase is disease network construction, to
   effectively characterize the abnormal information response to disease.
   In recent years, many efforts have been made to extract disease
   information through selecting co-expression gene pairs whose
   expressions were highly correlated across samples. These methods were
   under the hypothesis that genes associated with the same disorder tend
   to share common functional features, i.e., their protein products tend
   to interact with each other. However, genes show highly correlated
   patterns of expression in one biological state, but not in another,
   i.e., they may not be highly correlated across the entire dataset, and
   therefore they fail to be picked out by co-expression based methods
   ([43]Zhang and Horvath, 2005; [44]Fuente, 2010). For this reason, we
   proposed a method based on finding ‘diverse interactions’ according to
   the discrepancy between correlations in different phenotypes by
   extending our previous work ([45]Sun et al., 2013). Beyond our previous
   work, we further considered the weight of interactions together, thus
   all the diverse interactions along with their discrepant coefficients
   which composed the weighted diverse interaction network (WDIN). Indeed,
   this WDIN would unravel the complexity of gene-pair regulation in the
   complex process regarding different tissues. In addition, it should be
   noted that during the computation of adjacency matrix for WDIN, we
   proposed to calculate the discrepant coefficient based on genes’ ranks
   instead of expression values. Such an operation would weaken the biased
   influence caused by different expression levels in different tissues
   and experiments and consequently provide a uniform scale for all
   samples independent of the dynamic range of a data profile ([46]Le et
   al., 2010; [47]Altschuler et al., 2013).

FIGURE 1.

   [48]FIGURE 1
   [49]Open in a new tab

   Schematic illustration of the multi-level comparative framework of this
   study. Two major phases are included, network construction and
   multi-level analysis. WDIN is the constructed disease network, and MCM
   is the identified key module.

   The second phase is multi-level analysis based on the inferred disease
   network. This phase is composed of network-, gene-, and module-level
   analysis. Network-level analysis gives an overall cross-tissue study
   about the T2D based on the constructed original tissue-dependent WDIN.
   Gene-level analysis is carried out through discussing the prioritized
   nodes of WDIN which can provide a local and delicate explanation for
   the disease. The major step of this phase is to discover core modules
   and then derive a merged core module. Following module-level analysis
   could carefully and visibly illustrate the common and specific features
   belonging to different tissues. In this part, the random walk restart
   (RWR) algorithm is employed to rank genes and further extended to
   prioritize gene interactions. The clustering coefficient is introduced
   to enable identifying biological core modules composed of prioritized
   interactions.

   In summary, we explore the tissue cross talk of T2D at gene-, module-,
   and network-level comparisons across different tissues to uncover
   hidden patterns and their biological implications from multi-tissues
   ‘omic’ data. Our multi-level comparative framework is shown in
   [50]Figure 1. We expect that our proposed multi-level analysis
   framework can be extended beyond gene expression level and discover new
   commonalities and specificities among tissues.

Materials and Methods

Data Collection

Tissue Dependent Gene Expression Data Retrieval

   We obtained gene expression profiles from three rat tissues (white
   adipose, skeletal muscle, and liver) [diabetes rats: Goto–Kakizaki (GK)
   rats; control rats: Wistar–Kyoto (WK) rats] from the National Center
   for Biotechnology Information (NCBI) Gene Expression Omnibus database
   (access ID: [51]GSE13268, [52]GSE13269, and [53]GSE13270) ([54]Almon et
   al., 2009; [55]Nie et al., 2011; [56]Xue et al., 2011). The profile was
   composed of 31,099 probes. Our first filter eliminated the probe sets
   without the corresponding official symbol, leaving 25,345 genes for
   further consideration.

T2D Associated Genes Collection

   To provide support and verification for our work, we collected the
   canonically reported genes associated with T2D. These genes were
   gathered from Type II diabetes mellitus pathway (KEGG-Kyoto
   Encyclopedia of Genes and Genomes, [57]H00409) which will be referred
   to as ‘T2D-pathway genes.’ In total, we obtained 50 T2D-pathway genes,
   among which 42 genes had gene expression aforementioned and were used
   in this study. Other T2D related genes were downloaded from the Rat
   Genome Database (RGD)^[58]1 in October 2018. In total, 515 genes were
   downloaded from RGD (referred to as RGD-reported genes), and only those
   that have been measured in GSE 13268-13270 were kept. Thus 303
   RGD-reported genes were left.

Protein–Protein Interaction Network Integration

   Rat protein–protein interaction (PPI) network was integrated based on
   KEGG pathway and BioGRID (Biological General Repository for Interaction
   Datasets). Actually, not all the proteins in PPI network have
   corresponding gene expression values in expression profiles
   [59]GSE13268-[60]GSE13270. To carry out our cross-tissue analysis, we
   only reserved interactions whose two nodes have expressions in three
   tissues. Finally, the reserved interactions composed a network with
   4,081 nodes (proteins) among 24,503 edges (interactions). This network
   was noted as ‘background PPI network’ with gene pair-wise expressions
   in our study.

Data Processing

   In most studies about network biology such as the ones aimed at
   identifying network biomarkers of complex diseases, researchers usually
   collect gene expression data from multiple experiments. The expression
   profiles belonging to different experiments may represent different
   tissues, or different experiment conditions. Straightly pooling the
   expression profiles to construct various networks would ignore the
   underlying structure of the data, and the pooled estimates may be
   severely biased due to the heterogeneity of the experiments. Instead of
   pooling the expression data, we first ranked the genes by their
   expression values for each expression array, and then normalized the
   rank index for every gene. The normalized rank (z-score) was computed
   using the mean (μ) and standard deviation (σ) of the ranks of all genes
   along one sample which can be described as z-score = (x–μ)/σ. In
   subsequent computation and analysis, this normalized rank was used
   instead of expression value since it can reduce the data noise deriving
   from different experimental conditions and different tissues.

   [61]Figure 2 shows that the expression levels estimated by normalized
   ranks subject to normal distribution are more consistent, indicating
   that such ranking processing is a more valid procedure.

FIGURE 2.

   [62]FIGURE 2
   [63]Open in a new tab

   The histograms of six gene expression profiles under different data
   processing strategies. (A) Histograms of gene expression in three
   tissues for two phenotypes. (B) Histograms of normalized gene
   expression in three tissues for two phenotypes. (C) Histograms of gene
   index based on ranking expression in three tissues for two phenotypes.

Tissue-Dependent WDINs Construction

   A WDIN was constructed for each tissue that was related to the query
   disease—T2D. The pipeline of constructing WDIN was as follows:
     * simple (1)
       For each gene pair in the integrated background PPI network,
       calculate its correlation respectively under normal (WK) and
       disease condition (GK) based on the tissue specific normalized
       expression ranks of their corresponding two genes.
     * simple (2)
       Choose the gene pair as our diverse interaction whose correlations
       own significant difference between two conditions, and all
       identified dysfunctional interactions compose the WDIN. The weight
       of each edge is the difference between two correlations under
       normal (WK) and disease (GK) conditions.

   After the creation of the tissue-dependent WDIN for each tissue, each
   interaction was weighted by the diverse correlation of corresponding
   gene pair. For each interaction, the absolute value of its weight can
   reflect the degree of deviation of this interaction in different
   phenotypes (normal and disease), and the positive and negative property
   of the weight coefficient indicates whether the corresponding function
   is active or inactive in disease condition compared with normal
   condition, which would provide more information for medical biology.
   Subsequently, we noted the edge with positive weight as ‘active edge,’
   and the edge with negative weight as ‘inactive edge,’ respectively.

Random Walk With Restart (RWR) on WDIN

Rank Candidate Genes

   We applied RWR algorithm to our constructed tissue-dependent WDINs. The
   goal is to rank genes in candidate sets based on their association
   level with T2D. RWR is a ranking algorithm which simulates a random
   walk on the network to compute the proximity between two nodes by
   exploiting the global structure of the network. It starts on a set of
   seed nodes, which is the set of genes known to be associated with a
   phenotype p (T2D in this work). The candidate genes are then ranked by
   the probability of the random walker reaching this node ([64]Lovasz,
   1996; [65]Tong et al., 2008; [66]Ganegoda et al., 2014).

   Each tissue-specific WDIN can be mathematically described as Γ = (V, ε,
   w), V is the gene set of WDIN’s nodes, εis a set of undirected
   interactions between these genes (or their products), uv ∈ ε represents
   an interaction between u ∈ V and v ∈ V, w(u, v) indicates the weight
   coefficient of interaction uv ∈ ε. The set of interacting partners of a
   gene v ∈ V is defined as N(v) = {u ∈ V } : uv ∈ ε and the total
   reliability of known interactions of v is defined as W(v) = ∑[u∈ N(v)]
   w(uv).

   Let p[0] be the initial probability vector and p[s] be a vector in
   which the i-th element holds the probability of finding the random
   walker at node i at steps. Algorithmically, random-walk based
   association scores can be computed iteratively as follows:
   [MATH:
   <mrow><msub><mi>p</mi><mrow><mtext>S</mtext><mo>+</mo><mn>1</mn></mrow>
   </msub><mo>=</mo><mo
   stretchy="false">(</mo><mn>1</mn><mo>−</mo><mtext>γ</mtext><mo
   stretchy="false">)</mo><msup><mi>M</mi><mtext>T</mtext></msup><msub><mi
   >p</mi><mtext>s</mtext></msub><mo>+</mo><mtext>γ</mtext><mo
   stretchy="false">(</mo><mn>1</mn><mo>−</mo><mtext>η</mtext><mo
   stretchy="false">)</mo><msub><mi>p</mi><mn>0</mn></msub></mrow> :MATH]
   (1)

   Here, η denotes the weight of the network. γ ∈ (0,1)is a user-defined
   restart probability to adjust the preference between the importance of
   a protein or gene with respect to the seed set and network topology.
   Numerical results show that γ = 0.3 is optimal for RWR’s performance
   ([67]Erten, 2009; [68]Erten et al., 2011). Thus, γ is set to 0.3 for
   RWR in this paper. M is the transition matrix of the Γ, the transition
   probability from gene u to gene v can be described as follows,
   [MATH: <mrow><mi>M</mi><mo
   stretchy="false">(</mo><mi>u</mi><mo>,</mo><mtext> </mtext><mi>v</mi><m
   o stretchy="false">)</mo><mo>=</mo><mrow><mo>{</mo><mrow><mtable
   columnalign="left"><mtr columnalign="left"><mtd
   columnalign="left"><mrow><mi>w</mi><mo
   stretchy="false">(</mo><mi>u</mi><mo>,</mo><mtext> </mtext><mi>v</mi><m
   o stretchy="false">)</mo><mo>/</mo><mi>W</mi><mo
   stretchy="false">(</mo><mi>v</mi><mo
   stretchy="false">)</mo><mo>,</mo></mrow></mtd><mtd
   columnalign="left"><mrow><mi mathvariant="italic">if
   uv</mi><mo>∈</mo><mi mathvariant="normal">ε</mi></mrow></mtd></mtr><mtr
   columnalign="left"><mtd
   columnalign="left"><mrow><mn>0</mn><mo>,</mo></mrow></mtd><mtd
   columnalign="left"><mrow><mi
   mathvariant="italic">otherwise</mi></mrow></mtd></mtr></mtable></mrow><
   /mrow></mrow> :MATH]
   (2)

   The seed set is composed of T2D-pathway genes covered by the WDIN, and
   the candidate set contains other nodes of WDIN excluding these seed
   genes. The RGD-reported genes would be used as test genes to verify the
   performance of applying RWR to WDINs. The details will be shown in the
   Results section.

Rank Candidate Edges

   Based on the ranked candidate nodes, we further prioritized the edges
   of WDIN for each tissue. In detail, given an edge, we assigned an index
   to the edge according to the ranks of its two linked nodes to indicate
   its association degree to the studied disease. Theoretically and
   computationally, using the average ranks of its two nodes as the edge’s
   index would meet the requirement. Then all the edges would be ranked in
   ascending order according to their assigned indices. An edge having
   higher rank (with smaller rank value) is more closely associated with
   the studied disease. These sorted edges enable us to identify a minimal
   set composed of prioritized edges which can reflect plentiful
   information about the disease in the considered tissue. Such a set
   would be referred to as ‘core module’, which could provide some
   important information from another viewpoint.

Network/Module Comparison

   We quantified the similarity of two networks based on edges. Given two
   different networks, we supposed the numbers of edges belonging to these
   two networks to be respectively N[1] and N[2]. Then we calculated the
   number of edges that are present in both networks (common edges) noted
   as variable n. We defined variable x[1] as the ratio of the number of
   common edges (n) to the number of edges in the first network (N[1]) and
   defined variable x[2] as the ratio of the number of common edges (n) to
   the number of edges in the second network (N[2]). Using variables x[1]
   andx[2], we introduced a S-score as the harmonic mean of x[1] and x[2]
   ([69]Roy et al., 2013; [70]Knaack et al., 2014). The formulae are as
   follows,
   [MATH:
   <mrow><mtable><mtr><mtd><mrow><msub><mi>x</mi><mn>1</mn></msub><mo>=</m
   o><mfrac><mi>n</mi><mrow><mi>N</mi><mn>1</mn></mrow></mfrac></mrow></mt
   d></mtr><mtr><mtd><mrow><msub><mi>x</mi><mn>2</mn></msub><mo>=</mo><mfr
   ac><mi>n</mi><mrow><mi>N</mi><mn>2</mn></mrow></mfrac></mrow></mtd></mt
   r><mtr><mtd><mrow><mi>S</mi><mo>−</mo><mi
   mathvariant="italic">score</mi><mo>=</mo><mfrac><mrow><mn>1</mn></mrow>
   <mrow><mfrac><mn>1</mn><mn>2</mn></mfrac><mstyle
   displaystyle="true"><munderover><mo>∑</mo><mrow><mtext>i</mtext><mo>=</
   mo><mn>1</mn></mrow><mn>2</mn></munderover><mrow><mfrac><mn>1</mn><mrow
   ><msub><mi>x</mi><mtext>i</mtext></msub></mrow></mfrac></mrow></mstyle>
   </mrow></mfrac></mrow></mtd></mtr></mtable></mrow> :MATH]
   (3)

Pathway and Motif Enrichment

   Functional analyses about pathway and process enrichment have been
   carried out with the following ontology sources: GO Biological
   Processes, KEGG Pathway, Reactome Gene Sets. The analyses are performed
   through web tool Metascape^[71]2 ([72]Tripathi et al., 2015).

Results

Network-Level Analysis Based on Tissue-Dependent WDINs

   In order to better capture the disease related information, for each
   tissue, we have designed a new way to infer a tissue-dependent WDIN
   through exacting the interactions with significant diverse correlations
   in different phenotypes. Instead of choosing disease related genes
   individually, we picked out such genes in pairs. An edge would be
   picked out if its corresponding genes were strongly correlated (the
   Spearman correlation coefficient is larger than or equal to some
   threshold, such as 0.8) under one strain while not (e.g., less than
   0.2) in the other condition. Such an edge implicated that the
   interaction between the genes was obviously perturbed under the disease
   and hence noted as ‘diverse interaction’ in this work. Subtracting the
   correlation coefficient in normal state from the coefficient in disease
   state, we have the weight of each diverse interaction.

Characteristics of WDINs

   Through screening all edges in background PPI network, we identified
   2,026/2,118/2,184 diverse edges among 1,710/1,783/1,793 nodes for
   adipose/muscle/liver respectively. We computationally validated these
   networks by examining the number of T2D-rpathway genes and RGD-reported
   genes covered by each tissue-dependent WDIN. We found that the created
   WDIN could hit most (exceed 83%) T2D-pathway genes and more than half
   RGD-reported genes ([73]Table 1). This means that our designed way of
   constructing WDIN is effective, and through screening the background
   PPI network and cutting away those edges having loose or no association
   with the disease, the inferred WDINs could capture abundant disease
   related information with less edges.

Table 1.

   Characteristics of three tissue-dependent WDINs.
   graphic file with name fgene-10-00252-i001.jpg Number of nodes Number
   of edges Hits of 42 T2D-pathway genes Hits of 303 RGD-reported genes
   Adipose 2,026 1,710 35 154
   Muscle 2,118 1,783 36 169
   Liver 2,184 1,793 39 162
   [74]Open in a new tab

   We further investigated the Venn diagram for nodes of three
   tissue-dependent WDINs and disease related genes they covered
   ([75]Figure 3). From the Venn we can see that the percentage of
   specific genes in each WDIN ranges from 11.1 to 14.1%, and the
   housekeeping genes (genes appeared in three WDINs simultaneously) is at
   the level of 33.2% ([76]Eisenberg and Levanon, 2013; [77]Lee et al.,
   2015). The Venn about the coved T2D-related genes composed of
   T2D-pathway genes and RGD-reported genes presented a similar result.

FIGURE 3.

   [78]FIGURE 3
   [79]Open in a new tab

   (A) Venn diagram of all genes in three tissue-dependent WDINs. (B) Venn
   diagram of T2D- related genes covered in three WDINs respectively.

Network Similarity Analysis

   In addition to comparing networks based on network nodes through Venn
   diagram, we also carried out network edge comparison to further
   quantify the extent of shared and tissue-specific network components.
   Thus, we introduced S-score measure to assess the similarity between
   networks edges for each pair of tissues since it is a more sensitive
   measure for comparisons. Based on S-score ([80]Figure 4A), we found
   that the similarity between each pair of WDINs is low, which means that
   during the progress of T2D, three tissues possess evident tissue
   specificity. In this network comparison, adipose tissue and muscle
   tissue were indicated to emerge as the most similar dysfunctions caused
   by T2D among three insulin responsive tissues.

FIGURE 4.

   [81]FIGURE 4
   [82]Open in a new tab

   Network-level comparison of three tissue-dependent WDINs. (A) Network
   similarity based on edges. (B) Heatmap of enriched terms across three
   tissue-specific gene lists, colored by p-values.

Cross-Tissue Functional Analysis Based on WDINS

   To confirm the significant relation between tissue- dependent WDINs and
   T2D, the pathway and motif enrichment was conducted for each WDIN to
   categorize the genes participating in different biological functions or
   pathways.

Enrichment analysis on tissue non-specific genes

   [83]Table 2 lists the top 20 enriched terms of pathway and biological
   process of 904 tissue non-specific genes. The well-known T2D related
   pathway- Insulin signaling pathway ranks in the 16th position. Most
   other listed pathways have been documented to be associated with T2D.
   For example, two pathways about cancer (pathways in cancer and
   proteoglycans in cancer) were enriched, and this is not surprising
   since cancer is quickly emerging as another pathological consequence of
   T2D ([84]Poloz and Stambolic, 2015). Due to the lack of adequate
   glucose uptake induced by dysfunction of insulin response, most
   pathways related to the cellular regulatory signal transduction of
   basic energy metabolism became abnormal, such as Chemokine, cAMP and
   Wnt signaling pathways and so on ([85]Li et al., 2014). As T2D is a
   well-known metabolic disease, the metabolic related pathways appeared
   to be abnormal, such as Purine metabolism and drug metabolic process.

Table 2.

   Enriched terms of pathway and biological process of house-keeping
   genes.
   Category Description Count % Log10(P)
   KEGG Pathway Pathways in cancer 174 19.3548 −100.0
   KEGG Pathway Chemokine signaling pathway 112 12.4582 −100.0
   KEGG Pathway Purine metabolism 104 11.5684 −91.7492
   KEGG Pathway Proteoglycans in cancer 101 11.2347 −79.4258
   GO Biological Processes Organophosphate biosynthetic process 139
   15.4616 −70.6282
   KEGG Pathway Oxytocin signaling pathway 83 9.2324 −69.5144
   KEGG Pathway HTLV-I infection 105 11.6796 −65.5991
   Reactome Gene Sets Signaling by Receptor Tyrosine Kinases 105 11.6796
   −64.0668
   KEGG Pathway cAMP signaling pathway 87 9.6774 −63.8028
   GO Biological Processes Small molecule biosynthetic process 143 15.9065
   −59.8328
   GO Biological Processes Response to peptide 138 15.3503 −58.5452
   GO Biological Processes Drug metabolic process 148 16.4627 −55.7688
   GO Biological Processes Response to xenobiotic stimulus 108 12.0133
   −54.4963
   GO Biological Processes Response to toxic substance 141 15.6840
   −52.1495
   KEGG Pathway Insulin signaling pathway 67 7.4527 −52.0524
   GO Biological Processes Response to inorganic substance 143 15.9065
   −51.8458
   KEGG Pathway Fluid shear stress and atherosclerosis 69 7.6751 −51.1151
   KEGG Pathway Leukocyte transendothelial migration 61 6.7853 −50.3888
   KEGG Pathway Wnt signaling pathway 66 7.3414 −49.2176
   KEGG Pathway Phosphatidylinositol signaling system 54 6.0066 −46.7635
   [86]Open in a new tab

Enrichment analysis on tissue specific genes

   Some tissue-specific property can be displayed through the functional
   analysis of specific genes in three WDINs ([87]Figure 4B). For example,
   Drug metabolic appeared to be significant in liver-dependent WDIN
   enriched terms. Negative regulation of intracellular signal
   transduction and response to oxidative stress were found to be
   dysfunctional in liver-specific WDIN only, and didn’t appear in the
   remaining two tissues. In addition, we found that overall the abnormal
   functions caused by T2D appear to be more similar in adipose and
   muscle, which is consistent with the result induced by network
   similarity based on S-score ([88]Figure 4A).

Gene-Level Analysis Based on Prioritized WDINs

Prioritizing WDINs Using RWR

   To rank genes belonging to candidate sets based on their association
   level with T2D, we applied RWR to our three constructed
   tissue-dependent WDINs. The RWR method starts with the genes belonging
   to T2D pathway covered by each WDIN, and these genes are referred to as
   seed genes. The candidate set of genes includes other nodes of WDIN
   excluding the seed genes. After the random walking, the candidates are
   ranked according to the proximity of each gene to the genes in the seed
   set.

   To verify the efficiency of random walk on the WDIN, we collected T2D
   related genes from RGD database. In all 303 RGD-reported genes appeared
   in our background network, and among 35 T2D-pathway genes hit in
   adipose-WDIN (seed genes), there are 0 overlaps. These 130 genes are
   pleasing test genes because we can verify our method through inspecting
   their rank indexes. Similarly, in muscle-WDIN, there are 36 genes from
   T2D-pathway (used as seed genes) and 169 RGD-reported genes
   respectively. In 169 RGD-reported genes, excluding the seed genes,
   there remains 146 genes which would be used as test genes for RWR. In
   liver-WDIN, there are 39 genes from T2D-pathway (used as seed genes)
   and 162 RGD-reported genes respectively. In 162 RGD-reported genes,
   excluding the overlap genes with seed genes, there remains 136 genes
   which would be used as test genes for RWR.

   After random walking on the adipose-WDIN, the top 100 ranked candidates
   cover 16 test genes, top 200 covered 26, corresponding p-values tested
   by Fisher’s exact are 0.0019 and 0.0147. The corresponding results are
   listed in [89]Table 3, along with the similar results for the other two
   tissues. This indicated that our random walking on inferred WDINs can
   effectively prioritized the disease related genes.

Table 3.

   We verify the efficiency of prioritized WDIN through Fisher’s exact
   test.
   Tissue         Item         Top100 All candidates Top200 All candidates
   Adipose Test gene covered     18         82         29        171
           Test gene uncovered  120        1593       109        1504
           Fisher’s exact        p-value = 0.0003      p-value = 0.0003
   Muscle  Test gene covered     16         84         25        175
           Test gene uncovered  138        1663       129        1572
           Fisher’s exact        p-value = 0.0069      p-value = 0.0198
   Liver   Test gene covered     13         87         23        177
           Test gene uncovered  131        1667       121        1577
           Fisher’s exact        p-value = 0.0492      p-value = 0.0336
   Adipose Test gene covered     16         84         26        174
           Test gene uncovered  114        1591       104        1501
           Fisher’s exact        p-value = 0.0019      p-value = 0.0019
   Muscle  Test gene covered     14         86         22        178
           Test gene uncovered  132        1661       124        1569
           Fisher’s exact        p-value = 0.0310      p-value = 0.0490
   Liver   Test gene covered     11         89         21        179
           Test gene uncovered  125        1665       115        1575
           Fisher’s exact        p-value = 0.0989      p-value = 0.0435
   [90]Open in a new tab

   To collect more generally known T2D genes to verify our framework, we
   queried the approved type 2 diabetes genes through published papers
   from year 2010 to 2018. After mapping their homologous genes in rats
   based on the homologous categories from MGI ([91]Bult et al., 2008), 12
   genes were reserved. Finally, 8 of 12 genes were detected in three
   constructed WDINs which are respectively BCAR1, CAMK2B, CPS1, FADS2,
   GCK, PPARG, PDX1, and POLD2. Among them, GCK ([92]Misra and Owen, 2018)
   and PDX1 ([93]Kodama et al., 2016) were included in our seed set.
   Besides, FADS2 ([94]Li et al., 2016) were ranked 112 in our prioritized
   adipose-WDIN (1,675 genes in total), and POLD2 ([95]Gaudet et al.,
   2011) were ranked 98 in muscle-WDIN (1,747 genes in total). BCAR1
   ([96]Kazakova et al., 2018) and CAMK2B ([97]Sacco et al., 2016) also
   have higher ranks in muscle-WDIN which were 289 and 257 respectively.
   PPARG ([98]Voight et al., 2010) was ranked 533 in adipose-WDIN, while
   CPS1 ([99]Matone et al., 2016) had an unsatisfied rank 1,272. In
   general, the ranked results of these 8 genes were basically consistent
   with their published results associated with T2D. Furthermore, our
   results could offer tissue-specific insights into T2D.

Cross-Tissue Functional Analysis Based on Prioritized Genes

   In order to reduce the effect of ascertainment bias in genes loosely or
   less associated with the disease, we set a threshold, and only focused
   on the prioritized genes above this threshold in three tissue-dependent
   WDINs. Thus we can perform a local and more delicate functional
   analysis on tissue non-specific and tissue-specific genes among them
   separately with less noise. Our experiments show that the effect of the
   selection of the threshold is minor. If the threshold is too small such
   as less than 50, the genes chosen to perform functional enrichment
   analysis would be insufficient to achieve statistical significance;
   whereas if the threshold is too large such as higher than 200
   (according to the results of the previous subsection), the follow-up
   functional analysis would be dilute on account of genes loosely
   associated with T2D. Therefore, we set a cutoff value for prioritized
   genes at 100 to conduct the cross-tissue functional analysis, and the
   results were shown in [100]Figure 5.

FIGURE 5.

   [101]FIGURE 5
   [102]Open in a new tab

   Heatmap of top100 genes cross three prioritized WDINs. (A) For tissue
   non-specific genes. (B) For tissue-specific genes.

   We found that the enriched terms were more specific and striking with
   T2D when considering only the top100 prioritized housekeeping genes,
   such as type II diabetes mellitus, insulin signaling pathway, and
   glucose metabolic process. When restricting tissue-specific genes in
   the top100 prioritized genes, the tissue specificity became more
   visible. In detail, the immune effector process and positive regulation
   of phosphatidylinositol 3-kinase activity were not enriched in adipose,
   while specific genes in liver were not involved in negative regulation
   of transferase activity, positive regulation of phosphatidylinositol
   3-kinase activity and epithelial cell differentiation process which
   also means that only adipose takes part in epithelial cell
   differentiation. When compared with the other two tissues, the muscle
   displays significant enrichment in regulation of kinase activity
   process and MAPK signaling pathway. And also from the overall view, the
   enriched functional items for adipose and muscle were closer.

Characterizing the Molecular Functions of Each Predicted T2D-Associated Gene

   After ranking candidates through random walking on tissue-dependent
   WDINs, we found that some genes were ranked ahead while they did not
   appear in the RGD database or T2D-pathway. For clarity, we focused on
   top50 nodes and considered them as potential T2D-associated genes.
   Below, we respectively list their gene information in [103]Table 4, and
   corresponding pathway and process enrichment in [104]Table 5.

Table 4.

   Annotation on predicted potential T2D-assocaited genes.
   Tissue Gene symbol Rank index Description
   Adipose CARD9 2 Caspase recruitment domain family, member 9
   CHUK 9 Conserved helix-loop-helix ubiquitous kinase
   FBP1 6 Fructose-bisphosphatase 1
   FOS 7 Fos proto-oncogene, AP-1 transcription factor subunit
   GNB4 15 G protein subunit beta 4
   NFKBIB 4 NFKB inhibitor beta
   PGM1 3 Phosphoglucomutase 1
   PRPS1 12 Phosphoribosyl pyrophosphate synthetase 1
   RXRA 5 Retinoid X receptor alpha
   SHC1 10 SHC adaptor protein 1
   STAT4 1 Signal transducer and activator of transcription 4
   Muscle ATP5O 12 ATP synthase peripheral stalk subunit OSCP
   CAMK2A 6 Calcium/calmodulin-dependent protein kinase II alpha
   GLB1 3 Galactosidase, beta 1
   MAP3K7 4 Mitogen activated protein kinase kinase kinase 7
   PDGFRA 11 Platelet derived growth factor receptor alpha
   PRKACB 5 Protein kinase cAMP-activated catalytic subunit beta
   RASGRF1 9 RAS protein-specific guanine nucleotide-releasing factor 1
   RPS6KA1 2 Ribosomal protein S6 kinase A1
   STAT4 7 Signal transducer and activator of transcription 4
   liver CNTFR 5 Ciliary neurotrophic factor receptor
   GALM 8 Galactose mutarotase
   MAP2K7 9 Mitogen activated protein kinase kinase 7
   PSMD9 3 Proteasome 26S subunit, non-ATPase 9
   PTPN6 6 Protein tyrosine phosphatase, non-receptor type 6
   RASGRF1 1 RAS protein-specific guanine nucleotide-releasing factor 1
   RASGRF2 7 RAS protein-specific guanine nucleotide-releasing factor 2
   RASGRP2 10 RAS guanyl releasing protein 2
   RYR3 4 Ryanodine receptor 3
   [105]Open in a new tab

Table 5.

   Key issues of motif enrichment analysis on predicted potential
   T2D-assocaited genes.
   Tissue Category Description Log10 (P) Hit genes
   Adipose Reactome Gene Sets Signaling by Interleukins −6.77 Nfkbib,
   Shc1, Chuk, Fos, Stat4
   KEGG Pathway Th1 and Th2 cell differentiation −6.64 Nfkbib, Chuk, Fos,
   Stat4
   KEGG Pathway Pentose phosphate pathway −6.12 Fbp1, Pgm1, Prps1
   Reactome Gene Sets Innate Immune System −5.75 Pgm1, Card9, Nfkbib,
   Shc1, Chuk, Fos
   KEGG Pathway Chemokine signaling pathway −5.57 Nfkbib, Shc1, Gnb4, Chuk
   Reactome Gene Sets Fc epsilon receptor signaling −4.63 Shc1, Chuk, Fos
   GO Biological Processes Cellular response to insulin stimulus −3.47
   Fbp1, Rxra, Shc1
   Muscle KEGG Pathway MAPK signaling pathway −7.15 Pdgfra, Rps6ka1,
   Rasgrf1, Prkacb, Map3k7
   KEGG Pathway Long-term potentiation −5.41 Camk2a, Rps6ka1, Prkacb
   KEGG Pathway Wnt signaling pathway −4.38 Camk2a, Prkacb, Map3k7
   KEGG Pathway Calcium signaling pathway −4.04 Pdgfra, Camk2a, Prkacb
   Liver KEGG Pathway MAPK signaling pathway −5.30 Rasgrf2, Rasgrf1,
   Rasgrp2, Map2k7
   GO Biological Processes Regulation of cation transmembrane transport
   −3.31 Rasgrf2, Ptpn6, Rasgrf1
   [106]Open in a new tab

   According to the annotation of genes by Metascape, we found that some
   detected genes indeed have close connection with T2D. Among the top50
   potential genes in adipose-WDIN, PGM1 mainly takes part in Pentose
   phosphate pathway and innate immune system and galactose catabolic
   process. FBP1 responses to insulin stimulus and pentose phosphate
   pathway. STAT4 was ranked in the top50 in both prioritized adipose- and
   muscle- WDINs respectively. This gene provides instructions for a
   protein that acts as a transcription factor, which means that it
   attaches (binds) to specific regions of DNA and helps control the
   activity of certain genes. The STAT4 protein is turned on (activated)
   by immune system proteins called cytokines, which are part of the
   inflammatory response to fight infection. RASGRF1 appears in potential
   T2D-associated gene set in both prioritized muscle- and liver- WDINs,
   and it has been shown to be upstream from IGF1 (Insulin-like growth
   factor 1) which is a star gene about T2D, allowing it to control growth
   in mice ([107]Drake et al., 2009). PSMD9 was observed in top50 genes
   detected in liver-WDIN, and it plays an important role in negative
   regulation of insulin secretion processes (GO:0046676) and positive
   regulation of insulin secretion (GO:0032024); GALM is another potential
   T2D-related gene detected in liver-WDIN that acts as a part of the
   galactose catabolic process (GO:0019388) and the galactose metabolic
   process (GO:0006012).

   These potential T2D-associated genes would be helpful for physicians or
   biologists, as they can be used to determine an experimental target as
   the subject of future research.

Module-Level Analysis for the Predicted T2D Associated Genes

Identifying Core Module From Prioritized WDIN Based on Edges

   In addition to prioritizing candidate nodes, we further ranked
   interactions in three WDINs separately to capture those crucial
   interactions in several disorders caused by T2D. Specifically, each
   edge of the WDIN would be ranked according to the average rank of its
   corresponding two nodes. Theoretically, the edge having higher ranks
   (with smaller rank values) tends to have a closer association with the
   studied disease.

   Our final goal was to identify a minimal set of prioritized WDIN edges
   which can reflect copious information about the disease in some
   tissues, namely ‘core module’ hereafter. To achieve this goal, we
   carried out a series of tests to evaluate the effectiveness of
   predicting and detecting disease related genes for the subnetworks from
   prioritized tissue-specific WDIN. Firstly, for each tissue, the series
   subnetworks were successively exacted from the prioritized WDIN based
   on edges from upon top 2.5% to upon top 25% at an internal of 2.5, and
   then the maximum connected component (MCC) of each subnetwork was
   retained for further analysis. In total, we had 10 MCCs for each
   tissue. Secondly, we compute the numbers and the coverage rates of the
   disease related genes collected from T2D-pathway and RGD database hit
   by each MCC ([108]Supplementary Figure S1).

   We then identified our core module through investigating the clustering
   coefficient of each MCC because the clustering coefficient can reflect
   the modular feature of biological function module ([109]Caroline and
   Ralf, 2006). In most cases, a complex with larger clustering
   coefficient frequently tends to form a biological functional module.
   Generally, the clustering coefficient ranges from 0 to 1, and for a
   stochastic network with N nodes the value of it is approximately equal
   to N^−1.

   We calculated the clustering coefficient for each MCC and the
   corresponding results were listed in [110]Figure 6. Besides, we compute
   the clustering coefficients of three tissue-dependent WDINs which are
   0.011, 0.011, and 0.016 for adipose, muscle, and liver respectively.
   For each tissue, the MCC with the largest clustering coefficient was
   selected as our core module, that is to say, top 15% of adipose-WDIN,
   top 17.5% of muscle-WDIN and top 12.5% of liver-WDIN are taken as our
   tissue-dependent core modules. It should be noted that the clustering
   coefficients (0.018, 0.013, and 0.021) belonging to three identified
   modules are all larger than that of their corresponding WDINs.

FIGURE 6.

   [111]FIGURE 6
   [112]Open in a new tab

   Clustering coefficient of MCCs for each tissue.

   Analysis of core modules and a focus on the difference between the
   edges/interactions instead of nodes/genes individually would provide
   some important information from another viewpoint. We will illustrate
   this point in the next subsection.

Creating Merged Core Module (MCM) Through Jaccard Index

   Here we created the merged core module (MCM) by integrating three
   tissue-dependent core modules through introducing ‘Jaccard index’
   ([113]Knaack et al., 2014). For each pair of core modules, we
   calculated the Jaccard index, which for a pair of sets is defined as
   the ratio between the size of the intersection and the size of the
   union of two sets. If the Jaccard index of an interaction is higher
   than a threshold in at least one pair of modules, we added this
   interaction to the merged core module along with their weights. To find
   a suitable threshold, we separately calculated a series of clustering
   coefficients of complexes which were generated when the threshold was
   set as 0.3, 0.2, 0.1, and 0.05, and the resulting clustering
   coefficients were 0, 0, 0.027, and 0.026. Hence, the parameter α = 0.1
   was finalized as the best threshold to create our MCM. Finally, we had
   an MCM composed of 198 edges among 154 nodes.

   We then investigated the inferred MCM to identify the specific and
   common components of dysfunctions between different tissues.

Cross-Tissue Analysis Based on Merged Core Module

   To systematically assess the extent to which the dysfunctions were
   shared among different tissues, we split the edges of MCM into three
   categories:
     * simple (1)
       Tissue-specific edge: the edge was significantly dysfunctional in
       only one tissue (as shown in [114]Figure 7).
     * simple (2)
       Differential edge: the edge was significantly dysfunctional in at
       least two tissues. While their statuses were different, for
       example, the edge was active in tissue A whereas inactive in tissue
       B (shown in [115]Figure 8).
     * simple (3)
       Common edge: the edge was significantly dysfunctional in at least
       two tissues. When the status was consistent, for example, the edge
       was active in tissue A whereas inactive in tissue B (shown in
       [116]Figure 9).

FIGURE 7.

   [117]FIGURE 7
   [118]Open in a new tab

   Tissue-specific edges displayed in MCM. (A) Adipose. (B) Muscle. (C)
   Liver.

FIGURE 8.

   [119]FIGURE 8
   [120]Open in a new tab

   Differential edges displayed in MCM. The two subgraphs in each row show
   the differential edges between tissue I and tissue II. The different
   status between the tissue pair under T2D can be exhibited through the
   dysfunctional weight of the edge in different tissue-dependent WDIN.
   (A,B) Show the differential edges between adipose and muscle,
   respectively. (C,D) Show muscle and liver, respectively. (E,F) Show
   adipose and liver, respectively. The color and width of the edge here
   have the same meaning as in [121]Figure 7.

FIGURE 9.

   [122]FIGURE 9
   [123]Open in a new tab

   Common edges displayed in MCM. The two subgraphs in each row show the
   common edges between two tissues. The consistent status between the
   tissue pair under T2D can be exhibited through the dysfunctional weight
   of the edge in different tissue-dependent WDIN. (A,B) Are the common
   edges between adipose and muscle, respectively. (C,D) Are for muscle
   and liver, respectively. (E,F) Show adipose and liver, respectively.
   The color and width of the edge here have the same meaning as in
   [124]Figure 7.

   We found that in MCM, most adipose-specific edges appeared in links to
   hub gene PIK3R1, and a large proportion of them were active under T2D,
   such as the function occurred between PIK3R1 and INS1, the function
   between PIK3R1 and IRS2, the function between PIK3R1 and IGF1 etc.
   Taking muscle-specific edges in MCM into consideration, it can be seen
   that PIK3R1, PIK3CB, and EGFR were key nodes. The corresponding
   functions linked to PIK3R1 (such as the function occurred between
   PIK3R1 and IGF1R) were inactive which made the condition different than
   in adipose, and the corresponding functions around EGFR were active
   (such as functions linked between this node and IGF1). Though the
   distribution of liver-specific edges in MCM was decentralized when
   compared with other two tissues, PIK3R1 and PIK3CB were still two key
   nodes, interactions between PIK3R1 and other nodes (such as INS2) were
   active, while PIK3CB was a balanced node in terms of the hallmark of
   interactions.

   There were 7 differential edges between adipose and muscle in MCM,
   which means the corresponding functions presented opposite status in
   these two tissues. As documented ([125]Bereziat et al., 2002), GRB14
   inhibited the catalytic activity of the INSR, which was identical to
   our result displayed in [126]Figure 8B. Actually, according to our
   result, this inhibition may have only emerged in muscle tissue, while
   it would be inverted in adipose tissue ([127]Figure 8A) This provides a
   meaningful target for biological experiments.

   All 5 differential edges were identified between muscle and liver in
   MCM ([128]Figure 8C,D). Also from the visible changes in corresponding
   functions we could reveal more subtle and interesting difference
   between tissues. For example, as described in GeneCards^[129]3, SHC1
   interacts with the NPXY motif of tyrosine-phosphorylated IGF1R.
   According to our results, we can further infer that the interaction was
   active in muscle under T2D while inhibited in liver. Some similar
   results would be observed between adipose and liver.

   Inspecting the common edges between tissue pair, a notable thing was
   that all the four common interactions between adipose and liver were
   inactive. Remarkably, one of these four edges was between ABCC8 and
   KCNJ11, which are two well-known T2D-related genes encoding proteins
   Kir6.2 and Sur1, respectively, in pancreatic beta cells. KCNJ11
   interacts with ABCC8 to produce the KATP channel, which transfers
   potassium ions across the beta cells ([130]Haghvirdizadeh et al.,
   2015). This interaction was indirectly linked through CACNA1E and was
   inhibited in both adipose and liver.

   However, here we only described several edges for each category.
   Similar results can be examined from [131]Figure 6–[132]9 which would
   be useful in studies about disease mechanism through analyzing the
   shared and specific components among tissues under the disease.

Discussion

   Type 2 diabetes (T2D) is a complex disease and its dysfunction involves
   many tissues. This work systematically investigates commonalities and
   specificities of T2D among multiple tissues. We established a
   multi-level comparative framework across three insulin target tissues
   (white adipose, skeletal muscle, and liver) to provide a better
   understanding of T2D.

   The first challenge is to represent the tissues from the data. Starting
   from the ranks of gene expression, we constructed the ‘disease network’
   through detecting diverse interactions to provide a
   well-characterization for disease affected tissues. Based on the
   constructed tissue-dependent WDINs, an elementary and integral
   comparative analysis at network-level was conducted. The results of
   network similarity according to the edges of network indicated that the
   similarity among three tissues is lower, thus justifying the necessity
   to conduct tissue-specific analysis for T2D. The differences among
   tissues were also visible in enriched motif based on tissue-specific
   genes, and these differences showed that some T2D-related pathways or
   biological processes own tissue specificity. Besides, we found that
   among three tissues, adipose and muscle have more similar components in
   terms of both enriched functions and network similarity.

   To reduce the negative effects induced by genes loosely associated with
   T2D, RWRs algorithm was applied to the disease network to prioritize
   its nodes and edges according to their associations with T2D. Genes
   ranked higher theoretically are significantly associated with T2D.
   Gene-level analysis was carried out on those genes ranked higher such
   as in the top100. On one side, we discussed these genes individually
   and found that some of them have been reported to be related with
   disease genes, while several are not yet documented and could be
   potential T2D-related genes which may be further verified
   experimentally. On the other side, we collected these genes together to
   survey their combined functions through inspecting the enriched
   pathways and biological processes. Compared with the similar analysis
   based on the whole disease network, the analysis based on those closely
   associated with T2D displayed more specific and striking enriched
   issues with T2D (such as type II diabetes mellitus, insulin signaling
   pathway, and glucose metabolic process).

   The network- and gene- level analyses could give some novel information
   cross tissues; meanwhile, they largely verified the effectiveness of
   our random walking on the constructed WDINs. Thus, based on the
   prioritized edges of WDINs, we further identified a merged core module
   (MCM) by combining the clustering coefficient and Jaccard index, which
   can provide and elaborate and visible explanation about the common and
   specific features for different tissues at module- level. Edges in the
   MCM were grouped into three categories: specific edges, differential
   edges, and common edges. Focusing on these three categories of edges,
   more detailed common issues and specific differences of dysfunctional
   functions between tissues were revealed which would enable us to
   further understand the disease. We await the emergence of
   tissue-specific and isoform-specific gene (specifically genes from IRs)
   knockout studies to corroborate these conclusions.

   Overall, we presented a mathematical and systems biology framework
   consisting of constructing tissue-dependent disease related networks
   and multi-level analysis based on these constructed networks. Our
   network-, gene-, and module-level characterization across different
   tissues of T2D hold the promise to provide a broader and deeper
   understanding for T2D mechanism.

Data Availability

   Publicly available datasets were analyzed in this study. This data can
   be found here: [133]www.ncbi.nlm.nih.gov/geo.

Author Contributions

   YW and SS designed the study. FS organized the database and analyzed
   the results. SS and YW wrote the manuscript. All authors read and
   approved the submitted version.

Conflict of Interest Statement

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

   Funding. This work was supported in part by the National Key Research
   and Development Program of China under grant 2017YFC0908400. YW is also
   supported by NSFC under nos. 11871463, 61621003, and 61671444.
   ^1

   [134]http://rgd.mcw.edu/wg/home
   ^2

   [135]http://metascape.org
   ^3

   [136]https://www.genecards.org

Supplementary Material

   The Supplementary Material for this article can be found online at:
   [137]https://www.frontiersin.org/articles/10.3389/fgene.2019.00252/full
   #supplementary-material
   [138]Click here for additional data file.^ (735.7KB, pdf)

References