Abstract
Background
The rewiring of molecular interactions in various conditions leads to
distinct phenotypic outcomes. Differential network analysis (DINA) is
dedicated to exploring these rewirings within gene and protein
networks. Leveraging statistical learning and graph theory, DINA
algorithms scrutinize alterations in interaction patterns derived from
experimental data.
Results
Introducing a novel approach to differential network analysis, we
incorporate differential gene expression based on sex and gender
attributes. We hypothesize that gene expression can be accurately
represented through non-Gaussian processes. Our methodology involves
quantifying changes in non-parametric correlations among gene pairs and
expression levels of individual genes.
Conclusions
Applying our method to public expression datasets concerning diabetes
mellitus and atherosclerosis in liver tissue, we identify
gender-specific differential networks. Results underscore the
biological relevance of our approach in uncovering meaningful molecular
distinctions.
Keywords: Differential network, DINA, Differential network analysis,
Atherosclerosis, Diabetes
Introduction
The emergence of high-throughput technologies in genomics, proteomics,
and non-coding RNA studies has revolutionized our understanding of how
variations in the abundance of these biological molecules correlate
with diseases [1]. This deluge of data has spurred the creation of
innovative analytical techniques that adopt a system-level approach
through the lens of network science. These techniques employ networks
to depict the complex interactions between biological molecules under
specific conditions, as inferred from empirical data [2]. A
particularly compelling use of network theory is comparing biological
networks under disparate conditions, such as contrasting a disease
state with a healthy one.
Differential network analysis (DINA) has gained recognition for its
ability to delineate the differences between two states by
encapsulating them within a singular differential network that
highlights their variances [3–6]. DINA has found application in
contrasting various experimental conditions or phenotypes, with recent
studies underscoring the significance of factors like the age and sex
of patients on drug response, disease progression, and comorbidity
prevalence in chronic diseases [7–9]. Empirical findings have
demonstrated notable disparities in disease incidence and progression
based on sex and age, such as the heightened susceptibility of older
diabetic patients to comorbidities and the observed sex-based
differences in COVID-19 mortality rates [10–12]. This
underscores the need for advanced algorithms to unravel the molecular
mechanisms driving these age- and sex-dependent disparities.
DINA algorithms are designed to pinpoint changes in network structures
by identifying association measures that differ between two biological
states, $C_1$ and $C_2$. When presented with two disparate biological
conditions, represented by two networks of molecular interactions, DINA
algorithms aim to uncover the network rewiring that underpins the
mechanistic differences between these states. Starting with gene
expression datasets from the two conditions, DINA algorithms construct
networks $N_1$ and $N_2$, one for each condition. These networks feature
nodes for each gene and weighted edges that denote the strength and
nature of associations or causal relationships between genes. A
differential network $N_d$ is then derived to represent the variation in
associations across conditions, a technique previously applied to
investigate disease-related alterations [8, 13].
Traditional methods often assume that gene expression data adhere to
specific parametric distributions, such as Gaussian or Poisson
distributions [1, 14–16]. However, the count-based nature
of Next-Generation Sequencing (NGS) data challenges these assumptions,
prompting the necessity for non-parametric DINA analysis approaches.
We introduce a novel DINA methodology that identifies differential
edges between networks and integrates differential gene expressions,
taking into account sex-based differences [12]. This approach
employs multivariate count data for predicting gene expression levels
and constructs a conditional dependence graph using pairwise Markov
random fields [17]. This departure from traditional methods, which
often presuppose parametric distributions for gene expression data,
highlights the imperative for non-parametric techniques in DINA
analysis.
Our proposed DINA algorithm is initiated by constructing two
condition-specific graphs, from which a final differential graph is
derived. This graph is then pruned to emphasize edges related to genes
exhibiting differential expression. Our DINA method facilitates the
identification of differential networks while incorporating
considerations of gender differences, thereby advancing our
understanding of the molecular basis of disease in the context of sex
and age.
Relevance of DINA
Differential networks are used to compare two or more networks based on
changes in connectivity or interactions between nodes under different
conditions or across different datasets. They are commonly applied in
biological networks, such as gene expression or protein-protein
interaction networks, to identify how interactions change in response
to disease, treatment, or other stimuli [18–26].
For instance, DINA may be applied to gene expression networks in healthy
versus diseased states. Suppose that Network 1 represents the healthy
state. In the healthy state, a set of genes (nodes) is connected based on
their co-expression (edges). For example, if Gene A and Gene B are
highly co-expressed, there is an edge between them. This network
shows how genes typically interact in a healthy individual. Suppose that
Network 2 models a diseased state (such as cancer) in which the
expression levels of genes may change, and therefore the interactions
between them may be altered. Some gene-gene interactions may be lost,
and new interactions may emerge. The differential network is created by
taking the difference between the two networks (healthy vs. diseased).
This network highlights both lost and novel connections:
interactions present in the healthy network but missing in the diseased
network, and interactions that appear in the diseased state but were
absent in the healthy state.
For example, in the healthy state Gene A interacts with Gene B and
Gene C. In the diseased state, Gene A no longer interacts with Gene B
but interacts with Gene D instead. The differential network would show
a lost edge between Gene A and Gene B and a new edge between Gene A and
Gene D. This type of analysis helps in identifying key genes and
pathways that may be responsible for the diseased condition or that
could be potential therapeutic targets (see Fig. 1).
Fig. 1. Toy example of a differential network
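As a minimal sketch of the toy comparison above, the following Python snippet represents each network as a set of undirected edges and takes set differences to obtain the lost and the gained edges. The function name `differential_edges` is illustrative only, and keeping the Gene A and Gene C edge in the diseased state is an assumption, since the example does not say whether it persists.

```python
def differential_edges(healthy, diseased):
    """Return (lost, gained) undirected edges between two edge lists."""
    # frozenset makes ("A", "B") and ("B", "A") the same undirected edge
    h = {frozenset(e) for e in healthy}
    d = {frozenset(e) for e in diseased}
    lost = h - d      # present in healthy, missing in diseased
    gained = d - h    # absent in healthy, present in diseased
    return lost, gained


healthy = [("A", "B"), ("A", "C")]
# Assumption: the A-C edge is retained in the diseased state
diseased = [("A", "C"), ("A", "D")]

lost, gained = differential_edges(healthy, diseased)
print("lost:", [tuple(sorted(e)) for e in lost])
print("gained:", [tuple(sorted(e)) for e in gained])
```

In this unweighted setting the differential network is simply the union of the two returned sets; weighted approaches instead compare edge strengths between conditions.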
Examples of the relevance of DINA are reported in many works. For
instance, Ha et al. [27] describe differential networks
between two different subtypes of glioblastoma estimated from genomic
data. Basha et al. [28] introduce an extensive differential
network analysis of multiple human tissue interactomes. As a result,
they are able to show differences in processes between tissues.
Related work
DINA’s application in distinguishing differentially expressed genes
among various sample groups is invaluable, especially in contrasting
individuals with specific diseases against healthy controls. This
methodological approach is crucial in molecular biology and
bioinformatics for pinpointing genes with variable expression levels
between diseased and healthy sample groups.
Central to DINA-based research are algorithms designed to detect
alterations in network structures under varying conditions [29].
These algorithms have been pivotal in biology for mapping the
transition from healthy to diseased states within the same biological
framework [2]. Our focus narrows to networks that maintain constant
node sets yet exhibit variable edge sets. Specifically, in the presence
of two distinct conditions $C_1$ and $C_2$, represented by graphs
$G_1(V, E_1)$ and $G_2(V, E_2)$, the objective of DINA analysis is to
pinpoint the modifications of the network.
In biological systems analysis, it is pertinent to note that while
nodes represent directly quantifiable entities, the derivation of edges
necessitates observing a sequence of temporal data. For instance, gene
networks originating from microarray experiments necessitate the
inference of edges from data through statistical graphical models
[30–32]. In these models, each node within the graph $G=(V,E)$
is one of the measurable random variables $X_1,\ldots,X_M$, and the
edges quantify a pre-specified notion of association between
pairs of these variables. In this setting, the focus is
predominantly on undirected graphs, where the directions or the
causality of these associations are not of interest. Among different
metrics of association, partial correlation is one of the most common,
as it measures conditional dependencies. Probabilistic graphical
models allow conditional dependency-based graph estimation.
Differential associations within these models are scrutinized by
evaluating the variation in partial correlations across experimental
conditions, utilizing specific statistical tests to measure the
alterations in correlations among entities. Additionally, changes in
gene expression levels are assessed using the classical Student's
t-test [33]. Subsequently, these statistical evaluations are
combined into a single optimization model aiming to elucidate the
hierarchical network structures. However, certain assumptions inherent
to previous models, such as Gaussian data distribution, may not hold
across all experimental conditions, necessitating non-parametric
methods. While parametric methods are computationally efficient and
simpler to implement, they demand adherence to specific distributional
prerequisites; violating these could skew or invalidate the results.
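To illustrate why partial correlation is read as a measure of conditional dependence, the sketch below computes a first-order partial correlation from pairwise Pearson correlations. The data are made up for illustration: x and y are both driven by z plus mutually orthogonal noise, so their marginal correlation is clearly positive while their partial correlation given z vanishes. The function names are hypothetical and are not taken from any of the cited methods.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Illustrative data: z drives both x and y; e1 and e2 are orthogonal noise
z = [-3, -1, 1, 3]
e1 = [1, -1, -1, 1]
e2 = [-1, 3, -3, 1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

print("marginal correlation:", pearson(x, y))
print("partial correlation given z:", partial_corr(x, y, z))
```

With more than one conditioning variable, partial correlations are usually obtained from the inverse covariance (precision) matrix rather than from this three-variable formula.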
Several studies have opted for a nonparanormal data distribution (or
Gaussian copula) approach [34], employing rank-based correlation
matrices such as Spearman correlation or Kendall's $\tau$. Other
variations are available too [29]. However,
nonparanormal models are primarily suitable only for continuous data,
which limits their applicability in other settings.
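As a small illustration of a rank-based association measure, the sketch below computes Kendall's tau in its simplest form (tau-a, where tied pairs count as neither concordant nor discordant). The count vectors are invented for the example. Because the statistic depends only on the ordering of the values, it is unchanged by monotone transformations such as a log transform, which is the property that makes rank-based measures attractive for non-Gaussian data.

```python
from itertools import combinations
from math import log1p


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up expression counts for two genes across five samples
gene_a = [0, 3, 10, 50, 200]
gene_b = [1, 2, 30, 9, 1000]

tau_raw = kendall_tau(gene_a, gene_b)
tau_log = kendall_tau([log1p(v) for v in gene_a], gene_b)
print(tau_raw, tau_log)  # identical: tau is invariant to monotone transforms
```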
These models have found applications in analyzing brain data and
sequencing counts, circumventing the temporal limitations of
non-parametric methods. Efficient Bayesian models have emerged
[29], calculating edge probabilities by inferring their likelihood.
Some methods adopt diverse heuristics for probability inference,
challenging direct data derivation, as highlighted in [17], with
this method surpassing other contemporary techniques.
Non-parametric methods, recognized for their minimal assumptions
regarding data distribution, leverage data-driven approaches to
evaluate network connectivity differences between conditions. Their
flexibility and robustness are advantageous in handling complex,
non-linear node relationships within networks, albeit at the cost of
computational intensity and reduced interpretability.
The decision between parametric and non-parametric approaches for
differential network analysis hinges on data characteristics,
foundational assumptions, and the investigative query. Researchers
frequently use sensitivity analysis and result cross-validation to
ensure their findings’ robustness and reliability. Integrating insights
from both methodological spectrums can yield a more detailed
comprehension of the differential network architecture.
Materials and methods
Non-parametric differential network analysis algorithm
Let us consider two expression datasets encoded in two matrices $X^{1}$
and $X^{2}$ of dimension $N_j \times M$ ($N_j$ samples, $M$ genes) for
$j=1,2$, representing two biological conditions $C_1$ and $C_2$. Each
row of $X^{j}$ stores the expression values of the M genes for one
sample. Therefore, $X^{c}_{i,j}$ ($c=1,2$; $i=1,\ldots,N_c$;
$j=1,\ldots,M$) denotes the expression of the j-th gene in the i-th
sample under condition c. Note that the sample sizes under the two
conditions may be different. We model these data under a Bayesian
non-parametric framework.
Each column, representing a gene, may be encoded as a network node, and
we compute conditional independence-based graphical relations among the
nodes. Let the $M \times M$ dimensional matrices $P_1$ and $P_2$
represent the conditional independence relations among the M genes
[17]. We then define the differential relation between the two
conditions based on the posterior samples of $P_1$ and $P_2$.
Following the pairwise Markov random field (MRF) model for counts from
[17], we consider the following joint probability mass function for
M-dimensional count-valued data X,
$$\Pr(X_1,\ldots,X_M) \propto \exp\left(\sum_{j=1}^{M}\left[\alpha_j X_j - \log(X_j!)\right] - \sum_{\ell=2}^{M}\sum_{j<\ell}\beta_{j\ell}F(X_j)F(X_\ell)\right),$$
where $F(\cdot)$ is a monotone increasing bounded function with support
$[0,\infty)$. As in [17], we let $F(\cdot)=(\tan^{-1}(\cdot))^{\theta}$
for some positive $\theta\in\mathbb{R}^{+}$. Since the data is
positive-valued, the range of F(X) is $(0,(\pi/2)^{\theta})$, and the
exponent $\theta$ is specified as a minimizer of the loss quantifying
the difference in covariance between F(X) and X, following [17]. For
detailed descriptions of the method, readers are encouraged to consult
[17].
Under this model, if $\beta_{j\ell}=0$, then $X_j$ and $X_\ell$ are
conditionally independent, i.e.
$P(X_j,X_\ell\mid X_{-(j,\ell)})=P(X_j\mid X_{-(j,\ell)})P(X_\ell\mid X_{-(j,\ell)})$,
where $X_{-(j,\ell)}$ stands for all the variables excluding $X_j$ and
$X_\ell$. Our estimated graphical relation thus relies on the
$\beta_{j\ell}$'s.
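To make the structure of the probability mass function concrete, the following sketch evaluates its unnormalized logarithm for a single count vector. It is only an illustration of the formula, not the CONGA implementation: the function names, the argument layout (a list `alpha` and a symmetric nested list `beta` of which only the entries with j < l are used), and the numeric values are all assumptions. The normalizing constant is not computed.

```python
import math


def F(x, theta):
    """Bounded monotone transform F(x) = atan(x) ** theta."""
    return math.atan(x) ** theta


def log_unnormalized_pmf(x, alpha, beta, theta):
    """Log of the unnormalized pairwise MRF mass at the count vector x."""
    M = len(x)
    # Node-wise Poisson-type terms: alpha_j * x_j - log(x_j!)
    node = sum(alpha[j] * x[j] - math.lgamma(x[j] + 1) for j in range(M))
    # Pairwise interaction terms over j < l
    pair = sum(
        beta[j][l] * F(x[j], theta) * F(x[l], theta)
        for l in range(1, M)
        for j in range(l)
    )
    return node - pair


x = [1, 2]                      # illustrative counts for two genes
alpha = [0.0, 0.0]              # illustrative node parameters
no_edge = [[0.0, 0.0], [0.0, 0.0]]
with_edge = [[0.0, 0.5], [0.5, 0.0]]

print(log_unnormalized_pmf(x, alpha, no_edge, theta=1.0))
print(log_unnormalized_pmf(x, alpha, with_edge, theta=1.0))
```

When every interaction coefficient is zero the expression reduces to a sum of independent node terms, which is the conditional independence case described above.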
We take a Bayesian route for inference and put the same priors as in
[17]. Specifically, for the $\beta_{j\ell}$'s, we set simple independent
and identically distributed mean-zero Gaussian priors. The parameters
$\lambda_j$ are treated as random effects and given distributions $D_j$.
The distribution $D_j$ governs the over-dispersion and the shape of the
marginal count distribution for the j-th node. To allow these marginals
to be flexibly determined by the data, we take a Bayesian nonparametric
approach using Dirichlet process (DP) priors, where
$D_j\sim \mathrm{DP}(M_j D_0)$, with $D_0$ as a Gamma base measure and
$M_j$ as a precision parameter. The precision parameter $M_j$ follows a
Gamma distribution, $M_j\sim \mathrm{Ga}(c,d)$, allowing for greater
adaptivity to the data. We run MCMC to approximate the posterior and
generate posterior samples of the $\beta_{j\ell}$'s. The model follows a
structure similar to the Poisson auto-model [35]. When the
$\beta_{j\ell}$'s are zero, the model yields Poisson-type marginals with
sample-specific means, as the $\lambda_j$'s are modeled as random
effects. Thus, they also need to be sampled for all the samples, which
incurs a moderately high computational cost. The general code to fit
this model is available on the second author's GitHub page,
https://github.com/royarkaprava/CONGA.
After running the Markov chain Monte Carlo (MCMC) sampling for the
above model under the two conditions, we get the matrices $P_1$ and
$P_2$ with (j, k)-th entries $\beta_{j,k}^{(1)}$ and $\beta_{j,k}^{(2)}$,
respectively. Consequently, a differential network is defined through
the difference $\beta_{j,k}^{(1)}-\beta_{j,k}^{(2)}$ for each edge
(j, k), where $\beta_{j,k}^{(1)}$ and $\beta_{j,k}^{(2)}$ are the
coefficients under conditions 1 and 2. From the MCMC samples, we can get
the posterior mean of these differences as
$\hat{\beta}_{j,k}^{(1)}-\hat{\beta}_{j,k}^{(2)}$ using the individual
posterior means. Alternatively, we can compute other posterior summaries
such as $P(|\beta_{j,k}^{(1)}-\beta_{j,k}^{(2)}|>c\mid D_1,D_2)$, which
is the posterior probability that
$|\beta_{j,k}^{(1)}-\beta_{j,k}^{(2)}|$ is greater than some
pre-specified cutoff c given the two datasets, denoted as $D_1$ and
$D_2$. We take the second approach to define our differential networks.
To choose c adequately, we run a sensitivity test. Specifically, we vary
c over a range and compute
$f_{jk}(c)=P(|\beta_{j,k}^{(1)}-\beta_{j,k}^{(2)}|>c\mid D_1,D_2)$ for
each choice. Then, we monitor
$\sum_{j,k}(f_{jk}(c_1)-f_{jk}(c_2))^2$ for pairs of cutoffs $c_1$ and
$c_2$ and find the smallest c showing stability around its neighborhood
to assess its sensitivity on the estimate.
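The posterior summary and the sensitivity check can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the posterior draws are invented, draws from the two independently fitted chains are paired by index (an assumption), and the sum of squared changes is evaluated between consecutive cutoffs on a grid, which is one plausible reading of the stability criterion.

```python
def posterior_prob_diff(draws1, draws2, c):
    """Monte Carlo estimate of P(|beta1 - beta2| > c | data) for one edge."""
    # Assumption: the i-th draws of the two independent chains are paired
    pairs = list(zip(draws1, draws2))
    return sum(abs(b1 - b2) > c for b1, b2 in pairs) / len(pairs)


def sensitivity_curve(edges1, edges2, grid):
    """Sum over edges of squared changes in f(c) between consecutive cutoffs.

    edges1, edges2: dicts mapping an edge (j, k) to its posterior draws
    under condition 1 and condition 2. Returns (c_prev, c_next, change).
    """
    out = []
    prev_c, prev_f = None, None
    for c in grid:
        f = {e: posterior_prob_diff(edges1[e], edges2[e], c) for e in edges1}
        if prev_f is not None:
            change = sum((f[e] - prev_f[e]) ** 2 for e in f)
            out.append((prev_c, c, change))
        prev_c, prev_f = c, f
    return out


# Invented posterior draws for a single edge under the two conditions
cond1 = {("g1", "g2"): [0.5, 0.6, 0.4, 0.5]}
cond2 = {("g1", "g2"): [0.0, 0.1, 0.3, -0.1]}

print(posterior_prob_diff(cond1[("g1", "g2")], cond2[("g1", "g2")], 0.3))
for c_prev, c_next, change in sensitivity_curve(cond1, cond2, [0.05, 0.3, 1.0]):
    print(c_prev, c_next, change)
```

A cutoff would then be chosen where the reported change stays small across neighbouring grid values.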
Databases
The T2DiACoD database, described by Rani et al. (2017) [36], was
employed to collate a comprehensive list of genes linked to comorbid
conditions associated with Type 2 Diabetes Mellitus (T2DM). Gene
expression datasets were also sourced from the GTEx database [37].
T2DiACoD is a curated database compiled through systematic literature
review. It catalogues genes and noncoding RNAs that are relevant to
understanding T2DM and its frequent comorbidities, including
atherosclerosis, nephropathy, diabetic retinopathy, and cardiovascular
disorders. This repository, which integrates data from existing
databases, contains 650 genes and 34 microRNAs related to these
conditions.
The Genotype-Tissue Expression (GTEx) project is a large open-access
platform that distributes genomic data collected from various
individuals. This repository encompasses a broad spectrum of genomic
data, from sequencing to methylation analyses.
GTEx offers detailed metadata for each sample, covering aspects such as
tissue type, sex, and age, with age categorized into six distinct
groups. This makes GTEx a valuable resource for investigating the
interplay between age and tissue-specific gene expression. As of
February 1st, the GTEx database contains 17,382 samples across 54 tissue
types from 948 donors, all accessible via the GTEx web portal. This
portal enables users to efficiently search for and visualize data [12,
38, 39]. Furthermore, the data can be downloaded for in-depth
analysis with custom scripts. In our research, we used data from
various tissues, dividing samples into two categories based on sex. Our
study focused on the genes that play a pivotal role in developing
T2DM-related complications, examining nine tissues (blood,
brain, adipose, amygdala, aorta, colon, coronary, liver, and lung
tissues).
We selected an equal number of samples from each tissue type,
maintaining a balanced representation of age groups and uniform sample
sizes across tissues. This approach yields an equitable distribution of
age groups within each sex-based category for each tissue analyzed.
Focusing on atherosclerosis, we obtained a list of 115 genes related to
this disease from the T2DiACoD database, while for diabetes we obtained
a list of 650 genes. We retrieved expression data using
GTExVisualizer [8, 40], and metadata on the tissue, sex, and
age of each sample were extracted for the genes identified in the
T2DiACoD database in the previous step. Expression data are measured in
transcripts per million (TPM). This data integration and gene enrichment
process was performed using an ad hoc script that has been
integrated into GTExVisualizer. We performed the analysis at the tissue
level: for each considered tissue, we split the data into male and
female samples and randomly selected the same number of samples from
each group. We first generated differential networks (DNs) using the
non-parametric method and then evaluated their biological significance
by means of enrichment methods.
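The sex-based split with equal group sizes can be sketched as follows. This is an illustrative snippet, not the script integrated into GTExVisualizer: the function name, the `(sample_id, sex)` input layout, the sex labels, and the sample identifiers are all assumptions, and the additional balancing of age groups described earlier is not reproduced here.

```python
import random


def balance_by_sex(samples, seed=0):
    """Split samples by sex and draw the same number from each group.

    samples: list of (sample_id, sex) tuples, sex in {"male", "female"}.
    Returns (male_ids, female_ids) of equal length.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    males = [sid for sid, sex in samples if sex == "male"]
    females = [sid for sid, sex in samples if sex == "female"]
    n = min(len(males), len(females))
    return rng.sample(males, n), rng.sample(females, n)


# Hypothetical sample identifiers for one tissue
tissue_samples = [
    ("s1", "male"), ("s2", "male"), ("s3", "male"),
    ("s4", "female"), ("s5", "female"),
]
male_ids, female_ids = balance_by_sex(tissue_samples, seed=1)
print(male_ids, female_ids)
```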
Results
To show the effectiveness of our method, we present two case studies on
two chronic diseases that highlight differential mechanisms related to
sex differences. We evaluated differential networks between men and
women, focusing on genes related to diabetes and atherosclerosis as
reported in the T2DiACoD database [36]. The characteristics of all the
DNs are reported in Tables 1 and 2. Then, to evaluate the
biological significance of the resulting DNs, we performed a pathway
enrichment analysis. Functional enrichment analysis, also known as
pathway enrichment analysis (PEA), is a bioinformatics technique used
to identify biological pathways that are significantly over-represented
in a given list of genes compared to what would be expected by chance.
These biological functions are stored in bioinformatics databases such
as KEGG, and statistical methods such as Fisher's exact test, which
computes the p-value of the enrichment, are used to determine the most
enriched pathways. Most PEA tools require a gene list as input. However,
some tools accept genomic regions instead of gene lists and first map
these regions to their associated genes. This process, known as genomic
region enrichment analysis, is helpful for uncovering biological
pathways related to specific chromosome regions.
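As a sketch of how an over-representation p-value can be obtained, the one-sided Fisher's exact test for enrichment is equivalent to the upper tail of a hypergeometric distribution. The snippet below computes that tail with exact integer arithmetic. The function name and the numbers in the usage example are invented for illustration and are not values from this study; enrichment tools may additionally apply multiple-testing correction, which is not shown.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of background genes
    K: number of background genes annotated to the pathway
    n: number of genes in the input list
    k: number of input genes annotated to the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    # comb returns 0 when n - i exceeds N - K, so impossible terms vanish
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented numbers: 128 input genes, 10 of them in a 100-gene pathway,
# against a background of 20000 genes
print(enrichment_pvalue(k=10, n=128, K=100, N=20000))
```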
Table 1. Characteristics of the differential networks related to diabetes
DN      Nodes   Edges
Liver   128     3340
Aorta   237     2116
Heart   238     4316
Table 2. Characteristics of the differential networks related to atherosclerosis
DN                 Nodes   Edges
Adipose Visceral   12      32
Artery Coronary    11      30
Artery Tibial      11      12
Blood              2       1
Thus, for each network, we performed a pathway enrichment analysis based
on the KEGG pathway database [41], available through the STRING
enrichment app [42] of the Cytoscape software [43].
Diabetes related differential networks
Liver Tissue For liver tissue in diabetes, we obtained a DN with 128
nodes and 3340 edges (see Fig. 2). The enrichment analysis
highlighted several pathways enriched between the sexes (see
Table 3). Figure 3 depicts the subnetworks of the differential
network for Liver tissue in Diabetes, in which we highlighted the genes
involved in the resulting pathways in different colours. In order to
evaluate
Fig. 2. Differential network between men and women, focusing on genes
related to diabetes in liver tissue
Table 3. Top Enriched Pathways of liver tissue in Diabetes
Description                                                      Term name   P-value (Fisher's exact test)
Cytokine-cytokine receptor interaction                           hsa04060    5.15e-8
Viral protein interaction with cytokine and cytokine receptor    hsa04061    3.48e-7
Chemokine signaling pathway                                      hsa04062    3.47e-6
Inflammatory bowel disease                                       hsa05321    3.47e-6
Toll-like receptor signaling pathway                             hsa04620    5.97e-5
Malaria                                                          hsa05144    2.6e-4
Rheumatoid arthritis                                             hsa05323    2.6e-4
Renin-angiotensin system                                         hsa04614    3.5e-4
Th17 cell differentiation                                        hsa04659    4.1e-4
Chagas disease                                                   hsa05142    4.1e-4
Fig. 3. Selected subnetworks of the differential network for
Liver tissue in Diabetes. Starting from the differential
network, we highlighted the genes involved in the pathways resulting
from the enrichment analysis in different colours, whereas the genes not
involved in pathways are shown in grey. In particular, the genes
involved in Cytokine-cytokine receptor interaction are coloured in
purple, the genes involved in Inflammatory bowel disease are coloured
in green, the genes involved in Viral protein interaction with cytokine
and cytokine receptor are coloured in pink, and the genes involved in
Chemokine signaling pathway are coloured in blue
Aorta Tissue For Aorta tissue in Diabetes, we obtained a DN with 237
nodes and 2116 edges (see Fig. 4). The enrichment analysis
highlighted several pathways enriched between the sexes (see
Table 4). Figure 5 depicts the subnetworks of the differential
network for Aorta tissue in Diabetes, in which we highlighted the genes
involved in the resulting pathways in different colours.
Fig. 4. Differential network between men and women, focusing on genes
related to diabetes in aorta tissue
Table 4. Top Enriched Pathways of aorta tissue in Diabetes
Description                                                Term name   P-value (Fisher's exact test)
Toll-like receptor signaling pathway                       hsa04620    7.05e-19
NOD-like receptor signaling pathway                        hsa04621    7.57e-16
Shigellosis                                                hsa05131    7.57e-16
Hepatitis B                                                hsa05161    1.08e-15
Salmonella infection                                       hsa05132    1.5e-15
TNF signaling pathway                                      hsa04668    4.27e-15
NF-kappa B signaling pathway                               hsa04064    1.31e-14
Toxoplasmosis                                              hsa05145    1.55e-14
PD-L1 expression and PD-1 checkpoint pathway in cancer     hsa05235    1.82e-14
PI3K-Akt signaling pathway                                 hsa04151    6.36e-14
Fig. 5. Selected subnetwork of the differential network for
Aorta tissue in Diabetes. Starting from the differential
network, we highlighted the genes involved in the pathways resulting
from the enrichment analysis in different colours, whereas the genes not
involved in pathways are shown in grey. In particular, the genes
involved in the Toll-like receptor signaling pathway are coloured in
blue, the genes involved in the NOD-like receptor signaling pathway are
coloured in yellow, the genes involved in Hepatitis B are coloured in
pink, and the genes involved in Shigellosis are coloured in purple
Heart Tissue For Heart tissue in Diabetes, we obtained a DN with 238
nodes and 4316 edges (see Fig. 6). The enrichment analysis
highlighted several pathways enriched between the sexes (see
Table 5). Figure 7 depicts the subnetworks of the differential
network for Heart tissue in Diabetes, in which we highlighted the genes
involved in the resulting pathways in different colours.
Fig. 6. Differential network between men and women, focusing on genes
related to diabetes in heart tissue
Table 5. Top Enriched Pathways of heart tissue in Diabetes
Description                                                      Term name   P-value (Fisher's exact test)
NOD-like receptor signaling pathway                              hsa04621    2.33e-18
Toll-like receptor signaling pathway                             hsa04620    8.48e-18
TNF signaling pathway                                            hsa04668    6.15e-16
Cytokine-cytokine receptor interaction                           hsa04060    4.69e-14
NF-kappa B signaling pathway                                     hsa04064    2.34e-13
Salmonella infection                                             hsa05132    3.34e-13
Inflammatory bowel disease                                       hsa05321    3.79e-13
Shigellosis                                                      hsa05131    5.73e-13
PD-L1 expression and PD-1 checkpoint pathway in cancer           hsa05235    3.01e-12
Viral protein interaction with cytokine and cytokine receptor    hsa04061    1.04e-11
Fig. 7. Selected subnetwork of the differential network for
Heart tissue in Diabetes. Starting from the differential
network, we highlighted the genes involved in the pathways resulting
from the enrichment analysis in different colours, whereas the genes not
involved in pathways are shown in grey. In particular, the genes
involved in the Toll-like receptor signaling pathway are coloured in
blue, the genes involved in the NOD-like receptor signaling pathway are
coloured in red, the genes involved in Cytokine-cytokine receptor
interaction are coloured in pink, and the genes involved in
Inflammatory bowel disease are coloured in yellow
Figure 8 shows a selected subnetwork of the differential network
in Liver, Heart, and Aorta tissues.
Fig. 8. Differential network related to diabetes expressed in Liver,
Heart, and Aorta tissues. Genes expressed in different tissues are
reported in green, whereas genes specific to a single tissue are
reported in blue
Finally, we applied the Markov Clustering Algorithm to perform a
cluster analysis on the differential networks. Figures 9, 10, and
11 depict the clustered differential networks for Liver, Aorta, and
Heart tissues.
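For readers unfamiliar with Markov clustering, the sketch below shows the core loop of the algorithm on a dense adjacency matrix: self-loops are added, columns are normalized, and expansion (matrix squaring) alternates with inflation (element-wise powering followed by renormalization). It is a minimal pure-Python illustration under assumed defaults (inflation 2.0, a fixed number of iterations, no pruning), not the implementation used to produce the figures, and the six-node graph is invented.

```python
def normalize_columns(m):
    """Scale every column of a square matrix (list of lists) to sum to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= s
    return m


def mcl(adj, inflation=2.0, iters=50, tol=1e-6):
    """Markov clustering on a symmetric adjacency matrix.

    Returns clusters as a sorted list of sorted node-index lists.
    """
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic
    m = normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(iters):
        # Expansion: square the matrix (two-step random walk)
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    # Attractor rows (positive diagonal) define the clusters
    clusters = {
        frozenset(j for j in range(n) if m[i][j] > tol)
        for i in range(n) if m[i][i] > tol
    }
    return sorted(sorted(c) for c in clusters)


# Invented example: two disconnected triangles
triangle_pair = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(triangle_pair))
```

Larger inflation values generally yield finer clusters, so the parameter would need to be tuned for networks of the size reported in Tables 1 and 2.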
Fig. 9. Clustered Differential subnetwork for Liver tissue in Diabetes
Fig. 10. Clustered Differential subnetwork for Aorta tissue in Diabetes
Fig. 11. Clustered Differential subnetwork for Heart tissue in Diabetes
Atherosclerosis related differential networks
Adipose Visceral Tissue For Adipose Visceral tissue in Atherosclerosis,
we obtained a DN with 12 nodes and 32 edges (see Fig. 12). The
enrichment analysis did not report any enriched pathways between the
sexes.
Fig. 12. Differential network for Adipose Visceral tissue in Atherosclerosis
Artery Coronary Tissue For Artery Coronary tissue in Atherosclerosis, we
identified a network of 11 nodes and 30 edges (see Fig. 13). The
enrichment analysis highlighted an enriched pathway between the sexes
(see Table 6). Figure 14 depicts the subnetworks of the differential
network for Artery Coronary tissue in Atherosclerosis, in which we
highlighted the genes involved in the resulting pathways in different
colours.
Fig. 13. Differential network between men and women, focusing on genes
related to atherosclerosis in artery coronary tissue
Table 6. Enriched Pathway of Artery Coronary and Artery Tibial tissue in Atherosclerosis
Tissue            Description              Term name   P-value (Fisher's exact test)
Artery Coronary   PPAR signaling pathway   hsa03320    1.48e-7
Artery Tibial     PPAR signaling pathway   hsa03320    2.59e-5
Fig. 14. Selected subnetwork of the differential network for
Artery Coronary tissue in Atherosclerosis. Starting from the
differential network, we highlighted the genes involved in the pathways
resulting from the enrichment analysis, whereas the genes not
involved in pathways are shown in grey. In particular, the genes
involved in the PPAR signaling pathway are reported in yellow
Artery Tibial Tissue For Artery Tibial tissue in Atherosclerosis, we
identified a network of 11 nodes and 12 edges (see Fig. 15). The
enrichment analysis highlighted an enriched pathway between the sexes
(see Table 6). Figure 16 depicts the subnetwork of the differential
network for Artery Tibial tissue in Atherosclerosis, in which we
highlighted the genes involved in the resulting pathways in different
colours.
Fig. 15. Differential network between men and women, focusing on genes
related to atherosclerosis in artery tibial tissue
Fig. 16. Selected subnetwork of the differential network for Artery
Tibial tissue in Atherosclerosis. Starting from the differential
network, we highlighted the genes involved in the pathways resulting
from the enrichment analysis, whereas the genes not involved in
pathways are shown in grey. In particular, the genes involved in the
PPAR-γ signaling pathway are reported in yellow
Blood Tissue For Blood tissue in Atherosclerosis, we identified a
network of 2 nodes and 1 edge (see Fig. 17). The enrichment
analysis did not report any enriched pathways between the sexes.
Fig. 17.
Differential network for Blood tissue in Atherosclerosis
We summarize in Fig. [130]18 a selected subnetwork of the differential
networks in Adipose Visceral, Artery Coronary, Aorta, Artery Tibial,
and Blood tissues.
Fig. 18.
Differential network related to atherosclerosis expressed in Adipose
Visceral, Artery Coronary, and Artery Tibial tissues. Genes expressed
in more than one tissue are shown in green, whereas tissue-specific
genes are shown in blue
Finally, we applied the Markov Clustering Algorithm to perform a
cluster analysis on the differential networks. Figures [132]19, [133]20,
and [134]21 depict the clustered differential networks for Adipose
Visceral, Artery Coronary, and Artery Tibial tissues.
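As an illustration of the clustering step, the following is a minimal
pure-Python sketch of Markov clustering, which alternates expansion and
inflation of a column-stochastic matrix. It is not the implementation
used in this study: the inflation value of 2, the added self-loops, the
convergence tolerance, and the toy six-node network are all assumptions
chosen for the example.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected network given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Assumption: add self-loops, a common convention that stabilises MCL.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        # Inflation: raise entries to a power, then renormalise columns.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep non-zero mass act as attractors; the columns they
    # reach form one cluster. Identical clusters are merged via the set.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy network: two triangles with no edges between them.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_clustering(toy))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values generally yield smaller, tighter clusters, so
the value used should be reported alongside any clustered network.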
Fig. 19.
Clustered Differential network for Adipose Visceral tissue in
Atherosclerosis
Fig. 20.
Clustered Differential network for Artery Coronary tissue in
Atherosclerosis
Fig. 21.
Clustered Differential network for Artery Tibial tissue in
Atherosclerosis
Comparison with baseline methods
To evaluate the effectiveness of our method with respect to parametric
alternatives, we compared Conga with the R package iDINGO
[[138]44]. iDINGO infers group-specific dependencies and builds
differential networks. We built the differential networks for the
diabetes and atherosclerosis datasets, running iDINGO with its default
parameters. However, iDINGO built the differential networks according
to an NP-complete approach, including the whole set of edges in the
network. The characteristics of the differential networks related to
diabetes and atherosclerosis are reported in Tables
[139]7 and [140]8.
Table 7.
Characteristics of the differential networks related to diabetes built
with iDINGO
DN      Nodes   Edges
Liver   128     8127
Aorta   238     28679
Heart   238     28679
Table 8.
Characteristics of the differential networks related to atherosclerosis
built with iDINGO
DN                 Nodes   Edges
Adipose Visceral   12      45
Artery Coronary    11      45
Artery Tibial      11      45
Blood              2       1
We subsequently performed pathway enrichment analysis on the DNs and
found that, for diabetes, fewer pathways were detected for each tissue
than with the networks built by our method, whereas no pathway was
detected for atherosclerosis. This indicates that our methodology
builds networks with a greater biological information content than
classical methods. The results of the pathway enrichment analysis are
reported in Tables [143]9, [144]10 and [145]11.
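The enrichment tables report a "P-value with Fisher", which presumably
refers to Fisher's exact test. A minimal sketch of the one-sided
(over-representation) form of that test is given below. The function
name, the gene counts, and the choice of background are hypothetical
and are not taken from this study.

```python
from math import comb


def fisher_enrichment_pvalue(hits, network_size, pathway_size,
                             background_size):
    """One-sided Fisher's exact test for pathway over-representation.

    Returns P(X >= hits), where X is the number of pathway genes found
    when network_size genes are drawn without replacement from a
    background of background_size genes, pathway_size of which belong
    to the pathway (hypergeometric upper tail).
    """
    max_hits = min(network_size, pathway_size)
    total = comb(background_size, network_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, network_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 4 of the 11 network genes fall in a 70-gene
# pathway, with a background of 20000 annotated genes.
p = fisher_enrichment_pvalue(hits=4, network_size=11, pathway_size=70,
                             background_size=20000)
print(f"{p:.3g}")
```

Because the P-value depends strongly on the background gene set, the
background should be stated whenever such values are compared across
methods.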
Table 9.
Top Enriched Pathways of liver tissue in Diabetes related to DNs built
with iDINGO
Description                                                     Term name   P-value with Fisher
Cytokine-cytokine receptor interaction                          hsa04060    2.31e-6
Viral protein interaction with cytokine and cytokine receptor   hsa04061    2.7e-6
Chemokine signaling pathway                                     hsa04062    4.57e-5
Inflammatory bowel disease                                      hsa05321    2.31e-6
Toll-like receptor signaling pathway                            hsa04620    3.46e-4
Malaria                                                         hsa05144    7e-4
Table 10.
Top Enriched Pathways of aorta tissue in Diabetes related to DNs built
with iDINGO
Description                                              Term name   P-value with Fisher
Salmonella infection                                     hsa05132    6.12e-14
TNF signaling pathway                                    hsa04668    8.55e-14
NF-kappa B signaling pathway                             hsa04064    2.24e-13
Toxoplasmosis                                            hsa05145    2.65e-13
PD-L1 expression and PD-1 checkpoint pathway in cancer   hsa05235    2.68e-13
PI3K-Akt signaling pathway                               hsa04151    3.5e-12
Table 11.
Top Enriched Pathways of heart tissue in Diabetes related to DNs built
with iDINGO
Description                              Term name   P-value with Fisher
NOD-like receptor signaling pathway      hsa04621    2.25e-18
Toll-like receptor signaling pathway     hsa04620    1.17e-17
TNF signaling pathway                    hsa04668    7.96e-16
Cytokine-cytokine receptor interaction   hsa04060    6.43e-14
NF-kappa B signaling pathway             hsa04064    2.91e-13
Discussion
In this study, we introduced a non-parametric method for differential
network analysis. This approach is particularly beneficial for datasets
that contain count data or other data unsuited to parametric models,
which are prevalent in biological research but pose challenges due to
their non-normal distribution and discrete nature. By comparing the
network structures inferred under different conditions, differential
network analysis provides a more comprehensive understanding of
biological variations and interactions that are not discernible through
traditional methods. Given the absence of an established gold standard
in this area, a rigorous evaluation of the biological relevance of the
derived networks was paramount. We analysed the structure of the
networks and checked their consistency with known biological pathways
and mechanisms. This analysis supported the practical utility of our
methodology and provided insights into the biological systems under
study. Correlating our findings with existing biological knowledge and
experimental data further underscores the value of differential network
analysis in uncovering biological patterns and interactions.
In the context of diabetes, we found that the differential networks in
Liver, Aorta, and Heart tissues are enriched for the Toll-like receptor
signaling pathway, as indicated in Tables [149]3, [150]4, [151]7. The
Toll-like receptor 4 (TLR4) pathway has been associated with various
pathophysiological conditions, including cardiovascular diseases (CVDs)
and rheumatoid arthritis (RA), underscoring the relevance of our
findings in the broader context of human health. Several studies have
demonstrated that TLR4 activates the expression of several
pro-inflammatory cytokine genes that play pivotal roles in myocardial
inflammation, particularly myocarditis, myocardial infarction,
ischemia-reperfusion injury, and heart failure
[[152]45–[153]48].
We also found the NOD-like receptor signalling pathway in Aorta and
Heart tissues; see Tables [154]6 and [155]7. The NOD-like receptor
(NLR) family of proteins is a group of pattern recognition receptors
(PRRs) known to mediate the initial innate immune response to cellular
injury and stress. Several studies have revealed the role of the
activation of the NOD-like receptor protein 3 (NLRP3) inflammasome in
the pathogenesis of many metabolic diseases, including diabetes and its
complications [[156]49].
Furthermore, the differential networks in the Aorta and Heart tissues
were enriched for the TNF signalling pathway; see Tables [157]6 and
[158]7. Lamki et al. [[159]50] reported that tumor necrosis factor
(TNF) is a central mediator of a broad range of biological activities,
from cell proliferation, cell death, and differentiation to induction
of inflammation and immune modulation. TNF mediates the inflammatory
response and regulates immune function. Inappropriate production of TNF
or sustained activation of TNF signalling has been implicated in the
pathogenesis of a broad spectrum of human diseases, including diabetes,
cancer, osteoporosis, allograft rejection, and autoimmune diseases such
as multiple sclerosis, rheumatoid arthritis, and inflammatory bowel
diseases [[160]50, [161]51].
In contrast, the differential networks in Liver tissue were enriched
for the chemokine signalling pathway, which promotes changes in
cellular morphology [[162]52], and the insulin signalling pathway,
which accounts for selective insulin resistance [[163]53]; see Table
[164]6.
In addition, the differential network in the Aorta tissue was enriched
for the PI3K-Akt signalling pathway; see Table [165]7.
Phosphatidylinositol 3-kinases (PI3Ks) are crucial coordinators of
intracellular signalling in response to extracellular stimuli. The
hyperactivation of PI3K signalling cascades is one of the most common
events in human cancers. The high frequency of PI3K pathway alterations
in cancer has led to a surge in the development of PI3K inhibitors
[[166]54]. For atherosclerosis, we found that the differential networks
in Artery Coronary and Artery Tibial tissues were enriched for the PPAR
signalling pathway, whose activation has been linked to the
relationship between metabolic syndromes and cancer; see Table [167]6.
Conclusion
Differential network analysis (DINA) may help in understanding the
intricate interactions within biological systems, especially regarding
specific conditions or phenotypes. This study uses DINA to explore
the differential molecular interactions related to diabetes mellitus,
taking into account age- and gender-specific variations. Differential
networks are critical since they enable the detection and visualisation
of variations in gene and protein interactions between males and
females. This approach helps to identify subtle biological
differences that conventional analytical methods could miss.
The study emphasizes non-parametric methods for DINA because, rather
than assuming normally distributed gene expression data, they
accommodate the complex nature of biological data and thereby enhance
the robustness and applicability of the findings.
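To make the non-parametric idea concrete, the sketch below scores a
gene pair by the change in rank correlation between two groups of
samples. It is an illustration only and not the Conga implementation:
the use of Spearman correlation, the threshold of 0.5, the gene names,
and the expression values are all assumptions, and the sketch covers
only the correlation component, not changes in the expression levels
of individual genes.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def differential_edges(group_a, group_b, threshold=0.5):
    """Return (gene1, gene2, delta) for pairs whose rank correlation
    differs between the two groups by at least `threshold`."""
    genes = sorted(group_a)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            delta = (spearman(group_a[g], group_a[h])
                     - spearman(group_b[g], group_b[h]))
            if abs(delta) >= threshold:
                edges.append((g, h, delta))
    return edges


# Hypothetical count data for two genes in two groups of four samples.
males = {"GENE_A": [1, 5, 20, 100], "GENE_B": [2, 3, 50, 51]}
females = {"GENE_A": [1, 5, 20, 100], "GENE_B": [51, 50, 3, 2]}
print(differential_edges(males, females))  # [('GENE_A', 'GENE_B', 2.0)]
```

Because only the ordering of the values enters the score, the result is
unchanged by monotone transformations of the counts, which is the
practical benefit of not assuming normality.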
The study applies this method to liver tissue gene expression data
related to diabetes, identifying distinct networks that may be involved
in sex-specific disease mechanisms. These findings can provide insight
into why certain diseases exhibit different manifestations in men and
women, ultimately leading to more targeted approaches to treatment and
management. Identifying gender-specific differential networks helps to
understand the molecular basis of diabetes and its variation with sex,
providing potential pathways for therapeutic intervention and a deeper
understanding of the etiology of the disease.
This study demonstrates the effectiveness of the differential network
approach in discriminating between male and female biological samples.
The method effectively identifies and visualizes differences in
molecular interactions related to diabetes in liver tissue between the
two sexes. Such discrimination is vital for personalized medicine,
leading to more precise diagnostic and therapeutic strategies. By
effectively highlighting the differences in gene expression and
interactions, the study supports the potential of DINA in improving our
understanding of sex-specific traits in diseases, which is crucial for
advancing gender-specific medicine.
The article discusses the use of differential networks in biological
research, especially in complex diseases such as diabetes. Integrating
non-parametric methods improves the analysis by accounting for the
inherent complexities and variations in biological data often
overlooked in parametric approaches. This research contributes
significantly to our understanding of diabetes and sets a precedent for
future studies exploring other complex diseases with potential
variations in expression and interaction patterns across different
groups.
Acknowledgements