Abstract

Background

The rewiring of molecular interactions under different conditions leads to distinct phenotypic outcomes. Differential network analysis (DINA) is dedicated to exploring these rewirings within gene and protein networks. Leveraging statistical learning and graph theory, DINA algorithms examine alterations in interaction patterns derived from experimental data.

Results

We introduce a novel approach to differential network analysis that incorporates differential gene expression based on sex and gender attributes. We hypothesize that gene expression can be accurately represented through non-Gaussian processes. Our methodology quantifies changes in non-parametric correlations among gene pairs and in the expression levels of individual genes.

Conclusions

Applying our method to public expression datasets concerning diabetes mellitus and atherosclerosis in liver tissue, we identify gender-specific differential networks. The results underscore the biological relevance of our approach in uncovering meaningful molecular distinctions.

Keywords: Differential network, DINA, Differential network analysis, Atherosclerosis, Diabetes

Introduction

The emergence of high-throughput technologies in genomics, proteomics, and non-coding RNA studies has revolutionized our understanding of how variations in the abundance of these biological molecules correlate with diseases [1]. This deluge of data has spurred the creation of analytical techniques that adopt a system-level approach through the lens of network science. These techniques employ networks to depict the complex interactions between biological molecules under specific conditions, as inferred from empirical data [2]. A particularly compelling use of network theory is the comparison of biological networks under different conditions, such as contrasting a disease state with a healthy one.
Differential network analysis (DINA) has gained recognition for its ability to delineate the differences between two states by encapsulating them within a single differential network that highlights how they differ [3–6]. DINA has been applied to contrast various experimental conditions or phenotypes, with recent studies underscoring the influence of factors such as the age and sex of patients on drug response, disease progression, and comorbidity prevalence in chronic diseases [7–9]. Empirical findings have demonstrated notable disparities in disease incidence and progression based on sex and age, such as the heightened susceptibility of older diabetic patients to comorbidities and the observed sex-based differences in COVID-19 mortality rates [10–12]. This underscores the need for advanced algorithms to unravel the molecular mechanisms driving these age- and sex-dependent disparities.

DINA algorithms are designed to pinpoint changes in network structures by identifying association measures that differ between two biological states, $C_1$ and $C_2$. When presented with two distinct biological conditions, represented by two networks of molecular interactions, DINA algorithms aim to uncover the network rewiring that underpins the mechanistic differences between these states. Starting from gene expression datasets for the two conditions, DINA algorithms construct one network per condition, $N_1$ and $N_2$. These networks feature a node for each gene and weighted edges that denote the strength and nature of associations or causal relationships between genes. A differential network $N_d$ is then derived to represent the variation in associations across conditions, a technique previously applied to investigate disease-related alterations [8, 13].
Traditional methods often assume that gene expression data adhere to specific parametric distributions, such as Gaussian or Poisson distributions [1, 14–16]. However, the count-based nature of Next-Generation Sequencing (NGS) data challenges these assumptions, motivating non-parametric approaches to DINA. We introduce a novel DINA methodology that identifies differential edges between networks and integrates differential gene expression, taking into account sex-based differences [12]. This approach employs a multivariate count data model for gene expression levels and constructs a conditional dependence graph using pairwise Markov random fields [17]. Unlike traditional methods, it does not presuppose a parametric distribution for gene expression data. Our proposed DINA algorithm begins by constructing two condition-specific graphs, from which a final differential graph is derived. This graph is then pruned to emphasize edges related to genes exhibiting differential expression. Our DINA method facilitates the identification of differential networks while accounting for gender differences, thereby advancing our understanding of the molecular basis of disease in the context of sex and age.

Relevance of DINA

Differential networks are used to compare two or more networks based on changes in connectivity or interactions between nodes under different conditions or across different datasets. They are commonly applied to biological networks, such as gene expression or protein-protein interaction networks, to identify how interactions change in response to disease, treatment, or other stimuli [18–26]. For instance, DINA may be applied to expression networks in healthy vs. diseased states. Let us imagine that Network 1 represents the healthy state.
In the healthy state, a set of genes (nodes) is connected based on co-expression (edges). For example, if Gene A and Gene B are highly co-expressed, there is an edge between them. This network shows how genes typically interact in a healthy individual. Let us imagine a Network 2 that models a diseased state (such as cancer), in which the expression levels of genes may change and, therefore, the interactions between them may be altered. Some gene-gene interactions may be lost, and new interactions may emerge. The differential network is created by taking the difference between the two networks (healthy vs. diseased). This network highlights both lost and novel connections: interactions present in the healthy network but missing in the diseased network, and interactions that appear in the diseased state but were absent in the healthy state. For example, in the healthy state Gene A interacts with Gene B and Gene C, whereas in the diseased state Gene A no longer interacts with Gene B but interacts with Gene D instead. The differential network would show a lost edge between Gene A and Gene B and a new edge between Gene A and Gene D. This type of analysis helps to identify key genes and pathways that may be responsible for the diseased condition or that could be potential therapeutic targets (see Fig. 1).

Fig. 1. Toy example of differential network

Examples of the relevance of DINA are reported in many works. For instance, Ha et al. [27] describe differential networks between two different subtypes of glioblastoma estimated from genomic data. Basha et al. [28] introduce an extensive differential network analysis of multiple human tissue interactomes. As a result, they are able to highlight differences in processes between tissues.
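The toy example with Genes A to D can be written down in a few lines. The following is a minimal sketch only, not the method proposed in this paper: it treats each network as a set of unweighted, undirected edges, and it assumes that the A-C interaction is unchanged in the diseased state, which the example does not state explicitly. The function names `edge` and `differential_edges` are hypothetical.

```python
def edge(gene_a, gene_b):
    """Undirected edge: a frozenset makes (A, B) and (B, A) the same key."""
    return frozenset((gene_a, gene_b))


def differential_edges(network_1, network_2):
    """Return (lost, gained) edge sets when moving from network_1 to network_2."""
    lost = network_1 - network_2      # present in network_1 only
    gained = network_2 - network_1    # present in network_2 only
    return lost, gained


# Healthy state: Gene A interacts with Gene B and Gene C.
healthy = {edge("A", "B"), edge("A", "C")}
# Diseased state: the A-B edge disappears and an A-D edge appears.
# Assumption: the A-C edge is kept, since the text does not mention it changing.
diseased = {edge("A", "C"), edge("A", "D")}

lost, gained = differential_edges(healthy, diseased)
print("lost:  ", [tuple(sorted(e)) for e in lost])
print("gained:", [tuple(sorted(e)) for e in gained])
```

Real differential networks are usually built from weighted associations rather than from the presence or absence of edges, so this set difference only illustrates the idea of lost and novel connections.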
Related work

DINA's application in distinguishing differentially expressed genes among sample groups is valuable, especially when contrasting individuals with specific diseases against healthy controls. This methodological approach is important in molecular biology and bioinformatics for pinpointing genes with variable expression levels between diseased and healthy sample groups. Central to DINA-based research are algorithms designed to detect alterations in network structures under varying conditions [29]. These algorithms have been pivotal in biology for mapping the transition from healthy to diseased states within the same biological framework [2]. We focus on networks that maintain a constant node set yet exhibit variable edge sets. Specifically, given two distinct conditions $C_1$ and $C_2$, represented by graphs $G_1(V,E_1)$ and $G_2(V,E_2)$, the objective of DINA is to pinpoint the modifications of the network.

In biological systems analysis, nodes represent directly quantifiable entities, whereas edges must be derived by observing a sequence of temporal data. For instance, gene networks originating from microarray experiments require edges to be inferred from data through statistical graphical models [30–32]. In these models, each node of the graph $G=(V,E)$ is one of the measurable random variables $X_1,\ldots,X_M$, and the edges quantify a pre-specified notion of association between pairs of these variables. In this setting, the focus is predominantly on undirected graphs, where the direction or causality of the associations is not of interest. Among the different association metrics, partial correlation is one of the most common, as it measures conditional dependencies. Probabilistic graphical models allow conditional dependency-based graph estimation.
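To illustrate why partial correlation captures conditional dependence, the sketch below computes the first-order partial correlation of two variables given a third, using the standard formula $r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1-r_{xz}^2)(1-r_{yz}^2)}$. It is an illustration with made-up numbers and hypothetical function names, not the estimator used in any of the cited works. In the toy data, `x` and `y` are both driven by `z`, so their marginal correlation is high while their partial correlation given `z` is close to zero.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y after conditioning on z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data: z drives both x and y; the added perturbations are unrelated.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [1, -1, 1, -1, 1, -1, 1, -1])]
y = [zi + e for zi, e in zip(z, [1, 1, -1, -1, 1, 1, -1, -1])]

print("marginal correlation of x and y:", round(pearson(x, y), 3))
print("partial correlation given z:    ", round(partial_correlation(x, y, z), 3))
```

With more than three variables, partial correlations are usually obtained from the inverse covariance (precision) matrix instead of this pairwise formula.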
Differential associations within these models are examined by evaluating the variation in partial correlations across experimental conditions, using specific statistical tests to measure the changes in correlations among entities. Additionally, changes in gene expression levels are assessed using the classical Student's t-test [33]. Subsequently, these statistical evaluations are combined into a single optimization model that aims to elucidate the hierarchical network structures. However, certain assumptions inherent to previous models, such as a Gaussian data distribution, may not hold across all experimental conditions, necessitating non-parametric methods. While parametric methods are computationally efficient and simpler to implement, they demand adherence to specific distributional prerequisites, and violating these can skew or invalidate the results. Several studies have opted for a nonparanormal (Gaussian copula) approach [34], employing rank-based correlation matrices such as Spearman correlation or Kendall's $\tau$. Other variations are available too [29]. However, nonparanormal models are primarily suitable only for continuous data, which limits their applicability in other settings. These models have found applications in analyzing brain data and sequencing counts, circumventing the temporal limitations of non-parametric methods. Efficient Bayesian models have emerged [29], calculating edge probabilities by inferring their likelihood. Some methods adopt diverse heuristics for probability inference, challenging direct data derivation, as highlighted in [17], where the proposed method is reported to surpass other contemporary techniques. Non-parametric methods, recognized for their minimal assumptions regarding data distribution, leverage data-driven approaches to evaluate network connectivity differences between conditions.
Their flexibility and robustness are advantageous in handling complex, non-linear node relationships within networks, albeit at the cost of computational intensity and reduced interpretability. The choice between parametric and non-parametric approaches for differential network analysis hinges on data characteristics, underlying assumptions, and the research question. Researchers frequently use sensitivity analysis and cross-validation of results to ensure that their findings are robust and reliable. Integrating insights from both families of methods can yield a more detailed understanding of the differential network architecture.

Materials and methods

Non parametric differential network analysis algorithm

Let us consider two expression datasets encoded in two matrices of size $N_j \times M$ ($N_j$ samples, $M$ genes) for $j=1,2$, denoted as $X_1$ and $X_2$, representing two biological conditions $C_1$ and $C_2$. Each row of $X_j$ stores the expression values of the $M$ genes for one sample. Therefore, $X^{c}_{i,j}$ ($c=1,2$; $i=1,\ldots,N_c$; $j=1,\ldots,M$) denotes the expression of the $j$-th gene in the $i$-th sample under condition $c$. Note that the sample sizes under the two conditions may be different. We model these data under a Bayesian non-parametric framework. Each column, representing a gene, is encoded as a network node, and we compute a conditional independence-based graphical relation among the nodes. Let the $M \times M$ matrices $P_1$ and $P_2$ represent the conditional independence relations among the $M$ genes [17]. We then define the differential relation between the two conditions based on the posterior samples of $P_1$ and $P_2$.
Following the pairwise Markov random field (MRF) model for counts from [17], we consider the following joint probability mass function for $M$-dimensional count-valued data $X$,

$$\Pr(X_1,\ldots,X_M) \propto \exp\left\{\sum_{j=1}^{M}\left[\alpha_j X_j - \log(X_j!)\right] - \sum_{\ell=2}^{M}\sum_{j<\ell}\beta_{j\ell}\,F(X_j)\,F(X_\ell)\right\},$$

where $F(\cdot)$ is a monotone increasing bounded function with support $[0,\infty)$. As in [17], we let $F(\cdot)=(\tan^{-1}(\cdot))^{\theta}$ for some positive $\theta\in\mathbb{R}^{+}$. Since the data are positive-valued, the range of $F(X)$ is $\left(0,\left(\frac{\pi}{2}\right)^{\theta}\right)$, and the exponent $\theta$ is specified as a minimizer of the loss quantifying the difference in covariance between $F(X)$ and $X$, following [17]. For detailed descriptions of the method, readers are encouraged to check [17]. Under this model, if $\beta_{j\ell}=0$, then $X_j$ and $X_\ell$ are conditionally independent, i.e. $P(X_j,X_\ell\mid X_{-(j,\ell)})=P(X_j\mid X_{-(j,\ell)})\,P(X_\ell\mid X_{-(j,\ell)})$, where $X_{-(j,\ell)}$ stands for all the variables excluding $X_j$ and $X_\ell$. Our estimated graphical relation thus relies on the $\beta_{j\ell}$'s. We take a Bayesian route for inference and use the same priors as in [17]. Specifically, for the $\beta_{j\ell}$'s, we set simple independent and identically distributed mean-zero Gaussian priors. The parameters $\lambda_j$ are treated as random effects and given distributions $D_j$. The distribution $D_j$ governs the over-dispersion and the shape of the marginal count distribution for the $j$-th node. To allow these marginals to be flexibly determined by the data, we take a Bayesian nonparametric approach using Dirichlet process (DP) priors, where $D_j\sim \mathrm{DP}(M_j D_0)$, with $D_0$ as a Gamma base measure and $M_j$ as a precision parameter.
The precision parameter $M_j$ follows a Gamma distribution, $M_j\sim \mathrm{Ga}(c,d)$, allowing for greater adaptivity to the data. We run MCMC to approximate the posterior and generate posterior samples of the $\beta_{j\ell}$'s. The model has a structure similar to the Poisson auto-model [35]. When the $\beta_{j\ell}$'s are zero, the model leads to Poisson-type marginals with sample-specific means, as the $\lambda_j$'s are modeled as random effects. Thus, they also need to be sampled for all the samples, which imposes a moderately high computational cost. The general code to fit this model is available on the second author's GitHub page, https://github.com/royarkaprava/CONGA.

After running the Markov chain Monte Carlo (MCMC) sampling for the above model under the two conditions, we obtain the matrices $P_1$ and $P_2$, with $(j,k)$-th entries $\beta^{(1)}_{j,k}$ and $\beta^{(2)}_{j,k}$, respectively. Consequently, a differential network is defined through the difference $\beta^{(1)}_{j,k}-\beta^{(2)}_{j,k}$ for each edge $(j,k)$, where $\beta^{(1)}_{j,k}$ and $\beta^{(2)}_{j,k}$ are the coefficients under conditions 1 and 2. From the MCMC samples, we can obtain the posterior mean of these differences as $\hat{\beta}^{(1)}_{j,k}-\hat{\beta}^{(2)}_{j,k}$ using the individual posterior means. Alternatively, we can compute other posterior summaries such as $P(|\beta^{(1)}_{j,k}-\beta^{(2)}_{j,k}|>c\mid D_1,D_2)$, which is the posterior probability that $|\beta^{(1)}_{j,k}-\beta^{(2)}_{j,k}|$ is greater than some pre-specified cutoff $c$ given the two datasets, denoted as $D_1$ and $D_2$. We take the second approach to define our differential networks. To choose $c$ adequately, we run a sensitivity test. Specifically, we vary $c$ over a range and compute $f_{jk}(c)=P(|\beta^{(1)}_{j,k}-\beta^{(2)}_{j,k}|>c\mid D_1,D_2)$ for each choice.
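As a rough illustration of this posterior summary, the sketch below estimates the exceedance probability for each edge as the fraction of posterior draws in which the absolute difference of the two coefficients exceeds $c$. It also computes, for consecutive cutoffs on a grid, the summed squared change in these probabilities, which is one way to look for a stable region of $c$. This is not the CONGA code: the draws are made-up numbers, the function names are hypothetical, and pairing draw $t$ of one chain with draw $t$ of the other assumes the two chains were run independently on the two datasets and have equal length.

```python
def exceedance_probability(draws_1, draws_2, cutoff):
    """Monte Carlo estimate of P(|b1 - b2| > cutoff) from paired posterior draws."""
    if len(draws_1) != len(draws_2):
        raise ValueError("both chains must provide the same number of draws")
    hits = sum(1 for b1, b2 in zip(draws_1, draws_2) if abs(b1 - b2) > cutoff)
    return hits / len(draws_1)


def sensitivity_curve(samples_1, samples_2, cutoffs):
    """Summed squared change in exceedance probabilities between consecutive cutoffs."""
    probs = [
        {e: exceedance_probability(samples_1[e], samples_2[e], c) for e in samples_1}
        for c in cutoffs
    ]
    return [
        sum((probs[i][e] - probs[i + 1][e]) ** 2 for e in samples_1)
        for i in range(len(cutoffs) - 1)
    ]


# Made-up posterior draws of the edge coefficients under conditions 1 and 2.
samples_1 = {("g1", "g2"): [0.9, 1.1, 1.0, 0.8], ("g1", "g3"): [0.1, -0.1, 0.0, 0.05]}
samples_2 = {("g1", "g2"): [0.1, 0.0, -0.1, 0.2], ("g1", "g3"): [0.0, 0.1, -0.05, 0.0]}

cutoffs = [0.3, 0.5, 0.7]
for c in cutoffs:
    for e in samples_1:
        print(c, e, exceedance_probability(samples_1[e], samples_2[e], c))
print("squared change between consecutive cutoffs:",
      sensitivity_curve(samples_1, samples_2, cutoffs))
```

In practice the chains would contain thousands of draws after burn-in, and an edge would be kept in the differential network when its exceedance probability is high at the selected cutoff. The threshold on that probability is a further choice not fixed by this sketch.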
Then, we monitor $\sum_{j,k}\left(f_{jk}(c_1)-f_{jk}(c_2)\right)^2$ for nearby cutoffs $c_1$ and $c_2$, and select the smallest $c$ around which this quantity is stable, in order to assess the sensitivity of the estimate to the cutoff.

Databases

The T2DiACoD database, described by Rani et al. (2017) [36], was used to collate a comprehensive list of genes linked to comorbid conditions associated with Type 2 Diabetes Mellitus (T2DM). Gene expression datasets were sourced from the GTEx database [37]. T2DiACoD is a curated database built through systematic literature review. It catalogues genes and non-coding RNAs relevant to T2DM and its frequent comorbidities, including atherosclerosis, nephropathy, diabetic retinopathy, and cardiovascular disorders. This repository, enriched through data integration from existing databases, contains 650 genes and 34 microRNAs related to these conditions.

The Genotype-Tissue Expression (GTEx) project is a large open-access platform that distributes genomic data collected from many individuals. The repository encompasses a broad spectrum of genomic data, from sequencing to methylation analyses. GTEx offers detailed metadata for each sample, covering aspects such as tissue type, sex, and age (categorized into six distinct groups). This makes GTEx a valuable asset for investigating the interplay between age and tissue-specific gene expression. As of February 1st, the GTEx database contains 17,382 samples across 54 tissue types from 948 donors, all accessible via the GTEx web portal. This portal enables users to efficiently search for and visualize data [12, 38, 39]. Furthermore, the data can be downloaded for in-depth analysis with custom scripts.
In our research, we used data from various tissues, dividing samples into two categories based on sex. Our study focused on the genes that play a pivotal role in the development of T2DM-related complications, examining nine tissues (including blood, brain, adipose, amygdala, aorta, colon, coronary, liver, and lung tissues). We selected an equal number of samples from each tissue type, maintaining a balanced representation of age groups and uniform sample sizes across tissues. This approach yielded an equitable distribution of age groups within each sex-based category for each tissue analyzed, which supports the reliability of our findings. Focusing on atherosclerosis, we obtained a list of 115 genes related to this disease from the T2DiACoD database, while for diabetes we obtained a list of 650 genes. We retrieved expression data using GTExVisualizer [8, 40], and metadata related to the tissue, sex, and age of each sample were extracted for the genes identified in the T2DiACoD database in the previous step. Expression data are measured in Transcripts per Million (TPM). This data integration and gene enrichment process was performed using an ad hoc script that has been integrated into GTExVisualizer. We performed the analysis at the tissue level: for each tissue considered, we split the data into male and female samples and randomly selected the same number of samples from each group. We first generated the differential networks (DNs) using non-parametric methods, and then evaluated their biological significance by means of enrichment methods.

Results

To show the effectiveness of our method, we present two case studies on two chronic diseases, highlighting differential mechanisms related to sex differences. We evaluated differential networks between men and women, focusing on genes related to diabetes and atherosclerosis as reported in the T2DiACoD database [36]. The characteristics of all the DNs are reported in Tables 1 and 2.
Then, to evaluate the biological significance of the resulting DNs, we performed a pathway enrichment analysis. Functional enrichment analysis, also known as pathway enrichment analysis (PEA), is a bioinformatics technique used to identify biological pathways that are significantly over-represented in a given list of genes compared to what would be expected by chance. These biological functions are stored in bioinformatics databases such as KEGG, and statistical methods such as Fisher's exact test, which computes the p-value of the enrichment, are used to determine the most enriched pathways. PEA tools typically require a gene list as input; however, some tools accept genomic regions instead of gene lists and first map these regions to their associated genes. This process, known as genomic regions enrichment analysis, is helpful for uncovering biological pathways related to specific chromosome regions.

Table 1. Characteristics of the differential networks related to diabetes
DN | Nodes | Edges
Liver | 128 | 3340
Aorta | 237 | 2116
Heart | 238 | 4316

Table 2. Characteristics of the differential networks related to atherosclerosis
DN | Nodes | Edges
Adipose Visceral | 12 | 32
Artery Coronary | 11 | 30
Artery Tibial | 11 | 12
Blood | 2 | 1

Thus, for each network, we performed a pathway enrichment analysis based on the KEGG pathway database [41], available through the STRING enrichment app [42] of the Cytoscape software [43].

Diabetes related differential networks

Liver Tissue

For liver tissue in diabetes, we obtained a DN with 128 nodes and 3340 edges (see Fig. 2). The enrichment analysis highlighted the presence of some enriched pathways between the sexes (see Table 3). Figure 3 depicts the subnetworks of the differential network for Liver tissue in Diabetes, where we highlighted the genes involved in the resulting pathways in different colours.

Fig. 2.
The figure shows the differential network between men and women, focusing on genes in liver tissue related to diabetes

Table 3. Top Enriched Pathways of liver tissue in Diabetes
Description | Term name | P-value with Fisher
Cytokine-cytokine receptor interaction | hsa04060 | 5.15e-8
Viral protein interaction with cytokine and cytokine receptor | hsa04061 | 3.48e-7
Chemokine signaling pathway | hsa04062 | 3.47e-6
Inflammatory bowel disease | hsa05321 | 3.47e-6
Toll-like receptor signaling pathway | hsa04620 | 5.97e-5
Malaria | hsa05144 | 2.6e-4
Rheumatoid arthritis | hsa05323 | 2.6e-4
Renin-angiotensin system | hsa04614 | 3.5e-4
Th17 cell differentiation | hsa04659 | 4.1e-4
Chagas disease | hsa05142 | 4.1e-4

Fig. 3. The figure shows selected subnetworks of the differential network for Liver tissue in Diabetes. In detail, starting from the differential network, we highlighted the genes involved in the pathways resulting from the enrichment analysis in different colours, whereas the genes not involved in pathways are shown in grey. In particular, the genes involved in Cytokine-cytokine receptor interaction are coloured in purple, the genes involved in Inflammatory bowel disease are coloured in green, the genes involved in Viral protein interaction with cytokine and cytokine receptor are coloured in pink, and the genes involved in the Chemokine signaling pathway are coloured in blue

Aorta Tissue

For Aorta tissue in Diabetes, we obtained a DN with 237 nodes and 2116 edges (see Fig. 4). The enrichment analysis highlighted the presence of some enriched pathways between the sexes (see Table 4). Figure 5 depicts the subnetworks of the differential network for Aorta tissue in Diabetes, where we highlighted the genes involved in the resulting pathways in different colours.

Fig. 4.
The figure shows the differential network between men and women, focusing on genes in aorta tissue related to diabetes

Table 4. Top Enriched Pathways of aorta tissue in Diabetes
Description | Term name | P-value with Fisher
Toll-like receptor signaling pathway | hsa04620 | 7.05e-19
NOD-like receptor signaling pathway | hsa04621 | 7.57e-16
Shigellosis | hsa05131 | 7.57e-16
Hepatitis B | hsa05161 | 1.08e-15
Salmonella infection | hsa05132 | 1.5e-15
TNF signaling pathway | hsa04668 | 4.27e-15
NF-kappa B signaling pathway | hsa04064 | 1.31e-14
Toxoplasmosis | hsa05145 | 1.55e-14
PD-L1 expression and PD-1 checkpoint pathway in cancer | hsa05235 | 1.82e-14
PI3K-Akt signaling pathway | hsa04151 | 6.36e-14

Fig. 5. The figure shows a selected subnetwork of the differential network for Aorta tissue in Diabetes. In detail, starting from the differential network, we highlighted the genes involved in the pathways resulting from the enrichment analysis in different colours, whereas the genes not involved in pathways are shown in grey. In particular, the genes involved in the Toll-like receptor signaling pathway are coloured in blue, the genes involved in the NOD-like receptor signaling pathway are coloured in yellow, the genes involved in Hepatitis B are coloured in pink, and the genes involved in Shigellosis are coloured in purple

Heart Tissue

For Heart tissue in Diabetes, we obtained a DN with 238 nodes and 4316 edges (see Fig. 6). The enrichment analysis highlighted the presence of some enriched pathways between the sexes (see Table 5). Figure 7 depicts the subnetworks of the differential network for Heart tissue in Diabetes, where we highlighted the genes involved in the resulting pathways in different colours.

Fig. 6.
The figure shows the differential network between men and women, focusing on genes in heart tissue related to diabetes

Table 5. Top Enriched Pathways of heart tissue in Diabetes
Description | Term name | P-value with Fisher
NOD-like receptor signaling pathway | hsa04621 | 2.33e-18
Toll-like receptor signaling pathway | hsa04620 | 8.48e-18
TNF signaling pathway | hsa04668 | 6.15e-16
Cytokine-cytokine receptor interaction | hsa04060 | 4.69e-14
NF-kappa B signaling pathway | hsa04064 | 2.34e-13
Salmonella infection | hsa05132 | 3.34e-13
Inflammatory bowel disease | hsa05321 | 3.79e-13
Shigellosis | hsa05131 | 5.73e-13
PD-L1 expression and PD-1 checkpoint pathway in cancer | hsa05235 | 3.01e-12
Viral protein interaction with cytokine and cytokine receptor | hsa04061 | 1.04e-11

Fig. 7. The figure shows a selected subnetwork of the differential network for Heart tissue in Diabetes. In detail, starting from the differential network, we highlighted the genes involved in the pathways resulting from the enrichment analysis in different colours, whereas the genes not involved in pathways are shown in grey. In particular, the genes involved in the Toll-like receptor signaling pathway are coloured in blue, the genes involved in the NOD-like receptor signaling pathway are coloured in red, the genes involved in Cytokine-cytokine receptor interaction are coloured in pink, and the genes involved in Inflammatory bowel disease are coloured in yellow

Figure 8 shows a selected subnetwork of the differential network in Liver, Heart, and Aorta tissue.

Fig. 8. Differential network related to diabetes expressed in Liver, Heart, Aorta tissues.
Genes expressed in different tissues are reported in green, whereas genes specific to one tissue are reported in blue

Finally, we applied the Markov Clustering Algorithm to perform a cluster analysis on the differential networks. Figures 9, 10, and 11 depict the clustered differential networks for Liver, Aorta, and Heart tissues.

Fig. 9. Clustered Differential subnetwork for Liver tissue in Diabetes

Fig. 10. Clustered Differential subnetwork for Aorta tissue in Diabetes

Fig. 11. Clustered Differential subnetwork for Heart tissue in Diabetes

Atherosclerosis related differential networks

Adipose Visceral Tissue

For Adipose Visceral tissue in Atherosclerosis, we obtained a DN with 12 nodes and 32 edges (see Fig. 12). The enrichment analysis did not report any enriched pathways between the sexes.

Fig. 12. Differential network for Adipose Visceral tissue in Atherosclerosis

Artery Coronary Tissue

For Artery Coronary tissue in Atherosclerosis, we identified a network of 11 nodes and 30 edges (see Fig. 13). The enrichment analysis highlighted the presence of some enriched pathways between the sexes (see Table 6). Figure 14 depicts the subnetworks of the differential network for Artery Coronary tissue in Atherosclerosis, where we highlighted the genes involved in the resulting pathways in different colours.

Fig. 13. The figure shows the differential network between men and women, focusing on genes in artery coronary tissue related to atherosclerosis

Table 6. Enriched Pathway of Artery Coronary and Artery Tibial tissue in Atherosclerosis
Description | Term name | P-value with Fisher
PPAR signaling pathway of Artery Coronary | hsa03320 | 1.48e-7
PPAR signaling pathway of Artery Tibial | hsa03320 | 2.59e-5

Fig. 14.
The figure shows a selected subnetwork of the differential network for Artery Coronary tissue in Atherosclerosis. In detail, starting from the differential network, we highlighted the genes involved in the pathways resulting from the enrichment analysis in yellow, whereas the genes not involved in pathways are shown in grey. In particular, the genes involved in the PPAR signaling pathway are reported in yellow

Artery Tibial Tissue

For Artery Tibial tissue in Atherosclerosis, we identified a network of 11 nodes and 12 edges (see Fig. 15). The enrichment analysis highlighted the presence of some enriched pathways between the sexes (see Table 6). Figure 16 depicts the subnetwork of the differential network for Artery Tibial tissue in Atherosclerosis, where we highlighted the genes involved in the resulting pathways in different colours.

Fig. 15. The figure shows the differential network between men and women, focusing on genes in artery tibial tissue related to atherosclerosis

Fig. 16. A selected subnetwork of the differential network for Artery Tibial tissue in Atherosclerosis. In detail, starting from the differential network, we highlighted the genes involved in the pathways resulting from the enrichment analysis in yellow, whereas the genes not involved in pathways are shown in grey. In particular, the genes involved in the PPAR-$\gamma$ signaling pathway are reported in yellow

Blood Tissue

For Blood tissue in Atherosclerosis, we identified a network of 2 nodes and 1 edge (see Fig. 17). The enrichment analysis did not report any enriched pathways between the sexes.

Fig. 17. Differential network for Blood tissue in Atherosclerosis

We summarize in Fig. 18 a selected subnetwork of the differential network in Adipose Visceral, Artery Coronary, Aorta, Artery Tibial, and Blood tissues.

Fig. 18.
Differential network related to atherosclerosis expressed in Adipose Visceral, Artery Coronary, and Artery Tibial tissues. The genes expressed in different tissues are reported in green, whereas the genes specific to a single tissue are reported in blue

Finally, we applied the Markov Clustering Algorithm to perform a cluster analysis on the differential networks. Figures 19, 20, and 21 depict the clustered differential networks for Adipose Visceral, Artery Coronary, and Artery Tibial tissues.

Fig. 19. Clustered differential network for Adipose Visceral tissue in Atherosclerosis

Fig. 20. Clustered differential network for Artery Coronary tissue in Atherosclerosis

Fig. 21. Clustered differential network for Artery Tibial tissue in Atherosclerosis

Comparison with baseline methods

To evaluate the effectiveness of our method with respect to parametric ones, we compared Conga with the R package iDINGO [44]. iDINGO is able to infer group-specific dependencies and build differential networks. We built the differential networks for the diabetes and atherosclerosis datasets, running iDINGO with default parameters. However, iDINGO built the differential networks according to an NP-complete approach, including the whole set of edges in the construction of the network. We report the characteristics of the differential networks related to diabetes and atherosclerosis in Tables 7 and 8.

Table 7. Characteristics of the differential networks related to diabetes built with iDINGO

DN | Nodes | Edges
Liver | 128 | 8127
Aorta | 238 | 28679
Heart | 238 | 28679

Table 8.
Characteristics of the differential networks related to atherosclerosis built with iDINGO

DN | Nodes | Edges
Adipose Visceral | 12 | 45
Artery Coronary | 11 | 45
Artery Tibial | 11 | 45
Blood | 2 | 1

We subsequently performed the pathway enrichment analysis on the DNs and found that, for diabetes, the pathways detected for each tissue were fewer than those detected with our method, whereas no pathway was detected for atherosclerosis. This indicates that our methodology builds networks with a greater biological information content than classical methods. We report the results of the pathway enrichment analysis in Tables 9, 10 and 11.

Table 9. Top enriched pathways of liver tissue in Diabetes related to DNs built with iDINGO

Description | Term name | P-value with Fisher
Cytokine-cytokine receptor interaction | hsa04060 | $2.31 \times 10^{-6}$
Viral protein interaction with cytokine and cytokine receptor | hsa04061 | $2.7 \times 10^{-6}$
Chemokine signaling pathway | hsa04062 | $4.57 \times 10^{-5}$
Inflammatory bowel disease | hsa05321 | $2.31 \times 10^{-6}$
Toll-like receptor signaling pathway | hsa04620 | $3.46 \times 10^{-4}$
Malaria | hsa05144 | $7 \times 10^{-4}$

Table 10. Top enriched pathways of aorta tissue in Diabetes related to DNs built with iDINGO

Description | Term name | P-value with Fisher
Salmonella infection | hsa05132 | $6.12 \times 10^{-14}$
TNF signaling pathway | hsa04668 | $8.55 \times 10^{-14}$
NF-kappa B signaling pathway | hsa04064 | $2.24 \times 10^{-13}$
Toxoplasmosis | hsa05145 | $2.65 \times 10^{-13}$
PD-L1 expression and PD-1 checkpoint pathway in cancer | hsa05235 | $2.68 \times 10^{-13}$
PI3K-Akt signaling pathway | hsa04151 | $3.5 \times 10^{-12}$

Table 11.
Top enriched pathways of heart tissue in Diabetes related to DNs built with iDINGO

Description | Term name | P-value with Fisher
NOD-like receptor signaling pathway | hsa04621 | $2.25 \times 10^{-18}$
Toll-like receptor signaling pathway | hsa04620 | $1.17 \times 10^{-17}$
TNF signaling pathway | hsa04668 | $7.96 \times 10^{-16}$
Cytokine-cytokine receptor interaction | hsa04060 | $6.43 \times 10^{-14}$
NF-kappa B signaling pathway | hsa04064 | $2.91 \times 10^{-13}$

Discussion

In this study, we introduced a non-parametric method for differential network analysis. This approach is particularly beneficial for datasets that contain count data or other non-normally distributed data, which are prevalent in biological research but pose challenges due to their non-normal distribution and discrete nature. By employing differential network analysis, we can examine how the underlying network structures differ between conditions, providing a more comprehensive understanding of biological variations and interactions that are not discernible through traditional methods.

Given the absence of an established standard in this area, a rigorous evaluation of the biological relevance of the networks derived from our differential network analysis was paramount. We analysed the structure of the networks, checking their biological significance and their consistency with known biological pathways and mechanisms. This analysis not only validated the practical utility of our methodology but also provided novel insights into the biological systems under study. By correlating our findings with existing biological knowledge and experimental data, we strengthened the evidence for the biological relevance of our results, further underscoring the value of differential network analysis in uncovering biological patterns and interactions.
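To make the idea of a rank-based differential network concrete, the following is a minimal sketch and not the Conga implementation. It assumes that the association between two genes is measured with Spearman's rank correlation and that an edge is kept when the correlation changes between the two conditions by more than a fixed threshold; the function name, the threshold of 0.5, and the toy count matrices are hypothetical choices made only for illustration. The per-gene expression changes mentioned in the abstract are not modelled here.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_differential_network(expr_a, expr_b, genes, threshold=0.5):
    """Illustrative sketch: edges whose Spearman correlation differs between conditions.

    expr_a, expr_b: (samples x genes) expression matrices for the two conditions.
    Returns a list of (gene_i, gene_j, delta) with |delta| > threshold.
    """
    def spearman_matrix(expr):
        # Spearman's rho is the Pearson correlation of the per-gene ranks
        # (average ranks are used for ties, which are common in count data).
        ranks = rankdata(np.asarray(expr, dtype=float), axis=0)
        return np.corrcoef(ranks, rowvar=False)

    delta = spearman_matrix(expr_a) - spearman_matrix(expr_b)
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            # A constant gene yields NaN, which fails the comparison and is skipped.
            if abs(delta[i, j]) > threshold:
                edges.append((genes[i], genes[j], float(delta[i, j])))
    return edges


# Toy counts (hypothetical): g1 and g2 rise together in condition A
# but move in opposite directions in condition B; g3 behaves the same in both.
genes = ["g1", "g2", "g3"]
expr_a = [[1, 2, 5], [2, 4, 3], [3, 8, 9], [4, 16, 1], [5, 32, 7]]
expr_b = [[1, 32, 5], [2, 16, 3], [3, 8, 9], [4, 4, 1], [5, 2, 7]]

edges = spearman_differential_network(expr_a, expr_b, genes)
for gene_i, gene_j, delta in edges:
    print(gene_i, gene_j, round(delta, 3))
```

In this toy example only the g1-g2 pair is rewired, so it is the only edge retained. A practical analysis would replace the fixed threshold with a statistical test, for instance a permutation test on the correlation difference.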
In the context of diabetes, we found that the differential networks in Liver, Aorta, and Heart tissues are enriched for the Toll-like receptor signaling pathway, as indicated in Tables 3, 4, 7. The toll-like receptor 4 (TLR4) pathway has been associated with various pathophysiological conditions, including cardiovascular diseases (CVDs) and rheumatoid arthritis (RA), underscoring the relevance of our findings in the broader context of human health. Several studies demonstrated that TLR4 activates the expression of pro-inflammatory cytokine genes that play pivotal roles in myocardial inflammation, particularly myocarditis, myocardial infarction, ischemia-reperfusion injury, and heart failure [45–48]. We also found the NOD-like receptor signaling pathway in Aorta and Heart tissues; see Tables 6 and 7. The NOD-like receptor (NLR) family of proteins is a group of pattern recognition receptors (PRRs) known to mediate the initial innate immune response to cellular injury and stress. Several studies reveal the role of the activation of the NOD-like receptor protein 3 (NLRP3) inflammasome in the pathogenesis of many metabolic diseases, including diabetes and its complications [49]. Furthermore, the differential networks in the Aorta and Heart tissues were enriched for the TNF signaling pathway; see Tables 6 and 7. Lamki et al. [50] reported that tumor necrosis factor (TNF) represents a central mediator of a broad range of biological activities, from cell proliferation, cell death, and differentiation to the induction of inflammation and immune modulation. TNF mediates the inflammatory response and regulates immune function.
Inappropriate production of TNF or sustained activation of TNF signaling has been implicated in the pathogenesis of a broad spectrum of human diseases, including diabetes, cancer, osteoporosis, allograft rejection, and autoimmune diseases such as multiple sclerosis, rheumatoid arthritis, and inflammatory bowel diseases [50, 51]. In contrast, the differential network in Liver tissue was enriched for the chemokine signaling pathway, which promotes changes in cellular morphology [52], and the insulin signaling pathway, which accounts for selective insulin resistance [53]; see Table 6. The differential network in the Aorta tissue was also enriched for the PI3K-Akt signaling pathway, see Table 7. Phosphatidylinositol 3-kinases (PI3Ks) are crucial coordinators of intracellular signaling in response to extracellular stimuli. The hyperactivation of PI3K signaling cascades is one of the most common events in human cancers, and the high frequency of PI3K pathway alterations in cancer has led to a surge in the development of PI3K inhibitors [54]. For atherosclerosis, we found that the differential networks in Artery Coronary and Artery Tibial tissues were enriched for the PPAR signaling pathway, whose activation is linked to a correlation between metabolic syndromes and cancer, see Table 6.

Conclusion

Differential network analysis (DINA) may help in understanding the intricate interactions within biological systems, especially regarding specific conditions or phenotypes. This study utilizes DINA to explore the differential molecular interactions related to diabetes mellitus, taking into account age and gender-specific variations. Differential networks are critical since they enable the detection and visualisation of variations in gene and protein interactions between males and females. This approach helps to identify subtle biological discrepancies that conventional analytical methods could miss.
The study emphasizes non-parametric methods for DINA: rather than assuming normally distributed gene expression data, they accommodate the complex nature of biological data and thereby enhance the robustness and applicability of the findings. The study applies this method to liver tissue gene expression data related to diabetes, identifying distinct networks that may be involved in sex-specific disease mechanisms. These findings can provide insight into why certain diseases manifest differently in men and women, ultimately leading to more targeted approaches to treatment and management. Identifying gender-specific differential networks helps to understand the molecular basis of diabetes and its variation with sex, providing potential pathways for therapeutic intervention and a deeper understanding of the etiology of the disease.

This study demonstrates the effectiveness of the differential network approach in discriminating between male and female biological samples. The method identifies and visualizes differences in molecular interactions related to diabetes in liver tissue between the two sexes. Such discrimination is vital for personalized medicine, leading to more precise diagnostic and therapeutic strategies. By highlighting the differences in gene expression and interactions, the study supports the potential of DINA in improving our understanding of sex-specific traits in diseases, which is crucial for advancing gender-specific medicine.

The article discusses the use of differential networks in biological research, especially in complex diseases such as diabetes. Integrating non-parametric methods improves the analysis by accounting for the inherent complexities and variations in biological data that are often overlooked in parametric approaches.
This research contributes significantly to our understanding of diabetes and sets a precedent for future studies exploring other complex diseases with potential variations in expression and interaction patterns across different groups.

Acknowledgements