Graphical abstract (image: fx1.jpg)

Highlights

* DREAMER is a network-based pipeline to explore the mechanism of clinical phenotypes
* Approach uncovers shared mechanisms of adverse drug reactions and disease phenotypes
* DREAMER provides insights for improving drug safety
* Potentially accelerates the process of drug repurposing

Motivation

Adverse drug reactions (ADRs) are a major concern in healthcare and drug development, often resulting in severe clinical outcomes and leading to drug withdrawals. Despite their impact, understanding the complex, multifactorial nature of ADR mechanisms remains challenging, leaving critical gaps in drug safety assessments. An important question is whether network-based approaches, emerging as a promising frontier in systems pharmacology, can help uncover these mechanisms by integrating diverse biological and pharmacological data. We explored this question and developed DREAMER, a network-based framework designed to elucidate the molecular pathways underlying ADRs, with the aim of enhancing drug safety and identifying new opportunities for drug repurposing.

Firoozbakht et al. introduce DREAMER, a network-based framework that elucidates shared molecular mechanisms underlying adverse drug reactions and disease phenotypes. By integrating diverse biological data, DREAMER identifies proteins that reveal therapeutic insights, advancing drug safety and repurposing opportunities across various clinical contexts.

Introduction

Adverse drug reactions (ADRs) are important concerns in pharmacology and healthcare.
They are a leading cause of mortality and drug withdrawals.[41]^1 Gaining a deeper understanding of ADRs is essential for enhancing drug safety profiles and making informed healthcare decisions as they can reveal the complexity of in vivo human phenotypic responses.[42]^2^,[43]^3 By understanding the underlying mechanisms of ADRs, we can gain insight into a drug’s mechanism of action, which can assist in identifying new drug targets, enhancing drug repurposing, predicting new therapeutic indications, and advancing personalized medicine. Although some ADRs cannot be explained by known pharmacology and may result from non-specific interactions of reactive metabolites, drug kinetics, and/or environmental exposures, most ADRs are caused by unintended consequences of on-target or off-target drug-protein interactions.[44]^4^,[45]^5^,[46]^6^,[47]^7 Thus, drug-target interactions serve as valuable resources for understanding ADR mechanisms. Previous studies have considered the comprehensive set of drug targets to identify specific proteins associated with ADRs. An initial computational method for identifying ADR-related pathways (i.e., biological pathways that can explain the mechanisms of an ADR) was developed by Wallach et al.[48]^7 who hypothesized that drugs modulating the same pathways may lead to ADRs with similar phenotypes. To establish ADR-pathway relationships, they employed a logistic regression model to predict ADRs by quantifying drug-pathway interactions based on the docking scores of drugs to proteins within each pathway. Mizutani et al.[49]^8 identified protein-associated ADRs by calculating the sparse canonical correlation between drug-protein relations and drug-side-effect relations. Kuhn et al.[50]^9 further defined the relationship of proteins to ADRs by searching for statistically significant overlap between the set of drugs linked to their associated proteins and the set of drugs linked to the given ADR. 
To establish the relationship between ADRs and their potential drug targets, Lounkine et al.[51]^6 calculated an enrichment score for each target-ADR pair based on their observed versus expected co-occurrence, and a statistical significance test was applied to find likely target-ADR associations. Lim et al.[52]^10 constructed a heterogeneous network including drug, gene, and ADR nodes. They employed the ADR-gene pairs identified by Lounkine et al.[53]^6 and applied a collaborative filtering-based algorithm to predict the missing links between ADRs and genes. Next, using a permutation-based algorithm, statistically significant genes for each ADR were ascertained and used for pathway enrichment analysis. Park et al.[54]^11 assumed that ADRs reported for drugs targeting a single protein are entirely derived from perturbing that specific protein. Accordingly, they hypothesized that predicting the likelihood that a single-target drug causes an ADR corresponds to the probability that the protein target is associated with the ADR. Based on this concept, they reduced the problem of ADR-protein association prediction to the problem of drug-single-protein target prediction. To solve this problem, they constructed a network of drug-target and protein-protein interactions (PPIs) and used the node2vec representation algorithm to embed proteins and drugs into a low-dimensional vector space. They further used a logistic regression classifier for each ADR to score ADR-protein pairs. Despite the importance of drug targets in understanding ADR mechanisms, relying solely on these targets can lead to false positives (or failure to identify the true causative pathways owing to a limited search space). In contrast, exhaustive human genetic research has identified numerous disease-related genes. 
These genes are often linked to disease phenotypes (DPs), which can be considered analogous to ADRs.[55]^12 Such relationships can be leveraged to strengthen our confidence and reduce the false positives complicating the drug-target analysis approach. Nguyen et al.[56]^13 hypothesized that phenotypes caused by genetic variations could predict those caused by drug interactions with the encoded proteins. They demonstrated a significant correlation between the organ systems affected by genetic variations and those exhibiting ADRs when targeting the encoded proteins. Understanding the molecular mechanisms behind ADRs and DPs remains challenging. Existing approaches often treat ADRs and DPs separately, overlooking shared mechanisms. In this study, we introduce Drug Adverse Reaction Mechanism Explainer (DREAMER), a network-based method to uncover shared protein mechanisms between ADRs and DPs, which can enhance drug safety and repurposing[57]^14 efforts. Specifically, we hypothesize that equivalent ADR-DP pairs, representing the same phenotype, arise from variations in shared biological pathways ([58]Figure 1A).

Figure 1. Overview of the DREAMER pipeline
(A) The basic hypothesis: phenotypically similar adverse drug reactions (ADRs) and disease phenotypes (DPs) might result from targeting of and variation in the same biological mechanisms and pathways.
(B) To obtain ADR-related proteins, we diffuse from the drug targets and perform a statistical test for each protein.
(C) To obtain DP-related proteins, we diffuse from the disease-related proteins and perform a statistical test for each protein.
(D) Left: ADR-DP proteins comprise the intersection set of proteins with significant overlap between ADR-related proteins and DP-related proteins; right: an example of identified ADR-DP proteins for the dyspraxia (MedDRA: 10009696) phenotype.
(E) To limit potential confounding effects by organ/tissue-related indications, ADR-DP proteins are identified after removing the drugs with the same organ/tissue indication as the organ/tissue affected by the ADR. (F) To analyze the confounding effects of drug indications, protein scores are determined by diffusing from proteins related to the indications of drugs associated with a specific ADR, resulting in the identification of significant proteins called indication-related proteins. To explore this hypothesis, we constructed a comprehensive knowledge graph (KG) integrating drugs, diseases, ADRs, DPs, and proteins. Our KG links drugs to ADRs and targets, and diseases to DPs and related proteins, and includes PPIs. DREAMER applies a network diffusion algorithm to identify proteins associated with ADRs and DPs, reducing potential false positives by integrating proteins linked to equivalent ADR-DP pairs. This dual perspective enables a holistic view of molecular mechanisms underlying shared phenotypes. Key contributions of this study include the following: * (1) Constructing a KG that integrates diverse data sources, and particularly PPI networks, allowing analysis beyond individual proteins to capture broader molecular landscapes. * (2) Developing DREAMER, a network-based pipeline to identify proteins associated with clinical phenotypes. * (3) Providing a database of protein sets linked to phenotype mechanisms. Overall, this study offers a systems-level perspective on joint ADR and DP mechanisms through DREAMER, integrating ADR- and DP-associated proteins to advance systems pharmacology and enhance our understanding of molecular mechanisms. Results Dataset and network construction We constructed a heterogeneous network, also referred to as a KG, where drugs, diseases, proteins, ADRs, and DPs are represented as nodes. 
Links between nodes were established using various databases, incorporating ADR-DP, drug-ADR, disease-DP, drug-target, disease-gene,[61]^9^,[62]^15^,[63]^16^,[64]^17 PPIs from STRING,[65]^18 and physical interactions.[66]^19 Unless otherwise specified, results presented in the main text are based on the STRING network. An overview of our KG and the framework used for its construction are shown in [67]Figure S2. Summary statistics and data sources for the network are provided in [68]Table S1, with further details available in the [69]STAR Methods section.

DREAMER pipeline

To explore the underlying mechanisms of a specific phenotype, DREAMER identifies proteins related to an ADR and a DP that exhibit the same phenotype. The step-by-step pipeline of DREAMER is summarized as follows:

* (1) Identification of ADR-related proteins: we started by identifying proteins associated with the ADR. Using a network diffusion approach (i.e., personalized PageRank; see [70]STAR Methods), we diffused the signal from protein targets of drugs with a certain ADR over the PPI network. As the initial condition for network diffusion, each protein in the network was assigned a probability score based on how frequently it is targeted by the drugs associated with the queried ADR. This scoring process was followed by a permutation test (see [71]STAR Methods), generating p values for each protein. We considered proteins with adjusted p values below 0.05 as significantly related to the ADR ([72]Figure 1B).
* (2) Identification of DP-related proteins: we applied the same network diffusion approach to identify proteins related to the DP with the same phenotype as the ADR in step 1, substituting drug targets with proteins related to the diseases linked to the queried DP. This step mirrors the ADR analysis and identifies DP-associated proteins ([73]Figure 1C).
* (3) Intersection to minimize false positives: to enhance the specificity of proteins identified for each queried phenotype, we obtained the intersection of the corresponding ADR-related (step 1) and DP-related (step 2) protein sets ([74]Figure 1D). To evaluate the significance of the intersections, we applied the hypergeometric test with Benjamini-Hochberg correction (adjusted p < 0.05). Proteins present in both sets that pass the significance test were designated as ADR-DP proteins, which we hypothesized to be involved in the mechanisms linking the ADR and DP. Among the 649 phenotypes in our network, 120 showed significant overlap between their ADR-related and DP-related proteins. These 120 phenotypes and their identified proteins are listed in [75]Tables S2 and [76]S3.
* (4) Considering the confounding effect of drug indications: we addressed potential confounding effects related to drug indications through the following analyses:
  + (a) Controlling the effect of indication-ADR organ overlap: removing drugs with the same organ/tissue indication as the ADR to avoid false associations ([77]Figure 1E).
  + (b) Controlling the effect of indication-related proteins: scoring proteins by diffusing from those related to drug indications to identify and remove significant indication-related proteins ([78]Figure 1F).

After controlling for these confounding effects, the number of significant phenotypes was reduced to 67; their proteins are listed in [79]Tables S4 and [80]S5. As an example of the ADR-DP proteins identified in our study, the phenotype dyspraxia (MedDRA: 10009696) was associated with six proteins: three sodium channels (SCN1A, SCN9A, SCN1B) for action potential propagation, and three GABA[A] receptor subunits (GABRB3, GABRG2, GABRA6) for synaptic transmission ([81]Figure 1D; [82]Table S2). GABA[A] receptors enhance sodium channel activation at myelinated axon nodes, regulating sensory feedback.
Dysregulation can lead to dyspraxia due to impaired motor coordination. For visualization, we propose the diffusion map, a scatterplot representing each protein by its ADR-related and DP-related diffusion scores ([83]Figure 2A). The red points show proteins with scores that are statistically significant for both ADRs and DPs; these usually have large diffusion scores for both. Note that proteins with high diffusion scores are not necessarily significant. In certain cases, these proteins may be hub proteins (i.e., highly connected), which increases the probability of achieving high scores in the null model and leads to their rejection in the permutation test. As predicted, this analysis identifies many proteins involved in disease processes. Next, we provide three examples of protein sets identified by DREAMER as significantly associated with ventricular arrhythmia, vasculitis, and thrombocytosis.

Figure 2. Diffusion map and reliability assessment of the identified protein set using the network diffusion algorithm on STRING
(A) Diffusion map for ventricular arrhythmia, vasculitis, and thrombocytosis. The abscissa and ordinate values represent the diffusion scores of proteins initiated from the drug targets and disease-associated proteins, respectively. In the diffusion map, drug targets are represented by triangles, disease-associated proteins by squares, proteins that are both drug targets and disease-associated proteins by diamonds, and proteins that are neither drug targets nor disease-associated proteins by circles.
(B) Number of phenotypes shared between our constructed KG and known databases.
(C) Comparison of our diffusion-based method and the baseline method by identifying the number of ADRs and DPs with significant overlaps between proteins detected by different methods and those reported in the known databases (also see [86]Figures S3A and S3B).
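
The diffusion and permutation steps (steps 1 and 2 of the pipeline) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the toy graph `adj`, the seed counts, the damping factor `alpha`, and the null model (re-placing the seed weights on uniformly random nodes) are all assumptions made for illustration, and the actual procedure is the one described in STAR Methods.

```python
import random

def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    """Personalized PageRank by power iteration on an undirected graph.

    adj   : dict node -> list of neighbours (each node needs >= 1 neighbour)
    seeds : dict node -> non-negative weight, e.g. how many drugs with the
            queried ADR target that protein (normalised to a restart vector)
    """
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in adj}
    score = dict(restart)
    for _ in range(iters):
        # restart mass, then spread each node's score evenly over its neighbours
        nxt = {n: (1.0 - alpha) * restart[n] for n in adj}
        for n, nbrs in adj.items():
            share = alpha * score[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        score = nxt
    return score

def permutation_pvalues(adj, seeds, n_perm=200, rng=None):
    """Empirical p value per node under an ASSUMED null: the same seed
    weights are placed on uniformly random nodes and diffusion is rerun."""
    rng = rng or random.Random(0)
    observed = personalized_pagerank(adj, seeds)
    nodes = list(adj)
    weights = list(seeds.values())
    exceed = {n: 0 for n in nodes}
    for _ in range(n_perm):
        fake = dict(zip(rng.sample(nodes, len(weights)), weights))
        null = personalized_pagerank(adj, fake)
        for n in nodes:
            if null[n] >= observed[n]:
                exceed[n] += 1
    # add-one correction keeps p values strictly positive
    return observed, {n: (exceed[n] + 1) / (n_perm + 1) for n in nodes}

# Hypothetical toy PPI network: a small module (A, B, C) attached to a hub H.
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "H"],
    "H": ["C", "X", "Y", "Z"], "X": ["H"], "Y": ["H"], "Z": ["H"],
}
seeds = {"A": 2, "B": 1}  # A is targeted by two drugs, B by one
scores, pvals = permutation_pvalues(adj, seeds)
```

Because highly connected nodes also collect mass when seeds are placed at random, a hub can score high in the observed diffusion and still receive a large p value, which is the behaviour described above for hub proteins. In practice the p values would additionally be corrected for multiple testing before applying the 0.05 threshold.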
For ventricular arrhythmia ([87]Figure 2A), many of the significant proteins identified by the diffusion algorithm are ion channel proteins, such as those that contribute to Ca^2+ (CACNA1C, CACNA1C-IT2), Na^+ (SCN3A, SCN1B, SCN4B, SCN10A), or K^+ (KCNQ1, KCNE3, KCNH2) transport in the heart.[88]^20 Previously described mutations in KCNQ1 and KCNH2 are associated with dysfunction of the voltage-gated K^+ channel resulting in ventricular arrhythmias, such as long QT syndrome and ventricular fibrillation.[89]^21 Additionally, patients treated for ventricular arrhythmias often have their potassium (K^+) levels tested and receive supplements if their levels are low. This is because hypokalemia, or low potassium levels, is a well-known risk factor for arrhythmias.[90]^22 Mutations in Na^+-channel proteins (voltage-gated sodium channel [SCN] proteins) can result in long QT syndrome or atrial fibrillation.[91]^23 Some of these ion channels are also present in other tissues, including brain, muscle, stomach, and colon. For example, mutations in SCN1B can increase the risk not only of cardiac arrhythmia but also of epilepsy.[92]^24 Therefore, drugs that alter their function can have cardiovascular, muscular, gastrointestinal, or neurological consequences, depending on which organs express the specific channels. Similarly, mutations in CACNA1C alter L-type voltage-gated Ca^2+-channels and are associated with long QT and short QT syndromes.
An example is Timothy syndrome, the complex congenital syndrome caused by CACNA1C mutations,[93]^25 which involves cardiac manifestations such as long QT, along with one or more non-cardiac phenotypes such as skeletal, facial, and neurodevelopmental abnormalities.[94]^26 Vasculitis encompasses a heterogeneous group of diseases involving large, medium, or small vessels depending on the underlying specific disease.[95]^27^,[96]^28 Hallmarks include damage or dysfunction of the endothelial cells that line blood vessels, and treatments vary depending on the specific type. The DP reflects the action of specific proteins that govern the inflammatory response ([97]Figure 2A), including PTGS1 and PTGS2, known as COX-1 and COX-2 enzymes. Kawasaki disease, a pediatric vasculitis, is treated with aspirin targeting these enzymes and thereby reducing inflammation. Similarly, methotrexate, which inhibits DHFR, is used in the treatment of other forms of vasculitis, having more potent anti-inflammatory effects than aspirin or non-steroidal anti-inflammatory drugs (NSAIDs). Activation of AHR, the aryl hydrocarbon receptor, is also associated with promoting vascular inflammation; however, downregulation of AHR can also exacerbate vascular injury by enhancing the function of monocytes and macrophages.[98]^29^,[99]^30 Statins, widely known for their ability to decrease cholesterol and reduce atherosclerosis via the inhibition of HMGCR, have beneficial anti-inflammatory effects on endothelial function and are being considered as additional therapies in some forms of vasculitis.[100]^31^,[101]^32 Distinct from vasculitis, which involves inflammation of blood vessels, thrombocytosis is characterized by an elevated platelet count. 
This hematologic abnormality is reflected in the identified proteins that drive the phenotype, such as platelet-derived growth factor receptor-alpha and beta (PDGFR-α and -β) ([102]Figure 2A), which are present on both platelets and megakaryocytes, platelet precursors.[103]^33 Inhibition of these tyrosine kinases with imatinib and other related targeted therapies reduces megakaryocyte survival and proliferation and decreases platelet numbers by blocking PDGF signaling. Myeloproliferative syndromes, including essential thrombocythemia, can result from mutations in the JAK2, CALR, and MPL genes, each acting as drivers of the fusion protein BCR-ABL1 to increase cell (platelet as well as leukocyte) production.[104]^34 In summary, these examples demonstrate how the DREAMER pipeline can identify proteins that are mechanistically known to be associated with specific phenotypes. Reliability assessment of the network diffusion method In this section, we assess the reliability of our network diffusion algorithms in identifying relevant proteins compared to a baseline method. To do so, we examined the overlap between proteins identified by the algorithms (network diffusion-based and baseline) and proteins previously reported in the literature as associated with specific phenotypes (referred to as a priori known proteins). Specifically, we calculated how many phenotypes exhibit a significant overlap between ADR-related (or DP-related) proteins identified by our network diffusion-based method and known proteins. Significance was determined using a hypergeometric test with Benjamini-Hochberg correction (adjusted p <0.05). We then compared this count of significant phenotype overlaps with those observed using the baseline method. As a baseline, we implemented a published method[105]^9 that links proteins to phenotypes based on statistical testing (see [106]STAR Methods). Notably, both methods were applied to the same KG that we constructed for this study. 
Although curated resources of a priori known proteins for ADRs and DPs are limited, they can still serve as valuable benchmarks for assessing the reliability of our identified proteins. Accordingly, we compiled known proteins from various available sources to use as a reference standard, as described in the following.

Literature-based associations

Lu et al.[107]^35 compiled DP-related proteins from the PubMed and SemMed databases using natural language-processing methods and manual curation, identifying co-occurrences of DP and protein keywords in abstracts published before January 2022. In our analysis, we identified 134 phenotypes shared between the dataset of Lu et al.[108]^35 and our dataset ([109]Figures 2B and [110]S3A). We evaluated the performance of our network diffusion method against the baseline by calculating the number of DPs whose identified proteins significantly overlap literature-based proteins ([111]Figures 2C and [112]S3B). As shown in [113]Figure 2C, our method achieved a significant overlap for 66 DPs, outperforming the baseline. Notably, we also observed a substantial overlap between our ADR-related proteins and literature-based proteins, despite the latter not covering ADRs.

Indirect associations

Several DP terms are equivalent to disease terms, allowing genes associated with these diseases to serve as a priori known proteins. We term these “indirect associations.” This equivalence enables evaluation of our identified DP proteins by their overlap with proteins encoded by disease-related genes. Lu et al.[114]^35 compiled a set of such indirect DP-gene associations using phenotype-genotype databases. We identified 115 DPs common to both our dataset and theirs ([115]Figures 2B and [116]S3A). As shown in [117]Figures 2C and [118]S3B, our identified DP proteins exhibit a significantly greater overlap with these indirect DP-gene associations compared to the baseline.
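
The overlap tests used in this assessment (and in step 3 of the pipeline) combine a hypergeometric test with Benjamini-Hochberg correction. The following standard-library sketch shows one way to compute both; the function names, the choice of the background `universe`, and the toy protein sets are illustrative assumptions rather than the authors' code.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def overlap_pvalue(identified, known, universe):
    """Probability of an overlap at least this large if `identified` were
    drawn at random from `universe` (assumed here: all proteins in the network)."""
    identified, known = set(identified) & set(universe), set(known) & set(universe)
    return hypergeom_sf(len(identified & known), len(universe), len(known), len(identified))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 10 background proteins, two phenotypes.
universe = [f"P{i}" for i in range(10)]
identified = {"pheno_1": {"P0", "P1", "P2"}, "pheno_2": {"P3", "P4", "P5"}}
known = {"pheno_1": {"P0", "P1", "P2"}, "pheno_2": {"P7", "P8", "P9"}}

names = sorted(identified)
raw = [overlap_pvalue(identified[p], known[p], universe) for p in names]
adj_p = benjamini_hochberg(raw)
n_significant = sum(q < 0.05 for q in adj_p)
```

Counting the phenotypes with an adjusted p value below 0.05, once for the diffusion-based sets and once for the baseline sets, gives the kind of comparison summarized in Figure 2C.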
Open Targets-derived data The Open Targets platform aggregates direct and indirect associations between targets and diseases from various sources, including genetic associations, somatic mutations, drugs, pathways, RNA expression, text mining, and animal models.[119]^36 Direct associations are based on evidence explicitly linking a target to a phenotype. Indirect associations leverage the hierarchical structure of the disease ontology. We identified 69 phenotypes with direct associations and 70 with indirect associations shared between Open Targets and our dataset ([120]Figures 2B and [121]S3A). As shown in [122]Figures 2C and [123]S3B, our method outperformed the baseline in identifying known proteins reported in the Open Targets dataset. Notably, none of the disease-gene associations in the Open Targets dataset was found in our KG. While our diffusion-based model outperformed the baseline in identifying ADR- and DP-related proteins, we do not expect perfect overlap with known proteins. This is because the available known proteins are neither sufficiently comprehensive nor adequate as a complete ground truth. Therefore, although showing some degree of overlap between the identified ADR-related and DP-related proteins with proteins derived from a priori known proteins is useful for validation, we expect to recognize de novo proteins for each phenotype. Moreover, to reduce the false positives, we identified proteins present in both the ADR-related and DP-related sets for each phenotype, termed as ADR-DP proteins, as described in step 3 in DREAMER pipeline. Holdout validation DREAMER identifies proteins mechanistically related to specific phenotypes by analyzing the network proximity of proteins to drugs and diseases associated with that phenotype. This raises a question about the generalizability of DREAMER-identified proteins: would ADR-DP proteins remain relevant when our KG encounters new drugs and diseases? 
Accordingly, we hypothesized that any new drug and disease linked to a given phenotype would likely have at least one associated protein in closer proximity to the DREAMER-identified proteins than drugs and diseases without that phenotype ([124]Figure S3C). To test this hypothesis, we performed a holdout analysis to further validate our pipeline. For each phenotype, we split its associated drugs and diseases into two sets: 80% as the discovery set and 20% as the validation set. Additionally, for each phenotype, we randomly selected drugs and diseases that are not associated with the given phenotype, equal in number to the validation set. Here, we refer to the drugs and diseases in the validation set as “positive assets” and the randomly selected ones as “negative assets.” In the discovery phase, we used DREAMER to identify proteins related to each phenotype based on the drugs and diseases in the discovery set. In the validation phase, for each phenotype, we assessed the network proximity of the identified proteins to the positive and negative assets. We expected that proteins identified in the discovery phase for a given phenotype would show higher network proximity to the positive assets than to the negative assets. To assess the network proximities, we used the shortest path in the network. Specifically, we counted the shortest paths of lengths less than or equal to X ∈ {0, 1, 2, …} for both positive and negative assets across all phenotypes. Using Fisher’s exact test, we evaluated whether the proportion of positive assets with a shortest path ≤X was significantly higher than that for negative assets. The results, shown in [125]Table 1, indicate that positive assets are indeed significantly closer to DREAMER-identified proteins than negative assets. This finding supports our hypothesis that ADRs and DPs that exhibit the same phenotype can arise from variation in the same proteins and pathways. Table 1.
The p values obtained from Fisher’s exact test for the holdout validation, by shortest-path threshold X

| Holdout type | X = 0 | X = 1 | X = 2 | X = 3 | X = 4 | X = 5 | X = 6 |
| Drug holdout | 5.7e−08 | 1.4e−08 | 0.07 | 0.1 | – | – | – |
| Disease holdout | 3.1e−09 | 1.2e−10 | 2.7e−07 | 0.1 | 0.03 | 0.3 | 0.5 |
| Drug holdout (after drug clustering) | 0.003 | 0.003 | 0.9 | 0.9 | 0.9 | – | – |

Additionally, to ensure a more rigorous assessment, we repeated the process above by splitting the drugs into discovery and validation sets based on their molecular dissimilarities using the DataSAIL package in Python.[127]^37 Specifically, DataSAIL employs an algorithm to minimize similarity between molecules in the discovery and validation sets. To measure drug similarities, DataSAIL calculates Tanimoto coefficients between molecular fingerprints derived from their Simplified Molecular Input Line Entry System (SMILES) representations. This approach reduces the risk of information leakage and structural similarity between discovery and validation sets. The results are presented in [128]Table 1 and are consistent with those from the previous holdout validation analysis.

Considering the confounding effect of drug indications

The proteins identified by DREAMER for a specific phenotype may have been recognized under the influence of the indications of the drugs with the corresponding ADR. In this context, we consider two types of potential confounding effects:

* (1) Organ/tissue overlap: when a drug is used to treat a condition in a specific organ/tissue, it might also have targets that seem relevant to its associated ADR affecting the same organ/tissue. However, these associations might only reflect the drug’s intended action in that organ or tissue, rather than the ADR’s underlying mechanism;
* (2) Indication-related proteins: for drugs with a specific ADR, the conditions they treat may be associated with proteins that are closely related to our identified ADR-DP proteins.
In such cases, the identified ADR-related proteins may be influenced by the drug’s therapeutic indications rather than a direct mechanistic link to the ADR itself. In the following, we focus on each of the mentioned confounding effects and describe the pipeline we employed to address them. To reduce organ/tissue-related confounding, for each ADR, we excluded all of its associated drugs with indications affecting the same organ/tissue as the ADR ([129]Figure 1E). For example, for the cardiovascular-related ADR-phenotype tachycardia, we excluded all drugs with tachycardia as their ADRs that also had at least one cardiovascular-related indication. Specifically, we manually identified the relevant organs/tissues for the 120 ADR-DPs ([130]Table S6) and obtained organ indications for the drugs from a previous study.[131]^13 Among 465 drugs in our network, 328 were listed in their dataset. As a result, the number of ADRs with at least one associated drug was reduced to 97 (see [132]Tables S4 and [133]S5). We then reapplied the DREAMER pipeline to obtain a new set of proteins for each phenotype. For 90 out of 97 phenotypes, we observed a significant overlap between the proteins identified before and after the organ/tissue-based drug removal described above (tested via hypergeometric test, p < 0.05) listed in [134]Tables S4 and [135]S5. We note that, after the removal of the drugs with indications in the same organ/tissue as the ADRs, the average number of drugs was reduced to 6.6 from 14.2 for each ADR. While, on average, 30% of the drugs are excluded in this analysis, the results do not show a significant variation, which further indicates the robustness of our pipeline. To address the potential confounding effect of indication-related proteins, we investigated whether ADR-DP proteins for each phenotype interact with proteins associated with therapeutic indications of drugs linked to the same phenotype. 
Using the network diffusion algorithm over the PPI network ([136]Figure 1F; see [137]STAR Methods), we computed diffusion scores for proteins based on their PPI adjacencies with drug indications, as previously described for identifying ADR and DP proteins. Of the 120 phenotypes analyzed, 95 were linked to at least one drug that has at least one indication with at least one associated gene in our dataset (listed in [138]Tables S4 and [139]S5). Limiting our analysis to these 95 phenotypes, we obtained diffusion scores from the indications of drugs linked to each ADR. [140]Figure 3 shows the diffusion map, now incorporating a third dimension representing diffusion scores based on drug indications. After computing the diffusion scores of proteins with respect to the drug indications, we performed a permutation test (see [141]STAR Methods) to assign a p value to each protein. Proteins with corrected p < 0.05 were considered indication-related proteins (see [142]STAR Methods). We then identified phenotypes with significant overlap (hypergeometric test with Benjamini-Hochberg adjusted p < 0.05) between the indication-related proteins and ADR-DP proteins. For 84 out of 95 phenotypes (listed in [143]Tables S4 and [144]S5), no evidence of significant overlap was found. As can be seen in [145]Figure 3A, in these phenotypes, the ADR-DP proteins (indicated in red) have smaller diffusion scores with respect to the third dimension, suggesting that, for these phenotypes, the subnetworks related to drug indications are far from those related to ADR-DP proteins. For example, intracranial hemorrhage, a critical condition involving bleeding within the brain, was linked to proteins of the coagulation and anticoagulation pathways such as protein C (PROC), factor X (F10), and prothrombin (F2) ([146]Table S2). Dysregulation of these proteins can impair hemostasis and formation of stable clots, leading to an increased risk of excessive bleeding events such as intracranial hemorrhage.
These findings suggest that, at least for these phenotypes, the identified ADR-DP proteins have no significant association with the proteins driving the clinical indications of their related drugs.

Figure 3. 3D diffusion map and pathway enrichment results for the top phenotypes
(A) Phenotypes whose identified ADR-DP proteins (indicated in red) are far from the subnetworks related to drug indications. In diffusion maps, the x, y, and z axes represent the diffusion scores of proteins from drug targets, disease-related proteins, and drug-indication-related proteins, respectively.
(B) Phenotypes whose identified ADR-DP proteins (indicated in red) have larger values in the third axis, and their independence from indication-related proteins is not trivial (also see [149]Figure S4).
(C) The ranked list of phenotypes based on the significance of their ADR-DP proteins.
(D) Reactome and Gene Ontology over-representation analysis for ADR-DP proteins of the top three phenotypes.

By contrast, for the remaining 11 phenotypes ([150]Figure 3B; [151]Figure S4), the ADR-DP proteins show higher values on the third axis, so their independence from indication-related proteins cannot be assumed and requires further investigation by domain experts. Accordingly, the 3D diffusion map can help to inspect such ambiguities. Put another way, high z axis values may reflect reverse causality, where the ADR is a downstream consequence of treatment and not directly associated with the underlying phenotype mechanism. For example, in hyperuricemia, we identified ADR-DP proteins that included PDGFRA, FLT3, and ABL1 as having high values in the z axis. These proteins are commonly targeted by drugs for cancer treatment, including leukemia, and play roles in the differentiation, division, and growth of cells.
Cell death induced by these treatments can lead to great increases in uric acid in the blood, overwhelming the body’s normal ability to clear that metabolite and ultimately causing renal dysfunction, further worsening the hyperuricemia. In vasculitis, proteins like PTGS2 ([152]Figure 2B) are crucial in mediating inflammation via prostaglandin synthesis. Incorporating indication-related proteins, we observe an overlap in inflammatory activation between the indication (z axis) and vasculitis (y axis) ([153]Figure 3B). Similarly, ITGAL (CD11a/LFA-1), essential in leukocyte migration,[154]^38 may be detected due to the (1) modulation of immune filtration by the indicated drug/disease and/or (2) vasculitis itself affecting immune cell-endothelial interactions. For hypersomnia, VIP’s high z axis value may relate to VIPomas, where octreotide inhibits excessive VIP secretion. However, VIP is produced by neurons in the SCN of the hypothalamus where it maintains normal circadian rhythms,[155]^39 supporting its potential involvement in hypersomnia too. VIP’s role in various phenotypes likely depends on its anatomic location and quantification. According to both analyses discussed above, we identified mechanisms for 67 phenotypes that show no evidence of association with drug indications ([156]Table S4). Additionally, when replacing the STRING PPI network with the physical PPI network, our analysis identified mechanisms for 56 phenotypes ([157]Table S5) that are not related to drug indications. Notably, there was an overlap of 29 phenotypes between these two analyses. The ADR-DP proteins identified from both analyses for all these 29 phenotypes showed significant overlap (hypergeometric test, adjusted using the Benjamini-Hochberg method, p < 0.05). 
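The overlap test used above (hypergeometric test followed by Benjamini-Hochberg adjustment) can be illustrated with a minimal sketch. This is not the DREAMER implementation: the function names, the choice of the protein universe size, and the toy inputs are assumptions made purely for illustration, and only the Python standard library is used.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, set_a, set_b):
    """Upper-tail hypergeometric p value: probability of an overlap at least
    as large as the observed one if len(set_b) proteins were drawn at random
    from a universe of `universe_size` proteins containing set_a.
    Assumes both sets are subsets of that universe (illustrative assumption)."""
    k = len(set_a & set_b)            # observed overlap
    n_a, n_b = len(set_a), len(set_b)
    total = comb(universe_size, n_b)  # all possible draws of size n_b
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In such a sketch, one p value would be computed per phenotype (for example, between its indication-related proteins and its ADR-DP proteins), and the resulting list would be passed to `benjamini_hochberg` before applying the 0.05 threshold.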
Biological insights into phenotype mechanism of action

In this section, we investigate the biological function of the protein sets identified by DREAMER for the 67 phenotypes with no evidence of association with drug indications. We draw on pathway enrichment analysis and prior literature to connect these findings to known physiological and pathological processes, while acknowledging the limitations of indirect connections. We first ranked these phenotypes based on the significance of the overlaps between their ADR proteins and DP proteins, in line with our hypothesis that overlapping ADR and DP protein sets may indicate shared underlying biological mechanisms. [158]Figure S5 shows the ranking of all phenotypes, with the top 20 phenotypes shown in [159]Figure 3C. We then found the enriched pathways for the ADR-DP protein sets based on an over-representation analysis using the Reactome[160]^40 and Gene Ontology[161]^41 ([162]Table S7) databases. The results for the top three ranked phenotypes are illustrated in [163]Figure 3D. Pathway analysis highlights the physiological processes involved in these disorders. For example, chloride is an anion that is mostly found in the extracellular space. Its concentration is regulated by the gastrointestinal tract, where it is absorbed from food, as well as the kidney, where it is excreted in urine or reabsorbed in the proximal tubule. Chloride transport relies on transmembrane ion transporters and cotransporters as well as additional Na^+/K^+ ATP-dependent ion transporters that provide energetics for Cl^− transport ([164]Figure 3D). Thus, its concentration is dependent on that of other ions, such as Na^+, K^+, and bicarbonate (HCO3^−). Owing to its inverse relationship with bicarbonate, hypochloremia can result in metabolic alkalosis.
Hypochloremia can occur due to gastrointestinal causes, such as vomiting; or renal loss of chloride due to the use of diuretics (hypochloremic metabolic alkalosis due to excessive fluid loss leading to volume contraction); and/or because of hyponatremia and hypokalemia, as the fluxes in sodium and potassium will affect chloride levels.[165]^42 The coordinated movement of ions through voltage-gated ion channels is important to maintain the rhythmic beating of the heart ([166]Figure 3D). Disruption of these processes leads to abnormal action potentials, arrhythmias, and ventricular fibrillation. Many of these ion channels also play a role in other organs, including the brain.[167]^43 In personality disorder (a complex, comparatively non-specific phenotype), the identified pathways are all known key mechanisms for various psychiatric conditions. The neurotransmitter receptors and postsynaptic signal transmission reflect the significant roles of dysregulated neurotransmitter systems implicated in a range of personality disorders such as mood and bipolar disorder. Altered phase 0, representing rapid neuronal depolarization, can lead to epilepsy[168]^44 ([169]Figure 3D). Dysregulated membrane potential can be influenced by chloride transport and can impair GABAergic transmission ([170]Figure 3D). Impairment in GABAergic transmission plays a significant role in the pathophysiology of major depressive disorder (MDD),[171]^45 schizophrenia,[172]^46 bipolar disorder,[173]^47 and autism,[174]^48 and lower levels of GABA are often identified as the main endophenotype of MDD.[175]^49 The antidepressant effect of ketamine may also be related to its selective impact on GABAergic interneurons, blocking NMDA receptors and reducing inhibitory signals to enhance cortical excitation. 
Additionally, the interaction between L1CAM and ankyrins ([176]Figure 3D) guides neuronal adhesion and signaling, where abnormalities are associated with neurodevelopmental disorders like autism.[177]^50 These mechanistic phenotype pathways emphasize the interconnected roles of neurotransmitter signaling, synaptic function, and neuronal excitability in personality (and other psychiatric) disorders. Overall, these results provide preliminary insights into potential biological mechanisms connecting ADRs and DPs through shared protein pathways. We note that, while pathway enrichment and literature evidence support these findings, future experimental studies need to confirm these mechanistic connections.

Application of DREAMER for therapeutic potential

The proteins identified for each phenotype using DREAMER can open new avenues for drug design and drug repurposing (i.e., an approach to identifying new therapeutic uses for drugs that are already approved for specific disorders).[178]^14 It can be hypothesized that targeting proteins identified for each phenotype is most likely either to induce or treat the phenotype, as one cannot determine directionality a priori from this analysis. Therefore, DREAMER can be leveraged for drug discovery in two ways: (1) predict possible ADRs for new drugs based on their known targets, and (2) design new drugs or suggest repurposing candidates, based on their targets, for a disease. In particular, to showcase the application of DREAMER for drug repurposing in the context of the second case, we focus on phenotypes for which there is evidence that targeting their ADR-DP proteins can treat the corresponding phenotype. For this purpose, we identified phenotypes whose ADR-DP proteins contain at least one protein targeted by a drug with an indication with the same terminology as the specified ADR.
For example, cardiac arrest is a term that is assigned to an ADR (MedDRA: 10007515 in the SIDER dataset), a DP (hpo:0001695 in the Human Phenotype Ontology [HPO] dataset), and a drug indication (mondo:0000745 in the Mondo Disease Ontology dataset). Interestingly, we found three drugs (diltiazem, carvedilol, and verapamil) that are indicated for cardiac arrest and target at least one of the proteins recognized by DREAMER for cardiac arrest ([179]Figure 4). Hereafter, we refer to such drugs as “indicated drugs.” In our dataset, we identified a total of eight such phenotypes, namely cardiac arrest, hypophosphatemia, precocious puberty, Torsades de Pointes, thrombocytosis, peptic ulcer, ventricular tachycardia, and ventricular fibrillation ([180]Table S8). [181]Figure 4 illustrates PPI subnetworks for five of those phenotypes restricted to their ADR-DP proteins along with the drugs that target them and have indications for those phenotypes. The complete list of the indicated drugs along with their targets among ADR-DP proteins is provided in [182]Table S8.

Figure 4. Subnetworks of identified protein sets along with indicated drugs that target them
Pink nodes are the proteins that are targeted by indicated drugs, and blue nodes are the rest of the ADR-DP proteins.

To find opportunities for drug repurposing for the above-mentioned phenotypes, we collected all the drugs that have at least one target among their ADR-DP proteins and kept only those that have neither a reported ADR nor a previously known indication for the corresponding phenotype in our dataset ([185]Table S9). We refer to these drugs as candidate drugs for repurposing. For example, sotalol is recognized for its efficacy in treating various cardiac arrhythmias by targeting KCNH2 (hERG) channels. In our dataset, sotalol is recognized to have indications for ventricular fibrillation.
However, sotalol can cause prolongation of the QT interval, leading to ventricular arrhythmias such as ventricular tachycardia, ventricular fibrillation, cardiac arrest, and, in particular (based on our dataset confirmed by the published literature), Torsades de Pointes.[186]^51^,[187]^52^,[188]^53 Although sotalol has not been reported to be related to cardiac arrest in our dataset, it is among the candidate drugs for the treatment of cardiac arrest in our analysis ([189]Table S9). Similarly, ranolazine, a drug used to treat angina pectoris, has been used off-label for the treatment of ventricular arrhythmias.[190]^54 In addition, we conducted a comprehensive search on the clinical trials website ([191]ClinicalTrials.gov) to find evidence for these candidate drugs. Using a customized Python script, we queried all pairs of phenotypes and their candidate drugs and then carefully inspected all the derived results. Accordingly, we found a number of such phenotype-drug pairs listed in [192]Table 2 along with their [193]ClinicalTrials.gov IDs.

Table 2. The phenotype-drug evidence for candidate drugs in the ClinicalTrials website

Phenotype | Candidate drug | ClinicalTrials.gov ID
Ventricular tachycardia | ranolazine | [194]NCT01590979
Cardiac arrest | ranolazine, domperidone | [195]NCT00998218, [196]NCT04024865, [197]NCT02500108, [198]NCT01907633, [199]NCT00925457
Peptic ulcer | baclofen | [200]NCT00414856, [201]NCT00461604, [202]NCT00978016
Precocious puberty | anastrozole, letrozole | [203]NCT00094328, [204]NCT00055302
Torsades de Pointes | progesterone, testosterone | [205]NCT01929083, [206]NCT02513940
Ventricular fibrillation | ranolazine | [207]NCT01887353, [208]NCT01558830

To repurpose novel drugs for these phenotypes, we scored and ranked all candidate drugs based on the ratio between the number of proteins they target within ADR-DP proteins and their overall number of targets.
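The ratio-based ranking described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' script: the function name, the tie-breaking rule (more hits first, then drug name), and the placeholder drug and protein identifiers are all assumptions.

```python
def rank_candidate_drugs(drug_targets, adr_dp_proteins):
    """Rank drugs by the fraction of their targets that fall inside the
    ADR-DP protein set of a phenotype.

    drug_targets: dict mapping drug name -> set of target proteins
    adr_dp_proteins: set of ADR-DP proteins for one phenotype
    Returns a list of (drug, ratio, n_hits), best first. Drugs with no
    target among the ADR-DP proteins are omitted.
    """
    scored = []
    for drug, targets in drug_targets.items():
        if not targets:
            continue  # avoid division by zero for drugs without known targets
        hits = targets & adr_dp_proteins
        if hits:
            scored.append((drug, len(hits) / len(targets), len(hits)))
    # Assumed tie-breaking: higher ratio, then more hits, then drug name.
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return scored


# Toy example with placeholder identifiers (not data from the study).
toy_targets = {
    "drug_a": {"P1", "P2"},
    "drug_b": {"P1", "P3", "P4", "P5"},
    "drug_c": {"P9"},
}
toy_ranking = rank_candidate_drugs(toy_targets, {"P1", "P2", "P3"})
```

Note that a ratio of this kind favors selective drugs: a drug with two targets, both inside the ADR-DP set, outranks a drug with the same number of hits spread across many more targets.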
[210]Figure 5 shows these rankings for the top 20 drugs, with circle sizes representing the total number of targets of each drug within the ADR-DP protein set. The circles are colored red to indicate drugs that have been found to be under investigation in clinical trials as being relevant to the queried phenotype. Candidate drugs that target proteins also targeted by indicated drugs may have a potential for repurposing (marked in green), as targeting these proteins has already been shown to be effective in treating the phenotype, but experimental testing is required to establish their efficacy. Circles colored in blue could potentially represent interesting findings to repurpose for their corresponding phenotypes, as they have not been previously targeted for the treatment of those phenotypes. Moreover, we hypothesize that drugs with higher ranks are more likely associated with the corresponding phenotype.

Figure 5. Top-ranked drugs with at least one target in the identified ADR-DP proteins for the phenotypes
Red circles indicate candidate drugs that are found to be under investigation in clinical trials. Green circles indicate candidate drugs that target proteins that are already targeted by indicated drugs. Blue circles indicate candidate drugs whose targets were not already targeted by any indicated drugs. The size of the circles indicates the total number of proteins the candidate drugs target within ADR-DP proteins. GHB∗, gamma-hydroxybutyric acid.

In summary, DREAMER provides a framework for identifying potential therapeutic targets; however, further research is essential to validate these findings. Future studies should focus on experimental and clinical validation of these candidate drugs and explore the mechanistic basis of their associations with the specified phenotypes.

Discussion

In this study, we present DREAMER, a network-based method designed to investigate the underlying mechanisms of DPs and ADRs.
While previous efforts have explored the mechanisms of ADRs and DPs separately, focusing on ADR-target PPIs or DP-gene associations, our approach integrates both phenotypes, offering a more comprehensive insight into their mechanisms. By identifying interconnected modules between ADRs and DPs, DREAMER effectively uncovers shared molecular mechanisms underlying 67 phenotypes, supporting our hypothesis. Furthermore, our reliability assessments and validation analyses confirm the robustness of DREAMER’s approach, underscoring its contributions to systems pharmacology. Our pipeline has the potential to identify biomarkers for designing safer therapeutic strategies that minimize the need for drug discontinuation and enhance opportunities for drug repurposing, thereby supporting more effective and personalized treatments. Network-based methods have proved to be effective in understanding the complex biology and systems pharmacology.[213]^55^,[214]^56^,[215]^57 By representing biological entities as interconnected nodes in a network, these methods enable the identification of interactions and functional modules that may not be apparent when examining isolated entities. Unlike traditional approaches, which often rely on linear associations, network-based methods capture the interconnected nature of biological processes, providing a holistic view that can account for indirect and higher-order relationships across entities. This approach is particularly beneficial in exploring mechanisms of ADRs and DPs, as it allows us to understand how proteins and pathways implicated in therapeutic indications might overlap with or diverge from those contributing to adverse effects. Targeting proteins in the organ of interest with drugs provides the basis for in vivo experiments that can explain the relationship between the functionality of that protein, systemic effects, and phenotypic responses. These proteins can be both on target and off target of drugs. 
While they are primarily targeted to treat specific indications or their phenotypes, they can also induce unintended side effects. Currently, drug safety evaluation, to a large degree, relies on animal experiments, which do not always translate reliably to humans owing to inherent biological differences.[216]^12 In recent years, the increased availability of public databases including drug targets and ADRs has become a more reliable source of human-specific information. Genetic variations, by contrast, can be considered natural experiments, providing insights into the mechanism of phenotypes. Genome-wide association studies have been extensively utilized to identify novel therapeutic targets, with a greater probability of drug approval when these targets are corroborated by human genetic evidence for the desired indication. Additionally, there is a growing interest in harnessing human genetic studies to predict the risk of ADRs. The importance of applying this strategy is more pronounced where suitable animal models for drug safety assessment are lacking.[217]^12 Protein modules that are affected by both drugs with a specific ADR and diseases with a similar phenotype provide more evidence to explain the mechanism underlying the phenotype. Our study advances this concept by considering modules targeted by drugs and diseases exhibiting phenotypically similar ADRs and DPs. The ADR-DP proteins identified by DREAMER may be influenced by the drug indications in the PPI, which can be recognized by their large values along the z axis in the 3D diffusion map ([218]Figures 3A and 3B). Such instances may affect our interpretations and should be addressed carefully. 
Specifically, we encountered three scenarios: (1) ADR-DP proteins have an indirect association with the phenotype of interest, as seen in cases like hyperuricemia caused by cancer therapies; (2) higher-order relationships between the drug indications and the phenotype (such as both being related to the same tissue or organ); and (3) the phenotype mechanism stems from on-target effects. We acknowledge that, while the last scenario does not limit our interpretation, distinguishing it from the first two scenarios might be challenging or even infeasible. Some phenotypes are multifactorial and are not directly linked to the drug’s molecular effect(s) alone. However, it is still valuable to explore whether any of these phenotypes have a specific molecular mechanism that connects them to both ADRs and DPs simultaneously. For example, in the case of female infertility, which often stems from prior infections, DREAMER has identified proteins enriched in the metabolism of steroid hormone pathways ([219]Tables S2 and [220]S7). In contrast, dementia may arise from multiple factors beyond genetics, such as age, lifestyle, social engagement, and cognitive function, which would be discarded by DREAMER owing to the lack of significant overlapping proteins connecting the ADRs and DPs. In this analysis, we used the STRING network, which integrates diverse PPIs from various sources, including direct physical interactions and indirect associations such as genetic co-occurrences, co-expression, and computational predictions. This broad interaction dataset enables exploration of phenotype mechanisms through higher-order interactions. While STRING offers a comprehensive view of generalized interaction data, we also applied DREAMER using a physical PPI network, which focuses exclusively on high-confidence interactions validated by experimental methods such as yeast two-hybrid screening, NMR spectroscopy, X-ray crystallography, and cryoelectron microscopy. 
Shared phenotypes between the STRING and physical PPI networks are extensively studied, commonly encountered in clinical practice, and span multiple organ systems, such as cardiac arrest, bradykinesia, ventricular arrhythmia, and interstitial pneumonitis ([221]Figure S5). However, the physical PPI network uniquely identifies dementia and parkinsonism (cognitive and motor symptoms), while STRING uniquely identifies akinesia, a symptom of parkinsonism related to movement initiation difficulties. The reasons for these distinctions remain uncertain at this time but likely reflect the complexity of the phenotypes as well as their lack of specificity in some cases. Similar trends are observed in inflammatory phenotypes: both networks identify interstitial pneumonitis, but the physical PPI network uniquely captures systemic inflammatory conditions (e.g., rheumatoid arthritis, leukocytosis, involving direct interactions with mediators derived from circulating immune cells), while STRING identifies organ-specific inflammation (e.g., cholecystitis, cholangitis). Overall, STRING identifies more phenotypes and a broader range of conditions, including structural cardiac and metabolic abnormalities, reflecting its integrative approach. In contrast, the physical PPI network identifies disorders with better-characterized molecular mechanisms. To assess the validity of our identified proteins, we conducted several analyses: (1) reliability assessment, to demonstrate the overlap of our identified diffusion-based proteins with a priori known proteins and show their superiority over a baseline model; (2) holdout validation, to show that the identified proteins are generalizable to new drugs and diseases that are added to our KG; and (3) robustness assessment, to demonstrate that the identified proteins statistically remain consistent even when 30% of the drugs are excluded from our KG. 
Although in silico validation for computationally identified proteins is necessary, true validity of the identified proteins can only be confirmed through experimental validation. For example, gene-knockout studies in animal models allow researchers to assess whether eliminating genes identified by in silico models produces a phenotype that mirrors the ADR of interest. Following validation in animal models, clinical studies provide the strongest confirmation of these mechanisms in humans. One potential and interesting application of this work could be in drug design and repurposing. We identified eight phenotypes ([222]Table S8) for which drugs targeting the ADR-DP proteins have indications for diseases with the same phenotype. Extending this idea to other phenotypes, DREAMER can reduce the search space to find relevant protein targets for a particular phenotype. Moreover, DREAMER can be used for drug off-target prediction. Drugs with a particular ADR are expected to bind to a protein within (or close to) the identified protein sets that govern that ADR (side effect module).[223]^58 The reduced protein space can then be used to infer the potential off-target proteins of drugs using computational methods (e.g., AutoDock and AutoDock Vina[224]^59^,[225]^60) or experimental methods (e.g., based on established physicochemical methods).

Limitations of the study

DREAMER explores the mechanism of phenotypes without considering the specific variations in individual molecular profiles, which are crucial for personalized medicine. Advancing our understanding in personalized medicine will also require access to individual-specific information.
The Food and Drug Administration (FDA) Adverse Event Reporting System[226]^61 provides extensive patient information, including ADRs, drug prescriptions, dosages, and demographic details, which can be leveraged to help elucidate the mechanisms of phenotypes in the context of personalized treatments but ultimately will require molecular-level information with which to generate individual PPIs.[227]^62 A potential future direction is the investigation of the phenotype mechanism in the context of combination therapy. Drugs can be prescribed as monotherapies or combination therapies,[228]^63 with the latter offering synergistic benefits for complex or multiple disorders but potentially introducing unique ADRs. For instance, in Parkinson’s disease, levodopa is prescribed to increase dopamine levels, and carbidopa is prescribed in combination with it to reduce peripheral conversion, thereby reducing ADRs such as nausea. The TWOSIDES[229]^1 database provides insights into ADRs related to drug combinations and can aid in identifying their mechanisms.[230]^64^,[231]^65 Additionally, similarities among certain phenotypes can improve the reliability of mechanism identification. Phenotypes can be clustered based on shared drugs and diseases using techniques like biclustering[232]^66 or KG representation learning.[233]^67 In conclusion, DREAMER advances our understanding of ADR and DP mechanisms, offering a valuable tool for improving drug safety, repurposing, and personalized medicine.

Resource availability

Lead contact

Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Farzaneh Firoozbakht (farzaneh.firoozbakht@uni-hamburg.de).

Materials availability

This study did not generate new unique reagents.
Data and code availability

* ADR-phenotype, drug-ADR, drug-protein, gene-disease, gene-protein, drug-indication, and phenotype-disease links and the pre-processed STRING PPI network have been deposited at [234]https://doi.org/10.6084/m9.figshare.28254812. Access to drug-protein links requires a usage license from the DrugBank dataset.
* All original code has been deposited at [235]https://github.com/faren-f/DREAMER and is publicly available at [236]https://doi.org/10.6084/m9.figshare.28254812 as of the date of publication.
* Any additional information required to reanalyze the data reported in this paper is available from the [237]lead contact upon request.

Acknowledgments