Abstract Background Although there are huge volumes of genomic data, how to decipher them and identify driver events is still a challenge. The current methods based on network typically use the relationship between genomic events and consequent changes in gene expression to nominate putative driver genes. But there may exist some relationships within the transcriptional network. Methods We developed MECoRank, a novel method that improves the recognition accuracy of driver genes. MECoRank is based on bipartite graph to propagates the scores via an iterative process. After iteration, we will obtain a ranked gene list for each patient sample. Then, we applied the Condorcet voting method to determine the most impactful drivers in a population. Results We applied MECoRank to three cancer datasets to reveal candidate driver genes which have a greater impact on gene expression. Experimental results show that our method not only can identify more driver genes that have been validated than other methods, but also can recognize some impactful novel genes which have been proved to be more important in literature. Conclusions We propose a novel approach named MECoRank to prioritize driver genes based on their impact on the expression in the molecular interaction network. This method not only assesses mutation’s effect on the transcriptional network, but also assesses the differential expression’s effect within the transcriptional network. And the results demonstrated that MECoRank has better performance than the other competing approaches in identifying driver genes. Keywords: Driver genes, Cancer, Transcriptional networks Background Recent advances in deep sequencing have provided us with an unprecedented amount of cancer genomics data. With the rapid accumulation of huge volumes of genomic data, we have tremendous opportunities to better understand cancer initiation, progression and development [[33]1]. However, it is still a challenge to decipher those data (e.g., single nucleotide variants (SNVs), small insertions or deletions (indels), large copy-number variations (CNVs) and structural aberrations, etc.) and use them to distinguish driver mutations which contribute to cancer development from passenger mutations that have accumulated in somatic cells but without functional consequences [[34]2, [35]3]. In fact, in the early stages, single data, such as somatic aberrations data is the most commonly used data to identify driver genes. For example, frequency-based methods, such as MuSiC [[36]3] and MutSigCV [[37]4], are the common approach which relies on the frequency of aberration of a given gene or locus in a population of tumors [[38]5]. In addition, machine learning approaches based on alterations knowledge are also used to identify driver genes. For instance, CHASM adopts random forest which use alterations trained from known cancer-causing somatic missense mutations to classify driver mutations [[39]6, [40]7]. Recently, many mathematical and statistical methods which are based on data integration were proposed to search for driver genes, driver pathways or core modules [[41]8]. Bayesian network-based methods such as CONEXIC integrated copy number change and gene expression data to identify potential driver genes which are located in some amplified or deleted regions in tumors [[42]9]. With the developing of the research of cancer, we have recognized that the development and progression of cancer can be promoted by driver mutations or gene perturbing signaling, regulatory or metabolic pathways [[43]1]. Thus, several methods that use network and pathway to understand drivers have been proposed, e.g., MEMo [[44]10] and Dendrix [[45]11]. MEMo uses the mutual exclusivity of gene mutations to detect mutated subnetworks critical to carcinogenesis [[46]10]. As well as Dendrix was designed to identify subnetworks with potential driver activity which have high coverage and high mutual exclusivity [[47]11]. Another method, MUFFINN measures the functional impact of the network neighbors of mutated genes, and scores the investigated genes by considering the influence of either the most frequently mutated neighbor or all direct neighbors [[48]12]. Although the aforementioned approaches have achieved great achievements in distinguishing driver genes, improving the identification accuracy of driver genes is still a challenge. In this work, we propose a method named MECoRank to prioritize driver genes of single patient sample based on their impact on the expression in the molecular interaction. Our method not only assess mutation’s effect on gene expression, but also assess the differential expression’s effect within a transcriptional network. We first construct a bipartite graph model to formulate associations between expression and somatic SNVs using protein-protein interaction (PPI) network. A bipartite graph is a graph whose vertices can be partitioned into two subsets. In our work, vertices on the left partition of the bipartite graph correspond to individual gene expression status and vertices on the right partition represent individual mutated genes. And then an iterative process was used to propagate the effect of somatic SNVs and differential expression. After iteration, we will obtain a ranked gene list for each patient sample. Finally, we applied the Condorcet voting method to determine the most impactful drivers in a population. To test the performance of our approach, we analyzed three datasets which are breast cancer dataset (BRCA), kidney renal clear cell carcinoma (KIRC) and lung squamous cell carcinoma (LUSC). From TCGA Data Portal ([49]https://portal.gdc.cancer.gov/), we collected the data of somatic SNVs. And from UCSC [[50]13], we obtained gene expression data. Experimental results show that our method not only can identify more driver genes that have been validated than other methods, but also can recognize some impactful novel genes. Although these genes are not presented in Cancer Gene Census (CGC), some evidences show that these candidate genes have functional roles in cancer or cancer-related biological processes. Methods Method overview The proposed MECoRank method prioritizes driver genes based on the impact of their mutations and differential expression on the expression in the molecular interaction during a single patient. An overview of its workflow is presented in Fig. [51]1. In the following section, we first present the bipartite graph model and then give the iterative framework. Finally, we introduce Condorcet voting method for rank aggregation. Fig. 1. [52]Fig. 1 [53]Open in a new tab A schematic of MECoRank framework. a The data we used including gene expression of cancer and normal patients, somatic SNVs and PPI network. b The left partition of bipartite graph represents an individual patient’s expression set U where the transition matrices W^UU indicates the relations in U. The right of the bipartite graph represents the patient’s mutation set V where the transition matrices W^VU indicates the interactions between U and V. c We can obtain a score matrix in which each gene of every patient has a score. d We used the Condorcet voting to obtain the final rank of the genes A bipartite graph model In this section, we detail the bipartite graph model used in our work. Consider a bipartite graph G = (U ∪ V, E), where U and V represent the individual expression and mutation respectively. Each edge in E connects a vertex in U and one in V. Let U = {u[1], u[2], …, u[m]} and V = {v[1], v[2], …, v[n]} be the two sets of m and n genes. We use u[i] to denote the i − th vertex in U, and v[j] to denote the j − th vertex in V, where 1 ≤ i ≤ m and 1 ≤ j ≤ n. For each patient, an edge between the nodes on the left and right partitions of the graph is drawn if u[i]and v[j] are known to interact according to PPI network. We constructed the bipartite graph for each patient by the same way. The edges between U and V are represented as the transition probability W^VU [[54]14]. And the edges within U are represented as W^UU. If there is an edge connecting u[i] and v[j] in PPI, [MATH: wijvu=1 :MATH] ; otherwise, [MATH: wijvu=0 :MATH] . As such, we can describe all edges weights of the graph as a m × n matrix [MATH: WVU=wijvu :MATH] . Similarly, we can easily obtain the transition matrix W^UU. For each vertex v[j], we denote its degree (sum of connected edges’ weights) as d[j], and use a diagonal matrix D[v] to denote the degrees of all vertices in V; and similarly, for d[i] and D[u]. Note that in this paper, we deal with undirected bipartite graphs. Iterative framework To rank vertices based on the graph structure, seminal algorithms like PageRank [[55]15] and HITS [[56]16] have been proposed [[57]17]. PageRank is an algorithm used by Google Search to rank websites in their search engine results [[58]18]. This approach estimates the importance score of vertices as the stationary distribution of a random walk process – starting from a vertex, the surfer randomly jumps to a neighbor according to the edge weight [[59]17]. HITS algorithm is similar to PageRank in some aspects. This method assumes each vertex has two roles: hub and authority [[60]16]. If a vertex is linked by many vertices with hub score, this vertex has high authority and vice versa [[61]17]. These two methods are focus on unipartite graphs. Our iterative framework references HITS, PageRank and their variants which