Abstract

   High coverage and mutual exclusivity (HCME), which are considered two
   combinatorial properties of mutations in a collection of driver genes
   in cancers, have been used to develop mathematical programming models
   for distinguishing cancer driver gene sets. In this paper, we summarize
   a weak HCME pattern to justify the description of practical mutation
   datasets. We then present AWRMP, a method for identifying driver gene
   sets through the adaptive assignment of appropriate weights to gene
   candidates to tune the balance between coverage and mutual exclusivity.
   It embeds the genetic algorithm into the subsampling strategy to
   provide the optimization results robust against the uncertainty and
   noise in the data. Using biological datasets, we show that AWRMP can
   identify driver gene sets that satisfy the weak HCME pattern and
   outperform the state-of-arts methods in terms of robustness.

Introduction

   Driver mutations, which are the mutations responsible for cancer, are
   different from randomly occurring passenger mutations. Because driver
   mutations typically target genes involved in cellular signalling and
   regulatory pathways^[32]1,[33]2. The examination of these mutations in
   the context of pathways and gene sets is an essential issue in cancer
   genome research. However, an exhaustive search for driver pathways is
   impossible due to the enormous number of gene set candidates. Thus,
   prior knowledge, such as mutation patterns, is often used as a
   constraint to limit the scale of the gene set candidates. In
   particular, high coverage and mutual exclusivity (HCME), two
   combinatorial properties of driver mutations in a cellular signalling
   pathway or regulatory pathway^[34]2,[35]3, are being used as important
   prior knowledge in de novo discovery methods for driver gene sets
   (i.e., groups of mutated driver genes)^[36]4–[37]19. High coverage
   means that the members in the driver gene set recurrently occur in
   patient cohorts, and mutual exclusivity means that almost all the
   patients exhibit no more than one single driver mutation event in the
   driver gene set. For the developments of state-of-art discovery methods
   for cancer driver pathways, readers are referred to the latest survey
   by Zhang and Zhang^[38]20.

   The mathematical programming models for the de novo discovery of driver
   gene sets can be deduced from the HCME pattern. Vandin et al. developed
   the Dendrix algorithm, which investigates the optimal gene set by
   maximizing a HCME-derived score function^[39]4. The scoring function in
   Dendrix was formulated by the cardinalities of sets of patients and
   genes, and thus, the function was not sufficiently explicit for the
   optimization design. To this end, Zhao et al. further developed an
   explicit binary linear programming model and optimization framework,
   called MDPfinder, for the scoring system^[40]5. Zhao et al. initially
   introduced the genetic algorithm (GA)^[41]21 for this problem^[42]5.
   Leiserson et al. generalized Dendrix for the batch discovery of
   multiple driver gene sets^[43]6. Zhang et al. developed CoMDP to
   identify co-occurring driver gene sets^[44]7. Zhang et al. proposed
   ComMDP and SpeMDP to investigate common and specific driver gene sets
   among multiple cancer types, respectively^[45]8. In addition to the
   mathematical programming based de novo discovery methods, several
   probabilistic and statistical approaches have also been proposed. For
   example, Constantinescu et al. proposed TiMEx, a probabilistic
   generative model for the identification of mutually exclusive
   patterns^[46]17. Leiserson et al. proposed CoMEt for the identification
   of genes exhibiting mutual exclusivity^[47]18. Kim et al. proposed
   WeSME, a computational cost saving method for the permutation test in
   the discovery of mutual exclusivity^[48]19.

   The assumption of mutual exclusivity implies that a patient exhibits no
   more than one crucial mutation event. Thus, this assumption is strong
   for the discovery of the driver gene sets from the mutation data of a
   cohort of patients. As indicated by^[49]16, the application of such a
   strong assumption can lead to a highly unbalanced pattern, in which a
   single frequently mutated gene is coupled to several other infrequently
   mutated genes to satisfy the assumption of mutual exclusivity. By
   observing the mutation patterns in critical cancer driver pathways
   (Supplementary Fig. [50]1), we found that a gene that is mutated in
   many patients always overlaps with other genes. The coverage of an
   individual gene is positively correlated with its overlap with other
   genes in a pathway. On the basis of this fact, we proposed the
   following weak HCME pattern for discovering a driver gene set from a
   cohort of patients: (a) the members in the driver gene set recurrently
   occur in a patient cohort; (b) the members in the driver gene set
   approximately satisfy mutual exclusivity and (c) the overlaps should be
   adequately permitted and the members that cover many patients can have
   relatively more overlaps than the rarely mutated members. On the other
   hand, the mutation datasets used in the de novo discovery methods are
   commonly sparse, i.e., the total number of patients (samples) is
   smaller than that of genes (variables). Similar to other data-driven
   inference methods, the sparseness of datasets presents another
   challenge for ensuring the robustness of de novo discovery methods.

   Here, we introduce adaptively weighted and robust mathematical
   programming (AWRMP) for identifying driver gene sets that satisfy the
   weak HCME pattern. We constructed mathematical programming models using
   the cardinalities of sets of patients associated with the mutated genes
   as adaptive weights, for tuning the balance of importance between
   coverage and mutual exclusivity, to construct mathematical programming
   models. Motivated by^[51]5, GA^[52]21 was used as the basic
   optimization solver to efficiently solve the optimization problem. GA
   was embedded in a systematic subsampling strategy to obtain robust
   solutions against uncertainty and noise in the mutation data.
   Additionally, the subsampling approach can identify a parsimonious gene
   set, whose dimension can be considered a low bound for the dimension of
   the driver gene set in the sense of robustness. We applied our method
   to several biological datasets, and the results showed that our method
   identified rational driver gene sets. We then tested the significance
   of mutual exclusivity on our results using CoMEt^[53]18 and
   TiMEx^[54]17, and proved the robustness of AWRMP through a disturbance
   test.

Results

AWRMP workflow

   The AWRMP procedure can be divided into three modules as follows
   (Fig. [55]1). We first converted the mutation data into a binary-valued
   matrix A with m rows (samples) and n columns (genes). Each element
   A[ij] ∈ {0, 1} of A was defined as
   [MATH:
   <msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><mrow><mo stretchy="true">(</mo><mrow><mtable
   columnalign="left"><mtr columnalign="left"><mtd
   columnalign="left"><mn>1</mn></mtd><mtd columnalign="left"><mrow><mi
   mathvariant="normal">gene</mi><mspace
   width=".25em"></mspace><mi>j</mi><mspace width=".25em"></mspace><mi
   mathvariant="normal">is</mi><mspace width=".25em"></mspace><mi
   mathvariant="normal">mutated</mi><mspace width=".25em"></mspace><mi
   mathvariant="normal">in</mi><mspace width=".25em"></mspace><mi
   mathvariant="normal">sample</mi><mspace
   width=".25em"></mspace><mi>i</mi></mrow></mtd></mtr><mtr
   columnalign="left"><mtd columnalign="left"><mn>0</mn></mtd><mtd
   columnalign="left"><mrow><mi
   mathvariant="normal">otherwise</mi></mrow></mtd></mtr></mtable></mrow><
   /mrow><mn>.</mn> :MATH]
   1

Figure 1.

   [56]Figure 1
   [57]Open in a new tab

   Overview of AWRMP. (a) We constructed binary-valued mutation matrices
   from mutation data files. (b) We used subsampling to make our method
   robust against the uncertainty and noise in the data. (c) The optimal
   gene set was evaluated based on coverage and exclusivity scores and
   annotated to analyse the gene interactions using DAVID. (d) We proposed
   a new mathematical programming model that uses adaptive weights to tune
   the balance between the coverage and mutual exclusivity.

   We constructed a binary integer programming model on the basis of weak
   HCME, which is used to investigate optimal submatrix of A. Compared
   with Dendrix and its extensions, we embedded the adaptive weights to
   tune the balance between coverage and mutual exclusivity. GA was used
   as the optimization solver. We further integrated GA with a systematic
   subsampling strategy^[58]22 to eliminate the uncertainty and noise in
   the mutation data, and then annotated and evaluated the identified gene
   sets using DAVID^[59]23.

Correlations between coverage score and overlap contribution

   By observing the critical cancer driver pathways, we found that the
   coverage score of a mutated gene defined by formula ([60]19) and its
   overlap contribution defined by formula ([61]22) are highly positively
   correlated. For example, Supplementary Fig. [62]1 illustrates coMut
   plots of the somatic mutations in the apoptosis pathway obtained from
   the breast cancer (BC) mutation data^[63]24 and in the ErbB pathway
   obtained from glioblastoma (GBM) mutation data^[64]25. The two plots
   showed that the mutated genes approximately satisfied HCME. However,
   the genes with high coverage scores showed many overlaps with other
   genes, such as TP53, PIK3CA, EGFR, and PTEN in the two pathways.
   Figure [65]2 illustrates two scatter plots of the coverage score
   against the overlap contribution for all the genes in the two pathways.
   The correlation coefficient corresponding to the apoptosis pathway for
   BC was 0.9936; and that corresponding to the ErbB pathway for GBM was
   0.9975. Therefore, the proper overlaps should be considered to identify
   the driver gene sets from mutation data from a cohort of patients.

Figure 2.

   [66]Figure 2
   [67]Open in a new tab

   Scatter plots of the coverage score against the overlap contribution
   for (a) the apoptosis pathway of BC and (b) the ERBB pathway of GBM.

Identified driver gene set for lung adenocarcinoma

   Lung adenocarcinoma (LUAD) is the most common histological type of lung
   cancer. To illustrate the performance of AWRMP, we applied AWRMP to
   LUAD mutation data^[68]26, which was also previoustly used to test
   Dendrix^[69]4. The variable k denotes the gene set dimension that is
   pre-defined to be identified by AWRMP. The gene sets obtained from the
   LUAD data with k = 2, 3, …, 10 were investigated (Fig. [70]3). TP53,
   KRAS, EGFR, and STK11, which have relatively high mutation frequencies,
   were always included in the identified gene sets obtained with k values
   larger than 4 (Fig. [71]3(a)). The identified gene sets became nested
   with increasing values of k (Fig. [72]3(b)).

Figure 3.

   Figure 3
   [73]Open in a new tab

   Nested gene sets identified by AWRMP for the gene set dimensions k = 2,
   3, 4, …, 10. (a) Subsampling rates and coverage scores of genes
   obtained using Eqs ([74]13) and ([75]19) in Methods, respectively. (b)
   Elements of gene sets obtained with k = 2, 3, …, 10.

   For k = 2, the gene set (KRAS, TP53) was identified by AWRMP with a
   subsampling rate of 1 obtained using Eq. ([76]13). In contrast, the set
   (KRAS, EGFR) was identified by Dendrix^[77]4. For k = 3, the triplet
   (EGFR, KRAS, TP53) was the unique optimal gene set selected by AWRMP
   with a subsampling rate 1 calculated using Eq. ([78]13) in Methods.
   This gene set was mutated in 119 patients with an overlap score of
   0.7059, which was obtained using Eq. ([79]20) in Methods. Mutations in
   EGFR, KRAS, and TP53 are vital in lung cancer biology, and the
   molecular alterations associated with these mutation profiles have been
   widely investigated^[80]27. Note that the triplet (EGFR, KRAS, STK11)
   was obtained with Dendrix. This difference was obtained because TP53
   overlapped with the other two genes, and Dendrix ignored TP53 to ensure
   mutual exclusivity in its programming model.

   For k = 10, we found that the gene set (ABL1, EGFR, KRAS, MKNK2, NF1,
   PAK6, PTEN, STK11, TERT, TP53) was mutated in 145 patients
   (Fig. [81]4(a)). Through annotation using DAVID^[82]23, these genes
   were found to be involved in the ErbB, MAPK, and PI3K-Akt signalling
   pathways, which are known to be critical in LUAD. Based on the
   knowledge of these pathways, we observed that most genes in this set
   involve interactions (Fig. [83]4(b)). The subset (KRAS, EGFR, STK11,
   PTEN, TP53) covering 133 patients is a subset of the PI3K-Akt
   signalling pathway, and PI3K-Akt pathway mutations involved in
   tumourigenesis have been reported for LUAD^[84]28. Various treatments
   aiming to inhibit lung cancer cell proliferation, migration and
   invasion through the PI3K-Akt pathway have been developed^[85]29. The
   subset (KRAS, MKNK2, EGFR, NF1, TP53), which constitutes a subset of
   the MAPK pathway, plays a pivotal role in cell proliferation,
   differentiation and survival^[86]30. MAPK signal amplification
   contributes to the rapid progression of established adenomas to LUAD
   and takes effect during both malignant progression and tumour
   initiation^[87]31,[88]32. The subset (KRAS, MKNK2, EGFR, NF1)
   overlapped in five patients, whereas the subset (KRAS, MKNK2, EGFR,
   NF1, TP53) overlapped in 44 patients. This finding indicated that TP53
   showed little mutual exclusivity with the other four genes. Whereas the
   remaining genes KRAS, MKNK2, EGFR, and NF1 exhibited highly mutual
   exclusivity. The subset (ABL1, EGFR, KRAS, PAK6) which was mutated in
   94 patients, forms part of the ErbB signalling pathway, which involves
   a family of tyrosine kinases and has been confirmed to be vital for the
   development of LUAD^[89]33,[90]34. All the genes in this subset exhibit
   highly mutual exclusivity scores. Although TERT was not annotated in
   the aforementioned pathways, it has been found to be the most frequent
   genetic event in the early stages of non-small cell lung cancer^[91]35.

Figure 4.

   [92]Figure 4
   [93]Open in a new tab

   Optimal gene set identified by AWRMP for the LUAD dataset (k = 10). (a)
   Coverage plot of the optimal gene set. (b) Interaction of the genes in
   the optimal set annotated by knowledge of known pathways.

   To date, no explicit method has been developed to determine the
   dimension of the driver gene set identified by de novo discovery
   method. However, based on the subsampling strategy in AWRMP, we
   calculated the subsampling rate of each gene using Eq. ([94]16) in
   Methods. Consequently, according to the subsampling rates, the subset
   (EGFR, KRAS, TP53, STK11, NF1) can be considered a parsimonious set
   that shows robustness against the uncertainty and noise in the data.
   The dimension of the parsimonious set can be considered a lower bound
   for the dimension of the driver gene set.

Performance of AWRMP

   Figure [95]5 shows a scatter plot of the coverage score against the
   overlap contribution for the optimal gene set. As shown, the optimal
   gene set identified by AWRMP shows a similar pattern with the mutation
   pattern of the well-known cancer driver pathways illustrated in
   Fig. [96]2. This fact confirmed that our adaptive weights in Eq.
   ([97]8) worked well, and this adaptiveness allowed us to identify
   useful overlaps. For example, the co-mutation (overlaps) of TP53 and
   NF1 has been known to be the feature of the PI subtype of LUAD^[98]28.

Figure 5.

   [99]Figure 5
   [100]Open in a new tab

   Scatter plot of coverage score against overlap contribution for the
   optimal gene set.

   To illustrate the robustness of AWRMP, we artificially disturbed some
   elements A[ij]s of the mutation matrix A, by randomly turning the value
   0 to 1 or randomly turning the value 1 to 0, to generate 100 new
   mutation matrices, and AWRMP was then performed for each disturbed
   mutation matrix. Consequently, the numbers of times that the candidate
   genes were selected by AWRMP with all 100 disturbed mutation matrices
   were used to evaluate the robustness of the proposed method
   (Fig. [101]6).

Figure 6.

   [102]Figure 6
   [103]Open in a new tab

   Number of times that the optimal gene set (ABL1, EGFR, KRAS, MKNK2,
   NF1, PAK6, PTEN, STK11, TERT, TP53) and its elements were identified by
   AWRMP with the 100 disturbed mutation matrices. (a) Percentages of the
   100 disturbed mutation matrices obtained by tuning values of 1 into 0
   (blue) and by tuning values of 0 into 1 (green). (b) Number of times
   that the 10 genes in the optimal gene set were identified with all 100
   disturbed mutation matrices.

   We conducted the disturbance test for d = 10 and 50, where d denotes
   the total number of disturbed elements in the mutation matrix A. The
   same optimal gene set (ABL1, EGFR, KRAS, MKNK2, NF1, PAK6, STK11, TERT,
   PTEN, TP53) was always identified for d = 10 in the both disturbance
   schemes (Fig. [104]6(a)). Increasing the value of d to 50, decreased
   the percentage of the 100 disturbed mutation matrices obtained by
   tuning values of 1 to 0 that yielded the optimal gene set to 27% and
   the percentage of the 100 disturbance schemes obtained by tuning values
   of 0 to 1 that yielded the optimal gene set to 30%. As expected, the
   robustness of AWRMP degenerated with increases in d. By observing the
   number of the times that ten genes of the optimal gene set were
   identified in the 100 runs of the disturbance test, we found that the
   subset (EGFR, KRAS, STK11, TP53, NF1) was always identified, even for d
   = 50 (Fig. [105]6(b)). This subset was thus the parsimonious set
   identified according to the subsampling rates. The total numbers of
   samples harboring these five genes were 30, 60, 34, 64, and 13,
   respectively. Thus the genes with relatively high coverage endured the
   disturbance. Furthermore, TERT showed the most sensitivity to the
   disturbance, even though it did not show the lowest observed mutation
   frequency. Moreover, TERT was not involved in any pathway detected by
   AWRMP (Fig. [106]4). This finding implies that TERT is slightly
   different from the other nine genes due to its weak HCME pattern. The
   results of the disturbance tests for other related methods are shown in
   Supplementary Fig. [107]2.

   In addition to the robustness analysis, we also performed statistical
   significance tests using CoMEt^[108]18 and TiMEx^[109]17, and the
   results are depicted in Table [110]1. The optimal gene set identified
   by AWRMP can be considered to be significant for mutual exclusivity.

Table 1.

   Pathway enrichment analysis and assessment of the statistical
   significance of the optimal gene set for LUAD identified by AWRMP from
   LUAD mutation data.
            Genes                    Pathway (q-value)            CoMEt   TiMEx
 KRAS, EGFR, TP53MKNK2, NF1 MAPK signalling pathway(2.00e-3)     0.02
   6.45e-7
 KRAS, EGFR, ABL1PAK6       ErbB signalling pathway(2.10e-3)     4.27e-8 1.57e-7
 KRAS, EGFR,STK11PTEN, TP53 PI3K-Akt signalling pathway(4.70e-3) 0.05
   3.34e-6
   [111]Open in a new tab

Parsimonious sets identified from breast cancer and glioblastoma datasets

   In addition to the LUAD data, we also applied AWRMP to mutation
   datasets, including BC mutation data^[112]24 and GBM mutation
   data^[113]25.

   For the BC mutation data, AWRMP identified the parsimonious set (AKT1,
   BRCA2, GATA3, MAP3K1, PIK3CA TP53, RGS1(A), where “(A)” refers to
   amplification) with a high coverage score of 0.86 and a low overlap
   score of 0.45 (Supplementary Fig. [114]3). Among these genes, BRCA2
   truncating mutations have been associated with an increased risk of
   BC^[115]36. GATA3 has been identified as a prognostic marker for
   BC^[116]37. The genes (AKT1, MAP3K1, PIK3CA) are associated with the
   abrogation of JUN kinase signalling, which occurs in approximately half
   of BC patients^[117]38. The biological consequences of a reduction in
   JUN kinase activity in response to stress might include destabilization
   and consequent inactivation of TP53 and thereby disruption of
   pro-apoptotic cellular signalling^[118]39. Thus, the co-mutations in
   the parsimonious set obtained by the adaptiveness of AWRMP are
   reasonable. The relation between RGS1 mutation and BC has been
   discovered in^[119]40.

   From the GBM mutation data, AWRMP identified the parsimonious set
   (EGFR, NF1, PIK3CA, PIK3R1, PTEN, GABRA6, TP53) with a coverage score
   of 0.70 and an overlap score of 0.30 (Supplementary Fig. [120]4). Among
   these genes, NF1 is a human glioblastoma suppressor gene^[121]41, and
   patients harbouring NF1 mutation or deletion tended to show decreased
   PKC pathway activity and elevated MAP kinase activity^[122]25. GABRA6,
   an inhibitory neurotransmitter in the mammalian brain, contributes to
   coding for a transmembrane polymorphic antigen glycoprotein^[123]25.
   The subset (EGFR, PIK3CA, PIK3R1, PTEN, TP53) is part of the PI3K
   signalling pathway, and 62% of the glioblastoma samples harboured at
   least one genetic event associated with this subset. The PI3K-Akt
   signalling pathway plays an important role in the regulation of signal
   transduction, which mediates various biological processes, including
   cell proliferation, apoptosis, metabolism, motility and angiogenesis in
   GBM^[124]42.

Discussion

   By observing the mutation patterns in cancer driver pathways from
   practical mutation datasets, we found the following: (a) the HCME
   pattern was approximately satisfied by the genes in the driver pathways
   and (b) overlaps were always observed, particularly among the genes
   with high coverage scores. For this reason, we proposed that the HCME
   pattern should be weakened by allowing proper overlaps in the discovery
   of driver gene sets. We developed AWRMP to identify the driver gene
   sets in cancer from mutation data. Ultimately, the goal of this
   approach is to investigate the gene sets that adaptively satisfy the
   weak HCME pattern. Moreover, by considering the sparsity of the
   mutation data, AWRMP can endure the potential uncertainty and noise in
   the data using the subsampling method. Here, we tested the performance
   of AWRMP using several biological datasets.

   Driver mutations have often been investigated by observing the
   recurrence of individual genes^[125]43,[126]44. However, mutational
   heterogeneity complicates the identification of functional mutations
   due to the recurrence of individual genes across many samples. As an
   alternative, an investigation of the putative driver gene set found
   across patients, has been proven to be another feasible approach. It is
   obvious that increases in the dimension of gene sets increases the
   monotonic coverage. For this reason, it becomes necessary to utilise
   constraints derived from biological knowledge. Notably, the mutual
   exclusivity of the pathways was used in combination with coverage to
   investigate driver gene sets. As noted by^[127]6, the driver pathways
   exhibiting the HCME patterns are generally smaller and more focused
   than most pathways annotated in the databases.

   Figure [128]2 shows two examples of mutation patterns in cancer driver
   pathways, and these show that the coverage scores of the gene members
   are positively correlated with the overlap contributions. The
   information provided in Supplementary Fig. [129]6 suggests that this
   positive correlation can be generally observed in all mutated gene
   sets, not just in cancer driver pathways. Thus, when investigating
   cancer driver gene sets, the genes covering many patients should be
   allowed to exhibit more overlaps with other genes. For this reason, we
   claimed that the weak HCME pattern is more feasible for describing the
   mutation patterns in cancer driver pathways. According to the weak HCME
   pattern, we proposed the use of adaptive weights in AWRMP. Because of
   the adaptive weights included in our programming model, our results
   were different from those obtained with Dendrix^[130]4 (Supplementary
   Table [131]1), MDPfinder^[132]5 (Supplementary Table [133]2),
   Mutex^[134]13 (Supplementary Table [135]3), and CoMDP^[136]7
   (Supplementary Table [137]4), all of which assign identical weights to
   all gene candidates. The analysis of LUAD mutation data using our
   method included TP53 with a high coverage score in the final result.
   Because CoMEt and TiMEx were proposed based on the rigorously mutual
   exclusivity, these four related methods showed better scores than AWRMP
   (Supplementary Tables [138]7–[139]10). However, the optimal gene set
   obtained by AWRMP still passed the permutation test of mutual
   exclusivity performed using CoMEt and TiMEx. In other words, the gene
   set identified by our method satisfied the mutual exclusivity, although
   our method permits more overlaps than other related methods.
   Furthermore, the overlaps identified by our AWRMP can be useful, like
   the overlaps between TP53 and NF1 identified for the LUAD data set. We
   do not claim that our method is better than other related approaches
   for the identification of TP53. After all, frequently mutated genes
   individuals can be identified using MutsigCV^[140]43. Our proposal is
   that the results obtained by AWRMP are more concordant to the objective
   mutation pattern, i.e., weak HCME, as demonstrated in Figs [141]2 and
   [142]5. Supplementary Fig. [143]5 shows the correlation between the
   coverage score and the overlap contribution of the optimal gene sets
   obtained by the other four methods, and these findings showed that
   these four methods did not satisfy the weak HCME pattern as well as our
   method. Note that ComMDP can also identify genes with high mutation
   frequencies, such as TP53 and PIK3CA^[144]8. However, ComMDP was
   proposed for the identification of the common driver gene set across
   several types of cancer by combining their mutation matrices. Based on
   the mathematical programming model^[145]8, ComMDP is identical to
   MDPfinder for a single type of cancer.

   In AWRMP, the optimization solver GA was embedded into the subsampling
   strategy to ensure the robustness of the algorithm. Prior to this
   study, the robustness of de novo discovery methods has seldom been
   considered. Nevertheless, the mutation matrices used as the inputs in
   these methods were always derived from high-throughput sequencing data,
   which are well known to be noisy. Furthermore, the total number of
   samples is notably much smaller than the number of genes. The use of
   sparse data always leads to statistical inference that is not robust to
   noise and uncertainty. The disturbance tests of Dendrix, MDPfinder,
   CoMDP, and, Mutex (Supplementary Tables [146]5 and [147]6, and
   Supplementary Fig. [148]2) revealed that a single run of the MCMC
   method and integer linear programming method were not robust to the
   disturbance. Because the subsampling strategy is always applied to
   estimate the precision of sample statistics, we adopted the subsampling
   method to compute the probabilities of gene sets obtained by the
   optimization solver. Consequently, the gene sets with high
   probabilities can be considered robust results. Because the adaptive
   weight defined by Eq. ([149]8) is a nonlinear function of I[M](j)
   defined by the Eq. ([150]3), the programming model ([151]6) is no
   longer a linear programming model. Motivated by^[152]5, the heuristic
   GA was used in AWRMP. As a type of combinatorial optimization model,
   the mathematical programming model defined by formula ([153]8) often
   consists of multiple solutions. AWRMP can offer the robustness level
   for each solution based on the subsampling strategy.

   Through AWRMP, we propose that the gene candidates should be assigned
   different levels of importance based on the weak HCME pattern. In
   addition to the weights derived from the coverage scores obtained by
   AWRMP, the covariates associated with mutations, such as the expression
   level of genes and the DNA replication time of genes used in
   MutsigCV^[154]43, can also be considered weights. The application of
   subsampling can assuredly increase the computational cost. However, we
   insist that the robust results obtained from sparse data need to be
   cautiously investigated.

Methods

Cancer genetic data and mutation matrix

   We directly used the mutation matrix derived from LUAD mutation data by
   Dendrix^[155]4, which included 163 patients with at least one mutated
   gene and 356 genes mutated in at least one patient.

   The BC and GBM mutation datasets (maf files) were downloaded from The
   Cancer Genome Atlas Data Portal ([156]http://tcga-data.nci.nih.gov),
   and these datasets consider point mutations and copy number alterations
   (CNAs). Somatic point mutations were identified with MutsigCV^[157]43.
   The corresponding entry in the mutation matrix was assigned a value of
   1 to indicate significant point mutation. Using the approach described
   in^[158]16, if a CNA event is concordant with the expression data, the
   corresponding entry in the mutation matrix is 1. After pre-processing,
   487 samples and 274 genes were included in the BC mutation matrix and
   282 samples and 308 genes were included in the GBM mutation matrix.

Previous methods

   For the mutation matrix A defined by Eq. ([159]1), which has m rows
   (samples) and n columns (gene candidates), Dendrix initially proposed
   the following programming model for the identification of an m × k
   optimal submatrix M that satisfies the HCME pattern
   [MATH: <mi>W</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo><mo>≡</mo><mrow><mo
   stretchy="true">|</mo><mrow><mi mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow><mo>−</mo><mi>ω</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo><mo>=</mo><mn>2</mn><mrow><mo
   stretchy="true">|</mo><mrow><mi mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow><mo>−</mo><mrow><munder><mo>∑</mo><mrow><m
   i>g</mi><mo>∈</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></mrow><
   /msub></mrow></munder></mrow><mrow><mo stretchy="true">|</mo><mrow><mi
   mathvariant="normal">Γ</mi><mo stretchy="false">(</mo><mi>g</mi><mo
   stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow><mo>,</mo> :MATH]
   2

   where G[M] denotes the set of genes corresponding to the mutation
   matrix M, Γ(g) ≡ {i: A[ig] = 1} denotes the set of patients who
   presented mutations in gene g. g, g′ ∈ G[M] are mutually exclusive, if
   Γ(g) ∩ Γ(g′) = ∅. The sum of the cardinalities
   [MATH:
   <msub><mo>∑</mo><mrow><mi>g</mi><mo>∈</mo><msub><mrow><mi>G</mi></mrow>
   <mrow><mi>M</mi></mrow></msub></mrow></msub><mrow><mo
   stretchy="true">|</mo><mrow><mi mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><mi>g</mi><mo stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow> :MATH]
   denotes the total number of mutation events in M.
   [MATH: <mi mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo
   stretchy="false">)</mo><mo>≡</mo><msub><mrow><mo>∪</mo></mrow><mrow><mi
   >g</mi><mo>∈</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></mrow></
   msub></mrow></msub><mi mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><mi>g</mi><mo stretchy="false">)</mo> :MATH]
   is the set of patients with mutations in the genes in M, and its
   cardinality |Γ(G[M])| can be further used to measure the coverage of
   the submatrix M. Thus, the coverage overlap
   [MATH: <mi>ω</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo
   stretchy="false">)</mo><mo>≡</mo><msub><mo>∑</mo><mrow><mi>g</mi><mo>∈<
   /mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></mrow></msub></mrow><
   /msub><mrow><mo stretchy="true">|</mo><mrow><mi
   mathvariant="normal">Γ</mi><mo stretchy="false">(</mo><mi>g</mi><mo
   stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow><mo>−</mo><mrow><mo
   stretchy="true">|</mo><mrow><mi mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow> :MATH]
   can be used to measure exclusivity.

   By noticing that the formula ([160]2) is not easy for developing the
   optimization strategy, Zhang et al.^[161]5 initially defined two
   indicator functions
   [MATH: <msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi></mrow></msub><mo
   stretchy="false">(</mo><mi>j</mi><mo
   stretchy="false">)</mo><mo>≡</mo><mrow><mo
   stretchy="true">(</mo><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd columnalign="left"><mn>1</mn></mtd><mtd
   columnalign="left"><mrow><mi>j</mi><mo>∈</mo><msub><mrow><mi>G</mi></mr
   ow><mrow><mi>M</mi></mrow></msub></mrow></mtd></mtr><mtr
   columnalign="left"><mtd columnalign="left"><mn>0</mn></mtd><mtd
   columnalign="left"><mrow><mspace width=".1em"></mspace><mi
   mathvariant="normal">otherwise</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr></mtable></mrow></mrow> :MATH]
   3

   for j = 1, 2, …, n and
   [MATH: <msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></mrow></msub><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo><mo>≡</mo><mrow><mo
   stretchy="true">(</mo><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd columnalign="left"><mn>1</mn></mtd><mtd
   columnalign="left"><mrow><mi mathvariant="normal">genes</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">in</mi><mspace
   width=".5em"></mspace><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></mr
   ow></msub><mspace width=".5em"></mspace><mi
   mathvariant="normal">are</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">mutated</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">in</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">patient</mi><mspace
   width=".5em"></mspace><mi>i</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr><mtr columnalign="left"><mtd
   columnalign="left"><mn>0</mn></mtd><mtd
   columnalign="left"><mrow><mspace width=".1em"></mspace><mi
   mathvariant="normal">otherwise</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr></mtable></mrow></mrow> :MATH]
   4

   for i = 1, 2, …, m, and reformulated the maximization of W(G[M]) as
   MDPfinder, which is a binary linear programming (BLP) problem:
   [MATH: <mtable columnalign="left"><mtr><mtd><munder><mrow><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">x</mi></mrow><mrow><mo
   stretchy="false">{</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">(</mo><mi>j</mi><mo
   stretchy="false">)</mo><mo
   stretchy="false">|</mo><mi>j</mi><mo>=</mo><mn>1,</mn><mo>⋯</mo><mo>,</
   mo><mi>n</mi><mo stretchy="false">}</mo></mrow></munder><mspace
   width=".5em"></mspace><mspace width=".5em"></mspace><mi>W</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo
   stretchy="false">)</mo><mo>=</mo><mn>2</mn><munderover><mo>∑</mo><mrow>
   <mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></munderover><msub><mrow
   ><mi>I</mi></mrow><mrow><mi>i</mi></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>M</m
   i></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo>−</mo><munderover><mo>∑</mo><mrow><mi>
   j</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>I</mi></mrow><mrow><mi>M</m
   i></mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow><mo>⋅</mo><munderover><mo>∑</mo><mrow><mi>
   i</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></munderover><msub><mrow><mi
   >A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow></mtd></mtr><mtr><mtd><mi>s</mi><mn>.</mn>
   <mi>t</mi><mn>.</mn><mrow><mo stretchy="true">{</mo><mrow><mtable
   columnalign="left"><mtr columnalign="left"><mtd
   columnalign="right"><mrow><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi>
   </mrow></msub><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow></mtd><mtd
   columnalign="center"><mo>≤</mo></mtd><mtd
   columnalign="left"><mrow><mrow><mo
   stretchy="true">(</mo><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=
   </mo><mn>1</mn></mrow><mi>n</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi
   >j</mi></mrow></msub><mo>⋅</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M
   </mi></mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow></mrow><mo
   stretchy="true">)</mo></mrow><mo>,</mo><mspace
   width="1em"></mspace><mspace width="1em"></mspace><mi
   mathvariant="normal">for</mi><mspace
   width=".5em"></mspace><mi>i</mi><mo>=</mo><mn>1,</mn><mspace
   width=".25em"></mspace><mo>⋯</mo><mo>,</mo><mspace
   width=".25em"></mspace><mi>m</mi><mo>;</mo><mspace
   width=".25em"></mspace><mi>j</mi><mo>=</mo><mn>1,</mn><mspace
   width=".5em"></mspace><mo>⋯</mo><mo>,</mo><mspace
   width=".25em"></mspace><mi>n</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr><mtr columnalign="left"><mtd
   columnalign="right"><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</
   mo><mn>1</mn></mrow><mi>n</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">(</mo><mi>j</mi><mo
   stretchy="false">)</mo></mrow></mtd><mtd
   columnalign="center"><mo>=</mo></mtd><mtd
   columnalign="left"><mi>k</mi></mtd></mtr></mtable></mrow></mrow></mtd><
   /mtr></mtable> :MATH]
   5

Mathematical programming model of AWRMP

   In the mathematical programming model ([162]5), W(G[M]) is divided into
   two parts:
   [MATH:
   <msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi
   ></msubsup><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></m
   row></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>M</m
   i></mrow></msub></mrow><mo stretchy="true">)</mo></mrow> :MATH]
   measures the coverage using the sum with respect to patient i and the
   second term
   [MATH:
   <msubsup><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi
   ></msubsup><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>I</mi></mrow><mrow><mi>M</m
   i></mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow><mo>⋅</mo><msubsup><mo>∑</mo><mrow><mi>i</
   mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></msubsup><mspace
   width=".25em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi
   >j</mi></mrow></msub></mrow><mo stretchy="true">)</mo></mrow> :MATH]
   is the total number of mutation events (i.e., entries with a value of
   “1”) in the mutation matrix. The latter term indicates that MDPfinder
   assigns identical weights to all the genes. As we mentioned before,
   coverage is more important than exclusivity for the genes involved in
   multiple pathways. Consequently, we improve the mathematical
   programming model by assigning different weights to the genes contained
   in G[M], i.e.,
   [MATH: <mtable columnalign="left"><mtr columnalign="left"><mtd
   columnalign="right"><mrow><msub><mrow><mi>W</mi></mrow><mrow><mi>λ</mi>
   </mrow></msub><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow></mtd><mtd
   columnalign="center"><mo>≡</mo></mtd><mtd
   columnalign="left"><mrow><mrow><mo stretchy="true">|</mo><mrow><mi
   mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow><mo
   stretchy="true">|</mo></mrow><mo>−</mo><msub><mrow><mi>ω</mi></mrow><mr
   ow><mi>λ</mi></mrow></msub><mo stretchy="false">(</mo><mi>M</mi><mo
   stretchy="false">)</mo></mrow></mtd></mtr><mtr columnalign="left"><mtd
   columnalign="right"></mtd><mtd
   columnalign="center"><mo>=</mo></mtd><mtd
   columnalign="left"><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mn>=1</
   mn></mrow><mi>m</mi></munderover><msub><mrow><mi>I</mi></mrow><mrow><mi
   >i</mi></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>M</m
   i></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo>−</mo><mrow><mo
   stretchy="true">(</mo><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mn>=
   1</mn></mrow><mi>n</mi></munderover><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>λ</mi></mrow><mrow><mi>j</m
   i></mrow></msub><mo>⋅</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi>
   </mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow><mo>⋅</mo><munderover><mo>∑</mo><mrow><mi>
   i</mi><mn>=1</mn></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi
   >j</mi></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo>−</mo><munderover><mo>∑</mo><mrow><mi>
   i</mi><mn>=1</mn></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></m
   row></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>M</m
   i></mrow></msub></mrow><mo stretchy="true">)</mo></mrow></mrow><mo
   stretchy="true">)</mo></mrow></mrow></mtd></mtr><mtr
   columnalign="left"><mtd columnalign="right"></mtd><mtd
   columnalign="center"><mo>=</mo></mtd><mtd
   columnalign="left"><mrow><mn>2</mn><munderover><mo>∑</mo><mrow><mi>i</m
   i><mn>=1</mn></mrow><mi>m</mi></munderover><msub><mrow><mi>I</mi></mrow
   ><mrow><mi>i</mi></mrow></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>M</m
   i></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo>−</mo><mrow><mo
   stretchy="true">(</mo><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=
   </mo><mn>1</mn></mrow><mi>n</mi></munderover><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>λ</mi></mrow><mrow><mi>j</m
   i></mrow></msub><mo>⋅</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi>
   </mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow><mo>⋅</mo><munderover><mo>∑</mo><mrow><mi>
   i</mi><mn>=1</mn></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi
   >j</mi></mrow></msub></mrow><mo stretchy="true">)</mo></mrow></mrow><mo
   stretchy="true">)</mo></mrow></mrow></mtd></mtr></mtable> :MATH]
   6

   where
   [MATH:
   <msub><mrow><mi>λ</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>≡</mo><m
   row><mo stretchy="true">(</mo><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd
   columnalign="left"><mrow><mfrac><mrow><mi>exp</mi><mo
   stretchy="false">(</mo><mspace
   width="-0.25em"></mspace><mo>−</mo><mspace
   width="-0.25em"></mspace><mrow><mo stretchy="true">|</mo><mrow><mi
   mathvariant="normal">Γ</mi><mo stretchy="false">(</mo><mi>j</mi><mo
   stretchy="false">)</mo></mrow><mo stretchy="true">|</mo></mrow><mo
   stretchy="false">)</mo></mrow><mrow><mrow><munder><mo>∑</mo><mrow><mi>r
   </mi><mo>∈</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></mrow></ms
   ub></mrow></munder></mrow><mspace
   width=".25em"></mspace><mi>exp</mi><mo stretchy="false">(</mo><mspace
   width="-0.25em"></mspace><mo>−</mo><mspace
   width="-0.25em"></mspace><mrow><mo stretchy="true">|</mo><mrow><mi
   mathvariant="normal">Γ</mi><mo stretchy="false">(</mo><mi>r</mi><mo
   stretchy="false">)</mo></mrow><mo stretchy="true">|</mo></mrow><mo
   stretchy="false">)</mo></mrow></mfrac></mrow></mtd><mtd
   columnalign="left"><mrow><mi>j</mi><mo>∈</mo><msub><mrow><mi>G</mi></mr
   ow><mrow><mi>M</mi></mrow></msub></mrow></mtd></mtr><mtr
   columnalign="left"><mtd columnalign="left"><mn>0</mn></mtd><mtd
   columnalign="left"><mrow><mspace width=".1em"></mspace><mi
   mathvariant="normal">otherwise</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr></mtable></mrow></mrow> :MATH]
   7

   is the weight assigned to gene j for j = 1, 2, …, n. For all j ∈ G[M],
   λ [j]∈ (0, 1) and
   [MATH:
   <msub><mo>∑</mo><mrow><mi>j</mi><mo>∈</mo><msub><mrow><mi>G</mi></mrow>
   <mrow><mi>M</mi></mrow></msub></mrow></msub><mspace
   width=".25em"></mspace><msub><mrow><mi>λ</mi></mrow><mrow><mi>j</mi></m
   row></msub><mo>=</mo><mn>1</mn> :MATH]
   . For gene j, λ [j]∈ (0, 1) makes coverage slightly more important than
   mutual exclusivity, and introduces overlaps with other genes.
   [MATH:
   <msub><mo>∑</mo><mrow><mi>j</mi><mo>∈</mo><msub><mrow><mi>G</mi></mrow>
   <mrow><mi>M</mi></mrow></msub></mrow></msub><mspace
   width=".25em"></mspace><msub><mrow><mi>λ</mi></mrow><mrow><mi>j</mi></m
   row></msub><mo>=</mo><mn>1</mn> :MATH]
   allows the frequently mutated genes to have more overlaps than the
   rarely mutated genes in a gene set. In the case of λ[j] → 1 for |Γ(j)|,
   mutual exclusivity is tuned to be as important as coverage. Using this
   approach, the balance between coverage and exclusivity can be
   adaptively adjusted for various genes with respect to the cardinality
   |Γ(j)|. For this reason, λ[j] is called as adaptive weight.
   Consequently, the AWRMP programming model can be summarized as the
   following
   [MATH: <mtable columnalign="left"><mtr><mtd><munder><mrow><mi
   mathvariant="normal">max</mi></mrow><mrow><mo
   stretchy="false">{</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">(</mo><mi>j</mi><mo
   stretchy="false">)</mo><mo
   stretchy="false">|</mo><mi>j</mi><mn>=1,</mn><mo>⋯</mo><mo>,</mo><mi>n<
   /mi><mo stretchy="false">}</mo></mrow></munder><mspace
   width=".5em"></mspace><mspace
   width=".5em"></mspace><msub><mrow><mi>W</mi></mrow><mrow><mi>λ</mi></mr
   ow></msub><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mn>)=2</mn><munderover><mo>∑</mo><mrow><mi>i</mi><mn>=1</mn
   ></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></m
   row></msub><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>M</m
   i></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo>−</mo><munderover><mo>∑</mo><mrow><mi>
   j</mi><mn>=1</mn></mrow><mi>n</mi></munderover><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>λ</mi></mrow><mrow><mi>j</m
   i></mrow></msub><mo>⋅</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi>
   </mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow><mo>⋅</mo><munderover><mo>∑</mo><mrow><mi>
   i</mi><mn>=1</mn></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi
   >j</mi></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow></mtd></mtr><mtr><mtd><mi>s</mi><mn>.</mn>
   <mi>t</mi><mn>.</mn><mspace width=".25em"></mspace><mrow><mo
   stretchy="true">{</mo><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd
   columnalign="right"><mrow><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi>
   </mrow></msub><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">)</mo></mrow></mtd><mtd
   columnalign="center"><mo>≤</mo></mtd><mtd
   columnalign="left"><mrow><mrow><mo
   stretchy="true">(</mo><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=
   </mo><mn>1</mn></mrow><mi>n</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi
   >j</mi></mrow></msub><mo>⋅</mo><msub><mrow><mi>I</mi></mrow><mrow><mi>M
   </mi></mrow></msub><mrow><mo stretchy="true">(</mo><mi>j</mi><mo
   stretchy="true">)</mo></mrow></mrow><mo
   stretchy="true">)</mo></mrow><mspace width=".1em"></mspace><mspace
   width="1em"></mspace><mspace width="1em"></mspace><mi
   mathvariant="normal">for</mi><mspace
   width=".5em"></mspace><mi>i</mi><mo>=</mo><mn>1,</mn><mspace
   width=".25em"></mspace><mo>⋯</mo><mo>,</mo><mspace
   width=".25em"></mspace><mi>m</mi><mo>;</mo><mspace
   width=".25em"></mspace><mi>j</mi><mo>=</mo><mn>1,</mn><mspace
   width=".5em"></mspace><mo>⋯</mo><mo>,</mo><mspace
   width=".25em"></mspace><mi>n</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr><mtr columnalign="left"><mtd
   columnalign="right"><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mn>=1<
   /mn></mrow><mi>n</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">(</mo><mi>j</mi><mo
   stretchy="false">)</mo></mrow></mtd><mtd
   columnalign="center"><mo>=</mo></mtd><mtd
   columnalign="left"><mi>k</mi></mtd></mtr></mtable></mrow></mrow></mtd><
   /mtr></mtable> :MATH]
   8

Setting up of GA

   According to Eq. ([163]7), λ[j] is a nonlinear function of I[M](j),
   which indicates that the AWRMP optimization model ([164]8) is a
   nonlinear programming (NLP) model. According to the MDPfinder
   solver^[165]5, we used the metaheuristic GA method as the NLP solver.
   The settings of the GA are as follows:

GA search space

   The genes in A were labeled as 1, 2, …, n. According to Eqs ([166]3)
   and ([167]8), a binary-valued vector x ≡ [x[1], x[2], …, x[n]]^Τ is
   used as an individual of a population, in which x[i] ∈ {0, 1}
   characterizes the i-th gene in submatrix M. Thus, the GA search space
   is as follows:
   [MATH: <mi>S</mi><mo>=</mo><mrow><mo stretchy="true">{</mo><mrow><mi
   mathvariant="italic">x</mi><mrow><mo
   stretchy="true">|</mo><mrow><msub><mrow><mi>x</mi></mrow><mrow><mi>i</m
   i></mrow></msub><mo>∈</mo><mn>{0</mn><mo>,</mo><mspace
   width=".1em"></mspace><mn>1}</mn><mspace width=".25em"></mspace><mi
   mathvariant="normal">for</mi><mspace
   width=".5em"></mspace><mi>i</mi><mo>=</mo><mn>1</mn><mo>,</mo><mspace
   width=".1em"></mspace><mn>2,</mn><mspace
   width=".25em"></mspace><mo>⋯</mo><mo>,</mo><mspace
   width=".25em"></mspace><mi>n</mi><mo>,</mo><mspace
   width=".25em"></mspace><munderover><mo>∑</mo><mrow><mi>i</mi><mn>=1</mn
   ></mrow><mi>n</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>x</mi></mrow><mrow><mi>i</mi></m
   row></msub><mo>=</mo><mo
   stretchy="false">|</mo><msub><mrow><mi>G</mi></mrow><mrow><mi>M</mi></m
   row></msub><mo stretchy="false">|</mo></mrow></mrow></mrow><mo
   stretchy="true">}</mo></mrow><mo>.</mo> :MATH]
   9

GA fitness function

   In a GA, the fitness function is used to evaluate the quality of
   individual s[j] ∈ S. In AWRMP, we ranked each individual solution s[j]
   with respect to
   [MATH: <msub><mrow><mi>W</mi></mrow><mrow><mi>λ</mi></mrow></msub><mo
   stretchy="false">(</mo><msub><mrow><mi>G</mi></mrow><mrow><msub><mrow><
   mi>M</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow></msub><mo
   stretchy="false">)</mo> :MATH]
   obtained by the programming model ([168]8), in which M[j] is the
   submatrix corresponding to s[j]. The ranked result, denoted by r[j], is
   used to evaluate the fitness of s[j].

GA operations

   Selection, crossover, and mutation are three basic operators of GA. To
   distinguish from the above-mentioned mutation, we denoted the
   ‘mutation’ operator as ‘GA_mutation’. For individual s[j] and rank r[j]
   of each individual s[j] based on the fitness value, the selection
   probability was defined as
   [MATH:
   <msub><mrow><mi>p</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>=</mo><m
   frac><mrow><mn>2</mn><msub><mrow><mi>r</mi></mrow><mrow><mi>j</mi></mro
   w></msub></mrow><mrow><mi>n</mi><mo
   stretchy="false">(</mo><mi>n</mi><mo>+</mo><mn>1)</mn></mrow></mfrac>
   :MATH]
   10

   where n is the population size.

   The detailed GA procedure is provided in the supplementary information.

Integrating GA with subsampling

   Robustness means that the algorithm can give identical results for
   various datasets with high probability. Through the use of subsampling,
   AWRMP investigates probabilities of the gene sets selected by the GA.
   We used a leave-one-out subsampling strategy to obtain n subsamples
   A[i−] for i = 1, 2, …, m, in which A[i−] was obtained by removing the
   ith row of A. For all subsamples {A[i−]} and a given k, m runs of the
   GA were conducted to select the optimal gene sets.
   [MATH: <mo
   stretchy="false">{</mo><mrow><msubsup><mrow><mi>G</mi></mrow><mrow><mi>
   k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup><mrow><mo
   stretchy="true">|</mo><mrow><mi>k</mi><mo>=</mo><mn>1,</mn><mspace
   width=".25em"></mspace><mn>2,</mn><mspace
   width=".25em"></mspace><mo>⋯</mo><mo>,</mo><mspace
   width=".25em"></mspace><msup><mrow><mi>m</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msup></mrow></mrow></mrow><mo
   stretchy="false">}</mo> :MATH]
   denotes the selected gene sets obtained by m runs of the GA. Note that
   the possible multiple solutions of the optimization model ([169]8) can
   lead to m^SS > m. For
   [MATH: <msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup> :MATH]
   , we defined
   [MATH: <msubsup><mrow><mi>m</mi></mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup><mo>≡</mo><munderover><mo>
   ∑</mo><mrow><mi>i</mi><mn>=1</mn></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></m
   row></msub><mo
   stretchy="false">(</mo><msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi>
   </mrow><mrow><mi mathvariant="normal">SS</mi></mrow></msubsup><mo
   stretchy="false">)</mo> :MATH]
   11

   with
   [MATH: <mi>I</mi><mo
   stretchy="false">(</mo><msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi>
   </mrow><mrow><mi mathvariant="normal">SS</mi></mrow></msubsup><mo
   stretchy="false">)</mo><mo>≡</mo><mrow><mo
   stretchy="true">(</mo><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd columnalign="left"><mn>1</mn></mtd><mtd
   columnalign="left"><mrow><mspace
   width=".1em"></mspace><msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi><
   /mrow><mrow><mi mathvariant="normal">SS</mi></mrow></msubsup><mspace
   width=".5em"></mspace><mi mathvariant="normal">is</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">selected</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">with</mi><mspace
   width=".5em"></mspace><mi>i</mi><mi mathvariant="normal">th</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">subsample</mi><mspace
   width=".5em"></mspace><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mo>
   −</mo></mrow></msub></mrow></mtd></mtr><mtr columnalign="left"><mtd
   columnalign="left"><mn>0</mn></mtd><mtd
   columnalign="left"><mrow><mspace width=".1em"></mspace><mi
   mathvariant="normal">otherwise</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr></mtable></mrow></mrow><mo>.</
   mo> :MATH]
   12

   [MATH: <msubsup><mrow><mi>m</mi></mrow><mrow><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup> :MATH]
   is the total number of times that
   [MATH: <msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup> :MATH]
   was selected in all m runs of the GA. Consequently, the probability of
   [MATH: <msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup> :MATH]
   being selected as the optimal gene set can be obtained by
   [MATH:
   <mi>S</mi><mi>S</mi><msub><mrow><mi>R</mi></mrow><mrow><msubsup><mrow><
   mi>G</mi></mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup></mrow></msub><mo>≡</mo><m
   i>P</mi><mi>r</mi><mo
   stretchy="false">(</mo><msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi>
   </mrow><mrow><mi mathvariant="normal">SS</mi></mrow></msubsup><mspace
   width=".25em"></mspace><mi mathvariant="normal">is</mi><mspace
   width=".25em"></mspace><mi mathvariant="normal">selected</mi><mo
   stretchy="false">)</mo><mo>=</mo><mfrac><mrow><msubsup><mrow><mi>m</mi>
   </mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup></mrow><mi>m</mi></mfrac>
   :MATH]
   13

   which is called the subsampling rate (SSR) in this study. Moreover, the
   subsampling rate of a gene can also be calculated by Eq. ([170]13),
   which denotes the probability of a gene being included in the optimal
   gene set. To test the significant robustness of
   [MATH: <msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup> :MATH]
   , the null hypothesis was set up as follows: the distribution of
   [MATH: <msubsup><mrow><mi>m</mi></mrow><mrow><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup> :MATH]
   was assumed to be a binomial distribution Bin(p, m). By taking the
   uncertainty of data into consideration, p is further assumed to obey a
   Beta distribution Beta(p[0]m, m) where p[0] ∈ (0, 1) is a user-defined
   hyper-parameter. In this study, p[0] = 0.1. Note that the Beta
   distribution is a conjugate distribution of the binomial distribution
   and the Beta-binomial distribution is the corresponding posterior
   distribution. Consequently, the following statistics
   [MATH:
   <msub><mrow><mi>Q</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>≡</mo><m
   n>1</mn><mo>−</mo><munderover><mo>∑</mo><mrow><mi>r</mi><mo>=</mo><mn>0
   </mn></mrow><mrow><msubsup><mrow><mi>m</mi></mrow><mrow><mi>k</mi></mro
   w><mrow><mi
   mathvariant="normal">SS</mi></mrow></msubsup></mrow></munderover><mspac
   e width=".25em"></mspace><mi>H</mi><mrow><mo
   stretchy="true">(</mo><mrow><mi>r</mi><mo>,</mo><mi>m</mi><mo>,</mo><ms
   ub><mrow><mi>p</mi></mrow><mrow><mn>0</mn></mrow></msub><mi>m</mi><mo>,
   </mo><mi>m</mi></mrow><mo stretchy="true">)</mo></mrow> :MATH]
   14

   is calculated. H is the Beta-binomial probability mass function
   [MATH: <mi>H</mi><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>m</mi></mrow><mrow><mn>1</m
   n></mrow></msub><mo>,</mo><msub><mrow><mi>M</mi></mrow><mrow><mn>1</mn>
   </mrow></msub><mo>,</mo><msub><mrow><mi>m</mi></mrow><mrow><mn>2</mn></
   mrow></msub><mo>,</mo><msub><mrow><mi>M</mi></mrow><mrow><mn>2</mn></mr
   ow></msub></mrow><mo stretchy="true">)</mo></mrow><mo>=</mo><mrow><mo
   stretchy="true">(</mo><mrow><mtable><mtr><mtd><mrow><msub><mrow><mi>M</
   mi></mrow><mrow><mn>1</mn></mrow></msub></mrow></mtd></mtr><mtr><mtd><m
   row><msub><mrow><mi>m</mi></mrow><mrow><mn>1</mn></mrow></msub></mrow><
   /mtd></mtr></mtable></mrow><mo
   stretchy="true">)</mo></mrow><mfrac><mrow><mi>B</mi><mo
   stretchy="false">(</mo><msub><mrow><mi>m</mi></mrow><mrow><mn>1</mn></m
   row></msub><mo>+</mo><mi>a</mi><mo>,</mo><msub><mrow><mi>M</mi></mrow><
   mrow><mn>1</mn></mrow></msub><mo>−</mo><msub><mrow><mi>m</mi></mrow><mr
   ow><mn>1</mn></mrow></msub><mo>+</mo><mi>b</mi><mo
   stretchy="false">)</mo></mrow><mrow><mi>B</mi><mo
   stretchy="false">(</mo><mi>a</mi><mo>,</mo><mi>b</mi><mo
   stretchy="false">)</mo></mrow></mfrac> :MATH]
   15

   where B(⋅) is the Beta function, a = m[2] + 1, and b = M[2]−m[2] + 1.
   The
   [MATH:
   <msubsup><mrow><mi>G</mi></mrow><mrow><mi>k</mi></mrow><mrow><mspace
   width=".1em"></mspace><mi mathvariant="normal">SS</mi><mspace
   width=".1em"></mspace></mrow></msubsup> :MATH]
   that satisfies Q[j] ≤ 0.05 was considered to form the driver gene set.
   We further defined the subsampling rate for gene g as follows:
   [MATH:
   <mi>S</mi><mi>S</mi><msub><mrow><mi>R</mi></mrow><mrow><mi>g</mi></mrow
   ></msub><mo>≡</mo><mi mathvariant="normal">Pr</mi><mo
   stretchy="false">(</mo><mspace width=".1em"></mspace><mi>g</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">is</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">selected</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">in</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">the</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">driver</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">gene</mi><mspace
   width=".5em"></mspace><mi mathvariant="normal">set</mi><mo
   stretchy="false">)</mo><mo>=</mo><mfrac><mrow><munderover><mo>∑</mo><mr
   ow><mi>i</mi><mn>=1</mn></mrow><mi>m</mi></munderover><mspace
   width=".25em"></mspace><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></m
   row></msub><mo stretchy="false">(</mo><mi>g</mi><mo
   stretchy="false">)</mo></mrow><mi>m</mi></mfrac> :MATH]
   16

   with
   [MATH: <msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></mrow></msub><mo
   stretchy="false">(</mo><mspace width=".1em"></mspace><mi>g</mi><mo
   stretchy="false">)</mo><mo>≡</mo><mrow><mo
   stretchy="true">(</mo><mrow><mtable columnalign="left"><mtr
   columnalign="left"><mtd columnalign="left"><mn>1</mn></mtd><mtd
   columnalign="left"><mrow><mspace
   width=".1em"></mspace><mi>g</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">is</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">selected</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">in</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">the</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">ith</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">subsampling</mi><mspace width=".5em"></mspace><mi
   mathvariant="normal">run</mi></mrow></mtd></mtr><mtr
   columnalign="left"><mtd columnalign="left"><mn>0</mn></mtd><mtd
   columnalign="left"><mrow><mspace width=".1em"></mspace><mi
   mathvariant="normal">otherwise</mi><mspace
   width=".1em"></mspace></mrow></mtd></mtr></mtable></mrow></mrow><mn>.</
   mn> :MATH]
   17

   Based on SSR[g], we define a parsimonious set as follows:
   [MATH: <mi mathvariant="normal">Parsimonious</mi><mspace
   width=".5em"></mspace><mi
   mathvariant="normal">set</mi><mo>≡</mo><mrow><mo
   stretchy="true">{</mo><mrow><mi>g</mi><mo
   stretchy="false">|</mo><mi>S</mi><mi>S</mi><msub><mrow><mi>R</mi></mrow
   ><mrow><mi>g</mi></mrow></msub><mo>=</mo><mn>1</mn></mrow><mo
   stretchy="true">}</mo></mrow><mo>,</mo> :MATH]
   18

   which indicates the most robust result obtained by AWRMP.

Evaluation of the gene set G

   The coverage, mutual exclusivity, and optimal performance of the gene
   set G were evaluated by the coverage score, overlap score, and total
   score, respectively as follows:
   [MATH: <mi mathvariant="normal">Coverage</mi><mspace
   width=".25em"></mspace><mi
   mathvariant="normal">score</mi><mo>≡</mo><mfrac><mn>1</mn><mi>m</mi></m
   frac><mrow><mo stretchy="true">|</mo><mrow><mi
   mathvariant="normal">Γ</mi><mrow><mo
   stretchy="true">(</mo><mi>G</mi><mo
   stretchy="true">)</mo></mrow></mrow><mo stretchy="true">|</mo></mrow>
   :MATH]
   19
   [MATH: <mi mathvariant="normal">Overlap</mi><mspace
   width=".25em"></mspace><mi
   mathvariant="normal">score</mi><mo>≡</mo><mfrac><mn>1</mn><mi>m</mi></m
   frac><mi>ω</mi><mrow><mo stretchy="true">(</mo><mi>G</mi><mo
   stretchy="true">)</mo></mrow> :MATH]
   20
   [MATH: <mi mathvariant="normal">Totals</mi><mspace
   width=".25em"></mspace><mi
   mathvariant="normal">core</mi><mo>≡</mo><mfrac><mn>1</mn><mi>m</mi></mf
   rac><msub><mrow><mi>W</mi></mrow><mrow><mi>λ</mi></mrow></msub><mrow><m
   o stretchy="true">(</mo><mi>G</mi><mo
   stretchy="true">)</mo></mrow><mn>.</mn> :MATH]
   21

   We further define the overlap contribution for gene g ∈ G as follows:
   [MATH: <mi mathvariant="normal">Overlap</mi><mspace
   width=".25em"></mspace><mi
   mathvariant="normal">contribution</mi><mspace
   width=".25em"></mspace><mi mathvariant="normal">of</mi><mspace
   width=".25em"></mspace><mi mathvariant="normal">gene</mi><mspace
   width=".25em"></mspace><mi>g</mi><mo>≡</mo><mfrac><mn>1</mn><mi>m</mi><
   /mfrac><mrow><mo stretchy="true">(</mo><mrow><mi>ω</mi><mrow><mo
   stretchy="true">(</mo><mi>G</mi><mo
   stretchy="true">)</mo></mrow><mo>−</mo><mi>ω</mi><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>g</m
   i><mo>−</mo></mrow></msub></mrow><mo
   stretchy="true">)</mo></mrow></mrow><mo stretchy="true">)</mo></mrow>
   :MATH]
   22

   where G[g−] is the gene set obtained by subtracting gene g from gene
   set G, and this analysis is used to measure how gene g affects the
   overlap score of G.

Supplementary information

   [171]41598_2019_42500_MOESM1_ESM.pdf^ (1.2MB, pdf)

   SUPPLEMENTARY INFORMATION FOR “ADAPTIVELY WEIGHTED AND ROBUST
   MATHEMATICAL PROGRAMMING FOR THE DISCOVERY OF DRIVER GENE SETS IN
   CANCERS”

Acknowledgements