Abstract

   Recent advances in DNA/RNA sequencing have made it possible to identify
   new targets rapidly and to repurpose approved drugs for treating
   heterogeneous diseases by the ‘precise’ targeting of individualized
   disease modules. In this study, we develop a Genome-wide Positioning
   Systems network (GPSnet) algorithm for drug repurposing by specifically
   targeting disease modules derived from individual patient’s DNA and RNA
   sequencing profiles mapped to the human protein-protein interactome
   network. We investigate whole-exome sequencing and transcriptome
   profiles from ~5,000 patients across 15 cancer types from The Cancer
   Genome Atlas. We show that GPSnet-predicted disease modules can predict
   drug responses and prioritize new indications for 140 approved drugs.
   Importantly, we experimentally validate that an approved cardiac
   arrhythmia and heart failure drug, ouabain, shows potential antitumor
   activities in lung adenocarcinoma by uniquely targeting a
   HIF1α/LEO1-mediated cell metabolism pathway. In summary, GPSnet offers
   a network-based, in silico drug repurposing framework for more
   efficacious therapeutic selections.

   Subject terms: Drug discovery, Genetics, Systems biology
     __________________________________________________________________

   Identification of disease modules in the human interactome can guide
   more efficacious therapeutic selections. Here, the authors introduce a
   network-based methodology to identify individualized disease modules by
   mapping patients’ DNA and RNA sequencing profiles to the interactome,
   enabling prediction of cancer type-specific drug responses.

Introduction

   After the completion of the human genome project in 2003, there has
   been unexpected enthusiasm for how genetics and genomics would inform
   drug discovery and development^[50]1. Although it is in its infancy,
   the use of genomics in the drug discovery and development pipeline has
   generated some successes^[51]2,[52]3. For example, proprotein
   convertase subtilisin/kexin type 9 (PCSK9), first discovered by human
   genetics studies in 2003, has generated great interest in
   genomics-informed drug discovery in cardiovascular disease^[53]4.
   However, the overall clinical efficacy of genome-derived approved drugs
   has remained limited owing to the heterogeneity of complex diseases.

   Drug development in the genomics era has become a highly integrated
   systems pipeline in which complementary multi-omics and computational
   methods are used^[54]3. Recent technological and computational advances
   in genomics and systems biology have now made it possible to identify
   new druggable targets and therapeutic agents by uniquely targeting
   cancer type-specific mechanisms (e.g., perturbed pathways in the
   disease module) that cause or contribute to human disease^[55]5–[56]8.
   For example, network-based approaches have offered possibilities for
   drug repurposing^[57]8, target identification^[58]9, and combination
   therapy^[59]10 by quantifying the proximity of drug targets and disease
   proteins in the human protein–protein interactome. It remains unclear
   as to how generalizable these network-based disease module
   identification methodologies are in exploiting the wealth of massive
   multi-omics data from genomics studies and offering novel insights into
   cancer type-specific mechanisms. Were they to be successful, we would
   ultimately be able to target precisely human disease pathways,
   promoting the development of precision medicine.

   In this study, we present a novel network-based disease module
   identification and in silico drug repurposing methodology, denoted the
   Genome-wide Positioning Systems network (GPSnet) algorithm.
   Specifically, we demonstrate the feasibility of individualized disease
   module identification by integrating large-scale DNA sequencing and
   transcriptome (RNA-seq) profiling across approximately 5000 human tumor
   genomes to the human protein–protein interactome. We find that gene
   expression of disease modules identified by GPSnet can predict drug
   responses in cancer cell lines with high accuracy. Importantly, we show
   that GPSnet can be used for in silico drug repurposing by uniquely
   targeting the specific disease module (i.e., cancer type-specific
   module) through combining network proximity measure and gene-set
   enrichment analysis (GSEA) approaches. Furthermore and importantly, we
   validate these network-based predictions experimentally. From a
   translational perspective, if broadly applied, GPSnet offers a powerful
   network-based tool for target identification and drug repurposing. We
   believe this approach can minimize the translational gap between
   genomics studies and drug development, currently a significant
   bottleneck in precision medicine.

Results

Modularity of highly mutated genes in the human interactome

   Disease proteins are not scattered randomly in the human
   protein–protein interactome, but form one or several connected
   subgraphs, defining the disease module^[60]7. Yet, whether the mutant
   proteins directly derived from individual patient sequencing data
   (e.g., whole-exome sequencing) form a statistically significant module
   in the human protein–protein interactome (or subnetwork in a disease
   module) remains unknown. To test this hypothesis, we first collected
   the significantly mutated genes (SMGs) identified from large-scale
   genome sequencing projects across 15 cancer types (Supplementary
   Data [61]1, see Methods section). We found that SMGs form highly
   connected subgraphs in the human protein–protein interactome across all
   15 cancer types (Fig. [62]1a). In lung adenocarcinoma (LUAD), 83.1% of
   genes (172/207, P = 1.6 × 10^−62 [permutation test]) form the largest
   connected component that is statistically signifcantly
   clustered compared to the same number of randomly selected genes with
   similar connectivity (degree) as the original SMGs in the human
   interactome (Fig. [63]1b). Interestingly, we found that genes with high
   somatic mutation frequency also tend to form highly connected subgraphs
   compared to the same fraction of random gene sets with the same degree
   distribution as the highly mutated genes in the human interactome. For
   example, mutated genes with high somatic mutation frequency form a
   significant module (P = 3.0 × 10^−^11, Fig. [64]1b) in LUAD, although
   the module size of highly mutated genes is less than that of SMGs.
   Previous studies have suggested that well-known cancer genes/proteins
   often have high connectivity in the literature-derived human
   protein–protein interactome. To inspect potential literature data bias,
   we performed the same network modularity analysis using the
   high-throughput systematic interactome (see Methods section) identified
   by the (unbiased) yeast two-hybrid (Y2H) screening assays previously
   published^[65]11. We found that the SMGs and highly mutated genes form
   a significant module in this more limited but unbiased interactome, as
   well (Supplementary Note [66]1 and Supplementary Fig. [67]1),
   suggesting low literature data bias.

Fig. 1.

   [68]Fig. 1
   [69]Open in a new tab

   Proof-of-concept of cancer type-specific disease module. a Both
   significantly mutated genes (SMGs, Supplementary Data [70]1) identified
   by statistical approaches and highly mutated genes ranked by mutation
   frequency form a significant largest connected component (LCC) compared
   to random genes with matching connectivity (degree) distributed in the
   human protein–protein interactome, across 15 cancer types
   (Supplementary Fig. [71]6). b The sizes of the LCC of SMGs and of
   highly mutated proteins are shown for lung adenocarcinoma (LUAD). The
   y-axis (p) denotes the fraction of the number of random modules
   against the total number of permutations. The observed module sizes,
   172 (SMGs, orange line) and 76 (highly mutated gene, cyan line), are
   significantly larger than the random expectation. The P value was
   calculated by permutation test. c A subgraph illustrating a subnetwork
   of highly mutated genes versus genes with low mutation frequency in
   LUAD. The full names of the 15 cancer types are provided in the main
   text

   Figure [72]1c illustrates the connectivity of products of several
   highly mutated genes (such as EGFR, TP53, and NF1) compared to genes
   with low somatic mutation frequency in LUAD. These observations are
   consistent with previous network-based studies showing that
   experimentally reported cancer proteins are more likely to connect to
   each other in the human interactome than to noncancer
   proteins^[73]12–[74]14. The strong modularity of gene products with
   high mutation frequency led us to develop new network-based
   methodologies to identify cancer type-specific disease modules by
   mapping individual patient’s DNA sequencing data to the human
   protein–protein interactome disease network model.

Pipeline of GPSnet

   Here, we present GPSnet, an integrated, network-based methodology for
   cancer type-specific disease module identification and in silico drug
   repurposing. Figure [75]2 illustrates the pipeline of the GPSnet
   algorithm, which contains two main components: (a) cancer type-specific
   disease module identification by integrating whole-exome sequencing
   (somatic mutation) and transcriptome (RNA-seq) profiling into the human
   protein–protein interactome (Fig. [76]2a), and (b) in silico drug
   repurposing that targets uniquely cancer type-specific disease modules
   generated from the first step by implementing both GSEA and network
   proximity approaches (Fig. [77]2b, c) complemented by mechanistic in
   vitro validation (Fig. [78]2d) in human cell lines (see Methods
   section).

Fig. 2.

   [79]Fig. 2
   [80]Open in a new tab

   A diagram illustrating the pipeline of GPSnet for in silico drug
   repurposing. The GPSnet pipeline contains three key steps: a
   network-based identification of cancer type-specific disease modules by
   integrating patient’s DNA sequencing (somatic mutations) and
   transcriptome (RNA-seq) data into the human protein–protein
   interactome. We collected the human protein–protein interactions (PPIs)
   by assembling multiple types of experimental data, and built the cancer
   type-specific (co-expression) PPI network based on tumors’ RNA-seq data
   (see Methods section). The Manhattan plot shows the pan-cancer mutation
   load distribution. The heatmap shows gene expression (RNA-seq) across
   tumor samples; red denotes high expression levels and blue denotes
   low-expression levels. b, c Performing both a network proximity
   analysis (quantifying the network distance of drug targets to cancer
   type-specific disease modules from a in the human protein–protein
   interactome) and gene-set enrichment analysis (GSEA) by searching
   drug-up/downregulated genes from drug-induced transcriptome data
   (Connectivity Map) in human cell lines in cancer type-specific disease
   modules; and d in silico drug repurposing and experimental validation

   Figure [81]1 shows that genes with high mutation frequency form highly
   connected subgraphs called disease modules. In addition, we find that
   genes in the subnetwork identified from the RNA-seq database
   co-expressed protein–protein interactome are highly mutated, and SMGs
   are significantly enriched in the subnetworks identified from RNA-seq
   data, as well (Supplementary Note [82]2 and Supplementary Figs. [83]2,
   [84]3). Here, we thereby define a module as a subgraph within the human
   protein–protein interactome (see Methods section) and calculate the
   score of the module M as
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mi>m</mi></mrow></msub><mo>=</mo><m
   frac><mrow><msub><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>∈</mo><mi>M
   </mi></mrow></msub><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi>s</mi><mrow><mo>(</mo
   ><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mo>-</mo><mi>μ</mi></mrow><mo
   >)</mo></mrow></mrow><mrow><msqrt><mi>m</mi></msqrt></mrow></mfrac><mo>
   ,</mo> :MATH]
   1

   where m is the number of genes in module M, s(i) is the
   network-predicted mutation frequency of gene i normalized by a gene’s
   cDNA sequence length, and μ is the average mutation frequency over the
   whole gene set for a specific cancer type after network smoothing using
   the random walk with restart algorithm (see Supplementary Note [85]3).
   We denote Γ[M] as the set of genes that interact with module M in the
   human protein–protein interactome. Specifically, we first randomly
   select a protein (gene) as a “seed protein.” For each gene
   [MATH: <mi>i</mi><mo>∈</mo><msub><mrow><mi
   mathvariant="normal">Γ</mi></mrow><mrow><mi>M</mi></mrow></msub> :MATH]
   , we next calculate its connectivity significance^[86]15 in the human
   protein–protein interactome as
   [MATH:
   <mi>P</mi><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mo>=
   </mo><munderover accent="false" accentunder="false"><mrow><mo>
   ∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><msub><mrow><mi>k</mi></mrow><mr
   ow><mi>m</mi></mrow></msub></mrow><mrow><msub><mrow><mi>k</mi></mrow><m
   row><mi>i</mi></mrow></msub></mrow></munderover><mspace
   width="0.25em"></mspace><mfrac><mrow><mfenced close=")" open="("
   separators=""><mrow><mtable><mtr><mtd
   columnalign="center"><mi>m</mi></mtd></mtr><mtr><mtd
   columnalign="center"><mi>k</mi></mtd></mtr></mtable></mrow></mfenced><m
   fenced close=")" open="(" separators=""><mrow><mtable><mtr><mtd
   columnalign="center"><mi>N</mi><mo>-</mo><mi>m</mi></mtd></mtr><mtr><mt
   d
   columnalign="center"><msub><mrow><mi>k</mi></mrow><mrow><mi>i</mi></mro
   w></msub><mo>-</mo><mi>k</mi></mtd></mtr></mtable></mrow></mfenced></mr
   ow><mrow><mfenced close=")" open="("
   separators=""><mrow><mtable><mtr><mtd
   columnalign="center"><mi>N</mi></mtd></mtr><mtr><mtd
   columnalign="center"><msub><mrow><mi>k</mi></mrow><mrow><mi>i</mi></mro
   w></msub></mtd></mtr></mtable></mrow></mfenced></mrow></mfrac><mo>,</mo
   > :MATH]
   2

   where
   [MATH: <msub><mrow><mi>k</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   is the degree of gene i, m is the number of genes in the module,
   [MATH: <msub><mrow><mi>k</mi></mrow><mrow><mi>m</mi></mrow></msub>
   :MATH]
    is the number of module genes that link to gene i, and N is the total
   number of genes in the gene set. For each seed
   [MATH:
   <mi>i</mi><mo>∈</mo><msub><mrow><mi>Γ</mi></mrow><mrow><mi>M</mi></mrow
   ></msub> :MATH]
   , we further calculate the expanded module score if gene i is added to
   the module as
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mi>m</mi><mo>+</mo><mn>1</mn></mrow
   ></msub><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mo>=</
   mo><mfrac><mrow><mfenced close=")" open="("
   separators=""><mrow><mi>s</mi><mrow><mo>(</mo><mrow><mi>i</mi></mrow><m
   o>)</mo></mrow><mo>-</mo><mi>μ</mi></mrow></mfenced><mo>+</mo><msub><mr
   ow><mo>
   ∑</mo></mrow><mrow><mi>i</mi><mo>∈</mo><mi>M</mi></mrow></msub><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi>s</mi><mrow><mo>(</mo
   ><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mo>-</mo><mi>μ</mi></mrow><mo
   >)</mo></mrow></mrow><mrow><msqrt><mi>m</mi><mo>+</mo><mn>1</mn></msqrt
   ></mrow></mfrac><mo>.</mo> :MATH]
   3

   The candidate gene i will be added to the module to build a new module
   by satisfying three criteria: (a) P(i) in Eq. ([87]2) is less than 0.05
   (connectivity significance), (b) this candidate gene is significantly
   co-expressed (P value < 0.05, F-statistic) within the neighborhood of
   the raw module based on RNA-seq profiles of tumor samples (see Methods
   section), and (c)
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mi>m</mi><mo>+</mo><mn>1</mn></mrow
   ></msub><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mo>></
   mo><msub><mrow><mi>Z</mi></mrow><mrow><mi>m</mi></mrow></msub> :MATH]
   . We repeat steps (a) and (c) until no more genes can be added by these
   criteria. In this study, we built ~60,000 raw modules for each cancer
   type by randomly selecting each seed gene in the human interactome
   approximately five times. Finally, we assembled the top 1%
   (approximately 300) modules with the highest module scores and defined
   the largest connected component^[88]7 by removing the isolated genes
   (less than 5% of isolated genes [nodes] across all 15 cancer types) to
   define the final disease module for each cancer type (Supplementary
   Fig. [89]4). Details are provided in the Supplementary Note [90]3.

Cancer type-specific disease module identified by GPSnet

   Via GPSnet, we identify computationally the final cancer type-specific
   disease modules for all 15 cancer types by integrating iteratively the
   patients’ whole-exome sequencing and RNA-seq data into the human
   protein–protein interactome (Fig. [91]2a). Figure [92]3a shows the
   network visualization of disease modules identified by GPSnet across 15
   cancer types: urothelial bladder carcinoma (BLCA), invasive breast
   carcinoma (BRCA), colorectal adenocarcinoma (COAD), glioblastoma
   multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney
   renal clear cell carcinoma (KIRC), acute myeloid leukemia (LAML), lung
   adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian
   serous cystadenocarcinoma (OV), prostate adenocarcinoma (PRAD), skin
   cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), papillary
   thyroid carcinoma (THCA), and uterine corpus endometrial carcinoma
   (UCEC). The distributions of overlapping genes across different cancer
   types are provided in Supplementary Fig. [93]5. We find that most
   cancer types share multiple genes with other cancer types; however,
   several cancer types, including prostate adenocarcinoma (PRAD, 84
   unique genes), papillary thyroid carcinoma (THCA, 108 unique genes),
   and urothelial bladder carcinoma (BLCA, 50 unique genes), have high
   numbers of unique module genes compared to other cancers, suggesting
   unique biological pathways for these cancer types. The detailed data
   for 15 cancer type-specific disease modules identified by GPSnet are
   provided in Supplementary Data [94]2.

Fig. 3.

   Fig. 3
   [95]Open in a new tab

   Network-based identification and validation of cancer type-specific
   disease modules via GPSnet. a Pan-cancer analysis of cancer
   type-specific disease modules (Supplementary Data [96]2) identified by
   GPSnet across 15 cancer types. b Canonical cancer pathway enrichment
   analysis for the cancer type-specific disease modules identified by
   GPSnet across 15 cancer types. c Known cancer driver genes (named
   significantly mutated genes, Supplementary Data [97]1) are appreciably
   enriched in cancer type-specific disease modules in breast cancer. The
   cancer enrichment analyses for other cancer types are provided in
   Supplementary Fig. [98]6. The P value was computed by the permutation
   (randomization) test (see Methods section). d A heat map illustrating
   modularity similarity between known cancer driver genes (named
   significantly mutated genes) and the module genes identified by GPSnet
   across 15 cancer types. Modularity similarity (color key) is measured
   by the Jaccard-index
   [MATH: <mi>J</mi><mo>=</mo><mfenced close="∣" open="∣"
   separators=""><mrow><mi>A</mi><mo>∩</mo><mi>B</mi></mrow></mfenced><mo>
   ∕</mo><mfenced close="∣" open="∣"
   separators=""><mrow><mi>A</mi><mo>∪</mo><mi>B</mi></mrow></mfenced>
   :MATH]
   , where gene set A represent the known cancer driver genes (Y-axis) and
   gene set B represent module genes identified by GPSnet (X-axis). The
   full names of the 15 cancer types are provided in main text

   We next attempted to validate the GPSnet-identified disease modules via
   two approaches: (a) canonical disease pathway enrichment analysis, and
   (b) well-known disease-associated gene enrichment analysis. As shown in
   Fig. [99]3b, canonical cancer pathway enrichment analyses showed that
   genes in the GPSnet-predicted disease modules are significantly
   enriched in various canonical cancer pathways, which is comparable to
   the experimentally validated cancer genes from the Cancer Gene Census
   (CGC) database^[100]16. We further find that the known SMGs
   (Supplementary Data [101]1) are significantly enriched in the disease
   modules identified by GPSnet across all 15 cancer types (Fig. [102]3c,
   and Supplementary Fig. [103]6). For BRCA, SMGs are enriched in the
   identified breast cancer modules, which is significantly stronger than
   a random gene set with the same connectivity (degree) distribution
   (P = 1.33 × 10^−39, Fig. [104]3c). We calculate the module similarity
   (Jaccard-Index) across 15 cancer types in Fig. [105]3d. We find that
   the disease modules of several cancer types are very similar (such as
   LUAD and LUSC), while some cancer types have unique disease modules,
   such as LAML and PRAD, indicating both common mechanisms and unique
   pathways identified by GPSnet. We further collect the
   disease-associated genes from several publicly available databases
   (Supplementary Data [106]3, see Methods section). Supplementary
   Fig. [107]7 shows that cancer-associated genes are significantly
   enriched in the cancer type-specific disease modules identified by
   GPSnet, as well. To assess the impact of the literature data bias of
   the human protein–protein interactome, we re-identify cancer
   type-specific disease modules for 15 cancer types using the unbiased,
   Y2H high-throughput systematic human interactome^[108]11. We find that
   known SMGs are significantly enriched in the cancer type-specific
   disease modules identified from the unbiased, systematic interactome by
   GPSnet, as well, revealing the low literature data bias for GPSnet
   (Supplementary Fig. [109]8). Altogether, cancer type-specific disease
   modules are highly enriched in the known cancer genes and canonical
   cancer pathways, suggesting potential pathobiological implications of
   GPSnet-based analysis.

Identification of new pharmacogenomics biomarkers by GPSnet

   To examine further the potential pharmacogenomics application of
   GPSnet, we downloaded robust multi-array (RMA) gene expression profiles
   and drug response data (IC[50], the half maximal inhibitory
   concentration) across 1065 cell lines from the Genomics of Drug
   Sensitivity in Cancer (GDSC) database^[110]17. We then built regression
   models to predict drug responses (IC[50]) based on RMA gene expression
   profiles of GPSnet-identified disease modules (see Methods section) for
   three specific cancer types: BRCA, LUAD, and SKCM. We focused on seven
   drugs across those three cancer types using subject matter expertise
   based on a combination of factors: (a) the highest variances of IC[50]
   of each drug among over 1065 cell lines; (b) drugs approved by
   targeting specific pathways; and (c) cancer type with the highest
   number of tumor samples from TCGA. These seven selected drugs include
   enzastaurin, linifanib, olaparib, pictilisib, refametinib, selumetinib,
   and vemurafenib.

   We define cell lines whose IC[50] value is higher than 10 µM as
   drug-resistant cell lines and the remainder as drug sensitive cell
   lines. As shown in Fig. [111]4, the area-under-the-receiver operating
   characteristic curves (AUC) ranges from 0.711 to 0.821 across 7 drugs
   with 10-fold cross-validation, revealing high performance in predicting
   drug responses by GPSnet-identified cancer type-specific disease
   modules. For example, olaparib, an inhibitor of poly (ADP-ribose)
   polymerase enzymes, is approved for treatment of breast cancer and
   ovarian cancer^[112]18. Figure [113]4a reveals that gene expression in
   the GPSnet-identified breast cancer disease module can predict
   olaparib’s response with AUC = 0.820. Vemurafenib, a B-Raf V600E
   targeted inhibitor, is approved for the treatment of late-stage
   melanoma^[114]19. We find that the GPSnet-predicted melanoma disease
   module can predict response to vemurafenib with an AUC = 0.821
   (Fig. [115]4c). Refametinib, a MEK specific inhibitor, is currently in
   clinical trial for the treatment of RAS-mutant cancers^[116]20.
   Figure [117]4a–c shows that the GPSnet-predicted disease modules can
   accurately predict responses to refametinib in all three cancer types:
   BRCA (AUC = 0.730), LUAD (AUC = 0.819), and SKCM (AUC = 0.811).
   Altogether, gene expression in the GPSnet-disease modules offer
   potential pharmacogenomics biomarkers for assessment of drug
   resistance/sensitivity in multiple types of cancer.

Fig. 4.

   [118]Fig. 4
   [119]Open in a new tab

   Pharmacogenomics validation for GPSnet-predicted disease modules. a–c
   The receiver operating characteristic (ROC) curves for seven selected
   drugs: enzastaurin, linifanib, olaparib, pictilisib, refametinib,
   selumetinib, and vemurafenib, for three specific cancer types,
   including invasive breast carcinoma (BRCA, a), lung adenocarcinoma
   (LUAD, b), and skin cutaneous melanoma (SKCM, c). Drug IC[50] values
   were predicted based on regression models built by utilizing gene
   expression profiles of the GPSnet-identified cancer type-specific
   disease modules as feature vectors (see Methods section). The area
   under ROC curves (AUC) during 10-fold cross-validations are shown

High druggability of disease modules identified by GPSnet

   Previous studies have revealed that genes/proteins identified by genome
   sequencing studies, such as genome-wide association studies, often
   failed to identify novel druggable targets^[120]21. Integration of
   human PPIs, however, offers potential opportunities to identify
   druggable pathways by targeting adjacent mutant genes^[121]21. We
   further assessed whether genes in the GPSnet-predicted modules could
   offer more druggable targets compared to traditional statistics-based
   genome sequencing analysis. Interestingly, we find that gene products
   from the GPSnet-predicted modules are more likely to be targeted by
   approved drugs or clinically investigational drugs compared to SMGs
   that are identified by the statistics-based approaches alone
   (Supplementary Figs. [122]9, [123]10). For example, among 229 SMGs in
   breast cancer, only 17 are targeted by known drugs. However, among 236
   genes in the GPSnet-predicted breast cancer module, 46 genes are
   targeted by known drugs, three-fold higher than SMGs (P < 0.0001,
   Fisher’s test). Furthermore, gene products in the GPSnet-identified
   modules are significantly targeted by known drugs compared to
   genome-wide drug targets’ distributions across all 15 cancer types
   (Supplementary Fig. [124]9). Importantly, the GPSnet-identified disease
   modules contain targets of drugs (Supplementary Data [125]4) known to
   treat this disease in two cancer types: LUAD (P < 0.0001, Fisher’s
   test, Supplementary Fig. [126]10) and BRCA (P < 0.0001, Fisher’s test,
   Supplementary Fig. [127]10). We further investigated drug target
   distribution in the human protein-protein interactome. As expected, we
   found that neighbors with significant connectivity to the known SMGs
   are more likely to be targeted by known drugs compared to SMGs only, or
   experimentally validated genes from the Cancer Gene Census
   database^[128]16 (Supplementary Fig. [129]11). Taken together, disease
   modules identified by GPSnet offer more druggable targets compared to
   traditional statistics-based genome sequencing analysis approaches. We,
   therefore, next turned to in silico drug repurposing by uniquely
   targeting GPSnet-predicted cancer type-specific disease modules derived
   from the human protein–protein interactome analysis.

Uncovering new indications for approved drugs by GPSnet

   To identify novel indications for approved drugs, we utilize two
   complementary approaches: (i) a network proximity (z-score) measure
   that quantifies the relationship between cancer type-specific disease
   modules identified by GPSnet and drug targets in the human
   protein–protein interactome^[130]8 (see Methods section); and (ii)
   gene-set enrichment analysis (GSEA)^[131]22 that examines drug-induced
   gene signatures (up/downregulated genes in tumor cells before and after
   drug treatments) in the GPSnet-predicted disease module (Fig. [132]2c,
   see Methods section). Here, we investigate 1309 drugs in total that
   have known target information from publicly available databases or gene
   signatures from the Connectivity Map database^[133]23 (see Methods
   section). Figure [134]5a shows that 140 approved drugs (adjusted
   P < 0.01) are significantly associated with at least one cancer type
   identified by network proximity or GSEA approaches (Supplementary
   Data [135]5). For example, several known cancer drugs (such as
   gefitinib, etoposide, doxorubicin, and tamoxifen) defined by the
   first-level of the Anatomical Therapeutic Chemical Classification codes
   are correctly predicted by the network proximity approach compared to
   GSEA, revealing the power of the network approach for drug repurposing,
   consistent with our recent study^[136]8.

Fig. 5.

   [137]Fig. 5
   [138]Open in a new tab

   Network-based in silico drug repurposing and experimental validation. a
   A circos plot illustrating a global view of computationally predicted
   potential anticancer indications for 140 approved drugs across 15
   cancer types, identified by both network proximity and gene-set
   enrichment analysis approaches. Drugs are grouped by their first-level
   Anatomical Therapeutic Chemical Classification (ATC) codes. Two
   overlapping drugs (ouabain and niclosamide) from the top 10 lists
   identified by both network proximity and GSEA approaches in lung
   adenocarcinoma (Supplementary Data [139]5) were selected for
   experimental validation. b, c Dose-response curves of ouabain (b) and
   niclosamide (c) in six representative non-small cell lung cancer
   (NSCLC) cell lines (A549, H522, H596, H1975, HCC827, and PC9). Cells
   were treated with a series of concentrations of ouabain and niclosamide
   for 48 h. The CellTiter 96 AQueous one solution cell proliferation kit
   was used to determine cell viability (see Methods section). Data are
   represented as mean ± SEM (n = 3) and each experiment was performed at
   least three times in duplicate. d Effect of ouabain on the colony
   formation assay in two NSCLC cell lines (A549 and H522)

   To examine further the performance of in silico drug repurposing in
   GPSnet, we inspected the top 10 predicted drugs by both network
   proximity and GSEA in LUAD. We find that two FDA-approved non-cancer
   drugs, niclosamide (an approved drug for the treatment of tapeworm
   infestations) and ouabain (an approved drug for treatment of cardiac
   arrhythmias and heart failure), are predicted to have significant
   associations (directionality of effect is not defined in this initial
   analysis) with LUAD by both network proximity and GSEA approaches
   (Fig. [140]5a and Supplementary Fig. [141]12). To test the antitumor
   effect of niclosamide and ouabain on non-small cell lung cancer
   (NSCLC), the MTS assay is performed to determine the cell proliferation
   capacity of 6 different cell lines: A549, H522, H596, H1975, HCC827,
   and PC9. We find that both niclosamide and ouabain show potential
   antitumor effects on these NSCLC cell lines (Fig. [142]5b, c). For
   example, ouabain is cytotoxic to all tested NSCLC cells in the
   nanomolar (nM) range: IC[50] = 0.45 nM in A549 cells, IC[50] = 0.58 nM
   in H522 cells, IC[50] = 0.62 nM in H596 cells, IC[50] = 0.52 nM in
   H1975 cells, IC[50] = 0.94 nM in HCC827 cells, and IC[50] = 2.48 nM in
   PC9 cells (Fig. [143]5c). Furthermore, a prolonged colony formation
   assay is performed to investigate whether ouabain causes irreversible
   growth arrest (Fig. [144]5d). We find that ouabain significantly
   impairs A549 and H522 cell growth, as demonstrated by a reduction in
   both colony number and colony size in the ouabain-treated group.
   Altogether, ouabain exhibits potential anti-proliferative efficacy and
   was, therefore, selected for further experimental evaluation.

Ouabain inhibits HIF1α/LEO1 pathway in NSCLC cells

   We next examined the mechanism-of-action of ouabain in NSCLC by network
   analysis on the lung-specific human protein–protein interactome.
   Currently, there are four reported targets (ATP1A1, EIF4E, HIF1α, and
   SRC) of ouabain in human cells. Figure [145]6a shows that SRC has the
   closest distance to the LUAD disease module identified by GPSnet,
   followed by HIF1α, EIF4E, and ATP1A1. Both SRC and EIF4E play essential
   roles in multiple cancers, including NSCLC^[146]24,[147]25. We,
   therefore, examine whether HIF1α contributes to the antitumor effect of
   ouabain as it shows a closer network proximity to the LUAD module
   compared to the primary target, ATP1A1 (Fig. [148]6a). As predicted,
   ouabain downregulates both mRNA and protein expression of HIF1α in a
   dose-dependent manner in A549 cells (Fig. [149]6b, c). Knockdown of
   HIF1A by two distinct siRNAs reduces the response of ouabain in A549
   cells from 5-fold to 6-fold (Supplementary Fig. [150]13). Moreover,
   knockdown of HIF1A by two distinct shRNAs reduces the response of
   ouabain in A549 cells approximately 17-fold (Fig. [151]6d, e). These
   observations reveal that HIF1α may contribute to the potential
   antitumor effect of ouabain in NSCLC cells.

Fig. 6.

   [152]Fig. 6
   [153]Open in a new tab

   Network-based experimental validation of ouabain’s likely
   mechanism-of-action in non-small cell lung cancer (NSCLC). a A
   highlighted subnetwork reveals the inferred mechanism-of-action for
   ouabain’s anticancer effects on NSCLC by network analysis. The network
   distance (the shortest path) of four known ouabain targets to the
   cancer type-specific disease module of lung adenocarcinoma in the human
   protein–protein interactome is ranked for SRC, HIF1α, EIF4E, and
   ATP1A1. Yellow circles: known ouabain targets; blue circles:
   significantly mutated genes (SMGs) for lung adenocarcinoma (LUAD) based
   on data from The Cancer Genome Atlas (Supplementary Data [154]1);
   purple circles: LUAD-specific module genes identified by GPSnet; green
   circles: overlapping  genes between SMGs and LUAD-specific module
   genes. The size of circle nodes denote the gene expression level in
   lung compared to other 31 tissues based on RNA-seq data from GTEx
   database (see Methods section). The protein–protein interactions (PPIs)
   among ouabain targets, LUAD-specific module genes, and SMGs are labeled
   by blue links. The ouabain-target interactions are labeled by red
   links. Background light gray lines represent other edges in the dense
   PPIs unrelated to the LUAD-specific disease module. b Ouabain
   downregulates gene expression of HIF1A in a dose-dependent manner in
   A549 cells. c Western blot analysis of HIF1α protein expression
   (normalized by GAPHD) after CoCl[2]/ouabain treatment in A549 cells.
   The uncropped image for c is included as Supplementary Fig. [155]22. d
   shRNA significantly downregulates HIF1A gene expression in A549 cells.
   Differences between the two groups (b and d) were analyzed by Student’s
   t-test: *P-value < 0.05 and ***P-value < 0.001. e Cell viability
   reduction by ouabain is perturbed by two specific shRNA against HIF1A.
   Each experiment was performed at least three times in duplicate. All
   data is represented as mean ± SEM (n = 3) and each experiment was
   performed at least three times in duplicate

   HIF1α activates the transcription of multiple genes involved in many
   human diseases, including cancer^[156]26. We next collect 719 putative
   HIF1α-targeted genes (Supplementary Data [157]6) identified by
   different high-throughput experiments or the curated literature (see
   Methods). We found that 251 HIF1α targeted genes (Supplementary
   Data [158]6) were differentially expressed in LUAD patients compared to
   normal control samples based on TCGA data^[159]27. KEGG pathway
   enrichment analysis shows that the top enriched pathways are cell
   metabolism-related pathways, including redox pathways, such as
   glycolysis/gluconeogenesis (adjusted P = 1.66 × 10^−5) and central
   carbon metabolism, in cancer (adjusted P = 0.001, Fig. [160]7a). We
   further tested the effects of ouabain on redox metabolism using a
   highly responsive NAD+/NADH sensor, SoNar, reported in our previous
   study^[161]28. We found that ouabain at 10 or 20 nM reduced the
   NAD+/NADH ratio in A549 cells (Supplementary Fig. [162]14), suggesting
   that ouabain potentially targets cell (redox) metabolism in NSCLC
   cells.

Fig. 7.

   [163]Fig. 7
   [164]Open in a new tab

   Ouabain regulates LEO1-mediated cell metabolism of non-small cell lung
   cancer (NSCLC) cells by bioinformatics and experimental validation. a
   Pathway enrichment analysis for the differentially expressed HIF1A
   target genes in lung adenocarcinoma patients. FDR: false discovery rate
   (also denoted the adjusted p-value for multiple testing using
   Bonferroni’s correction). b Correlation of LEO1 expression with serine
   or glycine abundance in NSCLC cell lines (see Methods section and
   Supplementary Data [165]6). R: Pearson correlation coefficient. P-value
   (P) was calculated by F-statistics. n is the number of NSCLC cell lines
   were used for each correlation analysis. c Ouabain downregulates gene
   expression of LEO1 in a dose-dependent manner in A549 cells. Each
   experiment was performed at least three times in duplicate. Differences
   between the two groups were analyzed by Student’s t-test:
   **P-value < 0.01 and ***P-value < 0.001. Data is represented as
   mean ± SEM (n = 3) and each experiment was performed at least three
   times in duplicate. d Western blot analysis of LEO1 protein expression
   (normalized by GAPDH) following ouabain treatment in A549 cells and two
   normal human lung fibroblasts cells (WI38 and MRC5). Uncropped images
   for d are included as Supplementary Fig. [166]23. e Survival analysis
   of LEO1 gene expression in NSCLC patients (see Methods section). The
   P-value (P) was calculated by log-rank test. HR, hazard ratio

   We next collected genome-wide gene expression profiles from the CCLE
   database^[167]29 and the metabolite abundance on NSCLC cell lines from
   a recent study^[168]30. We computed the correlation of gene
   transcriptional activities and the serine/glycine abundance in
   approximately 70 NSCLC cell lines having both gene expression and
   metabolite abundance profiles. Among 251 HIF1α targeted genes with
   differential expression in LUAD patients, we found that the
   transcriptional activities of LEO1 showed the highest correlation with
   [^13C]serine or [^13C]glycine abundance in NSCLC cell lines
   (Supplementary Data [169]6 and Fig. [170]7b). We tested the effect of
   ouabain on LEO1 in A549 cells. Figure [171]7c, d shows that ouabain
   downregulates the expression of both mRNA and protein levels of LEO1 in
   A549 cells, whereas it has minor effect on LEO1 protein expression in
   two normal human lung fibroblast lines WI38 and MRC5. LEO1, encoding a
   component of RNA polymerase II-associated factor complex, has been
   reported to be involved in acute myelogenous leukemia^[172]31.
   Figure [173]7e shows that overexpression of LEO1 is correlated with
   poor survival rate in LUAD patients (P = 0.02, Log-rank test, see
   Methods section). Taken together, these data show that regulation of
   HIF1α/LEO1-mediated cell metabolism may contribute to ouabain’s
   antitumor effects in NSCLC cells.

Discussion

   Molecularly targeted agents have demonstrated clinical efficacy and
   favorable toxicity profiles for the treatment of cancer compared to
   nonspecific cytotoxic chemotherapeutic agents. However, patients
   treated with those targeted agents often develop drug resistance due to
   clinically actionable genetic and epigenetic events in individuals,
   with survival typically prolonged by only a few months. Thus, there is
   clearly a need to identify new treatments with high efficacy and low
   toxicity profiles (e.g., by drug repurposing) for subsequent treatment
   of relapsed patients.

   In this study, we developed an integrated interactome-based, systems
   pharmacology approach, GPSnet, for the systematic identification of
   novel targets (in cancer type-specific disease modules) or for
   repurposing approved drugs for development of new therapeutic
   strategies. To do so, we incorporated large-scale patient transcriptome
   profiles, whole-exome sequencing, drug-target network, and drug-induced
   microarray data into the human protein–protein interactome. We showed
   that both experimentally validated genes and canonical cancer pathways
   were significantly enriched in the disease modules identified by
   GPSnet. In addition, gene expression of GPSnet-identified disease
   modules can predict drug responses with reasonable accuracy
   (Fig. [174]4). We also showed that disease modules identified in the
   human protein–protein interactome are more likely to be druggable than
   the disease genes identified by traditional statistical approaches.
   Furthermore, we computationally identified multiple potential
   anticancer indications for 140 approved drugs across 15 cancer types by
   combining network proximity and GSEA approaches implemented in GPSnet.
   Importantly, we experimentally validated that two approved non-cancer
   drugs (ouabain and niclosamide) showed potential anticancer activities
   in several NSCLC cell lines. Interestingly, ouabain’s targets do not
   show significant network proximity with SMGs identified from
   traditional statistics-based genome sequencing analysis in the human
   interactome (Supplementary Fig. [175]15), indicating the power of the
   GPSnet algorithm. In summary, GPSnet offers a powerful network-based
   framework to identify new therapeutic interventions through uniquely
   integrating patient sequencing data and the human interactome. If
   broadly applied, GPSnet could be applied to other diseases as well.

   Several potential limitations should be acknowledged. First, previous
   studies have shown that well-known disease genes often have high data
   bias in the literature-derived human protein–protein
   interactome^[176]11. For this reason, we tested the GPSnet algorithm in
   the unbiased, systematic Y2H interactome with low literature data bias,
   and found consistent results based on an integrated human interactome
   network that we used in this study (Supplementary Fig. [177]8). For
   drug target enrichment analysis, we found that known drug targets are
   enriched in the new disease modules identified by unbiased, systematic
   Y2H interactome across 4 four cancer types; however, the enrichments
   are not statistically significant (p-value ranges from 0.0667 to
   0.0984, permutation test, Supplementary Fig. [178]16). There are two
   possible explanations for this finding: (i) literature data bias, and
   (ii) network data incompleteness. The current Y2H unbiased interactome
   we used only includes 46,203 PPIs (links) connecting 8072 unique
   proteins (nodes). We, therefore, used a more comprehensive binary human
   interactome (including 98,240 PPIs connecting 12,994 unique proteins)
   published recently^[179]32. We found that drug targets are
   significantly enriched in the new disease modules identified from this
   more comprehensive binary human interactome network (Supplementary
   Fig. [180]17). Thus, building a high-quality, comprehensive, unbiased
   systematic interactome would improve further the performance of GPSnet.

   Here, we integrated both GSEA and network proximity approaches
   (Fig. [181]2c) for in silico drug repurposing by integrating physical
   drug-target interactions and functional drug-gene associations with the
   cancer type-specific disease modules identified by GPSnet into the
   human interactome. Owing to a lack of large-scale published available
   experimental data, whether combining GSEA and network proximity
   synergistically improves the performance of drug repurposing
   compared to single approaches (GSEA or network proximity) requires
   further evaluation. Drugs can inhibit or activate protein function
   (including antagonists vs. agonists), while disease alleles from
   genetic or genomic studies harbor loss-of-function or gain-of-function
   phenotypes. For example, an inhibitor that targets loss-of-function
   disease proteins (e.g., tumor suppressors) often causes adverse
   effects. Hence, integration of functional genomic assays (e.g.,
   CRISPR-Cas9) or large-scale disease gene expression profiles (e.g.,
   well-defined oncogenes or tumor suppressor genes) along with patient
   data (e.g., health insurance claims data) for validation and in vitro
   or in vivo mechanistic studies will improve in silico drug repurposing
   models in the future^[182]33. In this study, we experimentally
   validated that ouabain showed potential antitumor effects in NSCLC
   cells, consistent with several previous studies^[183]34,[184]35.
   Importantly, we demonstrated that ouabain may target
   HIF1α/LEO1-mediated cell metabolic pathways in NSCLC cells by
   integrating gene expression, metabolomics data, and network analysis.
   However, the regulation of HIF1α and LEO1 in NSCLC cell metabolism
   warrants future studies by several approaches, such as CHIP-seq.

   In summary, this study offers a powerful network-based methodology for
   drug repurposing, and validates experimentally ouabain and niclosamide
   as potential antitumor agents to treat NSCLC. Importantly, we showed
   that ouabain potentially targets the HIF1α/LEO1-mediated cell
   metabolism pathway in NSCLC cells. To the best of our knowledge, GPSnet
   is a highly innovative network-based methodology that integrates
   large-scale patient DNA/RNA-seq data with the human protein–protein
   interactome, enabling accelerated target identification and drug
   development in cancer and other diseases if broadly applied. In this
   manner, we can minimize the translational gap between genomic medicine
   studies and clinical outcomes, a significant goal in network medicine
   and  precision medicine.

Methods

Collection of whole-exome sequencing data

   We downloaded the tumor-normal pairwise somatic mutation data for
   patients from three sources: (1) Elledge Lab website at Harvard
   University ([185]http://elledgelab.med.harvard.edu/?page_id=689), (2)
   Sanger website:
   ([186]ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl), and (3)
   COSMIC: Catalog of Somatic Mutations in Cancer
   ([187]https://cancer.sanger.ac.uk/cosmic). To reduce redundancy and
   ensure the quality of somatic mutation data in this study, we only
   focused on the somatic mutations in TCGA tumor-normal matched samples
   from the aforementioned three datasets. The RNA-seq (read counts) data
   for 15 cancer types were downloaded from GDC website
   ([188]https://portal.gdc.cancer.gov/) for computing co-expression
   (measured by Pearson Correlation Coefficient) of PPI coding gene pairs.

Construction of drug-target network

   We assembled high-quality physical drug-target interactions for
   FDA-approved drugs from six commonly used data sources, and defined a
   physical drug-target interaction using reported binding affinity data:
   inhibition constant/potency (K[i]), dissociation constant (K[d]),
   median effective concentration (EC[50]), or median inhibitory
   concentration (IC[50]) ≤ 10 µM. Drug-target interactions were acquired
   from the DrugBank database (v4.3)^[189]36, the Therapeutic Target
   Database (TTD, v4.3.02)^[190]37, and the PharmGKB database^[191]38.
   Specifically, bioactivity data of drug-target pairs were collected from
   three commonly used databases: ChEMBL (v20)^[192]39, BindingDB^[193]40,
   and IUPHAR/BPS Guide to PHARMACOLOGY^[194]41. After extracting the
   bioactivity data related to the drugs from the prepared bioactivity
   databases, only those items meeting the following four criteria were
   retained: (i) binding affinities, including K[i], K[d], IC[50], or
   EC[50], ≤10 μM; (ii) proteins represented by unique UniProt accession
   number; (iii) proteins marked as “reviewed” in the UniProt
   database^[195]42; and (iv) proteins of human origin.

Building the human protein–protein interactome

   To build the comprehensive human protein–protein interactome as
   currently available, we assembled 15 commonly used databases with
   multiple experimental sources of evidence and the in-house systematic
   human protein–protein interactome: (1) binary PPIs tested by
   high-throughput yeast-two-hybrid (Y2H) systems in which we combined two
   publicly available high-quality Y2H datasets^[196]11,[197]43, and one
   dataset available from our website:
   [198]http://ccsb.dana-farber.org/interactome-data.html; (2)
   kinase-substrate interactions by literature-derived low-throughput and
   high-throughput experiments from KinomeNetworkX^[199]44, Human Protein
   Resource Database (HPRD)^[200]45, PhosphoNetworks^[201]46,[202]47,
   PhosphositePlus^[203]48, DbPTM 3.0^[204]49, and Phospho.ELM^[205]50;
   (3) carefully literature-curated PPIs identified by affinity
   purification followed by mass spectrometry (AP-MS), and by
   literature-derived low-throughput experiments from BioGRID^[206]51,
   PINA^[207]52, HPRD^[208]45, MINT^[209]53, IntAct^[210]54, and
   InnateDB^[211]55; (4) high-quality PPIs from protein three-dimensional
   (3D) structures reported in Instruct^[212]56; and (5) signaling
   networks by literature-derived low-throughput experiments as annotated
   in SignaLink2.0^[213]57. The genes were mapped to their Entrez ID based
   on the NCBI database^[214]58 as well as their official gene symbols
   based on GeneCards ([215]http://www.genecards.org/). Inferred data,
   such as evolutionary analysis, gene expression data, and metabolic
   associations, were excluded. The updated human interactome constructed
   in this way includes 170,000 protein–protein interactions (PPIs) (edges
   or links) connecting 15,474 unique proteins (nodes). The unbiased,
   systematic human interactome^[216]11,[217]43 was downloaded from our
   website ([218]https://ccsb.dana-farber.org/interactome-data.html).

Description of GPSnet

   The GPSnet algorithm contains two main components: cancer type-specific
   disease module identification, and in silico drug repurposing.
   Under our assumptions, the disease module should be highly connected in
   the cancer type-specific co-expressed PPI networks derived from RNA-seq
   data, and the genes in the module should tend to be the highly-mutated
   genes. We use the random searching method to identify the disease
   module in GPSnet.

   We first set the initial score of each gene i in each cancer type as
   [MATH:
   <msub><mrow><mi>s</mi></mrow><mrow><mn>0</mn></mrow></msub><mrow><mo>(<
   /mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mo>=</mo><mfrac><mrow><mi>
   m</mi><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mrow><m
   row><mi>l</mi><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow><
   /mrow></mfrac> :MATH]
   , where m(i) is the mutation frequency of gene i in the corresponding
   cancer type, and l(i) is the cDNA length of gene i. In order to
   eliminate the influence of the somatic mutation data that are sparse,
   the network smoothing method is used to transmit the score across the
   whole network.

   The random walk with restart process (RWR) is applied to calculate the
   smoothing gene score. Consider a random walker starting from gene i,
   who will move to a random neighbor with probability
   [MATH: <mrow><mo>(</mo><mrow><mn>1</mn><mo>-</mo><mi
   mathvariant="normal">α</mi></mrow><mo>)</mo></mrow> :MATH]
   or returns to gene i with probability α at each iterative time step,
   where
   [MATH: <mi
   mathvariant="normal">α</mi><mo>∈</mo><mrow><mo>[</mo><mrow><mn>0</mn><m
   space width="0.25em"></mspace><mn>1</mn></mrow><mo>]</mo></mrow> :MATH]
   is the parameter that drives the restart probability of the random walk
   process. Here we use α = 0.5 to balance the degree bias (Supplementary
   Figs. [219]18–[220]21). Denote
   [MATH: <mover
   accent="true"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>t</mi></mrow
   ></msub></mrow><mo> →</mo></mover> :MATH]
   as the score vector at step t, and the propagation process can be
   described as:
   [MATH: <mover
   accent="true"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>t</mi><mo>+<
   /mo><mn>1</mn></mrow></msub></mrow><mo>
   →</mo></mover><mo>=</mo><mrow><mo>(</mo><mrow><mn>1</mn><mo>-</mo><mi>α
   </mi></mrow><mo>)</mo></mrow><mi>W</mi><mover
   accent="true"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>t</mi></mrow
   ></msub></mrow><mo> →</mo></mover><mo>+</mo><mi>α</mi><mover
   accent="true"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mn>0</mn></mrow
   ></msub></mrow><mo> →</mo></mover><mo>,</mo> :MATH]
   4

   where
   [MATH: <mover
   accent="true"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mn>0</mn></mrow
   ></msub></mrow><mo> →</mo></mover> :MATH]
   is the vector of each gene’s initial score, and W is the transfer
   matrix with
   [MATH:
   <msub><mrow><mi>W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>k</mi><mrow><mo>(</mo><
   mrow><mi>j</mi></mrow><mo>)</mo></mrow></mrow></mfrac> :MATH]
   if gene i interacts with gene j, otherwise
   [MATH:
   <msub><mrow><mi>W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><mn>0</mn> :MATH]
   (with
   [MATH:
   <mi>k</mi><mrow><mo>(</mo><mrow><mi>j</mi></mrow><mo>)</mo></mrow>
   :MATH]
   the number of neighbors of gene j in the network). The theoretical
   solution is:
   [MATH: <mover
   accent="true"><mrow><mi>s</mi></mrow><mo>→</mo></mover><mo>=</mo><mi>α<
   /mi><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><mn>1</mn><mo>-</mo><mfenced close=")" open="("
   separators=""><mrow><mn>1</mn><mo>-</mo><mi>α</mi></mrow></mfenced><mi>
   W</mi></mrow></mfenced></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup><
   mover
   accent="true"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mn>0</mn></mrow
   ></msub></mrow><mo> →</mo></mover> :MATH]
   5

   with the ith element of
   [MATH: <mover accent="true"><mrow><mi>s</mi></mrow><mo>→</mo></mover>
   :MATH]
   the smoothing score of gene i.

   The module is defined as a sub-graph within the cancer type-specific
   co-expressed PPI network, and the score of module M is calculated
   according to Eq. [221]1, where m represents the number of genes in
   module M and μ is the average score over the whole gene set for the
   corresponding cancer type. The following steps are used to perform the
   random search process to generate a raw module. Initially, a gene is
   randomly selected as the “seed” gene. We denote Γ[M] as the set of the
   genes that interact with module M in the human interactome. For each
   gene
   [MATH:
   <mi>i</mi><mo>∈</mo><msub><mrow><mi>Γ</mi></mrow><mrow><mi>M</mi></mrow
   ></msub> :MATH]
   , we calculate the connectivity significance in the cancer
   type-specific PPI network by Eq. [222]2.

   For each gene
   [MATH:
   <mi>i</mi><mo>∈</mo><msub><mrow><mi>Γ</mi></mrow><mrow><mi>M</mi></mrow
   ></msub> :MATH]
   , we calculate the expanded module score if gene i is added to the
   module in Eq. [223]3. One raw module is obtained by repeating the
   searching steps until no more genes can be added to the corresponding
   module. In this study, we built approximately 60,000 raw modules for
   each cancer type. In this way, each gene in the human interactome was
   randomly selected approximately five times. We removed the raw modules
   with less than 10 genes, and collected the top 1% (approximately 300)
   of modules with the highest module scores, indicating gene confidence
   by calculating how many times each gene appears in these modules.
   Finally, we selected the genes with gene confidence values larger than
   0.5% and assembled the largest connected component^[224]7 among these
   genes (removing isolated nodes) in the cancer type-specific
   co-expressed PPI network as the final disease module for each cancer
   type. The detailed description of GPSnet is provided in the
   Supplementary Note [225]3 and Supplementary Fig. [226]4.

Pharmacogenomics models

   We downloaded gene bulk expression profiles and drug response data
   (defined by IC[50] value) across cancer cell lines from the GDSC
   database^[227]17. We built regression models to predict drug’s IC[50]
   value using the LIBSVM^[228]59 R package with default parameters and
   linear kernels. Specifically, the GPSnet-identified disease modules
   with RMA gene expression profiles of cancer cell lines were transformed
   into a matrix with each column of this matrix denoting a feature vector
   and each row denoting a cancer cell line from the GDSC database.
   Finally, the ROC curves were plotted using the R package.

Gene-set enrichment analysis (GSEA)

   We collected drug-gene signatures from the Connectivity Map (CMap,
   build 02)^[229]23. The CMap comprises over 7000 gene expression
   profiles from cultured human cell lines treated with various small
   bioactive molecules (1309 total), at different concentrations, covering
   6100 individual instances. The CMap thus provides a measure of the
   extent of differential expression for a given probe set. The amplitude
   (a) is defined as follows:
   [MATH:
   <mi>a</mi><mo>=</mo><mfrac><mrow><mi>t</mi><mo>-</mo><mi>c</mi></mrow><
   mrow><mrow><mo>(</mo><mrow><mi>t</mi><mo>+</mo><mi>c</mi></mrow><mo>)</
   mo></mrow><mo>∕</mo><mn>2</mn></mrow></mfrac><mo>,</mo> :MATH]
   6

   where t is the scaled and threshold average difference value for the
   drug treatment group and c is the threshold average difference value
   for the control group. Thus, a = 0 indicates no differential
   expression, a > 0 indicates increased expression (upregulation) upon
   treatment, and a < 0 indicates decreased expression (downregulation)
   upon treatment. For example, an amplitude of 0.67 represents a two-fold
   induction^[230]60. Drug gene signatures with amplitudes >0.67 were
   defined as upregulated drug-gene pairs, and amplitudes <− 0.67
   reflected downregulated drug-gene pairs. We then mapped probe sets to
   the human genome. In total, we compiled 1,874,674 drug-gene pairs from
   the CMap, connecting 1309 drugs and 12,768 genes.

   For each drug-disease pair, we counted the number of module genes in a
   particular cancer type identified by GPSnet, and those that are
   upregulated or downregulated by drug treatments, as well as overlap or
   mutually exclusive pairs. Here, we used P < 0.05 as a cutoff to
   identify significant drug-disease associations (Supplementary
   Data [231]5).

Network proximity

   Given S, the set of disease proteins, and T, the set of drug targets,
   d(s,t), the closest distance measured by the average shortest path
   length between node s and the nearest disease protein t in the human
   protein–protein interactome, is defined as:
   [MATH:
   <mi>d</mi><mrow><mo>(</mo><mrow><mi>S</mi><mo>,</mo><mi>T</mi></mrow><m
   o>)</mo></mrow><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mfenced
   close="∥" open="∥"
   separators=""><mrow><mi>T</mi></mrow></mfenced></mrow></mfrac><munder><
   mrow><mo>∑</mo></mrow><mrow><mi>t</mi><mo>∈</mo><mi>T</mi></mrow></mund
   er><mspace width="0.25em"></mspace><msub><mrow><mi
   mathvariant="normal">min</mi></mrow><mrow><mi>s</mi><mo>∈</mo><mi>S</mi
   ></mrow></msub><mi>d</mi><mrow><mo>(</mo><mrow><mi>s</mi><mo>,</mo><mi>
   t</mi></mrow><mo>)</mo></mrow> :MATH]
   7

   To evaluate the significance of the network distance between a drug and
   a given disease, we constructed a reference distance distribution
   corresponding to the expected distance between two randomly selected
   groups of proteins of the same size and degree distribution as the
   original disease proteins and drug targets in the network. This
   procedure was repeated 1000 times. The mean
   [MATH: <mover accent="true"><mrow><mi>d</mi></mrow><mo>¯</mo></mover>
   :MATH]
   and standard deviation (
   [MATH: <msub><mrow><mi>σ</mi></mrow><mrow><mi>d</mi></mrow></msub>
   :MATH]
   ) of the reference distribution were used to calculate a z-score (
   [MATH: <msub><mrow><mi>z</mi></mrow><mrow><mi>d</mi></mrow></msub>
   :MATH]
   ) by converting an observed (non-Euclidean) distance to a normalized
   (non-Euclidean) distance (Supplementary Data [232]5). The detailed
   description of network proximity was provided in our recent
   study^[233]8.

Pathway enrichment analysis

   We used ClueGO^[234]61 for enrichment analysis of genes in the
   canonical KEGG pathways. A hypergeometric test was performed to
   estimate statistical significance, and all P values were adjusted for
   multiple testing using Bonferroni’s correction (adjusted P values).

Cell culture

   All cells were incubated at 37 °C in a humidified incubator with 5%
   CO[2]. NSCLC cell lines A549, H522, H596, H1975, HCC827, and PC9 were
   obtained from the American Type Culture Collection (Manassas, VA) and
   cultured in Roswell Park Memorial Institute (RPMI) 1640 Medium
   supplemented with 10% fetal bovine serum (FBS, Gibco), and
   penicillin-streptomycin. Lung normal cell lines MRC5 and WI38 were
   obtained from the Shanghai Cell Bank of the Chinese Academy of Sciences
   (Shanghai, China) and maintained in Eagle’s Minimum Essential Medium
   (EMEM) supplemented with 10% fetal bovine serum (FBS, Gibco) and
   penicillin-streptomycin (Supplementary Table [235]1). Cell lines were
   subjected to a mycoplasma detection test and authenticated by short
   tandem repeat (STR) profiling.

Cell viability assay

   Cell viability assay was conducted as previously described^[236]62. In
   brief, 3000–5000 cells/well were seeded in 96-well plates for 12 h, and
   then incubated with the indicated compounds for 48 h. Cell viability
   was detected using CellTiter 96 AQueous One Solution from Promega
   (Madison, WI), according to the manufacturer’s protocol. IC[50] values
   were calculated from dose-response curves using Graphpad Prism 7
   (GraphPad Software).

Colony formation

   A549 or H522 cells were seeded into 6-well plates at 1500 cells per
   well in 2 ml of 1640 medium supplemented with 10% FBS and
   penicillin-streptomycin. After cell adherence, various concentrations
   of ouabain were incubated with cells. The medium was changed every 2
   days. After 7 days, colonies were fixed in 4% paraformaldehyde and
   stained with 0.2% crystal violet.

Western blotting and antibodies

   Cell were lysed with buffer containing 100 mM Tris-HCl (pH 7.5),150 mM
   NaCl, 1 mM EDTA, 0.1% SDS, 1% sodium deoxycholate, and 1% Triton X-100
   with protease inhibitors (Roche, San Francisco, CA) and phosphatase
   inhibitors (Santa Cruz, Dallas, TX). Protein concentration was
   determined using the bicinchoninic acid assay (BCA) (Thermo Fisher,
   Waltham, MA). Equal amounts of proteins samples (approximately 50 μg)
   were run on SDS-polyacrylamide gel electrophoresis and transferred to
   nitrocellulose membranes (Millipore, Burlington, MA). The membranes
   were blocked in TBST (10 mM Tris-HCl pH = 8, 150 mM NaCl, 0.1% Tween)
   with 5% BSA and probed with primary antibodies and corresponding
   fluorescence-conjugated secondary antibodies. Antibodies (Supplementary
   Table [237]2) used included: anti-GAPDH (Catalog No.: AB0037, 1:10,000
   dilution) and anti-HIF1α (Catalog No.: CY5197, 1:1000 dilution),
   purchased from Abways Technology (Shanghai, China); and anti-LEO1
   (Catalog No.: ab75721, 1:100 dilution), purchased from Abcam
   (Cambridge, MA). Uncropped images of Figs. [238]6c, [239]7d are
   included as Supplementary Figs. [240]22 and [241]23.

Real-time quantitative PCR (RT-qPCR)

   Total RNA was isolated using the Trizol reagent (Invitrogen, Carlsbad,
   CA) according to the manufacturer’s protocol. cDNA synthesis was
   performed using the ReverTra Ace qPCR RT Master Mix (TOYOBO, OSAKA,
   Japan). qPCR reactions were conducted using SYBR Green Real-Time PCR
   Master Mixes (Thermo Fisher, Waltham, MA) on CFX-96TM (Bio-Rad,
   Hercules, CA). The amount of each gene was detected and normalized by
   the amount of GAPDH. The Ct values were generated using the standard
   curve method. The sequences of primers used in RT-qPCR experiments
   (Supplementary Table [242]3) are as follows: HIF1A: Forward primer:
   5′-ACTCAGGACACAGATTTAGACTTG-3′. Reverse primer:
   5′-TGGCATTAGCAGTAGGTTCTTG-3′.

   LEO1: Forward primer: 5′-AGAAGCGGATAGTGACACTGAGGT-3′.

   Reverse primer: 5′-TTCATCAACAGGCTGTCCTGGAGT-3′. pLKO.1-puro vectors
   encoding shRNA targeting HIF1A were purchased from Synbio Technologies.
   shHIF1A-1 sequence:
   CCGGGTGATGAAAGAATTACCGAATCTCGAGATTCGGTAATTCTTTCATCACTTTTT; shHIF1A-2
   sequence: CCGGTGCTCTTTGTGGTTGGATCTACTCGAGTAGATCCAACCACAAAGAGCATTTTT.
   HIF1A knockdown was confirmed by a RT-qPCR.

Transcription factor network analysis

   We collected the 719 putative reported HIF1A transcription factor
   targets (Supplementary Data [243]6) from two previous
   studies^[244]63,[245]64. To inspect the potential function of HIF1A
   transcription factor in LUAD, we calculated the differential expression
   of these 719 genes (Supplementary Data [246]6) by utilizing the RNA-seq
   read count data in LUAD patient’s tumor samples compared to the matched
   normal samples from TCGA using DESeq2^[247]65. We used the adjusted
   P-value less than 0.05 to define the differentially expressed genes.

Correlation between metabolite abundance and gene expression

   We collected [^13C]serine or [^13C]glycine abundance tested in ~70
   NSCLC cell lines from a previous study^[248]30. We next collected
   genome-wide gene expression profiles on NSCLC cell lines from the CCLE
   database^[249]29. The correlation between metabolite abundance and gene
   expression level (Supplementary Data [250]6) was computed by Pearson
   correlation coefficient measure, and P-values were computed by the
   F-statistics using the R platform (v3.01,
   [251]http://www.r-project.org/).

Tissue-specific subnetwork analysis

   We downloaded the RNA-seq data (RPKM value) of 32 tissues from GTEx V6
   release ([252]https://gtexportal.org/home/). For each tissue (e.g.,
   lung), we regarded those genes with RPKM ≥1 in more than 80% of samples
   as tissue-expressed genes and the remaining genes as
   tissue-unexpressed. To quantify the expression significance of
   tissue-expressed gene i in tissue t, we calculated the average
   expression E(i) and the standard deviation
   [MATH:
   <msub><mrow><mi>δ</mi></mrow><mrow><mi>E</mi></mrow></msub><mrow><mo>(<
   /mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow> :MATH]
   of a gene’s expression across all considered tissues. The significance
   of gene expression in tissue t is defined as
   [MATH:
   <msub><mrow><mi>z</mi></mrow><mrow><mi>E</mi></mrow></msub><mrow><mo>(<
   /mo><mrow><mi>i</mi><mo>,</mo><mi>t</mi></mrow><mo>)</mo></mrow><mo>=</
   mo><mrow><mo>(</mo><mrow><mi>E</mi><mrow><mo>(</mo><mrow><mi>i</mi><mo>
   ,</mo><mi>t</mi></mrow><mo>)</mo></mrow><mo>-</mo><mi>E</mi><mrow><mo>(
   </mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mrow><mo>)</mo></mrow><m
   o>∕</mo><msub><mrow><mi>δ</mi></mrow><mrow><mi>E</mi></mrow></msub><mro
   w><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow><mi>s</mi> :MATH]
   as described previously^[253]8. For LUAD, we built a lung-specific
   protein–protein interaction network by comparing genome-wide expression
   profiles of lung to 31 other different tissues from GTEx.

Survival analysis

   We downloaded the microarray data and survival profiles on 226 primary
   human stage I–II lung adenocarcinomas ([254]GSE31210)^[255]66. Patients
   were grouped into the top 50% lowly expressed (blue) versus top 50%
   highly expressed (red) groups based on the normalized expression level.
   The P-value for Kaplan–Meier survival analysis was determined using a
   log-rank test in the GraphPad Prism 7 software.

Statistical analysis

   The data shown in the study were obtained from at least three
   independent experiments and all data in different experimental groups
   were expressed as the mean ± the standard error of the mean (SEM).
   Differences between the two groups were analyzed by Student’s t-test
   and P values as indicated in the figure legends. *P < 0.05 was
   considered statistically significant. Statistical analyses were
   conducted using GraphPad Prism 7 software.

Reporting summary

   Further information on research design is available in the [256]Nature
   Research Reporting Summary linked to this article.

Supplementary information

   [257]Supplementary Information^ (5.8MB, pdf)
   [258]41467_2019_10744_MOESM2_ESM.pdf^ (59.9KB, pdf)

   Description of Additional Supplementary Files
   [259]Supplementary Data 1^ (90.7KB, xlsx)
   [260]Supplementary Data 2^ (91.7KB, xlsx)
   [261]Supplementary Data 3^ (177.2KB, xlsx)
   [262]Supplementary Data 4^ (17.9KB, xlsx)
   [263]Supplementary Data 5^ (139.5KB, xlsx)
   [264]Supplementary Data 6^ (102.9KB, xlsx)
   [265]Reporting Summary^ (68.8KB, pdf)

Acknowledgements