Highlights

     * Protocol to evaluate neoPPI functions and clinical significance in
       cancer using AVERON
     * Steps to estimate and compare neoPPI levels across different cancer
       patient cohorts
     * Procedures for uncovering neoPPI-regulated genes and mapping them
       on defined pathways
     * Discovery of druggable clinically significant neoPPI-regulated
       genes
     __________________________________________________________________

   Publisher’s note: Undertaking any experimental protocol requires
   adherence to local institutional guidelines for laboratory safety and
   ethics.
     __________________________________________________________________

   While some tumor driver mutations inhibit existing protein-protein
   interactions (PPIs), others can create neomorph interactions (neoPPIs)
   not characteristic of the wild-type counterparts. Such tumor-specific
   neoPPIs may represent targets for therapeutic interventions. Here, we
   present a protocol to computationally uncover neoPPI-enabled druggable
   tumor dependencies using the AVERON Notebook environment. We describe
   steps for determining PPI levels, identifying clinically significant
   neoPPIs, and determining neoPPI-regulated pathways. We then detail
   procedures for determining neoPPI-regulated therapeutically actionable
   targets.

Before you begin

   Oncogenic mutations in tumor-driver genes are among the key events in
   cancer initiation and progression.^2,3 Such mutations can change
   protein structure, function, and cellular localization, ultimately
   rewiring protein-protein interaction (PPI) networks and dysregulating
   signaling and metabolic pathways. Discovery of the most clinically and
   biologically significant mutant-directed neomorph PPIs (neoPPIs) that
   drive cancer is vital for developing new personalized clinical
   strategies.^4,5 However, experimental interrogation of neoPPI
   functions and therapeutic potential is highly challenging even in
   cell-based models and is currently not feasible in cancer patients. To
   address this challenge, we developed a computational platform, termed
   AVERON Notebook, to discover Actionable Vulnerabilities Enabled by
   Rewired Oncogenic Networks.^1 Implemented in the widely used Jupyter
   Notebook format, AVERON Notebook enables (1) systematic profiling of
   the association between poor clinical outcomes and neoPPI levels,
   rather than the status of individual genes; (2) uncovering of the
   molecular mechanisms of neoPPI-driven tumorigenesis; and (3)
   identification of druggable and therapeutically significant
   neoPPI-regulated genes to inform new biological models and
   personalized therapeutic strategies in mutant-driven cancers.

   The protocol below describes the features and capabilities of the
   AVERON Notebook and gives step-by-step instructions for each analysis
   it supports.

Software installation

     Timing: 15 min

     * 1.
       Download and install Jupyter Lab or Jupyter Notebook.

     Note: To run the AVERON Notebook, Jupyter Notebook or Jupyter Lab is
     needed. We recommend installing Jupyter Lab as part of Anaconda from
     https://www.anaconda.com/download.

     * 2.
       Check and install the Python packages required by the AVERON
       functions, referring to the Jupyter Lab, conda, and pip
       documentation for package installation instructions:
          + a.
            https://jupyterlab.readthedocs.io/en/stable/user/index.html
          + b.
            https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html
          + c.
            https://packaging.python.org/en/latest/tutorials/installing-packages/#ensure-you-can-run-pip-from-the-command-line

     Note: The AVERON Notebook was tested with Jupyter Lab 3.6.3
     installed with Anaconda 2.5.0. After a fresh installation, the
     following packages had to be installed: ipycytoscape, lifelines,
     scikit_posthocs, py4cytoscape, and pygtop==2.1.4.

     * 3.
       Install Cytoscape software^6 for network visualization from
       https://cytoscape.org/download-platforms.html.
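
   Before opening the notebook, it can be useful to confirm that the
   packages named in the Note above are visible to the Python
   environment. The snippet below is a minimal sketch using only the
   standard library; it assumes that each package's import name matches
   the name listed in the Note, and the helper find_missing is
   hypothetical (it is not part of AVERON).

```python
import importlib.util

# Import names of the packages listed in the Note (assumed to be identical
# to the names used in the notebook's import cell).
REQUIRED = ["ipycytoscape", "lifelines", "scikit_posthocs",
            "py4cytoscape", "pygtop"]


def find_missing(packages):
    """Return the packages that cannot be located in the current environment."""
    return [name for name in packages
            if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

   Any package reported as missing can then be installed with conda or
   pip as described in the documentation linked in step 2.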

Download associated files

     Timing: 1 min

     * 4.
       Download the AVERON Notebook repository from GitHub:
       https://github.com/aivanovlab/averon_notebook.

Loading Python general environment

     Timing: 5 min

     * 5.
       Run Jupyter Lab.
     * 6.
       In Jupyter Lab, open “averon_notebook.ipynb” located in the
       AVERON/notebook folder.
     * 7.
       Import essential dependencies and set up the general environment.
     * 8.
       Set the path to the AVERON parent folder and execute the following
       code:

   #Make sure the parent_folder is set to the root AVERON folder
   %reset
   parent_folder = "C:\\AVERON\\"

     Note: When prompted with “Once deleted, variables cannot be
     recovered. Proceed (y/[n])?”, type “y” in the text box to confirm
     the reset of all variables.
     * 9.
       Execute the #Import essentials cell to import required packages:

   #Import essentials
   import os, sys, importlib, warnings
   import requests
   import ipycytoscape, pygtop, lifelines
   import pandas as pd
   import matplotlib
   import matplotlib.pyplot as plt
   import seaborn as sns
   import ipywidgets as widgets
   from IPython.display import display, HTML

   #Make the AVERON source importable
   os.chdir(parent_folder)
   sys.path.insert(0, os.path.abspath('src'))
   import averon as av

   #Display and plotting settings
   warnings.filterwarnings('ignore')
   pd.set_option("display.max_rows", None)
   pd.set_option('display.max_columns', None)
   plt.rcParams['pdf.fonttype'] = 42
   plt.rcParams['ps.fonttype'] = 42
   %matplotlib inline
     * 10.
       Define the folders and the input files.

   #Define folders
   #Set output folders:
   project_folder = parent_folder+"Projects"+"/"
   #Folder with all input files:
   data_folder = parent_folder+"input"+"/"
   #Pathway folder:
   pathway_folder = data_folder+"pathway_genesets/MSigDB/"

   #Set input files
   genomics_clinical_folder = data_folder+"genomics_and_clinical/"
   #HGNC gene annotations from https://www.genenames.org/
   hgnc_f = genomics_clinical_folder + "hgnc5.txt"
   #Mutation data from https://gdc.cancer.gov/about-data/publications/pancanatlas
   mut_f = genomics_clinical_folder +"braf_v600e.maf"
   #mRNA expression data from https://gdc.cancer.gov/about-data/publications/pancanatlas
   exp_f = (genomics_clinical_folder +
    "EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.refined.tsv")

   #clinical_f and uuid_f must already be defined (as in the notebook)
   #before the params dictionary is filled
   params = {}
   params['projects_folder']=project_folder
   params['mut_f']=mut_f
   params['exp_f']=exp_f
   params['clinical_f']=clinical_f
   params['uuid_f']=uuid_f
   params['hgnc_f']=hgnc_f
   params['data_folder']=data_folder
   params['pathway_folder']=pathway_folder
   params['genomics_clinical_folder']=genomics_clinical_folder
     * 11.
       Check if all required files and folders are in place.

   #Check files & folders

   av.check_files(params)

     Note: The mRNA expression file
     “EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv” is not
     provided as part of the AVERON Notebook. It should be downloaded
     from
     https://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611
     and pre-processed by executing the #Download TCGA expression data
     cell. This step needs to be done only once and can be skipped in
     subsequent AVERON Notebook runs.

   #Download TCGA expression data

   exp_f = av.download_and_refine_tcga_exp_data(params)
     * 12.
       Provide the project name to create a new project.

   #Provide a new project name:
   project_name = "Project1"
   params = av.new_project(project_name,project_folder,params)
   tbl_folder = params['tbl_folder']
   net_folder = params['net_folder']
   fig_folder = params['fig_folder']
     * 13.
       Load cancer patient IDs, mRNA expression data, and the coding gene
       names by executing #Get TCGA Patient IDs, #Get genes with mRNA
       expression data, and #Get protein coding genes cells.

   #Get TCGA Patient IDs
   barcode_df = av.prepare_cancer_barcodes(clinical_f)
   cancers = barcode_df.columns.tolist()
   params['barcode_df']=barcode_df
   barcode_df.head(3)

   #Get genes with mRNA expression data
   genes_with_expression=av.get_mRNA_expession(exp_f)
   print("A total of ",len(genes_with_expression),
    "genes with expression data")

   #Get protein coding genes
   hgnc_df = pd.read_csv(hgnc_f,sep='\t',index_col = 0)
   hgnc_df.set_index('Approved symbol',inplace=True)
   coding_genes = av.get_coding_genes(hgnc_df.copy()).index.values
   params['coding_genes']=coding_genes
   print("There is a total of",len(coding_genes),"coding genes")
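
   The folder and file definitions in step 10, and the check in step 11,
   amount to building paths and confirming that they exist. The sketch
   below illustrates that idea with the standard library only. It is not
   the implementation of av.check_files; the helper missing_paths and
   the toy folder layout are hypothetical.

```python
import os
import tempfile


def missing_paths(params, keys):
    """Return {key: path} for every listed entry that does not exist on disk."""
    return {k: params[k] for k in keys if not os.path.exists(params[k])}


# Toy layout in a temporary folder; a real run would pass the params
# dictionary built in step 10 instead.
with tempfile.TemporaryDirectory() as root:
    data_folder = os.path.join(root, "input")
    os.makedirs(data_folder)
    demo_params = {
        "data_folder": data_folder,
        # This file is deliberately not created, so it is reported.
        "hgnc_f": os.path.join(data_folder, "hgnc5.txt"),
    }
    report = missing_paths(demo_params, ["data_folder", "hgnc_f"])

print("Missing entries:", sorted(report))
```

   Building paths with os.path.join, rather than concatenating strings
   with "/" as in the notebook cell, avoids mixed separators when the
   parent folder is given in Windows form.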

Key resources table

   REAGENT or RESOURCE | SOURCE | IDENTIFIER
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   Mutation-directed neo-protein-protein interactions | Mo et al.^4 | https://doi.org/10.1016/J.CELL.2022.04.014
   MSigDB | Liberzon et al.^7 | https://www.gsea-msigdb.org/gsea/msigdb
   KEGG | Kanehisa et al.^8 | https://www.genome.jp/kegg
   REACTOME | Fabregat et al.^9 | https://reactome.org
   IUPHAR database | Harding et al.^10 | https://www.guidetopharmacology.org
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | ACC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | BLCA
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | BRCA
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | CESC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | CHOL
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | COAD
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | DLBC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | ESCA
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | GBM
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | HNSC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | KICH
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | KIRC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | KIRP
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LAML
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LGG
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LIHC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LUAD
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LUSC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | MESO
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | OV
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | PAAD
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | PCPG
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | PRAD
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | READ
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | SARC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | SKCM
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | STAD
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | TGCT
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | THCA
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | THYM
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | UCEC
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | UCS
   TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | UVM
   Ligand-target interactions | IUPHAR/BPS Guide to PHARMACOLOGY database | https://blog.guidetopharmacology.org/2024/03/27/database-release-2024-1/
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   AVERON Notebook | Chen et al.^1 | https://github.com/aivanovlab/averon_notebook; https://doi.org/10.5281/zenodo.13926943
   Python 3 | Python Software Foundation | https://www.python.org
   Jupyter Notebook environment | Ragan-Kelley et al.^11 | www.jupyter.org
   MSigDB | Liberzon et al.^7 | https://www.gsea-msigdb.org/gsea/msigdb
   Cytoscape | Shannon et al.^6 | https://cytoscape.org

Step-by-step method details

Determine PPI levels in cancer samples

     Timing: 6 min

   In this step, we determine PPI levels across cancer samples. Both
   mutant and wild-type PPI levels are determined by calculating the
   corresponding PPI scores.
     * 1.
       Define the mutant tumor driver gene.
          + a.
            In the #Define the mutant driver cell, provide the standard
            gene symbol for the tumor driver gene of interest in the
            driver_gene variable.
          + b.
            Define a single mutation or list several mutations in the
            driver_mut variable.
          + c.
            Use “ALL” to consider all driver mutations.

     Note: Here, we will use BRAF V600E mutant as an example.

   #Define the mutant driver
   driver_gene = 'BRAF'
   driver_mut = ['p.V600E']
   #driver_mut can be a list of point mutations or
   #driver_mut = 'ALL' for all mutants
   params['driver_gene']=driver_gene
   params['driver_mut']=driver_mut
   if driver_gene not in genes_with_expression:
       print("Gene not found")
     * 2.
       Determine mutation frequency across cancer types.
          + a.
            Build and save a graph with the frequency of the specified
            mutation by executing the #Mutation frequency cell
            (Figure 1A).
            #Mutation frequency
            fig,cancers_df=av.get_cancer_mutation_freq(barcode_df,mut_f,
             driver_gene,driver_mut,cancers)
            #Save the graph
            fig.savefig(fig_folder+av.gen_filename(driver_gene,driver_mut,
             "_frequency_across_cancers",".png"), dpi=600,format='png')
          + b.
            Get the mutation frequency in a tabular format (Figure 1B)
            by executing the #Show mutation frequency and save the table
            cell. The data will be saved to the Project/Tables folder.
            #Show mutation frequency and save the table
            tbl_file = tbl_folder+av.gen_filename(driver_gene,driver_mut,
             "_frequency_across_cancers",".tsv")
            cancers_df.to_csv(tbl_file,sep='\t')
     * 3.
       Define binding partners.
          + a.
            Execute the #Load partners cell to load all binding partners
            from an external file located in the AVERON/input folder. In
            this tutorial, we use the BRAF_neo_partners_example.txt file
            that contains three BRAF V600E neoPPI partners, one per line:
               o i.
                 AURKA.
               o ii.
                 RAB25.
               o iii.
                 CDK4.
            #Load partners
            ppi_file = data_folder + "BRAF_neo_partners_example.txt"
            partners = av.load_partners(ppi_file)
            params['partners']=partners
          + b.
            Alternatively, use the #Enter partners manually cell to
            specify binding partners by populating a list named
            “partners”:
            #Enter partners manually
            partners = ['AURKA', 'CDK4', 'RAB25']
            for partner in partners:
                if partner not in genes_with_expression:
                    print(partner,"not in MUT/EXP dataset. Check gene name.")
            params['partners']=partners

     Note: If a binding partner lacks mRNA expression data in the
     database, a message of the form “<gene> not in MUT/EXP dataset.
     Check gene name.” will be displayed.
     * 4.
       To get detailed annotations for the binding partners, execute
       the #Get binding partner annotations cell. It annotates the
       proteins using the information available from the HGNC
       database^12 (Figure 2) and saves the result to the
       Project/Tables folder:

   #Get binding partner annotations
   partner_hgnc_df = av.get_partner_info(partners, hgnc_df)
   partner_hgnc_df.to_csv(tbl_folder+"binding_partners.csv",sep=',')
   partner_hgnc_df.head(3)
     * 5.
       Define the cancer type for the analysis.

     Note: The standard TCGA cancer type abbreviations should be
     used. Figure 1 can be used as a reference for available cancer
     types.

   #Indicate the cancer type

   cancer = 'SKCM'
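
   The mutation-frequency tally from step 2 groups samples of each
   cancer type into wild-type, target-mutation, and other-mutation
   classes (the three bar colors in Figure 1A). The sketch below shows
   one way such a tally could be computed with the standard library. It
   is not the implementation of av.get_cancer_mutation_freq: the records,
   sample barcodes, and helper names are invented for illustration, and
   only the column names follow the MAF convention.

```python
from collections import Counter

# Toy MAF-like records (made-up samples; column names follow the MAF format).
records = [
    {"Hugo_Symbol": "BRAF", "HGVSp_Short": "p.V600E", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "BRAF", "HGVSp_Short": "p.K601E", "Tumor_Sample_Barcode": "S2"},
    {"Hugo_Symbol": "TP53", "HGVSp_Short": "p.R175H", "Tumor_Sample_Barcode": "S3"},
    {"Hugo_Symbol": "BRAF", "HGVSp_Short": "p.V600E", "Tumor_Sample_Barcode": "S4"},
]
# Hypothetical sample-to-cancer-type mapping; S5 has no recorded mutation.
sample_cancer = {"S1": "SKCM", "S2": "SKCM", "S3": "SKCM",
                 "S4": "THCA", "S5": "THCA"}


def classify_samples(records, sample_cancer, gene, mutations):
    """Label each sample 'target', 'other' or 'wt' with respect to one gene."""
    status = {sample: "wt" for sample in sample_cancer}
    for rec in records:
        sample = rec["Tumor_Sample_Barcode"]
        if rec["Hugo_Symbol"] != gene or sample not in status:
            continue
        if mutations == "ALL" or rec["HGVSp_Short"] in mutations:
            status[sample] = "target"
        elif status[sample] != "target":
            # A target mutation takes precedence over other mutations.
            status[sample] = "other"
    return status


def count_by_cancer(status, sample_cancer):
    """Count the sample classes within each cancer type."""
    counts = {}
    for sample, label in status.items():
        counts.setdefault(sample_cancer[sample], Counter())[label] += 1
    return counts


status = classify_samples(records, sample_cancer, "BRAF", ["p.V600E"])
counts = count_by_cancer(status, sample_cancer)
for cancer_type, tally in sorted(counts.items()):
    print(cancer_type, dict(tally))
```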

Figure 1.


   BRAF V600E mutation frequency across different cancer types

   (A) The bars indicate the number of samples of each cancer type,
   including samples with the wild-type gene (gray), target mutation
   (green), and other mutations (blue).

   (B) The data can be exported to a tabular format.

Figure 2.


   Detailed annotations of the binding partners are provided based on the
   HGNC database

     * 6.
       Extract and prepare mRNA expression data for the analysis by
       executing the #Get expression data cell:

   #Get expression data
   df_mut_exp_samples,df_wt_exp_samples = av.get_wt_mut_expression(cancer,
    params)
   df_mut_exp_samples.head(3)
     * 7.
       Evaluate neoPPI levels.
          + a.
            Calculate the PPI scores for a single cancer type by executing
            the #Calculate PPI scores cell:
            #Calculate PPI scores
            (all_mut_score_nn_df,all_wt_score_nn_df,scores_mut_nn_df,
             scores_wt_nn_df) = av.calculate_ppi_scores_not_scaled(cancer,
             params, df_wt_exp_samples,df_mut_exp_samples)
            all_mut_score_nn_df.head(3)
          + b.
            Identify co-regulated neoPPIs, which show significant
            correlation between their PPI scores in mutant samples
            (Figure 3).
            #Heatmap of co-regulated neoPPIs
            #The heatmap shows pairwise correlations between neoPPI scores
            corr = av.coregulated_neoppis_map(all_mut_score_nn_df,
             fig_folder,cancer,'ward',10,10)
          + c.
            To get PPI scores across multiple cancer types, execute the
            #PPI scores for multiple cancers cell, indicating the cancer
            types of interest using the standard TCGA cancer type
            abbreviations.
            #PPI scores for multiple cancers
            ppi_score_dict = av.get_ppi_scores_multiple_cancers(
             ['THCA','COAD','SKCM'], params)
          + d.
            Compare neoPPI levels across different cancer types.
               o i.
                 Execute the #Compare PPI scores cell to generate a table
                 with statistical differences between PPI scores
                 calculated for different cancer types (Figure 4A).
                 #Compare PPI scores
                 compare_ppi_scores_df=av.compare_ppi_scores(ppi_score_dict,
                  partners)
                 compare_ppi_scores_df.head(3)
               o ii.
                 To save the table to the Project/Tables folder, execute
                 the #Save it! cell.
                 #Save it!
                 compare_ppi_scores_df.to_csv(tbl_folder+av.gen_filename(
                  driver_gene,driver_mut,'COAD_SKCM_THCA',
                  '_AVR_neoPPIscores.csv'),sep=',')
               o iii.
                 The #Get the heatmap cell will deliver a heatmap where a
                 brighter color signifies a higher average neoPPI score
                 (Figure 4B). The #Save the heatmap cell will save the
                 heatmap image to the Project/Figures folder.
                 #Get the heatmap
                 g=av.get_neo_ppi_score_heatmap(compare_ppi_scores_df)

     Note: AVERON also allows the analysis of neoPPI scores in
     individual tumor samples, annotated with associated metadata, such
     as patient race, gender, and age. To generate the annotated heatmaps
     with the neoPPI scores per sample, execute the #Heatmaps of neoPPI
     scores per tumor sample cell (Figure 5A).
                 #Heatmaps of neoPPI scores per tumor sample
                 out = {}
                 for cancer in ['COAD','THCA','SKCM']:
                     g,out_df = av.build_heatmap(cancer,ppi_score_dict,params,
                      ['Race','Gender','Age'])
                     out[cancer]=out_df
               o iv.
                 To visualize the distribution of neoPPI scores across
                 cancer types for multiple genes as boxplots, execute the
                 #neoPPI score distribution box plots cell (Figure 5B).
                 To keep these boxplots for future reference, run the
                 #Save it cell that follows.
                 #neoPPI score distribution box plots
                 cancers = ['THCA','COAD','SKCM']
                 #genes = ['GOT1','CNKSR1','AJUBA','NOX1','CHD1L','DLL3']
                 genes = partners
                 colors = ['#00FF00', '#FFFF00', '#FF0000']
                 fig=av.boxplot_neoppi_score_distribution(genes,cancers,
                  ppi_score_dict,colors,'gainsboro')
                 #Save it
                 fig_file = fig_folder+av.gen_filename(driver_gene,driver_mut,
                  'COAD_SKCM_THCA',"_neoPPI_distribution_boxplots1.pdf")
                 fig.savefig(fig_file, dpi=600,format='pdf')
               o v.
                 The exact neoPPI score values per tumor sample can be
                 extracted by running the #Get the PPIScore values cell
                 (Figure 5C). Use the partner variable to indicate the
                 binding partner, and the cancer variable to indicate the
                 cancer type:
                 #Get the PPIScore values
                 partner = "AURKA"
                 cancer = "SKCM"
                 av.get_ppi_values(cancer,partner,ppi_score_dict)
               o vi.
                 The #Get sample details cell will provide the detailed
                 metadata associated with an individual sample and a
                 direct link to the GDC Data Portal^13 for further
                 exploration (Figure 6):
                 #Get sample details
                 av.get_sample_info('TCGA-FW-A5DX')

Figure 3.


   Analysis of co-regulated neoPPIs

   The heatmap (A) and table (B) show the lack of correlation between BRAF
   V600E/RAB25 neoPPI scores and the scores obtained for BRAF V600E
   neoPPIs with AURKA or CDK4. There is a more prominent correlation of
   0.7 between BRAF V600E neoPPIs with AURKA and CDK4.
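
   The pairwise correlations summarized in this heatmap can be
   illustrated with a small standard-library sketch. The correlation
   measure used by av.coregulated_neoppis_map is not stated in this
   protocol, so Pearson correlation is an assumption here; the score
   values are invented and do not reproduce the figure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Return a nested dict with the correlation of every pair of partners."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names}
            for a in names}


# Made-up neoPPI scores for four mutant samples per binding partner.
scores = {
    "AURKA": [1.0, 2.0, 3.0, 4.0],
    "CDK4": [2.0, 4.0, 6.0, 8.0],
    "RAB25": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(scores)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```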

Figure 4.


   Evaluation of BRAF V600E neoPPI levels across colon (COAD), thyroid
   (THCA), and skin melanoma (SKCM) cancers

   The AVERON Notebook enables the comparison of neoPPI levels in terms of
   the neoPPI scores provided in a tabular format (A) or as a heat map
   (B). The averaged PPI scores calculated for thyroid cancer, colon
   cancer, and skin melanoma are provided in THCA AVR, COAD AVR, and SKCM
   AVR columns. The corresponding THCA SD, COAD SD, and SKCM SD columns
   show the standard deviation of the neoPPI scores. The FC columns
   indicate the fold change values calculated as the ratio of the mean
   neoPPI scores. PVAL THCA-COAD, THCA-SKCM, and COAD-SKCM show the
   pairwise Dunn’s test p-values. The Kruskal-Wallis test p-value and
   q-values are provided in “KW H TEST PVAL” and “KW H TES QVAL” columns,
   respectively.
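
   The descriptive columns of this table (average, standard deviation,
   and fold change) can be sketched as follows. This is an illustration
   under stated assumptions, not the code of av.compare_ppi_scores: the
   scores are invented, the sample standard deviation is assumed, the
   "FC a/b" column labels are hypothetical, and the Dunn’s and
   Kruskal-Wallis tests are omitted.

```python
from itertools import combinations
from statistics import mean, stdev


def summarize(scores_by_cancer):
    """Return mean, SD and pairwise fold changes for one binding partner."""
    row = {}
    for cancer, values in scores_by_cancer.items():
        row[f"{cancer} AVR"] = mean(values)
        row[f"{cancer} SD"] = stdev(values)  # sample SD (an assumption)
    for a, b in combinations(scores_by_cancer, 2):
        # Fold change as the ratio of the mean scores of two cancer types.
        row[f"FC {a}/{b}"] = mean(scores_by_cancer[a]) / mean(scores_by_cancer[b])
    return row


# Made-up neoPPI scores for one partner in three cancer types.
toy_scores = {
    "THCA": [2.0, 4.0, 6.0],
    "COAD": [1.0, 2.0, 3.0],
    "SKCM": [4.0, 4.0, 4.0],
}
row = summarize(toy_scores)
for column, value in row.items():
    print(column, round(value, 3))
```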

Figure 5.


   Analysis of neoPPI score values in individual tumor samples

   (A) The heatmaps show BRAF V600E neoPPI scores in samples from COAD,
   SKCM, and THCA patients.

   (B) The boxplots show the score distribution of BRAF V600E neoPPIs with
   AURKA, RAB25, and CDK4 in COAD, SKCM, and THCA. The dots indicate
   neoPPI score values in individual tumor samples. The boxplots are shown
   with boxes representing the interquartile range (IQR). The midline
   corresponds to the median. Error bars represent Q1/Q3 ± 1.5 × IQR.

   (C) The data can also be extracted in a tabular format.

Figure 6.


   Patient sample details

   The data are extracted from the GDC Data Portal. Further details can
   be found at the GDC Data Portal by clicking the direct UUID hyperlink.

Identify clinically significant neoPPIs

     Timing: 1 min

     Note: This step conducts a Kaplan-Meier survival analysis based on
     the neoPPI levels. The analysis compares the survival times of
     patients with high (above the median) and low (below the median)
     neoPPI scores.
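
   The median split and the Kaplan-Meier estimate described in this Note
   can be sketched with the standard library as shown below. This is not
   the code of av.survival_analysis: the patient IDs, scores, and
   follow-up times are invented, both helpers are hypothetical, samples
   whose score equals the median are left out of both groups in this
   sketch, and the log-rank test is omitted.

```python
from statistics import median


def split_by_median(scores):
    """Split sample IDs into high (> median) and low (< median) groups."""
    cut = median(scores.values())
    high = [s for s, v in scores.items() if v > cut]
    low = [s for s, v in scores.items() if v < cut]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    events[i] is truthy if the event was observed at times[i] and falsy
    if the observation was censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored samples leave the risk set
    return curve


# Made-up neoPPI scores per patient.
toy_scores = {"P1": 0.2, "P2": 0.9, "P3": 0.5, "P4": 0.7}
high, low = split_by_median(toy_scores)

# Made-up follow-up times (months) and event flags for one group.
curve = kaplan_meier([5, 8, 8, 12], [1, 1, 0, 1])
print(high, low)
print([(t, round(s, 3)) for t, s in curve])
```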

     * 8.
       Set the cancer type and calculate neoPPI scores prior to the
       survival analysis by executing the #Define cancer type & update PPI
       scores cell.

   #Define cancer type & update PPI scores
   cancer = 'SKCM'
   df_mut_exp_samples,df_wt_exp_samples = av.get_wt_mut_expression(cancer,
    params)
   (all_mut_score_nn_df,all_wt_score_nn_df,scores_mut_nn_df,
    scores_wt_nn_df) = av.calculate_ppi_scores_not_scaled(cancer,
    params, df_wt_exp_samples,df_mut_exp_samples)
     * 9.
       Conduct the survival analysis by executing the #Conduct survival
       analysis cell. The plots and statistics will be shown and
       automatically saved to the Project/Figures and
       Project/Tables folders, respectively (Figure 7).

   #Conduct survival analysis
   pval = 0.1
   qval = 0.25
   surv_sum_df,fig,m1,m2,mut_surv=av.survival_analysis(
    df_mut_exp_samples.columns,clinical_f,all_mut_score_nn_df,
    'significant',pval)

   #Show statistics for the survival analysis
   surv_sum_df.loc[(surv_sum_df['MEDIAN_TIME_HIGH']<
    surv_sum_df['MEDIAN_TIME_LOW'])&(surv_sum_df['PVALUE']<pval)&
    (surv_sum_df['QVALUE']<qval)]

   #Save the data & plots
   surv_sum_df.to_csv(tbl_folder+av.gen_filename(driver_gene,driver_mut,
    cancer,"_Survival.csv"),sep=',')
   fig.savefig(fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
    "_Survival_plots.pdf"),dpi=600,format='pdf')
   display(HTML("<div style='height: 200px; overflow: auto; "
    "width: fit-content'>" + surv_sum_df.style.to_html() +"</div>"))

     Note: To calculate q-values, AVERON Notebook calls the
     get_FDR(pvals) function from /AVERON/src/averon.py, which returns
     q-values calculated with the multipletests function from the
     statsmodels.sandbox.stats.multicomp module. By default, the
     Benjamini/Hochberg method is used. However, multipletests provides
     other methods for the p-value adjustment, including `bonferroni` :
     one-step correction; `sidak` : one-step correction; `holm-sidak` :
     step down method using Sidak adjustments; `holm` : step-down method
     using Bonferroni adjustments; `simes-hochberg` : step-up method
     (independent); `hommel` : closed method based on Simes tests
     (non-negative); `fdr_bh` : Benjamini/Hochberg (non-negative);
     `fdr_by` : Benjamini/Yekutieli (negative); `fdr_tsbh` : two stage
     fdr correction (non-negative); `fdr_tsbky` : two stage fdr
     correction (non-negative). Further details and guidance about
     multipletests and its functionality can be found at
     https://www.statsmodels.org.
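
   For readers who want to see what the default Benjamini/Hochberg
   adjustment does, the procedure can be written in a few lines of
   standard-library Python. This is a didactic sketch with invented
   p-values; the hypothetical helper bh_qvalues is not the notebook's
   get_FDR function, which relies on statsmodels as described above.

```python
def bh_qvalues(pvals):
    """Benjamini/Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Made-up log-rank p-values for four neoPPIs.
toy_pvals = [0.01, 0.04, 0.03, 0.005]
toy_qvals = bh_qvalues(toy_pvals)
print([round(q, 4) for q in toy_qvals])
```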

Figure 7.


   neoPPI score-based survival analysis enables the evaluation of neoPPI
   clinical significance

   (A) Kaplan-Meier plots and (B) the statistics for the correlation
   between SKCM patient clinical outcomes and BRAF V600E neoPPIs with
   AURKA, CDK4, and RAB25. MEDIAN_TIME_HIGH and MEDIAN_TIME_LOW columns
   indicate the median survival times of patients with high and low
   neoPPI levels, respectively. PVALUE indicates the log-rank test
   p-values. QVALUE indicates the q-values.

Determine neoPPI-regulated genes

     Timing: 4 min

     Note: This step determines neoPPI-regulated pathways to explore the
     biological mechanism and functions behind neoPPI-regulated genes.

     * 10.
       Identify neoPPI-regulated genes.
          + a.
            Set the cancer type and calculate the PPI scores prior to the
            analysis of neoPPI-regulated genes by executing the #Define
            cancer type & update PPI scores cell.
            #Define cancer type & update PPI scores
            cancer = 'SKCM'
            df_mut_exp_samples,df_wt_exp_samples = av.get_wt_mut_expression(
             cancer,params)
            (all_mut_score_nn_df,all_wt_score_nn_df,scores_mut_nn_df,
             scores_wt_nn_df) = av.calculate_ppi_scores_not_scaled(cancer,
             params, df_wt_exp_samples,df_mut_exp_samples)
          + b.
            Run the #neoPPI-correlated genes cell to determine the
            correlation between neoPPI scores and gene expression in
            mutant samples. The results will be automatically saved to the
            Project/Tables/Correlated_genes folder.
            #neoPPI-correlated genes
            corr_dict = av.calculate_correlations(df_mut_exp_samples,
             df_wt_exp_samples,partners,all_mut_score_nn_df,
             all_wt_score_nn_df)
            #Save it
            folder = tbl_folder+'Correlated_genes/'
            if not os.path.exists(folder):
                os.makedirs(folder)
            for p in corr_dict.keys():
                corr_dict[p].to_csv(folder+av.gen_filename(driver_gene,
                 driver_mut,cancer,"_"+p+"_correlated_genes.csv"),sep=',')
            print("The analysis of neoPPI-correlated genes has been completed.")
          + c.
            Alternatively, previously calculated neoPPI-correlated genes
            can be loaded by executing the #Load correlated genes from
            file cell.
            #Load correlated genes from file
            cancer = 'SKCM'
            folder = tbl_folder+'Correlated_genes/'
            corr_dict = {}
            for partner in partners:
                corr_dict[partner] = pd.read_csv(folder+av.gen_filename(
                 driver_gene,driver_mut,cancer,
                 "_"+partner+"_correlated_genes.csv"),sep=',',index_col=0)
          + d.
            Determine signature genes for an individual neoPPI.
               o i.
                 Indicate the neoPPI partner by executing the #Select the
                 partner cell, followed by the #Get signature genes cell.
                 #Select the partner:
                 partner = "AURKA"
                 #Get signature genes
                 def on_value_change(change):
                     global sig_genes
                     [box.children[0].children[x].observe(on_value_change,
                      names='value') for x in range(0,4)];
                     sig_genes, tbl = av.get_signature_genes2(corr_dict,
                      partner,driver_gene,
                      box.children[0].children[0].value,
                      box.children[0].children[1].value,
                      box.children[0].children[2].value,
                      box.children[0].children[3].value)
                     return(box,sig_genes,tbl)
                 box,sig_genes = av.display_signature_genes(corr_dict,
                  partner,driver_gene)
                 [box.children[0].children[x].observe(on_value_change,
                  names='value') for x in range(0,4)];
                 display(box)

     Note: The interactive scrollbars ([100]Figure 8A) allow the user to
     adjust the statistical thresholds. Each signature gene in the display
     is linked to the HGNC portal ([101]https://www.genenames.org) for
     detailed gene information ([102]Figure 8B).
               o ii.
                 Execute the #Save signature genes statistics cell to view
                 and save the detailed statistical characteristics of the
                 identified signature genes ([103]Figure 8C).
               o iii.
                 To visualize the signature gene network, run the
                 Cytoscape application.
                 #Network of the signature genes
                 node_df = pd.DataFrame(sig_genes)
                 node_df['PARTNER']=partner
                 types = [[g,"GENE"] for g in sig_genes]
                 types.append([partner,'PARTNER'])
                 types_df = pd.DataFrame(types,columns=['node','type'])
                 p4c,file = av.display_cynetwork3(
                     params,cancer,node_df,types_df,
                     ['red','blue'],['PARTNER','GENE'],layout='force-directed')
                 p4c.notebook_show_image(file)
               o iv.
                 Then, execute the #Network of the signature genes cell.

     Note: The network will appear in the Notebook as an image and in the
     Cytoscape software as an interactive network ([104]Figure 9).
          + e.
            Determine neoPPI-regulated genes for multiple neoPPIs.
               o i.
                 Adjust the statistical thresholds in the #Get signature
                 genes of multiple neoPPIs cell. The following settings
                 are used by default:
                 CORR_BP_MUT = 0.33.
                 PVAL_BP_MUT = 0.05.
                 PVAL_MUT_vs_WT = 0.05.
                 QVAL_MUT_vs_WT = 0.25.
               o ii.
                 Execute the cell to identify signature genes for all
                 binding partners defined in step 3. The resulting table
                 with identified genes and associated statistical
                 parameters will be shown and saved to the
                 Project/Tables folder.
          + f.
            Compare the size and composition of signature gene sets.
               o i.
                 Execute the #Size distribution of neoPPI-regulated gene
                 sets cell.
               o ii.
                 Execute the #neoPPI-signature gene set overlap, JACCARD
                 score cell to calculate Jaccard scores ([105]Figure 10).
                 #Size distribution of neoPPI-regulated gene sets
                 fig,stat_df = av.neoPPI_genes_distr(sign_gene_dict,'red',8,8)
                 fig.savefig(
                     fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                                "_signature_genes_hist.pdf"),
                     dpi=600,format="pdf")
                 stat_df.sort_values(by='SIZE',ascending=False).head(10)
                 #neoPPI-signature gene set overlap, JACCARD score
                 g=av.jaccard(sign_gene_dict,fig_folder,'Blues',8,8)
                 g.savefig(
                     fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                                "_jaccard_heatmap1.pdf"),
                     dpi=600,format='pdf')
          + g.
            To visualize the correlation between PPI scores and mRNA
            expression of a signature gene in mutant and wild-type samples,
            execute the #PPI-score/mRNA expression correlation cell. The
            correlation plots will be automatically saved to the
            Project/Figures folder ([106]Figure 11).

     Note: The partner and gene variables define the neoPPI binding
     partner and the signature gene, respectively.
            #PPI-score/mRNA expression correlation
            partner='AURKA'
            gene = 'PLK1'
            fig = av.ppi_score_mrna_corr(
                gene,driver_gene,partner,all_mut_score_nn_df,
                all_wt_score_nn_df,df_mut_exp_samples,df_wt_exp_samples,
                'darkorange','deepskyblue')
            #Save it
            fig_file = fig_folder+av.gen_filename(
                driver_gene,driver_mut,cancer,
                "_"+partner+"_"+gene+"_correlattion_plot.png")
            fig.savefig(fig_file, dpi=600,format='png')
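            #Get signature genes of multiple neoPPIs (cell used in step e)
            #Default thresholds from step e.i; PVAL and QVAL are assumed to
            #correspond to PVAL_MUT_vs_WT and QVAL_MUT_vs_WT
            CORR_BP_MUT = 0.33
            PVAL_BP_MUT = 0.05
            PVAL = 0.05
            QVAL = 0.25
            sign_gene_dict = {}
            sign_gene_dict,sign_gene_df = \
                av.get_signature_genes_for_multiple_binding_partners(
                    CORR_BP_MUT,PVAL_BP_MUT,PVAL,QVAL,corr_dict,cancer,
                    driver_gene,driver_mut)
            #Show the table
            display(HTML("<div style='height: 200px; overflow: auto; "
                         "width: fit-content'>" + sign_gene_df.style.to_html() +
                         "</div>"))
            #Save it!
            sign_gene_df.to_csv(
                tbl_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                           "_signature_genes.csv"),
                sep=",")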

Figure 8.

   [107]Figure 8
   [108]Open in a new tab

   Determine signature genes for individual neoPPI

   (A) An interactive tool enables easy adjustment of the statistical
   thresholds: the correlation coefficient between the neoPPI score and
   expression of the neoPPI-regulated gene in mutant samples
   (CORR_BP_MUT), the p-value of the CORR_BP_MUT correlation
   (PVAL_BP_MUT), the p-value of the statistical difference between
   CORR_BP_MUT and CORR_BP_WT (PVAL_MUT_vs_WT), and the q-value of the
   statistical difference between CORR_BP_MUT and CORR_BP_WT
   (QVAL_MUT_vs_WT).

   (B) The identified signature genes are shown and hyperlinked with their
   corresponding pages on the HGNC database website.

   (C) The signature gene statistical parameters are summarized in the
   table, including GENE: neoPPI-regulated gene, CORR_BP_MUT: correlation
   coefficient between neoPPI score and expression of the neoPPI-regulated
   gene in mutant samples, M_MUT: the number of mutant samples used to
   calculate CORR_BP_MUT, CORR_BP_WT: correlation coefficient between
   neoPPI score and expression of the neoPPI-regulated gene in the
   wild-type samples, N_WT: the number of wild-type samples used to
   calculate CORR_BP_WT, PVAL_BP_MUT: p-value of CORR_BP_MUT correlation,
   QVAL_BP_MUT: q-value of CORR_BP_MUT correlation, PVAL_BP_WT: p-value of
   CORR_BP_WT correlation, QVAL_BP_WT: q-value of CORR_BP_WT correlation,
   PVAL: p-value of statistical difference between CORR_BP_MUT and
   CORR_BP_WT, QVAL: q-value of statistical difference between CORR_BP_MUT
   and CORR_BP_WT.
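
   The thresholds in panel A act as a filter on the statistics table in
   panel C. The sketch below illustrates that filtering on a few records
   keyed by the column names from the caption. The comparison directions
   (correlation at or above its cut-off, p- and q-values at or below
   theirs), the function name select_signature_genes, and all example
   values are assumptions for illustration, not AVERON's implementation.

```python
# Hypothetical records using the column names from the Figure 8C table.
rows = [
    {"GENE": "PLK1", "CORR_BP_MUT": 0.52, "PVAL_BP_MUT": 0.001, "PVAL": 0.01, "QVAL": 0.08},
    {"GENE": "GENE_A", "CORR_BP_MUT": 0.20, "PVAL_BP_MUT": 0.040, "PVAL": 0.03, "QVAL": 0.20},
    {"GENE": "GENE_B", "CORR_BP_MUT": 0.45, "PVAL_BP_MUT": 0.002, "PVAL": 0.20, "QVAL": 0.60},
]


def select_signature_genes(rows, corr_bp_mut=0.33, pval_bp_mut=0.05,
                           pval_mut_vs_wt=0.05, qval_mut_vs_wt=0.25):
    """Return the genes that pass all four thresholds.

    The defaults mirror the default settings listed in the protocol; the
    direction of each comparison is an assumption.
    """
    return [
        r["GENE"] for r in rows
        if r["CORR_BP_MUT"] >= corr_bp_mut
        and r["PVAL_BP_MUT"] <= pval_bp_mut
        and r["PVAL"] <= pval_mut_vs_wt
        and r["QVAL"] <= qval_mut_vs_wt
    ]


selected = select_signature_genes(rows)
print(selected)
```

   Relaxing one threshold, for example select_signature_genes(rows,
   corr_bp_mut=0.1), shows how the size of the signature changes with the
   cut-offs.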

Figure 9.

   [109]Figure 9
   [110]Open in a new tab

   The signature gene network visualization

   The BRAF V600E/AURKA neoPPI signature gene network is shown as an
   example. The neo-binding partner (AURKA) is shown in yellow. The
   signature genes are shown in light blue.

Figure 10.

   [111]Figure 10
   [112]Open in a new tab

   Size distribution and the similarity analysis of neoPPI-regulated gene
   sets

   The size distribution can be visualized as a histogram (A) and saved in
   a tabular format (B).

   (C) A heatmap shows the neoPPI-regulated gene set similarity evaluated
   in terms of the Jaccard index. The overall averaged Jaccard index of
   3.7% indicates a very limited overlap between signature gene sets of
   different neoPPIs.
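
   For reference, the Jaccard index of two gene sets is the size of their
   intersection divided by the size of their union. The sketch below
   computes it for pairs of signature gene sets and averages it over all
   pairs. The gene sets are hypothetical, and whether AVERON averages the
   index in exactly this way is an assumption.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two gene sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(gene_sets):
    """Average Jaccard index over all unordered pairs of binding partners."""
    pairs = list(combinations(sorted(gene_sets), 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_index(gene_sets[p], gene_sets[q]) for p, q in pairs)
    return total / len(pairs)


# Hypothetical signature gene sets keyed by binding partner.
example_sets = {
    "AURKA": {"PLK1", "FOXM1", "TK1"},
    "CDK4": {"TK1", "GENE_X"},
    "RAB25": {"GENE_Y"},
}
pair_score = jaccard_index(example_sets["AURKA"], example_sets["CDK4"])
mean_score = mean_pairwise_jaccard(example_sets)
print(pair_score, mean_score)
```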

Figure 11.

   [113]Figure 11
   [114]Open in a new tab

   neoPPI scores/mRNA expression correlation analysis

   Regression plots show the correlation between PLK1 expression and
   neoPPI scores calculated for (A) BRAF V600E/AURKA and (B) BRAF WT/AURKA
   PPIs.
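
   The comparison shown in the two panels can be made quantitative by
   testing whether the mutant and wild-type correlations differ. The
   sketch below computes a Pearson correlation and compares two
   correlations with a Fisher z-test. The choice of this test is an
   assumption made for illustration; this section does not state which
   test AVERON uses for PVAL_MUT_vs_WT. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_correlations(r_mut, n_mut, r_wt, n_wt):
    """Fisher z-test for H0: both correlations are equal.

    Returns the z statistic and the two-sided p-value. Each group needs
    more than three samples, and |r| must be below 1.
    """
    if n_mut <= 3 or n_wt <= 3:
        raise ValueError("each group needs more than three samples")
    se = math.sqrt(1.0 / (n_mut - 3) + 1.0 / (n_wt - 3))
    z = (math.atanh(r_mut) - math.atanh(r_wt)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical PPI scores and expression values in mutant samples.
scores_mut = [0.2, 0.5, 0.9, 1.4, 1.8, 2.1]
expr_mut = [1.1, 1.9, 2.4, 3.8, 4.1, 5.0]
r_mut = pearson_r(scores_mut, expr_mut)

# Hypothetical wild-type correlation (r = 0.10 from 40 samples).
z, p = compare_correlations(r_mut, len(scores_mut), 0.10, 40)
print(f"r_mut={r_mut:.2f}, z={z:.2f}, p={p:.3g}")
```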

Determine neoPPI-regulated pathways

     * 11.
       Define the reference gene sets in the #Define the pathway gene sets
       to analyze cell.

   #Define the pathway gene sets to analyze
   #Currently AVERON uses gene sets defined in MSigDB:
   #[115]https://www.gsea-msigdb.org/gsea/msigdb
   pathway_files = ["h.all.v2022.1.Hs.symbols.gmt",
                    "c2.cp.kegg.v2022.1.Hs.symbols.gmt",
                    "c2.cp.reactome.v2022.1.Hs.symbols.gmt"]
   params['pathway_files']=pathway_files

     Note: In this example, we use the “h.all.v2022.1.Hs.symbols.gmt”,
     “c2.cp.kegg.v2022.1.Hs.symbols.gmt”, and
     “c2.cp.reactome.v2022.1.Hs.symbols.gmt” sets defined in the
     Molecular Signatures Database (MSigDB).[116]^7^,[117]^14 The
     corresponding reference GMT dataset files are located in the
     AVERON\input\pathway_genesets\MSigDB folder, defined by the
     pathway_folder = data_folder+"pathway_genesets/MSigDB/" variable.
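
   A GMT file is a tab-delimited text file in which each line holds a gene
   set name, a description field, and then the member genes. The sketch
   below shows one way to read such a file into a dictionary of gene sets.
   The function name read_gmt and the two-line demo content are
   hypothetical; AVERON reads these files internally, and this is not its
   code.

```python
import io


def read_gmt(handle):
    """Parse GMT text: name<TAB>description<TAB>gene1<TAB>gene2... per line."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


# Hypothetical two-line GMT content, used here instead of a real file.
demo = io.StringIO(
    "SET_ONE\thttps://example.org/one\tPLK1\tAURKA\tTK1\n"
    "SET_TWO\tna\tCDK4\tFOXM1\n"
)
gene_sets = read_gmt(demo)
print(sorted(gene_sets["SET_ONE"]))
```

   For a real file, the same function can be given an open file handle,
   for example one opened on pathway_folder + pathway_files[0].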

     * 12.
       To conduct the analysis, execute the #Conduct the pathway
       enrichment analysis cell. Use the partner variable to specify a
       binding partner for the analysis or set partner = "" to analyze all
       binding partners.
       #Conduct the pathway enrichment analysis
       #If partner = "", the enrichment analysis will be conducted for all
       #binding partners
       #If a particular partner is specified, the analysis will be
       #conducted just for this partner
       #partner="AURKA" #uncomment this line to specify a particular binding partner
       partner = "" #use all the partners for the analysis
       bars_dict,enrichment_dict=av.pathway(sign_gene_dict,pathway_files,
                                            partner,coding_genes,pathway_folder)
       print("Pathway enrichment analysis completed!")
          + a.
            Execute the #Show bar graphs cell to visualize the results as
            bar graphs ([118]Figure 12A).
            #Show bar graphs
            partner = 'AURKA'
            pathway = "h.all.v2022.1.Hs.symbols.gmt"
            fig = bars_dict[partner][pathway]
            fig
          + b.
            Save individual enrichment plots by executing the #Save the
            bar graph cell.
          + c.
            Save the enrichment analysis statistics by executing #Save the
            enrichment analysis cell. Set the qval variable to change the
            q-value statistical significance cut-off.
          + d.
            Run the #Show the enrichment statistics cell to show the
            enrichment analysis statistics.

     Note: The qval variable defines the threshold for statistical
     significance, and the partner variable indicates for which
     neo-binding partner the results will be shown.
            #Save the bar graph
            fig_file = fig_folder+av.gen_filename(
                driver_gene,driver_mut,cancer,
                "_"+partner+"_Enrichment_"+
                (".").join(pathway.split(".")[:-1])+".png")
            fig.savefig(fig_file, dpi=600,format='png')
            #Save the enrichment analysis
            qval = 0.05
            av.save_enrichment(enrichment_dict,qval,params,cancer)
            #Show the enrichment statistics
            partner = "AURKA"
            qval = 0.05
            enrichment_df = enrichment_dict[partner]
            enrichment_df = enrichment_df.loc[
                enrichment_df['qvalue']<qval].sort_values(by=['qvalue'])
            display(HTML(
                "<div style='height: 400px; overflow: auto;width: fit-content'>" +
                enrichment_df.style.set_properties(**{'text-align': 'left'}).to_html() +
                enrichment_df.style.to_html() +
                "</div>"))
     * 13.
       Visualize and export the enrichment analysis as a network.
       #Connect Mutant driver - Partner - Pathway
       #Set the qval threshold:
       qval=0.05
       all_enrichment_df,all_enrichment_types_df = \
           av.connect_mutant_driver_partner_pathway(enrichment_dict,
                                                    driver_gene,qval)
          + a.
            Execute the #Connect Mutant driver - Partner - Pathway cell to
            prepare the network.
          + b.
            Generate an interactive network within the AVERON Notebook by
            executing the #Interactive network cell ([119]Figure 12B) or a
            Cytoscape network by executing the #Cytoscape network cell
            ([120]Figure 12C).

     Note: The interactive network works best with a small number (<100)
     of nodes. For larger networks, we recommend exploring the network
     in Cytoscape. The Cytoscape application must be running before the
     Cytoscape network is generated. The Cytoscape network will be
     saved as an image to the Project/Figures folder and as a SIF file
     to the Project/Networks folder.
            #Interactive network
            selected_partners = ['AURKA','CDK4','RAB25'] #indicate partners to use
            #We don't recommend using more than 10 partners
            ipycytoscape_obj = av.create_interactive_network2(
                all_enrichment_df.loc[
                    all_enrichment_df['Partner'].isin(selected_partners)],
                all_enrichment_types_df,
                colors=['green','red','blue','orange'])
            ipycytoscape_obj
            #Cytoscape network
            p4c = av.display_cynetwork2(
                all_enrichment_df,all_enrichment_types_df,
                colors=['#04DB00','#00CCCC','#CCCC00'],layout='force-directed')
            os.chdir(fig_folder)
            file = fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                              "_pathway_network.png")
            p4c.export_image(file,overwrite_file=True);
            #Save network to file
            p4c.export_network(
                net_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                           "_pathway_network.sif"),
                type='SIF',overwrite_file=True);
            p4c.notebook_show_image(file)
     * 14.
       To represent the enrichment analysis in a heatmap format, execute
       the #Enrichment heatmap cell ([121]Figure 12D). The heatmap image
       will be saved to the Project/Figures folder.
     * 15.
       A descriptive summary of how different neoPPIs can regulate
       different biological pathways through their signature genes can be
       generated by executing the #Generate text description cell
       ([122]Figure 12E).

   #Enrichment heatmap
   #Set the qval threshold:
   qval=0.05
   all_enrichment_df,all_enrichment_types_df = \
       av.connect_mutant_driver_partner_pathway(enrichment_dict,
                                                driver_gene,qval)
   #select subset of pathway genesets to use
   #e.g. HALLMARK, KEGG, REACTOME
   subset = "HALLMARK"
   g=av.enrichment_heatmap(subset,all_enrichment_df,driver_gene,
                           fig_folder,'blue','auto')
   g.savefig(
       fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                  "_"+subset+"_Enrichment_heatmap.pdf"),
       dpi=600,format='pdf')

   #Generate text description
   partner="AURKA"
   qval = 0.05
   description = av.get_txt_description_of_regulations(
       partner,enrichment_dict,qval,params)
   a = ('<br>').join(description)
   display(HTML("<div style='height: 400px;overflow: auto; "
                "width: 2000px'>" + a + "</div>"))
   #Save it!
   fname = tbl_folder+av.gen_filename(driver_gene,driver_mut,cancer,
                                      "_"+partner+"_neoPPI_functions.txt")
   with open(fname, 'w') as f:
       f.write(("\n").join(description))
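
   The q-values used in steps 12 to 15 come from a gene set enrichment
   test followed by a multiple-testing correction. This section does not
   state which test AVERON applies, so the sketch below uses a common
   stand-in: a hypergeometric over-representation test with
   Benjamini-Hochberg adjustment. The function names and all counts are
   hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): overlap of at least k when n signature genes are drawn
    from N background genes, K of which belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical counts: 20000 background genes, a 200-gene pathway,
# 100 signature genes, 10 of which fall in the pathway.
p_value = hypergeom_sf(10, 20000, 200, 100)

# Hypothetical p-values for three other pathways tested alongside it.
q_values = benjamini_hochberg([p_value, 0.04, 0.03, 0.20])
print(p_value, q_values)
```

   Pathways would then be kept when their adjusted value falls below the
   qval cut-off (0.05 in the cells above).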

Figure 12.

   [123]Figure 12
   [124]Open in a new tab

   The enrichment analysis of neoPPI-regulated genes conducted based on
   KEGG, REACTOME, and MSigDB cancer hallmark sets

   (A) The bar graph shows enrichment of BRAF V600E/AURKA neoPPI-regulated
   genes in the MSigDB cancer hallmark gene sets. The red dashed line
   indicates the statistical significance cut-off of q-value = 0.05.

   (B) An interactive network shows the connectivity between the mutant
   driver gene, neo binding partners, and the neoPPI-regulated pathways.
   The mutant driver gene (e.g., BRAF V600E) is shown in green.
   The neo binding partners (AURKA, CDK4, and RAB25) are shown in red. The
   neoPPI-regulated pathways are shown in gray. The gene and pathway names
   can be shown by hovering the mouse over the corresponding node.

   (C) The network can also be visualized and explored in Cytoscape. The
   mutant driver gene, neo binding partners, and neoPPI-regulated pathways
   are shown in green, red, and yellow, respectively. The pathway names
   are hidden for clarity.

   (D) A heatmap shows the MSigDB cancer hallmark gene sets that can be
   regulated by BRAF V600E neoPPIs with AURKA, CDK4, and RAB25.

   (E) A descriptive summary of how neoPPIs can regulate different
   oncogenic pathways.
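
   The driver-partner-pathway network in panels B and C is, in essence, a
   list of edges. The sketch below builds such an edge list in the simple
   interaction format (SIF), where each line reads source, interaction
   type, target. The function name to_sif, the interaction labels, and
   the placeholder pathway names are hypothetical; the actual export in
   the protocol is done by p4c.export_network.

```python
def to_sif(driver, partner_pathways, ppi_label="neoPPI", pathway_label="regulates"):
    """Build SIF lines (source<TAB>interaction<TAB>target), one edge per line."""
    lines = []
    for partner in sorted(partner_pathways):
        lines.append(f"{driver}\t{ppi_label}\t{partner}")
        for pathway in sorted(partner_pathways[partner]):
            lines.append(f"{partner}\t{pathway_label}\t{pathway}")
    return lines


# Placeholder pathway assignments; these are not results from the paper.
sif_lines = to_sif(
    "BRAF_V600E",
    {"AURKA": ["PATHWAY_A", "PATHWAY_B"], "CDK4": ["PATHWAY_A"]},
)
print("\n".join(sif_lines))
```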

Uncover druggable vulnerabilities

     Timing: 11 min

     Note: This step enables the exploration of the neoPPI-regulated
     clinically actionable targets with available approved drugs and
     inhibitors.

     * 16.
       Determine clinically significant neoPPI-regulated genes.
          + a.
            Execute the #Clinically significant neoPPI-signature genes
            cell to determine neoPPI signature genes whose high expression
            in mutant samples correlates with worsened clinical outcomes
            ([125]Figure 13A).

     Note: With default parameters, the high expression of a gene g is
     defined as the expression above the 67th percentile, and the low
     expression is defined as the gene expression below the 33rd
     percentile in the mutant samples.
            #Clinically significant neoPPI-signature genes
            #use partners to determine clinically significant signature
            #genes for all neoPPIs or provide a subset of neo binding
            #partners, e.g. ['AURKA'] or ['AURKA','RAB25','CDK4']
            survival_df,sign_gene_dict,survival_plots = av.sign_genes_survival(
                df_mut_exp_samples,sign_gene_dict,params,cancer,partners)
            display(HTML("<div style='height: 400px; overflow: auto; "
                         "width: fit-content'>" +
                         survival_df.loc[survival_df['CLIN_FDR']<0.1].sort_values(
                             by=['CLIN_FDR']).style.to_html() + "</div>"))
          + b.
            To save the survival plot images for all neoPPIs analyzed,
            execute the #Save survival plots cell. The plots will be saved
            to the Project/Figures folder.
            #Save survival plots
            pval = 0.05
            qval = 0.25
            av.save_survival_plots(survival_plots,pval,qval,partner,params,
                                   cancer,survival_df)

     Note: Use the pval and qval variables to define the statistical
     thresholds to save the plots.
          + c.
            To visualize and save the survival plot for individual genes,
            execute the #Survival plot for a single gene cell
            ([126]Figure 13B).
            #Survival plot for a single gene
            gene = 'PLK1'
            #Save it
            av.single_gene_survival_plot(df_mut_exp_samples,gene,
                clinical_f)[0].savefig(fig_folder+'survival_'+gene+'.pdf',
                dpi=600,format='pdf')
            av.single_gene_survival_plot(df_mut_exp_samples,gene,clinical_f)[0]

     Note: Use the gene variable to set the signature gene for the
     analysis.
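
   The high- and low-expression groups described in step 16a can be
   illustrated with a short sketch that splits samples at the 33rd and
   67th percentiles of a gene's expression. The function name
   split_by_expression, the sample identifiers, and the percentile
   interpolation method are assumptions; AVERON's own grouping may differ
   in detail.

```python
from statistics import quantiles


def split_by_expression(expr, low_pct=33, high_pct=67):
    """Return (low, high) sample lists for one gene.

    expr maps sample id -> expression value. Samples strictly below the
    low percentile are 'low'; samples strictly above the high percentile
    are 'high'; the rest are left out of both groups.
    """
    cuts = quantiles(list(expr.values()), n=100, method="inclusive")
    low_cut, high_cut = cuts[low_pct - 1], cuts[high_pct - 1]
    low = [s for s, v in expr.items() if v < low_cut]
    high = [s for s, v in expr.items() if v > high_cut]
    return low, high


# Hypothetical expression values for ten mutant samples.
expression = {f"S{i}": float(i) for i in range(1, 11)}
low_group, high_group = split_by_expression(expression)
print(low_group, high_group)
```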
     * 17.
       Identify approved drugs and general inhibitors available for
       neoPPI-regulated genes by executing the #Gene-drug connectivity
       cell ([127]Figure 13C).

     Note: The pval and the qval variables can be used to set statistical
     thresholds and conduct the analysis for a subset of clinically
     significant genes. To perform the analysis for all neoPPI signature
     genes, set pval = 1 and qval = 1. The information about the
     available drugs is extracted from the IUPHAR database[128]^10:
     [129]https://www.guidetopharmacology.org.

     Note: The identified compounds will be shown in an interactive
     table, where each compound is directly linked to its page on the
     IUPHAR website for detailed exploration. For convenience, the data
     is also shown in a scrollable table, which is automatically saved to
     the Project/Tables folder ([130]Figure 13D).

   #Gene-drug connectivity
   #Set p-value and q-value thresholds for gene clinical significance:
   pval = 0.05
   qval = 0.25
   #use partners to determine clinically significant signature genes for
   #all neoPPIs or provide a subset of neo binding partners, e.g. ['AURKA']
   #or ['AURKA','RAB25','CDK4']:
   clin_genes,ligands_df,gene_drugs_df,ofile = av.get_drugs(
       sign_gene_dict,pval,qval,params,cancer,partners)
   display(HTML("<div style='height: 400px; overflow: auto; "
                "width: fit-content'>" + gene_drugs_df.style.to_html() + "</div>" +
                "<span>" + "Gene-drug connectivity table was saved to <br>" + ofile +
                "</span>"))

Figure 13.

   [131]Figure 13
   [132]Open in a new tab

   Identification of clinically significant neoPPI-signature genes

   (A) Statistical characteristics of correlations between neoPPI scores
   and clinical outcomes of cancer patients can be obtained and saved in a
   tabular format.

   (B) The Kaplan-Meier survival plots can be generated for individual
   neoPPIs. The data for BRAF V600E neoPPIs with FOXM1, PLK1, and TK1 are
   shown as representative examples.

   (C) AVERON determines available approved drugs (orange) and general
   inhibitors (blue) of clinically significant neoPPI-regulated genes
   (red). The identified compounds are linked to corresponding pages at
   the IUPHAR website ([133]https://www.guidetopharmacology.org) for
   detailed exploration.

   (D) The list of identified compounds can be exported to a tabular
   format.
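
   The survival curves in panel B are Kaplan-Meier estimates. As a
   reminder of what such a curve represents, the sketch below implements
   the standard product-limit estimator on a few hypothetical follow-up
   times. It is not AVERON's code; the plots in the protocol are produced
   by av.sign_genes_survival and av.single_gene_survival_plot.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time per patient; events: 1 if the event was
    observed, 0 if the patient was censored. Returns a list of
    (time, survival probability) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

   Computing one curve for the high-expression group and one for the
   low-expression group gives the two lines compared in each plot.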

Expected outcomes

   This protocol describes a computational procedure to identify druggable
   cancer vulnerabilities enabled by neomorph protein-protein interactions
   using the AVERON Notebook. The first seven steps determine and compare
   the neoPPI levels across cancer samples. Then, through the Kaplan-Meier
   analysis, AVERON enables prioritization of neoPPIs that correlate with
   worsened clinical outcomes. The next steps uncover
   specific neoPPI-regulated genes and determine neoPPI-dependent
   signaling and metabolic pathways. The last two steps of the protocol
   help identify druggable and clinically significant neoPPI-regulated
   genes with available approved drugs and general inhibitors. Together,
   the protocol helps uncover new neoPPI-dependent mechanisms of oncogenic
   signaling and enables rapid prioritization of neoPPIs for detailed
   experimental studies.

Limitations

   The accuracy of AVERON’s outcomes relies on the quality of the
   available cancer genomics, clinical, and pharmacological databases. In
   some cases, a limited number of samples with a driver mutation can
   decrease the statistical power. The use of additional datasets beyond
   TCGA data, such as AACR GENIE or individual genome-wide association
   studies (GWASs), can help overcome this common limitation. The
   concentration of protein-protein complexes in cancer samples can be
   regulated through different mechanisms, and its precise calculation
   would require multiple parameters, such as the dynamics and kinetics of
   the interaction, protein stability, post-translational modification
   state, and the states and binding affinities of other binding
   partners, among other factors. Today, such information is unavailable.
   However, for experimentally determined complexes or neoPPIs predicted
   with modern computational approaches, we can expect a correlation
   between the PPI amounts and the concentrations of the individual
   binding partners, as implemented in AVERON with the PPI scores.

   Future development of AVERON would benefit from an extensive
   characterization of the robustness and experimental validation of the
   method. The lack of large-scale datasets of neoPPI-regulated genes
   experimentally determined in cancer patients and the largely unknown
   clinical significance of individual neoPPIs challenge systematic
   benchmarking and validation of AVERON predictions. However, we
   envision that such data will rapidly accumulate with the development
   of PPI-based approved drugs and specific chemical probes, along with
   other approaches for precisely interrogating PPI functions.
   Meanwhile, applying AVERON to cell-based data may facilitate the
   generation of new hypotheses testable in laboratory settings. Such
   data would be invaluable for optimizing the algorithm. The future
   implementation of more advanced statistical approaches, such as
   permutation tests combined with randomized PPI and genomics
   networks, may also significantly improve the algorithm’s robustness and
   is one of the main directions for its further development.

Troubleshooting

Problem 1

   Unable to install Anaconda or installation fails (related to Step 1 in
   Before you begin).

Potential solution

   Ensure you download the correct version for your operating system from
   the Anaconda website. Verify that there is enough disk space and that
   you have the required permissions.

Problem 2

   JupyterLab is not launching, runs slowly, or crashes (related to Step 5
   in Before you begin).

Potential solution

   Make sure JupyterLab is installed in your conda environment. Restart
   Anaconda and/or the computer. Solutions for different
   JupyterLab-related issues can be found on the Jupyter website:
   [134]https://discourse.jupyter.org/c/jupyterlab/17.

Problem 3

   The packages used for AVERON are outdated or incompatible (related to
   Step 2 in [135]Before you begin).

Potential solution

   Check whether the package is outdated or incompatible. If it is
   outdated, use conda update <package_name> or pip install --upgrade
   <package_name> to update it, replacing <package_name> with the name of
   the outdated package. If the package is incompatible and a specific
   version is needed, reinstall it using the conda install
   <package_name>=<version_number> or pip install
   <package_name>==<version_number> commands, replacing <package_name>
   with the name of the package and <version_number> with the required
   version number. Detailed instructions on managing packages with
   conda can be found at
   [136]https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html.
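
   Before updating anything, it can help to see which versions are
   actually installed in the active environment. The sketch below does
   this with the Python standard library. The package names are
   placeholders; substitute the packages required by your AVERON
   installation.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Placeholder package list; the second name is deliberately nonexistent.
report = installed_versions(["pandas", "a-package-that-does-not-exist-xyz"])
for name, ver in report.items():
    print(f"{name}: {ver if ver else 'not installed'}")
```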

Problem 4

   The processing and analysis of a large number of neoPPIs (>200) can be
   slow (related to step 3).

Potential solution

   Split a large neoPPI set into smaller sets of <100 PPIs and analyze
   them individually, providing the corresponding files in step 3 of the
   “Determine PPI Levels in cancer samples” stage.
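
   The splitting itself can be scripted. The sketch below divides a list
   of binding partners into consecutive batches of fewer than 100 entries.
   The function name and the partner names are hypothetical, and each
   batch would still need to be written to its own input file in the
   format expected in step 3.

```python
def split_into_batches(items, max_size=99):
    """Split a list into consecutive batches of at most max_size items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]


# Hypothetical partner list standing in for a large neoPPI set.
partners_all = [f"PARTNER_{i}" for i in range(250)]
batches = split_into_batches(partners_all)
print([len(b) for b in batches])
```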

Problem 5

   Not enough samples with mutations of interest. (Related to step 1).

Potential solution

   Consider a simultaneous analysis of multiple or all mutations by
   setting the driver_mut variable in step 1 of the “[137]Determine PPI
   Levels in cancer samples” stage, for example:

   #Define the mutant driver
   driver_gene = 'SPOP'
   driver_mut = 'All'
   #driver_mut can be an array of point mutations or
   #driver_mut = 'ALL' for all mutants
   params['driver_gene']=driver_gene
   params['driver_mut']=driver_mut
   if driver_gene not in genes_with_expression:
       print("Gene not found")

   Alternatively, consider integrating other datasets, such as GENIE or
   GWAS sets.

Problem 6

   The error “Gene not found.” appears in step 1 of the “Determine PPI
   Levels in cancer samples” stage (related to step 1).

Potential solution

   Check the gene name provided. Make sure that the currently approved
   gene symbol, and not a common gene name, is used. For example, instead
   of driver_gene = 'LKB1', use driver_gene = 'STK11'. Refer to the HGNC
   database at [138]https://www.genenames.org for the standard gene symbols.
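
   A small lookup can catch common aliases before the analysis starts.
   The sketch below resolves a name against a set of known symbols and a
   short alias table. The alias table holds only three well-known
   examples, and the function name and the demo symbol set are
   hypothetical; a complete check should rely on the HGNC data.

```python
# Small illustrative alias table; a real check should use the HGNC data.
ALIAS_TO_SYMBOL = {"LKB1": "STK11", "HER2": "ERBB2", "p53": "TP53"}


def resolve_symbol(name, known_symbols, aliases=ALIAS_TO_SYMBOL):
    """Return an approved symbol for name, or None if it cannot be resolved."""
    key = name.strip()
    if key in known_symbols:
        return key
    candidate = aliases.get(key)
    return candidate if candidate in known_symbols else None


# Hypothetical stand-in for the genes_with_expression collection.
genes_with_expression_demo = {"STK11", "ERBB2", "TP53", "BRAF"}
print(resolve_symbol("LKB1", genes_with_expression_demo))
```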

Resource availability

Lead contact

   Andrey A. Ivanov (andrey.ivanov@emory.edu).

Technical contact

   Andrey A. Ivanov (andrey.ivanov@emory.edu).

Materials availability

   This study did not generate new unique reagents.

Data and code availability

   This paper analyzes existing, publicly available data. The accession
   numbers for the datasets are listed in the [139]key resources table.

   All original code has been deposited on GitHub and is publicly
   available as of the date of publication. The link is provided in the
   [140]key resources table.

   Any additional information required to reanalyze the data reported in
   this paper is available from the [141]lead contact upon request.

Acknowledgments