Graphical abstract (graphic file fx1.jpg)

Highlights
* Protocol to evaluate neoPPI functions and clinical significance in cancer using AVERON
* Steps to estimate and compare neoPPI levels across different cancer patient cohorts
* Procedures for uncovering neoPPI-regulated genes and mapping them on defined pathways
* Discovery of druggable clinically significant neoPPI-regulated genes

Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.

While some tumor driver mutations inhibit existing protein-protein interactions (PPIs), others can create neomorph interactions (neoPPIs) not characteristic of the wild-type counterparts. Such tumor-specific neoPPIs may represent targets for therapeutic interventions. Here, we present a protocol to computationally uncover neoPPI-enabled druggable tumor dependencies using the AVERON Notebook environment. We describe steps for determining PPI levels, identifying clinically significant neoPPIs, and determining neoPPI-regulated pathways. We then detail procedures for determining neoPPI-regulated therapeutically actionable targets.

Before you begin
Oncogenic mutations in tumor-driver genes are among the key events in cancer initiation and progression.[31]^2^,[32]^3 Such mutations can change protein structure, function, and cellular localization, ultimately rewiring protein-protein interaction (PPI) networks and dysregulating signaling and metabolic pathways.
Discovery of the most clinically and biologically significant mutant-directed neomorph PPIs (neoPPIs) that drive cancer is vital to developing new personalized clinical strategies.[33]^4^,[34]^5 However, experimental interrogation of neoPPI functions and therapeutic potential is highly challenging even in cell-based models and is currently not feasible in cancer patients. To address this challenge, we developed a computational platform, termed AVERON Notebook, to discover Actionable Vulnerabilities Enabled by Rewired Oncogenic Networks.[35]^1 Implemented in the widely used Jupyter Notebook format, AVERON Notebook enables (i) systematic profiling of the association between worse clinical outcomes and neoPPI levels, rather than the status of individual genes; (ii) uncovering molecular mechanisms of neoPPI-driven tumorigenesis; and (iii) identification of druggable and therapeutically significant neoPPI-regulated genes to inform new biological models and personalized therapeutic strategies in mutant-driven cancers. The protocol below describes the features and capabilities of the AVERON Notebook and gives step-by-step instructions tailored to specific goals and analyses.

Software installation
Timing: 15 min
* 1. Download and install Jupyter Lab or Jupyter Notebook.
Note: Jupyter Notebook or Jupyter Lab is needed to run the AVERON Notebook. We recommend installing Jupyter Lab as part of Anaconda from [36]https://www.anaconda.com/download.
* 2. Check and install the Python packages required by the AVERON functions, referring to the Jupyter Lab, conda, and pip documentation for package installation instructions:
+ a. [37]https://jupyterlab.readthedocs.io/en/stable/user/index.html.
+ b. [38]https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html.
+ c.
[39]https://packaging.python.org/en/latest/tutorials/installing-packages/#ensure-you-can-run-pip-from-the-command-line.
Note: The AVERON Notebook was tested with Jupyter Lab 3.6.3 installed with Anaconda 2.5.0. After a fresh installation, the following packages had to be installed: ipycytoscape, lifelines, scikit_posthocs, py4cytoscape, and pygtop==2.1.4.
* 3. Install Cytoscape software[40]^6 for network visualization from [41]https://cytoscape.org/download-platforms.html.

Download associated files
Timing: 1 min
* 4. Download the AVERON Notebook repository from GitHub: [42]https://github.com/aivanovlab/averon_notebook.

Loading Python general environment
Timing: 5 min
* 5. Run Jupyter Lab.
* 6. In Jupyter Lab, open “averon_notebook.ipynb” located in the AVERON/notebook folder.
* 7. Import essential dependencies and set up the general environment.
* 8. Set the path to the AVERON parent folder and execute the following code:
    #Make sure the parent_folder is set to the root Averon folder
    %reset
    parent_folder = "C:\\AVERON\\"
Note: When prompted with “Once deleted, variables cannot be recovered. Proceed (y/[n])?”, type “y” in the text box to confirm the reset of all variables.
* 9. Execute the #Import essentials cell to import required packages:
    #Import essentials
    import os
    import sys
    import warnings
    import importlib, ipycytoscape, requests, pygtop, lifelines
    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    import ipywidgets as widgets
    import seaborn as sns
    from IPython.display import display, HTML
    #Make the AVERON source folder importable
    os.chdir(parent_folder)
    sys.path.insert(0, os.path.abspath('src'))
    import averon as av
    warnings.filterwarnings('ignore')
    pd.set_option("display.max_rows", None)
    pd.set_option('display.max_columns', None)
    #Embed editable (TrueType) fonts in saved PDF/PS figures
    plt.rcParams['pdf.fonttype'] = 42
    plt.rcParams['ps.fonttype'] = 42
    %matplotlib inline
* 10. Define the folders and the input files.
    #Define folders
    #Set output folders:
    project_folder = parent_folder+"Projects"+"/"
    #Folder with all input files:
    data_folder = parent_folder+"input"+"/"
    #pathway folder.
    pathway_folder = data_folder+"pathway_genesets/MSigDB/"
    #Set input files
    genomics_clinical_folder = data_folder+"genomics_and_clinical/"
    #HGNC gene annotations from [43]https://www.genenames.org/
    hgnc_f = genomics_clinical_folder + "hgnc5.txt"
    #Mutation data from [44]https://gdc.cancer.gov/about-data/publications/pancanatlas
    mut_f = genomics_clinical_folder +"braf_v600e.maf"
    #mRNA expression data from [45]https://gdc.cancer.gov/about-data/publications/pancanatlas
    exp_f = genomics_clinical_folder + "EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.refined.tsv"
    #clinical_f and uuid_f must be defined before this point; their definitions are not shown in this listing
    params = {}
    params['projects_folder']=project_folder
    params['mut_f']=mut_f
    params['exp_f']=exp_f
    params['clinical_f']=clinical_f
    params['uuid_f']=uuid_f
    params['hgnc_f']=hgnc_f
    params['data_folder']=data_folder
    params['pathway_folder']=pathway_folder
    params['genomics_clinical_folder']=genomics_clinical_folder
* 11. Check if all required files and folders are in place.
    #Check files & folders
    av.check_files(params)
Note: The mRNA expression file “EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv” is not provided as part of the AVERON Notebook. It should be downloaded from [46]https://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611 and pre-processed by executing the #Download TCGA expression data cell. This step needs to be done only once and can be skipped in subsequent AVERON Notebook sessions.
    #Download TCGA expression data
    exp_f = av.download_and_refine_tcga_exp_data(params)
* 12. Provide the project name to create a new project.
    #Provide a new project name:
    project_name = "Project1"
    params = av.new_project(project_name,project_folder,params)
    tbl_folder = params['tbl_folder']
    net_folder = params['net_folder']
    fig_folder = params['fig_folder']
* 13. Load cancer patient IDs, mRNA expression data, and the coding gene names by executing the #Get TCGA Patient IDs, #Get genes with mRNA expression data, and #Get protein coding genes cells.
    #Get TCGA Patient IDs
    barcode_df = av.prepare_cancer_barcodes(clinical_f)
    cancers = barcode_df.columns.tolist()
    params['barcode_df']=barcode_df
    barcode_df.head(3)
    #Get genes with mRNA expression data
    genes_with_expression=av.get_mRNA_expession(exp_f)
    print("A total of ",len(genes_with_expression), "genes with expression data")
    #Get protein coding genes
    hgnc_df = pd.read_csv(hgnc_f,sep='\t',index_col = 0)
    hgnc_df.set_index('Approved symbol',inplace=True)
    coding_genes = av.get_coding_genes(hgnc_df.copy()).index.values
    params['coding_genes']=coding_genes
    print("There is a total of",len(coding_genes),"coding genes")

Key resources table
REAGENT or RESOURCE | SOURCE | IDENTIFIER
Deposited data
Mutation-directed neo-protein-protein interactions | Mo et al.[47]^4 | [48]https://doi.org/10.1016/J.CELL.2022.04.014
MSigDB | Liberzon et al.[49]^7 | [50]https://www.gsea-msigdb.org/gsea/msigdb
KEGG | Kanehisa et al.[51]^8 | [52]https://www.genome.jp/kegg
REACTOME | Fabregat et al.[53]^9 | [54]https://reactome.org
IUPHAR database | Harding et al.[55]^10 | [56]https://www.guidetopharmacology.org
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | ACC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | BLCA
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | BRCA
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | CESC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | CHOL
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | COAD
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | DLBC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | ESCA
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | GBM
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | HNSC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | KICH
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | KIRC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | KIRP
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LAML
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LGG
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LIHC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LUAD
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | LUSC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | MESO
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | OV
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | PAAD
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | PCPG
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | PRAD
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | READ
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | SARC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | SKCM
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | STAD
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | TGCT
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | THCA
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | THYM
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | UCEC
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | UCS
TCGA pan-cancer RNA-seq, mutation, and clinical data | NCI GDC Data Portal | UVM
Ligand-target interactions | IUPHAR/BPS Guide to PHARMACOLOGY database | [57]https://blog.guidetopharmacology.org/2024/03/27/database-release-2024-1/
Software and algorithms
AVERON Notebook | Chen et al.[58]^1 | [59]https://github.com/aivanovlab/averon_notebook [60]https://doi.org/10.5281/zenodo.13926943
Python 3 | Python Software Foundation | [61]https://www.python.org
Jupyter Notebook environment | Ragan-Kelley et al.[62]^11 | [63]www.jupyter.org
MSigDB | Liberzon et al.[64]^7 | [65]https://www.gsea-msigdb.org/gsea/msigdb
Cytoscape | Shannon et al.[66]^6 | [67]https://cytoscape.org

Step-by-step method details
Determine PPI levels in cancer samples
Timing: 6 min
In this step, we determine PPI levels across cancer samples by calculating PPI scores for both the mutant and the wild-type driver.
* 1. Define the mutant tumor driver gene.
+ a. In the #Define the mutant driver cell, provide the standard gene symbol for the tumor driver gene of interest in the driver_gene variable.
+ b. Define a single mutation or list several mutations in the driver_mut variable.
+ c. Use “ALL” to consider all driver mutations.
Note: Here, we use the BRAF V600E mutant as an example.
    #Define the mutant driver
    driver_gene = 'BRAF'
    driver_mut = ['p.V600E']
    #driver_mut can be a list of point mutations, or
    #driver_mut = 'ALL' for all mutants
    params['driver_gene']=driver_gene
    params['driver_mut']=driver_mut
    if driver_gene not in genes_with_expression:
        print("Gene not found")
* 2. Determine mutation frequency across cancer types.
+ a. Build and save a graph with the frequency of the specified mutation by executing the #Mutation frequency cell ([69]Figure 1A).
    #Mutation frequency
    fig,cancers_df=av.get_cancer_mutation_freq(barcode_df,mut_f,driver_gene,
        driver_mut,cancers)
    #Save the graph
    fig.savefig(fig_folder+av.gen_filename(driver_gene,driver_mut,
        "_frequency_across_cancers",".png"), dpi=600,format='png')
+ b. Get the mutation frequency in a tabular format ([70]Figure 1B) by executing the #Show mutation frequency cell.
The data will be saved to the Project/Tables folder.
    #Show mutation frequency and save the table
    tbl_file = tbl_folder+av.gen_filename(driver_gene,driver_mut,
        "_frequency_across_cancers",".tsv")
    cancers_df.to_csv(tbl_file,sep='\t')
* 3. Define binding partners.
+ a. Execute the #Load partners cell to load all binding partners from an external file located in the AVERON/input folder. In this tutorial, we use the BRAF_neo_partners_example.txt file, which contains three BRAF V600E neoPPI partners, one per line:
o i. AURKA.
o ii. RAB25.
o iii. CDK4.
    #Load partners
    ppi_file = data_folder + "BRAF_neo_partners_example.txt"
    partners = av.load_partners(ppi_file)
    params['partners']=partners
+ b. Alternatively, use the #Enter partners manually cell to specify binding partners by populating a list named “partners”:
Note: If a binding partner lacks mRNA expression data in the database, a message naming the gene and stating “not in MUT/EXP dataset. Check gene name.” will be displayed.
    #Enter partners manually
    partners = ['AURKA', 'CDK4', 'RAB25']
    for partner in partners:
        if partner not in genes_with_expression:
            print(partner,"not in MUT/EXP dataset. Check gene name.")
    params['partners']=partners
* 4. To get detailed annotations for the binding partners, execute the #Get binding partner annotation cell. The cell annotates the proteins using information from the HGNC database[71]^12 ([72]Figure 2) and saves the annotations to the Project/Tables folder:
    #Get binding partner annotations
    partner_hgnc_df = av.get_partner_info(partners, hgnc_df)
    partner_hgnc_df.to_csv(tbl_folder+"binding_partners.csv",sep=',')
    partner_hgnc_df.head(3)
* 5. Define the cancer type for the analysis.
Note: The [73]standard TCGA cancer type abbreviations should be used. [74]Figure 1 can be used as a reference for available cancer types.
    #Indicate the cancer type
    cancer = 'SKCM'
Figure 1.
BRAF V600E mutation frequency across different cancer types
(A) The bars indicate the number of samples of each cancer type, including samples with the wild-type gene (gray), target mutation (green), and other mutations (blue).
(B) The data can be exported to a tabular format.
Figure 2.
Detailed annotations of the binding partners are provided based on the HGNC database
* 6. Extract and prepare mRNA expression data for the analysis by executing the #Get expression data cell:
    #Get expression data
    df_mut_exp_samples,df_wt_exp_samples = av.get_wt_mut_expression(cancer, params)
    df_mut_exp_samples.head(3)
* 7. Evaluate neoPPI levels.
+ a. Calculate the PPI scores for a single cancer type by executing the #Calculate PPI scores cell.
    #Calculate PPI scores
    all_mut_score_nn_df,all_wt_score_nn_df,scores_mut_nn_df, scores_wt_nn_df = av.calculate_ppi_scores_not_scaled(cancer, params,
        df_wt_exp_samples,df_mut_exp_samples)
    all_mut_score_nn_df.head(3)
+ b. Identify co-regulated neoPPIs, that is, neoPPIs whose PPI scores are significantly correlated in mutant samples ([80]Figure 3).
    #Heatmap of co-regulated neoPPIs
    #The heatmap shows pairwise correlations between neoPPI scores
    corr = av.coregulated_neoppis_map(all_mut_score_nn_df, fig_folder,cancer,'ward',10,10)
+ c. To get PPI scores across multiple cancer types, execute the #PPI scores for multiple cancers cell, indicating the cancer types of interest using the standard TCGA abbreviations.
    #PPI scores for multiple cancers
    ppi_score_dict = av.get_ppi_scores_multiple_cancers(['THCA','COAD','SKCM'], params)
+ d. Compare neoPPI levels across different cancer types.
o i. Execute the #Compare PPI scores cell to generate a table with statistical differences between PPI scores calculated for different cancer types ([81]Figure 4A).
    #Compare PPI scores
    compare_ppi_scores_df=av.compare_ppi_scores(ppi_score_dict,partners)
    compare_ppi_scores_df.head(3)
o ii. To save the table to the Project/Tables folder, execute the #Save it! cell.
o iii. The #Get the heatmap cell generates a heatmap in which a brighter color indicates a higher average neoPPI score ([82]Figure 4B). The #Save the heatmap cell saves the heatmap image to the Project/Figures folder.
    #Save it!
    compare_ppi_scores_df.to_csv(tbl_folder+av.gen_filename(driver_gene,
        driver_mut,'COAD_SKCM_THCA','_AVR_neoPPIscores.csv'),sep=',')
    #Get the heatmap
    g=av.get_neo_ppi_score_heatmap(compare_ppi_scores_df)
Note: AVERON also allows the analysis of neoPPI scores in individual tumor samples, annotated with associated metadata such as patient race, gender, and age. To generate annotated heatmaps with the neoPPI scores per sample, execute the #Heatmaps of neoPPI scores per tumor sample cell ([83]Figure 5A).
    #Heatmaps of neoPPI scores per tumor sample
    out = {}
    for cancer in ['COAD','THCA','SKCM']:
        g,out_df = av.build_heatmap(cancer,ppi_score_dict,params,
            ['Race','Gender','Age'])
        out[cancer]=out_df
o iv. To visualize the distribution of neoPPI scores across cancer types for multiple genes as boxplots, execute the #neoPPI score distribution boxplots cell ([84]Figure 5B). To keep the boxplots for future reference, run the #Save it cell that follows.
    #neoPPI score distribution box plots
    cancers = ['THCA','COAD','SKCM']
    #genes = ['GOT1','CNKSR1','AJUBA','NOX1','CHD1L','DLL3']
    genes = partners
    colors = ['#00FF00', '#FFFF00', '#FF0000']
    fig=av.boxplot_neoppi_score_distribution(genes,cancers,
        ppi_score_dict,colors,'gainsboro')
    #Save it
    fig_file = fig_folder+av.gen_filename(driver_gene,driver_mut,'COAD_SKCM_THCA',
        "_neoPPI_distribution_boxplots1.pdf")
    fig.savefig(fig_file, dpi=600,format='pdf')
o v. The exact neoPPI score values per tumor sample can be extracted by running the #Get the PPIScore values cell ([85]Figure 5C). Use the partner variable to indicate the binding partner and the cancer variable to indicate the cancer type:
    #Get the PPIScore values
    partner = "AURKA"
    cancer = "SKCM"
    av.get_ppi_values(cancer,partner,ppi_score_dict)
o vi.
The #Get sample details cell provides the detailed metadata associated with an individual sample and a direct link to the GDC Data Portal[86]^13 for further exploration ([87]Figure 6):
    #Get sample details
    av.get_sample_info('TCGA-FW-A5DX')
Figure 3.
Analysis of co-regulated neoPPIs
The heatmap (A) and table (B) show the lack of correlation between BRAF V600E/RAB25 neoPPI scores and the scores obtained for BRAF V600E neoPPIs with AURKA or CDK4. There is a more prominent correlation of 0.7 between BRAF V600E neoPPIs with AURKA and CDK4.
Figure 4.
Evaluation of BRAF V600E neoPPI levels across colon (COAD), thyroid (THCA), and skin melanoma (SKCM) cancers
The AVERON Notebook enables the comparison of neoPPI levels in terms of the neoPPI scores provided in a tabular format (A) or as a heat map (B). The averaged PPI scores calculated for thyroid cancer, colon cancer, and skin melanoma are provided in the THCA AVR, COAD AVR, and SKCM AVR columns. The corresponding THCA SD, COAD SD, and SKCM SD columns show the standard deviation of the neoPPI scores. The FC columns indicate the fold change values calculated as the ratio of the mean neoPPI scores. PVAL THCA-COAD, THCA-SKCM, and COAD-SKCM show the pairwise Dunn’s test p-values. The Kruskal-Wallis test p-values and q-values are provided in the “KW H TEST PVAL” and “KW H TES QVAL” columns, respectively.
Figure 5.
Analysis of neoPPI score values in individual tumor samples
(A) The heatmaps show BRAF V600E neoPPI scores in samples from COAD, SKCM, and THCA patients.
(B) The boxplots show the score distribution of BRAF V600E neoPPIs with AURKA, RAB25, and CDK4 in COAD, SKCM, and THCA. The dots indicate neoPPI score values in individual tumor samples. Boxes represent the interquartile range (IQR), and the midline corresponds to the median. Whiskers extend to Q1 − 1.5 × IQR and Q3 + 1.5 × IQR.
(C) The data can also be extracted in a tabular format.
Figure 6.
Patient sample details
The data are extracted from the GDC Data Portal. Further details can be found at the GDC Data Portal by clicking the direct uuid hyperlink.

Identify clinically significant neoPPI
Timing: 1 min
Note: This step conducts a Kaplan-Meier survival analysis based on the neoPPI levels. The analysis compares the survival times of patients with high (above the median) and low (below the median) neoPPI scores.
* 8. Set the cancer type and calculate neoPPI scores prior to the survival analysis by executing the #Define cancer type & update PPI scores cell.
    #Define cancer type & update PPI scores
    cancer = 'SKCM'
    df_mut_exp_samples,df_wt_exp_samples = av.get_wt_mut_expression(cancer,params)
    all_mut_score_nn_df,all_wt_score_nn_df,scores_mut_nn_df, scores_wt_nn_df = av.calculate_ppi_scores_not_scaled(cancer, params,
        df_wt_exp_samples,df_mut_exp_samples)
* 9. Conduct the survival analysis by executing the #Conduct survival analysis cell. The plots and statistics will be shown and automatically saved to the Project/Figures and Project/Tables folders, respectively ([96]Figure 7).
    #Conduct survival analysis
    pval = 0.1
    qval = 0.25
    surv_sum_df,fig,m1,m2,mut_surv=av.survival_analysis(
        df_mut_exp_samples.columns,clinical_f,all_mut_score_nn_df,
        'significant',pval)
    #Show statistics for the survival analysis
    #[the statement below is truncated in the source after the PVALUE term]
    surv_sum_df.loc[(surv_sum_df['MEDIAN_TIME_HIGH'] < surv_sum_df['MEDIAN_TIME_LOW']) & (surv_sum_df['PVALUE'] [...]
    display(HTML(surv_sum_df.style.to_html()))
Note: To calculate q-values, AVERON Notebook calls the get_FDR(pvals) function from /AVERON/src/averon.py, which returns q-values calculated with the multipletests function from the statsmodels.sandbox.stats.multicomp library. By default, the Benjamini/Hochberg method is used.
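As an illustration of what the Benjamini/Hochberg adjustment mentioned in this Note does, here is a minimal pure-Python sketch. It is not the AVERON get_FDR implementation and does not call statsmodels; the function name bh_qvalues and the example p-values are hypothetical and chosen only for demonstration.

```python
def bh_qvalues(pvals):
    """Benjamini/Hochberg adjusted p-values (q-values), in input order.

    Illustrative sketch only; `bh_qvalues` is a hypothetical helper,
    not part of AVERON or statsmodels.
    """
    m = len(pvals)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone (a smaller p never gets a larger q)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical log-rank p-values for four neoPPIs
example_pvals = [0.01, 0.04, 0.03, 0.20]
example_qvals = bh_qvalues(example_pvals)
print(example_qvals)
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the q-values ordered consistently with the p-values.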
However, multipletests provides other methods for p-value adjustment, including:
`bonferroni` : one-step correction
`sidak` : one-step correction
`holm-sidak` : step-down method using Sidak adjustments
`holm` : step-down method using Bonferroni adjustments
`simes-hochberg` : step-up method (independent)
`hommel` : closed method based on Simes tests (non-negative)
`fdr_bh` : Benjamini/Hochberg (non-negative)
`fdr_by` : Benjamini/Yekutieli (negative)
`fdr_tsbh` : two stage fdr correction (non-negative)
`fdr_tsbky` : two stage fdr correction (non-negative)
Further details and guidance about multipletests can be found at [97]https://www.statsmodels.org.
Figure 7.
neoPPI score-based survival analysis enables the evaluation of neoPPI clinical significance
(A) Kaplan-Meier plots and (B) the statistics for the correlation between SKCM patient clinical outcomes and BRAF V600E neoPPIs with AURKA, CDK4, and RAB25. The MEDIAN_TIME_HIGH and MEDIAN_TIME_LOW columns indicate the median survival times of patients with high and low neoPPI levels, respectively. PVALUE indicates the log-rank test p-values. QVALUE indicates the q-values.

Determine neoPPI-regulated genes
Timing: 4 min
Note: This step determines neoPPI-regulated pathways to explore the biological mechanisms and functions behind neoPPI-regulated genes.
* 10. Identify neoPPI-regulated genes.
+ a. Set the cancer type and calculate the PPI scores prior to the analysis of neoPPI-regulated genes by executing the #Define cancer type & update PPI scores cell.
    #Define cancer type & update PPI scores
    cancer = 'SKCM'
    df_mut_exp_samples,df_wt_exp_samples = av.get_wt_mut_expression(cancer,params)
    all_mut_score_nn_df,all_wt_score_nn_df,scores_mut_nn_df, scores_wt_nn_df = av.calculate_ppi_scores_not_scaled(cancer, params,
        df_wt_exp_samples,df_mut_exp_samples)
+ b.
Run the #neoPPI-correlated genes cell to determine the correlation between neoPPI scores and gene expression in mutant samples. The results will be automatically saved to the Project/Tables/Correlated_genes folder.
    #neoPPI-correlated genes
    corr_dict = av.calculate_correlations(df_mut_exp_samples,df_wt_exp_samples,partners,
        all_mut_score_nn_df,all_wt_score_nn_df)
    #Save it
    folder = tbl_folder+'Correlated_genes/'
    if not os.path.exists(folder):
        os.makedirs(folder)
    for p in corr_dict.keys():
        corr_dict[p].to_csv(folder+av.gen_filename(driver_gene,driver_mut,cancer,"_"+p+"_correlated_genes.csv"),sep=',')
    print("The analysis of neoPPI-correlated genes has been completed.")
+ c. Alternatively, previously calculated neoPPI-correlated genes can be loaded by executing the #Load correlated genes from file cell.
    #Load correlated genes from file
    cancer = 'SKCM'
    folder = tbl_folder+'Correlated_genes/'
    corr_dict = {}
    for partner in partners:
        corr_dict[partner] = pd.read_csv(folder+av.gen_filename(driver_gene,driver_mut,cancer,
            "_"+partner+"_correlated_genes.csv"),sep=',',index_col=0)
+ d. Determine signature genes for individual neoPPI.
o i. Indicate the neoPPI partner by executing the #Select the partner cell, followed by the #Get signature genes cell.
    #Select the partner:
    partner = "AURKA"
    #Get signature genes
    #Callback: recompute the signature genes whenever a threshold control changes
    def on_value_change(change):
        global sig_genes
        sig_genes, tbl = av.get_signature_genes2(corr_dict,partner,driver_gene,
            box.children[0].children[0].value,box.children[0].children[1].value,
            box.children[0].children[2].value,box.children[0].children[3].value)
        return(box,sig_genes,tbl)
    box,sig_genes = av.display_signature_genes(corr_dict,partner,driver_gene)
    #Attach the callback to the four threshold controls
    [box.children[0].children[x].observe(on_value_change, names='value') for x in range(0,4)];
    display(box)
Note: The interactive scrollbars ([100]Figure 8A) allow us to adjust the statistical thresholds.
Each signature gene in the display is linked to the HGNC portal ([101]https://www.genenames.org) for detailed gene information ([102]Figure 8B).
o ii. Execute the #Save signature genes statistics cell to view and save the detailed statistical characteristics of the identified signature genes ([103]Figure 8C).
o iii. To visualize the signature gene network, start the Cytoscape application.
o iv. Then, execute the #Network of signature genes cell.
    #Network of the signature genes
    node_df = pd.DataFrame(sig_genes)
    node_df['PARTNER']=partner
    types = [[g,"GENE"] for g in sig_genes]
    types.append([partner,'PARTNER'])
    types_df = pd.DataFrame(types,columns=['node','type'])
    p4c,file = av.display_cynetwork3(params,cancer,node_df,types_df,
        ['red','blue'],['PARTNER','GENE'],layout='force-directed')
    p4c.notebook_show_image(file)
Note: The network will appear in the Notebook as an image and in the Cytoscape software as an interactive network ([104]Figure 9).
+ e. Determine neoPPI-regulated genes for multiple neoPPIs.
o i. Adjust the statistical thresholds in the #Get signature genes of multiple neoPPIs cell. The following settings are used by default:
    CORR_BP_MUT = 0.33
    PVAL_BP_MUT = 0.05
    PVAL_MUT_vs_WT = 0.05
    QVAL_MUT_vs_WT = 0.25
o ii. Execute the cell to identify signature genes for all binding partners defined in step 3. The resulting table with identified genes and associated statistical parameters will be shown and saved to the Project/Tables folder.
+ f. Compare the size and composition of signature gene sets.
o i. Execute the #Size distribution of neoPPI-regulated gene sets cell.
o ii. Execute the #neoPPI-signature gene set overlap, JACCARD score cell to calculate Jaccard scores ([105]Figure 10).
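The Jaccard score used in step f.ii to compare signature gene sets is the size of the intersection of two sets divided by the size of their union. The sketch below illustrates the calculation in plain Python; it is not the av.jaccard implementation, the helper names are hypothetical, and the toy gene sets are invented for demonstration rather than taken from AVERON results.

```python
from itertools import combinations


def jaccard_index(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Two empty sets: define the index as 0.0 to avoid division by zero
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(gene_sets):
    """Jaccard index for every unordered pair of named gene sets."""
    return {
        (p, q): jaccard_index(gene_sets[p], gene_sets[q])
        for p, q in combinations(sorted(gene_sets), 2)
    }


# Hypothetical toy signature gene sets keyed by binding partner
toy_sets = {
    "AURKA": {"PLK1", "CCNB1", "BUB1"},
    "CDK4": {"PLK1", "E2F1"},
    "RAB25": {"KRT8"},
}
toy_scores = pairwise_jaccard(toy_sets)
for pair, score in toy_scores.items():
    print(pair, round(score, 3))
```

A score of 0 means the two signature sets share no genes, and a score of 1 means they are identical.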
    #Get signature genes of multiple neoPPIs
    #CORR_BP_MUT, PVAL_BP_MUT, and PVAL are set here to the defaults listed in step e.i
    CORR_BP_MUT = 0.33
    PVAL_BP_MUT = 0.05
    PVAL = 0.05
    QVAL = 0.25
    sign_gene_dict = {}
    sign_gene_dict,sign_gene_df= av.get_signature_genes_for_multiple_binding_partners(CORR_BP_MUT,
        PVAL_BP_MUT,PVAL,QVAL,corr_dict,cancer,driver_gene,driver_mut)
    #Show the table
    display(HTML(sign_gene_df.style.to_html()))
    #Save it!
    sign_gene_df.to_csv(tbl_folder+av.gen_filename(driver_gene,driver_mut,
        cancer,"_signature_genes.csv"),sep=",")
    #Size distribution of neoPPI-regulated gene sets
    fig,stat_df = av.neoPPI_genes_distr(sign_gene_dict,'red',8,8)
    fig.savefig(fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
        "_signature_genes_hist.pdf"),dpi=600,format="pdf")
    stat_df.sort_values(by='SIZE',ascending=False).head(10)
    #neoPPI-signature gene set overlap, JACCARD score
    g=av.jaccard(sign_gene_dict,fig_folder,'Blues',8,8)
    g.savefig(fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
        "_jaccard_heatmap1.pdf"),dpi=600,format='pdf')
+ g. To visualize the correlation between PPI scores and mRNA expression of a signature gene in mutant and wild-type samples, execute the #PPI-score/mRNA expression correlation cell. The correlation plots will be automatically saved to the Project/Figures folder ([106]Figure 11).
Note: The partner and gene variables define the neoPPI binding partner and the signature gene, respectively.
    #PPI-score/mRNA expression correlation
    partner='AURKA'
    gene = 'PLK1'
    fig = av.ppi_score_mrna_corr(gene,driver_gene,partner,all_mut_score_nn_df,
        all_wt_score_nn_df, df_mut_exp_samples,df_wt_exp_samples,
        'darkorange','deepskyblue')
    #Save it
    fig_file = fig_folder+av.gen_filename(driver_gene,driver_mut,
        cancer,"_"+partner+"_"+gene+"_correlattion_plot.png")
    fig.savefig(fig_file, dpi=600,format='png')
Figure 8.
Determine signature genes for individual neoPPI
(A) An interactive tool enables easy adjustment of statistical thresholds: the correlation coefficient between the neoPPI score and expression of the neoPPI-regulated gene in mutant samples (CORR_BP_MUT), the p-value for the CORR_BP_MUT correlation (PVAL_BP_MUT), the p-value of the statistical difference between CORR_BP_MUT and CORR_BP_WT (PVAL_MUT_vs_WT), and the q-value of the statistical difference between CORR_BP_MUT and CORR_BP_WT (QVAL_MUT_vs_WT).
(B) The identified signature genes are shown and hyperlinked to their corresponding pages on the HGNC database website.
(C) The signature gene statistical parameters are summarized in the table:
GENE: neoPPI-regulated gene
CORR_BP_MUT: correlation coefficient between the neoPPI score and expression of the neoPPI-regulated gene in mutant samples
M_MUT: number of mutant samples used to calculate CORR_BP_MUT
CORR_BP_WT: correlation coefficient between the neoPPI score and expression of the neoPPI-regulated gene in wild-type samples
N_WT: number of wild-type samples used to calculate CORR_BP_WT
PVAL_BP_MUT: p-value of the CORR_BP_MUT correlation
QVAL_BP_MUT: q-value of the CORR_BP_MUT correlation
PVAL_BP_WT: p-value of the CORR_BP_WT correlation
QVAL_BP_WT: q-value of the CORR_BP_WT correlation
PVAL: p-value of the statistical difference between CORR_BP_MUT and CORR_BP_WT
QVAL: q-value of the statistical difference between CORR_BP_MUT and CORR_BP_WT
Figure 9.
The signature gene network visualization
The BRAF V600E/AURKA neoPPI signature gene network is shown as an example. The neo-binding partner (AURKA) is shown in yellow. The signature genes are shown in light blue.
Figure 10.
Size distribution and the similarity analysis of neoPPI-regulated gene sets
The size distribution can be visualized as a histogram (A) and saved in a tabular format (B). (C) A heatmap shows the neoPPI-regulated gene set similarity evaluated in terms of the Jaccard index. The overall average Jaccard index of 3.7% indicates very limited overlap between the signature gene sets of different neoPPIs.

Figure 11. neoPPI scores/mRNA expression correlation analysis
Regression plots show the correlation between PLK1 expression and neoPPI scores calculated for (A) BRAF V600E/AURKA and (B) BRAF WT/AURKA PPIs.

Determine neoPPI-regulated pathways

* 11. Define the reference gene sets in the #Define the pathway gene sets to analyze cell.

#Define the pathway gene sets to analyze
#Currently Averon uses genesets defined in MSigDB: [115]https://www.gsea-msigdb.org/gsea/msigdb
pathway_files = ["h.all.v2022.1.Hs.symbols.gmt",
    "c2.cp.kegg.v2022.1.Hs.symbols.gmt",
    "c2.cp.reactome.v2022.1.Hs.symbols.gmt"]
params['pathway_files']=pathway_files

Note: In this example, we use the “h.all.v2022.1.Hs.symbols.gmt”, “c2.cp.kegg.v2022.1.Hs.symbols.gmt”, and “c2.cp.reactome.v2022.1.Hs.symbols.gmt” gene sets defined in the Molecular Signatures Database (MSigDB).[116]^7^,[117]^14 The corresponding reference GMT files are located in the AVERON\input\pathway_genesets\MSigDB folder, which is defined by the pathway_folder = data_folder+"pathway_genesets/MSigDB/" variable.

* 12. To conduct the analysis, execute the #Conduct the pathway enrichment analysis cell. Use the partner variable to specify a binding partner for the analysis or set partner = "" to analyze all binding partners.
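For orientation, the reference files named in step 11 use the GMT format, a tab-separated text layout in which each line holds a gene set name, a description field, and then the member gene symbols. The sketch below is not part of AVERON: parse_gmt and the two example gene sets are hypothetical and only illustrate how such a file can be read into Python for inspection.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {gene_set_name: set_of_gene_symbols}.

    Hypothetical helper for illustration; it is not an AVERON function.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # GMT columns: set name, description, then one gene symbol per column
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


# Made-up gene sets used only to show the expected line layout
example_lines = [
    "HYPOTHETICAL_SET_A\thttps://example.org/a\tPLK1\tAURKA\tCDK4\n",
    "HYPOTHETICAL_SET_B\thttps://example.org/b\tRAB25\tPLK1\n",
]
gene_sets = parse_gmt(example_lines)
for name, genes in gene_sets.items():
    print(name, len(genes), sorted(genes))
```

Because parse_gmt accepts any iterable of lines, it could also be applied to an open file handle for one of the GMT files in the pathway folder.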
#Conduct the pathway enrichment analysis
#If partner = "", the enrichment analysis will be conducted for all binding partners
#If a particular partner is specified, the analysis will be conducted just for this partner
#partner="AURKA" #uncomment this line to specify a particular binding partner
partner = "" #use all the partners for the analysis
bars_dict,enrichment_dict=av.pathway(sign_gene_dict,pathway_files,partner,coding_genes,pathway_folder)
print("Pathway enrichment analysis completed!")

+ a. Execute the #Show bar graphs cell to visualize the results as bar graphs ([118]Figure 12A).

#Show bar graphs
partner = 'AURKA'
pathway = "h.all.v2022.1.Hs.symbols.gmt"
fig = bars_dict[partner][pathway]
fig

+ b. Save individual enrichment plots by executing the #Save the bar graph cell.
+ c. Save the enrichment analysis statistics by executing the #Save the enrichment analysis cell. Set the qval variable to change the q-value cut-off for statistical significance.
+ d. Run the #Show the enrichment statistics cell to show the enrichment analysis statistics.

Note: The qval variable defines the threshold for statistical significance, and the partner variable indicates the neo-binding partner for which the results will be shown.

#Save the bar graph
fig_file = fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
    "_"+partner+"_Enrichment_"+
    (".").join(pathway.split(".")[:-1])+".png")
fig.savefig(fig_file, dpi=600,format='png')

#Save the enrichment analysis
qval = 0.05
av.save_enrichment(enrichment_dict,qval,params,cancer)

#Show the enrichment statistics
partner = "AURKA"
qval = 0.05
enrichment_df = enrichment_dict[partner]
enrichment_df = enrichment_df.loc[enrichment_df['qvalue']<qval]
display(HTML(enrichment_df.style.set_properties(**{'text-align': 'left'}).to_html()
    + enrichment_df.style.to_html()))

* 13. Visualize and export the enrichment analysis as a network.
#Connect Mutant driver - Partner - Pathway
#Set the qval threshold:
qval=0.05
all_enrichment_df,all_enrichment_types_df = av.connect_mutant_driver_partner_pathway(enrichment_dict,
    driver_gene,qval)

+ a. Execute the #Connect Mutant driver - Partner - Pathway cell to prepare the network.
+ b. Generate an interactive network within the AVERON Notebook by executing the #Interactive network cell ([119]Figure 12B) or a Cytoscape network by executing the #Cytoscape network cell ([120]Figure 12C).

Note: The interactive network works best with a small number (<100) of nodes. For larger networks, we recommend exploring the network in Cytoscape. The Cytoscape application must be running before the Cytoscape network is generated. The Cytoscape network will be saved as an image to the Project/Figures folder and as an SIF file to the Project/Networks folder.

#Interactive network
selected_partners = ['AURKA','CDK4','RAB25'] #indicate partners to use.
#We don't recommend using more than 10 partners
ipycytoscape_obj = av.create_interactive_network2(
    all_enrichment_df.loc[all_enrichment_df['Partner'].isin(selected_partners)],
    all_enrichment_types_df,colors=['green','red','blue','orange'])
ipycytoscape_obj

#Cytoscape network
p4c = av.display_cynetwork2(all_enrichment_df,all_enrichment_types_df,
    colors=['#04DB00','#00CCCC','#CCCC00'],layout='force-directed')
os.chdir(fig_folder)
file = fig_folder+av.gen_filename(driver_gene,driver_mut,cancer,
    "_pathway_network.png")
p4c.export_image(file,overwrite_file=True);
#Save network to file
p4c.export_network(net_folder+av.gen_filename(driver_gene,driver_mut,
    cancer,"_pathway_network.sif"),type='SIF',overwrite_file=True);
p4c.notebook_show_image(file)

* 14. To represent the enrichment analysis in a heatmap format, execute the #Enrichment heatmap cell ([121]Figure 12D). The heatmap image will be saved to the Project/Figures folder.
* 15.
A descriptive summary of how different neoPPIs can regulate different biological pathways through their signature genes can be generated by executing the #Generate text description cell ([122]Figure 12E).

#Enrichment heatmap
#Set the qval threshold:
qval=0.05
all_enrichment_df,all_enrichment_types_df = av.connect_mutant_driver_partner_pathway(enrichment_dict,
    driver_gene,qval)
#select subset of pathway genesets to use
#e.g. HALLMARK, KEGG, REACTOME
subset = "HALLMARK"
g=av.enrichment_heatmap(subset,all_enrichment_df,driver_gene,
    fig_folder,'blue','auto')
g.savefig(fig_folder+av.gen_filename(driver_gene,driver_mut,
    cancer,"_"+subset+"_Enrichment_heatmap.pdf"),dpi=600,format='pdf')

#Generate text description
partner="AURKA"
qval = 0.05
description = av.get_txt_description_of_regulations(partner,
    enrichment_dict,qval,params)
a = ('<br>').join(description)
display(HTML(a))
#Save it!
fname = tbl_folder+av.gen_filename(driver_gene,driver_mut,cancer,
    "_"+partner+"_neoPPI_functions.txt")
with open(fname, 'w') as f:
    f.write(("\n").join(description))

Figure 12. The enrichment analysis of neoPPI-regulated genes conducted based on KEGG, REACTOME, and MSigDB cancer hallmark sets
(A) The bar graph shows enrichment of BRAF V600E/AURKA neoPPI-regulated genes in the MSigDB cancer hallmark genes. The red dashed line indicates the statistical significance cut-off of q-value = 0.05.
(B) An interactive network shows the connectivity between the mutant driver gene, neo-binding partners, and the neoPPI-regulated pathways. The mutant driver gene (e.g., BRAF V600E) is shown in green. The neo-binding partners (AURKA, CDK4, and RAB25) are shown in red. The neoPPI-regulated pathways are shown in gray. The gene and pathway names can be shown by hovering the mouse over the corresponding node.
(C) The network can also be visualized and explored in Cytoscape. The mutant driver gene, neo-binding partners, and neoPPI-regulated pathways are shown in green, red, and yellow, respectively. The pathway names are hidden for clarity.
(D) A heatmap shows the MSigDB cancer hallmark gene sets that can be regulated by BRAF V600E neoPPIs with AURKA, CDK4, and RAB25.
(E) A descriptive summary of how neoPPIs can regulate different oncogenic pathways.

Uncover druggable vulnerabilities

Timing: 11 min

Note: This step enables the exploration of neoPPI-regulated, clinically actionable targets with available approved drugs and inhibitors.

* 16. Determine clinically significant neoPPI-regulated genes.
+ a. Execute the #Clinically significant neoPPI-signature genes cell to determine neoPPI signature genes whose high expression in mutant samples correlates with worsened clinical outcomes ([125]Figure 13A).
Note: With default parameters, high expression of a gene is defined as expression above the 67th percentile, and low expression as expression below the 33rd percentile, in the mutant samples.

#Clinically significant neoPPI-signature genes
#use partners to determine clinically significant signature genes for all
#neoPPIs or provide a subset of neo binding partners e.g. ['AURKA'] or
#['AURKA','RAB25','CDK4']
survival_df,sign_gene_dict,survival_plots = av.sign_genes_survival(df_mut_exp_samples,
    sign_gene_dict,params,cancer,partners)
display(HTML(survival_df.loc[survival_df['CLIN_FDR']<0.1].sort_values(
    by=['CLIN_FDR']).style.to_html()))

+ b. To save the survival plot images for all neoPPIs analyzed, execute the #Save survival plots cell. The plots will be saved to the Project/Figures folder.

#Save survival plots
pval = 0.05
qval = 0.25
av.save_survival_plots(survival_plots,pval,qval,partner,params,
    cancer,survival_df)

Note: Use the pval and qval variables to define the statistical thresholds for saving the plots.

+ c. To visualize and save the survival plot for an individual gene, execute the #Survival plot for a single gene cell ([126]Figure 13B).

#Survival plot for a single gene
gene = 'PLK1'
#Save it
av.single_gene_survival_plot(df_mut_exp_samples,gene,
    clinical_f)[0].savefig(fig_folder+'survival_'+gene+'.pdf',
    dpi=600,format='pdf')
#Show it
av.single_gene_survival_plot(df_mut_exp_samples,gene,clinical_f)[0]

Note: Use the gene variable to set the gene for the analysis.

* 17. Identify approved drugs and general inhibitors available for neoPPI-regulated genes by executing the #Gene-drug connectivity cell ([127]Figure 13C).

Note: The pval and qval variables can be used to set statistical thresholds and conduct the analysis for a subset of clinically significant genes. To perform the analysis for all neoPPI signature genes, set pval = 1 and qval = 1. The information about the available drugs is extracted from the IUPHAR database[128]^10: [129]https://www.guidetopharmacology.org.

Note: The identified compounds will be shown in an interactive table, where each compound is directly linked to its page on the IUPHAR website for detailed exploration. For convenience, the data are also shown in a scrollable table, which is automatically saved to the Project/Tables folder ([130]Figure 13D).

#Gene-drug connectivity
#Set p-value and q-value thresholds for gene clinical significance:
pval = 0.05
qval = 0.25
#use partners to determine clinically significant signature genes for all
#neoPPIs or provide a subset of neo binding partners e.g.
#['AURKA'] or ['AURKA','RAB25','CDK4']:
clin_genes,ligands_df,gene_drugs_df,ofile = av.get_drugs(sign_gene_dict,
    pval,qval,params,cancer,partners)
display(HTML(gene_drugs_df.style.to_html()
    + "Gene-drug connectivity table was saved to " + ofile))

or pip install --upgrade <package> to update the packages. Replace <package> with the name of the outdated package. If the package is incompatible and needs a specific version number, reinstall the package using the conda install <package>=<version> or pip install <package>==<version> commands. Replace <package> with the name of the reinstalled package and <version> with the specific version number. Detailed instructions on managing packages with conda can be found at [136]https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html.

Problem 4
Processing and analyzing a large number of neoPPIs (>200) can be slow (related to step 3).

Potential solution
Split a large neoPPI set into smaller sets of <100 PPIs and analyze them individually, providing the corresponding files in step 3 of the “Determine PPI Levels in cancer samples” stage.

Problem 5
Not enough samples with the mutations of interest (related to step 1).

Potential solution
Consider a simultaneous analysis of multiple or all mutations by setting the driver_mut variable in step 1 of the “[137]Determine PPI Levels in cancer samples” stage.

#Define the mutant driver
driver_gene = 'SPOP'
driver_mut = 'All'
#driver_mut can be an array of point mutations or
#driver_mut = 'All' for all mutants
params['driver_gene']=driver_gene
params['driver_mut']=driver_mut
if driver_gene not in genes_with_expression:
    print("Gene not found")

Alternatively, consider integrating other datasets, such as GENIE or GWAS sets.

Problem 6
The error “Gene not found.” appears in step 1 of the “Determine PPI Levels in cancer samples” stage (related to step 1).

Potential solution
Check the gene name provided. Make sure that the currently approved gene symbol, not a common gene name, is used. For example, instead of driver_gene = 'LKB1', use driver_gene = 'STK11'. Refer to the HGNC database at [138]https://www.genenames.org for the standard gene symbols.

Resource availability

Lead contact
Andrey A. Ivanov (andrey.ivanov@emory.edu).

Technical contact
Andrey A. Ivanov (andrey.ivanov@emory.edu).
Materials availability
This study did not generate new unique reagents.

Data and code availability
This paper analyzes existing, publicly available data. The accession numbers for the datasets are listed in the [139]key resources table. All original code has been deposited on GitHub and is publicly available as of the date of publication. The link is provided in the [140]key resources table. Any additional information required to reanalyze the data reported in this paper is available from the [141]lead contact upon request.

Acknowledgments