Abstract Purpose This study aims to explore gene expression signatures and serum biomarkers to predict intrinsic chemoresistance in epithelial ovarian cancer (EOC). Patients and Methods Gene expression profiling data of 322 high-grade EOC cases between 2009 and 2010 in The Cancer Genome Atlas project (TCGA) were used to develop and validate gene expression signatures that could discriminate different responses to first-line platinum/paclitaxel-based treatments. A gene regulation network was then built to further identify hub genes responsible for differential gene expression between the complete response (CR) group and the progressive disease (PD) group. Further, to find more robust serum biomarkers for clinical application, we integrated our gene signatures and gene signatures reported previously to identify secretory protein-encoding genes by searching the DAVID database. In the end, gene-drug interaction network was constructed by searching Comparative Toxicogenomics Database (CTD) and literature. Results A 349-gene predictive model and an 18-gene model independent of key clinical features with high accuracy were developed for prediction of chemoresistance in EOC. Among them, ten important hub genes and six critical signaling pathways were identified to have important implications in chemotherapeutic response. Further, ten potential serum biomarkers were identified for predicting chemoresistance in EOC. Finally, we suggested some drugs for individualized treatment. Conclusion We have developed the predictive models and serum biomarkers for platinum/paclitaxel response and established the new approach to discover potential serum biomarkers from gene expression profiles. The potential drugs that target hub genes are also suggested. Introduction Epithelial ovarian cancer, which accounts for over 90% of all ovarian cancers, occurs most commonly in sixth and seventh decades of postmenopausal women and is a leading cause of cancer death among women in developed countries [35][1]. In the United States, there were approximately 21,990 new cases of ovarian cancer diagnosed and 15,460 deaths in 2011 [36][2]. Primary cytoreductive surgery followed by postoperative chemotherapy is considered the standard of care for advanced ovarian cancer [37][3]. First-line chemotherapy with platinum and paclitaxel agents is capable of achieving a complete response (CR) in the approximately 70% of patients with advanced disease [38][4]. However, about 30% patients do not respond to these drugs and even the patients who initially respond to first line chemotherapy frequently relapse and eventually become resistant to those agents. How to predict chemotherapeutic resistance and even more importantly, how to reverse the resistance are clinically challenged. One of approaches is to identify predictive biomarkers especially those biomarkers that could be also therapeutic targets. Gene expression profiling technology was used to identify chemoresistance-related biomarkers [39][5]–[40][13]. However, to date, no gene expression signature has been proved to be sufficiently effective in predicting chemoresistance in clinical practice, which is largely due to inappropriate sample inclusion and/or small sample size used in the studies. To address this challenge, we critically selected and assessed 322 serous ovarian cancer patients only with CR or progressive disease (PD) to platinum/paclitaxel-based therapy from The Cancer Genome Atlas (TCGA) project to identify gene expression signatures associated with chemoresistance. By using supervised principal components method, a 349-gene predictive model and an 18-gene de-correlated model independent of patient age, stage, debulking status or tumor stages were developed for chemoresistance prediction. Further, to identify serum chemotherapeutic biomarkers for more practically clinical application, we combined our 322-gene expression profile and four previous findings, which were selected based on the strict criteria of gene expression profile of the epithelial ovarian cancer, validation, appropriate sample size, and treatment response prediction of first line chemotherapy. We found ten serum biomarkers that have predictive value for primary response to first-line chemotherapy. In the end, several drugs that could target hub genes in our models were suggested. Our results provide a platform for the selection of the most suitable drugs for a better treatment outcome of those patients resistant to platinum/paclitaxel-based chemotherapy. Patients and Methods Ethics Statement We are free to use ovarian cancer data in TCGA by meeting its freedom-to-publish criteria: A marker paper has been published on that tumor type. The Research Ethics Committee of Peking University Cancer Hospital & Institute waived the requirement for ethical approval of this analysis because the registry is a de-identified database. Written consents were obtained from all alive patients. Patients and Tissue Samples A total of 322 patients with high-grade serous ovarian cancer were carefully selected from TCGA database (National Cancer Institute. The cancer genome atlas data portal. [41]http://tcga-data.nci.nih.gov/tcga/findArchives.htm. Accessed September 1, 2011). Detailed information of the selected patients including age at diagnosis, tumor stage, grade and debulking status are listed in [42]Table 1. All ovarian cancer specimens' information and clinical definitions were previously described [43][14]. All selected patients received a first-line platinum/paclitaxel-based treatment except that four patients' treatment regimen was unknown. The 322 samples were randomly divided into training (n = 200) and testing sets (n = 122). In the training set, 177 of 200 patients demonstrated CR and 23 of 200 patients demonstrated PD to primary platinum/paclitaxel-based therapy after surgery. In the testing set, 110 of 122 patients had CR and 12 of 122 patients had PD to platinum/paclitaxel-based treatment. Table 1. Clinicopathological Characteristics of Ovarian Cancer Patients. Characteristics Clinical Complete Responders (n = 287) Progressive Desease (n = 35) p-value Mean age, years 58.76655 59.91429 0.574[44]^a Stage(FIGO), No. of patients 0.065[45]^b II 20 0 III 228 26 IV 39 9 Grade, No. of patients 0.484[46]^b 2 35 6 3 251 29 4 1 0 Surgical debulking, No.of patients 0.001[47]^b None 70 2 ≤1 cm 138 16 1 cm∼2 cm 13 6 >2 cm 38 10 Unknown 28 1 First-line chemotherapy 1[48]^b Platinum-based Taxane (paclitaxel or docetaxel) 283 35 Unknown 4 0 [49]Open in a new tab Abbreviations: FIGO = Fe'de'ration Internationale de Gyne'cologie et Obste'trique; ^a Mann-Whitney test. ^b Fisher's exact test. Selection of Related Studies Published Previously In order to find previous studies closely related to our study, we searched online databases from 2005∼2011 with strict criteria: the same cancer subtype, validation, appropriate sample size, and treatment response prediction of first line chemotherapy. Four studies were identified [50][6], [51][8], [52][10], [53][11] and their detailed information was listed in [54]Table 2. For the validation of our signature genes, 3 datasets [55][15]–[56][17] from NCBI GEO database [57][18] were downloaded. These 3 datasets are all gene expression profiles of chemoresistant ovarian cancer cell lines ‘A2780-resistant’ and parental cell line ‘A2870’, which were independently generated by 3 different groups. There are 5, 3 and 6 replicates in datasets [58]GSE15372, [59]GSE28646, and [60]GSE33482 respectively. Genes that are closely related to platinum/paclitaxel treatment response are also searched on the CTD database. Table 2. Selection of the previously published gene signatures associated with response to platinum/Paclitaxel-based treatment (2005 to 2011). Publication Platform No.of genes Samples investigated Journal Dressman et al, 2007 Affymetrix Human U133A GeneChip 1388 119 advanced-stage serous ovarian cancers J Clin Oncol Helleman et al, 2006 18K cDNA microarrays 68 96 primary ovarian adenocarcinoma(mainly serous) Int J Cancer Jazaeri et al, 2005 Combined two cDNA microarrays contained 32,448 and 7,585 features each 85 21primary chemosensitive tumors and 24 primary chemoresistant tumors(mainly serous) Clin Cancer Res Ju et al, 2009 Affymetrix Human U133A GeneChip 100 5 primary chemosensitive tumors and 8 primary chemoresistant tumors Oncol Res [61]Open in a new tab Gene Expression Profiling Analysis Gene expression profiling data (level 3) of 322 serous ovarian cancer samples were obtained from the TCGA Data Portal. The profiling of all the samples was performed on the Human U133A Gene Chip (Affymetrix, Santa Clara, CA). Bioinformatics and Statistical Analysis The supervised principal components method was employed for generating a general predictive gene model and a gene model that is independent of key clinical features including age, stage, debulking status, and grade (de-correlated model). The analysis above was conducted using superpc package [62][19] in R 2.14.0 (R Foundation for Statistical Computing [[63]http://www.r-project.org/]). The differentially expressed genes in 3 GEO dataset were also computed in R. We used ChEA for transcription factor analysis [64][20], DAVID and Clone/Gene ID Converter for gene annotation [65][21], [66][22], and GATHER for pathway enrichment analysis [67][23]. GNCpro ([68]http://gncpro.sabiosciences.com/gncpro/gncpro.php), C3NET package [69][24] in R 2.14.0, MiMI plugin [70][25], and GeneMANIA [71][26] plugin in Cytoscape [72][27] were employed to explore and plot gene-gene interaction network and transcription factor network as well as top ten genes' interaction network and gene-drug interaction network. Standard statistical tests were used to analyze the clinical and gene expression profiling data, including the χ^2 test, fisher exact test and independent samples t-test. Significance was defined as a p value of less than 0.05. Benjamini-Hochberg multiple testing correction was used to estimate the false discovery rate in the pathway analysis [73][28]. Receiver operating characteristic (ROC) curve and Area Under the Curve (AUC) were used for signature predictability evaluation. Analyses were primarily performed using R and SPSS version 18 (SPSS Inc, Chicago, Illinois). Results Development of Predictive Models Associated With Chemotherapeutic Response To identify a gene expression signature that predicts response to chemotherapy and thus help determine the most appropriate regimen for personalized treatment, a 349-gene predictive model and an 18-gene de-correlated model were developed from the training set using superpc package in Bioconductor ([74]Fig. 1). Specifically, first we computed the univariate regression score of each feature (12042 genes) in regard to patient's treatment outcome (CR or PD). Then we performed a 10 fold cross-validation to find out the best threshold and to form a reduced data matrix consisting of only those features whose score exceeds a threshold (in our case, the best threshold is 1.26). Then we performed principal component analysis to find out the most significant gene set for prediction of treatment response. It turns out the first principal component containing 349 features is the best (p = 0.025). In order to keep the strongest power in prediction of response, we didn't do the shrinkage. Finally, the most significant principal component in a regression model was used to predict the treatment outcome. Similarly, in case of developing de-correlated model, first we fitted a linear model to key clinical features (age, stages, debulking status, and grade) as competing predictors, and then we replaced these features by the residual from this fit. In the superpc model building process, these ‘de-correlated’ features are used to explicitly look for predictors which are independent of key clinical features. We chose the threshold 1.85 and the first principal component contains 18 features with p = 0.001. Since 18-gene model is a small gene set, we didn't do the shrinkage either. Figure 1. Work flow of the study design. Figure 1 [75]Open in a new tab The 322 high-grade serous ovarian cancer cases were randomly divided in the training set (200 samples) and the testing set (122 samples). The training set was used to generate the predictive model and de-correlated model that is independent of key clinical features. Then these two models were validated using the testing set. Next we used 3 datasets from GEO database to validate signature genes in our findings. To explore potential biomarkers in serum, we combined signature genes in these two models with genes previously reported in four previous studies and queried these genes in DAVID database. Seventy-seven genes encoding secretory proteins were identified ([76]Table S3). The predictability of those genes for chemotherapeutic response was then tested individually using the data from all 322 samples. Finally, we performed a functional analysis on those signature genes and suggested some drugs that could target the hub genes in our findings. As shown in [77]Fig. 2, the 349-gene signature had an AUC = 0.826 (p<0.001) in the training set ([78]Fig. 2A) and AUC = 0.702 (p = 0.022) in the testing set ([79]Fig. 2B). The 18-gene de-correlated signature had an AUC = 0.775 (p<0.001) in the training set ([80]Fig. 2C) and AUC = 0.614 (p = 0.197) in the testing set ([81]Fig. 2D). In the 349-gene model, 30 most weighted genes were listed in [82]Table 3 (We determined the top 30 genes by the rank of importance-score of each gene in [83]Fig. S6, which was computed for each gene equal its correlation with the supervised principal component predictor. See [84]Table S1 for all genes' information in this model) and 18 genes in the 18-gene de-correlated model were listed in [85]Table 4 (important-score of each gene is listed in [86]Fig. S7). Figure 2. ROC curves of the two predictive models in the training set and the testing set. [87]Figure 2 [88]Open in a new tab (A) ROC curve of the 349-gene predictive model in training set (200 samples, AUC = 0.826; p<0.001. (B) ROC curve of the 349-gene predictive model in the testing set (122 samples, AUC = 0.702; p = 0.022). (C) ROC curve of the 18-gene de-correlated predictive model in the training set (200 samples, AUC = 0.775; p<0.001. (D) ROC curve of the 18-gene de-correlated predictive model in the testing set (122 samples, AUC = 0.614; p = 0.197). Table 3. Top 30 weighted genes in 349-gene signature. Gene Symbol Description Function C6orf120 Chromosome 6 open reading frame 120 Secreted, signal, extracellular region BLMH Bleomycin hydrolase response to toxin, response to drug ACTR6 ARP6 actin-related protein 6 homolog (yeast) Actin/actin-like USP21 Ubiquitin specific peptidase 21 positive regulation of transcription, chromatin modification NIT1 Nitrilase 1 nitrilase activity, hydrolase activity VPS72 Vacuolar protein sorting 72 homolog (S. cerevisiae) negative regulation of gene expression, chromatin regulator KIAA0859 KIAA0859 methyltransferase, tumor promoter GTF2H5 General transcription factor IIH, polypeptide 5 DNA repair,response to DNA damage stimulus PIGC Phosphatidylinositol glycan anchor biosynthesis, class C protein amino acid lipidation, GPI anchor metabolic process C1orf25 Chromosome 1 open reading frame 25 tRNA (guanine) methyltransferase activity, ion binding TBP TATA box binding protein transcription regulation,CARM1 and Regulation of the Estrogen Receptor NCSTN Nicastrin positive regulation of apoptosis, Notch signaling pathway SF3B4 Splicing factor 3b, subunit 4, 49 kDa RNA splicing factor activity, transesterification mechanism SCAMP3 Secretory carrier membrane protein 3 response to extracellular stimulus, protein transport MTX1 Metaxin 1 establishment of protein localization, intracellular transport C1orf27 Chromosome 1 open reading frame 27 oxidation reduction, metal ion binding RHOT1 Ras homolog gene family, member T1 microtubule-based transport, programmed cell death,small GTPase mediated signal transduction ZNF200 Zinc finger protein 200 regulation of transcription, transition metal ion binding SCNM1 sodium channel modifier 1 mRNA processing, transition metal ion binding DDX23 DEAD (Asp-Glu-Ala-Asp) box polypeptide 23 mrna processing, cellular macromolecular complex assembly SSR2 Signal sequence receptor, beta (translocon-associated protein beta) establishment of protein localization, intracellular transport,signal sequence binding ENSA Endosulfine alpha response to extracellular stimulus, ion channel inhibitor activity PDCD2 Programmed cell death 2 apoptosis, dna-binding, metal-binding TIMM17B Translocase of inner mitochondrial membrane 17 homolog B (yeast) protein localization NFS1 NFS1 nitrogen fixation 1 homolog (S. cerevisiae) sulfurtransferase activity,cysteine metabolic process GNPDA1 Glucosamine-6-phosphate deaminase 1 alcohol catabolic process, amino sugar catabolic process NKIRAS2 NFKB inhibitor interacting Ras-like 2 I-kappaB kinase/NF-kappaB cascade,small GTPase mediated signal transduction ENOPH1 Enolase-phosphatase 1 Cysteine and methionine metabolism TTC31 Tetratricopeptide repeat domain 31 unknown NUDT9 Nudix (nucleoside diphosphate linked moiety X)-type motif 9 purine nucleotide metabolic process, ion transport [89]Open in a new tab Table 4. The 18 signatured genes in the 18-gene de-correlated model. Gene Symbol Gene Name Function AFF1 AF4/FMR2 family, member 1 positive regulation of gene expression, Proto-oncogene AFM afamin serum transport proteins CLCA4 chloride channel accessory 4 calcium-activated chloride channel CXXC4 CXXC finger 4 chemotherapy resistance, metal-binding ESR2 estrogen receptor 2 (ER beta) negative regulation of apoptosis,metal-binding, dna-binding HSD17B2 hydroxysteroid (17-beta) dehydrogenase 2 response to chemical stimulus LMO1 LIM domain only 1 (rhombotin 1) cell proliferation,apoptosis regulation, Proto-oncogene, metal-binding MVK mevalonate kinase isoprenoid and sterol synthesis,atp-binding OPCML opioid binding protein/cell adhesion molecule-like tumor suppressor, cell adhesion PAPPA PAPPA antisense RNA (non-protein coding); pregnancy-associated plasma protein A, pappalysin 1 wound healing and angiogenesis, metal-binding PDCD1LG2 programmed cell death 1 ligand 2 regulation of immune system process PSMD4 proteasome (prosome, macropain) 26S subunit, non-ATPase, 4 mitotic cell cycle, metal-binding, rna-binding, atp-binding RNASEL ribonuclease L (2′,5′-oligoisoadenylate synthetase-dependent) tumor suppressor, metal binding SEMA4F sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4F regulation of cell growth SLC17A7 solute carrier family 17 (sodium-dependent inorganic phosphate cotransporter), member 7 ion transport, cell junction TNFSF11 tumor necrosis factor (ligand) superfamily, member 11 regulation of cell apoptosis TRIM15 tripartite motif-containing 15 metal-binding ZP2 zona pellucida glycoprotein 2 (sperm receptor) cell-cell recognition, microenvironment [90]Open in a new tab Based on these results, the 349-gene model had high sensitivity and specificity in both the training set and the testing set. The 18-gene de-correlated model had good sensitivity and specificity in the training set, but relatively low sensitivity and specificity in the testing set. Functional Analysis of the Signature Genes from the Two Predictive Models To understand the biological roles of the signature genes from the 349-gene predictive model and the 18-gene de-correlated model involved in chemoresistance, we performed three types of analyses. First, we conducted a gene-gene interaction network analysis to identify hub genes in the 349-gene model using MiMI plugin in Cytoscape ([91]Fig. 3). By defining hub genes as genes that interact with at least three other genes, ten hub genes were identified ([92]Table 5), of which UBE2I (Ubiquitin-conjugating enzyme E2I) [93][29], [94][30], CASP3 (Caspase 3, apoptosis-related cysteine peptidase) [95][31] and MAPK3 (Mitogen-activated protein kinase 3) [96][32], [97][33] are closely associated with platinum/paclitaxel-based chemotherapeutic response. Figure 3. Ten hub genes in the 349-gene signature. [98]Figure 3 [99]Open in a new tab Genes that interact with at least three other genes were selected, among which UBE2I, CASP3 and MAPK3 are important molecules that are involved in ovarian cancer progression or chemoresistance. Detailed information of these ten hub genes are listed in [100]Table 4. Table 5. Ten hub genes in the 349-gene signature. Gene Symbol Description Function UBE2I Ubiquitin-conjugating enzyme E2I mitotic cell cycle, negative regulation of gene expression SMARCE1 SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily e, member 1 negative regulation of transcription, chromatin modification CASP3 Caspase 3, apoptosis-related cysteine peptidase response to tumor necrosis factor, regulation of cell proliferation DISC1 Disrupted in schizophrenia 1 microtubule organizing center ARHGEF11 Rho guanine nucleotide exchange factor (GEF) 11 regulation of cell growth, regulation of apoptosis CENTD2 Centaurin, delta 2 promote tumor survival RHOT1 Ras homolog gene family, member T1 microtubule-based transport, programmed cell death,small GTPase mediated signal transduction ARHGAP6 Rho GTPase activating protein 6 negative regulation of cell-matrix adhesion, small GTPase mediated signal transduction ARHGEF9 Cdc42 guanine nucleotide exchange factor (GEF) 9 induction of apoptosis by extracellular signals, regulation of Ras protein signal transduction MAPK3 Mitogen-activated protein kinase 3 Ras protein signal transduction, cell cycle [101]Open in a new tab Considering that most changes in gene expression are regulated by upstream regulatory transcription factors and/or signaling genes, we then searched on ChEA for transcription factors that could regulate the 349 genes. Twenty nine transcription factors with statistical significance (p<0.01) were found ([102]Table S2). Based on different interaction types, we constructed an interaction network of these transcription factors ([103]Fig. S5).and performed a pathway enrichment analysis using these factors. We found that six pathways were most related to eight of the 29 transcription factors, including MAPK signaling pathway (p = 0.0007), TGF-beta signaling pathway (p = 0.001), cell cycle (p = 0.001), Wnt signaling pathway (p = 0.003), focal adhesion (p = 0.007), and cell proliferation (p = 0.02). These pathways may play important roles in chemoresistance in ovarian cancer. Detailed information of the transcription factors are listed in [104]Table 6 and [105]Table S2. Table 6. Six enriched pathways of 29 transcription factors derived from the 349-gene model. Annotation Transcription Factors p-value MAPK signaling pathway ELK1 JUN MYC 0.0007 TGF-beta signaling pathway E2F4 MYC 0.001 Cell cycle E2F1 E2F4 0.001 Wnt signaling pathway JUN MYC 0.003 Focal adhesion ELK1 JUN 0.007 Cell proliferation AR E2F1 E2F4 ESR1 ETS1 MYC 0.02 [106]Open in a new tab The gene-gene interaction network of the 18-gene model was built by C3NET in Bioconductor ([107]Fig. S6A), in which 17 genes have physical interactions with the other genes. To gain a more complete picture of the 18 genes and their interacting neighbors, we constructed a network using GNCpro ([108]Fig. S6B), in which 11 of 18 genes have interactions with the other genes and PAPPA, TNFSF11, and ESR2 are important hub genes in the 18-gene model. As shown in [109]Fig. S6B, TNFSF11 up-regulates critical transcription factors such as JUN, SRC and AKT1, ESR2 has physical interactions with SP1, AKT1 and SRC, and PAPPA modifies IGFBP4 [110][34]–[111][40]. Hunting for Potential Serum Biomarkers for Chemotherapeutic Response Since serum biomarker is most conveniently detected in clinics, we sought to set up a new way to identify potential serum biomarkers for chemotherapeutic response from gene expression profiles by targeting genes encoding secretory proteins. We integrated genes in 349-gene model and 18-gene model and those genes from the four previous studies and searched for genes encoding the secretory proteins in DAVID database ([112]Fig. 1). As a result, 77 genes were identified to encode the secretory proteins that could be secreted into the serum ([113]Table S3). We then tested the predictive values of these genes individually for chemotherapeutic response using the 322 gene expression profiling data and computed the AUC value of these genes ([114]Table S4). Top ten genes with highest AUC values ([115]Table 7) were found to have the ability to discriminate the CR group from the PD group (p<0.05), of which AFM has been reported to be an independent serum biomarker of CA125 for the prediction of ovarian cancer progression by comparative proteomics analysis [116][41], [117][42]. Table 7. Area Under the Curve (AUC) of Top Ten genes (p<0.05) that encode secretory proteins. Gene AUC Std. Error[118]^a Asymptotic Sig.[119]^b Asymptotic 95% Confidence Interval Lower Bound Upper Bound CLPS 0.637 0.041 0.008 0.556 0.718 C1orf56 0.636 0.048 0.009 0.542 0.730 AFM 0.630 0.055 0.012 0.523 0.737 GIP 0.618 0.052 0.022 0.515 0.721 PRG4 0.618 0.052 0.023 0.515 0.721 CPA2 0.617 0.046 0.024 0.527 0.707 FOLR1 0.613 0.047 0.029 0.521 0.705 IL1RL1 0.608 0.046 0.037 0.518 0.699 COMP 0.604 0.055 0.045 0.496 0.711 C6orf120 0.602 0.050 0.048 0.504 0.700 [120]Open in a new tab Abbreviations: Std.: standard. Sig: significance. ^a Under the nonparametric assumption; ^b Null hypothesis: true area = 0.5. Functional Analysis of Potential Serum Therapeutic Biomarkers To further investigate the roles of the top ten serum biomarkers in chemotherapeutic response, we constructed a gene/protein interaction network using GNCpro. As shown in [121]Fig. 4A, IL1RL1, PRG4, AFM, GIP and COMP appeared to be critical hub genes since they could interact with the genes known to be involved in chemoresistance. For example, AFM seems to interact indirectly with MUC1, ESR1, and BRCA1 that are known to contribute to the resistance to the platinum/paclitaxel-based treatment ([122]Fig. 4B). Figure 4. Hub genes and gene-gene interaction networks of top ten secretory protein-encoding genes. [123]Figure 4 [124]Open in a new tab (A) Hub genes and neighboring genes of top ten secretory protein-encoding genes. (B) AFM was exemplified to show potential mechanisms of the top ten secretory protein-encoding genes probably involving in chemoresistance. Further validation of genes in our signature, hub genes and potential serum biomarkers We further validated genes in our signature, hub genes and potential serum biomarkers in 3 more ways. First, we use 3 different datasets ([125]GSE15372, [126]GSE28646 and [127]GSE33482) from NCBI GEO database to validate our data. These 3 datasets are all gene expression profiles of chemoresistant ovarian cancer cell lines ‘A2780-resistant’ and parental cell line ‘A2870’, which were generated by 3 different groups. There are 5, 3 and 6 replicates in datasets [128]GSE15372, [129]GSE28646 and [130]GSE33482, respectively. Differentially expressed genes in these 3 dataset were computed and displayed in [131]Table S8. We use the Venn diagram to show the overlap between our signature genes and those differentially expressed genes ([132]Fig. 5). 133 genes of 349-gene model, 9 genes of 18-gene model, 7 of 13 hub genes and 5 of 10 potential serum biomarkers are overlapped with differentially expressed genes from those 3 datasets ([133]Fig. 5A, 5B, 5C, and 5D, respectively). Figure 5. Venn diagram showing the overlap between our signatures genes and 3 external datasets from NCBI GEO database. [134]Figure 5 [135]Open in a new tab The Venn diagram shows how much genes in the 349-gene model (A), 18-gene model (B), hub genes (C), and top 10 serum biomarkers (D) are overlapped with 3 external datasets [136]GSE15372, [137]GSE28646 and [138]GSE33482. Meanwhile, we searched CTD database for genes that could associated with sensitivity to platinum/paclitaxel-based drugs ([139]Table S5), then the overlap between our signature genes and searched results was also presented ([140]Fig. S4C). 30 of 349-gene model, 3 of 18-gene model and 4 of 13 hub genes are overlapped with searched results. In addition, we presented the overlap between our signature genes and those signature genes from 4 literatures ([141]Fig. S3 and [142]S4B). 16 genes of 349-gene model, 0 genes of 18-gene model, 1 of 13 hub genes, and 2 of 10 potential serum biomarkers are overlapped with gene signatures from 4 previous publications ([143]Table S9). However, as we can see from [144]Figure S4D, signature genes from literatures in [145]Table 2 also shown little overlap among them. This may due to their relatively small sample size, different standard of sample selection, or different methods to develop predictive models. Construction of Gene-drug interaction network and gene targeting drug suggestion Since we already got key transcription factors and hub genes, we might want to know which drugs could target these genes in order to reverse the resistance to platinum/paclitaxel-based treatment. By searching on The CTD and NCBI Pubmed Database, several drugs and specific inhibitors were found to interact with our key transcription factors and hub genes. By combining these results, we build a key transcription factor-drug interaction network ([146]Fig. S7) and a hub gene-chemical interaction network ([147]Fig. 6), which not only show us which chemicals can inhibit these key transcription factors and hub genes, but also tell us how these genes could increase or decrease the susceptibility of chemotherapeutic drugs. For example, ESR2 could increase the patient's susceptibility to Cisplatin, Etoposide and Raloxifene, while Gefitinib could increase the expression of ESR2. MAPK3 could decrease the patient's susceptibility to Doxorubicin, Dacarbazine and Estrogens, while Gefitinib and Cisplatin could decrease the expression of MAPK3, which suggested Gefitinib might be a good drug for platinum/paclitaxel-resistant patients. Figure 6. Hub gene-drug interaction network. [148]Figure 6 [149]Open in a new tab The hub gene-drug interaction network shows us how these genes and drugs could interact with each other. For example, ESR2 could increase the patient's susceptibility to Cisplatin, Etoposide and Raloxifene, while Gefitinib could increase the expression of ESR2. Discussion Prediction of chemotherapeutic response is always a challenging clinical task. Many efforts have been conducted to find gene expression signatures to discriminate different responders using the high-throughput technology. However, none of those gene signatures is formally used in clinics. Possible reasons might be lacking of critical sample selection or small sample size. Our study and 4 previous publications listed in [150]Table 2 are all aimed to find out gene signatures for predicting platinum-based treatment outcome of serous ovarian cancer. And there are several genes overlapped between our signature genes and those signature genes from previous publications. However, there are several differences among ours and those 4 publications. The statistical methods for developing models we use (Dressman et al. shotgun stochastic search [151][10], Ju et al. manually select top differentially expressed genes [152][11], Helleman et al. BRB & SAM [153][8], and Jazaeri et al. BRB [154][6]), sample size, and selection standard are not the same. Our model size (349-gene signature) is more appropriate compared with Dressman et al. (1704 probes representing 1388 genes). Although Ju et al., Helleman et al. and Jazaeri et al have smaller model size (100, 68, and 85 respectively), they either had small training set or merely used top differentially expressed genes as predictors. In conclusion, our signatures are novel compared to the 4 previous publications ([155]Table 2) in 4 aspects: 1) We have a bigger sample size (322 compared with 119, 96, 45 and 13), which are more convincible in developing predictive models. 2) We have a better sample selection (just using CR and PD samples and mainly focusing on platinum/paclitaxel resistance). 3) In order to find out which genes are truly related to treatment outcomes and to exclude the potential bias of those key clinical features, we developed a de-correlated model which was novel. 4) We employed supervised component analysis to develop 349-gene signature and 18-gene signature, which is different from that just using highest differentially expressed genes (Ju et al.). To further elucidate the biological contribution of those signature genes to treatment outcomes, we built the regulatory networks to identify critical hub genes and signaling pathways differentially present in the CR and PD groups. Among the ten hub genes identified in the 349-gene model, UBE2I is correlated with histological subtypes of EOC [156][43], CASP3 is the most important marker of apoptosis [157][44], and MAPK3 plays a crucial role in EOC progression. The other genes (SMARCE1, DISC1, CENTD2, RHOT1, ARHGAP6, ARHGEF9 and ARHGEF11) are also involved in cancer progression or chemoresistance [158][45]–[159][49]. Since gene expression profiles are noisy and it is hard to find most significant pathways by doing the pathway enrichment directly, we set up a new strategy to solve this problem. Our strategy is based on two aspects: 1) most gene expression changes are regulated by transcription factors and 2) the action of transcription factors is relatively less noisy. By this approach, we found that the genes in the 349-gene model are regulated by 29 transcription factors that are enriched in the six critical pathways, including MAPK, TGF beta, Wnt, cell cycle, Focal adhesion, and cell proliferation signaling pathways. The association of these pathways with the response to chemotherapy or cancer progression has been reported in previous studies [160][50]–[161][58]. The hub genes, transcription factors and critical signaling pathways we identified could be potential targets for drug design after further validation. One important feature of an ideal biomarker is easy to detect. We thus developed a new approach to screen chemotherapeutic biomarkers that could be detected in the serum. We found ten genes encoding secretory proteins that have the ability to separate CR from PD and thus could be potential serum biomarkers for predicting the response to the platinum/paclitaxel-based treatment in EOC. AFM identified in the study was reported to be an independent diagnostic marker of CA125 [162][42], which partly supports our strategy and findings. Since CA125 is the conventional biomarker for ovarian cancer progression and chemotherapeutic response, the addition of AFM to CA125 could thus improve the prognostic power in EOC. The construction of gene-drug interaction network gave us more hints on how to choose the right drugs for individualized treatment. As shown in results section, Gefitinib might be an appropriate drug for the treatment of platinum/paclitaxel-resistant patients by decreasing the expression of MAPK3 and increasing the expression of ESR2. Although our findings are encouraging, there are still some questions unanswered. For instance, experimental validations are still needed to explore the specific roles of those hub genes, transcription factors and signaling pathways in chemoresistance using ovarian cancer cell lines and animal. In addition, we need to further test those serum biomarkers using serum samples of ovarian cancer patients. Summarily, we developed two predictive models that yield insights into the molecular mechanisms of chemoresistance. Based on the models, we built an upstream regulatory network in which several critical transcription factors and signaling pathways may play crucial roles in chemoresistance in EOC. Further, by integrating with published findings, we found ten potential serum biomarkers that could be used in clinical practice. In addition, gene-drug interaction network was constructed, which not only shows us which drugs can inhibit these key transcription factors and hub genes, but also tell us how these genes could increase or decrease the susceptibility of chemotherapeutic drugs. This is a good beginning for us to select the most suitable drugs for a better treatment outcome of those patients resistant to platinum/paclitaxel-based chemotherapy. Supporting Information Table S1 Gene list of the 349-gene signature. (XLSX) [163]Click here for additional data file.^ (26.6KB, xlsx) Table S2 Potential Transcriptional Factors that regulate genes in the 349-gene signature. (XLSX) [164]Click here for additional data file.^ (10.4KB, xlsx) Table S3 77 genes that encode secretory proteins related with platinum/paclitaxel-based treatment(integrated our results with 4 previous studies). (XLSX) [165]Click here for additional data file.^ (15.1KB, xlsx) Table S4 Area Under the Curve of genes that encode secretory proteins. (XLSX) [166]Click here for additional data file.^ (15.5KB, xlsx) Table S5 Genes interact with platinum or paclitaxel searched from CTD database. (XLSX) [167]Click here for additional data file.^ (322.3KB, xlsx) Table S6 Importance-scores of 349 genes in predictive regression model. (XLSX) [168]Click here for additional data file.^ (22.4KB, xlsx) Table S7 Importance-scores of 18 genes in de-correlated model. (XLSX) [169]Click here for additional data file.^ (9.5KB, xlsx) Table S8 Differentially expressed genes in [170]GSE15372 & [171]GSE33482 & [172]GSE28646. (XLSX) [173]Click here for additional data file.^ (183.5KB, xlsx) Table S9 Signature genes from 4 previous publications which were listed in [174]Table 2 . (XLSX) [175]Click here for additional data file.^ (31.9KB, xlsx) Figure S1 Heat map of 349-gene signature against 322 patients. This diagram shows the heat map of 349-gene signature against 322 patients, in which rows represent different genes in 349- gene signature and columns represent different patients. The blue bar above the heat map represents CR and green bar represents PD. (TIFF) [176]Click here for additional data file.^ (255KB, tiff) Figure S2 Heat map of 18-gene signature against 322 patients. This diagram shows the heat map of 18-gene signature against 322 patients, where rows represent different genes in 18-gene signature and columns represent different patients. The blue bar above the heat map represents CR and green bar represents PD. (TIFF) [177]Click here for additional data file.^ (318.7KB, tiff) Figure S3 The Venn diagram showing the overlap between our signatures and genes from previous publications. The Venn diagram shows how much genes in the 349-gene model (A), 18-gene model (B), hub genes (C), top 10 serum biomarkers (D) are overlapped with 3 previous publications (Dressman et al., Ju et al. and Jazaeri et al.). (TIF) [178]Click here for additional data file.^ (1,023KB, tif) Figure S4 The Venn diagram showing the overlap among our signatures and other datasets & publications. (A) The Venn diagram shows genes in 18-gene signature are all belong to genes in 349-gene signature. (B) The Venn diagram shows there are no overlap between genes from Helleman et al. and genes in our findings (349-gene signature, 18-gene signature and hub genes). (TIF) [179]Click here for additional data file.^ (927.8KB, tif) Figure S5 Transcription factor Interaction network and enriched pathways derived from the 349-gene model. Eight transcription factors circled in dot yellow line are enriched in TGF-beta, MAPK and Wnt signaling pathway (red arrow). (TIF) [180]Click here for additional data file.^ (989.3KB, tif) Figure S6 Gene-gene interaction network of 17 genes in the 18-gene model and Hub gene interaction network of 18 signature genes. (A) Gene-gene interaction network of 17 in the 18-gene model analyzed by C3NET. (B) Hub genes and neighboring genes of the 18 signature genes. (TIF) [181]Click here for additional data file.^ (732.5KB, tif) Figure S7 Transcription factor- drug Interaction network. This diagram shows how Transcription factors and drug are interacted. For example, MYC could decrease patient's susceptibility to Cisplatin, Fluorouracil and Doxorubicin, whereas 10058-F4 could decrease the expression of MYC. (TIF) [182]Click here for additional data file.^ (557.4KB, tif) Acknowledgments