Abstract
Background
Nearly one-third of patients with serous ovarian cancer (OVCA) will not respond to initial treatment with surgery and chemotherapy and will die within one year of diagnosis. If patients who are unlikely to respond to current standard therapy can be identified up front, enhanced tumor analyses and alternative treatment regimens could potentially be offered. Using The Cancer Genome Atlas (TCGA) serous OVCA database, we previously identified a robust 422-gene molecular signature associated with chemo-response. Our objective was to test whether this signature is an accurate and sensitive predictor of chemo-response in serous OVCA.
Methods
We first constructed models to predict chemo-response using our previously described 422-gene signature, which was associated with response to treatment in serous OVCA. The performance of all prediction models was measured with the area under the curve (AUC, a measure of the model's accuracy) and its confidence interval (CI). To optimize the prediction process, we determined which elements of the signature contributed most to chemo-response prediction. All prediction models were replicated and validated using six publicly available independent gene expression datasets.
Results
The 422-gene signature prediction models predicted chemo-response with AUCs of ~70 %. Optimization of the prediction models identified the 34 genes most important for chemo-response prediction. These 34-gene models performed better, with AUCs approaching 80 %. Both the 422-gene and the 34-gene prediction models were replicated and validated in six independent datasets.
Conclusions
These prediction models serve as the foundation for the future development and implementation of a diagnostic tool to predict response to chemotherapy in patients with serous OVCA.
Electronic supplementary material
The online version of this article (doi:10.1186/s12943-016-0548-9) contains supplementary material, which is available to authorized users.
Keywords: Ovarian cancer, Chemo-response, Prediction model, Data integration, Individualized treatment
Background
Epithelial ovarian cancer (OVCA) has the highest mortality rate of all gynecologic cancers [1]. The most common histological subtype of OVCA is serous [2]. The majority of patients present with advanced disease at diagnosis, and while some benefit from treatment combining cytoreductive surgery and chemotherapy [3], nearly a third of patients with serous OVCA do not respond to this initial treatment and die from disease within one year of diagnosis [1, 4]. Despite significant research directed at understanding the biology of OVCA [5, 6], outcomes remain poor for most patients, particularly those who do not respond to initial chemotherapy. A major limitation is the lack of validated biomarkers that can effectively predict response to chemotherapy [7, 8]. Previous attempts to define predictors of response to treatment have been limited by the number of patients included, the mixture of histological types and stages, and the lack of validation in independent datasets [9, 10]. In contrast, gene signatures have been identified in breast cancer that accurately predict recurrence [11] and chemotherapeutic response [12, 13]. These signatures were subsequently validated in independent clinical studies [13–15]. For example, one of these signatures, OncotypeDx, used 600 cases to create an association model and validated it in an additional 400 cases [11, 12]. Currently, there is no comparable clinically available test for OVCA that identifies which patients will respond to initial treatment [16]. In recently published studies using The Cancer Genome Atlas (TCGA) serous OVCA database [17], we identified a robust molecular signature associated with chemo-response by integrating publicly available biological and clinical data from 450 patients with serous OVCA.
This yielded a 422-gene molecular signature that was replicated in five independent gene expression experiments [18]. The data used to identify this signature included gene expression, gene copy number alterations, gene mutations, DNA methylation, and miRNA profiles, all of which are available in the TCGA serous OVCA dataset. However, the strong association between the 422-gene signature and chemo-response found in our previous work does not imply that the signature is also predictive of chemo-response [9]. Therefore, the main objective of the present study was to determine the performance of the 422-gene signature as a predictor of chemo-response in serous OVCA. We also optimized the prediction models by determining which elements of the signature contributed most to prediction. In this process, we identified a smaller set of 34 genes (the “optimized” set) from the original 422-gene signature that are predictive of response and that replicated the area under the curve (AUC) of the original complete gene set. Our data demonstrate that both the complete and the optimized models are predictive of outcome, and both have now been replicated and validated in independent datasets.
Methods
Patients and data collection for prediction model
All data collection and processing, including the consenting process, were performed after approval by all local institutional review boards and in accordance with the TCGA Human Subjects Protection and Data Access Policies adopted by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). TCGA patients with serous OVCA were used to create a prediction model in the testing dataset and were divided into two categories: complete responders (CR) and incomplete responders (IR). Clinical complete response (CR) was defined as progression-free survival 6 months after the first platinum-based treatment.
In patients with incomplete response (IR), the disease either did not respond or progressed during treatment (refractory), or recurred within 6 months of treatment completion (resistant) [4, 19]. Patients defined as IR in our study are also clinically referred to as ‘platinum-resistant’ [20], with direct implications for treatment and prognosis. In the TCGA dataset, 292 patients were classified as CR and 158 as IR. Table 1 describes the clinical characteristics of these patients. Chemo-response was the most significant prognostic factor for survival in multivariable Cox proportional hazards regression (p < 10^−14), and patients with IR had a significantly shorter median survival than CR patients (Fig. 1) [18].

Table 1. Clinical data from TCGA patients
                                     CR     IR     p-value*
Number of patients                   292    158
Age (avg., years)                    60     59.6   N.S.
Grade                                              N.S.
  Grade 1                            4      1
  Grade 2                            35     18
  Grade 3                            246    135
Stage                                              p < 0.01
  Stage I                            10     3
  Stage II                           19     1
  Stage III                          224    123
  Stage IV                           39     29
Surgical outcome                                   N.S.
  Optimal (<1 cm residual)           207    92
  Suboptimal (>1 cm residual)        52     57
Optimal treatment                                  p < 0.001
  Optimal (surgery + 6 cycles)       179    66
  Suboptimal                         113    92
*Multivariable analysis of TCGA clinical variables: only FIGO stage and optimal treatment (optimal surgery AND 6 cycles of platinum-based chemotherapy) were independently associated with chemo-response in serous OVCA.

Fig. 1. Survivorship by chemo-response in serous OVCA TCGA data. Chemo-response was the most significant factor in the multivariable analysis for survival. Complete responders (CR) have a median survival 2 years longer than incomplete responders (IR).

Gene signature and prediction analysis
We previously identified a 422-gene signature that is robustly associated with chemo-response [18]. To assess the predictive performance of this signature, we applied the ‘Classification for MicroArrays’ (CMA) package to the TCGA serous OVCA data.
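Before any classifier is fit, each patient needs a binary CR/IR label and an optimal-treatment indicator, following the definitions given above. The Python sketch below encodes those rules. The `Patient` fields are hypothetical (they are not TCGA variable names), and two details are assumptions of the sketch rather than stated rules: patients followed for less than 6 months without recurrence are left unlabeled, and "completed six cycles" is read as six or more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical per-patient record; field names are illustrative only."""
    progressed_during_treatment: bool       # refractory disease
    months_to_recurrence: Optional[float]   # after treatment completion; None = none observed
    follow_up_months: float                 # follow-up after treatment completion
    residual_disease_cm: float              # largest residual tumor after debulking
    platinum_cycles: int                    # platinum-based cycles completed


def chemo_response(p: Patient, cutoff_months: float = 6.0) -> Optional[str]:
    """Label 'CR' or 'IR' with the 6-month rule; None if it cannot be determined."""
    if p.progressed_during_treatment:
        return "IR"  # refractory
    if p.months_to_recurrence is not None:
        # Recurrence within 6 months of completion counts as resistant (IR).
        return "IR" if p.months_to_recurrence < cutoff_months else "CR"
    if p.follow_up_months < cutoff_months:
        return None  # assumption: too little follow-up to call a complete response
    return "CR"


def optimal_treatment(p: Patient) -> bool:
    """Optimal debulking (<1 cm residual) AND six completed platinum cycles."""
    return p.residual_disease_cm < 1.0 and p.platinum_cycles >= 6


patients = [
    Patient(False, None, 24.0, 0.5, 6),  # no recurrence, long follow-up
    Patient(True, None, 3.0, 2.0, 3),    # progressed during treatment
    Patient(False, 4.0, 10.0, 0.5, 6),   # recurred 4 months after completion
]
for p in patients:
    print(chemo_response(p), optimal_treatment(p))
```

Keeping unlabeled patients as `None` rather than forcing them into CR avoids counting short follow-up as a response.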
CMA is a statistical tool, implemented in the R environment for statistical computing (www.r-project.org) [22], that is designed to construct and evaluate classifiers (or prediction models) derived from microarray experiments using a large number of standard methods [21]. Of the methods available in the CMA package [21], nine consistently handled missing values and small sample sizes and computed AUCs without errors: random forest [23], least absolute shrinkage and selection operator (Lasso) [24], Elastic Net [24], prediction analysis for microarrays (PAM) [25], diagonal discriminant analysis [26], partial least squares (PLS) [27], PLS - random forest [27], penalized logistic regression [28], and PLS - logistic regression [27]. We used these nine methods for the rest of the study to compare predictive performance across all datasets, for both the complete and the optimized models. The remaining methods were excluded: linear and quadratic shrinkage could not compute AUCs; Fisher's discriminant analysis could not handle more variables than subjects; neural networks were unstable and difficult to tune and interpret; and k-nearest neighbors and support vector machines could not be tuned, nor could their AUCs be evaluated. Initially, all 422 genes associated with chemo-response in serous ovarian cancer [18] were used to construct prediction models, termed 422-gene prediction models. To assess how accurately the groups (CR and IR) were predicted, and to avoid over-fitting, cross-validation was used (internal validation of the classifier) [29]. Predictive performance was computed with correction for TCGA batch effects and with adjustment for the two other variables independently associated with chemo-response in serous OVCA (FIGO stage and optimal treatment; Table 1) [10]. The sensitivity, specificity, and AUC of each predictor/classifier were also calculated.
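The cross-validated evaluation described above can be illustrated outside CMA. The sketch below is a Python/scikit-learn analogue, not the CMA workflow used in the study. Its assumptions: synthetic expression data with the TCGA class sizes (292 CR, 158 IR) and 422 features; an L1-penalized (Lasso) logistic regression standing in for one of the nine methods, with an arbitrary, untuned penalty; a 0.5 probability threshold for sensitivity and specificity (IR treated as the positive class); and a percentile bootstrap for the 95 % CI of the AUC, which is not necessarily how the study derived its intervals. Batch and covariate adjustment are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in for the expression matrix (NOT TCGA data):
# 292 CR (label 0) and 158 IR (label 1) patients x 422 genes.
n_cr, n_ir, n_genes = 292, 158, 422
y = np.r_[np.zeros(n_cr, dtype=int), np.ones(n_ir, dtype=int)]
X = rng.normal(size=(n_cr + n_ir, n_genes))
X[y == 1, :20] += 0.5  # make 20 genes weakly informative (illustrative)

# Lasso-penalized logistic regression; C=0.1 is an untuned, arbitrary choice.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0)

# 5-fold stratified cross-validation: each patient is scored by a model
# that did not see that patient during fitting (internal validation).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
p_ir = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

auc = roc_auc_score(y, p_ir)
fpr, tpr, thresholds = roc_curve(y, p_ir)  # points for plotting the ROC curve

# Sensitivity, specificity and misclassification rate at a 0.5 threshold.
y_hat = (p_ir >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y, y_hat, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
misclassification = (fp + fn) / len(y)

# Percentile bootstrap 95% CI for the cross-validated AUC.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))
    if y[idx].min() == y[idx].max():
        continue  # AUC is undefined if a resample contains only one class
    boot.append(roc_auc_score(y[idx], p_ir[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"AUC={auc:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), "
      f"sens={sensitivity:.2f}, spec={specificity:.2f}, error={misclassification:.2f}")
```

Resampling the out-of-fold scores is cheap, but it ignores fold-to-fold variability, so such intervals may be somewhat narrow.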
For each AUC, we also computed a 95 % confidence interval (CI) to compare different models and classification methods. To illustrate the performance of the predictor in classifying chemo-response, a receiver operating characteristic (ROC) curve was generated. These analyses also allowed us to compare the performance of the prediction models across independent serous OVCA datasets and to assess how consistently the models predicted chemo-response, based on sensitivity, specificity, misclassification rate, and AUC. Finally, we identified which patients were more likely to be misclassified and the clinical characteristics associated with misclassification.
Selection of most informative genes of prediction models
We focused on the selection of informative genes because the composition of a prediction model is paramount to its performance [9]. The selection process was performed with all methods available in the software package: two-sample t-test; Welch modification of the t-test; Wilcoxon rank sum test; F-test; Kruskal-Wallis test; “moderated” t and F tests, using the R package ‘limma’; one-step recursive feature elimination (RFE) in combination with linear support vector machines (SVM); random forest variable importance measure; least absolute shrinkage and selection operator (Lasso); elastic net regularized regression; component-wise boosting; and the ad-hoc “Golub” criterion [21]. Using the gene selection tool, each gene was ranked according to its relative importance in the prediction models. Genes were then ordered by rank and by their relative ‘weight’ in the prediction process, and prediction models were built using only those genes that had been ranked at least once (one ‘hit’) by each method.
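The ‘hit’-based filtering described above admits more than one reading. The Python sketch below assumes one of them: each selection method produces several top-ranked gene lists (for example, one per cross-validation fold), a gene scores a hit from a method if it appears in any of that method's lists, and only genes hit by every method are kept, ordered by their total number of hits as a simple proxy for relative weight. The method names and gene identifiers are hypothetical toy values.

```python
from collections import Counter
from typing import Dict, List


def consensus_genes(rankings: Dict[str, List[List[str]]]) -> List[str]:
    """Return genes hit at least once by every selection method.

    `rankings` maps a (hypothetical) method name to a list of top-ranked
    gene lists, e.g. one list per cross-validation fold.
    """
    if not rankings:
        return []
    # For each method, the set of genes it ranked at least once.
    hit_sets = [{g for ranked in lists for g in ranked} for lists in rankings.values()]
    kept = set.intersection(*hit_sets)
    # Total hits across all methods and lists, used only to order the result.
    total_hits = Counter(g for lists in rankings.values() for ranked in lists for g in ranked)
    return sorted(kept, key=lambda g: (-total_hits[g], g))


# Toy example with hypothetical method names and gene identifiers.
rankings = {
    "t_test": [["G1", "G2", "G3"], ["G1", "G4", "G2"]],
    "lasso": [["G2", "G1", "G5"], ["G1", "G2", "G6"]],
    "random_forest": [["G1", "G7", "G2"], ["G3", "G1", "G8"]],
}
print(consensus_genes(rankings))  # ['G1', 'G2']
```

On the toy input only G1 and G2 are hit by all three methods; G1 comes first because it has six hits in total versus five for G2.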
These models, containing only the 34 selected, most informative genes, were termed 34-gene prediction models and constitute the optimized gene set, as opposed to the complete gene set.
Data retrieval for replication and validation analyses
Validation and replication of the prediction models were performed using datasets from the Gene Expression Omnibus (GEO) and the European Bioinformatics Institute, part of the European Molecular Biology Laboratory (EMBL-EBI), that contain gene expression data paired with treatment response data (Table 2). Datasets were downloaded in their raw state to maximize platform and annotation information, and the data were then normalized. Response-to-therapy variables were coded as CR or IR to make outcomes comparable with TCGA. Patients who underwent optimal debulking (largest residual disease <1 cm) and completed six cycles of platinum-based therapy were considered to have received ‘optimal treatment’; lesser treatments were considered suboptimal. This coding was necessary because optimal treatment and FIGO stage were also significantly and independently associated with chemo-response in TCGA (Table 1). Both clinical variables were collected when available and assessed for association with chemo-response in these new datasets, so that they could be accounted for in the prediction analysis. Where batch information was available, batch effects were also corrected to reduce bias, as in the initial prediction model built on the TCGA dataset.
Table 2. Publicly available GEO datasets of patients with serous OVCA used for validation/replication of prediction models
Repositories    Number of patients    Study names    References