Abstract

Simple Summary

   Tumor classification based on genomic features may be able to identify
   new clinically relevant subtypes and disease characteristics. By
   integrating multiple levels of genetic and epigenetic data, distinct
   clusters can be defined among tumors with similar histology and
   previously unknown distinguishing features. Here, we aimed to find
   prognostic subtypes among high-grade serous ovarian tumors by
   integrating their transcriptomic and methylomic features. Feature
   selection was applied to retain only those features most significantly
   correlated with disease recurrence. By using consensus clustering and
   machine learning algorithms, we describe four groups of tumors
   characterized by unique genetic and epigenetic properties which were
   associated with significant differences in clinical outcomes. By using
   both techniques in succession, we uncovered both differential features
   between groups and defining ontologies therein. One such group was
   associated with stromal and immune diverse cell populations and was
   associated with poor clinical outcomes. Our findings identify unique
   contributors to disease recurrence in high-grade serous ovarian cancer.

Abstract

   High-grade serous ovarian cancer (HGSOC) is characterized by a complex
   genomic landscape, with both genetic and epigenetic diversity
   contributing to its pathogenesis, disease course, and response to
   treatment. To better understand the association between genomic
   features and response to treatment among 370 patients with newly
   diagnosed HGSOC, we utilized multi-omic data and semi-biased clustering
   of HGSOC specimens profiled by TCGA. A Cox regression model was
   deployed to select model input features based on the influence on
   disease recurrence. Among the features most significantly correlated
   with recurrence were the promotor-associated probes for the NFRKB and
   DPT genes and the TREML1 gene. Using 1467 transcriptomic and methylomic
   features as input to consensus clustering, we identified four distinct
   tumor clusters—three of which had noteworthy differences in treatment
   response and time to disease recurrence. Each cluster had unique
   divergence in differential analyses and distinctly enriched pathways
   therein. Differences in predicted stromal and immune cell-type
   composition were also observed, with an immune-suppressive phenotype
   specific to one cluster, which associated with short time to disease
   recurrence. Our model features were additionally used as a neural
   network input layer to validate the previously defined clusters with
   high prediction accuracy (91.3%). Overall, our approach highlights an
   integrated data utilization workflow from tumor-derived samples, which
   can be used to uncover novel drivers of clinical outcomes.

   Keywords: high-grade serous ovarian cancer (HGSOC), multi-omic
   analysis, data integration, consensus clustering, neural network

1. Introduction

   Ovarian cancer remains among the leading causes of cancer death among
   women, with an overall 1 in 78 mortality risk [[30]1]. High-grade
   serous ovarian cancer (HGSOC) is the most prevalent subtype, with a
   5-year survival rate of approximately 47.4% [[31]2]. HGSOC is thought
   to originate from the fallopian tube epithelium and harbors—almost
   universally—a p53 mutated signature. These tumors exhibit heterogeneous
   gene expression and methylation profiles and a diverse mutational
   landscape [[32]3]. The first-line treatment is a combination of
   platinum-based chemotherapy, with ~85% of patients achieving a complete
   or partial response to therapy [[33]4]. However, recurrence rates are
   high, with a median platinum-free interval (PFI) of 18 months [[34]3].
   Patients whose recurrence occurs earlier than 6 months are deemed
   “platinum-resistant”, and those recurring after are
   “platinum-sensitive”. Although other second-line therapies, such as
   liposomal doxorubicin, bevacizumab, and gemcitabine, have demonstrated
   clinical benefit, others—such as PD1/PD-L1-targeted
   immunotherapies—have yielded only marginal responses
   [[35]5,[36]6,[37]7,[38]8].

   The integrated molecular analysis of HGSOC in The Cancer Genome Atlas
   (TCGA) yielded a classification of tumors based on histology and
   genomic features (mutation, copy number, gene expression) [[39]9]. This
   analysis found ubiquitous TP53 mutations, a defined subset of tumors
   displaying BRCA1/2 germline mutations, and characteristic DNA copy
   number alterations and promoter methylation events. Non-negative matrix
   factorization (NMF) consensus clustering was conducted using mRNA
   expression, miRNA expression, DNA copy number, and promoter methylation
   data as input. Four subtypes were then defined: Mesenchymal,
   Proliferative, Immunoreactive, and Differentiated, building off a prior
   analysis which used microarray in isolation [[40]9,[41]10]. The
   Mesenchymal subtype demonstrated poor overall survival, increased
   expression of genes related to stromal components, and active EMT- and
   angiogenesis-related genes. The Proliferative subtype showed increased
   expression of markers of proliferation (e.g., MCM2, PCNA), as well as
   increased WNT/β-catenin signaling [[42]9,[43]10]. Tumors with an
   Immunoreactive subtype were associated with improved overall survival
   and upregulated genes relating to immune cell activation (CXCL11,
   CXCL12 CXCR3, etc.). The Differentiated subtype displayed high
   expression of markers of epithelial differentiations, such as MUC1 and
   MUC16. Other studies have attempted similar “omic” classifications;
   however, the prognostic and therapeutic relevance of molecular subtypes
   remains variable between studies and has not drastically influenced
   treatment decisions [[44]9,[45]10,[46]11,[47]12].

   Advancements in unsupervised learning and multi-omic data integration
   are allowing for more complex methods of tumor subtyping
   [[48]13,[49]14]. These newer subtyping methods can capture more disease
   context with each added layer of data but remain untested for clinical
   utility. Here, we present a novel workflow using integrated methylomic
   and transcriptomic data from TCGA, which we use to cluster HGSOC tumors
   by using “omic” features selected based on correlation with disease
   recurrence. We characterize the genomic features of these newly defined
   clusters through downstream differential analyses and use machine
   learning techniques to uncover novel subtypes with distinct phenotypes
   and clinically relevant signatures.

2. Materials and Methods

2.1. Data Collection

   Paired RNA-seq and 27k Illumina methylation array data were obtained
   for 374 HGSOC specimens from TCGA, downloaded directly from the Genomic
   Data Portal (GDC) via a 2011 Legacy dataset available online [[50]15].
   Legacy data were required, as cancer recurrence information is not
   available for current iterations. Supplemental clinical metadata was
   gathered using the Broad Institute GDAC Firehose database
   ([51]http://gdac.broadinstitute.org), accessed on 17 August 2022. Time
   to recurrence was defined as days from diagnosis to tumor recurrence.
   Patients were censored at the last date they were alive and known to be
   recurrence-free.

2.2. RNA-seq Data Processing

   RNA-seq data were collected as BAM genomic alignment files. Fragments
   per kilobase of transcript per million mapped reads (FPKM) and expected
   gene counts were then calculated using RNA-Seq by
   Expectation–Maximization (RSEM) [[52]16]. FPKM counts were normalized
   to log2(FPKM + 1) and used for clustering analysis and neural network
   construction. Data distribution was examined after normalization to
   check for normality. Low gene counts were filtered to remove those with
   counts of zero in ≥70% of samples. RSEM expected counts were rounded to
   the nearest integer and used for downstream differential analysis.
   Three patients had multiple aliquot samples; in these cases, the mean
   expression values across aliquots were used.

2.3. 27k Methylation Array Data Processing

   27k Illumina methylation array data were collected in a SeSAMe
   Methylation Beta Value file format. Nine multi-aliquot samples existed,
   for which the mean beta values were used to represent each sample.
   Missing beta values were imputed using ChAMP version 2.30.0, using the
   K-nearest neighbors method (KNN) with k = 5 neighbors [[53]17]. Probes
   with a proportion of samples with an NA value above 0.2 were removed.
   For downstream correlation analyses, probes were averaged to the gene
   level.

2.4. Data Integration and Feature Selection

   The expression data (count matrix) and methylation data (beta values
   matrix) were unified such that each patient had one representative
   sample in each dataset. Prior to sample clustering, various feature
   selection methods were performed to select the most relevant features
   for cluster distinction. Two main methods were used—mean absolute
   deviation (MAD) and Cox proportional hazards regression models
   [[54]18].

   The univariate Cox model was used for each feature (gene, probe) to
   indiscriminately test its significance in relation to disease
   recurrence. In this case, the event variable was the binary recurrence
   or censoring indicator, and the time was the number of days from cancer
   diagnosis to recurrence, death, or censoring. The Cox-selected features
   were chosen for downstream analysis. Features were evaluated for
   selection based on both R-square value and Ward p-value. A cutoff value
   of 0.05 was used for the proportional hazards p-value for each feature.
   The coxph function in R survival package was used for this analysis
   [[55]19].

   The MAD method was initially used for feature selection for the 500
   most variable features from each dataset but was not deployed in
   downstream clustering. The following formula was used:
   MAD = Σ|xᵢ − μ|/n, where
   n is the number of samples, and
   |xᵢ − μ| is the absolute deviation of feature xᵢ from the mean.

2.5. Unsupervised Clustering Analysis

   In total, 1467 features (methylation = 650 CpG probes, expression = 817
   genes) were selected for use in the clustering analysis. Various
   methods of unsupervised clustering were used to identify distinct
   subtypes using this feature set. The methods used for clustering
   included Ward hierarchical clustering, partition around medoids (PAM),
   Clara, and similarity network fusion (SNF). Ultimately, a combined SNF
   and consensus clustering (CC) was applied [[56]20]. First, SNF was
   conducted to obtain the fusion matrix of sample similarity for the
   combined methylation and expression feature set. Next, the matrix was
   used as the input sample distance for consensus clustering, with k = 4
   nearest neighbors. In total, 20 iterations were performed, and an alpha
   variance for the local model was set to 0.5. This iterative clustering
   was performed using the CancerSubtypes package [[57]21].

2.6. Survival and Recurrence Analysis

   Overall survival and time to recurrence were estimated using the method
   of Kaplan–Meier and compared between the identified clusters using the
   logrank test. With a sample size n = 370 and expecting to identify 3 or
   4 clusters, we expected the cluster size to be approximately n = 100.
   Assuming approximately 80% event rate (i.e., approximately 160 events
   in the two clusters), a Cox proportional hazards model comparing
   time-to-event outcomes between two such clusters would provide 80%
   power to detect a hazard ratio HR = 1.56 when testing at two-sided α =
   0.05. Adjusting for multiple comparisons and testing at α = 0.01, there
   would be 80% power to detect HR = 1.72.

2.7. Differential Expression and Methylation

   To test for differences in expression and methylation among the
   responsive and non-responsive clusters, differential expression was
   performed using DESeq2 [[58]22]. DESeq utilizes an empirical Bayesian
   method to assess differential expression for each gene, making specific
   cluster-wise comparisons (e.g., Cluster 4 vs. Cluster 1). An
   FDR-adjusted p-value of <0.05 (Benjamini–Hochberg correction) and
   absolute LFC (|>0.5|) were used as thresholds for significance. To test
   for methylation differences among the two clusters of interest,
   differential methylation was performed using limma [[59]23]. Limma
   employs a linear model with an empirical Bayes procedure to make
   pairwise comparisons between one cluster and all others. An
   FDR-adjusted p-value of <0.05 (BH) was used as a cutoff for
   significance with an absolute delta beta value (|>0.10|). Gene
   annotation for each differential probe was obtained from the Illumina
   Human Methylation 27k Annotation. Enrichment analysis using
   differentially expressed and methylated genes was conducted using
   Enrichr [[60]24].

2.8. MLP Neural Network Architecture

   The selected features (n = 1467) used for unsupervised clustering were
   used as the input nodes for a multilayer perceptron (MLP) artificial
   neural network. The data were split into 80/20 training and testing
   sets (by % of samples), respectively. The sample labels were defined as
   the cluster numbers (i.e., subtypes) previously identified. The model
   architecture was as follows: one input layer (nodes = 1467), three
   hidden layers (nodes = 1000, 1000, 100), and one output layer (labels =
   4). A layer dropout rate of 0.1 was used in the first layer. For the
   first four layers, the relu activation function was used. For the
   output layer, the softmax activation function was used. The adam
   optimizer function was used, with a learning rate of 0.0001. The
   categorical crossentropy loss function was used. Model accuracy was
   calculated based on the confusion matrix of predicted and actual class
   labels. Machine learning analysis was performed with Keras [[61]25].

2.9. Gene Set Enrichment Analysis

   The top features were extracted based on high neuron fire frequency
   within the assembled neural network model. Genes of interest were
   isolated from the trained model, based on those with highest feature
   weights and bias in aggregate. Gene ontology enrichment analysis was
   then performed using clusterProfiler (software package) version 4.8.1
   [[62]26].

2.10. Additional Downstream Analyses

   Three additional downstream analyses were performed on the identified
   clusters to evaluate further signature differences among them.
   Correlation analysis was performed for each cluster separately to
   evaluate the correlation of gene-level methylation with expression,
   using MethylMix 2.0 [[63]27]. Immune cell-type composition analysis was
   applied using Immunedeconv, (clusterProfiler version 4.8.1) on TPM
   counts, to assess these differences among each cluster [[64]28].

2.11. Workflow Validation and Analysis

   To validate the described workflow and demonstrate its utility on an
   external dataset, RNA-seq expression and 450k methylation array data
   were collected from the HGSOC dataset maintained by the International
   Cancer Genome Consortium (ICGC) [[65]29]. Following requisite filtering
   and quality control, 61 tumor samples for which both transcriptomic and
   methylation data were available were selected for analysis. Expression
   filtering and normalization were performed as described, using
   normalized HTSeq counts. A more stringent p-value cutoff was applied to
   Cox feature selection (p < 0.01) to account for the higher number of
   probes included in the 450k array. All downstream analyses were
   conducted in the same fashion as the described workflow. To compare
   methylation probes from 450k array ICGC data to 27k array TCGA data,
   450 k probes were limited to only those falling within +/− 1500 base
   pairs of TSS, representing the promoter region. However, all 450k
   probes were considered for feature selection.

3. Results

3.1. Unsupervised Clustering Analyses of Integrated DNA Methylation and RNA
Sequencing Data

   Data were processed and analyzed according to the workflow outlined in
   [66]Figure 1A. Of the 380 patients within TCGA-OV dataset, for whom
   both methylation array and RNA-seq data were available, 370 samples for
   each assay remained after the unification and averaging of aliquots.
   Following filtering and normalization, a univariate Cox model
   (proportional hazards regression) was applied to evaluate the
   significance of each feature (gene and probe) related to disease
   recurrence. In total, 54,494 features were evaluated. The feature
   selection for the clustering analysis employed only those features with
   likelihood-ratio test p-values of <0.05. Prior feature selection
   attempts, using features with high variability (mean absolute
   deviation), were inconclusive.

Figure 1.

   [67]Figure 1
   [68]Open in a new tab

   Analysis workflow (A). Methylation array (27k) and RNA sequencing data
   (RNA seq) datasets were unified to n = 370 patients. Necessary
   pre-processing, filtration, and normalization were performed for each
   dataset. (B,C) Ward clustering of unified RNA-seq (B) and methylation
   (C) data, for n = 370 patient samples. Feature selection using Cox
   regression resulted in 1467 features selected for clustering. K = 4
   clusters set for analysis. Each column represents a tumor specimen, and
   each row represents a genomic feature. No major distinctive clusters
   observed using this method.

   In total, 1467 features were selected through the described algorithm
   and used for clustering. Among the features most significantly
   correlated with recurrence were the promotor-associated probes for the
   NFRKB and DPT genes and the TREML1 gene ([69]Supplementary Table S1).
   Ward hierarchical clustering was performed on the unified dataset, with
   k = 4 clusters ([70]Figure 1B,C). No clear pattern of clusters emerged,
   possibly due to the high inter-sample variability of features. Thus,
   more sophisticated clustering methods were employed.

   Similarity network fusion combined with consensus clustering yielded
   optimal k = 4 clusters with distinct differences in disease recurrence
   propensity ([71]Figure 2A). All identified clusters demonstrated a
   silhouette width of acceptable quality ([72]Figure 2B). Called-feature
   expression was largely distinct by cluster ([73]Figure 2C). Cluster 1
   (n = 101 tumors) showed a higher propensity toward treatment
   sensitivity; the median time to disease recurrence was 977 days
   ([74]Table 1). Cluster 4 (n = 111 tumors) showed a trend toward
   treatment resistance; the median time to disease recurrence was 428
   days. Cluster 3 (n = 98 tumors) also trended toward resistance and
   lethality, with a median time to recurrence of 568 days and median time
   to death of 1248 days. These clusters were selected for downstream
   differential analyses, as they had the highest disparity in treatment
   outcomes.

Figure 2.

   [75]Figure 2
   [76]Open in a new tab

   Clustering results, overview. Similarity network fusion plus consensus
   clustering (SNF + CC) was performed on the features selected by Cox
   regression (k = 817 genes, 650 probes). (A) KM-based time to recurrence
   analysis performed for each identified cluster. K = 4 clusters were
   optimal for significance (p-value 3 × 10^−7). (B) The silhouette width
   of each identified cluster. (C) Heatmap of the called-feature
   expression of the features selected by Cox regression. Each column
   represents a patient, and each row represents a genomic feature.

Table 1.

   SNF + CC clustering, subtype overview. The subtype classification,
   number of observations, and mean days to tumor recurrence are shown for
   each cluster (k = 4) identified. Three observations were removed due to
   missing data in at least one dataset.
   Cluster Observations (n) Median Days to Tumor Recurrence Median Days to
   Death Subtype
   1 101 977 1511 Responsive
   2 50 723 1278 NM
   3 98 568 1248 NR
   4 111 428 1259 NR
   p-Value: 6 × 10^−8
   [77]Open in a new tab

3.2. Characteristics of Identified Clusters

   Cluster 1: “Responsive”

   Differential gene expression for this subtype was performed against all
   others to identify features potentially contributing to
   chemo-responsiveness. In total, 407 genes (DEGs) were found to be
   differentially expressed (FDR-adj p-value < 0.05, |log2FC| > 0.50), 26
   upregulated and 381 downregulated genes ([78]Figure 3A). Of these, 261
   were protein coding genes. All top 20 differentially expressed genes
   (by adjusted p-value) were downregulated in this cluster vs. all others
   ([79]Table 2). The pathway analysis of the differentially expressed
   genes identified several enriched signaling pathways: Notch signaling,
   Adipogenesis, Sulfation Biotransformation, and Vitamin A and carotenoid
   metabolism ([80]Figure 3B). Several of the differential genes found in
   the Notch signaling pathway have endogenous roles in development but
   also in differentiation and proliferation—DLK1, CNTN6, DLL1, and HEY1
   ([81]Figure 3C). As observed in the network analysis, several
   sub-groups of differentially expressed genes form small clusters of
   network-oriented signaling ([82]Figure 3D). Methylation analysis
   comparing Cluster 1 to all others revealed 1292 differentially
   methylated probes (DMPs), all limited to the promoter region, as
   included in the 27k Illumina platform. The top 20 DMPs, selected based
   on adjusted p-value, were all hypermethylated. In total, 41 genes were
   both differentially expressed and had at least one promoter-level probe
   differentially methylated. These genes were selected as a cluster
   “signature” for further inspection.

Figure 3.

   [83]Figure 3
   [84]Open in a new tab

   Results of differential analysis, Cluster 1. (A) All differentially
   expressed genes, with adj. p-value < 0.05 and |LFC| > 0.50, with genes
   of interest highlighted. (B) Pathway enrichment analysis for Cluster 1
   (“Responsive”), performed using ENRICHR. Enriched pathways found using
   differentially expressed genes within Cluster 1. * denotes statistical
   significant (p < 0.05) (C) Presence of differentially expressed genes
   within significantly enriched pathway sets. (D) Network analysis using
   three foundational pathway sets, showing connectivity between
   differentially expressed genes and enriched pathways.

Table 2.

   Differentially expressed genes: Cluster 1 (“Responsive”) vs. All (A).
   Top 20 differentially expressed genes, sorted by significance (adjusted
   p-value). A total of 407 DEGs.
    Gene     Log2 FC     Adj. p-Value      Type        Direction
   GABRG2  −4.91896998  6.79 × 10^−27 protein_coding Downregulated
   DPYSL5  −4.287431685 3.89 × 10^−26 protein_coding Downregulated
    NXF2   −6.848164657 8.82 × 10^−26 protein_coding Downregulated
    CUX2   −3.320974078 4.62 × 10^−21 protein_coding Downregulated
    FGF17  −3.378578839 6.53 × 10^−21 protein_coding Downregulated
   CLEC2L  −3.519088593 2.55 × 10^−19 protein_coding Downregulated
    PAGE2  −5.530950007 2.55 × 10^−19 protein_coding Downregulated
   IGF2BP1 −3.047958903 5.53 × 10^−18 protein_coding Downregulated
   MAGEA9B −5.083688527 2.53 × 10^−17 protein_coding Downregulated
   IGDCC3  −3.087367559 2.64 × 10^−17 protein_coding Downregulated
    OR3A2  −4.033030242 3.86 × 10^−17 protein_coding Downregulated
    TNNT3  −2.729237962 1.31 × 10^−16 protein_coding Downregulated
   ZNF716  −4.746030841 3.14 × 10^−16 protein_coding Downregulated
   TUBB2B  −2.88250031  9.76 × 10^−16 protein_coding Downregulated
    NXF2B  −5.151107593 1.23 × 10^−15 protein_coding Downregulated
   IGLON5  −2.628183582 3.77 × 10^−15 protein_coding Downregulated
   CYP2W1  −2.769837506 1.54 × 10^−14 protein_coding Downregulated
    SLIT1  −2.276786555 4.87 × 10^−14 protein_coding Downregulated
    RAPSN  −2.529386839 4.87 × 10^−14 protein_coding Downregulated
   FAM133A −2.672339049 7.34 × 10^−14 protein_coding Downregulated
   [85]Open in a new tab

   Cluster 3: “Non-Responsive|Immunosuppressive”

   The differential expression of Cluster 3, compared directly to the
   Responsive Cluster 1, revealed 826 DEGs—711 were upregulated and 115
   were downregulated. Of these, 551 were protein-coding genes. Within the
   top 20 DEGs, sorted by adjusted p-value, many immunoglobulin
   heavy-chain genes (IgHV) and immunoglobulin kappa variable chains
   (IgKV) were identified ([86]Table 3). Among the top protein-coding
   DEGs, several chemokine receptors were identified (e.g., CXCR2, CCR2)
   ([87]Figure 4A). The pathway enrichment of these DEGs yielded multiple
   pathways related to immune and inflammatory signaling: Inflammatory
   Response Pathway, Chemokine signaling pathway, and Selective expression
   of chemokine receptors during T-cell polarization ([88]Figure 4B).
   Groups of interleukin and costimulatory-coding genes defined the
   phenotype, preset across multiple enriched pathways ([89]Figure 4C). No
   clear sub-clustering of differential genes was observed in the network
   analysis ([90]Figure 4D). Due to the observed changes in
   immune/inflammatory signaling within this cluster, cell-type
   composition analysis was performed using Immunedeconv. The analysis
   estimated that Cluster 3 was enriched in immune cells, including both
   immune reactive cells like CD4+ effector memory T-cells and CD8+
   central memory T-cells as well as immuno-suppressive cells, such as
   macrophages, M1 macrophages, M2 macrophages, monocytes, myeloid
   dendritic cells, cancer-associated fibroblasts, and hematopoietic stem
   cells ([91]Figure 5). The overall myeloid dendritic cell activation
   score, the tumor microenvironment score, and the immune score were
   increased in Cluster 3 compared with Clusters 1, 2, and 4, suggesting
   that this cluster is associated with an immune-diverse phenotype. A
   Kruskal–Wallis test revealed statistically significant immune cell-type
   enrichment, particularly in Cluster 3 ([92]Supplementary Table S2).
   Differential methylation analysis of Cluster 3 revealed 596 DMPs with
   an adjusted p-value < 0.05 and an absolute delta beta value ≥0.10. Of
   these, 354 CpG sites were hypermethylated, and five probes had an
   absolute delta beta value >0.19.

Table 3.

   Differentially expressed genes: Cluster 3
   (“Non-Responsive|Immunosuppressive”) vs. Cluster 1 (“Responsive”). Top
   20 differentially expressed genes, sorted by significance (adjusted
   p-value). A total of 826 DEGs.
     Gene    Log2 FC    Adj. p-Value      Type       Direction
   IGHV3-72 4.85327615 1.01 × 10^−29   IG_V_gene    Upregulated
   IGLV7-46 4.10472706 3.87 × 10^−24   IG_V_gene    Upregulated
   IGHV4-34 3.87051574 9.83 × 10^−22   IG_V_gene    Upregulated
   IGKV1-39 4.89152857 1.19 × 10^−21   IG_V_gene    Upregulated
   IGLV1-47 3.86382442 9.44 × 10^−19   IG_V_gene    Upregulated
   IGKV1-12 3.95799068 8.33 × 10^−18   IG_V_gene    Upregulated
   IGHV1-58 5.2744758  8.33 × 10^−18   IG_V_gene    Upregulated
   IGHV4-28 4.0640693  5.68 × 10^−17   IG_V_gene    Upregulated
    PAGE2   6.19952601 8.28 × 10^−17 protein_coding Upregulated
   IGHV2-5  3.70552433 1.24 × 10^−16   IG_V_gene    Upregulated
   IGHV1-18 3.64573859 4.17 × 10^−16   IG_V_gene    Upregulated
   IGHV1-2  3.85467877 6.57 × 10^−16   IG_V_gene    Upregulated
   IGHV3-49 3.89253991 1.76 × 10^−15   IG_V_gene    Upregulated
   IGHV3-20 4.7110679  3.41 × 10^−15   IG_V_gene    Upregulated
   IGLV9-49 4.0216455  5.63 × 10^−15   IG_V_gene    Upregulated
   IGHV5-51 3.47327279 1.22 × 10^−14   IG_V_gene    Upregulated
    P2RX1   2.08886429 2.98 × 10^−14 protein_coding Upregulated
   IGKV5-2  4.30138825 3.27 × 10^−14   IG_V_gene    Upregulated
   IGHV3-73 4.1415814  3.55 × 10^−14   IG_V_gene    Upregulated
   IGHV3-11 3.51650267 5.26 × 10^−14   IG_V_gene    Upregulated
   [93]Open in a new tab

Figure 4.

   [94]Figure 4
   [95]Open in a new tab

   Results of differential analysis, Cluster 3
   (“Non-Responsive|Immunosuppressive”) vs. Cluster 1 (“Responsive”). (A)
   All differentially expressed genes, with adj. p-value < 0.05 and |LFC|
   > 0.50, with genes of interest highlighted. (B) Pathway enrichment
   analysis for Cluster 3, performed using ENRICHR. Enriched pathways
   found using differentially expressed genes within Cluster 3. * denotes
   statistical significant (p < 0.05) (C) Presence of differentially
   expressed genes within significantly enriched pathway sets. (D) Network
   analysis using three foundational pathway sets.

Figure 5.

   [96]Figure 5
   [97]Open in a new tab

   Fractional estimates of cell-type composition within each identified
   cluster. Constructed using immunedeconv R package. Higher fraction of
   multiple immune cell types found within Cluster 3.

   Cluster 4: “Non-Responsive|Hypomethylated”

   Cluster 4, associated with the shortest time to recurrence, contained
   tumors displaying 225 DEGs when compared directly to the Responsive
   Cluster 1. Of those, 161 genes were upregulated. In contrast to Cluster
   3, IgHV and IGLV genes found among the DEGs of Cluster 4 were mostly
   downregulated ([98]Table 4). However, several noteworthy genes were
   significantly upregulated ([99]Figure 6A). The pathway enrichment of
   Cluster 4 DEGs revealed multiple pathways of interest unique to the
   cluster: GABA receptor signaling (GABRA3 and DLL1 genes), Ovarian
   infertility (CNTN6 and COL9A3), and Nuclear receptors (HEY1, MATN3,
   COL9A3), pointing to a phenotype more dysregulated in hormone signaling
   ([100]Figure 6B–D). Differential methylation analysis of this cluster
   relative to Cluster 1 revealed 700 DMPs, all of them hypomethylated.
   Thus, we refer to this cluster as the “hypomethylated” cluster,
   including tumors with potentially less gene silencing occurring at DMP
   loci. The pathway analysis of the DMPs in this cluster revealed the
   enrichment of PI3K-Akt-MTOR signaling.

Table 4.

   Differentially expressed genes: Cluster 4
   (“Non-Responsive|Hypomethylated”) vs. Cluster 1 (“Responsive”). Top 20
   differentially expressed genes, sorted by significance (adjusted
   p-value). A total of 225 DEGs.
      Gene     Log2 FC    Adj. p-Value      Type        Direction
     GAGE2E   5.63658378 6.82 × 10^−6  protein_coding  Upregulated
     CTAG1A   5.24453883 9.92 × 10^−6  protein_coding  Upregulated
     CTAG1B   5.15421251 1.84 × 10^−9  protein_coding  Upregulated
      NXF2    5.04533375 6.06 × 10^−8  protein_coding  Upregulated
     PAGE2    4.95316397 4.13 × 10^−10 protein_coding  Upregulated
     GAGE13   4.92955607 1.53 × 10^−5  protein_coding  Upregulated
     NXF2B    4.78873403 8.63 × 10^−9  protein_coding  Upregulated
     GAGE2A   4.4743878    0.00088715  protein_coding  Upregulated
     GABRG2   4.3361448  1.54 × 10^−13 protein_coding  Upregulated
              3.93401196 1.36 × 10^−5      lncRNA      Upregulated
     SPRR1B   −3.4839398   0.00023679  protein_coding Downregulated
    IGHV2-26  −3.1670253 3.89 × 10^−8    IG_V_gene    Downregulated
    IGLV3-1   −3.1480808 1.55 × 10^−9    IG_V_gene    Downregulated
    IGLV3-21  −3.1374207 3.76 × 10^−10   IG_V_gene    Downregulated
   IGHV5-10-1 −2.9508194 4.33 × 10^−5    IG_V_gene    Downregulated
    IGLV3-27  −2.9179351 9.52 × 10^−8    IG_V_gene    Downregulated
   IGHV3-64D  −2.6754653   0.00046012    IG_V_gene    Downregulated
    IGLV2-23  −2.5860868 9.12 × 10^−7    IG_V_gene    Downregulated
    IGHV3-7   −2.5576913 1.29 × 10^−6    IG_V_gene    Downregulated
    IGHV3-13  −2.4825443   0.00010162    IG_V_gene    Downregulated
   [101]Open in a new tab

Figure 6.

   [102]Figure 6
   [103]Open in a new tab

   Results of differential analysis, Cluster 4
   (“Non-Responsive|Hypomethylated”) vs. Cluster 1 (“Responsive”). (A) All
   differentially expressed genes, with adj. p-value < 0.05 and |LFC| >
   0.50, with genes of interest highlighted. (B) Pathway enrichment
   analysis for Cluster 4, performed using ENRICHR. Enriched pathways
   found using differentially expressed genes within Cluster 4. * denotes
   statistical significant (p < 0.05) (C) Presence of differentially
   expressed genes within significantly enriched pathway sets. (D) Network
   analysis using three foundational pathway sets.

3.3. Workflow Validation on External Dataset

   To validate the feature selection and clustering approach, we applied
   our methodology to an external ICGC clinical dataset, where time to
   disease recurrence was available [[104]29]. Following Cox-based feature
   selection, 6884 gene features from RNA sequencing and methylation array
   data were used for SNF and consensus clustering (statistically
   significant correlation with disease recurrence, p < 0.01). Of these,
   6505 features were methylation probes. A four-cluster optimal solution
   was again identified ([105]Supplementary Figure S1). Each identified
   cluster had a significant difference in time to disease recurrence
   among clusters (p < 0.0001). In total, 56 promoter-region probes and 9
   gene features overlapped with those identified in the primary model
   ([106]Supplementary Table S3).

3.4. Pre-Biased MLP Neural Network

   To verify the clusters previously identified using SNF and CC, a
   Multi-Layer Perceptron (MLP) neural network was constructed, using the
   1467 features selected by the univariate Cox regression model as node
   inputs. With three hidden layers, the output layer led to a
   classification into one of four clusters. This feedforward structure
   combined the selected feature inputs with initial weights and subjected
   them to an activation function for each layer ([107]Figure 7A). The
   data were split into 80/20 randomized training/testing groups, and the
   model was evaluated on prediction accuracy. After model tuning, the
   final prediction accuracy was 91.3% ([108]Figure 7B), with a confusion
   matrix demonstrating the test predictions ([109]Figure 7C). Following
   model tuning and completion, the top nodes were isolated by filtering
   to those with the highest model weight and learning bias. Gene ontology
   analysis was then applied to those features, yielding many
   immune/inflammation-response-related pathways ([110]Figure 8).

Figure 7.

   [111]Figure 7
   [112]Open in a new tab

   Multi-Layer Perceptron (MLP) neural network constructed to predict de
   novo clusters using features selected by Cox regression. (A)
   Architecture and tuning parameters used for final model construction.
   Four nodes in final (prediction) layer representing each cluster. A
   total of 1467 input features from Cox regression selection. (B)
   Confusion matrix generated by 80/20 model training/testing prediction
   of cluster ID. (C) Model training over 100 epoch period, with generated
   loss and prediction accuracy.

Figure 8.

   [113]Figure 8
   [114]Open in a new tab

   Gene ontology analysis from nodes with highest weights and bias in MLP
   constructed model. Analysis conducted using clusterProfiler—vector of
   genes returns the enriched ontologies at a p-value cutoff of 0.05 and
   q-value cutoff of 0.05. Pathway nodes are colored by significance.

3.5. Signature Genes

   Among the differentially expressed and methylated features of each
   cluster, a list of overlapping features occurring in both lists was
   produced ([115]Figure 9A). The hierarchical clustering of these
   features, visualized by cluster, produced distinct patterns between
   each group ([116]Figure 9B).

Figure 9.

   [117]Figure 9
   [118]Open in a new tab

   Signature gene sets found for each cluster. (A) List of overlapped DEGs
   and DMPs found from differential analysis comparing each cluster of
   interest. (B) Heatmap of the normalized (log2 FPKM + 1) expression
   counts for each signature gene, clustered by row and grouped by
   cluster.

4. Discussion

   Our findings have several notable implications, within the context of
   HGSOC genomic profiling. The semi-biased clustering of samples by
   integrated transcriptomic and methylomic data was superior to
   traditional hierarchical clustering, as well as feature selection using
   variability measures. The immunosuppressive phenotype found in Cluster
   3 is distinct; the estimated presence of multiple stromal and immune
   cell types was associated with a poor response to treatment. Finally,
   the entire set of selected features was required for accurate subtype
   prediction using a supervised MLP neural network.

4.1. Clustering and Feature Selection

   Since the wide deployment of next generation sequencing (NGS), several
   methods have been developed to cluster RNA sequencing data into de novo
   cancer subtypes. HGOSC, like other tumor types, is genetically and
   epigenetically heterogeneous [[119]9]. The clustering of such data is
   further complicated by the presence of batch effects, inconsistent
   sequencing depth, differences in clinical features and treatment, data
   scarcity, and differences in assays. Thus, integrated clustering
   methods including multi-omic data types may offer insights which
   single-layer clustering cannot capture. RNA-seq data contain a larger
   number of total elements (typically ~60,000), and 27k methylation array
   data add an additional ~27,000 features; therefore, feature selection
   is critical toward developing a reproducible and effective model. Using
   all available features as input can negatively impact cluster accuracy
   and distinction [[120]30]. Often, features are selected based on
   variability—using a measure of mean absolute deviation (MAD) across all
   samples [[121]31,[122]32,[123]33]. Thus, the most variable features are
   used as inputs for unsupervised learning. However, the utility and
   interpretation of clusters resultant from such variable input features
   is unclear. Due to increased variability, the statistical power is also
   reduced. In our workflow, we forwent the MAD metric for clustering as
   we observed inferior separation between clusters tested (k = 2…8).
   Instead, a univariate Cox regression method was applied to select
   features most closely correlated with disease recurrence, across both
   layers of “omic” data. This resulted in 1467 total features selected
   for our analysis, with an almost equal representation of RNA-seq and
   methylomic features.

   Multi-omics clustering methods used for subtyping have included basic
   k-means clustering, iCluster, similarity network fusion, and consensus
   clustering algorithms [[124]33,[125]34]. Such methods work well with
   high-dimensional data but are equally dependent upon the input data
   used for clustering. These methods largely assume a distribution of
   input feature data, e.g., Poisson (iCluster), negative binomial, and
   non-parametric (average-linkage hierarchical clustering). NMF
   clustering, or non-negative matrix factorization, utilizes
   dimensionality reduction to create a matrix as a proxy for clustering
   [[126]34]. Intrinsically, NMF clustering can also only be used with
   non-negative values. Further, the utility of using these de novo
   clusters for clinical application beyond target discovery remains
   unclear.

   Similarity network fusion (SNF) is a clustering approach which works
   best for multi-layer data integration [[127]35]. This method works by
   building sample networks for each layer of data, then fusing them into
   one representative network. This approach has been used as the basis
   for more robust subtyping analyses in cancer [[128]36,[129]37]. Here,
   we demonstrate that pairing the SNF fusion matrix generated with
   aggregated consensus clustering (SNF + CC) results in more distinct
   clusters which are predictive of recurrence probability. Similar
   clustering methodology was used previously to define a novel subtype of
   sarcoma, which correlated with worse clinical outcomes [[130]38].

   We deem our approach “semi-biased” as the integrated features were
   selected based on their correlation with disease recurrence. Thus, the
   de novo clusters have an expected difference in predicted recurrence.
   However, by conducting downstream differential expression and
   methylation analysis between the most clinically different clusters, we
   identified additional features of each which may drive the phenotype.
   We also captured biological insights based on the differential analyses
   that were not evident with hierarchical clustering or a two-category
   definition of platinum response (i.e., “platinum-sensitive” vs.
   “platinum-resistant”). Finally, we performed a supervised neural
   network deep learning exercise to confirm the validity of our clusters
   and ensure the assignments were detectable based on the selected
   features alone. Each of the clinically distinct clusters (1—Responsive;
   3—Non-Responsive, Immunosuppressive; 4—Non-Responsive, Hypomethylated)
   was associated with clinically significant findings.

4.2. Individual Findings Related to Each Defined Cluster

   The Cluster 1 “Responsive” group contained multiple SOX genes which
   were downregulated (SOX30, SOX 11, SOX3). The sex-determining region
   Y-box (SOX) family of transcription factors has been linked to
   oncogenic processes, programs related to cell differentiation and
   stemness [[131]39]. Some of the SOX genes have been associated with
   poor prognosis and cancer progression (SOX2, SOX5) [[132]39,[133]40].
   In lung cancer, SOX30 had a tumor suppressor role through p53
   activation [[134]41]. The high expression of SOX 30 was correlated with
   better overall survival. Similarly, SOX11 has been implicated as a
   factor that can function as both a tumor suppressor and a metastasis
   inducer [[135]42].

   In contrast, SOX3 was shown to upregulate pro-apoptotic genes in breast
   cancer and was associated with a poor prognosis. The SOX transcription
   factors can promote cell proliferation and survival in cancer in a
   context-dependent fashion. Of the three SOX genes found downregulated
   in Cluster 1, both SOX11 and SOX3 have been demonstrated to have
   oncogenic features in ovarian or cervical cancer
   [[136]43,[137]44,[138]45]. We found all three genes downregulated in
   the cluster associated with the best clinical outcomes. Additionally,
   these genes were not differentially methylated in this cluster vs. the
   non-responsive clusters 3/4. This axis of transcription factors may
   function in a context-dependent fashion within this HGSOC subset.

   The Cluster 3 “Non-Responsive|Immunosuppressive” group is noteworthy
   for its immune cell-type composition differences and its upregulation
   of multiple chemokine signaling and immunoglobulin heavy and
   kappa-chain-associated genes. Both CXCR2 and CCR2 have been associated
   with chemoresistance in multiple cancer types, including HGSOC
   [[139]46]. The targeting of CXCR2 inhibited lung cancer progression and
   promoted the cytotoxic effects of cisplatin [[140]47]. Many immune
   signaling chemokines and receptor genes were found (CCL5, CCRL2, CD244,
   CD28, CD3G, CD48, CD52, CD84). IRAK3 has recently been studied for
   targeting to enhance pro-inflammatory cytokine signaling in cancer
   [[141]48]. Immune cell-type composition analyses suggest interesting
   differences in stromal and immune cell composition in this cluster.
   Populations of macrophages, monocytic cells, and cancer-associated
   fibroblasts appeared to be increased in Cluster 3, as does the overall
   immune signature. Cancer-associated fibroblasts (CAFs) can promote
   tumor growth in the microenvironment via the remodeling of the
   extracellular matrix but also by the secretion of cytokines and growth
   factors, some of which were identified via differential analysis
   [[142]49]. Macrophages can have both pro- and antitumoral effects
   depending on polarization state [[143]50]. Tumor-associated macrophages
   (TAMs) can incline toward a pro-growth state, suppressing immune
   response by cytokine and chemokine production and the co-recruitment of
   other immunosuppressive cell types (e.g., myeloid-derived suppressor
   cells and regulatory T-cells) [[144]51]. Interestingly, the activated
   myeloid dendritic cell (mDC) fraction was also higher in Cluster 3.
   Typically, mDCs function in antigen presentation and, when activated,
   can secrete IL-12 and promote the differentiation of T-cells into Th1
   cells [[145]52]. Thus, mDCs typically function in an immune-promoting
   fashion. Taken with the prior cell-type differences, the precise
   mechanism of immunosuppression in Cluster 3 warrants further
   examination.

   The RNA-seq findings for Cluster 4 “Non-Responsive|Hypomethylated” were
   noteworthy in contrast to Cluster 3. The two groups had comparable
   clinical outcomes regarding both time to recurrence and time to death.
   Yet, whereas Cluster 3 showed the upregulation of multiple
   immunoglobulin chain coding genes, Cluster 4 showed the downregulation
   of this group. Both the mutation status of IGHV genes and expression
   response to treatment have been demonstrated as prognostic biomarkers
   [[146]53]. Patients with a TP53 mutation and unmutated IGHV genes were
   predicted to have shorter time to first treatment in a cohort of
   chronic lymphocytic leukemia patients [[147]54]. That the expression of
   these immunoglobulin genes should vary so much between two similar
   non-responsive groups was an unexpected finding, particularly as they
   were compared to the Cluster 1 (“Responsive”) baseline. Follow-up
   inspection is warranted around this gene set. Among the most
   upregulated DEGs in Cluster 4, which were not found in Cluster 3, we
   note ELAVL3, GCK, PYY, SSTR3, FGF17, and PLPPR1. Several of these have
   previously been noted for their varying roles in oncogenesis and cell
   proliferation [[148]55,[149]56,[150]57,[151]58]. GCK is a glucokinase
   responsible for glucose metabolism by sensing glucose uptake and
   encouraging insulin release. GCK is highly expressed in the liver, and
   its expression is closely related to hexokinase-2 [[152]59]. Its role
   in ovarian cancer has not been studied. Our data support that the
   expression of this enzyme could be associated with worse clinical
   outcomes, potentially through the alteration of glucose metabolism.

   The shared features of both non-responsive clusters (3 and 4) included
   the overexpression of multiple tumor antigen genes: PAGE2, GAGE2A,
   GAGE13, and GAGE2E. Such tumor antigens have been described in many
   cancer types, and they represent cancer/testis antigens which are
   normally expressed in male germ cells and are repressed in cancer
   [[153]59,[154]60]. They represent potential targets for innate or
   therapeutic immune recognition.

   Notably, many DMPs within Cluster 4 were hypomethylated—compared to
   Cluster 1. This finding would imply a lack of epigenetic repression in
   the treatment-resistant group. We found that there were significantly
   more overexpressed genes in the DEG groups of both Cluster 4 and 3 (the
   non-responsive groups), compared to Cluster 1. Among the DMPs of
   Cluster 4, there were multiple anion transporters (SLC16A3, SLC26A8),
   as well as multiple altered probes in the promoter region of IL1RN. The
   latter gene codes for an interleukin 1 cytokine which modulates immune
   and inflammatory response [[155]61].

   Finally, our findings are restricted to HGSOC, a distinct entity within
   the spectrum of ovarian tumors, and confirm that the heterogeneity of
   clusters within HGSOC are unique to this disease. For instance,
   multiple studies have examined relapse propensity in low-grade serous
   ovarian cancer (LGSOC) [[156]62,[157]63]. LGSOC typically displays
   initial treatment resistance and a histology resembling more closely
   that of normally differentiated epithelial cells. Genomic profiling (n
   = 215) indicated high rates of MAPK, BRAF, and NRAS mutation. Notably,
   the overexpression of CDKN2A was associated with worse disease outcome,
   and the low expression of CD274 (PDL1) was found in a subset of LGSOC
   patients. Conversely, other studies have demonstrated that high
   progesterone receptor expression, correlated with the presence of
   CTNNB1 mutations, is associated with a favorable prognosis [[158]64].
   While low genomic complexity is a profile typical of LGSOC, our
   findings confirm the genetic and epigenetic complexity of HGSOC.

   Further, the sheer number of features that were significantly
   correlated with recurrence in our Cox model supports the notion that a
   small group of genomic features are likely insufficient as a biomarker
   set for HGSOC prognostic classification. The clinical applicability of
   such models remains limited at this time; however, a validated
   signature including multiple genetic and epigenetic features, as we
   propose here, may become useful in the future to identify patients at
   high risk for early recurrence.

4.3. Utility and Data Limitations

   The clinical utility of novel subtyping, even with the introduction of
   multi-omics integration and newer clustering methods, is still debated.
   Here, we present a novel workflow used to integrate methylation array
   data with NGS RNA-seq data. Using our semi-biased clustering approach,
   we were able to define gene signatures correlating with distinct
   phenotypes of chemo-response in subsets of HGSOC patients from TCGA and
   ICGC. We paired this approach with an MLP neural network learning model
   to ensure that each de novo cluster was predictable from the features
   selected for it. In so doing, we provide a feature selection,
   clustering, and deep learning approach which can be applied to any
   multi-omics dataset with a definable outcome measure (e.g., survival,
   recurrence, response to therapy). The inclusion of 27k array data
   presented challenging integration with RNA sequencing data. Firstly,
   the available probes are limited to the promoter region for this assay
   (distance to TSS ≤ 1500). Secondly, most genes are represented by only
   one probe—those with more probes had to be averaged by probe-wise
   methylation level. Finally, the normalization methods widely available
   for 450k and EPIC array are not available for 27k array data. Thus, a
   bespoke workflow had to be created for within-array normalization,
   probe-level quality filtration, and imputation.

5. Conclusions

   Here, we utilize a workflow of integrated ‘omics data, across HGSOC
   tumors profiled in the TCGA to identify novel subtypes of the disease.
   Using unsupervised clustering, we present four unique phenotypes which
   are predictable via MLP machine learning architecture, based upon
   features most correlated with disease recurrence. Importantly, we
   identified genetic markers of each subtype, as well as a strong
   immunosuppressive signature in one cluster enriched for stromal and
   immune cell types. Our findings underscore the importance for ‘omic
   integration in single-cancer clustering analysis and can be extended to
   other layers of data.

Supplementary Materials

   The following supporting information can be downloaded at:
   [159]https://www.mdpi.com/article/10.3390/cancers15143649/s1, Figure
   S1. Results of Consensus Clustering of RNA and 450k array features from
   ICGC validation dataset, selected by Cox Regression (n = 6884 gene
   features). KM-based recurrence-risk analysis performed for each
   identified cluster. Recurrence variable analyzed against the censorship
   variable (lost to follow-up, study withdrawal). K = 4 clusters were
   optimal for significance (p-value < 0.0001). Table S1. Top 20 features
   selected by Cox regression model (p < 0.05) of integrated
   transcriptomic and methylomic TCGA data. Table S2. Results of
   Kruskal-Wallis test, comparing the mean of various immune cell-type
   fractions across all identified clusters. Highlighted rows have a
   p-value < 0.01. Table S3. Common features selected in both primary
   analysis of integrated TCGA data, and validation analysis of integrated
   ICGC data. ICGC 450k methylation array data were limited to
   promoter-associated features (TSS +/− 1500) when examining overlapped
   features.
   [160]Click here for additional data file.^ (205.7KB, zip)

Author Contributions

   Conceptualization, R.K. and D.M.; Methodology, R.K.; Validation, M.K.;
   Formal analysis, R.K.; Data curation, R.K.; Writing—original draft,
   R.K.; Writing—review & editing, D.M., R.D. and M.K.; Visualization,
   R.K.; Supervision, R.D.; Funding acquisition, D.M. All authors have
   read and agreed to the published version of the manuscript.

Institutional Review Board Statement

   Not applicable.

Informed Consent Statement

   Not applicable.

Data Availability Statement

   TCGA expression and methylation data are available through GDC portal:
   [161]https://portal.gdc.cancer.gov.

Conflicts of Interest

   The authors declare no conflict of interest related to this work. D.M.
   received consultant fees from Astra Zeneca, Glaxo Smith Kline,
   Elsevier, and the Gynecologic Oncology Group Partners Foundation and
   research funding from Merck and PinotBio, not related to this project.

Funding Statement

   Funding for this study was contributed by The Translational Bridge
   Program of the Robert H Lurie Comprehensive Cancer Center, the US
   Department of Veterans Affairs IO1BX000792 to D.M. and by NCI CCSG P30
   CA060553 awarded to the Robert H Lurie Comprehensive Cancer Center.

Footnotes

   Disclaimer/Publisher’s Note: The statements, opinions and data
   contained in all publications are solely those of the individual
   author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI
   and/or the editor(s) disclaim responsibility for any injury to people
   or property resulting from any ideas, methods, instructions or products
   referred to in the content.

References