Abstract

   Background: The incidence of Oral Cancer (OC) is high in Asian
   countries, and the disease often goes undetected at its early stage.
   The study of genetics, especially genetic networks, holds great
   promise in this endeavor. Hub genes in a genetic network play a
   prominent role in regulating the whole network structure of genes.
   Thus, identification of such genes related to specific cancer types
   can help reduce the gap in OC prognosis.

   Methods: Traditional network biology approaches are unable to
   decipher the inter-dependencies within and across diverse biological
   networks. A multiplex network provides a powerful representation of
   such systems and encodes much richer information than isolated
   networks. In this work, we focused on the entire multiplex structure
   of the genetic network, integrating the gene expression profile and
   the DNA methylation profile for OC. Further, hub genes were identified
   by considering their connectivity both in the multiplex structure and
   in the respective protein-protein interaction (PPI) network.

   Results: Our approach inferred 46 hub genes with a high prediction
   accuracy (96%), an outstanding Matthews correlation coefficient (93%)
   and significant biological implications. Among them, the genes
   PIK3CG, PIK3R5, MYH7, CDC20 and CCL4 were differentially expressed and
   predominantly enriched in molecular cascades specific to OC.

   Conclusions: The hub genes identified in this work carry ontological
   signatures specific to cancer, which may further facilitate improved
   understanding of the tumorigenesis process and the underlying molecular
   events. The results indicate the effectiveness of our integrated
   multiplex network approach for hub gene identification. This work
   opens an innovative research route for multi-omics biological data
   analysis.

   Keywords: Hub genes, Multiplex network, Multi-omics data, Oral cancer

1. Introduction

   Cancer has emerged as a deadly disease killing people worldwide. Head
   and neck squamous cell carcinoma (HNSCC) is the most common cancer in
   South Asia, with an annual incidence of more than 58% in India
   [29]Kulkarni (2013); [30]Joshi et al. (2014). Oral cancer (OC) is the
   most common subtype of HNSCC. In India, more than two-thirds of oral
   cancer patients present at an advanced stage of cancer [31]Kulkarni
   (2013). Despite significant progress in molecular diagnosis, early
   detection of OC is still not available, and early detection of any
   cancer type is challenging [32]Schiffman et al. (2015). Hence,
   precisely predicting the disease would add significant value in
   treating patients. It has long been known that cancer arises from
   mutations in genes [33]Hanahan and Weinberg (2011). This has motivated
   extensive analysis of genetic networks using data mining algorithms
   and machine learning tools to identify significant genes involved in
   disease progression. Furthermore, studies that use gene expression
   levels to detect or discriminate cancer or its subtypes support the
   existence of a strong correlation between gene expression and disease
   types [34]Marisa et al. (2013); [35]Randhawa and Acharya (2015).
   However, gene expression data are limited by high noise and by a small
   number of samples relative to the large number of genes. DNA
   methylation is a heritable epigenetic mark that is essential for
   normal cell development [36]Jin et al. (2011). Aberrant methylation
   patterns significantly modulate gene expression levels and can result
   in the development of cancer [37]Wajed et al. (2001). Hence, a
   combined study of gene expression and DNA methylation patterns can
   help identify potential genetic indicators of cancer and allow for
   prevention and early detection, which is crucial to cancer treatment.

   In network theory, highly connected nodes are known as hub nodes, and
   such nodes have been found to be significant in many networks. Hub
   nodes in a genetic network are expected to play an influential role in
   regulating the whole network structure. Langfelder et al. addressed
   the identification of intramodular hubs in multiple genomic networks
   [38]Langfelder et al. (2013); the selected hub genes were biologically
   meaningful and could be considered for prognostic applications. The
   growing availability of diverse omics data enables the study of
   biological systems at different levels, and several integrative
   pipelines for multi-omics data analysis have been suggested. Studies
   reveal that transcriptomic data, combined with knowledge of the
   existing gene network, provide a new way of exploring genetic
   associations for candidate gene selection and functional module
   identification. This combination supports in-depth biological
   interpretation and enhanced statistical analysis. A dense
   subgraph-based integrated approach was deployed in [39]Swarnkar et al.
   (2015), which combined expression analysis of genes with
   protein-protein interaction (PPI) networks to identify functionally
   enriched genes. The extracted dense subgraphs represented densely
   connected and highly co-expressed clusters in biological networks.
   [40]Gadaleta et al. (2017) integrated gene expression data and DNA
   methylation data for glioblastoma multiforme using Regression2Net and
   identified potential candidate genes showing significant
   over-representation in different cancer-related pathways. The eight
   most connected hub genes and candidate differentially expressed genes
   were identified in [41]Liu et al. (2018) by integrating multiple cohort
   profile data sets of B-cell lymphoma. The identified biomarkers were
   associated with significant biological communications and could serve
   as therapeutic targets that mark the prognostic difference between the
   subtypes of B-cell lymphoma. A gene co-expression network based
   approach was suggested in [42]Randhawa and Acharya (2015) to identify
   a few vital genes with prognostic significance in various stages of
   Oral Squamous Cell Carcinoma (OSCC). A predictive accuracy of 81% was
   observed for the selected genes when the developed predictive model
   was used to classify early and advanced stage OSCC tumors. Lei Chen et
   al. proposed an integrated computational method based on random walk
   with restart and shortest path algorithms to identify novel genes
   associated with oral squamous cell carcinoma, which opens an
   opportunity for in-depth study of OC [43]Chen et al. (2017). Data from
   different experiments, web sources and PPIs were integrated in
   [44]Kumar et al. (2017) to identify candidate genes of OC; the 39
   selected candidate genes were significantly enriched in different
   biological processes involved in OC. [45]Li et al. (2018) compared the
   mean expression levels of genes between diseased and control samples
   to identify differentially expressed and differentially variable genes
   for OSCC. In our previous work [46]Mahapatra et al. (2018), we proposed
   an integrated pipeline of biological networks for the identification
   of candidate genes specific to a disease. In the first step of the
   pipeline, a network of co-expressed genes was combined with a known PPI
   network, resulting in an induced network that preserved the interplay
   and relationships of genes/proteins. Densely connected modules were
   then extracted, and the hub genes in these modules were identified.
   The selected dense modules were observed to be both biologically and
   statistically significant, with high predictive ability.

   The in-depth analysis of biological networks thus holds great promise
   for the identification of such candidate biomarkers and plays a
   decisive role in deciphering the biological mechanisms behind complex
   diseases. However, biological networks consist of different types of
   links among the interacting units, each link encoding one type of
   interaction. As a result, these systems exhibit higher complexity, with
   inter-dependencies within and across networks that may fluctuate, and
   a single isolated network cannot capture all the information that
   characterizes the complex biological system [47]Doncheva et al.
   (2018). Traditional studies of genetic networks assume that gene-gene
   associations are exclusive, invariant and encapsulate all the
   information between genes. At the same time, they ignore the presence
   of multiple types of gene-gene interactions (multiplexity), which may
   lead to misleading results and false positives [48]De Domenico et al.
   (2013); [49]Kanawati (2015). In this context, a multiplex network
   provides a way to model and analyze such complex real-world networks
   by integrating multiple types of interactions among different
   biological units. For example, a multiplex framework was proposed to
   study the effect of social relationships on employee performance, in
   which the entire employee social network was represented as a
   multiplex structure [50]Cai et al. (2018). Compared with the singleton
   network model, the multiplex network model, which integrated the
   various types of social relations, better explained employee
   performance. Finn et al. outlined the use of multilayer networks in
   animal behavior analysis [51]Finn et al. (2019) and analyzed their
   effectiveness in uncovering the eco-evolutionary dynamics of animal
   social behavior. Chierici et al. [52]Chierici et al. (2020) proposed
   an integrative network fusion pipeline that uses Similarity Network
   Fusion (SNF) within a machine learning framework for biomarker
   identification in cancer. Although the use of multilayer network
   analysis for biological networks remains sparse, it is clearly
   relevant in this context. In a very recent study, a
   constraint-based PageRank algorithm, iRank, was proposed to prioritize
   cancer genes for hepatocellular carcinoma (HCC) using a multiplex
   network of omics data [53]Shang and Liu (2020). A Gene Regulatory
   Network (GRN) was utilized as the core-level network in the multiplex
   system, and other omics-level networks were used to define
   interactions across different levels (DNA level, RNA level).
   Empirically, integrating more levels of omics data improved the ranks
   of the identified cancer genes and the classifier performance. Wang et
   al. proposed a multiplex network based data integrative approach and
   identified 3 subtypes for glioblastoma multiforme (GBM) and breast
   invasive carcinoma (BIC) [54]Wang et al. (2016). The results were
   highly consistent with those of state-of-the-art techniques
   (Normalized Mutual Information > 0.8). A multiple-network-based
   algorithm, EMDN, was proposed in [55]Ma et al. (2017); it integrates
   multiple networks of gene differential co-expression and gene
   differential co-methylation to identify epigenetic modules. The EMDN
   algorithm proceeds in three steps: (i) seed prioritization, (ii)
   module identification by seed expansion and (iii) refinement of
   candidate modules. The identified epigenetic modules were highly
   enriched in known biological pathways and can serve as biomarkers to
   predict breast cancer subtypes. Such multi-network analyses of
   different types of omics data were highly effective in correlating and
   integrating multiple data levels and in discovering complex,
   multi-faceted disease patterns. To the best of our knowledge, the
   studies of biomarker identification for OC outlined above investigated
   the driving force of a gene at a single omics level for cancer
   progression. Gene-gene co-expression was adopted as the most common
   type of correlation to understand the network of genes, and the
   comprehensive contribution of gene correlations at multiple biological
   levels was not considered.

   In the present study, we propose a multiplex network based integration
   strategy for the identification of hub genes for Oral Cancer (OC). The
   novelty of our approach lies in (1) the construction of three
   integrated networks for analyzing the comprehensive contribution of
   gene correlations across the transcriptomic and epigenetic levels in
   OC and (2) the identification of a few hub genes fulfilling the
   defined ensemble criterion of multi-degree centrality and intramodular
   connectivity. Concretely, individual networks of gene co-expression
   (GCoExNW) and gene co-methylation (GCoMythNW) were constructed from 120
   samples of gene expression and DNA methylation data. Further, along
   with the multiplex network, a union network and an intersection
   network were constructed by integrating the co-expression and
   co-methylation relationships between genes. Hub genes were identified
   from the constructed integrated networks using two different selection
   criteria defined in the following sections, and the hub gene sets
   extracted from the multiplex network were compared with those
   identified from the union and intersection networks. The workflow of
   our proposed approach is illustrated in [56]Fig. 1. The identified hub
   genes were evaluated in terms of (i) statistical competence, (ii)
   differential expression analysis and (iii) biological significance
   analysis.

Figure 1.


   Schematic illustration of the pipeline for hub gene selection using the
   integrated multiplex network based approach. The preprocessing step
   removes noise present in the data; the correlation network
   construction step builds two separate networks of gene co-expression
   and co-methylation; the integrated network construction step builds
   three integrated networks from the individual co-expression and
   co-methylation networks; the hub gene identification step filters
   significant hub genes from the three integrated networks; and the
   final step is the statistical and biological validation of the
   selected hub genes.

2. Materials and methods

2.1. Dataset

   Gene expression and DNA methylation data for head and neck squamous
   cell carcinoma (HNSCC), comprising 13 primary sites of the head and
   neck region, were considered for our study. The data were downloaded
   from the GDC data portal of the TCGA website.[59]1 Specifically, 120
   matched samples of the level-3 gene expression and DNA methylation
   data sets were obtained, consisting of 100 tumor samples and 20 normal
   samples. The gene expression data were generated using RNA sequencing
   (RNA-seq) technology in the TCGA project. For the DNA methylation data,
   the β values of the probe signals were used for analysis. The β values
   are continuous, with a minimum of 0 (unmethylated) and a maximum of 1
   (completely methylated). The annotation table of the Illumina
   Human-Methylation27 platform was used to map probe IDs to the
   corresponding gene symbols. For gene symbols corresponding to multiple
   probe IDs, the β values of the probes were averaged to represent the β
   value of the gene. In the present study, the NCBI gene dataset[60]2 was
   utilized for biological relevance analysis of the identified genes; it
   includes 8933 cancer related genes and 316 oral cancer genes. The
   STRING interactome database (Version 11.0)[61]3 was used to further
   validate the identified genes by mapping them to the respective PPI
   network of genes/proteins.
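   The probe-to-gene averaging step can be sketched as follows. This is a
   minimal illustration, not the authors' code: the probe IDs, gene
   symbols and the dictionary standing in for the Illumina annotation
   table are hypothetical, and probes without an annotation are simply
   skipped (an assumption).

```python
from collections import defaultdict


def gene_beta_values(probe_beta, probe_to_gene):
    """Average the beta values of all probes annotated to the same gene.

    probe_beta: dict mapping probe ID -> beta value in [0, 1] (one sample).
    probe_to_gene: dict mapping probe ID -> gene symbol (hypothetical
        stand-in for the platform annotation table).
    Probes with no gene annotation are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for probe, beta in probe_beta.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        sums[gene] += beta
        counts[gene] += 1
    return {gene: sums[gene] / counts[gene] for gene in sums}


# Hypothetical probe IDs and annotation, for illustration only.
annotation = {"cg001": "GENE_A", "cg002": "GENE_A", "cg003": "GENE_B"}
sample = {"cg001": 0.2, "cg002": 0.6, "cg003": 0.9, "cg999": 0.5}
print(gene_beta_values(sample, annotation))
```

   In practice this is applied per sample, giving one β value per gene
   and sample.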

2.2. Weighted gene correlation network analysis

   Correlations among the expression levels and among the methylation
   intensities of genes were used to decipher their functionality. In
   this context, Weighted Gene Correlation Network Analysis (WGCNA)
   [62]Langfelder and Horvath (2008) in R was used to construct a
   weighted correlation network and to discover modules of highly
   correlated genes. WGCNA is a well-established method in biological
   science for the analysis of gene co-expression networks [63]Padi and
   Quackenbush (2015); [64]Nangraj et al. (2020). It computes the Pearson
   correlation coefficient to measure the correlation among genes. The
   construction of a scale-free network strongly depends on the soft
   thresholding power β: the co-expression similarity is raised to the
   power β to compute the adjacency [65]Langfelder et al. (2007). The
   WGCNA functions pickSoftThreshold and scaleFreePlot assist,
   respectively, in selecting a proper β value for computing the
   adjacency and in checking that the network has scale-free topology.
   The best value of β is chosen where the curve relating the scale-free
   topology fit index $R^2$ to β reaches a saturation point.
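   The unsigned WGCNA adjacency is $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$,
   and the scale-free fit index is the $R^2$ of a linear regression of
   $\log_{10} p(k)$ on $\log_{10} k$, where $k_i = \sum_{j \neq i} a_{ij}$
   is the connectivity of gene $i$. The pure-Python sketch below
   illustrates the idea behind pickSoftThreshold on toy data. It is a
   simplified stand-in, not the WGCNA implementation: the equal-width
   binning, the lack of a sign adjustment on $R^2$, the function names and
   the random data are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, beta):
    """Unsigned soft-threshold adjacency |cor(x_i, x_j)| ** beta, zero diagonal."""
    g = len(profiles)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def scale_free_r2(adj, n_bins=10):
    """R^2 of a linear fit of log10 p(k) on log10 k, with equal-width bins of k."""
    k = [sum(row) for row in adj]
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for ki in k:
        bins[min(int((ki - lo) / width), n_bins - 1)].append(ki)
    xs, ys = [], []
    for b in bins:
        if b and sum(b) > 0:
            xs.append(math.log10(sum(b) / len(b)))  # mean connectivity in bin
            ys.append(math.log10(len(b) / len(k)))  # fraction of genes in bin
    if len(xs) < 3 or len(set(ys)) < 2:
        return float("nan")  # too few points for a meaningful fit
    return pearson(xs, ys) ** 2  # simple linear regression: R^2 = r^2


# Toy data: 30 genes x 20 samples; the first 10 genes share a common signal.
random.seed(0)
signal = [random.gauss(0, 1) for _ in range(20)]
profiles = [
    [s + random.gauss(0, 0.5) for s in signal] if g < 10
    else [random.gauss(0, 1) for _ in range(20)]
    for g in range(30)
]
for beta in (1, 2, 4, 6, 8, 10):
    print(beta, round(scale_free_r2(adjacency(profiles, beta)), 3))
```

   With real data, one would inspect these values across β and pick the
   point where the curve saturates, as described above; the numbers
   printed for this toy example say nothing about the OC data.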

2.3. Multiplex network construction

   A multiplex network can be defined as an assemblage of distinct
   networks, each of which encodes a different layer of the multiplex
   system. Each layer in the multiplex system contains the same set of
   entities (nodes), but these are interconnected by different types of
   links (edges). The layers are connected (and hence coupled) to each
   other via interlayer edges [66]Kanawati (2015); [67]Boccaletti et al.
   (2014). A multiplex network can be utilized for modeling different
   types of networks such as:
     * Multi-relational network: a platform for modeling a complex system
       of networks that encodes different types of association among
       participating units. A link between nodes in a layer depicts one
       type of interaction. In this work, we have considered
       co-expression and co-methylation similarity among genes as the
       multiple types of relationships.
     * Dynamic network: edges within a layer represent the network state
       at a particular timestamp.
     * Attributed network: a similarity graph, constructed by applying a
       similarity measure to a group of nodes, can be treated as an
       additional layer.

   MuxViz is an open-source R software package with a graphical user
   interface, designed specifically for the visualization and analysis of
   multiplex networks [68]De Domenico et al. (2015). MuxViz provides a
   collection of algorithms for measuring “node centrality”, “interlayer
   correlation and reducibility”, “identifying community structure in the
   network” and “motif analysis” [69]De Domenico et al. (2013). We
   focused on “node centrality”, specifically the degree centrality and
   eigenvector centrality of nodes, to identify the hub genes in the
   integrated network.

2.4. Proposed model

   The steps of our proposed integrated multiplex network based approach
   to hub gene identification are explained below. The workflow of the
   proposed work is shown in [70]Fig. 1.

2.4.1. Preprocessing

   The sample-matched gene expression and DNA methylation data were input
   separately to the preprocessing step of our proposed approach. First,
   genes with more than 30% missing values across the samples were
   filtered out, and the remaining missing values were replaced with the
   mean of the sample [71]Mahapatra et al. (2018). This yielded 18283
   genes and 120 samples in the gene expression data and 20523 genes and
   120 samples in the DNA methylation data. Hierarchical clustering was
   performed on both data sets to remove outlier samples, leaving 116
   gene expression samples and 115 DNA methylation samples. Of these, 114
   samples (100 tumor and 14 normal) were common to both data sets and
   were used in the next step of network construction. Additionally,
   genes showing a fold change (FC) of less than 0.5 in their expression
   levels between normal and tumor samples were filtered out, leaving
   10376 genes [72]Dembele and Kastner (2014); [73]Zhao et al. (2018).
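   The missing-value filter and imputation described above can be
   sketched as follows. This minimal version rests on assumptions not
   stated in the text: missing entries are encoded as None, "the mean of
   the sample" is read as the mean of the observed values in that sample
   (column) after the gene filter, and the function name is hypothetical.

```python
def filter_and_impute(matrix, max_missing=0.30):
    """Drop genes (rows) with more than `max_missing` missing values, then
    fill each remaining gap with the mean of its sample (column).

    matrix: list of gene rows, one value per sample; None marks a gap.
    Returns (indices of kept genes, imputed rows).
    """
    kept = [i for i, row in enumerate(matrix)
            if sum(v is None for v in row) / len(row) <= max_missing]
    rows = [list(matrix[i]) for i in kept]
    n_samples = len(matrix[0]) if matrix else 0
    for s in range(n_samples):
        # Sample mean over observed values only, computed before filling.
        observed = [row[s] for row in rows if row[s] is not None]
        sample_mean = sum(observed) / len(observed) if observed else 0.0
        for row in rows:
            if row[s] is None:
                row[s] = sample_mean
    return kept, rows


data = [
    [1.0, 2.0, None, 4.0],   # 25% missing -> kept
    [None, None, 3.0, 1.0],  # 50% missing -> dropped
    [3.0, 4.0, 5.0, None],   # 25% missing -> kept
]
kept, clean = filter_and_impute(data)
print(kept)   # [0, 2]
print(clean)  # [[1.0, 2.0, 5.0, 4.0], [3.0, 4.0, 5.0, 4.0]]
```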

2.4.2. Correlation network construction

   The preprocessed gene expression data (10376 genes) and DNA
   methylation data (20523 genes) were separately input to WGCNA
   functions for the construction of the gene co-expression network
   (GCoExNW) and the gene co-methylation network (GCoMythNW). WGCNA
   defines a correlation similarity that is raised to the power β to
   compute the adjacency, so that the degree distribution approximately
   fits a scale-free network. A power of β = 6 was chosen, where the
   curve relating $R^2$ to β reaches saturation. Additionally, to obtain
   the similarity between genes at the network topology level, a
   Topological Overlap Matrix (TOM) was computed from the adjacency
   matrix. The resultant network meets the scale-free topology criterion.
   Edges in GCoExNW and GCoMythNW were then filtered with a threshold of
   0.1 to facilitate the formation of the integrated networks. The
   blockwiseModules function performs automatic network construction
   followed by identification of clusters of interconnected genes,
   termed modules, in a block-wise manner. If the number of genes in the
   data set exceeds maxBlockSize, the genes are pre-clustered into blocks
   whose size does not exceed maxBlockSize, and the network is
   constructed block by block. We set maxBlockSize to 30. WGCNA uses
   unsupervised hierarchical clustering for module detection. Modules in
   the blocks are subsequently merged based on the mergeCutHeight
   threshold, the dendrogram cut height for module merging. After
   inspecting the cluster dendrogram and experimenting with different
   values of mergeCutHeight, a cut height of 0.30 was found to yield a
   good set of modules of correlated genes. The intramodular
   connectivity was computed for all genes in both GCoExNW and GCoMythNW.
   Intramodular connectivity measures the connection strength, or
   co-expression level, of a given gene with respect to all other genes
   of its module.
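   For genes $i \neq j$, the unsigned topological overlap used by WGCNA
   is $\mathrm{TOM}_{ij} = \dfrac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$,
   with $l_{ij} = \sum_{u} a_{iu} a_{uj}$ and $k_i = \sum_{u} a_{iu}$ for
   an adjacency matrix with zero diagonal. The sketch below computes TOM
   in pure Python and keeps edges above the 0.1 threshold. Applying the
   threshold to TOM rather than to the adjacency, using a strict
   inequality, the function names and the toy matrix are all assumptions
   made for illustration.

```python
def topological_overlap(adj):
    """Unsigned TOM for a symmetric adjacency matrix with zero diagonal."""
    g = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j] for u in range(g))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


def thresholded_edges(tom, genes, threshold=0.1):
    """Undirected edges (as frozensets of gene names) with TOM > threshold."""
    g = len(genes)
    return {frozenset((genes[i], genes[j]))
            for i in range(g) for j in range(i + 1, g)
            if tom[i][j] > threshold}


genes = ["G1", "G2", "G3", "G4"]
adj = [
    [0.0, 0.9, 0.8, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.05],
    [0.0, 0.0, 0.05, 0.0],
]
tom = topological_overlap(adj)
print(sorted(sorted(e) for e in thresholded_edges(tom, genes)))
# [['G1', 'G2'], ['G1', 'G3'], ['G2', 'G3']]
```

   G4 is only weakly linked to G3 and shares no neighbors with G1 or G2,
   so all of its TOM values stay below 0.1 and it drops out of the edge
   list.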

2.4.3. Integrated network construction

   Three different integrated networks were constructed from the
   correlation similarity networks, i.e., GCoExNW and GCoMythNW. The
   integrated networks formed were:
     * 1.
        Union Network: Taking the union of all edges in the generated
        GCoExNW and GCoMythNW yielded an integrated Union Network of 6189
        nodes and 304656 edges. Each edge in the Union Network conveys a
        co-expression relationship, a co-methylation relationship, or
        both; thus, genes in this network may be altered at a single
        biological level or at multiple levels.
     * 2.
        Intersection Network: An integrated Intersection Network was
        created from the edges common to the two singleton networks,
        i.e., GCoExNW and GCoMythNW, and consisted of 259 nodes and 357
        edges. This network focuses on genes that are altered at multiple
        biological levels: its genes are not only co-expressed but also
        co-methylated, i.e., they co-alter at both the epigenetic and the
        transcriptomic levels.
     * 3.
        Multiplex Network: A multiplex network provides a platform for
        modeling complex systems of networks in which nodes are connected
        through a variety of links. It is represented as a network of
        networks, each encoding a different type of association among the
        participating units/nodes [74]Kivela et al. (2014). Each network
        encodes a different layer of the multiplex system, with coupling
        links between layers. More specifically, in the present study, we
        exploited multiplex networks to model the association between
        functionally similar genes at multiple biological levels.
        Different types of functional dependencies exist among the
        essential molecular components of the cell across diverse
        biological levels [75]Elling and Deng (2009). Because of this
        interdependency, the disruption caused by aberrant methylation of
        a single gene can quickly propagate to the PPI network, leading to
        abnormal functioning of tissues or cells that culminates in
        disease. In this work, the co-expression and co-methylation
        relationships between genes were utilized to construct the
        multiplex system. MuxViz software [76]De Domenico et al. (2015)
        was used for the construction and visualization of the multiplex
        network.
        The multiplex network was constructed as an edge-colored
        multigraph consisting of two layers, based on the co-expression
        and co-methylation similarity among genes, where each layer
        contains the same set of nodes. The two layers of the multiplex
        network were named GCoEx (representing the gene co-expression
        network) and GCoMyth (representing the gene co-methylation
        network). A total of 1068 genes were common to the constructed
        GCoExNW and GCoMythNW; this set of 1068 genes and their edges in
        GCoExNW and GCoMythNW were used to construct the multiplex
        network. An aggregate network was also generated, in which the
        interactions from the different layers are combined into a single
        monoplex structure. However, the aggregate network may lead to
        loss of information because different types of links are treated
        identically [77]Kanawati (2015). We examined the 1068 genes for
        the presence of disease-related genes and found 704 cancer related
        genes and 29 oral cancer genes. The multiplex network formed is
        shown in [78]Fig. 2. For clarity of visualization, only the edges
        involving oral cancer genes are shown in the figure.
        A multiplex network consolidates the analysis of different
        biological networks and simultaneously examines the importance of
        individual interacting units within and across networks. To
        quantify the importance of a node in the multiplex network, the
        diagnostic measures adopted were degree centrality (multi-degree
        centrality) and eigenvector centrality. The multi-degree
        centrality of a node in the multiplex network is the number of
        edges of any type that are incident on it, i.e., the participation
        of the node within and across different layers; nodes with high
        multi-degree have a larger impact on the whole network dynamics.
        Eigenvector centrality measures the influence of a node in a
        network: it assigns each node a centrality score that depends on
        both the number and the strength of its connections [79]Negre et
        al. (2018). A high eigenvector centrality (score > 0.5) indicates
        that the node is connected to many nodes that themselves have high
        scores.
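   The set operations behind the three integrated networks and the two
   node measures can be sketched on toy edge sets. In this minimal
   illustration, edges are frozensets of gene names; the multiplex layers
   keep only edges whose endpoints are both among the genes shared by the
   two networks (our reading of "their involved edges", an assumption);
   and eigenvector centrality is computed by power iteration on the
   aggregate network, scaled so that the top score is 1. That last step
   is a simplification and not MuxViz's multilayer computation. All gene
   names and function names are hypothetical.

```python
from collections import Counter


def union_network(e1, e2):
    """Edges present in either layer (co-expression and/or co-methylation)."""
    return e1 | e2


def intersection_network(e1, e2):
    """Edges present in both layers (co-expressed and co-methylated)."""
    return e1 & e2


def nodes_of(edges):
    """All genes that appear as an endpoint of some edge."""
    return {n for e in edges for n in e}


def multiplex_layers(e1, e2):
    """Restrict both layers to the genes shared by the two networks
    (assumed reading: keep edges whose endpoints are both shared)."""
    common = nodes_of(e1) & nodes_of(e2)
    return (common,
            {e for e in e1 if e <= common},
            {e for e in e2 if e <= common})


def multi_degree(layers):
    """Number of edges of any type incident on each node, across layers."""
    deg = Counter()
    for edges in layers:
        for e in edges:
            for n in e:
                deg[n] += 1
    return deg


def eigenvector_centrality(nodes, layers, iters=200):
    """Power iteration on the aggregate network, where an edge's weight is
    the number of layers containing it; scores are scaled so max = 1.
    Iterating with (A + I) keeps the leading eigenvector of A but avoids
    oscillation on bipartite graphs."""
    weight = Counter(e for edges in layers for e in edges)
    x = {n: 1.0 for n in sorted(nodes)}
    for _ in range(iters):
        y = dict(x)  # contribution of the identity term
        for e, w in weight.items():
            a, b = tuple(e)
            y[a] += w * x[b]
            y[b] += w * x[a]
        top = max(y.values())
        x = {n: v / top for n, v in y.items()}
    return x


# Hypothetical toy layers, for illustration only.
coexp = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]}
cometh = {frozenset(p) for p in [("A", "B"), ("B", "D"), ("C", "D"), ("D", "F")]}
print(len(union_network(coexp, cometh)),
      len(intersection_network(coexp, cometh)))  # 6 2
common, l1, l2 = multiplex_layers(coexp, cometh)
deg = multi_degree([l1, l2])
eig = eigenvector_centrality(common, [l1, l2])
print(sorted(deg.items()))  # [('A', 2), ('B', 4), ('C', 3), ('D', 3)]
print({n: round(v, 3) for n, v in eig.items()})
```

   In this toy case, gene B has the highest multi-degree and the highest
   eigenvector score. The edge A–B, present in both layers, counts twice
   toward multi-degree, which is how participation across layers is
   rewarded.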

Figure 2.


   Multiplex Network formed from GCoExNW and GCoMythNW.

2.4.4. Hub gene identification

   A highly connected gene in a network is termed a hub gene. In our
   earlier report [82]Mahapatra et al. (2018), a gene was marked as a hub
   if it was connected to 10%, 20% or 30% of the other genes in the whole
   network. However, applying that approach in this work did not yield an
   adequate number of hub genes. Therefore, two different criteria were
   set for screening the hub genes in the three integrated networks.

   In network theory, a node is said to be a hub if its connectivity is
   larger than the average connectivity of the whole network [83]Das et
   al. (2017); [84]Barabasi and Oltvai (2004). Using this as the first
   criterion, 1710 genes in the Union Network, 78 genes in the
   Intersection Network and 527 genes in the Multiplex Network were
   marked as hub genes, as they showed a high connection degree in their
   respective integrated networks.
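   Criterion 1 reduces to comparing each gene's degree with the mean
   degree of its network. A minimal sketch with hypothetical degrees
   (the function name is illustrative):

```python
def hubs_above_average(degree):
    """Criterion 1 sketch: genes whose degree exceeds the network's mean degree.

    degree: dict gene -> degree (number of incident edges) in one network.
    """
    mean_degree = sum(degree.values()) / len(degree)
    return {g for g, d in degree.items() if d > mean_degree}


# Hypothetical degrees, for illustration only.
print(sorted(hubs_above_average({"A": 2, "B": 4, "C": 3, "D": 3})))  # ['B']
```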

   Furthermore, for criterion 2, we introduced an ensemble of gene
   ranking and intramodular connectivity to select the hub genes in the
   integrated networks. Three sets of candidate hub genes were obtained
   by taking the top 20% of genes, ranked according to their connectivity
   (degree centrality) in the respective integrated networks. In
   addition, a selected hub gene had to have a normalized intramodular
   connectivity > 0.20 in the respective GCoExNW and GCoMythNW as well.
   Following these steps, 721 genes in the Union Network, 40 genes in the
   Intersection Network and 233 genes in the Multiplex Network were
   identified as hub genes. Criterion 2 thus yielded genes that are not
   only highly connected in the integrated structure but also well
   connected within their neighborhood.
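   Criterion 2 can be sketched under several stated assumptions:
   intramodular connectivity (kWithin) is the summed adjacency of a gene
   to the other genes in its module, normalized by the maximum kWithin in
   that module; the top 20% is taken as round(0.2 × n) genes, with ties
   broken by input order; and a gene missing from either network's
   connectivity table fails the 0.20 cutoff. These choices, the function
   names and the toy values are illustrative, not the authors'
   implementation.

```python
def intramodular_connectivity(adj, modules):
    """Normalized kWithin per gene.

    adj: dict gene -> dict neighbor -> adjacency weight (symmetric).
    modules: dict gene -> module label.
    kWithin is the summed adjacency to other genes of the same module,
    divided by the largest kWithin in that module (assumed normalization).
    """
    k_within = {
        g: sum(w for nb, w in adj.get(g, {}).items()
               if nb != g and modules.get(nb) == modules[g])
        for g in modules
    }
    module_max = {}
    for g, k in k_within.items():
        label = modules[g]
        module_max[label] = max(module_max.get(label, 0.0), k)
    return {g: (k / module_max[modules[g]] if module_max[modules[g]] > 0 else 0.0)
            for g, k in k_within.items()}


def criterion2_hubs(degree, knorm_exp, knorm_meth, top_frac=0.20, cutoff=0.20):
    """Top `top_frac` of genes by degree whose normalized intramodular
    connectivity exceeds `cutoff` in both the co-expression and the
    co-methylation network; genes missing from either table fail."""
    ranked = sorted(degree, key=degree.get, reverse=True)  # stable for ties
    n_top = max(1, round(top_frac * len(ranked)))
    return {g for g in ranked[:n_top]
            if knorm_exp.get(g, 0.0) > cutoff and knorm_meth.get(g, 0.0) > cutoff}


# Hypothetical values, for illustration only.
modules = {"A": "blue", "B": "blue", "C": "turquoise"}
adj = {"A": {"B": 0.8, "C": 0.2}, "B": {"A": 0.8, "C": 0.4}, "C": {"A": 0.2, "B": 0.4}}
print(intramodular_connectivity(adj, modules))  # {'A': 1.0, 'B': 1.0, 'C': 0.0}

degree = {f"G{i}": d for i, d in enumerate([9, 8, 7, 3, 3, 2, 2, 1, 1, 1])}
knorm_exp = {"G0": 0.9, "G1": 0.1, "G2": 0.5}
knorm_meth = {"G0": 0.4, "G1": 0.8, "G2": 0.6}
print(sorted(criterion2_hubs(degree, knorm_exp, knorm_meth)))  # ['G0']
```

   In the toy example, G1 ranks in the top 20% but fails the
   co-expression cutoff. G2 passes both cutoffs but falls outside the top
   20% by degree, so only G0 is kept.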

2.4.5. Validation of hub genes

     * (A)
       Statistical Validation
        The Matthews correlation coefficient (MCC) is a widely accepted
        measure in machine learning for quantifying the quality of a
        binary classifier. It represents the correlation coefficient
        between observed and predicted binary classifications and ranges
        from −1 to +1. An MCC value of +1 signifies perfect prediction,
        whereas −1 indicates total disagreement between prediction and
        observation, and 0 corresponds to random prediction. MCC is known
        to be a balanced measure because it works well even when the
        classes are of very different sizes. Because the OC data set
        considered here contains an imbalanced ratio of disease to normal
        samples, the predictive ability of the hub genes selected from the
        Union, Intersection and Multiplex networks was calculated and
        compared in terms of MCC scores. Along with MCC, other measures
        such as Sensitivity, Specificity, Predictive Accuracy, Precision
        and F-measure were computed against the known true classes
        [85]Mahapatra et al. (2018).
     * (B)
        Differential Expression (DE) Analysis
        DE analysis aims to identify genes showing quantitative changes
        in expression levels between tumor and normal samples. The mean
        expression levels of the two groups were compared using a t-test
        [86]Pan (2002). To control for multiple testing, the p-values of
        the DE tests were adjusted using the Benjamini and Hochberg
        method, with a false discovery rate threshold of padj (FDR) <
        0.05. Finally, logarithmic fold change values were computed for
        the genes to identify the differentially expressed genes. The
        results were plotted using the R package ggplot.
     * (C)
       Biological Validation
       The following approaches were employed for biologically validating
       the identified hub genes.
          + (i)
             Correspondence between the hub genes and the disease of
             interest: the NCBI gene dataset was utilized for the
             biological relevance analysis of the identified genes with
             respect to Oral Cancer. The data set includes 8933 cancer
             related genes and 316 oral cancer genes. The identified hub
             gene sets were compared in terms of the number of oral
             cancer genes and cancer related genes they contain.
          + (ii)
            Functional Association Analysis:
               o (a)
                  Association of the distinguished hub genes with the seed
                  genes: tumorigenesis is a complex multi-step process
                  involving gene mutations and dysregulated biological
                  processes and pathways [87]Levine and Steffen (2001). The
                  function of a gene cannot be fully understood in
                  isolation, because genes act through interactions with
                  one another. A group of genes can be marked as
                  functionally related by mapping them to a known network.
                  Previous work suggests that direct protein-protein
                  interactions are the strongest indicator of functional
                  association among genes, and this information can be
                  exploited to identify candidate genes [88]Vinayagam et
                  al. (2014).
                  In this study, a set of 28 widely accepted OC genes that
                  play a crucial role in oral cancer progression was used
                  as seed genes (Supplementary Table S2). The hub genes and
                  the seed genes were mapped to the PPI network to reveal
                  the functionality of the hub genes and strengthen the
                  study of disease pathogenesis. The STRING database
                  [89]Szklarczyk et al. (2015) and Cytoscape [90]Shannon
                  et al. (2003) were used to construct and visualize the
                  connectivity of the seed genes and hub genes in the PPI
                  network. Cytoscape is an open-source software platform
                  for the analysis and visualization of biological
                  networks; its stringApp plug-in was employed for the
                  analysis.
               o (b)
                  Gene Ontology (GO) functional enrichment analysis of
                  putative hub genes: DAVID [91]Sherman et al. (2009) was
                  used to uncover and further analyze the biological
                  meaning of the identified hub genes. DAVID provides a set
                  of functional annotation tools for classifying genes and
                  analyzing their biological roles. It is widely used to
                  perform GO and KEGG (Kyoto Encyclopedia of Genes and
                  Genomes) pathway enrichment analyses of genes.
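   The multiple-testing correction described in (B) can be sketched as
   follows. This is a minimal NumPy implementation of the
   Benjamini-Hochberg step-up adjustment written for illustration, not
   the authors' pipeline; it assumes the raw per-gene t-test p-values
   have already been computed (for example with scipy.stats.ttest_ind),
   and the function name benjamini_hochberg is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # scale the i-th smallest p-value by m / i
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = padj < 0.05  # FDR filter used in the DE step
```

   The adjusted values should agree with R's p.adjust(p, method = "BH");
   genes whose adjusted value falls below 0.05 pass the FDR filter.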

3. Results

   The Hub Gene Identification step of our proposed work yielded six
   sets of hub genes selected from the three integrated networks: Union
   network, Intersection network and Multiplex network. The hub gene sets
   marked by applying criteria 1 and criteria 2 of our approach are given
   in [92]Table 1.

Table 1.

   Hub genes marked in the three integrated networks by applying
   selection criteria C1 and C2.

   Criteria 1 (C1)                        Criteria 2 (C2)
   Hub        NoG    CR         OC        Hub        NoG    CR         OC
   C1-HubUn   1710   706 (41%)  25 (2%)   C2-HubUn   721    279 (38%)  10 (2%)
   C1-HubIn   78     33 (42%)   0 (0%)    C2-HubIn   40     14 (35%)   0 (0%)
   C1-HubMN   527    218 (42%)  8 (2%)    C2-HubMN   233    97 (42%)   8 (4%)

   C1-HubUn, C1-HubIn, C1-HubMN are three hub gene sets identified from
   Union Network, Intersection Network and Multiplex Network using
   criteria 1; C2-HubUn, C2-HubIn, C2-HubMN are three hub gene sets
   identified from Union Network, Intersection Network and Multiplex
   Network using criteria 2; NoG Number of genes, CR Number of Cancer
   related genes, OC Number of Oral cancer genes.

   Before further analysis, we performed a preliminary validation of the
   six hub gene sets by examining their correspondence with the disease.
   Each hub gene set was examined for the presence of cancer-related (CR)
   genes and oral cancer (OC) genes as described in section [94]2.4.5 (C)
   “Biological validation - Correspondence between the hub genes and the
   disease of interest”. The numbers and percentages of CR and OC genes
   found in the hub gene sets are summarized in [95]Table 1. Every hub
   gene set contained a substantial proportion (35-42%) of cancer-related
   genes, consistent with the observation that disease-related genes have
   more interacting partners than other genes [96]Wu et al. (2008).
   Criteria 2 of our approach is stricter than Criteria 1: genes were
   selected based on an ensemble of gene ranking and intramodular
   connectivity, which results in a reduced fraction of CR genes in the
   corresponding hub gene sets. However, the hub gene sets C1-HubMN and
   C2-HubMN identified from the constructed multiplex network showed the
   highest fraction of CR genes (42%) under both Criteria 1 and Criteria
   2. A small fraction of OC genes was also found in the hub gene sets
   extracted from the integrated Union network and Multiplex network,
   whereas no OC genes were found in the hub gene sets from the
   Intersection network. A set of 74 genes, named C1-CmnHub, was common
   to the three hub gene sets extracted using criteria 1, and 38 genes
   were common to all six hub gene sets extracted from the integrated
   networks. Of these, genes ADCY2, AKAP6, ASPM, F8, FCGR3A, FOXO4,
   HSDL2, LILRB1, ME3, MUC21, NDRG2, PIK3CG, TNNC1 and SIGLEC9 are known
   to be cancer related. Additionally, eight known oral cancer genes,
   CCL18, CCL4, TERT, RECK, LEPR, CDC20, LTA and HLADQA, were found in
   both C1-HubMN and C2-HubMN. The genes common to the different hub gene
   sets are illustrated in [97]Figs. 3a and 3b. These genes were highly
   connected and showed significant correlation between expression and
   methylation levels in the multiplex network, suggesting a possible
   regulatory effect on related genes. The prediction accuracy of the
   identified hub gene sets is assessed in the next section.
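   The set logic behind C1-CmnHub, the 38 genes common to all six sets
   and the final Cmn-Hub set can be sketched in a few lines of Python.
   The function names and the toy gene sets are hypothetical
   placeholders, not data from this study; the sketch only shows the
   intersection and union steps described above.

```python
from typing import Iterable, Set

def common_genes(hub_sets: Iterable[Iterable[str]]) -> Set[str]:
    """Genes present in every hub gene set (empty set if none given)."""
    sets = [set(s) for s in hub_sets]
    return set.intersection(*sets) if sets else set()

def build_cmn_hub(c1_sets, c2_sets, mn_c1, mn_c2, oc_genes):
    """Return (C1 common genes, genes common to all six sets, final hub set)."""
    c1_common = common_genes(c1_sets)               # analogue of C1-CmnHub
    all_common = c1_common & common_genes(c2_sets)  # common to all six sets
    # known OC genes present in both multiplex-network hub sets
    oc_in_mn = set(oc_genes) & set(mn_c1) & set(mn_c2)
    return c1_common, all_common, all_common | oc_in_mn

# toy example with placeholder gene names
c1 = [{"A", "B", "C", "X"}, {"A", "B", "X"}, {"A", "B", "C", "X", "O1"}]
c2 = [{"A", "X"}, {"A", "B", "X"}, {"A", "X", "O1"}]
c1_common, all_common, cmn = build_cmn_hub(c1, c2, c1[2], c2[2], {"O1", "O2"})
print(sorted(cmn))  # ['A', 'O1', 'X']
```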

Figure 3.


   Venn diagram showing different hub gene sets and their intersection.
   (a) Intersection of hub gene sets C1-HubUn, C1-HubIn and C1-HubMN
   extracted using criteria 1, giving 74 common genes (C1-CmnHub). (b)
   Intersection of hub gene sets C2-HubUn, C2-HubIn and C2-HubMN
   extracted using criteria 2 with C1-CmnHub, giving 38 common genes.

3.1. Classifier performance

   Hub genes in a gene network are high-degree nodes with more
   interacting partners than non-hub nodes. These genes direct the whole
   network structure and play an important role in multiple regulatory
   systems [100]He and Zhang (2006). Three classifiers, KNN (k-nearest
   neighbors, with k=3 chosen over repeated experimental trials), Random
   Forest (RF) and Support Vector Machine (SVM), were applied to evaluate
   how well each identified hub gene set predicts the disease state of
   test samples. To avoid results that depend on a single data split,
   classification with the hub genes as predictors was evaluated using
   10-fold cross-validation. All hub gene sets showed low variance in MCC
   for all three classifiers, as summarized in [101]Table 2. Every hub
   gene set achieved an MCC of at least 0.5. However, the hub genes
   selected from the integrated multiplex network showed the highest MCC
   values. All hub gene sets achieved comparable prediction accuracy,
   between 92% and 96%, and showed good sensitivity, specificity,
   precision and F-measure for both classes across the classifiers. Among
   them, C1-HubMN and C2-HubMN scored highest in terms of sensitivity and
   specificity. In the medical context, sensitivity and specificity are
   the key measures for evaluating a diagnostic model. Notably, the genes
   marked as hubs in the multiplex network were correlated at both the
   transcriptomic and epigenetic levels and were highly connected in the
   constructed multiplex system. These results indicate that the
   integrated multiplex network approach effectively identifies hub genes
   with improved prediction accuracy.
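   As an illustration of the evaluation protocol above, the following
   NumPy sketch runs a 3-nearest-neighbor classifier under stratified
   10-fold cross-validation and returns out-of-fold predictions. It is
   not the authors' code: the Euclidean distance, majority vote, binary
   0/1 labels (normal = 0, tumor = 1) and all function names are
   assumptions for illustration; RF and SVM would plug into the same
   fold loop.

```python
import numpy as np

def stratified_folds(y, n_folds=10, seed=0):
    """Assign each sample a fold id so that every class is spread across folds."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        folds[idx] = np.arange(idx.size) % n_folds
    return folds

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance).
    Labels are assumed binary 0/1, so an odd k cannot produce a tie."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].sum(axis=1) * 2 > k).astype(int)

def cross_validated_predictions(X, y, k=3, n_folds=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    folds = stratified_folds(y, n_folds, seed)
    pred = np.empty_like(y)
    for f in range(n_folds):
        test = folds == f
        if test.any():
            pred[test] = knn_predict(X[~test], y[~test], X[test], k)
    return pred
```

   Comparing the out-of-fold predictions with the true labels in a
   confusion matrix gives one set of Sn, Sp, Pr, FM and MCC values per
   classifier and hub gene set.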

Table 2.

   Classifier performance of Hub genes marked in the integrated networks.

                      |           3NN            |            RF            |           SVM
   Hub       NoG   Cl | Sn   Sp   Pr   FM   MCC  | Sn   Sp   Pr   FM   MCC  | Sn   Sp   Pr   FM   MCC
   C1-HubUn  1710  N  | 0.90 0.95 0.95 0.92 0.89 | 0.89 0.94 0.94 0.91 0.90 | 0.92 0.98 0.95 0.94 0.91
                   T  | 0.95 0.90 0.93 0.94 0.89 | 0.94 0.89 0.92 0.93 0.90 | 0.98 0.92 0.97 0.98 0.91
   C1-HubIn  78    N  | 0.82 0.99 0.97 0.89 0.86 | 0.77 0.89 0.96 0.87 0.84 | 0.82 0.99 0.97 0.89 0.86
                   T  | 0.99 0.82 0.93 0.96 0.86 | 0.89 0.77 0.92 0.96 0.84 | 0.99 0.82 0.93 0.96 0.86
   C1-HubMN  527   N  | 0.85 0.99 0.97 0.90 0.96 | 0.87 1.00 1.00 0.93 0.93 | 0.92 1.00 1.00 0.96 0.95
                   T  | 0.99 0.85 0.94 0.97 0.96 | 1.00 0.88 0.95 0.98 0.93 | 1.00 0.92 0.97 0.99 0.95
   C2-HubUn  721   N  | 0.92 0.97 0.92 0.92 0.89 | 0.87 1.00 1.00 0.93 0.91 | 0.82 1.00 1.00 0.90 0.88
                   T  | 0.97 0.92 0.97 0.97 0.89 | 1.00 0.87 0.95 0.98 0.91 | 1.00 0.82 0.94 0.97 0.88
   C2-HubIn  40    N  | 0.77 0.84 0.95 0.87 0.84 | 0.77 0.99 0.97 0.86 0.82 | 0.69 1.00 1.00 0.82 0.79
                   T  | 0.85 0.77 0.92 0.96 0.84 | 0.99 0.77 0.92 0.95 0.82 | 1.00 0.69 0.89 0.94 0.79
   C2-HubMN  233   N  | 0.82 0.99 0.97 0.89 0.98 | 0.80 1.00 1.00 0.89 0.93 | 0.95 0.99 0.97 0.96 0.95
                   T  | 0.99 0.82 0.93 0.96 0.98 | 1.00 0.80 0.93 0.96 0.93 | 0.99 0.95 0.98 0.99 0.95
   Cmn-Hub   46    N  | 0.96 0.98 0.98 0.91 0.98 | 0.90 0.95 0.97 0.97 0.95 | 0.92 0.99 1.00 0.96 0.96
                   T  | 0.99 0.95 0.91 0.98 0.98 | 0.85 0.90 0.99 0.99 0.95 | 0.99 0.92 0.97 0.99 0.96

   Hub gene sets marked in bold show significant predictive accuracy.

   C1-HubUn, C1-HubIn, C1-HubMN are three hub gene sets identified from
   Union Network, Intersection Network and Multiplex Network using
   criteria 1; C2-HubUn, C2-HubIn, C2-HubMN are three hub gene sets
   identified from Union Network, Intersection Network and Multiplex
   Network using criteria 2; Cmn-Hub is the hub gene set of 46 genes (the
   38 genes common to all six hub gene sets plus the 8 OC genes found in
   both C1-HubMN and C2-HubMN); NoG Number of genes,
   Cl class label, 3NN 3 nearest neighbors, RF random forest, SVM support
   vector machine, Sn sensitivity, Sp specificity, Pr precision, FM
   F-measure, MCC Matthews correlation coefficient; N normal and T tumor
   Oral cancer sample.
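   The per-class measures in Table 2 follow the standard confusion-matrix
   definitions. The plain-Python sketch below (function names are
   hypothetical, not from the paper) computes Sn, Sp, Pr, FM and MCC
   with one class treated as positive. Swapping the positive class from
   tumor to normal swaps TP with TN and FP with FN, which leaves MCC
   unchanged; this is why, in a binary problem, the N and T rows of each
   hub gene set share the same MCC value.

```python
import math

def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def class_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision, F-measure and MCC."""
    div = lambda a, b: a / b if b else 0.0
    sn = div(tp, tp + fn)          # sensitivity (recall)
    sp = div(tn, tn + fp)          # specificity
    pr = div(tp, tp + fp)          # precision
    fm = div(2 * pr * sn, pr + sn)  # F-measure (harmonic mean of Pr and Sn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = div(tp * tn - fp * fn, denom)
    return {"Sn": sn, "Sp": sp, "Pr": pr, "FM": fm, "MCC": mcc}

# hypothetical predictions: 50 tumor (T) and 20 normal (N) samples
y_true = ["T"] * 50 + ["N"] * 20
y_pred = ["T"] * 45 + ["N"] * 5 + ["T"] * 2 + ["N"] * 18
tumor = class_metrics(*confusion_counts(y_true, y_pred, "T"))
normal = class_metrics(*confusion_counts(y_true, y_pred, "N"))
```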

   The objective of this work was to identify a small number of driver
   genes that are crucial for understanding the genetic etiology of a
   complex disease and are potential drug-target candidates for OC. As
   all the hub gene sets performed competitively as predictor variables,
   we formed a final hub gene set, Cmn-Hub, consisting of 46 genes: the 38
   common genes (found in all six hub gene sets) and the 8 OC genes
   (found in both C1-HubMN and C2-HubMN). These 46 hub genes outperformed
   the larger sets, with an average sensitivity of 93% and an average
   specificity of 94% across the classifiers. The observed prediction
   accuracy of Cmn-Hub was 96%, with an average MCC of 0.96. ROC curves
   comparing the performance of Cmn-Hub, C1-HubMN and C2-HubMN are shown
   in [103]Fig. 4. Despite its smaller size (46 genes), the common hub
   gene set performed better as a classifier than the other identified
   hub gene sets. Additionally, the eigenvector centrality of these hub
   genes in the multiplex network was greater than 0.5, indicating
   strong connectivity with other hub genes in the network. This suggests
   that these genes are highly influential in the biological network and
   contribute most as predictor variables. The selected hub genes were
   further validated in terms of their connectivity and functional
   association with the known set of oral cancer genes. The list of hub
   genes is given in Supplementary Table S1. The differential expression
   analysis and biological importance of these 46 genes are discussed in
   the following sections.
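   The eigenvector-centrality screen mentioned above can be illustrated
   with a power-iteration sketch. The paper does not state which
   implementation or normalization was used for the multiplex network;
   here we assume an undirected, non-negative adjacency matrix and scale
   scores so that the most central gene has value 1, which is only one of
   several conventions (a cut-off such as 0.5 is meaningful only relative
   to the chosen scaling). Iterating with A + I instead of A keeps the
   same leading eigenvector but avoids the oscillation that plain power
   iteration shows on bipartite graphs such as a star.

```python
import numpy as np

def eigenvector_centrality(A, tol=1e-10, max_iter=1000):
    """Leading-eigenvector scores of a symmetric non-negative adjacency
    matrix, scaled so that the maximum score is 1 (an assumed convention)."""
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[0])
    for _ in range(max_iter):
        # (A + I) x: same leading eigenvector as A, no +/- oscillation
        x_new = x + A @ x
        x_new /= x_new.max()
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    raise RuntimeError("power iteration did not converge")

# toy star network: gene 0 linked to genes 1, 2 and 3
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])
scores = eigenvector_centrality(star)
hub_idx = np.flatnonzero(scores > 0.5)  # hypothetical 0.5 cut-off
```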

Figure 4.


   Comparative performance of the identified Hub gene sets (Cmn-Hub,
   C1-HubMN, C2-HubMN) for three different classifiers: k-nearest neighbor
   for k = 3 (3NN), random forest (RF) and support vector machine (SVM).
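   The ROC comparison in Fig. 4 can be reproduced in outline from
   out-of-fold classifier scores. This plain-Python sketch (hypothetical
   helper names) builds ROC points by sweeping a threshold over the
   scores, and computes the AUC as the probability that a randomly chosen
   tumor sample (label 1) scores higher than a randomly chosen normal
   sample (label 0), counting ties as one half. This rank-based AUC
   equals the trapezoidal area under the ROC points.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs for thresholds at each distinct score, highest first."""
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l != 1)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def rank_auc(scores, labels):
    """P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# hypothetical tumor-class probabilities for four samples
scores, labels = [0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]
auc = rank_auc(scores, labels)  # 0.75
```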

3.2. Functional association analysis

3.2.1. Association of the selected hub genes with the seed genes

   To further validate the filtered gene set, we examined the
   interconnection and functional association of our hub genes with the
   28 seed genes (Supplementary Table S2), which were selected for their
   association with the progression of OC or Head and Neck Squamous Cell
   Carcinoma (HNSCC). The 46 hub genes and the 28 seed oral cancer genes
   were mapped to the PPI network using the stringApp plug-in of
   Cytoscape, which maps genes to the STRING database of interacting
   proteins [106]Szklarczyk et al. (2015). STRING includes both physical
   interactions from experimental data and functional associations from
   curated pathways, automatic text mining and prediction methods. A
   minimum interaction score greater than 0.5 (optimum confidence) was
   used, and only query proteins were displayed. The co-expressed hub
   genes identified in the integrated multiplex network were also
   relatively close in the PPI network, as shown in [107]Fig. 5.
   Cancer-related genes such as FOXO4 and NDRG2 were strongly connected
   with the widely known cancer gene TP53. PIK3CG and CCL4 were strongly
   connected with HRAS, PIK3CA, NOTCH1 and JUN, which are known OC genes.
   CDC20, RECK and NDRG2 were connected with the seed gene PTEN. The hub
   gene TERT was closely connected with PTEN, TP53, HRAS and PIK3CA,
   which are benchmark genes for OC. This suggests that genes co-altered
   at multiple molecular levels are also closer in the known biological
   network. These genes merit further study of their prognostic influence
   in oral cancer. To refine the selection further, we next analyzed the
   biological significance of these hub genes.
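   The score filter and "query proteins only" display can be mimicked on
   an exported edge list. In this sketch the edge-list format (gene A,
   gene B, combined score on a 0-1 scale), the gene names and the scores
   are hypothetical placeholders, not STRING output. STRING's
   downloadable flat files report combined scores multiplied by 1000, so
   a 0.5 cut-off corresponds to 500 there.

```python
def filter_edges(edges, query, min_score=0.5):
    """Keep edges scoring above min_score whose endpoints are both query genes."""
    q = set(query)
    return [(a, b, s) for a, b, s in edges
            if s > min_score and a in q and b in q]

def seed_partners(edges, hubs, seeds):
    """Map each hub gene to the seed genes it is directly linked to."""
    hubs, seeds = set(hubs), set(seeds)
    partners = {}
    for a, b, _ in edges:
        for x, y in ((a, b), (b, a)):  # edges are undirected
            if x in hubs and y in seeds:
                partners.setdefault(x, set()).add(y)
    return partners

edges = [("HUB1", "SEED1", 0.9), ("HUB1", "SEED2", 0.55),
         ("HUB2", "SEED1", 0.45), ("HUB1", "OTHER", 0.8),
         ("SEED2", "HUB2", 0.7)]
query = {"HUB1", "HUB2", "SEED1", "SEED2"}
kept = filter_edges(edges, query)
links = seed_partners(kept, {"HUB1", "HUB2"}, {"SEED1", "SEED2"})
```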

Figure 5.


   Association of 46 significant hub genes and 28 seed genes in PPI
   network.

3.2.2. Identification of differentially expressed hub genes (DEHs)

   DE analysis was carried out to identify the hub genes showing
   significant expression changes between tumor and normal samples. A
   filtering threshold of padj(FDR) < 0.05 and |log2 fold change| > 2 was
   set to identify the DEHs. Of the 46 hub genes, 28 were significantly
   differentially expressed, as shown in [110]Fig. 6a. The list of DEHs
   is given in Supplementary Table S3. [111]Fig. 6b shows the 16
   up-regulated and 12 down-regulated DEHs. The most significantly
   up-regulated genes were CDC20, CCL4, LCP2, ASPM and FCGR3A, whereas
   NDRG2, MUC21, PERM1, TNNC1 and HSPB7 were the most significantly
   down-regulated. Among them, CCL4 and CDC20 are oral cancer genes, and
   ASPM, FCGR3A and TNNC1 are cancer related.
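   The DEH filter above can be written as a short NumPy sketch. It
   assumes expression values on a linear (non-log) scale with genes in
   rows and samples in columns, and adds a pseudocount of 1 before taking
   the log2 ratio of group means; the pseudocount, the data layout and
   the function names are illustrative assumptions. Because
   down-regulated genes have negative log2 fold changes, the cut-off is
   applied to the absolute value.

```python
import numpy as np

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """Per-gene log2 ratio of mean tumor to mean normal expression."""
    tumor, normal = np.asarray(tumor, float), np.asarray(normal, float)
    return np.log2((tumor.mean(axis=1) + pseudocount)
                   / (normal.mean(axis=1) + pseudocount))

def split_dehs(genes, padj, lfc, alpha=0.05, min_abs_lfc=2.0):
    """Split genes passing both filters into up- and down-regulated lists."""
    up, down = [], []
    for gene, q, l in zip(genes, padj, lfc):
        if q < alpha and abs(l) > min_abs_lfc:
            (up if l > 0 else down).append(gene)
    return up, down

# hypothetical adjusted p-values and fold changes for four genes
up, down = split_dehs(["G1", "G2", "G3", "G4"],
                      [0.01, 0.01, 0.2, 0.001],
                      [3.0, -2.5, 4.0, 1.0])
```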

Figure 6.


   Differential expression analysis of hub genes. (a) Volcano plot: blue
   dots represent genes with significantly altered expression (FDR < 0.05
   and |log2FC| > 2). (b) The 16 up-regulated and 12 down-regulated DEHs
   with their p-values.

3.2.3. Functional association analysis of the selected hub genes

   To further reveal the biological implications of the selected genes,
   Gene Ontology (GO) functional and pathway enrichment analysis of the
   46 hub genes together with the seed genes was performed using DAVID
   [114]Sherman et al. (2009). A gene count >= 2 and an EASE score <= 0.1
   were set as cut-off parameters. The EASE score is a modified Fisher
   exact test p-value used for GO enrichment analysis [115]Hosack et al.
   (2003). A threshold of p < 0.05 was used to identify significant
   functional and pathway categories. GO functional enrichment analysis
   was carried out for three functional groups: (1) biological processes,
   (2) cellular components and (3) molecular functions.
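   To make the EASE cut-off concrete: DAVID describes the EASE score as a
   modified one-sided Fisher exact p-value in which one gene is removed
   from the count of list genes annotated to the term, which penalizes
   terms supported by very few genes. The pure-Python sketch below
   follows that description; the 2x2 layout (term membership by gene list
   versus background) and the treatment of background counts are
   assumptions for illustration, so values may differ slightly from
   DAVID's own output.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment of cell a in the table
    [[a, b], [c, d]], i.e. P(X >= a) under the hypergeometric null."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)

def ease_score(list_hits, list_size, bg_hits, bg_size):
    """Modified Fisher p-value: one list hit is removed before testing.
    Rows: in term / not in term; columns: gene list / background."""
    a = max(list_hits - 1, 0)
    b = bg_hits
    c = list_size - list_hits
    d = bg_size - bg_hits
    return fisher_greater(a, b, c, d)
```

   A term would then be kept when its gene count is at least 2 and its
   EASE score does not exceed 0.1.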

   The enriched biological processes involving the hub genes are
   summarized in Supplementary Table S4. The identified hub genes were
   significantly enriched in biological processes including immune
   response, cell proliferation, apoptosis, cell cycle arrest, signal
   transduction and Ras protein signal transduction, together with the
   benchmark OC genes NOTCH1, JUN and HRAS. JUN is responsible for OC
   metastasis, whereas aberrant expression of NOTCH1 drives cell
   proliferation. Proper functioning of the immune system requires a
   balanced interaction between immune and non-immune cells. Chemokines
   regulate leukocyte migration and trafficking to their destined tissues
   [116]Chakraborty et al. (2018). The proteins encoded by the hub genes
   CCL4 and CCL18 have chemotactic and inflammatory functions. Aberrant
   expression of these genes may disrupt the immune system and lead to
   the formation of resistant tumors. The involvement of the
   cancer-related gene PIK3CG in many immune-regulating processes
   warrants further analysis of the genetic etiology of the disease. The
   rate of growth in normal and cancer tissues is determined by a balance
   between cell proliferation and apoptosis. As cancer cells grow
   uncontrollably, cell cycle regulation is also disrupted. Somatic
   mutation of TP53 plays a significant role here, as TP53 regulates the
   cell cycle, cellular differentiation, DNA repair and apoptosis
   [117]Nanda et al. (2011); [118]Polyak et al. (1994). The hub gene
   CDC20, which was significantly up-regulated in OC, is significantly
   enriched in biological processes related to cell proliferation. This
   implies that deregulated expression of CDC20 may act through a series
   of downstream genes to drive cellular proliferation and alter
   apoptotic cell death. Genes CASQ2, MYH6 and MYH7, which are not known
   to be cancer related, were highly enriched in GO terms related to
   muscle contraction. Muscle contraction has been reported as a key
   biological process in OC, reflecting the presence of myofibroblasts in
   the tumor stroma of patients with lymph node involvement [119]Shaikh
   et al. (2019). A series of genes including the cancer-related LILRB1
   and FOXO4 were predominantly enriched in cell cycle arrest, immune
   response and regulation of the apoptotic process. Apoptosis is one of
   the key processes in oral cancer progression. Additionally, genes
   PIK3CG, TERT, CCL4 and CCL18 were highly enriched in biological
   processes such as neutrophil chemotaxis and angiogenesis. Angiogenesis
   is one of the crucial factors in cancer progression [120]Nishida et
   al. (2006). Neutrophils at the primary tumor site exert a broad
   spectrum of tumor-promoting functions, including the secretion and
   proteolytic activation of proangiogenic factors, which drives
   angiogenesis [121]Granot and Jablonska (2015). [122]Fig. 7 shows the
   most significant biological processes observed in the identified hub
   genes.

Figure 7.


   Most significant biological processes (P-value<0.05) observed in the
   identified hub genes.

   The enriched cellular components (CC) and molecular functions (MF) of
   the putative hub genes are summarized in Supplementary Tables S5 and
   S6, respectively. The hub genes were mostly enriched in the cytosol
   (CC), phosphatidylinositol 3-kinase complex (CC), plasma membrane
   (CC), protein complex (CC) and muscle myosin complex (CC), together
   with the verified seed genes TP53, NOTCH1, JUN, PIK3CA and HRAS. We
   also noted protein binding (MF), enzyme binding (MF), transcription
   factor binding (MF), kinase activity (MF) and
   phosphatidylinositol-4,5-bisphosphate 3-kinase activity (MF) among
   the enriched molecular functions involving genes PIK3CG, CDC20, CCL4,
   ADCY2 and TERT. These results suggest a direct or indirect association
   between OC and the functional activity and localization of the
   proteins encoded by the scrutinized hub genes.

   DAVID also produced a KEGG pathway enrichment analysis for the 46
   identified hub genes together with the widely accepted seed genes.
   The highlighted pathways were the p53 signaling pathway, chronic
   myeloid leukemia, small cell lung cancer, pathways in cancer, the ErbB
   signaling pathway, the PI3K-Akt signaling pathway and focal adhesion.
   In general, these pathways are responsible for cell cycle regulation,
   signal transduction and many cancer-mediated processes. Pathways in
   cancer, the p53 signaling pathway and small cell lung cancer are
   directly attributed to OC. The p53 protein regulates the cell cycle
   and acts as a tumor suppressor. Disrupted regulation of p53 signaling
   elements deregulates numerous tumor-suppressing processes, including
   DNA repair, cell cycle arrest and apoptosis [125]Stegh (2012). The p53
   gene is one of the most frequently mutated genes in oral cancer
   [126]Rowley et al. (1998). Isoforms of the oncoprotein BCR-ABL have
   constitutive tyrosine kinase activity and cause chronic myeloid
   leukemia. Deregulation of these isoforms causes altered cellular
   adhesion, activation of mitogenic signaling pathways and inhibition of
   apoptosis [127]Melo and Deininger (2004). The hub genes PIK3CG and
   PIK3R5 were identified as markers in chronic myeloid leukemia and play
   an important role in cell growth, proliferation, differentiation,
   motility, survival and intracellular trafficking. Mutation of the
   proto-oncogene PIK3CA is a key factor in cell proliferation and
   survival through activation of the PI3K/Akt signaling pathway. PIK3CA,
   together with PTEN, plays a pivotal role in the PI3K-Akt signaling
   pathway and is hence accepted as a biomarker for oral carcinogenesis
   [128]Chang et al. (2013). The hub genes PIK3CG and PIK3R5, along with
   the seed gene PIK3CA, were observed in many regulatory pathways in OC.
   Overexpression of PIK3CG in tumor cells suppresses cell proliferation,
   and loss of function of this protein contributes to tumorigenesis
   [129]Semba et al. (2002). Focal adhesion kinase is a cytoplasmic
   tyrosine kinase identified as a key mediator of signaling by
   integrins, a major family of cell surface receptors, and by other
   receptors in both normal and tumor cells [130]Zhao and Guan (2009). It
   is an important mediator of cell growth, proliferation, survival and
   migration, all of which are often dysfunctional in cancer cells. The
   pathway enrichment analysis indicates that the hub genes marked by our
   approach are related to the development of cancer. The most
   significant pathways observed in this analysis are shown in [131]Fig.
   8. A detailed list of enriched pathways is given in Supplementary
   Table S7.

Figure 8.


   Highly enriched list of pathways (P-value<0.05) observed in the
   identified hub genes.

4. Discussion

   In India, oral cancer ranks among the top three cancers and accounts
   for about thirty percent of prevalent cancer cases. Early diagnosis
   and prevention of OC are priorities at global health forums
   [134]Kulkarni (2013). Genomic studies show that hub genes in a genetic
   network play an influential role in regulating the whole network
   structure. Thus, identification of such genes related to specific
   cancer types can help reduce the gap in OC prognosis.

   In this study, a biological network based integrative approach was
   proposed for the identification of hub genes related to OC, using data
   from both genomic and epigenomic levels. Gene co-expression and gene
   co-methylation networks were constructed using WGCNA. Next, three
   integrated networks, the Union Network, Intersection Network and
   Multiplex Network, were generated from the constructed correlation
   networks. In the Multiplex network, the gene co-expression and
   co-methylation networks form separate layers of the multiplex system,
   connected by coupling edges, so the interplay among interacting units
   across layers is preserved. Degree centrality (multi-degree
   centrality) and eigenvector centrality of genes were computed to
   measure the connectivity of genes in the multiplex network. Based on
   the connectivity of genes in the integrated networks and the
   correlation networks, criteria 1 and criteria 2 were defined for
   identifying hub genes from the integrated networks, resulting in six
   sets of hub genes. In total, 38 hub genes were common to all the hub
   gene sets.
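   The three integration schemes summarized above can be illustrated on
   binary adjacency matrices. This NumPy sketch assumes that the
   co-expression and co-methylation networks have already been
   thresholded to 0/1 adjacency over the same ordered gene list (WGCNA
   itself produces weighted adjacencies, so the thresholding step is an
   assumption), uses identity coupling between the two copies of each
   gene in a two-layer supra-adjacency matrix, and takes the overlapping
   degree (a gene's degree summed over layers) as one common definition
   of multi-degree. The paper's exact definitions may differ, and all
   function names are hypothetical.

```python
import numpy as np

def union_network(A1, A2):
    """Edge present if it exists in either layer."""
    return ((A1 > 0) | (A2 > 0)).astype(int)

def intersection_network(A1, A2):
    """Edge present only if it exists in both layers."""
    return ((A1 > 0) & (A2 > 0)).astype(int)

def supra_adjacency(A1, A2, coupling=1.0):
    """Two-layer multiplex: intra-layer blocks on the diagonal, and each
    gene linked to its own copy in the other layer (identity coupling)."""
    C = coupling * np.eye(A1.shape[0])
    return np.block([[A1, C], [C, A2]])

def overlapping_degree(A1, A2):
    """Intra-layer degree of each gene summed over both layers."""
    return (A1 > 0).sum(axis=1) + (A2 > 0).sum(axis=1)

# toy three-gene layers: co-expression (A1) and co-methylation (A2)
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
S = supra_adjacency(A1, A2)
deg = overlapping_degree(A1, A2)
```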

   To examine the performance of these genes as predictor variables, the
   classification accuracy of the hub gene sets was evaluated with the
   KNN, RF and SVM classifiers. All the hub gene sets attained good MCC
   values. However, the hub gene sets constructed from the Multiplex
   network (C1-HubMN and C2-HubMN) achieved the highest scores for MCC,
   sensitivity and specificity. This supports the ability of the
   Multiplex network to exploit different types of interactions among
   biological entities. We further identified 46 genes as our putative
   hub genes, comprising the 38 common genes (found in all six hub gene
   sets) and 8 OC genes (found in both C1-HubMN and C2-HubMN). Notably,
   Cmn-Hub, with only 46 genes, achieved a prediction accuracy of 96%,
   higher than the other hub gene sets, whose sizes range from 40 to 1710
   genes. We also investigated the interconnection and functional
   association of these 46 hub genes with the 28 seed genes known to be
   associated with OC, by mapping them to the PPI network using the
   stringApp plug-in of Cytoscape. Multiplex centrality analysis
   considers the participation of a gene at multiple levels of the
   biological system; these genes are influential at both the
   transcriptomic and epigenetic levels, with eigenvector centrality of
   more than 0.5. The differential expression analysis identified 28 hub
   genes with a significant difference in expression levels between the
   two experimental groups. Among them, CDC20, CCL4, LCP2, ASPM and
   FCGR3A were the most significantly up-regulated, and NDRG2, MUC21,
   PERM1, TNNC1 and HSPB7 were significantly down-regulated. The DAVID
   enrichment analysis showed that the marked set of genes, including
   PIK3CG, PIK3R5, CCL4, CDC20, TERT, LTA and MYH7, takes part in various
   biological processes, including immune response, cell proliferation,
   apoptosis, cell cycle arrest, signal transduction, muscle contraction,
   neutrophil chemotaxis and angiogenesis, and in pathways such as the
   p53 signaling pathway, chronic myeloid leukemia, pathways in cancer,
   the ErbB signaling pathway and the PI3K-Akt signaling pathway along
   with the seed genes. More importantly, this approach identifies hub
   genes that are co-expressed at multiple levels, both transcriptomic
   and epigenetic. The proposed multi-layered data integration approach
   provides researchers with new insight for detailed exploration of
   tumorigenesis and progression in OC by systematically enhancing the
   study of gene action.

5. Conclusions

   Diagnosis of OC at an early stage leads to better therapeutic
   outcomes. Previous studies have shown that genes encoding hub proteins
   are enriched for disease genes and are preferentially targeted by
   pathogens. Thus, identification of such hub genes may give researchers
   further clues for a thorough understanding of disease pathogenesis.

   This work aimed to identify a set of hub genes crucial for OC using an
   integrated framework of multi-omics data. Three integrated networks,
   the Union Network, Intersection Network and Multiplex Network, were
   constructed using the expression levels and methylation intensities
   of genes. A set of 46 genes was identified as hub genes and validated
   in terms of statistical competence and biological importance. We
   identified five genes, PIK3CG, PIK3R5, MYH7, CDC20 and CCL4, that were
   predominantly enriched in various biological processes and pathways
   crucial for OC. This work suggests an alternative research direction
   for biological network analysis. The genes identified by our approach
   are potential prognostic biomarkers of tumorigenesis and the
   underlying molecular events in OC, and our findings provide a basis
   for more extensive investigation of OC.

Declarations

Author contribution statement

   S. Mahapatra: Conceived and designed the experiments; Performed the
   experiments; Analyzed and interpreted the data; Wrote the paper.

   R. Bhuyan, J. Das: Analyzed and interpreted the data.

   T. Swarnkar: Conceived and designed the experiments; Wrote the paper.

Funding statement

   This research did not receive any specific grant from funding agencies
   in the public, commercial, or not-for-profit sectors.

Data availability statement

   Data associated with this study can be found at The Cancer Genome Atlas
   (TCGA).

Declaration of interests statement

   The authors declare no conflict of interest.

Additional information

   Supplementary content related to this article has been published online
   at [135]https://doi.org/10.1016/j.heliyon.2021.e07418.

   No additional information is available for this paper.

Footnotes

   ^1

   [136]https://portal.gdc.cancer.gov/.
   ^2

   [137]http://www.ncbi.nlm.nih.gov/geo/.
   ^3

   [138]https://string-db.org/.

Supplementary material

   The following Supplementary material is associated with this article:
   Supplementary-S1.xlsx

   List of identified 46 Hub genes (Cmn-Hub).
   [139]mmc1.xlsx^ (8.8KB, xlsx)
   Supplementary-S2.xlsx

   List of 28 seed genes.
   [140]mmc2.xlsx^ (7.5KB, xlsx)
   Supplementary-S3.xlsx

   List of DEHs identified in the Hub gene set: Cmn-Hub.
   [141]mmc3.xlsx^ (9.4KB, xlsx)
   Supplementary-S4.xlsx

   Significantly enriched list of biological processes of selected 46 Hub
   Genes.
   [142]mmc4.xlsx^ (12.5KB, xlsx)
   Supplementary-S5.xlsx

   Significantly enriched list of cellular components involved in selected
   46 Hub Genes.
   [143]mmc5.xlsx^ (8.1KB, xlsx)
   Supplementary-S6.xlsx

   Significantly enriched list of molecular functions involved in selected
   46 Hub Genes.
   [144]mmc6.xlsx^ (8.2KB, xlsx)
   Supplementary-S7.xlsx

   Significantly enriched pathway terms involved in selected 46 Hub Genes.
   [145]mmc7.xlsx^ (11.5KB, xlsx)

References