Abstract Network theory has attracted much attention from the biological community because of its high efficacy in identifying tumor-associated genes. However, most researchers have focused on single networks of single omics, which have less predictive power. With the available multiomics data, multilayer networks can now be used in molecular research. In this study, we achieved this with the construction of a bilayer network of DNA methylation sites and RNAs. We applied the network model to five types of tumor data to identify key genes associated with tumors. Compared with the single network, the proposed bilayer network resulted in more tumor-associated DNA methylation sites and genes, which we verified with prognostic and KEGG enrichment analyses. Keywords: bilayer network, centrality, DNA methylation, RNA 1. Introduction Biological processes include complex molecular interactions that can be efficiently characterized by network models [[30]1]. Since the early 2000s, much attention has been paid to the structure and function of molecular interaction networks. Metabolic, protein, and gene coexpression networks possess small-world and scale-free characteristics [[31]2,[32]3,[33]4,[34]5,[35]6]. Additionally, the robustness of the molecular interaction networks can help us understand the mechanisms behind complex diseases [[36]7,[37]8]. In particular, disease-associated genes need to be identified from the network perspective [[38]9,[39]10,[40]11]. Although experimental methods can more accurately identify disease-associated genes, they are costly and time consuming [[41]12]. Therefore, many computational approaches have been proposed to identify disease-associated genes, for example, through gene expression analysis or some machine learning methods [[42]13,[43]14]. However, these gene expression analyses are subjected to many uncertainties, and therefore they are not very accurate [[44]15], but machine learning methods often lead to overfitting and the results cannot be adequately explained [[45]16]. Because of their accuracy and interpretability, molecular interaction networks have attracted considerable attention from the biological community. Specifically constructing a proper network model and using it to identify disease-associated genes are fundamental problems. In the literature, many approaches have been developed through nodal centrality. For instance, the tumor-associated genes in gene coexpression networks have more edges than nonassociated genes, requiring the adoption of the degree centrality to identify these genes [[46]17]. Additionally, the betweenness centrality, defined on the basis of shortest paths, has been adopted in protein–protein interaction networks to identify tumor-associated proteins [[47]18]. On the basis of these two measures, many variants have been developed, such as the PageRank centrality [[48]19] and the HITS algorithm [[49]20]. The different centrality indicators enable the identification of hub nodes in the network from different perspectives [[50]21]. In recent decades, multilayer networks have been widely used to thoroughly analyze biological systems because of the multiple and complex interactions between molecules [[51]22,[52]23]. The multilayer networks generated from multiomics provide more information than single networks [[53]24]. For instance, Cantini et al. [[54]25] constructed four-layer network including transcription factor cotargeting, microRNA cotargeting, protein–protein interactions, and gene coexpression, and showed that the multilayer network communities were enriched in biological components involved in the oncogenic process that could not be determined from the coexpression network alone. Ehsan et al. [[55]26] applied the random walk algorithm to a three-layer network containing gene coexpression, regulatory, and physical interaction layers, and identified several hub genes affecting colon carcinoma. Mahapatra et al. [[56]27] proposed a multiplex network model using gene expression and gene methylation data from oral cancer to identify hub nodes. During the analysis, the centrality measure was still used. Zhang et al. [[57]28] built a miRNA–protein expression network for breast cancer patients and combined degree and betweenness centralities to find new miRNAs as biomarkers. Wang et al. [[58]29] proposed EDCPID centrality based on tensor decomposition and applied it to a yeast landscape and H3N2 inflammatory and lung cancer multilayer networks, being able to effectively identify hubs. Chen et al. [[59]30] calculated the betweenness centrality of the protein layer nodes in the network and filtered out the top 10% proteins in a four-layer network of ingredient–protein–metabolic pathway disease associated with the Xiaochaihu decoction. Despite many studies on multilayer molecular interaction networks, we notice two shortcomings. First, in most multilayer network models (e.g., [[60]26,[61]28]), the focus has been gene and protein coupling, but few researchers have considered epigenomic data. Second, the construction of each layer of a multilayer network uses only the corresponding single omics, whereas the impact of other omics has usually been ignored (e.g., [[62]25,[63]26]). In this study, we used DNA methylation and RNA interaction data to construct a bilayer network of DNA methylation sites–RNAs. In particular, we considered the interactions between the genes corresponding to two DNA methylation sites in the RNA layer to determine whether the correlation between these two DNA methylation sites is reliable. Applying the method to four typical tumors, we identified more tumor-associated genes and DNA methylation sites in the network with centrality indicators. The results of the prognostic analysis of these hub nodes showed that disease-associated DNA methylation sites could be more accurately found using a bilayer network than with a single network. The results of KEGG pathway analysis confirmed that the hub genes identified through the bilayer network were closely associated with tumors. 2. Materials and Methods 2.1. Data Collection and Preprocessing We obtained DNA methylation datasets (Illumina Human Methylation 450K, level 3), gene expression datasets (IlluminaHiSeq_RNASeqV2, level 3), and clinical datasets for lung squamous cell carcinoma (LUSC), breast cancer (BRCA), endometrioid cancer (UCEC), kidney cancer (KIRC), and bladder cancer (BLCA) from the TCGA database [[64]31]. The [MATH: β :MATH] values were derived at the Johns Hopkins University and University of Southern California TCGA genome characterization center, which are continuous variables between 0 and 1, representing the ratio of the intensity of the methylated bead type to the combined locus intensity [[65]32]; we removed the probes with β values of “NA” and those not in the gene regions considered in our analysis. Clinical data included time of death and death status of patients ([66]Table S1). We obtained the RNA binding relationships from the RNAInter database [[67]33], which is a comprehensive RNA interactome resource. The database scores the confidence level of each interaction relationship by combining literature support and experimental validation, where a score above 0.75 indicates the existence of strong experimental evidence for the interaction relationship [[68]34]. 2.2. Network Construction Currently, DNA methylation networks are generally constructed on the basis of the correlation coefficients between sites [[69]35]. However, DNA methylation sites mainly regulate gene expression by recruiting proteins involved in gene expression or by inhibiting the binding of transcription factors to DNA, with no direct relationship between the sites [[70]36]. Therefore, networks based on Pearson correlation coefficients between the sites are inaccurate. The methylation level of a site affects the expression of the corresponding genes [[71]37], which motivated us to construct a bilayer network of DNA methylation sites and RNAs. 2.2.1. Construction of the DNA Methylation and RNA Interaction Networks We determined the differences in the DNA methylation sites between tumor and paraneoplastic tissue using empirical Bayes’ moderated t-test method, contained in the limma package [[72]38] in R (version 4.0.3, Guido Masarotto, AT, 2020). To reduce the risk of false positives, we adjusted p-values with the Benjamini–Hochberg false discovery rate (FDR) method. We used an FDR < 0.01 as the significance threshold [[73]39]. We then joined the edges using the cutoff of the Pearson’s correlation coefficients between sites. The Pearson correlation coefficient [[74]40] between the different DNA methylated sites was defined as follows: [MATH: r=< mi>i=1n(XiX¯)(YiY¯)i=1 n(XiX¯)2i=1n< mrow>(YiY¯)2 :MATH] (1) where [MATH: n :MATH] is the number of samples in the tumor dataset, [MATH: Xi :MATH] is the level of the DNA methylation of the [MATH: ith :MATH] sample, and [MATH: X¯ :MATH] is the average level of the DNA methylation at the site. We calculated all [MATH: r :MATH] values with the C language program. The screening of edges in correlation networks is mainly based on the hypothesis test p-value or the cutoff of correlation coefficients. Because the p-value is easily affected by the sample size [[75]41], we adopted a cutoff of the correlation coefficients to construct the DNA methylation network. Here, two DNA methylation sites were linked if the correlation coefficient [MATH: |r|>0. 8 :MATH] [[76]42]. RNAs can regulate each other through binding relationships [[77]43], which mainly depend on the structure of the RNA and the base sequence [[78]44] and are relatively stable [[79]45]. For the RNA layer, we therefore used the binding relationship between RNAs. For the edges of the RNA layer, we used the RNA interactions with confidence scores >0.75 from the RNAInter database, which are supported by strong experimental evidence [[80]34]. 2.2.2. Construction of Bilayer Network of DNA Methylation Sites–RNAs The links between the DNA methylation layer and the RNA layer are the DNA methylation sites connected with the corresponding genes. Using the edge information in the RNA layer, we filtered the edges in the DNA methylation layer. Specifically, we required two methylation sites to satisfy one of the following conditions: (i) being located on the same gene; (ii) the corresponding genes connected at the RNA layer; and (iii) the corresponding genes did not have an edge in the RNA layer, but they connected to an intermediate gene. Edges between DNA methylation sites that did not meet one of the above conditions were removed, even though they were strongly correlated with each other. The above criteria were determined using the networkX package in python (version 3.8.1). 2.3. Network Indicators 2.3.1. Degree Centrality (DC) Degree centrality [[81]46] represents the number of connected edges that a node has with other nodes in the network. The degree centrality of node [MATH: vi :MATH] is defined as [MATH: DC(v< mi>i)=jAijN1 :MATH] (2) where [MATH: N :MATH] is the total number of nodes in the network, and [MATH: A :MATH] represents the adjacency matrix of the network. The larger the degree of centrality of a node, the more important it is the in the network [[82]3] ([83]Figure 1). Figure 1. [84]Figure 1 [85]Open in a new tab Illustration of a hub node based on the (left) degree and (right) betweenness centralities. 2.3.2. Betweenness Centrality (BC) Betweenness centrality [[86]47] is a measure of the participation of a node in the shortest paths in a network. The betweenness centrality of node [MATH: vi :MATH] is defined as [MATH: BC(v< mi>i)=VS ViV t,s<tσst(vi )σst :MATH] (3) where [MATH: σst :MATH] is the number of shortest paths from [MATH: s :MATH] to [MATH: t :MATH] , and [MATH: σst(vi) :MATH] is the number of shortest paths from [MATH: s :MATH] to [MATH: t :MATH] that pass through node [MATH: vi :MATH] . In biological networks, nodes with high betweenness centrality generally play a key role in the connectivity of the network, such as communicating two modules and serving as a bridge to connect them [[87]48] ([88]Figure 1). In this study, only the DNA methylation layer was different in the bilayer network for various tumors, and therefore we performed the centrality analysis only for the DNA methylation layer. 2.3.3. Average Degree The average degree, usually denoted by [MATH: k :MATH] , is the average of the degrees of all nodes [[89]49]: [MATH: k=AijN :MATH] (4) The average degree can indicate how many neighbors the nodes have, on average, in the network. 2.3.4. ER Random Network The ER random network model [[90]49] is an equal-opportunity network model. In this model, given a certain number of nodes, a node has the same probability of inter-relationship (connection) with any other node, denoted as the edge probability [MATH: p :MATH] of the network. 2.3.5. Clustering Coefficient The clustering coefficient [[91]49] is a coefficient used to describe the level to which nodes in a graph form clusters with each other. It considers not only the number of neighbors of a node, but also the relationships between neighboring nodes. For example, the number of neighbors of node [MATH: i :MATH] is [MATH: ki :MATH] . These neighboring nodes have at most [MATH: ki(ki1)/2 :MATH] edges between them. The clustering coefficient [MATH: Ci :MATH] for node [MATH: i :MATH] is defined as the ratio of the number of edges [MATH: Ei :MATH] formed between the neighboring nodes of that node and the maximum number of possible edges: [MATH: Ci=2Eiki(ki< mo>−1) :MATH] (5) The higher the clustering coefficient, the more compact the network. The average clustering coefficient [MATH: Cnetwork :MATH] is the average of the clustering coefficients of all nodes in the network. In biological networks, variations in the clustering coefficient are generally used to characterize the degree of modularity of the network [[92]50]. 2.3.6. Shortest Path Length The shortest path length, [MATH: dij :MATH] between nodes [MATH: i :MATH] and [MATH: j :MATH] is defined as the number of edges on the shortest path connecting these two nodes [[93]49]. The average shortest path is defined as the average of the paths between any two nodes in the network, and it reflects the tightness of the network: [MATH: lnetwork=112N(< mrow>N+1)ij< /mi>dij :MATH] (6) The concept of the shortest path was used to find functional clusters in biological systems [[94]51]. We calculated all the above metrics in a network using the networkX package in python (version 3.8.1). 2.4. Statistical Analysis 2.4.1. Chi-Squared Test The chi-squared test is used to test the level of deviation between the actual observed and theoretically inferred values of a sample [[95]52]. The null hypothesis is that the observed frequencies do not differ from the expected frequencies, and the alternative hypothesis is that the observed frequencies differ from the expected frequencies. The chi-squared test statistic is defined as [MATH: χ2=< mo>(fofe)2fe :MATH] (7) where [MATH: fo :MATH] and [MATH: fe :MATH] represent the observed and theoretical values, respectively. For the chi-squared test of column independence, the degrees of freedom are [MATH: df=( R1)( C1) :MATH] , where [MATH: R :MATH] and [MATH: C :MATH] denote the number of rows and columns in the table, respectively. The chi-squared test requires the degree of freedom to determine the significance level of the statistic. The chi-squared table or statistical software can be used calculate the corresponding p value according to the chi-squared value and the degree of freedom. We conducted the chi-squared test in this study with the CHISQ.TEST function in Excel [[96]53]. We considered p  <  0.05 as statistically significant. 2.4.2. Log Rank Test The log rank test is used to test the significant differences in the location of the distribution of the overall population in which the test data are located in the case of an arbitrary overall distribution [[97]54]. By arranging the observations in ascending order, each observation is numbered in order, which is called the rank. The test is then performed by calculating the rank sum for each of the two groups of observations. The null hypothesis is that the overall distribution of the two groups is the same, and the alternative hypothesis is that the overall distribution of the two groups is different. The rank sum of the smallest group of sample size is used as the t-test statistic. In this study, we performed the log rank test using the lifelines package in Python (version 3.8.1). We considered p  <  0.05 as statistically significant, indicating the distribution of the two groups was significantly different. 2.5. Identification of Differentially Expressed Genes (DEGs) We determined the differentially expressed genes between the tumor and paraneoplastic tissues using the empirical Bayes’ moderated t-test method, contained in the limma package [[98]38] in R (version 4.0.3). We calculated log2 (fold change) using the average expression of the two groups of genes. The thresholds were FDR < 0.05 and | log2 (fold change) | > 1 [[99]55]. 2.6. Survival Analysis We used survival analysis to examine the relationship between the DNA methylation sites and overall survival (OS). We divided patients into high- and low-risk groups on the basis of the mean β value of the site. We analyzed the difference between the two groups with KM analysis [[100]56] on the basis of the lifelines package in Python (version 3.8.1). A log rank p < 0.05 was considered statistically significant. 2.7. KEGG Pathway Enrichment Analysis We performed functional enrichment analysis for genes that we found only in the bilayer network. We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [[101]57] tool for the KEGG enrichment analysis based on the hypergeometric distribution to compute the p-values [[102]58]. We set [MATH: p<0.05 :MATH] as the threshold value. 3. Results 3.1. Characteristics of DNA Methylation Sites–RNAs Bilayer Network On the basis of the DNA methylation data from the TCGA database and the RNA interaction information from the RNAInter database, we obtained bilayer networks of the DNA methylation sites and RNAs ([103]Figure 2) for five types of tumors: LUSC, BRCA, UCEC, KIRC, and BLCA. Figure 2. [104]Figure 2 [105]Open in a new tab Flow of DNA methylation–RNA bilayer network analysis. Nodes in DNA methylation layer are blue, representing DNA methylation sites; nodes in the RNA layer are red, representing RNAs. We constructed the DNA methylation network on the basis of the cut-off correlation coefficient and the RNA network on the basis of RNA binding. According to the relationship between the genes corresponding to DNA methylation sites in the RNA layer, we performed edge filtering between the DNA methylation sites. After the bilayer network was constructed, we analyzed the hub nodes according to centrality indices. Although the average degree in the DNA methylation layer considerably varied (e.g., 19 for LUSC compared with 4 for UCEC), the degree distributions of the five tumors showed right-skewed behavior, implying a scale-free characteristic ([106]Figure 3). In this scenario, a few nodes in the network had a large number of edges, and thus they were identified as hubs. We also noticed the small-world property. As shown in [107]Table 1, the average clustering coefficient was high, and the average shortest path length was low. We obtained all the results from the edge-filtered DNA methylation layer, which was much sparser than the single DNA methylation network directly generated from correlations. Figure 3. [108]Figure 3 [109]Open in a new tab Distributions of nodal degree distributions and shortest path lengths for five datasets: (a,f) LUSC; (b,g) BRCA; (c,h) UCEC; (d,i) KIRC; and (e,j) BLCA. [MATH: Nα :MATH] represents number of nodes in the network, [MATH: Eα :MATH] represents number of edges in the network, and represents average degree of nodes in the network. Table 1. Information of the DNA methylation layer network. Tumor Type Number of Nodes in the DNA Methylation Layer Number of Edges in the DNA Methylation Layer Average Degree Average Clustering Average Clustering of the ER Network Average Path Length Average Path Length of the ER Network LUSC 66,291 640,801 19.333 0.451 0.003 8.131 11.102 BRCA 43,515 289,859 13.322 0.436 0.003 6.227 10.681 UCEC 40,562 74,230 3.660 0.495 0.001 4.559 10.611 KIRC 33,910 137,679 8.120 0.430 0.002 6.907 10.431 BLCA 32,355 148,856 9.201 0.466 0.002 5.596 10.385 [110]Open in a new tab Next, we analyzed the structural properties of the RNA layer. Despite various tumors, the structure remained unchanged, containing 8087 RNAs and 20,128 RNA binding relationships. Most of nodes in the network were mRNAs, and most of edges were produced between noncoding RNAs and mRNAs, in agreement with the situation in reality. Moreover, the average numbers of edges connected to lncRNA, miRNA, and mRNA were 15.207, 12.941, and 2.883, respectively, implying that the noncoding RNAs were more central. Finally, we recovered the scale-free and small-world features of the RNA network. 3.2. Correlation of Hubs with Tumor Development Process Hub nodes play an important role in biological processes [[111]58]. To identify these nodes, various centrality metrics have been adopted in the study of biological networks [[112]3,[113]59], among which the degree centrality [[114]46] and betweenness centrality [[115]47] are commonly used because of their efficacy and interpretability. Next, we applied these two centralities to rank the nodes with importance in the DNA methylation layer, and we present the most important nodes in [116]Table 2. Table 2. Hub nodes and corresponding genes screened according to centrality metrics. Degree Betweenness Tumor Type Hub DNA Methylation Sites Corresponding Gene Hub DNA Methylation Sites Corresponding Gene LUSC Cg25080152 MYC Cg08133058 SASH1 BRCA Cg24771570 GRB2 Cg26383454 SMIM13 UCEC Cg14751398 E2F3 Cg18776056 FKBP4 KIRC Cg08311343 CDK6 Cg19858017 CLSTN1 BLCA Cg12931157 NFYA Cg01473187 TSPAN6 [117]Open in a new tab According to screening based on the degree centrality, the most critical node for LUSC was Cg25080152, corresponding to the gene MYC, which is a target gene for cancer therapy [[118]60]. The most important node we identified in BRCA was Cg24771570, which corresponds to the gene GRB2. Most cancer malignancies are caused by abnormal signaling of the Grb2 adaptor molecule [[119]61]. Cg14751398 was the largest hub node in the UCEC network, located on E2F3, which is linked to poor prognosis in some cancers as an oncogenic factor [[120]62]. Cg08311343 was the most significant node in the KIRC network, which is located on the CDK6 gene. CDK6 is able to regulate the cell cycle, and its inhibitors have been used as effective therapeutic drugs for breast cancer [[121]63]. In the BLCA network, the most critical DNA methylation site was Cg12931157, corresponding to the gene NFYA, which is associated with cell-cycle alterations and cell proliferation as a transcription factor and is closely related to several tumors [[122]64]. Our ranking of nodes with importance based on the betweenness centrality revealed that the most important node for LUSC was Cg08133058, corresponding to the gene SASH1, which is a prognostic indicator and a potential therapeutic target in non-small-cell lung cancer [[123]65]. We identified Cg26383454, located on the SMIM13 gene, as the most important DNA methylation site for BRCA, which is a membrane-associated protein. The key node identified for UCEC was Cg18776056, located on the gene FKBP4, which is a progestin receptor cochaperone protein associated with cancer malignancy [[124]66]. We identified Cg19858017 for KIRC, corresponding to the gene CLSTN1, which can be used as a biomarker for a variety of cancers [[125]67,[126]68]. Finally, the most critical node in the BLCA network was Cg01473187, corresponding to the gene TSPAN6, which is a suppressor of Ras-driven cancer [[127]69]. In summary, all the genes corresponding to the hub nodes identified were closely associated with tumors according to the two centrality measures. To illustrate the fact that the bilayer network could find more tumor-associated DNA methylation sites than the single DNA methylation correlation networks, we further compared the number of prognostically correlated loci among the hub DNA methylation sites found by the two approaches [[128]34]. We calculated the number of survival-associated loci among the top 100–500 DNA methylation sites with the largest degree and betweenness centralities for the five tumor datasets ([129]Figure 4, [130]Table S2). In this scenario, the results of the chi-squared test for all the tumors showed that the betweenness centrality for the bilayer network was better than that of the single network because more prognostic-associated DNA methylation sites were identified. For the degree centrality, although the chi-squared test results showed that the bilayer network was better than the single network only for one cancer, the rest of the bilayer networks marginally outperformed the corresponding single network. This finding can be explained as a result of the filtering of edges between the DNA methylation sites through the information in the RNA layer, which enhanced the authenticity of the network. Thus, the betweenness centrality of the bilayer network was substantially improved compared with that of the single network. The degree centrality ranks the importance of a node by the number of its neighbors. In a biological network, the more a node interacts with other nodes, the more important the node. In contrast, the betweenness centrality assesses node importance by counting the number of times that it serves as the shortest path in the network. In a biological network, this shortest path is closely related to actual biological pathways. A node with a large betweenness centrality is located at the intersection of multiple critical pathways in the DNA methylation layer, and a contiguous edge in the methylation layer represents a pathway in the RNA layer where the node is also at the intersection of critical pathways, and therefore the identified node is more important to the network. Figure 4. [131]Figure 4 [132]Open in a new tab Number of survival-associated sites in the top 100 DNA methylation sites resulting from degree and betweenness centralities. (a) Comparison of the number of survival-associated sites in the top 100 DNA methylation sites according to degree centrality for bilayer and single methylation networks (chi-squared test LUSC [MATH: χp2= 0.651 :MATH] , chi-squared test BRCA [MATH: χp2= 0.247 :MATH] , chi-squared test UCEC [MATH: χp2= 0.247 :MATH] , chi-squared test KIRC [MATH: χp2< 0.005 :MATH] , and chi-squared test BLCA [MATH: χp2< 0.005 :MATH] ). (b) Comparison of number of survival-associated sites in top 100 DNA methylation sites according to betweenness centrality for bilayer and single methylation networks (chi-squared test LUSC [MATH: χp2= 0.007 :MATH] , chi-squared test BRCA [MATH: χp2< 0.005 :MATH] , chi-squared test UCEC [MATH: χp2< 0.005 :MATH] , chi-squared test KIRC [MATH: χp2= 0.033 :MATH] , and chi-squared test BLCA [MATH: χp2= 0.011 :MATH] ). 3.3. Correlations between DNA Methylation Sites Located on the Same Gene In general, involvement in the same biological process or similarity in gene function leads to gene coexpression. To verify this property for comethylation [[133]70], we calculated the correlation coefficients between hub the DNA methylation sites and present the heat maps in [134]Figure 5 and the subnet formed by these hubs in [135]Figures S1–S10. Overall, we noticed a strong correlation between them, implying that the identified hubs from the network corresponding to genes are likely located within the same biological pathways or perform similar functions. Using prognostic analysis and subsequent pathway enrichment analysis, we found that many of the hub nodes are associated with cancer. Figure 5. [136]Figure 5 [137]Open in a new tab Correlation between top 100 DNA methylation sites in betweenness centrality and degree centrality rankings. Heat maps of degree centrality and betweenness centrality hubs of the same cancer are at the top and bottom of the same column: (a,f) LUSC; (b,g) BRCA; (c,h) UCEC; (d,i) KIRC; and (e,j) BLCA. The correlation between hub DNA methylation sites obtained by degree centrality was stronger, and the strong correlation between the hub DNA methylation sites obtained by betweenness centrality can also be observed. For the DNA methylation sites on the same gene, the correlation between them tended to be stronger. For example, among the top 100 DNA methylation sites in the BRCA network, multiple sites, such as cg27588093, cg21160149, cg04988794, cg27523417, and cg17421241, are all located on the PRDM16 gene, and we found a strong Pearson correlation coefficient between them, [MATH: | r |avg=0.781 :MATH] (the average correlation of the top 100 DNA methylation sites resulting from the betweenness centrality was 0.277). This gene is a protein-coding gene that encodes a zinc finger transcription factor that suppresses tumor production [[138]71]. Among the top 100 DNA methylation sites in the BLCA network, multiple sites, such as cg24701780, cg24804145, cg15192120, and cg25497530, are all located on the PTPRN2 gene, and the average correlation between them was [MATH: | r |avg=0.737 :MATH] , which was larger than that averaged over the top 100 DNA methylation sites. This gene encodes a protein with sequence similarity to the receptor-like protein tyrosine phosphatase, which accelerates cancer progression and metastasis [[139]72,[140]73]. In [141]Table S3, we summarize the correlation coefficients for the sites on the same gene. In general, they are stronger than those of the top 100 sites, in agreement with the literature [[142]74]. 3.4. Hubs in DNA Methylation Layer Aggregates Differentially Expressed Genes To obtain deeper insight into the screened DNA methylation sites and their corresponding genes, we counted the number of differentially expressed genes near the top 100–500 DNA methylation sites ranked by degree centrality and betweenness centrality, separately. We found that the genes corresponding to the DNA methylation sites screened using either measure in the bilayer networks were accessible in two steps to the differentially expressed genes. On the contrary, only a small fraction of the genes corresponding to sites obtained from the single network could reach the differentially expressed genes in two steps. We show the results in [143]Table 3 and [144]Table S4. All tumor data differed, according to the chi-squared test, in both single and bilayer networks ( [MATH: χp2< 0.001 :MATH] ). The hub nodes in the DNA methylation layer of the bilayer network aggregated differentially expressed genes. As suggested by Le et al. [[145]11], differentially expressed genes play a crucial role in tumor development. Table 3. Number of DNA methylation sites near differentially expressed genes. Degree Betweenness Tumor Type Single Network Bilayer Network Single Network Bilayer Network LUSC 55 100 60 100 BRCA 51 100 51 100 UCEC 70 100 63 100 KIRC 63 100 59 100 BLCA 54 100 54 100 [146]Open in a new tab However, this clustering of differentially expressed genes does not mean that all genes corresponding to hub DNA methylation sites are differentially expressed. The genes corresponding to the top 100 DNA methylation sites according to the centrality ranking are rarely differentially expressed and most of them are linked to differentially expressed genes through some noncoding RNA. For example, in the top 100 sites for LUSC, most genes are related to SNHG16 or some other miRNA. SNHG16 is a lncRNA regulating a number of mRNAs in the RNA layer that can be reached in two steps via SNHG16. Because noncoding RNAs regulate multiple mRNAs and mRNAs are regulated by multiple noncoding RNAs, these noncoding RNAs act as bridges between the mRNAs in the RNA layer. The larger the betweenness centrality of a node, the higher the number of shortest paths traveling through that node. Therefore, the betweenness centrality can be used to identify hub nodes that are located in key pathways in the network that are likely to be involved in the expression of differential genes. For the degree centrality, the hubs identified on the basis of it are more likely to interact with differential genes because they have many neighbors. Similar to the results of the prognostic analysis, the results of the differential expression analysis were also better under the betweenness centrality than under the degree centrality. All genes corresponding to the top 500 DNA methylation sites according to the betweenness centrality reached the differentially expressed genes in two steps for all tumors. Under the degree centrality, only four tumors exhibited the same behavior. 3.5. KEGG Pathway Analysis As mentioned above, the genes corresponding to the top 100–500 DNA methylation sites in terms of the degree and betweenness centralities could be screened in both the bilayer and single networks. The shortest paths in the network are closely related to biological pathways. The genes identified by the betweenness centrality have a high probability of being located in the critical pathway. Analogously, the genes identified on the basis of the degree centrality are likely to be located on some critical pathways due to their large number of neighbors. To explore the functions of those genes, we performed KEGG pathway enrichment analysis ([147]Figure 6 and [148]Figure 7, [149]Table S5). Figure 6. [150]Figure 6 [151]Open in a new tab Gene-enriched pathways screened from bilayer networks according to betweenness centrality. The X-axis represents the count and ratio of hub genes, dot size represents the number of hub genes, and dot color represents the negative of the logarithm of the p-value. Figure 7. [152]Figure 7 [153]Open in a new tab Gene-enriched KEGG pathways screened from bilayer networks according to degree centrality. The X-axis represents count and ratio of hub genes, dot size represents the number of hub genes, and dot color represents the negative of the logarithm of the p-value. [154]Figure 6 shows the KEGG pathway enrichment results of screening the top 100 genes on the basis of the betweenness centrality, where multiple KEGG pathways are associated with tumors. For hub genes in the bladder cancer bilayer network, the most important pathway is “bladder cancer”, in addition to “adherens junction” and “proteoglycans in cancer”. The hub genes in the endometrioid cancer bilayer network are enriched in the “PI3K–Akt signaling pathway” and “p53 signaling pathway”. The p53 signaling pathway is one of the most well-known cancer-related pathways, playing an integral role in multiple tumors [[155]75]. Proteoglycans in cancer is an important cancer-related pathway closely related to the immune escape of tumor cells [[156]76]. The PI3K-Akt signaling pathway plays an essential role in the regulation of cell survival, growth, and proliferation [[157]77]. Moreover, several cancer-related pathways are also enriched, including “pathways in cancer”, “bladder cancer”, and “microRNAs in cancer”. The enrichment results of the top 100 screened genes based on the degree centrality are shown in [158]Figure 7. We likewise identified cancer-related pathways, such as “hippo signaling pathway”, “central carbon metabolism in cancer”, “PI3K-Akt signaling pathway”, and “bladder cancer”. In summary, the bilayer network approach could find genes involved in tumor-related processes that could not be found by the single networks. 4. Discussion DNA methylation can influence life processes by regulating gene expression and is therefore associated with the development of various tumors [[159]70]. In this study, we constructed a bilayer network using DNA methylation and RNA interaction data from five tumors and identified a set of hub DNA methylation sites and genes using centrality indicators. Both the DNA methylation and the RNA layer networks showed the scale-free and small-world characteristics that are essential in biological networks. The majority of the DNA methylation sites screened using the centrality metric were also associated with prognosis, and the bilayer network outperformed the single network, enabling the identification of more prognosis-associated sites. By analyzing the correlation between the DNA methylation sites, we illustrated that the sites on the same gene are more strongly correlated. In addition, we found that differentially expressed genes near the hub sites were enriched. Finally, our KEGG analysis revealed that hub genes in the RNA layer were involved in multiple tumor-related pathways. Regarding the hub nodes identified in the DNA methylation layer, several issues are worth discussing. First, the genes corresponding to the most critical DNA methylation sites were mRNAs, which are closely associated with tumors. For the RNA layer, however, noncoding RNAs are located at more central positions. This is partly due to most of the DNA methylation sites being located on protein-coding genes. We found that 298,715 DNA methylation sites are in the gene region, of which 297,057 are on mRNA and only 1658 are on noncoding RNAs such as lncRNA. Additionally, in the RNA layer, a noncoding RNA can act as a bridge between two mRNAs, which results in the two mRNAs being accessible in two steps, and the DNA methylation sites on these mRNAs are not filtered out. Second, because the DNA methylation sites on the genes that perform similar functions are comethylated, we calculated the correlations between the hub sites, finding that these hubs always showed more positive correlations between them. Although DNA methylation sites do not directly interact, the positive correlation between the hub sites is actually a reflection of the site–gene–gene–site relationship. We speculate that the reason for this positive correlation may be related to the regulation of gene expression by DNA methylation sites as well as post-transcriptional regulation. A combination of site–gene correlations as well as gene–gene correlations may be required to explain this finding. In the RNA layer, we found that noncoding RNAs act as key bridges in the network. These noncoding RNAs contain lncRNAs and miRNAs, with a smaller number of noncoding RNAs at the central of the network and a larger number of mRNAs at the margin of the network. Only 7 pairs of mRNAs are directly linked, whereas 5,981,746 pairs of mRNAs are indirectly linked through a noncoding RNA. Therefore, the correction of the RNA layer to the DNA methylation layer is affected only by noncoding RNA. The difference in the status of noncoding RNAs and mRNAs also shows that our network is able to allow for some errors in the RNA layer. Overall, the proposed bilayer network framework has higher fidelity than traditional correlation networks and can be used to effectively analyze multi-mics data to identify many tumor-associated DNA methylation sites and genes that cannot be identified by single networks. We suggest three avenues for future study. First, the methylation of loci is mainly regulated by methylation-modifying proteins, which we did not consider in this study. Protein layers can be incorporated into our multilayer network. Second, we used only TCGA data in the present study, and other data sources may be added for validation in a subsequent study. Finally, for the identification of hub nodes in the network, we only used two typical centrality metrics. Other popular metrics are worth considering. Supplementary Materials The following supporting information can be downloaded at [160]https://www.mdpi.com/article/10.3390/life13010076/s1. [161]Table S1: Basic TCGA data information. [162]Table S2: Comparison of prognostic correlation numbers of DNA methylation sites in top 100 to top 500 centrality. [163]Table S3: Correlation between DNA methylation sites on the same gene. [164]Table S4: Number of differential genes within two steps of DNA methylation sites of top 100 to top 500 centrality. Table S5: KEGG enrichment results of hub genes in top 100 to top 500. Figures S1–S10: Hub nodes found by degree and betweenness centralities constituting the subnetwork of the five tumors. [165]Click here for additional data file.^ (77.7MB, zip) Author Contributions Methodology, X.-J.X., L.-C.Z. and R.Z.; formal analysis, X.-J.X. and H.-X.G.; writing—review and editing, X.-J.X., H.-X.G., L.-C.Z. and R.Z.; supervision, L.-C.Z. and R.Z.; funding acquisition, L.-C.Z. All authors have read and agreed to the published version of the manuscript. Institutional Review Board Statement Not applicable. Informed Consent Statement Informed consent was obtained from all subjects involved in the study. Data Availability Statement We used public data accessible from the Cancer Genome Atlas (TCGA) database and RNAInter database. Conflicts of Interest The authors declare no conflict of interest. Funding Statement This research was funded by the National Natural Science Foundation of China, grant number 32070395. Footnotes Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. References