Abstract It has recently begun to be considered that cancer is a systemic disease and that it must be studied at every level of complexity using many of the currently available approaches, including high-throughput technologies and bioinformatics. To achieve such understanding in cervical cancer, we collected information on gene, protein and phosphoprotein expression of the HeLa cell line and performed a comprehensive analysis of the different signaling pathways, transcription networks and metabolic events in which they participate. A total expression analysis by RNA-Seq of the HeLa cell line showed that 19,974 genes were transcribed. Of these, 3,360 were over-expressed, and 2,129 under-expressed when compared to the NHEK cell line. A protein-protein interaction network was derived from the over-expressed genes and used to identify central elements and, together with the analysis of over-represented transcription factor motifs, to predict active signaling and regulatory pathways. This was further validated by Metal-Oxide Affinity Chromatography (MOAC) and Tandem Mass Spectrometry (MS/MS) assays which retrieved phosphorylated proteins. The 14-3-3 family members emerge as important regulators in carcinogenesis and as possible clinical targets. We observed that the different over- and under-regulated pathways in cervical cancer could be interrelated through elements that participate in crosstalks, therefore belong to what we term “meta-pathways”. Additionally, we highlighted the relations of each one of the differentially represented pathways to one or more of the ten hallmarks of cancer. These features could be maintained in many other types of cancer, regardless of mutations or genomic rearrangements, and favor their robustness, adaptations and the evasion of tissue control. Probably, this could explain why cancer cells are not eliminated by selective pressure and why therapy trials directed against molecular targets are not as effective as expected. Introduction Cells are complex, dynamic systems, which use molecular signaling circuits that govern basic cellular activities and coordinate their actions [34][1]. The ability of cells to perceive and respond in an appropriate manner to the microenvironment is the basis for homeostasis, development, tissue repair and immunity. Errors in information management are responsible for different cell-derived conditions, such as autoimmune diseases, metabolic syndromes and cancer [35][2]–[36][5]. Cancer requires a very complex set of conditions. It is driven by a Darwinian model of evolution at the cellular level [37][6], comprising all levels of cellular information (i.e., genetics, epigenetics, transcriptional and translational regulation and translational modifications). Accordingly, it involves communication between different cell types, and interactions between the tumoral microenvironment and the whole organism [38][7], [39][8]. Our understanding of cancer has evolved because of this context and acquired knowledge. Hanahan and Weinberg suggested that all cancers have certain essential alterations in cell physiology that coordinate the malignant phenotype, which is characterized by self-sufficiency in growth signals, insensitivity to growth inhibitors, evasion of programmed cell death, increased replicative potential, sustained angiogenesis, tissue invasiveness and metastasis, reprogramming of energy metabolism and evasion of immune destruction. Moreover, these hallmarks are accompanied by additional enabling features, including mutations and genomic instability, and the promotion of inflammation by tumors [40][9], [41][10]. Cervical cancer represents an interesting opportunity for the study of malignant transformation, mainly due to our understanding of its etiologic agent, High-Risk Human Papilloma Viruses (HR-HPVs), which are found in 90.7% of cases [42][11]. The HR-HPV oncoproteins E6 and E7 are able to interact with the p53 and pRb tumor suppressors in addition to more than 300 other known proteins. Of the more than 120 types of HPVs that infect humans, only a few high-risk types are associated with carcinogenesis. HPV16 and HPV18 are the most prevalent high-risk HPVs, and they are present in 54.6% and 11% of cervical squamous cell carcinomas, respectively [43][12]–[44][14]. Patients with cancer caused by these HPV types are the most widely studied. The first established cervical carcinoma cell line, HeLa, is positive for HPV18 and has served as the basis for most of our knowledge regarding the underlying cell biology of cancer. However, due to spontaneous elimination of the virus, not all patients infected with HR-HPV develop cervical cancer. Most HPV infections are subclinical, with only a small fraction producing epithelial lesions, and an even smaller fraction of these lesions developing into cancer [45][15]. Consequently, HR-HPV infection is necessary but not sufficient for the development of cervical cancer [46][16]. Thus, the conditions that allow the development of cervical cancer both following HR-HPV infection and in its absence are not thoroughly known. Understanding biological complexity at different levels of organization (which would be critical for a model such as cancer, in which cellular dynamics are altered at the molecular and tissular levels) requires combining the results obtained from various experiments to recreate the system’s behavior [47][17], [48][18]. The molecular profiling methods known as “omics” (e.g., transcriptomics, proteomics, and metabolomics) allow a global search of the characteristics that define the system under study and the integration of this knowledge into simple models with great explanatory and predictive power. However, these models can and must be contrasted against new experimental data. Currently, the cellular signaling system represents the biggest challenge for systems biology [49][19]–[50][21]. A signaling pathway consists of multiple sequential events, including covalent modifications, recruitment, allosteric activation or inhibition and protein binding [51][22]. However, as our understanding of the interactions between signaling pathways increases, it becomes more apparent that the signals do not necessarily occur independently through parallel (and isolated) linear pathways, but rather, through a large and complex network of interconnected signaling pathways [52][23], [53][24]. The complex architecture of signaling networks can be understood as a set of interacting network motifs, which can provide specific network properties and add a new level of complexity to that which already exists within the spatio-temporal organization and compartmentalization of signals [54][25], [55][26].Network theory approaches have been useful to discriminate components that have a major overall effect on the system, given the number and variety of pathways in which they are involved [56][27]. Such highly-connected elements are relevant for both cellular homeostasis and disease [Hao et al. 2009]. A large number of efforts are being held in order to understand the key actors and pathways that facilitate the appearance and maintenance of cancer cells, and how these are physiologically related to the ten hallmarks proposed by Weinberg and Hanahan [57][10]. In a previous study, the analysis of the proteomes of six different cervical cancer cell lines and the protein-protein interaction networks in which they participated led us to propose that the delicate balance between the life and death decisions of cells, as well as the neoplastic phenotype, might be due to the overregulation of the transcription factors c-Myc and E2F1. This can apparently result from both viral infection and the overexpression of the protein 14-3-3Z, which has been shown to deregulate apoptosis and promote the G1 to S phase transition. Furthermore, it has been suggested to play an important role in the epithelial-mesenchymal transition (EMT) [58][28].Our study also gave insights into the multiple pathways that were orchestrated for the stabilization of the cancerous phenotype, but the actual relations among such pathways still needed to be analyzed from a broader perspective. In addition, the connections with transcriptional regulation were not fully exploited, and direct evidence of post-translational modifications that were transferred throughout signaling cascades was not yet provided. The aim of the present study was to predict the behavior of signaling pathways and regulatory networks, and determine the molecular signature of cervical cancer in a HeLa cell line model. We used data generated via sequencing, performed a differential expression analysis and incorporated microarray data to predict the response of transcription factors. This information allowed the reconstruction of the signaling, metabolic and transcriptional regulation pathways. Finally, we enriched phosphorylated proteins using Metal-Oxide Affinity Chromatography (MOAC) and identified them using tandem mass spectrometry (MS/MS), in order to validate and build a model based on all these different levels of biological information. Results We integrated different layers of complexity within the dynamics of a HeLa cells in order to track the flow of information. Using this procedure, we were able to obtain information regarding the maintenance of the malignant state and the differences between cancerous and normal cells ([59]Fig. 1). Figure 1. Pipeline. [60]Figure 1 [61]Open in a new tab We have integrated different layers of information within biological cell dynamic that tracks the flow of information. First, we performed an analysis of all transcripts in HeLa cells; this analysis provided an overview of gene expression. Subsequently, we performed a RNA-seq differential expression analysis and a query of the over-representation of the activity of TFs. This allowed reconstruction of the metabolic pathways and the signaling and cellular transcriptional regulatory pathways. Finally, we validated this reconstruction with a phosphoproteomic analysis. There are Different Gene Expression Profiles and Gene Ontologies in the HeLa and NHEK Cell Lines First, to assure the certainty of the gene expression profile of HeLa cells, we used the RNA-Seq data set generated by Nagaraj et al. [62][29]. This total expression analysis yielded a set of 19,974 transcripts. The principal difference between these analyses and those from the Nagaraj group was that we performed quartile normalization, which improves the accuracy of the differential expression calls for low-abundance transcripts by eliminating the bias of highly-expressed genes [63][30]. The distribution of the reads is bimodal ([64]Fig. 2a). A gene was considered transcribed if the confidence interval lower boundary and the FPKM were greater than zero. In total 53934 genes were mapped from which 20110 are proteins. With the resulting full expression data, we developed a metric to elucidate the representation of cellular processes by means of Gene Ontology (GO) [65][31] using the domain of cellular components in level 3 ([66]Fig. 2b). Importantly, the expression of each GO term fulfills at least 97% of the total reported for that term (Table S1 in [67]File S1). We used this metric to establish the biological quality of our analysis and to build our predictions from these data, as well as to validate the differential expression analysis. Figure 2. Gene expression patterns in the HeLa cells. [68]Figure 2 [69]Open in a new tab a) The distribution of total transcripts shows that there are two populations, one low-abundance population and a second larger, high-abundance population. This dichotomy shows that the parameters used to search for low abundance transcripts was successful. b) A graphical representation of the cellular process that was distinguished based on Gene Ontology (GO), using the domain of cellular components in level 3; the amount of retrieved elements was compared against the total size of the pathway. To understand the changes in gene expression in the HeLa cell line compared with a normal cell, we next performed a differential expression analysis via RNA-Seq, using the epithelial keratinocyte cell line NHEK as our expression control. Out of 47498 ENCODE-annotated transcripts in total, we identified 3,360 over-expressed genes and 2,129 under-expressed genes ([70]Fig. 3a; [71]Spreadsheet S1). Using this information, a GO enrichment analysis was built with the web tool ConsensusPathDB [72][32] using level 3 of the “Biological Process” domain. Even at this level of resolution, it was clear that the differential expression of genes in HeLa cells strongly favored cell proliferation over tissue organization (Table S2 in [73]File S1). The categories with a clear over-representation were “Cell cycle”, “Gene expression”, “Metabolism building blocks” and “Cytoskeletal reorganization” ([74]Fig. 3b). The categories that were under-represented included “Tissue development”, “Organs and systems”, “Signaling”, “Cell adhesion”, “Lipid metabolism” and “Programmed cell death” ([75]Fig. 3c). Figure 3. Differential gene expression in HeLa Cells versus NHEK. [76]Figure 3 [77]Open in a new tab a) A scatter plot showing the quality of the RNA-seq differential expression analysis results, using the epithelial keratinocyte cell line NHEK as a control. There were a total of 3,360 overexpressed genes and 2,129 under-expressed genes. b) The percentage distribution of the level 3 biological process domain GO terms represented by the over-expressed transcripts. c) The percentage distribution of the level 3 biological process domain GO terms represented by the under-expressed transcripts. These charts were constructed from a summary of all the similar GO terms in a functional cellular circuit. Based on these results, we propose that there is a strong tendency for HeLa cells to express genes that assist in the evasion of tissue control, which affords a clear adaptive advantage to proliferation without barriers. The Regulatory Network of Transcription Factors that are Differentially Expressed in HeLa Cells Controls Fundamental Processes Maintaining the Neoplastic State To understand the transcriptional network that governs gene expression in HeLa cells, we used the Affymetrix microarray data from HeLa cells and normal cervical epithelia generated by the Scotto group [78][33] and deposited in the GEO database. These data were loaded into the MARA web tool [79][34], which retrieves the transcription factors with altered expression. 19,171 genes (corresponding to the Affymetrix HG-U133A annotation) were evaluated, and a total of 189 significantly-activated TFs were reported. The transcriptional targets of E2F, ZNF143, YY1, ELKA, GABP, NRF1, MYB, NFY, HIF1A, TFDP1 and ELF were over-represented, whereas the transcriptional targets of ETS, NFATc, NR1H4, SMAD, TFCP2, HIC1, AR, TBP, SRF and KLF12 were under-represented. Next, we used the database generated by the MARA transcriptional target analysis to establish the network of regulation obtained from the differential expression analysis. Targets were obtained for each TF, and their regulatory networks were reconstructed. It should be noted that c-Myc, hepatocyte nuclear factor 4-alpha, BRCA1, VHL and NEMO were all involved in more than one transcription factor network and were overexpressed. After obtaining information from the TFs and their targets, we performed a GO enrichment analysis using the web tool ConsensusPathDB, and level 3 of the “Biological Process” domain. In the overexpressed-TF networks, we identified 103 GO terms (Table S3 in [80]File S1), including “Cell proliferation”, “Metabolism of building blocks”, “Cellular organization”, “Angiogenesis”, “Central metabolism” and “Signaling” ([81]Fig. 4a). In the under-expressed-TF networks, we identified 159 GO terms (Table S4 in [82]File S1), which were largely related to “Tissue homeostasis” or “Miscellaneous”. Figure 4. Transcription factor expression networks. [83]Figure 4 [84]Open in a new tab a) The percentage distribution of level 3 biological process domain GO terms represented by the over-represented TF network, highlighting ”Cell proliferation”, ”Metabolism of building blocks”, ”Cellular organization”, ”Angiogenesis”, ”Central metabolism” and ”Signaling”. This chart was constructed from a summary of all the similar GO terms in a functional cellular circuit. b) Hubs that were obtained from the node degree centrality measure of the over-represented TF network. The color indicates the score, with red being the highest and yellow the lowest. c) Hubs were obtained from the betweeness centrality measure of the over-represented TF networks. The color indicates the score, with red being the highest and yellow the lowest. With the information generated by the TF network analysis, interaction networks were built using Cytoscape software [85][35] and the Hubba plug-in [86][36]. Hubs were defined using the node degree ([87]Fig. 4b) and betweenness centralities ([88]Fig. 4c). Surprisingly, c-Myc, HNF4A, BRCA1, VHL and NEMO were the nodes that had the highest values for these measures. These results suggest that a set of overexpressed genes in HeLa cells has control over a particular regulatory network that is not present in normal cervical epithelium. They also suggest reduced expression of some other regulatory networks. This feature may be an important source of the complexity that allows the strengthening of the cellular system by selection pressure. Identification of Under- or Over-represented Pathways that Govern the Neoplastic Phenotype in HeLa Cells To understand which pathways and processes are responsible for maintaining the neoplastic phenotype, the complete list of previously identified over- and under-expressed transcripts was transformed into non-redundant UniProt identifiers [89][37]. A pathway enrichment analysis was then performed on this list using the ConsensusPathDB online tool. From the overexpressed transcripts, 83 over-represented pathways were recovered through the Pathway Interaction Database (PID) (Table S5 in [90]File S1). Among these pathways, there were many remarkable signaling pathways, including those governed by ATR and Aurora A and B. Some transcriptional regulatory networks were identified as well, including “E2F”, “MYB”, “Targets of c-Myc transcriptional activation” and “Direct p53 effectors”. For the under-expressed transcripts, 15 over-represented pathways were detected via PID [91][38] (Table S6 in [92]File S1). Some important pathways were the “Transcriptional targets of deltaNp63”, “TAp63”, “AP1 family members Fra1 and Fra2”, and “Direct p53 effectors”. Notably, both over- and under expressed transcripts were found within the “Direct p53 effectors” (Table S7 in [93]File S1). Importantly, the identification of under-expressed and over-expressed transcripts from the same data set illustrates the power of our analysis, as it enabled us to distinguish, at the genomic level, components of the same network that promoted the neoplastic state either by actual participation or by omission. Another pathway enrichment analysis was performed with a focus on the Kyoto Encyclopedia of Genes and Genomes (KEGG) [94][39] for both sets of transcripts; with this approach, we identified 17 pathways for the over-expressed genes ([95]Table 1) and 25 pathways for the under-expressed genes (Table S8 in [96]File S1). These results offer a perspective on the diverse events that occur in HeLa cells compared with non-malignant cells. Some of the highly represented processes were involved in cell proliferation (e.g., “DNA replication”, “Cell cycle” and “Homologous recombination”). Yet, other significant pathways included the “Fanconi Anemia pathway”, “Transcriptional misregulation in cancer” and “Small cell lung cancer”. These findings provided direct evidence of the conservation of malignant processes in different cancer types. Surprisingly, within the under-expressed genes, there was an over-representation of lipid metabolism pathways (particularly those involved in the metabolism of steroids, linoleic acid, arachidonic acid and the synthesis of unsaturated fatty acids), and pathways involved in cellular adhesion and several pathogen infections. Table 1. Enriched PID pathway–based sets of over expressed transcripts. Pathway name Set size Candidates P-value Q-value DNA replication 36 28 (77.8%) 7.24E15 1.67E-12 Cell cycle 124 53 (42.7%) 9.42E11 1.09E-08 Homologous recombination 28 20 (71.4%) 7.71E10 5.94E-08 Systemic lupus erythematosus 138 51 (37.5%) 4.46E08 2.58E-06 Fanconi anemia pathway 52 26 (50.0%) 1.32E07 6.12E-06 Mismatch repair 23 15 (65.2%) 6.97E07 2.68E-05 Base excision repair 33 18 (54.5%) 2.27E06 7.49E-05 Transcriptional misregulation in cancer 180 53 (29.8%) 6.84E05 0.00198 Lysine degradation 49 20 (40.8%) 0.00018 0.0037 Purine metabolism 166 49 (29.5%) 0.00016 0.0037 One carbon pool by folate 19 10 (52.6%) 0.00062 0.0135 RNA transport 157 44 (28.0%) 0.00113 0.0217 HTLV-I infection 263 67 (25.5%) 0.00124 0.0221 Nucleotide excision repair 46 17 (37.0%) 0.00172 0.0284 Pyrimidine metabolism 101 30 (29.7%) 0.00253 0.039 Spliceosome 127 35 (27.6%) 0.00463 0.0668 Small cell lung cancer 87 25 (28.7%) 0.00876 0.119 [97]Open in a new tab The set size refers to the number of transcripts that have a UniProt ID in the corresponding PID pathway–based set at the ConsensusPathDB site. The number of candidates contained refers to the number of proteins that are part of the extended network and appear as part of the pathway. P-values were calculated using a hypergeometric test; Q-values represent a correction of the P-values for multiple testing using the false discovery rate method. These data strengthened the results that were obtained from the MARA microarray analysis. However, even though the pathway enrichment analyses were important to our evaluation of the biological significance of the changes in gene expression, there are limits to these types of studies. For example, the same set of transcripts can be considered part of different pathways. Consequently, the results must be carefully evaluated and validated before any definitive interpretations are made. Therefore, similar to the current work, studies that involve different layers of information and validation are strongly needed. There is a Defined Pattern in the Activation of Cellular Circuits in HeLa Cells that Involves Interconnections and Crosstalk that could be Identified based on the Differentially Expressed Genes, Regulatory Networks and Pathways To visualize the integration of the different layers of biological information that were obtained in the present work, the PID Batch Query tool was used. As an input, we used the list of transcripts obtained in the differential expression analysis, as well as the list of super-active TFs that was obtained from the MARA analysis. The PID batch query yielded 64 pathways curated in the Biopax level 3 format [98][40], which displayed an increased amount of interconnections and crosstalk among their cellular circuits. Such interconnection and crosstalk could allow constant proliferation, immortalization and cell migration by means of both the over- and under-expression of different genes and a pattern of transcriptional control that differed from that observed in normal cells. To explore this hypothesis, we performed a reconstruction of every one of the aforementioned networks using the Bisogenet plug-in [99][41] in Cytoscape. Afterwards, we searched for the nodes that served as interconnectors between the explored pathways, and with the resulting data, we reconstructed two models. In the first model, we built a network by using every component of each pathway ([100]Fig. 5). Node-degree and betweeness centralities were analyzed, and with these values, we could determine the presence of hubs in all the pathways. This lead to the observation that c-Myc, BRCA1, VEGFA and E2F1 were the most interconnected and influential nodes in all of the networks. Figure 5. Network of interconnections and crosstalk among the cellular circuits. [101]Figure 5 [102]Open in a new tab The network was made from data obtained by the analysis of signaling pathways and regulation. The nodes represent proteins that make up each of the pathways; each pathway is indicated by using different colors. The size of the node is determined by betweeness centrality measure, a larger size shows greater number of shortest paths that pass through that node. In our second model, we followed the hypothesis that there are well-defined cellular circuits that can be extrapolated to other malignant cells. Such circuits can be adapted to the concept of “Hallmarks of Cancer” that was originally proposed by Hanahan and Weinberg, which we reshaped in what we called “meta-pathways” ([103]Fig. 6, [104]Figure S1). Each one of the pathways that appear in the figure possesses a certain degree of connection with other pathways that was determined either by transcriptional regulation, signaling, metabolism or a combination of these mechanisms. This meta-network would be capable of maintaining the neoplastic phenotype. In [105]Figure 6, each of the pathways and meta-pathways is related to its possible role in the generation of the hallmarks of cancer. Figure 6. Meta-pathways analysis. [106]Figure 6 [107]Open in a new tab An analysis was conducted combining the obtained signaling and transcriptional regulation pathways; the edges indicate the regulatory or hierarchical relationship, and the nodes indicate the pathway. The colors denote each of the hallmarks of cancer, with the two most representative hallmarks indicated per node. Additionally, we use a betwenness-weighed layout allowing the separation of dense clusters and the identification of elements with high centrality. To validate the data of pathway over- and under- representation that were obtained, we conducted a phosphoproteomic analysis of the HeLa cell line via enrichment with Metal-Oxide Affinity Chromatography followed by identification of the proteins via LC/MS-MS. We identified a total of 271 phosphorylated proteins (Table S9 in [108]File S1), reported in 40 level 3 GO terms (Table S10 in [109]File S1), 21 PID pathways (Table S11 in [110]File S1) and 16 KEGG pathways (Table S12 in [111]File S1). As expected, due to the well-documented low correlation between transcript expression and protein expression, only 17% of the proteins had their equivalence as a transcript. In contrast, we found that there was a high correlation between the pathways which were based on transcript/TF expression and those based on phosphorylated proteins. Significantly, among these groups were “Validated targets of c-Myc transcriptional activation”, “Signaling events mediated by HDAC Class III”, “LKB1 signaling events”, “Class I P3K signaling events” and “FoxO family signaling”. However, we can find some important pathways that encompass members of the three levels of biological information. For example, the “c-Myb transcription factor network” and the “E2F transcription factor network”, whose targets were found within “c-Myc transcriptional activation” and, simultaneously, in pathways of central metabolism. Finally, the entire data set that was extracted from the three layers of information was used to reconstruct the signaling, regulatory and metabolic networks that govern the HeLa cell line and are different from the networks present in both normal cervical epithelium and the NHEK keratinocyte cell line. For this purpose, the KEGG Mapper tool [112][42] was used to assemble the general metabolic map ([113]Fig. 7), the cell cycle map ([114]Fig. 8a) and the adhesion molecules and focal adhesion maps ([115]Fig. 8b). Based on these maps, we were able to identify a clear pattern that could be divided into at least three groups with the following characteristics: Figure 7. Over-represented metabolic pathways. [116]Figure 7 [117]Open in a new tab Metabolic maps were reconstructed based on the KEGG database and the over-represented TF networks and identified phosphoprotein analyses, resulting in a map of general metabolism. The over-expressed transcripts are displayed in green, the under-expressed transcripts in red and the identified phosphoproteins in purple. Figure 8. Cell cycle and cell adhesion molecules pathways. [118]Figure 8 [119]Open in a new tab a) A reconstruction of cell cycle and MCM complex pathways. b) A reconstruction of cell adhesion molecules. Both maps were constructed based on the KEGG database and the transcriptomics, over-represented TF networks and identified-phosphoprotein analyses. The overexpressed transcripts are displayed in green, the under-expressed transcripts in red and the identified phosphoproteins in purple. 1. An increase in cell proliferation, resulting from the overexpression of the MCM-complex genes and diverse cell-cycle–associated proteins, which permitted sustained proliferative signaling. 2. Over-activation of carbohydrate metabolism (e.g., glycolysis and gluconeogenesis) ([120]Fig. 9b), as well as an increase in the expression of enzymes in the pentose-phosphate pathway, which generates carbon skeletons for the synthesis of nitrogenous bases and histidine. In addition, malonyl-CoA can be obtained from pyruvate metabolism ([121]Fig. 9a), which matches the “deregulation of cellular energetics” hallmark of cancer. 3. Activation of invasion and metastasis that was caused by the loss/gain of expression of several cell adhesion proteins, such as CLDN, OCLN and ESAM, the expression of which was increased. Figure 9. Over-represented metabolic pathways. [122]Figure 9 [123]Open in a new tab Metabolic maps were reconstructed based on the KEGG database and the over-represented TF networks and identified phosphoprotein analyses, resulting in a map of a) pyruvate metabolism and b) glycolysis. The over-expressed transcripts are displayed in green, the under-expressed transcripts in red and the identified phosphoproteins in purple. Discussion In the present work, we performed an analysis of the different layers of biological information available from HeLa cells. The purpose of this analysis was to build a biological model of the pathways and meta-pathways that would allow us to define the relationships between this model and the hallmarks of cancer while maintaining a systemic perspective. Hanahan and Weinberg postulated the existence of an intricate cellular circuitry that can be individually aligned with the hallmarks of cancer. They also proposed that in the near future, every circuit might be segmented into specialized sub-circuits that support discrete biological properties in normal cells and are reprogrammed in cancer cells, resulting in their hallmark capabilities. The work presented here is an attempt to prove, both theoretically and experimentally, the existence of these sub-circuits and their relationships to the specific pathways that afford specific cellular functions. Previous studies have attempted to integrate different levels of information for the prediction of signaling pathways that are either a determinant of or an obstacle to the malignant state. Minn et al. [124][43] designed a novel methodology that permits the definition of new signaling pathways. This methodology is based on the initial identification of a gene that works as a master regulator. Next, the available information is increased via a gene set analysis (GSEA). Finally, the regulatory interactions are functionally validated via the over-expression of some genes that participate in the proposed pathway. This strategy can be coupled to a systemic analysis, such as our own, to validate novel or non-canonical pathways. The validation step is critical because it has to be thoroughly representative of the complete pathways network, and the most frequently used technique for validation is the direct measurement of mRNA or proteins [125][44]. Therefore, we decided to validate our initial predictions using proteomics, with a special emphasis on identifying phosphoproteins. The phosphoproteomic validation was pursued because reversible phosphorylation of proteins is the most widely known, and presumably the most frequently used, post-translational modification in mammalian cells. Approximately 1.7% of the human genome encodes protein kinases [126][45], and at any moment, approximately 30% of all proteins can be phosphorylated [127][46]. Phosphorylation suggests that the protein is in its active form (as is the case with several metabolic enzymes) and that upstream signals are active. Alternatively, phosphorylation could result in a conformational change that allows the modulation of specific activities [128][47]. The analysis of total transcripts in the HeLa cell line yielded a notable result. A total of 19,974 genes were found to be transcribed, and while this may appear to be an exaggerated number, it has been demonstrated via RNA-Seq analysis that approximately 16,245 genes are transcribed in several types of breast cancer, and this number can vary from 14,648 to 18,290 [129][48]. When we conducted the GO analysis, we observed that the transcripts covered at least 97% of the GO terms. This aided in the evaluation of the quality of the information we obtained and prepared us to make the comparisons with the differential expression data. When analyzing the differential-expression data from a classical perspective, we observed over-expression of the typical oncogenic proteins that participate in diverse types of cancer. These proteins make up three groups: 1) the c-Myc and c-Myb transcription factors, which are expressed in a great variety of tumors [130][49]–[131][52], 2) the DNA repair and recombination proteins BRCA1 and 2, which are amplified in breast cancer and non-small-cell lung cancer [132][53], [133][54], and 3) the proteins assigned to mitotic checkpoints, such as BUB1 and BUB3, which can result in genomic instability when overexpressed [134][55], [135][56]. Within the global perspective of this work, we found 3,360 overexpressed genes and 2,129 under expressed genes. The GOs and pathways that resulted from the ConsensusPathDB analysis showed a clear over-expression of genes that are coadjuvant in maintaining the sustained proliferative signaling cancer hallmark at the transcriptional, signaling or metabolic levels. Similarly, when we analyzed the data regarding all the hyperactive transcription factors, we clearly found that the E2F, c-Myc and c-Myb pathways were positively regulating a large number of genes that initiate mitosis and allow cell cycle progression [136][57]–[137][59]. The over-expression of the FOXM1 transcription factor network has been associated with increased cell proliferation in animal models of prostate carcinoma [138][60]. Through the integration and validation of these data via the phosphoproteomic analysis, we found that over 70% of the components of the “Cell Cycle” KEGG pathway ([139]Fig. 7a) and all of the components of the Minichromosome Maintenance Complex, whose increased levels have been observed in two other cancer models [140][61], [141][62], were over-expressed. Regarding the deregulation of cellular energetics, we observed a clear increase in glycolysis and pyruvate/lactate metabolism, which is expelled from the cell, resulting in the acidification and remodeling of the extracellular matrix [142][63], [143][64]. However, the energetic inefficiency of using glycolysis to obtain ATP is compensated by the synthesis of substrates to create cellular building blocks (i.e., lipids, nitrogenous bases and peptides) from carbon skeletons that are obtained from glucose [144][65], [145][66] ([146]Fig. 6). One of the key enzymes in cancer is Pyruvate Kinase M2 because it possesses an extra tyrosine that is phosphorylated, which allows its detection in the phosphoproteomic analysis. This phosphorylation inhibits the positive regulation resulting from the fructose-1,6-bisphosphate level, stimulating the pathway and leaving a large amount of phosphorylated intermediates that can be used for anabolic synthesis and cell growth [147][67]. On the other hand, the huge quantity of glucose that is required by the cell to obtain energy is facilitated by the over-expression of the genes that encode the membrane glucose transporters GTR3, GTR4, GTR8 and GTR14 [148][68]. Glucose remains inside of the cell because it is phosphorylated and converted into glucose-6-phosphate by HXK2, which is also over-expressed in HeLa cells. During this step, glucose-6-phosphate is shunted into the pentose-phosphate pathway and is used for nitrogenous base metabolism, and we found that the enzymes responsible for turning glucose-6-phosphate into riboses and deoxyriboses [149][69] (i.e., G6PI, K6PF, DEOC, RBSK, KPRA, KPRB and PRPS1) were all over-expressed. At the transcriptional level, most of metabolic enzymes are regulated by the action of the c-Myc and HIF-1-alpha transcriptional regulatory networks [150][70], which were over-expressed at the transcriptional, TF and phosphoprotein levels. The “Activation, Invasion and Metastasis” hallmark of cancer is profoundly complex because it reflects the differential expression of diverse adhesion molecules, such as the Claudins. These molecules are involved in a highly complex interplay and have been reported to be both over- and under-expressed in other malignancies ([151]Fig. 7b) [152][71]. One indication of metastatic potential is the appearance of EMT markers. In the present study, we detected the over-expression of OCLN, VIME and 14-3-3Z [153][72], [154][73]. As for transcriptional regulation, the Endotelins pathway was over-regulated due to the expression of EDN2 and EDNRA, which resulted in the crossroads between migration and cell proliferation. Finally, the “Signaling events mediated by VEGF” pathway was over-activated because of the over-expression of VEGFA and VEGFB. This pathway permits angiogenesis, vasculogenesis and endothelial cell growth. Furthermore, it represents a crosstalk among various cancer hallmarks because it induces endothelial cell proliferation, promotes cell migration, inhibits apoptosis and induces permeabilization of blood vessels [155][74]. At the protein level, the family of 14-3-3 signal transducers plays an important role, given that their differential expression affects cell proliferation, evasion of apoptosis, cell adhesion, mitogenic signaling and the EMT [156][75]–[157][78]. The 14-3-3S isoform was found to be under-expressed in the differential expression analysis. This decreased expression supports most models that suggest that this protein functions as a tumor suppressor, and its loss or reduced expression is strongly correlated with a poor prognosis [158][79]. It is induced by DNA damage and is required for stable G2 arrest. It is directly regulated by p53 and has been found to be silenced or diminished in the majority proportion of carcinomas. The inactivation of 14-3-3S also leads to the immortalization of primary keratinocytes [159][80]. At the phosphoprotein level we identified the 14-3-3B, 14-3-3E, 14-3-3F, 14-3-3G, 14-3-3S and 14-3-3Z isoforms. The 14-3-3Z isoform antagonizes the 14-3-3S isoform; the amount of the latter diminishes because of the decrease in the expression of p53 in breast cancer, and its overexpression has been correlated with the EMT, metastasis and cell proliferation. Thus, it has been proposed that this family of signal-transducing proteins is critical for the malignant phenotype [160][81], [161][82]. After conducting the integrative analysis, we discovered that a large number of spliceosome genes were over-expressed. Constituents of the U2 component, including SRSF1, which is considered oncogenic itself, were particularly over-expressed. It has been suggested that any change in the stoichiometry or activity of splicing factors is capable of modifying the proportions of isoforms that normally do not exist or are less abundant in normal cells ([162]Fig. 10). This phenomenon could contribute directly or indirectly to the development, progression and maintenance of cancer. Another hypothesis proposes that diverse RNA-binding proteins possess a wide array of functions, and changes in their expression could trigger oncogenic effects that are unrelated to their original role within the spliceosome [163][18], [164][83]. Figure 10. Spliceosome pathway. [165]Figure 10 [166]Open in a new tab The spliceosome pathway was reconstructed based on the KEGG database and the transcriptomics, over-represented TF networks and identified phosphoprotein analyses. The over-expressed transcripts are displayed in green, the under-expressed transcripts in red and the identified phosphoproteins in purple. Integrating the evidence we obtained at the systems level, we have created a model to show the relationship between the hallmarks of cancer and the signaling, regulatory and metabolic pathways that are differentially expressed in HeLa cells. [167]Figure 6 and Supplementary [168]Figure 1 illustrate what we have termed “meta-pathways.” In these meta-pathways, the reconstructed pathways show their interrelations at the gene regulation, signaling and/or metabolic levels. Each node represents a pathway and is colored according to the two most representative hallmarks. One of the first features that stood out is the large number of signals that sustain the proliferative state of HeLa cells, as well as the great redundancy that can be appreciated among the different hallmarks. These data suggest that this system is extremely robust, as every hallmark is represented more than once and is supported by different pathways. Such behavior would provide an explanation for why directed therapies have not had the expected level of success. When one gene or subset of genes collapses, a bottleneck would immediately develop. This might result in the positive selection of a phenotype that could maintain the hallmarks of cancer via other pathways [169][84], [170][85]. Using the data generated in the present work, we propose that systematic robustness is not entirely random. Enough evidence was found to suggest that there are defined patterns in the activation of cellular circuits that involve interconnections and interferences associated to the hallmarks of cancer, the latter being accurately represented by the meta-pathways we have proposed. These observations allow us to hypothesize that, when a cell expresses abnormal levels of key proteins (e.g., TFs, signal transducers, metabolic enzymes, splicing factors or whole pathways) that should normally be repressed, it is highly likely that, if any of these expression patterns are advantageous for the cell, such patterns will become fixed by selection pressure. These effects will allow the cell to possess a response arsenal that can become more sophisticated with further exposure to selective pressures over time. Methods Gene and Transcript Quantification Raw reads were obtained from the European Nucleotide Archive (ENA) under the study accession ERP000959. This data was described in Nagaraj et al, 2011. This data was sequenced on two Illumina Genome Analyzer IIx lanes using 76+7 cycles. Raw reads of two sequencing lanes were combined, adapters were trimmed, and reads shorter than 70 nt, or with more than five bases below a quality score of 15 (PHRED-scale) were removed. The processed reads were aligned to the human reference genome (hg19/GRCh37 excluding additional haplotypes) using TopHat v1.0.13 [171][86] and transcripts and genes of the Ensembl [172][87] release 59 were quantified using Cufflinks v0.8.3 [173][88]. 50 million single-end 76 bp reads were mapped to the human reference sequence and assembled into 183086 Transcripts [174][89]. Differential Expression Analysis from RNA-seq Data Paired-end RNA-seq data of HeLa-S3 and NHEK cell-lines was downloaded from the ENCyclopediaOf DNA Elements (ENCODE) [175][90] project’s webpage (UCSC accession numbers wgEncodeEH000130 and wgEncodeEH000131 respectively). Fastq files from two HeLa-S3 75×75 paired-end RNA-seq libraries (experiment numbers 10881, 10882) and two NHEK libraries (experiment numbers 10884, 11586) were aligned to the hg19 version of the human reference genome using TopHat v1.4.1 and Gencode annotation Version 12. Default parameters were used and only the read length was modified to 75 bases. The human genome index was built using bowtie v0.12.7 [176][89]. Differentially expressed genes between NHEK and HeLa-S3 were identified using cuffdiff v 1.3.0 from the Cufflinks package. A P-value threshold of 0.01 was set for all significant differentially-expressed genes. Transcriptional Factor Analysis Affymetrix microarray dataset HG-U133A of HeLa and normal epithelium was downloaded from the Gene Expression Omnibus (GEO) [177][91] with accession numbers [178]GSM246123 and [179]GSM246422 respectively. The.cel files were uploaded to the website of MARA. This algorithm normalizes them altogether, assigns PolII promoters and binding sites within them to probe sets present on the microarray, and runs the TF activity analysis. Association of Transcriptional Targets with their RNA Expression Level The information of the genes that were reported as targets of each differentially activated TF, as reported by MARA, was associated with their particular levels of expression according to the RNA-Seq analysis. Individual MARA reports for each gene were parsed through ad-hoc scripts written in Perl 5.12.4. The processed information, as well as the data from the RNA-Seq results, were poured into a relational database built in MySQL Server 5.5 (Community Edition). Subsequently, a query for each relevant TF was conducted, relating the targets that corresponded to every TF with RNA-Seq data by joining the corresponding UniProt identifiers. The outputs were stored in plain text files and analyzed with spreadsheet software. Network Reconstruction The network reconstruction was performed with the aid of the Cytoscape plug-in BisoGenet, using the identified proteins as bait nodes and adding edges with the following parameters: Organism> Homo sapiens, protein identifiers only; Data Settings>protein-protein interactions; all data sources and all experimental methods; method> By adding edges connecting input nodes and as Output>Proteins. Pathway and GO Enrichment Analysis Enrichment was done employing ConsensusPathDB, of the Max Planck Institute for Molecular Genetics, by using the overrepresentation analysis online tool. As input, we uploaded the UNIPROT protein identifiers of all the elements of the total expression analysis, differential expression analysis, TF networks and identified phosphoproteins. We searched for pathways as defined by PID and KEGG, with a minimal overlap with the input list of 5 and a P-value cutoff at 0.0001. Also, employing the same website and the same analysis tool, we performed an enrichment analysis based on Gene Ontology level 3 categories of “Biological processes”. For this analysis, we considered only the identified core proteins and set the p-value cutoff at 0.00001. Cell Culture The HeLa cell line was provided by the oncology laboratory of the Centro Médico Siglo XXI which belongs to the Instituto Mexicano del Seguro Social. The HeLa cell line was cultured in RPMI-advanced 1640 serum-free media (Gibco BRL, USA) with red phenol and antibiotic-antimycotic solution (10,000 units penicillin, 10 mg streptomycin, and 25 µg amphotericin B per mL), supplemented with 1% fetal bovine serum (Invitrogen, Carlsbad, CA) and 200 mM of GlutaMAX (Invitrogen). The cells were incubated in 5% of CO[2] and humidity saturation at 37°C in culture flasks of 75 cm^2 (NalgeNunc International, Rochester, NY). Cells were harvested at 70% confluence with Verseno solution (Tris base 25 mM, NaCL 136.8 mM, KCl 5.36 mM, EDTA 1 mM pH7.7) and washed 3 times in phosphate buffer saline (0.1 M sodium phosphate and 0.15 M NaCl in one liter, pH 7.2). Enrichment of Phosphoproteins Protein phenol extraction was performed [180][92]. The protein extraction was resuspended in incubation buffer and then the metal-oxide affinity chromatography (MOAC) [181][93]. The phospho-protein fraction was eluted and precipitated [182][94], and then a Sodium dodecyl sulfate-polyacrylamide Gel Electrophoresis (SDS-PAGE) was conducted. Gels were stained with Coomassie Blue G-250. All bands were excised with a razor blade and the tryptic digestion was performed. Chromatographic Separation The tryptic peptides (8 ul) were desalted and concentrated on a Zorbax (Agilent 5065-9913) prior to analysis on a reverse-phase column (Agilent Zorbax 300SB C18, 3.5 um, 150×0.075 mm). Separation was performed at 400 nL/min using a lineal gradient. Mobile phase A was water with 0.1% formic acid by volume. Mobile phase B was acetonitrile with 0.1% formic acid by volume. The gradient conditions in the chromatographic run were set up as follow: A 95% (0 min) to 95% (14 min); A 95%(14 min) to 60% (54 min); A 60% (54 min) to 20% (56 min); A 20% (56 min) to 20% (61 min); A 20% (61 min) to 95% (62 min); and A 95% (62 min) to 95% (72 min). MS/MS Analysis Proteins were analyzed by MS/MS using a nanoflow chromatograph (Agilent 1100 nano pump G2226A) coupled to a hybrid triple quadrupole linear ion trap (QTRAP 3200, AB Sciex) equipped with a Nanospray II source and using Information Dependent Acquisition (IDA). Precursor ion determination was carried out using an Enhanced MS scan over a mass range of 300–1600 m/z at 4,000 amu/s (with not trapping in Q0 and Dynamic fill time) with an ion spray voltage of 3300 applied to a Picotip FS360-75-15-N with ion spray gas (nitrogen). Precursor ions were collided in Q2 using rolling collision energy (maximum allowed CE = 80). Enhanced product ion scans (MS/MS) were performed over a mass range of 100–1700 m/z at 1000 amu/sec, and collision voltages were determined dynamically. All precursor ion mass/charge ratios were confirmed with an Enhance Resolution scan. Protein identification was done by using the Mascot algorithm ([183]http://www.matrixscience.com), with the SwissProt database; the search parameters included trypsin digestion, MS/MS ion search, monoisotropic mass application, protein mass unrestricted, peptide mass tolerance of ±1.2 Da, fragment mass tolerance of ±0.6 Da and Max Missed Cleavages of 1. The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: hAB36ZvUMqsCgDBALN0mJQ4dts+h4YiAIk5ZSasBjaG2TKhIztzBenjpxYlZaoeq41YheUt 9ahhLnC2iPCGKy0SDsGwAAAAAAAAntw =  = . Supporting Information Figure S1 An analysis was conducted combining the obtained signaling and transcriptional regulation pathways; the edges indicate the regulatory or hierarchical relationship, and the nodes indicate the pathway. The colors denote each of the hallmarks of cancer, with the two most representative hallmarks indicated per node. (TIFF) [184]Click here for additional data file.^ (742.7KB, tiff) File S1 Supporting Tables S1–S12. Table S1: Enriched gene ontology level 3 categories of biological processes of total expression analysis. Table S2: Enriched gene ontology level 3 categories of biological processes of over-expressed genes. Table S3: Enriched gene ontology level 3 categories of biological processes of over-expressed-TFs networks. Table S4: Enriched gene ontology level 3 categories of biological processes of under-expressed-TFs networks. Table S5: Enriched PID pathway-based sets of over-expressed transcripts. Table S6: Enriched PID pathway-based sets of sub-expressed transcripts. Table S7: list of over and under expressed transcripts members of the regulatory network “Direct effectors of p53”. Table S8: Enriched KEGG pathway-based sets of under-expressed transcripts. Table S9: Phosphoproteins relationship according to which replicates were identified. Table S10: Enriched gene ontology level 3 categories of biological processes of identified phosphoproteins. Table S11: Enriched PID pathway-based sets of identified phosphoproteins. Table S12: Enriched KEGG pathway-based sets of identified phosphoproteins. (PDF) [185]Click here for additional data file.^ (1MB, pdf) Spreadsheet S1 Statistical data of the RNA-Seq analysis shows the full dataset, and the data with significant differential expression. (XML) [186]Click here for additional data file.^ (13.7MB, xml) Acknowledgments