Abstract

Pediatric cancer (PC), that is, cancer occurring in children, is the leading cause of death among children worldwide, with an incidence of 175,000 per year. Elucidating the genetic abnormalities and underlying cellular mechanisms may lead to less toxic curative treatments. Therefore, it is important to understand the pathology of pediatric cancer at the genetic, genomic and epigenetic levels. To unveil the cellular complexity of PC, we have developed a database of pediatric cancers (Pedican), the first literature-based pediatric gene data resource, built through comprehensive literature curation and data integration. In the current release, Pedican contains 735 human genes, 88 gene fusion events and 24 chromosomal abnormality events curated from 2245 PubMed abstracts. Pedican provides detailed annotations for each gene, such as Entrez gene information, involved pathways, protein–protein interactions, mutations, gene expression, methylation sites, TF regulation, and post-translational modification. Additionally, Pedican has a user-friendly web interface that supports sophisticated text queries, sequence searches, and browsing by highlighted literature evidence and by hundreds of cancer types. Overall, our curated pediatric cancer-related gene list maps the genomic and cellular landscape for various pediatric cancers, providing a valuable resource for further experiment design. Pedican is available at http://pedican.bioinfo-minzhao.org/.

Pediatric cancer (PC) is the second leading cause of death among children 5 to 14 years of age in the United States, trailing only fatal accidents^1. It is also estimated that 175,000 children (younger than 15 years) are diagnosed with cancer worldwide each year^1. Fewer than 40% of patients (mainly those from high-income countries) are able to receive adequate treatment^2,3. In addition, children with cancer are at high risk of mental health problems.
Although the survival rate of PC has continuously improved through the use of radiotherapy and chemotherapy, the adverse effects may substantially affect the quality of life of survivors^4,5. Elucidating the genetic abnormalities and underlying cellular mechanisms that initiate the cancer may enable earlier diagnosis and less toxic treatments. Therefore, it is important to understand the pathology of pediatric cancer at the genetic, genomic and epigenetic levels. The pioneering effort, the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI), began collecting PC patients’ medical records, including the incidence of childhood cancer in the United States, in 1975 and has gathered large amounts of information on survival, gender differences, and geographical distribution^6. The accumulated single gene-based association studies showed that PCs are distinct from adult cancers^4. More recently, in 2010, population-based genetic screening was initiated by St. Jude Children’s Research Hospital and Washington University through the Pediatric Cancer Genome Project (PCGP)^7. As the world’s largest genetic analysis of PC, PCGP created the first genetic landscape of 15 major PCs by next-generation sequencing at a cost of about $65 million.

However, the PCGP focused on the major PC types. The official PCGP website provides PCGP data only and does not contain information from the published literature. Another pediatric web resource, pond4kids, consists of hospital-based cancer registration and clinical information and does not include patient genetic data. Genetic abnormalities related to other harmful PCs are scattered in the literature without systematic collection and comparison. In this study, we integrated known genetic predisposition information from thousands of cases in the literature to complement the population-based study from PCGP.
To this end, 2245 PC-related PubMed abstracts were collected and manually curated, resulting in 735 PC-related human genes, 88 gene fusion events, and 24 chromosome-level events. Moreover, we provide comprehensive biological annotation of pathways, gene regulation, interactions and expression in a user-friendly way, which may help the PC community better understand the pathogenesis of various PCs and may even facilitate gene prioritization and prediction for PCs. In addition, this data resource makes it feasible to compare the genetic differences between cancers in children and adults.

Results

Functional enrichment analyses pinpoint development-related NOTCH1, FGFR and GAB1 signaling transduction in PC

To explore the biological processes relevant to our collected genes, we used gene-set enrichment analysis to test whether the 735 PC-related genes carried any significantly over-represented annotations compared with all human protein-coding genes. Using a strict cutoff (corrected p-value less than 0.01 and the annotated genes more than 30% of all PC-related genes), we identified 35 significantly enriched pathways (Table S1) and 170 gene ontology terms (Table S2). The enriched pathways are mainly cancer-related, such as transcriptional misregulation, constitutive PI3K/AKT signaling, proteoglycans and the p53 signaling pathway (Table 1). Notably, the top enriched gene ontology terms are all related to developmental processes, such as cell fate commitment, gland development, regulation of organ morphogenesis, stem cell proliferation, mesenchyme development, and morphogenesis of a branching epithelium.

Table 1. The statistically significant enriched pathways of PC-related genes.
Pathway                                                    Adjusted P-value*

KEGG pathway
Pathways in cancer                                         7.78E-15
Bladder cancer                                             4.22E-07
Transcriptional misregulation in cancer                    1.23E-06
Melanoma                                                   1.56E-06
Prostate cancer                                            3.67E-06
Colorectal cancer                                          2.03E-05
Chronic myeloid leukemia                                   2.16E-05
Hepatitis B                                                4.00E-05
Proteoglycans in cancer                                    6.62E-05
Endometrial cancer                                         0.00022523
Glioma                                                     0.000346597
Pancreatic cancer                                          0.000416823
Thyroid cancer                                             0.000563627
Non-small cell lung cancer                                 0.000715898
Renal cell carcinoma                                       0.001631639
p53 signaling pathway                                      0.002305302
Acute myeloid leukemia                                     0.004965593

Reactome pathway
Constitutive PI3K/AKT Signaling in Cancer                  5.27E-06
PI3K/AKT activation                                        1.11E-05
Signaling by SCF-KIT                                       1.34E-05
Signaling by FGFR                                          1.91E-05
PI-3K cascade                                              2.04E-05
PIP3 activates AKT signaling                               2.04E-05
PI3K events in ERBB2 signaling                             2.04E-05
PI3K/AKT Signaling in Cancer                               2.04E-05
PI3K events in ERBB4 signaling                             2.04E-05
Downstream signaling of activated FGFR                     2.10E-05
Signaling by ERBB4                                         2.14E-05
GAB1 signalosome                                           3.45E-05
Role of LAT2/NTAL/LAB on calcium mobilization              4.12E-05
Downstream signal transduction                             4.55E-05
NOTCH1 Intracellular Domain Regulates Transcription        0.000662652
Constitutive Signaling by NOTCH1 HD+PEST Domain Mutants    0.001169199
Constitutive Signaling by NOTCH1 PEST Domain Mutants       0.001169199

Note: *Adjusted P-values: the P-values of the hypergeometric test were corrected by Benjamini-Hochberg multiple testing correction.

The pathway analysis also confirmed the gene ontology result. The PC-related genes were highly enriched in development-related signaling pathways such as NOTCH1 intracellular domain regulates transcription, constitutive signaling by NOTCH1 PEST domain mutants, downstream signaling of activated FGFR, and the GAB1 signalosome. The Notch signaling pathway has a dual role in cancer, with both oncogenic and tumor suppressor functions^8. It is hypothesized that Notch modulates the epithelial-mesenchymal transition (EMT) during cancer metastasis^9.
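According to the note under Table 1, the adjusted P-values come from a hypergeometric test followed by Benjamini-Hochberg correction. The following is a minimal sketch of that calculation in plain Python, not the authors' actual pipeline: the function names (`hypergeom_sf`, `benjamini_hochberg`, `enrich`), the gene identifiers, and the two toy pathways are all hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    M: background size, n: background genes annotated to the pathway,
    N: size of the gene list, k: gene-list members found in the pathway.
    """
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(gene_list, background, pathways):
    """Test every pathway for over-representation in gene_list."""
    universe = set(background)
    genes = set(gene_list) & universe
    names = sorted(pathways)
    raw = []
    for name in names:
        members = set(pathways[name]) & universe
        overlap = len(genes & members)
        raw.append(hypergeom_sf(overlap, len(universe), len(members), len(genes)))
    adjusted = benjamini_hochberg(raw)
    return sorted(zip(names, raw, adjusted), key=lambda row: row[2])


# Toy data (hypothetical): 20 background genes, a 5-gene list, two pathways.
background = [f"G{i}" for i in range(1, 21)]
gene_list = ["G1", "G2", "G3", "G4", "G5"]
pathways = {
    "toy_pathway_A": ["G1", "G2", "G3", "G4"],
    "toy_pathway_B": ["G5", "G10", "G11", "G12", "G13", "G14"],
}

results = enrich(gene_list, background, pathways)
for name, p_raw, p_adj in results:
    print(f"{name}\traw={p_raw:.3g}\tadjusted={p_adj:.3g}")
```

In a real analysis the background would be all human protein-coding genes and the pathway sets would come from resources such as KEGG or Reactome; a cutoff such as an adjusted P-value below 0.01 would then be applied to the last column.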
However, the role of Notch signaling in PCs has only been studied in childhood T cell acute lymphoblastic leukemia (T-ALL)^10. More extensive studies of Notch signaling in other PCs will provide a rationale for Notch-based therapeutic strategies.

FGFR is the receptor for fibroblast growth factors (FGFs), which are often relevant to cell stemness, proliferation, anti-apoptosis, drug resistance, and angiogenesis^11. In Pedican, four FGFRs (FGFR1, FGFR2, FGFR3, FGFR4) are recorded as related to PCs. For example, FGFR1 was reported to be associated with tumorigenesis of Ewing’s sarcoma^12 and rhabdomyosarcoma^13. FGFR inhibitors have been shown to help overcome drug resistance, so FGFR-based therapeutic strategies are promising. More systematic studies using a targeted-sequencing approach will be useful for detecting additional candidate mutations in other PCs.

GAB1 is a docking protein that transduces cellular signals from receptor tyrosine kinases, such as Met (the hepatocyte growth factor receptor) and EGFR (the epidermal growth factor receptor). The role of the GAB1 signalosome in cancer has only been reported in breast^14 and colorectal cancers^15. Although GAB1 itself is not included in Pedican because there is no direct link between GAB1 and any PC, other components of the GAB1 signalosome are enriched among our 735 PC-related genes, such as PDGFB, PDGFA, EGFR, MDM2, CDK4 and PDGFRA. In summary, our results highlight that multiple cellular signaling events are related to PCs, especially NOTCH, FGFR and GAB1 signaling. GAB1 is therefore a good candidate gene for functional testing in PCs and adult cancers.

PC-related genes are enriched in adult cancers, preterm birth and high birth weight

Although previous studies show that PCs are different from their corresponding adult cancers^4, our disease-based enrichment analysis still shows connections between the PC-related genes and a broad spectrum of human adult cancers (Table S3).
Although the enrichment analysis of PC-related genes cannot measure how much the underlying molecular mechanisms of PCs and adult cancers have in common, it may imply that the overall signaling pathways of PCs are similar to those of adult cancers. The cancers involved mainly include breast, colorectal, lung, stomach, esophageal, bladder, prostate, pancreatic, cervical, liver and ovarian cancers, as well as leukemia, melanoma and glioma. Systematic comparison of PCs with adult cancers may provide a more comprehensive picture of the molecular mechanisms they share.

Most interestingly, the 735 PC-related genes are also over-represented in endometriosis, type 1 diabetes (T1D), benzene toxicity, primary biliary cirrhosis, preterm birth, and high birth weight. The positive association of high birth weight with both childhood and adult cancers has been shown by several studies^16,17,18,19,20,21. Although a link between preterm birth and an increased incidence of breast cancer in the mother has been discussed previously^22,23, there is no direct evidence linking preterm birth to PCs. Our enrichment analysis may provide a clue for further exploration of the potential role of preterm birth in PCs. Further data mining of Pedican may therefore help clarify the potential roles of birth weight and preterm birth in both PCs and adult cancers, including changes in hormone signaling during cancer development.

Prioritize the key genes in PC and their mutational landscape in pan-cancer genomic data

To systematically evaluate the importance of PC-related genes, we ranked the genes with Endeavour, using 47 reliable genes as a training set (see Methods). The top ten ranked genes included CDK4, CCND2, IGF1R, PDGFRB, CHEK2, CASP10, ERBB3, ATR, and E2F1. Not surprisingly, the majority of these top-ranked genes are involved in key cancer pathways such as the cell cycle and the p53 signaling pathway.
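The idea behind training-set-based gene ranking can be illustrated with a deliberately simplified sketch. This is not Endeavour's actual algorithm: it assumes each data source is just a mapping from gene to a set of annotation labels, scores a candidate by its mean Jaccard similarity to the training genes, and fuses the per-source rankings by mean rank. The training genes, annotation labels and function names below are hypothetical toy values.

```python
def jaccard(a, b):
    """Jaccard similarity between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_source(candidates, training, annotations):
    """Rank candidates (1 = best) by mean similarity to the training genes."""
    scores = {}
    for gene in candidates:
        sims = [
            jaccard(annotations.get(gene, set()), annotations.get(t, set()))
            for t in training
        ]
        scores[gene] = sum(sims) / len(sims)
    # Ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(candidates, key=lambda g: (-scores[g], g))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def prioritize(candidates, training, sources):
    """Fuse the per-source rankings by their mean rank."""
    per_source = [rank_by_source(candidates, training, ann) for ann in sources.values()]
    mean_rank = {g: sum(r[g] for r in per_source) / len(per_source) for g in candidates}
    return sorted(candidates, key=lambda g: (mean_rank[g], g))


# Toy inputs (hypothetical annotations, for illustration only).
training = ["TP53", "RB1"]
candidates = ["CCND2", "CDK4", "GENE_X"]
sources = {
    "pathway": {
        "TP53": {"cell_cycle", "p53_signaling"},
        "RB1": {"cell_cycle"},
        "CDK4": {"cell_cycle", "p53_signaling"},
        "CCND2": {"cell_cycle", "growth"},
        "GENE_X": {"metabolism"},
    },
    "gene_ontology": {
        "TP53": {"apoptosis", "cell_cycle_arrest"},
        "RB1": {"cell_cycle_arrest"},
        "CDK4": {"cell_cycle_arrest"},
        "CCND2": {"cell_cycle_arrest", "proliferation"},
        "GENE_X": {"transport"},
    },
}

ranking = prioritize(candidates, training, sources)
print(ranking)
```

A production tool would use many more data sources and a statistically grounded fusion of the per-source rankings; the mean-rank fusion here only shows the overall shape of the approach.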
Although our collected genes have been shown to have abnormal gene expression or other functional relevance to PCs, their genetic variants have not yet been systematically examined across pan-cancer data. These mutational patterns are useful for comparing PCs with their adult counterparts. As shown in Fig. 1, the top 100 ranked PC-related genes (the 47 genes from the training set plus the 53 top-ranked genes from Endeavour) are frequently mutated in adult cancers. Interestingly, the 100 genes are over 90% mutated in several cancers and cell lines, including colorectal cancer, small cell lung cancer, bladder cancer, uterine cancer, ovarian cancer, squamous cell lung cancer, glioblastoma multiforme, pancreatic cancer, prostate cancer and melanoma. This result suggests that PCs share substantial molecular mechanisms with adult cancers. Further comparison between a specific PC and its corresponding adult cancer may provide more clues.

Figure 1. The mutational landscape for the top 100 PC-related genes in multiple cancers.

The PC-related protein-protein interaction network is highly modularized

Using the integrative protein-protein interaction data from the Pathway Commons database^24, we performed a pathway reconstruction to present a cellular map related to PC. The reconstructed PC-related protein-protein interaction network contains 819 genes and 7720 gene-gene interactions supported by existing evidence from known biological pathways (Fig. 2). Among the 819 nodes, 725 are from our curated 735 PC-related genes. The remaining 94 are linker genes that bridge the PC-related genes to form a fully connected map. Therefore, the majority of curated PC-related genes are organized in a highly modular structure. This not only supports the precision of our data curation but also reveals that the PC-related genes act in a high-density cellular module.

Figure 2. Reconstructed PC map using protein-protein interaction data. (A) The 335 genes in red are genes from the core dataset in Pedican. The remaining 36 genes in orange are linker genes that bridge the 335 genes; (B) the degree distribution; (C) the shortest path length frequency; (D) the correlation between closeness centrality and the number of neighbours.

The common cancer genes across multiple PC types

On the basis of information from the literature, we annotated every gene in Pedican with a specific cancer type. We classified all the PC types into 17 major groups according to anatomic and biological function: bone, cardiovascular, connective tissue, dermatological, developmental, ear/nose/throat, endocrine, gastrointestinal, genitourinary, hematological, immunological, muscular, neurological, ophthalmological, related syndrome, renal, and unclassified. The majority of PC-related genes fall in the neurological (357) and hematological (220) groups. Based on the genes common to the 17 PC groups, the overlapping relationships are plotted in Fig. 3. The plot reveals that multiple cancer groups potentially share molecular mechanisms. For instance, 58 genes are common to neurological-related and hematological-related cancers.

Figure 3. The shared genes across multiple PCs. The length of each circularly arranged segment is proportional to the total number of genes in that PC group. The ribbons connecting different segments represent the number of genes shared between PC groups. The three outer rings are stacked bar plots that represent the relative contribution of other PC groups to each PC group total.

Conclusion

Pedican is constructed as a free database and analysis server that enables users to rapidly search and retrieve summarized PC-related genes.
The functional enrichment analyses reveal that multiple developmental processes are associated with PC-related genes across various cancer types. Our curated gene list provides clues for discovering common driver genes across multiple PCs and for exploring the differences between adult cancers and their counterpart PCs. Pedican is freely accessible at http://Pedican.bioinfo-minzhao.org/.

Limitations and future work

This study aims to integrate literature and genomic data to explore the common mechanisms of different pediatric cancers. Compared with other public databases, our pediatric cancer database provides a curated, organized, and annotated gene list for pediatric cancer in an easily accessible way. From our web interface, users can not only find the reported genes related to pediatric cancer with their original references, but also obtain more comprehensive knowledge about these