Abstract Background The morbidity of thyroid carcinoma has been rising worldwide and increasing faster than any other cancer type. The most common subtype with the best prognosis is papillary thyroid cancer (PTC); however, the exact molecular pathogenesis of PTC is still not completely understood. Methods In the current study, 3 gene expression datasets ([35]GSE3678, [36]GSE3467, and [37]GSE33630) and 2 miRNA expression datasets ([38]GSE113629 and [39]GSE73182) of PTC were selected from the Gene Expression Omnibus (GEO) database and were further used to identify differentially expressed genes (DEGs) and deregulated miRNAs between normal thyroid tissue samples and PTC samples. Then, Gene Ontology (GO) and pathway enrichment analyses were conducted, and a protein-protein interaction (PPI) network was constructed to explore the potential mechanism of PTC carcinogenesis. The hub gene detection was performed using the CentiScaPe v2.0 plugin, and significant modules were discovered using the MCODE plugin for Cytoscape. In addition, a miRNA-gene regulatory network in PTC was constructed using common deregulated miRNAs and DEGs. Results A total of 263 common DEGs and 12 common deregulated miRNAs were identified. Then, 6 significant KEGG pathways (P < 0.05) and 82 significant GO terms were found to be enriched, indicating that PTC was closely related to amino acid metabolism, development, immune system, and endocrine system. In addition, by constructing a PPI network and miRNA-gene regulatory network, we found that hsa-miR-181a-5p regulated the most DEGs, while BCL2 was targeted by the most miRNAs. Conclusions The results of this study suggested that hsa-miR-181a-5p and BCL2 and their regulatory networks may play important roles in the pathogenesis of PTC. Keywords: Papillary thyroid cancer (PTC), Differentially expressed genes (DEGs), Deregulated miRNAs, BCL2, hsa-miR-181a-5p Background The morbidity of thyroid carcinoma has been rising worldwide and increasing faster than any other type of cancer, mainly due to the increasing use of diagnostic equipment [[40]1]. It was reported that in 2017 thyroid cancer was the fifth most common cancer among American women [[41]2]. Remarkably, thyroid cancer is 3 times more common in women than in men [[42]2]. Similarly, thyroid cancer was the fastest growing cancer among Chinese from 1988 to 2013. Between 1988 and 2013, the incidence of thyroid cancer increased by an average of 14.73% per year in men and 18.98% per year in women [[43]3]. Papillary thyroid cancer (PTC) is the most common subtype of differentiated thyroid cancers (DTCs), with the best overall prognosis [[44]4]. However, the biggest challenge in treating PTC is to identify an easy method for the early recognition of benign or malignant nodules and the detection of overtreatment. In recent years, a number of risk genes of PTC have been identified, including Cbp/p300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 1 (CITED1) [[45]5], LDL receptor related protein 4 (LRP4) [[46]6] and tektin 4 (TEKT4) [[47]7], whose downregulation can significantly inhibit the proliferation, migration, and invasion of PTC cells. Interleukin 17 receptor A (IL17RA) polymorphisms, which play an important role in tumor development, have been found to influence both the unilateral and bilateral development of PTC [[48]8]. In addition, increased expression of flavin-containing monooxygenase 1 (encoded by the FMO1 gene) could serve as a biomarker that independently predicts favorable recurrence-free survival in classical PTC patients [[49]9]. These observations suggest that an increasing number of genes are crucial for the pathogenesis of PTC. In addition, miRNAs also play an important role in the pathogenesis of various cancers, especially PTC. For example, miR-524-5p can inhibit cell migration, invasion, and apoptosis by targeting FOXE1 and ITGA3 in PTC [[50]10]. In addition, miR-215 was found to target ARFGEF1 and inhibit the proliferation and metastasis of PTC by regulating the epithelial-mesenchymal transformation [[51]11]. Furthermore, miR-509 [[52]12], miR-1270 [[53]13], miR-128 ,[[54]14] and many other miRNAs can inhibit PTC by targeting specific genes. These studies focused on one specific gene or miRNA; however, the comprehensive view of how these miRNAs and genes affect PTC remains unknown. The aim of our study was to screen significant gene and miRNA changes through bioinformatics methods to provide guidance for the study of PTC mechanisms and clinical treatment. In this study, 3 gene expression datasets ([55]GSE3678, [56]GSE3467, and [57]GSE33630) and 2 miRNA expression datasets ([58]GSE113629 and [59]GSE73182) (Sample analysis was shown in Additional file [60]1: Figure S1) of PTC were selected from the GEO database that were further used to identify DEGs and deregulated miRNAs between normal thyroid tissue samples and PTC samples. As a result, 263 DEGs and 12 deregulated miRNAs were identified based on the criteria we set. Then, GO and pathway enrichment analyses were conducted, and a PPI network was constructed to explore the potential mechanism of PTC carcinogenesis. The hub gene detection was performed using the CentiScaPe v2.0 plugin, and significant modules were discovered using the MCODE plugin for Cytoscape. In addition, a miRNA-gene regulatory network of PTC was constructed using common deregulated miRNAs and DEGs, and we found that hsa-miR-181a-5p regulated the most DEGs, while BCL2 was targeted by the most miRNAs in this network. However, the specific mechanisms of how hsa-miR-181a-5p could regulate BCL2 need further experiments. In conclusion, hsa-miR-181a-5p and BCL2 are expected to be distinctive biomarkers of benign or malignant tumors and potential therapeutic targets of PTC. Methods Acquisition of gene and miRNA expression profile microarray data The microarray data were acquired from the Gene Expression Omnibus (GEO) database ([61]www.ncbi.nlm.nih.gov/geo) [[62]15]. Three gene expression datasets ([63]GSE3678, [64]GSE3467, and [65]GSE33630) and 2 miRNA expression datasets ([66]GSE113629 and [67]GSE73182) of PTC were included in this study. Dataset [68]GSE3678 included 7 PTC samples and 7 paired normal thyroid tissue samples; dataset [69]GSE3467 included 9 PTC patients with paired tumor and normal thyroid tissue; and dataset [70]GSE33630 included 49 PTC samples and 45 normal thyroid tissue samples. These 3 gene expression datasets were all based on the platform of [71]GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array [[72]16–[73]19]. The miRNA dataset [74]GSE113629, based on the [75]GPL24741 Agilent-070156 Human_miRNA_V21.0_Microarray 046064 platform, included matched neoplasms and normal thyroid tissues from 5 patients with PTC. The [76]GSE73182 dataset based on the [77]GPL20194 Agilent-035758 Human miRBASE 16.0 plus 031181 platform included 19 primary papillary thyroid carcinomas and 5 normal thyroids [[78]20, [79]21]. Identification of DEGs and deregulated miRNA The interactive web tool GEO2R ([80]www.ncbi.nlm.nih.gov/geo/geo2r) was used to screen the DEGs and deregulated miRNAs between normal thyroid tissue samples and PTC samples [[81]15]. The Benjamin and Hochberg false discovery rate (FDR) method was used to correct the adjusted P value and correct the occurrence of false positive results. The cutoff standard was defined as P value< 0.01, adjusted P value < 0.01 and |logFC| > 1. GO terms and KEGG pathway enrichment analysis Pathway data were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database ([82]https://www.kegg.jp/) [[83]22] to examine specific pathways. To identify the Gene Ontology (GO) annotation and pathways in which DEGs were enriched, functional annotation tools were used for GO terms and KEGG pathway enrichment analysis in Database for Annotation, Visualization and Integrated Discovery (DAVID) ([84]https://david.ncifcrf.gov/) [[85]23]. The significance level of KEGG pathway enrichment was calculated using a cutoff of P value < 0.05. A GO term was considered significantly enriched if it showed a P value < 0.05. Construction of the PPI network The PPI network of DEGs was identified using the Search Tool for the Retrieval of Interacting Genes (STRING) ([86]http://string-db.org/) database [[87]24]. Subsequently, Cytoscape software (v 3.6.1) was used to visualize the PPI network. Topological measurements of the PPI network Several common topological measurements were investigated to reveal the basic features of the PPI network. The node degree, neighborhood connectivity, topological coefficients, and clustering coefficients were analyzed for the whole network. Hub gene identification and module analysis of the PPI network The hub genes in this PPI network were defined as nodes with a connective degree > 10 and identified using the CentiScaPe v2.0 plugin for Cytoscape [[88]25]. The most significant modules in the PPI network were identified by the Molecular Complex Detection (MCODE) plugin [[89]26] with MCODE scores ≥ 5, degree cutoff = 2, node score cutoff = 0.2, Max depth = 100 and k-core = 3. Construction of a network of deregulated miRNAs targeting DEGs The miRNA-gene target data were extracted from miRTarBase [[90]27] ([91]http://mirtarbase.mbc.nctu.edu.tw/php/index.php). When miRNA-gene pairs showed strong evidence of interaction in humans and matched the DEGs we identified, they were selected. Subsequently, Cytoscape software (v 3.6.1) was used to visualize the miRNA-gene network. Results Identification of DEGs and deregulated miRNAs in PTC The gene expression datasets [92]GSE3678, [93]GSE3467, and [94]GSE33630 were acquired from the GEO database. DEGs between normal thyroid tissue and PTC samples were screened using GEO2R. As a result, 436, 653, and 1237 DEGs were identified from the [95]GSE3467, [96]GSE3678, and [97]GSE33630 datasets, respectively. Volcano plots were generated for the 3 gene expression datasets for intuitive representation of the DEGs (Fig. [98]1a–c). The green plots represent downregulated DEGs, the red plots represent upregulated DEGs, and the black plots are not DEGs. In addition, Venn diagrams were also drawn for comparison of the number of total DEGs, upregulated DEGs, and downregulated DEGs in the three datasets. As a result, 263 common DEGs were obtained (Fig. [99]1d), comprising 120 coupregulated genes and 143 codownregulated genes (Fig. [100]1e, f). Fig. 1. [101]Fig. 1 [102]Open in a new tab Volcano plot and Venn diagram of DEGs and deregulated miRNAs in gene/miRNA expression profiling datasets. a–c Volcano plots of DEGs in normal thyroid tissue and PTC samples in the [103]GSE3467, [104]GSE3678, and [105]GSE33630 datasets. d–f Venn diagrams illustrating the number of all upregulated and downregulated DEGs in three gene expression datasets. The intersection in the center represents the common DEGs among the three datasets. g, i Volcano plots of deregulated miRNAs in normal thyroid tissue and PTC samples in [106]GSE73182 and [107]GSE113629 datasets, respectively. h Venn diagrams of all deregulated miRNAs in the two datasets. The intersection in the center represents the common deregulated miRNAs between the two datasets. DEGs and deregulated miRNAs were selected by P value < 0.01 and |logFC| > 1. The x-axis shows the fold-change in gene/miRNA expression, and the y-axis shows the statistical significance of the differences. Colors represent different genes/miRNAs: black for genes/miRNAs without significantly different expression, red for significantly upregulated genes/miRNAs and green for significantly downregulated genes/miRNAs. PTC, papillary thyroid cancer; FC, fold change The miRNA datasets [108]GSE113629 and [109]GSE73182 were also analyzed to screen deregulated miRNAs. The volcano plots showed that 16 and 2008 deregulated miRNAs were identified from the [110]GSE73182 and [111]GSE113629 datasets, respectively (Fig. [112]1g, i). Among them, 8 downregulated miRNAs and 8 upregulated miRNAs in the [113]GSE73182 dataset, 96 downregulated miRNAs and 1912 upregulated miRNAs in the [114]GSE113629 dataset, were identified. In addition, a Venn diagram was generated for comparison of the number of deregulated miRNAs in the two miRNA datasets; thus, 12 common deregulated miRNAs were acquired (Fig. [115]1h). Pathway enrichment analysis of common DEGs With the aid of the David database, KEGG pathway enrichment analysis was performed using common DEGs. As a result, 6 KEGG pathways (P < 0.05) were significantly enriched, and the top 6 pathways are shown in Fig. [116]2a. These common DEGs were identified to be enriched in the pathways of ‘Tyrosine metabolism’, ‘Pathways in cancer’, ‘Small cell lung cancer’, ‘Axon guidance’, ‘Complement and coagulation cascades’ and ‘Adipocytokine signaling pathway’. The most significant pathway ‘Tyrosine metabolism’ is shown in Fig. [117]2b. In addition, a classification of these pathways was also performed, which showed that the six pathways were divided into three major categories (Metabolism, Human Diseases and Organismal Systems) and five minor categories (Amino acid metabolism, Cancers, Development, Immune system and Endocrine system) (Table [118]1). These results indicated that PTC was closely related to amino acid metabolism, development, immune system, and endocrine system. Fig. 2. [119]Fig. 2 [120]Open in a new tab Pathway enrichment analysis of common DEGs. a Top six significant pathways of DEGs with KEGG enrichment (P < 0.05). b In-depth study of the top PTC-associated pathway (hsa00350: Tyrosine metabolism). DEGs, differentially expressed genes; KEGG, Kyoto encyclopedia of genes and genomes; PTC, papillary thyroid cancer Table 1. Six significant KEGG pathways and their classification Pathway P value Classification hsa00350:Tyrosine metabolism 0.002602 Metabolism; Amino acid metabolism hsa05200:Pathways in cancer 0.005643 Human Diseases; Cancers: Overview hsa05222:Small cell lung cancer 0.013589 Human Diseases; Cancers: Specific types hsa04360:Axon guidance 0.019287 Organismal Systems; Development hsa04610:Complement and coagulation cascades 0.027889 Organismal Systems; Immune system hsa04920:Adipocytokine signaling pathway 0.029209 Organismal Systems; Endocrine system [121]Open in a new tab GO enrichment analysis of common DEGs To better understand common DEGs, a GO enrichment analysis was also performed using DAVID. As a result, a total of 82 significant GO terms were identified with a cutoff of P < 0.05. The top biological processes (BPs), cellular components (CCs), and molecular functions (MFs) are shown in Fig. [122]3 a, b, and c, respectively. A total of 57 BPs, 16 CCs, and 9 MFs were included in the 82 significant GO terms (Fig. [123]3d). The top 10 BPs were ‘positive regulation of MAP kinase activity’, ‘response to estrogen’, ‘sensory perception of sound’, ‘activation of MAPK activity’, ‘reactive oxygen species metabolic process’, ‘cell adhesion’, ‘melanocyte differentiation’, ‘positive regulation of epithelial cell proliferation involved in lung morphogenesis’, ‘regulation of ERK1 and ERK2 cascade’ and ‘mesenchymal cell differentiation’. In addition, the most significant CC was the ‘extracellular exosome’, and the most significant MF was ‘protein homodimerization activity’. These results further illustrate the fundamental characteristics and functions of cell adhesion, differentiation, proliferation, and MAPK activity in PTC. Therefore, the roles of these DEGs could be better understood by analyzing these significantly enriched GO terms in PTC pathogenesis. Fig. 3. [124]Fig. 3 [125]Open in a new tab GO enrichment analysis of common DEGs. a The top 10 significant biological processes (P < 0.05). b The top 10 significant cellular components (P < 0.05). c The top 9 significant molecular functions (P < 0.05). d A pie chart of the proportion of the significant GO terms, including 57 biological processes, 16 cellular components and 9 molecular functions. DEGs, differentially expressed genes; GO, Gene Ontology Construction of the PPI network and module analysis To further explore the interaction among the 263 common DEGs, a PPI network was constructed (Fig. [126]4a). The PPI network contained 189 nodes and 346 edges. To explore the basic characteristics of the PPI network, the topological features of the network were analyzed in terms of degrees, topological coefficients, neighborhood connectivity, and clustering coefficients (Fig. [127]4b–e). It was observed that this network followed a natural rule that the majority of nodes had a low degree and that only a few nodes were highly connected with the others. Similar to other biological networks, the degree distribution of this network displayed a power law distribution of f(x) = 100.46x^−1.54 with an R^2 of 0.857, indicating that the network followed a scale-free distribution and behaved like a small-world network [[128]28]. Fig. 4. [129]Fig. 4 [130]Open in a new tab PPI network constructed from the common DEGs and their topological features and significant modules. a A PPI network was constructed from STRING using the 164 common DEGs. The nodes represent proteins, the edges represent the interactions of proteins and the green circles and red circles indicate downregulated and upregulated DEGs, respectively. b–e The basic topological features of the network included degrees, topological coefficients, neighborhood connectivity, and clustering coefficients. The degree distribution of this network displayed a power law distribution of f(x) = 100.46x^−1.54 and an R^2 of 0.857. f–h Three significant modules in the PPI network with MCODE scores ≥ 5. Green circles and red circles indicate downregulated and upregulated DEGs, respectively. PPI, protein-protein interaction; DEGs, differentially expressed genes; STRING, Search Tool for the Retrieval of Interacting Genes; MCODE, Molecular Complex Detection Furthermore, the hub genes in the PPI network with a connectivity degree > 10 were identified using the CentiScaPe v2.0 plugin for Cytoscape. The most significant 11 node degree genes were FN1, CCND1, MET, RUNX2, IRS1, SERPINA1, KRT19, FGFR2, MUC1, NCAM1, and WFS1. Detailed information on the 11 hub genes is summarized in Table [131]2. Among these 11 hub genes, IRS1 and WFS1 had not been previously validated in PTC, which prompted us to further investigate the role of IRS1 and WFS1 in PTC. Table 2. Detailed information on the 11 hub genes identified in the PPI network Gene Full name Overview Expression in PTC Reference FN1 Fibronectin 1 FN1 is involved in cell adhesion and migration processes including host defense and metastasis. Upregulated [[132]29] CCND1 Cyclin D1 CCND1 has been demonstrated to interact with tumor suppressor protein Rb. Upregulated [[133]30] MET MET proto-oncogene, receptor tyrosine kinase MET plays a role in cellular survival, embryogenesis, and cellular migration and invasion. Downregulated [[134]31] RUNX2 RUNX family transcription factor 2 RUNX2 is essential for osteoblastic differentiation and skeletal morphogenesis. Upregulated [[135]32] IRS1 Insulin receptor substrate 1 Mutations in IRS1 are associated with type II diabetes and susceptibility to insulin resistance. No relevant biological experiments [[136]33] SERPINA1 Serpin family A member 1 The protein encoded by SERPINA1 is an inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Upregulated [[137]34] KRT19 Keratin 19 KRT19 is responsible for the structural integrity of epithelial cells. It is specifically expressed in the periderm. Upregulated [[138]35] FGFR2 Fibroblast growth factor receptor 2 FGFR-2 is involved in regulating cell proliferation, migration and differentiation, as well as in the response to injury and tissue repair. Downregulated [[139]36] MUC1 Mucin 1, cell surface associated Overexpression, aberrant intracellular localization, and changes in the glycosylation of MUC1 have been associated with carcinomas. Downregulated [[140]37] NCAM1 Neural cell adhesion molecule 1, also known as CD56 NCAM1 is involved in the development of the nervous system, the expansion of T cells and dendritic cells with a regulatory role in cell motility and migratory capacity of neoplastic cells. Downregulated [[141]38] WFS1 Wolframin ER transmembrane glycoprotein Mutations in WFS1 are associated with Wolfram syndrome, also called DIDMOAD (diabetes insipidus, diabetes mellitus, optic atrophy, and deafness), an autosomal recessive disorder. No related references [[142]39]