Abstract Introduction Essential thrombocytosis (ET) is a group of myeloproliferative neoplasms characterized by abnormal proliferation of platelet and megakaryocytes. Research on potential key genes and novel regulatory markers in essential thrombocythemia (ET) is still limited. Methods Downloading array profiles from the Gene Expression Omnibus database, we identified the differentially expressed genes (DEGs) through comprehensive bioinformatic analysis. GO, and REACTOME pathway enrichment analysis was used to predict the potential functions of DEGs. Besides, constructing a protein–protein interaction (PPI) network through the STRING database, we validated the expression level of hub genes in an independent cohort of ET, and the transcription factors (TFs) were detected in the regulatory networks of TFs and DEGs. And the candidate drugs that are targeting hub genes were identified using the DGIdb database. Results We identified 63 overlap DEGs that included 21 common up-regulated and 42 common down-regulated genes from two datasets. Functional enrichment analysis shows that the DEGs are mainly enriched in the immune system and inflammatory processes. Through PPI network analysis, ACTB, PTPRC, ACTR2, FYB, STAT1, ETS1, IL7R, IKZF1, FGL2, and CTSS were selected as hub genes. Interestingly, we found that the dysregulated hub genes are also aberrantly expressed in a bone marrow cohort of ET. Moreover, we found that the expression of CTSS, FGL2, IKZF1, STAT1, FYB, ACTR2, PTPRC, and ACTB genes were significantly under-expressed in ET (P<0.05), which is consistent with our bioinformatics analysis. The ROC curve analysis also shows that these hub genes have good diagnostic value. Besides, we identified 4 TFs (SPI1, IRF4, SRF, and AR) as master transcriptional regulators that were associated with regulating the DEGs in ET. Cyclophosphamide, prednisone, fluorouracil, ruxolitinib, and lenalidomide were predicted as potential candidate drugs for the treatment of ET. Discussion These dysregulated genes and predicted key regulators had a significant relationship with the occurrence of ET with affecting the immune system and inflammation of the processes. Some of the immunomodulatory drugs have potential value by targeting ACTB, PTPRC, IL7R, and IKZF1 genes in the treatment of ET. Keywords: essential thrombocythemia, hub genes, regulatory markers, candidate drugs, bioinformatics analysis Introduction Essential thrombocythemia (ET) belongs to a class of BCR/ABL negative myeloproliferative neoplasm (MPN) that is characterized by unexplained proliferation of megakaryocytes in the bone marrow.[34]^1 Generally speaking, the increased platelet is natural to be ignored by patients due to the lack of typical clinical characters and could result in persistent high platelet counts, even cause an embolism. However, the pathogenesis of the increased number of megakaryocytes is still unclear.[35]^2 When thrombotic complications, bleeding, and leukemic transformation occur, which could severely affect the patient’s quality of life.[36]^3 In recent years, targeted therapeutic drugs are being explored as a mono-or combination therapy in this field, such as ruxolitinib, pacritinib, fedratinib.[37]^4 Although it seems that some existing researches focus on ET pathogenesis, there still limited treatment is available on ET; also, the mechanism of ET is complicated, and more genetic mutations involved.[38]^5^,[39]^6 Thus, identifying molecular biomarkers associated with ET is crucially essential for personalized therapy. Gene chip is a new nucleic acid analysis and detection technology, and the relevant analysis results have been stored in public databases.[40]^7 Therefore, gene expression profiling is an appropriate application field of gene chip technology, and integrating and re-analyzing these genomic data to identifying specific biomarkers that are closely related to the disease provides new avenues.[41]^8 Recently, with the massive generation of tumor sequencing data, bioinformatics is developing rapidly in hematological malignancies.[42]^9 Gene Expression Omnibus (GEO) database serves as a public gene expression profile repository for high-throughput experimental data, and this allows us to comprehensively clarify the pathogenesis of ET from extensive data analysis at the multi-gene level.[43]^10 In this study, we will thoroughly analyze genomic data in more detail to explore potential biological therapeutic targets. Materials and Methods Microarray Data Analysis and DEG Identification The gene expression profiles were obtained from The National Center for Biotechnology Information-Gene Expression Omnibus Data Sets (NCBI-GEO Datasets, [44]https://www.ncbi.nlm.nih.gov/gds/). The [45]GSE61629 expression data was based on the [46]GPL570 platform, which uses HG-U133_Plus_2 Affymetrix Human Genome U133 Plus 2.0 arrays and includes 54 total RNA samples from whole blood [21control subjects, 8 ET patients, 21 polycythemia vera (PV) patients, 4 primary myelofibrosis (PMF) patients].[47]^11 The [48]GSE124281 expression data was based on the [49]GPL10558 platform, which profiling by uses Illumina HumanHT-12 V4.0 expression bead chip and includes 22 RNA samples from whole blood (8 control subjects, 5 ET patients, 9 PMF patients). Another dataset, [50]GSE123732, was used as an independent cohort for validating the results. The expression data of [51]GSE123732 was based on the [52]GPL10558 Illumina HumanHT-12 V4.0 expression bead chip platform and contains 6 RNA samples from bone marrow mononuclear cells (3 ET patients and 3 healthy individuals). In this study, the raw data from Affymetrix were processed using the robust multiarray (RMA) method for background correction, normalization quality control, and gene expression index summarization in the “Affy” package.[53]^12 The raw data from Illumina were processed using the “Lumi,”[54]^13 “org.Hs.eg.db”,[55]^14 and “Limma”[56]^15 packages from Bioconductor in the R program (Version 3.6.2) were used to annotate the microarray data and identify DEGs between patients with ET and control subjects. Genes with the absolute value of |log2FC|>0.5 and P < 0.05 were used as the cut-off criteria of DEG analysis. The DEGs data were processed by “pheatmap”[57]^16 and “ggrepel”[58]^17 packages in the R program to draw a heatmap and volcano plot of the significantly changed genes. Venn diagrams were used to show up-regulated common genes, down-regulated common genes, and up-regulated versus down-regulated common genes. Functional Enrichment Analysis of DEGs The Gene Ontology (GO) and the pathway enrichment analysis of DEGs were carried out using the ClusterProfiler package of the R program (Version 3.6.2).[59]^18 GO terms enrichment analysis include biological process (BP), molecular function (MF), cellular component (CC). The pathway analysis was carried out using REACTOME ([60]http://www.reactome.org) online database, and P-value <0.05 was considered the cut-off criterion. Prediction of the Master Transcription Factors (TFs) Significantly Regulating the DEGs To predict master TFs that are remarkably associated with the regulating of DEGs, we have utilized the iRegulon plugin of Cytoscape software (version 3.8.0) to detect regulons from all DEGs.[61]^19 For this purpose, we used an extensive collection of TF motifs and a large collection of ChIP-seq tracks. The iRegulon method depends on a ranking-and-recovery system where all genes of the human genome are scored by a motif discovery step integrating the clustering of binding sites within cis-regulatory modules (CRMs) and the potential distal location of CRMs upstream or downstream of the transcription start site (TSS ±10 kb). The recovery step calculates the normalized enrichment score (NES) of TFs for each set of genes, input for each of the individual analyses, leading to the prediction of the TFs based on NES and their putative direct target genes which exist in the input lists. This method optimizes the association of TFs to motifs using both explicit annotations and predictions of TF orthologs and motif similarity. A transcription factor normalized enrichment score was computed for each group where a normalized enrichment score > 4.0 corresponds, and the maximum false discovery rate (FDR) on motif similarity set as 0.001. Construction and Analysis of Protein–Protein Interaction (PPI) Network The network of predicted associations for DEGs was searched through the STRING online database (STRING, [62]http://string‐db.org, version 11.0).[63]^20 DEGs with a Combined Score ≥0.4 was set as the cut-off criterion to construct the PPI network, and then the PPI network was visualized Cytoscape software (version 3.8.0).[64]^21 The Molecular Complex Detection (MCODE) plugin of Cytoscape software was used to detect the functional clusters of the PPI network,[65]^22 cluster finding parameters sets as Node Score Cutoff=0.2; K-Core= 2; and Max.Depth= 100. Then the top 10 Hub genes were filtered from the results of the CytoHubba plugin of Cytoscape software,[66]^23 in this work, and we ranked hub nodes by the Maximal Centrality Clique (MCC) method. Expression Level Differentially Expressed Hub Genes in a Bone Marrow Cohort of ET The expression levels of top hub genes were further investigated using an independent cohort ([67]GSE123732) associated with ET. We downloaded this dataset from NCBI that contains the ET patients with healthy controls (n=6). We used ET patients with matched healthy controls for validating our findings. We employed the “limma” package in the R program from the Bioconductor project ([68]http://www.bioconductor.org/) for identifying differential expression of significant hub genes in ET. Hub genes with |log2FC| >0.50 and P < 0.05 were considered as statistically significant. qRT-PCR Verify the Expression Level of Hub Genes in the Clinical Samples To clarify whether the expression profile of key genes also had consistent results in our clinical samples, we further validated the expression levels of hub genes in peripheral blood from 10 ET patients (diagnosed according to the WHO 2016 diagnostic criteria, including five patients with JAK2 V617F mutation) using qPCR. Total RNA was isolated from 1 mL of whole blood using Invitrogen Trizol (Carlsbad, CA, USA), followed by standard phenol-chloroform extraction. Complementary DNA (cDNA) was quantified using a reverse transcription kit (K1622, Thermo Fisher Scientific, USA), and the expression levels of hub genes were performed using SYBR Green Master Mix (SYBR GREEN, Beijing, China). The housekeeping gene GAPDH was used as an internal control. The primers were synthesized by Shanghai Biotechnology Co., Ltd ([69]Table 1). qRT-PCR was performed on ViiATM 7 system software (Thermo Fisher Scientific, ABI7500, USA). The results were normalized to the expression of GAPDH and expressed as fold change (2^−ΔΔCT). The qRT-PCR experiment on each clinical sample with three biological replicates. Peripheral whole blood from 10 anonymous healthy volunteers was used as control samples. Table 1. The Primers for qRT-PCR Target Sequence (5ʹ - 3ʹ) CTSS (human) -RT-F AAGCACAGGGACACAAAGAGGAATC CTSS (human) -RT-R TTGATGAAGAGCAGCCAGTGATGTAG FGL2 (human) -RT-F CACTCTGTTCATTCCTCCAGGTATTCG FGL2 (human) -RT-R GTGTAGCATAAGAACCTAGCCGTCAG IKZF1 (human) -RT-F GAACCTGCTGCTGCTCTCCAAG IKZF1 (human) -RT-R GCTGCTCCTCGTTGTTGCTCTC IL7R (human) -RT-F TTAAAGGCTTCTGGAGTGAATGGAGTC IL7R (human) -RT-R CCAAGATGACCAACAGAGCGACAG ETS1 (human) -RT-F TGAAGGCAAAGGAAACTAAGGAAGGAG ETS1 (human) -RT-R TCACAACCAGCAGAAAGATGACTACC STAT1 (human) -RT-F CTGTGCGTAGCTGCTCCTTTGG STAT1 (human) -RT-R CTGAAGTTCGTACCACTGAGACATCC FYB (human) -RT-F CGGAGTTACCTAGCGGACAATGATG FYB (human) -RT-R GCACCTAATGAACACAGCAGAATGAC ACTR2 (human) -RT-F TGGTAGACTCTGGAGATGGTGTGAC ACTR2 (human) -RT-R TCCTCGCAACAGAAGTAGCTTGATAAG PTPRC (human) -RT-F GGAGTCTGATGTTCAAGAGCAGGAAG PTPRC (human) -RT-R GCAGGCACAAGAAGGTAGGAGAAG ACTB (human) -RT-F GCAGAAATGGGATGGAAAATCAACC ACTB (human) -RT-R ATGCTCTCTCATAAACTCTCGTGGA GAPDH (human) -RT-F AGAAGGCTGGGGCTCATTTG GAPDH (human) -RT-R AGGGGCCATCCACAGTCTTC [70]Open in a new tab ROC Curve Analysis of Hub Genes The receiver operating characteristic (ROC) curve was applied to examine the diagnostic value of differential hub genes in mRNA expression for distinguishing control subjects from ET patients. The statistical analysis and visualization were performed using GraphPad Prism 8.0 software (version 8.3.0). The hub genes with an area under the ROC curve (AUC)≥0.5 consider significant sensitivity and specificity, and 0.8≤AUC≤1.0 indicates that hub genes may have remarkable diagnostic value. Identification of Food and Drug Administration (FDA)-Approved Drug-Hub Gene Interaction We identified the potential candidate drugs that target the differentially dysregulated hub genes by using the DGIdb.[71]^24 DGIdb collects drug-gene interaction data from 30 disparate sources, including ChEMBL, DrugBank, Ensembl, NCBI Entrez, PharmGKB, PubChem, Clinical Trial Databases, and literature in NCBI PubMed. The drug-gene interactions supported by at least one database and PubMed reference were identified. We selected the only drugs that have been approved by the FDA. Results Identification of Differentially Expressed Genes in ET For identifying DEGs in MPNs, we obtained gene expression data from three datasets of patients with Essential Thrombocythemia and healthy controls. A total of 864 DEGs were identified from [72]GSE124281, and 717 DEGs were identified from [73]GSE61629, respectively. It should be noted that the datasets [74]GSE124281 and [75]GSE61629 were both include ET, PV, and PMF samples, so we have further selected expression profiling before treatment in ET patients and healthy controls from the raw data. Expectedly, we got very few genes between up and down groups. [76]Figure 1A and [77]B show the heatmap and volcano plot of DEGs; these DEGs were well clustered between ET patients and healthy controls. After that, the Venn diagram in R language was used to identify the 63 overlapping DEGs between two gene expression profiles. Among 63 DEGs, we identified 21 common up-regulated and 42 common down-regulated genes, as well as up-regulated versus down-regulated common genes between two gene expression profiles ([78]Figure 1C). Figure 1. [79]Figure 1 [80]Open in a new tab Identification of DEGs in two datasets ([81]GSE61629 and [82]GSE124281). (A) Heatmap of the top 50 DEGs in GSE621629 and [83]GSE124281 datasets, respectively. (B) Volcano plot of DEGs in [84]GSE61629 and [85]GSE124281 datasets. Red and blue plots represent genes with [logFC]>0.5 and P<0.05, and black plots represent genes with no significant difference. Furthermore, the green plots represent up-regulated and down-regulated genes with [logFC]>1 and P<0.05, and the labeled genes represent these genes with [logFC]>1.5 and P<0.05. (C) Venn diagram of commonly changed DEGs in the two datasets. (including 22 common up-regulated genes, 41 common-down-regulated genes, six genes have interacted between [86]GSE124281 up-regulated DEGs and [87]GSE61629 downregulated DEGs, four genes have interacted between [88]GSE61629 up-regulated DEGs [89]GS124281 downregulated DEGs). DEGs, differentially expressed genes. DEGs, differentially expressed genes. GO Analysis and Function Enrichment of DEGs The results of GO analysis demonstrated that biological processes (BP) of DEGs are mainly enriched in the neutrophil degranulation, neutrophil activation, and mediated involved in neutrophil-mediated immune response, regulation of hemopoiesis, receptor signaling pathway via STAT, and interleukin-6-mediated signaling pathway ([90]Table 2, [91]Figure 2A). For cellular component (CC), GO analysis results showed that the DEGs were primarily enriched in the focal adhesion and cell-substrate junction ([92]Table 2, [93]Figure 2B). Regarding molecular function (MF), the DEGs are mainly enriched in RNA polymerase II-specific DNA-binding transcription factor binding and DNA-binding transcription factor binding ([94]Table 2, [95]Figure 2C). Besides, the most significant pathways identified through REACTOME pathway analysis were present in [96]Table 3, and the results have shown that the common DEGs were mainly related to the immune system, such as innate immune system, neutrophil degranulation, and interleukin-6 signaling. Table 2. Gene Ontology Term Enrichment Analysis of DEGs in Essential Thrombocythemia Ontology Description Genes Ratio P-value Genes Found BP Neutrophil degranulation 10/57 1.84E-06 STOM/PNP/FGL2/IQGAP1/CTSS/LYZ/CD93/CMTM6/ACTR2/PTPRC BP Neutrophil activation involved in the immune response 10/57 1.95E-06 STOM/PNP/FGL2/IQGAP1/CTSS/LYZ/CD93/CMTM6/ACTR2/PTPRC BP Neutrophil activation 10/57 2.34E-06 STOM/PNP/FGL2/IQGAP1/CTSS/LYZ/CD93/CMTM6/ACTR2/PTPRC BP Neutrophil mediated immunity 10/57 2.38E-06 STOM/PNP/FGL2/IQGAP1/CTSS/LYZ/CD93/CMTM6/ACTR2/PTPRC BP Positive regulation of hemopoiesis 6/57 2.08E-05 PNP/ETS1/IL7R/LEF1/STAT1/PTPRC BP Interaction with symbiont 4/57 8.95E-05 IFI27/STOM/GPX1/LEF1 BP Regulation of hemopoiesis 8/57 9.16E-05 PNP/FGL2/PRKCB/ETS1/IL7R/ LEF1/STAT1/PTPRC BP Interleukin-6-mediated signaling pathway 3/57 9.32E-05 CBL/IL6ST/STAT1 BP Negative regulation of immune response 5/57 9.36E-05 GPX1/FGL2/IL7R/CNOT7/ PTPRC BP Receptor signaling pathway via STAT 5/57 0.000163942 IL6ST/IL7R/CNOT7/STAT1/ PTPRC BP Regulation of cell-cell adhesion 7/57 0.000212952 PNP/FGL2/IL6ST/ETS1/IL7R/ LEF1/ PTPRC CC Focal adhesion 8/59 2.62E-05 FHL2/ITGB5/CBL/IQGAP1/ ACTB/ACTR2/YWHAB/PTPRC CC Cell-substrate junction 8/59 2.96E-05 FHL2/ITGB5/CBL/IQGAP1/ ACTB/ACTR2/YWHAB/PTPRC MF RNA polymerase II-specific DNA-binding transcription factor binding 7/57 2.89E-05 IFI27/FHL2/PRKCB/ACTB/ LEF1/ RBL2/STAT1 MF DNA-binding transcription factor binding 7/57 0.000128279 IFI27/FHL2/PRKCB/ACTB/ LEF1/RBL2/STAT1 [97]Open in a new tab Abbreviations: BP, biological processes; CC, cellular component; MF, molecular function. Figure 2. [98]Figure 2 [99]Open in a new tab GO term enrichment analysis of DEGs. (A) Biological process. (B) Cellular component. (C) Molecular function. DEGs, differentially expressed genes. GO, Gene Ontology. Table 3. Signaling Pathway Enrichment Analysis of DEGs in REACTOME Pathway Name Genes Gene Ratio P-value FDR Innate Immune System 11 0.0099 1.64E-08 2.52E-05 Neutrophil degranulation 8 0.0167 3.68E-08 2.82E-05 Interleukin-6 signaling 3 0.2727 1.85E-07 9.46E-05 Interleukin-6 family signaling 3 0.125 2.25E-06 8.62E-04 MAP2K and MAPK activation 3 0.075 1.09E-05 2.94E-03 Cell-Cell communication 4 0.0308 1.15E-05 2.94E-03 Signaling by moderate kinase activity BRAF mutants 3 0.0667 1.55E-05 3.39E-03 Clathrin-mediated endocytosis 4 0.0276 1.77E-05 3.39E-03 [100]Open in a new tab Identification of TFs is Significantly Associated with the Overlap DEGs We identified four TFs: SPI1, IRF4, SRF, and AR as master transcriptional regulators that are associated with regulating the DEGs in ET. Furthermore, we identified 8 (GTF2A1, TEAD4, E2F3, MYB, SP9, GATA2, CEBPA, IKZF2) and 6 (SPIB, FLI1, SPI1, ELK1, LEF1, CREB3) TFs for the up-regulated and the down-regulated genes in ET, respectively. The master TFs associated with the DEGs were displayed in the regulatory networks. In the networks, TEAD4 (TEA Domain Transcription Factor 4) and MYB (MYB Proto-Oncogene, Transcription Factor) both target 11 up-regulated genes ([101]Figure 3A). SPI1(Spi-1 Proto-Oncogene, Hematopoietic Transcription Factor) and SPIB (Spi-B Transcription Factor) target 26 and 23 down-regulated genes separately ([102]Figure 3B). Besides, IRF4 (Interferon Regulatory Factor 4, Transcription Factor) targets 35 up and down-regulated genes ([103]Figure 3C). The results of TFs prediction will provide insights into the development and pathogenesis of ET. Figure 3. [104]Figure 3 [105]Open in a new tab Regulatory networks of the TFs and their targeted DEGs identified by iRegulon. (A) Regulatory network of the TFs and their targeted up-regulated genes. (B) Regulatory network of the TFs and their targeted down-regulated genes. (C) Regulatory network of the TFs and their targeted up and down-regulated genes. The green color octagon indicates TFs, the purple color oval indicates the DEGs regulated by TFs, and the blue color oval indicates that TFs do not restrict them. TFs, transcriptional regulators; DEGs, differentially expressed genes. PPI Network Analysis and Hub Gene Identification Based on the STRING database (version 11.0), we constructed a PPI network reflecting the functional association between DEGs and visualized the network analysis results in Cytoscape software (version 3.8.0) with 33 nodes and 54 edges, as shown in [106]Figure 4. Then, we identified two clusters in the PPI network by utilizing the MCODE application in Cytoscape. Cluster 1 consists of four nodes and six edges, including FYB, FGL2, PTPRC, CTSS ([107]Figure 5A). Cluster 2 contained three nodes and three edges, including CD93, STOM, CMTM6 ([108]Figure 5B). The top 10 hub genes inside the network were identified with the CytoHubba plugin, including ACTB, PTPRC, ACTR2, FYB, STAT1, ETS1, IL7R, IKZF1, FGL2, and CTSS ([109]Table 4, [110]Figure 5C). Figure 4. [111]Figure 4 [112]Open in a new tab PPI network of DEGs in the STRING database. Based on the STRING online database, the DEGs PPI network was constructed containing 63 DEGs. The different colors in the figure indicate the connectivity of genes, and the deep color shows the genes with the highest connectivity in the PPI network. PPI, protein–protein interaction; DEGs, differentially expressed genes. Figure 5. [113]Figure 5 [114]Open in a new tab Clusters and hub genes identified in the PPI network. (A) Cluster 1 in the PPI network. The orange nodes represent the screened genes in cluster 1. (B) Cluster 2 in the PPI network. The orange nodes represent the screened genes in cluster 2. (C) Ten hub gene identification in a PPI network based on the MCC method. The dark (red) nodes show the genes with higher MCC scores in the PPI network. PPI, protein–protein interaction; MCC, maximal centrality clique. Table 4. The Rank of Hub Genes in the PPI Network of DEGs Rank Name Score Regulatory Status [115]GSE124281 [116]GSE61629 logFC P-value logFC P-value 1 ACTB 10 Down −0.5 2.10E-04 −0.71 8.24E-03 2 PTPRC 9 Down −0.65 4.68E-04 −1.04 1.04E-02 3 ACTR2 7 Down −0.72 1.88E-05 −0.9 3.14E-05 3 FYB 6 Down −1.02 1.07E-04 −0.52 1.26E-03 5 STAT1 6 Down −0.53 4.26E-02 −0.78 1.75E-03 6 ETS1 5 Down −0.63 3.42E-02 −0.95 4.32E-04 7 IL7R 5 Down −1.2 2.09E-04 −0.54 3.17E-04 8 IKZF1 5 Down −0.72 4.34E-04 −0.59 2.28E-06 8 FGL2 4 Down −1.35 2.61E-04 −0.55 1.87E-03 10 CTSS 4 Down −0.6 6.41E-04 −0.69 2.33E-04 [117]Open in a new tab Note: The rank of hub genes was identified in the PPI network. Differentially Expressed Hub Genes are Dysregulated in a Bone Marrow Cohort of ET To investigate the hub genes expression in bone marrow, we re-analyzed the significant hub genes in an independent cohort. Interestingly, we found that the top ten down-regulated hub genes (ACTB, PTPRC, ACTR2, FYB, STAT1, ETS1, IL7R, IKZF1, FGL2, and CTSS) were also down-regulated in a bone marrow cohort containing ET and healthy samples ([118]Figure 6). It indicates that these aberrantly expressed hub genes may have a vital contribution to the ET. ACTB, PTPRC, ACTR2, FYB, STAT1, IL7R, and FGL2 dysregulated hub genes are also downregulated in a bone marrow cohort of ET (P< 0.05). It indicates the consistency of our findings in this study. However, three of the down-regulated hub genes (ETS1, IKZF1, and CTSS) showed similar expression differences between ET and healthy samples but not significant (P> 0.05). This result Indicating that the hub genes are also deregulated in the bone marrow of ET patients. Figure 6. [119]Figure 6 [120]Open in a new tab Validated the expression of the identified hub genes in an independent cohort group. The top 10 hub genes were both lower expressed in ET samples from the [121]GSE123732 database, and the seven genes as ACTB, PTPRC, ACTR2, FYB, STAT1, IL7R, and FGL2 with a significant difference between ET and control. *P<0.05, **P<0.01. ET, essential thrombocythemia. Results of Expression Validation of Hub Genes in Clinical Samples By further validating the hub genes in 20 clinical samples (10 ET vs 10 healthy individuals), we found that the expression of 8 of hub genes (CTSS, FGL2, IKZF1, STAT1, FYB, ACTR2, PTPRC, and ACTB) was significantly down-regulated in the ET samples (P<0.05), which was consistent with our analysis. However, two of the hub genes (IL7R and ETS1) were significantly up-regulated (P<0.05) in both ET samples ([122]Figure 7). This result further indicates that these hub genes, which we identified in our previous analysis, might be recognized as potential key genes in ET. Figure 7. [123]Figure 7 [124]Open in a new tab Validated the expression of the identified hub genes in clinical samples. Eight of the hub genes (CTSS, FGL2, IKZF1, STAT1, FYB, ACTR2, PTPRC, and ACTB) were both lower expressed in ET samples, and two hub genes (IL7R and ETS1) were higher expressed in ET samples. ***P<0.001. ET, essential thrombocythemia. The Expression Levels of Hub Genes are Associated with the Diagnostic Value in ET To evaluate the diagnostic value of the dysregulated hub genes, we analyzed the correlation of hub gene expression levels. The performance of these ten hub genes in the ET group and healthy individuals was analyzed by the receiver operating characteristic (ROC) curve. The results show that the expression of these hub genes showed excellent diagnostic value between ET patients and healthy. All the hub genes showed a significant sensitivity and specificity between ET patients and healthy [area under the ROC curve (AUC)≥0.5] ([125]Figure 8). ROC curves showed their high diagnostic value as biomarkers for ET (such as ACTB AUC: 0.975 in GSE 124281 and 0.768 in [126]GSE61629; PTPRC AUC: 1.0 in GSE 124281 and 0.768 in [127]GSE61629; STAT1 AUC: 0.8 in GSE 124281 and 0.5 in [128]GSE61629; EST1 AUC: 0.8 in GSE 124281 and 0.893 in [129]GSE61629; CTSS AUC: 0.975 in GSE 124281 and 0.774 in [130]GSE61629). This result indicating the diagnostic significance of these aberrantly expressed hub genes in ET patients, and these genes may be promising targets for developing diagnostic markers to manage patients with ET. Figure 8. [131]Figure 8 [132]Open in a new tab ROC curves employed to assess the diagnostic value of the hub genes. The ROC curve of the top 10 hub genes both showed a significant sensitivity and specificity in two databases (all the genes with AUC≥0.5). This result indicating these genes may have good diagnostic value for patients with ET. ROC, receiver operating characteristic. Identification of Candidate Drugs Targeting Hub Genes We screened all ten hub genes for drug-gene interactions by using DGIdb, and we identified FDA-approved drugs that potentially target the protein products of four hub genes (PTPRC, ACTB, IL7R, IKZF1). Still, the interaction types between target genes and drugs are uncertain ([133]Table 5). The results showed that cyclophosphamide, prednisone, fluorouracil, ruxolitinib, and lenalidomide both had been identified as potential medicine for ET treatment. However, as far as we know, the inhibitory effects of ACTB, PTPRC, IL7R, and IKZF1 have not been tested for the treatment of ET. Our data suggest that these genes may be promising targets for developing anticancer drugs to treat patients with ET. Table 5. FDA-Approved Drugs Potentially Targeting Five Hub Genes Drug Target Gene Interaction Types Sources PMIDs Cyclophosphamide ACTB N/A NCI 12167460 Prednisone PTPRC N/A NCI 17063711 Fluorouracil PTPRC N/A NCI 15206578 Ruxolitinib IL7R N/A CGI 22897847,22955920 Lenalidomide IKZF1 N/A CKB 24292625 [134]Open in a new tab Note: The analysis results were obtained from the cancer-relevant drug-gene interactions database. Abbreviations: NCI, national cancer institute; CGI, cancer genome interpreter; CKB, the Jackson laboratory clinical knowledgebase. Discussion The Philadelphia chromosome-negative myeloproliferative neoplasms (MPNs) are a group of clonal hematopoietic stem cell disorders. As far as we know, there are signaling pathways that are highly sensitive to cytokines and aberrantly activated in MPNs. However, the molecular mechanisms leading to the alteration of these signaling pathways are still unknown.[135]^25 Since 2005, the discovery of mutations in JAK2, CALR, and MPL genes has changed the classification and diagnosis of BCR/ABL-negative MPN. The JAK2V617F positive variations are in 50–60% of ET patients,[136]^26^,[137]^27 MPL gene mutations are in 3%-5%, and CALR mutations are in 60–88% of JAK2 V617F mutation-negative ET patients.[138]^28^,[139]^29 It is noteworthy that there are still 10–15% of patients with triple-negative gene mutations, and their genetic abnormalities are unknown. Recently, with the development of second-generation sequencing (NGS) and other whole-genome analysis techniques, many different gene mutations have been found in MPNs, including TET2, DNMT3A, IDH1/2, TP53, and EZH2.[140]^30 Nevertheless, these gene mutations also exist in other diseases, which are not specific to MPNs, and have a low frequency of mutations (< 10%).[141]^31 Some of these genes are directly involved in the pathogenesis of ET, and some are directly involved in epigenetic regulation. They can induce JAK2 mutations, and often earlier than JAK2 variation, suggesting that the pathogenesis of ET is the result of multiple gene disorders, not a single gene mutation.[142]^32 Recently, the rapid development of bioinformatics technology provides us with an advanced method to study the pathogenesis of disease more comprehensively and identify possible core targets, which can also offer the potential for the diagnosis and treatment of ET. In this study, we selected two microarrays (accession [143]GSE57793 and [144]GSE124281) that met the inclusion criteria from the GEO databases and screened a total of 63 overlap DEGs. GO analysis results showed that the DEGs significantly enriched in the immune system and regulation of hemopoiesis, like neutrophil mediated immune response, Interleukin-6-mediated signaling pathway, and receptor signaling pathway via STAT. Neutrophils are an essential part of the human immune system,[145]^33 they could make an active immune response, and the formation of the neutrophil extracellular trap (NET) has been known as a component of innate immunity.[146]^34 It is currently recognized that NET could be triggered by a variety of proinflammatory cytokines, such as increased levels of IL-6, IL-1, IL-8, and TNFα.[147]^35 Although NETs are mainly involved in host defense, they also provide a suitable scaffold for the formation of blood clots and induce a robust procoagulant response by binding red blood cells, platelets, and von Willebrand factor (VWF).[148]^36 Through the contact phase, activation NETs will link the processes between innate immunity and thrombosis, and it explains how NET promotes the function of immune thrombosis. Recent research found that neutrophils isolated from patients with JAK2 V617F mutations in MPNs showed up-regulation of NADPH oxidase activity, which is closely related to the increase in ROS production.[149]^37 Furthermore, the other study showed that inhibition of JAK2 could lead to a reduction in neutrophils induced superoxide production.[150]^38 The results have further confirmed that the abnormal activation of JAK2 V617F is related to the excessive activation of neutrophil NADPH oxidase, and the neutrophils could play a vital role in the activation of oxidative stress and the occurrence of myeloproliferative neoplasms. These conclusions also provide a new explanation for the occurrence of embolism in ET patients. Regulation and misregulation of TFs are associated with human diseases since TFs crucially controlled the genes and gene expression programs.[151]^39 In human T cell acute lymphoblastic leukemia, the core transcriptional machinery regulates molecular pathogenesis.[152]^40 It was demonstrated that a crucial transcription factor, GATA1, is associated with the pathogenesis of ET.[153]^41 Therefore, identifying more master TFs that regulating gene expressions in human diseases, including ET, is one of the substantial targets. To identify regulatory networks of master transcriptional regulators (MMTRs), we utilized iRegulon, a plugin of Cytoscape, to identify MMTRs. We found that the TFs such as IRF4, SRF, AR, SPI1, GTF2A1, TEAD4, E2F3, MYB, SP9, GATA2, CEBPA, IKZF2, SPIB, FLI1, ELK1, LEF1, and CREB3 were significantly associated with the DEGs. Based on our research, we have confirmed IRF4, TEAD4, MYB, SPI1, and SPIB with the most regulatory relationships. IRF4 belongs to the IRF (Interferon Regulatory Factor) family of transcription factors and is significant in the regulation of interferon in the regulation of interferon-induced genes.[154]^42^,[155]^43 It has been confirmed with lymphocyte-specific and negatively regulate Toll-like receptor (TLR) signaling and is essential for the activation of the innate and adaptive immune system.[156]^44^,[157]^45 Recently, IRF4 also has been reported in the literature that it can bind to the interferon-stimulated response element (ISRE) of MHC class I promoters and is specifically involved in the pathogenesis of lymphatic system cancer.[158]^46 Furthermore, research has found that IRF4 is highly expressed in B cells and plasma cells, as well as controls the differentiation of B cells to plasma cells and the conversion of immunoglobulin classes.[159]^47 There was also a study that hypothesizes that direct targeting of IRF4 may be a potential therapeutic strategy for the treatment of multiple myeloma.[160]^48 TEAD4 (also known as Transcription Enhancement Factor 3, TEF-3) is a crucial member of the TEAD family.[161]^49 Studies have reported TEAD4 plays a vital role in the Hippo signaling pathway, thereby regulating cell proliferation, migration, and epithelial-mesenchymal transition (EMT) induction.[162]^50 In the study of the expression pattern and biological function of TEAD4, it was observed that the expression level of TEAD4 was increased in a variety of tumor tissues, such as lung adenocarcinoma, colorectal cancer, breast cancer, etc., and patients with higher TEAD4 expression tend to have higher poor overall survival.[163]^51–53 More and more convincing evidence also shows that TEAD4 does play a role in cancer development, and it is also a new prognostic marker for various cancers.[164]^54 MYB has been confirmed to play an essential role in the regulation of hematopoiesis.[165]^55 As far as we know, MYB has abnormal expression or rearrangement or undergoes translocation in leukemia and lymphoma and is considered to be an oncogene.[166]^56 In recent years, the implementation of advanced molecular biology and genetic manipulation techniques in vitro and in vivo has further revealed the critical role of MYB in many types of tumors, including leukemia, colon cancer, breast cancer, and adenoid cystic carcinoma.[167]^57–59 In general, the high level of MYB expression is related to the blockade of cell differentiation and continued proliferation, which leads to carcinogenicity.[168]^60 Clinically, MYB has been found to have abnormal expression, rearrangement, or translocation in many types of leukemia and lymphoma, so it is considered to be an oncogene, and its strange expression or rearrangement or translocation is closely related to poor prognosis.[169]^61 Still, there a study has found that lower levels of MYB expression have been shown to promote an ET-like disease in mice and megakaryocytic development in humans.[170]^62 SPI1 (also named hematopoietic transcription factor PU.1) encodes an ETS (Erythroblast Transformation Specific) domain transcription factor, which activates gene expression during the development of myeloid cells and B lymphocytes, and can also specifically participate in the differentiation or activation of macrophages or B cells.[171]^63 In recent years, research has found that SPI1 as a critical regulator of many steps in the hematopoietic process, not only can limit the self-renewal of blood stem cells but also the imbalance of its expression or activity leads to leukemia.[172]^64^,[173]^65 Moreover, SPIB (SPI1 Related Transcription Factor) also can act as a lymphoid-specific enhancer and promote the development of plasmacytoid dendritic cells (pDCs), which can produce large amounts of interferon.[174]^66 SPIB has been shown to regulate many genes essential for BCR-mediated signals, including Igβ heavy chain, Ig light chain (α and α), mb-1 (Igα), and tyrosine kinase BTK.[175]^67 Recently, a study has found that SPIB with the diffuse large B-cell lymphoma (DLBCL).[176]^68 Its expression is essential for the survival of patients with DLBCL and contributes to cell apoptosis through the PI3K-AKT pathway.[177]^69 Besides, gene expression profiling, and IHC staining have both detected SPIB in some solid malignant tumors, indicating that SPI1 and SPIB may be abnormally expressed in In many types of tumors.[178]^70 Also, the other TFs such as GATA2, CEBPA, FLI1, SPI1, CREB3 may also have been implicated in the pathogenesis of leukemic transformation of myeloproliferative neoplasm.[179]^71–75 These results of TFs identification revised us to an in-depth understanding of the regulatory network of MMTRs in ET. In the procession of REACTOME pathway analysis, we also found that the DEGs were mainly enriched in the innate immune system, neutrophil degranulation, interleukin-6 signaling. As far as we know, the abnormal expression of cytokines related to inflammation and immune regulation was ubiquitous in MPN patients, especially those with ET.[180]^76 Thus, driving clonal evolution by inducing mutations in additional inflammatory and immunomodulatory genes can further enhance cytokine and cytokine release, thereby promoting MPNs development.[181]^77 Also, recent clinical trials have proved that IL-6 can mediate the occurrence and development of MPN through the IL-6/JAK2/STAT3 axis, and current clinical trials about new drugs target the IL- 6/JAK2/STAT3 signal axis have achieved encouraging results.[182]^78 Besides, the REACTOME database used to conduct pathway enrichment analysis also presented additional enrichment in both MAP2K and MAPK activation, signaling by moderate kinase activity BRAF mutants, and the role of these classic signal pathway has been confirmed related with MPNs through an effect on the immune system and inflammation processes.[183]^79–81 Subsequently, we further analyzed the protein interaction network in the STRING database and the top 10 hub genes screened out, such as ACTB, PTPRC, ACTR2, FYB, STAT1, ETS1, IL7R, IKZF1, FGL2, and CTSS, were both down-regulated in two profiles. Interestingly, we found that the hub genes we identified were both down-regulated in a bone marrow cohort, and we also confirmed that the expression level of 8 identified hub genes was down-regulated in our clinical samples. However, it is worth mentioning that two of the hub genes (IL7R and ETS1) are contrary to the results of previous findings, and the result indicating that the deregulation of hub genes might be started from patients’ bone marrow and exist in peripheral blood. Besides, we used multiple databases to identify and analyzed the therapeutic drugs targeting these genes, and five target drugs were finally determined. Studies have found that ACTB appeared to be strongly associated with the JAK2 V617F mutation in ET patients, and the drug interaction analysis showed that ACTB was target by cyclophosphamide.[184]^82 PTPRC (also named CD45) is protein tyrosine phosphatase (PTP) as a natural counter-product of PTK activity. A single-Cell RNA-Seq research has determined that the PTPRC gene can be stably expressed in MPN stem and progenitor cells.[185]^83 Then, the target drug identification shows that prednisone and fluorouracil have been detected as the potential drug for targeting PTPRC.[186]^84 The defect of the IL7R gene may be related to severe combined immunodeficiency (SCID), and IL7R could lead to the activation of downstream JAK/STAT and PI3K/Akt/mTOR signaling cascades, so loss-of-function mutations in IL7R could play an essential role in the oncogenesis of MPNs.[187]^85 IL7R signals through the JAK/STAT (specifically, JAK1) and PI3K/mTOR pathways and gain-of-function mutations result in their constitutive activation; there is research that finds that Ruxolitinib would be expected to abrogate this signaling.[188]^86 IKZF1 gene encodes a transcription factor that is associated with MPN transformation, and lenalidomide could significantly reduce the abundance of IKZF1 in the multiple myeloma cell line.[189]^87^,[190]^88 Although we have verified the relationship between the above four genes and MPN through the literature, as well as potential therapeutic drugs that target these genes, the relationship between STAT1 and MPN is still worthy of our attention. STAT1 is one of the members of the STAT family, which is involved in cell proliferation, differentiation, immune response, and other essential biological processes.[191]^89 Studies have shown that under the stimulation of IFN, the phosphorylation of STAT1 will activate downstream gene expression, and the megakaryocytes of mice lacking STAT1 showed a decrease in polyploidy. This evidence indicates that STAT1 signal transduction plays an essential function in the differentiation of normal megakaryocytes. Under the background of JAK2V617F, the deletion of the STAT1 gene can lead to the occurrence and development of ET and aggravate the disease burden.[192]^90 Altogether, these dysregulated hub genes and key regulators are associated with complications and pathogenesis of ET. It should be emphasized that this study still has some limitations. The [193]GSE124281 and [194]GSE61629 were originated from different experimental platforms ([195]GPL570 vs [196]GPL1058), which may lead to internal heterogeneity of the study. Another limitation of this study is that the sample size (n=7) of the bone marrow cohort is small, and our findings need to be further validated with a larger clinical sample. Conclusion Above all, the hub genes and key regulators were screened out as the critical markers in ET, and these markers could mediate the occurrence, development, and diagnosis of MPN by affecting the immune system and inflammatory processes. The hub genes we screened in this present study both have essential diagnostic value, and we found that the deregulation of hub genes might be started from bone marrow and exist in peripheral blood. Besides, we find that cyclophosphamide, prednisone, fluorouracil, ruxolitinib, and lenalidomide could be potential treatments through targeting on CTB, PTPRC, IL7R, IKZF1. Due to the complex pathogenesis of ET and many influencing factors, it is necessary to explore and sort out the genes and interaction clusters with logical relationships and practical significance, then to find out the entry point of the follow-up research work. Acknowledgments