Abstract It has been indicated that there is an association between inflammatory bowel disease (IBD) and hepatocellular carcinoma (HCC). However, the molecular mechanism underlying the risk of developing HCC among patients with IBD is not well understood. The current study aimed to identify shared genes and potential pathways and regulators between IBD and HCC using a system biology approach. By performing the different gene expression analyses, we identified 871 common differentially expressed genes (DEGs) between IBD and HCC. Of these, 112 genes overlapped with immune genes were subjected to subsequent bioinformatics analyses. The results revealed four hub genes (CXCL2, MMP9, SPP1 and SRC) and several other key regulators including six transcription factors (FOXC1, FOXL1, GATA2, YY1, ZNF354C and TP53) and five microRNAs (miR-124-3p, miR-34a-5p, miR-1-3p, miR-7-5p and miR-99b-5p) for these disease networks. Protein-drug interaction analysis discovered the interaction of the hub genes with 46 SRC-related and 11 MMP9- related drugs that may have a therapeutic effect on IBD and HCC. In conclusion, this study sheds light on the potential connecting mechanisms of HCC and IBD. Introduction Hepatocellular carcinoma (HCC), a major malignant form of the liver, is known as one of the most dangerous cancers and a leading cause of cancer deaths globally [[34]1]. Surgical resection, transplantation and local ablation remain as standard therapeutic regimens for patients at an early stage of HCC with high overall survival (OS) rates. HCC patients with later stages, on the other hand, are usually subscribed for radio-/chemotherapies with significantly poorer outcomes [[35]2]. These findings suggest an urgent demand for novel early diagnostic biomarkers and clinical treatment guidance for HCC patients. Generally, HCC develops in patients with cirrhosis and chronic liver inflammation, which were driven by a hepatitis virus infection, alcohol consumption, long-term smoking and non-alcoholic fatty liver-associated diseases [[36]3–[37]6]. Emerging evidence has demonstrated a contributing role of gut microbiomes in HCC development. Accordingly, microbial dysbiosis and leaky gut stimulate the release of microbiota-associated metabolites, remarkably contributing to hepatic inflammation, fibrosis, cell growth and anti-apoptosis signals [[38]7, [39]8]. Inflammatory bowel disease (IBD) is a chronic inflammation of the gastrointestinal tract, which includes Crohn’s disease (CD) and ulcerative colitis (UC) [[40]9]. Despite of being different in the clinical features, the pathogenesis of CD and UC involves the same risk factors, such as genetic susceptibility and alteration in gut microbiome and immune response [[41]9]. Abnormal gut microbiota, for example, may contribute to intestinal inflammation and immune response dysregulation that eventually result in IBD [[42]9–[43]11]. Moreover, recent researches have demonstrated that severe IBD can lead to gastrointestinal cancers [[44]12, [45]13] as well as various extra-intestinal manifestations including cardiovascular diseases, immune-mediated diseases [[46]14–[47]17] and malignancies such as cholangiocarcinoma, lymphoma, melanoma [[48]18–[49]23]. Remarkably, HCC risk among patients with IBD, especially among those with CD, have been repeatedly reported [[50]21, [51]24–[52]26]. These findings suggested a potential cross-talk in the pathophysiological pathways of IBD and HCC that needs to be further elucidated. Despite numerous efforts on decoding the fundamental signaling molecules, the biomarkers and mutual underlying molecular pathways between HCC and IBD remained poorly understood. Therefore, in the present study, we have applied a systems biology approach to discover common potential biomarkers and the underlying mechanisms, thereby providing new insights into the pathology and clarifying the mutual immunity mechanisms of IBD and HCC. Materials and methods Data query The workflow for the current study is presented in [53]Fig 1. To discover the differentially expressed genes (DEGs) in IBD and HCC, three microarray datasets namely [54]GSE75214 ([55]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75214) for IBD and [56]GSE14520 ([57]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14520) and the Cancer Genome Atlas TCGA-LIHC ([58]https://www.cancer.gov) for HCC were used as the input. IBD dataset ([59]GSE75214) comprises the gene expression of biopsies from the colon of 11 controls, 97 UC and eight CD patients and from the (neo-)terminal ileum of 11 controls and 67 CD patients. The HCC dataset [60]GSE14520 contains the gene expression of 225 HCC samples and 220 normal liver tissues and TCGA-LIHC includes 50 normal control and 374 HCC samples. Fig 1. The workflow of the current study. [61]Fig 1 [62]Open in a new tab HCC, hepatocellular carcinoma; IBD, inflammatory bowel disorder; DEGs, differentially expressed genes; PPI, protein-protein interaction; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; TFs, Transcription factors. Differential gene expression analysis Prior to the differential gene expression analyses, data were transformed using log2 function in R program (ver. 4.0.2; R Development Core Team, Vienna, Austria). Then, a principal component analysis (PCA) was conducted using prcomp function to remove outliers. The genes (probes) that were expressed in less than three samples were excluded. The Limma [[63]27] and DESeq2 packages [[64]28] were used to find the significant DEGs and the p-values were corrected using the False Discovery Rate (FDR) correction toolkit in the R software (ver. 4.0.2; R Development Core Team, Vienna, Austria) for GSE and TCGA data. The significant DEGs with FDR < 0.05 for GSE (fold change > 1) and TCGA (fold change > 2) data were identified. A total of 1,793 immune genes (IMGs) were attained from 17 categories from the Analysis Portal (ImmPort) website ([65]https://www.immport.org) and Immunology Database after excluding the duplicates [[66]29]. The common DEGs between IBD, HCC and IMGs datasets were then identified and visualized by using VennDiagram package [[67]30]. Gene ontology and pathway enrichment analysis Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway via the clusterProfiler package [[68]31] were used to assess the enrichment of common immune-related DEGs. A p-value < 0.01 was used as a cut-off to determine the significant enrichment GO terms and KEGG pathways. Identification of hub genes via protein-protein interaction (PPI) network construction PPI network based on STRING ([69]https://string-db.org/), an online tool for protein interaction analysis, was constructed for a set of common immune-related DEGs. Homo sapiens was selected as the organism for subsequent analysis. Network visualization in STRING was transferred to Cytoscape software ([70]http://www.cytoscape.org/; version 3.8.2) to explore target modules and potential hub genes. The modules for potential hub genes in the PPI network via MCODE plugin were identified with MCODE score > 2 and nodes > 3 as the cut-off criteria. The interaction scores with a moderate confidence level of 0.4 was considered as the cut-off for constructive visualization and disjoint nodes were hidden. Degree > 21 was selected as a cut-off for hub gene identification. Survival analysis The TCGA data of 347 patients with HCC were used for survival analysis. These HCC patients were assigned into low-risk and high-risk groups based on their median value of the prognostic risk score. Univariate Cox regression analysis was performed to explore the correlation between the DEGs and OS [[71]32]. The hazard ratio (HR) of death was computed and Bonferroni adjusted p-values < 0.05 was considered statistically significant. The protein expressions of prognostic hub genes The Human Protein Atlas (HPA; [72]https://www.proteinatlas.org/) is an open-access resource for human transcriptome and proteome [[73]33]. The hub genes were validated by immunohistochemistry results in HCC and normal tissues obtained from the HPA database and a previous study [[74]34]. The immune infiltration of prognostic hub genes Since lymphocyte infiltration is an important indicator for lymph nodes’ status and cancer survival, the association between the hub genes expression levels and the immune infiltration levels in HCC was evaluated using TIMER version 1 (TIMER: Tumor Immune Estimation Resource) [[75]35]. Identification of DEGs-interacted transcription factors and microRNAs To identify transcription factors (TFs) and microRNAs (miRNAs) that bind to the hub genes to regulate their expression, TF-target and miRNA-target interaction analyses were performed using two open-access databases, JASPAR [[76]36] and MirTarbase [[77]37], respectively, followed by a topological analysis using NetworkAnalyst [[78]38]. Protein-drug interaction analysis Protein-drug interaction analysis provides information about the potential interaction between drugs and the target genes [[79]39]. To identify potential drugs from the Comparative Toxicogenomics Database (CTD) that might interact with the common DEGs, protein-drug interaction analysis was performed via NetworkAnalyst [[80]38]. Gene-disease association analysis The gene-disease associations by DisGeNET, which cover a wide range of biomedical characteristics of diseases, was commonly used to understand human genetic diseases [[81]40]. The relationship between common DEGs and associated diseases was explored through NetworkAnalyst [[82]38]. DNA methylation analysis The UALCAN tool was used to find the correlation between DNA methylation and four hub genes. It provided information on TCGA gene expression, DNA methylation, clinical data and friendly web resource [[83]41]. Results Common DEGs among HCC, IBD and IMGs Gene expression datasets (HCC-[84]GSE14520, HCC-TCGA and IBD-[85]GSE75214) and IMGs list were collected in the current study. There were 9,045, 10,657, and 4,406 significant DEGs in the IBD-[86]GSE75214, HCC-[87]GSE14520, HCC-TCGA datasets, respectively. The Jvenn tool showed common DEGs among groups, 112 common DEGs among HCC, IBD and IMGs have been detected ([88]Fig 2; [89]S1 Table). These common DEGs were then subjected to further downstream analyses. Fig 2. The Venn diagram for visualization of immune-related differentially expressed genes found in IBD, HCC. [90]Fig 2 [91]Open in a new tab The value represented the number of unique gene symbols covered from the ensemble IDs and probe IDs. IBD, Inflammatory bowel disease; HCC, Hepatocellular carcinoma; IMGs, immune genes; TCGA, The Cancer Genome Atlas. Gene ontology and pathway enrichment analysis of common DEGs The top significantly enriched GOs and KEGG pathways were shown in [92]Fig 3A and 3B, respectively. The GO analysis revealed that these common DEGs were significantly contributed to negative regulation of response to external stimulus, cell chemotaxis and regulation of inflammatory response under biological process ([93]Fig 3A). For cellular component-GOs, common DEGs were significantly involved in the external side of the plasma membrane and secretory granule lumen. Lastly, for molecular function, DEGs were mainly involved in receptor-ligand activity, signaling receptor activator activity, cytokine activity, etc. ([94]Fig 3A). In addition, the most importantly enriched KEGG pathways included cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptors, axon guidance, IL-17 signaling pathway in cancer ([95]Fig 3B). Fig 3. [96]Fig 3 [97]Open in a new tab The mainly enriched (A) gene ontologies (GOs) and (B) KEGG pathways for 112 DEGs. The abscissa represents the number of genes enriched in the function. MF, molecular function; CC, cellular component; BP, biological process. Determination of hub proteins This PPI network analysis via STRING database is commonly used to investigate the biological responses in disease and health conditions. The PPI network of 112 common DEGs analysis revealed 20 hub genes that meet the cut-off degree > 21 ([98]Fig 4A; [99]S2 Table). Six modules for potential hub genes in the PPI network with MCODE score > 2 and nodes > 3 were identified ([100]S1 Fig). The visualization of the main module using Cytoscape showed four hub proteins in the center, namely CXCL2, MMP9, SPP1 and SRC ([101]Fig 4B). The topological features and involvement of the hub proteins in HCC and IBD were presented in [102]Table 1. Fig 4. The protein-protein interaction (PPI) network of 112 immune-related differently expressed genes. [103]Fig 4 [104]Open in a new tab The hub proteins were selected based on the topological parameter (degree>21). (A) The PPI network was generated using STRING. (B) The main module showed four hub genes in the center by MCODE plugin in the Cytoscape. Table 1. Overview of four hub proteins obtained from the protein-protein interaction network in HCC and IBD. Symbol Degree Aspect CXCL2 >21 • CXCL2 is downregulated in HCC. Overexpression of CXCL2 inhibits HCC cell proliferation and tumor growth; induces apoptosis [[105]34]. • [106]CXCL2 was highly expressed in the inflamed colon of IBD patients [[107]42, [108]43]. Overexpression of the corresponding receptor CXCR2 in mesenchymal stromal cells induces anti-inflammatory effect [[109]42]. MMP9 >21 • MMP9 is associated with tumor invasion and poor outcomes and is expected to be a potential predictive marker for HCC patients [[110]44]. • MMP9 is upregulated in inflamed mucosa or serum of IBD patients and is a novel marker for intestinal inflammation [[111]45]. SPP1 >21 • SPP1 promotes tumor growth in HCC; a diagnostic and therapeutic marker for HCC; SPP1 polymorphisms are associated with HCC occurrence [[112]46, [113]47]. • SPP1 is up-regulated in IBD. The SPP1 expression by CD103−dendritic cells (DCs) is crucial for their pathogenicity. Inhibiting the interaction of SPP1 with integrin α9 expressed on CD103−DCs abolished their inflammatory effects [[114]48]. SRC >21 • SRC promotes HCC progression, invasion and metastasis [[115]49, [116]50]. • c-SRC activity is highly induced in premalignant ulcerative colitis epithelia, and is strongly associated with colon cancer development [[117]51]. [118]Open in a new tab HCC: Hepatocellular carcinoma; CXCL2: C-X-C Motif Chemokine Ligand 2; MMP9: Matrix Metalloproteinase 9; SPP1: Secreted Phosphoprotein 1; SRC: Proto-oncogene tyrosine-protein kinase Src. The hub genes validation The twenty hub genes from the PPI network with the highest degree were subjected to the survival analysis using univariate Cox analysis. The results exposed the significance of four hub genes (CXCL2, MMP9, SPP1 and SRC) as prognostic makers of HCC ([119]Fig 5; [120]S3 Table). Specifically, the increased expression levels of MMP9 (p = 0.028), SPP1 (p = 0.0001) and SRC (p = 0.032) and the decreased expression levels of CXCL2 (p = 0.026) were strongly related to poorer prognosis in HCC patients ([121]Fig 5). Fig 5. [122]Fig 5 [123]Open in a new tab Association of the overall survival and four potential hub genes (A) CXCL2; (B) MMP9; (C) SPP1; (D) SRC in HCC based on Kaplan–Meier plotter. The horizontal axis signifies the time to event (in days). The patients were stratified into the high- and low-risk-level group and labeled with green and red color, respectively. HR is the hazard ratio of the high-risk over low-risk groups and p < 0.05 indicates a statistically significant difference. The hub genes immunohistochemistry expression The protein expression levels of four hub genes (CXCL2, MMP9, SPP1 and SRC) in HCC and control group was explored via the HPA database and a previous study [[124]34]. Accordingly, protein expression levels of MMP9, SPP1 and SRC were substantially increased and CXCL2 was decreased in HCC compared to the controls ([125]Fig 6). Fig 6. Immunohistochemistry of three hub proteins from the Human Protein Atlas database. [126]Fig 6 [127]Open in a new tab Protein levels of (A) MMP9 in HCC tissues; (B) MMP9 in normal liver tissues; (C) SPP1 in HCC tissues; (D) SPP1 in normal liver tissues; (E) SRC in HCC tissues; (F) SRC in normal liver tissues. The correlation between hub genes and immune cell infiltration and their methylation status in HCC We used TIMER online tool to explore association between hub genes and six immune cell types (CD4+/CD8+ T cells, B cells, macrophages, neutrophils and dendritic cells) and tumor purity by the Spearman tests. These analyses indicated that MMP9 expression was significantly correlated infiltrating levels of all six immune cell types and tumor purity, especially high for dendritic cell, B cell, macrophage and CD8+ T cells; SPP1 expression was significantly associated with infiltrating levels of macrophage and dendritic cell; and SRC expression was significantly linked to macrophage, dendritic cell, CD4+ T cells, B cell and neutrophil ([128]S2 Fig). In addition, gene expression and methylation analysis showed significant differences in both gene expression ([129]Fig 7A) and methylation patterns ([130]Fig 7B) of CXCL2, MMP9, SPP1 and SRC between liver tumor and normal liver tissue samples. Furthermore, a negative correlation between methylation patterns and gene expression was also noted for three genes (MMP9, SPP1 and SRC). This finding indicated that upregulation of these three hub genes might be a result of their diminished DNA methylation in HCC. Fig 7. The expression and methylation status of hub genes in hepatocellular carcinoma (n = 377) as compared to the normal controls (n = 50) using TCGA samples. [131]Fig 7 [132]Open in a new tab (A) Expression levels of CXCL2, MMP9, SPP1 and SRC using TCGA samples; (B) Promoter methylation levels of CXCL2, MMP9, SPP1 and SRC, respectively. Determination of regulatory signatures Next, a network-based approach was performed to screen for the DEG-TF, DEG-miRNA interactions, thereby detecting the potential regulatory molecules of the hub DEGs. The gene-TF and gene-miRNA networks revealed six TFs namely FOXC1, FOXL1, GATA2, YY1, ZNF354C and TP53 ([133]Fig 8A; [134]S4 Table) and five miRNAs namely miR-124-3p, miR-34a-5p, miR-1-3p, miR-7-5p and miR-99b-5p ([135]Fig 8B; [136]S5 Table) as the potential regulators of the four hub genes. The biological functions of these TFs and miRNAs in HCC are presented in [137]Table 2. Fig 8. Hub gene interaction network. [138]Fig 8 [139]Open in a new tab (A) Hub genes—transcription factors (TFs): The red spheres represent four hub genes; the blue squares represent six major hub genes-associated TFs and the cyan squares represent other gene-associated TFs; (B) Hub genes—miRNAs: red spheres represent four hub genes; the bigger blue squares show five main hub gene-associated miRNAs; the smaller blue squares show other hub gene-associated miRNAs. Table 2. Top regulatory signatures of common DEGs predicted from DEG-TF and DEG-miRNA interaction networks. Factors Biological functions References