Abstract Background Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related mortality globally, and ranks fifth in terms of incidence. It primarily affects males and has a high prevalence in Asia. Risk factors include hepatitis B and C, liver cirrhosis, nonalcoholic fatty liver disease (NAFLD), and alcohol consumption. Late-stage diagnosis results in a poor survival rate of approximately 20%, underscoring the need for early detection methods to improve the survival rates. This study aimed to identify prognostic biomarkers for HCC through bioinformatic analysis of microarray datasets, providing insights into potential therapeutic targets. Methods We analyzed five microarray datasets, comprising 402 HCC samples and 121 control samples. To identify relevant biological pathways, we conducted differential gene expression, Gene Ontology (GO), and KEGG pathway enrichment analyses. We identified hub genes and quantitatively assessed transcription factors and microRNAs targeting these genes. Additionally, molecular docking and dynamic simulations (100 ns) were used to identify potential drug candidates capable of inhibiting the activity of differentially expressed hub genes. Results Our bioinformatic approach identified several promising HCC biomarkers. Among these, CDK1/CKS2 was identified as a key therapeutic target, with a regulatory role in HCC pathogenesis, suggesting its potential for further investigation. Digoxin (DB00390) has been highlighted as a potential repurposed drug candidate because of its favorable drug-likeness and stability, as confirmed by virtual screening, ADMET analysis, molecular docking study and dynamic simulations. Conclusion This study enhances our understanding of HCC biology and offers new insights into drug interactions. It presents several promising biomarkers for the early diagnosis, prognosis, and therapy. Further investigation into CDK1/CKS2 as a therapeutic target and the role of the identified biomarkers could contribute to improved diagnostic and therapeutic strategies for HCC. Keywords: differential gene identification, PPI network, biomarkers, docking, molecular dynamics simulation Highlights * • Prognostic biomarkers for hepatocellular carcinoma (HCC) were identified by analyzing five microarray datasets containing 402 cancer and 121 control samples. * • CDK1 has been observed to be upregulated in HCC samples and is considered a potential oncogene involved in various cellular functions and processes. * • Molecular docking studies indicated that Digoxin (DB00390) has a strong binding affinity for the CDK1/CKS2 protein. * • Root mean square deviation (RMSD) analysis confirmed the stability of the complexes throughout the 100 ns production period. * • Molecular dynamics simulations revealed the flexibility of the binding site, the binding free energy of the complexes, and region-specific residue contributions to ligand binding. 1. Background Liver cancer primarily comprises of hepatocellular carcinoma (HCC) ([40]Bray et al., 2024). It ranks third in cancer-related mortality globally, and is the fifth most common cancer worldwide ([41]Wang et al., 2021). The incidence of HCC is notably higher in males than in females ([42]Chidambaranathan-Reghupaty et al., 2021; [43]Singh et al., 2020), with most cases being reported in Asia ([44]Marques et al., 2020). The risk factors for HCC are diverse, and include HBV or HCV infection, liver cirrhosis, nonalcoholic fatty liver disease (NAFLD), and alcoholism ([45]Ghouri et al., 2017; [46]Bray et al., 2024). HCC induced by HBV and HCV is believed to be particularly prevalent ([47]Ghouri et al., 2017; [48]Singh et al., 2020). Late-stage diagnosis of HCC significantly contributes to early mortality, with survival rates of approximately 20% in patients diagnosed at advanced stages ([49]Zhou et al., 2020). Conversely, early detection can improve survival rates by nearly 70% ([50]Tsuchiya et al., 2015). Numerous pathological biomarkers associated with HCC have been identified, many of which serve as prognostic indicators for poor outcomes. These biomarkers include genes and non-coding RNAs that regulate pathways controlling both proliferative and non-proliferative cellular activities, and may also function as immune checkpoint inhibitors ([51]Marques et al., 2020). Non-coding RNAs, including miRNAs, lncRNAs, and snoRNAs, play a significant role in cellular activities in hepatocellular carcinoma (HCC) by regulating oncogenes and associated signaling pathways ([52]Arefnezhad et al., 2024). Identifying specific biomarkers for HCC is a critical area of research ([53]Schlosser et al., 2022). Although numerous biomarkers have been implicated in the disease, microarray data analysis methods have become widely employed to predict causative biomarkers in various cancers, including HCC. Several studies have utilized bioinformatic approaches to investigate cancers ([54]Shil et al., 2022; [55]Mitra et al., 2023). These studies have adopted diverse methodologies to identify potential biomarkers and to examine their roles in disease prognosis. Most studies have focused on microarray datasets from the Gene Expression Omnibus (GEO) database to identify prognostic genes, miRNAs, and transcription factors associated with HCC. Our study aimed to identify significant biomarkers that can act as prognostic indicators for hepatocellular carcinoma (HCC) through comprehensive bioinformatics analysis. We obtained five datasets from the Gene Expression Omnibus (GEO) database and performed differential gene expression analysis to identify common biomarkers between diseased and control samples. To further our investigation, we employed various bioinformatics tools for Gene Ontology (GO) and KEGG enrichment analyses, hub gene identification, and survival analysis. Additionally, we conducted a quantitative assessment of transcription factors (TFs) and miRNAs targeting the differentially expressed genes (DEGs). We also performed virtual screening and selected potential drug candidates based on ADMET analysis and found Digoxin (DB00390) as an inhibitor of CDK1. Digoxin, a purified digitalis preparation derived from the leaves of Digitalis lanata (foxglove), is primarily used in the therapy of congestive heart failure, breast cancer ([56]Kurzeder et al., 2025) and prostate cancer ([57]Lin et al., 2004). Recently, in hypoxia-induced mice models, HCC tumor size has been significantly reduced by administering Digoxin ([58]Zhang M. S. et al., 2022). However, the detailed mechanism remains unknown. Following the identification of key biomarkers, we applied molecular docking and molecular dynamics simulation techniques to explore drug candidates that may inhibit gene activity, potentially hindering liver cancer progression and offering therapeutic options for patients with HCC. These in silico methods significantly reduce both the cost and time associated with drug discovery while ensuring a reasonable level of accuracy ([59]Aghajani et al., 2022). 2. Methods 2.1. Collection of datasets The microarray datasets [60]GSE41804 ([61]Hodo et al., 2013), [62]GSE45267 ([63]Wang et al., 2013), [64]GSE62232 ([65]Schulze et al., 2015), [66]GSE112790 ([67]Shimada et al., 2019), and [68]GSE121248 ([69]Wang et al., 2007) were obtained from the Gene Expression Omnibus repository ([70]https://www.ncbi.nlm.nih.gov/geo/). Each dataset belonged to the platform [71]GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0. [72]GSE41804 consists of 20 resected live tumor tissues and resected non-tumor liver tissues. [73]GSE45267 contained 48 primary HCC samples and 39 non-cancerous samples. [74]GSE62232 contained 81 HCC solid tumors and 10 non-tumor liver tissues. [75]GSE112790 contained 183 liver cancer tumor tissues and 15 normal liver tissues. [76]GSE121248 contained 70 tumor samples and 37 adjacent normal samples. 2.2. Differential gene expression analysis Differential gene expression analysis was conducted using GEO2R ([77]https://www.ncbi.nlm.nih.gov/geo/geo2r/) for the [78]GSE41804, [79]GSE45267, [80]GSE62232, [81]GSE112790, and [82]GSE121248 datasets. Statistically significant up- and downregulated genes were filtered based on an adjusted p-value (adj. P-value) < 0.05 and |Log2FC| > 2. To identify the common up- and downregulated genes, Venn diagrams were constructed using a Multiple List Comparator ([83]https://molbiotools.com/listcompare.php). 2.3. Gene ontology (GO) and KEGG pathway enrichment analysis GO and KEGG enrichment analyses were performed using DAVID ([84]DAVID Functional Annotation Tools ncifcrf.gov). The groups of common up-and downregulated genes were analyzed separately. An adjusted P-value <0.05 was considered to identify statistically significant terms and pathways. 2.4. Construction of PPI network and hub gene selection The protein-protein interaction network of the common up-and downregulated gene set was generated in the Integrated Interactions Database ([85]IID utoronto.ca) and by selecting “Retrieve all PPIs of query proteins’ in the option ‘Select retrieval strategy.” Network analyses were performed using Cytoscape ([86]https://cytoscape.org/) to identify the top 10 hub genes using the plugin Cytohubba (through the EPC algorithm). 2.5. Identification of transcription factors (TF) associated the DEGs The TF2DNA database ([87]https://www.fiserlab.org/tf2dna_db/) was used to identify the Transcription factors (TFs) that regulate the transcription of upregulated and downregulated genes. The species chosen was Homo sapiens, Sources: TF2DNA (computational), and p-value = 0.0001. 2.6. Identification of miRNA targets The miRWalk database ([88]http://mirwalk.umm.uni-heidelberg.de/) was used to identify the miRNAs that target the mRNAs of the upregulated and downregulated genes. The top five upregulated and downregulated genes were chosen for analysis. 2.7. Survival analysis and validation of hub genes GEPIA ([89]http://gepia.cancer-pku.cn/) was used to identify potential candidate biomarkers associated with liver Hepatocellular Carcinoma (LIHC) prognosis. For this purpose, we assessed the genes that showed statistically significant (p < 0.05) Overall Survival (OS) and Disease-Free survival of patients (DFS) affected with LIHC. The survival analysis was done using the “overall survival” method and the median group cutoff was set to 50% for both high and low and the hazards ratio was taken into consideration. The expression levels of these genes in LIHC-affected tissues were determined by comparing their expression levels in normal tissues. Analyses were based on the TCGA-LIHC cohort. 2.8. Virtual screening and molecular docking The necessary PDB file (PDB ID: 6GU7) was downloaded from the Protein Data Bank. This file contains the crystalized structure of CDK1, CDK1 regulatory subunit 2 and the co-crystallized molecule AZD-5438 ([90]Wood et al., 2019). AZD-5438 is identified as a potential inhibitor of CDK1/CKS2 ([91]Byth et al., 2009). CDK1 shows high co-occurrence and co-expression with CKS2 ([92]Zhang et al., 2019). The protein file was prepared by removing the water molecules and other ligand molecules, adding missing hydrogens and fixing the atom names in the Discovery Studio (2024) prior to virtual screening. The modified PDB file was submitted to the Drugrep (cao.labshare.cn) server for virtual screening using the databases of FDA approved drugs (contains 4714 drugs) and experimental drugs (contains 6883 drugs) ([93]Gan et al., 2023) out of the three available drug libraries (FDA approved drug library, experimental drug library and traditional Chinese medicine). After screening, DB00390, DB00511, DB00696, DB00872, DB01396, DB09102, and DB11581 were selected for repurposing based on their binding affinity with CDK1/CKS2. These ligands were docked (blind docking) in the HDOCK server ([94]http://hdock.phys.hust.edu.cn/) and used for further analysis. Additionally, AZD-5438 was docked with the CDK1/CKS2 to obtain its binding affinity and root mean square deviation (RMSD) for comparison. 2.9. ADMET analysis Properties, such as absorption, distribution, metabolism, excretion, and toxicity, were tested for the selected molecules to determine their competence as drug molecules. ADMET analysis was performed using the ADMETlab 3.0 web server ([95]Fu et al., 2024). For the calculation of LD50, ProTox web server ([96]Banerjee et al., 2024) was used. Drug likeness was calculated using the pkCSM web server ([97]Pires et al., 2015). 2.10. Molecular dynamics simulation Necessary molecular dynamics simulations were performed to concretize the results of the molecular docking study. MD simulations were performed using Gromacs 2019 software and the Compute Unified Device Architecture (CUDA) API on NVIDIA graphics processors. CDK1/CKS2 and CDK1/CKS2-Digoxin complex derived from molecular docking were imported into the gromacs and converted from *.pdb to *.gro format. Gromacs works with proteins, nucleic acids, and lipids, with predefined force field parameters available in the software itself. However, these required force field parameters are not available for the ligands in Gromacs. So, a third-party server, “Ligand parameter generator” or “Ligpargen” ([98]Jorgensen and Tirado-Rives, 2005; [99]Dodda et al., 2017a; [100]Dodda et al., 2017b) was used to generate force field parameters for ligands. The output files from Ligpargen are used directly in gromacs, which define its topology and force-field parameters. The protein-ligand complexes were placed in a virtual 3D cube filled with water and simulated using the SPC/E water model. The entire system was then electrically neutralized by adding the necessary monovalent Na+ and Cl-ions, and a GROningen MOlecular Simulation (GROMOS) field was applied. A constant temperature ([101]Akharume and Adedeji, 2023) of 300 K and 1.00 bar pressure was applied to the system using NPT and NVT as an ensemble class. A leapfrog integrator was used to compile the motion equations with a 2-fs time step size. A Verlet (buffered neighbor searching) cutoff scheme was used for non-bonded interactions, a modified Berendsen thermostat was applied to control the temperature, and the Parrinello-Rahman method was used to regulate pressure. Particle Mesh Ewald was used for calculations of long-range electrostatics. Both the short-range electrostatic and short-range Van der Waals cutoffs ([102]Piana et al., 2012) were set to 1.0 nm. Standard Gromacs protocols were used to minimize the energy of the system and perform the simulation for 100 ns. Similar parameters were used to simulate the apoprotein (CDK1/CKS2) for 100 ns for comparison. Tools provided by Gromacs software were used to calculate the trajectories of individual atoms. The same software was used to plot the Root Mean Square Deviation (RMSD), Root Mean Square Fluctuation (RMSF), solvent-accessible surface area (SASA), Radius of Gyration (RoG) for both the CDK1/CKS2 and CDK1/CKS2-Digoxin complex. SASA was calculated using the rolling ball algorithm ([103]Richmond, 1984). Additionally, the number of hydrogen bonds, hydrogen bond angle and bond distance during simulation time (100 ns) were calculated for CDK1/CKS2-Digoxin complex. Another tool g_mmpbsa ([104]Kumari et al., 2014) was used to calculate the Molecular Mechanics - Poisson-Boltzmann Surface Area or MM-PBSA from the topology and trajectory data of CDK1/CKS2-Digoxin complex generated from the simulation. 3. Results 3.1. Common set of genes differentially expressed across the datasets Differential gene expression analysis was performed on five datasets ([105]GSE41804, [106]GSE45267, [107]GSE62232, [108]GSE112790, and [109]GSE121248) using the cut-off criteria of an adj. P-value <0.05 and |log2FC|> 2. Using the Venn diagram ([110]Figure 1), we identified 19 upregulated and 61 downregulated genes that were common among these datasets. FIGURE 1. [111]Two Venn diagrams compare datasets GSE41804, GSE45267, GSE62232, GSE121248, and GSE112790. The left diagram highlights the number 18 in a red circle, while the right highlights 61. Overlapping sections show shared elements between datasets. [112]Open in a new tab Venn diagrams of differential gene expression analysis across five microarray datasets. Common up and downregulated genes are circled in red. 3.2. DEG enriched GO terms and KEGG pathways From the GO and KEGG pathway enrichment analysis in DAVID, 18 GO terms and one pathway were enriched by the upregulated genes, whereas 54 GO terms and 9 KEGG pathways were enriched by the downregulated genes. The GO terms were Biological Process (BP), Cellular Components (CC), and Molecular Functions (MF). The results are shown in [113]Supplementary Table S1 . 3.3. PPI network revealed hub genes The PPI network of both up-and downregulated genes constructed in the Integrated Interactions Database (IID) was analyzed in Cytoscape using the plugin Cytohubba (EPC algorithm). We obtained the top 10 nodes among the upregulated and downregulated genes, as scored by Cytohubba. Since CDK1 and ESR1 in the up-and downregulated networks were the highest ranked, they were predicted to be hub genes. 3.4. Transcription factors (TF) targeting the top DEGs The TFs of the top five upregulated and downregulated genes were identified using the TF2DNA database. The upregulated genes, NEK2, CDK1, SULT1C2, RRM2, and ANLN, were regulated by 22, 76, 7, 42, and 40 TFs, respectively. Similarly, the downregulated genes, ESR1, LIFR, CXCL12, ADRA1A, and IGF2, were regulated by 113, 30, 36, 11, and 136 TFs, respectively. The results are presented in [114]Supplementary Table S2. 3.6. Top DEGs targeted by miRNA The top five upregulated and downregulated genes were subjected to analysis in the miRWalk database to identify miRNAs that participate in the post-transcriptional modification of these genes. The upregulated genes NEK2, CDK1, SULT1C2, RRM2, and ANLN were targeted by 12, 14, 10, 8, and 7 miRNAs, respectively. Similarly, the downregulated genes, ESR1, LIFR, CXCL12, ADRA1A, and IGF2 were targeted by 58, 36, 36, 32, and 31 miRNAs, respectively. The results are presented in [115]Supplementary Table S3. 3.7. Patient survivability affected due to hub gene expression The top five upregulated and downregulated genes were analyzed for their association with overall and disease-free survival in LIHC using the TCGA-LIHC cohort, which was divided into high and low expression groups. Kaplan-Meier analysis of the upregulated genes NEK2, CDK1, SULT1C2, RRM2, and ANLN showed that the CDK1 high-expression group had significantly poorer OS and DFS than the low-expression group and all other upregulated gene groups ([116]Figure 2). In contrast, among the downregulated genes ESR1, LIFR, CXCL12, ADRA1A, and IGF2, ESR1 had the poorest OS and DFS in the low-expression group of LIHC patients compared to the other downregulated gene groups ([117]Figure 2). FIGURE 2. [118]Nine survival analysis graphs depict overall and disease-free survival over 120 months. Each graph compares survival rates of high and low expression groups for different genes (NEK2, CDK1, RRM2, SULT1C2, ANLN). Red lines represent high expression, and blue lines represent low expression. Annotations include Logrank p-values, hazard ratios, and sample sizes. Each graph's x-axis shows time in months, and the y-axis shows percent survival. Nine Kaplan-Meier survival plots show overall and disease-free survival rates by months for different gene expression groups: ESR1, IGF2, LIFR, CXCL12, and ADRA1A. Red lines represent high expression groups, blue lines represent low expression groups. Logrank p-values, hazard ratios, and sample sizes are included for each plot. [119]Open in a new tab Overall survival and Disease free survival of the top 5 up and downregulated genes in HCC. The expression levels of the upregulated and downregulated genes in LIHC-affected and normal TCGA cohort data were also evaluated using GEPIA ([120]Figure 3). All genes, except NEK2 and ANLN, showed statistically significant upregulation in LIHC samples compared to normal samples. Similarly, except for LIFR, all other genes showed statistically significant downregulation in LIHC samples compared to that in normal samples. FIGURE 3. Boxplots showing expression levels of eight genes (CDK1, NEK2, RRM2, SULT1C2, ANLN, ESR1, IGF2, and LIFR) in tumor (red) versus normal (gray) liver samples. Each plot includes individual data points, median lines, and whiskers indicating variability outside the upper and lower quartiles. Significance is indicated with an asterisk. The sample size is 369 tumor and 50 normal samples for each gene. [121]Open in a new tab Expression level of the up and downregulated genes in log[2] (TPM + 1) in LIHC patients compared to normal. 3.8. Properties of drug molecules identified by ADMET analysis ADMET analysis provides insight into various properties of drug molecules, such as absorption, distribution, metabolism, excretion, and toxicity ([122]Tables 1–[123]5). These properties help determine whether the drug should be taken orally, its absorption rate in the gastrointestinal tract, the likelihood of being metabolized by the cytochrome P450 group of enzymes, and the excretion rate of the drug molecule ([124]Zhao et al., 2021). It also sheds light on potential toxic effects, such as hERG-inhibition, skin sensitization, hepatotoxicity, neurotoxicity, carcinogenicity, and various effects on the eyes. TABLE 1. Absorption properties of the drug candidates. Absorption DB00390 DB00511 DB00696 DB00872 DB01396 DB09102 DB11581 Caco-2 Permeability −4.974 −5.21 −5.029 −4.931 −5.192 −5.002 −4.927 MDCK Permeability (10–6 cm/s) −4.892 −4.773 −4.748 −4.685 −4.763 −4.618 −4.734 PAMPA (logPeff) 0.26 0.411 0.108 0.017 0.954 0.574 0.811 Pgp-inhibitor 0.001 0.001 0.095 0.985 0.001 0.988 0.021 Pgp-substrate 0.933 0.998 1.0 0.812 1.0 0.306 0.058 HIA 0.0 0.0 0.019 0.0 0.005 0.001 0.0 [125]Open in a new tab TABLE 5. Toxicity properties of the drug candidates. Toxicity DB00390 DB00511 DB00696 DB00872 DB01396 DB09102 DB11581 hERG Blockers no no no no no yes no hERG Blockers (10um) no no no yes no yes no skin sensitization yes yes yes yes yes yes yes AMES Mutagenicity yes yes yes yes yes yes yes Carcinogenicity no yes yes yes yes yes yes Eye Corrosion no no no no no no no Eye Irritation no no no yes no no no Human Hepatotoxicity yes yes yes yes yes yes yes Drug-induced Neurotoxicity no yes yes yes yes yes yes LD50 (mg/kg) 5105 5105 800 3550 650 700 2000 [126]Open in a new tab TABLE 2. Distribution properties of the drug candidates. Distribution DB00390 DB00511 DB00696 DB00872 DB01396 DB09102 DB11581 PPB (%) 91.751 95.179 94.243 98.012 95.487 98.201 97.635 VDss (L/kg) −0.393 −0.373 −0.287 −0.032 −0.395 0.076 0.114 BBB 0.977 0.043 0.85 0.809 0.019 0.0 0.598 OATP1B1 inhibitor 0.913 0.926 0.08 0.79 0.756 0.996 0.885 OATP1B3 inhibitor 0.0 0.001 0.007 0.39 0.069 0.994 0.69 BCRP inhibitor 0.0 0.0 0.001 0.064 0.00 0.269 0.0 MRP1 inhibitor 1.0 0.993 0.693 0.887 0.877 0.948 0.293 [127]Open in a new tab TABLE 3. Metabolic properties of the drug candidates. Metabolism DB00390 DB00511 DB00696 DB00872 DB01396 DB09102 DB11581 CYP2D6 substrate yes yes yes no yes yes yes CYP3A4 substrate yes yes no yes yes yes yes CYP1A2 inhibitor yes yes no no yes yes no CYP2C19 inhibitor no no no no no no yes CYP2C9 inhibitor yes yes no no yes yes no CYP2D6 inhibitor yes yes yes no yes yes yes CYP3A4 inhibitor yes no no no no yes yes [128]Open in a new tab TABLE 4. Excretion properties of the drug candidates. Excretion DB00390 DB00511 DB00696 DB00872 DB01396 DB09102 DB11581 CL[plasma] (mL/min/kg) 3.56 2.136 7.853 4.619 1.705 3.578 1.003 T[1/2] (hours) 2.583 2.053 0.946 0.506 2.339 1.103 1.355 [129]Open in a new tab Caco-2 is a monolayer cell line that is used as a model for the emulation of intestinal drug absorption ([130]Van Breemen and Li, 2005). A high value (>−5.15 log unit) indicates that the drug is likely to be absorbed in the intestines upon oral administration ([131]Flores‐Holguín et al., 2021). DB00390, DB00872, and DB11581 have high permeability, suggesting that these drugs have higher absorption rates than the rest. All drugs except DB00872 and DB09102 can act as p-glycoprotein inhibitors (pgp-inhibitors), while only DB11581 can act as a pgp-substrate. This has a significant effect on drug fate. The value of the volume of distribution (VDss) shows whether the drug is likely to be distributed in the plasma or tissue. Higher VDss values (>−0.45) suggest that the drug is distributed in the tissue ([132]Han et al., 2019). All drugs, except DB09102 and DB11581, were estimated to be distributed in the plasma. DB00390, DB00696, DB00872, and DB11581 are more likely to cross the blood-brain barrier ([133]Pires et al., 2015). Drugs are typically metabolized by Cytochrome P450 enzymes including CYP3A4, CYP1A2, CYP2C1, CYP2C9, and CYP2D6 ([134]Ogu and Maxa, 2000). It is important for the drug to be metabolized by cytochrome P450 enzymes, as this phenomenon prohibits the accumulation of the drug in the system and causes side effects ([135]Jindal et al., 2019). DB09102 and DB00390 have a high chance of being metabolized, whereas DB00696 and DB00872 are likely to accumulate in the body. The CL[plasma] parameter shows the renal absorption rate of a drug ([136]Horde and Gupta, 2024). DB00390, DB00511, DB00872, and DB09102 exhibited low renal clearance and DB01396 as well as DB11581 exhibited extremely low renal clearance. These drugs can have high efficiency as they are more likely to be reabsorbed by the kidneys. However, the T[1/2] values, which indicate the half-life of the drug inside the body, were favorable for DB00390 and DB00511. While assessing toxicity, it was found that except DB00390, all the other drugs exhibited some levels of carcinogenicity. None of the drugs caused eye corrosion, and only DB00872 caused eye irritation. However, all drugs displayed some levels of skin sensitization, Mutagenicity, and Hepatotoxicity. DB00390 alone showed no drug-induced neurotoxicity. The lethal Dose 50 (LD50) parameter was used to show the amount of drug needed to be lethal to 50% of the test subjects. This value was calculated in mg/kg units. DB00390 and DB00511 are safe for oral administration ([137]Gadaleta et al., 2019). All the drugs were screened to estimate their drug-likeness based on their physicochemical characteristics ([138]Table 6). DB00696 and DB00872 fall within the range of 100–600 g/mol and satisfy rule-1 however all the other drugs slightly deviate from this range and have molecular weights of approximately 700 g/mol. DB00872 only follows rule 2 and shows a log P value within the 0–3 log mol/L range. All drugs chosen here show 0-11 rotatable bonds and follow rule 3. DB00390, DB00696, and DB00872 show hydrogen bond acceptors less than 12, maintaining rule 4. All drugs showed less than seven hydrogen bond donors, upholding rule 5. However, no drug falls within the range of 0–140 A^2 surface area and does not uphold rule 6 ([139]Alsfouk et al., 2024). The six drug selection rules applied in this study should be regarded as flexible guidelines rather than rigid cutoffs, similar to the Rule of Five (RoF), which suggests that compounds failing two or more criteria may have poor oral bioavailability, though each criterion is often used individually as a filter ([140]Yusof and Segall, 2013). Moreover, DB00390 (Digoxin) is a well-established drug molecule, and digitalis-based drugs such as digitoxin and Digoxin are still in clinical use as oral medications for treating heart failure and atrial arrhythmias ([141]Zhou et al., 2023). TABLE 6. Physicochemical properties to determine drug-likeness. Name Molecular weight (g/mol) LogP Rotatable bonds H-bond acceptors H-bond donors Surface area (A^2) DB00390 757.765 4.9059 3 11 2 315.675 DB00511 774.731 5.6738 6 14 2 323.322 DB00696 569.577 2.506 1 9 0 244.996 DB00872 496.570 4.86952 3 4 2 219.165 DB01396 732.694 5.3213 6 13 4 305.916 DB09102 722.762 5.6664 5 14 0 308.777 DB11581 852.329 9.6965 9 13 2 355.938 [142]Open in a new tab 3.9. Candidate drugs selected by molecular docking analysis Seven drug candidates were selected for molecular docking studies: DB00390, DB00511, DB00696, DB00872, DB01396, DB09102, and DB11581. The HDOCK server was used for the molecular docking study, and the results are summarized in [143]Table 7. Four of the seven selected drugs, DB00390, DB00511, DB09102, and DB11581, showed high affinity for the receptor molecule. However, DB00511 had a high ligand RMSD, indicating that the drug is less likely to bind to the receptor compared to the other three drugs. DB00390 and DB00511 form two hydrogen bonds with the receptor, making the complexes more stable than the rest. DB09102 and DB11581 did not form any hydrogen bonds. DB00390 and DB00511 showed a high number of hydrophobic interactions (17 and 14, respectively). The remaining candidates showed a moderate number of hydrophobic interactions ([144]Figures 4, [145]6). Although DB00511, DB09102, and [146]DBB11581 showed high affinity, low ligand RMSD, and a considerable number of interactions, they were not considered because of their potential carcinogenicity and high toxicity values obtained from the previous ADMET analysis. Therefore, DB00390 was considered suitable for further molecular dynamic simulation studies. TABLE 7. Docking score with CDK1/CKS2, RMSD and number of interacting bonds of DB00390, DB00511, DB00696, DB00872, DB01396, DB09102, DB11581. Drug ID Docking score Ligand RMSD (Å) Number of H-bonds Number of hydrophobic interactions DB00390 −245.01 50.77 2 17 DB00511 −245.66 64.29 2 14 DB00696 −201.7 65.35 1 10 DB00872 −177.17 62.12 1 10 DB01396 −222.61 43.71 1 8 DB09102 −247.75 43.46 0 12 DB11581 −247.75 43.46 0 12 AZD-5438 −129.94 64.62 0 8 [147]Open in a new tab FIGURE 4. [148]Molecular structures are shown in eight panels labeled with codes: DB00390, DB00511, DB00696, DB00872, DB01396, DB09102, DB11581, and AZD-5438 with CDK1/CKS2 receptor. Each structure is displayed with vivid blue, white, and beige coloring, indicating hydrophobicity levels. [149]Open in a new tab Hydrophobic interactions of DB00390, DB00511, DB00696, DB00872, DB01396, DB09102, DB11581 and AZD-5438 with CDK1/CKS2 receptor. Hydrophobicity varies from +3 (brown) to −3 (blue). FIGURE 6. [150]Chart (a) displays RMSD over time in picoseconds, with spikes reaching up to 5 nanometers. Chart (b) illustrates RMS fluctuation by residue, ranging from 0 to 2 nanometers, with higher variability. Graph (c) shows the radius of gyration over time in picoseconds with Rg values mostly stable around 3.5 nm but with spikes around 60,000 ps and 90,000 ps. Graph (d) depicts the solvent accessible surface area, starting around 650 nm² and gradually decreasing over time, with minimal fluctuations and a stable red line at the bottom around 100 nm². [151]Open in a new tab (a) RMSD, (b) RMSF comparison of CDK1/CKS2 (black) and CDK1/CKS2-Digoxin complex (red) (c) Radius of Gyration, (d) Solvent Accessible Surface Area. Moreover, a comparative study with a well known molecule (AZD-5438) was performed through successive molecular interactions to establish repurposed drug DB00390 as a potential inhibitor of CDK1/CKS2. While AZD-5438 showed nearly half affinity towards the CDK1/CKS2 complex compared to DB00390, the RMSD value of AZD-5438 was slightly less than DB00390. Furthermore, Ligplus analysis showed that the number of hydrogen bonds and hydrophobic interactions with CDK1/CKS2 dropped drastically ([152]Figure 5) for AZD-5438, inferring Digoxin (DB00390) as a better inhibitor. FIGURE 5. [153]Molecular docking representations of various compounds, each labeled with a unique identifier such as DB00390, DB00811, DB00696 (Ergotamine), DB00872, DB01396, DB09102, DB11851, and AZD-5438. Structures display atoms and interactions with target proteins, highlighted by red arcs and labeled with amino acid names. [154]Open in a new tab Hydrophobic interactions and hydrogen bonds of DB00390, DB00511, DB00696, DB00872, DB01396, DB09102, DB11581 and AZD-5438 with CDK1/CKS2 receptor (Ligplot analysis). 3.10. Validated receptor-drug interaction confirmed by molecular dynamics simulation Based on molecular docking and ADMET analyses, DB00390 (Digoxin) was used for the molecular dynamic simulation study. Various graphs i.e., RMSD, RMSF, RoG, SASA, MM-PBSA, Hydrogen bond number, angle and distance were derived from the molecular dynamics simulation results ([155]Figure 6). Fluctuation of RMSD around 0.25 nm and decreasing to 0.1 nm at 70–80 ns indicates that the CDK1/CKS2-Digoxin complex is fairly stable ([156]Maruyama et al., 2023). Comparatively, the RMSD of the apoprotein remained at 0.5 nm with occasional spikes reaching up to 4.5–5.0 nm, reflecting instability in the absence of the ligand ([157]Figure 6a). This implies that compared to CDK1/CKS2, the CDK1/CKS2-Digoxin complex is more stable and has enhanced inhibitory potential and potentially more biologically effective. The average RMSF of each residue was calculated to determine the residues that fluctuated during the simulation. The residues of CDK1/CKS2 generally showed higher fluctuation compared to the CDK1/CKS2-Digoxin complex implying higher instability ([158]Figure 6b). The radius of gyration (RoG) of CDK1/CKS2-Digoxin also showed low fluctuation around 5 nm. The RoG of CDK1/CKS2 stayed around 3.5 nm with two abrupt increases upto 5 nm and 4.7 nm at 60 ns and 85 ns respectively ([159]Figure 6c). The solvent-accessible surface area is also a good indicator of protein complex compactness and stability. In our case, the SASA of CDK1/CKS2-Digoxin was fluctuating between 75 and 95 square nm compared to 600 square nm of CDK1/CKS2 ([160]Figure 6d), which indicates that the protein remained compact and stable during the simulation ([161]Bagewadi et al., 2023). Higher value of RoG but lower values of RMSD, RMSF and SASA suggest possible shift in mass distribution rather than instability of the CDK1/CKS2-Digoxin complex ([162]Miu et al., 2008). The MM-PBSA graph shows that the total binding free energy drops from initial −50 kJ/mol to −400 kJ/mol at 40 ns and 60 ns and stays around −200 kJ/mol at the end of the simulation ([163]Figure 7a). This gradual reduction of the total binding free energy indicates that the whole complex achieves stability over time ([164]Kumari et al., 2014). Molecular dynamic simulation analysis shows 1–3 hydrogen bonds being generated between the protein and the drug during the simulation time period of 100 ns ([165]Figure 7b). Hydrogen bond distances and angles show the fluctuation between 0.24 and 0.35 nm and 2–30° respectively, which infer a stable bonding between the protein and the drug ([166]Figures 7c,d). FIGURE 7. Two graphs are shown. The first graph (a) shows protein Van der Waals energy over 100,000 picoseconds with energy values ranging from zero to negative 500 kJ/mol. The second graph (b) depicts the number of hydrogen bonds over the same timeframe, ranging from zero to five. Both graphs demonstrate fluctuations. Two graphs illustrate hydrogen bond distribution. The top graph plots frequency against donor-acceptor distance in nanometers, peaking near 0.3 nanometers. The bottom graph shows frequency relative to hydrogen-donor-acceptor angles in degrees, with fluctuations peaking around 10 degrees. [167]Open in a new tab (a) MM-PBSA graph representing the total binding energy of the CDK1/CKS2-DB00390 complex, (b) Number of Hydrogen bonds, (c) Distribution of Hydrogen bond distance and (d) Distribution of Hydrogen bond angle during the 100 ns simulation. 4. Discussion Hepatocellular carcinoma (HCC) is one of the most challenging cancers to treat owing to its late diagnosis and complex characteristics. It often affects males more than females. Numerous studies have focused on bioinformatic analyses to identify biomarkers for HCC. In our study, we conducted a differential gene expression analysis across five microarray datasets, identifying 19 upregulated and 64 downregulated genes that were commonly observed in all datasets. These commonly upregulated and downregulated genes were then subjected to GO and pathway enrichment analyses. The number of significant GO terms and KEGG pathways enriched with the upregulated genes was significantly lower than that of their downregulated counterparts. Following this, we constructed a PPI network for the up-and downregulated gene sets of HCC and identified key hub genes through analysis of each of the gene sets. We selected CDK1 and ESR1 as hub genes for our analysis. From the differentially expressed gene sets, we selected the top five upregulated (CDK1, NEK2, RRM2, SULT1C2, and ANLN) and downregulated genes (ESR1, IGF2, LIFR, CXCL12, and ADRA1A) for our studies related to gene-miRNA and TF gene targets, including patient survival analysis. We found that these genes including CDK1 and ESR1 were targets of a large number of TFs and miRNAs. In addition, CDK1 and ESR1 showed significant and poor OS and DFS in patients with HCC, and they were validated to be upregulated and downregulated in patients with HCC. KEGG/GO enrichment analysis showed that various pathways can potentially be influenced by the upregulation of CDK1 gene, e.g., mitochondrial ATP synthesis coupled electron transport, G2/M transition of mitosis, protein serine/threonine kinase activity, etc. We found that ESR1 is involved in proteoglycan mediated cell signaling pathways and shaping the tumor microenvironment. This finding is supported by earlier research of [168]Dituri et al. (2022). SOX transcription factors (SOX4, SOX7, SOX15, SOX17) and FOXM1 downregulated the expression of ESR1 ([169]Xiao et al., 2019) while FOXN2 and FOXS1 were involved in upregulation of CDK1 expression ([170]Liu et al., 2021). The gene ESR1 encodes a transcription factor that is located in the chromosomal region chromosomal region 6q25.1 ([171]Tsiambas et al., 2011). Its protein structure comprises N- and C-terminal domains, activation and ligand-binding domains, and DNA-binding domains. ESR1 has been proposed to play a regulatory role in liver, prostate, endometrial, and lung cancer ([172]Hu et al., 2022). A cell line-based study found that ESR1 induces upregulation of lncRNA MEG3 under high glucose conditions, which hinders HCC progression ([173]Cheng et al., 2022). A study found that miR-9-5p downregulates ESR1 gene activity in HCC, causing cancer cell proliferation, migration, and invasion ([174]Wang et al., 2021). ESR1 can also regulate the expression of MMAA, an obesity-metabolism differential gene, to prevent HCC, as shown in a study in females ([175]Zhang Y. et al., 2022). CDK1 is a member of the serine-threonine protein kinase family, which is important during transitions from the G1/S or G2/M phase of the cell cycle ([176]Enserink and Kolodner, 2010). Studies have shown that the gene activity of CDK1 is regulated by miR-582-5p in pathways involving CDK1 and AKT3, which in turn induces cell cycle arrest in the G0/G1 phase. miR-582-5p is downregulated ([177]Zhang et al., 2015). CDK1 also plays a role in immune infiltration in HCC ([178]Zou et al., 2020). A bioinformatic study revealed that CDK1, FOXM1, TCF7L1, E2F4, and SIN3A can act as important transcriptional regulators in HBV-induced HCC ([179]Wang et al., 2018). Thus, CDK1 may act as a prognostic biomarker for HBV-induced HCC. We propose that CDK1 and ESR1, identified as upregulated and downregulated genes, respectively, could serve as potential biomarkers for HCC. However, we focused exclusively on the upregulated gene CDK1 as a candidate for molecular docking and dynamics simulations, hypothesizing that a suitable drug could effectively inhibit CDK1 protein. In cancer, downregulated genes - particularly tumor suppressors-are typically not targeted for inhibition; rather, therapeutic strategies aim to restore or enhance their expression and function. In some contexts, increasing the activity of such genes may involve inhibiting their negative regulators instead. ESR1 (estrogen receptor 1) exhibits tumor-suppressive behavior in liver cancer, where its downregulation is associated with disease progression ([180]Hishida et al., 2013). Studies have shown that ESR1 knockout in mouse models increases susceptibility to liver tumorigenesis ([181]O’Brien et al., 2021), while its activation through estrogen signaling suppresses tumor growth ([182]Fuentes and Silveyra, 2019). Given this tumor-suppressive role, ESR1 was excluded from further analysis focused on drug inhibition effects. Other bioinformatic studies on HCC have identified CDK1 and ESR1 as potential hub genes ([183]Table 8). For instance, [184]Zhang et al. (2021) found that these genes were upregulated and downregulated, respectively, in HCC using a different set of microarray samples than those used in our study. Another study highlighted CDK1 and ESR1 as hub genes, both of which are associated with poor survival outcomes ([185]Ni et al., 2019). In spite of some earlier research findings showing CDK1 as a prominent regulator of liver cancer ([186]Ni et al., 2019; [187]Sun and Zhang, 2020; [188]Hao et al., 2021; [189]Su et al., 2023), molecular docking studies have identified treatment options such the Phellodendron and Anemarrhena drug pair (PADP) as a promising inhibitor of CDK1 ([190]Ruan et al., 2022). However, molecular dynamics modelling has not yet shown Digoxin and CDK1’s interaction behaviour. To bridge this gap, further evaluation of Digoxin’s mechanism of action inside cancer cells is necessary to understand its potential as a CDK1 inhibitor. TABLE 8. A brief review of previous studies in connection to our study. References Dataset used Key methodology Status of CDK1 and ESR1 in the