Abstract Background Diabetic foot ulcers (DFUs) are among the most prevalent and dangerous complications of diabetes. Angiogenesis is pivotal for wound healing; however, its role in the chronic wound healing process in DFU requires further investigation. We aimed to investigate the pathogenic processes of angiogenesis in DFU from a molecular biology standpoint and to offer insightful information about DFU prevention and therapy. Methods Differential gene and weighted gene co-expression network analyses (WGCNA) were employed to screen for genes related to DFU using the downloaded and collated GSES147890 datasets. With the goal of identifying hub genes, an interaction among proteins (PPI) network was constructed, and enrichment analysis was carried out. Utilizing a variety of machine learning techniques, including Boruta, Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Least Absolute Shrinkage and Selection Operator (LASSO), we were able to determine which hub genes most strongly correspond to DFU. This allowed us to create an ideally suited DFU forecasting model that was validated via an external dataset. Finally, by merging 36 angiogenesis-related genes (ARGs) and machine learning models, we identified the genes involved in DFU-related angiogenesis. Results By merging 260 genes located in the green module and 59 differentially expressed genes (DEGs), 35 candidate genes highly associated with DFU were found for more investigation. 35 candidate genes were enriched in epidermal growth factor receptor binding, nuclear division regulation, fluid shear stress, atherosclerosis, and negative regulation of chromosomal structure for the enrichment study. Fifteen hub genes were found with the aid of the CytoHubba plug. The LASSO method scored better in terms of prediction performance ([45]GSE134341) (LASSO:0.89, SVM:0.65, Boruta:0.66) based on the validation of the external datasets. We identified thrombomodulin (THBD) as a key target gene that potentially regulates angiogenesis during DFU development. Based on the external validation dataset ([46]GSE80178 and [47]GSE29221), receiver operating characteristic (ROC) curves with higher efficiency were generated to confirm the potential of THBD as a biomarker of angiogenesis in DFU. Furthermore supporting this finding were the results of Western blot and real-time quantitative polymerase chain reaction (RT-qPCR), which showed decreased THBD expression in human umbilical vein endothelial cells (HUVECs) cultivated under high glucose. Conclusions The findings implicate that THBD may influence DFU progression as a potential target for regulating angiogenesis, providing a valuable direction for future studies. Keywords: DFU, Angiogenesis, DEGs, WGCNA, Machine learning 1. Introduction Diabetic foot ulcer, a vascular consequence of diabetes, affects approximately 6.3 % of people worldwide [[48]1]. Diabetes accounts for at least half of all amputations, with DFU infections being the most prevalent cause [[49]2]. Although there have been significant improvements in clinical care and the clinical presentation of DFU is quite evident, most patients continue to experience chronic non-healing wounds [[50]3]. To prevent and cure DFU, it is beneficial to investigate its pathophysiology at the molecular level. A biological process known as Angiogenesis occurs when smooth muscle and endothelial cells migrate, proliferate, and sprout on top of pre-existing vessels to form new capillaries [[51]4]. Numerous angiogenic and anti-angiogenic factors control the angiogenesis process [[52][5], [53][6], [54][7]]. One of the key mechanisms in wound tissue repair is angiogenesis, which occurs throughout the wound healing course. Traumatic revascularization facilitates the restoration of blood circulation to the wound and creates favorable conditions for wound healing by supplying vital metabolic components such as oxygen and nutrients to wound tissue cells [[55]8]. The dynamic regulation of angiogenesis may be altered because of inconsistent results in the expression of promotional and anti-angiogenic factors as a result of tissue loss, necrosis, and the presence of gangrene in patients with diabetic feet, leading to delayed wound healing or even non-healing. The specific mechanisms underlying the reduction of angiogenesis in DFUs remain unclear and require further investigation. Machine learning is highly efficient in identifying disease-related biomarkers, making accurate clinical diagnoses, and determining the best course of therapy in the field of biomedicine [[56]9]. In this study, we explored crucial hub genes highly associated with the regulation of chronic non-healing due to reduced angiogenesis in DFU using publicly available data, a literature review, WGCNA, differential expression analysis, and robust prediction models. 2. Methods 2.1. Data sources and overview [57]Fig. 1 is a visual representation of the study protocol. Six skin tissue samples regarding diabetes patients and seven control tissue samples made up the [58]GSE147890 dataset, which was gathered from the Gene Expression Omnibus (GEO) database ([59]http://www.ncbi.nlm.nih.gov/geo/). The DFU-related [60]GSE134347, [61]GSE80178, and [62]GSE29221 datasets have been compiled from the GEO database for the purpose of testing the accuracy of these results. The Hallmark Gene set of the Molecular Signatures Database (GSEA | MSigDB, [63]gsea-msigdb.org) yielded 36 genes in total, which are referred to as ARGs [[64]10]. [65]Supplemental Table 1 contains detailed information on ARGs. Fig. 1. [66]Fig. 1 [67]Open in a new tab Workflow diagram of the research. 2.2. Data processing and DEGs analysis The R software was used as the primary analytical tool for the investigation. For further improving the reliability of the outcomes, we initially corrected the backdrop of the expression profile data using the limma package's Normalize Between Arrays function. DEGs were then identified using a further differential analysis with a cutoff point of |log2 (fold-change) | > 0.05 along adj. P-value <0.05 [[68]11], based on the R package "limma.” A heat map and a volcano gram were used to display the findings of the DEGs investigation. 2.3. Building the joint expression system and locating the DFU-related key module We applied the 'WGCNA' package to implement weighted network analysis of gene co-expression in an attempt to find modules related to DFU [[69]12]. The adjacency matrix determines the network relationships. A soft threshold is selected for the adjacency matrix by constructing a scale-free network. In the following step, a topological overlap matrix was created employing the adjacency matrix. To discover genes with joint expression modules, investigations on block module function and module division were conducted out. Different colors were used to denote certain modules that had various correlations with the DFU. Genes contained in the key module, which was deemed to be the most pertinent module, were used for additional studies. 2.4. Study of the genes linked to DFU through GO and KEGG pathway enrichment To identify prospective genes strongly related to DFU, we merged the DEGs and genes of the green module. The R package "clusterProfifiler" was utilized to conduct functional analyses of Gene Ontology (GO) and Kyoto Encyclopedia for Genes and Genomes (KEGG) pathways, with the aim of investigating the possible functions in biological systems and pathways tied to the candidate genes. Significant results were defined as p-value <0.05 and q-value <0.1 [[70]13]. Three subject matters were included in the GO enrichment analysis: molecular function (MF), cellular component (CC), and biological process (BP). The findings were displayed employing the "ggplot2" program. 2.5. Building the network of interactions among proteins and locating hub genes Networks of interactions among proteins deliver significant insight for comprehending the molecular processes behind disease processes. Cytoscape was used to construct the PPI network of 35 candidate genes based on the Search Tool for the Retrieval of Interacting Genes (STRING) ([71]https://string-db.org/). Subsequently, the CytoHubba plug-in of Cytoscape was used to screen hub genes through scoring. Further comprehensive analyses of key genes were performed. 2.6. Solid prediction model constructed with several types of algorithms for machine learning Using the R packages "glmnet," "caret," and "Boruta," we built a machine learning model to identify specific genes predominantly linked with DFU from 15 hub genes [[72]14]. The entire dataset of the 15 hub genes was subjected to LASSO, SVM-RFE, and Boruta analyses. For additional validation of the model's prediction accuracy, the [73]GSE134341 dataset served as an external dataset, and ROC curves acted as a method for examining the efficacy of the model's predictions. As a result, the intersecting genes between the 36 ARGs and the genes discovered using model analysis were regarded as key angiogenesis-related genes associated with DFU. 2.7. Cell culture Shanghai Zhong Qiao Xin Zhou Biotechnology Co., Ltd. (Shanghai, China) was the supplier of the HUVECs. The cells were cultivated at 37 °C in a humid atmosphere with 5 % CO2 using Dulbecco's modified Eagle's medium (DMEM, Gibco, Carlsbad, CA, USA), which included 10 % fetal bovine serum (FBS; Biological Industries, Israel) complemented with 1 % penicillin-streptomycin. 2.8. RT-qPCR confirmation Total RNA was extracted from HUVECs cultured using two different methods, followed by reverse transcription for cDNA synthesis and RT-qPCR. All products were purchased from TIANGEN BIOTECH (BEIJING)Co., LTD. The reference interned gene chosen was β-actin, and the primer sequences were F-CACCATTGGCAATGAGCGGTTC, R-AGGTCTTTGCGGATGTCCACGT. The sequences of THBD to be verified were as follows: F-TCCCAGGAGACAGTTCAAGAAAGC and R-ACCAGTGACAAGGAACAGTGACAG. 2.9. Western blot analysis HUVECs cultured using the two different methods were spiked with phenylmethylsulfonyl fluoride (PMSF) (Seven Innovation (Beijing) Biological Technology Co., Ltd.) and radioimmunoprecipitation assay (RIPA) lysate (Seven Innovation (Beijing) Biological Technology Co., Ltd.) at a ratio of 1:100. The lysate containing the cells was then rested on ice for 30 min, then placed in a 4 °C centrifuge at 12000 rpm and centrifuged for 10 min, the supernatant was finally taken as the total protein. A Bicinchoninic Acid (BCA) Protein Assay Kit (Wuhan Servicebio Technology Co., Ltd.) was then used to determine the protein concentrations of the two groups, which were quantified and leveled. The whole protein was electrophoresed in equal quantities using 12 % sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) from Shanghai EpiZyme Biotechnology Co., Ltd. The protein was then transferred onto PVDF membranes from Merck Millipore, USA. After blocking the PVDF membrane with 5 % skim milk for 2 h at ambient temperature, the primary antibody was incubated at 4 °C for the whole night. The following primary antibodies were used: actin (Abcam, Cambridge, UK) and THBD ([74]ABClone). On the second day, the PVDF membrane was incubated with the matching secondary antibodies for 2 h at ambient temperature, followed by three 10-min washes with Tris-buffered saline and Tween 20 (TBST) (Wuhan Servicebio Technology Co., Ltd.). The prepared ECL reagent (Wuhan Servicebio Technology Co., Ltd.) was applied dropwise to the PVDF membrane and a FluorChem E Imaging System (Protein Simple, USA) was used to capture protein band images. 2.10. Statistical analysis A double-blind method was employed to conduct statistical analyses using GraphPad Prism 7.0 (GraphPad Software, USA). For comparison between groups, defined as unpaired Student's t-test was implemented. A significance level of P < 0.05 was used to determine statistical significance. 3. Results 3.1. Data preprocessing and DEGs screening [75]Fig. 2 demonstrates the comparison of the samples in the [76]GSE147890 expression dataset before and after normalization. We conducted an expression differential analysis using the [77]GSE147890 dataset. These DEGs allowed us to make distinctions between DFU samples and healthy controls, as demonstrated by principal component analysis (PCA) ([78]Fig. 3A). The 59 DEGs found, including 28 upregulated and 31 downregulated genes, are shown on a volcano gram ([79]Fig. 3B). We plotted a heat map to demonstrate the expression of the DEGs ([80]Fig. 3C). [81]Supplemental Table 2 had information on the DEGs. Fig. 2. [82]Fig. 2 [83]Open in a new tab Data on gene expression in samples are normalized. (A) Prior to normalization. (B) Following to normalization. Fig. 3. [84]Fig. 3 [85]Open in a new tab DEGs in DFU identification. (A) The expression of DEGs forms the basis of PCA. The scatter diagram is drawn using Dims 1 and 2, which stand in for the X- and Y-axes, respectively, and each point denotes a sample. (B) The DEGs' volcano gram comparing the DFU and control samples. Cutoff: adj. P-value <0.05 and |log2 (fold-change) | > 0.05. (C) The DEG heat map compares the control and DFU samples. Cutoff: adj. P-value <0.05 and |log2 (fold-change) | > 0.05. 3.2. Identification of key module related to DFU by WGCNA We established a scale-free network with a soft threshold of = 6 to extract the major modules strongly associated with DFU ([86]Fig. 4A). Following that, making use of correlation clustering, we determined all of the module hallmark genes, which represented each module's entirety expression levels. We identified 83 modules and colored them in various ways ([87]Fig. 4B). According to the analysis, the green module had the strongest link to DFU (cor = 0.83, p = 5e04) ([88]Fig. 4C). Out of 723 significant genes in the green module with DFU genes (cor = 0.7, p = 1.6e-107), we identified 260 genes as the most linked with DFU based on GS = 0.65 and MM = 0.75. ([89]Fig. 4D). [90]Supplemental Table 3 contains information about the 260 genes in the green module. Fig. 4. [91]Fig. 4 [92]Open in a new tab Identification of critical module related to DFU by WGCNA. (A) Left: Scale-free index analysis for different soft-threshold powers (β). Analyzing the mean connection for different soft-threshold powers is shown on the right. (B) Joint expression gene module identification. The dendrogram's branches cluster into 83 modules, each of which was given a distinct color designation. (C) Heatmap illustrating the connection between the phenotype and each module eigengene. (D) Participants in the green module and DFU are correlated. 3.3. Genes and pathway enrichment analysis 35 candidate genes strongly associated with DFU were identified ([93]Fig. 5A). These genes were mostly enriched in mitotic spindle assembly checkpoint signaling, negative regulation of chromosomal structure, and regulation of nuclear division in biological processes ([94]Fig. 5B). These genes were primarily enriched in condensed chromosomes, chromosomes, centromeric regions, and chromosomal areas in the cellular components ([95]Fig. 5C). Additionally, these genes showed higher levels of protein kinase regulator activity, epidermal growth factor receptor binding, and kinase regulator activity in molecular function ([96]Fig. 5D). The pathways involved were most prominent in prostate cancer, fluid shear stress, atherosclerosis, and transcriptional dysregulation in cancer, as evidenced by KEGG pathway analysis ([97]Fig. 5E). [98]Supplemental Table 4 provides more information on the enrichment analysis findings. Fig. 5. [99]Fig. 5 [100]Open in a new tab GO and KEGG analysis of 35 candidate genes. (A) 35 potential genes found using DEGs and WGCNA were displayed in a Venn diagram. (B) Analysis of GO enrichment in biological processes. (C) Analysis of GO enrichment in a biological component. (D) Molecular function using GO enrichment analysis. (E) Enrichment analysis using KEGG of 35 candidate genes; the P-value is indicated by the color of the bar. 3.4. Building the network of interactions among proteins and identifying hub genes The PPI network, depicted in [101]Fig. 6A, contained 79 edges and 35 nodes, with an average node degree of 4.51 nodes. The degree is higher if the node is larger and darker than the other nodes. The combined score increased as the edges became darker and broader. Fifteen hub genes were found using the CytoHubba plug-in ([102]Fig. 6B, [103]Table 1). Additionally, we conducted a correlation study using the "ggcorrplot" program to investigate the expression correlation of 15 hub genes. The findings showed that the 15 hub genes had strong positive or negative associations with each other ([104]Fig. 6C). The DFU and control groups differed significantly in the expression of 15 key genes, according to expression analysis. In contrast to the controls, which showed greater expression levels of CENPU, SMC2, CADM1, TOP2A, TTK, CCNA2, RAD51AP1, NDC80, and CENPF, the DFU samples showed higher expression levels of KLF6, THBD, DUSP1, SMAD7, MMP3, and IL1R2 ([105]Fig. 6D). Fig. 6. [106]Fig. 6 [107]Open in a new tab The PPI network's identification and analysis of hub genes. (A) The PPI network's 35 genes. (B) The CytoHubba plug served to identify the hub genes. (C) Correlation heatmap of hub genes. (D) Hub gene expression diagram in violin plot form. Table 1. The detailed information on 15 genes. No. Official Symbol Official Full Name Also known as Summary 1 CENPU centromere protein U CENP50, CENPU50, KLIP1, MLF1IP, PBIP1 The centromere is a specialized chromatin domain, present throughout the cell cycle, that acts as a platform on which the transient assembly of the kinetochore occurs during mitosis. 2 SMC2 structural maintenance of chromosomes 2 CAPE; CAP-E; SMC-2; SMC2L1 Predicted to enable ATP binding activity; chromatin binding activity; and single-stranded DNA binding activity. Involved in mitotic chromosome condensation. Located in condensed chromosome; cytoplasm; and nuclear lumen. Part of condensin complex. 3 CADM1 cell adhesion molecule 1 BL2; ST17; IGSF4; NECL2; RA175; TSLC1; IGSF4A; Necl-2; SYNCAM; sgIGSF; sTSLC-1; synCAM1 Enables signaling receptor binding activity. Involved in several processes, including cell recognition; positive regulation of cytokine production; and susceptibility to natural killer cell mediated cytotoxicity. Located in plasma membrane. 4 TOP2A DNA topoisomerase II alpha TOP2; TP2A; TOPIIA; TOP2alpha This gene encodes a DNA topoisomerase, an enzyme that controls and alters the topologic states of DNA during transcription. This nuclear enzyme is involved in processes such as chromosome condensation, chromatid separation, and the relief of torsional stress that occurs during DNA transcription and replication. 5 TTK TTK protein kinase ESK; PYT; CT96; MPH1; MPS1; MPS1L1 This gene encodes a dual specificity protein kinase with the ability to phosphorylate tyrosine, serine and threonine. Associated with cell proliferation, this protein is essential for chromosome alignment at the centromere during mitosis and is required for centrosome duplication. 6 KLF6 KLF transcription factor 6 GBF; ZF9; BCD1; CBA1; CPBP; PAC1; ST12; COPEB This gene encodes a member of the Kruppel-like family of transcription factors. The zinc finger protein is a transcriptional activator, and functions as a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene, some of which are implicated in carcinogenesis. 7 THBD thrombomodulin TM; THRM; AHUS6; BDCA3; CD141; BDCA-3; THPH12 The protein encoded by this intronless gene is an endothelial-specific type I membrane receptor that binds thrombin. This binding results in the activation of protein C, which degrades clotting factors Va and VIIIa and reduces the amount of thrombin generated. Mutations in this gene are a cause of thromboembolic disease, also known as inherited thrombophilia. 8 CCNA2 cyclin A2 CCN1; CCNA The protein encoded by this gene belongs to the highly conserved cyclin family, whose members function as regulators of the cell cycle. This protein binds and activates cyclin-dependent kinase 2 and thus promotes transition through G1/S and G2/M. 9 RAD51AP1 RAD51 associated protein 1 PIR51 Enables nucleic acid binding activity. Involved in DNA repair; cellular response to ionizing radiation; and positive regulation of DNA recombination. Located in chromatin and nucleus. Part of protein-containing complex. 10 DUSP1 dual specificity phosphatase 1 HVH1; MKP1; CL100; MKP-1; PTPN10 The protein encoded by this gene is a phosphatase with dual specificity for tyrosine and threonine. The encoded protein can dephosphorylate MAP kinase MAPK1/ERK2, which results in its involvement in several cellular processes. This protein appears to play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. 11 NDC80 NDC80 kinetochore complex component HEC; HEC1; TID3; KNTC2; HsHec1; hsNDC80 This gene encodes a component of the NDC80 kinetochore complex. The encoded protein consists of an N-terminal microtubule binding domain and a C-terminal coiled-coiled domain that interacts with other components of the complex. This protein functions to organize and stabilize microtubule-kinetochore interactions and is required for proper chromosome segregation. 12 SMAD7 SMAD family member 7 CRCS3; MADH7; MADH8 The protein encoded by this gene is a nuclear protein that binds the E3 ubiquitin ligase SMURF2. Upon binding, this complex translocates to the cytoplasm, where it interacts with TGF-beta receptor type-1 (TGFBR1), leading to the degradation of both the encoded protein and TGFBR1. Expression of this gene is induced by TGFBR1. Variations in this gene are a cause of susceptibility to colorectal cancer type 3 (CRCS3). Several transcript variants encoding different isoforms have been found for this gene. 13 CENPF centromere protein F CENF; hcp-1; CILD31; STROMS; PRO1779 This gene encodes a protein that associates with the centromere-kinetochore complex. The protein is a component of the nuclear matrix during the G2 phase of interphase. In late G2 the protein associates with the kinetochore and maintains this association through early anaphase. It localizes to the spindle midzone and the intracellular bridge in late anaphase and telophase, respectively, and is thought to be subsequently degraded. 14 MMP3 matrix metallopeptidase 3 SL-1; STMY; STR1; CHDS6; MMP-3; STMY1 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMP's are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. This gene encodes an enzyme which degrades fibronectin, laminin, collagens III, IV, IX, and X, and cartilage proteoglycans. The enzyme is thought to be involved in wound repair, progression of atherosclerosis, and tumor initiation. The gene is part of a cluster of MMP genes which localize to chromosome 11q22.3. 15 IL1R2 interleukin 1 receptor type 2 IL1RB; CD121b; IL1R2c; CDw121b; IL-1R-2; IL-1RT2; IL-1RT-2 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I(IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. [108]Open in a new tab 3.5. Building and verifying the ideal model for prediction Three machine learning algorithms (LASSO, SVM-RFE, and Boruta) produced five, five, and five genes, respectively ([109]Fig. 7A–D). In comparison to SVM-RFE and Boruta, the LASOO algorithm exhibited superior prediction performance according to the Area Under Curve (AUC) values of the three different prediction models ([110]Fig. 7E). Five genes (CADM1, KLF6, THBD, DUSP1, and SMAD7) discovered by LASSO were thought to be crucial genes primarily associated with DFU. These five essential genes enabled PCA to distinguish between DFU samples and healthy controls ([111]Fig. 7F). Fig. 7. [112]Fig. 7 [113]Open in a new tab Building and verifying the ideal predictive model. (A, B) Five DFU-related genes were identified using the minimal lambda-based LASSO technique. (C) Five genes linked to DFU were found using the SVM technique. (D) Five genes linked to DFU were identified by the Boruta algorithm. (E) Validating all three prediction models via an external dataset. (F) PCA of the five essential genes shows good differentiation power. 3.6. Key angiogenesis-related genes associated with DFU We discovered THBD by integrating 36 ARGs with five key genes, most of which are related to DFU, as potential target genes that may eventually alter angiogenesis during the progression of DFU disease ([114]Fig. 8A). Finally, to confirm the potential of THBD as a biomarker linked to angiogenesis in DFU, ROC curves were generated according to the validated external dataset ([115]GSE80178 and [116]GSE29221) ([117]Fig. 8B). Fig. 8. [118]Fig. 8 [119]Open in a new tab Crucial angiogenesis-related genes correlated with DFU were identified and validated. (A) Venn diagram displaying the amount of genes that 36 ARGs and five important genes share in common. (B) Applying two external datasets to validate the crucial angiogenesis-related genes correlated with DFU. 3.7. THBD showed a lower relative expression level in HUVECs with high glucose culture The expression was detected in high glucose and non-HG culture media. The results indicate the protein expression level of THBD in the high glucose culture medium was considerably lower than that in the normal culture medium (P < 0.05), as seen in [120]Fig. 9A and B. Fig. 9. [121]Fig. 9 [122]Open in a new tab THBD showed a lower relative expression level in HUVECs with high glucose culture. (A) The relative expression of THBD was detected in high glucose and non-high glucose culture medium by RT-qPCR. (B) The THBD protein level was detected in high glucose and non-high glucose culture medium by Western blot analysis. For the corresponding electrophoresis gels and the uncropped membranes, see [123]Supplemental Fig. 1. 4. Discussion Despite advancements in treatment such as surgical debridement, biological debridement, wound dressings, and diabetic foot ulcers, a persistent clinical issue continues to place a heavy financial and psychological burden on society and families, affecting nearly 6.3 % of the global population [[124]1]. Angiogenesis is one of the first processes identified as essential for wound healing [[125]15]. However, diabetic wounds fail to heal and remain in a permanent state of chronic inflammation due to vasculopathy and reduced angiogenesis [[126]16]. Therefore, it is necessary to explore the pathological mechanisms of decreased angiogenesis in DFU from a molecular and biological perspective. These investigations will shed light on DFU therapy and preventive strategies. Bioinformatics has become increasingly crucial for the diagnosis, treatment, and prognostication of illnesses because of the rapid advancement of next-generation sequencing technologies. Within the present study, we analyzed 35 DFU-related genes in the [127]GSE147890 gene chip dataset through a combination of WGCNA and gene expression difference analysis. According to further functional annotation analysis, the 35 candidate genes were mostly enriched in binding to the epidermal growth factor receptor, controlling nuclear division, and negatively regulating chromosomal structure. In addition, enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes revealed that the majority of these genes were associated with atherosclerosis, fluid shear stress, prostate cancer, and transcriptional dysregulation. We created network interaction maps for the 35 genes using an online network database. Given the linkage scores, 15 of these genes were chosen as hub genes for further analysis. Among the 15 genes, CADM1, KLF6, THBD, DUSP1, and SMAD7 were shown to be most associated with DFU using three machine learning techniques (LASSO, SVM-RFE, and Boruta), which were verified using external datasets. THBD was discovered to be related to angiogenesis in DFU after combining the 36 ARGs acquired from the database. [128]GSE80178 and [129]GSE29221 served as external datasets to plot the ROC curves, and THBD performed well. Type I transmembrane glycoprotein THBD is found on the surfaces of epidermal keratinocytes and endothelial cells [[130]17]. Through connections and intricate regulatory expression, THBD's five unique structural domains of the THBD integrate important biological processes and biochemical pathways, such as those linked to coagulation, innate immunity, inflammation, and cell proliferation [[131][18], [132][19], [133][20]]. THBD has several implications for the pathophysiology and/or development of illness, according to 20 years of study. THBD has been demonstrated in several preclinical and clinical trials to have a tumor-inhibiting impact by lowering cell invasion, proliferation, and metastasis [[134][21], [135][22], [136][23]]. THBD-dependent synthesis of APCs provides protection against DN by inhibiting glucose-induced death of glomerular endothelial cells and podocytes in mouse models [[137]24]. Through APC-dependent and independent pathways, soluble thrombomodulin, particularly lectin-like structural domains, protects against ischemia-reperfusion damage to the heart, lungs, and kidney [[138][25], [139][26], [140][27]]. However, only a few studies have supported the hypothesis that THBD regulates angiogenesis and affects DFU wound growth. An earlier investigation found that THBD, a regulator of angiogenesis, stimulates angiogenesis in rats following hindlimb ischemia by activating FGFR1 (fibroblast growth factor receptor-1) [[141]28]. Additionally, THBD promotes wound healing and probably plays a role in chronic nonhealing wounds [[142]29,[143]30]. Vascular endothelial cells are the first target cells to be destroyed by microvascular complications in diabetic patients; thus, the target gene THBD was validated in this study using high-glucose cultured HUVECs to mimic the endothelial cell environment of diabetic foot patients [[144]31]. The RT-qPCR results for the inhibition of THBD expression in a high-glucose environment are consistent with the positive effect it exerts on wound healing described above, suggesting that low THBD expression in DFU delays wound healing. In this study, considering the AUC values and a thorough literature analysis, we logically believe that THBD is a regulator of angiogenesis and is implicated in the formation of DFU. This study has several limitations. The findings were primarily based on computational analyses and were validated in HUVECs. Further investigations encompassing diverse cell types, including mature vascular endothelial cells, are planned to substantiate these findings, thereby enhancing the comprehension of THBD expression and functionality within diverse cellular contexts. The restricted sample size of both the internal and external datasets could potentially impact the precision of the results. Hence, a larger cohort of DFU samples would enhance the predictive accuracy. 5. Conclusions To identify possible biomarkers for DFU, we built machine learning prediction models based on bioinformatics studies conducted on a variety of datasets. These results suggested that THBD may influence DFU development as a potential target for regulating angiogenesis, which is an essential topic for our follow-up study. Data availability statement Data included in article/supp. material/referenced in article. Funding statement This study was supported by National Natural Science Foundation of China (82074426, 82104864, 82204822), Natural Science Foundation of Liaoning Province (2021-BS-215, 2022-MS-25, 2023-MS-13), Liaoning Revitalization Talents Program (XLYC1802014), Liaoning Key Research and Development Planning Project (2017226015), Basic Research Projects of Liaoning Provincial Department of Education (LJKMZ20221286), Naural Science Foundation of Tibet Autonomous Region and Regional Science(XZ202301ZR0030G, XZ2023ZR-ZY82(Z)) and Technology Project of Naqu City. CRediT authorship contribution statement Xingkai Wang: Conceptualization, Data curation, Investigation, Supervision, Writing – original draft, Writing – review & editing. Lei Meng: Conceptualization, Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing. Juewei Zhang: Conceptualization, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. Linxuan Zou: Data curation, Investigation, Supervision, Writing - original draft, Writing - review & editing. Zhuqiang Jia: Formal analysis, Methodology, Software, Writing – original draft. Xin Han: Supervision, Validation, Visualization, Writing – original draft. Lin Zhao: Data curation, Investigation, Supervision, Writing – original draft. Mingzhi Song: Formal analysis, Methodology, Software, Writing – original draft. Zhen Zhang: Conceptualization, Funding acquisition, Project administration, Resources, Writing – original draft, Writing – review & editing. Junwei Zong: Conceptualization, Funding acquisition, Project administration, Resources, Writing – original draft, Writing – review & editing. Shouyu Wang: Conceptualization, Funding acquisition, Project administration, Resources, Writing – original draft, Writing – review & editing. Ming Lu: Conceptualization, Funding acquisition, Project administration, Resources, Writing – original draft, Writing – review & editing. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments