Abstract

Background and aims

   Alcohol-related liver disease (ALD) is a worldwide burden. Cuproptosis
   has been shown to play a key role in the development of several
   diseases. However, the role and mechanisms of cuproptosis in ALD remain
   unclear.

Methods

   The RNA-sequencing data of ALD liver samples were downloaded from the
   Gene Expression Omnibus (GEO) database. Bioinformatical analyses were
   performed using the R data package. We then identified key genes
   through multiple machine learning methods. Immunoinfiltration analyses
   were used to identify different immune cells in ALD patients and
   controls. The expression levels of key genes were further verified.

Results

   We identified three key cuproptosis-related genes (CRGs) (DPYD,
   SLC31A1, and DBT) through an in-depth analysis of two GEO datasets,
   including 28 ALD samples and eight control samples. The area under the
   curve (AUC) value of these three genes combined in determining ALD was
   1.0. In the external datasets, the three key genes had AUC values as
   high as 1.0 and 0.917, respectively. Nomogram, decision curve, and
   calibration curve analyses also confirmed these genes’ ability to
   predict the diagnosis. These three key genes were found to be involved
   in multiple pathways associated with ALD progression. We confirmed the
   mRNA expression of these three key genes in mouse ALD liver samples.
   Regarding immune cell infiltration, the numbers of B cells, CD8 (+) T
   cells, NK cells, T-helper cells, and Th1 cells were significantly lower
   in ALD patient samples than in control liver samples. Single sample
   gene set enrichment analysis (ssGSEA) was then used to estimate the
   immune microenvironment of different CRG clusters and CRG-related gene
   clusters. In addition, we calculated CRG scores through principal
   component analysis (PCA) and selected Sankey plots to represent the
   correlation between CRG clusters, gene clusters, and CRG scores.
   Finally, the three key genes were confirmed in mouse ALD liver samples
   and liver cells treated with ethanol.

Conclusions

   We first established a prognostic model for ALD based on 3 CRGs and
   robust prediction efficacy was confirmed. Our investigation contributes
   to a comprehensive understanding of the role of cuproptosis in ALD,
   presenting promising avenues for the exploration of therapeutic
   strategies.

   Keywords: Alcohol-related liver disease, Cuproptosis, Diagnostic,
   Immune infiltration, Machine learning

Abbreviations

   ALD
          alcohol-related liver disease

   ALT
          alanine aminotransferase

   AST
          aspartate aminotransferase

   AUC
          area under curve

   CC
          cellular component

   CDF
          cumulative distribution function

   CI
          credibility interval

   CRG
          cuproptosis-related gene

   DCA
          decision curve analysis

   DBT
          Dihydrolipoamide Branched-Chain Transacylase E2

   DE-CRG
          differential expression of cuproptosis-related gene

   DEGs
          differential expression genes

   DGIdb
          drug-gene interaction databases

   DPD
          dihydropyrimidine dehydrogenase

   DSigDB
          drug signatures database

   GEO
          gene expression omnibus

   GO
          gene ontology

   GSEA
          gene set enrichment analysis

   GSVA
          gene set variation analysis

   HE
          Hematein & Eosin

   KEGG
          Kyoto Encyclopedia of Genes and Genomes

   LA
          lipoic acid

   LASSO
          least absolute shrinkage and selection operator

   NIAAA
          National Institute on Alcohol Abuse and Alcoholism

   PCA
          principal Component Analysis

   PPI
          protein-protein interaction

   RF
          random forest

   ROC
          receiver operating characteristics

   ssGSEA
          single-sample gene set enrichment analysis

   SVM-RFE
          support vector machine‐recursive feature elimination

   TCA
          tricarboxylic acid

1. Introduction

   Alcohol-related liver disease (ALD) is liver damage caused by prolonged
   heavy alcohol consumption and encompasses a range of diseases, such as
   alcoholic fatty liver disease, alcoholic hepatitis, cirrhosis, and
   complications of cirrhosis [[45][1], [46][2], [47][3]]. ALD is one of
   the leading causes of chronic liver disease worldwide, accounting for
   48 % of cirrhosis-related deaths in the United States [[48]4,[49]5].
   ALD, with a comparable incidence in China to that in the United States,
   is emerging as a significant contributor to the overall liver disease
   burden in China [[50]6].

   Copper is an extracellular element responsible for the precise
   functioning of the bone marrow and central nervous system and acts as a
   cofactor for many antioxidant enzymes, such as super oxide dismutase;
   thus, disorders of copper metabolism may cause organ dysfunction
   [[51][7], [52][8], [53][9], [54][10]]. Dysregulation of copper is
   significantly associated with liver disease [[55]11]. Wilson disease is
   a classic example of copper overload, an autosomal recessive genetic
   disorder characterized by multiple mutations in the ATP7B gene
   [[56]12]. Studies have shown a disturbance of copper metabolism in ALD
   patients [[57][13], [58][14], [59][15]]. Alcohol diet can cause liver
   damage and decrease the expression of HIF-1alpha, occludin, SOD1, and
   GPX1 genes. It was also shown that a lack of copper in the diet may
   exacerbate these changes, and supplementation with copper may improve
   these conditions [[60]16]. Furthermore, in vitro cell experiments have
   shown that appropriate copper supplementation can promote cell growth
   and reduce the production of reactive oxygen species (ROS). Previous
   studies have shown that moderate ALD is closely related to oxidative
   stress, and copper deficiency will promote dyslipidemia and increase
   oxidative stress because copper is an essential cofactor of many
   antioxidant enzymes; thus, insufficient copper may further promote the
   damaging effects of alcohol [[61]17]. Alcohol metabolism in the liver
   cells of patients with excessive alcohol consumption produces a large
   level of ROS, resulting in elevated lipid peroxidation, which disrupts
   mitochondrial function and causes liver cell death [[62]18].
   Cuproptosis is a new form of cell death associated with many diseases,
   such as non-ALD,Wilson disease, and hepatocellular carcinoma [[63][19],
   [64][20], [65][21]].

   Research focused on uncovering the relationship between cuproptosis and
   the underlying pathophysiology of ALD is relatively limited. Therefore,
   we conducted a comprehensive study of the expression, diagnosis, immune
   correlation, and mechanism of cuproptosis-related genes (CRGs) in ALD
   is still needed. In this study, we downloaded ALD-related data from the
   Gene Expression Omnibus (GEO) database and conducted analysis and
   machine-learning analyses to explore the relationship between CRGs and
   ALD. We identified differentially expressed genes and the key genes
   among them and subsequently built and externally validated a prediction
   model. Finally, we performed immunoinfiltration analyses, assessed
   related drugs, and investigated associated competing endogenous RNAs
   (ceRNAs).

2. Materials and methods

2.1. Patients and datasets

   The transcriptomic analysis of ALD and normal non-ALD liver specimens
   included two datasets, [66]GSE28619 and [67]GSE103580, downloaded from
   the GEO database. Specifically, [68]GSE28619 (based on the [69]GPL570
   platform) contained 7 healthy control samples and 15 samples of
   alcoholic hepatitis. [70]GSE103580 ([71]GPL13667 platform) included 13
   cases of alcoholic hepatitis and 1 healthy control. The battle function
   of the “sva” package was used to eliminate batch effects and
   potentially unknown variables. After combining data from [72]GSE28619
   and [73]GSE103580, the final datasets included 28 cases of alcoholic
   hepatitis and 8 control cases. The flow chart of this study is shown in
   [74]Fig. 1.

Fig. 1.

   [75]Fig. 1
   [76]Open in a new tab

   Flowchart of the present study.

2.2. Expression of differentially expressed genes in ALD

   The differential expression analysis was performed using the R package
   limma based on the GEO database. We identified differentially expressed
   genes (DEGs) between ALD and healthy samples. The screening criteria
   were as follows: P < 0.05. The intersection of DEGs related to CRGs was
   created using the “VennDiagram” R package and defined as DE-CRGs for
   subsequent analysis.

2.3. Correlation analysis and protein-protein interaction network
construction

   The R software packages “RCircos” and “heatmap” were used to generate
   landscape maps of 23 chromosomes and heatmaps of 18 DE-CRGs,
   respectively. The “circlize” software package was used to create
   correlation Circos plots based on Pearson correlation analysis.
   Eighteen DE-CRGs protein-protein interaction (PPI) networks were
   constructed using the Search Tool for the Retrieval of Interacting
   Genes/Proteins (STRING; [77]https://string-db.org/) database.

2.4. Gene ontology and KEGG pathway enrichment analysis

   The R package “clusterProfiler” was employed to analyze the biological
   function of genes through Gene Ontology (GO) and KEGG pathway
   enrichment analysis. GO annotation included biological processes (BP),
   cellular components (CC), and molecular functions (MF).

2.5. Construction of the CRGs diagnostic model

   Two datasets, [78]GSE28619 and [79]GSE103580, were integrated for
   modeling, and the [80]GSE142530 and [81]GSE155907 datasets were used as
   external validation sets for verification. Employing the “glmnet”
   package in R, we conducted the least absolute shrinkage and selection
   operator (LASSO) regression on the chosen linear model to decrease the
   data dimension while preserving valuable variables [[82]22,[83]23]. The
   main idea of recursive feature elimination (REF) was to build the model
   iteratively and then select the best (or worst) feature (selected
   according to the coefficient) [[84]24]. The maximum interval principle
   of the support vector machine (SVM) was the foundation for the
   sequential reverse selection algorithm known as support vector machine
   recursive feature elimination (SVM-RFE) [[85][25], [86][26], [87][27]].
   We used the “e1071” package in R to screen the best variable genes to
   establish the SVM-RFE model [[88]28,[89]29]. The random forest (RF)
   algorithms were combined to select the best genes. The RF model is a
   regression tree technique that uses bootstrap aggregation and
   randomization of predictors to achieve high prediction accuracy
   [[90]30]. The “randomForest” R package was used for RF. The genes
   selected by three kinds of machine learning were intersected to obtain
   the final key genes.

2.6. Verification of the model

   First, we used the “rms” package in R to build a nomogram model to
   predict the occurrence of ALD. The “pROC” package in R [[91]31] was
   employed to analyze the area under the curve (AUC), specificity, and
   sensitivity of diagnostic value for marker genes using a time-dependent
   ROC. Each central gene was assigned a score, which was then added to
   form a total score. Two external datasets, [92]GSE142530 and
   [93]GSE155907, were used to verify the diagnostic capability of the
   diagnostic model.

2.7. Gene set enrichment analysis and gene set variation analysis

   Gene set enrichment analysis (GSEA) was conducted using the
   “clusterProfiler” package to explore the potential functions of the hub
   genes [[94]32]. In addition, differential expression genes (DEGs) and
   pathway enrichment analyses were executed with the “gene set variation
   analysis (GSVA)”, “clusterProfiler”, and “Limma” packages. Statistical
   significance was determined by enrichment analysis with p < 0.05.

2.8. Evaluating immune infiltration

   Using the “GSVA” package in R, a single-sample GSEA (ssGSEA) algorithm
   was employed to assess enrichment scores between various immune cells
   and functions or pathways between ALD and control groups. We downloaded
   reference gene collections from a public database
   ([95]http://www.immport.org). Spearman correlation analysis was used to
   explore the relationship between the three key genes and the immune
   score. The Wilcoxon test was used to explore differences in immune cell
   and immune-related functional enrichment scores. The enrichment scores
   between the ALD and normal liver sample groups are represented by box
   plots.

2.9. Detection of CRG clusters and gene clusters and their relationship with
immune cells

   The “ConsensusClusterPlus” package was used to distinguish CRG and Gene
   clusters according to CRG and gene regulatory factors, respectively.
   The number of clusters was determined based on a cumulative
   distribution function (CDF) curve and a specific k value [[96]33].
   Then, principal component analysis (PCA) was used to further validate
   the CRG modification pattern [[97]34]. PCA method was applied to
   calculate the CRG scores of each sample [[98]35]. The CRG scores of the
   two clusters were shown in a box graph. Finally, Sankey plots were
   selected to represent the correlation between CRG clusters, gene
   clusters, and CRG scores. The ssGSEA was used to estimate the extent of
   23 immune cells across different CRG and Gene clusters to explore
   correlations between them.

2.10. Drug-gene interaction analyses and ceRNA network construction

   Drug-gene interactions were explored using the drug-gene interaction
   database (DGIdb, [99]https://dgidb.genome.wustl.edu/) [[100]36]. One
   can forecast the miRNA-mRNA relationship by utilizing the three
   essential genes in miRDB ([101]http://www.mirdb.org/) and TargetScan
   ([102]https://www.targetscan.org/vert_80). Evidence of a direct
   interaction between the miRNA and lncRNA was obtained using SpongeScan
   ([103]http://spongescan.rc.ufl.edu/). Cytoscape software (version
   3.9.0) was utilized to visualize mRNA‒miRNA-lncRNA interactions via the
   ceRNA network.

2.11. ALD mouse model construction and histological procedure

   Sixteen 8-week-old female C57BL/6 WT mice were used for this
   experiment. All mice were administered the Lieber-DeCarli diet for 5
   days before feeding to adapt to the liquid diet. The experimental group
   was then freely administered the Lieber-DeCarli diet containing 5 %
   (v/v) ethanol for 10 days. The control group was provided with an equal
   calorie-controlled diet. On day 16, the experimental group was given a
   single dose of ethanol (6 g/kg body weight, 31.5 % ethanol), and the
   control group was given isocaloric dextrin-maltose [[104]37]. All mice
   were euthanized 9 h later, and blood was taken from the inferior vena
   cava with a 1 ml syringe and centrifuged at 3000 rpm for 15 min. The
   supernatant was collected to obtain the mouse plasma. Serum aspartate
   aminotransferase (AST) and alanine aminotransferase (ALT) levels were
   measured using an automatic biochemical analyzer (ANTECH Diagnostics,
   Los Angeles, CA, USA). Liver tissue samples were also obtained from the
   mice. Part of each liver sample was placed in an EP tube and directly
   frozen in liquid nitrogen. The remaining tissue was fixed in formalin
   solution, embedded in paraffin, and stained with hematoxylin & eosin
   (HE) staining. A section of frozen fresh liver tissue was prepared with
   a thickness of 4 mm and stained with Oil Red O. Rabbit monoclonal
   antibodies were used for immunohistochemistry: anti-DBT (12451-1-AP),
   anti-DPYD (27662-1-AP), and anti-SLC31A1 (67221-1-Ig). All animal
   experiments were approved by the Animal Care and Use Committee of the
   First Affiliated Hospital of Guangxi Medical University
   (NO.2023-S606-01).

2.12. Cell culture and treatments

   The AML12 cell line was supplied by the Cell Bank of Type Culture
   Collection of the Chinese Academy of Sciences (Shanghai, China). It was
   cultured in mouse liver parenchymal cell AML12 complete culture medium
   with DMEM/F-12 (1:1) (Gibco, 11330-032) 89 mL ITS Liquid Media
   Supplement (Sigmadg, I3146) 1 mL Dexamethasone (Sigma, D4902-100 mg) 40
   ng/mL FBS (Gibco) 10 mL at 37μC with 5 % CO[2] using a cell incubator.
   When cells reached a confluence of 60–70 %, they were divided into 2
   groups (n = 4), including [[105]1] the control group (treated with NS)
   and [[106]2] the ethanol group (treated with 200 mM ethanol). The
   treatment duration was 24 h. The dose selection of ethanol was based on
   a previous study on carbon tetrachloride-induced rat hepatocyte injury
   model.

2.13. Protein extraction and western blotting

   Liver samples were homogenized in lysis buffer (Solarbio R0010).
   Samples were then sonicated and incubated on ice for 30 min. Debris was
   removed by centrifugation at 12,000 rpm. Cellular total protein was
   extracted using a M-PER Mammalian Protein Extraction Reagent (Pierce,
   Rockford, IL). Samples were separated in a denaturing 10 %
   polyacrylamide gel and transferred to a 0.45 mm pore nitrocellulose
   membrane. Non-specific binding sites were blocked with 5 % nonfat milk
   (YAMEI PS112L) for 1 h at room temperature. Membranes were then
   incubated with appropriate primary antibodies in TBS with 0.1 % Tween
   20 (TBST). Membranes were washed and incubated with secondary
   antibodies conjugated to horseradish peroxidase to show the bands with
   Omni-ECL buffer (YAMEI SQ201). Parallel blotting of GAPDH (Proteintech
   10494-1-AP) was used as an internal control.

2.14. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR)

   The RNA was extracted from mouse liver tissues and the AML12 cell line
   with TRIzol reagent (Thermo Fisher Scientific, USA). RNA was extracted
   and eluted using an RNA binding column, and purified total RNA samples
   were obtained. The RNA was reverse transcribed using the PrimeScript™
   RT reagent Kit (Takara, Japan), and quantitative real-time PCR
   (qRT-PCR) was performed using the FX Connect system (Bio-Rad, USA) and
   SYBR® Green Supermix (Bio-Rad, USA). Hub genes expression levels were
   analyzed using 2−DDCT, and the outcomes were demonstrated using GAPDH
   as an internal control. To ensure the reliability and comparability of
   the data, the following normalization criteria were applied:1. We
   selected GAPDH as reference genes whose expression levels remain
   constant under different experimental conditions; 2, ΔCt Calculation:
   The Ct value of the target gene for each sample was normalized to the
   Ct value of the reference gene, resulting in a ΔCt value; 3. ΔΔCt
   Method: The ΔCt values of the experimental group were compared with
   those of the control group to calculate the ΔΔCt value (ΔΔCt = ΔCt
   (experimental group) - ΔCt (control group)); 4. Calculation of Relative
   Expression Levels: The relative mRNA expression levels were determined
   using formula 2^-ΔΔCt. Reverse transcription of the RNA was performed
   using the PrimeScript™ RT Reagent Kit (Takara, Japan), and qRT-PCR was
   conducted with an FX Connect system (Bio-Rad, USA) and SYBR® Green
   Supermix (Bio-Rad, USA). qRT-PCR was repeated three times. The primers
   used in this study are shown in [107]Supplementary Table S1.

2.15. Statistical analyses

   Continuous variable data are presented as the mean ± standard
   deviation. Student's t-test was employed to compare two groups, while
   the Wilcoxon rank-sum test was used to examine non-normally distributed
   variables. A p value less than 0.05 was considered to indicate a
   significant difference. The symbols *, **, and *** indicate that the p
   values were less than 0.05, 0.01, and 0.001, respectively. R software
   (version 4.2.1) was utilized for all statistical analyses.

3. Results

3.1. Identification of CRGs involved in ALD

   A total of 6234 DEGs (p < 0.05) were identified using the “limma”
   package from two batch-normalized datasets comprising 28 ALD samples
   and 7 control samples ([108]GSE28619 and [109]GSE103580). Heatmap of
   DEGs was presented in [110]Fig. 2A. The volcano plot of the DEGs was
   displayed in [111]Fig. S1, GO analyses of the 6234 DEGs were shown in
   [112]Fig. S2, and KEGG analyses were shown in [113]Fig. S3.

Fig. 2.

   [114]Fig. 2
   [115]Open in a new tab

   Identification of cuproptosis-related genes in ALD. (A) Heatmap of DEGs
   was presented. (B) Venn diagrams showing the intersection between DEGs
   and CRGs. (C) Eighteen DE-CRGs were located on 23 chromosomes, with the
   boxplots illustrating the differential expression between ALD and
   control samples (D). (E) Heatmap showed the expression patterns of
   these 18 DE-CRGs. (F) Gene relationship network diagram of the 18
   DE-CRGs. (G) Correlation analysis of the 18 DE-CRGs was conducted, with
   orange and blue representing positive and negative correlations,
   respectively. (H) GO analysis of DEGs associated with cuproptosis. (I)
   KEGG analysis of DEGs associated with cuproptosis. P values are
   displayed as follows: *p < 0.05; **p < 0.01; ***p < 0.001. ALD,
   alcohol-related liver disease; CRGs,cuproptosis-related genes; DEGs:
   Differentially expressed genes. DE-CRGs: differentially expressed
   cuproptosis-related genes; GO,gene ontology; KEGG, Kyoto Encyclopedia
   of Genes and Genomes.

   In addition, 6234 DEGs overlapped with 38 CRGs, revealing 18 DE-CRGs
   (DLAT, ISCA2, GLRX5, NDUFV2, ACO2, NDUFA1, DPYD, DBT, LIPT1, NFE2L2,
   GLS, PDHB, PPAT, PLAT, CDKN2A, SLC31A1, NDUFA8, and LIPA) with
   significant differences between the ALD and control groups, as shown in
   [116]Fig. 2B. The chromosomal locations of the 18 DE-CRGs were shown in
   a circular diagram ([117]Fig. 2C). Twelve DE-CRGs (ISCA2, NDUFV2, ACO2,
   NDUFA1, LIPT1, GLS, PDHB, PPAT, PLAT, CDKN2A, NDUFA8 and LIPA) were
   upregulated in ALD, and six (DLAT, GLRX5, DPYD, DBT, NFE2L2, and
   SLC31A1) were downregulated in ALD ([118]Fig. 2D and E). A PPI analysis
   using STRING was performed to explore potential crosstalk among these
   18 DE-CRGs, as shown in [119]Fig. 2F. The correlation among the 18
   DE-CRGs was shown in [120]Fig. 2G. We found that DE-CRGs were related
   to aerobic respiration, purine ribonucleotide biosynthetic,
   oxidoreductase, and iron-sulfur cluster binding and other pathways with
   Gene ontology (GO) enrichment analysis including cross gene enrichment
   in Biological Process (BP), Cellular Component (CC), and Molecular
   Function (MF). The results were displayed in [121]Fig. 1H.
   Additionally, lipoic acid (LA) metabolism, citrate cycle, etc., in KEGG
   pathway analysis was depicted in [122]Fig. 2I.

3.2. Identification of diagnostic marker genes for ALD

   Given the individual complexity and heterogeneity of patients with ALD
   and healthy controls, candidate CRGs were identified from 18 DE-CRGs
   using LASSO regression and two validated machine learning models
   (SVM-RFE and RF), which helped predict the diagnosis of ALD. Eight
   DE-CRGs were identified by the LASSO logistic regression algorithm
   ([123]Fig. 3A and B). The number of SVM features was 8 ([124]Fig. 3C).
   The minimum error of the classifier was 0.0, and the maximum accuracy
   was 1.0 ([125]Fig. 3D). Eighteen DG-CRGs were performed with RF
   analysis, of which three had an average Gini reduction greater than 2
   ([126]Fig. 3E and F). A Venn diagram was then used to intersect the
   essential genes in the LASSO, SVM-RFE, and RF analyses. Finally, three
   key genes (DPYD, SLC31A1, and DBT) were identified for further analysis
   ([127]Fig. 3G).

Fig. 3.

   [128]Fig. 3
   [129]Open in a new tab

   Machine learning was utilized to identify diagnostic marker genes for
   ALD. (A) The adjusted parameter selection in the LASSO model was
   cross-validated 10 times, with each curve representing a gene. (B)
   LASSO coefficient analysis. (C) The SVM-RFE algorithm's feature
   selection accuracy and error reach a peak of 100 % at eight genes, with
   the lowest cross-validation error being 0 %. (D) The relationship
   between random forest tree number and error rate was also evident. (E)
   The ALD group's error was represented by a red line, the control
   group's by a green line, and the total sample error by a black line in
   random forest analysis. (F) Eighteen DG-CRGs were performed with random
   forest analysis, which three had an average Gini reduction greater than
   2. (G) Venn diagram showing overlapping genes obtained using the three
   machine learning algorithms (LASSO, random forest, and SVM-RFE). ALD,
   alcohol-related liver disease; LASSO, least absolute shrinkage and
   selection operator; SVM-RFE, support vector machine‐recursive feature
   elimination; DE-CRGs, differential expression of cuproptosis-related
   genes.

3.3. Evaluation of the diagnostic performance of ALD diagnostic marker genes

   A nomogram model of patients with ALD was constructed using the “rms”
   package to evaluate the predictive efficiency of the three central
   genes, DPYD, SLC31A1, and DBT ([130]Fig. 4A). The nomogram model's
   numerical value for each biomarker was used to forecast the ALD risk,
   with a correction curve indicating an obvious connection between the
   predicted and actual probability ([131]Fig. 4B). The DCA revealed that
   the net benefit from this model was significantly higher than 0,
   implying its remarkable accuracy and ability to provide physicians with
   a basis for decision-making ([132]Fig. 4C). The clinical impact curve
   showed that this nomogram model has a very high diagnostic capability
   ([133]Fig. 4D). The ROC curve analysis showed that the three marker
   gene features combined demonstrated high performance in diagnosing ALD
   (AUC = 0.704, [134]Fig. 4E). ROC curves were generated for each of the
   three labeled genes, and the predictive value of each was assessed. The
   individual predictive ROC results for these three genes all exceeded
   0.83 ([135]Fig. 4F), and their expression was decreased in both
   datasets ([136]Fig. 4G and J). The ROC curve for the combination of the
   three genes in the [137]GSE142530 and [138]GSE155907 sets were 0.670
   and 0.800, respectively ([139]Fig. 4H and K), and there were the ROC
   curve for the predicted performance of the three genes separately
   ([140]Fig. 4I and L). These indications suggest that the model based on
   these three marker genes may have strong prediction efficacy for ALD.

Fig. 4.

   [141]Fig. 4
   [142]Open in a new tab

   Validation of marker genes expression. (A) Nomogram of marker genes.
   (B) Calibration curve. (C) The predictive efficiency of the nomogram
   model was illustrated by DCA. (D) The diagnostic ability of the
   nomogram model was further demonstrated by the clinical impact curve.
   (E) The ROC of the combination of the three key genes for the diagnosis
   of ALD was 0.704 (95 % CI 0.504–0.892). (F) ROC results showed that
   three marker genes separately diagnosed ALD. The AUC values of DPYD,
   SLC31A1, and DBT were 0.847, 0.913, and 0.832, respectively. (G)
   Boxplots of [143]GSE142530 revealed that the three DE-CRGs between the
   ALD and control samples were significantly different. (H) The
   three-gene model, based on a threefold cross-validation process in
   [144]GSE142530, yielded ROC results with an AUC value of 0.0.670. (I)
   The AUC values of DPYD, SLC31A1, and DBT were 0.825, 0.721, and 0.729,
   respectively, in [145]GSE142530. (J) The box diagram shows that the
   three genes differed significantly in different groups of
   [146]GSE155907 samples. (K) The three-gene model, based on threefold
   cross-validation in [147]GSE155907, yielded ROC results of 0.800. (L)
   The AUC values of DPYD, SLC31A1, and DBT for the diagnosis of ALD in
   [148]GSE155907 were 0.850, 1.0, and 0.750, respectively. DCA, decision
   curve analysis; ROC, receiver operating characteristics; ALD,
   alcohol-related liver disease; CI, confidence interval; AUC, area under
   curve; DE-CRGs, differential expression of cuproptosis-related genes.

3.4. GSEA, GSVA, and ssGSEA analysis

   We used ssGSEA to identify the major signaling pathways of the three
   genes in the above model. GSEA of the KEGG pathways demonstrated that
   these three genes are implicated in glycine, serine, threonine, and
   retinol metabolism ([149]Fig. 5A–C). DPYD and SLC31A1 are involved in
   cytochrome p450 metabolism ([150]Fig. 5B and C). DBT and SLC31A1 are
   related to hormone biosynthesis ([151]Fig. 5A and C). DPYD is involved
   in tryptophan metabolism, and the influence of cytochrome p450 on
   genomics ([152]Fig. 5B). [153]Figs. S4–S6 shows the GSEA outcome for
   the GO enrichment analysis.

Fig. 5.

   [154]Fig. 5
   [155]Open in a new tab

   An enrichment analysis of the three marker genes. GSEA analysis of the
   signature based on (A) ADBT, (B) DPYD, and (C) SLC31A1. KEGG pathway
   enrichment for (D) DBT, (E) DPYD, and (F) SLC31A1 was performed by
   GSVA, with the top 50 shown according to the enrichment score. The
   boxplots demonstrated the disparities in immune cells (G) and function
   (H) between ALD and non-ALD samples. (I) The correlations of immune
   cells and three genes were also shown (red: positive correlation,
   green: negative correlation). p values are indicated as *p < 0.05;
   **p < 0.01; ***p < 0.001. GSEA, gene set enrichment analysis; GSVA,
   gene set variation analysis; ALD, alcohol-related liver disease.

   GSVA revealed distinct activity pathways between low- and
   high-expression subtypes determined according to the levels of the
   three hub genes. Our analysis revealed that DBT, DPYD, and SLC31A1
   overexpression are involved in extracellular matrix-receptor
   interactions ([156]Fig. 5D–F). Overexpression of DBT and SLC31A1 is
   involved in Leishmania infection, glycan biosynthesis, and viral
   myocarditis. In the high DPYD expression group, galactose metabolism
   globo-series base biosynthesis pathways were active ([157]Fig. 5E). Low
   DBT, DPYD, and SLC31A1 expression levels were linked to lysine
   degradation, tryptophan metabolism, retinol metabolism, threonine
   metabolism, steroid hormone biosynthesis, and other pathways ([158]Fig.
   5D–F). The GSVA result for the GO enrichment analysis is shown in
   [159]Figs. S7–S9.

   In order to verify whether cuproptosis could promote ALD progression by
   mediating immune infiltration, we conducted ssGSEA analysis. ssGSEA
   analysis showed that significantly decreased B cells, CD8 (+) T cells,
   NK cells, T-helper cells, and Th1 cells were observed in ALD patients
   versus normal liver tissue ([160]Fig. 5G). Regarding immune function,
   the APC coinhibition, cytolytic activity, and type II IFN response
   scores were lower in the ALD group than in the control group ([161]Fig.
   5H). [162]Fig. 5F shows that DPYD was significantly connected with the
   type II IFN response, Th1 cells, and NK cells; DBT was significantly
   associated with MHC class 1, CCR, and aDCs. Whereas, SLC31A1 was
   significantly correlated with B cells, APC coinhibition, and CCR
   ([163]Fig. 5I).

3.5. The correlation between CRG clusters and the immune microenvironment

   The “Consensus Cluster Plus” package in R classified different CRG
   clusters according to three hub gene regulators. Taking 2 as the
   optimal k value, 28 ALD samples were divided into CRG clusters A and B
   ([164]Fig. 6A–C). DBT, DPYD, and SLC31A1 expression levels were higher
   in CRG cluster A than in CRG cluster B ([165]Fig. 6D). Heatmaps were
   used to illustrate the different gene expression profiles in the two
   CRG clusters ([166]Fig. 6E). PCA was then used to verify the correction
   of the CRG clustering classification ([167]Fig. 6F). A more in-depth
   analysis of immune cell infiltration was performed considering the
   association between ALD and the immune microenvironment. As shown in
   [168]Figs. 6G and 10 types of immune cells showed significant
   differences between CRG clusters A and B, implying differences in
   immune responses between the two clusters. The association between
   immune cell infiltration and the expression profiles of three CRG
   regulators was confirmed by ssGSEA-based heatmaps ([169]Fig. 6H). In
   addition, we investigated the relationship between immune cell
   infiltration and three CRGs by ssGSEA. As shown in [170]Fig. 6I, DPYD
   was negatively correlated with activated dendritic cells, CD56 bright
   natural killer cells, CD56dim natural killer cells, and monocytes.
   SLC31A1 was negatively correlated with immature dendritic cells and
   positively correlated with monocytes ([171]Fig. 6J). A negative
   correlation was observed between DBT and activated CD4 T cells,
   dendritic cells, gamma delta T cells, immature dendrites, MDSCs,
   macrophages, mast cells, plasmacytoid dendrites, regulatory T cells, T
   follicular helper cells, and type 1 helper cells ([172]Fig. 6K).

Fig. 6.

   [173]Fig. 6
   [174]Open in a new tab

   Correlation between CRG clusters and the immune microenvironment. (A)
   The three CRG hub genes were clustered into CRG clusters A and B under
   unsupervised consensus clustering (k = 2). (B) The consistent
   distribution of various k (ranging from 2 to 9) was described using the
   experience CDF graph. (C) Relative increase in cluster stability by
   delta area fraction. (D) The boxplots displayed the expression level of
   three CRG regulator genes between different CRG clusters. (E) The
   heatmap illustrated the transcriptional profile of CRG regulator genes
   in each cluster. (F) PCA further revealed a significant difference
   between CRG clusters A and B. (G) Different cells in immune
   microenvironment in both groups were presented in the boxplots. (H)
   Heatmap showed the correlation between CRG regulator genes and the
   different immune cells. (I–K) The correlations between three key genes
   and different immune cells. *p < 0.05, **p < 0.01, ***p < 0.001. CRGs,
   cuproptosis-related genes; CDF, cumulative distribution function; PCA,
   principal Component Analysis.

3.6. The correlation between the CRG cluster, gene cluster, and CRG score

   Ninety-two DEGs overlapping between gene clusters A and B were shown in
   [175]Fig. 7A. The optimal k value of the gene cluster was set to 2 to
   ensure compatibility with the number of gene clusters ([176]Fig. 7B–D).
   Heatmaps were used to show the different expression profiles of CRGs
   between the two gene clusters ([177]Fig. 7E). Compared with those in
   gene cluster A, the expression levels of DBT, DPYD, and SLC31A1 were
   increased in gene cluster B ([178]Fig. 7F). We then identified six
   types of immune cells in gene cluster B was significantly higher in
   gene cluster A ([179]Fig. 7G), indicating a significant feature of
   immune microenvironment infiltration. Box charts were used to observe
   the differences in CRG scores between CRG clusters and gene clusters.
   As shown in [180]Fig. 7H, CRG cluster A had a lower CRG score than CRG
   cluster B. The CRG score in gene cluster A was also lower than that in
   gene cluster B ([181]Fig. 7I). Therefore, the CRG scores of different
   CRG clusters and gene clusters differed significantly. The software
   package “ggalluvial” R was used to construct a Sankey chart to better
   display the correspondence between CRG scores, CRG clusters, and gene
   clusters ([182]Fig. 7J).

Fig. 7.

   [183]Fig. 7
   [184]Open in a new tab

   Generation of the CRGs signature and immune microenvironment. (A) 92
   DEGs between CRG clusters A and B. (B) Consensus clustering matrix of
   92 DEGs for k = 2. (C) The consensus CDF curve based on various k
   (k = 2–9). (D) Relative increase in cluster stability by delta area
   fraction. (E) The heatmap showed the distinct expression of CRGs in
   each gene cluster. (F) The boxplots displayed the expression levels of
   the three hub CRGs in their respective clusters. (G) Different immune
   cells in the two CRG clusters. The CRG score was calculated between the
   A and B groups in distinct CRG clusters (H) and gene clusters (I). (J)
   A Sankey diagram illustrated the correlations among the CRG clusters,
   gene clusters, and CRG scores. *p < 0.05, **p < 0.01, ***p < 0.001.
   DEGs, differential expression genes; CRG, cuproptosis-related gene;
   CDF, cumulative distribution function.

3.7. Identification of drug candidates and ceRNA networks based on marker
genes

   To further explore drug therapy options for ALD, we analyzed the
   interactions between key genes and drugs by DGIdb. Cytoscape analysis
   revealed the interaction between genetic markers and drugs ([185]Fig.
   8A). A ceRNA network was constructed with the three essential genes
   using the TargetScan, miRanda, and miRDB databases, revealing 170
   miRNAs and 138 lncRNAs ([186]Fig. 8B).

Fig. 8.

   [187]Fig. 8
   [188]Open in a new tab

   mRNA-drugs and ceRNA network. (A) The purple rectangle nodes symbolize
   the drugs, while the mRNA-drug interaction network is represented by
   orange dots. (A) The ceRNA network, based on marker genes, is depicted
   with blue dots for miRNA and purple dots for lncRNA.

3.8. Expression of CRGs in a mouse ALD model and cell alcohol intervention
model

   H&E and Oil Red O staining suggested hepatic steatosis in the ALD group
   ([189]Fig. 9A). AST and ALT levels were significantly higher in the ALD
   group than in the control group ([190]Fig. 9B). The above results
   indicate that we successfully established the ALD model. qRT‒PCR
   measurement of mRNA levels indicated that the expression levels of the
   three key genes were reduced in the ALD group compared with those in
   the control group ([191]Fig. 9C). Meanwhile, the expression of the
   protein amounts of the three genes in ADL and normal liver tissues was
   significantly different through Western blot ([192]Fig. 9D–E). Nile red
   staining demonstrated substantial lipid deposition in the cell of the
   Ethanol group, characterized by the formation of much more fat droplets
   ([193]Fig. 9F). Taken together, these results collectively indicate the
   successful establishment of the Cells alcohol intervention model. The
   mRNA levels of three CRG hub genes were also assessed using qRT-PCR.
   The results indicated that the expression of DPYD, DBT, and DPYD was
   significantly down-regulated in the ethanol group compared with the NS
   group ([194]Fig. 9G). Immunoblot analysis also confirmed a similar
   decreased expression in the ethanol group ([195]Fig. 9H). These results
   confirm that these three signature CRGs may play an important
   regulatory role in ALD development.

Fig. 9.

   [196]Fig. 9
   [197]Open in a new tab

   Expression of CRGs in a mouse ALD model and cell alcohol intervention
   model. The NIAAA mouse ALD model was generated in C57BL/6J mice.
   Pair-fed mice were used as controls. Serum and liver tissues were
   collected on day 16 for further analysis. (A) HE and Oil Red O
   staining. (B) Serum ALT and AST levels. (C) The relative mRNA
   expression of the three hub genes was verified by qRT‒PCR. (D) The
   protein expression of the three hub genes was confirmed by Western
   blotting. (E) The relative protein expression of the three hub genes.
   (F) Nile red staining. (G) The mRNA levels of the three hub genes in
   cells with/without ethanol treatment. (H) The protein levels of the
   three hub genes in cells with/without ethanol treatment. N = 6–8,
   *p < 0.05, **p < 0.01, ***p < 0.001. CRGs, cuproptosis-related genes;
   ALD, alcohol-related liver disease; NIAAA, National Institute on
   Alcohol Abuse and Alcoholism; HE, Hematein & Eosin; ALT, alanine
   aminotransferase; AST, aspartate aminotransferase; qRT‒PCR,
   quantitative reverse transcription polymerase chain reaction.

4. Discussion

   Alcohol consumption, particularly when prolonged and intense, can cause
   alcoholic fatty liver disease, cirrhosis, and even liver cancer. ALD is
   a major burden worldwide [[198][38], [199][39], [200][40]]. Although
   many studies have been conducted on ALD, many related mechanisms are
   not completely clear, and treatment methods are limited. Exploring
   effective diagnostic biomarkers for ADL and related therapeutic drugs
   is of substantial clinical value. Cuprotosis is a newly discovered
   mechanism of copper-dependent cell death that plays an important role
   in many diseases [[201]21,[202][41], [203][42], [204][43], [205][44]].
   Menkes disease and Wilson disease are two typical genetic diseases
   caused by mutations in the copper transport ATPase gene
   [[206]45,[207]46]. Imbalances in the distribution and homeostasis of
   copper within cells have been linked to neurodegenerative diseases,
   mitochondrial myopathy, and some types of cancer [[208]47]. In
   addition, abnormal copper metabolism may also affect the cholesterol
   biosynthesis pathway, as observed in fish models, where copper exposure
   leads to significant downregulation of genes in the cholesterol
   biosynthesis pathway [[209]48]. However, the specific role of
   cuproptosis in the pathogenesis and regulation of ALD remains unclear.
   Here, we investigated the possible role of CRGs in ALD, identifying
   potential key genes and exploring possible target drugs.

   We downloaded ALD and control liver data from the GEO database for
   statistical analysis to identify differentially expressed CRGs and
   obtained 18 DEGs associated with cuproptosis. These results suggested
   that CRGs may influence ALD progression. Our correlation analysis
   showed that the identified DE-CRGs were closely correlated with each
   other; however, some had no obvious correlation at the protein level.
   Thus, there was heterogeneity in the interaction of CRGs at the gene
   and protein levels.

   The important role of CRGs in LA metabolism, 2-oxocarboxylic acid
   metabolism, and ROS production was revealed by GO and KEGG enrichment
   analyses. LA is an important cofactor of mitochondrial dehydrogenase
   [[210]49], which is necessary for cell growth, metabolic fuel
   production, and antioxidant defenses [[211]50]. LA binds well to copper
   and is also an endogenous membrane permeability metabolite that helps
   improve copper overload in human diseases [[212]51,[213]52]. LA reduces
   the toxic effects of copper elevation by altering the cellular redox
   environment [[214]51]. Copper-dependent cell death occurs through the
   direct binding of copper to the fatty component of the tricarboxylic
   acid (TCA) cycle. The accumulation of fatty acylated proteins,
   resulting in the loss of iron-sulfur (Fe-S) tuftin and the production
   of protein-toxic stress, culminates in cell death [[215]41]. GO MF
   analysis showed that the 18 DE-CRGs were significantly correlated with
   iron-sulfur cluster binding. Previous studies have demonstrated that
   Fe-S protein is significantly associated with copper deformities
   [[216]41]. The biological production of cellular iron thionein (Fe/S)
   is the basic and minimal function of mitochondria [[217]53].
   Destabilization and an overall reduction in Fe-S cluster proteins cause
   proteotoxic stress and mitochondrial dysfunction, and the Fe-S cluster
   protein FDX1 and lipoylation are key regulators of cuproptosis
   [[218]41].

   LASSO, RF, and SVM-REE analyses of the 18 DE-CRGs revealed three key
   genes (DPYD, SLC31A1, and DBT) that can effectively predict ALD, with
   an AUC value of 0.704. Two external datasets were used to verify the
   validity of the three-gene model, with AUCs of 0.670 and 0.800. The AUC
   values for the three key genes in both validation datasets exceeded
   0.670. The nomogram model, calibration curves, and DCA showed that this
   model had a good prediction ability and clinical application value.
   Thus, a predictive model combining these three key genes could serve as
   a reliable and robust biomarker for the effective prediction of ALD.

   DPYD encodes dihydropyrimidine dehydrogenase (DPD), a key enzyme
   involved in fluoropyrimidine metabolism [[219]54]. DPD is widely
   distributed in various normal tissues, especially in the liver and
   peripheral blood mononuclear cells [[220]55]. DPD is the primary and
   rate-limiting enzyme in the uracil and thymine catabolic pathways
   [[221]56]. Regulating pyrimidine catabolism protects against the
   occurrence and development of liver fibrosis [[222]57]. SLC31A1 is a
   member of the copper transporter family with an extracellular copper
   ion binding domain and is considered the main transporter of copper
   ions into the cell [[223]11]. SLC31A1 encodes the high-affinity copper
   uptake protein CTR1, which is essential for mammalian development and
   copper homeostasis [[224]58]. Copper overload and deposition are likely
   related to SLC31A1 [[225]59]. SLC31A1 is abnormally expressed in
   various cancers and is associated with cancer progression [[226]60]. As
   a key gene associated with copper metabolism disorder, SLC31A1 is of
   great value in the prognostic prediction of pancreatic cancer and
   colorectal cancer [[227]61,[228]62]. The GSEA results indicated that
   SLC31A1 was involved in extracellular matrix-receptor interactions and
   O-type glycan biosynthesis. GSVA further showed that high SLC31A1
   expression participated in extracellular matrix-receptor interaction
   and O-type glycan biosynthesis, and low SLC31A1 expression participated
   in retinol metabolism and steroid hormone biosynthesis. DBT is a
   subunit of branched-chain alpha-ketoate dehydrogenase and is essential
   for the second step in branched-chain amino acid catabolism
   [[229]63,[230]64]. DBT instability can lead to an accumulation of
   deleterious derivatives [[231]65]. This imbalance in branched-chain
   amino acid metabolism has been linked to the emergence of ALD
   [[232]66]. DBT can affect MAPK signaling [[233]63], and regulating MAPK
   pathways can improve liver disease in mice with ALD [[234]67].
   Long-term drinking can cause lipid metabolism disorders and cause liver
   lesions [[235]66]. Morrell et al. [[236]68] proposed that lipid
   synthesis was associated with fatty liver disease and copper
   deficiency. The lipoacyl-binding domain of DBT interacts with annexin
   A2, activating Hippo signaling, inhibiting tumor progression, and
   correcting lipid metabolism disorders [[237]69]. The GSEA results
   indicated that SLC31A1 was involved in amino acid metabolism and
   extracellular matrix-receptor interactions. ECM proteins affect ROS
   production [[238]70]. ROS plays an important role in the clinical and
   pathological profile of ALD by affecting intracellular signaling
   pathways and disrupting transcriptional control [[239]71].

   Cuprotosis has been shown to regulate inflammation [[240]21], and our
   results indicate that the three CRGs were significantly associated with
   immune inflammatory cells. Immune cells and inflammation play an
   important role in ALD; various inflammatory cells play a promoting
   role, especially interleukin-8 and neutrophils [[241]72]. This study
   demonstrated a significant distinction between ALD patients and the
   control group in terms of B cells, CD8 (+) T cells, macrophages, NK
   cells, T helper cells, Th1 cells, APC coinhibition, CCR, cytolytic
   activity, accessory inflammatory response, and type II IFN responses.
   Oxidative stress and inflammation play an important role in ALD
   [[242]73]. This study showed that DPYD was significantly associated
   with the type II IFN response, Th1 cells, and NK cells, DBT was
   significantly associated with MHC class 1, CCR, and aDCs, and SLC31A1
   was significantly associated with B cells, APC coinhibition, and CCR.
   SLC31A1 and DBT were significantly associated with immune cells and
   immune function [[243]74]. Previous studies have shown that SLC31A1 is
   significantly associated with immune cells [[244]60,[245]75]. DPYD is
   associated with immune cells and plays a role in the immune
   microenvironment [[246]76]. Activated B-cell pathways and macrophages
   play an important role in ALD [[247][77], [248][78], [249][79]].
   Enhanced CD8 (+) T cells mediate alcoholic liver injury [[250]80]. MHC
   class 1 and NK cells induce hepatocyte apoptosis in ALD
   [[251]81,[252]82]. It is of substantial clinical significance to
   explore therapeutic methods for ALD. We conducted gene-targeting drug
   analysis based on the three key genes identified. Several drugs
   targeting the DPYD and SLC31A1 genes were antitumor chemotherapy drugs
   acting on cell biosynthesis. One of these drugs is leucovorin, a
   derivative of folate, which is a precursor for 5,10-methylene
   tetrahydrofolate [[253][83], [254][84], [255][85], [256][86]].
   Leucovorin is the coenzyme of a carbon group transferase and is
   involved in many important reactions, such as purine and pyrimidine
   synthesis [[257]83]. Folic acid is essential for the normal functioning
   of organisms. In cells, folic acid is a donor and acceptor of carbon
   metabolism and is involved in methylation, an important biological
   process [[258]87]. Folic acid is also involved in the synthesis of
   nucleic acids, amino acids, and pantothenic acid [[259]88]. Folic acid
   has been shown to reduce alcohol-induced liver damage, and folic acid
   supplementation may be useful for the prevention and treatment of ALD
   [[260]89]. Therefore, leucovorin has the potential to be an effective
   preventive measure for ALD. Since lncRNAs, miRNAs, and mRNAs interact
   and affect cellular biosynthesis [[261][90], [262][91], [263][92]], we
   constructed an mRNA-miRNA-lncRNA regulatory network for ALD, revealing
   that lncRNAs could regulate the three key genes (DPYD, DBT, and
   SLC31A1). Therefore, gene-targeted drug analysis provides a new way to
   search for possible drugs to prevent and treat ALD further, and ceRNA
   network analysis provides a new avenue for further exploring ALD
   pathogenesis. These findings require further validation in cell and
   animal studies.

   Based on these findings, therapeutic strategies, such as small
   interfering RNA (siRNA) or gene editing techniques, are designed and
   implemented to target genes involved in copper death, with a view to
   developing new therapies.

   Our study has some limitations. First, we performed genetic analysis on
   data downloaded from the GEO database, and the datasets may have
   certain biases. Second, the total number of cases was relatively small.
   Furthermore, we have not yet performed cellular or animal validation of
   the gene-targeting drugs we discovered.

5. Conclusions

   We identified three important genes, and by combining these three
   genes, we can accurately diagnose patients with ALD. Then, we explored
   the relationship between these genes and invasive immune cells and
   analyzed the significant heterogeneity in immune responses between ALD
   patients and control liver samples. Our research unveils the role of
   copper in ALD, providing a new theoretical foundation for the potential
   pathogenesis of ALD and therapeutic options.

Ethics approval and consent to participate

   All animal experiments were approved by the Animal Care and Use
   Committee of the First Affiliated Hosputal of Guangxi Medical
   University (NO.2023-S606-01).

Consent for publication

   Not applicable.

Data availability

   The datasets in this study were enrolled from the GEO database
   ([264]https://www.ncbi.nlm.nih.gov/geo/), with the following data
   accessions enrolled: [265]GSE28619, [266]GSE103580, [267]GSE142530 and
   [268]GSE155907. This data can be found here:
   [269]https://www.jianguoyun.com/p/DT-bA08QjrmsDBjJ-bIFIAA.

Funding

   This work was supported in part by the National Key Research and
   Development Program (2022YFE0131600); Guangxi Science and Technology
   Base and Talent Project (GuikeAA21220002); The 111 Project (D17011);
   Advanced Innovation Teams and Xinghu Scholars Program of Guangxi
   Medical University.

CRediT authorship contribution statement

   Jiangfa Li: Writing – review & editing, Writing – original draft,
   Validation, Methodology, Formal analysis, Data curation. Yong Wang:
   Writing – original draft, Methodology, Formal analysis, Data curation.
   Zhan Wu: Methodology, Investigation. Mingbei Zhong: Methodology,
   Investigation. Gangping Feng: Methodology, Investigation, Formal
   analysis. Zhipeng Liu: Validation, Methodology. Yonglian Zeng:
   Supervision, Software, Data curation. Zaiwa Wei: Methodology, Formal
   analysis. Sebastian Mueller: Methodology, Investigation, Data curation.
   Songqing He: Visualization, Validation, Methodology, Investigation,
   Funding acquisition, Conceptualization. Guoqing Ouyang: Supervision,
   Software, Resources, Methodology, Investigation, Formal analysis, Data
   curation. Guandou Yuan: Writing – review & editing, Visualization,
   Supervision, Methodology, Investigation, Funding acquisition, Formal
   analysis, Data curation, Conceptualization.

Declaration of competing interest

   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.

Acknowledgements