Abstract

Aim

   The term “Cuproptosis” was coined to describe a novel type of cell
   death triggered by intracellular copper buildup that is fundamentally
   distinct from other recognized types such as autophagy, ferroptosis,
   and pyroptosis in recent days. As the underlying mechanism was newly
   identified, its potential connection to pancreatic adenocarcinoma
   (PAAD) is still an open issue.

Methods

   A set of machine learning algorithms was used to develop a
   Cuproptosis-related gene index (CRGI). Its immunological
   characteristics were studied by exploring its implications on the
   expression of the immunological checkpoints, prospective immunotherapy
   responses, etc. Moreover, the sensitivity to chemotherapeutic drugs was
   predicted. Unsupervised consensus clustering was performed to more
   precisely identify different CRGI-based molecular subtypes and
   investigate the immunotherapy and chemotherapy efficacy. The expression
   of DLAT, LIPT1 and LIAS were also investigated, through real-time
   quantitative polymerase chain reaction (RT-qPCR), western blot, and
   immunofluorescence staining (IFS).

Results

   A novel CRGI was identified and validated. Additionally, correlation
   analysis revealed major changes in tumor immunology across the high-
   and low-CRGI groups. Through an in-depth study of each medication, it
   was determined that the predictive chemotherapeutic efficacy of 32
   regularly used anticancer drugs differed between high- and low-CRGI
   groups. The results of the molecular subtyping provided more support
   for such theories. Expressional assays performed at transcriptomic and
   proteomic levels suggested that the aforementioned Cuproptosis-related
   genes might serve as reliable diagnostic biomarkers in PAAD.

Significance

   This is, to the best of our knowledge, the first study to examine
   prognostic prediction in PAAD from the standpoint of Cuproptosis. These
   findings may benefit future immunotherapy and chemotherapeutic
   therapies.

   Keywords: cuproptosis, machine learning, pancreatic cancer, tumor
   microenvironment, immunotherapy, chemotherapy, gene signature

Introduction

   Copper is an indispensable element for human survival. However, its
   redox activity can be damaging to the cell that has evolved highly
   coordinated processes to chelate copper ions and transport them
   throughout the cell. Owing to its key involvement in pathways needed
   for normal cell development and metabolism, the copper level is
   typically dysregulated in malignancies ([29]1). Hence, Cuproptosis as a
   unique cell death mechanism in which intracellular copper concentration
   plays a crucial role has attracted considerable interest in the
   scientific community ([30]2).

   The exploration of a potential connection between the copper
   equilibrium and the healthiness of the pancreas has a long history ever
   since the last century. As early as 1989, Dubick et al. found that
   nutritional copper deficiency raised a morphological change in the
   pancreas in female rats, and increased its susceptibility to oxidative
   damage ([31]3). In 1997, Fields et al. reported that copper deficiency
   could lead to impaired functions and pancreatic atrophy in both male
   and female rats ([32]4).

   In the recent decade, increasing evidence suggested that abnormal
   buildup of copper stress might be linked to a lot of cancer types such
   as prostatic cancer ([33]5), among which, pancreatic cancer was
   included. Clinically, according to the observation of Lener et. al.,
   the concentration of copper ions was significantly elevated in patients
   with PAAD ([34]6, [35]7).

   Inspired by these findings, scientists started to develop new
   treatments attempting to regulate copper hemostasis. Existing studies
   suggested that copper-ionophores and copper-chelators might act as
   anticancer agents, although the lack of selectivity remained one of the
   most challenging obstacles in reality. Lately, breakthroughs in which
   the conjugation of targeting units with copper ionophores was proven to
   be effective occurred. Additionally, alternative options were brought
   forth by the exploitation of proionophores and the implementation of a
   nano-drug delivery system. One typical example in this field is the
   project led by Gaál et al. ([36]8) who applied a thermosensitive
   liposomal formulation laden with copper and neocuproine to the C26
   cancer cells in mice and detected both in-vitro and in-vivo toxicity.

   Given that various types of cell death modalities (e.g. autophagy
   ferroptosis, etc.) were proven to be intimately associated with the
   eradication of tumors ([37]9–[38]11), we were inquisitive about the
   relationship between Cuproptosis and pancreatic adenocarcinoma (PAAD)
   for its exceptionally poor 5-year overall survival (OS)
   ([39]12–[40]20), and a striking fact that one of the
   Cuproptosis-related genes, CDKN2A, was a fully-investigated and
   well-known biomarker in PAAD at the same time.

   Taking it altogether, in the present study, we optimized a
   Cuproptosis-gene index (CRGI) with important implications for
   prognosis, tumor immunology, molecular subtypes, and the efficacy of
   immunotherapy and chemotherapy through 10 mainstream algorithms in
   machine learning. Supplementarily, 3 essential CRGI genes (i.e., DLAT,
   LIPT1, and LIAS) were found with a promising potential to serve as
   diagnostic biomarkers through computational method analyzing open data
   combined with in-vitro validation.

   [41]Figure 1 demonstrates the workflow of the present study briefly.

Figure 1.

   [42]Figure 1
   [43]Open in a new tab

   Graphical abstract of the present study.

Materials and methods

Collection and integration of the transcriptome data and matched clinical
information

   In the present study, we retrospectively curated 10 Cuproptosis-related
   genes from the work of Tsvetkov et al. ([44]2). The transcriptomic data
   and matching clinical information on pancreatic adenocarcinoma (PAAD)
   from publicly accessible sources, including the Cancer Genome Atlas
   (TCGA, [45]https://www.cancer.gov/tcga, N = 177) and the Australian
   dataset of International Cancer Genome Consortium (ICGC,
   [46]https://www.icgc-argo.org, N = 269) ([47]21). All the data involved
   in the present study were processed by R Foundation (v 4.0.3) and
   corresponding R packages. Notably, if it wasn’t specifically mentioned,
   P-value<0.05 is considered statistically significant and might be
   annotated as * within the figures. Moreover, **, ***, and **** might
   appear within the figures to indicate the P-value thresholds 0.01,
   0.001, and 0.0001, respectively. Besides, we took the academic writing
   style of Xie et al. ([48]22) as a reference to construct the present
   manuscript.

Machine learning in the development of cuproptosis-related gene index

   10 mainstream machine learning algorithms were used in the optimization
   of CRGI, including least absolute shrinkage and selection operator
   (LASSO), decision tree, Gaussian mixture model (GMM), gradient boosted
   decision trees (GBDT), K-nearest neighbors (K-NN), light gradient
   boosting machine (LGBM), logistic regression, random forest, support
   vector machine (SVM), and extreme gradient boosting (XGboost)
   ([49]23–[50]31). Their performances were assessed by the time-dependent
   receiver operative characteristic (ROC) curves in which the area under
   the curve (AUC) represented the predictive power. The greater the AUC
   value indicated the better accuracy and robustness of the model. The
   ROC curve was created by the R package “timeROC”.

Decision curve analysis

   Usually, prognostic models and diagnostic tests are mathematically
   evaluated with measures of accuracy that do not consider clinical
   outcomes. To help improve such shortcomings, DCA was developed
   ([51]32). It is often used to compare the efficacy of different
   predictive models and diagnostic tests to maximize the clinical
   benefits when false positives and false negatives are known to be
   unavoidable. In the present study, we performed this analysis by using
   the R package “ggDCA”.

Construction of a conventional nomogram and corresponding calibration curve

   The CRGI of our model that was optimized through the machine learning
   method as aforementioned was integrated as a prognostic indicator with
   other clinical factors to estimate the overall survival (OS)
   probability, through univariate and multivariate Cox regression, and a
   traditional nomogram with calibration curve was constructed from these
   results. The visualization was achieved by using the R packages
   “forestplot”, and “rms” ([52]33, [53]34).

Construction of an online OS calculator

   By utilizing the analytic results acquired from the last step, we built
   an easy-to-use web-based OS calculator by the R package “DynNom”
   ([54]35).

   The calculator is available at
   [55]https://debmed.shinyapps.io/CRGIProgCal/.

Gene set enrichment analysis

   The GSEA software (v 4.0.3,
   [56]http://software.broadinstitute.org/gsea/index.jsp) combined with
   the gseKEGG and gseGO functions of the “clusterProfiler” package was
   used to investigate the underlying mechanism of the high- and low-CRGI
   groups, and further identified the KEGG and GO pathways that were
   significantly enriched through Gene Ontology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) databases ([57]36–[58]41).

Estimation of the tumor microenvironment condition

   The tumor microenvironment condition was assessed quantitatively by
   calculating the levels of stromal and immune cell infiltration using
   the expression profiles obtained from the TCGA dataset. This was done
   by the R package “ESTIMATE” in which the stromal score, immune score,
   and ESTIMATE score were calculated ([59]42). The Wilcoxon t-test was
   performed for the calculation of each score to compare them in high-
   and low-CRGI groups.

Screening of immune cell infiltration

   The gene expression profiles were processed by integrating 7 mainstream
   immunoinformatic algorithms, including TIMER, CIBERSORT, CIBERSORT−ABS,
   QUANTISEQ, MCPCOUNTER, XCELL, and EPIC, and the immune cell
   infiltration matrix was obtained ([60]43–[61]45). The R package
   “ggplot2” was used to visualize the distribution of infiltration of
   diverse immune cell types as a heatmap.

Cancer-immunity cycle and 19 known biological processes

   In July 2013, Chen and Mellman ([62]46) systemically described a series
   of self-sustaining stepwise events, termed the cancer-immunity cycle in
   which the anticancer immune responses eliminated the cancer cells
   efficiently. The cancer-immunity cycle consisted of 7 steps: “release
   of cancer antigens”, “cancer antigen presentation”, “priming and
   activation”, “trafficking of T cells to tumors”, “infiltration of T
   cells into tumors”, “recognition of cancer cells by T cells”, and
   “killing of cancer cells”.

   As for the 19 known biological processes including tumor inflammation
   signature, cellular response to hypoxia, tumor proliferation signature,
   EMT markers, ECM-related genes, angiogenesis, apoptosis, DNA repair,
   etc., genes involved in this analysis were retrieved from the works of
   Wei et al. and Mariathasan et al. ([63]47, [64]48).

Prediction of the potential response to immune checkpoint blockade

   The concept of immunotherapy for tumors was proposed at the end of the
   19th century and refers to a treatment method that uses the body’s
   immune system to destroy cancer cells. The therapies that use immune
   checkpoint blockade have revolutionized the treatment of human cancer.

   Herein, we firstly used the Tumor Immune Dysfunction and Exclusion
   (TIDE) algorithm which was developed as a computational method to model
   the primary mechanisms of tumor immune escape to predict the
   responsiveness of a single sample or a subtype based on expression
   profiling data ([65]49). Since the original publication of the TIDE
   algorithm, researchers have widely applied it to their studies. A
   typical example could be seen in the work of Tang et al. in 2021 in the
   Journal of Translational Medicine ([66]50). More similar studies could
   also be found elsewhere. For instance, the work of Chen et al.
   regarding a Necroptosis-related lncRNA signature in breast cancer also
   directly used the TIDE algorithm for predictive purposes ([67]51).

   Nevertheless, considering that the TIDE algorithm was merely
   experimentally verified in melanomas and non-small cell lung cancer, to
   further confirm the reliability of the TIDE prediction, the SubMap
   algorithm was implemented ([68]52). It calculated the similarity of the
   expression profile of PAAD patients in the high- and low-CRGI groups to
   the urothelial bladder carcinoma (BLCA) patients recorded in the
   IMvigor210 dataset who responded or did not respond to the PD-1 and
   CTLA4 therapies, which in turn indirectly predicted immunotherapy
   efficacy ([69]53). Such a solution was inspired by some previously
   published articles like the work of Yu et al. on lung adenocarcinoma
   ([70]54).

Unsupervised clustering through the K-means algorithm

   The consistency analysis was performed using the R package
   “ConsensusClusterPlus (v 1.54.0)”, the maximal number of clusters was
   6, and 80% of the samples were extracted 100 times through a
   re-sampling approach ([71]55). The package generated the consensus
   matrix, empirical cumulative distribution function (CDF), and delta
   area plots for each selected K value.

   Moreover, as a complementary confirmation, a principal component
   analysis (PCA) was conducted to elucidate if the samples were
   well-separated with the batch effect fully removed.

Survival analysis

   The Kaplan Meiler curve was applied to compare the survival difference
   between different groups. The P-value and hazard ratio (HR) with a 95%
   confidence interval (CI) were generated by log-rank test and univariate
   Cox proportional hazards regression. Both were done by the R package
   “survival” ([72]56, [73]57).

Evaluation of the response to chemotherapy

   32 commonly used anticancer drugs were involved in the present study.
   Their half-maximal inhibitory concentration (IC50) values were
   predicted from the expression matrix by the pRRopheticPredict function
   of the R package “pRRophetic” ([74]58).

Cell lines and cell culture

   There were 5 cell lines (i.e., normal human pancreatic ductal
   epithelial cell line: HPDE6-C7, and pancreatic cancer cell lines:
   CFPAC-1, PANC-1, SW1990, and AsPC-1) involved. HPDE6-C7 was cultured in
   a mixed solution of high-glucose DMEM and 10% FBS, CFPAC-1 was cultured
   in a mixed solution of DMEM/IMDM/1640 and 10% FBS, PANC-1 was cultured
   in a mixed solution of DMEM/1640 and 10% FBS, SW1990 was cultured in a
   mixed solution of DMEM/L-15/1640 and 10%FBS, and AsPC-1 was cultured in
   a mixed solution of DMEM/1640 and 10%FBS. All the aforementioned cell
   lines were incubated at 37°C and in a 5% CO2 incubator.

Real-time polymerase chain reaction

   The total RNA of cells was extracted through the one-step method (i.e.,
   Trizol), then 2uL of which was used for reverse transcription. The
   quality of the cDNA was tested before further steps. The RT-qPCR was
   conducted under the following primer sequences design ([75]
   Supplementary Material T1 ) and using the StepOne Software instrument
   (ABI, USA). The reaction conditions were 95°C for 5 min, 95°C for 10 s,
   58°C for 20 s, and 72°C for 20 s. in total, 40 cycles were run. After
   the end of the reaction, the software automatically analyzed the
   fluorescence signal and converted it to the Ct value.

Western blot

   All the cell lines were washed by PBS 3 times and lysed by lyase,
   assisted with an ultrasonic cell crusher to ensure we could obtain the
   targeted proteins fully. The extracted proteins were quantified and
   loaded for SDS-PAGE gel for electrophoresis, transferred to PVDF
   membranes, and blocked by 10% milk. Later, the membranes were incubated
   with primary antibodies and secondary antibodies to form
   immunocomplexes which were visualized by enhanced chemiluminescence and
   followed by directly photographing and quantitative analysis.

Immunofluorescence staining

   HPDE6-C7 and AsPC-1 cell lines were used in the IFS validation and
   observed under 100- and 400-time magnification for the protein staining
   of DLAT, LIPT1, and LIAS, respectively. Information regarding the
   antibodies used in the present study is available in the
   [76]Supplementary Material T2 .

Results

CRGI was optimized from 10 mainstream algorithms

   10 Cuproptosis-related genes were curated from the work of Tsvetkov
   et al. ([77]2). Combined with 6 well-recognized biomarkers (i.e., KRAS,
   TP53, SMAD4, BRCA1, BRCA2, and CDKN2A) in pancreatic adenocarcinoma
   (PAAD), they were subjected to mainstream machine learning procedures
   to develop a Cuproptosis-related gene index (CRGI) ([78]59).

   In the TCGA dataset, among 10 mainstream machine learning algorithms,
   we optimized the best model through LASSO penalized Cox regression that
   had a leading AUC value in 1, 2, 3, and 4-year overall survival (OS)
   predictive performance, up to 0.736, 0.703, 0.708, 0.812, respectively
   ([79] Figure 2C ). The formula for the CRGI calculation is:

Figure 2.

   [80]Figure 2
   [81]Open in a new tab

   Predictive performance comparison of the 10 mainstream machine learning
   algorithms in the TCGA dataset. (A) Distribution of the CRGI and
   survival status of individual PAAD patients. (B) Survival analysis of
   the high- and low-CRGI groups. (C) The time-dependent ROC curve with
   the AUC value of 1-, 2-, 3-, 4-, and 5-year OS prediction of the best
   model optimized by LASSO penalized Cox regression. (D–L) The predictive
   accuracy of other machine learning algorithms (i.e., Decision Tree,
   GMM, GBDT, K-NN, LGBM, Logistic Regression, Random Forest, SVM, and
   XGboost, respectively).
   [MATH: <mtable
   columnalign="left"><mtr><mtd><mi>C</mi><mi>R</mi><mi>G</mi><mi>I</mi><m
   text> </mtext><mo>=</mo><mtext> </mtext><mn>0.5316</mn><mo>*</mo><mi>K<
   /mi><mi>R</mi><mi>A</mi><mi>S</mi><mtext> </mtext><mo>+</mo><mtext> </m
   text><mn>0.014</mn><mo>*</mo><mi>T</mi><mi>P</mi><mn>53</mn><mtext> </m
   text><mo>–</mo><mtext> </mtext><mn>0.0407</mn><mo>*</mo><mi>C</mi><mi>D
   </mi><mi>K</mi><mi>N</mi><mn>2</mn><mi>A</mi><mtext> </mtext><mo>–</mo>
   <mtext> </mtext><mn>0.0999</mn><mo>*</mo><mi>S</mi><mi>M</mi><mi>A</mi>
   <mi>D</mi><mn>4</mn></mtd></mtr><mtr><mtd><mtext> </mtext><mo>+</mo><mt
   ext> </mtext><mn>0.3768</mn><mo>*</mo><mi>B</mi><mi>R</mi><mi>C</mi><mi
   >A</mi><mn>1</mn><mtext> </mtext><mo>+</mo><mn>0.0866</mn><mo>*</mo><mi
   >B</mi><mi>R</mi><mi>C</mi><mi>A</mi><mn>2</mn><mtext> </mtext><mo>–</m
   o><mtext> </mtext><mn>0.1292</mn><mo>*</mo><mi>L</mi><mi>I</mi><mi>A</m
   i><mi>S</mi><mtext> </mtext><mo>–</mo><mtext> </mtext><mn>0.587</mn><mo
   >*</mo><mi>L</mi><mi>I</mi><mi>P</mi><mi>T</mi><mn>1</mn><mtext> </mtex
   t><mo>–</mo><mtext> </mtext><mn>0.3158</mn><mo>*</mo><mi>D</mi><mi>L</m
   i><mi>D</mi><mtext> </mtext><mo>+</mo><mtext> </mtext><mn>0.4833</mn><m
   o>*</mo><mi>D</mi><mi>L</mi><mi>A</mi><mi>T</mi><mtext> </mtext><mo>–</
   mo><mtext> </mtext><mn>0.3627</mn><mo>*</mo><mi>P</mi><mi>D</mi><mi>H</
   mi><mi>A</mi><mn>1</mn><mtext> </mtext></mtd></mtr><mtr><mtd><mo>–</mo>
   <mn>0.3253</mn><mo>*</mo><mi>M</mi><mi>T</mi><mi>F</mi><mn>1</mn><mtext
   > </mtext><mo>–</mo><mtext> </mtext><mn>0.1286</mn><mo>*</mo><mi>G</mi>
   <mi>L</mi><mi>S</mi></mtd></mtr></mtable> :MATH]

   Following the calculation of the CRGI, patients were separated into the
   high- and low-CRGI groups by the median value of all CRGI. It was
   observed that the number of patients who deceased significantly climbed
   up with the increase in CRGI value ([82] Figure 2A ). The survival
   analysis further revealed the fact that the low-CRGI group possessed a
   significant survival advantage ([83] Figure 2B ). Similar analytic
   results were found when we validated our model in the ICGC dataset
   ([84] Figures 3D, E ).

Figure 3.

   [85]Figure 3
   [86]Open in a new tab

   Comparison of other cell death mechanisms-based prognostic signatures
   in PAAD. (A–C) The ROC curve of autophagy-, ferroptosis-, and
   pyroptosis-based models in the TCGA dataset, respectively. Distribution
   of the CRGI and survival status of individual PAAD patients (D), and
   Survival analysis (E) of the high- and low-CRGI groups in the ICGC
   dataset. (F–I) The ROC curve of our model, the autophagy-,
   ferroptosis-, and pyroptosis-based models in the ICGC dataset,
   respectively. DCA diagrams of our model, the autophagy-, ferroptosis-,
   and pyroptosis-based models in the TCGA dataset (J), and ICGC dataset
   (K). All: all positive; None: all negative. They are the extreme
   conditions that serve as background references.