Abstract Aim The term “Cuproptosis” was coined to describe a novel type of cell death triggered by intracellular copper buildup that is fundamentally distinct from other recognized types such as autophagy, ferroptosis, and pyroptosis in recent days. As the underlying mechanism was newly identified, its potential connection to pancreatic adenocarcinoma (PAAD) is still an open issue. Methods A set of machine learning algorithms was used to develop a Cuproptosis-related gene index (CRGI). Its immunological characteristics were studied by exploring its implications on the expression of the immunological checkpoints, prospective immunotherapy responses, etc. Moreover, the sensitivity to chemotherapeutic drugs was predicted. Unsupervised consensus clustering was performed to more precisely identify different CRGI-based molecular subtypes and investigate the immunotherapy and chemotherapy efficacy. The expression of DLAT, LIPT1 and LIAS were also investigated, through real-time quantitative polymerase chain reaction (RT-qPCR), western blot, and immunofluorescence staining (IFS). Results A novel CRGI was identified and validated. Additionally, correlation analysis revealed major changes in tumor immunology across the high- and low-CRGI groups. Through an in-depth study of each medication, it was determined that the predictive chemotherapeutic efficacy of 32 regularly used anticancer drugs differed between high- and low-CRGI groups. The results of the molecular subtyping provided more support for such theories. Expressional assays performed at transcriptomic and proteomic levels suggested that the aforementioned Cuproptosis-related genes might serve as reliable diagnostic biomarkers in PAAD. Significance This is, to the best of our knowledge, the first study to examine prognostic prediction in PAAD from the standpoint of Cuproptosis. These findings may benefit future immunotherapy and chemotherapeutic therapies. Keywords: cuproptosis, machine learning, pancreatic cancer, tumor microenvironment, immunotherapy, chemotherapy, gene signature Introduction Copper is an indispensable element for human survival. However, its redox activity can be damaging to the cell that has evolved highly coordinated processes to chelate copper ions and transport them throughout the cell. Owing to its key involvement in pathways needed for normal cell development and metabolism, the copper level is typically dysregulated in malignancies ([29]1). Hence, Cuproptosis as a unique cell death mechanism in which intracellular copper concentration plays a crucial role has attracted considerable interest in the scientific community ([30]2). The exploration of a potential connection between the copper equilibrium and the healthiness of the pancreas has a long history ever since the last century. As early as 1989, Dubick et al. found that nutritional copper deficiency raised a morphological change in the pancreas in female rats, and increased its susceptibility to oxidative damage ([31]3). In 1997, Fields et al. reported that copper deficiency could lead to impaired functions and pancreatic atrophy in both male and female rats ([32]4). In the recent decade, increasing evidence suggested that abnormal buildup of copper stress might be linked to a lot of cancer types such as prostatic cancer ([33]5), among which, pancreatic cancer was included. Clinically, according to the observation of Lener et. al., the concentration of copper ions was significantly elevated in patients with PAAD ([34]6, [35]7). Inspired by these findings, scientists started to develop new treatments attempting to regulate copper hemostasis. Existing studies suggested that copper-ionophores and copper-chelators might act as anticancer agents, although the lack of selectivity remained one of the most challenging obstacles in reality. Lately, breakthroughs in which the conjugation of targeting units with copper ionophores was proven to be effective occurred. Additionally, alternative options were brought forth by the exploitation of proionophores and the implementation of a nano-drug delivery system. One typical example in this field is the project led by Gaál et al. ([36]8) who applied a thermosensitive liposomal formulation laden with copper and neocuproine to the C26 cancer cells in mice and detected both in-vitro and in-vivo toxicity. Given that various types of cell death modalities (e.g. autophagy ferroptosis, etc.) were proven to be intimately associated with the eradication of tumors ([37]9–[38]11), we were inquisitive about the relationship between Cuproptosis and pancreatic adenocarcinoma (PAAD) for its exceptionally poor 5-year overall survival (OS) ([39]12–[40]20), and a striking fact that one of the Cuproptosis-related genes, CDKN2A, was a fully-investigated and well-known biomarker in PAAD at the same time. Taking it altogether, in the present study, we optimized a Cuproptosis-gene index (CRGI) with important implications for prognosis, tumor immunology, molecular subtypes, and the efficacy of immunotherapy and chemotherapy through 10 mainstream algorithms in machine learning. Supplementarily, 3 essential CRGI genes (i.e., DLAT, LIPT1, and LIAS) were found with a promising potential to serve as diagnostic biomarkers through computational method analyzing open data combined with in-vitro validation. [41]Figure 1 demonstrates the workflow of the present study briefly. Figure 1. [42]Figure 1 [43]Open in a new tab Graphical abstract of the present study. Materials and methods Collection and integration of the transcriptome data and matched clinical information In the present study, we retrospectively curated 10 Cuproptosis-related genes from the work of Tsvetkov et al. ([44]2). The transcriptomic data and matching clinical information on pancreatic adenocarcinoma (PAAD) from publicly accessible sources, including the Cancer Genome Atlas (TCGA, [45]https://www.cancer.gov/tcga, N = 177) and the Australian dataset of International Cancer Genome Consortium (ICGC, [46]https://www.icgc-argo.org, N = 269) ([47]21). All the data involved in the present study were processed by R Foundation (v 4.0.3) and corresponding R packages. Notably, if it wasn’t specifically mentioned, P-value<0.05 is considered statistically significant and might be annotated as * within the figures. Moreover, **, ***, and **** might appear within the figures to indicate the P-value thresholds 0.01, 0.001, and 0.0001, respectively. Besides, we took the academic writing style of Xie et al. ([48]22) as a reference to construct the present manuscript. Machine learning in the development of cuproptosis-related gene index 10 mainstream machine learning algorithms were used in the optimization of CRGI, including least absolute shrinkage and selection operator (LASSO), decision tree, Gaussian mixture model (GMM), gradient boosted decision trees (GBDT), K-nearest neighbors (K-NN), light gradient boosting machine (LGBM), logistic regression, random forest, support vector machine (SVM), and extreme gradient boosting (XGboost) ([49]23–[50]31). Their performances were assessed by the time-dependent receiver operative characteristic (ROC) curves in which the area under the curve (AUC) represented the predictive power. The greater the AUC value indicated the better accuracy and robustness of the model. The ROC curve was created by the R package “timeROC”. Decision curve analysis Usually, prognostic models and diagnostic tests are mathematically evaluated with measures of accuracy that do not consider clinical outcomes. To help improve such shortcomings, DCA was developed ([51]32). It is often used to compare the efficacy of different predictive models and diagnostic tests to maximize the clinical benefits when false positives and false negatives are known to be unavoidable. In the present study, we performed this analysis by using the R package “ggDCA”. Construction of a conventional nomogram and corresponding calibration curve The CRGI of our model that was optimized through the machine learning method as aforementioned was integrated as a prognostic indicator with other clinical factors to estimate the overall survival (OS) probability, through univariate and multivariate Cox regression, and a traditional nomogram with calibration curve was constructed from these results. The visualization was achieved by using the R packages “forestplot”, and “rms” ([52]33, [53]34). Construction of an online OS calculator By utilizing the analytic results acquired from the last step, we built an easy-to-use web-based OS calculator by the R package “DynNom” ([54]35). The calculator is available at [55]https://debmed.shinyapps.io/CRGIProgCal/. Gene set enrichment analysis The GSEA software (v 4.0.3, [56]http://software.broadinstitute.org/gsea/index.jsp) combined with the gseKEGG and gseGO functions of the “clusterProfiler” package was used to investigate the underlying mechanism of the high- and low-CRGI groups, and further identified the KEGG and GO pathways that were significantly enriched through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases ([57]36–[58]41). Estimation of the tumor microenvironment condition The tumor microenvironment condition was assessed quantitatively by calculating the levels of stromal and immune cell infiltration using the expression profiles obtained from the TCGA dataset. This was done by the R package “ESTIMATE” in which the stromal score, immune score, and ESTIMATE score were calculated ([59]42). The Wilcoxon t-test was performed for the calculation of each score to compare them in high- and low-CRGI groups. Screening of immune cell infiltration The gene expression profiles were processed by integrating 7 mainstream immunoinformatic algorithms, including TIMER, CIBERSORT, CIBERSORT−ABS, QUANTISEQ, MCPCOUNTER, XCELL, and EPIC, and the immune cell infiltration matrix was obtained ([60]43–[61]45). The R package “ggplot2” was used to visualize the distribution of infiltration of diverse immune cell types as a heatmap. Cancer-immunity cycle and 19 known biological processes In July 2013, Chen and Mellman ([62]46) systemically described a series of self-sustaining stepwise events, termed the cancer-immunity cycle in which the anticancer immune responses eliminated the cancer cells efficiently. The cancer-immunity cycle consisted of 7 steps: “release of cancer antigens”, “cancer antigen presentation”, “priming and activation”, “trafficking of T cells to tumors”, “infiltration of T cells into tumors”, “recognition of cancer cells by T cells”, and “killing of cancer cells”. As for the 19 known biological processes including tumor inflammation signature, cellular response to hypoxia, tumor proliferation signature, EMT markers, ECM-related genes, angiogenesis, apoptosis, DNA repair, etc., genes involved in this analysis were retrieved from the works of Wei et al. and Mariathasan et al. ([63]47, [64]48). Prediction of the potential response to immune checkpoint blockade The concept of immunotherapy for tumors was proposed at the end of the 19th century and refers to a treatment method that uses the body’s immune system to destroy cancer cells. The therapies that use immune checkpoint blockade have revolutionized the treatment of human cancer. Herein, we firstly used the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm which was developed as a computational method to model the primary mechanisms of tumor immune escape to predict the responsiveness of a single sample or a subtype based on expression profiling data ([65]49). Since the original publication of the TIDE algorithm, researchers have widely applied it to their studies. A typical example could be seen in the work of Tang et al. in 2021 in the Journal of Translational Medicine ([66]50). More similar studies could also be found elsewhere. For instance, the work of Chen et al. regarding a Necroptosis-related lncRNA signature in breast cancer also directly used the TIDE algorithm for predictive purposes ([67]51). Nevertheless, considering that the TIDE algorithm was merely experimentally verified in melanomas and non-small cell lung cancer, to further confirm the reliability of the TIDE prediction, the SubMap algorithm was implemented ([68]52). It calculated the similarity of the expression profile of PAAD patients in the high- and low-CRGI groups to the urothelial bladder carcinoma (BLCA) patients recorded in the IMvigor210 dataset who responded or did not respond to the PD-1 and CTLA4 therapies, which in turn indirectly predicted immunotherapy efficacy ([69]53). Such a solution was inspired by some previously published articles like the work of Yu et al. on lung adenocarcinoma ([70]54). Unsupervised clustering through the K-means algorithm The consistency analysis was performed using the R package “ConsensusClusterPlus (v 1.54.0)”, the maximal number of clusters was 6, and 80% of the samples were extracted 100 times through a re-sampling approach ([71]55). The package generated the consensus matrix, empirical cumulative distribution function (CDF), and delta area plots for each selected K value. Moreover, as a complementary confirmation, a principal component analysis (PCA) was conducted to elucidate if the samples were well-separated with the batch effect fully removed. Survival analysis The Kaplan Meiler curve was applied to compare the survival difference between different groups. The P-value and hazard ratio (HR) with a 95% confidence interval (CI) were generated by log-rank test and univariate Cox proportional hazards regression. Both were done by the R package “survival” ([72]56, [73]57). Evaluation of the response to chemotherapy 32 commonly used anticancer drugs were involved in the present study. Their half-maximal inhibitory concentration (IC50) values were predicted from the expression matrix by the pRRopheticPredict function of the R package “pRRophetic” ([74]58). Cell lines and cell culture There were 5 cell lines (i.e., normal human pancreatic ductal epithelial cell line: HPDE6-C7, and pancreatic cancer cell lines: CFPAC-1, PANC-1, SW1990, and AsPC-1) involved. HPDE6-C7 was cultured in a mixed solution of high-glucose DMEM and 10% FBS, CFPAC-1 was cultured in a mixed solution of DMEM/IMDM/1640 and 10% FBS, PANC-1 was cultured in a mixed solution of DMEM/1640 and 10% FBS, SW1990 was cultured in a mixed solution of DMEM/L-15/1640 and 10%FBS, and AsPC-1 was cultured in a mixed solution of DMEM/1640 and 10%FBS. All the aforementioned cell lines were incubated at 37°C and in a 5% CO2 incubator. Real-time polymerase chain reaction The total RNA of cells was extracted through the one-step method (i.e., Trizol), then 2uL of which was used for reverse transcription. The quality of the cDNA was tested before further steps. The RT-qPCR was conducted under the following primer sequences design ([75] Supplementary Material T1 ) and using the StepOne Software instrument (ABI, USA). The reaction conditions were 95°C for 5 min, 95°C for 10 s, 58°C for 20 s, and 72°C for 20 s. in total, 40 cycles were run. After the end of the reaction, the software automatically analyzed the fluorescence signal and converted it to the Ct value. Western blot All the cell lines were washed by PBS 3 times and lysed by lyase, assisted with an ultrasonic cell crusher to ensure we could obtain the targeted proteins fully. The extracted proteins were quantified and loaded for SDS-PAGE gel for electrophoresis, transferred to PVDF membranes, and blocked by 10% milk. Later, the membranes were incubated with primary antibodies and secondary antibodies to form immunocomplexes which were visualized by enhanced chemiluminescence and followed by directly photographing and quantitative analysis. Immunofluorescence staining HPDE6-C7 and AsPC-1 cell lines were used in the IFS validation and observed under 100- and 400-time magnification for the protein staining of DLAT, LIPT1, and LIAS, respectively. Information regarding the antibodies used in the present study is available in the [76]Supplementary Material T2 . Results CRGI was optimized from 10 mainstream algorithms 10 Cuproptosis-related genes were curated from the work of Tsvetkov et al. ([77]2). Combined with 6 well-recognized biomarkers (i.e., KRAS, TP53, SMAD4, BRCA1, BRCA2, and CDKN2A) in pancreatic adenocarcinoma (PAAD), they were subjected to mainstream machine learning procedures to develop a Cuproptosis-related gene index (CRGI) ([78]59). In the TCGA dataset, among 10 mainstream machine learning algorithms, we optimized the best model through LASSO penalized Cox regression that had a leading AUC value in 1, 2, 3, and 4-year overall survival (OS) predictive performance, up to 0.736, 0.703, 0.708, 0.812, respectively ([79] Figure 2C ). The formula for the CRGI calculation is: Figure 2. [80]Figure 2 [81]Open in a new tab Predictive performance comparison of the 10 mainstream machine learning algorithms in the TCGA dataset. (A) Distribution of the CRGI and survival status of individual PAAD patients. (B) Survival analysis of the high- and low-CRGI groups. (C) The time-dependent ROC curve with the AUC value of 1-, 2-, 3-, 4-, and 5-year OS prediction of the best model optimized by LASSO penalized Cox regression. (D–L) The predictive accuracy of other machine learning algorithms (i.e., Decision Tree, GMM, GBDT, K-NN, LGBM, Logistic Regression, Random Forest, SVM, and XGboost, respectively). [MATH: CRGI = 0.5316*K< /mi>RAS + 0.014*TP53 0.0407*CD KN2A 0.0999*SMA D4 + 0.3768*BRCA1 +0.0866*BRCA2 0.1292*LIAS 0.587*LIPT1 0.3158*DLD + 0.4833*DLAT 0.3627*PDHA1 0.3253*MTF1 0.1286*G LS :MATH] Following the calculation of the CRGI, patients were separated into the high- and low-CRGI groups by the median value of all CRGI. It was observed that the number of patients who deceased significantly climbed up with the increase in CRGI value ([82] Figure 2A ). The survival analysis further revealed the fact that the low-CRGI group possessed a significant survival advantage ([83] Figure 2B ). Similar analytic results were found when we validated our model in the ICGC dataset ([84] Figures 3D, E ). Figure 3. [85]Figure 3 [86]Open in a new tab Comparison of other cell death mechanisms-based prognostic signatures in PAAD. (A–C) The ROC curve of autophagy-, ferroptosis-, and pyroptosis-based models in the TCGA dataset, respectively. Distribution of the CRGI and survival status of individual PAAD patients (D), and Survival analysis (E) of the high- and low-CRGI groups in the ICGC dataset. (F–I) The ROC curve of our model, the autophagy-, ferroptosis-, and pyroptosis-based models in the ICGC dataset, respectively. DCA diagrams of our model, the autophagy-, ferroptosis-, and pyroptosis-based models in the TCGA dataset (J), and ICGC dataset (K). All: all positive; None: all negative. They are the extreme conditions that serve as background references.