Abstract Background Thyroid cancer (THCA) has become a common malignancy in recent years, with the mortality rate steadily increasing. PANoptosis is a unique kind of programmed cell death (PCD), including pyroptosis, necroptosis, and apoptosis, and is involved in the proliferation and prognosis of numerous cancers. This paper demonstrated the connection between PANoptosis-related genes and THCA based on the analyses of Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, which have not been evaluated yet. Methods We identified PANoptosis-related differentially expressed genes (PRDEGs) by multi-analyzing the TCGA-THCA and GEO datasets. To identify the significant PRDEGs, a prognostic model was constructed using least absolute shrinkage and selection operator regression (LASSO). The predictive values of the significant PRDEGs for THCA outcomes were determined using Cox regression analysis and nomograms. Gene enrichment analyses were performed. Finally, immunohistochemistry was carried out using the human protein atlas. Results A LASSO regression model based on nine PRDEGs was constructed, and the prognostic value of key PRDEGs was explored via risk score. Univariate and multivariate Cox regression were implemented to identify further three significant PRDEGs closely related to distant metastasis, lymph node metastasis, and tumor stage. Then, a nomogram was constructed, which presented high predictive accuracy for 5 years survival of THCA patients. Gene enrichment analyses in THCA were strongly associated with PCD pathways. CASP6 presented significantly differential expression during clinical T stage, N stage, and PFI events (P < 0.05 for all) and demonstrated the highest degree of diagnostic efficacy in PRDEGs (HR: 2.060, 95 % CI: 1.170–3.628, P < 0.05). Immunohistochemistry showed CASP6 was more abundant in THCA tumor tissue. Conclusion A potential prognostic role for PRDEGs in THCA was identified, providing a new direction for treatment. CASP6 may be a potential therapeutic target and a novel prognostic biomarker for THCA. Keywords: Thyroid cancer, PANoptosis-related genes, LASSO-Cox regression, Prognosis, TCGA, GEO 1. Introduction Thyroid cancer (THCA) is a common malignant tumor of the thyroid gland in the world, which accounts for approximately 1 % of entire malignancies [[35]1]. Globally, the THCA morbidity rate has increased in recent decades and is estimated to be the fourth leading cancer type [[36]2]. With the advances in treatment made, including surgical resection, radioactive iodine, thyroid-stimulating hormone (TSH)-suppressive therapy, redifferentiation drugs, targeted therapies, and immunotherapy [[37]3], patients with papillary thyroid carcinomas have an overall 5-year survival of as high as 95 %. Despite this, recurrence and metastasis still occur frequently (20–30 %) [[38]4], which results in increased mortality associated with sizeable social healthcare burdens [[39]5]. About half of distant metastasis patients pass away within five years [[40]6,[41]7]. Hence, it is essential to identify a more accurate prognostic module for THCA to develop and introduce novel treatments. Programmed cell death (PCD) is related to the occurrence and progression of malignancies and includes classical methods such as apoptosis, necroptosis, and pyroptosis. PANoptosis describes the simultaneous occurrence and regulation of apoptosis, necroptosis, and pyroptosis in the pathological processes of some diseases [[42]8]. Plenty of evidence has been unveiled that PANoptosis participates in the development of cancer. A mouse model lacking IRF1 (an upstream regulator of PANoptosis) was found to be hypersusceptible to colorectal tumorigenesis [[43]9]. Despite their resistance to the progression of melanoma and colorectal cancer, mice with adenosine deaminase acting on RNA-1 (ADAR1) deficiency exhibit restored tumorigenesis when the ZBP1 Zalpha2 domain is deleted, suggesting that ADAR1 restrains ZBP1-mediated PANoptosis, thereby facilitating tumorigenesis [[44]10]. The PANscore, a scoring system based on PANoptosis patterns of individual gastric cancer patients, was developed, and the low PANscore group achieved higher immunotherapy response rates and prognoses. More recently, PANoptosis-related gene signatures for gliomas have been proven to be prognostic. However, to date, PANoptosis-related differentially expressed genes (PRDEGs) have not been established as prognostic biomarkers of THCA. The goal of this research was to analyze PANoptosis-related genes in THCA using bioinformatics. The 510 THCA and 58 normal samples from The Cancer Genome Atlas (TCGA) database were utilized to acquire clinical data and gene expression. The GeneCards database was applied to download 14 PANoptosis-related genes. We screened the relevant literature in the PubMed database, resulting in 23 PANoptosis-related genes (PRGs). We then combined and de-duplicated the two resources, resulting in a final set of 28 PRGs, which were included in subsequent analyses. The GEO database was applied to acquire THCA gene expression data and clinical datasets ([45]GSE33630, [46]GSE35570, [47]GSE65144, and [48]GSE76039). We identified 14 TCGA-THCA PRDEGs that were consistent with those in the [49]GSE33630 and [50]GSE35570 datasets. According to the above data resources, a risk scoring system was constructed using the TCGA-THCA dataset and applied to a combined GEO dataset, including [51]GSE33630, [52]GSE35570, [53]GSE65144, and [54]GSE76039, to identify three PRDEGS. A nomogram was built by considering the three PRDEGs and then validated by calibration and decision curve analysis (DCA). In addition, the potential function and mechanism of PRDEGs in TCGA-THCA were explored by protein-protein interaction (PPI) network analysis, functional analysis, gene set variation analysis (GSVA), and gene set enrichment analysis (GSEA). As a result, three PRGs were discovered to be independent predictive factors of observed survival in THCA patients. CASP6 may be a potential therapeutic target and a prognostic biomarker for patients with THCA. 2. Materials & methods 2.1. Data download With the application of the R package "TCGAbiolinks", the expression matrix of the THCA dataset was downloaded from TCGA ([55]https://portal.gdc.cancer.gov/) [[56]11], including 510 samples of THCA (tumor group) and 58 paracancerous samples (normal group), standardized as Fragments Per Kilobase per million; the UCSC Xena database ([57]http://genome.ucsc.edu) was utilized to acquire the relevant clinical data [[58]12]. R package “limma” was utilized to normalize the count sequencing data from TCGA-THCA [[59]13]. R package GEOquery was utilized to acquire clinical information and THCA-related gene expression data from the GEO database in datasets [60]GSE33630 [[61][14], [62][15]], [63]GSE35570 [[64]16], [65]GSE65144 [[66]17], and [67]GSE76039 [[68][18], [69][19]]. Gene annotation was performed by means of the [70]GPL570 Affymetrix Human Genome U133 Plus 2.0 (HG-U133_Plus_2) Array. All the samples included in this study were obtained from Homo sapiens. [71]GSE33630 consisted of gene expression profile microarray data from 60 samples of THCA and 45 samples of normal tissue. [72]GSE33570 contains gene expression profile microarray data of 65 THCA tissue specimens and 51 normal tissue samples. The [73]GSE65144 included 12 THCA samples and 13 normal thyroid tissue samples. Similarly, [74]GSE76039 comprises 37 tumor tissue specimens. [75]GSE33630 [[76][20], [77][21]], [78]GSE35570, [79]GSE65144, and [80]GSE76039 [[81]22] served as validation sets. The GeneCards database [[82]23] ([83]https://www.genecards.org/) provides concise genomic information on all known human genes. "PANoptosis" served as a keyword to search for PRGs (PANoptosis-related genes) in the GeneCards database, from which 14 PRGs were obtained. We also identified 23 PRGs by retrieving all relevant published literature [[84]24,[85]25] from PubMed. The PRGs from the two resources were further combined and deduplicated, resulting in a final set of 28 PRGs, which were used for subsequent analyses. The specific gene names are presented in [86]Supplementary Table S1. 2.2. Differential expression analysis To identify potential mechanisms, related biological features, and signaling pathways of differential genes between the tumor and normal groups in THCA datasets of TCGA, [87]GSE33630, and [88]GSE35570 were standardized using the R package "limma", followed by a differential analysis of gene expression profiles between the two TCGA-THCA groups. The identification of differentially expressed genes (DEGs) was performed in diverse groups, with adjusted P < 0.05 and logFC >0.2 (upregulated genes) or < 0.2 (downregulated genes) as differential expression. R package "ggplot 2" was utilized to make volcano plots for DEG. Upregulated genes and downregulated genes were intersected with PRGs and displayed using Venn Diagrams. PRDEGs for THCA were derived by combining both intersections. Furthermore, we drew boxplots of group comparisons separately for DEGs in datasets including TCGA-THCA, [89]GSE33630, and [90]GSE35570. The DEGs with the same expression trend and statistically significant discrepancy were extracted for subsequent analyses, and their expression in TCGA-THCA dataset was plotted as a heatmap using the "pheatmap" package of R. 2.3. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses Large-scale functional enrichment research is commonly conducted using GO analysis [[91]26], which comprising molecular functions (MF), biological processes (BP), and cellular components (CC). KEGG [[92]27,[93]28] is a comprehensive database that integrates biological pathways, genomics, medicines, and diseases. The "ClusterProfiler" R package was utilized to conduct KEGG and GO pathway enrichment analyses of PRDEGs, with thresholds set as FDR <0.2 and P < 0.05, followed by correcting P-value by means of the Benjamini–Hochberg (BH) method. 2.4. GSEA We assessed gene distribution trends within a predefined gene set to identify genes contributing to phenotypes using GSEA. The enrichment analysis was processed in genes from the tumor and normal groups of TCGA-THCA using the R package "ClusterProfiler" (seed = 2020), repeated 1000 times, Min Size = 10 and Max Size = 500; the P-value was corrected by the BH method. The MSigDB database was applied to download the "c2. cp.v7.2. symbols" gene set, and significantly enriched functions were identified with FDR (q-value) < 0.25 and P < 0.05. 3. GSVA GSVA [[94]29] is a non-parametric, unsupervised approach in which expression matrixes of genes in different samples are converted to examine the transcriptome results related to gene set enrichment to assess whether there are enriched different pathways in various samples. The MSigDB database was applied to download a "h.all.v7.4. symbols.gmt" gene set, and GSVA was conducted on TCGA-THCA, with adjusted P < 0.05 as statistical difference, so as to explore the functional enrichment difference between the two groups. 3.1. PPI network STRING [[95]30] is a database used to search for interactions between known and predictive proteins. In this study, we conducted STRING to construct a PPI network related to candidate PRDEGs, with >0.400 as the minimum required interaction score. Protein complexes with specific biological functions can be identified by looking for densely connected regions in the PPI network. The maximal clique centrality (MCC) algorithm has been broadly applied to performance indicators in bioinformatics. Cytoscape [[96]31] (version 3.9.1) was applied to perform PPI network visualization. GeneMANIA, a website used to predict genes that possess similar functions to the screened PRDEGs, was employed to construct an interaction network of candidate genes. 3.2. Least absolute shrinkage and selection operator (LASSO) regression model To obtain a prognostic model of PRDEGs in THCA, with 10-fold cross-validation, we utilized LASSO [[97]32,[98]33] regression (seed = 2021). In addition, we ran 1000 repeats per period to prevent overfitting. LASSO regression is a type of linear regression that avoids overfitting and enhances the generalization capability of the refined model by imposing a penalty term (λ × absolute values of slopes) on the magnitude of the model coefficients. According to the LASSO regression model, risk factor graphs were drawn to depict the group and survival status of each tumor sample in accordance with the risk score. [MATH: riskscore=i< mrow>Coeffic< /mi>ient(hubgene i)*mRNAExpression(hubgene i) :MATH] 3.3. Cox regression model For the purpose of assessing the prognostic value of PRDEGs in THCA, the expression of screened PRDEGs in the TCGA-THCA dataset was analyzed using multivariate Cox regression with the P-value threshold set to 0.1, and then the Cox regression model was built and presented in a forest plot. According to the results of the multivariate, a nomogram was constructed to individualize the predicted probability of 1-, 3-, and 5-year survivals. A calibration evaluation was performed on the accuracy and discriminative capacity of the nomogram. Calibration curves and nomograms were drawn using R package "rms". The clinical utility of these models was assessed with DCA. The R package "ggDCA" [[99]34] was adopted to conduct DCA and evaluate the predictions of 1-, 3-, and 5-year prognostic nomograms. 3.4. Immunohistochemical analysis Immunohistochemical (IHC) analysis uses the principle of specific antigen-antibody binding to detect and locate target antigens in cells and tissues, mainly with light microscopy. Human Protein Atlas (HPA) [[100]35] database ([101]www.proteinatlas.org/) was utilized to implement IHC analysis for key gene expression screened by the Cox model in THCA and normal thyroid tissues before contrast staining. 3.5. Statistical methods R version 4.1.2 was adopted for statistical analysis and plotting, and continuous variables are described by mean ± standard deviation. Data between two continuous variable groups were compared using the Wilcoxon rank sum test; Student's t-test was utilized for statistical difference between the two groups, and the Kruskal–Wallis test was implemented for multi-group comparison. Categorical variables were examined by means of chi-square or Fisher's exact tests. LASSO regression was carried out using the "glmnet" [[102]36] package, and R package "pROC" [[103]37] was applied to generate a receiver operating characteristic (ROC) curve. Unless otherwise indicated, correlations were analyzed using the Spearman test, with statistical significance setting as two-tailed P-values less than 0.05. 4. Results 4.1. Differential expression analysis Altogether, there were 18,436 DEGs acquired from the TCGA-THCA dataset, 10,798 of which were identified using the following thresholds: average |logFC| > 0.2 and adjusted P < 0.05. Under such threshold, 5863 DEGs were highly expressed (upregulated), and 4935 DEGs were lowly expressed (downregulated) in THCA, as shown in [104]Fig. 1. The final DEGs were visualized using a volcano plot ([105]Fig. 2A). Eight upregulated genes (CASP6, CASP7, FADD, GSDMD, MLKL, PSTPIP2, PYCARD, and RBCK1) and six downregulated genes (AIM2, IFNG, TAB3, TNF, TNFAIP3, and ZBP1) were maintained by taking the intersection of diversely regulated DEGs and PRGs, and a Venn diagram was generated ([106]Fig. 2B and C). The names and expression information for the 14 PRDEGs are listed in [107]Table 1, [108]Table 2. Next, we drew grouped comparison plots ([109]Fig. 2D–F) of the 14 PRDEGs in TCGA-THCA, GSE 33630, and [110]GSE35570 to recognize the expression trends and whether the differences were statistically significant. According to the figure, comparing genes in the [111]GSE33630 and [112]GSE35570 datasets with TCGA, nine genes were consistent with the validation results, including CASP6, CASP7, FADD, GSDMD, MLKL, PSTPIP2, PYCARD, RBCK1, and TAB3. Fig. 1. [113]Fig. 1 [114]Open in a new tab Flowchart of the selection of the related genes. TCGA, the cancer genome atlas. THCA, thyroid cancer. PRGs, PANoptosis-related genes. PRDEGs, PANoptosis-related differentially expressed genes. GO, Gene Ontology. KEGG, Kyoto Encyclopedia of Genes and Genomes. GSEA, Gene Set Enrichment Analysis. GSVA, Gene Set Variation Analysis. PPI, Protein-protein interaction. LASSO, least absolute shrinkage and selection operator. ROC curve, receiver operating characteristic curve. IHC, immunohistochemical analysis. Fig. 2. [115]Fig. 2 [116]Open in a new tab Analyses of differentially expressed genes. (A) Volcano plot of the distribution of all differentially expressed genes, mapping 5863 upregulated (red dots) and 4935 downregulated (blue dots) genes. (B) Venn diagram showing overlap of the upregulated DEGs and the PRGs. (C) Venn diagram showing overlap of the downregulated DEGs and the PRGs. (D–E) Expression differences in the 14 PRDEGs between tumor tissues and normal samples in TCGA-THCA, [117]GSE33630, and [118]GSE35570 datasets. Ns, not significant, P ≥ 0.05; *, P < 0.05; **, P < 0.01; ***, P < 0.001. TCGA, the cancer genome atlas. THCA, thyroid cancer. DEGs, differentially expressed genes. PRGs, PANoptosis-related genes. PRDEGs, PANoptosis-related differentially expressed genes. (For interpretation of the references to color in this figure legend, the reader is