Abstract

Background

   Thyroid cancer (THCA) has become a common malignancy in recent years,
   with the mortality rate steadily increasing. PANoptosis is a unique
   kind of programmed cell death (PCD), including pyroptosis, necroptosis,
   and apoptosis, and is involved in the proliferation and prognosis of
   numerous cancers. This paper demonstrated the connection between
   PANoptosis-related genes and THCA based on the analyses of Gene
   Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases,
   which have not been evaluated yet.

Methods

   We identified PANoptosis-related differentially expressed genes
   (PRDEGs) by multi-analyzing the TCGA-THCA and GEO datasets. To identify
   the significant PRDEGs, a prognostic model was constructed using least
   absolute shrinkage and selection operator regression (LASSO). The
   predictive values of the significant PRDEGs for THCA outcomes were
   determined using Cox regression analysis and nomograms. Gene enrichment
   analyses were performed. Finally, immunohistochemistry was carried out
   using the human protein atlas.

Results

   A LASSO regression model based on nine PRDEGs was constructed, and the
   prognostic value of key PRDEGs was explored via risk score. Univariate
   and multivariate Cox regression were implemented to identify further
   three significant PRDEGs closely related to distant metastasis, lymph
   node metastasis, and tumor stage. Then, a nomogram was constructed,
   which presented high predictive accuracy for 5 years survival of THCA
   patients. Gene enrichment analyses in THCA were strongly associated
   with PCD pathways. CASP6 presented significantly differential
   expression during clinical T stage, N stage, and PFI events (P < 0.05
   for all) and demonstrated the highest degree of diagnostic efficacy in
   PRDEGs (HR: 2.060, 95 % CI: 1.170–3.628, P < 0.05).
   Immunohistochemistry showed CASP6 was more abundant in THCA tumor
   tissue.

Conclusion

   A potential prognostic role for PRDEGs in THCA was identified,
   providing a new direction for treatment. CASP6 may be a potential
   therapeutic target and a novel prognostic biomarker for THCA.

   Keywords: Thyroid cancer, PANoptosis-related genes, LASSO-Cox
   regression, Prognosis, TCGA, GEO

1. Introduction

   Thyroid cancer (THCA) is a common malignant tumor of the thyroid gland
   in the world, which accounts for approximately 1 % of entire
   malignancies [[35]1]. Globally, the THCA morbidity rate has increased
   in recent decades and is estimated to be the fourth leading cancer type
   [[36]2]. With the advances in treatment made, including surgical
   resection, radioactive iodine, thyroid-stimulating hormone
   (TSH)-suppressive therapy, redifferentiation drugs, targeted therapies,
   and immunotherapy [[37]3], patients with papillary thyroid carcinomas
   have an overall 5-year survival of as high as 95 %. Despite this,
   recurrence and metastasis still occur frequently (20–30 %) [[38]4],
   which results in increased mortality associated with sizeable social
   healthcare burdens [[39]5]. About half of distant metastasis patients
   pass away within five years [[40]6,[41]7]. Hence, it is essential to
   identify a more accurate prognostic module for THCA to develop and
   introduce novel treatments.

   Programmed cell death (PCD) is related to the occurrence and
   progression of malignancies and includes classical methods such as
   apoptosis, necroptosis, and pyroptosis. PANoptosis describes the
   simultaneous occurrence and regulation of apoptosis, necroptosis, and
   pyroptosis in the pathological processes of some diseases [[42]8].
   Plenty of evidence has been unveiled that PANoptosis participates in
   the development of cancer. A mouse model lacking IRF1 (an upstream
   regulator of PANoptosis) was found to be hypersusceptible to colorectal
   tumorigenesis [[43]9]. Despite their resistance to the progression of
   melanoma and colorectal cancer, mice with adenosine deaminase acting on
   RNA-1 (ADAR1) deficiency exhibit restored tumorigenesis when the ZBP1
   Zalpha2 domain is deleted, suggesting that ADAR1 restrains
   ZBP1-mediated PANoptosis, thereby facilitating tumorigenesis [[44]10].
   The PANscore, a scoring system based on PANoptosis patterns of
   individual gastric cancer patients, was developed, and the low PANscore
   group achieved higher immunotherapy response rates and prognoses. More
   recently, PANoptosis-related gene signatures for gliomas have been
   proven to be prognostic. However, to date, PANoptosis-related
   differentially expressed genes (PRDEGs) have not been established as
   prognostic biomarkers of THCA.

   The goal of this research was to analyze PANoptosis-related genes in
   THCA using bioinformatics. The 510 THCA and 58 normal samples from The
   Cancer Genome Atlas (TCGA) database were utilized to acquire clinical
   data and gene expression. The GeneCards database was applied to
   download 14 PANoptosis-related genes. We screened the relevant
   literature in the PubMed database, resulting in 23 PANoptosis-related
   genes (PRGs). We then combined and de-duplicated the two resources,
   resulting in a final set of 28 PRGs, which were included in subsequent
   analyses. The GEO database was applied to acquire THCA gene expression
   data and clinical datasets ([45]GSE33630, [46]GSE35570, [47]GSE65144,
   and [48]GSE76039). We identified 14 TCGA-THCA PRDEGs that were
   consistent with those in the [49]GSE33630 and [50]GSE35570 datasets.
   According to the above data resources, a risk scoring system was
   constructed using the TCGA-THCA dataset and applied to a combined GEO
   dataset, including [51]GSE33630, [52]GSE35570, [53]GSE65144, and
   [54]GSE76039, to identify three PRDEGS. A nomogram was built by
   considering the three PRDEGs and then validated by calibration and
   decision curve analysis (DCA). In addition, the potential function and
   mechanism of PRDEGs in TCGA-THCA were explored by protein-protein
   interaction (PPI) network analysis, functional analysis, gene set
   variation analysis (GSVA), and gene set enrichment analysis (GSEA). As
   a result, three PRGs were discovered to be independent predictive
   factors of observed survival in THCA patients. CASP6 may be a potential
   therapeutic target and a prognostic biomarker for patients with THCA.

2. Materials & methods

2.1. Data download

   With the application of the R package "TCGAbiolinks", the expression
   matrix of the THCA dataset was downloaded from TCGA
   ([55]https://portal.gdc.cancer.gov/) [[56]11], including 510 samples of
   THCA (tumor group) and 58 paracancerous samples (normal group),
   standardized as Fragments Per Kilobase per million; the UCSC Xena
   database ([57]http://genome.ucsc.edu) was utilized to acquire the
   relevant clinical data [[58]12]. R package “limma” was utilized to
   normalize the count sequencing data from TCGA-THCA [[59]13]. R package
   GEOquery was utilized to acquire clinical information and THCA-related
   gene expression data from the GEO database in datasets [60]GSE33630
   [[61][14], [62][15]], [63]GSE35570 [[64]16], [65]GSE65144 [[66]17], and
   [67]GSE76039 [[68][18], [69][19]]. Gene annotation was performed by
   means of the [70]GPL570 Affymetrix Human Genome U133 Plus 2.0
   (HG-U133_Plus_2) Array. All the samples included in this study were
   obtained from Homo sapiens. [71]GSE33630 consisted of gene expression
   profile microarray data from 60 samples of THCA and 45 samples of
   normal tissue. [72]GSE33570 contains gene expression profile microarray
   data of 65 THCA tissue specimens and 51 normal tissue samples. The
   [73]GSE65144 included 12 THCA samples and 13 normal thyroid tissue
   samples. Similarly, [74]GSE76039 comprises 37 tumor tissue specimens.
   [75]GSE33630 [[76][20], [77][21]], [78]GSE35570, [79]GSE65144, and
   [80]GSE76039 [[81]22] served as validation sets.

   The GeneCards database [[82]23] ([83]https://www.genecards.org/)
   provides concise genomic information on all known human genes.
   "PANoptosis" served as a keyword to search for PRGs (PANoptosis-related
   genes) in the GeneCards database, from which 14 PRGs were obtained. We
   also identified 23 PRGs by retrieving all relevant published literature
   [[84]24,[85]25] from PubMed. The PRGs from the two resources were
   further combined and deduplicated, resulting in a final set of 28 PRGs,
   which were used for subsequent analyses. The specific gene names are
   presented in [86]Supplementary Table S1.

2.2. Differential expression analysis

   To identify potential mechanisms, related biological features, and
   signaling pathways of differential genes between the tumor and normal
   groups in THCA datasets of TCGA, [87]GSE33630, and [88]GSE35570 were
   standardized using the R package "limma", followed by a differential
   analysis of gene expression profiles between the two TCGA-THCA groups.
   The identification of differentially expressed genes (DEGs) was
   performed in diverse groups, with adjusted P < 0.05 and logFC >0.2
   (upregulated genes) or < 0.2 (downregulated genes) as differential
   expression. R package "ggplot 2" was utilized to make volcano plots for
   DEG. Upregulated genes and downregulated genes were intersected with
   PRGs and displayed using Venn Diagrams. PRDEGs for THCA were derived by
   combining both intersections. Furthermore, we drew boxplots of group
   comparisons separately for DEGs in datasets including TCGA-THCA,
   [89]GSE33630, and [90]GSE35570. The DEGs with the same expression trend
   and statistically significant discrepancy were extracted for subsequent
   analyses, and their expression in TCGA-THCA dataset was plotted as a
   heatmap using the "pheatmap" package of R.

2.3. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
enrichment analyses

   Large-scale functional enrichment research is commonly conducted using
   GO analysis [[91]26], which comprising molecular functions (MF),
   biological processes (BP), and cellular components (CC). KEGG
   [[92]27,[93]28] is a comprehensive database that integrates biological
   pathways, genomics, medicines, and diseases. The "ClusterProfiler" R
   package was utilized to conduct KEGG and GO pathway enrichment analyses
   of PRDEGs, with thresholds set as FDR <0.2 and P < 0.05, followed by
   correcting P-value by means of the Benjamini–Hochberg (BH) method.

2.4. GSEA

   We assessed gene distribution trends within a predefined gene set to
   identify genes contributing to phenotypes using GSEA. The enrichment
   analysis was processed in genes from the tumor and normal groups of
   TCGA-THCA using the R package "ClusterProfiler" (seed = 2020), repeated
   1000 times, Min Size = 10 and Max Size = 500; the P-value was corrected
   by the BH method. The MSigDB database was applied to download the "c2.
   cp.v7.2. symbols" gene set, and significantly enriched functions were
   identified with FDR (q-value) < 0.25 and P < 0.05.

3. GSVA

   GSVA [[94]29] is a non-parametric, unsupervised approach in which
   expression matrixes of genes in different samples are converted to
   examine the transcriptome results related to gene set enrichment to
   assess whether there are enriched different pathways in various
   samples. The MSigDB database was applied to download a "h.all.v7.4.
   symbols.gmt" gene set, and GSVA was conducted on TCGA-THCA, with
   adjusted P < 0.05 as statistical difference, so as to explore the
   functional enrichment difference between the two groups.

3.1. PPI network

   STRING [[95]30] is a database used to search for interactions between
   known and predictive proteins. In this study, we conducted STRING to
   construct a PPI network related to candidate PRDEGs, with >0.400 as the
   minimum required interaction score. Protein complexes with specific
   biological functions can be identified by looking for densely connected
   regions in the PPI network. The maximal clique centrality (MCC)
   algorithm has been broadly applied to performance indicators in
   bioinformatics. Cytoscape [[96]31] (version 3.9.1) was applied to
   perform PPI network visualization. GeneMANIA, a website used to predict
   genes that possess similar functions to the screened PRDEGs, was
   employed to construct an interaction network of candidate genes.

3.2. Least absolute shrinkage and selection operator (LASSO) regression model

   To obtain a prognostic model of PRDEGs in THCA, with 10-fold
   cross-validation, we utilized LASSO [[97]32,[98]33] regression
   (seed = 2021). In addition, we ran 1000 repeats per period to prevent
   overfitting. LASSO regression is a type of linear regression that
   avoids overfitting and enhances the generalization capability of the
   refined model by imposing a penalty term (λ × absolute values of
   slopes) on the magnitude of the model coefficients. According to the
   LASSO regression model, risk factor graphs were drawn to depict the
   group and survival status of each tumor sample in accordance with the
   risk score.
   [MATH: <mrow><mtext>risk</mtext><mspace
   width="0.25em"></mspace><mtext>score</mtext><mo
   linebreak="badbreak">=</mo><mrow><munder><mo>∑</mo><mi>i</mi></munder><
   mrow><mi>C</mi><mi>o</mi><mi>e</mi><mi>f</mi><mi>f</mi><mi>i</mi><mi>c<
   /mi><mi>i</mi><mi>e</mi><mi>n</mi><mi>t</mi><mspace
   width="0.25em"></mspace><mrow><mo
   stretchy="true">(</mo><msub><mrow><mi>h</mi><mi>u</mi><mi>b</mi><mspace
   width="0.25em"></mspace><mi>g</mi><mi>e</mi><mi>n</mi><mi>e</mi></mrow>
   <mi>i</mi></msub><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mi>m</mi><mi>R</mi><mi>N</mi><m
   i>A</mi><mspace
   width="0.25em"></mspace><mi>E</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</
   mi><mi>s</mi><mi>s</mi><mi>i</mi><mi>o</mi><mi>n</mi><mspace
   width="0.25em"></mspace><mrow><mo
   stretchy="true">(</mo><msub><mrow><mi>h</mi><mi>u</mi><mi>b</mi><mspace
   width="0.25em"></mspace><mi>g</mi><mi>e</mi><mi>n</mi><mi>e</mi></mrow>
   <mi>i</mi></msub><mo stretchy="true">)</mo></mrow></mrow></mrow></mrow>
   :MATH]

3.3. Cox regression model

   For the purpose of assessing the prognostic value of PRDEGs in THCA,
   the expression of screened PRDEGs in the TCGA-THCA dataset was analyzed
   using multivariate Cox regression with the P-value threshold set to
   0.1, and then the Cox regression model was built and presented in a
   forest plot. According to the results of the multivariate, a nomogram
   was constructed to individualize the predicted probability of 1-, 3-,
   and 5-year survivals. A calibration evaluation was performed on the
   accuracy and discriminative capacity of the nomogram. Calibration
   curves and nomograms were drawn using R package "rms". The clinical
   utility of these models was assessed with DCA. The R package "ggDCA"
   [[99]34] was adopted to conduct DCA and evaluate the predictions of 1-,
   3-, and 5-year prognostic nomograms.

3.4. Immunohistochemical analysis

   Immunohistochemical (IHC) analysis uses the principle of specific
   antigen-antibody binding to detect and locate target antigens in cells
   and tissues, mainly with light microscopy. Human Protein Atlas (HPA)
   [[100]35] database ([101]www.proteinatlas.org/) was utilized to
   implement IHC analysis for key gene expression screened by the Cox
   model in THCA and normal thyroid tissues before contrast staining.

3.5. Statistical methods

   R version 4.1.2 was adopted for statistical analysis and plotting, and
   continuous variables are described by mean ± standard deviation. Data
   between two continuous variable groups were compared using the Wilcoxon
   rank sum test; Student's t-test was utilized for statistical difference
   between the two groups, and the Kruskal–Wallis test was implemented for
   multi-group comparison. Categorical variables were examined by means of
   chi-square or Fisher's exact tests. LASSO regression was carried out
   using the "glmnet" [[102]36] package, and R package "pROC" [[103]37]
   was applied to generate a receiver operating characteristic (ROC)
   curve. Unless otherwise indicated, correlations were analyzed using the
   Spearman test, with statistical significance setting as two-tailed
   P-values less than 0.05.

4. Results

4.1. Differential expression analysis

   Altogether, there were 18,436 DEGs acquired from the TCGA-THCA dataset,
   10,798 of which were identified using the following thresholds: average
   |logFC| > 0.2 and adjusted P < 0.05. Under such threshold, 5863 DEGs
   were highly expressed (upregulated), and 4935 DEGs were lowly expressed
   (downregulated) in THCA, as shown in [104]Fig. 1. The final DEGs were
   visualized using a volcano plot ([105]Fig. 2A). Eight upregulated genes
   (CASP6, CASP7, FADD, GSDMD, MLKL, PSTPIP2, PYCARD, and RBCK1) and six
   downregulated genes (AIM2, IFNG, TAB3, TNF, TNFAIP3, and ZBP1) were
   maintained by taking the intersection of diversely regulated DEGs and
   PRGs, and a Venn diagram was generated ([106]Fig. 2B and C). The names
   and expression information for the 14 PRDEGs are listed in [107]Table
   1, [108]Table 2. Next, we drew grouped comparison plots ([109]Fig.
   2D–F) of the 14 PRDEGs in TCGA-THCA, GSE 33630, and [110]GSE35570 to
   recognize the expression trends and whether the differences were
   statistically significant. According to the figure, comparing genes in
   the [111]GSE33630 and [112]GSE35570 datasets with TCGA, nine genes were
   consistent with the validation results, including CASP6, CASP7, FADD,
   GSDMD, MLKL, PSTPIP2, PYCARD, RBCK1, and TAB3.

Fig. 1.

   [113]Fig. 1
   [114]Open in a new tab

   Flowchart of the selection of the related genes. TCGA, the cancer
   genome atlas. THCA, thyroid cancer. PRGs, PANoptosis-related genes.
   PRDEGs, PANoptosis-related differentially expressed genes. GO, Gene
   Ontology. KEGG, Kyoto Encyclopedia of Genes and Genomes. GSEA, Gene Set
   Enrichment Analysis. GSVA, Gene Set Variation Analysis. PPI,
   Protein-protein interaction. LASSO, least absolute shrinkage and
   selection operator. ROC curve, receiver operating characteristic curve.
   IHC, immunohistochemical analysis.

Fig. 2.

   [115]Fig. 2
   [116]Open in a new tab

   Analyses of differentially expressed genes. (A) Volcano plot of the
   distribution of all differentially expressed genes, mapping 5863
   upregulated (red dots) and 4935 downregulated (blue dots) genes. (B)
   Venn diagram showing overlap of the upregulated DEGs and the PRGs. (C)
   Venn diagram showing overlap of the downregulated DEGs and the PRGs.
   (D–E) Expression differences in the 14 PRDEGs between tumor tissues and
   normal samples in TCGA-THCA, [117]GSE33630, and [118]GSE35570 datasets.
   Ns, not significant, P ≥ 0.05; *, P < 0.05; **, P < 0.01; ***,
   P < 0.001. TCGA, the cancer genome atlas. THCA, thyroid cancer. DEGs,
   differentially expressed genes. PRGs, PANoptosis-related genes. PRDEGs,
   PANoptosis-related differentially expressed genes. (For interpretation
   of the references to color in this figure legend, the reader is