Abstract

   Osteoarthritis (OA), a degenerative disease of the joints, has one of
   the highest disability rates worldwide. This study investigates the
   role of pyroptosis-related genes in osteoarthritis and their expression
   in different chondrocyte subtypes at the individual cell level. Using
   OA-related datasets for single-cell RNA sequencing and RNA-seq, the
   study identified PRDEGs and DEGs and conducted Cox regression analysis
   to identify independent prognostic factors for OA. CASP6, NOD1, and
   PYCARD were found to be prognostic factors. Combined Weighted Gene
   Correlation Network Analysis with PPI network, a total of 15 hub genes
   related to pyroptosis were involved in the notch and oxidative
   phosphorylation pathways, which could serve as biomarkers for the
   diagnosis and prognosis of OA patients. The study also explored the
   heterogeneity of chondrocytes between OA and normal samples,
   identifying 19 single-cell subpopulation marker genes that were
   significantly different among 7 chondrocyte cell clusters. AGT, CTSD,
   CYBC, and THYS1 were expressed differentially among different cell
   subpopulations, which were associated with cartilage development and
   metabolism. These findings provide valuable insights into the molecular
   mechanisms underlying OA and could facilitate the development of new
   therapeutic strategies for this debilitating disease.

   Subject terms: Computational biology and bioinformatics, Immunology

Introduction

   Osteoarthritis (OA) is a degenerative disease characterized by joint
   and muscle decline, synovial inflammation, and bone and cartilage
   decomposition. It is a prevalent condition among the elderly and a
   leading cause of disability worldwide, imposing a significant health
   and economic burden on individuals and society^[30]1,[31]2. While the
   exact cause of OA remains unclear, it has been associated with age,
   obesity, inflammation, trauma, and genetics. Current treatments,
   including medication, non-pharmacological approaches, and surgery, have
   limited efficacy and can have adverse side effects^[32]3,[33]4.
   Mesenchymal stem cell therapy has shown promise in promoting cartilage
   tissue repair and reducing inflammation, but more evidence is needed to
   support its use as a primary treatment option^[34]5.

   Pyroptosis, a form of active cell death triggered by endogenous or
   exogenous stimuli, has been implicated in the pathophysiology of
   OA^[35]6. Mechanisms of pyroptosis involve mitochondrial dysfunction,
   inflammatory responses, and oxidative stress. The elevated levels of
   oxidative stress and expression of pyroptosis-related proteins found in
   joint tissues of OA patients further suggest the significance of
   pyroptosis in the pathogenesis of OA^[36]7–[37]10.

   To gain insight into the pathogenesis of OA related to pyroptosis, a
   bioinformatics analysis was performed using available RNA-seq data for
   OA. Our research aims to comprehensively understand the molecular
   status of osteoarthritis (OA) and identify key genes and pathways
   associated with cell apoptosis in this disease. To achieve this goal,
   we analyzed a large amount of transcriptomic data from both OA and
   control synovial tissues. Additionally, we also analyzed single-cell
   RNA sequencing (scRNA-seq) data from chondrocytes (cartilage cells).
   Differential analysis was conducted to identify differentially
   expressed genes (DEGs) between OA samples with high and low immune
   estimates. Pyroptosis-related differentially expressed genes (PRDEGs)
   were identified using the Mann–Whitney U test based on obtained
   pyroptosis-related genes (PRGs). Weighted Gene Correlation Network
   Analysis (WGCNA) was then applied to screen for pyroptosis-related
   modules, and Pearson correlation analysis was used to screen module
   genes. Key genes were obtained by intersecting DEGs and module genes,
   and a protein–protein interaction (PPI) network was constructed to
   identify hub genes. GO/KEGG function enrichment analysis, Gene set
   enrichment analysis (GSEA), and Gene set variation analysis (GSVA) were
   conducted based on hub genes, and mRNA-miRNA, mRNA-TF, and mRNA-drug
   networks of hub genes were constructed. Single-cell RNA sequencing
   (scRNA-seq) was used to reveal different chondrocyte subtypes and
   compare gene expression differences in different cell subpopulations.
   Immune infiltration analysis was performed to obtain immune cells with
   different levels of infiltration between the High-Pyroptosis score
   group and the Low-Pyroptosis score group.

   This bioinformatics analysis provides insights into the pathogenesis of
   OA related to pyroptosis and identifies potential prognostic
   indicators. It also explores cell heterogeneity and expression
   differences of hub genes in different OA chondrocyte subpopulations,
   providing a new perspective on the management of OA.

Methods

Data collection and processing

   We obtained the gene expression profiling microarray
   [38]GSE12021^[39]11, [40]GSE55235^[41]12,[42]GSE82107^[43]13 and
   [44]GSE89408^[45]14 of osteoarthritis (OA) from the Gene Expression
   Omnibus (GEO, [46]https://www.ncbi.nlm.nih.gov/geo/) database using
   “GEOquery” R package^[47]15. Thereinto, the [48]GSE12021 contains 19
   samples, including 10 OA samples and 9 control samples. The
   [49]GSE55235 consists of 20 samples, including 10 OA samples and 10
   control samples. [50]GSE82107 contains 17 samples, including 10 OA
   samples and 7 control samples. The [51]GSE89408 consists of 49 samples,
   including 22 OA samples and 27 control samples. It is important to
   emphasize that we excluded all RA (rheumatoid arthritis) samples
   because our research primarily focuses on the pathophysiological
   processes of osteoarthritis, rather than conducting a comparative study
   involving various types of arthritis. Although both RA and
   osteoarthritis are arthritic conditions, they have significantly
   different pathogenic mechanisms and pathological processes. To maintain
   the clarity and specificity of our research, we chose to include only
   samples from osteoarthritis patients and excluded samples from other
   types of arthritis. The details of the 4 GEO dataset samples are shown
   in Table [52]1. We first merged the datasets [53]GSE12021,
   [54]GSE55235, [55]GSE82107 and [56]GSE89408, then removed the batch
   effect using the “sva” R package^[57]16. “Sva” (Surrogate Variable
   Analysis) is a commonly used R software package in gene expression data
   analysis. It is a method employed to detect and adjust for potential
   batch effects, aiming to enhance the reliability and accuracy of
   differential expression analysis and expression level clustering in
   various research studies. Next, we applied the “limma” R package^[58]17
   to standardize the merged datasets to obtain an OA dataset. The “limma”
   is a widely used R software package in gene expression data analysis.
   It is a linear modeling method designed for differential expression
   analysis and is applicable for both microarray and RNA sequencing data.
   Then Principal Component Analysis (PCA) was performed on the expression
   matrix of the datasets before and after the removal of batch effect to
   verify the effect of removing the batch effect. PCA is a method of data
   dimensionality reduction. It extracts feature vectors (components) of
   data from high-dimensional data, converts them into low-dimensional
   data, and uses two-dimensional or three-dimensional graphs to display
   these features.

Table 1.

   Dataset details.
   GEO ID Species OA sample Control
   sample Organization
   source GPL ID
   [59]GSE12021 Homo sapiens 10 9 synovial membrane [60]GPL96[HG-U133A]
   Affymetrix Human Genome U133A Array
   [61]GSE55235 Homo sapiens 10 10 synovial membrane [62]GPL96[HG-U133A]
   Affymetrix Human Genome U133A Array
   [63]GSE82107 Homo sapiens 10 7 synovial membrane [64]GPL96[HG-U133A]
   Affymetrix Human Genome U133A Array
   [65]GSE89408 Homo sapiens 22 27 synovial membrane
   [66]GPL570[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array
   [67]Open in a new tab

   This scRNA-seq data set, [68]GSE169454, was retrieved from the GEO
   database and was generated using the [69]GPL555 Illumina HiSeq 2500
   platform (Homo sapiens). The samples were collected from 4 patients
   with knee osteoarthritis and 3 normal subjects. We selected 5 single
   cell samples of chondrocytes, among which 3 were normal (Normal1:
   [70]GSM5203389, Normal2: [71]GSM5203390, Normal3: [72]GSM5203391) and 2
   were OA (OA1: [73]GSM5203392, OA2: [74]GSM5203393).

   Seurat is a widely used R software package in the analysis of
   single-cell RNA sequencing (scRNA-seq) data. It is a highly flexible
   and powerful tool for processing, visualizing, and analyzing gene
   expression data at the single-cell level. For single-cell data
   processing, we used the “Seurat” R package^[75]18 to create five
   single-cell data as Seurat objects, and calculated the percentage of
   mitochondria in each cell through the "PercentageFeatureSet" function
   of the “Seurat” package. It is generally believed that when a cell has
   a high proportion of mitochondrial genes, it might be in a state of
   apoptosis or lysis. Therefore, for each cell, the proportion of
   mitochondrial genes was computed, and cells having a mitochondrial
   fraction more than 5% were filtered away. Due to the fact that cells
   with fewer than 200 genes were found to be of poor quality or to be
   empty droplets, and that cell duplication or multiplication might
   result in an excessively large gene count, we also filtered out cells
   with feature < 500. In addition, we filtered cells with total
   UMI > 20,000. Subsequently, we obtained 22,839 chondrocytes from
   patients with osteoarthritis and 6,360 normal chondrocytes.

   After completing the quality control procedure, OA samples (OA1, OA2)
   and normal samples (Normal1, Normal2, Normal3) were integrated
   respectively, and batch effects were removed using the CCA method in
   Seurat with the functions named the "FindIntegrationAnchors" and
   "IntegrateData". For Seurat objects, utilising the most differentially
   expressed genes, we did linear dimensional decrease using principal
   components (PC) calculation^[76]19. Then, we used "FindNeighbors" and
   "FindClusters" functions of Seurat to group cells into clusters with
   the optimum cluster size to identify the cell type. Then to compress
   the data gathered in the chosen important main components to two
   dimensions, we employed tSNE (t-Distributed Neighbor Embedding),
   realizing a visual graph-based clustering of cells. In addition, 33
   pyroptosis-related genes (PRGs) were obtained based on previous
   studies. In addition, 33 pyroptosis-related genes (PRGs) were obtained
   based on previous studies^[77]20.

Cell annotation

   Cell annotation is the process of associating and labeling each cell in
   single-cell sequencing data with its corresponding cell type, which
   helps researchers identify the cell types of individual cells and
   explore transcriptional differences in different physiological states
   and diseases. For osteoarthritis samples, 14 clusters were visualized
   using tSNE. For the normal samples, a total of 10 clusters were
   visualized using tSNE, and seven distinct chondrocyte subtypes were
   revealed by artificial annotation of cell type marker genes. The marker
   genes of homeostatic chondrocytes (HomC) contain MMP3, FOSB and JUN.
   The marker genes of hypertrophic chondrocytes (HTC) contain COL10A1,
   IBSP and JUN. The marker genes of prehypertrophic chondrocytes (preHTC)
   include COL10A1, IBSP, COL2A1 and TGFBI. The marker genes of regulatory
   chondrocytes (RegC) include CHI3L1 and CHI3L2. The marker genes of
   fibrochondrocytes (FC) contain COL1A1, COL1A2, S100A4, PRG4 and TMSB4X.
   The marker genes of reparative chondrocytes (RepC) include COL2A1,
   CILP, COL3A1 and COMP. The marker genes of preFC contain IL11, COL2A1,
   CILP and OGN^[78]18,[79]19,[80]21–[81]23. To understand the expression
   patterns of diagnostic markers in patients with osteoarthritis, we
   compared gene expression in different cell groups.

Immunologic estimate

   The level of tumor microenvironment cell, immune cell, and stromal cell
   invasion have significant influence on the prognosis. To clarify the
   influence of genes involved in immunity and stromal cells on prognosis,
   the “ESTIMATE” R package^[82]24 could leverage the uniqueness of the
   tumor sample's transcriptional pattern to deduce the composition of
   tumour cells and the various invading normal cells. We used the
   “ESTIMATE” to estimate the immune activity of samples by using the
   expression profile matrix data of the OA dataset. Then ESTIMAE score,
   Immune score and Stromal score of the characteristics of the expression
   matrix were calculated based on the ESTIMATE algorithm to quantify the
   immune and stromalscore components of the sample. Next, we separated OA
   samples into two categories, one with a high ImmuneScore and another
   with a low ImmuneScore, based on the midpoint of the distribution. The
   “limma” R package was applied to conduct differential expression
   analysis, and differentially expressed genes (DEGs) were discovered in
   distinct groups. We chose | logFC |> 0.5 and p value < 0.05 as the
   standard screened genes for further study. Thereinto, genes with
   logFC > 0.5 and p value < 0.05 were up-regulated genes and those with
   logFC < − 0.5 and p value < 0.05 were down-regulated genes. The
   consequences were presented as the volcano map drawn by “ggplot2” R
   package^[83]25 and the heatmap drawn by “pheatmap” R package^[84]26.

Calculation of pyroptosis score based on OA dataset

   Firstly, the Mann–Whitney U test was used to evaluate the difference
   between OA and control samples with regards to PRG expression (WilCoxon
   rank sum test).With a P-value less than 0.05, the results were
   considered significant, and pyroptosis-related differentially expressed
   genes (PRDEGs) were discriminated.

   The single-sample gene-set enrichment analysis (ssGSEA) algorithm could
   determine how many times each gene appears in the database samples. The
   “GSVA” R package^[85]27 was applied to count the Pyroptosis score of
   each sample in the OA group by the ssGSEA algorithm on account of the
   PRDEGs expression matrix of each sample in the OA dataset. Based on the
   median Pyroptosis value, we separated the OA samples into two groups:
   High-Pyroptosis and Low-Pyroptosis.

Enrichment analysis between the High-Pyroptosis score group and the
Low-Pyroptosis score group

   For the goal of comprehending the distinctions between the two groups'
   biological systems, GSEA^[86]28 was performed using gene expression
   profile data of OA patients. GSEA is a computer tool that permits the
   evaluation of whether a particular gene set demonstrates numerically
   substantial improvements between two biological states. The
   “clusterProfiler” R package^[87]29 was used to perform GSEA, with a
   P-Value of less than 0.05 indicating considerably enhanced data.

   GSVA is a method of gene set enrichment analysis for unduplicated
   samples. The score of the corresponding gene set could be obtained for
   each sample through GSVA, and the differentially expression analysis
   could be performed for pathways of each sample to obtain differentially
   expressed pathways between groups. We downloaded the reference gene set
   from MSigDB database^[88]30
   ([89]https://www.gsea-msigdb.org/gsea/msigdb/). Finally, the "GSVA" R
   program was utilized to find commonalities between samples from the
   high-risk and low-risk groups. Also, we used “limma” R package to
   conduct differentially expression analysis on scores of various
   pathways between two groups. The parameter is set to p value < 0.05 and
   | logFC |> 0, which is displayed by heatmap.

Identification of key genes

   The goals of WGCNA^[90]31 are to identify modules of co-expressed
   genes, study how genetic network correlate with phenotypes, and dissect
   key genes within these networks. The pickSoftTreshold method ultimately
   determined a value of 12 to be the best possible soft threshold.
   Subsequently, using this flexible limit, we formed a topological
   matrix, clustered the data hierarchically, and built a scale-free
   network. With 50 as the minimum number of modules, gene modules were
   actively slashed and identified, and Eigengenes were computed.
   Correlation between modules was built on account of module Eigengenes,
   hierarchical clustering was conducted, and modules with correlation
   greater than 0.5 were combined again, and finally 5 modules were
   gained. The association between module pyroptosis scores was analysed
   using Pearson's method, and candidate module genes (MEGs) were
   eliminated by this process (MEGs). Subsequently, we took the
   intersection of DEGs and MEGs to obtain genes as key genes in this
   study.

Construction of PPI network and identification of hub genes

   STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is
   an online tool used to search and analyze the interactions between
   proteins and genes. It provides extensive information on protein
   interactions, helping researchers explore protein–protein interaction
   networks and functional regulatory mechanisms. PPI network predictions
   and network construction were performed using the online STRING (a
   search engine used to find interacting genes) tool^[91]32. The PPI
   network was seen using Cytoscape^[92]33, and hub genes were filtered
   using the cytoHubba^[93]34 extension.

GO and KEGG enrichment analysis

   Gene Ontology (GO) analysis has been utilized as a general approach for
   mass functional enrichment, including biological processes (BP),
   molecular functions (MF) and Cellular Component (CC)^[94]35. The
   Genomes, biological processes, illnesses, and medications material is
   stored in the Kyoto Encyclopedia of Genes and Genomes (KEGG)^[95]36.
   For both DEGs and hub genes, we performed GO annotation analysis and
   KEGG pathway enrichment analysis using the "clusterProfiler" R package.
   We arbitrarily chose a significance level of 0.05 for the FDR
   threshold. The top 8 outcomes of GO and the top 10 results of KEGG are
   shown in the bar chart and dot chart, respectively.

Construction of mRNA-miRNA, mRNA-TF, mRNA-drugs interaction network

   ENCORI^[96]37 database (https: // Starbase.sysu.edu.cn/) is the 3.0
   version of the starBase database. The ENCORI database provides multiple
   graphical user interfaces for discussing microRNA targets, including
   those between miRNA and ncRNA, miRNA and mRNA, ncRNA and RNA, RNA and
   RNA, and RBP and ncRNA or RBP and mRNA. These interactions are
   supported by data mined from CLIP-seq and degradation sequencing
   experiments. Predictions of miRNA target genes and their functions were
   made using the miRDB database^[97]38. In this study, miRNAs interacting
   with hub genes (mRNA) were forecasted in accordance with ENCORI and
   miRDB databases, and the mRNA-miRNA connection system was plotted using
   the intersection of the two databases.

   CHIPBase database^[98]39 ([99]https://rna.sysu.edu.cn/chipbase/) found
   a matrix of binding sites for hundreds of DNA motifs, and predicted
   millions of Transcription factor (Transcription factors, TF) and
   transcriptional regulatory relationships between genes from the DNA
   binding protein CHIP-seq data. hTFtarget database^[100]40
   ([101]http://bioinfo.life.hust.edu.cn/hTFtarget) refers to an
   integrated database of human TF and target regulation. Using the
   CHIPBase (version 3.0) and hTFtarget databases, we looked for
   transcription factors (TFS) that bind to hub genes.

   Comparative Toxicogenomics Database (CTD)^[102]41
   ([103]http://ctdbase.org/) is based on an innovative digital ecosystem
   linking chemicals, genes, phenotypes, diseases and known toxicological
   information, which could be used to facilitate access to human health
   related information. With References ≥ 3 and Organisms ≥ 3 as screening