Abstract Graves' disease (GD) is an autoimmune disorder characterized by hyperthyroidism resulting from autoantibody-induced stimulation of the thyroid gland. Despite recent advancements in understanding GD's pathogenesis, the molecular processes driving disease progression and treatment response remain poorly understood. In this study, we aimed to identify crucial immunogenic factors associated with GD prognosis and immunotherapeutic response. To achieve this, we implemented a comprehensive screening strategy that combined computational immunogenicity-potential scoring with multi-parametric cluster analysis to assess the immunomodulatory genes in GD-related subtypes involving stromal and immune cells. Utilizing weighted gene co-expression network analysis (WGCNA), we identified co-expressed gene modules linked to cellular senescence and immune infiltration in CD4^+ and CD8^+ GD samples. Additionally, gene set enrichment analysis enabled the identification of hallmark pathways distinguishing high- and low-immune subtypes. Our WGCNA analysis revealed 21 gene co-expression modules comprising 1,541 genes associated with immune infiltration components in various stages of GD, including T cells, M1 and M2 macrophages, NK cells, and Tregs. These genes primarily participated in T cell proliferation through purinergic signaling pathways, particularly neuroactive ligand-receptor interactions, and DNA binding transcription factor activity. Three genes, namely PRSS1, HCRTR1, and P2RY4, exhibited robustness in GD patients across multiple stages and were involved in immune cell infiltration during the late stage of GD (p < 0.05). Importantly, HCRTR1 and P2RY4 emerged as potential prognostic signatures for predicting overall survival in high-immunocore GD patients (p < 0.05). Overall, our study provides novel insights into the molecular mechanisms driving GD progression and highlights potential key immunogens for further investigation. These findings underscore the significance of immune infiltration-related cellular senescence in GD therapy and present promising targets for the development of new immunotherapeutic strategies. Keywords: Graves' disease, Immunotherapeutic biomarkers, Immune infiltration, WGCNA, Gene co-expression modules, Molecular mechanisms Highlights * • Identified key immunogenic factors for GD prognosis and therapy response * • Utilized computational immunogenicity-potential scoring and multi-parametric cluster analysis * • Co-expression modules related to immune infiltration and cellular senescence in CD4^+ and CD8^+ GD samples identified * • Three genes (PRSS1, HCRTR1, P2RY4) were identified as potential key immunogens for GD therapy and overall survival prediction * • Immune infiltration-related cellular senescence in GD underscores the importance of new immunotherapeutic strategies. 1. Introduction Graves' disease (GD), a prevalent autoimmune disorder, affects the thyroid gland causing hyperthyroidism, often seen in individuals between the ages of 30 and 50 [[35]1]. Women are more likely to be diagnosed with GD, with a female-to-male ratio of around 7:1 [[36]2]. The incidence of GD is higher in industrialized countries such as the United States and Europe, with a prevalence of 1–2% and 0.5–2%, respectively [[37]3,[38]4]. However, the incidence of GD in Asia is relatively lower, with a reported rate of 0.3–0.7%. A study published in the Journal of Clinical Endocrinology & Metabolism has indicated an increasing trend of GD incidence in China, with a rate of 0.5–0.8% [[39]5,[40]6]. The standard treatment options for GD include antithyroid medications, radioactive iodine (RAI) therapy, and thyroidectomy [[41][7], [42][8], [43][9]]. Close monitoring and adjustments to the treatment plan may be necessary for some patients [[44]10,[45]11]. Although current treatments effectively manage hyperthyroidism, they do not address the underlying immune activation and autoantibody production [[46]11]. Hence, there is a need for further research and development of treatments targeting the underlying immunological mechanisms in GD. GD is characterized by immune pathology that involves the production of autoantibodies targeting the thyroid-stimulating hormone receptor (TSHR) in the thyroid gland, which results in hyperthyroidism due to excessive secretion of thyroid hormones [[47]12,[48]13]. The immune response in GD also involves the participation of T and B cells, which play critical roles in the disease progression [[49]14]. T cells secrete cytokines that promote inflammation and stimulate the thyroid gland, while B cells produce autoantibodies against thyroid-specific antigens, which contribute to ongoing immune activation and excessive thyroid hormone secretion [[50]14,[51]15]. The exact cause of immune-mediated autoantibody production in autoimmune disorders is not yet fully understood, although genetic susceptibility, viral infections, and endocrine imbalances have been suggested as contributing factors. The initial triggers for autoantibody production and immune activation are still unknown, and the precise interactions between immune cells and autoantibodies in the thyroid gland require further investigation [[52]12,[53]16]. However, recent advances in computational tools, such as the immunogenicity-potential scoring (IPS) [[54]17,[55]18], microenvironment cell populations-counter (MCP) [[56][19], [57][20], [58][21]], X-cells algorithms [[59]22,[60]23], and weighted gene co-expression network analysis (WGCNA) [[61]24,[62]25], have enabled researchers to gain a deeper understanding of the molecular mechanisms underlying autoantibody production and the complex interactions between stromal and immune cells in the thyroid gland [[63]26,[64]27]. These cutting-edge tools allow for large-scale gene expression analysis to identify key biological pathways and processes involved in the path immunology of autoimmune disorders, providing valuable insights into the underlying mechanisms of these diseases [[65]28]. In this study, we present a pioneering investigation into the immune landscape of GD utilizing cutting-edge computational tools. Our approach integrates advanced methodologies, including IPS, MCP, and X-cell algorithms, complemented by the well-established WGCNA [[66]26,[67]27]. This departure from conventional research strategies aims to fill crucial knowledge gaps regarding the molecular origins of immune-mediated autoantibody production in the GD [[68]28]. By leveraging large-scale gene expression analysis, we uncover key biological pathways, unraveling the intricate immunological processes that characterize autoimmune disorders. This unique amalgamation of methodologies not only propels our comprehension of GD immunology but also lays the groundwork for revolutionary diagnostic and therapeutic avenues with profound implications. Specifically, we aim to determine the disrupted immune network associated with GD pathogenesis and evaluate the feasibility of utilizing a prognostic marker to predict the immune microenvironment status in GD patients. By combining the expression data of stromal and immune cells in GD-affected tissues, our study seeks to derive a novel understanding of the intricate interactions between immune cells and autoantibodies in GD and enhance our capacity to forecast and manage this debilitating autoimmune disorder. Our research has the potential to provide valuable insights into the pathophysiology of GD and inform the development of more effective diagnostic and therapeutic strategies. 2. Materials and methods 2.1. Study design In this observational study, transcriptomic data of GD obtained from The Cancer Genome Atlas (TCGA) and The Cancer Genome Atlas Glioma (CGGA) collections were analyzed to identify cellular and molecular pathways associated with senescence in GD through an immune infiltration-guided strategy. The study employed IPS, MCP, and Xcell algorithms to comprehensively analyze the immune cell fractions and immune clustering of the immune infiltration components in GD datasets. The IPS algorithm was utilized to evaluate the overall abundance of immune cells within the transcriptomic data. This computational approach entails a thorough assessment of the expression profiles linked to various immune cell types. IPS offers a quantitative measure, enabling the determination of the global immune cell landscape in the context of GD [[69]17,[70]18]. Additionally, MCP was employed to scrutinize the specifics of immune cell populations present in the GD datasets. This algorithm utilizes advanced computational methods to deconvolute bulk transcriptomic data, providing insights into the quantitative distribution of distinct immune cell subsets. By leveraging gene expression signatures associated with various immune cell types, MCP enhances the resolution for characterizing specific immune cell populations contributing to the GD microenvironment [[71]19,[72]20]. Moreover, the distribution analysis of immune cell populations in the GD datasets was conducted using the Xcell algorithm. Xcell employs a sophisticated computational framework to evaluate the relative proportions of diverse immune cell types within the GD transcriptomic landscape [[73]22,[74]23]. By leveraging specific gene expression profiles associated with various immune cells, Xcell enhances our understanding of the spatial arrangement and interplay of immune cell populations in the context of GD. These algorithms collectively contribute to a comprehensive assessment of the immune landscape, providing valuable information on both the overall abundance and specific distribution patterns of immune cell populations associated with GD [[75]26,[76]27]. Furthermore, the WGCNA was used to identify genes related to the immune hub and involved in glioma immune invasion. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to uncover specific signaling pathways, and the reliability of the immune-hub genes was evaluated through clinical prognosis analysis. A protein-protein interaction analysis was also performed to predict potential mechanisms. The study aimed to provide a comprehensive understanding of the immune profile of GD and its potential implications for prognosis and therapy. The sample abundance scores for immune cells were separately calculated for patient samples from different clinical stages and ages [[77]29]. The flowchart in [78]Fig. 1 summarizes the preparation, processing, analysis, and validation of the data. Fig. 1. [79]Fig. 1 [80]Open in a new tab Flowchart describing the schematic overview of the study design for investigating immune infiltration characteristics in patients with GD. 2.2. Data acquisition In this study, transcriptomic datasets were collected from the NCBI GEO database to investigate epigenetic profiling in CD4^+ and CD8^+ T cells of patients with GD. The datasets contained information on changes in genes related to T cell receptor signaling, providing valuable insights into the underlying mechanisms of GD. The study sample consisted of 38 GD patients and 31 healthy individuals from the [81]GSE71956 cohort. The researchers extracted total RNA from sorted CD4^+ and CD8^+ T cells to analyze DNA methylation patterns of specific genes and examine their correlation with gene expression profiles in GD. This approach enabled the identification of potential targets for therapeutic intervention and a deeper understanding of the molecular pathways involved in GD pathogenesis. The study followed the PRISMA statement recommendations to ensure transparency and reproducibility of the research process. 2.3. Data pre-processing Before the gene expression analysis, the raw data were preprocessed using the Limma package in R/Bioconductor, with version R 2.14.0. Quantile normalization was conducted on the data to ensure the accurate representation of gene expression levels, which was implemented using the Limma package. To match the microarray probe sets with their corresponding genes, an Affymetrix-released annotation file for each platform was used. To eliminate redundancy, probes that mapped to multiple genes were removed, and the average expression level of the probes that mapped to the same gene was used. Lastly, hierarchical clustering was carried out to eliminate any noisy data from the analysis. 2.4. Gene co-expression network reconstruction The WGCNA approach was employed to identify co-expressed gene modules related to cellular senescence and immune infiltration in the GD [[82]28,[83]30]. To begin with, three biological co-expression networks of immune pathways were constructed using the WGCNA R package, and a weighted adjacency matrix and a topological overlap matrix were built to evaluate the correlation among the genes involved in the senescence [[84]30,[85]31]. The authors identified biologically relevant modules that had a minimum size of 30 and a split depth of 3. To understand the association of gene sets with the disease immunophenotype, the module-phenotype relationship was calculated, and the study highlighted that the glioma-immune microenvironment can impact cellular senescence's pro- or anti-immunopathology tendencies and therapy response. Additionally, the IPS, MCP, and Xcell algorithms were employed to evaluate the immune cell infiltration level, and the correlation between the immune score and hub genes was determined. Finally, the Chi-square test was carried out to assess the significance of immuno-module genes in each group, to comprehend the immune infiltration-related cellular senescence genes in glioma. 2.5. Gene and pathway enrichment analysis To gain insights into the biological processes and molecular functions related to the targeted immune hubs in glioma, several analyses were conducted using multiple datasets such as DisGeNET [[86]32], Clinical Variations [[87]24], DisGeNET [[88]25], BeFree [[89]26], BioCyc [[90]27], KEGG [[91]28], and MSigDBBioCarta [[92]30]. ToppGene and the clusterProfiler package were used to perform GO analysis and KEGG analysis to uncover enriched pathways across multiple datasets [[93][33], [94][34], [95][35]]. The gene set enrichment analysis (GSEA) method was used to conduct pathway enrichment analysis based on 50 hallmark pathways from the Molecular Signature Database [[96]36]. The Enrichplot package was used to visualize the results, and Kaplan-Meier diagrams were created to determine the prognostic pattern of the top overlapping oncogenic pathways. These analyses aimed to reveal the underlying mechanisms of the targeted immune hubs and their impact on patient prognosis [[97]37]. 2.6. Protein-protein interaction network construction To analyze the relationships between genes, a protein-protein interaction (PPI) network was constructed from the genes in the enriched modules. The PPI network was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) and visualized using Cytoscape [[98]38]. In the context of our analysis, STRING served as a valuable resource for comprehending the functional associations and potential interactions between genes. The constructed PPI network, visualized through Cytoscape, provided an insightful representation of gene connectivity and potential functional relationships. Hub genes were identified as those with node connectivity of ≥5, indicating that they play a crucial role in the network due to their many connections to other genes. The PPI network visualization allows for the identification of potential interactions and functional relationships between genes, providing valuable information for further analysis. The integration of STRING and Cytoscape in this analysis not only enhances the clarity of the PPI network but also contributes to the identification of crucial interactions, providing a foundation for further in-depth analysis of the functional relationships between genes. 2.7. Development of a comprehensive and predictive nomogram To investigate the relationship between the expression of the key immune-related gene and clinical grade, the researchers utilized the survival package and survminer package in R for survival analysis. Clinicopathological grading and survival time records of the GD cases in the dataset were analyzed, and the correlation between the target immune-related gene and clinical grade was evaluated. The Kaplan-Meier method was employed to plot the survival curve, with the log-rank test used as a statistical significance test. A p-value of less than 0.05 was considered significant. Univariate and multivariate Cox regression analyses were conducted to identify independent prognostic factors. A comprehensive and predictive nomogram was developed using the "rms" R package, which incorporated the immune-score signature, age, gender, and stages. The performance and accuracy of the nomogram were evaluated using receiver operating characteristics (ROC) and calibration curves. 2.8. Data analysis All data analysis in this study was carried out using the R software, version 4.0.3. Statistical tests such as the Wilcoxon signed-rank test were used to compare two groups, while the Kruskal-Wallis test was applied to compare three or more groups. A p-value less than 0.05 was considered statistically significant, and all tests were two-sided. Descriptive statistics such as mean ± standard deviation (SD) and medians (ranges) were used to present normally and non-normally distributed variables, respectively. Categorical variables were reported as numbers and percentages. T-tests or one-way ANOVA were used for data analysis, while Spearman's rank correlation coefficient test was used to identify the effect of the cut-off threshold and construct the ROC curve. The relationship between sensitivity and specificity was determined between groups. An adjusted p-value less than 0.05 was considered statistically significant. 3. Results 3.1. Identification of immune infiltration-related senescence genes In this study, we aimed to identify senescence genes that have a role in the immune system and their connection with chronic diseases and GD. To achieve this, we utilized three algorithms - IPS, MCP, and Xcell - to measure different immune cell subsets in the TCGA cohort based on their phenotypic characteristics. S1 Figure shows the results of these analyses for each sample, and we examined the immune molecular characteristics at different stages of CD4^+ and CD8^+ GD samples. Our results indicated a significant decrease in immune scores as GD progressed, while purity decreased in higher grades ([99]Fig. 2A). We also observed that GD was more prevalent in the 30–50 age group than in those above 50 years old ([100]Fig. 2B). By using these algorithms, we evaluated the immune scores and infiltrations of various immune infiltration components such as B cells, M1 Macrophages, M2 Macrophages, NK cells, and Tregs at different stages of GD ([101]Fig. 2C). Our analysis indicated that the infiltration of these immune cells was lower in the later stages of GD, except for DCs, which increased in severe stages. We also found a positive correlation between GA-related genes and immune scores. Additionally, we assessed the accuracy of the immunogenicity score of IPS, AZ-IPS, and CP-IPS and observed that different immunogenicity algorithms of GD decreased in the late stages of the disease. However, the MHC-I binding prediction component of the IPS algorithm (MHC 1-IPC) increased in the severe stage of GA, indicating an accurate prediction of MHC binding affinity for the development of effective cancer immunotherapies ([102]Fig. 2D). Overall, our study provides insights into the role of senescence genes in the immune system and their association with chronic diseases and GD. Fig. 2. [103]Fig. 2 [104]Open in a new tab Differences in immune infiltration characteristics among patients with different stages of GD (A) and different ages (B). (C) The nine immune cell scores differed among the three stages based on IPS, MCP, and Xcell algorithms. (D) Immune cell fractions between different stages of GD. Values are mean ± SEM, *p < 0.05, **p < 0.001. 3.1.1. Immuno-module screening To analyze samples with high immune scores at the systems level, we utilized the WGCNA approach to construct a biological network. Samples were divided into high-expression outliers and low-expression groups based on the interquartile range, with values 1.5 times greater than the interquartile range being classified as high-expression outliers. Both IPC and MCP algorithms were used to analyze and select the high immune score samples ([105]Fig. 3 A and 3B respectively). Out of the 38 samples, 11 samples with low immune scores were identified as outliers and removed from the study. The high immune expression group consisted of 68% (27/38) of the TCGA samples and 32% (12/38) of the samples ([106]Fig. 3). For further investigation of immune-related genes and hubs, we performed additional analysis on the high immune score samples, given the focus of our study on GD, an autoimmune disorder. We sorted the sample results according to all immunoassay measurements, after screening them using both the IPC and MCP algorithms, as outlined in the S1 Table. Fig. 3. [107]Fig. 3 [108]Open in a new tab Immune screening of gene expression in CD4^+ and CD8^+ GD samples from the TCGA database was evaluated based on the IPS (A) and MCP (B) algorithms. The vertical gray line denotes a subset of high immune score expression outliers (right) determined by the 1.5 × interquartile range. Box plots (far right) display aggregate expression of the genes in high and low immune score groups. Statistical analysis was performed using the Mann-Whitney U test. 3.1.2. Immuno-module constructing The study utilized the WGCNA method to construct a co-expression network of genes associated with the development of GD. Analysis of the microarray data identified 1,541 differentially expressed genes, and 21 co-expression modules were found to be linked with GD ([109]Fig. 4A). We identified five modules through cluster dendrogram analysis, each labeled with a unique color: blue, cyan, dark red, midnight blue, and royal blue, which showed similar module integration ([110]Fig. 4). Among the selected immuno-module, our analysis resulted in a scale-free network that displayed a relative balance between scale independence and means connectivity when we used a soft thresholding power of 6 ([111]Fig. 4B). Interestingly, the blue module contained taxonomic genes that were different from the genes in the other modules. We employed eigengene co-expression analysis to merge the dynamic modules into three using a threshold of 0.5, and this process confirmed the reliability of the dark-red module, as illustrated in [112]Fig. 4B. Our study's key finding was that 75% (52/69) of the genes in the dark-red module were significantly correlated with immune response in GD, making it the critical module between the normal mild GD and severe GD groups (p = 0.035). We have listed the genes of the selected immune module in the S2 Table. We also conducted dimensionless topological network analysis between different stages and age groups to identify potential candidate key immunogens linked to GD progression, which was used to scale the adjacency matrix ([113]Fig. 4C). Our results indicated that the dark-red module was the most significant module linked to late-stage GD and contained 69 mRNAs that were potential candidates for further investigation ([114]Fig. 4D; p = 0.004). Moreover, our module correlation analysis revealed statistically significant positive correlations between age and pathological stage characteristics with the dark red, midnight blue, and royal blue modules (all p < 0.05). Fig. 4. [115]Fig. 4 [116]Open in a new tab Weighted gene correlation network analysis (WGCNA) of CD4^+ and CD8^+ GD samples based on stage (mild, moderate, and severe) and age range. (A) Clustering dendrogram and module-trait analysis. Different colors of the column indicate different hub modules. (B) Analysis of the scale-free fit index and mean connectivity for various soft-threshold powers. (C) Clustering dendrogram of all genes with dissimilarity based on the topological overlap and assigned module colors. (D) Heatmap for the correlation between immune modules and traits. (For interpretation of the references to color in this figure legend, the reader is