Abstract

Systemic lupus erythematosus (SLE) patients exhibit a heightened risk of developing lung cancer, yet the underlying molecular mechanisms remain poorly understood. This study aimed to identify shared genetic factors linking SLE and lung cancer using publicly available transcriptomic data from the Gene Expression Omnibus (GEO). Through integrated differentially expressed gene (DEG) analysis and weighted gene co-expression network analysis (WGCNA), we identified five genes consistently upregulated in both SLE and lung cancer. Gene set enrichment analysis (GSEA) revealed that these shared genes were enriched in inflammatory pathways, particularly the interferon-alpha, interferon-gamma, and general inflammatory responses. We applied least absolute shrinkage and selection operator (LASSO) regression to pinpoint potential diagnostic biomarkers and identified two key candidates: AIM2 and SLC26A8. These biomarkers demonstrated robust diagnostic performance, with area under the ROC curve (AUC) values exceeding 0.75 in both training and validation cohorts. Immune infiltration and survival analyses using The Cancer Genome Atlas (TCGA) further supported their clinical relevance. Notably, high AIM2 expression was significantly associated with poorer overall survival in female lung adenocarcinoma patients (P = 0.03), and SLC26A8 expression was significantly linked to survival outcomes only in patients with a history of smoking (P = 0.01). These findings are particularly meaningful in SLE, where most patients are female and smoking is a known risk factor. Our study enhances the understanding of autoimmune-driven carcinogenesis and opens new avenues for precision medicine strategies in managing patients with SLE at risk for lung cancer.

Keywords: Systemic lupus erythematosus, Lung cancer, Tumorigenesis, Tumor immune microenvironment, AIM2, SLC26A8, Bioinformatics

1. Background

Systemic lupus erythematosus (SLE) is one of the leading causes of death in young females [1,2], with an increasing prevalence ranging from 13 to 7713.5 per 100,000 persons worldwide [3]. This chronic autoimmune disease features not only multi-organ involvement due to the disease itself but also susceptibility to various comorbidities, including an increased risk of certain types of cancer [4–7]. A meta-analysis comprising 57,890 SLE patients suggested a 1.6-fold increased risk of developing lung cancer in the SLE population compared with healthy controls (95% CI: 1.44–1.77; P < 0.00001) [8]. Research has also indicated that the standardized mortality ratio for lung cancer in individuals with SLE is 2.3 (95% CI: 1.6–3.0) relative to the general population [9]. The distribution of lung cancer subtypes in SLE patients resembles that in lung cancer patients from the general population, with adenocarcinoma being the most prevalent subtype, followed by small cell carcinoma and squamous cell carcinoma [10]. While most SLE patients who develop lung cancer are ever-smokers, indicating that tobacco smoking is a shared risk factor for both SLE and lung cancer, a proportion of nonsmoking SLE individuals still develop lung malignancies with histological types not associated with smoking [11]. Although the exact mechanism of tumorigenesis in the SLE population remains elusive and complicated, emerging evidence suggests that persistent immune activation and pulmonary fibrosis may create a microenvironment conducive to the carcinogenesis of lung tissues [12–14]. Additionally, an increasing number of studies indicate that genetic susceptibility contributes to SLE and the occurrence of lung malignancy [15–20], suggesting a potential genetic link that may predispose individuals with SLE to the development of lung cancer.
The identification of shared biomarkers that promote both tumorigenesis and autoimmunity is crucial for elucidating the common molecular mechanisms of SLE and lung cancer and for developing targeted therapeutic strategies. In this study, we explored the connections between SLE and lung cancer by analyzing the biological mechanisms involving DEGs shared by SLE and lung cancer patients. This investigation was conducted through comprehensive bioinformatics analysis and machine learning applied to gene expression profiles from whole blood samples from patients with these disorders.

2. Materials and methods

2.1. Data collection and processing

The workflow chart of this study is shown in Fig. 1. We searched for gene expression profiles of SLE and lung cancer patients in the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) [21] via the terms "systemic lupus erythematosus" and "lung cancer". The inclusion criteria were: (1) Homo sapiens samples; (2) array or sequencing data including both case and control groups; (3) samples obtained from whole blood cells or peripheral blood mononuclear cells; and (4) more than 30 samples per dataset. Eventually, GSE72509 (GPL16791; containing 99 SLE and 18 control whole blood cell samples) and GSE42830 (GPL10558; containing 8 lung cancer and 38 healthy whole blood samples) were obtained and used as training datasets. We further chose GSE61635 (GPL570; containing 99 SLE and 30 control whole blood cell samples) and GSE42834 (GPL10558; containing 16 lung cancer and 143 healthy whole blood samples) for subsequent validation (Table 1). All expression data were downloaded into R software (version 4.4.1) using the R package "GEOquery" for downstream analysis.

Fig. 1. Flow chart.

Table 1. Information of GEO datasets containing the SLE/lung cancer patients.

GSE number  Platform  Samples                   Disease      Group           Sample type
GSE72509    GPL16791  99 patients/18 controls   SLE          Training set    Whole blood
GSE42830    GPL10558  8 patients/38 controls    Lung cancer  Training set    Whole blood
GSE61635    GPL570    99 patients/30 controls   SLE          Validation set  Whole blood
GSE42834    GPL10558  16 patients/143 controls  Lung cancer  Validation set  Whole blood

2.2. Identification of shared DEGs

GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) is a web tool that allows users to identify DEGs across self-defined groups, using Bioconductor R packages for analysis. We divided GSE72509 and GSE42830 into disease and control groups based on the disease state of each sample. For the microarray data, the force normalization function was applied, and the R packages "GEOquery" and "limma" were utilized. Significant DEGs were identified as those whose adjusted P value was ≤0.05 and whose |log2 fold change (FC)| was ≥1. Probe IDs without a corresponding gene symbol were removed. The shared DEGs from the SLE dataset and lung cancer dataset were obtained and visualized via the R package "ggvenn".

2.3. Weighted gene coexpression network (WGCNA) construction

WGCNA is a bioinformatics method that clusters genes into modules on the basis of their coexpression patterns across all samples and identifies genes strongly correlated with the phenotype and the biological processes represented by these modules [22]. We performed WGCNA via the WGCNA R package and began by excluding poor-quality genes and outlier samples through sample clustering and principal component analysis (PCA). The optimal soft power was then selected to balance the highest scale-free topology fit index (R^2) with adequate mean connectivity. We then constructed a topological overlap matrix (TOM) and computed the correlations between gene modules and phenotypes to identify modules associated with clinical traits.
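The trade-off behind the choice of soft power can be illustrated with a small sketch. It is not the study's script (that is in Additional file 1) and does not use the WGCNA R package. It assumes an unsigned network in which the adjacency of two genes is the absolute Pearson correlation raised to the soft power; the gene names and expression values are invented for illustration. It shows only how mean connectivity shrinks as the power grows, and leaves out the scale-free fit index.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, power):
    """Unsigned soft-threshold adjacency: |cor(gene_i, gene_j)| ** power."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes
        for h in genes
        if g != h
    }


def mean_connectivity(expr, power):
    """Average over genes of the summed adjacency to all other genes."""
    adj = adjacency(expr, power)
    genes = list(expr)
    per_gene = [sum(adj[(g, h)] for h in genes if h != g) for g in genes]
    return sum(per_gene) / len(per_gene)


# Hypothetical expression values (rows: genes, columns: samples).
toy_expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [1.1, 2.1, 2.9, 4.2, 5.1],
    "gene_c": [5.0, 3.0, 4.0, 1.0, 2.0],
    "gene_d": [2.0, 2.5, 1.5, 2.2, 1.8],
}

for power in (1, 6, 20):
    print(power, round(mean_connectivity(toy_expr, power), 4))
```

Raising the power suppresses weak correlations, so a very high power yields a sparse network with low mean connectivity; this is why the power is chosen as a compromise between scale-free fit and connectivity.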
Heatmaps of Spearman's correlation coefficients between the modules and disease status (SLE or lung cancer) were drawn for module selection. The scripts used in this process are available in Additional file 1.

2.4. Functional enrichment analysis

To gain insight into the potential shared pathological mechanism of SLE and lung cancer, we performed Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the shared DEGs and the module genes identified via WGCNA, using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov). Terms with a P value < 0.05 were considered significant, and the results were visualized via the R package "ggplot2".

2.5. Gene set enrichment analysis (GSEA)

GSEA was conducted via the R package "clusterProfiler" to explore the biological functions of the hub genes. GSEA is a computational method that calculates gene set enrichment scores and helps in understanding the biological significance of gene expression changes in the context of the whole transcriptome [23]. Gene expression data from SLE patients and lung cancer patients were first imported, and Spearman's correlation coefficient for each hub gene against all protein-coding genes in the transcriptome was computed. The genes were then ranked by these correlation values and tested for enrichment against known gene sets. The hallmark gene sets were downloaded from the Molecular Signatures Database (MSigDB) (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) [24,25]. Enriched gene sets with P values < 0.05, normalized enrichment scores > 1, and FDR q values < 0.25 were considered significant. The scripts used in this process are available in Additional file 3.

2.6. Validation of core shared genes

The LASSO regression algorithm, a machine learning approach that applies a penalty to the absolute values of the regression coefficients and shrinks some of them to zero, was employed via the R package "glmnet" to select the most relevant core shared genes, i.e., those with nonzero coefficients under 10-fold cross-validation. We further conducted receiver operating characteristic (ROC) curve analysis with the R package "pROC" to evaluate the diagnostic value of the selected genes for SLE and lung cancer. The area under the curve (AUC) was determined to assess the accuracy and sensitivity of these hub genes in disease diagnosis. Additionally, ROC analysis was carried out on the validation datasets to further confirm the diagnostic relevance of the identified shared genes.

2.7. Single cell RNA sequencing analysis

To validate our findings at the single-cell level, we performed an integrative single-cell RNA sequencing (scRNA-seq) analysis using the publicly available dataset GSE136035, which includes peripheral blood mononuclear cells (PBMCs) from four SLE patients and two healthy controls. Raw gene expression matrices were downloaded and processed using the Seurat (v5.0) R package. Each sample was independently loaded and filtered based on standard quality control criteria (cells with >200 and <6000 detected features and <10% mitochondrial gene content). After normalization and identification of highly variable features, samples were integrated using Seurat's anchor-based integration workflow. Dimensionality reduction was conducted via PCA, followed by clustering and Uniform Manifold Approximation and Projection (UMAP) visualization. For cell-type annotation, we employed the SingleR package using the BlueprintEncodeData reference. Predicted cell-type identities were added to the metadata of the integrated object, enabling cluster-level analysis of gene expression across immune cell subsets.
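The quality control rule stated above can be written down compactly. The study applied it in Seurat (R); the Python sketch below is only an illustration of the same thresholds. The `Cell` record, its field names, and the example cells are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str          # hypothetical cell identifier
    n_features: int       # number of genes detected in the cell
    percent_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, min_features=200, max_features=6000, max_percent_mito=10.0):
    """Keep cells with >200 and <6000 detected features and <10% mitochondrial content."""
    return (
        min_features < cell.n_features < max_features
        and cell.percent_mito < max_percent_mito
    )


# Invented example cells, one per failure mode plus one that passes.
cells = [
    Cell("cell_1", 1500, 3.2),   # passes all thresholds
    Cell("cell_2", 150, 2.0),    # too few features (likely empty droplet)
    Cell("cell_3", 7200, 4.1),   # too many features (possible doublet)
    Cell("cell_4", 2100, 18.5),  # high mitochondrial fraction (stressed cell)
]

kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(kept)
```

The bounds are strict inequalities, matching the wording of the criteria, so a cell with exactly 200 features or exactly 10% mitochondrial content is excluded.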
To quantify transcript-level differences across conditions, we implemented a pseudobulk differential expression approach. For each sample, raw UMI counts were aggregated across all cells to generate sample-level gene expression matrices, which were then analyzed using DESeq2. This strategy accounts for sample-level biological variability while preserving the benefits of single-cell resolution. This single-cell level validation complements our bulk transcriptomic analysis and helps clarify the cell-type-specific expression patterns of candidate genes.

2.8. Immune cell infiltration

The Tumor IMmune Estimation Resource (TIMER) (https://cistrome.shinyapps.io/timer/) is a dedicated online tool designed to analyze the infiltration of immune cells in various cancer types via gene expression data [26]. The gene module of TIMER provides insights into the tumor microenvironment by estimating the correlation between the expression of a specific gene and the abundance of six immune cell types (B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells) in selected cancer samples.

2.9. Survival analysis

The Kaplan–Meier plotter (https://kmplot.com/analysis/) is a popular web server that evaluates the correlation between a gene of interest and patient survival in various cancer types [27]. The patient samples were split into two groups on the basis of the median expression of the gene of interest. Biased arrays were excluded, and a P value < 0.05 was considered significant.

3. Results

3.1. Identification and analysis of shared DEGs associated with SLE and lung cancer

We collected gene expression datasets of SLE and lung cancer patients and filtered the significant DEGs between the control and disease groups in each dataset via GEO2R with cutoff thresholds of P value < 0.05 and |log2FC| > 1.
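The DEG cutoff and the cross-dataset intersection can be sketched as follows. This is a minimal Python illustration, not the GEO2R/limma workflow the study ran. It uses the thresholds stated in Section 2.2 (adjusted P value ≤ 0.05 and |log2FC| ≥ 1) and assumes that a shared DEG must change in the same direction in both datasets. The row layout, gene names, and values are invented.

```python
def significant_degs(rows, max_adj_p=0.05, min_abs_log2fc=1.0):
    """Return {gene: "up" | "down"} for rows that pass both cutoffs."""
    degs = {}
    for row in rows:
        if not row["gene"]:
            continue  # drop probes without a gene symbol
        if row["adj_p"] <= max_adj_p and abs(row["log2fc"]) >= min_abs_log2fc:
            degs[row["gene"]] = "up" if row["log2fc"] > 0 else "down"
    return degs


def shared_degs(degs_a, degs_b):
    """Genes significant in both datasets with the same direction (an assumption)."""
    return {gene: d for gene, d in degs_a.items() if degs_b.get(gene) == d}


# Hypothetical differential expression tables (disease vs. control).
sle_rows = [
    {"gene": "GENE_A", "log2fc": 2.1, "adj_p": 0.001},
    {"gene": "GENE_B", "log2fc": -1.4, "adj_p": 0.020},
    {"gene": "GENE_C", "log2fc": 0.6, "adj_p": 0.001},   # fails the fold-change cutoff
    {"gene": "GENE_D", "log2fc": 1.8, "adj_p": 0.030},
    {"gene": "", "log2fc": 3.0, "adj_p": 0.001},          # unannotated probe
]
lung_cancer_rows = [
    {"gene": "GENE_A", "log2fc": 1.2, "adj_p": 0.010},
    {"gene": "GENE_B", "log2fc": -1.1, "adj_p": 0.200},  # fails the P value cutoff
    {"gene": "GENE_D", "log2fc": -1.5, "adj_p": 0.004},  # opposite direction
]

shared = shared_degs(significant_degs(sle_rows), significant_degs(lung_cancer_rows))
print(shared)
```

Requiring the same direction keeps the up- and downregulated lists separate, which is how the shared genes are reported in the Venn diagram.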
There were 878 DEGs between SLE patients and healthy controls, with 117 genes downregulated and 761 genes upregulated, whereas 326 DEGs were identified between the lung cancer samples and the normal samples, including 160 downregulated genes and 166 upregulated genes (Fig. 2A and B; see also Additional file 2: Table S1 and Table S2). A Venn diagram shows the DEGs shared between SLE patients and lung cancer patients, comprising 24 upregulated genes and five downregulated genes (Fig. 2C).

Fig. 2. Identification and analyses of DEGs from the SLE and lung cancer datasets. A The volcano plot presenting DEGs in GSE72509. B The volcano plot presenting DEGs in GSE42830. C Venn diagram showing the intersected DEGs of SLE and lung cancer. D GO analysis for shared DEGs.

To gain an initial understanding of the molecular mechanisms involved in SLE and lung cancer, we imported these 29 shared DEGs into the DAVID online database for functional analysis. GO analysis revealed that the innate immune response and inflammatory response were significantly enriched (Fig. 2D).

3.2. Construction of a weighted gene coexpression network and determination of key modules in SLE and lung cancer

Weighted gene coexpression network analysis (WGCNA) was performed to identify key modules associated with SLE and lung cancer. First, we excluded three outlier samples, including two SLE samples and one healthy control sample. A soft-thresholding power of 20 was selected on the basis of the average connectivity and scale independence (Fig. 3A). A total of 32 modules associated with SLE were generated, and a heatmap was used to visualize the correlation between the modules and SLE (Fig. 3B).
The MEmagenta module (r = 0.47, p < 0.05) and the MEdarkturquoise module (r = 0.42, p < 0.05) were identified as the modules most positively correlated with SLE and were chosen as the key modules for further analysis of the GSE72509 dataset. For GSE42830, no outliers were detected, and a soft-thresholding power of 6 was selected to generate the network (Fig. 3C). The MEturquoise module was most positively correlated with lung cancer (r = 0.66, p < 0.05) and was selected for subsequent exploration (Fig. 3D). The intersection of the MEmagenta and MEdarkturquoise modules from GSE72509 and the MEturquoise module from GSE42830 resulted in 125 shared module genes (Fig. 4A).

Fig. 3. WGCNA for identification of genes and modules positively correlated with SLE and lung cancer. A The scale-free topology model for identifying the optimal soft power threshold in the SLE dataset. B The heatmap illustrating the module-trait relationships of SLE. C The scale-free topology model for identifying the optimal soft power threshold in the lung cancer dataset. D The heatmap illustrating the module-trait relationships of lung cancer.

Fig. 4. Identification and functional enrichment analysis of key module genes in SLE and lung cancer. A The Venn diagram presenting the intersection of the genes most positively correlated with SLE and lung cancer. B GO and KEGG enrichment analysis of shared module genes.

The GO and KEGG pathway enrichment analysis of these shared module genes revealed significant involvement in immune-related biological processes, such as defense response to virus, positive regulation of NF-kappaB signaling, and cytokine-mediated signaling pathways (Fig. 4B). Enriched cellular components included the cytosol, NLRP1 inflammasome complex, and plasma membrane components. Molecular function terms were enriched for RNA binding, receptor binding, and protein domain-specific binding.
KEGG pathway analysis further highlighted enrichment in immune signaling pathways, including NOD-like receptor signaling, NF-kappaB signaling, and RIG-I-like receptor signaling pathways, indicating critical roles in inflammation and cellular signaling mechanisms.

3.3. Hub gene identification and gene set enrichment analysis

To identify the shared DEGs primarily linked to both SLE and lung cancer, we intersected the genes from the MEmagenta and MEdarkturquoise modules of GSE72509, the MEturquoise module of GSE42830, and the shared DEGs. As shown in Fig. 5, five shared DEGs associated with SLE and lung cancer were identified: AIM2 (absent in melanoma 2), ANKRD22 (ankyrin repeat domain 22), CLEC4D (C-type lectin domain family 4 member D), SLC26A8 (solute carrier family 26 member 8), and TNFAIP6 (TNF alpha induced protein 6). To elucidate the functional relevance of the five hub genes, we conducted GSEA based on transcriptome-wide Spearman's correlation profiles in the SLE and lung cancer datasets. In SLE, all five genes consistently showed strong enrichment in immune-related hallmark pathways, including interferon alpha and gamma responses, inflammatory response, TNF-α signaling, IL6-JAK-STAT3 signaling, and complement activation, highlighting their potential roles in the regulation of innate and adaptive immunity (Fig. 6A). In the lung cancer dataset, some heterogeneity was observed: SLC26A8 and ANKRD22 in particular were more prominently associated with cancer-specific pathways, such as PI3K-Akt-mTOR signaling and mTORC1 signaling. Nevertheless, key immune pathways such as interferon responses, inflammatory signaling, and TNF-α signaling remained recurrent among AIM2, TNFAIP6, and CLEC4D (Fig. 6B). These findings suggest that these genes may contribute to a shared immunological axis in SLE and lung cancer, potentially bridging chronic inflammation and tumor immune microenvironment modulation.

Fig. 5. Five hub genes were obtained by intersecting the shared DEGs and key module genes of SLE and lung cancer.

Fig. 6. GSEA of core shared genes in SLE (A) and lung cancer (B).

3.4. Performance of the LASSO regression algorithm and evaluation of core shared genes

To identify the hub genes with the strongest predictive value, a LASSO regression model was established. In the SLE dataset, AIM2, SLC26A8, and TNFAIP6 retained nonzero coefficients both at the regularization parameter (λ) corresponding to the minimum mean squared error (MSE) and at the λ within one standard error (SE) of the minimum MSE (Table 2). In contrast, in the lung cancer dataset, only AIM2 and SLC26A8 retained nonzero coefficients at both λ values; ANKRD22 had a small nonzero coefficient only at the minimum-MSE λ.

Table 2. Coefficients of hub genes calculated through the LASSO regression model.

                             GSE72509 (SLE)                  GSE42830 (lung cancer)
Gene symbol (NCBI Gene ID)   Minimum MSE λ   λ within 1-SE   Minimum MSE λ   λ within 1-SE
AIM2 (9447)                  0.817           0.318           1.28            0.54
ANKRD22 (118932)             0               0               0.037           0
CLEC4D (338339)              0               0               0               0
SLC26A8 (116369)             0.474           0.127           1.09            0.32
TNFAIP6 (7130)               0.038           0.026           0               0

To assess the diagnostic potential of AIM2 and SLC26A8 for SLE and lung cancer, we performed receiver operating characteristic (ROC) curve analysis on the GSE72509 and GSE42830 datasets. The AUC values for both AIM2 and SLC26A8 indicated substantial diagnostic significance for SLE and lung cancer in the training data. To verify these findings, we further tested their diagnostic accuracy via the GSE61635 dataset for SLE patients and the GSE42834 dataset for lung cancer patients.
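What an AUC measures for a single-gene marker can be shown with a short sketch. The study computed ROC curves with the R package "pROC"; the Python function below instead uses the rank-based (Mann-Whitney) formulation, which gives the same area as the empirical ROC curve. The expression values are invented and are not taken from any of the GEO datasets.

```python
def roc_auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as half a win, which matches the area under the
    empirical ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values of one marker gene.
patients = [7.1, 6.4, 5.9, 8.0]
controls = [5.0, 6.0, 5.5, 6.5, 4.8]

print(roc_auc(patients, controls))
```

An AUC of 0.5 means the marker separates the groups no better than chance, and 1.0 means every patient has a higher value than every control.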
The AUC values for AIM2 and SLC26A8 were 0.93 and 0.85, respectively, in GSE61635, and both biomarkers also performed well in GSE42834, highlighting AIM2 and SLC26A8 as promising diagnostic markers for both SLE and lung cancer (Fig. 7).

Fig. 7. ROC curves of AIM2 and SLC26A8 on the training and validation datasets to evaluate and validate their diagnostic value for SLE and lung cancer.

To validate our findings at single-cell resolution, we further performed an integrated analysis of the GSE136035 scRNA-seq dataset, which includes PBMCs from four SLE patients and two healthy controls. The UMAP plot, annotated with SingleR-derived cell types, shows that AIM2 expression is particularly elevated in B cell subsets (Fig. 8). In the pseudobulk differential expression analysis (raw UMI counts aggregated per sample and analyzed with DESeq2), AIM2 was significantly upregulated in SLE samples compared to controls (log2 fold change = 1.025, p = 0.0356), consistent with findings from bulk RNA-seq analysis. However, SLC26A8 expression was not detected in the scRNA-seq dataset, likely due to its low or absent expression in circulating PBMCs. Additionally, cell-type-specific pseudobulk analysis was not feasible due to limited overlap in annotated immune cell subsets between the SLE and control groups after quality filtering.

Fig. 8. The UMAP visualization, integrating SingleR-predicted cell types with AIM2 gene expression, reveals a distinct enrichment of AIM2 expression within B cell populations.

3.5. Immune infiltration analysis of core shared genes in lung cancer

Given that the GSEA results for the hub genes were closely linked to immune regulation, we investigated the correlation between the expression levels of AIM2 and SLC26A8 and immune infiltration in lung cancer via TIMER. AIM2 demonstrated a significant negative correlation with tumor purity in both LUAD and LUSC (p < 0.05), suggesting that it is expressed predominantly in the tumor immune microenvironment rather than within tumor cells (Fig. 9A). Additionally, AIM2 showed moderate positive partial correlations with multiple immune cell populations, especially neutrophils and dendritic cells. These results indicate that AIM2 may play an active role in shaping or responding to immune infiltration across lung cancer subtypes.

Fig. 9. Immune infiltration analysis of core shared genes in LUAD and LUSC. A Correlation between the expression of AIM2 and the tumor immune microenvironment. B Correlation between the expression of SLC26A8 and the tumor immune microenvironment.

In contrast, SLC26A8 did not show a statistically significant correlation with tumor purity in either LUAD or LUSC (Fig. 9B). However, partial correlation analysis revealed modest but significant associations with several immune cell types in both cancer types, including B cells, CD8+ T cells, CD4+ T cells, and macrophages. While these correlations were lower than those observed for AIM2, the data suggest a potential, though weaker, immunological role for SLC26A8, particularly in the context of innate immune cell infiltration. These findings highlight the differential involvement of the two genes in the lung cancer immune landscape, with AIM2 more strongly linked to immune cell enrichment.

3.6. Assessing core shared gene expression and clinical outcomes

Finally, we investigated the prognostic significance of AIM2 and SLC26A8 in lung cancer patients via Kaplan–Meier survival analysis.
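The median split and the Kaplan–Meier estimate behind this kind of analysis can be sketched as follows. The study used the Kaplan–Meier plotter web server, so this Python version is only an illustration of the estimator, S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t, where d_i is the number of deaths and n_i the number at risk. The patient data are invented, and the log-rank test used to compare the two curves is not included.

```python
from statistics import median


def split_by_median(expression):
    """Label each patient "high" or "low" relative to the median expression."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical cohort: expression, follow-up in months, death indicator.
expression = [2.0, 8.0, 3.0, 9.0, 1.0, 7.0]
months = [30, 10, 40, 12, 50, 20]
died = [0, 1, 1, 1, 0, 1]

groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(label, curve)
```

Censored patients leave the risk set without lowering the curve, which is what distinguishes this estimate from a simple fraction of patients still alive.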
The results revealed that higher AIM2 expression was associated with poorer overall survival in the LUAD subtype (Fig. 10A; see also Additional file 4: Figure S3 A). Given the female predominance of SLE, we further explored sex-specific differences in LUAD and found that elevated AIM2 expression was significantly associated with worse outcomes exclusively in female patients (Fig. 10B). In contrast, SLC26A8 expression levels were not significantly associated with survival outcomes when stratified by lung cancer subtype or patient sex (Additional file 4: Figure S3 B and C). However, higher SLC26A8 levels were linked to poorer overall survival specifically in patients with a smoking history, but not in those who never smoked, suggesting its potential relevance in smoking-related lung cancer, where tobacco exposure is also a shared risk factor with SLE (Fig. 10C).

Fig. 10. Kaplan–Meier analysis of overall survival based on AIM2 and SLC26A8 expression. A Higher AIM2 expression was associated with poorer overall survival in the LUAD subtype. B Elevated AIM2 expression is linked to poorer overall survival exclusively in female lung cancer patients. C Data stratified by smoking history indicated that lower SLC26A8 expression is significantly associated with better survival outcomes only in patients with tobacco exposure.

4. Discussion

Systemic lupus erythematosus (SLE) is a chronic autoimmune disease characterized by systemic inflammation and immune dysregulation that affects multiple organs and carries a 2.87-fold greater mortality than that of the general population [28]. Moreover, epidemiological data revealed that SLE patients have an approximately 1.62-fold increased risk of developing lung cancer compared with the general population [29], and cancer is the second leading cause of death among SLE patients, following cardiovascular disease [30].
Over the past decade, emerging evidence has suggested a potential link between chronic inflammation and the development of various cancers [31–34]. Persistent immune activation and a dysregulated inflammatory response in SLE are thought to create a microenvironment conducive to tumorigenesis [35–37]. A previous Mendelian randomization study further suggested that genetic predisposition to SLE contributes to an increased risk of lung cancer development [38]. This elevated risk underscores the importance of investigating the molecular and genetic mechanisms linking SLE and lung cancer. While the precise mechanisms connecting SLE and lung cancer are not fully understood, our study aimed to investigate the genetic signatures correlated with SLE and lung cancer, with the goal of identifying more targeted therapies, improving cancer surveillance in SLE patients, and providing insights into the shared pathways of immune dysregulation and carcinogenesis.

In the present research, we used a comprehensive approach to identify shared gene signatures between SLE patients and lung cancer patients by analyzing data from the GEO database. Through the integrated DEG and WGCNA analyses, we identified five genes shared between the two diseases. These genes are enriched primarily in immune and inflammatory pathways, such as the interferon-gamma and interferon-alpha pathways, as determined through GSEA. Additionally, via the LASSO regression algorithm, we found AIM2 and SLC26A8 to be the genes most significantly associated with both SLE and lung cancer. ROC curves were used to assess the diagnostic value of AIM2 and SLC26A8 for SLE and lung cancer. AUC values above 0.7 in the training and validation datasets indicated that AIM2 and SLC26A8 are potential biomarkers for SLE and lung cancer.
AIM2 is one of the key pathogen recognition receptors and plays a major role in the innate immune response [39]. Located in the cytoplasm, AIM2 recognizes double-stranded DNA and initiates the formation of the inflammasome [40]. This results in the activation of caspase-1 and the production of proinflammatory interleukin-1β (IL-1β) and IL-18 [41]. This pathway is essential for protection against microbial infections. However, irregularities in AIM2 expression have been linked to various autoimmune and inflammatory conditions, including SLE [42–45]. In line with our findings, AIM2 levels are significantly increased in SLE patients and enhance B-cell differentiation by regulating the Bcl-6-Blimp-1 pathway [46]. SLE disease severity has also been shown to correlate with AIM2 expression, as inhibiting AIM2 expression significantly alleviated SLE symptoms by reducing macrophage activation and suppressing the inflammatory response in a lymphocyte-derived apoptotic DNA-induced lupus mouse model [47]. In addition, the overexpression of AIM2 is closely related to the promotion of lung tumorigenesis [48]. AIM2 inflammasome-driven IL-1β promotes HIF-1α expression via the NF-κB/COX-2 pathway, while its interaction with mitochondria activates MAPK/ERK signaling, contributing to tumor development in NSCLC [49,50].

SLC26A8 is a chloride ion transporter primarily known for its role in spermatogenesis and sperm motility [51,52]. Although its function has been well documented in reproductive biology, emerging evidence suggests that SLC26A8 may also play a role in cancer biology [53–55].
In a study by Han Z et al., SLC26A8 was found to be one of the top upregulated genes in PBMCs from patients with hepatocarcinoma, and SLC26A8, along with five other dysregulated genes in liver cancer, was significantly enriched in several pathways, including the immune response, granulocyte activation, T-cell activation, Toll-like receptor binding, and GTPase regulator activity [53]. SLC26A8 is also an eosinophil-related gene correlated with the clinical outcome of bladder urothelial carcinoma and a potential susceptibility gene for hereditary nonpolyposis colorectal cancer [54,55]. Additionally, a growing body of research has suggested that SLC26A8 is associated with autoimmune diseases and inflammatory regulation [56–59]. SLC26A8 was identified as a matrix metalloproteinase (MMP)-related disease marker that plays a role in the pathogenesis of inflammatory bowel disease [56]. A study of moderate-to-severe asthma demonstrated that SLC26A8 is associated with mucosal immunity, cell metabolism, and airway remodeling [57]. Moreover, SLC26A8 shows notable diagnostic performance in pediatric septic shock patients and is involved in the infiltration of immune cells [58]. An in vitro study further confirmed that SLC26A8 mutation triggers vasoactive neuropeptide production and contributes to neurogenic inflammation during the development of rosacea, a chronic inflammatory skin disorder [59].

In lung cancer treatment, the immune microenvironment has been recognized as crucial in tumor progression and therapeutic resistance [60]. In this study, we assessed the correlation between the core shared genes, AIM2 and SLC26A8, and the immune microenvironment in lung cancer. Our finding that elevated expression of AIM2 is positively associated with an active tumor microenvironment in LUAD and LUSC is consistent with the findings of previous studies [61,62].
In addition, we discovered that SLC26A8 has a mild but significant association with the activation of B cells and CD4+ T cells in the lung cancer microenvironment. These results underscore the potential of targeting the tumor microenvironment, particularly through AIM2 and SLC26A8, as a therapeutic approach for lung cancer patients.

We further examined the correlation between AIM2 and SLC26A8 expression and the clinical outcome of patients with lung cancer. In female patients with lung cancer, AIM2 expression was significantly correlated with overall survival, whereas no such association was observed in male patients. Our findings also revealed that in female lung adenocarcinoma patients, higher AIM2 expression is linked to worse overall survival. Given that the majority of SLE patients are female and that lung adenocarcinoma is the most common lung cancer subtype in SLE patients, these results suggest a potential sex-specific role in which AIM2 may contribute to lung cancer progression through mechanisms such as hormonal responses. AIM2 may serve as both a valuable biomarker for assessing lung cancer prognosis and a potential therapeutic target, particularly in female patients with lung adenocarcinoma. On the other hand, lower SLC26A8 expression was associated with better overall survival in lung cancer patients with a smoking history but not in nonsmokers, which implies a potential role of SLC26A8 in smoking-related pathways or inflammatory responses in lung cancer.

There are several limitations to our study. First, the current study is based entirely on in silico analyses. Second, the precise mechanisms underlying the immune responses triggered by SLC26A8 require further investigation. Therefore, in vivo and in vitro experiments are necessary to verify the role of AIM2 and SLC26A8 in tumorigenesis in SLE patients.
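The stratified survival comparisons discussed above (AIM2 by sex, SLC26A8 by smoking history) can be illustrated with a minimal sketch of a two-group log-rank test applied separately within each stratum. This is not necessarily the software or exact procedure used in this study. The `Patient` record, the `high_expr` flag (for example, expression above a chosen cutoff) and the `stratum` label are hypothetical names introduced only for illustration, and any data passed in would need to come from the reader's own cohort.

```python
import math
from collections import namedtuple

# Hypothetical record layout (an assumption, not the study's data format):
# time      = follow-up time
# event     = 1 if death was observed, 0 if censored
# high_expr = True if gene expression is above the chosen cutoff
# stratum   = label such as "female"/"male" or "smoker"/"nonsmoker"
Patient = namedtuple("Patient", "time event high_expr stratum")


def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    pooled = group_a + group_b
    event_times = sorted({t for t, e in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Events occurring exactly at time t
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def stratified_logrank(patients):
    """Run the high- vs. low-expression log-rank test within each stratum."""
    results = {}
    for stratum in sorted({p.stratum for p in patients}):
        subset = [p for p in patients if p.stratum == stratum]
        high = [(p.time, p.event) for p in subset if p.high_expr]
        low = [(p.time, p.event) for p in subset if not p.high_expr]
        results[stratum] = logrank_test(high, low)
    return results
```

Calling `stratified_logrank` on a list of `Patient` records returns one `(chi2, p_value)` pair per stratum, which mirrors the pattern reported here: an association that reaches significance in one subgroup but not in the other. Such subgroup tests are exploratory and do not by themselves establish an interaction between expression and the stratifying variable.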
In conclusion, this study provides novel insight into the molecular pathway connections between SLE and lung cancer. By utilizing multiple analytical techniques with validation via external datasets, we aimed to enhance the robustness of the results and hope to offer immediate translational benefits, such as biomarkers for diagnosis and prognosis, as well as a more comprehensive understanding of the molecular pathways linking SLE and lung cancer.

5. Conclusion

The present study revealed that the chronic inflammatory state may be linked to the mechanisms of SLE and lung cancer and that AIM2 and SLC26A8 are key shared genes. While the precise mechanisms connecting SLE and lung cancer are not fully understood, the use of integrated bioinformatics to detect and validate genes shared by SLE and lung cancer may provide potential therapeutic targets and pave the way for understanding common pathogenic mechanisms.

List of abbreviations

AIM2: absent in melanoma 2
ANKRD22: ankyrin repeat domain 22
AUC: area under the curve
CLEC4D: C-type lectin domain family 4 member D
DEGs: differentially expressed genes
GEO: Gene Expression Omnibus
GO: Gene Ontology
GSEA: Gene Set Enrichment Analysis
IL: interleukin
KEGG: Kyoto Encyclopedia of Genes and Genomes
LASSO: least absolute shrinkage and selection operator
LUAD: lung adenocarcinoma
LUSC: lung squamous cell carcinoma
MMPs: matrix metalloproteinases
MSE: minimum mean squared error
PBMCs: peripheral blood mononuclear cells
PCA: principal component analysis
PPI: protein‒protein interaction
ROC: receiver operating characteristic
scRNA-seq: single-cell RNA sequencing
SE: standard error
SLC26A8: solute carrier family 26 member 8
SLE: systemic lupus erythematosus
TNFAIP6: TNF alpha induced protein 6
WGCNA: weighted gene coexpression network analysis

CRediT authorship contribution statement

Chueh-Hsuan Hsu: Writing – original draft, Software, Formal analysis, Data curation.
Shuo-Chueh Chen: Resources, Methodology.
Yung-Luen Yu: Writing – review & editing, Supervision, Project administration, Conceptualization.

Ethics approval and consent to participate

Not applicable.

Consent for publication

All the authors have read and approved the content and agree to submit it for consideration for publication in the journal.

Availability of data and materials

The datasets generated during and/or analyzed during the current study are available in the GEO database repository (https://www.ncbi.nlm.nih.gov/geo/).

Funding

This research was funded by grants from the National Science and Technology Council, Taiwan (NSTC 110-2314-B-039-034-MY3; NSTC 112-2320-B-039-020-; NSTC 113-2314-B-039-019-MY3; NSTC 114-2314-B-039-067-), China Medical University, Taiwan (CMU108-MF-01; CMU109-MF-03), and China Medical University Hospital, Taiwan (DMR-112-198; DMR-113-020).

Declaration of competing interest

The authors declare that they have no competing interests.

Acknowledgments