Abstract

Background
Systemic lupus erythematosus (SLE) is an autoimmune disease that involves multiple organs. However, current SLE-related biomarkers still lack sufficient sensitivity, specificity and predictive power for clinical application. It is therefore important to explore new immune-related biomarkers for SLE diagnosis and development.

Methods
We obtained seven SLE gene expression microarray datasets (GSE121239, GSE11907, GSE81622, GSE65391, GSE100163, GSE45291 and GSE49454) from the GEO database. First, differentially expressed genes (DEGs) were screened using GEO2R, and SLE biomarkers were screened by combining WGCNA, random forest, SVM-RFE, correlation with SLEDAI and differential gene analysis. Receiver operating characteristic (ROC) curves and AUC values were used to assess clinical value. The expression level of the biomarker was verified by RT‒qPCR. Subsequently, functional enrichment analysis was used to identify biomarker-associated pathways. The ssGSEA, CIBERSORT, xCell and ImmuCellAI algorithms were applied to estimate the immune cell infiltration abundance of each sample. Single-cell data were analyzed for gene expression specificity in immune cells. Finally, the transcriptional regulatory network of the biomarker was constructed, and the corresponding therapeutic drugs were predicted.

Results
Screening with multiple algorithms jointly identified a single marker gene, MX2. Expression analysis of multiple datasets revealed that MX2 was more highly expressed in SLE than in the normal group (all P < 0.05), and the same trend was validated by RT‒qPCR (P = 0.026). Functional enrichment analysis identified the NOD-like receptor signaling pathway as the main pathway through which MX2 promotes SLE (NES = 2.492, P < 0.001, etc.). Immune infiltration analysis showed that MX2 was closely associated with neutrophils, and single-cell and transcriptomic data revealed that MX2 was specifically expressed in neutrophils.
The NOD-like receptor signaling pathway was also significantly correlated with neutrophils (r > 0.3, P < 0.001, etc.). Most of the MX2-interacting proteins were associated with SLE, and potential transcription factors of MX2 and its related genes were also significantly associated with the immune response.

Conclusion
Our study found that MX2 can serve as an immune-related biomarker for predicting the diagnosis and disease activity of SLE. It activates the NOD-like receptor signaling pathway and promotes neutrophil infiltration to aggravate SLE.

Keywords: systemic lupus erythematosus, MX2, machine learning, biomarker, immune infiltration

Introduction
Systemic lupus erythematosus (SLE), a systemic autoimmune disease characterized by abnormal activity of the immune system, is a serious threat to human health (1, 2). Accumulating evidence has demonstrated that some biomarkers have important reference value for the early diagnosis and treatment of SLE. For instance, hsa_circ_0000479 and TCONS_00483150 are novel diagnostic markers for SLE (3, 4). PGLYRP2 and DTX1 can predict disease activity in SLE patients (5, 6). DKK-1 can be used as a biomarker for active lupus nephritis (7). However, current SLE-related biomarkers still lack sufficient sensitivity, specificity and predictive power for clinical application. Given the pathological changes in the immune microenvironment, it is necessary to discover new immune-related biomarkers for SLE diagnosis and development.

MX2 (MX dynamin-like GTPase 2) encodes a protein that has nuclear and cytoplasmic forms and belongs to the dynamin family and the large GTPase family. The nuclear form is localized in a granular pattern in the heterochromatin region below the nuclear envelope. In the present study, we found that MX2 is closely related to inflammation-related pathways.
Interestingly, inflammation-related signaling pathways, such as the Toll-like receptor, NF-κB and NOD-like receptor pathways, also play an important role in the development of SLE. For instance, in SLE patients, loss of B-cell tolerance to autoantigens is controlled by Toll-like receptors in a cell-intrinsic manner (8). Toll-like receptor signaling inhibitory peptides ameliorate inflammation in animal models and human SLE (9). A recent study found that reduced TNFAIP3 and enhanced UBE2L3 synergistically activate NF-κB, thereby increasing lupus risk (10). Peli1 negatively regulates noncanonical NF-κB signaling to inhibit SLE (11). Curcumin attenuates murine lupus by inhibiting the NLRP3 inflammasome (12). Dietary oleuropein and its acyl derivatives ameliorated the inflammatory response in peritoneal macrophages of pristane-induced SLE mice and murine lupus nephritis by inhibiting the NLRP3 inflammasome pathway (13, 14). Meanwhile, MX2 has been found to play an important role in immune-related diseases and immune cells, but the function of MX2 and its detailed regulatory mechanisms in SLE remain unclear (15–18).

In the present study, five methods, namely, weighted gene co-expression network analysis (WGCNA), support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), correlation with SLEDAI and differentially expressed gene (DEG) analysis, were first applied to identify MX2 as a potential biomarker for SLE. The differential expression of MX2 in SLE patients vs. healthy individuals was validated by real-time quantitative PCR (RT‒qPCR). Then, the relationship between MX2 and clinical parameters in SLE patients was analyzed. Thereafter, the mechanism by which MX2 promotes SLE was investigated by enrichKEGG and GSEA.
Four immune infiltration analysis algorithms, namely ssGSEA, CIBERSORT, xCell and ImmuCellAI, were used to explore the relationship between MX2 and immune cells. Finally, the transcriptional regulatory network of MX2 was constructed, and the corresponding therapeutic drugs were predicted. The flowchart of this study is shown in Figure 1.

Figure 1. The flowchart of the analysis process.

Materials and methods

Data collection and processing
SLE-related public data were first downloaded from the Gene Expression Omnibus (GEO) database (19, 20), and seven datasets were collected: three datasets from PBMC samples [GSE121239 (GPL13158, Normal: 20, SLE: 292) (21, 22), GSE11907 (GPL96, Normal: 12, SLE: 110) (23), GSE81622 (GPL10558, Normal: 25, SLE: 30) (24)]; four datasets from whole blood samples [GSE65391 (GPL10558, Normal: 72, SLE: 924) (25), GSE100163 (GPL6884, Normal: 14, SLE: 55) (26–28), GSE45291 (GPL13158, Normal: 20, SLE: 292) (29), GSE49454 (GPL1261, Normal: 20, SLE: 157) (30)]. GEO2R was used for differential expression analysis of the GSE121239, GSE11907 and GSE81622 datasets. Genes with |log2 FC| > 0.5 and adj.P.Val < 0.05 were considered differentially expressed genes (DEGs). Information on the datasets is given in Supplementary File S1, and the UMAP plots of the datasets are shown in Supplementary Figure S1.

Biomarker screening
Five methods were used to screen biomarkers of SLE: WGCNA (31), SVM-RFE and RF (32, 33) based on the DEG results from the GSE121239 dataset, SLEDAI correlation analysis, and differential expression analysis of two SLE datasets (GSE11907 and GSE81622). Gene module classification and identification for the DEGs were performed using the R package “WGCNA” (34, 35). First, the samples were clustered to assess whether there were any significant outliers.
Second, the automatic network construction function was used to construct the co-expression network, for which a soft-threshold power β was calculated. Third, hierarchical clustering and dynamic tree cutting functions were used to detect modules. Finally, gene significance (GS) and module membership (MM) were calculated, the modules were linked to clinical traits, and the gene information of the relevant modules was extracted for further analysis. The SVM classifier was constructed using the R package “e1071”, and recursive feature elimination was performed with cross-validation. The “randomForest” package was used for the RF classification process. For correlation analysis, the Spearman correlation coefficient was applied, and GEO2R was used for DEG analysis of the SLE datasets (GSE11907 and GSE81622). Then, the genes shared by all five methods were selected for further analysis in this study. The specific screening conditions for each analytic method are given in Supplementary File S1.

PBMC isolation
A total of 47 samples, including 23 SLE samples and 24 healthy controls, were collected from the First Affiliated Hospital of USTC (Anhui Provincial Hospital). This study was approved by the ethics committee of the First Affiliated Hospital of USTC, and informed consent forms were signed by the patients. Peripheral blood mononuclear cells (PBMCs) were isolated from EDTA-anticoagulated blood by Ficoll-Paque density gradient centrifugation. Then, the cells were cultured in RPMI-1640 medium and maintained in a humidified incubator at 37°C with 5% CO2.

Real-time quantitative PCR
RNA was extracted using total RNA isolation reagent (Epizyme, Shanghai, China). A cDNA synthesis kit (Bioer, Hangzhou, China) was used for reverse transcription of total RNA. The primers were as follows: GAPDH (forward: 5’-GTCTCCTCTGACTTCAACAGCG-3’, reverse: 5’-ACCACCCTGTTGCTGTAGCCAA-3’) and MX2 (forward: 5’-CACCGAGCTAGAGCTTCAGGA-3’, reverse: 5’-CCGGGAAGGTCAATGATGGT-3’). GAPDH was used as the internal reference.
Relative expression was calculated by the 2^-ΔΔCt