Abstract

Background
We developed a novel system for quantifying the DNA damage response (DDR) to help diagnose Alzheimer’s disease (AD) and predict its risk.

Methods
We comprehensively estimated the DDR patterns in AD patients using 179 DDR regulators. Single-cell techniques were used to validate the DDR levels and intercellular communications in cognitively impaired patients. A WGCNA approach was employed to discover DDR-related lncRNAs, after which a consensus clustering algorithm was used to group 167 AD patients into distinct subgroups. The differences between the subgroups in clinical characteristics, DDR levels, biological behaviors, and immunological characteristics were evaluated. Four machine learning algorithms (LASSO, SVM-RFE, RF, and XGBoost) were used to select characteristic lncRNAs associated with DDR. A risk model was established based on the characteristic lncRNAs.

Results
The progression of AD was highly correlated with DDR levels. Single-cell studies confirmed that DDR activity was lower in cognitively impaired patients and was mainly enriched in T cells and B cells. DDR-related lncRNAs were discovered based on gene expression, and two heterogeneous subtypes (C1 and C2) were identified. DDR C1 belonged to the non-immune phenotype, while DDR C2 was regarded as the immune phenotype. Based on the machine learning techniques, four characteristic lncRNAs associated with DDR were identified: FBXO30-DT, TBX2-AS1, ADAMTS9-AS2, and MEG3. The riskScore based on these 4 lncRNAs demonstrated acceptable efficacy in the diagnosis of AD and offered significant clinical benefit to AD patients. The riskScore ultimately divided AD patients into low- and high-risk categories. Compared with the low-risk group, high-risk patients showed lower DDR activity, accompanied by higher levels of immune infiltration and a higher immunological score.
Prospective medications for the treatment of AD patients at low and high risk included arachidonyltrifluoromethane and TTNPB, respectively.

Conclusions
DDR-associated genes and lncRNAs significantly predicted the immunological microenvironment and disease progression in AD patients. The proposed DDR-based genetic subtypes and risk model provide a theoretical basis for the individualized treatment of AD patients.

Keywords: DNA damage response, single-cell, Alzheimer’s disease, molecular subtypes, machine learning, immunity

Background

Alzheimer’s disease (AD) is currently the most widely recognized form of dementia worldwide and is characterized by the over-accumulation of extracellular amyloid plaques and neurofibrillary tangles (1). The prevalence of AD increases with age, and more than 50 million people are affected (2). Most AD patients have a poor prognosis, with a median survival time of only 5-10 years, owing to the lack of early diagnosis and effective treatment (3). Although some FDA-approved pharmacological treatments, such as donepezil, rivastigmine, and galanthamine, have been used over the past decades to prevent the progression of AD (4, 5), the heterogeneity of AD patients limits the therapeutic efficacy of these drugs (6). Distinct molecular characteristics have been reported to be the main cause of AD heterogeneity, which is also closely related to differences in clinical outcomes (7–9). Nonetheless, the molecular mechanisms underlying AD heterogeneity remain largely unknown. Therefore, to guide the individualized treatment of AD patients, it is necessary to clarify the heterogeneity of AD and accurately distinguish the molecular characteristics of each patient.
Genomic instability is one of the cardinal features of AD, and the DNA damage response (DDR) plays an important role in maintaining genome integrity (10). The DDR comprises several well-coordinated processes, including the detection of DNA damage, the transduction of DNA damage signals, the promotion of DNA damage repair, the activation of cell cycle checkpoints, and the initiation of apoptosis when damage is irreversible (11). Intracellular DDR mechanisms enable cells to detect and repair DNA damage, and improper repair is one of the main causes of the development and progression of diseases, including AD (12–14). Recent studies have shown that the accumulation of DNA damage is a well-recognized factor in aging and plays a vital role in the initiation of AD. DDR deficiency caused by mutations in DDR regulators, including BRCA1, occurs in brain regions of AD patients and is implicated in the development of pathology (13). In addition, BRCA1 (breast cancer susceptibility gene 1), a DDR-associated gene, was found to accumulate in neurofibrillary tangles in the AD brain, and its dysregulation is positively correlated with the pathogenesis of tauopathies (15, 16). DDR-related genes have been extensively studied in non-neural cancer tissues, but less is known about them in the nervous system. Therefore, it is imperative to comprehensively elucidate the expression patterns of DDR-related regulators and the potential molecular mechanisms of DDR in AD pathogenesis.

Long non-coding RNAs (lncRNAs) are a subclass of RNA molecules that are more than 200 nucleotides in length and are not translated into proteins. They are thought to be closely involved in transcription, epigenetics, and post-transcriptional regulation (17).
An increasing number of lncRNAs have been shown to participate in AD development and pathogenesis and to serve as novel biomarkers for early diagnosis and as therapeutic targets for patients with AD (18, 19). Several studies have demonstrated that lncRNAs also play a protective role in promoting cell survival and preventing the development of various diseases by sustaining genomic stability. For example, one lncRNA acting as an upstream regulator of DDR plays a vital role in resisting heart failure by inhibiting the ataxia telangiectasia mutated (ATM)-DDR signaling pathway and increasing the activation of mitochondrial bioenergetics (20). Moreover, in the nervous system, the DDR-induced interaction of brain-specific DNA damage-related lncRNA1 (BS-DRL1) with the chromatin protein HMGB1 can improve motor function and delay the degeneration of Purkinje cells in mice (21). Another study demonstrated that lncRNA Meg3 can stabilize the DDR-related gene PTBP3 and participate in the maintenance of endothelial function (22). However, the role of the lncRNA-mediated DDR signaling pathway in AD remains unknown and needs further exploration.

In this study, we thoroughly assessed the DDR expression patterns in AD patients. Single-cell data were used to visualize the DDR landscape of different cell types in AD. Weighted gene co-expression network analysis (WGCNA) was used to identify DDR-associated lncRNAs, and 154 AD patients were then classified into heterogeneous subtypes based on their clinical traits, biological behaviors, DDR levels, and immunological characteristics. The characteristic lncRNAs associated with DDR were then selected using four machine learning techniques: least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), and eXtreme Gradient Boosting (XGBoost).
For AD patients at various risk levels, a scoring system was developed to determine their biological traits, immunological microenvironment, and prospective treatment medications. Overall, this research clarified the link between DDR expression patterns and AD heterogeneity, offering novel perspectives on how to treat AD patients on an individual basis.

Materials

Bulk transcriptome data acquisition and pre-processing

The bulk AD transcriptome data were retrieved from the GEO (Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/) database. GSE48350, GSE5281, and GSE28146, which contain lncRNA and mRNA data, are based on the GPL570 platform and included 173 normal and 80 AD brain tissue samples, 74 healthy brain tissues and 87 brain tissues from AD patients, and 8 non-AD and 22 AD brain tissue samples, respectively (23–25). GSE122063, which mainly contains mRNA data, is based on the GPL16699 platform and included 44 normal and 56 AD brain tissue samples (26). In addition, we extracted mRNA expression data for 157 normal and 319 AD brain tissue samples from the GSE33000 dataset (built on the GPL4372 platform) (27). The GSE48350 and GSE5281 datasets were combined using the ComBat function of the “sva” R package (http://bioconductor.org/help/search/index.html?q=sva/), and a total of 8 abnormally expressed samples were removed (28). In addition, because of the high proportion of non-elderly samples in the normal group of the GSE48350 dataset, we retained only the normal samples aged over 65 years for further study. Eventually, a total of 150 normal and 161 AD brain tissue samples were obtained. The other three datasets, GSE28146, GSE122063, and GSE33000, were used as validation sets.
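The sample selection described above (pooling two cohorts, dropping outlier samples, and keeping only elderly normal samples) can be sketched as follows. This is a minimal illustration in Python, not the R workflow used in the study. The `Sample` record, the `select_training_samples` helper, and the toy records are hypothetical, and applying the age filter only to GSE48350 normal samples is an assumption made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical metadata record for one brain tissue sample."""
    sample_id: str
    dataset: str   # GEO series accession, e.g. "GSE48350"
    group: str     # "normal" or "AD"
    age: int       # age in years


def select_training_samples(samples, excluded_ids=frozenset(),
                            age_filtered_dataset="GSE48350", min_age=65):
    """Drop outlier samples and non-elderly normal samples.

    Assumption: the age filter applies only to normal samples from
    `age_filtered_dataset`; AD samples are never filtered by age.
    """
    kept = []
    for s in samples:
        if s.sample_id in excluded_ids:
            continue  # flagged as abnormally expressed
        is_filtered_normal = (s.group == "normal"
                              and s.dataset == age_filtered_dataset)
        if is_filtered_normal and s.age <= min_age:
            continue  # keep only normal samples aged over min_age
        kept.append(s)
    return kept


# Toy records for illustration only; these are not study data.
toy_samples = [
    Sample("s1", "GSE48350", "normal", 45),
    Sample("s2", "GSE48350", "normal", 78),
    Sample("s3", "GSE48350", "AD", 82),
    Sample("s4", "GSE5281", "normal", 70),
    Sample("s5", "GSE5281", "AD", 88),
    Sample("s6", "GSE5281", "AD", 91),
]

kept = select_training_samples(toy_samples, excluded_ids={"s6"})
group_counts = Counter(s.group for s in kept)
print(group_counts)
```

Keeping the outlier list and the age threshold as explicit parameters makes it easy to check how the final normal and AD counts depend on each filtering decision.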
The raw data were log2-transformed and normalized using the Robust Multi-array Average (RMA) function of the “affy” R package (http://www.bioconductor.org/help/search/index.html?q=affy/). Differential analysis was performed using the “limma” R package (http://www.bioconductor.org/help/search/index.html?q=limma/), and adjusted p-values (FDR) were determined for differentially expressed lncRNAs (DElncRNAs). Genes with |log2FC| > 0.5 and FDR < 0.05 between DDR subtypes or risk groups were defined as differentially expressed genes (DEGs).

Single-cell sequencing data processing

The single-cell transcriptome data (15 mild cognitive impairment (MCI)/AD and 44 normal CSF samples) were obtained from the GEO database (GSE200164). The expression matrix was normalized with the “NormalizeData” function of the “Seurat” package (https://cran.r-project.org/web/packages/Seurat/index.html). Dataset integration and batch-effect removal were performed with the “IntegrateData” function of the “Seurat” package. Subsequently, the combined object underwent principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) analysis. Cells were filtered based on the following parameters: Cell count >3 cells, 200 genes