ABSTRACT Molecular subtyping in diffuse large B‐cell lymphoma (DLBCL) facilitates drug selection. However, an integrated prognostic model based on molecular subtyping and clinical features has not been well established. Here, we retrospectively performed whole genome sequencing, whole exome sequencing, and fluorescence in situ hybridization in newly diagnosed DLBCLs, established a simplified LymphType algorithm for classification evaluation, and proposed a new integrated prognostic stratification system that combines molecular subtypes with the International Prognostic Index (IPI) scoring system in our in‐house sequencing cohort (N = 100), with validation in three public cohorts (N = 1480). Compared with the IPI scoring system and the classification algorithm model alone, the new integrated model showed the best discrimination of overall survival by concordance index (0.773 vs. 0.724 vs. 0.648). We subsequently established a four‐category risk model for the integrated prognostic model (low, low‐intermediate, high‐intermediate, and high risk), which demonstrated stronger prognostic separation across all end points (all p < 0.001) in our in‐house cohort and three validation cohorts. Collectively, the new feasible integrated prognostic stratification system contributes to accurate prognosis assessment in clinical routine and provides a new basis for follow‐up treatment.
Keywords: diffuse large B‐cell lymphoma, defined genetic subtype, LymphType, International Prognostic Index, integrated prognostic model
1. Genetic subtypes based on molecular subtyping correlated with prognosis in DLBCL.
2. Integrated genetic subtype‐based IPI model revealed prognostic discrimination.
3. New four‐category risk model defined for integrated model showed diverse prognosis.
1. 
Introduction Diffuse large B‐cell lymphoma (DLBCL), a heterogeneous disease, accounts for highest incidence in non‐Hodgkin lymphoma [[52]1, [53]2]. Disease management is challenged by heterogeneity in clinical outcomes [[54]3]. This malignancy exhibits significant clinical heterogeneity, characterized by various morphologic, genetic, and phenotypic features, which contribute to its variable prognosis and response to treatment. Combined immunochemotherapy and targeted therapy have changed the management of DLBCL over the past decade [[55]4, [56]5, [57]6]. Despite significant progress in the treatment of DLBCL, a subset of patients still experiences poor prognosis. In recent years, risk prognostic factors for DLBCL are increasingly being reported. International Prognostic Index (IPI) scoring system, including age, lactate dehydrogenase, performance status, stage, and extranodal involvement, is routinely used as global standard to predict prognostic stratification of DLBCL [[58]7, [59]8]. While useful, IPI scoring system does not fully encompass the genetic heterogeneity observed in DLBCL. The integration of genomic or transcriptomic data into existing prognostic frameworks is essential for enhancing predictive accuracy and tailoring treatment approaches to individual patient profiles. In order to better recognize the molecular mechanism of disease occurrence and development, genomic and transcriptomic abnormalities have recently been proved to be valuable prognostic biomarkers in multiple studies based on massively parallel next‐generation sequencing, playing a crucial role in the pathogenesis of DLBCL [[60]9, [61]10, [62]11, [63]12, [64]13, [65]14]. Genomic studies have identified several recurrent genetic mutations in DLBCL, such as MYD88 and CD79B [[66]15]. These mutations have been associated with specific clinical and biological features of the disease. 
Additionally, gene expression profiling has led to the identification of distinct molecular subtypes of DLBCL, which have different responses to treatment [[67]16]. A robust prediction model based on gene expression profiling facilitated the prognostic evaluation and risk stratification of patients with DLBCL [[68]9]. Recent studies have also shown that defined genetic subtypes of DLBCL are both a potential target for drug efficacy evaluation and an important biomarker for prognostic stratification [[69]17, [70]18, [71]19, [72]20, [73]21, [74]22]. Although these subtyping methods have shed light on the defined genetic subtypes, there remains a critical gap in the clinical application of genetic findings to improve patient stratification and treatment personalization. To date, an integrated prognostic model based on a molecular typing algorithm and clinical features has not been well established. Here, we aim to build a simplified algorithm that assigns six defined genetic subtypes and to propose a new integrated prognostic stratification system in newly diagnosed DLBCL, which could potentially lead to personalized treatment strategies and improved patient outcomes.
2. Results
2.1. Molecular Characteristics
The clinical characteristics of 100 newly diagnosed DLBCL patients in our Peking University Cancer Hospital & Institute (PKUCH) cohort are shown in Table [75]S1, including 61 males and 39 females, with a median age of 57 years (range, 26–89). Forty‐one percent (41 out of 100) of the patients had primary nodal disease, and the remaining 59.0% (59 out of 100) had primary extranodal disease, including testis, breast, and other sites. According to the Hans cell of origin (COO) classification [[76]23], 27.0% were germinal center B‐cell like (GCB). The flow chart of the study design is shown in Figure [77]1.
FIGURE 1. Enrolment of study cohort. 
DLBCL, diffuse large B‐cell lymphoma; R‐CHOP, rituximab, cyclophosphamide, doxorubicin hydrochloride, vincristine, and prednisone; WGS, whole genome sequencing; WES, whole exome sequencing; FISH, fluorescence in situ hybridization; IPI, International Prognostic Index; NA, not applicable.
The genomic landscape, including gene mutations, gene copy number variations (CNVs), and chromosomal CNVs, is shown in Figure [79]2. Significant associations were found between mutated MYD88 and mutations in CD79B, PIM1, IGLL5, BCL2, ETV6, KLHL14, GRHPR, and TBL1XR1, and between mutated TP53 and CD79B, PIM1, and IGLL5 mutations (all p < 0.05; Figure [80]S1). Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of the mutated genes revealed significant enrichment in pathways in cancers, ECM–receptor interaction, focal adhesion, MAPK signaling pathway, signal transduction, and human papillomavirus infection (Figure [81]3A). Gene ontology enrichment results, including biological process, cellular component, and molecular function analysis, are shown in Figure [82]S2. Based on mutational signature analysis of 96 substitution patterns using the non‐negative matrix factorization algorithm [[83]24], we discovered three mutational signatures, Signature 1, Signature 15, and Signature A, related to age at cancer diagnosis, defective DNA mismatch repair, and unknown function, respectively (Figure [84]3B).
FIGURE 2. Genomic landscape in the training cohort. The heatmap shows the top altered genes and regions (gene mutations, gene CNVs, and chromosomal CNVs) in each patient, limited to alterations detected in ≥15% of patients. CNV, copy number variation.
FIGURE 3. Genetic characteristics in the training cohort. (A) Bubble chart illustrates the mutation analysis of KEGG pathways. 
(B) Mutational signatures are displayed according to the 96‐substitution classification, grouped by substitution class. Volcano plots display the association between mutations in genes mutated in ≥15% of patients and age (C), Hans COO classification (D), stage (E), invasion organ (F), IPI (G), and DE (H), respectively. KEGG, Kyoto Encyclopedia of Genes and Genomes; GCB, germinal center B‐cell like; IPI, International Prognostic Index; DE, BCL2/MYC double expressors.
To explore the association of gene mutations with age, Hans COO classification, stage, invasion organ, IPI score, BCL2/MYC double expressors (DE), and treatment response, we performed gene‐related subgroup analyses. Across age groups, mutated ETV6 was significantly more frequent in elderly patients (p < 0.05; Figure [87]3C). Based on Hans COO classification, mutated BTG2 was significantly more frequent in the GCB group than in the non‐GCB group (p < 0.05; Figure [88]3D). In the comparison of stages, mutated BTG1 and DUSP2 were more common in patients with lower stages I–II (both p < 0.05; Figure [89]3E). Compared with primary lymphatic nodes, mutated MYD88, ETV6, PIM1, PRDM15, and FOXC1 were significantly more frequent in primary extranodal lymphomas (all p < 0.05; Figure [90]3F). Furthermore, in the IPI comparison groups, mutated ACTB was more common in the lower IPI 0–2 group (p < 0.05; Figure [91]3G). Interestingly, mutated TBL1XR1 was correlated with the DE group, while GOLGA6L2 mutation was related to the non‐DE group (both p < 0.05; Figure [92]3H). For the analysis of treatment response, we combined complete response and partial response patients into a response group, and stable disease and progressive disease patients into a nonresponse group. No mutated gene was significantly enriched in either the response or the nonresponse group (all p > 0.05).
2.2. 
Full and Simplified Versions of LymphType Algorithm Assessments
Following the LymphGen algorithm for defined genetic subtypes of DLBCL [[93]19], we developed and optimized the LymphType algorithm in house. When the NCI cohort (N = 574) was classified with both algorithms, our in‐house algorithm reached 99.8% consistency with LymphGen. The classification consistency of the A53, BN2, EZB, MCD, and ST2 subtypes reached 100%, the genetically composite subtype reached 100%, and only one patient classified as N1 by the LymphGen algorithm was assigned to the “Other” subgroup (Figure [94]4A).
FIGURE 4. Performance evaluation of full and simplified versions of LymphType algorithm. (A) Sankey plot shows the comparison of the full version of LymphType and LymphGen algorithms in NCI cohort. (B) Pie chart displays the defined genetic subtypes based on full version of LymphType algorithm in PKUCH cohort. (C) Histogram shows the comparison between different genetic subtypes based on full version of LymphType algorithm in primary lymphatic node and extranodal lymphomas in PKUCH cohort. (D) Kaplan–Meier survival curve shows the prognostic effect of OS in PKUCH cohort according to full version of LymphType algorithm. (E) Sankey plot shows the comparison of the simplified version and full version of LymphType algorithm in PKUCH cohort. (F) Confusion matrix displays the number of matches in each defined genetic subtype based on the full and simplified versions of LymphType in PKUCH cohort. OS, overall survival.
Next, we applied the LymphType algorithm to the 100 retrospective patients in our single‐center cohort. The A53, BN2, EZB, MCD, N1, ST2, and genetically composite subgroups accounted for 4.0, 20.0, 4.0, 24.0, 1.0, 3.0, and 5.0% of patients, respectively (Figure [96]4B). We further analyzed the genetic subtypes of DLBCL patients at different invasion organs. 
Compared with primary lymphatic nodes, the most common subtype was MCD in primary extranodal lymphomas (32.8 vs. 13.5%, p = 0.035; Figure [97]4C), which was consistent with a previous study [[98]19]. We then assessed the relationship between the six defined genetic subtypes and prognosis in our training cohort. Overall, molecular typing tended to distinguish overall survival (OS) among patients (Figures [99]4D and [100]S3). At present, the full version of our LymphType algorithm classifies DLBCL into six defined genetic subtypes using a probabilistic method. However, because it relies on multiple omics analyses, including whole genome sequencing (WGS), whole exome sequencing (WES), and fluorescence in situ hybridization (FISH), the full algorithm is complex and costly, which makes it difficult to use in clinical practice. Therefore, we propose a simplified algorithm for classification evaluation, based on WGS, targeted 74‐gene panel sequencing (Table [101]S2), and FISH analysis. Compared with the full version of the LymphType algorithm, the accuracy of the simplified version is as high as 99.0%, as shown in Figure [102]4E,F. One patient (one out of 100) classified as BN2 by the full version of the LymphType algorithm was assigned to the “Other” subgroup by the simplified LymphType algorithm, indicating that the simplified algorithm has nearly the same classification performance as the full version and can be used in the subsequent research.
2.3. Integrated IPI and Simplified LymphType Algorithm Prognostic Model Development
We next evaluated optimal feature selection for prognostic model development of OS. Because a composite subtype is composed of two or more single subtypes, in the prognostic analysis we assigned each composite‐subtype patient to every single subtype it contained (Figure [103]S3). 
Univariable Kaplan–Meier analysis identified age, performance status, stage, extranodal site, IPI scoring system, and the MCD and A53 genetic subtypes as possible predictive factors associated with OS (all p < 0.2), as shown in Table [104]S3. To prevent multicollinearity, we excluded variables that were identified in the univariable analysis but are already components of the IPI scoring system: age, performance status, stage, and extranodal involvement. Finally, three variables (IPI and the two defined genetic subtypes) were incorporated in the integrated model, namely the genetic subtype‐based IPI (IPI‐G) model, using a least absolute shrinkage and selection operator (LASSO) Cox regression model; the score was built as a weighted sum for each patient, based on the coefficient profiles (Figure [105]5A,B). The integrated IPI‐G prognostic score used the weighted coefficients of these variables: IPI × 1.19 + MCD × 1.66 + A53 × 2.79 (IPI scored as 0–3, where 0 denotes low risk [LR], 1 low‐intermediate risk [LIR], 2 high‐intermediate risk [HIR], and 3 high risk [HR]; each of the two genetic subtypes scored as 1 when present). A prognostic nomogram that integrated all three variables from the LASSO Cox regression model was constructed (Figure [106]5C). To assess the calibration of the nomogram for predicting OS, calibration curves were built to compare predicted and observed OS probabilities in the training PKUCH cohort and the validation NCI, BCA, and DHP cohorts (Tables [107]S4–S6). All the calibration curves from the training cohort and three validation cohorts were well fitted (Figure [108]S4).
FIGURE 5. Predictive performance evaluation of the integrated IPI‐G prognostic model. (A) Ten‐fold cross validation curve for tuning parameter selection. The vertical and horizontal axes represent mean square error and λ, respectively. 
(B) The coefficient curve for the tuning parameter. The vertical and horizontal axes represent the feature's coefficient and λ, respectively. (C) Nomogram model for predicting OS. (D) Predictive performance based on C‐index comparisons in three models, including IPI‐G, IPI, G models. (E) ROCs represent the predictive performances in the three models in the training PKUCH cohort. (F) Four‐category risk group defined for the integrated IPI‐G prognostic model. LASSO, least absolute shrinkage and selection operator; λ, lambda; OS, overall survival; C‐index, concordance index; IPI, International Prognostic Index; IPI‐G, genetic subtype‐based IPI; G, genetic subtype; ROC, receiver operating characteristic curve; LR, low risk; LIR, low‐intermediate risk; HIR, high‐intermediate risk; HR, high risk.
To investigate the difference in prognostic performance among the IPI model, the genetic subtype (G) model, and the new integrated IPI‐G model, we compared their discrimination ability using the concordance index (C‐index). The new IPI‐G prognostic model showed the best discrimination of OS, with a C‐index of 0.773 in the PKUCH cohort, compared with 0.724 for the IPI score and 0.648 for the classification algorithm model alone. Similar results were also seen in the NCI, BCA, and DHP cohorts (Figure [110]5D). The area under curve (AUC) for predicting 3‐year OS was higher for the integrated IPI‐G model than for the IPI model or the G model alone (AUC, 0.788 vs. 0.750 vs. 0.637) (Figure [111]5E). We further performed subgroup analysis in the PKUCH cohort, including the DE subgroup and the non‐GCB subgroup, and the performance of the IPI‐G model was demonstrated in both subgroups (Figure [112]S5). 
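As a rough illustration of the arithmetic behind the IPI‐G score and the C‐index comparison described above, the following Python sketch computes the weighted sum with the coefficients reported in the text, maps it to the four risk categories using the cut‐offs reported in the next paragraph, and estimates a simplified Harrell's C‐index. The function names, the toy data, and the handling of ties are assumptions made for illustration only; this is not the authors' code, and the model itself was fitted by LASSO Cox regression rather than by anything shown here.

```python
from itertools import combinations

# Coefficients as reported in the text: IPI x 1.19 + MCD x 1.66 + A53 x 2.79
IPI_WEIGHT, MCD_WEIGHT, A53_WEIGHT = 1.19, 1.66, 2.79


def ipi_g_score(ipi_group, is_mcd, is_a53):
    """Weighted IPI-G sum; ipi_group is 0 (LR), 1 (LIR), 2 (HIR), or 3 (HR)."""
    if ipi_group not in (0, 1, 2, 3):
        raise ValueError("ipi_group must be 0, 1, 2, or 3")
    return (IPI_WEIGHT * ipi_group
            + MCD_WEIGHT * int(is_mcd)
            + A53_WEIGHT * int(is_a53))


def risk_category(score):
    """Four-category risk group.

    Cut-offs are read as <=1.00, >1.00 to <=1.50, >1.50 to <=4.00, and >4.00
    (an interpretation of the thresholds given in the text).
    """
    if score <= 1.00:
        return "LR"
    if score <= 1.50:
        return "LIR"
    if score <= 4.00:
        return "HIR"
    return "HR"


def harrell_c_index(times, events, scores):
    """Simplified Harrell's C-index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up had an
    event. Pairs with tied times are skipped (a simplification); tied scores
    count as half-concordant. A higher score is expected to mean shorter OS.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical patients: (IPI group, MCD, A53, OS in months, death observed)
    patients = [
        (0, False, False, 60, 0),
        (1, False, False, 48, 0),
        (2, True, False, 10, 1),
        (1, False, True, 20, 1),
        (3, False, False, 15, 1),
    ]
    scores = [ipi_g_score(g, m, a) for g, m, a, _, _ in patients]
    for (g, m, a, t, e), s in zip(patients, scores):
        print(f"IPI={g} MCD={m} A53={a} score={s:.2f} group={risk_category(s)}")
    c = harrell_c_index([p[3] for p in patients], [p[4] for p in patients], scores)
    print(f"C-index on toy data: {c:.3f}")
```

In practice a maintained survival analysis library would normally be used for the C‐index; the hand‐written version is shown only to make the pairwise definition explicit.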
As the four‐category IPI scoring system has guided clinical studies, we subsequently established a four‐category risk model for the integrated IPI‐G prognostic model, mainly based on maximally selected log‐rank statistics, as follows: LR, LIR, HIR, and HR, scored at ≤1.00, >1.00 to ≤1.50, >1.50 to ≤4.00, and >4.00, respectively (Figure [113]5F). Compared with the four‐category IPI model, our new four‐category IPI‐G risk model demonstrated stronger prognostic separation across all end points and, in particular, resolved the crossing of some survival curves, both in the training PKUCH cohort from our center (Figure [114]6A) and in the validation NCI, BCA, and DHP cohorts (Figure [115]6B–D). Due to the limited sample size of each cohort, we combined the four cohorts for data analysis (N = 1209). In the combined data, the LIR and HIR survival curves crossed in the IPI model, whereas the IPI‐G model improved patient stratification (Figure [116]S6).
FIGURE 6. The four‐category risk model defined for the integrated IPI‐G prognostic model. Kaplan–Meier survival curves and Sankey plots based on four‐category integrated IPI‐G prognostic model, compared with IPI model, in the training PKUCH cohort (A), and the validation NCI cohort (B), BCA cohort (C), and DHP cohort (D), respectively. OS, overall survival; IPI, International Prognostic Index; IPI‐G, genetic subtype‐based IPI; LR, low risk; LIR, low‐intermediate risk; HIR, high‐intermediate risk; HR, high risk.
3. Discussion
In the present study, we built a simplified algorithm that assigns six defined genetic subtypes based on WGS, targeted 74‐gene panel sequencing, and BCL2 or BCL6 rearrangement status, and developed a new integrated prognostic stratification system that combines the IPI scoring system with the simplified defined genetic subtypes in DLBCL. 
Our research confirmed the landscape of genetic alterations, including gene mutations, gene CNVs, chromosomal CNVs, and BCL2 or BCL6 rearrangements, in newly diagnosed DLBCLs. The most frequently mutated genes discovered in our cohort were IGLL5 (76.0%), PIM1 (74.0%), HIST1H1B (63.0%), HIST1H1E (63.0%), and BTG2 (61.1%), consistent with the results of Sánchez‑Beato et al. [[118]25]. The prognostic value of gene mutations has been well reported in several studies. DLBCL patients with TP53 mutations had shorter survival [[119]12, [120]26–[121]28]. Mutations in CD79B, ETS1, and CD58 were associated with significantly inferior survival [[122]25]. NOTCH1 mutations, independent of established clinical variables, were significantly associated with poorer survival [[123]29]. A related study has shown that patients can be stratified via a gene expression profiling‐based model [[124]9]. In our study, we identified several genes that align with those reported in the referenced article, such as HLA‐B, ZFP36L1, and ITPKB. However, the key genes in our mutational model, such as TP53, MYD88, and CD79B, were not found in the gene expression model. Whether the above gene mutations are related to gene expression needs to be further studied in basic research. According to gene expression profiling, the well‐known COO classification divided DLBCLs into activated B cell and GCB subtypes, closely related to the prognosis [[125]30, [126]31]. The emergence of the Hans classification, with immunohistochemical analysis of CD10, BCL6, and MUM1, made COO classification an easier method in clinical practice [[127]23]. Molecular subtyping studies in DLBCL based on genetic information have been reported gradually in recent years [[128]12, [129]17, [130]19, [131]20, [132]25, [133]32], leading to the proposals of novel defined genetic subtypes determined by distinct genetic patterns. In 2018, Staudt et al. 
[[134]17] first identified four prominent genetic subtypes, including MCD, BN2, N1, and EZB, in 574 DLBCL patients, providing a potential classification for precision‐medicine strategies. Almost at the same time, Shipp et al. [[135]18] performed a comprehensive genetic analysis in 304 primary DLBCLs and discovered five distinct DLBCL subsets, including Cluster 0–5. In 2020, the above research group, Staudt et al. [[136]17], then proposed a seven‐classification algorithm, named LymphGen algorithm, containing A53, BN2, EZB‐MYC^+, EZB‐MYC^−, MCD, N1, and ST2. They discovered that distinct genetic subtypes had different prognoses and pathway dependencies, suggesting that drug use could be guided according to genetic subtype [[137]19]. The Phoenix trial concluded that DLBCL patients (aged ≤60 years) with MCD or N1 subtypes experienced significantly improved event‐free survival when treated with ibrutinib plus rituximab, cyclophosphamide, doxorubicin hydrochloride, vincristine, and prednisone (R‐CHOP) regimen, compared with R‐CHOP alone [[138]33]. Building on the LymphGen algorithm [[139]19], several research groups have optimized it and proposed simplified versions, including Sakaida et al. [[140]32] from a Japanese cohort, Sánchez‐Beato et al. [[141]25] from a Spanish cohort, and Zhao et al. [[142]20] from a Chinese cohort. Unlike the other studies mentioned above, the samples of enrolled patients in our study were all processed under unified data acquisition conditions, such as the same sequencing panel for WES and the same sequencing platform for WES and WGS, to ensure the comparability of results, which makes the integrated IPI‐G model derived from our study more convincing. Our six defined genetic subtypes, including A53, BN2, EZB, MCD, N1, and ST2, were similar to those previously reported, and the most frequent subtype in our PKUCH cohort was MCD, characterized by cooccurrence of MYD88 and CD79B mutations [[143]13, [144]17, [145]19]. 
Unfortunately, since our design did not incorporate the MYC rearrangement status, our algorithm could not further classify the EZB subtype into EZB‐MYC^+ or EZB‐MYC^−. The complexity of the algorithms mentioned above, which use a large number of gene mutations, gene CNVs, chromosomal CNVs, and rearrangements to define the genetic subtypes, makes them challenging to perform in real‐world clinical routine. Therefore, the proposed simplified version of the LymphType algorithm, which showed 99.0% consistency with the full version (itself 99.8% consistent with LymphGen), can be more convenient for clinical use, as the simplified version replaces the WES involved in molecular typing with multigene panel sequencing. Since the core determinant of the A53 subtype is chromosomal CNV [[146]19], WGS data can be used to identify chromosomal CNV more accurately. The exact classification of A53 is also conducive to the accurate determination of other subtypes. Therefore, our simplified version of the algorithm incorporates WGS. From the point of view of clinical translation, our simplified version of the algorithm overcame significant obstacles, including the need for complicated computational expertise and intensive cost. In terms of the sequencing process, it is relatively simple to construct the sequencing libraries from the same tissue samples, which are used for both WGS and WES. The survival rates of DLBCL patients with different defined genetic subtypes were diverse, and both the MCD and A53 subtypes were associated with poorer prognosis in our cohort, consistent with previous studies [[147]12, [148]17–[149]19, [150]22, [151]34]. 
Highlighting the significance of our finding, although the genetic mutation analysis in the DLBCL prognosis has been reported, the prognosis assessment model of gene mutations combined with clinical characteristics, such as IPI, has not been well explored. The new IPI‐G nomogram model exhibited excellent prediction ability with a C‐index of 0.773 better than IPI score system or classification algorithm alone, indicating IPI score combined with molecular subtyping plays an important role in prognostic stratification. Considering the feasibility in clinical practice, we classified the integrated model into four categories, and found the four‐category model could effectively distinguish the prognosis of DLBCL patients, especially for patients with LIR and HIR based on IPI model. Genetic subtyping results help stratify patient prognosis and select targeted drugs, ultimately enhancing clinical benefits. For example, MCD‐subtype patients can be treated with Bruton's tyrosine kinase inhibitors to enhance efficacy and prognosis [[152]21, [153]33]. In the future, newly diagnosed patients should undergo molecular typing tests for drug selection and comprehensive prognostic evaluation. Our study focused solely on an in‐depth analysis of DNA data. Given the prognostic value of RNA expression results [[154]9], we conclude that combining DNA and RNA data may more effectively differentiate patient prognosis. However, this requires further verification. There are several limitations in our current study. First, this was a single‐center retrospective study. Second, the integrated model was developed in a relatively small cohort despite external validations. Third, the role of genetic subtyping in drug efficacy was not investigated in this study. A multicenter prospective study is needed to verify the feasibility of this model and drug efficacy evaluation in the future. 
In summary, we built a new feasible integrated prognostic stratification system, consisting of the IPI scoring system and simplified defined genetic subtypes, in newly diagnosed DLBCL, contributing to accurate prognosis assessment in clinical routine and providing a new basis for follow‐up treatment.
4. Methods and Materials
4.1. Study Cohort
A total of 100 patients with newly diagnosed DLBCL per World Health Organization criteria and with eligible WGS, WES, and FISH testing data were enrolled in this retrospective study at PKUCH from January 2014 to January 2023. Diagnostic confirmation was independently performed by two expert hematopathologists. All patients had no bone marrow infiltration at diagnosis, were uniformly treated with an R‐CHOP like regimen, and were followed up until March 2024. Rearrangements of BCL2 and BCL6 were assessed by FISH analysis based on formalin‐fixed paraffin embedded (FFPE) tissues. The study was approved by the Ethics Committee at PKUCH in accordance with the Declaration of Helsinki. Informed consent was obtained from patients.
4.2. Sample Collection, Processing, and Sequencing Procedure
Fifty‐eight percent (58 out of 100) of patients had FFPE tissues with a paired normal specimen, and the remaining 42.0% (42 out of 100) of patients had only FFPE tissues. Peripheral blood samples were selected as a source for germline DNA identification. Sample collection, processing, and sequencing procedure details are provided in [155]Supporting Information: Materials and Methods. For mutation calling from WES data, MuTect2 [[156]35] was used to call mutations, including small insertions and deletions, and mutations were annotated with ANNOVAR software [[157]36]. For copy number analysis from WES data, we used an in‐house algorithm to call gene CNVs. In brief, whole exomes were divided into adjacent and nonoverlapping bins based on the exons of each gene, and the coverage of each bin was calculated. 
The coverage bias related to the GC content of the reference genome was normalized. Then, we built a baseline based on 50 healthy individuals and calculated the residuals of each bin over the baseline using a LOESS‐based method. For structural variants from WGS data, arm‐level CNVs were identified by WisecondorX [[158]37]. The variants were further filtered to remove recurrent sequencing artifacts and germline events using an in‐house list built from approximately 1000 tissue and peripheral blood samples from nonlymphoma patients, used as a normal pool and sequenced with the same WES panel. We also developed an algorithm called SomaticFinder to analyze somatic mutations in tumor‐only samples ([159]Supporting Information: Materials and Methods and Figure [160]S5).
4.3. LymphType Algorithm Development
The goal of our algorithm, named LymphType, is to assign six defined genetic subtypes using WES, WGS, and FISH data, based on LymphGen [[161]19]. The core of the LymphType algorithm is molecular typing by gene mutations, gene CNVs, chromosomal CNVs, and BCL2 or BCL6 rearrangements. Gene mutations, including missense, nonsense, silent, and frameshift mutations, were obtained from WES. Gene CNVs were derived from WES. Chromosomal CNVs, including amplification, gain, heterozygous deletion, and homozygous deletion, were obtained from WGS. The LymphType algorithm divided the patients into six single genetic subtypes (A53, BN2, EZB, MCD, N1, and ST2) and several genetically composite subtypes.
4.4. External Validation Cohorts
Three external validation cohorts were enrolled in this study (Figure [162]1). Validation cohort 1 (NCI cohort) was used for LymphType algorithm development (N = 574) and for validation of the integrated IPI‐G prognostic model and its four‐category risk model (N = 203) [[163]19]. 
Validation cohort 2 (BCA cohort) and validation cohort 3 (DFCI/HOVON84/PETAL (DHP) cohort) were both used to validate the integrated IPI‐G prognostic model and the four‐category risk model mentioned above (N = 311 and N = 595, respectively) [[164]22, [165]38].
4.5. Statistical Analysis
Statistical tests were performed using SPSS (version 22.0) or R (version 4.3.1). Continuous variables were compared using the Mann–Whitney or Wilcoxon test. Categorical variables were compared using the chi‐square or Fisher's exact test. OS was measured from the date of diagnosis to death or last follow‐up. Survival was evaluated with Kaplan–Meier curves and compared using the log‐rank test. Variables with p < 0.2 in univariable Kaplan–Meier curve analysis were selected for LASSO Cox regression analysis, which was used for data dimensionality reduction and variable selection to improve prediction accuracy and interpretation. All p values were two‐sided, and values less than 0.05 were considered statistically significant.
Author Contributions
W. L., Y. S., S. C., F. L., and J. Q. designed the study and approved the final manuscript. L. M., J. D., C. Z., and J. Z. collected the clinical samples and data. S. Y., L. C., H. Wu, H. Wang, and H. C. performed the sequencing. J. D., C. Z., and L. L. analyzed the data. L. M., J. D., L. L., and J. Q. interpreted the results. L. M., J. D., F. L., and J. Q. drafted and revised the manuscript. All authors have read and approved the final manuscript.
Ethics Statement
The study was approved by the Ethics Committee at Peking University Cancer Hospital & Institute in accordance with the Declaration of Helsinki (approval number: 2022KT163). Informed consent was obtained from patients.
Conflicts of Interest
Authors Jiayue Qin, Lixia Liu, Shunli Yang, Libin Chen, Hong Chen, Feng Lou, and Shanbo Cao are employees of Acornmed Biotechnology Co., Ltd., but have no potential relevant financial or nonfinancial interests to disclose. 
The other authors declare no conflicts of interest. Supporting information Supporting Information [166]MCO2-6-e70190-s001.docx^ (2.2MB, docx) Acknowledgments