Abstract

Gene and cell therapies pose safety concerns due to potential insertional mutagenesis by viral vectors. We introduce MELISSA, a regression-based statistical framework for analyzing Integration Site (IS) data to assess insertional mutagenesis risk, by estimating and comparing gene-specific integration rates and their impact on clone fitness. We characterized the IS profile of a lentiviral vector on Mesenchymal Stem Cells (MSCs) and compared it with that of Hematopoietic Stem and Progenitor Cells (HSPCs). We applied MELISSA to published IS data from patients enrolled in gene therapy clinical trials, successfully identifying both known and novel genes that drive changes in clone growth through vector integration. MELISSA offers a quantitative tool to bridge the gap between IS data and safety and efficacy evaluation, facilitating the generation of comprehensive data packages supporting Investigational New Drug (IND) and Biologics License (BLA) applications and the development of safe and effective gene and cell therapies.

Subject terms: Statistical methods, Mesenchymal stem cells, Data mining, Haematopoietic stem cells

Viral vector integration can affect the safety of gene and cell therapies. Here, the authors introduce MELISSA, a regression-based statistical tool that quantifies integration site risks and clone growth effects, aiding the safety evaluation of therapies in both research and clinical settings.

Introduction

Gene therapy using Hematopoietic Stem and Progenitor Cells (HSPCs) modified with viral vectors has emerged as a promising approach for treating rare monogenic diseases^1,2. This strategy involves introducing functional copies of therapeutic genes into patients’ HSPCs, enabling them to produce the missing or malfunctioning protein and restore normal cellular function.
Due to their efficient gene delivery capabilities, viral vectors have also found extensive application in novel cell therapies such as Chimeric Antigen Receptor (CAR)-T immunotherapies, where lentiviral or retroviral vectors are used to deliver CAR genes into a patient’s T cells. The capacity of viral vectors to integrate their cargo DNA into unpredictable locations within the host genome carries a risk of Insertional Mutagenesis (IM), defined as the disruption or dysregulation of a gene caused by a vector insertion. IM can lead to enhanced cell growth due to the activation of oncogenes or the disruption of tumor suppressor genes by the integrated DNA^3–6. Clonal dominance occurs when a particular clone or a small subset of clones of transduced cells gains a growth advantage, potentially skewing the clonal composition of the repopulating cells towards monoclonal or oligoclonal configurations that pose additional long-term safety and efficacy concerns, including oncogenic transformation^7–9. Regulatory agencies like the US FDA require rigorous preclinical safety and efficacy evaluations and 15 years of IM monitoring for patients treated with genetically modified HSPCs^10. Recently, the US FDA extended monitoring to individuals receiving BCMA- or CD19-directed autologous CAR T cell immunotherapies. A comprehensive risk-benefit analysis is crucial for evaluating the potential for IM, yet the scientific community still lacks standardized indices, metrics, and methods to clearly distinguish between safe and unsafe integration profiles. The association of the integrome, namely the distribution of IS across the host genome, with genomic annotation data has elucidated the role of chromatin conformation, transcriptional activity, and the 3D nuclear organization of the cellular genome in determining IS frequency^11–15.
These insights, combined with the analysis of oncogenic transformation mechanisms observed in clinical trials, enabled the generation of novel and safer viral vectors for HSPC gene therapies. Using IS analysis to monitor the clonal composition over time, scientists can gain insights into the diversity and dynamics of the engrafting cell population and assess both long-term efficacy and the risk of clonal dominance^7,16–18. From a safety perspective, IS analyses serve as the primary screening tool for identifying potential risks associated with uncontrolled clonal expansion by detecting abundant clones based on their relative contribution and mapping IS location within the host cell genome. Conventionally, the identification of abundant clones has been based on predefined percentage thresholds calculated on an individual sample basis. Leveraging IS information from multiple datasets to estimate the integrome and clonal contributions can improve the characterization of gene targeting preferences and associated risks and enhance the