Abstract

We used epidemiological data from 21,195 patients with cancer and 180,407 matched controls, including in-depth analyses in 216 patients with cancer, to discover clinical and molecular determinants that predispose patients with cancer to breakthrough infections. Patients with B cell malignancies, with differential expression of CD24, CDK14 and PLEKHG1, were most susceptible to breakthrough infections, suggesting that these patients may require more booster immunisations to improve cellular responses and immune protection against COVID-19.

Subject terms: Cancer, Vaccines, RNA vaccines

Patients with cancer may have decreased humoral and T cell responses to mRNA vaccines, exposing them to increased risk of severe COVID-19^1–7. The rates of breakthrough infections remain higher, even in fully vaccinated patients with cancer, indicating that regular boosters may be required to maintain immunity^6,8,9. In October 2024, the Advisory Committee on Immunisation Practices recommended that moderately or severely immunocompromised individuals may receive additional doses based on shared clinical decision-making^10. However, it remains uncertain whether all patients, or only a subset, require regular booster doses to protect against SARS-CoV-2 infection. In this study, we used epidemiological data from 21,195 patients with cancer and 180,407 matched controls, as well as in-depth translational data from a subset of 216 patients, to discover clinical and molecular determinants that predispose to SARS-CoV-2 breakthrough infections. Transcriptomic and serological analyses were performed on a subset of patients to discover host characteristics associated with increased rates of breakthrough infections.
Overall, 21,195 individual patients with cancer (7463 receiving active treatment and 13,732 cancer survivors) with 180,407 matched controls from the general population were included across the Delta and Omicron waves. Cancer survivors were defined as individuals with a previous diagnosis of cancer who were not currently receiving any active cancer treatment. The demographic characteristics are detailed in Supplementary Table 1. More than 95% of the patients with active cancer, cancer survivors and the matched controls from the general population were fully vaccinated with at least two doses of mRNA SARS-CoV-2 vaccines (Supplementary Table 1). Despite similar vaccination rates, the cumulative incidence of SARS-CoV-2 infections among both actively treated patients with cancer and cancer survivors was higher than among matched controls, especially after the easing of nationwide non-pharmacological infection-control interventions such as mask mandates (Supplementary Fig. 1a).

To dissect the determinants associated with increased breakthrough SARS-CoV-2 infections in patients with cancer, we studied a cohort of 216 patients with cancer who participated in the Oncovax study, where blood samples were drawn before the third vaccine dose was administered, at a median of 112 days (interquartile range 91–140) after the first dose (Fig. 1a, Supplementary Table 2). Patients received primary and booster doses of mRNA SARS-CoV-2 vaccines and were followed up for SARS-CoV-2 infections for 1 year. The predominant circulating SARS-CoV-2 strain was the Omicron variant. The third vaccine dose was given ~3 months after the second dose (Fig. 1a). Patients with breakthrough infection (n = 103) showed comparable vaccination rates to those without breakthrough infection (n = 113) (Supplementary Table 3).
Consistent with the epidemiological observations in the larger cohort, patients with cancer on active treatment had similar breakthrough infection rates compared to cancer survivors (Fig. 1b). The types of active treatment, including chemotherapy, immunotherapy and targeted therapy, did not affect breakthrough infection rates (Supplementary Fig. 1b) or severity of symptoms (Supplementary Table 3). The differences in breakthrough infection rates were also not attributable to demographic characteristics, such as age, sex and ethnicity (Supplementary Table 3). Next, we examined whether tumour types were associated with breakthrough infection rates. Patients with underlying haematological malignancies had significantly higher breakthrough infection rates within 12 months of primary vaccination, regardless of whether they were on active treatment (Fig. 1c). As the tumour type had the greatest effect on breakthrough infection rates (Supplementary Table 3), we subsequently focused our gene expression and immunological analyses on the patients with haematological malignancies.

Fig. 1. Baseline characteristics of patients with cancer associated with SARS-CoV-2 breakthrough infections.

a Schematic of the study schedule for 216 patients with cancer on the Oncovax study who received at least three vaccine doses. Patients received primary and booster doses of mRNA SARS-CoV-2 vaccines and were followed up for SARS-CoV-2 infections for 1 year. Blood drawn pre-vaccine dose 3 was sent for RNA sequencing and serological assays. Created in BioRender. Zhong, Y. (2025) https://BioRender.com/z05h519. b Breakthrough infection rates among patients with cancer who are in active treatment compared to cancer survivors. c Breakthrough infection rates among patients with haematological malignancies and those with solid tumours. Chi-square test, **P < 0.01.
d Volcano plot showing increased (red) and decreased (blue) expression of genes in patients with haematological malignancies compared to those with solid tumours (fold-change > 1.5; FDR-adjusted p value < 0.05). e Venn diagram of pathway enrichment modules that were significantly enriched in patients with haematological malignancies and with breakthrough infections. f Significantly enriched modules (P195, P122 and P123) (adjusted p value < 0.05) in patients with haematological malignancies and with breakthrough infections. g Network plot of leading-edge genes in the three modules P195, P122 and P123. CD24, CDK14 and PLEKHG1 were common leading-edge genes shared between the three modules. h Breakthrough infection rates among patients with different haematological and solid tumour cancer types. One-way ANOVA was used to evaluate significance.

To understand the baseline characteristics associated with haematological malignancies and breakthrough infections, total RNA was extracted from the whole blood of 73 patients who had available samples at the pre-dose 3 time point. These patients had similar clinical features to the main Oncovax cohort of 216 patients (Fig. 1a, Supplementary Table 2). More genes were significantly downregulated in patients with haematological malignancies compared to those with solid tumours (fold-change > 1.5, false discovery rate [FDR] adjusted p value < 0.05) (Fig. 1d). To further dissect the molecular pathways and immune cell types involved, we performed Gene Set Enrichment Analysis (GSEA) using the blood transcription modules (BTMs), a curated database comprising an integrated large-scale network of publicly available human blood transcriptomes^11. B-cell modules were the most negatively enriched in patients with haematological malignancies compared to those with solid tumours (Supplementary Fig. 1c).
On the other hand, the positively enriched pathways consisted of T cell and myeloid cell modules (Supplementary Fig. 1c). Furthermore, to identify the pathways that were also associated with the higher risk of breakthrough infections, we performed GSEA comparing patients with and without breakthrough infections regardless of the tumour type (Supplementary Fig. 1d). Interestingly, the B-cell modules were also negatively enriched in patients with breakthrough infections (Supplementary Fig. 1d). Taken together, we identified that B-cell modules (P122, P123, P195) were negatively enriched in patients with haematological malignancies and with breakthrough infections (Fig. 1e, f), with CD24, CDK14 and PLEKHG1 as the top common leading-edge genes represented in the B-cell modules (Fig. 1g). Moreover, to examine the types of B cells that express CD24, CDK14 and PLEKHG1, we used the Human Protein Atlas and Monaco datasets^12. CD24, CDK14 and PLEKHG1 were most highly expressed in naïve and memory B cells (Supplementary Fig. 1e–g). Although CD24 can be detected in eosinophils, and CDK14 can be detected in basophils and neutrophils, these immune cells were not significantly enriched in patients with haematological malignancies and with breakthrough infections (Fig. 1f, Supplementary Fig. 1c, d). The negative enrichment of B-cell modules in patients with haematological malignancies and breakthrough infection led us to postulate that the type of haematological malignancy may influence the rates of breakthrough infections. We thus stratified the haematological malignancies by immune cell type.
Consistent with the gene expression data, we observed that patients with B-cell malignancies, including Hodgkin’s lymphoma, acute lymphoblastic leukaemia, diffuse large B-cell lymphoma and follicular lymphoma, accounted for a greater difference in risk profile compared to those with myeloid haematological malignancies and other types of solid tumours (Fig. 1h). In addition, to examine whether the B-cell malignancies affected the humoral responses to the mRNA vaccine, we measured the anti-spike (S) IgG titre and variant-specific neutralising antibody titres against SARS-CoV-2 variants Wuhan-Hu-1, Beta, Delta, Omicron BA.2, Omicron XBB.1.16 and JN.1. Using 72 subjects who had similar clinical features to the entire cohort of 216 patients (Fig. 1a, Supplementary Table 3), we first compared the anti-S IgG or neutralising antibody titres between patients with and without breakthrough infection. No difference in anti-S IgG or neutralising antibody titres was observed (Supplementary Fig. 1h, i). Likewise, antibody titres were not different between patients with haematological malignancies and solid tumours (Supplementary Fig. 1j, k). Thus, the increased rates of breakthrough infections in patients with B-cell malignancies were not attributable to differences in antibody responses, suggesting that antibody titres alone cannot be used to recommend the need for booster vaccinations to protect against COVID-19.

This study addresses the clinical characteristics of patients with cancer who are at higher risk of breakthrough infections using a comprehensive analysis of tumour types, cancer treatment, gene expression and serology data.
While a previous meta-analysis indicated that patients undergoing chemotherapy, targeted therapy and steroid usage had reduced seroconversion rates^13, we did not observe differences in breakthrough infection rates and antibody production in our study. This variability may arise from differences in the types of vaccines used, host genetics, environmental factors or the sampling time point at which serology was performed. Instead, we found that tumour type, especially B-cell malignancy, had a greater influence on susceptibility to breakthrough infections, likely attributable to immunosuppression generated by B-cell lymphodepletion or aplasia. However, in our study, the increased breakthrough infections cannot be fully explained by compromised antibody responses to SARS-CoV-2, as these patients can produce detectable cross-reactive neutralising antibodies after mRNA vaccination. It is thus plausible that cellular immunity may be more important for long-term protection, especially against variants of concern and when antibody titres wane over time^14–17.

The finding that CD24, CDK14 and PLEKHG1 were the downregulated leading-edge genes associated with both haematological malignancies and increased breakthrough infections highlights the involvement of B cells in vaccine-mediated protection against SARS-CoV-2. That CD24, CDK14 and PLEKHG1 are more highly expressed in naïve and memory B cells suggests that responsiveness to SARS-CoV-2 may be impaired by reduced availability and function of naïve and memory B cells. CD24 is involved in B-cell development, and the expression of CD24 declines as B cells mature^18. CDK14 regulates the cell cycle, especially mitotic progression, which could be relevant for B-cell proliferation and expansion^19. The Rho guanine nucleotide exchange factor PLEKHG1 regulates cell motility and morphology^20.
Collectively, these findings raise an interesting possibility that changes in B-cell frequencies, development and function in patients with B-cell malignancies can lead to reduced T cell responses and increased breakthrough infections. This is supported by mouse models demonstrating that T cell vaccine responses depend on B cells^21. Future studies using single-cell RNAseq and deep immunophenotyping of naïve and memory B cells in patients with B-cell malignancies will provide deeper insight into the molecular underpinnings involved.

In conclusion, our data show that tumour type matters: patients with B-cell malignancies, in particular, are at greater risk of breakthrough infections. These patients may require more frequent booster immunisations to generate an enhanced immune response and provide immune protection. In addition, antiviral approaches may be required for these patients since they may be more susceptible to symptomatic infection. Extended studies investigating the availability and function of SARS-CoV-2-specific memory B and T cells in patients with B-cell malignancies will be required to learn more about their role in protection against SARS-CoV-2 symptomatic infections.

We acknowledge the limitations of our study. First, as our study relied on SARS-CoV-2 infections reported to the Ministry of Health, it did not differentiate between symptomatic and asymptomatic infections; however, as no proactive surveillance was performed for asymptomatic individuals during the study period, many of the infections captured in our study can be expected to be symptomatic SARS-CoV-2 infections. Second, we acknowledge the possibility of other confounders such as time spent in hospital and outdoors, healthcare-seeking behaviour and habits of using personal protective equipment.
Third, while we did not detect significant differences in neutralising antibody production between patients with haematological malignancies and those with solid tumours in our study, we do not exclude the possibility that these differences may become more apparent with bigger sample sizes, as indicated by a previous report showing that patients with B-cell chronic lymphocytic leukaemia had suboptimal antibody responses^22. Finally, our study suggests that impaired cellular immunity is a likely explanation for the differences in breakthrough infection rates in patients with haematological malignancies, so future studies that thoroughly investigate effects on T cell frequencies, activation and functions in these patients will be needed to confirm their role in vaccine-mediated protection.

Methods

Clinical cohort

The multicentre cohort study was conducted by Singapore’s Ministry of Health and included all patients who were on follow-up in the public healthcare system for a diagnosis of cancer across the National Cancer Centre Singapore, National University Cancer Institute and Tan Tock Seng Hospital. Patients received the mRNA-based vaccines BNT162b2 and mRNA-1273. More details related to the study design have been previously described^8. Controls were separately identified for the Delta (15 September 2021 to 20 December 2021) and Omicron periods (6 January 2022 to 30 June 2023) for each patient with cancer through 1:5 matching based on age, sex, race and socioeconomic status (approximated using flat type). An individual who was identified as a control during the Delta period could also serve as a control during the Omicron period, which is why the number of controls is not exactly five times the number of patients with cancer. Individuals who received non-mRNA vaccines were not included in these numbers.
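To illustrate why per-period matching yields a control count that is not a fixed multiple of the case count, the following is a minimal sketch only. It assumes exact matching on the four covariates and sampling without replacement within a period; neither detail is stated in the text, and the function names, field names and toy data are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical matching keys; the text names age, sex, race and flat type.
KEYS = ("age_band", "sex", "race", "flat_type")

def stratum(person):
    """Tuple of matching covariates used as an exact-match key."""
    return tuple(person[k] for k in KEYS)

def match_controls(cases, pool, ratio=5, seed=0):
    """Assign up to `ratio` controls per case from the same stratum.

    Assumption: exact matching, without replacement within one period.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in pool:
        by_stratum[stratum(person)].append(person["id"])
    for ids in by_stratum.values():
        rng.shuffle(ids)
    matched = {}
    for case in cases:
        available = by_stratum[stratum(case)]
        n = min(ratio, len(available))
        matched[case["id"]] = [available.pop() for _ in range(n)]
    return matched

# Toy data: two cases and twelve eligible controls in a single stratum.
profile = {"age_band": "60-69", "sex": "F", "race": "Chinese", "flat_type": "4-room"}
cases = [dict(profile, id=f"case{i}") for i in range(2)]
pool = [dict(profile, id=f"ctrl{i}") for i in range(12)]

delta = match_controls(cases, pool, seed=1)    # Delta-period matching
omicron = match_controls(cases, pool, seed=2)  # Omicron-period matching

# Control slots filled across both periods versus distinct individuals used.
slots = sum(len(v) for m in (delta, omicron) for v in m.values())
unique_controls = {c for m in (delta, omicron) for v in m.values() for c in v}
print(slots, len(unique_controls))
```

In this toy setting 20 control slots are filled across the two periods, yet at most 12 distinct individuals can occupy them, because the same person may be drawn in both periods.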
Cumulative incidence rates

A competing risks regression model, adjusting for race, sex, socioeconomic status (approximated using flat type) and age categories, was fitted in Stata (version 18) to estimate the cumulative incidence of SARS-CoV-2 infections over time. The dataset, structured on a time-to-event basis, treated SARS-CoV-2 infections as the primary event and non-COVID-19-related deaths as the competing event. Based on the regression results, cumulative incidence function (CIF) graphs for each group were generated to visualise the cumulative probability of infection from 15 September 2021 to 30 June 2023 and then superimposed into one chart.

Oncovax clinical study

A subset of 216 patients with cancer was recruited prospectively from two hospitals of the National University Cancer Institute, Singapore (National University Hospital and Ng Teng Fong General Hospital) between July 2021 and March 2022. All patients had a personal history of malignancy, were ≥21 years old, and were deemed by their primary physician to be suitable to receive SARS-CoV-2 vaccination. Patients on active treatment were defined as those on systemic chemotherapy, tyrosine kinase inhibition or immunotherapy. Cancer survivors included those who were at least 3–5 months post-adjuvant chemotherapy on surveillance, those on adjuvant hormone therapy, or those on radiotherapy alone. Clinicopathological data (including age at diagnosis, sex, type of cancer and anti-neoplastic treatment at enrolment) were collected and de-identified. All patients received mRNA SARS-CoV-2 vaccination with either the Pfizer BNT162b2 vaccine or the Moderna mRNA-1273 vaccine. As this was an observational study, primary vaccination and booster doses of vaccine were given at the discretion of patients and primary physicians.
Subjects were matched with administrative data on SARS-CoV-2 vaccinations and infections reported to the Ministry of Health for the purposes of monitoring disease transmission and vaccination uptake under the Infectious Diseases Act. Although the symptom status of SARS-CoV-2 infections was not available from this database, the national directive during the study period was to perform diagnostic testing for symptomatic individuals, so the vast majority of these cases would be symptomatic infections. Administrative data on vaccination included the date and brand of vaccinations, while data on SARS-CoV-2 infections included the notification date of infection, and whether cases required treatment in hospital, supplemental oxygen or intensive care unit admission, and/or resulted in death. Blood was taken immediately before the first vaccine dose, 3–8 weeks after the first dose, 3 months after dose 1 or just before the third dose, and at 6 and 12 months post-dose 1. The National Healthcare Group Domain Specific Review Board (DSRB) provided ethical approval for this study (reference number 2021/00523). All individuals enrolled in this study provided written informed consent approved by the DSRB and in compliance with the Declaration of Helsinki principles.

Whole blood processing and bulk RNA sequencing

Whole blood samples were received on dry ice and stored at −80 °C until processing. Before RNA extraction, samples were incubated at room temperature (RT) for 2 h. RNA was isolated using the PAXgene Blood RNA Kit (Qiagen, 762174) following the manufacturer’s protocol. RNA integrity was assessed using an Agilent Bioanalyzer RNA 6000 Nano chip (Agilent, 5067-1511). For library preparation, 300 ng of RNA underwent globin and rRNA depletion with the NEBNext Globin & rRNA Depletion Kit (NEB, E7755X), followed by library construction using the NEBNext Ultra II RNA Library Prep Kit (NEB, E7775L).
The library size and concentration were verified via HT DNA Extended Range LabChip (Perkin Elmer, 760517), and libraries were pooled to equal concentrations before sequencing on the Illumina NovaSeq 6000. Raw sequencing results were processed and the raw counts were generated using the standard nf-core RNASeq pipeline version 3.6.

Bioinformatics analysis

Seventy-three samples at the pre-dose 3 time point (P7) were included in the bulk-RNAseq analysis. Three samples were excluded as one patient had an additional vaccine dose before P7 and two patients did not come for the P7 visit. Two pair-wise comparisons were made: one between cancer types (haematological malignancies versus solid organ tumours) and the other between patients with and without breakthrough infection. Raw counts were filtered for genes with at least 10 reads in a minimum of 16 samples for the cancer type comparison, and at least 10 reads in a minimum of 36 samples for the COVID-19 breakthrough status comparison, resulting in 15,577 and 14,497 genes, respectively. Samples were normalised through DESeq2’s DESeq function (v1.42.0)^23, where each sample’s size factors and gene-wise dispersions were estimated and subsequently fitted to a negative binomial distribution model. The Wald statistical test was conducted in both comparisons for downstream differential expression analysis. The lfcShrink function was used to shrink the fold-changes to prevent false positives arising from large fold-differences in low read counts. For the volcano plot, significantly different expression levels at pre-vaccination time-points were determined based on fold-change > 1.5 and FDR-adjusted p value < 0.05, using the Benjamini-Hochberg step-up FDR-controlling procedure.

Pathway enrichment analysis

GSEA was performed with clusterProfiler (v4.10.0)^24 using fold-change values as the input.
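The read-count filter and the significance thresholds described under Bioinformatics analysis can be sketched as follows. This is an illustrative pure-Python sketch, not the DESeq2 workflow itself: the function names and toy inputs are hypothetical, and the fold-change cutoff is assumed to apply symmetrically on the log2 scale to both up- and downregulated genes.

```python
import math

def filter_genes(counts, min_reads=10, min_samples=16):
    """Keep genes with at least `min_reads` reads in at least `min_samples` samples."""
    return {gene: row for gene, row in counts.items()
            if sum(c >= min_reads for c in row) >= min_samples}

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def is_significant(log2_fc, padj, fc_cutoff=1.5, alpha=0.05):
    """Assumed rule: |fold-change| > 1.5 in either direction and FDR < 0.05."""
    return abs(log2_fc) > math.log2(fc_cutoff) and padj < alpha

# Toy example with made-up counts and p values.
toy_counts = {"geneA": [12] * 16 + [0] * 4, "geneB": [12] * 15 + [0] * 5}
kept = filter_genes(toy_counts)
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(sorted(kept), padj)
```

For the breakthrough-status comparison the same filter would be called with `min_samples=36`, matching the threshold given in the text.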
The blood transcription module (BTM) database^11, curated by Li et al., was used to understand the immune cell types involved in the comparisons. The top 10 positive and negative significantly enriched pathways (adjusted p value < 0.05) in the comparisons were reported. Venn diagrams were plotted with ggvenn (v0.1.10) to determine pathways that were commonly enriched in haematological malignancies and with breakthrough infections. To determine the gene expression levels of CD24, CDK14 and PLEKHG1 in the different immune cells, we used single-cell sequencing data from the Human Protein Atlas, derived from the Human Protein Atlas and Monaco datasets. The weblinks with the information are as follows:

https://www.proteinatlas.org/ENSG00000272398-CD24/immune+cell
https://www.proteinatlas.org/ENSG00000058091-CDK14/immune+cell
https://www.proteinatlas.org/ENSG00000120278-PLEKHG1/immune+cell

Spike (S)-binding IgG quantification

Anti-S IgG was quantified using enzyme-linked immunosorbent assay (ELISA), as previously described^10. Briefly, two high-binding 96-well ELISA microplates (Greiner) were coated with 1 μg/mL Wuhan-Hu-1 S hexapro protein diluted in PBS and incubated at RT for 45 min. Plates were washed with PBS-T (0.05% Tween-20) and incubated with blocking buffer (PBS with 3% BSA) at RT for 1 h. Plasma was diluted 200× and 5000× in blocking buffer, while a standard antibody (anti-SARS-CoV-2 S RBD Neutralising Antibody, Acrobiosystems) was serially diluted 10× for the standard curve. Blocked plates were washed and then incubated with diluted plasma and antibody standard in duplicate at RT for 1 h. Plates were washed and then incubated with HRP-IgG secondary antibody (Life Technologies) at 10,000× dilution at RT for 1 h. Lastly, plates were washed and the detection reagent 3,3′,5,5′-tetramethylbenzidine (TMB) (Thermofisher) was added. The reaction was quenched with 1 M sulphuric acid/phosphoric acid.
Sample optical density was measured with a spectrophotometer at 450 nm, and concentrations in U/ml were interpolated from the standard curve using GraphPad Prism 10.0.2.

Neutralising antibody titre quantification

Neutralising antibodies against variants of concern (VOCs) were measured with a pseudotyped virus neutralisation assay (pVNT), as previously described^17. Briefly, human lung carcinoma epithelial (A549, ATCC CRM CCL-185) cells were grown and maintained in RPMI 1640 supplemented with 10% FBS. The human ACE2 gene in the pFUGW vector was introduced into A549 cells by lentivirus transduction, and the cells were maintained in RPMI 1640 supplemented with 10% FBS and 15 µg/ml of blasticidin. Human embryonic kidney (HEK293T, ATCC CRL-3216) cells were grown and maintained in DMEM supplemented with 10% FBS. SARS-CoV-2 parental (Wuhan-Hu-1), Beta, Delta, Omicron BA.2, Omicron XBB.1.16 and JN.1 full-length spike pseudotyped viruses were produced by transfecting 20 µg of pCAGGS spike plasmid into 5 million HEK293T cells using FuGENE 6 (Promega). At 24 h post-transfection, the transfected cells were infected with VSV∆G luc seed virus at an MOI of 5 for 2 h. After two PBS washes, infected cells were replenished with DMEM with 10% FBS supplemented with 1:5000 diluted anti-VSV-G mAb (Clone 8G5F11, Kerafast). Upon 80% cytopathic effect, pseudotyped viruses were harvested by centrifugation at 2000 × g for 5 min. Pseudoviruses (~3 million RLU) were pre-incubated with four-fold serially diluted test serum in a final volume of 50 μl for 1 h at 37 °C, followed by infection of A549-ACE2 cells. At 20–24 h post-infection, an equal volume of ONE-Glo luciferase substrate (Promega) was added, and the luminescence signal was measured using the Cytation 5 microplate reader (BioTek) with Gen5 software version 3.10. The 50% neutralising titre (NT50) was interpolated using GraphPad Prism 10.0.2.
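As an illustration of how an NT50 can be read off a four-fold dilution series, the sketch below uses simple log-linear interpolation between the two dilutions that bracket 50% neutralisation. This is an assumption made for illustration: the text does not state which curve model was used in GraphPad Prism, and the normalisation to a virus-only control, the starting dilution of 1:20 and all function names are hypothetical.

```python
import math

def percent_neutralisation(rlu, rlu_virus_only):
    """Percent reduction in luminescence relative to a virus-only control (assumed)."""
    return 100.0 * (1.0 - rlu / rlu_virus_only)

def nt50(reciprocal_dilutions, neutralisation):
    """Reciprocal dilution giving 50% neutralisation.

    reciprocal_dilutions: ascending values, e.g. 20, 80, 320 for a four-fold series.
    neutralisation: percent neutralisation measured at each dilution.
    Returns None if the curve never crosses 50% within the tested range.
    """
    points = list(zip(reciprocal_dilutions, neutralisation))
    for (d_lo, n_lo), (d_hi, n_hi) in zip(points, points[1:]):
        if n_lo >= 50.0 > n_hi:
            # Fraction of the way from d_lo to d_hi on a log10 dilution axis.
            frac = (n_lo - 50.0) / (n_lo - n_hi)
            log_titre = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_titre
    return None

# Toy four-fold series with made-up neutralisation values.
dilutions = [20, 80, 320, 1280]
titre = nt50(dilutions, [95.0, 80.0, 40.0, 10.0])
print(round(titre, 1))
```

A fitted sigmoidal curve would generally give a slightly different value from this two-point interpolation; the sketch only shows where the 50% crossing sits on the dilution axis.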
Statistical analysis

Statistical analyses were performed with Prism 10.3.1 software, including unpaired t-test, chi-squared test and one-way ANOVA. RNAseq and pathway enrichment analyses were performed with R, using the DESeq2 and clusterProfiler packages. Figure 1a was created in BioRender, while all other figures were created using R; every element of all figures was created by the authors of this manuscript.

Acknowledgements