Abstract

   Collisions of the transcription and replication machineries on the same
   DNA strand can pose a significant threat to genomic stability. These
   collisions occur in part due to the formation of RNA-DNA hybrids termed
   R-loops, in which a newly transcribed RNA molecule hybridizes with the
   DNA template strand. This study investigated the role of RAD52, a known
   DNA repair factor, in preventing collisions by directing R-loop
   formation and resolution. We show that RAD52 deficiency increases
   R-loop accumulation, exacerbating collisions and resulting in elevated
   DNA damage. Furthermore, RAD52’s ability to interact with the
   transcription machinery, coupled with its capacity to facilitate R-loop
   dissolution, highlights its role in preventing collisions. Lastly, we
   provide evidence of an increased mutational burden from double-strand
   breaks at conserved R-loop sites in human tumor samples, which is
   further elevated in tumors with low RAD52 expression. In summary, this study
   underscores the importance of RAD52 in orchestrating the balance
   between replication and transcription processes to prevent collisions
   and maintain genome stability.

   Subject terms: Stalled forks, DNA damage response, Genomic instability
     __________________________________________________________________

   Collisions of transcription and replication machineries on the same DNA
   strand threaten genomic stability. Here, the authors show that RAD52
   prevents these collisions by regulating R-loop formation and
   resolution. RAD52 deficiency leads to increased R-loops, exacerbated
   collisions, DNA damage, and higher mutational burden in tumors.

Introduction

   Transcription and replication are two tightly regulated processes
   necessary for gene expression and DNA duplication respectively, both of
   which are essential for cellular integrity. It is imperative for the
   cell to maintain temporal and spatial separation of these two processes
   to prevent them from colliding in transcription-replication conflicts
   (TRCs), which can result in replication stress and DNA damage,
   ultimately leading to genome instability and tumorigenesis. Aberrant
   accumulation of secondary structures such as R-loops has been
   implicated as a major source of TRCs^[58]1,[59]2. R-loops are
   three-stranded RNA-DNA hybrids that are formed transiently during
   transcription when the nascent RNA anneals back to the template DNA and
   displaces the non-template strand within the RNA polymerase active
   site^[60]3. Traditionally, these hybrids have been shown to play an
   important physiological role in gene activation, termination, and
   chromatin regulation. However, transcriptional dysregulation, both in
   the form of gene overexpression and aberrant RNA polymerase II (Pol II)
   pausing, has been associated with the accumulation and persistence of
   pathological R-loops^[61]4–[62]6. R-loops pose a significant threat to
   DNA replication as transcription and replication translocate on the
   same DNA template.

   As TRCs pose a considerable threat to genomic integrity, mechanisms to
   manage the collisions are required to prevent them causing undue DNA
   damage^[63]4,[64]7. Prevention mechanisms include limiting the
   accumulation of R-loops by assembling RNA-binding proteins on nascent
   RNA^[65]8,[66]9; regulating topological stress associated with
   transcription and chromatin architecture^[67]10–[68]12; removal of
   R-loops via nucleases^[69]13,[70]14 or RNA-DNA helicases^[71]15,[72]16;
   and subsequent repair of the damage resulting from
   TRCs^[73]17–[74]21. In contrast, recent studies have proposed that
   R-loops can play a major role in double-strand break (DSB) repair at
   transcriptionally active loci via homologous recombination (HR), by
   providing a scaffold for the recruitment of DNA repair factors to the
   site of damage^[75]22,[76]23. Unexpectedly, RAD52 emerged as a common
   factor in all these R-loop associated DSB repair
   pathways^[77]24–[78]28.

   Human RAD52, a protein known to have a DNA binding ability, has been
   associated with DSB repair owing to its role as a back-up HR repair
   factor^[79]29 and its synthetic lethal relationship in BRCA-deficient
   cancers^[80]30–[81]32. Surprisingly, recent studies have shown strong
   RNA-binding ability for RAD52 in vitro supporting a role in resolving
   transcription associated DSBs^[82]24–[83]28. However, given the
   interaction of RAD52 with RNA^[84]26,[85]33,[86]34 and Pol II^[87]35,
   it is conceivable that RAD52 could also be involved upstream in the
   regulation of R-loops themselves.

   To understand RAD52’s role in R-loop management, we performed mass
   spectrometry (MS) analysis of the RAD52 protein interactome. We found
   that RAD52 predominantly interacts with proteins engaged in the
   transcription complex, suggesting that RAD52 recruitment to R-loop
   sites may be facilitated via this interaction. We observed that loss of
   RAD52 induces elevated levels of Pol II pausing and R-loop accumulation
   leading to increased TRCs and genomic instability. Furthermore, we
   identified a role for the previously uncharacterized C-terminal domain
   of RAD52, in that it is essential for RAD52’s interaction with Pol II
   and helps recruit Topoisomerase IIα (TOP2A) to R-loops, in order to
   alleviate torsional stress and aid in resolving TRCs. Additionally, we
   found direct evidence of increased mutational scars at R-loop forming
   regions across tumor types and these were exacerbated in tumors with
   low levels of RAD52 expression. This study supports a role of RAD52
   directly at R-loops, and its absence contributes to increased R-loop
   associated genomic instability.

Results

RAD52 interacts with the transcriptional complex and co-localizes with RNA
Pol II

   We first sought to identify RAD52-interacting proteins under
   physiological conditions in an unbiased manner by expressing RAD52
   fused to an N-terminal HA-tag and performing an immunoprecipitation
   (IP) followed by MS (Fig. [88]1a, b, Supplementary Fig. [89]1a). MS
   identified 212 proteins significantly enriched over the HA-tag
   control. Reassuringly, RPA1, a critical subunit of the RPA
   complex and a known interactor of RAD52^[90]36–[91]39 was identified in
   this analysis (Fig. [92]1b, c, Supplementary Data [93]1). However, the
   majority of hits identified had an RNA-associated role as depicted by
   the gene ontology (GO) analysis (Supplementary Fig. [94]1b, c), rather
   than DNA repair, suggesting that RAD52 has a strong interaction with
   the transcription machinery (as highlighted in Fig. [95]1b). To
   validate this observation, we performed both a co-immunoprecipitation
   (Co-IP) and a proximity ligation assay (PLA) between endogenous RAD52
   and Pol II, finding clear evidence for their interaction independent of
   DNA or RNA (Fig. [96]1c–f, Supplementary Fig. [97]1d), corroborating
   previous observations of this interaction seen with over-expressed
   RAD52^[98]35.

Fig. 1. RAD52 association with the transcriptional complex.

   a Schematic representation of the workflow for the identification of
   RAD52 interacting proteins. HA-control and HA-RAD52 immunoprecipitation
   was performed in HEK293T cells using α-HA tagged magnetic beads for the
   pulldown followed by Mass spectrometry (MS). b Volcano plot of the
   proteins identified in RAD52 IP-MS in n = 3 biologically independent
   experiments. Mean log2 fold change in protein intensities on the x-axis
   of all replicates between HA and HA-RAD52 are plotted against the
   −log10 adjusted p-value (Student’s two-sided t-test with equal
   variance) on the y-axis. 212 proteins were identified to be
   significantly enriched. Significantly enriched proteins in blue
   (p < 0.05) and non-significant in grey. c Co-immunoprecipitation of
   endogenous RAD52 binding proteins in HeLa cells. RAD52 and IgG
   antibodies were used to immuno-precipitate proteins and analyzed by
   immunoblotting with indicated antibodies. Results reproducible for at
   least 2 biological replicates. d Schematic representation of PLA to
   visualize proximity of RAD52 protein and RNA Pol II. e Representative
   images of the nuclear PLA foci (α-RAD52: α-RNA Pol II S2) across stated
   conditions (Scale bar 10 µm). f Quantitative analysis of nuclear PLA
   foci from (e). Data are plotted as mean ± SEM. The data presented
   shows ≥ 500 nuclei from 3 biological replicates; p-values calculated
   using unpaired two tailed t-tests. g Metagene plots showing the
   distribution of the RNA Pol II and RAD52 Chromatin immunoprecipitation
   sequencing (ChIP-seq) peaks (IP/input) in HeLa cells across genes and
   the flanking regions ( ± 10 kb). TSS: Transcription Start Site, TES:
   Transcription End Site. h Heatmap representing RNA Pol II and RAD52
   ChIP-seq tracks, centered at the TSS and TES ± 10 kb, and rank-ordered
   according to RNA Pol II occupancy. i Bar chart showing how RNA Pol II
   and RAD52 peaks are distributed across different genomic regions as
   indicated. Peaks were obtained with MACS2. Genome wide distribution is
   shown on top for comparison. j Venn diagram showing the genome-wide
   overlap of RNA Pol II and RAD52 ChIP peaks called with MACS2.
   k A representative snapshot of chromosome 19 depicting RNA Pol
   II (red) and RAD52 (green) ChIP binding sites in control HeLa cells.
   Input DNA (grey) represents a negative control for background
   normalization. Schematics in Fig. 1 (a) and (d) were created with
   BioRender.com released under a Creative Commons
   Attribution-NonCommercial-NoDerivs 4.0 International license. Source
   data are provided as a Source Data file.
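
   The two axes of the volcano plot in panel b can be illustrated with a
   minimal sketch. It assumes that the test is applied to log2-transformed
   intensities and that each protein has one intensity per replicate; the
   function name and the toy numbers are hypothetical and are not taken
   from the study. The sketch stops at the t statistic and degrees of
   freedom, from which a two-sided p-value would be read off a Student's
   t distribution before any multiple-testing adjustment.

```python
import math
from statistics import mean, variance


def volcano_inputs(control, bait):
    """Return (mean log2 fold change, pooled-variance t statistic, df).

    control, bait: raw intensities of one protein across replicates
    (hypothetical input layout; at least two replicates per group).
    """
    log_c = [math.log2(x) for x in control]
    log_b = [math.log2(x) for x in bait]
    n1, n2 = len(log_c), len(log_b)
    # x-axis: difference of mean log2 intensities (bait minus control)
    lfc = mean(log_b) - mean(log_c)
    # Equal-variance (Student's) t-test pools the two sample variances
    pooled = ((n1 - 1) * variance(log_c) + (n2 - 1) * variance(log_b)) / (n1 + n2 - 2)
    t_stat = lfc / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return lfc, t_stat, n1 + n2 - 2


# Toy intensities for one protein, n = 3 replicates per condition
lfc, t_stat, df = volcano_inputs([1, 2, 4], [8, 16, 32])
```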

   To understand the spatial resolution of the RAD52-Pol II interaction,
   we performed a chromatin immunoprecipitation sequencing (ChIP-seq) of
   endogenous Pol II and RAD52. We observed that there is a significant
   coincidence of Pol II and RAD52 peaks across the genome, with increased
   enrichment seen at transcription start sites (TSS) and transcription
   end sites (TES) (Fig. [101]1g–k, Supplementary Fig. [102]1e).
   Approximately 40% of all RAD52 peaks were associated with Pol II,
   though only a subset of Pol II peaks colocalized with RAD52 (~10%),
   suggesting that RAD52 is recruited to only a subset of all sites of
   transcription (Fig. [103]1j).
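
   The asymmetry between these two percentages arises because the same
   set of shared sites is divided by peak sets of very different sizes.
   A minimal sketch of such an overlap calculation, assuming peaks are
   given as (chromosome, start, end) half-open intervals; the function
   name and the toy coordinates are hypothetical and do not reproduce the
   study's pipeline:

```python
from collections import defaultdict


def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap at least one reference peak."""
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in query:
        # Two half-open intervals overlap when each starts before the other ends
        if any(start < r_end and r_start < end for r_start, r_end in by_chrom[chrom]):
            hits += 1
    return hits / len(query) if query else 0.0


# Toy peak sets: 5 RAD52 peaks and 20 Pol II peaks sharing 2 sites
rad52 = [("chr19", 100, 200), ("chr19", 500, 600), ("chr19", 900, 1000),
         ("chr19", 1500, 1600), ("chr19", 2000, 2100)]
pol2 = [("chr19", 150, 250), ("chr19", 550, 650)]
pol2 += [("chr19", 10000 + 1000 * i, 10100 + 1000 * i) for i in range(18)]

rad52_with_pol2 = fraction_overlapping(rad52, pol2)  # 2 of 5 peaks
pol2_with_rad52 = fraction_overlapping(pol2, rad52)  # 2 of 20 peaks
```

   In this toy case the same two shared sites account for 40% of the
   smaller set but only 10% of the larger one.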

RAD52 prevents TRCs by reducing Pol II pausing and decreasing the level of
pathologic R-loops

   We next sought to better understand the function of RAD52 as an
   accessory factor associated with the transcription complex. We observed
   that depletion of RAD52 leads to increased Pol II pausing at the TSS, as
   demonstrated by increased accumulation of Pol II peaks at the TSS
   specifically in the subset of genes that had Pol II-RAD52 co-occupancy
   (Fig. [104]2a, b, Supplementary Fig. [105]2a, b), whereas loss of RAD52
   had no effect on the Pol II profiles of the other genes (Supplementary
   Fig. [106]2c). Furthermore, RAD52 depletion did not affect global gene
   expression profiles (Supplementary Fig. [107]2d, Supplementary Data
   [108]2), suggesting RAD52 does not alter transcriptional profiles in cells.

Fig. 2. Loss of RAD52 increases R-loop formation and exacerbates
transcription-replication conflicts.

   a Representative snapshot of chromosome 9 depicting RNA Pol II
   occupancy from ChIP-seq analysis (IP/input) in siNT (red) and siRAD52
   (dark red) transfected HeLa cells. b Metagene plot showing the
   distribution of the RNA Pol II occupancy at the TSS and flanking
   regions ( ± 10 kb) of genes with overlapping RNA Pol II and RAD52
   peaks. Plots shown: siNT (control) and siRAD52 transfected HeLa cells.
   c ChIP-seq of RNA Pol II (red), RAD52 (green) and S9.6 (R-loops;
   blue) occupancy in control HeLa cells. A representative snapshot of
   chromosome 21 is shown. Input DNA (grey) serves as a negative control for
   background normalization. d Venn diagram of the percentage of genes
   overlapping with RNA Pol II, RAD52 and S9.6 ChIP peaks (MACS2). e
   Representative images of S9.6 immunostaining to detect R-loops in siNT
   (control) and siRAD52 transfected HeLa cells. RNase H treatment was
   added as a negative control to eliminate R-loops (Scale bar 10 µm). f
   Quantitative analysis of nuclear S9.6 foci across stated conditions
   from (e). Data plotted as box and whiskers. Boxes extend from the
   25th–75th percentiles, with the median displayed as a line. The
   whiskers mark the minimum (1st percentile) and maximum (99th
   percentile). The data presented shows ≥ 500 nuclei from 3 biological
   replicates; p-values calculated using unpaired two tailed t-tests. g
   Schematic representation of PLA to visualize proximity of PCNA and RNA
   Pol II to measure TRCs. The schematic illustration was created with
   BioRender.com released under a Creative Commons
   Attribution-NonCommercial-NoDerivs 4.0 International license. h
   Representative images of the nuclear PLA foci (α-PCNA: α-RNA Pol II S2)
   across stated conditions (Scale bar 10 µm). i Quantitative analysis of
   nuclear PLA foci from (h). Data are plotted as mean ± SEM. The data
   presented shows ≥ 500 nuclei from 3 biological replicates; p-values
   calculated using unpaired two tailed t-tests. Source data are provided
   as a Source Data file.

   Increased Pol II pausing has been known to be associated with R-loop
   accumulation^[111]40. We wanted to see if this holds true at RAD52
   associated Pol II pausing sites (Fig. [112]2a, b). To this end, we
   performed ChIP-seq analysis of R-loop associated peaks using the S9.6
   antibody that has been characterized to specifically interact with
   RNA-DNA hybrids^[113]41. We observed that 22% of RAD52 peaks associated
   with R-loops, albeit at a frequency lower than its association with Pol
   II (41%), suggesting that RAD52 may associate with Pol II independently
   of R-loops (Fig. [114]2c, d). Upon comparing the RAD52 interactome
   (Fig. [115]1b) with that of R-loops^[116]42, it was apparent that the
   majority of proteins which associated with RAD52 also associated with
   R-loops (Supplementary Fig. [117]2e, f, Supplementary Data [118]3),
   indicating that RAD52 may play an important role in R-loop homeostasis.

   We next set out to investigate the effect of RAD52 loss on global
   R-loop levels. We observed that RAD52 loss in cells led to a
   significant increase in global S9.6 signal (Fig. [119]2e, f,
   Supplementary Fig. [120]3a–h), comparable to depletion of Aquarius
   (AQR), a known R-loop resolution factor^[121]15 (Supplementary
   Fig. [122]3i, j). Furthermore, the observed S9.6 signal was sensitive
   specifically to RNase H treatment but not RNase III (Fig. [123]2e, f,
   Supplementary Fig. [124]3b–d). RNase H specifically digests the RNA
   strand of RNA-DNA hybrids, confirming that the signal reflects R-loops
   rather than other RNA species under the given conditions^[125]43. Prior work
   has suggested that increased R-loops pose a threat to replication,
   leading to increased TRCs^[126]44. In order to understand the
   physiological consequences of increased R-loops in RAD52 deficient
   backgrounds, we performed a PLA between Pol II and PCNA, an essential
   component of the replisome (Fig. [127]2g, Supplementary Fig. [128]3k,
   l). We observed a significant increase in TRCs upon loss of RAD52,
   which was further amplified when R-loops were increased in an
   AQR-deficient context (Fig. [129]2h, i), implicating RAD52 as a
   mediator of TRC resolution. The same effect was seen in
   Senataxin (SETX)^[130]45-depleted cells (Supplementary Fig. [131]3m,
   n), confirming that the effect of RAD52 loss on TRCs was caused by the
   presence of increased R-loops, regardless of how they arose. However, it is
   worth noting that the RAD52-Pol II interaction is not limited to the
   S-phase of the cell cycle, suggesting that RAD52 associates with the
   transcription machinery throughout the cell cycle, potentially acting
   as a surveyor of replication stress (Supplementary Fig. [132]3o–q).

RAD52 is recruited to sites of transcription-replication conflicts via its
RNA-Pol II interacting C-terminal domain

   A previous study demonstrated that RAD52 interacts with the
   transcription complex via its C-terminal domain^[133]35. Given our
   observation that RAD52 associates with Pol II (Fig. [134]1 & [135]2),
   we posited that RAD52’s C-terminal domain would be essential for its
   role in resolving TRCs via its interaction with the transcription
   machinery. In order to test this hypothesis, we generated an HA-tagged
   RAD52 mutant in which we deleted amino acids 302–410 (referred to as
   RAD52^∆C) (Fig. [136]3a, b, Supplementary Fig. [137]4a). This
   region has previously been identified as the minimal stretch of
   residues required for RAD52 to interact with Pol II in vitro^[138]35. As
   RAD52 is a protein known for its role in DNA repair, we first confirmed
   that this was not disrupted by deleting the C-terminus. We tested this
   using functional assays of DSB repair, namely single strand annealing
   (SSA) and HR using the previously described reporters^[139]46,[140]47.
   In RAD52^-/- cells, we observed that complementation with either
   RAD52^WT or RAD52^∆C rescued the SSA- and HR-deficient phenotypes
   induced by RAD52 deficiency (Fig. [141]3c, d). This implies that the
   loss of the C-terminus of RAD52 does not impair its DNA repair
   activity.

Fig. 3. C-terminal domain of RAD52 is essential for the prevention of
transcription-replication conflicts.

   a Schematics of the domain structures of wild type (WT) - RAD52 protein
   and C-terminal (ΔC) deleted RAD52 (Δ302-410 amino acids). From
   N-terminal to C-terminal, RAD52 protein has DNA binding domain, RPA
   binding domain, RAD52 binding domain, RNA Pol II binding domain and a
   nuclear localization signal (NLS). The domains are not drawn to scale.
   b Western blot confirming the expression of HA-RAD52^WT and
   HA-RAD52^ΔC. Results reproducible for at least 2 biological replicates.
   c (Left) Scheme of the single stranded annealing (SSA) reporter system:
   The SSA-GFP reporter contains a 5′ fragment of the GFP (5′-GFP) gene,
   and a 3′ fragment of the GFP (3′-GFP) with an I-SceI site. Repair of
   the I-SceI-induced DSB by SSA leads to formation of GFP+ cells.
   (Middle) Quantification of SSA repair assay in WT and RAD52^−/− HCT116
   cells. (Right) Quantification of SSA repair assay in RAD52^−/− HCT116
   cells with overexpression of either RAD52^WT or RAD52^ΔC (n = 4
   biological replicates). d (Left) Scheme of the homology dependent
   recombination (HDR) reporter system: The HDR-GFP reporter system
   contains the GFP gene interrupted by an I-SceI site, and a fragment of
   the GFP with truncated 3′- and 5′-terminus. Repair of the
   I-SceI-induced DSB by HDR leads to formation of GFP+ cells. (Middle)
   Quantification of HDR repair assay in WT and RAD52^−/− HCT116 cells.
   (Right) Quantification of HDR repair assay in RAD52^−/− HCT116 cells
   with overexpression of either RAD52^WT or RAD52^ΔC. (n = 5 biological
   replicates). e Schematic representation of PLA to visualize proximity
   of HA-tagged RAD52 (HA-RAD52) and RNA Pol II. f Representative images
   of the nuclear PLA foci (α-HA: α-RNA Pol II S2) across stated
   conditions with overexpression of either RAD52^WT or RAD52^ΔC (Scale
   bar 10 µm). g Quantitative analysis of nuclear PLA foci across stated
   conditions described in (f). The data presented shows ≥ 500 nuclei from
   3 biological replicates. h Schematic representation of PLA to visualize
   proximity of PCNA and RNA Pol II to measure TRCs. i Representative
   images of the nuclear PLA foci (α-PCNA: α-RNA Pol II S2) across stated
   conditions with overexpression of either RAD52^WT or RAD52^ΔC in HeLa
   cells (Scale bar 10 µm). j Quantitative analysis of nuclear PLA foci
   across stated conditions described in (i). The data presented
   shows ≥ 500 nuclei from 3 biological replicates. In Fig. 3 (c) (d) (g)
   and (j), data are plotted as mean ± SEM and p-values calculated using
   unpaired two tailed t-tests. Schematics in Fig. 3 (a) (c) (d) (e) and
   (h) were created with BioRender.com released under a Creative Commons
   Attribution-NonCommercial-NoDerivs 4.0 International license. Source
   data are provided as a Source Data file.

   To test if the C-terminus was indeed responsible for RAD52’s
   interaction with Pol II, we performed a PLA between the two proteins
   (Fig. [144]3e, Supplementary Fig. [145]4b, c). As suggested by in vitro
   biochemical studies^[146]35, RAD52^∆C had a reduced interaction with
   Pol II in human cells (Fig. [147]3f, g), confirming the importance of
   the C-terminal domain. Furthermore, RAD52^∆C failed to rescue the
   elevated levels of R-loops and TRCs associated with the loss of RAD52
   (Fig. [148]3h–j, Supplementary Fig. [149]4d, e), indicating that the
   C-terminal domain of RAD52 is essential for the reduction of
   TRCs.

RAD52 recruits TOP2A to R-loops to help resolve transcription-replication
conflicts

   In order to elucidate the mechanism by which RAD52 facilitates
   resolution of TRCs, we performed an IP-MS analysis of overexpressed
   RAD52^WT and RAD52^∆C to tease apart factors that were differentially
   associated with the C-terminal domain of RAD52 (Supplementary
   Fig. [150]5a). The screen was done in an AQR-depleted background so as
   to increase basal levels of R-loops (Supplementary Fig. [151]3i, j). Of
   the 26 R-loop associated factors that had differential interaction with
   RAD52^WT and RAD52^∆C, TOP2A stood out as a top hit owing to its
   previously suggested role in TRC resolution^[152]48–[153]50
   (Supplementary Fig. [154]5b, c, Supplementary Data [155]4 and [156]5,
   Supplementary Note [157]1). PLA between RAD52 and TOP2A was performed
   to confirm this interaction (Fig. [158]4a–c), which was further
   elevated in the presence of increased R-loops (Supplementary
   Fig. [159]5d, e). As expected, the RAD52-TOP2A interaction was
   disrupted in the absence of the RAD52 C-terminal domain
   (Fig. [160]4d–f), corroborating the IP-MS results.

Fig. 4. RAD52 recruits TOP2A to mitigate transcription-replication conflicts.

   a Schematic representation of PLA to visualize proximity of RAD52 and
   TOP2A. b Representative images of the nuclear PLA foci (α-RAD52:
   α-TOP2A) in siNT (control) and siAQR transfected HeLa cells (Scale bar
   10 µm). c Quantitative analysis of nuclear PLA foci across stated
   conditions described in (b). The data presented shows ≥ 500 nuclei from
   3 biological replicates. d Schematic representation of PLA to visualize
   proximity of HA-tagged RAD52 (HA-RAD52) and TOP2A. e Representative
   images of the nuclear PLA foci (α-HA: α-TOP2A) in siRAD52 (5’UTR)
   transfected HeLa cells with overexpression of either RAD52^WT or
   RAD52^ΔC (Scale bar 10 µm). f Quantitative analysis of nuclear PLA foci
   across stated conditions described in (e). The data presented
   shows ≥ 500 nuclei from 3 biological replicates. g Representative
   images of S9.6 immunostaining to detect R-loops in siNT (control) and
   siTOP2A transfected HeLa cells. RNase H treatment was added as a
   negative control to eliminate R-loops (Scale bar 10 µm). h Quantitative
   analysis of nuclear S9.6 foci across stated conditions from (g). Data
   plotted as box and whiskers. Boxes extend from the 25th to 75th
   percentiles, with the median displayed as a line. The whiskers mark the
   minimum (1st percentile) and maximum (99th percentile). The data
   presented shows ≥ 500 nuclei from 3 biological replicates; p-values
   calculated using unpaired two tailed t-tests. i Schematic
   representation of PLA to visualize proximity of PCNA and RNA Pol II to
   measure TRCs. j Representative images of the nuclear PLA foci (PCNA:
   RNA Pol II S2) in siNT (control) and siTOP2A transfected HeLa cells
   (Scale bar 10 µm). k Quantitative analysis of nuclear PLA foci across
   stated conditions described in (j). The data presented shows ≥ 500
   nuclei from 3 biological replicates. l Schematic representation of PLA
   to visualize proximity of S9.6 and TOP2A. m Representative images of
   the nuclear PLA foci (α-S9.6: α-TOP2A) in siNT (control), siRAD52 and
   siAQR transfected HeLa cells (Scale bar 10 µm). n Quantitative analysis
   of nuclear PLA foci across stated conditions described in (m)
   normalized to siNT. The data presented shows ≥ 500 nuclei from 3
   biological replicates. o Mechanistic model of the role of RAD52 in preventing
   transcription-replication conflicts. In Fig. 4 (c–k) and (n), data are
   plotted as mean ± SEM and p-values calculated using unpaired two tailed
   t-tests. Schematics in Fig. 4 (a) (d) (i) (l) and (o) were created with
   BioRender.com released under a Creative Commons
   Attribution-NonCommercial-NoDerivs 4.0 International license. Source
   data are provided as a Source Data file.

   Consistent with previous studies^[163]48–[164]50, loss of TOP2A led to
   increased R-loops as well as elevated TRCs (Fig. [165]4g–k,
   Supplementary Fig. [166]5f), confirming its role in mitigating R-loop
   associated TRCs. To study if recruitment of TOP2A to these sites was
   indeed RAD52 associated, we performed a PLA between TOP2A and S9.6
   (Fig. [167]4l, Supplementary Fig. [168]5g, h). There was a significant
   reduction in TOP2A recruitment to R-loops in the absence of RAD52, both
   in physiological conditions and with elevated R-loops (Fig. [169]4m,
   n), supporting our hypothesis that RAD52 helps to recruit TOP2A to
   R-loop associated TRCs, alleviating the duplex torsional stress and
   helping resolve TRCs (Fig. [170]4o). This observation was further
   supported by correlation with a previously published TOP2A ChIP-seq
   dataset^[171]51, where 18% of RNA Pol II-RAD52 overlapping peaks (from
   Fig. [172]1j) co-occurred with TOP2A peaks, indicating that TOP2A is
   recruited to RAD52-mediated TRCs and thus consistent with our model
   (Supplementary Fig. [173]5i).

RAD52-depleted cells have increased replication stress and accumulate γH2AX
at R-loop forming regions

   TRCs are an established source of replication stress
   in cells owing to stalled replication and increased fork
   collapse^[174]1,[175]2,[176]5. Given RAD52’s role in TRC resolution, we
   hypothesized that RAD52 depletion could potentially lead to increased
   replication stress. We observed a mild increase in replication stress
   as measured by reduced DNA fiber track lengths after a sequential pulse
   with two thymidine analogs—5-Chloro-2′-deoxyuridine (CldU) and
   5-iodo-2′-deoxyuridine (IdU)—for 30 min each (Fig. [177]5a–c,
   Supplementary Fig. [178]6a–c). Furthermore, consistent with its role in
   the resolution of TRCs, overexpression of the RAD52^∆C mutant was
   unable to rescue the increased replication stress phenotype observed
   with RAD52-depletion (Fig. [179]5a–c). However, depletion of RAD52 did
   not affect global cell cycle profiles, nor did it lead to ATM- and
   ATR-dependent checkpoint activation, suggesting that the induced local
   replication effects do not lead to an altered S-phase (Supplementary
   Fig. [180]6d, e). Moreover, R-loop associated TRCs have been shown to
   have increased levels of DNA damage^[181]2,[182]44.

Fig. 5. Loss of RAD52 causes replication stress and increased DNA damage.

   a Schematic representation of DNA fiber assay performed in HCT116 wild
   type (WT) and RAD52 knockout (RAD52^-/-) cells with plasmid
   overexpression of either RAD52^WT or RAD52^ΔC followed by incubation
   with 5-Chloro-2′-deoxyuridine (CldU) and 5-iodo-2′-deoxyuridine (IdU)
   for 30 min each to label nascent DNA. b Representative DNA
   fiber images in HCT116 WT and RAD52^-/- cells with overexpression of
   either RAD52^WT or RAD52^ΔC (Scale bar 2 µm). c Measurement of DNA
   fiber lengths across stated conditions described in (b) to measure
   replication rates. Data plotted as box and whiskers. Boxes extend from
   the 25th to 75th percentiles, with the median displayed as a line. The
   whiskers mark the minimum (1st percentile) and maximum (99th percentile).
   The data presented shows ≥100 DNA fibers from 3 biological replicates;
   p-values calculated using unpaired two tailed t-tests. d Heat map of
   the intensity of γH2AX ChIP signals (siNT and siRAD52 transfected HeLa
   cells) at genes that have a detectable R-loop peak as determined in
   Supplementary Fig. [185]6b. The γH2AX occupancy is displayed relative
   to the TSS ± 0.5 Mb. e Schematic representation of PLA to visualize
   proximity of S9.6 and γH2AX. f Representative images of the nuclear PLA
   foci (α-S9.6: α-γH2AX) in siNT (control), siRAD52 and siAQR transfected
   HeLa cells (Scale bar 10 µm). g Quantitative analysis of nuclear PLA
   foci across stated conditions described in (f). Data are plotted as
   mean ± SEM. The data presented shows ≥ 500 nuclei from 3 biological
   replicates; p-values calculated using unpaired two tailed t-tests.
   Schematics in Fig. 5 (a) and (e) were created with BioRender.com
   released under a Creative Commons Attribution-NonCommercial-NoDerivs
   4.0 International license. Source data are provided as a Source Data
   file.

   To further assess the DNA damage at RAD52-associated R-loop forming
   regions (Supplementary Fig. [186]6b), we analyzed the distribution of
   γH2AX around R-loops in RAD52-depleted cells using ChIP-seq. We
   observed that there was increased γH2AX accumulation at R-loop forming
   genes, which was persistent even ±0.5 Mb around the TSS (Fig. [187]5d,
   Supplementary Fig. [188]6f, g). These findings were further
   corroborated by performing a PLA of S9.6 and γH2AX in RAD52-depleted
   cells, under physiological and increased R-loop conditions
   (Fig. [189]5e–g). However, loss of RAD52 does not elicit a global DNA
   damage response as confirmed by the unaltered levels of total γH2AX in
   normal versus RAD52-depleted cells (Supplementary Fig. [190]6h, i).
   These findings are also consistent with the unchanged cell cycle
   progression previously observed.

R-loops are a source of genome instability in tumors

   R-loops have long been associated with DNA damage as a consequence of
   prolonged replication fork stalling and DSBs arising from collapsed
   replication forks^[191]1,[192]2. However, most of the evidence for the
   damage associated with R-loops is indirect, in the form
   of increased γH2AX foci or comet assay tail lengths^[193]5. The direct
   consequence of R-loops on the genome in the form of mutational
   signatures remains poorly understood. We hypothesized that if R-loops
   can lead to DSBs, there should be an increased burden of genomic scars
   associated with conserved R-loop forming regions across human tumors.

   To investigate this hypothesis, we built a consensus R-loop dataset
   comprised of correlated peaks from 18 published datasets^[194]52
   (Supplementary Fig. [195]7a, b, Supplementary Data [196]6). This
   combined dataset was assessed to confirm that it followed the
   established conventions of R-loops being associated with transcribed
   genes, with a significant increase in occurrence being observed at TSSs
   and TESs^[197]53 (Fig. [198]6a). We next proceeded to overlay our
   R-loop dataset with previously identified somatic mutations from the
   PCAWG, ICGC and TCGA cohorts^[199]54–[200]56 (see methods for details).
   We observed a significant increase in structural alterations associated
   with R-loop forming regions across the genome (Fig. [201]6b). Not
   surprisingly, we observed a decrease in single nucleotide variants
   (SNVs) in R-loop forming regions, consistent with likely increased
   transcription-coupled repair (Fig. [202]6c and [203]6f). In contrast,
   genetic alterations likely to form due to a DSB were significantly
   increased in R-loop forming regions, including both insertions and
   deletions > 1 bp (long InDels) and structural variants (SVs)
   (Fig. [204]6d, e, g, h, Supplementary Fig. [205]8a–c). Historically,
   SVs and indels have been associated with aberrant repair at DSBs either
   from NHEJ or backup pathways to HR^[206]57. These observations are
   consistent with the idea that R-loops lead to the formation of DSBs,
   repair of which can result in large insertions, deletions, and
   translocations, as we observed.
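
   As a minimal sketch of the comparison described above, the snippet
   below converts event counts into a burden per Mb and takes a log fold
   change between R-loop and non-R-loop regions. The function names, the
   choice of log base 2, the pseudocount and the toy numbers are all
   illustrative assumptions, not the pipeline used in this study.

```python
import math


def events_per_mb(n_events: int, region_bp: int) -> float:
    """Mutational burden expressed as events per megabase of region."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return n_events / (region_bp / 1e6)


def log2_fold_change(rloop_burden: float, background_burden: float,
                     pseudo: float = 1e-9) -> float:
    """log2 ratio of burden at R-loop versus non-R-loop regions.

    The tiny pseudocount (an assumption) only guards against division by zero.
    """
    return math.log2((rloop_burden + pseudo) / (background_burden + pseudo))


# Toy numbers (hypothetical): 30 SVs in 2 Mb of R-loop sequence,
# 50 SVs in 10 Mb of non-R-loop sequence.
rloop = events_per_mb(30, 2_000_000)
background = events_per_mb(50, 10_000_000)
lfc = log2_fold_change(rloop, background)
print(f"R-loop: {rloop:.1f}/Mb, background: {background:.1f}/Mb, log2FC: {lfc:.2f}")
```

   A positive value indicates enrichment of the alteration class at R-loop
   forming regions; a negative value indicates depletion, as reported here
   for SNVs.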

Fig. 6. Increased mutational burden and genomic instability associated with
R-loops were observed in human tumor samples.

   a The genomic distribution of the consensus R-loop dataset as
   identified in Supplementary Fig. [209]7b. Various genomic regions are
   color coded according to the labels on the bottom. The expected
   distribution in case peaks were randomly positioned in the genome is
   shown for comparison. TSS and TES are significantly enriched in the
   R-loop dataset (P < 0.001) as determined by the Fisher’s exact test. b
   Circos plots showing structural variations and genomic alterations
   caused by breakpoints enriched in R-loop (right) forming regions versus
   non-R-loop regions (left). c–e Genomic windows depicting the
   frequencies of single nucleotide variants (SNV-left), long
   InDels > 1 bp (middle) and structural variants (SV-right), analyzed at
   R-loop vs non-R-loop regions across various cancer types. The horizontal
   axis represents the different cancer types and the vertical axis
   represents coverage at all genomic regions, TSS and TES. Data are
   quantified as the log fold change between mutational burden at
   R-loop versus non-R-loop regions. f–h Quantification of the average
   number of SNVs, Long indels, SVs per Mb of genome at TSS and TES in
   R-loop versus non-R-loop forming regions. Data are plotted as
   mean ± SEM; p-values calculated using unpaired two tailed t-tests. i
   Schematic to show the two types of TRCs: co-directional collisions
   (top) and head-on collisions (bottom). The schematic illustration was
   created with BioRender.com released under a Creative Commons
   Attribution-NonCommercial-NoDerivs 4.0 International license. j
   Quantification of the percentage of collisions occurring at R-loop
   sites, split into co-directional and head-on collisions. Data are
   plotted as a bar graph of absolute percentages (Fisher’s exact test).
   k Comparison of the average number of alterations per Mb of genome
   mapped to collision sites, CD versus HO.
   Data are plotted as mean ± SEM. p-values were calculated by two-sided
   non-parametric Mann–Whitney test. l Quantification of the comparison of
   average number of alterations per Mb of genome at R-loop sites between
   tumors with high and low expression of RAD52. Tumors were categorized
   as expressing low (RAD52 low; bottom quartile) or high levels of RAD52
   mRNA (RAD52 high; top quartile). Data plotted as box and whiskers.
   Boxes extend from the 25th to 75th percentiles, with the median
   displayed as a line. The whiskers mark the minimum (5^th percentile)
   and maximum (95^th percentile). (n = 95 (RAD52 high), n = 94 (RAD52
   low)); p-values calculated using unpaired two tailed t-tests. Source
   data are provided as a Source Data file.

   TRCs are preferentially enriched at sites of head-on collisions (HO) as
   opposed to co-directional collisions (CD)^[210]44 (Fig. [211]6i). To
   determine whether HO collisions could lead to increased accumulation of
   R-loop induced genomic alterations, we classified our R-loop dataset as
   CD or HO-associated by overlaying it with the previously published and
   annotated Okazaki fragment sequencing (OK-seq) data^[212]44,[213]58
   (Fig. [214]6j). As expected, we observed a significant difference
   between the genetic alterations at CD versus HO, with a 3-fold increase
   at HO (Fig. [215]6k). Furthermore, tumors with lower RAD52 expression
   levels seem to correlate with increased mutations at R-loops
   (Fig. [216]6l), supporting the idea that RAD52 acts at R-loops to
   prevent genomic rearrangements.
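
   The CD/HO assignment can be illustrated with a short sketch. It assumes
   that the OK-seq data are summarized as a replication fork directionality
   (RFD) value between -1 and 1, with positive values meaning that
   rightward-moving forks predominate; that sign convention, the function
   name and the optional threshold are assumptions made for illustration
   and are not taken from the methods of this study.

```python
def classify_collision(gene_strand: str, rfd: float,
                       min_abs_rfd: float = 0.0) -> str:
    """Label an R-loop site as 'CD', 'HO' or 'undetermined'.

    gene_strand: '+' if the gene is transcribed left to right, '-' otherwise.
    rfd: replication fork directionality in [-1, 1]; positive is taken to
         mean predominantly rightward-moving forks (assumed convention).
    min_abs_rfd: sites with |rfd| at or below this value are not classified.
    """
    if gene_strand not in {"+", "-"}:
        raise ValueError("gene_strand must be '+' or '-'")
    if abs(rfd) <= min_abs_rfd:
        return "undetermined"
    fork_rightward = rfd > 0
    transcription_rightward = gene_strand == "+"
    # Same direction of travel: co-directional; opposite: head-on.
    return "CD" if fork_rightward == transcription_rightward else "HO"


# Hypothetical sites: (gene strand, RFD at the R-loop peak)
sites = [("+", 0.6), ("-", 0.6), ("+", -0.4), ("-", -0.4), ("+", 0.0)]
labels = [classify_collision(strand, rfd) for strand, rfd in sites]
print(labels)
```

   Raising min_abs_rfd restricts the analysis to regions where fork
   direction is strongly biased, at the cost of classifying fewer sites.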

Discussion

   Despite its apparent dispensability in humans, RAD52 has been
   characterized as an essential backup DNA repair factor for BRCA2 due to
   its ability to mediate HR and SSA. Recently, RAD52 was shown to be
   involved in genome maintenance via additional roles in Break-Induced
   Replication (BIR) and mitotic DNA synthesis (MiDAS) arising from
   replication stress^[217]29,[218]59. In this study, we uncovered a
   previously undescribed role for human RAD52 in R-loop homeostasis via
   its association with the transcriptional machinery. We identified a
   robust RAD52-Pol II interaction (Fig. [219]1) and determined that RAD52
   associates with Pol II predominantly at the TSS in a subset of genes.
   Notably, loss of RAD52 alone was sufficient to cause increased Pol II
   pausing at these loci (Fig. [220]2).

   While Pol II pausing has been implicated as a rate-limiting step in
   transcription, it can be particularly problematic for the maintenance
   of genome integrity by interfering with the replication machinery,
   causing TRCs followed by DNA damage^[221]60. Pol II pausing promotes
   the formation of transient secondary structures such as R-loops which
   are the major source of such conflicts^[222]40,[223]44. In this study,
   we present strong evidence that RAD52 helps prevent these R-loops from
   forming TRCs, with the loss of RAD52 leading to increased TRCs and
   associated DNA damage (Figs. [224]2 and [225]5). Interestingly, while
   almost half of chromatin bound RAD52 was associated with Pol II, only
   half of these sites were associated with R-loop formation
   (Fig. [226]2d), suggesting that RAD52 may associate with transcription
   sites independently of R-loop formation. Furthermore, we found that the
   previously uncharacterized C-terminal domain of RAD52 is essential for
   its Pol II interaction and its role in TRC resolution, separate from its
   role in HR and SSA (Fig. [227]3). RAD52’s involvement in R-loop
   resolution and collision avoidance is supplementary to its DSB-repair
   roles. RAD52’s ability to support DNA/RNA binding is linked to its
   annealing abilities, but recruitment to the sites of transcription
   requires the C-terminal domain. We think that apart from RAD52 helping
   to resolve R-loops, RAD52 can also have a downstream role in repairing
   DSBs that arise from persistent R-loops as seen in
   transcription-coupled homologous recombination (TC-HR) or
   transcription-associated homologous recombination repair
   (TA-HR)^[228]24–[229]28.

   The cell tightly maintains R-loop homeostasis by regulating pathways
   that control R-loop formation and degradation^[230]4,[231]7. The release
   of supercoiling associated with transcription and replication is
   mediated by topoisomerase 1 (single-strand DNA nicking) within the
   region of the transcription site or the site of active
   replication^[232]61. However, when transcription and replication are
   heading toward a head-on collision, the duplex DNA between the sites of
   transcription and replication is trapped by supercoiling of different
   polarities, creating a zone of conflict, where the duplexes may form
   “knotted” loops. Release of the accumulated duplex DNA torsional stress
   is required to resolve the TRC. The Topoisomerase IIα (TOP2A) cleavage
   complex is one such protein complex that could release the two sources
   of negative supercoiling coming from opposite directions during
   transcription and replication^[233]62,[234]63, the loss of which leads
   to increased R-loop accumulation and increased TRCs^[235]50
   (Fig. [236]4). We found that RAD52 promotes TOP2A recruitment to R-loop
   sites, thus helping resolve TRCs and preventing the ensuing genomic
   instability (Fig. [237]4). It is therefore conceivable that the
   increased Pol II pausing observed in RAD52-depleted cells (Fig. [238]2)
   could be ascribed to the inability of the cell to recruit TOP2A to the
   TRC region, leading to increased torsional stress^[239]63 and R-loop
   enrichment. However, while we demonstrate that the C-terminal domain of
   RAD52 seems to mediate TOP2A’s recruitment to TRCs, further
   experiments are warranted to determine if this function is dependent or
   independent of RAD52’s association to the Pol II complex, which is also
   mediated via its C-terminal domain.

   Pathological R-loops have long been implicated in genome instability,
   albeit through indirect evidence in the form of γH2AX signal or
   accumulation of DSBs at R-loop forming regions^[240]5. Here, we provide
   direct evidence of R-loops acting as a driver of DSB-induced genomic
   instability in varied human tumor samples (Fig. [241]6). We observed
   elevated levels of structural variants and indels at R-loop forming
   regions across tumor types, but not single nucleotide variants, in
   contrast to recent observations reported to be linked to
   R-loops^[242]64. This discrepancy likely stems from analytic
   differences: whereas here we used only tumors sequenced by WGS, with
   light filtering, prior work analyzed heterogeneously sequenced
   tumors with over 95% of cancers removed from analysis. Furthermore, an
   increased density of these mutations was observed at HO collisions when
   compared to CD collisions, supporting the concept that HO collisions
   produce DSBs and are more harmful for the cell. Additionally, low RAD52
   expression in tumors was associated with an increased mutational burden
   at R-loops, consistent with the previously uncharacterized role for
   RAD52. Moreover, there was no correlation between TOP2A expression
   level and mutational signatures, unlike that seen with RAD52 expression
   levels (Fig. [243]6l, Supplementary Fig. [244]9a–c). This can be
   explained by the fact that there is no direct correlation between the
   expression levels of TOP2A and RAD52 (Supplementary Fig. [245]9c), but
   the effect we see is a function of RAD52 not being able to recruit
   TOP2A to the sites of collisions, resulting in DNA breaks and genomic
   instability. This is in concordance with the long-standing view in the
   field that, for DNA repair, protein levels are not always rate-limiting;
   often it is the opportunity for the protein to reach the DNA lesion at
   the right time that matters. Hence, expression
   of TOP2A is unsurprisingly unrelated to genomic instability.

   Our study finds a unique role for RAD52 in genome maintenance via its
   ability to resolve R-loops and TRCs. Considering that
   transcription-induced replication stress is one of the most common
   endogenous sources of DSBs in the cell, it is possible that this
   transcription-associated role of RAD52 may also contribute to its
   synthetic lethal phenotype
   observed in BRCA-deficient cells, in addition to the previously
   characterized DSB repair activities, including RAD51-mediator function
   and single-strand annealing^[246]30–[247]32. Furthermore, we
   demonstrate that R-loops, if left unresolved, can lead to genomic
   instability resulting in mutagenesis, chromosomal rearrangements, and
   cancer.

Methods

Cell Culture and transfections

   HeLa (ATCC, #CCL-2), HEK293T (ATCC, #CRL-3216) and U2OS (ATCC, #HTB-96)
   cells were grown in complete DMEM high glucose supplemented with 10%
   FBS, 2 mM L-glutamine, 20 mM HEPES, 100 I.U./ml Penicillin, and
   100 μg/ml Streptomycin. HCT116 WT and RAD52^-/- cell lines were
   obtained from Dr. Eric A. Hendrickson^[248]65 and cultured in McCoy’s
   5A medium supplemented with 10% FBS, 2 mM L-glutamine, 100 I.U./ml
   Penicillin, and 100 μg/ml Streptomycin. All cells were grown in a
   humidified 37 °C incubator with 5% CO[2].

   0.5 × 10^6 cells were reverse transfected using RNAiMAX (Invitrogen,
   #13778150) according to the manufacturer’s instructions with 40 pmol of
   siRNAs against RAD52 (Dharmacon ON-TARGETplus SMARTpool, #L-011760), RAD52
   5’UTR (Dharmacon ON-TARGET 5’UTR, #J-011760-06), AQR (Dharmacon
   ON-TARGETplus SMARTpool, #L-022214), TOP2A (Dharmacon ON-TARGETplus
   SMARTpool, #L-004239) or scrambled non-targeting siRNA (Dharmacon
   ON-TARGETplus SMARTpool, #D-001810) as indicated. Cells were harvested
   48 h after transfection and processed as needed.

   0.25 × 10^6 cells/well were seeded in six-well plates and treated
   with the respective siRNA as described above. At 24 h post knock-down,
   cells were transfected with 2 μg of the HA-RAD52^WT or HA-RAD52^∆C
   over-expressing or HA-control plasmid (see below) using Lipofectamine
   3000 (Invitrogen, #L3000015).

Plasmid constructs

   The plasmids used to express HA-RAD52^WT and HA-RAD52^∆C were
   derivatives of pcDNA3.1(+)-N-HA. The gBlocks corresponding to
   full-length RAD52 and RAD52 (Δ302-410 amino acids) were cloned into the
   pcDNA3.1(+)-N-HA backbone using KpnI/NotI restriction enzymes and the
   plasmids were confirmed by Sanger sequencing.

Immunoprecipitation and mass spectrometry analysis

   HEK293T cells were transfected with the respective plasmids as per the
   experimental conditions mentioned. Post transfection, the cells were
   washed with ice cold PBS, resuspended in Lysis buffer (50 mM
   Tris-HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% NP-40, 1X Protease
   Inhibitor) and incubated for 20 min on a rotor at 4 °C. After 4 cycles
   of water-bath sonication for a total of 6 min, lysates were centrifuged
   for 10 min at 15000 x g at 4 °C. Immunoprecipitation was performed on
   the supernatants using Pierce Anti-HA Magnetic Beads (Thermo
   Scientific, #88836) overnight at 4 °C on the rotor. Beads were
   extensively washed in the lysis buffer and stored at -80 °C, until
   ready for mass spectrometry.

   A fraction of the beads was processed for immunoblotting to confirm the
   pull-down. The beads were denatured and eluted in LDS Non-Reducing
   Sample Buffer (Thermo Scientific, #84788) by boiling for 5–10 min.
   Proteins were separated on 4–12% acrylamide SDS-PAGE, transferred onto a
   nitrocellulose membrane and detected with the indicated antibodies
   described in the table and ECL reagents.

   Sample preparation and mass spectrometry analysis were carried out by Poochon
   Scientific (Frederick, Maryland). Post IP, in order to precipitate the
   protein, beads from 3 independent replicates were treated with 50 μl of
   2% SDS, heated at 95 °C for 10 min and centrifuged. Supernatant was
   processed for trypsin digestion as per SOP-PS-6003 (Standard Operation
   of Procedure for in Solution Digestion). The digested peptide mixture
   was then concentrated and desalted using SPN columns as per
   SOP-PS-6005 (Standard Operation of Procedure for Desalting Digested
   Peptides). Desalted peptides were reconstituted in 30 μl of 0.1% formic
   acid, and 12 μl of peptides was analyzed in a 110 min LC/MS/MS run. The
   LC/MS/MS analysis of samples was carried out using a Thermo Scientific
   Orbitrap Exploris 240 Mass Spectrometer and a Thermo Scientific Dionex
   UltiMate 3000 RSLCnano System. Peptide mixture from each sample was
   loaded onto a peptide trap cartridge at a flow rate of 5 μL/min. The
   trapped peptides were eluted onto a reversed-phase EasySpray C18 column
   (Thermo Scientific) using a linear gradient of acetonitrile (3–36%) in
   0.1% formic acid. The elution duration was 110 min at a flow rate of
   0.3 μl/min. Eluted peptides from the EasySpray column were ionized and
   sprayed into the mass spectrometer, using a Nano-EasySpray Ion Source
   (Thermo Scientific) under the following settings: spray voltage,
   1.6 kV; capillary temperature, 275 °C. The raw data file acquired from
   each sample was searched against a human protein sequence database and
   target protein sequences provided by the client using the Proteome
   Discoverer 2.4 software (Thermo Scientific, San Jose, CA) based on the
   SEQUEST algorithm. Carbamidomethylation (+57.021 Da) of cysteines was
   set as a fixed modification, and oxidation (Met) and deamidation (Q/N;
   +0.98402 Da) were set as dynamic modifications. The minimum peptide
   length was specified to be five amino acids. The precursor mass
   tolerance was set to 15 ppm, whereas fragment mass tolerance was set to
   0.05 Da. The maximum false peptide discovery rate was specified as
   0.01. The resulting Proteome Discoverer Report contains all assembled
   proteins with peptide sequences and peptide spectrum match counts
   (PSM#).

   Protein quantification/normalization used the normalized spectral
   abundance factors (NSAFs) method to calculate the protein relative
   abundance^[249]66,[250]67. NSAFs were calculated as follows:

   \[
   \mathrm{NSAF}_N = \frac{S_N / L_N}{\sum_{i=1}^{n} S_i / L_i} \qquad (1)
   \]

   where N is the protein index; S_N (PSM#) is the number of peptide
   spectra matched to protein N; L_N is the length of protein N (number
   of amino acid residues); and n is the total number of proteins in the
   input database (proteome profile for one cell sample). Protein
   enrichment was calculated by comparing fold change between the sample
   pull down and the HA-tag control.
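
   The NSAF calculation and the subsequent fold-change comparison can be
   sketched in a few lines. The protein names, PSM counts, lengths and the
   floor value used for proteins absent from the control are hypothetical
   and serve only to make the arithmetic concrete; this is not the
   vendor's implementation.

```python
from typing import Dict


def nsaf(psm_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for every protein in one sample.

    psm_counts: peptide spectrum match count (S) per protein.
    lengths: protein length in amino acid residues (L) per protein.
    """
    saf = {p: psm_counts[p] / lengths[p] for p in psm_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra matched in this sample")
    return {p: value / total for p, value in saf.items()}


def fold_enrichment(sample: Dict[str, float], control: Dict[str, float],
                    floor: float = 1e-6) -> Dict[str, float]:
    """Sample/control NSAF ratio; 'floor' (an assumption) avoids dividing by zero."""
    return {p: v / max(control.get(p, 0.0), floor) for p, v in sample.items()}


# Hypothetical PSM counts and lengths for a pull-down and an HA-tag control.
lengths = {"RAD52": 418, "POLR2A": 1970, "ACTB": 375}
pulldown = nsaf({"RAD52": 120, "POLR2A": 60, "ACTB": 15}, lengths)
control = nsaf({"RAD52": 2, "POLR2A": 3, "ACTB": 20}, lengths)
enrichment = fold_enrichment(pulldown, control)
for protein, ratio in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(f"{protein}\t{ratio:.2f}")
```

   Because each NSAF is divided by the sample-wide sum, the values within a
   sample add up to 1, which makes pull-down and control samples of
   different depth comparable.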

   Pathway analysis was carried out using Gene Ontology (GO)
   software^[251]68,[252]69. Functional protein interaction network
   analysis was performed using interaction data from the STRING
   database^[253]70. Only interactions with a score >0.15 are represented
   in the networks.

Co-Immunoprecipitation assay

   To detect endogenous RAD52 interacting proteins, HeLa cells were seeded
   in a 10 cm dish for 24 h. The cells were then washed with ice cold PBS,
   resuspended in Lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl,
   1 mM EDTA, 1% NP-40, 1X Protease Inhibitor) and incubated for 20 min on
   a rotor at 4 °C. After 5 cycles of water-bath sonication for a total of
   10 min, lysates were centrifuged for 10 min at 15000 x g at 4 °C.
   The lysate was treated with Benzonase nuclease (Millipore-Sigma
   Novagen, #707463) at a concentration of 25U per 1 ml on the rotor for
   1 h at room temperature. Immunoprecipitation was performed on the
   supernatants using α-RAD52 conjugated agarose beads (Santa Cruz
   Biotech, #sc-365341 AC) overnight at 4 °C on the rotor. A control
   immunoprecipitation was carried out using normal mouse IgG conjugated
   agarose beads (Santa Cruz Biotech, #sc-2343). Beads were extensively
   washed in the lysis buffer and processed for immunoblotting to confirm
   the pull-down. The beads were denatured and eluted in LDS Non-Reducing
   Sample Buffer (Thermo Scientific, #84788) by boiling for 5–10 min.
   Proteins were separated on 4–12% acrylamide SDS-PAGE, transferred onto a
   PVDF membrane and detected with the indicated antibodies and ECL
   reagents.

Proximity ligation assay

   Cells were seeded on poly-L-lysine-coated coverslips and reverse
   transfected with indicated siRNAs on the same day. 48 h after
   transfection, cells were washed with PBS and pre-extracted with 0.25%
   TritonX-100 for 5 min on ice. Cells were fixed with 4% paraformaldehyde
   for 20 min and then washed with PBS twice for 5 min. Cells were
   incubated with 100% methanol for 30 s and then washed with PBS three
   times for 5 min. Cells were blocked overnight at 4 °C with the blocking
   solution provided in the PLA kit. Coverslips were incubated with primary
   antibodies (Supplementary Table [254]1) diluted in antibody diluent for
   1 h at room temperature (RT). Next, coverslips were incubated with
   mouse/rabbit secondary probes Duolink® In Situ PLA® Probe Anti-Rabbit
   PLUS and Duolink® In Situ PLA® Probe Anti-Mouse MINUS. Proximity
   ligation was performed using the Duolink® In Situ Red Kit
   Mouse/Rabbit (Millipore-Sigma, #DUO92008), Duolink® In Situ Detection
   Reagents FarRed (Millipore-Sigma, #DUO92013), or Duolink® In Situ
   Detection Reagents Green (Millipore-Sigma, #DUO92014) according to the
   manufacturer’s protocol. The oligonucleotides and antibody-nucleic acid
   conjugates used were those provided in the Millipore-Sigma PLA kit. For
   EdU (5-ethynyl-2’-deoxyuridine) staining in PLA experiments, cells were
   incubated with 10 µM EdU for 30 min before the pre-extraction step. EdU
   detection was performed using the Click-iT reaction mixture provided in
   the kit (Thermo Scientific, #C10086) according to the manufacturer’s
   instructions. Stained cells were mounted with mounting medium
   containing DAPI. Samples were visualized on the Nikon Spinning disk
   confocal microscope at 60X, and images were collected and then analyzed
   with the Nikon Elements AR Analysis Explorer (version 5.21.03).

Immunofluorescence

   For R-loop staining, experiments were performed similarly to reported
   procedures^[256]71,[257]72, with details as follows. Cells were fixed
   with ice cold methanol for 10 min and permeabilized with acetone for
   3 min. Cells were washed 3 times with ice cold PBS (5 min each). For
   RNase III-treated samples, cells were incubated with RNase III enzyme
   (New England Biolabs, #M0245S) at a dilution of 1:200 in 1X RNase III
   buffer supplemented with manganese chloride at 37 °C for 30 min. For
   RNase H-treated samples, cells were incubated with RNase H enzyme (New
   England Biolabs, #M0297L) at a dilution of 1:50 in 1X RNase H buffer at
   37 °C for 2 h. After incubation, cells were washed 3 times with cold
   PBS (5 min each). Cells were incubated in blocking buffer (3% BSA, 1%
   goat serum, 0.1% Triton in 4X SSC buffer) overnight at 4 °C. Cells were
   incubated with the S9.6
   primary antibody (1:500, Millipore, #MABE1095) diluted in blocking
   buffer for 2 h at RT and washed 3 times with 4X SSC buffer for 5 min
   each. Cells were incubated with the Alexa Fluor Plus 488 secondary
   antibody (1:1500, Invitrogen, #A48255) diluted in blocking buffer for
   1 h at RT (dark storage) and washed 3 times with 4X SSC buffer for
   5 min each. For mitochondrial staining, cells were incubated with
   250 nM MitoTracker Deep Red FM probe (Thermo Scientific, #M22426)
   for 30 min prior to the fixation step. Cells were mounted with mounting
   medium containing DAPI (Millipore-Sigma, #DUO82040) for 30 min at RT
   (dark storage) and the slides were stored at 4 °C. For counter staining
   the nucleolus, the nucleolin antibody (Cell Signaling, #14574; 1:1000
   dilution) was combined with the S9.6 antibody and staining was carried
   out as described above. The Alexa Fluor Plus 555 secondary antibody
   (1:1500, Invitrogen, #A32732) was used.

   For γH2AX staining, cells were fixed with 4% paraformaldehyde (EMS,
   #15710) for 20 min and washed with 1X PBS twice for 5 min each. Cells
   were permeabilized with 0.5% Triton X-100 for 10 min at RT and washed
   with 1X PBS three times for 5 min each. Cells were incubated in
   blocking buffer (3% BSA, 0.1% Triton in 1X PBS buffer) overnight at
   4 °C. Primary antibody incubation was performed with mouse monoclonal
   anti-phospho-H2A.X (Ser139) Antibody (1:1500) (Millipore-Sigma,
   #05-636) for 2 h at RT. Cells were washed three times with 0.1%
   TritonX-100 in 1X PBS for 5 min each. Secondary antibody incubation was
   performed with Goat anti-mouse Alexa Fluor Plus 488 (Invitrogen,
   #A32723) for 1 h in the dark at RT. Cells were washed three times with 0.1%
   TritonX-100 in 1X PBS for 5 min each.

   Samples were visualized on the Nikon Spinning disk confocal microscope
   at 60X, and images were collected and then analyzed with the Nikon
   Elements AR Analysis Explorer (version 5.21.03).

Dot blot

   Total nucleic acid was extracted using the DNeasy Blood and Tissue Kit
   (Qiagen, #69504) and RNA:DNA hybrids were detected and quantified by
   dot blot assay. Samples were spotted on Amersham Hybond-N+ membrane
   (Cytiva, #RPN119B) in duplicates using the Bio-Dot Apparatus (BioRad,
   #1706545) and vacuum suction, dried and UV crosslinked. For the RNase H
   treatment, the genomic DNA was incubated with the enzyme (New
   England Biolabs, #M0297) at a concentration of 10 U of RNase H per μg of
   DNA at 37 °C for 20 h prior to spotting. Blots were blocked with 5%
   nonfat dried milk in TBST, then incubated overnight at 4 °C with
   antibodies against RNA:DNA hybrids (S9.6; Millipore, #05-636, 1:500
   dilution) and double-stranded DNA (Novus Biologicals, #NBP3-07302,
   1:500 dilution) in TBST. Blots were washed 3 times for 15 min each in
   TBST and incubated with either α-mouse IgG, HRP-linked Antibody (Cell
   Signaling, #7076; 1:5000 dilution) or α-rabbit IgG, HRP-linked Antibody
   (Cell Signaling, #7074; 1:5000 dilution), respectively, for 1 h at room
   temperature. Images were taken after incubation with SuperSignal West
   Pico PLUS (Fisher Pierce, #PI34578).

Western Blotting

   Cells were collected by trypsinization, lysed on ice in RIPA buffer
   (25 mM Tris-HCl pH 7.6, 150 mM NaCl, 0.1% SDS, 1% NP-40, 1% sodium
   deoxycholate) supplemented with 1 tablet/10 ml lysis buffer of
   cOmplete™, EDTA-free Protease Inhibitor Cocktail (Roche,
   #11-873-580-001), 1 tablet/10 ml lysis buffer of phosphatase inhibitor
   cocktail PhosSTOP (Millipore-Sigma, #4906845001) and 10 mM PMSF for
   30 min, sonicated, and clarified by centrifugation for 20 min at 10,000
   RPM at 4 °C. The supernatant was quantified using the Pierce BCA
   protocol (Thermo Scientific, #23225). Equivalent amounts of proteins
   were separated by SDS–PAGE and transferred onto a nitrocellulose
   membrane. Membranes were blocked in 5% milk in TBST (137 mM NaCl,
   2.7 mM KCl, 19 mM Tris-Base and 0.05% Tween-20) for at least 1 h at
   room temperature. Incubation with primary antibodies was performed
   overnight at 4 °C. Membranes were washed in TBST and incubated with
   HRP-conjugated secondary antibodies for 1 h at room temperature, and
   developed with Pierce ECL (Thermo Scientific, #32106). The primary
   antibodies used for Western blotting included α-RAD52 (Santa Cruz
   Biotech, #sc-365341; 1:500 dilution), α-GAPDH (Abcam, #ab8245c; 1:1000
   dilution), α-AQR (Bethyl Laboratories Inc, #A302547A; 1:500 dilution),
   α-Lamin A/C (Cell Signaling, #4777; 1:1000 dilution), α-HA (Santa Cruz
   Biotech, #sc-7392; 1:1000 dilution), α-Vinculin (Cell Signaling,
   #13901; 1:1000 dilution), α-TOP2A (Santa Cruz Biotech, #sc-365916;
   1:500 dilution), α-βTubulin (GeneTex, #GTX107175; 1:1000 dilution),
   α-ATR (phospho Thr1989) (GeneTex, #GTX128145; 1:1000 dilution), α-ATM
   (phospho S1981) (Abcam, #ab81292; 1:2000 dilution), α-RPA2 (phospho
   S4/S8) (Abcam, #ab87277; 1:2000 dilution), α-RPA2 (phospho S33) (Bethyl
   Laboratories Inc, #A300246A; 1:1000 dilution), α-ATR (Santa Cruz
   Biotech, #sc-1887; 1:500 dilution), α-ATM (Abcam, #ab78; 1:1000
   dilution), α-RPA2 (Cell Signaling, #2208; 1:1000 dilution), α-RPA1
   (Cell Signaling, #2267; 1:1000 dilution), α-Rpb1 CTD (Cell Signaling,
   #2629; 1:4000 dilution) and α-SETX (Novus Biologicals, #NB100-57542;
   1:1000 dilution). The secondary antibodies were α-mouse IgG, HRP-linked
   Antibody (Cell Signaling, #7076; 1:5000 dilution), α-rabbit IgG,
   HRP-linked Antibody (Cell Signaling, #7074; 1:5000 dilution), α-rat
   IgG, HRP-linked Antibody (Cell Signaling, #7077; 1:5000 dilution) or
   α-goat IgG (H+L), HRP-linked Antibody (Invitrogen, #PA1-28664; 1:5000
   dilution).

Chromatin Immunoprecipitation

   HeLa cells were transfected with the respective siRNAs as per the
   experimental conditions mentioned. 48 h post transfection, the cells
   were cross-linked with formaldehyde at 1% final concentration for
   10 min at room temperature or cross-linked according to dual
   cross-linking protocol as described previously^[259]73. Chromatin
   immunoprecipitation (ChIP) assays were conducted using the Zymo-Spin
   ChIP kit (Zymo Research Corp., #D5210) following manufacturer’s
   instructions. Sonication was performed at high power setting for 80
   cycles (30 s ON, 30 s OFF) using a Bioruptor Plus (Diagenode Inc.,
   Denville, NJ), yielding a modal fragment size of <600 bp. Antibodies
   used in ChIP assays included: α-RNA Pol II (Cell Signaling, #2629S),
   α-γH2AX (Abcam, #ab2893), α-RAD52 (Santa Cruz Biotechnology,
   #sc-365341), or α-S9.6 (Kerafast, #ENH001) and normal mouse IgG
   (Millipore-Sigma, #12-371). Approximately 20 µg of chromatin was used
   in each ChIP assay with 5 µg of antibodies, or 100 µg of chromatin was
   used in each ChIP assay with 10 µg of antibodies. IgG negative control
   was included with each assay. DNA libraries were prepared by Zymo
   Research Epigenetics Services and were sequenced on a NovaSeq
   sequencer.

ChIP-seq analysis

   ChIP sequencing reads were trimmed using Cutadapt^[260]74 and aligned
   to the human reference genome (hg19) using BWA^[261]75. We applied read
   filtering to remove reads that were marked as duplicates (Picard)
   (“Picard Toolkit.” 2019. Broad Institute, GitHub Repository.
   [262]https://broadinstitute.github.io/picard/; Broad Institute), as well
   as reads that were not primary alignments, were unmapped, mapped to
   multiple locations, or contained > 4 mismatches (samtools)^[263]76.
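
   The read-level criteria can be expressed as a single predicate over a
   SAM record, sketched below. The flag bits and the NM tag follow the SAM
   format, but three choices are assumptions made for illustration:
   supplementary alignments are treated as non-primary, a mapping quality
   of 0 is used as a proxy for multi-mapping reads, and NM (edit distance)
   stands in for the mismatch count. The actual samtools and Picard
   invocations used in this study are not reproduced here.

```python
# SAM flag bits, as defined in the SAM format specification
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def keep_read(sam_line: str, max_mismatches: int = 4, min_mapq: int = 1) -> bool:
    """Return True if a SAM record passes all filters described in the text."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    # Drop unmapped, non-primary and duplicate-marked reads.
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY | DUPLICATE):
        return False
    # Assumption: MAPQ 0 marks reads that map equally well to several loci.
    if mapq < min_mapq:
        return False
    # Optional fields start at column 12; NM:i:<n> is the edit distance.
    for tag in fields[11:]:
        if tag.startswith("NM:i:") and int(tag[5:]) > max_mismatches:
            return False
    return True


def make_record(flag: int, mapq: int, nm: int) -> str:
    """Build a toy SAM line (hypothetical read) for demonstration."""
    return "\t".join(["r1", str(flag), "chr1", "100", str(mapq), "50M",
                      "*", "0", "0", "ACGT", "IIII", f"NM:i:{nm}"])


good = keep_read(make_record(flag=0, mapq=60, nm=2))
duplicate = keep_read(make_record(flag=1024, mapq=60, nm=0))
mismatched = keep_read(make_record(flag=16, mapq=60, nm=5))
multimapped = keep_read(make_record(flag=0, mapq=0, nm=0))
print(good, duplicate, mismatched, multimapped)
```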

   Deeptools^[264]77 bamCoverage was used to create normalized bigWig
   files using CPM (counts per million) normalization. We further used
   deeptools computeMatrix, plotHeatmap and plotProfile for visualization
   of ChIP-Seq data at TSS and TTS.
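
   For orientation, CPM normalization scales the read count in each bin
   by the number of mapped reads in millions. The sketch below shows only
   that arithmetic on toy numbers; it does not call deepTools, and the bin
   counts and library size are hypothetical.

```python
from typing import List, Optional


def cpm(bin_counts: List[int], total_mapped_reads: Optional[int] = None) -> List[float]:
    """Counts per million: reads in each bin per million mapped reads."""
    total = sum(bin_counts) if total_mapped_reads is None else total_mapped_reads
    if total <= 0:
        raise ValueError("total mapped reads must be positive")
    return [count * 1e6 / total for count in bin_counts]


# Hypothetical coverage for three bins in a library of 2 million mapped reads.
normalized = cpm([10, 30, 60], total_mapped_reads=2_000_000)
print(normalized)
```

   Scaling by library size in this way is what allows signal tracks from
   samples sequenced to different depths to be plotted on a common axis.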

   MACS2 callpeak^[265]78 was used to identify ChIP-Seq peaks relative to
   input data. To compare peaks from different conditions, we retained
   peaks with -log10(P-value) > 2 from the narrowPeak output files. The
   ChIPpeakAnno package^[266]79 was used in R to create Venn diagrams of
   colocalized peaks in different conditions.
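
   A minimal sketch of the peak filter is given below. It relies on the
   standard narrowPeak layout, in which the eighth column holds the
   -log10 P-value; the function name and the example peaks are
   hypothetical.

```python
from typing import Iterable, List


def filter_narrowpeak(lines: Iterable[str], min_neglog10_p: float = 2.0) -> List[List[str]]:
    """Keep narrowPeak records whose -log10(P-value) exceeds the threshold."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        # Columns: chrom, start, end, name, score, strand,
        #          signalValue, pValue (-log10), qValue (-log10), summit offset
        if float(fields[7]) > min_neglog10_p:
            kept.append(fields)
    return kept


# Two hypothetical peaks; only the first passes the threshold of 2.
example = [
    "chr1\t100\t200\tpeak1\t50\t.\t5.0\t3.2\t1.5\t40",
    "chr1\t300\t400\tpeak2\t10\t.\t2.0\t1.1\t0.2\t30",
]
kept_names = [fields[3] for fields in filter_narrowpeak(example)]
print(kept_names)
```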

RNA-seq

   HeLa cells were transfected with the respective siRNAs as per the
   experimental conditions mentioned. 48 h post transfection, samples were
   sent to Zymo Research for Total RNA-Seq Service. Total RNA-Seq
   libraries were constructed from 500 ng of total RNA. Libraries were
   prepared using the Zymo-Seq RiboFree Total RNA Library Prep Kit (Zymo
   Research Corp., #R3000) according to the manufacturer’s instructions.
   Briefly, RNA was reverse transcribed into cDNA, which was followed by
   ribosomal RNA depletion. A partial P7 adapter sequence was then
   ligated to the 3’ end of the cDNAs, followed by second-strand synthesis
   and partial P5 adapter ligation to the 5’ end of the double-stranded DNAs.
   Lastly, libraries were amplified to incorporate full length adapters
   under the following conditions: initial denaturation at 95 °C for
   10 min; 10–16 cycles of denaturation at 95 °C for 30 sec, annealing at
   60 °C for 30 sec, and extension at 72 °C for 60 sec; and final
   extension at 72 °C for 7 min. Successful library construction was
   confirmed with Agilent’s D1000 ScreenTape Assay on TapeStation. RNA-Seq
   libraries were sequenced on an Illumina NovaSeq to a sequencing depth
   of at least 30 million read pairs (150 bp paired-end sequencing) per
   sample.

RNA-seq analysis

   RNA-Seq reads were aligned to the GRCh37 human genome using the STAR
   RNA-Seq aligner^[267]80, and reads mapping to transcripts were counted
   using the GenomicAlignments package in Bioconductor^[268]81,[269]82.
   Fold changes between siRAD52 and siNT cells were obtained from
   DESeq2^[270]83, which was run on raw read counts with siRAD52/siNT
   samples treated as pairs. P-values obtained from DESeq2 were corrected
   for multiple testing using the Benjamini-Hochberg method.
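
   For readers unfamiliar with the Benjamini-Hochberg correction, a
   minimal standalone sketch is given below. It illustrates the
   adjustment only; DESeq2 applies the correction internally (together
   with its own filtering steps), and the function name here is
   hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```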

   In the RNA Pol II and RAD52 ChIP-seq data, 10% of RNA Pol II peaks
   overlapped with RAD52 peaks. We therefore asked whether expression
   changes in genes near these overlapping peaks were influenced by
   RAD52. For this analysis, we retrieved genes whose transcription start
   sites were covered by the overlapping peaks. We first examined
   expression levels of these genes using DESeq2 normalized counts. Most
   genes did not display a significant change between siRAD52 and siNT
   cells. Next, we compared fold changes between siRAD52 and siNT cells
   for genes that overlapped with RAD52 peaks and genes that did not.
   Because the relationship between averaged expression and fold change
   in the DESeq2 output showed higher variance of fold changes for lowly
   expressed genes, we excluded genes with an averaged expression level
   <50 from the comparison. Student’s t-test was used to test for
   differences in gene fold changes. Finally, pathway enrichment analysis
   was performed for the genes overlapping with RAD52 peaks using
   clusterProfiler^[271]84.
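
   The expression filter and the fold-change comparison can be sketched
   as below. The dictionary key `base_mean` and both function names are
   assumptions made for illustration; the sketch returns only the t
   statistic and degrees of freedom, and a P-value would additionally
   require the t distribution (for example from a statistics library).

```python
from statistics import mean, variance


def filter_by_expression(genes, min_base_mean=50.0):
    """Drop lowly expressed genes (averaged expression below the cutoff).

    `genes` is a list of dicts with a hypothetical `base_mean` key.
    """
    return [g for g in genes if g["base_mean"] >= min_base_mean]


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    std_err = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / std_err, df
```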

DSB repair assays

   The SSA reporter plasmid hprtSAGFP (Addgene, #41594) and the HR
   reporter plasmid pDRGFP (Addgene, #26475) were gifts from Maria Jasin.

   HCT116 WT or RAD52^-/- cells were transfected with 0.5 µg DR-GFP or
   SA-GFP and 1.5 µg pCBASceI (Addgene, #26477) using Lipofectamine 3000
   as described in the cell culture and transfection section. 72 h later,
   the cells were harvested, and percentages of GFP-positive cells per
   100,000 cells were determined by flow cytometry (HTFC Screening System,
   IntelliCyt). For each experiment, the percentage of GFP-positive cells
   in the empty vector control was subtracted from that of the
   I-SceI-transfected cells. Flow cytometry data were analyzed using BD
   FlowJo (v.10.6.2).

DNA fiber analysis

   Cells were seeded onto a 6-well plate and allowed to grow for 24 h. Cells
   were sequentially labeled with thymidine analogs: 25 µM CldU
   (5-Chloro-2′-deoxyuridine) and 250 µM IdU (iododeoxyuridine) for 30 min
   each. The reaction was terminated by addition of ice-cold PBS and cells
   were trypsinized. Cells were lysed with lysis buffer (50 mM EDTA pH
   8.0, 0.5% SDS, 200 mM Tris-Cl pH 7.5) on a clean slide and incubated
   for 5–7 min. After cell lysis, DNA was spread on glass slides, and the
   slides were tilted at an angle of 25°. Slides were air dried, fixed in
   methanol/acetic acid (3:1) for 10 min, and denatured with 2.5 M HCl
   for 1 h. Slides were blocked with 5% BSA in PBST (10 mM sodium
   phosphate, 0.15 M NaCl, 0.1% Tween™ 20 buffer at pH 7.5) for 1 h and
   stained with primary antibody Anti-BrdU (5-bromo-2′-deoxyuridine)
   monoclonal antibody from rat (abcam #ab6326; 1:50) and Anti-BrdU
   (5-bromo-2′-deoxyuridine) monoclonal antibody from mouse (BD
   Biosciences #347580; 1:100) for 2 h. Slides were washed 3 times with
   PBST. Slides were incubated with chicken anti-rat Alexa Fluor 488
   (Invitrogen, #A21470; 1:300) and rabbit anti-mouse Alexa Fluor 594
   (Invitrogen, #A11062; 1:200) secondary antibodies for 1 h. Slides were
   washed 3 times with PBST, and coverslips were mounted with mounting
   medium. DNA fibers were visualized on a Nikon spinning disk confocal
   microscope at 60X, and images were collected and analyzed with ImageJ.

Cell cycle analysis

   HeLa cells were transfected as described and harvested 48 h post
   transfection. Cells were then washed with PBS and fixed in 1 ml cold
   70% ethanol for at least 30 min on ice. Cells were pelleted and washed
   with PBS. The cells were resuspended in the staining solution (0.1%
   Triton X-100, 200 µg/mL RNase A and 50 µg/mL propidium iodide in PBS)
   and incubated for 15 min at 37 °C in the dark. 50,000 cells per
   condition were analyzed by flow cytometry using the LSR Fortessa
   instrument (BD Biosciences). BD FACSDiva software was used with the BD
   Biosciences LSR Fortessa Analyzer for flow cytometry data acquisition.
   Flow cytometry data were analyzed using BD FlowJo (v.10.6.2).

R-loop consensus analysis

   To identify consensus R-loop regions in the human genome, we sourced
   23 published R-loop bigWig files from the UCSC genome browser^[272]52.
   These files were subsequently converted to bed format utilizing the
   ‘bigWigToBedGraph’ tool
   ([273]https://genome.ucsc.edu/goldenPath/help/bigWig.html). Strands
   were merged, and replicates were consolidated with the ‘bedtools
   unionbedg’ function^[274]85. Broad peaks were then identified using the
   ‘macs2 bdgpeakcall’ function^[275]86. The average intensity score for
   the bed regions was ascertained with our custom
   ‘calculate_mean_intensity_score.pl’ script. On assessing the
   correlation of peak scores across the 23 tracks, five tracks exhibited
   discrepancies and were consequently excluded from further analyses. The
   remaining 18 bed files were merged using the ‘bedtools multiinter’
   function, including all R-loop regions without any filters. These
   regions were subtracted from the entire human hg19 reference genome, to
   obtain genomic regions devoid of any R-loop. To pinpoint the consensus
   R-loop region, we filtered out peak scores below 200, merged regions
   from varying tracks with the ‘bedtools multiinter’ function, and
   further refined this merged file to capture regions with a minimum of
   five overlapping tracks and a maximum length of 5000 bp. The resulting
   consensus R-loop region and control areas were employed to analyze
   R-loop enrichment across different genomic sections and to evaluate
   breakpoint density within R-loop regions subsequently. All scripts used
   in processing and the derived consensus R-loop and control regions can
   be obtained at
   [276]https://github.com/ipstone/rloop_genome_instability.
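
   The consensus filter described above (peak score of at least 200,
   support from at least five tracks, maximum length of 5000 bp) can be
   sketched with a simple sweep over interval boundaries. This is a
   simplified illustration for a single chromosome, not the authors’
   pipeline: it assumes half-open intervals that do not overlap within a
   track, it merges adjacent sub-intervals that stay above the track
   threshold, and the function name and input layout are hypothetical.
   The scripts in the repository above remain the reference.

```python
from itertools import groupby


def consensus_regions(tracks, min_score=200.0, min_tracks=5, max_len=5000):
    """Return (start, end) regions supported by at least `min_tracks` tracks.

    `tracks` is a list with one entry per track; each entry is a list of
    (start, end, score) peaks on the same chromosome.
    """
    events = []
    for peaks in tracks:
        for start, end, score in peaks:
            if score >= min_score:  # discard peak scores below the cutoff
                events.append((start, 1))
                events.append((end, -1))
    events.sort()

    regions, depth, region_start = [], 0, None
    # Apply all boundary events at one position together, so that a peak
    # ending where another begins does not split a region.
    for pos, group in groupby(events, key=lambda e: e[0]):
        new_depth = depth + sum(delta for _, delta in group)
        if depth < min_tracks <= new_depth:
            region_start = pos
        elif new_depth < min_tracks <= depth:
            if pos - region_start <= max_len:
                regions.append((region_start, pos))
            region_start = None
        depth = new_depth
    return regions
```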

Consensus overlap analysis with genomic mutational signatures

   The mutational calls (SNVs, indels and structural variants) were
   downloaded from PCAWG (264 liver, 239 pancreatic, 189 prostate, 71
   ovarian and 70 melanoma tumors)^[277]56, ICGC/BRCA-EU project (320 ER+
   breast tumors)^[278]55 and TCGA cohorts (377 liver, 185 pancreatic, 500
   prostate, 587 ovarian, 470 melanoma, and 713 ER+ breast
   tumors)^[279]54. The gene annotations for transcription start sites
   (TSS) and transcription end sites (TES) were downloaded from
   [280]GENCODE (‘gencode.v19.annotation.gtf’ data file). A window of
   ±1 kb was added around each TSS and TES, and these windows were
   intersected with the previously prepared R-loop positive regions as
   well as the R-loop negative regions. The resulting bed files were
   intersected with the genomic mutation data to calculate the density of
   SNVs, indels and structural variant breakpoints in these regions. For
   SNVs and indels, the count of mutations was calculated using the
   ‘SigProfilerMatrixGenerator’ function from the SigProfiler package
   ([281]https://github.com/AlexandrovLab/SigProfilerMatrixGenerator);
   these counts were then divided by the genomic region length to obtain
   mutation densities for the respective regions. Similarly, for the
   structural variant breakpoints, the downloaded structural variant data
   were intersected with the respective R-loop positive and negative
   genomic regions using the ‘bedtools pairtobed’ function. After
   intersection, SV breakpoint densities were calculated in the same way
   as for SNVs and indels. The density calculations and comparisons were
   all done in the R statistical software and visualized in Prism.
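
   The window and density arithmetic can be illustrated with the short
   sketch below. It is a toy version for a single chromosome with
   half-open, non-overlapping regions; the function names are
   hypothetical, and the actual analysis used bedtools,
   SigProfilerMatrixGenerator and R as described above.

```python
def window_around(site, flank=1000):
    """Half-open window of +/- `flank` bp around a TSS or TES position."""
    return (max(0, site - flank), site + flank)


def mutation_density(positions, regions):
    """Mutations per bp across a set of non-overlapping regions.

    `positions` are mutation coordinates; `regions` are (start, end)
    half-open intervals on the same chromosome.
    """
    total_len = sum(end - start for start, end in regions)
    if total_len == 0:
        raise ValueError("regions have zero total length")
    count = sum(
        1 for p in positions
        if any(start <= p < end for start, end in regions)
    )
    return count / total_len
```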

   For the comparison of double-strand break density between head-on (HO)
   and co-directional (CD) collisions, the HO/CD regions published in the
   aforementioned paper^[282]44 were downloaded and intersected with
   R-loop consensus positive regions and negative regions. A ±1 kb window
   was added to the R-loop consensus region before intersection. The
   double-strand break density (indels and SVs) was calculated in the
   same way as for the TSS/TES regions above. All the analysis code and
   accompanying input files are available at the following GitHub
   repository: [283]https://github.com/ipstone/rloop_genome_instability.

Evaluating association between RAD52 RNA expression and genomic alterations
at R-loop

   To investigate whether down-regulation of RAD52 is associated with
   increased alterations in R-loops, we performed an analysis of tumor
   RNA-seq data in the PCAWG cohort. Specifically, we downloaded PCAWG
   RNA-seq data from [284]ICGC and extracted RAD52 Fragments Per Kilobase
   of transcript per Million mapped reads (FPKM) values, which were
   further transformed into Transcripts Per Million (TPM) values. To
   eliminate confounding factors stemming from homologous recombination
   deficiency (HRD), HRD cases were excluded based on the results defined
   by Nguyen et al^[285]87.
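
   The FPKM-to-TPM transformation is commonly computed by rescaling each
   sample’s FPKM values so that they sum to one million. The sketch below
   shows that standard conversion for one sample; it is assumed, not
   confirmed, that the original analysis used exactly this form, and the
   function name and second gene label are hypothetical.

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, computed within a sample.
    """
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("sum of FPKM values must be positive")
    return {
        gene: value / total * 1_000_000
        for gene, value in fpkm_by_gene.items()
    }
```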

   Utilizing the RAD52 TPM values, tumors were categorized into RAD52-high
   and RAD52-low groups, representing the top 25% and bottom 25% of
   samples, respectively.
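
   A minimal sketch of this grouping is shown below, assuming a mapping
   from sample identifier to RAD52 TPM. How ties and group sizes that are
   not whole numbers were handled in the original analysis is not stated;
   here the group size is simply rounded down, and the function name is
   hypothetical.

```python
def split_by_expression(tpm_by_sample, fraction=0.25):
    """Return (low, high) sample lists: bottom and top `fraction` by TPM."""
    ranked = sorted(tpm_by_sample, key=tpm_by_sample.get)
    k = int(len(ranked) * fraction)  # group size, rounded down
    return ranked[:k], ranked[len(ranked) - k:]
```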

   To compare structural variants (SVs) and indels in R-loop regions, we
   calculated the average number of SVs and indels per Mbp for each
   sample using the formula: (N_SV + N_indel) / (R-loop segment size in
   bp) × 1,000,000. Finally, an unpaired t-test with Welch’s correction
   was used to compare the levels of SVs and indels in R-loop regions
   between RAD52-high and RAD52-low tumors.
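
   The per-sample rate and the Welch statistic can be written out as in
   the sketch below. The segment size is assumed to be in base pairs, the
   function names are hypothetical, and only the t statistic and the
   Welch-Satterthwaite degrees of freedom are returned; the P-value was
   obtained with standard statistical software in the original analysis.

```python
from statistics import mean, variance


def alterations_per_mbp(n_sv, n_indel, rloop_bp):
    """SVs plus indels per megabase of R-loop sequence for one sample."""
    return (n_sv + n_indel) / rloop_bp * 1_000_000


def welch_t(group_a, group_b):
    """Welch's unequal-variance t statistic and degrees of freedom."""
    var_a = variance(group_a) / len(group_a)
    var_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / (var_a + var_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t, df
```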

Image processing and data analysis

   For PLA, S9.6 and γH2AX experiments, slides were imaged at 60X
   (immersion oil) with a Nikon spinning disk confocal microscope. PLA,
   S9.6 and γH2AX foci per nucleus were calculated using Nikon Elements
   AR Analysis Explorer (version 5.21.03), where DAPI was used as a mask
   for the nucleus. The number of PLA, S9.6 or γH2AX foci was counted
   for each DAPI-masked nucleus to obtain the average number of foci per
   nucleus in each condition.

Statistical analysis

   Statistical analysis was carried out by unpaired two-tailed t-test
   (unless stated otherwise) using GraphPad Prism Version 10.2.1 for
   Windows (GraphPad Software, San Diego, CA, USA). All values are
   expressed as mean ± standard error of the mean (SEM). p-values <0.05
   were considered statistically significant. ns: non-significant.
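
   For reference, the mean and SEM reported throughout can be computed as
   in the following sketch, which uses the sample standard deviation
   divided by the square root of the number of observations. The function
   name is hypothetical; the values in this study were calculated in
   GraphPad Prism.

```python
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for the SEM")
    return mean(values), stdev(values) / len(values) ** 0.5
```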

Reporting summary

   Further information on research design is available in the [286]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [287]Supplementary Information^ (4.1MB, pdf)
   [288]Peer Review File^ (1.3MB, pdf)
   [289]41467_2024_51784_MOESM3_ESM.pdf^ (125.9KB, pdf)

   Description of Additional Supplementary Files
   [290]Supplementary Data 1^ (2.9MB, xlsx)
   [291]Supplementary Data 2^ (4.1MB, xlsx)
   [292]Supplementary Data 3^ (23.6KB, xlsx)
   [293]Supplementary Data 4^ (43.5KB, xlsx)
   [294]Supplementary Data 5^ (50.6KB, xlsx)
   [295]Supplementary Data 6^ (11.5KB, xlsx)
   [296]Reporting Summary^ (5.8MB, pdf)

Source data

   [297]Source Data^ (4.7MB, zip)

Acknowledgements