Abstract
Collisions of the transcription and replication machineries on the same
DNA strand can pose a significant threat to genomic stability. These
collisions occur in part due to the formation of RNA-DNA hybrids termed
R-loops, in which a newly transcribed RNA molecule hybridizes with the
DNA template strand. This study investigated the role of RAD52, a known
DNA repair factor, in preventing collisions by directing R-loop
formation and resolution. We show that RAD52 deficiency increases
R-loop accumulation, exacerbating collisions and resulting in elevated
DNA damage. Furthermore, RAD52’s ability to interact with the
transcription machinery, coupled with its capacity to facilitate R-loop
dissolution, highlights its role in preventing collisions. Lastly, we
provide evidence of an increased mutational burden from double-strand
breaks at conserved R-loop sites in human tumor samples, which is
further elevated in tumors with low RAD52 expression. In summary, this study
underscores the importance of RAD52 in orchestrating the balance
between replication and transcription processes to prevent collisions
and maintain genome stability.
Subject terms: Stalled forks, DNA damage response, Genomic instability
__________________________________________________________________
Collisions of transcription and replication machineries on the same DNA
strand threaten genomic stability. Here, the authors show that RAD52
prevents these collisions by regulating R-loop formation and
resolution. RAD52 deficiency leads to increased R-loops, exacerbated
collisions, DNA damage, and higher mutational burden in tumors.
Introduction
Transcription and replication are two tightly regulated processes
necessary for gene expression and DNA duplication respectively, both of
which are essential for cellular integrity. It is imperative for the
cell to maintain temporal and spatial separation of these two processes
to prevent them from colliding (transcription-replication conflicts
(TRCs)), which can result in replication stress and DNA damage,
ultimately leading to genome instability and tumorigenesis. Aberrant
accumulation of secondary structures such as R-loops has been
implicated as a major source of TRCs^[58]1,[59]2. R-loops are
three-stranded RNA-DNA hybrids that are formed transiently during
transcription when the nascent RNA anneals back to the template DNA and
displaces the non-template strand within the RNA polymerase active
site^[60]3. Traditionally, these hybrids have been shown to play an
important physiological role in gene activation, termination, and
chromatin regulation. However, transcriptional dysregulation, both in
the form of gene overexpression and aberrant RNA polymerase II (Pol II)
pausing, has been associated with the accumulation and persistence of
pathological R-loops^[61]4–[62]6. R-loops pose a significant threat to
DNA replication as transcription and replication translocate on the
same DNA template.
As TRCs pose a considerable threat to genomic integrity, mechanisms to
manage the collisions are required to prevent them from causing undue DNA
damage^[63]4,[64]7. Prevention mechanisms include limiting the
accumulation of R-loops by assembling RNA-binding proteins on nascent
RNA^[65]8,[66]9; regulating topological stress associated with
transcription and chromatin architecture^[67]10–[68]12; removal of
R-loops via nucleases^[69]13,[70]14 or RNA-DNA helicases^[71]15,[72]16;
and the subsequent repair of the damage resulting from
TRCs^[73]17–[74]21. In contrast, recent studies have proposed that
R-loops can play a major role in double-strand break (DSB) repair at
transcriptionally active loci via homologous recombination (HR), by
providing a scaffold for the recruitment of DNA repair factors to the
site of damage^[75]22,[76]23. Unexpectedly, RAD52 emerged as a common
factor in all these R-loop associated DSB repair
pathways^[77]24–[78]28.
Human RAD52, a protein with known DNA-binding ability, has been
associated with DSB repair owing to its role as a back-up HR repair
factor^[79]29 and its synthetic lethal relationship in BRCA-deficient
cancers^[80]30–[81]32. Surprisingly, recent studies have shown strong
RNA-binding ability for RAD52 in vitro supporting a role in resolving
transcription associated DSBs^[82]24–[83]28. However, given the
interaction of RAD52 with RNA^[84]26,[85]33,[86]34 and Pol II^[87]35,
it is conceivable that RAD52 could also be involved upstream in the
regulation of R-loops themselves.
To understand RAD52’s role in R-loop management, we performed mass
spectrometry (MS) analysis of the RAD52 protein interactome. We found
that RAD52 predominantly interacts with proteins engaged in the
transcription complex, suggesting that RAD52 recruitment to R-loop
sites may be facilitated via this interaction. We observed that loss of
RAD52 induces elevated levels of Pol II pausing and R-loop accumulation
leading to increased TRCs and genomic instability. Furthermore, we
identified a role for the previously uncharacterized C-terminal domain
of RAD52, in that it is essential for RAD52’s interaction with Pol II
and helps recruit Topoisomerase IIα (TOP2A) to R-loops, in order to
alleviate torsional stress and aid in resolving TRCs. Additionally, we
found direct evidence of increased mutational scars at R-loop forming
regions across tumor types and these were exacerbated in tumors with
low levels of RAD52 expression. This study supports a direct role for
RAD52 at R-loops and shows that its absence contributes to increased
R-loop associated genomic instability.
Results
RAD52 interacts with the transcriptional complex and co-localizes with RNA
Pol II
We first sought to identify RAD52-interacting proteins under
physiological conditions in an unbiased manner by performing an
immunoprecipitation (IP) of RAD52 expressed as a fusion with an
N-terminal HA-tag, followed by MS (Fig. [88]1a, b, Supplementary
Fig. [89]1a). MS identified 212 proteins significantly enriched over
the HA-tag control. Reassuringly, RPA1, a critical sub-unit of the RPA
complex and a known interactor of RAD52^[90]36–[91]39 was identified in
this analysis (Fig. [92]1b, c, Supplementary Data [93]1). However, the
majority of hits identified had an RNA-associated role as depicted by
the gene ontology (GO) analysis (Supplementary Fig. [94]1b, c), rather
than DNA repair, suggesting that RAD52 has a strong interaction with
the transcription machinery (as highlighted in Fig. [95]1b). To
validate this observation, we performed both a co-immunoprecipitation
(Co-IP) and a proximity ligation assay (PLA) between endogenous RAD52
and Pol II, finding clear evidence for their interaction independent of
DNA or RNA (Fig. [96]1c–f, Supplementary Fig. [97]1d), corroborating
previous observations of this interaction seen with over-expressed
RAD52^[98]35.
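As an illustration of the enrichment statistics behind a volcano plot of this kind, the following minimal sketch computes, for a single protein, the mean log2 fold change between bait and control intensities and the equal-variance (Student's) two-sample t statistic. The function name and the intensity values are hypothetical and are not taken from the study's data or analysis pipeline; conversion of the statistic to a p-value and multiple-testing adjustment are omitted.

```python
import math
from statistics import mean, variance


def log2_fold_change_and_t(control, bait):
    """Return (mean log2 fold change, Student's t statistic) for one protein.

    `control` and `bait` are lists of raw (non-log) intensities, one value
    per biological replicate. Equal variance between groups is assumed.
    """
    log_c = [math.log2(x) for x in control]
    log_b = [math.log2(x) for x in bait]
    n_c, n_b = len(log_c), len(log_b)

    # Difference of mean log2 intensities = mean log2 fold change.
    diff = mean(log_b) - mean(log_c)

    # Pooled sample variance for the equal-variance two-sample t-test.
    pooled = ((n_c - 1) * variance(log_c) + (n_b - 1) * variance(log_b)) / (
        n_c + n_b - 2
    )
    t_stat = diff / math.sqrt(pooled * (1 / n_c + 1 / n_b))
    return diff, t_stat


# Hypothetical intensities for one protein, n = 3 replicates per condition.
control = [100.0, 200.0, 400.0]   # HA-tag control pulldown
bait = [800.0, 1600.0, 3200.0]    # HA-RAD52 pulldown
lfc, t = log2_fold_change_and_t(control, bait)
print(f"log2 fold change = {lfc:.2f}, t = {t:.3f}")
```

In practice each protein would be tested in this way and the resulting p-values adjusted for multiple comparisons before thresholding.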
Fig. 1. RAD52 association with the transcriptional complex.
a Schematic representation of the workflow for the identification of
RAD52 interacting proteins. HA-control and HA-RAD52 immunoprecipitation
was performed in HEK293T cells using α-HA tagged magnetic beads for the
pulldown followed by mass spectrometry (MS). b Volcano plot of the
proteins identified in RAD52 IP-MS in n = 3 biologically independent
experiments. Mean log2 fold change in protein intensities on the x-axis
of all replicates between HA and HA-RAD52 are plotted against the
−log10 adjusted p-value (Student’s two-sided t-test with equal
variance) on the y-axis. 212 proteins were identified to be
significantly enriched. Significantly enriched proteins in blue
(p < 0.05) and non-significant in grey. c Co-immunoprecipitation of
endogenous RAD52 binding proteins in HeLa cells. RAD52 and IgG
antibodies were used to immuno-precipitate proteins and analyzed by
immunoblotting with indicated antibodies. Results reproducible for at
least 2 biological replicates. d Schematic representation of PLA to
visualize proximity of RAD52 protein and RNA Pol II. e Representative
images of the nuclear PLA foci (α-RAD52: α-RNA Pol II S2) across stated
conditions (Scale bar 10 µm). f Quantitative analysis of nuclear PLA
foci from (e). Data are plotted as mean ± SEM. The data presented
shows ≥ 500 nuclei from 3 biological replicates; p-values calculated
using unpaired two tailed t-tests. g Metagene plots showing the
distribution of the RNA Pol II and RAD52 Chromatin immunoprecipitation
sequencing (ChIP-seq) peaks (IP/input) in HeLa cells across genes and
the flanking regions ( ± 10 kb). TSS: Transcription Start Site, TES:
Transcription End Site. h Heatmap representing RNA Pol II and RAD52
ChIP-seq tracks, centered at the TSS and TES ± 10 kb, and rank-ordered
according to RNA Pol II occupancy. i Bar chart showing how RNA Pol II
and RAD52 peaks are distributed across different genomic regions as
indicated. Peaks were obtained with MACS2. Genome wide distribution is
shown on top for comparison. j Venn diagram showing the overlap of
peaks RNA Pol II ChIP and RAD52 ChIP according to MACS2 across the
genome. k A representative snapshot of chromosome 19 depicting RNA Pol
II (red) and RAD52 (green) ChIP binding sites in control HeLa cells.
Input DNA (grey) represents a negative control for background
normalization. Schematics in Fig. 1 (a) and (d) were created with
BioRender.com released under a Creative Commons
Attribution-NonCommercial-NoDerivs 4.0 International license. Source
data are provided as a Source Data file.
To understand the spatial resolution of the RAD52-Pol II interaction,
we performed a chromatin immunoprecipitation sequencing (ChIP-seq) of
endogenous Pol II and RAD52. We observed that there is a significant
coincidence of Pol II and RAD52 peaks across the genome, with increased
enrichment seen at transcription start sites (TSS) and transcription
end sites (TES) (Fig. [101]1g–k, Supplementary Fig. [102]1e).
Approximately 40% of all RAD52 peaks were associated with Pol II,
though only a subset of Pol II peaks colocalized with RAD52 (~10%),
suggesting that RAD52 is recruited to only a subset of all sites of
transcription (Fig. [103]1j).
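The asymmetry between the two percentages arises because the same set of overlapping sites is divided by two different peak totals. The sketch below illustrates this with toy intervals; the coordinates and the helper name are hypothetical and do not represent the MACS2 peak calls or the overlap tool used in the study.

```python
def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap at least one reference peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in query:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < r_end and r_start < end
               for r_start, r_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(query) if query else 0.0


# Toy peak sets (hypothetical coordinates).
rad52_peaks = [
    ("chr1", 100, 200), ("chr1", 500, 600),
    ("chr2", 50, 150), ("chr2", 900, 1000),
    ("chr3", 10, 20),
]
pol2_peaks = [
    ("chr1", 150, 250), ("chr1", 1000, 1100), ("chr1", 2000, 2100),
    ("chr2", 100, 120), ("chr2", 3000, 3100),
    ("chr3", 500, 600), ("chr3", 700, 800),
    ("chr4", 1, 50),
]

rad52_frac = fraction_overlapping(rad52_peaks, pol2_peaks)  # 2 of 5 peaks
pol2_frac = fraction_overlapping(pol2_peaks, rad52_peaks)   # 2 of 8 peaks
print(rad52_frac, pol2_frac)
```

Because Pol II peaks are the larger set, the same shared sites make up a smaller fraction of Pol II peaks than of RAD52 peaks.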
RAD52 prevents TRCs by reducing Pol II pausing and decreasing the level of
pathologic R-loops
We next sought to better understand the function of RAD52 as an
accessory factor associated with the transcription complex. We observed
that depletion of RAD52 leads to increased Pol II pausing at the TSS, as
demonstrated by increased accumulation of Pol II peaks at the TSS
specifically in the subset of genes that had Pol II-RAD52 co-occupancy
(Fig. [104]2a, b, Supplementary Fig. [105]2a, b), whereas loss of RAD52 had
no effect on the Pol II profiles of the other genes (Supplementary
Fig. [106]2c). Furthermore, this did not affect global gene expression
profiles (Supplementary Fig. [107]2d, Supplementary Data [108]2),
suggesting RAD52 does not alter transcriptional profiles in cells.
Fig. 2. Loss of RAD52 increases R-loop formation and exacerbates
transcription-replication conflicts.
a Representative snapshot of chromosome 9 depicting RNA Pol II
occupancy from ChIP-seq analysis (IP/input) in siNT (red) and siRAD52
(dark red) transfected HeLa cells. b Metagene plot showing the
distribution of the RNA Pol II occupancy at the TSS and flanking
regions ( ± 10 kb) of genes with overlapping RNA Pol II and RAD52
peaks. Plots shown: siNT (control) and siRAD52 transfected HeLa cells.
c ChIP-seq of RNA Pol II (red), RAD52 (green) and S9.6 (R-loops;
blue) occupancy in control HeLa cells. A representative snapshot of
chromosome 21 is shown. Input (grey) DNA as negative control for
background normalization. d Venn diagram of the percentage of genes
overlapping with RNA Pol II, RAD52 and S9.6 ChIP peaks (MACS2). e
Representative images of S9.6 immunostaining to detect R-loops in siNT
(control) and siRAD52 transfected HeLa cells. RNase H treatment was
added as a negative control to eliminate R-loops (Scale bar 10 µm). f
Quantitative analysis of nuclear S9.6 foci across stated conditions
from (e). Data plotted as box and whiskers. Boxes extend from the
25th–75th percentiles, with the median displayed as a line. The
whiskers mark the minimum (1st percentile) and maximum (99th
percentile). The data presented shows ≥ 500 nuclei from 3 biological
replicates; p-values calculated using unpaired two tailed t-tests. g
Schematic representation of PLA to visualize proximity of PCNA and RNA
Pol II to measure TRCs. The schematic illustration was created with
BioRender.com released under a Creative Commons
Attribution-NonCommercial-NoDerivs 4.0 International license. h
Representative images of the nuclear PLA foci (α-PCNA: α-RNA Pol II S2)
across stated conditions (Scale bar 10 µm). i Quantitative analysis of
nuclear PLA foci from (h). Data are plotted as mean ± SEM. The data
presented shows ≥ 500 nuclei from 3 biological replicates; p-values
calculated using unpaired two tailed t-tests. Source data are provided
as a Source Data file.
Increased Pol II pausing has been known to be associated with R-loop
accumulation^[111]40. We wanted to see if this holds true at RAD52
associated Pol II pausing sites (Fig. [112]2a, b). To this end, we
performed ChIP-seq analysis of R-loop associated peaks using the S9.6
antibody that has been characterized to specifically interact with
RNA-DNA hybrids^[113]41. We observed that 22% of RAD52 peaks associated
with R-loops, albeit at a frequency lower than its association with Pol
II (41%), suggesting that RAD52 may associate with Pol II independently
of R-loops (Fig. [114]2c, d). Upon comparing the RAD52 interactome
(Fig. [115]1b) with that of R-loops^[116]42, it was apparent that the
majority of proteins which associated with RAD52 also associated with
R-loops (Supplementary Fig. [117]2e, f, Supplementary Data [118]3),
indicating that RAD52 may play an important role in R-loop homeostasis.
We next set out to investigate the effect of RAD52 loss on global
R-loop levels. We observed that RAD52 loss in cells led to a
significant increase in global S9.6 signal (Fig. [119]2e, f,
Supplementary Fig. [120]3a–h), comparable to depletion of Aquarius
(AQR), a known R-loop resolution factor^[121]15 (Supplementary
Fig. [122]3i, j). Furthermore, the observed S9.6 signal was sensitive
specifically to RNase H treatment but not RNase III (Fig. [123]2e, f,
Supplementary Fig. [124]3b–d). RNase H specifically degrades the RNA strand of
RNA-DNA hybrids, confirming that the signal reflects R-loops rather than
other non-specific RNA species under the given conditions^[125]43. Prior work
has suggested that increased R-loops pose a threat to replication,
leading to increased TRCs^[126]44. In order to understand the
physiological consequences of increased R-loops in RAD52 deficient
backgrounds, we performed a PLA between Pol II and PCNA, an essential
component of the replisome (Fig. [127]2g, Supplementary Fig. [128]3k,
l). We found a significant increase in TRCs observed with the loss of
RAD52, which was further amplified with increased R-loops in an
AQR-deficient context (Fig. [129]2h, i), implicating RAD52 as a
mediator of TRC resolution. This effect of RAD52 was also found in
Senataxin (SETX)^[130]45 depleted cells (Supplementary Fig. [131]3m,
n), confirming that the effect of RAD52 loss on TRCs was caused by the
presence of increased R-loops, regardless of their origin. However, it is
worth noting that the RAD52-Pol II interaction is not limited to the
S-phase of the cell cycle, suggesting that RAD52 associates with the
transcription machinery throughout the cell cycle, potentially acting
as a surveyor of replication stress (Supplementary Fig. [132]3o–q).
RAD52 is recruited to sites of transcription-replication conflicts via its
RNA-Pol II interacting C-terminal domain
A previous study demonstrated that RAD52 interacts with the
transcription complex via its C-terminal domain^[133]35. Given our
observation that RAD52 associates with Pol II (Figs. [134]1 and [135]2),
we posited that RAD52’s C-terminal domain would be essential for its
role in resolving TRCs via its interaction with the transcription
machinery. In order to test this hypothesis, we generated an HA-tagged
RAD52 mutant in which we deleted amino acids 302–410 (referred to as
RAD52^∆C) (Fig. [136]3a, b, Supplementary Fig. [137]4a). This amino
acid region has previously been identified as the minimum number of
residues needed for RAD52 to interact with Pol II in vitro^[138]35. As
RAD52 is a protein known for its role in DNA repair, we first confirmed
that this was not disrupted by deleting the C-terminus. We tested this
using functional assays of DSB repair, namely single strand annealing
(SSA) and HR using the previously described reporters^[139]46,[140]47.
In RAD52^-/- cells, we observed that complementation with either
RAD52^WT or RAD52^∆C rescued the SSA- and HR-deficient phenotypes
induced by RAD52 deficiency (Fig. [141]3c, d). This implies that the
loss of the C-terminus of RAD52 does not impair its DNA repair
activity.
Fig. 3. C-terminal domain of RAD52 is essential for the prevention of
transcription-replication conflicts.
a Schematics of the domain structures of wild type (WT) - RAD52 protein
and C-terminal (ΔC) deleted RAD52 (Δ302-410 amino acids). From
N-terminal to C-terminal, RAD52 protein has DNA binding domain, RPA
binding domain, RAD52 binding domain, RNA Pol II binding domain and a
nuclear localization signal (NLS). The domains are not drawn to scale.
b Western blot confirming the expression of HA-RAD52^WT and
HA-RAD52^ΔC. Results reproducible for at least 2 biological replicates.
c (Left) Scheme of the single stranded annealing (SSA) reporter system:
The SSA-GFP reporter contains a 5′ fragment of the GFP (5′-GFP) gene,
and a 3′ fragment of the GFP (3′-GFP) with an I-SceI site. Repair of
the I-SceI-induced DSB by SSA leads to formation of GFP+ cells.
(Middle) Quantification of SSA repair assay in WT and RAD52^−/− HCT116
cells. (Right) Quantification of SSA repair assay in RAD52^−/− HCT116
cells with overexpression of either RAD52^WT or RAD52^ΔC (n = 4
biological replicates). d (Left) Scheme of the homology dependent
recombination (HDR) reporter system: The HDR-GFP reporter system
contains the GFP gene interrupted by an I-SceI site, and a fragment of
the GFP with truncated 3′- and 5′-terminus. Repair of the
I-SceI-induced DSB by HDR leads to formation of GFP+ cells. (Middle)
Quantification of HDR repair assay in WT and RAD52^−/− HCT116 cells.
(Right) Quantification of HDR repair assay in RAD52^−/− HCT116 cells
with overexpression of either RAD52^WT or RAD52^ΔC. (n = 5 biological
replicates). e Schematic representation of PLA to visualize proximity
of HA-tagged RAD52 (HA-RAD52) and RNA Pol II. f Representative images
of the nuclear PLA foci (α-HA: α-RNA Pol II S2) across stated
conditions with overexpression of either RAD52^WT or RAD52^ΔC (Scale
bar 10 µm). g Quantitative analysis of nuclear PLA foci across stated
conditions described in (f). The data presented shows ≥ 500 nuclei from
3 biological replicates. h Schematic representation of PLA to visualize
proximity of PCNA and RNA Pol II to measure TRCs. i Representative
images of the nuclear PLA foci (α-PCNA: α-RNA Pol II S2) across stated
conditions with overexpression of either RAD52^WT or RAD52^ΔC in HeLa
cells (Scale bar 10 µm). j Quantitative analysis of nuclear PLA foci
across stated conditions described in (i). The data presented
shows ≥ 500 nuclei from 3 biological replicates. In Fig. 3 (c) (d) (g)
and (j), data are plotted as mean ± SEM and p-values calculated using
unpaired two tailed t-tests. Schematics in Fig. 3 (a) (c) (d) (e) and
(h) were created with BioRender.com released under a Creative Commons
Attribution-NonCommercial-NoDerivs 4.0 International license. Source
data are provided as a Source Data file.
To test if the C-terminus was indeed responsible for RAD52’s
interaction with Pol II, we performed a PLA between the two proteins
(Fig. [144]3e, Supplementary Fig. [145]4b, c). As suggested by in vitro
biochemical studies^[146]35, RAD52^∆C had a reduced interaction with
Pol II in human cells (Fig. [147]3f, g), confirming the importance of
the C-terminal domain. Furthermore, RAD52^∆C failed to rescue the
elevated levels of R-loops and TRCs associated with the loss of RAD52
(Fig. [148]3h–j, Supplementary Fig. [149]4d, e), indicating that the
C-terminal domain of RAD52 is essential for the reduction of
TRCs.
RAD52 recruits TOP2A to R-loops to help resolve transcription-replication
conflicts
In order to elucidate the mechanism by which RAD52 facilitates
resolution of TRCs, we performed an IP-MS analysis of overexpressed
RAD52^WT and RAD52^∆C to tease apart factors that were differentially
associated with the C-terminal domain of RAD52 (Supplementary
Fig. [150]5a). The screen was done in an AQR-depleted background so as
to increase basal levels of R-loops (Supplementary Fig. [151]3i, j). Of
the 26 R-loop associated factors that had differential interaction with
RAD52^WT and RAD52^∆C, TOP2A stood out as a top hit owing to its
previously suggested role in TRC resolution^[152]48–[153]50
(Supplementary Fig. [154]5b, c, Supplementary Data [155]4 and [156]5,
Supplementary Note [157]1). PLA between RAD52 and TOP2A was performed
to confirm this interaction (Fig. [158]4a–c), which was further
elevated in the presence of increased R-loops (Supplementary
Fig. [159]5d, e). As expected, the RAD52-TOP2A interaction was
disrupted in the absence of the RAD52 C-terminal domain
(Fig. [160]4d–f), corroborating the IP-MS results.
Fig. 4. RAD52 recruits TOP2A to mitigate transcription-replication conflicts.
a Schematic representation of PLA to visualize proximity of RAD52 and
TOP2A. b Representative images of the nuclear PLA foci (α-RAD52:
α-TOP2A) in siNT (control) and siAQR transfected HeLa cells (Scale bar
10 µm). c Quantitative analysis of nuclear PLA foci across stated
conditions described in (b). The data presented shows ≥ 500 nuclei from
3 biological replicates. d Schematic representation of PLA to visualize
proximity of HA-tagged RAD52 (HA-RAD52) and TOP2A. e Representative
images of the nuclear PLA foci (α-HA: α-TOP2A) in siRAD52 (5’UTR)
transfected HeLa cells with overexpression of either RAD52^WT or
RAD52^ΔC (Scale bar 10 µm). f Quantitative analysis of nuclear PLA foci
across stated conditions described in (e). The data presented
shows ≥ 500 nuclei from 3 biological replicates. g Representative
images of S9.6 immunostaining to detect R-loops in siNT (control) and
siTOP2A transfected HeLa cells. RNase H treatment was added as a
negative control to eliminate R-loops (Scale bar 10 µm). h Quantitative
analysis of nuclear S9.6 foci across stated conditions from (g). Data
plotted as box and whiskers. Boxes extend from the 25th to 75th
percentiles, with the median displayed as a line. The whiskers mark the
minimum (1st percentile) and maximum (99th percentile). The data
presented shows ≥ 500 nuclei from 3 biological replicates; p-values
calculated using unpaired two tailed t-tests. i Schematic
representation of PLA to visualize proximity of PCNA and RNA Pol II to
measure TRCs. j Representative images of the nuclear PLA foci (PCNA:
RNA Pol II S2) in siNT (control) and siTOP2A transfected HeLa cells
(Scale bar 10 µm). k Quantitative analysis of nuclear PLA foci across
stated conditions described in (j). The data presented shows ≥ 500
nuclei from 3 biological replicates. l Schematic representation of PLA
to visualize proximity of S9.6 and TOP2A. m Representative images of
the nuclear PLA foci (α-S9.6: α-TOP2A) in siNT (control), siRAD52 and
siAQR transfected HeLa cells (Scale bar 10 µm). n Quantitative analysis
of nuclear PLA foci across stated conditions described in (m)
normalized to siNT. The data presented shows ≥ 500 nuclei from 3
biological replicates. o Mechanistic model of RAD52 role in preventing
transcription-replication conflicts. In Fig. 4 (c–k) and (n), data are
plotted as mean ± SEM and p-values calculated using unpaired two tailed
t-tests. Schematics in Fig. 4 (a) (d) (i) (l) and (o) were created with
BioRender.com released under a Creative Commons
Attribution-NonCommercial-NoDerivs 4.0 International license. Source
data are provided as a Source Data file.
Consistent with previous studies^[163]48–[164]50, loss of TOP2A led to
increased R-loops as well as elevated TRCs (Fig. [165]4g–k,
Supplementary Fig. [166]5f), confirming its role in mitigating R-loop
associated TRCs. To study if recruitment of TOP2A to these sites was
indeed RAD52 associated, we performed a PLA between TOP2A and S9.6
(Fig. [167]4l, Supplementary Fig. [168]5g, h). There was a significant
reduction in TOP2A recruitment to R-loops in the absence of RAD52, both
in physiological conditions and with elevated R-loops (Fig. [169]4m,
n), supporting our hypothesis that RAD52 helps to recruit TOP2A to
R-loop associated TRCs, alleviating the duplex torsional stress and
helping resolve TRCs (Fig. [170]4o). This observation was further
supported by correlation with a previously published TOP2A ChIP-seq
dataset^[171]51, where 18% of RNA Pol II-RAD52 overlapping peaks (from
Fig. [172]1j) co-occurred with TOP2A peaks, indicating that TOP2A is
recruited to RAD52-mediated TRCs and thus consistent with our model
(Supplementary Fig. [173]5i).
RAD52-depleted cells have increased replication stress and accumulate γH2AX
at R-loop forming regions
TRCs are an established source of replication stress
in cells owing to stalled replication and increased fork
collapse^[174]1,[175]2,[176]5. Given RAD52’s role in TRC resolution, we
hypothesized that RAD52 depletion could potentially lead to increased
replication stress. We observed a mild increase in replication stress
as measured by reduced DNA fiber track lengths after a sequential pulse
with two thymidine analogs, 5-chloro-2′-deoxyuridine (CldU) and
5-iodo-2′-deoxyuridine (IdU), for 30 min each (Fig. [177]5a–c,
Supplementary Fig. [178]6a–c). Furthermore, consistent with its role in
the resolution of TRCs, overexpression of the RAD52^∆C mutant was
unable to rescue the increased replication stress phenotype observed
with RAD52-depletion (Fig. [179]5a–c). However, depletion of RAD52 did
not affect global cell cycle profiles, nor did it lead to ATM- or
ATR-dependent checkpoint activation, suggesting that the induced local
replication effects do not lead to an altered S-phase (Supplementary
Fig. [180]6d, e). Moreover, R-loop associated TRCs have been shown to
have increased levels of DNA damage^[181]2,[182]44.
Fig. 5. Loss of RAD52 causes replication stress and increased DNA damage.
a Schematic representation of DNA fiber assay performed in HCT116 wild
type (WT) and RAD52 knockout (RAD52^-/-) cells with plasmid
overexpression of either RAD52^WT or RAD52^ΔC followed by incubation
with 5-chloro-2′-deoxyuridine (CldU) and 5-iodo-2′-deoxyuridine (IdU)
for 30 min each to label nascent DNA. b Representative images of DNA
fibers in HCT116 WT and RAD52^-/- cells with overexpression of
either RAD52^WT or RAD52^ΔC (Scale bar 2 µm). c Measurement of DNA
fiber lengths across stated conditions described in (b) to measure
replication rates. Data plotted as box and whiskers. Boxes extend from
the 25th to 75th percentiles, with the median displayed as a line. The
whiskers mark the minimum (1st percentile) and maximum (99th percentile).
The data presented shows ≥100 DNA fibers from 3 biological replicates;
p-values calculated using unpaired two tailed t-tests. d Heat map of
the intensity of γH2AX ChIP signals (siNT and siRAD52 transfected HeLa
cells) at genes that have a detectable R-loop peak as determined in
Supplementary Fig. [185]6b. The γH2AX occupancy is displayed relative
to the TSS ± 0.5 Mb. e Schematic representation of PLA to visualize
proximity of S9.6 and γH2AX. f Representative images of the nuclear PLA
foci (α-S9.6: α-γH2AX) in siNT (control), siRAD52 and siAQR transfected
HeLa cells (Scale bar 10 µm). g Quantitative analysis of nuclear PLA
foci across stated conditions described in (f). Data are plotted as
mean ± SEM. The data presented shows ≥ 500 nuclei from 3 biological
replicates; p-values calculated using unpaired two tailed t-tests.
Schematics in Fig. 5 (a) and (e) were created with BioRender.com
released under a Creative Commons Attribution-NonCommercial-NoDerivs
4.0 International license. Source data are provided as a Source Data
file.
To further assess the DNA damage at RAD52-associated R-loop forming
regions (Supplementary Fig. [186]6b), we analyzed the distribution of
γH2AX around R-loops in RAD52-depleted cells using ChIP-seq. We
observed that there was increased γH2AX accumulation at R-loop forming
genes, which was persistent even ±0.5 Mb around the TSS (Fig. [187]5d,
Supplementary Fig. [188]6f, g). These findings were further
corroborated by performing a PLA of S9.6 and γH2AX in RAD52-depleted
cells, under physiological and increased R-loop conditions
(Fig. [189]5e–g). However, loss of RAD52 does not elicit a global DNA
damage response as confirmed by the unaltered levels of total γH2AX in
normal versus RAD52-depleted cells (Supplementary Fig. [190]6h, i).
These findings are also consistent with the unchanged cell cycle
progression previously observed.
R-loops are a source of genome instability in tumors
R-loops have long been associated with DNA damage as a consequence of
prolonged replication fork stalling and DSBs arising from collapsed
replication forks^[191]1,[192]2. However, most of the evidence for the
damage associated with R-loops is indirect, in the form
of increased γH2AX foci or comet assay tail lengths^[193]5. The direct
consequence of R-loops on the genome in the form of mutational
signatures remains poorly understood. We hypothesized that if R-loops
can lead to DSBs, there should be an increased burden of genomic scars
associated with conserved R-loop forming regions across human tumors.
To investigate this hypothesis, we built a consensus R-loop dataset
composed of correlated peaks from 18 published datasets^[194]52
(Supplementary Fig. [195]7a, b, Supplementary Data [196]6). This
combined dataset was assessed to confirm that it followed the
established conventions of R-loops being associated with transcribed
genes, with a significant increase in occurrence being observed at TSSs
and TESs^[197]53 (Fig. [198]6a). We next proceeded to overlay our
R-loop dataset with previously identified somatic mutations from the
PCAWG, ICGC and TCGA cohorts^[199]54–[200]56 (see methods for details).
We observed a significant increase in structural alterations associated
with R-loop forming regions across the genome (Fig. [201]6b). Not
surprisingly, we observed a decrease in single nucleotide variants
(SNVs) in R-loop forming regions, consistent with likely increased
transcription-coupled repair (Fig. [202]6c and [203]6f). In contrast,
genetic alterations likely to form due to a DSB were significantly
increased in R-loop forming regions, including both insertions and
deletions > 1 bp (long InDels) and structural variants (SVs),
(Fig. [204]6d, e, g, h, Supplementary Fig. [205]8a–c). Historically,
SVs and indels have been associated with aberrant repair at DSBs either
from NHEJ or backup pathways to HR^[206]57. These observations are
consistent with the idea that R-loops lead to the formation of DSBs,
repair of which can result in large insertions, deletions, and
translocations, as we observed.
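The comparison of mutational burden between R-loop and non-R-loop regions can be illustrated with a minimal sketch: alteration counts are normalized to the size of the region in megabases, and the two burdens are compared as a log fold change. The counts, region sizes, function names and the choice of base 2 for the logarithm are assumptions made for illustration; they are not values or methods taken from the study.

```python
import math


def burden_per_mb(n_alterations, region_bp):
    """Number of alterations per megabase of the given region."""
    return n_alterations / (region_bp / 1e6)


def log2_fold_change(rloop_burden, non_rloop_burden):
    """log2 ratio of burden at R-loop versus non-R-loop regions.

    Positive values indicate enrichment at R-loop forming regions.
    """
    return math.log2(rloop_burden / non_rloop_burden)


# Hypothetical counts for one tumor type.
rloop = burden_per_mb(40, 2_000_000)        # 40 alterations in 2 Mb
non_rloop = burden_per_mb(50, 10_000_000)   # 50 alterations in 10 Mb
lfc = log2_fold_change(rloop, non_rloop)
print(f"R-loop: {rloop:.1f}/Mb, non-R-loop: {non_rloop:.1f}/Mb, log2FC = {lfc:.1f}")
```

Normalizing by region size matters here because R-loop forming regions cover a much smaller portion of the genome than the remainder, so raw counts alone would not be comparable.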
Fig. 6. Increased mutational burden and genomic instability associated with
R-loops were observed in human tumor samples.
a The genomic distribution of the consensus R-loop dataset as
identified in Supplementary Fig. [209]7b. Various genomic regions are
color coded according to the labels on the bottom. The expected
distribution in case peaks were randomly positioned in the genome is
shown for comparison. TSS and TES are significantly enriched in the
R-loop dataset (P < 0.001) as determined by the Fisher’s exact test. b
Circos plots showing structural variations and genomic alterations
caused by breakpoints enriched in R-loop (right) forming regions versus
non-R-loop regions (left). c–e Genomic windows depicting the
frequencies of single nucleotide variants (SNV-left), long
InDels > 1 bp (middle) and structural variants (SV-right), analyzed at
R-loop vs non-R-loop regions across various cancer types. The horizontal
axis represents different cancer types and the vertical
axis represents coverage at all genomic regions, TSSs and TESs.
Data are quantified as the log fold change between mutational burden at
R-loop versus non-R-loop regions. f–h Quantification of the average
number of SNVs, Long indels, SVs per Mb of genome at TSS and TES in
R-loop versus non-R-loop forming regions. Data are plotted as
mean ± SEM; p-values calculated using unpaired two tailed t-tests. i
Schematic showing the two types of TRCs: co-directional collisions
(top) and head-on collisions (bottom). The schematic illustration was
created with BioRender.com released under a Creative Commons
Attribution-NonCommercial-NoDerivs 4.0 International license. j
Quantification of the percentage of collisions that occur at R-loop
sites, classified as co-directional or head-on collisions. Data are
plotted as a bar graph with absolute percentages (Fisher’s exact test).
k Comparison of the average number of alterations per Mb of genome
mapped to CD versus HO collision sites.
Data are plotted as mean ± SEM. p-values were calculated by two-sided
non-parametric Mann–Whitney test. l Comparison of the
average number of alterations per Mb of genome at R-loop sites between
tumors with high and low expression of RAD52. Tumors were categorized
as expressing low (RAD52 low; bottom quartile) or high levels of RAD52
mRNA (RAD52 high; top quartile). Data plotted as box and whiskers.
Boxes extend from the 25th to 75th percentiles, with the median
displayed as a line. The whiskers mark the minimum (5th percentile)
and maximum (95th percentile). (n = 95 (RAD52 high), n = 94 (RAD52
low)); p-values calculated using unpaired two tailed t-tests. Source
data are provided as a Source Data file.
TRCs are preferentially enriched at sites of head-on collisions (HO) as
opposed to co-directional collisions (CD)^[210]44 (Fig. [211]6i). To
determine whether HO collisions could lead to increased accumulation of
R-loop induced genomic alterations, we classified our R-loop dataset as
CD or HO-associated by overlaying it with the previously published and
annotated Okazaki fragment sequencing (OK-seq) data^[212]44,[213]58
(Fig. [214]6j). As expected, we observed a significant difference
between the genetic alterations at CD versus HO, with a 3-fold increase
at HO (Fig. [215]6k). Furthermore, tumors with lower RAD52 expression
levels seem to correlate with increased mutations at R-loops
(Fig. [216]6l), supporting the idea that RAD52 acts at R-loops to
prevent genomic rearrangements.
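The stratification of tumors into RAD52-low (bottom quartile) and RAD52-high (top quartile) groups can be sketched as follows. This is a minimal illustration, assuming one RAD52 mRNA value per tumor; the function name, the input layout, and the quantile interpolation method (Python's default) are assumptions rather than details reported here.

```python
import statistics

def split_by_quartile(expression):
    """Split tumors into bottom- and top-quartile groups.

    expression: dict mapping tumor ID -> RAD52 mRNA level (hypothetical layout).
    Returns (low_ids, high_ids).
    """
    values = sorted(expression.values())
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(values, n=4)
    low = [tumor for tumor, level in expression.items() if level <= q1]
    high = [tumor for tumor, level in expression.items() if level >= q3]
    return low, high

# Toy example with eight hypothetical tumors
example = {f"T{i}": float(i) for i in range(1, 9)}
low_ids, high_ids = split_by_quartile(example)
```

The alteration density per Mb at R-loop sites would then be compared between the two groups.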
Discussion
Despite its apparent dispensability in humans, RAD52 has been
characterized as an essential backup DNA repair factor for BRCA2 due to
its ability to mediate HR and SSA. Recently, RAD52 was shown to be
involved in genome maintenance via additional roles in Break-Induced
Replication (BIR) and mitotic DNA synthesis (MiDAS) arising from
replication stress^[217]29,[218]59. In this study, we uncovered a
previously undescribed role for human RAD52 in R-loop homeostasis via
its association with the transcriptional machinery. We identified a
robust RAD52-Pol II interaction (Fig. [219]1) and determined that RAD52
associates with Pol II predominantly at the TSS in a subset of genes.
Notably, loss of RAD52 alone was sufficient to cause increased Pol II
pausing at these loci (Fig. [220]2).
While Pol II pausing has been implicated as a rate-limiting step in
transcription, it can be particularly problematic for the maintenance
of genome integrity by interfering with the replication machinery,
causing TRCs followed by DNA damage^[221]60. Pol II pausing promotes
the formation of transient secondary structures such as R-loops which
are the major source of such conflicts^[222]40,[223]44. In this study,
we present strong evidence that RAD52 helps resolve these R-loops and
prevents them from forming TRCs, with loss of RAD52 leading to increased TRCs and
associated DNA damage (Figs. [224]2 and [225]5). Interestingly, while
almost half of chromatin bound RAD52 was associated with Pol II, only
half of these sites were associated with R-loop formation
(Fig. [226]2d), suggesting that RAD52 may associate with transcription
sites independently of R-loop formation. Furthermore, we found that the
previously uncharacterized C-terminal domain of RAD52 is essential for
its Pol II interaction and its role in TRC resolution, separate from its
role in HR and SSA (Fig. [227]3). RAD52’s involvement in R-loop
resolution and collision avoidance is supplementary to its DSB-repair
roles. RAD52’s ability to support DNA/RNA binding is linked to its
annealing abilities, but recruitment to the sites of transcription
requires the C-terminal domain. We propose that, in addition to helping
resolve R-loops, RAD52 may also have a downstream role in repairing
DSBs that arise from persistent R-loops, as seen in
transcription-coupled homologous recombination (TC-HR) or
transcription-associated homologous recombination repair
(TA-HR)^[228]24–[229]28.
The cell tightly maintains R-loop homeostasis by regulating pathways
that control its formation and degradation^[230]4,[231]7. The release
of supercoiling associated with transcription and replication is
mediated by topoisomerase 1 (single-strand DNA nicking) within the
region of the transcription site or the site of active
replication^[232]61. However, when transcription and replication are
headed for a head-on collision, the duplex DNA between the sites of
transcription and replication is trapped by supercoiling of different
polarities, creating a zone of conflict, where the duplexes may form
“knotted” loops. Release of the accumulated duplex DNA torsional stress
is required to resolve the TRC. The Topoisomerase IIα (TOP2A) cleavage
complex is one such protein complex that could release the two sources
of negative supercoiling coming from opposite directions during
transcription and replication^[233]62,[234]63, the loss of which leads
to increased R-loop accumulation and increased TRCs^[235]50
(Fig. [236]4). We found that RAD52 promotes TOP2A recruitment to R-loop
sites, thus helping resolve TRCs and preventing the ensuing genomic
instability (Fig. [237]4). It is therefore conceivable that the
increased Pol II pausing observed in RAD52-depleted cells (Fig. [238]2)
could be ascribed to the inability of the cell to recruit TOP2A to the
TRC region, leading to increased torsional stress^[239]63 and R-loop
enrichment. However, while we demonstrate that the C-terminal domain of
RAD52 seems to mediate TOP2A’s recruitment to TRCs, further
experiments are warranted to determine whether this function is dependent on or
independent of RAD52’s association with the Pol II complex, which is also
mediated via its C-terminal domain.
Pathological R-loops have long been implicated in genome instability,
albeit through indirect evidence in the form of γH2AX signal or
accumulation of DSBs at R-loop forming regions^[240]5. Here, we provide
direct evidence of R-loops acting as a driver of DSB-induced genomic
instability in varied human tumor samples (Fig. [241]6). We observed
elevated levels of structural variants and indels at R-loop forming
regions across tumor types, but not single nucleotide variants, in
contrast to recent observations reported to be linked to
R-loops^[242]64. This discrepancy likely stems from analytic
differences: whereas we used only tumors sequenced by WGS with light
filtering, prior work analyzed heterogeneously sequenced tumors with
over 95% of cancers removed from analysis. Furthermore, an increased
density of these mutations was observed at HO collisions compared
to CD collisions, supporting the concept that HO collisions produce
DSBs and are more harmful to the cell. Additionally, low RAD52
expression in tumors was associated with an increased mutational burden
at R-loops, consistent with the previously uncharacterized role for
RAD52. Moreover, there was no correlation between TOP2A expression
level and mutational signatures, unlike that seen with RAD52 expression
levels (Fig. [243]6l, Supplementary Fig. [244]9a–c). This can be
explained by the fact that there is no direct correlation between the
expression levels of TOP2A and RAD52 (Supplementary Fig. [245]9c);
rather, the effect we see is a function of RAD52 not being able to recruit
TOP2A to the sites of collisions, resulting in DNA breaks and genomic
instability. This is in concordance with the long-standing view in the
field that, for DNA repair, protein levels are not always
rate-limiting; often what matters is the opportunity for the protein to
reach the DNA lesion at the right time. Hence, it is unsurprising that
TOP2A expression is unrelated to genomic instability.
Our study finds a unique role for RAD52 in genome maintenance via its
ability to resolve R-loops and TRCs. Considering that transcription
induced replication stress is one of the most common endogenous sources
of DSB in the cell, it is possible that this transcription associated
role of RAD52 may also contribute to its synthetic lethal phenotype
observed in BRCA-deficient cells, in addition to the previously
characterized DSB repair activities, including RAD51-mediator function
and single-strand annealing^[246]30–[247]32. Furthermore, we
demonstrate that R-loops, if left unrepaired, can lead to genomic
instability resulting in mutagenesis, chromosomal rearrangements, and
cancer.
Methods
Cell Culture and transfections
HeLa (ATCC, #CCL-2), HEK293T (ATCC, #CRL-3216) and U2OS (ATCC, #HTB-96)
cells were grown in complete DMEM high glucose supplemented with 10%
FBS, 2 mM L-glutamine, 20 mM HEPES, 100 I.U./ml Penicillin, and
100 μg/ml Streptomycin. HCT116 WT and RAD52^-/- cell lines were
obtained from Dr. Eric A. Hendrickson^[248]65 and cultured in McCoy’s
5A medium supplemented with 10% FBS, 2 mM L-glutamine, 100 I.U./ml
Penicillin, and 100 μg/ml Streptomycin. All cells were grown in a
humidified 37 °C incubator with 5% CO2.
0.5 × 10^6 cells were reverse transfected using RNAiMAX (Invitrogen,
#13778150) according to the manufacturer’s instructions with 40 pmol of
siRNAs of Rad52 (Dharmacon ON-TARGETplus SMARTpool, #L-011760), Rad52
5’UTR (Dharmacon ON-TARGET 5’UTR, #J-011760-06), AQR (Dharmacon
ON-TARGETplus SMARTpool, #L-022214), TOP2A (Dharmacon ON-TARGETplus
SMARTpool, #L-004239) or scrambled non target siRNA (Dharmacon
ON-TARGETplus SMARTpool, #D-001810) as indicated. Cells were harvested
48 h after transfection and processed as needed.
0.25 × 10^6 cells/well were seeded in six-well plates and treated
with the respective siRNA as described above. At 24 h post knock-down,
cells were transfected with 2 μg of the HA-RAD52^WT or HA-RAD52^∆C
over-expressing or HA-control plasmid (see below) using Lipofectamine
3000 (Invitrogen, #L3000015).
Plasmid constructs
The plasmids used to express HA-RAD52^WT and HA-RAD52^∆C were
derivatives of pcDNA3.1(+)-N-HA. The gBlocks corresponding to
full-length RAD52 and RAD52 (Δ302-410 amino acids) were cloned into the
pcDNA3.1(+)-N-HA backbone using KpnI/NotI restriction enzymes, and the
plasmids were confirmed by Sanger sequencing.
Immunoprecipitation and mass spectrometry analysis
HEK293T cells were transfected with the respective plasmids as per the
experimental conditions mentioned. Post transfection, the cells were
washed with ice cold PBS and resuspended in Lysis buffer (50 mM
Tris-HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% NP-40, 1X Protease
Inhibitor) and incubated for 20 min on a rotor at 4 °C. After 4 cycles of
water-bath sonication for a total of 6 min, lysates were centrifuged
for 10 min at 15000 x g at 4 °C. Immunoprecipitation was performed on
the supernatants using Pierce Anti-HA Magnetic Beads (Thermo
Scientific, #88836) overnight at 4 °C on the rotor. Beads were
extensively washed in the lysis buffer and stored at -80 °C, until
ready for mass spectrometry.
A fraction of the beads was processed for immunoblotting to confirm the
pull-down. The beads were denatured and eluted in LDS Non-Reducing
Sample Buffer (Thermo Scientific, #84788) by boiling for 5–10 min.
Proteins were separated on 4–12% acrylamide SDS-PAGE, transferred on
Nitrocellulose membrane and detected with the indicated antibodies
described in the table and ECL reagents.
Sample preparation and mass spectrometry analysis were carried out by Poochon
Scientific (Frederick, Maryland). After IP, in order to precipitate the
protein, beads from 3 independent replicates were treated with 50 μl of
2% SDS, heated at 95 °C for 10 min and centrifuged. The supernatant was
processed for trypsin digestion as per SOP-PS-6003 (Standard Operation
of Procedure for in Solution Digestion). The digested peptide mixture
was then concentrated and desalted using SPN columns as per
SOP-PS-6005 (Standard Operation of Procedure for Desalting Digested
Peptides). Desalted peptides were reconstituted in 30 μl of 0.1% formic
acid, and 12 μl of peptides was analyzed in a 110 min LC/MS/MS run. The
LC/MS/MS analysis of samples was carried out using a Thermo Scientific
Orbitrap Exploris 240 Mass Spectrometer and a Thermo Scientific Dionex
UltiMate 3000 RSLCnano System. Peptide mixture from each sample was
loaded onto a peptide trap cartridge at a flow rate of 5 μL/min. The
trapped peptides were eluted onto a reversed-phase EasySpray C18 column
(Thermo Scientific) using a linear gradient of acetonitrile (3–36%) in
0.1% formic acid. The elution duration was 110 min at a flow rate of
0.3 μl/min. Eluted peptides from the EasySpray column were ionized and
sprayed into the mass spectrometer, using a Nano-EasySpray Ion Source
(Thermo Scientific) under the following settings: spray voltage,
1.6 kV; capillary temperature, 275 °C. The raw data file acquired from each
sample was searched against a human protein sequence database and target
protein sequences provided by the client using the Proteome Discoverer
2.4 software (Thermo Scientific, San Jose, CA) based on the SEQUEST
algorithm. Carbamidomethylation (+57.021 Da) of cysteines was set as a
fixed modification, and oxidation of Met and deamidation of Q/N
(+0.98402 Da) were set as dynamic modifications. The minimum peptide
length was specified to be five amino acids. The precursor mass
tolerance was set to 15 ppm, whereas fragment mass tolerance was set to
0.05 Da. The maximum false peptide discovery rate was specified as
0.01. The resulting Proteome Discoverer Report contains all assembled
proteins with peptide sequences and peptide spectrum match counts
(PSM#).
Protein quantification/normalization used the normalized spectral
abundance factors (NSAFs) method to calculate the protein relative
abundance^[249]66,[250]67. NSAFs were calculated as follows:
$$\mathrm{NSAF}_N = \frac{S_N / L_N}{\sum_{i=1}^{n} S_i / L_i} \qquad (1)$$
where $N$ is the protein index; $S_N$ (PSM#) is the number of peptide
spectra matched to the protein; $L_N$ is the length of protein $N$ (number
of amino acid residues); and $n$ is the total number of proteins in the
input database (proteome profile for one cell sample). Protein
enrichment was calculated by comparing fold change between the sample
pull down and the HA-tag control.
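The NSAF calculation and the subsequent enrichment step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the dictionary inputs, and the `floor` value used for proteins absent from the control are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def nsaf(psm_counts, lengths):
    """Normalized spectral abundance factor per protein.

    psm_counts: dict protein -> number of peptide spectrum matches (S)
    lengths:    dict protein -> protein length in amino acids (L)
    """
    saf = {p: psm_counts[p] / lengths[p] for p in psm_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}

def fold_enrichment(nsaf_sample, nsaf_control, floor=1e-6):
    """Fold change of each protein in the pull-down over the HA-tag control.

    floor is an assumed stand-in for proteins not detected in the control.
    """
    return {p: v / max(nsaf_control.get(p, 0.0), floor) for p, v in nsaf_sample.items()}

# Toy example with two hypothetical proteins
sample = nsaf({"A": 10, "B": 20}, {"A": 100, "B": 400})
control = nsaf({"A": 5, "B": 20}, {"A": 100, "B": 400})
enrichment = fold_enrichment(sample, control)
```

Because each spectral count is divided by protein length before normalization, long proteins are not favored simply for yielding more peptides, and the NSAF values within one sample sum to 1.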
Pathway analysis was carried out using Gene Ontology (GO)
software^[251]68,[252]69. Functional protein interaction network
analysis was performed using interaction data from the STRING
database^[253]70. Only interactions with a score >0.15 are represented
in the networks.
Co-Immunoprecipitation assay
To detect endogenous RAD52 interacting proteins, HeLa cells were seeded
in a 10 cm dish for 24 h. The cells were then washed with ice cold PBS
and resuspended in Lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl,
1 mM EDTA, 1% NP-40, 1X Protease Inhibitor) and incubated for 20 min on a
rotor at 4 °C. After 5 cycles of water-bath sonication for a total of
10 min, lysates were centrifuged for 10 min at 15000 x g at 4 °C.
The lysate was treated with Benzonase nuclease (Millipore-Sigma
Novagen, #707463) at a concentration of 25 U per ml on the rotor for
1 h at room temperature. Immunoprecipitation was performed on the
supernatants using α-RAD52 conjugated agarose beads (Santa Cruz
Biotech, #sc-365341 AC) overnight at 4 °C on the rotor. A control
immunoprecipitation was carried out using normal mouse IgG conjugated
agarose beads (Santa Cruz Biotech, #sc-2343). Beads were extensively
washed in the lysis buffer and processed for immunoblotting to confirm
the pull-down. The beads were denatured and eluted in LDS Non-Reducing
Sample Buffer (Thermo Scientific, #84788) by boiling for 5–10 min.
Proteins were separated on 4–12% acrylamide SDS-PAGE, transferred on
PVDF membrane and detected with the indicated antibodies and ECL
reagents.
Proximity ligation assay
Cells were seeded on poly-L-lysine-coated coverslips and reverse
transfected with indicated siRNAs on the same day. 48 h after
transfection, cells were washed with PBS and pre-extracted with 0.25%
TritonX-100 for 5 min on ice. Cells were fixed with 4% paraformaldehyde
for 20 min and then washed with PBS twice for 5 min. Cells were
incubated with 100% methanol for 30 s and then washed with PBS three
times for 5 min. Cells were blocked overnight at 4 °C with the blocking
solution provided in the PLA kit. Coverslips were incubated with primary
antibodies (Supplementary Table [254]1) diluted in antibody diluent for
1 h at room temperature (RT). Next, coverslips were incubated with
mouse/rabbit secondary probes Duolink® In Situ PLA® Probe Anti-Rabbit
PLUS and Duolink® In Situ PLA® Probe Anti-Mouse MINUS. Proximity
ligation was performed using Duolink In Situ Red Kit
Mouse/Rabbit (Millipore-Sigma, #DUO92008), Duolink® In Situ Detection
Reagents FarRed (Millipore-Sigma, #DUO92013), or Duolink® In Situ
Detection Reagents Green (Millipore-Sigma, #DUO92014) according to the
manufacturer’s protocol. The oligonucleotides and antibody-nucleic acid
conjugates used were those provided in the Millipore-Sigma PLA kit. For
EdU (5-ethynyl-2’-deoxyuridine) staining in PLA experiments, cells were
incubated with 10 µM EdU for 30 min before the pre-extraction step. EdU
detection was performed with the Click-iT reaction mixture provided in the
kit (Thermo Scientific, #[255]C10086) according to the manufacturer’s
instructions. Stained cells were mounted with mounting medium
containing DAPI. Samples were visualized on the Nikon Spinning disk
confocal microscope at 60X, and images were collected and then analyzed
with the Nikon Elements AR Analysis Explorer (version 5.21.03).
Immunofluorescence
For R-loop staining, experiments were performed similarly to reported
procedures^[256]71,[257]72, with details as follows. Cells were fixed
with ice cold methanol for 10 min and permeabilized with acetone for
3 min. Cells were washed 3 times with ice cold PBS (5 min each). For
RNase III treated samples, cells were incubated with RNase III enzyme
(New England Biolabs, #M0245S) at a dilution of 1:200 in 1X RNase III
buffer supplemented with manganese chloride at 37 °C for 30 min. For
RNase H treated samples, cells were incubated with RNase H enzyme (New
England Biolabs, #M0297L) at a dilution of 1:50 in 1X RNase H buffer
at 37 °C for 2 h. After
incubation, cells were washed with cold PBS for 5 min (3 times). Cells
were incubated in blocking buffer (3% BSA, 1% goat serum, 0.1% Triton in
4X SSC buffer) overnight at 4 °C. Cells were incubated with the S9.6
primary antibody (1:500, Millipore, #MABE1095) diluted in blocking
buffer for 2 h at RT and washed 3 times with 4X SSC buffer for 5 min
each. Cells were incubated with the Alexa Fluor Plus 488 secondary
antibody (1:1500, Invitrogen, #A48255) diluted in blocking buffer for
1 h at RT (dark storage) and washed 3 times with 4X SSC buffer for
5 min each. For mitochondrial staining, cells were incubated with
250 nM MitoTracker Deep Red FM probe (Thermo Scientific, #[258]M22426)
for 30 min prior to the fixation step. Cells were mounted with mounting
medium containing DAPI (Millipore-Sigma, #DUO82040) for 30 min at RT
(dark storage) and the slides were stored at 4 °C. For counterstaining
the nucleolus, the nucleolin antibody (Cell Signaling, #14574; 1:1000
dilution) was combined with the S9.6 antibody and staining was carried
out as described above. The Alexa Fluor Plus 555 secondary antibody
(1:1500, Invitrogen, #A32732) was used.
For γH2AX staining, cells were fixed with 4% paraformaldehyde (EMS,
#15710) for 20 min and washed with 1X PBS twice for 5 min each. Cells
were permeabilized with 0.5% Triton X-100 for 10 min at RT and washed
with 1X PBS three times for 5 min each. Cells were incubated in
blocking buffer (3% BSA, 0.1% Triton in 1X PBS buffer) overnight at
4 °C. Primary antibody incubation was performed with mouse monoclonal
anti-phospho-H2A.X (Ser139) Antibody (1:1500) (Millipore-Sigma,
#05-636) for 2 h at RT. Cells were washed three times with 0.1%
TritonX-100 in 1X PBS for 5 min each. Secondary antibody incubation was
performed with Goat anti-mouse Alexa Fluor Plus 488 (Invitrogen,
#A32723) for 1 h in dark at RT. Cells were washed three times with 0.1%
TritonX-100 in 1X PBS for 5 min each.
Samples were visualized on the Nikon Spinning disk confocal microscope
at 60X, and images were collected and then analyzed with the Nikon
Elements AR Analysis Explorer (version 5.21.03).
Dot blot
Total nucleic acid was extracted using the DNeasy Blood and Tissue Kit
(Qiagen, #69504) and RNA:DNA hybrids were detected and quantified by
dot blot assay. Samples were spotted on Amersham Hybond-N+ membrane
(Cytiva, #RPN119B) in duplicates using the Bio-Dot Apparatus (BioRad,
#1706545) and vacuum suction, dried and UV crosslinked. For the RNase H
treatment, the genomic DNA was incubated with the enzyme (New
England Biolabs, #M0297) at a concentration of 10U of RNase H /μg of
DNA at 37 °C for 20 h prior to spotting. Blots were blocked with 5%
nonfat dried milk in TBST, then incubated overnight at 4 °C with the
S9.6 antibody against RNA:DNA hybrids (Millipore,
#05-636, 1:500 dilution) and an antibody against double-stranded DNA (Novus
Biologicals, #NBP3-07302, 1:500 dilution) in TBST. Blots were washed
3 times for 15 min each in TBST and incubated with either α-mouse IgG,
HRP-linked Antibody (Cell Signaling, #7076; 1:5000 dilution) or
α-rabbit IgG, HRP-linked Antibody (Cell Signaling, #7074; 1:5000
dilution), respectively, for 1 h at room temperature. Images were taken
after incubation with SuperSignal West Pico PLUS (Fisher Pierce,
PI34578).
Western Blotting
Cells were collected by trypsinization, lysed on ice in RIPA buffer
(25 mM Tris-HCl pH 7.6, 150 mM NaCl, 0.1% SDS, 1% NP-40, 1% sodium
deoxycholate) supplemented with 1 tablet/10 ml lysis buffer of
cOmplete™, EDTA-free Protease Inhibitor Cocktail (Roche,
#11-873-580-001), 1 tablet/10 ml lysis buffer of phosphatase inhibitor
cocktail PhosSTOP (Millipore-Sigma, #4906845001) and 10 mM PMSF for
30 min, sonicated, and clarified by centrifugation for 20 min at 10,000
RPM at 4 °C. The supernatant was quantified using the Pierce BCA
protocol (Thermo Scientific, #23225). Equivalent amounts of proteins
were separated by SDS–PAGE and transferred onto a nitrocellulose
membrane. Membranes were blocked in 5% milk in TBST (137 mM NaCl,
2.7 mM KCl, 19 mM Tris-Base and 0.05% Tween-20) for at least 1 h at
room temperature. Incubation with primary antibodies was performed
overnight at 4 °C. Membranes were washed in TBST and incubated with
HRP-conjugated secondary antibodies for 1 h at room temperature, and
developed with Pierce ECL (Thermo Scientific, #32106). The primary
antibodies used for Western blotting included α-RAD52 (Santa Cruz
Biotech, #sc-365341; 1:500 dilution), α-GAPDH (Abcam, #ab8245c; 1:1000
dilution), α-AQR (Bethyl Laboratories Inc, #A302547A; 1:500 dilution),
α-Lamin A/C (Cell Signaling, #4777; 1:1000 dilution), α-HA (Santa Cruz
Biotech, #sc-7392; 1:1000 dilution), α-Vinculin (Cell Signaling,
#13901; 1:1000 dilution), α-TOP2A (Santa Cruz Biotech, #sc-365916;
1:500 dilution), α-βTubulin (GeneTex, #GTX107175; 1:1000 dilution),
α-ATR (phospho Thr1989) (GeneTex, #GTX128145; 1:1000 dilution), α-ATM
(phospho S1981) (Abcam, #ab81292; 1:2000 dilution), α-RPA2 (phospho
S4/S8) (Abcam, #ab87277; 1:2000 dilution), α-RPA2 (phospho S33) (Bethyl
Laboratories Inc, #A300246A; 1:1000 dilution), α-ATR (Santa Cruz
Biotech, #sc-1887; 1:500 dilution), α-ATM (Abcam, #ab78; 1:1000
dilution), α-RPA2 (Cell Signaling, #2208; 1:1000 dilution), α-RPA1
(Cell Signaling, #2267; 1:1000 dilution), α-Rpb1 CTD (Cell Signaling,
#2629; 1:4000 dilution) and α-SETX (Novus Biologicals, #NB100-57542;
1:1000 dilution). The secondary antibodies were α-mouse IgG, HRP-linked
Antibody (Cell Signaling, #7076; 1:5000 dilution), α-rabbit IgG,
HRP-linked Antibody (Cell Signaling, #7074; 1:5000 dilution), α-rat
IgG, HRP-linked Antibody (Cell Signaling, #7077; 1:5000 dilution) or
α-goat IgG (H + L), HRP-linked Antibody (Invitrogen, #PA1-28664; 1:5000
dilution).
Chromatin Immunoprecipitation
HeLa cells were transfected with the respective siRNAs as per the
experimental conditions mentioned. 48 h post transfection, the cells
were cross-linked with formaldehyde at 1% final concentration for
10 min at room temperature or cross-linked according to dual
cross-linking protocol as described previously^[259]73. Chromatin
immunoprecipitation (ChIP) assays were conducted using the Zymo-Spin
ChIP kit (Zymo Research Corp., #D5210) following manufacturer’s
instructions. Sonication was performed at high power setting for 80
cycles (30 s ON, 30 s OFF) using a Bioruptor Plus (Diagenode Inc.,
Denville, NJ), yielding a modal fragment size of <600 bp. Antibodies
used in ChIP assays included: α-RNA Pol II (Cell Signaling, #2629S),
α-γH2AX (Abcam, #ab2893), α-RAD52 (Santa Cruz Biotechnology,
#sc-365341), or α-S9.6 (Kerafast, #ENH001) and normal mouse IgG
(Millipore-Sigma, #12-371). Approximately 20 µg of chromatin was used
in each ChIP assay with 5 µg of antibodies, or 100 µg of chromatin was
used in each ChIP assay with 10 µg of antibodies. IgG negative control
was included with each assay. DNA libraries were prepared by Zymo
Research Epigenetics Services and were sequenced on a NovaSeq
sequencer.
ChIP-seq analysis
ChIP sequencing reads were trimmed using Cutadapt^[260]74 and aligned
to the human reference genome (hg19) using BWA^[261]75. We applied read
filtering to remove reads that were marked as duplicates (picard)
(“Picard Toolkit.” 2019. Broad Institute, GitHub Repository.
[262]https://broadinstitute.github.io/picard/; Broad Institute), reads
that were not primary alignments, unmapped, mapped to multiple
locations, or contained > 4 mismatches (samtools)^[263]76.
Deeptools^[264]77 bamCoverage was used to create normalized bigWig
files using CPM (counts per million) normalization. We further used
deeptools computeMatrix, plotHeatmap and plotProfile for visualization
of ChIP-Seq data at TSS and TTS.
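The read-filtering criteria and the CPM normalization described above can be illustrated with a minimal sketch operating on SAM-formatted text lines. This is not the pipeline used here (which relied on picard, samtools and deeptools); the function names are hypothetical, MAPQ 0 is used as an assumed proxy for multi-mapping reads, and the NM tag (edit distance) is used as an assumed proxy for the mismatch count.

```python
# SAM flag bits (from the SAM format specification)
UNMAPPED, SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x4, 0x100, 0x400, 0x800

def keep_read(sam_line, max_mismatches=4, min_mapq=1):
    """Return True if an alignment passes the filters sketched in the text."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    # Drop unmapped, non-primary and duplicate-marked alignments
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY | DUPLICATE):
        return False
    # Assumption: MAPQ below min_mapq is treated as multi-mapping
    if mapq < min_mapq:
        return False
    # Assumption: NM (edit distance) stands in for the number of mismatches
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), 0)
    return nm <= max_mismatches

def cpm(bin_counts, total_mapped_reads):
    """Scale per-bin read counts to counts per million mapped reads."""
    scale = 1e6 / total_mapped_reads
    return [count * scale for count in bin_counts]
```

For example, a bin containing 5 reads in a library of 2 million mapped reads corresponds to 2.5 CPM.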
MACS2 callpeak^[265]78 was used to identify ChIP-Seq peaks relative to
input data. To compare peaks from different conditions, we retained
peaks with −log10(P-value) > 2 from the narrowPeak output files. The
ChIPpeakAnno package^[266]79 was used in R to create Venn diagrams of
colocalized peaks in different conditions.
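The peak-retention threshold can be sketched as a simple filter over narrowPeak records, in which the eighth column holds the −log10 P-value. This is a minimal illustration; the function name and the returned tuple layout are assumptions, and the overlap analysis itself was performed with ChIPpeakAnno in R.

```python
def filter_narrowpeak(lines, min_neglog10_p=2.0):
    """Keep narrowPeak records whose -log10(P-value) exceeds the threshold.

    lines: iterable of tab-separated narrowPeak lines.
    Returns a list of (chrom, start, end) tuples.
    """
    kept = []
    for line in lines:
        # Skip blank lines and optional header lines
        if not line.strip() or line.startswith(("#", "track")):
            continue
        cols = line.rstrip("\n").split("\t")
        if float(cols[7]) > min_neglog10_p:  # column 8: -log10 P-value
            kept.append((cols[0], int(cols[1]), int(cols[2])))
    return kept
```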
RNA-seq
HeLa cells were transfected with the respective siRNAs as per the
experimental conditions mentioned. 48 h post transfection, samples were
sent to Zymo Research for Total RNA-Seq Service. Total RNA-Seq
libraries were constructed from 500 ng of total RNA. Libraries were
prepared using the Zymo-Seq RiboFree Total RNA Library Prep Kit (Zymo
Research Corp., #R3000) according to the manufacturer’s instructions.
Briefly, RNA was reverse transcribed into cDNA, which was followed by
ribosomal RNA depletion. Next, a partial P7 adapter sequence was
ligated to the 3’ end of the cDNAs, followed by second strand synthesis and
partial P5 adapter ligation to the 5’ end of the double stranded DNAs.
Lastly, libraries were amplified to incorporate full length adapters
under the following conditions: initial denaturation at 95 °C for
10 min; 10–16 cycles of denaturation at 95 °C for 30 sec, annealing at
60 °C for 30 sec, and extension at 72 °C for 60 sec; and final
extension at 72 °C for 7 min. Successful library construction was
confirmed with Agilent’s D1000 ScreenTape Assay on TapeStation. RNA-Seq
libraries were sequenced on an Illumina NovaSeq to a sequencing depth
of at least 30 million read pairs (150 bp paired-end sequencing) per
sample.
RNA-seq analysis
RNA-Seq reads were aligned to the GRCh37 human genome using the STAR RNA-Seq
aligner^[267]80, and then reads from transcripts were counted using the
GenomicAlignments package in Bioconductor^[268]81,[269]82. Fold changes
between siRAD52 and siNT cells were obtained from DESeq2^[270]83, which
was performed using raw read counts in siRAD52/siNT as pairs. P-values
obtained from DESeq2 were corrected for multiple testing using the
Benjamini and Hochberg method.
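For readers unfamiliar with the Benjamini and Hochberg procedure, the adjustment can be sketched in plain Python. This is a minimal illustration of the general method, not the DESeq2 implementation; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that adjusted values never decrease as the raw p-values decrease.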
According to the ChIP-seq RNA Pol II and RAD52 data, 10% of the RNA Pol II
peaks overlapped with RAD52 peaks. We therefore asked whether the
expression changes in genes near these overlapping peaks
were influenced by RAD52. For this analysis, we retrieved genes that
overlapped with peaks covering transcription start sites. We
first examined expression levels of the overlapping genes
using DESeq2 normalized counts. Most genes did not display a
significant change between siRAD52 and siNT cells. Next, we compared
fold changes between siRAD52 and siNT cells for genes that overlapped
with RAD52 peaks and genes that did not. However, on examining the
relationship between average expression and the fold changes included in
the DESeq2 output, we observed higher variance of fold changes for lowly
expressed genes; we therefore excluded lowly expressed genes
with an average expression level <50 from the comparison. Student’s t-test
was used to test the differences in gene fold changes. Finally, pathway
enrichment analysis was performed for the genes that overlapped with RAD52
peaks using ClusterProfiler^[271]84.
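The low-expression filter and the fold-change comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple layout, the function names, and the interpretation of "average expression" as a per-gene mean of normalized counts are hypothetical, and only the t statistic (pooled-variance Student's form) is computed, not the p-value.

```python
import math
import statistics

def filter_low_expression(genes, min_mean_expr=50):
    """Drop genes whose average expression is below the cutoff.

    genes: list of (gene_name, mean_expression, log2_fold_change) tuples.
    """
    return [g for g in genes if g[1] >= min_mean_expr]

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Toy fold changes for genes with and without an overlapping RAD52 peak
with_peak = [1.0, 2.0, 3.0]
without_peak = [2.0, 3.0, 4.0]
t_stat = student_t(with_peak, without_peak)
```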
DSB repair assays
The SSA reporter plasmid hprtSAGFP (Addgene, #41594) and the HR
reporter plasmid pDRGFP (Addgene, #26475) were gifts from Maria Jasin.
HCT116 WT or RAD52^-/- cells were transfected with 0.5 µg DR-GFP or
SA-GFP and 1.5 µg pCBASceI (Addgene, #26477) using Lipofectamine 3000
as described in the cell culture and transfection section. 72 h later,
the cells were harvested, and percentages of GFP-positive cells per
100,000 cells were determined by flow cytometry (HTFC Screening System,
IntelliCyt). For each experiment, the percentage of GFP positive cells
in the empty vector control was subtracted from the I-SceI-transfected
cells. Flow cytometry data were analyzed using BD FlowJo (v.10.6.2).
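The background correction applied to the reporter assays amounts to a simple subtraction, sketched below. The function name and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def corrected_gfp_percent(gfp_iscei, total_iscei, gfp_empty, total_empty):
    """Percentage of GFP-positive cells after subtracting the empty-vector background."""
    pct_iscei = 100.0 * gfp_iscei / total_iscei
    pct_empty = 100.0 * gfp_empty / total_empty
    return pct_iscei - pct_empty

# Hypothetical counts per 100,000 analyzed cells
repair_efficiency = corrected_gfp_percent(2_500, 100_000, 100, 100_000)
```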
DNA fiber analysis
Cells were seeded onto 6-well plates and allowed to grow for 24 h. Cells
were sequentially labeled with thymidine analogs: 25 µM CldU
(5-Chloro-2′-deoxyuridine) and 250 µM IdU (iododeoxyuridine) for 30 min
each. The reaction was terminated by addition of ice-cold PBS and cells
were trypsinized. Cells were lysed with lysis buffer (50 mM EDTA pH
8.0, 0.5% SDS, 200 mM Tris-Cl pH 7.5) on a clean slide and incubated
for 5–7 min. After cell lysis, DNA was spread on glass slides and
slides were tilted at an angle of 25°. Slides were air dried. Fibers
were fixed in methanol/acetic acid (3:1) for 10 min and denatured with
2.5 M HCl for 1 h. Slides were blocked with 5% BSA in PBST (10 mM sodium
phosphate, 0.15 M NaCl, 0.1% Tween™ 20 buffer at pH 7.5) for 1 h and
stained with the primary antibodies rat anti-BrdU
(5-bromo-2′-deoxyuridine) monoclonal antibody (abcam #ab6326; 1:50) and
mouse anti-BrdU monoclonal antibody (BD
Biosciences #347580; 1:100) for 2 h. Slides were washed 3 times with
PBST. Slides were then incubated with chicken anti-rat Alexa Fluor 488
(Invitrogen, #A21470; 1:300) and rabbit anti-mouse Alexa Fluor
594 (Invitrogen, #A11062; 1:200) for 1 h. Slides were washed
3 times with PBST and coverslips were mounted on the slides with the
mounting medium. DNA fibers were visualized on a Nikon spinning disk
confocal microscope at 60X, and images were collected and analyzed
with ImageJ.
Cell cycle analysis
HeLa cells were transfected as described and harvested 48 h
post-transfection. Cells were then washed with PBS and fixed in 1 ml cold
70% ethanol for at least 30 min on ice. Cells were pelleted and washed
with PBS. The cells were resuspended in the staining solution (0.1%
Triton X-100, 200 µg/mL RNase A and 50 µg/mL propidium iodide in PBS)
and incubated for 15 min at 37 °C in the dark. 50,000 cells per
condition were analyzed by flow cytometry using the LSR Fortessa
instrument (BD Biosciences). BD FACSDiva software was used with the BD
Biosciences LSR Fortessa Analyzer for flow cytometry data acquisition.
Flow cytometry data were analyzed using BD FlowJo (v.10.6.2).
R-loop consensus analysis
To identify consensus R-loop regions in the human genome, we sourced
23 published R-loop bigwig files from the UCSC genome browser^[272]52.
These files were subsequently converted to bed format utilizing the
‘bigWigToBedGraph’ tool
([273]https://genome.ucsc.edu/goldenPath/help/bigWig.html). Strands
were merged, and replicates were consolidated with the ‘bedtools
unionbedg’ function^[274]85. Broad peaks were then identified using the
‘macs2 bdgpeakcall’ function^[275]86. The average intensity score for
the bed regions was ascertained with our custom
‘calculate_mean_intensity_score.pl’ script. On assessing the
correlation of peak scores across the 23 tracks, five tracks exhibited
discrepancies and were consequently excluded from further analyses. The
remaining 18 bed files were merged using the ‘bedtools multiinter’
function, including all R-loop regions without any filters. These
regions were subtracted from the entire human hg19 reference genome, to
obtain genomic regions devoid of any R-loop. To pinpoint the consensus
R-loop region, we filtered out peak scores below 200, merged regions
from varying tracks with the ‘bedtools multiinter’ function, and
further refined this merged file to capture regions with a minimum of
five overlapping tracks and a maximum length of 5000 bp. The resulting
consensus R-loop region and control areas were employed to analyze
R-loop enrichment across different genomic sections and to evaluate
breakpoint density within R-loop regions subsequently. All scripts used
in processing and the derived consensus R-loop and control regions can
be obtained at
[276]https://github.com/ipstone/rloop_genome_instability.
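The consensus filter described above (peak score of at least 200, at least five overlapping tracks, maximum length of 5000 bp) can be sketched in plain Python. This is an illustrative approximation of the interval logic, not the authors' scripts and not a reimplementation of ‘bedtools multiinter’. It assumes half-open BED-style coordinates and that peaks within a single track do not overlap each other. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def consensus_regions(tracks, min_score=200, min_tracks=5, max_len=5000):
    """tracks: one list of (chrom, start, end, score) peaks per track.
    Returns (chrom, start, end, n_tracks) segments covered by at least
    min_tracks tracks and no longer than max_len bp."""
    events = defaultdict(list)  # chrom -> [(position, +1 or -1), ...]
    for peaks in tracks:
        for chrom, start, end, score in peaks:
            if score >= min_score:  # drop low-scoring peaks first
                events[chrom].append((start, +1))
                events[chrom].append((end, -1))

    segments = []
    for chrom in sorted(events):
        depth, prev = 0, None
        # Sweep breakpoints left to right; depth is the number of tracks
        # covering the segment [prev, pos).
        for pos, delta in sorted(events[chrom]):
            if (prev is not None and pos > prev
                    and depth >= min_tracks and pos - prev <= max_len):
                segments.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return segments


# Five hypothetical tracks with staggered starts, plus one low-scoring track.
demo = [[("chr1", 100 + 10 * i, 1000, 300)] for i in range(5)]
demo.append([("chr1", 0, 2000, 150)])
print(consensus_regions(demo))
```

Each reported segment is a stretch over which the set of covering tracks is constant, so adjacent segments with different track counts are reported separately rather than merged.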
Consensus overlap analysis with genomic mutational signatures
The mutational calls (SNVs, indels and structural variants) were
downloaded from PCAWG (264 liver, 239 pancreatic, 189 prostate, 71
ovarian and 70 melanoma tumors)^[277]56, ICGC/BRCA-EU project (320 ER+
breast tumors)^[278]55 and TCGA cohorts (377 liver, 185 pancreatic, 500
prostate, 587 ovarian, 470 melanoma, and 713 ER+ breast
tumors)^[279]54. The gene annotations for transcription start sites
(TSS) and transcription end sites (TES) were obtained from the
[280]GENCODE ‘genecode.v19.annotation.gtf’ data file. A window of
±1 kb was added around each TSS and TES, and these windows were
intersected with the previously prepared R-loop positive and R-loop
negative regions. The resulting bed files were intersected with
the genomic mutation data to calculate the density of SNVs, indels and
structural variant breakpoints in these regions. For SNVs and indels,
mutation counts were calculated using the
‘SigProfilerMatrixGenerator’ function from the SigProfiler package
([281]https://github.com/AlexandrovLab/SigProfilerMatrixGenerator);
these counts were then divided by the genomic region length to obtain
mutation densities for the respective regions. Similarly, for the
structural variant breakpoints, the downloaded structural variant data
were intersected with the respective R-loop positive and negative
genomic regions using the ‘bedtools pairtobed’ function. After
intersection, the SV breakpoint densities were calculated in the same
way as for SNVs and indels. All density calculations and comparisons
were done in the R statistical software and visualized in Prism.
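The windowing and density calculation can be illustrated with a small sketch. The authors performed these steps with bedtools, SigProfiler and R, so the Python below is only a hedged illustration of the arithmetic. It assumes a single chromosome, non-overlapping regions and half-open coordinates. The function names and positions are hypothetical.

```python
def window(pos, flank=1000):
    """Half-open window of +/- flank bp around a site, clipped at 0."""
    return max(0, pos - flank), pos + flank


def density_per_bp(positions, regions):
    """Number of mutation positions inside the regions, divided by the
    total region length in bp. Regions are assumed not to overlap."""
    total_len = sum(end - start for start, end in regions)
    count = sum(1 for p in positions
                if any(start <= p < end for start, end in regions))
    return count / total_len


# Hypothetical TSS at position 5000 and four mutation positions.
regions = [window(5000)]            # -> [(4000, 6000)]
mutations = [4000, 4500, 6000, 7000]
print(density_per_bp(mutations, regions))
```

Normalizing by region length is what makes R-loop positive and R-loop negative regions of different total size comparable.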
For the comparison of double-strand break density between head-on (HO)
and co-directional (CD) collisions, the HO/CD regions published in
the aforementioned paper^[282]44 were downloaded and intersected with
the R-loop consensus positive and negative regions. A ±1 kb
window was added to the R-loop consensus regions before intersection.
The double-strand break density (indels and SVs) was calculated
as for the TSS/TES regions above. All the analysis code and
accompanying input files are available at the following GitHub
repository: [283]https://github.com/ipstone/rloop_genome_instability.
Evaluating association between RAD52 RNA expression and genomic alterations at R-loop
To investigate whether down-regulation of RAD52 is associated with
increased alterations in R-loops, we performed an analysis of tumor
RNA-seq data in the PCAWG cohort. Specifically, we downloaded PCAWG
RNA-seq data from [284]ICGC and extracted RAD52 Fragments Per Kilobase
of transcript per Million mapped reads (FPKM) values, which were
further converted into Transcripts Per Million (TPM) values. To
eliminate confounding from homologous recombination
deficiency (HRD), HRD cases, as classified
by Nguyen et al.^[285]87, were excluded.
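The text does not spell out the FPKM-to-TPM conversion. A minimal sketch, assuming the standard relation in which each gene's FPKM is divided by the sum of FPKM values in the same sample and multiplied by one million, is shown below. The function name and the expression values are hypothetical.

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Convert one sample's FPKM values to TPM, assuming the standard
    relation TPM_i = FPKM_i / sum(FPKM) * 1e6."""
    total = sum(fpkm_by_gene.values())
    return {gene: value / total * 1e6
            for gene, value in fpkm_by_gene.items()}


# Illustrative values only; a real sample would contain all genes.
sample = {"RAD52": 2.0, "GENE_A": 5.0, "GENE_B": 3.0}
print(fpkm_to_tpm(sample)["RAD52"])
```

Because the conversion rescales each sample to a fixed total, TPM values are more directly comparable across tumors than raw FPKM values.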
Utilizing the RAD52 TPM values, tumors were categorized into RAD52-high
and RAD52-low groups, representing the top 25% and bottom 25% of
samples, respectively.
To compare structural variants (SVs) and indels in R-loop regions, we
calculated the average number of SVs and indels per Mbp for each sample
using the formula: (N_SV + N_indel) / R-loop segment size × 1,000,000,
with the segment size in bp. Finally, an unpaired t-test with Welch’s
correction was used to compare the levels of SVs and indels in R-loop
regions between RAD52-high and RAD52-low tumors.
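The grouping, per-Mbp normalization and Welch comparison can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code. Quartile membership is taken by rank with ties ignored, and only the Welch t statistic and its Welch-Satterthwaite degrees of freedom are computed, with the p-value lookup left out. All names and numbers are hypothetical.

```python
import math
import statistics


def events_per_mbp(n_sv, n_indel, rloop_bp):
    """(N_SV + N_indel) / R-loop segment size (bp) * 1,000,000."""
    return (n_sv + n_indel) / rloop_bp * 1_000_000


def quartile_groups(tpm_by_sample):
    """Return (low, high) sample lists: bottom and top 25% by TPM."""
    ranked = sorted(tpm_by_sample, key=tpm_by_sample.get)
    k = len(ranked) // 4
    return ranked[:k], ranked[len(ranked) - k:]


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical RAD52 TPM values for eight tumors.
tpm = {f"s{i}": float(i) for i in range(1, 9)}
low, high = quartile_groups(tpm)
print(low, high)
print(events_per_mbp(3, 7, 2_000_000))
```

Welch's correction is used because the two groups are not assumed to have equal variances, which changes the degrees of freedom but not the difference in means being tested.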
Image processing and data analysis
For PLA, S9.6 and γH2AX experiments, slides were imaged at 60X
(immersion oil) with a Nikon spinning disk confocal microscope. PLA,
S9.6 and γH2AX foci per nucleus were quantified using
Nikon Elements AR Analysis Explorer (version 5.21.03), where DAPI was
used as a mask for the nucleus. The number of PLA, S9.6 or
γH2AX foci was counted within each DAPI mask to obtain the average
number of foci per nucleus in each condition.
Statistical analysis
Statistical analysis was carried out by unpaired two-tailed t-test
(unless stated otherwise) using GraphPad Prism Version 10.2.1 for
Windows (GraphPad Software, San Diego, CA, USA). All values are
expressed as mean ± standard error of the mean (SEM). p-values <0.05
were considered statistically significant. ns: non-significant.
Reporting summary
Further information on research design is available in the [286]Nature
Portfolio Reporting Summary linked to this article.
Acknowledgements