Abstract

   Eukaryotic genomes are extensively transcribed into various types of
   RNAs, many of which are physically associated with chromatin in cis at
   their transcription sites or in trans to other genomic loci. Emerging
   roles have been uncovered for these chromatin-associated RNAs (caRNAs)
   in gene regulation and genome organization, yet they remain challenging
   to interrogate. Here, we present TaDRIM-seq, a technique employing
   Protein G (PG)-Tn5-targeted DNA elements and in situ proximity ligation
   to concurrently probe caRNAs across diverse genomic regions as well as
   global RNA-RNA interactions within intact nuclei. Notably, this
   approach diminishes required cell inputs, minimizes hands-on time
   compared to established methodologies, and is compatible in both
   mammalian cells and plants. Using this technique, we identify extensive
   caRNAs at DNA anchor regions associated with chromatin loops and reveal
   diurnal variation in RNA-DNA and RNA-RNA connectivity networks within
   rice.

   Subject terms: Genome-wide association studies, Chromatin analysis,
   Next-generation sequencing, Non-coding RNAs
     __________________________________________________________________

   Chromatin-associated RNAs play various roles in gene regulation and
   genome organization, yet they remain challenging to interrogate. Here
   the authors report a targeted DNA-associated RNA and RNA-RNA
   interaction mapping method and use it to identify caRNAs at DNA anchor
   regions associated with chromatin loops.

Introduction

   The eukaryotic genome is prevalently transcribed and produces a vast
   array of distinct RNAs^[44]1. Some RNAs, in particular long non-coding
   RNAs (lncRNAs), interact with chromatin by being either recruited to
   different genomic regions (Trans-acting RNAs) or retained at their
   transcription sites (Cis-acting RNAs)^[45]2. These RNAs possess
   significant functions in regulating gene transcription^[46]3–[47]5,
   epigenomic modifications^[48]6–[49]8 and chromatin states^[50]9–[51]12.

   Current studies estimate that approximately 1500 RNA-binding proteins
   (RBPs) exist in mammals^[52]13, which associate with diverse regulatory
   RNAs to execute their functions^[53]14. The architectural protein
   CCCTC-binding factor (CTCF) relies on its RNA bind capacity for
   chromatin organization and gene regulation in mammalian cells^[54]15.
   RNAs are also required for Polycomb repressive complex 2
   (PRC2)-mediated chromatin remodeling^[55]16. In contrast, RNA-RNA
   spatial interactions and their binding proteins facilitate chromatin
   looping during the transcriptional activation of genes^[56]17. This
   evidence suggests that the functional versatility of RNA molecules in
   3D genome organization is mediated by their networking with protein,
   DNA, and RNA. Current methodologies for caRNAs associated with targeted
   DNA loci mapping, such as ChRD-PET^[57]18and RedChIP^[58]19, demand
   extensive research resources, time-consuming procedures, inefficiency
   in the in vitro ligation reaction, and inability to capture RNA-RNA
   interactions.

   To address these limitations, we report a targeted DNA-associated RNA
   and RNA-RNA interaction mapping by sequencing (TaDRIM-Seq) technique
   that combines PG-Tn5-targeted DNA element and in situ proximity
   ligation to simultaneously profile protein-centric RNA-chromatin
   interactome, as well as RNA-RNA spatial interactions within the
   nucleus. We show that TaDRIM-seq not only presents an efficient,
   high-resolution, and powerful detection strategy in both animal and
   plant systems, but also reduces the number of input cells and the time
   required. Profiling of caRNAs by TaDRIM-seq within H3K9ac- and
   H3K27me3-marked regions in rice, as well as H3K4me3- and CTCF-marked
   regions in K562 cells, reveals protein-specific RNA-DNA and global
   RNA-RNA interactions, and the diurnal variations in RNA-DNA and RNA-RNA
   interactomes captured by TaDRIM-seq in rice suggest dynamic
   coordination of 3D genome during day/night cycles.

Results

Overview of the TaDRIM-seq Technology

   In the TaDRIM-seq method (Fig. [59]1a), cells or plant tissues were
   crosslinked with 1% formaldehyde, and nuclei were isolated and
   incubated with a specific antibody. The PG-Tn5, preloaded with a DNA
   adapter, bound to this antibody and labeled the targeted genomic DNA
   with the DNA adapter by tagmentation. Concurrently, a 5’-end adenylated
   biotin-labeled bridge linker was designed to specifically ligate 3’-end
   RNA and double-stranded DNA. We then performed in situ RNA ligation
   followed by cDNA synthesis using ligated bridge linker-RNA. The
   non-ligated bridge linker was removed by washing, and in situ proximal
   ligation was used to generate cDNA-bridge linker-DNA adapter-DNA and
   cDNA- bridge linker-cDNA chimeric fragments. The ligation products were
   subsequently converted to double-stranded DNA through reverse
   crosslinking, gap-filling, and second-strand DNA synthesis, with DNA
   fragments being produced using Tn5 transposase. Finally, the ligation
   products from DNA fragments were captured by biotin-streptavizdin
   affinity and PCR-amplified to generate TaDRIM-seq libraries for
   next-generation sequencing. The TaDRIM-seq sequencing reads were
   processed by two distinct pipelines to extract target protein-specific
   RNA-DNA interactome and global RNA-RNA interactions in the nucleus
   (Supplementary Fig. [60]1a).

Fig. 1. The TaDRIM-seq technique.

   [61]Fig. 1
   [62]Open in a new tab

   a TaDRIM-seq schematic. b Genome browser tracks showing the
   distribution of DNA from H3K4me3 and CTCF TaDRIM-seq data in K562
   cells; H3K4me3 TaDRIM-seq in 293T cells and Arabidopsis; and H3,
   H3K4me3, and H3K27me3 TaDRIM-seq data in rice. c Strand orientation of
   TaDRIM-seq sense and antisense RNA. Left: CTCF TaDRIM-seq RNA. Rigth:
   H3K9ac TaDRIM-seq RNA. d Number of RNAs identified on CTCF-marked
   genomic loci using TaDRIM-seq, RedChIP, and fRIP-seq in K562 cells. e
   Scatterplot showing RNA-DNA interactions by CTCF TaDRIM-seq compared
   with CTCF RedChIP in K562. Pearson correlation coefficient: r = 0.82. f
   Examples of RNA-chromatin interactions within the region chr1:
   5,811,994-12,441,569, located within the broader region chr1:
   1-19,209,288, were identified using CTCF TaDRIM-seq and RedChIP data.
   Each line represents an RNA-DNA interaction that is divided into RNA
   ends and DNA ends. Bottom: scatterplot showing the RNA-DNA interactions
   mediated by protein-coding RNA from RERE gene in chr1:
   8,297,864–9,467,625. Source data are provided as a Source Data file.

   To evaluate this approach, we first tested the effect of various
   cross-linking conditions on the TaDRIM-seq method in K562 cells using
   the H3K4me3 antibody, and found that under 1% formaldehyde, 10 min
   crosslinking conditions, the TaDRIM-seq method exhibited the optimal
   signal-to-noise ratio, detecting more RNAs and RNA-involved interaction
   clusters (Supplementary Fig. [63]1b, c). Subsequently, we applied it to
   K562 and 293T cells using CTCF and H3K4me3 antibodies, respectively,
   and observed high robustness and reproducibility across biological
   replicates (Supplementary Fig. [64]1d–g). To further evaluate the
   compatibility of TaDRIM-seq, we tested it in rice using H3, H3K9ac, and
   H3K27me3 antibodies, and H3K4me3 antibody in Arabidopsis seedlings,
   both of which demonstrated high reproducibility across different
   samples and replicates (Supplementary Fig. [65]2).

   To assess the specificity of TaDRIM-seq, we compared the enrichment of
   TaDRIM-seq DNAs with cleavage under targets and tagmentation (CUT&Tag)
   DNAs obtained using the same antibody and found that the DNA peaks of
   TaDRIM-seq were highly similar to the CUT&Tag peaks, whereas the H3
   TaDRIM-seq DNA signals were widely dispersed over the whole genome
   (Fig. [66]1b). Next, we identified peaks using MACS2 with reads from
   TaDRIM-seq and CUT&Tag. The TaDRIM-seq peaks coincided with the CUT&Tag
   peaks, with approximately 77% of the H3K4me3 peaks (total, 18,269) in
   293T cells, 73% of the H3K9ac peaks in rice (total, 22,873), and 87.8%
   of the H3K4me3 peaks in Arabidopsis (total, 16,225) overlapping their
   respective CUT&Tag peaks (Supplementary Fig. [67]3a). We subsequently
   estimated the proportion of reads that aligned within the called peaks
   and found that the fraction of reads in peaks (FRiPs) was 53.5% for
   H3K4me3 TaDRIM-seq compared to 79.2% for H3K4me3 CUT&Tag in K562 cells.
   The FRiPs were 32.06% for H3K27me3 TaDRIM-seq and 37.5% for H3K27me3
   CUT&Tag in rice. In Arabidopsis, the FRiP for H3K4me3 TaDRIM-seq was
   32%, whereas it was 48.3% for H3K4me3 CUT&Tag (Supplementary
   Fig. [68]3b). These results suggest a relatively high specificity of
   TaDRIM-seq in detecting DNA sites of interest.

   To test the specificity of the RNA molecules captured by TaDRIM-seq, we
   assessed RNA strand specificity and revealed distinctive
   transcriptional trajectories (Fig. [69]1c and Supplementary
   Fig. [70]3c). These results suggest that TaDRIM-seq not only
   successfully captured DNA regions specifically targeted by the protein
   of interest, but also reflected the specificity of RNA in human cells
   and plants.

   Subsequently, we compared CTCF-associated RNA-DNA interactions in K562
   cells obtained by TaDRIM-seq with the results generated by previously
   reported approaches^[71]19,[72]20. We observed that the total number of
   RNAs identified by the three protocols was highly consistent
   (Fig. [73]1d), with approximately 80.4% of the overlapping RNAs
   detected by TaDRIM-seq and fRIP-Seq. RedChIP also showed high
   concordance with the TaDRIM-seq RNA-DNA results (r = 0.82; Fig. [74]1e,
   f). The previously reported^[75]20–[76]23 lncRNAs MALAT1, NEAT1, and
   XIST associate with CTCF protein binding sites, with MALAT1 and XIST
   enriched at the transcription start sites (TSS) and X chromosome,
   respectively, whereas NEAT1 showed extensive trans-interactions on a
   whole-genome scale (Supplementary Fig. [77]3d, e), consistent with
   previous studies.

Profiling of epigenome, RNA-DNA and RNA-RNA interactomes with TaDRIM-seq

   The TaDRIM-seq technology yields multi-sets of interaction data and
   also provides information on epigenetic modifications. In the H3K4me3
   TaDRIM-seq dataset, we observed that the unique RNA-DNA interaction
   reads (11.5 million) generated 581,634 high-confident RNA-DNA
   interaction clusters (false discovery rate <0.05), among which 253,758
   interaction clusters (interaction contact count ≥3) were associated
   with RNAs transcribed from 16,716 genic loci (Supplementary
   Data [78]1).

   We next constructed an RNA-DNA heatmap and observed comprehensive
   interactions among caRNAs and genomic regions marked by H3K4me3 at
   various resolutions across the genome (Fig. [79]2a). Furthermore,
   TaDRIM-seq in situ profiles enriched transcripts and their binding
   sites in the H3K4me3-marked regions (Figs. [80]1b and [81]2b).

Fig. 2. TaDRIM-seq provides protein-centric RNA-chromatin interactomes,
RNA-RNA spatial interactions, and protein-binding DNA and RNA information.

   [82]Fig. 2
   [83]Open in a new tab

   a RNA-DNA contact heatmap of chromosome 3 in diverse resolutions by
   H3K4me3 TaDRIM-seq in K562 cells. Res, resolution. b Region of
   chromosome 2 showing the interaction of identified RNAs with DNA
   elements marked by H3K4me3 in K562 cells. c, d RNA-RNA contact heatmap
   of chromosome 11 at different resolutions in K562 cells by TaDRIM-seq.
   Zoomed-in view of detailed lncRNA MALAT1 interaction with NEAT1 (d).
   CTCF and H3K4me3 TaDRIM-seq data were combined to capture RNA-RNA
   interactions in K562 cells. e TaDRIM-seq recapitulates intramolecular
   interactions between NEAT1 and XIST lncRNAs. f TaDRIM-seq chimeric
   reads of U3 snoRNA in K562 cells. Source data are provided as a Source
   Data file.

   We next identified approximately 22.5 million uniquely mappable reads
   in H3K4me3 TaDRIM-seq to generate 63,730 and 330,347 intra- and
   inter-molecular RNA-RNA interaction clusters (false discovery rate
   <0.05), respectively. Analysis of the genome-wide intermolecular
   RNA-RNA interaction map showed that different RNA molecules engaged in
   extensive interactions within the nucleus (Fig. [84]2c). For example,
   MALAT1 and NEAT1 not only displayed significant enrichment on chromatin
   but also formed robust intermolecular interactions with one another
   (Fig. [85]2d). Intramolecular interactions of RNAs were captured by
   TaDRIM-seq as well, such as XIST and NEAT1 (Fig. [86]2e), and snRNAs
   (U1, U2, U3) (Fig. [87]2f and Supplementary Fig. [88]4) in rice and
   K562 cells. Together, these results demonstrate that TaDRIM-seq is an
   effective multi-omic technique that enables concurrent detection of
   RNA-chromatin, RNA-RNA, and intramolecular RNA interactions, providing
   a comprehensive caRNA map in animal cells and plants.

Comparison of TaDRIM-seq with existing technologies

   Compared to similar RNA-chromatin proximity ligation
   methods^[89]18,[90]19, TaDRIM-seq represents a significant technical
   improvement. A key step of TaDRIM-seq compared to existing
   methodologies is combining a DNA adapter and bridge linker, which
   enables simultaneous profiling of RNA-chromatin interactome at
   chromatin targets and global RNA-RNA interactions in a single
   experiment. While RedChIP and ChRD-PET can only detect interactions
   between RNA and chromatin. Recently, RT&Tag was developed to
   effectively profile RNA at chromatin targets^[91]24. Compared to
   RT&Tag, TaDRIM-seq requires a relatively high number of input cells,
   sequencing depth, and cost. However, TaDRIM-seq can provide information
   on chromatin-associated RNAs and their respective binding sites, as
   well as RNA-RNA interactions (Supplementary Fig. [92]5a). We then
   estimated the proportions of the two types of interactions and
   discovered that TaDRIM-seq captures RNA-DNA and RNA-RNA interactions
   with varying proportions across different samples. For instance, in
   young rice leaves, RNA-DNA interactions had a significantly greater
   contribution compared to RNA-RNA interactions (66.5% vs. 33.5%). In
   contrast, in human 293T cells, the proportions of these two
   interactions were 50.7% and 49.3%, respectively (Supplementary
   Fig. [93]5b).

   In addition, TaDRIM-seq does not require chromatin immunoprecipitation,
   End-repair, A-tailing, and sonication (Fig. [94]3a), which greatly
   reduces the technical complexity and time duration. Moreover, compared
   with ChRD-PET, which disrupts the nuclear structure and with all
   experimental steps performed in solution, the proximity ligation and
   antibody binding were performed in intact nuclei by TaDRIM-seq, which
   reduces possible false-positive rates^[95]25.

Fig. 3. Comparison of TaDRIM-seq with similar techniques.

   [96]Fig. 3
   [97]Open in a new tab

   a Summary of the characteristics of the TaDRIM-seq, ChRD-PET, and
   RedChIP methods. b Analysis of the uniquely RNA-DNA contact mapped
   reads in TaDRIM-seq, ChRD-PET, and RedChIP data. c Normalized average
   RNA read coverage signals around exons from the TaDRIM-seq and ChRD-PET
   methods in rice. d Compare the RNA-DNA contact heatmap of chromosome 8
   in diverse resolutions in rice to the ChRD-PET maps. Res, resolution. e
   Histogram showing the percentages of intra- and intermolecular RNA-RNA
   interactions by TaDRIM-seq and RIC-seq in K562 cells and rice. Source
   data are provided as a Source Data file.

   RedChIP and ChRD-PET rely on immunoprecipitation to capture
   protein-mediated RNA-DNA interactions, thus requiring a high amount of
   sample material and potentially leading to the loss of interaction
   data. Comparative analyses of input material and RNA-DNA interaction
   clusters indicated that TaDRIM-seq achieved a 17-fold and 20-fold
   reduction in plant materials and input cells, respectively, for the
   experiment (Fig. [98]3a). TaDRIM-seq outperformed ChRD-PET,
   demonstrating a more than ~4.25-fold increase in high-confident RNA-DNA
   interaction clusters (233,734 versus 54,994) with lower sequencing
   depth, and identified 32.6% more transcribed RNAs (24,861 versus
   18,748) in rice (Supplementary Fig. [99]5c). TaDRIM-seq was also
   compatible with long-read sequencing, whereas in RedChIP, the MmeI was
   utilized to trim DNA sequences, producing only 18-20 base pair tags
   that are more challenging to map uniquely to the reference genome.
   Consistently, TaDRIM-seq showed a 2.5-fold increase (4.68% versus
   1.90%) and a 3.7-fold increase (13.70% versus 3.75%) in uniquely mapped
   RNA-DNA contact reads in K562 cells and rice, respectively, compared to
   RedChIP and ChRD-PET (Fig. [100]3b and Supplementary Data [101]1). We
   also observed that RNA reads exhibited higher exon and adjacent
   intronic signals in TaDRIM-seq (Fig. [102]3c) than in ChRD-PET in rice,
   whereas TaDRIM-seq and RedChIP showed no significant variation in K562
   cells (Supplementary Fig. [103]5d), likely because TaDRIM-seq has more
   aligned reads than ChRD-PET.

   Next, we classified RNA-DNA interactions based on the interaction
   distance as follows: cis for distances ≤3 kb, intra-chromosomal for
   distances >3 kb within the same chromosome, and inter-chromosomal for
   interactions occurring between different chromosomes. TaDRIM-seq
   generated a significantly lower ratio of coding RNA-mediated
   interaction clusters occurring at a distance <3 kb within the same
   chromosome (cis interaction) (Supplementary Fig. [104]5e, f). The
   H3K9ac TaDRIM-seq and ChRD-PET interaction heatmaps of chromosome 8 at
   500 kb resolution contained 481,612 and 222,049 pairwise contacts,
   respectively. In the selected region at 100 kb resolution
   (chr8:9,500,000–16,500,000), the TaDRIM-seq contact heatmap contained
   45% more pairwise contacts than ChRD-PET (53,619 versus 29,538). In the
   selected region at 10 kb resolution (chr8:13,700,000–15,100,000), the
   TaDRIM-seq contact heatmap contained 12,810 pairwise contacts, which is
   3.6 fold of the number (3534) identified in ChRD-PET (Fig. [105]3d).
   These results indicate a significant enhancement of the resolution in
   detecting RNA-chromatin interactions in plant systems.

   Regarding RNA-RNA interactions, TaDRIM-seq mainly captured
   intermolecular interactions (85.1%), whereas RIC-seq detected
   intramolecular interactions (81.4%) in K562 cells (Fig. [106]3e),
   possibly due to in situ RNA proximity bridge linker ligation performed
   in TaDRIM-seq. Consistently, intermolecular interactions were mainly
   mediated by protein-coding RNA and snRNA in both TaDRIM-seq and RIC-seq
   (Supplementary Fig. [107]5g). Taken together, TaDRIM-seq exhibits
   significant improvements over existing methods in terms of experimental
   complexity, input cell requirements and resolution, among other
   factors, while also providing an innovative type of data on RNA-RNA
   interactions.

Global view of RNA-DNA/RNA-RNA interactions and their relation to 3D genome
architecture

   To better understand the interactive features of individual RNA in
   different chromosomal regions, we analyzed individual RNA-mediated
   interactions in rice and K562 cells and noticed that some RNAs possess
   a high connection frequency of interaction at specific chromatin
   targets. For example, MALAT1 from CTCF-marked regions was more involved
   in numerous trans-interactions than the H3K4me3-marked regions in K562
   cells (Supplementary Fig. [108]6a). Interestingly, the
   spliceosome-associated snRNAs (including U1, U2, U4, U5, U6, and U11)
   mediated interactions more in active chromatin regions (H3K9ac marked)
   when compared with inactive chromatin regions (H3K27me3 marked) in rice
   (Supplementary Fig. [109]6b), which reflects the role of spliceosomal
   RNAs in coordinated RNA processing machinery. Remarkably, OsLHY, a core
   circadian clock gene in rice, was involved in 1109 RNA-DNA interactions
   in active chromatin regions and 221 within inactive chromatin regions
   (Supplementary Fig. [110]6b), consistent with its potential function in
   mediating the transcription of active rhythmic genes. In human cells,
   approximately 67,986 CTCF binding sites have been identified in K562
   cells^[111]26. In contrast, the CTCF TaDRIM-seq analysis revealed only
   10,602 CTCF-binding sites (Supplementary Fig. [112]3a), implying that
   many CTCF sites may not be associated with RNA.

   We further generated a two-dimensional contact matrix displaying the 20
   RNAs with the highest contact frequency in RNA-chromatin interactions
   in both rice and K562 cells. We observed that some trans-acting RNAs,
   such as MALAT1 and NEAT1, displayed extensive binding to most
   chromosomes in K562 cells (Fig. [113]4a and Supplementary
   Fig. [114]6c). Similarly, in rice, numerous RNAs were detected to
   interact widely with both active and inactive chromatin regions, such
   as putative lncRNA MSTRG.19487.2 and MSTRG.32815.1 (Supplementary
   Fig. [115]6d, e). In addition, MALAT1, NEAT1, and OsLHY also showed
   extensive intermolecular RNA-RNA interactions (Fig. [116]4b and
   Supplementary Fig. [117]6f). For example, MALAT1 RNAs were associated
   with 1,024 CTCF peaks and were also in proximity with other caRNAs
   (Supplementary Fig. [118]7a), which was further confirmed using RNA
   fluorescence in situ hybridization (RNA FISH) (Fig. [119]4c). Given
   that TaDRIM-seq detects RNA-RNA interactions based on proximity
   ligation rather than direct RNA-RNA binding, and considering that
   MALAT1 binds extensively across the genome, it is also possible that
   nearby RNAs were captured through proximity ligation. In contrast to
   MALAT1, some RNAs, such as AGAP1, IMMP2L, and USP34, did not form
   RNA-RNA interactions in TaDRIM-seq, which was then confirmed by RNA
   FISH (Fig. [120]4c and Supplementary Fig. [121]7b–f).

Fig. 4. Characterization of chromatin-associated RNAs in the 3D genome
architecture.

   [122]Fig. 4
   [123]Open in a new tab

   a Heatmap showing the top twenty chromatin-associated RNAs with the
   high connection frequency with genome, are widely recruited to
   multigenomic loci across the whole genome at 1 Mb resolution in CTCF
   TaDRIM-seq. b Circos plots of RNA-RNA interactions mediated by MALAT1
   and NEAT1 lncRNA in TaDRIM-seq. Each line represents an RNA-RNA
   interaction. The outer circles show different chromosomes, and the
   inner circles indicate high-frequency contacts. c MALAT1 is colocalized
   with NEAT1, SMYD3, and [124]GSE1 by RNA FISH in K562 cells. RNA FISH
   analysis also showed the co-localization of NEAT1, SMYD3, and
   [125]GSE1. Scale bar, 2 μm. MALAT1/ NEAT1: n = 47 cells were examined;
   MALAT1/ SMYD3: n = 41 cells were examined; MALAT1/ [126]GSE1: n = 42
   cells were examined. d Model for 3D chromatin structure based on CTCF
   ChIA-PET and CTCF TaDRIM-seq data. Model 1: the RNA-DNA interactions of
   DNA ends (Basal) that are not involved in chromatin interactions. Model
   2: RNA-DNA interactions of DNA ends (Anchor) that participate in the
   chromatin loop. Based on TaDRIM-seq technology, global RNA-RNA spatial
   interactions can be determined. e Boxplot showing the expression levels
   of basal genes from model 1 (n = 3927) and anchor genes from model 2
   (n = 6064) in CTCF TaDRIM-seq, as well as intensities of H3K4me3 and
   RNAPII peaks in basal and anchor genes. H3K4me3 marker peaks are
   associated with basal genes (n = 15,358) and anchor genes (n = 7383),
   while RNAPII marker peaks are associated with basal genes (n = 5160)
   and anchor genes (n = 8743). In the box plot, the center line indicates
   the median, and the box represents the interquartile range (IQR). The
   whiskers extend to 1.5 times the IQR. P-values were evaluated using a
   one-sided Wilcoxon rank sum test. f Histogram showing the number of
   RNA-DNA interactions in models 1 and 2. Left: all RNA-DNA interactions
   in CTCF TaDRIM-seq. Right: MALAT1-chromatin interactions in CTCF
   TaDRIM-seq. g Screenshot showing genome-wide chromatin interactions
   involving MALAT1. h Connectivity network of RNA-chromatin interactions
   on chromosome 15 in K562 cells. Gray line: RNA-DNA interaction. DNA
   (green dots) belongs to Model 1, and DNA (blue dots) belongs to Model
   2. RNA 1 (red-dot) is engaged in intermolecular RNA-RNA interactions,
   and RNA 2 (yellow-dot) is not involved in intermolecular RNA-RNA
   interactions. Source data are provided as a Source Data file.

   To explore the potential relationship between RNA-chromatin
   interactions and three-dimensional (3D) chromatin structure, we
   combined CTCF TaDRIM-seq data with CTCF-associated ChIA-PET data from
   K562 cells. We classified the basal peaks and anchors into model 1 and
   model 2 according to whether the DNA anchors of the CTCF ChIA-PET loops
   and basal peaks were associated with RNAs (Fig. [127]4d). Compared with
   the basal genes from model 1, the anchor genes from model 2 exhibited
   significantly higher intensities of H3K4me3 and RNAPII, corresponding
   to their transcriptional levels (Fig. [128]4e).

   Furthermore, 44,585 RNA-DNA interactions associated with CTCF-marked
   DNAs (77.38%; total, 57,535) belonged to model 2 (Fig. [129]4f). Within
   MALAT1 RNA-DNA interactions, we observed 820 (80.08% of the total 1024)
   interactions associated with the anchor region of chromatin loops
   (Fig. [130]4f, g). In addition to the massive interactions of MALAT1
   and NEAT1 with various DNAs and RNAs, we identified microRNA-MIR3648,
   which is transcribed from the upstream region of pre-rRNA^[131]27.
   MIR3648 showed highly engagement in a multitude of RNA-RNA interactions
   within the cell nucleus (Supplementary Fig. [132]7g). Furthermore,
   since TaDRIM-seq detection of RNA-RNA interactions is not based on
   direct RNA-RNA binding events, it may also lead to the identification
   of RNA-RNA interactions involving highly transcribed RNAs.

   Multidimensional interaction networks were generated with RNA-DNA,
   RNA-RNA, and DNA-DNA data using Cytoscape on all chromosomes in K562
   cells (Fig. [133]4h and Supplementary Fig. [134]8). As illustrated on
   Chr.15, transcribed RNAs enriched in CTCF-related chromatin not only
   form an RNA-chromatin interactome at anchor regions of the chromatin
   loop but are also involved widely in RNA-RNA spatial interaction
   networks. For instance, one RNA and seven RNAs interact with the basal
   site (FBN1) and the anchor locus (PAK6), respectively, where the six
   RNAs interacting with the anchor gene also display RNA-RNA spatial
   interactions. While the RNA interacted with the basal gene tends to
   form few or no RNA-RNA interactions. Collectively, these findings
   suggest that RNAs are widely present at anchor regions from chromatin
   loops, and some RNAs may indirectly target specific genomic sites
   through RNA-RNA interactions. Therefore, by combining RNA-DNA, RNA-RNA
   interactions, and protein binding site information in TaDRIM-seq, we
   can predict the RNA networks at specific protein-marked genomic loci,
   providing a comprehensive connectivity map of nuclear components to
   uncover the function of RNA in genome organization and gene regulation.

Dynamic changes in H3K9ac-marked RNA-DNA interactions during diurnal cycles

   RNAPII-tethered chromatin interactions are highly dynamic and are
   associated with rhythmic gene expression in rice^[135]28. To
   investigate how RNA-chromatin and RNA-RNA interactions respond to
   day-night cycles in rice, we performed H3K9ac TaDRIM-seq on samples
   collected at 8:00 (AM) and 8:00 (PM) and generated in situ TaDRIM-seq
   contact maps. We found that the RNA-chromatin contact matrices at
   500 kb and 50 kb resolutions showed massive RNA-DNA interactions. At
   10 kb resolution, significant differences became prominent at the two
   time points; the same RNA interacted with one genomic region much
   stronger at 8 PM than at 8 AM (Fig. [136]5a). We classified caRNAs into
   mRNAs, lncRNAs, microRNAs, and sn/snoRNAs and calculated the proportion
   of cis-, intra-, and inter-chromosomal interactions at the two time
   points. Interestingly, snRNAs and snoRNAs displayed a higher
   contribution to inter-chromosomal interactions at both time points
   compared to other types of RNAs (Fig. [137]5b), indicating that they
   are primarily involved in long-range interactions with multiple genomic
   loci, which is consistent with previous reports^[138]18,[139]29. In
   contrast, mRNAs, lncRNAs, and microRNAs showed an increase in
   intra-chromosomal interactions at PM (Fig. [140]5b). Global views of
   H3K9ac-associated RNA-DNA interactions of 12 chromosomes of rice
   exhibited uneven distribution of caRNA along the genome between morning
   and evening (Supplementary Fig. [141]9a). We subsequently calculated
   intermolecular RNA-RNA interaction clusters formed by each transcript
   at two time points and observed 4524 (31%) up-regulated and 4329 (30%)
   down-regulated RNA interactions in the morning compared with evening,
   whereas 5629 (39%) RNAs exhibited no significant diurnal variation
   (Supplementary Fig. [142]9b). From the altered RNA-RNA interactions in
   rice, we identified key components in the circadian oscillator, such as
   OsLHY, OsPRR73, and Ghd7.1. The transcripts from the OsPRR73 and Ghd7.1
   gene loci exhibited widespread intermolecular RNA-RNA interactions in
   the AM compared to PM, whereas the OsLHY transcript only engaged in
   RNA-RNA interactions at 8 AM (Supplementary Fig. [143]9c).

Fig. 5. Alterations in H3K9ac-dependent RNA-DNA interactomes and RNA-RNA
interactions in rice during the diurnal cycle.

   [144]Fig. 5
   [145]Open in a new tab

   a RNA-chromatin contact heatmaps at 500, 50, and 10 kb resolution of
   the 08:00 (AM) and 20:00 (PM) H3K9ac TaDRIM-seq datasets. Zoomed-in
   view of detailed RNA-DNA interaction in the indicated region
   (chr3:12,332,500–12,416,250). b Histogram showing the percentages of
   cis-, intra-, and inter-chromosomal interactions by different types of
   RNAs in AM and PM. c Volcano plot showing transcripts differentially
   enriched for AM compared to PM H3K9ac TaDRIM-seq (FC > 2, FDR < 0.05).
   RNAs specifically enriched in AM and PM are highlighted in red and
   blue, respectively. d Examples showing RNA-DNA chromatin interactions
   by OsLHY (AM-specific mRNA) and MSTRG.34836.12 (PM-specific lncRNA).
   The top panel displays a genome browser track with RNA signals for
   OsLHY and MSTRG.34836.12. The bottom panel presents a circos plot
   detailing chromatin interactions involving OsLHY and MSTRG.34836.12
   RNA. e Genome browser track showing OsLHY (indicated regions:
   chr8:2,798,370–3,093,452) and MSTRG.34836.12 (indicated regions:
   chr8:13,880,548–14,924,032) involved in chromatin loops. f Connectivity
   network constructed from H3K9ac-centric RNA-DNA interactions from the
   top 10 specifically enriched RNAs with the highest FDR values at 08:00
   (AM) and 20:00 (PM). Line: RNA-DNA interaction. Red dots represent
   AM-specific RNAs and black dots represent PM-specific RNAs. DNA ends
   (green dots) are anchors, and DNA ends (magenta dots) are basal. Source
   data are provided as a Source Data file.

   To identify time-specific caRNAs in the diurnal cycle, we compared each
   RNA with the connection frequency of interaction at the two time points
   and found 674 and 335 RNAs that were differentially enriched at 8 AM
   and 8 PM, respectively (FC > 2, FDR < 0.05; Fig. [146]5c). The OsLHY
   transcript was heavily distributed in the whole genome at 8 AM, with a
   dramatic decrease in RNA-DNA interactions at 8 PM (1109 versus 7),
   while the genomic distribution pattern of lncRNA MSTRG.34836.12
   presented substantial differences between the two time points, with an
   increase in both cis- and inter-chromosomal interactions at 8 PM
   (Fig. [147]5d). GO enrichment analysis indicated that DNAs involved in
   specific RNAs interactions were enriched in light harvesting and
   carbohydrate biosynthetic processes at 8 AM, and in secondary metabolic
   and carbohydrate metabolic processes at 8 PM (Supplementary
   Fig. [148]9d). Together, our results demonstrate that TaDRIM-seq can
   discriminate time-specific RNA-chromatin interactions associated with
   H3K9ac, providing new insights into rhythmic gene coordination along
   with diurnal processes.

   To investigate the relationship between the time-specific
   caRNA-chromatin interactions detected by TaDRIM-seq and diurnal genome
   architecture in rice, we leveraged Hi-C data produced from the same
   tissue at 8 AM and 8 PM. We obtained H3K9ac-mediated chromatin
   interactions from Hi-C data based on H3K9ac-marked genomic regions by
   H3K9ac ChIP-seq at 8 AM and 8 PM. A total of 233,734 and 225,675
   high-confident RNA-DNA interaction clusters (interaction’s contact
   count ≥3), containing 21,983 and 23,475 DNA ends, were identified at 8
   AM and 8 PM, respectively (Supplementary Fig. [149]9e).Within these
   DNAs, 75.99% and 90.58% were also engaged in chromatin loops at the two
   time points. Two time-specific caRNAs, OsLHY and MSTRG.34836.12, were
   extensively associated with chromatin loops (Fig. [150]5e). From the
   TaDRIM-seq data, 194,240 and 210,034 of the RNA-DNA interactions
   involved DNA anchor ends marked by H3K9ac in the morning and evening,
   respectively (Supplementary Fig. [151]9f). We next constructed
   interaction networks from time-specific RNAs (FC > 2, FDR < 0.05: top
   10) and found OsLHY-centered highly connected RNA-DNA networks in the
   morning in contrast to the scattered, sporadic spatial RNA-DNA
   connectivity in the evening (Fig. [152]5f). Collectively, these results
   suggest that altered RNA-chromatin, RNA-RNA, and DNA-DNA interactions
   can form various networks throughout diurnal cycles, providing insights
   into the potential roles of plant caRNAs in chromatin organization and
   gene regulation.

Discussion

   Emerging roles of chromatin-associated RNAs, their respective binding
   DNA element and RNA-RNA interactions have been identified for chromatin
   structure and gene expression regulation from recent
   studies^[153]2,[154]15,[155]16. Here, we developed TaDRIM-seq, a robust
   and efficient methodology for simultaneously capturing RNA-DNA
   interactions at different epigenomic or protein-targeted regions, as
   well as the global RNA-RNA interactomes. We applied TaDRIM-seq for
   caRNAs mapping at H3K27me3- and H3K9ac-marked regions in rice, as well
   as H3K4me3- and CTCF-marked regions in K562 cells. Intra- and
   inter-molecular RNA-RNA interactions were profiled concurrently in
   mammalian cells and plants, and alterations in caRNAs distribution at
   H3K9ac-marked regions, as well as associated intermolecular RNA-RNA
   interactions during diurnal cycles, were uncovered in rice.

   Compared to existing methodologies, TaDRIM-seq provides several primary
   advantages^[156]18,[157]19. First, TaDRIM-seq enables the concurrent
   detection of two types of caRNA interactions within the same experiment
   by designing a DNA adapter for caRNA-DNA capture and an bridge linker
   for RNA-RNA detection. This methodology can be easily adapted to
   different antibodies in both animal and plant systems. Another key
   technical advancement in TaDRIM-seq is the use of a hyperactive pG-Tn5
   fusion protein for in situ fragmentation and tagmentation of
   protein-targeted chromatin. Thus, TaDRIM-seq does not depend on
   immunoprecipitation to capture protein-bound RNA-DNA interactions,
   significantly reducing the required sample material and technical
   complexity. Moreover, TaDRIM-seq simplifies the workflow by omitting
   end-repair, A-tailing, and sonication, thereby decreasing hands-on time
   and improving the performance-to-cost ratio. Meanwhile, TaDRIM-seq
   exhibited significantly higher resolution at 10 kb in detecting
   RNA-chromatin interactions, particularly in plant systems.

   Application of TaDRIM-seq in rice has unveiled genome-wide interactions
   between caRNAs in both active (H3K9ac-mark) and inactive
   (H3K27me3-mark) chromatin regions. The results showed that sn/sno RNAs
   were primarily involved in inter-chromosomal interactions in both
   H3K9ac- and H3K27me3-marked regions. In contrast, protein-coding RNAs,
   lncRNAs, and microRNAs tend to engage in more intra-chromosomal
   interactions within inactive versus active chromatin regions. Previous
   studies in plants have shown that spliceosome-associated snRNAs,
   including U1, U2, U4, U5, and U6atac, are widely involved in
   inter-chromosomal interactions at various genomic loci^[158]18,[159]29.
   Consistent with these results, we found that snRNAs, including U1, U2,
   U4, U5, U6atac, and U11, were strongly associated with active chromatin
   regions, which may reflect their potential roles in coordinated RNA
   processing machinery. Regarding lncRNAs, we identified MSTRG.19487.2
   and MSTRG.32815.1, which are associated with the two lncRNAs MALAT1 and
   NEAT1 in mammals^[160]21–[161]23, and are distributed across all
   chromosomes.

   MALAT1 is widely associated with chromatin and the 3D architecture of
   the genome. This may be due to MALAT1 localizing to nuclear speckles
   and the chromosomal interactions arranged around these
   structures^[162]30,[163]31. Recent studies have also revealed that
   NEAT1 and MALAT1 exhibit extensive intermolecular RNA-RNA
   interactions^[164]17. TaDRIM-seq detected NEAT1 and MALAT1, as well as
   several other lncRNAs, such as LINC02476 and ENSG00000286251,
   exhibiting similar chromatin-association patterns. Moreover, robust
   RNA-RNA interactions between NEAT1 and MALAT1 were also captured by
   TaDRIM-seq, which was further confirmed by RNA FISH assayed in the
   nucleus. Since TaDRIM-seq enables the detection of protein-specific
   RNA-chromatin interactions, we found a higher frequency of association
   between MALAT1 and CTCF-marked genomic loci compared between
   H3K4me3-marked DNAs. It has been reported that MALAT1 targets CTCF
   binding sites and participates in chromatin interactions mediated by
   CTCF protein^[165]32. Consistent with previous findings, our data
   revealed numerous caRNAs that were tightly bound to DNA anchor loci at
   chromatin loops in both rice and K562 cells. Mammalian studies indicate
   that RNA-DNA interactions are mainly localized within topologically
   associating domains (TADs)^[166]22,[167]23, whereas in rice, RNAs are
   more enriched in chromatin-interacting domains (CIDs)^[168]18,
   suggesting a potential role for RNA-chromatin interactions in shaping
   three-dimensional genome organization. Recent liquid-liquid phase
   separation studies in human cells have shown that ncRNAs can form
   condensates within the cell nucleus for transcriptional
   regulation^[169]33. Interestingly, we found that some caRNAs were also
   highly engaged in RNA-RNA interactions. Therefore, we speculate that
   RNAs associated with chromatin may also form condensates via RNA-RNA
   interactions, thereby affecting the three-dimensional structure of the
   genome.

   Furthermore, emerging role of microRNAs, such as miR3648, have been
   uncovered to bind extensively to repressed chromatin in the
   B-compartment as annotated by Hi-C data^[170]21,[171]25,[172]34. From
   our TaDRIM-seq results, miR3648 interacted with DNA and was highly
   involved in a multitude of RNA-RNA interactions within the nucleus.
   These findings provide valuable insights for functional studies of
   RNA-RNA interactions in chromatin organization.

   To test the sensitivity of the TaDRIM-seq methodology, we profiled
   alterations in H3K9ac-mediated RNA-chromatin and global RNA-RNA
   interactions between day and night in rice. We found that snRNAs and
   snoRNAs displayed a higher contribution to inter-chromosomal
   interactions at both time points, whereas the proportions of
   intra-chromosomal interactions mediated by mRNAs, lncRNAs, and microRNA
   exhibited a significant increase in the evening. Furthermore, we
   observed that transcripts from the OsLHY gene loci, a core circadian
   clock gene in rice, not only exhibited extensive inter-chromosomal
   interaction patterns, but also exhibited massive intermolecular RNA-RNA
   connectivity networks. In contrast, none of the evening clock genes
   formed such an intensive interacting network; instead, more scattered
   and sporadic connectivity networks were formed in the evening. We
   further revealed that DNAs that interact with time-specific RNAs were
   enriched for the GO term “light harvesting” and “carbohydrate
   biosynthetic process” in the morning, whereas “secondary metabolic
   process” and “carbohydrate metabolic process” in the evening,
   suggesting that dynamic RNA-chromatin interactions may have functions
   in rhythmic biological processes. Previous studies have shown that
   stress can induce changes in the RNA-chromatin interactome and
   chromatin-associated RNAs interactions with genes that respond to
   stimuli in Arabidopsis and human endothelial cells^[173]29,[174]35.

   In summary, TaDRIM-seq is an efficient, versatile, and technically
   simple method to assay RNA-chromatin interactions, as well as RNA-RNA
   interactions at various epigenomic regions and nuclear protein binding
   sites. TaDRIM-seq data provide a rich resource of caRNAs for further
   functional studies of RNA-chromatin interactions in both animals and
   plants. We anticipate that TaDRIM-seq will serve as an effective tool
   for comprehensive investigations into the role of caRNAs in chromatin
   remodeling and gene regulation.

Methods

Cell culture and crosslinking

   Human K562 (ATCC, CCL-243) and 293T (ATCC, CRL-3216) cells were
   cultured in RPMI 1640 (Thermo Fisher, C22400500BT) and DMEM (Thermo
   Fisher, C11995500BT), respectively, supplemented with 10% FBS (Thermo
   Fisher; 10099141C) and 1% penicillin-streptomycin (Thermo Fisher;
   15140122). The cells were grown at 37 °C with 5% CO[2] in a CO[2]
   incubator (Cellmate, Esco Scientific). Cells were cross-linked with 1%
   formaldehyde (Sigma-Aldrich; F8775) in 1×PBS (Thermo Fisher; AM9625)
   for 10 min with rotation at room temperature, followed by quenching
   with 0.2 M glycine (Sigma-Aldrich; G8898) for 5 min at room temperature
   with rotation. The cells were washed three times with 1× PBS and stored
   at −80 °C.

Plant materials and crosslinking

   In this study, rice Xian/indica cv. Minghui 63 (MH63) seeds were used
   and sown in May. Germinated seedlings were transplanted into a paddy
   field (Huazhong Agricultural University, Wuhan, China) and grown under
   normal agricultural conditions until harvest in July. Plants for young
   rice leaf tissue were grown in a greenhouse under a 14/10-h day/night
   cycle at a temperature of 32/28 °C. Wild-type A. thaliana (Colombia-0
   ecotype) was used in this study. The Arabidopsis seeds were
   surface-sterilized and stratified at 4 °C for 4 days. The Arabidopsis
   seedlings were cultivated at 22 °C under 12-h cycles of white
   fluorescent light and darkness for 15 days on Murashige and Skoog (MS)
   plates. All tissues were immediately cross-linked with formaldehyde and
   stored at −80 °C.

pG-Tn5 transposase assembly

   According to the manufacturer’s protocol (Vazyme; S602), pG-Tn5
   transposase was assembled. Briefly, primers A
   (5’-phos-CTGTCTCTTATACACATCTT-3’) and B
   (5′-phos-AAGATGTGTATAAGAGACAG-3′) were dissolved to 100 μM in annealing
   buffer. And then, the equimolar mixture of primers A and B was annealed
   by PCR instrument at 75 °C for 15 min, 60 °C for 10 min, 50 °C for
   10 min, 40 °C for 10 min, and 25 °C for 30 min, and named primer mix.
   Afterward, hyperactive pG-Tn5 Transposase (500 ng/μl), primer mix, and
   coupling buffer were mixed on the basis of the manufacturer’s protocol
   and were incubated at 30 °C for 1 h and named TTE Mix and stored at
   −20 °C for future use.

Construction of TaDRIM-seq library

   Nuclei preparation and tagmentation. Approximately 1 × 10^6
   cross-linked cells were used per replicate. Rice formaldehyde-fixed
   nuclei were isolated according to a previously reported
   protocol^[175]36. Cell nuclei were isolated in 1 mL lysis buffer (10 mM
   Tris pH 7.5, 10 mM NaCl, 0.2% NP40, 1× protease inhibitors, 0.2 U/μl
   RiboLock RNase) (Thermo Fisher) and incubated for 10 min on ice
   followed by centrifugation for 3 min at 2500 g and 4 °C. The nuclei
   were washed twice with 300 µl 1× wash buffer A (20 mM HEPES pH 7.6,
   150 mM NaCl, 0.5 mM Spermidine, 1× protease inhibitors, 0.2U/μl
   RiboLock RNase) (Thermo Fisher), and nuclei were collected by
   centrifugation for 3 min at 2500 g and 4 °C. Nuclei were then
   resuspended in Antibody buffer (495 μl 1× wash buffer A, 5 μl 5%
   digitonin, 2 μl 0.5 M EDTA, 1.6 μl 30%BSA, 10 μg antibody (H3K4me3,
   Abcam, ab8580; H3K9ac, Abcam, ab10812; CTCF, Abcam, ab188408; H3,
   Abclonal, A2348; H3K27me3, Abclonal, A2363), and incubated at 4 °C for
   2 h with rotation. Nuclei were pelleted for 3 min at 2500 g and washed
   five times with 300 μl 1× wash buffer A, resuspended in 500 µl 1×
   transposase incubation buffer (20 mM HEPES pH 7.6, 300 mM NaCl, 0.5 mM
   Spermidine, 0.01% digitonin, 1× protease inhibitors, 0.2U/μl RiboLock
   RNase) (Thermo Fisher), followed by adding 5 μl TTE Mix and incubation
   at 4 °C for 1 h with rotation. Nuclei were collected and washed five
   times with 300 µl 1×transposase incubation buffer, resuspended in
   tagmentation buffer (500 µl 1× transposase incubation buffer, 5 µl 1 M
   MgCl[2]), and incubated at 37 °C for 1 h. Next, 500 µl of 40 mM EDTA
   (Thermo Fisher; AM9260G) was added to stop the reaction. The isolated
   nuclei were washed twice with 500 µl 1× PBS (Thermo Fisher; AM9625) and
   then incubated in 200 µl permeabilization buffer (190 µl nuclease-free
   water, 10 µl 10% SDS) (Thermo Fisher) at 62 °C for 10 min. The SDS
   reaction by adding add 580 μl of nuclease-free water and 100 μl of 10%
   (vol/vol) Triton X-100 (Sigma-Aldrich; T8787) and incubated at 37 °C
   for 15 min with rotation. RNA fragmentation and PNK treated. The
   isolated nuclei were washed twice with 500 µl 1× wash buffer B (20 mM
   Tris-HCl, pH 7.5, 10 mM MgCl[2,] and 0.2% Tween 20) and resuspended in
   500 µl 1× CutSmart buffer (NEB; B6004S), followed by adding 1.6 µl
   RNase I[f] (NEB; M0243S) diluted tenfold in 1× PBS, and incubated at
   37 °C for 3 min with rotation followed by incubation for 5 min on ice.
   Nuclei were collected by centrifugation and resuspended in 400 µl PNK
   solution containing 1× T4 PNK reaction buffer, 1 mM ATP (Thermo Fisher;
   R0441), 1U/μl RiboLock RNase (Thermo Fisher; EO0382), and 1U/μl T4 PNK
   (Thermo Fisher; EK0032) at 37 °C for 40 min with rotation.

   In situ ligation of DNA and RNA. The specific bridge linker was
   designed to contain 5ʹ phosphorylation [5Phos] and Internal Biotin dT
   [ibiodT] chemical modifications. The forward strand was
   5′-[5APP]CCTCGATAC/iBIOdT/TGAGCTGAC-3′, and the reverse strand was
   5′-[5Phos]GTCAGC/iBIOdT/CAAGTATCGAG-3′. The 5′App ends of the forward
   strand were treated using a DNA 5′ Adenylation Kit (NEB; E2610L). The
   isolated nuclei were washed twice with 500 µl 1× wash buffer B and
   resuspended in 400 µl of RNA ligation solution (1× T4 RNA ligase
   reaction buffer, 1U/μl RiboLock RNase, 10U/μl T4 RNA Ligase 2
   truncated, 1.5pmol/μl bridge linker, 15% (w/v) PEG 8000, 0.1% Triton
   X-100) (NEB; M0242L), and incubated overnight at 16 °C with rotation.
   The nuclei were pelleted by centrifugation and washed five times with
   500 μl 1× wash buffer B, resuspended in 400 μl reverse transcription
   solution (1× MMLV buffer, 0.5 mM dNTP, 8U/μl MMLV (RNase H-), 1U/μl
   RiboLock RNase) (Promega; M5301), and then incubated at 42 °C for 1 h.
   The isolated nuclei were washed three times with 500 μl 1× wash buffer
   B, resuspended in T4 DNA ligation solution (1× DNA ligase reaction
   buffer with 10 mM ATP, 4U/μl T4 DNA ligase, 0.1% (vol/vol) Triton X-100
   and 0.1 mg/ml BSA) (NEB; M0202S), and incubated overnight at 16 °C with
   rotation.

   Reversal crosslinking and RNA/DNA purification. The isolated nuclei
   were resuspended in proteinase K solution (50 mM Tris-HCl, pH 7.5, 1%
   SDS, 100 mM NaCl, 1 mM EDTA, and 1 mg/mL proteinase K) at 65 °C for 3 h
   with shaking at 800 rpm. RNA/DNA complex was extracted and dissolved in
   nuclease-free water.

   Gap-filling and second-strand cDNA synthesis. The purified RNA/DNA
   complexes were incubated with gap-filling solution (10×NEBuffer
   r2.1,1 mM dNTP and 0.18U/μl T4 DNA polymerase) (NEB; M0203S) at room
   temperature for 40 min to repair the pG-Tn5 transposition gap, followed
   by second-strand cDNA synthesis by second strand synthesis Module (NEB;
   E6111S).

   DNA library preparation, PCR Enrichment, and sequencing. Next, DNA-cDNA
   complexes were converted to a DNA library using the TruePrep DNA
   Library Prep Kit for Illumina (Vazyme; TD501). The fragments containing
   biotinylated bridge-linker were enriched by M280 beads (Thermo Fisher;
   11206D) before PCR, and the enriched library DNA was size-selected
   using AMPure XP beads (Beckman; A63881) and sequenced (2× 150 bp) using
   Illumina Hiseq X-Ten.

Construction of ChRD-PET library

   The ChRD-PET libraries were constructed as previously
   described^[176]18. Briefly, 5 g of rice leaves were ground into a fine
   powder in liquid nitrogen and then lysed in 7 ml of buffer S (50 mM
   HEPES-KOH pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium
   deoxycholate and 1% SDS) at 4 °C for 40 minutes with rotation.
   Subsequently, 30 ml of buffer F (50 mM HEPES-KOH pH 7.5,150 mM NaCl,
   1 mM EDTA, 1% Triton X-100 and 0.1% sodium deoxycholate) was added and
   mixed thoroughly, followed by sonicated treatment using a Bioruptor
   (Diagenode) with the following conditions: high level, 36 cycles, 30 s
   on and 50 s off. The chromatin complexes were centrifuged, and the
   supernatant was transferred for immunoprecipitation using 50 μl of
   H3K9ac antibody (Abcam; ab10812) and 750 μl of suspended protein G
   magnetic beads (~1:100 dilution) (Thermo Fisher; 10009D). Chromatin
   complexes-beads were washed three times with 1 ml of 0.1% SDS FA cell
   lysis buffer (0.05 M HEPES-KOH, 0.15 M NaCl, 0.001 M EDTA, 1% Triton
   X-100, 0.1% sodium deoxycholate and 0.1% SDS), then washed twice with
   1 ml of high-salt ChIP buffer (0.05 M HEPES-KOH, 0.35 M NaCl, 0.001 M
   EDTA, 1% Triton X-100, 0.1% sodium deoxycholate and 0.1% SDS), once
   with 1 ml of ChIP wash buffer (0.01 M Tris-HCl, 0.25 M LiCl, 0.001 M
   EDTA, 2.5% NP-40 and 0.5% sodium deoxycholate) and twice with 1 ml of
   1× TE buffer (Ambion; AM9849). Next, the chromatin complex-beads were
   processed for end repair and A-tailing using T4 DNA polymerase
   (Promega; M421F) and Klenow enzyme (NEB; M0212L). A specific bridge
   linker (forward strand: 5′ [5APP]CCTCGATAC/iBIOdT/TGAGCTGACT-3′;
   reverse strand: 5′-[5Phos] GTCAGC/iBIOdT/CAAGTATCGAG-3′) was designed
   to ligate RNA, followed by reverse transcription using MMLV (RNase H-)
   (Promega; M5301), incubated at 42 °C for 1 h. The chromatin
   complex-beads were resuspended in T4 DNA ligation solution (1× DNA
   ligase reaction buffer with 10 mM ATP, 4 U/μl T4 DNA ligase, 0.1%
   (vol/vol) Triton X-100, and 0.1 mg/ml BSA) (NEB; M0202S) and incubated
   overnight at 16 °C with rotation. The complex-beads were reverse
   crosslinked and subjected to second-strand cDNA synthesis using the
   Second Strand Synthesis Module (NEB; E6111S). Finally, the ChRD-PET
   library was constructed using the TruePrep DNA Library Prep Kit for
   Illumina (Vazyme; TD501) and sequenced (2 × 150 bp) on an Illumina
   HiSeq X-Ten platform.

CUT&Tag

   The CUT&Tag assay was carried out according to a previously published
   protocol^[177]37, with slight modifications. Crosslinked mature rice
   leaves were used to extract nuclei. The nuclei pellets were washed
   twice with 300 µl of 1× wash buffer A, resuspended in 100 µl of
   antibody buffer (99 µl 1× wash buffer A, 1 µl 5% digitonin, 0.4 µl
   0.5 M EDTA, 0.32 µl 30% BSA, and 5 µg H3K27me3 (Abclonal; A2363) or
   H3K9ac antibody (Abcam; ab10812), and incubated at 4 °C for 2 h with
   rotation. The nuclei were collected, washed three times with 300 µl of
   1× wash buffer A, and resuspended in 100 µl of 1× transposase
   incubation buffer (20 mM HEPES pH 7.6, 300 mM NaCl, 0.5 mM spermidine,
   0.01% digitonin, 1× protease inhibitors, and 0.04 µM pG-Tn5) at 4 °C
   for 1 h with rotation. Next, the nuclei were collected, washed three
   times with transposase incubation buffer (without pG-Tn5), and
   incubated at 37 °C for 1 h. The transposed DNA was purified using a DNA
   Clean and Concentrator-5 kit (Zymo Research; D4014). Finally, the DNA
   library was amplified by PCR, size-selected using AMPure XP beads, and
   sequenced (2× 150 bp) using the Illumina HiSeq X-Ten platform.

RT&Tag

   RT&Tag libraries were constructed using 4 μg of H3K4me3 (antibody
   Abcam; ab8580) and 100,000 human K562 cells according to RT&Tag
   protocol ([178]https://www.protocols.io/view/rt-amp-tag-bn36mgre).
   Finally, the RT&Tag library was sequenced (2× 150 bp) using the
   Illumina HiSeq X-Ten platform.

RNA-FISH

   Human K562 cells were fixed by 4% paraformaldehyde (Boster; AR1068)
   spread on slides. RNA FISH probes were designed according to the target
   sequence (Supplementary Table [179]1). The sample was subjected to
   dehydration and denaturation using methanol, followed by digestion with
   pepsin. Next, the hybridization buffer containing specific targeting
   probes was added to the chamber for incubation at 37 °C overnight. The
   sample was washed three times and then subjected to rolling circle
   amplification using Phi29 DNA polymerase (Vazyme; N106-02) at 30 °C
   overnight. Subsequently, the hybridization buffer containing
   fluorescent probes was added to the samples, followed by dehydration
   with an ethanol series and mounting with mounting medium. Finaaly, the
   information of RNA spatial position was observation via a Leica THUNDER
   Imaging Systems.

Processing of RNA-DNA interactions in TaDRIM-seq and ChRD-PET data

   The raw TaDRIM-Seq RNA-DNA portion data and ChRD-PET data were analyzed
   using the ChRD-PET pipeline^[180]18, a process for chromatin-associated
   RNA-DNA interactions, and consists of three scripts. It mainly
   includes: splitting data into DNA-end and RNA-end data, mapping DNA-end
   data to the reference genome by BWA^[181]38 (version 0.7.15) and
   RNA-end data to the reference genome by HISAT2^[182]39 (version 2.1.0)
   and Bowtie2^[183]40 (version1.9); Extracting the uniquely matched
   reads; Converting bam files to BigWig files; Merging RNA-DNA
   interaction through reads ID; checking the reliability of RNA-DNA
   interactions, etc. In this process, the linker sequence of TaDRIM-Seq
   data used was

   A-CCTCGATACTTGAGCTGACAAGATGTGTATAAGAGACAG;

   B-CTGTCTCTTATACACATCTTGTCAGCTCAAGTATCGAGG.

   The linker sequence of the ChRD-PET data used is as follows:

   A-CCTCGATACTTGAGCTGACT

   B-AGTCAGCTCAAGTATCGAGG

   The histone modification peak corresponding to the TaDRIM-Seq and
   ChRD-PET data were expanded by ±10% as the supplied anchor to acquire
   specifically enriched information on RNA-DNA interactions. Only after
   choosing the interactions with an FDR < 0.05 to be examined further.

Peak calling for TaDRIM-seq data

   The parameters are slightly different when MACS2^[184]41 (version
   2.1.1) is used to call the peak for the DNA and RNA end data. For the
   DNA part, the main parameters are: -f BAM -g 3.6e+8; For the RNA part,
   the main parameters are: -f BAM -g 3.6e+8 --nomodel --extsize 400. To
   visualize the peak files, the bam file of DNA and RNA compared to the
   reference genome can be converted to a bigWig file by bamCoverage in
   deepTools^[185]42 (version 3.2.1). In this process, the main parameters
   on the DNA side are --binSize 5 --normalizeUsing RPKM
   --minMappingQuality 20, and on the RNA side are
   --samFlagExclude/--samFlagInclude 16 --binSize 5 -- minMappingQuality
   20 --normalizeUsing RPKM. Finally, bigWig files can be visualized
   through Integrative Genomics Viewer (IGV)^[186]43 (version 2.5.0).

TaDRIM-seq contact map construction and visualization

   Python tool that was used to deal with the RNA-DNA unbalanced matrix in
   ChRD-PET data^[187]18 was employed as a reference to normalize the
   matrix, considering that the RNA-DNA contact matrix is an unbalanced
   matrix. The core function of Python is to normalize an unbalanced
   matrix using the VC_seq technique. Finally, 500 kb, 100 kb and 10 kb
   resolution RNA-DNA interaction heat maps emerged.

TaDRIM-seq interactive network construction

   To show the RNA-DNA interactions in the whole genome, the Fruchterman
   Reingold pattern in Gephi^[188]44 software was used to map the DNA-RNA
   interactions on each chromosome. Because the number of RNA-DNA
   interactions on each chromosome was too large, 2000 DNA-RNA
   interactions were randomly selected from each chromosome as the dataset
   to draw the interaction network to make the network diagram more
   intuitive, and different information in the network were distinguished
   by the size and color of nodes.

Processing of RNA-RNA interaction in TaDRIM-seq

   In order to obtain more accurate and comprehensive RNA-RNA interaction
   information, we modified the reference genome (GRCh38 and MH63RS1) as
   follows: first, low complexity region and simple repeat region
   sequences were masked using bedtools maskfasta^[189]45; then, we used
   Rfam^[190]46 and infernal^[191]47 to annotate ncRNAs on the genome and
   selected the most highly expressed sequences of each type of ncRNA
   based on the Total RNA-seq data as representatives; these sequences
   were then merged into the reference genome that had been masked for the
   corresponding sequences on the chromosome. Likewise, 45S rDNA sequences
   on chromosomes were masked and added separately to the reference
   genome. Next, mitochondrial information was added to the reference
   genome. Finally, the Y chromosome and scaffold information were removed
   from the GRCh38 reference genome, and chloroplast information was added
   to the MH63RS1 reference genome. The reference genome constructed after
   the above steps was used for RNA-RNA interaction information analysis.
   The acquisition of RNA-RNA interaction information involved three main
   steps: linker identification, sequence alignment, and cluster
   identification. First, the sequencing reads were merged using
   FLASH^[192]48 to obtain full-length sequences for fragments shorter
   than or equal to 290 bp. Then, use cutadapt (-b file: $Linker_file -e
   0.2 --no-indels --discard -O 34) to identify the RNA-RNA bridge linker
   for merged (≤290 bp) and unmerged (>290 bp) sequences and the RNA-RNA
   bridge linker is:

   A-CCTCGATACTTGAGCTGACGTCAGCTCAAGTATCGAGG

   B-CCTCGATACTTGAGCTGACGTCAGCTCAAGTATCGAGG; For each sequencing record,
   if there was only one linker recognized and the sequencing length of
   both ends were greater than 12 bp, then the sequences at both ends of
   the corresponding linker were extracted, de-weighted by the custom
   script and aligned to the modified reference genome described above
   using hisat2^[193]49; Only Sequencing record with high-quality(Q > 20)
   and uniquely mapped pair-end-tags were retained, and final
   Inter-Molecular interaction information were extracted after
   interactions distance seperation; Finally, we use DG algorithm^[194]50
   to identify high-confidence interactions and Monte Carlo
   simulation^[195]51 to detect significant contacts as Cai et al.
   described reviously^[196]17.

Prediction of RNA secondary structure

   The prediction of RNA secondary structure mainly uses RNAfold
   software^[197]52,[198]53, and the input file is the sequence
   information of RNA in FASTA format. RNA circular secondary structure
   can be visualized using the Java package VARNAv3-93. jar^[199]54, and
   the input file is the RNA Vienna format file, which can be converted by
   sequence information in fasta format.

Analysis of differences in gene interaction intensity

   The difference in gene interaction intensity was analyzed using
   DESeq2^[200]55. First, the expression intensity of genes in different
   samples was calculated, and then DESeq2 was used to normalize the
   quantified expression intensity and analyze the difference between
   different samples^[201]56. In the screening results, genes whose log2
   (FoldChange) was greater than 1.5 or less than −1.5, and whose
   corrected padj value was less than 0.05, were considered to have
   differences in interaction intensity between the two groups. The
   difference results were then visualized using the differential volcano
   map in the ggplot2 package^[202]57 of the R software.

Prediction of lncRNAs

   The main steps for lncRNA identification are as follows: Each
   experimental transcripome sample data was assembled separately with
   StringTie (version 2.1.4)^[203]39. Because the experimental data of
   this study are chain specific, the parameter “-rf” should be added
   during the assembly process. Then StringTie merge is used to merge all
   sample transcripts into a complete transcript; Transcripts classified
   as “u”, “j”, “o”, “x”, “i” were screened using gffcompare^[204]58;
   Transcript length greater than 200 bp; To exclude transcripts with
   Coding Potential, Coding Potential Calculator (CPC) was
   used^[205]59,[206]60 and coding-non-coding Index (CNCI)^[207]61were
   used to calculate transcriptional Coding potential by removing
   transcripts with CPC > 0 and CNCI > 0^[208]60,[209]62; To assess
   whether the remaining transcripts contain any known protein domains,
   HMMER-3^[210]63 was used to identify transcripts translated in all
   possible frameworks.Transcripts matching Pfam hit (Evalue < 1*10−5) are
   excluded, and the remaining transcripts are retained lncRNAs, which are
   potential lncRNAs that will be used for TaDRIM-Seq and ChRD-PET data
   analysis.

KEGG path analysis

   The plant GSEA (The Plant GeneSet Enrichment Analysis Toolkit) is an
   online site
   ([211]http://structuralbiology.cau.edu.cn/PlantGSEA/analysis.php), and
   the KEGG/GO enrichment modules are used in the KEGG pathway enrichment
   analysis of target genes. In the result file, the FDR (corrected
   p-value) of each path entry must be less than 0.05, to serve as the
   significance threshold, and the smaller the FDR value, the higher the
   significance^[212]64.

Statistics & reproducibility

   No statistical method was used to predetermine sample size. No data
   were excluded from the analyses. The experiments were not randomized.
   The investigators were not blinded to allocation during experiments and
   outcome assessment.

Reporting summary

   Further information on research design is available in the [213]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [214]Supplementary Information^ (23.8MB, pdf)
   [215]41467_2024_53534_MOESM2_ESM.docx^ (12.3KB, docx)

   Description of Additional Supplementary Information
   [216]Supplementary Data 1^ (20.3KB, xlsx)
   [217]Reporting Summary^ (88.2KB, pdf)

Source data

   [218]Source Data^ (3.4MB, pdf)
   [219]Transparent Peer Review file^ (31.6MB, zip)

Acknowledgements