Abstract Background Existing studies have demonstrated that the integration analysis of transcriptomic and epigenomic data can be used to better understand the onset and progression of many diseases, as well as identify new diagnostic and prognostic biomarkers. However, such investigations on large-scale sequencing data remain challenging for researchers or clinicians with limited bioinformatics knowledge. Results To facilitate the interpretation of the gene regulatory landscape, we developed an R Shiny application [Linking of atac-seq to gene expression data (Linkage)] for exploring and visualizing potential cis-regulatory elements (CREs) of genes based on ATAC-seq and RNA-seq data. Linkage offers six modules to identify, annotate, and interpret potential gene regulatory elements from the whole genome step by step. Linkage currently supports the analysis of human and mouse datasets. Linkage can provide interactive visualization for the correlation between chromatin accessibility and gene expression. More than that, Linkage identifies transcription factors (TFs) that potentially drive the chromatin changes through identifying TF binding motifs within the CREs and constructing trans-regulatory networks of the target gene set. Conclusions This powerful tool enables researchers to conduct extensive multi-omics integration analysis and generate visually appealing visualizations that effectively highlight the relationship between genes and corresponding regulatory elements. With Linkage, users can obtain publishable results and gain deeper insights into the gene regulatory landscape. Availability and implementation ‘Linkage’ is freely available as a Shiny web application [40]https://xulabgdpu.org.cn/linkage. The source code and instructions for Linkage can be accessed via Github [41]https://github.com/luoyyyy/Linkage. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-025-12115-6. Keywords: Multi-omics, ATAC-seq, RNA-seq, R-shiny, Gene regulatory elements, Transcription factors Background With the rapid advancement of high-throughput sequencing technologies and bioinformatics, increasing attention has been paid to the critical regulatory roles of genomic non-coding regions [[42]1]. These regions, including promoters and enhancers, often harbor transcription factor binding sites (TFBS) that directly influence gene expression. As an important feature of the epigenome, chromatin accessibility reflects the degree to which DNA is exposed and accessible to regulatory proteins such as transcription factors, and has thus become a key focus in gene regulation studies [[43]2]. Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) is a high-throughput technology that employs an engineered Tn5 transposase to map and quantify genome-wide chromatin accessibility [[44]3]. Higher chromatin accessibility means more active DNA-protein binding events that happen at these regions for bulk ATAC-seq data analysis. On the other hand, bulk mRNA sequencing (RNA-seq) is the most common application to estimate and quantify gene expression level, which has broad utilities in differential expression analysis, cancer classification, disease diagnosis, and optimizing therapeutic treatment [[45]4, [46]5]. To date, many studies have performed bioinformatics analysis to characterize gene regulatory landscapes by integrating ATAC-seq and RNA-seq data [[47]6, [48]7]. However, this kind of work remains very challenging for scientists without enough experience in high-throughput sequencing data analysis. The emergence of web applications has alleviated this issue by providing user-friendly interfaces for bioinformatics analysis. Nevertheless, the existing web tools only support analyzing datasets from public domains, such as The Cancer Genome Atlas (TCGA) and The Encyclopedia of DNA Elements (ENCODE), which is not enough for researchers who intend to analyze their own datasets [[49]8, [50]9]. Here, we introduce Linkage, a user-friendly interactive R Shiny web-based application for exploring and visualizing potential gene regulatory elements based on ATAC-seq and RNA-seq data. Linkage allows users to upload customized data or re-analyze public datasets. The main feature of Linkage is to identify potential gene regulatory elements for the whole genome by performing a correlation analysis between matched chromatin accessibility and gene expression across multiple samples. Additional modules are developed to allow users to perform more comprehensive and biologically meaningful analysis for the links between ATAC-seq peaks and target genes. Analysis and visualization results are returned to the web page and can be downloaded in PDF, PNG, JPEG, CSV, and TXT formats. The workflow and typical output schema are shown in Fig. [51]1. Fig. 1. [52]Fig. 1 [53]Open in a new tab Overview of functionality and workflows with Linkage. A The home module of Linkage. B The regulatory peaks search module of Linkage. C The regulatory peaks visualization module. D The regulatory peaks annotation module of Linkage. E The motif analysis module of Linkage. F The regulatory networks module of Linkage. G The pathway enrichment module of Linkage Implementation Home module The Home Module contains the brief instructions and Import Data panel of Linkage (Fig. [54]1A). Linkage provides two public multi-omics datasets, i.e., the TCGA Breast Cancer cohort dataset and the [55]GSE121589 dataset, for users to re-analyze them. The TCGA Breast Cancer cohort is a human dataset that contains bulk RNA-Seq and ATAC-Seq data from 72 patients with breast cancer [[56]10]. The [57]GSE121589 is a mouse dataset that constitutes bulk RNA-Seq and ATAC-Seq data from 33 mice Muscle Stem Cells (MuSCs) [[58]11]. Apart from the provided public datasets, Linkage also enables users to upload and explore their own data. Only two tab-delimited text/CSV input files (a chromatin accessibility matrix and a gene expression matrix) are required before running Linkage. The gene expression matrix file is a tab-delimited multi-column data matrix, where the first column represents gene symbols, and the following columns represent normalized or raw expression levels of genes for each sample. The chromatin accessibility matrix file is also a tab-delimited multi-column data matrix, where the first three columns represent the names of chromosomes, the start, and the end coordinates on the chromosomes of the peaks respectively, and the remaining columns represent normalized or raw chromatin accessibility levels of peaks for each sample. For reference genome annotation, Linkage has preloaded GENCODE Release 43 (GRCh38.p13) for human and Release M37 (GRCm39) for mouse as default reference files in the web application [[59]12]. Users are advised to provide normalized values (e.g., TPM, CPM, or log-transformed counts) and at least 10 matched RNA-seq and ATAC-seq samples to ensure reliable integrative analysis. Each element of ready-to-analysis files will appear in the Chromatin Accessibility Matrix panel and Gene Expression Matrix panel. Regulatory peaks search module The Regulatory Peaks Search Module is the key module of Linkage that allows users to detect all potential regulatory DNA regions for a specific gene (Fig. [60]1B). In this study, we define a regulatory peak as an ATAC-seq peak (i.e., a chromatin-accessible region) that shows a statistically significant correlation with the expression level of a nearby gene across samples. Such correlation implies potential cis-regulatory activity, and these peaks are considered to harbor candidate cis-regulatory elements (CREs). Linkage adopted the same procedure as in TCGA [[61]10] to associate regulatory regions with the predicted target genes. Specifically, given an input gene and search scale, Linkage automatically performs standard correlation analysis between the quantitative expression level of the input gene and each quantitative chromatin accessibility measure in the region across all samples (Fig. [62]2A). To accommodate input genes from various annotation sources, Linkage supports three widely used gene ID types: Ensembl ID, gene symbol, and Entrez ID. Fig. 2. [63]Fig. 2 [64]Open in a new tab Example of using Linkage to conduct key regulatory elements recognition analysis. A the scatter plot of quantitative chromatin accessibility and gene expression; B the statistically significant regulatory peaks; C the normalized ATAC-seq peak signal representing chromatin accessibility; D the expression boxplot of target genes; E the upsetplot of regulatory peaks annotation; F the Motif scanning table; G the interactive GRNs; H the GO/KEGG Enrichment Table Users can adjust the search scale and correlation analysis algorithm (Spearman/Pearson/Kendall). The default correlation method is Spearman correlation, which is robust to outliers and does not assume linearity between chromatin accessibility and gene expression. Since promoter capture Hi-C (PCHi-C) data suggested that >75% of three-dimensional (3D) promoter-based interactions occur within a 500 kilobase pairs (kbp) distance [[65]13], we set the default search scale to 500 kbp upstream or downstream of the input gene’s TSS as potential regulatory regions. This default search scale can avoid spurious predictions. To enable users to explore interactions between distal ATAC-seq peaks and genes, Linkage supports the maximum search scale up to 1 million base pairs (Mbp). For multiple hypothesis testing correction, we adopt a default false discovery rate (FDR) threshold of 0.01, which balances sensitivity and specificity. Then, all the statistically significant results are listed in the Potential Cis-regulatory Regions panel (Fig. [66]2B). By clicking on a specific entry of this panel, users can view the scatter plot of quantitative chromatin accessibility and gene expression from the Correlation Plot panel. The corresponding rho and FDR for the correlation analysis will also be shown on the scatter plot. A positive correlation indicates that more accessible of open chromatin regions appear to have higher expression level of genes. Such finding suggests that the CREs in such regions could potentially activate the expression of target genes. On the contrary, a clear negative correlation suggests that the target gene associated with the chromatin accessibility regions might be repressed by the CREs in these regions. Regulatory peaks visualization module The Regulatory Peaks Visualization Module allows users to visualize the chromatin accessibility signals around a specific regulatory peak, as well as the corresponding quantitative expression of the target gene of this regulatory peak (Fig. [67]1C). Users initially select a regulatory peak that is obtained from the Regulatory Peaks Search Module. Linkage then categorizes samples into five groups based on the quantitative chromatin accessibility of the specific regulatory peak, ranging from low to high for each sample. Specifically, these five groups are defined using quantile-based bins of the ATAC-seq peak signal across samples (0–20%, 20–40%, 40–60%, 60–80%, and 80–100%), which facilitates visualization of chromatin accessibility variation across samples. The coverage track of mapped ATAC-seq reads (Fig. [68]2C) and the expression boxplot of the target gene for each group will be shown simultaneously (Fig. [69]2D). Regulatory peaks annotation module The Regulatory Peaks Annotation Module allows users to visualize the annotation of the predicted regulatory peaks for genes that are given in the previous module (Fig. [70]1D). Once users click ‘Annotate Regulatory Peaks’, Linkage performs genomic location-based annotation of the identified peaks using the corresponding function from the ChIPseeker package [[71]14]. The annotation includes promoter, 5′ UTR, 3′ UTR, exon, intron, downstream, and distal intergenic regions. Promoters are defined as ± 3 kb around the transcription start site (TSS). Distal intergenic regions refer to genomic regions located outside of annotated gene bodies and downstream zones. These definitions follow the default settings of ChIPseeker and are based on gene models provided by GENCODE [[72]12]. This standardization ensures consistent and biologically meaningful annotation across datasets. To effectively visualize the overlaps and distribution in the annotation for peaks, Linkage also produces the upsetplot that is adopted from the ChIPseeker package (Fig. [73]2E). Cis-Regulatory elements detection module After identifying potential regulatory peaks, researchers can further explore which specific CREs are activated with Linkage (Fig. [74]1E). CREs and the TFs that bind on them play a central role in gene transcription regulation, which can be detected by ATAC-seq data. The Cis-Regulatory Elements Detection Module supports users to visualize the enriched TFBS within potential regulatory peaks. Motif annotation is performed using the motifmatchr package [[75]15] based on position weight matrices (PWMs) from the JASPAR 2022 database [[76]16]. This module identifies individual TFBSs within each peak, rather than performing a statistical enrichment analysis. By clicking on a specific regulatory peak, users can view the location and binding score information of each enriched TFBS of this DNA region from the Motif Scanning Table (Fig. [77]2F). Once users select one TFBS of this table, the corresponding sequence logo of this CRE will appear in the Sequence-logo Plot panel. In addition, to enhance interpretability, we added a new column to the Motif Scanning Table displaying the average expression level of each enriched TF across all samples. This improvement provides insight into whether the enriched TFs are also transcriptionally active in the dataset. Gene regulatory network module The interplay between CREs and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs) (Fig. [78]1F). Studies of GRNs are crucial for understanding how cellular identity is established, maintained, and disrupted in disease. The Gene Regulatory Network Module helps users to visualize GRNs whose nodes are represented by genes and their corresponding CREs that were inferred from previous analysis of Linkage. First, users input a list of genes and TFs of interest which was obtained from the previous analysis. Then users can customize a series of parameters in association with building the GRN, including types of gene symbols, calculation methods, and thresholds of interactions between the nodes (edges of the GRNs). Once users click the ‘Build the GRN’ button, Linkage will perform a standard correlation analysis between the expression levels of the selected genes and the expression levels of transcription factors whose binding motifs are detected in the previously linked ATAC-seq peaks. The significant calculation results of correlation analysis are shown in the Gene-TF Table panel. Meanwhile, Linkage produces the corresponding informatic and interactive GRNs that are adopted from the visNetwork package [[79]17] (Fig. [80]2G). Users can further change network layouts, select subnetworks, and save the GRNs as spreadsheets with interaction scores or plots. Furthermore, Linkage allows users to upload the previously saved Gene-TF Table file to rebuild the GRN without re-running the correlation analysis, which is especially useful for large gene sets where the original computation may be time-consuming. This feature enables faster re-analysis and visualization by skipping the repeated computation of TF–gene interaction scores. Pathway enrichment analysis module The Pathway Enrichment Analysis Module supports users to visualize tabular and graphical pathway enrichment results of the genes/TFs of interest that were previously produced from other modules of Linkage (Fig. [81]1G). The pathway enrichment analysis can link them with underlying molecular pathways and functional categories such as gene ontology (GO) [[82]18] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [[83]19]. Within this module, users can input a list of interested genes/TFs and set four key parameters (i.e., adjusted p-value cutoff, q-value cutoff, minimal size of annotated genes for testing, and maximal size of annotated genes for testing) for pathway enrichment analysis following the implementation of the clusterProfiler package [[84]20]. Then once users click the ‘Build Pathway Enrichment Analysis’ button, Linkage automatically performs GO and KEGG enrichment analysis. The corresponding enrichment categories will be returned in the GO/KEGG Enrichment Table. Finally, Linkage implements several different visualization methods adopted from the enrichplot package [[85]21] to interpret the functional results in the GO/KEGG Enrichment Table from multiple perspectives (Fig. [86]2H). Results and discussions Case study To better show the functionalities of Linkage, we collect multi-omics profiles of human breast cancer and mice MuSCs for experiments. For the human breast cancer dataset, we collect 72 samples with matched chromatin accessibility and gene expression data. For each detected expressed gene, we asked which regulatory regions might have contributed to the expression of this gene in breast cancer patients. To address this question, we implemented standard correlation analysis between the quantitative chromatin accessibility measure and the gene expression level across all samples. The Spearman correlation coefficient, r, was used to evaluate whether the potential regulatory regions were significantly correlated with the gene expression. A total of 60,483 expressed genes were analyzed. All peaks whose summits were located within 500 kbp from a gene’s TSS were considered by default. A conservative FDR cutoff of 0.01 was used to avoid false positives. This analysis identified a total of 489,132 regulatory regions for 18,104 genes and a complete list of these regulatory regions can be found in Supplementary Table [87]S1. Next, we further explored whether Linkage could provide insights into the regulatory networks of breast cancer-related genes. Literature evidence indicates that PGF, CTSB, and EDN1 play important roles in breast cancer progression [[88]22–[89]24]. Using the publicly available datasets collected for this study, we examined the associations of these genes with transcription factors in the Gene Regulatory Network module and observed distinct patterns: CTSB is linked to 27 TFs, EDN1 to 3 TFs, whereas PGF did not show highly correlated TFs in our example dataset. In the Pathway Enrichment module, all three genes are enriched in multiple cancer-relevant pathways, consistent with previous reports [[90]22–[91]24], including angiogenesis (PGF), extracellular matrix remodeling (CTSB), and vascular homeostasis (EDN1). Due to the limited sample size and lack of experimental validation, we did not investigate the underlying mechanisms in depth. Nevertheless, this case study demonstrates that Linkage can uncover biologically meaningful CRE–TF–gene associations and pathway-level insights, while leaving broader exploration to the end-user. We also collect matched chromatin accessibility and gene expression data of 33 mice MuSCs. A total of 1475 regulatory regions for 910 genes are identified with the same protocol and a complete list of these regulatory regions can be found in Supplementary Table [92]S2. Notably, the number of significant CRE-gene associations identified in the human dataset was substantially higher than in the mouse dataset. This difference primarily stems from the larger sample size and higher inter-sample variability in the human cohort, which increases the statistical power of correlation-based detection. In contrast, the mouse dataset is relatively homogeneous, both in sample type and biological context, thereby limiting the number of detectable associations. Taken together, these results demonstrate that the Linkage application can identify gene expression-associated regulatory regions from multi-omics data, which could potentially implicate regulatory elements responsible for cancer development. Comparison of linkage with other tools Several Shiny applications, such as SPACE and genomeSidekick, have been published in the arena of multi-omics data integration analyses [[93]8, [94]25]. We qualitatively compared the ability of Linkage with the previously published applications in identifying and interpreting potential gene regulatory elements from multi-omics data (Fig. [95]3). SPACE allows users to analyze and visualize tens of thousands accessible DNA elements from multiple cancer types, but only the data from predefined cohorts is available for analysis. Thus, users cannot upload and analyze their customized datasets with SPACE. The genomeSidekick software serves the purpose for investigation of ATAC-seq and RNA-seq data. However, the primary concentration of genomeSidekick remains confined to the foundational data analysis stages. They tend to have limited capabilities for carrying out downstream advanced analyses, such as linking regulatory regions to gene expression, scanning the potential CREs, constructing regulatory networks, and so on. To the best of our knowledge, Linkage is currently the only Shiny application that is specifically designed for predicting gene regulatory elements from ATAC-seq and RNA-seq data. To improve user experience, we also provided local installation instructions in the GitHub repository to facilitate smoother usage. Fig. 3. [96]Fig. 3 [97]Open in a new tab Comparison of supported features from currently available multi-omics data integration web-servers. “NA” indicates that the feature is not applicable to the tool; “Check mark” indicates that the feature is supported by the tool We note that Linkage requires preprocessed input, which includes expression and accessibility matrices derived from upstream steps such as alignment, quantification, peak calling, and normalization. This design trades full automation for flexibility and clarity. Users can leverage established pipelines for these steps, such as Galaxy [[98]26] for end‑to‑end GUI workflows, UTAP2 [[99]27] for combined transcriptome and epigenome processing. By relying on these robust upstream pipelines, Linkage remains focused on integrative correlation analysis and visualization with cleanly defined inputs. Conclusions Linkage is a web-based Shiny application that can utilize large-scale multi-omics datasets for the identification and prioritization of DNA regulatory elements from whole genomic regions. Linkage helps users perform standard correlation analysis between predefined chromatin accessibility peaks and gene expression profiles across multiple samples to identify potential regulatory relationships. The visualization and annotation modules provide user-friendly graphical interfaces to visualize the ATAC-seq genome tracks and genomics location features of a given regulatory peak. The motif module helps assess the probability of potential DNA regulatory elements within individual regulatory peaks. The Linkage web application also supports users to explore the GRNs and molecular pathways composed of DNA regulatory elements and target genes that were collected by the previous analyses. In addition to ATAC-seq, Linkage is also compatible with other chromatin accessibility data types such as DNase-seq, ChIP-seq, and CUT&Tag, provided that a properly formatted peak matrix is supplied. This flexibility enables broader applicability of the tool to diverse epigenomic modalities. Thus, the Linkage web application provides an accessible and interactive platform for integrative analysis, even for users without extensive programming expertise or prior experience in multi-omics data integration. In conclusion, with the increasing popularity of higher-resolution sequencing strategies such as single-cell technologies, integrative multi-omics data analysis will undoubtedly receive more and more attention. We will keep on developing and upgrading the Linkage network resource to benefit the omics data integration research community. Availability and requirements Project name: Linkage. Project home page: [100]https://github.com/luoyyyy/Linkage. Operating system(s): Platform independent. Programming language: R. Other requirements: None. License: MIT. Any restrictions to use by non-academics: None. Supplementary Information [101]Supplementary material 1.^ (3.5MB, csv) [102]Supplementary material 2.^ (26.5MB, csv) [103]Supplementary material 3.^ (13.9KB, docx) Acknowledgements