How to Combine Hi-C, ATAC-Seq, and RNA-Seq to Study Gene Regulation
Summary
Integrating Hi-C, ATAC-seq, and RNA-seq provides three complementary layers of regulatory evidence (3D chromatin architecture, chromatin accessibility, and transcriptional output) that no single assay can supply alone. This guide explains what each method contributes to a gene regulation study, when a two-assay combination is sufficient versus when all three are needed, how to design matched-sample experiments for reliable integration, and what analytical outputs to expect from the combined dataset. Scenario-based use cases cover cancer regulatory biology, developmental epigenomics, and plant stress response.
Key Takeaways
- Hi-C maps genome-wide chromatin contacts including TADs, A/B compartments, and enhancer-promoter loops; ATAC-seq identifies which regulatory elements are accessible; RNA-seq measures transcriptional output — the three layers are hierarchically linked and answer different regulatory questions
- Two-assay combinations (ATAC-seq + RNA-seq, or Hi-C + RNA-seq) are sufficient for most research questions; adding a third assay is justified when the specific question requires all three evidence layers simultaneously
- The most informative use of Hi-C in a three-assay design is to assign distal regulatory elements to target genes via chromatin loops, a connection that proximity-based ATAC-seq + RNA-seq analysis cannot make reliably, especially for elements hundreds of kilobases from their targets
- TAD boundary disruption is a documented mechanism of oncogene activation: in IDH-mutant gliomas, hypermethylation at CTCF binding sites disrupts a domain boundary near the PDGFRA oncogene, allowing an adjacent enhancer to aberrantly contact and activate it (Flavahan et al., Nature, 2016); recognizing this mechanism depends on the domain boundaries that Hi-C maps
- Matched-sample design — all three assays performed on the same biological sample — is required for meaningful correlation analysis; cross-experiment integration confounds regulatory signal with batch effects
- Analysis proceeds in two steps: first establish ATAC-seq and RNA-seq concordance, then add Hi-C loop context to assign distal elements to target genes
What Each Assay Contributes: A Three-Layer Regulatory Map
Before deciding which combination of assays to run, it helps to be precise about what each method actually measures — not in general terms, but in terms of what specific regulatory question it can and cannot answer.
Figure 1. Hi-C, ATAC-seq, and RNA-seq provide three hierarchically linked regulatory layers; the integration output is a set of enhancer–promoter pairs with supporting accessibility and expression evidence.
| Assay | What It Measures | Primary Output | What It Cannot Tell You |
|---|---|---|---|
| Hi-C | Genome-wide chromatin contacts (all pairs of loci) | TADs, A/B compartments, loops, insulation scores | Whether a contacted element is transcriptionally active or accessible |
| ATAC-seq | Open chromatin regions (Tn5-accessible loci) | Peak set of accessible elements; nucleosome-free regions (NFRs); TF footprints | Which open elements are in physical contact with a gene promoter |
| RNA-seq | Steady-state transcript levels | DEGs, expression values, pathway enrichment | What chromatin changes drive expression differences |
The regulatory logic connecting these three layers is usually read in one direction: chromatin architecture (Hi-C) establishes the spatial framework in which regulatory elements (ATAC-seq peaks) physically contact gene promoters to drive transcription (RNA-seq). A complete mechanistic interpretation requires all three layers; however, as described below, many research questions can be answered with just two.
For a technical overview of Hi-C technology principles, including crosslinking, ligation, and contact map generation, see our Hi-C introduction resource. For a detailed overview of how ATAC-seq works, including the Tn5 transposase mechanism and open chromatin interpretation, see our ATAC-seq resource.
Why No Single Assay Is Sufficient for Regulatory Mechanism Studies
Each assay has a structural blind spot that the other two can fill.
Hi-C at standard resolution identifies that two genomic loci are in spatial proximity — a loop between an enhancer and a promoter, or the boundaries of a topologically associating domain (TAD). It does not reveal whether the enhancer in that loop is in an open, accessible chromatin state, nor whether the target gene is actively transcribed. A loop identified by Hi-C may be constitutive and functionally inert, or condition-specific and functionally critical — Hi-C alone cannot distinguish these.
ATAC-seq identifies the accessible chromatin regions across the profiled cell population: active promoters, open enhancers, and other nucleosome-depleted regulatory elements. But it does not identify which of those accessible elements physically contacts which gene promoter. Proximity-based assignment (attributing each accessible peak to the nearest TSS) is computationally convenient but biologically inaccurate for distal regulatory elements that bypass intervening genes through three-dimensional looping. A distal enhancer 800 kb from its target gene, with five protein-coding genes in between, will be misassigned by proximity logic.
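The misassignment is easy to reproduce with a minimal nearest-TSS rule. The sketch below is a simplified illustration, assuming a single chromosome, integer TSS coordinates, and hypothetical gene names (GENE_A to GENE_E, TARGET); real pipelines read TSS positions from a GTF annotation and handle every chromosome.

```python
from bisect import bisect_left


def nearest_tss(peak_center, tss_sorted):
    """Proximity logic: return (gene, distance) for the TSS closest to peak_center.

    tss_sorted is a list of (position, gene) tuples on one chromosome,
    sorted by position. Raises ValueError if the list is empty.
    """
    if not tss_sorted:
        raise ValueError("no TSS positions supplied")
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, peak_center)
    # The nearest TSS is either the first one at/after the peak or the one just before it.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    pos, gene = min(candidates, key=lambda c: abs(c[0] - peak_center))
    return gene, abs(pos - peak_center)


# Hypothetical locus: an enhancer at 1,000,000 bp whose looped target (TARGET)
# lies 800 kb away, with five intervening genes in between.
tss = [
    (1_100_000, "GENE_A"), (1_250_000, "GENE_B"), (1_400_000, "GENE_C"),
    (1_550_000, "GENE_D"), (1_700_000, "GENE_E"), (1_800_000, "TARGET"),
]
print(nearest_tss(1_000_000, tss))  # ('GENE_A', 100000)
```

Proximity assigns the enhancer to GENE_A, 100 kb away; only contact data showing a loop between the enhancer and the TARGET promoter would reveal the true target.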
RNA-seq provides the biological readout — which genes are differentially expressed between conditions — but it is the output of regulation, not the mechanism. A set of DEGs is the starting point for regulatory discovery, not the answer.
When to Run Two Assays vs All Three
Most epigenomics projects do not require all three assays simultaneously. Adding Hi-C to an ATAC-seq + RNA-seq design increases cost, complexity, and sample input requirements substantially — it should be justified by the specific research question, not added by default.
Figure 2. Most regulatory discovery projects are well-served by ATAC-seq + RNA-seq alone; Hi-C adds unique value when 3D loop context or TAD boundary interpretation is required.
ATAC-Seq + RNA-Seq: Sufficient for Most Regulatory Discovery Projects
The combination of ATAC-seq and RNA-seq answers the most common regulatory discovery question: which chromatin changes are associated with expression changes between conditions? The integration framework, which matches differential ATAC-seq peaks to differentially expressed genes (DEGs) from RNA-seq, identifies concordant loci where chromatin accessibility and gene expression change in the same direction, yielding high-confidence regulatory candidates.
Multi-omics association between ATAC-seq and RNA-seq operates at two levels: a global association that captures overall trends, and a gene-level association that prioritizes key genes and functions for mechanism discovery. A typical integrated pipeline combines correlation analysis, Venn overlap, four-quadrant classification, locus tracks, and pathway enrichment to link chromatin openness to transcriptional output.
This two-assay approach is well-suited for: identifying transcription factors whose motifs are enriched in differentially accessible peaks; mapping promoter-proximal regulatory changes associated with DEGs; generating a ranked shortlist of candidate regulatory elements for follow-up validation. Our ATAC-seq and RNA-seq integration service delivers this analysis as an end-to-end pipeline with matched-sample design and ranked output tables.
When Hi-C Adds Unique Value
Hi-C becomes necessary when the research question explicitly involves three-dimensional regulatory logic that proximity-based analysis cannot resolve:
Distal enhancer–gene assignment: When a relevant regulatory element lies far from its target gene (hundreds of kilobases is a common situation for developmental and tissue-specific enhancers), proximity-based ATAC + RNA-seq assignment is likely to misidentify the target. Hi-C contact maps resolve the actual looping connection. A multi-assay integration study in macrophages (Hi-C + ATAC-seq + H3K27ac ChIP-seq + RNA-seq) identified 5,039 enhancer-promoter loops by intersecting ATAC and H3K27ac peaks with Hi-C contact data, a regulatory network that proximity analysis alone would have substantially distorted.
TAD boundary disruption in cancer: When structural or epigenetic alterations disrupt chromatin insulation, placing genes under the influence of regulatory elements that a TAD boundary normally separates from them, Hi-C is the standard genome-wide method for mapping the disruption directly. In IDH-mutant gliomas, hypermethylation at CTCF binding sites disrupts a constitutive insulator near the PDGFRA locus, allowing an enhancer normally associated with the housekeeping gene FIP1L1 to aberrantly contact and activate the PDGFRA oncogene (Flavahan et al., Nature, 2016). Identifying this mechanism required combining chromatin contact information with methylation and expression data.
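Boundary disruption is usually quantified with an insulation score: for each position along the diagonal, average the contacts that cross it within a fixed window. The sketch below is a simplified, unnormalized variant on a plain list-of-lists matrix with hypothetical values; tools such as cooltools compute normalized insulation scores and boundary strengths on real binned matrices.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency crossing the gap between bin i and bin i+1.

    Averages matrix[a][b] for a in the `window` bins ending at i and b in the
    `window` bins starting at i+1. Low values mean few contacts cross the gap
    (strong insulation). Positions whose window runs off the matrix get None.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window + 1 < 0 or i + window > n - 1:
            scores.append(None)
            continue
        upstream = range(i - window + 1, i + 1)
        downstream = range(i + 1, i + window + 1)
        total = sum(matrix[a][b] for a in upstream for b in downstream)
        scores.append(total / (window * window))
    return scores


def find_boundaries(scores):
    """Indices whose score is a strict local minimum (candidate TAD boundaries)."""
    hits = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid < left and mid < right:
            hits.append(i)
    return hits


def two_domain_matrix(n, split, within, between):
    """Toy symmetric matrix: bins < split form one domain, the rest another."""
    return [[within if (a < split) == (b < split) else between for b in range(n)]
            for a in range(n)]


intact = two_domain_matrix(8, 4, within=10.0, between=1.0)
weakened = two_domain_matrix(8, 4, within=10.0, between=6.0)  # hypothetical loss of insulation
print(find_boundaries(insulation_scores(intact, 2)))  # [3]
print(insulation_scores(intact, 2)[3], insulation_scores(weakened, 2)[3])  # 1.0 6.0
```

Comparing the score at the same position between conditions is the basic readout: a boundary weakened by lost CTCF binding lets more contacts cross it, so its score rises.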
Compartment switching: Changes in A/B compartment identity — regions transitioning from inactive B to active A compartment — reflect large-scale chromatin reorganization that ATAC-seq and RNA-seq capture only partially. Hi-C provides the compartment-level context that gives these changes spatial meaning. Our Hi-C sequencing service supports full genome-wide contact mapping with TAD, compartment, and loop analysis.
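A/B compartments are conventionally called from the leading eigenvector of the correlation matrix of distance-normalized (observed/expected) contacts, with the sign oriented by a track such as gene density. The sketch below follows that recipe in plain Python on a hypothetical 12-bin matrix; the power-iteration eigensolver, the start vector, and the gene-density values are simplifying assumptions, and real analyses run tools such as cooltools on balanced matrices.

```python
import math


def observed_over_expected(matrix):
    """Divide each contact by the mean contact frequency at its diagonal distance."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [[matrix[i][j] / expected[abs(i - j)] for j in range(n)] for i in range(n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither vector is constant


def compartment_eigenvector(matrix, track, iterations=200):
    """Leading eigenvector of the O/E correlation matrix, oriented so that it
    correlates positively with `track` (positive values = A compartment)."""
    oe = observed_over_expected(matrix)
    corr = [[pearson(row_i, row_j) for row_j in oe] for row_i in oe]
    n = len(corr)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):  # power iteration
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if pearson(v, track) < 0:  # eigenvector sign is arbitrary; orient with the track
        v = [-x for x in v]
    return v


# Hypothetical 12-bin region: same-compartment contacts are twice as frequent,
# and all contacts decay with genomic distance.
labels = "AAAABBBAABBB"
n = len(labels)
contacts = [[(2.0 if labels[i] == labels[j] else 1.0) * 100.0 / (abs(i - j) + 1)
             for j in range(n)] for i in range(n)]
gene_density = [5, 6, 5, 7, 1, 2, 1, 6, 5, 2, 1, 1]  # hypothetical, higher in A bins
ev = compartment_eigenvector(contacts, gene_density)
print("".join("A" if x > 0 else "B" for x in ev))  # AAAABBBAABBB
```

Running the same function on matched conditions and listing bins whose value changes from negative to positive flags candidate B-to-A switches.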
Scenario-Based Use Cases
Cancer Regulatory Biology: TAD Disruption and Enhancer Hijacking
The most clinically compelling use case for three-assay integration is structural regulatory disruption in cancer. Somatic mutations, copy number alterations, and epigenetic changes can all disrupt TAD boundaries — the insulation units that separate regulatory elements from genes in neighboring domains.
CTCF binds insulator sites that help partition the genome into TADs, insulating genes in one domain from activation by enhancers in other domains. DNA methylation can prevent CTCF binding, and loss of these insulator sites has been shown to increase expression of the PDGFRA oncogene, potentially by abrogating the TAD boundary and giving PDGFRA access to an active enhancer in a neighboring domain.
For cancer regulatory studies, the recommended assay combination depends on the question's specificity. For gene-level regulatory changes (promoter accessibility + expression), ATAC-seq + RNA-seq provides the most actionable output. When structural disruption — TAD boundary changes, enhancer hijacking, compartment switching — is suspected as the mechanism, Hi-C is required. For targeted investigation of known loci, Capture Hi-C provides loop detection at 1–2 kb resolution with high data efficiency, and integrates with ATAC-seq and CUT&Tag data for a richer regulatory network model. Our Capture Hi-C service supports targeted regulatory locus analysis in cancer and other disease settings.
Developmental Epigenomics: Chromatin Remodeling Precedes Transcription
In developmental biology, chromatin accessibility changes frequently precede the transcriptional changes they enable: enhancers open before their target genes are activated, a phenomenon called enhancer priming. ATAC-seq + RNA-seq at multiple developmental time points captures this temporal offset. In time-series ATAC-seq data combined with Hi-C interactions, promoter accessibility has been reported to remain stable across developmental transitions while enhancer accessibility increases toward the time of gene activation, so the enhancer side shows stronger and earlier accessibility changes than the promoter side.
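The temporal offset can be checked pair by pair once each enhancer has an assigned target gene: find the first time point at which accessibility changes and the first at which expression changes. The sketch assumes per-time-point log2 fold changes relative to the first time point and a fixed |log2FC| threshold of 1.0; both are illustrative choices, and the values below are hypothetical.

```python
def first_change(log2fc_series, threshold=1.0):
    """Index of the first time point whose |log2 fold change| reaches threshold, else None."""
    for t, value in enumerate(log2fc_series):
        if abs(value) >= threshold:
            return t
    return None


def is_primed(enhancer_lfc, gene_lfc, threshold=1.0):
    """True when the enhancer's accessibility changes at an earlier time point than
    its target gene's expression (the enhancer-priming pattern)."""
    t_enhancer = first_change(enhancer_lfc, threshold)
    t_gene = first_change(gene_lfc, threshold)
    return t_enhancer is not None and t_gene is not None and t_enhancer < t_gene


# Hypothetical log2 fold changes at four time points (relative to time point 0).
enhancer = [0.0, 1.4, 2.0, 2.2]
promoter = [0.0, 0.2, 0.1, 0.3]
gene = [0.0, 0.3, 1.2, 2.5]
print(is_primed(enhancer, gene))  # True: accessibility at t=1, expression at t=2
print(first_change(promoter))     # None: promoter accessibility stays stable
```

In practice each time point would be called with a proper differential test rather than a fixed fold-change cutoff; the ordering logic stays the same.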
Hi-C in developmental designs adds loop dynamics: which enhancer-promoter contacts are established de novo at each developmental stage, and which are dissolved. This is particularly valuable for understanding how lineage commitment reorganizes regulatory topologies rather than simply opening or closing individual elements.
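Loop gain and loss between stages reduces to comparing two loop sets while tolerating small shifts in anchor position. The sketch below represents each loop by its chromosome and two anchor midpoints, treats loops as the same when both anchors lie within a 10 kb tolerance, and uses hypothetical coordinates; the tolerance is an assumption that should reflect the resolution of the loop calls.

```python
from typing import NamedTuple


class Loop(NamedTuple):
    chrom: str
    anchor1: int  # midpoint of the upstream anchor (bp)
    anchor2: int  # midpoint of the downstream anchor (bp)


def same_loop(a, b, tolerance):
    return (a.chrom == b.chrom
            and abs(a.anchor1 - b.anchor1) <= tolerance
            and abs(a.anchor2 - b.anchor2) <= tolerance)


def compare_stages(early, late, tolerance=10_000):
    """Split loops into gained (late only), lost (early only), and retained."""
    gained = [loop for loop in late
              if not any(same_loop(loop, ref, tolerance) for ref in early)]
    lost = [ref for ref in early
            if not any(same_loop(ref, loop, tolerance) for loop in late)]
    retained = [loop for loop in late if loop not in gained]
    return gained, lost, retained


early = [Loop("chr1", 100_000, 350_000), Loop("chr1", 500_000, 900_000)]
late = [Loop("chr1", 105_000, 345_000), Loop("chr2", 200_000, 600_000)]
gained, lost, retained = compare_stages(early, late)
print(len(gained), len(lost), len(retained))  # 1 1 1
```

The pairwise comparison is quadratic in the number of loops, which is fine for a sketch; genome-wide loop sets are usually compared with BEDPE-aware interval tools.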
Plant Stress Response: Accessible Regulatory Elements Under Abiotic Stress
Plant epigenomics studies combining ATAC-seq and RNA-seq have shown that abiotic stress responses involve rapid chromatin remodeling at stress-responsive regulatory elements. In rice OsNMCP1 overexpression lines under drought conditions, combined ATAC-seq and RNA-seq analysis revealed chromatin accessibility patterns and downstream gene expression changes, clarifying how nuclear envelope proteins regulate drought- and root growth-related genes.
For plant stress studies, ATAC-seq + RNA-seq is typically sufficient to identify stress-responsive regulatory elements and their associated genes. Hi-C adds value when the research question involves long-range regulatory contacts — such as whether a stress-induced accessible element located far from a stress-responsive gene is physically connected to that gene's promoter — or when chromatin domain reorganization under stress is the subject of investigation.
Matched-Sample Design and Sample Planning
The analytical value of three-assay integration depends entirely on the sample design. Data from three assays run on three different batches of cells (even the same cell line, the same passage, and the same treatment) cannot be reliably integrated for correlation analysis. The chromatin state and transcriptional output captured by each assay reflect not only the biological condition but also the technical variation of that specific experiment. Multi-omics integration is only as reliable as the sample matching.
Split-Sample Strategy: Allocating One Biological Sample Across Three Assays
The recommended approach is to collect a single biological sample (one animal, one patient biopsy, one cell culture replicate) and allocate defined portions to each assay at the time of collection. For cell-based experiments:
- Hi-C: requires the largest cell input of the three methods — typically several million cells for standard in situ Hi-C; input scales with the required resolution and genome size
- ATAC-seq: can be performed at lower cell input than Hi-C, from thousands of cells in optimized protocols; standard bulk ATAC-seq typically requires 50,000–500,000 cells depending on cell type
- RNA-seq: lowest input requirement of the three; RNA extraction from a small portion of the sample is sufficient for standard bulk RNA-seq
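Because Hi-C has by far the largest input requirement, a practical allocation reserves its fraction first and checks that enough material remains for the other two. The sketch below does that arithmetic; the cell numbers are placeholders chosen only to illustrate the calculation, not recommended inputs, which depend on cell type, genome size, and protocol.

```python
def allocate(total_cells, needs):
    """Reserve each assay's cells in the given order (largest requirement first).

    needs: list of (assay, cells) tuples. Returns a dict that includes the spare
    count; raises ValueError as soon as an assay cannot be covered.
    """
    plan = {}
    remaining = total_cells
    for assay, cells in needs:
        if cells > remaining:
            raise ValueError(f"{assay} needs {cells:,} cells but only {remaining:,} remain")
        plan[assay] = cells
        remaining -= cells
    plan["spare"] = remaining
    return plan


# Placeholder numbers for one biological replicate.
needs = [("Hi-C", 3_000_000), ("ATAC-seq", 100_000), ("RNA-seq", 50_000)]
print(allocate(5_000_000, needs))
# {'Hi-C': 3000000, 'ATAC-seq': 100000, 'RNA-seq': 50000, 'spare': 1850000}
```

Failing early on the first uncovered assay makes the shortfall visible before any material is committed.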
For each biological replicate, aliquot cells before processing: one fraction to Hi-C crosslinking, one to nuclei isolation for ATAC-seq, and one to RNA extraction. All three fractions must be drawn from the same cell population at the same time, so that differences in handling or time in culture cannot accumulate between them. Our ATAC-seq service and integrating RNA-seq and epigenomic data analysis service support matched-sample experimental design consultation before collection.
Replication Requirements and Batch Effect Considerations
Multi-omics integration is not more forgiving of low replicate numbers than individual assay analysis — it is less forgiving. Correlation-based integration (comparing ATAC-seq peak scores to RNA-seq expression values across samples) requires biological replicates to distinguish true regulatory relationships from sample-specific variation.
A minimum of two biological replicates per condition supports basic concordance analysis; three or more replicates enable statistical testing with FDR control. For Hi-C specifically, technical replication (multiple libraries from the same biological sample) is recommended to confirm loop calls at the contact frequency thresholds used for loop detection. Batch effects between Hi-C libraries prepared at different times can introduce systematic biases in contact frequency normalization; all libraries within a comparison should be processed in the same batch where possible.
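The FDR control mentioned above is most commonly Benjamini-Hochberg (BH) adjustment. DESeq2 and edgeR report BH-adjusted values by default, so the sketch below is only meant to show what the adjustment does to a list of raw p-values; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order.

    Each sorted p-value is scaled by m / rank, then a running minimum taken from
    the largest rank downward keeps the adjusted values monotone and <= 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```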
Integration Analysis: From Three Data Layers to Regulatory Pairs
Once matched-sample data is collected and individually processed, integration analysis follows a two-step logic that prevents the combinatorial complexity of three-assay co-analysis from overwhelming biological interpretation.
Step 1 — ATAC + RNA-Seq Baseline: Concordant Accessibility and Expression
The first integration step establishes the ATAC-seq and RNA-seq concordance landscape: which genomic loci show matched changes in chromatin accessibility and gene expression between conditions. The standard output is a four-quadrant classification:
- ↑ accessibility / ↑ expression: activated regulatory program — highest priority candidates
- ↑ accessibility / ↓ expression: accessible elements with repressed target genes — potential repressor-binding sites or buffering elements
- ↓ accessibility / ↑ expression: closed chromatin with upregulated genes — may reflect indirect regulation or distal activation
- ↓ accessibility / ↓ expression: deactivated regulatory program
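The four-quadrant call itself is a sign comparison once each peak has been paired with a gene. The sketch below assumes each pair already carries an ATAC-seq log2 fold change and an RNA-seq log2 fold change, and uses a |log2FC| >= 1 cutoff as a stand-in for a full significance test; the pair names and values are hypothetical.

```python
from collections import Counter

QUADRANTS = {
    (True, True): "up accessibility / up expression",
    (True, False): "up accessibility / down expression",
    (False, True): "down accessibility / up expression",
    (False, False): "down accessibility / down expression",
}


def quadrant(atac_lfc, rna_lfc, cutoff=1.0):
    """Assign a peak-gene pair to a quadrant, or 'not changed' if either layer is below cutoff."""
    if abs(atac_lfc) < cutoff or abs(rna_lfc) < cutoff:
        return "not changed"
    return QUADRANTS[(atac_lfc > 0, rna_lfc > 0)]


pairs = {  # hypothetical peak-gene pairs: (ATAC log2FC, RNA log2FC)
    ("peak1", "GENE1"): (2.1, 1.5),
    ("peak2", "GENE2"): (1.3, -1.8),
    ("peak3", "GENE3"): (-1.2, 3.0),
    ("peak4", "GENE4"): (-2.0, -1.1),
    ("peak5", "GENE5"): (0.3, 2.0),
}
counts = Counter(quadrant(a, r) for a, r in pairs.values())
print(counts)
```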
Pearson correlation of matched features provides a global view of the accessibility-expression relationship; gene-level assignment links specific ATAC-seq peaks (within promoter windows, typically ±2 kb from TSS) to their associated DEGs. This step produces the candidate regulatory element shortlist that Step 2 will contextualize using Hi-C.
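Gene-level assignment with a promoter window, followed by the global correlation, can be sketched as below. The ±2 kb window comes from the text; the half-open BED-style coordinates, the dictionary inputs, and all peak, gene, and fold-change values are hypothetical, and the all-against-all loop is only suitable for small examples (bedtools or an interval index scales to whole genomes).

```python
import math


def promoter_pairs(peaks, tss, window=2_000):
    """Pair each peak with every gene whose promoter window [TSS - window, TSS + window)
    overlaps the peak interval [start, end) on the same chromosome."""
    pairs = []
    for peak_id, (chrom, start, end) in peaks.items():
        for gene, (g_chrom, pos) in tss.items():
            if g_chrom == chrom and start < pos + window and end > pos - window:
                pairs.append((peak_id, gene))
    return pairs


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


peaks = {"p1": ("chr1", 9_500, 10_200), "p2": ("chr1", 50_000, 50_400),
         "p3": ("chr2", 19_000, 19_600), "p4": ("chr2", 61_000, 61_500),
         "p5": ("chr3", 5_000, 5_300)}
tss = {"G1": ("chr1", 10_000), "G2": ("chr1", 51_500), "G3": ("chr2", 20_000),
       "G4": ("chr2", 60_000), "G5": ("chr3", 900_000)}
atac_lfc = {"p1": 1.8, "p2": -0.9, "p3": 0.4, "p4": 2.5, "p5": 3.0}
rna_lfc = {"G1": 2.2, "G2": -1.1, "G3": 0.1, "G4": 1.9, "G5": 0.0}

pairs = promoter_pairs(peaks, tss)
r = pearson([atac_lfc[p] for p, _ in pairs], [rna_lfc[g] for _, g in pairs])
print(pairs)        # p5 has no promoter within 2 kb, so it is left for Step 2
print(round(r, 2))  # 0.96
```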
Figure 3. Integration analysis proceeds in two steps: first establishing ATAC-seq and RNA-seq concordance, then adding Hi-C loop context to assign distal regulatory elements to target genes.
Step 2 — Adding Hi-C Context: Assigning Regulatory Elements to Target Genes via Loops
The second step takes the candidate regulatory element shortlist from Step 1 and maps each element against the Hi-C contact landscape to ask: is this accessible element physically connected to a gene promoter via a chromatin loop, and does that loop-anchored gene show correlated expression changes?
A common operational definition treats enhancers as loci where ATAC-seq peaks overlap active histone modification peaks (such as H3K27ac) but not gene promoters; intersecting these enhancers with chromatin loops from Hi-C data identifies the subset of accessible elements with loop-supported promoter connectivity, the highest-confidence regulatory candidates in the integrated dataset.
For distal elements (those outside Step 1's promoter windows), this step is the primary discovery mechanism: an accessible peak 400 kb from any TSS cannot be interpreted by proximity analysis but may be the loop anchor of a high-confidence enhancer-promoter interaction visible in Hi-C data. The final integrated output, a ranked list of regulatory element–target gene pairs with accessibility, expression, and loop evidence, provides the mechanistic substrate for downstream validation.
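The enhancer definition and the loop intersection map onto two overlap tests. In the sketch below, intervals are half-open (chrom, start, end) tuples as in BED files, each loop is a pair of anchor intervals as in a BEDPE file, and `histone_peaks` stands in for H3K27ac or another active mark; all coordinates and gene names are hypothetical. At genome scale, bedtools `intersect` and `pairtobed` perform the same kinds of operations.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def define_enhancers(atac_peaks, histone_peaks, promoters):
    """Accessible peaks that overlap an active-mark peak but no promoter."""
    return [p for p in atac_peaks
            if any(overlaps(p, h) for h in histone_peaks)
            and not any(overlaps(p, prom) for prom in promoters.values())]


def loop_supported_pairs(enhancers, promoters, loops):
    """(enhancer, gene) pairs where one loop anchor hits the enhancer and the other the promoter."""
    pairs = []
    for enh in enhancers:
        for anchor1, anchor2 in loops:
            for gene, prom in promoters.items():
                linked = ((overlaps(enh, anchor1) and overlaps(prom, anchor2))
                          or (overlaps(enh, anchor2) and overlaps(prom, anchor1)))
                if linked and (enh, gene) not in pairs:
                    pairs.append((enh, gene))
    return pairs


atac = [("chr1", 100_000, 100_500), ("chr1", 500_000, 500_400), ("chr1", 700_000, 700_300)]
h3k27ac = [("chr1", 99_800, 101_000), ("chr1", 699_900, 700_800)]
promoters = {"GENE_X": ("chr1", 499_000, 501_000), "GENE_Y": ("chr1", 520_000, 522_000)}
loops = [(("chr1", 95_000, 105_000), ("chr1", 515_000, 525_000))]

enhancers = define_enhancers(atac, h3k27ac, promoters)
print(loop_supported_pairs(enhancers, promoters, loops))
# [(('chr1', 100000, 100500), 'GENE_Y')]
```

Here the nearest promoter to the first enhancer belongs to GENE_X, but the loop links it to GENE_Y; the second enhancer (at 700 kb) has no loop support and stays unassigned.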
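A ranked output list needs a ranking rule, and none is prescribed here, so the sketch below shows one hypothetical scheme: count how many evidence layers pass a threshold, prefer pairs whose accessibility and expression change in the same direction, and break ties by the weaker of the two effect sizes. The field names, the |log2FC| >= 1 cutoff, and the example pairs are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryPair:
    element: str
    gene: str
    atac_lfc: float       # accessibility log2 fold change
    rna_lfc: float        # expression log2 fold change
    loop_supported: bool  # element and promoter share a Hi-C loop


def evidence_key(pair, cutoff=1.0):
    """Sort key: (layers passing, concordant direction, weaker effect size)."""
    layers = (abs(pair.atac_lfc) >= cutoff) + (abs(pair.rna_lfc) >= cutoff) + pair.loop_supported
    concordant = (pair.atac_lfc > 0) == (pair.rna_lfc > 0)
    return layers, concordant, min(abs(pair.atac_lfc), abs(pair.rna_lfc))


def rank_pairs(pairs):
    return sorted(pairs, key=evidence_key, reverse=True)


pairs = [
    RegulatoryPair("prom3", "GENE_W", 3.0, 2.8, False),
    RegulatoryPair("enh2", "GENE_Z", 1.2, -1.4, True),
    RegulatoryPair("enh1", "GENE_Y", 2.0, 1.5, True),
]
print([p.element for p in rank_pairs(pairs)])  # ['enh1', 'enh2', 'prom3']
```

Ranking discordant pairs below concordant ones is itself a judgment call; the four-quadrant list treats discordant pairs as potentially informative, so a study focused on repression might reverse that preference.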
For analysis guidance on Hi-C data processing, contact matrix normalization, and TAD/loop calling, see our Hi-C data analysis workflow resource. Our epigenomic data analysis service supports custom three-assay integration pipelines for research, drug development, and breeding applications.
Frequently Asked Questions
1) What does Hi-C add to ATAC-seq and RNA-seq in a multi-omics study?
Hi-C provides the 3D spatial context that ATAC-seq and RNA-seq cannot supply: which accessible regulatory elements are physically connected to which gene promoters via chromatin loops, and whether those genes fall within the same or different topologically associating domains. The most important unique contribution of Hi-C is distal enhancer-to-gene assignment: for regulatory elements located far from their target gene, often hundreds of kilobases away, proximity-based ATAC + RNA-seq integration frequently misidentifies the target, and Hi-C contact data resolves the actual looping connection. Hi-C also enables TAD boundary analysis, which is required for studying structural disruptions of chromatin insulation in cancer and developmental systems.
2) Can ATAC-seq and RNA-seq alone identify enhancer-promoter regulatory pairs?
For promoter-proximal regulatory elements (those within 2–5 kb of a gene's TSS), ATAC-seq + RNA-seq integration is sufficient for high-confidence assignment. For distal enhancers located hundreds of kilobases from their target genes, proximity-based assignment is unreliable and Hi-C loop data is required. In practice, the majority of tissue-specific and condition-specific regulatory elements characterized in cancer and developmental biology are distal (>50 kb from the target gene), so proximal assignment alone reaches only a minority of the most informative regulatory candidates.
3) What is the minimum sample requirement for running all three assays on the same biological sample?
The input requirements are set by the highest-demand assay in the combination: standard Hi-C requires millions of cells per library, far more than ATAC-seq (typically 50,000–500,000 cells) or RNA-seq (even lower). Split-sample allocation must therefore reserve the Hi-C fraction first and size the ATAC-seq and RNA-seq fractions from the remaining material. Where Hi-C input is constrained, low-input in situ Hi-C protocols or Capture Hi-C (which enriches specific loci rather than the whole genome) can reduce input requirements. Exact requirements depend on cell type, genome size, and target resolution.
4) How do you integrate Hi-C, ATAC-seq, and RNA-seq data computationally?
Integration follows a two-step logic. Step 1: process ATAC-seq (peak calling with MACS2 or HMMRATAC) and RNA-seq (differential expression with DESeq2 or edgeR) independently, then match peaks to genes through promoter-proximal overlap and correlation analysis. Step 2: process Hi-C (Juicer, HiCExplorer, or cooler with cooltools for contact matrix generation, normalization, and analysis; HiCCUPS or Mustache for loop calling), then intersect Hi-C loop anchors with the ATAC-seq peak set and DEG list to identify loop-supported regulatory pairs. The bedtools suite is widely used for intersection operations across all three data types. The final output is a ranked table of regulatory element–target gene pairs with evidence from all three assays.
5) When is Hi-C + ATAC-seq + RNA-seq integration necessary versus just two assays?
Three-assay integration is necessary when: (1) the research question requires distal enhancer-to-gene assignment beyond proximity analysis range; (2) TAD boundary disruption or compartment switching is suspected as a regulatory mechanism; (3) the study involves structural variants or epigenetic changes known to affect chromatin insulation (cancer, imprinting disorders); or (4) the goal is to build a comprehensive gene regulatory network rather than identify candidate elements. For most standard regulatory mechanism studies — identifying which accessible elements drive which expression changes in a condition-specific manner — ATAC-seq + RNA-seq is sufficient and more cost-effective.
6) What are the most common failure points in multi-omics Hi-C ATAC-seq RNA-seq integration?
Four failure modes are especially common. First, sample mismatch: running the three assays on different biological batches introduces technical confounders that correlation analysis cannot separate from biology. Second, reference genome version inconsistency: aligning Hi-C, ATAC-seq, and RNA-seq data to different genome builds or annotation versions makes intersection analysis unreliable. Third, normalization mismatch: using raw counts for one assay and normalized values for another inflates or deflates apparent concordance. Fourth, proximity-based assignment for distal elements: attributing ATAC-seq peaks to the nearest TSS and calling the result a regulatory pair without Hi-C confirmation produces many false assignments for distal elements.
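A quick guard against the reference-build problem is to compare the chromosome-length tables each pipeline actually used (for example the `.fai` index or a `chrom.sizes` file for each reference). Different builds disagree on lengths, and UCSC-style versus Ensembl-style naming (`chr1` versus `1`) shows up as missing chromosomes. The sketch below assumes the tables are already loaded into dictionaries; the chr1 lengths shown are those of hg38 and hg19.

```python
def build_mismatches(chrom_sizes_by_assay):
    """Compare chromosome-length tables across assays against the first one.

    chrom_sizes_by_assay: dict assay -> dict chrom -> length (bp).
    Returns human-readable discrepancies; an empty list means the tables agree.
    """
    names = list(chrom_sizes_by_assay)
    ref_name, ref = names[0], chrom_sizes_by_assay[names[0]]
    problems = []
    for name in names[1:]:
        other = chrom_sizes_by_assay[name]
        for chrom in sorted(set(ref) | set(other)):
            if ref.get(chrom) != other.get(chrom):
                problems.append(f"{chrom}: {ref_name}={ref.get(chrom)} vs {name}={other.get(chrom)}")
    return problems


tables = {
    "Hi-C": {"chr1": 248_956_422},     # hg38 chr1
    "ATAC-seq": {"chr1": 248_956_422},
    "RNA-seq": {"chr1": 249_250_621},  # hg19 chr1
}
print(build_mismatches(tables))  # ['chr1: Hi-C=248956422 vs RNA-seq=249250621']
```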
7) How many replicates are needed for Hi-C ATAC-seq RNA-seq integration studies?
A minimum of two biological replicates per condition enables basic concordance analysis; three or more replicates support FDR-controlled differential analysis for all three assays. Multi-omics integration is more replicate-sensitive than single-assay analysis because correlation-based integration amplifies sample-specific noise — a spurious accessibility change in one replicate can falsely associate with a spurious expression change in the same replicate, producing a false regulatory candidate. For Hi-C specifically, technical replicates (multiple libraries from the same biological sample) are recommended to validate loop calls at lower confidence thresholds.
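How strongly small sample numbers inflate chance correlations can be seen with a short simulation: draw two independent, pure-noise features across n samples and record the absolute Pearson correlation. The sketch below assumes standard normal noise and a fixed random seed; it illustrates the replicate argument and is not a power calculation.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mean_abs_null_r(n_samples, trials=2000, seed=1):
    """Average |r| between two unrelated noise features measured on n_samples samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        total += abs(pearson(x, y))
    return total / trials


# e.g. 2 conditions x 2 replicates, 2 x 3, and 2 x 6 samples
for n in (4, 6, 12):
    print(n, round(mean_abs_null_r(n), 2))
```

Under this null model the average |r| is about 0.5 with four samples and falls to roughly 0.25 with twelve, so unrelated features look correlated far more often when replicates are few.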
References
- Flavahan, W.A. et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature, 529, 110–114 (2016). https://doi.org/10.1038/nature16490
- Klemm, S.L., Shipony, Z. & Greenleaf, W.J. Chromatin accessibility and the regulatory epigenome. Nature Reviews Genetics, 20, 207–220 (2019). https://doi.org/10.1038/s41576-018-0089-8
- Javierre, B.M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell, 167, 1369–1384 (2016). https://doi.org/10.1016/j.cell.2016.09.037
- Grandi, F.C., Modi, H., Kampman, L. & Corces, M.R. Chromatin accessibility profiling by ATAC-seq. Nature Protocols, 17, 1518–1552 (2022). https://doi.org/10.1038/s41596-022-00692-9
- Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biology, 18, 83 (2017). https://doi.org/10.1186/s13059-017-1215-1
- Kim, T. et al. Comparative characterization of 3D chromatin organization in triple-negative breast cancers. Experimental & Molecular Medicine, 54, 585–600 (2022). https://doi.org/10.1038/s12276-022-00768-2


