HiFi‑C Sequencing: Sample Prep, Crosslinking, and Workflow

At a glance:

Key takeaways
Overview of the HiFi-C Sequencing Workflow
Sample Types Suitable for HiFi-C Sequencing
Crosslinking Strategies for HiFi-C Experiments
Chromatin Digestion and Proximity Ligation
HiFi-C Library Preparation
Sequencing Considerations for HiFi-C Experiments
Data Outputs and Downstream Analysis
Common Pitfalls in HiFi-C Sample Preparation
When Should Researchers Use HiFi-C Sequencing?
Next steps
Author
References

Preparing reliable samples for HiFi-C sequencing is the single biggest lever you control for downstream success. HiFi-C is a long-read chromatin interaction sequencing approach that pairs in situ Hi-C chemistry with PacBio HiFi reads to capture proximity-ligated concatemers with high accuracy. Done well, it supports chromosome-scale genome assembly, more confident structural variant analysis, and cleaner 3D genome maps—especially in repetitive regions where short reads struggle. Done poorly, it yields sparse contact maps, high duplication, and inconclusive scaffolding.

This QC-first guide walks through an executable sample preparation workflow—from crosslinking through library construction and sequencing—and spells out concrete QC gates (library size distribution, duplication ceiling, cis/trans ranges, P(s) decay features, and more). Throughout, we note parameters that labs can adopt immediately and where to pilot and tune for cell type or tissue context.

Key takeaways

Prioritize a compliance-friendly, auditable SOP with explicit parameters. For crosslinking, a widely used starting point is 1% formaldehyde at room temperature for 10 minutes; quench with 0.125 M glycine for 5 minutes on ice, then proceed to nuclei extraction/cleanup. Treat this as a validated baseline and re-optimize per sample type.
Protect long, ligated fragments at every step. Gentle handling and size selection schemes that preserve multi-kb concatemers improve the efficiency of HiFi-C on PacBio instruments.
Use pre-defined QC gates before scaling: fragment peak/N50 window appropriate for HiFi, duplication rate ceiling, usable-pair projections, balanced cis/trans (~40–60 on human-scale), and an expected P(s) decay shape supported by literature benchmarks.
Pilot first, then scale. Sequence a modest fraction to confirm library complexity and distance-stratified signal; only then commit deeper throughput.

Overview of the HiFi-C Sequencing Workflow

HiFi-C sequencing adapts the logic of in situ Hi-C—crosslink proximal chromatin, digest, and ligate fragments that were physically close in the nucleus—then sequences the resulting concatemers as accurate long reads. The standard pipeline proceeds as follows. First, collect and stabilize your input: cultured cells or tissues kept cold. When feasible, crosslink fresh samples promptly; for frozen material, maintain cryogenic conditions until nuclei isolation. Next, fix chromatin interactions, typically with formaldehyde, so that genuine spatial proximities survive downstream handling. Digest with a frequent-cutting restriction enzyme (e.g., DpnII or MboI) to create sticky-ended fragments, then perform proximity ligation in situ under dilute conditions to favor junctions that reflect real spatial proximity. Purify the ligated DNA, enrich long fragments, and construct PacBio-compatible libraries while minimizing shear. Finally, run PacBio HiFi sequencing to generate high-accuracy reads that traverse ligated concatemers and move on to chromatin interaction analysis for mapping, contact matrices, and downstream scaffolding/SV/3D applications.

This logic (preserve native proximities, tag them by ligation, and read through them with long, accurate sequences) explains why HiFi-C sequencing can improve mappability and phasing in repeat-rich regions relative to short-read Hi-C. Long-read 3C/Hi-C work supports this: the open-access, peer-reviewed CiFi study, "Accurate long-read chromatin conformation capture with low input" (2024/2025), demonstrates multi-contact long-read capture and its utility for assembly.

Overview of the HiFi-C sequencing workflow for long-read chromatin interaction analysis.

Sample Types Suitable for HiFi-C Sequencing

HiFi-C sequencing works on a wide range of inputs, but success hinges on nuclei integrity, accessible chromatin, and careful handling to preserve long ligated molecules. Cultured mammalian cells—adherent or suspension—are the most straightforward; crosslink directly in culture vessels to minimize handling. Typical inputs range from 0.5–5 million cells per aliquot for standard protocols; low-input adaptations exist but should be piloted.

Fresh animal tissues crosslink well if processed promptly. For archived material, cryogenic pulverization followed by rapid nuclei isolation helps maintain integrity; expect digestion efficiency to vary with tissue composition. In plants, cell walls and secondary metabolites can impede diffusion and digestion. Enhanced chemistries (e.g., dual crosslinkers and dual enzymes) have improved signal in plants. For instance, the Frontiers in Plant Science upgrade to "Hi-C 3.0" reported better feature recovery with FA+DSG crosslinking and DpnII+DdeI digestion in plant material, increasing valid-pair fractions and loop detection compared with conventional protocols, as described by Han et al., 2023 (Frontiers in Plant Science).

For fresh samples, crosslink promptly, maintain the cold chain, and avoid over-handling before fixation. For frozen samples, keep them cryogenic until nuclei prep, and plan to re-tune crosslinking and digestion; always run a small pilot to verify QC before scaling. Finally, size your input so the finished library can clear the downstream QC gates, including the duplication ceiling. Where cell numbers are limiting, adopt low-input variants inspired by long-read 3C approaches, but only after a pilot confirms complexity.

Crosslinking Strategies for HiFi-C Experiments

Purpose of crosslinking

Crosslinking chemically "freezes" spatial proximities so digestion and ligation capture real contacts rather than random collisions. In situ processing within nuclei further suppresses noise relative to early Hi-C variants. This logic and its advantages are summarized in practical reviews such as "The Hitchhiker's Guide to Hi-C Analysis" (Methods, 2014) and updated overviews like Liu et al., 2024 (Frontiers in Genetics).

Executable baseline SOP (formaldehyde)

Crosslink with 1% formaldehyde at room temperature for 10 minutes.
Quench with 0.125 M glycine for 5 minutes on ice.
Proceed immediately to nuclei extraction and cleanup under cold conditions.

Treat these as validated starting parameters for many cultured cells. As with any crosslinking chemistry, optimize for your cell type or tissue (e.g., adjust time ±2–5 minutes, verify temperature control). Document lot numbers and timing to support audit trails. A quick audit checklist that teams often adopt includes: recording exact start/stop times to the minute; verifying FA concentration against a dated stock log; confirming that the quench solution was freshly prepared at the stated molarity; logging ambient temperature; and capturing a gel image or Bioanalyzer trace from a small pilot ligation to document high-molecular-weight material prior to scale-up. These simple steps make internal QA reviews much smoother.
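The working concentrations in the baseline SOP come down to simple C1·V1 = C2·V2 arithmetic. Here is a minimal sketch of that calculation. The stock concentrations (a 16% formaldehyde ampoule and a 2.5 M glycine stock) and the 10 mL final volume are illustrative assumptions, not values from this guide, so substitute whatever your stock log records.

```python
def stock_volume_ml(final_conc, final_volume_ml, stock_conc):
    """Volume of stock needed so the final mixture reaches final_conc (C1*V1 = C2*V2).

    final_conc and stock_conc must share the same unit (both % or both M).
    final_volume_ml is the total volume AFTER the stock has been added.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume_ml / stock_conc


# Hypothetical stocks and volumes, for illustration only.
fa_ml = stock_volume_ml(final_conc=1.0, final_volume_ml=10.0, stock_conc=16.0)
gly_ml = stock_volume_ml(final_conc=0.125, final_volume_ml=10.0, stock_conc=2.5)

print(f"Formaldehyde stock to add: {fa_ml:.3f} mL per 10 mL final")
print(f"Glycine stock to add:      {gly_ml:.3f} mL per 10 mL final")
```

Logging the computed volumes next to the lot numbers gives reviewers a direct way to confirm that the 1% and 0.125 M targets were actually hit.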

Quenching and cleanup

Glycine neutralizes unreacted formaldehyde and helps restore conditions for restriction digestion. After quenching, wash to remove residual crosslinker and prepare for nuclei isolation. Maintain gentle pipetting and minimal vortexing to reduce shear that would later shorten concatemers.

Common failure modes to watch for

Over-crosslinking reduces digestion efficiency and ligation yield, inflates short-range contacts, and elevates duplicate rates. On contact maps, this can skew P(s) decay toward very short genomic distances and flatten long-range signal relative to expectations from benchmarks such as Yardımcı & Noble's 2019 reproducibility study in Genome Biology. Under-crosslinking increases random ligations and artifacts like self-circles/dangling ends; diagonals and compartment patterns weaken, and long-range cis contacts are depressed, as noted in QC assessments like Dozmorov, 2021 (GigaScience) and the Methods guide above. A practical test is to run a tiny digestion-ligation on a crosslinked aliquot and visualize the smear; a strong high-molecular-weight shoulder with reduced low-molecular-weight noise generally signals you're in the right window.

Formaldehyde crosslinking stabilizes chromatin interactions for HiFi-C sequencing.

Chromatin Digestion and Proximity Ligation

Restriction enzyme choice and digestion

Most mammalian protocols favor frequent cutters that recognize GATC sites (DpnII or its isoschizomer MboI) to generate fragments with sticky ends and higher effective resolution than 6-cutters like HindIII. This rationale and early procedural standards are reviewed in Belton et al., 2012 (Methods). For plants and recalcitrant tissues, the Hi-C 3.0 strategy of dual digestion (DpnII + DdeI) after strengthened crosslinking (FA + DSG) improved fragment distributions and signal recovery, per Han et al., 2023 (Frontiers in Plant Science).
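To see why a 4-cutter gives finer effective resolution than a 6-cutter, it helps to look at expected site spacing. The sketch below assumes a random sequence with equal base frequencies, which real genomes only approximate. The `fragment_lengths` helper and the toy sequence are hypothetical and only illustrate an in silico digest.

```python
import re


def expected_spacing_bp(site):
    """Expected distance between sites in a random, equal-base-frequency sequence."""
    return 4 ** len(site)


def fragment_lengths(sequence, site, cut_offset=0):
    """Fragment lengths after cutting `sequence` at every occurrence of `site`.

    cut_offset is the cut position within the site (0 for DpnII/MboI, which
    cut immediately before GATC; 1 for HindIII, which cuts A^AGCTT).
    """
    seq = sequence.upper()
    # A lookahead is used so overlapping site occurrences are not missed.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


print("GATC (DpnII/MboI):", expected_spacing_bp("GATC"), "bp expected spacing")
print("AAGCTT (HindIII): ", expected_spacing_bp("AAGCTT"), "bp expected spacing")

toy = "AAGATCTTTTGATCAA"  # toy sequence, not real data
toy_fragments = fragment_lengths(toy, "GATC")
print("Toy digest fragment lengths:", toy_fragments)
```

Running the same digest over your reference (or a representative chromosome) gives an empirical fragment distribution to compare against the idealized spacing.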

Proximity ligation logic

Carrying out ligation within intact nuclei (in situ) under dilute conditions enriches for ligations that reflect true spatial proximity. The goal is to create chimeric junctions that encode pairwise (and sometimes multi-contact) interactions. Long-read approaches like HiFi-C then read across these concatemers, aiding mapping and phasing in repeat-rich regions, consistent with the long-read capture evidence in the CiFi study (2024/2025).
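A multi-contact concatemer can be reduced to pairwise contacts for standard contact-matrix tools. The sketch below uses an all-pairs expansion, which is one possible convention; some workflows pair only adjacent segments instead. The `(chrom, position)` tuple layout and the example read are hypothetical.

```python
from itertools import combinations


def pairwise_contacts(segments):
    """Expand one multi-contact read into every pairwise contact.

    segments: list of (chrom, position) tuples, one per aligned fragment,
    in the order they appear along the read.
    A read with k segments yields k * (k - 1) / 2 pairs.
    """
    return list(combinations(segments, 2))


# Hypothetical read whose three fragments map to two chromosomes.
read = [("chr1", 10_000), ("chr1", 250_000), ("chr3", 5_000)]
pairs = pairwise_contacts(read)

for left, right in pairs:
    kind = "cis" if left[0] == right[0] else "trans"
    print(left, right, kind)
```

Because pair counts grow quadratically with segments per read, it is worth reporting both read-level and pair-level totals so that long concatemers do not distort duplication or depth estimates.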

HiFi-C Library Preparation

From ligation product to long-read-ready library

After proximity ligation, purify DNA with bead-based cleanup, keeping elution volumes and bead ratios tuned for long fragments. The critical goal is to preserve multi-kilobase concatemers that PacBio HiFi reads can traverse. Size selection can be used to push the insert distribution into a multi-kb window, but be mindful that overly aggressive cuts reduce complexity and can inflate duplicates.

Long-fragment recovery and size selection

Use gentle handling (wide-bore tips, minimal pipetting cycles) and avoid vortexing. If using size selection, pilot a window that retains substantial 3–15 kb material, then check the empirical distribution (e.g., peak and N50) before deep sequencing. Because HiFi-C insert windows are not yet standardized in peer-reviewed guidance, prefer data-driven validation of your specific library's length profile and complexity, as encouraged by scaffolding evaluations such as Bickhart et al., 2022 (Genome Research/PMC).
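Checking the empirical length distribution takes only a few lines. This sketch computes N50 and the share of bases that fall inside a size window, given a list of fragment or read lengths. The 3–15 kb defaults mirror the pilot window suggested above, and the example lengths are made up.

```python
def n50(lengths):
    """Length L such that molecules of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def fraction_in_window(lengths, low=3_000, high=15_000):
    """Fraction of total bases carried by molecules inside [low, high]."""
    total = sum(lengths)
    inside = sum(x for x in lengths if low <= x <= high)
    return inside / total


# Made-up lengths in bp, for illustration only.
example_lengths = [500, 2_000, 4_000, 8_000, 12_000, 20_000]
example_n50 = n50(example_lengths)
example_fraction = fraction_in_window(example_lengths)

print("N50 (bp):", example_n50)
print(f"Bases in 3-15 kb window: {example_fraction:.1%}")
```

Weighting by bases rather than molecule count keeps a large number of sub-kb fragments from masking how much sequenceable material actually sits in the target window.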

Library complexity and sequencing efficiency

High complexity lowers duplication and increases unique usable contacts per unit of sequencing. Before committing multiple SMRT Cells, run a shallow test to estimate duplication, project usable pairs, and confirm distance-stratified interaction profiles look plausible. Here's the deal: a single well-designed pilot often saves far more cost and time than it consumes, because it prevents deep sequencing of a suboptimal library.
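One way to turn a shallow test into a duplication projection is a Lander-Waterman-style saturation model, where expected unique molecules follow U = C·(1 − e^(−N/C)) for library size C and N sequenced molecules. That model is a swapped-in simplification rather than something this guide prescribes; it assumes uniform sampling and ignores the length bias of real libraries. The pilot counts are invented.

```python
import math


def expected_unique(library_size, depth):
    """Expected unique molecules after sampling `depth` molecules from the library."""
    return library_size * (1.0 - math.exp(-depth / library_size))


def estimate_library_size(total, unique):
    """Solve U = C * (1 - exp(-N / C)) for C by bisection."""
    if not 0 < unique < total:
        raise ValueError("need 0 < unique < total (some duplicates must be observed)")
    lo, hi = float(unique), float(unique) * 2
    while expected_unique(hi, total) < unique:  # grow the bracket until it contains C
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_unique(mid, total) < unique:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def projected_duplication(library_size, depth):
    """Duplicate fraction expected at a given sequencing depth."""
    return 1.0 - expected_unique(library_size, depth) / depth


# Invented pilot: 200,000 molecules sequenced, 181,269 unique.
size = estimate_library_size(total=200_000, unique=181_269)
for depth in (200_000, 1_000_000, 2_000_000):
    print(f"depth {depth:>9,}: projected duplication {projected_duplication(size, depth):.1%}")
```

If the projected duplication at your planned depth already exceeds your ceiling, improving the library is usually a better investment than adding sequencing.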

Preparation of HiFi-C sequencing libraries from proximity-ligated chromatin fragments.

If you prefer to benchmark your library against established wet-lab and bioinformatics gates before scale-up, consider a pilot with a specialized provider; many groups align their QC and reporting against a standardized process, such as our HiFi-C sequencing service, during method validation. The goal is simply to verify complexity and contact-map behavior before committing deeper budgets.

Practical QC checkpoints (planning ranges)

The following consolidated metrics are widely referenced for Hi-C-like data quality. Treat them as planning ranges that you'll interpret alongside organism size, protocol details, and experimental goals. See linked references in surrounding text for methodology and rationale.

QC aspect	Practical range/goal	Why it matters
Library size distribution	Peak/N50 in multi-kb window compatible with HiFi; minimal sub-kb tail	Preserves long concatemers that HiFi reads can traverse efficiently
Duplication rate (deep sets)	Aim < ~40%	Preserves unique yield; high duplication signals low complexity (see Bickhart 2022; Yardımcı 2019)
Mean HiFi read length and quality	Typical HiFi 10–25 kb; Q≥30 for most reads	Long, accurate reads improve mapping in repeats and support phasing
Usable/valid contacts	Sufficient for target resolution per modeling (e.g., HiCRes)	Confirms ligation chemistry worked and depth will meet analysis goals
Cis/trans ratio (human-scale)	~40–60	Balanced intra- vs inter-chromosomal signal (Lajoie 2014)
P(s) decay curve	Monotonic with expected slope	Detects over/under-crosslinking and ligation/mapping artifacts (Yardımcı 2019)
Scaffold N50 uplift (assembly projects)	Clear increase vs pre-HiFi-C assembly	Confirms that interaction data enable chromosome-scale scaffolding
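The planning ranges in the table can be encoded as an explicit go/no-go check so that every pilot is judged the same way. This is a sketch with several assumptions. The `PilotMetrics` fields are hypothetical names. The ~40–60 cis/trans figure is interpreted here as the percentage of valid contacts that are intra-chromosomal, and the table does not state units, so confirm how your pipeline reports it. The 3 kb N50 floor is a placeholder for "multi-kb".

```python
from dataclasses import dataclass


@dataclass
class PilotMetrics:
    duplication_rate: float     # fraction between 0 and 1
    cis_percent: float          # ASSUMPTION: % of valid contacts that are cis
    median_read_quality: float  # Phred scale
    length_n50_bp: int          # N50 of the library or read-length distribution


def evaluate_gates(m, min_n50_bp=3_000):
    """Return a dict of gate name -> pass/fail using the table's planning ranges."""
    return {
        "duplication < 40%": m.duplication_rate < 0.40,
        "cis/trans within ~40-60": 40.0 <= m.cis_percent <= 60.0,
        "read quality Q>=30": m.median_read_quality >= 30.0,
        "length N50 in multi-kb window": m.length_n50_bp >= min_n50_bp,
    }


# Invented pilot values, for illustration only.
pilot = PilotMetrics(duplication_rate=0.22, cis_percent=52.0,
                     median_read_quality=33.0, length_n50_bp=9_000)
report = evaluate_gates(pilot)

for gate, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}  {gate}")
print("Proceed to scale-up:", all(report.values()))
```

Gates that depend on shape rather than a single number, such as P(s) decay, are left out here and still need visual or model-based review.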

Sequencing Considerations for HiFi-C Experiments

HiFi-C sequencing benefits from the properties of PacBio HiFi reads: long consensus reads (commonly 10–25 kb) with high per-read accuracy (often Q30 or better). Those traits improve alignment specificity and help traverse repetitive or structurally complex regions—exactly where scaffolding and SV detection depend on clean, long-range contact evidence.

Depth planning and resolution

Resolution scales with read count. For short-read Hi-C, modeling frameworks estimate achievable bin resolution vs. depth; the same logic guides HiFi-C planning. According to Oluwadare et al., 2022 (Nucleic Acids Research, HiCRes), you can predict target resolution from downsampled data—use this approach on a pilot HiFi-C subset to set budgets realistically. For chromosome-scale scaffolding, depth requirements are often lower than for high-resolution loop detection; existing scaffolding evaluations like Bickhart et al., 2022 (Genome Research/PMC) discuss trade-offs between contiguity, noise, and depth.
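For a rough first pass before running a dedicated model, you can ask how much more depth a pilot would need for most bins to reach a minimum contact count. The sketch below uses a commonly cited rule of thumb (at least 1,000 contacts in 80% of bins) and assumes contacts per bin scale linearly with depth. Both are assumptions that do not come from this guide and do not reproduce HiCRes's method. Linear scaling also ignores duplication, so treat the result as optimistic. The bin counts are invented.

```python
import math


def fraction_bins_covered(bin_counts, min_contacts=1_000):
    """Fraction of bins whose contact count meets the threshold."""
    covered = sum(1 for c in bin_counts if c >= min_contacts)
    return covered / len(bin_counts)


def depth_multiplier_needed(bin_counts, min_contacts=1_000, target_fraction=0.80):
    """Depth scale factor so that target_fraction of bins reach min_contacts.

    ASSUMPTION: per-bin contacts grow linearly with sequencing depth.
    """
    ranked = sorted(bin_counts, reverse=True)
    # The k-th best-covered bin is the one that must just reach the threshold.
    k = math.ceil(target_fraction * len(ranked)) - 1
    pivot = ranked[k]
    if pivot == 0:
        raise ValueError("too many empty bins to extrapolate; use a larger bin size")
    return max(1.0, min_contacts / pivot)


# Invented per-bin contact totals from a shallow pilot at one bin size.
pilot_bins = [50, 80, 100, 120, 200, 40, 90, 150, 60, 110]
multiplier = depth_multiplier_needed(pilot_bins)

print(f"Bins already covered: {fraction_bins_covered(pilot_bins):.0%}")
print(f"Approximate depth multiplier needed: {multiplier:.1f}x")
```

Repeating the calculation at several bin sizes shows quickly which resolutions are within reach of your budget and which are not.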

Instrument considerations

Balance SMRT Cell allocation against library complexity and read length distribution. If test-run metrics show high duplication or a short peak, revisit size selection or cleanup before scaling. Think of it this way: you're trading throughput for unique contacts; improving complexity usually yields better returns than simply adding more cells.

Data Outputs and Downstream Analysis

A HiFi-C sequencing run typically delivers raw HiFi reads and derived contact data. After mapping and filtering, you build contact maps that drive the downstream analyses described in the rest of this section.

Raw read processing and contact maps

Map long reads, parse ligation junctions, and convert to contact pairs/matrices. Evaluate cis/trans ratios, distance-stratified contact densities, and P(s) decay to confirm signal quality. The expected shape characteristics and reproducibility considerations are summarized in Yardımcı & Noble's 2019 assessment (Genome Biology).
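Once contacts are in pairwise form, the cis fraction and a basic P(s) curve are straightforward to compute. This is a simplified sketch. It bins cis separations in log space and normalizes by bin width only, whereas a full P(s) estimate also normalizes by the number of locus pairs available at each distance. The `(chrom1, pos1, chrom2, pos2)` layout and the example pairs are hypothetical.

```python
import math
from collections import Counter


def cis_fraction(pairs):
    """Fraction of contacts whose two ends lie on the same chromosome."""
    pairs = list(pairs)
    cis = sum(1 for c1, _, c2, _ in pairs if c1 == c2)
    return cis / len(pairs)


def ps_curve(pairs, min_dist=1_000, bins_per_decade=4):
    """Contact density vs. genomic separation in log-spaced bins.

    Returns a list of (geometric bin midpoint in bp, contacts per bp of bin width).
    """
    counts = Counter()
    for c1, p1, c2, p2 in pairs:
        d = abs(p2 - p1)
        if c1 == c2 and d >= min_dist:
            counts[int(math.floor(math.log10(d / min_dist) * bins_per_decade))] += 1
    curve = []
    for b in sorted(counts):
        lo = min_dist * 10 ** (b / bins_per_decade)
        hi = min_dist * 10 ** ((b + 1) / bins_per_decade)
        curve.append((math.sqrt(lo * hi), counts[b] / (hi - lo)))
    return curve


def loglog_slope(curve):
    """Least-squares slope of log10(density) against log10(separation)."""
    xs = [math.log10(s) for s, _ in curve]
    ys = [math.log10(p) for _, p in curve]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


# Hypothetical contacts, for illustration only.
example_pairs = [
    ("chr1", 0, "chr1", 5_000),
    ("chr1", 0, "chr2", 100),
    ("chr1", 0, "chr1", 50_000),
    ("chr2", 10, "chr3", 20),
]
print("cis fraction:", cis_fraction(example_pairs))
print("P(s) points:", ps_curve(example_pairs))
```

On real data, compare the fitted slope and the overall curve shape between the pilot and a trusted reference library rather than against a fixed number, since the expected decay depends on organism and protocol.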

Genome scaffolding

Integrating HiFi-C contact maps with long-read assemblies commonly lifts contiguity from contig-level N50 to chromosome-scale scaffolds, provided the interaction data are clean. Technical reviews have documented how Hi-C-type data aid scaffolding and how to evaluate results; see Bickhart et al., 2022 (Genome Research/PMC) and the perspective on what drives chromosome-scale improvements in Kadota & Nishiyama, 2020 (GigaScience).

Structural variation and 3D genome analysis

Long-range contacts reinforce and disambiguate SV calls in repeat-rich contexts, while high-accuracy reads improve mappability. For 3D genome studies (compartments, domain-like structures, and loops), plan for additional depth beyond scaffolding needs; reviews such as Lajoie et al., 2014 (Methods) outline how contact features map to biological structures. Long-read multi-contact evidence further strengthens interpretation in complex regions, as shown by the CiFi study (2024/2025).

If you're validating a new organism or tissue type and want an independent checkpoint on library QC and contact-map behavior, you can optionally benchmark against a standardized lab+informatics process via our HiFi-C sequencing service during pilot runs. This is strictly about method validation—confirming complexity, cis/trans balance, and P(s) shape before committing full budgets.

Common Pitfalls in HiFi-C Sample Preparation

Poor crosslinking undermines contact capture. Under-crosslinking reduces true long-range cis interactions and increases artifacts like self-circles and dangling ends. If your cis/trans drops well below ~30 on human-scale data or long-range contacts are markedly depressed, re-tune crosslink time (for example, extend by 2–5 minutes) and verify that quenching and washes are timely and complete. Over-crosslinking leads to poor digestion and ligation and an excess of ultra-short-distance contacts—dial back time slightly, ensure fresh FA stocks, and confirm incubation temperature.

DNA degradation or shearing lowers unique yield. Harsh nuclei handling, excessive pipetting, or vortexing can shift the library's peak down and erode N50, leading to high duplication. Switch to wide-bore tips, minimize transfers, and reassess cleanup bead ratios and elution volumes to protect long concatemers.

Low library complexity inflates duplicates. A narrow insert distribution or over-amplification (if any PCR steps are used) boosts redundancy. Consider broadening the size-selection window, increasing input mass (when available), or revisiting cleanup to improve recovery of mid-length to long fragments.

Insufficient sequencing depth yields sparse matrices. Downsample pilot data and use a resolution-vs-depth model (e.g., HiCRes) to project needs; scale SMRT Cells accordingly.

When Should Researchers Use HiFi-C Sequencing?

Choose HiFi-C when you need chromosome-level scaffolding from a contiguous long-read assembly; when structural variation must be clarified in complex or repetitive regions; and when 3D genome features (compartments, domain-like structures, loops) matter for your questions. It's also compelling for repeat-heavy or polyploid genomes, where long-read multi-contact evidence improves mappability and phasing relative to short-read Hi-C alone, as highlighted by CiFi (2024/2025). Before committing, ask what the smallest pilot is that would still give you a confident read on complexity and cis/trans. In practice, even a fraction of a SMRT Cell can provide enough signal to decide go/no-go for scale-up.

Next steps

Start with the baseline crosslink SOP (1% FA, 10 min RT; quench 0.125 M glycine, 5 min on ice) and run a small pilot to check library size distribution, duplication, cis/trans, and P(s) shape. Use those readouts to lock parameters before large-scale sequencing. If you'd like an external checkpoint or need to accelerate method validation, you can coordinate a limited pilot through our HiFi-C sequencing service to benchmark QC and projected depth.

Author

Author: Dr. Yang H., Senior Scientist at CD Genomics
LinkedIn: https://www.linkedin.com/in/yang-h-a62181178/

References

Lajoie BR, Dekker J, Kaplan N. The Hitchhiker's Guide to Hi-C Analysis: foundational practical guidance on crosslinking, digestion, ligation, and QC. Methods (2014).
Han J et al. An upgraded method of high-throughput chromosome conformation capture (Hi-C 3.0) in plants: FA+DSG and dual digestion improve signal recovery. Frontiers in Plant Science (2023).
Yardımcı GG et al. Measuring the reproducibility and quality of Hi-C data: expectations for P(s) decay, reproducibility metrics, and artifact detection. Genome Biology (2019).
Bickhart DM et al. Technical considerations in Hi-C scaffolding and evaluation of chromosome-scale assemblies. Genome Research/PMC (2022).
Oluwadare O et al. HiCRes: Estimate and predict the resolution of a Hi-C dataset from downsampled data. Nucleic Acids Research (2022).
Belton JM et al. Hi-C: A comprehensive technique to capture the conformation of genomes. Methods (2012).
Liu R et al. Hi-C advances for 3D genome architecture analysis. Frontiers in Genetics (2024).
CiFi Consortium. Accurate long-read chromatin conformation capture with low input: long-read multi-contact capture and assembly utility. Open-access study (2024/2025).

For Research Use Only. Not for use in diagnostic procedures.
