At a glance:
Preparing reliable samples for HiFi-C sequencing is the single biggest lever you control for downstream success. HiFi-C is a long-read chromatin interaction sequencing approach that pairs in situ Hi-C chemistry with PacBio HiFi reads to capture proximity-ligated concatemers with high accuracy. Done well, it supports chromosome-scale genome assembly, more confident structural variant analysis, and cleaner 3D genome maps—especially in repetitive regions where short reads struggle. Done poorly, it yields sparse contact maps, high duplication, and inconclusive scaffolding.
This QC-first guide walks through an executable sample preparation workflow—from crosslinking through library construction and sequencing—and spells out concrete QC gates (library size distribution, duplication ceiling, cis/trans ranges, P(s) decay features, and more). Throughout, we note parameters that labs can adopt immediately and where to pilot and tune for cell type or tissue context.
HiFi-C sequencing adapts the logic of in situ Hi-C—crosslink proximal chromatin, digest, and ligate fragments that were physically close in the nucleus—then sequences the resulting concatemers as accurate long reads. The standard pipeline proceeds as follows. First, collect and stabilize your input: cultured cells or tissues kept cold. When feasible, crosslink fresh samples promptly; for frozen material, maintain cryogenic conditions until nuclei isolation. Next, fix chromatin interactions, typically with formaldehyde, so that genuine spatial proximities survive downstream handling. Digest with a frequent-cutting restriction enzyme (e.g., DpnII or MboI) to create sticky-ended fragments, then perform proximity ligation in situ under dilute conditions to favor junctions that reflect real spatial proximity. Purify the ligated DNA, enrich long fragments, and construct PacBio-compatible libraries while minimizing shear. Finally, run PacBio HiFi sequencing to generate high-accuracy reads that traverse ligated concatemers and move on to chromatin interaction analysis for mapping, contact matrices, and downstream scaffolding/SV/3D applications.
This logic (preserve native proximities, tag them by ligation, and read through them with long, accurate sequences) explains why HiFi-C sequencing can improve mappability and phasing in repeat-rich regions relative to short-read Hi-C. Long-read 3C/Hi-C studies support this: the peer-reviewed, open-access CiFi study, "Accurate long-read chromatin conformation capture with low input" (2024/2025), demonstrates multi-contact long-read capture and its utility for assembly.
Overview of the HiFi-C sequencing workflow for long-read chromatin interaction analysis.
HiFi-C sequencing works on a wide range of inputs, but success hinges on nuclei integrity, accessible chromatin, and careful handling to preserve long ligated molecules. Cultured mammalian cells—adherent or suspension—are the most straightforward; crosslink directly in culture vessels to minimize handling. Typical inputs range from 0.5–5 million cells per aliquot for standard protocols; low-input adaptations exist but should be piloted.
Fresh animal tissues crosslink well if processed promptly. For archived material, cryogenic pulverization followed by rapid nuclei isolation helps maintain integrity; expect digestion efficiency to vary with tissue composition. In plants, cell walls and secondary metabolites can impede diffusion and digestion. Enhanced chemistries (e.g., dual crosslinkers and dual enzymes) have improved signal in plants. For instance, the Frontiers in Plant Science upgrade to "Hi-C 3.0" reported better feature recovery with FA+DSG crosslinking and DpnII+DdeI digestion in plant material, increasing valid-pair fractions and loop detection compared with conventional protocols, as described by Han et al., 2023 (Frontiers in Plant Science).
For fresh samples, crosslink promptly, maintain the cold chain, and avoid over-handling before fixation. For frozen samples, keep them cryogenic until nuclei prep, and plan to re-tune crosslinking and digestion; always run a small pilot to verify QC before scaling. Finally, size your input so that library complexity comfortably clears the downstream QC gates, including the duplication ceiling. Where cell numbers are limiting, adopt low-input variants inspired by long-read 3C approaches, but only after a pilot confirms complexity.
Crosslinking chemically "freezes" spatial proximities so digestion and ligation capture real contacts rather than random collisions. In situ processing within nuclei further suppresses noise relative to early Hi-C variants. This logic and its advantages are summarized in practical reviews such as "The Hitchhiker's Guide to Hi-C Analysis" (Methods, 2014) and updated overviews like Liu et al., 2024 (Frontiers in Genetics).
Treat the baseline crosslinking conditions (1% formaldehyde for 10 minutes at room temperature, quenched with 0.125 M glycine for 5 minutes on ice) as validated starting parameters for many cultured cells. As with any crosslinking chemistry, optimize for your cell type or tissue (e.g., adjust time ±2–5 minutes, verify temperature control). Document lot numbers and timing to support audit trails. A quick audit checklist that teams often adopt includes: recording exact start/stop times to the minute; verifying FA concentration against a dated stock log; confirming that the quench solution was freshly prepared at the intended molarity; logging ambient temperature; and capturing a snapshot gel or Bioanalyzer trace from a small pilot ligation to document high-molecular-weight material prior to scale-up. These simple steps make internal QA reviews much smoother.
Glycine neutralizes unreacted formaldehyde and helps restore conditions for restriction digestion. After quenching, wash to remove residual crosslinker and prepare for nuclei isolation. Maintain gentle pipetting and minimal vortexing to reduce shear that would later shorten concatemers.
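The quench step comes down to simple dilution arithmetic. The sketch below is a minimal illustration, assuming the 0.125 M final glycine concentration used as this guide's baseline and a hypothetical 2.5 M glycine stock; the function name and the stock concentration are assumptions, not part of any published protocol.

```python
def glycine_stock_volume_ml(sample_volume_ml, final_molar=0.125, stock_molar=2.5):
    """Volume of glycine stock to add so the mixture reaches final_molar.

    Solves stock_molar * v = final_molar * (sample_volume_ml + v) for v.
    stock_molar = 2.5 M is an assumed stock, not a value from this guide.
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must be positive and below the stock")
    return final_molar * sample_volume_ml / (stock_molar - final_molar)


if __name__ == "__main__":
    volume = glycine_stock_volume_ml(10.0)
    print(f"Add {volume:.3f} mL of stock to 10 mL of crosslinked sample")
```

Accounting for the added volume in the denominator matters most when the stock is only a few-fold more concentrated than the target; with a concentrated stock the correction is small but still worth logging.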
Over-crosslinking reduces digestion efficiency and ligation yield, inflates short-range contacts, and elevates duplicate rates. On contact maps, this can skew P(s) decay toward very short genomic distances and flatten long-range signal relative to expectations from benchmarks such as Yardımcı & Noble's 2019 reproducibility study in Genome Biology. Under-crosslinking increases random ligations and artifacts like self-circles/dangling ends; diagonals and compartment patterns weaken, and long-range cis contacts are depressed, as noted in QC assessments like Dozmorov, 2021 (GigaScience) and the Methods guide above. A practical test is to run a tiny digestion-ligation on a crosslinked aliquot and visualize the smear; a strong high-molecular-weight shoulder with reduced low-molecular-weight noise generally signals you're in the right window.
Formaldehyde crosslinking stabilizes chromatin interactions for HiFi-C sequencing.
Most mammalian protocols favor frequent cutters that recognize GATC sites (DpnII or its isoschizomer MboI) to generate fragments with sticky ends and higher effective resolution than 6-cutters like HindIII. This rationale and early procedural standards are reviewed in Belton et al., 2012 (Methods). For plants and recalcitrant tissues, the Hi-C 3.0 strategy of dual digestion (DpnII + DdeI) after strengthened crosslinking (FA + DSG) improved fragment distributions and signal recovery, per Han et al., 2023 (Frontiers in Plant Science).
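To see why cutter choice changes effective resolution, it can help to run an in silico digest on a reference sequence before committing to an enzyme. Under uniform random base composition, a 4-base site such as GATC is expected about once every 4^4 = 256 bp, versus 4^6 = 4,096 bp for a 6-base site such as HindIII's AAGCTT. The sketch below is a simplified illustration: it cuts at the start of each recognition site rather than at the true cut position, and the function names are hypothetical.

```python
import re

# Recognition sequences (N in DdeI's CTNAG is written as a character class).
# Cut offsets within each site are ignored in this simplified sketch.
SITES = {"DpnII": "GATC", "MboI": "GATC", "DdeI": "CT[ACGT]AG", "HindIII": "AAGCTT"}


def site_positions(seq, enzymes):
    """Sorted, de-duplicated start positions of all sites for the given enzymes."""
    positions = set()
    for name in enzymes:
        # A lookahead allows overlapping sites to be reported as well.
        pattern = re.compile(f"(?={SITES[name]})")
        positions.update(m.start() for m in pattern.finditer(seq.upper()))
    return sorted(positions)


def fragment_lengths(seq, enzymes):
    """Fragment lengths if the sequence were cut at the start of each site."""
    cuts = [0] + [p for p in site_positions(seq, enzymes) if p > 0] + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    toy = "AAGATCTTCTAAGGGGATCAAGCTT"
    print(fragment_lengths(toy, ["DpnII"]))
    print(fragment_lengths(toy, ["DpnII", "DdeI"]))
```

On a real genome, comparing the single-enzyme and dual-enzyme fragment distributions gives a quick sense of how much extra resolution a second cutter could contribute.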
Carrying out ligation within intact nuclei (in situ) under dilute conditions enriches for ligations that reflect true spatial proximity. The goal is to create chimeric junctions that encode pairwise (and sometimes multi-contact) interactions. Long-read approaches like HiFi-C then read across these concatemers, aiding mapping and phasing in repeat-rich regions, consistent with the long-read capture evidence in the CiFi study (2024/2025).
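A concatemer that spans several ligation junctions encodes more than one pairwise contact. As a rough illustration, assuming each read has already been split into aligned segments represented as hypothetical `(chromosome, position)` tuples, all pairwise combinations can be enumerated like this:

```python
from itertools import combinations


def pairwise_contacts(segments):
    """Expand one read's aligned segments into all pairwise contacts.

    Each segment is a (chrom, position) tuple. A read with n segments
    yields n * (n - 1) / 2 pairs.
    """
    return list(combinations(segments, 2))


if __name__ == "__main__":
    read = [("chr1", 10_000), ("chr1", 250_000), ("chr3", 5_000)]
    for left, right in pairwise_contacts(read):
        print(left, right)
```

Because the pair count grows quadratically with segments per read, a modest number of long concatemers can contribute a large share of total contacts, which is worth keeping in mind when interpreting duplication and depth figures.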
After proximity ligation, purify DNA with bead-based cleanup, keeping elution volumes and bead ratios tuned for long fragments. The critical goal is to preserve multi-kilobase concatemers that PacBio HiFi reads can traverse. Size selection can be used to push the insert distribution into a multi-kb window, but be mindful that overly aggressive cuts reduce complexity and can inflate duplicates.
Use gentle handling (wide-bore tips, minimal pipetting cycles) and avoid vortexing. If using size selection, pilot a window that retains substantial 3–15 kb material, then check the empirical distribution (e.g., peak and N50) before deep sequencing. Because HiFi-C insert windows are not yet standardized in peer-reviewed guidance, prefer data-driven validation of your specific library's length profile and complexity, as encouraged by scaffolding evaluations such as Bickhart et al., 2022 (Genome Research/PMC).
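Checking the empirical distribution can be as simple as computing N50 and the share of bases that fall inside the intended window. The following sketch assumes you have a plain list of fragment or read lengths in base pairs; the 3–15 kb defaults mirror the pilot window mentioned above, and the function names are illustrative.

```python
def n50(lengths):
    """Length L such that fragments of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def mass_fraction_in_window(lengths, low=3_000, high=15_000):
    """Fraction of total bases carried by fragments with low <= length <= high."""
    total = sum(lengths)
    in_window = sum(x for x in lengths if low <= x <= high)
    return in_window / total


if __name__ == "__main__":
    lengths = [800, 2_500, 4_000, 6_000, 9_000, 12_000, 18_000]
    print("N50:", n50(lengths))
    print("Mass in 3-15 kb:", round(mass_fraction_in_window(lengths), 3))
```

Weighting by bases rather than by molecule count reflects how sequencing yield is actually spent, so a handful of very short fragments will not mask a healthy long-fragment population.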
High complexity lowers duplication and increases unique usable contacts per unit of sequencing. Before committing multiple SMRT Cells, run a shallow test to estimate duplication, project usable pairs, and confirm distance-stratified interaction profiles look plausible. A single well-designed pilot often saves far more cost and time than it consumes, because it prevents deep sequencing of a suboptimal library.
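One way to turn a shallow test into a projection is a Lander-Waterman-style complexity estimate. The sketch below assumes contacts are sampled uniformly from a finite pool of C distinct molecules, so that the expected number of unique observations after N draws is C(1 - e^(-N/C)). Real libraries deviate from this idealization, so treat the output as a rough planning figure; the function names are hypothetical.

```python
import math


def expected_unique(library_size, total):
    """Expected distinct molecules seen after `total` uniform draws."""
    return library_size * (1 - math.exp(-total / library_size))


def estimate_library_size(total, unique):
    """Solve unique = C * (1 - exp(-total / C)) for C by bisection."""
    if not 0 < unique < total:
        raise ValueError("need 0 < unique < total (some duplicates observed)")
    lo = hi = float(unique)
    while expected_unique(hi, total) < unique:  # widen until the root is bracketed
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_unique(mid, total) < unique:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def projected_duplication(library_size, total):
    """Duplicate fraction expected at a given sequencing depth."""
    return 1 - expected_unique(library_size, total) / total


if __name__ == "__main__":
    size = estimate_library_size(total=200_000, unique=181_269)
    print(f"Estimated library size: {size:,.0f}")
    print(f"Projected duplication at 2M contacts: {projected_duplication(size, 2_000_000):.2f}")
```

If the projected duplication at the planned depth exceeds the ceiling you are working to, that is a signal to revisit input amount or size selection before scaling.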
Preparation of HiFi-C sequencing libraries from proximity-ligated chromatin fragments.
If you prefer to benchmark your library against established wet-lab and bioinformatics gates before scale-up, consider a pilot with a specialized provider; many groups align their QC and reporting against a standardized process, such as our HiFi-C sequencing service, during method validation. The goal is simply to verify complexity and contact-map behavior before committing deeper budgets.
The following consolidated metrics are widely referenced for Hi-C-like data quality. Treat them as planning ranges that you'll interpret alongside organism size, protocol details, and experimental goals. See linked references in surrounding text for methodology and rationale.
| QC aspect | Practical range/goal | Why it matters |
|---|---|---|
| Library size distribution | Peak/N50 in multi-kb window compatible with HiFi; minimal sub-kb tail | Preserves long concatemers that HiFi reads can traverse efficiently |
| Duplication rate (deep sets) | Aim < ~40% | Preserves unique yield; high duplication signals low complexity (see Bickhart 2022; Yardımcı 2019) |
| Mean HiFi read length and quality | Typical HiFi 10–25 kb; Q≥30 for most reads | Long, accurate reads improve mapping in repeats and support phasing |
| Usable/valid contacts | Sufficient for target resolution per modeling (e.g., HiCRes) | Confirms ligation chemistry worked and depth will meet analysis goals |
| Cis/trans ratio (human-scale) | ~40–60 | Balanced intra- vs inter-chromosomal signal (Lajoie 2014) |
| P(s) decay curve | Monotonic with expected slope | Detects over/under-crosslinking and ligation/mapping artifacts (Yardımcı 2019) |
| Scaffold N50 uplift (assembly projects) | Clear increase vs pre-HiFi-C assembly | Confirms that interaction data enable chromosome-scale scaffolding |
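
The gates in the table can be encoded as a simple pass/fail check for pilot runs. The sketch below is illustrative only: the field names are hypothetical, the 3 kb N50 floor is an assumed reading of "multi-kb", and the cis/trans value is taken in whatever units the table's ~40–60 range uses.

```python
from dataclasses import dataclass


@dataclass
class PilotMetrics:
    insert_n50_bp: int
    duplication_rate: float  # fraction between 0 and 1
    median_read_q: float     # Phred-scaled read quality
    cis_trans: float         # same units as the table's ~40-60 range


def qc_gate(m):
    """Return a dict mapping each gate name to True (pass) or False (fail)."""
    return {
        "insert_n50": m.insert_n50_bp >= 3_000,   # assumed floor for "multi-kb"
        "duplication": m.duplication_rate < 0.40,
        "read_quality": m.median_read_q >= 30,
        "cis_trans": 40 <= m.cis_trans <= 60,
    }


if __name__ == "__main__":
    pilot = PilotMetrics(insert_n50_bp=8_500, duplication_rate=0.22,
                         median_read_q=33, cis_trans=52)
    results = qc_gate(pilot)
    print(results, "-> GO" if all(results.values()) else "-> REVIEW")
```

Returning every gate rather than a single boolean makes it easier to see which step of the workflow to revisit when a pilot falls short.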
HiFi-C sequencing benefits from the properties of PacBio HiFi reads: long consensus reads (commonly 10–25 kb) with high per-read accuracy (often Q30 or better). Those traits improve alignment specificity and help traverse repetitive or structurally complex regions—exactly where scaffolding and SV detection depend on clean, long-range contact evidence.
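For readers less familiar with Phred scaling, quality Q relates to error probability p by Q = -10 log10(p), so Q30 corresponds to roughly one error per 1,000 bases (99.9% accuracy). A minimal conversion sketch, with illustrative function names:

```python
import math


def phred_to_error(q):
    """Per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for a per-base error probability."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    read_length = 15_000  # hypothetical read length within the 10-25 kb range
    for q in (20, 30, 40):
        p = phred_to_error(q)
        print(f"Q{q}: accuracy {1 - p:.4%}, ~{p * read_length:.1f} expected errors per read")
```

Seen this way, the gap between Q20 and Q30 over a multi-kilobase read is an order of magnitude in expected mismatches, which is part of why high per-read accuracy helps alignment in repetitive sequence.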
Resolution scales with read count. For short-read Hi-C, modeling frameworks estimate achievable bin resolution vs. depth; the same logic guides HiFi-C planning. According to Marchal et al., 2022 (Nucleic Acids Research, HiCRes), you can predict target resolution from downsampled data; apply this approach to a pilot HiFi-C subset to set budgets realistically. For chromosome-scale scaffolding, depth requirements are often lower than for high-resolution loop detection; existing scaffolding evaluations like Bickhart et al., 2022 (Genome Research/PMC) discuss trade-offs between contiguity, noise, and depth.
Balance SMRT Cell allocation against library complexity and read length distribution. If test-run metrics show high duplication or a short peak, revisit size selection or cleanup before scaling. Think of it this way: you're trading throughput for unique contacts; improving complexity usually yields better returns than simply adding more cells.
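The trade-off between complexity and cell count can be made concrete with back-of-the-envelope arithmetic. In the sketch below every input is hypothetical (reads per cell, contacts per read, and the target are placeholders to be replaced with pilot measurements), and duplication is treated as constant, although in practice it rises with depth.

```python
import math


def smrt_cells_needed(target_unique_contacts, reads_per_cell,
                      contacts_per_read, duplication_rate):
    """Cells required to reach a target number of unique contacts.

    Assumes duplication_rate stays constant across cells, which is a
    simplification; real duplication increases with sequencing depth.
    """
    unique_per_cell = reads_per_cell * contacts_per_read * (1 - duplication_rate)
    return math.ceil(target_unique_contacts / unique_per_cell)


if __name__ == "__main__":
    for dup in (0.2, 0.5):
        cells = smrt_cells_needed(100_000_000, 5_000_000, 6, dup)
        print(f"duplication {dup:.0%}: {cells} cells")
```

With these placeholder numbers, moving from 50% to 20% duplication reduces the requirement from 7 cells to 5, which illustrates why fixing complexity first tends to be the cheaper route.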
After mapping and filtering, you'll build contact maps to drive several downstream analyses.
Map long reads, parse ligation junctions, and convert to contact pairs/matrices. Evaluate cis/trans ratios, distance-stratified contact densities, and P(s) decay to confirm signal quality. The expected shape characteristics and reproducibility considerations are summarized in Yardımcı & Noble's 2019 assessment (Genome Biology).
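Both the cis share of contacts and the P(s) curve can be computed directly from a list of contact pairs. The sketch below assumes pairs are available as hypothetical `(chrom1, pos1, chrom2, pos2)` tuples and uses log-spaced distance bins normalized by bin width; it reports the cis fraction, which is one common way to express cis/trans balance, and is not a substitute for an established Hi-C toolkit.

```python
import math
from collections import Counter


def cis_fraction(pairs):
    """Fraction of contacts whose two ends lie on the same chromosome."""
    cis = sum(1 for c1, _, c2, _ in pairs if c1 == c2)
    return cis / len(pairs)


def ps_curve(pairs, min_dist=1_000, bins_per_decade=4):
    """Contact density per log-spaced distance bin, using cis pairs only.

    Returns (bin_start_bp, fraction_of_cis_contacts_per_bp) tuples sorted
    by distance. Pairs closer than min_dist are ignored.
    """
    counts = Counter()
    for c1, p1, c2, p2 in pairs:
        d = abs(p2 - p1)
        if c1 == c2 and d >= min_dist:
            counts[math.floor(math.log10(d / min_dist) * bins_per_decade)] += 1
    total = sum(counts.values())
    curve = []
    for b in sorted(counts):
        start = min_dist * 10 ** (b / bins_per_decade)
        end = min_dist * 10 ** ((b + 1) / bins_per_decade)
        curve.append((start, counts[b] / total / (end - start)))
    return curve


if __name__ == "__main__":
    pairs = [("chr1", 0, "chr1", 2_000), ("chr1", 0, "chr1", 3_000),
             ("chr1", 0, "chr1", 20_000), ("chr1", 0, "chr2", 500)]
    print("cis fraction:", cis_fraction(pairs))
    for start, density in ps_curve(pairs, bins_per_decade=1):
        print(f"{start:>10.0f} bp  {density:.2e}")
```

Plotting the curve on log-log axes makes departures from a smooth, monotonic decay easy to spot, such as an excess at very short distances or a flattened long-range tail.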
Integrating HiFi-C contact maps with long-read assemblies commonly lifts contiguity from contig-level N50 values to chromosome-scale scaffolds, provided the interaction data are clean. Technical reviews have documented how Hi-C-type data aid scaffolding and how to evaluate results; see Bickhart et al., 2022 (Genome Research/PMC) and the perspective on what drives chromosome-scale improvements in Kadota & Nishiyama, 2020 (GigaScience).
Long-range contacts reinforce and disambiguate SV calls in repeat-rich contexts, while high-accuracy reads improve mappability. For 3D genome studies (compartments, domain-like structures, and loops), plan for additional depth beyond scaffolding needs; reviews such as Lajoie et al., 2014 (Methods) outline how contact features map to biological structures. Long-read multi-contact evidence further strengthens interpretation in complex regions, as shown by the CiFi study (2024/2025).
If you're validating a new organism or tissue type and want an independent checkpoint on library QC and contact-map behavior, you can optionally benchmark against a standardized lab+informatics process via our HiFi-C sequencing service during pilot runs. This is strictly about method validation—confirming complexity, cis/trans balance, and P(s) shape before committing full budgets.
Poor crosslinking undermines contact capture. Under-crosslinking reduces true long-range cis interactions and increases artifacts like self-circles and dangling ends. If your cis/trans drops well below ~30 on human-scale data or long-range contacts are markedly depressed, re-tune crosslink time (for example, extend by 2–5 minutes) and verify that quenching and washes are timely and complete. Over-crosslinking leads to poor digestion and ligation and an excess of ultra-short-distance contacts—dial back time slightly, ensure fresh FA stocks, and confirm incubation temperature.
DNA degradation or shearing lowers unique yield. Harsh nuclei handling, excessive pipetting, or vortexing can shift the library's peak down and erode N50, leading to high duplication. Switch to wide-bore tips, minimize transfers, and reassess cleanup bead ratios and elution volumes to protect long concatemers.
Low library complexity inflates duplicates. A narrow insert distribution or over-amplification (if any PCR steps are used) boosts redundancy. Consider broadening the size-selection window, increasing input mass (when available), or revisiting cleanup to improve recovery of mid-length to long fragments.
Insufficient sequencing depth yields sparse matrices. Downsample pilot data and use a resolution-vs-depth model (e.g., HiCRes) to project needs; scale SMRT Cells accordingly.
Choose HiFi-C when you need chromosome-level scaffolding from a contiguous long-read assembly; when structural variation must be clarified in complex or repetitive regions; and when 3D genome features (compartments, domain-like structures, loops) matter for your questions. It's also compelling for repeat-heavy or polyploid genomes where long-read multi-contact evidence improves mappability and phasing relative to short-read Hi-C alone, as highlighted by CiFi (2024/2025). What's the smallest pilot that would still give you a confident read on complexity and cis/trans? In practice, even a fraction of a SMRT Cell can provide enough signal to decide go/no-go for scale-up.
Start with the baseline crosslink SOP (1% FA, 10 min RT; quench 0.125 M glycine, 5 min on ice) and run a small pilot to check library size distribution, duplication, cis/trans, and P(s) shape. Use those readouts to lock parameters before large-scale sequencing. If you'd like an external checkpoint or need to accelerate method validation, you can coordinate a limited pilot through our HiFi-C sequencing service to benchmark QC and projected depth.
Author: Dr. Yang H., Senior Scientist at CD Genomics
LinkedIn: https://www.linkedin.com/in/yang-h-a62181178/
For research purposes only, not intended for personal diagnosis, clinical testing, or health assessment