Using Hi-C to Refine Plant Genome Assemblies After Nanopore Sequencing
Nanopore sequencing has changed the baseline for plant genome assembly.
In many projects, long reads already produce contigs that are far more continuous than earlier short-read drafts, especially across repeat-rich regions and structurally difficult parts of the genome.
For many teams, the first real question is no longer whether long reads improve assembly quality. It is whether the resulting draft has actually crossed the line from sequence generation into structural finishing.
That distinction matters. A plant assembly can be locally strong and still remain incomplete at chromosome scale. Contigs may be long, but they may not yet be reliably assigned to chromosomes, ordered across long genomic intervals, or oriented with enough confidence for annotation and downstream comparative work.
This is where Hi-C often becomes useful. It does not replace long reads. It adds a different layer of evidence: long-range structural information that sequence overlap alone cannot provide.
This article focuses on a practical decision that many plant genome projects eventually face: after Nanopore assembly, when does Hi-C still add real value, and what should a team look for before treating scaffolding as the next logical step?
Why Hi-C Still Matters After Long-Read Assembly
Nanopore sequencing improves local continuity, while Hi-C adds long-range structural evidence for chromosome-scale organization.
Local Continuity From Long Reads
Nanopore assembly usually improves several things at once:
- contig length
- gap reduction
- repeat traversal
- local sequence continuity
- recovery of difficult genomic regions
Those gains are often large enough that the draft already looks "good" by conventional assembly standards. But a long contig set is not yet a chromosome-level genome.
What Remains at Chromosome Scale
At that stage, the unresolved questions are usually structural rather than sequence-based. Common issues include:
- chromosome assignment
- contig order across long genomic distances
- contig orientation
- confidence in large joins
- scaffold-level plausibility for downstream annotation
The Structural Role of Hi-C
Hi-C contributes long-range contact information that helps group, order, and orient contigs at chromosome scale. In practical terms, long reads build the pieces; Hi-C helps place them into a larger chromosomal framework.
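As a toy illustration of the orientation step, the sketch below picks the pair of contig ends that share the most contacts. The contact counts, the end labels, and the `best_end_pair` helper are all hypothetical. Real scaffolders use more elaborate statistical models than a single maximum.

```python
from typing import Dict, Tuple

EndPair = Tuple[str, str]


def best_end_pair(end_contacts: Dict[EndPair, int]) -> EndPair:
    """Return the pair of contig ends with the highest contact count."""
    return max(end_contacts, key=lambda pair: end_contacts[pair])


# Hypothetical Hi-C contact counts between the ends of contig A and contig B.
contacts = {
    ("A_tail", "B_head"): 120,
    ("A_tail", "B_tail"): 35,
    ("A_head", "B_head"): 30,
    ("A_head", "B_tail"): 10,
}

# The strongest signal suggests A's tail sits next to B's head,
# i.e. both contigs keep their current orientation in the order A, B.
joined_ends = best_end_pair(contacts)
```

Here the strongest signal links the tail of A to the head of B, which would support placing the two contigs in the order A then B without flipping either one.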
For method background, see the introduction to Hi-C technology.
Figure 1. Long reads improve contiguity within genomic regions, while Hi-C provides the long-range evidence needed to organize those contigs into chromosome-scale scaffolds.
Defining the Scaffolding Objective
Hi-C projects are easier to design and evaluate when the assembly objective is explicit.
Common Scaffolding Objectives
| Primary Objective | What Hi-C Is Being Used For | Typical Output |
|---|---|---|
| Chromosome anchoring | assigning contigs to chromosomes and building pseudomolecules | chromosome-scale scaffolds |
| Structural review | checking suspicious joins or scaffold inconsistencies | corrected scaffold structure |
| Haplotype-aware scaffolding | preserving biologically distinct but similar regions | more conservative, cleaner scaffold structure |
For a relatively straightforward diploid genome, chromosome anchoring may be the main goal. The team wants to move from long contigs to chromosome-scale pseudomolecules that are ready for annotation.
In other projects, especially where there are concerns about large-scale joins, Hi-C is more valuable as a structural review tool than as a continuity booster.
In highly heterozygous or polyploid systems, the real challenge may be to avoid combining sequences that should remain separate.
A well-defined project brief usually sounds like one of these:
- "We need chromosome anchoring for downstream annotation."
- "We need to review large-scale scaffold joins before finalizing pseudomolecules."
- "We need a more conservative scaffold structure for a highly heterozygous assembly."
That framing is much more useful than simply saying the team wants a "better assembly."
Assembly Readiness Before Hi-C Scaffolding
Hi-C contributes most when the draft assembly is already reasonably stable.
Features of a Scaffold-Ready Draft
A draft is usually in good shape for Hi-C when:
- contigs are long enough for long-range linking to add value
- polishing is largely complete
- contamination has been reviewed
- organellar sequences have been checked
- the main open questions concern scaffold structure, not local sequence quality
Conditions That Still Require Cleanup
Hi-C is often premature when:
- contigs remain highly fragmented
- obvious local misassemblies are still unresolved
- chloroplast or mitochondrial sequences remain mixed into the nuclear assembly
- base-level quality remains unstable
- the assembly is still changing substantially after polishing
At this stage, it helps to separate "drafts that are ready to be organized" from "drafts that still need to be stabilized."
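The two checklists above can be turned into a simple pre-scaffolding review. The sketch below is only one way to record it: the `DraftStatus` fields, the `readiness_blockers` helper, and the contig N50 cutoff are all hypothetical, and the cutoff in particular is a project-specific assumption rather than a recommended value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DraftStatus:
    """Hypothetical summary of a draft assembly before Hi-C scaffolding."""
    contig_n50_bp: int
    polishing_complete: bool
    contamination_reviewed: bool
    organelles_checked: bool
    open_local_misassemblies: int


def readiness_blockers(draft: DraftStatus, min_contig_n50_bp: int) -> List[str]:
    """List reasons the draft may still need cleanup; empty means none found."""
    blockers = []
    if draft.contig_n50_bp < min_contig_n50_bp:
        blockers.append("contigs may be too fragmented")
    if not draft.polishing_complete:
        blockers.append("polishing not complete")
    if not draft.contamination_reviewed:
        blockers.append("contamination not reviewed")
    if not draft.organelles_checked:
        blockers.append("organellar sequences not checked")
    if draft.open_local_misassemblies > 0:
        blockers.append("local misassemblies unresolved")
    return blockers


# Illustrative values only; the N50 cutoff is an assumption for this example.
draft = DraftStatus(
    contig_n50_bp=2_500_000,
    polishing_complete=True,
    contamination_reviewed=True,
    organelles_checked=False,
    open_local_misassemblies=0,
)
blockers = readiness_blockers(draft, min_contig_n50_bp=1_000_000)
```

An empty list does not prove the draft is ready. It only means none of the listed conditions was flagged.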
Figure 2. Hi-C is usually most productive when the remaining uncertainty is scaffold-level rather than base-level or contamination-related.
The Timing of the Hi-C Step
Timing matters more than many teams expect. If Hi-C is introduced too early, the project risks using expensive long-range data to organize a draft that is not yet structurally trustworthy. If it is introduced too late, the added value may be limited because the assembly is already structurally close to its practical endpoint.
For workflow expectations from the experimental side, the Hi-C protocols and analysis guide is a helpful reference.
Plant Genome Complexity and Hi-C Interpretation
Plant genome scaffolding is rarely just a sequencing problem. In many cases, the biology of the genome shapes the outcome as much as data generation does.
Figure 3. Plant genome biology changes the scaffolding problem by introducing repeat-driven ambiguity, haplotype complexity, ploidy-related challenges, and organellar interference.
Repeat Content
Repeats are one of the most obvious complications. In a repeat-rich plant genome, long contigs may still correspond to structurally ambiguous regions. Hi-C contacts from repetitive sequence are often less clean than those from unique sequence, which means that strong continuity does not automatically translate into confident chromosome organization.
Heterozygosity
Highly heterozygous genomes introduce another form of ambiguity. If the draft contains partially separated haplotypes, contact patterns may reflect biologically real relationships within one haplotype while becoming misleading when forced into a single consensus structure.
Polyploidy
Polyploid genomes make the interpretation problem harder still. Homeologous chromosomes may be similar enough to complicate assignment while still representing distinct biological units. In those settings, scaffolding is no longer just about making sequences longer. It is about preserving chromosomal distinction while still building a usable structural framework.
Organellar Carryover
Organellar DNA is also more than a cleanup detail. Chloroplast and mitochondrial sequences can remain mixed into the nuclear assembly and generate strong contact signals that distort scaffolding if they are not handled properly.
Taken together, these factors explain why genome size alone is not a useful basis for planning a plant Hi-C project. A realistic scope review also needs to consider:
- repeat burden
- ploidy
- heterozygosity
- assembly provenance
- tissue source
- downstream use of the scaffolded genome
For computational context on contact-map interpretation, the Hi-C sequencing data analysis overview is worth reviewing.
Hi-C Library Quality and Data Planning
Useful scaffolding depends not only on the assembly, but also on the quality of the Hi-C library and the amount of usable contact data.
The experimental side matters more than many readers expect. Nuclei quality, crosslinking consistency, and ligation efficiency all influence how much of the final dataset consists of meaningful long-range contacts rather than technical background. Plant tissues can be especially demanding because of cell walls, secondary metabolites, and variable nuclear yield.
A weak library does not just reduce signal quality. It can make a structurally sound assembly look more ambiguous than it actually is.
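One common way to get a first impression of library quality is to split mapped read pairs into short-range, long-range, and between-contig contacts. The sketch below assumes pairs are already available as simple tuples. The `summarize_pairs` helper, the tuple layout, and the 20 kb distance cutoff are assumptions made for illustration, not fixed standards.

```python
from typing import Dict, Iterable, Tuple

# (contig_1, position_1, contig_2, position_2); a hypothetical layout.
Pair = Tuple[str, int, str, int]


def summarize_pairs(pairs: Iterable[Pair], long_range_bp: int) -> Dict[str, float]:
    """Return the fraction of pairs in each contact class."""
    counts = {"cis_short": 0, "cis_long": 0, "trans": 0}
    for contig_1, pos_1, contig_2, pos_2 in pairs:
        if contig_1 != contig_2:
            counts["trans"] += 1       # ends map to different contigs
        elif abs(pos_2 - pos_1) >= long_range_bp:
            counts["cis_long"] += 1    # same contig, far apart
        else:
            counts["cis_short"] += 1   # same contig, close together
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


# Four made-up read pairs; the 20 kb cutoff is an assumption.
example_pairs = [
    ("contig_1", 1_000, "contig_1", 1_500),
    ("contig_1", 1_000, "contig_1", 60_000),
    ("contig_1", 5_000, "contig_2", 700),
    ("contig_2", 100, "contig_2", 90_000),
]
fractions = summarize_pairs(example_pairs, long_range_bp=20_000)
```

On a contig-level draft, a "trans" pair only means the two ends map to different contigs. Those contigs may still belong to the same chromosome, so this class should be read with care before scaffolding.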
At the same time, sequencing depth should not be treated as a fixed number that applies across all plant projects. Data requirements shift with:
- genome size
- repeat burden
- ploidy
- contig structure
- scaffold-level ambition
- downstream use case
The better question is not "How much Hi-C data do we need?" but "How much scaffold confidence does the downstream use require?" A genome being prepared for annotation and comparative genomics usually demands a higher standard than one being scaffolded mainly for internal structural review.
Software Choice and Contact-Map Review
Software choice still matters. Different scaffolders can produce meaningfully different results even when they are applied to the same assembly and Hi-C dataset. Benchmarking studies (Hou et al., 2023; Obinu et al., 2024) have shown that continuity, correctness, and scaffold behavior can vary substantially across tools, particularly in plant genomes with higher structural complexity.
That is why scaffolding should not be treated as a one-command step. A realistic workflow usually includes:
- choosing a scaffolding tool that matches the genome context
- running an initial scaffolding pass
- reviewing the resulting contact map
- correcting suspicious joins where needed
- deciding whether the output is structurally defensible enough for downstream use
Contact-map review remains especially important. Statistics alone do not always reveal problematic joins. In many plant assemblies, suspicious off-diagonal signals, diagonal breaks, or unusual long-range patterns become visible only through map inspection.
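To make the idea of a diagonal break concrete, the sketch below scores each bin boundary in a binned contact matrix by the contacts that cross it, then flags boundaries whose score falls well below the median. The `crossing_scores` and `candidate_breaks` helpers, the window size, and the ratio cutoff are hypothetical. The toy matrix is synthetic and ignores normalization and distance decay, so this is an aid to map inspection rather than a substitute for it.

```python
from statistics import median
from typing import Dict, List

Matrix = List[List[float]]


def crossing_scores(matrix: Matrix, window: int) -> Dict[int, float]:
    """Sum contacts between the `window` bins before and after each boundary."""
    n = len(matrix)
    scores = {}
    for boundary in range(window, n - window + 1):
        total = 0.0
        for i in range(boundary - window, boundary):
            for j in range(boundary, boundary + window):
                total += matrix[i][j]
        scores[boundary] = total
    return scores


def candidate_breaks(matrix: Matrix, window: int, ratio: float) -> List[int]:
    """Flag boundaries whose crossing score is below `ratio` times the median."""
    scores = crossing_scores(matrix, window)
    if not scores:
        return []
    typical = median(scores.values())
    return [b for b, score in scores.items() if score < ratio * typical]


# Synthetic 8-bin map: two blocks of 4 bins with strong internal contacts
# and weak contacts between blocks, mimicking a suspicious join at bin 4.
toy_map = [
    [10.0 if i // 4 == j // 4 else 1.0 for j in range(8)]
    for i in range(8)
]
flagged = candidate_breaks(toy_map, window=2, ratio=0.5)
```

In this toy map the only flagged boundary is the one between the two blocks. A flagged boundary is a prompt to look at the map, since low crossing signal can also come from repeats or poorly mappable sequence.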
For broader analytical context, CD Genomics' advanced Hi-C analysis guide and its comparison of Hi-C, Micro-C, and Capture Hi-C are useful background resources.
Evaluating the Scaffolded Assembly
A scaffolded assembly should be judged by structural usefulness, not just by whether the N50 increased.
The Limits of N50
N50 remains useful, but it is incomplete. A scaffold set can become longer and still become less trustworthy structurally. Aggressive joining can inflate continuity metrics while introducing misjoins that later complicate annotation, synteny analysis, or downstream biological interpretation.
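A small worked example shows why N50 alone cannot signal correctness. The sketch below uses the standard definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length). The sequence lengths are made up for illustration.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50 of a set of sequence lengths (0 for an empty set)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Made-up lengths in Mb for a four-contig draft.
before_join = [40, 30, 20, 10]

# The same draft after joining the two largest contigs.
# N50 is computed the same way whether or not that join is correct.
after_join = [70, 20, 10]

n50_before = n50(before_join)
n50_after = n50(after_join)
```

N50 rises from 30 to 70 here, and the metric would report the same gain for a correct join and for a misjoin.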
Figure 4. A stronger scaffolded assembly is defined by chromosome-scale plausibility, structural confidence, and downstream usability, not by continuity metrics alone.
A More Complete Evaluation
A stronger evaluation asks whether the assembly is becoming more usable, not simply longer.
| Continuity-Only View | Structure-Aware View |
|---|---|
| Are scaffolds longer? | Are scaffolds structurally more plausible? |
| Did N50 improve? | Did chromosome assignment improve? |
| Does the summary table look better? | Does the contact map support the scaffold structure? |
| Did continuity increase? | Is the assembly more usable for annotation and comparison? |
A more complete evaluation usually includes:
- chromosome assignment completeness
- contact-map structure
- evidence of misjoin review and correction
- remaining unresolved scaffold issues
- readiness for annotation
- suitability for comparative genomics
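Chromosome assignment completeness is often summarized as the fraction of assembled bases placed in chromosome-scale scaffolds. The sketch below assumes the scaffold lengths and the set of chromosome-scale scaffold names are already known. The `anchored_fraction` helper, the scaffold names, and the lengths are hypothetical.

```python
from typing import Dict, Set


def anchored_fraction(scaffold_lengths: Dict[str, int], chromosomes: Set[str]) -> float:
    """Return the share of total bases that sit in chromosome-scale scaffolds."""
    total = sum(scaffold_lengths.values())
    if total == 0:
        return 0.0
    anchored = sum(
        length for name, length in scaffold_lengths.items() if name in chromosomes
    )
    return anchored / total


# Made-up scaffold lengths in Mb.
scaffolds = {
    "chr1": 500,
    "chr2": 400,
    "unplaced_1": 60,
    "unplaced_2": 40,
}
fraction_anchored = anchored_fraction(scaffolds, chromosomes={"chr1", "chr2"})
```

A high anchored fraction says how much sequence was placed, not whether it was placed correctly, so it belongs alongside contact-map review rather than in place of it.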
Deliverables That Are Actually Useful
A useful deliverable should include more than summary assembly statistics. It should also include:
- a chromosome anchoring summary
- contact-map review outputs
- a record of structural corrections
- a realistic description of unresolved limitations
A Staged Path for Complex Plant Projects
Some plant genome projects are structurally straightforward enough to move directly into full scaffolding. Others are not. In complex cases, a staged approach can save time and reduce risk.
This is particularly true for:
- very large genomes
- highly repetitive assemblies
- polyploid material
- highly heterozygous drafts
- uncertain nuclei preparation quality
- first-time scaffolding in a poorly characterized species
A staged review can help answer practical questions before a full effort is committed:
- Is the draft genuinely scaffold-ready?
- Is the library quality likely to be high enough?
- Is the chromosome-scale target realistic?
- Is manual curation likely to be minor or substantial?
The value of this step is not simply to produce more QC. It is to produce a clearer project decision: proceed, refine the draft first, adjust the Hi-C data target, or narrow the scaffolding goal.
Preparing a Scope Review
When teams ask for Hi-C scaffolding support, the quality of the first technical brief often determines how useful the first conversation will be.
At minimum, a good scope review should include:
- current assembly metrics
- assembly method
- polishing status
- estimated genome size
- ploidy notes
- heterozygosity notes
- tissue source
- intended scaffolding outcome
- downstream use case
It also helps to flag major complications early, especially:
- high repeat burden
- polyploid or hybrid material
- organellar carryover concerns
- unstable contigs
- phased budgeting needs
A scaffolding-readiness review is usually more productive than a generic pricing request. Framing the conversation around assembly status, contact-data needs, and scaffold-level goals leads to a much more useful technical discussion.
For projects ready to discuss Hi-C-based finishing, the CD Genomics Hi-C service page is a reasonable starting point. The service is offered for research use only.
Final Perspective
Hi-C adds the clearest value after Nanopore assembly when the project has already moved beyond contig building and into structural finishing. The important question is rarely whether Hi-C can produce more data. The more useful question is whether those data are being applied to the right assembly problem at the right stage.
A scaffolded plant genome is only better when it becomes:
- more structurally trustworthy
- more defensible at chromosome scale
- more useful for annotation and comparison
Frequently Asked Questions
Can Hi-C scaffolding correct local sequence errors?
Not in the same way polishing does. Hi-C mainly helps with long-range structure. It can help flag large misjoins, but it does not replace base-level correction.
How does polyploidy change what Hi-C can realistically do?
Polyploidy increases ambiguity between related chromosomes. In practice, that usually means more conservative scaffolding, more cautious interpretation, and less confidence in some joins than would be possible in a simpler diploid genome.
Can automated scaffolding output be used directly?
Sometimes in simpler genomes. In many plant projects, contact-map review still adds value because plausible-looking joins can still be structurally questionable.
When does Hi-C add the clearest value?
Usually when the draft is already contiguous and stable, contamination has been reviewed, and the remaining challenge is long-range chromosome organization rather than local sequence quality.
References
- Hou Y, Wang L, Pan W. Comparison of Hi-C-based scaffolding tools on plant genomes. Genes. 2023;14(12):2147. https://doi.org/10.3390/genes14122147
- Obinu L, Trivedi U, Porceddu A. Benchmarking of Hi-C tools for scaffolding plant genomes obtained from PacBio HiFi and ONT reads. Frontiers in Bioinformatics. 2024;4:1462923. https://doi.org/10.3389/fbinf.2024.1462923
- Sun H, Jiao W-B, Krause K, et al. Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar. Nature Genetics. 2022;54:342-348. https://doi.org/10.1038/s41588-022-01015-0
- Walkowiak S, Gao L, Monat C, et al. Multiple wheat genomes reveal global variation in modern breeding. Nature. 2020;588:277-283. https://doi.org/10.1038/s41586-020-2961-x




