Genome Assembly Strategy Solution

A genome assembly project should not begin with a platform choice alone. It should begin with the biological question, the species, the sample quality, the expected genome complexity, and the downstream analyses the final assembly must support.

CD Genomics provides a Genome Assembly Strategy Solution to help research teams design de novo, chromosome-level, haplotype-resolved, or T2T-like assembly workflows. The solution combines PacBio, Nanopore, and Hi-C sequencing with assembly QC review, genome annotation, and downstream-ready bioinformatics.

We help you choose an assembly plan before sequencing begins, so your project is built around the genome resource you actually need.

  • Select the right assembly level before sequencing
  • Combine PacBio, Nanopore, Hi-C, or hybrid strategies
  • Plan for complex, repetitive, heterozygous, or polyploid genomes
  • Review N50, BUSCO, QV, completeness, and contamination metrics
  • Receive annotation and downstream-ready genome outputs

Solution Highlights

  • Assembly level selection
  • PacBio, Nanopore, Hi-C, or hybrid design
  • Complex genome planning
  • QC, annotation, and downstream-ready outputs

Built for research teams planning usable genome resources, not just assembly files.

    Genome assembly strategy decision framework

    Explore how assembly level, genome complexity, sample quality, sequencing strategy, QC, annotation, and downstream analysis connect.

    Start with the Assembly Level Your Research Actually Needs

    Many genome assembly pages begin by comparing sequencing platforms. In practice, the better starting point is the assembly level. A microbial genome, a first draft for a non-model species, a chromosome-level plant genome, and a haplotype-resolved animal genome do not need the same plan.

    Before recommending a workflow, we help you define what the final assembly must support. A project for gene discovery may need a different assembly and annotation strategy from a project focused on trait mapping, structural variation, pan-genome construction, or population genomics.

    Draft or contig-level assembly for early genome resources

    A draft or contig-level assembly may be suitable when your goal is an early reference resource, broad gene discovery, microbial genome reconstruction, or preliminary comparative analysis. It can be useful when the genome is compact, the research question does not require chromosome-scale ordering, or the project is designed as a first step before deeper analysis.

    This level can provide a practical starting point, but it may not fully support linkage analysis, chromosome-scale structural interpretation, or complex repeat resolution.

    Chromosome-level assembly for linkage, trait, and comparative studies

    Chromosome-level assembly is often needed when genomic position matters. This includes trait mapping, breeding research, chromosome evolution, synteny analysis, comparative genomics, and many plant or animal genome projects.

    Hi-C scaffolding can help order and orient assembled contigs into chromosome-scale scaffolds. For projects where the final assembly will support downstream mapping or comparative work, chromosome-level structure can be more valuable than a fragmented assembly with high local accuracy but limited long-range organization.

    Haplotype-resolved assembly for heterozygous or polyploid genomes

    For highly heterozygous, outbred, hybrid, or polyploid organisms, a single collapsed representation may hide important allele- or haplotype-specific structure. In these cases, haplotype-resolved or phased assembly may be useful.

    This strategy can be important for plant and animal breeding, allele-specific gene discovery, structural variation analysis, and projects where subgenomes or homologous chromosomes need careful interpretation.

    T2T-like assembly when repeats, centromeres, and telomeres matter

    A T2T-like strategy may be considered when unresolved gaps, long repeats, centromeric regions, telomeric regions, or complex structural regions are central to the study. This is usually a higher-demand project type because it depends heavily on sample quality, read length, assembly strategy, and manual or custom review.

    Not every project needs a T2T-like assembly. We help you decide whether this level of resolution is necessary for your research question or whether a chromosome-level or phased assembly would be more practical.
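
    The assembly levels described above can be read as a simple decision order. The sketch below is illustrative only: the `ProjectGoals` fields, the `suggest_assembly_level` function, and the priority order are hypothetical names and assumptions made for this example, not a fixed rule used in project review. A real recommendation also weighs sample quality and feasibility.

```python
from dataclasses import dataclass


@dataclass
class ProjectGoals:
    """Hypothetical summary of what the final assembly must support."""
    needs_chromosome_position: bool = False  # trait mapping, synteny, linkage
    needs_haplotypes: bool = False           # heterozygous, hybrid, or polyploid focus
    needs_repeat_resolution: bool = False    # centromeres, telomeres, long repeats


def suggest_assembly_level(goals: ProjectGoals) -> str:
    """Return a starting-point assembly level.

    Assumption: the most demanding requirement wins, so the checks run
    from the highest-demand level down to a draft assembly.
    """
    if goals.needs_repeat_resolution:
        return "T2T-like"
    if goals.needs_haplotypes:
        return "haplotype-resolved"
    if goals.needs_chromosome_position:
        return "chromosome-level"
    return "draft or contig-level"


print(suggest_assembly_level(ProjectGoals(needs_chromosome_position=True)))
# prints: chromosome-level
```

    Checking the most demanding requirement first is a design choice for this sketch; it keeps the output from under-specifying a project whose central question depends on repeats or haplotypes.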

    Match the Sequencing Strategy to Genome Complexity

    Once the target assembly level is clear, the sequencing strategy becomes easier to design. Different genomes require different evidence layers. Genome size, repeat content, ploidy, heterozygosity, contamination risk, and sample quality all affect the final assembly plan.

    When PacBio HiFi is the accuracy anchor

    PacBio SMRT Sequencing can support genome assembly projects that require long-read evidence with high consensus accuracy. PacBio HiFi reads are often valuable for de novo genome assembly because they combine long-read structure with high per-read accuracy.

    PacBio HiFi can be especially useful when the project needs reliable consensus quality, strong gene-space recovery, and a clean foundation for annotation.

    When Nanopore ultra-long reads help bridge repeats

    Nanopore sequencing can be useful when the genome contains long repeats, large structural regions, or gaps that require longer spanning evidence. For some complex genomes, ultra-long reads can help bridge regions that shorter reads cannot resolve.

    Nanopore data may also be considered in T2T-like strategies or projects where read length is a major advantage.

    When Hi-C is needed for chromosome-scale scaffolding

    Hi-C Sequencing Service provides long-range contact information that can help order and orient contigs into chromosome-scale scaffolds. It is especially relevant when the final assembly needs chromosome-level structure.

    Hi-C is not simply an optional add-on. When the research goal depends on chromosome-scale organization, Hi-C or another long-range scaffolding approach may be a key part of the strategy.

    When short-read polishing or hybrid evidence still helps

    Short-read sequencing can still have value in assembly projects. It may support polishing, local correction, contamination review, variant evaluation, or complementary analysis depending on the project design.

    A hybrid strategy can be useful when one data type does not answer every question. The point is not to include every technology, but to combine the evidence layers that fit the genome and downstream goal.

    What We Review Before Recommending an Assembly Plan

    A good assembly plan starts with risk review. We do not want to recommend a high-end workflow that the sample cannot support, or a minimal workflow that cannot answer the downstream research question.

    Species, genome size, ploidy, and heterozygosity

    We first review the organism and any available genome information. Useful details include estimated genome size, ploidy, known repeat content, expected heterozygosity, related reference genomes, and whether the species is domesticated, wild, hybrid, inbred, outbred, or polyploid.

    These details help determine whether the project may need a standard de novo assembly, chromosome-level scaffolding, haplotype-resolved assembly, or a more advanced strategy.

    HMW DNA quality and sample risk

    High molecular weight DNA quality is one of the most important factors in long-read genome assembly. Tissue type, preservation method, extraction difficulty, DNA fragment size, contaminants, polysaccharides, polyphenols, microbial contamination, and sample age can all affect library construction and assembly continuity.

    For difficult samples, we review feasibility before locking the assembly strategy.

    Existing sequencing data or draft assemblies

    Some projects begin with existing data. You may already have short reads, PacBio reads, Nanopore reads, Hi-C data, or a fragmented draft assembly.

    In these cases, we can help evaluate whether the data can be reused, improved, scaffolded, polished, annotated, or integrated into a revised assembly workflow.
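
    One early check on existing data is sequencing depth: total sequenced bases divided by the estimated genome size. The sketch below shows that arithmetic on toy numbers. The function names and the target depth of 30 are assumptions for illustration; a suitable target depends on the platform, genome, and assembly level.

```python
def estimated_depth(read_lengths, genome_size_bp):
    """Average depth = total sequenced bases / estimated genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


def additional_bases_needed(read_lengths, genome_size_bp, target_depth):
    """Bases still required to reach target_depth (0 if already reached)."""
    shortfall = target_depth * genome_size_bp - sum(read_lengths)
    return max(0, shortfall)


# Toy example: 2,000 reads of 10 kb against a 1 Mb genome estimate.
reads = [10_000] * 2_000
genome_size = 1_000_000

print(estimated_depth(reads, genome_size))              # prints: 20.0
print(additional_bases_needed(reads, genome_size, 30))  # prints: 10000000
```

    Depth alone does not show whether reads are long enough or clean enough to reuse, so it is only the first of several checks.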

    Downstream goals that affect assembly design

    Downstream goals should shape the assembly plan. A genome intended for gene annotation may need different priorities than one intended for structural variation, pan-genome analysis, genome-wide association, population genomics, or breeding marker development.

    We review these goals early so the assembly is designed as a usable genome resource, not just a FASTA file.

    Genome Assembly Strategy Options Compared

    The best genome assembly strategy depends on both the genome and the research goal. The table below summarizes common options and how we help position them.

    | Strategy | Best-fit use case | Sample requirement sensitivity | Genome complexity fit | QC considerations | Downstream readiness |
    |---|---|---|---|---|---|
    | Short-read draft assembly | Compact genomes, early screening, simple microbial projects, or preliminary resources | Moderate; shorter DNA fragments may be acceptable depending on project | Limited for high repeats, large genomes, and complex structure | Needs coverage review, contamination check, and assembly completeness review | May support basic gene discovery or microbial analysis, but limited for complex downstream structure |
    | PacBio HiFi assembly | Accurate de novo assembly, gene-space recovery, reference genome construction, phasing-ready projects | Requires suitable high-quality DNA | Strong for many plant, animal, fungal, and non-model genomes | Evaluate contig N50, BUSCO, QV, completeness, contamination, and phasing if applicable | Strong foundation for annotation, comparative genomics, and many reference genome projects |
    | Nanopore long-read or ultra-long assembly | Repeat-rich regions, long structural regions, gap closure, T2T-like strategies | Highly sensitive to HMW DNA quality and fragment length | Strong when long-read span is critical | Evaluate read length, coverage, consensus quality, repeat resolution, and polishing strategy | Useful for complex structure, gap resolution, and long-range genome architecture |
    | Hi-C scaffolding | Chromosome-level assembly, linkage, synteny, breeding, comparative genomics | Requires suitable material for Hi-C library preparation | Strong for ordering and orienting contigs into chromosome-scale scaffolds | Evaluate contact map quality, scaffold accuracy, misjoins, and chromosome assignment | Important for chromosome-level downstream work |
    | Hybrid assembly | Projects needing complementary accuracy, continuity, polishing, or long-range evidence | Depends on all data types included | Flexible for complex or high-value projects | Requires careful integration and cross-platform QC | Strong when the assembly must support multiple downstream uses |
    | Haplotype-resolved assembly | Heterozygous, hybrid, outbred, or polyploid organisms | Requires strong data quality and sufficient coverage | Strong when allele-specific or subgenome-aware interpretation matters | Evaluate phasing accuracy, haplotype separation, duplication, and completeness | Useful for breeding, allele-specific analysis, SV, and complex genome interpretation |
    | T2T-like assembly | Centromeres, telomeres, long repeats, unresolved gaps, premium reference resources | Very sensitive to sample quality, read length, and data design | Strong for difficult repetitive regions when supported by data | Evaluate gap closure, repeat resolution, QV, manual review, and structural consistency | Useful for high-end reference projects and repeat-centric research |
    | Microbial, fungal, or compact genome assembly | Bacterial, fungal, viral, plasmid, or engineered strain genomes | Often less demanding than large eukaryotic genomes, but contamination control is important | Suitable for compact genomes; strategy depends on plasmids, repeats, and genome structure | Evaluate circularization, contamination, plasmids, completeness, and annotation quality | Useful for strain characterization, comparative genomics, and synthetic biology research |

    End-to-End Workflow from Sample Review to Usable Genome Resource

    From sample feasibility review to sequencing design, assembly, QC, annotation, and downstream-ready files

    Genome assembly workflow with sequencing strategy and QC checkpoints

    A genome assembly project moves through several technical and decision checkpoints. We build the workflow around the final assembly level and the downstream research goal.

    We review the organism, sample type, preservation method, expected DNA quality, and risk factors. For long-read assembly, high molecular weight DNA is often critical. When sample risk is high, we discuss options before sequencing begins.

    Based on the target assembly level, we recommend the data types needed. This may include PacBio HiFi, Oxford Nanopore, Hi-C, short-read polishing, or a hybrid design. The sequencing plan should match the genome complexity rather than follow a fixed template.

    The assembly workflow may include contig assembly, polishing, haplotype separation, scaffold construction, Hi-C-based ordering and orientation, gap review, and T2T-like refinement when appropriate.

    When included in the project, we support repeat annotation, gene prediction, functional annotation, and downstream bioinformatics. The final output can include assembly files, annotation files, QC reports, visual summaries, and project documentation.

    Sample Requirements and Project Intake Information

    Sample quality has a direct effect on genome assembly strategy. Long-read assembly, chromosome-level scaffolding, haplotype-aware projects, and T2T-like workflows may require different sample and data planning.

    Final requirements depend on species, genome size, ploidy, tissue type, preservation method, platform choice, and target assembly level. Before project confirmation, our team reviews the information below and recommends the most suitable workflow.

    | Sample or input type | What we review | Quality focus | Required project information | Typical QC checkpoints | Notes |
    |---|---|---|---|---|---|
    | Fresh or frozen tissue for HMW DNA | Tissue type, preservation, expected DNA yield, contamination risk | Long fragment DNA suitable for long-read libraries | Species, genome size estimate, ploidy, downstream goal | DNA integrity, purity, concentration, fragment size, contamination review | Final requirements depend on species, genome size, assembly level, and platform strategy |
    | Plant, animal, fungal, or non-model organism samples | Sample source, tissue difficulty, inhibitors, related references, expected repeat content | Feasibility for de novo, chromosome-level, or phased assembly | Species, sample source, estimated genome size, ploidy, target assembly level | Sample suitability review, DNA quality review, contamination risk review | Complex or inhibitor-rich tissues may require special review before workflow selection |
    | Existing sequencing data | FASTQ/BAM files, platform, coverage, read length, sample labels, prior assembly | Compatibility with reassembly, polishing, scaffolding, or annotation | Sequencing platform, genome target, prior assembly files, analysis goal | File integrity, read QC, coverage review, assembly feasibility review | Can support rescue, improvement, reanalysis, or downstream annotation when data quality is suitable |
    | Draft assembly files | Assembly FASTA, statistics, annotation status, contamination concerns, scaffolding needs | Improvement potential and downstream suitability | Assembly FASTA, existing QC, species information, desired improvement level | Contiguity review, BUSCO review, contamination check, scaffold feasibility review | May be improved through polishing, scaffolding, annotation, or custom bioinformatics depending on data |

    How to Read Genome Assembly QC Without Over-Relying on N50

    N50 is widely used, but it should not be the only metric used to judge a genome assembly. A high N50 can reflect long contigs or scaffolds, but it does not automatically mean the assembly is complete, accurate, correctly scaffolded, or useful for every downstream analysis.

    | QC metric | What it helps evaluate | What it does not fully answer |
    |---|---|---|
    | Contig N50 | Assembly continuity before scaffolding | Completeness, correctness, contamination, or gene recovery |
    | Scaffold N50 | Long-range scaffold continuity | Whether scaffolds are correctly ordered and oriented |
    | BUSCO | Gene-space completeness using conserved genes | Repeat resolution, structural correctness, or whole-genome accuracy |
    | QV | Consensus accuracy estimate | Long-range structure, phasing quality, or annotation usefulness |
    | Genome size comparison | Whether assembly size matches expectation | Whether the sequence is complete or correctly assembled |
    | Contamination review | Non-target sequence or mixed-sample risk | Biological interpretation or annotation accuracy by itself |
    | Hi-C contact map review | Chromosome-level scaffolding consistency | Base-level accuracy or gene completeness |
    | Annotation summary | Gene prediction and functional interpretation readiness | Whether assembly structure is fully correct |

    In practice, an assembly with a high N50 can still contain contamination, misjoins, missing genes, or collapsed repeats, and it may not be ready for annotation.
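
    A short calculation shows why a high N50 is not proof of correctness. The sketch below uses the common definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly) on toy contig lengths. The `n50` function and the example lengths are illustrative assumptions, not output from a real project.

```python
def n50(lengths):
    """Return N50: walking from longest to shortest, the sequence length
    at which the running total first reaches half of the assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Five correctly assembled contigs (toy lengths in bp).
correct = [500, 400, 300, 200, 100]

# Same total size, but the two largest contigs were wrongly joined.
misjoined = [900, 300, 200, 100]

print(n50(correct))    # prints: 400
print(n50(misjoined))  # prints: 900
```

    The misjoined set reports more than double the N50 of the correct set, although it is the less accurate assembly. This is why continuity metrics are read alongside structural and completeness checks.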

    BUSCO helps evaluate conserved gene completeness, while QV can provide a consensus accuracy estimate when applicable. These metrics help complement N50, especially when the assembly will support gene discovery, comparative genomics, or publication-oriented research.
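
    QV is usually reported on the Phred scale, where QV = -10 × log10(estimated error rate). The sketch below converts between the two so a reported QV can be read as an approximate error frequency. The function names are hypothetical, and how the error rate is estimated in a given project is outside this example.

```python
import math


def qv_from_error_rate(error_rate):
    """Phred-scaled quality: QV = -10 * log10(error rate)."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be between 0 and 1 (exclusive)")
    return -10 * math.log10(error_rate)


def error_rate_from_qv(qv):
    """Inverse of the Phred scale: error rate = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


# One estimated error per 10,000 bases corresponds to QV 40.
print(round(qv_from_error_rate(1e-4), 1))  # prints: 40.0

# QV 50 corresponds to roughly one estimated error per 100,000 bases.
print(round(1 / error_rate_from_qv(50)))   # prints: 100000
```

    Because the scale is logarithmic, each gain of 10 QV means a tenfold lower estimated error rate, which is easy to miss when comparing two assemblies by QV alone.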

    The best QC framework depends on what the assembly must support. A genome used for gene annotation, pan-genome analysis, structural variation, or trait mapping may need different checks. We help interpret QC in the context of the research goal.

    Annotation and Downstream Analysis Make the Assembly Usable

    A genome assembly becomes more valuable when it is connected to annotation and downstream analysis. For many research teams, the final goal is not only a FASTA file. It is a usable genome resource.

    Genome annotation and gene prediction

    We offer a Genome Annotation and Gene Prediction Service for projects that require gene models, coding sequences, protein sequences, functional annotation, and annotation summaries.

    This is especially important for non-model organisms, species with limited annotation resources, and projects focused on gene discovery.

    Repeat annotation and functional annotation

    Repeat annotation helps characterize transposable elements, repetitive regions, and repeat content that may influence assembly strategy and downstream interpretation. Functional annotation can help connect predicted genes with known databases, pathways, gene families, or biological functions.

    Comparative genomics, pan-genome, SV, and population support

    When the assembly will support downstream studies, we can help plan additional analyses through Genomic Data Analysis, Pan Genome, Variant Calling, and Population Genetics services.

    These modules can support comparative genomics, gene family expansion, pan-genome construction, structural variation analysis, population genomics, and breeding-related research.

    Files your team can reuse for future studies

    • Assembly FASTA
    • GFF or GTF annotation files
    • Repeat annotation files
    • Protein FASTA and CDS FASTA
    • BUSCO reports and QV summaries
    • Hi-C scaffolding outputs
    • Comparative genomics tables
    • Pan-genome or SV-ready files
    • Project report
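
    As a sketch of how a delivered assembly FASTA might be checked on receipt, the example below counts sequences, total length, gap runs (stretches of N), and GC content using only the Python standard library. The function names and the in-memory toy FASTA are assumptions for illustration; a real file would be opened from disk and would be far larger.

```python
import io
import re


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_assembly(handle):
    """Return basic counts for an assembly FASTA."""
    n_seqs = total = n_bases = gap_runs = gc = 0
    for _, seq in read_fasta(handle):
        seq = seq.upper()
        n_seqs += 1
        total += len(seq)
        n_bases += seq.count("N")
        gap_runs += len(re.findall(r"N+", seq))  # each run of N counts once
        gc += seq.count("G") + seq.count("C")
    resolved = total - n_bases
    return {
        "sequences": n_seqs,
        "total_bp": total,
        "gap_runs": gap_runs,
        "n_bases": n_bases,
        "gc_fraction": gc / resolved if resolved else 0.0,
    }


toy_fasta = io.StringIO(
    ">scaffold_1 demo\nACGTNNNNACGT\nGGCC\n>scaffold_2\nATAT\n"
)
print(summarize_assembly(toy_fasta))
# prints: {'sequences': 2, 'total_bp': 20, 'gap_runs': 1, 'n_bases': 4, 'gc_fraction': 0.5}
```

    GC content is computed over non-N bases here so that scaffold gaps do not dilute the value; that is a choice made for this sketch.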

    Choose a Strategy Based on the Research Question, Not the Technology Name

    A good assembly strategy starts with the research question. We help you decide what genome resource is needed and which data types can support it.

    If your goal is a first reference genome

    A de novo reference strategy may be suitable when no close reference exists or when you need a new genome resource for a non-model species. In many cases, our De Novo Whole Genome Sequencing Service or Plant/Animal Whole Genome de novo Sequencing service can support this goal.

    If your goal is trait mapping or breeding support

    Chromosome-level assembly may be more useful when genomic position matters. Hi-C scaffolding can support trait mapping, linkage analysis, comparative genomics, and breeding-related research.

    If your goal is polyploid or haplotype-aware interpretation

    Haplotype-resolved assembly may be needed when the organism is highly heterozygous, outbred, hybrid, or polyploid. This strategy can help preserve allele- or subgenome-specific structure when supported by data.

    If your goal is pan-genome, SV, or population genomics

    If the assembly will support pan-genome construction, structural variation analysis, or population genomics, we help plan the assembly and downstream outputs together. The goal is to avoid building an assembly that looks acceptable on paper but is not suitable for the next analysis step.

    Compliance / Disclaimer

    CD Genomics provides this service for Research Use Only (RUO). This service is not intended for clinical diagnosis, direct medical interpretation, patient management, treatment guidance, or direct-to-consumer testing, and it does not guarantee any specific discovery.

    Demo Results

    Demo results help your team understand what an assembly project may deliver. These examples show output types, not fixed biological conclusions.

    Assembly continuity and chromosome scaffolding summary

    This output may show contig statistics, scaffold statistics, chromosome-scale scaffold views, and a Hi-C contact map summary when Hi-C scaffolding is included.

    BUSCO, QV, and contamination review dashboard

    This output summarizes assembly completeness, consensus quality, and contamination review in a compact format.

    Annotation and downstream-ready output view

    This output may show gene annotation summaries, repeat annotation tracks, gene family outputs, and files prepared for comparative or population-level analysis.

    FAQ

    1. What is a Genome Assembly Strategy Solution?

    It is a research-focused service that helps you choose and carry out a genome assembly plan suited to your species, sample quality, genome complexity, target assembly level, and downstream goals.

    2. How do I know what assembly level my project needs?

    The right level depends on the research question. Early gene discovery may need a draft or contig-level assembly, while trait mapping, synteny, and breeding research often benefit from chromosome-level assembly. Haplotype-resolved or T2T-like strategies may be considered for complex genomes or repeat-rich regions.

    3. When is draft assembly enough?

    Draft assembly may be enough for compact genomes, early reference development, preliminary gene discovery, or projects where chromosome-scale position is not central. It may not be enough for linkage, structural variation, chromosome evolution, or pan-genome work.

    4. When should I choose chromosome-level assembly?

    Chromosome-level assembly is useful when genomic position, long-range structure, trait mapping, synteny, or comparative genomics matters. Hi-C or related scaffolding methods may be used to support this level.

    5. When is haplotype-resolved assembly important?

    Haplotype-resolved assembly can be important for heterozygous, outbred, hybrid, or polyploid organisms. It helps preserve allele- or haplotype-specific information when the data and project design support it.

    6. When is T2T-like assembly worth considering?

    A T2T-like strategy may be worth considering when centromeres, telomeres, large repeats, unresolved gaps, or high-end reference genome quality are central to the research question. It is more demanding and should be planned carefully.

    7. How do PacBio and Nanopore differ for genome assembly?

    PacBio HiFi reads are often valued for high-accuracy long-read assembly. Nanopore long or ultra-long reads can be useful for spanning long repeats and complex regions. Depending on the genome and research goal, a project may use one technology or combine both.

    8. Why is Hi-C useful for chromosome-level assembly?

    Hi-C provides long-range contact information that can help order and orient contigs into chromosome-scale scaffolds. It is especially useful when downstream analysis depends on chromosome-level structure.

    9. Why should I not rely on N50 alone?

    N50 describes continuity, but it does not fully measure completeness, accuracy, contamination, misassembly risk, or annotation readiness. A strong QC review should combine multiple metrics.

    10. What sample information is needed before recommending a strategy?

    Useful information includes species, genome size estimate, ploidy, sample type, preservation method, DNA quality, expected heterozygosity, related reference genomes, existing sequencing data, and downstream research goals.

    11. Can existing sequencing data or draft assemblies be improved?

    Yes. Existing data or draft assemblies may support polishing, scaffolding, reassembly, annotation, contamination review, or downstream analysis when data quality is suitable.

    12. What deliverables can I expect from a genome assembly project?

    Deliverables may include assembly FASTA files, QC summaries, N50 statistics, BUSCO reports, QV estimates, contamination review, annotation files, repeat annotation, Hi-C scaffolding outputs, genome browser-ready files, and project reports.

    13. Can genome assembly results support annotation, pan-genome, SV, or population genomics?

    Yes. When planned correctly, genome assembly can support annotation, comparative genomics, pan-genome analysis, structural variation analysis, and population genomics. These downstream needs should be considered before the assembly plan is finalized.

    14. Is this service intended for clinical or diagnostic use?

    No. This service is designed for research-focused genome assembly and bioinformatics projects only.

    Literature Case: Hi-C Scaffolding Changes How Genome Assemblies Are Evaluated

    Published Research Highlight

    Benchmarking of Hi-C tools for scaffolding plant genomes obtained from PacBio HiFi and ONT reads

    Journal: Frontiers in Bioinformatics
    Published: 2024

    Background

    Chromosome-level genome assembly often requires more than contig generation. Hi-C reads can help order and orient large genomic regions into scaffolds, making them useful for projects that need chromosome-scale structure.

    Methods

    The study generated two de novo Arabidopsis thaliana assemblies from the same PacBio HiFi and Oxford Nanopore data. It then scaffolded the assemblies using 3D-DNA, SALSA2, and YaHS.

    The scaffolded assemblies were evaluated using contiguity, completeness, accuracy, and structural correctness. This design is relevant because it compares not only sequencing data types, but also downstream scaffolding and quality interpretation.

    Results

    1. The study reported that Hi-C scaffolding tools showed different performance characteristics across the evaluated assemblies.
    2. YaHS performed best in that analysis.
    3. The broader lesson is important for project planning: chromosome-level assembly quality depends not only on sequencing platform, but also on scaffolding method, QC review, and structural correctness.

    Hi-C scaffolding benchmark for plant genome assemblies generated from PacBio HiFi and Oxford Nanopore reads.

    A Hi-C scaffolding benchmark illustrates why chromosome-level genome assembly should be evaluated by contiguity, completeness, accuracy, and structural correctness rather than one metric alone.

    Conclusion

    This case supports the central idea behind our Genome Assembly Strategy Solution. Genome assembly planning should not stop at choosing PacBio, Nanopore, or Hi-C. A strong strategy also considers assembly level, scaffolding method, QC metrics, annotation, and downstream usability.

    For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.