If you're planning animal and plant de novo genome sequencing for a complex genome (high repeat content, high heterozygosity, or polyploidy), the fastest way to burn budget is to treat the project like "ship DNA → get a FASTA."
Evaluation-stage teams usually fail for mundane reasons: DNA quality that looked "fine" on a NanoDrop, missing QC reports, unclear acceptance metrics, or deliverables that aren't publication-ready.
This guide is designed as a pre-flight checklist: sample requirements, QC checkpoints, and deliverables you should agree on before you start—with explicit "done when…" gates you can use in a statement of work.
Key takeaways
Define success upfront with an acceptance bundle that includes contiguity + completeness + correctness (not just N50).
For long-read assemblies, HMW DNA integrity is the limiting factor—set explicit intake gates (Qubit mass, NanoDrop ratios, PFGE/Femto Pulse profile, handling notes).
Require intermediate QC artifacts during library prep and sequencing so failures can be diagnosed (not hidden behind a final FASTA).
Choose strategy based on your constraint: HiFi for consensus accuracy, ultra-long reads for repeat resolution, and specify the polishing/validation plan.
Treat assembly as milestone deliverables (draft → polished → scaffolded) with validation at each stage (mapping/k-mer checks, BUSCO, QV/equivalent, contamination screening).
Agree on a deliverables package (raw reads, versioned assemblies, QC reports, pipeline manifests) so the work is reproducible and publication-ready.
Put governance in writing: rerun/top-up policy, timeline by milestone, and IP/data handling expectations.
Step 1: Define success before you ship a sample
Before you talk about platforms or coverage, lock down what "success" means for your specific organism and downstream goals.
1.1 Choose the target assembly level
Pick one and write it into the project scope:
Draft contig assembly (fastest, cheapest): useful for exploratory gene discovery and some comparative genomics.
Polished contig assembly (common baseline): higher consensus accuracy; better for annotation and downstream comparisons.
Chromosome-scale assembly (often needed for complex plant/animal genomes): requires scaffolding evidence (typically Hi-C) and stronger validation.
Near telomere-to-telomere (T2T) assembly (only for some projects): requires aggressive QC, multi-data integration, and realistic expectations.
1.2 Define acceptance metrics (don't let N50 be the only headline)
A high N50 can coexist with misjoins, collapsed haplotypes, or missing genic regions. For evaluation-stage work, set an acceptance bundle that covers contiguity + completeness + correctness.
Minimum acceptance bundle:
Completeness: BUSCO completeness using the correct lineage dataset (e.g., Embryophyta for plants, Metazoa for animals)
Correctness: a consensus accuracy proxy (often reported as QV or an equivalent method) with the method stated
Contiguity: contig N50 and (if scaffolding is in scope) scaffold N50
Key Takeaway: Treat N50 as a contiguity descriptor, not a quality guarantee. Your acceptance criteria should include at least one completeness metric and one correctness metric.
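A minimal sketch of how the acceptance bundle above could be checked in one place. The class name `AcceptanceBundle`, the helper names, and every threshold in the usage lines are illustrative assumptions, not recommended values; the real gates belong in your statement of work. The QV helper uses the standard Phred scaling, which is only one of the possible "QV or equivalent" methods.

```python
import math
from dataclasses import dataclass


def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # N50 is the length at which the running sum reaches half the total.
        if running * 2 >= total:
            return length
    return 0


def phred_qv(error_rate):
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


@dataclass
class AcceptanceBundle:
    # Hypothetical thresholds; negotiate the real ones per project.
    min_contig_n50: int
    min_busco_complete_pct: float
    min_qv: float

    def evaluate(self, contig_lengths, busco_complete_pct, qv):
        """Return a pass/fail flag for each of the three metric families."""
        return {
            "contiguity": n50(contig_lengths) >= self.min_contig_n50,
            "completeness": busco_complete_pct >= self.min_busco_complete_pct,
            "correctness": qv >= self.min_qv,
        }


# Illustrative numbers only: a contiguous assembly that misses the BUSCO gate.
bundle = AcceptanceBundle(min_contig_n50=1_000_000,
                          min_busco_complete_pct=95.0,
                          min_qv=40.0)
result = bundle.evaluate([5_000_000, 3_000_000, 500_000],
                         busco_complete_pct=91.2,
                         qv=phred_qv(1e-5))
print(result)
```

In this made-up case the contiguity gate passes while completeness fails, which is the situation an N50-only headline would hide.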
Step 2: Sample requirements for animal and plant de novo genome sequencing
Complex animal/plant assemblies are disproportionately sensitive to DNA integrity. If you want long reads, you need high molecular weight DNA (HMW DNA)—and you need to protect it from shearing.
2.1 The minimum QC panel you should run (and record)
Oxford Nanopore's QC guidance is explicit about purity thresholds and handling: DNA should have an OD 260/280 of about 1.8 and an OD 260/230 of 2.0–2.2, and a Qubit fluorometer is recommended for accurate quantification (with NanoDrop used mainly for purity checks when the DNA is sufficiently concentrated). Source: Oxford Nanopore, "Input DNA/RNA QC" (last updated 2026-04-10).
Use this as a vendor-neutral baseline QC gate.
2.2 HMW DNA acceptance table (practical gates)
| QC item | What to measure | "Pass" gate (typical) | Done when… |
| --- | --- | --- | --- |
| Quantity | Qubit dsDNA (BR/HS as appropriate) | Enough for planned library strategy + contingency | You can allocate input for library prep and reserve material for reruns |
| Purity (protein/phenol) | NanoDrop A260/280 | ~1.8 (roughly 1.75–2.0) | Value is stable across re-measurements and consistent with clean gDNA |
| Purity (salts/organics) | NanoDrop A260/230 | 2.0–2.2 | No clear inhibitor signal; no need for extra cleanup |
| Integrity / size | PFGE or Femto Pulse (preferred for >10 kb) | Strong HMW peak; minimal smear | Fragment-size profile supports long-read library construction |
| Handling | Process notes | No vortexing; gentle mixing | Extraction/handling steps are documented and reproducible |
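The gates in the table above can be written down as an explicit intake check, so that "pass" means the same thing to you and to the provider. This is a sketch under stated assumptions: the purity ranges come from the table, while `IntakeQC`, `failed_gates`, the `contingency_fraction` reserve, and the sample values are hypothetical. The integrity gate is kept as an analyst's yes/no call because a PFGE or Femto Pulse trace does not reduce to one agreed number here.

```python
from dataclasses import dataclass


@dataclass
class IntakeQC:
    qubit_mass_ug: float        # total dsDNA mass by fluorometry
    a260_280: float             # NanoDrop purity ratio (protein/phenol)
    a260_230: float             # NanoDrop purity ratio (salts/organics)
    hmw_peak_ok: bool           # analyst's call from the PFGE / Femto Pulse trace
    handling_documented: bool   # extraction and handling notes are on file


def failed_gates(qc, required_mass_ug, contingency_fraction=0.5):
    """Return the names of the gates a sample fails (empty list = pass).

    contingency_fraction is a hypothetical reserve for reruns, expressed as
    a fraction of the library input; agree on the real value in the SOW.
    """
    failures = []
    if qc.qubit_mass_ug < required_mass_ug * (1 + contingency_fraction):
        failures.append("quantity")
    if not 1.75 <= qc.a260_280 <= 2.0:
        failures.append("purity_260_280")
    if not 2.0 <= qc.a260_230 <= 2.2:
        failures.append("purity_260_230")
    if not qc.hmw_peak_ok:
        failures.append("integrity")
    if not qc.handling_documented:
        failures.append("handling")
    return failures


# Illustrative sample: enough clean-looking DNA, but a low A260/230 ratio.
sample = IntakeQC(qubit_mass_ug=12.0, a260_280=1.82, a260_230=1.6,
                  hmw_peak_ok=True, handling_documented=True)
print(failed_gates(sample, required_mass_ug=5.0))
```

Returning the list of failed gates, rather than a single boolean, keeps the borderline cases visible: a sample that misses only one purity gate calls for a different response than one that fails integrity.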
2.3 Plant- and animal-specific failure modes to plan for
This is where many statements of work (SOWs) are too generic.
Plant samples (common failure modes)
Polysaccharides/phenolics → low A260/230 and enzymatic inhibition.
Tough cell walls → harsh extraction → shearing.
Chloroplast/mitochondrial overrepresentation → surprises in assembly totals.
Animal samples (common failure modes)
Field collection / microbiome exposure → contamination risk.
Tissue degradation in transport → fragmented DNA profiles.
Inconsistent tissue type across samples → variability in yield and inhibitor load.
"Done when…" for this subsection:
You have a written mitigation plan for your top 1–2 risks (extra cleanup step, alternate tissue choice, repeat extraction threshold, etc.).
2.4 Input amount expectations (why genome size belongs in the first email)
Providers often specify DNA input either as an absolute mass (µg) or as a guideline scaled to genome size. For example, PacBio describes HiFi library inputs per workflow: standard workflows are often described as scaling with genome size, and low/ultra-low-input workflows are available for constrained samples. Source: PacBio, "New Ampli‑Fi ultra‑low‑input protocol".
Practical evaluation question to ask your provider:
"Give me a table that maps genome size × ploidy/heterozygosity assumptions × library strategy × required input × contingency. What happens if sample QC barely misses the gate?"
Step 3: QC checkpoints during library construction and sequencing
If you only receive raw FASTQ and a final assembly, you've lost visibility into where things went wrong. Your evaluation checklist should require intermediate QC artifacts.
3.1 Library QC checkpoints to request
Ask the provider to report (at minimum):
DNA quantification method and kit (and whether RNase treatment was used)
Size distribution method (PFGE/Femto Pulse; settings)
Library size distribution (where applicable)
Any cleanup steps applied and why
A "stop/go" decision note when a sample is borderline (and whether re-extraction or alternate tissue was recommended)
Done when…
You can explain, sample by sample, whether the limiting factor is purity, yield, or fragmentation.
3.2 Sequencing run QC checkpoints to request
For each sample (or each library pool), request:
Yield (Gb) and read count
Read length distribution (including read N50)
Quality distribution (platform-appropriate)
Coverage estimate vs target with the assumed genome size stated
Pro Tip: Require the provider to state the genome size assumption used for coverage calculations. Otherwise, "30× coverage" can be meaningless.
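To see why the genome size assumption matters, here is a small sketch of the depth arithmetic (mean coverage = sequenced bases / assumed genome size). The function name, the 90 Gb yield, and the three genome size estimates are made-up values for illustration.

```python
def coverage(yield_gb, genome_size_gb):
    """Mean depth = sequenced bases / assumed genome size (both in Gb)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_size_gb


run_yield_gb = 90.0  # hypothetical run yield
for assumed_size_gb in (2.0, 3.0, 4.5):  # hypothetical genome size estimates
    depth = coverage(run_yield_gb, assumed_size_gb)
    print(f"assumed genome size {assumed_size_gb} Gb -> {depth:.1f}x")
```

The same run reads as 45×, 30×, or 20× depending on which estimate is used, so a coverage figure reported without its genome size cannot be checked against the target.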
Step 4: Choose a sequencing strategy that matches your failure modes
For complex plant and animal genomes, strategy usually comes down to which constraint dominates:
If consensus accuracy is your constraint (gene models, annotation confidence, polished variant calls): bias toward PacBio HiFi sequencing.
If long-range repeat resolution is your constraint (extreme repeats, structural complexity): consider Oxford Nanopore ultra-long reads, often paired with an explicit polishing plan.
4.2 Coverage: how to ask for it (and how to avoid paying for the wrong thing)
Coverage targets should be written per haplotype when relevant, and tied to an explicit output.
PacBio guidance commonly frames de novo assembly planning in terms of HiFi read coverage per haplotype (for many projects, a 10–15× per-haplotype recommendation is a common starting point, with higher targets in ultra-low-input contexts). Use that as a planning baseline, then adjust for genome complexity and assembly goals.
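The per-haplotype framing above can be turned into a yield target with simple arithmetic. This sketch assumes the haplotypes are roughly equal in size and that the target is stated per haplotype to be resolved; the function names and the example genome (1.2 Gb haploid size, diploid, 15× per haplotype, 28 Gb delivered) are hypothetical.

```python
def required_yield_gb(haploid_size_gb, n_haplotypes, per_haplotype_cov):
    """Total bases needed (Gb) to reach a per-haplotype coverage target."""
    if haploid_size_gb <= 0 or n_haplotypes < 1 or per_haplotype_cov <= 0:
        raise ValueError("inputs must be positive")
    return haploid_size_gb * n_haplotypes * per_haplotype_cov


def top_up_gb(target_gb, delivered_gb):
    """Shortfall to request as top-up sequencing (0 if the target was met)."""
    return max(0.0, target_gb - delivered_gb)


# Illustrative planning numbers only.
target = required_yield_gb(haploid_size_gb=1.2, n_haplotypes=2,
                           per_haplotype_cov=15)
shortfall = top_up_gb(target, delivered_gb=28.0)
print(f"target {target:.1f} Gb, top-up needed {shortfall:.1f} Gb")
```

Writing the target as a yield in Gb, with the inputs next to it, gives both sides a number that can be checked against the run report when deciding whether a top-up is owed.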
For a practical explanation of why "more data" has diminishing returns (and where it still matters), PacBio's coverage explainer is a useful reference when building an internal cost-benefit model for depth planning. Source: PacBio, "Sequencing 101: sequencing coverage" (updated 2026-04-13).
Practical evaluation questions to ask your provider:
What's the proposed coverage target (and genome size assumption)?
Which metric improves if we add more data—BUSCO, correctness (QV), scaffold correctness, or mostly N50?
What's the planned fallback if the first run misses the target (top-up sequencing vs re-library vs re-extraction)?
4.3 When Hi-C is worth it (and what "QC" should look like)
Hi-C scaffolding is usually worth considering when:
you need chromosome-scale assemblies for synteny, breeding-relevant haplotypes, or large-scale rearrangements
your genome is repeat-rich or polyploid and contigs alone won't resolve structure
If you include Hi-C, don't accept it as a black box: require contact-map evidence and a written description of how misjoins were handled.
For example, CD Genomics offers HiFi-C as a long-read + conformation capture approach that can support chromosome-scale objectives in appropriate projects. CD Genomics HiFi‑C sequencing
Step 5: Assembly QC checkpoints (validate at each milestone)
Treat assembly as a multi-stage deliverable. For evaluation-stage work, don't accept a single "final FASTA" without the QC trail.
Correctness/accuracy report (QV or equivalent), with methodology stated
Contamination screen output and interpretation
6.5 Optional but common add-ons
Structural variant analysis outputs (if in scope)
Annotation package (gene models, functional annotation) if requested
CD Genomics positions its animal/plant genomics services as end-to-end, including sequencing, assembly, and optional downstream analyses and reporting—so this deliverables checklist maps cleanly to what a full-service provider should be able to package in a reproducible way. CD Genomics animal/plant whole genome sequencing
Step 7: Governance items that prevent surprises (cost, timeline, reruns, and IP)
Teams working on milestone-based acceptance and publication timelines should treat governance as a first-class technical requirement.
7.1 Rerun and failure policy (must be explicit)
Write these answers into the SOW:
If sample QC fails at intake, what's the recommended action (cleanup vs re-extraction vs alternate tissue)?
Who pays for resequencing if yield is below target because of library failure vs because the input was out-of-spec?
Is top-up sequencing possible without rebuilding the library?
7.2 Timeline transparency by milestone
Ask for a milestone timeline (not a single date):
raw data delivery
draft assembly delivery
polished assembly delivery
scaffolded assembly delivery (if in scope)
final reproducible package delivery (all files + manifests)
7.3 Data ownership and sovereignty
Evaluation-stage objections often include data security and IP.
Done when…
storage location, retention window, and access controls are agreed
publication constraints and embargo needs are documented
Next steps
If you want a second set of eyes on your plan for animal and plant de novo genome sequencing, share:
organism + estimated genome size
ploidy/heterozygosity expectations
desired assembly level (contig vs chromosome-scale)
whether Hi-C is planned
…and we'll translate that into a concrete QC gate + deliverables checklist you can use to compare providers and reduce re-sampling risk.
If you're exploring provider options that cover both PacBio and ONT strategies plus bioinformatics delivery, CD Genomics' long-read service hub is a useful starting point: CD Genomics LongSeq.
FAQ
What's the single most common cause of failure in plant/animal de novo genome projects?
Not a fancy algorithm—usually HMW DNA that isn't actually HMW anymore (shearing during extraction/handling) or chemical inhibitors that block enzymatic steps.
Can I rely on NanoDrop alone?
Use NanoDrop for purity ratios, but for DNA mass, Qubit-style fluorometry is preferred because contaminants and residual RNA distort absorbance-based readings.
Do I always need Hi-C?
No. If your downstream analysis requires chromosome-scale structure (e.g., large-scale synteny, structural rearrangements, breeding-relevant haplotypes), scaffolding evidence can be worth it. If your goal is an accurate contig set for gene discovery, it may be unnecessary.
Which deliverables do I need for publication and reproducibility?
At minimum: raw FASTQ, versioned assemblies, BUSCO report, and a pipeline manifest (tools, versions, parameters). Without those, reviewers (and future you) can't reproduce the result.
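As one possible shape for such a pipeline manifest, the sketch below records checksums for delivered files alongside the tools, versions, and parameters used. The field names, the `build_manifest` helper, and the tool entry are hypothetical placeholders, not a standard format; a provider may already have its own manifest layout.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files, tools, genome_size_assumption_gb):
    """Collect file checksums and tool settings into one JSON-ready dict."""
    return {
        "genome_size_assumption_gb": genome_size_assumption_gb,
        "tools": tools,
        "files": [
            {
                "name": Path(p).name,
                "bytes": Path(p).stat().st_size,
                "sha256": sha256sum(p),
            }
            for p in files
        ],
    }


# Hypothetical tool entry; real entries list the actual tools that were run.
example_tools = [
    {"name": "example-assembler", "version": "0.0.0", "parameters": "--example-flag"},
]
# With no files listed, this prints only the tool and genome-size fields.
print(json.dumps(build_manifest([], example_tools, 1.2), indent=2))
```

Passing the paths of the delivered FASTQ and FASTA files as `files` adds one checksum entry per file, which lets a reviewer confirm they are looking at the same assembly version the QC reports describe.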
Author: Dr. Yang H., Senior Scientist at CD Genomics
Dr. Yang H. focuses on long-read sequencing (PacBio SMRT and Oxford Nanopore) and de novo genome assembly workflows for complex animal and plant genomes, with an emphasis on sample QC, acceptance metrics, and reproducible deliverables.
For Research Use Only. Not for use in diagnostic procedures.