What Is De Novo Whole Genome Sequencing?
Whole-genome de novo sequencing is the process of assembling a complete genome from scratch — without relying on any pre-existing reference sequence. Unlike resequencing, which aligns reads to a known genome, de novo assembly reconstructs the genome directly from sequencing fragments, making it the method of choice for species that lack a reference genome or for projects that require an unbiased view of genomic content.
CD Genomics provides comprehensive de novo genome sequencing services covering the full spectrum of biological complexity — from viral, bacterial, and fungal genomes to large plant and animal genomes. Our multi-platform approach integrates Illumina short reads, PacBio HiFi long reads, Oxford Nanopore ultra-long reads, and Hi-C chromatin interaction data to deliver assemblies ranging from standard draft quality to telomere-to-telomere (T2T) gapless resolution, including pan-genome construction when population-level diversity is needed.
De Novo Sequencing Across All Organism Types
De novo genome sequencing is not a one-size-fits-all service. Different organism groups present distinct challenges in genome size, ploidy, repeat content, and heterozygosity. Our service portfolio is structured to address these differences with tailored experimental and analytical strategies.

Microbial Whole Genome De Novo Sequencing (Viral / Bacterial / Fungal)
Microbial genomes range from a few kilobases (viruses) to tens of megabases (fungi). Our microbial de novo workflow combines ONT or PacBio long reads for contiguous assembly with optional Illumina short reads for polishing. For bacteria, this hybrid approach typically produces complete, circular chromosomes and fully assembled plasmids.
- Viral genomes: Reference-free assembly for DNA and RNA viruses. See our viral genome sequencing service.
- Bacterial genomes: Complete chromosome and plasmid assembly with methylation detection. See our bacterial whole genome de novo sequencing service.
- Fungal genomes: From yeast to filamentous fungi, including repeat-rich regions. See our fungal whole genome de novo sequencing service.
Plant and Animal Whole Genome De Novo Sequencing
Plant and animal genomes are larger and more complex, often involving polyploidy, high heterozygosity, and large repetitive element fractions. Our standard workflow integrates genome survey (Illumina, ~50×), long-read backbone (PacBio HiFi, 30–60×), Hi-C scaffolding (≥100×), and optional ONT ultra-long gap closure. This strategy routinely delivers chromosome-level assemblies across crops, livestock, wildlife, and aquatic species. See our dedicated plant and animal whole genome de novo sequencing page.
Telomere to Telomere (T2T) Genome Assembly
When standard reference-grade assemblies still contain unresolved gaps — in centromeres, telomeres, segmental duplications, or ribosomal DNA arrays — T2T genome assembly is the next step. T2T assembly aims for gapless reconstruction of every chromosome from one end to the other. The Earth BioGenome Project defines T2T quality as zero sequence gaps with a base accuracy QV > 60.
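To make the QV threshold concrete, the following minimal sketch converts between Phred-scaled quality values and per-base error rates using the standard relation QV = -10 · log10(error rate). The function names and the example genome size are illustrative assumptions, not part of any specific pipeline.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to an expected per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


def expected_errors(genome_size_bp: int, qv: float) -> float:
    """Expected number of erroneous bases in an assembly of the given size."""
    return genome_size_bp * qv_to_error_rate(qv)


if __name__ == "__main__":
    # Hypothetical 1 Gb assembly, evaluated at three quality levels.
    for qv in (40, 50, 60):
        errors = expected_errors(1_000_000_000, qv)
        print(f"QV{qv}: about {errors:,.0f} expected errors per Gb")
```

On this scale, each additional 10 QV points corresponds to a tenfold drop in error rate, so QV 60 means roughly one error per million bases.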
Our Technology Synergy
| Component | Role | Platform |
|---|---|---|
| Accuracy anchor | High-fidelity backbone with base-level precision | PacBio HiFi (15–20 kb, Q20+) |
| Bridge builder | Ultra-long reads spanning centromeres and rDNA arrays | ONT ultra-long (N50 > 100 kb) |
| Scaffolding architect | Chromosome ordering and structural validation | Hi-C or Pore-C |
The combination of HiFi accuracy with ONT ultra-long contiguity has proven effective even in complex polyploid genomes. A 2024 review in Nature Genetics (Garg et al., doi:10.1038/s41588-024-01830-7) highlights how T2T strategies are now being applied to crops with high ploidy and large repeat fractions, transforming the resolution of plant genomics.
Our T2T service portfolio includes three tiers: Vertebrate T2T (large Gb-scale genomes), Plant T2T (polyploid crops and repeat-rich genomes), and Bacterial T2T (closed circular chromosomes with zero gaps). For detailed information, see our telomere-to-telomere sequencing service.

Pan Genome Analysis
A single reference genome captures the genetic content of one individual, but it cannot represent the full genetic diversity of a species. Pan-genome analysis addresses this by constructing a comprehensive gene repertoire across multiple individuals or strains.
Pan-genomes consist of:
- Core genome — Genes present in all individuals, typically responsible for essential biological functions.
- Variable (accessory) genome — Genes present in a subset of individuals, often encoding adaptive traits, virulence factors, or strain-specific functions.
Our pan-genome workflow supports both linear and graph-based approaches:
- De novo assembly of multiple individuals using the same multi-platform strategy
- Core/accessory genome classification
- Gene presence-absence variation (PAV) analysis
- Phylogenetic analysis based on gene content
- Optional graph-based pan-genome construction for complex datasets
A minimum of two samples is needed, but larger sample sizes (tens to hundreds) provide increasingly comprehensive coverage. Visit our pan genome analysis service for more details.
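The core/accessory split and the PAV table described above can be sketched in a few lines. This is a simplified illustration that assumes genes have already been clustered into shared gene-family identifiers across samples; the sample names and gene identifiers are hypothetical, and production workflows rely on dedicated ortholog-clustering tools.

```python
from typing import Dict, Set


def classify_pan_genome(gene_sets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split the pan-genome into core genes (in every sample) and accessory genes."""
    if len(gene_sets) < 2:
        raise ValueError("at least two samples are required")
    per_sample = list(gene_sets.values())
    pan = set().union(*per_sample)          # every gene seen in any sample
    core = set.intersection(*per_sample)    # genes shared by all samples
    return {"pan": pan, "core": core, "accessory": pan - core}


def pav_matrix(gene_sets: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Build a gene-by-sample presence/absence table (1 = present, 0 = absent)."""
    genes = sorted(set().union(*gene_sets.values()))
    return {
        gene: {sample: int(gene in present) for sample, present in gene_sets.items()}
        for gene in genes
    }


if __name__ == "__main__":
    # Hypothetical gene-family content of three strains.
    strains = {
        "strain_A": {"g1", "g2", "g3"},
        "strain_B": {"g1", "g2", "g4"},
        "strain_C": {"g1", "g2", "g3", "g5"},
    }
    result = classify_pan_genome(strains)
    print("core:", sorted(result["core"]))
    print("accessory:", sorted(result["accessory"]))
```

Because the core set can only shrink and the pan set can only grow as samples are added, larger cohorts give a more stable picture of both.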

Technology and Platform Strategy
Choosing the right sequencing platform for a de novo genome project depends on genome size, complexity, and the desired assembly quality. The table below summarizes the complementary roles of each platform in our workflow.
| Platform | Read Type | Typical Length | Accuracy | Primary Role in De Novo Assembly |
|---|---|---|---|---|
| Illumina | Short paired-end | 150 bp × 2 | >Q30 | Genome survey, polishing, variant validation |
| PacBio HiFi | Circular consensus | 15–20 kb | ≥Q20 (≥99%) | Primary contig backbone, haplotype resolution |
| Oxford Nanopore | Native long reads | 10–100+ kb | Varies by basecaller | Gap closure, repeat spanning, SV detection |
| Hi-C | Chromatin conformation capture | PE150 | >Q30 | Chromosome anchoring and scaffolding |
For most plant and animal de novo projects, a hybrid strategy combining HiFi (30–60×) with Hi-C (≥100×) delivers chromosome-scale assemblies with contig N50 values exceeding 10 Mb. Adding ONT ultra-long reads (40–100×) enables T2T resolution. For microbial genomes, a simpler ONT + Illumina hybrid approach is typically sufficient to generate complete, closed assemblies.
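The coverage targets above translate into sequencing yield through simple arithmetic: required bases = genome size × coverage. The sketch below illustrates this for a hypothetical 1 Gb genome; the function names and the example genome size are assumptions for illustration, and actual project quotes account for additional factors such as read-length distribution and QC losses.

```python
from typing import Dict


def required_yield_gb(genome_size_mb: float, coverage: float) -> float:
    """Sequencing yield in gigabases needed to reach the given fold coverage."""
    if genome_size_mb <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_mb * coverage / 1000


def sequencing_plan(genome_size_mb: float, coverage_by_platform: Dict[str, float]) -> Dict[str, float]:
    """Yield in Gb per platform for one genome, given a coverage target per platform."""
    return {
        platform: required_yield_gb(genome_size_mb, coverage)
        for platform, coverage in coverage_by_platform.items()
    }


if __name__ == "__main__":
    # Lower-bound coverage targets taken from the text; genome size is hypothetical.
    targets = {"Illumina survey": 50, "PacBio HiFi": 30, "Hi-C": 100}
    for platform, gb in sequencing_plan(1000, targets).items():
        print(f"{platform}: {gb:.0f} Gb")
```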
De Novo Genome Sequencing Workflow
Our end-to-end workflow is designed to maintain sample traceability and quality control at every step, from sample receipt to final data delivery.

1. Sample QC
Integrity assessment via PFGE or Femto Pulse. Purity checks (OD260/280, OD260/230) and DNA quantification by Qubit. RNA removal.
2. Genome Survey (Illumina)
~50× coverage short-read sequencing for k-mer analysis. Output includes genome size estimate, heterozygosity rate, repeat content, and GC content.
3. Long-Read Sequencing
PacBio HiFi (30–60×) and/or ONT ultra-long library preparation and sequencing. Platform selection based on genome complexity and project scope.
4. Assembly and Scaffolding
De novo assembly using hifiasm or equivalent. Hi-C reads for chromosome-level scaffolding with 3D-DNA or equivalent pipelines.
5. Polishing and Gap Closure
Short-read polishing for base-level correction. ONT ultra-long reads for resolving remaining gaps. Iterative refinement until target quality is met.
6. Quality Assessment
BUSCO completeness scoring, contiguity metrics (N50), k-mer-based QV estimation (Merqury), and LAI for plant genomes. Hi-C contact map validation.
7. Genome Annotation (Optional)
Repeat masking, gene structure prediction, functional annotation (GO, KEGG, Pfam, InterPro), and non-coding RNA identification.
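As an illustration of the genome survey in step 2, the sketch below estimates genome size from a k-mer depth histogram using the common approximation: genome size ≈ total k-mers / depth of the main peak. It is a deliberately simplified stand-in, not the model-fitting approach used by dedicated survey tools. It assumes the tallest peak above the error valley is the homozygous peak, which may not hold for highly heterozygous or polyploid genomes. The histogram values are made up.

```python
from typing import Dict


def estimate_genome_size(hist: Dict[int, int]) -> Dict[str, float]:
    """Estimate genome size from a k-mer histogram (depth -> number of distinct k-mers)."""
    depths = sorted(hist)
    # Low-depth k-mers are mostly sequencing errors. Locate the valley where
    # counts stop falling and start rising toward the main coverage peak.
    for prev, cur in zip(depths, depths[1:]):
        if hist[cur] > hist[prev]:
            valley = prev
            break
    else:
        raise ValueError("no coverage peak found after the error k-mers")
    solid = [d for d in depths if d > valley]         # depths above the valley
    peak = max(solid, key=lambda d: hist[d])          # main coverage peak
    total_kmers = sum(d * hist[d] for d in solid)     # k-mer occurrences, errors excluded
    return {"valley": valley, "peak_depth": peak, "genome_size_bp": total_kmers / peak}


if __name__ == "__main__":
    # Toy histogram: error k-mers at depths 1-3, main peak at depth 10.
    toy = {1: 900, 2: 100, 3: 10, 9: 200, 10: 600, 11: 200}
    print(estimate_genome_size(toy))
```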
Bioinformatics Analysis
Our bioinformatics pipeline is designed to deliver actionable genomic data, not just raw sequences.
Standard deliverables include:
- Genome survey report — K-mer analysis with size, heterozygosity, and repeat estimates
- De novo assembly — FASTA format with primary and alternate haplotypes if applicable
- Assembly quality assessment — Contig N50, scaffold N50, BUSCO completeness, QV score, LAI (plants)
- Genome annotation — Repeat elements, gene structure prediction, functional annotation (GO, KEGG, Pfam, InterPro)
- Hi-C validation — Contact map and chromosome anchoring verification
- Comprehensive QC report
Optional add-on analyses:
- Comparative genomics — ortholog clustering, gene family expansion/contraction, phylogeny, synteny
- Pan-genome construction — core/accessory genome classification, PAV analysis, graph-based representation
- Haplotype-resolved assembly and phasing
- Epigenetic analysis — 5mC methylation detection (PacBio HiFi or ONT native)
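For readers unfamiliar with the contiguity metrics listed among the standard deliverables, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch follows, with made-up contig lengths; the same function applies to scaffold lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


if __name__ == "__main__":
    # Hypothetical contig lengths in Mb.
    contigs = [80, 70, 50, 40, 30, 20]
    print("contig N50:", n50(contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why it is reported alongside BUSCO, QV, and Hi-C contact map validation.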

Sample Requirements
Proper sample quality is the foundation of a successful de novo genome assembly, especially for long-read and Hi-C sequencing.
| Sample Type | Recommended Input | Concentration | Purity (OD260/280) | Notes |
|---|---|---|---|---|
| High molecular weight gDNA | ≥1 µg (short-read) to ≥5 µg (long-read) | ≥30 ng/µL | 1.8–2.0 | Fresh tissue preferred for long-read projects; no degradation |
| Tissue (for extraction) | ≥100 mg fresh weight | — | — | Snap-frozen in liquid nitrogen; avoid RNAlater for HMW DNA |
| Whole Blood (vertebrate) | ≥2 mL | — | — | EDTA anticoagulant; store at 4°C, ship on cold packs |
| Microbial culture pellet | ≥10⁸ cells (bacteria); ≥10⁷ cells (fungi) | — | — | Snap-frozen or in DNA preservation buffer |
| Hi-C sample | Same source as main DNA | — | — | Requires cross-linked fresh tissue; cannot use archived DNA |
All samples undergo in-house QC upon receipt. We also offer DNA extraction services for projects where sample preparation is a concern.
Deliverables
CD Genomics provides comprehensive and organized deliverables for every de novo genome sequencing project, tailored for seamless downstream analysis and publication.
| Deliverable | Description |
|---|---|
| Raw sequencing data | FASTQ files per platform and library |
| Assembly files | Genome assembly in FASTA format (primary + alternate haplotypes if applicable) |
| Assembly quality report | N50 metrics, BUSCO completeness, QV score, LAI (plants), Hi-C validation |
| Genome annotation | GFF3 annotation file (repeat, gene structure, functional annotation) |
| Comparative analysis report | Optional — ortholog tables, phylogeny, synteny blocks, PAV results |
| Project documentation | Methods summary, software and parameter logs, data usage guide |
Why Choose CD Genomics for De Novo Genome Sequencing
From advanced sequencing platforms to high-quality data delivery, CD Genomics offers an efficient, end-to-end solution tailored to diverse de novo genome sequencing needs.
- Full-spectrum coverage — From viral and bacterial genomes to complex plant and animal assemblies. One provider, one point of contact.
- Multi-platform strategy — Illumina, PacBio HiFi, Oxford Nanopore, and Hi-C available in-house. Platform combinations tailored to each genome.
- Proven track record — Completed de novo genome projects across diverse species, meeting EBP reference-grade standards (6.C.Q40 or higher).
- T2T capability — Dedicated T2T service using the HiFi + ultra-long ONT synergy for vertebrates, plants, and bacteria.
- End-to-end service — Sample QC, library preparation, sequencing, assembly, annotation, and optional pan-genome analysis under a single workflow.
CD Genomics is committed to supporting your genomic discoveries with dependable and comprehensive de novo genome sequencing services.

Demo Results
Below are representative data types generated during a typical de novo genome sequencing project. Results will vary by species and project scope.
Figure 1: K-mer Distribution (Genome Survey)
K-mer frequency plot from short-read data, used to estimate genome size, heterozygosity, and repeat content. This analysis guides downstream platform selection and coverage targets.
Figure 2: Hi-C Chromatin Interaction Heatmap
Genome-wide Hi-C contact map used to order and orient contigs into chromosome-scale scaffolds. Strong diagonal signal indicates correct scaffolding.
Figure 3: BUSCO Completeness Assessment
Percentage of complete, fragmented, duplicated, and missing BUSCO genes. Reference-quality assemblies typically achieve >95% complete BUSCO scores.
Figure 4: Assembly Contiguity Comparison
Illustrative comparison of contig N50 values across strategies. Hybrid approaches combining PacBio HiFi with Hi-C consistently produce the highest contiguity. (Reference: Hotaling et al., BMC Genomics, 2023)
Reference
- Hotaling S, et al. Highly accurate long reads are crucial for realizing the potential of biodiversity genomics. BMC Genomics. 2023. https://doi.org/10.1186/s12864-023-09193-9
De Novo Genome Sequencing FAQs
1. What types of organisms can you sequence using de novo whole genome sequencing?
We provide de novo sequencing for viruses, bacteria, fungi, plants, and animals — from small microbial genomes (a few kb) to large vertebrate and plant genomes (gigabase scale). Platform strategies are tailored to each organism group.
2. What is the difference between standard de novo assembly and T2T genome assembly?
Standard de novo assembly typically resolves >95% of the genome but leaves gaps in repetitive regions such as centromeres, telomeres, and ribosomal DNA. T2T assembly closes these gaps using ultra-long reads, producing a gapless genome with all chromosomes from telomere to telomere. The Earth BioGenome Project (EBP) defines T2T quality as zero gaps with QV > 60.
3. How many samples are needed for pan-genome analysis?
A minimum of two individuals is required, but larger sample sizes (typically 10–100+) provide more comprehensive coverage of the species' genetic diversity. The optimal number depends on population structure, genetic diversity, and research objectives.
4. What sequencing platforms do you use for de novo genome assembly?
We use Illumina (short reads for survey and polishing), PacBio HiFi (high-accuracy long reads for backbone assembly), Oxford Nanopore (ultra-long reads for gap closure and repeat resolution), and Hi-C (chromosome-scale scaffolding). The platform combination is customized for each project.
5. What quality metrics are used to evaluate genome assemblies?
Standard metrics include contig and scaffold N50 (contiguity), BUSCO completeness (gene content, targeting at least 90–95% complete), QV score (base accuracy, targeting ≥Q40 per EBP standards), LAI (for plant genomes), and Hi-C contact map validation (structural correctness).
6. What are the sample requirements for de novo genome sequencing?
For standard de novo projects, high-molecular-weight gDNA with OD260/280 of 1.8–2.0 and minimal degradation is recommended. Input amounts range from ≥1 µg (short-read) to ≥5 µg (long-read). Fresh tissue is preferred for HMW DNA extraction. See the Sample Requirements section above for detailed guidelines.
7. Can you assemble polyploid or highly heterozygous genomes?
Yes. Our PacBio HiFi-based approach is specifically designed to resolve complex genomes. HiFi reads provide the accuracy needed for haplotype separation in polyploids, and higher coverage (≥60×) is applied for species with elevated heterozygosity or ploidy. A 2024 review in Nature Genetics (doi:10.1038/s41588-024-01830-7) documents successful T2T assemblies across polyploid crops using these same strategies.
8. What bioinformatics analysis is included with the de novo sequencing service?
Standard bioinformatics includes genome survey (k-mer analysis), de novo assembly, assembly quality assessment, and structural/functional annotation. Optional modules include comparative genomics (gene family, synteny, phylogeny), pan-genome construction, haplotype-resolved assembly, and epigenetic analysis.
Case Study: Telomere to Telomere Genome of Fragaria vesca
Open Access Publication Highlight
The telomere-to-telomere genome of Fragaria vesca reveals the genomic evolution of Fragaria and the origin of cultivated octoploid strawberry
Journal: Horticulture Research
Impact Factor: 8.7
Published: 2023
Background
Fragaria vesca (wild strawberry) is a model system for fruit development, plant-pathogen interactions, and functional genomics. Despite its importance, previous genome assemblies lacked continuity with gaps in repetitive regions and unresolved centromeres, limiting structural and functional genomic studies.
Methods
The study used PacBio HiFi long-read sequencing combined with Hi-C chromatin conformation capture. The assembly employed hifiasm for primary contig generation and 3D-DNA for Hi-C-guided scaffolding, followed by manual curation to close remaining gaps.
Results
- Final T2T assembly: 220.8 Mb across all 7 chromosomes as single contigs
- All 14 telomeres and 7 centromeres precisely identified
- BUSCO completeness: 98.2%
- Zero gaps across all chromosomes
Conclusion
This T2T genome provides a gap-free reference for Fragaria genomics, enabling accurate analysis of centromere structure, telomere biology, and evolutionary dynamics. The assembly strategy demonstrates that T2T resolution is achievable for plant genomes using PacBio HiFi and Hi-C — the same approach we apply in our T2T sequencing service.
Figure 1 from Sun P, et al. Horticulture Research, 2023. Chromosome ideograms of the Fragaria vesca T2T assembly.
Reference
- Sun P, et al. The telomere-to-telomere genome of Fragaria vesca reveals the genomic evolution of Fragaria and the origin of cultivated octoploid strawberry. Horticulture Research. 2023. https://doi.org/10.1093/hr/uhad027
Related Publications
Here are publications from projects using de novo genome sequencing and related genomic services:
A de novo assembly of genomic dataset sequences of the sugar beet root maggot Tetanops myopaeformis
Journal: Data in Brief
Year: 2024
Combinations of Bacteriophage Are Efficacious against Multidrug-Resistant Pseudomonas aeruginosa and Enhance Sensitivity to Carbapenem Antibiotics
Journal: Viruses
Year: 2024
Genetic and environmental influences on the distributions of three chromosomal inversion polymorphisms in Anopheles gambiae
Journal: PLOS Genetics
Year: 2025
The genetic legacy of fragmentation and overexploitation in the threatened medicinal plant Aquilaria sinensis
Journal: Scientific Reports
Year: 2020
Generation of a highly attenuated strain of Pseudomonas aeruginosa for commercial production of alginate
Journal: Microbial Biotechnology
Year: 2020
Genome Analysis and Replication Studies of the African Green Monkey Simian Foamy Virus Serotype 3 Strain FV2014
Journal: Viruses
Year: 2020
High-Density Mapping and Candidate Gene Analysis of Pl18 and Pl20 in Sunflower by Whole-Genome Resequencing
Journal: International Journal of Molecular Sciences
Year: 2020
Identification of factors required for m6A mRNA methylation in Arabidopsis reveals a role for the conserved E3 ubiquitin ligase HAKAI
Journal: New Phytologist
Year: 2017
