What Is De Novo Whole Genome Sequencing?

Whole-genome de novo sequencing is the process of assembling a complete genome from scratch — without relying on any pre-existing reference sequence. Unlike resequencing, which aligns reads to a known genome, de novo assembly reconstructs the genome directly from sequencing fragments, making it the method of choice for species that lack a reference genome or for projects that require an unbiased view of genomic content.

CD Genomics provides comprehensive de novo genome sequencing services covering the full spectrum of biological complexity — from viral, bacterial, and fungal genomes to large plant and animal genomes. Our multi-platform approach integrates Illumina short reads, PacBio HiFi long reads, Oxford Nanopore ultra-long reads, and Hi-C chromatin interaction data to deliver assemblies ranging from standard draft quality to telomere-to-telomere (T2T) gapless resolution, including pan-genome construction when population-level diversity is needed.

De Novo Sequencing Across All Organism Types

De novo genome sequencing is not a one-size-fits-all service. Different organism groups present distinct challenges in genome size, ploidy, repeat content, and heterozygosity. Our service portfolio is structured to address these differences with tailored experimental and analytical strategies.

Microbial and plant/animal genome de novo sequencing illustration

Microbial Whole Genome De Novo Sequencing (Viral / Bacterial / Fungal)

Microbial genomes range from a few kilobases (viruses) to tens of megabases (fungi). Our microbial de novo workflow combines ONT or PacBio long reads for contiguous assembly with optional Illumina short reads for polishing. This hybrid approach consistently produces complete, circular chromosomes and fully assembled plasmids.

Viral genomes — Reference-free assembly for DNA and RNA viruses. See our viral genome sequencing service.
Bacterial genomes — Complete chromosome and plasmid assembly with methylation detection. Visit bacterial whole genome de novo sequencing.
Fungal genomes — From yeast to filamentous fungi, including repeat-rich regions. Explore our fungal whole genome de novo sequencing service.

Plant and Animal Whole Genome De Novo Sequencing

Plant and animal genomes are larger and more complex, often involving polyploidy, high heterozygosity, and large repetitive element fractions. Our standard workflow integrates genome survey (Illumina, ~50×), long-read backbone (PacBio HiFi, 30–60×), Hi-C scaffolding (≥100×), and optional ONT ultra-long gap closure. This strategy routinely delivers chromosome-level assemblies across crops, livestock, wildlife, and aquatic species. See our dedicated plant and animal whole genome de novo sequencing page.

Telomere to Telomere (T2T) Genome Assembly

When standard reference-grade assemblies still contain unresolved gaps — in centromeres, telomeres, segmental duplications, or ribosomal DNA arrays — T2T genome assembly is the next step. T2T assembly aims for gapless reconstruction of every chromosome from one end to the other. The Earth BioGenome Project defines T2T quality as zero sequence gaps with a base accuracy QV > 60.
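
To put the QV > 60 threshold in concrete terms: QV is a Phred-scaled value, so it maps to a per-base error probability of 10^(-QV/10). The minimal sketch below converts a QV to an error rate and to an expected error count for a genome of a given size. The function names and the 1 Gb genome size are illustrative assumptions, not part of any pipeline described here.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value (QV) to a per-base error probability."""
    return 10 ** (-qv / 10)


def expected_errors(qv: float, genome_size_bp: int) -> float:
    """Expected number of erroneous bases in an assembly of the given size."""
    return qv_to_error_rate(qv) * genome_size_bp


if __name__ == "__main__":
    genome_size_bp = 1_000_000_000  # hypothetical 1 Gb genome, for illustration only
    for qv in (20, 30, 40, 60):
        rate = qv_to_error_rate(qv)
        errors = expected_errors(qv, genome_size_bp)
        print(f"QV{qv}: error rate {rate:.0e}, ~{errors:,.0f} expected errors per Gb")
```

Under this conversion, QV40 corresponds to about one error per 10,000 bases and QV60 to about one error per million bases.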

Our Technology Synergy

Component	Role	Platform
Accuracy anchor	High-fidelity backbone with base-level precision	PacBio HiFi (15–20 kb, Q20+)
Bridge builder	Ultra-long reads spanning centromeres and rDNA arrays	ONT ultra-long (N50 > 100 kb)
Scaffolding architect	Chromosome ordering and structural validation	Hi-C or Pore-C

The combination of HiFi accuracy with ONT ultra-long contiguity has proven effective even in complex polyploid genomes. A 2024 review in Nature Genetics (Garg et al., doi:10.1038/s41588-024-01830-7) highlights how T2T strategies are now being applied to crops with high ploidy and large repeat fractions, transforming the resolution of plant genomics.

Our T2T service portfolio includes three tiers: Vertebrate T2T (large Gb-scale genomes), Plant T2T (polyploid crops and repeat-rich genomes), and Bacterial T2T (closed circular chromosomes with zero gaps). For detailed information, see our telomere-to-telomere sequencing service.

T2T genome assembly strategy combining PacBio HiFi and ONT ultra-long reads

Pan Genome Analysis

A single reference genome captures the genetic content of one individual, but it cannot represent the full genetic diversity of a species. Pan-genome analysis addresses this by constructing a comprehensive gene repertoire across multiple individuals or strains.

Pan-genomes consist of:

Core genome — Genes present in all individuals, typically responsible for essential biological functions.
Variable (accessory) genome — Genes present in a subset of individuals, often encoding adaptive traits, virulence factors, or strain-specific functions.

Our pan-genome workflow supports both linear and graph-based approaches:

De novo assembly of multiple individuals using the same multi-platform strategy
Core/accessory genome classification
Gene presence-absence variation (PAV) analysis
Phylogenetic analysis based on gene content
Optional graph-based pan-genome construction for complex datasets
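
The core/accessory classification step can be illustrated with a small sketch. It assumes the gene presence-absence data is already available as a mapping from sample name to the set of gene family IDs found in that sample. The strain names, gene IDs, and the function name are hypothetical, and a production workflow would start from ortholog clustering rather than hand-written sets.

```python
def classify_pan_genome(pav: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split gene families into core (present in every sample) and accessory.

    pav maps a sample name to the set of gene family IDs present in that sample.
    """
    if len(pav) < 2:
        raise ValueError("pan-genome classification needs at least two samples")
    gene_sets = list(pav.values())
    pan_genome = set().union(*gene_sets)     # every gene family seen in any sample
    core = set.intersection(*gene_sets)      # gene families shared by all samples
    accessory = pan_genome - core            # gene families missing from at least one sample
    return core, accessory


if __name__ == "__main__":
    # Hypothetical presence-absence data for three strains.
    pav = {
        "strain_A": {"g1", "g2", "g3"},
        "strain_B": {"g1", "g2", "g4"},
        "strain_C": {"g1", "g3", "g4"},
    }
    core, accessory = classify_pan_genome(pav)
    print("core:", sorted(core))
    print("accessory:", sorted(accessory))
```

In this toy example only g1 is shared by all three strains, so it is the core genome, while g2, g3, and g4 are accessory.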

A minimum of two samples is needed, but larger sample sizes (tens to hundreds) provide increasingly comprehensive coverage. Visit our pan genome analysis service for more details.

Pan-genome core and accessory genome composition diagram

Technology and Platform Strategy

Choosing the right sequencing platform for a de novo genome project depends on genome size, complexity, and the desired assembly quality. The table below summarizes the complementary roles of each platform in our workflow.

Platform	Read Type	Typical Length	Accuracy	Primary Role in De Novo Assembly
Illumina	Short paired-end	150 bp × 2	>Q30	Genome survey, polishing, variant validation
PacBio HiFi	Circular consensus	15–20 kb	≥Q20 (≥99%), typically ~Q30 (99.9%)	Primary contig backbone, haplotype resolution
Oxford Nanopore	Native long reads	10–100+ kb	Varies by basecaller	Gap closure, repeat spanning, SV detection
Hi-C	Chromatin conformation capture	PE150	>Q30	Chromosome anchoring and scaffolding

For most plant and animal de novo projects, a hybrid strategy combining HiFi (30–60×) with Hi-C (≥100×) delivers chromosome-scale assemblies with contig N50 values exceeding 10 Mb. Adding ONT ultra-long reads (40–100×) enables T2T resolution. For microbial genomes, a simpler ONT + Illumina hybrid approach is typically sufficient to generate complete, closed assemblies.
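
As a rough planning aid, the coverage targets above translate into sequencing yield by multiplying genome size by coverage. The sketch below applies the coverage ranges quoted on this page to a hypothetical 1.2 Gb genome. The genome size and the function and variable names are assumptions for illustration, and real projects adjust targets for heterozygosity, ploidy, and read length.

```python
def required_yield_gb(genome_size_gb: float, coverage: float) -> float:
    """Sequencing yield in Gb needed to reach a given mean coverage."""
    if genome_size_gb <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_gb * coverage


# Coverage ranges taken from the text; where only a single value or a minimum
# is given (Illumina ~50x, Hi-C >=100x), both ends of the range are the same.
COVERAGE_TARGETS = {
    "Illumina survey": (50, 50),
    "PacBio HiFi": (30, 60),
    "Hi-C": (100, 100),
    "ONT ultra-long (T2T only)": (40, 100),
}


def yield_plan(genome_size_gb: float, targets: dict = COVERAGE_TARGETS) -> dict:
    """Return the (low, high) yield in Gb per platform for one genome size."""
    return {
        name: (required_yield_gb(genome_size_gb, low), required_yield_gb(genome_size_gb, high))
        for name, (low, high) in targets.items()
    }


if __name__ == "__main__":
    genome_size_gb = 1.2  # hypothetical genome size
    for platform, (low, high) in yield_plan(genome_size_gb).items():
        print(f"{platform}: {low:.0f}-{high:.0f} Gb")
```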

De Novo Genome Sequencing Workflow

Our end-to-end workflow is designed to maintain sample traceability and quality control at every step, from sample receipt to final data delivery.

De novo genome sequencing workflow from sample QC to genome annotation

1. Sample QC

Integrity assessment via PFGE or Femto Pulse. Purity checks by absorbance ratio (OD260/280, OD260/230). DNA quantification by Qubit. RNA removal.

2. Genome Survey (Illumina)

~50× coverage short-read sequencing for k-mer analysis. Output includes genome size estimate, heterozygosity rate, repeat content, and GC content.
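
The idea behind the k-mer genome size estimate can be sketched in a few lines: divide the total number of k-mers in the reads by the depth of the main (homozygous) peak of the k-mer histogram. The example below is a deliberately simplified illustration. It assumes a histogram already produced by a k-mer counter, treats everything below a fixed depth cutoff as sequencing error, and takes the highest remaining bin as the peak. The function name, the cutoff, and the toy histogram are assumptions. Dedicated tools such as GenomeScope fit a statistical model instead, and this simple peak pick can choose the wrong peak for highly heterozygous genomes.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> tuple[int, float]:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    Returns (peak_depth, estimated_genome_size_bp).
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above the minimum depth")
    peak_depth = max(kept, key=kept.get)                     # depth bin with the most distinct k-mers
    total_kmers = sum(depth * count for depth, count in kept.items())
    return peak_depth, total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram: error k-mers at depth 1-2, main peak near 50x, repeats at 100x.
    toy_histogram = {
        1: 500_000, 2: 100_000,
        48: 20_000, 49: 40_000, 50: 60_000, 51: 40_000, 52: 20_000,
        100: 5_000,
    }
    peak, size = estimate_genome_size(toy_histogram)
    print(f"peak depth: {peak}x, estimated genome size: {size:,.0f} bp")
```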

3. Long-Read Sequencing

PacBio HiFi (30–60×) and/or ONT ultra-long library preparation and sequencing. Platform selection based on genome complexity and project scope.

4. Assembly and Scaffolding

De novo assembly using hifiasm or equivalent. Hi-C reads for chromosome-level scaffolding with 3D-DNA or equivalent pipelines.

5. Polishing and Gap Closure

Short-read polishing for base-level correction. ONT ultra-long reads for resolving remaining gaps. Iterative refinement until target quality is met.

6. Quality Assessment

BUSCO completeness scoring, contiguity metrics (N50), k-mer-based QV estimation (Merqury), and LAI for plant genomes. Hi-C contact map validation.
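
For readers unfamiliar with the contiguity metric, N50 is the length of the contig (or scaffold) at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly size. A minimal sketch, with a hypothetical function name and toy contig lengths:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the cumulative sum to >= 50% of the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches the total")


if __name__ == "__main__":
    contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # toy values, arbitrary units
    print("N50:", n50(contig_lengths))
```

In the toy example the total is 300, and the two longest contigs (80 + 70 = 150) reach half of it, so the N50 is 70.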

7. Genome Annotation (Optional)

Repeat masking, gene structure prediction, functional annotation (GO, KEGG, Pfam, InterPro), and non-coding RNA identification.

Bioinformatics Analysis

Our bioinformatics pipeline is designed to deliver actionable genomic data, not just raw sequences.

Standard deliverables include:

Genome survey report — K-mer analysis with size, heterozygosity, and repeat estimates
De novo assembly — FASTA format with primary and alternate haplotypes if applicable
Assembly quality assessment — Contig N50, scaffold N50, BUSCO completeness, QV score, LAI (plants)
Genome annotation — Repeat elements, gene structure prediction, functional annotation (GO, KEGG, Pfam, InterPro)
Hi-C validation — Contact map and chromosome anchoring verification
Comprehensive QC report

Optional add-on analyses:

Comparative genomics — ortholog clustering, gene family expansion/contraction, phylogeny, synteny
Pan-genome construction — core/accessory genome classification, PAV analysis, graph-based representation
Haplotype-resolved assembly and phasing
Epigenetic analysis — 5mC methylation detection (PacBio HiFi or ONT native)

Bioinformatics analysis pipeline for de novo genome assembly

Sample Requirements

Proper sample quality is the foundation of a successful de novo genome assembly, especially for long-read and Hi-C sequencing.

Sample Type	Recommended Input	Concentration	Purity (OD260/280)	Notes
High molecular weight gDNA	≥1 µg (short-read) to ≥5 µg (long-read)	≥30 ng/µL	1.8–2.0	Fresh tissue preferred for long-read projects; no degradation
Tissue (for extraction)	≥100 mg fresh weight	—	—	Snap-frozen in liquid nitrogen; avoid RNAlater for HMW DNA
Whole Blood (vertebrate)	≥2 mL	—	—	EDTA anticoagulant; store at 4°C, ship on cold packs
Microbial culture pellet	≥10⁸ cells (bacteria); ≥10⁷ cells (fungi)	—	—	Snap-frozen or in DNA preservation buffer
Hi-C sample	Same source as main DNA	—	—	Requires cross-linked fresh tissue; cannot use archived DNA

All samples undergo in-house QC upon receipt. We also offer DNA extraction services for projects where sample preparation is a concern.

Deliverables

CD Genomics provides comprehensive and organized deliverables for every de novo genome sequencing project, tailored for seamless downstream analysis and publication.

Deliverable	Description
Raw sequencing data	FASTQ files per platform and library
Assembly files	Genome assembly in FASTA format (primary + alternate haplotypes if applicable)
Assembly quality report	N50 metrics, BUSCO completeness, QV score, LAI (plants), Hi-C validation
Genome annotation	GFF3 annotation file (repeat, gene structure, functional annotation)
Comparative analysis report	Optional — ortholog tables, phylogeny, synteny blocks, PAV results
Project documentation	Methods summary, software and parameter logs, data usage guide

Why Choose CD Genomics for De Novo Genome Sequencing

From advanced sequencing platforms to high-quality data delivery, CD Genomics offers an efficient, end-to-end solution tailored to diverse de novo genome sequencing needs.

Full-spectrum coverage — From viral and bacterial genomes to complex plant and animal assemblies. One provider, one point of contact.
Multi-platform strategy — Illumina, PacBio HiFi, Oxford Nanopore, and Hi-C available in-house. Platform combinations tailored to each genome.
Proven track record — Completed de novo genome projects across diverse species, meeting EBP reference-grade standards (6.C.Q40 or higher).
T2T capability — Dedicated T2T service using the HiFi + ultra-long ONT synergy for vertebrates, plants, and bacteria.
End-to-end service — Sample QC, library preparation, sequencing, assembly, annotation, and optional pan-genome analysis under a single workflow.

CD Genomics is committed to supporting your genomic discoveries with dependable and comprehensive de novo genome sequencing services.

De novo genome sequencing service overview

De Novo Genome Sequencing: From Any Species to Publication-Ready Assembly

Demo Results

Below are representative data types generated during a typical de novo genome sequencing project. Results will vary by species and project scope.

Figure 1: K-mer Distribution (Genome Survey)
K-mer frequency plot from short-read data, used to estimate genome size, heterozygosity, and repeat content. This analysis guides downstream platform selection and coverage targets.

Figure 2: Hi-C Chromatin Interaction Heatmap
Genome-wide Hi-C contact map used to order and orient contigs into chromosome-scale scaffolds. Strong diagonal signal indicates correct scaffolding.

Figure 3: BUSCO Completeness Assessment
Percentage of complete, fragmented, duplicated, and missing BUSCO genes. Reference-quality assemblies typically achieve >95% complete BUSCO scores.

Figure 4: Assembly Contiguity Comparison
Illustrative comparison of contig N50 values across strategies. Hybrid approaches combining PacBio HiFi with Hi-C consistently produce the highest contiguity. (Reference: Hotaling et al., BMC Genomics, 2023)

Reference

Hotaling S, et al. Highly accurate long reads are crucial for realizing the potential of biodiversity genomics. BMC Genomics. 2023. https://doi.org/10.1186/s12864-023-09193-9

De Novo Genome Sequencing FAQs

1. What types of organisms can you sequence using de novo whole genome sequencing?

We provide de novo sequencing for viruses, bacteria, fungi, plants, and animals — from small microbial genomes (a few kb) to large vertebrate and plant genomes (gigabase scale). Platform strategies are tailored to each organism group.

2. What is the difference between standard de novo assembly and T2T genome assembly?

Standard de novo assembly typically resolves >95% of the genome but leaves gaps in repetitive regions such as centromeres, telomeres, and ribosomal DNA. T2T assembly closes these gaps using ultra-long reads, producing a gapless genome with all chromosomes from telomere to telomere. The Earth BioGenome Project (EBP) defines T2T quality as zero gaps with QV > 60.

3. How many samples are needed for pan-genome analysis?

A minimum of two individuals is required, but larger sample sizes (typically 10–100+) provide more comprehensive coverage of the species' genetic diversity. The optimal number depends on population structure, genetic diversity, and research objectives.

4. What sequencing platforms do you use for de novo genome assembly?

We use Illumina (short reads for survey and polishing), PacBio HiFi (high-accuracy long reads for backbone assembly), Oxford Nanopore (ultra-long reads for gap closure and repeat resolution), and Hi-C (chromosome-scale scaffolding). The platform combination is customized for each project.

5. What quality metrics are used to evaluate genome assemblies?

Standard metrics include contig N50 and scaffold N50 (contiguity), BUSCO completeness (gene content, targeting at least 90–95% complete), QV score (base accuracy, targeting ≥Q40 per EBP standards), LAI (for plant genomes), and Hi-C contact map validation (structural correctness).

6. What are the sample requirements for de novo genome sequencing?

For standard de novo projects, high-molecular-weight gDNA with OD260/280 of 1.8–2.0 and minimal degradation is recommended. Input amounts range from ≥1 µg (short-read) to ≥5 µg (long-read). Fresh tissue is preferred for HMW DNA extraction. See the Sample Requirements section above for detailed guidelines.

7. Can you assemble polyploid or highly heterozygous genomes?

Yes. Our PacBio HiFi-based approach is specifically designed to resolve complex genomes. HiFi reads provide the accuracy needed for haplotype separation in polyploids, and higher coverage (≥60×) is applied for species with elevated heterozygosity or ploidy. A 2024 review in Nature Genetics (doi:10.1038/s41588-024-01830-7) documents successful T2T assemblies across polyploid crops using these same strategies.

8. What bioinformatics analysis is included with the de novo sequencing service?

Standard bioinformatics includes genome survey (k-mer analysis), de novo assembly, assembly quality assessment, and structural/functional annotation. Optional modules include comparative genomics (gene family, synteny, phylogeny), pan-genome construction, haplotype-resolved assembly, and epigenetic analysis.

Case Study: Telomere to Telomere Genome of Fragaria vesca

Open Access Publication Highlight

The telomere-to-telomere genome of Fragaria vesca reveals the genomic evolution of Fragaria and the origin of cultivated octoploid strawberry

Journal: Horticulture Research
Impact Factor: 8.7
Published: 2023

Background

Fragaria vesca (wild strawberry) is a model system for fruit development, plant-pathogen interactions, and functional genomics. Despite its importance, previous genome assemblies were fragmented, with gaps in repetitive regions and unresolved centromeres, which limited structural and functional genomic studies.

Methods

The study used PacBio HiFi long-read sequencing combined with Hi-C chromatin conformation capture. The assembly employed hifiasm for primary contig generation and 3D-DNA for Hi-C-guided scaffolding, followed by manual curation to close remaining gaps.

Results

Final T2T assembly: 220.8 Mb across all 7 chromosomes as single contigs
All 14 telomeres and 7 centromeres precisely identified
BUSCO completeness: 98.2%
Zero gaps across all chromosomes

Conclusion

This T2T genome provides a gap-free reference for Fragaria genomics, enabling accurate analysis of centromere structure, telomere biology, and evolutionary dynamics. The assembly strategy demonstrates that T2T resolution is achievable for plant genomes using PacBio HiFi and Hi-C — the same approach we apply in our T2T sequencing service.

Figure 1 from Sun P, et al. Horticulture Research, 2023. Chromosome ideograms of the Fragaria vesca T2T assembly.

Reference

Sun P, et al. The telomere-to-telomere genome of Fragaria vesca reveals the genomic evolution of Fragaria and the origin of cultivated octoploid strawberry. Horticulture Research. 2023. https://doi.org/10.1093/hr/uhad027

Related Publications

Here are publications from projects using de novo genome sequencing and related genomic services:

A de novo assembly of genomic dataset sequences of the sugar beet root maggot Tetanops myopaeformis

Journal: Data in Brief

Year: 2024

https://doi.org/10.1016/j.dib.2024.110298

Combinations of Bacteriophage Are Efficacious against Multidrug-Resistant Pseudomonas aeruginosa and Enhance Sensitivity to Carbapenem Antibiotics

Journal: Viruses

Year: 2024

https://doi.org/10.3390/v16071000

Genetic and environmental influences on the distributions of three chromosomal inversion polymorphisms in Anopheles gambiae

Journal: PLOS Genetics

Year: 2025

https://doi.org/10.1371/journal.pgen.1011742

The genetic legacy of fragmentation and overexploitation in the threatened medicinal plant Aquilaria sinensis

Journal: Scientific Reports

Year: 2020

https://doi.org/10.1038/s41598-020-76654-6

Generation of a highly attenuated strain of Pseudomonas aeruginosa for commercial production of alginate