Introduction – What Is Genome Survey Sequencing and Why It Matters
Genome Survey Sequencing (GSS) is a cost-efficient approach designed to capture the fundamental features of a genome before investing in full-scale assembly projects. By applying low-depth Illumina sequencing combined with k-mer distribution analysis, GSS generates essential statistics such as genome size, heterozygosity rate, repeat sequence proportion, and GC content. These parameters not only provide a first glance into the structural complexity of a species' genome but also act as critical indicators of whether subsequent de novo sequencing and assembly will be straightforward or technically challenging.
For researchers working on plants, animals, or other non-model organisms, uncertainties in genome size or complexity often lead to difficulties in project design. A survey-level dataset reduces these risks by offering a reliable estimate of assembly difficulty, ensuring that resources are allocated wisely and sequencing strategies are tailored to the biological reality of each organism.
Advantages – Genome Survey Sequencing Benefits for Genomic Research
- Reliable genome size estimation
K-mer analysis allows accurate inference of genome size without requiring prior references, which is particularly valuable for species with limited genomic information.
- Early detection of complexity factors
Peaks in the k-mer curve reveal heterozygosity levels and repetitive elements, enabling researchers to anticipate challenges such as fragmented assemblies or misassembled repeats.
- Guidance for downstream sequencing
The survey results highlight whether a genome can be resolved using short-read sequencing alone or whether long-read platforms and hybrid approaches are recommended.
- Cost-effective project planning
Instead of committing to expensive high-coverage sequencing immediately, GSS ensures that initial investment produces interpretable results that guide subsequent stages of genome exploration.
- Broader applications in comparative genomics
When multiple related species or populations are studied, genome survey data provide a quantitative basis for comparing genomic complexity, GC content distribution, or structural divergence.
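To make the genome size estimate concrete, the sketch below applies the common peak-based rule: genome size is approximately the total number of k-mer instances divided by the depth of the main peak. It is a minimal illustration only, not our production pipeline. The function name, the `min_depth` error cutoff, and the toy histogram are all assumptions made up for the example; production analyses typically fit a statistical model to the whole curve rather than reading off a single peak.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Peak-based genome size estimate from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    min_depth is a hypothetical cutoff for discarding error k-mers.
    """
    # Very low depths are dominated by sequencing-error k-mers, so drop them.
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Treat the depth with the most distinct k-mers as the main peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer instances = sum of (depth * distinct k-mers at that depth).
    total_kmers = sum(depth * n for depth, n in usable.items())
    return peak_depth, total_kmers // peak_depth


# Toy histogram (invented numbers): error k-mers at depth 1-2,
# a main peak at depth 10, and a small repeat shoulder at depth 20.
toy_histogram = {1: 900, 2: 300, 8: 40, 9: 120, 10: 200, 11: 120, 12: 40, 20: 10}
peak, size = estimate_genome_size(toy_histogram)
print(f"main peak depth: {peak}, estimated genome size: {size} bp")
```

Because repeat k-mers sit at multiples of the main peak depth, summing depth times count lets them contribute in proportion to their copy number.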
Service Workflow – Our Genome Survey Sequencing Pipeline
Genome Survey Sequencing can be tailored to different research contexts. CD Genomics provides two complementary service modes to accommodate both researchers who need complete sequencing support and those who have already generated raw data but require in-depth analysis.
Option A: Genome Survey Sequencing + Bioinformatics Analysis
For projects starting from DNA samples, we provide an integrated pipeline that ensures data quality and robust interpretation.

1. Sample Quality Control and DNA Extraction
- Assessment of DNA integrity, purity, and concentration.
- Extraction from plant tissues, animal organs, or microbial cells using optimized protocols to minimize contamination.
2. Low-Depth Illumina Sequencing
- Construction of short-insert libraries (e.g., ~270 bp).
- Paired-end sequencing (typically PE150) at ~50× depth to capture representative genome coverage.
3. Data Quality Assessment
- Filtering raw reads to obtain high-quality clean data.
- Calculation of Q20/Q30, GC distribution, and evaluation of potential contamination or organelle DNA proportion.
4. K-mer Analysis
- Generation of k-mer frequency distribution curves.
- Identification of main peak, heterozygous peak, and repeat peaks to derive genome size, heterozygosity, and repeat content.
5. Result Interpretation and Strategy Recommendation
- Compilation of comprehensive metrics and visual outputs.
- Guidance on de novo genome sequencing strategy, including whether to complement with long-read or Hi-C data.
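The Q20/Q30 and GC metrics named in step 3 can be sketched as follows. This is a minimal example under stated assumptions: it assumes Phred+33 quality encoding, takes reads as already-parsed (sequence, quality) pairs rather than parsing FASTQ files, and uses the hypothetical function name `read_metrics` with invented toy reads.

```python
def read_metrics(reads, phred_offset=33):
    """Return base-level Q20, Q30, and GC fractions.

    reads is an iterable of (sequence, quality_string) pairs.
    phred_offset=33 assumes Phred+33 encoded qualities.
    """
    total = q20 = q30 = gc = 0
    for seq, qual in reads:
        for base, q_char in zip(seq, qual):
            score = ord(q_char) - phred_offset  # decode the Phred score
            total += 1
            q20 += score >= 20
            q30 += score >= 30
            gc += base in "GCgc"
    if total == 0:
        raise ValueError("no bases to summarize")
    return {"Q20": q20 / total, "Q30": q30 / total, "GC": gc / total}


# Toy reads (invented): 'I' decodes to Q40, '5' to Q20, '+' to Q10.
toy_reads = [("ACGT", "IIII"), ("GGCC", "5555"), ("ATAT", "+++I")]
metrics = read_metrics(toy_reads)
print(metrics)
```

Q20 and Q30 here are the fractions of bases with a Phred score of at least 20 and 30, which correspond to expected error rates of at most 1% and 0.1% per base.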
Option B: Genome Survey Bioinformatics Analysis Only
For researchers who already possess short-read sequencing data, our team can directly perform advanced k-mer based analyses without additional sequencing.
1. Input Data Review and QC
- Verification of sequencing format and quality standards.
- Removal of low-quality reads, adapters, and contaminants.
2. K-mer Frequency Distribution
- Application of high-performance algorithms (e.g., Jellyfish, KMC) to compute k-mer counts.
- Estimation of genome size, repeat sequence proportion, heterozygosity, and GC content.
3. Analytical Report Generation
- Delivery of genome survey results with detailed plots, statistical summaries, and tailored recommendations for subsequent de novo genome sequencing.
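For readers unfamiliar with what a k-mer frequency distribution is, the toy sketch below counts k-mers in plain Python and turns the counts into a depth histogram. It is not how Jellyfish or KMC are implemented, and it would not scale to real datasets; the function names are hypothetical. It assumes canonical counting (a k-mer and its reverse complement are counted together) and skips k-mers containing N.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k):
    """Map k-mer depth -> number of distinct canonical k-mers at that depth."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers containing ambiguous bases
                continue
            counts[canonical(kmer)] += 1
    # Collapse per-k-mer counts into a depth histogram.
    return Counter(counts.values())


# Toy input (invented): the same 6 bp read sequenced twice, with k = 3.
histogram = kmer_histogram(["ACGTAC", "ACGTAC"], k=3)
print(dict(histogram))
```

Plotting depth against the number of distinct k-mers gives the frequency curve from which the main, heterozygous, and repeat peaks are read.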

Customer to Provide – What Researchers Should Prepare
To ensure reliable and reproducible results from Genome Survey Sequencing, proper sample preparation and clear project objectives are essential. We recommend that researchers prepare the following:
- High-Quality Genomic DNA
- Provide freshly extracted genomic DNA with high integrity and minimal degradation.
- For plants, young leaves or sterile seedling tissue are preferred to reduce polysaccharide and polyphenol contamination.
- For animals, high-quality DNA from whole blood, liver, or other suitable tissues is recommended.
- Sufficient DNA Quantity and Concentration
- Adequate DNA input is required to construct short-insert libraries for Illumina sequencing.
- For bioinformatics-only analysis, raw sequencing data in standard FASTQ format is sufficient.
- Project Information
- Define research objectives clearly, such as genome size estimation, heterozygosity assessment, or guiding de novo genome sequencing.
- Providing any available reference information (e.g., genome size of related species) helps refine k-mer analysis and improve the accuracy of genome complexity assessment.
- Optional Requirements
- Indicate whether downstream applications such as de novo genome assembly or hybrid sequencing (PacBio HiFi/ONT + Illumina) are planned. This allows us to tailor the survey sequencing strategy for your specific research needs.
Deliverables – What You Will Receive
Depending on your chosen service model, you will receive a clear, publication-ready package that supports downstream de novo genome projects.
Service Description – Flexible Genome Survey Service Models
To meet different research needs, CD Genomics offers two service configurations for Genome Survey Sequencing (GSS). You can either choose a complete workflow from DNA to report, or opt for bioinformatics-only analysis if sequencing data has already been generated in-house or by another provider.
| Service Model | Suitable For | Workflow Components | Key Deliverables | Advantages |
|---|---|---|---|---|
| Option A: Genome Survey Sequencing + Analysis | Researchers starting with biological samples (plant, animal, or microbial) | | | One-stop solution with both sequencing and analysis; ensures consistency and reliability; directly links to downstream de novo assembly services |
| Option B: Genome Survey Bioinformatics Analysis Only | Researchers who already have Illumina short-read data (FASTQ format) | | | |
Applications – When to Choose Genome Survey Sequencing
Genome Survey Sequencing is not an end point but a strategic entry into deeper genome exploration. By providing rapid and reliable estimates of genome architecture, GSS enables researchers to make informed decisions in diverse research scenarios:
- Feasibility Assessment for De Novo Genome Projects
Before committing resources to high-depth sequencing and assembly, GSS helps evaluate whether the target genome is suitable for de novo strategies, particularly in species with unknown or complex genomes.
- Plant and Animal Breeding Programs
In agricultural and aquaculture research, genome size and heterozygosity information supports parental line selection, breeding strategy optimization, and evaluation of genetic diversity across populations.
- Comparative and Evolutionary Genomics
GSS provides genome size, repeat content, and GC composition data that are valuable for distinguishing closely related species and tracing genome evolution across taxa.
- Genome Complexity and Repeat Analysis
Identifying high repeat content or elevated heterozygosity early on allows researchers to adjust their sequencing design, for example by integrating long-read or hybrid assembly approaches, ensuring higher-quality assemblies.
- Reference Genome Preparation
When building a reference genome for a new organism, GSS lays the groundwork by defining baseline parameters that guide library construction, sequencing depth, and assembly algorithms.
Case Study
(Zheng, Z., Zhang, N., Huang, Z., et al., Scientific Reports, 2022)
Research Subject: Platostoma palustre (Lamiaceae), a non-model edible and medicinal herb.
Research Methods: Low-depth Illumina paired-end sequencing; k-mer frequency analysis; draft de novo assembly; SSR mining and validation
Research Focus:
1. K-mer analysis estimated genome size at ~1.21 Gb, with ~70.62% repeats and ~0.33% heterozygosity, defining a large, repeat-rich yet low-heterozygosity genome.
2. From the survey data, 15,498 SSR motifs were identified (dinucleotides most abundant; AT/TA ~44.28%), providing a rich marker resource for this species.
3. Validated SSRs separated P. palustre accessions from related Lamiaceae taxa into distinct groups, supporting germplasm characterization and downstream breeding studies.
Estimation of P. palustre genome size using k-mer (k = 19) analysis.
FAQ – Common Questions About Genome Survey Sequencing
Q1: What makes Genome Survey Sequencing different from full de novo genome sequencing?
A: GSS focuses on low-depth Illumina sequencing and k-mer analysis to assess genome size, heterozygosity, repeats, and GC content. It does not produce a complete genome assembly but guides whether de novo sequencing is feasible and how to design it effectively.
Q2: How much sequencing depth is recommended for a genome survey?
A: Typically, ~50× coverage with short-insert Illumina paired-end reads is sufficient to generate reliable k-mer frequency distributions and accurate genome size estimation.
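As a rough planning aid, the arithmetic behind a ~50× survey can be sketched as below. The 1 Gb genome size is a hypothetical placeholder, the function names are invented for this example, and the k-mer depth relation assumes error-free reads of uniform length, so treat the output as an order-of-magnitude guide rather than a quote.

```python
def survey_yield(genome_size_bp, depth=50, read_length=150):
    """Return (total bases, read pairs) needed for the target depth with paired-end reads."""
    bases = genome_size_bp * depth
    bases_per_pair = 2 * read_length
    read_pairs = -(-bases // bases_per_pair)  # integer ceiling division
    return bases, read_pairs


def expected_kmer_depth(base_depth, read_length, k):
    """Expected k-mer depth for a given base depth: each read of length L yields L - k + 1 k-mers."""
    return base_depth * (read_length - k + 1) / read_length


# Hypothetical 1 Gb genome, PE150 reads, 50x target depth, k = 19.
bases, pairs = survey_yield(1_000_000_000, depth=50, read_length=150)
kmer_depth = expected_kmer_depth(50, read_length=150, k=19)
print(f"{bases:,} bases, {pairs:,} read pairs, expected k-mer depth {kmer_depth:.1f}")
```

The k-mer depth is lower than the base depth because each read contributes fewer k-mers than bases, which is why the main peak of the k-mer curve sits below the nominal sequencing depth.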
Q3: Can I submit my own sequencing data for analysis?
A: Yes. If you already have raw Illumina FASTQ files, we provide bioinformatics-only genome survey analysis, including data QC, k-mer distribution analysis, and genome feature reporting.
Q4: What genome features can be revealed by k-mer analysis?
A: K-mer profiling provides estimates of genome size, heterozygosity, repeat sequence proportion, and GC content. These parameters help predict assembly complexity and inform downstream sequencing strategies.
Q5: When is a genome survey especially useful?
A: This service is particularly valuable for projects on non-model plants, animals, or microorganisms where genome size and complexity are unknown, or when planning large-scale de novo genome sequencing.