Genome Survey Sequencing workflow showing genomic DNA, Illumina sequencing, k-mer analysis, and results.

Genome Survey Sequencing (GSS)

  • Accurate
  • Genome-wide overview
  • Essential for de novo planning

Service Highlights

Compared with direct de novo sequencing, Genome Survey Sequencing (GSS) offers a rapid, economical, and reliable way to evaluate genome characteristics in advance. By combining low-depth Illumina sequencing with k-mer analysis, GSS delivers key metrics such as genome size, heterozygosity, repeat content, and GC composition. These insights allow researchers to design the most effective de novo sequencing strategy, avoid unnecessary costs, and reduce the risk of assembly failure.

Introduction – What Is Genome Survey Sequencing and Why It Matters

Genome Survey Sequencing (GSS) is a cost-efficient approach designed to capture the fundamental features of a genome before investing in full-scale assembly projects. By applying low-depth Illumina sequencing combined with k-mer distribution analysis, GSS generates essential statistics such as genome size, heterozygosity rate, repeat sequence proportion, and GC content. These parameters not only provide a first glance into the structural complexity of a species' genome but also act as critical indicators of whether subsequent de novo sequencing and assembly will be straightforward or technically challenging.

For researchers working on plants, animals, or other non-model organisms, uncertainties in genome size or complexity often lead to difficulties in project design. A survey-level dataset reduces these risks by offering a reliable estimate of assembly difficulty, ensuring that resources are allocated wisely and sequencing strategies are tailored to the biological reality of each organism.

Advantages – Genome Survey Sequencing Benefits for Genomic Research

  • Reliable genome size estimation
    K-mer analysis allows accurate inference of genome size without requiring prior references, which is particularly valuable for species with limited genomic information.
  • Early detection of complexity factors
    Peaks in the k-mer curve reveal heterozygosity levels and repetitive elements, enabling researchers to anticipate challenges such as fragmented assemblies or misassembled repeats.
  • Guidance for downstream sequencing
    The survey results highlight whether a genome can be resolved using short-read sequencing alone or whether long-read platforms and hybrid approaches are recommended.
  • Cost-effective project planning
    Instead of committing to expensive high-coverage sequencing immediately, GSS ensures that initial investment produces interpretable results that guide subsequent stages of genome exploration.
  • Broader applications in comparative genomics
    When multiple related species or populations are studied, genome survey data provide a quantitative basis for comparing genomic complexity, GC content distribution, or structural divergence.

Service Workflow – Our Genome Survey Sequencing Pipeline

Genome Survey Sequencing can be tailored to different research contexts. CD Genomics provides two complementary service modes to accommodate both researchers who need complete sequencing support and those who have already generated raw data but require in-depth analysis.

Option A: Genome Survey Sequencing + Bioinformatics Analysis

For projects starting from DNA samples, we provide an integrated pipeline that ensures data quality and robust interpretation.

Genome Survey Sequencing service workflow

1. Sample Quality Control and DNA Extraction

  • Assessment of DNA integrity, purity, and concentration.
  • Extraction from plant tissues, animal organs, or microbial cells using optimized protocols to minimize contamination.

2. Low-Depth Illumina Sequencing

  • Construction of short-insert libraries (e.g., ~270 bp).
  • Paired-end sequencing (typically PE150) at ~50× depth to capture representative genome coverage.
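
As a rough planning aid, the depth target above can be turned into a read-pair count. The sketch below assumes an approximate genome size is already available (for example from a related species); the function name and the 1 Gb example are illustrative and not part of the service specification.

```python
import math


def read_pairs_for_depth(genome_size_bp: int, depth: float, read_len: int = 150) -> int:
    """Return the number of paired-end read pairs needed for `depth`x coverage.

    Each pair contributes 2 * read_len bases (PE150 -> 300 bases per pair).
    """
    bases_needed = genome_size_bp * depth
    bases_per_pair = 2 * read_len
    return math.ceil(bases_needed / bases_per_pair)


# Hypothetical example: a genome assumed to be about 1 Gb, surveyed at 50x with PE150.
pairs = read_pairs_for_depth(1_000_000_000, 50)
print(f"{pairs:,} read pairs, about {pairs * 300 / 1e9:.1f} Gb of raw data")
```

Because the true genome size is unknown before the survey, the result is only a starting estimate; the realized depth can be recomputed once k-mer analysis yields a size estimate.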

3. Data Quality Assessment

  • Filtering raw reads to obtain high-quality clean data.
  • Calculation of Q20/Q30, GC distribution, and evaluation of potential contamination or organelle DNA proportion.
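
The Q20/Q30 and GC metrics listed above can be illustrated with a minimal sketch. It assumes unwrapped four-line FASTQ records with Phred+33 quality encoding, and the function name `fastq_qc` is hypothetical; production pipelines typically rely on dedicated QC tools rather than code like this.

```python
import io


def fastq_qc(handle, phred_offset: int = 33) -> dict:
    """Compute Q20/Q30 and GC percentages from a four-line-per-record FASTQ stream."""
    total = q20 = q30 = gc = 0
    for i, line in enumerate(handle):
        line = line.rstrip("\n")
        if i % 4 == 1:  # sequence line
            seq = line.upper()
            gc += seq.count("G") + seq.count("C")
            total += len(seq)
        elif i % 4 == 3:  # quality line, one character per base
            for ch in line:
                q = ord(ch) - phred_offset
                q20 += q >= 20
                q30 += q >= 30
    if total == 0:
        raise ValueError("no bases found in input")
    return {
        "bases": total,
        "Q20_pct": 100 * q20 / total,
        "Q30_pct": 100 * q30 / total,
        "GC_pct": 100 * gc / total,  # N bases are counted in the denominator
    }


# Tiny in-memory record: qualities 'I', 'I', '5', '+' decode to Q40, Q40, Q20, Q10.
example = io.StringIO("@read1\nACGT\n+\nII5+\n")
print(fastq_qc(example))
```

For a real file, pass an open text handle, for example `open("sample_R1.fastq")`, where the filename is a placeholder.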

4. K-mer Analysis

  • Generation of k-mer frequency distribution curves.
  • Identification of main peak, heterozygous peak, and repeat peaks to derive genome size, heterozygosity, and repeat content.
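
The peak-based estimate described above can be sketched as follows. This is a simplified illustration, not the production method: it assumes the main (homozygous) peak is the tallest peak after the low-depth error k-mers are excluded, which may not hold for highly heterozygous genomes, where model-based fitting is generally preferred. The function name and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram: dict) -> dict:
    """Estimate genome size from a k-mer histogram.

    `histogram` maps k-mer depth -> number of distinct k-mers seen at that depth.
    Genome size is approximated as (total genuine k-mers) / (main peak depth).
    """
    depths = sorted(histogram)
    # The first rise in the curve marks the valley between error k-mers and real ones.
    valley = depths[0]
    for prev, cur in zip(depths, depths[1:]):
        if histogram[cur] > histogram[prev]:
            valley = prev
            break
    genuine = {d: n for d, n in histogram.items() if d >= valley}
    # Assumption: the tallest remaining point is the homozygous (main) peak.
    peak = max(genuine, key=genuine.get)
    total_kmers = sum(d * n for d, n in genuine.items())
    return {
        "error_cutoff": valley,
        "peak_depth": peak,
        "genome_size": total_kmers / peak,
    }


# Toy histogram: error k-mers at depths 1-3, main peak at depth 10, a few repeats at 20.
toy = {1: 1000, 2: 100, 3: 10, 4: 5, 8: 20, 9: 50, 10: 100, 11: 50, 12: 20, 20: 10}
print(estimate_genome_size(toy))
```

In a real spectrum, heterozygous k-mers form a secondary peak at roughly half the main peak depth, and repeats appear at roughly integer multiples of it; the relative areas under those peaks are what heterozygosity and repeat estimates are derived from.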

5. Result Interpretation and Strategy Recommendation

  • Compilation of comprehensive metrics and visual outputs.
  • Guidance on de novo genome sequencing strategy, including whether to complement with long-read or Hi-C data.

Option B: Genome Survey Bioinformatics Analysis Only

For researchers who already possess short-read sequencing data, our team can directly perform advanced k-mer based analyses without additional sequencing.

1. Input Data Review and QC

  • Verification of sequencing format and quality standards.
  • Removal of low-quality reads, adapters, and contaminants.

2. K-mer Frequency Distribution

  • Application of high-performance algorithms (e.g., Jellyfish, KMC) to compute k-mer counts.
  • Estimation of genome size, repeat sequence proportion, and heterozygosity from the k-mer spectrum, with GC content calculated from the base composition of the reads.
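
To make the k-mer counting step concrete, here is a minimal pure-Python sketch of canonical k-mer counting that produces a depth histogram. It is not how Jellyfish or KMC are implemented and would be far too slow for real survey data; the function names are hypothetical, and the default k = 19 is only an illustrative choice.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads, k: int = 19) -> Counter:
    """Count canonical k-mers across reads, then tabulate depth -> number of distinct k-mers."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _VALID.issuperset(kmer):  # skip k-mers containing N or other ambiguity codes
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Tiny example with k = 3: both strands of the same locus collapse onto the same k-mers.
print(kmer_histogram(["ACGTA", "TACGT"], k=3))
```

Counting canonical k-mers matters because reads come from both strands; without it, each genomic k-mer would be split across two entries and the apparent depth would be halved.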

3. Analytical Report Generation

  • Delivery of genome survey results with detailed plots, statistical summaries, and tailored recommendations for subsequent de novo genome sequencing.

Genome Survey Bioinformatics Analysis workflow

Customer to Provide – What Researchers Should Prepare

To ensure reliable and reproducible results from Genome Survey Sequencing, proper sample preparation and clear project objectives are essential. We recommend that researchers prepare the following:

  • High-Quality Genomic DNA
    • Provide freshly extracted genomic DNA with high integrity and minimal degradation.
    • For plants, young leaves or sterile seedling tissue are preferred to reduce polysaccharide and polyphenol contamination.
    • For animals, high-quality DNA from whole blood, liver, or other suitable tissues is recommended.
  • Sufficient DNA Quantity and Concentration
    • Adequate DNA input is required to construct short-insert libraries for Illumina sequencing.
    • For bioinformatics-only analysis, raw sequencing data in standard FASTQ format is sufficient.
  • Project Information
    • Define research objectives clearly, such as genome size estimation, heterozygosity assessment, or guiding de novo genome sequencing.
    • Providing any available reference information (e.g., genome size of related species) helps refine k-mer analysis and improve the accuracy of genome complexity assessment.
  • Optional Requirements
    • Indicate whether downstream applications such as de novo genome assembly or hybrid sequencing (PacBio HiFi/ONT + Illumina) are planned. This allows us to tailor the survey sequencing strategy for your specific research needs.

Deliverables – What You Will Receive

Depending on your chosen service model, you will receive a clear, publication-ready package that supports downstream de novo genome projects:

  • Sequencing Data
    • For full service: raw Illumina paired-end reads and high-quality clean data.
    • For analysis-only: processed data files derived from the submitted FASTQ reads.
  • Quality Control Report
    • Q20/Q30 metrics, read length distribution, contamination check, and organelle DNA assessment.
  • K-mer Analysis Outputs
    • K-mer frequency distribution plots with main, heterozygosity, and repeat peaks.
    • Estimated genome size, heterozygosity rate, repeat content, and GC composition.
  • Interpretation & Recommendations
    • Expert summary of genome complexity.
    • Guidance for optimal de novo genome sequencing strategies, including depth and long-read integration.
  • Data Files & Figures
    • Downloadable clean datasets, tables, and visual reports for immediate use in bioinformatics or publications.

Service Description – Flexible Genome Survey Service Models

To meet different research needs, CD Genomics offers two service configurations for Genome Survey Sequencing (GSS). You can either choose a complete workflow from DNA to report, or opt for bioinformatics-only analysis if sequencing data has already been generated in-house or by another provider.

Option A: Genome Survey Sequencing + Analysis

  • Suitable for
    • Researchers starting with biological samples (plant, animal, or microbial)
  • Workflow components
    • DNA extraction & QC
    • Short-insert Illumina library construction (e.g., ~270 bp)
    • Low-depth paired-end sequencing (~50×)
    • Data QC (Q20/Q30, contamination, organelle content)
    • K-mer analysis & genome metrics estimation
    • Interpretation & recommendations
  • Key deliverables
    • Raw & clean sequencing data
    • Quality reports
    • K-mer distribution plots
    • Genome size, heterozygosity, repeat content, GC composition
    • Strategy guidance for de novo sequencing
  • Advantages
    • One-stop solution with both sequencing and analysis; ensures consistency and reliability; directly links to downstream de novo assembly services

Option B: Genome Survey Bioinformatics Analysis Only

  • Suitable for
    • Researchers who already have Illumina short-read data (FASTQ format)
  • Workflow components
    • Input data QC & filtering
    • K-mer frequency analysis
    • Genome feature estimation
    • Report generation with recommendations
  • Key deliverables
    • Processed data files
    • K-mer plots & genome metrics (size, heterozygosity, repeats, GC%)
    • Analytical interpretation

Applications – When to Choose Genome Survey Sequencing

Genome Survey Sequencing is not an end point but a strategic entry into deeper genome exploration. By providing rapid and reliable estimates of genome architecture, GSS enables researchers to make informed decisions in diverse research scenarios:

  • Feasibility Assessment for De Novo Genome Projects
    Before committing resources to high-depth sequencing and assembly, GSS helps evaluate whether the target genome is suitable for de novo strategies, particularly in species with unknown or complex genomes.
  • Plant and Animal Breeding Programs
    In agricultural and aquaculture research, genome size and heterozygosity information supports parental line selection, breeding strategy optimization, and evaluation of genetic diversity across populations.
  • Comparative and Evolutionary Genomics
    GSS provides genome size, repeat content, and GC composition data that are valuable for distinguishing closely related species and tracing genome evolution across taxa.
  • Genome Complexity and Repeat Analysis
    Identifying high repeat content or elevated heterozygosity early on allows researchers to adjust their sequencing design, for example by integrating long-read or hybrid assembly approaches, ensuring higher-quality assemblies.
  • Reference Genome Preparation
    When building a reference genome for a new organism, GSS lays the groundwork by defining baseline parameters that guide library construction, sequencing depth, and assembly algorithms.

Case Study

Genome survey sequencing enables genome size estimation and SSR marker development in Platostoma palustre (Chinese mesona)

(Zheng, Z., Zhang, N., Huang, Z., et al., Scientific Reports, 2022)

Research Subject: Platostoma palustre (Lamiaceae), a non-model edible and medicinal herb.

Research Methods: Low-depth Illumina paired-end sequencing; k-mer frequency analysis; draft de novo assembly; SSR mining and validation

Research Focus:

1. K-mer analysis estimated genome size at ~1.21 Gb, with ~70.62% repeats and ~0.33% heterozygosity, defining a large, repeat-rich yet low-heterozygosity genome.

2. From the survey data, 15,498 SSR motifs were identified (dinucleotides most abundant; AT/TA ~44.28%), providing a rich marker resource for this species.

3. Validated SSRs separated P. palustre accessions from related Lamiaceae taxa into distinct groups, supporting germplasm characterization and downstream breeding studies.
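
For readers unfamiliar with SSR mining, the idea can be sketched with a regular-expression scan for perfect tandem repeats. This is an illustration only and not the tool or parameters used in the cited study; the minimum repeat counts in `MIN_REPEATS` and the function name are hypothetical.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq: str):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. ATAT is AT x 2).
            if any(
                motif == motif[:d] * (motif_len // d)
                for d in range(1, motif_len)
                if motif_len % d == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


# Toy contig containing an (AT)7 dinucleotide repeat and an (AAG)5 trinucleotide repeat.
contig = "GGC" + "AT" * 7 + "CCGTA" + "AAG" * 5 + "T"
for start, motif, copies in find_ssrs(contig):
    print(start, motif, copies)
```

Tallying the motifs returned across all contigs gives the kind of motif-class frequency summary reported in the study.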

Figure: Estimation of P. palustre genome size using k-mer (k = 19) analysis.

FAQ – Common Questions About Genome Survey Sequencing

Q1: What makes Genome Survey Sequencing different from full de novo genome sequencing?

A: GSS focuses on low-depth Illumina sequencing and k-mer analysis to assess genome size, heterozygosity, repeats, and GC content. It does not produce a complete genome assembly but guides whether de novo sequencing is feasible and how to design it effectively.

Q2: How much sequencing depth is recommended for a genome survey?

A: Typically, ~50× coverage with short-insert Illumina paired-end reads is sufficient to generate reliable k-mer frequency distributions and accurate genome size estimation.
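
To see why base coverage and k-mer peak depth differ, the standard relationship can be written out. The symbols are assumptions introduced here for illustration: C is base coverage, L the read length, k the k-mer size, C_k the expected k-mer depth, N_k the total number of genuine k-mers, and G the genome size; reads are assumed to be error-free and of uniform length.

```latex
C_k = C \cdot \frac{L - k + 1}{L},
\qquad
G \approx \frac{N_k}{C_k}

% Worked example with L = 150 (PE150), k = 19, C = 50:
C_k = 50 \cdot \frac{150 - 19 + 1}{150} = 50 \cdot \frac{132}{150} = 44
```

Under these assumptions, a 50× dataset would be expected to show its main k-mer peak near depth 44; sequencing errors and read trimming shift the observed peak somewhat lower.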

Q3: Can I submit my own sequencing data for analysis?

A: Yes. If you already have raw Illumina FASTQ files, we provide bioinformatics-only genome survey analysis, including data QC, k-mer distribution analysis, and genome feature reporting.

Q4: What genome features can be revealed by k-mer analysis?

A: K-mer profiling provides estimates of genome size, heterozygosity, and repeat sequence proportion, and GC content is calculated from the base composition of the reads. Together, these parameters help predict assembly complexity and inform downstream sequencing strategies.

Q5: When is a genome survey especially useful?

A: This service is particularly valuable for projects on non-model plants, animals, or microorganisms where genome size and complexity are unknown, or when planning large-scale de novo genome sequencing.

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.