Low-Pass vs. High-Coverage WGS: Choosing the Right Sequencing Depth for Your Research Goals and Budget

The Depth Spectrum — What Each Coverage Tier Actually Delivers

A population geneticist planning a GWAS on 2,000 soybean accessions asks: "Can I get away with 1× coverage and imputation, or do I need 10×?" A cancer genomics lab tracking clonal evolution across 500 single cells asks: "Is 30× enough, or do I need 60×?" A conservation biologist with a $15,000 grant asks: "How many individual genomes can I sequence at what depth before I run out of money?"

These three researchers share one question framed three ways: what sequencing depth do I actually need? The answer is never a single number — it is a function of the biological question, the variant type of interest, the available reference panel, and the budget. This guide provides the evidence, cost models, and decision framework to answer it.

CD Genomics provides Whole Genome Sequencing at every depth tier — from ultra-low-pass (0.5×) for imputation-powered GWAS to deep (30×+) for reference panel construction — enabling projects to match depth precisely to research goals without overpaying for coverage they do not need.

Sequencing depth (or coverage) is the average number of times each base in the genome is read by the sequencer. At 1× coverage, each base is read once on average, but because read sampling follows an approximately Poisson distribution, about 37% of bases (e⁻¹ ≈ 0.37) are not read at all. At 30×, more than 99.9% of bases are covered by at least one read, and each allele at a heterozygous site is supported by about 15 reads on average (roughly 30 in total). That is sufficient to distinguish true heterozygotes from sequencing errors with high confidence.
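
The coverage figures above follow from an idealized Poisson model of read placement. The minimal Python sketch below reproduces them. It assumes perfectly uniform sampling; real libraries lose some coverage to GC bias and low-mappability regions, so empirical values run somewhat lower. The function name is illustrative.

```python
import math

def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """Expected fraction of bases with at least `min_reads` reads, assuming
    read counts per base follow Poisson(mean_depth) (idealized, uniform)."""
    # P(X >= k) = 1 - sum_{i<k} e^{-c} c^i / i!
    p_below = sum(math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
                  for i in range(min_reads))
    return 1.0 - p_below

for depth in (0.5, 1, 4, 10, 30):
    print(f"{depth:>4}x: {fraction_covered(depth):.4f} of bases with >=1 read")
```

At 1× the model gives 1 − e⁻¹ ≈ 0.632 covered, which is the source of the "37% not read" figure.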

Depth is not binary. Five operational tiers define what variants can be detected and what questions can be answered:

Tier	Coverage	Fraction of Genome Covered (≥1 read)	Heterozygous Genotype Accuracy	Cost/Sample (Human, 2025)	Best For
Ultra-low pass	0.1–0.5×	10–40%	Not called directly (imputation)	$15–30	Biobank-scale ancestry, polygenic scores (PGS), broad CNV screening
Low-pass + imputation	0.5–4×	40–98%	Imputed: r² 0.85–0.95 for common SNPs	$30–100	GWAS of common variants, genomic selection, population structure
Standard coverage	10–15×	>99.9%	Called: >99% for SNPs	$150–250	Selection scans, demographic inference, rare SNP discovery (MAF >2%)
Deep coverage	25–35×	>99.99%	Called: >99.9% for SNPs	$250–400	Reference panel construction, high-confidence rare variants (MAF 0.1–2%)
Ultra-deep	50×+	>99.999%	Called: >99.99%	$500–1,200+	Somatic mosaicism, single-cell WGS, tumor-normal pairs, liquid biopsy

The critical variable is not coverage per se but genotype accuracy at the variant classes that matter for your question. A 0.5× genome imputed against a well-matched reference panel of ~150,000 UK Biobank genomes can achieve r² > 0.90 for common SNPs (MAF > 5%), rivalling or exceeding the accuracy of a 500K SNP array (Rubinacci et al., 2023). A 2026 Molecular Ecology benchmark by Atsawawaranunt et al. demonstrated that reduced-representation methods (RADseq) produce false-positive selection signals driven by locus dropout in specific populations. WGS avoids these errors, even at low coverage, because genome-wide sampling captures the full allele frequency spectrum rather than a biased subset. Different depths enable different biology, and different methods carry different blind spots.

Low-Pass WGS (0.5–4×) — Genotyping Without Breaking the Bank

Low-pass whole genome sequencing (lpWGS) sequences the entire genome at 0.5× to 4× coverage and then uses statistical imputation (inferring unobserved genotypes from a reference panel of fully sequenced haplotypes) to fill in the missing data. The approach has matured rapidly since 2023, driven by three developments. The first is the GLIMPSE2 imputation engine, which achieves sublinear computational scaling in both sample count and marker count (processing a 1× genome against a reference panel of ~150,000 genomes in ~11 hours at <$0.10 per genome). The second is the availability of large, population-matched reference panels (UK Biobank, gnomAD, All of Us, 1000 Genomes for humans; breed- and population-specific panels for agricultural species). The third is the fall in sequencing costs to the point where 1× WGS costs less than a mid-density SNP array while providing genome-wide coverage without ascertainment bias.

How Imputation Makes Low-Pass Work

Imputation from low-coverage data is fundamentally different from imputation from SNP arrays. Array-based imputation starts from 500K–2M known genotypes and fills gaps by haplotype matching. Low-pass imputation starts from sparse, genome-wide genotype likelihoods. At each covered position, every possible genotype receives a likelihood derived from the handful of reads that overlap it, while uncovered positions are inferred entirely from the reference haplotypes. This richer input, combined with the Li and Stephens hidden Markov model at the core of GLIMPSE2, produces more accurate imputed genotypes than array-based imputation at common and low-frequency variants, particularly in populations underrepresented on commercial arrays.
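
To make "genotype likelihood" concrete, here is a simplified diploid model of the kind variant callers use. It assumes each read comes from either chromosome with equal probability and is miscalled with a fixed error rate e, split evenly over the three other bases. This is an illustrative sketch, not GLIMPSE2's internal code; the function name and the Q30 default are assumptions.

```python
import math

def genotype_log10_likelihoods(bases, ref, alt, error_rate=0.001):
    """Log10 likelihoods of genotypes RR, RA and AA at one biallelic site,
    given the bases observed in the reads covering it."""

    def p_base(base, allele):
        # A read shows the true allele with prob 1 - e, and each of the
        # three other bases with prob e / 3.
        return 1.0 - error_rate if base == allele else error_rate / 3.0

    genotypes = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    result = {}
    for name, (a1, a2) in genotypes.items():
        # Reads are independent; each is drawn from either chromosome with prob 1/2.
        result[name] = sum(
            math.log10(0.5 * p_base(b, a1) + 0.5 * p_base(b, a2)) for b in bases
        )
    return result

# One read carrying the alternate allele, as is typical at ~1x depth.
print(genotype_log10_likelihoods(["T"], ref="C", alt="T"))
# No reads: every genotype has log10 likelihood 0, so the genotype must come
# entirely from the reference haplotypes during imputation.
print(genotype_log10_likelihoods([], ref="C", alt="T"))
```

With a single alternate read, AA is only about twice as likely as RA. Low-pass data alone cannot separate heterozygotes from homozygotes, and the haplotype model has to do that work.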

Three tools define the current lpWGS imputation landscape:

GLIMPSE2 (Rubinacci et al., 2023): The state of the art. Requires a phased reference panel (SHAPEIT5-phased haplotypes). Scales sublinearly. Recommended for depth ≥0.5×. Performs best with reference panels of >1,000 haplotypes. The number of conditioning states (K) should be increased to 2,000–4,000 for populations with high genetic diversity.

QUILT (Davies et al., 2021): Alternative to GLIMPSE2 that performs comparably at depths ≥0.5×. Uses a different algorithmic approach (diploid HMM incorporating both reference haplotypes and read information simultaneously). QUILT and GLIMPSE2 achieve comparable accuracy for broad ancestry inference at depths as low as 0.15×, but both require ≥0.5× for reliable genotype calling suitable for GWAS (Rubinacci et al., 2023; Wasik et al., 2021).

STITCH (Davies et al., 2016): Reference-free imputation — does not require a phased reference panel. Instead, it leverages linkage disequilibrium patterns directly from the low-coverage sequencing data across many samples. This makes STITCH uniquely valuable for non-model organisms with no reference panel, but it requires larger sample sizes (≥100 individuals) and higher coverage (≥2×) to achieve accuracy comparable to reference-based methods. A 2026 aquaculture study found STITCH underperformed GLIMPSE2 for low-frequency variants in mud crab but was adequate for common-variant GWAS when no reference panel existed.

What Low-Pass WGS Detects — and What It Misses

Variant Class	Detection at 0.5–1×	Detection at 2–4×	Notes
Common SNPs (MAF >5%)	Excellent (r² >0.90 via imputation)	Excellent (r² >0.95)	Comparable to 500K SNP array at 1×
Low-frequency SNPs (MAF 1–5%)	Good (r² 0.75–0.85)	Very good (r² 0.85–0.93)	Reference panel quality is the bottleneck
Rare SNPs (MAF 0.1–1%)	Poor (r² <0.50)	Moderate (r² 0.50–0.70)	Requires large, population-matched reference panel
Private/novel SNPs	Undetectable	Very poor	Not recoverable by imputation — need de novo calling at ≥10×
Large CNVs (>1 Mb)	Detectable	Good	cn.mops, CNVkit can call from 0.5–1×
Small CNVs (<100 kb)	Poor	Moderate	Resolution improves with depth
Structural variants	Poor	Poor–Moderate	Requires ≥15–20× for reliable SV calling

The practical implication: if your research question is driven by common and low-frequency variants — GWAS of complex traits, genomic prediction in breeding populations, population structure analysis, or ancestry inference — low-pass WGS at 1–2× with imputation delivers statistical power comparable to deep WGS at a fraction of the cost. If your question depends on rare, population-private, or de novo variants, low-pass is the wrong tool.

Cost-Efficiency: The Low-Pass Advantage in Numbers

Consider a fixed budget of $50,000 for a human-scale (3 Gb) genome project:

Depth Strategy	Samples Sequenceable	Common-SNP GWAS Power	Rare Variant Detection	Future Reusability
30× deep WGS	~170	Good (moderate N)	Excellent	Maximum
10× standard WGS	~330	Better	Good	High
1× lpWGS + imputation	~1,600	Best (high N)	None	Moderate
2× lpWGS + imputation	~800	Very good	Poor	Moderate

For common-variant GWAS power, sample size dominates coverage beyond ~1×. Sequencing 1,600 individuals at 1× will find more real GWAS associations than 170 individuals at 30× — this is the central insight that has driven adoption of lpWGS in biobank-scale and agricultural breeding programs since 2023.
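
The trade-off can be sketched numerically. The snippet below divides the $50,000 budget by per-sample costs implied by the table. It then approximates the power of a genome-wide test (α = 5 × 10⁻⁸) for a single common SNP explaining 2% of trait variance, treating imputed genotypes as N × r² effective samples (a standard approximation). The per-sample costs, r² values, and effect size are illustrative assumptions, not benchmark results.

```python
from statistics import NormalDist

BUDGET_USD = 50_000
# Per-sample costs implied by the table above (illustrative assumptions).
COST_PER_SAMPLE = {"30x": 294, "10x": 150, "2x": 62, "1x": 31}
# Assumed r^2 between genotypes/dosages and truth at a common SNP.
R2 = {"30x": 1.00, "10x": 0.99, "2x": 0.95, "1x": 0.90}

def gwas_power(n, r2, var_explained, alpha=5e-8):
    """Approximate power of a 1-df test for one SNP explaining `var_explained`
    of trait variance, using n * r2 as the effective sample size."""
    ncp = n * r2 * var_explained / (1.0 - var_explained)  # noncentrality parameter
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)               # two-sided threshold
    shift = ncp ** 0.5
    # A chi-square(1, ncp) statistic is the square of N(sqrt(ncp), 1).
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

for tier, cost in COST_PER_SAMPLE.items():
    n = BUDGET_USD // cost
    power = gwas_power(n, R2[tier], var_explained=0.02)
    print(f"{tier:>3}: N = {n:>5}, power = {power:.3f}")
```

Under these assumptions the 1× design has by far the highest power, even after the r² penalty. This is the sample-size effect described above.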

CD Genomics' Shallow Whole Genome Sequencing service provides lpWGS on Illumina and MGI platforms with standardized imputation pipelines (GLIMPSE2 + SHAPEIT5-phased reference panels), delivering analysis-ready genotype calls for GWAS, genomic selection, and population structure analysis. For projects that combine low-coverage screening with focused deep validation, CD Genomics' Whole Genome SNP Genotyping service offers orthogonal validation of imputed genotypes at selected loci.

Figure 1: Low-Pass WGS Workflow and Imputation Accuracy — A 3-panel illustration. Left panel: A schematic of the lpWGS workflow — sparse reads across a chromosome segment, genotype likelihood calculation at each position, imputation against a phased reference panel, output of imputed genotypes with dosage and quality scores. Center panel: A line graph showing imputation accuracy (r² on Y-axis) vs minor allele frequency for three coverage levels (0.5×, 1×, 2×), demonstrating the MAF-dependent accuracy drop-off. Right panel: A bar chart comparing GWAS power for 30× WGS on 200 samples vs 1× lpWGS on 2,000 samples for a simulated polygenic trait, showing lpWGS with larger N outperforms deep WGS with smaller N.

Standard Coverage (10–30×) — The Workhorse of Re-Sequencing

Standard-coverage WGS at 10–30× is the default for projects where individual genotypes must be called, not imputed, with high confidence. At 10×, approximately 99.5% of the genome is covered by at least one read. At 30×, coverage is essentially complete (>99.99%), and each allele at a heterozygous site is supported by a median of about 15 reads (roughly 30 in total). This provides the statistical power to distinguish true heterozygotes from sequencing error with >99.9% accuracy (DePristo et al., 2011).

What Standard Coverage Enables

De novo SNP and indel discovery. Unlike imputation-based approaches, standard coverage supports per-sample variant calling with GATK HaplotypeCaller or DeepVariant, detecting variants without reliance on a reference panel. This is essential for non-model organisms, admixed populations, and studies where novel or population-private variants are the primary focus. The sensitivity gain from 10× to 30× is substantial for rare variants: at 10×, a heterozygous SNP with MAF 0.5% is called in approximately 85% of carriers; at 30×, that rises to >97% (Zhao et al., 2020).

Population genetic inference. Selection scans (XP-CLR, iHS, nSL), demographic reconstruction (PSMC, MSMC2, Stairway Plot 2), and population differentiation statistics (Fst, D-statistic) all benefit from called genotypes rather than imputed dosages — particularly when the analysis involves allele frequency spectra, where imputation can smooth or distort the site frequency distribution at low frequencies. For PSMC analysis, which requires heterozygote calls across a single diploid genome, 15–20× is the practical minimum.

Structural variant detection. Reliable SV calling requires read depth, split-read, and paired-end discordance signals that are sparse or absent at low coverage. Manta, Delly, and Lumpy — the standard SV callers — achieve >80% sensitivity for deletions >1 kb and duplications >5 kb at 15× in a 3 Gb genome; at 30×, sensitivity for the same SV classes exceeds 95%. For SV-focused studies, coverage below 15× introduces an unacceptably high false-negative rate.

When 10× Is Enough, When 30× Is Needed

Application	10× Sufficient?	15× Sufficient?	30× Recommended?
SNP calling (common, MAF >5%)	Yes	Yes	Overkill
SNP calling (rare, MAF <1%)	Marginal	Adequate	Yes
Indel calling (<50 bp)	Marginal	Adequate	Yes
SV detection (>1 kb)	Marginal	Adequate	Yes
PSMC demographic inference	No (≥18×)	Marginal	Yes
HLA/phased haplotype calling	No	No	Yes
De novo mutation detection (trio)	No	No	Yes (≥30× per sample)
Reference panel construction	No	No	Yes

A practical rule: for single-nucleotide variant discovery in species with existing reference panels, 10× is cost-effective. For any analysis involving indels, structural variants, phasing, or rare variants, budget for 30×. The marginal cost of going from 10× to 30× — roughly $100–200 per sample at current pricing — buys disproportionate gains in variant detection sensitivity and future data utility.

A representative application: the USDA-ARS soybean pangenome project re-sequenced 300 Glycine max accessions at 15× to characterize nucleotide diversity (π), identify selective sweeps via XP-CLR, and reconstruct the domestication bottleneck with PSMC. At 15×, called genotypes achieved >99% concordance with deep WGS for SNPs with MAF >2%, and MSMC2 successfully recovered the known ~8,000-year domestication bottleneck — analyses that would have been unreliable with imputed 1× genotypes. For population genomic inference that depends on allele frequency spectra rather than individual-level genotype calls, 15× represents a pragmatic sweet spot between cost and data quality.

Figure 2: Coverage vs. Variant Detection Sensitivity. A multi-line plot showing variant detection sensitivity (Y-axis, 0–100%) as a function of sequencing depth (X-axis, 1× to 60×). Five curves represent different variant classes: homozygous SNPs (yellow, >95% at 5×), heterozygous SNPs (blue, >95% at 15×), small indels 1–10 bp (green, >90% at 20×), large deletions >1 kb (orange, >90% at 25×), and de novo mutations (red, >90% at 40×). Dashed vertical reference lines mark 10× and 30×.

High Coverage (30×+) — Rare Variants, Somatic Mutations, and Reference-Grade Genomes

Deep WGS at ≥30× occupies a distinct niche: it is required when the variants of interest are individually rare, somatically acquired, or must serve as a community reference resource for years of reanalysis.

Rare Variant Association Testing

Rare variants (MAF <1%) contribute disproportionately to the missing heritability of complex traits and are the primary targets of gene-based association tests (SKAT-O, burden tests). Calling a rare heterozygous variant requires enough read depth to separate the alternate allele from sequencing error. At 30×, a heterozygous site has a median alternate allele depth of 15 reads, while the probability of observing ≥3 alternate-allele reads from sequencing error alone at a homozygous reference site (Q30 base quality, 0.1% error rate) is on the order of 10⁻⁶. At 10×, a true heterozygote has a median of only 5 alternate reads, and about 5% of heterozygotes sequenced to exactly 10 reads show fewer than 3. Callers must therefore trade sensitivity against specificity: relaxing the threshold admits false-positive rare variant calls that dilute association signals, while holding it loses true carriers. For rare-variant burden testing in cohorts of >1,000 individuals, the rare-variant false discovery rate at 10× is 3–5× higher than at 30×, directly reducing statistical power.
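
A short binomial sketch makes the depth trade-off concrete. It assumes Q30 errors, conservatively counts every error as the alternate allele, applies a fixed 3-read threshold, and fixes depth at exactly the nominal value (no Poisson variation). The helper name is illustrative.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

ERROR_RATE = 0.001  # Q30; every error is (conservatively) counted as the alt allele
MIN_ALT_READS = 3   # fixed calling threshold

for depth in (10, 30):
    # Homozygous-reference site: alt reads arise only from sequencing error.
    false_alt = prob_at_least(MIN_ALT_READS, depth, ERROR_RATE)
    # True heterozygote: each read carries the alt allele with probability 0.5.
    missed_het = 1 - prob_at_least(MIN_ALT_READS, depth, 0.5)
    print(f"{depth}x: P(>=3 error reads | hom-ref) = {false_alt:.1e}; "
          f"P(<3 alt reads | het) = {missed_het:.1e}")
```

At a fixed threshold the error-only term is actually slightly larger at 30× (more reads give more chances for error). However, the share of true heterozygotes falling below the threshold drops from about 5% at 10× to under one in a million at 30×. Depth buys separation between the two read-count distributions.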

Reference Panel Construction

High-quality imputation reference panels — the backbone of low-pass WGS strategies — are themselves built from deeply sequenced genomes. The gnomAD reference panel uses 30× PCR-free Illumina WGS; the 1000 Genomes Project high-coverage phase used 30×; TOPMed uses 30–38×. The logic is circular but sound: you need a relatively small number of deeply sequenced genomes to unlock the cost efficiency of low-coverage sequencing for thousands more. For non-model organisms, sequencing 50–100 genetically representative individuals at ≥25× and phasing with SHAPEIT5 provides a custom reference panel sufficient to impute the remaining cohort at 1–4× with >94% concordance, as demonstrated in allo-octoploid strawberry (Koorevaar et al., 2025).

Somatic Mutation Detection

Cancer genomics, aging research, and clonal evolution studies require distinguishing true somatic variants, present in only a fraction of cells, from germline heterozygotes and sequencing errors. A heterozygous somatic variant present in 10% of cells has a variant allele fraction (VAF) of about 5% in a diploid region. At 30×, its expected alternate allele depth is therefore 1.5 reads, at the edge of detectability. At 60×, the same variant has an expected alternate depth of 3 reads, reaching the standard minimum threshold for somatic calling (≥3 supporting reads). For single-cell WGS or ultra-low-frequency somatic variant detection, 60–100× is the operational standard.
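
The expected read counts hide a lot of sampling noise. Poisson-distributed read depth, thinned by a per-read probability (the VAF), is again Poisson. This sketch therefore models alternate reads as Poisson(depth × VAF) and asks how often a heterozygous variant in 10% of cells (VAF ≈ 5%) clears a 3-read threshold. Diploid copy number and the absence of strand or mapping filters are simplifying assumptions.

```python
import math

def p_detect(depth, vaf, min_alt=3):
    """P(at least `min_alt` alternate reads) when alternate-read counts follow
    Poisson(depth * vaf), combining Poisson coverage with binomial sampling."""
    lam = depth * vaf
    p_below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_alt))
    return 1.0 - p_below

VAF = 0.05  # heterozygous somatic variant in 10% of cells, diploid region
for depth in (30, 60, 100):
    print(f"{depth}x: expected alt reads = {depth * VAF:.1f}, "
          f"P(>=3 alt reads) = {p_detect(depth, VAF):.2f}")
```

Under this model, only about 19% of such variants reach 3 alternate reads at 30×, about 58% at 60×, and about 88% at 100×. This is consistent with 60–100× as the working range.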

CD Genomics provides 30× WGS on Illumina NovaSeq and DNBSEQ platforms through its Whole Genome Sequencing service, with optional long-read complement via Long-Read Sequencing Services for structural variant resolution and haplotype phasing.

The Hidden Costs of Depth — Storage, Compute, and Time

Sequencing depth is not just a cost of reagents — it generates proportional data volume, storage burden, and compute time. These hidden costs often exceed the sequencing itself over a project's lifecycle.

Data Generation at Each Depth Tier

Depth	FASTQ Size (3 Gb Genome)	BAM Size	CRAM Size	Total per Sample	1,000 Samples
0.5×	~1.5 GB	~1 GB	~0.5 GB	~3 GB	~3 TB
1×	~3 GB	~2 GB	~1 GB	~6 GB	~6 TB
4×	~12 GB	~8 GB	~4 GB	~24 GB	~24 TB
10×	~30 GB	~20 GB	~10 GB	~60 GB	~60 TB
30×	~90 GB	~60 GB	~30 GB	~180 GB	~180 TB
60×	~180 GB	~120 GB	~60 GB	~360 GB	~360 TB

CRAM format reduces alignment storage by 40–50% compared to BAM. For genotype data, the PGEN format (PLINK 2.0) achieves 98% compression compared to flat-text VCF — a 2 TB genotype matrix becomes ~40 GB. These format choices are not cosmetic; for a 1,000-sample project at 30×, choosing CRAM + PGEN from the outset saves approximately 100 TB of storage, translating to $25,000–50,000 in cloud storage costs over a 5-year project lifecycle.
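
A quick way to sanity-check storage estimates is to multiply terabytes by a monthly rate and the project length. The sketch below uses the 100 TB saving quoted above and the per-TB rates given in this guide's FAQ. It also back-calculates the monthly rate that the $25,000–50,000 figure implies. Flat pricing with no retrieval, egress, or request charges is an assumption.

```python
def lifetime_storage_usd(terabytes, usd_per_tb_month, years=5):
    """Flat-rate storage cost; ignores retrieval, egress and request charges."""
    return terabytes * usd_per_tb_month * 12 * years

def implied_usd_per_tb_month(terabytes, total_usd, years=5):
    """Monthly per-TB rate that a quoted lifetime cost corresponds to."""
    return total_usd / (terabytes * 12 * years)

SAVED_TB = 100  # CRAM + PGEN saving quoted for 1,000 samples at 30x

# Per-TB monthly rates quoted in this guide's FAQ.
for tier, rate in (("archival, low", 1), ("archival, high", 4),
                   ("active, low", 25), ("active, high", 50)):
    print(f"{tier:>15}: ${lifetime_storage_usd(SAVED_TB, rate):,.0f} over 5 years")

for quoted in (25_000, 50_000):
    rate = implied_usd_per_tb_month(SAVED_TB, quoted)
    print(f"${quoted:,} over 5 years implies ~${rate:.2f}/TB-month")
```

Under these rates, the quoted range corresponds to roughly $4–8 per TB per month. Keeping the same data in active storage would make the saving several times larger.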

Compute Costs Scale with Depth

Alignment with BWA-MEM2 scales approximately linearly with read count, so a 30× genome takes roughly 30× longer to align than a 1× genome. Joint genotyping with GATK scales less favorably. GenomicsDBImport processing time is roughly proportional to the number of variant sites, which itself grows sublinearly with depth (diminishing returns beyond ~15× for SNP discovery), but GenotypeGVCFs runtime scales with both sample count and depth. For a 1,000-sample cohort at 10×, joint genotyping requires approximately 500 core-hours and 500 GB RAM. At 30×, the same cohort requires approximately 1,500 core-hours and 1 TB RAM: a 3× compute cost increase for a comparatively modest gain in rare variant sensitivity (from roughly 85% to >97% of heterozygous carriers at MAF 0.5%, as noted above).

The Cloud vs. HPC Decision at Different Depths

For projects under ~200 samples at ≤10×, cloud computing (AWS, Google Cloud) is cost-competitive with on-premise HPC and avoids upfront infrastructure costs. For projects exceeding 500 samples at ≥30×, on-premise HPC with parallel storage (Lustre, GPFS) amortizes to a lower per-sample cost but requires six-figure upfront investment. A practical intermediate: use cloud spot/preemptible instances for per-sample alignment (embarrassingly parallel), then on-premise or reserved cloud instances for joint genotyping (memory-intensive, harder to parallelize).

Decision Framework — Matching Depth to Your Research Question

The choice of sequencing depth should be driven by four questions, answered in order:

What variant class answers your biological question? Common SNPs (MAF >5%) → 0.5–2× + imputation is sufficient. Rare SNPs (MAF <1%) → ≥15× required. Structural variants → ≥20×. Somatic mutations → ≥60×. De novo mutations → ≥30× in trios.
Do you have a population-matched reference panel? Yes, with >1,000 haplotypes → lpWGS at 0.5–2× is viable. No reference panel → two options: (a) sequence 50–100 individuals at ≥25× to build a custom panel, then sequence remainder at 1–4×; or (b) sequence all samples at ≥10× for called genotypes without imputation.
What is your budget per sample? <$50 → 0.5–1× lpWGS. $50–100 → 1–4× lpWGS. $100–250 → 10–15× standard. $250–400 → 30× deep. Above $400/sample → ultra-deep specialized applications.
How will the data be used in the future? If the dataset will be re-analyzed for years, combined with other cohorts, or serve as a community resource → invest in ≥30× for maximum flexibility. If the analysis is single-purpose (one GWAS, one publication) → lpWGS at 1–2× is the cost-effective choice.
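
The four questions can be encoded as a first-pass rule of thumb. The hypothetical helper below uses the thresholds stated above. For simplicity it folds question 4 (future reuse) in before the panel and budget checks. It is a planning aid, not a substitute for a pilot, and the category names are illustrative.

```python
# Minimum depth per variant class (question 1), as stated in the framework.
MIN_DEPTH = {"common_snp": 0.5, "rare_snp": 15, "sv": 20, "de_novo": 30, "somatic": 60}

# Per-sample budget bands (question 3): (upper bound in USD, deepest tier affordable).
BUDGET_BANDS = [(50, 1), (100, 4), (250, 15), (400, 30)]


def max_affordable_depth(budget_per_sample):
    for upper_usd, depth in BUDGET_BANDS:
        if budget_per_sample < upper_usd:
            return depth
    return 100  # above $400/sample: ultra-deep applications


def recommend_depth(variant_class, panel_haplotypes, budget_per_sample, long_term_reuse):
    need = MIN_DEPTH[variant_class]                     # Q1: variant class
    if long_term_reuse:                                 # Q4: reuse calls for >=30x
        need = max(need, 30)
    if need < 10 and panel_haplotypes <= 1_000:         # Q2: panel for imputation
        return "no adequate panel: build a custom panel (50-100 at >=25x) or call at >=10x"
    afford = max_affordable_depth(budget_per_sample)    # Q3: budget
    if afford < need:
        return f"budget supports ~{afford}x but the question needs >={need}x"
    if need < 10:
        return f"lpWGS at {need}-{min(afford, 2)}x + imputation"
    return f"{need}x or deeper"


print(recommend_depth("common_snp", panel_haplotypes=5_000, budget_per_sample=40, long_term_reuse=False))
print(recommend_depth("somatic", panel_haplotypes=0, budget_per_sample=300, long_term_reuse=False))
```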

Rapid Decision Table

Your Scenario	Recommended Depth	Justification
GWAS, N >2,000, human/population-matched panel	0.5–1× lpWGS	Common-variant power driven by N, not depth
GWAS, N 200–500, non-model species, no panel	10–15× standard	Need called genotypes; imputation not viable
Population structure + demography, 10–30 per pop	10–15×	PSMC, Fst, and π benefit from called genotypes
Genomic selection, breeding program	1–4× lpWGS + custom panel	Maximize N; imputation validated in agriculture
Rare-variant burden test, case-control	30×	Low MAF calls require high depth
Reference panel construction	25–35×	Community resource; maximizes downstream imputation accuracy
Somatic mosaicism / single-cell	60×+	Low VAF calls require extreme depth
CNV-only screen, large cohort	0.5–1× lpWGS	Large CNVs detectable at very low depth
SV discovery	20–30×	Manta/Delly sensitivity drops below 15×
De novo assembly (reference genome)	30–50× HiFi + 15–20× ONT	See our De Novo Genome Sequencing Guide

Figure 3: WGS Depth Decision Flowchart. A visual decision tree mapping research questions to recommended sequencing depths. Starting from the top: (1) "What variant class answers your question?" branches to Common SNPs → Low-Pass, Rare SNPs/SVs → Standard/Deep, Somatic → Ultra-Deep. (2) "Reference panel available?" branches to Yes → lpWGS + imputation, No → Standard or build custom panel. (3) "Budget per sample?" with dollar thresholds mapped to depth tiers. (4) "Future reuse?" branches to Yes → 30× Deep, No → match depth to immediate question. Terminal nodes are color-coded by depth tier: light blue (0.5–4×), medium blue (10–15×), dark blue (30×), navy (60×+).

The Hybrid Approach — Mixing Depths in One Project

The most cost-effective large-scale designs often combine depth tiers within a single project. Three validated hybrid strategies:

Reference panel + discovery cohort. Sequence 10–20% of samples at ≥25× to build a custom haplotype reference panel; sequence the remaining 80–90% at 1–4× and impute against the custom panel. This strategy delivered 94–98% imputation concordance in allo-octoploid strawberry using ~70 reference individuals at ≥25× (Koorevaar et al., 2025) and has been validated across aquaculture species (spotted sea bass, olive flounder, mud crab), crops (maize, soybean, rice), and livestock (cattle, pig, salmon).
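
The savings from this design are easy to estimate. The sketch below compares sequencing cost for 1,000 samples, all at 30×, against a 15% deep subset plus an 85% low-pass remainder. It uses the ~$300 (30×) and ~$30 (1×) per-sample prices quoted elsewhere in this guide. Library preparation, imputation, and analysis are left out, and the 15% fraction is one illustrative choice within the 10–20% range.

```python
def design_cost(n_samples, deep_fraction, deep_cost, shallow_cost):
    """Sequencing cost of a hybrid design: a deep reference subset plus a
    low-pass remainder imputed against it."""
    n_deep = round(n_samples * deep_fraction)
    n_shallow = n_samples - n_deep
    return n_deep * deep_cost + n_shallow * shallow_cost

N = 1_000
all_deep = design_cost(N, 1.0, deep_cost=300, shallow_cost=30)
hybrid = design_cost(N, 0.15, deep_cost=300, shallow_cost=30)
print(f"all at 30x: ${all_deep:,}; 15% at 30x + 85% at 1x: ${hybrid:,}")
```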

WES + low-pass WGS for CNV. Whole exome sequencing (WES) captures coding variants at high depth but is blind to noncoding CNVs. Adding 2–4× lpWGS to WES samples — the "blended genome-exome" approach — provides genome-wide CNV detection at marginal additional cost (~$40–80 per sample). This is increasingly adopted in rare disease research where coding SNV analysis (WES) and noncoding CNV analysis (lpWGS) are both required.

Phased rollout across budget cycles. Year 1: 1× lpWGS on the full cohort ($30/sample, 2,000 samples = $60K). Analyze, publish GWAS. Year 2–3: 30× on the top 200 samples ($300/sample, $60K). Build custom reference panel, re-impute Year 1 data, publish rare-variant analysis. Year 4: re-analyze the combined dataset with improved methods. This staged approach aligns spending with grant cycles while progressively increasing data resolution.

For projects requiring both population-scale re-sequencing and depth optimization, see our companion guide on Large-Scale WGS Re-Sequencing Projects for coverage of sample logistics, joint genotyping at scale, and population genetic analysis suites. For the broader context of how depth decisions fit into the WGS landscape, see our Whole Genome Sequencing Services Hub.

Practical Procurement — From Decision to Purchase Order

How to Talk to Sequencing Providers

When requesting quotes for a WGS project, specify these parameters — providers cannot give accurate pricing without them:

Genome size and expected coverage per sample (not just "WGS" — "3 Gb genome, 10× coverage, 150 bp paired-end")
Number of samples and whether they are provided as extracted DNA or tissue (DNA extraction adds $20–50/sample)
Library preparation type (PCR-free vs PCR-plus; PCR-free costs more but eliminates GC bias)
Multiplexing preference (how many samples per lane/flow cell; higher multiplexing reduces per-sample cost)
Data delivery format (FASTQ only vs BAM/CRAM + VCF; analysis services add $50–200/sample depending on depth and complexity)
Turnaround time (standard 8–12 weeks vs expedited 4–6 weeks; expedited typically carries a 25–50% surcharge)
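
The first parameter translates directly into a data yield. A mean coverage target implies total bases = genome size × coverage, which divided by the bases per read pair gives the number of read pairs to order. This sketch computes the raw figure only; real projects need headroom for duplicates and unmapped reads. The function name is illustrative.

```python
import math

def read_pairs_needed(genome_size_bp, coverage, read_length_bp=150, paired=True):
    """Read pairs (or single reads if paired=False) needed for a mean coverage:
    total bases = genome size x coverage, divided by bases per fragment."""
    bases_needed = genome_size_bp * coverage
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return math.ceil(bases_needed / bases_per_fragment)

# "3 Gb genome, 10x coverage, 150 bp paired-end"
pairs = read_pairs_needed(3_000_000_000, 10)
print(f"{pairs:,} read pairs ({pairs * 300 / 1e9:.0f} Gb) before duplicate/unmapped losses")
```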

Validating Your Depth Choice with a Pilot

Before committing the full cohort, run a pilot batch of 8–16 samples at your planned depth plus one higher tier. If you are planning 1× lpWGS, pilot at 1× and 4× for the same samples. Compare: imputation accuracy (r²) at 1× vs called genotypes at 4×; concordance at known variant sites if validation data exists; and library complexity metrics (duplicate rate, insert size distribution, coverage uniformity). A $1,500–3,000 pilot can prevent a $50,000 mistake.
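
The two headline pilot metrics can be computed in a few lines. The sketch below compares imputed dosages from the 1× run with genotypes called from the 4× run of the same sample. It reports squared Pearson correlation (r²) and best-guess concordance. In practice r² is usually computed per variant across samples and averaged within MAF bins; the toy values here are hypothetical and pool a handful of sites only to keep the example short.

```python
import math

def dosage_r2(imputed, called):
    """Squared Pearson correlation between imputed dosages (0-2) and genotypes
    (0/1/2) called from deeper pilot data at the same sites."""
    n = len(called)
    mean_x, mean_y = sum(imputed) / n, sum(called) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, called))
    var_x = sum((x - mean_x) ** 2 for x in imputed)
    var_y = sum((y - mean_y) ** 2 for y in called)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined when the sites are monomorphic
    return cov * cov / (var_x * var_y)

def concordance(imputed, called):
    """Share of sites where the best-guess genotype (dosage rounded half-up)
    equals the called genotype."""
    best_guess = [min(2, int(x + 0.5)) for x in imputed]
    return sum(g == y for g, y in zip(best_guess, called)) / len(called)

# Hypothetical toy values: 1x imputed dosages vs 4x calls at the same sites.
imputed = [0.0, 0.1, 1.0, 0.9, 2.0, 1.8, 0.2, 1.1]
called = [0, 0, 1, 1, 2, 2, 0, 1]
print(f"r2 = {dosage_r2(imputed, called):.3f}, concordance = {concordance(imputed, called):.2f}")
```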

Budget Template for a 3 Gb Genome Project

Line Item	1× lpWGS (1,000 samples)	10× Std (300 samples)	30× Deep (100 samples)
DNA extraction + QC	$20,000 ($20/sample)	$6,000 ($20/sample)	$2,000 ($20/sample)
Library preparation	$50,000 ($50/sample)	$21,000 ($70/sample PCR-free)	$10,000 ($100/sample PCR-free)
Sequencing	$30,000 ($30/sample)	$60,000 ($200/sample)	$30,000 ($300/sample)
Data storage (5 yr)	$3,000	$15,000	$15,000
Bioinformatics analysis	$15,000	$15,000	$10,000
Imputation (if applicable)	$5,000	—	—
Project management	$5,000	$5,000	$3,000
Total	$128,000	$122,000	$70,000
Cost per sample	$128	$407	$700
GWAS power (h²=0.3)	Highest (N=1,000)	Moderate (N=300)	Low (N=100)

CD Genomics provides Genome-Wide Association Study (GWAS) and Population Evolution analysis services integrated with WGS at any depth, from experimental design consultation through publication-ready figures. For projects requiring copy number analysis complementing low-pass WGS, our CNV Sequencing Services provide depth-optimized CNV calling at both low and high coverage.

FAQ

What is the difference between low-pass WGS and SNP arrays?

Low-pass WGS sequences the entire genome at 0.5–4× coverage and imputes missing genotypes, capturing genome-wide variation without pre-selected markers. SNP arrays genotype 500K–2M pre-selected sites. Low-pass WGS avoids ascertainment bias (arrays are designed primarily from European populations), captures variants arrays miss, and generates data that can be re-analyzed as reference panels and imputation methods improve. However, arrays are simpler to analyze (no imputation required for called genotypes) and remain cheaper at very small sample sizes (<50).

At what coverage can I reliably call structural variants?

Large deletions and duplications (>1 Mb) are detectable from 0.5–1× lpWGS using read-depth-based tools (cn.mops, CNVkit). For comprehensive SV detection including insertions, inversions, and smaller events (<100 kb), ≥20× coverage with split-read and paired-end-based callers (Manta, Delly) is recommended. SV calling from 30× data achieves >95% sensitivity for events >1 kb in a 3 Gb genome.

Do I need a reference panel for low-pass WGS imputation?

For GLIMPSE2-based imputation, yes — a phased reference panel of ≥500 haplotypes is the minimum, with >1,000 strongly preferred. If no reference panel exists for your species, two alternatives: (1) build a custom panel by sequencing 50–100 genetically diverse individuals at ≥25×, then impute the remainder at 1–4×; or (2) use STITCH for reference-free imputation, which requires ≥100 samples at ≥2× coverage.

How does sequencing depth affect GWAS statistical power?

For common-variant GWAS (MAF >5%), statistical power is driven primarily by sample size, not coverage, once coverage exceeds ~0.5× with imputation. Sequencing 1,000 individuals at 1× will detect more true GWAS associations than 100 individuals at 30×. For rare-variant GWAS (MAF <1%), the relationship reverses: variant detection sensitivity requires ≥15× coverage, and imputation cannot recover variants absent from the reference panel.

What is the cheapest way to sequence 500 genomes?

At current (2025) pricing for a 1 Gb genome: 1× lpWGS at ~$30/sample = $15,000 total. Add $5,000 for imputation against a public reference panel = $20,000. This provides common-variant genotypes suitable for GWAS, population structure, and genomic prediction. If rare variants or SVs are required, budget for 10× at ~$200/sample = $100,000 for 500 samples.

Can I combine samples sequenced at different depths in one analysis?

Yes. Joint genotyping with GATK handles heterogeneous coverage across samples, and GLIMPSE2 imputation can harmonize a mixed-coverage design where 10–20% of samples are deep (≥25×) and the remainder are low-coverage (1–4×). This hybrid design is the most cost-effective strategy for population-scale projects in non-model organisms.

How much storage does a WGS project need?

A 1,000-sample project at 10× for a 3 Gb genome generates approximately 60 TB of data across FASTQ, BAM, and VCF files. Using CRAM instead of BAM reduces this by 40–50% (~36 TB). Adding PGEN for genotype data saves an additional 1–2 TB. Cloud storage costs roughly $25–50 per TB per month for active storage and $1–4 per TB per month for archival (glacier) storage.

What is the turnaround time for WGS at different depths?

Sequencing time scales linearly with coverage. A NovaSeq S4 flow cell produces ~3 Tb of data per 44-hour run. At 1× (3 Gb/sample), approximately 1,000 samples can be sequenced per run. At 30×, approximately 33 samples per run. Typical project timelines including library preparation, sequencing, and bioinformatics: 1× lpWGS = 4–6 weeks for 1,000 samples; 30× WGS = 8–12 weeks for 100 samples.
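
The per-run sample counts follow from dividing run output by the data needed per sample. This sketch uses the ~3 Tb (3,000 Gb) S4 output and 3 Gb genome from the answer above. It ignores duplicates, control spike-ins, and uneven pooling, so treat the result as an upper bound.

```python
def samples_per_run(run_output_gb, genome_size_gb, coverage):
    """Whole samples that fit in one run: output / (genome size x coverage).
    Ignores duplicates, spike-ins and uneven pooling (an upper bound)."""
    return int(run_output_gb // (genome_size_gb * coverage))

# NovaSeq S4 flow cell ~3,000 Gb per run, 3 Gb genome
for cov in (1, 10, 30):
    print(f"{cov}x: ~{samples_per_run(3_000, 3, cov)} samples per run")
```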

References:

Rubinacci S, Hofmeister RJ, Sousa da Mota B, Delaneau O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nature Genetics. 2023;55(7):1088-1090. doi:10.1038/s41588-023-01438-3
Wasik K, Berisa T, Pickrell JK, et al. Comparing low-pass sequencing and genotyping for trait mapping in pharmacogenetics. BMC Genomics. 2021;22:197. doi:10.1186/s12864-021-07508-2
Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nature Genetics. 2023;55(7):1243-1249. doi:10.1038/s41588-023-01415-w
Koorevaar T, van de Weg E, Visser RGF, et al. Genotype imputation from low-coverage WGS using haplotype reference panels in cultivated strawberry. BMC Genomics. 2025;26(1):968. doi:10.1186/s12864-025-12270-w
DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011;43(5):491-498. doi:10.1038/ng.806
Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. doi:10.1093/gigascience/giab008
Atsawawaranunt K, Whibley A, Santure AW, et al. Missing or mis-telling the story? Trade-offs for restriction-site associated compared to whole genome sequencing. Molecular Ecology. 2026;35(5):e17707. doi:10.1111/mec.17707
Zhao S, Agafonov O, Azab A, Stokowy T, Hovig E. Accuracy and efficiency of germline variant calling pipelines for human genome data. Scientific Reports. 2020;10:20222. doi:10.1038/s41598-020-77218-4
Davies RW, Flint J, Myers S, Mott R. Rapid genotype imputation from sequence without reference panels. Nature Genetics. 2016;48(8):965-969. doi:10.1038/ng.3594
Davies RW, Kucka M, Su D, et al. Rapid genotype imputation from sequence with reference panels. Nature Genetics. 2021;53(7):1104-1111. doi:10.1038/s41588-021-00877-0

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.
