Low-Coverage WGS + ANGSD vs ddRAD: When to Replace, When to Complement
Projects that profile genetic diversity, demography, or selection often face a high-impact choice: lcWGS + ANGSD genotype likelihoods or ddRAD. This decision guide explains what each method delivers, when to switch, and when to run them together—so you can design a defensible study, control costs, and still recover genome-wide signals.

Executive Decision Snapshot
The short version.
- Choose lcWGS + ANGSD when you need unbiased, genome-wide information: structure, site frequency spectrum (SFS), FST, relatedness, linkage disequilibrium (LD) and selection scans—all from low depth using genotype likelihoods rather than hard calls. It scales well, integrates with imputation, and avoids many ascertainment issues tied to arrays.
- Choose ddRAD when the reference is weak or absent, the genome is very large, or budgets demand strict per-sample cost control while still delivering robust population differentiation and diversity signals. ddRAD remains a cost-aware way to genotype non-model species at scale.
Not either/or.
A common pattern is to run a small ddRAD pilot to map broad structure, then scale up with lcWGS + ANGSD for genome-wide inference. This complementarity keeps early risk low while preserving long-term analytical power.
What "lcWGS + ANGSD" Really Means
Low-coverage WGS (lcWGS).
lcWGS typically ranges from ~0.5× to ~6× per sample. At these depths, hard genotype calls are unreliable, but ANGSD solves this by operating on genotype likelihoods (GLs) computed directly from reads and base qualities. With GLs, you can estimate allele frequencies, perform population structure inference, compute FST, and build the SFS—without converting to error-prone hard calls.
Overview of ANGSD's genotype-likelihood (GL) approach enabling population genetic inference at low sequencing depth. (Korneliussen T.S. et al. (2014) BMC Bioinformatics)
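To make the idea of a genotype likelihood concrete, here is a minimal Python sketch for a single biallelic site in one diploid sample. It assumes a simplified GATK-style model (independent reads, errors spread evenly over the three other bases). The function name and the toy reads are illustrative, and ANGSD's own GL models handle more detail than this.

```python
import math

def genotype_likelihoods(bases, quals, ref, alt):
    """Return normalized likelihoods for 0, 1 and 2 copies of `alt`.

    Assumes independent reads and Phred qualities > 0 (illustrative model only).
    """
    log_liks = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2  # chance a read was drawn from the alt chromosome
        log_lik = 0.0
        for base, qual in zip(bases, quals):
            err = 10 ** (-qual / 10)  # Phred score -> error probability
            p_if_ref = 1 - err if base == ref else err / 3
            p_if_alt = 1 - err if base == alt else err / 3
            log_lik += math.log(frac_alt * p_if_alt + (1 - frac_alt) * p_if_ref)
        log_liks.append(log_lik)
    # Rescale in log space before exponentiating to avoid underflow.
    top = max(log_liks)
    scaled = [math.exp(x - top) for x in log_liks]
    total = sum(scaled)
    return [s / total for s in scaled]

# Three reads at one site: two support the reference, one the alternate.
gls = genotype_likelihoods(bases="AAG", quals=[30, 30, 30], ref="A", alt="G")
print(gls)
```

With only three reads the heterozygote is favoured but the other genotypes are not excluded in principle, which is why downstream methods carry all three values forward instead of committing to one call.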
ANGSD, in practice.
ANGSD and its GL-aware companions (e.g., NGSadmix) are built for low-depth population genomics. They let you run PCA/structure, admixture, relatedness, and selection tests while propagating uncertainty from read depth and quality. That statistical discipline is the reason lcWGS + ANGSD often outperforms array-based pipelines for discovery-oriented projects or diverse ancestries.
Why analysts like it.
- Works on BAM/CRAM, avoids premature hard calls.
- Handles large cohorts with scalable workflows.
- Plays well with imputation (e.g., GLIMPSE2) when you later need dense genotypes for GWAS or fine-mapping.
Where ddRAD Still Wins
No or poor reference genome.
ddRAD was designed for non-model species. By using two restriction enzymes and a narrow size window, it samples a repeatable genomic fraction across individuals—without requiring a finished reference genome. That makes it a defensible default for many wildlife and plant projects starting from scratch.
Loading plot of the first (F1) and second (F2) principal components showing the variation for main fruit traits in accessions of the mini-core set developed from ddRAD SNP data of 288 cultivated tomato genotypes. (Esposito, S., et al. Hortic Res, 2020)
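The two-enzyme, size-window sampling described above can be previewed with an in silico digest. The sketch below is a rough illustration under stated assumptions: the recognition sites (GAATTC and CCGG, the motifs of EcoRI and MspI) are used only as examples, cut positions are approximated by the start of each motif, and the function name is hypothetical.

```python
import re

def ddrad_fragments(seq, site_a, site_b, min_len, max_len):
    """Return (start, end) for fragments bounded by one site_a and one site_b cut.

    Cut positions are approximated by the motif start; real enzymes cut at a
    defined offset inside the motif, which shifts lengths by a few bases.
    """
    # Lookahead matching so that overlapping motif occurrences are all found.
    cuts = [(m.start(), "A") for m in re.finditer(f"(?={site_a})", seq)]
    cuts += [(m.start(), "B") for m in re.finditer(f"(?={site_b})", seq)]
    cuts.sort()
    kept = []
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        length = pos2 - pos1
        # Keep only fragments with different enzymes at each end, inside the window.
        if enz1 != enz2 and min_len <= length <= max_len:
            kept.append((pos1, pos2))
    return kept

# Toy sequence: A-site, B-site, then two A-sites close together.
toy = "GAATTC" + "A" * 10 + "CCGG" + "A" * 50 + "GAATTC" + "T" * 5 + "GAATTC"
narrow = ddrad_fragments(toy, "GAATTC", "CCGG", min_len=10, max_len=20)
wide = ddrad_fragments(toy, "GAATTC", "CCGG", min_len=10, max_len=60)
print(narrow, wide)
```

Running the same function over a draft assembly or a related species' genome gives a first estimate of how locus counts respond to enzyme and window choices before any library is made.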
Tight budgets, large genomes.
Very large or repeat-rich genomes can make lcWGS expensive at useful depths. ddRAD keeps per-sample costs tractable while still recovering thousands of informative SNPs. With careful enzyme choice, size-selection, and paired-end assembly (e.g., Stacks v2), you can generate reliable population-level markers.
Stable differentiation and diversity readouts.
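When your primary deliverables are structure, diversity, and FST, ddRAD can be more than enough, especially in conservation genomics and breeding programs where relative differences matter more than dense LD scans. Parameterization still matters; follow published guidance to tune the Stacks parameters m (minimum read depth to form a stack), M (mismatches allowed between stacks within an individual), and n (mismatches allowed between loci when building the catalog), together with filtering, to maximize shared, high-quality loci.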
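As an illustration of the FST readout mentioned above, the following sketch computes Hudson's estimator as a ratio of averages across sites from per-population allele frequencies. It assumes called genotypes from a filtered ddRAD set, with n1 and n2 being the number of sampled chromosomes (twice the number of diploid individuals). The function name is hypothetical, and this is one of several FST estimators in use.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST (ratio of averages) for two populations.

    p1, p2: per-site allele frequencies in each population.
    n1, n2: number of sampled chromosomes per population (assumed constant across sites).
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no informative sites: both populations fixed for the same allele")
    return num / den

# A site fixed for opposite alleles gives complete differentiation.
fixed_diff = hudson_fst([1.0], [0.0], n1=10, n2=10)
print(fixed_diff)
```

Summing numerators and denominators before dividing, as done here, tends to be more stable than averaging per-site ratios when many loci have low minor allele frequency.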
Where lcWGS + ANGSD Wins
Genome-wide, unbiased signals.
lcWGS samples the whole genome, which helps avoid ascertainment bias common to pre-selected SNP sets. That matters for demographic inference, rare variants, and selection signals. With modern imputation (e.g., GLIMPSE2), low-coverage data can be boosted to dense variant sets with strong accuracy—especially when large reference panels exist.
Imputation performance of low-coverage sequencing data. (Li, D., et al., Journal of Dairy Science, 2025)
Population structure, admixture, and relatedness at low depth.
GL-based approaches (ANGSD, NGSadmix) let you estimate structure and relatedness directly from low-depth data, preserving power while acknowledging uncertainty. This is particularly useful in mixed-ancestry cohorts or when sampling constraints limit depth.
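One way to see how uncertainty is carried forward is to combine genotype likelihoods with a population allele frequency instead of picking the single best genotype. The sketch below assumes Hardy-Weinberg proportions as the prior and a biallelic site. The function name is illustrative, and structure-aware tools use richer priors than this.

```python
def posterior_dosage(gl, alt_freq):
    """Posterior genotype probabilities and expected alt-allele dosage.

    gl: likelihoods for 0, 1, 2 alt copies (any positive scaling).
    alt_freq: population frequency of the alt allele (Hardy-Weinberg prior assumed).
    """
    f = alt_freq
    prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    weighted = [lik * pr for lik, pr in zip(gl, prior)]
    total = sum(weighted)
    post = [w / total for w in weighted]
    dosage = post[1] + 2 * post[2]  # expected number of alt alleles
    return post, dosage

# With no read evidence (flat likelihoods) the estimate falls back to the prior.
post, dosage = posterior_dosage([1.0, 1.0, 1.0], alt_freq=0.1)
print(post, dosage)
```

A sample with no reads at a site contributes its prior expectation and nothing more, so missing data dilutes the signal without introducing a spurious hard call.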
Downstream analytics beyond ddRAD's comfort zone.
If you need LD-decay curves, selection scans with fine resolution, or demographic models that rely on genome-wide SFS, lcWGS is generally the stronger substrate, even at low depth. In large-cohort settings, low-coverage sequencing can also outperform arrays for discovery and imputation-driven analyses, particularly at rare variants, provided a suitable reference panel is available.
Design Trade-offs: Samples, Depth, Reference & Budget
The four levers you can actually control.
- Reference quality. A high-quality reference tilts the choice toward lcWGS; a weak or distant reference keeps ddRAD competitive.
- Depth vs samples. More samples at lower depth improve structure/admixture inference with ANGSD; ddRAD offers a parallel cost-control path by reducing genomic breadth.
- Genome size/complexity. Larger genomes raise lcWGS costs; ddRAD moderates locus counts via enzyme/window choices.
- Target analyses. If you need LD/selection scans or demographic models, favor lcWGS + ANGSD. For differentiation and diversity across many sites at low cost, ddRAD is often sufficient.
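The depth, genome size, and budget levers above are tied together by simple coverage arithmetic: mean depth is roughly total sequenced bases divided by genome size. The sketch below turns that into a read-pair estimate. The function name and the defaults (2 × 150 bp reads, a `usable_fraction` to allow for duplicates and unmapped reads) are assumptions for illustration, not recommended values.

```python
import math

def read_pairs_needed(genome_size_bp, target_depth, read_length_bp=150, usable_fraction=1.0):
    """Estimate paired-end read pairs needed to reach a mean depth.

    usable_fraction: share of sequenced bases expected to survive QC and mapping
    (hypothetical parameter; measure it in a pilot).
    """
    bases_needed = genome_size_bp * target_depth / usable_fraction
    return math.ceil(bases_needed / (2 * read_length_bp))

# A 1 Gb genome at 2x mean depth with 2 x 150 bp reads.
ideal = read_pairs_needed(1_000_000_000, 2.0)
# Same target if only 80% of bases are usable.
realistic = read_pairs_needed(1_000_000_000, 2.0, usable_fraction=0.8)
print(ideal, realistic)
```

Because the requirement scales linearly with genome size, a tenfold larger genome needs tenfold more sequencing for the same depth, which is the point at which reduced-representation approaches become attractive.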
Decision patterns we see repeatedly.
- Exploration without a reference: start with ddRAD to map structure; preserve tissue/DNA for later lcWGS if a reference improves.
- Reference exists, budget moderate: lcWGS at 0.5–4× with ANGSD/NGSadmix for GL-based inference; impute later if GWAS is planned.
- Mixed-method validation: cross-validate ddRAD-based structure with lcWGS on a subset; this de-risks long projects and quantifies any method bias.
Practical Recipes (Field-Tested)
Recipe A — No/weak reference & strict cost control (ddRAD).
- Pick a rare + frequent cutter pair suitable for genome GC/methylation patterns; run a narrow but tolerant size window to maximize shared loci.
- Use unique dual indexes and balanced pooling; check for index hopping on patterned flow cells and remediate during demultiplexing.
- Assemble with Stacks v2 (paired-end aware); grid-search m/M/n and apply paralog/missingness filters to stabilize downstream inferences.
Outputs: structure, diversity, FST, and a filtered VCF/PLINK set for selection scans with moderate resolution.
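The m/M/n grid search in Recipe A can be organized with a small helper like the one below. It is a sketch under assumptions: setting n equal to M is used here as a common starting heuristic, not a rule, the selection criterion (loci shared across most samples) must be computed from your own Stacks runs, and both function names are hypothetical.

```python
from itertools import product

def stacks_param_grid(m_values, M_values):
    """Enumerate (m, M, n) combinations, with n tied to M as a starting heuristic."""
    return [{"m": m, "M": M, "n": M} for m, M in product(m_values, M_values)]

def pick_best(shared_loci_by_params):
    """Return the (m, M, n) key with the most widely shared loci.

    shared_loci_by_params: dict mapping (m, M, n) -> locus count from your own runs.
    """
    return max(shared_loci_by_params, key=shared_loci_by_params.get)

grid = stacks_param_grid(m_values=[3], M_values=range(1, 5))

# Placeholder counts purely to show the call; replace with real run summaries.
example_counts = {(3, 1, 1): 100, (3, 2, 2): 150, (3, 3, 3): 140}
best = pick_best(example_counts)
print(grid, best)
```

Recording the full grid and the chosen combination alongside the results makes it easier to show reviewers that the final parameters were not picked arbitrarily.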
Recipe B — Moderate reference; need LD/selection and SFS (lcWGS + ANGSD).
- Sequence 0.5–4× depending on genome size and budget; align to the best available reference.
- Compute genotype likelihoods in ANGSD; run NGSadmix for structure; estimate FST and build the SFS.
- If imputation is planned, keep read groups and QC tidy; later boost density with GLIMPSE2 or similar.
Outputs: genome-wide structure, LD decay, selection scans, demographic summaries; imputation-ready BAM/CRAM plus GL matrices.
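Once an SFS has been estimated, standard diversity summaries follow directly from it. The sketch below computes Watterson's theta and nucleotide diversity (pi) from an unfolded SFS. It assumes the input holds only the polymorphic classes (derived-allele counts 1 to n-1), so if your SFS file also contains the invariant classes, drop them first. The function name is illustrative.

```python
def sfs_summaries(sfs):
    """Watterson's theta and pi from an unfolded SFS.

    sfs[i-1] = number of sites where the derived allele is seen i times,
    for i = 1..n-1, where n is the number of sampled chromosomes.
    """
    n = len(sfs) + 1
    seg_sites = sum(sfs)
    a1 = sum(1 / i for i in range(1, n))  # harmonic number used by Watterson's estimator
    theta_w = seg_sites / a1
    # Each site with i derived copies contributes 2*i*(n-i) / (n*(n-1)) pairwise differences.
    pi = sum(count * 2 * i * (n - i) for i, count in enumerate(sfs, start=1)) / (n * (n - 1))
    return theta_w, pi

# A neutral-shaped spectrum (counts proportional to 1/i) for n = 5 chromosomes.
theta_w, pi = sfs_summaries([12, 6, 4, 3])
print(theta_w, pi)
```

When the two estimates diverge, the direction of the difference is the basis of Tajima's D: an excess of rare variants pulls pi below Watterson's theta, and an excess of intermediate-frequency variants does the opposite.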
Recipe C — Relatedness at low depth (lcWGS + GLs).
- Avoid hard calls at 1–2×; estimate kinship/relatedness directly from GLs to reduce bias.
- Use replicate pairs and known pedigrees to validate thresholds.
Outputs: a kinship matrix that supports downstream GWAS controls or stratified sampling decisions.
Method Biases & How to Manage Them
Ascertainment bias (arrays) vs breadth (lcWGS) vs representation (ddRAD).
Arrays draw from pre-ascertained SNPs, which can distort diversity and demographic inferences; lcWGS avoids this by sampling the whole genome. ddRAD samples a repeatable subset of the genome defined by restriction motifs and size windows; that subset is not pre-ascertained by frequency, but it can carry representation quirks if enzyme/window choices are extreme. Plan pilots to quantify bias before committing.
Pipeline effects are real.
For ddRAD, Stacks v2 improves locus assembly and genotyping using paired-end contigs; parameter choices (m/M/n) and PCR-clone handling change results. For lcWGS + ANGSD, different GL models and filters also matter. Freeze versions, record commands, and publish parameter files with your release.
Index hopping and multiplexing.
On patterned flow cells, measure and mitigate index swapping with unique dual indexes and cleanup of free adapters. Remove unexpected index pairs in demultiplexing to reduce cross-talk.
Characterization of index swapping mechanism. (Costello M. et al., 2018, BMC Genomics)
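With unique dual indexes, reads carrying a known i7 paired with a known but mismatched i5 are the signature of index swapping. The sketch below tallies such reads from a list of observed index pairs. The function name, input format, and toy index labels are assumptions for illustration, and real demultiplexers also handle sequencing errors within the index reads.

```python
from collections import Counter

def index_hop_summary(read_index_pairs, expected_pairs):
    """Classify reads by index pair and estimate the swap rate.

    read_index_pairs: iterable of (i7, i5) observed per read.
    expected_pairs: set of (i7, i5) combinations actually used in the pool.
    """
    i7_known = {i7 for i7, _ in expected_pairs}
    i5_known = {i5 for _, i5 in expected_pairs}
    counts = Counter()
    for i7, i5 in read_index_pairs:
        if (i7, i5) in expected_pairs:
            counts["expected"] += 1
        elif i7 in i7_known and i5 in i5_known:
            counts["swapped"] += 1  # both indexes valid, combination is not
        else:
            counts["unknown"] += 1
    total = sum(counts.values())
    hop_rate = counts["swapped"] / total if total else 0.0
    return counts, hop_rate

expected = {("A", "X"), ("B", "Y")}
reads = [("A", "X")] * 8 + [("A", "Y")] + [("C", "X")]
counts, hop_rate = index_hop_summary(reads, expected)
print(counts, hop_rate)
```

Only the "expected" class should be passed to downstream analysis. The swapped fraction is worth tracking per run as a QC metric.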
Integration with Services & Deliverables
What we hand over for either path.
- Clean VCF/BCF or GL-aware outputs, PLINK files, and a version-locked data dictionary.
- A visual QC dashboard covering depth, missingness, contamination, and replicate concordance.
- For lcWGS + imputation projects, pre-imputation and post-imputation reports with panel metadata.
Where each service fits.
- Population Genomics Sequencing Service: run lcWGS or ddRAD at scale with plate randomization and UDIs.
- Pipeline Choice: ANGSD/NGSadmix for GLs; Stacks v2 for ddRAD.
- Population Structure & Relatedness: GL-aware structure and kinship at low depth.
- Project Design & Budget Modeling: reads-per-sample calculator; enzyme/window pilots; imputation panel strategy.
FAQ
How many reads per sample for lcWGS + ANGSD?
It depends on genome size and analyses, but many projects run at ~0.5–4× and validate with SFS and structure convergence rather than fixed read counts. The key is that GL-based methods preserve power at low depth while acknowledging uncertainty.
No reference genome—can lcWGS still work?
You can align to a draft or a pseudo-reference, but if the reference is weak and budgets are tight, ddRAD is often the safer first step. Build structure with ddRAD, then revisit lcWGS once the reference improves.
Are arrays cheaper than lcWGS for population genomics?
Sometimes, but arrays carry ascertainment bias that can distort diversity and demography. Low-coverage sequencing plus modern imputation is increasingly competitive and often superior for discovery and rare variants.
Can I estimate relatedness at 1–2× depth?
Yes. Use GL-based methods rather than hard calls; they can recover kinship and structure at low coverage. Validate your kinship thresholds against replicate pairs or known pedigrees where available.
Does ddRAD still make sense if I need selection scans?
Yes, for coarse scans and comparative FST. For fine-scale LD/selection and demographic modelling, lcWGS + ANGSD is generally stronger due to genome-wide coverage.
Your Next Steps
If you need a fast, defensible plan, we will model both paths against your constraints:
- Share genome size, reference status, target analyses (structure, LD/selection, PRS/GWAS), and budget envelope.
- We'll return a side-by-side design: ddRAD vs lcWGS + ANGSD, a pilot plan (enzyme/window or depth grid), and a reads-per-sample budget you can take to internal review.
- Add optional Low-Pass WGS & Imputation if you need dense genotypes later for GWAS or fine-mapping.
References
- Peterson, B.K., Weber, J.N., Kay, E.H., Fisher, H.S., Hoekstra, H.E. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7, e37135 (2012).
- Korneliussen, T.S., Albrechtsen, A., Nielsen, R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15, 356 (2014).
- Díaz-Arce, N., Rodríguez-Ezpeleta, N. Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better? Frontiers in Genetics 10, 533 (2019).
- Lachance, J., Tishkoff, S.A. SNP ascertainment bias in population genetic analyses: why it is important, and how to correct it. Genome Medicine 5, 33 (2013).
- Costello, M., Fleharty, M., Abreu, J. et al. Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genomics 19, 332 (2018).
- Rochette, N.C., Rivera-Colón, A.G., Catchen, J.M. Stacks 2: Analytical methods for paired-end sequencing improve RADseq-based population genomics. Molecular Ecology 28(21), 4737–4754 (2019).
- Meisner, J., Albrechtsen, A. Inferring population structure and admixture proportions in low-depth NGS data. Genetics 210(2), 719–731 (2018).
- Esposito, S., Cardi, T., Campanelli, G. et al. ddRAD sequencing-based genotyping for population structure analysis in cultivated tomato provides new insights into the genomic diversity of Mediterranean 'da serbo' type long shelf-life germplasm. Hortic Res 7, 134 (2020).
- Li, D., Xiao, Y., Chen, X. et al. Genomic selection and weighted single-step genome-wide association study of sheep body weight and milk yield: Imputing low-coverage sequencing data with similar genetic background panels. Journal of Dairy Science 108, 3820–3834 (2025).
* Designed for biological research and industrial applications, not intended for individual clinical or medical purposes.