This ddRAD plant manual is written for breeding programs and ag-biotech teams that need reliable SNP panels from non-model crops. It walks through enzyme pairs for plants, plant-aware sample prep, library and size-selection choices for repetitive genomes, coverage targets for small vs. large genomes, and an analysis workflow (Stacks/ipyrad) that turns reads into breeding-ready markers.

The practical problem. Many crop projects require tens of thousands of markers across hundreds of lines, often without a complete reference genome. Double-digest RAD-seq (ddRAD) solves this by using two restriction enzymes and narrow size selection to sample the same genomic fraction across individuals. Compared with shotgun sequencing, ddRAD delivers stable locus recovery at a lower cost per sample, making it well suited to non-model plants with large or repeat-rich genomes.
Double-digest RAD-seq schematic: two-enzyme digestion plus narrow size selection yields consistent genomic tags and supports high multiplexing with dual indexing. (Peterson B.K. et al., 2012, PLOS ONE).
Why ddRAD over single-digest RAD or GBS? Two-enzyme digestion reduces random shearing effects, increases reproducibility across plates and seasons, and allows you to "dial" marker density by choosing enzyme pairs and fragment windows. In breeding contexts, that control translates into cleaner downstream statistics—fewer missing genotypes, better locus sharing across cohorts, and less batch-to-batch drift.
What you'll get from this guide. A decision-first walkthrough: (1) plant-aware sampling and DNA QC, (2) enzyme pair selection by genome size and GC profile, (3) library design for repetitive genomes, (4) sequencing and coverage targets you can budget around, (5) analysis and QC tuned for plants, and (6) how to hand off a panel for population structure, GWAS/QTL, and selection scans—without wasting runs on avoidable pitfalls.
Project goals map → design (quick view).
Common early pitfalls to avoid. Over-tight fragment windows in repeat-rich genomes (causing locus dropout), hidden relatedness inflating association signals, and index hopping on patterned flow cells when non-unique indexes are used. All three can be prevented at the design stage, and together they account for a large share of QC failures in plant ddRAD panels.
Sampling that respects plant chemistry. Young leaves are usually best (lower polysaccharides and phenolics). Seeds and roots often carry more inhibitors; pre-washes (e.g., sorbitol) help. Maintain a cold chain; avoid repeated freeze–thaw cycles that shear DNA. A CTAB-based extraction remains a robust default for diverse crop tissues; many labs add PVP for phenolic-rich species and RNase to reduce carryover.
DNA QC thresholds that protect downstream genotyping.
These thresholds are conservative but practical across leaf, seed, and root tissues. In our experience, meeting them reduces allele dropout and uneven digestion far more than any exotic cleanup step.
Barcoding and batching at plate scale. Randomize samples across plates and columns to decouple phenotype groups from lane effects. Use unique dual indexes (UDI) so that hopped reads (index misassignment) can be identified and removed during demultiplexing. On patterned flow cells (e.g., HiSeq 4000, NovaSeq), a modest level of index cross-talk can occur; UDIs and careful cleanup of free adapters meaningfully reduce it.
Characterization of index swapping on patterned flow cells; UDI enables detection of swapped reads across tiles and index combinations. (Costello M. et al., 2018, BMC Genomics).
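The reason UDIs make hopped reads detectable can be shown with a minimal sketch. It assumes reads have already been reduced to (i7, i5) index pairs. The function name, the sample sheet layout, and the index sequences are hypothetical and not tied to any particular demultiplexer.

```python
from collections import Counter


def classify_index_pairs(read_indexes, sample_sheet):
    """Count reads per sample and tally unexpected index combinations.

    read_indexes: iterable of (i7, i5) tuples, one per read (hypothetical input).
    sample_sheet: dict of sample name -> (i7, i5). With unique dual indexes,
                  each i7 and each i5 belongs to exactly one sample.
    """
    pair_to_sample = {pair: name for name, pair in sample_sheet.items()}
    known_i7 = {i7 for i7, _ in pair_to_sample}
    known_i5 = {i5 for _, i5 in pair_to_sample}

    per_sample = Counter()
    hopped = 0   # both indexes are valid, but this combination was never pooled
    unknown = 0  # at least one index is absent from the sheet
    for i7, i5 in read_indexes:
        sample = pair_to_sample.get((i7, i5))
        if sample is not None:
            per_sample[sample] += 1
        elif i7 in known_i7 and i5 in known_i5:
            hopped += 1
        else:
            unknown += 1
    return per_sample, hopped, unknown


# Illustrative data only: two samples, one swapped pair, one unreadable index.
sheet = {"line_A": ("ATCACG", "TTAGGC"), "line_B": ("CGATGT", "TGACCA")}
reads = [
    ("ATCACG", "TTAGGC"),
    ("CGATGT", "TGACCA"),
    ("ATCACG", "TGACCA"),  # i7 of line_A with i5 of line_B
    ("NNNNNN", "TTAGGC"),
]
counts, hopped, unknown = classify_index_pairs(reads, sheet)
print(dict(counts), hopped, unknown)
```

With combinatorial (non-unique) indexing, the swapped pair in this example could be a legitimate sample, so it would be assigned rather than discarded.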
Cross-batch controls that pay off later.
Hands-on prep tips (from plant ddRAD projects).
Together, these practical steps have a stronger effect on downstream call rates than minor tweaks to PCR cycles or ligation timing.
Choosing enzyme pairs by genome and GC content. Enzyme selection determines tag density and locus sharing. In plants, methylation patterns matter: repeats and transposable elements are typically heavily methylated, so methylation-sensitive rare cutters (e.g., PstI) cut preferentially in hypomethylated, gene-rich regions, which can be desirable for trait mapping; largely methylation-insensitive partners (e.g., MspI) maintain broader coverage. Practical, plant-friendly pairs often combine a rare cutter with a frequent cutter to balance tag count and complexity.
Table — Common ddRAD enzyme pairs for plants (practical guide)
| Rare cutter + frequent cutter | Methylation sensitivity | Typical use case (qualitative) | Notes for plants |
|---|---|---|---|
| PstI + MspI | PstI sensitive; MspI largely insensitive | Genic bias, moderate tag density | Good for many crops; enriches gene-proximal SNPs |
| EcoRI + MspI | EcoRI sensitive; MspI insensitive | Balanced panels | Broadly used; watch GC bias in some species |
| SbfI + MspI | SbfI very rare; MspI insensitive | Large genomes; fewer but consistent tags | Keeps locus counts tractable in repeat-rich genomes |
| AvaII + MspI | AvaII rare; MspI insensitive | Cross-species, cost-aware plant panels | Reported as robust in plant-optimized ddRAD protocols |
How to use this table. Start with two candidate pairs that bracket your expected tag density (e.g., PstI+MspI for genic bias and AvaII+MspI for cross-species consistency). Run a 24–48 sample pilot with two fragment windows to see which combination maximizes shared polymorphic loci after filtering.
Fragment window selection for repetitive genomes. In large or repeat-rich genomes, a moderately wide window (for example, 400–700 bp) often improves locus sharing while limiting repetitive fragments. Extremely tight windows can increase dropout when small shifts in fragment distributions occur between plates. Evaluate distribution on a Bioanalyzer or equivalent before committing the full cohort.
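Before a wet-lab pilot, the effect of an enzyme pair and size window on tag count can be roughed out in silico when any draft assembly or related sequence is available. The sketch below is a simplification under stated assumptions: it assumes complete digestion, ignores methylation and the exact cut offset within each site (so lengths are approximate by a few bases), and scans one sequence at a time. The function name is hypothetical.

```python
import re

# Recognition sequences, 5'->3'. AvaII is GGWCC, where W = A or T.
# All five sites are palindromic, so scanning one strand is sufficient.
SITES = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "EcoRI": "GAATTC",
    "SbfI": "CCTGCAGG",
    "AvaII": "GG[AT]CC",
}


def count_ddrad_fragments(seq, rare, frequent, min_len, max_len):
    """Count fragments bounded by one site of each enzyme within a size window."""
    seq = seq.upper()
    cuts = []
    for enzyme in (rare, frequent):
        # A lookahead pattern reports overlapping sites as well.
        pattern = re.compile("(?=" + SITES[enzyme] + ")")
        cuts.extend((m.start(), enzyme) for m in pattern.finditer(seq))
    cuts.sort()

    kept = 0
    # Adjacent cut sites delimit the fragments of a complete double digest.
    for (pos_a, enz_a), (pos_b, enz_b) in zip(cuts, cuts[1:]):
        if enz_a != enz_b and min_len <= pos_b - pos_a <= max_len:
            kept += 1
    return kept


# Toy sequence: PstI, 500 bp spacer, MspI, 100 bp spacer, MspI, 450 bp spacer, PstI.
toy = "CTGCAG" + "A" * 500 + "CCGG" + "A" * 100 + "CCGG" + "A" * 450 + "CTGCAG"
print(count_ddrad_fragments(toy, "PstI", "MspI", 400, 700))
```

In the toy sequence the MspI-to-MspI fragment is excluded because only fragments with one end from each enzyme carry both adapters. Counts from real sequence are a planning aid and do not replace the Bioanalyzer check.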
Avoiding star activity and chimera formation.
Indexing and read structure. Use UDIs to control index cross-talk and pair with paired-end sequencing (e.g., PE150) so that downstream software can assemble short contigs per locus. The extra information from paired-end reads reduces paralog conflation and improves genotyping in complex plant genomes.
How many reads per sample? There is no one-size-fits-all number across crops, but a practical rule is to aim for coverage that stabilizes genotype calls after filtering. Many population-scale ddRAD projects target ≈10× per retained locus. Translating that into reads depends on enzyme+window choices and genome size; typical ranges land around 1–3 million read pairs per sample for moderate-density panels in small-to-medium genomes. Plan a pilot to measure: (i) loci shared across ≥80% of individuals, (ii) missingness after QC, and (iii) effective per-locus depth.
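The translation from target depth to read pairs is simple arithmetic once a pilot gives the number of retained loci and the fraction of raw read pairs that end up on them. The sketch below assumes one read pair covers one locus; the function name and the example inputs (100,000 loci, 50% usable reads) are illustrative assumptions, not measured values.

```python
import math


def read_pairs_per_sample(n_loci, target_depth, usable_fraction):
    """Raw read pairs needed for retained loci to reach target_depth on average.

    n_loci: loci expected after filtering (from a pilot or in silico estimate).
    target_depth: desired mean depth per retained locus, e.g. 10.
    usable_fraction: share of raw read pairs that pass demultiplexing and QC
                     and map to retained loci, in (0, 1].
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(n_loci * target_depth / usable_fraction)


# Illustrative inputs: 100,000 loci at 10x with half of the raw pairs usable.
print(read_pairs_per_sample(100_000, 10, 0.5))  # 2000000
```

This example lands inside the 1–3 million read-pair range quoted above. Halving the usable fraction doubles the requirement, which is why the pilot should measure it directly.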
Genome-size-aware scenarios.
Lane balance and multiplexing. Pool libraries to equalize representation and avoid over-clustering. On patterned flow cells, keep adapter carryover low and use UDIs; any residual index swapping can then be removed during demultiplexing.
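Equalizing representation usually means pooling the same molar amount of each library. The sketch below assumes library concentrations are already expressed in nM (1 nM equals 1 fmol per microliter). The function name, the 20 fmol target, and the 10 uL pipetting cap are hypothetical choices for illustration.

```python
def pooling_volumes(conc_nM, fmol_per_library, max_volume_ul=None):
    """Volume (uL) of each library that contributes the same molar amount.

    conc_nM: dict of library name -> concentration in nM (1 nM = 1 fmol/uL).
    fmol_per_library: molar amount each library should add to the pool.
    max_volume_ul: optional cap; libraries needing more are reported separately.
    """
    volumes = {}
    too_dilute = []
    for name, conc in conc_nM.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volume = fmol_per_library / conc
        volumes[name] = volume
        if max_volume_ul is not None and volume > max_volume_ul:
            too_dilute.append(name)
    return volumes, too_dilute


# Illustrative concentrations only.
libraries = {"lib1": 10.0, "lib2": 4.0, "lib3": 0.5}
volumes, too_dilute = pooling_volumes(libraries, fmol_per_library=20, max_volume_ul=10)
print(volumes, too_dilute)
```

Libraries flagged as too dilute are candidates for re-concentration or re-preparation. Adding them at a reduced amount would leave those samples under-covered.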
When to consider low-pass depth plus imputation. For very large cohorts, moderate per-sample depth combined with within-population imputation can reduce costs while preserving analytical power. This is common in plant genomics pipelines where pedigree or population structure supports accurate imputation; document the reference panels and methods used to keep results reproducible.
Pilot design that answers the only question that matters. Compare two enzyme pairs × two size windows on 24–48 representative samples. Choose the recipe that maximizes shared polymorphic loci at acceptable missingness. Do not chase the absolute largest raw SNP count; prioritize robustness and cross-plate reproducibility.
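The selection rule above can be written down explicitly so the decision is reproducible. The sketch below assumes each pilot arm has already been summarized into two numbers. The class, the field names, the 20% missingness cap, and all values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    enzymes: str
    window_bp: tuple
    shared_polymorphic_loci: int  # loci with >=1 SNP that pass the sharing threshold
    missingness: float            # mean fraction of missing genotypes after QC
    raw_snps: int = 0             # recorded for reference, deliberately not used to rank


def choose_recipe(results, max_missingness=0.2):
    """Pick the arm with the most shared polymorphic loci at acceptable missingness."""
    eligible = [r for r in results if r.missingness <= max_missingness]
    if not eligible:
        return None  # no arm is acceptable; revisit enzymes or windows
    return max(eligible, key=lambda r: r.shared_polymorphic_loci)


# Made-up numbers for a 2 enzyme pairs x 2 windows pilot.
pilot = [
    PilotResult("PstI+MspI", (300, 500), 18_000, 0.12, raw_snps=90_000),
    PilotResult("PstI+MspI", (400, 700), 21_000, 0.15, raw_snps=80_000),
    PilotResult("AvaII+MspI", (300, 500), 25_000, 0.31, raw_snps=140_000),
    PilotResult("AvaII+MspI", (400, 700), 19_500, 0.10, raw_snps=70_000),
]
best = choose_recipe(pilot)
print(best.enzymes, best.window_bp)
```

In this made-up example the arm with the largest raw SNP count is rejected on missingness, which is the behavior the pilot rule asks for.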
De novo vs. reference-guided analysis. When a high-quality reference genome exists (even from a related cultivar), reference-aligned calling improves contiguity, functional interpretation, and integration with downstream tools (LD, GWAS, selection scans). Where references are incomplete, a de novo approach remains defensible if you use conservative filters. Two widely used, well-documented toolchains are Stacks and ipyrad.
Parameter tuning that actually matters (m/M/n). Three parameters control marker yield, genotyping error, and inferred differentiation: the minimum depth to form a stack (m), the mismatches allowed between stacks within an individual (M), and the mismatches allowed between loci across individuals (n). Follow a documented grid-search strategy and track core metrics at each setting: loci, SNPs, heterozygosity, and r80 (the number of polymorphic loci shared by at least 80% of individuals).
Number of polymorphic loci present in ≥75% of individuals across values of m (with different M/n), before and after PCR-clone removal. (Díaz-Arce N. & Rodríguez-Ezpeleta N., 2019, Frontiers in Genetics)
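A grid search over m, M, and n needs two pieces: an explicit list of parameter sets and a locus-sharing metric computed the same way for each run. The sketch below builds both without calling any pipeline. Tying n to M is one common convention and is an assumption here, as are the function names and the input layout.

```python
from itertools import product


def parameter_grid(m_values, M_values):
    """Yield one dict per (m, M) combination, with n set equal to M."""
    for m, M in product(m_values, M_values):
        yield {"m": m, "M": M, "n": M}


def shared_polymorphic_loci(locus_to_samples, polymorphic, n_samples, min_fraction=0.8):
    """Count polymorphic loci genotyped in at least min_fraction of individuals.

    locus_to_samples: dict of locus ID -> set of sample names with a genotype.
    polymorphic: set of locus IDs carrying at least one SNP.
    """
    return sum(
        1
        for locus, samples in locus_to_samples.items()
        if locus in polymorphic and len(samples) / n_samples >= min_fraction
    )


grid = list(parameter_grid(m_values=[3, 5], M_values=[1, 2, 3]))

# Toy catalog for five individuals (illustrative only).
catalog = {
    "a": {"s1", "s2", "s3", "s4", "s5"},
    "b": {"s1", "s2", "s3", "s4"},
    "c": {"s1", "s2", "s3"},
    "d": {"s1", "s2", "s3", "s4", "s5"},  # shared by all but monomorphic
}
print(len(grid), shared_polymorphic_loci(catalog, {"a", "b", "c"}, n_samples=5))
```

Setting min_fraction to 0.75 reproduces the threshold used in the figure above. Whichever threshold is chosen, it should be recorded alongside each parameter set.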
Pipeline differences change biological conclusions. Stacks, SAMtools-based workflows, and GATK can produce different call sets on reduced-representation data because of genotype models and filters. To protect interpretability: fix versions, publish parameter files, and replicate key steps on a subset when you change chemistry or instruments.
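When replicating key steps on a subset, a simple genotype concordance figure shows how far two call sets diverge. The sketch below assumes genotypes have been exported as VCF-style strings keyed by sample and locus. The function names and the toy calls are hypothetical.

```python
def _normalize(genotype):
    """Treat '0/1', '1/0' and '0|1' as the same unphased genotype."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def genotype_concordance(calls_a, calls_b):
    """Compare two call sets on the genotypes that both of them report.

    calls_a, calls_b: dict of (sample, locus) -> genotype string, or None if missing.
    Returns (number of genotypes compared, fraction in agreement or None).
    """
    shared = [
        key
        for key in calls_a
        if calls_a[key] is not None and calls_b.get(key) is not None
    ]
    if not shared:
        return 0, None
    agree = sum(_normalize(calls_a[k]) == _normalize(calls_b[k]) for k in shared)
    return len(shared), agree / len(shared)


# Toy call sets from two hypothetical pipelines.
pipeline_a = {("s1", "L1"): "0/1", ("s1", "L2"): "0/0",
              ("s2", "L1"): "1/1", ("s2", "L2"): "0/1", ("s3", "L1"): None}
pipeline_b = {("s1", "L1"): "1/0", ("s1", "L2"): "0/0",
              ("s2", "L1"): "0/1", ("s2", "L2"): "0|1", ("s3", "L1"): "0/0"}
print(genotype_concordance(pipeline_a, pipeline_b))
```

Reporting the number of genotypes compared next to the concordance matters, because two pipelines can agree well on a small shared set while differing widely in what they call at all.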
Plant-aware filtering to reduce false positives.
A mini-playbook for fast triage.
De novo vs. reference-aligned (decision note). If you anticipate GWAS or selection scans, reference-aligned calls simplify integration with gene annotations and LD maps. If your priority is broad discovery in a non-model crop, start de novo with conservative filters and consider lifting over representative loci once a reference becomes available.
What "breeding-ready" looks like.
Panel hand-off checklist (one page).
Why this matters to breeding. The faster you can turn sequencing into a stable, annotated SNP set, the sooner your program can run genomic selection, verify QTLs, and advance lines with confidence. A robust ddRAD panel is often the cheapest path to that outcome in non-model crops.
Start with a rarer cutter paired with a frequent, largely methylation-insensitive cutter (for example, SbfI + MspI or AvaII + MspI) and test a moderately wide window (e.g., 400–700 bp). The goal is a tractable number of tags that still share well across individuals.
Budget to stabilize genotype likelihoods after filtering. Many panels balance cost and power around ≈10× per retained locus; depending on genome size and enzyme/window choice, that may equate to ~1–3 M read pairs per sample. Confirm with a pilot; do not lock budgets without empirical locus-sharing metrics.
Yes, with tighter size selection, paired-end reads, and paralog-aware filters (depth and heterozygosity caps). Paired-end aware callers (e.g., Stacks v2) help separate paralogs and reduce false heterozygosity in complex plant genomes.
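Depth and heterozygosity caps can be applied as a per-locus flagging step. The sketch below is a minimal version. The thresholds (twice the median depth, observed heterozygosity above 0.6) are illustrative assumptions that should be tuned per species and ploidy, and the function name and input layout are hypothetical.

```python
from statistics import median


def flag_putative_paralogs(loci, max_depth_ratio=2.0, max_obs_het=0.6):
    """Flag loci with excess depth or excess heterozygosity.

    loci: dict of locus ID -> {"mean_depth": float, "obs_het": float}.
    Returns a dict of flagged locus ID -> list of reasons.
    """
    if not loci:
        return {}
    depth_cap = max_depth_ratio * median(s["mean_depth"] for s in loci.values())
    flagged = {}
    for locus, stats in loci.items():
        reasons = []
        if stats["mean_depth"] > depth_cap:
            reasons.append("depth")           # collapsed copies pile up reads
        if stats["obs_het"] > max_obs_het:
            reasons.append("heterozygosity")  # fixed differences between copies
        if reasons:
            flagged[locus] = reasons
    return flagged


# Toy per-locus summaries (illustrative only).
summaries = {
    "L1": {"mean_depth": 10.0, "obs_het": 0.2},
    "L2": {"mean_depth": 12.0, "obs_het": 0.3},
    "L3": {"mean_depth": 11.0, "obs_het": 0.9},
    "L4": {"mean_depth": 40.0, "obs_het": 0.1},
}
print(flag_putative_paralogs(summaries))
```

Using the median as the depth baseline keeps a few very deep loci from raising the cap, which a mean-based cap would allow.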
Use unique dual indexes, remove free adapters/primers before pooling, and discard unexpected index combinations during demultiplexing. Double indexing is a proven strategy to limit misassignment.
No. ddRAD was designed to work without one. When a good reference exists, reference-aligned calling simplifies LD, GWAS, and functional interpretation; otherwise, a de novo pipeline with conservative filters is a defensible starting point.
If you're mapping diversity or traits in a non-model crop, our team can help you move from idea to pilot to breeding-ready panel:
Ready to scope a pilot? Share species, approximate genome size, desired marker density, and budget envelope. We'll return two enzyme pairs × two size windows, a per-sample coverage plan, and a milestone-based timeline you can take to internal review.
References