LD Decay and Haplotype Blocks: Interpreting Curves for Marker Strategy
Modern genotyping projects live or die on how well you read the LD signal. This article explains how to turn LD decay curves and haplotype block calls into practical choices for marker density, sliding-window sizes, and cross-population transfer. It is written for teams running linkage disequilibrium (LD) analysis at scale, and it prioritizes interpretation over tooling. You'll learn what the curve features mean, how recombination shapes those features, and how to convert them into defensible, reviewer-ready decisions that save budget without losing power.
Why LD Decay Matters Now
LD decay describes how the average pairwise r² decreases as genomic distance increases.
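To pin down the statistic behind the curve, here is a minimal sketch of pairwise r² for two biallelic sites. It assumes phased haplotypes coded 0/1; the function name `pairwise_r2` is illustrative, and unphased genotype data would need a different estimator.

```python
from typing import Sequence


def pairwise_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic sites from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at site A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                   # at least one site is monomorphic
    return d * d / denom
```

Averaging this value over many site pairs at similar distances gives one point on the decay curve.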
LD decay concentrates many biological and technical realities into one curve. A fast drop implies extensive recombination or a large effective population size. A slow decline suggests longer correlation tracts, fewer historical recombination events, or specialized local structure. The two scenarios demand different marker strategies.
What's at stake:
- Panel efficiency. Over-dense panels waste reads and budget. Under-dense panels miss signal and reduce predictive power.
- Transferability. A design that fits one ancestry may underperform elsewhere. Decay differences are often the culprit.
- Downstream stability. Collinearity from poor spacing inflates false positives or weakens fine-mapping resolution.
Attention check: If your pilot GWAS feels unstable, check whether your marker spacing matches the observed decay curve. Spacing that ignores the curve is a common cause, and correcting it often stabilises the models.
Reading the Curve: From r² vs Distance to Design Knobs
A good decay plot bins distances and reports the mean or median r² per bin. Read it as you would any dynamics plot: short-range plateau, inflection zone, long-range tail. Three interpretable features guide most design choices.
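The binning step can be sketched as follows. This is an illustration only: it assumes you already have a list of (distance in kb, r²) pairs, and the 10 kb default bin width and the name `bin_decay` are assumptions, not recommendations from a specific tool.

```python
from collections import defaultdict
from statistics import mean


def bin_decay(pairs, bin_kb=10.0):
    """Summarise (distance_kb, r2) pairs into fixed-width distance bins.

    Returns a list of (bin_midpoint_kb, mean_r2, n_pairs), sorted by distance.
    """
    if bin_kb <= 0:
        raise ValueError("bin_kb must be positive")
    bins = defaultdict(list)
    for dist_kb, r2 in pairs:
        bins[int(dist_kb // bin_kb)].append(r2)
    return [((i + 0.5) * bin_kb, mean(values), len(values))
            for i, values in sorted(bins.items())]
```

Swapping `mean` for `statistics.median` gives the median-based variant. Reporting the pair count per bin helps readers judge how noisy each point is.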
Analysis of LD Decay for Functionally Different Segments of the Human Genome. (Guryev V. et al. (2006) PLOS Genetics)
1) Initial r² (short-range plateau)
This shows how tightly nearby variants move together. Elevated plateaus often appear in low-recombination segments, in founder populations, or in regions with structural features. High short-range r² means you can remove redundant markers more aggressively within that segment while watching for abrupt shifts at segment boundaries.
Implication: allow tighter pruning or fewer tags in that short range, but do not assume the same rule applies 200 kb away.
2) Half-decay distance (H50)
Define H50 as the distance where r² drops to half the short-range plateau. It's a practical proxy for "correlation reach."
- H50 ≤ 50 kb: correlation drops quickly; use smaller windows and higher density.
- H50 ≈ 100–250 kb: correlation persists; windows can be wider with modest density.
- H50 > 250 kb: watch for long-range structure or specific regions such as the MHC.
Implication: H50 helps set your default sliding-window size and the typical number of tags per megabase.
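One way to extract H50 from a binned curve is linear interpolation between the two bins that straddle half the plateau. The sketch below assumes the plateau is the mean r² of the first `plateau_bins` bins; that choice, and the name `estimate_h50`, are assumptions for illustration.

```python
from statistics import mean


def estimate_h50(curve, plateau_bins=1):
    """Estimate H50 from a decay curve.

    curve: list of (distance_kb, r2) sorted by increasing distance.
    Returns (plateau_r2, h50_kb); h50_kb is None if r2 never falls
    to half the plateau within the observed range.
    """
    if len(curve) < 2 or plateau_bins < 1:
        raise ValueError("need at least two bins and plateau_bins >= 1")
    plateau = mean(r2 for _, r2 in curve[:plateau_bins])
    target = plateau / 2.0
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 > target >= r1:
            # Linear interpolation between the two bins around the crossing
            frac = (r0 - target) / (r0 - r1)
            return plateau, d0 + frac * (d1 - d0)
    return plateau, None
```

A `None` result is informative in itself: the curve has not halved within the distances you measured, so extend the distance range before setting windows.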
3) Tail behaviour (longer distances)
The tail reveals residual long-range correlation. Extended tails can reflect recombination cold spots, inversions, population substructure, or run-of-homozygosity effects.
Schematic of the structure of long range linkage disequilibria. (Koch E. et al. (2013) PLOS ONE)
Implication: long tails argue for region-aware rules—exclude the region from genome-wide pruning, analyze it separately, and document exceptions in the report.
Quick reading checklist
- Short-range plateau height → local redundancy.
- H50 position → default window and step.
- Tail persistence → special-region handling and reporting notes.
Haplotype Blocks: Definitions, Methods, and Thresholds
A haplotype block is a chromosomal segment where variants travel together with minimal recombination inside the segment.
Patterns of LD for Orthologous Genomic Segments of Approximately 5 Mb in Rat, Human, and Mouse. (Guryev V. et al. (2006) PLOS Genetics)
You'll see multiple block-calling traditions. Choose one, document thresholds, and keep the outputs reviewer-friendly.
Common approaches you can defend
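- D′-based (Gabriel criteria). Uses confidence bounds on pairwise D′ to classify marker pairs as showing strong LD or evidence of historical recombination, then groups adjacent markers into blocks when the evidence for recombination is low.
- Four-gamete rule. Detects historical recombination by checking whether all four possible two-locus haplotypes occur for a pair of biallelic sites; a boundary is placed where all four appear.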
- LD segmentation / HMM. Models correlation transitions explicitly; more flexible in uneven density or mixed panels.
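The four-gamete rule is simple enough to sketch directly. The code below assumes phased haplotypes coded 0/1 and uses a greedy left-to-right partition, which is one simple variant rather than the behaviour of any particular software; both function names are illustrative.

```python
def four_gamete_violation(site_a, site_b):
    """True if all four two-locus haplotypes (00, 01, 10, 11) are observed."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same haplotypes")
    return len(set(zip(site_a, site_b))) == 4


def four_gamete_blocks(sites):
    """Greedy partition of ordered sites into blocks.

    sites: list of per-site allele lists (0/1), in map order.
    A new block starts when the incoming site shows all four gametes
    with any site already in the current block.
    Returns inclusive (start_index, end_index) tuples.
    """
    blocks = []
    start = 0
    for j in range(1, len(sites)):
        if any(four_gamete_violation(sites[i], sites[j]) for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    if sites:
        blocks.append((start, len(sites) - 1))
    return blocks
```

Because rare alleles seldom produce all four gametes in small samples, this rule is sensitive to the MAF filter and sample size, which is one reason to report both alongside the block calls.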
Each method has different sensitivities to allele frequency, sample size, and genotyping density. Whatever you choose, publish:
- Block size distribution. Median and IQR per chromosome and per ancestry.
- MAF sensitivity. Show how block metrics shift from MAF ≥0.05 to MAF ≥0.01.
- Tag density per block. Report the minimum tag count meeting r² coverage goals within blocks.
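A block size summary along these lines could be computed as below. The sketch assumes blocks are given as (start_bp, end_bp, n_snps) tuples and, as a design choice of this example, reports size statistics over multi-SNP blocks while counting single-SNP blocks separately. The name `block_size_summary` is hypothetical.

```python
from statistics import median, quantiles


def block_size_summary(blocks):
    """Median and IQR of block size (kb) plus the count of single-SNP blocks.

    blocks: list of (start_bp, end_bp, n_snps) with inclusive coordinates.
    """
    sizes_kb = [(end - start + 1) / 1000.0
                for start, end, n_snps in blocks if n_snps > 1]
    n_single = sum(1 for _, _, n_snps in blocks if n_snps == 1)
    if len(sizes_kb) < 2:
        raise ValueError("need at least two multi-SNP blocks for an IQR")
    q1, _, q3 = quantiles(sizes_kb, n=4)   # default 'exclusive' method
    return {"median_kb": median(sizes_kb),
            "iqr_kb": q3 - q1,
            "n_single_snp": n_single}
```

Run it once per chromosome and per ancestry, and state the quantile method in the manifest, since different methods give slightly different IQRs on small block counts.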
Guardrail: keep the r² colour scale and legend consistent across chromosomes. Inconsistent scales undermine comparisons and peer confidence.
Different block-building methods and parameters (e.g., LD thresholds, fixed windows, HaploBlocker) produce markedly different numbers of blocks and haplotypes across datasets. (Weber S.E. et al. (2023) Frontiers in Plant Science)
Turning Curves into Numbers: Density, Windows, and Cross-Population Transfer
This is where decay interpretation becomes policy. The aim is to set marker spacing, sliding-window sizes, and r² thresholds that survive contact with real cohorts.
From curve features to spacing policy
- If H50 ≤ 50 kb:
  - Density: aim for at least one high-quality marker every 20–40 kb in target regions.
  - Windows: 100–150 kb windows for pruning and inspection; step modestly.
  - Tags: r² ≥ 0.8 for local coverage, but verify in dense, gene-rich segments.
- If H50 ~ 100–250 kb:
  - Density: one marker every 50–100 kb in baseline regions; denser around known loci.
  - Windows: 250 kb is practical for pruning/inspection; shrink in high-recombination hotspots.
  - Tags: r² ≥ 0.8 still works; raise thresholds in fine-mapping targets.
- If tails extend beyond 250 kb:
  - Density: mix general spacing with region-specific exceptions.
  - Windows: keep global windows moderate; treat long-range segments as special cases.
  - Tags: consider multi-marker tags or haplotype-based proxies for stubborn segments.
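The spacing rules above can be encoded as a small lookup so the policy is explicit and testable. This is a sketch of the article's own numbers, not a validated design rule. The text gives no guidance for H50 between 50 and 100 kb, so the code flags that range instead of guessing, and it uses H50 > 250 kb as a stand-in for the long-tail case. `SpacingPolicy` and `policy_from_h50` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpacingPolicy:
    regime: str
    spacing_kb: Optional[Tuple[int, int]]   # one marker every (min, max) kb
    window_kb: Optional[Tuple[int, int]]    # pruning/inspection window range
    tag_r2: Optional[float]
    note: str


def policy_from_h50(h50_kb: float) -> SpacingPolicy:
    """Map an H50 estimate (kb) to the default spacing rules listed above."""
    if h50_kb <= 0:
        raise ValueError("h50_kb must be positive")
    if h50_kb <= 50:
        return SpacingPolicy("fast", (20, 40), (100, 150), 0.8,
                             "verify tags in dense, gene-rich segments")
    if h50_kb < 100:
        # Not covered by the rules above; left for pilot data to decide.
        return SpacingPolicy("unspecified", None, None, None,
                             "no rule given for 50-100 kb; decide from pilot data")
    if h50_kb <= 250:
        return SpacingPolicy("moderate", (50, 100), (250, 250), 0.8,
                             "densify near known loci; shrink windows in hotspots")
    return SpacingPolicy("long-range", None, None, None,
                         "use region-specific rules and multi-marker tags")
```

For example, `policy_from_h50(40)` returns the fast-decay defaults, while `policy_from_h50(300)` returns no global numbers at all, which reflects the advice to treat long-range segments as special cases.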
Windows and steps that mirror the curve
A single global window rarely reflects local biology. The curve argues for hybrid windows:
- Use distance-based windows (kb) to align with biology.
- Cap pair counts by variant count in ultra-dense regions to keep compute bounded.
- Move the step from 5 to 10 variants when the curve is smooth; small loss, big time win.
r² thresholds for different deliverables
- Pruning: r² ≈ 0.2 keeps models stable and reduces collinearity.
- Tagging: r² ≥ 0.8 strikes a balance between coverage and parsimony; push to 0.9 for critical loci.
- Block reporting: if reviewers ask for blocks, document the method and r² bins next to the block statistics.
Cross-population transfer in three moves
- Anchor group: fit density and windows to the anchor ancestry where you have power.
- Coverage checks: compute cross-ancestry r² coverage for the same tag set; flag weak regions.
- Supplements: add small, ancestry-specific supplements in flagged regions rather than redesigning the whole panel.
Outcome: a base panel that travels well, with compact add-ons where biology diverges.
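The coverage check in the second move can be sketched as a simple proportion. The code assumes you have, per ancestry, a lookup of pairwise r² values keyed by unordered variant pairs. The data structure and the name `tag_coverage` are assumptions for illustration, and the variant IDs in the usage note are made up.

```python
def tag_coverage(r2, variants, tags, threshold=0.8):
    """Proportion of variants covered by at least one tag at r2 >= threshold.

    r2: dict mapping frozenset({variant_a, variant_b}) -> r2 in one ancestry.
    A tag always counts as covering itself; missing pairs count as r2 = 0.
    """
    if not variants:
        raise ValueError("variants must be non-empty")
    tag_set = set(tags)
    covered = 0
    for v in variants:
        if v in tag_set or any(
            r2.get(frozenset((v, t)), 0.0) >= threshold for t in tag_set
        ):
            covered += 1
    return covered / len(variants)
```

Usage: compute the value for the same tag set in each ancestry, for example `tag_coverage(r2_anchor, region_variants, tags)` and `tag_coverage(r2_other, region_variants, tags)`, and flag regions where the second falls well below the first as candidates for a supplement.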
Average r² values for CA, MA and CAN animals, and persistence of LD phase (PL) between CA and MA animals with respect to physical genomic distance (kb). (Mokry F.B. et al. (2014) BMC Genomics)
QC and Reporting: The Plots Reviewers Accept
LD interpretation collapses without clean inputs. Even here—where we avoid tool minutiae—the minimum QC story matters because it defends your curve and block outputs.
QC story (what you state, not how you compute)
- Allele and build harmonisation. Declare the reference build and allele conventions; note strand handling.
- Missingness controls. State sample and marker completeness thresholds used to stabilise r².
- Frequency filters. Report the MAF threshold used for decay and block analysis.
- Population structure. Confirm that decay and blocks were computed within ancestry strata.
- Imputation quality. If imputed, provide INFO/R² cutoffs applied before LD summaries.
One sentence that helps reviewers: "Decay and blocks were computed within ancestry strata, after harmonisation and frequency/quality filters; thresholds and file manifests are provided in the supplement."
Plot set that earns trust
- r²–distance curves per ancestry. Same binning, same axes, same r² colour scale.
- Representative LD heatmaps. Choose a typical region and one complex region; annotate windows and MAF.
- Block size histogram. Label median size and IQR; show the count of single-SNP "blocks" if your criteria permit them.
- Tag coverage table. For each ancestry, report the proportion of variants covered by tags at r² ≥ 0.8 in target regions.
Reproducibility note: include a parameter manifest listing frequency filters, window sizes, step sizes, r² bins, block criteria, reference build, and software/versions.
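A parameter manifest can be as simple as a JSON file written once per analysis. The sketch below shows one possible layout; every field name and value is a placeholder chosen for illustration (including the reference build and the software entry), and "r² bins" is interpreted here as the distance-bin width used for the decay curve.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Dict


@dataclass
class LDManifest:
    reference_build: str
    maf_min: float
    window_kb: int
    step_variants: int
    decay_bin_kb: float
    prune_r2: float
    tag_r2: float
    block_method: str
    software: Dict[str, str] = field(default_factory=dict)


# Placeholder values only; replace with what was actually run.
manifest = LDManifest(
    reference_build="GRCh38",          # placeholder build name
    maf_min=0.05,
    window_kb=250,
    step_variants=5,
    decay_bin_kb=10.0,
    prune_r2=0.2,
    tag_r2=0.8,
    block_method="gabriel",
    software={"plink": "UNSPECIFIED"},  # record the real version string here
)

manifest_json = json.dumps(asdict(manifest), indent=2, sort_keys=True)
```

Writing `manifest_json` next to each figure set keeps the thresholds in captions and the thresholds actually used from drifting apart.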
Common interpretation pitfalls to pre-empt
- Mixed ancestry in decay. Pooled decay curves blur real differences; always stratify first.
- Scale switching. Changing r² scales across figures invites misreading; lock the scale.
- Over-pruning before decay. Pruning too early flattens the curve and hides structure; compute decay on a representative set.
Worked Examples: From Plot to Policy
Below are compact, scenario-based examples. They illustrate how to read the curve and adjust the design accordingly without repeating pipeline steps.
Example A — Fast decay cohort (short H50, clean tail)
- Curve: high initial r², H50 ~ 40–60 kb, tail dives quickly.
- Interpretation: frequent recombination; local correlation dies quickly.
- Policy: dense around candidates (20–40 kb), global windows ~150 kb, step modestly. Keep r² ≥ 0.8 for tags; raise to 0.9 only in fine-mapping pockets.
- Transfer: expect lower transfer into cohorts with slower decay; prepare small supplements.
Example B — Moderate decay cohort (mid H50, classic shape)
- Curve: initial r² moderate, H50 ~ 150–200 kb, steady tail.
- Interpretation: "typical" recombination profile; good for a single base density.
- Policy: global windows ~250 kb, step 5–10 variants; one tag every 50–100 kb as a baseline.
- Transfer: anchor the panel here, test coverage in other ancestries, then plan add-ons.
Example C — Long-tail region (local anomaly)
- Curve (regional): short-range looks normal, but r² > 0.2 persists beyond 300 kb.
- Interpretation: long-range LD region or structure such as an inversion.
- Policy: treat the region specially—do not let it dictate genome-wide spacing. Build a local rule: widened windows for inspection, capped comparisons, and explicit reporting.
- Transfer: often ancestry-specific; test before adding heavy marker load.
Tip: These examples emphasise interpretation and policy, not commands. For execution details, send readers to your internal pipeline or to your PLINK LD Workflow page.
Cross-Population Comparisons Without the Pain
You do not need to recompute everything from scratch to compare groups meaningfully. Plan comparisons so they isolate true biological differences.
- Fix the binning. Use identical distance bins and r² statistics across ancestries.
- Report paired metrics. Compare H50, tail length thresholds (e.g., distance where r² < 0.1), and block medians side by side.
- Lock the visuals. Same axis ranges and colour scales; otherwise readers will infer differences that are scale artefacts.
- Highlight exceptions, not averages. A small set of regions often drives transfer issues; list them explicitly.
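The paired-metric idea can be sketched for the tail threshold. The code assumes each ancestry has a binned curve of (distance in kb, r²) pairs and refuses to compare curves built on different bins, in line with the first point above. The function names are illustrative.

```python
def tail_distance(curve, cutoff=0.1):
    """First binned distance (kb) at which r2 falls below cutoff, else None.

    curve: list of (distance_kb, r2) sorted by increasing distance.
    """
    for dist_kb, r2 in curve:
        if r2 < cutoff:
            return dist_kb
    return None


def compare_tails(curves, cutoff=0.1):
    """Tail distances per ancestry, requiring identical distance bins.

    curves: dict mapping ancestry label -> curve as described above.
    """
    bin_sets = {tuple(d for d, _ in curve) for curve in curves.values()}
    if len(bin_sets) > 1:
        raise ValueError("curves must share identical distance bins")
    return {name: tail_distance(curve, cutoff) for name, curve in curves.items()}
```

A `None` in the result means the curve never dropped below the cutoff in the measured range, which is itself worth listing among the exceptions.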
Practical deliverable: a one-page, four-panel figure with two decay plots and two histograms (block size and tag coverage), using identical axes across ancestries. It answers most collaborator questions at a glance.
From Curves to Cost: Budget-Aware Decisions
Interpretation guides budget just as much as it guides science. Use the curve to trade off reads, turnaround time, and robustness.
- Avoid uniform density by habit. If decay is fast in most regions, target depth where it matters, not everywhere.
- Bundle "difficult" loci. Treat long-tail regions as a separate cost centre; concentrate extra markers only there.
- Stage work with pilots. A tiny pilot on two chromosomes validates spacing and block rules before you commit the whole cohort.
- Document exceptions. Exceptions cost time later if undocumented. Label them once in the manifest and reuse across studies.
Outcome: a design that spends where biology demands and nowhere else.
FAQs: LD Decay & Haplotype Blocks
1) How do I read an LD decay curve and choose bins/windows?
Use fixed distance bins across cohorts; set window size from H50 and cap comparisons in dense data.
2) What is H50 and how does it guide marker density?
H50 is the distance where r² halves; smaller H50 needs tighter spacing, larger H50 allows wider spacing but watch long tails.
3) Which haplotype block method should I use, and what must I report?
Pick one method (Gabriel, four-gamete, or HMM) and report thresholds, MAF filters, block sizes, and software/version.
4) What r² thresholds work for pruning and tag SNP selection?
Use ~0.2 for pruning; use ≥0.8—up to 0.9 for fine-mapping—for tagging, validated per ancestry.
5) How do I handle long-range LD so it doesn't distort design rules?
Flag LRLD regions, exclude them from global pruning, analyze separately, and document the rationale.
6) How do I check cross-population transfer before freezing panel density?
Keep identical bins and r² cutoffs across ancestries, test tag coverage, and add small ancestry-specific supplements where needed.
Action: Apply, Validate, and Scale
Turn interpretation into a plan your team can execute this week.
Step-by-step next actions
- Extract features from your current decay plots: plateau r², H50, tail threshold distances.
- Propose windows and density from those features, including a small list of expected exception regions.
- Pilot on two chromosomes, then adjust the step and windows based on retained markers and runtime.
- Lock the manifest with frequency filters, window sizes, step sizes, r² thresholds, and block criteria.
- Validate transfer into at least one other ancestry; add a compact supplement if required.
- Publish a figure set with consistent scales and captions that state the exact thresholds used.
Glossary
- LD decay: average r² as a function of genomic distance; used to set spacing.
- H50: distance where r² halves from its short-range plateau; a proxy for correlation reach.
- Haplotype block: contiguous segment with high internal LD and limited recombination.
- Tag SNP: a marker chosen to proxy nearby variants at a chosen r² threshold.
Reader Checklist (copy into your brief)
- Do our decay plots show a fast, moderate, or slow drop?
- What are H50 and tail thresholds in our anchor ancestry?
- Which regions demand exceptions, and why?
- What density and windows follow from those numbers?
- How will we validate transfer and document exceptions?
Final Thoughts
Interpreting LD decay and haplotype blocks is not an academic exercise—it is your pathway from exploratory data to efficient, credible panels. Read the curve features, convert them into spacing and windows, and publish a manifest that any reviewer can follow. Done well, LD interpretation becomes a durable asset across studies, platforms, and ancestries.
CD Genomics provides research-use services for institutions and companies. We do not offer personal or clinical testing.
References
- Guryev, V., Smits, B.M.G., van de Belt, J. et al. Haplotype Block Structure Is Conserved across Mammals. PLoS Genetics 2, e121 (2006).
- Koch, E., Ristroph, M., Kirkpatrick, M. Long Range Linkage Disequilibrium across the Human Genome. PLoS ONE 8, e80754 (2013).
- Mokry, F.B., Buzanskas, M.E., de Alvarenga Mudadu, M. et al. Linkage disequilibrium and haplotype block structure in a composite beef cattle breed. BMC Genomics 15 (Suppl 7), S6 (2014).
- Weber, S.E., Frisch, M., Snowdon, R.J., Voss-Fels, K.P. Haplotype blocks for genomic prediction: a comparative evaluation in multiple crop datasets. Frontiers in Plant Science 14, 1217589 (2023).
- The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
- Gabriel, S.B., Schaffner, S.F., Nguyen, H. et al. The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002).
- Purcell, S., Neale, B., Todd-Brown, K. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81, 559–575 (2007).