Reporting LD Results That Reviewers Accept: Plots, Thresholds, Pitfalls
Many strong papers stall because linkage disequilibrium (LD) analysis is reported inconsistently. Reviewers see an LD heatmap with unclear axes, or an r² decay plot without a stated cutoff. Methods mention tools but omit versions and window sizes. Most comments trace to three gaps: ambiguous metrics, unmotivated thresholds, and missing LD quality-control details. This guide shows what to report, why it matters, and how to package figures so reviewers say "clear and reproducible." You will find templates, checklists, and pointers to companion guides on implementing the pipeline and connecting LD results to tag SNP design.
Why LD Reporting Gets Papers Stuck in Review
LD is intuitive once shown well, yet easy to misread without context. A reviewer asks, "Is this r² or D′?" Another asks, "Why 0.2 as the cutoff?" A third notes that population structure was not addressed, making the decay curve hard to interpret. None of these are fatal. They are preventable.
Three recurring failure points:
- Metric ambiguity: r² and D′ carry different biological meanings and statistical behaviour.
- Threshold opacity: reviewers want to see the link from decay behaviour to chosen cutoffs.
- Invisible QC: without sample and variant filtering evidence, any plot looks suspect.
Fix these, and you transform a good analysis into a reviewer-ready package. Your results become easier to trust, reuse, and extend.
What Reviewers Want to See—And Why It Matters
Editors reward clarity, reproducibility, and relevance. Strong LD reporting advances all three:
- Clarity: State the LD metric in each figure and caption. Mark thresholds on plots. Annotate notable high-LD "ridges" or long-range LD regions.
- Reproducibility: List software, versions, windows, steps, and seeds. Provide a compact parameter table and, if possible, a toy region with commands.
- Relevance: Tie LD evidence to what follows—pruning for association tests, panel design, or haplotype-based interpretation.
A crisp LD section shortens reviewer exchanges and speeds decisions. It also saves your readers time when they adapt your pipeline to their cohorts.
Implementation details: see Running LD the Right Way: PLINK Workflow, Parameters, and LD Pruning
From LD to assay choices: see From LD to Tag SNPs: Building Efficient Panels Without Losing Power
Minimum Reporting Standards for LD Methods & Legends
Editors look for specific items in Methods and figure captions. Make them easy to find.
Study context
- Reference genome build and source.
- Phasing status and tool assumptions.
- Cohort description and ancestry labels at journal-approved resolution.
- Final sample counts after QC.
Variant handling
- Variant types analyzed (SNPs alone, or SNPs + indels).
- Filters with thresholds: minor allele frequency, call rate, Hardy–Weinberg equilibrium.
- Imputation details if used: reference panel, software, version, quality score cutoffs.
- Strand and allele harmonisation across datasets and chips.
LD computation
- Metric used: r² or D′, with a one-line rationale.
- Window size and step, with units stated (e.g., 500 kb window, 50 kb step; some tools take the step as a variant count rather than a distance).
- Pairwise distance bins for decay plots (e.g., 10 kb bins).
- Software, exact versions, and relevant flags.
- Random seeds for any stochastic procedures.
Parameter table (recommended)
A single, skimmable table listing: software + version, window/step, bin sizes, filters and cutoffs, seeds, and any liftOver notes. Reviewers often copy this table into their notes.
Quick answers for readers
- When to report r² vs D′? Use r² for predictability, pruning, and tag SNP evidence; use D′ to describe recombination and define haplotype blocks. Report both when both topics are central.
- What window must be stated? Always state the physical window and step. Justify them using your cohort's LD decay.
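To make the r² versus D′ distinction concrete, here is a minimal sketch that computes both from phased haplotypes. It assumes each SNP is given as a 0/1 vector with one entry per haplotype; the function name `ld_metrics` is ours. Tools that start from unphased genotypes must first estimate haplotype frequencies, so their values can differ from this direct calculation.

```python
def ld_metrics(hap_a, hap_b):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (1 = coded allele at that SNP).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need equal-length, non-empty haplotype vectors")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # frequency of the haplotype carrying both coded alleles
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("monomorphic SNP: LD is undefined")
    r2 = d * d / denom
    # D' rescales D by the largest |D| the allele frequencies allow
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # signed; figures usually show |D'|
    return d, d_prime, r2
```

For example, `ld_metrics([1] + [0] * 9, [1] * 5 + [0] * 5)` describes a rare allele that always sits on one common background: D′ is 1, yet r² is only about 0.11. That gap is why D′ alone says little about tagging or pruning.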
For curve logic and block strategy, see LD Decay and Haplotype Blocks: Interpreting Curves for Marker Strategy.
The Core Figure Set: Heatmaps, LD Plot r², Blocks, and QC
A focused figure set communicates better than a crowded one. Most manuscripts need four components.
1) Pairwise LD heatmap
- Triangular matrix with a fixed colour scale from 0 to 1.
- Metric labelled on the legend (r² or D′).
- Genomic coordinates on axes; add a gene track if space allows.
- Consistent scale across panels for fair comparison.
- Bin size or SNP density noted; post-QC sample size in caption.
Caption template:
"Pairwise LD heatmap (r²) for chrX: 34.0–34.8 Mb; 5 kb bins; scale 0–1; n=1,234 post-QC."
Pairwise LD heatmaps with r² (upper triangle) and |D′| (lower triangle). (Andrade A.C.B. et al. (2019) PLOS ONE).
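The matrix behind a triangular heatmap can be prepared as below. This sketch assumes unphased allele dosages (0/1/2) and uses the squared Pearson correlation of dosages, a common approximation for unphased data; it is not necessarily what any particular LD tool computes, and the function names are illustrative. Cells below the diagonal are left as NaN so the lower triangle can be left blank (or reused for |D′|, as in the figure above). When plotting, pinning the colour limits, for example `imshow(mat, vmin=0, vmax=1, cmap="viridis")` in matplotlib, keeps the 0–1 scale identical across panels.

```python
import math


def dosage_r2(x, y):
    """Squared Pearson correlation of two allele-dosage vectors (0/1/2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # monomorphic variant: r² undefined
    return sxy * sxy / (sxx * syy)


def upper_triangle_r2(dosages):
    """Matrix for a triangular heatmap: r² above the diagonal, NaN elsewhere.

    dosages: list of per-variant dosage vectors, ordered by genomic position.
    """
    m = len(dosages)
    mat = [[math.nan] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            mat[i][j] = dosage_r2(dosages[i], dosages[j])
    return mat
```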
2) LD decay curve (r² versus distance)
- Mean or median r² by distance bins.
- 95% confidence intervals via bootstrap or jackknife.
- Annotate the distance where r² drops below the chosen cutoff.
- Optional panels: per chromosome or per population.
Caption template:
"LD decay: mean r² by 10 kb bins with bootstrap 95% CI; r² < 0.2 at ~180 kb."
LD decay across distance in three popcorn populations. (Andrade A.C.B. et al. (2019) PLOS ONE).
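The binning, interval, and crossing-point steps above can be sketched as follows, assuming you already have (distance, r²) values for SNP pairs. The seed, bin width, and function names are illustrative. Note that resampling pairs treats them as independent, which they are not (pairs share SNPs), so this percentile bootstrap is optimistic; resampling SNPs or genomic blocks is more conservative. The crossing point is reported at bin resolution and assumes the curve declines monotonically.

```python
import random
import statistics


def ld_decay(pairs, bin_kb=10, n_boot=1000, seed=42):
    """Mean r² per distance bin with a percentile bootstrap 95% interval.

    pairs: iterable of (distance_bp, r2) for SNP pairs.
    Returns a list of (bin_start_kb, mean_r2, ci_low, ci_high) sorted by distance.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    bins = {}
    for dist_bp, r2 in pairs:
        bins.setdefault(int(dist_bp // (bin_kb * 1000)), []).append(r2)
    rows = []
    for b in sorted(bins):
        vals = bins[b]
        boots = [
            statistics.fmean(rng.choices(vals, k=len(vals))) for _ in range(n_boot)
        ]
        # 40 quantile groups give cut points at 2.5%, 5%, ..., 97.5%
        cuts = statistics.quantiles(boots, n=40, method="inclusive")
        rows.append((b * bin_kb, statistics.fmean(vals), cuts[0], cuts[-1]))
    return rows


def crossing_kb(rows, cutoff=0.2):
    """Start of the first bin whose mean r² falls below the cutoff (None if never)."""
    for start_kb, mean_r2, _, _ in rows:
        if mean_r2 < cutoff:
            return start_kb
    return None
```

The value returned by `crossing_kb` is what the caption template above reports as "r² < 0.2 at ~180 kb".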
3) Haplotype block visualisation
- Identify the block algorithm and parameters (e.g., CI-based D′ thresholds).
- Provide block size distribution as an inset or table.
- Show one example region with gene context and block boundaries.
Caption template:
"Haplotype blocks by CI-based D′; lower D′ confidence bound ≥ 0.70; median block 38 kb (IQR 16–82 kb)."
Genome-wide distribution of SNPs and haplotype blocks across chromosomes. (Otyama P.I. et al. (2019) BMC Genomics).
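The block size summary quoted in the caption template can be computed directly from block boundaries. A minimal sketch, assuming the block caller's output has been reduced to (start_bp, end_bp) tuples; the function names and default method label are illustrative.

```python
import statistics


def block_size_summary(blocks):
    """Median and interquartile range of haplotype block lengths, in kb.

    blocks: list of (start_bp, end_bp) tuples (at least two blocks).
    Returns (median_kb, q1_kb, q3_kb).
    """
    sizes_kb = [(end - start) / 1000 for start, end in blocks]
    q1, median, q3 = statistics.quantiles(sizes_kb, n=4, method="inclusive")
    return median, q1, q3


def block_caption(blocks, method="CI-based D′"):
    """Fill the caption template with the observed block size distribution."""
    median, q1, q3 = block_size_summary(blocks)
    return (
        f"Haplotype blocks by {method}; median block {median:.0f} kb "
        f"(IQR {q1:.0f}–{q3:.0f} kb)."
    )
```

State which quantile convention you used; different definitions can shift the IQR slightly for small numbers of blocks.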
4) QC evidence panels
- MAF distribution after filtering.
- Missingness per sample and per variant.
- Relatedness and duplicate checks.
- HWE departures and handling.
- Imputation quality, if applicable.
- Known long-range LD regions or inversions and how they were treated.
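The per-variant and per-sample numbers behind the MAF and missingness panels can be sketched as below. This assumes diploid genotypes coded as dosages (0, 1, 2) with `None` for a missing call; the layout (samples as rows) and the function name are illustrative.

```python
def qc_metrics(genotypes):
    """Per-variant MAF and call rate, plus per-sample call rate.

    genotypes: list of samples, each a list of dosages (0, 1, 2) or None
    for a missing call; every sample lists variants in the same order.
    Returns (variant_stats, sample_call_rates).
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    variant_stats = []
    for v in range(n_variants):
        calls = [row[v] for row in genotypes if row[v] is not None]
        call_rate = len(calls) / n_samples
        if calls:
            alt_freq = sum(calls) / (2 * len(calls))  # two alleles per diploid call
            maf = min(alt_freq, 1 - alt_freq)
        else:
            maf = float("nan")  # no calls at all: MAF undefined
        variant_stats.append({"maf": maf, "call_rate": call_rate})
    sample_call_rates = [
        sum(g is not None for g in row) / n_variants for row in genotypes
    ]
    return variant_stats, sample_call_rates
```

Histograms of `maf` (after filtering) and of both call-rate vectors give the panels listed above.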
Design guidance
- Prefer vector output; if raster, supply ≥300 dpi.
- Use colour-blind-safe palettes; avoid red–green reliance.
- Keep axis labels readable in print (≥8–9 pt).
Thresholds That Survive Peer Review (r², Windows, MAF)
Thresholds should be presented as decisions tied to observed decay, marker density, and study goals. Write the goal, then the numbers, then the justification.
Pruning for association pre-processing
- Goal: reduce correlated predictors and inflated test counts.
- Typical choice: window 200–500 kb, step 20–50 kb, pruning r² between 0.1 and 0.5.
- Justification: balances redundancy control with retention of signal.
- Report: window, step, r², per-population settings if used, and sensitivity checks.
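To show what a window and r² threshold do during pruning, here is a simplified greedy sketch. It is not PLINK's exact algorithm: it scans every variant as an anchor (so there is no separate step parameter), compares it with later variants inside a physical window, and drops the lower-MAF member of any pair above the threshold. The function name, the MAF tie-break rule, and the defaults are illustrative assumptions, and results depend on scan order.

```python
def prune_window(positions, mafs, r2, window_bp=250_000, threshold=0.2):
    """Simplified greedy LD pruning within a physical window.

    positions: sorted bp positions; mafs: per-variant MAF (same order);
    r2(i, j): callable returning r² between variants i < j.
    Returns indices of retained variants.
    """
    removed = set()
    n = len(positions)
    for i in range(n):
        if i in removed:
            continue
        j = i + 1
        while j < n and positions[j] - positions[i] <= window_bp:
            if j not in removed and r2(i, j) > threshold:
                # drop the less informative (lower-MAF) member of the pair
                if mafs[j] <= mafs[i]:
                    removed.add(j)
                else:
                    removed.add(i)
                    break  # anchor i is gone; move to the next anchor
            j += 1
    return [k for k in range(n) if k not in removed]
```

Running the same function over a small grid of thresholds (for example 0.1, 0.2, 0.5) is one simple way to produce the sensitivity check mentioned above.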
Tag SNP selection for panels
- Goal: capture common variation using the fewest markers.
- Typical choice: r² ≥ 0.8 (or ≥ 0.9 where stricter capture is needed, at the cost of more markers).
- Justification: ensures predictive coverage for target MAF ranges.
- Report: coverage achieved (proportion of common SNPs tagged) and constraints (assay chemistry, multiplexing).
- Next: From LD to Tag SNPs (/pop-genomics/resources/tag-snp-selection-from-ld.html).
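The coverage figure reviewers ask for falls out naturally from a greedy pairwise selection. The sketch below follows the general greedy heuristic used by many tagging tools but is not any specific tool's implementation; `candidates` stands for the SNPs that pass assay constraints, and all names and defaults are illustrative.

```python
def _captures(tag, target, r2, threshold):
    """A tag captures a target if it is the target or in LD at r² >= threshold."""
    return tag == target or r2(tag, target) >= threshold


def greedy_tags(targets, candidates, r2, threshold=0.8, max_tags=None):
    """Greedy pairwise tag selection with a coverage report.

    targets: SNP IDs to capture; candidates: SNP IDs that can be assayed;
    r2(a, b): symmetric callable returning pairwise r².
    Returns (tags, coverage), coverage = fraction of targets captured.
    """
    untagged = set(targets)
    tags = []
    while untagged and (max_tags is None or len(tags) < max_tags):
        # candidate capturing the most still-untagged targets (ties: list order)
        gains = {c: sum(_captures(c, t, r2, threshold) for t in untagged) for c in candidates}
        best = max(candidates, key=gains.get)
        if gains[best] == 0:
            break  # remaining targets cannot be captured by any candidate
        tags.append(best)
        untagged = {t for t in untagged if not _captures(best, t, r2, threshold)}
    coverage = 1 - len(untagged) / len(targets)
    return tags, coverage
```

Targets that no assayable candidate can reach show up as lost coverage rather than silently disappearing, which is the trade-off the Report line above asks you to state.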
Block calling
- Goal: describe recombination landscape and haplotype structure.
- Parameters: algorithm, confidence thresholds, minimum informative SNPs per block.
- Report: block size distribution and at least one region with gene context.
MAF matters for r²
- r² depends on allele frequency: SNPs with dissimilar MAFs cannot reach r² = 1, and estimates for very rare variants have high sampling variance.
- Present MAF-stratified r², or apply a stated MAF floor (for example, ≥1–5%).
- Include a short sensitivity analysis if rare variants feature in conclusions.
Raising the MAF threshold inflates mean r² and extends LD half-decay distances. (Otyama P.I. et al. (2019) BMC Genomics).
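The frequency dependence can be made explicit. For coded-allele frequencies p₁ ≤ p₂, the largest attainable r² is p₁(1 − p₂) / (p₂(1 − p₁)); when both are minor-allele frequencies this is the overall maximum. A small helper (the name `r2_max` is ours) makes the ceiling easy to quote in Methods:

```python
def r2_max(p_a, p_b):
    """Largest r² attainable between two biallelic SNPs.

    p_a, p_b: coded-allele frequencies, alleles paired in coupling phase.
    When both are minor-allele frequencies, this is the overall maximum.
    """
    for p in (p_a, p_b):
        if not 0 < p < 1:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    lo, hi = sorted((p_a, p_b))
    return (lo * (1 - hi)) / (hi * (1 - lo))
```

For example, `r2_max(0.05, 0.2)` is about 0.21, so a SNP at 5% MAF can never tag a SNP at 20% MAF at r² ≥ 0.8, however tightly linked they are.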
Choosing windows and steps
- Read your decay curve first. Choose a window that extends beyond the distance where r² crosses the main cutoff.
- Use a step small enough to preserve local structure but not so small that noise dominates.
- State both numbers and link them to marker density and computational limits.
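Linking the window to marker density and computational limits is simple arithmetic. The sketch below is hypothetical: the 1.5× margin and rounding to 50 kb are illustrative choices, not a published rule, and the pair count assumes evenly spaced SNPs.

```python
import math


def window_plan(crossing_kb, snps_per_kb, margin=1.5, round_to_kb=50):
    """Suggest a window beyond the decay crossing point and estimate its cost.

    margin and round_to_kb are illustrative assumptions.
    Returns (window_kb, expected_snps_per_window, expected_pairs_per_window).
    """
    window_kb = math.ceil(crossing_kb * margin / round_to_kb) * round_to_kb
    snps = round(window_kb * snps_per_kb)
    pairs = snps * (snps - 1) // 2  # all pairwise r² values within one window
    return window_kb, snps, pairs
```

With a crossing at 180 kb and one SNP per 2 kb, `window_plan(180, 0.5)` suggests a 300 kb window holding about 150 SNPs and 11,175 pairs, numbers that can go straight into the justification sentence.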
Concise answers for scanners
- Acceptable r² thresholds: 0.1–0.5 for pruning; ≥0.8 for tagging, when justified by decay and study aims.
- How to justify: reference the decay crossing point, your SNP density, and the downstream trade-off you prioritise.
For commands and flags, see Running LD the Right Way. For curve interpretation, see LD Decay & Haplotype Blocks.
Quality Control for LD: Show Your Work
Robust QC turns attractive plots into evidence. Make QC visible and specific.
QC checklist
- Sample filters: call-rate threshold, sex checks if relevant, removal of duplicates and close relatives.
- Variant filters: MAF floor, per-variant call-rate cutoff, HWE test and handling of outliers.
- Build and alignment: confirm genome build; document liftover steps if used; verify allele and strand alignment.
- Imputation: reference panel, software and version, quality score and cutoff.
- Long-range LD and inversions: list known regions and your treatment.
Threshold table
Provide a compact table summarising all thresholds and pass/fail counts. Reviewers can then cite the table directly in their reports.
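A threshold table with pass/fail counts can be generated from the same filters you apply. This sketch applies variant filters sequentially and uses a 1-df chi-square HWE test for brevity; exact HWE tests are generally preferred when genotype counts are small or MAF is low. The default thresholds and the variant dictionary layout are illustrative assumptions.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Hardy–Weinberg chi-square test (1 df) from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic: no test possible
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def threshold_table(variants, min_call=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Sequential variant filters; rows of (filter, threshold, removed, remaining).

    variants: dicts with "call_rate", "maf" and "counts" (n_aa, n_ab, n_bb).
    """
    filters = [
        ("call rate", f">= {min_call}", lambda v: v["call_rate"] >= min_call),
        ("MAF", f">= {min_maf}", lambda v: v["maf"] >= min_maf),
        ("HWE p", f">= {min_hwe_p:g}", lambda v: hwe_chi2_p(*v["counts"]) >= min_hwe_p),
    ]
    rows = []
    remaining = list(variants)
    for name, threshold, keep in filters:
        kept = [v for v in remaining if keep(v)]
        rows.append((name, threshold, len(remaining) - len(kept), len(kept)))
        remaining = kept
    return rows
```

Because filters run in order, the "removed" column depends on that order; stating it in the table avoids a common reviewer query.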
Common pitfalls and one-line fixes
- Allele flips: harmonise strands against the reference; validate on a sentinel SNP set.
- Mixed builds: liftover all datasets to a single build; re-index and re-check.
- Phasing ambiguity: declare whether r² was computed from unphased genotypes; state the phasing used for block calls.
- Rare-variant inflation: show MAF-stratified r² or analyses excluding very rare variants.
- Hidden long-range LD: add larger windows to the scan; report any excluded regions.
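Allele-flip checks are easy to automate for single-base SNPs. This sketch classifies how a variant's alleles in a second dataset relate to the reference; the category names are ours, and indels are out of scope. Strand-ambiguous A/T and C/G SNPs cannot be resolved from alleles alone and are commonly excluded or checked separately (for example, by comparing allele frequencies).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonise(ref_alleles, other_alleles):
    """Relate a SNP's alleles in another dataset to the reference alleles.

    Both arguments are (allele1, allele2) tuples of single bases.
    Returns "match", "swap", "flip", "flip_swap", "ambiguous" or "mismatch".
    """
    a1, a2 = ref_alleles
    b1, b2 = other_alleles
    if any(x not in COMPLEMENT for x in (a1, a2, b1, b2)):
        raise ValueError("only single-base A/C/G/T alleles are handled")
    if {a1, a2} in ({"A", "T"}, {"C", "G"}):
        return "ambiguous"  # strand cannot be resolved from alleles alone
    c1, c2 = COMPLEMENT[b1], COMPLEMENT[b2]
    if (b1, b2) == (a1, a2):
        return "match"
    if (b1, b2) == (a2, a1):
        return "swap"  # same strand, coded allele reversed: dosage becomes 2 - x
    if (c1, c2) == (a1, a2):
        return "flip"  # opposite strand: complement the alleles
    if (c1, c2) == (a2, a1):
        return "flip_swap"  # complement, then reverse the coded allele
    return "mismatch"  # alleles incompatible with the reference: drop or inspect
```

Tallying these categories over a sentinel SNP set gives a quick, reportable check that harmonisation worked.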
Make reproduction painless
- Archive commands, parameter files, and seeds.
- Provide a small "toy region" with input and expected outputs.
- Add a short README with steps a reviewer can run in minutes.
Peer-Review FAQ: LD Heatmaps, r² Cutoffs, and QC Evidence
What must a publication-ready LD heatmap include?
A triangular matrix with a fixed 0–1 colour scale, genomic coordinates, and the metric (r² or D′). Include bin size and post-QC sample count. Keep the legend clear and consistent across panels.
When should I report r² versus D′?
Use r² for predictability, pruning, and tag SNP evidence. Use D′ for recombination history and haplotype block structure. If you discuss both tagging and blocks, report both metrics.
What r² thresholds are acceptable for pruning and tagging?
Pruning commonly uses r² between 0.1 and 0.5. Tagging typically targets r² ≥ 0.8. Reviewers expect a short justification tied to decay behaviour and marker density.
How should I choose LD window and step sizes?
Base them on your decay curve. The window should exceed the distance where r² crosses your cutoff. Choose a step that preserves local structure without exaggerating noise.
Do I need phasing for LD analyses and block calling?
Pairwise r² can be computed from unphased genotypes. Many block algorithms assume phased haplotypes. Declare phasing status and tool assumptions in Methods.
How many samples are needed for stable r² estimates?
Stability improves with larger sample sizes and with common variants. Very rare variants increase variance; consider MAF-stratified r² or a stated MAF floor.
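One way to see the sample-size effect is to simulate unlinked SNPs: any r² above zero is pure sampling noise. For a 2×2 haplotype table, n·r² equals the Pearson chi-square statistic, so the background r² is roughly 1/n, where n here counts haplotypes (with unphased diploid data the effective count differs). The frequencies, replicate count, and function name below are illustrative.

```python
import random


def null_r2(n_haplotypes, freq_a=0.3, freq_b=0.3, reps=2000, seed=1):
    """Mean r² between two independently simulated SNPs (no true LD).

    The result is the background r² produced by sampling alone.
    """
    rng = random.Random(seed)
    total = 0.0
    used = 0
    for _ in range(reps):
        a = [1 if rng.random() < freq_a else 0 for _ in range(n_haplotypes)]
        b = [1 if rng.random() < freq_b else 0 for _ in range(n_haplotypes)]
        pa = sum(a) / n_haplotypes
        pb = sum(b) / n_haplotypes
        denom = pa * (1 - pa) * pb * (1 - pb)
        if denom == 0:
            continue  # skip replicates where a SNP came out monomorphic
        pab = sum(x & y for x, y in zip(a, b)) / n_haplotypes
        total += (pab - pa * pb) ** 2 / denom
        used += 1
    return total / used
```

With 50 haplotypes the background is about 0.02; with 1,000 it falls to about 0.001. Reading the far tail of a decay curve against this floor helps avoid over-interpreting residual LD in small cohorts.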
How do I visualise LD decay and report the crossing point?
Bin SNP pairs by distance, plot mean or median r² with 95% intervals, and annotate the distance where r² falls below your cutoff. If relevant, compare curves by chromosome or population.
How should I handle long-range LD or inversions?
Flag known regions, run sensitivity analyses with and without them, and document the decision in Methods and captions. Mention any regions excluded from pruning or tagging.
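A with/without sensitivity run starts by splitting variants on the flagged regions. A minimal sketch, assuming inclusive coordinates in the same genome build as the variants; the region list you pass in is your own, and the one in the usage line is hypothetical.

```python
def exclude_regions(variants, regions):
    """Split variants into kept and excluded by flagged LD regions.

    variants: list of (chrom, pos) tuples.
    regions: list of (chrom, start, end), inclusive, same build as variants.
    Returns (kept, excluded).
    """
    kept, excluded = [], []
    for chrom, pos in variants:
        hit = any(c == chrom and start <= pos <= end for c, start, end in regions)
        (excluded if hit else kept).append((chrom, pos))
    return kept, excluded
```

For instance, `exclude_regions(variants, [("6", 1000, 2000)])` uses a made-up region; rerunning pruning or tagging on `kept` and comparing with the full set gives the sensitivity analysis to report, along with the count of excluded variants.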
Which QC items must be shown for LD studies?
MAF and missingness filters, HWE checks, relatedness removal, imputation quality (if used), and post-QC sample counts. Present a compact threshold table and pass/fail counts.
What figure resolution and fonts do journals expect?
Prefer vector formats for line art. For raster heatmaps, provide ≥300 dpi. Use fonts large enough to remain readable in print proofs (at least 8–9 pt).
How do I report haplotype blocks and versions?
Name the algorithm, version, and parameters (for example, confidence intervals for D′). Provide block size distributions and an example region with gene context.
Can I compare LD across populations, and what needs harmonising?
Yes, after harmonising genome builds, variant sets, and allele strands. Plot curves side by side and discuss demographic context, not just numeric differences.
What belongs in Supplementary to reduce reviewer queries?
Command logs, parameter files, random seeds, input manifests, and a toy example with expected outputs. A short README saves both authors and reviewers time.
Conclusion: Make LD Results Easy to Trust
Clear LD heatmaps, justified r² thresholds, and visible LD quality control move manuscripts from "unclear" to "accepted." Report the metric, the context, and the rationale. Then back every claim with reproducible commands and well-labelled figures. Your audience should grasp the story quickly, and reviewers should be able to re-run essentials without guesswork.
If you need publication-ready deliverables, CD Genomics provides reviewer-ready LD reports: heatmaps, decay curves, block calls, parameter logs, and a submission checklist. All services are for research use only; we do not provide clinical or individual testing.
References
- Andrade, A.C.B., Viana, J.M.S., Pereira, H.D. et al. Linkage disequilibrium and haplotype block patterns in popcorn populations. PLOS ONE 14, e0219417 (2019).
- Otyama, P.I., Wilkey, A., Kulkarni, R. et al. Evaluation of linkage disequilibrium, population structure, and genetic diversity in the U.S. peanut mini core collection. BMC Genomics 20, 481 (2019).
- He, F., Ding, S., Wang, H. et al. IntAssoPlot: An R package for integrated visualization of genome-wide association study results with gene structure and linkage disequilibrium matrix. Frontiers in Genetics 11, 260 (2020).
- Chang, C.C., Chow, C.C., Tellier, L.C.A.M. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
- Barrett, J.C., Fry, B., Maller, J. et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
- Machiela, M.J., Chanock, S.J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
- The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
* Designed for biological research and industrial applications, not intended for individual clinical or medical purposes.