Selective Sweep Analysis Workflow: How to Detect Recent Selection With iHS, XP-EHH, and FST

Selective sweep scans are easy to run and hard to defend.
If you're committed to a recent/positive selection scan, your real risk is getting peaks that collapse under confounders like structure, demography, recombination heterogeneity, or technical imbalance.
This article gives you a decision-first, reviewer-ready workflow that treats iHS, XP-EHH, and FST as pieces of one project design. It's a selective sweep analysis workflow you can defend: decide whether your dataset can support a scan, pick the best first statistic, control false positives, and report results in an auditable way.
Key takeaways
- A defensible sweep scan starts with cohort definition and confounder control, not software.
- Choose iHS vs XP-EHH vs FST based on population setup and expected sweep stage, then call candidate regions conservatively (clusters and windows, not single-SNP extremes).
- Report standardization, windowing, and region-merging rules explicitly, and treat signals as hypotheses that require cross-evidence support.
Why selective sweep analysis still matters in population genomics
Selective sweep scans remain useful when your question is local adaptation, domestication, breed differentiation, or population-specific responses to environmental or management pressure. Recent positive selection can leave correlated patterns in haplotypes (long-range homozygosity), allele frequencies (skews relative to background), and between-population differentiation.
What sweeps do not give you is a finished functional conclusion. A selection signal is not a causal gene, and it is not a validated trait mechanism. In good projects, sweep scans narrow the search space and guide follow-up work.
If you want a baseline refresher for how local sweeps interact with diversity summaries, CD Genomics' page on genetic diversity analysis provides helpful context for interpreting "reduced diversity" without turning it into a single-cause story.
Selective sweep analysis workflow: decide before you compute
A sweep scan becomes defensible when you treat it as a workflow with explicit gates. In practice, you want to answer four questions:
- Is your cohort definition stable under structure checks?
- Does your genotype matrix support haplotype interpretation, or should claims stay region-level and conservative?
- Which statistic should you run first given your contrast (one population vs two, incomplete vs near-fixed)?
- How will you call peaks, merge candidate regions, and report sensitivity checks so results are reproducible?

Which statistic should you use first?
The best first statistic depends on whether selection is within one population or between populations, whether the sweep is incomplete or near fixation, and whether phased haplotypes are trustworthy enough to interpret long-range patterns.
When iHS is the better starting point
iHS is a within-population haplotype statistic. It is typically most informative for recent, incomplete sweeps where the selected allele is not fixed and both alleles are present at usable frequency. In that regime, selection can keep a long haplotype intact around the favored allele before recombination breaks it down.
Because iHS is haplotype-based, phasing and marker density matter. If phasing is weak, iHS can still be computed, but the correct response is not to ignore the issue; it is to tighten your interpretation rules and report sensitivity checks.
For a concise grounding in EHH-derived statistics and what iHS/XP-EHH are measuring, see the review/tutorial "Detecting selection using extended haplotype homozygosity (EHH)" (2022).
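The EHH quantity underlying iHS can be sketched in a few lines: it is the probability that two haplotypes carrying the same core allele remain identical out to a given marker, and iHS contrasts how quickly that probability decays around each allele. A minimal stdlib-only illustration on toy phased haplotypes (not a production implementation):

```python
from itertools import groupby
from math import comb

def ehh(haplotypes, core_idx, edge_idx):
    """Extended haplotype homozygosity over markers core_idx..edge_idx.

    haplotypes: equal-length 0/1 strings (phased), all carrying the same
    core allele at core_idx. Returns the probability that two randomly
    drawn haplotypes are identical across the interval.
    """
    lo, hi = sorted((core_idx, edge_idx))
    segments = sorted(h[lo:hi + 1] for h in haplotypes)
    n = len(haplotypes)
    if n < 2:
        return 0.0
    same = sum(comb(len(list(g)), 2) for _, g in groupby(segments))
    return same / comb(n, 2)

# Toy haplotypes, all carrying allele 1 at the core site (index 2).
haps = ["0011100", "0011100", "0011000", "1011011"]
print(ehh(haps, 2, 2))  # 1.0 at the core itself
print(ehh(haps, 2, 5))  # decays as recombination breaks haplotypes
```

iHS then integrates this decay curve separately for ancestral- and derived-allele carriers and takes a standardized log ratio; the decay behavior above is the raw material both iHS and XP-EHH work from.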
When XP-EHH is more informative
XP-EHH is a cross-population haplotype statistic. It is often more informative when a sweep is closer to completion in one population relative to a reference population, because the contrast is "extended haplotype in population A compared to population B." If your project is framed explicitly as an "XP-EHH selective sweep" scan, the same caution applies: the statistic is only as interpretable as the biological comparison pair and the harmonization of QC and phasing across populations.
XP-EHH does not rescue an unstable comparison. If your reference population has a different demographic history, hidden admixture, or systematically different data quality, XP-EHH can flag differences that are methodological rather than biological.
If you want a practical anchor for common EHH-based scan implementations in pipelines, the selscan paper (MBE, 2014) is a useful reference point.
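The cross-population contrast can be sketched as the log ratio of the areas under each population's EHH decay curve (the iES/iHH-style integral); real implementations such as selscan integrate over genetic distance and standardize genome-wide, but the sign convention is visible in a toy version. The decay curves below are invented for illustration:

```python
from math import log

def integrate_ehh(positions, ehh_values):
    """Trapezoid integral of an EHH decay curve over position."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(zip(positions, ehh_values),
                                  zip(positions[1:], ehh_values[1:])):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area

def xpehh_unstandardized(pos, ehh_pop_a, ehh_pop_b):
    """ln(iES_A / iES_B): positive when haplotypes extend further in A."""
    return log(integrate_ehh(pos, ehh_pop_a) / integrate_ehh(pos, ehh_pop_b))

pos = [0, 10_000, 20_000, 30_000]    # physical positions (bp), toy data
ehh_a = [1.0, 0.9, 0.7, 0.5]          # slow decay: long haplotypes in A
ehh_b = [1.0, 0.5, 0.2, 0.05]         # fast decay in the reference B
score = xpehh_unstandardized(pos, ehh_a, ehh_b)
print(score > 0)  # True: sweep-like haplotype extension in A relative to B
```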
When FST adds value rather than noise
FST measures differentiation. It is valuable when your question is explicitly comparative and you need a frequency-contrast layer—exactly why careful FST outlier interpretation matters. But "high FST" is not a synonym for "positive selection," and FST outliers can be frequent under neutrality when the assumed population model is mismatched.
A widely cited caution is "Pervasive selection or is it…? Why are FST outliers sometimes so frequent?" (2013), which explains why outlier counts can inflate even without pervasive adaptation.
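As a concrete anchor for what the frequency-contrast layer computes, here is one common per-site Hudson-type FST estimator (sample-size-corrected numerator over between-population heterozygosity). The frequencies and sample sizes in the demo are made up:

```python
def hudson_fst(p1, n1, p2, n2):
    """Per-site Hudson-type FST estimator.

    p1, p2: allele frequencies in each population.
    n1, n2: number of sampled allele copies (2 x diploid individuals).
    Estimates can be slightly negative near zero; that is expected noise.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")

# Strong frequency contrast -> high FST; similar frequencies -> near zero.
print(round(hudson_fst(0.95, 100, 0.10, 100), 3))  # 0.839
print(round(hudson_fst(0.50, 100, 0.52, 100), 3))  # -0.009 (noise around zero)
```

Per-site estimates like these are noisy; in practice they are summarized over windows, which is exactly why the windowing and merging rules later in this workflow need to be reported.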
Which statistic should I use for recent selection?
Use this table as a pre-analysis decision record. It is designed to be copied into project notes so your method choice looks like a choice, not a habit.
| Study goal | Population setup | Expected sweep stage | Input requirement | Main strength | Main limitation | Better first choice |
|---|---|---|---|---|---|---|
| Detect recent selection within a single population | One population | Incomplete / ongoing | Reliable haplotypes; adequate marker density | Sensitive to long haplotypes at intermediate allele frequency | Sensitive to phasing error and demography | iHS |
| Detect population-specific sweeps relative to a reference | Two populations (test vs reference) | Near completion in test population | Comparable QC/phasing across populations | Highlights differential haplotype extension | Reference choice can confound interpretation | XP-EHH |
| Prioritize loci with strong between-population differentiation | Two or more populations | Any (not stage-specific) | Stable allele frequencies | Simple differentiation lens; useful as context | High FST ≠ selection; model sensitivity | FST (supporting) |
| Build a conservative candidate-region shortlist | One or two populations | Uncertain / mixed | At least one haplotype stat + frequency context | Convergent evidence reduces over-interpretation | Requires explicit window/merge rules | iHS or XP-EHH first; add FST as context |

Start with data that can support a sweep scan
A practical way to use this selective sweep analysis workflow is to treat "data readiness" as a gating decision: if your genotype matrix cannot support haplotype interpretation, run a more conservative scan plan and downgrade claims accordingly. That is the core of positive selection scan QC in real projects.
Sweep statistics are only as credible as the genotype matrix, haplotype quality, and population definition behind them.
Marker density and missingness determine whether haplotype patterns are interpretable and whether your peaks represent biology or information artifacts.
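One way to make the gate concrete is a small readiness check over per-site missingness and marker spacing; the thresholds below are illustrative placeholders, not recommendations, and real pipelines apply them per chromosome:

```python
def readiness_gate(genotypes, positions, max_missing=0.1, max_gap_bp=50_000):
    """Toy gating check for haplotype-statistic readiness.

    genotypes: per-site lists of calls with None for missing.
    positions: sorted physical positions (bp) for the same sites.
    Flags sites with excess missingness and gaps in marker spacing that
    would make haplotype-length statistics hard to interpret.
    """
    n = len(genotypes[0])
    high_missing = [i for i, site in enumerate(genotypes)
                    if sum(g is None for g in site) / n > max_missing]
    big_gaps = [(a, b) for a, b in zip(positions, positions[1:])
                if b - a > max_gap_bp]
    return {"sites_failing_missingness": high_missing, "large_gaps": big_gaps}

sites = [[0, 1, None, 1], [0, 0, 0, 1], [None, None, 1, 1]]
pos = [1_000, 5_000, 120_000]
report = readiness_gate(sites, pos)
print(report)  # flags sites 0 and 2, plus the 5 kb -> 120 kb gap
```

If a check like this fails over large stretches of the genome, the defensible response is to downgrade haplotype claims to region-level, frequency-based statements rather than to force iHS/XP-EHH through anyway.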
If you want a focused refresher on how LD shapes what "windowing" and "clustering" mean in practice, see linkage disequilibrium analysis.
For study design, whole-genome datasets usually provide the cleanest substrate for haplotype interpretation (see whole genome re-sequencing for population genetics), while reduced-representation datasets can be workable but require more caution about coverage and marker distribution (see reduced-representation sequencing for population genetics).
Build a defensible workflow before you calculate any statistic
A credible sweep workflow starts with QC, population definition, LD-aware preprocessing, and sensitivity checks rather than with the software command itself.
Step 1: QC and cohort definition
Define cohorts in a way that you can reproduce from a sample sheet and a script. Report inclusion/exclusion rules, missingness thresholds, and relatedness handling. If you cannot explain cohort rules clearly, you cannot defend downstream peaks.
Step 2: Structure and relatedness checks
Structure is the gatekeeper for sweep interpretation. Run structure checks before sweep scans, not after, and treat cohort labels as hypotheses you test.
For a reviewer-friendly order of operations (QC → LD pruning → PCA → ancestry modeling → tree-like summaries), see the population structure analysis workflow.
Step 3: Phasing and harmonized inputs
If you phase, report how and ensure the same approach is applied consistently across populations. If you cannot defend phasing reliability, keep your interpretation region-centric and require stronger neighborhood support.
Step 4: Standardization, windowing, and peak calling
A single extreme value is rarely the best evidence. Prefer region-level candidate calls supported by multiple nearby markers and a windowing rule that reflects LD scale.
If you want a citation for the general point that window size smooths noise at the cost of sensitivity and that this tradeoff should be discussed, see "Evaluating the performance of selection scans to detect…" (2007).
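The standardization-plus-windowing logic can be sketched in stdlib Python: standardize raw scores within allele-frequency bins (mimicking the usual iHS normalization), then call a window as a candidate only when a minimum fraction of its markers are outliers, merging nearby candidate windows into regions. Bin counts, window size, and thresholds here are illustrative, which is exactly why they must be reported:

```python
from statistics import mean, stdev

def standardize_by_bin(scores, freqs, n_bins=5):
    """Standardize raw scores within derived-allele-frequency bins so
    scores are comparable across frequencies. Toy sketch; real pipelines
    use far more bins and exclude near-empty bins explicitly."""
    bins = [min(int(f * n_bins), n_bins - 1) for f in freqs]
    out = [0.0] * len(scores)
    for b in range(n_bins):
        idx = [i for i, bb in enumerate(bins) if bb == b]
        if len(idx) < 2:
            continue  # too few markers in this bin to standardize
        vals = [scores[i] for i in idx]
        m, s = mean(vals), stdev(vals)
        for i in idx:
            out[i] = (scores[i] - m) / s if s > 0 else 0.0
    return out

def call_regions(positions, z, win=100_000, frac=0.3, z_cut=2.0,
                 merge_gap=100_000):
    """Region-level peak calling: a window is a candidate only when the
    fraction of markers with |z| >= z_cut reaches `frac`; nearby
    candidate windows are merged into one region."""
    candidates = []
    w, end = min(positions), max(positions)
    while w <= end:
        in_win = [abs(zz) for p, zz in zip(positions, z) if w <= p < w + win]
        if in_win and sum(v >= z_cut for v in in_win) / len(in_win) >= frac:
            candidates.append((w, w + win))
        w += win
    regions = []
    for s, e in candidates:
        if regions and s - regions[-1][1] <= merge_gap:
            regions[-1] = (regions[-1][0], e)  # extend the previous region
        else:
            regions.append((s, e))
    return regions

# A clustered signal (markers 40-49), not a single extreme marker.
pos = list(range(0, 1_000_000, 10_000))
z = [0.1] * len(pos)
for i in range(40, 50):
    z[i] = 3.0
print(call_regions(pos, z))  # [(400000, 500000)]
```

Note that a lone z = 3.0 marker would not be called here: the fraction rule enforces neighborhood support, which is the conservative behavior argued for above.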
Control false positives before you interpret a peak
Demography, structure, recombination heterogeneity, and technical imbalance can all mimic selection unless you test them explicitly.
Structure and demography: the two most common confounders
Structure can inflate sweep-like signals because ancestry differences can create long-range correlations and allele-frequency contrasts that resemble selection. Demographic history can reduce diversity and reshape haplotype patterns genome-wide, changing what "outlier" means.
A useful reference for how demography and recombination context can elevate false positives in sweep detection is "On Detecting Selective Sweeps Using Single Genomes" (2011).
For a compact overview of structure-analysis approaches and when to use them, see population structure analysis tools.
Recombination heterogeneity, long-range LD, and assembly context
Low-recombination regions preserve longer haplotypes under neutrality, which can make haplotype outliers less meaningful. If you lack a recombination map, mark peaks in suspected low-recombination regions as lower confidence unless supported by multiple independent lines of evidence.
Assembly artifacts can produce abnormal LD and reduced diversity patterns. If a small number of scaffolds dominate your top hits, treat that as a signal to inspect genome context before writing a selection story.
Batch effects and missingness imbalance
If sequencing batch is confounded with population labels, both haplotype and frequency statistics can detect batch rather than selection. The simplest defensive move is to show that your top candidates are stable under reasonable QC perturbations and that missingness is not driving allele frequency contrasts.
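A simple defensive check along these lines is leave-one-batch-out stability: drop each batch, recompute the top candidates, and measure overlap with the full-data set. The scoring hook and toy genotype matrix below are hypothetical stand-ins for a real scan:

```python
def top_k(scores, k):
    """Indices of the k largest per-locus scores."""
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])

def batch_stability(score_fn, samples, batches, k=1):
    """Leave-one-batch-out stability check.

    Recomputes the top-k loci after dropping each batch and reports
    Jaccard overlap with the full-data top-k. Overlap near 0 for one
    dropped batch suggests that batch, not biology, drives the peaks.
    score_fn maps a sample subset to per-locus scores (hypothetical hook).
    """
    full = top_k(score_fn(samples), k)
    report = {}
    for b in sorted(set(batches)):
        kept = [s for s, bb in zip(samples, batches) if bb != b]
        sub = top_k(score_fn(kept), k)
        report[b] = len(full & sub) / len(full | sub)
    return report

# Hypothetical per-locus score: column sums of a toy genotype matrix.
score_fn = lambda rows: [sum(col) for col in zip(*rows)]
samples = [[0, 0, 5, 0], [0, 0, 5, 0],   # batch A
           [9, 0, 0, 0], [9, 0, 0, 0]]   # batch B carries an artifact at locus 0
batches = ["A", "A", "B", "B"]
print(batch_stability(score_fn, samples, batches))  # {'A': 1.0, 'B': 0.0}
```

Here the top hit vanishes the moment batch B is removed, which is precisely the pattern that should stop a selection narrative before it starts.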
Before you call it selection, check five things:
- Does the signal align with population structure axes?
- Is it driven by one subcluster or sequencing batch?
- Does it sit in an unusual LD or low-recombination context?
- Is it highly sensitive to window or threshold choices?
- Does it lack cross-statistic support?
If gene flow is a plausible alternative explanation for your contrasts, it should be part of interpretation framing rather than an afterthought. CD Genomics' overview of gene flow analysis provides a useful companion lens for that discussion.
Interpret iHS, XP-EHH, and FST together without overclaiming
The strongest story usually comes from convergent evidence, not from the tallest single peak.
High absolute iHS suggests unusually long haplotypes around one allele within a population, consistent with an incomplete sweep under the right assumptions.
XP-EHH highlights differential haplotype extension between populations and is most interpretable when your comparison pair is biologically justified and harmonized.
FST is a differentiation context layer. It can help you prioritize contrasts, but it needs demographic context and should not be treated as proof of positive selection.
If you need a conceptual refresher on what FST measures and how to interpret it, see "defining, estimating and interpreting FST" (2009).
What good sweep figures and tables look like
If your lab's goal is publication rather than exploration, treat this section as a guide to reporting selection scans: figures and tables should make the decision logic visible and the workflow auditable.
Reviewer-trusted sweep reporting depends on figures that show signal distribution, genomic context, filtering logic, and cross-method consistency.
At minimum, aim for one genome-wide plot per statistic you report, at least one regional zoom-in per top candidate, and a candidate-region table that records coordinates, summary statistics, and caveats so the analysis is easy to re-check.
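A minimal candidate-region table can be emitted as TSV so coordinates, supporting statistics, and caveats travel together in one auditable file; the columns and the single row below are hypothetical placeholders:

```python
import csv
import io

# Hypothetical candidate region; all values are placeholders.
regions = [
    {"chrom": "chr2", "start": 400000, "end": 500000, "n_snps": 42,
     "max_abs_ihs": 4.1, "mean_fst": 0.31,
     "caveat": "suspected low-recombination context; verify against map"},
]

fields = ["chrom", "start", "end", "n_snps",
          "max_abs_ihs", "mean_fst", "caveat"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t")
writer.writeheader()
writer.writerows(regions)
print(buf.getvalue())
```

Keeping caveats as a first-class column means the shortlist carries its own uncertainty forward instead of shedding it at the figure stage.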
A useful high-level reference for categories of genome-wide selection scans is "Genome-wide scans for footprints of natural selection" (2010).
When you discuss thresholds, be explicit about how you controlled false discoveries at genome scale. Multiple testing is an easy reviewer attack surface, and it is often worth stating your rationale even when you use empirical cutoffs; see "Multiple-testing corrections in selection scans…" (2025).

Common failure modes in real projects
Most weak sweep projects fail for predictable reasons: the comparison pair is biologically vague, cohort labels hide structure, FST outliers are treated as selection by default, haplotype statistics are applied to data that cannot support haplotypes, or reporting is too thin to reproduce thresholding and region calls.
The fix is usually not more computation; it is a tighter decision record and a more conservative evidence ladder.
When to use a service instead of building everything in-house
Selective sweep analysis is often worth outsourcing when cohort definition, phasing strategy, cross-method interpretation, and reviewer-ready reporting are the bottlenecks rather than raw computation.
A provider should ask for your study goal, candidate population labels, sequencing/data type and variant calling details, and planned comparison pairs before scoping the work. If those questions aren't asked, the resulting workflow is unlikely to be defensible.
Actionable sweep deliverables include an auditable QC and cohort-definition report, structure/relatedness checks, standardized genome-wide statistics with stated thresholds, region-level candidate calls with merging rules, and a figure/table set aligned to a manuscript.
If you want that workflow packaged as a bioinformatics deliverable, CD Genomics offers a selective sweep analysis service for research use only.
FAQs
Is iHS or XP-EHH better for detecting recent selection?
Neither is universally better; iHS and XP-EHH are optimized for different contrasts. iHS is usually the better first choice when you have one focal population and you expect incomplete sweeps, because it targets within-population haplotype-length imbalance before fixation. XP-EHH is often more informative when your design is explicitly comparative and you suspect a population-specific sweep that is near fixation relative to a reference. If the comparison pair is unstable or phasing reliability is poor, the correct conclusion is not "one statistic failed," but "the project assumptions need to be tightened."
Can FST alone prove positive selection?
FST alone can highlight loci with strong differentiation, and a sweep can produce that pattern, but the statistic does not uniquely imply selection. Drift, structure, spatial sampling, and demographic history can inflate neutral variance and create high-FST outliers without recent adaptation. In practice, FST is most defensible when framed as prioritization and interpreted alongside evidence that a region also has sweep-like haplotype structure, region-level clustering of signals rather than a single-marker extreme, and a biological contrast that makes sense for the populations being compared.
Do I need phased data for a selective sweep scan?
If you want to interpret haplotype-length signals directly, you need reliable haplotypes, which makes phased data highly valuable for iHS and XP-EHH. Without phasing, you can still do frequency-based contrasts and you can still rank candidate regions, but your reporting should become more conservative and region-centric. The practical question is not "phased vs unphased" as a binary; it's whether phasing uncertainty could change your top candidates. If the answer is yes, treat results as hypothesis-generating and emphasize sensitivity checks.
How many samples are enough for a sweep scan?
There isn't a single number because sufficiency depends on the statistic, marker density, and how clean your population definitions are. What matters is stable allele-frequency estimation within each cohort and enough information content to distinguish local outliers from genome-wide background. In many projects, adding more samples to an unstable cohort does less for interpretability than refining population labels, balancing batches, and ensuring comparable data quality across groups. Sample size also interacts with missingness: a large cohort with uneven missingness can be less trustworthy than a smaller, well-controlled one.
Why do haplotype statistics and FST sometimes disagree?
They are sensitive to different signatures and timescales. Haplotype statistics emphasize unusually long haplotypes that persist after recent selection, which is often most visible for ongoing or population-specific sweeps before recombination erodes the pattern. Differentiation measures emphasize allele frequency contrasts between groups, which can reflect selection but also demographic history. Because each statistic has different failure modes, disagreement is often diagnostic: it tells you to revisit cohort definition, comparison choice, and whether a region has neighborhood-level support that survives confounder checks.
What should I report to make a sweep scan reproducible?
Report the cohort definition rules, QC thresholds, and structure/relatedness checks before you report any peaks. State how phasing was handled and whether it was consistent across populations. Make standardization and windowing explicit, including how candidate regions were defined and merged. Show genome-wide plots plus regional zooms for top candidates, and provide a candidate-region table that includes coordinates, supporting statistics, and interpretation caveats. Finally, include at least one sensitivity check demonstrating that top candidates are stable under reasonable changes to QC, thresholds, or window definitions.
