Selective Sweep Analysis Workflow: How to Detect Recent Selection With iHS, XP-EHH, and FST

Selective sweep scans are easy to run and hard to defend.
If you're committed to a recent/positive selection scan, your real risk is getting peaks that collapse under confounders like structure, demography, recombination heterogeneity, or technical imbalance.
This article gives you a decision-first, reviewer-ready workflow that treats iHS, XP-EHH, and FST as pieces of one project design. It's a selective sweep analysis workflow you can defend: decide whether your dataset can support a scan, pick the best first statistic, control false positives, and report results in an auditable way.
Key takeaways
- A defensible sweep scan starts with cohort definition and confounder control, not software.
- Choose iHS vs XP-EHH vs FST based on population setup and expected sweep stage, then call candidate regions conservatively (clusters and windows, not single-SNP extremes).
- Report standardization, windowing, and region-merging rules explicitly, and treat signals as hypotheses that require cross-evidence support.
Why selective sweep analysis still matters in population genomics
Selective sweep scans remain useful when your question is local adaptation, domestication, breed differentiation, or population-specific responses to environmental or management pressure. Recent positive selection can leave correlated patterns in haplotypes (long-range homozygosity), allele frequencies (skews relative to background), and between-population differentiation.
What sweeps do not give you is a finished functional conclusion. A selection signal is not a causal gene, and it is not a validated trait mechanism. In good projects, sweep scans narrow the search space and guide follow-up work.
If you want a baseline refresher for how local sweeps interact with diversity summaries, CD Genomics' page on genetic diversity analysis provides helpful context for interpreting "reduced diversity" without turning it into a single-cause story.
Selective sweep analysis workflow: decide before you compute
A sweep scan becomes defensible when you treat it as a workflow with explicit gates. In practice, you want to answer four questions:
- Is your cohort definition stable under structure checks?
- Does your genotype matrix support haplotype interpretation, or should claims stay region-level and conservative?
- Which statistic should you run first given your contrast (one population vs two, incomplete vs near-fixed)?
- How will you call peaks, merge candidate regions, and report sensitivity checks so results are reproducible?

Which statistic should you use first?
The best first statistic depends on whether selection is within one population or between populations, whether the sweep is incomplete or near fixation, and whether phased haplotypes are trustworthy enough to interpret long-range patterns.
When iHS is the better starting point
iHS is a within-population haplotype statistic. It is typically most informative for recent, incomplete sweeps where the selected allele is not fixed and both alleles are present at usable frequency. In that regime, selection can keep a long haplotype intact around the favored allele before recombination breaks it down.
Because iHS is haplotype-based, phasing and marker density matter. If phasing is weak, iHS can still be computed, but the correct response is not to ignore the issue; it is to tighten your interpretation rules and report sensitivity checks.
For a concise grounding in EHH-derived statistics and what iHS/XP-EHH are measuring, see the review/tutorial "Detecting selection using extended haplotype homozygosity (EHH)" (2022).
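The EHH quantity underlying iHS can be sketched in a few lines: it is the probability that two haplotypes carrying the same core allele remain identical out to a given marker, and iHS contrasts how quickly that probability decays around each allele. A minimal stdlib-only illustration on toy phased haplotypes (not a production implementation):

```python
from itertools import groupby
from math import comb

def ehh(haplotypes, core_idx, edge_idx):
    """Extended haplotype homozygosity over markers core_idx..edge_idx.

    haplotypes: equal-length 0/1 strings (phased), all carrying the same
    core allele at core_idx. Returns the probability that two randomly
    drawn haplotypes are identical across the interval.
    """
    lo, hi = sorted((core_idx, edge_idx))
    segments = sorted(h[lo:hi + 1] for h in haplotypes)
    n = len(haplotypes)
    if n < 2:
        return 0.0
    same = sum(comb(len(list(g)), 2) for _, g in groupby(segments))
    return same / comb(n, 2)

# Toy haplotypes, all carrying allele 1 at the core site (index 2).
haps = ["0011100", "0011100", "0011000", "1011011"]
print(ehh(haps, 2, 2))  # 1.0 at the core itself
print(ehh(haps, 2, 5))  # decays as recombination breaks haplotypes
```

iHS then integrates this decay curve separately for ancestral- and derived-allele carriers and takes a standardized log ratio; the decay behavior above is the raw material both iHS and XP-EHH work from.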
When XP-EHH is more informative
XP-EHH is a cross-population haplotype statistic. It is often more informative when a sweep is closer to completion in one population relative to a reference population, because the contrast is "extended haplotype in population A compared to population B." If your project is framed explicitly as an "XP-EHH selective sweep" scan, the same caution applies: the statistic is only as interpretable as the biological comparison pair and the harmonization of QC and phasing across populations.
XP-EHH does not rescue an unstable comparison. If your reference population has a different demographic history, hidden admixture, or systematically different data quality, XP-EHH can flag differences that are methodological rather than biological.
If you want a practical anchor for common EHH-based scan implementations in pipelines, the selscan paper (MBE, 2014) is a useful reference point.
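The cross-population contrast can be sketched as the log ratio of the areas under each population's EHH decay curve (the iES/iHH-style integral); real implementations such as selscan integrate over genetic distance and standardize genome-wide, but the sign convention is visible in a toy version. The decay curves below are invented for illustration:

```python
from math import log

def integrate_ehh(positions, ehh_values):
    """Trapezoid integral of an EHH decay curve over position."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(zip(positions, ehh_values),
                                  zip(positions[1:], ehh_values[1:])):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area

def xpehh_unstandardized(pos, ehh_pop_a, ehh_pop_b):
    """ln(iES_A / iES_B): positive when haplotypes extend further in A."""
    return log(integrate_ehh(pos, ehh_pop_a) / integrate_ehh(pos, ehh_pop_b))

pos = [0, 10_000, 20_000, 30_000]    # physical positions (bp), toy data
ehh_a = [1.0, 0.9, 0.7, 0.5]          # slow decay: long haplotypes in A
ehh_b = [1.0, 0.5, 0.2, 0.05]         # fast decay in the reference B
score = xpehh_unstandardized(pos, ehh_a, ehh_b)
print(score > 0)  # True: sweep-like haplotype extension in A relative to B
```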
When FST adds value rather than noise
FST measures differentiation. It is valuable when your question is explicitly comparative and you need a frequency-contrast layer—exactly why careful FST outlier interpretation matters. But "high FST" is not a synonym for "positive selection," and FST outliers can be frequent under neutrality when the assumed population model is mismatched.
A widely cited caution is "Pervasive selection or is it…? Why are FST outliers sometimes so frequent?" (2013), which explains why outlier counts can inflate even without pervasive adaptation.
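As a concrete anchor for what the frequency-contrast layer computes, here is one common per-site Hudson-type FST estimator (sample-size-corrected numerator over between-population heterozygosity). The frequencies and sample sizes in the demo are made up:

```python
def hudson_fst(p1, n1, p2, n2):
    """Per-site Hudson-type FST estimator.

    p1, p2: allele frequencies in each population.
    n1, n2: number of sampled allele copies (2 x diploid individuals).
    Estimates can be slightly negative near zero; that is expected noise.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")

# Strong frequency contrast -> high FST; similar frequencies -> near zero.
print(round(hudson_fst(0.95, 100, 0.10, 100), 3))  # 0.839
print(round(hudson_fst(0.50, 100, 0.52, 100), 3))  # -0.009 (noise around zero)
```

Per-site estimates like these are noisy; in practice they are summarized over windows, which is exactly why the windowing and merging rules later in this workflow need to be reported.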
Which statistic should I use for recent selection?
Use this table as a pre-analysis decision record. It is designed to be copied into project notes so your method choice looks like a choice, not a habit.
| Study goal | Population setup | Expected sweep stage | Input requirement | Main strength | Main limitation | Better first choice |
|---|---|---|---|---|---|---|
| Detect recent selection within a single population | One population | Incomplete / ongoing | Reliable haplotypes; adequate marker density | Sensitive to long haplotypes at intermediate allele frequency | Sensitive to phasing error and demography | iHS |
| Detect population-specific sweeps relative to a reference | Two populations (test vs reference) | Near completion in test population | Comparable QC/phasing across populations | Highlights differential haplotype extension | Reference choice can confound interpretation | XP-EHH |
| Prioritize loci with strong between-population differentiation | Two or more populations | Any (not stage-specific) | Stable allele frequencies | Simple differentiation lens; useful as context | High FST ≠ selection; model sensitivity | FST (supporting) |
| Build a conservative candidate-region shortlist | One or two populations | Uncertain / mixed | At least one haplotype stat + frequency context | Convergent evidence reduces over-interpretation | Requires explicit window/merge rules | iHS or XP-EHH first; add FST as context |

Start with data that can support a sweep scan
A practical way to use this selective sweep analysis workflow is to treat "data readiness" as a gating decision: if your genotype matrix cannot support haplotype interpretation, run a more conservative scan plan and downgrade claims accordingly. That is the core of positive selection scan QC in real projects.
Sweep statistics are only as credible as the genotype matrix, haplotype quality, and population definition behind them.
Marker density and missingness determine whether haplotype patterns are interpretable and whether your peaks represent biology or information artifacts.
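One way to make the gate concrete is a small readiness check over per-site missingness and marker spacing; the thresholds below are illustrative placeholders, not recommendations, and real pipelines apply them per chromosome:

```python
def readiness_gate(genotypes, positions, max_missing=0.1, max_gap_bp=50_000):
    """Toy gating check for haplotype-statistic readiness.

    genotypes: per-site lists of calls with None for missing.
    positions: sorted physical positions (bp) for the same sites.
    Flags sites with excess missingness and gaps in marker spacing that
    would make haplotype-length statistics hard to interpret.
    """
    n = len(genotypes[0])
    high_missing = [i for i, site in enumerate(genotypes)
                    if sum(g is None for g in site) / n > max_missing]
    big_gaps = [(a, b) for a, b in zip(positions, positions[1:])
                if b - a > max_gap_bp]
    return {"sites_failing_missingness": high_missing, "large_gaps": big_gaps}

sites = [[0, 1, None, 1], [0, 0, 0, 1], [None, None, 1, 1]]
pos = [1_000, 5_000, 120_000]
report = readiness_gate(sites, pos)
print(report)  # flags sites 0 and 2, plus the 5 kb -> 120 kb gap
```

If a check like this fails over large stretches of the genome, the defensible response is to downgrade haplotype claims to region-level, frequency-based statements rather than to force iHS/XP-EHH through anyway.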
If you want a focused refresher on how LD shapes what "windowing" and "clustering" mean in practice, see linkage disequilibrium analysis.
For study design, whole-genome datasets usually provide the cleanest substrate for haplotype interpretation (see whole genome re-sequencing for population genetics), while reduced-representation datasets can be workable but require more caution about coverage and marker distribution (see reduced-representation sequencing for population genetics).
Build a defensible workflow before you calculate any statistic
A credible sweep workflow starts with QC, population definition, LD-aware preprocessing, and sensitivity checks rather than with the software command itself.
Step 1: QC and cohort definition
Define cohorts in a way that you can reproduce from a sample sheet and a script. Report inclusion/exclusion rules, missingness thresholds, and relatedness handling. If you cannot explain cohort rules clearly, you cannot defend downstream peaks.
Step 2: Structure and relatedness checks
Structure is the gatekeeper for sweep interpretation. Run structure checks before sweep scans, not after, and treat cohort labels as hypotheses you test.
For a reviewer-friendly order of operations (QC → LD pruning → PCA → ancestry modeling → tree-like summaries), see the population structure analysis workflow.
Step 3: Phasing and harmonized inputs
If you phase, report how and ensure the same approach is applied consistently across populations. If you cannot defend phasing reliability, keep your interpretation region-centric and require stronger neighborhood support.
Step 4: Standardization, windowing, and peak calling
A single extreme value is rarely the best evidence. Prefer region-level candidate calls supported by multiple nearby markers and a windowing rule that reflects LD scale.
If you want a citation for the general point that window size smooths noise at the cost of sensitivity and that this tradeoff should be discussed, see "Evaluating the performance of selection scans to detect…" (2007).
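The standardization-plus-windowing logic can be sketched in stdlib Python: standardize raw scores within allele-frequency bins (mimicking the usual iHS normalization), then call a window as a candidate only when a minimum fraction of its markers are outliers, merging nearby candidate windows into regions. Bin counts, window size, and thresholds here are illustrative, which is exactly why they must be reported:

```python
from statistics import mean, stdev

def standardize_by_bin(scores, freqs, n_bins=5):
    """Standardize raw scores within derived-allele-frequency bins so
    scores are comparable across frequencies. Toy sketch; real pipelines
    use far more bins and exclude near-empty bins explicitly."""
    bins = [min(int(f * n_bins), n_bins - 1) for f in freqs]
    out = [0.0] * len(scores)
    for b in range(n_bins):
        idx = [i for i, bb in enumerate(bins) if bb == b]
        if len(idx) < 2:
            continue  # too few markers in this bin to standardize
        vals = [scores[i] for i in idx]
        m, s = mean(vals), stdev(vals)
        for i in idx:
            out[i] = (scores[i] - m) / s if s > 0 else 0.0
    return out

def call_regions(positions, z, win=100_000, frac=0.3, z_cut=2.0,
                 merge_gap=100_000):
    """Region-level peak calling: a window is a candidate only when the
    fraction of markers with |z| >= z_cut reaches `frac`; nearby
    candidate windows are merged into one region."""
    candidates = []
    w, end = min(positions), max(positions)
    while w <= end:
        in_win = [abs(zz) for p, zz in zip(positions, z) if w <= p < w + win]
        if in_win and sum(v >= z_cut for v in in_win) / len(in_win) >= frac:
            candidates.append((w, w + win))
        w += win
    regions = []
    for s, e in candidates:
        if regions and s - regions[-1][1] <= merge_gap:
            regions[-1] = (regions[-1][0], e)  # extend the previous region
        else:
            regions.append((s, e))
    return regions

# A clustered signal (markers 40-49), not a single extreme marker.
pos = list(range(0, 1_000_000, 10_000))
z = [0.1] * len(pos)
for i in range(40, 50):
    z[i] = 3.0
print(call_regions(pos, z))  # [(400000, 500000)]
```

Note that a lone z = 3.0 marker would not be called here: the fraction rule enforces neighborhood support, which is the conservative behavior argued for above.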
Control false positives before you interpret a peak
Demography, structure, recombination heterogeneity, and technical imbalance can all mimic selection unless you test them explicitly.
Structure and demography: the two most common confounders
Structure can inflate sweep-like signals because ancestry differences can create long-range correlations and allele-frequency contrasts that resemble selection. Demographic history can reduce diversity and reshape haplotype patterns genome-wide, changing what "outlier" means.
A useful reference for how demography and recombination context can elevate false positives in sweep detection is "On Detecting Selective Sweeps Using Single Genomes" (2011).
For a compact overview of structure-analysis approaches and when to use them, see population structure analysis tools.
Recombination heterogeneity, long-range LD, and assembly context
Low-recombination regions preserve longer haplotypes under neutrality, which can make haplotype outliers less meaningful. If you lack a recombination map, mark peaks in suspected low-recombination regions as lower confidence unless supported by multiple independent lines of evidence.
Assembly artifacts can produce abnormal LD and reduced diversity patterns. If a small number of scaffolds dominate your top hits, treat that as a signal to inspect genome context before writing a selection story.
Batch effects and missingness imbalance
If sequencing batch is confounded with population labels, both haplotype and frequency statistics can detect batch rather than selection. The simplest defensive move is to show that your top candidates are stable under reasonable QC perturbations and that missingness is not driving allele frequency contrasts.
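A simple defensive check along these lines is leave-one-batch-out stability: drop each batch, recompute the top candidates, and measure overlap with the full-data set. The scoring hook and toy genotype matrix below are hypothetical stand-ins for a real scan:

```python
def top_k(scores, k):
    """Indices of the k largest per-locus scores."""
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])

def batch_stability(score_fn, samples, batches, k=1):
    """Leave-one-batch-out stability check.

    Recomputes the top-k loci after dropping each batch and reports
    Jaccard overlap with the full-data top-k. Overlap near 0 for one
    dropped batch suggests that batch, not biology, drives the peaks.
    score_fn maps a sample subset to per-locus scores (hypothetical hook).
    """
    full = top_k(score_fn(samples), k)
    report = {}
    for b in sorted(set(batches)):
        kept = [s for s, bb in zip(samples, batches) if bb != b]
        sub = top_k(score_fn(kept), k)
        report[b] = len(full & sub) / len(full | sub)
    return report

# Hypothetical per-locus score: column sums of a toy genotype matrix.
score_fn = lambda rows: [sum(col) for col in zip(*rows)]
samples = [[0, 0, 5, 0], [0, 0, 5, 0],   # batch A
           [9, 0, 0, 0], [9, 0, 0, 0]]   # batch B carries an artifact at locus 0
batches = ["A", "A", "B", "B"]
print(batch_stability(score_fn, samples, batches))  # {'A': 1.0, 'B': 0.0}
```

Here the top hit vanishes the moment batch B is removed, which is precisely the pattern that should stop a selection narrative before it starts.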
Before you call it selection, check five things:
- Does the signal align with population structure axes?
- Is it driven by one subcluster or sequencing batch?
- Does it sit in an unusual LD or low-recombination context?
- Is it highly sensitive to window or threshold choices?
- Does it lack cross-statistic support?
If gene flow is a plausible alternative explanation for your contrasts, it should be part of interpretation framing rather than an afterthought. CD Genomics' overview of gene flow analysis provides a useful companion lens for that discussion.
Interpret iHS, XP-EHH, and FST together without overclaiming
The strongest story usually comes from convergent evidence, not from the tallest single peak.
High absolute iHS suggests unusually long haplotypes around one allele within a population, consistent with an incomplete sweep under the right assumptions.
XP-EHH highlights differential haplotype extension between populations and is most interpretable when your comparison pair is biologically justified and harmonized.
FST is a differentiation context layer. It can help you prioritize contrasts, but it needs demographic context and should not be treated as proof of positive selection.
If you need a conceptual refresher on what FST measures and how to interpret it, see "defining, estimating and interpreting FST" (2009).
What good sweep figures and tables look like
If your lab's goal is publication rather than exploration, treat this section as a guide to reporting selection scans: figures and tables should make the decision logic visible and the workflow auditable.
Reviewer-trusted sweep reporting depends on figures that show signal distribution, genomic context, filtering logic, and cross-method consistency.
At minimum, aim for one genome-wide plot per statistic you report, at least one regional zoom-in per top candidate, and a candidate-region table that records coordinates, summary statistics, and caveats so the analysis is easy to re-check.
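A minimal candidate-region table can be emitted as TSV so coordinates, supporting statistics, and caveats travel together in one auditable file; the columns and the single row below are hypothetical placeholders:

```python
import csv
import io

# Hypothetical candidate region; all values are placeholders.
regions = [
    {"chrom": "chr2", "start": 400000, "end": 500000, "n_snps": 42,
     "max_abs_ihs": 4.1, "mean_fst": 0.31,
     "caveat": "suspected low-recombination context; verify against map"},
]

fields = ["chrom", "start", "end", "n_snps",
          "max_abs_ihs", "mean_fst", "caveat"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t")
writer.writeheader()
writer.writerows(regions)
print(buf.getvalue())
```

Keeping caveats as a first-class column means the shortlist carries its own uncertainty forward instead of shedding it at the figure stage.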
A useful high-level reference for categories of genome-wide selection scans is "Genome-wide scans for footprints of natural selection" (2010).
When you discuss thresholds, be explicit about how you controlled false discoveries at genome scale. Multiple testing is an easy reviewer attack surface, and it is often worth stating your rationale even when you use empirical cutoffs; see "Multiple-testing corrections in selection scans…" (2025).

Common failure modes in real projects
Most weak sweep projects fail for predictable reasons: the comparison pair is biologically vague, cohort labels hide structure, FST outliers are treated as selection by default, haplotype statistics are applied to data that cannot support haplotypes, or reporting is too thin to reproduce thresholding and region calls.
The fix is usually not more computation; it is a tighter decision record and a more conservative evidence ladder.
When to use a service instead of building everything in-house
Selective sweep analysis is often worth outsourcing when cohort definition, phasing strategy, cross-method interpretation, and reviewer-ready reporting are the bottlenecks rather than raw computation.
A provider should ask for your study goal, candidate population labels, sequencing/data type and variant calling details, and planned comparison pairs before scoping the work. If those questions aren't asked, the resulting workflow is unlikely to be defensible.
Actionable sweep deliverables include an auditable QC and cohort-definition report, structure/relatedness checks, standardized genome-wide statistics with stated thresholds, region-level candidate calls with merging rules, and a figure/table set aligned to a manuscript.
If you want that workflow packaged as a bioinformatics deliverable, CD Genomics offers a selective sweep analysis service for research use only.
FAQs
Is iHS or XP-EHH better for detecting recent selection?
Neither is universally better; iHS and XP-EHH are optimized for different contrasts. iHS is usually the better first choice when you have one focal population and you expect incomplete sweeps, because it targets within-population haplotype-length imbalance before fixation. XP-EHH is often more informative when your design is explicitly comparative and you suspect a population-specific sweep that is near fixation relative to a reference. If the comparison pair is unstable or phasing reliability is poor, the correct conclusion is not "one statistic failed," but "the project assumptions need to be tightened."
Can FST alone prove positive selection?
FST alone can highlight loci with strong differentiation, and a sweep can produce that pattern, but the statistic does not uniquely imply selection. Drift, structure, spatial sampling, and demographic history can inflate neutral variance and create high-FST outliers without recent adaptation. In practice, FST is most defensible when framed as prioritization and interpreted alongside evidence that a region also has sweep-like haplotype structure, region-level clustering of signals rather than a single-marker extreme, and a biological contrast that makes sense for the populations being compared.
Do I need phased data for a selective sweep scan?
If you want to interpret haplotype-length signals directly, you need reliable haplotypes, which makes phased data highly valuable for iHS and XP-EHH. Without phasing, you can still do frequency-based contrasts and you can still rank candidate regions, but your reporting should become more conservative and region-centric. The practical question is not "phased vs unphased" as a binary; it's whether phasing uncertainty could change your top candidates. If the answer is yes, treat results as hypothesis-generating and emphasize sensitivity checks.
How many samples are enough for a sweep scan?
There isn't a single number because sufficiency depends on the statistic, marker density, and how clean your population definitions are. What matters is stable allele-frequency estimation within each cohort and enough information content to distinguish local outliers from genome-wide background. In many projects, adding more samples to an unstable cohort does less for interpretability than refining population labels, balancing batches, and ensuring comparable data quality across groups. Sample size also interacts with missingness: a large cohort with uneven missingness can be less trustworthy than a smaller, well-controlled one.
Why do haplotype statistics and FST sometimes disagree?
They are sensitive to different signatures and timescales. Haplotype statistics emphasize unusually long haplotypes that persist after recent selection, which is often most visible for ongoing or population-specific sweeps before recombination erodes the pattern. Differentiation measures emphasize allele frequency contrasts between groups, which can reflect selection but also demographic history. Because each statistic has different failure modes, disagreement is often diagnostic: it tells you to revisit cohort definition, comparison choice, and whether a region has neighborhood-level support that survives confounder checks.
What should I report to make a sweep scan reproducible?
Report the cohort definition rules, QC thresholds, and structure/relatedness checks before you report any peaks. State how phasing was handled and whether it was consistent across populations. Make standardization and windowing explicit, including how candidate regions were defined and merged. Show genome-wide plots plus regional zooms for top candidates, and provide a candidate-region table that includes coordinates, supporting statistics, and interpretation caveats. Finally, include at least one sensitivity check demonstrating that top candidates are stable under reasonable changes to QC, thresholds, or window definitions.
