TL;DR — From Painting to Clusters in Seven Practical Steps
Phase genomes and align to a recombination map; run ChromoPainter v2 in EM mode to estimate global Ne and μ, then "paint" haplotypes and merge outputs with ChromoCombine to build a coancestry matrix. Feed that matrix into fineSTRUCTURE, run MCMC with built-in convergence diagnostics, and interpret the cluster tree against metadata and PCA/ADMIXTURE. When you need admixture dates and source profiles, pass the painted output to GLOBETROTTER or fastGLOBETROTTER.
Simulated data illustrate how haplotype-aware painting refines structure: linked coancestry heatmaps and corresponding PCA separate closely related groups more clearly than unlinked summaries. (Lawson D.J. et al. (2012) PLOS Genetics)
Allele-frequency approaches (PCA or ADMIXTURE) capture broad structure, but they often blur very recent common ancestry. Haplotype methods do better here because long shared segments carry time-rich information. ChromoPainter implements the Li–Stephens copying model, representing each recipient haplotype as a mosaic copied from donor haplotypes; fineSTRUCTURE then clusters samples using the coancestry counts implied by those copying paths. This combination has resolved striking geography-matched clusters in dense sampling projects (for example, the UK's PoBI map), revealing subtle differentiation that frequency methods alone struggled to detect.
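To make the mosaic idea concrete, here is a minimal sketch of a toy Li–Stephens-style copying model, assuming a constant per-site switch probability and a single mismatch probability. The function names and parameter values are illustrative and not part of ChromoPainter, which derives switch rates from the genetic map and Ne and summarizes expected copying under the model rather than one best path.

```python
import numpy as np

def viterbi_copying_path(recipient, donors, switch_prob=0.05, mismatch_prob=0.01):
    """Most likely donor copied at each site under a toy copying model.

    recipient: 1-D sequence of 0/1 alleles (length L).
    donors: 2-D array-like of 0/1 alleles, shape (n_donors, L).
    switch_prob: per-site chance of re-drawing the donor uniformly (assumed constant).
    mismatch_prob: chance the observed allele differs from the copied one.
    """
    recipient = np.asarray(recipient)
    donors = np.asarray(donors)
    n, n_sites = donors.shape
    match = donors == recipient[None, :]
    log_emit = np.where(match, np.log(1 - mismatch_prob), np.log(mismatch_prob))
    # Re-drawing the donor can land on the same donor, hence the extra switch_prob / n.
    log_stay = np.log(1 - switch_prob + switch_prob / n)
    log_jump = np.log(switch_prob / n)

    score = -np.log(n) + log_emit[:, 0]
    back = np.zeros((n_sites, n), dtype=int)
    for s in range(1, n_sites):
        best_prev = int(np.argmax(score))
        stay_score = score + log_stay
        jump_score = score[best_prev] + log_jump
        use_stay = stay_score >= jump_score
        back[s] = np.where(use_stay, np.arange(n), best_prev)
        score = np.where(use_stay, stay_score, jump_score) + log_emit[:, s]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_sites, dtype=int)
    path[-1] = int(np.argmax(score))
    for s in range(n_sites - 1, 0, -1):
        path[s - 1] = back[s, path[s]]
    return path

def chunk_counts(path, n_donors):
    """Count maximal runs ("chunks") copied from each donor along one path."""
    counts = np.zeros(n_donors, dtype=int)
    counts[path[0]] += 1
    for prev, cur in zip(path[:-1], path[1:]):
        if cur != prev:
            counts[cur] += 1
    return counts

donors = [[0] * 8, [1] * 8]
recipient = [0, 0, 0, 0, 1, 1, 1, 1]
path = viterbi_copying_path(recipient, donors)
print(path.tolist())                   # [0, 0, 0, 0, 1, 1, 1, 1]
print(chunk_counts(path, 2).tolist())  # [1, 1]
```

Each row of a coancestry matrix is, loosely, the per-donor tally that `chunk_counts` returns, accumulated over a recipient's whole genome.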
Plain-English takeaway: if your question involves recent splits, cryptic isolates, or fine-scale stratification relevant to downstream association or demography, haplotype painting + fineSTRUCTURE gives you a sharper lens than SNP frequencies alone.
Why this matters: the copying model assumes well-phased haplotypes and realistic recombination distances; garbage in, noisy copying out.
Run ChromoPainter v2 in EM mode on a representative subset to infer Ne (the effective population size parameter, which scales the recombination or "switch" rate) and μ (the per-site mutation, or miscopying, probability). Record these values for full runs and document exactly which chromosomes or windows you used for EM; consistency makes runs reproducible and speeds reviewer checks.
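If EM is run separately on several chromosomes, the per-chromosome estimates have to be collapsed into one global pair of values. A minimal sketch follows, assuming you weight each chromosome by its SNP count (a common convention, but an assumption here; check the manual for the weighting your version recommends). The dictionary keys and numbers are hypothetical.

```python
def combine_em_estimates(per_chrom):
    """SNP-count-weighted average of per-chromosome EM estimates.

    per_chrom: list of dicts with keys 'n_snps', 'ne', 'mu' (hypothetical layout).
    Returns (global_ne, global_mu).
    """
    total_snps = sum(c["n_snps"] for c in per_chrom)
    if total_snps <= 0:
        raise ValueError("need at least one chromosome with SNPs")
    ne = sum(c["ne"] * c["n_snps"] for c in per_chrom) / total_snps
    mu = sum(c["mu"] * c["n_snps"] for c in per_chrom) / total_snps
    return ne, mu

# Illustrative values only, not real estimates.
estimates = [
    {"chrom": "A", "n_snps": 1000, "ne": 200.0, "mu": 0.001},
    {"chrom": "B", "n_snps": 3000, "ne": 100.0, "mu": 0.002},
]
global_ne, global_mu = combine_em_estimates(estimates)
print(global_ne, global_mu)
```

Storing the per-chromosome inputs alongside the combined values makes it easy to report exactly which subset produced the global parameters.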
Pragmatic tips
Aggregate the copying counts (or chunk lengths) to create a dense coancestry matrix where rows are recipients and columns are donors. Inspect the diagonal (expected to be zero when individuals are not allowed to copy from themselves) and the row/column sums to catch I/O mishaps; visualize a pilot heatmap to confirm expected blocks (e.g., by geography or known pedigree). This matrix is the single input fineSTRUCTURE needs to infer clusters.
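The aggregation and sanity checks can be sketched as below. This assumes the per-chromosome matrices are already loaded as square arrays with identical individual ordering, that painting was all-versus-all (so self-copying is zero), and that row sums should be roughly similar across recipients; the 5% tolerance and function names are arbitrary illustrations.

```python
import numpy as np

def build_coancestry(per_chrom_matrices):
    """Sum per-chromosome recipient-by-donor matrices into one genome-wide matrix."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in per_chrom_matrices])
    return stacked.sum(axis=0)

def sanity_report(coancestry, rel_tol=0.05):
    """Basic checks for I/O mishaps; rel_tol is an arbitrary illustrative threshold."""
    m = np.asarray(coancestry, dtype=float)
    row_sums = m.sum(axis=1)
    median = np.median(row_sums)
    outliers = np.flatnonzero(np.abs(row_sums - median) > rel_tol * median)
    return {
        "square": m.shape[0] == m.shape[1],
        "non_negative": bool((m >= 0).all()),
        "zero_diagonal": bool(np.allclose(np.diag(m), 0.0)),
        "outlier_rows": [int(i) for i in outliers],
    }

chr1 = [[0, 2, 1], [2, 0, 1], [1, 2, 0]]
chr2 = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
genome_wide = build_coancestry([chr1, chr2])
print(genome_wide.tolist())  # [[0.0, 3.0, 3.0], [3.0, 0.0, 3.0], [3.0, 3.0, 0.0]]
print(sanity_report(genome_wide))
```

A non-empty `outlier_rows` list does not prove an error, but it is a cheap prompt to check for a missing chromosome or a mis-ordered sample file before clustering.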
From painted haplotypes to clusters: fineSTRUCTURE's coincidence matrix and MAP tree derived from the coancestry matrix, with performance gains over frequency-only models in separating subtle splits. (Lawson D.J. et al. (2012) PLOS Genetics)
European coancestry heatmaps reveal haplotype-sharing structure; juxtaposed ADMIXTURE barplots show how frequency-based assignments align yet can miss fine-scale patterns captured by painting. (Lawson D.J. et al. (2012) PLOS Genetics)
Sampling of DNA segment pairs in fastGLOBETROTTER relative to GLOBETROTTER. (Wangkumhang P. et al. (2022) Genome Research)
1) Painting Bias from Donor Choice
If donors lack coverage of plausible ancestry sources, copying becomes biased. Broaden donors or restructure donor panels before re-painting; your Methods should state donor selection logic.
2) Over-Clustering from Aggressive Chunking
Excessively small chunks or inconsistent chunk boundaries can inflate spurious splits. Use recommended chunk sizes, then test stability by re-running a subset with coarser boundaries. The manual's computational considerations provide guidance.
3) Misinterpreting fineSTRUCTURE Trees
A tree summarizes coancestry patterns, not necessarily a strict population phylogeny. Anchor your narrative in geography and external evidence (e.g., historical records or reference panels). Dense regional sampling (as in PoBI) yields geography-coherent clusters while still accommodating admixture signals.
4) Under-Powered EM Estimation
Running the EM step on a narrow subset can mis-estimate Ne/μ. Re-estimate on a stratified subset and confirm stability across seeds; document chosen values and justification.
5) Under-Running the MCMC
If diagnostics flag insufficient mixing, extend the chain and revisit thinning/burn-in settings. Cite convergence checks in Methods to preempt reviewer questions.
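As a generic illustration of what a mixing check measures, the sketch below computes a Gelman-Rubin potential scale reduction factor on a scalar trace (for example, a per-iteration log-posterior you have extracted yourself) from two or more post-burn-in chains of equal length. It is a standard textbook diagnostic written out by hand, not a reproduction of fineSTRUCTURE's built-in output.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Gelman-Rubin statistic for equal-length scalar chains (rows = chains)."""
    x = np.asarray(chains, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2 or x.shape[1] < 2:
        raise ValueError("need at least 2 chains with at least 2 samples each")
    n = x.shape[1]
    within = x.var(axis=1, ddof=1).mean()        # mean within-chain variance
    between = n * x.mean(axis=1).var(ddof=1)     # between-chain variance
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return float(np.sqrt(pooled / within))

# Synthetic traces: two chains exploring the same region, then one shifted away.
base = [0.0, 1.0] * 50
shifted = [v + 10.0 for v in base]
print(potential_scale_reduction([base, base]))     # close to 1
print(potential_scale_reduction([base, shifted]))  # far above 1
```

Values near 1 suggest the chains agree on that summary; values well above 1 are a signal to extend the run or revisit burn-in.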
Methods (boilerplate to adapt):
"We phased autosomes and aligned variants to a standard recombination map. Using ChromoPainter v2, we estimated global Ne and μ by EM on a representative subset and then painted all samples in parallel across chromosome chunks, merging outputs with ChromoCombine. We aggregated chunk counts into a genome-wide coancestry matrix and clustered with fineSTRUCTURE, assessing MCMC convergence with the provided diagnostics and comparing independent chains. We validated key splits against PCA/ADMIXTURE and, where relevant, dated admixture events with GLOBETROTTER/fastGLOBETROTTER (bootstrapping and null-individual settings documented)."
Figure set for the report:
Yes, phased data and a genetic map are both expected. ChromoPainter v2 takes phased SNPs and a genetic map as input, and the copying model assumes realistic recombination distances. Poor phasing or missing maps lead to noisy copying paths and unstable coancestry matrices. Use the EM step to estimate Ne/μ before large runs.
With chunking and parallel painting, thousands of samples are feasible. Split by chromosome or chunk, run jobs as arrays, merge the outputs with ChromoCombine, then cluster with fineSTRUCTURE. The manuals outline HPC staging and practical parameter ranges.
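One way to plan an array job is to enumerate (chromosome, first recipient, last recipient) tasks up front and map each array index to one task. The sketch below shows only that bookkeeping; the function name and 1-based inclusive index convention are assumptions, and how each task is translated into a painting command depends on your ChromoPainter version and scheduler.

```python
def painting_tasks(n_individuals, chromosomes, inds_per_job):
    """List (chromosome, first, last) tasks with 1-based inclusive recipient ranges."""
    if n_individuals <= 0 or inds_per_job <= 0:
        raise ValueError("n_individuals and inds_per_job must be positive")
    tasks = []
    for chrom in chromosomes:
        for first in range(1, n_individuals + 1, inds_per_job):
            last = min(first + inds_per_job - 1, n_individuals)
            tasks.append((chrom, first, last))
    return tasks

tasks = painting_tasks(n_individuals=5, chromosomes=[21, 22], inds_per_job=2)
print(len(tasks))  # 6
print(tasks[2])    # (21, 5, 5)
```

Writing the task list to a file and logging which indices finished makes it straightforward to resubmit only the failed pieces before merging.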
Use the convergence diagnostics introduced in newer versions of fineSTRUCTURE and compare independent chains. If key splits vary between chains, extend the chain, revisit thinning, or tweak chunking. Report diagnostics in your Methods to satisfy reviewers.
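A simple way to quantify whether two independent chains agree on key splits is to compare their final cluster assignments pair by pair. The sketch below computes the fraction of sample pairs on which two labelings agree (the Rand index); it assumes you have already extracted one cluster label per sample from each run, in the same sample order, and the labels shown are made up.

```python
from itertools import combinations

def pairwise_agreement(labels_a, labels_b):
    """Fraction of sample pairs that both labelings place together or both place apart."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

chain1 = ["c1", "c1", "c2", "c2"]
chain2 = ["x", "x", "y", "z"]  # second chain splits the last cluster
print(pairwise_agreement(chain1, chain1))  # 1.0
print(pairwise_agreement(chain1, chain2))  # 0.8333333333333334
```

Cluster names need not match between runs, since only co-membership is compared. Low agreement concentrated in a few samples points to the specific splits worth re-examining.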
Add GLOBETROTTER or fastGLOBETROTTER once you trust the clusters and coancestry patterns. These tools use painted haplotypes to identify and date admixture over roughly the last 4,500 years and to reconstruct plausible source profiles; fastGLOBETROTTER is much faster with similar accuracy.
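The dating logic rests on a decay curve: after a single admixture pulse, the excess co-occurrence of ancestry segments falls off roughly exponentially with genetic distance, at a rate equal to the number of generations since admixture when distance is measured in Morgans. The sketch below fits that rate to a synthetic, noise-free curve by log-linear regression; it assumes a known baseline, a positive amplitude, and a 28-year generation time, all of which are illustrative choices, and it is not the fitting procedure GLOBETROTTER uses.

```python
import numpy as np

def fit_admixture_generations(dist_cm, curve, baseline=1.0):
    """Estimate generations since admixture from curve = baseline + A * exp(-g * d)."""
    d_morgans = np.asarray(dist_cm, dtype=float) / 100.0
    signal = np.asarray(curve, dtype=float) - baseline
    if np.any(signal <= 0):
        raise ValueError("sketch assumes the curve stays above the baseline")
    slope, _intercept = np.polyfit(d_morgans, np.log(signal), 1)
    return float(-slope)

def generations_to_years(generations, years_per_generation=28.0):
    """Convert generations to years; the generation time is an assumption."""
    return generations * years_per_generation

# Synthetic curve generated with g = 50 generations.
dist_cm = np.linspace(1.0, 30.0, 30)
curve = 1.0 + 0.3 * np.exp(-50.0 * dist_cm / 100.0)
g_hat = fit_admixture_generations(dist_cm, curve)
print(round(g_hat, 6), round(generations_to_years(g_hat), 3))
```

Under the same 28-year assumption, 4,500 years corresponds to roughly 160 generations, by which point the curve has decayed over very short genetic distances and becomes hard to resolve.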
Haplotype painting and frequency-based methods are complementary. PCA and ADMIXTURE summarize frequency-level structure; ChromoPainter and fineSTRUCTURE focus on haplotype-level sharing that captures more recent ancestry. Robust practice is to cross-validate signals across methods.
When your project needs fine-scale structure, recent relatedness, or historical admixture timing, ChromoPainter + fineSTRUCTURE provides the right level of resolution—and the GLOBETROTTER suite extends it to timelines and sources. The winning playbook is simple and reproducible: phase and map, EM for Ne/μ, paint and merge, build the coancestry matrix, cluster with convergence checks, then interpret and (optionally) date.
If you want a turnkey path from raw genotypes to a reviewer-ready structure report, start a Population Structure Analysis project: we'll scope the donor panel, EM subset, chunking and HPC budget, and deliver the full pack—coancestry heatmaps, fineSTRUCTURE trees, stability diagnostics, and (when needed) GLOBETROTTER/fastGLOBETROTTER dating—designed to integrate cleanly with your PCA QC and ADMIXTURE pipelines.
References