Mutational Dynamics, Polymorphism Information Content, and High-Throughput Genotyping
Microsatellite markers, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), remain one of the most information-dense marker systems in genetics. A single locus can carry many allelic states. That feature gives SSRs unusual resolving power in population-genetic studies, biodiversity research, inheritance-pattern analysis, and marker development for non-model species. In practical terms, SSRs are still valuable because they can deliver high per-locus discriminatory power in appropriately validated research panels.
Their strength, however, comes from the same property that makes them difficult to analyze. SSRs are built from short repeat units. Those repeats are intrinsically unstable during DNA replication and amplification. As a result, SSR workflows are shaped by two parallel realities. First, the underlying biology generates real allelic diversity. Second, the same repetitive architecture also generates analytical artifacts, especially during PCR and fragment analysis.
That is why a useful SSR resource should not stop at the usual checklist of "highly polymorphic, codominant, and widely distributed." Those statements are true, but they do not explain why some loci are clean and powerful while others are noisy, unstable, or hard to interpret. They do not explain why dinucleotide repeats often create stronger stutter than many trinucleotide loci. They do not explain why a marker can show strong polymorphism on paper and still fail in production. They also do not explain why sequence-based SSR profiling has become increasingly attractive.
The most productive way to understand SSRs is to connect three layers of logic. The first layer is mutational mechanism. The second is signal interpretation. The third is platform choice. Once those three layers are linked, the whole field becomes easier to navigate.
The biology of short tandem repeats
An SSR is a stretch of DNA composed of a short motif repeated in tandem. The repeat unit may be one, two, three, or more nucleotides long. A sequence such as AAAAAAAA is a mononucleotide repeat. A tract such as CACACACA is a dinucleotide repeat. A sequence such as CAGCAGCAG is a trinucleotide repeat. These patterns are common in many genomes, but they are not equally stable.
The key reason is simple. Repetitive DNA is structurally easy to misalign. When the replication machinery moves across a repeat tract, nearly identical repeat units sit next to each other like interchangeable tiles. That makes local pairing less secure than in a nonrepetitive region. A brief dissociation event can be followed by imperfect realignment. Once that happens, the locus may gain or lose repeat units.
Replication slippage is the core mutational engine
The central mechanism behind SSR polymorphism is replication slippage. During DNA synthesis, the polymerase copies the repeat tract while the template and newly synthesized strands remain temporarily paired. If one strand slips out of register and then reanneals incorrectly, a looped-out structure can form.
Two main outcomes are possible.
If the newly synthesized strand loops out, the daughter molecule may gain one or more repeat units. This produces repeat expansion.
If the template strand loops out, the daughter molecule may lose one or more repeat units. This produces repeat contraction.
Mismatch-repair systems can sometimes correct these slipped intermediates. But correction is not guaranteed. If the misaligned intermediate escapes repair, the altered repeat count becomes fixed and enters the next generation as a new allele. That is the direct molecular basis of SSR length polymorphism.
This mechanism also explains why SSR mutation rates are so much higher than those of typical SNPs. A point mutation usually requires single-base misincorporation plus repair escape. An SSR length mutation can arise from local structural misalignment within a repetitive tract. In other words, the repeat architecture itself creates a mutational shortcut. That is why SSR loci often show mutation rates commonly cited in the range of roughly 10^-3 to 10^-4 per locus per generation, far above typical single-nucleotide substitution rates.
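This slippage-driven rate advantage can be illustrated with a toy stepwise mutation model. The sketch below is a deliberately simplified haploid Wright-Fisher simulation; the population size, mutation rate, single-unit step, and starting repeat count are illustrative assumptions, not empirical parameters.

```python
import random

def simulate_ssr_locus(generations, pop_size, mu=1e-3, start_repeats=20, seed=1):
    """Toy stepwise-mutation-model simulation of one SSR locus.

    Each generation, every allele copy mutates with probability mu,
    gaining or losing one repeat unit (slippage expansion/contraction).
    Returns the sorted set of distinct allele lengths present at the end.
    """
    rng = random.Random(seed)
    alleles = [start_repeats] * pop_size          # haploid copies, all identical
    for _ in range(generations):
        # Wright-Fisher style resampling, then slippage mutation
        alleles = [rng.choice(alleles) for _ in range(pop_size)]
        alleles = [a + rng.choice((-1, 1)) if rng.random() < mu else a
                   for a in alleles]
    return sorted(set(alleles))

# With a slippage-scale rate, multiple allelic states typically coexist
# after a few hundred generations; at SNP-scale rates they rarely do.
print(simulate_ssr_locus(generations=500, pop_size=200))
```

Running the same simulation with a SNP-scale rate such as mu=1e-8 almost always returns a single allelic state, which is the whole point of the comparison.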
Why hypervariability makes SSRs so informative
High mutation rate does not automatically mean high usefulness. But in SSRs, it often leads to exactly that. Because repeat-copy number can shift up or down over evolutionary time, many allelic states can accumulate at a single locus. This makes SSRs highly informative for differentiating genotypes, estimating diversity, resolving fine-scale population structure, and studying relatedness in research settings.
That is the real source of SSR power. A locus with many possible states can carry far more information than a biallelic marker at the same physical scale. This is why SSR panels often remain competitive when the biological goal is targeted diversity analysis rather than dense genome-wide association.
Still, not all SSRs are equally informative. The phrase "SSRs are highly polymorphic" is true at the category level but incomplete at the locus level. Some loci are rich, stable, and easy to score. Others are only modestly variable. Still others are highly variable but analytically troublesome.
What controls SSR stability and variability
Several features shape how an SSR behaves.
Repeat number is one of the biggest factors. Longer uninterrupted repeat tracts are generally more prone to slippage. More repeated units create more chances for misalignment. That can increase allele diversity, but it can also increase assay difficulty.
Motif length also matters. Mononucleotide repeats are often highly unstable, but they can be difficult to genotype cleanly with fragment-based approaches. Dinucleotide repeats are historically popular and can be very polymorphic, yet they are also well known for generating stronger stutter. Trinucleotide and tetranucleotide repeats are often easier to interpret because their artifact profiles are commonly less severe, though this is not an absolute rule.
Repeat purity is another major factor. Perfect repeats, where every unit is identical, are more likely to slip than interrupted repeats. A single interruption within the tract can change both biological stability and analytical behavior.
Flanking sequence quality matters as much as the repeat itself. If the flanks are unstable, repetitive, or highly variable across populations, primer performance becomes less reliable. That increases the risk of weak amplification, allele dropout, or null alleles.
Mononucleotide, dinucleotide, and trinucleotide repeats are not equivalent
It is tempting to group all SSRs together. In practice, motif class strongly influences both marker performance and interpretation.
Mononucleotide repeats are often the most fragile in polymerase-based workflows. A long homopolymer tract can be biologically variable, but it can also be difficult to score reproducibly because slippage artifacts are common.
Dinucleotide repeats often provide strong polymorphism, which explains their historical popularity. But they also tend to produce prominent stutter peaks. In a capillary electropherogram, that means the analyst may see not just the main allele peak, but also a predictable series of smaller peaks one repeat unit away. The more intense that stutter pattern becomes, the harder it is to tell true alleles from polymerase-generated by-products.
Trinucleotide and tetranucleotide repeats often provide a better balance between polymorphism and interpretability. Their larger repeat increments can make allele spacing easier to read, and their stutter profiles are often more manageable. For fragment-based genotyping, that can be a decisive advantage.
This is why marker selection should never be based on raw variability alone. The real question is not "Which locus is the most polymorphic?" The real question is "Which locus gives enough polymorphism while still remaining stable, readable, and scalable?"
Figure 1. Template- or nascent-strand loop-out during replication can fix repeat expansion or contraction when slipped intermediates escape repair.
What polymorphism information content actually tells you
Polymorphism information content, or PIC, is one of the most widely used metrics in SSR marker evaluation. In simple terms, PIC estimates how informative a marker is for distinguishing genotypes. A locus with many alleles at balanced frequencies tends to have a high PIC value. A locus with only a few alleles, or one overwhelmingly dominant allele, tends to have a lower PIC value.
That makes PIC a useful screening metric. It helps separate nominally polymorphic loci from genuinely informative ones. In marker-development studies, high-PIC loci are often prioritized because they are more likely to contribute useful discriminatory power to a panel.
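For a concrete sense of how allele number and frequency balance drive the metric, the standard PIC formula (Botstein et al. 1980) can be computed directly from allele frequencies. The frequency vectors below are invented for illustration.

```python
def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980).

    freqs: allele frequencies at one locus; must sum to ~1.
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    assert abs(sum(freqs) - 1.0) < 1e-6
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return 1.0 - homozygosity - cross

# Many balanced alleles -> high PIC; one dominant allele -> low PIC.
print(round(pic([0.25, 0.25, 0.25, 0.25]), 3))  # 0.703
print(round(pic([0.90, 0.05, 0.05]), 3))        # 0.177
```

The contrast between the two calls shows why a locus can be nominally multi-allelic yet still carry little discriminatory power.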
But PIC should never be treated as a complete quality metric.
A marker can have a strong PIC value and still perform badly in practice. It may amplify inconsistently. It may generate severe stutter. It may show unstable allele binning across runs. It may carry a recurrent null-allele signal because the primer-binding site is variable. It may even look excellent in the discovery set and then collapse during validation in broader populations.
That is why good panel design requires a triage framework, not just a ranking list. A deployable SSR marker should ideally meet five criteria at the same time:
- High or at least useful PIC
- Clean amplification
- Low recurrent stutter burden
- Stable allele binning across replicates or runs
- No consistent null-allele signal
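One minimal way to operationalize this triage is a per-locus QC record checked against all five criteria at once. The field names and thresholds below are hypothetical; real cutoffs must come from the project's own validation data.

```python
from dataclasses import dataclass

@dataclass
class LocusQC:
    """Per-locus QC summary; fields and units are illustrative assumptions."""
    name: str
    pic: float                 # polymorphism information content
    amplification_rate: float  # fraction of samples with clean product
    stutter_ratio: float       # mean stutter/allele peak-height ratio
    bin_drift_bp: float        # max allele-bin shift across runs (bp)
    null_allele_signal: bool   # recurrent homozygote-excess flag

def deployable(qc, min_pic=0.3, min_amp=0.95,
               max_stutter=0.5, max_drift=0.5):
    """Apply all five triage criteria together; thresholds are assumptions."""
    return (qc.pic >= min_pic
            and qc.amplification_rate >= min_amp
            and qc.stutter_ratio <= max_stutter
            and qc.bin_drift_bp <= max_drift
            and not qc.null_allele_signal)

good = LocusQC("SSR-01", 0.72, 0.98, 0.20, 0.2, False)
bad  = LocusQC("SSR-02", 0.81, 0.99, 0.15, 0.1, True)  # high PIC, null signal
print(deployable(good), deployable(bad))  # True False
```

Note that SSR-02 fails despite the higher PIC value, which is exactly the point: informational potential alone does not make a marker deployable.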
This distinction matters. PIC measures informational potential. It does not measure operational reliability. In real projects, the best panel is not the one with the highest theoretical diversity. It is the one that preserves enough diversity while remaining analytically trustworthy.
That principle becomes even more important in population-genetic studies of non-model organisms. A small or mid-sized panel can perform very well if the loci are clean and robust. By contrast, a larger panel full of unstable or ambiguous markers may add less value than expected. This is why many teams now combine de novo microsatellite marker development workflows with early-stage validation rather than treating discovery and deployment as separate steps.
From mutational biology to laboratory signal
Once the mutational logic of SSRs is clear, the next step becomes easier to understand. The laboratory assay is trying to measure biological length variation in a DNA structure that is also highly prone to polymerase slippage during PCR. In other words, the same repeat architecture that created the biological polymorphism can also generate assay artifacts during amplification.
That is the central tension in SSR genotyping.
A good SSR panel must capture true allelic differences without being overwhelmed by technical by-products. The whole downstream challenge of fragment analysis follows from that single fact.
In most traditional workflows, SSR genotyping begins with locus-specific PCR. The resulting amplicons are then separated by size, historically on gels and more precisely by capillary electrophoresis. Capillary electrophoresis became the dominant platform because it can resolve small fragment-length differences with high precision and moderate throughput. For many marker panels, it remains a practical and effective method.
But high precision is not the same thing as high interpretive certainty. A capillary instrument can measure fragment length very well and still leave the analyst with a difficult biological question: which peak represents a real allele, and which peak is only a by-product of slippage during PCR?
Technical sovereignty: genotyping and data deconvolution
A strong SSR dataset is not created by simply running PCR and reading the tallest peak. It is created by understanding what the peak pattern means. That requires more than instrument access. It requires locus awareness.
This is where technical sovereignty matters. In SSR work, technical sovereignty means understanding how repeat structure, peak spacing, stutter behavior, amplification quality, and primer performance interact at each locus. It means recognizing when a trace is reliable, when it is questionable, and when a marker should be redesigned or retired.
Without that layer of interpretation, SSR data can look cleaner than it really is.
What capillary electrophoresis does well
Capillary electrophoresis separates fluorescently labeled DNA fragments by size as they migrate through a polymer-filled capillary under an electric field. In SSR analysis, this provides three important advantages.
First, it offers finer size resolution than standard gel-based methods.
Second, it supports moderate-throughput workflows and multiplexed panels.
Third, it produces a peak-based output rather than a simple band-presence signal, which gives the analyst much more structure to work with.
In a clean heterozygous sample, the electropherogram may show two dominant peaks separated by the expected repeat-unit interval. In a clean homozygous sample, one dominant peak is expected. Internal size standards and allele-binning rules are then used to convert these signals into genotype calls.
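The binning step can be sketched as rounding each measured size to the nearest repeat-unit step from a reference allele. The locus parameters and fragment sizes below are invented for illustration; a large residual flags a possible off-ladder peak.

```python
def bin_allele(size_bp, ref_size_bp, motif_len):
    """Round a measured fragment size to the nearest repeat-unit bin.

    ref_size_bp: known size of a reference allele at this locus.
    Returns (repeat_offset, binned_size, residual_bp).
    A large residual suggests an off-ladder peak needing manual review.
    """
    offset = (size_bp - ref_size_bp) / motif_len
    repeat_offset = round(offset)
    binned = ref_size_bp + repeat_offset * motif_len
    residual = size_bp - binned
    return repeat_offset, binned, round(residual, 2)

# A hypothetical dinucleotide locus with a 200 bp reference allele:
print(bin_allele(204.1, 200.0, 2))  # (2, 204.0, 0.1)  -> clean bin
print(bin_allele(203.2, 200.0, 2))  # (2, 204.0, -0.8) -> off-ladder, review
```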
When the marker is well chosen and the assay is well optimized, capillary electrophoresis remains fast, cost-effective, and highly useful. This is one reason targeted amplicon sequencing workflows and multiplex locus-focused strategies are often developed alongside classical SSR pipelines rather than replacing them outright.
Why 1-bp resolution does not automatically solve the problem
One of the most common misconceptions in SSR genotyping is that once a platform can resolve fragments at 1-bp resolution, the genotype is effectively known. That is not true.
Fragment resolution and allele certainty are different things.
An SSR allele model is usually based on repeat-unit increments. If the locus is a dinucleotide repeat, true alleles are generally expected to differ in steps of two bases. If the trace shows a small one-base shoulder or an unexpected nearby peak, that does not automatically indicate a biological allele. It may reflect incomplete 3' adenylation (the plus-A artifact), local peak distortion, off-target products, baseline noise, or instrument-level variation.
In other words, the instrument may measure what is physically present with great precision, while the analyst still has to decide what the signal biologically means.
A second limitation is size homoplasy. Two amplicons can share the same fragment length and still differ in internal sequence composition or flanking variation. Capillary electrophoresis cannot see that difference if total size remains unchanged. This is one of the major reasons sequence-based SSR workflows have become more attractive.
The stutter peak problem
Stutter peaks are among the most important analytical complications in SSR work. They arise when DNA polymerase slips during PCR and produces amplicons that are shorter or longer than the main product by one or more repeat units. In most cases, the most prominent stutter peak appears one repeat unit smaller than the main allele peak, but real patterns can be more complicated.
Stutter is not random noise. It is a repeat-architecture-dependent artifact. That makes it predictable to a point, but also hard to ignore.
Loci with long, pure repeat tracts tend to generate stronger stutter. Dinucleotide repeats are especially known for this behavior. Mononucleotide repeats can also be difficult. Trinucleotide and tetranucleotide loci often behave more cleanly, though again, locus context matters.
The key challenge is that a stutter peak can sit exactly where a real minor allele might be expected. In a simple case, the analyst can still separate the major allele from the artifact because the intensity relationship is familiar. In a harder case, especially in heterozygotes with closely spaced alleles, the distinction becomes much less obvious.
This is why serious SSR genotyping does not rely only on generic peak-height thresholds. Good deconvolution uses locus-specific expectations. It asks whether the observed spacing matches the motif. It asks whether the secondary signal fits the normal stutter profile of that locus. It checks whether the pattern is reproducible across replicates. It also asks whether the marker repeatedly generates ambiguous calls across the sample set.
A useful deconvolution framework usually includes:
- expected allele spacing based on motif length
- typical stutter position and relative intensity
- minimum rules for calling a secondary peak as real
- replicate consistency checks
- special scrutiny for loci with recurrent homozygote excess
- retirement criteria for persistently unstable markers
That last point is important. Not every SSR locus deserves to remain in the panel. Some markers are informative but not deployable. A marker that repeatedly creates scoring uncertainty may cost more in analyst time and downstream error than it contributes in information.
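A stripped-down version of such locus-aware deconvolution might look like the following. The single-position stutter model and all thresholds are simplifying assumptions; production pipelines use empirically fitted, locus-specific stutter profiles and replicate evidence.

```python
def call_secondary_peak(main, second, motif_len,
                        stutter_pos=-1, max_stutter_ratio=0.35,
                        min_het_ratio=0.5):
    """Decide whether a secondary peak is a real allele or stutter.

    main, second: (size_bp, height) tuples for the two tallest peaks.
    Assumes one dominant stutter position (one repeat unit below the
    allele) and fixed height-ratio thresholds, which real pipelines
    replace with locus-specific profiles.
    """
    spacing = (second[0] - main[0]) / motif_len
    if abs(spacing - round(spacing)) > 0.25:
        return "off-ladder: review"       # spacing not on the repeat ladder
    units = round(spacing)
    ratio = second[1] / main[1]
    if units == stutter_pos and ratio <= max_stutter_ratio:
        return "stutter"                   # fits the expected artifact profile
    if ratio >= min_het_ratio:
        return "heterozygous allele"
    return "ambiguous: replicate"

# Dinucleotide locus: a small peak one repeat below the main allele.
print(call_secondary_peak((204.0, 4000), (202.0, 1000), 2))  # stutter
print(call_secondary_peak((204.0, 4000), (210.0, 3600), 2))  # heterozygous allele
```

The "ambiguous: replicate" branch is the operational version of the replicate-consistency rule above: when spacing fits but intensity fits neither model, the call is deferred, not forced.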
Assay design can reduce interpretation burden before genotyping begins
The cleanest way to solve a difficult trace is often to prevent it from becoming difficult in the first place.
Upstream assay design has a major effect on downstream deconvolution. Better primer design can reduce off-target amplification. Better locus selection can reduce stutter burden. Better multiplex balancing can reduce weak or overloaded peaks. Better flank selection can reduce the risk of hidden primer-site polymorphism.
This is why targeted microsatellite genotyping workflows should be treated as design problems as much as measurement problems. A panel that is thoughtfully built at the beginning usually produces cleaner electropherograms later. By contrast, a panel optimized only for theoretical polymorphism may generate interpretive debt at every later stage.
The null allele problem
Null alleles are one of the most underestimated problems in SSR genotyping. A null allele is not absent from the genome. It is only absent from the signal. The usual cause is a mutation in the primer-binding region that weakens or prevents amplification of one allelic copy.
The analytical consequence can be severe.
If a heterozygous sample carries one amplifying allele and one null allele, the electropherogram may show only the amplifying product. The sample then appears homozygous, even though it is not. Across a dataset, this creates an excess of apparent homozygotes. In turn, that can distort heterozygosity estimates and generate apparent deviation from Hardy-Weinberg expectations.
This is not a small technical nuisance. It sits at the boundary between molecular failure and population-genetic interpretation. A locus with recurrent null alleles can make a biologically ordinary population look genetically strange.
Why null alleles matter so much in real studies
The biggest problem with null alleles is that their downstream signature is easy to misread. A locus with homozygote excess may suggest inbreeding, substructure, assortative mating, or selective effects. All of those are biologically plausible explanations. But the same pattern may also arise because one class of alleles is failing to amplify.
That is why null alleles are so dangerous in research interpretation. They imitate biological signal.
The risk becomes even more serious in inheritance-pattern studies, biodiversity research, and any project where each locus carries substantial weight. A small number of poorly behaved markers can distort conclusions more than expected, especially when the total panel size is not large.
How to recognize a null-allele-prone locus
No single sign proves the presence of a null allele, but some patterns should raise suspicion.
A repeated excess of homozygous calls at one locus is an obvious clue.
Unexpected Hardy-Weinberg deviation limited to a small subset of markers is another.
Weak amplification in a population-specific subset of samples can also be informative.
A locus that behaves cleanly in one lineage but poorly in another may indicate flanking-site variation rather than true biological absence of diversity.
In practice, null alleles should be treated as a marker-validation problem, not merely a downstream statistical nuisance.
The best response is often redesign, not patching
Software can estimate null-allele frequency. That can be useful during data review. But estimation is not the same as correction. If a marker repeatedly shows evidence of primer-site mismatch or allele dropout, the cleanest response is often to redesign the primers or replace the locus.
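One widely used screening estimate is Brookfield's (1996) estimator 1, which infers an approximate null-allele frequency from the gap between expected and observed heterozygosity. The frequencies below are invented for illustration; the estimate supports review of a locus, it does not correct genotypes.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), from apparent allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def brookfield_null_freq(freqs, observed_het):
    """Brookfield (1996) estimator 1 for null-allele frequency:
    r = (He - Ho) / (1 + He). A screening aid, not a genotype correction.
    """
    he = expected_heterozygosity(freqs)
    return (he - observed_het) / (1.0 + he)

# A hypothetical locus with strong apparent homozygote excess:
freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(freqs), 2))     # 0.7
print(round(brookfield_null_freq(freqs, 0.50), 3))  # 0.118
```

A persistent estimate of this size across populations is exactly the kind of signal that should trigger primer redesign rather than statistical patching.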
That is why flanking-region stability matters so much during marker development. A good SSR locus is not defined only by the repeat tract. It is also defined by whether the surrounding sequence supports reliable amplification across the intended sample set.
This is one of the points where sequence-based SSR profiling becomes especially valuable. If the workflow captures both repeat and flanking-sequence variation, the analyst gains a much clearer view of why a locus is behaving badly. In that context, sequence-based SSR profiling such as Hi-SSRseq or broader targeted region sequencing workflows can improve interpretability rather than merely increasing throughput.
Figure 2. Capillary electrophoresis can resolve SSR fragments with high precision, but stutter peaks and primer-site mutations can create ambiguous or falsely homozygous genotype patterns.
Hardy-Weinberg deviation is a clue, not a conclusion
One of the most common mistakes in SSR analysis is to treat Hardy-Weinberg deviation as direct evidence of biology before marker behavior has been fully checked.
A departure from equilibrium may indeed reflect biological structure. It may point to nonrandom mating, inbreeding, demographic subdivision, or selective processes. But it may also reflect null alleles, allele dropout, scoring bias, or hidden technical asymmetry in amplification.
The practical lesson is simple. Population-genetic statistics should not be interpreted independently of locus diagnostics.
This is especially true when working with moderate sample sizes or limited marker panels. In those settings, a few unstable loci can shift the whole analytical picture. A high-PIC marker is useful only when its genotype model is believable. If the peak pattern is not trustworthy, the diversity estimate built on top of it will not be trustworthy either.
The case for moving beyond fragment length begins here
Once the main sources of ambiguity in fragment-based SSR calling are clear, the case for sequence-resolved workflows becomes much easier to evaluate.
Capillary electrophoresis remains useful. It is still efficient for many targeted projects. But its core limitation is now obvious: it measures fragment length, not full allele sequence. That means it cannot directly resolve size homoplasy, flanking polymorphism, or all sources of hidden allele complexity.
This is the point where the field begins to shift. Researchers do not move toward SSR-seq simply because NGS is newer. They move because fragment length alone is sometimes an incomplete representation of the locus.
SSR-seq: from fragment length to sequence-resolved alleles
The most important conceptual shift in modern SSR analysis is this: a fragment is not the same thing as an allele. In capillary electrophoresis, the allele is inferred from amplicon length. In SSR-seq, the allele is defined from sequence. That difference matters because two amplicons can share the same apparent size and still differ in repeat composition, internal interruptions, or flanking polymorphisms. Sequence-based microsatellite studies have shown that this added sequence resolution can uncover diversity hidden by size-only scoring and reduce misinterpretation caused by size homoplasy.
That is why SSR-seq should not be framed as "CE on a sequencer." It changes the information model. A CE workflow asks how long the fragment is. An SSR-seq workflow asks what sequence-defined variant is present at that locus and how much of the variation lies in the repeat tract versus the flanks. The second question is richer. It is also more portable across projects, because sequence-defined alleles are easier to compare than fragment bins that depend on platform-specific sizing behavior.
What SSR-seq captures that CE can miss
SSR-seq usually starts with locus amplification, often in multiplex format, followed by library preparation and next-generation sequencing. The main gain is that each locus is scored from reads spanning the repeat and at least part of the flanking sequence. This creates several advantages at once.
First, SSR-seq can separate same-size, different-sequence alleles. This is the classic size-homoplasy problem. Two alleles may migrate to the same apparent fragment length in CE, yet one may carry a repeat interruption while the other carries a flanking SNP or a different internal repeat arrangement. Sequence-based scoring splits those hidden states apart.
Second, SSR-seq can improve locus standardization across studies. Fragment bins often need cross-run normalization and platform-specific calibration. Sequence strings are still not effortless, but they are inherently more transferable than size calls defined by local instrument behavior. A PeerJ SSR-seq workflow paper also emphasized that reusing old CE-era loci without redesign is often suboptimal for sequence-based genotyping, which is why modern SSR-seq projects increasingly co-design loci, multiplex structure, and bioinformatic calling rules.
Third, SSR-seq can make difficult loci more interpretable. If a locus behaves oddly in CE, read-level data can reveal whether the issue comes from repeat complexity, flanking polymorphism, unexpected indels, or poor primer neighborhoods. In that sense, SSR-seq is not only a throughput upgrade. It is also a diagnosis upgrade.
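The size-homoplasy argument can be made concrete with two invented alleles at a hypothetical (CA)n locus: identical in total length, different in internal sequence.

```python
# Two hypothetical alleles at a (CA)n locus with invented flanking sequence.
# allele A: pure (CA)10; allele B: (CA)4 + a CG interruption + (CA)5.
allele_a = "TTGC" + "CA" * 10 + "GGAT"
allele_b = "TTGC" + "CA" * 4 + "CG" + "CA" * 5 + "GGAT"

print(len(allele_a), len(allele_b))  # same fragment length -> same CE bin
print(allele_a == allele_b)          # different sequence -> distinct SSR-seq alleles
```

CE would score these as one allele; sequence-based scoring splits them, which is the hidden diversity the homoplasy literature describes.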
SSR-seq does not eliminate complexity
SSR-seq improves allele definition, but it does not make repetitive DNA trivial. It shifts the problem. CE asks the analyst to interpret peaks. SSR-seq asks the analyst to interpret read families, depth balance, locus-specific error profiles, and repeat-aware bioinformatic outputs. The gain is real, but only when the pipeline is built specifically for microsatellites rather than treated as a generic amplicon workflow.
Read depth matters. Low-frequency artifact reads still need to be separated from true minor alleles. Multiplex balance still matters. Repeat-aware parsing still matters. This is why the strongest SSR-seq workflows are not just wet-lab protocols. They are integrated systems that combine marker design, multiplex amplification, sequencing, and automated calling logic.
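A minimal sketch of that read-family logic: group locus-spanning reads by exact sequence and drop low-fraction families as likely stutter or sequencing error. The depth and fraction thresholds are assumptions for illustration, and real repeat-aware callers model stutter explicitly rather than using a flat cutoff.

```python
from collections import Counter

def call_alleles_from_reads(reads, min_frac=0.2, min_depth=30):
    """Collapse locus-spanning reads into sequence-defined alleles.

    Reads are grouped by exact sequence; families below min_frac of
    total depth are treated as stutter/error. Thresholds are
    illustrative assumptions, not a validated calling rule.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total < min_depth:
        return None                      # insufficient depth: no call
    alleles = {seq: n for seq, n in counts.items() if n / total >= min_frac}
    return sorted(alleles, key=alleles.get, reverse=True)[:2]  # diploid: top 2

# A heterozygote with a small slippage-derived read family (5% of depth):
reads = ["CA" * 12] * 55 + ["CA" * 13] * 40 + ["CA" * 11] * 5
print(call_alleles_from_reads(reads) == ["CA" * 12, "CA" * 13])  # True
```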
When the move to SSR-seq is justified
The move from CE to SSR-seq is usually worth serious consideration under a few recurring conditions.
It makes sense when size homoplasy is likely to matter.
It makes sense when stutter-heavy CE traces are becoming the main bottleneck.
It makes sense when null alleles or primer-site variation are suspected across divergent populations.
It makes sense when the project already sits inside an NGS-centered workflow.
And it makes sense when non-model species marker discovery is already part of the project design.
In those cases, the question is no longer whether CE can work. It often can. The real question is whether fragment length alone still captures enough of the biology.
Figure 3. Modern SSR workflows extend from fragment-length genotyping to sequence-based SSR profiling and NGS-assisted locus discovery, especially in non-model species.
De novo SSR discovery in non-model species
One reason SSRs remain relevant is that marker discovery is no longer tied to slow legacy enrichment pipelines. Genome survey sequencing and skim-WGS now make it much easier to identify candidate repeat loci, recover usable flanking sequence, design primers, and build first-pass panels in species with limited genomic resources. Recent genome-wide SSR discovery studies continue to use shallow or survey-style sequencing to generate polymorphic marker sets for population-genetic analysis in non-model organisms.
This changes the old criticism that SSR development is always too slow to be practical. That criticism still has force in projects that truly need dense genome-wide markers. But it is much weaker in targeted diversity studies, inheritance-pattern work, or population-structure projects where a modest number of highly informative loci is enough. In those settings, low-pass discovery plus focused validation can be a very efficient route from uncharacterized genome to usable marker panel.
A practical discovery pipeline usually follows this logic. DNA is generated at quality suitable for low-pass sequencing. Reads are assembled lightly or scanned directly for tandem repeats. Candidate loci are filtered by motif class, repeat count, uniqueness of flanking sequence, expected amplicon size, and multiplex compatibility. Primers are then designed and pilot-tested before full deployment. The key point is that the best loci are not just abundant repeats. They are repeats that survive validation.
That means a strong candidate locus usually balances four things at once:
- enough repeat length to generate useful polymorphism
- stable and unique flanking sequence
- manageable artifact burden
- compatibility with the intended endpoint, whether CE or SSR-seq
This is why discovery and genotyping should be designed together. If the intended endpoint is CE, cleaner motif classes and tract structures may deserve priority. If the intended endpoint is SSR-seq, loci with informative flanking variation may become more attractive.
SSR versus SNP markers: the right comparison
The SSR-versus-SNP debate becomes misleading when it is phrased as a universal contest. The better question is: better for what?
SNPs dominate dense genome-wide association, high-throughput imputation, and very large distributed marker sets because they are abundant, computationally scalable, and well matched to modern multiplex platforms. SSRs remain strong where multi-allelic information per locus matters, where the study is targeted rather than genome-wide, or where modest marker numbers must still deliver strong discriminatory power. Comparative studies support this more nuanced view. In one Heredity study on Armillaria cepistipes, multi-allelic SSRs were especially useful for detecting structure at smaller spatial scales, while SNPs better reflected deeper divergence across more distant populations. In a separate BMC Genomics comparison for a conservation-relevant species, both marker systems supported population-genetic analyses, but the resulting estimates and clustering behavior were not identical, reinforcing that marker choice changes inference.
| Metric | SSR markers | SNP markers |
|---|---|---|
| Basic allelic structure | Multi-allelic | Usually biallelic |
| Per-locus information content | Often high | Usually lower per locus |
| Mutation behavior | Repeat-length change, relatively high mutation rate | Lower mutation rate at most loci |
| Classical workflow | PCR + CE fragment sizing | Arrays or sequencing |
| Modern upgrade path | SSR-seq / sequence-based microsatellite genotyping | GBS, ddRAD, arrays, WGS-derived genotyping |
| Strength in population structure | Strong with modest locus numbers | Strong when many loci are distributed genome-wide |
| Strength in GWAS | Limited | Usually preferred |
| Main technical challenge | Stutter, null alleles, allele binning | Missingness, ascertainment bias, platform effects |
| Best fit in non-model targeted work | Often very good | Strong when broad genome-scale discovery is justified |
The practical takeaway is straightforward. For small to mid-scale population-genetic studies, parentage-style inference in research settings, or targeted diversity work, SSRs can still be extremely efficient. For dense genome-wide association and very high-dimensional variant analysis, SNP-based systems are usually the better fit. For difficult SSR loci where size alone is no longer enough, SSR-seq becomes the bridge between the two worlds.
A decision framework for real projects
A useful way to choose among SSR, SSR-seq, and SNP workflows is to work backward from the biological question.
| Project goal | Best-fit marker strategy | Why | Main caveat |
|---|---|---|---|
| Moderate-scale diversity or relatedness analysis | SSR panel | High per-locus information, modest locus count can still be powerful | Locus quality must be tightly validated |
| Difficult fragment interpretation at otherwise useful loci | SSR-seq | Resolves hidden sequence variation and reduces size-only ambiguity | Requires repeat-aware sequencing analysis |
| Dense genome-wide association or fine mapping | SNP workflow | Broad genomic coverage and scalability | Per-locus information is lower |
| Non-model species with limited prior resources | Skim-WGS plus SSR development, or SNP discovery if genome-wide goals are essential | Flexible entry point with lower discovery burden than full-scale genomics in some projects | Marker choice must match downstream inference needs |
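The decision table above can be encoded as a small helper. This is a hypothetical sketch; the function name, flags, and return strings are this example's own vocabulary, not a standard API, and real projects weigh more factors than three booleans.

```python
# Hypothetical helper encoding the decision table above.
# Function name, parameters, and return strings are illustrative only.

def suggest_marker_strategy(genome_wide: bool,
                            loci_hard_to_score: bool,
                            prior_resources: bool) -> str:
    """Work backward from the biological question, as the table suggests."""
    if genome_wide:
        return "SNP workflow (dense coverage, scalable)"
    if loci_hard_to_score:
        return "SSR-seq (resolves hidden sequence variation)"
    if not prior_resources:
        return "skim-WGS + SSR development (lower discovery burden)"
    return "validated SSR panel (high per-locus information)"

print(suggest_marker_strategy(genome_wide=False,
                              loci_hard_to_score=True,
                              prior_resources=True))
```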
The strongest projects are not loyal to one marker class. They are loyal to fit-for-purpose design.
Conclusion
Microsatellite markers remain relevant because their core strength has not changed. They convert repeat instability into high allelic information. What has changed is the workflow around them. Today, SSRs can be discovered faster, screened more rationally, and genotyped either by classical fragment analysis or by sequence-based methods that recover information CE cannot see. The most useful way to evaluate an SSR project now is through three linked questions: what mechanism generates the variation, what artifacts complicate the signal, and when does fragment length stop being enough? Projects that answer those three questions clearly can still extract exceptional value from microsatellite systems in modern population-genetic research.
FAQ
What is the biggest advantage of SSR-seq over capillary electrophoresis?
SSR-seq captures the repeat region and flanking sequence together, which helps resolve same-size alleles and reduces size-homoplasy problems that CE cannot see directly.
Does SSR-seq eliminate stutter problems completely?
No. It reduces some limitations of fragment-based scoring, but repetitive DNA still requires locus-aware analysis and artifact filtering at the sequence level.
Are SSRs still useful in non-model species?
Yes. Recent genome-survey and shallow-sequencing studies continue to develop SSR markers successfully in poorly characterized species for diversity and population-genetic analysis.
When are SNPs a better choice than SSRs?
SNPs are usually better when the study needs dense genome-wide coverage, such as GWAS, fine mapping, or very high-dimensional population-genomic analysis.
Why can a high-PIC marker still be a bad marker?
Because PIC reflects informational potential, not operational reliability. A locus can be polymorphic yet still be compromised by stutter, poor amplification, unstable binning, or null alleles. This is an inference from the marker-behavior literature and the CE/SSR-seq comparisons discussed above.
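The "informational potential" half of that answer is quantifiable. The standard PIC formula of Botstein et al. (1980) is PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². The sketch below implements it with hypothetical allele frequencies; note that nothing in the number says whether the locus scores cleanly.

```python
# Sketch of the standard PIC formula (Botstein et al. 1980):
#   PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
# Allele frequencies below are hypothetical.
from itertools import combinations

def pic(freqs):
    hom = sum(p * p for p in freqs)
    correction = sum(2 * (pi ** 2) * (pj ** 2)
                     for pi, pj in combinations(freqs, 2))
    return 1.0 - hom - correction

# A four-allele locus looks informative on paper, regardless of
# whether it amplifies cleanly in practice.
print(round(pic([0.4, 0.3, 0.2, 0.1]), 3))  # -> 0.645
```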
What is the main reason null alleles are dangerous?
They can make heterozygotes appear homozygous, which distorts heterozygosity estimates and can create misleading Hardy-Weinberg deviation.
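That distortion can be sketched numerically. The toy model below assumes Hardy-Weinberg genotype proportions, one non-amplifying (null) allele, and the usual scoring behavior: null heterozygotes are mis-scored as homozygotes and null/null individuals drop out as missing data. All frequencies are hypothetical.

```python
# Minimal sketch of why null alleles are dangerous: heterozygotes carrying
# a non-amplifying (null) allele are scored as homozygotes.
# Assumes Hardy-Weinberg proportions; frequencies are hypothetical.

def observed_vs_true_het(visible, null):
    """visible: frequencies of amplifying alleles; null: null-allele freq.
    null/null individuals fail to amplify and are treated as missing."""
    true_het = 1.0 - (sum(p * p for p in visible) + null * null)
    scored = 1.0 - null * null      # null/null genotypes drop out
    visible_het = sum(2 * visible[i] * visible[j]
                      for i in range(len(visible))
                      for j in range(i + 1, len(visible)))
    # Heterozygotes carrying the null allele are mis-scored as homozygotes,
    # so only visible/visible heterozygotes register as heterozygous.
    observed_het = visible_het / scored
    return true_het, observed_het

true_h, obs_h = observed_vs_true_het([0.5, 0.3], null=0.2)
print(f"true het: {true_h:.3f}, observed het: {obs_h:.3f}")
```

With a null-allele frequency of 0.2, roughly half of the true heterozygosity disappears from the scored data, which is exactly the pattern that produces spurious Hardy-Weinberg deviation.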
References
- Schlotterer C. The evolution of molecular markers – just a matter of fashion? Nature Reviews Genetics. 2004;5(1):63-69. DOI: 10.1038/nrg1249
- Dakin EE, Avise JC. Microsatellite null alleles in parentage analysis. Heredity. 2004;93:504-509. DOI: 10.1038/sj.hdy.6800545
- van Oosterhout C, Weetman D, Hutchinson WF. Estimation and adjustment of microsatellite null alleles in nonequilibrium populations. Molecular Ecology Notes. 2006;6(1):255-256. DOI: 10.1111/j.1471-8286.2005.01082.x
- Vartia S, Villanueva-Canas JL, Finarelli J, Farrell ED, Collins PC, Hughes GM, Carlsson JEL, Gauthier DT, McGinnity P, Cross TF, FitzGerald RD, Mirimin L, Crispie F, Cotter PD, Carlsson J. A novel method of microsatellite genotyping-by-sequencing using individual combinatorial barcoding. Royal Society Open Science. 2016;3(1):150565. DOI: 10.1098/rsos.150565
- Viruel J, Haguenauer A, Juin M, et al. SSR-seq: Genotyping of microsatellites using next-generation sequencing reveals higher levels of polymorphism as compared to traditional fragment size scoring. Ecology and Evolution. 2018;8(22). DOI: 10.1002/ece3.4533
- Lepais O, et al. Fast sequence-based microsatellite genotyping development workflow. PeerJ. 2020;8:e9085. DOI: 10.7717/peerj.9085
- Zimmerman SJ, Aldridge CL, Oyler-McCance SJ. An empirical comparison of population genetic analyses using microsatellite and SNP data for a species of conservation concern. BMC Genomics. 2020;21:382. DOI: 10.1186/s12864-020-06783-9
- Tsykun T, Rellstab C, Dutech C, Sipos G, Prospero S. Comparative assessment of SSR and SNP markers for inferring the population genetic structure of the common fungus *Armillaria cepistipes*. Heredity. 2017;119(5):371-380. DOI: 10.1038/hdy.2017.48
- Genome-wide SSR marker discovery and population genetic analysis in a non-model species. Trees. 2025. DOI: 10.1007/s00468-025-02651-9