Microsatellites and eccDNAs: What Circularization Means for Repeat Biology and Cancer Studies

Microsatellites—short tandem repeats like (CAG)n, (CA)n, and (AT)n—are inherently unstable, prone to replication slippage, and rich in non‑B DNA structures. Those properties do more than generate indels. They also predispose loci to form extrachromosomal circular DNA (eccDNA), small to large circles that carry repeat tracts, nearby genes, and junction signatures of repair. In this guide, we bring together mechanism, measurement, and interpretation so you can study eccDNA repeats with rigor.
Research Use Only (RUO) scope: Everything here is framed for discovery and method development. We discuss cancer and repeat‑expansion mechanisms strictly as research contexts. We do not address diagnosis, treatment, or prognosis.
You’ll learn how replication slippage, hairpins, and microhomology‑mediated repair generate circles from repeat‑dense loci; why this matters for basic repeat biology and for research on genome instability in cancer; and how to design mixed‑platform sequencing experiments with defensible bioinformatics and QC. Throughout the article, we discuss microsatellites and eccDNAs together deliberately, because that pairing reflects how most laboratories encounter these molecules in practice: repeat biology drives circularization, and circles return the favor by reshaping repeats.
Microsatellites and eccDNAs: from slippage to circles (mechanism overview)
Short tandem repeats readily form secondary structures. During S‑phase, polymerase slippage can create a hairpin loop either on the nascent or template strand. That loop is stabilized by base pairing within the repeat and nearby low‑complexity sequence, becoming a substrate for cleavage and repair. When a nick or double‑strand break arises near the loop, the cell can resolve the damage through microhomology‑mediated end joining (MMEJ/alt‑EJ), producing a ligated circle that preserves the repeat and flanks it with short microhomologies.
Two lines of evidence anchor this model. First, junction microhomology: in mammalian systems, many eccDNA junctions carry short microhomology tracts (often under 10 bp), frequently GC‑enriched, consistent with MMEJ templating and ligation. Hu and colleagues profiled circular junctions and observed widespread microhomology usage across tissues, validating junctions by outward‑facing PCR and Sanger sequencing (eLife, 2023; DOI: 10.7554/eLife.87115). Second, microsatellite hotspots: when specific non‑B‑DNA‑forming repeats are engineered at a chromosomal site, eccDNA arises robustly from those inserts but not from controls, pointing to repeat‑driven fragility and circularization. Gadgil et al. showed recurrent, nonrandom template switches and abundant junction variants, implicating break‑induced replication and MMEJ in circle formation (NAR Cancer, 2024; DOI: 10.1093/narcan/zcae027).
If you’d like a deeper mechanistic backdrop on repair choice, alt‑EJ, replication stress, and transposon activity, see our series article Linking eccDNA to Genome Instability: alt‑EJ, Replication Stress, and Retrotransposons, which surveys additional models and evidence in context.
Figure 1: A short tandem repeat, e.g., (CAG)n, forms a hairpin during replication (1), a nick or break occurs (2), and MMEJ ligates an excised fragment into an eccDNA with a short, GC‑rich microhomology at the junction (3). The junction sequence is the anchor for confident detection and validation.
Why repeats matter in cancer research (RUO): MSI context, eccDNA load, and open questions
Microsatellite instability (MSI) arises when mismatch repair is impaired, elevating replication errors, especially in repeats. It is reasonable to hypothesize that MSI‑like contexts could increase rates of loop formation, template switching, and MMEJ‑resolved circularization. However, large pan‑cancer ecDNA/eccDNA prevalence studies to date rarely stratify by MSI/dMMR status, so any causal link between MSI and increased eccDNA load remains an open research question rather than a settled fact. Reviews summarizing circular DNA in malignancies agree on widespread heterogeneity but stop short of MSI‑specific conclusions, which highlights an opportunity for carefully controlled RUO studies.
What we can say in a research frame: repeats are frequently observed at eccDNA junctions across tissues, consistent with repeat‑driven fragility; extrachromosomal amplification can range from small eccDNA to larger ecDNA amplicons harboring genes; and repair pathway choice, including MMEJ, likely shapes circle diversity. Whether MSI status modulates these features in predictable ways remains to be quantified in stratified cohorts.
For a broader research survey of circular DNA in malignancies, including oncogene‑bearing circles, see our companion article: eccDNA in Cancer: Gene Amplification, Oncogene Regulation, and Research Applications. We emphasize again: these are research observations, not clinical statements.
Sequencing strategies for repeat‑rich eccDNA: a mixed, RUO‑friendly design
Because repeats intensify alignment ambiguity and junction heterogeneity, a hybrid sequencing strategy is often the most defensible path for repeat‑bearing eccDNA.
Primary resolution with long reads, scaling with short reads. Long reads (PacBio HiFi, ONT) can span tandem arrays and directly traverse circular junctions. PacBio HiFi offers high per‑read accuracy, which helps for precise junction dissection and distinguishing motif interruptions. ONT offers very long reads and native methylation, enabling you to survey repeat context and epigenetic marks contemporaneously. Short reads (Circle‑Seq/enzymatic enrichment and targeted capture) enable cost‑effective cohort scaling, capture‑panel confirmation of nominated junctions, and orthogonal validation (e.g., outward‑facing PCR plus amplicon resequencing).
A practical decision framework: If your central aim is to discover and map repeat‑bearing circles and you can work with modest sample counts, begin with long‑read libraries from eccDNA‑enriched input, then backstop with targeted short‑read assays for confirmation. If your aim is cohort screening and relative quantitation at nominated loci, short‑read capture panels and Circle‑Seq–style libraries can be primary, provided you validate a representative subset with long‑read junction support. If you need epigenetic context at repeats (e.g., CpG methylation status around the junction), incorporate ONT runs on the same enrichment to avoid new library biases.
Controls and wet‑lab QC that matter for repeats. Exonuclease efficacy should be quantified by qPCR on a linear‑only genomic locus; aim for high depletion (for example, >90%) and report the exact assay and fold‑depletion achieved. Include a spike‑in plasmid control to normalize recovery across batches. For an example of enzymatic depletion and spike‑in usage in related contexts, see Yang et al. (PLOS Genetics, 2022; DOI: 10.1371/journal.pgen.1010024). Rolling‑circle amplification (RCA) boosts yield but can create concatemers and enrich polymerase slippage artifacts around repeats; if you use RCA, keep reaction times conservative, perform post‑RCA size selection, and prioritize junction‑centric validation downstream. For hybrid capture, include negative controls lacking the target repeat and monitor off‑target recovery, especially across paralogous repeat families.
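The depletion and recovery metrics described above reduce to simple arithmetic on qPCR and spike‑in numbers. The sketch below is a minimal illustration, assuming equal template input in the untreated and treated reactions and roughly one doubling per cycle (the `efficiency` default of 2.0); the function names and the example Ct values are hypothetical, not taken from a specific protocol.

```python
def fold_depletion(ct_before: float, ct_after: float, efficiency: float = 2.0) -> float:
    """Fold reduction of a linear-only locus after exonuclease treatment.

    Assumes the same template input in both qPCR reactions and a constant
    per-cycle amplification factor (2.0 means perfect doubling).
    """
    return efficiency ** (ct_after - ct_before)


def percent_depleted(ct_before: float, ct_after: float, efficiency: float = 2.0) -> float:
    """Percentage of linear DNA removed, derived from the fold depletion."""
    return 100.0 * (1.0 - 1.0 / fold_depletion(ct_before, ct_after, efficiency))


def spike_in_recovery(copies_recovered: float, copies_added: float) -> float:
    """Percentage of spike-in plasmid copies recovered after enrichment."""
    if copies_added <= 0:
        raise ValueError("copies_added must be positive")
    return 100.0 * copies_recovered / copies_added


# Hypothetical example: the linear locus shifts from Ct 22 to Ct 27 after digestion.
print(fold_depletion(22.0, 27.0))
print(percent_depleted(22.0, 27.0))
print(spike_in_recovery(copies_recovered=6.0e4, copies_added=1.0e5))
```

Reporting both the fold value and the percentage, together with the assay locus, makes it easier to compare batches against a threshold such as the >90% depletion mentioned above.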
Disclosure: we are affiliated with CD Genomics. For RUO long‑read WGS and hybrid designs that support repeat‑rich regions, see Whole Genome Sequencing Services for platform capabilities and consultation.
Figure 2: A dot‑plot representation of a 1 kb synthetic locus with a central (CAG)25 tract reveals parallel diagonals that signal periodic self‑similarity—one reason multi‑mapping is a first‑order challenge for eccDNA repeats.
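The periodic self‑similarity in Figure 2 can be reproduced numerically. The following sketch builds a 1 kb synthetic locus with a central (CAG)25 tract and random flanks, then lists every pair of positions that share an identical k‑mer. The word size of 12, the random seed, and all function names are illustrative assumptions; the point is only that matches pile up on diagonals offset by multiples of the 3 bp repeat unit.

```python
import random
from collections import Counter, defaultdict


def synthetic_locus(total_len=1000, unit="CAG", copies=25, seed=0):
    """Random flanks around a central tandem repeat (illustrative only)."""
    rng = random.Random(seed)
    tract = unit * copies
    left_len = (total_len - len(tract)) // 2
    right_len = total_len - len(tract) - left_len
    left = "".join(rng.choice("ACGT") for _ in range(left_len))
    right = "".join(rng.choice("ACGT") for _ in range(right_len))
    return left + tract + right


def self_dotplot(seq, k=12):
    """Return sorted (i, j) pairs with i < j whose k-mers are identical."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    dots = []
    for positions in index.values():
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                dots.append((i, j))
    return sorted(dots)


def diagonal_offsets(dots):
    """Count dots per diagonal, keyed by the offset j - i."""
    return Counter(j - i for i, j in dots)


locus = synthetic_locus()
offsets = diagonal_offsets(self_dotplot(locus))
# Offsets that are multiples of 3 dominate: these are the parallel diagonals.
print(offsets.most_common(5))
```

A read drawn from inside the tract matches at every one of those offsets equally well, which is the multi‑mapping problem in miniature.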
Bioinformatics for eccDNA repeats: multi‑mapping, junction evidence, and repeat‑aware filters
The core problem is straightforward to state and tricky to solve: repetitive reads often map equally well to many loci, and circular junctions may be short, heterogeneous, and bordered by microhomology. A defensible pipeline leans heavily on junction‑centric evidence and stratified validation tiers.
Mapping and candidate discovery. For short reads, use a split‑read–aware aligner (e.g., BWA‑MEM) with settings that retain secondary/supplementary alignments. Extract outward‑facing pairs and split reads that support head‑to‑tail junctions. Circle‑oriented tools such as Circle‑Map can realign and score circular candidates; empirical thresholds like a high circle score plus multiple split and discordant reads are common starting points. For long reads, use minimap2 to map against the reference; scan for reads that traverse putative junctions (read segments that map head‑to‑tail across the same locus). Consider an assembly step (e.g., Flye or Shasta) if you expect larger circles; for small circles, direct read‑through evidence is often sufficient. Tools like CReSIL (long‑read eccDNA caller) and integrative pipelines (e.g., eccDNA‑pipe) can automate candidate consolidation.
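To make the head‑to‑tail signature concrete, here is a simplified sketch of the test applied to two alignment segments of one read. It assumes the segments have already been parsed from the aligner output (for example, a primary alignment and its supplementary alignment), that query offsets are expressed in reference‑forward orientation as in SAM CIGAR strings, and that both segments lie on the same strand. The `Segment` class, the function name, and the `max_circle_len` cap are hypothetical; dedicated callers such as Circle‑Map or CReSIL apply more elaborate logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    chrom: str
    ref_start: int    # 0-based, inclusive
    ref_end: int      # 0-based, exclusive
    strand: str       # "+" or "-"
    query_start: int  # offset in the read, reference-forward orientation
    query_end: int


def head_to_tail_junction(
    a: Segment, b: Segment, max_circle_len: int = 1_000_000
) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) if the two segments imply a circle junction."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return None
    first, second = sorted((a, b), key=lambda s: s.query_start)
    # In a linear read the later query segment maps downstream of the earlier one.
    # Across a circle junction it maps upstream instead.
    if second.ref_start >= first.ref_start:
        return None
    start, end = second.ref_start, first.ref_end
    if end <= start or end - start > max_circle_len:
        return None
    return (first.chrom, start, end)


# Hypothetical split read: its first 100 bases reach the end of the candidate
# interval, and its last 50 bases restart at the beginning.
tail = Segment("chr1", 10_900, 11_000, "+", 0, 100)
head = Segment("chr1", 10_000, 10_050, "+", 100, 150)
print(head_to_tail_junction(tail, head))
```

Chromosomal tandem duplications produce the same split‑read signature, which is one reason the enrichment controls and orthogonal validation described in this guide matter.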
Repeat annotation and microhomology inspection. Annotate candidates with RepeatMasker and Tandem Repeats Finder. Flag those with high repeat content at junctions for more stringent scrutiny. Inspect junction sequences for short microhomologies (e.g., 2–10 bp), especially GC‑rich motifs, which are typical of MMEJ and have been reported in multiple systems (Hu 2023; DOI: 10.7554/eLife.87115). Where helpful, visualize local k‑mer uniqueness or mappability to understand whether flanking sequence supports unique placement.
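One way to inspect microhomology is to ask how far the breakpoint can slide without changing the circle sequence: bases that are identical at the start of the circle and immediately after its end (or immediately before its start and at its end) are the shared microhomology. The sketch below implements that idea on a plain reference string. The function names, the 0‑based half‑open coordinates, and the toy sequence are assumptions for illustration.

```python
def junction_microhomology(ref: str, start: int, end: int, max_len: int = 30):
    """Return (length, sequence) of microhomology for the circle ref[start:end].

    The breakpoint is slid rightward and leftward while the circle sequence
    stays unchanged; max_len caps each direction separately.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")
    right = 0
    while (right < max_len and end + right < len(ref)
           and start + right < end
           and ref[start + right] == ref[end + right]):
        right += 1
    left = 0
    while (left < max_len and start - left - 1 >= 0
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1
    return left + right, ref[start - left:start + right]


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


# Toy reference: a GCC motif sits at both ends of the excised segment.
example_ref = "ATAA" + "GCC" + "T" * 10 + "GCC" + "ACAC"
length, motif = junction_microhomology(example_ref, 4, 17)
print(length, motif, gc_fraction(motif))
```

Because the result does not depend on where the caller happened to place the breakpoint inside the shared bases, it can be compared across tools. Inside a tandem repeat the same calculation returns a long value, since the breakpoint can slide along the tract; treat that as placement ambiguity rather than as evidence of MMEJ.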
Tiered validation and reporting. Tier 1 denotes a long‑read–supported junction (≥1 read traversing the head‑to‑tail junction with high mapping quality) with or without outward‑facing PCR confirmation. Tier 2 captures short‑read support with ≥3 independent split reads and ≥1 supporting read‑pair signature, plus outward‑facing PCR and Sanger validation. Tier 3 is provisional with limited support; report separately and do not use for locus‑level conclusions without orthogonal evidence. For an in‑depth discussion of filtering artifacts and reporting standards, see our methods‑focused companion: Bioinformatics for eccDNA: Detection Algorithms, Filtering Artifacts, and Reporting Standards.
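The tier definitions above translate directly into a small rule function. This is a sketch under the stated thresholds; the `Evidence` fields are hypothetical names, and `long_reads_spanning` is assumed to count only reads that already passed whatever mapping‑quality filter you consider "high".

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    long_reads_spanning: int = 0        # junction-traversing long reads passing MAPQ filter
    split_reads: int = 0                # independent short split reads
    discordant_pairs: int = 0           # outward-facing read pairs
    pcr_sanger_confirmed: bool = False  # outward-facing PCR plus Sanger


def assign_tier(ev: Evidence) -> int:
    """Map junction evidence to Tier 1, 2, or 3 as defined in the text."""
    if ev.long_reads_spanning >= 1:
        return 1
    if ev.split_reads >= 3 and ev.discordant_pairs >= 1 and ev.pcr_sanger_confirmed:
        return 2
    return 3


print(assign_tier(Evidence(long_reads_spanning=2)))
print(assign_tier(Evidence(split_reads=4, discordant_pairs=1, pcr_sanger_confirmed=True)))
print(assign_tier(Evidence(split_reads=4, discordant_pairs=1)))
```

Keeping the rule in one function makes it straightforward to record the exact thresholds alongside the software versions in your supplemental files.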
As a practical heuristic for short‑read Circle‑Seq calls, we typically treat a candidate as a starting‑point hit when Circle‑Map reports a circle score >50 with ≥3 independent split reads and ≥1 discordant (outward‑facing) pair; note that some workflows accept ≥2 independent structural evidences as a minimal filter. Cross‑validation reduces false positives: compare Circle‑Map calls against ECCsplorer’s multi‑module output and confirm long‑read junctions with CReSIL or an integrated eccDNA‑pipe run. These thresholds are empirical starting points and must be calibrated with spike‑ins and cohort controls.
Reporting and QC templates for microsatellite‑rich eccDNA
Because replicate overlap can be modest in Circle‑Seq–style datasets, your QC section should be as robust as your results section. At minimum, report input and enrichment (mass of input DNA; exonuclease enzymes and conditions; linear‑DNA depletion metrics and spike‑in plasmid recovery), library attributes (platform, read length distribution, yield, duplicate rate, and any size‑selection windows), candidate quality (counts per sample; fraction overlapping repeats; number of Tier 1/2 junctions; distribution of microhomology lengths and GC content at junctions), and reproducibility (replicate concordance at nominated junctions; technical vs biological replicate design; batch identifiers and run dates for transparency).
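Replicate concordance at junctions can be summarized with a tolerance‑aware Jaccard index. The sketch below is one simple way to do it, assuming each call is a `(chrom, start, end)` tuple and that two calls match when both breakpoints agree within `tol` bp; the 10 bp default, the greedy one‑to‑one matching, and the example coordinates are illustrative choices, not a community standard.

```python
def count_shared(calls_a, calls_b, tol=10):
    """Greedy one-to-one matching of junction calls within a bp tolerance."""
    unused = list(calls_b)
    shared = 0
    for chrom, start, end in calls_a:
        for cand in unused:
            if (cand[0] == chrom
                    and abs(cand[1] - start) <= tol
                    and abs(cand[2] - end) <= tol):
                unused.remove(cand)
                shared += 1
                break
    return shared


def junction_jaccard(calls_a, calls_b, tol=10):
    """Shared junctions divided by the union of both call sets."""
    shared = count_shared(calls_a, calls_b, tol)
    union = len(calls_a) + len(calls_b) - shared
    return shared / union if union else 0.0


# Hypothetical calls from two technical replicates.
rep1 = [("chr1", 100, 500), ("chr2", 1000, 1300), ("chr3", 50, 400)]
rep2 = [("chr1", 103, 498), ("chr2", 5000, 5300)]
print(junction_jaccard(rep1, rep2))
```

Report the tolerance you used, because breakpoint placement inside microhomology or repeat tracts can differ by a few bases between runs.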
A concise table helps readers evaluate design trade‑offs:
| Strategy | Primary strength | Typical pitfalls in repeats | Validation path |
|---|---|---|---|
| PacBio HiFi enrichment | High per‑read accuracy; precise junction bases | Input requirements; may under‑sample tiny circles if size‑selected too tightly | Outward‑facing PCR + short‑read capture |
| ONT enrichment | Ultra‑long reads; native methylation context | Higher raw error around homopolymers; careful polishing needed | Outward‑facing PCR; re‑basecall/polish; short‑read confirmation |
| Circle‑Seq (short reads) | Cohort scale; cost‑efficient | Multi‑mapping; RCA chimeras; concatemer bias | Long‑read confirmation for a subset + Sanger |
| Hybrid capture | Targeted sensitivity | Off‑target capture in paralogous repeats | Spike‑in targets; negative controls; long‑read traversal |
To streamline reporting across studies of microsatellites and eccDNAs, many teams create a standardized “circular repeat” data dictionary that records sample metadata, enrichment steps, enzyme lots, software versions, junction sequences, primer sets, and validation outcomes. Share it alongside BAM/FASTQ subsets as RUO supplemental files.
Reproducibility and standards alignment (RUO): To improve reproducibility and community reuse, studies should publish a minimal metadata CSV (sample ID, sample type/tissue, platform, library prep/enzyme and lot, exonuclease/RCA conditions, sequencing run ID, spike‑in ID and recovery%, software/tools + versions, and validation thresholds). Deposit raw reads (CRAM/FASTQ) to SRA/ENA, analysis scripts and workflows (Nextflow/Snakemake) to GitHub, and a versioned snapshot with example output and metadata to Zenodo (DOI). Provide a CSV template and command‑line snippets for validation in supplements.
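A metadata template along these lines can be generated and checked with the standard library. The column names below are one possible rendering of the fields listed above and are assumptions, not an established schema; adapt them to your own data dictionary.

```python
import csv
import io

# Hypothetical column names derived from the fields listed in the text.
FIELDS = [
    "sample_id", "sample_type", "platform", "library_prep", "enzyme_lot",
    "exonuclease_conditions", "rca_conditions", "sequencing_run_id",
    "spike_in_id", "spike_in_recovery_pct", "software_versions",
    "validation_thresholds",
]


def write_template(handle) -> None:
    """Write a header-only CSV template to an open text handle."""
    csv.DictWriter(handle, fieldnames=FIELDS).writeheader()


def missing_fields(row: dict) -> list:
    """List required columns that are absent or blank in one metadata row."""
    return [name for name in FIELDS if not str(row.get(name, "")).strip()]


buffer = io.StringIO()
write_template(buffer)
print(buffer.getvalue().strip())
print(missing_fields({"sample_id": "S1", "platform": "ONT"}))
```

Running `missing_fields` over every row before deposition is a cheap way to catch blank enzyme lots or software versions early.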
Worked example: a compact pipeline for repeat‑aware eccDNA discovery (RUO)
Below is a compact, reproducible outline you can adapt. Think of it as a scaffold rather than a one‑size‑fits‑all recipe.
Preprocessing and enrichment QC
- Run FastQC/MultiQC on raw reads; filter low‑complexity reads if warranted.
- Quantify exonuclease depletion by qPCR at a linear locus; record fold‑depletion. Spike in a small plasmid and report recovery.
Short‑read mapping and candidate nomination
- Align with BWA‑MEM (retain secondary/supplementary alignments).
- Run Circle‑Map Realign; nominate candidates with circle score above your cohort‑calibrated threshold (start with 50+ as a heuristic) and require ≥3 split reads plus ≥1 supporting discordant pair.
Long‑read mapping and junction traversal
- Align with minimap2 using settings tuned for accurate mapping of repetitive DNA (e.g., map‑ont or map‑hifi presets). Extract reads spanning head‑to‑tail junctions.
- Optionally assemble with Flye/Shasta if larger circles are expected; polish assembled contigs to refine junction bases.
Repeat annotation and junction inspection
- Annotate with RepeatMasker/TRF. Inspect 100–200 bp windows around the junction for microhomology (2–10 bp) and GC enrichment.
Validation and tiering
- Design outward‑facing primers; validate a representative subset by PCR and Sanger.
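- Promote candidates to Tier 1 when a long read traverses the junction with high mapping quality; assign Tier 2 when short‑read support (≥3 split reads plus ≥1 supporting pair) is confirmed by outward‑facing PCR and Sanger; leave the rest in Tier 3.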
Reporting
- Produce a per‑sample table listing candidate coordinates, repeat class, support counts, tier, and validation status. Include a section on “eccDNA repeats” summarizing the fraction of circles with repeat‑flanked junctions and the microhomology length distribution.
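The repeat summary in the reporting step can be computed from the per‑candidate table. The sketch below assumes each candidate is a dict with a boolean `repeat_flanked` flag and an integer `microhomology_len`; both key names and the example rows are hypothetical.

```python
from collections import Counter
from statistics import median


def summarize_candidates(candidates):
    """Fraction of repeat-flanked junctions plus microhomology length stats."""
    if not candidates:
        return {"n": 0, "repeat_flanked_fraction": 0.0,
                "microhomology_hist": {}, "microhomology_median": None}
    lengths = [c["microhomology_len"] for c in candidates]
    flanked = sum(1 for c in candidates if c["repeat_flanked"])
    return {
        "n": len(candidates),
        "repeat_flanked_fraction": flanked / len(candidates),
        "microhomology_hist": dict(sorted(Counter(lengths).items())),
        "microhomology_median": median(lengths),
    }


# Hypothetical per-sample candidate rows.
rows = [
    {"repeat_flanked": True, "microhomology_len": 4},
    {"repeat_flanked": False, "microhomology_len": 0},
    {"repeat_flanked": True, "microhomology_len": 4},
    {"repeat_flanked": True, "microhomology_len": 6},
]
print(summarize_candidates(rows))
```

Computing the summary per tier as well as per sample keeps provisional Tier 3 calls from dominating the reported distributions.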
For additional platform capability context in RUO terms, see our neutral overview at DNA Sequencing.
Three short research vignettes (RUO)
- Replication stress and repeat‑rich eccDNA in cultured cells. Design: Exonuclease‑enriched DNA from cell lines exposed to hydroxyurea; long‑read library (ONT) to capture spanning reads at simple repeats, followed by short‑read capture confirmation of nominated junctions. Outcome: Elevated counts of repeat‑bordered circles under stress; microhomology distributions enriched for 3–6 bp GC‑rich motifs; a subset validated by outward‑facing PCR and Sanger. Note: Stress conditions can shift size distributions; report exact dosing and duration to support reproducibility.
- MSI‑H cohort exploration (hypothesis‑generating). Design: Retrospective RUO analysis of stored extracts; Circle‑Seq libraries on a small MSI‑H group and a microsatellite‑stable group; confirm a subset with PacBio HiFi to pin down junctions. Outcome: Heterogeneous patterns of repeat‑rich eccDNA; signal varies by locus and sample. Without stratified power, treat any differences as preliminary. Note: MSI status is a covariate, not a conclusion; prioritize effect‑size estimation and confidence intervals over dichotomous claims.
- A CAG‑repeat circle with long‑read junction traversal. Design: Target a locus known for CAG expansions; apply long‑read enrichment; identify direct read‑through of a head‑to‑tail junction flanked by a 4 bp GC‑rich microhomology; confirm with outward‑facing PCR and Sanger. Outcome: Tier 1 validation with exact junction sequence; short‑read capture indicates presence across additional replicates at lower abundance. Note: Provide the junction FASTA, PCR primers, and software versions in your supplemental package for reuse.
Figure 3: Illustrative eccDNA length distribution informed by open‑access reports—peaks around nucleosome‑associated sizes (∼150–400 bp) with a broader tail into the kilobase range. Use your own enrichment and platform to establish expected ranges in your system and report them transparently.
Practical pitfalls and how to mitigate them
RCA‑induced artifacts can generate concatemers and amplify slippage in repeat tracts. Use conservative reaction times, enzyme choices validated for low chimera rates, and size selection. Downstream, require junction‑centric evidence rather than relying on coverage peaks. Multi‑mapping over‑confidence is another trap: reporting candidates based only on coverage shoulders near repeats risks mapping to paralogous loci. Make junction evidence your gate and annotate k‑mer uniqueness in the flanking sequence. Hybrid‑capture cross‑talk in paralogous repeat families can be reduced with blocker oligos and monitored with decoy probes. Finally, beware over‑filtering true positives: stringent multi‑mapping filters can discard real circles arising from low‑complexity regions. Tier candidates instead of using a single hard cutoff, and reserve long‑read validation budget for the gray zone.
Conclusion: what circular analysis unlocks in repeat biology and cancer research
Studying microsatellites and eccDNAs together opens a window into how repetitive DNA reshapes genomes outside chromosomes. At the mechanism level, you can quantify slippage‑driven hairpins, MMEJ footprints, and template switching. At the systems level, you can explore how repeat context interacts with replication stress and repair choice to diversify circles. For cancer research specifically (RUO), repeat‑bearing eccDNAs offer testable hypotheses about heterogeneity and plasticity without stepping into clinical claims. If you anchor your pipeline on junction‑centric evidence, mix long‑read discovery with short‑read scaling, and report repeat‑aware QC transparently, your findings will travel well between labs.
If you are planning a mixed‑platform study and need a neutral walkthrough of platform and analysis trade‑offs, our resource pages summarize typical inputs and analysis deliverables in a research framing. For example, see the brief overview at DNA Sequencing or the pan‑platform summary at Whole Genome Sequencing Services.
Author
Yang H. — Senior Scientist, CD Genomics; University of Florida.
Yang is a genomics researcher with over 10 years of research experience in genetics, molecular and cellular biology, sequencing workflows, and bioinformatic analysis. Skilled in both laboratory techniques and data interpretation, Yang supports RUO study design and NGS-based projects.
References:
- Hu J, Zhang C, Li X, et al. Microhomology‑mediated circular DNA formation from mammalian germline cells. eLife. 2023;12:e87115. DOI: 10.7554/eLife.87115.
- Gadgil RY, Node‑Langhans T, Roberts B, et al. Microsatellite break‑induced replication generates highly mutagenized eccDNAs with abundant template switching. NAR Cancer. 2024;6(2):zcae027. DOI: 10.1093/narcan/zcae027.
- Yang N, Van Houten M, Zhu Y, et al. Transposable element landscapes in aging Drosophila. PLOS Genetics. 2022;18(3):e1010024. DOI: 10.1371/journal.pgen.1010024.
- Mann L, Seibt KM, Weber B, Heitkam T, et al. ECCsplorer: a pipeline to detect extrachromosomal circular DNA (eccDNA) from next‑generation sequencing data. BMC Bioinformatics. 2022;23:40. DOI: 10.1186/s12859-021-04545-2. Full open‑access text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8760651/
- Deshpande V, Luebeck J, Nguyen NPD, et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nature Communications. 2019;10:297. DOI: 10.1038/s41467-018-08200-y.
- Turner KM, Deshpande V, Beyter D, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature. 2017;543:122–125. DOI: 10.1038/nature21356.
- Fang M, Li Y, Chen X, et al. eccDNA‑pipe: an integrated pipeline for identification, analysis and visualization of extrachromosomal circular DNAs. Briefings in Bioinformatics. 2024;25(2):bbae034. DOI: 10.1093/bib/bbae034.
- Wanchai V, Kanchanaphum P, Inthanon W, et al. CReSIL: accurate identification of extrachromosomal circular DNA from long‑read sequencing data. Scientific Reports. 2022;12:8153. DOI: 10.1038/s41598-022-12163-0.
- Prada‑Luengo I, Krogh A, Maretty L, Regenberg B. Sensitive detection of circular DNAs at single‑nucleotide resolution using guided realignment of partially aligned reads. BMC Bioinformatics. 2019;20:663. DOI: 10.1186/s12859-019-3160-3.
- Wang Z, Liu J, Chen Q. Unveiling the mysteries of extrachromosomal circular DNA. Cell & Bioscience. 2024;14:163. DOI: 10.1186/s13578-024-01263-3.