Reading CRISPR NGS Results: Indels, Allele Frequency, Mosaicism, and Zygosity


Figure 1: An annotated targeted NGS report for a CRISPR-edited crop line, showing the key elements researchers need to interpret: allele frequency, indel distribution, and zygosity.

You sent genomic DNA from your regenerated T0 plantlets or T1 seedlings for targeted amplicon NGS, and the service provider has returned a report. It contains a table of alleles, a list of percentages, and terms like "modified reads" and "quantification window." The data is all there — but making sense of it, especially for a polyploid crop or a chimeric T0 line, is not always straightforward.

This article walks through each section of a typical CRISPR targeted NGS report, explains how to interpret the key metrics, and provides decision rules for translating NGS data into line advancement calls. It assumes you have already chosen targeted amplicon NGS as your validation method. If you are still deciding which method to use, start with the CRISPR editing validation method comparison.

What the Report Contains

A typical targeted amplicon NGS report for CRISPR validation includes three layers of output: a sample-level editing efficiency summary, an allele frequency table, and quality control statistics. Understanding what each layer tells you is the first step.

Editing Efficiency

The top-level number — usually labeled "Modified reads" or "Editing efficiency" — is the percentage of sequencing reads that differ from the wild-type reference sequence within the quantification window around the cut site. It answers the question: did editing happen?

In tools like CRISPResso2, this number appears as Modified% in the quantification summary. A modified read is any read carrying an insertion, deletion, or substitution within the window. Unmodified reads match the reference across the window; differences outside the window are not counted as edits.

A 75% editing efficiency means three out of four reads show some edit at the target site. A 3% editing efficiency means editing occurred, but at a frequency well below what Sanger sequencing would reliably detect. Both numbers are useful — but editing efficiency alone does not tell you whether the edits are the same allele or a mix of different ones.
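The arithmetic behind this number can be sketched in a few lines. The function name and the read counts below are illustrative assumptions, not output from any particular pipeline.

```python
def modified_percent(modified_reads: int, unmodified_reads: int) -> float:
    """Share of aligned reads carrying any change inside the quantification window."""
    total = modified_reads + unmodified_reads
    if total == 0:
        raise ValueError("no aligned reads to quantify")
    return 100.0 * modified_reads / total


# Hypothetical samples with 1,000 aligned reads each.
print(modified_percent(750, 250))  # three out of four reads edited
print(modified_percent(30, 970))   # editing present, but at low frequency
```

Note that the result says nothing about how many distinct alleles make up the modified reads; that information comes from the allele table.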

Allele Frequency Table

The allele table lists every unique sequence found in the data, the number of reads supporting each, and its percentage of total reads. This is where zygosity is determined.

A typical table includes these columns:

Column | What It Means
Aligned Sequence | The DNA sequence of the allele, with dashes for deletions
Reference Sequence | The wild-type reference for comparison
Read Count | Number of reads supporting this allele
% Reads | Allele frequency as a percentage of all aligned reads
Read Status | Modified or Unmodified

The allele table is the core of the report. A single edited allele at 98% frequency tells a very different story than six edited alleles each at 10–20% frequency, even if both samples have similar overall editing efficiency.
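To make that contrast concrete, here is a minimal sketch that summarizes two made-up allele tables. The `Allele` structure, the labels, and the read counts are all hypothetical; a real report would supply these values.

```python
from typing import NamedTuple


class Allele(NamedTuple):
    label: str      # hypothetical allele label
    reads: int      # read count supporting this allele
    modified: bool  # True for edited alleles, False for wild-type


def summarize(alleles):
    """Return (editing efficiency %, largest single edited-allele %)."""
    total = sum(a.reads for a in alleles)
    edited = [a.reads for a in alleles if a.modified]
    efficiency = 100.0 * sum(edited) / total
    top_edited = 100.0 * max(edited) / total if edited else 0.0
    return efficiency, top_edited


# Sample A: one dominant edited allele plus trace wild-type.
sample_a = [Allele("-7 bp", 980, True), Allele("WT", 20, False)]

# Sample B: six edited alleles at 12-20% each plus trace wild-type.
counts_b = [200, 180, 170, 160, 150, 120]
sample_b = [Allele(f"edit{i}", n, True) for i, n in enumerate(counts_b, 1)]
sample_b.append(Allele("WT", 20, False))

print(summarize(sample_a))
print(summarize(sample_b))
```

Both samples return the same editing efficiency, but the second value (the share held by the largest edited allele) separates the single-allele sample from the fragmented one.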

QC Statistics

The report header or a separate QC tab reports total input reads, reads after quality filtering, and reads successfully aligned to the reference amplicon. Low alignment rates — below roughly 70% — can indicate poor sample quality, primer-dimer contamination, or a reference sequence that does not match the actual amplicon. Before drawing conclusions from the allele table, check that the alignment rate is acceptable and that read depth per amplicon meets the minimum in your analysis protocol.
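A quick pre-check along these lines might look like the sketch below. The 70% alignment threshold comes from the text above; the default `min_depth` of 500 is only a placeholder assumption and should be replaced with the minimum from your own analysis protocol.

```python
def qc_check(total_reads, aligned_reads, min_alignment_rate=0.70, min_depth=500):
    """Flag samples whose alignment rate or aligned depth is too low to interpret."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    rate = aligned_reads / total_reads
    return {
        "alignment_rate": rate,
        "alignment_ok": rate >= min_alignment_rate,
        "depth_ok": aligned_reads >= min_depth,
    }


# Hypothetical sample: plenty of raw reads, very few on target.
print(qc_check(total_reads=50_000, aligned_reads=300))
# Hypothetical sample: modest output, but most reads align.
print(qc_check(total_reads=2_000, aligned_reads=1_800))
```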

Four Zygosity Patterns

Across plant editing projects — diploid and polyploid alike — the allele frequency table resolves into one of four basic patterns. Recognizing these patterns quickly is the core skill of reading a CRISPR NGS report.

Figure 2: The four zygosity patterns commonly seen in CRISPR-edited crop lines (homozygous, heterozygous, biallelic, and mosaic/chimeric), each defined by the number and frequency distribution of edited alleles.

Homozygous

A single edited allele at high frequency — typically above 90%, and ideally above 95% — with no wild-type reads or only trace wild-type reads remaining.

In a diploid crop, this means both copies of the gene carry the same edit. In practice, a clean homozygous call requires two pieces of evidence: a dominant edited allele above 90%, and wild-type reads below 2–3%. If the wild-type reads sit at 4–5%, consider the possibility of residual heterozygosity, low-level chimerism, or cross-contamination. For lead lines heading into publication, colony screening or a second independent PCR can confirm the absence of the wild-type allele.

For guidance on confirming homozygosity as part of a broader validation package, the article on CRISPR edited plant line selection and sequencing QC covers the evidence package reviewers expect.

Heterozygous

Two alleles at roughly equal frequency: one wild-type at approximately 50%, and one edited allele at approximately 50%.

This is the expected pattern when one copy of the gene is edited and the other remains wild-type. The exact ratio rarely hits a perfect 50:50 — 45:55 or 48:52 are normal. For line advancement, a heterozygous T1 plant can be selfed to produce homozygous T2 progeny. The key is confirming that the edited allele sequence is the in-frame or out-of-frame variant you intended.

Biallelic

Two different edited alleles, each at roughly 40–60%, with no wild-type reads or very few wild-type reads.

This means both gene copies are edited, but they carry different indels: for example, a +1 insertion on one allele and a −3 deletion on the other. Biallelic edits are common in CRISPR editing, because NHEJ repair can resolve the same cut differently on each chromosome copy. From a functional knockout perspective, two different frameshift alleles are often equivalent to a homozygous knockout, but check the frame of each allele separately (the −3 deletion in this example is in-frame), and expect reviewers to ask for the sequences of both alleles.

Mosaic / Chimeric

Three or more alleles, often including wild-type, at varying frequencies — none approaching 90%.

Mosaicism is typical in T0 plants, where different cells carry different editing outcomes. The allele table may show the wild-type allele at 30%, two different edited alleles at 20% each, and several low-frequency alleles below 5%. A mosaic T0 plant can still produce homozygous T1 progeny if the edited allele is present in the germline, but the T0 allele frequencies cannot reliably predict T1 segregation ratios.

The distinction between biallelic and mosaic matters for line advancement. A clean biallelic T1 plant can advance directly. A mosaic T0 should advance to T1 progeny testing — genotype 10–20 T1 plants per line to determine whether the edited allele transmits through the germline and at what frequency.
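The four patterns can be expressed as a rough rule-based classifier. This is a sketch using the thresholds quoted in this section; the function name, the `minor_cutoff` parameter (alleles below it are treated as background when counting alleles), and the "unclear" fallback are assumptions added for illustration.

```python
def call_zygosity(wt_pct, edited_pcts, minor_cutoff=5.0):
    """Classify one diploid sample from wild-type % and a list of edited-allele %."""
    edited = sorted(edited_pcts, reverse=True)
    major = [p for p in edited if p >= minor_cutoff]  # alleles above background
    top = edited[0] if edited else 0.0

    # Homozygous: one edited allele above 90%, wild-type below 3%.
    if top > 90 and wt_pct < 3:
        return "homozygous"
    # Heterozygous: wild-type and one edited allele, each near 50%.
    if len(major) == 1 and 40 <= major[0] <= 60 and 40 <= wt_pct <= 60:
        return "heterozygous"
    # Biallelic: two edited alleles at 40-60% each, little or no wild-type.
    if len(major) == 2 and all(40 <= p <= 60 for p in major) and wt_pct < 3:
        return "biallelic"
    # Mosaic/chimeric: three or more alleles (wild-type included), none dominant.
    n_alleles = len(major) + (1 if wt_pct >= minor_cutoff else 0)
    if n_alleles >= 3:
        return "mosaic/chimeric"
    return "unclear"


print(call_zygosity(1.0, [97.5]))
print(call_zygosity(48.0, [52.0]))
print(call_zygosity(0.5, [51.0, 47.0]))
print(call_zygosity(30.0, [20.0, 20.0, 4.0, 3.0]))
```

Samples that fall between patterns return "unclear" rather than being forced into a category, since borderline tables usually deserve a manual look.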

Why T0 and T1 Results Differ

A common source of confusion is seeing different allele patterns in T0 and T1 generations of the same line. The T0 plant shows a complex mix of alleles at low frequencies. The T1 progeny from the same T0 parent show a clean homozygous or heterozygous pattern. This is expected behavior, not a data error.

Somatic vs. Germline Editing

In T0 plants, editing often occurs in somatic cells during tissue culture and regeneration — after the first few cell divisions. The result is a plant composed of sectors with different genotypes. An allele that is abundant in leaf tissue (the typical DNA source for genotyping) may be absent from the germline. Conversely, an allele present at low frequency in the leaf may be the one that gave rise to the germline and transmits to the next generation.

Because targeted NGS genotypes a bulk DNA extract — typically from leaf tissue — the allele frequencies it reports reflect the somatic tissue sampled, not necessarily the germline. This is why T0 allele frequencies should be treated as screening data, not as predictors of T1 segregation.

Stabilization by T1

By the T1 generation, Mendelian segregation simplifies the picture. A T0 plant that was chimeric in leaf tissue but carried a single edited allele in the germline will produce T1 progeny that are each heterozygous or homozygous for that allele, alongside wild-type segregants if the germline also carried an unedited copy. The T1 allele table typically shows the clean patterns described above (homozygous, heterozygous, or biallelic) rather than the multi-allele mosaic of T0.
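As a worked example of what segregation implies for screening effort, the sketch below assumes the simplest case: a selfed parent whose germline is heterozygous for one edited allele, so each T1 plant has a 1-in-4 chance of being homozygous edited. That assumption, and the function name, are illustrative; real transmission can deviate from the Mendelian ratio.

```python
def prob_at_least_one_homozygous(n_plants: int, p_homozygous: float = 0.25) -> float:
    """Chance that at least one of n T1 plants is homozygous for the edit."""
    return 1.0 - (1.0 - p_homozygous) ** n_plants


for n in (5, 10, 20):
    print(f"{n:>2} T1 plants: {prob_at_least_one_homozygous(n):.3f}")
```

Under this assumption, screening 10 plants gives roughly a 94% chance of recovering at least one homozygous edited plant, and 20 plants pushes that above 99%.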

Practical guidance: genotype T0 plants by Sanger or targeted NGS to confirm editing occurred. Make line advancement decisions on T1 data. For the initial T0 screen, the CRISPR editing validation method comparison covers when Sanger is sufficient and when to escalate to NGS.

What Read Depth Means for Confidence

Read depth — the number of sequencing reads covering the target amplicon — determines how confidently a zygosity call can be made.

How Many Reads Is Enough

Zygosity Call | Minimum Read Depth | Why
Homozygous (>95% single allele) | 500 reads | Need enough reads to rule out a 5% minor allele with confidence
Heterozygous (~50:50) | 200 reads | Two alleles at high frequency; moderate depth sufficient
Biallelic (two edited alleles) | 500 reads | Need to distinguish two edited alleles from sequencing errors
Mosaic detection | 1,000+ reads | Low-frequency alleles need deep coverage to distinguish from noise
Polyploid crop genotyping | 1,000+ reads | Multiple allelic states at one locus; more reads needed to resolve each

These are practical guidelines, not rigid cutoffs. If your report shows 198 reads for a heterozygous call, the conclusion is still directionally correct. But if a "homozygous" call rests on 120 reads, consider whether a 5% wild-type allele — roughly 6 reads — would be distinguishable from sequencing noise at that depth.
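One way to reason about this is with a simple binomial model of read sampling. The sketch below asks how often a true 5% allele would show up in only two reads or fewer at a given depth. The binomial model and the two-read cutoff are assumptions chosen for illustration; they ignore PCR bias and platform-specific error profiles.

```python
from math import comb


def prob_at_most(k: int, depth: int, allele_fraction: float) -> float:
    """Binomial probability of seeing k or fewer reads from an allele at this depth."""
    p = allele_fraction
    return sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k + 1))


for depth in (120, 500, 1000):
    expected = 0.05 * depth
    near_miss = prob_at_most(2, depth, 0.05)
    print(f"depth {depth:>4}: expect {expected:.0f} reads, P(2 or fewer) = {near_miss:.2e}")
```

At 120 reads the chance of a near-miss is a few percent, while at 500 reads or more it becomes negligible, which is consistent with the guideline depths in the table.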

The Difference Between Read Depth and Read Count

A targeted amplicon NGS report often distinguishes between total reads — the raw output from the sequencer — and aligned reads — the subset that maps to the reference amplicon. The relevant number for zygosity calling is aligned reads per sample, not total reads. If your report shows 50,000 total reads but only 300 aligned to your target, the effective depth is 300. Low alignment rates are typically a panel design or sample quality issue — for guidance on avoiding this problem, see the article on CRISPR amplicon sequencing panel design for edited crop lines.

From Allele Data to Line Selection

The allele table gives you the numbers. The next step is deciding which lines to keep, which to advance, and which to discard.

Three Decision Rules

  • Advance — A single edited allele at over 90% frequency, with wild-type reads below 3%, in a T1 or later plant. For a biallelic line, two edited alleles summing to over 95% with no wild-type reads. The line is ready for phenotyping, seed increase, or inclusion in a publication.
  • Test progeny — Any T0 plant with confirmed editing, regardless of allele pattern. The T0 genotype reflects somatic tissue; progeny testing tells you what the germline carries. Also applies to T1 plants with a clear edited allele at 40–60% (heterozygous) — self or cross these to produce homozygous T2.
  • Discard — No modified reads above the detection threshold (typically 0.1% for targeted NGS). Also discard lines where the only edited allele is a silent or in-frame mutation when a knockout was intended, or lines where the target site shows unexpected large deletions that disrupt surrounding genes.
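The three rules can be sketched as a small function. The input layout (a list of `(percent, is_frameshift)` tuples), the function name, and the "manual review" fallback for cases the rules do not cover are assumptions; the check for large deletions disrupting neighboring genes is left out because it cannot be read from allele percentages alone.

```python
def advancement_call(generation, wt_pct, edited, knockout_intended=True,
                     detection_threshold=0.1):
    """Apply the advance / test progeny / discard rules to one sample.

    edited: list of (percent, is_frameshift) tuples, one per edited allele.
    """
    detected = [(p, fs) for p, fs in edited if p >= detection_threshold]
    if not detected:
        return "discard"  # no modified reads above the detection threshold
    if knockout_intended and not any(fs for _, fs in detected):
        return "discard"  # only silent or in-frame edits when a knockout was wanted
    if generation == "T0":
        return "test progeny"  # T0 reflects somatic tissue, whatever the pattern
    pcts = sorted((p for p, _ in detected), reverse=True)
    if pcts[0] > 90 and wt_pct < 3:
        return "advance"  # single dominant edited allele
    if len(pcts) >= 2 and pcts[0] + pcts[1] > 95 and wt_pct < detection_threshold:
        return "advance"  # biallelic, no wild-type reads
    if 40 <= pcts[0] <= 60 and 40 <= wt_pct <= 60:
        return "test progeny"  # heterozygous: self or cross to reach homozygous T2
    return "manual review"


print(advancement_call("T1", 1.0, [(97.0, True)]))
print(advancement_call("T0", 30.0, [(20.0, True), (20.0, True)]))
print(advancement_call("T1", 49.0, [(51.0, True)]))
print(advancement_call("T1", 100.0, []))
```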

What to Do With Low-Frequency Alleles

Targeted NGS frequently detects edited alleles at frequencies below 1% that are invisible to Sanger and even to PCR-CE. Most are genuine editing outcomes — rare NHEJ repair products present in a small fraction of cells. In a T1 or later plant, low-frequency alleles below 1–2% are generally noise for the purpose of line selection: the dominant allele pattern drives the decision.

Three scenarios where low-frequency alleles do matter:

  • Regulatory or safety assessment — if the edited plant may enter a regulated field trial, document all detected alleles at the target site, including low-frequency ones.
  • T0 chimerism assessment — a T0 plant with one dominant allele at 60% and several low-frequency alleles at 2–5% is likely chimeric, not biallelic.
  • Unexpected on-target rearrangements — large deletions or inversions at low frequency can signal complex repair outcomes that warrant follow-up.

Common Confusions in NGS Reports

Several points in a CRISPR NGS report generate recurring questions. Addressing them here avoids back-and-forth with the service provider.

Editing Efficiency vs. Knockout Efficiency

Editing efficiency (Modified%) measures the percentage of reads with any sequence change at the target site. Knockout efficiency measures the percentage of reads with a frameshift mutation expected to disrupt gene function. The two numbers differ because in-frame indels — deletions or insertions in multiples of 3 bp — count as edited but may not knock out the gene.

If your goal is a functional knockout, the frameshift percentage is the more relevant number. CRISPResso2 reports this as Frameshift% when a coding sequence is provided. If your report only includes editing efficiency, ask for the frameshift breakdown.
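If you need to estimate the frameshift share yourself from an allele table, the core test is whether the net indel length is a multiple of 3. The sketch below is not CRISPResso2's implementation; the function names and the example alleles are hypothetical, and alleles with only substitutions (net indel of 0) are treated as non-frameshift.

```python
def is_frameshift(net_indel_bp: int) -> bool:
    """True when the net inserted/deleted length shifts the reading frame."""
    return net_indel_bp % 3 != 0


def frameshift_percent(alleles) -> float:
    """Sum the % reads of alleles whose net indel is not a multiple of 3.

    alleles: list of (net_indel_bp, percent_of_reads); deletions are negative.
    """
    return sum(pct for indel, pct in alleles if is_frameshift(indel))


# Hypothetical edited alleles: -7 bp, +1 bp, and an in-frame -3 bp deletion.
alleles = [(-7, 52.1), (1, 35.2), (-3, 6.0)]
editing = sum(pct for _, pct in alleles)
print(f"editing efficiency: {editing:.1f}%")
print(f"frameshift share:   {frameshift_percent(alleles):.1f}%")
```

In this made-up sample the in-frame −3 bp allele counts toward editing efficiency but not toward the frameshift share, which is why the two numbers diverge.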

Why the Same Line Can Give Different Results from Different Services

The same DNA sample analyzed by two different targeted NGS providers can return slightly different allele frequencies. Common sources of variation include the specific PCR primers used, the number of amplification cycles, the sequencing platform, and the bioinformatics pipeline, particularly the quantification window size and the minimum allele frequency threshold for reporting. These differences are typically small (1–3 percentage points) and do not change the zygosity call. If two reports disagree on zygosity for the same sample, check the read depth, alignment rate, and quantification window settings before questioning the sample.

Sample Cross-Contamination

When an allele table shows a low-frequency edited allele (1–3%) in a sample that should be wild-type, or a wild-type allele at 3–5% in a sample that should be homozygous, consider cross-contamination during DNA extraction, PCR setup, or library pooling. NGS is sensitive enough to detect trace contamination that would be invisible in a Sanger trace. Most service providers include negative controls in each sequencing run — if the control is clean, the low-frequency signal is likely real biology (rare repair outcomes) rather than contamination.

Indel Size Distribution

Figure 3: Indel size distribution from a targeted NGS run, showing the characteristic peak of small indels around the cut site (about −3 to +1 bp) and the long tail of larger deletions (extending to about −50 bp).

Most CRISPR-induced indels are small — 1 to 5 bp deletions or insertions at the cut site. When the report shows indels of 20 bp, 50 bp, or larger, these are real repair outcomes, not sequencing errors. Large deletions are more common in certain species, with certain guide RNAs, and when Cas9 remains active for extended periods. If large deletions appear at more than a few percent frequency, consider sequencing the surrounding genomic region to check for unexpected structural changes, particularly for lines heading into field trials or regulatory review.
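A simple tally of indel sizes makes the tail of large deletions easy to spot. The sketch below works from a hypothetical list of per-read net indel sizes; the 20 bp cutoff for "large" follows the text above, and the function names are illustrative.

```python
from collections import Counter


def indel_histogram(indel_sizes):
    """Count reads per net indel size (deletions negative, insertions positive)."""
    return Counter(indel_sizes)


def large_indel_percent(indel_sizes, total_reads, min_size=20):
    """Percent of all aligned reads carrying an indel of at least min_size bp."""
    large = sum(1 for size in indel_sizes if abs(size) >= min_size)
    return 100.0 * large / total_reads


# Hypothetical modified reads out of 200 aligned reads in total.
sizes = [-1] * 40 + [1] * 30 + [-3] * 20 + [-25] * 6 + [-50] * 4
hist = indel_histogram(sizes)
for size in sorted(hist):
    print(f"{size:+4d} bp  {'#' * hist[size]}")
print(f"large indels: {large_indel_percent(sizes, total_reads=200):.1f}% of reads")
```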

Putting the Report to Use

A targeted NGS report for CRISPR validation is not the end of the workflow. It is the data that feeds into line advancement, publication figures, and — for some projects — regulatory documentation.

Making a Line Table

For your own records, condense the NGS report for each line into a spreadsheet row:

Field | Example
Line ID | OsFTL-12-T1
Target locus | OsFTL1
Editing efficiency | 87.3%
Top allele + frequency | −7 bp deletion, 52.1%
Second allele + frequency | +1 bp insertion, 35.2%
Wild-type reads | 0.8%
Zygosity call | Biallelic
Frameshift? | Both alleles out-of-frame
Read depth | 1,247 aligned reads
Decision | Advance to T2 seed increase

This table becomes the supplementary data for a manuscript and the internal tracking document for the breeding or research program. For lines destined for publication, the article on CRISPR edited plant line selection and sequencing QC outlines the full evidence package.
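If you track many lines, the same row can be written programmatically. The sketch below uses Python's standard `csv` module with the example values from the table; the column names are hypothetical and can be adapted to whatever your lab's tracking sheet uses.

```python
import csv
import io

FIELDS = [
    "line_id", "target_locus", "editing_efficiency_pct",
    "top_allele", "top_allele_pct", "second_allele", "second_allele_pct",
    "wild_type_pct", "zygosity_call", "frameshift", "aligned_reads", "decision",
]

row = {
    "line_id": "OsFTL-12-T1",
    "target_locus": "OsFTL1",
    "editing_efficiency_pct": 87.3,
    "top_allele": "-7 bp deletion",
    "top_allele_pct": 52.1,
    "second_allele": "+1 bp insertion",
    "second_allele_pct": 35.2,
    "wild_type_pct": 0.8,
    "zygosity_call": "Biallelic",
    "frameshift": "Both alleles out-of-frame",
    "aligned_reads": 1247,
    "decision": "Advance to T2 seed increase",
}

# Write to an in-memory buffer; swap in open("lines.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping percentages and read counts as separate numeric columns, rather than combined text, makes the sheet easier to sort and filter later.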

When the Data Needs a Second Look

If the report does not match expectations — editing efficiency far below what the guide RNA design predicted, an unexpected wild-type allele in a line you thought was fixed, or allele frequencies that differ dramatically between duplicate samples — do not assume the NGS data is wrong. First, re-extract DNA from a new tissue sample and re-amplify. If the result holds, consider whether the original T0 or T1 plant was mislabeled, whether the target site has a natural polymorphism that interfered with primer binding, or whether the guide RNA has lower on-target activity than predicted.

CD Genomics provides CRISPR Validation Sequencing for Agriculture covering on-target genotyping by targeted NGS with analyzed data returned as allele frequency tables and editing efficiency summaries. The Targeted Sequencing platform supports custom amplicon panel design for projects requiring multiplexed screening. For data analysis support, Bioinformatics Analysis Services include custom allele calling, frameshift annotation, and integration with phenotypic data.

FAQ

Q1: What is a "good enough" editing efficiency for line advancement?
A: It depends on the editing goal, not a single universal threshold. For a knockout line, a frameshift allele above 90% frequency in T1 with no wild-type reads is sufficient for most functional studies. For a base edit at a specific nucleotide, editing efficiency as low as 10–20% may be acceptable if the desired conversion is confirmed and the line can be enriched through selection or selfing. There is no fixed minimum editing efficiency for publication — reviewers care about the quality of the zygosity evidence, not the raw efficiency number.

Q2: How do I distinguish a biallelic line from a mosaic line in the allele table?
A: A biallelic line shows two edited alleles, each at roughly 40–60% frequency, with no or very few wild-type reads. The two allele frequencies typically sum to near 100%. A mosaic line shows three or more alleles, often including wild-type, and the frequencies are more fragmented — for example, 30% wild-type, 20% edited allele A, 18% edited allele B, and several alleles below 10%. Biallelic is a stable genotype; mosaic requires progeny testing.

Q3: My T0 plant shows 95% editing efficiency with one dominant allele. Can I call it homozygous?
A: You can call it a candidate homozygous T0, but confirm in T1 progeny. A T0 plant regenerated from tissue culture may carry a uniform somatic genotype in the sampled tissue while the germline is different. Genotype 10–20 T1 progeny from this T0. If every T1 plant is homozygous for the same edit, with no wild-type or additional alleles segregating, the homozygous call is confirmed. If the T1 plants segregate instead, the T0 germline was not uniformly homozygous.

Q4: How deep does NGS sequencing need to go for a polyploid crop?
A: For a tetraploid crop (e.g., potato, durum wheat), aim for at least 1,000 aligned reads per sample. A hexaploid (e.g., bread wheat) may require 2,000 or more reads to resolve all allelic states with confidence. The key is not just total depth but even coverage across alleles — if one allele amplifies more efficiently than others, the reported frequencies can be skewed regardless of depth. A spike-in of known-ratio controls can help detect amplification bias.
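One reason polyploids need more depth is that the expected allele frequencies sit closer together. As a rough illustration, assuming a single primer pair amplifies every copy of the locus equally (an idealization, given the amplification bias noted above), an edit present in k of n copies should appear at about k/n of reads. The function names are hypothetical.

```python
def expected_dosage_percents(n_copies: int):
    """Idealized allele frequencies (%) for an edit present in 0..n copies."""
    return [100.0 * k / n_copies for k in range(n_copies + 1)]


def nearest_dosage(observed_pct: float, n_copies: int) -> int:
    """Copy number whose idealized frequency is closest to the observed value."""
    return min(range(n_copies + 1),
               key=lambda k: abs(observed_pct - 100.0 * k / n_copies))


print(expected_dosage_percents(4))  # tetraploid: steps of 25 points
print(expected_dosage_percents(6))  # hexaploid: steps of about 16.7 points
print(nearest_dosage(26.0, 4))
```

With steps of 25 points in a tetraploid and about 16.7 points in a hexaploid, sampling noise that is harmless in a diploid can blur adjacent dosage classes, which is where the extra reads help.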

Q5: What if the NGS report shows a small percentage of unexpected edits at the target site?
A: Low-frequency unexpected edits — a rare indel at 0.5%, a substitution at 0.3% — are normal in CRISPR editing and reflect the stochastic nature of DNA repair. In most cases, they do not affect line selection decisions. Document them for your records and focus on the dominant alleles. If unexpected edits exceed 5% combined frequency, the line may be more chimeric than anticipated and should be progeny-tested before advancement.

Glossary

Allele frequency: The percentage of sequencing reads supporting a specific edited or unedited allele at the target locus, calculated as reads for that allele divided by total aligned reads.

Aligned reads: The subset of total sequencing reads that map to the reference amplicon sequence. Only aligned reads are used for allele frequency calculations.

Biallelic edit: Two different edited alleles at approximately equal frequency, with no wild-type allele detectable. Both copies of the gene are edited, but differently.

Chimera / mosaic: A plant composed of cells with different genotypes at the edited locus, producing an allele table with three or more alleles at fragmented frequencies.

Editing efficiency (Modified%): The percentage of aligned reads carrying any sequence change — insertion, deletion, or substitution — relative to the wild-type reference within the quantification window.

Frameshift percentage: The percentage of reads, a subset of the modified reads, in which the net indel length is not a multiple of 3 bp, predicting a shift in the amino acid reading frame and likely gene knockout.

Heterozygous: One edited allele at approximately 50% frequency and one wild-type allele at approximately 50%.

Homozygous: A single edited allele at high frequency (typically above 90–95%), with trace or no wild-type reads. Both gene copies carry the same edit.

Quantification window: The region around the predicted cut site (typically centered 3 bp upstream of the PAM for Cas9) within which sequence variants are counted as editing events.

Read depth: The number of aligned sequencing reads covering the target amplicon per sample. Higher depth enables more confident detection of low-frequency alleles.

References

  1. Clement, K., Rees, H., Canver, M. C., Gehrke, J. M., Farouni, R., Hsu, J. Y., Cole, M. A., Liu, D. R., Joung, J. K., Bauer, D. E., & Pinello, L. "CRISPResso2 provides accurate and rapid genome editing sequence analysis." Nature Biotechnology, 2019, 37(3), 224–226. DOI: 10.1038/s41587-019-0032-3
  2. Gong, Z., Zhang, Y., Xia, D., Yoon, S., Crisp, P. A., & Botella, J. R. "Comprehensive benchmarking of genome editing quantification methods for plant applications." iScience, 2025, 28(6), 112350. DOI: 10.1016/j.isci.2025.112350
  3. Liu, X., Huang, L., Shen, Q., et al. "An Efficient and Cost-Effective Novel Strategy for Identifying CRISPR-Cas-Mediated Mutants in Plant Offspring." The CRISPR Journal, 2025, 8(1), 26–36. DOI: 10.1089/crispr.2024.0057
  4. Qin, Y., Yun, S. D., Kim, H. L., Choi, J. Y., Lim, M.-H., Oh, S. A., & Park, S. K. "Molecular Characterization of CRISPR-Cas9-Edited Rice Across Generations and Associated Technical Challenges in Nucleotide Editing Tracing." Plant Breeding and Biotechnology, 2025, 13, 207–228. DOI: 10.9787/PBB.2025.13.207
  5. Tsakirpaloglou, N., Septiningsih, E. M., & Thomson, M. J. "Guidelines for Performing CRISPR/Cas9 Genome Editing for Gene Validation and Trait Improvement in Crops." Plants, 2023, 12(20), 3564. DOI: 10.3390/plants12203564
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.