From Edited Plant to Lead Line: Selection, Sequencing QC, and Data Package
Figure 1: The edited plant selection funnel — from a population of regenerated lines through sequential QC checkpoints to a single lead line with a complete molecular characterization data package.
Generating edited plants is the start, not the end. A typical CRISPR editing project in crops produces dozens or hundreds of T0 regenerants, each carrying a different set of editing outcomes, integration patterns, and background mutations. Selecting which lines to advance — and building the evidence to support that decision — is where the project's scientific value is determined.
This article lays out a practical selection framework for moving from a population of edited plants to one or two lead lines with a publication-ready or regulatory-grade molecular characterization package. It assumes you already have on-target genotyping data in hand — the methods for generating that data are covered in CRISPR editing validation in crops and reading CRISPR NGS results.
What Lead Line Selection Means
Lead line selection is not the same as finding an edited plant. It is the process of narrowing a population of edited lines to the one or two that will be advanced into field trials, seed multiplication, publication figures, or regulatory review. A lead line carries the desired edit in a clean genetic background, is free of transgene sequences, and is supported by data that can withstand reviewer or regulator scrutiny.
The Selection Funnel
The selection process narrows lines at each step:
- All regenerated T0 lines — confirm that editing occurred at the target site. Lines with no detectable editing are discarded.
- Edited T0 lines with single-copy T-DNA — exclude lines with complex integration patterns, tandem repeats, or vector backbone integration.
- T1 progeny of selected T0 lines — confirm Mendelian segregation, identify homozygous or biallelic lines, and confirm absence of the Cas9 transgene in null segregants.
- Lead T1/T2 candidates — full molecular characterization: on-target allele verification with high-depth NGS, copy number confirmation, off-target screening, and phenotypic assessment.
- Final lead line — one line with a complete data package, matched wild-type comparator, and documented generational stability.
At each step, lines are eliminated. The goal is not to find every edited line but to find the cleanest one.
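The funnel above can be sketched as a sequence of filters applied in order. This is a minimal illustration, not a prescribed pipeline: the `Line` record, its field names, and the example values are all hypothetical, and the checkpoints are simplified to one boolean or integer each.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical per-line QC record; field names are illustrative only."""
    name: str
    edited: bool          # on-target editing detected at T0
    tdna_copies: int      # T-DNA copy number estimate (e.g. from ddPCR)
    backbone: bool        # vector backbone sequence detected
    clean_t1: bool        # T1 progeny include a Cas9-free homozygous/biallelic plant


def funnel(lines):
    """Apply the checkpoints in order and record how many lines survive each."""
    steps = [
        ("edited at target", lambda l: l.edited),
        ("single-copy, no backbone", lambda l: l.tdna_copies == 1 and not l.backbone),
        ("clean T1 segregant", lambda l: l.clean_t1),
    ]
    survivors = list(lines)
    report = [("regenerated T0", len(survivors))]
    for label, keep in steps:
        survivors = [l for l in survivors if keep(l)]
        report.append((label, len(survivors)))
    return survivors, report


# Made-up example population of five regenerants.
lines = [
    Line("L01", True, 1, False, True),
    Line("L02", True, 3, False, False),
    Line("L03", False, 1, False, False),
    Line("L04", True, 1, True, True),
    Line("L05", True, 1, False, False),
]
survivors, report = funnel(lines)
for label, count in report:
    print(f"{label}: {count}")
```

Recording the count at each checkpoint, rather than only the final survivors, makes it easy to see where a project is losing lines.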
Why Every Generation Needs Different QC
QC demands differ by generation. A T0 plant is a screening unit — its genotype reflects somatic tissue, not necessarily the germline, and chimerism is expected. The appropriate QC at T0 is confirmation that editing occurred, plus rough T-DNA copy number screening to exclude multi-copy lines from the labor-intensive T1 phase.
T1 is the first generation where zygosity is genetically meaningful. Homozygous and biallelic lines can be identified, and segregation of the Cas9 transgene away from the edited allele can be confirmed. This is the generation where most QC investment should be concentrated, because decisions made on T1 data determine which lines enter field trials or seed increase.
T2 and beyond confirm stability. The edit should be inherited without change, the transgene should remain absent, and the phenotype should be consistent across progeny. A single round of high-depth amplicon NGS plus copy number ddPCR at T2 is typically sufficient if T1 characterization was thorough.
Confirming the Edit
Before investing in off-target screening or copy number analysis, confirm exactly what edit is present in each candidate line.
On-Target Genotyping with High-Depth NGS
Targeted amplicon NGS at 500–1,000× depth provides the allele-level resolution needed for lead line characterization. Sanger sequencing is adequate for T0 screening but does not reliably detect alleles below 10–15% frequency or resolve complex indel patterns in polyploid crops. For lead lines, high-depth NGS answers three questions that Sanger cannot:
- Are there low-frequency edited alleles below the Sanger detection limit that could indicate residual chimerism?
- In a polyploid crop, which subgenomes carry the edit and at what zygosity?
- Is the wild-type allele truly absent, or present at 1–2% — trace levels that could indicate cross-contamination or residual heterozygosity?
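To see why depth matters for the trace-level questions above, the sketch below estimates the chance of observing a rare allele under simple binomial sampling. It assumes reads are independent draws and ignores sequencing error and PCR bias; the minimum of 5 supporting reads is an illustrative threshold, not one taken from this article.

```python
from math import comb


def detection_probability(depth, allele_freq, min_reads):
    """P(at least min_reads supporting reads) under simple binomial sampling."""
    p_below = sum(
        comb(depth, k) * allele_freq**k * (1 - allele_freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1 - p_below


# A 1% allele, requiring at least 5 supporting reads (assumed threshold).
for depth in (500, 1000):
    p = detection_probability(depth, 0.01, 5)
    print(f"{depth}x: P(detect) = {p:.2f}")
```

Under these assumptions a 1% allele is detected only a little more than half the time at 500× but more than 95% of the time at 1,000×, which is one reason to plan for the upper end of the depth range when trace wild-type reads matter.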
The amplicon panel approach described in CRISPR amplicon panel design covers the primer placement and coverage planning required to produce publication-grade allele data.
Zygosity Verification Across Generations
A zygosity call from T0 leaf tissue is not reliable for line advancement. Qin et al. (2025) characterized CRISPR-edited rice lines across multiple generations and documented cases where T0 genotypes did not predict T1 segregation patterns — a finding consistent with the well-established principle that T0 plants are often chimeric, and the germline genotype may differ from the somatic tissue sampled for genotyping.
Confirm zygosity at T1 or later. For a diploid crop, a homozygous call requires a single edited allele at >90% frequency with wild-type reads below 2–3% at high sequencing depth. For a biallelic call, two edited alleles should together account for >95% of reads. If wild-type reads persist at 3–5% in a line that should be homozygous, re-sample from a different tissue or self the line and check T2 progeny before concluding the call.
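The thresholds above can be turned into a simple rule-based caller for a diploid crop. This is a sketch under stated assumptions: the function name, the allele labels, and the choice of 3% as the cut between "clean" and "recheck" are illustrative, and real data would also need depth and error filtering.

```python
def call_zygosity(allele_freqs, wt_label="WT", wt_clean=0.03, wt_recheck=0.05):
    """Classify a diploid line from allele read fractions (values between 0 and 1).

    allele_freqs maps an allele label to its fraction of reads.
    Thresholds follow the text; wt_clean=0.03 is an assumed cut-off.
    """
    wt = allele_freqs.get(wt_label, 0.0)
    edited = sorted(
        (f for label, f in allele_freqs.items() if label != wt_label),
        reverse=True,
    )
    top = edited[0] if edited else 0.0
    top_two = sum(edited[:2])

    if wt < wt_clean:
        if top > 0.90:
            return "homozygous"
        if len(edited) >= 2 and top_two > 0.95:
            return "biallelic"
    elif wt <= wt_recheck and top_two > 0.90:
        # Edited alleles dominate but wild-type reads persist at 3-5%.
        return "recheck"
    return "heterozygous_or_chimeric"


print(call_zygosity({"del1": 0.97, "WT": 0.01}))
print(call_zygosity({"del1": 0.49, "ins1": 0.49, "WT": 0.01}))
print(call_zygosity({"del1": 0.95, "WT": 0.04}))
```

A "recheck" result maps to the advice above: re-sample a different tissue or check T2 progeny before finalizing the call.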
Confirming Transgene-Free Status
The Cas9 transgene and selection marker must be segregated away from the edited allele. PCR screening for Cas9 and the selectable marker in T1 progeny identifies null segregants — lines that carry the edit but not the transgene. Wang et al. (2024) demonstrated a dual-function D-amino acid oxidase selection system in Arabidopsis that simultaneously enriches for multiplex-edited lines and selects against Cas9-containing progeny, achieving 92.6% editing efficiency in T1 while eliminating the transgene by T2 without additional screening.
In practice, screen T1 plants by PCR for Cas9 and the selection marker. Self Cas9-negative plants and confirm the edited allele is homozygous in T2. For vegetatively propagated crops where segregation is not possible, select lines with the lowest T-DNA copy number at T0 and plan for transient expression or DNA-free delivery methods (RNP, viral vectors) in future projects.
Copy Number and Integration Analysis
Transgene copy number and integration pattern affect both regulatory status and the stability of the edited phenotype across generations.
Why Copy Number Matters
A single-copy, full-length T-DNA integration at a single genomic locus is the ideal outcome. Multi-copy integrations, tandem repeats, truncated inserts, and vector backbone sequences create complications:
- Multi-copy integrations are harder to segregate away from the edit, may cause gene silencing, and complicate regulatory approval.
- Tandem repeats can undergo homologous recombination in subsequent generations, producing unpredictable structural rearrangements.
- Vector backbone integration means sequences beyond the T-DNA borders — including antibiotic resistance genes in the plasmid backbone — are present in the plant genome, which can trigger additional regulatory scrutiny.
- Complex loci with insertions at multiple genomic locations require tracking multiple segregation loci, multiplying the population size needed to recover clean null segregants.
How to Assess Copy Number and Integration
| Method | What It Detects | Cost per Sample | Best For |
|---|---|---|---|
| ddPCR | T-DNA copy number (absolute quantification) | Low | Screening dozens of lines |
| qPCR | Relative copy number (requires calibrator) | Low | Budget-constrained labs |
| Southern blot | Integration pattern, band count | Medium | Historical standard, still required by some regulators |
| Paired-end WGS (low-coverage) | Insertion site, flanking sequences, structural arrangement | Medium | Full characterization of lead candidates |
| Whole-genome sequencing (30×) | Complete integration landscape, all insertion sites | High | Final lead line dossier |
Xu et al. (2024) compared all four major copy number estimation methods — Southern blot, qPCR, ddPCR, and paired-end WGS — across GM crop events and found that ddPCR provides the best combination of precision and throughput for routine screening, while paired-end WGS uniquely identifies insertion sites, structural arrangements, and backbone sequences that other methods miss.
A practical workflow: screen all edited T0 lines by ddPCR to select single-copy candidates, then perform paired-end WGS on the two to three lead candidates at T1 to confirm the integration structure and rule out backbone sequences. Reserve high-coverage WGS for the final lead line.
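For the ddPCR screening step, copy number is typically derived from the ratio of Poisson-corrected target and reference concentrations. The sketch below shows that arithmetic. It assumes a duplex assay (target and reference counted in the same droplets) and a reference gene present at 2 copies per diploid genome; the droplet counts are invented for illustration.

```python
from math import log


def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet from the positive-droplet fraction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -log(1 - positive / total)


def tdna_copy_number(target_pos, ref_pos, total, ref_copies_per_genome=2):
    """T-DNA copies per genome, scaled by an endogenous reference gene.

    ref_copies_per_genome=2 assumes a single-copy reference gene in a diploid.
    """
    target = copies_per_droplet(target_pos, total)
    ref = copies_per_droplet(ref_pos, total)
    return ref_copies_per_genome * target / ref


# Invented droplet counts for one T0 sample.
estimate = tdna_copy_number(target_pos=3381, ref_pos=6000, total=15000)
print(f"T-DNA copies per diploid genome: {estimate:.2f}")
```

A value near 1 is what a hemizygous single-copy insertion would give in a T0 plant under these assumptions; values near 2 or above point to multiple copies and would normally move the line out of the single-copy pool.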
Figure 2: Four methods for T-DNA copy number estimation — Southern blot, qPCR, ddPCR, and paired-end WGS — each with different resolution, throughput, and cost profiles for lead line characterization.
Off-Target Screening for Lead Candidates
Off-target analysis is the most debated component of the lead line data package — essential for regulatory dossiers, increasingly expected by journal reviewers, but challenging to perform and interpret correctly.
When Off-Target Analysis Is Required
Not every edited line needs a full off-target screen. Tier the investment by line stage:
| Line Stage | Off-Target Requirement | Method |
|---|---|---|
| T0 screening | None | — |
| T1 candidate selection | In silico prediction + top 5–10 sites by targeted NGS | Amplicon NGS of predicted off-target sites |
| Lead line (publication) | All predicted sites within 2 mismatches + top 50 genome-wide | Targeted amplicon panel + optional low-coverage WGS |
| Lead line (regulatory) | Genome-wide off-target assessment | WGS (30×) of edited line + matched WT comparator |
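The "within 2 mismatches" tier in the table can be illustrated with a plain mismatch count between the guide and each candidate protospacer. This is a deliberately simplified sketch: it ignores the PAM, bulges, and position-weighted scoring used by real prediction tools, and the guide and site sequences are made up.

```python
def mismatches(guide, site):
    """Count position-wise mismatches between two equal-length sequences."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def tier_sites(guide, candidates, max_mismatches=2):
    """Return names of candidate sites within max_mismatches, closest first."""
    scored = [(mismatches(guide, seq), name) for name, seq in candidates.items()]
    return [name for mm, name in sorted(scored) if mm <= max_mismatches]


# Hypothetical 20-nt guide and candidate off-target protospacers.
guide = "GACCTGAAGTTCATCTGCAC"
candidates = {
    "OT1": "GACCTGAAGTTCATCTGCAT",  # 1 mismatch
    "OT2": "GACCTGTAGTTCATCAGCAC",  # 2 mismatches
    "OT3": "GTCCTGTAGTTCTTCAGCAC",  # 4 mismatches
}
panel = tier_sites(guide, candidates)
print(panel)
```

In practice the candidate list would come from an in silico tool's output rather than being typed in by hand; the filter only decides which of those sites go onto the amplicon panel.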
Interpreting Off-Target Results
The most important concept in off-target analysis is the baseline: plants regenerated from tissue culture carry a substantial load of single-nucleotide variants (SNVs) regardless of whether CRISPR was applied. Sretenovic et al. (2023) demonstrated this in tomato by whole-genome sequencing of ABE8e-edited lines alongside GFP-expressing control plants regenerated through the same tissue culture pipeline. Both groups carried ~1,200–1,500 SNVs per plant, with no enrichment of editor-specific mutation signatures in the edited group.
The practical implication: finding an SNV in an edited line does not mean CRISPR caused it. Without a matched comparator (a wild-type plant of the same variety that went through the same tissue culture and regeneration process), it is impossible to distinguish editing-induced mutations from somaclonal variation. For publication, the minimum standard is to sequence the predicted off-target sites in both the edited line and a wild-type comparator. For regulatory dossiers, WGS of both the edited line and a matched WT comparator is the expected approach.
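One way to use a matched comparator is sketched below. Variants shared by both plants are treated as parental background, the comparator's private variants serve as the tissue-culture baseline, and only edited-line private variants that fall near a predicted off-target site are flagged. The variant tuples, the 20 bp window, and the function name are illustrative assumptions; real calls would come from a VCF after quality filtering.

```python
def compare_to_control(edited, control, predicted_sites, window=20):
    """Summarize edited-line variants against a matched tissue-culture control.

    Variants are (chrom, pos, ref, alt) tuples; predicted_sites are (chrom, pos).
    """
    edited, control = set(edited), set(control)
    edited_private = edited - control
    near_predicted = {
        v for v in edited_private
        if any(v[0] == chrom and abs(v[1] - pos) <= window
               for chrom, pos in predicted_sites)
    }
    return {
        "shared": len(edited & control),            # parental background
        "edited_private": len(edited_private),
        "control_private": len(control - edited),   # somaclonal baseline
        "near_predicted_sites": sorted(near_predicted),
    }


# Made-up variant calls for illustration.
edited_calls = [("chr1", 100, "A", "G"), ("chr1", 5000, "C", "T"),
                ("chr2", 300, "G", "A"), ("chr3", 42, "T", "C")]
control_calls = [("chr1", 100, "A", "G"), ("chr2", 900, "C", "T"),
                 ("chr4", 77, "G", "A")]
result = compare_to_control(edited_calls, control_calls, [("chr2", 310)])
print(result)
```

If the edited line's private variant count is in the same range as the comparator's and none sit at predicted sites, the data give no indication of editor-induced off-target mutation, which is the comparison the tomato study above relied on.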
Building the Data Package
The data package is the final deliverable that supports the lead line selection decision. Its contents depend on the line's destination — a publication, a regulatory submission, or internal advancement in a breeding program.
Publication Data Package
For a peer-reviewed publication reporting a new edited crop line, reviewers typically expect:
- On-target editing evidence. High-depth targeted NGS data for the edited line and a WT comparator, showing the allele table, editing efficiency, and zygosity. Raw sequencing data deposited in a public read archive (for example, NCBI SRA or ENA).
- Off-target assessment. All predicted off-target sites sequenced in the edited line and WT, with results presented as a supplementary table. If WGS was performed, include the WGS comparison between the edited line and WT.
- Copy number and integration data. ddPCR or Southern blot confirming single-copy T-DNA integration. Evidence of transgene absence in the lead line by PCR spanning the Cas9 and selection marker cassettes.
- Generational stability. Amplicon NGS data from at least two generations (T1 and T2, or T2 and T3) showing the edit is stably inherited without sequence change.
- Phenotypic characterization. The edited phenotype compared to WT under controlled conditions, with replicate data. For agronomic traits, multi-location field trial data may be expected.
Qin et al. (2025) provide a worked example of this data package for CRISPR-edited rice lines, covering edited nucleotide sequences, gene expression, target phenotypes, transgene absence, and potential off-target effects across generations. Their study also documents the technical challenges encountered — including difficulty tracing specific nucleotide edits through PCR-based methods — which is valuable context for researchers building their first data package.
Internal Advancement Data Package
For lines advancing within a breeding program rather than to publication, the data package can be leaner:
- High-depth amplicon NGS confirming the edit at T1 and T2
- ddPCR confirming single-copy integration in the T0 parent and transgene-free status in the advanced line
- Phenotype data from at least one generation of controlled-environment testing
The key difference is that off-target WGS is usually unnecessary for internal advancement. If the line later becomes a commercial candidate, the full regulatory package can be built at that point.
Figure 3: The five core components of a publication-ready lead line data package: on-target genotyping, off-target assessment, copy number data, generational stability, and phenotypic characterization.
Common Selection Mistakes
Advancing a T0 line based on leaf genotype alone. T0 leaf tissue reflects somatic editing, not necessarily germline transmission. A T0 plant with a perfect homozygous edit in leaf DNA may produce T1 progeny that segregate for the wild-type allele. Always confirm zygosity at T1 before making advancement decisions.
Assuming one round of Sanger sequencing is sufficient for lead line QC. Sanger sequencing has a detection floor around 10–15% allele frequency. Low-frequency edited alleles, residual wild-type reads at 2–5%, and subgenome-specific alleles in polyploids are invisible to Sanger but detectable by targeted NGS. For a lead line heading to publication or field trials, high-depth NGS is the appropriate tool.
Skipping the matched WT comparator for off-target analysis. Without sequencing a WT plant that went through the same tissue culture process, somaclonal variants are indistinguishable from off-target edits. This is the single most common reason reviewers request additional experiments.
Ignoring copy number in favor of editing efficiency. A line with a perfect edit at 95% efficiency but three T-DNA copies at two loci is a worse lead candidate than a line with 80% efficiency and a single-copy integration. The multi-copy line will require larger T1 populations to recover clean null segregants, and the edit may be linked to a T-DNA insertion in a gene that complicates phenotyping.
Failing to deposit raw sequencing data. Most plant journals now require raw NGS data deposition in a public repository. Plan for this before sequencing: ensure the file formats and sample metadata meet repository requirements.
From Selection to Production
A lead line with a complete molecular data package is the deliverable that converts an editing project into a publication, a field trial application, or a breeding program entry point. The investment in proper QC — high-depth NGS, copy number analysis, off-target screening, and generational stability data — is repaid in reviewer confidence, regulatory clarity, and the avoidance of costly re-work when a hastily advanced line fails at the next stage.
For researchers who need support building a lead line data package, CD Genomics provides CRISPR Validation Sequencing for Agriculture, which includes high-depth targeted amplicon NGS with allele frequency analysis for on-target confirmation and zygosity calling. The Targeted Sequencing platform supports custom amplicon panels for off-target candidate screening, and Bioinformatics Analysis Services provide WGS-based off-target analysis, copy number characterization, and integration site mapping.
For the broader strategy around off-target evidence — including when genome-wide screening is warranted and how to interpret conflicting off-target prediction results — the article on CRISPR off-target evidence for crop research covers the topic in depth.
FAQ
Q1: How many T1 plants should I genotype to find a clean homozygous null segregant?
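A: For a single-copy T-DNA insertion at one locus, plan on about 47 T1 plants for a >95% probability of finding at least one homozygous edited, transgene-free plant. This assumes Mendelian segregation of both the edited allele (1:2:1 ratio, so 1/4 homozygous edited) and the T-DNA (3:1 for presence:absence, so 1/4 transgene-free), with the two loci unlinked. The desired class is then 1/16 of T1 progeny, and 16 plants gives only about a 64% chance of recovering one. If the T-DNA and the edit are linked on the same chromosome, or if the line has multiple T-DNA insertions, the population must be larger still: with two unlinked T-DNA loci the desired class falls to 1/64, which means roughly 190 plants for the same confidence. Genotype all plants by PCR or targeted NGS rather than assuming segregation ratios.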
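The population size follows from a short probability calculation. Under the stated assumptions (unlinked loci, 1/4 of progeny homozygous for the edit, 1/4 lacking the T-DNA), the desired class is expected at 1/4 × 1/4 = 1/16 of T1 plants. The sketch below computes the chance of recovering at least one such plant and the number of plants needed for a chosen confidence; the function names are illustrative.

```python
from math import ceil, log


def p_at_least_one(n_plants, p_desired):
    """Probability that at least one of n_plants is the desired genotype."""
    return 1 - (1 - p_desired) ** n_plants


def plants_needed(p_desired, confidence=0.95):
    """Smallest population giving at least the requested confidence."""
    return ceil(log(1 - confidence) / log(1 - p_desired))


one_locus = (1 / 4) * (1 / 4)              # homozygous edit x T-DNA absent
two_loci = (1 / 4) * (1 / 4) * (1 / 4)     # two unlinked T-DNA loci

print(f"16 plants, one T-DNA locus: {p_at_least_one(16, one_locus):.2f}")
print(f"plants for 95%, one locus:  {plants_needed(one_locus)}")
print(f"plants for 95%, two loci:   {plants_needed(two_loci)}")
```

The same function can be rerun with a different starting frequency, for example when the T0 parent is already biallelic and a larger share of progeny carry only edited alleles.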
Q2: Is whole-genome sequencing always necessary for off-target analysis?
A: No. For most publication-grade lead lines, targeted NGS of the top 5–50 predicted off-target sites — determined by in silico tools (CRISPOR, Cas-OFFinder) and, if available, experimental detection (GUIDE-seq, CIRCLE-seq) — is sufficient when performed on both the edited line and a WT comparator. WGS becomes necessary when: (a) the line is destined for a regulatory dossier, (b) the guide RNA has high sequence similarity to many genomic regions, or (c) the crop has a complex, polyploid genome where in silico prediction tools have limited accuracy. The article on CRISPR off-target evidence for crop research covers this decision in detail.
Q3: What copy number is "acceptable" for a lead line?
A: Single-copy, full-length T-DNA integration at one genomic locus is the gold standard. A single-copy insertion with a short truncation at one end is usually acceptable if the truncation does not include the Cas9 or guide RNA expression cassette. Two copies at unlinked loci are workable but require larger T1 populations to isolate clean null segregants. More than two copies, tandem repeats, or vector backbone integration should disqualify a line from lead status unless the species is exceptionally difficult to transform and no better lines exist.
Q4: How do I handle a polyploid crop where zygosity calls are ambiguous?
A: In polyploid crops (wheat, potato, canola), a single "homozygous" call across all subgenomes is rarely achievable in one generation. Instead, document which subgenomes carry the edit, at what allele frequencies, and with what zygosity — using the subgenome-specific primer strategies described in the panel design article. Advance the line if the desired edit is present in the target subgenome(s) and no wild-type allele is detectable at high depth. Full subgenome resolution can be achieved in subsequent generations through selfing or crossing.
Q5: How long does the full lead line characterization process take?
A: From T0 regenerant to a publication-ready data package for one lead line: roughly 9–12 months for a seed-propagated crop with a 3–4 month generation time (rice, tomato, Arabidopsis). This includes T1 seed production (3–4 months), T1 genotyping and selection (1–2 months), T2 seed production (3–4 months), T2 confirmation genotyping (1 month), and data analysis and packaging (1 month). Run back to back, these stages sum to 9–12 months; the lower end can be trimmed only by overlapping steps, for example genotyping T1 seedlings while they grow on to set T2 seed. For crops with longer generation times or those requiring vernalization, budget 18–24 months. The timeline can be shortened by using speed breeding or off-season nurseries but cannot be compressed below the biological minimum of two generations to confirm stable inheritance.
References
- Qin, Y., Yun, S. D., Kim, H. L., Choi, J. Y., Lim, M.-H., Oh, S. A., & Park, S. K. "Molecular Characterization of CRISPR-Cas9-Edited Rice Across Generations and Associated Technical Challenges in Nucleotide Editing Tracing." Plant Breeding and Biotechnology, 2025, 13, 207–228. DOI: 10.9787/PBB.2025.13.207
- Sretenovic, S., Green, Y., Wu, Y., Cheng, Y., Zhang, T., Van Eck, J., & Qi, Y. "Genome- and transcriptome-wide off-target analyses of a high-efficiency adenine base editor in tomato." Plant Physiology, 2023, 193(1), 291–303. DOI: 10.1093/plphys/kiad347
- Xu, W., Liang, J., Wang, F., & Yang, L. "Comparative evaluation of gene copy number estimation techniques in genetically modified crops: insights from Southern blotting, qPCR, dPCR and NGS." Plant Biotechnology Journal, 2024, 22(12), 3456–3458. DOI: 10.1111/pbi.14466
- Wang, F.-Z., Bao, Y., Li, Z., Xiong, X., & Li, J.-F. "A dual-function selection system enables positive selection of multigene CRISPR mutants and negative selection of Cas9-free progeny in Arabidopsis." aBIOTECH, 2024, 5, 140–150. DOI: 10.1007/s42994-023-00132-6
- Ikram, M., Rauf, A., Rao, M. J., Maqsood, M. F. K., Bakhsh, M. Z. M., Ullah, M., Batool, M., Mehran, M., & Tahira, M. "CRISPR-Cas9 based molecular breeding in crop plants: a review." Molecular Biology Reports, 2024, 51(1), 227. DOI: 10.1007/s11033-023-09086-w
This article is for Research Use Only. CD Genomics provides agricultural genomics services for research purposes; it does not provide clinical diagnosis, treatment recommendations, or regulatory approval guarantees.