CRISPR Screening Sequencing in Plants: gRNA Library Readout
Figure 1: A pooled CRISPR screening workflow in plants — from gRNA library construction through Agrobacterium-mediated transformation to NGS-based gRNA readout and hit identification.
A single-gene CRISPR knockout answers one question: what happens when this gene is disrupted. A pooled CRISPR screen answers hundreds or thousands of questions in one experiment: which genes, when disrupted, produce a phenotype of interest. The difference is scale — and the data that comes back from a pooled screen is fundamentally different from the allele tables discussed in the earlier articles in this series.
This article covers the design and execution of a pooled CRISPR screening experiment in plants, with a focus on what happens after the plants regenerate: the NGS readout that connects each gRNA sequence to its phenotypic effect. It assumes familiarity with CRISPR editing validation in crops and CRISPR amplicon panel design, and serves as a bridge to the higher-throughput applications those earlier articles reference.
How a Pooled CRISPR Screen Works
The logic of a pooled screen is straightforward. A library of gRNAs — each targeting a different gene or genomic region — is synthesized as a pool, cloned into a plant transformation vector, and introduced into plants via Agrobacterium. Each transformed plant or callus receives one gRNA on average. After selection and regeneration, the population of transgenic plants is screened for a phenotype. The gRNA sequences carried by plants that pass the screen are then identified by NGS, revealing which genes were disrupted.
The Core Workflow
A plant pooled CRISPR screen has five stages:
Stage 1 — Library design and synthesis. gRNA sequences are designed computationally, targeting the genes or genomic regions of interest. The designed oligos are synthesized as a pool on a microarray and cloned into a plant-compatible CRISPR vector.
Stage 2 — Transformation. The pooled plasmid library is introduced into Agrobacterium, which is then used to transform plant material — typically callus, leaf discs, or hypocotyl explants. The Agrobacterium titer and co-cultivation conditions are optimized so that most transgenic events receive a single T-DNA insertion.
Stage 3 — Regeneration and selection. Transformed cells are selected on antibiotic or herbicide-containing media, and plants are regenerated through tissue culture. The number of independent transgenic lines produced determines the screen's coverage of the gRNA library.
Stage 4 — Phenotypic screening. The population of transgenic plants is subjected to a selection pressure or phenotyping assay — drought stress, herbicide application, pathogen challenge, or simply growth measurement — and plants showing the phenotype of interest are identified.
Stage 5 — gRNA readout by NGS. Genomic DNA is extracted from selected plants, the gRNA cassette is amplified by PCR, and the products are sequenced. The frequency of each gRNA sequence in the selected population, compared to its frequency in the original plasmid pool or an unselected control population, indicates which gene disruptions enriched or depleted under the screening condition.
Arrayed vs. Pooled: Two Screening Architectures
| Feature | Arrayed Screen | Pooled Screen |
|---|---|---|
| gRNA identity known? | Yes — each plant carries a known gRNA | No — gRNA identity determined by NGS after phenotyping |
| Throughput | Limited by transformation capacity (tens to low hundreds) | High — thousands of gRNAs in one experiment |
| Infrastructure | Requires tracking individual lines | Requires NGS readout for deconvolution |
| Phenotyping | Can measure complex, multi-parameter traits | Best suited for selectable or scorable phenotypes |
| Plant species | Any transformable species | Requires efficient, high-throughput transformation |
Arrayed screens — where each gRNA is transformed separately and the resulting plants are tracked individually — produce clean genotype-phenotype links but are labor-intensive. The FLASH pipeline (Yao et al., 2023) occupies a middle ground: it uses 12 length-based PCR tags to index individual gRNA constructs, enabling 12 constructs to be pooled per Agrobacterium mixture while retaining the ability to identify each gRNA by simple PCR rather than NGS.
Pooled screens trade the certainty of knowing which gRNA went into which plant for the efficiency of running thousands of gRNAs in a single transformation experiment. The NGS readout at the end re-establishes the genotype-phenotype link by counting gRNA sequences.
Figure 2: Arrayed versus pooled CRISPR screening architectures — arrayed screens track individual gRNA identities from the start; pooled screens identify gRNAs by NGS after phenotypic selection.
gRNA Library Design
The quality of a screening result is set at the library design stage. A well-designed library maximizes on-target activity while minimizing the number of transgenic lines needed for adequate coverage.
How Many gRNAs Per Gene
For a CRISPR knockout screen in plants, four to six gRNAs per gene is a standard recommendation. Fewer than four risks missing genes where some gRNAs have low activity — a significant concern in plants, where gRNA efficiency prediction tools trained on mammalian data do not always transfer well. Pan et al. (2023) note that gRNA activity prediction remains a challenge for plant screens, particularly for species beyond Arabidopsis and rice where training data are sparse.
More than six gRNAs per gene improves statistical power incrementally but increases the library size — and therefore the number of transgenic lines needed — proportionally. For most crop screening projects, the practical constraint is not gRNA design capacity but transformation throughput.
Controls and Internal Standards
Every gRNA library should include:
- Non-targeting control gRNAs. gRNA sequences with no match in the target genome. These establish the baseline for gRNA enrichment or depletion analysis.
- Positive control gRNAs targeting known genes. For example, gRNAs targeting genes with a phenotype that is easy to score — chlorophyll biosynthesis genes produce visible pale or albino plants when knocked out — confirm that the screening pipeline is working end to end.
- Redundant gRNAs for essential genes. gRNAs targeting genes required for plant viability should be depleted from the regenerated population relative to the plasmid pool. Their depletion after transformation and regeneration confirms that CRISPR editing is functional.
Library Uniformity
The synthesized oligo pool is never perfectly uniform — some sequences are overrepresented and others underrepresented. An NGS check of the plasmid pool before transformation establishes the baseline distribution. If certain gRNAs are severely underrepresented in the starting pool, they will be difficult to detect in the final readout regardless of their biological effect.
A practical threshold: each gRNA should be represented by at least 50–100 reads in the plasmid pool NGS data to be reliably tracked through the screen. gRNAs below this threshold should either be resynthesized or excluded from the final analysis.
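The threshold check above can be sketched in a few lines of Python. This is a minimal sketch, assuming the plasmid-pool reads have already been counted into a dictionary of gRNA ID to read count. The function name `flag_underrepresented`, the gRNA IDs, and the counts are hypothetical, and the default of 50 reads is simply the lower bound of the range given above.

```python
from typing import Dict, List


def flag_underrepresented(counts: Dict[str, int], min_reads: int = 50) -> List[str]:
    """Return the gRNA IDs whose plasmid-pool read count is below min_reads."""
    return sorted(grna for grna, n in counts.items() if n < min_reads)


# Hypothetical plasmid-pool counts, for illustration only.
plasmid_counts = {"gRNA_001": 412, "gRNA_002": 37, "gRNA_003": 98, "gRNA_004": 0}
low = flag_underrepresented(plasmid_counts, min_reads=50)
print(low)
```

Designed gRNAs that received no reads at all need an explicit zero entry in the count table, otherwise they are silently missed by a check like this one.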
Delivery and Library Representation
Getting the gRNA library into plants is the step that most distinguishes plant screens from the mammalian screens that dominate the CRISPR screening literature.
The Agrobacterium Bottleneck
Agrobacterium-mediated transformation is currently the only delivery method demonstrated for pooled CRISPR libraries in plants (Pan et al., 2023). Under optimized conditions, most transgenic events carry a single T-DNA integration, which is ideal for genotype-phenotype linkage but means that library representation is determined by the number of independent transgenic lines produced.
The math is simple: to achieve 100× coverage of a 1,000-gRNA library, you need approximately 100,000 independent transgenic lines. In crop species where transformation efficiency is 5–10% and regeneration takes 3–6 months, this scale of transformation is impractical for most individual labs. This is why the largest plant CRISPR screens to date have been conducted in rice — the most efficiently transformed crop — and why screens in wheat, maize, and soybean have been limited to smaller gene families.
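The same arithmetic as a short Python sketch, which can help when comparing library sizes during planning. It assumes that coverage means the average number of independent lines per gRNA and that transformation efficiency means independent transgenic lines recovered per explant inoculated. Both function names are hypothetical.

```python
import math


def lines_needed(n_grnas: int, coverage: float) -> int:
    """Independent transgenic lines needed for a given average coverage."""
    return math.ceil(n_grnas * coverage)


def explants_needed(n_lines: int, efficiency: float) -> int:
    """Explants to inoculate, assuming efficiency = lines recovered per explant."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(n_lines / efficiency)


lines = lines_needed(1_000, 100)
print(lines)                         # lines for 100x coverage of 1,000 gRNAs
print(explants_needed(lines, 0.10))  # explants at 10% efficiency
print(explants_needed(lines, 0.05))  # explants at 5% efficiency
```

At 5–10% efficiency the explant count runs to one or two million for this library, which is the practical reason coverage targets are lowered or libraries shrunk in most crops.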
Practical Coverage Targets
| Screen Type | Library Size | Recommended Coverage | Lines Needed |
|---|---|---|---|
| Gene family screen (e.g., transcription factors) | 100–500 genes, 400–3,000 gRNAs | 50–100× | 20,000–300,000 |
| Pathway-focused screen | 50–200 genes, 200–1,200 gRNAs | 50× | 10,000–60,000 |
| Genome-wide screen (rice, Arabidopsis) | 20,000–35,000 genes, 80,000–150,000 gRNAs | 10–30× | 800,000–4,500,000 |
Genome-wide screens at high coverage remain a major undertaking in plants. Liu et al. (2023) reviewed the scale of published plant screens and found that even the largest efforts — rice genome-wide knockout libraries with 84,000–88,000 gRNAs and 14,000–84,000 transgenic lines — achieve only partial coverage at the transgenic plant level, with the NGS readout compensating by detecting gRNA representation in bulk tissue from pooled populations.
Maintaining Representation Through Tissue Culture
Library representation is lost at every step: during Agrobacterium transformation, during antibiotic selection, and during regeneration. The gRNA sequences that survive to the regenerated plant stage are a subset of the original library, and the subset is not random — gRNAs targeting genes involved in regeneration or stress response may be systematically lost.
A practical mitigation: sequence the gRNA cassette from a sample of regenerated plants before phenotyping. This "pre-selection" NGS dataset establishes which gRNAs are actually present in the transformed population and at what frequencies, providing the baseline against which post-selection enrichment is measured.
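One way to turn that dataset into a baseline is sketched below in Python. It assumes a list of designed gRNA IDs and a dictionary of read counts from the regenerated, not yet phenotyped population. The function name `baseline_frequencies` and all IDs and counts are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def baseline_frequencies(
    designed: Iterable[str], counts: Dict[str, int]
) -> Tuple[Dict[str, float], List[str]]:
    """Return per-gRNA read frequencies and the designed gRNAs with zero reads."""
    designed = list(designed)
    total = sum(counts.get(g, 0) for g in designed)
    if total == 0:
        raise ValueError("no reads assigned to any designed gRNA")
    freqs = {g: counts.get(g, 0) / total for g in designed}
    dropouts = [g for g in designed if counts.get(g, 0) == 0]
    return freqs, dropouts


# Hypothetical counts from regenerated plants, for illustration only.
designed = ["gRNA_001", "gRNA_002", "gRNA_003", "gRNA_004"]
pre_counts = {"gRNA_001": 60, "gRNA_002": 30, "gRNA_003": 10}
freqs, dropouts = baseline_frequencies(designed, pre_counts)
print(freqs)
print(dropouts)
```

gRNAs in the dropout list cannot score as enriched or depleted later, so they are best reported separately rather than treated as negative results.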
NGS Readout of gRNA Sequences
The NGS readout is the step that converts a pool of plants into a ranked gene list. It answers the question: which gRNAs are enriched or depleted in the plants that passed the screen?
gRNA as Its Own Barcode
Some CRISPR screening designs add a separate barcode sequence to each gRNA construct for amplification and counting. In plant screens, the gRNA sequence itself usually serves as the barcode: the variable 20-nucleotide protospacer is sufficiently diverse to uniquely identify each construct in the library. This simplifies library construction but places higher demands on PCR amplification uniformity, since gRNA sequences with extreme GC content or secondary structure may amplify less efficiently.
PCR Amplification and Sequencing
The standard approach is a two-round PCR:
- Round 1 amplifies the gRNA cassette from genomic DNA using primers that anneal to conserved regions flanking the gRNA in the vector backbone. This ensures all gRNA sequences are amplified with the same primer pair.
- Round 2 attaches sample-specific barcodes and Illumina adapters for multiplexed sequencing.
The PCR should use the minimum number of cycles that produces sufficient product — typically 18–22 cycles for Round 1 — to minimize PCR duplicates that inflate apparent gRNA counts. After sequencing, reads are trimmed to extract the 20-nucleotide gRNA sequence and counted.
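The trimming and counting step can be sketched in Python as follows. This is a simplified illustration, not a production pipeline: it assumes reads are already available as plain strings, that the protospacer sits between two constant flanking sequences, and that only exact matches to the library are counted. The flank sequences, spacers, and the function name `count_grnas` are hypothetical and must be replaced with the values for the actual vector.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def count_grnas(
    reads: Iterable[str],
    library: Dict[str, str],
    upstream: str,
    downstream: str,
    spacer_len: int = 20,
) -> Counter:
    """Count reads per gRNA by exact match of the spacer between two flanks."""
    pattern = re.compile(
        re.escape(upstream) + r"([ACGT]{%d})" % spacer_len + re.escape(downstream)
    )
    counts: Counter = Counter()
    for read in reads:
        match = pattern.search(read.upper())
        # Reads without both flanks, or with a spacer not in the library, are skipped.
        if match and match.group(1) in library:
            counts[library[match.group(1)]] += 1
    return counts


# Hypothetical flanks and spacers, for illustration only.
UP, DOWN = "ACTAGTCG", "GTTTTAGAGC"
SPACER_A = "GATTACAGATTACAGATTAC"
SPACER_B = "CCGGTTAACCGGTTAACCGG"
library = {SPACER_A: "gRNA_A", SPACER_B: "gRNA_B"}
reads = [
    "NN" + UP + SPACER_A + DOWN + "TAG",
    UP + SPACER_A + DOWN,
    UP + SPACER_B + DOWN,
    UP + "A" * 20 + DOWN,  # spacer not in the library
    "GGGGGG",              # no flanks
]
counts = count_grnas(reads, library, UP, DOWN)
print(dict(counts))
```

Exact matching is the conservative choice here. Tolerating mismatches recovers more reads but risks assigning a read to the wrong gRNA when library members differ by only a few bases.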
Sequencing Depth Requirements
For a library of 1,000 gRNAs sequenced to an average depth of 100 reads per gRNA (100× coverage), approximately 100,000 reads per sample are needed, which is easily achieved on a MiSeq or a fraction of a NextSeq run. For larger libraries, scale linearly: a 100,000-gRNA library at 30× coverage requires 3 million reads per sample. Multiple samples can be multiplexed in the same sequencing run, making the per-sample sequencing cost modest compared to the transformation and tissue culture costs.
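A small Python sketch of this linear scaling, including how many samples fit in one run. The run yield used in the example is a hypothetical planning number, not an instrument specification, and the function names are illustrative.

```python
def reads_per_sample(n_grnas: int, reads_per_grna: int) -> int:
    """Reads needed per sample for a target average number of reads per gRNA."""
    return n_grnas * reads_per_grna


def samples_per_run(run_reads: int, n_grnas: int, reads_per_grna: int) -> int:
    """Whole samples that fit in a run yielding run_reads usable reads."""
    return run_reads // reads_per_sample(n_grnas, reads_per_grna)


small = reads_per_sample(1_000, 100)
large = reads_per_sample(100_000, 30)
print(small, large)

# Hypothetical run yield of 20 million usable reads.
print(samples_per_run(20_000_000, 1_000, 100))
print(samples_per_run(20_000_000, 100_000, 30))
```

In practice some reads fail the flank or library match, so planning with a margin above the computed minimum is a reasonable precaution.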
Figure 3: The gRNA NGS readout and analysis pipeline — from genomic DNA through PCR amplification, sequencing, and MAGeCK-based hit calling to produce a ranked candidate gene list.
From Read Counts to Hit Lists
The raw output of the gRNA NGS readout is a table of gRNA sequences and their read counts in each sample. Converting this into a ranked list of candidate genes requires statistical analysis.
Normalization and Comparison
The core analysis compares gRNA abundance in the selected population to a reference — typically the pre-selection plasmid pool or an unselected control population of transgenic plants. The most widely used tool for this analysis is MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout), originally developed for mammalian screens but applicable to any organism. MAGeCK normalizes read counts across samples, aggregates gRNA-level counts into gene-level scores, and ranks genes by the statistical significance of their enrichment or depletion.
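To make the comparison concrete, here is a deliberately simplified Python sketch of the gRNA-level step: scale each sample to counts per million, add a pseudocount, and take the log2 ratio. This is not MAGeCK's algorithm, whose normalization and statistical model are more involved. The function name, pseudocount, gRNA IDs, and counts are all assumptions for illustration.

```python
import math
from typing import Dict


def log2_fold_changes(
    selected: Dict[str, int], reference: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-gRNA log2 fold change of selected vs reference after CPM scaling."""
    sel_total = sum(selected.values())
    ref_total = sum(reference.values())
    if sel_total == 0 or ref_total == 0:
        raise ValueError("both samples need at least one read")
    lfc = {}
    for grna, ref_count in reference.items():
        sel_cpm = selected.get(grna, 0) / sel_total * 1e6
        ref_cpm = ref_count / ref_total * 1e6
        lfc[grna] = math.log2((sel_cpm + pseudocount) / (ref_cpm + pseudocount))
    return lfc


# Hypothetical counts, for illustration only.
reference = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
selected = {"g1": 400, "g2": 100, "g3": 100}  # g4 absent after selection
lfc = log2_fold_changes(selected, reference)
for grna, value in lfc.items():
    print(grna, round(value, 2))
```

Note that g2 and g3 come out slightly negative even though their raw counts did not change, because the enrichment of g1 shifts the total. This compositional effect is one reason dedicated tools use more robust normalization than total-count scaling.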
Hit Calling in Plant Screens
Hit calling in plant screens is less standardized than in mammalian screens, for two reasons. First, plant screens typically involve fewer biological replicates, because the cost and time of plant transformation make the replicate numbers used in mammalian screens impractical. Second, gRNA efficiency varies more in plants, meaning that different gRNAs targeting the same gene may show inconsistent enrichment patterns.
A practical approach:
- Require concordance across gRNAs. A gene is a stronger hit if multiple independent gRNAs targeting it show the same direction of enrichment or depletion, rather than a single gRNA driving the gene-level score.
- Use a relaxed significance threshold for discovery. In a screen of 1,000 genes, a false discovery rate of 10–20% may be acceptable for generating candidate lists, because every candidate is re-tested in follow-up experiments before it is treated as a hit.
- Validate with independent experiments. The NGS readout identifies candidates. Confirming those candidates with single-gene knockouts — using the targeted amplicon approach described in the panel design article — converts candidates into validated hits.
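The concordance requirement in the first point can be expressed as a simple filter. The Python sketch below assumes per-gRNA log2 fold changes have already been computed. The thresholds (an absolute log2 fold change of 1 and at least two agreeing gRNAs), the function name, and the example values are hypothetical and should be tuned per screen.

```python
from collections import defaultdict
from typing import Dict


def concordant_genes(
    lfc: Dict[str, float],
    grna_to_gene: Dict[str, str],
    min_abs_lfc: float = 1.0,
    min_concordant: int = 2,
) -> Dict[str, str]:
    """Genes with enough gRNAs moving the same way and none moving the other way."""
    tally = defaultdict(lambda: {"up": 0, "down": 0})
    for grna, value in lfc.items():
        gene = grna_to_gene.get(grna)
        if gene is None:
            continue
        if value >= min_abs_lfc:
            tally[gene]["up"] += 1
        elif value <= -min_abs_lfc:
            tally[gene]["down"] += 1
    hits = {}
    for gene, t in tally.items():
        if t["up"] >= min_concordant and t["down"] == 0:
            hits[gene] = "enriched"
        elif t["down"] >= min_concordant and t["up"] == 0:
            hits[gene] = "depleted"
    return hits


# Hypothetical log2 fold changes, for illustration only.
lfc = {"x1": 2.1, "x2": 1.5, "x3": 0.2, "x4": 1.8,
       "y1": 3.0, "y2": -0.1, "y3": 0.0,
       "z1": -1.5, "z2": -2.0, "z3": -1.2}
grna_to_gene = {g: "gene" + g[0].upper() for g in lfc}
hits = concordant_genes(lfc, grna_to_gene)
print(hits)
```

geneY is excluded even though one of its gRNAs has the largest fold change in the table, which is exactly the single-gRNA artifact the concordance rule is meant to catch.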
Plant-Specific Screening Challenges
Several challenges differentiate plant CRISPR screens from their mammalian counterparts and shape the practical advice in this article.
Transformation Throughput
Mammalian CRISPR screens routinely achieve 500× library coverage because lentiviral transduction of millions of cells is fast and scalable. Plant screens are limited by the throughput of Agrobacterium-mediated transformation and tissue culture. For most crop species, a few hundred to a few thousand independent transgenic lines is a realistic scale for a single lab. This constrains library size to hundreds of genes rather than genome-wide.
Cheng et al. (2025) demonstrated that efficient gRNA activity pre-screening in protoplasts can reduce the number of gRNAs needed per gene — if only the most active one or two gRNAs are used, the library shrinks and fewer transgenic lines are needed for the same coverage. Protoplast-based gRNA validation before committing to whole-plant transformation is an underused strategy in crop screening projects.
Chimerism in T0 Plants
Plants regenerated from tissue culture are often chimeric — different cells carry different editing outcomes. In a screening context, chimerism can result in a plant carrying a gRNA but lacking the corresponding knockout phenotype because the edited sector is too small or absent from the tissue being phenotyped. This adds noise to the genotype-phenotype link but does not break it entirely, since the gRNA is still detectable by NGS. The practical consequence is that plant screens require larger sample sizes to achieve statistical power comparable to mammalian screens.
Polyploidy and Functional Redundancy
Many crops are polyploid, and many plant gene families have functionally redundant members. A knockout of a single gene may produce no phenotype because a paralog compensates. In a screening context, this means that genes with real biological roles may not be identified as hits, and the screen's false-negative rate is higher than in diploid species with compact genomes.
Two strategies address redundancy. The first is to design gRNAs that target conserved regions shared by multiple paralogs — one gRNA knocks out the entire gene family. This is feasible for closely related paralogs but risks off-target effects. The second is to accept the redundancy and interpret the screen results accordingly: the genes that do score as hits in a polyploid screen are likely to have dominant or haploinsufficient effects, making them particularly interesting candidates for follow-up.
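The first strategy can be illustrated with a small Python sketch that finds protospacers shared by every paralog. It is heavily simplified: it assumes an SpCas9-style NGG PAM, scans the forward strand only, requires exact identity, and performs no off-target check against the rest of the genome. The sequences and function names are hypothetical.

```python
import re
from typing import Dict, Set

# Lookahead so that overlapping candidate sites are all reported.
PAM_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def protospacers(seq: str) -> Set[str]:
    """All 20-nt forward-strand protospacers followed by an NGG PAM."""
    return {m.group(1) for m in PAM_SITE.finditer(seq.upper())}


def shared_protospacers(paralogs: Dict[str, str]) -> Set[str]:
    """Protospacers present, with a PAM, in every paralog sequence."""
    sets = [protospacers(s) for s in paralogs.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical paralog fragments sharing one conserved 20-nt stretch.
CONSERVED = "ATGGCTAGCTAGGATCCATG"
paralogs = {
    "paralog_A": "TTT" + CONSERVED + "AGG" + "CCCTTT",
    "paralog_B": "GAGA" + CONSERVED + "TGG" + "AAAC",
}
shared = shared_protospacers(paralogs)
print(shared)
```

A real design would also scan the reverse strand and screen each shared protospacer against the whole genome, since the conserved regions that make this strategy work are also where unintended matches are most likely.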
Applications and Case Examples
Transcription Factor Screens
The most common application of pooled CRISPR screening in crops to date has been transcription factor (TF) family screens. Bi et al. (2023) constructed a library of 4,379 sgRNAs targeting 990 tomato TFs and generated 487 T0 transgenic plants, identifying 65 TFs with single-sgRNA integrations. By screening a smaller sub-library of 30 TFs, they improved the single-sgRNA recovery rate from 19% to 42% — demonstrating a practical strategy of focused sub-libraries for species with lower transformation efficiency.
Herbicide Resistance Screening
Cheng et al. (2025) used a pooled gRNA library targeting ACETOLACTATE SYNTHASE (ALS) in rice to screen for novel herbicide-resistance alleles. The screen identified the known resistance mutations and several previously unreported variants. Because herbicide resistance is a strong selectable phenotype, this type of screen is particularly well-suited to the pooled format: resistant plants are simply the ones that survive herbicide application, and the gRNA sequences they carry reveal which edits confer resistance.
Trait Discovery in Breeding Programs
For breeding programs, pooled CRISPR screening offers a route from sequence to phenotype at scale. A library targeting 200–500 candidate genes from a QTL or GWAS study can systematically test which genes contribute to a trait of interest. The NGS readout connects each phenotype to the gene that was disrupted, producing a shortlist of candidate genes that, once validated, can feed marker-assisted selection or further editing. The article on CRISPR edited plant line selection and sequencing QC covers what happens when individual lines from such a screen are advanced toward field trials.
From Screen Design to Data
A pooled CRISPR screen in plants is a significant investment — months of tissue culture, thousands of transgenic lines, and an NGS run. The success of the experiment is determined before the first seed is sterilized for transformation.
For researchers planning a screen, CD Genomics provides CRISPR Sequencing for Agriculture, which includes gRNA library design support, NGS-based gRNA readout, and data analysis. The Targeted Sequencing platform handles gRNA cassette amplification and barcode sequencing for pooled screening projects, and Bioinformatics Analysis Services provide MAGeCK-based hit calling, gene-level enrichment analysis, and candidate prioritization.
For projects that have already identified candidate genes from a screen and are moving to individual line validation, the CRISPR amplicon panel design guide covers the transition from pooled screening to targeted validation.
FAQ
Q1: How many gRNAs should I design per target gene for a plant CRISPR screen?
A: Four to six gRNAs per gene is the standard recommendation. This accounts for variable gRNA activity in plants, where activity prediction tools trained on mammalian data have limited accuracy, and provides statistical redundancy for hit calling. If you can pre-screen gRNA activity in protoplasts, as demonstrated by Cheng et al. (2025), you can reduce this to the most active one or two gRNAs per gene and shrink the library accordingly. For species with efficient transformation (rice, Arabidopsis), err toward more gRNAs. For species with low transformation efficiency, use fewer gRNAs and a smaller, more focused library.
Q2: Can I run a pooled CRISPR screen in a crop species that is difficult to transform?
A: Yes, but constrain the library size to match your transformation capacity. For a species where producing 500 independent transgenic lines is a realistic target, a library of 10–50 genes with 4–6 gRNAs each (40–300 gRNAs total) reaches only about 2–12× coverage (500 lines divided by 40–300 gRNAs), so keep the library toward the small end of that range. Focus on a small gene family or pathway rather than genome-wide coverage. The FLASH pipeline (Yao et al., 2023) offers an alternative: 12 constructs pooled per transformation, with gRNA identity recovered by PCR rather than NGS, which can work for labs without NGS infrastructure.
Q3: What sequencing depth do I need for gRNA readout?
A: For a 1,000-gRNA library at 100× coverage, approximately 100,000 reads per sample. Scale linearly: a 10,000-gRNA library at 50× coverage needs 500,000 reads per sample. These numbers are modest — multiple samples can be multiplexed on a single MiSeq or NextSeq run. The more important QC metric is PCR duplicate rate: if more than 20–30% of reads are PCR duplicates, reduce the number of amplification cycles in the gRNA cassette PCR.
Q4: How do I know if my screen worked?
A: Three quality checks: (1) Positive control gRNAs targeting genes with known visible phenotypes should produce the expected phenotype in a subset of transgenic plants. (2) gRNAs targeting essential genes should be depleted in the regenerated plant population compared to the plasmid pool. (3) Non-targeting control gRNAs should show no systematic enrichment or depletion after selection. If all three checks pass, the screening pipeline is functional. If any fail, investigate transformation efficiency, gRNA expression, or Cas9 activity before analyzing the screen results.
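The three checks can be combined into one pass/fail summary, sketched here in Python. The sketch assumes that check (1) is scored by eye and supplied as a boolean, and that checks (2) and (3) are summarized by the median log2 fold change of the relevant control gRNAs. The cutoffs, function name, and example values are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def screen_qc(
    positive_phenotype_seen: bool,
    essential_lfc: Sequence[float],
    nontargeting_lfc: Sequence[float],
    depletion_cutoff: float = -1.0,
    drift_tolerance: float = 0.5,
) -> Dict[str, bool]:
    """Summarize the three screen quality checks as booleans."""
    checks = {
        # (1) Expected visible phenotype observed for positive controls.
        "positive_controls": bool(positive_phenotype_seen),
        # (2) Essential-gene gRNAs depleted, regenerated plants vs plasmid pool.
        "essential_depleted": median(essential_lfc) <= depletion_cutoff,
        # (3) Non-targeting gRNAs show no systematic shift after selection.
        "nontargeting_stable": abs(median(nontargeting_lfc)) <= drift_tolerance,
    }
    checks["all_passed"] = all(checks.values())
    return checks


# Hypothetical control values, for illustration only.
qc = screen_qc(
    positive_phenotype_seen=True,
    essential_lfc=[-2.3, -1.8, -0.9, -3.1],
    nontargeting_lfc=[0.1, -0.2, 0.05, 0.3],
)
print(qc)
```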
Q5: What is the difference between a pooled screen and the targeted amplicon panels described in your other articles?
A: A targeted amplicon panel (Article 3) answers "what editing outcomes occurred at this specific locus" for a small number of predefined targets, typically the on-target site and a few off-target sites. A pooled screen answers "which genes, when disrupted, produce this phenotype" across hundreds or thousands of targets. The amplicon panel gives you allele-level detail at known sites. The pooled screen gives you gene-level discovery across every gene in the library. Many projects use both: a pooled screen to identify candidate genes, followed by targeted amplicon sequencing of the top candidates to characterize the editing outcomes in detail.
Glossary
Arrayed screen: A screening format in which each gRNA construct is transformed separately into plants that are tracked individually, so the gRNA identity of each line is known without sequencing.
Coverage (library): The average number of transgenic plants carrying each gRNA in the library. 100× coverage means each gRNA is present in approximately 100 independent transgenic lines.
FLASH tag: A length-based PCR barcode used in the FLASH genome editing pipeline to index gRNA constructs, enabling gRNA identification by gel electrophoresis rather than NGS.
gRNA cassette: The DNA segment encoding the guide RNA, including the promoter, the 20-nucleotide protospacer, and the gRNA scaffold, integrated into the plant genome during transformation.
Hit: A gene identified by the screen as producing a phenotype of interest when disrupted, based on enrichment or depletion of its targeting gRNAs in the selected population.
Library representation: The proportion of designed gRNA sequences that are actually present in the transformed plant population at detectable frequencies.
MAGeCK: Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout — a computational tool that normalizes gRNA read counts, aggregates them into gene-level scores, and ranks genes by statistical significance.
Non-targeting control: A gRNA sequence with no match in the target genome, used to establish baseline enrichment/depletion levels in the screen analysis.
Pooled screen: A screening format in which a library of gRNA constructs is transformed as a mixture, and gRNA identity is determined by NGS after phenotyping.
Protoplast pre-screening: Testing gRNA activity in plant protoplasts before committing to whole-plant transformation, used to identify the most active gRNAs and reduce library size.
References
- Pan, C., Li, G., Bandyopadhyay, A., & Qi, Y. "Guide RNA library-based CRISPR screens in plants: opportunities and challenges." Current Opinion in Biotechnology, 2023, 79, 102883. DOI: 10.1016/j.copbio.2022.102883
- Liu, T., Zhang, X., Li, K., Yao, Q., Zhong, D., Deng, Q., & Lu, Y. "Large-scale genome editing in plants: approaches, applications, and future perspectives." Current Opinion in Biotechnology, 2023, 79, 102875. DOI: 10.1016/j.copbio.2022.102875
- Cheng, Y., Li, G., Qi, A., Mandlik, R., Pan, C., Wang, D., Ge, S., & Qi, Y. "A comprehensive all-in-one CRISPR toolbox for large-scale screens in plants." The Plant Cell, 2025, 37(4), koaf081. DOI: 10.1093/plcell/koaf081
- Yao, L., Wang, X., Ke, R., Chen, K., & Xie, K. "FLASH Genome Editing Pipeline: An Efficient and High-Throughput Method to Construct Arrayed CRISPR Library for Plant Functional Genomics." Current Protocols, 2023, 3(9), e905. DOI: 10.1002/cpz1.905
- Bi, M., Wang, Z., Cheng, K., Cui, Y., He, Y., Ma, J., & Qi, M. "Construction of transcription factor mutagenesis population in tomato using a pooled CRISPR/Cas9 plasmid library." Plant Physiology and Biochemistry, 2023, 205, 108094. DOI: 10.1016/j.plaphy.2023.108094
This article is for Research Use Only. CD Genomics provides agricultural genomics services for research purposes; it does not provide clinical diagnosis, treatment recommendations, or regulatory approval guarantees.