The Gold Standard: Using WGS for CRISPR Analysis
As CRISPR gene editing moves from basic research toward clinical treatment, comprehensive and unbiased evaluation of off-target effects has become an indispensable part of safety and quality control. Among the many off-target detection technologies, whole genome sequencing (WGS) is widely regarded as the "gold standard" because it scans the entire genome without relying on a prior hypothesis.
This article discusses the central role of WGS in CRISPR analysis, walks through its complete workflow from sample preparation to data analysis, weighs its advantages and inherent limitations against targeted methods, and offers best-practice guidance on integrating WGS into CRISPR projects at different stages. Understanding and correctly applying WGS is a cornerstone of ensuring the safety and reliability of gene editing products.
WGS as the Unbiased Discovery Tool: How It Works
In CRISPR off-target analysis, the core challenge is finding a handful of unexpected editing events, or even a single one, in a human genome of more than 3 billion base pairs. Many earlier approaches, such as targeted sequencing based on PCR amplification, require researchers to "guess" potential off-target sites in advance and design primers for them, which inevitably misses unknown and unpredictable sites. The fundamental advantage of whole genome sequencing (WGS) is that it is unbiased.
Hypothesis-Free Discovery with WGS
WGS does not depend on prior knowledge or bioinformatic prediction. It does not favor any region of the genome; its goal is to sequence the whole genome as evenly as possible. As a result, it can find both sites that closely resemble the sgRNA sequence and sites with low similarity that are cut because of complex factors such as chromatin state and three-dimensional genome structure, provided coverage is sufficient. This makes it the most powerful tool for discovering new and unknown off-target sites.
A Brief Introduction to the Technical Principle
The basic principle of WGS is to randomly fragment genomic DNA from edited cell populations or individual cell clones, construct a sequencing library from the fragments, and sequence all of them in parallel with next-generation sequencing technology. The resulting millions to billions of short reads are then aligned to the reference genome with bioinformatics tools, so that variation relative to the reference sequence can be identified systematically across the whole genome.
Types of Off-Target Events Detected by WGS
Unlike techniques that can only detect preset sites, WGS can reveal many types of gene editing consequences:
- Small insertions and deletions (indels): This is the main type of variation produced when CRISPR-Cas9-induced breaks are repaired by non-homologous end joining (NHEJ), and it is the off-target evidence that WGS detects most readily.
- Single-nucleotide variants (SNVs): Although CRISPR mainly produces indels, WGS can also detect SNVs that are introduced during editing or are already present in the background.
- Structural variation: This is a distinctive strength of WGS. It can find larger-scale genomic abnormalities, such as chromosomal translocations (when two different chromosomes are cut at the same time and incorrectly rejoined), large deletions or duplications, and complex genome rearrangements. These events are usually difficult to find with other off-target detection methods, yet their potential harm can far exceed that of a single indel.
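The three categories above correspond fairly directly to how a variant record looks in a VCF file. The sketch below is a minimal illustration, assuming each call has already been reduced to a reference allele string and an alternate allele string. The function name `classify_variant` and the category labels are hypothetical, and a real pipeline would normally use a dedicated VCF parsing library.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Assign one REF/ALT pair to a coarse variant category."""
    # VCF encodes structural variants either as symbolic alleles
    # (e.g. <DEL>, <DUP>, <INV>) or as breakends containing '[' or ']'.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural_variant"
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    # Same length but longer than one base, e.g. a multi-nucleotide change.
    return "complex"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATT"), ("ACGT", "A"), ("N", "<DEL>")]:
        print(ref, alt, classify_variant(ref, alt))
```

Sorting calls this way early on is mostly a convenience: small indels and SNVs typically come from one caller and structural variants from another, so they tend to be filtered with different criteria.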
Major concerns/outcomes of off-target effects (Manghwar et al., 2020)
The Workflow: From Edited Sample to Confirmed Off-Targets with WGS
A rigorous WGS off-target analysis is a multi-step process that requires close coordination between wet-lab work and computational (dry-lab) analysis.
Sample Preparation and Experimental Design
- A. Sample selection
- a) Mixed cell population: Suitable for rapid evaluation of the overall off-target background of an editing system. However, because off-target events occur at very low frequency (usually < 0.1%), their signal can be drowned out by the large number of unedited cells in the population, so extremely high sequencing depth is required.
- b) Single-cell cloning: This is the preferred and more rigorous approach. CRISPR-treated cells are isolated as single cells (for example, by limiting dilution) and expanded to obtain clonal populations with a uniform genetic background. Any off-target edit present in the founding cell appears in every cell of the clone at an allele frequency of about 100% (homozygous) or 50% (heterozygous), which makes it easy to detect. This is the strongest evidence that an off-target event is real rather than a random sequencing error.
- c) Controls: Strict controls are essential. The ideal control is unedited cells from the same individual cultured in parallel, or cells taken through the same procedure without active Cas9/sgRNA. A matched control makes it possible to distinguish real editing events from the individual's inherent genetic polymorphisms and from background mutations accumulated during cell culture.
- B. Library construction and high-throughput sequencing
- a) DNA extraction and quality control: Extract high-molecular-weight, non-degraded genomic DNA.
- b) Library construction: Using a commercial WGS library preparation kit, perform DNA fragmentation, end repair, adapter ligation, and PCR amplification (if necessary). Current mainstream protocols produce inserts of 350-800 bp.
- c) Sequencing depth and coverage: These are the key parameters that determine the sensitivity of WGS.
- d) Sequencing depth: The average number of times each base in the genome is sequenced. For off-target analysis, the recommended depth is 30x-100x or higher. Low depth (such as < 15x) will seriously underestimate the number of off-target events, because low-frequency indels cannot be distinguished from sequencing errors.
- e) Genome coverage: The proportion of the genome covered by sequencing reads. Because of technical limitations, some regions (such as centromeres, telomeres, and other highly repetitive sequences) are difficult to map accurately and leave gaps. A high-quality WGS experiment should achieve genome coverage of > 95%.
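The link between depth, allele frequency, and sensitivity can be made concrete with a little arithmetic. The sketch below is an illustration under simplifying assumptions: reads are treated as independent draws (a binomial model), sequencing errors are ignored, and a variant counts as "detected" once at least `min_alt_reads` reads support it. That threshold, the function names, and the example numbers are hypothetical choices, not values from a specific pipeline.

```python
from math import comb


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def detection_probability(depth: int, allele_fraction: float,
                          min_alt_reads: int = 3) -> float:
    """P(at least `min_alt_reads` variant-supporting reads) at a site
    covered by `depth` reads, under a binomial model."""
    p_too_few = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    # Roughly 620 million 150 bp reads over a 3.1 Gb genome.
    print(mean_depth(620_000_000, 150, 3_100_000_000))
    # Heterozygous edit in a clone (allele fraction 0.5) at 30x.
    print(detection_probability(30, 0.5))
    # Edit present at 0.1% in a mixed population at 100x.
    print(detection_probability(100, 0.001))
```

Under these assumptions, a heterozygous edit in a clone is almost certain to be seen at 30x, whereas an edit at 0.1% in a mixed population is very unlikely to reach three supporting reads even at 100x. This is the quantitative reason clonal samples are preferred.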
Bioinformatics Analysis: From Data to Answer
This is the most complex and critical stage of the WGS workflow, and it usually requires professional bioinformaticians.
- Data quality control and sequence alignment: Raw sequencing data are first quality-filtered, and the high-quality reads are then aligned to the reference genome with an efficient aligner (such as BWA-MEM or Bowtie2).
- Variant calling: Use variant callers such as GATK HaplotypeCaller or FreeBayes for small variants, and a tool such as Manta for structural variants and larger indels. Callers such as HaplotypeCaller detect small insertions and deletions sensitively and specifically by locally reassembling the reads in each candidate region.
- Variant filtering and annotation: The initial call set contains a large number of false positives (such as PCR amplification errors and alignment errors). Strict filtering criteria need to be applied, such as sequencing depth, allele frequency, base and mapping quality, and location within repetitive regions. Variants that pass filtering are then annotated to determine whether they fall in a coding region, a regulatory region, or a region with no known function.
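The filtering step can be sketched in a few lines. This is a simplified illustration, assuming calls from the edited clone and its matched control have already been parsed into records. The `Call` class, its field names, and every threshold below are hypothetical and would need tuning for a real data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int        # total reads covering the site
    alt_reads: int    # reads supporting the variant allele
    qual: float       # caller-reported quality score
    in_repeat: bool   # site overlaps an annotated repeat region

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def allele_fraction(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def filter_calls(edited, control, min_depth=15, min_af=0.25, min_qual=30.0):
    """Keep well-supported calls seen in the edited sample but not the control."""
    control_keys = {c.key for c in control}
    kept = []
    for call in edited:
        if call.key in control_keys:
            continue  # pre-existing polymorphism or culture-derived mutation
        if call.in_repeat:
            continue  # alignment in repeats is unreliable
        if call.depth < min_depth or call.qual < min_qual:
            continue  # too little evidence
        if call.allele_fraction < min_af:
            continue  # below what a clonal edit should show
        kept.append(call)
    return kept
```

The allele-fraction floor reflects the clonal design described earlier. For a mixed population it would have to be replaced by a statistical comparison against the control.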
Confirmation of Off-Target Events
- For cloned samples: All high-quality indels found in a clone but absent from the parental control are potential off-target events. The flanking sequence of each site should then be checked for similarity to the sgRNA used (usually allowing several base mismatches).
- For mixed populations: This is more challenging. Statistical models are needed to identify indel loci whose frequency is significantly higher in the edited group than in the control group.
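Both confirmation steps can be illustrated with short, dependency-free functions. This is a sketch under stated assumptions rather than a validated method: the sequence scan counts mismatches only (it ignores the PAM and any bulges), the default of four allowed mismatches is a hypothetical cutoff, and the read-count comparison uses a one-sided Fisher's exact test as one simple example of a statistical model. All function names are hypothetical.

```python
from math import comb

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_match(flank: str, guide: str):
    """Best ungapped match of `guide` in `flank`, on either strand.
    Returns (mismatches, offset, strand), or None if flank is too short."""
    flank, guide = flank.upper(), guide.upper()
    best = None
    for strand, target in (("+", guide), ("-", reverse_complement(guide))):
        for offset in range(len(flank) - len(target) + 1):
            window = flank[offset:offset + len(target)]
            mismatches = sum(a != b for a, b in zip(window, target))
            if best is None or mismatches < best[0]:
                best = (mismatches, offset, strand)
    return best


def is_candidate_off_target(flank: str, guide: str,
                            max_mismatches: int = 4) -> bool:
    hit = best_match(flank, guide)
    return hit is not None and hit[0] <= max_mismatches


def fisher_exact_greater(alt_edit: int, ref_edit: int,
                         alt_ctrl: int, ref_ctrl: int) -> float:
    """One-sided Fisher's exact p-value: probability of seeing at least
    `alt_edit` variant reads in the edited sample if variant reads were
    distributed between the two samples by chance."""
    n_edit = alt_edit + ref_edit
    n_total = n_edit + alt_ctrl + ref_ctrl
    n_alt = alt_edit + alt_ctrl
    tail = sum(
        comb(n_alt, x) * comb(n_total - n_alt, n_edit - x)
        for x in range(alt_edit, min(n_alt, n_edit) + 1)
    )
    return tail / comb(n_total, n_edit)
```

A genome-wide scan applies such a test at a very large number of loci, so any p-value threshold needs multiple-testing correction before a site is reported.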
Concordance with exome chip (Bizon et al., 2014)
Advantages and Limitations of WGS in Off-Target Analysis
Every technology has its scope of application, and an objective understanding of the strengths and weaknesses of WGS is essential to applying it correctly.
Irreplaceable Core Advantages
- Comprehensive and unbiased: As described above, this is the foundation of WGS. It can find unexpected off-target events and structural variations.
- High sensitivity and quantitative ability: At sufficiently high sequencing depth, WGS can detect variants at frequencies as low as 0.1% in mixed populations and accurately quantify their allele frequency, although reaching that sensitivity requires depth well above the 30x-100x range.
- Permanent data resource: The data generated by a WGS run is a snapshot of the whole genome that can be preserved and reanalyzed. When a new analysis algorithm or a new understanding of off-target mechanisms becomes available, old data can be revisited and mined for new information.
- "One-stop" solution: A single analysis can evaluate on-target editing efficiency, comprehensively scan for off-target effects, and monitor the genomic stability of the cells themselves.
Limitations and Challenges that Must be Faced Squarely
- High cost: High-depth (> 30x) sequencing of the human genome is still expensive, especially when multiple biological replicates or clones need to be analyzed. This limits its use for large-scale screening.
- Heavy data analysis burden: WGS generates a very large amount of data (each sample can exceed 100 GB), which places high demands on computing and storage resources and on bioinformatics expertise.
- Detection blind spots: WGS has difficulty accurately detecting variation in repetitive regions, because short reads cannot be uniquely aligned there.
- Background noise: Cells naturally accumulate somatic mutations during culture, and this background can be confused with real off-target events, which is why strict controls are so important.
- Uncertain functional impact: WGS can find an off-target mutation, but it cannot directly determine whether the mutation actually affects gene function or cell phenotype. That has to be verified by follow-up functional experiments.
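The data volume mentioned above can be roughed out from depth and genome size. The following back-of-the-envelope sketch assumes FASTQ output with four lines per read. The header length, read length, and compression ratio are hypothetical parameters chosen for illustration, and real files vary with the instrument and compression settings.

```python
def estimate_fastq_gigabytes(genome_size_bp: int, depth: float,
                             read_length: int = 150,
                             header_bytes: int = 60,
                             compression_ratio: float = 1.0) -> float:
    """Rough size of the raw reads for one sample, in gigabytes (1e9 bytes)."""
    n_reads = genome_size_bp * depth / read_length
    # One FASTQ record = header line + sequence line + '+' line + quality
    # line, each terminated by a newline character.
    bytes_per_read = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * bytes_per_read / compression_ratio / 1e9


if __name__ == "__main__":
    # 3.1 Gb genome at 30x, uncompressed, then with an assumed 4:1 compression.
    print(round(estimate_fastq_gigabytes(3_100_000_000, 30)))
    print(round(estimate_fastq_gigabytes(3_100_000_000, 30, compression_ratio=4.0)))
```

Aligned BAM files, intermediate files, and variant calls come on top of this figure, so storage planning per sample should leave generous headroom.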
A comprehensive overview of the main aspects of WGS (Brlek et al., 2024)
Best Practices: Integrating WGS into Your CRISPR Validation Pipeline
Given these advantages and disadvantages, WGS should not be a routine step in every CRISPR experiment. Instead, it should be deployed strategically at key points in the R&D pipeline.
Tiered Off-Target Analysis Strategy
An efficient and economical strategy is to adopt tiered verification:
- The first tier: initial screening. In the early development of an sgRNA and CRISPR system, computational prediction and in vitro assays provide a fast, low-cost first assessment of the off-target profile, and the candidate constructs with the lowest off-target risk are selected.
- The second tier: cell-level verification. The top 1-2 candidates are verified in relevant cell models with medium-throughput cell-based methods (such as DISCOVER-seq). These methods reflect the real cellular environment, including chromatin state.
- The third tier: final confirmation (the WGS stage). Monoclonal cell lines that are about to enter preclinical research or serve as key experimental materials, and all gene editing products intended for therapy (such as CAR-T cells or stem cells used in clinical trials), must undergo high-depth WGS analysis as the final, authoritative evidence of their genomic safety.
Key Considerations of WGS Experimental Design
- Prioritize clones: Whenever technically feasible, prefer WGS analysis of single-cell clones.
- Balance depth against sample size: When the budget is limited, high-depth (≥50x) sequencing of a few key samples (e.g., 2-3 independently edited clones plus 1 control clone) usually provides more reliable and convincing data than low-depth sequencing of a large number of samples.
- Long-term archiving: Properly archive all raw data and analysis results from WGS. This is good scientific practice and prepares for questions from future regulatory reviewers.
Combination with Functional Verification
If WGS finds a concerning off-target mutation (for example, in an important tumor suppressor gene), the study does not end there. At this point, functional verification should begin:
- Independent verification: Use Sanger sequencing or targeted deep sequencing to confirm the mutation in this clone and in independent replicate experiments.
- Phenotypic association: Evaluate whether cells carrying the mutation show abnormal proliferation, metabolism, or tumorigenicity.
- Rescue experiment: If possible, correct the off-target mutation by gene repair or other means and observe whether the abnormal phenotype is reversed, so as to establish a causal relationship.
Potential workflow for AMR detection in clinical microbiology (Fahy et al., 2024)
Conclusion
WGS plays the role of the ultimate arbiter in CRISPR off-target analysis. Its unbiased and panoramic perspective makes it the gold standard for discovering unknown risks and evaluating the overall stability of the genome. Despite the challenges of high cost and complex analysis, with the continuous progress of sequencing technology and the continuous decline of cost, WGS is becoming more and more accessible.
For researchers in the field of CRISPR, it is wise not to avoid WGS but to understand its value and regard it as the core component of a rigorous quality management system. By strategically integrating WGS into the whole process from tool optimization to final product release, we can not only maximize the great potential of CRISPR technology but also shoulder the ultimate responsibility for its safety and steadily push this gene therapy revolution towards a safe and successful future.
CD Genomics provides expert, end-to-end support to confidently characterize your gene editing outcomes. Our services leverage gold-standard methods like WGS and amplicon sequencing to deliver a complete picture of your editing landscape, empowering you to advance your research programs with greater confidence and reliability.
FAQ
1. Why is WGS called the "gold standard" for CRISPR off-target analysis?
It enables unbiased, genome-wide scanning without prior target guesswork, detecting small indels, single-nucleotide variations, and structural variations that targeted methods often miss.
2. What sequencing depth is recommended for WGS-based CRISPR off-target analysis?
30x–100x or higher. Depth <15x risks underestimating off-target events by failing to distinguish low-frequency indels from sequencing errors.
3. Is a mixed cell population or single-cell cloning better for WGS sample selection?
Single-cell cloning is preferred. Off-target mutations appear at 100% or 50% allele frequency in a clone and are easily detected, whereas their signal can be drowned out in a mixed population.
4. What key limitation of WGS affects its widespread use in CRISPR analysis?
High cost (for high-depth sequencing) and heavy data analysis burden (100GB+ per sample, requiring strong computing resources and bioinformatics expertise).
5. Can WGS directly confirm if an off-target mutation affects gene function?
No. WGS only identifies mutations; follow-up functional experiments (e.g., phenotypic association, rescue tests) are needed to verify functional impacts.
References
- Manghwar H, Li B, Ding X, et al. "CRISPR/Cas Systems in Genome Editing: Methodologies and Tools for sgRNA Design, Off-Target Evaluation, and Strategies to Mitigate Off-Target Effects." Adv Sci (Weinh). 2020 7(6): 1902312.
- Bizon C, Spiegel M, Chasse SA, et al. "Variant calling in low-coverage whole genome sequencing of a Native American population sample." BMC Genomics. 2014 15: 85.
- Brlek P, Bulić L, Bračić M, et al. "Implementing Whole Genome Sequencing (WGS) in Clinical Practice: Advantages, Challenges, and Future Perspectives." Cells. 2024 13(6): 504.
- Fahy S, O'Connor JA, Sleator RD, Lucey B. "From Species to Genes: A New Diagnostic Paradigm." Antibiotics (Basel). 2024 13(7): 661.
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.