Genotyping-by-sequencing (GBS), a cornerstone technology of plant genetics research and breeding, is evolving rapidly, driven by new sequencing technologies and a paradigm shift in data analysis. This article looks ahead at the future direction of GBS. It focuses on how long-read sequencing can empower next-generation GBS schemes, describes the potential of integrating GBS data with transcriptomic, proteomic, and metabolomic data to dissect complex traits systematically, and analyzes the key role of GBS throughout the gene editing pipeline, from guide RNA design to verification of off-target effects.
Finally, the article sketches a broader picture of how GBS, as the core data engine, can be tightly integrated with precise phenotyping, environmental data, and machine learning algorithms to create an efficient, intelligent, and predictive routine digital breeding ecosystem.
Emerging Sequencing Technologies: Reducing Costs and Enhancing GBS Capabilities
Although the short-read Illumina platform is still the mainstream choice for GBS, maturing third-generation long-read sequencing technology offers the possibility of a disruptive upgrade.
Disruptive Effects of PacBio and Oxford Nanopore
- Solving the inherent challenge of short-read GBS: The main limitation of traditional GBS is that its short reads (usually < 150 bp) are difficult to anchor accurately in the repetitive regions of a complex genome, which causes the loss of many markers and makes assembly difficult in species without a reference genome. Long-read sequencing produces reads of several kb or even Mb, which largely removes this limitation.
- Long-read GBS: Emerging protocols combine the reduced-representation idea of GBS with long-read sequencing. Sequencing the fragments produced by restriction enzyme digestion yields not only SNP information but also haplotype information directly. The phase relationship of multiple SNPs covered by the same read is therefore known unambiguously, which is very important for studying allele-specific expression and the mechanism of heterosis.
- Simultaneous capture of epigenetic information: Oxford Nanopore technology can also detect epigenetic modifications such as DNA methylation during sequencing. A long-read GBS experiment can therefore provide genome-wide epigenetic information in addition to genotype data, which adds a new dimension for analyzing phenotypic variation.
- Balance between cost and throughput: As the throughput of PacBio HiFi and Nanopore sequencing increases and error rates fall, the cost of long-read GBS is gradually approaching a range acceptable for large-scale breeding projects. Long-read sequencing is especially suitable for constructing high-quality reference genomes, analyzing complex structural variation, and deep resequencing of core germplasm.
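The way long reads expose phase directly can be illustrated with a minimal Python sketch. It assumes reads have already been aligned and reduced to a mapping from SNP position to observed base; that representation, the function name `haplotypes_from_reads`, and the toy data are all hypothetical, and real projects would rely on dedicated phasing software.

```python
from collections import Counter

def haplotypes_from_reads(reads, snp_positions, min_support=2):
    """Count the allele combinations observed on single reads.

    reads: list of dicts mapping SNP position -> observed base
           (a hypothetical, already-aligned representation).
    Only reads covering every position in snp_positions are used,
    because only those show the phase of all sites at once.
    """
    counts = Counter()
    for read in reads:
        if all(pos in read for pos in snp_positions):
            counts[tuple(read[pos] for pos in snp_positions)] += 1
    # Keep combinations seen on at least min_support reads to damp sequencing errors.
    return {hap: n for hap, n in counts.items() if n >= min_support}

# Toy data (invented): three SNP sites on one restriction fragment.
reads = [
    {100: "A", 850: "G", 2300: "T"},
    {100: "A", 850: "G", 2300: "T"},
    {100: "C", 850: "T", 2300: "C"},
    {100: "C", 850: "T", 2300: "C"},
    {100: "C", 850: "T", 2300: "C"},
    {100: "A", 850: "T", 2300: "T"},  # seen once: treated as a likely error
    {100: "A", 850: "G"},              # does not span the third site: ignored
]
haps = haplotypes_from_reads(reads, [100, 850, 2300])
```

With short reads, the three sites would usually fall on different reads and the two haplotypes would have to be inferred statistically rather than observed.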
More Accessible Platforms and Portable Sequencing
The spread of benchtop and portable sequencers, such as Illumina's iSeq and MiSeq and the Oxford Nanopore MinION series, makes GBS analysis accessible to individual laboratories and even to the field. On-site genotyping becomes possible: breeders can quickly verify the genotypes of candidate lines during the growing season, which greatly speeds up decision-making.
Future direction: Future GBS platforms will be diversified and scenario-oriented. For large-scale screening of germplasm resources, low-cost, high-throughput short-read GBS remains the first choice; for cloning key genes and in-depth mechanistic analysis, high-accuracy long-read GBS will be indispensable.
Schematic representation of the TASSEL-GBS Discovery Pipeline (Glaubitz et al., 2014)
Integrating GBS with Other Omics Layers: Transcriptomics, Proteomics, and Metabolomics
Genomic variation is the blueprint of trait differences, but the final phenotype is realized through complex molecular networks. Integrating GBS with other omics data is the key to opening the black box between genotype and phenotype.
Integrative Analysis Within a Systems Biology Framework
- A. GBS + transcriptomics:
- a) Mapping of expression quantitative trait loci (eQTL): Association analysis between GBS genotypes and genome-wide gene expression (RNA-seq data) identifies genomic regions that regulate gene expression levels. eQTL analysis can directly link a key trait-associated SNP to the candidate genes it controls and so provide a direct biological explanation for QTL results.
- b) Allele-specific expression (ASE): Combined with haplotype information provided by GBS, ASE analysis detects cases in which the alleles inherited from the paternal and maternal parents are expressed at different levels in heterozygotes, which reveals the role of cis-regulatory elements.
- B. GBS + metabolomics:
- a) Mapping of metabolic quantitative trait loci (mQTL): Associating GBS genotypes with metabolite abundance data locates the genetic loci that control the synthesis and accumulation of specific metabolites. This is very important for improving the nutritional quality (such as vitamin and amino acid content) and flavor compounds of crops.
- C. GBS + proteomics:
- a) Mapping of protein quantitative trait loci (pQTL): Although proteome data are expensive to obtain, integrating them with GBS to map pQTL can reveal genetic variation that affects protein abundance and post-translational modification, which is closer to the functional level.
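The association step shared by eQTL, mQTL, and pQTL mapping can be sketched as a single-marker regression of a molecular trait on allele dosage. The Python sketch below is a simplified illustration only: the function name `eqtl_scan`, the marker IDs, and the values are invented, and a real analysis would also model covariates, population structure, and multiple testing.

```python
def eqtl_scan(dosages, expression):
    """Regress one gene's expression on allele dosage, marker by marker.

    dosages: dict of marker id -> list of 0/1/2 allele counts per sample
    expression: list of expression values in the same sample order
    Returns (marker, slope, r_squared) tuples, strongest association first.
    """
    n = len(expression)
    mean_y = sum(expression) / n
    ss_y = sum((y - mean_y) ** 2 for y in expression)
    results = []
    for marker, g in dosages.items():
        mean_g = sum(g) / n
        ss_g = sum((x - mean_g) ** 2 for x in g)
        if ss_g == 0 or ss_y == 0:
            continue  # monomorphic marker or constant expression: nothing to test
        cov = sum((x - mean_g) * (y - mean_y) for x, y in zip(g, expression))
        slope = cov / ss_g                  # expression change per allele copy
        r2 = cov * cov / (ss_g * ss_y)      # share of variance explained
        results.append((marker, slope, r2))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Toy data (invented): six samples, three GBS markers, one gene.
dosages = {
    "snp1": [0, 0, 1, 1, 2, 2],
    "snp2": [0, 1, 0, 2, 1, 0],
    "snp3": [1, 1, 1, 1, 1, 1],  # monomorphic, skipped
}
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]
hits = eqtl_scan(dosages, expression)
```

Replacing the expression vector with metabolite or protein abundances gives the corresponding mQTL or pQTL scan under the same simplifying assumptions.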
Constructing a Gene Regulatory Network
By integrating multi-omics data, we can go beyond the perspective of a single gene or QTL and construct a system-level network describing the interactions among genes, transcripts, proteins, and metabolites. In such a network, a SNP discovered by GBS may sit at a key node, where it affects the expression of a major regulatory gene, triggers a cascade of downstream changes in the transcriptome and metabolome, and finally leads to phenotypic variation.
This system-level understanding enables breeders to shift from manipulating a single gene to optimizing the whole network to achieve more accurate and robust trait improvement.
Effect of read trimming upon F-score, a measure of overall SNP-calling accuracy, in a curated Gram-negative dataset (Bush, 2020)
GBS in Gene Editing Pipelines: From Discovery to Validation
Gene editing technologies such as CRISPR-Cas give plant breeding unprecedented precision. GBS plays an indispensable role at many stages of the editing pipeline.
- A. Before editing: gRNA selection and design
- a) Identifying sites with high editing efficiency: Editing efficiency is affected by the local genome sequence and chromatin state. GBS analysis of a large number of germplasm accessions allows the sequence polymorphism and accessibility of target sites to be evaluated, so that sites that are polymorphic in the recipient material, or that lie in highly condensed chromatin regions, can be avoided, which improves the success rate of gRNA design.
- b) Evaluating off-target risk: Genome-wide GBS data characterize the genome sequence of the recipient material. Bioinformatics tools can then scan the genome for potential off-target sites that are highly similar to the gRNA sequence, and gRNAs whose sequences are highly unique in the genome can be preferred, minimizing off-target risk at the source.
- B. After editing: Mutant screening and identification
- a) Rapid identification of editing events: GBS of the edited T0 population efficiently identifies successfully edited individuals. Analyzing the sequence of the target region accurately identifies the editing type (base insertion, deletion, or substitution) and whether the edit is homozygous or heterozygous.
- b) High-throughput screening for homozygous, vector-free edited lines: In the T1 or T2 generation, GBS can accomplish two things at once. First, it confirms whether the edited genotype is stable and homozygous. Second, through genome-wide scanning, it identifies individuals from which the T-DNA of the CRISPR-Cas9 vector has segregated away, so that transgene-free edited material is obtained quickly and commercialization is accelerated.
- C. After editing: Off-target effect analysis
- a) Genome-wide safety assessment: Regulators and markets require a strict safety assessment of gene-edited crops, and the core of this assessment is genome-wide detection of off-target effects. High-depth whole-genome resequencing of edited materials and their wild-type parents is the gold standard, but high-density GBS is a cost-effective alternative for large-scale preliminary screening.
- b) Comparison with the wild type: Comparing GBS data from edited and wild-type plants reveals whether new, unexpected SNPs or indels have arisen at predicted off-target sites or in other genomic regions, which provides important evidence for the safety of gene-edited products.
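Two of the steps above, scanning for sequences similar to the gRNA and comparing edited with wild-type variant calls, can be sketched together in Python. This is a deliberately simplified illustration: it searches only the forward strand, ignores the PAM requirement, and uses invented sequences, variant calls, and function names rather than any specific tool's method.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def candidate_sites(sequence, guide, max_mismatches=3):
    """Slide the guide along the forward strand and keep near matches.

    Simplification: the reverse strand and the PAM requirement are ignored.
    Returns (start, mismatches) pairs; the intended target has 0 mismatches.
    """
    k = len(guide)
    sites = []
    for start in range(len(sequence) - k + 1):
        d = hamming(sequence[start:start + k], guide)
        if d <= max_mismatches:
            sites.append((start, d))
    return sites

def novel_variants(edited, wild_type):
    """Variants called in the edited line but absent from its wild-type parent."""
    return sorted(set(edited) - set(wild_type))

def in_off_target_site(variants, sites, guide_len):
    """Keep variants whose position falls inside a near-match (not exact) site."""
    return [v for v in variants
            if any(d > 0 and s <= v[0] < s + guide_len for s, d in sites)]

# Toy sequence (invented): the intended target plus one similar site.
guide = "GACCTGAAGCTTACGGATCC"
off_target = "GACCTGTAGCTTACGGTTCC"  # differs from the guide at two positions
sequence = "TT" + guide + "AAA" + off_target + "TT"

sites = candidate_sites(sequence, guide)

# Variant calls as (position, ref, alt), 0-based on `sequence`; all invented.
wild_type_calls = [(46, "T", "C")]
edited_calls = [(46, "T", "C"), (10, "G", "GA"), (31, "T", "A")]

new_calls = novel_variants(edited_calls, wild_type_calls)
suspect = in_off_target_site(new_calls, sites, len(guide))
```

In this toy case the background variant shared with the wild type is discarded, the insertion inside the intended target is treated as the desired edit, and only the new substitution inside the similar site is flagged for follow-up.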
Impact of missing data and minor allele frequency on the number of SNPs (Torkamaneh and Belzile, 2015)
Towards Routine Digital Breeding: GBS in a Data-Driven Agriculture Ecosystem
The core of future agriculture is data. As the core source of genetic information, GBS will be integrated into a larger digital breeding ecosystem and help realize fully digital, intelligent, and automated breeding.
Forming the Closed Loop of Digital Breeding
An ideal digital breeding pipeline includes the following core components, with GBS running through all of them:
- Genotype data: All breeding materials (parents, segregating populations, lines, and varieties) are routinely genotyped by GBS, and a dynamically updated genotype database is maintained.
- Precision phenomics: UAVs, ground robots, hyperspectral imaging, and related technologies automatically and quantitatively collect field phenotypic data on crop growth, yield, stress resistance, and other traits.
- Environmental data: Internet of Things sensors and weather stations collect environmental data such as temperature, humidity, soil moisture, and nutrient levels in the experimental field in real time.
- Multi-omics data: At key stages, transcriptomic, metabolomic, and other data are integrated to deepen the understanding of traits.
- Machine learning and prediction models: Machine learning algorithms (such as deep learning and random forests) integrate genotype (G), phenotype (P), and environment (E) data to build G×E prediction models.
The Core Function of GBS in Digital Breeding
- Training genomic selection models: GBS data are the basis of the genomic selection (GS) model. Once a training population has been genotyped by GBS and accurately phenotyped, the resulting GS model can predict the genomic estimated breeding value (GEBV) of early-stage breeding materials (such as seedlings) that have GBS genotypes but no phenotypes, which enables early, accurate selection and substantially shortens the breeding cycle.
- Enabling dynamic parental selection: Based on the constantly updated GBS database and predicted phenotypic values, machine learning algorithms can simulate millions of virtual crosses and predict the performance distribution of their offspring, and so recommend the parental pairings expected to produce the greatest genetic gain.
- Empowering predictive breeding: The ultimate goal of a digital breeding system is predictive breeding. Before sowing, the system combines forecast weather data and soil conditions of the target environment with genotype information to predict how different breeding materials will perform at different locations, which enables accurate variety recommendation and customized breeding.
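The train-then-predict logic of genomic selection can be sketched with ridge regression on marker dosages, used here as a simple stand-in for the marker-effect models common in GS. The sketch is pure Python for self-containment; the penalty `lam`, the function names, and the toy data are assumptions for illustration, and production analyses would use established statistical packages and cross-validation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def train_ridge(genotypes, phenotypes, lam=1.0):
    """Estimate marker effects from a genotyped and phenotyped training set."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    yc = [y - y_mean for y in phenotypes]
    # Normal equations of ridge regression: (Z'Z + lam * I) beta = Z'y
    ztz = [[sum(z[i][a] * z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    zty = [sum(z[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return solve(ztz, zty), col_means, y_mean

def predict_gebv(model, genotype):
    """Predict a breeding value from genotype alone (no phenotype needed)."""
    effects, col_means, y_mean = model
    return y_mean + sum(e * (g - m) for e, g, m in zip(effects, genotype, col_means))

# Toy training population (invented): 9 lines, 3 markers coded 0/1/2.
train_g = [[0, 0, 1], [1, 0, 0], [2, 0, 1], [0, 1, 0], [1, 1, 1],
           [2, 1, 0], [0, 2, 1], [1, 2, 0], [2, 2, 1]]
train_y = [10, 12, 14, 9, 11, 13, 8, 10, 12]

model = train_ridge(train_g, train_y, lam=1.0)

# Seedlings with GBS genotypes only.
candidates = {"cand_A": [2, 0, 0], "cand_B": [1, 1, 0], "cand_C": [0, 2, 1]}
gebv = {name: predict_gebv(model, g) for name, g in candidates.items()}
best = max(gebv, key=gebv.get)
```

The penalty shrinks marker effects toward zero, which is what keeps the model estimable when markers far outnumber phenotyped lines, the usual situation with GBS data.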
Future Vision: A Fully Optimized Digital Breeding Pipeline
In the near future, a routine digital breeding process may look like this:
- When a breeder designs a cross, the computer proposes the best parental combination based on genome-wide GBS data.
- The F1 seedlings produced by the cross are genotyped by GBS.
- Using the GS model, the breeding values of thousands of F2 individuals are predicted immediately, and only the few individuals with the best predicted performance are transplanted into the field.
- In the field, the automated platform continuously collects accurate phenotypic data and feeds them back in real time to optimize the GS model.
- Important lines are rapidly improved by gene editing, and the editing outcome and safety are verified by GBS.
- Finally, before the variety is released, its performance in different agro-ecological areas has already been accurately predicted.
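The first step of this workflow, proposing parental combinations, can be illustrated with a very small Python sketch. It assumes a purely additive model, under which the expected progeny mean of a cross is the mid-parent value, and breaks ties by the number of markers at which the parents differ as a crude proxy for the variation available in the progeny. The function name `rank_crosses`, the parent names, and all values are hypothetical; this is not the simulation of virtual crosses described earlier, only a simplified ranking.

```python
from itertools import combinations

def rank_crosses(gebv, genotypes, top=3):
    """Rank parent pairs by mid-parent GEBV, then by number of differing markers.

    gebv: dict of parent name -> predicted breeding value
    genotypes: dict of parent name -> list of marker dosages (0/1/2)
    """
    ranked = []
    for a, b in combinations(sorted(gebv), 2):
        mid_parent = (gebv[a] + gebv[b]) / 2
        differing = sum(x != y for x, y in zip(genotypes[a], genotypes[b]))
        ranked.append((a, b, mid_parent, differing))
    ranked.sort(key=lambda c: (c[2], c[3]), reverse=True)
    return ranked[:top]

# Toy parents (invented values).
parent_gebv = {"P1": 12.0, "P2": 11.0, "P3": 11.0, "P4": 8.0}
parent_geno = {
    "P1": [2, 0, 2, 0],
    "P2": [2, 0, 2, 2],
    "P3": [0, 2, 2, 0],
    "P4": [0, 2, 0, 2],
}
best_crosses = rank_crosses(parent_gebv, parent_geno)
```

Here P1 × P3 and P1 × P2 share the same mid-parent value, and the first is ranked higher only because its parents differ at more markers.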
Diagram of iBLUP pipeline (Yang et al., 2014)
Conclusion
GBS is evolving from a stand-alone genotyping tool into a core data bridge that connects genomics with phenomics, traditional breeding with cutting-edge biotechnology, and the laboratory with the field. Emerging sequencing technologies will continue to expand its capabilities, multi-omics integration will deepen its biological insight, its application in the gene editing pipeline will keep it at the frontier, and its core role in the digital breeding ecosystem will establish it as a basic supporting technology for future agriculture.
Under the pressure of global climate change and population growth, making full use of genetic variation and accelerating the breeding process has become an urgent need. GBS and the digital breeding paradigm it enables provide a powerful path to this goal. Continuous technological innovation, cost reduction, and interdisciplinary cooperation will together push GBS from research into routine application and lay a solid foundation for a new era of sustainable, data-driven precision agriculture.
FAQ
1. How do emerging long-read sequencing technologies (e.g., PacBio, Oxford Nanopore) enhance traditional GBS?
They overcome short-read limitations by anchoring reads in complex, repetitive genomic regions, provide haplotype information for allele-specific studies, and (for Nanopore) capture epigenetic data such as DNA methylation.
2. What value does integrating GBS with transcriptomics add to crop improvement?
It enables mapping of expression quantitative trait loci (eQTL) to link trait-associated SNPs with candidate genes, and detects allele-specific expression (ASE) to reveal cis-regulatory roles.
3. How does GBS support the gene editing pipeline?
Pre-editing: It evaluates target-site polymorphism and accessibility for gRNA design and scans for off-target risks. Post-editing: It screens edited mutants, confirms homozygosity, and assesses off-target effects.
4. What role does GBS play in digital breeding ecosystems?
It provides core genotype data to train genomic selection (GS) models, enable dynamic parental selection via virtual hybrid simulations, and power predictive breeding by integrating G×E (genotype×environment) data.
5. Why is portable sequencing (e.g., Oxford Nanopore MinION) impactful for future GBS applications?
It enables on-site genotyping (e.g., field-based candidate line verification during growing seasons), accelerating breeder decision-making and expanding GBS accessibility to individual labs.
References
- Glaubitz JC, Casstevens TM, Lu F, et al. "TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline." PLoS One. 2014;9(2):e90346.
- Bush SJ. "Read trimming has minimal effect on bacterial SNP-calling accuracy." Microb Genom. 2020;6(12):mgen000434.
- Torkamaneh D, Belzile F. "Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data." PLoS One. 2015;10(7):e0131533.
- Yang Y, Wang Q, Chen Q, et al. "A new genotype imputation method with tolerance to high missing rate and rare variants." PLoS One. 2014;9(6):e101025.