With the rapid development of high-throughput sequencing, genotyping-by-sequencing (GBS) has emerged as an efficient and economical genotyping technology that has reshaped plant genetics research and breeding practice. GBS is essentially a multiplexed, reduced-representation sequencing method based on restriction endonuclease digestion. It allows large-scale discovery and genotyping of single-nucleotide polymorphism (SNP) markers across the whole genome in a single step, without requiring a reference genome.
This article describes the core methodology and workflow of GBS, analyzes its advantages over traditional genotyping platforms (such as SNP chips and whole-genome sequencing), and discusses its applicability and future prospects in breeding programs of different scales and for different crops.
To understand GBS accurately, we must grasp its two core characteristics: multiplicity (multiplexing of many samples in one library) and simplified representativeness (reduced-representation sequencing).
GBS can therefore be defined as a low-cost genome analysis technology based on restriction endonuclease digestion that simultaneously discovers and genotypes thousands of SNP markers by constructing a multiplexed sequencing library and sequencing a reduced representation of the genome.
Its core output is a genotype matrix covering all samples at thousands of SNP sites (usually stored in VCF format), which can be used directly in breeding applications such as genetic diversity analysis, population structure analysis, linkage map construction, genome-wide association studies (GWAS), and genomic selection (GS).
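To make the idea of a genotype matrix concrete, the following is a minimal sketch of how the GT field of a VCF file could be turned into a site-by-sample matrix of alternate-allele counts. It assumes diploid calls and a small, uncompressed, tab-delimited VCF; the function name `vcf_to_matrix` and the toy records are illustrative and do not come from any specific GBS pipeline.

```python
import io

def vcf_to_matrix(handle):
    """Parse a minimal VCF stream into (samples, sites, matrix).

    matrix[i][j] is the number of non-reference alleles for site i in
    sample j, or None when the genotype call is missing ("./.").
    """
    samples, sites, matrix = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        gt_index = fields[8].split(":").index("GT")
        row = []
        for call in fields[9:]:
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                row.append(None)  # missing call
            else:
                row.append(sum(1 for a in alleles if a != "0"))
        sites.append((fields[0], int(fields[1])))
        matrix.append(row)
    return samples, sites, matrix

# Toy input (hypothetical records, for illustration only)
example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t101\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:8\t0/1:5\t1/1:7",
    "chr1\t250\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t0/0:4\t0/1:6",
]) + "\n"

samples, sites, matrix = vcf_to_matrix(io.StringIO(example_vcf))
for site, row in zip(sites, matrix):
    print(site, dict(zip(samples, row)))
```

In practice a dedicated VCF library would normally be used for real data; the sketch only shows how the 0/1/2 coding used by many downstream analyses relates to the GT strings in the file.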
GBS adapters, PCR and sequencing primers (Elshire et al., 2011)
The complete GBS workflow can be divided into two stages: wet-lab library construction and dry-lab bioinformatics analysis.
The raw sequencing data (FASTQ format) must pass through a series of bioinformatics processing steps before they can be converted into reliable genotype data.
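One of the first of these processing steps is demultiplexing, in which pooled reads are assigned back to their samples by barcode. The sketch below assumes inline barcodes at the 5' end of each read and exact matching only; the barcode sequences, sample names, and function names are hypothetical, and production tools typically also tolerate sequencing errors in the barcode.

```python
import io
from collections import defaultdict

def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual

def demultiplex(handle, barcodes):
    """Assign reads to samples by exact match of an inline 5' barcode.

    The barcode is trimmed from both sequence and quality string.
    Reads matching no barcode are collected under 'unassigned'.
    """
    # Try longer barcodes first so that a short barcode which happens to be
    # a prefix of a longer one does not capture the longer one's reads.
    ordered = sorted(barcodes.items(), key=lambda kv: -len(kv[0]))
    bins = defaultdict(list)
    for name, seq, qual in read_fastq(handle):
        for barcode, sample in ordered:
            if seq.startswith(barcode):
                n = len(barcode)
                bins[sample].append((name, seq[n:], qual[n:]))
                break
        else:
            bins["unassigned"].append((name, seq, qual))
    return dict(bins)

# Toy reads (hypothetical): two carry a known barcode, one does not
reads = [("r1", "ACGTCAGCAAAT"), ("r2", "TTGCACAGCGGGT"), ("r3", "GGGGCAGCTTTT")]
fastq_text = "".join(f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in reads)

bins = demultiplex(io.StringIO(fastq_text), {"ACGT": "sampleA", "TTGCA": "sampleB"})
for sample, records in bins.items():
    print(sample, [(name, seq) for name, seq, _ in records])
```

Variable-length barcodes, as in this toy example, are the reason the longest barcodes are tested first.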
The proportion of variance for analyzed traits from the 7 genetic models fitted (Dong et al., 2024)
GBS has quickly become a tool of choice for plant breeders because of the following advantages:
Cost-effectiveness is the most striking advantage of GBS. Multiplexed sequencing reduces the cost per sample to a very low level. Compared with chips or whole-genome resequencing, which can cost hundreds of dollars per sample, GBS can be kept at tens of dollars per sample or even lower. This makes genotyping thousands of breeding materials or lines economically feasible and greatly expands the scale of breeding projects.
For many non-model organisms, orphan crops, or forest species, high-quality reference genomes are often unavailable. Because GBS sequences short tags from a reduced representation of the genome, markers can be developed and genotyped through a de novo (ab initio) assembly strategy. This independence from a reference genome greatly broadens its range of application and opens the door to genetic improvement of species with limited genomic resources.
Hundreds to thousands of samples can be processed in a single sequencing run, which makes truly large-scale population genotyping possible. This high-throughput capacity meets the need in modern breeding for rapid, large-scale screening of early segregating populations, core germplasm collections, and the large number of new lines produced every year.
Unlike chip technology, which relies on known SNPs, GBS surveys the genome afresh in every analysis. It can genotype known SNPs and at the same time discover new, rare, or population-specific SNPs. This is especially important for species with rich genetic diversity or when mining new alleles, because markers can be discovered and applied in the same step.
As mentioned above, selecting appropriate restriction enzymes (such as *Ape*KI) lets GBS preferentially enrich the low-copy, gene-coding regions of the genome. The resulting SNP markers are therefore more likely to lie in or be linked to gene regions with biological functions, which improves the probability of locating candidate genes in GWAS or linkage analysis and increases the usefulness of the markers.
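How an enzyme choice shapes the sequenced fraction can be explored with a simple in silico digestion. The sketch below assumes the *Ape*KI recognition site GCWGC (W = A or T) with the cut placed after the first G; the cut offset, the size window, and the toy sequence are assumptions for illustration and should be checked against the enzyme supplier's documentation before use on real data.

```python
import re

# Lookahead pattern so that overlapping recognition sites are all reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # assumed cut position inside the site: G^CWGC

def digest(sequence, site=APEKI_SITE, cut_offset=CUT_OFFSET):
    """Cut a single-stranded sequence string at every recognition site."""
    seq = sequence.upper()
    cuts = [m.start() + cut_offset for m in site.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the selected window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Toy sequence (hypothetical) containing one GCAGC and one GCTGC site
toy = "AAAAGCAGCTTTTTTGCTGCCC"
fragments = digest(toy)
print(fragments)
print(size_select(fragments, 6, 20))
```

Running the same digestion over a real genome assembly with several candidate enzymes gives a rough estimate of how many fragments fall in the sequenceable size range, which is one way to compare enzymes before committing to a library protocol.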
Distribution of GBS SNP markers in the Oregon Wolfe Barley (OWB) bin map (Poland et al., 2012)
To evaluate the applicability of GBS comprehensively, it is necessary to compare it with two other mainstream technologies: SNP chips and whole-genome resequencing (WGS).
Comparison between GBS, SNP array and WGS
| Feature Dimension | GBS | SNP Array | WGS |
|---|---|---|---|
| Technical Principle | Reduced representation sequencing | Hybridization and fluorescence detection of known SNPs | Random fragment sequencing of the entire genome |
| Information Content | Combines unknown and known SNPs, moderate marker density, covers specific restriction sites | Limited to pre-designed known SNPs on the array; fixed markers; no new variant discovery | Nearly all genome-wide variations (SNP, InDel, SV, etc.); most comprehensive information |
| Throughput and Cost | High throughput; extremely low per-sample cost; suitable for ultra-large populations | High throughput; moderate per-sample cost (rises with array density) | Low throughput; high per-sample cost (higher for deep sequencing) |
| Reference Genome Dependence | Non-mandatory; de novo analysis applicable | Strongly dependent; array design requires known genome and SNP information | Strongly dependent; data analysis relies heavily on high-quality reference genome |
| Data Complexity | Moderate; requires some bioinformatics support | Low; mature and standardized data analysis workflow | High; massive data volume; high demands for storage and computing resources |
| Marker Uniformity and Reproducibility | Moderate; affected by digestion efficiency and sequencing depth; some missing data | High; high genotype call rate; excellent reproducibility | High; optimal uniformity and reproducibility with sufficient sequencing depth |
| Main Application Scenarios | Large-scale breeding population screening | Major crops with mature arrays (e.g., maize, soybean, wheat) | Basic research (e.g., evolution, population genetics) |
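The missing data that the table attributes to GBS is commonly handled by filtering markers before downstream analysis. The following is a minimal sketch of such a filter on a site-by-sample matrix of alternate-allele counts (0, 1, 2, or `None` for a missing call), assuming diploid genotypes. The thresholds of 0.8 for call rate and 0.05 for minor allele frequency are illustrative assumptions, not recommendations from the source.

```python
def call_rate(row):
    """Fraction of samples with a non-missing genotype at one site."""
    return sum(g is not None for g in row) / len(row)

def minor_allele_frequency(row):
    """Minor allele frequency at one site, ignoring missing calls (diploid)."""
    called = [g for g in row if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def filter_sites(matrix, min_call_rate=0.8, min_maf=0.05):
    """Return indices of sites that pass both the call-rate and MAF thresholds."""
    return [
        i for i, row in enumerate(matrix)
        if call_rate(row) >= min_call_rate
        and minor_allele_frequency(row) >= min_maf
    ]

# Toy matrix (hypothetical): rows are sites, columns are samples
genotypes = [
    [0, 1, 2, 0, 1],           # fully called, polymorphic
    [None, None, 1, 0, None],  # mostly missing
    [0, 0, 0, 0, 0],           # monomorphic
    [2, 2, None, 1, 2],        # one missing call, polymorphic
]
kept = filter_sites(genotypes)
print(kept)
```

Sites that survive filtering may still contain missing calls, which is why GBS datasets are often imputed before GWAS or genomic selection.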
GBS vs. SNP chip
GBS vs. WGS
Conclusion: GBS achieves the best balance among cost, throughput, flexibility, and adaptability to non-model species. It does not completely replace chips or WGS, but it provides a very competitive middle way that is especially suitable for plant breeding projects that are still developing and have relatively limited resources but want to adopt genomics technology.
Response to selection with GBS data in the expanding prediction set based on accuracies of genomic predictions (Gorjanc et al., 2015)
Since the advent of GBS, its data stability, reproducibility, and marker density have improved significantly through continuous technical optimization (such as the introduction of a two-enzyme digestion system to improve complexity reduction and optimized adapter design to improve efficiency) and through better bioinformatics tools. It has been applied successfully to many species, ranging from annual field crops to perennial fruit trees, and has produced valuable results in germplasm resource identification, high-density genetic map construction, QTL mapping of important traits, genome-wide association studies, and genomic selection.
In short, GBS has become an indispensable tool in modern plant breeding because of its core methodological advantages: high cost-effectiveness, independence from a reference genome, high throughput, and simultaneous marker discovery and genotyping. It bridges the gap between traditional molecular markers and expensive whole-genome sequencing, democratizes the application of genomics in plant breeding, and provides a key technical driving force for accelerating crop genetic gain and meeting global food security challenges.
1. What core characteristics define Genotyping-by-Sequencing (GBS) in plant genomics?
It has two key traits: multiplicity (pooling dozens to hundreds of samples via barcodes to cut costs) and simplified representativeness (using restriction enzymes to sequence specific genomic regions instead of the whole genome).
2. What are the two main stages of the GBS technical workflow?
The wet-lab procedure (DNA extraction → restriction digestion → library prep → sequencing) and the bioinformatic analysis process (data QC → demultiplexing → alignment → SNP calling → genotype matrix generation).
3. Does GBS require a reference genome for plant genotyping?
No. For species without a reference genome, GBS uses an "ab initio" assembly strategy (building a pseudo-reference from high-quality reads) to develop and genotype markers.
4. How does GBS compare to SNP chips in cost and marker discovery?
GBS has lower per-sample costs (tens of dollars) and discovers new/rare SNPs; SNP chips rely on pre-designed known SNPs and have moderate costs, with higher data reproducibility.
5. What makes GBS suitable for large-scale plant breeding projects?
Its high throughput (processing hundreds/thousands of samples per run), cost-effectiveness, and ability to simultaneously perform marker discovery and genotyping meet large-population screening needs.
References