Genotyping By Sequencing (GBS) is a cost-effective, high-throughput method using restriction enzymes and NGS to discover genome-wide SNPs. Key steps include DNA digestion, barcoding, library prep, and sequencing. Critical factors are enzyme choice, DNA quality, adapter design, and sequencing depth. GBS enables population genetics, QTL mapping, and breeding applications across diverse species, including polyploids, though challenges like uneven coverage exist.
This review covers the principles of GBS, the key factors that influence its results, common questions about the method, and its major challenges and limitations.
What is GBS
GBS is a simplified genotyping method built on high-throughput sequencing. It discovers and scores large numbers of single-nucleotide polymorphisms (SNPs) across many individuals. Its name says what it does: it determines genotypes by sequencing. Elshire and colleagues first developed it for maize in 2011, and it quickly spread to other crops, trees, livestock, fish, and even humans. GBS uses restriction enzymes to cut DNA at specific sites and sequences only the resulting fragments, mostly the regions next to the cut sites. This greatly lowers the work and cost of sequencing while still giving a good set of SNP markers across the whole genome.
Modern genome studies need fast, cheap, and efficient genotyping methods, and GBS was designed for this. It combines restriction enzyme digestion with high-throughput sequencing, which lets scientists score thousands of SNPs in many individuals at once, for example across whole populations or families. GBS has major benefits: it needs no prior sequence knowledge, it costs little, it handles many samples, and its steps are mostly standard. This makes GBS a powerful tool for large population genetics studies, association studies, mapping of quantitative trait loci (QTLs), germplasm evaluation, and molecular breeding.
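To make the reduced-representation idea concrete, here is a minimal sketch of an in silico digest. It uses the recognition sites of ApeKI (G^CWGC, where W is A or T) and HindIII (A^AGCTT). The toy sequence and the function name `digest` are made up for illustration, and the sketch scans only one strand of a linear sequence, so treat it as an approximation rather than a full digestion model.

```python
import re

# Recognition sites written as regular expressions; W in GCWGC means A or T.
# cut_offset is the position inside the site where the top strand is cleaved.
ENZYMES = {
    "ApeKI": {"pattern": "GC[AT]GC", "cut_offset": 1},   # G^CWGC
    "HindIII": {"pattern": "AAGCTT", "cut_offset": 1},   # A^AGCTT
}


def digest(sequence, enzyme):
    """Return the fragments produced by cutting `sequence` with `enzyme`."""
    spec = ENZYMES[enzyme]
    seq = sequence.upper()
    # A lookahead also finds overlapping sites (e.g. GCAGCAGC).
    site = re.compile("(?=" + spec["pattern"] + ")")
    cuts = [m.start() + spec["cut_offset"] for m in site.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two ApeKI sites and one HindIII site.
toy = "TTGCAGC" + "A" * 6 + "GCTT" + "CCCC" + "GCTGC" + "GG"

for name in ENZYMES:
    fragments = digest(toy, name)
    print(name, len(fragments), [len(f) for f in fragments])
```

On a real genome, the same function gives a quick estimate of how many fragments an enzyme produces and how large they are, before any library is built.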
Principles of GBS
Building a GBS library involves these key steps: sample plating, restriction digestion, adapter ligation, pooling and clean-up, PCR amplification, PCR product clean-up and recovery, library quality control, and sequencing.
- Sample plating: Place each DNA sample in its own well of a 96-well PCR plate under clean conditions, and add an adapter carrying a unique barcode to each well at the same time. The barcodes act as tags for each sample. They let you track and tell samples apart after the samples are pooled and sequenced together, which sets up the later sorting of large read sets.
- Restriction digestion: Carefully prepare the ApeKI enzyme mix and add it to each well holding DNA and adapters. ApeKI is a restriction enzyme that cuts DNA at specific sites and leaves fragments with defined ends. This is the key first step of library construction, and complete digestion is vital for library quality.
- Adapter ligation: Run the ligation reaction in the same plate after digestion. The adapters are designed to match the ends of the cut fragments, and DNA ligase joins them covalently. Control temperature, time, and enzyme amount carefully. Efficient ligation gives every fragment the "tag" it needs for later steps.
- Pooling and clean-up: Because each sample now carries its own barcode, equal amounts of all ligated samples can be mixed into a single library. Clean the pool with the QIAquick PCR Purification Kit, which binds DNA to a silica membrane and removes contaminants, residual enzyme, unligated adapters, and other small by-products. The clean DNA is a good template for PCR.
- PCR amplification: Design primers that match the adapter sequences so that they bind exactly to the adapters. Prepare a PCR mix with template DNA, primers, dNTPs, thermostable polymerase, and buffer, and run it in a thermal cycler. Each cycle has three steps: denaturation, primer annealing, and extension. Repeated cycles quickly produce enough DNA for quality checks and sequencing.
- PCR product clean-up and recovery: After PCR, clean the product with the Qiagen PCR clean-up kit, which also uses a silica membrane. It removes primer dimers, unused primers, dNTPs, and other PCR residues, improving DNA purity and quality. Elute the clean DNA in 32 μL of Qiagen EB buffer (the kit's elution buffer). The DNA is then ready for concentration measurement, quality checks, and sequencing.
- Library check and sequencing: Test the clean library with tools such as gel electrophoresis, spectrophotometers, and bioanalyzers. Check the key properties: concentration, size distribution, purity, and complexity. Only libraries that pass go on to high-throughput sequencing, which yields the data used in genetics and breeding research.
Schematic overview of steps in GBS library construction and sequencing
(Poland JA et al., 2012)
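After sequencing, the pooled reads have to be sorted back to their samples using the barcodes added in the first step. The sketch below shows one simple way to do this. It assumes each read starts with an inline barcode followed by the ApeKI cut-site remnant (CAGC or CTGC). The barcode table, sample names, and the function `demultiplex` are hypothetical, and only exact barcode matches are accepted.

```python
def demultiplex(reads, barcode_to_sample, cut_site_remnants=("CAGC", "CTGC")):
    """Assign reads to samples by their leading inline barcode.

    Returns {sample: [reads with the barcode trimmed off]}. Reads with no
    matching barcode, or without the expected cut-site remnant right after
    the barcode, are collected under the key "unassigned".
    """
    # Try longer barcodes first so that a short barcode which is a prefix
    # of a longer one cannot claim the longer barcode's reads.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    bins = {sample: [] for sample in barcode_to_sample.values()}
    bins["unassigned"] = []
    for read in reads:
        for bc in barcodes:
            insert = read[len(bc):]
            if read.startswith(bc) and insert.startswith(cut_site_remnants):
                bins[barcode_to_sample[bc]].append(insert)
                break
        else:
            bins["unassigned"].append(read)
    return bins


# Hypothetical barcodes and reads, for illustration only.
barcode_table = {"CTCC": "sample_A", "TGCA": "sample_B"}
example_reads = ["CTCCCAGCTTAGGA", "TGCACTGCAAGT", "GGGGCAGCTT"]
sorted_reads = demultiplex(example_reads, barcode_table)
print({sample: len(r) for sample, r in sorted_reads.items()})
```

Production pipelines usually also tolerate a sequencing error in the barcode. Exact matching is kept here so the logic stays easy to follow.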
Key Factors Influencing GBS Results
- Selection of Restriction Enzyme: Enzyme choice matters for GBS libraries. The enzyme type, cutting frequency (set largely by recognition-site length), and cut style (blunt or sticky ends) decide which genomic regions get captured. They control fragment number, size, and distribution, which in turn affects SNP tag count, distribution, and coverage. For example, ApeKI recognizes the degenerate five-base site GCWGC (W = A or T) and leaves sticky ends with a three-base overhang. It cuts frequently and suits dense SNP tags. HindIII recognizes the six-base site AAGCTT and leaves sticky ends with a four-base overhang. It cuts less often and fits medium tag density. Frequent cutters make many small fragments, which helps find many SNPs. Rare cutters make fewer, larger fragments, which fits genetic mapping. Cut style also matters: blunt ends ligate more slowly but are stable, while sticky ends ligate faster but need adapters with a perfectly matching overhang. Pick enzymes based on your platform and goals.
Properties of genotyping-by-sequencing libraries made with three different enzymes (Hamblin MT et al., 2014)
- DNA Quality and Quantity: DNA quality and amount matter for GBS libraries. Poor DNA causes a chain of problems: incomplete digestion leaves overly long fragments, inefficient adapter ligation lowers library complexity, PCR bias over-amplifies some regions and misses others, and low sequencing depth causes missing or wrong genotype calls. Good DNA should be pure (OD260/280 ratio of 1.8-2.0 and OD260/230 ratio >2.0, with no protein or chemical contamination), sufficiently concentrated (≥50 ng/μL, ≥1 μg in total), and intact, without degradation (check by gel electrophoresis or bioanalyzer). Use the CTAB method or a commercial kit to extract good DNA. Always check purity, concentration, and integrity.
- Adapter Design and Barcoding Strategy: Adapter and barcode design affects sample sorting and data quality. Adapters must work with your sequencing platform (for example, P5/P7 for Illumina), avoid unwanted pairing such as adapter dimers, and be of medium length (18-25 bp) for efficient ligation. Barcodes must be unique per sample, 8-12 bp long (giving 4^8 to 4^12 possible sequences), and robust to misreads (keep a pairwise Hamming distance ≥2 so that a single-base error cannot turn one barcode into another). The number of barcodes limits how many samples can be pooled. For example, a 96-well plate uses 96 unique barcodes. But balance matters: an Illumina NovaSeq run can pool 384 samples at once, yet each sample may then get low depth (1-2x), which hurts genotype accuracy.
- PCR Amplification Bias: PCR amplification is a key step in GBS library construction but has inherent biases. Too many PCR cycles can lead to over-amplification of specific fragments. Poor conditions (e.g., bad primer design, insufficient dNTPs, or low enzyme activity) can cause uneven amplification. This shows up as low amplification efficiency in regions of extreme GC content, easier amplification of short fragments than long ones, and potential masking of low-abundance alleles. To reduce biases, optimize PCR conditions by using high-fidelity polymerases (e.g., Q5 or Phusion), limiting cycle numbers (usually 12-15), and adjusting primer concentrations and annealing temperatures. Alternatively, consider methods like Multiple Displacement Amplification (MDA) or MALBAC. They improve amplification uniformity and coverage but may raise costs and complexity.
- Sequencing Depth: Sequencing depth is a core factor in genotype determination. The average sequencing depth per sample and site directly impacts accuracy. When depth is below 5x, there is a significant increase in genotype missing rates (exceeding 30%) and error rates (with false-positive rates over 10%). Above 20x depth, data redundancy increases costs (by about 5-10% for each additional 1x), but this becomes necessary for detecting complex genomes or low-frequency alleles.
- Reference Genome Quality: The integrity, accuracy, and assembly quality of the reference genome directly impact read alignment efficiency and the accuracy of SNP calling. Complex or highly repetitive regions may reduce the effectiveness of these processes.
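The barcode rules above are easy to check before ordering adapters. The sketch below counts how many sequences a given barcode length allows (four bases per position, so 4^n) and finds the smallest pairwise Hamming distance in a barcode set. The barcodes are hypothetical, and equal-length barcodes are assumed. Note that a minimum distance of 2 only lets you detect a single-base error. Correcting it needs a minimum distance of at least 3.

```python
from itertools import combinations


def possible_barcodes(length):
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs barcodes of equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical 8 bp barcode set, for illustration only.
barcode_set = ["ACGTACGT", "ACGTTGCA", "TGCAACGT"]

print(possible_barcodes(8), possible_barcodes(12))
print(min_pairwise_distance(barcode_set))
```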
Common Questions about GBS
How to Choose the Tag Count
A single Tag is the sequence of a sequencing read adjacent to an enzyme cut site. The genomic scope captured by GBS equals the number of Tags multiplied by the length of an individual read. The number of markers required varies with the research objective. For instance, genome-wide association studies may require tens of thousands of high-density markers, whereas studies of phylogenetic relationships and linkage analysis do not need such a high marker density. A few hundred to several thousand markers are generally sufficient for these analyses. It is therefore advisable to first estimate the number of markers your study needs and then select the appropriate Tag count. As a general guideline based on genome size, 100,000 Tags are typically recommended for genetic linkage mapping in species with a genome smaller than 1 Gb. However, you can also opt for a higher Tag count according to specific requirements.
Coefficient of variation of GBS reads per sequencing channel for sequential sequencing runs (Elshire RJ et al., 2011)
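The relationship between Tag count, read length, and genome size can be worked through with simple arithmetic. In the sketch below, the 100,000 Tags and the 1 Gb genome come from the guideline above. The 100 bp read length and the one million reads per sample are assumptions chosen only to make the numbers easy to follow.

```python
def captured_bases(n_tags, read_length):
    """Genomic scope captured: number of Tags times the read length."""
    return n_tags * read_length


def captured_fraction(n_tags, read_length, genome_size):
    """Share of the genome covered by the captured Tags."""
    return captured_bases(n_tags, read_length) / genome_size


def mean_depth_per_tag(reads_per_sample, n_tags):
    """Average number of reads per Tag if reads were spread evenly."""
    return reads_per_sample / n_tags


n_tags = 100_000             # from the guideline above
genome_size = 1_000_000_000  # 1 Gb genome, from the guideline above
read_length = 100            # assumed read length in bp
reads_per_sample = 1_000_000 # assumed sequencing effort per sample

print(captured_bases(n_tags, read_length))
print(captured_fraction(n_tags, read_length, genome_size))
print(mean_depth_per_tag(reads_per_sample, n_tags))
```

Under these assumptions the library captures 10 Mb, or about 1% of the genome, at an average of 10 reads per Tag. Real coverage is less even than this average suggests, so the figure is best read as an upper bound for planning.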
Can GBS Be Used for Polyploid Species
GBS works for polyploid species. In 2014, GBS was used successfully for genetic mapping in hexaploid oat. Polyploid species are complex: they can be autopolyploid or allopolyploid, and they come in tetraploid and hexaploid forms, so each case is unique. Wheat and cotton are now also being studied with GBS for genetic mapping.
The core challenge of GBS in polyploid species (such as tetraploids and hexaploids) lies in genotyping complexity. Polyploid genomes carry multiple homologous chromosome sets, so homologous sequences are hard to tell apart, and estimates of SNP allele dosage (e.g., AAAA vs. AAAB in tetraploids) are easily distorted by sequencing depth bias and PCR amplification preferences. Polymorphism at restriction sites can lead to allele dropout, and incomplete reference genomes or undistinguished subgenomes can lower alignment accuracy. Moreover, multivalent chromosome pairing in meiosis may cause abnormal recombination, so specialized tools (such as polyRAD and updog) are needed to resolve genotypes under high heterozygosity.
Distribution of GBS loci across the oat genome (Huang YF et al., 2014)
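To show why allele dosage is hard to call at low depth, here is a minimal sketch that scores each possible dosage in a tetraploid with a plain binomial likelihood. This is not the algorithm used by polyRAD or updog, which rely on richer models. The function names, the symmetric 1% error rate, and the read counts are assumptions for illustration, and the model ignores allelic bias and overdispersion.

```python
from math import comb, log


def dosage_log_likelihoods(ref_reads, alt_reads, ploidy=4, error_rate=0.01):
    """Binomial log-likelihood of the read counts for each alternate-allele dosage."""
    n = ref_reads + alt_reads
    scores = {}
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Chance that one read shows the alternate allele, allowing for a
        # symmetric sequencing error (an assumed, simplified error model).
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        scores[dosage] = (log(comb(n, alt_reads))
                          + alt_reads * log(p_alt)
                          + ref_reads * log(1 - p_alt))
    return scores


def most_likely_dosage(ref_reads, alt_reads, ploidy=4, error_rate=0.01):
    """Dosage (0..ploidy) with the highest log-likelihood."""
    scores = dosage_log_likelihoods(ref_reads, alt_reads, ploidy, error_rate)
    return max(scores, key=scores.get)


# Hypothetical read counts (reference, alternate) at one SNP.
for counts in [(15, 5), (10, 10), (3, 1)]:
    print(counts, most_likely_dosage(*counts))
```

With 20 reads, the gap between the best and second-best dosage is clear. With only 4 reads, as in the last case, neighbouring dosages score almost the same, which is why polyploid genotyping needs more depth than diploid genotyping.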
What Software Is Used for GBS Data Analysis
GBS (Genotyping-by-Sequencing) data analysis encompasses multiple stages, including quality control of raw sequencing data, SNP detection, genotype filtering, and association analysis.
- Among the core software tools, TASSEL-GBS serves as the standard pipeline, supporting both reference-based and de novo SNP calling. It integrates population genetic parameters such as minor allele frequency (MAF) and inbreeding coefficient (Fit) for efficient filtering. For instance, requiring a MAF above 1% and an Fit above 0.8 can remove many false-positive sites caused by sequencing errors. The pipeline ultimately outputs genotypes as VCF files.
- Stacks excels in de novo analysis for non-model species, constructing loci and haplotypes to effectively handle population genetic diversity.
- GATK is used for variant detection and genotype correction and performs particularly well in complex genomic regions. However, its Phred-scaled genotype likelihoods (PL) must be converted into dosages with care, because PL encodes the relative likelihood of each genotype rather than a direct allele dosage.
- VCFtools offers flexible variant filtering based on criteria such as depth (DP) and missing rate (--max-missing).
- PLINK is primarily used for population genetic structure and association analysis, supporting corrections for population stratification, LD filtering (--indep-pairwise 50 5 0.2), and heterozygosity tests to prevent false associations due to population structure.
- FastQC and Trimmomatic handle quality control and read trimming, with FastQC providing visualizations of raw data quality, such as base quality distribution and adapter contamination, while Trimmomatic trims low-quality bases using a sliding window and removes adapters to ensure accuracy in subsequent analyses.
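The caution about PL values can be illustrated with a small conversion. PL is a Phred-scaled likelihood, so each value maps back to a relative likelihood of 10^(-PL/10). The sketch below normalizes these into genotype probabilities and takes the probability-weighted allele count as the dosage. It assumes a biallelic diploid site with PL values in the order 0/0, 0/1, 1/1 and a flat prior over genotypes. The function names and PL values are made up for illustration.

```python
def pl_to_probabilities(pl):
    """Convert Phred-scaled genotype likelihoods to normalized probabilities.

    Assumes a flat prior over genotypes, so the normalized likelihoods are
    used directly as genotype probabilities.
    """
    likelihoods = [10 ** (-value / 10) for value in pl]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


def expected_dosage(pl):
    """Expected alternate-allele count for PL given in the order 0/0, 0/1, 1/1."""
    probabilities = pl_to_probabilities(pl)
    return sum(count * p for count, p in enumerate(probabilities))


# Hypothetical PL triples: a confident homozygous reference call,
# a heterozygous call, and an uncertain low-depth call.
for pl in [(0, 30, 300), (20, 0, 20), (0, 3, 25)]:
    print(pl, round(expected_dosage(pl), 3))
```

The uncertain call ends up with a dosage between 0 and 1 rather than a hard genotype, which carries the uncertainty into downstream analysis instead of hiding it.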
Major Challenges and Limitations of GBS
- Allelic Dropout: SNPs in or near restriction sites may cause digestion failure or adapter ligation problems. The allele carrying the altered site then goes unsequenced in the samples that have it, so nearby SNPs may be miscalled (for example, a heterozygote called as a homozygote) or filtered out.
- Uneven Coverage: Restriction sites are spread unevenly across the genome. Some regions get many tags, others few or none, which makes an even marker distribution hard to achieve.
- High DNA Quality Requirement: GBS is sensitive to DNA damage. Old or broken DNA samples work poorly.
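Uneven coverage and missing data can be put into numbers with two simple summaries: the coefficient of variation (CV) of read depth across tags, and the share of tags that fall below a minimum depth. The sketch below is one way to compute them. The depth values, the function names, and the depth thresholds are assumptions for illustration.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of depth: standard deviation divided by mean."""
    average = mean(depths)
    return pstdev(depths) / average if average else float("nan")


def missing_rate(depths, min_depth=1):
    """Fraction of tags whose depth falls below `min_depth`."""
    return sum(d < min_depth for d in depths) / len(depths)


# Hypothetical read depths at six tags in one sample.
tag_depths = [0, 2, 4, 10, 0, 8]

print(round(coverage_cv(tag_depths), 3))
print(round(missing_rate(tag_depths), 3))
print(round(missing_rate(tag_depths, min_depth=5), 3))
```

A CV near zero means reads are spread evenly. Values close to or above 1, as in this toy example, mean that a few tags absorb most of the reads while others are left with little or no data.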
GBS, a streamlined and cost-effective reduced-representation sequencing technology, enables rapid, genome-wide discovery and genotyping of high-density SNPs in large numbers of samples through a standardized procedure of enzyme digestion, library construction, pooling, and sequencing. Although it faces challenges such as coverage bias, high DNA quality requirements, and missing data, its broad applications in population genetics, association analysis, molecular breeding, and germplasm evaluation have demonstrated its value. As sequencing costs drop and bioinformatics analysis improves, GBS will keep supporting genomics-driven life sciences. Future efforts may focus on optimizing enzyme digestion (e.g., dual-enzyme digestion), boosting multiplexing capability, refining polyploid genotyping algorithms, and better integrating long-read sequencing to overcome coverage gaps.
References
- Poland JA, Rife TW. "Genotyping-by-Sequencing for Plant Breeding and Genetics". Plant Genome. 2012, 5(3), 92-102 https://doi.org/10.3835/plantgenome2012.05.0005
- Hamblin MT, Rabbi IY. "The Effects of Restriction-Enzyme Choice on Properties of Genotyping-by-Sequencing Libraries: A Study in Cassava". Crop Science. 2014, 54(6) https://doi.org/10.2135/cropsci2014.02.0160
- Elshire RJ, Glaubitz JC, Sun Q, et al. "A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species". PLOS ONE. 2011, 6(5) https://doi.org/10.1371/journal.pone.0019379
- Huang YF, Poland JA, Wight CP, et al. "Using Genotyping-By-Sequencing (GBS) for Genomic Discovery in Cultivated Oat". PLOS ONE. 2014, 9(7) https://doi.org/10.1371/journal.pone.0102448