TL;DR – What this article covers and why it matters
Genomic selection is a breeding strategy that uses genome-wide DNA markers and statistical genomic prediction models to estimate breeding values before full field testing. By combining dense genotyping with high-quality phenotypes in a training population, breeders can predict genomic estimated breeding values (GEBV) for thousands of candidates and select earlier, faster, and with more confidence. This guide explains how genomic selection in plant breeding and animal breeding works in practice, how a genomic selection pipeline is built, how to choose between LC-WGS, GBS, and SNP genotyping for genomic selection projects, and which design choices most strongly affect prediction accuracy and genetic gain.
Key takeaways
Figure 1. Genomic selection pipeline summarized.
Genomic selection is changing breeding because it links genome-wide marker data directly to complex traits such as yield, fertility, health, and resilience. It provides a way to increase genetic gain per year without multiplying trial size or cost.
Traditional phenotypic selection relies on multi-year, multi-location trials to evaluate breeding candidates. For many crops and livestock species, a single selection cycle can take five to ten years, especially when you include seed or stock multiplication and regulatory steps. In the meantime, disease pressures, climate conditions, and market demands continue to evolve.
Genomic selection shortens this feedback loop. Young plants or juvenile animals are genotyped early, and genomic prediction models convert their marker profiles into GEBV. Instead of waiting for complete field performance, you can make informed decisions much earlier in the breeding cycle.
Several dairy cattle breeding programs, for example, have reported higher rates of genetic gain per year and shorter generation intervals after adopting genomic selection compared with previous schemes based on pedigree and phenotype alone. Similar trends are now seen in maize, wheat, rice, and other crops, where genomic prediction improves the efficiency of variety development and line recycling.
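The link between a shorter cycle and genetic gain per year can be illustrated with the classic breeder's equation, in which gain per year equals selection intensity times selection accuracy times additive genetic standard deviation, divided by cycle length. The sketch below is only an illustration: the function name and every number in it are hypothetical and are not taken from any specific breeding program.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation: gain per year = (i * r * sigma_A) / L."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical scenario: same selection intensity (about the top 10%),
# lower accuracy for genomic selection, but a much shorter cycle.
phenotypic = genetic_gain_per_year(intensity=1.76, accuracy=0.8,
                                   sigma_a=1.0, cycle_years=6)
genomic = genetic_gain_per_year(intensity=1.76, accuracy=0.6,
                                sigma_a=1.0, cycle_years=2)

print(f"phenotypic selection: {phenotypic:.3f} genetic SD per year")
print(f"genomic selection:    {genomic:.3f} genetic SD per year")
```

Under these assumed values, the shorter cycle more than compensates for the lower accuracy. With different inputs the comparison can go the other way, which is why accuracy and cycle length need to be estimated for each program.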
From the conversations we have with breeding and R&D teams, the main pain points are:
A genomic selection project, when designed carefully, addresses these issues by:
Genomic selection is a breeding method in which selection candidates are chosen based on breeding values predicted from genome-wide marker data and a training population that has both genotypes and phenotypes. In practice, genomic selection uses genomic prediction models to turn SNP profiles into GEBV that guide which individuals contribute to the next generation.
To work efficiently with genomic selection in plant breeding and animal breeding, it helps to understand a few core concepts.
A GEBV is the predicted breeding value of an individual derived from its genome-wide marker profile and a genomic prediction model trained on related individuals with known phenotypes. It plays the same role as a traditional breeding value but incorporates much more detailed genetic information than pedigree alone.
The training population is the set of individuals that have both high-quality phenotypes for target traits and genome-wide genotypes. The genomic prediction model "learns" marker–trait relationships from this group. A larger and more representative training population generally leads to higher prediction accuracy.
Figure 2. Overall summary of the most commonly used models in genomic selection. (Budhlakoti N. et al. (2022) Frontiers in Genetics)
A validation population is used to test how well the genomic prediction model performs on new material. It may be a subset of the training population held out in cross-validation or a separate set of lines or animals that represent future selection candidates.
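As a rough illustration of the hold-out idea, the sketch below runs k-fold cross-validation and reports predictive ability as the correlation between observed phenotypes and predictions made for held-out individuals. It assumes a simple ridge regression on marker genotypes as a stand-in for a real genomic prediction model, and it uses simulated data; the function names, the penalty value, and the simulation settings are all hypothetical.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centered markers (an RR-BLUP-style sketch)."""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Xc = X_train - mu_x
    # Dual (n x n) form, which is cheaper when markers outnumber individuals.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu_y)
    marker_effects = Xc.T @ alpha
    return mu_y + (X_test - mu_x) @ marker_effects


def cross_validated_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Correlation between observed values and held-out predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_hat = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_hat[test_idx] = ridge_predict(X[train_mask], y[train_mask],
                                        X[test_idx], lam)
    return np.corrcoef(y, y_hat)[0, 1]


# Simulated example: 300 individuals, 200 markers coded 0/1/2.
rng = np.random.default_rng(42)
n, m = 300, 200
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
g = X @ rng.normal(0.0, 1.0, size=m)           # simulated genetic values
y = g + rng.normal(0.0, g.std(), size=n)       # heritability of about 0.5

# Assumed penalty: error variance / marker-effect variance for this simulation.
lam = 2 * 0.3 * 0.7 * m
acc = cross_validated_accuracy(X, y, k=5, lam=lam)
print(f"5-fold predictive ability: {acc:.2f}")
```

Random folds tend to give optimistic results when close relatives fall on both sides of a split. If the goal is to mimic future selection candidates, holding out whole families or a later cohort is usually a more realistic test.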
A genomic prediction model is a statistical or machine learning model that links marker data to trait values. Common approaches include genomic best linear unbiased prediction (GBLUP), which models relationships among individuals through a genomic relationship matrix, and Bayesian methods that treat marker effects as random variables with different prior distributions. More advanced models can also incorporate genotype-by-environment interactions when multi-environment data are available.
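To make the GBLUP idea more concrete, here is a minimal sketch that builds a genomic relationship matrix from 0/1/2 allele counts (following VanRaden's first method) and predicts GEBV for ungenotyped-phenotype candidates from a phenotyped training set. It makes several simplifying assumptions that are not part of the original text: heritability is treated as known, the only fixed effect is an overall mean estimated as the training average, and the data are simulated. Production analyses normally estimate variance components with dedicated mixed-model software.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix from 0/1/2 allele counts (VanRaden method 1)."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    Z = M - 2.0 * p                           # center by expected allele count
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(G, y, train_idx, h2):
    """GEBV for every individual in G, using phenotypes of train_idx only.

    h2 is an ASSUMED heritability; lam = sigma_e^2 / sigma_g^2 = (1 - h2) / h2.
    Returned values are deviations from the training mean.
    """
    lam = (1.0 - h2) / h2
    y_train = y[train_idx]
    G_tt = G[np.ix_(train_idx, train_idx)]
    rhs = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)),
                          y_train - y_train.mean())
    return G[:, train_idx] @ rhs


# Simulated example: 200 individuals, 500 markers, first 150 phenotyped.
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.4, size=(200, 500)).astype(float)
true_bv = (M - M.mean(axis=0)) @ rng.normal(0.0, 1.0, size=500)
true_bv /= true_bv.std()
y = true_bv + rng.normal(0.0, 1.0, size=200)   # heritability of about 0.5

G = vanraden_g(M)
train_idx = np.arange(150)
candidates = np.arange(150, 200)
gebv = gblup(G, y, train_idx, h2=0.5)

accuracy = np.corrcoef(gebv[candidates], true_bv[candidates])[0, 1]
print(f"accuracy in unphenotyped candidates: {accuracy:.2f}")
```

Because the candidates contribute no phenotypes, their GEBV come entirely from genomic relationships with the training set. This is the reason relatedness between training and candidate populations matters so much in practice.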
Marker-assisted selection usually targets a small number of known QTL or major genes. Genomic selection, in contrast, uses thousands to millions of markers distributed across the genome, without needing to identify individual QTL beforehand.
A concise way to express "genomic selection vs marker-assisted selection" is to compare focus and applications:
| Aspect | Marker-Assisted Selection (MAS) | Genomic Selection (GS) |
|---|---|---|
| Genetic target | Few large-effect QTL | Genome-wide small + large effects |
| Best suited for | Major disease resistance, single genes | Complex traits like yield, fertility, resilience |
| Prior knowledge | Requires mapped markers or QTL | Can start from dense markers alone |
| Main output | Marker-based decisions | GEBV for ranking and selection |
In practice, many modern programs use both: MAS for specific major genes and genomic selection to optimize the polygenic background.
A genomic selection pipeline can be broken down into a series of practical steps. This is also how we typically structure genomic selection project discussions with breeders, CRO partners, and R&D teams.
Figure 3. Basic schema of the genomic selection process. (Budhlakoti N. et al. (2022) Frontiers in Genetics)
Before any genotyping is ordered, clarify:
Clear definitions reduce noise in phenotypic data and ensure that the genomic selection project supports your strategic breeding goals.
The training population is the foundation of genomic prediction. Good practice includes:
Breeders who invest early in a well-designed training population usually see better GEBV accuracy and smoother expansion of genomic selection in later cycles.
Once the training population is defined, you select a genotyping strategy for both training individuals and future selection candidates. Common options include:
A dedicated section below compares these approaches specifically for genomic selection projects.
After genotyping, genomic prediction models are fitted for each trait of interest. Practical points to consider:
Clear reporting of GEBV accuracy by trait helps breeders and decision-makers see where genomic selection will have the strongest impact.
Once you have reliable GEBV, you can:
Over time, genomic selection becomes part of a recurrent selection cycle. Each generation provides new phenotype and genotype data that update the training population and improve genomic prediction models.
Genotyping choice is one of the most common questions when planning a new genomic selection project. The genotyping platform influences marker density, genome coverage, data flexibility, and cost per sample.
Low-coverage whole-genome sequencing (LC-WGS) generates shallow reads across the entire genome and then uses genotype imputation to recover dense SNP data. It provides near-whole-genome coverage and is well suited to genomic selection and GWAS in species that lack mature SNP chips or require high marker density.
Genotyping-by-sequencing (GBS) and related RAD-based methods sample a fraction of the genome using restriction enzymes. They produce many markers at moderate coverage and are widely used in crops where cost per sample and throughput are key constraints.
SNP arrays use fixed sets of markers selected for important germplasm. They are robust and scalable, making them ideal for routine genotyping in species with established commercial chips or custom-designed SNP panels.
A side-by-side comparison helps breeders and technical managers choose the right strategy:
| Feature | LC-WGS (with imputation) | GBS / RAD-based | SNP arrays/panels |
|---|---|---|---|
| Genome coverage | Genome-wide | Subset of genome | Fixed marker set |
| Marker density | Very high after imputation | Moderate to high | Depends on chip design |
| Upfront setup | Moderate | Low to moderate | Higher if custom chip |
| Per-sample cost | Competitive at scale | Low | Low to moderate |
| Best fit | Genomic selection, GWAS, new or minor species | Early GS, diversity studies, population structure | Established species, large routine GS pipelines |
LC-WGS and GBS data can support both genomic selection and genome-wide association studies, which is attractive when you want to combine marker discovery with genomic prediction. SNP panels are efficient when you already know which markers work well in your germplasm and mainly need fast, routine genotyping.
From actual project discussions, a few typical scenarios emerge:
If a robust commercial SNP chip exists, a SNP panel can be a practical starting point for genomic selection in plant breeding. When higher marker density or more flexibility is needed, LC-WGS for Genomic Selection becomes appealing.
LC-WGS with imputation is often attractive because it does not rely on pre-defined chip content and can expand as reference panels grow.
GBS / RAD-based Genotyping offers a practical route to genomic selection in species with limited genomic resources and tight budgets.
Our LC-WGS for Genomic Selection, GBS / RAD-based Genotyping, and SNP Genotyping & Molecular Breeding solutions are designed to support these scenarios and help align genotyping with both current and future project needs.
Genomic selection is now used across numerous crops, livestock species, and aquaculture programs rather than being a purely theoretical concept.
Breeding programs in cereals, oilseeds, and legumes use genomic selection to:
Published multi-environment studies in maize and wheat show that genomic prediction models can capture a large share of genetic variance for yield and disease resistance, especially when training populations are well structured and represent target environments. Genomic selection in plant breeding allows breeders to discard weak lines earlier and reserve costly field trials for the most promising candidates.
Figure 4. Structure of simulated wheat breeding program running over 25 years. (Tessema B.B. et al. (2020) Frontiers in Genetics)
In dairy cattle, genomic selection was initially used for milk yield, fat and protein content, fertility, and disease resistance. It has reduced reliance on long and expensive progeny testing schemes and has increased the rate of genetic gain. Similar strategies are now deployed in beef cattle, pigs, poultry, and aquaculture, targeting growth, feed efficiency, survival, and product quality.
Both high-density SNP panels and LC-WGS are used in genomic selection in animal breeding. The choice depends on available reference resources, commercial chip options, and long-term data strategy.
Most commercial breeding programs need to improve several traits at once. Genomic selection adapts naturally to this situation:
Many programs start with a small set of high-priority traits for genomic selection and then expand the index as they gain confidence and collect more phenotypes.
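One simple way to combine several traits is a weighted index over standardized GEBV. The sketch below assumes that form; the candidate names, GEBV values, and economic weights are all hypothetical, and real programs often derive weights from economic models or desired-gains calculations instead.

```python
import numpy as np


def selection_index(gebv, weights):
    """Standardize each trait's GEBV, then combine with economic weights."""
    gebv = np.asarray(gebv, dtype=float)
    z = (gebv - gebv.mean(axis=0)) / gebv.std(axis=0)
    return z @ np.asarray(weights, dtype=float)


# Hypothetical GEBV for five candidates and three traits:
# yield, fertility, disease score (lower disease score is better).
candidates = ["C1", "C2", "C3", "C4", "C5"]
gebv = np.array([
    [1.2, 0.1, 3.0],
    [0.8, 0.4, 2.0],
    [2.0, 0.9, 1.0],
    [0.5, 0.2, 4.0],
    [1.5, 0.6, 2.5],
])
weights = [0.5, 0.3, -0.2]   # negative weight penalizes a high disease score

index = selection_index(gebv, weights)
ranking = [candidates[i] for i in np.argsort(-index)]
print("candidates ranked by index:", ranking)
```

Standardizing before weighting keeps a trait measured on a large scale from dominating the index. Adding a trait later only requires a new GEBV column and a new weight.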
Genomic selection does not replace GWAS or MAS; it complements them:
Linking genomic selection projects with existing GWAS and marker-assisted selection efforts can improve both efficiency and biological insight.
Well-designed genomic selection projects share several features. Below are practical design tips and realistic caveats based on published studies and project experience.
Traits with moderate to high heritability and stable scoring protocols are good candidates for early genomic selection.
Ensure the training population covers the germplasm that matters for your future candidates. Include elite lines, key donors, and materials relevant to target markets or production systems.
Genomic prediction models can absorb only a limited amount of phenotypic noise. Good trial design and replication remain essential for reliable trait values.
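The value of replication can be illustrated with the standard entry-mean heritability for a single environment, H² = σ²g / (σ²g + σ²e / r), where r is the number of replicates. The variance components below are hypothetical and chosen only to show the trend.

```python
def entry_mean_heritability(var_g, var_e, n_reps):
    """Heritability of a line mean over n_reps replicates in one environment."""
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    return var_g / (var_g + var_e / n_reps)


# Hypothetical noisy trait: error variance three times the genetic variance.
for reps in (1, 2, 4):
    h2 = entry_mean_heritability(var_g=1.0, var_e=3.0, n_reps=reps)
    print(f"{reps} replicate(s): entry-mean heritability = {h2:.3f}")
```

The gain from each extra replicate shrinks as replication increases, so the right balance between more replicates and more lines depends on trial cost and the size of the error variance.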
Align on data formats, sample naming, and metadata standards before genotyping. This reduces errors and speeds up downstream analysis and bioinformatics.
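A small check of sample names before analysis can catch many of these problems. The sketch below assumes a hypothetical naming convention (upper case, hyphens instead of underscores, no spaces) and reports which samples appear in both the genotype and phenotype files and which appear in only one; the IDs are made up for illustration.

```python
def normalize_id(sample_id):
    """Apply a HYPOTHETICAL naming convention: upper case, '-' not '_', no spaces."""
    return sample_id.strip().upper().replace(" ", "").replace("_", "-")


def reconcile_samples(genotype_ids, phenotype_ids):
    """Compare normalized sample IDs between genotype and phenotype tables."""
    geno = {normalize_id(s) for s in genotype_ids}
    pheno = {normalize_id(s) for s in phenotype_ids}
    return {
        "matched": sorted(geno & pheno),
        "genotype_only": sorted(geno - pheno),
        "phenotype_only": sorted(pheno - geno),
    }


report = reconcile_samples(
    genotype_ids=["line_001", "LINE-002 ", "Line-004"],
    phenotype_ids=["LINE-001", "line_002", "line_003"],
)
for group, ids in report.items():
    print(f"{group}: {ids}")
```

Samples that appear on only one side are worth resolving before model fitting, since silently dropped or mismatched individuals reduce the effective training population.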
Setting realistic expectations is important:
Frequently observed issues include:
Trying to run genomic selection with only a few dozen individuals usually leads to unstable models. Whenever possible, aim for at least a few hundred individuals in the training population.
Changes in scoring scale, observer bias, or uncontrolled field heterogeneity will reduce GEBV accuracy. Establish clear phenotyping SOPs and training for technicians.
If target environments differ strongly, consider building separate models for each region or using models that include environmental covariates.
Selecting the absolute lowest-cost option may be false economy if it cannot support future GWAS, new traits, or new germplasm. Comparing LC-WGS for Genomic Selection, GBS / RAD-based Genotyping, and SNP Genotyping & Molecular Breeding with your long-term goals in mind is important.
Genomic selection becomes even more powerful when combined with:
If you plan to upgrade your reference resources, it is worth considering how T2T and haplotype-resolved genomes can support future genomic selection and GWAS projects.
If genomic selection is new to your program, the first step is usually a structured discussion rather than an immediate sequencing order. A short checklist can help you prepare.
Before launching a genomic selection project, try to clarify:
The clearer these points are, the easier it is to design a realistic genotyping and genomic prediction plan.
CD Genomics provides sequencing and analysis services that map directly to the genomic selection pipeline:
By aligning genotyping and bioinformatics with your breeding goals, you can move from conceptual interest in genomic selection to a concrete project with clear timelines and deliverables.
Figure 5. Genomic selection startup workflow with CD Genomics, from breeding goals to a tailored genotyping and prediction plan.
If you are considering genomic selection in plant or animal breeding, the next step can be a brief consultation with our technical team. Share your current population structure, target traits, and available data, and we can help you decide whether LC-WGS for Genomic Selection, GBS / RAD-based Genotyping, or SNP Genotyping & Molecular Breeding is the best starting point, and what sample sizes and timelines are realistic for your first genomic selection cycle.
There is no single magic number, but prediction accuracy generally improves as the training population grows and tends to level off once it includes several hundred individuals that represent your target germplasm. For complex traits, many programs aim for 500–2,000 individuals in the training population when budget and logistics allow.
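For rough planning, a deterministic approximation often attributed to Daetwyler and colleagues relates expected accuracy to training size N, heritability h², and the number of independent chromosome segments Me: r ≈ sqrt(N·h² / (N·h² + Me)). The sketch below applies that formula. The value of Me is a hypothetical placeholder, since it depends on effective population size and genome length, and the output should be read as an order-of-magnitude guide rather than a forecast for any particular program.

```python
import math


def expected_accuracy(n_train, h2, m_e):
    """Approximate accuracy: sqrt(N * h2 / (N * h2 + Me))."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))


M_E = 1000   # HYPOTHETICAL number of independent chromosome segments

for h2 in (0.2, 0.5):
    for n_train in (100, 500, 2000):
        r = expected_accuracy(n_train, h2, M_E)
        print(f"h2={h2:.1f}  N={n_train:>4}  expected accuracy={r:.2f}")
```

The formula shows diminishing returns from adding individuals and also why lower-heritability traits need larger training populations to reach the same accuracy.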
A good reference genome is very helpful, especially for LC-WGS and GWAS, but it is not mandatory for every genomic selection project. GBS / RAD-based Genotyping can be used in species with limited genomic resources, and SNP arrays can work from known marker sets. Over time, however, investing in a better reference often pays off through improved marker placement and more robust genomic prediction.
Genomic selection tends to perform best for traits with moderate to high heritability, but it can still be useful for lower-heritability traits if you have large, well-phenotyped training populations. For very noisy traits, improving phenotyping and field design usually brings larger gains in prediction accuracy than changing the statistical model.
The core principles are similar, but implementation details differ. In cross-pollinated crops, genomic selection often focuses on recurrent population improvement and hybrid prediction, where relatedness patterns and heterozygosity matter. In self-pollinated crops, genomic selection is frequently applied to lines at inbred or near-inbred stages, and training populations can be designed to mirror the line development pipeline. In both cases, the training population should represent the germplasm you intend to improve.
Yes. Many programs start with a pilot phase, using a moderate training population and one or two key traits to test genomic selection in their own context. The pilot helps calibrate expectations for prediction accuracy and cost. When planning a pilot, it is still worth choosing genotyping platforms—such as LC-WGS for Genomic Selection or GBS / RAD-based Genotyping—that can scale to thousands of samples as your program grows.