The Role of GWAS in Agriculture: Workflow, Applications and Advantages

With the completion of the Human Genome Project and the rapid development of sequencing technology, genome-wide association studies (GWAS) have become an important tool for exploring the genetic mechanisms of complex human diseases. The core idea of GWAS is to find genetic variants significantly associated with a specific phenotype (such as disease status) by comparing the genotype data of a large number of individuals with their phenotype data.

What is GWAS

GWAS is an analytical method for studying the association between genotype and phenotype at the population level. It uses high-throughput genotyping technology to scan the genomes of a large number of individuals and detect single nucleotide polymorphisms (SNPs) or other genetic variants associated with specific traits or diseases. The main purpose of GWAS is to reveal the associations between these variants and phenotypes, providing important information for understanding the genetic mechanisms of diseases, discovering new disease susceptibility genes and developing personalized treatment strategies.

The basic principle of GWAS is to compare the genomes of different individuals and identify the genetic variants associated with the traits of interest. These variants are usually SNPs, that is, single-base differences in the DNA sequence. In a GWAS, genome sequencing or SNP chip analysis is performed on a large number of samples, and the genotype data are then tested for association with the phenotypic data.

The models commonly used in GWAS include the general linear model (GLM) and the mixed linear model (MLM). The GLM treats the phenotype as the sum of a genetic effect (the SNP effect) and random error, while the MLM extends the GLM with a random effect that accounts for genetic relatedness (kinship) among individuals. These models help identify genetic variants that are significantly associated with phenotypes at the population level.
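In one common notation, the two models can be written as follows. The symbols are assumptions chosen for illustration and do not come from the original text: y is the phenotype vector, X holds fixed covariates, s is the genotype vector of the tested SNP with effect α, Z maps individuals to the random polygenic effect u, and K is the kinship matrix.

```latex
% GLM: fixed effects plus random error
y = X\beta + s\alpha + e, \qquad e \sim N(0, \sigma_e^2 I)

% MLM: the GLM plus a random effect u whose covariance follows the kinship matrix K
y = X\beta + s\alpha + Zu + e, \qquad u \sim N(0, \sigma_g^2 K), \quad e \sim N(0, \sigma_e^2 I)
```

The only difference is the term Zu: by letting related individuals share part of their phenotypic variance, the MLM reduces false associations caused by relatedness.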

The stepwise workflow of the proposed framework (Jangale et al., 2024)

How Does GWAS Work

A GWAS follows a systematic and rigorous workflow with several key steps that ensure the accuracy and reliability of the research. Each step is described below.

Sample Collection

Collecting DNA and phenotypic information: Firstly, it is necessary to collect DNA samples from a large number of individuals and record the corresponding phenotypic information, such as disease status and physiological characteristics. This information is very important for subsequent data analysis and result interpretation.

Determine the sample size and diversity: The size and diversity of the sample directly affect the statistical power of GWAS and the generalizability of the results. Therefore, it is necessary to determine the sample size carefully before the study begins and to ensure that the sample is sufficiently representative in terms of gender, age and race.

Genotyping

Microarray technology: This is a common genotyping technology, and thousands of SNP loci can be detected simultaneously through specific microarray chips. This technique is easy to operate and low cost, and is suitable for the analysis of large-scale samples.

Next-generation sequencing technology: With the rapid development of sequencing technology, next-generation sequencing (NGS) is increasingly applied to GWAS. NGS provides more comprehensive genome coverage, including the detection of rare variants and insertions/deletions. However, its cost is relatively high, and data processing and analysis are more complicated.

Data Quality Control

Filtering rare variants and sites that deviate from Hardy-Weinberg equilibrium: In GWAS, variants that are extremely rare in the population should be filtered out because their contribution to association analysis is limited. Sites that are not in Hardy-Weinberg equilibrium should also be excluded to ensure the accuracy of the association analysis.
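The two filters can be sketched as follows. This is a minimal illustration, assuming biallelic SNPs summarized as genotype counts (AA, AB, BB); the function names and the thresholds `maf_min` and `hwe_p_min` are hypothetical defaults, not values from the original text. The chi-square approximation is used for the Hardy-Weinberg test, whereas real pipelines often prefer an exact test.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(counts, maf_min=0.05, hwe_p_min=1e-6):
    """Keep a SNP only if it is common enough and consistent with HWE.
    Both thresholds are illustrative assumptions."""
    return maf(*counts) >= maf_min and hwe_pvalue(*counts) >= hwe_p_min
```

For example, `passes_qc((25, 50, 25))` keeps a SNP in perfect Hardy-Weinberg proportions, while a SNP with no heterozygotes at all, such as `(50, 0, 50)`, is rejected.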

Controlling population stratification and individual missing rate: Population stratification and missing genotype data are important factors affecting GWAS results. To reduce their interference, the population structure of the samples should be analyzed in detail and accounted for, and individuals with a high genotype missing rate should be excluded.
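The missing-rate part of this step can be sketched as below. It is a minimal illustration that assumes genotypes are coded 0/1/2 with `None` for a missing call; the function names and both cut-offs are hypothetical. Population structure itself is usually handled later, for example through covariates or a kinship matrix, and is not covered here.

```python
def missing_rate(calls):
    """Fraction of missing (None) genotype calls in a list."""
    return sum(g is None for g in calls) / len(calls)


def filter_by_missingness(genotypes, max_sample_missing=0.1, max_snp_missing=0.1):
    """genotypes: dict mapping sample id -> list of calls (0/1/2 or None).

    Drops samples with too many missing calls first, then reports the
    indices of SNPs whose missing rate among the kept samples is acceptable.
    """
    kept = {s: calls for s, calls in genotypes.items()
            if missing_rate(calls) <= max_sample_missing}
    if not kept:
        return {}, []
    n_snps = len(next(iter(kept.values())))
    kept_snps = [j for j in range(n_snps)
                 if missing_rate([calls[j] for calls in kept.values()]) <= max_snp_missing]
    return kept, kept_snps
```

Filtering samples before SNPs is a design choice: a few poorly genotyped individuals would otherwise inflate the per-SNP missing rate and remove markers unnecessarily.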

Statistical Analysis

Association test methods: Commonly used association tests in GWAS include the chi-square test and logistic regression. These methods test whether the association between genotype and phenotype is statistically significant. Choosing a suitable statistical method is essential for the reliability of the results.
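As an illustration of the chi-square approach, the sketch below runs an allelic test on a single SNP for a binary trait (for example, susceptible versus resistant individuals). The function name and the input layout of genotype counts (AA, AB, BB) per group are assumptions made for this example.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """2x2 allelic chi-square test for one SNP.

    Each argument is a tuple of genotype counts (n_AA, n_AB, n_BB).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    def allele_counts(counts):
        n_aa, n_ab, n_bb = counts
        return 2 * n_aa + n_ab, 2 * n_bb + n_ab  # (A alleles, B alleles)

    a, b = allele_counts(case_counts)
    c, d = allele_counts(control_counts)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic SNP or empty group: nothing to test
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In a real study this test is repeated for every SNP, so the resulting p-values must be judged against a genome-wide threshold rather than the usual 0.05.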

Covariate correction: To eliminate the influence of potential confounding factors on the association analysis, covariate correction is usually needed. This includes controlling for demographic characteristics such as age and sex, as well as possible genetic background factors.
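The effect of covariate correction can be seen in a small regression sketch. Ordinary least squares stands in here for whatever model a real pipeline would use (logistic regression or a mixed model); the helper names `ols` and `snp_effect` are hypothetical, and the solver is a plain normal-equations implementation meant for small examples only.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * p for x, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def snp_effect(genotypes, phenotypes, covariates):
    """Estimated SNP effect after adjusting for the given covariates.

    genotypes: list of 0/1/2 codes; covariates: one list of values per individual.
    """
    X = [[1.0, float(g)] + [float(c) for c in cov]
         for g, cov in zip(genotypes, covariates)]
    return ols(X, phenotypes)[1]  # coefficient of the genotype column
```

If a covariate is correlated with the genotype, leaving it out shifts its effect onto the SNP. With genotypes `[0, 1, 2, 0, 1, 2]`, a covariate `[0, 0, 1, 0, 1, 1]` and phenotypes generated as `2 + 0.5*g + 3*c`, the adjusted estimate recovers 0.5, whereas the unadjusted estimate is 2.0.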

Result Interpretation

Identify significantly associated genetic variants: Statistical analysis identifies the genetic variants that are significantly associated with the phenotype. These variants may directly participate in the formation of traits or lie close to the functional regions of key genes.
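Because every SNP is tested separately, "significant" has to be defined with multiple testing in mind. The sketch below applies a Bonferroni correction, which is one common and conservative choice rather than a method named in the original text; the function name and the SNP identifiers in the usage example are hypothetical.

```python
def bonferroni_hits(pvalues, alpha=0.05):
    """pvalues: dict mapping SNP id -> p-value.

    Returns the per-test threshold (alpha divided by the number of tests)
    and the list of SNPs whose p-value falls below it.
    """
    threshold = alpha / len(pvalues)
    hits = [snp for snp, p in pvalues.items() if p < threshold]
    return threshold, hits
```

For five tested SNPs the threshold is 0.05 / 5 = 0.01, so `bonferroni_hits({"snp1": 1e-9, "snp2": 0.003, "snp3": 0.2, "snp4": 0.012, "snp5": 0.6})` reports only `snp1` and `snp2`.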

Explore how variants affect traits: Once significantly associated variants are identified, it is necessary to explore further how they affect traits. This includes studying the effects of a variant on gene expression, protein function or signal transduction pathways. In-depth study of the functional mechanisms of these variants can provide new ideas and methods for disease prevention, diagnosis and treatment.

An example pathway for uncovering genotype-phenotype associations via GWAS (Wright et al., 2017)

GWAS Application in Agricultural Research

GWAS, a key technology in the field of genetics, can carry out comprehensive association analysis of complex crop traits such as plant height, yield, quality and stress resistance at the whole-genome level. With its high efficiency and accuracy, GWAS has been widely used in agricultural research.

Genetic Diversity Analysis

Evaluation of population genetic structure: GWAS can use a large number of SNP markers to analyze the genetic structure of crop populations and reveal the genetic relationships and differences between populations. For example, GWAS analysis of wheat populations found that varieties from different geographical regions differ clearly in genetic structure, which provides an important reference for wheat introduction, cross breeding and the rational utilization of genetic resources.

Tracing the origin and domestication of crops: By conducting GWAS on modern crops and their wild relatives, we can reveal how crop genomes changed during domestication and trace the origin and domestication history of crops. For example, GWAS of rice shows that Asian cultivated rice was domesticated from common wild rice over a long period, and that many genes related to agronomic traits were artificially selected during domestication.

Assisting Crop Breeding

Molecular marker-assisted selection: SNP markers that GWAS identifies as closely linked to target traits can be used as molecular markers for assisted selection in crop breeding. Compared with traditional breeding methods, marker-assisted selection can select individuals carrying favorable genes more accurately, accelerate the breeding process and improve breeding efficiency. For example, in cotton breeding, selection based on SNP markers related to fiber quality traits can quickly screen out cotton varieties with excellent fiber quality.

Prediction of heterosis: Heterosis is a widely used phenomenon in crop breeding, but the prediction of heterosis has always been a difficult problem in breeding. GWAS can predict heterosis by analyzing the genome information of parents and mining genetic loci and genetic markers related to heterosis. For example, GWAS has made some progress in the division of maize heterosis groups and heterosis prediction, which provides theoretical guidance for the selection of maize hybrids.

Multiple genetic characteristics of the adzuki bean genome (Ding et al., 2024)

Gene Mapping of Important Traits

Growth and development traits: The growth rate, weight, body size and other growth and development traits of livestock directly affect the returns of animal production. Genetic loci related to these traits can be found through GWAS. For example, studies in pigs have found SNP loci related to growth rate and backfat thickness, which may affect muscle growth and fat deposition.

Reproductive traits: Reproductive performance is one of the important economic traits in livestock breeding, including litter size, estrus cycle and pregnancy rate. GWAS also played an important role in the study of livestock reproductive traits. For example, in the study of sheep, the genetic loci related to litter size were identified by GWAS, which provided a genetic basis for improving the reproductive performance of sheep.

Study on Disease Resistance and Susceptibility

Screening of disease-resistance genes: Livestock diseases cause huge economic losses in animal husbandry. Through GWAS, genes related to disease resistance can be found, providing a basis for breeding disease-resistant varieties.

Analysis of genetic mechanism of diseases: Understanding the genetic mechanism of livestock diseases is of great significance for disease prevention and control. GWAS can reveal the genetic basis of disease occurrence and help researchers understand the process of disease occurrence and development.

Manhattan plots and Q-Q plots of imputed sequence-based GWAS for ADG, ADFI, and RFI (Ye et al., 2020)
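The quantities behind such plots are simple to compute. The sketch below is an illustration under stated assumptions and does not reproduce the cited study: it derives the Manhattan-plot heights, the expected-versus-observed pairs of a Q-Q plot, and the genomic inflation factor (a common Q-Q summary that is not mentioned in the original text). All function names are hypothetical, and p-values are assumed to lie in (0, 1].

```python
import math
from statistics import NormalDist, median


def manhattan_y(pvalues):
    """Heights of the points in a Manhattan plot: -log10(p)."""
    return [-math.log10(p) for p in pvalues]


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for a Q-Q plot, largest first.
    Expected values assume p-values are uniform under the null hypothesis."""
    n = len(pvalues)
    observed = sorted(manhattan_y(pvalues), reverse=True)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """Median observed chi-square (1 df) divided by its null median."""
    nd = NormalDist()
    # Convert each two-sided p-value back to a 1-df chi-square statistic
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / 0.4549364  # 0.4549364 is the median of chi-square(1)
```

An inflation factor close to 1 suggests that population structure and other confounders are under control; values well above 1 usually call for stronger correction.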

Advantages of GWAS in Agriculture

GWAS can systematically carry out association analysis on complex traits in the whole genome. Especially in the agricultural field, it has shown remarkable advantages by virtue of its precise positioning and efficient screening, which has greatly promoted agricultural scientific research and industrial development.

Mining Elite Gene Resources

Comprehensive scanning: GWAS scans a large number of genetic markers across the whole genome and can detect loci associated with target traits without prior knowledge of gene function or location. This makes it possible to comprehensively mine the gene resources underlying important agronomic traits such as crop yield, quality and stress resistance.

Discovering new genes: This method helps to discover some new genes that have not been paid attention to before, provides more gene targets for crop genetic improvement, and broadens the gene pool available in agricultural breeding.

Dissecting Complex Genetic Bases

Multi-gene mapping: Many important agronomic traits, such as yield and quality, are complex traits controlled by multiple genes. GWAS can simultaneously locate multiple loci related to these complex traits, reveal their genetic structure, and clarify the relative contribution of each gene to the traits, which is helpful to deeply understand the genetic regulation mechanism of complex traits.

Gene interaction analysis: In addition to locating a single locus, GWAS can also analyze the interaction between genes (epistasis) and the interaction effect between genes and environment, so as to more comprehensively analyze the genetic expression law of complex traits under different environmental conditions and provide theoretical basis for formulating accurate breeding strategies.
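One way such interactions are commonly modeled is by adding a product term to the single-locus regression. The equations below are a sketch in assumed notation rather than a model taken from the original text: x_1 and x_2 are genotype codes at two loci, E is an environmental variable, and the interaction coefficients a_{12} and c are the quantities of interest.

```latex
% Epistasis between two loci: a_{12} measures the gene-gene interaction
y = \mu + a_1 x_1 + a_2 x_2 + a_{12}\, x_1 x_2 + e

% Gene-environment interaction: c measures how the SNP effect changes with E
y = \mu + a\, x + b\, E + c\, (x \cdot E) + e
```

Testing whether a_{12} or c differs from zero asks whether the combined effect departs from the sum of the individual effects.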

Assist Variety Identification and Protection

Fingerprint construction: GWAS technology can be used to scan the whole genomes of different varieties, obtain a large amount of genetic marker information, and construct a fingerprint for each variety. This fingerprint is highly specific and accurate, and can be used to identify and distinguish varieties, effectively preventing the mixing of varieties and the circulation of counterfeit and substandard seeds.

Evaluation of genetic diversity: By analyzing the genome variation of different varieties, GWAS can accurately evaluate the genetic relationship and genetic diversity level among varieties, provide important reference for the protection and utilization of variety resources and the cultivation of new varieties, and help to rationally plan and utilize crop genetic resources and avoid excessive concentration and loss of genetic resources.
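Two simple marker-based measures illustrate the ideas above: identity-by-state (IBS) similarity for comparing the fingerprints of two varieties, and expected heterozygosity as a per-locus measure of diversity. This is a minimal sketch that assumes genotypes coded 0/1/2 with `None` for missing calls; both function names are hypothetical, and real variety identification relies on validated marker panels.

```python
def ibs_similarity(g1, g2):
    """Mean identity-by-state between two genotype vectors coded 0/1/2.

    Each marker contributes 1 for identical genotypes, 0.5 for a one-allele
    difference and 0 for opposite homozygotes; missing calls are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    return sum(1 - abs(a - b) / 2 for a, b in pairs) / len(pairs)


def expected_heterozygosity(calls):
    """Expected heterozygosity 2pq at one biallelic locus across varieties."""
    calls = [g for g in calls if g is not None]
    p = sum(calls) / (2 * len(calls))  # frequency of the counted allele
    return 2 * p * (1 - p)
```

Two samples of the same variety should give an IBS value close to 1, while a locus at which all varieties carry the same allele has an expected heterozygosity of 0.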

Genetic loci for AF identified by GWAS (Milan et al., 2010)

Challenges of GWAS in Agricultural Applications

GWAS can explore the relationship between genotype and phenotype, and it has made outstanding contributions to biomedicine. However, the road is not smooth: massive datasets are difficult to process, gene-environment interactions are complex and hard to dissect, sample representativeness is limited, and causal inference remains difficult. Only by facing up to these challenges can GWAS better support agricultural research and breeding.

Sample-related Problems

Large sample size requirements: Reliable GWAS results require a large number of samples with good genetic diversity. However, large-scale sample collection and genome sequencing are costly, which is a heavy burden for many research and breeding projects.

Limited sample diversity: In practice, it may be difficult to collect sufficiently diverse samples. For example, some wild relatives of crops or livestock may be hard to obtain, and some local varieties are underrepresented because their planting area or breeding population has declined. In addition, samples from different regions may differ in population structure, which also affects the accuracy of the analysis results.

Genetic Complexity of Traits

Multi-gene control: Many agricultural-related traits are jointly determined by multiple genes and also influenced by environmental factors. This complex genetic basis makes it difficult to identify all key genes simply by GWAS.

Gene interaction: Genes interact in complex ways, for example through epistasis and gene networks. In general, GWAS detects associations between individual loci and traits, and it struggles to fully capture the interactions between genes, which limits an in-depth understanding of the genetic mechanisms of traits.

Gene-environment interaction: The traits of plants and animals are not only determined by genes, but also strongly influenced by the environment. GWAS may not be able to fully capture this gene-environment interaction, resulting in inconsistent results of association analysis under different environmental conditions, which affects the accurate evaluation of gene function and its application in actual production.

Pathogen-Informed GWAS (Bourgeois et al., 2020)

Conclusion

GWAS has injected strong impetus into agricultural development. It accurately locates gene loci related to key traits such as crop yield, quality and stress resistance, and helps to cultivate excellent varieties with high yield, high quality and resistance to pests and diseases. Looking forward to the future, with the continuous innovation of GWAS technology, it will expand deeply in the agricultural field, make a continuous contribution to ensuring global food security, optimizing agricultural ecology and promoting sustainable agricultural development, and open a new era of agricultural scientific and technological innovation.

References

  1. Wright F, Fessele K. "Primer in Genetics and Genomics, Article 5-Further Defining the Concepts of Genotype and Phenotype and Exploring Genotype-Phenotype Associations." Biol Res Nurs. 2017 19(5):576-585 https://doi.org/10.1177/1099800417725190
  2. Jangale V., et al. "Enhancing genotype-phenotype association with optimized machine learning and biological enrichment methods." medRxiv (2024) https://doi.org/10.1101/2024.06.14.24308920
  3. Ding D.Y. "Key genetic markers discovered through GWAS in leguminous crops and their application in molecular breeding." Legume Genomics and Genetics 2024 15(1): 13-22 https://doi.org/10.5376/lgg.2024.15.0002
  4. Ye S, Chen ZT., et al. "New Insights From Imputed Whole-Genome Sequence-Based Genome-Wide Association Analysis and Transcriptome Analysis: The Genetic Mechanisms Underlying Residual Feed Intake in Chickens." Front Genet. 2020 11:243 https://doi.org/10.3389/fgene.2020.00243
  5. Milan DJ, Lubitz SA., et al. "Genome-wide association studies in cardiac electrophysiology: recent discoveries and implications for clinical practice." Heart Rhythm. 2010 7(8):1141-8 https://doi.org/10.1016/j.hrthm.2010.04.021
  6. Bourgeois JS, Smith CM, Ko DC. "These Are the Genes You're Looking For: Finding Host Resistance Genes." Trends Microbiol. 2021 29(4): 346-362 https://doi.org/10.1016/j.tim.2020.09.006
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.
Copyright © 2025 CD Genomics. All Rights Reserved.