With the completion of the Human Genome Project and the rapid development of sequencing technology, the genome-wide association study (GWAS) has become an important tool for exploring the genetic mechanisms of complex human diseases. The core idea of GWAS is to find genetic variants that are significantly associated with a specific phenotype (such as disease status) by comparing the genotype data of a large number of individuals with their corresponding phenotype data.
GWAS is an analytical method for studying the association between genotype and phenotype at the population level. It uses high-throughput genotyping technology to scan the genomes of a large number of individuals and detect single nucleotide polymorphisms (SNPs) or other genetic variants associated with specific traits or diseases. The main purpose of GWAS is to reveal the associations between these genetic variants and phenotypes, providing important information for understanding the genetic mechanisms of diseases, discovering new disease susceptibility genes, and developing personalized treatment strategies.
The basic principle of GWAS is to compare the genomes of different individuals and identify the genetic variants associated with the traits of interest. These variants are usually SNPs, that is, single-base differences in the DNA sequence. In a GWAS, genome sequencing or SNP chip analysis is performed on a large number of samples, and the resulting genotype data are then tested for association with the phenotype data.
The models commonly used in GWAS include the general linear model (GLM) and the mixed linear model (MLM). GLM treats the phenotype as the sum of fixed effects, including the SNP effect, and a random error term, while MLM adds a random effect that captures the genetic relatedness (kinship) among individuals. These models help identify genetic variants that are significantly associated with phenotypes at the population level.
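As a sketch, the two models are often written in the following form. The notation is an assumption introduced here for illustration and does not come from the original text: y is the vector of phenotypes, X holds covariates with fixed effects β, S is the SNP genotype vector (for example coded 0/1/2) with effect α, Z maps individuals to random polygenic effects u, K is a kinship matrix, and e is the residual error.

```latex
\begin{aligned}
\text{GLM:}\quad & y = X\beta + S\alpha + e, & e &\sim N(0, \sigma_e^2 I) \\
\text{MLM:}\quad & y = X\beta + S\alpha + Zu + e, & u &\sim N(0, \sigma_g^2 K)
\end{aligned}
```

In both cases the association test for a SNP asks whether α differs from zero. The only structural difference is the extra term Zu, which lets related individuals share part of their phenotypic variance.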
The stepwise workflow of the proposed framework (Jangale et al., 2024)
The GWAS workflow is systematic and rigorous, involving several key steps that together ensure the accuracy and reliability of the research. The steps are described in detail below.
Collecting DNA and phenotypic information: First, DNA samples must be collected from a large number of individuals, and the corresponding phenotypic information, such as disease status and physiological characteristics, must be recorded. This information is essential for subsequent data analysis and interpretation of results.
Determining sample size and diversity: The size and diversity of the sample directly affect the statistical power of a GWAS and the generalizability of its results. The sample size should therefore be determined carefully before the study begins, and the sample should be sufficiently representative in terms of sex, age, and ancestry.
Microarray technology: This is a common genotyping technology in which many thousands of SNP loci (up to hundreds of thousands or more on high-density chips) are detected simultaneously on a dedicated microarray chip. The technique is easy to operate and low in cost, which makes it suitable for analyzing large numbers of samples.
Next-generation sequencing technology: With the rapid development of sequencing technology, next-generation sequencing (NGS) is increasingly applied to GWAS. NGS provides more comprehensive genome coverage, including the detection of rare variants and insertions/deletions. However, its cost is relatively high, and data processing and analysis are more complicated.
Filtering rare variants and sites that deviate from Hardy-Weinberg equilibrium: In GWAS, variants that are extremely rare in the population are filtered out because they contribute little statistical power to the association analysis. Sites that deviate from Hardy-Weinberg equilibrium should also be excluded, since such deviation often indicates genotyping error and would compromise the accuracy of the association analysis.
Controlling population stratification and missing rate: Population stratification and the per-individual missing genotype rate are important factors affecting GWAS results. To reduce their interference, the population structure of the samples should be analyzed in detail, and individuals with an excessive missing rate should be removed.
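The filters described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: genotypes are assumed to be coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call, and the thresholds (`maf_min`, `hwe_p_min`, `max_missing`) are illustrative assumptions that each study would set for itself. The Hardy-Weinberg check shown here is the simple chi-square goodness-of-fit test with one degree of freedom.

```python
import math

def missing_rate(calls):
    """Fraction of entries that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)

def hwe_p_value(n_ref_hom, n_het, n_alt_hom):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_ref_hom + n_het + n_alt_hom
    p_alt = (2 * n_alt_hom + n_het) / (2 * n)
    if p_alt in (0.0, 1.0):
        return 1.0  # monomorphic site: nothing to test
    expected = (n * (1 - p_alt) ** 2, 2 * n * p_alt * (1 - p_alt), n * p_alt ** 2)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_qc(genotypes, maf_min=0.05, hwe_p_min=1e-6, max_missing=0.05):
    """Apply missing-rate, minor-allele-frequency, and HWE filters to one SNP.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    called = [g for g in genotypes if g is not None]
    if not called or missing_rate(genotypes) > max_missing:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]
    alt_freq = (2 * counts[2] + counts[1]) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= maf_min and hwe_p_value(*counts) >= hwe_p_min
```

The same `missing_rate` helper can be applied to one individual's calls across all SNPs to flag samples with too many missing genotypes.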
Association test methods: Commonly used association tests in GWAS include the chi-square test and logistic regression. These methods test whether the association between genotype and phenotype is statistically significant. Choosing a suitable statistical method is essential for the reliability of the results.
Covariate correction: To eliminate the influence of potential confounding factors on the association analysis, covariate correction is usually needed. This includes controlling for demographic characteristics such as age and sex, as well as for genetic background.
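As an illustration of the chi-square test mentioned above, the following sketch runs an allelic test on a 2x2 table of allele counts in cases and controls. The function name and the input layout (genotype counts ordered as AA, Aa, aa) are assumptions made for this example. It does not adjust for covariates; doing so would require a regression model such as logistic regression.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Allelic chi-square test (1 df) from genotype counts (n_AA, n_Aa, n_aa).

    Returns (chi2, p_value, odds_ratio) for allele A versus allele a.
    """
    def allele_counts(genotype_counts):
        n_AA, n_Aa, n_aa = genotype_counts
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa

    a, b = allele_counts(case_counts)      # A and a alleles in cases
    c, d = allele_counts(control_counts)   # A and a alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 allele table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio

# Hypothetical counts: 100 cases and 100 controls.
chi2, p_value, odds_ratio = allelic_chi_square((30, 50, 20), (20, 50, 30))
print(f"chi2={chi2:.2f}, p={p_value:.4f}, OR={odds_ratio:.2f}")
```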
Identifying significantly associated variants: Statistical analysis identifies the genetic variants that are significantly associated with the phenotype. These variants may directly participate in the formation of the trait or lie close to the functional regions of key genes.
Exploring how variants affect traits: Once significantly associated variants are identified, the next step is to explore how they affect the trait. This includes studying the effects of a variant on gene expression, protein function, or signal transduction pathways. In-depth study of these functional mechanisms can provide new ideas and methods for disease prevention, diagnosis, and treatment.
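Calling a variant "significant" in the identification step depends on a threshold that accounts for the very large number of SNPs tested. The text does not say which correction is used, so the sketch below assumes the simplest one, a Bonferroni correction; the SNP identifiers and p-values are hypothetical.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the SNPs whose p-value falls below it.

    p_values: dict mapping a SNP identifier to its association p-value.
    """
    threshold = alpha / len(p_values)
    hits = [snp for snp, p in p_values.items() if p < threshold]
    return threshold, hits

# Hypothetical results for four SNPs.
example = {"snp1": 0.0004, "snp2": 0.03, "snp3": 0.012, "snp4": 0.6}
threshold, hits = bonferroni_hits(example)
print(threshold, hits)
```

With one million tested SNPs and alpha = 0.05, the same rule gives 5e-8, the conventional genome-wide significance threshold in human studies. Less conservative alternatives, such as false discovery rate control, are also used in practice.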
Example path for the discovery of genotype-phenotype associations through GWAS (Wright et al., 2017)
As a key technology in genetics, GWAS can perform comprehensive association analysis at the whole-genome level on complex crop traits such as plant height, yield, quality, and stress resistance. Because of its efficiency and accuracy, GWAS has been widely used in agricultural research.
Evaluation of population genetic structure: GWAS can use a large number of SNP markers to analyze the genetic structure of crop populations and understand the genetic relationship and genetic differences between different populations. For example, in wheat population, it was found by GWAS analysis that wheat varieties from different geographical regions had obvious differences in genetic structure, which provided important reference for wheat introduction, cross breeding and rational utilization of genetic resources.
Tracing the origin and domestication of crops: By conducting GWAS on modern crops and their wild relatives, researchers can reveal how crop genomes changed during domestication and trace the origin and domestication history of crops. For example, GWAS studies of rice show that Asian cultivated rice was domesticated from common wild rice over a long period, and that many genes related to agronomic traits were artificially selected during domestication.
Molecular marker-assisted selection: SNP markers that GWAS identifies as closely linked to target traits can be used as molecular markers for assisted selection in crop breeding. Compared with traditional breeding methods, marker-assisted selection can select individuals carrying favorable genes more accurately, accelerate the breeding process, and improve breeding efficiency. For example, in cotton breeding, using SNP markers associated with fiber quality traits for marker-assisted selection makes it possible to quickly screen for cotton varieties with excellent fiber quality.
Prediction of heterosis: Heterosis is widely exploited in crop breeding, but predicting it has always been a difficult problem. GWAS can support heterosis prediction by analyzing the genomic information of the parents and identifying genetic loci and markers associated with heterosis. For example, GWAS has contributed to the division of maize heterotic groups and to heterosis prediction, which provides theoretical guidance for the selection of maize hybrids.
Multiple genetic characteristics of the adzuki bean genome (Ding et al., 2024)
Growth and development traits: Growth rate, body weight, body size, and other growth and development traits of livestock directly affect the profitability of breeding. Genetic loci associated with these traits can be found through GWAS. For example, studies in pigs have found SNP loci associated with growth rate and backfat thickness, which may affect muscle growth and fat deposition.
Reproductive traits: Reproductive performance is one of the important economic traits in livestock breeding, including litter size, estrus cycle and pregnancy rate. GWAS also played an important role in the study of livestock reproductive traits. For example, in the study of sheep, the genetic loci related to litter size were identified by GWAS, which provided a genetic basis for improving the reproductive performance of sheep.
Screening of disease-resistance genes: Livestock diseases cause huge economic losses to animal husbandry. Through GWAS, genes associated with disease resistance can be found, providing a basis for breeding disease-resistant varieties.
Analysis of genetic mechanism of diseases: Understanding the genetic mechanism of livestock diseases is of great significance for disease prevention and control. GWAS can reveal the genetic basis of disease occurrence and help researchers understand the process of disease occurrence and development.
Manhattan plots and Q-Q plots of imputed sequence-based GWAS for ADG, ADFI, and RFI (Ye et al., 2020)
GWAS can systematically carry out association analysis on complex traits in the whole genome. Especially in the agricultural field, it has shown remarkable advantages by virtue of its precise positioning and efficient screening, which has greatly promoted agricultural scientific research and industrial development.
Comprehensive scanning: GWAS can scan a large number of genetic markers across the whole genome and detect loci associated with the target traits without prior knowledge of gene function or location. This makes it possible to comprehensively mine the genetic resources underlying important agronomic traits such as crop yield, quality, and stress resistance.
Discovering new genes: This method helps to discover some new genes that have not been paid attention to before, provides more gene targets for crop genetic improvement, and broadens the gene pool available in agricultural breeding.
Multi-gene mapping: Many important agronomic traits, such as yield and quality, are complex traits controlled by multiple genes. GWAS can simultaneously locate multiple loci related to these complex traits, reveal their genetic structure, and clarify the relative contribution of each gene to the traits, which is helpful to deeply understand the genetic regulation mechanism of complex traits.
Gene interaction analysis: In addition to locating a single locus, GWAS can also analyze the interaction between genes (epistasis) and the interaction effect between genes and environment, so as to more comprehensively analyze the genetic expression law of complex traits under different environmental conditions and provide theoretical basis for formulating accurate breeding strategies.
Fingerprint construction: GWAS genotyping can be used to scan the whole genome of different varieties, obtain a large amount of genetic marker information, and construct a DNA fingerprint for each variety. Such a fingerprint is highly specific and accurate, and it can be used to identify and distinguish varieties, helping to prevent varieties from being mixed up and to keep counterfeit or substandard seed out of circulation.
Evaluation of genetic diversity: By analyzing the genome variation of different varieties, GWAS can accurately evaluate the genetic relationship and genetic diversity level among varieties, provide important reference for the protection and utilization of variety resources and the cultivation of new varieties, and help to rationally plan and utilize crop genetic resources and avoid excessive concentration and loss of genetic resources.
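One simple way to compare marker fingerprints and quantify how different two varieties are is an identity-by-state (IBS) distance over shared SNPs. The sketch below is an illustrative assumption rather than the method of any study cited here: genotypes are coded as 0, 1, or 2 copies of the alternate allele, `None` marks a missing call, and the variety names are hypothetical.

```python
def ibs_distance(profile_a, profile_b):
    """Mean allele-sharing distance between two genotype profiles.

    Returns 0.0 for identical profiles and 1.0 when the two profiles are
    opposite homozygotes at every compared site. Sites missing in either
    profile are skipped.
    """
    shared = [(a, b) for a, b in zip(profile_a, profile_b)
              if a is not None and b is not None]
    if not shared:
        raise ValueError("no sites are called in both profiles")
    return sum(abs(a - b) for a, b in shared) / (2 * len(shared))

# Hypothetical fingerprints for three varieties at five SNPs.
fingerprints = {
    "variety_A": [0, 1, 2, 1, 0],
    "variety_B": [0, 2, 2, 0, None],
    "variety_C": [2, 1, 0, 1, 2],
}
names = sorted(fingerprints)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = ibs_distance(fingerprints[first], fingerprints[second])
        print(first, second, round(d, 3))
```

A distance near zero between two samples sold under different names suggests they are the same variety, while the spread of pairwise distances across a collection gives a rough picture of its genetic diversity.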
Genetic loci for AF identified by GWAS (Milan et al., 2010)
GWAS can uncover relationships between genetic variation and phenotype, and it has made outstanding contributions to biomedicine. However, the road is not smooth: massive datasets are difficult to process, gene-environment interactions are complex and hard to analyze, sample representativeness is limited, and causal inference is difficult. Facing these challenges squarely will allow GWAS to better support the development of precision medicine.
Large sample size requirements: Reliable GWAS results require a large number of samples with good genetic diversity. However, large-scale sample collection and genome sequencing are expensive, which is a heavy burden for many research and breeding projects.
Limited sample diversity: In practice, it may be difficult to collect enough diverse samples. For example, some wild relatives of crops or livestock may be difficult to obtain, or some local varieties are underrepresented in the sample due to the decrease of planting area and the decrease of breeding quantity. In addition, there may be differences in population structure among samples from different regions, which will also affect the accuracy of the analysis results.
Multi-gene control: Many agricultural-related traits are jointly determined by multiple genes and also influenced by environmental factors. This complex genetic basis makes it difficult to identify all key genes simply by GWAS.
Gene interaction: There are complex interactions between genes, such as epistasis and gene network. Generally, GWAS can only detect the association between a single gene or locus and traits, and it is difficult to fully capture the interaction between these genes, thus affecting the in-depth understanding of the genetic mechanism of traits.
Gene-environment interaction: The traits of plants and animals are not only determined by genes, but also strongly influenced by the environment. GWAS may not be able to fully capture this gene-environment interaction, resulting in inconsistent results of association analysis under different environmental conditions, which affects the accurate evaluation of gene function and its application in actual production.
Pathogen-Informed GWAS (Bourgeois et al., 2020)
GWAS has injected strong impetus into agricultural development. It accurately locates gene loci related to key traits such as crop yield, quality and stress resistance, and helps to cultivate excellent varieties with high yield, high quality and resistance to pests and diseases. Looking forward to the future, with the continuous innovation of GWAS technology, it will expand deeply in the agricultural field, make a continuous contribution to ensuring global food security, optimizing agricultural ecology and promoting sustainable agricultural development, and open a new era of agricultural scientific and technological innovation.
References