Variant Calling

What is Variant Calling

Genetic variation is heritable variation caused by changes in an organism's genetic material. It underlies the genetic diversity that organisms exhibit at different levels, and that diversity is the material basis for the survival and development of both human society and plants. Genetic variation takes many forms, from chromosomal inversions visible under a microscope to single-nucleotide mutations. As genomics has developed, the catalog of genetic variation has become more comprehensive and now includes SNPs, InDels, SVs, CNVs, transposon mutations, and more.

Variant calling uses high-throughput sequencing to sequence the genome of an individual or a population of a species and to analyze the differences across the entire genome. It yields a large amount of genetic variation information, such as single nucleotide polymorphisms (SNPs), insertions and deletions (InDels), structural variations (SVs), and copy number variations (CNVs). Because it examines every base of the whole genome, variant calling identifies differences between genomes quickly, accurately, and efficiently, produces the broadest set of molecular markers, and provides the most basic and comprehensive data foundation for subsequent fine mapping of functional genes.
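
As a rough illustration of how the variant types named above differ, the sketch below classifies a single REF/ALT allele pair by length. The function name `classify_variant` and the 50 bp cutoff between InDels and SVs are assumptions made for this example (50 bp is a commonly used convention, not a value stated on this page), and the sketch does not cover CNVs or symbolic SV alleles.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify one REF/ALT allele pair by comparing allele lengths.

    sv_min_len is an assumed size cutoff (in bp) separating small
    InDels from structural variants.
    """
    if ref == alt:
        return "no variant"
    if len(ref) == len(alt):
        # Same length: a substitution of one base (SNP) or several (MNP).
        return "SNP" if len(ref) == 1 else "MNP"
    size = abs(len(ref) - len(alt))
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    scale = "SV" if size >= sv_min_len else "InDel"
    return f"{scale} ({kind})"


if __name__ == "__main__":
    # Hypothetical allele pairs, for illustration only.
    for ref, alt in [("A", "G"), ("A", "ATT"), ("ACGT", "A"), ("A", "A" + "T" * 60)]:
        print(ref[:10], alt[:10], classify_variant(ref, alt))
```

Real variant callers work from read alignments and genotype likelihoods; this only shows how the resulting records are usually categorized.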

Advantages and features of variant calling

  • Abundance: in-depth analysis of all aspects of genetic variation, including SNP, InDel, SV, SNV, novel genes, etc.
  • Flexibility: suitable for species with or without a reference genome
  • Accuracy: the sequencing method can be chosen to suit the sample material

Variant calling workflow

[Figure: variant calling workflow]

Sequencing technology pipeline

[Figure: sequencing technology-based variant calling pipeline]

Service Specifications

Sample Requirements
  • DNA sample: ~0.5 μg (concentration ≥ 10 ng/μL; OD260/280 = 1.8–2.0)
Sequencing
  • Sequencing depth: 10× for SNP and small InDel detection; 20× for SV detection; 30× for CNV detection
  • GBS: 100,000–200,000 tags; average depth 8× per tag
  • Illumina HiSeq, MGI DNBSEQ-T7/DNBSEQ-G400, and long-read sequencing platforms
  • Analysis of sequencing quality metrics
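
The depth targets above translate into a data volume through the usual relation depth = total sequenced bases / genome size. The sketch below applies that relation; the 400 Mb genome size and the 150 bp paired-end read length are hypothetical values chosen for illustration, not part of the service specification.

```python
# Depth targets taken from the sequencing specifications above.
TARGET_DEPTH = {"SNP/InDel": 10, "SV": 20, "CNV": 30}


def required_bases(genome_size_bp: int, depth: int) -> int:
    """Total bases needed so that average depth = bases / genome size."""
    return genome_size_bp * depth


def required_read_pairs(genome_size_bp: int, depth: int, read_length_bp: int = 150) -> int:
    """Number of paired-end read pairs needed (assumed 2 x read_length_bp per pair)."""
    bases = required_bases(genome_size_bp, depth)
    bases_per_pair = 2 * read_length_bp
    return -(-bases // bases_per_pair)  # ceiling division on integers


if __name__ == "__main__":
    genome_size = 400_000_000  # hypothetical 400 Mb genome
    for variant_type, depth in TARGET_DEPTH.items():
        gigabases = required_bases(genome_size, depth) / 1e9
        pairs = required_read_pairs(genome_size, depth)
        print(f"{variant_type}: {depth}x -> {gigabases:.0f} Gb, {pairs:,} read pairs")
```

This gives average depth only; actual per-site coverage varies with GC content, repeats, and mapping quality.
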
Bioinformatics Analysis
We provide customized bioinformatics analysis including:
  • Raw data QC
  • Reference alignment or assembly
  • Variant information
  • Personalized analysis

Delivery
  • Raw data (FASTQ)
  • Data analysis report


The Influence of Structural Variants (SVs) on Gene Structure and Gene Expression

A 2020 study of cabbage[1] compared variation among three cabbage morphotypes, covering presence/absence variations (PAVs) of genes, structural variants (SVs) within the genome, and single nucleotide polymorphisms (SNPs). Among the three morphotypes, the study observed SV differences in leaf-shape genes (e.g., KAN, a member of the GARP transcription factor family) and in genes that repress flowering (e.g., MAF4 and SVP) (Figure 1, left). These findings help to unravel the molecular mechanisms that govern organ morphogenesis and flowering, and they support genetic improvement of cabbage and related vegetable crops.

Figure 1: PAV and Specific SVs in the Cabbage Genome[1]

To validate the applicability of SV analysis and explore its utility in population evolutionary studies, a 2020 study of tomato[2] first built a phylogenetic tree from SNPs called in short-read sequencing data for more than 800 accessions (Figure 2A). The researchers then selected a representative set of 100 accessions from 7 lineages, collected long-read sequencing data for them, and constructed a tree from the SV data (Figure 2B).

In the SV-based tree, the selected accessions fell within their known taxonomic groups, in agreement with the SNP-based classification. This demonstrates that SVs are suitable for population genetic analysis.

Figure 2: Tomato Systematic Evolutionary Tree[2]

In a 2020 study of rice[3], researchers constructed population structure plots from both SNP data (Figure 3A) and SV data (Figure 3B). The SNP analysis divided the population into approximately six groups, with a clear distinction between japonica and indica varieties. The SV analysis produced highly consistent results, supporting the differentiation observed with SNPs.

Figure 3: Genetic Structure of Rice[3]

Population SV Mutation Frequency Spectrum

To investigate the impact of deleterious variants, the 2020 rice study[3] calculated the site frequency spectrum (SFS) for non-coding sites in each population group (Figure 4A, B, C). Each SFS covers five SV types (DUP, DEL, TRA, MEI, and INV) and two SNP types (synonymous, Syn, and non-synonymous, Nsyn) and reveals three notable features:

  • The spectra differ significantly among population groups, consistent with stronger genetic drift during domestication bottlenecks and with changes in mating system.

  • The proportion of fixed SVs is lower than that of fixed synonymous and non-synonymous SNPs. SVs occur at lower frequencies and are more likely to be purged after they arise, which suggests that they are more likely to be harmful.

  • INV events show the most extreme SFS: over 90% of INV events were identified in three or fewer individuals in each group, implying that they may be under strong selection during evolution.
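
To make the idea of a frequency spectrum concrete, the sketch below counts how many variants are carried on 1, 2, 3, ... chromosomes in a sample. It is a minimal illustration, not the method of the cited study: the genotype coding (0, 1, or 2 copies of the alternate allele per diploid individual, no missing data) and the function name `site_frequency_spectrum` are assumptions, and the toy matrix is invented.

```python
from collections import Counter


def site_frequency_spectrum(genotypes):
    """Return the SFS of a genotype matrix.

    genotypes: list of variants, each a list with one value per diploid
    individual giving the number of alternate-allele copies (0, 1, or 2).
    Entry k-1 of the result is the number of variants whose alternate
    allele is found on exactly k chromosomes (k = 1 .. 2 * n_individuals).
    Variants with zero copies are ignored.
    """
    n_chromosomes = 2 * len(genotypes[0])
    copies = Counter(sum(variant) for variant in genotypes)
    return [copies.get(k, 0) for k in range(1, n_chromosomes + 1)]


if __name__ == "__main__":
    # Invented toy data: 5 variants (rows) x 4 individuals (columns).
    toy = [
        [0, 1, 0, 0],
        [1, 1, 0, 0],
        [2, 2, 2, 2],  # fixed in the sample
        [0, 0, 0, 1],
        [0, 2, 1, 0],
    ]
    sfs = site_frequency_spectrum(toy)
    print(sfs)
    rare = sum(sfs[:3]) / sum(sfs)
    print(f"share of variants on three or fewer chromosomes: {rare:.2f}")
```

A spectrum skewed toward the low-count bins, as described for INV events, means most variants of that class are rare in the population.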

The researchers also tested how SVs and SNPs are distributed along the chromosomes and found a significant correlation between SV diversity and SNP diversity across chromosomal windows (Figure 4D). This suggests that SVs carry population genetic information that is broadly consistent with that from SNPs.

Figure 4: SV Frequency Spectra[3]

Linkage Disequilibrium Analysis with SVs

In the rice study[3], linkage disequilibrium (LD) was calculated for three population groups using SNP, SV, and combined SNP+SV data. Because SVs are more likely to be harmful, they tend to occur at lower population frequencies than SNPs and may show faster LD decay over physical distance (Figure 5). For SNPs at a distance of approximately 100 kb, r² remained around 0.2 in japonica and approximately 0.1 in indica, while in rufipogon it was below 0.05. For SVs, r² was lower than for SNPs in every population group and exceeded 0.1 only over very short distances (<15 kb).
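
The r² statistic quoted above can be written as r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below computes it for one pair of biallelic variants; it assumes phased haplotypes coded 0/1 with no missing data, and the function name `ld_r2` and the example haplotypes are illustrative only. Published analyses typically estimate r² from unphased genotypes with dedicated tools.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic variants from phased 0/1 haplotypes.

    hap_a[i] and hap_b[i] are the alleles carried by haplotype i at
    variant A and variant B. Returns NaN if either variant is monomorphic.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both variants.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return float("nan")
    return d * d / denominator


if __name__ == "__main__":
    # Invented haplotypes: the first pair is perfectly linked, the second unlinked.
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```

An LD decay curve like the one in Figure 5 is obtained by averaging such pairwise values within bins of physical distance.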

Figure 5: Linkage Disequilibrium Analysis[3]

Population Domestication Study

In the rice study[3], a comparison of genomic differentiation measured with SNPs and with SVs revealed a clear difference: the average FST estimate was markedly higher for SNPs than for SVs. This is consistent with SVs typically occurring at lower population frequencies than SNPs.

When these results were combined with known domestication and improvement genes, the analysis confirmed that these genes are substantially enriched in the top 1% and top 10% FST intervals. This provided valuable insight into functional genes associated with physiological processes, morphological traits, and food quality (Figure 6).
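
For readers unfamiliar with FST, the sketch below shows one simple way to compute it for two populations from allele frequencies, using Wright's definition FST = (HT − HS) / HT with equal population weights. This is an illustrative simplification: the cited study may have used a different estimator, and the function names and example frequencies are assumptions.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Per-site FST for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Returns NaN when the site is monomorphic in the pooled sample.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return float("nan")
    return (h_t - h_s) / h_t


def window_fst(frequency_pairs) -> float:
    """FST for a genomic window as a ratio of sums over its sites."""
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in frequency_pairs:
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0 else float("nan")


if __name__ == "__main__":
    # Invented allele frequencies for illustration.
    print(wright_fst(0.2, 0.8))
    print(window_fst([(0.2, 0.8), (0.5, 0.5), (0.1, 0.3)]))
```

Ranking windows by such values and taking the top 1% or 10% is the kind of outlier scan the paragraph above describes.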

Figure 6: SV Characteristics Associated with Domestication[3]

Genome-Wide Association Analysis

In the 2020 publication on canola[4], a genome-wide association study (GWAS) was performed using the PAVs identified from eight canola varieties. This analysis revealed causal relationships between PAVs and traits such as silique length, seed weight, and flowering time. Notably, these associations were not detected in the SNP-GWAS results (Figure 7).

Similarly, the 2021 publication on peach[5] identified candidate SVs associated with traits such as early fruit ripening, pericarp color around the stone, fruit shape, and flat shape formation (Figure 8).
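
The core of a PAV- or SV-based association test is a comparison of trait values between accessions that carry a variant and those that do not. The sketch below computes a Welch t-statistic for a single variant as a deliberately simplified stand-in: the function name, the 0/1 presence coding, and the trait values are all hypothetical, and real GWAS pipelines typically use mixed models that correct for population structure and kinship and then adjust for multiple testing.

```python
from math import sqrt
from statistics import mean, variance


def pav_trait_t_statistic(presence, trait):
    """Welch t-statistic for a trait split by presence (1) or absence (0) of a variant."""
    carriers = [t for p, t in zip(presence, trait) if p == 1]
    non_carriers = [t for p, t in zip(presence, trait) if p == 0]
    if len(carriers) < 2 or len(non_carriers) < 2:
        raise ValueError("need at least two accessions in each group")
    standard_error = sqrt(
        variance(carriers) / len(carriers) + variance(non_carriers) / len(non_carriers)
    )
    if standard_error == 0:
        raise ValueError("trait has no variance within the groups")
    return (mean(carriers) - mean(non_carriers)) / standard_error


if __name__ == "__main__":
    # Invented data: presence/absence of one PAV and a trait value per accession.
    presence = [1, 1, 1, 0, 0, 0]
    trait = [6.0, 7.0, 8.0, 3.0, 4.0, 5.0]
    print(round(pav_trait_t_statistic(presence, trait), 3))
```

Repeating such a test for every PAV or SV and plotting the resulting significance values along the genome gives a Manhattan plot of the kind shown in Figures 7 and 8.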

Figure 7: Canola PAV_GWAS Analysis Results[4]

Figure 8: Peach SV_GWAS Analysis Results[5]

References:

  1. Li P, Su T, Zhao X, et al. Assembly of the non-heading pak choi genome and comparison with the genomes of heading Chinese cabbage and the oilseed yellow sarson. Plant Biotechnology Journal, 2020.
  2. Alonge M, Wang X, Benoit M, et al. Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato. Cell, 2020.
  3. Kou Y, Liao Y, Toivainen T, et al. Evolutionary Genomics of Structural Variation in Asian rice (Oryza sativa) Domestication. Molecular Biology and Evolution, 2020.
  4. Song J, Guan Z, Hu J, et al. Eight High-quality Genomes Reveal Pan-genome Architecture and Ecotype Differentiation of Brassica napus. Nature Plants, 2020.
  5. Guan J, Xu Y, Yu Y, et al. Genome structure variation analyses of peach reveal population dynamics and a 1.67 Mb causal inversion for fruit shape. Genome Biology, 2021.
For Research Use Only. Not for use in diagnostic procedures.