Our variant calling service transforms sequencing data into actionable insights, enabling studies of hereditary disease genetics, cancer pathway discovery, evolutionary studies, and candidate target exploration for biomedical researchers.
The Variant Calling Service detects genomic variations in individual genomes from high-throughput sequencing data, providing molecular-level insights for disease research, broader biomedical research, and evolutionary biology. The methodology relies on a multi-step computational pipeline. First, sequence alignment tools map raw sequencing reads to a reference genome to generate SAM/BAM files. Next, sorting, duplicate marking, and base quality score recalibration reduce technical artifacts. Variant detection algorithms then analyze the processed alignments to identify variants and generate standardized VCF files. Finally, annotation tools assess the biological impact of these variants. The service supports research workflows with high efficiency, accuracy, and standardization, bridging the gap between genotypes and phenotypes.
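The pipeline steps described above can be sketched as an ordered list of commands. This is only an illustration under stated assumptions: the text does not name specific tools, so BWA, samtools, and GATK are assumed here as common choices, and the function name `build_pipeline` and all file names are hypothetical. The sketch builds the commands without running them, and the annotation step is left out.

```python
import shlex


def build_pipeline(sample: str, ref: str, fq1: str, fq2: str,
                   known_sites: str) -> list[list[str]]:
    """Return ordered commands for one sample. Nothing is executed here.

    Tool choice (bwa, samtools, gatk) is an assumption, not the service's
    documented toolchain; file names are placeholders.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    vcf = f"{sample}.vcf.gz"
    # bwa expects a literal backslash-t between read-group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Align reads to the reference.
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, fq1, fq2],
        # 2. Coordinate-sort the alignments.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Mark duplicate reads.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", metrics],
        ["samtools", "index", dedup_bam],
        # 4. Base quality score recalibration (two passes).
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # 5. Call variants into a VCF.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam, "-O", vcf],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("S1", "ref.fa", "S1_R1.fq.gz", "S1_R2.fq.gz",
                              "known_sites.vcf.gz"):
        print(shlex.join(cmd))
```

Each command list could be handed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes the workflow easy to inspect first.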
Our Variant Calling Service empowers your research with:
Variant calling uses high-throughput sequencing to catalog DNA differences within an individual genome or across the genomes of a species. The process detects many classes of genetic variation, including single nucleotide variants (SNVs), insertions/deletions (indels), structural variations (SVs), and copy number variations (CNVs). Because whole-genome data cover nearly every base, the results provide essential data for downstream research, enabling efficient comparative genomic studies, precise gene mapping, and the development of molecular markers.
Table 1: Comparative genotyping techniques
| Technology | Conventional depth range | Application Scenario | Key Advantages |
| --- | --- | --- | --- |
| RAD-seq | 10X-30X | | |
| WGS | 30X-50X | | |
| WES | 50X-100X | | |
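To relate the depth ranges in Table 1 to sequencing output, mean depth can be approximated as the number of reads times the read length, divided by the size of the targeted region. The sketch below is a back-of-the-envelope calculation only; the function names, the 150 bp read length, and the 3.1 Gb genome size are illustrative assumptions, and real projects also lose reads to duplicates and off-target mapping.

```python
import math


def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Approximate mean depth: total sequenced bases / target size."""
    return n_reads * read_length / target_size


def reads_needed(depth: float, read_length: int, target_size: int) -> int:
    """Minimum number of reads to reach a target mean depth."""
    return math.ceil(depth * target_size / read_length)


if __name__ == "__main__":
    genome_size = 3_100_000_000  # assumed genome size in bp (illustrative)
    n = reads_needed(30, 150, genome_size)
    print(f"Reads for 30X WGS at 150 bp: {n:,}")
```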
High-quality sequencing reads are mapped to a standardized reference genome. Comparing the aligned reads with the reference enables accurate detection and interpretation of genetic variants.
Variant Calling
Application: Characterize hereditary disorders, population genetic structures, and familial disease susceptibility.
Application: Improve sensitivity for low-frequency variants and mitigate false positives through unified statistical modeling. Optimal for large-scale genomic studies and population-level analyses.
Application: Address challenges from tumor purity and subclonal heterogeneity, serving as a cornerstone for cancer genomics research.
Application: Reduce false positives by filtering germline polymorphisms and technical artifacts, enhancing specificity for tumor-specific alterations.
Copy Number Variations (CNVs) Analysis
CNV analysis identifies large-scale genomic alterations (>1 kb), including duplications or deletions of DNA segments, through statistical modeling of sequencing depth deviations and segmentation algorithms.
These variations drive pathogenic mechanisms in genetic disorders, developmental abnormalities, and neurodegenerative diseases.
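The depth-based idea behind CNV analysis can be illustrated with a toy example: compute a log2 ratio of normalized depth per bin against a control, label each bin, and merge adjacent bins with the same label. This is a deliberately simplified sketch, not a real segmentation algorithm; the thresholds, the pseudocount, and all function names are assumptions chosen for illustration.

```python
import math
from statistics import median

PSEUDOCOUNT = 0.5  # avoids log2(0) for empty bins (illustrative choice)


def log2_ratios(sample_depth, control_depth):
    """Per-bin log2 ratio of median-normalized sample vs. control depth."""
    if len(sample_depth) != len(control_depth):
        raise ValueError("sample and control must have the same bins")
    s_med = median(sample_depth) + PSEUDOCOUNT
    c_med = median(control_depth) + PSEUDOCOUNT
    return [
        math.log2(((s + PSEUDOCOUNT) / s_med) / ((c + PSEUDOCOUNT) / c_med))
        for s, c in zip(sample_depth, control_depth)
    ]


def call_states(ratios, gain=0.4, loss=-0.6):
    """Label each bin; the thresholds are illustrative assumptions."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral"
            for r in ratios]


def merge_segments(states, bin_size):
    """Merge adjacent bins with the same state into (start, end, state)."""
    segments = []
    for i, state in enumerate(states):
        if segments and segments[-1][2] == state:
            start = segments[-1][0]
            segments[-1] = (start, (i + 1) * bin_size, state)
        else:
            segments.append((i * bin_size, (i + 1) * bin_size, state))
    return segments


if __name__ == "__main__":
    sample = [100, 100, 150, 150, 100, 50, 100]
    control = [100] * 7
    states = call_states(log2_ratios(sample, control))
    for segment in merge_segments(states, bin_size=1000):
        print(segment)
```

Production callers replace the fixed thresholds and run-length merge with statistical segmentation and correct for factors such as GC content and mappability.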
Structural Variations (SV) Detection
SV detection characterizes large-scale genomic rearrangements, such as inversions, translocations, and insertions/deletions (>50 bp), by integrating discordant read pairs and split-read alignments from high-throughput sequencing data.
This analysis is critical for elucidating complex genetic disorders and oncogenic events.
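As a rough illustration of the discordant read pair signal, the sketch below labels a pair by the kind of SV it would hint at, assuming standard paired-end data where the left mate maps to the forward strand and the right mate to the reverse strand. The `ReadPair` class, the insert-size window, and the labels are hypothetical; real callers cluster many supporting pairs and combine them with split-read evidence before reporting an event.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # leftmost mapped position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost mapped position of mate 2
    strand2: str


def classify_pair(pair: ReadPair, min_insert: int = 200,
                  max_insert: int = 600) -> str:
    """Return the candidate SV signature suggested by one read pair.

    The insert-size window is an illustrative assumption; in practice it
    is estimated from the library's insert-size distribution.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the mates by position so orientation can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"  # everted (outward-facing) pair
    span = right_pos - left_pos  # crude proxy for insert size
    if span > max_insert:
        return "deletion"
    if span < min_insert:
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 1000, "+", "chr1", 1350, "-"),
        ReadPair("chr1", 1000, "+", "chr1", 5000, "-"),
        ReadPair("chr1", 1000, "+", "chr2", 800, "-"),
    ]
    for p in pairs:
        print(classify_pair(p))
```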
Microsatellite Instability (MSI) Status
MSI refers to genetic alterations caused by defects in the DNA mismatch repair system. This leads to length variations in microsatellites (short repetitive DNA sequences, 1–6 nucleotides) through insertions or deletions during replication.
MSI is classified into MSI-High (MSI-H, instability in ≥2/5 markers), MSI-Low (MSI-L, instability in one marker), and Microsatellite Stable (MSS, no instability). NGS-based assays analyze hundreds of microsatellite loci and are applicable to FFPE tissue or ctDNA.
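The five-marker classification above maps directly onto a small function. The sketch below encodes only the rule stated in the text; the function names are hypothetical, and for NGS-based assays it returns just the fraction of unstable loci, because the cutoff applied to that fraction is assay-specific and not given here.

```python
def classify_msi_panel(unstable_markers: int, total_markers: int = 5) -> str:
    """Classify MSI status from a marker panel (default: five markers)."""
    if not 0 <= unstable_markers <= total_markers:
        raise ValueError("unstable_markers must be between 0 and total_markers")
    if unstable_markers >= 2:
        return "MSI-H"
    if unstable_markers == 1:
        return "MSI-L"
    return "MSS"


def unstable_fraction(unstable_loci: int, total_loci: int) -> float:
    """Fraction of evaluated loci called unstable in an NGS-based assay."""
    if total_loci <= 0 or not 0 <= unstable_loci <= total_loci:
        raise ValueError("invalid locus counts")
    return unstable_loci / total_loci


if __name__ == "__main__":
    for n in range(4):
        print(n, classify_msi_panel(n))
```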
Research significance:
Tumor Mutational Burden (TMB)
TMB measures the total number of genetic mutations found in the DNA of cancer cells. This is often reported as the number of mutations per megabase (Mb) of interrogated genomic sequence. In common practice, TMB thresholds are stratified as High TMB (TMB-H, ≥10 mut/Mb for pan-cancer applications) and Low TMB (TMB-L, <10 mut/Mb, indicating lower tumor immunogenicity and reduced likelihood of benefiting from ICIs).
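The TMB calculation described above is a simple ratio, shown here as a minimal sketch. The function names are hypothetical, and the default cutoff is the 10 mut/Mb value quoted in the text. Which mutations are counted (for example, whether synonymous variants are included) depends on the assay and is not modeled.

```python
def tumor_mutational_burden(mutation_count: int, interrogated_bp: int) -> float:
    """Mutations per megabase of interrogated sequence."""
    if interrogated_bp <= 0:
        raise ValueError("interrogated_bp must be positive")
    return mutation_count / (interrogated_bp / 1_000_000)


def classify_tmb(tmb: float, threshold: float = 10.0) -> str:
    """Stratify TMB using the pan-cancer cutoff quoted in the text."""
    return "TMB-H" if tmb >= threshold else "TMB-L"


if __name__ == "__main__":
    # Hypothetical panel covering 1.5 Mb with 15 counted mutations.
    tmb = tumor_mutational_burden(15, 1_500_000)
    print(tmb, classify_tmb(tmb))
```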
Research applications:
Figure 1: Variant Calling Service Workflow
Variant calling enables the identification of disease-associated genetic mutations through high-throughput sequencing workflows. This process detects SNVs and INDELs in human genomes, supporting studies of hereditary disorders and pharmacogenomic analyses of drug-response variability.
In plant breeding, variant calling speeds up trait improvement. It finds genetic changes linked to valuable crop traits.
For hybrid crops, variant discovery in parent lines guides crosses. This adds beneficial gene versions while removing harmful ones.
Variant calling underpins analyses of genetic diversity and evolutionary forces across populations. Population genetics leverages SNV/INDEL datasets to compute allele frequencies, detect selection signatures, and model demographic histories.
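As a small illustration of the allele frequency calculation mentioned above, the sketch below counts alternate alleles from VCF-style genotype strings at a single site and derives the expected heterozygosity under Hardy-Weinberg proportions. The function names are hypothetical and the input is simplified to diploid, biallelic-style calls.

```python
def alt_allele_frequency(genotypes):
    """Alternate allele frequency from genotype strings such as '0/1'.

    Missing alleles ('.') are skipped; any non-reference allele index
    counts as alternate. Returns None if no alleles were called.
    """
    alt = total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygote frequency 2pq under Hardy-Weinberg."""
    return 2 * p * (1 - p)


if __name__ == "__main__":
    site = ["0/0", "0/1", "1/1", "./.", "0|1"]
    p = alt_allele_frequency(site)
    print(p, expected_heterozygosity(p))
```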
In quantitative genetics, variant calling enables genomic prediction of complex traits. It estimates heritability via mixed models and calculates kinship coefficients to correct for relatedness in GWAS.
Evolutionary studies integrate ancient DNA variant calling with methods like Treemix to reconstruct gene flow events and identify adaptive mutations.
Variant calling resolves taxonomic classifications and phylogenetic relationships by comparing genomic variations across species. DNA barcoding relies on SNV profiling of conserved loci to delineate species boundaries, while whole-genome variant data (SNPs/INDELs) support the construction of cladograms to infer evolutionary divergence timelines. For cryptic species, variant density analysis identifies regions under divergent selection, clarifying classification ambiguities.
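One simple way to turn variant data into input for tree building is a pairwise distance matrix. The sketch below computes the proportion of differing sites (p-distance) between sequences of alleles at shared variant positions. It is an illustrative simplification with hypothetical names and toy data; real phylogenetic analyses use substitution models and dedicated tree-building software.

```python
from itertools import combinations

MISSING = {"N", "-"}


def p_distance(a: str, b: str):
    """Proportion of differing sites, ignoring sites missing in either."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same variant sites")
    compared = differing = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else None


def distance_matrix(alleles_by_taxon: dict):
    """Pairwise p-distances keyed by (taxon_a, taxon_b)."""
    return {
        (t1, t2): p_distance(alleles_by_taxon[t1], alleles_by_taxon[t2])
        for t1, t2 in combinations(sorted(alleles_by_taxon), 2)
    }


if __name__ == "__main__":
    taxa = {"sp_A": "ACGTAC", "sp_B": "ACGTTC", "sp_C": "TCGNTA"}
    for pair, dist in distance_matrix(taxa).items():
        print(pair, dist)
```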
Kinship and ancestry studies measure how closely individuals are related, using variant data to estimate identity-by-descent (IBD) probabilities and admixture proportions.
Tools like PLINK use SNV data to build kinship matrices and perform PCA to correct for population stratification in disease association studies. Forensic applications leverage variant calling to verify familial relationships.
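To make the idea of a kinship matrix concrete, the sketch below computes a genomic relationship matrix from alternate-allele dosages (0, 1, or 2 per individual per SNV), using a VanRaden-style estimator. This is an assumption chosen for illustration and is not necessarily the estimator PLINK applies. The function name and toy data are hypothetical.

```python
def genomic_relationship_matrix(dosages):
    """VanRaden-style relationship matrix from a dosage matrix.

    dosages: list of rows, one per individual, each a list of 0/1/2
    alternate-allele counts across the same SNVs.
    """
    n = len(dosages)
    m = len(dosages[0])
    # Alternate allele frequency per SNV, estimated from the sample itself.
    freqs = [sum(row[j] for row in dosages) / (2 * n) for j in range(m)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all SNVs are monomorphic")
    # Center each dosage by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in dosages]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / denom
         for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [
        [0, 1, 2],
        [0, 1, 2],  # identical to the first individual
        [2, 1, 0],
    ]
    for row in genomic_relationship_matrix(toy):
        print([round(v, 3) for v in row])
```

In the toy data the two identical individuals share the same relationship values, while the third, carrying opposite genotypes, has a negative relationship to both.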
Figure 2: Number of overlapping variants after identifying multi-caller clusters for deletions, duplications, and mCNVs (Jakubosky, 2020)
The complete sequence of a human genome
Journal: Science
Published: 2022
The human reference genome was first released in 2000, but it covered only the euchromatic fraction of the genome. About 8% of the genome, largely heterochromatic, remained unresolved because of its high repetitiveness and complex structure. This study uses long-read sequencing technologies to build the first telomere-to-telomere (T2T) complete human genome assembly (T2T-CHM13), which includes all 22 autosomes and the X chromosome. The assembly fills the remaining gaps and corrects errors in the previous reference.
This study uses the CHM13 cell line (46, XX) for the genome assembly. These cells come from a hydatidiform mole. They are nearly homozygous, meaning very few heterozygous sites exist. This simplifies the assembly and improves accuracy. The method combines PacBio HiFi sequencing with Oxford Nanopore (ONT) ultralong-read technologies. It also uses Illumina short-reads, Hi-C, and Strand-seq data for validation. Together, this approach achieves full genomic coverage.
The T2T-CHM13 assembly uncovered 23 paralogs of the FRG1 gene, surpassing the 9 paralogs documented in GRCh38. These paralogs exhibit high sequence conservation with protein domains remaining highly conserved. The paralog FRG1DP (chr20) shares >99% similarity with FRG1BP4-BP10 (acrocentric chromosomes), explaining prior misannotation issues. The absence of these paralogs in GRCh38 caused systematic misalignment of HiFi reads from other samples. When reanalyzed using the complete T2T-CHM13 reference, reads were correctly assigned to their true genomic loci and variant frequencies normalized to typical heterozygous variation patterns. This case demonstrates how incomplete references introduce technical artifacts in clinical genomics. The T2T-CHM13 assembly resolves such errors, particularly for genes embedded in segmentally duplicated regions.
Figure 3: Reference (gray) and variant (colored) allele coverage is shown for four human HiFi samples mapped to the paralog FRG1DP.
Our standard deliverables include a detailed project report covering methods, workflow, key results (tables and figures), and biological interpretation. We also offer custom deliverables tailored to your needs, such as intermediate data files, custom graphics, or special analysis files.
It includes the following biological information:
References