With the development of biological experimental technology, especially gene-sequencing technology, both laboratory and clinical researchers have come to regard genome sequencing as one of the best ways to analyze the etiology, pathophysiology, treatment and prognosis of diseases. Research has further shown that only about 30 million base pairs of the human genome carry the essential information for proteins.
The exome is usually defined as the sequence encompassing all exons of protein-coding genes, as well as non-protein-coding elements such as microRNA or lncRNA. Investigating the exome helps to identify which loci are responsible for particular diseases. When researchers only need exon information, whole genome sequencing is unnecessarily expensive, considering that the human genome is over 3 billion base pairs long. For the study of rare Mendelian diseases, exome sequencing is a more cost-effective way to identify genetic variants. Breakthroughs in target-enrichment strategies and DNA sequencing techniques have driven the development of whole exome sequencing (WES).
Principle of exome sequencing
Exome sequencing consists of two main processes: target enrichment and sequencing. Target enrichment selects and captures the exome from DNA samples. There are two major enrichment methods, which capture the exome on a microarray or on magnetic beads.
Both methods are effective ways to extract the exome from the genome, so their sensitivity is considered high enough. The problem is specificity. Some non-exonic parts of the genome share similar sequences with certain exons. Those regions may also bind to the microarray or magnetic beads, resulting in false positives.
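The specificity problem described above is often summarized as an on-target rate, the fraction of sequenced reads that fall inside the intended capture regions. The following is a minimal sketch of that calculation, assuming reads are reduced to a single start coordinate on one chromosome and targets are half-open `(start, end)` intervals. The function names and the example coordinates are hypothetical and only serve as an illustration.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def on_target_rate(read_positions, targets):
    """Fraction of read start positions that lie inside a target interval."""
    if not read_positions:
        return 0.0
    merged = merge_intervals(targets)
    starts = [start for start, _ in merged]
    hits = 0
    for pos in read_positions:
        # Find the last interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(read_positions)


# Hypothetical example: two exons (one given as two overlapping probes)
# and five reads, two of which landed off target.
example_targets = [(100, 200), (150, 300), (1000, 1100)]
example_reads = [120, 250, 500, 1050, 1100]
example_rate = on_target_rate(example_reads, example_targets)
```

A low on-target rate in such a check would suggest that off-target regions with exon-like sequence were captured along with the exome.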
Sequencing is the process of determining the order of all the deoxyribonucleotides in the exome, which may help us understand the pathophysiological alterations underlying some diseases. Because of its lower cost, whole exome sequencing has become increasingly important. Whole genome sequencing costs approximately two to three times as much as whole exome sequencing. So why not run more samples using whole exome sequencing to obtain more statistically significant results?
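The arithmetic behind this argument can be sketched in a few lines. The example below uses only the figures quoted in this article (a genome of about 3 billion base pairs, an exome of about 30 million base pairs, and a cost ratio of two to three). The function name is hypothetical, and actual prices vary by provider and over time.

```python
import math

GENOME_BP = 3_000_000_000  # approximate human genome size quoted above
EXOME_BP = 30_000_000      # approximate exome size quoted above

# Fraction of the genome covered by the exome (about 1%).
exome_fraction = EXOME_BP / GENOME_BP


def wes_samples_for_budget(n_wgs_samples, cost_ratio):
    """Number of WES samples affordable for the price of n WGS samples.

    cost_ratio is the assumed cost of one whole genome divided by the
    cost of one whole exome (two to three according to the text).
    """
    if n_wgs_samples < 0 or cost_ratio <= 0:
        raise ValueError("sample count must be >= 0 and cost ratio > 0")
    return math.floor(n_wgs_samples * cost_ratio)


# A budget for 10 whole genomes buys roughly 20 to 30 exomes.
low_estimate = wes_samples_for_budget(10, 2)
high_estimate = wes_samples_for_budget(10, 3)
```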
General workflow of exome sequencing
A common workflow of exome sequencing is shown below, and its major steps are discussed in the following sections.
Figure1. Workflow of whole exome sequencing. Note that the detailed procedures vary with the type of sample, reagent kit and sequencing instrument. Researchers should follow the instructions of their reagent kits and sequencing instruments.
• Prepare your DNA samples: DNA fragmentation
Almost every DNA experiment begins with DNA fragmentation. DNA extracted from tissues or cells is usually too long to be processed directly, so it has to be sheared into pieces of a suitable length. This shearing process is called DNA fragmentation. The appropriate fragment length is determined by the sequencing instrument that you choose. There are several major ways to fragment DNA samples for whole exome sequencing.
After fragmentation, your DNA samples are ready for the target-enrichment process.
• Isolation of exome: target-enrichment methods
The exome has to be isolated from the human genome before sequencing, as it accounts for only about 1% of the genome. The process of capturing the target genomic regions is called target enrichment. The basic idea is to separate the material of interest from everything else by exploiting the differences in their physicochemical properties. Several target-enrichment kits are in common use. No matter which kit you choose, variability in capture influences your exome sequencing results, so pay attention to the quality, quantity and fragment sizes of your DNA samples.
Table1. Common kits of target-enrichment for sequencing.
|Kits||Targeted Region||Genomic DNA Input Required||Adapter Addition||Probe Length (mer)|
|Agilent SureSelect XT2 V6 Exome||60 Mb||100 ng||Ligation||120|
|Agilent SureSelect XT2 V5 Exome||51 Mb||100 ng||Ligation||120|
|IDT xGEN Exome Panel||39 Mb||500 ng||Ligation||not described|
|Illumina Nextera Rapid Capture Expanded Exome||62 Mb||50 ng||Transposase||95|
|Roche NimbleGen SeqCap EZ Exome v3.0||64 Mb||1 µg||Ligation||60 - 90|
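Since the required DNA input differs widely between kits, a quick check of which kits are feasible for a given sample can be useful. The sketch below encodes the target size and input requirement from Table1 under shortened kit names. The dictionary layout and the function name are hypothetical, and the values should be verified against the current kit documentation before use.

```python
# Values transcribed from Table1; kit names are shortened for readability.
KITS = {
    "SureSelect XT2 V6": {"target_mb": 60, "input_ng": 100},
    "SureSelect XT2 V5": {"target_mb": 51, "input_ng": 100},
    "xGEN Exome Panel": {"target_mb": 39, "input_ng": 500},
    "Nextera Rapid Capture Expanded Exome": {"target_mb": 62, "input_ng": 50},
    "SeqCap EZ Exome v3.0": {"target_mb": 64, "input_ng": 1000},  # 1 ug
}


def kits_compatible_with(available_ng):
    """Return the kits whose required genomic DNA input is available."""
    return sorted(
        name for name, kit in KITS.items() if kit["input_ng"] <= available_ng
    )


# With 100 ng of genomic DNA, three of the five listed kits are feasible.
feasible = kits_compatible_with(100)
```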
• Harvest your products: washing and elution
After the exome has been separated from the rest of the genome, several rounds of washing are required. Washing means exactly what the word suggests: removing everything we do not want while keeping the material of interest. In this case, the unwanted substances include the other parts of the genome, proteins and electrolytes. Distilled water is usually used to elute the target, but some reagent kits require a specific eluent. The eluent is the reagent that releases the exome from the microarray or magnetic beads by breaking the bond between the exome and the binding substance. Both washing and elution can be repeated to obtain a purer exome. In some cases, a second round of target enrichment is performed to improve the result. Follow the instructions of the reagent kit you use, and adjust your protocol to your actual situation.
• Sequencing technology
Because of the time cost and read-length limits of Sanger sequencing, sequencing technology contributed relatively little to biological and clinical studies until next generation sequencing (NGS) technologies were invented. NGS technologies build on the use of dye-labeled terminator nucleotides (ddNTPs) in the Sanger method. The improvement is that NGS allows many DNA strands to be bound, amplified and detected in parallel, leading to a dramatic increase in throughput and efficiency. To simplify, the principle of NGS is to bind the exome fragments to a suitable support (such as the flow cell of the Illumina HiSeq or the beads of Roche 454) and amplify them in situ by PCR, so that the signal in every round of elongation is amplified. The labeled nucleotides are then detected after every round of elongation. Finally, the complete sequence is reconstructed using bioinformatics algorithms. NGS greatly improves efficiency and allows higher-throughput detection, which is why it is also called high-throughput sequencing and is widely used.
Beyond NGS, third generation sequencing is developing rapidly and promises to exceed the efficiency of NGS. Its key feature is single-molecule sequencing, which can shorten sequencing run times considerably. Companies such as Pacific Biosciences and Oxford Nanopore have shown that their methods work, and third generation sequencing technology could lead a revolution in exome sequencing.
Some sequencing methods commonly used for whole exome sequencing today are listed below.
Table2. Common methods used for sequencing nowadays.
|Methods||Company||Generation||Read length||Accuracy||Reads per run||Time per run|
|Ion semiconductor||Ion Torrent||2nd generation||Up to 600 bp||99.60%||up to 80 million||2 hours|
|Pyrosequencing (454)||Roche||2nd generation||700 bp||99.90%||1 million||24 hours|
|Sequencing by synthesis||Illumina||2nd generation||75-300 bp||99.90%||1 million to 3 billion||1 to 11 days|
|Sequencing by ligation (SOLiD)||ABI||2nd generation||50+35 or 50+50 bp||99.90%||1.2 to 1.4 billion||1 to 2 weeks|
|Nanopore Sequencing||Oxford Nanopore Technologies||3rd generation||up to 500 kb||92–97% (single read)*||dependent on read length selected by user||1 min to 48 hours|
|Single-molecule real-time sequencing||Pacific Biosciences||3rd generation||30,000 bp||87% (single read)*||10-20 billion||0.5-20 hours|
*For third generation sequencing, accuracy is usually improved by sequencing the same molecule or region multiple times.
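The effect of repeated sequencing on accuracy can be illustrated with a deliberately simplified model. The sketch below assumes that every pass reads a given base correctly with the same probability, that errors are independent between passes, and that the consensus is correct whenever more than half of the passes are correct. Real consensus algorithms are more sophisticated, and the function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(single_pass_accuracy, n_passes):
    """Probability that a strict majority of n independent passes is correct.

    Simplified binomial model: each pass is either right or wrong, with
    independent errors. n_passes must be odd so that ties cannot occur.
    """
    if n_passes < 1 or n_passes % 2 == 0:
        raise ValueError("n_passes must be a positive odd number")
    p = single_pass_accuracy
    k_min = n_passes // 2 + 1  # smallest number of correct passes that wins
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


# Using the 87% single-read accuracy listed in Table2 as the input value.
accuracy_by_passes = {n: majority_vote_accuracy(0.87, n) for n in (1, 3, 5, 9)}
```

Under these assumptions, three passes already raise the per-base accuracy from 87% to roughly 95%, and further passes increase it again.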
• Data analysis
Raw sequencing data are difficult to interpret without bioinformatics analysis, because most sequencing methods produce short reads that must be aligned to a reference genome or assembled before any conclusion can be drawn. The following pipeline can be used by researchers who are interested in performing WES analysis for variant calling in genetic diseases.
Figure2. The typical variant calling pipeline.
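To make the idea of variant calling concrete, the following toy example counts the bases observed at each reference position and reports positions where a non-reference base is sufficiently frequent. It is only a sketch under strong assumptions: reads are already aligned without gaps, given as `(start, sequence)` pairs with 0-based starts, and base quality is ignored. The function name, the thresholds and the example reads are hypothetical; production pipelines rely on dedicated aligners and variant callers.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Report (pos, ref_base, alt_base, alt_count, depth) for simple SNVs."""
    # Build a pileup: one base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        for base, count in counts.most_common():
            if base != reference[pos] and count / depth >= min_alt_fraction:
                variants.append((pos, reference[pos], base, count, depth))
    return variants


# Hypothetical example: two of the four reads carry an A instead of T
# at position 3, which looks like a heterozygous variant.
example_reference = "ACGTACGT"
example_reads = [(0, "ACGTAC"), (0, "ACGAAC"), (2, "GAACGT"), (2, "GTACGT")]
example_calls = call_variants(example_reference, example_reads)
```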
Exome sequencing has brought great benefits to both academic research and clinical diagnosis. Thanks to exome sequencing, our understanding of the genome has reached a new level. Many diseases that used to be mysteries, such as neurological disorders in infants, can now be predicted. Furthermore, many diseases with few treatment options, such as carcinoma, can now be treated with targeted therapy. A fourth generation of sequencing technology is said to be under development, and we hope it will drive another revolution in biological and medical research.