Sanger sequencing is a mature and widely used DNA sequencing technology and an important source of gene information in molecular biology research, clinical diagnosis, and other fields. Its results are usually presented in two forms: the electropherogram (also called the sequencing chromatogram or peak trace) and the corresponding base sequence. The chromatogram shows how the fluorescently labeled fragments were separated during sequencing, while the base sequence is the software's interpretation of that chromatogram.
This form of presentation offers single-base resolution and high accuracy, and it reflects the base at each position in the DNA fragment. However, many researchers and clinicians find Sanger sequencing results difficult to interpret. In practice, they may encounter messy or abnormal peak patterns and ambiguous base calls, especially at point mutations, insertions, deletions, and other variants, and judging these positions correctly is a real challenge. Relating the sequencing results to the research objectives and applying them properly to experimental design and conclusions also requires experience and specialist knowledge.
This article covers how Sanger sequencing results are presented, the indicators used to evaluate their quality, the methods used to analyze them, and their applications, with the aim of helping researchers interpret and apply such results accurately.
Sanger sequencing results are mainly presented in two forms: the electropherogram (chromatogram) and the base sequence. The differently colored peaks in the chromatogram correspond to the bases A, T, C, and G, and the sharpness and height of the peaks reflect the signal quality. The base sequence is called from the chromatogram, with a quality value attached to each base. This format offers single-base resolution and shows sequence details intuitively, but it is also limited in read length and suffers from signal decay toward the end of the read.
The Sanger electropherogram is generated by separating DNA fragments of different lengths by capillary electrophoresis. In the plot, the horizontal axis represents the base position (that is, the position along the read) and the vertical axis represents the fluorescence signal intensity. During electrophoresis, the fragments migrate past a detector, which reads the fluorescent label on the terminating dideoxynucleotide (ddNTP) of each fragment. Each base corresponds to a different display color: adenine (A) is green, cytosine (C) is blue, guanine (G) is black (yellow in some viewers), and thymine (T) is red.
Interpreting the peaks is the core of reading the chromatogram. Each clear, sharp peak represents a specific base at that position, and the height of the peak reflects the signal intensity of that base. A tall, sharp peak indicates that the sequencing reaction extended efficiently and specifically at that position. The consecutive peaks together form the complete DNA sequence, and software identifies the color and position of each peak and converts them directly into the corresponding base sequence.
Sanger sequencing map (Li et al., 2022)
To evaluate the reliability of Sanger sequencing results objectively, researchers use a set of quality indicators, the most common of which are the Phred quality score and sequencing depth.
The Phred quality score (Q value) measures the accuracy of each individual base call. It is calculated as Q = -10 × log10(P), where P is the probability that the base call is wrong. For example, Q20 means the error probability for that base is 1%, and Q30 means it is 0.1%. In practice, sequencing results are usually expected to have more than 90% of bases above Q20 and more than 80% above Q30 for the data to be considered accurate. Sequencing analysis software (such as Sequencher and BioEdit) can display the Phred quality score of each base, which helps researchers judge how trustworthy different regions of a read are.
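The relationship between Q and P, and the Q20/Q30 proportions mentioned above, can be sketched in a few lines of Python. This is only an illustration: the function names and the example quality values are hypothetical, and a threshold of "at least Q20" is assumed where the text says "above Q20".

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred score, Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def fraction_at_or_above(qualities, threshold):
    """Fraction of bases whose quality score is at least `threshold`."""
    if not qualities:
        return 0.0
    return sum(q >= threshold for q in qualities) / len(qualities)

# Hypothetical per-base quality scores for a short read (illustrative only).
example_qualities = [12, 25, 38, 45, 52, 50, 41, 33, 28, 19]

q20_fraction = fraction_at_or_above(example_qualities, 20)
q30_fraction = fraction_at_or_above(example_qualities, 30)

print(f"P at Q20: {phred_to_error_prob(20):.4f}")
print(f"P at Q30: {phred_to_error_prob(30):.4f}")
print(f"Bases >= Q20: {q20_fraction:.0%}, bases >= Q30: {q30_fraction:.0%}")
```

In this made-up read, the low-quality bases sit at the two ends, which mirrors the signal decay typically seen at the start and end of a Sanger trace.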
In Sanger sequencing, sequencing depth usually refers to the number of times the same DNA fragment is sequenced. Unlike high-throughput sequencing, Sanger sequencing is generally performed at low depth (usually 1-2 reads), but because of its high accuracy a single read meets most experimental requirements. Where the result must be highly reliable (such as mutation confirmation in clinical diagnosis), the same template is usually sequenced in both directions (forward and reverse) or sequenced repeatedly. The two reads of a bidirectional run verify each other and reduce the errors that a single direction might introduce, which is especially useful for detecting variants in long DNA fragments.
Read length is another important evaluation indicator. The average read length of Sanger sequencing is usually 500-800 bases, and high-quality reads can exceed 1000 bases. Read length determines how much of a long DNA fragment a single read can cover, so when designing a sequencing experiment, the sequencing strategy should be planned according to the length of the target fragment to ensure that the whole target region is covered.
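As a rough illustration of how forward and reverse reads verify each other, the sketch below reverse-complements the reverse read and lists the positions where the two reads disagree. It assumes, for simplicity, that both reads cover exactly the same region and contain no indels; real reads would first need trimming and alignment. The function names and sequences are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def discordant_positions(forward_read, reverse_read):
    """Return 1-based positions (in forward orientation) where the two reads disagree.

    Simplifying assumption: both reads span exactly the same region with no indels.
    """
    reverse_as_forward = reverse_complement(reverse_read)
    if len(forward_read) != len(reverse_as_forward):
        raise ValueError("reads must cover the same region in this simplified sketch")
    return [
        position
        for position, (f, r) in enumerate(
            zip(forward_read.upper(), reverse_as_forward), start=1
        )
        if f != r
    ]

# Hypothetical reads: the second reverse read carries one discordant base.
forward = "ATGGCCATTG"
reverse_concordant = "CAATGGCCAT"
reverse_discordant = "CAATGGTCAT"

print(discordant_positions(forward, reverse_concordant))
print(discordant_positions(forward, reverse_discordant))
```

Positions reported as discordant are the ones worth inspecting in both chromatograms before a base call is accepted.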
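The planning step can be made concrete with a small calculation. The sketch below estimates how many overlapping reads are needed to cover a target in one direction. The default usable read length of 700 bases and overlap of 100 bases are illustrative assumptions, not recommended values, and the function names are hypothetical.

```python
import math

def reads_needed(target_length, usable_read_length=700, overlap=100):
    """Estimate the number of overlapping reads needed to cover a target in one direction."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if not 0 <= overlap < usable_read_length:
        raise ValueError("overlap must be non-negative and shorter than the read length")
    if target_length <= usable_read_length:
        return 1
    # Each read after the first adds (usable_read_length - overlap) new bases.
    step = usable_read_length - overlap
    return math.ceil((target_length - overlap) / step)

def read_start_positions(target_length, usable_read_length=700, overlap=100):
    """Return 0-based start offsets of each read along the target."""
    count = reads_needed(target_length, usable_read_length, overlap)
    step = usable_read_length - overlap
    return [i * step for i in range(count)]

print(reads_needed(500))
print(reads_needed(2000))
print(read_start_positions(2000))
```

The start offsets indicate roughly where additional sequencing primers would need to sit if the target is longer than a single read.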
Sanger sequencing vs next generation sequencing (NGS) (Botella et al., 2015)
Sanger sequencing is a high-accuracy sequencing technology, and data analysis is the key step in extracting gene information from it. The analysis relies on dedicated tools to read the chromatogram peaks, call the base sequence, judge reliability using the quality indicators, detect abnormalities such as double peaks and mutations, and reveal gene variation by comparison with a reference sequence, providing a basis for research and clinical applications.
Analyzing Sanger sequencing results requires dedicated software, which helps researchers quickly call base sequences, evaluate sequence quality, align reads to reference sequences, and detect variants. Several commonly used tools and their main functions are introduced below.
FinchTV is a free, easy-to-use viewer for sequencing results that supports several trace file formats (such as .ab1 and .scf). Its main functions include displaying the electropherogram, the base sequence, and the corresponding Phred quality scores. Users can zoom in on the electropherogram to inspect peak shapes directly and manually correct bases that the software called incorrectly. FinchTV also provides a basic sequence comparison function for comparing the read with a reference sequence and making a preliminary judgment about whether a variant is present.
Chromas is another widely used sequencing analysis program with a broader feature set. In addition to the basic functions found in FinchTV, it supports sequence editing, reverse-complement generation, restriction site analysis, and more.
For researchers who need large-scale sequence analysis or complex mutation detection, more advanced software such as Sequencher and BioEdit is available. These programs support multiple sequence alignment, automatic mutation detection, assembly of overlapping reads, and other functions, and are suited to tasks such as gene cloning verification and mutation screening.
SeqTrace's user interface, including the project window (A) and the trace-view window (B) (Stucky et al., 2012)
Sanger sequencing results often contain abnormal peaks, such as double peaks, missing peaks, and noise peaks. These affect the accurate calling of the base sequence and need to be recognized and handled correctly.
A double peak is two overlapping peaks of similar height at the same base position. Double peaks are usually caused by template contamination, heterozygous samples, or nonspecific amplification.
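One simple way to screen for double peaks is to compare the two strongest fluorescence channels at each called position. The sketch below flags positions where the secondary peak reaches a chosen fraction of the primary peak. The data layout (one dictionary of peak heights per position), the function name, and the 0.3 ratio threshold are assumptions made for illustration; they are not taken from any particular software.

```python
def find_double_peaks(peak_heights, min_ratio=0.3):
    """Flag positions where the second-strongest channel is at least
    `min_ratio` of the strongest one.

    `peak_heights` is a list with one dict per base position, mapping
    each base (A, C, G, T) to its peak height at that position.
    Returns tuples of (1-based position, primary base, secondary base, ratio).
    """
    flagged = []
    for position, heights in enumerate(peak_heights, start=1):
        ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
        (primary_base, primary), (secondary_base, secondary) = ranked[0], ranked[1]
        if primary > 0 and secondary / primary >= min_ratio:
            ratio = round(secondary / primary, 2)
            flagged.append((position, primary_base, secondary_base, ratio))
    return flagged

# Hypothetical peak heights for three positions; position 2 shows a C/T double peak.
example_trace = [
    {"A": 1200, "C": 40, "G": 35, "T": 20},
    {"A": 30, "C": 650, "G": 25, "T": 600},
    {"A": 15, "C": 20, "G": 1100, "T": 90},
]

print(find_double_peaks(example_trace))
```

A flagged position still needs visual inspection: an isolated double peak in an otherwise clean trace suggests a heterozygous site, whereas double peaks throughout the read point more toward contamination or nonspecific amplification.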
A-N The main issues encountered in the reading of DNA chromatograms of PCR products based on the Sanger sequencing method (Al-Shuhaib et al., 2023)
Comparing the sequencing results with a reference sequence is the key step in analyzing sequencing data. Variants such as point mutations, insertions, and deletions can be detected by this comparison, which provides a basis for subsequent research.
First, obtain the reference sequence of the target gene or fragment, which can be downloaded from public databases such as GenBank. Then use sequence alignment software (such as BLAST, ClustalW, or MegAlign) to align the sequencing read with the reference sequence. The result is usually displayed as an alignment in which matching bases are shown with the same characters, mismatched bases are marked with different characters, and inserted or deleted bases are represented by dashes or other symbols.
The point mutation is the most common variant type: a single base in the sequencing read differs from the reference sequence. For example, if the base in the reference sequence is "A" and the corresponding position in the read is "G", there is an A>G point mutation at that position. Checking the peak shape and the Phred quality score at that position confirms whether the mutation is reliable and avoids false positives caused by sequencing errors.
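The logic of reading variants off an alignment can be sketched as follows. The code assumes the two sequences have already been aligned by an external tool and are supplied as equal-length strings with "-" marking gaps; the function name and example sequences are hypothetical, and each gapped column is reported separately rather than merged into multi-base indels.

```python
def call_variants(aligned_reference, aligned_read):
    """List differences between a gapped reference and a gapped read.

    Both inputs must come from the same pairwise alignment (equal length,
    '-' for gaps). Positions are 1-based reference coordinates; an insertion
    is reported at the reference base it follows.
    """
    if len(aligned_reference) != len(aligned_read):
        raise ValueError("aligned sequences must have the same length")
    variants = []
    ref_position = 0  # number of reference bases consumed so far
    for ref_base, read_base in zip(aligned_reference.upper(), aligned_read.upper()):
        if ref_base != "-":
            ref_position += 1
        if ref_base == read_base:
            continue
        if ref_base == "-":
            variants.append((ref_position, "insertion", read_base))
        elif read_base == "-":
            variants.append((ref_position, "deletion", ref_base))
        else:
            variants.append((ref_position, "substitution", f"{ref_base}>{read_base}"))
    return variants

# Hypothetical alignment with one substitution, one insertion, and one deletion.
reference = "ATGACC-GTTA"
read      = "ATGGCCTGT-A"

for variant in call_variants(reference, read):
    print(variant)
```

Each reported variant should then be checked against the chromatogram and the quality score at that position, as described above, before it is accepted.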
Comparing the different processes of the Sanger method and NGS in detecting different pathogens (Nafea et al., 2023)
Thanks to its high accuracy, Sanger sequencing has become a cornerstone of molecular biology research, and its results are used in many fields. From verifying gene cloning to confirm that a fragment was inserted correctly, to supporting disease diagnosis and treatment through mutation detection, to studying gene function to reveal how a gene acts, accurate interpretation of sequencing results is key to scientific and clinical progress.
In genetic engineering research, the inserted fragment must be verified by Sanger sequencing after a recombinant plasmid is constructed. For example, a research team inserted a target gene into the pET-28a vector to construct a recombinant expression plasmid. After sequencing the plasmid, they compared the results with the reference sequence of the target gene and with the vector sequence. The inserted fragment was completely consistent with the target gene and was inserted in the correct orientation, with no base mutations or deletions, indicating that the recombinant plasmid had been constructed successfully and could be used for subsequent protein expression experiments.
Sequence chromatogram (A) and sequence quality evaluation (B) from clinical Staphylococcus aureus strain 1 (Chen et al., 2014)
In clinical diagnosis, Sanger sequencing is often used to detect disease-related gene mutations. For example, when the mutation hotspot regions of the EGFR gene were sequenced in a tumor tissue sample from a patient with suspected lung cancer, a deletion in exon 19 was found. Taken together with the patient's clinical symptoms and other examination results, this finding can indicate that the patient is a suitable candidate for treatment with EGFR tyrosine kinase inhibitors.
In gene function studies, Sanger sequencing can be used to verify the outcome of gene knockout or knock-in experiments. For example, researchers used CRISPR-Cas9 to knock out a gene in mice, amplified the target region by PCR, and sequenced it. If the sequencing results show the expected deletion or insertion in the target region, and the mutation shifts the reading frame of the gene, the knockout is considered successful. The biological function of the gene can then be studied by observing the phenotypic changes in the knockout mice.
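Whether an indel shifts the reading frame comes down to simple arithmetic: the frame is preserved only when the net change in length is a multiple of three. A minimal sketch of that check is shown below; the function name is hypothetical, and it assumes the indel lies within the coding sequence.

```python
def is_frameshift(deleted_bases=0, inserted_bases=0):
    """Return True if the net length change is not a multiple of 3.

    Assumes the indel falls inside a coding sequence; it says nothing
    about splice sites or regulatory regions.
    """
    if deleted_bases < 0 or inserted_bases < 0:
        raise ValueError("base counts must be non-negative")
    net_change = inserted_bases - deleted_bases
    return net_change % 3 != 0

print(is_frameshift(deleted_bases=7))                    # 7-bp deletion
print(is_frameshift(deleted_bases=6))                    # 6-bp deletion
print(is_frameshift(deleted_bases=2, inserted_bases=1))  # net loss of 1 bp
```

An in-frame indel can still disrupt the protein, so a result of False here does not by itself mean the edit had no effect.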
Amplification curves (A) and melting curves (B) of partial experimental strains (Chen et al., 2014)
Correct interpretation and analysis of Sanger sequencing results are key to making full use of this technology. They determine not only the reliability of the experimental results but also the direction of subsequent research and the accuracy of the conclusions drawn. By understanding how sequencing results are presented, how their quality is evaluated, and how the data are analyzed, researchers can call base sequences and detect gene variants accurately, and apply sequencing data effectively to gene cloning verification, mutation detection, gene function research, and other fields.
References: