The rapid development of high-throughput sequencing technologies now provides opportunities to interrogate DNA methylation at single base resolution with high coverage on a massive scale. Bisulfite sequencing is the gold-standard for measuring methylation over the genomes of interest (Wreczycka et al., 2017). The reduced representation bisulfite sequencing (RRBS) has been widely used for studying genome-wide DNA methylation due to its significantly reduced sequencing cost and high-sequencing coverage and sensitivity (Gu et al., 2010; Meissner et al., 2005).
Bioinformatics Analysis Pipeline for RRBS Data
Analysis of DNA methylation patterns on a genome-wide scale is essential to understanding the underlying mechanisms of DNA methylation. The computational pipeline for analysis of RRBS data is shown in Figure 1.
Figure 1. Pipeline for analysis of RRBS data. CpG: CG sequences, C is cytosine and G is guanine. CHG and CHH: H is A (adenine), C or T (thymine).
Alignment tools used for RRBS data analysis
Because of the complexity of bisulfite sequencing alignments (the aligned sequences do not exactly match the reference genome, and the complexity of the libraries is reduced), standard sequence alignment software cannot be used. Due to the unique properties of RRBS, special tools are needed for alignment and analysis. Five commonly used mapping algorithms for benchmarking analysis in RRBS data, include Bismark, BS-Seeker2, BSMAP, GSNAP, and bwa-meth, which are listed in Table 1 (Sun et al., 2018).
Table 1. Brief description of different alignment tools for RRBS data analysis.
Bismark | BS-Seeker2 | bwa-meth | BSMAP | GSNAP | |
Mapping strategy | Three-letter | Three-letter | Three-letter | Wildcard | Wildcard |
Aligner | Bowtie, bowtie2 | Bowtie, bowtie2, SOAP | BWA | SOAP | Gsnap |
Adapter trimming | No | Yes | No | Yes | Yes |
Multi-cores | Yes | Yes | Yes | Yes | Yes |
Directional /undirectional | Yes/Yes | Yes/Yes | Yes/No | Yes/Yes | Yes/Yes |
Single-end/pair-end | Yes/Yes | Yes/Yes | No/Yes | Yes/Yes | Yes/Yes |
Programming language | Perl | Python | Python | C++ | C and Perl |
DNA Methylation databases
A Large amount of data has been generated by the NGS-based DNA methylation detection technologies in the past years. Several methylation databases have been developed to store these data and are available for researchers (Table 2) (Su et al., 2012). With the development of a study about DNA methylation, more databases will be established and then more information about methylation will be known.
Table 2. DNA Methylation databases.
Tools | Purpose | Web page |
MethDB | Database for DNA methylation data | http://www.methdb.de |
MethyCancer Database | Database of cancer DNA methylation data | http://methycancer.psych. ac.cn/ |
PubMeth | Database of DNA methylation literature | http://www.pubmeth.org/ |
NGSmethDB | Database for DNA methylation data at single-base resolution | http://bioinfo2.ugr.es/ NGSmethDB/gbrowse/ |
DBCAT | Database of CpG islands and analytical tools for identifying comprehensive methylation profiles in cancer cells | http://dbcat.cgm.ntu. edu.tw/ |
MethylomeDB | Database of DNA methylation profiles of the brain | http://epigenomics. columbia.edu/ methylomedb/index.html |
DiseaseMeth | Human disease methylation database | http://bioinfo.hrbmu.edu. cn/diseasemeth |
CpG IE | Identification of CpG islands | http://bioinfo.hku.hk/ cpgieintro.html |
CpG IS | Identification of CpG islands | http://cpgislands.usc.edu/ |
CG clusters | Identification of CpG islands | http://greallylab.aecom. yu.edu/cgClusters/ |
CpGcluster | Identification of CpG islands | http://bioinfo2.ugr.es/ CpGcluster |
CpGIF | Identification of CpG islands | http://www.usd.edu/~sye/ cpgisland/CpGIF.htm |
CpG_MI | Identification of CpG islands | http://bioinfo.hrbmu.edu. cn/cpgmi |
CpGProD | Identification of CpG islands | http://pbil.univ-lyon1.fr/ software/cpgprod.html |
EpiGRAPH | Genome scale statistical analysis | http://epigraph.mpi-inf. mpg.de/WebGRAPH |
Galaxy | General purpose analysis | http://main.g2.bx.psu.edu/ |
QDMR | Identification of differentially methylated regions | http://bioinfo.hrbmu.edu. cn/qdmr. |
Batman | MeDIP DNA methylation analysis tool | http://td-blade.gurdon. cam.ac.uk/software/ batman |
CisGenome Browser | A flexible tool for genomic data visualization | http://biogibbs.stanford. edu/~jiangh/browser/ |
MethVisual | Visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing | http://methvisual.molgen. mpg.de/ |
MethTools | A toolbox to visualize and analyze DNA methylation data | http://genome.imb-jena. de/methtools/ |
Challenges of methylation calling in RRBS Data Analysis
There are two key factors affecting the accuracy of methylation calls when determining the methylation state of bisulfite sequencing reads. First, the sequencing reads must be correct and derive entirely from bisulfite-converted sequences. Second, the reads must be correctly mapped to the reference genome. Failure of these two factors will result in the generation of incorrect methylation calls. And in extreme cases, the noise from these miscalls can adversely affect the conclusions of the experiment (Krueger et al., 2012). The process of restriction enzymes (commonly using restriction endonuclease MspI) digestion, bisulfite conversion and sequencing involved in RRBS would affect these two factors.
The MspI digestion would result in a wide range of DNA fragments in different sizes (Figure 2), and usually fragments between 40 and 220bp will be size-selected for the RRBS library. Quite a few MspI digested fragments, even shorter than 40bp, will be generated during the process. If the size selection process is not as good as it is in theory, often a sizeable number of fragments below 40bp can end up in the RRBS library.
Figure 2. The relative frequencies of MspI digestion product sizes in the human reference genome. (Suzuki et al., 2010)
The shorter fragments are more likely to be sequenced than larger (≥300bp) fragments. But short reads in the bisulfite sequencing data could result in low mapping efficiency in data analysis (Figure 3).
Figure 3. Performance of methylation-aware mapping (biased) and unbiased mapping for methylation sequencing data. (Krueger et al., 2012)
The bisulfite treatment of DNA mediates the deamination of unmethylated cytosine into uracil, and these converted residues will be read as thymine, which is determined by PCR-amplification and subsequent sequencing analysis (Figure 4).
Figure 4. The principle of bisulfite sequencing.
Bisulfite sequencing relies on the conversion of every single unmethylated cytosine residue to uracil. Incomplete conversion will cause false positive results due to incorrect interpretation of the unconverted unmethylated cytosines as methylated cytosines (Figure 5).
Figure 5. Incomplete bisulfite conversion.
Due to the short size-selected fragment size in the RRBS library, several factors in the sequencing process would affect the RRBS data analysis (Krueger et al., 2012):
At CD Genomics, we are dedicated to providing reliable epigenomics sequencing services, including EpiTYPER DNA methylation analysis, targeted bisulfite sequencing, reduced representation bisulfite sequencing (RRBS), whole genome bisulfite sequencing, MeDIP sequencing, ChIP-seq, and MethylRAD-seq.
References: