Tel: 1-631-275-3058 (USA)   44-208-144-6005 (Europe)       Email: info@cd-genomics.com
CD Genomics-the genomics service company

UMI RNA-Seq: An Effective Method for Eliminating PCR Bias

UMI RNA-Seq

RNA-seq is perhaps the most widely used method for quantifying transcripts by read counting: it measures and compares the number of copies of each transcript across cell types or conditions. However, traditional RNA-seq relies on a PCR amplification step, which generates enough DNA molecules for sequencing but also introduces bias. PCR bias can cause certain transcripts to be overrepresented in the final sequencing library. The problem of PCR duplicates becomes more acute as the number of PCR cycles increases, as in single-cell RNA-seq. Unique Molecular Identifiers (UMIs) are an effective way to minimize PCR bias, leading to more accurate quantitative estimates of gene expression. UMI RNA-seq, also known as digital RNA-seq, has been widely used in academic and clinical research applications.

Figure 1. The use of UMIs in NGS libraries (Roloff et al. 2017).

What Is UMI?

UMIs are random oligonucleotide barcodes that are increasingly used to confidently identify PCR duplicates in next-generation sequencing (NGS) experiments, especially RNA sequencing (RNA-seq). They are highly diverse index sequences attached to each fragment during library preparation, before PCR amplification, so that every read can be traced back to its molecule of origin. True PCR duplicates can then be recognized because they share both an identical UMI sequence and identical alignment coordinates. UMIs can be applied to a wide range of sequencing methods that require accurate quantification or detection of rare mutations, or that start from low input, such as RNA-seq, single-cell RNA-seq, and immune repertoire sequencing.
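The duplicate rule described above (same UMI and same alignment coordinates) can be illustrated with a small sketch. This is not taken from any particular tool: the read tuples, gene names, and the function name `count_unique_molecules` are hypothetical, and real pipelines typically also tolerate sequencing errors within the UMI itself.

```python
from collections import Counter

def count_unique_molecules(reads):
    """Count reads per gene before and after collapsing PCR duplicates.

    Each read is a hypothetical tuple (gene, chrom, pos, strand, umi).
    Two reads are treated as PCR duplicates only when they share both
    the alignment coordinates (chrom, pos, strand) and the UMI.
    """
    raw = Counter()
    dedup = Counter()
    seen = set()
    for gene, chrom, pos, strand, umi in reads:
        raw[gene] += 1
        key = (chrom, pos, strand, umi)
        if key not in seen:
            seen.add(key)
            dedup[gene] += 1
    return raw, dedup

# Toy input (illustrative values only).
reads = [
    ("geneA", "chr1", 100, "+", "ACGT"),
    ("geneA", "chr1", 100, "+", "ACGT"),  # PCR duplicate of the first read
    ("geneA", "chr1", 100, "+", "ACGT"),  # PCR duplicate of the first read
    ("geneA", "chr1", 100, "+", "TTGA"),  # same position, different UMI: distinct molecule
    ("geneB", "chr2", 500, "-", "ACGT"),
    ("geneB", "chr2", 620, "-", "ACGT"),  # same UMI, different position: distinct molecule
]

raw_counts, dedup_counts = count_unique_molecules(reads)
print(dict(raw_counts))    # reads per gene, duplicates included
print(dict(dedup_counts))  # unique molecules per gene
```

In this toy input, geneA appears twice as abundant as geneB by raw read count, but both genes are represented by two unique molecules once duplicates are collapsed.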

Figure 2. Alignment of read families sorted by UMIs allows for the discrimination of rare variants from protocol artifacts introduced during PCR or sequencing procedures (Roloff et al. 2017). ECS denotes error-corrected sequencing.
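The idea behind Figure 2 can be sketched in a few lines: reads are grouped into families by UMI, and a base call is accepted only when the reads within a family largely agree. The sketch below is a simplified illustration, not the error-corrected sequencing method of the cited work; the function name `family_consensus` and the thresholds `min_family_size` and `min_agreement` are assumptions chosen for the example.

```python
from collections import Counter, defaultdict

def family_consensus(reads, min_family_size=3, min_agreement=0.7):
    """Call one consensus base per UMI family at a single position.

    `reads` is an iterable of hypothetical (umi, base) pairs observed at
    one genomic position. Families with too few reads, or without a
    sufficiently dominant base, are skipped. Both thresholds are
    illustrative assumptions, not recommended values.
    """
    families = defaultdict(list)
    for umi, base in reads:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few reads to form a reliable consensus
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[umi] = base
    return consensus

# Toy input: the reference base is C at this position.
reads = [
    ("AAAA", "C"), ("AAAA", "C"), ("AAAA", "C"), ("AAAA", "C"),
    ("CCCC", "C"), ("CCCC", "C"), ("CCCC", "T"), ("CCCC", "C"),  # one read carries an artifact
    ("GGGG", "T"), ("GGGG", "T"), ("GGGG", "T"),                 # every read carries the variant
    ("TTTT", "C"),                                                # singleton family, skipped
]

calls = family_consensus(reads)
print(calls)
```

A variant present in the original molecule appears in every read of its family (family GGGG), whereas an error introduced during PCR or sequencing appears in only some reads and is outvoted (family CCCC).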

Workflow of UMI RNA-seq

The workflow of UMI RNA-seq consists of RNA isolation, rRNA removal, cDNA library construction with UMI barcodes (Figure 3), library quality assessment, deep sequencing, and data analysis. In the library construction step, the rRNA-depleted RNA is fragmented, reverse transcribed into cDNA, and ligated to UMI-containing adapters, followed by library amplification and library QC.

Figure 3. UMI incorporation and library amplification in UMI RNA-seq experiments (Dixit 2016).

After deep sequencing, raw data are preprocessed to remove adapter sequences and low-quality reads. UMIs in RNA-seq data can be identified using umitools reformat_fastq, and PCR duplicates can be marked using umitools mark_duplicates. Aligning read families sorted by UMI then supports the discovery of novel and rare transcript variants, as well as read counting and the comparison of read abundances across samples to identify differentially expressed transcripts.
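To make the UMI identification step more concrete, the sketch below shows one common way a UMI can be carried through alignment: it is trimmed from the start of the read and appended to the read name. This is not the umitools implementation. The function name `move_umi_to_header`, the 8-nt UMI length, the position of the UMI at the 5' end, and the underscore-separated header format are all assumptions made for illustration; the actual layout depends on the library preparation kit.

```python
def move_umi_to_header(record, umi_len=8):
    """Trim a leading UMI from a FASTQ record and append it to the read name.

    `record` is a hypothetical 4-tuple (header, sequence, plus, quality).
    The UMI is assumed to occupy the first `umi_len` bases of the read.
    """
    header, seq, plus, qual = record
    umi = seq[:umi_len]
    # Keep any comment after the first space in the header intact.
    name, _, comment = header.partition(" ")
    new_header = f"{name}_{umi}"
    if comment:
        new_header += f" {comment}"
    # Trim the UMI bases and their quality scores from the read.
    return (new_header, seq[umi_len:], plus, qual[umi_len:])

# Toy FASTQ record (illustrative values only).
record = ("@read1 1:N:0", "ACGTACGTTTGCAAGGCT", "+", "IIIIIIIIFFFFFFFFFF")
trimmed = move_umi_to_header(record)
print("\n".join(trimmed))
```

Because the UMI travels in the read name, it remains available after alignment, when reads with the same UMI and the same coordinates can be marked as duplicates.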

References:

  1. Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Research, 2017, 27(3): 491-499.
  2. Fu Y, Wu P H, Beane T, et al. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics, 2018, 19(1): 531.
  3. Melsted P, Ntranos V, Pachter L. The barcode, UMI, set format and BUStools. Bioinformatics, 2019, 35(21): 4472-4473.
  4. Roloff G W, Lai C, Hourigan C S, et al. Technical advances in the measurement of residual disease in acute myeloid leukemia. Journal of Clinical Medicine, 2017, 6(9): 87.
  5. Dixit A. Correcting chimeric crosstalk in single cell RNA-seq experiments. bioRxiv, 2016: 093237.
* For Research Use Only. Not for use in diagnostic procedures.
CONTACT CD GENOMICS

45-1 Ramsey Road, Shirley, NY 11967, USA
Tel: 1-631-275-3058 (USA)
       44-208-144-6005 (Europe)
Fax: 1-631-614-7828
Email: info@cd-genomics.com