What is RNA-Seq?
Regulation of gene expression is fundamental to linking genotypes with phenotypes. RNAs shape complex gene expression networks that drive biological processes. An in-depth understanding of the mechanisms that govern these networks is vital for the treatment of complex diseases such as cancer. Hybridization-based microarrays allow the simultaneous monitoring of expression levels of annotated genes in cell populations. However, sequencing-based genome-wide approaches have been shown to provide more valuable insights into transcriptomes. Next- and third-generation sequencing platforms allow the rapid and cost-effective generation of massive amounts of sequence data. RNA profiling with these high-throughput sequencing technologies is known as RNA-seq.
What are the applications of RNA-Seq?
Because RNA-seq is quantitative, it can be used to determine RNA expression levels. Beyond this basic function, RNA-seq can be used for differential gene expression analysis, variant detection and allele-specific expression, small RNA profiling, characterization of alternative splicing patterns, systems biology, and single-cell RNA-seq.
Figure 1. Overview of the typical RNA-seq analysis pipeline (Han et al. 2015).
• Differential gene expression
An important application of RNA-seq is the comparison of transcriptomes across different developmental stages, treatments, or disease conditions. This analysis, known as differential gene expression analysis, requires the identification of genes and their isoforms and a precise assessment of their expression levels. It helps to elucidate the functional elements of the genome and to uncover the biological mechanisms of development and disease.
Common tools for differential gene expression analysis include Cuffdiff, DESeq, DESeq2, edgeR, PoissonSeq, limma-voom, and MISO.
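The core idea behind comparing expression levels between two conditions can be sketched in a few lines. The example below is a minimal illustration only: it scales raw counts to counts per million (CPM) and computes a per-gene log2 fold change. The gene names, counts, and pseudocount are hypothetical, and the sketch omits the statistical modelling (for example, the negative binomial models used by DESeq2 and edgeR) and the replicates that a real analysis needs.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) on CPM values.

    The pseudocount (an assumption of this sketch) avoids taking log2 of zero
    for genes without reads in one of the samples.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in cpm_control
    }


# Hypothetical raw counts for three genes in one control and one treated sample.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 400, "geneB": 400, "geneC": 200}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2FC = {value:.2f}")
```

A positive value indicates higher expression in the treated sample and a negative value indicates lower expression. A fold change alone does not establish significance, which is why the dedicated tools listed above also estimate variance across replicates.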
• Variant detection and allele-specific expression
RNA-seq allows the identification of sequence variants and allele-specific expression (ASE). A single-nucleotide polymorphism (SNP) is a variation in a single nucleotide at a specific position in the genome, and such variants may lead to ASE. ASE means that one of the two alleles is transcribed into mRNA at a high level while the other is transcribed at a low level or not at all. Recent studies have also associated ASE with susceptibility to a number of human diseases. Together, RNA-seq and whole-genome DNA sequencing (WGS) allow the identification of disease-associated variants such as SNPs and of the ASE they may cause.
Common tools for variant detection include GATK, ANNOVAR, SNPiR, and SNiPlay3.
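One simple way to check for allelic imbalance at a heterozygous SNP is to compare the reads supporting each allele against the 50:50 ratio expected without ASE. The sketch below uses an exact two-sided binomial test for this. The function names and read counts are hypothetical, and the sketch ignores reference-mapping bias and overdispersion, which dedicated pipelines usually correct for.

```python
from math import comb


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads that carry the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial test against an expected reference fraction p.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one. Intended for modest read depths only.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so that ties are counted despite rounding.
    tail = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at two heterozygous SNPs.
imbalanced = (18, 2)   # 18 reference reads, 2 alternative reads
balanced = (10, 10)

print(allelic_ratio(*imbalanced), binomial_two_sided_p(*imbalanced))
print(allelic_ratio(*balanced), binomial_two_sided_p(*balanced))
```

A small p-value for the first SNP suggests that one allele is preferentially expressed, while the balanced SNP gives no evidence of ASE. In a genome-wide analysis the p-values would also need correction for multiple testing.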
• Small RNA profiling
Small RNA species include microRNA (miRNA), small interfering RNA (siRNA), and piwi-interacting RNA (piRNA), as well as other types of small RNA, such as small nucleolar RNA (snoRNA) and small nuclear RNA (snRNA). Small RNAs play a role in gene silencing and post-transcriptional regulation of gene expression, and they have been shown to be involved in biological processes including development, cell proliferation and differentiation, and apoptosis. Most initial small RNA discovery studies used pyrosequencing; later studies used other NGS platforms with higher throughput, which enabled genome-wide surveys and the discovery of an increasing number of small RNA species. Common bioinformatic tools for small RNA sequencing data are shown in Table 1.
Table 1. sRNA-seq web application comparison (Rahman et al. 2018).
Features | Oasis 2 | omiRas | mirTools 2.0 | MAGI | Chimira | sRNAtoolbox |
FASTQ compression | √ | √ | √ | |||
miRNA modifications and edits | √ | √ | √ | √ | √ | |
Novel miRNA database | √ | √ | ||||
Infection and cross-species analysis | √ | |||||
Non-model organism | √ | √ | ||||
Differential expression | √ | √ | √ | √ | √ | √ |
Multivariate differential expression | √ | √ | ||||
Classification | √ | |||||
Novel miRNA target prediction | √ | √ | √ | |||
Pathway/GO analysis | √ | √ | √ | √ | √ | |
Batch job submission (API) | √ | |||||
Genome browser | √ |
• Characterization of alternative splicing patterns
Characterizing alternative splicing patterns is important because altered splicing contributes to development, cell differentiation, and human disease. RNA-seq is a powerful tool for this purpose. Paired-end sequencing provides sequence information from both ends of each fragment, which makes it possible to detect splicing patterns without prior knowledge of transcript annotations. PacBio SMRT sequencing generates full-length transcript sequences and thereby allows splicing patterns and transcript connectivity to be examined in an unbiased, genome-scale manner.
The common tools for characterization of alternative splicing patterns include TopHat, MapSplice, SpliceMap, SplitSeek, GEM mapper, SpliceR, SplicingCompass, GIMMPS, MATS, and rMATS.
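Alternative splicing of a cassette exon is often summarized as percent spliced in (PSI), the fraction of transcripts that include the exon. The sketch below estimates PSI from junction-spanning read counts. It is a simplified illustration: the function name and counts are hypothetical, and averaging the two inclusion junctions is an assumption of this sketch rather than the exact normalization used by any particular tool listed above.

```python
def percent_spliced_in(upstream_junction, downstream_junction, skipping_junction):
    """Estimate PSI for a cassette exon from junction-spanning read counts.

    upstream_junction / downstream_junction: reads joining the cassette exon
        to its upstream and downstream neighbours (support exon inclusion).
    skipping_junction: reads joining the two neighbouring exons directly
        (support exon skipping).
    Inclusion is supported by two junctions and skipping by one, so the two
    inclusion counts are averaged before comparison.
    """
    inclusion = (upstream_junction + downstream_junction) / 2
    total = inclusion + skipping_junction
    if total == 0:
        return None  # no informative reads for this event
    return inclusion / total


# Hypothetical junction counts for one exon in two conditions.
psi_control = percent_spliced_in(30, 50, 10)
psi_treated = percent_spliced_in(8, 12, 40)
print(psi_control, psi_treated)
```

A drop in PSI between conditions, as in this made-up example, would point to increased exon skipping. Tools that test such differences typically also model read-count uncertainty and replicate variability.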
Figure 2. RNA-seq for detection of alternative splicing events (Ozsolak and Milos 2011).
• Systems biology
Creating lists of differentially expressed (DE) genes is not the final step of RNA-seq analysis. Further biological insight into an experimental system can be gained by looking at the expression changes of sets of genes. This approach, known as systems biology, is based on the understanding that the whole is greater than the sum of its parts. Pathway analysis and co-expression network analysis are two of its important components.
Table 2. Tools for pathway analysis and co-expression network analysis using RNA-seq data.
Pathway analysis |
GSEA | A knowledge-based approach for interpreting genome-wide expression profiles. |
GSVA | A non-parametric, unsupervised method for estimating variation of gene set enrichment across the samples of an expression data set. |
SeqGSEA | Provides methods for gene set enrichment analysis by integrating differential expression and splicing. |
GAGE | Generally Applicable Gene-set Enrichment, a method for gene set and pathway analysis. |
SPIA | Identifies the pathways most relevant to the condition under study. |
TAPPA | A Java-based tool for identification of phenotype-associated genetic pathways. |
DEAP | Identifies important regulatory patterns from differential expression data. |
GSAASeqSP | Identifies pathways or gene sets significantly associated with a disease or phenotype. |
Co-expression network |
GSCA | Helps researchers make discoveries by using massive amounts of publicly available gene expression data. |
DICER | Detects differentially co-expressed gene sets by using a novel probabilistic score for differential correlation. |
WGCNA | A powerful method to isolate co-expressed groups of genes from microarray or RNA-seq data. |
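The idea of looking at sets of genes rather than single genes can be illustrated with a basic over-representation test, which asks whether a pathway contains more DE genes than expected by chance. The sketch below uses a one-sided hypergeometric test. It is a simpler technique than the rank-based enrichment statistic of GSEA and is not the method of any specific tool in Table 2; the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_de, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Probability of observing at least n_overlap pathway genes when n_de genes
    are drawn without replacement from n_background genes, of which n_pathway
    belong to the pathway.
    """
    upper = min(n_pathway, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 expressed genes, 5 of them in the pathway,
# 4 DE genes, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(n_background=20, n_pathway=5, n_de=4, n_overlap=3)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the background should be the set of genes that were actually tested for differential expression, and p-values across many pathways need multiple-testing correction.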
• Single-cell RNA-seq
Single-cell RNA-seq offers opportunities to dissect the interplay between intrinsic cellular processes and extrinsic stimuli in cell fate determination. It also contributes to a better understanding of how an ‘outlier cell’ may determine the outcome of an infection. In addition, because the majority of living cells cannot be cultivated in vitro, single-cell RNA-seq may uncover novel species or regulatory processes of biotechnological or medical relevance. The workflow of single-cell RNA-seq generally involves the following steps: single-cell isolation, cDNA library construction, RNA-seq, and bioinformatics (Figure 3).
Figure 3. The general workflow of single-cell RNA-seq.
Applications of single-cell RNA-seq
If you want more information about RNA-seq, please refer to the following articles:
Bioinformatics workflow of RNA-seq
The technologies and workflow of RNA-seq
References: