Workflow of LncRNA Sequencing and Its Data Analysis

LncRNA is non-coding RNA with a length of more than 200 nucleotides. Compared with mRNA, lncRNA is generally expressed at lower levels and shows stronger tissue specificity. As a hot spot of RNA research, lncRNA regulates the expression of coding genes at multiple levels, including the epigenetic, transcriptional, and post-transcriptional levels. Unlike traditional microarray-based detection, lncRNA analysis that combines high-throughput sequencing with bioinformatics can comprehensively mine the lncRNA information in a sample. LncRNA sequencing has been widely used in the genetic improvement of species and in studies of disease occurrence, development, diagnosis, and treatment.

The Workflow of LncRNA Sequencing

A typical lncRNA sequencing workflow includes quality assessment of total RNA, library preparation, and sequencing. From the RNA sample to the final data, every step affects data quality and quantity, so high-quality data is a prerequisite for comprehensive and credible bioinformatics analysis.

Figure 1. Overview of lncRNA sequencing

(1) Total RNA detection. This step mainly covers assessment of RNA degradation and contamination, RNA purity (OD260/280 ratio), accurate quantification of RNA concentration (Qubit), and RNA integrity.
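
The checks in this step can be collected into a simple pass/fail gate before library preparation. The following is a minimal sketch in Python, not a lab standard: the `RnaSample` fields, the use of an RNA integrity number (RIN), and every threshold (an OD260/280 window of 1.8 to 2.2, a minimum concentration of 50 ng/µL, a minimum RIN of 7) are illustrative assumptions that should be replaced with the acceptance criteria of the actual protocol.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str
    od_260_280: float           # purity ratio
    concentration_ng_ul: float  # e.g. measured with Qubit
    rin: float                  # assumed integrity score, 1 (degraded) to 10 (intact)


def qc_failures(sample, min_ratio=1.8, max_ratio=2.2, min_conc=50.0, min_rin=7.0):
    """Return the reasons a sample fails QC; an empty list means it passes.

    All thresholds are hypothetical defaults, not values from the source text.
    """
    reasons = []
    if not (min_ratio <= sample.od_260_280 <= max_ratio):
        reasons.append("OD260/280 out of range")
    if sample.concentration_ng_ul < min_conc:
        reasons.append("concentration too low")
    if sample.rin < min_rin:
        reasons.append("RIN too low")
    return reasons
```

Returning the list of reasons rather than a bare boolean makes it easier to report why a sample was rejected.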

(2) Library construction. Once the total RNA passes quality control, poly-A based mRNA enrichment is performed to reduce rRNA reads, followed by RNA fragmentation. cDNA is then synthesized from the enriched, fragmented RNA using reverse transcriptase (SuperScript II) and random primers. The cDNA is further converted into double-stranded DNA using buffer, dNTPs (dUTP, dATP, dGTP and dCTP) and polymerase. The cDNA fragments are then end-repaired and ligated to platform-specific adapters.

(3) Library detection and sequencing. After library construction, the library is initially quantified and diluted, and its fragment size is then checked. Once the library passes inspection, it is sequenced on a HiSeq or MiSeq instrument.

LncRNA Sequencing Data Analysis

After raw RNA-seq reads are generated by Illumina paired-end sequencing, data analysis and lncRNA identification proceed as follows.

Figure 2. Overview of the data analysis and identification of lncRNAs

(1) Data pre-processing

  • Quality assessment. After the raw data (FASTQ files) are obtained, the quality of the original reads, including the distribution of sequencing error rates and of GC content, is evaluated using FastQC v0.11.3.
  • Data filtering. The raw sequencing data contain low-quality reads and adapter sequences. To ensure the quality of the analysis, raw reads must be filtered to obtain clean reads, and all subsequent analysis is based on clean reads. Data filtering mainly includes the removal of adapter sequences from the reads, the removal of reads with a high proportion of N (N denotes a base that could not be determined), and the removal of low-quality reads. This process is carried out using Cutadapt and Trimmomatic.
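
To make the filtering criteria concrete, here is a minimal Python sketch of two of the rules described above: dropping reads with a high proportion of N and dropping low-quality reads. It is not how Cutadapt or Trimmomatic are implemented, and it does not remove adapters. The function names, the Phred+33 encoding, and the thresholds (at most 10% N, at most 50% of bases below Q20) are assumptions chosen for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_line]


def keep_read(seq, qual, max_n_fraction=0.1, min_quality=20, max_low_fraction=0.5):
    """Return True if the read passes the N-content and quality filters.

    Thresholds are hypothetical defaults, not values from the source text.
    """
    if not seq or len(seq) != len(qual):
        return False  # empty or malformed record
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False  # too many undetermined bases
    low = sum(1 for q in phred_scores(qual) if q < min_quality)
    return low / len(seq) <= max_low_fraction


def filter_records(records, **thresholds):
    """Keep only passing (header, sequence, quality) records."""
    return [r for r in records if keep_read(r[1], r[2], **thresholds)]
```

For paired-end data, a pair would normally be discarded if either mate fails, which this single-read sketch does not handle.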

(2) Overall quality assessment of RNA-seq. This step mainly includes assessment of inter-sample correlation (Pearson correlation coefficient) and evaluation of the uniformity of read distribution.
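
The inter-sample correlation mentioned here can be computed directly from per-gene expression values. The sketch below implements the Pearson correlation coefficient in plain Python; the function names and the dictionary layout (sample name mapped to a list of expression values in a shared gene order) are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples):
    """Return {(sample_a, sample_b): r} for every ordered pair of samples."""
    names = sorted(samples)
    return {(a, b): pearson(samples[a], samples[b]) for a in names for b in names}
```

Biological replicates are expected to correlate more strongly with each other than with samples from other conditions, so an unexpectedly low coefficient between replicates is worth investigating.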

(3) Read alignment to the reference genome. STAR and TopHat2 are often used for read alignment. If the reference genome is properly selected and the experiment is free of contamination, the mapping rate (the percentage of total reads or fragments that map) is normally higher than 70%.
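
The 70% rule of thumb is easy to apply once mapped and total read counts are available for each sample. The sketch below assumes those counts have already been extracted from the aligner's summary; the function names and the input layout are hypothetical, and no aligner output format is parsed here.

```python
def mapping_rate(mapped_reads, total_reads):
    """Percentage of reads (or fragments) that mapped to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def flag_low_mapping(stats, threshold=70.0):
    """Return sorted sample names whose mapping rate is not above the threshold.

    `stats` maps a sample name to a (mapped_reads, total_reads) tuple.
    """
    return sorted(
        name
        for name, (mapped, total) in stats.items()
        if mapping_rate(mapped, total) <= threshold
    )
```

A flagged sample does not by itself show what went wrong; as the text notes, a poorly chosen reference genome and sample contamination are both candidate causes.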

(4) Data exploration. After the alignment files are sorted, DESeq2 is used for data exploration. The output can be used for cluster analysis and principal component analysis (PCA) of the RNA-seq samples, to explore the relationships among samples or to verify the experimental design. The smaller the clustering distance or PCA distance between two samples, the more similar they are.
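
The idea that a smaller distance means more similar samples can be illustrated without DESeq2. The sketch below is a deliberate simplification: it applies a log2(count + 1) transform instead of the variance-stabilizing transformations DESeq2 offers, and it compares samples by plain Euclidean distance. The function names and the input layout (sample name mapped to a list of counts in a shared gene order) are assumptions.

```python
import math


def log2_transform(counts):
    """Compress the dynamic range of raw counts with log2(count + 1)."""
    return [math.log2(c + 1) for c in counts]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_sample(target, samples):
    """Return the name of the sample with the smallest distance to `target`."""
    target_vec = log2_transform(samples[target])
    distances = {
        name: euclidean(target_vec, log2_transform(counts))
        for name, counts in samples.items()
        if name != target
    }
    return min(distances, key=distances.get)
```

If the closest neighbor of a control replicate turns out to be a treated sample, that may indicate a sample swap or a strong batch effect.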

(5) Transcript assembly. The transcripts are assembled with Cufflinks or Scripture. Cufflinks uses a probabilistic model to assemble and quantify, at the same time, the smallest possible set of isoforms that gives a maximum-likelihood explanation of the expression data at a locus; with the appropriate parameters for strand-specific libraries, it also reports strand information accurately. Scripture uses a statistical segmentation model to distinguish expressed loci from experimental background noise, reports all isoforms with statistically significant expression at a locus, and is well suited to the assembly of long transcripts.

(6) Candidate lncRNA screening

  • Basic screening. The basic screening consists of three steps. First, transcripts longer than 200 bp with more than 2 exons are selected. Next, the coverage of each transcript is calculated by Cufflinks, and transcripts with a minimum read coverage greater than 3 are kept. Finally, known non-lncRNA transcripts are filtered out by comparison with existing annotations, and the Cuffmerge results are used for positional screening (a different class code is selected for each kind of lncRNA).
  • Coding potential evaluation. Coding potential is the key criterion for deciding whether a transcript is a lncRNA. At present, the mainstream methods of coding potential analysis include Coding Potential Calculator (CPC) analysis, Coding-Non-Coding Index (CNCI) analysis, Pfam protein domain analysis, and PhyloCSF analysis.
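
The screening rules above amount to a chain of filters over the assembled transcripts. The following Python sketch mirrors the thresholds stated in the text (length over 200 bp, more than 2 exons, coverage over 3) and then keeps only transcripts that no coding-potential tool calls coding. The `Transcript` record, the set of accepted class codes ("u" for intergenic, "i" for intronic, "x" for antisense), and the `coding_calls` dictionary are illustrative assumptions; the sketch does not run CPC, CNCI, Pfam, or PhyloCSF and does not parse Cufflinks output.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    transcript_id: str
    length: int        # transcript length in bp
    exon_count: int
    coverage: float    # read coverage reported by the assembler
    class_code: str    # positional class code from the merged annotation
    # Hypothetical summary of tool results: tool name -> True if called coding.
    coding_calls: dict = field(default_factory=dict)


def passes_basic_screening(t, min_length=200, min_exons=2, min_coverage=3.0,
                           class_codes=("u", "i", "x")):
    """Apply the length, exon, coverage, and class-code filters (strict '>')."""
    return (
        t.length > min_length
        and t.exon_count > min_exons
        and t.coverage > min_coverage
        and t.class_code in class_codes
    )


def screen_candidates(transcripts, **thresholds):
    """Return IDs of transcripts that pass screening and are never called coding."""
    return [
        t.transcript_id
        for t in transcripts
        if passes_basic_screening(t, **thresholds)
        and not any(t.coding_calls.values())
    ]
```

Requiring every tool to agree that a transcript is non-coding is the strict choice; a pipeline could instead accept a majority vote, at the cost of more false positives.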

(7) Expression analysis. This step mainly includes expression level comparison, differential expression analysis, screening of differentially expressed lncRNAs, lncRNA expression cluster analysis, and tissue-specific or phenotype-specific analysis. These analyses are usually carried out using DESeq or Cuffdiff.
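
As a rough illustration of what comparing expression levels means numerically, the sketch below normalizes raw counts to counts per million (CPM) and computes a log2 fold change between two samples. It is a stand-in for intuition only: DESeq and Cuffdiff fit statistical models across replicates and report significance, which this sketch does not attempt. The function names, the pseudocount of 1, and the cutoff of an absolute log2 fold change of 1 are assumptions.

```python
import math


def counts_per_million(counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) on CPM values for genes present in both samples."""
    c = counts_per_million(control)
    t = counts_per_million(treated)
    return {
        gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
        for gene in c
        if gene in t
    }


def select_changed(fold_changes, min_abs_log2fc=1.0):
    """Return sorted names of genes whose |log2 fold change| meets the cutoff."""
    return sorted(g for g, v in fold_changes.items() if abs(v) >= min_abs_log2fc)
```

The pseudocount avoids division by zero for genes with no reads in one sample, which matters for lncRNAs because they are often expressed at low levels.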

(8) Advanced analysis

  • LncRNA target gene prediction. The function of a lncRNA is related to its adjacent protein-coding genes. The protein-coding genes adjacent to a lncRNA are identified for functional enrichment analysis, and the main function of the lncRNA can then be predicted with lncRNATargets.
  • Functional enrichment analysis of specific lncRNAs. Specific lncRNAs are generally those with differential expression or with tissue-specific or phenotype-specific expression. GO and KEGG functional enrichment analyses can each be performed on these specific lncRNAs.
  • Interaction analysis. LncRNAs and mRNAs can be linked through targeting relationships, and mRNAs can be linked to one another through their proteins, thus forming a lncRNA-mRNA-protein interaction network. This network can be visualized with Cytoscape.
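
The adjacency-based target prediction described above can be sketched as a lookup of protein-coding genes within a fixed window around each lncRNA. This is a minimal illustration, not the lncRNATargets method: the `Feature` record, the function names, and the 100 kb window are assumptions, and strand is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int
    end: int


def gap(a, b):
    """Distance in bp between two features on one chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def neighbouring_genes(lncrna, genes, window=100_000):
    """Return sorted names of genes on the same chromosome within `window` bp."""
    return sorted(
        g.name
        for g in genes
        if g.chrom == lncrna.chrom and gap(lncrna, g) <= window
    )
```

The resulting gene lists are what would be passed on to GO or KEGG enrichment analysis, and each lncRNA-gene pair can serve as an edge when the network is drawn in Cytoscape.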

If you are interested in our lncRNA sequencing service, please feel free to contact our scientists. We also provide a package of transcriptomics sequencing services, including RNA-seq, small RNA sequencing, circRNA sequencing, degradome sequencing, and bacterial RNA sequencing.
