Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides. Compared with mRNAs, lncRNAs are generally expressed at lower levels and show stronger tissue specificity. As a focus of RNA research, lncRNAs regulate the expression of coding genes at several levels, including the epigenetic, transcriptional, and post-transcriptional levels. Unlike traditional microarray-based detection, lncRNA analysis that combines high-throughput sequencing with bioinformatics can comprehensively profile the lncRNAs in a sample. LncRNA sequencing has been widely used in the genetic improvement of species and in studies of disease occurrence, development, diagnosis, and treatment.
The Workflow of LncRNA Sequencing
A typical lncRNA sequencing workflow includes quality assessment of total RNA, library preparation, and sequencing. From the RNA sample to the final data, every step affects data quality and quantity, so obtaining high-quality data is a prerequisite for comprehensive and credible bioinformatics analysis.
Figure 1. Overview of lncRNA sequencing
(1) Total RNA detection. This mainly covers analysis of RNA degradation and contamination, measurement of RNA purity (OD260/280 ratio), accurate quantification of RNA concentration (Qubit), and assessment of RNA integrity.
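As a rough illustration of the checks in step (1), the sketch below flags samples whose purity, concentration or integrity falls outside user-chosen limits. The thresholds, the use of a RIN-style integrity score, the field names and the sample values are all assumptions made for the example, not criteria stated in this article; real acceptance limits depend on the sample type and library kit.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str
    od260_280: float            # purity ratio from spectrophotometry
    concentration_ng_ul: float  # e.g. a Qubit reading
    integrity_score: float      # assumed RIN-style score (higher is better)


# Hypothetical acceptance limits; adjust them to your own protocol.
MIN_RATIO, MAX_RATIO = 1.8, 2.2
MIN_CONCENTRATION_NG_UL = 50.0
MIN_INTEGRITY = 7.0


def qc_failures(sample: RnaSample) -> list[str]:
    """Return the names of the failed checks (empty list if all pass)."""
    failures = []
    if not (MIN_RATIO <= sample.od260_280 <= MAX_RATIO):
        failures.append("purity")
    if sample.concentration_ng_ul < MIN_CONCENTRATION_NG_UL:
        failures.append("concentration")
    if sample.integrity_score < MIN_INTEGRITY:
        failures.append("integrity")
    return failures


# Made-up measurements for two samples.
samples = [
    RnaSample("liver_1", 2.01, 120.0, 8.9),
    RnaSample("liver_2", 1.62, 35.0, 6.1),
]
qc_report = {s.name: qc_failures(s) for s in samples}
```

A sample with an empty failure list would move on to library construction; any other sample would be re-extracted or excluded.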
(2) Library construction. Once the total RNA passes quality control, poly-A-based mRNA enrichment is performed to reduce rRNA reads, followed by fragmentation. First-strand cDNA is then synthesized from the enriched, fragmented RNA using reverse transcriptase (SuperScript II) and random primers. The cDNA is further converted into double-stranded DNA using buffer, dNTPs (dUTP, dATP, dGTP and dCTP) and polymerase. The cDNA fragments are then end-repaired and ligated to platform-specific adapters.
(3) Library detection and sequencing. After library construction, the library is first quantified and diluted, and its fragment size is then checked. Once the library passes inspection, it is sequenced on a HiSeq or MiSeq instrument.
LncRNA Sequencing Data Analysis
After raw RNA-seq reads are generated by Illumina paired-end sequencing, data analysis and lncRNA identification proceed as follows.
Figure 2. Overview of the data analysis and identification of lncRNAs
(1) Data pre-processing
(2) Overall quality assessment of RNA-seq data. This mainly includes inter-sample correlation assessment (Pearson correlation coefficient) and evaluation of how uniformly the reads are distributed.
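The inter-sample correlation in step (2) can be sketched as follows. This is a minimal example that assumes each sample is a list of expression values for the same genes in the same order; the sample names, the values and the log2(x + 1) transform are illustrative choices, not part of the workflow described here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log2p1(values):
    """log2(x + 1) transform to damp the influence of highly expressed genes."""
    return [math.log2(v + 1) for v in values]


# Made-up expression values for the same five genes in three samples.
expression = {
    "ctrl_1": [5.0, 120.0, 33.0, 0.5, 48.0],
    "ctrl_2": [6.0, 110.0, 30.0, 0.7, 51.0],
    "treat_1": [40.0, 15.0, 33.0, 9.0, 2.0],
}

names = list(expression)
corr = {
    (a, b): pearson(log2p1(expression[a]), log2p1(expression[b]))
    for a in names
    for b in names
}
```

In this toy data the two control samples correlate far more strongly with each other than either does with the treated sample, which is the pattern expected of biological replicates.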
(3) Read alignment to the reference genome. STAR and TopHat2 are commonly used aligners. If the reference genome is appropriate and the samples are free of contamination, the mapping rate (total mapped reads or fragments) is normally higher than 70%.
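The 70% guideline in step (3) is easy to check once mapped and total read counts are available. The sketch below assumes those counts have already been extracted from the aligner's summary output; the sample names and numbers are made up, and no particular aligner log format is implied.

```python
MAPPING_RATE_GUIDELINE = 0.70  # the "higher than 70%" rule of thumb from the text


def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads (or fragments) that aligned to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return mapped_reads / total_reads


# Hypothetical (mapped, total) read counts per sample.
alignment_counts = {
    "ctrl_1": (18_400_000, 20_000_000),
    "treat_1": (12_600_000, 21_000_000),
}

# Samples whose mapping rate does not exceed the guideline.
low_mapping = [
    name
    for name, (mapped, total) in alignment_counts.items()
    if mapping_rate(mapped, total) <= MAPPING_RATE_GUIDELINE
]
```

A flagged sample is a prompt to check the choice of reference genome and possible contamination, not proof that either is wrong.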
(4) Data exploration. After the alignment files are sorted, DESeq2 is used for data exploration. Its output can be used for cluster analysis and principal component analysis (PCA) of the RNA-seq samples, to explore the relationships among samples or to verify the experimental design. The smaller the clustering or PCA distance between two samples, the more similar they are.
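To make the idea of sample distance in step (4) concrete, here is a small sketch that computes Euclidean distances between log-transformed count profiles and finds each sample's nearest neighbour. It is not a reimplementation of DESeq2, which is an R package with its own normalization and transformations; the plain log2(count + 1) transform, the sample names and the counts are assumptions for illustration.

```python
import math

# Made-up read counts for the same five genes in four samples.
counts = {
    "ctrl_1": [10, 250, 31, 0, 95],
    "ctrl_2": [12, 240, 28, 1, 101],
    "treat_1": [85, 30, 33, 14, 4],
    "treat_2": [90, 26, 35, 11, 6],
}

# Simple stand-in transform so large counts do not dominate the distance.
profiles = {
    name: [math.log2(c + 1) for c in values] for name, values in counts.items()
}


def distance(a: str, b: str) -> float:
    """Euclidean distance between two samples' transformed profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b])))


# For each sample, the other sample with the most similar profile.
nearest = {
    name: min((other for other in profiles if other != name),
              key=lambda other: distance(name, other))
    for name in profiles
}
```

If the experimental design holds, replicates of the same condition should be each other's nearest neighbours, as they are in this toy data; a sample that sits closer to the other group is worth investigating.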
(5) Transcript assembly. Transcripts are assembled with Cufflinks or Scripture. Cufflinks uses a probabilistic model to simultaneously assemble and quantify the smallest set of isoforms that gives a maximum-likelihood explanation of the expression data at a given locus; with the appropriate parameters it also reports strand information accurately for strand-specific libraries. Scripture uses a statistical segmentation model to distinguish expressed loci from experimental background noise, reports all isoforms with statistically significant expression at a given locus, and is suitable for assembling long transcripts.
(6) Candidate lncRNA screening
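A first-pass filter for step (6) can be sketched from the definition given at the top of this article: a lncRNA is non-coding and longer than 200 nucleotides. The example below applies only those two conditions. The `overlaps_known_coding` flag is a hypothetical field standing in for whatever annotation comparison is actually used, the transcript IDs are made up, and real screening normally adds further criteria that are not shown here.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 200  # lncRNAs are longer than 200 nucleotides


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    length_nt: int
    overlaps_known_coding: bool  # hypothetical annotation flag


def is_candidate_lncrna(transcript: Transcript) -> bool:
    """Keep transcripts that are long enough and not known to be coding."""
    return (
        transcript.length_nt > MIN_LENGTH_NT
        and not transcript.overlaps_known_coding
    )


# Made-up assembled transcripts.
assembled = [
    Transcript("TX_0001", 1540, False),
    Transcript("TX_0002", 180, False),   # too short
    Transcript("TX_0003", 2310, True),   # matches a known coding gene
    Transcript("TX_0004", 201, False),
]

candidates = [t.transcript_id for t in assembled if is_candidate_lncrna(t)]
```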
(7) Expression analysis. This mainly includes comparison of expression levels, differential expression analysis and screening of differentially expressed lncRNAs, lncRNA expression cluster analysis, and tissue- or phenotype-specific analysis. These analyses are usually carried out with DESeq or Cuffdiff.
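The quantity at the centre of step (7) is the fold change between conditions. The sketch below computes a log2 fold change from group means and keeps lncRNAs whose expression changes at least two-fold. It is deliberately simplified and is not what DESeq or Cuffdiff do: there is no normalization, dispersion modelling or significance testing. The pseudocount, the cutoff, the lncRNA names and the values are all assumptions for illustration.

```python
import math

PSEUDOCOUNT = 1.0     # avoids division by zero for unexpressed transcripts
MIN_ABS_LOG2FC = 1.0  # hypothetical cutoff: at least a two-fold change


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(control, treated):
    """log2 of (mean treated + pseudocount) / (mean control + pseudocount)."""
    return math.log2((mean(treated) + PSEUDOCOUNT) / (mean(control) + PSEUDOCOUNT))


# Made-up normalized expression values: (control replicates, treated replicates).
expression = {
    "lnc_A": ([6.0, 8.0], [30.0, 32.0]),
    "lnc_B": ([14.0, 16.0], [15.0, 15.0]),
    "lnc_C": ([62.0, 64.0], [6.0, 8.0]),
}

log2fc = {name: log2_fold_change(ctrl, trt) for name, (ctrl, trt) in expression.items()}
changed = {name: lfc for name, lfc in log2fc.items() if abs(lfc) >= MIN_ABS_LOG2FC}
```

Positive values indicate higher expression in the treated group and negative values lower expression. In practice, a fold-change cutoff is combined with an adjusted p-value from a dedicated tool.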
(8) Advanced analysis
If you are interested in our lncRNA sequencing service, please feel free to contact our scientists. We also provide a range of transcriptomics sequencing services, including RNA-seq, small RNA sequencing, circRNA sequencing, degradome sequencing, and bacterial RNA sequencing.
References: