Bioinformatics Tips | Direct RNA Sequencing – Signal File Handling and Visualization

At a glance:

Nanopore sequencers generate raw electrical signals that encode RNA sequence information. In direct RNA sequencing workflows, these signals are written as high-volume binary files. Most modern instruments and pipelines use the POD5 format for storage and transfer. POD5 supports streaming writes and efficient random access during downstream analysis. Correct handling of these files is essential for reliable results and stable pipelines. This article explains practical steps and provides command examples with output snippets. You will learn how to inspect, merge, filter, subset, repack, and convert signal files. We also cover performance tuning, quality control, troubleshooting, and workflow integration. Each section is written for scientists and engineers who manage sequencing projects.

Figure: Direct RNA sequencing signal file handling workflow with POD5 processing steps.

Why POD5 Matters for Direct RNA Sequencing

POD5 has replaced the legacy FAST5 format in most production environments. The format couples compact storage with reliable metadata indexing. It enables streaming from the acquisition software to persistent storage, which reduces temporary bottlenecks and minimizes partial writes. POD5 is built on Apache Arrow, a columnar memory model that accelerates analytics. Fast reads make integrity checks and targeted extraction practical at scale. Large projects benefit because parallel workers can read distinct chunks safely. Service providers value predictable throughput and simpler file lifecycle management.

Installation and Environment Preparation

Install the POD5 toolkit using Python packaging. Use virtual environments for isolation.

pip install pod5

Confirm that the command line interface is available and versioned.

pod5 --version

Record the toolkit version in your run logs and analysis notebooks.
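
One way to record the version automatically is a small Python helper that runs the version command and appends the result to a log file. This is a minimal sketch: the function name record_tool_version and the log file name tool_versions.jsonl are illustrative assumptions and are not part of the POD5 toolkit.

```python
import json
import subprocess
from datetime import datetime, timezone


def record_tool_version(command, log_path="tool_versions.jsonl"):
    """Run a version command and append its output to a JSON-lines log.

    `command` is a list such as ["pod5", "--version"]. The default log
    file name is a hypothetical choice for this sketch.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, check=False
        )
        # Some tools print their version to stderr, so fall back to it.
        version = (result.stdout or result.stderr).strip()
    except FileNotFoundError:
        # The tool is not on PATH; log that fact instead of crashing.
        version = "NOT INSTALLED"
    entry = {
        "command": " ".join(command),
        "version": version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Calling record_tool_version(["pod5", "--version"]) at the start of each run leaves one timestamped line per invocation, which is easy to attach to run logs or paste into a notebook.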

Integrity and Summary Inspection

Start with quick summaries that surface obvious problems before basecalling.

Use pod5 view to build a compact table containing essential fields only.

pod5 view input.pod5 --include "read_id,channel,num_samples,end_reason" --output summary.tsv --separator "\t"

Typical output shows read identifiers, channels, sample counts, and end reasons.

read_id channel num_samples end_reason

00000000-0000-0000-0000-000000000001 23 45000 COMPLETE

00000000-0000-0000-0000-000000000002 24 45210 UNBLOCK
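
A table like this is easy to screen programmatically before basecalling. The following sketch uses only the Python standard library and assumes a tab-separated file with the four columns requested above; the function name summarize_view_table is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_view_table(handle, separator="\t"):
    """Tally end reasons and sample counts from a `pod5 view` table."""
    reader = csv.DictReader(handle, delimiter=separator)
    end_reasons = Counter()
    sample_counts = []
    for row in reader:
        end_reasons[row["end_reason"]] += 1
        sample_counts.append(int(row["num_samples"]))
    return {
        "reads": len(sample_counts),
        "total_samples": sum(sample_counts),
        "shortest_read": min(sample_counts) if sample_counts else 0,
        "end_reasons": dict(end_reasons),
    }


if __name__ == "__main__":
    # Inline copy of the two example rows shown above.
    example = io.StringIO(
        "read_id\tchannel\tnum_samples\tend_reason\n"
        "00000000-0000-0000-0000-000000000001\t23\t45000\tCOMPLETE\n"
        "00000000-0000-0000-0000-000000000002\t24\t45210\tUNBLOCK\n"
    )
    print(summarize_view_table(example))
```

For a real run, pass an open file handle for summary.tsv instead of the inline example. An unusually high share of one end reason, or a very small shortest_read, is a cue to inspect the run more closely.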

Inspect global integrity metrics and logs with the summary mode.

pod5 inspect summary input.pod5

Drill into individual reads when specific anomalies require deeper review.

pod5 inspect read input.pod5 00000000-0000-0000-0000-000000000001

Capture screenshots to document anomalies and share them with collaborators.

Consistent output archives make regression analysis fast during method updates.

File Manipulation: Merge, Filter, Subset, and Repack

Merging simplifies downstream scheduling when many POD5 fragments exist.

pod5 merge *.pod5 -o merged.pod5 --duplicate-ok

Filtering extracts reads of interest using a deterministic list of identifiers.

pod5 filter input.pod5 --output filtered.pod5 --ids read_ids.txt
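
The identifier list can be derived from a summary table rather than assembled by hand. The sketch below assumes a tab-separated summary with read_id, num_samples, and end_reason columns, and writes one read identifier per line. The function name write_read_ids and the selection criteria are illustrative assumptions.

```python
import csv


def write_read_ids(summary_path, ids_path, min_samples=0, keep_end_reasons=None):
    """Write the read_ids that pass simple criteria, one per line.

    Returns the number of identifiers written.
    """
    kept = 0
    with open(summary_path, newline="", encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src, delimiter="\t"):
            # Skip reads with fewer samples than the requested minimum.
            if int(row["num_samples"]) < min_samples:
                continue
            # Optionally keep only reads with selected end reasons.
            if keep_end_reasons is not None and row["end_reason"] not in keep_end_reasons:
                continue
            dst.write(row["read_id"] + "\n")
            kept += 1
    return kept
```

For example, write_read_ids("summary.tsv", "read_ids.txt", min_samples=1000) would produce a list that excludes very short reads. Because the rule is applied in input order, the same summary always yields the same list.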

Subsetting creates groups by barcode or quality status for organized processing.

pod5 subset pod5/ -s sequencing_summary.txt --columns pod5 barcode --template "pod5_{pod5}/{barcode}/{pod5}.{barcode}.pod5"

Repacking improves I/O patterns and reduces fragmentation in heavy pipelines.

pod5 repack pod5s/*.pod5 repacked_pods/

Format Conversion Between POD5 and FAST5

Convert between formats when legacy tools require FAST5 inputs or outputs.

pod5 convert fast5 ./fast5/ --output pod5/ --one-to-one ./fast5/

Produce FAST5 from POD5 when specific utilities remain unported to POD5.

pod5 convert to_fast5 input.pod5 --output fast5/

Performance Tuning and Capacity Planning

Signal pipelines stress storage and compute concurrently. Plan resources carefully.
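
A back-of-the-envelope storage estimate is a useful starting point for planning. The sketch below assumes each signal sample occupies 2 bytes before compression (16-bit values) and takes the compression ratio as a parameter that you should measure on your own data. The function name and the figures in the usage note are hypothetical.

```python
def estimate_signal_storage_gb(
    num_reads,
    mean_samples_per_read,
    compression_ratio=1.0,
    bytes_per_sample=2,
):
    """Estimate signal storage in gigabytes (1 GB = 1e9 bytes).

    compression_ratio is uncompressed size divided by stored size;
    the default of 1.0 means no compression is assumed.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bytes = num_reads * mean_samples_per_read * bytes_per_sample
    return raw_bytes / compression_ratio / 1e9


if __name__ == "__main__":
    # Hypothetical run: about 1.25 million reads, 48,000 samples each.
    uncompressed = estimate_signal_storage_gb(1_245_678, 48_000)
    print(f"Uncompressed signal: {uncompressed:.1f} GB")
```

With these illustrative inputs the uncompressed signal comes to roughly 120 GB. Metadata and index overhead are ignored, so treat the result as an approximation rather than an exact file size.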

Create dashboards that track throughput, error rates, and queue depth over time.

Share weekly performance reports with stakeholders to align on capacity upgrades.

Quality Control Checklist for Signal Files

End to End Example Pipeline with Outputs

The following sequence demonstrates a compact intake routine for one run.

# Summaries

pod5 view run1.pod5 --include "read_id,channel,num_samples" > run1_summary.tsv

# Integrity logs

pod5 inspect summary run1.pod5 > run1_integrity.log

# Merge shards

pod5 merge run1_barcode01.pod5 run1_barcode02.pod5 -o run1_merged.pod5

# Repack for performance

pod5 repack run1_merged.pod5 repacked_run1/

# Convert for legacy tools

pod5 convert to_fast5 repacked_run1/run1_merged.pod5 --output fast5_out/

Representative output snippets are included for documentation and training.

POD5 file version: 0.3.28

Reads: 1245678

Channels: 512

Integrity: OK

read_id channel num_samples

00000000-0000-0000-0000-000000000001 23 47892

00000000-0000-0000-0000-000000000002 24 48010
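
A simple safeguard for an intake routine like this is to confirm that the merged file contains exactly the reads present in the input shards. The sketch below compares read_id columns from summary tables produced with pod5 view before and after the merge. The function names and file names are illustrative assumptions.

```python
import csv


def read_ids_from_summary(path, separator="\t"):
    """Return the set of read_ids listed in one summary table."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["read_id"] for row in csv.DictReader(handle, delimiter=separator)}


def compare_read_sets(before_paths, after_path):
    """Report reads lost or unexpectedly gained between two stages."""
    before = set()
    for path in before_paths:
        before |= read_ids_from_summary(path)
    after = read_ids_from_summary(after_path)
    return {
        "missing": sorted(before - after),      # in the shards, not in the merge
        "unexpected": sorted(after - before),   # in the merge, not in any shard
    }
```

An empty result for both keys, for example from compare_read_sets(["barcode01.tsv", "barcode02.tsv"], "merged.tsv"), indicates that the merge preserved the read set. Any entry in either list is worth recording in the integrity log.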

Troubleshooting Common Issues

Maintain a runbook that documents symptoms, root causes, and durable fixes.

Share lessons across teams to reduce repeated investigation time during sprints.

Signal Visualization with Squigualiser

Once POD5 files have been processed and quality checked, the next step in Direct RNA Sequencing workflows is visualizing the raw electrical signal. Visualization bridges machine output and human interpretation, helping to validate basecalling, detect motif-associated signal patterns, and explore RNA modifications. Squigualiser is one of the most widely used tools for this purpose.

Installation Options

Option 1. Precompiled binary release:

wget https://github.com/hiruna72/squigualiser/releases/download/squigualiser-v0.6.1/squigualiser-v0.6.1-linux-x86-64-binaries.tar.gz -O squigualiser.tar.gz

tar xf squigualiser.tar.gz

cd squigualiser

./squigualiser --help

Option 2. Python installation via pip:

pip install squigualiser

Test the installation with sample data:

wget https://hiruna72.github.io/squigualiser/docs/sample_dataset.tar.gz

tar xf sample_dataset.tar.gz

squigualiser plot_pileup -f ref.fasta -s reads.blow5 -a eventalign.bam -o dir_out --region chr1:92,778,040-92,782,120 --tag_name "test_0"

Typical Workflow

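Step 1. Basecall with Dorado using --emit-moves. The model named below is a DNA model; for direct RNA data, substitute the RNA model that matches your chemistry: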

dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 input.pod5 --emit-moves > basecalls.bam

Step 2. Reform BAM for plotting:

squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam basecalls.bam -o reform_output.paf

Step 3. Extract sequences for alignment:

samtools fasta basecalls.bam > pass.fasta

Step 4. Align sequences to the reference genome:

minimap2 -t 16 -ax map-ont ref.fa pass.fasta | samtools sort -o mapped.bam -

Step 5. Convert POD5 to SLOW5/BLOW5 format:

blue-crab p2s input.pod5 -o input.blow5

Step 6. Plot signal–read graphs:

squigualiser plot --file pass.fasta --slow5 input.blow5 --alignment mapped.bam
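
The first two steps rely on the move table that the basecaller emits, so it helps to see how that table maps bases to signal. The sketch below assumes the common convention in which the mv tag holds a stride followed by one 0/1 flag per signal block, where 1 marks the block in which a new base starts, and in which a separate count gives the samples trimmed from the start of the read. The function name and example values are hypothetical.

```python
def move_table_to_base_starts(mv_tag, trimmed_samples=0):
    """Convert a move table into the signal index where each base starts.

    mv_tag: sequence whose first element is the stride (samples per
            block), followed by 0/1 move flags, one per block.
    trimmed_samples: samples removed from the start of the signal
            before basecalling (assumed to be provided separately).
    """
    if not mv_tag:
        raise ValueError("move table is empty")
    stride, moves = mv_tag[0], mv_tag[1:]
    return [
        trimmed_samples + block_index * stride
        for block_index, move in enumerate(moves)
        if move == 1
    ]


if __name__ == "__main__":
    # Stride of 5 samples; bases start in blocks 0, 3, and 4.
    print(move_table_to_base_starts([5, 1, 0, 0, 1, 1, 0], trimmed_samples=10))
    # prints [10, 25, 30]
```

Each returned index marks where one basecalled base begins in the raw signal. For direct RNA reads the molecule passes through the pore 3' to 5', so the order of bases along the signal is generally the reverse of the reported sequence; check how your tools handle this before interpreting positions.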

Output Interpretation

The generated plots display:

- X-axis: nucleotide positions, color-coded by base.

- Y-axis: current intensity values.

- Multiple aligned reads stacked together to reveal consistent patterns or deviations.
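
The current values on the Y-axis are derived from the raw integer samples stored in the signal file. As a rough illustration, the sketch below converts raw values to picoamperes with the per-read calibration relation pA = (raw + offset) * scale and then applies a z-score so that reads can be compared on a common scale. The function names and numbers are hypothetical, and plotting tools may apply a different normalization.

```python
from statistics import mean, pstdev


def to_picoamps(raw_signal, offset, scale):
    """Apply per-read calibration: pA = (raw + offset) * scale."""
    return [(value + offset) * scale for value in raw_signal]


def zscore(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat signal has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


if __name__ == "__main__":
    raw = [100, 200, 300]  # hypothetical raw samples
    current = to_picoamps(raw, offset=10, scale=0.5)
    print(current)
    print(zscore(current))
```

Normalizing each read separately removes per-channel differences in baseline and gain, which is one reason stacked reads line up well enough to reveal consistent patterns.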

This visualization is valuable for validating new basecalling models, identifying motif-linked artifacts, or training new researchers.

From Signal Files to Visualization

By combining POD5 file management with Squigualiser visualization, researchers ensure both technical integrity and intuitive confirmation of their sequencing data. Clean, repacked files reduce computational noise, while signal-level plots highlight whether basecalling and modification signatures are reliable. This workflow forms the foundation for downstream RNA modification detection and differential methylation analysis.

Frequently Asked Questions

Glossary

For Research Use Only. Not for use in diagnostic procedures.