We are dedicated to providing outstanding customer service and being reachable at all times.
At a glance:
Direct RNA Sequencing (DRS) reads native RNA molecules without reverse transcription or PCR. That single design choice preserves full-length isoforms, poly(A) tails, and chemical modifications. As a result, DRS provides a faithful view of the epitranscriptome and enables rigorous analysis of RNA methylation across conditions. This integrated guide merges two essential workflows—modification detection during basecalling and downstream differential methylation scoring—into one practical, researcher-friendly playbook. You will learn how to select Dorado models, interpret MM/ML tags, build bedMethyl tables with modkit, compress and index large outputs for speed, and apply differential methylation tests with appropriate thresholds. Along the way, we include figure placeholders, example commands, QC checkpoints, and reporting tips that map directly to real project needs in plant, animal, and biomedical research.
DRS captures signals produced by RNA molecules as they pass through a nanopore. Because these are native molecules, chemical marks such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), pseudouridine (Ψ), and inosine (A-to-I) remain intact. Dorado, Oxford Nanopore's modern basecaller, decodes both base identity and likelihood of modifications from these current traces. When combined with robust alignment and summarization tools, DRS lets teams quantify modification prevalence at single positions, within regions, or across entire transcripts. This opens the door to linking epitranscriptomic features to expression, splicing, translation, and phenotype.
Figure 1: Overview of the DRS modification workflow from signals to DML/DMR outputs.
You will typically start with raw signal files in POD5 format produced by the sequencer. Basecalling will write a modBAM that encodes both sequence and modification annotations. For downstream aggregation, you will need a reference genome or a reference transcriptome, plus sufficient metadata to organize samples into groups for comparison. We recommend a consistent folder structure and a simple manifest file containing sample IDs, conditions, and paths to key files. Hardware considerations matter as well: fast local SSD improves throughput for temporary files, while network storage can hold merged and indexed artifacts.
Dorado supports several modified-base models for RNA. The common choices include motif-aware m6A models for DRACH contexts, broader inosine+m6A models for joint detection, and separate models for m5C and Ψ. Model selection should match your biology. If your hypothesis centers on DRACH-enriched m6A sites, the motif-specific model may yield higher precision. If you expect inosine edits, a combined inosine+m6A model is appropriate. Always pin the exact Dorado release and record model hashes in your run notes for reproducibility.
Example Dorado command (GPU-enabled, super-accurate model, multiple RNA mods):
dorado basecaller -r -x "cuda:all"
sup@latest,inosine_m6A@latest,pseU@latest,m5C@latest ./pod5 > basecalled.bam
The output BAM encodes modification evidence using two SAM tags: MM and ML. These tags are standardized so that downstream tools can parse them consistently without custom schemas.
• MM tag lists the primary base and modification code plus positional offsets within the read. For example, 'A+a' denotes m6A associated with adenines, and 'C+m' denotes m5C associated with cytosines. An entry such as 'MM:Z:A+a,3,0,0,1' means that the fourth, fifth, sixth, and eighth A positions in the read carry m6A signals. The numbers are delta-encoded offsets measured over occurrences of the canonical base.
• ML tag stores confidence scores for each listed site as integers between 0 and 255. These map evenly to probability bins over the 0.0–1.0 range and represent per-position support for a modification. Many teams set a minimum ML threshold (for example 128 or 150) during exploratory analysis to avoid low-confidence sites, then relax or tighten thresholds based on replicate evidence.
If you convert BAM to FASTQ for realignment, preserve MM/ML tags during the round-trip. Samtools and minimap2 can carry these tags forward so the resulting aligned BAM still contains modification annotations. This step is important when you need to align against different references or when you prefer transcriptome alignment for isoform-level summaries.
Example conversion and alignment while preserving tags:
samtools fastq -T MM,ML basecalled.bam |
minimap2 -x map-ont -a -t 16 -y --secondary=no reference.fa - > aligned.bam
With an aligned modBAM in hand, use modkit to build bedMethyl tables that summarize per-position counts of modified and unmodified bases. These tables include coordinates, strand, coverage, and modification rates, and they power QC plots, exploratory data analysis, and downstream statistics. The '--combine-strands' option is common for RNA analyses where strand interpretation is straightforward; omit it when strand specificity is essential to your question.
Constructing a bedMethyl table:
modkit pileup aligned.bam output/pileup.bed --ref reference.fa --combine-strands
BedMethyl rows are compact but information-rich. A typical record includes chromosome or transcript ID, 0-based start and end, modification code, coverage fields, fractions of modified calls per group, and auxiliary tags. Keep a data dictionary in your repository to avoid confusion during collaboration. When working with custom modification codes, consider modkit's '--assign-code' option to map codes to their primary base explicitly.
For large projects, compress and index bedMethyl outputs so query operations are fast and reliable. Tabix indexing enables random access to genomic regions and reduces I/O hot spots in multi-user settings.
Compression and indexing example:
bgzip pileup.bed
tabix -p bed pileup.bed.gz
Differential methylation analysis answers the core biological question: where do modification levels differ between conditions? Modkit provides two subcommands tailored to common study designs. The 'pair' mode evaluates two conditions and supports both locus-level (DML) and region-level (DMR) scoring. The 'multi' mode compares multiple conditions but focuses on region-level analysis supplied via a BED file. Loci-level tests are powerful when coverage is adequate and the biology is localized; region-level tests are more robust to sparsity and spread signal across windows, CpG islands, or 3′ UTR segments.
A typical two-condition run producing both DML and DMR outputs:
modkit dmr pair \
-a control_pileup.gz \
-b treatment_pileup.gz \
-o results/dml.tsv \
--segments results/dmr.bed \
--ref reference.fa \
--base C \
--threads 16 \
--log-filepath dmr.log
In the output, a positive difference score indicates higher methylation in the 'a' condition, while a negative score indicates the opposite. DML outputs list individual coordinates with group-specific counts and ratios, posterior metrics, and effect sizes. DMR outputs aggregate per region and include the number of contributing sites, group-level methylated read counts, and summarized proportions.
Coverage strongly influences stability. Set a minimum valid coverage suited to your read depth to reduce noise without discarding informative sites. Evaluate both effect size and confidence metrics; small p-values at very low effect sizes rarely drive biology. Balanced replicate designs enable MAP-based p-values and paired effect sizes, providing stronger evidence than single-sample contrasts. When sample sizes differ across groups, modkit still performs testing but reports the fraction of samples contributing at each position, helping you judge local sparsity.
Before publishing or moving to follow-up experiments, visually inspect top sites and regions. Squigualiser renders single-base resolution signal tracks aligned to reads or reference. Use it to verify that motifs align with expected current shifts and that pileups are consistent across reads.
Minimal plotting workflow after basecalling:
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v5.0.0 input.pod5 --emit-moves > basecalls.bam
squigualiser reform --sig_move_offset 0 --kmer_length 1 -c --bam basecalls.bam -o reform_output.paf
samtools fasta basecalls.bam > pass.fasta
minimap2 -t 16 -ax map-ont ref.fa pass.fasta > mapped.bam
blue-crab p2s input.pod5 -o input.blow5
squigualiser plot --file pass.fasta --slow5 input.blow5 --alignment mapped.bam
The plots show bases on the x-axis and current intensity on the y-axis. Multiple reads appear as a signal pileup, which helps distinguish consistent modification signatures from random fluctuations.
• Consistency across cohorts: Fix Dorado versions and modified-base models for all samples in a comparison. Mixing versions can introduce artifactual differences.
• Coverage heterogeneity: Long 3′ UTRs and lowly expressed isoforms often produce sparse coverage. Use region-level tests or aggregate across isoforms when appropriate.
• Indexing and I/O: Always bgzip and tabix large bedMethyl outputs before interactive work or cluster jobs. This prevents slow seeks and pipeline stalls.
• Tag integrity: If you realign, confirm that MM/ML tags are preserved; otherwise, downstream tools will see no modifications.
• Orthogonal checks: Validate a subset of high-value events with independent assays or replicate consistency.
When sharing results, include exact Dorado version and model identifiers, minimap2 parameters, and modkit options. Report coverage thresholds, criteria for site inclusion, and strategies for handling multi-mapped reads. Summaries should present both DML and DMR views where relevant, with effect sizes, credible intervals or posterior metrics, and gene or isoform context. Provide figure snapshots from Squigualiser for exemplars, and distribute tabix-indexed tables to enable reuse. For long-term reproducibility, archive the modBAM, bedMethyl, and logs with checksums and a README that documents the folder layout.
Plant stress response studies often focus on m6A shifts near stress-regulated isoforms or APA sites. Direct RNA data allow simultaneous inspection of isoform usage and modification changes. In oncology, dissecting m5C or Ψ differences across tumor and matched normal samples can reveal translational control mechanisms. Platform groups validating new chemistries can use a small gene panel to compare modification calls across Dorado releases, quantifying stability with region-level scoring and signal plots.
Basecalling with multiple RNA modification models (GPU):
dorado basecaller -r -x "cuda:all" sup@latest,inosine_m6A@latest,pseU@latest,m5C@latest ./pod5 > basecalled.bam
Preserving MM/ML tags during FASTQ conversion and alignment:
samtools fastq -T MM,ML basecalled.bam |
minimap2 -x map-ont -a -t 16 -y --secondary=no reference.fa - > aligned.bam
Constructing bedMethyl tables for aggregated counts:
modkit pileup aligned.bam output/pileup.bed --ref reference.fa --combine-strands
Compression and indexing for fast region queries:
bgzip output/pileup.bed
tabix -p bed output/pileup.bed.gz
Pairwise differential methylation (loci + regions):
modkit dmr pair -a control_pileup.gz -b treatment_pileup.gz -o results/dml.tsv --segments results/dmr.bed --ref reference.fa --base C --threads 16
References
For research purposes only, not intended for personal diagnosis, clinical testing, or health assessment