Case Study: Pinpointing Bacterial Wilt Resistance in Tomato via QTL-seq
Background and goal (problem definition)
Bacterial wilt, caused by members of the Ralstonia solanacearum species complex, is a high-impact soil-borne disease that can rapidly collapse tomato stands under permissive environments. In breeding programs and mechanism-driven labs alike, one recurring bottleneck is that "resistant vs susceptible" often behaves as a quantitative trait: strong loci may exist, but minor loci, environment, and phenotyping noise can blur signals.
For a mechanism-focused PI, the real challenge is usually not "finding a locus," but producing a defensible interval and an evidence chain that can support a high-quality figure set, a marker strategy, and a shortlist of candidates that reviewers will consider plausible for follow-up.
Trait context and research value
Resistance is valuable because it can:
- enable stable production in infested fields,
- reduce the need for intensive management, and
- provide genetic handles for building resilient germplasm.
At the same time, resistance can be strain- and condition-dependent, and multiple resistance loci have been described in tomato, including regions linked to Hawaii-derived sources. (Wang et al., 2013; doi:10.1007/s10681-012-0830-x)
Objective and success criteria
Objective: Use QTL-seq (BSA-Seq) to identify one or more genomic regions associated with bacterial wilt resistance and produce:
1. clear peak(s) in a genome-wide Δ(SNP-index) profile,
2. an actionable interval with rational boundaries,
3. a candidate gene shortlist supported by structured evidence, and
4. markers suitable for RUO genotyping confirmation in independent populations and panels.
To avoid "peak chasing," define these gates before looking at the genome-wide plot:
Bulk/phenotype gates
- Resistant bulk enriched for extreme resistance; susceptible bulk enriched for extreme susceptibility
- Minimal ambiguity (avoid intermediate phenotypes unless explicitly modeled)
- Clear extreme-selection rule recorded (e.g., top/bottom 10–15%)
Sequencing & mapping gates
- Coverage sufficient for stable allele-frequency estimation genome-wide
- Minimal reference bias; no obvious coverage deserts driving apparent peaks
Variant/statistics gates
- Sufficient high-confidence SNP density after filtering
- Peak supported by sustained signal across adjacent windows (not a single spike)
- Confidence band (or empirical threshold) separates peak from baseline noise
Figure 1. Study overview
An anonymized QTL-seq overview for tomato bacterial wilt resistance: phenotype extremes → balanced bulks → sequencing → variant filtering and windowed Δ(SNP-index) → peak interval → candidate ranking and marker notes for RUO genotyping confirmation. This figure emphasizes the "deliverable chain" from experiment design to reviewer-ready outputs.
Study design (what was done)
This section describes what to do and what to record so that the study remains defensible for reviewers and reusable for future fine mapping.
Population and phenotyping strategy
Population choice: a segregating population in which the trait shows meaningful contrast (e.g., F₂, backcross, or RILs). Following the original QTL-seq framing, ~20–50 individuals per extreme bulk is a practical starting point in many plant settings. (Takagi et al., 2013; doi:10.1111/tpj.12105)
Phenotyping is the real power limiter. For a soil-borne disease trait:
- Standardize inoculum preparation and scoring timepoints.
- Use a clear, repeatable scoring rubric (ordinal scale is fine if consistent).
- Record covariates that can explain noise (temperature, substrate, plant age, batch).
Extreme selection rule (set before genotypes):
- Resistant extreme: bottom 10–15% of severity distribution
- Susceptible extreme: top 10–15%
- If distribution is shallow or skewed, increase population size rather than relaxing extremes.
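Writing the extreme-selection rule as code before genotyping makes it easy to record and rerun. Below is a minimal sketch that assumes severity scores are stored per plant ID, with higher values meaning more severe disease. The function name `select_extremes` and the tie-breaking by plant ID are illustrative choices and do not come from any specific tool.

```python
import math
from typing import Dict, List, Tuple


def select_extremes(severity: Dict[str, float],
                    fraction: float = 0.10) -> Tuple[List[str], List[str]]:
    """Return (resistant_bulk, susceptible_bulk) plant IDs.

    Resistant = lowest-severity `fraction` of plants; susceptible = highest.
    Ties at the cutoff are broken by plant ID so the rule is deterministic.
    """
    if not 0 < fraction < 0.5:
        raise ValueError("fraction must be between 0 and 0.5")
    # Small epsilon guards against float products landing just below an integer.
    n_per_bulk = max(1, math.floor(len(severity) * fraction + 1e-9))
    ranked = sorted(severity, key=lambda pid: (severity[pid], pid))
    resistant = ranked[:n_per_bulk]       # lowest severity scores
    susceptible = ranked[-n_per_bulk:]    # highest severity scores
    return resistant, susceptible
```

For example, 200 scored plants at `fraction=0.10` give 20 plants per bulk. If the resulting bulk is smaller than planned, the text's advice is to score more plants rather than raise `fraction`.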
If your long-term goal is mechanism, keep phenotype subcomponents (e.g., onset time vs final severity) as metadata; different loci may map to different components.
For a method refresher on SNP-index and Δ(SNP-index), see our QTL-seq approach guide: QTL-seq Approach: Accelerating Crop Trait Discovery via BSA-Seq
To keep downstream analysis and reporting consistent, align your design to a standard bulk segregant analysis (BSA) workflow early (sample naming, bulk definitions, and metadata fields).
Bulk construction (N per bulk and balancing strategy)
Bulk size: Start with ~20–50 individuals per bulk (or more if the trait is noisy). (Takagi et al., 2013; doi:10.1111/tpj.12105)
Balancing strategy
- Equalize DNA input per individual to avoid "dominant individuals."
- Match bulks on non-trait covariates when feasible (tray/batch distribution).
- Avoid relatedness artifacts (don't unintentionally over-sample one family cluster unless intended).
DNA handling
- Use a consistent extraction method for all individuals.
- Quantify accurately (fluorometric methods preferred).
- Pool equimolarly and record calculations.
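The pooling arithmetic is simple, and scripting it means the calculations are recorded automatically. This is a sketch only. It assumes fluorometric concentrations in ng/µL, a hypothetical per-individual target mass (`target_ng`), and a hypothetical pipettable volume range. Because every individual has the same genome size, equal mass is a reasonable proxy for equal genome copies.

```python
from typing import Dict


def pooling_volumes(conc_ng_per_ul: Dict[str, float], target_ng: float = 100.0,
                    min_pipette_ul: float = 0.5, max_ul: float = 20.0) -> Dict[str, float]:
    """Volume (uL) of each individual's DNA needed to add target_ng to the pool.

    Raises if any sample needs a volume outside the pipettable range, so it can be
    re-quantified, diluted, or re-extracted instead of being silently over- or
    under-represented in the bulk.
    """
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: non-positive concentration")
        vol = target_ng / conc
        if vol < min_pipette_ul or vol > max_ul:
            raise ValueError(f"{sample}: {vol:.2f} uL outside [{min_pipette_ul}, {max_ul}] uL")
        volumes[sample] = round(vol, 2)
    return volumes
```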
Once bulks are pooled, each bulk is typically sequenced by whole-genome sequencing, the standard input for QTL-seq.
Sequencing plan (platform-agnostic) and QC checkpoints
A platform-agnostic plan is defined by:
- read length sufficient for robust mapping to the tomato reference,
- coverage sufficient to stabilize allele-frequency estimates,
- duplication and contamination controlled to avoid distorted depth.
Coverage planning (practical guidance)
- Good reference + adequate SNP density: moderate depth can work well.
- Fragmented reference, high repeats, or uneven capture: plan higher depth to retain usable SNPs after filtering.
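Coverage planning comes down to arithmetic on read count, read length, and genome size. The sketch below assumes paired-end reads and discounts unmapped and duplicate reads. The ~900 Mb tomato genome size and the default mapping and duplicate rates are planning assumptions. Replace them with values for your reference build and with pilot QC numbers.

```python
TOMATO_GENOME_BP = 900e6  # assumption: approximate estimated tomato genome size


def expected_mean_depth(read_pairs: float, read_len: int, genome_size_bp: float,
                        mapping_rate: float = 0.95, dup_rate: float = 0.10) -> float:
    """Mean usable depth for one bulk, given paired-end read pairs."""
    usable_bases = read_pairs * 2 * read_len * mapping_rate * (1.0 - dup_rate)
    return usable_bases / genome_size_bp


def read_pairs_needed(target_depth: float, read_len: int, genome_size_bp: float,
                      mapping_rate: float = 0.95, dup_rate: float = 0.10) -> float:
    """Inverse of expected_mean_depth: read pairs per bulk to reach target_depth."""
    return target_depth * genome_size_bp / (2 * read_len * mapping_rate * (1.0 - dup_rate))
```

As an example, `read_pairs_needed(30, 150, TOMATO_GENOME_BP)` returns roughly 105 million read pairs for ~30× usable depth per bulk with 2×150 bp reads. The 30× target is an illustrative choice, not a recommendation from the text.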
QC checkpoints you must capture
- Raw read quality (per-base quality, adapter content)
- Mapping rate and properly paired reads
- Depth distribution (uniformity across genome)
- Duplicate rate (library complexity proxy)
- Post-filter SNP counts and missingness
Many groups bundle wet-lab work and QC reporting into a single NGS deliverable to keep documentation consistent.
Bioinformatics and statistics (key outputs)
This is where a mechanism-focused PI can win reviewer trust: by presenting outputs as QC gates plus statistical logic, not just "we ran a pipeline."
Minimum reporting checklist (for reviewers)
You can copy/paste this block into Methods or Supplementary Materials.
- Bulk N and extreme selection rule (percentile thresholds; phenotype definition)
- Population type (F₂/BC/RIL) and reference genome version/build
- Sequencing metrics: total reads, mapping %, duplication %, coverage (mean/median and % genome above threshold)
- Variant filtering thresholds: min depth per bulk, mapping/base quality cutoffs, max depth cap, missingness filters
- Window smoothing settings: window size and step (or smoothing method), and rationale
- Peak boundary rule: how interval boundaries were chosen (baseline return + consecutive windows)
- Candidate scoring rubric: criteria and weighting (variant impact, consistency, annotation, optional expression evidence)
- Reproducibility package: VCF, window statistics table, plotting scripts/parameters
QC + mapping summary (what was checked and why)
Goal: Confirm both bulks are comparable and that mapping/coverage artifacts won't masquerade as QTL peaks.
QC gates (recommended starting thresholds)
- Raw reads: ≥80–90% bases at Q30+ (platform dependent)
- Adapter trimming: residual adapter content near zero
- Mapping rate: often ≥90% for a good tomato reference; investigate if much lower
- Properly paired reads: high fraction expected for paired-end
- Duplicate rate: ideally low-to-moderate; unexpected high duplication suggests low library complexity
- Coverage uniformity: avoid "coverage deserts" that differ systematically between bulks
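Encoding the gates as data makes each pass/fail decision explicit and easy to report. This is a minimal sketch. The Q30 and mapping-rate thresholds come from the list above. The adapter, proper-pairing, duplicate-rate, and cross-bulk mapping-gap thresholds (`max_gap`) are illustrative assumptions that stand in for "near zero", "high", and "low-to-moderate".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BulkQC:
    name: str
    q30_fraction: float      # fraction of bases at Q30+
    adapter_fraction: float  # residual adapter content after trimming
    mapping_rate: float
    properly_paired: float
    duplicate_rate: float


# (metric, direction, threshold): "min" means value must be >= threshold,
# "max" means value must be <= threshold.
GATES = [
    ("q30_fraction", "min", 0.80),      # from text: >=80-90% bases at Q30+
    ("adapter_fraction", "max", 0.01),  # assumption: "near zero" read as <=1%
    ("mapping_rate", "min", 0.90),      # from text: often >=90% with a good reference
    ("properly_paired", "min", 0.90),   # assumption for "high fraction"
    ("duplicate_rate", "max", 0.30),    # assumption for "low-to-moderate"
]


def qc_failures(bulk: BulkQC) -> List[str]:
    """List every gate this bulk fails (empty list means all gates passed)."""
    failures = []
    for metric, direction, threshold in GATES:
        value = getattr(bulk, metric)
        ok = value >= threshold if direction == "min" else value <= threshold
        if not ok:
            failures.append(f"{bulk.name}: {metric}={value:.3f} fails {direction} {threshold}")
    return failures


def mapping_gap(a: BulkQC, b: BulkQC, max_gap: float = 0.05) -> List[str]:
    """Flag bulks whose mapping rates differ enough to suggest non-comparable data."""
    gap = abs(a.mapping_rate - b.mapping_rate)
    if gap > max_gap:
        return [f"mapping-rate gap {gap:.3f} between {a.name} and {b.name}"]
    return []
```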
What a reviewer-friendly mapping table includes
- total reads per bulk, % passing filter
- mapping %, properly paired %
- mean and median depth, % genome ≥10× (or your chosen threshold)
- duplication rate
- insert size summary (for PE data)
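The depth columns of such a table can all be computed from a per-base depth histogram. This is a sketch under one assumption: you already have counts of genome positions at each depth, tallied from per-position depth output that includes zero-coverage positions. The `min_depth` default mirrors the 10× example above.

```python
from typing import Dict


def depth_summary(hist: Dict[int, int], min_depth: int = 10) -> Dict[str, float]:
    """Summarize a per-base depth histogram {depth: number_of_positions}.

    The histogram must include zero-depth positions; otherwise the mean,
    median, and percent-covered values are inflated.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    mean = sum(d * n for d, n in hist.items()) / total
    # Median: smallest depth at which the cumulative count reaches half the positions.
    cumulative = 0
    median = 0
    for d in sorted(hist):
        cumulative += hist[d]
        if 2 * cumulative >= total:
            median = d
            break
    covered = sum(n for d, n in hist.items() if d >= min_depth)
    return {
        "mean_depth": mean,
        "median_depth": float(median),
        f"pct_genome_ge_{min_depth}x": 100.0 * covered / total,
    }
```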
A clean, review-ready QC section is easiest to produce from a genomic data analysis report template that preserves parameters and intermediate metrics.
Variant calling + filtering (stabilizing SNP-index)
Why filtering is not optional: Δ(SNP-index) depends on allele-frequency estimates. Low depth, low quality, multi-mapping, and strand bias can inflate noise and create "ghost peaks."
Typical workflow
1. Align reads to reference
2. Mark duplicates (or at minimum quantify)
3. Call SNPs jointly across bulks
4. Apply stringent filtering:
   - minimum depth per bulk per site (often ≥8–10 reads)
   - maximum depth cap (to avoid repeats/copy-number artifacts)
   - minimum base quality and mapping quality
   - remove sites with severe allele balance anomalies
   - optionally remove indels to keep a clean SNP-index model
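The site-level filters in step 4 can be sketched as one predicate over per-bulk allele counts. Several things here are assumptions. The `Site` record layout is hypothetical, and the default cutoffs are illustrative. The allele-balance rule assumes an F₂ or RIL design, where the pooled alt fraction should sit near 0.5; backcross designs need a different expectation. Base-quality filtering is assumed to have happened already, when reads were counted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    chrom: str
    pos: int
    ref_r: int    # reference-allele reads, resistant bulk
    alt_r: int    # alternative-allele reads, resistant bulk
    ref_s: int    # reference-allele reads, susceptible bulk
    alt_s: int    # alternative-allele reads, susceptible bulk
    qual: float   # site-level variant quality
    mq: float     # mean mapping quality of reads at the site


def passes_filters(site: Site, min_depth: int = 10, max_depth: int = 200,
                   min_qual: float = 30.0, min_mq: float = 40.0,
                   min_pooled_alt: float = 0.2) -> bool:
    depth_r = site.ref_r + site.alt_r
    depth_s = site.ref_s + site.alt_s
    # Minimum depth per bulk: low depth makes SNP-index estimates noisy.
    if min(depth_r, depth_s) < min_depth:
        return False
    # Maximum depth cap: unusually deep sites often reflect repeats or copy-number variation.
    if max(depth_r, depth_s) > max_depth:
        return False
    # Site-level quality and mapping-quality cutoffs.
    if site.qual < min_qual or site.mq < min_mq:
        return False
    # Allele-balance check (assumption: F2/RIL, pooled alt fraction expected near 0.5).
    pooled_alt = (site.alt_r + site.alt_s) / (depth_r + depth_s)
    return min_pooled_alt <= pooled_alt <= 1.0 - min_pooled_alt


def filter_sites(sites: List[Site], **kwargs) -> List[Site]:
    """Keep only sites that pass every filter; kwargs override the default cutoffs."""
    return [s for s in sites if passes_filters(s, **kwargs)]
```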
SNP-index (conceptual)
- SNP-index: fraction of reads supporting the alternative allele at a site in one bulk
- Δ(SNP-index): the difference in SNP-index between the two bulks at each site, profiled along the genome (define the direction, e.g., resistant minus susceptible, and use it consistently)
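Both definitions above reduce to a few lines. In this sketch, the convention "resistant minus susceptible" is an assumption you should state in Methods. If the parents were sequenced, "alternative" can be polarized to the resistant parent's allele, so that positive values consistently mean enrichment of that allele in the resistant bulk.

```python
def snp_index(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at a site that support the alternative allele in one bulk."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads at this site")
    return alt_reads / depth


def delta_snp_index(alt_r: int, ref_r: int, alt_s: int, ref_s: int) -> float:
    """Delta(SNP-index) at one site, using the assumed convention resistant minus susceptible."""
    return snp_index(alt_r, ref_r) - snp_index(alt_s, ref_s)
```

At a site with 15 alt and 5 ref reads in the resistant bulk and 5 alt and 15 ref reads in the susceptible bulk, the SNP-indices are 0.75 and 0.25, so Δ(SNP-index) = 0.5.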
To keep the parameter record auditable, document the variant calling configuration (software versions, exact filters, SNP retention counts).
Δ(SNP-index) plot and candidate interval selection logic
Core output: A genome-wide Δ(SNP-index) profile, typically smoothed in sliding windows.
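Sliding-window smoothing can be sketched as an unweighted mean of per-SNP Δ(SNP-index) values in fixed windows along one chromosome. `window_bp`, `step_bp`, and `min_snps` are hypothetical parameter names. Some tools use weighted kernels (e.g., tricube) instead of a plain mean, so treat this as one reasonable choice rather than a standard.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def sliding_windows(positions: List[int], deltas: List[float], window_bp: int,
                    step_bp: int, min_snps: int = 10) -> List[Tuple[int, int, float, int]]:
    """Mean Delta(SNP-index) in fixed-size windows along one chromosome.

    positions: 1-based SNP positions sorted ascending; deltas: matching values.
    Returns (window_start, window_end, mean_delta, n_snps). Windows with fewer
    than min_snps SNPs are skipped rather than reported as noisy values.
    """
    windows = []
    if not positions:
        return windows
    start = 1
    while start <= positions[-1]:
        end = start + window_bp - 1
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        n = hi - lo
        if n >= min_snps:
            windows.append((start, end, sum(deltas[lo:hi]) / n, n))
        start += step_bp
    return windows
```

Reporting `n_snps` next to each window mean makes sparse windows visible. Sparse windows are a common cause of the "single sharp spike" pattern described in the troubleshooting table.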
Two widely used framings for NGS-BSA/QTL-seq are:
- Δ(SNP-index) method (often paired with simulation- or empirical-based confidence)
- G′ / smoothed G-statistic method (statistics-based inference)
QTLseqr implements both approaches and many groups use them as complementary checks. (Mansfeld & Grumet, 2018; doi:10.3835/plantgenome2018.01.0006)
A foundational statistical treatment of NGS-BSA variance sources is also available. (Magwene et al., 2011; doi:10.1371/journal.pcbi.1002255)
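For the G′ framing, the per-SNP statistic is the log-likelihood-ratio G for a 2×2 table of allele counts (bulk × allele), and G′ is a kernel-smoothed G along the chromosome. The sketch below uses a plain tricube-weighted moving average with a hypothetical `half_width` parameter. It is a simplified illustration and is not guaranteed to reproduce QTLseqr's output. The naive smoothing loop is O(n²), which is fine for illustration but slow genome-wide.

```python
import math
from typing import List


def g_statistic(ref_r: int, alt_r: int, ref_s: int, alt_s: int) -> float:
    """G = 2 * sum(obs * ln(obs / expected)) over the 2x2 bulk-by-allele table."""
    observed = [[ref_r, alt_r], [ref_s, alt_s]]
    total = ref_r + alt_r + ref_s + alt_s
    row = [ref_r + alt_r, ref_s + alt_s]
    col = [ref_r + ref_s, alt_r + alt_s]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 (limit of o*ln(o) as o -> 0)
                e = row[i] * col[j] / total
                g += o * math.log(o / e)
    return 2.0 * g


def tricube_smooth(positions: List[int], values: List[float],
                   half_width: float) -> List[float]:
    """Tricube-weighted average of values around each SNP position."""
    smoothed = []
    for x0 in positions:
        num = den = 0.0
        for x, v in zip(positions, values):
            d = abs(x - x0) / half_width
            if d < 1.0:
                w = (1.0 - d ** 3) ** 3  # tricube weight, 1 at the centre, 0 at the edge
                num += w * v
                den += w
        smoothed.append(num / den)  # den >= 1 because each SNP weights itself by 1
    return smoothed
```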
How we selected an interval (defensible boundaries)
1. Identify the primary region where Δ(SNP-index) exceeds the confidence band (or empirical threshold).
2. Expand boundaries until signal returns to baseline and remains stable over multiple adjacent windows.
3. Inspect local coverage and mappability; treat low-coverage/repeat-rich regions cautiously.
4. Optionally confirm with an alternate statistic (e.g., G′) to ensure the same region emerges.
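Steps 1–2 can be written down as an explicit rule over smoothed windows on one chromosome. This is a sketch under stated assumptions. `threshold` would come from the confidence band or empirical threshold, while `baseline` and `k` (the number of consecutive baseline windows) are user choices. The function name is hypothetical. Absolute values are used so that a negative peak is handled the same way as a positive one.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int, float]  # (start_bp, end_bp, smoothed statistic)


def peak_interval(windows: List[Window], threshold: float, baseline: float,
                  k: int = 3) -> Optional[Tuple[int, int]]:
    """Interval around the strongest window on one chromosome.

    Step 1: the peak is the window with the largest |value|; it must reach threshold.
    Step 2: extend outward until k consecutive windows fall to |value| <= baseline.
    The boundary is the outermost window still above baseline. If the chromosome
    ends first, the boundary is the outermost above-baseline window seen.
    """
    if not windows:
        return None
    vals = [abs(v) for _, _, v in windows]
    peak = max(range(len(vals)), key=vals.__getitem__)
    if vals[peak] < threshold:
        return None

    def boundary(step: int) -> int:
        last_signal = peak   # outermost window index still above baseline
        run = 0              # consecutive baseline windows seen so far
        i = peak + step
        while 0 <= i < len(vals) and run < k:
            if vals[i] <= baseline:
                run += 1
            else:
                last_signal, run = i, 0
            i += step
        return last_signal

    return windows[boundary(-1)][0], windows[boundary(+1)][1]
```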
Figure 2. Window-smoothed Δ(SNP-index) peak with confidence band
Example Δ(SNP-index) across chromosome position after sliding-window smoothing (window size/step defined in Methods). The shaded band represents a confidence band derived from simulation or an empirical genome-wide threshold, used to separate sustained signal from baseline noise. The candidate interval is defined by consecutive windows above threshold plus a boundary rule (return-to-baseline), rather than a single-point maximum.
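One way to obtain a simulation-based band is to simulate Δ(SNP-index) at a given bulk size and read depth under a no-QTL null. This sketch follows that spirit only and is simplified. It assumes an F₂ population (alt-allele dosage 0/1/2 in a 1:2:1 ratio), perfectly equal DNA contribution per individual, and error-free reads. Because real depth varies by site, the band would be computed per depth (or depth bin). A RIL design would need homozygous genotypes instead.

```python
import random
from typing import Tuple


def simulate_null_band(n_per_bulk: int, depth: int, reps: int = 10000,
                       conf: float = 0.99, seed: int = 1) -> Tuple[float, float]:
    """Two-sided null quantiles of Delta(SNP-index) for an F2 SNP unlinked to any QTL.

    Each replicate draws two independent bulks of n_per_bulk F2 plants, computes the
    true alt-allele frequency of each pool, then samples `depth` reads per bulk.
    """
    rng = random.Random(seed)

    def bulk_snp_index() -> float:
        # F2 alt-allele dosage 0/1/2 with probabilities 1/4, 1/2, 1/4.
        dosage = sum(rng.choices((0, 1, 2), weights=(1, 2, 1), k=n_per_bulk))
        p = dosage / (2 * n_per_bulk)
        # Binomial read sampling from the pooled DNA, written as a stdlib-only loop.
        alt = sum(1 for _ in range(depth) if rng.random() < p)
        return alt / depth

    deltas = sorted(bulk_snp_index() - bulk_snp_index() for _ in range(reps))
    lo_idx = int((1 - conf) / 2 * reps)
    hi_idx = min(reps - 1, int((1 + conf) / 2 * reps))
    return deltas[lo_idx], deltas[hi_idx]
```

Rerunning with larger `n_per_bulk` or `depth` narrows the band. This gives a quick way to see how bulk size and coverage trade off before committing to a sequencing plan.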
Candidate gene prioritization (annotation + plausibility + structured evidence)
Once the interval is set, the step from interval to candidates can be either subjective ("this looks like a resistance gene") or review-proof (a structured evidence chain). For a mechanism-focused PI, a transparent rubric is usually what separates the two.
Evidence components
- Variant impact: predicted high/medium impact variants (stop-gain, splice-site, damaging missense)
- Bulk consistency: allele-frequency direction matches expected enrichment across the interval
- Functional annotation: plausible defense-related domains/pathways (kept general)
- Expression evidence (optional): if you already have RNA-seq, use expression as supporting evidence to prioritize candidates (RUO mechanistic context)
If you plan to incorporate expression as orthogonal support, pair the interval with RNA-seq transcriptome analysis as a structured add-on (not a replacement for mapping).
Results interpretation (from peak to action)
This section translates outputs into actionable next steps—candidate ranking and marker planning—while making uncertainty explicit.
Candidate interval size and boundary rationale
A QTL-seq interval is a resolution-limited estimate shaped by:
- recombination density in your population,
- QTL effect size,
- bulk size and phenotype purity,
- sequencing depth and SNP density,
- coverage/mapping artifacts.
How to describe interval size in a manuscript
- Provide interval coordinates (reference build), peak position, and width.
- Report number of genes/gene models inside the interval.
- Explain boundary logic (e.g., "plateau above confidence band across X Mb; boundaries chosen at return to baseline across Y consecutive windows").
- Cite QC evidence supporting stability (coverage uniformity, SNP density).
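Generating the reportable numbers (coordinates, peak, width, gene-model count) from data avoids transcription errors between tables and text. A minimal sketch, assuming gene models are available as (gene_id, chrom, start, end) tuples with 1-based inclusive coordinates. The `build` label and the function name are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int]  # (gene_id, chrom, start, end), 1-based inclusive


def describe_interval(build: str, chrom: str, start: int, end: int,
                      peak_pos: int, genes: List[Gene]) -> Dict[str, object]:
    """Collect the numbers a manuscript should state for one QTL interval."""
    # Any overlap with the interval counts, including genes that straddle a boundary.
    inside = [g[0] for g in genes
              if g[1] == chrom and g[2] <= end and g[3] >= start]
    width_mb = (end - start + 1) / 1e6
    return {
        "label": f"{build} {chrom}:{start:,}-{end:,} (peak {peak_pos:,}; "
                 f"{width_mb:.2f} Mb; {len(inside)} gene models)",
        "width_mb": width_mb,
        "n_gene_models": len(inside),
        "gene_ids": inside,
    }
```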
If the interval remains large (hundreds of genes), treat it as guidance for the next stage: add recombinants and genotyping to refine.
To tighten a region after the first pass, targeted SNP fine mapping can convert a broad interval into recombination-defined candidate segments.
From interval to candidate shortlist (kept general, reviewer-defensible)
In tomato bacterial wilt resistance research, stable loci are often followed by marker development and independent confirmation steps, which is why the "peak-to-marker" bridge matters. (Yeon et al., 2022; doi:10.3390/plants11172223)
Shortlist approach (example rubric)
We build a candidate table with columns:
- Anonymized Gene ID
- Variant(s) + predicted impact
- Bulk allele-frequency directionality
- Annotation note (domain/pathway)
- Optional expression note (if available)
- Priority score (explicit weighting)
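An explicit weighting can be as simple as a weighted mean over the evidence columns. The weights, the SnpEff-style impact-to-score mapping, and the 0–1 scales below are hypothetical. What matters is that they are written down and reported alongside the table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights; report whatever values you actually use.
WEIGHTS = {"impact": 0.4, "consistency": 0.3, "annotation": 0.2, "expression": 0.1}
# SnpEff-style impact classes mapped to illustrative scores.
IMPACT_SCORE = {"HIGH": 1.0, "MODERATE": 0.5, "LOW": 0.2, "MODIFIER": 0.0}


@dataclass
class Candidate:
    gene_id: str
    impact: str                  # strongest predicted variant impact in the gene
    consistency: float           # 0-1, agreement of bulk allele-frequency direction with the peak
    annotation: float            # 0-1, curated plausibility of domain/pathway
    expression: Optional[float]  # 0-1 if RNA-seq evidence exists, else None


def priority_score(c: Candidate) -> float:
    """Weighted mean of available evidence. Missing expression is dropped and the
    remaining weights are renormalized, so candidates are not penalized for absent data."""
    parts = {
        "impact": IMPACT_SCORE[c.impact],
        "consistency": c.consistency,
        "annotation": c.annotation,
    }
    if c.expression is not None:
        parts["expression"] = c.expression
    total_weight = sum(WEIGHTS[k] for k in parts)
    return sum(WEIGHTS[k] * v for k, v in parts.items()) / total_weight


def rank(cands: List[Candidate]) -> List[Candidate]:
    """Candidates sorted from highest to lowest priority score."""
    return sorted(cands, key=priority_score, reverse=True)
```

Renormalizing is a design choice. The alternative is to score missing expression as 0, which favors candidates that happen to have RNA-seq support. Whichever you choose, state it in the rubric.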
If you bring in expression support, standardize outputs via transcriptomic data analysis so candidate ranking remains transparent and reproducible.
Figure 3. Example candidate prioritization table with explicit weighting
Anonymized example only. A four-row candidate table illustrates how a transparent scoring rubric can combine (i) variant impact (e.g., stop-gain vs missense), (ii) bulk consistency across the peak region, (iii) optional expression hints, and (iv) pathway/domain notes into a weighted priority score. The purpose is to show the evidence chain reviewers expect—even when the final gene identifiers differ by project.
Follow-up plan
A reviewer-friendly plan separates genetic confirmation from mechanistic follow-up.
A) RUO genotyping confirmation (fast, scalable)
- Propose a small marker set across the peak region.
- Screen:
  - the original population (sanity check),
  - an independent segregating population (transferability),
  - broader panels/materials (generalization under RUO).
- Use recombinants to narrow the region.
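One simple way to propose a small marker set is to spread candidate SNPs evenly across the peak region. You divide the interval into equal bins and keep the filtered SNP with the largest |Δ(SNP-index)| in each bin. This helper is hypothetical and only picks positions. Assay-design checks (flanking sequence, nearby variants, repeats) still have to be done separately.

```python
from typing import Dict, List, Tuple


def pick_markers(snps: List[Tuple[int, float]], start: int, end: int,
                 n_markers: int) -> List[int]:
    """snps: (position, delta) pairs that passed filters.

    Splits [start, end] into n_markers equal bins and returns, in order, the
    position of the SNP with the largest |delta| in each non-empty bin.
    """
    bin_width = (end - start + 1) / n_markers
    best: Dict[int, Tuple[int, float]] = {}
    for pos, delta in snps:
        if start <= pos <= end:
            b = min(n_markers - 1, int((pos - start) / bin_width))
            if b not in best or abs(delta) > abs(best[b][1]):
                best[b] = (pos, delta)
    return [best[b][0] for b in sorted(best)]
```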
B) Mechanistic follow-up (PI-focused, publishable)
- Select a small number of candidates for deeper assays.
- Collect supporting evidence:
  - condition-relevant expression changes,
  - pathway-level consistency with known defense mechanisms,
  - haplotype association in broader panels (if available).
C) Reproducibility package
- Provide versioned pipeline details, parameters, and intermediate outputs (VCF, window statistics, plots, candidate table, marker notes).
Decision summary: why this approach won (time and project ROI)
For a detailed timeline and ROI comparison, see our guide comparing linkage mapping and QTL-seq.
Time saved vs classical linkage mapping
Compared with classical linkage mapping workflows that require iterative marker development and repeated genotyping rounds, QTL-seq shifts effort toward:
- one high-throughput sequencing batch (two bulks, sometimes plus parents),
- a single integrated analysis pass,
- rapid generation of dense variants that feed fine mapping and marker planning.
The original QTL-seq framing emphasized speed by resequencing bulked extremes rather than genotyping many markers across all individuals. (Takagi et al., 2013; doi:10.1111/tpj.12105)
Reduced labor and faster go/no-go
For a mechanism-driven lab, the go/no-go question is often: Do we have a credible locus worth mechanistic investment? QTL-seq accelerates that decision by producing:
- a genome-wide scan with clear peaks (or an honest "no peak" outcome),
- an interval with rational boundaries and QC support,
- a structured shortlist,
- marker notes oriented to RUO genotyping confirmation.
Next-step roadmap (if results are strong, moderate, or messy)
If results are strong (single major peak)
- Proceed to RUO genotyping confirmation and recombinant-based refinement.
- Start mechanistic follow-up on top candidates.
If results are moderate (several peaks)
- Check phenotype purity and batch effects.
- Consider repeating with tighter extremes or larger bulks.
- Cross-check with an alternate statistic (Δ(SNP-index) vs G′).
If results are messy (no clear peak)
- Treat it as design feedback:
  - phenotype too noisy or extremes too weak,
  - insufficient SNP density or mapping issues,
  - bulk imbalance (DNA pooling bias).
- Redesign: increase population size, refine phenotyping, adjust depth.
For re-analysis and reproducibility support, integrate a standardized bioinformatics services workflow and keep parameter records consistent across iterations.
When to use QTL-seq (and when not to)
Use QTL-seq when
- You have a segregating population and can phenotype reliably.
- Extremes are clear enough to form high-purity bulks.
- You suspect at least one locus has moderate-to-large effect.
- You need a fast, dense scan to justify downstream fine mapping.
Consider alternatives or a two-stage strategy when
- The trait is highly polygenic with shallow phenotype separation.
- Phenotyping is dominated by environment/batch with limited replication.
- The reference genome is too fragmented/repetitive for confident mapping.
- You need effect-size estimates across many loci (GWAS or full mapping may be better).
One-screen decision tool (mini-table + decision tree)
Mini-table: choose the right approach (RUO)
| Your situation | QTL-seq fit | Expected RUO outputs | Primary risks | Mitigation |
|---|---|---|---|---|
| Clear extremes, likely major locus | High | Peak interval, candidate rubric, marker notes for RUO confirmation | Reference bias, local repeats | Strict depth caps + coverage checks |
| Moderate extremes, mixed effect sizes | Medium | Multiple peaks or broader intervals | Phenotype noise → false negatives | Increase N, tighten extremes, replicate phenotyping |
| Highly polygenic, weak separation | Low | Flat/noisy profile | Underpowered allele-frequency shifts | Consider GWAS, larger populations, multi-stage mapping |
| Fragmented reference / repeats | Medium–Low | Gaps, unstable peaks | Mapping artifacts | Mask repeats, adjust max-depth cap, consider improved reference |
Simple decision tree
1. Can you define extremes confidently?
   - No → improve phenotyping or increase population size.
   - Yes → go to (2).
2. Is SNP density + mapping quality adequate genome-wide?
   - No → adjust sequencing/filters/reference strategy.
   - Yes → go to (3).
3. Do you observe sustained peak(s) above confidence threshold?
   - No → revisit phenotype purity, depth, and filters; consider alternate statistics.
   - Yes → define interval boundaries + build candidate rubric + prepare marker notes for RUO genotyping confirmation.
QC and troubleshooting (actionable thresholds + symptom→cause→fix)
QC gate table (metric → expected value → consequence if it fails)
| QC metric | Practical expectation (typical starting point) | If it fails, common consequence |
|---|---|---|
| Bulk phenotype purity | Extremes clearly separated; minimal intermediates | Peaks flatten; false negatives |
| DNA pooling balance | Equimolar input per individual | Artificial allele-frequency distortion |
| Mapping rate | Often ≥90% with good reference | Missing SNPs; reference bias |
| Proper pairing | High fraction for PE libraries | Uneven depth; structural artifacts |
| Duplicate rate | Low-to-moderate; investigate if high | Reduced effective depth; noisy SNP-index |
| Per-site depth (post-filter) | Commonly ≥8–10 per bulk per SNP | Spiky Δ(SNP-index); unstable peaks |
| SNP density after filters | Sufficient genome-wide density | Underpowered scan; gaps |
Troubleshooting table (symptom → likely cause → fix)
| Symptom | Likely cause | Fix (fastest first) |
|---|---|---|
| No peak anywhere | Phenotype separation too weak; bulks not extreme | Re-score; tighten extremes; increase population size |
| Many small peaks | Batch effects; inconsistent scoring | Standardize conditions; replicate phenotyping block |
| Single sharp spike in one window | Low depth/repeat region; weak filtering | Raise min depth; cap max depth; exclude repeats; verify coverage |
| Peak shifts with window size | Underpowered or uneven SNP density | Increase depth or bulk size; confirm with G′ |
| Candidate list huge | Resolution limited by recombination | Add recombinants; targeted genotyping; fine mapping |
For submission requirements and metadata structure, follow the sample submission guidelines.
Deliverables checklist (what you should expect)
A complete QTL-seq package for RUO research typically includes:
- QC summary tables for both bulks (raw + mapped)
- Filtered SNP set (VCF) + filtering report
- Window statistics table + plotting parameters
- Genome-wide Δ(SNP-index) plots (and optional alternate statistic plots)
- Peak interval coordinates + explicit boundary rationale
- Candidate table with evidence columns + scoring rubric
- Marker panel notes for RUO genotyping confirmation (coordinates, alleles, assay design notes)
- Reproducibility package (software versions, thresholds, scripts)
FAQ
How many individuals per bulk do I need?
A common starting point is ~20–50 per bulk in many plant QTL-seq designs, but noisy traits and smaller effect sizes typically benefit most from increasing bulk size and improving phenotyping. (Takagi et al., 2013; doi:10.1111/tpj.12105)
Do I need to sequence the parents?
Not strictly, but parental data often improves allele-direction interpretation and candidate ranking, especially when multiple resistance sources may exist.
What's the most common reason QTL-seq underperforms?
Phenotype noise and weak extreme separation. If bulks are not truly extreme, increased sequencing rarely rescues a flat signal.
How do I choose window size for smoothing Δ(SNP-index)?
Window size trades resolution vs stability. If peaks shift dramatically with window settings, the data may be underpowered or uneven in SNP density/coverage; consider depth increases, stronger filters, or alternate statistics.
Δ(SNP-index) vs G′—which should I report?
Many studies treat them as complementary: Δ(SNP-index) is intuitive for directionality, while G′ provides a statistics-based inference framework. (Mansfeld & Grumet, 2018; doi:10.3835/plantgenome2018.01.0006)
How do I keep candidate prioritization from sounding subjective?
Use a transparent rubric (variant impact + bulk consistency + annotation + optional expression support) and show the weighting.
Can I integrate RNA-seq evidence if I already have data?
Yes—use expression as supporting evidence to rank candidates within the mapped interval (RUO mechanism support), not as a substitute for mapping.
What if I get two strong peaks?
Confirm each peak using RUO genotyping confirmation in an independent population or panel, then refine with recombinants and targeted genotyping.
How do I move from interval to breeder-friendly markers?
Marker development and marker-assisted selection (MAS) studies in tomato illustrate how stable QTL regions can be converted into robust genotyping assays for screening breeding and research materials (RUO).
References
- Takagi H, Abe A, Yoshida K, et al. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. The Plant Journal. 2013. doi: 10.1111/tpj.12105
- Mansfeld BN, Grumet R. QTLseqr: An R Package for Bulk Segregant Analysis with Next-Generation Sequencing. The Plant Genome. 2018. doi: 10.3835/plantgenome2018.01.0006
- Zhang J, Panthee DR. PyBSASeq: a simple and effective algorithm for bulked segregant analysis with whole-genome sequencing data. BMC Bioinformatics. 2020. doi: 10.1186/s12859-020-3435-8
- Magwene PM, Willis JH, Kelly JK. The statistics of bulk segregant analysis using next generation sequencing. PLoS Computational Biology. 2011. doi: 10.1371/journal.pcbi.1002255
- Michelmore RW, Paran I, Kesseli RV. Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proceedings of the National Academy of Sciences of the United States of America. 1991. doi: 10.1073/pnas.88.21.9828
- Siddique MI, Silverman E, Louws F, Panthee DR. Quantitative trait loci mapping for bacterial wilt resistance and plant height in tomatoes. Plants. 2024. doi: 10.3390/plants13060876
- Wang JF, Ho FI, Truong HTH, et al. Identification of major QTLs associated with stable resistance of tomato cultivar 'Hawaii 7996' to Ralstonia solanacearum. Euphytica. 2013. doi: 10.1007/s10681-012-0830-x
- Yeon J, Le NT, Sim SC. Assessment of temperature-independent resistance against bacterial wilt using major QTL in cultivated tomato (Solanum lycopersicum L.). Plants. 2022. doi: 10.3390/plants11172223