Case Study: Pinpointing Bacterial Wilt Resistance in Tomato via QTL-seq

Background and goal (problem definition)

Bacterial wilt, caused by members of the Ralstonia solanacearum species complex, is a high-impact soil-borne disease that can rapidly collapse tomato stands under permissive environments. In breeding programs and mechanism-driven labs alike, one recurring bottleneck is that "resistant vs susceptible" often behaves as a quantitative trait: strong loci may exist, but minor loci, environment, and phenotyping noise can blur signals.

For a mechanism-focused PI, the real challenge is usually not "finding a locus," but producing a defensible interval and an evidence chain that can support a high-quality figure set, a marker strategy, and a shortlist of candidates that reviewers will consider plausible for follow-up.

Trait context and research value

Resistance is valuable because it can:

enable stable production in infested fields,
reduce the need for intensive management, and
provide genetic handles for building resilient germplasm.

At the same time, resistance can be strain- and condition-dependent, and multiple resistance loci have been described in tomato, including major QTLs identified in the resistant cultivar 'Hawaii 7996'. (Wang et al., 2013; doi:10.1007/s10681-012-0830-x)

Objective and success criteria

Objective: Use QTL-seq (BSA-Seq) to identify one or more genomic regions associated with bacterial wilt resistance and produce:

1. clear peak(s) in a genome-wide Δ(SNP-index) profile,

2. an actionable interval with rational boundaries,

3. a candidate gene shortlist supported by structured evidence, and

4. markers suitable for RUO genotyping confirmation in independent populations and panels.

To avoid "peak chasing," define these gates before looking at the genome-wide plot:

Bulk/phenotype gates

Resistant bulk enriched for extreme resistance; susceptible bulk enriched for extreme susceptibility
Minimal ambiguity (avoid intermediate phenotypes unless explicitly modeled)
Clear extreme-selection rule recorded (e.g., top/bottom 10–15%)

Sequencing & mapping gates

Coverage sufficient for stable allele-frequency estimation genome-wide
Minimal reference bias; no obvious coverage deserts driving apparent peaks

Variant/statistics gates

Sufficient high-confidence SNP density after filtering
Peak supported by sustained signal across adjacent windows (not a single spike)
Confidence band (or empirical threshold) separates peak from baseline noise

Figure 1. Study overview
An anonymized QTL-seq overview for tomato bacterial wilt resistance: phenotype extremes → balanced bulks → sequencing → variant filtering and windowed Δ(SNP-index) → peak interval → candidate ranking and marker notes for RUO genotyping confirmation. This figure emphasizes the "deliverable chain" from experiment design to reviewer-ready outputs.

Study design (what was done)

This section is written as what you do + what you must record so the study remains defensible for reviewers and reusable for future fine mapping.

Population and phenotyping strategy

Population choice: A segregating population where the trait shows meaningful contrast (e.g., F₂, backcross, or RILs). The original QTL-seq description used bulks of roughly 20–50 individuals per extreme, which remains a practical starting point in many plant settings. (Takagi et al., 2013; doi:10.1111/tpj.12105)

Phenotyping is the real power limiter. For a soil-borne disease trait:

Standardize inoculum preparation and scoring timepoints.
Use a clear, repeatable scoring rubric (ordinal scale is fine if consistent).
Record covariates that can explain noise (temperature, substrate, plant age, batch).

Extreme selection rule (fix this before any genotype data are examined):

Resistant extreme: bottom 10–15% of severity distribution
Susceptible extreme: top 10–15%
If distribution is shallow or skewed, increase population size rather than relaxing extremes.
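
The selection rule above can be written down as a small, auditable function before any sequencing happens. The following is a minimal Python sketch, assuming severity scores are stored per plant and that lower values mean more resistant. The function name `select_extremes`, the plant IDs, and the 10% default are illustrative assumptions, not details from the study.

```python
def select_extremes(scores, fraction=0.10):
    """Split plants into resistant and susceptible tails by severity score.

    scores:   dict of plant_id -> severity (lower = more resistant; assumption)
    fraction: tail fraction taken for each bulk (e.g., 0.10 to 0.15)
    """
    if not 0 < fraction < 0.5:
        raise ValueError("fraction must be between 0 and 0.5")
    # Ties are broken by plant ID here, which is arbitrary; a real rule
    # should state explicitly how tied scores at the cutoff are handled.
    ranked = sorted(scores, key=lambda pid: (scores[pid], pid))
    n_tail = max(1, round(len(ranked) * fraction))
    resistant = ranked[:n_tail]      # lowest severity
    susceptible = ranked[-n_tail:]   # highest severity
    return resistant, susceptible
```

Recording the returned ID lists together with the fraction used gives reviewers the exact bulk membership rather than a verbal description.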

If your long-term goal is mechanism, keep phenotype subcomponents (e.g., onset time vs final severity) as metadata; different loci may map to different components.

For a method refresher on SNP-index and Δ(SNP-index), see our guide, QTL-seq Approach: Accelerating Crop Trait Discovery via BSA-Seq.

To keep downstream analysis and reporting consistent, align your design to a standard bulk segregant analysis (BSA) workflow early (sample naming, bulk definitions, and metadata fields).

Bulk construction (N per bulk and balancing strategy)

Bulk size: Start with ~20–50 individuals per bulk (or more if the trait is noisy). (Takagi et al., 2013; doi:10.1111/tpj.12105)

Balancing strategy

Equalize DNA input per individual to avoid "dominant individuals."
Match bulks on non-trait covariates when feasible (tray/batch distribution).
Avoid relatedness artifacts (don't unintentionally over-sample one family cluster unless intended).

DNA handling

Use a consistent extraction method for all individuals.
Quantify accurately (fluorometric methods preferred).
Pool equimolarly and record calculations.
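
Recording the pooling calculation is easier when it is generated rather than typed by hand. The sketch below assumes each individual contributes the same DNA mass to the bulk and that concentrations come from a fluorometric reading in ng/µL. The function name, sample IDs, and the 100 ng target are hypothetical.

```python
def pooling_volumes(conc_ng_per_ul, mass_ng=100.0):
    """Return the volume (in µL) of each sample needed to add `mass_ng` of DNA.

    conc_ng_per_ul: dict of sample_id -> measured concentration (ng/µL)
    mass_ng:        DNA mass each individual contributes (assumed target)
    """
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = round(mass_ng / conc, 2)
    return volumes


# Example with made-up readings for two plants in the resistant bulk.
print(pooling_volumes({"R01": 50.0, "R02": 25.0}, mass_ng=100.0))
```

Saving the input concentrations alongside the computed volumes provides the pooling record that the checklist asks for.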

When bulks are ready, whole-genome sequencing is commonly used for QTL-seq.

Sequencing plan (platform-agnostic) and QC checkpoints

A platform-agnostic plan is defined by:

read length sufficient for robust mapping to the tomato reference,
coverage sufficient to stabilize allele-frequency estimates,
duplication and contamination controlled to avoid distorted depth.

Coverage planning (practical guidance)

Good reference + adequate SNP density: moderate depth can work well.
Fragmented reference, high repeats, or uneven capture: plan higher depth to retain usable SNPs after filtering.

QC checkpoints you must capture

Raw read quality (per-base quality, adapter content)
Mapping rate and properly paired reads
Depth distribution (uniformity across genome)
Duplicate rate (library complexity proxy)
Post-filter SNP counts and missingness

Many groups package the lab + QC reporting under an NGS deliverable to ensure consistent documentation.

Bioinformatics and statistics (key outputs)

This is where a mechanism-focused PI can win reviewer trust: by presenting outputs as QC gates plus statistical logic, not just "we ran a pipeline."

Minimum reporting checklist (for reviewers)

You can copy/paste this block into Methods or Supplementary Materials.

Bulk N and extreme selection rule (percentile thresholds; phenotype definition)
Population type (F₂/BC/RIL) and reference genome version/build
Sequencing metrics: total reads, mapping %, duplication %, coverage (mean/median and % genome above threshold)
Variant filtering thresholds: min depth per bulk, mapping/base quality cutoffs, max depth cap, missingness filters
Window smoothing settings: window size and step (or smoothing method), and rationale
Peak boundary rule: how interval boundaries were chosen (baseline return + consecutive windows)
Candidate scoring rubric: criteria and weighting (variant impact, consistency, annotation, optional expression evidence)
Reproducibility package: VCF, window statistics table, plotting scripts/parameters

QC + mapping summary (what was checked and why)

Goal: Confirm both bulks are comparable and that mapping/coverage artifacts won't masquerade as QTL peaks.

QC gates (recommended starting thresholds)

Raw reads: ≥80–90% bases at Q30+ (platform dependent)
Adapter trimming: residual adapter content near zero
Mapping rate: often ≥90% for a good tomato reference; investigate if much lower
Properly paired reads: high fraction expected for paired-end
Duplicate rate: ideally low-to-moderate; unexpected high duplication suggests low library complexity
Coverage uniformity: avoid "coverage deserts" that differ systematically between bulks

What a reviewer-friendly mapping table includes

total reads per bulk, % passing filter
mapping %, properly paired %
mean and median depth, % genome ≥10× (or your chosen threshold)
duplication rate
insert size summary (for PE data)
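
The depth columns of such a table can be derived from per-position depth values (for example, as exported by a depth-reporting tool). This is a minimal sketch, assuming the depths are already loaded as a list of integers for one bulk; the function name and the 10× threshold are illustrative.

```python
from statistics import mean, median


def depth_summary(depths, threshold=10):
    """Summarize per-position depth for one bulk.

    depths:    list of per-position read depths (integers)
    threshold: depth cutoff used for the "% genome >= threshold" column
    """
    if not depths:
        raise ValueError("no depth values supplied")
    covered = sum(1 for d in depths if d >= threshold)
    return {
        "mean": mean(depths),
        "median": median(depths),
        "pct_ge_threshold": 100.0 * covered / len(depths),
    }
```

Running the same function on both bulks and placing the results side by side makes systematic depth differences between bulks easy to spot.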

A clean, review-ready QC section often comes from an analysis report template aligned to genomic data analysis that preserves parameters and intermediate metrics.

Variant calling + filtering (stabilizing SNP-index)

Why filtering is not optional: Δ(SNP-index) depends on allele-frequency estimates. Low depth, low quality, multi-mapping, and strand bias can inflate noise and create "ghost peaks."

Typical workflow

1. Align reads to reference

2. Mark duplicates (or at minimum quantify)

3. Call SNPs jointly across bulks

4. Apply stringent filtering:

minimum depth per bulk per site (often ≥8–10 reads)
maximum depth cap (to avoid repeats/copy-number artifacts)
minimum base quality and mapping quality
remove sites with severe allele balance anomalies
optionally remove indels to keep a clean SNP-index model
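
The depth, mapping-quality, and SNP-only filters listed above can be expressed as a single predicate. The sketch below assumes each site has been parsed into a plain dictionary; the field names (`ref`, `alt`, `mapq`, `R_ref_reads`, and so on) and the default thresholds are hypothetical. Base-quality and allele-balance filters are not implemented here.

```python
def passes_filters(site, min_depth=8, max_depth=100, min_mapq=30, snps_only=True):
    """Return True if a site passes the illustrative depth/quality filters.

    site: dict with keys ref, alt, mapq, R_ref_reads, R_alt_reads,
          S_ref_reads, S_alt_reads (R = resistant bulk, S = susceptible bulk)
    """
    # Optionally keep only single-base substitutions (drops indels).
    if snps_only and (len(site["ref"]) != 1 or len(site["alt"]) != 1):
        return False
    if site["mapq"] < min_mapq:
        return False
    # Depth must be within [min_depth, max_depth] in BOTH bulks.
    for bulk in ("R", "S"):
        depth = site[f"{bulk}_ref_reads"] + site[f"{bulk}_alt_reads"]
        if not (min_depth <= depth <= max_depth):
            return False
    return True
```

Requiring the depth window in both bulks, rather than on the combined depth, prevents one well-covered bulk from masking a poorly covered one.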

SNP-index (conceptual)

SNP-index: fraction of reads supporting the alternative allele at a site in one bulk
Δ(SNP-index): SNP-index of one bulk minus that of the other at the same site, tracked across the genome (fix the direction, e.g., resistant minus susceptible, and use it consistently)
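
As a concrete illustration of the two definitions, the sketch below computes both quantities from read counts at one site. The direction (resistant minus susceptible) and the function names are assumptions made for this example.

```python
def snp_index(alt_reads, ref_reads):
    """Fraction of reads supporting the alternative allele in one bulk."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("site has no coverage")
    return alt_reads / depth


def delta_snp_index(r_alt, r_ref, s_alt, s_ref):
    """Delta(SNP-index), defined here as resistant bulk minus susceptible bulk."""
    return snp_index(r_alt, r_ref) - snp_index(s_alt, s_ref)


# Resistant bulk: 18 of 20 reads carry the alternative allele (0.90).
# Susceptible bulk: 5 of 20 reads carry it (0.25).
print(delta_snp_index(18, 2, 5, 15))
```

A value near zero means both bulks carry the allele at similar frequency; values toward +1 or -1 indicate enrichment in one bulk.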

To keep the parameter record auditable, document the variant calling configuration (software versions, exact filters, SNP retention counts).
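
One lightweight way to keep that record is a small structured file written at the end of filtering. The sketch below is an assumption about how such a record might look: the field names, version placeholders, and SNP counts are made up for illustration.

```python
import json


def build_filter_record(tool_versions, filters, snps_before, snps_after):
    """Bundle versions, filter thresholds, and SNP retention into one record."""
    if snps_before <= 0 or not 0 <= snps_after <= snps_before:
        raise ValueError("SNP counts are inconsistent")
    return {
        "tool_versions": tool_versions,
        "filters": filters,
        "snps_before": snps_before,
        "snps_after": snps_after,
        "retention_pct": round(100 * snps_after / snps_before, 2),
    }


record = build_filter_record(
    tool_versions={"aligner": "x.y.z", "caller": "x.y.z"},  # placeholders
    filters={"min_depth_per_bulk": 8, "max_depth": 100, "min_mapq": 30},
    snps_before=1_000_000,  # made-up count
    snps_after=250_000,     # made-up count
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Keeping one such record per analysis iteration makes it straightforward to explain why a peak changed between runs.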


Δ(SNP-index) plot and candidate interval selection logic

Core output: A genome-wide Δ(SNP-index) profile, typically smoothed in sliding windows.
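
Sliding-window smoothing can be sketched in a few lines. The version below assumes SNP positions on a single chromosome are sorted in ascending order and uses an unweighted mean per half-open window; the function name, window size, step, and minimum SNP count are parameters chosen by the analyst, not values from the study.

```python
from bisect import bisect_left


def sliding_window_mean(positions, values, window, step, min_snps=1):
    """Average per-SNP values in sliding windows along one chromosome.

    positions: sorted SNP coordinates (bp)
    values:    per-SNP statistic (e.g., delta SNP-index), same order
    Returns a list of (start, end, n_snps, mean_value); windows with fewer
    than `min_snps` SNPs are skipped rather than reported as zero.
    """
    results = []
    if not positions:
        return results
    start = 0
    while start <= positions[-1]:
        end = start + window
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, end)  # window is [start, end)
        n = hi - lo
        if n >= min_snps:
            results.append((start, end, n, sum(values[lo:hi]) / n))
        start += step
    return results
```

Skipping sparse windows instead of plotting them avoids drawing apparent peaks or dips from one or two SNPs.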

Two widely used framings for NGS-BSA/QTL-seq are:

Δ(SNP-index) method (often paired with simulation- or empirical-based confidence)
G′ / smoothed G-statistic method (statistics-based inference)

QTLseqr implements both approaches and many groups use them as complementary checks. (Mansfeld & Grumet, 2018; doi:10.3835/plantgenome2018.01.0006)
A foundational statistical treatment of NGS-BSA variance sources is also available. (Magwene et al., 2011; doi:10.1371/journal.pcbi.1002255)
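
For readers who want to see what the per-SNP G statistic measures, the sketch below computes the standard likelihood-ratio G for a 2×2 table of allele read counts (bulk × allele). G′ is a smoothed version of this per-SNP value; the smoothing step is not shown, and the function name and argument order are assumptions made for this example.

```python
import math


def g_statistic(r_ref, r_alt, s_ref, s_alt):
    """G = 2 * sum(observed * ln(observed / expected)) over a 2x2 count table.

    Rows are bulks (resistant, susceptible); columns are alleles (ref, alt).
    Expected counts come from the row and column totals under the null
    hypothesis that allele frequency is the same in both bulks.
    """
    observed = [[r_ref, r_alt], [s_ref, s_alt]]
    total = r_ref + r_alt + s_ref + s_alt
    if total == 0:
        raise ValueError("site has no coverage")
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += o * math.log(o / expected)
    return 2.0 * g
```

Identical allele frequencies in both bulks give G = 0, and larger values indicate a stronger departure from equal frequencies.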

How we selected an interval (defensible boundaries)

1. Identify the primary region where Δ(SNP-index) exceeds the confidence band (or empirical threshold).

2. Expand boundaries until signal returns to baseline and remains stable over multiple adjacent windows.

3. Inspect local coverage and mappability; treat low-coverage/repeat-rich regions cautiously.

4. Optionally confirm with an alternate statistic (e.g., G′) to ensure the same region emerges.
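
The first two steps above can be sketched as a run-length rule over the window table. This is a simplified illustration, assuming windows are in genomic order on one chromosome and that the peak is positive in the chosen direction; the function name and the `min_consecutive` parameter are hypothetical. Coverage inspection and the alternate-statistic check remain manual steps.

```python
def call_intervals(windows, threshold, min_consecutive=3):
    """Find runs of consecutive windows whose value exceeds `threshold`.

    windows: list of (start, end, value) in genomic order
    Returns a list of (interval_start, interval_end, peak_value). Runs
    shorter than `min_consecutive` windows are treated as spikes and dropped.
    """
    intervals = []
    run = []

    def close_run():
        if len(run) >= min_consecutive:
            intervals.append((run[0][0], run[-1][1], max(w[2] for w in run)))

    for w in windows:
        if w[2] > threshold:
            run.append(w)
        else:
            close_run()
            run = []
    close_run()  # handle a run that reaches the chromosome end
    return intervals
```

Because the boundaries come from an explicit rule, the same interval can be regenerated when filters or window settings change, and any shift can be reported.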

Figure 2. Window-smoothed Δ(SNP-index) peak with confidence band
Example Δ(SNP-index) across chromosome position after sliding-window smoothing (window size/step defined in Methods). The shaded band represents a confidence band derived from simulation or an empirical genome-wide threshold, used to separate sustained signal from baseline noise. The candidate interval is defined by consecutive windows above threshold plus a boundary rule (return-to-baseline), rather than a single-point maximum.
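
A simulation-derived band of the kind described in the caption can be approximated as follows. This is a minimal per-SNP sketch under stated assumptions: an F₂ population with no linked QTL (each allele drawn with probability 0.5), equal bulk sizes, and a fixed read depth in both bulks. The function name and defaults are hypothetical, and production tools typically repeat this across observed depths and window sizes.

```python
import random


def simulate_null_band(bulk_size, depth, n_rep=2000, alpha=0.05, seed=1):
    """Simulate the null distribution of delta SNP-index at one site.

    Returns (lower, upper) empirical quantiles at alpha/2 and 1 - alpha/2.
    """
    rng = random.Random(seed)

    def one_bulk_index():
        # Sample 2 alleles per diploid individual, each 'alt' with p = 0.5.
        alt_alleles = sum(rng.random() < 0.5 for _ in range(2 * bulk_size))
        p_alt = alt_alleles / (2 * bulk_size)
        # Sample reads from the bulk at the given depth.
        alt_reads = sum(rng.random() < p_alt for _ in range(depth))
        return alt_reads / depth

    deltas = sorted(one_bulk_index() - one_bulk_index() for _ in range(n_rep))
    lower = deltas[int(n_rep * alpha / 2)]
    upper = deltas[int(n_rep * (1 - alpha / 2)) - 1]
    return lower, upper
```

Comparing bands at different depths, for example `simulate_null_band(30, 10)` against `simulate_null_band(30, 100)`, shows how low coverage widens the noise envelope that a real peak must clear.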

Candidate gene prioritization (annotation + plausibility + structured evidence)

Once the interval is set, the interval-to-candidate step can be either subjective ("this looks like a resistance gene") or review-proof (a structured evidence chain). For a mechanism-focused PI, a transparent rubric usually makes the difference between the two.

Evidence components

Variant impact: predicted high/medium impact variants (stop-gain, splice-site, damaging missense)
Bulk consistency: allele-frequency direction matches expected enrichment across the interval
Functional annotation: plausible defense-related domains/pathways (kept general)
Expression evidence (optional): if you already have RNA-seq, use expression as supporting evidence to prioritize candidates (RUO mechanistic context)

If you plan to incorporate expression as orthogonal support, pair the interval with RNA-seq transcriptome analysis as a structured add-on (not a replacement for mapping).

Results interpretation (from peak to action)

This section translates outputs into actionable next steps—candidate ranking and marker planning—while making uncertainty explicit.

Candidate interval size and boundary rationale

A QTL-seq interval is a resolution-limited estimate shaped by:

recombination density in your population,
QTL effect size,
bulk size and phenotype purity,
sequencing depth and SNP density,
coverage/mapping artifacts.

How to describe interval size in a manuscript

Provide interval coordinates (reference build), peak position, and width.
Report number of genes/gene models inside the interval.
Explain boundary logic (e.g., "plateau above confidence band across X Mb; boundaries chosen at return to baseline across Y consecutive windows").
Cite QC evidence supporting stability (coverage uniformity, SNP density).

If the interval remains large (hundreds of genes), treat it as guidance for the next stage: add recombinants and genotyping to refine.

To tighten a region after the first pass, targeted SNP fine mapping can convert a broad interval into recombination-defined candidate segments.

From interval to candidate shortlist (kept general, reviewer-defensible)

In tomato bacterial wilt resistance research, stable loci are often followed by marker development and independent confirmation steps, which is why the "peak-to-marker" bridge matters. (Yeon et al., 2022; doi:10.3390/plants11172223)

Shortlist approach (example rubric)
We build a candidate table with columns:

Anonymized Gene ID
Variant(s) + predicted impact
Bulk allele-frequency directionality
Annotation note (domain/pathway)
Optional expression note (if available)
Priority score (explicit weighting)
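
A priority score with explicit weighting can be as simple as a weighted average. In the sketch below, the weights, the 0 to 1 evidence scale, and the gene labels are illustrative assumptions rather than the rubric used in any specific project. Missing optional evidence (such as expression) is excluded and the remaining weights are renormalized.

```python
# Illustrative weights only; a real rubric should justify each value.
WEIGHTS = {
    "variant_impact": 0.4,
    "bulk_consistency": 0.3,
    "annotation": 0.2,
    "expression": 0.1,  # optional evidence
}


def priority_score(evidence, weights=WEIGHTS):
    """Weighted mean of evidence scores (each on a 0-1 scale).

    Components that are absent or None are skipped, and the remaining
    weights are renormalized so scores stay comparable across genes.
    """
    used = {k: w for k, w in weights.items() if evidence.get(k) is not None}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored evidence supplied")
    return sum(evidence[k] * w for k, w in used.items()) / total_weight


candidates = {
    "Gene_A": {"variant_impact": 1.0, "bulk_consistency": 0.8, "annotation": 0.5, "expression": None},
    "Gene_B": {"variant_impact": 0.3, "bulk_consistency": 0.8, "annotation": 0.5, "expression": 0.5},
}
ranked = sorted(candidates, key=lambda g: priority_score(candidates[g]), reverse=True)
print(ranked)
```

Publishing the weights with the table lets reviewers re-rank candidates under their own assumptions instead of questioning an opaque ordering.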

If you bring in expression support, standardize outputs via transcriptomic data analysis so candidate ranking remains transparent and reproducible.

Figure 3. Example candidate prioritization table with explicit weighting
Anonymized example only. A four-row candidate table illustrates how a transparent scoring rubric can combine (i) variant impact (e.g., stop-gain vs missense), (ii) bulk consistency across the peak region, (iii) optional expression hints, and (iv) pathway/domain notes into a weighted priority score. The purpose is to show the evidence chain reviewers expect—even when the final gene identifiers differ by project.

Follow-up plan

A reviewer-friendly plan separates genetic confirmation from mechanistic follow-up.

A) RUO genotyping confirmation (fast, scalable)

Propose a small marker set across the peak region.
Screen:
the original population (sanity check),
an independent segregating population (transferability),
broader panels/materials (generalization under RUO).
Use recombinants to narrow the region.

B) Mechanistic follow-up (PI-focused, publishable)

Select a small number of candidates for deeper assays.
Collect supporting evidence:
condition-relevant expression changes,
pathway-level consistency with known defense mechanisms,
haplotype association in broader panels (if available).

C) Reproducibility package

Provide versioned pipeline details, parameters, and intermediate outputs (VCF, window statistics, plots, candidate table, marker notes).

Decision summary: why this approach won (time and project ROI)

For a detailed timeline/ROI comparison, see linkage mapping vs QTL-seq.

Time saved vs classical linkage mapping

Compared with classical linkage mapping workflows that require iterative marker development and repeated genotyping rounds, QTL-seq shifts effort toward:

one high-throughput sequencing batch (two bulks, sometimes plus parents),
a single integrated analysis pass,
rapid generation of dense variants that feed fine mapping and marker planning.

The original QTL-seq framing emphasized speed by resequencing bulked extremes rather than genotyping many markers across all individuals. (Takagi et al., 2013; doi:10.1111/tpj.12105)

Reduced labor and faster go/no-go

For a mechanism-driven lab, the go/no-go question is often: Do we have a credible locus worth mechanistic investment? QTL-seq accelerates that decision by producing:

a genome-wide scan with clear peaks (or an honest "no peak" outcome),
an interval with rational boundaries and QC support,
a structured shortlist,
marker notes oriented to RUO genotyping confirmation.

Next-step roadmap (if results are strong, moderate, or messy)

If results are strong (single major peak)

Proceed to RUO genotyping confirmation and recombinant-based refinement.
Start mechanistic follow-up on top candidates.

If results are moderate (several peaks)

Check phenotype purity and batch effects.
Consider repeating with tighter extremes or larger bulks.
Cross-check with an alternate statistic (Δ(SNP-index) vs G′).

If results are messy (no clear peak)

Treat it as design feedback:
phenotype too noisy or extremes too weak,
insufficient SNP density or mapping issues,
bulk imbalance (DNA pooling bias).
Redesign: increase population size, refine phenotyping, adjust depth.

For re-analysis and reproducibility support, integrate a standardized bioinformatics services workflow and keep parameter records consistent across iterations.

When to use QTL-seq (and when not to)

Use QTL-seq when

You have a segregating population and can phenotype reliably.
Extremes are clear enough to form high-purity bulks.
You suspect at least one locus has moderate-to-large effect.
You need a fast, dense scan to justify downstream fine mapping.

Consider alternatives or a two-stage strategy when

The trait is highly polygenic with shallow phenotype separation.
Phenotyping is dominated by environment/batch with limited replication.
The reference genome is too fragmented/repetitive for confident mapping.
You need effect-size estimates across many loci (GWAS or full mapping may be better).

One-screen decision tool (mini-table + decision tree)

Mini-table: choose the right approach (RUO)

Your situation	QTL-seq fit	Expected RUO outputs	Primary risks	Mitigation
Clear extremes, likely major locus	High	Peak interval, candidate rubric, marker notes for RUO confirmation	Reference bias, local repeats	Strict depth caps + coverage checks
Moderate extremes, mixed effect sizes	Medium	Multiple peaks or broader intervals	Phenotype noise → false negatives	Increase N, tighten extremes, replicate phenotyping
Highly polygenic, weak separation	Low	Flat/noisy profile	Underpowered allele-frequency shifts	Consider GWAS, larger populations, multi-stage mapping
Fragmented reference / repeats	Medium–Low	Gaps, unstable peaks	Mapping artifacts	Mask repeats, adjust max-depth cap, consider improved reference

Simple decision tree

1. Can you define extremes confidently?

No → improve phenotyping or increase population size.
Yes → go to (2).

2. Is SNP density + mapping quality adequate genome-wide?

No → adjust sequencing/filters/reference strategy.
Yes → go to (3).

3. Do you observe sustained peak(s) above confidence threshold?

No → revisit phenotype purity, depth, and filters; consider alternate statistics.
Yes → define interval boundaries + build candidate rubric + prepare marker notes for RUO genotyping confirmation.
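
The same tree can be encoded as a small function so the go/no-go logic is recorded with the project. This is a direct transcription of the three questions above; the function and argument names are hypothetical.

```python
def next_step(extremes_defined, snp_density_ok, sustained_peak):
    """Return the recommended next action from the three yes/no checks."""
    if not extremes_defined:
        return "Improve phenotyping or increase population size."
    if not snp_density_ok:
        return "Adjust sequencing, filters, or reference strategy."
    if not sustained_peak:
        return ("Revisit phenotype purity, depth, and filters; "
                "consider alternate statistics.")
    return ("Define interval boundaries, build the candidate rubric, and "
            "prepare marker notes for RUO genotyping confirmation.")


print(next_step(extremes_defined=True, snp_density_ok=True, sustained_peak=False))
```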

QC and troubleshooting (actionable thresholds + symptom→cause→fix)

QC gate table (metric → expected → what it protects you from)

QC metric	Practical expectation (typical starting point)	If it fails, common consequence
Bulk phenotype purity	Extremes clearly separated; minimal intermediates	Peaks flatten; false negatives
DNA pooling balance	Equimolar input per individual	Artificial allele-frequency distortion
Mapping rate	Often ≥90% with good reference	Missing SNPs; reference bias
Proper pairing	High fraction for PE libraries	Uneven depth; structural artifacts
Duplicate rate	Low-to-moderate; investigate if high	Reduced effective depth; noisy SNP-index
Per-site depth (post-filter)	Commonly ≥8–10 per bulk per SNP	Spiky Δ(SNP-index); unstable peaks
SNP density after filters	Sufficient genome-wide density	Underpowered scan; gaps

Troubleshooting table (symptom → likely cause → fix)

Symptom	Likely cause	Fix (fastest first)
No peak anywhere	Phenotype separation too weak; bulks not extreme	Re-score; tighten extremes; increase population size
Many small peaks	Batch effects; inconsistent scoring	Standardize conditions; replicate phenotyping block
Single sharp spike in one window	Low depth/repeat region; weak filtering	Raise min depth; cap max depth; exclude repeats; verify coverage
Peak shifts with window size	Underpowered or uneven SNP density	Increase depth or bulk size; confirm with G′
Candidate list huge	Resolution limited by recombination	Add recombinants; targeted genotyping; fine mapping

For submission requirements and metadata structure, follow the sample submission guidelines.

Deliverables checklist (what you should expect)

A complete QTL-seq package for RUO research typically includes:

QC summary tables for both bulks (raw + mapped)
Filtered SNP set (VCF) + filtering report
Window statistics table + plotting parameters
Genome-wide Δ(SNP-index) plots (and optional alternate statistic plots)
Peak interval coordinates + explicit boundary rationale
Candidate table with evidence columns + scoring rubric
Marker panel notes for RUO genotyping confirmation (coordinates, alleles, assay design notes)
Reproducibility package (software versions, thresholds, scripts)

FAQ

How many individuals per bulk do I need?

A common starting point is ~20–50 per bulk in many plant QTL-seq designs, but noisy traits and smaller effect sizes typically benefit most from increasing bulk size and improving phenotyping. (Takagi et al., 2013; doi:10.1111/tpj.12105)

Do I need to sequence the parents?

Not strictly, but parental data often improves allele-direction interpretation and candidate ranking, especially when multiple resistance sources may exist.

What's the most common reason QTL-seq underperforms?

Phenotype noise and weak extreme separation. If bulks are not truly extreme, increased sequencing rarely rescues a flat signal.

How do I choose window size for smoothing Δ(SNP-index)?

Window size trades resolution vs stability. If peaks shift dramatically with window settings, the data may be underpowered or uneven in SNP density/coverage; consider depth increases, stronger filters, or alternate statistics.

Δ(SNP-index) vs G′—which should I report?

Many studies treat them as complementary: Δ(SNP-index) is intuitive for directionality, while G′ provides a statistics-based inference framework. (Mansfeld & Grumet, 2018; doi:10.3835/plantgenome2018.01.0006)

How do I keep candidate prioritization from sounding subjective?

Use a transparent rubric (variant impact + bulk consistency + annotation + optional expression support) and show the weighting.

Can I integrate RNA-seq evidence if I already have data?

Yes—use expression as supporting evidence to rank candidates within the mapped interval (RUO mechanism support), not as a substitute for mapping.

What if I get two strong peaks?

Confirm each peak using RUO genotyping confirmation in an independent population or panel, then refine with recombinants and targeted genotyping.

How do I move from interval to breeder-friendly markers?

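Start from high-confidence SNPs inside the confirmed interval, design a small set of genotyping assays around them, and check those assays in independent material before wider use. Marker development and marker-assisted selection (MAS) studies in tomato illustrate how stable QTL regions can be converted into robust genotyping assays for screening breeding and research materials (RUO). (Yeon et al., 2022; doi:10.3390/plants11172223)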

References

Takagi H, Abe A, Yoshida K, et al. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. The Plant Journal. 2013. doi: 10.1111/tpj.12105
Mansfeld BN, Grumet R. QTLseqr: An R Package for Bulk Segregant Analysis with Next-Generation Sequencing. The Plant Genome. 2018. doi: 10.3835/plantgenome2018.01.0006
Zhang J, Panthee DR. PyBSASeq: a simple and effective algorithm for bulked segregant analysis with whole-genome sequencing data. BMC Bioinformatics. 2020. doi: 10.1186/s12859-020-3435-8
Magwene PM, Willis JH, Kelly JK. The statistics of bulk segregant analysis using next generation sequencing. PLoS Computational Biology. 2011. doi: 10.1371/journal.pcbi.1002255
Michelmore RW, Paran I, Kesseli RV. Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proceedings of the National Academy of Sciences of the United States of America. 1991. doi: 10.1073/pnas.88.21.9828
Siddique MI, Silverman E, Louws F, Panthee DR. Quantitative trait loci mapping for bacterial wilt resistance and plant height in tomatoes. Plants. 2024. doi: 10.3390/plants13060876
Wang JF, Ho FI, Truong HTH, et al. Identification of major QTLs associated with stable resistance of tomato cultivar 'Hawaii 7996' to Ralstonia solanacearum. Euphytica. 2013. doi: 10.1007/s10681-012-0830-x
Yeon J, Le NT, Sim SC. Assessment of temperature-independent resistance against bacterial wilt using major QTL in cultivated tomato (Solanum lycopersicum L.). Plants. 2022. doi: 10.3390/plants11172223

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.