AAV Genome Integrity Metrics That Predict Downstream Success and How to Report Them

AAV genome integrity defines how closely encapsidated genomes match the intended construct in length, structure, and junction correctness. When integrity drifts, expression becomes erratic, replicate variability increases, and cross‑batch conclusions break. The practical outcome is simple: without a common, evidence‑based integrity framework, teams can't compare vendors or lots with confidence, nor standardize internal methods and reports. This guide lays out the minimum metric set, fit‑for‑purpose assays, and a reporting template that turns integrity signals into decision‑ready outputs.
Key takeaways
- A compact metric set—full‑length fraction, effective genome yield, breakpoint landscape, junction correctness, and contamination‑aware context—answers most comparability questions.
- Adopt a platform‑agnostic evidence stance and, where structure matters, run short‑ and long‑read workflows in parallel with defined controls and replicates.
- Express thresholds as project‑defined targets with failure triggers and confirmatory actions; avoid universal hard cutoffs.
- Standardize an integrity reporting template with metric cards, aligned coverage, breakpoint summaries, a junction table, an impurity profile, and an evidence appendix.
- Use uncertainty transparently: include replicate counts, %CV, and 95% confidence intervals; grade evidence when orthogonals concur.
Define Genome Integrity in AAV and Why "Success" Depends on It
Genome integrity is the extent to which the packaged genome matches the intended design end‑to‑end, including ITR continuity and correctness at critical junctions (promoter boundaries, splice sites, coding junctions, and polyA). In practice, integrity governs reproducibility. Lots with similar titers but different integrity profiles can behave differently in expression assays, leading to misleading vendor comparisons and unstable internal baselines.
What Counts as "Intact" in Practice
Operationally, "intact" means a molecule spans the construct from one ITR to the other with no truncations, large deletions, or rearrangements, and with correct sequence at predefined key junctions. Methods should classify molecules or reads using explicit rules (alignment identity, minimum span across ITR‑adjacent windows) and then validate with targeted checks at high‑value junctions. Because algorithms and protocols differ, evidence notes and parameter versions must accompany any "intact" count to keep results comparable across teams.
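As an illustration of such explicit rules, the sketch below labels a single aligned read. The identity floor and ITR‑adjacent window size are assumed placeholders for a project's versioned parameter set, not standards.

```python
# Minimal sketch of a rule-based "intact" read classifier.
# Thresholds are illustrative; each program should version its own set.
MIN_IDENTITY = 0.95   # assumed alignment-identity floor
ITR_WINDOW = 150      # bp window adjacent to each ITR the read must reach

def classify_read(aln_start, aln_end, identity, construct_len,
                  min_identity=MIN_IDENTITY, itr_window=ITR_WINDOW):
    """Label one aligned read as 'full-length' or 'partial'."""
    spans_left = aln_start <= itr_window
    spans_right = aln_end >= construct_len - itr_window
    if identity >= min_identity and spans_left and spans_right:
        return "full-length"
    return "partial"
```

Versioning these parameters alongside the counts is what keeps "intact" comparable across teams.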
Downstream "Success" as Decision Outcomes
This guide ties AAV genome integrity to two primary outcomes: 1) lot‑to‑lot and vendor‑to‑vendor comparability for selection, and 2) internal method and report standardization so future runs align to the same dictionary. Troubleshooting remains optional: when metrics land in a warning zone, the framework prescribes confirmatory actions and next steps without turning the entire effort into a fault‑finding exercise.
Common Failure Modes That Distort Results
Truncations often concentrate near ITR‑adjacent regions; internal deletions remove critical elements; rearrangements (inversions, duplications) disrupt continuity; and incorrect junctions at promoters or splice sites degrade expression predictability. For a broader primer connecting integrity to outcomes, with practical examples and context for why integrity must be measured with consistent definitions, see the overview: AAV sequencing principles, applications, and therapeutic case studies.

Core AAV Genome Integrity Metrics and What Each One Really Tells You
A small, consistently defined metric set explains most "why did this result change" questions and enables cross‑batch comparability. The emphasis here is on definitions that independent reviewers can reproduce from evidence attachments.
Full‑Length Fraction and Effective Genome Yield
Full‑length fraction is the proportion of molecules that span the intended construct from ITR to ITR without structural breaks. Long‑read sequencing classifies reads into full‑length versus partial based on end‑to‑end alignment and identity thresholds, while digital PCR‑based designs can infer intactness with two distal targets and appropriate statistical modeling. Effective genome yield converts nominal genome titer into the absolute quantity of intact molecules (e.g., vg/mL × full‑length fraction). This adjustment often explains discrepancies when two lots have similar titers but diverge in experimental performance.
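The titer‑to‑intact conversion is simple arithmetic; a minimal sketch (with illustrative numbers) makes the adjustment explicit:

```python
# Effective genome yield: nominal titer scaled by full-length fraction
# (the vg/mL x fraction relationship described above).
def effective_genome_yield(titer_vg_per_ml, full_length_fraction):
    """Absolute quantity of intact genomes per mL."""
    return titer_vg_per_ml * full_length_fraction

# Two lots with identical nominal titers can differ in intact content:
lot_a = effective_genome_yield(1e13, 0.76)  # ~7.6e12 intact vg/mL
lot_b = effective_genome_yield(1e13, 0.55)  # ~5.5e12 intact vg/mL
```

The roughly 28% gap between these hypothetical lots would be invisible to a titer‑only comparison.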
A simple numerical illustration clarifies how project‑defined bands guide actions: suppose a program sets a full‑length fraction target band at 70–80% with a failure trigger when the point estimate and its 95% CI both fall below 68%. Lot A measures 76% (95% CI 73–79%) with %CV=6% across n=3 technical replicates—Proceed. Lot B measures 69% (95% CI 63–74%) with %CV=12%—Confirm because the CI crosses the trigger; re‑sequence a subset with long‑reads and verify total genome concentration with digital PCR. If a confirmatory long‑read analysis returns 71% with better precision and the impurity profile is within budget, the lot can move to Proceed; otherwise, Fix by reviewing library parameters or purification.
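The band logic in this example can be written as a small decision function. The band and trigger values below are the illustrative ones from the example, not recommended defaults.

```python
# Sketch of the project-defined band logic from the worked example.
TARGET_LOW, TARGET_HIGH, FAIL_TRIGGER = 0.70, 0.80, 0.68

def flag_full_length(point, ci_low, ci_high):
    """Map a full-length fraction estimate and its 95% CI to an action."""
    if point < FAIL_TRIGGER and ci_high < FAIL_TRIGGER:
        return "Fix"          # estimate and entire CI below the trigger
    if ci_low < FAIL_TRIGGER:
        return "Confirm"      # CI crosses the failure trigger
    if TARGET_LOW <= point <= TARGET_HIGH:
        return "Proceed"
    return "Confirm"          # outside the band but not failing

flag_full_length(0.76, 0.73, 0.79)  # Lot A -> "Proceed"
flag_full_length(0.69, 0.63, 0.74)  # Lot B -> "Confirm"
```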
For a neutral overview of how analytical techniques map to AAV critical quality attributes and support comparability across assays, see a recent review of AAV vector characterization techniques and complementary QC white papers: AAV vector characterization techniques and CQAs (review).
Truncation and Breakpoint Landscape
The breakpoint landscape highlights where truncations cluster along the genome. Enrichment near ITR‑adjacent windows is commonly observed due to hairpin structures and replication stress, and it often correlates with batch‑specific preparation effects. Reporting should include position bins, event frequencies normalized to coverage, and hotspot flags. Recent analyses describe ITR subregions that show concentrated breakpoints, reinforcing the need to inspect these windows carefully and corroborate with long‑read evidence. Here's a short interpretive example: if two vendors show similar full‑length fractions (Lot V1: 75%; Lot V2: 74%) but V2 exhibits a strong hotspot upstream of the promoter while V1 does not, teams can anticipate higher variability in expression readouts for V2 and may mark V2 as Confirm pending targeted junction checks.
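A minimal sketch of the binning‑and‑flagging step follows. The bin size and fold‑change criterion are illustrative project parameters, and the median‑based baseline is an assumption, not a prescribed method.

```python
from collections import Counter

# Sketch: bin breakpoint positions, normalize by per-bin coverage, and
# flag bins above an assumed fold-change over the median bin rate.
BIN_SIZE, FOLD_CHANGE = 100, 3.0

def breakpoint_hotspots(breakpoints, coverage_per_bin):
    """Return {bin_index: normalized_rate} for bins above the flag threshold."""
    counts = Counter(pos // BIN_SIZE for pos in breakpoints)
    rates = {b: counts[b] / coverage_per_bin[b]
             for b in counts if coverage_per_bin.get(b, 0) > 0}
    if not rates:
        return {}
    median = sorted(rates.values())[len(rates) // 2]
    return {b: r for b, r in rates.items() if r >= FOLD_CHANGE * median}
```

Annotating flagged bins with their genomic features (ITR‑adjacent, promoter‑adjacent) is what turns the landscape into the interpretive signal used in the vendor example above.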
Rearrangement Signatures
Rearrangements—such as inversions and duplications—are most reliably detected by long‑read sequencing that preserves molecule‑level context. In practice, reports list event types, approximate positions, and estimated frequencies with uncertainty bounds. A recent single‑vector resolution study using high‑fidelity long‑reads detailed noncanonical configurations and demonstrated why read‑level continuity matters for interpreting structural variation: see the peer‑reviewed work on noncanonical rAAV configurations at single‑vector resolution (2024) for methodological context: molecular configuration analysis with HiFi long‑reads.
Junction Correctness for Key Regions
Junction correctness focuses on areas where small sequence errors can have outsized impacts: ITR junctions, promoter boundaries, splice donors/acceptors, coding junctions, and polyA sites. A pragmatic approach blends long‑read continuity across these regions with short‑read depth for SNVs/indels at low variant allele fractions. Because ITRs are structurally complex, targeted checks are often warranted.
Contamination‑Aware Integrity Interpretation
Contamination‑aware interpretation distinguishes intended genomes from impurity classes such as host cell DNA, plasmid backbone fragments, and helper sequences. Short‑read multi‑reference mapping quantifies impurity fractions, while orthogonal analytics such as AUC or mass photometry help contextualize genome integrity readouts with capsid content information. For a concise platform and workflow context that helps teams choose fit‑for‑purpose methods, see this technologies and workflows resource: AAV sequencing technologies, platforms, workflows, and applications.
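Given per‑read class assignments from multi‑reference mapping (an assumed upstream step), the impurity fractions themselves reduce to simple counting, as this sketch shows:

```python
from collections import Counter

# Sketch: impurity fractions from multi-reference mapping results.
# The class labels follow the impurity taxonomy in the text; the input
# (one best-hit reference class per read) comes from an assumed aligner step.
def impurity_profile(read_classes):
    """read_classes: iterable like ['vector', 'backbone', 'host', ...]"""
    counts = Counter(read_classes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

impurity_profile(["vector"] * 90 + ["backbone"] * 6 + ["host"] * 4)
# -> {'vector': 0.9, 'backbone': 0.06, 'host': 0.04}
```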
Measurement Strategies: Matching Assays to the Metric
Different integrity questions need different assays. A platform‑agnostic evidence stack prevents over‑calling and makes cross‑team comparisons possible. Where structural context is decision‑critical, running short‑ and long‑read workflows in parallel is recommended.
Short‑Read Mapping for Coverage and Small Variants
Short‑read sequencing provides high‑depth coverage to detect small variants and to infer truncation‑prone windows from coverage drops. It also enables multi‑reference mapping to quantify impurity classes. It struggles with palindromic ITRs and long‑range phasing, so results should be paired with structural evidence. A recent review summarizes where sequencing-based methods fit within broader AAV analytical toolkits and highlights common interpretation limits that affect assay selection and reporting comparability: review of sequencing-based methods for AAV characterization.
Long‑Read Strategies for Structural Context
Long‑read sequencing resolves molecule‑level continuity, ITR integrity, rearrangements, and breakpoint positions. High‑fidelity protocols emphasize consensus accuracy and library strategies that preserve end‑to‑end spans. Peer‑reviewed studies demonstrate how noncanonical rAAV forms are resolved at single‑vector resolution, providing templates for evidence‑grade structural calls; the 2024 work cited above is a good starting point.
Targeted Checks for Hard Regions
Hard regions include ITRs and specific regulatory junctions with high GC content or secondary structure. Targeted long‑read passes or validated Sanger checks help confirm suspicious junctions detected by global scans. A practical discussion of ITR workflow and analysis challenges makes the case for dedicated checks: ITR sequencing workflow, analysis challenges, and trends.
Controls and Replicates That Prevent False Integrity Calls
Controls and replicates convert interesting plots into trusted decisions. Orthogonal confirmations (digital PCR for absolute titers and intactness estimation, size‑based separations for fragmentation profiles, AUC/MP/SEC‑MALS for capsid content) should accompany NGS metrics. Reports should include replicate counts (n), %CV or %RSD, and 95% CIs alongside method notes. As a pragmatic norm for comparability studies, many teams target (when feasible) n≥3 technical replicates per assay, aim for %CV≤10% on key metrics, and grade evidence as High when orthogonal methods concur within predefined acceptance bands.
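The replicate statistics named here can be computed with a short helper. The t‑value lookup below is a small illustrative table for two‑sided 95% intervals at low degrees of freedom, appropriate for the small n typical of these studies.

```python
import statistics

# Sketch: replicate summary stats for a metric card (n, %CV, 95% CI).
# T95 holds two-sided 95% t critical values for df = 1..5 (illustrative subset).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def metric_card(replicates):
    n = len(replicates)
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)
    cv_pct = 100 * sd / mean
    half = T95[n - 1] * sd / n ** 0.5   # half-width of the 95% CI
    return {"n": n, "mean": mean, "cv_pct": round(cv_pct, 1),
            "ci95": (mean - half, mean + half)}

card = metric_card([0.74, 0.76, 0.78])  # three technical replicates
```

Note how wide a t‑based CI is at n=3: this is exactly the uncertainty that the Confirm path in the decision framework is designed to absorb.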

Reporting Integrity Metrics So Sponsors Can Compare Batches and Vendors
A reporting template that standardizes definitions, thresholds, and evidence attachments turns integrity into a comparable attribute rather than an anecdote.
Minimum Report Package
Every lot/vendor entry should include: 1) metric cards with values, units, 95% CI, %CV, and replicate counts; 2) an aligned coverage plot annotated with ITRs and other key features; 3) a breakpoint summary with position bins and normalized frequencies; 4) a junction correctness table for predefined regions with links to read evidence; and 5) an impurity profile with class taxonomy and interpretation notes. A concise methods and controls appendix captures assay parameters, control construct outcomes, and orthogonal confirmations.
A Metric Dictionary That Eliminates Ambiguity
A metric dictionary prevents silent drift in definitions. For each metric, specify counting rules (read‑level vs molecule‑level), normalization (per aligned read, per molecule, or per vg), how junction correctness is assessed, and how exceptions like ITR hairpins or GC bias are treated. Version each algorithm and parameter set so that numbers are reproducible months later.
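An illustrative dictionary entry (field names and values are assumptions, not a published schema) shows the level of detail that prevents silent drift:

```python
# Hypothetical metric-dictionary entry: counting rule, normalization,
# exception handling, and versioned parameters for reproducibility.
METRIC_DICTIONARY = {
    "full_length_fraction": {
        "version": "1.2.0",                    # bump on any rule change
        "counting": "molecule-level",          # vs read-level
        "normalization": "per molecule",       # vs per aligned read, per vg
        "params": {"min_identity": 0.95, "itr_window_bp": 150},
        "exceptions": "ITR hairpins masked; GC-bias correction applied",
    },
}
```

Storing the version string in every report row lets reviewers reproduce a number months later from the matching parameter set.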
How to Present Uncertainty Without Losing Trust
Trust increases when uncertainty is explicit. Report 95% confidence intervals, %CV across replicates, and evidence grades based on control agreement and the presence of orthogonal confirmations. When long‑ and short‑read conclusions disagree, flag the discordance and state the planned confirmation.
Batch‑to‑Batch Comparison Format
Comparison sheets work best as side‑by‑side tables using the metric dictionary's units and terms, with conditional highlighting for deviations beyond project‑defined action triggers. Consider a quick example: Vendor X submits three lots. X‑1: full‑length fraction 78% (95% CI 75–81%), no significant hotspots, junction table clean—Proceed. X‑2: full‑length fraction 72% (95% CI 66–77%) with a promoter‑adjacent hotspot—Confirm, add targeted junction validation and rerun long‑read subset. X‑3: full‑length fraction 64% (95% CI 60–68%) with elevated backbone impurities—Fix, review purification and library parameters before reconsideration. For broader terminology awareness around quality attributes (without entering clinical scope), this high‑level resource provides useful framing: viral vector suitability attributes in AAV and lentivirus (context only).

Interpreting Integrity Signals: Turning Metrics Into Next‑Step Decisions
Integrity interpretation should map directly to next actions using project-defined targets, failure triggers, and confirmatory steps. Instead of universal cutoffs, teams define acceptance bands and specify what triggers review.
For example, full-length fraction proceeds when the point estimate and its 95% CI stay within the program's target band; it moves to confirmatory testing when the CI straddles the lower boundary; and it enters a fix path when both estimate and CI fall below the failure trigger, typically prompting a long-read re-run and a digital PCR check of total genomes.
Breakpoint landscapes proceed when no enriched hotspots are detected in predefined critical windows; they move to confirm if a window shows enrichment above the project's fold-change criterion with supporting reads; and they trigger a fix when recurring hotspots implicate a reproducible library or purification issue.
Junction correctness proceeds when predefined key regions show no incorrect junctions above the program's variant fraction threshold; it moves to confirm with targeted assays when suspicious calls appear at low frequency; and it triggers a fix when high-impact misjoins are validated.
Contamination-aware interpretation proceeds when impurity classes stay under the program's budget with orthogonal agreement; it triggers a fix when budgets are exceeded or orthogonals disagree, in which case teams document the implicated impurity classes and process sources.
Decision Matrix: Proceed vs Fix vs Confirm
Proceed when all core metrics meet target bands with convergent evidence and stable %CV across replicates. Fix when a single root cause (e.g., a library parameter or purification step) plausibly explains deviations and confirmation verifies improvement. Confirm when metrics sit near boundaries or evidence conflicts; add targeted assays before committing to selection or remediation.
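One way to operationalize this matrix is a simple precedence rule over per‑metric flags. The precedence below (any Fix dominates; any Confirm blocks Proceed) is an assumption consistent with the text, not a prescribed policy.

```python
# Sketch of the Proceed/Fix/Confirm matrix: combine per-metric flags
# into one lot-level decision via an assumed precedence order.
def lot_decision(metric_flags):
    """metric_flags: e.g. {'full_length': 'Proceed', 'breakpoints': 'Confirm'}"""
    flags = set(metric_flags.values())
    if "Fix" in flags:
        return "Fix"
    if "Confirm" in flags:
        return "Confirm"
    return "Proceed"

lot_decision({"full_length": "Proceed", "breakpoints": "Confirm",
              "junctions": "Proceed", "impurities": "Proceed"})  # -> "Confirm"
```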
Common Misreads and How to Avoid Them
Misreads often arise from mapping artifacts in palindromic ITRs, GC‑bias in coverage plots, or conflating partial genomes with rearrangements. Dedicated long‑read inspection, targeted validations, and impurity‑aware classification prevent these pitfalls. For boundaries and secondary confirmation avenues beyond integrity, see this context page covering analysis domains adjacent to integrity assessment: AAV integration analysis: scope and considerations.
Practical Workflow: From Sample Intake to Deliverables
A stepwise, checkpointed workflow reduces rework and keeps integrity conclusions reproducible across runs and teams. The example below is platform‑agnostic and emphasizes versioning and evidence capture.
Pre‑Analytical Checkpoints
Log sample receipt with chain‑of‑custody; freeze a reference aliquot; document DNase/polishing policies; record any nuclease inactivation confirmations; register spike‑in controls and their acceptance criteria.
Sequencing and QC Checkpoints
Version library parameters; capture target depth and molecule count goals for short‑ and long‑read passes; record control construct outcomes; maintain run‑level QC manifests so replicates can be audited side‑by‑side.
Analysis and Review Checkpoints
Version pipelines and parameter files; compute metrics via the dictionary; attach evidence (coverage plots, breakpoint tables, junction reads); track orthogonal confirmations; assign evidence grades and flags (Proceed/Fix/Confirm) prior to report lock.
Deliverables and Handoff Checklist
Deliver the minimum report package plus a batch‑comparison sheet; archive raw data pointers and parameter manifests; include an evidence appendix with control results. As a cross‑vector analogy for reporting rigor and risk framing, this overview in another vector system can be informative when setting expectations for evidence attachments: lentiviral integration analysis methods and risks.

FAQ
- What Does "AAV Genome Integrity" Mean in Sequencing Reports?
- How Do You Estimate Full‑Length Fraction Without Overcalling It?
- Why Do Apparent Truncations Cluster Near ITR‑Adjacent Regions?
- How Can I Distinguish True Rearrangements From Mapping Artifacts?
- What Minimum Deliverables Should an Integrity Report Include for Cross‑Batch Comparison?
- When Should Findings Be Confirmed With a Secondary Method, and What Should Be Confirmed First?
Next steps
Teams can adapt the metric dictionary and reporting template in this guide to standardize vendor and batch comparability immediately; for neutral assistance aligning assays to metrics and packaging evidence for sponsors, CD Genomics can support projects for research use only (RUO).