Long-Read Integration Site Analysis Solution

Host-vector junction and insertion structure analysis overview

Explore how long-read sequencing, vector-aware analysis, and custom bioinformatics connect integration sites with junction evidence and insertion structure.

Integration Site Analysis Should Go Beyond a Coordinate List

A coordinate table can tell you where a candidate insertion is located. It does not always show how the vector connects to the host genome, whether the junction has enough read support, whether the inserted sequence is partial or rearranged, or whether the pattern changes across samples.

For many projects, those details are the reason the analysis matters. AAV, lentiviral, retroviral, CAR-T, and genome engineering samples may contain integration patterns that require more than standard alignment output. Your team may need to review the host-side breakpoint, vector-side breakpoint, orientation, nearby gene context, read support, and whether the insertion appears simple or complex.

Why genomic coordinates are only the starting point

Genomic coordinates are essential, but they are not the whole answer. A coordinate can identify a candidate integration site; the surrounding evidence determines how useful that site is for interpretation.

A stronger deliverable should connect the coordinate to:

Host genome location
Vector sequence or breakend position
Junction sequence
Read-level support
Nearby gene or genomic feature annotation
Sample-level distribution when multiple samples are compared
Insertion structure review when supported by the data

That is why our workflow is built around integration-aware interpretation, not only site listing.
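As an illustration of what connecting a coordinate to its evidence can look like in practice, here is a minimal sketch of a single site record. The class name, field names, and example values are hypothetical and are not a fixed deliverable schema; a real project would define the fields from the vector, host reference, and study design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntegrationSiteRecord:
    """One candidate integration site plus its evidence (hypothetical schema)."""
    sample: str
    chrom: str                  # host chromosome or contig
    host_pos: int               # host-side breakpoint, 1-based
    host_strand: str            # "+" or "-"
    vector_pos: Optional[int]   # vector breakend position, if resolved
    junction_seq: str           # sequence spanning the host-vector boundary
    supporting_reads: List[str] = field(default_factory=list)
    nearest_gene: Optional[str] = None
    structure_class: Optional[str] = None  # filled only when the data support it

    @property
    def read_support(self) -> int:
        # Count distinct read names so a read is never counted twice.
        return len(set(self.supporting_reads))


# Made-up example values, for illustration only.
site = IntegrationSiteRecord(
    sample="S1", chrom="chr1", host_pos=1_000_500, host_strand="+",
    vector_pos=145, junction_seq="ACGTTGCA",
    supporting_reads=["read_a", "read_b", "read_a"],
    nearest_gene="GENE_X",
)
print(site.chrom, site.host_pos, site.read_support)
```

Keeping read names rather than a bare count lets a reviewer trace any site back to the reads that support it.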

Why host-vector junctions need read-level evidence

Host-vector junctions are often the most informative part of an integration analysis. A junction sequence can show how the inserted vector or engineered sequence connects to the host genome.

Long-read sequencing can add value when reads span host-vector boundaries or provide longer context around an insertion. This helps with junction review, insertion structure assessment, and visualization. Short-read or targeted methods may still be valuable for site discovery depth, but they often need complementary analysis when the research question depends on insertion architecture.

When insertion structure matters more than site count

Some studies mainly need integration site distribution. Others need to understand insertion form. A project may need to review partial vector insertion, vector rearrangement, concatemer-like structure, multiple junctions, or complex insertion patterns.

In those cases, counting sites is not enough. The project needs a strategy that connects site discovery with insertion structure, read-level support, and annotation.

What Long Reads Add to Host-Vector Junction Analysis

Long-read sequencing is not a replacement for every integration site method. Its value is strongest when the question requires structural context.

Short-read, LAM-PCR, targeted PCR-based, and target enrichment approaches can support integration site discovery and distribution analysis. Long-read sequencing becomes especially useful when the project needs reads that span host-vector junctions, or needs to resolve insertion length, recombination patterns, or complex insertion architecture.

Short-read methods for higher site discovery depth

Short-read methods can be useful when the priority is detecting a larger number of candidate integration sites across samples. In some designs, short-read target-enrichment sequencing can provide deeper per-base coverage and support integration site discovery.

This can help when the main question is site count, genomic distribution, or sample-to-sample comparison. However, short reads may provide limited context for long inserted sequences, rearranged vector fragments, or multi-junction structures.

Long-read methods for junction context and insertion architecture

Long reads can show more of the host-vector connection in a single molecule. This can support junction-level interpretation, vector breakend review, insertion length estimation, and complex structure analysis when the data quality and workflow design are suitable.

When targeted enrichment or hybrid evidence is useful

Integration events may be low-frequency or difficult to capture. Targeted enrichment can help focus sequencing on vector-related or junction-related regions. A hybrid strategy can also be useful when the project needs both site discovery depth and structural context.

We help you decide whether the project should prioritize short-read depth, long-read structure, targeted enrichment, or a combined evidence strategy.

Use Cases We Support: AAV, Lentiviral, CAR-T, and Genome Engineering

Different vector systems and engineered cell models need different analysis logic. We design the workflow around vector type, host genome, sample context, and the question your team needs to answer.

AAV integration site and host-vector junction research

AAV integration analysis may require special attention to vector sequence, ITR-related breakends, episomal vector context, partial vector fragments, and rearranged vector forms. Long-read sequencing may be useful when the project needs junction structure, insertion length, or recombination context.

For AAV-focused projects, our AAV Integration Site Analysis can be considered as a core related service. When the vector genome itself also needs review, AAV Genome Sequencing may be included as a related module.

Lentiviral and retroviral integration site mapping

Lentiviral and retroviral systems are widely used in gene-modified cell research. Integration site mapping may be used to study genomic distribution, gene proximity, read support, and sample-level patterns.

Our Lentiviral/Retroviral Integration Sites Analysis can support projects that require integration site tables, host genomic annotation, read evidence, and visualization-ready outputs.

In vivo CAR-T integration site detection

For in vivo CAR-T and gene-modified cell research samples, integration site detection may need to account for heterogeneous cell populations, vector-host junction diversity, and sample-to-sample integration patterns.

For this use case, we keep the analysis focused on research samples, vector-host junction evidence, integration site distribution, sample-level comparison, and annotation. The scope should follow the study design and the available evidence.

CRISPR knock-in, transposon insertion, and genome engineering junctions

Integration-aware analysis can also support genome engineering questions. A project may need to review CRISPR knock-in junctions, template insertion patterns, transposon insertion sites, or unexpected junctions after editing.

When relevant, CRISPR Off-Target Validation and Genome Editing & Sequencing can be considered as related modules.

Single-cell or spatial follow-up only when the study design supports it

Single-cell and spatial technologies can be useful for cell-state or tissue-context questions, but they are not primary integration site discovery methods. We recommend Single-cell RNA Sequencing or 10x Spatial Transcriptome Sequencing Service only as an optional follow-up, when the project has a credible question about cell-type context, expression state, or tissue distribution.

This keeps the integration site workflow grounded in the insertion question while allowing multi-omics follow-up when it genuinely adds value.

Service Capabilities for Long-Read Integration Projects

A strong integration site project needs more than sequencing. It needs vector information, host reference context, sample metadata, a suitable sequencing or enrichment strategy, and integration-aware bioinformatics.

Vector and host reference review

Before recommending a workflow, we review the vector type, vector sequence, host species, host reference genome, sample type, expected integration context, and research goal.

This information helps us determine whether the project should focus on site discovery, junction structure, insertion form, distribution comparison, or custom bioinformatics.

AAV and lentiviral / retroviral integration modules

For vector-specific projects, we can connect the solution to core integration modules such as AAV integration analysis and lentiviral / retroviral integration site analysis.

These modules can be adapted based on sample type, vector sequence, host genome, and downstream deliverables.

PacBio and Nanopore long-read sequencing options

Long-read sequencing may be added when the research question requires longer junction context or insertion structure review. PacBio SMRT Sequencing may be useful when high-accuracy long-read consensus is important. Nanopore sequencing may be useful when flexible long-read structure profiling is needed.

The best platform depends on DNA quality, target enrichment design, read length needs, error profile, host/vector sequence complexity, and downstream analysis goals.

Long-read data analysis and integration-aware bioinformatics

Long-read data becomes useful only when it is interpreted correctly. CD Genomics provides Long-Read Sequencing Data Analysis Service, Genomic Data Analysis, and Bioinformatics support for host/vector mapping, junction calling, read evidence review, annotation, and report-ready visualization.

The goal is to help your team understand what the integration evidence shows, not simply receive raw data.

Technology Strategy: Short-Read, Targeted, Long-Read, or Hybrid?

The right method depends on the question. Some projects need high site discovery depth. Others need junction structure or insertion architecture. Some need both.

Strategy	Best-fit question	Strengths	Limitations	Junction structure value	Bioinformatics needs	Typical deliverables
Short-read integration analysis	Site discovery, distribution, gene proximity, sample comparison	High depth, scalable site detection, useful for broad distribution analysis	Limited long-range insertion context	Moderate; depends on read design and junction support	Coordinate calling, annotation, read support review	Integration site table, gene/proximity annotation, read counts
LAM-PCR / targeted PCR-based methods	Known or enriched junction discovery	Focused enrichment, established use in integration mapping	Amplification bias and limited context may occur	Limited to captured junction regions	Primer/enrichment-aware interpretation	Candidate sites, junction reads, annotation
Target enrichment sequencing	Vector-associated or junction-focused discovery	Improves focus on relevant regions	Performance depends on capture design and sample quality	Stronger when paired with long reads	Enrichment QC, host/vector mapping, junction calling	Enriched read summary, candidate sites, junction evidence
PacBio long-read sequencing	Junction sequence, insertion context, high-consensus long-read evidence	Long-read evidence with strong consensus potential	Requires suitable DNA and project design	High when reads span junctions or insertion structure	Long-read processing, split-read review, structure classification	Junction sequence, read support, insertion structure summary
Nanopore long-read sequencing	Flexible long-read junction and structure profiling	Long reads can capture larger insertion context	Error profile and coverage should be reviewed carefully	High when reads support host-vector structure	ONT-aware processing, junction review, visualization	Junction diagrams, structure plots, integration tables
Hybrid short-read + long-read strategy	Site discovery depth plus structure context	Combines complementary evidence layers	Requires careful data integration	High when both data types support the same event	Cross-platform QC, integrated annotation, report design	Integrated site table, junction evidence, structure summary
Single-cell / spatial follow-up	Cell-type or tissue-context follow-up after integration analysis	Adds expression or spatial context when justified	Not a primary integration discovery method	Indirect; depends on study design	Multi-omics interpretation and careful scope control	Cell-state or tissue-context summaries when included

A balanced strategy may use short-read or targeted methods for discovery depth and long-read sequencing for structure context. We help you decide which evidence layer fits your project.

Workflow from Sample Review to Junction Interpretation

From vector and sample review to junction calling, insertion structure analysis, annotation, and final reporting

Long-read integration site analysis workflow

A useful integration analysis workflow should connect sample information, vector sequence, host reference context, sequencing design, junction calling, annotation, and final reporting.

We start by reviewing the vector type, vector sequence, host species, host reference genome, sample type, sample grouping, and research goal. This helps define whether the project is focused on AAV integration, lentiviral/retroviral integration, in vivo CAR-T research samples, CRISPR knock-in junctions, transposon insertion, or another genome engineering context.

We then review DNA quality, sample complexity, available input, target abundance concerns, and whether enrichment may be needed. Low-frequency or heterogeneous events may require a more focused strategy.

Reads are processed with host and vector references in mind. Host-vector split reads, vector-associated reads, candidate breakends, and mapping patterns are reviewed before downstream reporting.
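The split-read review described above can be sketched as follows. This is a simplified illustration, not the production pipeline: it assumes each read has already been aligned to a combined host-plus-vector reference and reduced to a list of aligned segments. The `Segment` type, the `call_junctions` function, and the reference name `"vector"` are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Segment(NamedTuple):
    """One aligned piece of a read (simplified, hypothetical representation)."""
    ref: str        # reference name, e.g. "chr1" or "vector"
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)
    q_start: int    # start on the read
    q_end: int      # end on the read (exclusive)
    strand: str     # "+" or "-"


def _breakpoint(seg: Segment, at_read_end: bool) -> int:
    """Reference coordinate of the segment edge that faces the junction."""
    # Forward strand: the read-end edge is ref_end. Reverse strand: ref_start.
    if (seg.strand == "+") == at_read_end:
        return seg.ref_end
    return seg.ref_start


def call_junctions(segments: List[Segment], vector_names: Set[str]) -> List[Dict]:
    """Return host-vector junctions implied by adjacent segments on one read."""
    ordered = sorted(segments, key=lambda s: s.q_start)
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        left_is_vec = left.ref in vector_names
        right_is_vec = right.ref in vector_names
        if left_is_vec == right_is_vec:
            continue  # host-host or vector-vector: not a host-vector junction
        host, vec = (right, left) if left_is_vec else (left, right)
        host_is_left = not left_is_vec
        junctions.append({
            "chrom": host.ref,
            "host_pos": _breakpoint(host, at_read_end=host_is_left),
            "vector_pos": _breakpoint(vec, at_read_end=not host_is_left),
            # Bases between the two segments on the read (negative = overlap).
            "read_gap": right.q_start - left.q_end,
        })
    return junctions


# A made-up read that crosses into the vector and back out into the host.
read = [
    Segment("chr1", 100, 600, 0, 500, "+"),
    Segment("vector", 0, 1200, 500, 1700, "+"),
    Segment("chr1", 600, 1000, 1700, 2100, "+"),
]
junctions = call_junctions(read, {"vector"})
for j in junctions:
    print(j)
```

A real workflow would also filter on mapping quality and handle microhomology or untemplated bases at the boundary, which this sketch only exposes as `read_gap`.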

Candidate integration sites are reviewed with genomic coordinates, junction sequence, vector breakend information, read-level support, and annotation. When supported by the data, insertion structure review may include partial vector insertion, rearranged vector sequence, concatemer-like insertion, or complex junction patterns.

Final deliverables can include integration site tables, junction sequence summaries, annotation files, genome browser tracks, diagrams, read support tables, and a project report.

Sample Requirements and Project Intake Information

Integration site analysis depends on both the biological sample and the project context. Vector sequence, host genome information, sample type, and comparison design can strongly affect the workflow.

Final requirements depend on vector type, sample quality, host reference availability, target abundance, enrichment strategy, and analysis goals.

Sample or input type	What we review	Quality focus	Required project information	Typical QC checkpoints	Notes
Host genomic DNA from vector-treated research samples	DNA quality, host species, vector type, expected integration context	Suitability for enrichment, long-read sequencing, and junction analysis	Host reference, vector sequence, sample grouping, research goal	DNA QC, library QC, read length, host/vector mapping, junction support	Final requirements depend on vector type, sample complexity, and method selection
AAV-related research samples	Vector sequence, host reference, expected AAV context, sample grouping	Junction capture and vector breakend review	AAV vector sequence, host genome, sample labels, study design	Enrichment review, host/vector mapping, candidate junction review	Useful when the project needs AAV integration site or host-vector junction analysis
Lentiviral / retroviral or CAR-T research samples	Cell population context, vector system, sample type, available DNA, comparison design	Heterogeneity and low-frequency event risk	Vector sequence, host reference, sample labels, in vivo / ex vivo research context	DNA QC, enrichment performance, read support review, sample-level summary	Keep interpretation focused on research questions and study design
CRISPR knock-in or genome engineering samples	Edited locus, donor or vector template, expected junctions, off-target concerns	Junction specificity and insertion structure review	Target locus, donor/vector sequence, host reference, edit design	Read support, junction mapping, unexpected insertion review	Useful when the project needs knock-in junction or editing outcome interpretation
Existing sequencing data	FASTQ/BAM files, platform, references, prior analysis, sample labels	Compatibility with reanalysis and integration-aware bioinformatics	Host reference, vector sequence, raw files, prior workflow notes	File check, read QC, mapping review, candidate junction review	Can support reanalysis when data quality and references are suitable

Integration-Aware Bioinformatics and Deliverables

Integration site analysis becomes useful when reads are converted into interpretable evidence. We focus on deliverables that help your team review coordinates, junctions, read support, insertion structure, and genomic context.

Integration site coordinates and annotation

The core output is an integration site table with genomic coordinates. When supported by available reference information, the table may include chromosome, position, strand, nearby gene, intronic/exonic/intergenic context, gene proximity, and sample labels.
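A minimal sketch of the annotation step, assuming gene models are available as simple intervals with exon coordinates. The `Gene` type, `annotate_site` function, and gene names are hypothetical; a real project would typically derive these from the host reference annotation and use an interval index rather than a linear scan.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int                    # 0-based, inclusive
    end: int                      # exclusive
    exons: List[Tuple[int, int]]  # (start, end) pairs, same convention


def annotate_site(chrom: str, pos: int, genes: List[Gene]) -> dict:
    """Label a site as exonic, intronic, or intergenic with its nearest gene."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    for g in same_chrom:
        if g.start <= pos < g.end:
            in_exon = any(s <= pos < e for s, e in g.exons)
            context = "exonic" if in_exon else "intronic"
            return {"gene": g.name, "context": context, "distance": 0}
    if not same_chrom:
        return {"gene": None, "context": "intergenic", "distance": None}

    def distance(g: Gene) -> int:
        # Distance in bases from the site to the closest base of the gene.
        return g.start - pos if pos < g.start else pos - g.end + 1

    nearest = min(same_chrom, key=distance)
    return {"gene": nearest.name, "context": "intergenic",
            "distance": distance(nearest)}


# Made-up gene models, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 1000, 5000, [(1000, 1200), (4000, 5000)]),
    Gene("GENE_B", "chr1", 9000, 12000, [(9000, 12000)]),
]
print(annotate_site("chr1", 2000, genes))
print(annotate_site("chr1", 8000, genes))
```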

Host-vector junction sequence and read-level support

Junction-level deliverables can include host-side sequence, vector-side sequence, vector breakend position, read support counts, and representative read evidence. This helps your team judge whether an event has enough support for the intended analysis.
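One way to turn per-read junction calls into read support counts is to group nearby breakpoints within each sample, since long-read alignments can place the same junction a few bases apart. The sketch below assumes that approach. The `cluster_junctions` function and the 10 bp `window` default are illustrative assumptions, not validated parameters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (sample, chrom, host_pos, read_name)
Observation = Tuple[str, str, int, str]


def _summarize(sample: str, chrom: str, members: List[Tuple[int, str]]) -> Dict:
    positions = sorted(p for p, _ in members)
    return {
        "sample": sample,
        "chrom": chrom,
        "host_pos": positions[len(positions) // 2],   # middle observation
        "read_support": len({r for _, r in members}),  # distinct reads only
    }


def cluster_junctions(observations: List[Observation], window: int = 10) -> List[Dict]:
    """Merge junction calls that lie within `window` bases of the previous one."""
    by_key = defaultdict(list)
    for sample, chrom, pos, read in observations:
        by_key[(sample, chrom)].append((pos, read))

    clusters = []
    for (sample, chrom), items in sorted(by_key.items()):
        items.sort()
        current = [items[0]]
        for pos, read in items[1:]:
            if pos - current[-1][0] <= window:
                current.append((pos, read))
            else:
                clusters.append(_summarize(sample, chrom, current))
                current = [(pos, read)]
        clusters.append(_summarize(sample, chrom, current))
    return clusters


# Made-up observations: r2 appears twice and should be counted once.
observations = [
    ("S1", "chr1", 1000, "r1"),
    ("S1", "chr1", 1003, "r2"),
    ("S1", "chr1", 1003, "r2"),
    ("S1", "chr1", 5000, "r3"),
    ("S2", "chr1", 1001, "r4"),
]
clusters = cluster_junctions(observations)
for c in clusters:
    print(c)
```

Whether a given support count is enough depends on the sequencing strategy and the intended analysis, so any threshold applied to `read_support` should be set per project.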

Insertion structure classification

Simple host-vector junctions
Partial vector insertion
Rearranged vector insertion
Concatemer-like insertion
Multi-junction or complex insertion patterns
Sample-specific or group-level insertion summaries
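The categories above could be assigned with simple rules once a read has been reduced to an ordered list of host and vector pieces. The following is a rule-of-thumb sketch under stated assumptions, not a validated classifier: the `Piece` type, the `classify_insertion` function, the label strings, and the 0.9 `full_fraction` cutoff are all hypothetical. A read that crosses only one junction is deliberately not classified, because it cannot show whether the insertion is partial.

```python
from typing import List, NamedTuple


class Piece(NamedTuple):
    kind: str          # "host" or "vector"
    start: int = 0     # vector coordinates; ignored for host pieces
    end: int = 0
    strand: str = "+"


def classify_insertion(pieces: List[Piece], vector_len: int,
                       full_fraction: float = 0.9) -> str:
    """Assign a coarse structure label to one read's ordered pieces."""
    junctions = sum(1 for a, b in zip(pieces, pieces[1:]) if a.kind != b.kind)
    vec = [p for p in pieces if p.kind == "vector"]
    if junctions == 0 or not vec:
        return "no_junction"
    if junctions == 1:
        return "junction_only"  # read does not span the whole insertion
    if junctions > 2 or pieces[0].kind != "host" or pieces[-1].kind != "host":
        return "complex"
    if len(vec) == 1:
        covered = (vec[0].end - vec[0].start) / vector_len
        return "simple" if covered >= full_fraction else "partial_vector"
    if len({p.strand for p in vec}) > 1:
        return "rearranged"  # orientation changes inside the insertion
    for prev, nxt in zip(vec, vec[1:]):
        # A same-strand piece that starts over behind the previous piece's
        # end suggests a repeated copy rather than a rearrangement.
        if prev.strand == "+":
            restarts = nxt.start < prev.end
        else:
            restarts = nxt.end > prev.start
        if not restarts:
            return "rearranged"
    return "concatemer_like"


H = Piece("host")
print(classify_insertion([H, Piece("vector", 0, 4700), H], 4700))
print(classify_insertion([H, Piece("vector", 1200, 2500), H], 4700))
print(classify_insertion([H, Piece("vector", 0, 4700), Piece("vector", 0, 3000), H], 4700))
```

Labels produced this way are a starting point for review; events in the rearranged, concatemer-like, or complex categories usually warrant inspection of the underlying alignments.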

Genome browser tracks, diagrams, and reports

Integration site TSV or CSV
Junction sequence table
Read support table
Annotation table
Vector breakend summary
Genome browser tracks
Junction diagrams
Insertion structure summary
PDF or HTML-style project report
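As an illustration of the table and track outputs listed above, the sketch below writes the same set of sites as a TSV table and as a BED track for a genome browser. The column names, example values, and the `to_tsv` and `to_bed` helpers are hypothetical. The BED conventions used (0-based half-open coordinates, score capped at 1000) follow the standard BED format.

```python
import csv
import io
from typing import Dict, List

# Made-up sites, for illustration only. host_pos is 1-based.
sites = [
    {"sample": "S1", "chrom": "chr1", "host_pos": 1000500, "strand": "+",
     "read_support": 12, "gene": "GENE_X"},
    {"sample": "S1", "chrom": "chr2", "host_pos": 250, "strand": "-",
     "read_support": 3, "gene": "GENE_Y"},
]

COLUMNS = ["sample", "chrom", "host_pos", "strand", "read_support", "gene"]


def to_tsv(rows: List[Dict]) -> str:
    """Render the integration site table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_bed(rows: List[Dict]) -> str:
    """Render one single-base BED6 feature per site, sorted by position."""
    lines = []
    for s in sorted(rows, key=lambda r: (r["chrom"], r["host_pos"])):
        start = s["host_pos"] - 1               # BED is 0-based, half-open
        score = min(s["read_support"], 1000)    # BED scores range 0 to 1000
        name = f'{s["sample"]}|{s["gene"]}'
        lines.append("\t".join([s["chrom"], str(start), str(s["host_pos"]),
                                name, str(score), s["strand"]]))
    return "\n".join(lines) + "\n"


print(to_tsv(sites), end="")
print(to_bed(sites), end="")
```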

Choose the Right Strategy for Your Integration Question

A good strategy starts with the question your team needs to answer. We help you decide whether your project needs site discovery depth, junction structure, vector-type logic, sample-level comparison, or custom bioinformatics.

Choose site-discovery depth when integration count and distribution are central

If the primary goal is to identify many candidate sites and compare broad genomic distribution, short-read or targeted methods may be appropriate.

This is useful when site count, gene proximity, or sample-level distribution is more important than full insertion architecture.

Choose long-read evidence when junction structure is central

If the project needs insertion length, recombination patterns, vector breakend context, partial insertion review, or rearranged insertion analysis, long-read sequencing may add important evidence.

This is especially relevant for complex junctions and insertion structures that cannot be easily interpreted from short reads alone.

Add vector-type logic for AAV, lentiviral, retroviral, or CAR-T studies

AAV, lentiviral, retroviral, and in vivo CAR-T research samples should not be treated as identical project types. The vector sequence, host reference, sample context, integration biology, and expected deliverables all influence the workflow.

We help match the method to the vector system and study design.

Add custom bioinformatics when raw reads are not enough

Integration site analysis often requires custom filtering, host-vector split-read review, annotation, visualization, and reporting. If your team needs reviewable deliverables rather than raw files, integration-aware bioinformatics should be part of the plan.
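The guidance in this section can be summarized as a small decision helper. This is only a sketch of the reasoning above: the `suggest_evidence_layers` function and its flag names are hypothetical, and an actual recommendation also depends on vector type, DNA quality, and study design.

```python
from typing import List


def suggest_evidence_layers(needs_site_depth: bool,
                            needs_structure: bool,
                            low_frequency_events: bool = False,
                            needs_reviewable_outputs: bool = True) -> List[str]:
    """Map project questions to the evidence layers discussed in this section."""
    layers = []
    if needs_site_depth:
        # Site count, distribution, and sample comparison are central.
        layers.append("short-read or targeted site discovery")
    if needs_structure:
        # Insertion length, breakend context, or rearrangement review.
        layers.append("long-read sequencing")
    if low_frequency_events:
        # Rare or hard-to-capture events may need focused sequencing.
        layers.append("targeted enrichment")
    if needs_reviewable_outputs:
        layers.append("integration-aware bioinformatics")
    return layers


print(suggest_evidence_layers(needs_site_depth=True, needs_structure=True))
print(suggest_evidence_layers(needs_site_depth=False, needs_structure=True,
                              low_frequency_events=True))
```

When both site depth and structure are needed, the two sequencing layers together correspond to the hybrid strategy in the comparison table.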

Compliance / Disclaimer

CD Genomics provides this service for Research Use Only (RUO). This service is not intended for clinical diagnosis, direct medical interpretation, patient monitoring, clinical safety assessment, therapeutic decision support, GMP or other release testing, regulatory validation, clinical-grade CAR-T testing, or direct-to-consumer testing, and no guaranteed-detection claims are made.

Demo Results

Demo results help your team understand the kinds of outputs that may be included in a project. These examples show result types, not fixed conclusions.

Integration site table with annotation

This table summarizes candidate integration sites with genomic coordinates, sample labels, read support, vector-side information, and gene/proximity annotation.

Host-vector junction read evidence diagram

This figure shows reads spanning or supporting the junction between host genome sequence and vector sequence, including split-read alignments and vector breakend position.

Insertion structure and sample-level summary view

This output summarizes insertion forms such as partial vector insertion, rearranged insertion, concatemer-like structure, or complex junction patterns.

FAQ

1. What is a Long-Read Integration Site Analysis Solution?

It is a research-focused workflow that uses sequencing and integration-aware bioinformatics to identify integration sites, analyze host-vector junctions, review insertion structures, annotate genomic context, and prepare report-ready outputs.

2. When is short-read integration site analysis enough?

Short-read analysis may be suitable when the main goal is integration site discovery, site count, distribution analysis, or gene/proximity annotation. It may be less suitable when the project needs long-range insertion structure or junction architecture.

3. When should I add long-read sequencing?

Long-read sequencing may be useful when your project needs host-vector junction context, insertion length, recombination pattern review, partial vector insertion analysis, or complex insertion structure interpretation.

4. What is a host-vector junction?

A host-vector junction is the sequence boundary where host genomic DNA connects to vector-derived or engineered sequence. It can provide direct evidence for how an insertion is connected to the host genome.

5. Can this solution support AAV integration site analysis?

Yes. We can support AAV integration site and host-vector junction analysis when vector sequence, host reference, sample type, and study design support the workflow.

6. Can this solution support lentiviral or retroviral integration site mapping?

Yes. Lentiviral and retroviral integration site analysis can include genomic coordinates, read support, gene/proximity annotation, sample-level summaries, and visualization-ready outputs.

7. Can this solution support in vivo CAR-T integration site detection?

Yes. For in vivo CAR-T and other gene-modified cell research samples, we can support integration site detection and host-vector junction analysis when the vector sequence, host reference, sample type, and study design support the workflow.

8. Can the analysis detect partial vector insertions or rearranged insertion structures?

It can support review of partial vector insertions, rearranged insertions, concatemer-like insertions, and complex junctions when the sequencing strategy, read length, enrichment design, and data quality provide enough evidence.

9. What sample and vector information should I provide?

Useful information includes sample type, host species, host reference genome, vector sequence, vector type, sample grouping, expected integration context, prior data, and research goal.

10. What deliverables can I expect?

Deliverables may include integration site tables, genomic coordinates, junction sequence summaries, read support tables, gene/proximity annotation, vector breakend summaries, genome browser tracks, junction diagrams, insertion structure summaries, and project reports.

11. Can single-cell or spatial analysis be added?

Single-cell or spatial analysis can be considered only when the study design supports a cell-type or tissue-context follow-up question. These methods are not primary integration site discovery methods.

12. Is this service intended for clinical safety assessment or release testing?

No. This service is designed for research-focused integration site analysis, host-vector junction review, insertion structure interpretation, and bioinformatics deliverables. It is not intended for clinical safety assessment, release testing, patient monitoring, or regulatory validation.

Literature Case: Long-Read TES Adds Structure Context to AAV Integration Analysis

Published Research Highlight

Comparison and cross-validation of long-read and short-read target-enrichment sequencing methods to assess AAV vector integration into host genome

Journal: Molecular Therapy Methods & Clinical Development
Published: 2024

Background

AAV vector integration analysis often requires both site discovery and structural interpretation. Short-read data can support integration site quantitation and distribution analysis, but longer reads can provide additional context for integration-site length, recombination, and vector rearrangement.

Methods

The study compared short-read and long-read target-enrichment sequencing using AAV-treated monkey samples, in vitro lentiviral-treated samples, a stable cell line, and an engineered spike-in control.

The researchers evaluated insertion sites, vector and host breakends, vector sequence representation, and rearrangement patterns.

Results

Short-read target-enrichment sequencing identified more integration sites because of deeper per-base coverage.
Long-read target-enrichment sequencing identified fewer sites but enabled measurement of integration-site length and recombination patterns.
Long-read target-enrichment sequencing revealed vector rearrangement in 4%–40% of integration sites in AAV-treated animals.
Figure 4 shows vector source locus and host location of insertion sites in long- and short-read data.

Vector source locus and host location of insertion sites in long-read and short-read target-enrichment sequencing data.

Long-read and short-read target-enrichment sequencing can both support integration site analysis, while long-read data adds structural context for vector-host junctions and recombination patterns.

Conclusion

This case supports a balanced integration analysis strategy. Short-read or targeted approaches may be useful when site discovery depth is central, while long-read sequencing becomes valuable when the research question requires host-vector junction context, insertion structure, integration-site length, or recombination patterns.

Related Publications

The following publications support the scientific rationale for long-read integration site analysis, AAV vector integration, CAR-T integration profile research, and structural context analysis.

Comparison and cross-validation of long-read and short-read target-enrichment sequencing methods to assess AAV vector integration into host genome

Journal: Molecular Therapy Methods & Clinical Development

Year: 2024

Clonal kinetics and single-cell transcriptional profiling of CAR-T cells in patients undergoing CD19 CAR-T immunotherapy

Journal: Nature Communications

Year: 2020

Long-read sequencing identifies novel structural variations in colorectal cancer metastases

Journal: Genome Biology

Year: 2021