
Integration Site Analysis Should Go Beyond a Coordinate List
A coordinate table can tell you where a candidate insertion is located. It does not always show how the vector connects to the host genome, whether the junction has enough read support, whether the inserted sequence is partial or rearranged, or whether the pattern changes across samples.
For many projects, those details are the reason the analysis matters. AAV, lentiviral, retroviral, CAR-T, and genome engineering samples may contain integration patterns that require more than standard alignment output. Your team may need to review the host-side breakpoint, vector-side breakpoint, orientation, nearby gene context, read support, and whether the insertion appears simple or complex.
Why genomic coordinates are only the starting point
Genomic coordinates are essential, but they are not the whole answer. A coordinate can identify a candidate integration site; the surrounding evidence determines how useful that site is for interpretation.
A stronger deliverable should connect the coordinate to:
- Host genome location
- Vector sequence or breakend position
- Junction sequence
- Read-level support
- Nearby gene or genomic feature annotation
- Sample-level distribution when multiple samples are compared
- Insertion structure review when supported by the data
That is why our workflow is built around integration-aware interpretation, not only site listing.
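One way to keep these evidence layers connected is to store each candidate site as a single record instead of a bare coordinate. The sketch below is a minimal, hypothetical schema. The field names and the default read-support threshold are illustrative assumptions, not a fixed deliverable format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntegrationSite:
    """Hypothetical record linking a host coordinate to its supporting evidence."""
    sample: str                     # sample label, for sample-level comparison
    chrom: str                      # host chromosome
    position: int                   # host-side breakpoint (1-based)
    strand: str                     # vector orientation relative to the host ("+" or "-")
    vector_breakend: Optional[int]  # position on the vector reference, if resolved
    junction_seq: str = ""          # sequence spanning the host-vector boundary
    read_support: int = 0           # number of reads supporting this junction
    nearest_gene: Optional[str] = None
    genomic_context: Optional[str] = None  # e.g. "exonic", "intronic", "intergenic"
    structure: Optional[str] = None        # e.g. "simple", "partial", "rearranged"

    def is_reviewable(self, min_reads: int = 2) -> bool:
        # A coordinate alone is not enough: require a resolved vector breakend,
        # a junction sequence, and a minimum number of supporting reads.
        # The default of 2 reads is an illustrative assumption.
        return (
            self.vector_breakend is not None
            and bool(self.junction_seq)
            and self.read_support >= min_reads
        )


site = IntegrationSite("S1", "chr1", 1_000_000, "+", 145, "ACGTTTAGG", read_support=5)
print(site.is_reviewable())  # True
```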
Why host-vector junctions need read-level evidence
Host-vector junctions are often the most informative part of an integration analysis. A junction sequence can show how the inserted vector or engineered sequence connects to the host genome.
Long-read sequencing can add value when reads span host-vector boundaries or provide longer context around an insertion. This helps with junction review, insertion structure assessment, and visualization. Short-read or targeted methods may still be valuable for site discovery depth, but they often need complementary analysis when the research question depends on insertion architecture.
When insertion structure matters more than site count
Some studies mainly need integration site distribution. Others need to understand insertion form. A project may need to review partial vector insertion, vector rearrangement, concatemer-like structure, multiple junctions, or complex insertion patterns.
In those cases, counting sites is not enough. The project needs a strategy that connects site discovery with insertion structure, read-level support, and annotation.
What Long Reads Add to Host-Vector Junction Analysis
Long-read sequencing is not a replacement for every integration site method. Its value is strongest when the question requires structural context.
Short-read, LAM-PCR, targeted PCR-based, and target enrichment approaches can support integration site discovery and distribution analysis. Long-read sequencing becomes especially useful when the project needs reads that span host-vector junctions, or needs to resolve insertion length, recombination patterns, or complex insertion architecture.
Short-read methods for higher site discovery depth
Short-read methods can be useful when the priority is detecting a larger number of candidate integration sites across samples. In some designs, short-read target-enrichment sequencing can provide deeper per-base coverage and support integration site discovery.
This can help when the main question is site count, genomic distribution, or sample-to-sample comparison. However, short reads may provide limited context for long inserted sequences, rearranged vector fragments, or multi-junction structures.
Long-read methods for junction context and insertion architecture
Long reads can show more of the host-vector connection in a single molecule. This can support junction-level interpretation, vector breakend review, insertion length estimation, and complex structure analysis when the data quality and workflow design are suitable.
When targeted enrichment or hybrid evidence is useful
Integration events may be low-frequency or difficult to capture. Targeted enrichment can help focus sequencing on vector-related or junction-related regions. A hybrid strategy can also be useful when the project needs both site discovery depth and structural context.
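Why low-frequency events are hard to capture can be illustrated with a simple sampling model. Assuming each informative read independently comes from a given integration event with probability `freq`, the chance of seeing at least `min_support` supporting reads among `n_reads` is a binomial tail probability. This is a simplified sketch: it ignores capture efficiency, amplification bias, and mapping losses, all of which matter in real data.

```python
from math import comb


def prob_detect(n_reads: int, freq: float, min_support: int) -> float:
    """P(at least `min_support` of `n_reads` come from an event at frequency `freq`).

    Simplified binomial model: reads are independent draws, and capture,
    amplification, and mapping effects are ignored.
    """
    # Probability of observing fewer than `min_support` supporting reads.
    below = sum(
        comb(n_reads, i) * freq**i * (1 - freq) ** (n_reads - i)
        for i in range(min_support)
    )
    return 1.0 - below


# An event present in 0.1% of informative reads, requiring 3 supporting reads:
for n in (1_000, 5_000, 20_000):
    print(n, round(prob_detect(n, 0.001, min_support=3), 3))
```

Under this model, roughly 1,000 informative reads give only about an 8% chance of reaching three supporting reads for a 0.1% event, which is one reason enrichment that raises the informative fraction can matter more than raw sequencing volume.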
We help you decide whether the project should prioritize short-read depth, long-read structure, targeted enrichment, or a combined evidence strategy.
Use Cases We Support: AAV, Lentiviral, CAR-T, and Genome Engineering
Different vector systems and engineered cell models need different analysis logic. We design the workflow around vector type, host genome, sample context, and the question your team needs to answer.
AAV integration site and host-vector junction research
AAV integration analysis may require special attention to vector sequence, ITR-related breakends, episomal vector context, partial vector fragments, and rearranged vector forms. Long-read sequencing may be useful when the project needs junction structure, insertion length, or recombination context.
For AAV-focused projects, our AAV Integration Site Analysis can be considered as a core related service. When the vector genome itself also needs review, AAV Genome Sequencing may be included as a related module.
Lentiviral and retroviral integration site mapping
Lentiviral and retroviral systems are widely used in gene-modified cell research. Integration site mapping may be used to study genomic distribution, gene proximity, read support, and sample-level patterns.
Our Lentiviral/Retroviral Integration Sites Analysis can support projects that require integration site tables, host genomic annotation, read evidence, and visualization-ready outputs.
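Sample-level patterns usually start from simple per-sample summaries. The sketch below uses hypothetical site tuples to count sites per sample, compute the fraction falling inside annotated genes, and list sites shared between two samples. Matching sites by exact coordinate is a simplifying assumption; real comparisons often allow a small tolerance window.

```python
from collections import defaultdict

# Hypothetical input: (sample, chrom, position, genomic_context)
sites = [
    ("S1", "chr1", 1_000_000, "intronic"),
    ("S1", "chr2", 5_000, "intergenic"),
    ("S2", "chr1", 1_000_000, "intronic"),
    ("S2", "chr7", 42_000, "exonic"),
]


def summarize(sites):
    """Per-sample site count and fraction of sites inside genes (exonic or intronic)."""
    per_sample = defaultdict(list)
    for sample, chrom, pos, context in sites:
        per_sample[sample].append((chrom, pos, context))

    summary = {}
    for sample, rows in per_sample.items():
        in_gene = sum(1 for _, _, ctx in rows if ctx in ("exonic", "intronic"))
        summary[sample] = {"n_sites": len(rows), "frac_in_genes": in_gene / len(rows)}
    return summary


def shared_sites(sites, a, b):
    """Sites observed in both samples `a` and `b` (exact coordinate match only)."""
    coords = defaultdict(set)
    for sample, chrom, pos, _ in sites:
        coords[sample].add((chrom, pos))
    return coords[a] & coords[b]


print(summarize(sites))
print(shared_sites(sites, "S1", "S2"))
```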
In vivo CAR-T integration site detection
For in vivo CAR-T and gene-modified cell research samples, integration site detection may need to account for heterogeneous cell populations, vector-host junction diversity, and sample-to-sample integration patterns.
For this use case, we keep the analysis focused on research samples, vector-host junction evidence, integration site distribution, sample-level comparison, and annotation. The scope should follow the study design and the available evidence.
CRISPR knock-in, transposon insertion, and genome engineering junctions
Integration-aware analysis can also support genome engineering questions. A project may need to review CRISPR knock-in junctions, template insertion patterns, transposon insertion sites, or unexpected junctions after editing.
When relevant, CRISPR Off-Target Validation and Genome Editing & Sequencing can be considered as related modules.
Single-cell or spatial follow-up only when the study design supports it
Single-cell and spatial technologies can be useful for cell-state or tissue-context questions, but they are not primary integration site discovery methods. We only recommend Single-cell RNA Sequencing or 10x Spatial Transcriptome Sequencing Service as optional follow-up when the project has a credible question about cell-type context, expression state, or tissue distribution.
This keeps the integration site workflow grounded in the insertion question while allowing multi-omics follow-up when it genuinely adds value.
Service Capabilities for Long-Read Integration Projects
A strong integration site project needs more than sequencing. It needs vector information, host reference context, sample metadata, a suitable sequencing or enrichment strategy, and integration-aware bioinformatics.
Vector and host reference review
Before recommending a workflow, we review the vector type, vector sequence, host species, host reference genome, sample type, expected integration context, and research goal.
This information helps us determine whether the project should focus on site discovery, junction structure, insertion form, distribution comparison, or custom bioinformatics.
AAV and lentiviral / retroviral integration modules
For vector-specific projects, we can connect the solution to core integration modules such as AAV integration analysis and lentiviral / retroviral integration site analysis.
These modules can be adapted based on sample type, vector sequence, host genome, and downstream deliverables.
PacBio and Nanopore long-read sequencing options
Long-read sequencing may be added when the research question requires longer junction context or insertion structure review. PacBio SMRT Sequencing may be useful when high-accuracy long-read consensus is important. Nanopore sequencing may be useful when very long reads or flexible run scale are needed for structure profiling.
The best platform depends on DNA quality, target enrichment design, read length needs, error profile, host/vector sequence complexity, and downstream analysis goals.
Long-read data analysis and integration-aware bioinformatics
Long-read data becomes useful only when it is interpreted correctly. CD Genomics provides Long-Read Sequencing Data Analysis Service, Genomic Data Analysis, and Bioinformatics support for host/vector mapping, junction calling, read evidence review, annotation, and report-ready visualization.
The goal is to help your team understand what the integration evidence shows, not simply receive raw data.
Technology Strategy: Short-Read, Targeted, Long-Read, or Hybrid?
The right method depends on the question. Some projects need high site discovery depth. Others need junction structure or insertion architecture. Some need both.
| Strategy | Best-fit question | Strengths | Limitations | Junction structure value | Bioinformatics needs | Typical deliverables |
|---|---|---|---|---|---|---|
| Short-read integration analysis | Site discovery, distribution, gene proximity, sample comparison | High depth, scalable site detection, useful for broad distribution analysis | Limited long-range insertion context | Moderate; depends on read design and junction support | Coordinate calling, annotation, read support review | Integration site table, gene/proximity annotation, read counts |
| LAM-PCR / targeted PCR-based methods | Known or enriched junction discovery | Focused enrichment, established use in integration mapping | Possible amplification bias; limited sequence context beyond the junction | Limited to captured junction regions | Primer/enrichment-aware interpretation | Candidate sites, junction reads, annotation |
| Target enrichment sequencing | Vector-associated or junction-focused discovery | Improves focus on relevant regions | Performance depends on capture design and sample quality | Stronger when paired with long reads | Enrichment QC, host/vector mapping, junction calling | Enriched read summary, candidate sites, junction evidence |
| PacBio long-read sequencing | Junction sequence, insertion context, high-consensus long-read evidence | Long-read evidence with strong consensus potential | Requires suitable DNA and project design | High when reads span junctions or insertion structure | Long-read processing, split-read review, structure classification | Junction sequence, read support, insertion structure summary |
| Nanopore long-read sequencing | Flexible long-read junction and structure profiling | Long reads can capture larger insertion context | Error profile and coverage should be reviewed carefully | High when reads support host-vector structure | ONT-aware processing, junction review, visualization | Junction diagrams, structure plots, integration tables |
| Hybrid short-read + long-read strategy | Site discovery depth plus structure context | Combines complementary evidence layers | Requires careful data integration | High when both data types support the same event | Cross-platform QC, integrated annotation, report design | Integrated site table, junction evidence, structure summary |
| Single-cell / spatial follow-up | Cell-type or tissue-context follow-up after integration analysis | Adds expression or spatial context when justified | Not a primary integration discovery method | Indirect; depends on study design | Multi-omics interpretation and careful scope control | Cell-state or tissue-context summaries when included |
A balanced strategy may use short-read or targeted methods for discovery depth and long-read sequencing for structure context. We help you decide which evidence layer fits your project.
Workflow from Sample Review to Junction Interpretation
From vector and sample review to junction calling, insertion structure analysis, annotation, and final reporting

A useful integration analysis workflow should connect sample information, vector sequence, host reference context, sequencing design, junction calling, annotation, and final reporting.
We start by reviewing the vector type, vector sequence, host species, host reference genome, sample type, sample grouping, and research goal. This helps define whether the project is focused on AAV integration, lentiviral/retroviral integration, in vivo CAR-T research samples, CRISPR knock-in junctions, transposon insertion, or another genome engineering context.
We then review DNA quality, sample complexity, available input, target abundance concerns, and whether enrichment may be needed. Low-frequency or heterogeneous events may require a more focused strategy.
Reads are processed with host and vector references in mind. Host-vector split reads, vector-associated reads, candidate breakends, and mapping patterns are reviewed before downstream reporting.
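A common first step in split-read review is flagging reads whose alignments land on both host and vector sequence. Aligners such as minimap2 and BWA-MEM report the other parts of a split (chimeric) alignment in the SAM `SA` tag, formatted as `rname,pos,strand,CIGAR,mapQ,NM;` per entry. The sketch below assumes reads were aligned to a combined host + vector reference; the vector contig name `AAV_vector` and the MAPQ cutoff are hypothetical.

```python
from typing import NamedTuple


class SuppAlignment(NamedTuple):
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa_tag(sa_value: str) -> list[SuppAlignment]:
    """Parse the value of a SAM `SA:Z` tag: 'rname,pos,strand,CIGAR,mapQ,NM;...'."""
    alignments = []
    for entry in sa_value.strip(";").split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        alignments.append(SuppAlignment(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return alignments


def classify_read(primary_rname: str, sa_value: str, vector_contigs: set[str],
                  min_mapq: int = 20) -> str:
    """Label a read as host-only, vector-only, or host-vector chimeric.

    Only supplementary alignments at or above `min_mapq` (an assumed cutoff)
    count; the primary alignment's MAPQ is assumed to be filtered upstream.
    """
    contigs = {primary_rname}
    contigs.update(a.rname for a in parse_sa_tag(sa_value) if a.mapq >= min_mapq)
    has_vector = any(c in vector_contigs for c in contigs)
    has_host = any(c not in vector_contigs for c in contigs)
    if has_vector and has_host:
        return "host-vector chimeric"
    return "vector-only" if has_vector else "host-only"


print(classify_read("chr1", "AAV_vector,120,+,60S90M,60,2;", {"AAV_vector"}))
```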
Candidate integration sites are reviewed with genomic coordinates, junction sequence, vector breakend information, read-level support, and annotation. When supported by the data, insertion structure review may include partial vector insertion, rearranged vector sequence, concatemer-like insertion, or complex junction patterns.
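Per-read breakpoints rarely fall on exactly the same base, so they are typically grouped into candidate sites before read support is counted. The sketch below uses a simple gap-based clustering along each chromosome; the 10 bp window and the use of the median position are illustrative assumptions, not a validated method.

```python
from collections import defaultdict


def cluster_breakpoints(breakpoints, window=10):
    """Group per-read host breakpoints into candidate sites.

    breakpoints: iterable of (chrom, position, read_id)
    window: max gap (bp) between consecutive breakpoints in one cluster (assumed value)
    Returns a list of (chrom, representative_position, n_supporting_reads).
    """
    by_chrom = defaultdict(list)
    for chrom, pos, read_id in breakpoints:
        by_chrom[chrom].append((pos, read_id))

    sites = []
    for chrom, items in sorted(by_chrom.items()):
        items.sort()
        cluster = [items[0]]
        for pos, read_id in items[1:]:
            if pos - cluster[-1][0] <= window:
                cluster.append((pos, read_id))  # close to the previous breakpoint
            else:
                sites.append(_summarize(chrom, cluster))
                cluster = [(pos, read_id)]      # start a new candidate site
        sites.append(_summarize(chrom, cluster))
    return sites


def _summarize(chrom, cluster):
    positions = sorted(p for p, _ in cluster)
    median = positions[len(positions) // 2]             # upper median for even counts
    n_reads = len({read_id for _, read_id in cluster})  # count each read once
    return (chrom, median, n_reads)


bps = [("chr1", 1000, "r1"), ("chr1", 1002, "r2"), ("chr1", 1003, "r3"),
       ("chr1", 5000, "r4"), ("chr2", 300, "r5")]
sites = cluster_breakpoints(bps)
supported = [s for s in sites if s[2] >= 2]  # example minimum read-support filter
print(sites)
print(supported)
```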
Final deliverables can include integration site tables, junction sequence summaries, annotation files, genome browser tracks, diagrams, read support tables, and a project report.
Sample Requirements and Project Intake Information
Integration site analysis depends on both the biological sample and the project context. Vector sequence, host genome information, sample type, and comparison design can strongly affect the workflow.
Final requirements depend on vector type, sample quality, host reference availability, target abundance, enrichment strategy, and analysis goals.
| Sample or input type | What we review | Quality focus | Required project information | Typical QC checkpoints | Notes |
|---|---|---|---|---|---|
| Host genomic DNA from vector-treated research samples | DNA quality, host species, vector type, expected integration context | Suitability for enrichment, long-read sequencing, and junction analysis | Host reference, vector sequence, sample grouping, research goal | DNA QC, library QC, read length, host/vector mapping, junction support | Final requirements depend on vector type, sample complexity, and method selection |
| AAV-related research samples | Vector sequence, host reference, expected AAV context, sample grouping | Junction capture and vector breakend review | AAV vector sequence, host genome, sample labels, study design | Enrichment review, host/vector mapping, candidate junction review | Useful when the project needs AAV integration site or host-vector junction analysis |
| Lentiviral / retroviral or CAR-T research samples | Cell population context, vector system, sample type, available DNA, comparison design | Heterogeneity and low-frequency event risk | Vector sequence, host reference, sample labels, in vivo / ex vivo research context | DNA QC, enrichment performance, read support review, sample-level summary | Keep interpretation focused on research questions and study design |
| CRISPR knock-in or genome engineering samples | Edited locus, donor or vector template, expected junctions, off-target concerns | Junction specificity and insertion structure review | Target locus, donor/vector sequence, host reference, edit design | Read support, junction mapping, unexpected insertion review | Useful when the project needs knock-in junction or editing outcome interpretation |
| Existing sequencing data | FASTQ/BAM files, platform, references, prior analysis, sample labels | Compatibility with reanalysis and integration-aware bioinformatics | Host reference, vector sequence, raw files, prior workflow notes | File check, read QC, mapping review, candidate junction review | Can support reanalysis when data quality and references are suitable |
Integration-Aware Bioinformatics and Deliverables
Integration site analysis becomes useful when reads are converted into interpretable evidence. We focus on deliverables that help your team review coordinates, junctions, read support, insertion structure, and genomic context.
Integration site coordinates and annotation
The core output is an integration site table with genomic coordinates. When supported by available reference information, the table may include chromosome, position, strand, nearby gene, intronic/exonic/intergenic context, gene proximity, and sample labels.
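The intronic/exonic/intergenic label and gene proximity can be derived from a gene model. The sketch below uses a hypothetical in-memory gene list (`GENE_A`, `GENE_B` are made-up names); a real workflow would load a GTF/GFF annotation for the host reference and handle overlapping genes more carefully than returning the first match.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    name: str
    chrom: str
    start: int                     # 1-based, inclusive
    end: int                       # 1-based, inclusive
    exons: List[Tuple[int, int]]   # 1-based inclusive exon intervals


def annotate(chrom: str, pos: int, genes: List[Gene]) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (context, nearest_gene, distance_bp); distance is 0 inside a gene."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    if not same_chrom:
        return ("intergenic", None, None)

    for g in same_chrom:
        if g.start <= pos <= g.end:  # first overlapping gene wins (simplification)
            in_exon = any(s <= pos <= e for s, e in g.exons)
            return ("exonic" if in_exon else "intronic", g.name, 0)

    def distance(g: Gene) -> int:
        return g.start - pos if pos < g.start else pos - g.end

    nearest = min(same_chrom, key=distance)
    return ("intergenic", nearest.name, distance(nearest))


genes = [
    Gene("GENE_A", "chr1", 1000, 5000, [(1000, 1200), (4800, 5000)]),
    Gene("GENE_B", "chr1", 20000, 30000, [(20000, 20500)]),
]
print(annotate("chr1", 12000, genes))
```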
Host-vector junction sequence and read-level support
Junction-level deliverables can include host-side sequence, vector-side sequence, vector breakend position, read support counts, and representative read evidence. This helps your team judge whether an event has enough support for the intended analysis.
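Host-side and vector-side junction sequence can be read off a split read once the read interval covered by each alignment is known from its CIGAR string. The sketch below assumes both alignments are on the forward strand, the host segment precedes the vector segment on the read, and `read_seq` is the full read (no hard-clipped bases missing). The vector breakend position on the vector reference would come from the vector alignment's POS field and is not computed here.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar: str) -> tuple[int, int]:
    """0-based, half-open interval of the read covered by an alignment.

    Leading soft/hard clips give the start; M, I, =, and X consume read bases.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    start, i = 0, 0
    while i < len(ops) and ops[i][1] in "SH":
        start += ops[i][0]
        i += 1
    aligned = sum(n for n, op in ops[i:] if op in "MI=X")
    return start, start + aligned


def junction_from_split(read_seq: str, host_cigar: str, vector_cigar: str, flank: int = 20) -> dict:
    """Extract sequence on each side of a host->vector junction on one read."""
    h_start, h_end = query_interval(host_cigar)
    v_start, v_end = query_interval(vector_cigar)
    overlap = max(0, h_end - v_start)  # bases claimed by both: possible microhomology
    gap = max(0, v_start - h_end)      # bases claimed by neither: possible filler insertion
    return {
        "host_side": read_seq[max(h_start, h_end - flank):h_end],
        "vector_side": read_seq[v_start:min(v_end, v_start + flank)],
        "microhomology_bp": overlap,
        "inserted_bp": gap,
        "filler": read_seq[h_end:v_start] if gap else "",
    }


read = "A" * 30 + "GG" + "C" * 30
print(junction_from_split(read, "30M32S", "32S30M"))
```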
Insertion structure classification
- Simple host-vector junctions
- Partial vector insertion
- Rearranged vector insertion
- Concatemer-like insertion
- Multi-junction or complex insertion patterns
- Sample-specific or group-level insertion summaries
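The classes listed above can be approximated on a single long read from the order, strand, and vector coordinates of its aligned segments. The sketch below is a rule-based heuristic under stated assumptions: segments are already labeled "host" or "vector" and ordered along the read, and the coverage cutoffs are illustrative, not validated. Labels are not mutually exclusive; a head-to-tail concatemer, for example, also trips the collinearity check.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    ref: str      # "host" or "vector" (simplified labels)
    start: int    # 0-based start on that reference
    end: int      # 0-based, exclusive
    strand: str   # "+" or "-"


def classify_structure(segments, vector_length, partial_cutoff=0.9, concat_cutoff=1.5):
    """Heuristic structure labels for one long read (thresholds are assumptions)."""
    vec = [s for s in segments if s.ref == "vector"]
    if not vec:
        return {"no vector"}
    labels = set()

    # Fraction of the vector reference covered at least once (interval union).
    covered, cur_start, cur_end = 0, None, None
    for s in sorted(vec, key=lambda seg: seg.start):
        if cur_end is None or s.start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s.start, s.end
        else:
            cur_end = max(cur_end, s.end)
    covered += cur_end - cur_start
    if covered / vector_length < partial_cutoff:
        labels.add("partial")

    # More aligned vector sequence than one copy suggests a concatemer-like structure.
    if sum(s.end - s.start for s in vec) / vector_length >= concat_cutoff:
        labels.add("concatemer-like")

    # Adjacent vector segments that switch strand or jump backwards are non-collinear.
    for a, b in zip(vec, vec[1:]):
        collinear = a.strand == b.strand and (
            b.start >= a.end if a.strand == "+" else b.end <= a.start
        )
        if not collinear:
            labels.add("rearranged")
            break

    # More than two host<->vector transitions along the read.
    transitions = sum(1 for a, b in zip(segments, segments[1:]) if a.ref != b.ref)
    if transitions > 2:
        labels.add("multi-junction")

    return labels or {"simple"}


host = Segment("host", 100, 600, "+")
print(classify_structure([host, Segment("vector", 0, 2000, "+"), Segment("host", 700, 900, "+")], 4000))
```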
Genome browser tracks, diagrams, and reports
- Integration site TSV or CSV
- Junction sequence table
- Read support table
- Annotation table
- Vector breakend summary
- Genome browser tracks
- Junction diagrams
- Insertion structure summary
- PDF or HTML-style project report
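Two of the file types above, a site table and a genome browser track, can be written from the same records. The sketch below writes a tab-separated table with Python's `csv` module and a minimal BED track. The column names are hypothetical. BED uses 0-based, half-open coordinates, so a 1-based breakpoint becomes a 1 bp interval starting one base earlier.

```python
import csv
import io

# Hypothetical column set mirroring the integration site table.
FIELDS = ["sample", "chrom", "position", "strand", "vector_breakend", "read_support", "nearest_gene"]


def write_site_table(sites, handle):
    """Write integration sites as tab-separated values with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(sites)


def write_bed(sites, handle, track_name="integration_sites"):
    """Write a simple BED track (chrom, start, end, name, score, strand)."""
    handle.write(f'track name="{track_name}"\n')
    for s in sites:
        name = f"{s['sample']}:{s['read_support']}reads"
        strand = s["strand"] if s["strand"] in ("+", "-") else "."
        # 1-based breakpoint -> 0-based half-open interval [position - 1, position)
        handle.write(f"{s['chrom']}\t{s['position'] - 1}\t{s['position']}\t{name}\t0\t{strand}\n")


sites = [{"sample": "S1", "chrom": "chr1", "position": 1_000_000, "strand": "+",
          "vector_breakend": 145, "read_support": 5, "nearest_gene": "GENE_A"}]
tsv_buf, bed_buf = io.StringIO(), io.StringIO()
write_site_table(sites, tsv_buf)
write_bed(sites, bed_buf)
print(tsv_buf.getvalue())
print(bed_buf.getvalue())
```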
Choose the Right Strategy for Your Integration Question
A good strategy starts with the question your team needs to answer. We help you decide whether your project needs site discovery depth, junction structure, vector-type logic, sample-level comparison, or custom bioinformatics.
Choose site-discovery depth when integration count and distribution are central
If the primary goal is to identify many candidate sites and compare broad genomic distribution, short-read or targeted methods may be appropriate.
This is useful when site count, gene proximity, or sample-level distribution is more important than full insertion architecture.
Choose long-read evidence when junction structure is central
If the project needs insertion length, recombination patterns, vector breakend context, partial insertion review, or rearranged insertion analysis, long-read sequencing may add important evidence.
This is especially relevant for complex junctions and insertion structures that cannot be easily interpreted from short reads alone.
Add vector-type logic for AAV, lentiviral, retroviral, or CAR-T studies
AAV, lentiviral, retroviral, and in vivo CAR-T research samples should not be treated as identical project types. The vector sequence, host reference, sample context, integration biology, and expected deliverables all influence the workflow.
We help match the method to the vector system and study design.
Add custom bioinformatics when raw reads are not enough
Integration site analysis often requires custom filtering, host-vector split-read review, annotation, visualization, and reporting. If your team needs reviewable deliverables rather than raw files, integration-aware bioinformatics should be part of the plan.
References
- Comparison and cross-validation of long-read and short-read target-enrichment sequencing methods to assess AAV vector integration into host genome. Molecular Therapy Methods & Clinical Development, 2024 (full text also available in PMC).
- Clonal kinetics and single-cell transcriptional profiling of CAR-T cells in patients undergoing CD19 CAR-T immunotherapy. Nature Communications, 2020.
- Long-read sequencing identifies novel structural variations in colorectal cancer metastases. Genome Biology, 2021.
Compliance / Disclaimer
CD Genomics provides this service for Research Use Only (RUO). This service is not intended for clinical diagnosis, direct medical interpretation, patient monitoring, clinical safety assessment, therapeutic decision support, GMP or other release testing, regulatory validation, clinical-grade CAR-T testing, or direct-to-consumer testing, and no detection outcome is guaranteed.
Demo Results
Demo results help your team understand the kinds of outputs that may be included in a project. These examples show result types, not fixed conclusions.

Integration site table with annotation
This table summarizes candidate integration sites with genomic coordinates, sample labels, read support, vector-side information, and gene/proximity annotation.

Host-vector junction read evidence diagram
This figure shows reads spanning or supporting the junction between host genome sequence and vector sequence, including split-read alignments and vector breakend position.

Insertion structure and sample-level summary view
This output summarizes insertion forms such as partial vector insertion, rearranged insertion, concatemer-like structure, or complex junction patterns.
FAQ
1. What is a Long-Read Integration Site Analysis Solution?
It is a research-focused workflow that uses sequencing and integration-aware bioinformatics to identify integration sites, analyze host-vector junctions, review insertion structures, annotate genomic context, and prepare report-ready outputs.
2. When is short-read integration site analysis enough?
Short-read analysis may be suitable when the main goal is integration site discovery, site count, distribution analysis, or gene/proximity annotation. It may be less suitable when the project needs long-range insertion structure or junction architecture.
3. When should I add long-read sequencing?
Long-read sequencing may be useful when your project needs host-vector junction context, insertion length, recombination pattern review, partial vector insertion analysis, or complex insertion structure interpretation.
4. What is a host-vector junction?
A host-vector junction is the sequence boundary where host genomic DNA connects to vector-derived or engineered sequence. It can provide direct evidence for how an insertion is connected to the host genome.
5. Can this solution support AAV integration site analysis?
Yes. We can support AAV integration site and host-vector junction analysis when vector sequence, host reference, sample type, and study design support the workflow.
6. Can this solution support lentiviral or retroviral integration site mapping?
Yes. Lentiviral and retroviral integration site analysis can include genomic coordinates, read support, gene/proximity annotation, sample-level summaries, and visualization-ready outputs.
7. Can this solution support in vivo CAR-T integration site detection?
Yes. For in vivo CAR-T and other gene-modified cell research samples, we can support integration site detection and host-vector junction analysis when the vector sequence, host reference, sample type, and study design support the workflow.
8. Can the analysis detect partial vector insertions or rearranged insertion structures?
It can support review of partial vector insertions, rearranged insertions, concatemer-like insertions, and complex junctions when the sequencing strategy, read length, enrichment design, and data quality provide enough evidence.
9. What sample and vector information should I provide?
Useful information includes sample type, host species, host reference genome, vector sequence, vector type, sample grouping, expected integration context, prior data, and research goal.
10. What deliverables can I expect?
Deliverables may include integration site tables, genomic coordinates, junction sequence summaries, read support tables, gene/proximity annotation, vector breakend summaries, genome browser tracks, junction diagrams, insertion structure summaries, and project reports.
11. Can single-cell or spatial analysis be added?
Single-cell or spatial analysis can be considered only when the study design supports a cell-type or tissue-context follow-up question. These methods are not primary integration site discovery methods.
12. Is this service intended for clinical safety assessment or release testing?
No. This service is designed for research-focused integration site analysis, host-vector junction review, insertion structure interpretation, and bioinformatics deliverables. It is not intended for clinical safety assessment, release testing, patient monitoring, or regulatory validation.
Literature Case: Long-Read TES Adds Structure Context to AAV Integration Analysis
Published Research Highlight
Journal: Molecular Therapy Methods & Clinical Development
Published: 2024
Background
AAV vector integration analysis often requires both site discovery and structural interpretation. Short-read data can support integration site quantitation and distribution analysis, but longer reads can provide additional context for integration-site length, recombination, and vector rearrangement.
Methods
The study compared short-read and long-read target-enrichment sequencing using AAV-treated monkey samples, in vitro lentiviral-treated samples, a stable cell line, and an engineered spike-in control.
The researchers evaluated insertion sites, vector and host breakends, vector sequence representation, and rearrangement patterns.
Results
- Short-read target-enrichment sequencing identified more integration sites because of deeper per-base coverage.
- Long-read target-enrichment sequencing identified fewer sites but enabled measurement of integration-site length and recombination patterns.
- Long-read target-enrichment sequencing revealed vector rearrangement in 4%–40% of integration sites in AAV-treated animals.
- Figure 4 of the study shows the vector source locus and host location of insertion sites in the long- and short-read data.
Long-read and short-read target-enrichment sequencing can both support integration site analysis, while long-read data adds structural context for vector-host junctions and recombination patterns.
Conclusion
This case supports a balanced integration analysis strategy. Short-read or targeted approaches may be useful when site discovery depth is central, while long-read sequencing becomes valuable when the research question requires host-vector junction context, insertion structure, integration-site length, or recombination patterns.
Related Publications
The following publications support the scientific rationale for long-read integration site analysis, AAV vector integration, CAR-T integration profile research, and structural context analysis.
Comparison and cross-validation of long-read and short-read target-enrichment sequencing methods to assess AAV vector integration into host genome
Journal: Molecular Therapy Methods & Clinical Development (full text also available in PMC)
Year: 2024
Clonal kinetics and single-cell transcriptional profiling of CAR-T cells in patients undergoing CD19 CAR-T immunotherapy
Journal: Nature Communications
Year: 2020
Long-read sequencing identifies novel structural variations in colorectal cancer metastases
Journal: Genome Biology
Year: 2021
