Service Overview

Integration Site Analysis (ISA) is a molecular characterization service that determines the precise chromosomal location, copy number, and structural integrity of foreign DNA insertions. It allows researchers to validate transgenic models by using long-read sequencing to resolve complex tandem arrays, detect plasmid backbone contamination, and verify the safety of flanking endogenous genes.

By moving beyond simple presence/absence detection (PCR) or low-resolution imaging (FISH), we provide a definitive "Genomic Passport" for your biological product. Whether you are characterizing a founder mouse line so you can distinguish hemizygous from homozygous offspring, or validating a Master Cell Bank (MCB) for viral vector production, our pipeline verifies that your transgene is exactly where it should be, and nowhere else.

Why Choose CD Genomics?

  • Single-Nucleotide Resolution: Pinpoint exact chromosomal coordinates (e.g., Chr9:124,116,557).
  • Structural Clarity: Resolve massive tandem arrays (>20 copies), inversions, and "invisible" backbone contamination.
  • Regulatory Compliance: Generate robust data packages for FDA/EMA safety assessments and GMO deregulation.

The Challenge: Risks of the "Black Box" Model

For decades, the generation of transgenic models has relied on methods like pronuclear injection, biolistics, and viral transduction. While effective at introducing DNA, these methods share a common flaw: Random Integration.

Without precise mapping, you are operating with incomplete data. This "black box" presents three critical risks to your pre-clinical research pipeline that standard genotyping cannot detect:

Endogenous Gene Disruption (Insertional Mutagenesis)

The insertion of foreign DNA is a violent genetic event. If the transgene lands within an exon or a regulatory element (promoter/enhancer) of an endogenous gene, it can create "off-target" phenotypes.

  • The Quantitative Risk: Studies estimate that 5–10% of transgenic mice exhibit phenotypes unrelated to the transgene itself, but rather due to the disruption of native genes.
  • The CD Genomics Solution: We map the insertion relative to the host annotation (RefSeq/Ensembl) to verify that flanking genes (e.g., Ccr2 and Ccr5 in the Oct4:EGFP mouse) remain intact.

Unstable Tandem Arrays & Silencing

Transgenes rarely integrate as single, clean copies. They often form complex head-to-tail concatemers (tandem arrays).

  • The Quantitative Risk: High copy numbers (often exceeding 20 copies per locus) can trigger epigenetic silencing (heterochromatin formation), leading to variable expression or total loss of the trait over generations.
  • The CD Genomics Solution: We resolve the internal architecture of the array, identifying inverted repeats, truncations, and scrambling that serve as triggers for silencing.

"Hidden" Contamination (Vector Backbones)

If the bacterial backbone (origins of replication, antibiotic resistance markers) is not completely removed during plasmid linearization, fragments of it can accidentally co-integrate with the transgene.

  • The Regulatory Risk: The presence of bacterial DNA (e.g., AmpR or E. coli genomic fragments) in a therapeutic vector or commercial food crop is a major regulatory non-compliance issue.
  • The CD Genomics Solution: Our alignment algorithms screen specifically for vector backbone retention, ensuring your product meets safety standards for "clean" events.

Technology Platforms: Nanopore & TLA

We employ a dual-technology approach, selecting the platform based on your specific genome complexity, available DNA quality, and project goals.

A. Nanopore Ultra-Long Sequencing (ONT)

The Gold Standard for Structural Complexity and Polyploidy.

Oxford Nanopore Technologies (ONT) sequencing reads native DNA strands without PCR amplification. We routinely achieve read-length N50 values above 50 kb (half of all sequenced bases lie in reads at least that long), with maximum reads often surpassing 100 kb. This length is critical because it allows reads to "bridge" the entire insertion site, anchoring in unique flanking DNA on both sides of a repetitive transgene array.
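The N50 statistic and the bridging requirement can be made concrete with a short Python sketch. It assumes read lengths and alignment coordinates are already available as plain integers, with coordinates measured on a reference that includes the inserted sequence; the function names are hypothetical, and the 500 bp anchor default is borrowed from the >500 bp host-sequence threshold used in the analysis phase rather than being a fixed parameter.

```python
from typing import List


def n50(read_lengths: List[int]) -> int:
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def bridges_insertion(read_start: int, read_end: int,
                      insert_start: int, insert_end: int,
                      min_anchor: int = 500) -> bool:
    """True if a read covers the whole insertion plus at least `min_anchor` bp
    of flanking sequence on each side (all coordinates on the same reference)."""
    return (read_start <= insert_start - min_anchor
            and read_end >= insert_end + min_anchor)
```

For example, `n50([100, 80, 60, 40, 20])` is 80, because the two longest reads already contain more than half of the 300 total bases.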

Figure: Nanopore reads bridging complex transgene insertions and repetitive arrays.

B. Targeted Locus Amplification (TLA)

The Gold Standard for SNVs and Sensitivity.

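TLA is a proximity-ligation NGS strategy that exploits the physical proximity of DNA strands in the cross-linked nucleus. After cross-linking, digestion, and religation, sequences lying near the transgene become physically joined to it; inverse PCR with transgene-specific primers then amplifies the transgene together with its genomic "neighbors" for short-read sequencing.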
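Because inverse PCR from transgene-specific primers enriches sequences physically close to the transgene, TLA reads that map to the host genome pile up around the integration site. The sketch below locates that enrichment peak by simple binning; the input format, the 10 kb bin size, and the function name are illustrative assumptions, and a real analysis would also have to account for background ligation events and reads from the transgene itself.

```python
from collections import Counter
from typing import Iterable, Tuple


def tla_peak(read_positions: Iterable[Tuple[str, int]],
             bin_size: int = 10_000) -> Tuple[str, int, int]:
    """Return (chromosome, bin_start, read_count) for the most read-dense host bin.

    read_positions: (chromosome, position) of host-mapped reads from a TLA library
    (hypothetical, already-parsed input).
    """
    # Count reads per (chromosome, bin) pair.
    counts = Counter((chrom, pos // bin_size) for chrom, pos in read_positions)
    if not counts:
        raise ValueError("no host-mapped reads")
    (chrom, bin_index), n = counts.most_common(1)[0]
    return chrom, bin_index * bin_size, n
```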

Step-by-Step Workflow

Our workflow is streamlined to deliver results in as little as 1-2 weeks from DNA receipt. We strictly adhere to protocols optimized for High Molecular Weight (HMW) DNA preservation.

Phase 1: High Molecular Weight (HMW) DNA Extraction

We use a modified lysis protocol (Proteinase K/RNase A digestion followed by phenol-chloroform extraction). For crops, specialized buffers remove polyphenols. We require a fragment-size peak above 50 kb.

Phase 2: Library Preparation

We employ ligation-based or rapid transposase-based protocols, depending on project needs, and minimize shearing to preserve ultra-long reads.

Phase 3: Sequencing & Basecalling

Run time is typically 48 hours to generate sufficient data (5–15 Gb per flow cell). High-accuracy basecalling algorithms then convert the raw electrical signal into nucleotide sequence.
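Flow-cell yield, genome size, and the number of reads expected to cover a given junction are related by simple arithmetic. The sketch below assumes a fixed read length and uniformly distributed read starts (a Lander-Waterman-style simplification); real read-length distributions are broad, so treat the results as rough planning estimates. The function names are hypothetical; as a reference point, the haploid mouse genome is roughly 2.7 Gb.

```python
def expected_coverage(yield_gb: float, genome_gb: float) -> float:
    """Mean depth over the haploid reference: total sequenced bases / genome size."""
    return yield_gb / genome_gb


def expected_spanning_reads(coverage: float, read_len: int, window_len: int,
                            allele_fraction: float = 0.5) -> float:
    """Expected number of reads that fully contain a window of `window_len` bp.

    A read of length `read_len` contains the window only if it starts within
    (read_len - window_len) bp upstream of it, giving
    coverage * (read_len - window_len) / read_len for one genome copy.
    `allele_fraction` scales this to the share of genome copies carrying the
    insertion (0.5 for a hemizygous insertion in a diploid genome, 1.0 if homozygous).
    """
    if read_len <= window_len:
        return 0.0
    return coverage * allele_fraction * (read_len - window_len) / read_len
```

The model makes the trade-off explicit: the window (for example, 500 bp of host anchor plus the transgene sequence a read must also contain) eats into the useful length of every read, so longer reads raise the expected number of informative reads at the same yield.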

Phase 4: Bioinformatics & Analysis

Reads are aligned to a hybrid reference (host genome plus vector sequence) and filtered for those spanning a host-transgene "junction" with more than 500 bp of host sequence. Candidate junctions are then verified manually.
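The junction filter can be sketched as follows. It assumes each read's alignments to the hybrid reference have already been reduced to simple records labeled "host" or "vector" (a hypothetical structure; in practice these would be derived from the primary and supplementary alignments of each read in a BAM file). A read is kept when a host segment longer than 500 bp sits directly next to a vector segment in read order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One aligned block of a read (hypothetical simplified record, not a BAM API)."""
    target: str       # "host" or "vector" part of the hybrid reference
    read_start: int   # start of the aligned block within the read
    read_end: int     # end (exclusive) of the aligned block within the read


def is_junction_read(segments: List[Segment], min_host_bp: int = 500) -> bool:
    """True if a vector segment is adjacent to more than `min_host_bp` of host sequence."""
    ordered = sorted(segments, key=lambda s: s.read_start)
    # Examine each pair of consecutive segments along the read.
    for left, right in zip(ordered, ordered[1:]):
        if {left.target, right.target} == {"host", "vector"}:
            host = left if left.target == "host" else right
            if host.read_end - host.read_start > min_host_bp:
                return True
    return False
```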

Figure: Step-by-step workflow for transgene integration site analysis, from HMW DNA extraction to bioinformatics reporting.

Comprehensive Bioinformatics Pipeline

We translate raw sequencing data into actionable biological insights. Our analysis pipeline is divided into Standard and Advanced tiers.

Analysis Module | Description | Technical Output
Genome Alignment | Mapping reads to host reference + vector. | .bam files, coverage plots.
Integration Site Mapping | Identification of precise breakpoints. | Chromosome, start/end coordinates (e.g., Chr9:124,116,557).
Copy Number (CNV) | Estimation of transgene copies. | Ratio of transgene depth to host gene depth (e.g., "18 copies").
Structural Variation (SV) | Detection of complex rearrangements. | Identification of inversions, translocations, and concatenations.
Backbone Screening | Detection of E. coli or plasmid DNA. | BLAST results for ori, AmpR, or E. coli genomic DNA.
Flanking Gene Safety | Impact analysis on local genes. | Distance to nearest exon/intron; gene ID of disrupted loci.
Primer Design | Design of junction-PCR assays. | Primer sequences (Fwd/Rev) flanking the insertion site.
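Turning a depth ratio into a copy number needs one correction that is easy to overlook: a single-copy host gene is present twice in a diploid genome, whereas a hemizygous array sits on only one chromosome. The sketch below applies that rescaling. It assumes mean depths have already been computed over the transgene unit and over a single-copy host reference gene; the function name and parameters are hypothetical.

```python
def transgene_copies(transgene_depth: float, host_depth: float,
                     host_ploidy: int = 2, carrying_alleles: int = 1) -> float:
    """Estimate transgene copies per integration locus (per carrying allele).

    transgene_depth: mean read depth over the transgene unit.
    host_depth: mean read depth over a single-copy host gene.
    The raw ratio counts transgene copies relative to the host gene's copies,
    so it is rescaled by host ploidy and by the number of alleles carrying the array.
    """
    if host_depth <= 0:
        raise ValueError("host depth must be positive")
    ratio = transgene_depth / host_depth
    return ratio * host_ploidy / carrying_alleles
```

For example, in a hemizygous diploid animal a transgene depth nine times the host-gene depth corresponds to about 18 copies; the same ratio in a homozygous animal corresponds to about 9 copies per allele. Truncated or rearranged units make any such estimate approximate.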

The "Genotyping Efficiency" Calculation

One of the most valuable outputs of our service is the design of Junction-PCR assays. By identifying the exact breakpoint, we provide you with a 3-primer design:

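  1. Common Forward Primer (host sequence on one side of the insertion)
  2. Reverse Primer (host sequence on the other side; amplifies the wild-type allele)
  3. Reverse Primer (transgene; pairs with the common forward primer to amplify the transgene junction)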

This allows you to distinguish wild-type, hemizygous, and homozygous animals in a single reaction. It replaces expensive and variable qPCR (CNV) assays, potentially saving 40–60% in annual colony maintenance costs.
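The band pattern from a 3-primer junction PCR maps directly onto a genotype, as the short Python sketch below shows. It assumes the insertion is too large for the two host primers to amplify across (true for multi-kilobase arrays under standard PCR conditions), so the host-host product appears only from an allele without the insertion. The function name is hypothetical.

```python
def call_genotype(wt_band: bool, tg_band: bool) -> str:
    """Interpret a 3-primer junction PCR.

    wt_band: product of the two host primers (allele without the insertion).
    tg_band: product of the common host forward primer and the transgene reverse primer.
    """
    if wt_band and tg_band:
        return "hemizygous"
    if wt_band:
        return "wild type"
    if tg_band:
        return "homozygous"
    return "failed reaction"  # no product: repeat with controls
```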

Applications & Scenarios

Our ISA service supports diverse areas of pre-clinical and agricultural research, addressing sector-specific pain points.

Biomedical Research (Mice & Rats)

  • The Problem: A researcher observes lethality in homozygous transgenic mice but not hemizygotes. Is it the transgene dosage, or did the insertion disrupt an essential gene?
  • Our Solution: We map the insertion. If the transgene is found inside an essential gene (e.g., Sox2), the lethality is likely due to the homozygous knockout of that gene, not the transgene expression. This saves months of wasted breeding.
  • Validation: Differentiating "true founders" from mosaics in CRISPR or pronuclear injection projects.

Pre-Clinical Gene Therapy (AAV/Lentivirus)

For IND filings, the FDA requires detailed characterization of the vector product and its interaction with the host.

  • Lentiviral Vectors: We analyze integration sites to assess "bias" towards active transcription units and to screen for preferential integration into proto-oncogenes (insertional oncogenesis risk).
  • AAV Vectors: While primarily episomal, AAV can integrate. We assess genotoxicity, vector fragmentation, and "partial packaging," in which truncated vector genomes are delivered.
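For lentiviral vectors, the bias assessment comes down to overlapping integration-site coordinates with a gene annotation. The minimal sketch below computes the fraction of sites falling inside annotated genes and flags sites near genes on a user-supplied watchlist (for example, a proto-oncogene list). The data formats, the 50 kb window, and the gene names in the test are illustrative assumptions; a real analysis would compare the observed fraction against matched randomly placed control sites.

```python
from typing import Dict, List, Set, Tuple

Gene = Tuple[int, int, str]  # (start, end, symbol); half-open coordinates on one chromosome


def summarize_sites(sites: List[Tuple[str, int]],
                    genes: Dict[str, List[Gene]],
                    watchlist: Set[str],
                    window: int = 50_000) -> Tuple[float, List[Tuple[str, int, str]]]:
    """Return (fraction of sites inside any gene, sites within `window` bp of a watchlist gene).

    sites: (chromosome, position) integration sites.
    genes: per-chromosome gene intervals (hypothetical annotation format).
    """
    if not sites:
        return 0.0, []
    in_gene = 0
    flagged: List[Tuple[str, int, str]] = []
    for chrom, pos in sites:
        chrom_genes = genes.get(chrom, [])
        # Count the site once if it falls inside at least one gene body.
        if any(start <= pos < end for start, end, _ in chrom_genes):
            in_gene += 1
        # Flag proximity to any watchlist gene, extended by `window` on both sides.
        for start, end, symbol in chrom_genes:
            if symbol in watchlist and start - window <= pos < end + window:
                flagged.append((chrom, pos, symbol))
    return in_gene / len(sites), flagged
```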

Agricultural Biotechnology (GM Crops)

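  • Polyploid Challenges: In crops such as canola (Brassica napus, AACC genome), the A and C sub-genomes share highly similar homeologous sequences, so short reads often cannot determine which sub-genome carries the transgene. Our ultra-long reads extend far enough into the flanking DNA to resolve this context.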
  • Deregulation: Regulatory agencies (USDA, EFSA) require proof that the insertion is stable and lacks bacterial backbone. We provide the "Clean Event" certificate, confirming the absence of Agrobacterium binary-vector backbone beyond the T-DNA borders and of plasmid origins.

Sample Requirements

To ensure the success of long-read sequencing (Nanopore), the integrity of the input DNA is the single most critical factor. Sheared or degraded DNA will fail to produce the long reads necessary to span complex insertions.

We recommend the following specifications for optimal results:

Sample Type | Input Amount | Concentration | Purity Criteria | Storage & Transport
Genomic DNA (gDNA) | ≥ 5 µg | ≥ 50 ng/µL | OD260/280: 1.8–2.0; OD260/230: 2.0–2.2 | Ship on dry ice. Do not vortex. Wide-bore tips only.
Fresh Tissue (Animal) | > 100 mg | N/A | N/A | Flash frozen in liquid N2. Ship on dry ice. Preferred: spleen, liver, kidney.
Plant Tissue | > 2 g | N/A | N/A | Young leaves (dark treated). Flash frozen immediately to prevent degradation by polyphenols.
Cell Lines | 5 × 10^6 cells | N/A | N/A | Washed 2x with PBS. Pelleted and flash frozen.
Blood | ≥ 2 mL | N/A | N/A | EDTA anticoagulant. Ship at 4°C (do not freeze whole blood).
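The gDNA specification, together with the fragment-size peak above 50 kb required at extraction, can be expressed as a simple acceptance check for self-screening before shipment. This is a sketch only: the record fields and the function name are hypothetical, and fragment size is taken as already measured by whatever sizing method is available.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GdnaSample:
    """Measured properties of a gDNA sample (hypothetical record)."""
    mass_ug: float           # total DNA mass in µg
    conc_ng_per_ul: float    # concentration in ng/µL
    a260_280: float          # OD260/280 purity ratio
    a260_230: float          # OD260/230 purity ratio
    peak_fragment_kb: float  # fragment-size peak in kb


def qc_issues(s: GdnaSample) -> List[str]:
    """Return deviations from the gDNA specification; an empty list means the sample passes."""
    issues = []
    if s.mass_ug < 5:
        issues.append("input below 5 µg")
    if s.conc_ng_per_ul < 50:
        issues.append("concentration below 50 ng/µL")
    if not 1.8 <= s.a260_280 <= 2.0:
        issues.append("OD260/280 outside 1.8-2.0")
    if not 2.0 <= s.a260_230 <= 2.2:
        issues.append("OD260/230 outside 2.0-2.2")
    if s.peak_fragment_kb <= 50:
        issues.append("fragment-size peak not above 50 kb")
    return issues
```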

Note: For HMW DNA extraction services, please inquire with our team. We utilize specialized CTAB (for plants) and magnetic bead protocols (for animals) to preserve fragment length.

Case Study: Characterizing the Oct4:EGFP Mouse

Title: Resolving Complex Tandem Arrays and Bacterial Contamination in Oct4:EGFP Mice via Nanopore Sequencing.

Citation: Nicholls, P. K., et al. (2019). "Locating and Characterizing a Transgene Integration Site by Nanopore Sequencing." G3: Genes|Genomes|Genetics, 9(5), 1481–1486.

The Tg(Pou5f1-EGFP)2Mnn (also known as Oct4:EGFP) is a critical transgenic mouse line used to visualize germ cells. Despite its widespread use in stem cell research, the genomic location of the transgene remained a "black box." Researchers faced challenges in distinguishing homozygous from hemizygous mice and feared potential disruption of endogenous genes. Traditional PCR walking had failed due to the highly repetitive nature of the insertion.

  • DNA Source: High-molecular-weight DNA was extracted from mouse liver using a phenol-chloroform protocol optimized for ultra-long fragments.
  • Sequencing: The library was prepared using rapid sequencing chemistry and sequenced on the Oxford Nanopore platform.
  • Analysis: Reads were aligned using long-read alignment tools to the mouse reference genome (mm10), the EGFP coding sequence, and the E. coli DH5α genome.

The Nanopore data provided a breakthrough in characterizing this model, revealing complexity that short-read sequencing had missed:

  1. Location Identified: The transgene was mapped to a sub-telomeric region on Chromosome 9, specifically at an intergenic region between Ccr2 and Ccr5.
  2. Structural Revelation: The insertion was not a simple copy but a massive tandem array containing approximately 26 copies of the transgene (spanning ~450 kb).
  3. Contamination Detected: Unexpectedly, the sequencing revealed a 6.2 kb fragment of E. coli genomic DNA embedded within the transgene array, a contaminant likely introduced with the construct during the original microinjection that had gone undetected for years.
  4. Endogenous Safety: It was confirmed that the insertion caused a 686 bp deletion in the host genome but did not disrupt the coding sequences of the flanking chemokine receptor genes.

Figure: Dot plot of the Oct4:EGFP tandem array and E. coli DNA contamination from Nanopore integration site analysis.

The study demonstrated that long-read sequencing could resolve complex, repetitive integration events that short-read methods miss. The mapping allowed the team to design a specific junction-PCR assay, enabling cost-effective distinction between hemizygous and homozygous mice and significantly reducing colony management costs.

References

  1. Nicholls, P. K., et al. (2019). "Locating and Characterizing a Transgene Integration Site by Nanopore Sequencing." G3: Genes|Genomes|Genetics, 9(5), 1481–1486.
  2. Jupe, F., et al. (2019). "The complex architecture and epigenomic impact of plant T-DNA insertions." PLOS Genetics, 15(1), e1007819.
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.

CD Genomics is transforming biomedical potential into precision insights through seamless sequencing and advanced bioinformatics.

Copyright © CD Genomics. All Rights Reserved.