Integration Site Analysis (ISA) Service for Transgene Integration
Eliminate the "Black Box" of Random Integration: Precision Mapping of Transgene Location, Structure, and Integrity.
From the development of complex polyploid GM crops to Investigational New Drug (IND)-enabling studies for gene therapy, the "random" nature of transgene integration remains a critical variable. CD Genomics replaces assumption with direct sequence evidence.
We utilize Nanopore Ultra-Long Sequencing and Targeted Locus Amplification (TLA) to transform "black box" insertions into fully characterized genetic loci.
Integration Site Analysis (ISA) is a molecular characterization service that determines the precise chromosomal location, copy number, and structural integrity of foreign DNA insertions. It allows researchers to validate transgenic models by using long-read sequencing to resolve complex tandem arrays, detect plasmid backbone contamination, and verify the safety of flanking endogenous genes.
By moving beyond simple presence/absence detection (PCR) or low-resolution imaging (FISH), we provide a definitive "Genomic Passport" for your biological product. Whether you are characterizing a founder mouse line to separate hemizygotes from homozygotes, or validating a Master Cell Bank (MCB) for viral vector production, our pipeline ensures that your transgene is exactly where it should be—and nowhere else.
Structural Clarity: Resolve massive tandem arrays (>20 copies), inversions, and "invisible" backbone contamination.
Regulatory Compliance: Generate robust data packages for FDA/EMA safety assessments and GMO deregulation.
The Challenge: Risks of the "Black Box" Model
For decades, the generation of transgenic models has relied on methods like pronuclear injection, biolistics, and viral transduction. While effective at introducing DNA, these methods share a common flaw: Random Integration.
Without precise mapping, you are operating with incomplete data. This "black box" presents three critical risks to your pre-clinical research pipeline that standard genotyping cannot detect:
Endogenous Gene Disruption (Insertional Mutagenesis)
The insertion of foreign DNA is a violent genetic event. If the transgene lands within the coding region (exon) or regulatory element (promoter/enhancer) of an endogenous gene, it can create "off-target" phenotypes.
The Quantitative Risk: Studies estimate that 5–10% of transgenic mice exhibit phenotypes unrelated to the transgene itself, but rather due to the disruption of native genes.
The CD Genomics Solution: We map the insertion relative to the host annotation (RefSeq/Ensembl) to certify that flanking genes (e.g., Ccr2, Ccr5 in mouse models) remain intact and functional.
Unstable Tandem Arrays & Silencing
Transgenes rarely integrate as single, clean copies. They often form complex head-to-tail concatemers (tandem arrays).
The Quantitative Risk: High copy numbers (often exceeding 20 copies per locus) can trigger epigenetic silencing (heterochromatin formation), leading to variable expression or total loss of the trait over generations.
The CD Genomics Solution: We resolve the internal architecture of the array, identifying inverted repeats, truncations, and scrambling that serve as triggers for silencing.
"Hidden" Contamination (Vector Backbones)
During plasmid linearization, fragments of the bacterial backbone (origins of replication, antibiotic resistance markers) can accidentally co-integrate.
The Regulatory Risk: The presence of bacterial DNA (e.g., AmpR or E. coli genomic fragments) in a therapeutic vector or commercial food crop is a major regulatory non-compliance issue.
The CD Genomics Solution: Our alignment algorithms screen specifically for vector backbone retention, ensuring your product meets safety standards for "clean" events.
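The flanking-gene check described under the first risk can be pictured as an interval lookup against the host annotation. The following is a minimal sketch, not the production pipeline: the `Exon` record, the `classify_insertion` function, and the gene names and coordinates are hypothetical placeholders, and a real analysis would read RefSeq or Ensembl annotation files instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def _distance(pos, exon):
    """Distance in bp from pos to the exon (0 if pos lies inside it)."""
    if exon.start <= pos <= exon.end:
        return 0
    return min(abs(pos - exon.start), abs(pos - exon.end))


def classify_insertion(chrom, pos, exons):
    """Return (label, gene, distance_to_nearest_exon_bp) for an insertion site."""
    candidates = [e for e in exons if e.chrom == chrom]
    if not candidates:
        return ("unannotated", None, None)

    nearest = min(candidates, key=lambda e: _distance(pos, e))
    if _distance(pos, nearest) == 0:
        return ("exonic", nearest.gene, 0)

    # Approximate each gene span by its first and last annotated exon.
    spans = {}
    for e in candidates:
        lo, hi = spans.get(e.gene, (e.start, e.end))
        spans[e.gene] = (min(lo, e.start), max(hi, e.end))

    for gene, (lo, hi) in spans.items():
        if lo <= pos <= hi:
            d = min(_distance(pos, e) for e in candidates if e.gene == gene)
            return ("intronic", gene, d)

    return ("intergenic", nearest.gene, _distance(pos, nearest))


# Hypothetical annotation, for illustration only.
annotation = [
    Exon("GeneA", "chr9", 1000, 1200),
    Exon("GeneA", "chr9", 2000, 2200),
    Exon("GeneB", "chr9", 10000, 10500),
]
print(classify_insertion("chr9", 6000, annotation))
```

An "intergenic" call still reports the nearest gene and its distance, because an insertion close to a promoter or enhancer may deserve a closer look even when no exon is hit.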
Technology Platforms: Nanopore & TLA
We employ a dual-technology approach, selecting the platform based on your specific genome complexity, available DNA quality, and project goals.
A. Nanopore Ultra-Long Sequencing (ONT)
The Gold Standard for Structural Complexity and Polyploidy.
Oxford Nanopore Technologies (ONT) sequencing reads native DNA strands without PCR amplification. We routinely generate read N50 values above 50 kb, with the longest reads often surpassing 100 kb. This length is critical because it allows us to "bridge" the insertion site, anchoring into unique flanking DNA on both sides of a repetitive transgene array.
Polyploid Resolution: Essential for AgBio. Long reads can distinguish between homeologous sub-genomes (e.g., distinguishing the A-genome vs. C-genome in Brassica napus), which short reads often collapse into a single consensus.
Chemistry & Throughput: We utilize high-efficiency ligation sequencing chemistries for maximum throughput and rapid sequencing chemistries for ultra-long read preservation. Sequencing is performed on high-throughput flow cells depending on the required depth (typically 1.8-fold to 10-fold haploid genome coverage is sufficient for mapping).
Epigenetic Data: As Nanopore sequences native DNA, we can simultaneously detect CpG methylation patterns on the transgene promoter, offering early insight into potential silencing mechanisms.
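To make the coverage figures above concrete, the arithmetic can be sketched as follows. This is a rough planning aid under stated assumptions, not a quote of our run design: the function names are hypothetical, the 2.7 Gb haploid genome size is an approximate figure for mouse, the per-flow-cell yield is a value you supply, and read-length effects are ignored.

```python
import math


def required_yield_gb(haploid_genome_gb, fold_coverage):
    """Total sequence (Gb) needed for a given haploid genome coverage."""
    return haploid_genome_gb * fold_coverage


def flow_cells_needed(total_yield_gb, yield_per_flow_cell_gb):
    """Whole flow cells needed to reach the target yield."""
    return math.ceil(total_yield_gb / yield_per_flow_cell_gb)


def expected_reads_over_site(fold_coverage, allele_fraction):
    """Average number of reads covering one integrated allele.

    allele_fraction is the share of haploid genomes carrying the insertion,
    e.g. 0.5 for a hemizygous diploid animal (an assumption of this sketch).
    """
    return fold_coverage * allele_fraction


# Example: ~2.7 Gb haploid genome (approximate mouse value) at 10-fold coverage.
target = required_yield_gb(2.7, 10)
print(target, flow_cells_needed(target, 15), expected_reads_over_site(10, 0.5))
```

The last function shows why low coverage can be enough for mapping yet leave few reads over a hemizygous site: at 1.8-fold coverage, an insertion carried on one of two alleles is covered by fewer than one read on average.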
B. Targeted Locus Amplification (TLA)
The Gold Standard for SNVs and Sensitivity.
TLA is a proximity-ligation NGS strategy that turns the entire genome into a library of "neighbors." It relies on the physical proximity of DNA strands in the cross-linked nucleus.
Mechanism:
Crosslinking: DNA is crosslinked in situ to preserve physical proximity.
Digestion & Ligation: DNA is digested with a frequent cutter and re-ligated to form circular molecules.
Selective Amplification: Using just one specific primer pair complementary to a short unique sequence in the transgene, we amplify the transgene and the flanking genomic DNA that was ligated to it.
Advantages: This method yields extremely high coverage (>100×) of the transgene itself, making it ideal for detecting Single Nucleotide Variants (SNVs) and rare integration events in heterogeneous cell populations.
Step-by-Step Workflow
Our workflow is streamlined to deliver results in as little as 1-2 weeks from DNA receipt. We strictly adhere to protocols optimized for High Molecular Weight (HMW) DNA preservation.
Phase 1: Extraction
High Molecular Weight (HMW) DNA Extraction
We use a modified lysis protocol (Proteinase K/RNase A + phenol-chloroform). For crops, specialized buffers remove polyphenols. We require a fragment size peak above 50 kb.
Phase 2: Prep
Library Preparation
We employ ligation-based or rapid transposase-based protocols depending on needs. We minimize shearing to preserve ultra-long reads.
Phase 3: Sequence
Sequencing & Basecalling
Run time is typically 48 hours to generate sufficient data (5–15 Gb per flow cell). High-accuracy basecalling algorithms convert signals to sequence.
Phase 4: Analysis
Bioinformatics & Analysis
Reads are aligned to a hybrid reference (host genome plus vector sequence). We then filter for reads that span a "Junction", requiring more than 500 bp of host sequence on the read, and manually verify each candidate site.
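The junction filter in Phase 4 can be sketched in a few lines. This is an illustrative simplification rather than our actual pipeline: the `Alignment` record and `find_junction_reads` function are hypothetical, the vector contig name is a placeholder, and a real implementation would parse alignments from a BAM file produced by a long-read aligner.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    read_id: str
    ref_name: str   # contig name in the hybrid (host + vector) reference
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)


def find_junction_reads(alignments, vector_contigs, min_host_bp=500):
    """Return {read_id: [host contigs]} for reads that hit both vector and host.

    A read qualifies when it has at least one vector alignment and at least
    one host alignment longer than min_host_bp.
    """
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    junction_reads = {}
    for read_id, alns in by_read.items():
        has_vector = any(a.ref_name in vector_contigs for a in alns)
        host_contigs = sorted({
            a.ref_name
            for a in alns
            if a.ref_name not in vector_contigs
            and (a.ref_end - a.ref_start) > min_host_bp
        })
        if has_vector and host_contigs:
            junction_reads[read_id] = host_contigs
    return junction_reads


# Hypothetical alignments: r1 spans a junction, r2 is host-only,
# r3 has too little host sequence to be trusted.
example = [
    Alignment("r1", "chr9", 1000, 3000),
    Alignment("r1", "vector", 0, 4000),
    Alignment("r2", "chr9", 5000, 9000),
    Alignment("r3", "vector", 0, 4000),
    Alignment("r3", "chr4", 100, 400),
]
print(find_junction_reads(example, {"vector"}))
```

Reads that pass this filter are only candidates. Short or low-quality host anchors can come from chimeric reads or repeats, which is why manual verification follows.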
Comprehensive Bioinformatics Pipeline
We translate raw sequencing data into actionable biological insights. Our analysis pipeline is divided into Standard and Advanced tiers.
Copy Number: Ratio of transgene depth to host gene depth. Output: estimated copy number (e.g., "18 copies").
Structural Variation (SV): Detection of complex rearrangements. Output: identification of inversions, translocations, and concatenations.
Backbone Screening: Detection of E. coli or plasmid DNA. Output: BLAST results for Ori, AmpR, or genomic E. coli.
Flanking Gene Safety: Impact analysis on local genes. Output: distance to nearest exon/intron; gene ID of disrupted loci.
Primer Design: Design of Junction-PCR assays. Output: primer sequences (Fwd/Rev) flanking the insertion site.
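The depth-ratio copy number estimate listed above can be illustrated with a short calculation. This is a sketch under assumptions, not our exact method: the function name and the depth values are hypothetical, medians are used simply to dampen outliers, and the optional allele arguments are one possible way to correct for zygosity.

```python
from statistics import median


def estimate_copy_number(transgene_depths, host_depths,
                         host_alleles=1, transgene_alleles=1):
    """Estimate transgene copies from read depth.

    With the defaults this is the plain ratio of transgene depth to
    single-copy host gene depth. For a hemizygous diploid animal, passing
    host_alleles=2 and transgene_alleles=1 rescales the ratio to copies
    per integrated allele (an assumption of this sketch).
    """
    host = median(host_depths)
    if host <= 0:
        raise ValueError("host reference depth must be positive")
    ratio = median(transgene_depths) / host
    return ratio * host_alleles / transgene_alleles


# Hypothetical per-window depths over the transgene and a host control gene.
print(estimate_copy_number([180, 176, 184], [10, 10, 10]))
```

Depth-based estimates are most reliable when the host control region is truly single-copy and free of mapping artifacts; direct counting of repeat units on spanning reads, where available, is a useful cross-check.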
The "Genotyping Efficiency" Calculation
One of the most valuable outputs of our service is the design of Junction-PCR assays. By identifying the exact breakpoint, we provide you with a 3-primer design:
Common Forward Primer (Host)
Reverse Primer (Host)
Reverse Primer (Transgene)
This allows you to distinguish Wild Type, Hemizygous, and Homozygous animals in a single PCR reaction. This replaces expensive and variable qPCR (CNV) assays, potentially saving 40–60% in annual colony maintenance costs.
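The logic for reading a 3-primer gel can be written down explicitly. The sketch below assumes the usual behavior of such assays: the host forward and host reverse primers amplify only the intact wild-type allele, because the insertion is too large to amplify across, while the host forward and transgene reverse primers amplify only the junction. The function name is hypothetical.

```python
def call_genotype(wild_type_band, junction_band):
    """Interpret a 3-primer Junction-PCR result.

    wild_type_band: product from host forward + host reverse primers.
    junction_band:  product from host forward + transgene reverse primers.
    """
    if wild_type_band and junction_band:
        return "hemizygous"
    if junction_band:
        return "homozygous"
    if wild_type_band:
        return "wild type"
    return "no call (PCR failure or degraded template)"


for bands in [(True, False), (True, True), (False, True), (False, False)]:
    print(bands, "->", call_genotype(*bands))
```

Treating the absence of both bands as "no call" rather than as a genotype guards against misreading a failed reaction.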
Applications & Scenarios
Our ISA service supports diverse areas of pre-clinical and agricultural research, addressing sector-specific pain points.
Biomedical Research (Mice & Rats)
The Problem: A researcher observes lethality in homozygous transgenic mice but not hemizygotes. Is it the transgene dosage, or did the insertion disrupt an essential gene?
Our Solution: We map the insertion. If the transgene is found inside an essential gene (e.g., Sox2), the lethality is likely due to the homozygous knockout of that gene, not the transgene expression. This saves months of wasted breeding.
Validation: Differentiating "true founders" from mosaics in CRISPR or pronuclear injection projects.
Pre-Clinical Gene Therapy (AAV/Lentivirus)
For IND filings, the FDA requires detailed characterization of the vector product and its interaction with the host.
Lentiviral Vectors: We analyze integration sites to assess "bias" towards active transcription units and ensure no preferential integration into proto-oncogenes (insertional oncogenesis risk).
AAV Vectors: While primarily episomal, AAV can integrate. We assess genomic toxicity, vector fragmentation, and the presence of "partial packaging" where truncated genomes are delivered.
Agricultural Biotechnology (GM Crops)
Polyploid Challenges: In crops like Canola (Brassica napus, AACC genome), short reads cannot determine if the transgene is in the A or C sub-genome. Our ultra-long reads resolve this context.
Deregulation: Regulatory agencies (USDA, EFSA) require proof that the insertion is stable and lacks bacterial backbone. We provide the "Clean Event" certificate, confirming the absence of Agrobacterium border sequences or plasmid origins.
Sample Requirements
To ensure the success of long-read sequencing (Nanopore), the integrity of the input DNA is the single most critical factor. Sheared or degraded DNA will fail to produce the long reads necessary to span complex insertions.
We recommend the following specifications for optimal results:
| Sample Type | Input Amount | Concentration | Purity Criteria | Storage & Transport |
|---|---|---|---|---|
| Genomic DNA (gDNA) | ≥ 5 µg | ≥ 50 ng/µL | OD260/280: 1.8-2.0; OD260/230: 2.0-2.2 | Dry ice. Do not vortex. Wide-bore tips only. |
| Fresh Tissue (Animal) | > 100 mg | N/A | N/A | Flash frozen in liquid N2. Ship on dry ice. Preferred: Spleen, Liver, Kidney. |
| Plant Tissue | > 2 g | N/A | N/A | Young leaves (dark treated). Flash frozen immediately to prevent degradation by polyphenols. |
| Cell Lines | 5 x 10^6 cells | N/A | N/A | Washed 2x with PBS. Pelleted and flash frozen. |
| Blood | ≥ 2 mL | N/A | N/A | EDTA anticoagulant. Ship at 4°C (do not freeze whole blood). |
Note: For HMW DNA extraction services, please inquire with our team. We utilize specialized CTAB (for plants) and magnetic bead protocols (for animals) to preserve fragment length.
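Before shipping purified gDNA, the specifications above can be checked with a small script. This is only a convenience sketch: the function name is hypothetical, the thresholds are copied from the gDNA row of the sample requirements, and passing the check does not replace fragment-size QC.

```python
def check_gdna_sample(amount_ug, conc_ng_per_ul, od260_280, od260_230):
    """Return a list of problems; an empty list means the sample meets spec."""
    problems = []
    if amount_ug < 5:
        problems.append("input amount below 5 µg")
    if conc_ng_per_ul < 50:
        problems.append("concentration below 50 ng/µL")
    if not 1.8 <= od260_280 <= 2.0:
        problems.append("OD260/280 outside 1.8-2.0")
    if not 2.0 <= od260_230 <= 2.2:
        problems.append("OD260/230 outside 2.0-2.2")
    return problems


# Hypothetical measurements for two samples.
print(check_gdna_sample(6.0, 80, 1.85, 2.10))
print(check_gdna_sample(3.0, 80, 1.60, 2.10))
```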
Case Study: Characterizing the Oct4:EGFP Mouse
Title: Resolving Complex Tandem Arrays and Bacterial Contamination in Oct4:EGFP Mice via Nanopore Sequencing.
Citation: Nicholls, P. K., et al. (2019). "Locating and Characterizing a Transgene Integration Site by Nanopore Sequencing." G3: Genes|Genomes|Genetics, 9(5), 1481–1486.
Tg(Pou5f1-EGFP)2Mnn (also known as Oct4:EGFP) is a transgenic mouse line widely used to visualize germ cells. Despite its widespread use in stem cell research, the genomic location of the transgene remained a "black box." Researchers faced challenges in distinguishing homozygous from hemizygous mice and feared potential disruption of endogenous genes. Traditional PCR walking had failed due to the highly repetitive nature of the insertion.
DNA Source: High-molecular-weight DNA was extracted from mouse liver using a phenol-chloroform protocol optimized for ultra-long fragments.
Sequencing: The library was prepared using rapid sequencing chemistry and sequenced on the Oxford Nanopore platform.
Analysis: Reads were aligned using long-read alignment tools to the mouse reference genome (mm10), the EGFP coding sequence, and the E. coli DH5α genome.
The Nanopore data provided a breakthrough in characterizing this model, revealing complexity that short-read sequencing had missed:
Location Identified: The transgene was mapped to a sub-telomeric region on Chromosome 9, specifically at an intergenic region between Ccr2 and Ccr5.
Structural Revelation: The insertion was not a simple copy but a massive tandem array containing approximately 26 copies of the transgene (spanning ~450 kb).
Contamination Detected: Unexpectedly, the sequencing revealed a 6.2 kb fragment of E. coli genomic DNA embedded within the transgene array—a contamination event likely occurring during the original microinjection that had gone undetected for years.
Endogenous Safety: It was confirmed that the insertion caused a 686 bp deletion in the host genome but did not disrupt the coding sequences of the flanking chemokine receptor genes.
The study demonstrated that long-read sequencing could resolve complex, repetitive integration events that short-read methods miss. The mapping allowed the team to design a specific Junction-PCR assay, enabling the cost-effective distinction between hemizygous and homozygous mice and significantly reducing colony management costs.
Frequently Asked Questions
How does Nanopore sequencing differ from TLA for transgene mapping?
Nanopore sequencing uses ultra-long reads to physically span the entire insertion site and flanking regions. It is ideal for complex, repetitive arrays and large structural variants, especially in plants. TLA (Targeted Locus Amplification) uses cross-linking and PCR to amplify the transgene's neighborhood. TLA is excellent for detecting Single Nucleotide Variants (SNVs) with high sensitivity but relies on short-read sequencing. We will recommend the best approach based on your project.
Can you detect if the transgene is inserted in multiple chromosomes?
Yes. Our bioinformatics pipeline maps reads to the entire reference genome. If there are multiple integration sites (e.g., one on Chr 4 and one on Chr 12), the alignment will reveal distinct junctions for each insertion event, and we can estimate the copy number at each specific locus.
Why is "backbone contamination" a problem for my transgenic model?
Bacterial backbone sequences (like antibiotic resistance genes AmpR or KanR) can trigger immune responses in gene therapy vectors and are strictly regulated in GMO crops. In research mice, they can cause unexpected silencing effects due to their high GC content or bacterial methylation patterns. Identifying them is crucial for regulatory compliance and model stability.
Do I need to provide the sequence of my transgene?
Yes. To perform the alignment, we need the sequence of the vector/plasmid used for transfection or injection. If you suspect the vector has been modified or rearranged, we can perform de novo assembly of the insert, but the original reference helps guide the analysis.
Can you distinguish between hemizygous and homozygous animals?
The sequencing itself maps the location and structure. Using the specific junction sequences we identify, we can then design a PCR assay for you. With a 3-primer design, hemizygous animals typically show two bands (the wild-type and junction products), while homozygous animals show only the junction band. This lets you genotype your colony without sequencing every individual.
What is the advantage of using this service for GM crops compared to standard PCR walking?
GM crops often have large, polyploid genomes with high repetitive content (transposons). PCR walking often fails because primers bind non-specifically to these repetitive elements. Our long-read sequencing reads through the repeats to anchor the transgene into unique genomic regions, providing a confident map even in complex plants like Wheat or Canola.