Comparing the Three Platforms for Immune Repertoire Sequencing

Immune repertoire sequencing is the core technology for analyzing the diversity of T cell receptors (TCRs) and B cell receptors (BCRs) and for revealing immune response mechanisms. The choice of sequencing platform directly determines the resolution, throughput, and application scenarios of the resulting data. At present, mainstream immune repertoire sequencing platforms fall into three categories: short-read platforms, long-read platforms, and single-cell integrated sequencing platforms.

This article discusses the three major immune repertoire sequencing platforms (short-read, long-read, and single-cell integrated): their principles, features, applications, comparative strengths, and selection guidelines.

Short-Read Platforms for Immune Repertoire Sequencing

The short-read sequencing platform, built around the Illumina series, is the most widely used technical system in immune repertoire sequencing. With high throughput, high accuracy, and low cost, it efficiently captures TCR/BCR variable region sequences and is especially suitable for large-scale immune diversity screening and dynamic tracking. Although read-length limits prevent it from obtaining full-length receptor sequences directly, optimized experimental and analysis strategies keep it the tool of first choice for population-level immune repertoire studies.

Technical Principle and Core Steps

The experimental design of immune repertoire sequencing on short-read and long-read platforms revolves around targeted amplification, library construction, and high-throughput sequencing. The key steps focus on specifically capturing variable region sequences and reducing amplification bias:

A. Targeted amplification strategy

a) Multiplex PCR primers are designed against the conserved sequences of the V and J regions of the TCR (α/β/γ/δ chains) and BCR (heavy/light chains), so that the V(D)J fragment containing the CDR3 region can be amplified in a single PCR (V-D-J for TCR β/δ and BCR heavy chains; V-J for TCR α/γ and BCR light chains).

b) To avoid subtype amplification bias caused by primer competition, the primer concentration ratio must be optimized (the concentration of each V/J primer is usually 10-20 μM) and the number of PCR cycles controlled (18-25 cycles) to limit over-amplification of high-abundance clones.

B. Library construction

a) Illumina platform-specific adapters (P5/P7) and a sample barcode (6-8 bp) are added to the amplified products. The barcodes allow 12-96 samples to be pooled in one sequencing run, which significantly reduces the per-sample cost. The library fragment size should be kept at 200-500 bp.

b) After the fragment distribution is verified on an Agilent Bioanalyzer, the library is loaded for sequencing at a concentration of 8-10 pM. The NovaSeq platform can generate 600-1200 Gb of data in a single run, which is enough to support immune repertoire analysis of hundreds of samples.
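The sample-barcode pooling described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the barcode sits inline at the start of each read and is matched exactly; on real instruments, index reads are usually handled by the vendor's demultiplexing software. The function name `demultiplex`, the barcodes, and the read sequences are all hypothetical.

```python
def demultiplex(reads, barcode_to_sample, barcode_length=8):
    """Split pooled reads into per-sample lists using an exact inline-barcode match.

    Assumption: the first `barcode_length` bases of each read are the sample barcode.
    """
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)           # unknown barcode: keep for inspection
        else:
            by_sample[sample].append(insert)  # store the read without its barcode
    return by_sample, unassigned


# Hypothetical barcodes and reads, for illustration only
barcode_to_sample = {"ACGTACGT": "sample_01", "TGCATGCA": "sample_02"}
pooled_reads = [
    "ACGTACGTTGTGCCAGCAGC",
    "TGCATGCATGTGCCTGGAGT",
    "ACGTACGTTGTGCAAGTGAT",
    "GGGGGGGGTGTGCCAGCAGC",
]
by_sample, unassigned = demultiplex(pooled_reads, barcode_to_sample)
print(by_sample)
print(unassigned)
```

Exact matching is the simplest choice; production pipelines typically also tolerate one or two barcode mismatches.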

The structure of antigen-specific lymphocyte receptors and the generation of diversity (Liu et al., 2021)

C. Data processing

a) Raw data first undergo quality control (FastQC assessment, with reads of Phred quality below 20 filtered out) and adapter removal with Cutadapt. V(D)J gene assignment and CDR3 region identification are then carried out with tools such as MiXCR or IMGT/HighV-QUEST.

b) By aligning reads against the V/J gene sequences in the IMGT database, researchers can determine the V, D, and J subtypes of each read and locate the start and end positions of the CDR3 region.

c) Finally, clonotypes are defined through sequence clustering (usually with CDR3 nucleotide sequence identity ≥97% as the threshold), and the diversity indices (Shannon index, Simpson index) and clone abundance distribution are calculated.
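The diversity calculation in step c) can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular tool: clonotypes are defined here by exact CDR3 nucleotide identity rather than by ≥97% clustering, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def clonotype_counts(cdr3_sequences):
    """Count reads per clonotype (simplification: exact CDR3 nucleotide match)."""
    return Counter(cdr3_sequences)


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over clonotype frequencies p_i."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def simpson_index(counts):
    """Simpson index D = sum(p_i ** 2): chance that two random reads share a clonotype."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# Toy CDR3 reads (hypothetical sequences): frequencies 0.6, 0.3, 0.1
reads = ["TGTGCCAGC"] * 6 + ["TGTGCCTGG"] * 3 + ["TGTGCAAGT"]
counts = clonotype_counts(reads)
shannon = shannon_index(counts)
simpson = simpson_index(counts)
print(f"Shannon = {shannon:.3f}, Simpson = {simpson:.2f}")  # Shannon = 0.898, Simpson = 0.46
```

A higher Shannon index indicates a more diverse repertoire, while a higher Simpson index indicates dominance by a few clones.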

D. Technical limitations and optimization directions

a) The core bottleneck of the short-read platform is its limited read length, which makes it impossible to resolve the complete V(D)J recombination sequence directly:

  • Loss of full-length variable region coverage: short reads (≤300 bp) cover only the CDR3 region and part of the V/J regions, so CDR1, CDR2, and CDR3 cannot be captured together in full. Because CDR1 and CDR2 also take part in antigen recognition, losing this information may bias predictions of antigen specificity.
  • Limited detection sensitivity for low-abundance clones: Although high throughput covers a large number of clones, PCR amplification bias means that rare clones with abundance below 0.01% are easily missed. To address this, low-cycle PCR combined with molecular markers (unique molecular identifiers, UMIs) has been developed in recent years: a 10 bp molecular marker is added to the primer so that each original template molecule carries a unique tag. Amplification bias can then be corrected by deduplicating reads that share a molecular marker, which improves the detection sensitivity for low-abundance clones to 0.001%-0.005%, at the price of higher experimental cost and more complex data analysis.
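The effect of molecular-marker deduplication can be shown with a short Python sketch. It assumes reads have already been reduced to (molecular marker, CDR3) pairs and that markers are error-free; real pipelines also correct sequencing errors within the marker. All names and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def read_counts(reads):
    """Count raw reads per CDR3, ignoring molecular markers."""
    return Counter(cdr3 for _marker, cdr3 in reads)


def marker_counts(reads):
    """Count distinct molecular markers per CDR3, i.e. original template molecules."""
    markers = defaultdict(set)
    for marker, cdr3 in reads:
        markers[cdr3].add(marker)
    return {cdr3: len(seen) for cdr3, seen in markers.items()}


def to_frequencies(counts):
    """Convert counts to relative abundances."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Two templates per clone, but clone A was over-amplified during PCR
reads = (
    [("AAAAAAAAAA", "CDR3_A")] * 5
    + [("CCCCCCCCCC", "CDR3_A")] * 3
    + [("GGGGGGGGGG", "CDR3_B"), ("TTTTTTTTTT", "CDR3_B")]
)
raw_freq = to_frequencies(read_counts(reads))
corrected_freq = to_frequencies(marker_counts(reads))
print(raw_freq)        # clone A looks four times as abundant as clone B
print(corrected_freq)  # after deduplication both clones have equal abundance
```

Counting distinct markers instead of reads recovers the template-level abundance, which is why rare clones are less likely to be drowned out by over-amplified ones.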

Short-read tool performance across different thresholds of intron persistence (David et al., 2022)

Long-Read Sequencing Platforms in Immune Repertoire Analysis

The long-read sequencing platform is the key breakthrough for overcoming the bottleneck of short-read technology. With very long read lengths (10 kb to tens of kb), PacBio and Oxford Nanopore Technologies (ONT) instruments can directly capture the full-length sequence of the TCR/BCR variable region (including the complete V, D, and J segments and the CDR1, CDR2, and CDR3 regions), making them specialized tools for in-depth analysis of receptor structure. Their technical value lies in removing the dependence on sequence assembly and enabling accurate analysis of immune receptor recombination patterns and somatic mutations, which makes them especially suitable for studying B cell repertoires and rare clones.

Core Application of Long-Read Sequencing Methods

  • Whole-region analysis of BCR somatic hypermutation (SHM): SHM is the core mechanism of B cell affinity maturation and is distributed across the CDR1, CDR2, and CDR3 regions. The short-read platform can analyze SHM only in the CDR3 region, while the long-read platform can capture mutation sites across the full-length variable region.
  • Rare V/J segment combinations and discovery of new recombination patterns: In repertoire studies of non-model animals (such as macaques and pigs), gene annotation in the IMGT database is incomplete, so V/J assignment on short-read platforms is error-prone, whereas long-read platforms can discover new V/J segments through full-length sequence alignment.
  • Complete receptor analysis of rare TIL clones in the tumor microenvironment: Rare tumor-infiltrating lymphocyte (TIL) clones (abundance < 0.1%) often carry receptors that recognize tumor-specific antigens. They are difficult to capture on a short-read platform because of amplification bias, while a long-read platform can capture them accurately through low-cycle PCR plus full-length amplification.
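The benefit of full-length coverage for SHM analysis can be illustrated with a small Python sketch that counts mismatches against a germline sequence region by region. It assumes the read is already aligned to the germline at equal length, and the region coordinates and sequences are hypothetical toy values, not IMGT numbering. In practice, CDR3 mutations are harder to call because the junction is not germline-encoded.

```python
def count_shm_by_region(germline, read, regions):
    """Count mismatches between an aligned read and its germline, per region.

    germline, read: equal-length aligned sequences ('-' marks a gap and is skipped).
    regions: dict of name -> (start, end), 0-based half-open coordinates.
    """
    if len(germline) != len(read):
        raise ValueError("sequences must be aligned to equal length")
    result = {}
    for name, (start, end) in regions.items():
        result[name] = sum(
            1
            for g, r in zip(germline[start:end], read[start:end])
            if g != r and g != "-" and r != "-"
        )
    return result


# Toy 20-base example; real variable regions span several hundred bases
germline = "ACGTACGTACGTACGTACGT"
read = "ACATACGTATATACGTAGGT"
regions = {"CDR1": (0, 5), "CDR2": (8, 12), "CDR3": (15, 20)}

shm = count_shm_by_region(germline, read, regions)
print(shm)  # {'CDR1': 1, 'CDR2': 2, 'CDR3': 1}
```

A read that covered only the CDR3 window would report one mutation here, whereas the full-length read reveals four.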

Technical Limitations and Development Trend

Wider adoption of the long-read platform is limited by low throughput and high cost:

  • Imbalance between throughput and cost: The PacBio Sequel II platform can generate 100-200 Gb of data in a single run, only 1/5-1/10 of Illumina NovaSeq output, while the per-sample cost is high, so it cannot support large-scale sample analysis.
  • High data processing complexity: Raw long-read data are large (about 10-20 GB per sample) and require error correction and full-length sequence filtering. Data analysis takes about 24-48 hours (versus only 2-4 hours for the short-read platform) and depends on high-performance computing servers and specialized bioinformatics staff.

To overcome these limitations, recent technical developments include:

  • Throughput improvement: PacBio launched the Sequel III platform, which increased the data volume of a single run to 500-1000 Gb and lowered the per-sample cost.
  • Joint library construction with short reads: High-abundance clones are screened on a short-read platform and their full-length sequences are then analyzed on a long-read platform, balancing high-throughput screening with in-depth analysis.
  • Direct RNA sequencing: ONT can sequence BCR/TCR mRNA directly, avoiding PCR amplification bias and further improving the detection sensitivity for low-abundance clones.

Overview of long-read analysis tools and pipelines (Amarasinghe et al., 2020)

Single-Cell Platforms for Immune Repertoire Profiling

Traditional TCR-seq and BCR-seq are mostly based on bulk samples, which can resolve receptor diversity only at the population level and cannot link the clonotype of a single cell to its functional state. The single-cell integrated sequencing platform uses microfluidics and Cell Barcode technology to capture the TCR/BCR sequence, transcriptome, and protein expression information of individual cells simultaneously. It thereby links clonotype, cell phenotype, and immune function precisely, providing a breakthrough tool for revealing the immune response mechanisms of T cells and B cells.

Technical Principle and Experimental Workflow

The core of the single-cell integrated platform is matching multi-omics data through the Cell Barcode. Taking the 10x Genomics platform as an example, the experimental workflow is divided into four key steps:

  • Single-cell capture and barcode labeling: Peripheral blood mononuclear cells (PBMCs) or a single-cell suspension from dissociated tissue (cell viability > 90%) are mixed with Gel Beads carrying "Cell Barcode + molecular marker" oligonucleotides. A microfluidic chip forms "single cell-gel bead-droplet" complexes so that each droplet ideally contains one cell and one gel bead. Each gel bead carries one Cell Barcode together with molecular markers, which are mainly used to correct bias introduced during polymerase chain reaction (PCR).
  • Cell lysis and reverse transcription: Lysis buffer in the droplet system lyses the cell and releases its mRNA, which binds to the primers on the gel bead containing the Cell Barcode, molecular marker, and oligo(dT) sequences. Complementary DNA (cDNA) is then synthesized by reverse transcriptase. In this process, the TCR/BCR mRNA of each cell receives the same Cell Barcode as the mRNA of its other genes, which ties the receptor sequence to the transcriptome data.
  • Library construction and sequencing: cDNA from all droplets is pooled and PCR-amplified, and two types of libraries are then constructed:
    a. Immune repertoire sequencing library: TCR/BCR-specific primers are used to amplify the variable region sequences.
    b. Transcriptome library: whole-transcriptome cDNA is amplified.
    After Illumina adapters are added, both libraries are sequenced on a high-throughput sequencing platform. A single run of the 10x Genomics platform can effectively capture 500-10,000 cells, detecting on average 500-2,000 expressed genes and 1-2 TCR/BCR clonotypes per cell.
  • Multi-omics data integration analysis: data are analyzed with Cell Ranger, the dedicated software for 10x Genomics data.

Flow cytometric analysis of the effects of MnBuOE and irradiation on T cell populations (Noh et al., 2024)

Sorting cDNA Reads by Cell Barcode

  • For the immune repertoire sequencing library, V(D)J gene segments are assigned and CDR3 sequences identified, which determines the clonotype of each cell.
  • For the transcriptome library, gene expression is quantified and cell phenotypes (such as CD8⁺ T cells and exhausted T cells) are identified with the help of marker genes (such as CD8A, CD4, and PDCD1).
  • Finally, an association matrix containing cell ID, clonotype, phenotype, and gene expression information is generated.
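The barcode-based matching above amounts to a join on the Cell Barcode. The following Python sketch shows the idea with plain dictionaries; it is not the Cell Ranger API or its output format, and the function name, barcodes, clonotype IDs, and marker rules are hypothetical.

```python
def build_cell_table(vdj_calls, expression, marker_rules):
    """Join clonotype calls and expression profiles on the shared Cell Barcode.

    vdj_calls: dict of barcode -> clonotype ID (from the repertoire library)
    expression: dict of barcode -> {gene: count} (from the transcriptome library)
    marker_rules: ordered list of (phenotype, required marker genes); first match wins
    """
    table = []
    for barcode in sorted(vdj_calls.keys() & expression.keys()):
        genes = expression[barcode]
        phenotype = "unassigned"
        for label, required in marker_rules:
            if all(genes.get(gene, 0) > 0 for gene in required):
                phenotype = label
                break
        table.append(
            {"barcode": barcode, "clonotype": vdj_calls[barcode], "phenotype": phenotype}
        )
    return table


# Hypothetical inputs for three cells; "AAAT" has no V(D)J call and is dropped
vdj_calls = {"AAAC": "clonotype1", "AAAG": "clonotype2"}
expression = {
    "AAAC": {"CD8A": 5, "PDCD1": 3},
    "AAAG": {"CD4": 7},
    "AAAT": {"CD8A": 2},
}
marker_rules = [
    ("exhausted CD8 T cell", ["CD8A", "PDCD1"]),
    ("CD8 T cell", ["CD8A"]),
    ("CD4 T cell", ["CD4"]),
]

cell_table = build_cell_table(vdj_calls, expression, marker_rules)
for row in cell_table:
    print(row)
```

Real analyses assign phenotypes by clustering and differential expression rather than simple presence/absence rules; the rules here only make the join easy to follow.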

Core Advantages and Breakthrough Applications

The greatest value of the single-cell integrated platform lies in revealing the direct relationship between clonotype and function, and its applications extend well beyond the boundaries of traditional immune repertoire sequencing:

A. Tracing the origin of pathological clones in autoimmune diseases

a) In an analysis of PBMCs from patients with systemic lupus erythematosus (SLE), the BD Rhapsody platform identified a class of B cells carrying IGHV3-23+IGKJ1+ BCR clones with a CD19+CD27-IgG+ phenotype. This clone expressed IFN-α-inducible genes (such as IFIT1), and its abundance was positively correlated with disease activity (SLEDAI score).

b) Further experiments showed that the clone could secrete anti-dsDNA autoantibodies, providing direct evidence for the pathological mechanism of SLE. Traditional BCR-seq, by contrast, cannot resolve either the phenotype (CD27-) or the function (IFN-α response) of the clone.

B. Clonal tracing of memory cells in infection immunity

a) In an analysis of PBMCs from individuals who had recovered from COVID-19, the 10x Genomics platform traced a class of CD8+CD45RO+CCR7- effector memory T cells carrying the TRB clone CASSQDRGDTQYF. The clone remained at high abundance (1.5%) and highly expressed antiviral effector genes (such as GZMB and PRF1) 6 months after recovery.

Comparative Analysis Across Different Platforms

The three types of immune repertoire sequencing (IR-seq) platforms differ markedly in technical characteristics and application value. This section compares their core performance indicators and then analyzes how the platforms complement one another.

Comparison of Core Performance Indicators

A multi-dimensional comparison of core performance indicators is the key basis for platform selection. The platforms are compared below along seven dimensions: throughput, resolution, accuracy, cost, sample compatibility, core application, and data analysis complexity:

Comparison of the three major platforms

| Performance Metric | Short-Read Platform (Illumina) | Long-Read Platform (PacBio/ONT) | Single-Cell Integrated Platform (10x Genomics) |
| --- | --- | --- | --- |
| Sequencing Throughput | Extremely high (millions to billions of reads/sample) | Medium-low (tens of thousands to millions of reads/sample) | Medium (500 to 10,000 cells/sample) |
| Immune Repertoire Resolution | Clone-level (cannot link to cell function) | Full-length receptor-level (cannot link to cell function) | Single-cell, clone, and function level (highest resolution) |
| Sequence Accuracy | High | Medium-high | High |
| Cost per Sample | Low | High | Extremely high |
| Sample Compatibility | Wide (fresh/frozen nucleic acids, tissues) | Wide (fresh/frozen nucleic acids, small sample sizes) | Limited (fresh viable cells, sufficient quantity required) |
| Core Application | Large-scale screening, population dynamics tracking | Full-length receptor analysis, SHM analysis | Clone-function association, precise mechanism research |
| Data Analysis Complexity | Low (standard V(D)J analysis tools) | Medium (requires error correction and full-length filtering) | High (multi-omics data integration) |

Technical Complementarity

The three types of platforms are not substitutes for one another; used jointly, they provide comprehensive coverage of "breadth, depth, and accuracy". Typical joint strategies include:

  • Short-read screening + long-read in-depth analysis: In a study of vaccine immunogenicity, PBMCs from 100 vaccinated individuals were first sequenced on the Illumina platform, and three dominant BCR clones (abundance > 1%) were screened out. The full-length variable region sequences of these three clones were then analyzed on the PacBio platform, which revealed a shared "GGT→GGA" mutation in the CDR1 region that could enhance the binding affinity between the antibodies and the vaccine antigens.
  • Short-read dynamic tracking + single-cell functional correlation: In a study of CAR-T cell therapy, changes in CAR-T clone abundance were first tracked on the Illumina platform 1-3 months after treatment, showing that abundance fell to 1% in the second month (suggesting that exhaustion might be occurring). Single-cell sequencing of PBMCs from the second month on the 10x Genomics platform then showed that the CAR-T cells highly expressed the exhaustion markers PDCD1 and LAG3 and that their TRB clones matched the dominant clones detected by Illumina.
  • Long-read structural analysis + single-cell phenotype verification: In a study of B-cell lymphoma, a new V-D-J recombination pattern (V4-34+D6-13+J4) was found in the BCR heavy chain of lymphoma cells on the ONT platform. The BD Rhapsody platform then verified that BCRs with this recombination pattern were expressed only in CD19+CD20-CD38+ plasmacytoid lymphoma cells, which also highly expressed the proliferation-promoting gene MYC.
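The hand-off step in the first strategy, picking dominant clones from short-read data for long-read follow-up, can be sketched as a simple abundance filter. The 1% threshold comes from the example above; the function name, clone names, and counts are hypothetical.

```python
def dominant_clones(clone_counts, min_fraction=0.01):
    """Return clones whose relative abundance exceeds min_fraction, most abundant first."""
    total = sum(clone_counts.values())
    selected = {
        clone: count / total
        for clone, count in clone_counts.items()
        if count / total > min_fraction
    }
    return dict(sorted(selected.items(), key=lambda item: item[1], reverse=True))


# Hypothetical short-read counts: four named clones plus 1,000 rare clones
clone_counts = {"clone_A": 500, "clone_B": 300, "clone_C": 150, "clone_D": 50}
clone_counts.update({f"rare_{i}": 9 for i in range(1000)})

candidates = dominant_clones(clone_counts)
for clone, fraction in candidates.items():
    print(f"{clone}: {fraction:.1%}")
```

Only the clones returned here would be carried forward to full-length sequencing, which keeps long-read cost proportional to the number of clones of interest rather than to the size of the whole repertoire.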

The ALLPATHS assembly of S. aureus (Maccallum et al., 2009)

Conclusion

Short-read, long-read, and single-cell integrated immune repertoire sequencing have achieved breakthroughs in throughput, read length, and resolution, respectively, and together form a complete system spanning population screening to single-cell mechanism research. Among them:

  • The short-read platform supports large-scale applications through high throughput and low cost.
  • The long-read platform overcomes the bottleneck of full-length receptor analysis through its long read length.
  • The single-cell integrated platform maps clonotype to function accurately through multi-omics association.

Used together, the three platforms move immune repertoire sequencing from describing immune diversity toward analyzing the functional mechanisms of clones and mining clinical markers.

In the future, as sequencing technology advances and bioinformatics tools improve (for example, AI-driven V(D)J assignment algorithms and multi-platform integration software), immune repertoire sequencing will play a greater role in immune research and clinical application. In basic research, it can help explore the maintenance of immune memory and the pathological mechanisms of autoimmune diseases. In clinical application, it can support the development of diagnostic markers and immunotherapy prediction models, providing technical support for the prevention and control of immune diseases.


References

  1. Liu H, Pan W, Tang C, et al. "The methods and advances of adaptive immune receptors repertoire sequencing." Theranostics. 2021 11(18): 8945-8963.
  2. David JK, Maden SK, Wood MA, Thompson RF, Nellore A. "Retained introns in long RNA-seq reads are not reliably detected in sample-matched short reads." Genome Biol. 2022 23(1): 240.
  3. Amarasinghe SL, Su S, Dong X, et al. "Opportunities and challenges in long-read sequencing data analysis." Genome Biol. 2020 21(1): 30.
  4. Noh SU, Lim J, Shin SW, et al. "Single-Cell Profiling Reveals Immune-Based Mechanisms Underlying Tumor Radiosensitization by a Novel Mn Porphyrin Clinical Candidate, MnTnBuOE-2-PyP5+ (BMX-001)." Antioxidants (Basel). 2024 13(4): 477.
  5. Maccallum I, Przybylski D, Gnerre S, et al. "ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads." Genome Biol. 2009 10(10): R103.
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.


Copyright © CD Genomics. All Rights Reserved.