T2T Genome Assembly vs Draft Assembly: What You Gain in Repeats and Structural Variants
Introduction
Many draft genome assemblies are built from short-read sequencing data. However, these draft assemblies often contain gaps. These gaps mainly occur in repetitive regions of the genome. As a result, many genetic variants go undetected. In fact, this limitation can significantly reduce the accuracy of research findings.
For instance, researchers analyzed data from the 1000 Genomes Project. They compared results using the older GRCh38 reference to the new complete T2T-CHM13 reference. The analysis revealed more than one million additional high-quality variants across the samples. Most of these newly found variants were located in previously unresolved repetitive regions (Aganezov et al., 2022. DOI: https://doi.org/10.1126/science.abl3533).
This discovery highlights a key problem. Draft assemblies miss important information. Therefore, downstream analyses become incomplete. Researchers may spend extra time and resources trying to interpret limited data.
Now, telomere-to-telomere (T2T) genome assembly offers a better solution. First, let's explain it simply. The genome is like a long instruction book for life. It contains all the DNA code in an organism. Sequencing reads small pieces of this code. Assembly puts the pieces together to rebuild the full book.
Short reads are tiny pieces, usually 100 to 150 bases long. They work well in unique parts of the genome. However, repetitive regions have similar or identical sequences. Short reads cannot span these repeats properly. Consequently, the assembly breaks into fragments. Gaps appear. Some regions collapse or get assembled incorrectly. This produces a draft assembly. It is helpful but remains incomplete.
In contrast, T2T assembly relies on long-read technologies, such as PacBio HiFi and Oxford Nanopore ultra-long reads. HiFi reads are typically around 15 to 25 kb long and highly accurate, while ultra-long Nanopore reads exceed 100 kb and can occasionally reach a megabase or more. As a result, they effectively bridge across complex repetitive sequences that short reads cannot resolve. This capability produces truly contiguous assemblies. Each chromosome is now represented as a single continuous sequence, extending from one telomere to the other without any gaps.
The landmark T2T-CHM13 human genome assembly demonstrates this advance clearly. It resolved all remaining gaps from previous references and added substantial new sequence in repetitive regions.
Figure 1: Overview of the complete T2T-CHM13 human genome assembly, highlighting resolved gaps and newly added sequences compared to GRCh38.
This multi-panel ideogram shows, for each chromosome, gaps and assembly issues in the prior GRCh38 reference (now fixed in CHM13), densities of exclusive genes (red), segmental duplications, centromeric satellites, and additional nonsyntenic bases added in T2T-CHM13 (especially in acrocentric arms and repetitive regions).
Why does this improvement matter for biotech teams and contract research organizations (CROs)? You often evaluate sequencing and assembly options for projects. Choosing the right approach affects data quality. T2T provides several clear advantages:
- Superior resolution of repetitive regions. Repeats, such as telomeres, centromeres, and segmental duplications, make up substantial portions of many genomes. In the human genome, these areas were historically difficult to assemble. Draft methods leave them fragmented or missing. T2T resolves them completely. Therefore, the true genomic structure becomes visible.
- More accurate detection of structural variants. Structural variants (SVs) are large-scale changes in DNA. Examples include deletions, insertions, duplications, inversions, and translocations. Many SVs occur within or near repeats. Short-read drafts frequently miss or misinterpret these SVs. Long-read T2T assemblies capture complex SVs reliably. This reduces errors and uncovers new biological insights.
- Stronger foundation for downstream applications. Complete assemblies improve every step that follows. Read mapping becomes more precise. Variant calling gains accuracy. Functional studies, such as gene annotation or population genomics, benefit greatly. For non-model organisms or complex samples, the gains are even larger.
The T2T-CHM13 assembly marked a milestone. Published in 2022, it added nearly 200 million bases of new sequence. It filled the remaining gaps from previous references (Nurk et al., 2022. DOI: https://doi.org/10.1126/science.abj6987). Since then, studies consistently show real-world benefits. Teams now select T2T when high-resolution data is essential.
In summary, T2T represents the new standard in genome assembly. It addresses longstanding limitations of draft approaches. The sections below explore the key differences in detail. We will review evidence from published studies. Finally, we will guide you on when T2T is the best choice for your research needs.
Key Differences Between T2T and Draft Assemblies
The advantages of T2T assemblies become clear when we compare them directly to traditional draft assemblies. These gains come from long-read technologies, and three areas stand out. First, contiguity improves dramatically. Second, repetitive regions gain much better resolution. Third, structural variant detection becomes far more reliable. Let's explore each difference step by step. If you need a quick refresher on the basics of T2T sequencing, start here: Telomere-to-Telomere (T2T) Sequencing Explained: When You Need a Complete Genome.
Improved Contiguity
Contiguity refers to how connected the assembled sequences are. In simple terms, it measures the length of continuous DNA pieces without breaks or gaps.
Traditional draft assemblies often lack contiguity. Short-read drafts are the clearest case, but even the widely used human reference GRCh38, which was built largely from clone-based Sanger sequencing, contains hundreds of gaps. Moreover, some chromosomes break into many separate pieces called contigs. These gaps cluster in hard-to-assemble regions. As a result, researchers cannot easily study the full chromosome structure.
In contrast, T2T assemblies achieve remarkable contiguity. The T2T-CHM13 human genome has zero gaps across all 22 autosomes and chromosome X. Each of these chromosomes forms one single, unbroken contig. This means the sequence runs continuously from telomere to telomere. Therefore, the entire chromosome layout is accurate and complete (Nurk et al., 2022. DOI: https://doi.org/10.1126/science.abj6987).
This leap matters for practical research. High contiguity simplifies read mapping. It also reduces errors in downstream analyses. For biotech teams working on complex genomes, this saves time and improves results.
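To make the idea of contiguity concrete, here is a minimal Python sketch of two common summary numbers: N50 and gap count. The contig lengths and the sequence string are toy values invented for illustration, and the sketch assumes gaps are written as runs of "N" characters, as is conventional in FASTA files. It is not a replacement for a dedicated assembly QC tool.

```python
import re


def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def count_gaps(sequence):
    """Count gaps, assumed here to be runs of one or more 'N' characters."""
    return len(re.findall(r"N+", sequence.upper()))


# Toy example: a fragmented draft versus a single gap-free contig
draft_lengths = [40, 30, 20, 10]
t2t_lengths = [100]

print(n50(draft_lengths))               # 30
print(n50(t2t_lengths))                 # 100
print(count_gaps("ACGTNNNNACGTNNAC"))   # 2
```

Both toy assemblies contain 100 bases, but the single-contig version has an N50 equal to the full length and no gaps, which is the pattern a T2T chromosome shows.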
Better Resolution of Repetitive Regions
Repetitive regions pose the biggest challenge in genome assembly. Repeats are DNA sequences that appear multiple times, often nearly identical. They make up over half of the human genome. Because short reads are too brief to span full repeats, draft assemblies struggle here. The assembler either collapses repeats into shorter versions or leaves gaps entirely.
T2T assemblies handle repeats much better. Long reads span entire repeat arrays. They also capture unique sequences on both sides. As a result, the true length, order, and variation of repeats are resolved accurately.
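The following Python sketch illustrates why read length matters in repeats. It uses a toy genome (a made-up repeat unit between two made-up unique flanks) and exact string matching, which is a deliberate simplification of real read alignment. A short read taken from inside the repeat array fits in several places, while a long read that reaches into both flanks fits in only one.

```python
def count_placements(genome, read):
    """Count every position where the read matches the genome exactly
    (overlapping matches included)."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# Toy sequences, invented for illustration only
left_flank = "ACGGTCA"
right_flank = "GTTCAGC"
repeat_unit = "CATTC"
genome = left_flank + repeat_unit * 6 + right_flank

short_read = repeat_unit * 2                      # lies entirely inside the repeat
long_read = "TCA" + repeat_unit * 6 + "GTT"       # spans the array plus both flanks

print(count_placements(genome, short_read))  # 5
print(count_placements(genome, long_read))   # 1
```

The short read has five equally good placements, so an assembler cannot tell how many repeat copies exist. The long read is anchored by unique sequence on both sides, so the array length is unambiguous.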
Key repetitive regions that benefit include:
- Telomeres. These are the protective ends of chromosomes. They consist of thousands of "TTAGGG" repeats. Draft assemblies often shorten or approximate them. T2T resolves them fully, revealing exact lengths and subtelomeric variations.
- Centromeres. These central regions help chromosomes divide properly during cell growth. They contain massive arrays of satellite DNA, organized into higher-order repeats (HORs). Previous drafts showed only rough, low-resolution versions. T2T provides complete, high-resolution maps. For instance, active centromeres now show layered HOR structures with precise epigenetic marks (Altemose et al., 2022. DOI: https://doi.org/10.1126/science.abl4178).
- Acrocentric short arms. Certain chromosomes (13, 14, 15, 21, 22) have short arms rich in ribosomal DNA (rDNA) arrays. These produce the ribosomes needed for protein building. Draft assemblies left them largely unassembled or collapsed. T2T fills them completely, adding millions of bases.
- Segmental duplications. These are large blocks of DNA (conventionally at least 1 kb long with 90% or higher sequence identity) copied between different genome locations. They include many medically relevant genes. Draft methods collapse identical copies. T2T expands them correctly, revealing true copy numbers and variations.
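As a small illustration of the telomere point in the list above, the Python sketch below counts perfect telomeric repeat units at each end of a sequence. It assumes the forward strand shows "CCCTAA" at the start of a chromosome and "TTAGGG" at the end, and it only counts exact, uninterrupted units. Real telomeres contain variant repeats and sequencing errors, so treat this as a rough check rather than a length measurement. The function name and the toy sequence are hypothetical.

```python
import re


def telomere_repeat_counts(sequence):
    """Return (units at start, units at end) of perfect telomeric repeats.

    Assumes CCCTAA units at the sequence start and TTAGGG units at the end.
    """
    seq = sequence.upper()
    start = re.match(r"(?:CCCTAA)+", seq)
    end = re.search(r"(?:TTAGGG)+$", seq)
    n_start = len(start.group(0)) // 6 if start else 0
    n_end = len(end.group(0)) // 6 if end else 0
    return n_start, n_end


# Toy "chromosome": 3 units at the start, 5 units at the end
toy = "CCCTAA" * 3 + "ACGTACGTAC" + "TTAGGG" * 5
print(telomere_repeat_counts(toy))           # (3, 5)
print(telomere_repeat_counts("ACGTACGTAC"))  # (0, 0)
```

A contig that reports zero units at either end most likely stops short of the telomere, which is one quick way to spot an incomplete chromosome arm.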
Figure 2: Comparison of centromeric higher-order repeat (HOR) resolution in previous draft assemblies versus the complete T2T-CHM13 assembly.
This improved resolution opens new doors. Researchers can now study repeat functions accurately. For example, variations in centromeric HORs link to chromosome stability. Segmental duplication differences affect gene dosage in population studies.
Enhanced Structural Variant Detection
Structural variants (SVs) are large-scale DNA changes. They include deletions, insertions, inversions, duplications, and more. Many SVs occur within or near repetitive regions. This makes them hard to detect with short reads.
Short reads map ambiguously in repeats. Tools often miss SVs or call them incorrectly. As a result, draft assemblies provide incomplete SV catalogs.
Long-read T2T assemblies change this. Reads align uniquely, even in complex areas. The long span directly reveals variant structures. Therefore, SV calling becomes more sensitive and specific.
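One way to picture how a long span reveals a variant is to compare the distance between two unique flanks in the reference and in a read. The Python sketch below does this with exact string matching on toy sequences. The function names, the flanks, and the 50-base threshold (a commonly used lower bound for calling a structural variant) are assumptions made for illustration. Real SV callers work from alignments and also handle inversions, translocations, and sequencing errors, which this sketch does not.

```python
def flank_spacing(sequence, left_flank, right_flank):
    """Bases between the end of the left flank and the start of the
    right flank, or None if either flank is not found."""
    left = sequence.find(left_flank)
    if left == -1:
        return None
    left_end = left + len(left_flank)
    right = sequence.find(right_flank, left_end)
    if right == -1:
        return None
    return right - left_end


def classify_size_change(reference, read, left_flank, right_flank, min_size=50):
    """Label the read as carrying an insertion or deletion relative to the
    reference, based only on the change in flank-to-flank distance."""
    ref_gap = flank_spacing(reference, left_flank, right_flank)
    read_gap = flank_spacing(read, left_flank, right_flank)
    if ref_gap is None or read_gap is None:
        return "unresolved", 0
    delta = read_gap - ref_gap
    if delta >= min_size:
        return "insertion", delta
    if delta <= -min_size:
        return "deletion", -delta
    return "no large size change", delta


# Toy data: the read carries 15 extra copies of a 5-base repeat unit
left, right, unit = "ACGGTCAGGT", "GTTCAGCTAA", "CATTC"
reference = left + unit * 10 + right
long_read = left + unit * 25 + right

print(classify_size_change(reference, long_read, left, right))  # ('insertion', 75)
```

A read shorter than the repeat array would contain neither flank, and the function would return "unresolved". That is the same ambiguity short reads face in practice.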
Studies confirm major gains. When reanalyzing 1000 Genomes Project samples with the T2T reference, researchers found more than one million additional high-quality variants. Most sat in repetitive regions that were previously hidden (Aganezov et al., 2022. DOI: https://doi.org/10.1126/science.abl3533). Complex SVs, like those in segmental duplications, now appear correctly.
For research teams, this means richer datasets. You uncover variation that influences gene regulation or diversity. In non-model organisms, the benefits are even greater. Complete assemblies capture species-specific repeats and SVs reliably.
In summary, these differences—superior contiguity, precise repeat resolution, and accurate SV detection—make T2T the preferred choice for high-quality genomics.
T2T vs Draft Genome Assembly: Key Comparison
| Feature | Traditional Draft Assembly (e.g., GRCh38) | T2T Assembly (e.g., T2T-CHM13) | Key Benefits of T2T |
|---|---|---|---|
| Contiguity | Hundreds of gaps; many chromosomes fragmented | Zero gaps; single contig per chromosome | Accurate chromosome-scale structure; easier read mapping |
| Repetitive Regions (telomeres, centromeres, rDNA, SDs) | Collapsed, shortened, or gapped | Fully resolved with exact lengths and variants | True repeat architecture and variation revealed |
| Structural Variant Detection | Misses or mis-calls many SVs in repeats | High sensitivity and specificity for complex SVs | More than one million additional variants found in 1000 Genomes data |
| New Sequence Added | ~8% of genome unresolved | +~200 Mb fully sequenced | Complete view of previously hidden biology |
| Best For | Basic SNP calling in unique regions; lower cost | Repetitive region studies, SV analysis, pangenomes | Publication-quality, high-resolution research |
Real-World Gains and Evidence
The differences between T2T and draft assemblies are not just theoretical. In fact, published studies show clear, practical benefits. These come from the T2T-CHM13 complete human genome and follow-up research. As a result, teams now gain deeper insights into genomic variation. Moreover, they achieve more reliable results in diverse projects.
One major gain appears in variant discovery. When researchers remapped data from large cohorts to the T2T reference, they found many new variants. For example, a study reanalyzed samples from the 1000 Genomes Project. It used both the older GRCh38 reference and the new T2T-CHM13. The results were striking. More than one million additional high-quality variants emerged across the samples. Most of these sat in repetitive regions that the older reference could not resolve properly. In previously unresolved regions, the study reported hundreds of thousands of variants per sample (Aganezov et al., 2022. DOI: https://doi.org/10.1126/science.abl3533).
Why does this matter? These new variants fill gaps in our understanding of human diversity. Previously hidden changes now become visible. Therefore, population genomics studies gain accuracy. Biotech teams studying genetic variation get richer datasets. This helps in research on evolution, ancestry, and basic biology.
A second gain appears in segmental duplications (SDs). A dedicated study mapped SDs across the full T2T-CHM13 genome and provided a comprehensive view for the first time. SDs cover about 218 million bases in this complete reference, or 7.0% of the genome, up from an earlier estimate of 5.4%. About 68 million bases of this SD sequence were previously unresolved. When the team checked copy number variation across 268 diverse human genomes, a clear pattern emerged. About 91% of this previously unresolved SD sequence better matched real human variation than older references. In contrast, only 9% seemed specific to the CHM13 cell line (Vollger et al., 2022. DOI: https://doi.org/10.1126/science.abj6965).
This finding has broad impact. Accurate SD maps help researchers study gene duplication events. Many important gene families live in these regions. For instance, genes involved in brain development or immune response often duplicate. With T2T, copy numbers are precise. As a result, functional studies become more trustworthy.
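The link between collapsed duplications and copy number can be sketched with read depth. When several genomic copies are represented by a single copy in a reference, reads from all of them pile up on that one copy, so depth rises in proportion. The Python example below is a simplified illustration with invented depth values and a hypothetical function name. It is not the method used in the cited study, and real pipelines also correct for GC content and mappability.

```python
from statistics import median


def estimate_copy_number(region_depths, unique_depths, ploidy=2):
    """Estimate copy number from read depth.

    region_depths: per-window depth over the region of interest
    unique_depths: per-window depth over single-copy (unique) sequence
    ploidy: copies expected for unique sequence (2 for a diploid sample)
    """
    baseline = median(unique_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return ploidy * median(region_depths) / baseline


# Invented depths: unique sequence at about 30x, collapsed region at about 60x
unique = [30, 31, 29, 30, 32]
collapsed = [58, 61, 60, 62, 59]

print(estimate_copy_number(collapsed, unique))  # 4.0
```

In this toy case the region shows twice the baseline depth, which suggests four copies in a diploid sample where the reference shows only one locus. A complete reference that lays out each copy separately removes the need for this kind of indirect estimate.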
Figure 3: Genome-wide map of segmental duplications in the complete T2T-CHM13 human genome.
This ideogram or circos-style plot shows the distribution and density of segmental duplications across all chromosomes. Newly resolved SD regions (from repetitive areas like acrocentric arms and pericentromeric zones) are highlighted, illustrating the expanded and accurate representation compared to prior draft references.
Further evidence comes from centromere research. Centromeres rely on massive repeat arrays. Draft assemblies provided only simplified models. T2T delivers detailed, layered structures. One study built high-resolution maps of higher-order repeats in all human centromeres. It uncovered variant-rich layers and epigenetic patterns. These details link to chromosome stability in cell division. Researchers now explore basic mechanisms of inheritance more accurately (Altemose et al., 2022. DOI: https://doi.org/10.1126/science.abl4178).
In non-human genomes, the gains multiply. For example, teams assembling plant or animal genomes face even larger repeats. T2T approaches resolve species-specific structures. This aids comparative genomics. It also supports biodiversity research.
Overall, these studies build strong authority for T2T. The complete reference uncovers hidden biology. It reduces errors from gaps. Moreover, it sets a foundation for future work.
If you want to evaluate these gains in your own assemblies, see our guide: T2T Assembly QC Metrics: Completeness, Accuracy, and How to Evaluate Results. For a deeper dive into hard-to-assemble regions like telomeres, centromeres, and segmental duplications, check this resource: Assembling the Hard Parts: Telomeres, Centromeres, and Segmental Duplications in the T2T Era.
These real-world examples show why many teams switch to T2T. The evidence is clear and growing.
Conclusion
Throughout this guide, we have explored the clear advantages of telomere-to-telomere (T2T) genome assembly over traditional draft approaches. First, T2T delivers superior contiguity with gap-free chromosomes. Second, it resolves repetitive regions accurately, including telomeres, centromeres, acrocentric arms, and segmental duplications. Third, it enables precise detection of structural variants that drafts often miss. Moreover, real-world studies confirm these gains. They show more than one million additional variants discovered, better maps of duplications, and deeper insights into centromere biology.
These improvements make T2T the better choice in many scenarios. However, the decision depends on your project goals. Here are key situations where T2T stands out:
- When studying repetitive regions. If your research focuses on telomeres, centromeres, ribosomal DNA, or segmental duplications, drafts fall short. T2T provides the complete structure needed for reliable results.
- For accurate structural variant analysis. Complex SVs hide in repeats. If detecting insertions, inversions, or duplications is critical, long-read T2T reduces errors and uncovers hidden variation.
- In population or comparative genomics. Complete references reveal more diversity across samples. This is especially useful for human cohorts or non-model organisms with unique repeats.
- When downstream accuracy matters most. High-quality assemblies improve read mapping, annotation, and functional studies. For biotech R&D or CRO projects aiming for publication-level data, T2T minimizes rework.
- For novel or complex genomes. Plant, animal, or microbial genomes often have larger repeats than humans. T2T handles them robustly, leading to breakthrough assemblies.
In contrast, draft assemblies may suffice for simple queries. For example, basic SNP calling in unique regions works fine with short reads. They are also faster and cheaper for low-resolution overviews. However, as long-read costs drop and tools improve, T2T becomes accessible for more teams.
The evidence is strong. The T2T-CHM13 reference and follow-up papers prove the value. As a result, many researchers now adopt complete assemblies as the standard.
Are you evaluating assembly options for your next project? The choice affects data quality and insights. To help you decide, we offer a free comparison checklist. It covers key factors like contiguity, repeat resolution, and SV detection. Download it today to guide your planning.
Once you choose T2T, the next step is selecting deliverables. Options include primary assemblies, polished contigs, haplotype phasing, and standard formats. For practical guidance on these outputs, explore our detailed resource: Choosing the Right T2T Deliverables: Assembly Outputs, Polishing, Phasing, and Data Formats (RUO).
Our team specializes in high-quality T2T assemblies for research-use-only applications. We use proven long-read platforms to deliver complete, accurate genomes. Whether you need human references or custom organisms, we can support your goals.
Ready to upgrade your genomics projects? Contact us for a consultation. We will discuss your needs and recommend the best approach. Start achieving gap-free results today.
References:
- Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., Vollger, M. R., Altemose, N., Uralsky, L., Gershman, A., Aganezov, S., Hoyt, S. J., Diekhans, M., Logsdon, G. A., Alonge, M., Antonarakis, S. E., Borchers, M., Bouffard, G. G., Brooks, S. Y., ... Phillippy, A. M. (2022). The complete sequence of a human genome. Science, 376(6588), 44–53. https://doi.org/10.1126/science.abj6987
- Aganezov, S., Yan, S. M., Soto, D. C., Kirsche, M., Zarate, S., Avdeyev, P., Taylor, D. J., Shafin, K., Shumate, A., Xiao, C., Wagner, J., McDaniel, J., Olson, N. D., Sauria, M. E. G., Vollger, M. R., Rhie, A., Meredith, M., Martin, S., Lee, J., ... Schatz, M. C. (2022). A complete reference genome improves analysis of human genetic variation. Science, 376(6588), eabl3533. https://doi.org/10.1126/science.abl3533
- Altemose, N., Logsdon, G. A., Bzikadze, A. V., Sidhwani, P., Langley, S. A., Caldas, G. V., Hoyt, S. J., Uralsky, L., Ryabov, F. D., Shew, C. J., Sauria, M. E. G., Borchers, M., Gershman, A., Mikheenko, A., Shepelev, V. A., Dvorkina, T., Kunyavskaya, O., Vollger, M. R., Rhie, A., ... Miga, K. H. (2022). Complete genomic and epigenetic maps of human centromeres. Science, 376(6588), eabl4178. https://doi.org/10.1126/science.abl4178
- Vollger, M. R., Guitart, X., Dishuck, P. C., Mercuri, L., Harvey, W. T., Gershman, A., Diekhans, M., Sulovari, A., Munson, K. M., Lewis, A. P., Hoekzema, K., Porubsky, D., Li, R., Nurk, S., Koren, S., Miga, K. H., Phillippy, A. M., Timp, W., Ventura, M., & Eichler, E. E. (2022). Segmental duplications and their variation in a complete human genome. Science, 376(6588), eabj6965. https://doi.org/10.1126/science.abj6965