Telomere-to-Telomere (T2T) Sequencing Explained: When You Need a Complete Genome
For nearly two decades, the "completed" human genome still had blind spots. About eight percent of the sequence was missing, and much of it sat in hard-to-read repeats. Those gaps mattered because repeats often hide genes, regulatory elements, and structural variants.
Today, that gap is closed for the human reference thanks to long reads and better assembly methods. This guide explains what telomere-to-telomere genome assembly means, why it is possible now, and how you can plan a beginner-friendly project without costly trial-and-error. We keep the focus on non-clinical research uses.
What Telomere to Telomere Genome Assembly Means
Telomere-to-telomere, often shortened to T2T, means you assemble each chromosome end to end with no unresolved gaps. The sequence begins at one telomere, passes through the centromere and repeat-rich regions, and ends at the other telomere. In other words, it is a continuous, gapless chromosome rather than a draft made of many pieces separated by unknown stretches.
Why do gaps appear in the first place? Short reads struggle with repeats. When you cut a genome into tiny fragments, many pieces look almost the same. The assembler cannot tell where each repeat copy belongs, so it leaves breaks or guesses using a reference. Those guesses can create reference bias and misjoins. Long reads reduce this ambiguity because they span across repeats and provide unique context around them.
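The repeat problem described above can be illustrated with a toy example. The sketch below is not how a real assembler works; it only counts exact placements of a read in a made-up genome. The sequences and the function name `count_placements` are invented for illustration.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position (overlaps allowed) where the read matches the genome exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# Toy genome (hypothetical): two copies of one repeat, each flanked by unique sequence.
REPEAT = "ACGTACGTAC"
GENOME = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

# A short read that lies entirely inside the repeat.
short_read = REPEAT[2:8]
# A longer read that covers the first repeat copy plus unique sequence on both sides.
long_read = "GCA" + REPEAT + "GGA"

print(count_placements(GENOME, short_read))  # 2: the read fits both repeat copies
print(count_placements(GENOME, long_read))   # 1: the unique flanks pin it to one copy
```

The short read has two equally valid placements, so nothing in the read itself says which copy it came from. The longer read carries unique flanking sequence and has exactly one placement.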
For a friendly definition and background, see the beginner explainer on the CD Genomics site in the article titled "What is Telomere-To-Telomere Sequencing?", which gives plain-language context and core terms in one place: the T2T explainer.

A complete chromosome includes telomeres, the centromere, rDNA arrays, and other repeats that draft genomes often miss.
According to the National Human Genome Research Institute's overview, the first complete, gapless human genome clarified how much was missing before and why long reads were the breakthrough. The page provides accessible context for newcomers to the field: NHGRI's T2T overview.
Why T2T Is Possible Now
The key advance is long reads that are both accurate and long enough to cross repeats. Two platforms often work together.
- PacBio HiFi reads are long and very accurate. Their high per-base accuracy helps assemblers build clean graphs with fewer errors.
- Oxford Nanopore reads can be ultra-long. Some reads exceed hundreds of kilobases, which is enough to span many long repeats outright and to anchor assemblies across centromeric satellite arrays that are longer than any single read.
This convergence, along with long-range validation data such as Hi-C maps and optical maps, allows assemblies to resolve difficult regions. The Telomere-to-Telomere Consortium used these ideas to produce the first gapless human reference. The team reported a total of 3.055 Gb, including previously missing repeats and centromere sequences, in the paper titled "The complete sequence of a human genome." You can read the landmark study in Science here: the T2T-CHM13 paper (Nurk et al., 2022). For an accessible summary, see UCSC's announcement.
For a broader look at what "complete" means in 2026, and how to evaluate assemblies in the T2T era, Heng Li's review explains the criteria and tool choices in plain terms. It describes how low-error long reads, long-range maps, and careful validation work together: Genome assembly in the telomere-to-telomere era (Li, 2023).
Draft Genomes Versus Complete Genomes
It is easy to confuse a scaffolded "draft" with a complete assembly. A draft assembly may have long scaffolds, but those spans can include strings of Ns, which are gaps. It may also rely on a reference to order contigs, which can hide errors and introduce reference bias. A complete assembly, by contrast, is continuous with no gaps, and each chromosome sequence reaches both telomeres.
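One quick way to tell a scaffolded draft from a gapless sequence is to look for runs of N. The following is a minimal sketch that works on a sequence string you have already loaded. The function name `gap_runs` and the example scaffold are illustrative assumptions, not part of any tool mentioned here.

```python
import re


def gap_runs(sequence: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals for runs of N/n at least min_len long."""
    return [
        (match.start(), match.end())
        for match in re.finditer(r"[Nn]+", sequence)
        if match.end() - match.start() >= min_len
    ]


# Hypothetical scaffold: two gaps, one of 10 bases and one of 3 bases.
scaffold = "ACGT" + "N" * 10 + "GGCC" + "n" * 3 + "TT"

print(gap_runs(scaffold))             # [(4, 14), (18, 21)]
print(gap_runs(scaffold, min_len=5))  # [(4, 14)]
print(gap_runs("ACGTACGT"))           # [] means no gaps in this sequence
```

An empty list for every chromosome is necessary for a complete assembly but not sufficient. Each sequence must also reach both telomeres and pass correctness checks.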
When you evaluate a genome, look at multiple signals together. You need contiguity, completeness, and correctness. You also need structural integrity, especially in repeat-rich regions.
- Contiguity is commonly summarized by contig N50. Higher N50 often indicates larger pieces, but it is not enough by itself.
- Completeness can be checked with BUSCO, which looks for conserved genes. High BUSCO recovery suggests most genic content is present.
- Correctness involves consensus accuracy, often summarized by Merqury QV. Higher QV means fewer base errors in the final sequence.
- Structural integrity asks: did you resolve telomeres and centromeres, rDNA arrays, and segmental duplications without breaks?
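Because contig N50 appears in almost every assembly report, it helps to see how it is computed. This is a minimal sketch using the common definition (the contig length at which the running total, taken from longest to shortest, first reaches half of the assembled bases). The contig lengths are made up.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or read) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (arbitrary units) that sum to 300.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 70, because 80 + 70 = 150 is half of 300
```

The same function works for read N50 if you pass read lengths instead of contig lengths. Note that a single misjoin can inflate N50, which is why it should be read alongside BUSCO, QV, and structural checks.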
If you are new to post-assembly checks, the Galaxy Training Network provides practical, beginner-friendly tutorials that walk through Merqury, BUSCO, and related tools on real data. See the ERGA post-assembly QC tutorial for step-by-step guidance.
Pilot experiment design: a simple, reproducible plan
Start small, test fast, and measure clearly. A focused pilot reduces guesswork and shows whether your sample prep and chosen platforms will close gaps.
- Define scope and goals. State the genome size, expected repeat content, and whether you need phased haplotypes. This makes coverage targets practical.
- Collect one high-quality sample for the pilot. Prioritize HMW DNA with clear size metrics (pulsed-field gel electrophoresis or Femto Pulse) and good purity.
- Data plan (example pilot for a 500–800 Mb plant genome):
  - PacBio HiFi: aim for 30–40× raw HiFi coverage.
  - ONT ultra-long: generate a set of ultra-long reads with read N50 ≥100 kb and a modest total yield (20–50 Gb) to test bridging of long repeats.
  - Optional short reads or Hi-C: include a small Hi-C library or 10–20× short reads for validation and scaffolding if available.
- Subsampling and comparisons. Produce three assemblies from the same pilot data to compare outcomes, and subsample reads (for example, 20×, 30×, 40× HiFi) to see where gains level off:
  - HiFi-only assembly (e.g., hifiasm).
  - HiFi + ONT hybrid assembly (e.g., Verkko or hifiasm-UL).
  - ONT-first assembly if you rely on ultra-long reads (e.g., Flye), then polish with HiFi.
- Report the raw numbers. In your pilot report include raw yields, mean/median read length, read N50, and estimated coverage per data type.
Keep the pilot short (one week of analysis) so you can iterate quickly.
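For the subsampling step in the pilot plan, you need the fraction of reads to keep for each coverage level. The sketch below computes that fraction from total sequenced bases and an assumed genome size. The function name and the example numbers (24 Gb of HiFi on a 600 Mb genome) are illustrative assumptions. The resulting fraction would then be passed to whatever read-sampling tool you use.

```python
def subsample_fraction(total_bases: int, genome_size_bp: int, target_coverage: float) -> float:
    """Fraction of reads to keep so that expected coverage drops to target_coverage."""
    current_coverage = total_bases / genome_size_bp
    if target_coverage > current_coverage:
        raise ValueError(
            f"target {target_coverage}x exceeds available {current_coverage:.1f}x"
        )
    return target_coverage / current_coverage


# Hypothetical pilot: 24 Gb of HiFi reads on a 600 Mb genome (40x in total).
TOTAL_HIFI_BASES = 24_000_000_000
GENOME_SIZE_BP = 600_000_000

for coverage in (20, 30, 40):
    fraction = subsample_fraction(TOTAL_HIFI_BASES, GENOME_SIZE_BP, coverage)
    print(f"{coverage}x -> keep {fraction:.2f} of reads")
```

Random subsampling by fraction gives the expected coverage only on average, so record the random seed alongside the fraction when you report results.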
Quick QC thresholds (pilot guidance)
| Checkpoint | Metric | Practical target (pilot) |
|---|---|---|
| Read quality | HiFi coverage | 30–40× |
| Long-read length | ONT read N50 | ≥100 kb for bridge testing |
| Assembly completeness | BUSCO (appropriate lineage) | ≥95% suggests good genic completeness |
| Consensus accuracy | Merqury QV (k-mer) | ≥30 is a conservative target |
| Structural check | Hi-C contact map | Clear chromosome diagonals; few inter-chromosomal artifacts |
These targets are conservative starting points informed by community practice; see Heng Li's T2T-era guidance and Galaxy's QC tutorials for more context: Genome assembly in the T2T era (Li, 2023) and the ERGA post-assembly QC tutorial.
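The QV threshold in the table is easier to interpret as an error rate. QV is a Phred-scaled value, so QV = -10 × log10(error probability). The sketch below converts between the two and estimates the expected number of base errors for an assumed 600 Mb assembly. The genome size is an example figure only.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled QV."""
    return -10 * math.log10(error_rate)


ASSEMBLY_SIZE_BP = 600_000_000  # hypothetical assembly size

for qv in (30, 40, 50):
    rate = qv_to_error_rate(qv)
    expected_errors = rate * ASSEMBLY_SIZE_BP
    print(f"QV{qv}: about 1 error per {round(1 / rate):,} bases, "
          f"roughly {round(expected_errors):,} errors in the assembly")
```

Each step of 10 in QV means ten times fewer errors, so moving from QV30 to QV40 is a larger improvement than the numbers suggest at first glance.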
Benchmarking and reproducibility: a short checklist
- Fix software and hardware versions. List assembler, basecaller, and polishers with exact versions and command-line flags.
- Share raw metrics. Publish raw read yields, read N50, and coverage calculations in a short table.
- Record subsampling rules. State how you selected reads (longest X Gb or random subsample) and include scripts or commands.
- Run at least two assembly strategies. Compare at least two of the HiFi-only, hybrid, and ONT-first builds, and report BUSCO, QV, contig N50, and explicit checks for telomere/centromere presence.
- Validate visually. Include at least one Hi-C contact map image and one read-mapping identity plot in your report.
- Make data and commands available. Deposit raw reads in a suitable repository or provide access instructions, and archive the exact command logs so peers can reproduce results.
Following these simple steps helps you judge whether a full T2T project is feasible and reduces wasted runs. For stepwise QC and tooling, community tutorials and reviews provide runnable examples and commands.
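One lightweight way to act on the checklist is to write a small machine-readable manifest next to each assembly. The sketch below is only one possible layout. The field names, the `FILL_IN` placeholders, and the seed value are assumptions for illustration and should be replaced with what you actually ran.

```python
import json


def build_manifest(tools: dict, commands: list[str], subsampling: dict) -> str:
    """Bundle tool versions, exact commands, and subsampling rules into a JSON string."""
    manifest = {
        "tools": tools,
        "commands": commands,
        "subsampling": subsampling,
    }
    # sort_keys keeps the output stable so manifests can be diffed between runs.
    return json.dumps(manifest, indent=2, sort_keys=True)


manifest_json = build_manifest(
    tools={
        # Versions are placeholders; record the exact strings your tools report.
        "assembler": {"name": "verkko", "version": "FILL_IN"},
        "completeness": {"name": "busco", "version": "FILL_IN"},
        "accuracy": {"name": "merqury", "version": "FILL_IN"},
    },
    commands=["FILL_IN: exact command line for each step"],
    subsampling={"method": "random", "seed": 11, "hifi_coverage_levels": [20, 30, 40]},
)

print(manifest_json)
```

Saving this string to a file in the assembly directory and archiving it with the command logs gives reviewers a single place to check what was run.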
The Technology Stack That Closes Gaps
To see why long reads help, picture a jigsaw puzzle with many look-alike pieces. Short reads capture tiny fragments of the picture. Many pieces fit in several places, so you hesitate or make a guess. Long reads are like larger puzzle strips that include distinctive landmarks. They bridge the repeat and land you in the correct unique region on the other side.
In practice, PacBio HiFi provides high-accuracy long reads that keep the assembly graph clean, while Oxford Nanopore provides ultra-long reads that physically span long repeats and centromeres. Hybrid assemblers, such as Verkko or hifiasm-UL, can use both data types. If you want a concise overview of platform differences, see this internal primer: PacBio vs Oxford Nanopore comparison.

Disclosure: CD Genomics is our product. As a neutral example, many labs run a hybrid workflow to reduce trial-and-error. They begin with high-molecular-weight DNA QC, then plan PacBio HiFi for accurate backbone contigs and add Oxford Nanopore ultra-long runs to bridge long repeats. Assemblies are built with a hybrid-capable tool and validated with Merqury QV, BUSCO, and Hi-C maps. A service partner such as CD Genomics can coordinate the multi-platform runs and provide bioinformatics checks without changing your scientific control.
For a readable primer on how and why these two data types complement each other, vendor resources are useful context. See the PacBio long-read overview and the ONT read-length guide. For a deeper community perspective, the review by Heng Li above explains standards and choices in the T2T era.
Planning Your T2T Project Without Trial-and-Error
Beginners often face two linked challenges: fragmented assemblies and uncertainty about how much data is enough. The goal here is to give you conservative planning numbers and checkpoints that help you avoid repeated guesses.
High-molecular-weight DNA makes everything easier. Handle samples gently, avoid vortexing, and use extraction methods designed for long DNA. Keep purity high and check integrity before you commit to sequencing. If you need practical tips, CD Genomics provides primer-level guidance on DNA extraction and handling for long-read projects in its DNA extraction guidance.
The table below summarizes typical starting targets. Adjust based on genome size, repeat content, ploidy, and heterozygosity. Always confirm with your organism's literature.
| Project size | HiFi coverage target | ONT ultra-long goal | Long-range data | Typical QC targets |
|---|---|---|---|---|
| Small genomes (microbial, <10 Mb) | 50× or higher | Optional; use if repeats cause breaks | Optional; use if large plasmids or repeats | BUSCO near 100% for relevant lineage; QV ≥ 40 |
| Medium genomes (100–800 Mb) | 30–60× per haplotype | Read N50 ≥ 100 kb; supplement to bridge long repeats | Hi-C at ≥30× physical coverage for robust scaffolding | BUSCO ≥ 95–99%; QV ≥ 30–40; long contig N50 |
| Large genomes (>1 Gb, repeat-rich) | 40–80× per haplotype | Push for many ultra-long reads with N50 ≥ 100–150 kb | Hi-C and, if possible, optical maps for validation | BUSCO high for clade; QV ≥ 30; verify telomere and centromere continuity |
These ranges draw on community practice reflected in reviews and tutorials, such as the T2T-era review by Heng Li and the VGP methods papers that show how HiFi, Hi-C, and other maps work together. For background, see Genome assembly in the telomere-to-telomere era (Li, 2023) and the VGP v2.1 workflow in Galaxy.
Practical checkpoints that reduce guesswork:
- Check raw DNA size distribution and purity before library prep. If HMW DNA is low, improve extraction rather than hoping assembly will fix it.
- After sequencing, confirm coverage and read length metrics. If ONT ultra-long N50 is too short to span key repeats, consider another flow cell.
- During assembly, track contig N50, BUSCO, and Merqury QV. If BUSCO drops or QV is low, revisit polishing and data balance.
- Validate with Hi-C contact maps. Strong, clean diagonal patterns support correct chromosome-scale structure.
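Alongside the checkpoints above, a simple end-of-contig check can flag whether a sequence plausibly reaches telomeres. The sketch below counts telomere repeat copies in a window at each end. The default motif `TTTAGGG` is the repeat common in many plants and is an assumption here (vertebrates typically use `TTAGGG`), as are the window size, the copy threshold, and the function names. Confirm the motif for your organism.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def telomere_ends(
    sequence: str,
    motif: str = "TTTAGGG",   # assumed plant-type repeat; change for your organism
    window: int = 1000,       # bases inspected at each end (illustrative)
    min_copies: int = 20,     # copies required to call a telomere (illustrative)
) -> tuple[bool, bool]:
    """Return (start_has_telomere, end_has_telomere) for one contig."""
    sequence = sequence.upper()
    # On the forward strand, the start of a chromosome shows the reverse-complement motif.
    start_copies = sequence[:window].count(reverse_complement(motif))
    end_copies = sequence[-window:].count(motif)
    return start_copies >= min_copies, end_copies >= min_copies


# Hypothetical contigs: one with telomere repeats at both ends, one missing its start.
complete = "CCCTAAA" * 50 + "ACGT" * 500 + "TTTAGGG" * 50
broken = "ACGT" * 500 + "TTTAGGG" * 50

print(telomere_ends(complete))  # (True, True)
print(telomere_ends(broken))    # (False, True)
```

Exact-match counting is deliberately strict. Real telomeres contain variant repeats, so treat a negative result as a prompt to inspect the contig end rather than as proof that the telomere is missing.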
A simple coverage calculation
Let's say your genome is 600 Mb. You plan for 40× HiFi and an ONT ultra-long set to bridge repeats.
- HiFi: 600 Mb × 40 = 24,000 Mb = 24 Gb of HiFi sequence. If your HiFi yield is ~15 Gb per SMRT Cell (an example figure that varies by chemistry), you would schedule two cells and leave buffer for QC.
- ONT UL: Target an N50 ≥ 100 kb and a few tens of gigabases of total yield (the pilot plan above suggests 20–50 Gb for a genome of this size), with more if repeats are long and frequent. Yield varies with chemistry and DNA quality, so plan more conservatively if your HMW DNA metrics are borderline.
Because yields change over time and with sample prep, always check the latest platform guidance and adjust. The idea is to budget enough data so the assembly can close repeats without many re-runs.
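The arithmetic above is easy to script so you can rerun it when yields or targets change. This is a minimal sketch. The 15 Gb per SMRT Cell figure is the example value from the text, not a platform specification, and the function names are illustrative.

```python
import math


def bases_needed(genome_size_bp: int, coverage: int) -> int:
    """Total sequenced bases required for the target coverage."""
    return genome_size_bp * coverage


def cells_needed(target_bases: int, yield_per_cell_bases: int) -> int:
    """Number of whole cells (or flow cells) needed, rounded up."""
    return math.ceil(target_bases / yield_per_cell_bases)


GENOME_SIZE_BP = 600_000_000          # 600 Mb example genome
HIFI_COVERAGE = 40                    # target HiFi coverage
HIFI_YIELD_PER_CELL = 15_000_000_000  # example yield from the text; varies by chemistry

hifi_bases = bases_needed(GENOME_SIZE_BP, HIFI_COVERAGE)
print(f"HiFi bases needed: {hifi_bases / 1e9:.0f} Gb")                           # 24 Gb
print(f"SMRT Cells to schedule: {cells_needed(hifi_bases, HIFI_YIELD_PER_CELL)}")  # 2
```

Rounding up gives two cells, which would deliver about 30 Gb at the example yield and leaves some buffer over the 24 Gb target.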
A minimal, runnable hybrid pipeline
This example is for learning on a workstation. Replace file names with your own. Commands assume common tools and default presets; for production, tune parameters and consult tool manuals.
- Build a k-mer database from the HiFi reads and export a k-mer histogram for genome-size estimation (for example, as input to GenomeScope)
meryl count k=21 output sample.meryl reads_hifi.fastq.gz
meryl histogram sample.meryl > sample.k21.hist
- Assemble with Verkko (HiFi + ONT); -d sets the output directory (confirm flag names against your installed Verkko version)
verkko -d verkko_out \
--hifi reads_hifi.fastq.gz \
--nano reads_ont.fastq.gz \
--local-cpus 32
- Evaluate completeness and accuracy (Verkko writes its final consensus to assembly.fasta; Merqury reuses the HiFi k-mer database built in the first step)
busco -i verkko_out/assembly.fasta -l embryophyta_odb10 -m genome -o busco_out
merqury.sh sample.meryl verkko_out/assembly.fasta merqury_out
- Inspect Hi-C contact map (if available)
juicer.sh -g genome -z verkko_out/assembly.fasta -p genome.chrom.sizes -s MboI -y restriction_sites.txt -D juicer_dir
These steps give you a feel for the workflow. For real projects, add polishing, purge haplotigs if needed, and run manual curation when QC flags appear.
Real-World Non-Human Examples
Beginners often ask whether telomere to telomere genome assembly is only for human studies. The answer is no. Recent plant and animal projects show how hybrid strategies help across species.
- Maize. A Nature Genetics study reported a complete, gapless assembly of maize. The project used long reads and long-range data to resolve complex repeats typical of large plant genomes. The work illustrates how high repeat content can still be conquered with the right data balance. See a 2024 open-access context article that discusses maize assemblies and related methods here: an overview of long-read genome projects in plants, and note the maize T2T paper DOI: 10.1038/s41588-023-01419-6.
- Sorghum. Multiple studies in 2024 achieved T2T assemblies of sorghum lines using hybrid data, with reports of intact telomeres and centromeres across chromosomes. Read methods and results in plant biology venues that describe how HiFi, ONT ultra-long, and Hi-C work together. A good starting point is this open-access paper with details for the BTx623 reference: a 2024 sorghum T2T resource, with DOI 10.1016/j.xplc.2024.100977.
- Mouse haploid embryonic stem cells. A complete, telomere-to-telomere sequence was reported for a non-human mammalian system. The Science paper shows how diploid challenges can be sidestepped with experimental design and long-read data. It is a useful example for animal labs planning similar work: complete T2T in mouse haploid ESCs.
These examples show that a hybrid approach can help you move beyond drafts even in large, repeat-rich genomes. They also show why validation matters. The papers document not only assembly contiguity but also correctness and structural integrity, including telomeres and centromeres.
Where Telomere to Telomere Genome Assembly Changes Your Research
A complete, gapless assembly gives you a clean foundation. Many downstream analyses become simpler and more accurate because you are no longer guessing across gaps or editing around reference bias.
- New gene discovery. Genes that sit inside repeats, or near centromeres, are easier to find and annotate when those regions are in the assembly. As a result, your gene catalogs are more complete.
- Structural variation. Long reads expose rearrangements, inversions, and copy-number changes that short reads often miss. When the assembly is gapless, you can map and compare these features without the noise of gaps.
- Evolution studies. Repeats evolve fast. When you finally see them in full, you can track centromere evolution, satellite expansions, and segmental duplications across lineages.
If you want a short, authoritative reminder of why the first gapless human genome mattered for analysis, read the Science milestone again: the T2T-CHM13 paper. It connects the technical step of closing gaps to better biological insight.
Next Steps and Resources
Telomere to telomere genome assembly is no longer a distant goal. With careful planning and the right data, it is within reach for many non-human projects today. Start by defining your scientific question, then size your data plan to your organism and repeats. Use the checkpoints in this guide to avoid trial-and-error.
If you want a deeper primer on the concepts and enabling technologies, the CD Genomics resource provides a plain-language introduction: T2T explainer. For a clear technology overview that shows why hybrid strategies work, see the PacBio vs Oxford Nanopore comparison.
When you are ready to plan a project, you can review practical sample handling and acceptance criteria here: sample submission guideline (and the companion PDF guide). If you need end-to-end support in a research-use-only context, you can read about service options and analysis support on these pages: long-read sequencing services and long-read data analysis service.
To close, here is a quick checklist you can skim before you commit to sequencing.

- Is your DNA high-molecular-weight and clean? If not, fix extraction first.
- Do your planned reads meet coverage and read-length goals for your genome size?
- Have you picked an assembly plan that uses both accuracy and read length to bridge repeats?
- Do you have a validation plan with Merqury QV, BUSCO, and Hi-C maps?
Beginner FAQ
- Do I always need both PacBio HiFi and ONT ultra-long data?
- What if my BUSCO is high but QV is low?
- How do I know if I reached telomere to telomere genome assembly?
Mini glossary
- Contig: A continuous stretch of assembled sequence with no gaps.
- Scaffold: Ordered and oriented contigs that may include gaps (Ns).
- BUSCO: A tool that checks for expected single-copy genes to assess completeness.
- Merqury QV: A k-mer based measure of consensus accuracy; higher is better.
- N50: The length at which 50% of the assembly is in contigs of that size or longer.
Still curious about the basics of telomere to telomere genome assembly? Think of it this way: it is a promise to yourself that every base you can possibly see, you will see in order. That promise turns a draft into a trustworthy scientific resource you can build on.
Author and credentials
CD Genomics Bioinformatics & Sequencing Team. Composed of PhD-level scientists and senior bioinformaticians, the team has deep experience coordinating international, multicenter genomics studies. They have managed de novo genome assembly programs and long-read sequencing projects using PacBio and Oxford Nanopore platforms, and they routinely support large-scale transcriptomic and epigenomic profiling. Core capabilities include customized bioinformatics workflow development, standardized QC protocols, longitudinal study harmonization, and audit-ready data governance.
Disclosure: CD Genomics is our product. This article is published under a team byline. CD Genomics provided technical insights for this content. All technical recommendations are illustrative and should be evaluated by independent experts for specific study objectives. For more information on the platforms and services mentioned, see the CD Genomics website: CD Genomics.
References and suggested reading:
- T2T milestone study: The complete sequence of a human genome (Science, 2022).
- Beginner overview: NHGRI's telomere-to-telomere explainer; general news context from UCSC.
- T2T-era standards: Genome assembly in the T2T era (Li, 2023).
- Practical QC walkthrough: Galaxy ERGA post-assembly QC tutorial.
- Platform context: PacBio long-read overview; ONT read-length guide.
- Non-human examples: plant long-read project overview with maize context; sorghum T2T resource BTx623; mouse haploid ESCs T2T.