CD Genomics provides Pan Genome Sequencing services powered by precise bioinformatics analysis, unlocking in-depth genetic insights to advance population genetics research.
CD Genomics provides internationally leading Pan Genome sequencing solutions, specifically designed to decode the genetic secrets of ancient organisms. By integrating targeted capture techniques with ultra-high-throughput sequencing platforms, we achieve in-depth analysis of highly degraded samples, enabling groundbreaking advances in species evolution, pathogen research, and ecological and environmental studies.
The pan-genome encompasses the complete set of genes found across all strains within a given species. It gives a more complete picture of genetic diversity than any individual genome can. The pan-genome consists of the core genome and the variable (also called accessory or dispensable) genome. The core genome comprises genes present in all strains of the species. These core genes are typically associated with fundamental biological functions and essential phenotypic traits, reflecting the evolutionary stability of the species. In contrast, the variable genome includes genes present in only a subset of strains or individuals. These accessory genes often encode specific adaptations or unique traits, such as antibiotic resistance or virulence factors. By studying the pan-genome comprehensively, researchers can characterize the full spectrum of genetic variability within a species, which leads to a deeper understanding of its evolutionary dynamics, functional adaptations, and potential responses to environmental pressures.
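The core/variable split described above can be illustrated with simple set operations. The following is a minimal sketch, not our production pipeline: the strain names, gene names, and the `split_core_accessory` helper are hypothetical, and real analyses work on clustered gene families rather than raw gene names.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(
    strain_genes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene sets for a toy collection of strains."""
    if not strain_genes:
        return set(), set(), set()
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)          # every gene seen in any strain
    core = set.intersection(*gene_sets)    # genes shared by all strains
    accessory = pan - core                 # genes missing from at least one strain
    return pan, core, accessory


# Hypothetical toy input: three strains with made-up gene content.
toy = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA"},
    "strain_C": {"dnaA", "gyrB", "recA", "stx2"},
}
pan, core, accessory = split_core_accessory(toy)
print(sorted(core))
print(sorted(accessory))
```

In this toy input the three housekeeping genes form the core genome, while the resistance and virulence genes, each carried by a single strain, fall into the variable genome.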
Schematic image of the pan-genome structure. (Muzzi, et al., 2007)
System diagram of pan-genome construction subsystem. (Duan, et al., 2019)

NextSeq 500

Illumina NovaSeq

PacBio Sequel II
Our Pan Genome sequencing service workflow includes sample collection, library preparation, high-throughput sequencing, quality control, and detailed variant analysis to uncover genetic variations and population insights. Customers are encouraged to ensure proper sample handling and share specific research goals for tailored analysis. For any questions about sample requirements, sequencing, or data interpretation, our team is always ready to assist.

| Basic Analysis | Advanced Analysis |
| --- | --- |
| Raw Data Quality Control: Per-base sequence quality, GC content distribution, and adapter contamination ratio. | Population Genetics: Phylogenetic trees and population structure analysis. |
| Variant Calling: SNP and indel identification; detection of presence/absence variations (PAVs), copy number variations (CNVs), and inversions. | GWAS and Trait Association: Use the pan-genome as a reference to improve precision in trait-gene mapping. |
| Gene Prediction and Annotation: Functional annotation and structural annotation. | Domestication and Evolution: Compare gene loss/retention between wild and cultivated species. |
| Core/Accessory Gene Analysis | |

Small variant sites in pan genome graphs.

Number of SVs present in each of the 3,202 1KG samples.

Length distribution of SV insertions and SV deletions contained.
(Liao et al., Nature, 2023)
A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range
Journal: Nature Genetics
Published: 2024
https://doi.org/10.1038/s41588-024-01715-9
Arabidopsis thaliana, a widely distributed model organism, is important for population genomics. Most studies of structural variation in its genome have examined only a few strains. To address this, the study built a pan-genome from 72 globally diverse strains (including chromosome-level genome assemblies of 69 newly sequenced accessions) from Europe, Asia, Africa, North America, and islands. It used long-read sequencing (PacBio/Nanopore) with short-read correction (Illumina) for de novo assembly and comparative analysis. The results show that the overall genome structure remains very similar across strains, and that differences in genome size are mainly caused by variation in the centromere regions. Pan-genome analyses uncovered 32,986 distinct gene families, with 60% present in all accessions and 40% appearing to be dispensable, including 18% private to a single accession, which indicates unexplored genic diversity. These 69 new Arabidopsis thaliana genome assemblies will empower future genetic research.
To infer the evolutionary relationships among A. thaliana accessions, the authors constructed a species tree. The tree was consistent with previous reports highlighting the African populations as the most divergent and probably most ancient lineages.
Phylogenetic tree based on SNPs in the 72 A. thaliana genomes.
This study identified 32,986 gene families. Of these, 19,721 (60%) made up the core genome, which is enriched in fundamental metabolic and developmental functions. The remaining 13,265 (40%) constituted the accessory genome, including 5,582 dispensable genes and 6,070 accession-private genes. Core genes encode significantly longer proteins containing more Pfam domains, exhibit higher expression levels, and show lower Ka/Ks ratios, indicating stronger purifying selection. Accessory genes are functionally enriched in defense responses such as disease resistance; notably, a substantial portion of them remains expressed in the reference accession Col-0.
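The proportions quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the counts given in the text; the variable names are ours, and the interpretation of the leftover families is not stated in the text above.

```python
# Counts as quoted in the case study text.
total = 32_986
core = 19_721
accessory = 13_265
dispensable = 5_582
private = 6_070


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Core and accessory families should partition the pan-genome.
assert core + accessory == total

core_pct = pct(core, total)            # rounds to the quoted 60%
accessory_pct = pct(accessory, total)  # rounds to the quoted 40%
private_pct = pct(private, total)      # rounds to the quoted 18%

# Accessory families not itemised as dispensable or private in the text.
unitemised = accessory - dispensable - private

print(core_pct, accessory_pct, private_pct, unitemised)
```

The core and accessory counts sum exactly to the total. The dispensable and private counts together do not account for the full accessory genome, so the remainder presumably belongs to a category the text does not break out.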
The annotated protein-coding genes in each individual accession.
The number and proportion of core, softcore, dispensable and private gene families in the pan-genome.

The protein length, number of annotated Pfam domains, gene expression landscape and Ka/Ks of core (red), softcore (blue), dispensable (green) and private (purple) genes of Col-0.
(Lian, et al., 2024)
A minimum of two individuals or subspecies is required to begin a pan-genome study. However, to capture a species' full genetic diversity, more samples are recommended. Including additional samples significantly improves the resolution of genetic variation and yields a more refined pan-genome map.
Pan-genome analysis does not always require a reference genome; it depends on the assembly method. The iterative approach needs an initial reference genome, to which all sample data are aligned. Unmapped sequences are then assembled independently and integrated to expand the pan-genome. By contrast, the de novo assembly method works without any reference genome: each sample's data are assembled directly into a draft genome, and the draft genomes are then integrated into the pan-genome via comparative genomics.
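The iterative approach described above can be sketched as a simple loop. This is a toy illustration under strong assumptions, not a real workflow: exact substring matching stands in for read alignment, deduplication stands in for assembly of unmapped reads, and the `iterative_pangenome` function and all sequences are hypothetical.

```python
from typing import Dict, List


def iterative_pangenome(reference: List[str], samples: Dict[str, List[str]]) -> List[str]:
    """Grow a toy pan-genome by adding each sample's unmapped sequences in turn."""
    pan = list(reference)  # start from the initial reference sequences
    for _name, reads in samples.items():
        # Toy "alignment": a read is mapped if it occurs in any current sequence.
        unmapped = [r for r in reads if not any(r in seq for seq in pan)]
        # Toy "assembly": deduplicate unmapped reads, preserving order.
        novel = list(dict.fromkeys(unmapped))
        # Integrate the novel sequences so later samples can map to them.
        pan.extend(novel)
    return pan


# Hypothetical toy data.
reference = ["ACGTACGTTAGC"]
samples = {
    "sample_1": ["ACGTAC", "GGGCCC"],
    "sample_2": ["GGGCCC", "TTTAAA", "TTAGC"],
}
pan = iterative_pangenome(reference, samples)
print(pan)
```

Because novel sequences are added after each sample, a sequence first seen in `sample_1` counts as mapped when it reappears in `sample_2`. That is the sense in which the reference expands iteratively.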
References