A pan genome is the entire set of genes found across all strains within a clade. It comprises the core genome and the variable (also called accessory or dispensable) genome. The core genome consists of sequences present in all strains; it is generally associated with the basic biological functions and main phenotypic characteristics of the species, and reflects the stability of the species. The variable genome consists of sequences present in only a single strain or a subset of strains; it is associated with adaptation to specific environments or with unique biological characteristics, and reflects the diversity within the species.
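The definitions above can be illustrated with a small sketch. It assumes each strain has already been reduced to a set of gene (or gene family) identifiers; the function name `partition_pan_genome` and the strain and gene names are hypothetical and chosen only for illustration. Real analyses first cluster orthologous genes across assemblies, which this sketch does not attempt.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    strain_genes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene identifiers into pan, core, and variable sets.

    strain_genes maps a strain name to the set of gene identifiers
    found in that strain (hypothetical input format).
    """
    if len(strain_genes) < 2:
        raise ValueError("at least two strains are needed to define a pan genome")

    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)        # every gene seen in any strain
    core = set.intersection(*gene_sets)  # genes present in all strains
    variable = pan - core                # genes missing from at least one strain
    return pan, core, variable


# Toy input: three strains with made-up gene identifiers.
strains = {
    "strain_A": {"geneA", "geneB", "geneC"},
    "strain_B": {"geneA", "geneB", "geneD"},
    "strain_C": {"geneA", "geneB", "geneC", "geneE"},
}

pan, core, variable = partition_pan_genome(strains)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("variable:", sorted(variable))
```

In this toy input, `geneA` and `geneB` fall into the core genome because every strain carries them, while `geneC`, `geneD`, and `geneE` fall into the variable genome because each is absent from at least one strain.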
Pan genome sequencing combines high-throughput sequencing with bioinformatics analysis. Libraries are constructed and deeply sequenced for multiple individuals, subspecies, or lineages of a species. Each genome is then assembled separately and the assemblies are used to construct a pan genome map, which enriches the genetic information available for the species and supports the study of important biological questions.
Figure 1. Composition of a pan genome
At least 2 individuals or subspecies are required.
Suitable for species with or without a reference genome.