Introduction to pan genome

A pan genome is the entire set of genes found across all strains within a clade. It comprises the core genome and the variable genome (also called the accessory or dispensable genome). The core genome consists of sequences present in all strains; it is generally related to the basic biological functions and main phenotypic characteristics of the species, and reflects the stability of the species. The variable genome consists of sequences present in only a single strain or in a subset of strains; it is related to the adaptation of the species to specific environments or to unique biological characteristics, and reflects the characteristics that distinguish strains within the species.
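The definitions above can be expressed as simple set operations. The following is a minimal sketch, assuming each strain has already been reduced to a set of gene (or gene family) identifiers; the function name `partition_pan_genome`, the strain names, and the gene IDs are hypothetical and used only for illustration.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    strain_genes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split a pan genome into (pan, core, variable) gene sets.

    strain_genes maps a strain name to the set of gene identifiers
    detected in that strain (hypothetical input format).
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)        # genes seen in any strain
    core = set.intersection(*gene_sets)  # genes seen in every strain
    variable = pan - core                # genes missing from at least one strain
    return pan, core, variable


# Toy data (hypothetical): three strains and five gene families.
strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}

pan, core, variable = partition_pan_genome(strains)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("variable:", sorted(variable))
```

In this toy data, g1 and g2 form the core genome, while g3, g4, and g5 make up the variable genome. Real analyses first cluster genes into orthologous families, and that step is not shown here.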

Pan genome sequencing combines high-throughput sequencing with bioinformatics analysis. Libraries are constructed and deeply sequenced for multiple individuals, subspecies, or lineages of a species; each genome is then assembled separately, and the assemblies are used to construct a pan genome map. This enriches the genetic information available for the species and supports the study of important biological questions.

Figure 1. Composition of a pan genome

Advantages and features of pan genome

  • Enriches the genome information of a species by sequencing its subspecies and individuals
  • Quickly identifies genes, or structural variations in genes, related to important traits through study of the variable genome
  • Reveals differences within a species from the perspective of unique gene sequences
  • Requires only a small population, making it cost- and time-efficient
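Two of the points above, finding unique gene sequences and enriching genome information as more individuals are sequenced, can be illustrated with a short sketch. It assumes each strain is represented as a set of gene identifiers; the function names `strain_specific_genes` and `accumulation_sizes` and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def strain_specific_genes(strain_genes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each strain, the genes found in that strain only."""
    unique: Dict[str, Set[str]] = {}
    for name, genes in strain_genes.items():
        others: Set[str] = set()
        for other_name, other_genes in strain_genes.items():
            if other_name != name:
                others |= other_genes
        unique[name] = genes - others
    return unique


def accumulation_sizes(
    strain_genes: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[int, int]]:
    """Return (pan size, core size) after adding each strain in the given order."""
    pan: Set[str] = set()
    core: Optional[Set[str]] = None
    sizes: List[Tuple[int, int]] = []
    for name in order:
        genes = strain_genes[name]
        pan |= genes                                        # pan genome can only grow
        core = set(genes) if core is None else core & genes  # core can only shrink
        sizes.append((len(pan), len(core)))
    return sizes


# Toy data (hypothetical): three strains and five gene families.
toy_strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}

unique_by_strain = strain_specific_genes(toy_strains)
growth = accumulation_sizes(toy_strains, ["strain_A", "strain_B", "strain_C"])
print(unique_by_strain)
print(growth)
```

With a single strain the pan and core sizes are identical, so the distinction only becomes informative once a second genome is added. The sizes also depend on the order in which strains are added, which is why accumulation curves are often averaged over several orderings.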

Pan genome workflow

Analysis pipeline

Pan genome analysis pipeline


  • How many samples are required for pan genome sequencing?
  • At least two individuals or subspecies are required.

  • Is a reference genome required?
  • No. Species with or without a reference genome are both suitable.

