Pan-Genome: Concepts, Classification, and Biological Implications

Traditional genomics takes a single reference genome as its research benchmark. The pan-genome concept changes this paradigm: it expands the scope of study from the genome of a single individual to the collection of all individual genomes within a species, comprising both the core genome that maintains species-wide commonality and the variable genome that underlies individual specificity.

This framework not only corrects the underestimation of genetic diversity in traditional research but also, by integrating genome-wide variation at the species level, provides a new way to analyze the driving forces of evolution, the paths of adaptive evolution, and functional gene networks. It moves genomics from the genetic map of an individual to a genetic panorama of the species.

This article introduces the concept and classification of the pan-genome, explains its evolutionary driving factors and its distribution characteristics in different species, and outlines its scientific significance, including in agriculture.

Definition and Classification of Pan-Genome

The pan-genome is the sum of the genomes of all individuals in a species. It contains the core genes shared by all individuals, which play a key role in basic life activities and in maintaining the basic biological characteristics of the species. It also covers the variable genes found only in some individuals, which are often related to special traits, environmental adaptability, and disease susceptibility.

This concept overcomes the limitation of the single reference genome used in traditional genomics. A single reference genome was usually built from one or a few individuals and therefore could not reflect the widespread genetic variation within a species. By integrating genome data from many individuals, the pan-genome gives a panoramic view of the genetic diversity within a species.

Venn diagram representing the three parts of the pan-genome (Guimarães et al., 2015)

Core Genomes vs. Variable Genomes

The pan-genome consists of the core genome and the variable genome. The core genome is the set of genes shared by all individuals of a species and determines its basic life characteristics. The variable genome is the set of genes that differ between individuals and gives the species its genetic diversity. Together they constitute the complete set of genetic information of the species.

The core genome comprises the genes present in all individuals of a species, which are usually closely tied to basic life activities and important biological functions. These genes are relatively conserved during evolution. Under the molecular clock hypothesis, the nucleotide substitution rate of core genes is assumed to be roughly constant, which makes them useful markers for constructing phylogenetic trees. The core genome also contributes to environmental adaptability: under extreme temperature or high-salt conditions, core genes help maintain intracellular homeostasis through regulated expression, which strengthens the species' ability to survive.

The variable genome comprises genes that differ among individuals of a species: they may be present in only some individuals, or they may differ in copy number or sequence between individuals. These differences arise mainly from genetic events such as gene gain and loss, copy number variation (CNV), and single nucleotide polymorphisms (SNPs). The variable genome allows a species to show rich phenotypic diversity under different environmental conditions and is an important genetic basis for adaptation to environmental change.

  • In plants, the variable distribution of disease-resistance genes helps crops resist regional pathogens.
  • In microbial communities, variable genomic elements carrying antibiotic resistance genes promote rapid evolution under drug pressure.

The variable genome is also involved in niche differentiation. For example, some insects develop preferences for different host plants through variation in specific taste receptor genes, which can in turn promote the formation of new species.
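The core/variable split described in this section can be illustrated with simple set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers (real pipelines first cluster orthologous genes, which is not shown here). The function name `classify_genes`, the strain names, and the gene labels are hypothetical and serve only as toy data.

```python
from typing import Dict, Set, Tuple


def classify_genes(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, variable) gene sets for a collection of genomes.

    Assumes each value is the set of gene-family identifiers in one genome.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)          # every gene seen in any individual
    core = set.intersection(*gene_sets)    # genes present in every individual
    variable = pan - core                  # genes missing from at least one individual
    return pan, core, variable


# Hypothetical toy data: three strains with mostly shared genes.
genomes = {
    "strain_A": {"dnaA", "gyrB", "rpoB", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "rpoB", "mtrC"},
    "strain_C": {"dnaA", "gyrB", "rpoB", "blaTEM", "tetA"},
}
pan, core, variable = classify_genes(genomes)
print("core:", sorted(core))
print("variable:", sorted(variable))
```

In this toy case the three genes found in every strain form the core genome, and the genes present in only one or two strains form the variable genome.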

Open Pan-Genome vs. Closed Pan-Genome

  • A. Open Pan-Genome
    • a) An open pan-genome is characterized by the continual discovery of new genes as new individuals are added, so that the size of the species' pan-genome keeps growing. This type of pan-genome is common in some microbial species, such as bacteria and archaea.
    • b) Because microorganisms evolve rapidly and undergo frequent horizontal gene transfer, they can continuously acquire new genes from the environment and thereby expand their pan-genome. Studies have reported that the pan-genome of some bacterial species can grow by 10%-15% through the uptake of foreign genes within just a few weeks of laboratory culture, which makes the open pan-genome a key genetic basis for microbial adaptation to complex niches.
    • c) Although archaea differ from bacteria in cell structure, they also have efficient gene exchange networks. In extreme environments, they can acquire key genes for adapting to special niches such as high salt and high temperature through horizontal gene transfer, which further promotes the continuous expansion of the pan-genome.

Different components and types of the pan-genome (Divya et al., 2022)

  • B. Closed Pan-Genome
    • a) In contrast, in a closed pan-genome the number of newly discovered genes declines markedly once the number of sequenced individuals reaches a certain level, and the pan-genome size gradually stabilizes. This type of pan-genome is dominant among animals and plants. For example, in studies of rice, maize, and other crops, the growth curve of new genes gradually flattens as the number of sequenced varieties increases from dozens to hundreds, and a stable gene set eventually forms.
    • b) From the perspective of evolutionary biology, the relatively stable living environments and long generation times of animals and plants mean that their gene pools are renewed relatively slowly. In perennial woody plants, for example, the long growth cycle and complex reproductive process mean that recombination and the accumulation of mutations take a long time.
    • c) In addition, animal and plant genomes contain large numbers of regulatory elements and redundant genes, which buffer gene gain and loss to some extent, so that the pan-genome gradually enters a dynamic equilibrium after an initial phase of rapid expansion. This stability is important for maintaining the genetic characteristics and ecological adaptability of a species, and it provides a reliable theoretical basis for crop genetic improvement and for research on species evolution.
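The open/closed distinction is often made quantitative in the pan-genome literature with a power-law (Heaps' law) fit. The formulation below is a sketch of that commonly used model and is not taken from this article. The symbols are introduced here for illustration: N is the number of genomes sampled, n(N) is the number of new genes contributed by the N-th genome, P(N) is the pan-genome size, and κ and α are fitted constants.

```latex
\begin{align}
n(N) &= \kappa \, N^{-\alpha}, \qquad N \ge 2, \\
P(N) &= P(1) + \sum_{k=2}^{N} n(k).
\end{align}
```

Under this model the sum keeps growing without bound when α ≤ 1, which corresponds to an open pan-genome, and it converges to a finite plateau when α > 1, which corresponds to a closed pan-genome.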

Rarefaction curves for open (green) and closed (blue) pan-genomes (Dawid et al., 2021)
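A rarefaction (gene accumulation) curve of this kind can be approximated by adding genomes in random order and recording how the pan-genome size grows. The following is a minimal sketch, assuming each genome is already represented as a set of gene-family identifiers. The function name `accumulation_curve`, its parameters, and the toy genomes are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Set


def accumulation_curve(
    genomes: Dict[str, Set[str]], permutations: int = 100, seed: int = 0
) -> List[float]:
    """Mean pan-genome size after adding 1..N genomes in random order."""
    rng = random.Random(seed)
    names = sorted(genomes)
    sizes_at_step: List[List[int]] = [[] for _ in names]
    for _ in range(permutations):
        order = names[:]
        rng.shuffle(order)                 # random sampling order of genomes
        seen: Set[str] = set()
        for step, name in enumerate(order):
            seen |= genomes[name]          # add this genome's genes to the pan-genome
            sizes_at_step[step].append(len(seen))
    return [mean(sizes) for sizes in sizes_at_step]


# Hypothetical toy data: every genome adds exactly one gene not seen elsewhere.
toy = {
    "g1": {"core1", "core2", "x1"},
    "g2": {"core1", "core2", "x2"},
    "g3": {"core1", "core2", "x3"},
}
curve = accumulation_curve(toy)
new_per_step = [b - a for a, b in zip(curve, curve[1:])]
print(curve, new_per_step)
```

If the number of new genes per added genome stays roughly constant, as in this toy case, the curve keeps rising, which is the open pattern. If it falls toward zero, the curve flattens, which is the closed pattern.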

The construction of a pan-genome relies on advanced sequencing technologies. Modern approaches combine long-read sequencing (e.g., PacBio) to resolve structural variations and multi-sample designs to capture genetic diversity, enabling a comprehensive representation of both core and variable genomes.

Evolutionary Driving Factors of Pan-Genome

The evolution of the pan-genome is not driven by a single factor but results from multiple mechanisms. Over long-term evolution, species constantly adjust their genetic composition by acquiring, losing, or modifying genes to adapt to complex and changing environments. These driving factors are interrelated, influence one another, and jointly shape the dynamics of the pan-genome.

Horizontal Gene Transfer

Horizontal gene transfer (HGT) is the non-reproductive transfer of genes between different species or between individuals of the same species. It departs from traditional vertical inheritance and introduces new variables into biological evolution. As one of the core driving forces of microbial pan-genome evolution, HGT occurs mainly through three mechanisms: transformation, transduction, and conjugation.

  • Transformation is the process in which microorganisms directly take up naked DNA fragments from the environment and integrate them into their genomes through homologous recombination. It enables microorganisms to quickly acquire adaptive genes present in the environment.
  • Transduction is the process in which a phage (a virus that infects bacteria) packages a DNA fragment from a donor bacterium into phage particles during infection and then delivers that DNA into a recipient bacterium during a subsequent infection. Transduction is efficient and specific and plays a key role in the pan-genome expansion of pathogenic bacteria in particular.
  • Conjugation is the transfer of plasmid or chromosomal DNA from a donor bacterium to a recipient bacterium through direct cell-to-cell contact. It is one of the most important routes for the spread of drug-resistance genes among microorganisms.

Schematic diagram showing horizontal gene transfer and genome decay as the major mechanisms driving the evolutionary dynamics of lactic acid bacteria (Li et al., 2023)

Repeat Sequence Expansion

Repetitive sequences are widespread in the genome, and their expansion and contraction are important driving forces of pan-genome evolution. At the molecular level, repetitive sequences can be amplified by transposition of transposable elements, unequal crossing over, or DNA replication slippage. These dynamic changes not only alter gene copy number but also affect gene expression levels through dosage effects.

  • Transposable elements (TEs) are DNA sequences that can move within the genome. Transposition relocates the element and is often accompanied by an increase in its copy number, forming clusters of repeated sequences.
  • Unequal crossing over occurs during meiosis or mitosis when homologous chromosomes or sister chromatids misalign and pair at similar but non-allelic repeated segments, so that recombination produces a duplication on one chromosome and a deletion on the other.
  • DNA replication slippage occurs when, during replication of a repeated sequence, the newly synthesized strand mispairs with the template strand because the repeat units are complementary at more than one position. The polymerase slips and resumes replication out of register, which changes the number of repeat units, for example by adding extra copies.
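The outcome of these mechanisms, a different number of repeat units in different individuals, can be measured directly on sequence data. The following is a minimal sketch that counts the longest uninterrupted run of a given repeat unit. The function name `longest_tandem_run`, the repeat unit, and the two sequences are hypothetical toy values, not data from the article.

```python
import re
from typing import Dict


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    # One or more back-to-back copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical alleles that differ only in the number of CAG repeat units.
alleles: Dict[str, str] = {
    "individual_1": "GGT" + "CAG" * 5 + "TTA",
    "individual_2": "GGT" + "CAG" * 8 + "TTA",
}
copies = {name: longest_tandem_run(seq, "CAG") for name, seq in alleles.items()}
print(copies)
```

Comparing such counts across individuals gives a simple view of repeat copy-number variation. Dedicated tools are needed for imperfect repeats and for long or complex repeat arrays.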

In plant disease resistance, gene family expansion mediated by repetitive sequences shows strong evolutionary adaptability. In rice, for example, the nucleotide-binding site leucine-rich repeat (NBS-LRR) disease-resistance gene families form gene clusters of different sizes in different varieties through tandem duplication. Redundant copies in these clusters accumulate point mutations or structural variations and diverge to recognize specific pathogen effectors, allowing rice to mount specific immune responses in its long-term arms race with the rice blast fungus (Magnaporthe oryzae, formerly classified under M. grisea).

Dynamic evolution of orthologous gene families (Zhong et al., 2018)

Environmental Adaptation

Environmental pressure is another important factor driving pan-genome evolution. In nature, the complexity and dynamics of the environment act as a powerful selective filter that drives species to survive and reproduce through adaptive changes at the genetic level. Under different environmental conditions, species adapt through gene mutation and recombination.

These environment-driven changes are reflected not only in the emergence of new genes but also in gene loss and silencing. In cavefish, long-term darkness has led to the gradual degeneration of vision-related genes, while genes related to tactile perception and chemosensation have expanded markedly, producing a distinctive pan-genome structure adapted to the dark environment. Such cases show that environmental pressure shapes the diversity and specificity of a species' pan-genome through continuous selection, providing a genetic basis for the evolution of environmental adaptability.

A hypothetical model for the evolution of the Mtr pathway in Shewanella (Zhong et al., 2018)

Distribution Characteristics of Pan-Genome in Different Species

The distribution characteristics of the pan-genome vary with the type of species, showing marked differences in diversity and adaptability. These differences reflect the evolutionary histories of different species and are closely related to their living environments and biological characteristics, which makes them key to understanding the genetic diversity of species.

The pan-genome of microorganisms is usually highly plastic and open. In bacteria, for example, the genetic differences between strains can be very large, and the pan-genome of some bacteria can be several times the size of the core genome. This high degree of genetic diversity enables bacteria to survive and reproduce in a wide range of environments.

Compared with microorganisms, the pan-genomes of plants and animals are relatively conserved and closed. Nevertheless, they still contain rich genetic diversity.

Significance of Pan-Genome Research

As biological research deepens, pan-genome research is becoming increasingly significant. It overcomes the limitation of the single reference genome, integrates the genome information of all individuals within a species from a holistic perspective, and opens a new window for exploring the mysteries of life. Its scientific value is emerging in many key fields.

Reveal Species Diversity

Pan-genome research can reveal the genetic diversity of a species far more completely than a single reference genome and provides an important basis for research on species classification, evolution, and ecology. By comparing the pan-genomes of different species, or of individuals of the same species, we can understand the genetic relationships, evolutionary history, and ecological adaptability of species.

Adaptive Evolution

The variable genome contains a large number of genes related to environmental adaptation, and studying these genes helps us understand how species adapt to different environmental conditions during evolution.

Functional Redundancy

Functional redundancy means that multiple genes have the same or similar functions, which is common in pan-genomes. Functional redundancy gives species greater stability and adaptability in the face of pressures such as gene mutation. Studying functionally redundant genes in the pan-genome deepens our understanding of gene function and regulation and offers new ideas for gene function research and the development of gene editing technology.
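As a rough illustration of this definition, candidate redundant genes can be found by grouping genes that share a functional annotation. The following is a minimal sketch, assuming a gene-to-function annotation table is already available. The function name `redundant_functions`, the gene identifiers, and the function labels are hypothetical, and sharing an annotation is only a crude proxy for true functional redundancy.

```python
from collections import defaultdict
from typing import Dict, Set


def redundant_functions(annotations: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group genes by annotated function and keep functions covered by more than one gene."""
    by_function: Dict[str, Set[str]] = defaultdict(set)
    for gene, function in annotations.items():
        by_function[function].add(gene)
    return {f: genes for f, genes in by_function.items() if len(genes) > 1}


# Hypothetical annotation table: two genes share one annotated function.
annotations = {
    "geneA1": "sodium efflux",
    "geneA2": "sodium efflux",
    "geneB": "DNA replication initiation",
}
redundant = redundant_functions(annotations)
print(redundant)
```

In this toy table only the function carried by two genes is reported, so the loss of either gene alone would leave that function covered.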

Expression and evolutionary conservation of transcripts after clustering 16 barley transcriptomes (Contreras-Moreira et al., 2017)

Conclusion

In summary, the pan-genome concept and pan-genome research provide a new perspective and method for understanding the genetic diversity, evolutionary mechanisms, and biological functions of species. As sequencing technology continues to develop and costs continue to fall, pan-genome research will be carried out in more species, bringing new opportunities and challenges to biology, medicine, agriculture, and other fields.

References

  1. Guimarães LC, Florczak-Wyspianska J., et al. "Inside the Pan-genome-Methods and Software Overview." Curr Genomics. 2015 16(4): 245-52 https://doi.org/10.2174/1389202916666150423002311
  2. Dawid G, Sylwia N., et al. "Towards a better understanding of the bacterial pan-genome." Folia Biologica et Oecologica. 2021 17: 84-96 https://doi.org/10.18778/1730-2366.16.19
  3. Divya R, Harshit K., et al. "Pan-genomics: A review of analysis, evolution, applications and future prospects." The Pharma Innovation Journal. 2022 SP-11(10): 2189-2201 https://www.thepharmajournal.com/
  4. Li WC, Wu Q., et al. "Population and functional genomics of lactic acid bacteria, an important group of food microorganism: Current knowledge, challenges, and perspectives." Food Frontiers. 2025 5(1): 3-23 https://doi.org/10.1002/fft2.321
  5. Zhong C, Han M., et al. "Pan-genome analyses of 24 Shewanella strains re-emphasize the diversification of their functions yet evolutionary dynamics of metal-reducing pathway." Biotechnol Biofuels. 2018 11: 193 https://doi.org/10.1186/s13068-018-1201-1
  6. Contreras-Moreira B, Cantalapiedra CP., et al. "Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species." Front Plant Sci. 2017 8: 184 https://doi.org/10.3389/fpls.2017.00184
For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.
Copyright © CD Genomics. All Rights Reserved.