Traditional genomics takes a single reference genome as its research benchmark. The pan-genome concept changes this paradigm: it expands the scope of study from the genome of a single individual to the collection of all individual genomes within a species, including both the core genome that maintains what the species has in common and the variable genome that determines individual specificity.
This framework corrects the underestimation of genetic diversity in traditional research. By integrating genome-wide variation at the species level, it also provides a new way to analyze the driving forces of evolution, the paths of adaptive evolution, and functional gene networks, moving genomics from the genetic map of an individual to the genetic panorama of a species.
This article focuses on the pan-genome: it introduces the concept and its classification, explains its driving factors and its distribution characteristics in different species, and discusses its scientific significance in agriculture.
The pan-genome is the sum of the genomes of all individuals in a species. It contains the core genes shared by all individuals, which play a key role in basic life activities and in maintaining the basic biological characteristics of the species. It also covers the variable genes found only in some individuals, which are often related to an individual's special traits, environmental adaptability, and disease susceptibility.
This concept overcomes the limitation of the single reference genome used in traditional genomics. A single reference genome was usually constructed from one or a few individuals, which made it difficult to capture the widespread genetic variation within a species. By integrating genome data from multiple individuals, the pan-genome presents a panoramic view of the genetic diversity within a species.
Venn diagram plot that represents the three parts of the pan-genome (Guimarães et al., 2015)
The pan-genome consists of the core genome and the variable genome. The core genome is the set of genes shared by all individuals of a species, which determines its basic life characteristics. The variable genome is the set of genes that differ between individuals, which gives the species its genetic diversity. Together they constitute the complete set of genetic information of the species.
The core genome comprises the genes present in all individuals of a species, which are usually closely related to basic life activities and important biological functions. These genes are relatively conserved during evolution. Under the molecular clock hypothesis, the nucleotide substitution rate of core genes is relatively stable, which makes them important markers for constructing the phylogenetic tree of a species. The core genome is also involved in environmental adaptability: under extreme temperatures or high-salt conditions, core genes help maintain intracellular homeostasis through regulation of their expression, which strengthens the species' ability to survive.
The variable genome comprises the genes that differ among individuals of a species: they may be present in only some individuals, or they may differ in copy number or sequence between individuals. These differences arise mainly from genetic events such as gene gain and loss, copy number variation (CNV), and single nucleotide polymorphism (SNP). The variable genome enables a species to show rich phenotypic diversity under different environmental conditions and is an important genetic basis for adaptation to environmental change.
The variable genome is also involved in niche differentiation. For example, some insects develop preferences for different host plants through variation in specific taste receptor genes, which in turn can promote the formation of new species.
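The split into core and variable genes described above can be illustrated with a minimal Python sketch. It assumes that each individual has already been reduced to a set of gene-family identifiers; the function name `classify_pan_genome`, the genome labels, and the gene names are hypothetical and chosen only for illustration. The "unique" category (genes found in exactly one individual) is treated here as a subset of the variable genome, which is an assumption about how the three parts are drawn. Real pipelines also need gene clustering and often use a relaxed presence threshold rather than strict presence in every individual.

```python
from typing import Dict, Set


def classify_pan_genome(gene_sets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split genes into pan, core, variable, and unique sets.

    gene_sets maps a (hypothetical) genome label to the set of gene
    families detected in that individual.
    """
    if not gene_sets:
        raise ValueError("at least one genome is required")

    genomes = [set(genes) for genes in gene_sets.values()]
    pan = set().union(*genomes)          # every gene seen in any individual
    core = set.intersection(*genomes)    # genes present in all individuals
    variable = pan - core                # genes missing from at least one individual
    # Assumption: "unique" means present in exactly one individual.
    unique = {g for g in variable if sum(g in genes for genes in genomes) == 1}
    return {"pan": pan, "core": core, "variable": variable, "unique": unique}


# Toy data, invented for illustration only.
example = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g3", "g5"},
}
parts = classify_pan_genome(example)
print(sorted(parts["core"]))      # ['g1']
print(sorted(parts["variable"]))  # ['g2', 'g3', 'g4', 'g5']
print(sorted(parts["unique"]))    # ['g4', 'g5']
```

Plain sets keep the sketch dependency-free; for large cohorts a presence/absence matrix would likely be a more practical representation.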
Different components and types of the pan-genome (Divya et al., 2022)
Rarefaction curves for open (green) and closed (blue) pan-genomes (Dawid et al., 2021)
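Rarefaction (accumulation) curves of this kind can be approximated by adding genomes one at a time in random order and tracking how the pan-genome grows and the core genome shrinks. The sketch below is a simplified illustration under stated assumptions, not the method used in the cited figure: the function name `accumulation_curves`, its parameters, and the toy data are hypothetical, and genomes are again assumed to be sets of gene-family identifiers.

```python
import random
from typing import Dict, List, Set, Tuple


def accumulation_curves(
    gene_sets: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Return mean pan-genome and core-genome sizes after 1..N genomes."""
    genomes = [set(genes) for genes in gene_sets.values()]
    if not genomes:
        raise ValueError("at least one genome is required")

    rng = random.Random(seed)            # fixed seed for reproducible shuffles
    n = len(genomes)
    pan_totals = [0] * n
    core_totals = [0] * n

    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)               # random order in which genomes are added
        pan: Set[str] = set()
        core = set(order[0])
        for i, genes in enumerate(order):
            pan |= genes                 # the pan-genome can only grow
            core &= genes                # the core genome can only shrink
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)

    pan_curve = [t / n_permutations for t in pan_totals]
    core_curve = [t / n_permutations for t in core_totals]
    return pan_curve, core_curve


# Toy data, invented for illustration only.
example = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g3", "g5"},
}
pan_curve, core_curve = accumulation_curves(example)
print(pan_curve)
print(core_curve)
```

If the pan-genome curve keeps rising as more genomes are added, the pan-genome is usually described as open; if it levels off, it is described as closed. With only three toy genomes the curve is too short to support either conclusion.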
The construction of a pan-genome relies on advanced sequencing technologies. Modern approaches combine long-read sequencing (e.g., PacBio) to resolve structural variations and multi-sample designs to capture genetic diversity, enabling a comprehensive representation of both core and variable genomes.
The evolution of the pan-genome is not driven by a single factor but is the result of multiple mechanisms. Over long-term evolution, species constantly adjust their genetic composition by acquiring, losing, or modifying genes to adapt to a complex and changing environment. These driving factors are interrelated, influence each other, and jointly shape the dynamic changes of the pan-genome.
Horizontal Gene Transfer (HGT) refers to the non-reproductive transfer process of genes between different species or individuals of the same species, which breaks the inherent mode of traditional vertical inheritance and introduces new variables for biological evolution. As one of the core driving forces of microbial Pan-Genome evolution, HGT is mainly realized through three mechanisms: transformation, transduction, and conjugation.
Schematic diagram showing horizontal gene transfer and genome decay as the major mechanisms driving the evolution dynamics of lactic acid bacteria (Li et al., 2023)
Repetitive sequences are widespread in the genome, and their expansion and contraction are important driving forces in the evolution of the pan-genome. At the molecular level, repetitive sequences can be amplified by transposable element jumping, unequal crossing-over, or DNA replication slippage. These dynamic changes alter gene copy number and, through dosage effects, gene expression levels.
In plant disease resistance, gene family expansion mediated by repetitive sequences is a clear example of evolutionary adaptation. In rice, the NBS-LRR disease-resistance gene families, which contain leucine-rich repeats (LRR), form gene clusters of different sizes in different varieties through tandem duplication. Redundant copies within these clusters accumulate point mutations or structural variations and diverge to recognize specific pathogen effectors, so that rice can mount specific immune responses in its long-term arms race with the rice blast fungus Magnaporthe grisea.
Dynamic evolution of orthologous gene family (Zhong et al., 2018)
Environmental pressure is another important factor in the evolution of the pan-genome. In nature, the complexity and constant change of the environment act as a powerful selective filter, driving species to survive and reproduce through adaptive changes at the genetic level, including gene mutation and recombination.
These pressure-driven changes are reflected not only in the emergence of new genes but also in the loss and silencing of genes. In cavefish, the long-term dark environment led to the gradual degeneration of vision-related genes, while genes related to tactile perception and chemical sensing expanded significantly, forming a distinctive pan-genome structure adapted to darkness. Such cases show that environmental pressure shapes the diversity and specificity of a species' pan-genome through continuous selection, providing a genetic basis for the evolution of environmental adaptability.
A hypothetical model for the evolution of the Mtr pathway in Shewanella (Zhong et al., 2018)
The distribution characteristics of the pan-genome vary between types of species, with marked differences in diversity and adaptability. These differences reflect the evolutionary history of each species and are closely related to its living environment and biological characteristics, which makes them key to understanding the genetic diversity of species.
The pan-genome of microorganisms is usually highly plastic and open. In bacteria, for example, the genetic differences between strains can be very large, and the pan-genome of some bacteria can be several times the size of the core genome. This high degree of genetic diversity enables bacteria to survive and reproduce in a wide range of environments.
Compared with that of microorganisms, the pan-genome of plants and animals is relatively conserved and closed. Even so, the pan-genomes of plants and animals still contain rich genetic diversity.
As biological research deepens, pan-genome research is becoming increasingly important. It overcomes the limitation of the single reference genome, integrates the genome information of all individuals of a species from a holistic perspective, and opens a new window on fundamental biological questions. Its scientific value is becoming apparent in many key fields.
Pan-genome research can fully reveal the genetic diversity of species and provide an important basis for species classification, evolution, and ecological research. By comparing the pan-genomes of different species or individuals of the same species, we can understand the genetic relationship, evolutionary process, and ecological adaptability of species.
The variable genome in the Pan-Genome contains a large number of genes related to environmental adaptation, and the study of these genes can help us understand how species adapt to different environmental conditions during evolution.
Functional redundancy means that multiple genes have the same or similar functions, which is common in the pan-genome. Functional redundancy makes a species more stable and adaptable when faced with perturbations such as gene mutation. Studying functionally redundant genes in the pan-genome deepens our understanding of gene function and regulation, and provides new ideas for gene function research and the development of gene editing technology.
Expression and evolutionary conservation of transcripts after clustering 16 barley transcriptomes (Contreras-Moreira et al., 2017)
To sum up, the concept and research of the Pan-Genome provide a new perspective and method for us to deeply understand the genetic diversity, evolutionary mechanism, and biological functions of species. With the continuous development of sequencing technology and the continuous reduction of cost, Pan-Genome research will be carried out in more species, which will bring new opportunities and challenges to the development of biology, medicine, agriculture, and other fields.