Fact Sheet: Introns and Exons

CD Genomics Blog

Posted on February 21, 2024

What Is an Exon?

An exon (expressed region) is a coding segment of a split eukaryotic gene. It is retained after splicing and, in protein-coding genes, provides the sequence that is translated during protein biosynthesis. Exon sequences are present in both the primary transcript and the mature RNA molecule.

The genome encompasses the entire genetic information stored in the DNA of an organism. It comprises both genes, which encode functional products, and non-coding regions. The human genome, at approximately 3 billion base pairs (3 Gb), consists predominantly of non-coding sequence; protein-coding regions make up only roughly 1 to 2%.

The exome, by contrast, is the complete set of exons in the genome. Humans have approximately 180,000 exons, which together constitute roughly 1% of the genome, or about 30 million base pairs (30 Mb).
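As a quick sanity check, the approximate figures quoted above can be related with simple arithmetic. The sketch below uses only those rounded values; the variable names are illustrative, and the implied mean exon length is a back-of-the-envelope estimate rather than a measured statistic.

```python
# Approximate figures quoted in the text (rounded values, not exact measurements).
GENOME_BP = 3_000_000_000   # ~3 Gb human genome
EXOME_BP = 30_000_000       # ~30 Mb exome
EXON_COUNT = 180_000        # approximate number of human exons

# Share of the genome occupied by exons.
exome_fraction = EXOME_BP / GENOME_BP

# Average exon length implied by the two rounded figures above.
mean_exon_length = EXOME_BP / EXON_COUNT

print(f"Exome share of genome: {exome_fraction:.1%}")
print(f"Implied mean exon length: {mean_exon_length:.0f} bp")
```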

What Is an Intron?

Introns, also known as intervening sequences, are stretches of sequence that interrupt the linear coding sequence of a gene. They lie between the exons in the DNA of eukaryotic cells. During transcription, introns are copied into the precursor RNA; they are then removed by splicing and are therefore absent from the mature RNA molecule.

The split gene structure is defined by the alternating arrangement of introns and exons. Introns within precursor RNAs are commonly called "intervening sequences," reflecting their placement between the exons.

The Origin of Introns

According to one long-standing hypothesis (often called "introns-early"), introns are as old as the genes they inhabit and predate the assembly of the first such genes. In this view, early introns were capable of autocatalysis and self-replication, and so played a crucial role in organizing and replicating primitive genes and genomes.

Prokaryotes and certain lower eukaryotes are then thought to have lost their introns, driven by the need for rapid DNA replication and, consequently, swift cell division.

Modern introns can therefore be seen as evolutionary relics. They persist because they make it possible to recombine exons within the genome (exon shuffling), giving rise to new genes. In essence, introns enhance the evolutionary potential of the organisms that carry them.

Types of Introns

Introns are classified by whether splicing occurs autonomously or requires the spliceosome, which gives two broad groups: self-splicing introns and spliceosomal introns.

Self-Splicing Introns: Identified by Thomas Cech in 1981, self-splicing introns are further divided into two types: Group I introns and Group II introns.
Spliceosomal Introns: Introns that are excised by the spliceosome. Whether a sequence is recognized as an intron or an exon depends on signals within the transcript itself, such as the splice-site sequences.

Additionally, spliceosomal introns are categorized by the dinucleotides at their boundaries:

GT-AG Intron: The most prevalent intron type, beginning with GT and ending with AG in the DNA (GU and AG in the RNA transcript).
AT-AC Intron: A rare type, beginning with AT and ending with AC in the DNA (AU and AC in the RNA transcript).
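The boundary rule above can be sketched as a small function. This is a minimal illustration, assuming the input is the intron sequence alone (first base to last base); the name classify_intron is hypothetical, and real annotation pipelines rely on much more than two dinucleotides.

```python
def classify_intron(intron_seq: str) -> str:
    """Label an intron by its first and last two bases.

    Accepts DNA or RNA letters; RNA input is converted to the DNA alphabet
    so that GU-AG and GT-AG are treated as the same class.
    """
    seq = intron_seq.upper().replace("U", "T")
    if len(seq) < 4:
        raise ValueError("sequence too short to have two distinct boundary dinucleotides")
    start, end = seq[:2], seq[-2:]
    if (start, end) == ("GT", "AG"):
        return "GT-AG"
    if (start, end) == ("AT", "AC"):
        return "AT-AC"
    return "other"


# Toy sequences, invented for illustration only.
print(classify_intron("GTAAGTCTAACAG"))
print(classify_intron("AUAUCCUUUAC"))
```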

Further classifications based on organisms and RNA types include:

GU-AG Class (Major Introns): Found in the nuclear pre-mRNA of eukaryotes.
AU-AC Class (Minor Introns): Also found in the nuclear pre-mRNA of eukaryotes.

Classifications based on RNA types and locations include:

Group I Introns: Found in the nuclear pre-rRNA of some eukaryotes, organelle RNA, and a few bacterial RNAs.
Group II Introns: Found in organelle RNA and some bacterial RNA, primarily in the mitochondrial and chloroplast genes of eukaryotes.
Group III Introns: Present in organelle RNAs.
Double Introns (Twintrons): Identified in organelle RNA.
Introns in Pre-tRNA: Found in the nuclear pre-tRNA of eukaryotes.

How Do Cells Distinguish Between Introns and Exons?

The precursor messenger RNAs (pre-mRNAs) transcribed from the genomes of eukaryotic organisms, from S. cerevisiae to H. sapiens, contain both exons and introns. In most cases, only the exons are translated to produce the target proteins. Before translation, pre-mRNAs therefore undergo a crucial step known as "splicing," in which the introns are removed and the exons are joined together. Splicing is essential for genetic information to flow correctly into protein synthesis.
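At the level of sequence, the outcome of splicing can be sketched as joining the exon segments in order. The example below is only a string-level illustration, assuming the exon coordinates are already known (0-based, end-exclusive); the function name splice and the toy sequences are hypothetical, and nothing here models how the cell actually locates those boundaries.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in transcript order, dropping everything in between."""
    ordered = sorted(exons)
    # Exons must not overlap, otherwise bases would be duplicated in the output.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exon intervals overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy transcript: exon 1 + one GU...AG intron + exon 2 (invented sequences).
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUCUAACUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
exon_coords = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature = splice(pre_mrna, exon_coords)
print(mature)
```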

The precise removal of introns from pre-mRNA is carried out by a complex and dynamic RNA-protein assembly called the spliceosome. The spliceosome consists of five small nuclear ribonucleoprotein particles (snRNPs), each containing a small nuclear RNA (snRNA) and associated proteins. This machinery is assembled de novo on the pre-mRNA substrate. The initial step in pre-mRNA splicing is the accurate recognition of introns and exons. This recognition is determined by three conserved positions in the pre-mRNA: the 5′ splice site (5′-SS, GU), the branch point sequence (BPS, A), and the 3′ splice site (3′-SS, AG). These positions are specifically identified by spliceosome components.

Once assembly has begun, the spliceosome carries out a sequential two-step transesterification reaction, driven by the addition of further components and by conformational changes. The process yields a mature mRNA that is ready for translation.

Exon and Exome Sequencing

Exons collectively carry crucial genetic information. Exome sequencing is a genomic analysis technique that selectively captures and enriches the DNA of the exonic regions across the whole genome. It combines sequence capture technology with high-throughput sequencing to read the genetic information within genes.

Exome sequencing is widely used to screen for gene mutations associated with familial hereditary diseases and rare conditions. Its results can also help guide personalized medication strategies.

Compared with whole-genome sequencing, whole-exome sequencing makes high-depth sequencing easier to achieve, because the target region is much smaller: exons constitute only about 1% of the human genome. This targeted approach improves the detection of low-frequency and rare variants while reducing testing costs and data storage requirements.
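The depth advantage follows from simple arithmetic: for a fixed amount of sequence data, mean depth is the number of usable bases divided by the size of the target. The sketch below is illustrative only; the 90 Gb data volume and the 50% on-target fraction are hypothetical values chosen for the example, not figures from any particular platform or capture kit.

```python
def mean_depth(sequenced_bases: float, target_size_bp: float,
               on_target_fraction: float = 1.0) -> float:
    """Mean coverage depth = usable sequenced bases / target size."""
    return sequenced_bases * on_target_fraction / target_size_bp


GENOME_BP = 3_000_000_000   # ~3 Gb, as quoted in the text
EXOME_BP = 30_000_000       # ~30 Mb, as quoted in the text
DATA_BASES = 90_000_000_000  # hypothetical 90 Gb of sequence data

# Whole-genome sequencing: all reads count toward the 3 Gb target.
wgs_depth = mean_depth(DATA_BASES, GENOME_BP)

# Whole-exome sequencing: assume (hypothetically) half the reads land on target.
wes_depth = mean_depth(DATA_BASES, EXOME_BP, on_target_fraction=0.5)

print(f"WGS mean depth: {wgs_depth:.0f}x")
print(f"WES mean depth: {wes_depth:.0f}x")
```

Even with imperfect capture, the same data volume gives far higher depth on the exome, which is why low-frequency variants are easier to detect there.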

Introns and Whole Genome Sequencing

Whole exome sequencing (WES) is a crucial clinical test extensively employed to identify pathogenic variants associated with single-gene genetic diseases. While it covers exons and exon-intron boundaries, a notable limitation is its inability to completely detect pathogenic variants in non-coding regions, potentially leading to oversight of deep intronic variants. On the other hand, whole genome sequencing (WGS) theoretically unveils all genetic variants, but its challenge lies in the intricate interpretation of a vast array of coding and non-coding variants.

RNA sequencing (RNA-seq) is a valuable complementary approach because it measures gene transcripts directly. This makes it possible to identify pre-mRNA splicing abnormalities, which provides a starting point for experimental analysis and evidence to support the pathogenicity of a variant.

In clinical scenarios where patient symptoms strongly align with a genetic disease, yet WES fails to reveal potentially pathogenic variants in coding regions and exon-intron boundaries, consideration should be given to the possibility of deep intronic pathogenic variants. In such cases, the detection of deep intronic variants can be accomplished through RNA-seq or WGS.
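To make the distinction between exonic, boundary, and deep intronic positions concrete, the sketch below classifies a position relative to a gene's exon intervals. It is a simplified illustration: the function name classify_position, the toy coordinates, and the 20 bp boundary window are all assumptions made for this example, since the flanking region actually covered depends on the capture design used.

```python
def classify_position(pos: int, exons: list[tuple[int, int]],
                      boundary_bp: int = 20) -> str:
    """Classify a position against exon intervals (0-based, end-exclusive).

    boundary_bp is a hypothetical flank size standing in for the
    exon-intron boundary region that exome capture typically covers.
    """
    gene_start = min(start for start, _ in exons)
    gene_end = max(end for _, end in exons)
    if not gene_start <= pos < gene_end:
        return "outside transcript"
    if any(start <= pos < end for start, end in exons):
        return "exonic"
    if any(start - boundary_bp <= pos < end + boundary_bp for start, end in exons):
        return "exon-intron boundary"
    return "deep intronic"


# Toy gene with two exons separated by an 800 bp intron (invented coordinates).
toy_exons = [(100, 200), (1000, 1100)]
for position in (150, 205, 600, 990):
    print(position, classify_position(position, toy_exons))
```

Positions labeled "deep intronic" here are the ones a WES assay would be expected to miss, and therefore the ones for which RNA-seq or WGS becomes relevant.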