Throughout the history of the life sciences, the search for the nature of genetic material has been a central driving force of the discipline. In the 19th century, Gregor Mendel's pea hybridization experiments revealed the laws of segregation and independent assortment of hereditary factors, which launched the scientific study of inheritance. Later, Thomas Hunt Morgan's genetic studies of Drosophila showed that genes are arranged linearly on chromosomes. This moved genetics from macroscopic phenotypes toward the cellular and molecular level.
As the core biological macromolecule that carries genetic information, DNA stores, transmits, and expresses that information through the precise order of its bases and the dynamic regulation of its three-dimensional conformation. Detailed analysis of DNA structure and function is therefore a cornerstone for understanding the nature of life, exploring the mechanisms of disease, and driving innovation in biotechnology.
This article summarizes the basic principles and structure of DNA and discusses the scientific value and application potential of research in this area.
What Is DNA
DNA (deoxyribonucleic acid) is a biological macromolecule made of deoxyribonucleotides linked by phosphodiester bonds. It contains only five elements: carbon, hydrogen, oxygen, nitrogen, and phosphorus. Yet this simple composition forms the genetic material of all known cellular organisms and of many viruses (some viruses, such as RNA viruses, use RNA instead). In eukaryotic cells, most DNA resides in the chromosomes of the nucleus, and small amounts are also found in organelles such as mitochondria and chloroplasts. Together, these genetic materials direct the growth, development, reproduction, and metabolism of the organism.
Biologically, DNA is the central carrier of genetic information. It stores the instructions needed to synthesize proteins and passes them to daughter cells through cell division, ensuring that a species' hereditary characteristics are stably maintained. At the same time, changes in DNA sequence supply the raw material for evolution, allowing natural selection to adapt species to changing environments. DNA is thus both the guardian of life's continuity and a source of biodiversity.

The biochemical structure of DNA (Amshawee et al., 2024)
DNA Molecular Structure
The breakthrough in understanding DNA's molecular structure came in the 1950s. In 1953, James Watson and Francis Crick built a double-helix model of DNA based on X-ray diffraction data obtained by Rosalind Franklin. The discovery is widely regarded as one of the greatest scientific achievements of the 20th century and laid the foundation for molecular biology.
DNA is a double helix. Its backbone consists of alternating deoxyribose sugars and phosphate groups joined by phosphodiester bonds, and the two backbones wind around the outside of the helix in a regular spiral. The negatively charged phosphate groups face the surrounding solution, and the sugar-phosphate backbone gives the molecule a regular, relatively stiff framework. Inside the helix, the bases pair through hydrogen bonds to form flat "rungs". Stacked on top of one another, these base pairs form the functional core of the double helix.

Basic DNA structure proposed by Watson and Crick (Zahid et al., 2013)
DNA contains four nitrogenous bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Base pairing strictly follows the Watson-Crick rules: A pairs specifically with T through two hydrogen bonds, and G pairs with C through three hydrogen bonds. This complementary base pairing helps stabilize the double helix. It is also the molecular basis of semiconservative replication and of the accurate transmission of genetic information, ensuring that genetic material is copied with high fidelity.
In the classic Watson-Crick model, DNA is a right-handed double helix with about 10 base pairs (bp) per turn, a rise of 0.34 nm per base pair, a pitch of 3.4 nm, and a diameter of about 2 nm. Later measurements of B-form DNA in solution give about 10.5 bp per turn. The two polynucleotide strands are antiparallel: one runs 5′→3′ and the other 3′→5′. This arrangement aligns the bases so that they pair correctly and contributes to the thermodynamic stability of the three-dimensional structure.
Structural analysis shows that the surface of the double helix has a major groove and a minor groove that differ in width and depth. The grooves expose the edges of the base pairs and act as specific recognition surfaces for DNA-binding proteins. This is why they play a key role in transcriptional regulation, DNA replication and repair, and other life processes.
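The pairing rules above are simple enough to state as code. The sketch below is illustrative only: the function names are hypothetical, not from any library. It derives the partner strand of a DNA sequence and counts the hydrogen bonds in the duplex, assuming only Watson-Crick pairs (2 bonds per A-T, 3 per G-C). Because the strands are antiparallel, the partner strand written 5′→3′ is the *reverse* of the base-by-base complement.

```python
# Watson-Crick partners in DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by each base's pair (A-T: 2, G-C: 3).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5to3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    The strands are antiparallel, so we complement each base and
    reverse the order to read the partner in its own 5'->3' direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5to3.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds holding the duplex together (one strand given)."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    top = "ATGCGC"
    print(complementary_strand(top))  # GCGCAT
    print(hydrogen_bonds(top))        # 16  (2 + 2 + 3 + 3 + 3 + 3)
```

A GC-rich stretch therefore carries more hydrogen bonds per base pair than an AT-rich one of the same length, which is one reason GC-rich DNA is harder to separate into single strands.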
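The helix parameters fit together arithmetically: pitch = base pairs per turn × rise per base pair, and the stretched-out length of a molecule = number of base pairs × rise. The minimal sketch below assumes a rise of 0.34 nm per base pair and 10.5 bp per turn for B-DNA in solution (pass `bp_per_turn=10` for the idealized 1953 model). The function names are illustrative.

```python
# Assumed B-DNA geometry (see lead-in).
RISE_NM_PER_BP = 0.34   # axial distance between stacked base pairs
BP_PER_TURN = 10.5      # B-DNA in solution; the 1953 model used 10


def helix_turns(n_bp: int, bp_per_turn: float = BP_PER_TURN) -> float:
    """Number of full 360-degree turns spanned by n_bp base pairs."""
    return n_bp / bp_per_turn


def helix_pitch_nm(bp_per_turn: float = BP_PER_TURN,
                   rise_nm: float = RISE_NM_PER_BP) -> float:
    """Length of one full turn measured along the helix axis."""
    return bp_per_turn * rise_nm


def contour_length_nm(n_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Length of a fully stretched double helix of n_bp base pairs."""
    return n_bp * rise_nm


if __name__ == "__main__":
    print(f"{helix_pitch_nm(bp_per_turn=10):.2f} nm")  # 3.40 nm (1953 model)
    print(f"{helix_pitch_nm():.2f} nm")                # 3.57 nm (10.5 bp/turn)
    print(helix_turns(21))                             # 2.0
    # Human chromosome 1 (~249 million bp), stretched out end to end:
    print(f"{contour_length_nm(249_000_000) / 1e6:.1f} mm")  # 84.7 mm
```

The last line shows why chromosomal DNA must be packaged: a single chromosome's DNA would be roughly 8.5 cm long if fully extended.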

The double-helical forms of DNA (A, B, Z) according to X-ray fiber diffraction analysis (Amshawee et al., 2024)
Sequencing has advanced from Sanger's early chain-termination method to today's high-throughput next-generation sequencing (NGS). It has leapt from reading hundreds of bases at a time to producing terabase-scale data in a single run, making whole-genome sequencing, transcriptome analysis, and metagenomic studies feasible. These technologies are moving the life sciences from descriptive research toward precise regulation and provide key data for personalized medicine, ecological protection, and agricultural innovation.
Basic Function of DNA
The core function of DNA is to direct protein synthesis, which happens in two key steps: transcription and translation. A gene is a segment of a DNA molecule with a specific nucleotide sequence that carries heritable functional information. Through these two steps, that information is converted into protein molecules with biological functions, which carry out and regulate life activities.
Transcription
In eukaryotes, transcription takes place in the nucleus (in prokaryotes, which lack a nucleus, it occurs in the cytoplasm). It is the synthesis of RNA using one strand of DNA as a template, catalyzed by RNA polymerase, and proceeds as follows:
- First, RNA polymerase recognizes and binds the promoter, a DNA region with a specific sequence that determines where transcription starts and in which direction it proceeds.
- RNA polymerase then locally unwinds the DNA double helix. Using one strand (the template strand) as a template, it links free ribonucleotides into an RNA strand complementary to the template, following the base-pairing rules (template base → RNA base: A→U, T→A, C→G, G→C; note that uracil, U, replaces thymine, T, in RNA). The RNA grows in the 5′→3′ direction.
- When RNA polymerase reaches a terminator sequence in the DNA, transcription ends and the newly synthesized RNA is released from the DNA template.
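The base-pairing step of transcription can be sketched in a few lines. This is a simplified model under stated assumptions: promoter recognition and termination are not modeled, and the function names are hypothetical. RNA polymerase reads the template 3′→5′ while extending the RNA 5′→3′, so if the template is written 3′→5′, each template base maps to the RNA base at the same position. Equivalently, the RNA matches the coding (non-template) strand with T replaced by U.

```python
# Template-strand base -> base added to the growing RNA (U replaces T in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


def transcribe_from_coding(coding_5to3: str) -> str:
    """Shortcut: the RNA equals the coding strand with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    # Template 3'-TACAAACCGATT-5' pairs with coding 5'-ATGTTTGGCTAA-3'.
    print(transcribe("TACAAACCGATT"))              # AUGUUUGGCUAA
    print(transcribe_from_coding("ATGTTTGGCTAA"))  # AUGUUUGGCUAA
```

Both routes give the same transcript, which is why gene sequences are usually reported as the coding strand.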
By function, the RNAs produced by transcription fall mainly into three types: messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA):
- mRNA carries the genetic information copied from DNA and serves as the template for protein synthesis.
- tRNA carries amino acids to the ribosome during translation.
- rRNA is a structural and catalytic component of ribosomes, the sites of protein synthesis.

How a gene is expressed, from double-stranded DNA to protein (Abass et al., 2021)
Translation
Translation takes place on ribosomes in the cytoplasm and converts the nucleotide sequence of an mRNA into the amino acid sequence of a protein. It requires the cooperation of tRNAs, ribosomes, and various protein factors, and is divided into three stages: initiation, elongation, and termination.
- Initiation: the mRNA binds to the small ribosomal subunit. An initiator tRNA carrying the first amino acid (usually N-formylmethionine in prokaryotes and methionine in eukaryotes) then pairs through its anticodon with the start codon (AUG) on the mRNA. Finally, the large ribosomal subunit joins to form the complete translation initiation complex.
- Elongation: aided by elongation factors, the ribosome moves along the mRNA from the 5′ end toward the 3′ end. At each codon, a tRNA carrying the matching amino acid binds to the codon through its anticodon, and the incoming amino acid is joined to the previous one by a peptide bond. The now-empty tRNA leaves the ribosome, and the ribosome advances to the next codon. Repeating this cycle gradually lengthens the polypeptide chain.
- Termination: when the ribosome reaches a stop codon (UAA, UAG, or UGA) on the mRNA, no tRNA can pair with it. Instead, a release factor recognizes the stop codon, the newly synthesized polypeptide is released, and the ribosome dissociates. The polypeptide then folds, and may be chemically modified, into a protein with a specific three-dimensional structure and biological function.
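The three stages map onto a simple loop over codons. The sketch below is a simplification: it treats the first AUG in the mRNA as the start codon, ignores ribosome-binding signals and the protein factors involved, and uses hypothetical function names. The table is the standard genetic code, packed in U, C, A, G order for the first, second, and third codon positions.

```python
BASES = "UCAG"
# Standard genetic code, 64 entries in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Return one-letter amino acid codes from the first AUG to the first stop.

    Returns '' when no AUG is present. Only complete codons are read.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")                     # initiation: locate start codon
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):   # elongation: one codon per step
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":                       # termination at UAA/UAG/UGA
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGUUUGGCUAAGC"))  # MFG  (Met-Phe-Gly, then UAA stop)
```

Packing the table as a 64-character string indexed by base position keeps the whole genetic code on one line while still allowing a per-codon lookup.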

The codon-based model for mRNA translation (Zhao et al., 2014)
The path from DNA to protein shows how accurately and efficiently biological information is transmitted. DNA stores the genetic information and, through transcription and translation, converts it into proteins. Proteins are the main executors of life activities: they catalyze chemical reactions, form cell structures, transmit signals, and perform many other functions. This flow of information, DNA → RNA → protein, is known as the central dogma of molecular biology and is one of the core theories of modern biology.
Relationship Between DNA and Genes
DNA, the genetic material of most organisms, is a long-chain macromolecule built from deoxyribonucleotides linked by phosphodiester bonds. Each nucleotide consists of three parts: a deoxyribose sugar, a phosphate group, and a nitrogenous base. The concept of the gene originated with Mendel's "hereditary factors" and has become more precise as molecular biology has developed. Functionally, a gene is a nucleotide sequence in DNA that has a genetic effect and is the basic functional unit of genetic information. Here, "genetic effect" mainly means the ability to encode a protein or an RNA, or to take part in regulating gene expression.
Modern research shows that a gene includes not only coding regions (such as exons) but also non-coding regulatory sequences (such as promoters and enhancers), which together make up a complete functional unit. The Human Genome Project found that only about 2% of human DNA codes for protein. The remaining non-coding regions also contain many "RNA genes" with regulatory functions, which further broadens the concept of the gene.
Linear Distribution of Genes on DNA Chain
At the chromosomal level, DNA exists as chromatin, a complex of DNA and protein. A gene, as a hereditary unit with a specific function, is essentially a functional nucleotide segment within the DNA molecule. Human diploid somatic cells contain 23 pairs of chromosomes, each made of a single linear double-stranded DNA molecule bound to histones and other proteins.
Take chromosome 1 as an example. Its DNA is about 249 million base pairs long, and bioinformatic annotation identifies about 3000 functional genes on it. These genes are arranged linearly along the DNA molecule and separated by non-coding intergenic regions, so each gene occupies its own defined position (locus). This linear organization also provides the structural basis for recombination (crossing over) between homologous chromosomes during meiosis, which is key to maintaining genetic diversity within a species.
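Linear gene layout lends itself to an interval model. In the minimal sketch below, each gene is a half-open coordinate interval `[start, end)` on a linear chromosome, and the non-genic stretches between genes are recovered by sorting the intervals and walking along them. The coordinates in the example are toy values, not real annotation data, and `intergenic_regions` is a hypothetical helper. Overlapping genes are merged so that no base is counted as both genic and intergenic.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base-pair coordinates


def intergenic_regions(genes: List[Interval], chrom_length: int) -> List[Interval]:
    """Return the gaps between (and around) genes on a linear chromosome."""
    gaps: List[Interval] = []
    cursor = 0                       # first base not yet covered by a gene
    for start, end in sorted(genes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)    # merges overlapping or nested genes
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


if __name__ == "__main__":
    toy_genes = [(10, 30), (25, 40), (60, 80)]   # hypothetical coordinates
    gaps = intergenic_regions(toy_genes, 100)
    print(gaps)                                   # [(0, 10), (40, 60), (80, 100)]
    print(sum(e - s for s, e in gaps) / 100)      # 0.4 of this toy chromosome
```

For scale, the figures quoted above for chromosome 1 (about 249 million bp and about 3000 genes) imply an average of roughly one gene per 83,000 bp.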

Three-way comparison of mouse, human, and bovine genomic sequences around Xist (Chureau et al., 2002)
How DNA Sequence Determines Gene Function
The biological function of a gene is ultimately determined by its DNA nucleotide sequence. In protein-coding genes, triplet codons (for example, the start codon AUG, which encodes methionine) specify the linear order of amino acids according to the genetic code. In non-coding RNA genes, the nucleotide sequence determines, through complementary base pairing within the transcript, the secondary structure and three-dimensional conformation of the RNA product, which in turn governs how it interacts with other molecules.
Notably, different genes on the same DNA molecule can share regulatory sequences and form co-expressed gene networks. In the lactose (lac) operon of Escherichia coli, for example, three functionally related genes (lacZ, lacY, and lacA) are controlled by a single promoter and operator and are transcribed together as one mRNA. This shows how efficiently DNA sequence can be organized.
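How base pairing within a single RNA chain shapes its secondary structure can be illustrated with the classic Nussinov dynamic-programming algorithm. It finds the nested structure with the most base pairs, allowing A-U, G-C, and G-U wobble pairs and requiring at least `min_loop` unpaired bases inside each hairpin. This is a deliberately simplified model chosen for illustration. Practical RNA-folding tools (for example ViennaRNA) minimize free energy with nearest-neighbor parameters instead of counting pairs. The recursive traceback is meant for short sequences.

```python
from typing import Tuple

# Allowed pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair common in RNA.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximize nested base pairs; return (pair_count, dot-bracket structure)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs in seq[i..j]; spans too short to pair stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])        # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)    # i pairs with j
            for k in range(i + 1, j):                     # two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    structure = ["."] * n

    def traceback(i: int, j: int) -> None:
        """Recover one optimal structure by retracing the dp choices."""
        if j - i <= min_loop:
            return
        if dp[i][j] == dp[i + 1][j]:
            traceback(i + 1, j)
        elif dp[i][j] == dp[i][j - 1]:
            traceback(i, j - 1)
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            traceback(i + 1, j - 1)
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    traceback(i, k)
                    traceback(k + 1, j)
                    return

    traceback(0, n - 1)
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))'): a stem closing an AAA hairpin loop
```

Changing a single base in the stem (for example the final C to A) reduces the number of pairs the sequence can form. This is a small-scale version of the point that sequence determines structure.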
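Shared regulation can be sketched as a Boolean model. The lac repressor (LacI) blocks the operator unless lactose is present, because allolactose releases it. Strong transcription also needs the CAP-cAMP activator, which forms when glucose is scarce. Because lacZ, lacY, and lacA sit behind one promoter and operator, the model assigns them the same level. This is a textbook simplification with hypothetical function names: real expression is graded, and the low "leaky" basal level is ignored.

```python
LAC_GENES = ("lacZ", "lacY", "lacA")


def lac_operon(lactose_present: bool, glucose_present: bool) -> dict:
    """Simplified Boolean model of lac operon transcription in E. coli."""
    repressor_on_operator = not lactose_present  # allolactose releases LacI
    cap_camp_bound = not glucose_present         # cAMP rises when glucose is scarce
    if repressor_on_operator:
        level = "off"    # basal (leaky) transcription ignored in this sketch
    elif cap_camp_bound:
        level = "high"
    else:
        level = "low"    # lactose present, but glucose still represses
    # One shared promoter/operator -> one shared level for all three genes.
    return {gene: level for gene in LAC_GENES}


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            print(lactose, glucose, lac_operon(lactose, glucose)["lacZ"])
```

Returning one level for all three genes reflects the co-expression described above. A regulatory change at the shared promoter or operator affects every gene in the operon at once.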

Exploring the parallels between the distributional hypothesis of word semantics and gene function (Kwon et al., 2024)
Information Transmission from DNA Replication to Gene Expression
Semiconservative replication of DNA is the basis of stable inheritance. During replication, the double helix is unwound into two single strands, and each strand serves as a template for synthesizing a new complementary strand. The result is two DNA molecules with the same sequence as the parent molecule, each consisting of one original (parental) strand and one newly synthesized strand, hence "semiconservative". Because genes are segments of DNA, this process ensures that genes are accurately passed on to daughter cells during cell division.
Gene expression begins with transcription, in which RNA is synthesized using one DNA strand as a template. During translation, the codons of the mRNA pair with the anticodons of tRNAs, which bring the corresponding amino acids to the ribosome for protein synthesis. Non-coding RNA genes are also transcribed, producing functional RNAs that take part in regulation. The whole process starts from the sequence information in DNA, is inherited through replication, and is converted from nucleic acid into protein through transcription and translation. Together these steps form the core pathway of biological information transmission.
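What "semiconservative" means over several generations can be shown by tracking strand origins rather than sequences. In the sketch below, each duplex is a pair of strand labels. Every round keeps each strand and pairs it with a freshly made one. The labels and function names are illustrative, and "new" marks any strand synthesized after the starting molecule. The bookkeeping follows the same logic as the density-labeling experiment of Meselson and Stahl (1958).

```python
from collections import Counter
from typing import List, Tuple

Duplex = Tuple[str, str]  # origin labels of the two strands in one double helix


def replicate(duplexes: List[Duplex]) -> List[Duplex]:
    """One round of semiconservative replication: each strand templates a new one."""
    daughters: List[Duplex] = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))  # strand_a kept, paired with new copy
        daughters.append((strand_b, "new"))  # strand_b kept, paired with new copy
    return daughters


def classify(duplex: Duplex) -> str:
    parental = sum(strand == "parental" for strand in duplex)
    return {2: "parental/parental", 1: "hybrid", 0: "new/new"}[parental]


def simulate(generations: int) -> Counter:
    """Count duplex types after a number of replication rounds."""
    population: List[Duplex] = [("parental", "parental")]
    for _ in range(generations):
        population = replicate(population)
    return Counter(classify(d) for d in population)


if __name__ == "__main__":
    for g in range(4):
        print(g, dict(simulate(g)))
```

However many rounds run, exactly two hybrid molecules remain. The two original strands are never broken up or lost; they are simply diluted among an ever-growing number of entirely new molecules.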

Semiconservative DNA replication (Furusawa et al., 1998)
Conclusion
As the carrier of genetic information, DNA plays a central role in storing, transmitting, and expressing that information. Its structural characteristics and functional mechanisms remain a frontier of interdisciplinary research in biology, chemistry, and medicine. With continuing advances in cryo-electron microscopy, single-molecule sequencing, gene editing, and other technologies, the molecular mechanisms of DNA in gene expression regulation, epigenetic modification, and the onset and progression of disease are gradually becoming clear.
These findings not only deepen our understanding of the nature of life but also provide a theoretical basis and technical support for tackling global challenges such as major diseases, developing new sources of bioenergy, and constructing synthetic biological systems. They highlight the scientific value and application potential of DNA research for sustainable human development.
References:
- Zahid, M., Kim, B., Hussain, R., et al. "DNA nanotechnology: a future perspective." Nanoscale Res Lett. 2013; 8: 119. https://doi.org/10.1186/1556-276X-8-119
- Amshawee, A. M., et al. "Structure, Functions and Clinical Significance of DNA: A Review Article." International Journal of Health & Medical Research. 2024. https://doi.org/10.58806/ijhmr.2024.v3i07n07
- Abass, Y. A., Adeshina, S. A. "Deep Learning Methodologies for Genomic Data Prediction: Review." J Artif Intell Med Sci. 2021; (2): 1–11. https://doi.org/10.2991/jaims.d.210512.001
- Zhao, Y. B., Krishnan, J. "mRNA translation and protein synthesis: an analysis of different modelling methodologies and a new PBN based approach." BMC Syst Biol. 2014; 8: 25. https://doi.org/10.1186/1752-0509-8-25
- Chureau, C., Prissette, M., et al. "Comparative sequence analysis of the X-inactivation center region in mouse, human, and bovine." Genome Res. 2002; 12(6): 894–908. https://doi.org/10.1101/gr.152902
- Kwon, J. J., Pan, J., et al. "On knowing a gene: A distributional hypothesis of gene function." Cell Syst. 2024; 15(6): 488–496. https://doi.org/10.1016/j.cels.2024.04.008