DNA Barcoding vs. Metabarcoding: Five Points of Comparison

In molecular ecology and biodiversity research, DNA barcoding and metabarcoding have become core tools for overcoming the limitations of traditional morphological identification, because they identify organisms from their DNA sequences rather than from visible traits. Both rely on sequencing genetic markers, but they differ fundamentally in research scale, technical process, and application scenarios, and they complement each other closely.

This article compares the two technologies along five dimensions: core definitions, workflow, bioinformatics analysis, complementary roles in reference library construction, and inherent challenges and limitations. The aim is to support technology selection in related research.

Core Definitions: From Individual to Community

DNA barcoding and metabarcoding differ first of all in the scale of the research object. DNA barcoding centers on identifying the species of a single organism by analyzing the sequence of a standardized gene fragment. Metabarcoding analyzes the composition of complex biological communities: it uses the mixed DNA in bulk or environmental samples to assess the species diversity of an ecosystem. Together they mark a shift in research paradigm from the individual level to the community level.

DNA Barcode: Molecular ID of a Single Sample

DNA barcoding was proposed by the Canadian scientist Hebert in 2003. Its core idea is that sequencing a short, standardized genetic marker is enough to identify a single biological sample to species level. The key lies in choosing the marker, which needs to meet three conditions:

  • The sequence is highly conserved within a species (small intraspecific variation)
  • The sequence differs clearly between species (large interspecific variation)
  • The marker is easy to amplify with universal primers
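The first two conditions are often summarized as a "barcode gap": the largest distance within a species should be smaller than the smallest distance between species. The following is a minimal sketch of that check, assuming the sequences are already aligned to equal length and using the simple proportion of differing sites (p-distance). The function names and the toy 20 bp sequences are illustrative only and are not real barcode data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def barcode_gap(records):
    """records: list of (species, aligned_sequence) pairs.

    Returns (largest intraspecific distance, smallest interspecific distance).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)


# Toy, made-up alignment: two specimens each of two hypothetical species.
records = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGAACGTTCGTACCTACGG"),
    ("species_B", "ACGAACGTTCGTACCTACGC"),
]
max_intra, min_inter = barcode_gap(records)
print(f"max intraspecific: {max_intra:.2f}, min interspecific: {min_inter:.2f}")
print("barcode gap present:", min_inter > max_intra)
```

A marker is a useful barcode for a group only when this gap holds across many species, which is why real evaluations use large reference sets and corrected distance models rather than a handful of sequences.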

Recognized standard barcodes now exist for different groups of organisms. For animals, the main marker is mitochondrial cytochrome c oxidase subunit I (COI): the barcode region is about 650 bp long, shows interspecific variation of roughly 10%-20%, and can distinguish more than 90% of animal species. For plants, a combination of the chloroplast genes rbcL and matK is commonly used. rbcL, which encodes the large subunit of the photosynthetic enzyme RuBisCO, is conserved and easy to amplify, while matK evolves faster and adds discriminating power, so together they allow accurate species identification. For fungi, the standard barcode is the ITS (internal transcribed spacer) region, which has a high copy number in the fungal genome and evolves quickly, so it can separate closely related fungal species.

Detailed procedural specifics of three technologies integrated with DNA mini-barcodes, as presented in one review (Gao et al., 2019)

DNA Metabarcoding: List of Community Species in Complex Samples

DNA metabarcoding is a community-level molecular tool built on DNA barcoding. Total DNA is extracted from a mixed sample that contains many organisms (such as soil, water, or gut contents), the barcode genes are sequenced by high-throughput sequencing, and all detectable taxa in the sample are identified in a single run to produce a list of the species in the community.

Where DNA barcoding follows the logic "single sample → single sequence → single species", metabarcoding follows "mixed sample → massive sequence → multiple species". For example, a freshwater sample analyzed by metabarcoding may contain the DNA of dozens or even hundreds of organisms, such as phytoplankton, zooplankton, cells shed by fish, and microorganisms. With universal eukaryote barcode primers and millions of reads from high-throughput sequencing, bioinformatics analysis can identify the diatoms, rotifers, cladocerans, small fishes, and other groups in the sample, and can even detect trace DNA from rare species (such as a few cells shed by an endangered fish).

The difference in paradigm can be summarized as follows. DNA barcoding is targeted individual identification and answers the question "what species is this one?". Metabarcoding is panoramic community analysis and answers the question "which species are in this mixture?". The former is the technical basis of the latter, and the latter extends the former in scale.

The Workflow: A Side-by-Side Comparison

The differences in technical process directly reflect how each method is adapted to its research objective. From sample input through laboratory operation to result output, DNA barcoding follows a one-to-one process and metabarcoding a one-to-many process. The comparison can be divided into three core stages:

Sample Input: Single Sample vs Mixed Sample

Sample input is the first dividing line between the two technologies and directly determines the subsequent experimental design:

  • Sample requirements of DNA barcoding: The sample should be a single individual or tissue that is morphologically distinguishable and free of external contamination. In plant research, a complete leaf is collected with no tissue from other plants mixed in. In insect research, a complete adult or larva is selected with no parasites or attached organisms. Cross-contamination must be strictly avoided during sample processing, because even a small amount of exogenous DNA (such as skin cells from the operator or airborne fungal spores) can produce a wrong sequence and compromise species identification.
  • Sample requirements of DNA metabarcoding: The sample should be a mixed environmental sample containing DNA from many organisms. Common types include soil (DNA from microorganisms, small invertebrates, and plant roots), water (DNA from plankton, cells shed by fish, and microorganisms), animal gut contents (DNA from food residues, gut microorganisms, and parasites), and feces (DNA from host gut microorganisms and undigested food). Unlike barcoding samples, metabarcoding samples do not need to be separated into individuals, because the complexity of the sample is the object of study. Collection and preservation must still prevent DNA degradation (for example by adding EDTA, which inhibits nucleases, and storing samples frozen) and limit external contamination (for example by using sterile sampling tools).

Software for the bioinformatics processing of metabarcoding data, categorized by input read type, software type, interface, produced feature type, and operating system (Hakimzadeh et al., 2023)

Laboratory Operation: Sanger Sequencing vs High-throughput Sequencing

The core differences in laboratory operation lie in the PCR amplification strategy and the sequencing technology, which together determine the scale and efficiency of sequence output.

The laboratory process of DNA barcoding: The process is simple and standardized, and its core is "single sample → single PCR → Sanger sequencing". The specific steps include:

  • Extract genomic DNA from a single sample (CTAB method or commercial kit)
  • Amplify the barcode gene by PCR with universal primers
  • Check the PCR product by agarose gel electrophoresis to confirm that amplification succeeded and that there are no nonspecific bands
  • Purify the PCR product and sequence it by Sanger sequencing (the dideoxy chain termination method) to obtain a complete sequence of about 500-1000 bp

Sanger sequencing is highly accurate and produces long reads that cover the whole barcode gene, which meets the needs of single-species identification. Its throughput is extremely low, however: one reaction yields one sequence from one sample.

The laboratory process of DNA metabarcoding: The process is more complex and must meet high-throughput requirements, and its core is "mixed sample → multiplexed PCR → high-throughput sequencing". The specific steps include:

  • Extract total DNA from the mixed sample (using a kit that can extract DNA from animals, plants, and microorganisms at the same time)
  • Two-step PCR: In the first round, universal primers (such as 18S rRNA gene primers for eukaryotic communities) amplify the target fragment. In the second round, primers carrying a sample index and a sequencing adapter at their 5' end are used, so that each sample receives a unique index sequence and many samples can be sequenced together
  • Quantify the PCR products of all samples and pool them in equimolar amounts to build the sequencing library
  • Sequence the library on an NGS platform (such as Illumina MiSeq or NovaSeq). One run yields millions to tens of millions of short reads (usually 150-300 bp)
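The equimolar pooling step is a small piece of arithmetic that can be sketched as follows. The sketch assumes double-stranded amplicons, the commonly used approximation of 660 g/mol per base pair, and a hypothetical target of 10 fmol per sample. The function names and the example concentrations are illustrative and not part of any protocol described here.

```python
def molarity_nM(conc_ng_per_ul: float, amplicon_bp: int) -> float:
    """Convert a dsDNA concentration in ng/µL to nM.

    Assumes an average mass of 660 g/mol per base pair (an approximation).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * amplicon_bp)


def pooling_volumes(libraries, target_fmol=10.0):
    """libraries: {sample: (concentration in ng/µL, amplicon length in bp)}.

    Returns the volume in µL of each library that contributes target_fmol,
    using the identity 1 nM = 1 fmol/µL.
    """
    return {
        name: target_fmol / molarity_nM(conc, bp)
        for name, (conc, bp) in libraries.items()
    }


# Hypothetical quantification results for two indexed PCR products.
libraries = {"sample_1": (6.6, 500), "sample_2": (13.2, 500)}
for name, volume in pooling_volumes(libraries).items():
    print(f"{name}: pool {volume:.2f} µL")
```

The more concentrated library contributes a smaller volume, so each sample enters the pool with the same number of amplicon molecules and receives a comparable share of the sequencing reads.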

High-throughput sequencing offers extremely high throughput at a low cost per sample and can process dozens to hundreds of samples in one run. Its reads are short, however, and the reads from different samples must be separated (demultiplexed) by their sample index.
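Splitting pooled reads back into samples can be illustrated with a short sketch. It assumes a simplified design in which every read starts with an inline sample tag of fixed length and tags are matched exactly. On Illumina platforms the index is usually read separately and demultiplexing is done by the platform software, so the function name, tags, and reads below are purely hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample):
    """Assign each read to a sample by the tag at its 5' end.

    reads: iterable of read sequences (strings).
    tag_to_sample: {tag sequence: sample name}; all tags share one length.
    Reads whose tag is not recognized are collected under "unassigned".
    """
    tag_len = len(next(iter(tag_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        by_sample[tag_to_sample.get(tag, "unassigned")].append(insert)
    return dict(by_sample)


# Made-up tags and reads for two samples.
tags = {"ACGT": "pond_1", "TGCA": "pond_2"}
reads = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTTTTAAA", "NNNNCCCGGG"]
for sample, inserts in demultiplex(reads, tags).items():
    print(sample, inserts)
```

Real pipelines usually tolerate one or two mismatches in the tag and check both ends of paired reads, which reduces the number of reads lost to sequencing errors in the index.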

Results Output: Single Sequence vs Massive Sequence Matrix

The form of the output matches the research objective of each technology and determines the direction of the subsequent analysis:

Output of DNA barcoding: A complete, quality-controlled barcode sequence (such as a 650 bp COI sequence). The sequence must contain no ambiguous bases (N) and no frameshift mutations before it is compared with reference databases (such as BOLD and GenBank). If its similarity to the reference sequence of a known species is ≥98% and it clusters with that species on the phylogenetic tree, the species identity can be determined. If the similarity is below 95%, the sequence may represent a new species or a species not yet in the database, and further verification is needed.
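The quality standard for a protein-coding barcode such as COI can be approximated in a few lines. This sketch assumes the reading frame is known and uses the appearance of an in-frame stop codon as a proxy for a frameshift. The default stop codons TAA and TAG are stops in both the vertebrate and invertebrate mitochondrial genetic codes; the vertebrate code additionally uses AGA and AGG, which can be passed in. The function name and test sequences are illustrative only.

```python
def passes_barcode_qc(seq: str, stop_codons=("TAA", "TAG"), frame: int = 0) -> bool:
    """Return True if seq has no ambiguous bases and no in-frame stop codon.

    frame: offset (0, 1, or 2) of the first complete codon; assumed known.
    An in-frame stop codon is treated as a sign of a frameshift or error.
    """
    seq = seq.upper()
    if any(base not in "ACGT" for base in seq):
        return False  # ambiguous base such as N
    coding = seq[frame:]
    codons = [coding[i:i + 3] for i in range(0, len(coding) - 2, 3)]
    return not any(codon in stop_codons for codon in codons)


print(passes_barcode_qc("ATGGCACTATTA"))  # clean toy sequence
print(passes_barcode_qc("ATGNCACTATTA"))  # contains an ambiguous base
print(passes_barcode_qc("ATGGCATAAGGC"))  # in-frame stop codon TAA
```

A check like this only flags obvious problems. Noncoding markers such as ITS have no reading frame, so for them only the ambiguous-base test applies.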

Output of DNA metabarcoding: A sample-sequence-abundance matrix. Specifically, the output includes:

  • Raw sequencing data (usually in FASTQ format), consisting of millions of short reads that carry sample indexes
  • Clean sequences obtained after quality filtering, chimera removal, and dereplication
  • Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs). OTUs are usually clustered at 97% sequence similarity, and each represents a potential species. ASVs are exact sequences recovered by an error-correction algorithm, and their resolution is higher than that of OTUs
  • The final "sample-OTU/ASV abundance matrix", which gives the number of reads (read count) of each OTU/ASV in each sample together with the corresponding species annotation
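The final abundance matrix is simply a table of read counts with one row per sample and one column per OTU/ASV. A minimal sketch of how it could be assembled is shown below, assuming each read has already been assigned to a sample and to a feature. The function name and the sample and OTU labels are made up for illustration.

```python
from collections import Counter


def abundance_matrix(assignments):
    """Build a sample x feature read-count matrix.

    assignments: iterable of (sample, feature) pairs, one pair per read.
    Returns (sample names, feature names, matrix as a list of rows).
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in counts})
    features = sorted({feature for _, feature in counts})
    matrix = [[counts[(s, f)] for f in features] for s in samples]
    return samples, features, matrix


# Made-up read assignments for two samples.
reads = [
    ("pond_1", "OTU_1"), ("pond_1", "OTU_1"), ("pond_1", "OTU_2"),
    ("pond_2", "OTU_2"), ("pond_2", "OTU_3"), ("pond_2", "OTU_3"),
    ("pond_2", "OTU_3"),
]
samples, features, matrix = abundance_matrix(reads)
print("sample", *features)
for name, row in zip(samples, matrix):
    print(name, *row)
```

Every downstream community statistic, from richness to ordination, is computed from this table, so errors in demultiplexing or clustering propagate directly into it.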

Principal Component Analyses (PCA) applied to the morphological and molecular surveys of meadows (Cowart et al., 2015)

The Bioinformatic Divide: From Sequence to Answer

The two methods also differ markedly in bioinformatics analysis. The DNA barcoding workflow is highly standardized, while the metabarcoding workflow involves many filtering and verification steps and is considerably more complicated. The gap shows in both the methods used and the difficulty of the analysis.

Bioinformatics Analysis of DNA Barcodes: Simple and Focused

DNA barcoding is built around matching a single sequence to a species. The analysis is simple, relies on mature tools, and needs few computing resources. The main steps are as follows:

  • Sequence quality control: Use Chromas, SeqMan, or similar software to inspect the sequencing chromatograms, discard sequences with ambiguous peaks or many N bases, and assemble the forward and reverse reads with MEGA to obtain a complete barcode sequence.
  • Sequence comparison and species annotation: Submit the quality-controlled sequence to the identification engine of the BOLD system for similarity comparison. In BOLD, a species can be identified directly when the similarity to the reference sequence of a voucher specimen is ≥98%. Matches of 95%-98% need to be verified with morphological characters. Matches below 95% are labeled "unidentified" or "potential new species".
  • Phylogenetic verification: Use MEGA or PAUP to build a phylogenetic tree from the target sequence and reference sequences of related species with the neighbor-joining (NJ) or maximum likelihood (ML) method. The species identity is confirmed when the target sequence and the sequences of one species form a monophyletic clade with a bootstrap value of ≥90%.

Bioinformatics Analysis of DNA Metabarcoding: Complex and Multidimensional

DNA metabarcoding is built around community analysis of massive sequence data and has to process millions of reads. Community structure is resolved only after multi-step filtering and statistical analysis, which demands substantial computing resources such as servers with large memory. The specific process is as follows:

  • Preprocessing of raw data: Use Cutadapt or Trimmomatic to remove adapter and primer sequences. Demultiplex the reads by their sample index so that no read is assigned to the wrong sample. Filter out low-quality and short reads, for example those with a Phred quality score < 20, an ambiguous-base (N) fraction > 5%, or a length < 150 bp.
  • Chimera removal and dereplication: Use UCHIME or VSEARCH to detect and remove chimeric sequences. Dereplicate with USEARCH or DADA2, keeping each unique sequence together with its abundance, which reduces the computational load.
  • OTU/ASV clustering: For OTUs, USEARCH clusters the clean sequences at 97% similarity and the most abundant sequence in each cluster is chosen as its representative. For ASVs, DADA2 or Deblur corrects sequencing errors to obtain high-resolution sequence variants that can separate closely related species.
  • Species annotation: Compare the representative OTU/ASV sequences with a reference database. SILVA or RDP is commonly used for bacterial communities, and BOLD or PR2 for eukaryotes. The annotation criteria are: ≥97% similarity for species level, 90%-97% for genus level, and 80%-90% for family level.
  • Community diversity analysis: α diversity evaluates diversity within a sample through observed OTUs/ASVs, the Chao1 index, and the Shannon index. β diversity compares community structure between samples using Bray-Curtis or UniFrac distances, visualized with PCoA or NMDS. Statistical tests such as ANOVA or PERMANOVA are then used to analyze how environmental factors influence community structure.
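Two of these steps, the read filter and the diversity indices, are compact enough to sketch directly. The filter below assumes that the Phred threshold is applied to the mean quality of a read, which is one common interpretation and not the only one. The Shannon index uses the natural logarithm, and Bray-Curtis is computed on raw read counts. All function names and numbers are illustrative, and real analyses would rely on tools such as those named above.

```python
import math


def passes_read_filter(seq, quals, min_q=20, max_n_frac=0.05, min_len=150):
    """Keep a read only if it is long enough, has few N bases, and its
    mean Phred score reaches min_q (mean-based filtering is an assumption)."""
    if not seq or len(seq) < min_len:
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    return sum(quals) / len(quals) >= min_q


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over features with reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Made-up read and made-up counts for two samples over three OTUs.
print(passes_read_filter("ACGT" * 40, [30] * 160))
pond_1, pond_2 = [2, 1, 0], [0, 1, 3]
print(round(shannon(pond_1), 3), round(shannon(pond_2), 3))
print(round(bray_curtis(pond_1, pond_2), 3))
```

Bray-Curtis ranges from 0 for identical count vectors to 1 for samples that share no features, and it is the pairwise matrix of such values that PCoA or NMDS turns into an ordination plot.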

Success rates of species/species hypotheses identification across sites, using both the custom regional database and the UNITE database at two sequence similarity thresholds (Kerr et al., 2023)

Complementary Roles in Building Reference Libraries

DNA barcoding and metabarcoding complement each other in building and improving reference sequence databases, and together they support species identification and community analysis at the molecular level.

DNA barcoding is the core builder of the reference library. It works on single voucher specimens that have been verified by morphology, and by sequencing standardized genetic markers (such as COI for animals and rbcL + matK for plants) it supplies the library with linked specimen, sequence, and classification records.

DNA metabarcoding is the gap finder of the reference library. High-throughput sequencing of mixed environmental samples detects many sequences that match nothing in the existing reference library (that is, unannotated OTUs/ASVs). These sequences often correspond to rare species, new species, or poorly studied taxonomic groups that are missing from the library.
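The gap-finding role can be made concrete with a short sketch. It applies the similarity thresholds given earlier in this article for species, genus, and family annotation, and lists the features that fall below all of them as candidates for targeted barcoding. The function names, feature labels, and similarity values are hypothetical.

```python
def annotation_rank(identity_pct: float) -> str:
    """Lowest taxonomic rank supported by the best-match similarity."""
    if identity_pct >= 97.0:
        return "species"
    if identity_pct >= 90.0:
        return "genus"
    if identity_pct >= 80.0:
        return "family"
    return "unannotated"


def reference_gaps(best_hits):
    """best_hits: {feature id: best similarity (%) to the reference library}.

    Returns the features that cannot be annotated at any rank.
    """
    return sorted(f for f, pct in best_hits.items() if annotation_rank(pct) == "unannotated")


# Made-up best-hit similarities for four sequence variants.
best_hits = {"ASV_1": 99.2, "ASV_2": 93.0, "ASV_3": 71.5, "ASV_4": 84.0}
for feature, pct in best_hits.items():
    print(feature, annotation_rank(pct))
print("candidates for barcoding:", reference_gaps(best_hits))
```

Features that resolve only to genus or family also point to incomplete reference coverage, so a stricter version of this check could report every feature that does not reach species level.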

Inherent Challenges and Limitations of Each Technique

DNA barcoding and metabarcoding have significantly advanced biological identification and community research. Both are nevertheless constrained by their technical principles, experimental procedures, and external conditions. These inherent limitations affect the accuracy of results and restrict the depth and breadth of research, so they are reviewed systematically below.

A. Challenges and Limitations of DNA Barcoding

a) Sample dependence and throughput bottleneck: DNA barcoding requires a single, morphologically distinguishable, pure sample as input, so it cannot analyze mixed samples directly. Because it relies on Sanger sequencing, one reaction processes only one sample, and the throughput is too low for large-scale biodiversity surveys.

b) Limited marker applicability: Standard barcodes differ between groups of organisms (for example, fungi rely on ITS and animals on COI), and no universal marker suits all organisms. In some groups (such as bacteria), barcodes discriminate poorly because intraspecific variation is large and interspecific variation is small.

c) Morphological prerequisite: Although complex morphological identification is avoided, pure individuals still have to be sorted by morphology at the collection stage. The method is therefore poorly suited to very small organisms and microorganisms (such as planktonic bacteria) and to fragmented samples (such as tissue fragments in animal gut contents).

B. Challenges and Limitations of DNA Metabarcoding

a) Insufficient quantitative accuracy: The read count of a species in metabarcoding data is not strictly proportional to its actual biomass. It is affected by PCR amplification bias (universal primers amplify different species with different efficiency) and by DNA extraction efficiency (DNA is hard to extract from thick-walled microorganisms, for example). Species abundance is therefore easily overestimated or underestimated, and only semi-quantitative analysis is possible.

b) High bioinformatics complexity: Millions of short reads have to be processed through multiple steps such as quality filtering, chimera removal, and OTU/ASV clustering, and the parameter settings at each step affect the results. The demand for computing resources is also high, which makes it difficult for an ordinary laboratory to complete the analysis independently.

c) Strong dependence on the reference library: The accuracy of species annotation depends directly on the completeness of the reference library. For taxa with little taxonomic research (such as some soil protozoa), missing reference sequences may allow annotation only at family or genus level rather than species level, or may even produce false negatives (species that are present but not reported).
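The effect of amplification bias described in point a) can be illustrated with a simple worked model. It assumes idealized exponential PCR, in which the number of amplicon copies after n cycles is the starting copy number multiplied by (1 + E)^n for a per-cycle efficiency E between 0 and 1. The species names, efficiencies, and cycle number are hypothetical, and real reactions plateau and deviate from this model.

```python
def expected_read_fractions(start_copies, efficiencies, cycles):
    """Expected share of amplicons per species after idealized exponential PCR.

    start_copies: {species: template copies before PCR}
    efficiencies: {species: per-cycle efficiency E, 0 <= E <= 1}
    Copies after n cycles are modeled as N0 * (1 + E) ** n.
    """
    amplified = {
        sp: start_copies[sp] * (1 + efficiencies[sp]) ** cycles
        for sp in start_copies
    }
    total = sum(amplified.values())
    return {sp: copies / total for sp, copies in amplified.items()}


# Two hypothetical species with identical template numbers.
start = {"species_A": 1000, "species_B": 1000}
eff = {"species_A": 0.95, "species_B": 0.85}
for sp, frac in expected_read_fractions(start, eff, cycles=30).items():
    print(f"{sp}: {frac:.1%} of amplicons")
```

Even with equal starting templates, a modest difference in efficiency compounds over 30 cycles until the better-amplified species accounts for most of the amplicons, which is why read counts are usually treated as semi-quantitative.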

Conclusion

In the future, with the improvement of reference databases, the upgrading of sequencing technology, and the simplification of bioinformatics tools, the two technologies will be further integrated to provide more efficient molecular solutions for biodiversity protection, ecological environment monitoring, food safety detection, and other fields.

DNA barcoding and metabarcoding comparison

Comparison Dimension | DNA Barcoding | DNA Metabarcoding
Core Definition | Identifies single specimens via sequencing a short, standardized genetic marker | Identifies entire communities from bulk/environmental samples via mass-sequencing of barcode regions
Sample Input and Workflow | Input: Single, morphologically distinct organism. Process: Sanger sequencing of one PCR product | Input: Complex DNA mixture. Process: NGS of millions of PCR amplicons
Bioinformatic Analysis | Sequence alignment and phylogenetic tree for one sequence | Quality filtering → MOTU/ASV clustering → denoising → taxonomic assignment for millions of sequences
Role in Reference Libraries | Foundation: Provides curated, voucher-linked reference sequences (e.g., BOLD) | Gap-filler: Reveals unknown diversity, flags taxa missing from databases to guide barcoding
Inherent Challenges | Low throughput, requires manual specimen handling, cannot analyze communities | Semi-quantitative, PCR biases, high computational demand, database dependency

DNA barcoding and DNA metabarcoding both identify organisms from marker gene sequences, but they differ clearly in research scale, technical path, and field of application. To align DNA barcoding or metabarcoding with your study objectives, follow these targeted guidelines:

  • Choose DNA barcoding if your goal is single-specimen identification: Opt for it when confirming the species of individual, morphologically distinguishable samples or building voucher-linked reference sequences.
  • Choose metabarcoding if your goal is community-wide profiling: Use it to characterize taxa in mixed/environmental samples (e.g., soil, water, gut contents) or conduct high-throughput biodiversity surveys.
  • Prioritize barcoding for low-throughput, high-confidence needs: Select it when sample purity is achievable and you require definitive species IDs (e.g., validating new species, authenticating herbal products). Avoid it for complex mixtures or large-scale ecological monitoring.
  • Prioritize metabarcoding for broad, scalable inquiries: Deploy it for questions about community structure (e.g., pollution impacts on soil microbes, dietary analysis) or when samples are degraded/fragmented. Complement it with barcoding later to fill reference library gaps if unannotated taxa emerge.

References

  1. Aylagas E, Borja A, Rodríguez-Ezpeleta N. "Environmental status assessment using DNA metabarcoding: towards a genetics based Marine Biotic Index (gAMBI)." PLoS One. 2014 9(3): e90529.
  2. Gao Z, Liu Y, Wang X, Wei X, Han J. "DNA Mini-Barcoding: A Derived Barcoding Method for Herbal Molecular Identification." Front Plant Sci. 2019 10: 987.
  3. Hakimzadeh A, Abdala Asbun A, Albanese D, et al. "A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses." Mol Ecol Resour. 2024 24(5): e13847.
  4. Cowart DA, Pinheiro M, Mouchel O, et al. "Metabarcoding is powerful yet still blind: a comparative analysis of morphological and molecular surveys of seagrass communities." PLoS One. 2015 10(2): e0117562.
  5. Kerr M, Leavitt SD. "A Custom Regional DNA Barcode Reference Library for Lichen-Forming Fungi of the Intermountain West, USA, Increases Successful Specimen Identification." J Fungi. 2023 9: 741.
* For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.
Copyright © 2025 CD Genomics. All rights reserved. Terms of Use | Privacy Notice