In microbial ecology, environmental science, and biodiversity research, metabarcoding and metagenomics have become core tools for analyzing complex biological communities, each with distinct advantages. However, the two differ significantly in research objectives, technical principles, and application scenarios, and each has inherent limitations.
This article examines five topics: a framework for technology selection, inherent limitations and biases, quantitative accuracy, combined application of the two methods, and the changes brought by long-read sequencing. It aims to provide a comprehensive reference for technology selection and study design in related research.
Choosing correctly between metabarcoding and metagenomics before starting a study of biological communities is key to research efficiency and reliable results. The application scenarios of the two can be compared along three dimensions: research objectives, cost constraints, and sample characteristics, which together form a clear basis for the decision.
When the research focuses on species classification, or is limited by cost or sample quality, metabarcoding shows significant advantages. Specific application scenarios include:
When the research goal involves functional gene analysis, the study of special biological groups, or genome reconstruction, metagenomics becomes the necessary choice. Specific application scenarios include:
Proposed workflow of a clinical microbiology laboratory using a CMg approach to identify putative pathogens (Forbes et al., 2018)
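The selection criteria described above (research objective, cost, and sample characteristics) can be sketched as a small decision helper. This is only an illustration under stated assumptions: the `StudyProfile` fields and the `suggest_method` function are hypothetical names introduced here, and the rules are a simplification of the text, not a validated decision procedure.

```python
from dataclasses import dataclass


@dataclass
class StudyProfile:
    """Hypothetical summary of a planned study (all fields are illustrative)."""
    needs_functional_genes: bool       # is functional gene analysis a goal?
    needs_genome_reconstruction: bool  # is genome reconstruction a goal?
    dna_degraded_or_scarce: bool       # low-quality or low-quantity DNA?
    budget_limited: bool               # is sequencing budget a hard constraint?


def suggest_method(profile: StudyProfile) -> str:
    """Return a rough suggestion based on the three dimensions in the text."""
    wants_metagenomics = (
        profile.needs_functional_genes or profile.needs_genome_reconstruction
    )
    constrained = profile.dna_degraded_or_scarce or profile.budget_limited

    if wants_metagenomics and constrained:
        # Conflicting needs: screen broadly first, then go deep on a subset.
        return "tiered: metabarcoding screen, then metagenomics on key samples"
    if wants_metagenomics:
        return "metagenomics"
    # Taxonomy-focused questions, or cost/sample limits without functional goals.
    return "metabarcoding"


# Example: a taxonomy survey on degraded environmental DNA.
print(suggest_method(StudyProfile(False, False, True, True)))
```

The third branch reflects the tiered strategy discussed later in this article; a real study would weigh these factors with more nuance than three boolean checks.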
Although DNA metabarcoding and metagenomics are widely used in research, both carry inherent limitations and biases that stem from their technical principles and experimental procedures, and these directly affect the accuracy of results and the reliability of their interpretation.
A. Limitations and biases of metabarcoding
a) DNA metabarcoding depends on PCR amplification and specific barcode genes, so its limitations come mainly from the amplification process, the choice of marker gene, and the reference database:
i. PCR and primer bias: During PCR amplification, primers bind template DNA from different species with different efficiency, so some species amplify far more efficiently than others. The read counts of those species are overestimated, while species with low amplification efficiency (such as some rare microorganisms) are underestimated or missed entirely, so the results do not faithfully reflect the relative abundance of species in the sample.
ii. Limited to organisms covered by the target barcode gene: Metabarcoding can only detect organisms that carry the target barcode gene; it cannot identify organisms that lack the gene or whose gene sequence is too variable.
iii. No direct functional information: Metabarcoding provides only taxonomic information and cannot be linked directly to the functional characteristics of organisms.
iv. Complete dependence on the reference database: The accuracy of species annotation depends entirely on the completeness and accuracy of the sequences in the reference database. At present, a large number of species (especially rare species and undiscovered new species) are still missing from public reference databases (such as GenBank and SILVA), so they cannot be annotated in metabarcoding results and appear as "unidentified taxon"; such unannotated entries account for 10%-40%. In addition, some database sequences carry classification errors or incomplete annotation (for example, they are labeled only at the genus level, not the species level), which further reduces the accuracy of species identification.
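The effect of PCR and primer bias described in item i can be illustrated with a toy simulation. The sketch below assumes a simple exponential amplification model in which each taxon grows by a factor of (1 + efficiency) per cycle; the taxon names, efficiencies, and cycle count are hypothetical values chosen for illustration, not measurements.

```python
def amplify(template_copies, efficiencies, cycles=30):
    """Return post-PCR relative abundances under a simple exponential model.

    Each taxon i is assumed to grow as copies_i * (1 + efficiency_i) ** cycles.
    Real PCR also plateaus and forms chimeras, which this sketch ignores.
    """
    products = {
        taxon: template_copies[taxon] * (1.0 + efficiencies[taxon]) ** cycles
        for taxon in template_copies
    }
    total = sum(products.values())
    return {taxon: amount / total for taxon, amount in products.items()}


# Two taxa that are equally abundant in the sample (hypothetical numbers).
templates = {"taxon_A": 1000, "taxon_B": 1000}
# Hypothetical per-cycle efficiencies: primers match taxon_A slightly better.
efficiencies = {"taxon_A": 0.95, "taxon_B": 0.85}

observed = amplify(templates, efficiencies)
for taxon, fraction in observed.items():
    print(f"{taxon}: true 50.0%, observed {fraction:.1%}")
```

Under these assumptions, a modest difference in per-cycle efficiency compounds over 30 cycles, and the better-amplified taxon ends up with well over half of the reads even though both taxa started at equal abundance.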
B. Limitations and biases of metagenomics
b) Although metagenomics can capture all the DNA information in a sample, it is clearly limited by cost, sample quality, and data analysis methods:
i. Significantly higher cost and computational burden: Metagenomic sequencing depth usually needs to reach 10-100 Gb/sample, far more than metabarcoding (usually 0.5-2 Gb/sample), and the sequencing cost is about 5-20 times that of metabarcoding. In addition, assembling, annotating, and analyzing the massive sequence data (usually millions to tens of millions of sequences) requires high-performance computing resources (for example, servers with more than 64 GB of memory, with analysis taking up to several weeks) and trained bioinformatics staff, which is difficult for small laboratories or projects with limited funding.
ii. High requirements for DNA quality and quantity: Building a metagenomic library requires intact, high-quality DNA (fragment length usually ≥1 kb) at a sufficient concentration (usually ≥100 ng/μL). For low-concentration DNA (<10 ng/μL) or severely degraded DNA (fragment length <500 bp) from environmental samples such as soil and water, it is difficult to build a high-quality library; the proportion of usable sequences in the data is then low, and sequencing may fail altogether.
iii. Difficulty assembling low-abundance genomes: Genome assembly in metagenomics depends on sufficient sequence coverage. For microorganisms with an abundance below 0.1% in a sample, genome coverage is usually insufficient and it is difficult to assemble complete genome fragments, so information about these low-abundance microorganisms is lost.
iv. Uncertainty in functional prediction: Metagenomics predicts biological function from the presence of functional genes, but the presence of a gene does not mean that the gene is expressed.
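The link between sequencing depth and low-abundance assembly in item iii can be made concrete with a back-of-the-envelope coverage estimate. The sketch below assumes that a genome's share of sequenced bases equals its share of the DNA in the sample; the function names, the 5 Mb genome size, and the 20x assembly target are hypothetical illustration values.

```python
def expected_coverage(total_bases, dna_fraction, genome_size_bp):
    """Mean fold coverage of a genome contributing `dna_fraction` of all bases."""
    return total_bases * dna_fraction / genome_size_bp


def bases_needed(target_coverage, dna_fraction, genome_size_bp):
    """Total sequenced bases required to reach `target_coverage` for that genome."""
    return target_coverage * genome_size_bp / dna_fraction


# Hypothetical case: 10 Gb of data, a 5 Mb genome at 0.1% of the sample DNA.
coverage = expected_coverage(10e9, 0.001, 5e6)
# Hypothetical assembly target of 20x for the same genome.
required = bases_needed(20, 0.001, 5e6)

print(f"coverage at 10 Gb: {coverage:.1f}x")
print(f"bases needed for 20x: {required / 1e9:.0f} Gb")
```

With these inputs the genome receives only about 2x coverage from 10 Gb of data, and reaching 20x would take about 100 Gb, which shows why taxa near the 0.1% level are so often lost during assembly.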
Simplified workflow of current DNA-based methods for analysis of (micro) organisms within environmental samples, using honey as an example (Vuong et al., 2024)
In studies of biological communities, accurately quantifying the relative abundance or genome abundance of species in a sample is key to analyzing dynamic changes in community structure and evaluating interactions between species. However, both metabarcoding and metagenomics fall short in quantitative accuracy, each with its own quantification problem, and the difference between them stems mainly from how their technical principles affect abundance estimates.
Metabarcoding infers the relative abundance of species from the number of sequencing reads, but PCR amplification bias and variation in gene copy number make the result only semi-quantitative, so it cannot accurately reflect true species abundance. Specific problems include:
Therefore, metabarcoding results should only be used to compare relative trends in species abundance across samples; they cannot give the actual cell number or the true relative abundance of a species.
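The gene copy number problem mentioned above can be sketched with a simple correction: dividing each taxon's read count by the number of marker gene copies per genome before computing proportions. The read counts and copy numbers below are hypothetical, and the correction addresses copy number only; it does not remove PCR amplification bias.

```python
def copy_number_corrected(read_counts, copies_per_genome):
    """Convert marker-gene read counts into copy-number-corrected proportions."""
    per_genome = {
        taxon: read_counts[taxon] / copies_per_genome[taxon]
        for taxon in read_counts
    }
    total = sum(per_genome.values())
    return {taxon: value / total for taxon, value in per_genome.items()}


# Hypothetical data: taxon_A carries 7 marker gene copies, taxon_B carries 1.
reads = {"taxon_A": 7000, "taxon_B": 1000}
copies = {"taxon_A": 7, "taxon_B": 1}

raw_total = sum(reads.values())
raw = {taxon: count / raw_total for taxon, count in reads.items()}
corrected = copy_number_corrected(reads, copies)

print("raw:", raw)              # taxon_A dominates the read counts
print("corrected:", corrected)  # both taxa contribute equally per genome
```

In this example the raw reads suggest that taxon_A makes up 87.5% of the community, while the corrected estimate is an even split. In practice, copy numbers are unknown for many taxa, which is one reason the result stays semi-quantitative.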
Metagenomics sequences all the DNA in a sample directly, without PCR amplification, and estimates the relative genome abundance of species from genome coverage or from the ratio of read count to genome size. This avoids PCR amplification bias and gives better quantitative accuracy than metabarcoding, but absolute quantification remains out of reach because of differences in genome size. Specific features include:
Therefore, although metagenomic quantification is more accurate than metabarcoding and better reflects the relative genomic abundance of species, estimating relative cell abundance or absolute cell numbers still requires correction with other techniques (such as flow cytometry and qPCR); metagenomics alone cannot deliver fully accurate quantification.
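The genome-size normalization and the external correction described above can be sketched as follows. The example assumes equal read lengths, one genome per cell, and a total cell count supplied by an external method such as flow cytometry; the function names and all numbers are hypothetical.

```python
def relative_genome_abundance(read_counts, genome_sizes_bp):
    """Normalize read counts by genome size to estimate genome proportions."""
    per_bp = {
        taxon: read_counts[taxon] / genome_sizes_bp[taxon]
        for taxon in read_counts
    }
    total = sum(per_bp.values())
    return {taxon: value / total for taxon, value in per_bp.items()}


def absolute_abundance(relative, total_cells_per_ml):
    """Scale proportions by an externally measured total cell count.

    Assumes one genome copy per cell, which does not hold for every organism.
    """
    return {taxon: frac * total_cells_per_ml for taxon, frac in relative.items()}


# Hypothetical data: taxon_A has a genome four times larger than taxon_B.
reads = {"taxon_A": 8000, "taxon_B": 2000}
genome_sizes = {"taxon_A": 8_000_000, "taxon_B": 2_000_000}

relative = relative_genome_abundance(reads, genome_sizes)
cells = absolute_abundance(relative, total_cells_per_ml=1_000_000)

print(relative)  # equal genome proportions despite a 4:1 read ratio
print(cells)     # estimated cells per mL for each taxon
```

Here the larger genome attracts four times as many reads, yet both taxa are equally abundant once genome size is taken into account. The second step only works because a separate measurement supplies the total cell count.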
Case examples illustrating different scenarios for MAG-V9 OTU match scenarios, results for the "rho" proportionality metrics (Zavadska et al., 2024)
In ecological research, DNA metabarcoding and metagenomics have moved from independent use to combined use, giving rise to a tiered research paradigm built around resource optimization and precise focus. This paradigm follows a progressive "initial screening, then deep analysis" strategy and has become a mainstream technical framework for complex ecological research.
The tiered workflow follows the logic of "high-throughput screening, then in-depth analysis": metabarcoding quickly pinpoints the key samples and core taxa, and metagenomics then reveals the functional mechanisms of the communities. Implementation is divided into three stages:
This workflow has clear advantages. In terms of resources, research cost is reduced by 50%-70%. In terms of research depth, targeted functional analysis improves efficiency. In terms of reliability, the two methods validate each other.
It is now widely used in research on soil remediation, water treatment, the gut microbiota, and other fields, and has become a core technical route for connecting studies of community structure with studies of functional mechanisms.
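The cost advantage of the tiered workflow can be checked with simple arithmetic. The sketch below compares running metagenomics on every sample against metabarcoding every sample and then running metagenomics on a selected fraction. The function name, sample count, unit cost, and selected fractions are hypothetical; the 10x cost ratio is one value from the 5-20x range quoted earlier in this article.

```python
def tiered_saving(n_samples, metabarcoding_cost, cost_ratio, deep_fraction):
    """Fraction of cost saved by the tiered design versus all-metagenomics.

    cost_ratio: metagenomics cost per sample / metabarcoding cost per sample.
    deep_fraction: share of samples forwarded to metagenomics after screening.
    """
    metagenomics_cost = metabarcoding_cost * cost_ratio
    full = n_samples * metagenomics_cost
    tiered = (
        n_samples * metabarcoding_cost
        + n_samples * deep_fraction * metagenomics_cost
    )
    return 1.0 - tiered / full


# Hypothetical study: 100 samples, metabarcoding at 50 cost units per sample.
for fraction in (0.2, 0.4):
    saving = tiered_saving(100, 50, cost_ratio=10, deep_fraction=fraction)
    print(f"deep-sequencing {fraction:.0%} of samples saves {saving:.0%}")
```

Algebraically the saving is 1 - (1/cost_ratio + deep_fraction), so with a 10x cost ratio, forwarding 20% to 40% of samples gives savings of 70% to 50%, in line with the range stated above. Analysis and labor costs are ignored in this sketch.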
Abundance of two marine microbes illustrated by three hypothetical scenarios (Brennan et al., 2023)
The development of long-read sequencing technologies, such as Oxford Nanopore and PacBio single-molecule real-time (SMRT) sequencing, has broken down the traditional technical barrier between metabarcoding and metagenomics. With read lengths of several kilobases to tens of kilobases, a single read can cover both barcode genes and functional genes, so taxonomic and functional information are obtained together from the same read. This effectively addresses the disconnect between community structure and function in traditional approaches.
This technology has achieved breakthroughs in three dimensions:
Long-read sequencing still has limitations, such as a high error rate (about 1%-5%) and high cost. As the technology matures, however, it is expected to become mainstream within 3-5 years, driving a paradigm shift in community research, with great potential in areas such as the analysis of uncultured microorganisms.
The tiered workflow maximizes the value of available resources, while long-read sequencing may further blur the boundary between the two methods. Clarifying their differences and their potential for synergy provides a basis for choosing the right technology for each research objective and helps move community research in a more efficient and in-depth direction.
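To put the quoted error rate in perspective, the sketch below converts a per-base error rate to a Phred quality score using the standard relation Q = -10 * log10(p) and estimates the expected number of errors in a single read. The 1,500 bp read length is an assumption chosen to approximate a full-length 16S rRNA gene, and the helper names are hypothetical.

```python
import math


def phred_quality(error_rate):
    """Phred quality score for a per-base error probability."""
    return -10.0 * math.log10(error_rate)


def expected_errors(read_length_bp, error_rate):
    """Expected number of erroneous bases in one read, assuming independent errors."""
    return read_length_bp * error_rate


# Assumed read length: roughly one full-length 16S rRNA gene.
READ_LENGTH = 1500

for rate in (0.01, 0.05):
    print(
        f"error rate {rate:.0%}: "
        f"Q{phred_quality(rate):.0f}, "
        f"about {expected_errors(READ_LENGTH, rate):.0f} errors per read"
    )
```

At a 1% error rate a read of this length is expected to carry about 15 errors, and at 5% about 75, which is why per-read taxonomic assignment below the genus level often depends on error correction or consensus approaches.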
Comparison between DNA metabarcoding and metagenomics
| Comparison Aspect | DNA Metabarcoding | DNA Metagenomics |
|---|---|---|
| Definition | Targets specific barcode regions (e.g., 16S rRNA for bacteria) in mixed samples to identify taxa | Sequences all DNA in a sample, providing both taxonomic and functional gene information of the entire community |
| Sample Requirement | Can handle low-quality or degraded DNA; suitable for environmental samples such as soil, water, and gut contents | Requires DNA of higher quality and quantity; samples should be free of significant inhibitors |
| Workflow | Involves DNA extraction, PCR amplification of barcode regions, and high-throughput sequencing | Includes DNA extraction, library preparation without PCR (ideally), and high-throughput sequencing |
| Bioinformatics Analysis | Focuses on clustering sequences into OTUs or ASVs and taxonomic assignment | Entails sequence assembly, gene prediction, functional annotation, and taxonomic profiling at a more comprehensive level |
| Limitations | Subject to PCR biases, which can distort taxon abundances; limited to targeted barcode regions; highly dependent on reference databases | High cost and high computational demands; challenges in assembling genomes, especially for low-abundance organisms; functional predictions may not reflect actual in situ functions |
Across the five dimensions of technology selection, inherent limitations, quantitative accuracy, combined application, and future trends, DNA metabarcoding and metagenomics each have their own suitable use cases:
References