Pathogen Population Genomics: How Whole Genome Sequencing Is Tracking Drug Resistance Worldwide
For Research Use Only. Not for use in diagnostic procedures or clinical decision-making.
Pathogen drug resistance is moving faster than our ability to detect it with traditional tools. A single Klebsiella pneumoniae isolate carrying both KPC and NDM carbapenemases can seed an outbreak that spans three continents before phenotypic testing catches up. A Plasmodium falciparum parasite with a new kelch13 mutation can erode artemisinin efficacy across the Greater Mekong Subregion within two transmission seasons. In both cases, the question is no longer whether resistance will emerge — it is how quickly we can see it coming.
Population-level whole genome sequencing (WGS) answers that question. By replacing single-isolate typing with genome-wide surveillance, WGS reveals not just which resistance genes a pathogen carries, but how those genes are spreading — through which clones, on which plasmids, across which borders, and under what selective pressures. This article maps the technologies, databases, and case studies that make pathogen population genomics the backbone of modern drug resistance surveillance, from malaria parasites crossing continents to bacterial AMR threatening the last line of antibiotics.
Figure 1: Population-level whole genome sequencing versus single-isolate typing for pathogen surveillance — WGS captures genome-wide variation, plasmid content, and transmission relationships that conventional typing misses.
Why Single Isolates Fall Short
Conventional pathogen typing — culture-based phenotyping, PCR for a handful of resistance genes, or 7-locus MLST — answers one question at a time for one isolate at a time. That approach served clinical microbiology for decades. It does not serve population-level surveillance.
Three structural limitations make single-isolate methods inadequate for tracking resistance at scale:
- Resolution blindness. MLST types a genome by internal fragments of seven housekeeping genes. That is roughly 3–4 kb of sequence, well under 0.1% of a typical 5 Mbp bacterial genome. Two E. coli isolates assigned the same sequence type can differ by thousands of SNPs across the rest of the genome, including SNPs in efflux pump regulators, porin genes, or plasmid uptake machinery that determine resistance phenotypes. WGS captures all of it.
- Plasmid invisibility. Mobile genetic elements — plasmids, transposons, integrons — carry the genes that matter most for AMR dissemination, including blaKPC, blaNDM, and mcr colistin resistance determinants. A 2025 plasmidome analysis reconstructed 226,110 plasmids from 69,969 K. pneumoniae genomes and found that convergent plasmids encoding both carbapenemases and hypervirulence factors are increasing in North America and Europe. Conventional typing sees the chromosomal backbone. It misses the plasmid story entirely.
- No transmission context. PCR tells you a resistance gene is present. It cannot tell you whether two patients in different wards share the same strain, whether a hospital outbreak was imported from the community, or whether a clone circulating in poultry has jumped into human clinical settings. SNP-level resolution provides that context. In harmonized WGS pipelines, technical replicates of the same isolate typically differ by 0–11 SNPs. This replicate noise sets the floor against which SNP thresholds for calling transmission clusters must be interpreted.
The gap between single-isolate typing and population-level WGS is the gap between knowing resistance exists and understanding how it moves.
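The SNP-threshold idea behind transmission clustering can be illustrated with a short Python sketch. It assumes the isolates have already been mapped to a common reference and reduced to equal-length consensus sequences, with N or - marking positions that could not be called. The function names, toy sequences, and threshold of 2 SNPs are hypothetical. Real cut-offs depend on the pathogen, its mutation rate, and the replicate noise floor of the pipeline.

```python
from itertools import combinations

# Alignment characters treated as missing data (uncalled base or gap).
MISSING = {"N", "-"}


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where two consensus sequences differ,
    ignoring any column where either sequence is missing data."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in MISSING and b not in MISSING and a != b
    )


def candidate_links(seqs: dict[str, str], threshold: int) -> list[tuple[str, str, int]]:
    """Return every isolate pair whose SNP distance is at or below the threshold."""
    links = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(seqs.items(), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= threshold:
            links.append((id_a, id_b, distance))
    return links


if __name__ == "__main__":
    # Toy 10-bp "genomes"; real comparisons span millions of sites.
    isolates = {
        "ward3_p1": "ACGTACGTAC",
        "ward3_p2": "ACGTACGTTC",
        "ward7_p9": "TCGAACCTAN",
    }
    print(candidate_links(isolates, threshold=2))
    # [('ward3_p1', 'ward3_p2', 1)]
```

A pair within the threshold is a candidate link, not proof of direct transmission. Epidemiological data such as overlapping ward stays are still needed to interpret it.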
The WGS Surveillance Advantage
Population-level WGS operates on a different principle: sequence enough isolates across enough locations and time points to reconstruct transmission chains, measure selection, and detect emerging threats before they become epidemics.
Table 1: Single-Isolate Typing vs. Population-Level WGS
| Capability | Single-Isolate Typing | Population-Level WGS |
|---|---|---|
| Resolution | 7 gene fragments (MLST) or ~10–20 resistance loci (PCR) | Entire genome (roughly 4–6 Mbp for the bacterial pathogens discussed here; 23 Mbp for P. falciparum) |
| Plasmid/MGE detection | No | Yes — full plasmid reconstruction and conjugative element tracking |
| Transmission inference | No — isolates treated independently | Yes — SNP-threshold-based transmission clusters |
| Selection detection | No | Yes — genome-wide selection scans (iHS, FST, IBD) |
| Antimicrobial resistance prediction | Limited to tested drugs | Rule-based (ResFinder, AMRFinderPlus) or ML-based prediction; >87% sensitivity, >98% specificity vs. phenotypic AST |
| Real-time outbreak detection | No | Yes — cgMLST/HierCC clustering within hours of sequence upload |
| Cost per sample (2024–2025) | $5–30 (PCR panel) | $25–100 (targeted nanopore) to $150–500 (short-read WGS) |
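The sensitivity and specificity figures in Table 1 come from comparing WGS-based predictions against phenotypic AST, which serves as the reference. The Python sketch below shows that bookkeeping for a single drug. It assumes each isolate has a boolean WGS prediction and a boolean AST result, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Concordance:
    """Genotype-vs-phenotype counts for one drug, with phenotypic AST as the reference."""
    true_pos: int   # WGS resistant, AST resistant
    false_neg: int  # WGS susceptible, AST resistant ("very major error")
    false_pos: int  # WGS resistant, AST susceptible ("major error")
    true_neg: int   # WGS susceptible, AST susceptible

    @property
    def sensitivity(self) -> float:
        """Share of phenotypically resistant isolates that WGS calls resistant."""
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        """Share of phenotypically susceptible isolates that WGS calls susceptible."""
        return self.true_neg / (self.true_neg + self.false_pos)


def tally(pairs: list[tuple[bool, bool]]) -> Concordance:
    """Build counts from (wgs_predicts_resistant, ast_resistant) pairs."""
    counts = {(p, o): 0 for p in (True, False) for o in (True, False)}
    for predicted, observed in pairs:
        counts[(predicted, observed)] += 1
    return Concordance(
        true_pos=counts[(True, True)],
        false_neg=counts[(False, True)],
        false_pos=counts[(True, False)],
        true_neg=counts[(False, False)],
    )


if __name__ == "__main__":
    # Invented counts for illustration only.
    c = Concordance(true_pos=90, false_neg=10, false_pos=2, true_neg=198)
    print(f"sensitivity={c.sensitivity:.2f}, specificity={c.specificity:.2f}")
    # sensitivity=0.90, specificity=0.99
```

Reporting false susceptible and false resistant calls separately gives a fuller picture than two summary percentages. This matters because a missed resistance call carries more clinical risk than a false alarm.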
The workflow transforms raw sequence data into actionable epidemiology through three stages. First, reads are assembled or mapped to a reference, and variants are called: SNPs, indels, structural variants, and, for organisms with small genomes, full-genome consensus sequences. Second, a genotyping scheme assigns each isolate to a population framework. Bacteria are typed by core-genome MLST (cgMLST), and malaria parasites by identity-by-descent (IBD) network analysis. Downstream population-genetic analyses then use that framework to quantify selection pressure, gene flow, and demographic history. Third, phylogenetic reconstruction combined with temporal and geographic metadata produces transmission trees that can distinguish imported cases from local transmission, identify superspreader events, and flag clones under positive selection at drug-resistance loci.
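In the cgMLST step, isolates are compared by allele calls rather than by individual SNPs. Each core gene gets an allele number, and the distance between two isolates is the number of loci whose alleles differ. The minimal sketch below assumes profiles are dictionaries mapping locus name to allele number, with None for loci that could not be called. The locus names are hypothetical, and real schemes contain thousands of loci.

```python
from typing import Optional

# Locus name -> allele number; None marks a locus that could not be called.
Profile = dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> tuple[int, int]:
    """Return (loci with different alleles, loci compared).

    Loci missing from either profile are skipped rather than counted as
    differences, so assembly gaps do not inflate the distance."""
    compared = [
        locus
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None
    ]
    differing = sum(1 for locus in compared if a[locus] != b[locus])
    return differing, len(compared)


if __name__ == "__main__":
    iso1 = {"locus_0001": 12, "locus_0002": 5, "locus_0003": None, "locus_0004": 7}
    iso2 = {"locus_0001": 12, "locus_0002": 6, "locus_0003": 3, "locus_0004": 7}
    print(allelic_distance(iso1, iso2))  # (1, 3)
```

Reporting the number of loci compared alongside the distance helps flag poor-quality assemblies. When many loci are missing, a small distance is less trustworthy.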
The power of this approach was demonstrated at unprecedented scale during the COVID-19 pandemic, when over 16 million SARS-CoV-2 genomes were shared through GISAID, enabling near-real-time tracking of variant emergence — from Alpha through Omicron and its sub-lineages — on a timescale of weeks rather than months.
Tracking Resistance Through Time
Pathogen genomic surveillance did not begin with COVID-19, but the pandemic accelerated infrastructure, analytical methods, and data-sharing norms that are now being applied across a broader range of pathogens. The timeline below captures major milestones in this transition.
Table 2: Genomic Surveillance Timeline
| Year | Milestone |
|---|---|
| 2014–2016 | West African Ebola outbreak — real-time WGS used to reconstruct transmission chains, demonstrating feasibility of field genomic epidemiology |
| 2018 | Nextstrain platform published (Hadfield et al.), providing open-source phylodynamic visualization tools |
| 2020–2022 | SARS-CoV-2 pandemic — >16 million genomes shared via GISAID; Nextstrain becomes global standard for real-time variant tracking |
| 2022 | Mpox (monkeypox) outbreak — WGS reveals cryptic transmission and rapid evolution in a DNA virus previously thought to be slow-mutating |
| 2023 | MalariaGEN releases Pf7 dataset: 20,000 P. falciparum genomes from 33 countries; EnteroBase surpasses 1 million bacterial isolates |
| 2024 | WHO publishes updated Bacterial Priority Pathogens List (BPPL), explicitly recommending WGS for AMR surveillance |
| 2024–2025 | Nanopore-based targeted sequencing brings malaria genomic surveillance cost below $30/sample; EnteroBase adds RESTful API for programmatic access |
| 2025–2026 | Global studies reveal polygenic artemisinin resistance across three continents (17,565 genomes) and KPC/NDM co-producing CRKP in 32 countries |
Figure 2: Timeline of pathogen genomic surveillance milestones, from the West African Ebola outbreak to global-scale malaria and bacterial AMR genomic studies.
Malaria Resistance Across Continents
Malaria genomic surveillance illustrates the full arc from technology development to public health impact. In 2024, de Cesare and colleagues demonstrated that targeted nanopore sequencing of drug-resistance loci — performed directly from dried blood spots at a cost of approximately $25 per sample — could match the resolution of short-read WGS for detecting kelch13, crt, dhfr, dhps, and mdr1 variants. The approach works in district-hospital settings in Zambia, where cold-chain sample transport and high-throughput sequencing infrastructure are unavailable. This is not a proof-of-concept: it is an operational system.
What that system is now revealing at global scale is sobering. A landmark analysis of 17,565 P. falciparum isolates from 39 countries, spanning three decades of collection, mapped the global population genetic architecture of the parasite and identified region-specific signatures of drug adaptation. In Southeast Asia, clonal expansion of the pfkelch13-C580Y mutation was found to co-occur with mutations in pfarps10, pfrad5, and pfMyoF — confirming a polygenic model of artemisinin resistance that complicates molecular surveillance based on kelch13 alone. In the Horn of Africa and South America, elevated identity-by-descent around pfKIC7 and pfKIC9 — interactors of pfkelch13 — suggested convergent evolution under drug pressure through alternative genetic pathways.
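Region-specific signatures like these often first appear as loci where allele frequencies differ sharply between populations. Table 1 lists FST among the selection scans used for this. The sketch below implements Hudson's FST estimator, combined across SNPs as a ratio of sums. It assumes biallelic SNPs and per-population allele frequencies that have already been computed. Sample sizes are counted as haploid genotypes, which fits P. falciparum only when mixed infections have been excluded or deconvolved. This is an illustration, not the method used in the study above.

```python
def hudson_fst(p1: list[float], n1: list[int], p2: list[float], n2: list[int]) -> float:
    """Hudson's FST for two populations across biallelic SNPs.

    p1, p2: frequency of the same allele in each population, one value per SNP.
    n1, n2: number of haploid genotypes sampled per population at each SNP (>= 2).
    Per-SNP numerators and denominators are summed before dividing (ratio of
    averages), which is more stable than averaging per-SNP ratios."""
    if not (len(p1) == len(n1) == len(p2) == len(n2)):
        raise ValueError("all inputs must have one entry per SNP")
    numerator = 0.0
    denominator = 0.0
    for a, na, b, nb in zip(p1, n1, p2, n2):
        # Squared frequency difference, corrected for finite sampling in each population
        numerator += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Probability that two alleles, one drawn from each population, differ
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        raise ValueError("no SNP is polymorphic across the two populations")
    return numerator / denominator
```

Values near 0 mean the two regions carry the allele at similar frequencies. Values near 1 mean it is close to fixed in one region and absent in the other. Small negative values can arise from the sampling correction and are usually read as zero.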
Meanwhile, aggregating data across 112,933 samples from 73 countries (1980–2023) revealed that kelch13 artemisinin-resistance markers in Africa (A675V, C469Y, R561H) are increasing along trajectories that mirror Southeast Asia 10 to 15 years earlier. In Uganda, a novel locus (px1, PF3D7_0720700) associated with decreased susceptibility to lumefantrine, mefloquine, and dihydroartemisinin was detected in 2025. This underscores the parasite's capacity to generate new resistance mechanisms faster than individual molecular assays can be updated.
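Comparing how fast a marker spreads in different regions requires an estimate of its rate of increase. The crude sketch below assumes per-year counts of genotyped samples and of samples carrying the marker; the counts in the usage example are invented. It fits a least-squares line to the empirical log-odds of prevalence, so the slope is the change in log-odds per year. A binomial regression that accounts for study and site effects would be more defensible, and this is not the method used by the meta-analysis above.

```python
import math


def logit_slope(years: list[int], carriers: list[int], totals: list[int]) -> float:
    """Least-squares slope of the empirical log-odds of marker prevalence against year.

    A 0.5 continuity correction keeps the log-odds finite when a year has
    0% or 100% prevalence. Requires at least two distinct years."""
    if len(years) < 2 or not (len(years) == len(carriers) == len(totals)):
        raise ValueError("need matching counts for at least two years")
    log_odds = [math.log((k + 0.5) / (n - k + 0.5)) for k, n in zip(carriers, totals)]
    mean_x = sum(years) / len(years)
    mean_y = sum(log_odds) / len(log_odds)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_odds))
    return sxy / sxx


if __name__ == "__main__":
    # Invented counts: samples carrying a marker out of 200 genotyped per year.
    slope = logit_slope([2016, 2017, 2018, 2019, 2020], [2, 5, 12, 25, 40], [200] * 5)
    # exp(slope) is the multiplicative change in the odds of carrying the marker per year
    print(f"log-odds per year: {slope:.2f}; odds ratio per year: {math.exp(slope):.2f}")
```

Fitting the same model to two regions gives comparable growth rates. A similar slope that starts a decade later is the pattern described above.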
The database ecosystem supporting this work — led by MalariaGEN's Pf7 open dataset with 20,000 genomes from 33 countries — provides the population baseline against which new mutations can be evaluated. Without that baseline, a novel kelch13 allele detected in a returning traveler is an anecdote. With it, that same allele is a population-genetic signal that can be tracked back to its geographic origin and forward to its transmission trajectory.
Bacterial AMR on the Rise
If malaria genomics demonstrates the power of population-level surveillance for a single eukaryotic pathogen, bacterial AMR genomics demonstrates its necessity across an entire kingdom of threats. The WHO Bacterial Priority Pathogens List 2024 grouped 24 antibiotic-resistant bacterial pathogens into three priority tiers (critical, high, and medium). Its critical tier includes carbapenem-resistant Acinetobacter baumannii, carbapenem-resistant and third-generation cephalosporin-resistant Enterobacterales, and rifampicin-resistant Mycobacterium tuberculosis. For each of these, the WHO explicitly recommends integrating WGS into surveillance programs.
The case of carbapenem-resistant K. pneumoniae (CRKP) shows why. A 2025 analysis of 413 KPC/NDM co-producing K. pneumoniae genomes from 32 countries — placed against 64,354 background genomes — found that dual-carbapenemase prevalence rose from 0.03% to 3.10% of all K. pneumoniae genomes between 2014 and 2023. A single clonal group, CG1 (dominated by ST11-KL64 and ST11-KL47), accounted for 55% of all dual-carbapenemase isolates globally, with a measurable shift from KL47 to KL64 and increasing acquisition of hypervirulence genes. The 30-day mortality for bloodstream infections caused by these dual-carbapenemase strains was 56%, compared to 32.5% for KPC-only CRKP.
This is fundamentally a genomic story. The KPC and NDM genes are carried on plasmids — often hybrid plasmids encoding both carbapenemases simultaneously — and the global dissemination of CG1 cannot be explained by travel patterns alone. Genomic data reveals a combination of clonal expansion, plasmid transfer, and selection pressure from carbapenem use that varies by region: K2N1 (KPC-2 + NDM-1) dominates in China, while K3N1 circulates primarily in the United States. Without population-level WGS, these regional transmission patterns would be invisible, and infection control would operate blind.
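Regional labels such as K2N1 encode which KPC and NDM alleles an isolate carries. The helper below is hypothetical. It assumes AMR gene calls are reported as allele symbols in the blaKPC-2 / blaNDM-1 style, so check your tool's naming conventions. It assigns co-producer labels and tabulates them by region, and the example records are invented.

```python
import re
from collections import Counter
from typing import Optional

# Matches allele symbols such as "blaKPC-2" or "blaNDM-1".
CARBAPENEMASE_ALLELE = re.compile(r"bla(KPC|NDM)-(\d+)")


def coproducer_label(gene_symbols: list[str]) -> Optional[str]:
    """Return e.g. 'K2N1' for an isolate carrying both a KPC and an NDM allele, else None.

    Simplification: if an isolate carries two alleles of the same family,
    the last one listed is used."""
    variants = {}
    for symbol in gene_symbols:
        match = CARBAPENEMASE_ALLELE.fullmatch(symbol)
        if match:
            family, number = match.groups()
            variants[family] = number
    if "KPC" in variants and "NDM" in variants:
        return f"K{variants['KPC']}N{variants['NDM']}"
    return None


def labels_by_region(isolates: list[tuple[str, list[str]]]) -> dict[str, Counter]:
    """Count co-producer labels per region from (region, gene_symbols) records."""
    counts: dict[str, Counter] = {}
    for region, genes in isolates:
        label = coproducer_label(genes)
        if label is not None:
            counts.setdefault(region, Counter())[label] += 1
    return counts


if __name__ == "__main__":
    records = [  # invented records, not data from the study
        ("China", ["blaKPC-2", "blaNDM-1", "blaCTX-M-15"]),
        ("China", ["blaKPC-2", "blaNDM-1"]),
        ("USA", ["blaKPC-3", "blaNDM-1"]),
        ("USA", ["blaKPC-3"]),  # KPC only: not a co-producer
    ]
    print(labels_by_region(records))
    # {'China': Counter({'K2N1': 2}), 'USA': Counter({'K3N1': 1})}
```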
For Escherichia coli, EnteroBase — the world's largest bacterial genotyping platform, holding assembled genome data from over 1.1 million isolates across multiple genera — provides cgMLST-based hierarchical clustering (HierCC) that assigns strains to population frameworks within hours of uploading short reads. The platform, used by more than 4,000 researchers from 127 countries, enables real-time detection of cross-border outbreak clusters. A researcher in one country uploading a sequenced isolate can learn, within hours, that it clusters with an outbreak strain circulating 5,000 kilometers away.
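EnteroBase's HierCC groups genomes by single-linkage clustering of cgMLST allelic distances at many thresholds at once, so each isolate carries a cluster assignment at every level of resolution. The simplified Python sketch below shows that core idea using union-find. It omits the refinements EnteroBase applies, such as keeping cluster names stable as new genomes are added. The distances and threshold levels in the example are invented.

```python
from typing import Callable, Sequence


def single_linkage(ids: Sequence[str], dist: Callable[[str, str], int], threshold: int) -> dict[str, int]:
    """Assign cluster labels so that isolates joined by any chain of pairwise
    distances <= threshold share a label (single linkage via union-find)."""
    parent = list(range(len(ids)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if dist(ids[i], ids[j]) <= threshold:
                parent[find(i)] = find(j)

    # Label each cluster by the position of its first-listed member.
    first_member: dict[int, int] = {}
    labels: dict[str, int] = {}
    for i, isolate in enumerate(ids):
        labels[isolate] = first_member.setdefault(find(i), i)
    return labels


def multilevel_clusters(ids: Sequence[str], dist: Callable[[str, str], int], levels: Sequence[int]) -> dict[int, dict[str, int]]:
    """Repeat single-linkage clustering at each allelic-distance threshold."""
    return {level: single_linkage(ids, dist, level) for level in levels}


if __name__ == "__main__":
    # Invented cgMLST allelic distances between four isolates.
    pair_dist = {
        frozenset(("a", "b")): 2, frozenset(("a", "c")): 8, frozenset(("b", "c")): 9,
        frozenset(("a", "d")): 50, frozenset(("b", "d")): 50, frozenset(("c", "d")): 50,
    }
    levels = multilevel_clusters(["a", "b", "c", "d"], lambda x, y: pair_dist[frozenset((x, y))], [5, 10, 100])
    for level, labels in levels.items():
        print(level, labels)
    # 5 {'a': 0, 'b': 0, 'c': 2, 'd': 3}
    # 10 {'a': 0, 'b': 0, 'c': 0, 'd': 3}
    # 100 {'a': 0, 'b': 0, 'c': 0, 'd': 0}
```

Single linkage lets a chain of closely related isolates merge into one cluster even when the chain's two ends are far apart. That suits outbreak tracing, but it also means a single intermediate genome can join two previously separate clusters.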
For both K. pneumoniae and E. coli, the convergence of drug resistance and hypervirulence on mobile plasmids represents a threat that phenotypic AST alone cannot characterize. For K. pneumoniae, this convergence has been documented at scale in the plasmidome analysis of nearly 70,000 genomes. Only whole genome sequencing can determine whether a carbapenem-resistant isolate is also hypervirulent, whether its resistance genes are chromosomally integrated or plasmid-borne, and whether that plasmid is conjugative. Long-read or hybrid assembly is often needed to resolve complete plasmids.
The Database Ecosystem
Pathogen population genomics depends on a layered surveillance network of databases and analysis platforms, fed by the sequencing laboratories that generate the underlying data. Each of the four platforms below serves a distinct role, and the most effective surveillance programs use all four in combination.
Table 3: Key Genomic Surveillance Databases and Platforms
| Database | Primary Pathogens | Data Scale (2024–2025) | Core Function |
|---|---|---|---|
| GISAID | Influenza, SARS-CoV-2, RSV, mpox, dengue, chikungunya, Zika | ~18 million sequences | Sequence sharing with provenance metadata; access controls that incentivize data contribution |
| Nextstrain | SARS-CoV-2, influenza, H5N1, dengue, mpox, Lassa, measles, 8 others | 15 automated pathogen builds | Real-time phylodynamic visualization; multinomial logistic regression (MLR) models of variant growth advantage |
| EnteroBase | Escherichia/Shigella, Salmonella, Clostridioides, Vibrio, Helicobacter, Yersinia, Streptococcus, Mycobacterium | >1.1 million isolates | cgMLST + HierCC clustering for bacterial strain typing; RESTful API for programmatic access |
| MalariaGEN Pf7 | Plasmodium falciparum | 20,000 genomes from 33 countries | Open population-genetic baseline for drug resistance, diagnostic, and vaccine target surveillance |
The ecosystem faces real tensions. GISAID's scale — ~18 million sequences — makes it indispensable, but its governance model has faced sustained criticism, spurring the 2024 launch of Pathoplexus, an open-source, scientist-governed alternative focused initially on Ebola, CCHF, and West Nile viruses. In October 2025, GISAID ended SARS-CoV-2 flat-file data updates to Nextstrain, disrupting the GISAID-based global SARS-CoV-2 builds that had operated since February 2020. The open-data (GenBank/INSDC) builds continue, but are geographically biased toward countries with open-data submission policies.
These are not merely technical issues. They determine whether a public health agency in a low-resource setting can detect an emerging clone in real time. The databases that work best — those with open APIs, transparent governance, and low barriers to data contribution — are the ones that most effectively translate genomic data into public health action.
Figure 3: The four-layer database ecosystem supporting pathogen population genomics — GISAID, Nextstrain, EnteroBase, and MalariaGEN — each serving a distinct role in the global genomic surveillance network.
From Data to Action
The value of pathogen population genomics is measured not in genomes sequenced but in interventions changed. Across the case studies described above, WGS has influenced public health decision-making in specific and reproducible ways:
- Vaccine formulation. The WHO's annual influenza vaccine strain selection — managed through the Global Influenza Surveillance and Response System (GISRS) — now incorporates WGS data shared via GISAID EpiFlu to identify antigenic drift variants months before they dominate circulation.
- Antimalarial drug policy. The detection of artemisinin partial resistance markers (kelch13 R561H, A675V, C469Y) in East Africa, initially identified through WGS surveillance and confirmed by the 112,933-sample meta-analysis, has triggered national-level reviews of first-line ACT regimens in Rwanda, Uganda, and Ethiopia.
- Hospital infection control. Real-time cgMLST-based clustering through EnteroBase has enabled hospitals to distinguish nosocomial transmission from independent community acquisitions within 48 hours of culture — directing infection control resources toward wards with active transmission rather than wasting them on unrelated cases that happen to share an MLST type.
- Antibiotic stewardship. When a carbapenem-resistant K. pneumoniae isolate is identified by WGS as carrying a plasmid-borne metallo-beta-lactamase (e.g., NDM), clinicians and stewardship teams know that ceftazidime-avibactam — effective against many serine carbapenemases — will likely fail, and can adjust empiric therapy before phenotypic AST results return.
- Cross-border outbreak response. The European Antimicrobial Resistance Genes Surveillance Network (EURGen-Net), built on WGS interoperability standards, has detected and contained multi-country K. pneumoniae and E. coli outbreaks by linking genomic clusters across national surveillance systems.
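The stewardship logic in the list above can be written as a simple rule table keyed on the Ambler class of the detected carbapenemase. This is an illustrative Python sketch for research use, not a clinical decision rule. It covers only a few enzyme families and assumes gene families have already been called from the genome. It also ignores other mechanisms, such as porin loss or certain KPC variants, that can cause ceftazidime-avibactam failure.

```python
# Ambler class for a few common carbapenemase families (illustrative subset only).
AMBLER_CLASS = {"KPC": "A", "OXA-48": "D", "NDM": "B", "VIM": "B", "IMP": "B"}


def ceftazidime_avibactam_flag(families: list[str]) -> str:
    """Flag expected ceftazidime-avibactam activity from detected carbapenemase families.

    Avibactam inhibits class A (e.g. KPC), class C, and some class D (e.g. OXA-48)
    beta-lactamases, but not class B metallo-beta-lactamases such as NDM."""
    classes = {AMBLER_CLASS[f] for f in families if f in AMBLER_CLASS}
    if "B" in classes:
        return "likely inactive: metallo-beta-lactamase detected"
    if classes:
        return "possibly active: serine carbapenemase only; confirm with AST"
    return "no carbapenemase from this list detected"


if __name__ == "__main__":
    print(ceftazidime_avibactam_flag(["KPC", "NDM"]))
    # likely inactive: metallo-beta-lactamase detected
```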
Each of these applications depends on the same infrastructure: robust reference databases, harmonized bioinformatics pipelines, and data-sharing agreements that enable cross-jurisdictional comparison. Building that infrastructure requires investment beyond sequencing machines — it requires training, bioinformatics capacity, and sustained political commitment to data sharing.
The COVID-19 pandemic demonstrated that genomic surveillance infrastructure, once built, does not stay built. Investments made during the acute phase of the pandemic eroded rapidly as public attention shifted. The pathogens documented in this article — malaria parasites, carbapenem-resistant bacteria, emerging viral threats — do not respect funding cycles. The platforms (Nextstrain, EnteroBase, GISAID, MalariaGEN) and the methods (cgMLST, IBD network analysis, targeted nanopore sequencing) exist and are operational. What remains is the political and financial commitment to use them at the scale the resistance threat demands.
While this article focuses on pathogen surveillance, many of the same sequencing platforms and population-genetic principles apply to human disease studies. For a detailed comparison of GWAS and WGS approaches in the context of complex disease genetics, see our GWAS vs Whole Genome Sequencing guide.
Frequently Asked Questions
How does WGS detect drug resistance?
WGS detects drug resistance by comparing a pathogen's genome sequence against curated databases of known resistance determinants. These include acquired genes (e.g., blaKPC, mcr) and point mutations (e.g., kelch13 propeller domain SNPs in P. falciparum, rpoB mutations in M. tuberculosis). They also include loss-of-function or regulatory mutations (e.g., porin gene disruptions in Enterobacterales, or mutations in efflux pump regulators). Rule-based tools and databases such as ResFinder, AMRFinderPlus, and CARD match sequence features to resistance phenotypes. Machine learning models trained on genotype-phenotype pairs can additionally predict resistance from non-canonical or polygenic determinants that rule-based tools miss. In published comparisons, WGS achieves over 87% sensitivity and over 98% specificity against gold-standard phenotypic antimicrobial susceptibility testing, although performance varies by species and drug.
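For point mutations, rule-based detection compares a sample's protein against the reference and checks each substitution against a catalog. The minimal Python sketch below assumes the protein has already been translated and aligned to the reference without gaps. The function names are hypothetical, and the catalog holds only the kelch13 markers named in this article; real catalogs are much larger and curated per species. The toy fragment is invented and is not the real kelch13 sequence.

```python
def substitutions(ref: str, query: str, start: int = 1) -> list[str]:
    """List amino-acid substitutions (e.g. 'C580Y') between two aligned, gap-free
    protein sequences. `start` is the residue number of the first character, so a
    fragment can keep the full protein's numbering. 'X' (unknown) is skipped."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{start + i}{q}"
        for i, (r, q) in enumerate(zip(ref, query))
        if r != q and q != "X"
    ]


# Illustrative catalog: only the kelch13 markers named in this article.
CATALOG = {"kelch13": {"C580Y", "R561H", "A675V", "C469Y"}}


def known_markers(gene: str, ref: str, query: str, start: int = 1) -> list[str]:
    """Return substitutions that appear in the catalog for `gene`."""
    catalogued = CATALOG.get(gene, set())
    return [s for s in substitutions(ref, query, start) if s in catalogued]


if __name__ == "__main__":
    # Toy 5-residue fragment numbered from 578; NOT the real kelch13 sequence.
    print(known_markers("kelch13", "AMCKL", "AMYKL", start=578))  # ['C580Y']
```

Substitutions not in the catalog are still reported by `substitutions`. These are the candidates a population baseline such as Pf7 helps evaluate.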
What is genomic epidemiology?
Genomic epidemiology is the integration of pathogen whole genome sequencing with epidemiological metadata (time, location, patient demographics, and exposure history) to reconstruct transmission dynamics at population scale. It differs from clinical diagnostics in both purpose and scale. A clinical microbiology lab asks whether a single isolate carries a resistance gene. A genomic epidemiology program asks how that resistance gene entered a population, through which transmission chains it is spreading, and whether it is under positive selection. The core methods include phylogenetic reconstruction, molecular clock dating, phylogeographic inference, and cgMLST-based clustering. All of them depend on having a population-level sample of genomes rather than individual clinical isolates.
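Molecular clock dating often begins with a root-to-tip regression, the approach implemented in tools such as TempEst. Each tip's divergence from the root of a rooted phylogeny is regressed against its sampling date. The slope estimates the substitution rate, and the x-intercept gives a rough date for the most recent common ancestor. The minimal Python sketch below assumes divergences (substitutions per site) have already been read off the tree. It is a check for temporal signal, not a substitute for formal Bayesian or maximum-likelihood dating.

```python
def root_to_tip(dates: list[float], divergences: list[float]) -> tuple[float, float]:
    """Ordinary least-squares regression of root-to-tip divergence on sampling date.

    Returns (rate, tmrca): rate in substitutions/site/year, and the date at which
    the fitted line reaches zero divergence (a crude estimate of the root age)."""
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two dated tips with matching divergences")
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    stt = sum((t - mean_t) ** 2 for t in dates)
    std = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = std / stt
    if rate <= 0:
        raise ValueError("no positive temporal signal: divergence does not increase with time")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


if __name__ == "__main__":
    # Invented values chosen for easy arithmetic, not realistic bacterial rates.
    rate, tmrca = root_to_tip([2010.0, 2012.0, 2014.0], [0.010, 0.012, 0.014])
    print(f"rate={rate:.4f} subs/site/yr, root ~ {tmrca:.1f}")
    # rate=0.0010 subs/site/yr, root ~ 2000.0
```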
Which pathogens are priorities for genomic resistance surveillance?
The WHO Bacterial Priority Pathogens List 2024 identifies the highest-priority targets: carbapenem-resistant Acinetobacter baumannii, carbapenem-resistant and third-generation cephalosporin-resistant Enterobacterales (including K. pneumoniae and E. coli), and rifampicin-resistant M. tuberculosis in the critical tier. For malaria, P. falciparum with artemisinin partial resistance and partner drug resistance is the primary target. For viruses, influenza viruses with reduced neuraminidase inhibitor susceptibility and SARS-CoV-2 variants with potential monoclonal antibody escape are surveillance priorities. Across all pathogen groups, the common principle is the same: any pathogen for which resistance emergence would compromise first-line therapy warrants population-level genomic surveillance.
How does population-level WGS differ from sequencing a single clinical isolate?
Population-level WGS requires representative sampling across geographic space, time, and host populations, not just convenience samples from treatment failures or severe cases. It uses cgMLST or SNP-based clustering schemes designed for cross-laboratory harmonization rather than single-lab identification. It depends on shared reference databases (EnteroBase, GISAID, MalariaGEN) that define the population baseline against which new variants are assessed. Its primary output is not an individual patient report but a population-level assessment of transmission dynamics, selection pressure, and emerging threats. A clinical isolate sequenced in isolation tells you what resistance genes are present in that sample. The same isolate sequenced within a population framework tells you where those genes came from and where they are likely to go.
How much does pathogen genomic surveillance cost?
Costs vary by pathogen, throughput, and technology. Targeted nanopore sequencing panels for P. falciparum drug-resistance genes can be delivered at approximately $25 per sample using dried blood spots, with results in 2–3 days. Bacterial WGS on Illumina platforms ranges from $50 to $150 per isolate for consumables, depending on coverage depth and multiplexing strategy. The dominant cost drivers are not sequencing reagents but bioinformatics infrastructure, personnel training, and the recurring expense of maintaining database connectivity and analytical pipelines. Programs that invest in cloud-based or centralized bioinformatics, rather than duplicating capacity at every sequencing site, achieve substantially lower total cost per isolate.
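The advantage of centralized bioinformatics comes down to simple arithmetic. Fixed costs such as staff, compute, and pipeline maintenance are spread over every isolate a site processes. The sketch below uses invented US-dollar figures, chosen only to show the effect of pooling five sites' isolates through one hub rather than funding bioinformatics at each site.

```python
def cost_per_isolate(consumables: float, fixed_annual: float, isolates_per_year: int) -> float:
    """Per-isolate cost = sequencing consumables + share of annual fixed costs."""
    return consumables + fixed_annual / isolates_per_year


if __name__ == "__main__":
    sites, isolates_per_site, consumables = 5, 400, 100.0  # invented figures
    # Each site funds its own bioinformatics (hypothetical $60,000/yr each).
    distributed = cost_per_isolate(consumables, 60_000, isolates_per_site)
    # One hub (hypothetical $120,000/yr) analyses all sites' isolates.
    centralized = cost_per_isolate(consumables, 120_000, sites * isolates_per_site)
    print(distributed, centralized)  # 250.0 160.0
```

Even when the hub costs twice as much as any single site, spreading it across all isolates lowers the per-isolate figure. Consumables are unchanged in both scenarios.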
Can WGS replace phenotypic antimicrobial susceptibility testing (AST)?
Not yet, and for most clinical applications it should complement rather than replace phenotypic AST. WGS can predict resistance to many drug classes with high accuracy, particularly when resistance is mediated by acquired genes or well-characterized chromosomal mutations. However, phenotypic AST remains the reference standard for drugs whose resistance mechanisms are incompletely characterized. The same applies when gene expression, efflux pump regulation, or epistatic interactions determine phenotype. The integration of WGS into surveillance (population-level monitoring, outbreak detection, retrospective analysis) is mature. Integration into direct clinical decision-making (replacing phenotypic AST for individual patient management) is progressing but requires further validation, regulatory approval, and laboratory accreditation.
References:
- Billows N, Dombrowski JG, Thorpe J, et al. Global-scale population genetic analysis of Plasmodium falciparum identifies region-specific patterns of malaria parasite adaptation. Nature Communications. 2026. doi:10.1038/s41467-026-73006-2
- Hadfield J, Megill C, Bell SM, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121-4123. doi:10.1093/bioinformatics/bty407
- Matsumura Y, Yamamoto M, Gomi R, et al. Integrating whole-genome sequencing into antimicrobial resistance surveillance: methodologies, challenges, and perspectives. Clinical Microbiology Reviews. 2025;38(4):e0014022. doi:10.1128/cmr.00140-22
- Dyer R, Zhou Z, Aanensen DM, et al. EnteroBase in 2025: exploring the genomic epidemiology of bacterial pathogens. Nucleic Acids Research. 2025;53(D1). doi:10.1093/nar/gkae902
- Sati H, Walsh TR, Tacconelli E, et al. The WHO Bacterial Priority Pathogens List 2024: a prioritisation study to guide research, development, and public health strategies against antimicrobial resistance. The Lancet Infectious Diseases. 2025. doi:10.1016/S1473-3099(25)00118-5
- Zhang F, Liu X, Li Z, et al. Tracking international and regional dissemination of the KPC/NDM co-producing Klebsiella pneumoniae. Nature Communications. 2025;16:5574. doi:10.1038/s41467-025-60765-7
- de Cesare M, Hamainza B, Hsiang MS, et al. Flexible and cost-effective genomic surveillance of P. falciparum malaria with targeted nanopore sequencing. Nature Communications. 2024;15:1413. doi:10.1038/s41467-024-45688-z
- Sherry NL, Lee JYH, Giulieri SG, et al. Genomics for antimicrobial resistance — progress and future directions. Antimicrobial Agents and Chemotherapy. 2025;69(5):e0108224. doi:10.1128/aac.01082-24
- MalariaGEN, Ahouidi A, Ali M, et al. Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples. Wellcome Open Research. 2023;8:22. doi:10.12688/wellcomeopenres.18681.1