Population genomics + GWAS across 425 mulberry accessions identifies population structure, gene flow, demographic history, and loci for leaf size/biomass and flowering time.
This casebook summarizes a published high-impact study (Advanced Science, 2023) that combined population genomics and GWAS to clarify mulberry (Morus spp.) domestication and expansion, quantify gene flow/introgression, reconstruct demographic history, and identify genomic regions linked to agronomic traits (leaf size/biomass and flowering time). The same end-to-end workflow is widely applicable to crops, forestry species, aquaculture species, and non-model organisms where researchers need both evolutionary context and actionable markers.
Data source: All study findings summarized here are from the original publication (DOI: 10.1002/advs.202300039).
At a glance (study facts):
Mulberry (Morus spp.) is an economically important plant widely cultivated across many developing countries in Asia. Its importance goes beyond yield: as the sole food source for silkworms, mulberry underpins sericulture and has played an outsized role in historical trade and cultural exchange.
Despite its long cultivation history, the genetic and evolutionary story of mulberry domestication and spread has been less clear than for many staple crops. For breeders and population geneticists, this uncertainty creates practical problems:
This published study addressed those gaps by pairing large-scale whole-genome resequencing with population genetic inference and GWAS, producing a single, coherent narrative from variants → history → trait loci.
The study analyzed 425 mulberry accessions. Importantly, the cohort combined 290 newly resequenced accessions with 135 accessions integrated from earlier studies.
That "new data + published data integration" approach is increasingly common in population genomics because it can expand geographic coverage and genetic diversity without restarting from scratch—provided that harmonization and QC are handled carefully.
All individuals were profiled by whole-genome resequencing, with an average sequencing depth of ~20×. For population genomics, that depth supports robust variant discovery and reduces uncertainty in genotype calls relative to shallow sequencing—especially useful when results will feed into multiple downstream analyses (structure, gene flow, demography, GWAS).
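To see why ~20× depth matters for genotype calls, consider a heterozygous SNP: it can only be called confidently if reads carrying both alleles are observed. The sketch below is an idealized model, not the study's genotyping pipeline. It assumes per-site depth is Poisson-distributed around the mean, that each read samples either allele with probability 0.5, and that there are no sequencing or mapping errors. The function names are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def het_both_alleles_prob(mean_depth: float, min_reads_per_allele: int = 1) -> float:
    """Probability that a heterozygous site shows >= min_reads_per_allele reads of EACH allele.

    Assumption: total depth ~ Poisson(mean_depth) and each read picks either allele with
    probability 0.5, so the two per-allele read counts are independent Poisson(mean_depth / 2).
    """
    per_allele = poisson_sf(min_reads_per_allele, mean_depth / 2.0)
    return per_allele**2


# Compare shallow vs. deeper sequencing, requiring 1 or 3 reads per allele
for depth in (5, 10, 20):
    print(depth, het_both_alleles_prob(depth), het_both_alleles_prob(depth, 3))
```

Under these assumptions, about 16% of heterozygous sites at 5× lack any read for one allele, whereas at 20× fewer than 0.01% do. Real callers model errors and genotype likelihoods, but the trend is the same.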
From the 425 accessions, the authors identified 2,359,117 SNPs and 934,187 short InDels (<10 bp).
This dense variant set is the enabling resource that makes the rest of the paper possible: it increases resolution for phylogeny and structure, improves power for detecting introgression signals, and provides the marker density GWAS needs to localize trait associations.
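How far linkage disequilibrium (LD) extends determines how densely markers must be spaced to tag causal variants, and how finely a GWAS peak can be localized. A common summary is the LD decay curve: mean r² between SNP pairs, binned by physical distance. The following is a minimal sketch. It assumes genotypes form an individuals × SNPs array of 0/1/2 alt-allele dosages with monomorphic SNPs already removed, and it uses genotype-based r² as an approximation for unphased data. Because it computes all pairs at once, it suits a chromosome window rather than a whole genome; tools such as PopLDdecay or PLINK are the usual choice at scale. The function names are hypothetical.

```python
import numpy as np


def pairwise_r2(geno: np.ndarray) -> np.ndarray:
    """Squared Pearson correlation of dosages for every SNP pair (columns of geno)."""
    return np.corrcoef(geno, rowvar=False) ** 2


def ld_decay(geno: np.ndarray, positions: np.ndarray, bin_size: int, max_dist: int) -> np.ndarray:
    """Mean r^2 per distance bin [k*bin_size, (k+1)*bin_size) over pairs within max_dist.

    Assumes no monomorphic SNPs, whose correlation is undefined.
    """
    r2 = pairwise_r2(geno)
    i, j = np.triu_indices(len(positions), k=1)          # each unordered pair once
    dist = np.abs(positions[j] - positions[i])
    keep = dist <= max_dist
    bins = (dist[keep] // bin_size).astype(int)
    n_bins = int(max_dist // bin_size) + 1
    sums = np.bincount(bins, weights=r2[i[keep], j[keep]], minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts                              # NaN for bins with no pairs


# Tiny illustrative example: SNPs 0 and 1 are perfectly correlated and 100 bp apart
geno = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1]])
positions = np.array([100, 200, 5000])
decay = ld_decay(geno, positions, bin_size=1000, max_dist=10000)
```

In practice the binned means are plotted against distance, and the distance at which r² falls to half its maximum is often reported. Faster decay means more markers per megabase are needed to cover the genome.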
A strength of this study is that it does not treat population genetics and GWAS as separate projects. Instead, it uses a connected workflow where evolutionary inference informs trait mapping, and trait mapping is interpreted in the context of structure and gene flow.
Workflow overview of population genomics and GWAS used to study mulberry domestication history and trait loci: 425 mulberry accessions → whole-genome resequencing → variant calling → population structure, diversity, LD, gene flow, and demographic history, with phenotypes feeding GWAS and, ultimately, breeding markers.
Below is the end-to-end workflow used in this study, summarized for clarity.
Using the SNP dataset, the authors built a phylogenetic tree and clustered all accessions into five distinct genetic groups. They also examined genetic structure and geographic distribution, tying genetic clusters to where accessions were sampled or cultivated.
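Principal component analysis (PCA) is one of the standard ways to visualize this kind of structure alongside a tree. The following is a minimal sketch of PCA only, not the authors' full pipeline. It assumes an individuals × SNPs array of 0/1/2 alt-allele dosages with no missing data; real pipelines would also LD-prune SNPs and filter or impute missing genotypes first. The two simulated populations are purely illustrative, and the function name is hypothetical.

```python
import numpy as np


def genotype_pca(geno: np.ndarray, n_pcs: int = 2):
    """PCA on an (individuals x SNPs) matrix of 0/1/2 alt-allele dosages."""
    p = geno.mean(axis=0) / 2.0                      # alt-allele frequency per SNP
    keep = (p > 0) & (p < 1)                         # drop monomorphic SNPs
    g, p = geno[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))       # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)  # columns of z are mean-zero, so SVD = PCA
    pcs = u[:, :n_pcs] * s[:n_pcs]                   # per-individual PC scores
    var_explained = (s**2 / np.sum(s**2))[:n_pcs]
    return pcs, var_explained


# Illustrative simulation: two populations with diverged allele frequencies
rng = np.random.default_rng(0)
f_a = rng.uniform(0.05, 0.95, 500)
f_b = np.clip(f_a + rng.normal(0, 0.3, 500), 0.01, 0.99)
pop_a = rng.binomial(2, f_a, size=(30, 500))
pop_b = rng.binomial(2, f_b, size=(30, 500))
pcs, ve = genotype_pca(np.vstack([pop_a, pop_b]))
```

With differentiation this strong, PC1 separates the two simulated groups. In real data, the first few PCs are also the covariates typically carried into GWAS.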
Phylogenetic tree, LD decay curve, PCA, and admixture plots showing population structure of 425 mulberry accessions.
In plant populations—especially those shaped by domestication, breeding, and human-mediated movement—population structure is not just a "nice plot." It directly impacts inference:
If your goal includes GWAS, start by characterizing population structure (and relatedness) so association models can be properly controlled.
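Relatedness is typically summarized as a genomic relationship (kinship) matrix, which mixed-model GWAS uses as the covariance of a random genetic effect. The following is a minimal sketch of VanRaden's first method, assuming complete 0/1/2 dosage data. It illustrates the idea and is not necessarily the kinship estimator the authors used. The function name and simulated data are hypothetical.

```python
import numpy as np


def vanraden_grm(geno: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix G = Z Z^T / (2 * sum p(1-p)), VanRaden method 1."""
    p = geno.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    g, p = geno[:, keep], p[keep]
    z = g - 2.0 * p                           # center each SNP by its expected dosage
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


# Illustrative data: 20 individuals plus a duplicate of individual 0
rng = np.random.default_rng(1)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 1000), size=(20, 1000))
geno = np.vstack([geno, geno[0]])
G = vanraden_grm(geno)
```

Unrelated individuals have off-diagonal values near 0 and diagonal values near 1 (higher under inbreeding). Duplicated accessions show an off-diagonal value equal to their diagonal, which is worth checking before association testing.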
The authors used f3 statistics and ABBA–BABA (D-statistics) to test gene flow among mulberry populations. These methods are widely used to detect admixture and introgression signals that are not always obvious from phylogenetic trees alone.
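Both statistics can be computed from per-SNP allele frequencies. The following is a minimal sketch that assumes each array holds the frequency of the same (derived) allele in each population at the same SNPs. Production tools such as ADMIXTOOLS add a finite-sample bias correction for f3 and block-jackknife standard errors for both statistics. Both are omitted here, so these point estimates alone cannot be declared significant. The function names and toy frequencies are hypothetical.

```python
import numpy as np


def f3(c, a, b) -> float:
    """Admixture f3(C; A, B) = mean over SNPs of (c - a)(c - b).

    Significantly negative values suggest C is a mixture of populations related to A and B.
    Naive estimator: no correction for finite sample size in C.
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    return float(np.mean((c - a) * (c - b)))


def d_statistic(p1, p2, p3, p4) -> float:
    """Frequency-based D(P1, P2; P3, O), with p4 the outgroup's derived-allele frequency.

    D > 0: excess ABBA, i.e., P2 shares more derived alleles with P3 than P1 does.
    D < 0: excess BABA, i.e., P1 shares more derived alleles with P3.
    """
    p1, p2, p3, p4 = (np.asarray(x, dtype=float) for x in (p1, p2, p3, p4))
    abba = (1 - p1) * p2 * p3 * (1 - p4)
    baba = p1 * (1 - p2) * p3 * (1 - p4)
    return float(np.sum(abba - baba) / np.sum(abba + baba))


# Toy frequencies (illustrative only): C sits between A and B at every SNP
a, b, c = [0.1, 0.8, 0.3], [0.9, 0.2, 0.7], [0.5, 0.5, 0.5]
print(f3(c, a, b))
```

A tree alone would place C with whichever source it is closest to; f3 < 0 and D ≠ 0 are the signals that point to mixing instead.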
Key finding: different mulberry populations showed extensive gene flow, implying frequent inter- and intra-specific introgression during domestication and cultivation.
Domestication is often presented as a simple split between "wild ancestor" and "cultivated descendant." Real plant domestication histories are frequently more complex:
By explicitly testing gene flow, this study avoided an oversimplified narrative and instead supported a model where domestication and expansion occurred amid substantial genetic exchange.
If your species has a long cultivation history or wide geographic spread, plan for explicit gene flow/admixture testing—because it can change both the evolutionary story and how you interpret GWAS hits.
The study inferred demographic trajectories using PSMC and SMC++, methods that estimate historical changes in effective population size (Ne) from genome-wide data: PSMC from a single diploid genome, and SMC++ from many unphased genomes jointly.
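PSMC reports time and population size in scaled (coalescent) units. Converting its output to years and numbers of individuals therefore requires an assumed per-generation mutation rate and generation time. The following is a minimal sketch of the commonly used PSMC scaling. Here θ0 is the per-bin θ from the PSMC output, t_k and λ_k are the scaled time and relative size of each interval, and bins are 100 bp by default. The mutation rate and generation time in the example are placeholders, not values from the mulberry study. SMC++ takes the mutation rate as an input during estimation, so this step applies to PSMC. The function name is hypothetical.

```python
def scale_psmc(theta0, t_k, lambda_k, mu, gen_time, bin_size=100):
    """Convert PSMC scaled output to (years ago, Ne) pairs.

    N0 = theta0 / (4 * mu * bin_size); years = 2 * N0 * t_k * gen_time; Ne = lambda_k * N0.
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    return n0, [(2.0 * n0 * t * gen_time, lam * n0) for t, lam in zip(t_k, lambda_k)]


# Placeholder inputs (NOT from the mulberry study):
# mu = mutations per site per generation, gen_time = years per generation
n0, traj = scale_psmc(theta0=0.075, t_k=[0.0, 0.1, 1.0], lambda_k=[1.5, 2.0, 0.8],
                      mu=2.5e-8, gen_time=25)
for years, ne in traj:
    print(f"{years:>10.0f} years ago  Ne ~ {ne:,.0f}")
```

Because both axes scale with the assumed μ and generation time, absolute dates and sizes shift if those assumptions change, while the shape of the trajectory does not.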
Key demographic signals reported:
Demographic history and dispersal of mulberry, including geographic sampling, genetic diversity, admixture signals, and effective population size through time (from paper Figure 3).
Demographic events shape genetic variation and can mimic or mask selection:
When demography is analyzed alongside structure and gene flow, you can interpret genetic patterns more confidently and design downstream studies more strategically (e.g., subgroup GWAS, balanced sampling, targeted crosses).
Demographic inference is not only "history"; it informs practical choices like cohort design, expected LD decay, and how transferable trait markers may be across lineages.
Genomic loci associated with leaf size during mulberry domestication, integrating phenotype differences, GWAS peaks, selection signals, and candidate gene/LD evidence (from paper Figure 4).
The study performed GWAS for leaf size/biomass-related traits and flowering time.
These are breeding-relevant traits in mulberry: leaf biomass connects directly to sericulture productivity, while flowering time can affect adaptation and management.
The GWAS identified association signals for leaf size/biomass on Chromosome 7 and for flowering time on Chromosome 5.
The authors further proposed two candidate genes as potential regulators: MaBXY5 for leaf traits and MaERF110 for flowering time.
GWAS highlights loci associated with phenotypes, but breeding action requires careful follow-up:
Still, the study demonstrates a powerful principle: when population structure and gene flow are accounted for, GWAS can yield practical, chromosome-level signals and candidate genes even in complex domestication contexts.
The most credible GWAS in domesticated species is anchored by population genomics (structure + gene flow + demography), not performed in isolation.
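To make "accounting for structure" concrete, here is a minimal single-marker association scan that includes principal components as covariates, plus the Bonferroni threshold implied by the study's SNP count. It is a sketch under stated assumptions: complete 0/1/2 dosages with monomorphic SNPs removed, a quantitative phenotype, and a fixed-effect PC adjustment. This is simpler than the mixed models (which add a kinship random effect) commonly used in plant GWAS, and it is not the authors' exact model. P-values use a normal approximation to the t distribution, which is reasonable with hundreds of samples. The function name and simulated data are hypothetical.

```python
import math

import numpy as np


def gwas_pc_adjusted(geno, pheno, pcs):
    """Per-SNP OLS: pheno ~ intercept + PCs + dosage. Returns (betas, two-sided p-values)."""
    n, m = geno.shape
    base = np.column_stack([np.ones(n), pcs])
    betas, pvals = np.empty(m), np.empty(m)
    for j in range(m):
        x = np.column_stack([base, geno[:, j]])
        coef, *_ = np.linalg.lstsq(x, pheno, rcond=None)
        resid = pheno - x @ coef
        df = n - x.shape[1]
        sigma2 = resid @ resid / df                           # residual variance
        se = math.sqrt(sigma2 * np.linalg.inv(x.T @ x)[-1, -1])
        betas[j] = coef[-1]
        pvals[j] = math.erfc(abs(coef[-1] / se) / math.sqrt(2))  # normal approximation
    return betas, pvals


# Bonferroni genome-wide threshold for alpha = 0.05 over the study's 2,359,117 SNPs
threshold = 0.05 / 2_359_117
print(f"p < {threshold:.2e}  (-log10 p > {-math.log10(threshold):.2f})")

# Illustrative simulation: SNP 3 is causal; two random columns stand in for PCs
rng = np.random.default_rng(2)
geno = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
pcs = rng.normal(size=(200, 2))
pheno = 1.0 * geno[:, 3] + rng.normal(size=200)
betas, pvals = gwas_pc_adjusted(geno, pheno, pcs)
```

With α = 0.05, the Bonferroni threshold is about 2.1 × 10⁻⁸ (−log10 p ≈ 7.67). Because nearby SNPs are in LD, Bonferroni is conservative, and many studies instead use an effective number of independent markers. The threshold the authors actually applied is not restated here.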
This case is worth emulating because it combines four components that—together—produce publishable insights and practical outputs:
For teams planning similar projects, the value is not only the biological story of mulberry, but the reusable study design logic.
Below is a planning checklist written for researchers designing a population genomics + GWAS study (plants, aquaculture species, forestry trees, or non-model organisms).
If you want to run a mulberry-like workflow in your own organism, the project can be organized as modular deliverables—so you can start with population structure and expand into GWAS and marker development.
Typical service modules aligned to this case:
Tell us your species, target traits, cohort size, and whether you have existing WGS data to integrate. We’ll recommend a sequencing + analysis plan aligned to population structure, gene flow, and GWAS best practices.
For a project modeled after this published study, deliverables are typically packaged as a reproducible report plus machine-readable files:
Whole-genome resequencing across diverse accessions, followed by population structure analysis, gene flow testing (f3/D-statistics), and demographic inference (PSMC/SMC++), provides a high-resolution view of domestication and dispersal history.
A total of 425 accessions (290 newly resequenced + 135 integrated from earlier studies).
The study reported an average depth of ~20× for whole-genome resequencing.
2,359,117 SNPs and 934,187 short InDels (<10 bp).
Because domestication and cultivation often involve mixing among lineages; gene flow tests can reveal admixture that a tree alone may not capture.
The study used f3 statistics and ABBA–BABA (D-statistics).
PSMC and SMC++ were used to infer historical changes in effective population size (Ne).
Leaf size/biomass-related traits and flowering time.
Leaf size/biomass signals were highlighted on Chromosome 7, and flowering time signals on Chromosome 5.
The study proposed MaBXY5 and MaERF110 as potential key genes related to leaf traits and flowering time, respectively.
References: