High-dimensional omics data do not become simpler just because they are plotted in two dimensions. A PCA score plot, a PCoA map, and an NMDS ordination may all look like colored clusters on a page, but they are built on different mathematical rules. PCA asks which orthogonal directions retain the most variance in the original feature matrix. PCoA asks how a sample-to-sample distance matrix can be embedded into a Euclidean coordinate system. NMDS asks how well the rank order of dissimilarities can be preserved in a low-dimensional map. When these methods are treated as interchangeable, the figure may still look polished, but the biological interpretation can drift quickly.

That distinction matters more now because modern omics projects rarely stay within one data type. A single study may move from microbiome profiling to transcriptomics, then to host genotyping or multi-layer integration. These data types do not share the same geometry. A transformed expression matrix, a sparse abundance table, and a phylogeny-aware beta-diversity matrix are not three versions of the same input. They encode different structures and different assumptions. That is why ordination is not just a plotting decision. It is a modeling decision about what kind of biological difference should count.

A practical workflow starts with three questions. What is the input: a feature matrix or a distance matrix? What is the signal of interest: variance, ecological dissimilarity, or ranked separation? What level of distortion can be tolerated before the plot stops being trustworthy? Those questions matter whether the project begins with RNA-Seq profiling, a microbiome survey built on 16S/18S/ITS amplicon sequencing, or a population-scale design using whole genome SNP genotyping. The method should follow the geometry, not the other way around.

Figure 1. PCA, PCoA, and NMDS reduce complexity through different geometric rules: PCA projects variance from a feature matrix, PCoA embeds pairwise distances through eigendecomposition, and NMDS preserves ranked dissimilarities through iterative optimization.

PCA: The Linear Projection of Variance

PCA remains the default ordination in many omics workflows because its logic is clean and efficient. Start with a data matrix. Center the variables. Then decompose the matrix so that the first axis captures the largest possible variance, the second captures the next largest variance under orthogonality, and each later axis explains progressively less. In practice, this is usually done through singular value decomposition. Conceptually, the result is the same: principal components are orthogonal directions associated with descending variance.

This makes PCA especially useful when the matrix itself is the object of interest. In bulk transcriptomics, samples often vary along broad gradients shaped by tissue identity, treatment, batch, or cell-state composition. In host genetics, PCA is widely used to summarize ancestry structure and major allele-frequency trends across populations. When the data are dense and continuous after sensible preprocessing, PCA is often the clearest first view of structure. That is one reason it remains central in population-scale whole genome sequencing workflows, genotype screening, and large-scale expression profiling.

Mathematical foundation: SVD, covariance, and what PC1 actually means

PCA does not search for clusters first. It searches for directions that explain variance. If the centered data matrix is written as X, singular value decomposition expresses it as X = U Σ Vᵀ: the columns of V are orthogonal directions in feature space, and the singular values in Σ are their associated magnitudes. Those directions are the eigenvectors of the sample covariance matrix, while the associated eigenvalues (the squared singular values divided by n − 1) quantify how much variance is retained along each axis.

That distinction matters because a PCA score plot is not a map of ecological distance. It is a coordinate system optimized to summarize variation in the original variables. PC1 is not “the strongest biological difference” in a universal sense. It is the direction that captures the most variance under the assumptions of the transformed matrix. In a population genetics setting, that often aligns with ancestry or stratification. In expression studies, it may align with treatment, tissue, or batch. In both cases, the plot is useful because the covariance structure itself is meaningful.
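The link between SVD, covariance eigenvalues, and the score plot can be checked on a small example. The sketch below assumes NumPy is available and uses a made-up 6 x 4 matrix; the variable names are illustrative, not part of any particular pipeline.

```python
import numpy as np

# Toy feature matrix: 6 samples (rows) x 4 features (columns); values are made up.
X = np.array([
    [2.0, 0.5, 1.0, 3.1],
    [2.2, 0.7, 0.9, 3.0],
    [1.9, 0.4, 1.2, 2.8],
    [5.1, 2.9, 0.2, 0.9],
    [4.8, 3.1, 0.1, 1.1],
    [5.3, 3.0, 0.3, 1.0],
])

# Center each feature so the decomposition describes variance, not the mean.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt. Rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the sample covariance matrix are s**2 / (n - 1).
eigenvalues = s**2 / (X.shape[0] - 1)
explained = eigenvalues / eigenvalues.sum()

# Sample scores: the coordinates drawn in a PCA score plot.
scores = U * s  # same as Xc @ Vt.T

print("explained variance ratio:", np.round(explained, 3))
print("PC1/PC2 scores:\n", np.round(scores[:, :2], 3))
```

In this toy matrix the two groups of samples differ on every feature, so PC1 absorbs almost all of the variance. That is a property of the covariance structure, not evidence that PCA "found clusters".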

That is why PCA still plays an important role in population structure control in GWAS workflows and in genotype-focused analyses where hidden structure must be identified before downstream modeling. The method is fast, interpretable, and mathematically transparent. But those strengths depend on the geometry of the input.

The “double zero” problem: why sparse microbiome tables break PCA intuition

Microbiome abundance tables behave differently. They are sparse. They are compositional. They are uneven. Most importantly, zeros do not all mean the same thing. A taxon absent in two samples could reflect genuine biological absence, insufficient sequencing depth, taxonomic ambiguity, or a detection limit. PCA has no built-in way to separate those possibilities. In Euclidean space, shared zeros can make samples look close even when the ecological meaning of those zeros is weak or ambiguous.

This is the classic double-zero problem. It does not mean PCA is mathematically wrong. It means the geometry implied by raw or lightly normalized abundance tables may not match the biology that the reader thinks the plot is showing. Two samples can appear similar because they both lack many taxa, even though the informative part of their community structure lies elsewhere.
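A toy calculation makes the problem concrete. The counts below are invented for illustration, and Bray-Curtis is used only as one example of a dissimilarity that ignores shared absences; this is a sketch, not a recommendation for any specific dataset.

```python
import math

# Toy counts for six taxa (made-up values). Taxa 4-6 are absent in every sample.
a = [0, 4, 8, 0, 0, 0]
b = [0, 1, 1, 0, 0, 0]  # same two taxa as a, at lower counts
c = [1, 0, 0, 0, 0, 0]  # shares no taxon with a or b

def euclidean(x, y):
    """Straight-line distance; positions where both samples are zero add nothing."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared abundance."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / (sum(x) + sum(y))

for name, (x, y) in {"a-b": (a, b), "a-c": (a, c), "b-c": (b, c)}.items():
    print(name, "euclidean:", round(euclidean(x, y), 3),
          "bray-curtis:", round(bray_curtis(x, y), 3))
```

Samples b and c have no taxon in common, yet Euclidean distance places them closest together because both are near zero almost everywhere. Bray-Curtis assigns that pair the maximum dissimilarity and treats a and b, which share taxa, as the more similar pair.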

The problem becomes more serious because microbiome data are compositional. Relative abundances live on a simplex, not in unconstrained Euclidean space. When one taxon increases, the proportions of others must shift, even when their absolute quantities may not have changed in the same way. That induces covariance patterns that are partly mathematical artifacts of closure. If PCA is applied too casually, the leading axes may summarize compositional constraints as much as real ecological separation.
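The closure effect is easy to reproduce with arithmetic alone. In the sketch below the absolute abundances are hypothetical, and `close` is an illustrative helper name: only one taxon changes in absolute terms, but every relative abundance moves.

```python
# Hypothetical absolute abundances for three taxa, before and after one taxon blooms.
before = {"taxon_A": 100.0, "taxon_B": 100.0, "taxon_C": 100.0}
after = {"taxon_A": 700.0, "taxon_B": 100.0, "taxon_C": 100.0}  # only A changes

def close(counts):
    """Closure: rescale counts so they sum to 1 (relative abundance)."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}

rel_before = close(before)
rel_after = close(after)

for taxon in before:
    print(taxon,
          "absolute:", before[taxon], "->", after[taxon],
          "| relative:", round(rel_before[taxon], 3), "->", round(rel_after[taxon], 3))
```

Taxa B and C did not change in absolute terms, but their proportions fall from one third to one ninth. A covariance computed on the relative values would report a negative association with taxon A that comes from the constraint, not from the biology.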

This is one reason microbiome studies often move quickly toward distance-based ordination after the initial quality-control phase. Projects built around community-scale metagenomic shotgun sequencing or targeted profiling often need a geometry that respects ecological dissimilarity more directly than plain covariance on a sparse table can.

The horseshoe effect is a warning sign, not a decorative shape

A curved or arch-shaped PCA plot is often treated as a visual curiosity. It should be treated as a warning. When a long ecological gradient is compressed into a small number of linear axes, the projected samples can bend into a horseshoe. The plot may still look structured, but that structure is already telling you that the first two linear components are struggling to represent the underlying relationships faithfully.

This is where many interpretations go wrong. A reader sees an arch and starts telling a cluster story. The more careful reading is different: the data likely contain gradient-like or nonlinear structure that is being forced into a linear frame. In microbiome settings, that often means the first two PCs are not randomly distorted. They are revealing that the geometry is mismatched.

Once that happens, the right response is not to keep polishing the PCA plot. The right response is to ask whether the data should be analyzed through a biologically defined distance instead. In some composition-aware workflows, log-ratio transformations can make PCA-like analyses more defensible. But that does not erase the original lesson. A neat scatter plot does not guarantee that the ordination logic fits the data-generating process.

Figure 2. PCA can become unstable on sparse compositional microbiome tables because shared zeros do not imply shared biology. When long ecological gradients are forced into linear axes, arch- or horseshoe-like distortions may emerge. An arch is an interpretive warning, not a cluster claim.

When PCA is still defensible after transformation

The key question is not whether PCA is “allowed” for microbiome-like data. The key question is whether preprocessing has made Euclidean geometry sufficiently meaningful for the biological question at hand. If transformation reduces compositional distortion, stabilizes variance, and lowers the influence of zero inflation, PCA may still serve as a useful first-pass summary. But that use should be justified, not assumed.

Use PCA when the feature matrix itself is meaningful and Euclidean geometry remains defensible after preprocessing. That often includes transformed expression matrices, genotype-derived relationship structure, and other dense quantitative assay outputs. Be more cautious when the table is sparse, compositional, heavily zero-inflated, or strongly phylogenetically structured. In those cases, a visually tidy PCA may still be the wrong summary.

That rule matters for teams working across assay types. A project may begin with total RNA sequencing and later expand into microbial profiling or host-microbe integration. The ordination method should change when the geometry changes. Reusing the same plotting habit across all layers is convenient, but it is rarely optimal.

PCoA (Classical MDS): Integrating Non-Euclidean Distance Metrics

PCoA starts from a different premise. Instead of summarizing variance across original variables, it starts with a matrix of pairwise distances among samples. The core question is no longer “Which directions explain variance?” It becomes “How can these sample-to-sample dissimilarities be represented in a Euclidean coordinate system?” That shift sounds subtle, but it changes the biological meaning of the ordination from the ground up.

Once a distance matrix has been chosen, the ordination inherits the assumptions built into that distance. A Bray-Curtis PCoA is not just a generic community plot. It is a map of abundance-weighted ecological dissimilarity. A UniFrac PCoA is not merely another version of the same figure. It is a map of separation defined through phylogenetic branch structure. The same samples can therefore occupy very different positions depending on what counts as biologically meaningful difference.

This is why PCoA is so central in microbiome analysis. It gives researchers a formal way to move from complex community tables to an interpretable low-dimensional embedding without pretending that raw Euclidean variance on the original abundance matrix is always the right lens. That flexibility is often the difference between a plot that looks technical and a plot that is actually aligned with the biological question.

The power of choice: Bray-Curtis for abundance, UniFrac for phylogeny

Distance selection is the real center of gravity in PCoA. Bray-Curtis is widely used when the goal is to compare communities on the basis of abundance composition. It is sensitive to differences in counts or relative abundance structure. If two samples contain similar taxa but in very different proportions, Bray-Curtis will usually separate them clearly.

UniFrac changes the question. Instead of focusing only on abundance patterns, it incorporates phylogenetic branch length. Unweighted UniFrac is driven mainly by lineage presence or absence. Weighted UniFrac adds abundance weighting, so dominant lineages exert more influence on the final distance. These are not minor technical tweaks. They are different biological definitions of community difference.

That distinction becomes critical in practice. If two communities share the same broad clades but differ mainly in abundance distribution, Bray-Curtis and weighted UniFrac may show stronger separation than unweighted UniFrac. If rare but phylogenetically distinct lineages differ between groups, unweighted UniFrac may highlight a contrast that Bray-Curtis compresses. In other words, the distance metric defines the biological contrast before PCoA even begins.
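The way a metric defines the contrast can be shown without a phylogeny. The sketch below compares Bray-Curtis with a presence/absence Jaccard distance on invented counts. Jaccard is used here only as a tree-free stand-in for a presence/absence contrast; it is not UniFrac, which additionally requires a phylogenetic tree and branch lengths.

```python
def bray_curtis(x, y):
    """Abundance-based dissimilarity between two count vectors."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / (sum(x) + sum(y))

def jaccard_presence(x, y):
    """Presence/absence distance: 1 - (taxa in both) / (taxa in either)."""
    px = {i for i, v in enumerate(x) if v > 0}
    py = {i for i, v in enumerate(y) if v > 0}
    return 1 - len(px & py) / len(px | py)

# Pair 1 (toy data): identical taxa, very different proportions.
s1, s2 = [90, 5, 5, 0], [5, 5, 90, 0]
# Pair 2 (toy data): nearly identical proportions, but two extra rare taxa in s4.
s3, s4 = [60, 40, 0, 0], [58, 38, 2, 2]

print("pair 1  bray-curtis:", bray_curtis(s1, s2), " presence/absence:", jaccard_presence(s1, s2))
print("pair 2  bray-curtis:", bray_curtis(s3, s4), " presence/absence:", jaccard_presence(s3, s4))
```

The abundance-based metric separates pair 1 strongly and barely notices pair 2, while the presence/absence metric does the opposite. Neither answer is wrong; each metric encodes a different definition of community difference before any ordination is run.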

Figure 3 makes that idea concrete. It is not a three-box summary of metrics; it is a remapping exercise. Start with one cohort and keep the samples the same, then change the rule used to define biological distance. The map changes with the rule. Under Bray-Curtis, abundance shifts pull samples apart in one way. Under unweighted UniFrac, branch membership reshapes the same cohort in another way. Under weighted UniFrac, abundant lineages carry more weight and reshape the geometry again. The point is not that one map is more elegant. The point is that distance choice decides what kind of biological difference becomes visible.

This is especially important in studies using absolute quantitative amplicon strategies for community comparison, where abundance scaling changes interpretation.

Figure 3. The same samples can generate different maps because the metric defines the geometry. Bray-Curtis emphasizes abundance change, unweighted UniFrac emphasizes lineage presence or absence, and weighted UniFrac emphasizes abundance-weighted phylogenetic difference.

Negative eigenvalues: what they really mean

PCoA's ability to accept non-Euclidean dissimilarities is one of its most important strengths, and the consequences of that flexibility are among the most misunderstood. If the chosen distance matrix is not strictly Euclidean, the eigendecomposition can produce negative eigenvalues. This is not just a technical nuisance. It is a signal that the distance relationships cannot be represented perfectly in ordinary Euclidean space.

That matters because users often treat the first two axes as if they were neutral coordinates. They are not. If the distance structure itself resists Euclidean embedding, the low-dimensional map is already under pressure. Small negative eigenvalues may not destroy interpretability. Large negative components are more serious. They tell you that the distance geometry and the plotting geometry are not fully aligned.
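Negative eigenvalues are easy to produce and inspect. The following minimal sketch of classical PCoA (Gower double-centering followed by eigendecomposition) assumes NumPy; the function name `pcoa` and the 3 x 3 toy matrix are illustrative. The matrix deliberately violates the triangle inequality, so no exact Euclidean embedding exists.

```python
import numpy as np

def pcoa(D):
    """Classical PCoA: Gower double-centering, then eigendecomposition."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                # Gower-centered matrix
    eigvals, eigvecs = np.linalg.eigh(G)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                     # only axes with real coordinates
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, coords

# Toy dissimilarities where d(0,2) = 3 exceeds d(0,1) + d(1,2) = 2.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])

eigvals, coords = pcoa(D)
negative_share = np.abs(eigvals[eigvals < 0]).sum() / np.abs(eigvals).sum()

print("eigenvalues:", np.round(eigvals, 4))
print("share of absolute eigenvalue mass that is negative:", round(negative_share, 4))
```

A diagnostic such as `negative_share` is one simple way to judge whether the negative part is small enough to ignore. It is an ad hoc summary for illustration, not a standard threshold.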

Corrections such as Lingoes or Cailliez can be useful because they add a constant to the off-diagonal dissimilarities (to the squared dissimilarities in the Lingoes case, to the dissimilarities themselves in the Cailliez case) until the matrix can be embedded in Euclidean space without negative eigenvalues. But they should not be used as cosmetic fixes. The real question is always biological and geometric at the same time: does the corrected map still represent the distinction you actually care about, or has the method been forced into compliance at the cost of meaning?
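As a sketch of what such a correction does, the Lingoes approach can be written in a few lines. It assumes NumPy and the standard formulation in which twice the absolute value of the most negative eigenvalue is added to every squared off-diagonal dissimilarity; the helper names are illustrative.

```python
import numpy as np

def gower_eigenvalues(D):
    """Eigenvalues (ascending) of the Gower-centered matrix used by PCoA."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-0.5 * J @ (D ** 2) @ J)

def lingoes(D):
    """Lingoes correction: d' = sqrt(d**2 + 2*c) off the diagonal,
    where c is the absolute value of the most negative eigenvalue."""
    c = max(0.0, -gower_eigenvalues(D).min())
    D2 = D ** 2 + 2.0 * c
    np.fill_diagonal(D2, 0.0)                  # self-distances stay zero
    return np.sqrt(D2), c

# Toy dissimilarities that violate the triangle inequality.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])

D_corrected, c = lingoes(D)
print("constant c:", round(c, 4))
print("eigenvalues before:", np.round(gower_eigenvalues(D), 4))
print("eigenvalues after: ", np.round(gower_eigenvalues(D_corrected), 4))
print("corrected distances:\n", np.round(D_corrected, 3))
```

The negative eigenvalue disappears, but every pairwise distance has been inflated to achieve that. Whether the corrected distances still mean what the original metric meant is the judgment call described above.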

The hidden driver of PCoA stability: normalization before ordination

A common mistake is to argue about Bray-Curtis versus UniFrac while treating normalization as an afterthought. In reality, normalization changes the geometry before ordination ever starts. That means cluster stability is often shaped as much by preprocessing as by the ordination method itself.

Consider the difference between total-sum scaling (TSS) and centered log-ratio (CLR) workflows. TSS divides each count by the sample total, which keeps the data in a relative-abundance frame and often pairs naturally with Bray-Curtis or UniFrac. CLR takes the log of each value relative to the sample's geometric mean, which moves the data into log-ratio geometry and makes Euclidean or Aitchison-style interpretations more defensible for compositional data. The cluster pattern may change substantially between these workflows. That does not automatically mean one plot is wrong. It means the meaning of distance has changed upstream.
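The two transformations can be compared directly on toy counts. In this sketch the function names, the counts, and the pseudocount of 0.5 are all assumptions for illustration; how zeros are handled before a log-ratio transform is itself a modeling choice.

```python
import math

def tss(counts):
    """Total-sum scaling: convert counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log of each value minus the mean log of the sample."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

shallow = [120, 30, 10, 850]          # toy counts, no zeros
deep = [10 * c for c in shallow]      # same community, 10x sequencing depth
other = [400, 300, 10, 300]           # a different toy community

clr_shallow, clr_deep = clr(shallow), clr(deep)
aitchison = euclidean(clr(shallow), clr(other))   # Euclidean distance on CLR values
with_zero = clr([120, 30, 0, 850], pseudocount=0.5)

print("TSS:", [round(v, 3) for v in tss(shallow)])
print("CLR:", [round(v, 3) for v in clr_shallow])
print("Aitchison distance shallow vs other:", round(aitchison, 3))
```

Both transformations remove sequencing depth, so `shallow` and `deep` map to the same point either way. The difference is the geometry that follows: TSS values are bounded proportions, while CLR values are unbounded log-ratios on which Euclidean distance corresponds to Aitchison distance.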

This is where many analysts lose interpretive control. They compare two ordination plots and ask which method is more stable. The deeper question is different: which preprocessing-plus-distance combination best matches the biological contrast under study? If the experiment is about ecological turnover, one answer may be best. If it is about compositional balance among taxa, another may be better.

That issue becomes even more important in programs that combine multi-omics analysis with microbial community profiling, or in designs that pair full-length transcript sequencing (Iso-Seq) with phenotype-level comparisons across modalities. The ordination plot is only the visible end of a much longer assumption chain. If the upstream choices distort the geometry, the downstream figure cannot repair it.

NMDS: Rank-Based Non-Linear Ordination

NMDS is best understood as a rank-first method. It does not try to preserve the exact numerical value of every pairwise distance. It tries to preserve the order of those distances as faithfully as possible in a lower-dimensional space. That difference is not technical trivia. It changes what the plot can and cannot promise.

In practical terms, NMDS starts with a dissimilarity matrix and asks a different question than PCoA: if sample A is more similar to B than to C in the original dissimilarity matrix, can that ordering be preserved in two or three dimensions? The algorithm does not solve this in one clean decomposition step. It searches iteratively. It proposes a configuration, measures how badly the low-dimensional distances disagree with the original rank order, updates the configuration, and repeats. That is why NMDS is often more flexible for messy ecological data, but also more computationally demanding and more dependent on convergence quality.

This flexibility is one reason NMDS remains attractive in microbiome and community ecology. Sparse, nonlinear, and semimetric relationships do not always behave well when forced into exact Euclidean coordinates. In those cases, preserving the rank structure of dissimilarity can be more honest than pretending that every absolute distance can be represented faithfully. For datasets generated through microbial identification workflows, exploratory community comparisons, or mixed ecological abundance surveys, that can make NMDS a useful alternative when PCA and PCoA begin to overstate geometric precision.

The iterative search: what NMDS is actually optimizing

NMDS is sometimes described too loosely as “a nonlinear ordination method.” That description is true, but not useful enough. The real center of the method is the stress function. Stress measures the mismatch between ranked dissimilarities in the original space and the distances among points in the final low-dimensional configuration. Lower stress means the two-dimensional or three-dimensional map is doing a better job of respecting the original ordering of sample relationships.
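To make the stress function less abstract, the sketch below evaluates Kruskal's stress-1 for a fixed configuration. It is not a full NMDS optimizer; it only scores a given layout. The monotone regression step uses a small pool-adjacent-violators routine, ties in the dissimilarities are left in sorted order, and all names and data are illustrative.

```python
import math
from itertools import combinations

def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to the sequence y."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while a block's mean exceeds the mean of the next block.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress1(dissim, coords):
    """Stress-1: mismatch between map distances and the rank order of dissimilarities."""
    pairs = sorted(combinations(range(len(coords)), 2),
                   key=lambda p: dissim[p[0]][p[1]])
    d = [math.dist(coords[i], coords[j]) for i, j in pairs]   # map distances
    d_hat = pava(d)                                           # best monotone fit
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, d_hat)) / sum(a ** 2 for a in d))

# Toy dissimilarities among four samples (made-up values).
dissim = [[0, 1, 3, 7],
          [1, 0, 2, 6],
          [3, 2, 0, 4],
          [7, 6, 4, 0]]

faithful = [(0, 0), (1, 0), (3, 0), (7, 0)]    # preserves the rank order exactly
scrambled = [(7, 0), (1, 0), (3, 0), (0, 0)]   # samples 0 and 3 swapped

stress_faithful = kruskal_stress1(dissim, faithful)
stress_scrambled = kruskal_stress1(dissim, scrambled)
print("faithful layout stress: ", round(stress_faithful, 3))
print("scrambled layout stress:", round(stress_scrambled, 3))
```

Both layouts would look equally tidy as scatter plots. Only the stress value reveals that the second one misorders the sample relationships.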

This also explains why NMDS plots should never be judged by appearance alone. A clean-looking separation on paper does not prove that the ordination is faithful. Two layouts may look equally interpretable, but one may preserve ranked relationships far better than the other. Unlike PCA, which offers explained variance, or PCoA, which offers eigenvalue-based summaries, NMDS asks the analyst to look directly at goodness of fit.

The iterative nature of the method has two practical consequences. First, different random starts can converge to slightly different solutions, especially in noisy or weakly structured data. Second, the dimensionality choice matters immediately. Forcing a difficult dataset into two dimensions may produce a visually attractive plot with unacceptable distortion, whereas moving to three dimensions may reduce stress enough to make the structure interpretable again. This is why NMDS is not just a fallback method. It is a method that demands active diagnostic judgment.

Interpreting stress: why stress > 0.2 is the danger zone

Stress is often reported, then ignored. It should be treated as the central warning label on the plot. Low stress suggests that the low-dimensional configuration is representing the original ranked dissimilarities reasonably well. Moderate stress calls for caution. Once stress moves above about 0.2, the biological story attached to the plot becomes much harder to defend with confidence.

That does not mean a high-stress NMDS is useless. It means the reader should stop treating the figure as a faithful spatial map. At that point, the ordination may still suggest broad tendencies, but fine-grained claims about cluster separation, intermediate positioning, or trajectory-like relationships become risky. A point that appears visually close to another point may not truly represent a trustworthy neighborhood in the original data structure.

The temptation is to keep the two-dimensional figure because it fits the page. That is often the wrong decision. If stress is too high, the response should be methodological rather than cosmetic. Increase the dimensionality. Reconsider the dissimilarity metric. Revisit filtering or normalization. Or ask whether PCoA is more appropriate if the distance structure is stable enough for eigen-based embedding. A plot should not stay in the paper just because it is visually convenient.

Figure 4. NMDS searches iteratively for a low-dimensional configuration that preserves ranked dissimilarities rather than exact distances. Stress below about 0.1 usually supports a faithful visual summary, values between 0.1 and 0.2 require caution, and stress above 0.2 signals a danger zone for biological interpretation.

When rank preservation is safer than exact embedding

NMDS is a strong choice when exact metric faithfulness is less realistic than rank faithfulness, when ecological relationships are nonlinear, or when the analyst wants a method that is less tied to linear variance logic than PCA. It is especially useful when the underlying dissimilarity structure carries information that would likely be distorted by forcing a fully Euclidean embedding.

But NMDS should not be treated as an automatic upgrade over PCoA. If the distance matrix is biologically well chosen and behaves reasonably in Euclidean space, PCoA often gives a more directly interpretable coordinate system with less computational burden. NMDS earns its place when rank preservation is the safer target than exact embedding.

PCA vs. PCoA vs. NMDS for large omics cohorts

The practical differences among the three methods become sharper as datasets grow. PCA is typically the most computationally efficient when the input is a dense feature matrix and preprocessing has already produced a defensible Euclidean structure. PCoA adds flexibility because it accepts a distance matrix, but that flexibility comes at the cost of metric dependence and possible non-Euclidean behavior. NMDS is the most tolerant of awkward ranked dissimilarity structures, but it is usually the most computationally expensive and the most sensitive to convergence details as sample count rises.

For cohorts above 10,000 samples, that tradeoff becomes real. PCA can often remain tractable if the feature matrix is well managed. PCoA may become expensive because the full pairwise distance matrix scales badly in memory and computation. NMDS can become the slowest option of all, especially if multiple starts are needed for stable convergence. In other words, the best ordination is not just the one with the nicest assumptions. It is the one whose assumptions remain defensible at the scale of the study.
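The memory side of that tradeoff is simple arithmetic. The sketch below assumes 8-byte floats and ignores any working copies an algorithm may need, so real requirements are typically higher; the function name is illustrative.

```python
def distance_matrix_gib(n_samples, bytes_per_value=8, condensed=False):
    """Memory needed to hold pairwise distances among n samples, in GiB."""
    if condensed:
        n_values = n_samples * (n_samples - 1) // 2   # upper triangle only
    else:
        n_values = n_samples ** 2                     # full square matrix
    return n_values * bytes_per_value / 2 ** 30

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} samples: full {distance_matrix_gib(n):10.3f} GiB, "
          f"condensed {distance_matrix_gib(n, condensed=True):10.3f} GiB")
```

Because storage grows with the square of the sample count, a tenfold increase in cohort size costs roughly a hundredfold more memory for the distance matrix alone, before the eigendecomposition or the iterative NMDS search begins.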

That matters for larger programs based on bacterial RNA sequencing, high-throughput community profiling, or multi-batch omics integration, where analysts often move from a few dozen samples to hundreds or thousands. Method selection that seems trivial in a pilot study can become a real computational bottleneck in production.

2026 Perspective: Beyond Ordination to Embedding

By 2026, the question is no longer whether classical ordination methods still matter. They do. The real question is where they stop being enough. UMAP and t-SNE have expanded the visualization toolkit, especially in single-cell and manifold-like high-dimensional datasets, but they should not be treated as universal replacements for PCA, PCoA, or NMDS. They answer a different class of questions.

Classical ordination methods remain valuable because they are tied to explicit geometry. PCA is tied to variance in the input matrix. PCoA is tied to the chosen distance matrix. NMDS is tied to the ranked structure of dissimilarity. That makes them especially important in settings where interpretation must stay close to biological meaning. In beta-diversity analysis, for example, the investigator often needs the ordination to reflect an explicit ecological or phylogenetic distance definition. In that setting, PCoA still has a clearer inferential anchor than UMAP.

UMAP and t-SNE become more attractive when the goal shifts from interpretable ordination to neighborhood visualization. They are often useful for local topology, manifold unfolding, and visually exploring fine-grained structure in very high-dimensional data. But local neighbor preservation is not the same thing as biologically transparent distance interpretation. A UMAP cluster may be useful for discovery. It is not automatically a substitute for a beta-diversity ordination whose geometry was defined deliberately through Bray-Curtis, UniFrac, or Aitchison logic.

Local vs. global topology: when to use PCoA for beta-diversity and UMAP for trajectory-like structure

A useful rule is this: if the scientific question depends on a defined dissimilarity metric, stay close to ordination. If the scientific question depends more on local manifold structure, neighborhood continuity, or trajectory-like variation, embedding methods may become more informative.

That distinction matters across assay types. In microbiome analysis, beta-diversity comparisons often benefit from PCoA because the interpretation is directly tied to the metric. In transcriptomics or single-cell style applications, UMAP may reveal local transitions or subtype neighborhoods more effectively than PCA or PCoA. But even there, PCA often remains an important upstream step because it provides a stable, denoised representation before manifold learning begins.

For projects that connect 10x spatial transcriptome sequencing or other high-resolution transcriptomic strategies to downstream phenotype analysis, it is increasingly common to use both families of methods in the same workflow: ordination for interpretable structure, embedding for local exploration.

Integration strategies: connecting genotype space and phenotype space

One of the most useful 2026 extensions of ordination logic is not a new plotting method at all. It is cross-space comparison. A genotype PCA and a phenotype- or microbiome-based PCoA may each be valid on their own, but the deeper biological question is often whether their structures are related. Do ancestry gradients align with community structure? Do host genetic clusters track phenotypic ordination? Do treatment-associated transcriptomic shifts mirror ecological movement in a microbial distance space?

This is where Procrustes analysis becomes useful. Instead of pretending that genotype space and phenotype space are directly comparable coordinates, Procrustes asks how well one low-dimensional configuration can be translated, rotated, and scaled to align with another. The logic is simple: keep each ordination honest to its own geometry first, then test concordance between them afterward.
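A minimal sketch of the Procrustes statistic follows, assuming NumPy and two ordinations that already share the same samples in the same row order. The function name, the coordinates, and the "genotype" and "microbiome" labels are hypothetical. The SVD-based fit used here allows reflections as well as rotations, and no permutation test for significance is included.

```python
import numpy as np

def procrustes_m2(A, B):
    """Symmetric Procrustes statistic m^2: 0 = perfect match, values near 1 = poor match."""
    A0 = np.asarray(A, dtype=float)
    B0 = np.asarray(B, dtype=float)
    A0 = A0 - A0.mean(axis=0)                  # remove translation
    B0 = B0 - B0.mean(axis=0)
    A0 = A0 / np.linalg.norm(A0)               # remove overall scale
    B0 = B0 / np.linalg.norm(B0)
    # Singular values of A0.T @ B0 give the best agreement any rotation can reach.
    s = np.linalg.svd(A0.T @ B0, compute_uv=False)
    return 1.0 - s.sum() ** 2

# Hypothetical 2-D coordinates for five samples from a genotype PCA.
genotype_pcs = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [4.0, 3.0], [7.0, 2.0]])

# A microbiome PCoA that is the same shape, but rotated, rescaled, and shifted.
theta = np.deg2rad(40)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
microbiome_pcoa = 3.0 * genotype_pcs @ R + np.array([10.0, -5.0])

# The same coordinates with sample labels scrambled: structure no longer corresponds.
scrambled = microbiome_pcoa[[4, 2, 0, 1, 3]]

m2_matched = procrustes_m2(genotype_pcs, microbiome_pcoa)
m2_scrambled = procrustes_m2(genotype_pcs, scrambled)
print("m^2 matched:  ", round(m2_matched, 6))
print("m^2 scrambled:", round(m2_scrambled, 6))
```

The matched pair scores essentially zero even though the raw coordinates look nothing alike, because the arbitrary orientation, scale, and origin of each ordination are removed before comparison. The scrambled pair scores poorly because the sample-to-sample correspondence is gone.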

That approach is especially relevant in programs that combine variant calling, host genotyping, and microbial community profiling, where genotype-derived structure is compared with phenotype space, or in broader population evolution studies where phenotype and ancestry structure may only partially overlap. The key advantage is conceptual clarity. You do not force one data type into the wrong geometry just to make the pictures look comparable. You compare structures after each has been modeled appropriately.

The emerging decision rule

The most useful 2026 rule is therefore not “Use the newest method.” It is “Match the geometry to the biological question.”

Use PCA when the input is a feature matrix and the goal is to summarize dominant variance in a defensible Euclidean space. Use PCoA when the biological question is defined through a distance matrix and the resulting map should inherit that definition. Use NMDS when ranked relationships are more trustworthy than exact metric embedding. Use UMAP or t-SNE when the goal is local neighborhood visualization rather than explicit distance-based ordination.

That is the real shift from definitions to decisions. Researchers do not need another page telling them that PCA is linear and NMDS is nonlinear. They need a way to decide which geometry belongs to which dataset, and how to recognize when a clean-looking plot is telling the wrong story.

Method selection matrix

| Input / Goal | Best-fit Method | Why it fits | Main warning sign |
| --- | --- | --- | --- |
| Dense feature matrix, variance summary | PCA | Directly models variance structure in the transformed matrix | Compositional distortion, zero inflation, or arch effects |
| Distance-defined beta-diversity comparison | PCoA | Preserves the biological meaning built into the metric | Negative eigenvalues and preprocessing dependence |
| Rank-preserving ecological separation | NMDS | More robust when ranked dissimilarity matters more than exact embedding | High stress and unstable convergence |
| Local neighborhood exploration | UMAP / t-SNE | Useful for manifold-like local structure and subtype exploration | Not a direct substitute for explicit distance-based ordination |

Don’t pick PCA just because it is familiar. Don’t pick PCoA without interrogating the metric. Don’t trust NMDS without reading the stress. Don’t use UMAP as a drop-in replacement for a defined beta-diversity ordination.

Figure 5. Method choice should follow data geometry and analytical goal together. Feature matrices favor PCA, biologically defined distance matrices favor PCoA, rank-sensitive ecological dissimilarity can favor NMDS, and local-manifold visualization can favor UMAP or t-SNE.

Key Takeaways

  • Feature matrix vs. distance matrix is the first fork in the road.
  • Exact embedding vs. rank preservation is the second.
  • Interpretability vs. local visualization is the third.
  • A clean-looking plot is not automatically a trustworthy biological summary.
  • In omics, the best ordination is the one whose geometry matches the question.

FAQ

1. What is the main difference between PCA, PCoA, and NMDS?

PCA works on a feature matrix and summarizes variance along orthogonal axes. PCoA works on a distance matrix and embeds sample dissimilarities into coordinates. NMDS works on ranked dissimilarities and prioritizes preserving their order rather than exact values.

2. Why is PCA often a poor fit for sparse microbiome data?

Because sparse abundance tables contain many zeros, and shared zeros do not always imply biological similarity. In addition, microbiome data are compositional, so Euclidean covariance structure can misrepresent ecological relationships.

3. When should I use Bray-Curtis instead of UniFrac?

Use Bray-Curtis when abundance composition is the main contrast of interest. Use UniFrac when phylogenetic structure matters. Choose weighted or unweighted UniFrac depending on whether lineage abundance should influence the distance.

4. What do negative eigenvalues in PCoA mean?

They indicate that the chosen distance matrix is not fully Euclidean. Small negative eigenvalues may be tolerable, but larger ones suggest that the low-dimensional Euclidean embedding is straining to represent the original distance structure faithfully.

5. What does NMDS stress actually tell me?

Stress measures how well the low-dimensional NMDS configuration preserves the ranked dissimilarities from the original data. Lower stress means a more faithful ordination. Higher stress means the map is more distorted and should be interpreted more cautiously.

6. Is NMDS always better than PCoA for ecological data?

No. NMDS is more flexible when rank preservation is the main priority, but PCoA is often more interpretable and computationally efficient if the distance matrix behaves reasonably in Euclidean space.

7. Should UMAP replace PCoA in beta-diversity analysis?

Usually no. UMAP is excellent for local neighborhood visualization, but beta-diversity analysis often requires an ordination whose geometry is directly tied to a defined biological distance metric. That is where PCoA still has a stronger interpretive foundation.

8. Can PCA and PCoA be used together in the same study?

Yes. A study may use PCA for genotype or expression matrices and PCoA for ecological or phylogeny-aware distance matrices. Their results can then be compared through methods such as Procrustes analysis.

References:

  1. Pearson K. On lines and planes of closest fit to systems of points in space. Philosophical Magazine. 1901;2(11):559-572. DOI:10.1080/14786440109462720
  2. Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 1933;24(6):417-441. DOI:10.1037/h0071325
  3. Gower JC. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 1966;53(3-4):325-338. DOI:10.1093/biomet/53.3-4.325
  4. Kruskal JB. Nonmetric multidimensional scaling: a numerical method. Psychometrika. 1964;29(2):115-129. DOI:10.1007/BF02289694
  5. Kruskal JB. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika. 1964;29(1):1-27. DOI:10.1007/BF02289565
  6. Bray JR, Curtis JT. An ordination of the upland forest communities of southern Wisconsin. Ecological Monographs. 1957;27(4):325-349. DOI:10.2307/1942268
  7. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology. 2005;71(12):8228-8235. DOI:10.1128/AEM.71.12.8228-8235.2005
  8. Aitchison J. The statistical analysis of compositional data. Journal of the Royal Statistical Society Series B. 1982;44(2):139-177. DOI:10.1111/j.2517-6161.1982.tb01195.x
  9. Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome datasets are compositional: and this is not optional. Frontiers in Microbiology. 2017;8:2224. DOI:10.3389/fmicb.2017.02224
  10. Martino C, Morton JT, Marotz CA, et al. A novel sparse compositional technique reveals microbial perturbations. mSystems. 2019;4(1):e00016-19. DOI:10.1128/mSystems.00016-19
  11. Xu W, Wang H, Zeng Y. The horseshoe-like effect of principal component analysis in single-cell RNA-seq data. Bioinformatics Advances. 2024;4(1):vbae109. DOI:10.1093/bioadv/vbae109
  12. Peres-Neto PR, Jackson DA, Somers KM. How many principal coordinates should be retained for ecological studies? Computers & Geosciences. 2005;31(10):1217-1223. DOI:10.1016/j.cageo.2005.03.002
  13. Legendre P, Anderson MJ. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs. 1999;69(1):1-24. DOI:10.1890/0012-9615(1999)069[0001:DBRATM]2.0.CO;2
  14. Borg I, Groenen PJF. Modern Multidimensional Scaling: Theory and Applications. 2nd ed. Springer; 2005. DOI:10.1007/0-387-28981-X

Disclaimer: This article discusses analytical logic for research-use-only omics datasets and does not constitute clinical, diagnostic, or therapeutic guidance.

