Spatial Transcriptomics and Proteomics Integration: Methods, Tools, and Applications


A tissue section contains the full molecular story of its biology—but no single technology reads every chapter. Spatial transcriptomics captures which genes are being expressed and where, offering a transcriptome-wide view of cell state across tissue architecture. Spatial proteomics, in contrast, detects the proteins that execute cellular functions, often at single-cell or subcellular resolution.

The relationship between these two layers is weaker than most researchers expect. Across multiple tissue types and platforms, the correlation between mRNA abundance and protein abundance measured from the same spatial region typically falls between 0.4 and 0.6^[8]. Read as correlation coefficients, those values imply that transcript levels alone account for roughly 16% to 36% of the protein variance (the squared correlation) in a given spot or cell, well under half.
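
To make the arithmetic behind this statement concrete, the sketch below squares the two ends of the reported correlation range and then computes the same quantity on synthetic data. It assumes the reported values are Pearson coefficients; the function name `rna_protein_concordance` and the simulated vectors are illustrative only and do not come from the cited study.

```python
import numpy as np

def rna_protein_concordance(rna, protein):
    """Pearson r and r^2 between matched RNA and protein values across spots."""
    rna = np.asarray(rna, dtype=float)
    protein = np.asarray(protein, dtype=float)
    r = np.corrcoef(rna, protein)[0, 1]
    return r, r ** 2

# Variance explained at the two ends of the reported correlation range
for r_reported in (0.4, 0.6):
    print(f"r = {r_reported:.1f} -> r^2 = {r_reported ** 2:.2f}")

# Synthetic spots (assumption): protein tracks RNA with a true correlation of 0.5
rng = np.random.default_rng(0)
rna = rng.normal(size=2000)
protein = 0.5 * rna + np.sqrt(1 - 0.5 ** 2) * rng.normal(size=2000)
r, r2 = rna_protein_concordance(rna, protein)
print(f"simulated r = {r:.2f}, r^2 = {r2:.2f}")
```

On real data the same function would be applied per gene-protein pair, using values from matched spots or cells.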

This gap is not measurement noise—it reflects real biology. Post-transcriptional regulation, variable translation efficiency, differential protein degradation rates, and a wide array of post-translational modifications all contribute to the divergence between what a cell’s transcriptome says it could do and what its proteome says it is doing^[8][9]. The gap is especially pronounced in the tumor microenvironment, where immune checkpoint proteins can be virtually absent at the RNA level while showing strong expression at the protein level in specific spatial niches^[8].

Integrating these two modalities bridges the gap. It provides a view that is simultaneously broad (transcriptome-wide discovery) and functionally resolved (protein-level validation), making it possible to ask questions that neither layer alone can answer.

Figure 1. Spatial transcriptomics and proteomics integration reconciles transcriptome-wide discovery with protein-level functional validation within intact tissue architecture.

Two Experimental Paths to Paired Data

Before any computational integration can happen, the data must be generated. Two broad experimental strategies exist for collecting spatial transcriptomics and proteomics data from the same tissue, each with distinct trade-offs.

Dimension	Same-Section Integration	Serial-Section Integration
How it works	RNA and protein detected from the same tissue section	Adjacent sections processed independently for each modality
Spatial alignment	Inherent—shared coordinates	Computational image registration required
Platforms	DBiT-seq, Stereo-CITE-seq, IN-DEPTH (PCF → Xenium)	Visium + CODEX/IMC, Xenium + COMET
Key advantage	Pixel-perfect co-localization, no registration error	Flexibility to use best-in-class platforms per modality
Trade-off	Higher technical complexity, fewer protein targets per panel	Registration errors propagate through analysis
Best suited for	High-precision cell-cell colocalization studies	Broad discovery with flexible platform choices

Same-section integration is the current gold standard for studies that require high-confidence spatial co-localization. Technologies like IN-DEPTH (IN-situ DEtailed Phenotyping To High-resolution transcriptomics) run spatial proteomics first on a section, then perform spatial transcriptomics on the same slide, preserving both RNA and protein signals^[8]. Stereo-CITE-seq combines transcriptome-wide spatial RNA detection with protein antibody tags on the same slide^[1], while DBiT-seq offers a microfluidic approach to deliver both RNA and protein probes across the same tissue surface^[1].

Serial-section integration trades some alignment precision for experimental flexibility. A researcher can run, for example, 10x Visium for spatial transcriptome sequencing on one section and CODEX (imaging-based spatial proteomics) on the immediately adjacent section, then co-register the two data sets computationally^[9]. This approach allows each modality to be processed under its optimal conditions and is compatible with a wider range of sample types, including FFPE tissues. The trade-off is that tissue deformation during sectioning, fixation, and staining introduces subtle alignment errors that can compound across the analysis.
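
As a rough illustration of the co-registration step, the sketch below fits a 2D affine transform between matched landmarks on two adjacent sections by least squares and reports the residual error. It is a minimal sketch, not the pipeline used in the cited studies: the landmark coordinates are simulated, the function names are hypothetical, and real tissue often needs non-rigid registration on top of an affine fit.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src landmarks onto dst landmarks."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])       # (n, 3): x, y, 1
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)   # (3, 2)
    return params

def apply_affine(params, pts):
    """Apply a fitted affine transform to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ params

def registration_rmse(params, src, dst):
    """Root-mean-square landmark distance after registration."""
    resid = apply_affine(params, src) - np.asarray(dst, dtype=float)
    return float(np.sqrt((resid ** 2).sum(axis=1).mean()))

# Simulated landmarks (assumption): the RNA section is rotated by 3 degrees,
# shifted, and perturbed by 2-unit noise relative to the protein section.
rng = np.random.default_rng(1)
landmarks_protein = rng.uniform(0, 1000, size=(12, 2))
theta = np.deg2rad(3.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
landmarks_rna = (landmarks_protein @ rot.T + np.array([15.0, -8.0])
                 + rng.normal(scale=2.0, size=(12, 2)))

params = fit_affine(landmarks_protein, landmarks_rna)
rmse = registration_rmse(params, landmarks_protein, landmarks_rna)
print(f"landmark RMSE after affine registration: {rmse:.2f}")
```

Reporting a residual such as this RMSE alongside downstream results is one simple way to document how much alignment uncertainty the analysis carries.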

Figure 2. Two experimental paths to paired spatial transcriptomics and proteomics data: same-section integration (shared coordinates) versus serial-section integration (computational registration).

Four Computational Paradigms at a Glance

Once paired data are in hand, the next question is how to extract shared biological signals while preserving modality-specific information. Current computational approaches fall into four broad paradigms, summarized below.

Paradigm	Representative Tools	Core Strength	Key Limitation	Best Applied When
Graph neural networks	SpatialGlue, STAGATE	Learns from spatial neighborhood structure	Sensitive to graph construction parameters	Tissue with clear spatial domains (cortex layers, tumor boundaries)
Matrix factorization	MOFA+, MEFISTO	Highly interpretable factors	Linear or near-linear assumptions	Exploratory studies needing interpretable latent dimensions
Deep learning / VAE	scMM, TotalVI, SpaOmicsVAE	Captures non-linear cross-modal relationships	Requires larger datasets; less interpretable	Complex multi-omics with non-linear regulation
Spatial-prior integration	SpatialCOC	Robust to noise; flexible dimensions	Newer; fewer validation benchmarks	Noisy data, cross-slice integration, arbitrary modalities

The following sections walk through each paradigm, with representative algorithms and practical considerations for choosing among them.

Figure 3. Four computational paradigms for spatial transcriptomics-proteomics integration, each with representative tools and core strengths.

Graph Networks and Spatial Topology

Graph neural networks (GNNs) are a natural fit for spatial multi-omics data because they treat tissue sections as graphs: each spot or cell is a node, and spatial adjacency defines the edges. This structure lets the model incorporate tissue architecture directly into the learning process.

SpatialGlue uses a dual-attention mechanism to integrate multiple omics layers^[2]. Within each modality, it builds a spatial proximity graph and a feature similarity graph, then weights their contributions adaptively. A second level of attention learns how much each modality should contribute at each spatial location. The result is a joint embedding that respects both molecular similarity and tissue structure. In benchmark tests on Stereo-CITE-seq (RNA + protein) data from mouse thymus and spleen, SpatialGlue consistently outperformed earlier methods including Seurat, MOFA+, and TotalVI across multiple metrics^[2].

STAGATE takes a related but distinct approach, using a graph attention autoencoder that learns which neighbor spots are most informative for defining spatial domains. It handles tissues with gradient boundaries—where cell states transition gradually rather than forming sharp edges—better than methods that assume discrete compartments.

Practical considerations. Graph construction parameters (neighbor count, distance metric, edge-weighting scheme) are not preprocessing details—they directly affect the results. A k-nearest-neighbor graph that is too dense can oversmooth domain boundaries, while one that is too sparse can fragment coherent regions. A practical step is to run sensitivity checks with at least two neighbor settings and report whether major conclusions hold across both.
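
The sensitivity check described above can be sketched with a brute-force k-nearest-neighbor graph. The example builds graphs at two neighbor settings on a hypothetical grid of spots with two known domains and measures the fraction of edges that cross the domain boundary, used here as a rough proxy (an assumption, not a published metric) for how much signal a graph would mix across domains. None of this reproduces the graph construction inside SpatialGlue or STAGATE.

```python
import numpy as np

def knn_edges(coords, k):
    """Return an (n, k) array of nearest-neighbor indices per spot (brute force)."""
    coords = np.asarray(coords, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a spot is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def cross_domain_edge_fraction(neighbors, labels):
    """Fraction of graph edges that connect spots with different domain labels."""
    labels = np.asarray(labels)
    return float((labels[neighbors] != labels[:, None]).mean())

# Hypothetical 20 x 20 grid of spots with two domains split at x = 10
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel(), ys.ravel()])
labels = (coords[:, 0] >= 10).astype(int)

frac_sparse = cross_domain_edge_fraction(knn_edges(coords, k=4), labels)
frac_dense = cross_domain_edge_fraction(knn_edges(coords, k=24), labels)
print(f"cross-domain edges, k=4: {frac_sparse:.3f}; k=24: {frac_dense:.3f}")
```

The denser graph connects more spots across the boundary, which is the mechanism behind oversmoothing. For large datasets a spatial index would replace the brute-force distance matrix.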

Shared Factors from Noisy Measurements

Matrix factorization approaches take a different starting point. Rather than encoding spatial structure explicitly, they assume that the observed omics layers are generated by a smaller set of shared latent factors—biological processes such as a signaling pathway, a cell-state program, or a metabolic gradient.

MOFA+ (Multi-Omics Factor Analysis) identifies these shared factors across modalities using a Bayesian framework that naturally handles missing values^[3]. Each learned factor can be interpreted biologically: one factor might capture interferon response strength across the tissue, another might reflect proliferation index. The output—factor values per spot—can be visualized as spatial maps, overlaid on tissue images, and correlated with histological features.

MEFISTO extends MOFA+ by introducing Gaussian process priors along spatial or temporal coordinates^[4]. Where MOFA+ treats each spot as an independent sample, MEFISTO explicitly models the expectation that nearby spots should have similar factor values. This spatial smoothing helps recover continuous gradients—for example, a gradual transition from the tumor core to the invasive front—that might be fragmented by a discrete clustering approach.
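
The idea that nearby spots should have similar factor values can be expressed as a covariance function over coordinates. The sketch below uses a squared-exponential kernel, one common choice for Gaussian process priors; the length scale and coordinates are illustrative assumptions, and the code is not taken from the MEFISTO implementation.

```python
import numpy as np

def rbf_kernel(coords, lengthscale):
    """Squared-exponential covariance between all pairs of spatial positions."""
    coords = np.asarray(coords, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

# Three hypothetical spots at 0, 100, and 1000 micrometers along one axis
coords = np.array([[0.0, 0.0], [100.0, 0.0], [1000.0, 0.0]])
K = rbf_kernel(coords, lengthscale=300.0)

near, far = K[0, 1], K[0, 2]
print(f"prior correlation at 100 um: {near:.3f}; at 1000 um: {far:.4f}")
```

Under such a prior, factor values at adjacent spots are strongly coupled while distant spots are nearly independent, which is what lets a model recover smooth gradients. The length scale controls how quickly that coupling decays.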

The main trade-off with this family of methods is linearity. MOFA+ and MEFISTO assume that the relationship between latent factors and observed data is approximately linear. When cross-modal regulation is strongly non-linear (for example, when a transcription factor’s protein abundance has a threshold effect on target gene expression), deep learning approaches may capture the relationship more faithfully.
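
The linear shared-factor assumption can be illustrated with a plain singular value decomposition of the two standardized, concatenated modalities. This is a deliberately simplified stand-in: it has none of the Bayesian machinery, sparsity priors, or missing-value handling of MOFA+, and the function names and simulated data are assumptions made for the example.

```python
import numpy as np

def zscore(x):
    """Center and scale each feature; constant features are left at zero."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0
    return (x - x.mean(axis=0)) / sd

def joint_linear_factors(rna, protein, n_factors):
    """Shared linear factors from an SVD of the concatenated, scaled modalities."""
    blocks = [zscore(rna), zscore(protein)]
    # Give each block unit total variance so the larger panel does not dominate
    blocks = [b / np.linalg.norm(b) for b in blocks]
    joint = np.hstack(blocks)
    u, s, vt = np.linalg.svd(joint, full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]   # factor values per spot
    loadings = vt[:n_factors]                    # feature weights per factor
    n_rna = blocks[0].shape[1]
    return factors, loadings[:, :n_rna], loadings[:, n_rna:]

# Simulated data (assumption): one latent program drives both modalities linearly
rng = np.random.default_rng(2)
n_spots = 300
z = rng.normal(size=n_spots)
rna = np.outer(z, rng.normal(size=50)) + rng.normal(size=(n_spots, 50))
protein = np.outer(z, rng.normal(size=10)) + rng.normal(size=(n_spots, 10))

factors, w_rna, w_protein = joint_linear_factors(rna, protein, n_factors=3)
recovery = abs(np.corrcoef(factors[:, 0], z)[0, 1])
print(f"|correlation| between factor 1 and the simulated program: {recovery:.2f}")
```

Because every step is linear, a threshold-like relationship between a protein and its targets would be split across several factors or missed, which is the limitation described above.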

Nonlinear Integration with Deep Learning

Deep learning methods, particularly variational autoencoders (VAEs), address the non-linearity limitation by learning flexible, data-driven transformations between modalities.

TotalVI was originally developed for CITE-seq data (single-cell RNA + protein) but has been applied to spatial contexts with appropriate adaptations^[5]. It jointly models RNA counts (negative binomial distribution) and protein antibody-derived tag (ADT) counts, learning a latent representation that captures shared biology while denoising modality-specific technical variation.
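
For readers unfamiliar with the count model mentioned here, the sketch below evaluates a negative binomial log-probability in the mean and inverse-dispersion parameterization often used for RNA counts. It shows only the likelihood term; the parameter values are arbitrary, and the code is not drawn from the TotalVI implementation.

```python
import math
import numpy as np

def nb_log_prob(x, mu, theta):
    """Log-probability of counts x under a negative binomial distribution
    with mean mu and inverse dispersion theta (variance = mu + mu^2 / theta)."""
    x = np.asarray(x, dtype=float)
    lgamma = np.vectorize(math.lgamma)
    return (
        lgamma(x + theta) - lgamma(theta) - lgamma(x + 1.0)
        + theta * np.log(theta / (theta + mu))
        + x * np.log(mu / (theta + mu))
    )

# Sanity check with arbitrary parameters: probabilities sum to 1, mean equals mu
counts = np.arange(0, 500)
log_p = nb_log_prob(counts, mu=5.0, theta=2.0)
total_prob = float(np.exp(log_p).sum())
mean_count = float((counts * np.exp(log_p)).sum())
print(f"total probability: {total_prob:.6f}; mean count: {mean_count:.4f}")
```

In a VAE, `mu` and `theta` would be produced by the decoder for each gene and cell, and the summed log-probability would enter the training objective.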

scMM uses a mixture-of-experts multimodal VAE: each modality has its own encoder and decoder, and the joint latent posterior is formed as a mixture of the modality-specific posteriors, which pushes the transcriptomic and proteomic embeddings into a shared latent space^[6]. This alignment is especially valuable when the two modalities have very different statistical properties: RNA is count-based and sparse, while protein signals are often continuous intensities with a different dynamic range.

SpaOmicsVAE extends the VAE framework with a dual-graph structure that incorporates both spatial proximity and feature similarity, similar to the GNN paradigm^[10]. It integrates three modalities (transcriptome, proteome, and epigenome) in a single framework and includes an attention-based fusion layer that weights each modality’s contribution per spatial location.

Practical considerations. Deep learning approaches are data-hungry. On a dataset with a small tissue section or limited coverage, VAE-based methods may underperform simpler linear approaches. The quality of the latent representation should be validated with three checks:

Marker co-localization — well-characterized RNA and protein markers for the same cell type should co-localize in the embedding
Modality balance — no single modality should dominate the latent dimensions; variance explained per modality should be reported
Biological plausibility — inferred relationships should be consistent with established pathway knowledge before being treated as discoveries
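
The modality balance check can be approximated by asking how much of each modality's variance a linear map from the joint embedding can reconstruct. The sketch below is one possible proxy under that assumption, not a metric prescribed by any of the tools above; the embedding and data are simulated so that RNA dominates.

```python
import numpy as np

def variance_explained(embedding, modality):
    """In-sample R^2 of a linear reconstruction of one modality from the embedding."""
    z = np.asarray(embedding, dtype=float)
    y = np.asarray(modality, dtype=float)
    y = y - y.mean(axis=0)
    design = np.hstack([z, np.ones((len(z), 1))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(1.0 - (resid ** 2).sum() / (y ** 2).sum())

# Simulated case (assumption): the embedding mostly reflects RNA, barely protein
rng = np.random.default_rng(3)
embedding = rng.normal(size=(400, 2))
rna = embedding @ rng.normal(size=(2, 30)) + 0.3 * rng.normal(size=(400, 30))
protein = 0.2 * embedding @ rng.normal(size=(2, 8)) + rng.normal(size=(400, 8))

r2_rna = variance_explained(embedding, rna)
r2_protein = variance_explained(embedding, protein)
print(f"variance explained: RNA {r2_rna:.2f}, protein {r2_protein:.2f}")
```

A large gap between the two values, as in this simulated case, suggests the joint embedding is effectively a single-modality embedding and that the integration settings deserve a second look.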

Spatial Information as Prior Knowledge

A more recent family of methods treats spatial information as a first-class prior rather than an input to be featurized. SpatialCOC (Spatial Continuous Mapping and Cross-Omics Correction), published in Nature Communications in 2026, exemplifies this approach^[7].

SpatialCOC has two core modules. The Spatial Continuous Mapping (SCM) module uses implicit neural representations to learn a continuous spatial distribution for each omics layer. Instead of discretizing the tissue into spots or cells and building graphs between them, SCM models molecular abundance as a continuous function of spatial coordinates—meaning it can predict values at any spatial position, including locations not directly measured.
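
The notion of molecular abundance as a continuous function of coordinates can be shown with a much simpler model than an implicit neural representation. The sketch below fits a ridge regression on random Fourier features of the coordinates and then predicts at positions that were never measured. It is a stand-in chosen for brevity; the class name, hyperparameters, and simulated field are assumptions and do not reflect the SCM architecture.

```python
import numpy as np

class FourierFieldRegressor:
    """Ridge regression on random Fourier features of 2D coordinates."""

    def __init__(self, n_features=200, lengthscale=0.2, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.freqs = rng.normal(scale=1.0 / lengthscale, size=(2, n_features))
        self.phase = rng.uniform(0, 2 * np.pi, size=n_features)
        self.ridge = ridge
        self.offset = 0.0
        self.weights = None

    def _features(self, coords):
        coords = np.asarray(coords, dtype=float)
        return np.cos(coords @ self.freqs + self.phase)

    def fit(self, coords, values):
        values = np.asarray(values, dtype=float)
        self.offset = values.mean()
        phi = self._features(coords)
        gram = phi.T @ phi + self.ridge * np.eye(phi.shape[1])
        self.weights = np.linalg.solve(gram, phi.T @ (values - self.offset))
        return self

    def predict(self, coords):
        return self._features(coords) @ self.weights + self.offset

def true_field(xy):
    """Hypothetical smooth abundance field over the unit square."""
    return np.sin(3 * xy[:, 0]) + np.cos(2 * xy[:, 1])

rng = np.random.default_rng(4)
train_xy = rng.uniform(0, 1, size=(400, 2))            # measured positions
train_val = true_field(train_xy) + 0.05 * rng.normal(size=400)
model = FourierFieldRegressor().fit(train_xy, train_val)

query_xy = rng.uniform(0, 1, size=(200, 2))            # unmeasured positions
pred = model.predict(query_xy)
rmse = float(np.sqrt(np.mean((pred - true_field(query_xy)) ** 2)))
print(f"RMSE at unmeasured positions: {rmse:.3f}")
```

Because the fitted model takes arbitrary coordinates as input, two modalities sampled on different grids can both be evaluated on a common set of positions.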

The Cross-Omics Correction (COC) module then identifies non-linear correlations between modalities while removing modality-specific noise. Inspired by canonical correlation analysis but implemented as a dual-loss optimization, it recovers shared biological signals without forcing modalities into a common embedding at the expense of modality-specific features.
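
Since the COC module is described as inspired by canonical correlation analysis, a classical linear CCA gives a useful reference point. The sketch below implements only that textbook version, via whitening and an SVD; it does not reproduce the non-linear, dual-loss formulation of SpatialCOC, and the simulated data and the small regularization term are assumptions.

```python
import numpy as np

def linear_cca(x, y, n_components=1, reg=1e-6):
    """Classical CCA: projection weights and canonical correlations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = len(x)
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    lx = np.linalg.cholesky(cxx)                 # cxx = lx @ lx.T
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy)                 # lx^-1 @ cxy
    m = np.linalg.solve(ly, m.T).T               # lx^-1 @ cxy @ ly^-T
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    wx = np.linalg.solve(lx.T, u[:, :n_components])
    wy = np.linalg.solve(ly.T, vt[:n_components].T)
    return wx, wy, s[:n_components]

# Simulated data (assumption): one shared signal plus modality-specific noise
rng = np.random.default_rng(5)
shared = rng.normal(size=500)
rna = np.outer(shared, np.ones(8)) + 0.5 * rng.normal(size=(500, 8))
protein = np.outer(shared, np.ones(5)) + 0.5 * rng.normal(size=(500, 5))

w_rna, w_protein, canon_corr = linear_cca(rna, protein)
proj_corr = np.corrcoef(rna @ w_rna[:, 0], protein @ w_protein[:, 0])[0, 1]
print(f"first canonical correlation: {canon_corr[0]:.3f}")
```

The projections isolate the component that both modalities share and leave the modality-specific noise in the orthogonal directions, which is the intuition a non-linear variant builds on.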

In validation studies across human lymph node, mouse brain, spleen, and thymus datasets, SpatialCOC outperformed SpatialGlue, STAGATE, MultiVI, and other existing methods in identifying continuous spatial domains, resisting noise, and maintaining batch-consistent trajectory inference across sections^[7].

The key conceptual shift in SpatialCOC is that it does not require equal-dimensional modalities or pre-aligned spatial resolutions. This makes it particularly useful for integrating data types that are structurally divergent, for example whole-transcriptome Visium spots (55 μm spot diameter) with single-cell-resolution CODEX protein images.

From Integration to Biological Discovery

The value of spatial transcriptomics–proteomics integration is best illustrated by studies that used it to make biological findings that neither layer alone could have revealed.

IN-DEPTH + SGCC in EBV-positive DLBCL. A 2025 Cancer Discovery study introduced IN-DEPTH, a same-slide workflow that runs ultra-high-plex PCF (CODEX) spatial proteomics first, followed by spatial transcriptomics on the exact same tissue section^[8]. To analyze the resulting multi-modal data, the authors developed Spectral Graph Cross-Correlation (SGCC), which quantifies spatial co-localization and co-exclusion patterns between cell types and links them to transcriptional programs.

Applied to EBV-positive versus EBV-negative diffuse large B-cell lymphoma, the workflow uncovered three linked findings: (1) EBV-positive tumor cells used IL-27–STAT3 signaling to recruit and polarize macrophages toward a C1Q+ immunosuppressive phenotype; (2) in regions where tumor cells, macrophages, and CD4+ T cells co-localized, CD4+ T cell dysfunction was most severe; and (3) the macrophage-CD4+ T cell interaction in EBV-negative samples trended toward immune activation rather than suppression^[8]. These spatial interaction patterns had no transcriptome-only correlate—they were detectable only through multi-modal integration.

Lymph node remodeling in HNSCC metastasis. A 2026 Cancer Cell study applied CODEX spatial proteomics (53 antibodies) and spatial transcriptomics to 390 tissue microarray cores from 78 head and neck squamous cell carcinoma patients^[9]. With over 1.5 million cells profiled and 21 cell types identified, the study revealed that lymph node colonization is not a passive event—it actively remodels the tissue by forming immunosuppressive fibroblast-myeloid cell niches. These niches expand into T-cell zones, driving T cell dysfunction and regulatory T cell activation through TGF-β1 and CXCL10 signaling^[9]. Notably, the remodeling effect extended beyond the metastatic site to distant, tumor-free lymph nodes, suggesting a systemic immunosuppressive cascade triggered by regional metastasis.

Both studies share a common architectural insight: spatial multi-modal data is not simply additive. The interaction patterns between cell types, the niche-level organization, and the tissue-level effects of signaling pathways emerged only when transcriptomic and proteomic data were analyzed together in spatial context.

Figure 4. Two landmark studies demonstrating biological discovery through spatial transcriptomics-proteomics integration: IN-DEPTH + SGCC in DLBCL and multi-region spatial analysis in HNSCC lymph node metastasis.

Open Challenges in Multi-Modal Integration

Despite rapid progress, several challenges remain unresolved:

Resolution mismatch. Spatial transcriptomics at the spot level (for example, 55 μm Visium spots, each typically covering roughly 1 to 10 cells depending on the tissue) must be reconciled with single-cell or subcellular proteomics data. This is an asymmetric deconvolution problem, and existing methods show substantially higher error at tissue boundaries and infiltration zones, where multiple cell types mix at sub-spot scales^[7].
Non-random missingness. Detection efficiency differs across modalities. A low-abundance transcript may fall below detection while its protein product is stable and measurable, or vice versa. These systematic gaps can mislead algorithms into interpreting technical absence as biological absence—a problem that becomes circular when one modality is used to “validate” features that the other could not detect^[7].
Registration error propagation. In serial-section designs, tissue deformation and section angle differences introduce alignment uncertainties that cascade through spatial neighborhood graphs, domain detection, and inter-modality comparisons. Current methods rarely quantify or propagate these uncertainties into downstream conclusions^[7].
Spatially heterogeneous batch effects. Standard batch correction assumes uniform effects across all spatial positions. In spatial data, batch effects can be spatially localized—for example, tissue edges versus centers in a single section, or different regions of a multi-section TMA. Methods that account for spatial structure in technical variation are only beginning to emerge.
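
The resolution mismatch in the first item can be made concrete with the simplest reconciliation strategy: averaging single-cell protein values over the cells that fall inside each transcriptomics spot. The sketch below assumes registered coordinates in micrometers, non-overlapping spots of 55 μm diameter, and hypothetical cell positions; it ignores partial overlap and cell size, which real pipelines would need to handle.

```python
import numpy as np

def aggregate_cells_to_spots(cell_xy, cell_values, spot_xy, spot_diameter=55.0):
    """Mean per-spot protein values from cells whose centers lie inside a spot."""
    cell_xy = np.asarray(cell_xy, dtype=float)
    spot_xy = np.asarray(spot_xy, dtype=float)
    cell_values = np.asarray(cell_values, dtype=float)
    d2 = ((cell_xy[:, None, :] - spot_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                              # closest spot per cell
    inside = d2[np.arange(len(cell_xy)), nearest] <= (spot_diameter / 2) ** 2
    n_spots = len(spot_xy)
    counts = np.bincount(nearest[inside], minlength=n_spots)
    sums = np.zeros((n_spots, cell_values.shape[1]))
    np.add.at(sums, nearest[inside], cell_values[inside])
    means = np.full_like(sums, np.nan)                       # NaN where no cells fall
    covered = counts > 0
    means[covered] = sums[covered] / counts[covered, None]
    return means, counts

# Hypothetical layout: two spots 100 um apart, four cells, one protein channel
spot_xy = [[0.0, 0.0], [100.0, 0.0]]
cell_xy = [[5.0, 5.0], [-10.0, 0.0], [50.0, 0.0], [95.0, 3.0]]
cell_values = [[1.0], [3.0], [10.0], [7.0]]

means, counts = aggregate_cells_to_spots(cell_xy, cell_values, spot_xy)
print(counts.tolist(), means.ravel().tolist())
```

The cell at x = 50 lies in the gap between spots and is dropped, which illustrates a related point: cells outside every spot contribute protein signal that has no transcriptomic counterpart at all.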

Choosing a Strategy That Fits

The choice of experimental and computational strategy depends on the research question, sample type, and available resources.

If high-precision cell-cell colocalization is the primary goal—for example, mapping immune cell interactions in the tumor microenvironment—same-section integration paired with a graph-based analysis method such as SpatialGlue provides the strongest spatial evidence. For questions that require broad transcriptome coverage with protein validation, serial-section integration with a flexible alignment pipeline offers a practical balance between depth and feasibility.

Among computational methods, the decision can be framed as three questions: (1) Do I need interpretable factors for hypothesis generation? → Matrix factorization (MOFA+/MEFISTO). (2) Is my data large and likely to contain non-linear relationships? → Deep learning (SpaOmicsVAE, TotalVI). (3) Do I have noisy or structurally divergent multi-modal data spanning multiple sections? → Spatial-prior methods (SpatialCOC).

For research teams evaluating these approaches, sample feasibility review and study design consultation can help match the experimental strategy to the specific tissue type, platform, and analysis question. CD Genomics supports research projects through spatial transcriptomics services and integrated multi-omics analysis workflows, from sample review through data generation and reporting.

This content is intended for research use only. The methods, tools, and studies described are not intended for clinical diagnosis, treatment decisions, or individual health assessment.

FAQs

How do I choose between same-section and serial-section integration?
The decision depends primarily on your research question and sample availability. Same-section integration is the preferred choice when high-precision cell-cell colocalization is essential—for example, mapping which immune cell types physically interact with tumor cells at specific spatial coordinates. It eliminates registration errors entirely because both modalities share the same tissue coordinates. Serial-section integration is more practical when you need maximum flexibility in platform selection, when your tissue block allows multiple adjacent sections, or when you are working with FFPE samples that require modality-specific processing conditions. The trade-off is that you accept some registration uncertainty, which should be documented and ideally quantified in your analysis report.

When should I use SpatialGlue versus SpatialCOC for integration?
SpatialGlue works best when your tissue has well-defined spatial domains—layered structures like the cerebral cortex, organized lymphoid follicles, or tumor regions with sharp boundaries. Its graph-attention mechanism excels at respecting these architectural features. SpatialCOC is the stronger choice when your data is noisy, when modalities have very different resolutions (for example, 55 μm Visium spots paired with single-cell CODEX images), or when you are analyzing multiple sections and need batch-consistent results. SpatialCOC models molecular abundance as a continuous function of spatial coordinates, which makes it more robust to resolution mismatches and technical noise than graph-based methods.

How can I validate that multi-modal integration produced meaningful results?
Validating integration quality requires checking consistency along three axes. For well-characterized markers, RNA and protein signals should show concordant spatial patterns—if CD8A RNA and CD8 protein map to completely different tissue regions, the integration has likely failed. Cross-modality balance should be assessed by checking that no single omics layer dominates the variance in the joint embedding. Known biological structures (histologically annotated tissue regions, expected cell-type distributions) should correspond to the spatial domains or factors identified by the integration method. Treat novel discoveries revealed through integration as hypotheses that require orthogonal validation, such as RNAscope or immunofluorescence for a subset of key markers.

Can I perform integration if I only have spatial transcriptomics data without paired proteomics?
Yes, several computational approaches can predict spatial protein distributions from transcriptomics-only data. Tools like DGAT (Dual-Graph Attention Network) learn RNA–protein relationships from spatial CITE-seq reference datasets and then impute protein abundance in spatial transcriptomics samples that lack proteomics measurements. Similarly, SpaOmicsVAE supports integration when one modality is partially missing through its VAE imputation framework. However, imputed protein data should be treated as generated hypotheses, not as direct measurements. These predictions are most reliable for well-characterized markers with strong RNA–protein correlations and should be validated with targeted protein detection on a subset of key targets whenever possible.

References

1. Enninful A, Zhang Z, Klymyshyn D, et al. Integration of imaging-based and sequencing-based spatial omics mapping on the same tissue section via DBiTplus. bioRxiv. 2024. doi:10.1101/2024.11.07.622523. [Note: Also accepted at Nature Methods; final published version DOI may differ.]
2. Long Y, Ang KS, Sethi R, et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nature Methods. 2024;21(9):1658-1667. doi:10.1038/s41592-024-02316-4
3. Argelaguet R, Arnol D, Bredikhin D, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biology. 2020;21(1):111. doi:10.1186/s13059-020-02015-1
4. Velten B, Braunger JM, Argelaguet R, et al. MEFISTO: a probabilistic framework for integrative analysis of multi-modal spatial and temporal data. Nature Biotechnology. 2022;40(10):1473-1482. doi:10.1038/s41587-022-01292-4
5. Gayoso A, Steier Z, Lopez R, et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nature Methods. 2021;18(9):1008-1016. doi:10.1038/s41592-021-01220-3
6. Yang Y, Shi X, Gu H, et al. A mixture-of-experts deep generative model for integrated analysis of single-cell multiomics data. Genome Biology. 2021;22(1):330. doi:10.1186/s13059-021-02520-1
7. Li M, Sun P, Ye K, Meng D, et al. SpatialCOC: an integrative framework for spatial continuous mapping and cross-omics correction in spatial multi-omics data. Nature Communications. 2026. doi:10.1038/s41467-026-71882-2
8. Yiu SPT, Chang Y, Yeo YY, Qiu H, Wu W, et al. Same-Slide Spatial Multi-Omics Integration With IN-DEPTH Reveals Tumor Virus-Linked Spatial Reorganization of the Tumor Microenvironment. Cancer Discovery. 2026. doi:10.1158/2159-8290.CD-25-0775
9. Haist M, Baertsch MA, Reticker-Flynn NE, et al. Lymph node colonization induces tissue remodeling via immunosuppressive fibroblast-myeloid cell niches supporting metastatic tolerance. Cancer Cell. 2026;44(3):604-623.e9. doi:10.1016/j.ccell.2026.01.003
10. Zhang Z, Wang M, Zhang X, et al. SpaOmicsVAE: A deep learning framework for integrative analysis of spatial multi-omics data. Computer Methods and Programs in Biomedicine. 2025;271:109032. doi:10.1016/j.cmpb.2025.109032
