Targeted Spatial Panel Design for Xenium & CosMx: A Practical Gene Selection Guide from Single-Cell Reference Data

Why Panel Design Deserves Its Own Workflow

Most conversations around targeted spatial transcriptomics default to platform specs. Which instrument delivers higher resolution? Does the chemistry preserve RNA better under different fixation conditions? These are reasonable starting points, but they skip over a more consequential bottleneck: once the platform is chosen, you still have to decide which genes, anywhere from a few hundred to several thousand, will fill your custom panel.

The quality of that decision is often the difference between a dataset that cleanly resolves every expected cell type and one that leaves you wondering whether a missing signal is real biology or a design flaw. Imagine designing a 400-gene panel for a glioblastoma study, only to discover after the first pilot that your top three microglia markers are undetectable because their transcripts are shorter than the probe length requirement, and your negative control probes are indistinguishable from the housekeeping genes. That is not a platform problem — it is a panel design problem. And it is far more common than most researchers expect.

A well-structured panel, even with a fraction of the genes available on a whole-transcriptome platform, can generate focused, high-confidence spatial data across hundreds of samples. The difference comes down to a systematic selection workflow: preparing your reference data properly, organizing genes into functional layers with clear selection criteria, testing on pilot sections, and iterating before committing to a full cohort.

For most research groups, high-quality single-cell RNA-seq reference data is already available — either from the same tissue or from public atlases. The challenge is translating that expression data into a platform-compatible gene list that does not sacrifice biological resolution for the sake of a gene count limit.

Figure 1: Gene Slot Allocation Funnel. A three-dimensional funnel illustration showing the progressive filtering process from the whole transcriptome (~20,000 genes) through scRNA-seq filtering, marker candidate selection, and panel design to the final spatial panel. Each narrowing stage represents a key filtering decision that eliminates genes unlikely to perform well on targeted spatial platforms.

Starting with a Solid scRNA-seq Reference

The accuracy of your panel depends directly on the quality of your starting data. Targeted spatial platforms can measure only a fraction of the transcriptome, so every gene slot must earn its place through clear evidence. A panel designed from noisy or poorly annotated single-cell data will propagate those errors into the spatial experiment.

When You Have Your Own scRNA-seq Data

Matched or adjacent-tissue single-cell data from the same biological system is the gold standard. Your existing clustering annotations and differentially expressed gene lists have already been validated through independent analysis pipelines, which means they carry fewer uncertainties than data borrowed from external sources.

Before using this data for panel design, three preparation steps are essential.

First, work from a normalized expression matrix rather than raw counts. Raw counts reflect cell-to-cell variation in sequencing depth that can distort marker gene rankings. A gene that appears to be a top marker simply because its cell type was sequenced more deeply is a false positive for panel design purposes. Normalization methods such as CPM, SCTransform, or scran's deconvolution approach remove this technical variation and reveal the biological expression patterns that matter.
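
As a minimal sketch of why depth matters, the example below applies counts-per-million (CPM) scaling to a toy matrix. The cell labels, gene names, and counts are invented for illustration, and a real project would normally use an established toolkit rather than hand-rolled code.

```python
# Toy raw counts: two cells of the same type sequenced at different depths.
# All gene names and counts are hypothetical.
raw_counts = {
    "cell_shallow": {"GENE_A": 10, "GENE_B": 40, "GENE_C": 50},
    "cell_deep": {"GENE_A": 100, "GENE_B": 400, "GENE_C": 500},
}


def cpm(cell_counts):
    """Scale one cell's counts to counts per million (CPM)."""
    total = sum(cell_counts.values())
    return {gene: count / total * 1_000_000 for gene, count in cell_counts.items()}


normalized = {cell: cpm(counts) for cell, counts in raw_counts.items()}

# Raw GENE_A counts differ tenfold between the two cells, but the CPM values
# match, so the raw difference reflected sequencing depth rather than biology.
for cell, values in normalized.items():
    print(cell, round(values["GENE_A"]))
```

CPM is only the simplest of the methods named above; SCTransform and scran model the depth effect more carefully, but the goal is the same.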

Second, verify that your cell-type annotations are stable. Clusters defined at low resolution or validated by only a handful of markers can propagate classification errors into the panel. A quick cross-check against canonical marker expression — either in your own data or against published atlases — catches annotation drift before it affects gene selection. If a cluster changes its identity after re-clustering at higher resolution, the markers derived from that cluster are unreliable.

Third, build a candidate gene list that filters on both specificity and expression abundance. A gene that is perfectly specific to a rare cell type but expressed at only one transcript per cell in scRNA-seq is unlikely to produce detectable signal on a targeted spatial platform, where per-gene sensitivity is inherently lower than full transcriptome sequencing. Setting a minimum mean expression threshold — typically the 25th percentile of all detected genes — removes these low-yield candidates early in the process.
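
The two-part filter can be sketched as follows. The gene names, expression values, and the fold-change cutoff of 4 are assumptions chosen for illustration; only the 25th-percentile abundance floor comes from the guideline above.

```python
import statistics

# Hypothetical per-gene summaries from a reference: mean expression across all
# cells, mean in the target cell type, and mean in all other cell types.
genes = {
    "MARKER_HIGH": {"mean_all": 5.0, "mean_target": 20.0, "mean_other": 1.0},
    "MARKER_RARE": {"mean_all": 0.1, "mean_target": 1.0, "mean_other": 0.01},
    "BROAD_GENE": {"mean_all": 8.0, "mean_target": 9.0, "mean_other": 8.0},
    "MID_GENE": {"mean_all": 2.0, "mean_target": 3.0, "mean_other": 2.0},
    "LOW_GENE": {"mean_all": 0.5, "mean_target": 0.6, "mean_other": 0.5},
}

MIN_FOLD_CHANGE = 4.0  # assumed specificity cutoff, not a platform rule

# 25th percentile of mean expression across all detected genes.
abundance_floor = statistics.quantiles(
    [g["mean_all"] for g in genes.values()], n=4, method="inclusive"
)[0]

# Keep genes that clear both the abundance floor and the specificity cutoff.
candidates = [
    name
    for name, g in genes.items()
    if g["mean_all"] >= abundance_floor
    and g["mean_target"] / g["mean_other"] >= MIN_FOLD_CHANGE
]
print(abundance_floor, candidates)
```

In this toy data MARKER_RARE has the highest fold-change but falls below the abundance floor, which is exactly the kind of candidate the filter is meant to remove.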

When You Need to Use Public Data

For projects without matched scRNA-seq data — a common situation in FFPE clinical cohorts, rare disease models, or non-model organisms — public resources can fill the gap. Several curated atlases now cover major human and mouse tissues with sufficient resolution for panel design:

Tabula Sapiens provides single-cell transcriptomes across 24 human tissue types from individual donors, preserving donor-matched cross-tissue comparability.

The Human BioMolecular Atlas Program (HuBMAP) offers spatially registered single-cell data with detailed anatomical metadata, making it particularly useful for projects that need both cell-type and spatial context from the outset.

The Allen Brain Atlas and its derivatives cover both human and mouse brain at fine cell-type resolution, including inhibitory and excitatory neuron subtypes that are difficult to distinguish with generic markers.

For simpler marker lookups, databases like CellMarker 2.0 and PanglaoDB compile published cell-type markers across hundreds of tissue and disease contexts. These are useful for initial candidate nomination but should be cross-referenced against at least one additional source, since marker specificity can shift with disease state, tissue processing method, and platform chemistry.

The Four-Layer Panel Architecture

A common mistake in panel design is treating the gene list as a flat collection of interesting candidates. A more robust approach is to organize genes into four functional layers, each with a distinct purpose and its own selection logic. This turns panel design from a reactive slot-filling exercise into a proactive, hypothesis-driven process. Here is how each layer should be constructed.

Layer 1: Cell Type Markers — The Non-Negotiable Foundation

Every cell type expected in your tissue should be represented by at least three to five well-validated marker genes. These markers form the backbone of the panel because they determine whether the spatial data interpretation can be grounded in cell-type composition, which is the starting point for most downstream analyses.

Selection criteria for Layer 1 markers include three checks.

Specificity across independent sources. A gene that appears as a top marker in your scRNA-seq data but is not reported in CellMarker 2.0 or PanglaoDB for the same tissue should be treated with caution. Ideally, each marker is confirmed in at least two independent datasets before being committed to the panel. A marker that fails this cross-validation check often turns out to be dataset-specific rather than biologically universal.

Consistent detection. Genes among the 200 most abundant transcripts in your reference have a high probability of producing robust spatial signal. Markers in the lower half of the expression distribution carry a higher risk of failing on the spatial platform, where per-gene sensitivity is lower than in whole-transcriptome sequencing. If a marker ranks high by fold-change but low by absolute expression, consider adding a backup.

Orthogonal confirmation. If published immunohistochemistry or RNAscope images confirm the expected spatial pattern for a candidate marker, that is strong supporting evidence. For candidates without any spatial validation data, consider adding them to a pilot panel rather than committing them to the final gene list. Pilot data will quickly reveal whether the marker produces the expected spatial distribution.
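
The first of these checks is easy to automate. The sketch below counts how many sources report each candidate marker; the source names and their gene lists are hypothetical placeholders for whatever datasets and databases you actually consult.

```python
# Hypothetical marker lists for one cell type from three independent sources.
sources = {
    "own_scrnaseq": {"P2RY12", "TMEM119", "CX3CR1", "NOVEL1"},
    "public_atlas": {"P2RY12", "TMEM119", "AIF1"},
    "marker_database": {"P2RY12", "CX3CR1", "AIF1"},
}

MIN_SOURCES = 2  # from the text: confirmed in at least two independent datasets


def support_counts(source_markers):
    """Count how many sources report each marker."""
    counts = {}
    for markers in source_markers.values():
        for gene in markers:
            counts[gene] = counts.get(gene, 0) + 1
    return counts


counts = support_counts(sources)
confirmed = sorted(g for g, n in counts.items() if n >= MIN_SOURCES)
needs_pilot = sorted(g for g, n in counts.items() if n < MIN_SOURCES)
print("confirmed:", confirmed)
print("pilot only:", needs_pilot)
```

Markers that land in the pilot-only list are not discarded; they are the ones to test on a pilot panel before committing them to the final gene list.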

For a typical tissue with eight to twelve expected cell types, Layer 1 requires approximately 30 to 60 genes. In a tumor study where macrophages, monocytes, and dendritic cells all need to be distinguished, Layer 1 may need eight to ten markers for the myeloid compartment alone. The key principle is to cover all expected cell types with redundancy, not to maximize the number of markers per cell type.

Layer 2: State and Functional Markers

Beyond cell-type identification, most spatial studies aim to capture biological processes — immune activation, metabolic reprogramming, hypoxic response, or developmental signaling. Layer 2 genes cover these functional dimensions.

The exact composition of Layer 2 depends on your study. For a tumor microenvironment project, relevant markers might include interferon-gamma response genes (IFNG, STAT1, IRF1), T-cell exhaustion markers (PDCD1, LAG3, TOX), and hypoxia-inducible factors (HIF1A, VEGFA). For a neurodegeneration study, the list shifts to amyloid processing enzymes (BACE1, PSEN1), synaptic proteins (SYN1, DLG4), and astrocyte reactivity markers (GFAP, VIM).

A practical allocation guideline is to reserve roughly one-third of your total gene budget for Layer 2. Adjust the ratio based on your primary goal: a 70:30 split for cell-type mapping projects, or 50:50 for mechanism-focused studies. The key is to decide the ratio before selection begins, rather than retrofitting functional markers into whatever slots remain after cell type markers are chosen.

Layer 3: Technical Controls — The Most Overlooked Essential

Positive controls should include two to three housekeeping genes — GAPDH, ACTB, and B2M are standard choices — that are expressed consistently across all expected cell types. These controls serve three functions: they confirm that the tissue section contains detectable RNA, they provide a cross-sample benchmark for normalizing signal intensity, and they flag samples with generalized RNA degradation when their signal drops below an expected range.

Negative controls should include five to ten non-targeting probe sequences designed not to hybridize to any known transcript in the species. These probes quantify background autofluorescence and non-specific binding. Without them, any spatially patterned signal could be real biology or a staining artifact. The gene budget cost of negative controls is negligible — typically five to ten probes out of several hundred or thousand — but their value for data interpretability is enormous.

Layer 4: Reserve Slots

No panel survives first contact with real tissue unchanged. Reserve five to ten percent of your gene budget for markers identified during pilot testing. These may include replacements for Layer 1 markers that failed to produce signal, newly published gene targets in your field, or genes that emerged as unexpectedly informative from the pilot data.

A practical approach is to fill Layer 4 with placeholder candidates when ordering probes for the pilot, then update the composition before ordering probes for the main cohort. Building this buffer into the initial design avoids the expensive scenario of discovering a critical gap after probe synthesis for the full cohort is complete.
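
To make the four-layer split concrete, here is a rough budget calculator. The function name and its defaults are illustrative assumptions: it counts controls against the gene budget, takes the upper end of the control and reserve ranges given above, and divides what remains between Layer 1 and Layer 2 by the chosen ratio.

```python
def allocate_budget(total_slots, type_to_state_ratio=(70, 30),
                    positive_controls=3, negative_controls=10,
                    reserve_fraction=0.10):
    """Split a panel budget into the four layers (hypothetical helper)."""
    controls = positive_controls + negative_controls
    reserve = round(total_slots * reserve_fraction)
    remaining = total_slots - controls - reserve
    if remaining <= 0:
        raise ValueError("Budget too small for controls and reserve")
    type_share, state_share = type_to_state_ratio
    layer1 = round(remaining * type_share / (type_share + state_share))
    layer2 = remaining - layer1  # remainder, so the layers always sum to total
    return {
        "layer1_cell_type": layer1,
        "layer2_state": layer2,
        "layer3_controls": controls,
        "layer4_reserve": reserve,
    }


plan = allocate_budget(400)
print(plan)
```

For a 400-gene panel with a 70:30 split, this leaves 243 slots for cell type markers and 104 for state markers after 13 controls and 40 reserve slots are set aside. Passing `type_to_state_ratio=(50, 50)` gives the mechanism-focused variant.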

Figure 2: Four-Layer Panel Architecture — Concentric Rings. A cross-sectional diagram showing four nested rings representing the functional layers of a targeted spatial panel. The innermost ring (Cell Type Markers) forms the core, surrounded by State and Functional Markers, Technical Controls, and an outer Reserve Slots ring with a dashed boundary indicating flexibility. Each layer is color-coded and labeled with its recommended gene budget allocation.

Maximizing Resolution Within a Fixed Gene Budget

Each targeted spatial platform imposes a hard upper limit on the number of genes that can be measured simultaneously. With Xenium supporting up to 5,000 and CosMx up to 6,000, every gene choice competes for a finite slot. Maximizing biological resolution within these limits requires two strategies: eliminating redundancy and designing backup markers for critical cell types.

Avoiding Gene Family Redundancy

An often overlooked issue is the inclusion of multiple members of the same gene family that carry nearly identical spatial information. Several histone genes, HLA paralogs, or keratin family members may show highly correlated expression patterns across cell types. Including all of them consumes slots without adding biological value.

A quick correlation analysis on your scRNA-seq reference data can identify these redundant candidates. When two genes show a Pearson correlation above 0.9 across cell types, retaining only the one with higher expression or more specific spatial annotation is usually sufficient. This single step can recover 5 to 15 percent of your gene budget for more informative markers.
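
One simple way to run this check is a greedy pass over the candidates, sketched below in plain Python. The gene names and per-cell-type means are made up, and ranking by summed expression is just one reasonable reading of "the one with higher expression".

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists.

    Assumes neither list is constant (zero variance would divide by zero).
    """
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean expression per cell type (same cell-type order per gene).
profiles = {
    "KRT_X1": [9.0, 1.0, 0.5, 0.2],
    "KRT_X2": [4.5, 0.6, 0.2, 0.1],  # tracks KRT_X1 closely, lower expression
    "IMMUNE_Y": [0.1, 0.3, 6.0, 5.0],
}


def prune_redundant(profiles, threshold=0.9):
    """Visit genes from most to least abundant; keep a gene only if it is not
    highly correlated with any gene already kept."""
    ranked = sorted(profiles, key=lambda g: sum(profiles[g]), reverse=True)
    kept = []
    for gene in ranked:
        if all(pearson(profiles[gene], profiles[k]) <= threshold for k in kept):
            kept.append(gene)
    return kept


print(prune_redundant(profiles))
```

Here KRT_X2 is dropped because it correlates above 0.9 with the more abundant KRT_X1, freeing one slot without losing cell-type information.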

Backup Marker Strategy

For cell types that are critical to your study, designate two backup markers in addition to the primary marker. If the primary fails due to RNA degradation, low expression in a specific tissue region, or probe design limitations, the backup ensures that the cell type remains detectable.

This redundancy is especially important for FFPE samples, where RNA fragmentation can reduce detection efficiency for longer transcripts. A marker with a transcript length under 2 kilobases is more likely to survive FFPE processing than one exceeding 3 kilobases.

Figure 4: Platform Gene Budget vs. Resolution Radar Chart. A radar chart comparing Xenium, CosMx, and GeoMx across five performance axes: gene budget, spatial resolution, FFPE compatibility, throughput, and analysis complexity. The chart enables direct visual comparison of platform trade-offs for targeted panel design projects.

When a Discovery Run Makes Sense First

For tissue types with limited spatial reference data, a whole-transcriptome spatial platform such as Visium or Visium HD can serve as a powerful precursor to targeted panel design. The full transcriptome coverage provides an unbiased view of gene expression across tissue regions, revealing spatial gradients and regional enrichments that single-cell data alone may not capture.

This cascade workflow — Visium discovery phase, followed by targeted panel optimization, followed by large-cohort Xenium or CosMx profiling — is especially effective for three scenarios.

Novel disease models or engineered tissues where spatial expression patterns have not been described and published markers are unreliable. In these settings, attempting to design a targeted panel from scRNA-seq data alone risks missing spatially informative genes that only become apparent in the tissue context.
Large cohort studies where the cost of running 100 to 200 samples on a targeted platform is justified only after the gene set has been validated on a smaller discovery set. A Visium pilot on 4 to 6 sections typically costs a fraction of a full targeted cohort and can prevent the much larger expense of discovering poor marker selection halfway through sample processing.
Projects that combine discovery and validation within a single grant budget, where the upfront Visium investment reduces the risk of choosing the wrong genes for the targeted phase. The decision to include a discovery phase should be made before panel design begins, not after a first failed targeted pilot.

A concrete example: a researcher studying immune infiltration in a transgenic pancreatic cancer model with no published spatial data runs two Visium HD sections and identifies 150 regionally enriched genes absent from any public database. These 150 genes become the core of a custom Xenium panel applied to 80 mice. Without the discovery phase, those spatially informative genes would never have been included.

Figure 3: Visium to Xenium Cascade Workflow. A horizontal three-stage pipeline showing the sequential workflow from Visium whole-transcriptome discovery through panel design and optimization to full-cohort Xenium targeted profiling. Each stage is annotated with sample numbers and key decision points, illustrating how discovery-phase findings directly inform targeted panel composition.

Choosing Between Pre-Designed and Custom Panels

Both Xenium and CosMx offer pre-designed panels covering common tissue types. The Xenium Multi-Tissue and Human Brain panels include 250 to 500 genes. CosMx Universal Cell Characterization panels serve a similar purpose.

Pre-designed panels are convenient and validated across multiple tissue types. Their limitation is breadth over depth. If your study targets a specific pathway or disease mechanism underrepresented in the pre-designed set, a custom panel design will deliver more relevant data.

A cost-effective middle path: use a pre-designed panel for initial feasibility characterization, then design a custom panel for the main cohort based on the findings. This two-phase approach ensures the custom panel targets only the genes that matter for your specific question while the pre-designed panel de-risks the discovery phase.

One additional consideration is the update cycle. Pre-designed panels are updated periodically by the manufacturer — new versions may include improved probe designs or expanded gene coverage. If your study timeline spans multiple panel versions, factor in the cost of re-validation when switching between pre-designed iterations. Custom panels, once validated, remain stable across the entire project, which simplifies longitudinal data comparison.

Validating Your Panel Before Scaling

Running a pilot on one or two representative tissue sections is the single highest-return investment in the panel design process. No amount of in silico analysis can predict how every gene will perform on fixed tissue under your specific protocol. A single pilot run typically costs a fraction of a full cohort experiment and can prevent the much larger expense of discovering, after all samples have been processed, that a key cell type was never detected.

Pilot QC Metrics

After the pilot run, evaluate three metrics.

Gene detection rate. What fraction of panel genes produce signal above the negative control threshold? A rate below 70 percent indicates poor probe performance, low transcript abundance, or inadequate RNA quality.
Cell segmentation quality. Dense tissues, lipid-rich regions, and autofluorescent samples can degrade segmentation accuracy. Inspect boundaries manually across multiple tissue regions.
Signal-to-background ratio. Divide mean positive control intensity by mean negative control intensity. A ratio below three suggests the assay is not resolving real signal from background.
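
The two numeric metrics can be computed directly from per-probe intensities, as in the sketch below. The intensity values are invented, and using the maximum negative-control signal as the detection threshold is an assumption; your platform's software may define the threshold differently.

```python
def pilot_qc(gene_signal, negative_controls, positive_controls):
    """Compute detection rate and signal-to-background from pilot intensities.

    The negative-control threshold is taken as the maximum negative-control
    signal; that choice is an assumption, not a platform default.
    """
    threshold = max(negative_controls.values())
    detected = [g for g, s in gene_signal.items() if s > threshold]
    detection_rate = len(detected) / len(gene_signal)
    mean_pos = sum(positive_controls.values()) / len(positive_controls)
    mean_neg = sum(negative_controls.values()) / len(negative_controls)
    return {
        "detection_rate": detection_rate,
        "signal_to_background": mean_pos / mean_neg,
    }


# Hypothetical mean intensities (arbitrary units) from one pilot section.
metrics = pilot_qc(
    gene_signal={"G1": 12.0, "G2": 0.4, "G3": 5.0, "G4": 3.0, "G5": 0.9},
    negative_controls={"NEG1": 0.5, "NEG2": 1.0},
    positive_controls={"GAPDH": 30.0, "ACTB": 24.0, "B2M": 27.0},
)
print(metrics)
```

In this toy run three of five genes clear the threshold, so the detection rate is 60 percent, below the 70 percent guideline, even though the signal-to-background ratio is comfortably above three. Segmentation quality still needs the manual inspection described above.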

Pilot Decision Tree

Three outcomes are possible after the pilot.

Proceed. All cell types detectable, detection rate above 70 percent, controls within range. The panel is ready for the full cohort.
Revise and re-pilot. Missing cell types or poor detection in specific regions. Replace failing markers and run a second pilot.
Reassess. If RNA quality or platform compatibility is fundamentally limiting, reconsider sample preparation or platform choice before investing further.
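
These outcomes can be written as a small decision function. This is one possible encoding rather than a fixed rule: routing a signal-to-background ratio below three to "reassess" and treating a detection rate of exactly 70 percent as a failure are both assumptions, and the `rna_quality_ok` flag stands in for whatever RNA quality check your lab uses.

```python
def pilot_decision(detection_rate, signal_to_background,
                   missing_cell_types, rna_quality_ok=True):
    """Map pilot QC results to one of the three outcomes (illustrative)."""
    # Fundamental sample or assay problems take priority over panel fixes.
    if not rna_quality_ok or signal_to_background < 3:
        return "reassess"
    # Panel-level problems: undetected cell types or too few genes detected.
    if missing_cell_types or detection_rate <= 0.70:
        return "revise_and_repilot"
    return "proceed"


print(pilot_decision(0.85, 5.0, []))             # all criteria met
print(pilot_decision(0.85, 5.0, ["microglia"]))  # one cell type missing
print(pilot_decision(0.85, 2.0, []))             # weak signal over background
```

Writing the thresholds down before the pilot runs keeps the decision gate from drifting once the data arrive.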

Figure 5: Pilot Decision Tree. A vertical decision flowchart with three terminal outcomes: proceed to full cohort, revise and re-pilot, or reassess platform or sample preparation. Each branch is triggered by a specific QC outcome — detection rate above or below 70 percent, successful or failed cell segmentation, and signal-to-background ratio above or below three.

Panel Design Without scRNA-seq Data

When no single-cell reference is available, panel design becomes harder but not impossible.

Public marker catalogs. PanglaoDB and CellMarker 2.0 provide curated lists cross-referenced across studies.
Literature-based curation. Mine published bulk RNA-seq or proteomics data for candidate markers in your tissue.
Small-panel pilot strategy. Start with 30 to 50 lineage-defining genes, pilot on one section, assess detectability, and expand iteratively. This is slower but reduces the risk of committing to a large untested gene set.

Figure 6: Data Source Comparison — Three Paths for Panel Design. A three-column comparison layout showing the data quality, preparation effort, and best-use scenarios for each of three reference data sources: matched scRNA-seq data, public atlas data, and scenarios without single-cell reference data.

Common Panel Design Mistakes

All markers from one reference cluster. A gene list derived entirely from a single scRNA-seq cluster risks spatial bias — that cluster may reflect a dissociation artifact or a rare subpopulation not representative of the whole tissue. Cross-reference markers against at least two independent sources, ideally one from a different laboratory or platform.

Lowly expressed genes in the panel. A high fold-change combined with low absolute counts produces poor spatial signal. Replace any marker below the 25th expression percentile in your reference data with a more abundant alternative. When no alternative exists, check whether the gene has been successfully detected by RNAscope or smFISH in the same tissue type; positive detection data is a strong indicator of spatial platform compatibility.

Ignoring FFPE constraints. FFPE reduces detection efficiency for longer transcripts. Prioritize transcripts under 2 kb and include extra housekeeping controls. A marker that works well in fresh-frozen tissue may fail entirely in FFPE. When planning for FFPE, run a quick probe performance prediction using the manufacturer's design pipeline before committing to the final panel composition.

Skipping negative controls. Without them, every signal could be autofluorescence or non-specific binding. Not negotiable.

Overlooking the bioinformatics handoff. The gene list you design must be compatible with the analysis pipeline that will process the spatial data. If your panel includes genes with ambiguous annotation, overlapping genomic coordinates, or poorly characterized splice variants, they may be filtered out during alignment or quantification, effectively wasting those slots. Verify gene identifiers against the platform's reference genome annotation before finalizing the panel.

Treating panel design as a single-pass exercise. The most expensive mistake. Always build in a pilot phase with a clear decision gate before committing to a large cohort. A two-week pilot run can save months of rework on a failed 200-sample experiment.

How CD Genomics Supports Targeted Panel Design

Designing a targeted spatial panel involves multiple specialized steps: scRNA-seq data analysis, marker gene selection, platform-specific probe design, pilot execution, and iterative QC. Each step carries its own failure modes — a mis-specified gene identifier, an overlooked correlation between two family members, an unpiloted FFPE section — that can silently degrade the final dataset.

CD Genomics provides end-to-end support across this workflow, from reference data processing and gene stratification through custom probe coordination, pilot testing, and full-cohort spatial transcriptomics. Whether you are starting with your own single-cell data, working from public atlases, or building a panel from published markers, our team can help translate your research question into a focused, cost-effective gene set that preserves cell type resolution and biological signal. Contact our team to discuss your panel design project, reference data requirements, and pilot strategy — we can help you move from gene list to validated spatial panel in a structured, repeatable workflow.

FAQ

Q1: I only have Visium data, not scRNA-seq. Can I still design a Xenium panel?
Yes. Visium whole-transcriptome data can substitute for scRNA-seq provided spots approximate single-cell resolution. For larger spots, tools like cell2location or RCTD estimate cell-type proportions and guide marker selection; both rely on a single-cell reference, which can come from a public atlas if you have none of your own.

Q2: What is the main difference between Xenium and CosMx panel design workflows?
Design principles are similar, but panel size limits and probe chemistry differ. Xenium supports up to 5,000 genes with fixed in situ probes. CosMx supports up to 6,000 with modular chemistry. The four-layer architecture applies to both.

Q3: What ratio of cell-type to pathway markers do you recommend?
70:30 for cell-type mapping projects; 50:50 for mechanism-focused studies. Technical controls are separate.

Q4: Can I add genes after probe synthesis?
Yes, but ordering additional probes adds cost and turnaround time. The Layer 4 reserve strategy minimizes mid-project additions.

Q5: Does FFPE tissue require special panel design?
Yes. FFPE fragmentation reduces detection efficiency for longer transcripts. Prioritize shorter genes, include extra controls, and always pilot before scaling.

Q6: How do I know if my panel is ready before running the full experiment?
The pilot phase is designed to answer this. If your gene detection rate exceeds 70 percent, your signal-to-background ratio is above three, and all expected cell types are identifiable in the pilot data, your panel is ready for the full cohort. If any of these criteria are not met, use the decision tree in the validation section to determine whether to revise the panel or reassess the platform choice.

Applications described are for research use only. Not for use in diagnostic procedures. (RUO)

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.