Beyond the Basics: STR Analysis Databases (DSMZ, ATCC, JCRB) Explained

If your team depends on human cell lines for research use only (RUO), you already know how quickly an authentication question can turn into a submission delay or a painful audit trail. This guide explains how to use the major STR profiling databases—DSMZ (DSMZCellDive), ATCC, and JCRB—together, so you can get defensible identity, provenance, and traceability without getting lost in the details. We'll keep the focus practical: how to compare results across loci panels, what "≥80% match" actually means for decisions, and how to document your calls for procurement, QA, and CRO intake.

Information herein is current as of 2026-02-06. Features and access policies may change; always verify on official portals.


1. Why STR databases matter for cell line authentication

In B2B settings—university cores, biotech discovery, and CRO intake—the job is not merely to "get a match." You need a defensible chain of evidence that proves identity (what is it?), provenance (where did it come from?), and traceability (can another party retrace the same steps?). An STR profiling database is your backbone for all three.

  • Identity: A reference STR profile lets you confirm that your allele calls at shared loci match a canonical entry for the intended line.
  • Provenance: Catalog IDs, depositor information, and cross-references (e.g., Cellosaurus links) help you connect a profile to a specific source or repository.
  • Traceability: Stable URLs, timestamps, and exportable records ensure another reviewer can repeat your lookup and reach the same conclusion.

1.1 What "database matching" actually answers

At a minimum, a search against a well-curated STR profiling database answers four questions:

  • Which known cell lines most closely match my profile at the shared loci?
  • How strong is the match (percent match or similarity) and how many loci contributed?
  • Are there aliases, derivatives, or lots for the same lineage that could explain small differences?
  • What documentation can I export to support my interpretation for a QA or journal checklist?

1.2 Typical buyer concerns you can resolve up front

Procurement officers, QA leads, and PIs tend to ask the same questions:

  • "Will my STR profile match their database even if we used a different kit?"
  • "Can the result be traced and reproduced by an auditor or reviewer?"
  • "If we only have a partial profile (e.g., FFPE or low-input), what's ‘good enough' to proceed?"
  • "If the line isn't in one database, where else should we look before escalating?"

The rest of this article builds a practical playbook to answer these questions—without overpromising beyond RUO scope.

Caption: STR profiling database matching workflow (as of 2026-02-06). Start with a clean profile and end with an audit-ready package.


2. A quick primer on an STR profile (only what you need for databases)

This is not a forensic deep dive. It's the minimum you need to read, compare, and document profiles across repositories.

2.1 Loci, allele calls, and why allele formatting consistency matters

Most human cell line authentication panels include a set of autosomal STR loci alongside Amelogenin for sex determination. Your database queries depend on two things: the overlap between your measured loci and the repository's loci, and the formatting of your alleles.

  • Loci: Expect the eight core loci (D5S818, D13S317, D7S820, D16S539, vWA, TH01, TPOX, CSF1PO) and, in modern panels, an extended 13–17-locus set recommended by the standard. More shared loci generally mean more confident matching.
  • Formatting: Be consistent in representing microvariants (e.g., 9.3), delimiting multiple alleles at a locus, and labeling locus names exactly as expected by the input form. A stray label or a missing decimal on a microvariant can drop your top hit out of contention.
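As a concrete illustration, a minimal Python normalizer (a sketch, not any portal's actual rules) can enforce the formatting points above: it preserves microvariants such as 9.3, canonicalizes separators, and drops duplicate allele entries before you paste values into a search form.

```python
def normalize_alleles(raw):
    """Normalize one locus's allele string, e.g. '9.3, 9' -> ('9', '9.3').

    Handles autosomal numeric alleles only; Amelogenin (X/Y) needs
    separate handling because its values are not numeric.
    """
    seen = set()
    for token in raw.replace(";", ",").split(","):
        token = token.strip()
        if token:
            seen.add(token)
    # Sort numerically so '9.3' lands between '9' and '10'.
    return tuple(sorted(seen, key=float))
```

Running the helper on a few messy inputs shows why numeric sorting matters: a plain string sort would put "10" before "9".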

If your lab is new to reading STR outputs or electropherograms, it pays to start with a quick primer before you compare across databases: how to interpret allele calls and electropherograms.

2.2 Why different kits/loci panels can still be compared (and where it gets tricky)

Kits from different vendors often converge on overlapping loci, even if the exact panels differ. Cross-database tools accept partial inputs and compute similarity using only the shared loci. Where it gets tricky is microvariant handling, off-ladder alleles, and locus dropout (common with degraded or FFPE material). That's where normalization rules and clear notes about missing loci become essential.
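In code terms, "compare only the shared loci" is just a set intersection. The sketch below intersects the eight core loci with a hypothetical second vendor panel (the second panel's composition is an assumption for illustration):

```python
# Core loci from the article's 2.1 list vs. a hypothetical second kit's panel.
panel_a = {"D5S818", "D13S317", "D7S820", "D16S539", "vWA", "TH01", "TPOX", "CSF1PO"}
panel_b = {"vWA", "TH01", "TPOX", "CSF1PO", "D18S51", "FGA", "D21S11"}  # assumed

# Only these loci enter the similarity computation; everything else is
# noted as "not shared" in your documentation.
shared = sorted(panel_a & panel_b)
```

Recording which loci fell outside the intersection is exactly the kind of normalization note Section 4.2 asks for.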

2.3 Common match outputs you'll see

Most interfaces report a ranked list of candidates with a match percentage or relatedness score and the number of loci compared. You might see:

  • Exact/identical match: 100% at shared loci for a canonical entry (less common with legacy profiles or derivatives).
  • High-probability/related match: ≥80% similarity on shared loci; warrants provenance checks and alias review.
  • Partial or insufficient loci: A good directional hit but with too few shared loci to confirm; repeat or supplement if the decision is high-stakes.

3. DSMZ vs ATCC vs JCRB: what each STR profiling database is best for

All three resources are widely used, but they excel in different scenarios. Rather than chase a single "winner," use the one that matches your decision job.

3.1 DSMZ (DSMZCellDive): strengths, typical coverage, how researchers use it

  • Official portals: STR browser/search are accessible via DSMZCellDive, documented in the 2022 paper "DSMZCellDive: Diving into high-throughput cell line data" (PMCID PMC9334839; DOI 10.12688/f1000research.111175.2). The web tools are reachable at celldive.dsmz.de (STR browse and STR search sections).
  • What it's good at: Flexible search against a large reference set with ranked similarity outputs; helpful when you have legacy or partial profiles because the UI and underlying logic focus on shared loci and microvariant-aware matching as described in the associated literature.
  • Typical usage in publications and labs: Discovery-minded checks, cross-verification alongside Cellosaurus, and early triage when you aren't sure which repository hosts the canonical reference.
  • Notes on coverage and updates: DSMZ's public pages don't always pin an exact, timestamped human-profile count in one place; treat counts as volatile and capture a screenshot and URL in your audit package when you search.

3.2 ATCC: strengths, typical coverage, how it is referenced in publications and procurement

  • Official pages: Start with ATCC's "Interrogating the Database" and "STR Profiling Analysis" help documents, which explain locus inputs and matching logic. ATCC's tutorial materials and public pages describe filters for "≥80% match" and expanded search bands, and note that the database contains STR profiles for all ATCC human lines.
  • What it's good at: Audit-ready verification when the line of interest is an ATCC-distributed human cell line. The workflow and language map closely to the ANSI/ATCC standard, which many procurement and journal policies reference.
  • Threshold transparency: Public guidance recognizes 100% as identical, 80–99% as related/likely, and a lower band (often starting around the high 50s) for expanded searches that require additional analysis. This aligns with the widely adopted "≥80% confirmation" convention in the standard and related guidance.
  • Export and interoperability: Product pages often include per-line STR tables; batch APIs are not publicly documented for STR, so plan for UI-driven queries and page-level archiving.

3.3 JCRB Cell Bank: strengths, typical coverage, where it's especially useful

  • Official pages: The English portal provides search and detailed per-line pages. When STR data exist, you'll find them under "DNA Profile (STR)," often alongside lot-level QC dates and cross-links.
  • What it's good at: Provenance-rich entries for many lines with Japanese origins, plus explicit notes on lots and updates. Useful in alias resolution and when harmonizing across institutions that cite JCRB identifiers.
  • Export and interoperability: No public STR API; plan to archive the relevant per-line pages you consulted (URL + timestamp) into your audit package.

3.4 Practical takeaway: which database to prioritize by use case

  • Procurement and journal compliance: Start with ATCC when the expected canonical reference is an ATCC line; cross-check DSMZ entries when in doubt and archive both URLs in your memo.
  • Research QA with legacy kits/partial profiles: Start with DSMZCellDive (and/or Cellosaurus CLASTR) to broaden recall, then verify top hits on ATCC product pages or JCRB per-line entries.
  • CRO intake and batch triage: Use a CLASTR batch upload or DSMZ search to triage many profiles quickly; mirror high-confidence candidates to ATCC/JCRB for documentation.
  • Alias/derivative resolution: Favor JCRB per-line pages (with lots and synonyms) and Cellosaurus cross-references; reflect consistent naming in your report.

Caption: High-level, scenario-based comparison of the three STR profiling database options (as of 2026-02-06). Use it as a quick decision aid, not a substitute for an on-portal check.


4. Interoperability: how cross-database matching works in practice

4.1 The real-world problem: same line, multiple aliases, different submissions

It's common to find the same biological lineage under different labels—catalog IDs, repository codes, depositor names, or even lot-specific annotations. Without alias handling, you risk calling a "mismatch" where there's only a naming discrepancy or a derivative.

4.2 Normalization rules that matter: allele naming, microvariants, missing loci, and kit differences

Normalize before you search and before you interpret:

  • Locus naming: Match the input field names expected by the portal; follow its capitalization and delimiters.
  • Allele formatting: Preserve microvariants (e.g., 9.3) and use consistent separators for multiple alleles.
  • Missing loci: For partial profiles, clearly note which loci are missing; a candidate that shares all measured loci can be a strong lead even when some loci were not recovered.
  • Kit differences: Record the kit and version. Differences in standard panels can be reconciled by comparing only shared loci—just make it explicit in your notes.
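The locus-naming rule above is worth automating. The sketch below maps internal headers to canonical portal labels via a deliberately tiny alias table (the table entries are assumptions; extend it to your own panel) and reports anything it cannot map so you can flag missing or misnamed loci explicitly:

```python
# Assumed alias table: internal/legacy spellings -> canonical portal labels.
CANONICAL = {
    "TH01": "TH01", "THO1": "TH01",            # 'THO1' is a common typo
    "VWA": "vWA",
    "AMEL": "Amelogenin", "AMELOGENIN": "Amelogenin",
    "D5S818": "D5S818", "D13S317": "D13S317",
}

def map_loci(profile):
    """Return (mapped profile, list of unmapped locus names)."""
    mapped, unknown = {}, []
    for locus, alleles in profile.items():
        canon = CANONICAL.get(locus.strip().upper())
        if canon:
            mapped[canon] = alleles
        else:
            unknown.append(locus)
    return mapped, unknown
```

Anything returned in the `unknown` list goes into your normalization notes rather than silently dropping out of the search.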

Caption: "Same cell line, different labels." Map aliases across ATCC, DSMZ, JCRB, and Cellosaurus to avoid false mismatches.

4.3 Candidate ranking: how top hits are usually presented

Expect a ranked list by percent match or a distance-based score. Some tools let you toggle algorithms (e.g., Tanabe vs Masters) or filter by minimum match percentage. Don't over-interpret small rank differences when few loci are shared; instead, verify with provenance details on the repository pages.
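For orientation, the two named algorithms are usually defined as Tanabe = 2 × shared alleles / (query alleles + reference alleles) and Masters = shared alleles / query alleles (or reference alleles), both computed over shared loci only. A sketch under those definitions, with profiles as dicts mapping locus name to a set of allele strings:

```python
def _shared_counts(query, reference):
    """Count matching and total alleles over the loci both profiles measured."""
    shared_loci = set(query) & set(reference)
    n_shared = sum(len(query[l] & reference[l]) for l in shared_loci)
    n_query = sum(len(query[l]) for l in shared_loci)
    n_ref = sum(len(reference[l]) for l in shared_loci)
    return n_shared, n_query, n_ref

def tanabe(query, reference):
    s, q, r = _shared_counts(query, reference)
    return 100.0 * 2 * s / (q + r) if (q + r) else 0.0

def masters_vs_query(query, reference):
    s, q, _ = _shared_counts(query, reference)
    return 100.0 * s / q if q else 0.0
```

Note how the two scores diverge on the same pair of profiles: a query whose every allele is found in the reference scores 100% under Masters-vs-query even when the reference carries extra alleles that pull the Tanabe score down, which is one reason small rank differences between algorithm settings are not worth over-interpreting.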

4.4 How to document provenance so audits are painless

Build a small, repeatable template for each match:

  • Sample and run metadata: Sample ID, passage or lot, extraction date, kit/panel name and version, and the full list of loci attempted.
  • Allele table and EPGs: Include raw allele calls and electropherograms where available.
  • Exact search details: Portals used (DSMZCellDive, ATCC, JCRB pages, and/or CLASTR), input loci, and date/time of search.
  • URLs and screenshots: Stable URLs of matched entries, downloaded PDFs if available, and annotated screenshots.
  • Interpretation and thresholds: A brief, standard paragraph explaining your call (e.g., "≥80% across 15 shared loci; documented aliases link ATCC and JCRB entries").

For batch operations, store these artifacts in a structured folder or your LIMS/ELN with consistent naming.
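One way to make the template machine-readable for a LIMS/ELN is a small JSON record per match. The field names below are assumptions; align them with your own system's conventions:

```python
import datetime
import json

def make_audit_record(sample_id, kit, loci_attempted, portal,
                      match_url, match_percent, shared_loci, call):
    """Serialize one match's provenance details as a JSON audit record."""
    return json.dumps({
        "sample_id": sample_id,
        "kit": kit,                      # kit/panel name and version
        "loci_attempted": loci_attempted,
        "portal": portal,                # e.g. DSMZCellDive, ATCC, JCRB, CLASTR
        "search_timestamp": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
        "match_url": match_url,          # stable URL of the matched entry
        "match_percent": match_percent,
        "shared_loci": shared_loci,
        "call": call,                    # Confirmed / Likely / Inconclusive / Mismatch
    }, indent=2)
```

Writing one such file per candidate alongside your screenshots gives an auditor everything in Section 4.4 in a consistent, diffable form.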

For an example RUO deliverable package (allele tables, EPGs, and cross-database comparisons), see CD Genomics' STR assay and audit-ready deliverables.


5. Match criteria and decision rules you can defend in audits

The ANSI/ATCC standard (ASN-0002-2022) and community guidance converge on a simple, defensible set of tiers you can incorporate into SOPs and procurement checklists.

5.1 Suggested decision tiers: Confirmed, Likely, Inconclusive, Mismatch

  • Confirmed: ≥80% allele concordance across the shared loci, with no conflicting provenance; this aligns with standard practice for confirmation. Archive URLs and a note about shared loci.
  • Likely/Related: ≥80% but with caveats that warrant an alias or derivative review (e.g., subclonal drift, MSI/LOH). Cross-check repository notes and Cellosaurus synonyms.
  • Inconclusive: 60–79%. Repeat extraction if possible, expand loci, and re-run against multiple portals. This band commonly reflects partial profiles or a kit/locus mismatch.
  • Mismatch: <60%, or a clear conflict at multiple high-information loci. Re-extract and re-profile; treat as a potential mislabeling or contamination scenario.
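These tiers are straightforward to encode in an SOP helper. The thresholds below mirror this article's suggestions and should be confirmed against your own QA policy; the minimum-loci guard (eight, matching the core panel) is an assumption:

```python
def classify(match_percent, shared_loci, min_loci=8, provenance_caveats=False):
    """Map a match result onto the article's decision tiers.

    min_loci and the caveat flag are SOP assumptions: too few shared loci
    forces Inconclusive regardless of score, and a high score with open
    provenance questions stays Likely/Related rather than Confirmed.
    """
    if shared_loci < min_loci:
        return "Inconclusive"
    if match_percent >= 80:
        return "Likely/Related" if provenance_caveats else "Confirmed"
    if match_percent >= 60:
        return "Inconclusive"
    return "Mismatch"
```

Pairing this call with the shared-locus count and normalization notes, as the next paragraph advises, is what makes the tier defensible.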

When documenting, pair the tier with the number of shared loci and your normalization notes. This is what an auditor will look for.

5.2 Handling partial profiles (low DNA, degraded, FFPE)

Partial profiles happen. Treat them as directional evidence and decide whether to proceed or repeat:

  • If the decision risk is low (e.g., early discovery triage), a high-probability hit across the few shared loci can justify moving forward—so long as you document the limitation and plan to repeat.
  • If the decision risk is high (e.g., a high-value program or a late-stage verification gate), repeat extraction, increase loci coverage to ≥13, and re-run the search before you call it confirmed.

Tip: Make a standard note format for missing loci and possible dropout causes; it will save back-and-forth later.

5.3 When STR is not enough

STR is the backbone for human cell line identity, but there are cases where you should escalate:

  • Mixed or cross-species signals: Use species identification and targeted assays to rule out non-human contaminants; keep records alongside your STR match.
  • Ambiguous or borderline matches: Consider orthogonal data such as karyotyping or SNP-based panels to break ties.
  • Quality flags: If EPGs show artifacts or allelic imbalance, repeat the assay rather than force a call.

Micro-case 1 — Alias/derivative resolution (anonymized)
Input (excerpt): D5S818:11,13; D13S317:8,11; D7S820:10,12; vWA:16,18; Amelogenin: X
Cross-database result notes: Top hits returned near-identical matches in Cellosaurus aggregated records and an ATCC product page; JCRB listed a historical synonym tied to the same donor. See the Cellosaurus aggregation for CVCL_1401 (accessed 2026-02-06).
Judgment & conclusion: Profiles and cross-references indicate a known alias rather than contamination; call = Confirmed after archiving the ATCC/JCRB URLs and noting passage/lot metadata.

Micro-case 2 — Microvariant handling and derivative signal (anonymized)
Input (excerpt): TH01:9.3,9; D16S539:9,13; D18S51:13,15; FGA:22,23
Cross-database result notes: DSMZCellDive/DSMZ literature examples show similar 9.3 microvariant conventions and reported high similarity scores for related derivatives (see DSMZCellDive discussion in Koblitz et al., 2022; accessed 2026-02-06). Cellosaurus entries for the same lineage record slightly different FGA calls, likely due to subclonal variation.
Judgment & conclusion: Microvariant notation matches DSMZ conventions; treat as Likely/Related — record the microvariant formatting and recommend repeat PCR on ≥13 loci for confirmation.

Micro-case 3 — Partial/low-loci profile (anonymized)
Input (excerpt): CSF1PO:11; TH01:7,9; TPOX:8; (only 4 loci recovered; sample = degraded/FFPE)
Cross-database result notes: CLASTR/Cellosaurus and DSMZ triage returned several low-confidence candidates with 50–70% shared alleles; ATCC searches produced no ≥80% hits (accessed 2026-02-06).
Judgment & conclusion: Evidence is directional but inconclusive; recommended action = re-extract and re-run with an extended kit (≥13 loci). If re-extraction is impossible, report as Inconclusive with the attempted loci documented and candidate URLs archived.

Why these micro-cases were chosen
Cases were selected to cover three high-value decision jobs: alias/derivative resolution, microvariant/notation edge cases, and partial-profile handling. Each demonstrates how cross-database evidence and provenance notes change the call, and shows the practical artifacts (allele snippet, candidate ranks, source URLs, access dates) you should archive for auditability.

Need advanced use cases like mixed human/mouse signals or quantifying mixtures? See Chimerism and xenograft analysis using STR.

Also, if you want a hands-on explainer of how to read STR reports before making escalation decisions, see the in-depth interpretation guide: How to interpret STR analysis reports.


6. FAQs B2B teams ask before placing an order

6.1 "Can you compare to the database we use internally?"

Yes—export your internal allele tables in a clean CSV/XLS with locus names that match common panels. Tools like Cellosaurus CLASTR accept batch uploads and reconcile source conflicts by computing best/worst profiles. Your SOP should map internal headers to the portal's accepted locus labels and include a template file. Archive the run logs, settings, and result URLs.
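The header-mapping step in that SOP can be a few lines of Python. The mapping table below is illustrative only; check the batch tool's template for the labels it actually expects before uploading:

```python
import csv
import io

# Assumed mapping from internal column headers to portal-style labels.
HEADER_MAP = {"sample": "Name", "THO1": "TH01", "AMEL": "Amelogenin"}

def remap_csv(text):
    """Rewrite the header row of an internal allele-table CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    rows[0] = [HEADER_MAP.get(h.strip(), h.strip()) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()
```

Keeping the map in one place (and under version control) also gives the auditor a record of exactly how internal labels were translated.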

6.2 "What if my cell line isn't in the big three databases?"

First, broaden the search: use CLASTR and DSMZCellDive with flexible locus inputs. Second, check repository-specific pages (ATCC product pages, JCRB line details) for aliases or derivatives. Third, consider that your profile could represent an under-documented derivative or a novel line; escalate to additional loci or orthogonal methods if the match remains <80% across a reasonable set of shared loci.

6.3 "What should I send to maximize match confidence?"

  • Provide ≥13-locus allele calls with consistent formatting, plus Amelogenin.
  • Include kit/panel name and version, sample provenance (passage/lot), and the date of extraction.
  • Share electropherograms (EPGs) alongside the allele table.
  • If you suspect degradation (e.g., FFPE), flag it so reviewers expect dropout and plan a re-extract.

Also consider: executing the workflow end-to-end (RUO)

If you need an RUO provider to run the STR assay and compile an audit-ready package (allele tables, EPGs, cross-database comparisons), CD Genomics supports cross-database matching against major public repositories and reports minor human–human contamination down to approximately 10% under typical conditions. See the service overview for scope, deliverables, and sample types: Cell line identification and authentication via STR profiling.

Disclosure: This mention is included to illustrate practical execution options; choose any qualified RUO provider that meets your institutional SOPs and documentation requirements.


Practical references and standards you should know

  • The ANSI/ATCC standard (ASN-0002-2022) formalizes recommended autosomal loci and interpretation thresholds. Use its terminology in your SOPs and external-facing memos so procurement and journal staff are speaking the same language.
  • Official repository help pages (ATCC) describe how to input loci, interpret percent match bands, and archive results appropriately.
  • DSMZCellDive's 2022 article outlines the data portal that underpins DSMZ's search tools and documents high-level matching behavior and data sources.
  • JCRB's quality control policy and per-line pages often include lot-level dates and links that are invaluable when you need to trace provenance.

For cross-database similarity search, CLASTR (hosted by Cellosaurus) offers a batch-friendly interface that accepts standard CSV/XLS with locus headers and returns ranked hits.


Author

Yang H. — Senior Scientist, CD Genomics; University of Florida.

Yang is a genomics researcher with over 10 years of research experience in genetics, molecular and cellular biology, sequencing workflows, and bioinformatic analysis. Skilled in both laboratory techniques and data interpretation, Yang supports RUO study design and NGS-based projects.


References (official URLs and DOIs where applicable)

  1. ANSI/ATCC. Authentication of Human Cell Lines: Standardization of STR Profiling (ASN-0002-2022). Access via ANSI Webstore: https://webstore.ansi.org/standards/atcc/ansiatccasn00022022
  2. International Cell Line Authentication Committee (ICLAC). Guide to Human Cell Line Authentication (2023). PDF: https://iclac.org/wp-content/uploads/ICLAC_Guide-to-Human-Cell-Line-Authentication_02-Mar-2023.pdf; Overview page: https://iclac.org/resources/human-cell-line-authentication/
  3. Koblitz J, et al. DSMZCellDive: Diving into high-throughput cell line data. F1000Research. 2022;11:486. DOI: https://doi.org/10.12688/f1000research.111175.2; PMCID: https://pmc.ncbi.nlm.nih.gov/articles/PMC9334839/
  4. ATCC. Interrogating the Database. Official help page describing inputs and percent-match interpretation (including ≥80% filters): https://www.atcc.org/search-str-database/interrogating-the-database
  5. ATCC. STR Profiling Analysis. Overview of ATCC STR database and analysis guidance: https://www.atcc.org/search-str-database/str-profiling-analysis
  6. ATCC. STR Authentication using the ATCC STR database (tutorial/guide). https://www.atcc.org/search-str-database/str-authentication-using-the-atcc-str-database
  7. Cellosaurus. STR Similarity Search Tool (CLASTR). Main page: https://www.cellosaurus.org/str-search/; Help: https://www.cellosaurus.org/str-search/help.html; Expasy entry: https://www.expasy.org/resources/str-similarity-search-tool
  8. JCRB Cell Bank. Quality control of cell lines (overview of tests including STR-PCR; per-line "DNA Profile (STR)" available on many detail pages). https://cellbank.nibiohn.go.jp/about-qc_english/
  9. JCRB Cell Bank. Per-line STR examples (representative detail pages where "DNA Profile (STR)" appears): HuH-7 (JCRB0403): https://cellbank.nibiohn.go.jp/~cellbank/en/search_res_det.cgi?ID=385; RPMI 8226 (JCRB0034): https://cellbank.nibiohn.go.jp/~cellbank/en/search_res_det.cgi?ID=226
  10. ATCC. Human Cell STR Profiling Service (standards alignment and deliverables noted). https://www.atcc.org/services/cell-authentication/human-cell-str-testing

Notes: Policies, coverage counts, and input requirements are subject to change. Verify details on official portals and archive URLs/timestamps in your internal records.

For research purposes only, not intended for clinical diagnosis, treatment, or individual health assessments.