Kiteworks Agent Marketplace
Compliance-grade Data Loss Prevention

Entity Detector

by Kiteworks Inc. · v0.4.4

Find credit cards, IBANs, emails, phone numbers, IPs, Dutch BSN, US SSNs and ITINs, UK NHS numbers, DEA numbers and crypto wallets in a Kiteworks folder — preview the counts, then redact with one confirmation. Detection runs locally on the platform; file content never leaves it. Requires the platform entity engine (CPython 3.13); without it, detection reports unavailable and fails closed — never falsely clean.

Modifies your content · approval required · Metadata only · RBAC / ABAC · Audit logged
Free for Kiteworks customers
Version: v0.4.4 · Status: Live · Posture: User-content mutation

What it does

Scans a Kiteworks folder for sensitive entities: credit cards, IBANs, emails, phone numbers, IP addresses, Dutch BSN, US Social Security numbers and ITINs, UK NHS numbers, DEA medical license numbers, crypto wallet addresses, and passport and ID-card machine-readable zones (MRZ; TD3, TD2, and TD1 layouts). The scan includes OCR over scanned images and PDFs. The agent previews the findings and, on confirmation, writes a redacted copy next to each file and quarantines the original for human deletion.

entity-detection artifact-output

See it in action

You

Scan Finance/Invoices for sensitive data and show me what would be redacted

Entity Detector

Pinning Finance/Invoices and scanning every supported file with the platform's local detection engine. You get per-file counts only — the matched text never leaves the platform, and nothing is changed in a preview.

12 Files scanned
3 Files with findings
27 Entities found
File                      Credit cards  IBANs  Emails
supplier-invoices-q1.csv  14            0      5
payment-runs.txt          0             6      0
contact-sheet.docx        0             0      2
Read-only preview. To redact, run "apply" with the confirmation code from this preview — the platform then writes a redacted copy next to each file and moves the original into that folder's "To delete" for a human to review and delete.
Illustrative example · not live tenant data
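
The confirmation code ties a later apply to exactly what this preview reported. One way such a binding could work is to hash the per-file counts and have apply recompute the hash from a fresh scan, refusing on any mismatch. The sketch below is illustrative only: the function names, the count keys, and the "example-plan/1" version tag are hypothetical, and the platform's actual plan_hash inputs are not documented on this page.

```python
import hashlib
import json

# Hypothetical version tag for this sketch; not the platform's real label.
PLAN_VERSION = "example-plan/1"


def plan_hash(folder: str, counts: dict) -> str:
    """Hash a preview plan built from per-file counts only (no matched text)."""
    canonical = json.dumps(
        {"version": PLAN_VERSION, "folder": folder, "files": counts},
        sort_keys=True,              # stable key order, so equal plans hash equally
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_apply(confirmed_hash: str, folder: str, rescanned: dict) -> bool:
    """Allow apply only if a fresh scan reproduces the confirmed plan."""
    return plan_hash(folder, rescanned) == confirmed_hash


# Counts taken from the illustrative preview above.
preview = {
    "supplier-invoices-q1.csv": {"credit_card": 14, "iban": 0, "email": 5},
    "payment-runs.txt": {"credit_card": 0, "iban": 6, "email": 0},
    "contact-sheet.docx": {"credit_card": 0, "iban": 0, "email": 2},
}
code = plan_hash("Finance/Invoices", preview)
```

With this shape, any count drift between preview and apply (a new file, an edited file) changes the hash, so the safe default is to refuse and ask for a fresh preview.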
You

Scan Finance/Invoices for payment data

Entity Detector

Using the financial-identifiers focus: only credit-card and IBAN findings are detected and planned for redaction. Counts only — the matched values never leave the platform.

12 Files scanned
2 Files with findings
20 Financial identifiers
File                      Credit cards  IBANs
supplier-invoices-q1.csv  14            0
payment-runs.txt          0             6
Narrowed focus: other sensitive types (emails, phone numbers, IPs, BSN) are not scanned or redacted in this mode. Because redacted copies are skipped by future runs, run a general sensitive scan first if you want full coverage.
Illustrative example · not live tenant data
You

Redact the files we just previewed in Finance/Invoices

Entity Detector

Re-checking the folder against the confirmed plan first — if anything changed since the preview, the apply refuses and nothing is touched. Then each planned file gets a redacted copy, and the original moves to "To delete".

3 Files redacted
0 Failed
27 Entities redacted
Original                  Redacted copy                      Original moved to
supplier-invoices-q1.csv  supplier-invoices-q1-redacted.csv  To delete
payment-runs.txt          payment-runs-redacted.txt          To delete
contact-sheet.docx        contact-sheet-redacted.docx        To delete
Nothing was deleted — originals are quarantined in "To delete" so a human makes the final call. An outcome report (counts only) was saved to your Kiteworks Agents Output folder.
Illustrative example · not live tenant data
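
The copies in this illustration use the clean <stem>-redacted.<ext> shape; the version history notes that since 0.4.4 the default name carries a 24-hex uniqueness token. Below is a minimal sketch of that naming shape and of recognizing derived copies so they are not redacted again. It assumes a random token from Python's secrets module stands in for the platform-generated one; the helper names are hypothetical, and treating an untokenized "partially-redacted" name as derived is also an assumption.

```python
import re
import secrets
from pathlib import PurePosixPath

# Stem suffixes treated as platform-derived in this sketch:
#   -redacted, -redacted-<24hex>, -partially-redacted, -partially-redacted-<24hex>
DERIVED = re.compile(r"-(?:partially-)?redacted(?:-[0-9a-f]{24})?$")


def redacted_name(original: str, partial: bool = False) -> str:
    """Build <stem>-redacted-<24hex>.<ext> (or the partially-redacted form)."""
    path = PurePosixPath(original)
    label = "partially-redacted" if partial else "redacted"
    token = secrets.token_hex(12)  # 12 random bytes -> 24 hex characters
    return f"{path.stem}-{label}-{token}{path.suffix}"


def is_derived(name: str) -> bool:
    """True if the name looks like a redacted copy and should be skipped."""
    return DERIVED.search(PurePosixPath(name).stem) is not None
```

Because each call draws a fresh token, two runs over the same folder produce different copy names, which is the property the changelog relies on to avoid same-name collisions.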

Relevant regulations and standards

Frameworks and mandates this agent helps you address. Not a certification — your own controls and assessment still apply.

GDPR PCI-DSS

Tags

pii redaction entity-detection

What's new

latest 0.4.4

Published version history. The latest version is what new installs receive; your administrator chooses when to upgrade.

  1. 0.4.4 stable latest 2026-06-16
    • Redacted-copy names now carry a platform-generated uniqueness token by default: <stem>-redacted-<24hex>.<ext> (and <stem>-partially-redacted-<24hex>.<ext>). The token closes a cross-run check-then-write race where two redactions of the same folder could overwrite/version each other's copy on a same-name collision — so it is now on at every apply, not only under write fan-out. Behavior, counts, coverage, reason codes, and the plan/plan_hash contract are unchanged; only the per-file redacted_name differs (the exported CSV is keyed on the ORIGINAL file name, so it is unaffected). Prior tokenized copies are still recognized as platform-derived (never re-redacted). Operators who must keep the legacy clean <stem>-redacted.<ext> names can set KW_ENTITY_UNSAFE_CLEAN_REDACTED_COPY_NAMES=1, which explicitly accepts the documented same-name cross-run race (logged at startup).
    • Platform-side in the same train (no agent input/output schema change): the per-degree concurrency env knobs KW_CONCURRENCY_ENTITY / KW_ENTITY_WRITE_FANOUT are removed — entity scan/redact concurrency now derives from the platform's governed resource pool, parallel by default, fail-closed to sequential.
    • Report/output change only (the default redacted_name shape): patch release; the plan_hash contract is unchanged, so a 0.4.3 preview still validates a 0.4.4 apply.
  2. 0.4.3 stable 2026-06-16
    • Apply now writes the redacted copies through the platform's bounded-concurrent redact_scope instead of issuing file-by-file writes from the agent — the symmetric change to 0.4.2's preview adoption of scan_scope. At the default KW_ENTITY_WRITE_FANOUT=1, apply output and behavior are byte-identical to the prior sequential path (the platform's K=1 fast path). When write fan-out is enabled (>1), the platform preserves deterministic input-order outcomes, counts, coverage, and closed reason codes, but a redacted copy's name may carry a platform-generated uniqueness token to prevent same-name overwrite/versioning races — so a per-file outcome's redacted_name can differ from K=1 (the exported CSV, keyed on the original file name, is unaffected). Apply also runs identically under default subprocess isolation. Preview is unchanged.
  3. 0.4.2 stable 2026-06-15
    • Preview now scans the whole folder through the platform's bounded-concurrent scan_scope instead of file-by-file, so a multi-file folder previews faster on a capable server (governed by KW_CONCURRENCY_ENTITY; default 1 = sequential). The plan, the per-file rows, and the confirmation plan_hash are byte-identical at any concurrency degree — the platform folds results in deterministic candidate order — so this is a pure performance change with no behavioral or output difference. Apply is unchanged.
  4. 0.4.1 stable 2026-06-14
    • MRZ detection covers all three ICAO 9303 travel-document layouts: TD3 passports (2x44, as before), TD2 ID documents (2x36, Part 6), and TD1 ID cards (3x30, Part 5) — each confirmed via its own four check digits (document number, date of birth, expiry, composite; near-zero false positives). Fires on OCR-scanned images/PDFs AND on MRZ text pasted into text files. Visas (MRV-A/B) are out of scope.
    • A recognizable-but-unsupported MRZ variant — currently a TD1 long-document-number card (the >9-char document number puts a filler in the line-1 check slot, so the standard check fails) — no longer reads as clean: the file is flagged needs_review with reason unsupported_mrz_variant, surfaced in blocked_from_apply, and bound into plan_hash. It is NOT redacted (the overflow validation is out of scope) but it can never be silently treated as having no MRZ. The confirmation hash is kw-entity-plan/8.
    • TD1 reality check, disclosed: most ID cards carry the MRZ on the BACK and the portrait on the FRONT, so a scanned MRZ side typically has no associated portrait — such files report needs_review (portrait_not_detected) and are never partial-redaction eligible. The acknowledged partial override applies only where a document-associated portrait is detected on the MRZ page (e.g. passports, passport cards).
    • Report/output change only (a previously undetected ID card may now carry MRZ counts): patch release; the plan-hash contract is unchanged, but a preview taken on 0.4.0 will not validate an apply if its folder contains newly detected zones (count drift flips the gate, by design — run a fresh preview).
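
For readers unfamiliar with the check digits mentioned in the 0.4.1 notes, the ICAO 9303 arithmetic is public: each character is mapped to a number (digits as themselves, A to Z as 10 to 35, the filler "<" as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below shows only that calculation, not the platform's detector; the function names are illustrative.

```python
WEIGHTS = (7, 3, 1)  # repeating weight pattern defined by ICAO 9303


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"not an MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum of the field's character values, modulo 10."""
    total = sum(mrz_char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_checks_out(field: str, check_char: str) -> bool:
    """A filler '<' (or any non-digit) in the check slot never validates."""
    if len(check_char) != 1 or check_char not in "0123456789":
        return False
    return int(check_char) == mrz_check_digit(field)


# Sample fields in the style of a TD3 second line
print(mrz_check_digit("L898902C3"))  # 6 (document number)
print(mrz_check_digit("740812"))     # 2 (date of birth, YYMMDD)
print(mrz_check_digit("120415"))     # 9 (expiry, YYMMDD)
```

The last helper mirrors the long-document-number case described above: when the check slot holds a filler instead of a digit, the standard check cannot pass, so a detector has to flag the file rather than treat it as clean.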

Install in Claude Code

claude plugin marketplace add \
  kiteworks/agent-marketplace
claude plugin install \
  kiteworks-entity-detector@kiteworks

Prerequisites

  • Kiteworks Compliance Runtime — install via pip install kw-mcp-gateway (host >=1.0.0,<2.0.0). This agent calls into the runtime for deterministic, audited execution.
  • Official Kiteworks MCP >=9.3.0 (used by the runtime) — install and sign in from github.com/kiteworks/mcp.
  • Python >=3.11.
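
These prerequisites can be sanity-checked from Python before installing the agent. The sketch below assumes the "host >=1.0.0,<2.0.0" range refers to the installed kw-mcp-gateway distribution version, which this page does not state explicitly. It handles only plain dotted versions such as 1.4.2, and it does not check the Kiteworks MCP version because its package name is not given here. The function names are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple:
    """Parse a plain dotted release ('1.4.2'); pads short forms like '1.0'."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))


def in_range(found: str, minimum: str, below: str) -> bool:
    """True if minimum <= found < below."""
    return parse_release(minimum) <= parse_release(found) < parse_release(below)


def check_prerequisites() -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python >=3.11 is required")
    try:
        host = version("kw-mcp-gateway")
    except PackageNotFoundError:
        problems.append("kw-mcp-gateway is not installed")
    else:
        try:
            ok = in_range(host, "1.0.0", "2.0.0")
        except ValueError:
            # Pre-release or otherwise non-numeric version strings
            problems.append(f"unrecognized kw-mcp-gateway version: {host}")
        else:
            if not ok:
                problems.append(f"kw-mcp-gateway {host} is outside >=1.0.0,<2.0.0")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```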

Connect from Claude

Add this marketplace as a remote MCP connector in Claude Desktop or Claude Code — point it at <your-host>/mcp. One process per deployment; no per-machine install. Requires the official Kiteworks MCP to be configured.