What it does
Scans a Kiteworks folder for sensitive entities, including OCR over scanned images and PDFs. Detected entity types: credit cards, IBANs, emails, phone numbers, IP addresses, Dutch BSN, US Social Security numbers and ITINs, UK NHS numbers, DEA medical license numbers, crypto wallet addresses, and passport and ID-card MRZ (TD3/TD2/TD1 machine-readable zones). It previews the findings and, on confirmation, writes a redacted copy next to each file and quarantines the original for human deletion.
See it in action
Scan Finance/Invoices for sensitive data and show me what would be redacted
Pinning Finance/Invoices and scanning every supported file with the platform's local detection engine. You get per-file counts only — the matched text never leaves the platform, and nothing is changed in a preview.
| File | Credit cards | IBANs | Emails |
|---|---|---|---|
| supplier-invoices-q1.csv | 14 | 0 | 5 |
| payment-runs.txt | 0 | 6 | 0 |
| contact-sheet.docx | 0 | 0 | 2 |
Scan Finance/Invoices for payment data
Using the financial-identifiers focus: only credit-card and IBAN findings are detected and planned for redaction. Counts only — the matched values never leave the platform.
| File | Credit cards | IBANs |
|---|---|---|
| supplier-invoices-q1.csv | 14 | 0 |
| payment-runs.txt | 0 | 6 |
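The focus behavior above can be pictured as a filter applied before findings are reduced to counts. The following is a minimal sketch only: the `FOCUS_ENTITIES` mapping, the entity keys, and the `count_findings` helper are hypothetical names for illustration, not the platform's API.

```python
from collections import Counter

# Hypothetical focus definition; the real focus names and entity keys
# are assumptions made for this sketch.
FOCUS_ENTITIES = {"financial-identifiers": {"credit_card", "iban"}}

def count_findings(findings, focus=None):
    """Reduce (file, entity, matched_text) findings to per-file counts.

    The matched text is discarded, so only counts are returned.
    """
    allowed = FOCUS_ENTITIES[focus] if focus else None
    counts = {}
    for file_name, entity, _matched_text in findings:
        if allowed is not None and entity not in allowed:
            continue  # outside the requested focus
        counts.setdefault(file_name, Counter())[entity] += 1
    return {name: dict(per_entity) for name, per_entity in counts.items()}

findings = [
    ("payment-runs.txt", "iban", "<matched value>"),
    ("contact-sheet.docx", "email", "<matched value>"),
    ("contact-sheet.docx", "email", "<matched value>"),
]
summary = count_findings(findings, focus="financial-identifiers")
print(summary)
```

With the focus applied, a file that only contains out-of-focus entities drops out of the result entirely, which matches the shorter table above.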
Redact the files we just previewed in Finance/Invoices
Re-checking the folder against the confirmed plan first — if anything changed since the preview, the apply refuses and nothing is touched. Then each planned file gets a redacted copy, and the original moves to "To delete".
| Original | Redacted copy | Original moved to |
|---|---|---|
| supplier-invoices-q1.csv | supplier-invoices-q1-redacted.csv | To delete |
| payment-runs.txt | payment-runs-redacted.txt | To delete |
| contact-sheet.docx | contact-sheet-redacted.docx | To delete |
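The re-check before apply can be sketched as a hash comparison between the confirmed preview and a fresh scan. This is an illustration under assumptions: the canonical JSON encoding, the SHA-256 digest, the `apply_if_unchanged` helper, and the `plan_changed` reason string are all hypothetical, and the platform's actual `plan_hash` construction is not documented here.

```python
import hashlib
import json

def plan_hash(plan):
    """Hash a plan of {file name: {entity: count}} in a canonical order."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def apply_if_unchanged(confirmed_hash, rescan):
    """Re-scan first; refuse to touch anything if the plan drifted."""
    fresh_plan = rescan()
    if plan_hash(fresh_plan) != confirmed_hash:
        # Hypothetical reason string, for illustration only.
        return {"applied": False, "reason": "plan_changed"}
    return {"applied": True, "files": sorted(fresh_plan)}

preview = {"payment-runs.txt": {"iban": 6}}
confirmed = plan_hash(preview)

unchanged = apply_if_unchanged(confirmed, lambda: {"payment-runs.txt": {"iban": 6}})
drifted = apply_if_unchanged(confirmed, lambda: {"payment-runs.txt": {"iban": 7}})
print(unchanged["applied"], drifted["applied"])
```

The point of the design is that the gate fails closed: any difference between preview and apply, even a single count, yields a refusal rather than a partial write.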
Relevant regulations and standards
Frameworks and mandates this agent helps you address. Not a certification — your own controls and assessment still apply.
What's new
Latest: 0.4.4
Published version history. The latest version is what new installs receive; your administrator chooses when to upgrade.
0.4.4 (stable, latest), 2026-06-16
- Redacted-copy names now carry a platform-generated uniqueness token by default: `<stem>-redacted-<24hex>.<ext>` (and `<stem>-partially-redacted-<24hex>.<ext>`). The token closes a cross-run check-then-write race where two redactions of the same folder could overwrite or version each other's copy on a same-name collision, so it is now on at every apply, not only under write fan-out. Behavior, counts, coverage, reason codes, and the plan/`plan_hash` contract are unchanged; only the per-file `redacted_name` differs (the exported CSV is keyed on the original file name, so it is unaffected). Prior tokenized copies are still recognized as platform-derived and are never re-redacted. Operators who must keep the legacy clean `<stem>-redacted.<ext>` names can set `KW_ENTITY_UNSAFE_CLEAN_REDACTED_COPY_NAMES=1`, which explicitly accepts the documented same-name cross-run race (logged at startup).
- Platform-side in the same train (no agent input/output schema change): the per-degree concurrency env knobs `KW_CONCURRENCY_ENTITY` / `KW_ENTITY_WRITE_FANOUT` are removed. Entity scan/redact concurrency now derives from the platform's governed resource pool: parallel by default, fail-closed to sequential.
- Report/output change only (the default `redacted_name` shape): patch release. The `plan_hash` contract is unchanged, so a 0.4.3 preview still validates a 0.4.4 apply.
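The 0.4.4 name shape can be illustrated with a short sketch. Only the `<stem>-redacted-<24hex>.<ext>` pattern comes from the release note; the use of `secrets.token_hex`, the helper names, and the regex used to recognize tokenized copies are assumptions made for this example.

```python
import re
import secrets
from pathlib import PurePosixPath

# Assumed recognition rule: a stem ending in -redacted-<24hex> or
# -partially-redacted-<24hex>. The platform's real rule is not documented here.
TOKENIZED_STEM = re.compile(r"-(?:partially-)?redacted-[0-9a-f]{24}$")

def redacted_name(original, partial=False, token=None):
    """Build <stem>-redacted-<24hex>.<ext> for an original file name."""
    path = PurePosixPath(original)
    token = token or secrets.token_hex(12)  # 12 random bytes -> 24 hex chars
    label = "partially-redacted" if partial else "redacted"
    return f"{path.stem}-{label}-{token}{path.suffix}"

def is_tokenized_copy(name):
    """True if the name looks like a tokenized redacted copy."""
    return TOKENIZED_STEM.search(PurePosixPath(name).stem) is not None

name = redacted_name("payment-runs.txt", token="0123456789abcdef01234567")
print(name)
print(is_tokenized_copy(name), is_tokenized_copy("payment-runs.txt"))
```

Because each run draws a fresh token, two concurrent redactions of the same folder produce distinct names instead of racing on one shared name.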
0.4.3 (stable), 2026-06-16
- Apply now writes the redacted copies through the platform's bounded-concurrent `redact_scope` instead of issuing file-by-file writes from the agent, the symmetric change to 0.4.2's preview adoption of `scan_scope`. At the default `KW_ENTITY_WRITE_FANOUT=1`, apply output and behavior are byte-identical to the prior sequential path (the platform's K=1 fast path). When write fan-out is enabled (>1), the platform preserves deterministic input-order outcomes, counts, coverage, and closed reason codes, but a redacted copy's name may carry a platform-generated uniqueness token to prevent same-name overwrite/versioning races, so a per-file outcome's `redacted_name` can differ from K=1 (the exported CSV, keyed on the original file name, is unaffected). Apply also runs identically under default subprocess isolation. Preview is unchanged.
0.4.2 (stable), 2026-06-15
- Preview now scans the whole folder through the platform's bounded-concurrent `scan_scope` instead of file-by-file, so a multi-file folder previews faster on a capable server (governed by `KW_CONCURRENCY_ENTITY`; default `1` = sequential). The plan, the per-file rows, and the confirmation `plan_hash` are byte-identical at any concurrency degree, because the platform folds results in deterministic candidate order. This is a pure performance change with no behavioral or output difference. Apply is unchanged.
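The claim that output is byte-identical at any concurrency degree follows from folding results in candidate order rather than completion order. A minimal sketch of that idea, assuming a thread pool and a stand-in `scan_file` function (both hypothetical, not the platform's `scan_scope`):

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

def scan_file(name):
    """Stand-in for a real per-file scan; returns a deterministic row."""
    return {"file": name, "findings": len(name) % 3}

def scan_scope_sketch(candidates, degree=1):
    """Scan with `degree` workers, folding rows in sorted candidate order."""
    ordered = sorted(candidates)
    with ThreadPoolExecutor(max_workers=degree) as pool:
        # Executor.map yields results in input order, whatever finishes first.
        return list(pool.map(scan_file, ordered))

def digest(rows):
    """Hash the folded rows so two runs can be compared byte for byte."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode("utf-8")).hexdigest()

files = ["payment-runs.txt", "contact-sheet.docx", "supplier-invoices-q1.csv"]
sequential = digest(scan_scope_sketch(files, degree=1))
parallel = digest(scan_scope_sketch(files, degree=4))
print(sequential == parallel)
```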
0.4.1 (stable), 2026-06-14
- MRZ detection covers all three ICAO 9303 travel-document layouts: TD3 passports (2x44, as before), TD2 ID documents (2x36, Part 6), and TD1 ID cards (3x30, Part 5). Each is confirmed via its own four check digits (document number, date of birth, expiry, composite; near-zero false positives). It fires on OCR-scanned images/PDFs and on MRZ text pasted into text files. Visas (MRV-A/B) are out of scope.
- A recognizable-but-unsupported MRZ variant (currently a TD1 long-document-number card, where the >9-char document number puts a filler in the line-1 check slot, so the standard check fails) no longer reads as clean: the file is flagged `needs_review` with reason `unsupported_mrz_variant`, surfaced in `blocked_from_apply`, and bound into `plan_hash`. It is not redacted (the overflow validation is out of scope), but it can never be silently treated as having no MRZ. The confirmation hash is `kw-entity-plan/8`.
- TD1 reality check, disclosed: most ID cards carry the MRZ on the back and the portrait on the front, so a scanned MRZ side typically has no associated portrait. Such files report `needs_review` (`portrait_not_detected`) and are never partial-redaction eligible. The acknowledged partial override applies only where a document-associated portrait is detected on the MRZ page (e.g. passports, passport cards).
- Report/output change only (a previously undetected ID card may now carry MRZ counts): patch release. The plan-hash contract is unchanged, but a preview taken on 0.4.0 will not validate an apply if its folder contains newly detected zones (count drift flips the gate, by design; run a fresh preview).
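The four check digits mentioned in the 0.4.1 notes follow the ICAO 9303 scheme: each character maps to a value (digits as themselves, A-Z as 10-35, the filler `<` as 0), values are multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below validates line 2 of a TD3 passport MRZ only. The function names are illustrative and this is not the detector's implementation.

```python
WEIGHTS = (7, 3, 1)

def char_value(ch):
    """Map an MRZ character to its numeric value."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"not an MRZ character: {ch!r}")

def check_digit(field):
    """ICAO 9303 check digit: weighted sum (7, 3, 1 repeating) modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return str(total % 10)

def td3_line2_is_valid(line2):
    """Verify the four check digits named above on a 44-character TD3 line 2."""
    if len(line2) != 44:
        return False
    # Composite covers positions 1-10, 14-20, and 22-43 (1-indexed).
    composite_input = line2[0:10] + line2[13:20] + line2[21:43]
    checks = [
        (line2[0:9], line2[9]),        # document number
        (line2[13:19], line2[19]),     # date of birth
        (line2[21:27], line2[27]),     # date of expiry
        (composite_input, line2[43]),  # composite
    ]
    try:
        return all(check_digit(field) == digit for field, digit in checks)
    except ValueError:
        return False

# Line 2 of the ICAO 9303 specimen passport.
specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(td3_line2_is_valid(specimen))
```

Requiring all four digits to agree is what keeps false positives low: a random 44-character string passes four independent modulo-10 checks only about once in 10,000 tries.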
Install in Claude Code
claude plugin marketplace add \
kiteworks/agent-marketplace
claude plugin install \
kiteworks-entity-detector@kiteworks
Prerequisites
- Kiteworks Compliance Runtime: install via `pip install kw-mcp-gateway` (host `>=1.0.0,<2.0.0`). This agent calls into the runtime for deterministic, audited execution.
- Official Kiteworks MCP `>=9.3.0` (used by the runtime): install and sign in from github.com/kiteworks/mcp.
- Python `>=3.11`.
Connect from Claude
Add this marketplace as a remote MCP connector in Claude Desktop or Claude Code — point it at <your-host>/mcp. One process per deployment; no per-machine install. Requires the official Kiteworks MCP to be configured.