Microsoft Presidio
Type: full-code · Vendor: Microsoft · Language: Python · License: MIT · Status: active · Status in practice: mature · First released: 2019-08-01
Microsoft Presidio detects personally identifiable information in text and images and then anonymizes it through configurable operators, so sensitive data can be de-identified before or after it is handled by a model.
Description. Presidio is an open-source PII de-identification SDK from Microsoft. Its Analyzer identifies private entities using named-entity recognition, regular expressions, rule-based logic, and checksums across multiple languages, and its Anonymizer replaces, redacts, masks, hashes, or encrypts the detected entities through built-in operators. A separate Image Redactor module detects and redacts PII text in standard and DICOM images. The recognizers and operators are pluggable and customizable to specific business needs.
Agent loop shape. Presidio is a library invoked around a model rather than a runtime loop. Text passes through the Analyzer, which returns the spans and types of detected PII; those spans are handed to the Anonymizer, which applies a chosen operator (replace, redact, mask, hash, encrypt) to produce de-identified output. Images are routed through the Image Redactor, which detects PII text rendered in the pixels and redacts it. Any custom recognizers and operators are configured by the integrating application.
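The detect-then-operate flow described above can be sketched with the standard library alone. This is a minimal illustration of the shape, not Presidio's API: the `Span` type, the two regex patterns, and the `analyze` and `anonymize` functions are all hypothetical names chosen for the example. The encrypt operator is left out because it would need a key and a cryptography library.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected entity: its type and character offsets (hypothetical type)."""
    entity_type: str
    start: int
    end: int


# Illustrative patterns only; these are assumptions, not Presidio's recognizers.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def analyze(text):
    """Detection stage: return the spans and entity types found in the text."""
    spans = [
        Span(name, m.start(), m.end())
        for name, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


# Each operator maps the matched value (and its span) to a replacement string.
OPERATORS = {
    "replace": lambda value, span: f"<{span.entity_type}>",
    "redact": lambda value, span: "",
    "mask": lambda value, span: "*" * len(value),
    "hash": lambda value, span: hashlib.sha256(value.encode("utf-8")).hexdigest(),
}


def anonymize(text, spans, operator="replace"):
    """De-identification stage: apply one operator to every detected span."""
    apply = OPERATORS[operator]
    # Work right to left so earlier offsets stay valid after each substitution.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        value = text[span.start:span.end]
        text = text[:span.start] + apply(value, span) + text[span.end:]
    return text


sample = "Contact jane@example.com or 555-123-4567."
spans = analyze(sample)
replaced = anonymize(sample, spans, "replace")
masked = anonymize(sample, spans, "mask")
print(replaced)
print(masked)
```

Keeping detection and transformation separate means the same spans can be fed to different operators, which is the property the Analyzer and Anonymizer split provides.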
Primary use cases
- detecting PII entities in text
- anonymizing, masking, hashing, or encrypting detected PII
- redacting PII text in images and DICOM scans
- de-identifying data before or after model use
Key concepts
- Analyzer → pii-redaction (docs) — The detection engine that scans text and returns the spans and entity types of PII using NER models, regular expressions, rule-based logic, and checksum validation across multiple languages.
- Anonymizer → pii-redaction (docs) — The de-identification engine that applies an operator — replace, redact, mask, hash, or encrypt — to the entity spans the Analyzer found, producing anonymized output.
- Recognizers (docs) — Pluggable, customizable detectors for specific entity types that the Analyzer composes, letting teams add business-specific PII patterns beyond the built-in set.
- Image Redactor → multimodal-guardrails (docs) — A separate module that uses OCR to find PII text rendered as pixels in standard and DICOM images and redacts it, extending de-identification to the image modality.
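To make the recognizer idea concrete, the sketch below pairs a regular expression with checksum validation, two of the detection techniques listed above. It is a standard-library illustration under stated assumptions, not Presidio code: `CARD_PATTERN`, `luhn_valid`, and `recognize_cards` are hypothetical names, and the pattern is deliberately loose. In Presidio itself, custom detectors are added through the recognizer classes in the `presidio_analyzer` package (for example `PatternRecognizer`) rather than through free functions like these.

```python
import re

# Loose candidate pattern (an assumption for this sketch): 13 to 19 digits,
# optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def recognize_cards(text):
    """Return (start, end, entity_type) for regex matches that pass the checksum."""
    hits = []
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append((m.start(), m.end(), "CREDIT_CARD"))
    return hits


text = "Card 4111 1111 1111 1111 on file, ref 4111 1111 1111 1112."
hits = recognize_cards(text)
print(hits)
```

The regex alone would flag both numbers; the checksum step discards the second one, which is how validation cuts false positives for structured identifiers.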
Patterns this full-code implements
- ★★PII Redaction
An Analyzer detects PII entities via NER/regex/rule-based recognizers and an Anonymizer redacts, masks, hashes, or encrypts them in text and images before or after model use.
- ★Multimodal Guardrails
Beyond text, the Image Redactor module detects and redacts PII text embedded as pixels in standard images and DICOM medical scans, so the same de-identification guardrail extends to the image modalit…