Microsoft Presidio
Microsoft Presidio detects personally identifiable information in text and images and then anonymizes it through configurable operators, so sensitive data can be de-identified before or after it is handled by a model.
Description
Presidio is an open-source PII de-identification SDK from Microsoft. Its Analyzer identifies private entities using named-entity recognition, regular expressions, rule-based logic, and checksums across multiple languages, and its Anonymizer replaces, redacts, masks, hashes, or encrypts the detected entities through built-in operators. A separate Image Redactor module detects and redacts PII text in standard and DICOM images. Recognizers and operators are pluggable and can be customized for specific business needs.
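To make "regular expressions plus checksums" concrete, here is a minimal standard-library sketch of that style of recognizer. It is not Presidio's code or API: `CARD_PATTERN`, `luhn_valid`, and `find_card_numbers` are hypothetical names, and the Luhn checksum is used only as a familiar example of a checksum that can reject digit runs a regex alone would accept.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, entity_type) for regex matches that also pass the checksum."""
    return [
        (m.start(), m.end(), "CREDIT_CARD")
        for m in CARD_PATTERN.finditer(text)
        if luhn_valid(m.group())
    ]
```

The regex proposes candidates and the checksum filters them, so a 16-digit string that merely looks like a card number is not reported.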
Solution
Presidio is a library that the integrating application calls before or after a model, not a runtime loop of its own. Text passes through the Analyzer, which returns the spans and types of detected PII; those spans are handed to the Anonymizer, which applies a chosen operator (replace, redact, mask, hash, encrypt) to produce de-identified output. Images are routed through the Image Redactor, which detects PII text in pixels and redacts it. Custom recognizers and operators are configured by the integrating application.
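The two-stage flow for text can be sketched with the standard library alone. This is an illustration of the shape (detect spans, then apply an operator to each span), not Presidio's API: `Span`, `PATTERNS`, `analyze`, `OPERATORS`, and `anonymize` are hypothetical names, the two regexes are deliberately simplistic, spans are assumed not to overlap, and the encrypt operator is left out because it would need a cryptography library.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Span:
    entity_type: str
    start: int
    end: int


# Illustrative detectors only; real recognizers are far more thorough.
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def analyze(text: str) -> List[Span]:
    """Stage 1: return the type and character offsets of each detected entity."""
    spans = [
        Span(entity_type, m.start(), m.end())
        for entity_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


# Each operator maps the detected value (and its span) to a substitute string.
OPERATORS: Dict[str, Callable[[str, Span], str]] = {
    "replace": lambda value, span: f"<{span.entity_type}>",
    "redact": lambda value, span: "",
    "mask": lambda value, span: "*" * len(value),
    "hash": lambda value, span: hashlib.sha256(value.encode("utf-8")).hexdigest(),
}


def anonymize(text: str, spans: List[Span], operator: str = "replace") -> str:
    """Stage 2: apply one operator to every span (assumes spans do not overlap)."""
    apply = OPERATORS[operator]
    # Work right to left so earlier offsets stay valid after each substitution.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        value = text[span.start:span.end]
        text = text[:span.start] + apply(value, span) + text[span.end:]
    return text
```

For the input `"Email jane@example.com or call 555-123-4567."`, `anonymize(text, analyze(text))` yields `"Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>."`. Keeping detection and substitution as separate functions is what lets the caller swap the operator without touching the detectors.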
Primary use cases
- detecting PII entities in text
- anonymizing, masking, hashing, or encrypting detected PII
- redacting PII text in images and DICOM scans
- de-identifying data before or after model use