Microsoft Presidio

Type: full-code · Vendor: Microsoft · Language: Python · License: MIT · Status: active · Status in practice: mature · First released: 2019-08-01

Microsoft Presidio detects personally identifiable information in text and images and then anonymizes it through configurable operators, so sensitive data can be de-identified before or after it is handled by a model.

Description. Presidio is an open-source PII de-identification SDK from Microsoft. Its Analyzer identifies private entities using named-entity recognition, regular expressions, rule-based logic, and checksums across multiple languages, and its Anonymizer replaces, redacts, masks, hashes, or encrypts the detected entities through built-in operators. A separate Image Redactor module detects and redacts PII text in standard and DICOM images. The recognizers and operators are pluggable and customizable to specific business needs.
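Pairing a regular expression with a checksum is what lets a detector tell a real credit-card number from an arbitrary run of digits. The sketch below illustrates that idea with the standard library only. It is not Presidio code: `CARD_CANDIDATE`, `luhn_valid`, and `find_credit_cards` are hypothetical names, and the pattern is a simplification that accepts 13 to 19 digits separated by optional single spaces or dashes.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, separated by optional single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: from the right, double every second digit and sum the digits."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> list[tuple[str, int, int, float]]:
    """Report regex candidates only if they also pass the checksum."""
    results = []
    for m in CARD_CANDIDATE.finditer(text):
        if luhn_valid(m.group()):
            results.append(("CREDIT_CARD", m.start(), m.end(), 1.0))
    return results


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111 on file; typo 4111 1111 1111 1112."
    print(find_credit_cards(sample))
```

The second number in the sample matches the pattern but fails the Luhn check, so only the first span is reported. Validating candidates this way cuts false positives on order numbers and other digit runs that merely look like card numbers.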

Agent loop shape. Presidio is a library called before or after a model, not a runtime with its own agent loop. Text passes through the Analyzer, which returns the span and entity type of each detected PII item; those results are handed to the Anonymizer, which applies a chosen operator (replace, redact, mask, hash, encrypt) to produce de-identified output. Images go through the Image Redactor, which locates PII text in the pixels and redacts it. The integrating application decides which recognizers and operators run and how they are configured.
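The two-stage flow can be sketched with the standard library alone. In Presidio itself the stages are `AnalyzerEngine.analyze` and `AnonymizerEngine.anonymize`; the `Finding`, `analyze`, and `anonymize` names below are hypothetical stand-ins, two regular expressions take the place of Presidio's recognizers, and a fixed score of 0.85 is assumed for every match.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for one analyzer result."""
    entity_type: str
    start: int
    end: int
    score: float


# Hypothetical recognizers: two regular expressions instead of Presidio's built-in set.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def analyze(text: str) -> list[Finding]:
    """Stage 1: report the entity type and character span of every match."""
    findings = [
        Finding(entity_type, m.start(), m.end(), 0.85)  # fixed, assumed score
        for entity_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def anonymize(text: str, findings: list[Finding]) -> str:
    """Stage 2: replace each span with a placeholder naming its entity type."""
    # Edit right to left so earlier offsets stay valid; assumes spans do not overlap.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[: f.start] + f"<{f.entity_type}>" + text[f.end :]
    return text


if __name__ == "__main__":
    raw = "Contact Jane at jane.doe@example.com or 555-123-4567."
    print(anonymize(raw, analyze(raw)))
```

Running it prints `Contact Jane at <EMAIL_ADDRESS> or <PHONE_NUMBER>.` The name "Jane" survives because no pattern covers it; catching names is the job of the NER models the Analyzer adds on top of patterns.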

Primary use cases

detecting PII entities in text
anonymizing, masking, hashing, or encrypting detected PII
redacting PII text in images and DICOM scans
de-identifying data before or after model use

Capability map: PII Redaction (core), Multimodal Guardrails (first-class).

Key concepts

Analyzer → pii-redaction (docs) — The detection engine that scans text and returns the spans and entity types of PII using NER models, regular expressions, rule-based logic, and checksum validation across multiple languages.
Anonymizer → pii-redaction (docs) — The de-identification engine that applies an operator — replace, redact, mask, hash, or encrypt — to the entity spans the Analyzer found, producing anonymized output.
Recognizers (docs) — Pluggable, customizable detectors for specific entity types that the Analyzer composes, letting teams add business-specific PII patterns beyond the built-in set.
Image Redactor → multimodal-guardrails (docs) — A separate module that uses OCR to find PII text rendered as pixels in standard and DICOM images and redacts it, extending de-identification to the image modality.
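The Anonymizer's operators are small transformations applied to each detected span. The following sketch implements four of them (replace, redact, mask, hash) over `(entity_type, start, end)` tuples. The function names, the keep-last-four masking default, and the use of unsalted SHA-256 are assumptions for illustration, not Presidio's operator configuration. Encryption is left out because the Python standard library has no symmetric cipher.

```python
import hashlib


# Hypothetical operator functions; each takes the detected text and its entity type.
def replace(value: str, entity_type: str) -> str:
    return f"<{entity_type}>"


def redact(value: str, entity_type: str) -> str:
    return ""


def mask(value: str, entity_type: str, keep_last: int = 4, char: str = "*") -> str:
    # Hide all but the last `keep_last` characters (an assumed default).
    hidden = max(len(value) - keep_last, 0)
    return char * hidden + value[hidden:]


def hash_sha256(value: str, entity_type: str) -> str:
    # Unsalted SHA-256 hex digest: deterministic, so equal inputs map to equal outputs.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


OPERATORS = {"replace": replace, "redact": redact, "mask": mask, "hash": hash_sha256}


def anonymize(text: str, spans: list[tuple[str, int, int]], operator: str) -> str:
    """Apply one named operator to every (entity_type, start, end) span."""
    op = OPERATORS[operator]
    # Right to left, so output of a different length does not shift pending offsets.
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + op(text[start:end], entity_type) + text[end:]
    return text


if __name__ == "__main__":
    note = "SSN 123-45-6789"
    for name in OPERATORS:
        print(name, "->", anonymize(note, [("US_SSN", 4, 15)], name))
```

An unsalted hash of a short, structured value such as a Social Security number can be reversed by hashing every possible candidate, so hashing is better treated as pseudonymization than as full anonymization.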
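Pluggable recognizers amount to an analyzer that holds a list of objects sharing one detection method. In Presidio, custom pattern and deny-list detectors are usually built with `PatternRecognizer` and added through the analyzer's recognizer registry; the sketch below re-creates that shape without the dependency. `RegexRecognizer`, `DenyListRecognizer`, `Analyzer`, the `EMP-` employee-ID format, and the project code names are all hypothetical.

```python
import re
from typing import Protocol

Result = tuple[str, int, int, float]  # (entity_type, start, end, score)


class Recognizer(Protocol):
    """Hypothetical interface every pluggable detector satisfies."""

    def analyze(self, text: str) -> list[Result]: ...


class RegexRecognizer:
    """Flags matches of one regular expression with a fixed confidence score."""

    def __init__(self, entity_type: str, pattern: str, score: float):
        self.entity_type = entity_type
        self.pattern = re.compile(pattern)
        self.score = score

    def analyze(self, text: str) -> list[Result]:
        return [(self.entity_type, m.start(), m.end(), self.score)
                for m in self.pattern.finditer(text)]


class DenyListRecognizer:
    """Flags whole-word occurrences of known sensitive terms."""

    def __init__(self, entity_type: str, terms: list[str]):
        if not terms:
            raise ValueError("deny list must not be empty")
        self.entity_type = entity_type
        # re.escape makes each term match literally; \b limits matches to whole words.
        self.pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")

    def analyze(self, text: str) -> list[Result]:
        return [(self.entity_type, m.start(), m.end(), 1.0)
                for m in self.pattern.finditer(text)]


class Analyzer:
    """Runs every registered recognizer and merges their results by position."""

    def __init__(self) -> None:
        self.registry: list[Recognizer] = []

    def add_recognizer(self, recognizer: Recognizer) -> None:
        self.registry.append(recognizer)

    def analyze(self, text: str, score_threshold: float = 0.0) -> list[Result]:
        results = [r for rec in self.registry for r in rec.analyze(text)]
        return sorted((r for r in results if r[3] >= score_threshold), key=lambda r: r[1])


if __name__ == "__main__":
    analyzer = Analyzer()
    analyzer.add_recognizer(RegexRecognizer("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.6))
    analyzer.add_recognizer(DenyListRecognizer("PROJECT_CODENAME", ["Bluebird", "Nightjar"]))
    print(analyzer.analyze("EMP-004217 was moved to Bluebird."))
```

Keeping scores on every result lets low-confidence recognizers stay registered while each caller decides, through `score_threshold`, how much evidence it needs before treating a span as PII.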
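Once OCR has turned pixels into words with bounding boxes, redaction reduces to painting over the boxes of words flagged as PII. The sketch below assumes OCR has already run and represents the image as a grayscale grid of integers. The `(text, left, top, width, height)` tuple layout, the single SSN pattern, and `redact_image` are hypothetical; a real pipeline would use an OCR engine and an image library instead.

```python
import re

# Hypothetical OCR output for one word: (text, left, top, width, height) in pixels.
OcrWord = tuple[str, int, int, int, int]

# Hypothetical single PII pattern: a US Social Security number as one OCR token.
SSN = re.compile(r"\d{3}-\d{2}-\d{4}")


def redact_image(pixels: list[list[int]], words: list[OcrWord], fill: int = 0) -> list[list[int]]:
    """Return a copy of a grayscale image with every PII word's box filled."""
    out = [row[:] for row in pixels]  # leave the input image untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for text, left, top, w, h in words:
        if not SSN.fullmatch(text):
            continue
        # Clip the box to the image bounds before painting it.
        for y in range(max(top, 0), min(top + h, height)):
            for x in range(max(left, 0), min(left + w, width)):
                out[y][x] = fill
    return out


if __name__ == "__main__":
    image = [[255] * 6 for _ in range(4)]  # 6x4 white image
    ocr = [("SSN:", 0, 1, 2, 1), ("123-45-6789", 2, 1, 3, 2)]
    for row in redact_image(image, ocr):
        print(row)
```

Matching word by word misses PII that OCR splits across several tokens, such as a card number printed in groups of four; running the analyzer over the joined OCR text and mapping each span back to the boxes it covers closes that gap.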