Presidio
Status: Active
Description
Microsoft's open-source SDK for context-aware PII detection and de-identification in text, images, and structured data, used to protect sensitive data in LLM applications and agents.
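The description splits the work into two stages: detecting sensitive spans, then de-identifying them. The following is a minimal pure-Python sketch of that split, assuming simple regex detection. It is not Presidio's implementation. The patterns, the `Finding` tuple, and the `detect`/`redact` helpers are hypothetical stand-ins for the analyzer and anonymizer engines, and real detection layers NER, context, and checksums on top of patterns.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    """Hypothetical detection result: entity type, character span, confidence."""
    entity_type: str
    start: int
    end: int
    score: float


# Illustrative patterns only; real recognizers are far more thorough.
PATTERNS = {
    "EMAIL_ADDRESS": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), 0.9),
    "PHONE_NUMBER": (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), 0.6),
}


def detect(text: str) -> list[Finding]:
    """Stage 1: locate candidate PII spans without modifying the text."""
    findings = []
    for entity_type, (pattern, score) in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(entity_type, match.start(), match.end(), score))
    return sorted(findings, key=lambda f: f.start)


def redact(text: str, findings: list[Finding]) -> str:
    """Stage 2: replace each span with an <ENTITY_TYPE> placeholder.

    Spans are rewritten right to left so earlier offsets remain valid.
    """
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[: f.start] + f"<{f.entity_type}>" + text[f.end :]
    return text


sample = "Email john@example.com or call 555-123-4567."
print(redact(sample, detect(sample)))  # Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>.
```

Keeping detection and rewriting separate means the same findings can be logged, reviewed, or passed to different de-identification strategies.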
Key Features
- Context-aware PII detection: identifies credit card numbers, names, addresses, and other sensitive entities using named-entity recognition (NER), regular expressions, rule-based logic, and checksums
- Multiple de-identification modes: supports masking, replacement, encryption, pseudonymization, and other anonymization strategies
- Image PII redaction: uses OCR to extract text from images and masks the regions that contain PII, with support for DICOM medical images
- Custom recognizers: extend PII detection with custom recognizers and integrate external NLP models
- Multi-language support: detects PII in multiple languages to help meet data-compliance requirements across regions
- Flexible deployment: runs as a Python library, on PySpark, or in Docker containers that can be deployed to Kubernetes
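The first feature combines pattern matching with checksums and context. The sketch below illustrates both ideas in plain Python, independent of Presidio's API. A Luhn check (the standard checksum for payment card numbers) discards digit runs that only look like card numbers, and a score boost is applied when a word such as "card" appears shortly before the match. The base score, boost, context word list, and window size are hypothetical values chosen for illustration, not Presidio's.

```python
import re
from typing import Iterator

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Hypothetical context words; a real recognizer defines its own list.
CONTEXT_WORDS = ("card", "credit", "visa", "mastercard")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def score_cards(
    text: str, base: float = 0.5, boost: float = 0.25, window: int = 30
) -> Iterator[tuple[str, float]]:
    """Yield (candidate, score) for card-like numbers that pass the checksum."""
    for match in CARD_CANDIDATE.finditer(text):
        candidate = match.group()
        if not luhn_valid(candidate):
            continue  # looks like a card number but fails the checksum
        # Look only at the `window` characters preceding the match.
        preceding = text[max(0, match.start() - window) : match.start()].lower()
        score = base
        if any(word in preceding for word in CONTEXT_WORDS):
            score = min(1.0, base + boost)
        yield candidate, score


print(list(score_cards("Card number: 4111 1111 1111 1111")))
# [('4111 1111 1111 1111', 0.75)]
```

The checksum removes false positives outright, while context only adjusts confidence. A number without a nearby keyword is still reported, just with a lower score.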
Quick Start
pip install presidio-analyzer presidio-anonymizer
python -m spacy download en_core_web_lg
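from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "John's email is john@example.com, call him at 555-123-4567."

# Detect PII entities; the default NLP engine uses the spaCy model downloaded above
analyzer = AnalyzerEngine()
results = analyzer.analyze(text=text, language="en")

# Replace each detected span (the default operator substitutes an <ENTITY_TYPE> placeholder)
anonymizer = AnonymizerEngine()
anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized.text)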
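The Quick Start applies one treatment to every detected entity: Presidio's default operator replaces each span with an `<ENTITY_TYPE>` placeholder. The Key Features list also mentions masking and pseudonymization. The plain-Python sketch below shows what such per-entity operators do, independent of Presidio. The span list is hard-coded, and the `mask`, `pseudonymize`, and `apply_operators` helpers and the salted SHA-256 pseudonym scheme are hypothetical. In Presidio itself, operators are configured through the anonymizer (for example, `OperatorConfig` objects passed via `operators=`), not through these functions.

```python
import hashlib
from typing import Callable

# Hypothetical operator type: maps the original PII string to its replacement.
Operator = Callable[[str], str]


def mask(value: str, keep_last: int = 4, char: str = "*") -> str:
    """Hide all but the last `keep_last` characters."""
    hidden = max(0, len(value) - keep_last)
    return char * hidden + value[hidden:]


def pseudonymize(value: str, salt: str = "demo-salt") -> str:
    """Deterministic pseudonym: the same value and salt always yield the same token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"ID_{digest[:8]}"


def apply_operators(
    text: str,
    spans: list[tuple[str, int, int]],
    operators: dict[str, Operator],
    default: Operator,
) -> str:
    """Rewrite (entity_type, start, end) spans right to left so offsets stay valid."""
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        op = operators.get(entity_type, default)
        text = text[:start] + op(text[start:end]) + text[end:]
    return text


text = "John's email is john@example.com, call him at 555-123-4567."
# Hard-coded spans standing in for analyzer results.
spans = [("PERSON", 0, 4), ("EMAIL_ADDRESS", 16, 32), ("PHONE_NUMBER", 46, 58)]
operators = {"PHONE_NUMBER": mask, "PERSON": pseudonymize}
print(apply_operators(text, spans, operators, default=lambda v: "<REDACTED>"))
```

Deterministic pseudonyms keep records joinable (the same name maps to the same token) without exposing the original value. The salt must stay secret, or the mapping can be rebuilt by hashing guessed names.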