pytest-evals
StaleDescription
A pytest plugin for running and analyzing LLM evaluation tests, enabling systematic validation of AI agent performance.
A pytest plugin for running and analyzing LLM evaluation tests, enabling systematic validation of AI agent performance.
The Python Risk Identification Tool for generative AI — an open-source framework by Microsoft for proactively identifying risks in generative AI systems through red teaming and automated probing.
An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.
Meta's set of tools to assess and improve LLM security, including safety benchmarks, prompt injection detection, and output auditing to help evaluate and enhance the safety of large language models.
The security toolkit for LLM interactions, providing prompt injection detection, PII anonymization, content safety auditing, and more to secure production LLM deployments.