pytest-evals

Stale

GitHub Jupyter Notebook MIT

Description

A pytest plugin for running and analyzing LLM evaluation tests, enabling systematic validation of AI agent performance.

Related Projects

PyRIT

3.7k · Python

Active

The Python Risk Identification Tool for generative AI — an open-source framework by Microsoft for proactively identifying risks in generative AI systems through red teaming and automated probing.

pythonsecurityevaluation +2

Giskard

5.3k · Python

Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluationtestingllm-safety +3

Purple Llama

4.1k · Python

Active

Meta's set of tools to assess and improve LLM security, including safety benchmarks, prompt injection detection, and output auditing to help evaluate and enhance the safety of large language models.

securityevaluationpython +2

LLM Guard

2.9k · Python

Stale

The security toolkit for LLM interactions, providing prompt injection detection, PII anonymization, content safety auditing, and more to secure production LLM deployments.

securityllmpython +2

pytest-evals

Description

Tags

Categories

Related Projects

PyRIT

Giskard

Purple Llama

LLM Guard