PydanticAI Harness
ActiveDescription
Batteries for your Pydantic AI agent — official harness providing testing, evaluation, and debugging infrastructure.
Batteries for your Pydantic AI agent — official harness providing testing, evaluation, and debugging infrastructure.
AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.
DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.
Framework for running agent evaluations and creating RL environments to measure and improve agent performance
An open-source tool from Meta for LLM prompt optimization. Automates the process of continuously improving and refining LLM prompts.