Harbor

Active

GitHub Python Apache-2.0

Description

A framework for running agent evaluations and creating reinforcement learning (RL) environments to measure and improve agent performance.

Related Projects

AgentLabs

550 · TypeScript

Stale

AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.

testing, developer-tools, evaluation +1

Prompt Ops

816 · Python

Normal

An open-source tool from Meta for LLM prompt optimization. Automates the process of continuously improving and refining LLM prompts.

prompt-engineering, llm, tools +2

PydanticAI Harness

492 · Python

Active

Batteries for your Pydantic AI agent: the official harness providing testing, evaluation, and debugging infrastructure.

pydantic-ai, testing, evaluation +2

DeepEval

15.9k · Python

Active

DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.

llm, evaluation, testing +1
