AgentList

Harbor

Active
GitHub · Python · Apache-2.0

Description

A framework for running agent evaluations and creating reinforcement learning (RL) environments to measure and improve agent performance.

Tags

evaluation benchmark rl-environments agent-testing python

Categories

📊 Observability ⚡ Agent Tools

Project Metrics

Stars 2.3k
Forks 1.1k
Watchers 2.3k
Issues 386
Created August 4, 2025
Last commit June 2, 2026

Deployment

Local

Related Projects

AgentLabs

550 · TypeScript
Stale

AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.

testing developer-tools evaluation +1

Prompt Ops

816 · Python
Normal

An open-source tool from Meta for LLM prompt optimization. It automates the process of continuously improving and refining LLM prompts.

prompt-engineering llm tools +2

PydanticAI Harness

492 · Python
Active

Batteries for your Pydantic AI agent — official harness providing testing, evaluation, and debugging infrastructure.

pydantic-ai testing evaluation +2

DeepEval

15.9k · Python
Active

DeepEval is an open-source evaluation framework for LLM applications. It provides rich evaluation metrics and tools, supporting unit testing and integration testing to help developers build reliable LLM applications.

llm evaluation testing +1
AgentList

The most comprehensive directory of open-source AI Agent projects. Discover and compare top Agent frameworks like LangChain, CrewAI, and more.


© 2026 AgentList. All rights reserved.
