AgentList

Judgeval

Active
GitHub · Python · Apache-2.0

Description

An evaluation framework for LLM applications providing test set management, metric computation, and output quality assessment for agent development teams.
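
As a rough illustration of the workflow that description implies (a test set of examples, a metric, and an aggregate quality score), here is a minimal, hypothetical harness. The names below are illustrative only, not Judgeval's actual API; see the GitHub repository for the real interface.

    from dataclasses import dataclass

    @dataclass
    class Example:
        input: str            # prompt sent to the agent
        actual_output: str    # what the agent produced
        expected_output: str  # reference answer used for scoring

    def exact_match(ex: Example) -> float:
        # Toy metric: 1.0 when the output matches the reference, else 0.0.
        return float(ex.actual_output.strip().lower()
                     == ex.expected_output.strip().lower())

    def run_evaluation(test_set: list[Example], metric) -> float:
        # Apply the metric to every example and report the mean score.
        return sum(metric(ex) for ex in test_set) / len(test_set)

    test_set = [
        Example("2 + 2?", "4", "4"),
        Example("Capital of France?", "Paris.", "Paris"),
    ]
    print(run_evaluation(test_set, exact_match))  # 0.5: punctuation mismatch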

Tags

evaluation prompt-testing llm-quality

Categories

📊 Observability

Project Metrics

Stars 1.0k
Forks 92
Watchers 1.0k
Issues 17
Created October 25, 2024
Last commit May 11, 2026

Deployment

Local

Related Projects

Agenta

4.1k · TypeScript
Active

Agenta is an open-source LLMOps platform providing a prompt playground, prompt management, LLM evaluation, and LLM observability in one place.

observability llmops prompt-management +2

Giskard

5.3k · Python
Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks. A minimal scan sketch follows this entry.

evaluation testing llm-safety +3
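
As a rough sketch of the scan workflow described above, assuming Giskard's documented Model wrapper and scan() entry point (exact parameter names may vary by version; check the Giskard docs, and note that running a scan requires a configured LLM client):

    import giskard
    import pandas as pd

    def answer_questions(df: pd.DataFrame) -> list:
        # Stand-in prediction function; replace with a real agent call.
        return ["stub answer" for _ in df["question"]]

    wrapped = giskard.Model(
        model=answer_questions,
        model_type="text_generation",
        name="support-bot",
        description="Answers customer support questions.",
        feature_names=["question"],
    )

    # Runs Giskard's detectors (hallucination, harmfulness, bias, ...)
    # against the wrapped model and collects any issues found.
    report = giskard.scan(wrapped)
    report.to_html("scan_report.html")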

Agents Towards Production

19.1k · Jupyter Notebook
Active

End-to-end, code-first tutorials for building production-grade GenAI agents, from prototype to enterprise deployment.

agent framework evaluation +2

PromptoMatix

954 · Python
Stale

An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance. A sketch of such a search loop follows this entry.

prompt-engineering evaluation llm +1
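
As a rough sketch of the search-and-refine loop described above; nothing here is PromptoMatix's actual API, and both helpers are stand-ins (in practice propose_rewrites would call an LLM and score would run a real task metric):

    import random

    def propose_rewrites(prompt: str, n: int = 4) -> list:
        # Stand-in for "ask an LLM for n candidate rewrites of this prompt".
        return [f"{prompt} (variant {i})" for i in range(n)]

    def score(prompt: str) -> float:
        # Stand-in for evaluating a prompt on a held-out dev set.
        return random.random()

    def optimize(seed_prompt: str, rounds: int = 3) -> str:
        # Greedy hill climb: keep whichever candidate scores best so far.
        best, best_score = seed_prompt, score(seed_prompt)
        for _ in range(rounds):
            for candidate in propose_rewrites(best):
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
        return best

    print(optimize("Summarize the document."))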