AgentList

Judgeval

Active
GitHub · Python · Apache-2.0

Description

An evaluation framework for LLM applications providing test set management, metric computation, and output quality assessment for agent development teams.
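
As a rough illustration of the workflow that description implies (a test set of examples, a metric, and an aggregate quality score), here is a minimal, hypothetical harness. The names below are illustrative only, not Judgeval's actual API; see the GitHub repository for the real interface.

    from dataclasses import dataclass

    @dataclass
    class Example:
        input: str            # prompt sent to the agent
        actual_output: str    # what the agent produced
        expected_output: str  # reference answer used for scoring

    def exact_match(ex: Example) -> float:
        # Toy metric: 1.0 when the output matches the reference, else 0.0.
        return float(ex.actual_output.strip().lower()
                     == ex.expected_output.strip().lower())

    def run_evaluation(test_set: list[Example], metric) -> float:
        # Apply the metric to every example and report the mean score.
        return sum(metric(ex) for ex in test_set) / len(test_set)

    test_set = [
        Example("2 + 2?", "4", "4"),
        Example("Capital of France?", "Paris.", "Paris"),
    ]
    print(run_evaluation(test_set, exact_match))  # 0.5: punctuation mismatch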

Tags

evaluation prompt-testing llm-quality

Categories

📊 Observability

Project Metrics

Stars 1.0k
Forks 92
Watchers 1.0k
Issues 17
Created October 25, 2024
Last commit May 11, 2026

Deployment

Local

Related Projects

Agenta

4.1k · TypeScript
Active

Agenta is an open-source LLMOps platform providing a prompt playground, prompt management, LLM evaluation, and LLM observability in one place.

observability llmops prompt-management +2

Giskard

5.3k · Python
Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks. A minimal scan sketch follows this entry.

evaluation testing llm-safety +3
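
As a rough sketch of the scan workflow described above, assuming Giskard's documented Model wrapper and scan() entry point (exact parameter names may vary by version; check the Giskard docs, and note that running a scan requires a configured LLM client):

    import giskard
    import pandas as pd

    def answer_questions(df: pd.DataFrame) -> list:
        # Stand-in prediction function; replace with a real agent call.
        return ["stub answer" for _ in df["question"]]

    wrapped = giskard.Model(
        model=answer_questions,
        model_type="text_generation",
        name="support-bot",
        description="Answers customer support questions.",
        feature_names=["question"],
    )

    # Runs Giskard's detectors (hallucination, harmfulness, bias, ...)
    # against the wrapped model and collects any issues found.
    report = giskard.scan(wrapped)
    report.to_html("scan_report.html")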

Agents Towards Production

19.1k · Jupyter Notebook
Active

End-to-end, code-first tutorials for building production-grade GenAI agents, from prototype to enterprise deployment.

agent framework evaluation +2

PromptoMatix

954 · Python
Stale

An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance. A sketch of such a search loop follows this entry.

prompt-engineering evaluation llm +1
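
As a rough sketch of the search-and-refine loop described above; nothing here is PromptoMatix's actual API, and both helpers are stand-ins (in practice propose_rewrites would call an LLM and score would run a real task metric):

    import random

    def propose_rewrites(prompt: str, n: int = 4) -> list:
        # Stand-in for "ask an LLM for n candidate rewrites of this prompt".
        return [f"{prompt} (variant {i})" for i in range(n)]

    def score(prompt: str) -> float:
        # Stand-in for evaluating a prompt on a held-out dev set.
        return random.random()

    def optimize(seed_prompt: str, rounds: int = 3) -> str:
        # Greedy hill climb: keep whichever candidate scores best so far.
        best, best_score = seed_prompt, score(seed_prompt)
        for _ in range(rounds):
            for candidate in propose_rewrites(best):
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
        return best

    print(optimize("Summarize the document."))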