Arthur Bench
An open-source evaluation tool for generative AI applications, helping teams build test suites, compare model outputs, and track quality changes over time.
Agenta is an open-source LLMOps platform providing a prompt playground, prompt management, LLM evaluation, and LLM observability in one place.
An open-source evaluation and testing library for LLM agents, providing automated model scanning, bias detection, performance benchmarking, and compliance checks.
End-to-end, code-first tutorials for building production-grade GenAI agents, from prototype to enterprise deployment.
An automatic prompt-optimization framework from Salesforce AI Research that uses LLMs to search for and refine prompts, improving model performance.