Judgeval
An evaluation framework for LLM applications providing test set management, metric computation, and output quality assessment for agent development teams.
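To make the core idea concrete, here is a minimal, framework-agnostic sketch of what such an evaluation loop looks like: a test set of cases, a simple quality metric, and an aggregate score. The `TestCase` structure and `keyword_recall` metric are illustrative inventions, not Judgeval's actual API.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    # Hypothetical test-case shape: input prompt, expected keywords, model output.
    prompt: str
    expected_keywords: list
    output: str

def keyword_recall(case: TestCase) -> float:
    """Fraction of expected keywords that appear in the model output."""
    hits = sum(1 for kw in case.expected_keywords
               if kw.lower() in case.output.lower())
    return hits / len(case.expected_keywords)

def evaluate(cases: list) -> float:
    """Average the per-case metric over the whole test set."""
    return sum(keyword_recall(c) for c in cases) / len(cases)

cases = [
    TestCase("What is the capital of France?", ["Paris"], "The capital is Paris."),
    TestCase("Name two primary colors.", ["red", "blue"], "Red and green."),
]
print(evaluate(cases))  # → 0.75
```

Real frameworks swap the keyword check for richer metrics (LLM-as-judge, semantic similarity, faithfulness), but the test-set-plus-metric skeleton is the same.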
Agenta is an open-source LLMOps platform providing a prompt playground, prompt management, LLM evaluation, and LLM observability in one place.
An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.
End-to-end, code-first tutorials for building production-grade GenAI agents. From prototype to enterprise deployment.
An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance.
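The general pattern behind automatic prompt optimization can be sketched as a search loop: propose edited variants of a prompt, score each candidate, and keep the best. The sketch below uses a deterministic stand-in scorer and toy edit operators purely for illustration; it is not the Salesforce framework's algorithm, and the `score`/`mutate` functions are assumptions.

```python
import random

def score(prompt: str) -> float:
    # Stand-in for an LLM-based scorer: reward prompts that state the
    # task and request step-by-step reasoning, penalize length slightly.
    s = 0.0
    if "summarize" in prompt.lower():
        s += 1.0
    if "step by step" in prompt.lower():
        s += 1.0
    return s - 0.01 * len(prompt)

def mutate(prompt: str, rng: random.Random) -> str:
    # Hypothetical edit operators: append an instruction or drop a word.
    edits = [
        lambda p: p + " Think step by step.",
        lambda p: p + " Be concise.",
        lambda p: " ".join(p.split()[:-1]) if len(p.split()) > 1 else p,
    ]
    return rng.choice(edits)(prompt)

def optimize(seed_prompt: str, steps: int = 50, seed: int = 0) -> str:
    """Greedy hill-climb: keep a mutation only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(steps):
        cand = mutate(best, rng)
        if score(cand) > best_score:
            best, best_score = cand, score(cand)
    return best

print(optimize("Summarize the document."))
```

In practice the scorer is a held-out evaluation of model outputs and the mutation step is itself performed by an LLM rewriting the prompt, but the propose-score-select loop is the common core.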