Giskard
Active
Description
An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.
AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.
A security scanner for agentic LLM workflows. Automatically detects security vulnerabilities, prompt injection risks, and permission violations in agent pipelines before deployment.
An automatic prompt optimization framework by Salesforce AI Research that leverages LLMs to search for and refine prompts for improved model performance.
A comprehensive benchmark to evaluate LLMs as agents (ICLR 2024), covering operating systems, databases, knowledge graphs, digital card games and more.