A comprehensive benchmark for evaluating LLMs as agents (ICLR 2024), covering operating systems, databases, knowledge graphs, digital card games, and more.

evaluation · python · agent +1

AgentLabs

546 · TypeScript

Inactive

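AgentLabs is a collection of tools for agent development and testing. It focuses on experimentation, replay, and development-workflow support, and is designed to help teams iterate on agents faster.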

testing · developer-tools · evaluation +1

Coval

Overview

Tags

Category

Related projects

Giskard

PrompToMatix

AgentBench

AgentLabs