AgentList

Deep Research Bench

Active
GitHub · Python · Apache-2.0

Description

A comprehensive benchmark for deep research agents, providing a systematic evaluation framework for assessing their performance.

Tags

benchmark, evaluation, deep-research, testing, agents

Categories

📊 Observability

Project Metrics

Stars: 738
Forks: 80
Watchers: 738
Issues: 22
Created: June 13, 2025
Last commit: May 11, 2026

Deployment

Local

Related Projects

AWS Agent Evaluation

364 · Python
Stale

Amazon's AI agent evaluation tool for automated quality assessment of Bedrock Agents and other LLM agents with multi-dimensional metrics and benchmarks.

aws, evaluation, benchmark +2

Giskard

5.4k · Python
Active

An open-source evaluation and testing library for LLM agents, providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluation, testing, llm-safety +3

AgentLabs

550 · TypeScript
Stale

AgentLabs is a toolkit for agent development and testing, focused on experimentation, replay, and workflow support to improve iteration speed.

testing, developer-tools, evaluation +1

DeepEval

15.9k · Python
Active

DeepEval is an open-source evaluation framework for LLM applications. It provides a broad set of evaluation metrics and tools and supports both unit and integration testing, helping developers build reliable LLM applications.

llm, evaluation, testing +1

© 2026 AgentList. All rights reserved.