A framework by EleutherAI for few-shot evaluation of language models. It provides standardized evaluation pipelines for hundreds of benchmark tasks and is widely adopted in the community as a core LLM evaluation tool.

llm-evaluation · benchmark · evaluation-framework · +2

Windows MCP

5.8k · Python

Active

Windows MCP is an MCP server for the Windows desktop, providing AI agents with computer-use capabilities for desktop automation and system operations.

mcp · windows · desktop-automation · +2

Windows Agent Arena

Related Projects

Bananalyzer

WebQA Agent

LM Evaluation Harness

Windows MCP