OpenAI Evals

Normal

GitHub · Python · NOASSERTION

Description

OpenAI's framework for evaluating LLMs and LLM systems, providing an open-source registry of benchmarks and tools for systematic model assessment.

Related Projects

UQLM

1.2k · Python

Active

CVS Health's open-source uncertainty quantification library for language models, providing UQ-based hallucination detection with confidence scoring and mitigation tools to identify and reduce unreliable LLM outputs.

hallucination-detection · uncertainty-quantification · llm-evaluation · +2

Guardrails AI

7.0k · Python

Active

Guardrails AI adds programmable guardrails to large language models, improving reliability and safety through input/output validation, structured data extraction, and custom validators.

guardrails · llm-safety · validation · +2

Garak

8.0k · Python

Active

NVIDIA's open-source LLM vulnerability scanner that automatically detects security issues in language models including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.

llm-security · vulnerability-scanner · llm-evaluation · +2

OpenCompass

7.1k · Python

Active

OpenCompass is a comprehensive LLM evaluation platform supporting a wide range of models including Llama, Mistral, GPT-4, Qwen, GLM, and Claude across 100+ benchmark datasets.

llm-evaluation · benchmark · evaluation-platform · +1
