JailTrickBench

Stale

Description

Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.

Related Projects

Jailbreak LLMs

3.7k · Jupyter Notebook

Stale

A dataset of 15,140 ChatGPT prompts including 1,405 jailbreak prompts from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.

jailbreakllm-safetybenchmark +2

Giskard

5.3k · Python

Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluationtestingllm-safety +3

AgentShield Benchmark

21 · TypeScript

Active

Open benchmark for AI agent security tools, evaluating prompt injection, data exfiltration, tool abuse, and provenance tracking.

securitybenchmarkai-safety +2

EasyJailbreak

851 · Python

Normal

An easy-to-use Python framework for generating adversarial jailbreak prompts, helping researchers systematically evaluate LLM safety defenses with multiple attack method combinations.

jailbreakadversarialllm-safety +2