JailTrickBench
StaleDescription
Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.
Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.
A dataset of 15,140 ChatGPT prompts including 1,405 jailbreak prompts from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.
An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.
Open benchmark for AI agent security tools, evaluating prompt injection, data exfiltration, tool abuse, and provenance tracking.
An easy-to-use Python framework for generating adversarial jailbreak prompts, helping researchers systematically evaluate LLM safety defenses with multiple attack method combinations.