Jailbreak LLMs
StaleDescription
A dataset of 15,140 ChatGPT prompts including 1,405 jailbreak prompts from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.
A dataset of 15,140 ChatGPT prompts including 1,405 jailbreak prompts from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.
An open-source benchmark for prompt injection attacks and defenses in LLMs, systematically evaluating the effectiveness of different attack strategies and defense mechanisms.
Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.
Vigil is an LLM security detection tool that identifies prompt injections, jailbreaks, and other potentially risky LLM inputs through multi-dimensional analysis for real-time safety protection.
Open benchmark for AI agent security tools, evaluating prompt injection, data exfiltration, tool abuse, and provenance tracking.