Jailbreak LLMs

Stale

GitHub Jupyter Notebook MIT

Description

A dataset of 15,140 ChatGPT prompts including 1,405 jailbreak prompts from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.

Related Projects

Open-Prompt-Injection

439 · Python

Stale

An open-source benchmark for prompt injection attacks and defenses in LLMs, systematically evaluating the effectiveness of different attack strategies and defense mechanisms.

prompt-injectionbenchmarkllm-safety +2

JailTrickBench

162 · Python

Stale

Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.

benchmarkjailbreakllm-safety +2

Vigil

478 · Python

Stale

Vigil is an LLM security detection tool that identifies prompt injections, jailbreaks, and other potentially risky LLM inputs through multi-dimensional analysis for real-time safety protection.

prompt-injectionsecurityllm-safety +2

AgentShield Benchmark

21 · TypeScript

Active

Open benchmark for AI agent security tools, evaluating prompt injection, data exfiltration, tool abuse, and provenance tracking.

securitybenchmarkai-safety +2

Jailbreak LLMs

Description

Tags

Categories

Related Projects

Open-Prompt-Injection

JailTrickBench

Vigil

AgentShield Benchmark