AgentList
HomeProjectsArticlesAbout
Explore Projects
HomeProjectsArticlesAbout
Explore Projects
Projects Jailbreak LLMs

Jailbreak LLMs

Stale
GitHub Jupyter Notebook MIT

Description

A dataset of 15,140 ChatGPT prompts including 1,405 jailbreak prompts from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.

Tags

jailbreak llm-safety benchmark dataset security

Categories

🛡️ Security & Guardrails
Visit GitHub

Project Metrics

Stars 3.7k
Forks 319
Watchers 3.7k
Issues 3
Created August 1, 2023
Last commit December 24, 2024

Deployment

Local

Related Projects

Open-Prompt-Injection

439 · Python
Stale

An open-source benchmark for prompt injection attacks and defenses in LLMs, systematically evaluating the effectiveness of different attack strategies and defense mechanisms.

prompt-injectionbenchmarkllm-safety +2

JailTrickBench

162 · Python
Stale

Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.

benchmarkjailbreakllm-safety +2

Vigil

478 · Python
Stale

Vigil is an LLM security detection tool that identifies prompt injections, jailbreaks, and other potentially risky LLM inputs through multi-dimensional analysis for real-time safety protection.

prompt-injectionsecurityllm-safety +2

AgentShield Benchmark

21 · TypeScript
Active

Open benchmark for AI agent security tools, evaluating prompt injection, data exfiltration, tool abuse, and provenance tracking.

securitybenchmarkai-safety +2
AgentList

The most comprehensive directory of open-source AI Agent projects. Discover and compare top Agent frameworks like LangChain, CrewAI, and more.

Quick Links

  • Project List
  • Featured Articles
  • Browse Categories

Contact

  • About
  • Privacy Policy
  • Contact Us

© 2026 AgentList. All rights reserved.

Made with for the open source community