
JailTrickBench

Stale
GitHub · Python · MIT

Description

A "Bag of Tricks" for benchmarking jailbreak attacks on LLMs: the codebase for a NeurIPS 2024 paper that catalogs empirical tricks affecting jailbreak attack performance and evaluates them under a standardized protocol.
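To make "standardized evaluation" concrete, below is a minimal, self-contained sketch of the attack-by-defense grid such a benchmark sweeps, reporting attack success rate (ASR) per cell. Every name in it (the attack and defense labels, the stub model, the keyword judge) is an illustrative assumption, not JailTrickBench's actual API.

```python
# Hypothetical sketch of a standardized jailbreak-evaluation grid.
# All names below are illustrative stubs, not JailTrickBench's API.
from itertools import product

ATTACKS = ["suffix_attack", "role_play"]        # assumed attack labels
DEFENSES = ["none", "self_reminder"]            # assumed defense labels
PROMPTS = ["placeholder harmful request 1",
           "placeholder harmful request 2"]     # stand-in prompt set

REFUSAL_PREFIXES = ("i cannot", "i can't", "sorry")

def apply_attack(name: str, prompt: str) -> str:
    # Stub: a real attack would e.g. optimize an adversarial suffix.
    return f"[{name}] {prompt}"

def apply_defense(name: str, prompt: str) -> str:
    # Stub: a self-reminder defense prepends a safety instruction.
    return prompt if name == "none" else f"Remember to answer safely. {prompt}"

def query_model(prompt: str) -> str:
    # Stub standing in for the target LLM under test.
    return "Sorry, I cannot help with that."

def is_jailbroken(response: str) -> bool:
    # Toy keyword judge; real benchmarks use judge models or rubrics.
    return not response.lower().startswith(REFUSAL_PREFIXES)

def evaluate() -> dict:
    """Attack success rate (ASR) for every (attack, defense) cell."""
    results = {}
    for attack, defense in product(ATTACKS, DEFENSES):
        hits = sum(
            is_jailbroken(
                query_model(apply_defense(defense, apply_attack(attack, p)))
            )
            for p in PROMPTS
        )
        results[(attack, defense)] = hits / len(PROMPTS)
    return results

if __name__ == "__main__":
    for cell, asr in evaluate().items():
        print(cell, f"ASR = {asr:.0%}")
```

A real benchmark replaces each stub with actual attack implementations, deployed defenses, and a judge model, but the grid structure and per-cell ASR reporting are the essence of the standardized setup.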

Tags

benchmark jailbreak llm-safety neurips evaluation

Categories

🛡️ Security & Guardrails

Project Metrics

Stars: 162
Forks: 13
Watchers: 162
Issues: 4
Created: June 13, 2024
Last commit: November 30, 2024

Deployment

Local

Related Projects

Jailbreak LLMs

3.7k · Jupyter Notebook
Stale

A dataset of 15,140 ChatGPT prompts, including 1,405 jailbreak prompts collected from Reddit, Discord, and other platforms, providing a large-scale benchmark for LLM safety research and jailbreak detection.

jailbreak · llm-safety · benchmark +2

Giskard

5.3k · Python
Active

An open-source evaluation and testing library for LLM agents, providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluation · testing · llm-safety +3

AgentShield Benchmark

21 · TypeScript
Active

Open benchmark for AI agent security tools, evaluating prompt injection, data exfiltration, tool abuse, and provenance tracking.

security · benchmark · ai-safety +2

EasyJailbreak

851 · Python
Normal

An easy-to-use Python framework for generating adversarial jailbreak prompts, helping researchers systematically evaluate LLM safety defenses with multiple attack method combinations.

jailbreak · adversarial · llm-safety +2