EasyJailbreak

Normal

GitHub Python GPL-3.0

Description

An easy-to-use Python framework for generating adversarial jailbreak prompts, helping researchers systematically evaluate LLM safety defenses with multiple attack method combinations.

Related Projects

FuzzyAI

1.4k · Jupyter Notebook

Stale

An automated LLM fuzzing tool by CyberArk that helps developers and security researchers identify and mitigate jailbreak vulnerabilities in LLM APIs with multiple attack vectors.

fuzzingllm-securityjailbreak +2

AgentDojo

560 · Python

Normal

A dynamic environment by ETH Zurich to evaluate attacks and defenses for LLM agents, providing standardized benchmarks for measuring agent system security.

security-benchmarkagent-evaluationattack-defense +2

AI Red Teaming Playground Labs

1.9k · TypeScript

Normal

Microsoft's open-source AI red teaming playground labs with infrastructure for running AI red teaming trainings and hands-on security exercises.

red-teamtrainingsecurity +2

JailTrickBench

162 · Python

Stale

Bag of Tricks for benchmarking jailbreak attacks on LLMs. NeurIPS 2024 paper providing empirical tricks for LLM jailbreaking with standardized evaluation.

benchmarkjailbreakllm-safety +2

EasyJailbreak

Description

Tags

Categories

Related Projects

FuzzyAI

AgentDojo

AI Red Teaming Playground Labs

JailTrickBench