🛡️

Security & Guardrails

AI safety evaluation, red-teaming, LLM guardrails, vulnerability scanning, and compliance audit tools

43 projects

HexStrike AI

HexStrike AI is an advanced MCP server that lets AI agents autonomously run 150+ cybersecurity tools for automated pentesting, vulnerability discovery, and security research.

cybersecuritypentestingmcp-server +2

Giskard

5.3k · Python

Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluationtestingllm-safety +3

MCP Context Forge

3.6k · Python

Active

An AI Gateway, registry, and proxy by IBM that sits in front of any MCP, A2A, or REST/gRPC APIs, exposing a unified endpoint with centralized discovery, guardrails, and management.

mcpa2aapi-gateway +4

GhidraMCP

8.5k · Java

Stale

MCP server for Ghidra reverse engineering platform, enabling AI agents to autonomously perform binary analysis and vulnerability discovery.

mcpreverse-engineeringghidra +2

NeMo Guardrails

6.0k · Python

Active

NVIDIA NeMo Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational systems, supporting topic control, safety enforcement, and dialog guidance.

guardrailsllm-safetynvidia +2

OpenShell

5.1k · Rust

Active

OpenShell is the safe, private runtime for autonomous AI agents, developed by NVIDIA. Provides controlled execution environments and resource management.

rustagentframework +2

Garak

7.6k · HTML

Active

NVIDIA's open-source LLM vulnerability scanner that automatically detects security issues in language models including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.

llm-securityvulnerability-scannerllm-evaluation +2

Portkey AI Gateway

11.4k · TypeScript

Active

Portkey AI Gateway is a blazing fast AI gateway with integrated guardrails, routing to 200+ LLMs with 50+ AI guardrails through a single fast and friendly API.

gatewayllm-routingguardrails +2

SWE-ReX

485 · Python

Active

Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.

sandboxcode-executionswe-agent +3

SWE-agent

19.0k · Python

Active

SWE-agent takes a GitHub issue and automatically generates fixes using your LLM of choice, also applicable to cybersecurity auditing and competitive coding. NeurIPS 2024 paper.

swecodingagent +2

DeepCamera

2.7k · JavaScript

Active

Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. LLM-powered agentic security camera agent with pluggable AI skills. Runs on Mac Mini & AI PC.

javascriptagenttools +3

AI-Infra-Guard

3.5k · Python

Active

Tencent's full-stack AI red teaming platform integrating OpenClaw security scanning, agent scanning, skills scanning, MCP scanning, AI infrastructure scanning, and LLM jailbreak evaluation.

ai-securityred-teamingllm-security +2

Inspect AI

1.9k · Python

Active

A framework for large language model evaluations developed by the UK AI Safety Institute (AISI), providing comprehensive model capability assessment tools with support for safety and alignment testing.

llm-evaluationai-safetyevaluation-framework +2

Arrakis

802 · Go

Stale

Arrakis is a fully customizable and self-hosted sandboxing solution written in Go, designed specifically for AI agent code execution scenarios, providing a secure isolated runtime environment.

goagentsecurity +2

AgentShield

510 · TypeScript

Active

AI agent security scanner that detects vulnerabilities in agent configurations, MCP servers, and tool permissions. Available as CLI, GitHub Action, and GitHub App integration.

typescriptsecurityllm +2

AgentGateway

2.4k · Rust

Active

Next generation agentic proxy for AI agents and MCP servers. Provides unified traffic management, routing, and security control.

rustmcpagent +3

OpenSandbox

10.1k · Python

Active

OpenSandbox is an open-source, secure, fast, and extensible sandbox runtime for AI agents, developed by Alibaba.

sandboxai-infrastructurekubernetes +2

Archestra

3.6k · TypeScript

Active

Enterprise AI Platform with guardrails, MCP registry, gateway and orchestrator — comprehensive AI agent governance and management.

typescriptmcpsecurity +2

UQLM

1.1k · Python

Active

CVS Health's open-source uncertainty quantification library for language models, providing UQ-based hallucination detection with confidence scoring and mitigation tools to identify and reduce unreliable LLM outputs.

hallucination-detectionuncertainty-quantificationllm-evaluation +2

E2B

11.8k · Python

Active

E2B provides secure cloud sandboxes for AI agents, supporting code execution, file operations, and isolated compute as an execution layer for coding and automation workflows.

sandboxcode-executionsecurity +1

Agent Safehouse

1.6k · Shell

Active

Sandbox your local AI agents so they can only read and write what they need. File system permission control for secure local agent execution.

sandboxsecurityagent-tools +2

PentestGPT

12.7k · Python

Normal

An automated penetration testing agentic framework powered by large language models for security testing and vulnerability discovery.

penetration-testingsecurityllm +2

Guardrails AI

6.7k · Python

Active

Guardrails AI adds programmable guardrails to large language models, ensuring reliability and safety through input/output validation, structured data extraction, and custom validators.

guardrailsllm-safetyvalidation +2

LLM-Jailbreaks

603 ·

Stale

A comprehensive collection of LLM jailbreak techniques and prompts for ChatGPT, Claude, Llama, and other models — essential reference for LLM security research.

llmsecurityprompt-engineering

Purple Llama

4.1k · Python

Active

Meta's set of tools to assess and improve LLM security, including safety benchmarks, prompt injection detection, and output auditing to help evaluate and enhance the safety of large language models.

securityevaluationpython +2

PyRIT

3.7k · Python

Active

The Python Risk Identification Tool for generative AI — an open-source framework by Microsoft for proactively identifying risks in generative AI systems through red teaming and automated probing.

pythonsecurityevaluation +2

Agent Governance Toolkit

1.1k · Python

Active

Microsoft's AI Agent Governance Toolkit providing policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.

securityevaluationpython +2

Agentic Security

1.8k · Python

Normal

An open-source LLM vulnerability scanner and AI red teaming kit for automated security fuzzing of LLM applications, detecting jailbreaks, prompt injection, and adversarial attacks.

llm-securityred-teamingllm-fuzzer +2

Anthropic Cybersecurity Skills

5.3k · Python

Active

754 structured cybersecurity skills for AI agents mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND and NIST AI RMF. Works with Claude Code, Codex CLI, Cursor, Gemini CLI and 20+ platforms.

pythonsecurityagent +2

OpenAI Evals

18.2k · Python

Active

OpenAI's framework for evaluating LLMs and LLM systems, providing an open-source registry of benchmarks and tools for systematic model assessment.

llm-evaluationbenchmarkevals +2

AI Agents From Scratch

3.4k · JavaScript

Active

Demystify AI agents by building them yourself. Local LLMs, no black boxes, real understanding of function calling, memory, and ReAct patterns.

javascriptagentevaluation +2

LLM Guard

2.8k · Python

Stale

The security toolkit for LLM interactions, providing prompt injection detection, PII anonymization, content safety auditing, and more to secure production LLM deployments.

securityllmpython +2

Rebuff

1.5k · TypeScript

Stale

An LLM prompt injection detector that combines heuristics, vector similarity, and language model-based detection to identify and block malicious prompt injection attacks.

securityllmtesting +2

Rogue

1.0k · Python

Active

AI Agent Evaluator and Red Team Platform. Provides systematic security evaluation and adversarial testing tools to discover and fix vulnerabilities in agent systems.

securityevaluationobservability +2

Agent Scan

2.2k · Python

Active

Security scanner for AI agents, MCP servers, and agent skills by Snyk — detect and fix security vulnerabilities before deployment.

pythonsecuritymcp +2

Agentic Radar

953 · Python

Stale

A security scanner for LLM agentic workflows. Automatically detects security vulnerabilities, prompt injection risks, and permission violations in agent pipelines before deployment.

securityagentpython +2

CodeGate

784 · Python

Stale

Security gateway for AI coding agents providing security protection, workspace isolation, and multiplexing, supporting Claude, Copilot, Cline, and other IDE extensions to prevent sensitive data leaks and malicious prompt injections.

pythonsecurityagent +5

ToolHive

1.7k · Go

Active

An enterprise-grade platform for running and managing MCP servers with containerized deployment, security isolation, network policies, resource limits, and unified management of large-scale MCP server fleets via Kubernetes or Docker.

mcptoolsgo +4

Superagent

6.5k · TypeScript

Active

Superagent protects AI applications against prompt injections, data leaks, and harmful outputs, embedding safety directly into your app.

ai-safetyguardrailsagent-tools +2

Microsandbox

5.6k · Rust

Active

Secure, local, cross-platform and programmable sandboxes for AI agents. Provides strict resource isolation using microVM technology.

rustagenttools +2

Prompt Injection Defenses

677 ·

Stale

Every practical and proposed defense against prompt injection — a comprehensive reference for LLM security practitioners.

securityllmprompt-engineering

PentAGI

15.3k · Go

Active

Fully autonomous AI Agents system capable of performing complex penetration testing tasks using multi-agent architecture with support for multiple LLM providers.

securitytestingmulti-agent +2

Awesome-LLM-Safety

1.8k · HTML

Normal

A curated collection of safety-related papers, articles, and resources focused on Large Language Models — comprehensive reference for researchers and practitioners exploring LLM safety implications and advancements.

llmsecurityevaluation