🛡️

Security & Guardrails

AI safety evaluation, red-teaming, LLM guardrails, vulnerability scanning, and compliance audit tools

🏆 Top 20 Ranking

62 projects

Promptfoo

22.5k · TypeScript

Active

CLI tool that combines LLM prompt testing with red-teaming.

promptfoo · testing · red-team +1

Promptfoo

22.5k · TypeScript

Active

Test and evaluate LLM prompts, agents, and RAG pipelines. Built-in red teaming and security evaluation for reliable AI applications.

testing · evaluation · red-teaming +2

SWE-agent

19.6k · Python

Active

SWE-agent takes a GitHub issue and automatically generates a fix using your LLM of choice. It can also be applied to cybersecurity auditing and competitive coding. NeurIPS 2024 paper.

swe · coding · agent +2

Anthropic Cybersecurity Skills

18.8k · Python

Active

754 structured cybersecurity skills for AI agents mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND and NIST AI RMF. Works with Claude Code, Codex CLI, Cursor, Gemini CLI and 20+ platforms.

python · security · agent +2

OpenAI Evals

18.7k · Python

Normal

OpenAI's framework for evaluating LLMs and LLM systems, providing an open-source registry of benchmarks and tools for systematic model assessment.

llm-evaluation · benchmark · evals +2

PentAGI

17.9k · Go

Active

Fully autonomous AI agent system that performs complex penetration testing tasks using a multi-agent architecture, with support for multiple LLM providers.

security · testing · multi-agent +2

PentestGPT

13.9k · Python

Active

An agentic framework for automated penetration testing, powered by large language models, for security testing and vulnerability discovery.

penetration-testing · security · llm +2

E2B

12.7k · Python

Active

E2B provides secure cloud sandboxes for AI agents, supporting code execution, file operations, and isolated compute as an execution layer for coding and automation workflows.

sandbox · code-execution · security +1

Portkey AI Gateway

12.2k · TypeScript

Active

Portkey AI Gateway is a fast AI gateway that routes requests to 200+ LLMs and applies 50+ integrated guardrails through a single API.

gateway · llm-routing · guardrails +2

OpenSandbox

11.6k · Python

Active

OpenSandbox is an open-source, secure, fast, and extensible sandbox runtime for AI agents, developed by Alibaba.

sandbox · ai-infrastructure · kubernetes +2

HexStrike AI

9.8k · Python

Normal

HexStrike AI is an advanced MCP server that lets AI agents autonomously run 150+ cybersecurity tools for automated pentesting, vulnerability discovery, and security research.

cybersecurity · pentesting · mcp-server +2

Presidio

9.5k · Python

Active

Microsoft's open-source context-aware PII detection and de-identification SDK for text, images, and structured data, providing sensitive data protection for LLM applications and agents.

pii-detection · data-masking · privacy +2

SkillSpector

9.4k · Python

Active

NVIDIA's SkillSpector inspects and evaluates the tool-use and function-calling skills of LLM agents against safety, correctness, and performance criteria.

security-guardrails · mcp · static-analysis +1

GhidraMCP

9.3k · Java

Stale

MCP server for the Ghidra reverse engineering platform, enabling AI agents to autonomously perform binary analysis and vulnerability discovery.

mcp · reverse-engineering · ghidra +2

CAI

9.2k · Python

Active

Alias Robotics' open-source AI security research agent framework for multi-agent orchestration of cybersecurity tasks, integrating 300+ AI models, designed for red-team operations and security research.

cybersecurity · ai-agents · red-team +2

Garak

8.2k · Python

Active

NVIDIA's open-source LLM vulnerability scanner that automatically detects security issues in language models including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.

llm-security · vulnerability-scanner · llm-evaluation +2

OpenShell

7.2k · Rust

Active

OpenShell, developed by NVIDIA, is a safe, private runtime for autonomous AI agents. It provides controlled execution environments and resource management.

rust · agent · framework +2

Guardrails AI

7.0k · Python

Active

Guardrails AI adds programmable guardrails to large language models, ensuring reliability and safety through input/output validation, structured data extraction, and custom validators.

guardrails · llm-safety · validation +2

Guardrails AI

7.0k · Python

Active

Open-source library for structured validation and safety guardrails on LLM outputs.

guardrails · validation · safety +1

Microsandbox

6.6k · Rust

Active

Secure, local, cross-platform, and programmable sandboxes for AI agents, with strict resource isolation using microVM technology.

rust · agent · tools +2

Superagent

6.6k · TypeScript

Normal

Superagent protects AI applications against prompt injections, data leaks, and harmful outputs, embedding safety directly into your app.

ai-safety · guardrails · agent-tools +2

NeMo Guardrails

6.5k · Python

Active

NVIDIA NeMo Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational systems, supporting topic control, safety enforcement, and dialog guidance.

guardrails · llm-safety · nvidia +2

NeMo Guardrails

6.5k · Python

Active

NVIDIA's LLM conversational guardrails framework with programmable safety boundaries.

nemo · guardrails · nvidia +1

Giskard

5.4k · Python

Active

An open-source evaluation and testing library for LLM agents providing automated model scanning, bias detection, performance benchmarking, and compliance checks.

evaluation · testing · llm-safety +3

(24 / 62)

Agent Evaluation · LLM Evaluation · Automated Testing

Agent Evaluation and Testing: From Vibe Checks to End-to-End Pipelines

Most teams evaluate agents by checking a few examples. Real evaluation needs layered metrics, non-rotting datasets, and judges that push back. This article provides runnable code patterns and a practical decision framework.

security-guardrails · red-team · prompt-injection

AI Agent Guardrails and Red Teaming in Practice: From Rule Engines to Adversarial Evaluation

Five-layer defense plus red-team loop, built on five open-source projects you can copy.

AI Agent · Security · Prompt Injection

AI Agent Security in Practice: From Prompt Injection to Defense in Depth

A systematic walkthrough of three major attack surfaces in AI agents, with practical code examples for prompt injection defense, tool permission scoping, and output filtering.

AI Coding · Coding Agent · CLI

AI Coding Agents Deep Dive: Architecture Trade-offs from CLI to IDE-Integrated

A deep architectural comparison of seven open-source coding agents across three paradigms — CLI-first, IDE-integrated, and fully autonomous — examining context management, tool access, and autonomy levels to help you pick the right tool for each development scenario.

AI Agent · Sandbox · Code Execution

Sandboxing AI Agents: Isolation Strategies for Safe Code Execution

Comparing container, WebAssembly, and process-level isolation approaches, with practical code for safely executing agent-generated code.

llm-gateway · model-routing · cost-optimization

LLM Routing and Multi-Model Gateways in Practice: A Production-Grade Multi-Model Architecture

Four LLM gateways compared, with production patterns for fallback, smart routing, cost observability, and scheduling.
