🛡️

Best Security & Guardrails Top 20

Top 20 most popular open-source Security & Guardrails projects, ranked by GitHub Stars.

Anthropic Cybersecurity Skills

754 structured cybersecurity skills for AI agents mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND and NIST AI RMF. Works with Claude Code, Codex CLI, Cursor, Gemini CLI and 20+ platforms.

pythonsecurityagenttools

Promptfoo

22.8k Stars

CLI tool that combines LLM prompt testing with red-teaming.

promptfootestingred-teamcli

Promptfoo

22.8k Stars

Test and evaluate LLM prompts, agents, and RAG pipelines. Built-in red teaming and security evaluation for reliable AI applications.

testingevaluationred-teamingprompt-testing

SWE-agent

19.7k Stars

SWE-agent takes a GitHub issue and automatically generates fixes using your LLM of choice, also applicable to cybersecurity auditing and competitive coding. NeurIPS 2024 paper.

swecodingagentcybersecurity

OpenAI Evals

18.8k Stars

OpenAI's framework for evaluating LLMs and LLM systems, providing an open-source registry of benchmarks and tools for systematic model assessment.

llm-evaluationbenchmarkevalsred-teaming

PentAGI

18.0k Stars

Fully autonomous AI Agents system capable of performing complex penetration testing tasks using multi-agent architecture with support for multiple LLM providers.

securitytestingmulti-agentagent

PentestGPT

14.0k Stars

An automated penetration testing agentic framework powered by large language models for security testing and vulnerability discovery.

penetration-testingsecurityllmautomation

E2B

12.8k Stars

E2B provides secure cloud sandboxes for AI agents, supporting code execution, file operations, and isolated compute as an execution layer for coding and automation workflows.

sandboxcode-executionsecuritypython

Portkey AI Gateway

12.3k Stars

Portkey AI Gateway is a blazing fast AI gateway with integrated guardrails, routing to 200+ LLMs with 50+ AI guardrails through a single fast and friendly API.

gatewayllm-routingguardrailsai-safety

OpenSandbox

11.7k Stars

OpenSandbox is an open-source, secure, fast, and extensible sandbox runtime for AI agents, developed by Alibaba.

sandboxai-infrastructurekubernetessecurity

SkillSpector

11.6k Stars

NVIDIA's SkillSpector inspects and evaluates the tool-use and function-calling skills of LLM agents against safety, correctness, and performance criteria.

security-guardrailsmcpstatic-analysisnvidia

HexStrike AI

10.1k Stars

HexStrike AI is an advanced MCP server that lets AI agents autonomously run 150+ cybersecurity tools for automated pentesting, vulnerability discovery, and security research.

cybersecuritypentestingmcp-serversecurity

Presidio

9.8k Stars

Microsoft's open-source context-aware PII detection and de-identification SDK for text, images, and structured data, providing sensitive data protection for LLM applications and agents.

pii-detectiondata-maskingprivacynlp

GhidraMCP

9.4k Stars

MCP server for Ghidra reverse engineering platform, enabling AI agents to autonomously perform binary analysis and vulnerability discovery.

mcpreverse-engineeringghidrasecurity

CAI

9.3k Stars

Alias Robotics' open-source AI security research agent framework for multi-agent orchestration of cybersecurity tasks, integrating 300+ AI models, designed for red-team operations and security research.

cybersecurityai-agentsred-teampentest

Garak

8.3k Stars

NVIDIA's open-source LLM vulnerability scanner that automatically detects security issues in language models including safety vulnerabilities, hallucination tendencies, jailbreak risks, and prompt injection attacks.

llm-securityvulnerability-scannerllm-evaluationred-teaming

OpenShell

7.3k Stars

OpenShell is the safe, private runtime for autonomous AI agents, developed by NVIDIA. Provides controlled execution environments and resource management.

rustagentframeworksecurity

Guardrails AI

7.1k Stars

Guardrails AI adds programmable guardrails to large language models, ensuring reliability and safety through input/output validation, structured data extraction, and custom validators.

guardrailsllm-safetyvalidationoutput-validation

Guardrails AI

7.1k Stars

Open-source library for structured validation and safety guardrails on LLM outputs.

guardrailsvalidationsafetypython

Microsandbox

6.8k Stars

Secure, local, cross-platform and programmable sandboxes for AI agents. Provides strict resource isolation using microVM technology.

rustagenttoolssecurity

Agent 评估LLM 评测自动化测试

Agent Evaluation and Testing: From Vibe Checks to End-to-End Pipelines

Most teams evaluate agents by checking a few examples. Real evaluation needs layered metrics, non-rotting datasets, and judges that push back. This article provides runnable code patterns and a practical decision framework.

RAGhallucination-detectionagent-evaluation

Agent Hallucination Defense: Practical Mitigation Patterns Beyond Guardrails

Why do LLM agents hallucinate? This article traces root causes and systematically reviews practical mitigation patterns: retrieval augmentation, confidence scoring, multi-agent cross-validation, forced citation backtracking, and observability with UpTrain, Giskard, RagaAI Catalyst, Comet Opik, and NVIDIA Garak.

安全Prompt InjectionOWASP

Agent Prompt Injection Defense: OWASP LLM01 in Practice

Based on OWASP LLM Top 10 engineering practice, this article systematically explains the seven layers of defense-in-depth for agent prompt injection: input sanitization, instruction isolation, least-privilege, output auditing, guardrails frameworks, continuous red-teaming, and kill switches -- with actionable code and toolchains.

Best Security & Guardrails Top 20

Anthropic Cybersecurity Skills

Promptfoo

Promptfoo

SWE-agent

OpenAI Evals

PentAGI

PentestGPT

E2B

Portkey AI Gateway

OpenSandbox

SkillSpector

HexStrike AI

Presidio

GhidraMCP

CAI

Garak

OpenShell

Guardrails AI

Guardrails AI

Microsandbox

Related Articles

Agent Evaluation and Testing: From Vibe Checks to End-to-End Pipelines

Agent Hallucination Defense: Practical Mitigation Patterns Beyond Guardrails

Agent Prompt Injection Defense: OWASP LLM01 in Practice