forge

Active

Description

Forge is a self-hosted LLM tool-calling framework for function calling and multi-step agentic workflows using local models like Ollama and llamafile.

Key Features

Proxy server mode drops guardrails into any OpenAI-compatible or Anthropic client without code changes
Rescue parsing handles malformed tool calls from Mistral, Qwen, and fenced-JSON formats automatically
WorkflowRunner with SlotWorker provides priority-queued GPU access for multi-agent architectures
Lifts small local models (8B) from single digits to 84% accuracy on structured tool-calling evals
Composable middleware allows injecting response validation and retry nudges into custom orchestration loops
Supports Ollama, llama-server, Llamafile, vLLM, and Anthropic as inference backends

Use Cases

💡 Improve tool-calling reliability of self-hosted LLMs in agentic coding workflows

💡 Wrap existing harnesses (opencode, aider, Cline) with guardrails via the proxy server

💡 Run structured multi-step agent workflows with automatic context compaction

💡 Share GPU inference slots across specialist agents with preemption-based scheduling

💡 Validate and rescue malformed tool outputs in production LLM pipelines

Quick Start

Install via `pip install forge-guardrails`. Start a backend like llama-server, then run `python -m forge.proxy --backend-url http://localhost:8080 --port 8081`. Point your client at `http://localhost:8081/v1` and forge applies guardrails transparently.

Visit GitHub

forge

Description

Key Features

Use Cases

Tags

Categories

Quick Start

Related Projects

DSPy

AgentStation

Hephaestus

Julep