forge
ActiveDescription
Forge is a self-hosted LLM tool-calling framework for function calling and multi-step agentic workflows using local models like Ollama and llamafile.
Key Features
- Proxy server mode drops guardrails into any OpenAI-compatible or Anthropic client without code changes
- Rescue parsing handles malformed tool calls from Mistral, Qwen, and fenced-JSON formats automatically
- WorkflowRunner with SlotWorker provides priority-queued GPU access for multi-agent architectures
- Lifts small local models (8B) from single digits to 84% accuracy on structured tool-calling evals
- Composable middleware allows injecting response validation and retry nudges into custom orchestration loops
- Supports Ollama, llama-server, Llamafile, vLLM, and Anthropic as inference backends
Use Cases
π‘ Improve tool-calling reliability of self-hosted LLMs in agentic coding workflows
π‘ Wrap existing harnesses (opencode, aider, Cline) with guardrails via the proxy server
π‘ Run structured multi-step agent workflows with automatic context compaction
π‘ Share GPU inference slots across specialist agents with preemption-based scheduling
π‘ Validate and rescue malformed tool outputs in production LLM pipelines
Categories
Quick Start
Install via `pip install forge-guardrails`. Start a backend like llama-server, then run `python -m forge.proxy --backend-url http://localhost:8080 --port 8081`. Point your client at `http://localhost:8081/v1` and forge applies guardrails transparently.