forge

Active
GitHub Python MIT

Description

Forge is a self-hosted LLM tool-calling framework for function calling and multi-step agentic workflows using local models like Ollama and llamafile.

Key Features

  • Proxy server mode drops guardrails into any OpenAI-compatible or Anthropic client without code changes
  • Rescue parsing handles malformed tool calls from Mistral, Qwen, and fenced-JSON formats automatically
  • WorkflowRunner with SlotWorker provides priority-queued GPU access for multi-agent architectures
  • Lifts small local models (8B) from single digits to 84% accuracy on structured tool-calling evals
  • Composable middleware allows injecting response validation and retry nudges into custom orchestration loops
  • Supports Ollama, llama-server, Llamafile, vLLM, and Anthropic as inference backends

Use Cases

πŸ’‘ Improve tool-calling reliability of self-hosted LLMs in agentic coding workflows
πŸ’‘ Wrap existing harnesses (opencode, aider, Cline) with guardrails via the proxy server
πŸ’‘ Run structured multi-step agent workflows with automatic context compaction
πŸ’‘ Share GPU inference slots across specialist agents with preemption-based scheduling
πŸ’‘ Validate and rescue malformed tool outputs in production LLM pipelines

Quick Start

Install via `pip install forge-guardrails`. Start a backend like llama-server, then run `python -m forge.proxy --backend-url http://localhost:8080 --port 8081`. Point your client at `http://localhost:8081/v1` and forge applies guardrails transparently.

Related Projects