Multi-Agent Collaboration Patterns: Supervisor vs Swarm vs Graph
A systematic comparison of three mainstream multi-agent collaboration patterns: Supervisor, Swarm, and Graph. Actionable selection criteria, applicable scenarios, typical frameworks, and mixed-use strategies.
When a single agent cannot handle a complex task, multi-agent collaboration becomes a necessary option. But "multi-agent" does not automatically mean "more powerful" -- a poorly chosen collaboration pattern often performs worse than a single agent, costs more, and is harder to maintain. This article compares three mainstream multi-agent collaboration patterns -- Supervisor, Swarm, and Graph -- from a production-engineering perspective, with actionable selection criteria and implementation templates.
Why Multi-Agent Collaboration Is Needed
A single LLM agent struggles with three categories of tasks.
First, tasks that demand deep specialization. A composite task like "research + write code + write documentation" handled by a single agent produces mediocre output in all three domains. Multi-agent mode lets a "researcher agent," a "coder agent," and a "technical writer agent" each focus on its own subtask.
Second, tasks with exploding state space. When an agent must repeatedly switch between tools, data sources, and steps, the context window inflates rapidly. Splitting into multiple agents lets each agent maintain its own state space, while the orchestrator only passes summaries and decisions.
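The summary-passing idea can be sketched without any framework. In the minimal example below, `SubAgentResult`, `Orchestrator`, `run_subagent`, and the 200-character cap are illustrative assumptions; a real system would more likely summarize with an LLM than truncate.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgentResult:
    """What a sub-agent hands back: a short summary, not its full transcript."""
    agent: str
    summary: str
    transcript_len: int  # kept only so the size difference is visible


@dataclass
class Orchestrator:
    """Holds only summaries, so its own context grows slowly."""
    context: list[str] = field(default_factory=list)

    def record(self, result: SubAgentResult) -> None:
        self.context.append(f"[{result.agent}] {result.summary}")


def run_subagent(name: str, steps: list[str], max_summary_chars: int = 200) -> SubAgentResult:
    """Hypothetical sub-agent: builds a private transcript, then returns a
    truncated summary (a stand-in for LLM summarization)."""
    transcript = [f"{name} step: {s}" for s in steps]  # private working state
    summary = "; ".join(steps)[:max_summary_chars]
    return SubAgentResult(agent=name, summary=summary, transcript_len=len(transcript))


orchestrator = Orchestrator()
orchestrator.record(run_subagent("researcher", ["search docs", "read 12 pages", "extract 3 facts"]))
orchestrator.record(run_subagent("coder", ["write parser", "run tests"]))
```

The orchestrator ends up with two short lines of context regardless of how long each sub-agent's private transcript was.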
Third, tasks requiring audit and permission boundaries. In enterprise scenarios, "data query agents" and "action execution agents" should have different identities, different permissions, and different audit trails. Multi-agent mode naturally supports this isolation.
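One way to picture this isolation is a per-agent tool allowlist with a per-agent audit log. The sketch below is an assumption-heavy illustration: `ScopedAgent`, the tool names, and the log format are all hypothetical, and the tool call itself is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedAgent:
    """An agent identity with an explicit tool allowlist and its own audit trail."""
    name: str
    allowed_tools: frozenset[str]
    audit_log: list[str] = field(default_factory=list)

    def call_tool(self, tool: str, argument: str) -> str:
        if tool not in self.allowed_tools:
            self.audit_log.append(f"DENIED {tool}({argument})")
            raise PermissionError(f"{self.name} may not call {tool}")
        self.audit_log.append(f"ALLOWED {tool}({argument})")
        return f"{tool} ok"  # placeholder for the real tool call


query_agent = ScopedAgent("data-query", frozenset({"read_table"}))
action_agent = ScopedAgent("action-exec", frozenset({"send_refund"}))

query_agent.call_tool("read_table", "orders")
try:
    query_agent.call_tool("send_refund", "order-42")
except PermissionError:
    pass  # the denial is recorded in the query agent's own audit log
```

Because each agent carries its own log, a reviewer can see that the query agent attempted a refund and was refused, without reading the action agent's trail at all.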
But multi-agent has significant costs: additional coordination overhead, debugging complexity, and token expense. Before adopting it, you must be confident the benefits outweigh the costs.
Pattern 1: Supervisor
The Supervisor pattern is the classic multi-agent architecture: a "conductor" agent handles task decomposition, sub-agent scheduling, and result aggregation. Sub-agents are independent and only report back to the Supervisor.
The Supervisor uses a state machine plus an LLM decision function to pick the next agent. Below is a LangGraph-based core implementation:
```python
from typing import Annotated

from langchain_core.messages import AIMessage, BaseMessage, SystemMessage
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict

# supervisor_llm, researcher_agent, coder_agent, and writer_agent are assumed
# to be defined elsewhere (a chat model and three invokable agents).


class SupervisorState(TypedDict):
    # add_messages appends returned messages instead of overwriting the list
    messages: Annotated[list[BaseMessage], add_messages]
    next_agent: str
    final_answer: str | None


ROUTES = {"researcher", "coder", "writer", "FINISH"}


def supervisor_node(state: SupervisorState) -> dict:
    """Ask the supervisor LLM which agent should act next."""
    response = supervisor_llm.invoke([
        SystemMessage(content=(
            "You are a supervisor managing three agents: "
            "researcher (web search), coder (code execution), writer (technical writing). "
            "Reply with one of: researcher, coder, writer, FINISH."
        )),
        *state["messages"],
    ])
    decision = response.content.strip()
    # A node must return a state update, so store the decision for the router.
    # Unexpected output ends the run instead of failing on an unknown route.
    return {"next_agent": decision if decision in ROUTES else "FINISH"}


def researcher_node(state: SupervisorState) -> dict:
    result = researcher_agent.invoke(state["messages"])
    return {"messages": [AIMessage(content=f"[Researcher] {result}")]}


def coder_node(state: SupervisorState) -> dict:
    result = coder_agent.invoke(state["messages"])
    return {"messages": [AIMessage(content=f"[Coder] {result}")]}


def writer_node(state: SupervisorState) -> dict:
    result = writer_agent.invoke(state["messages"])
    return {"messages": [AIMessage(content=f"[Writer] {result}")]}


workflow = StateGraph(SupervisorState)
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)
workflow.add_node("writer", writer_node)
# Route on the decision the supervisor node stored in the state
workflow.add_conditional_edges(
    "supervisor",
    lambda state: state["next_agent"],
    {"researcher": "researcher", "coder": "coder", "writer": "writer", "FINISH": END},
)
for name in ["researcher", "coder", "writer"]:
    workflow.add_edge(name, "supervisor")  # every sub-agent reports back
workflow.set_entry_point("supervisor")
app = workflow.compile()
```
Supervisor pattern strengths:
- Centralized decisions: all scheduling logic lives in one node, easy to audit
- Easy to understand: traditional master-worker model fits most mental models
- Strong observability: all conversations flow through the Supervisor, traces are clear
- Simple state management: single global state, no risk of sub-agent state drift
Supervisor pattern pain points:
- Supervisor is a single bottleneck: every decision is mediated by LLM calls, latency and cost compound
- Context inflation: the Supervisor must retain every sub-agent's conversation history
- Error propagation: one bad dispatch by the Supervisor can derail the entire task
- Hard to scale: adding a new sub-agent requires rewriting the Supervisor's prompt
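The bottleneck and context-inflation points can be made concrete with a little arithmetic. The function below is a rough model under stated assumptions: the Supervisor re-reads its system prompt plus all prior sub-agent output on every routing decision, and every sub-agent reply has the same size. The token figures are illustrative, not measurements.

```python
def supervisor_input_tokens(turns: int, prompt_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens the Supervisor reads over `turns` routing decisions."""
    total = 0
    history = 0
    for _ in range(turns):
        total += prompt_tokens + history  # prompt plus everything said so far
        history += tokens_per_turn        # the next sub-agent reply is appended
    return total


four_turns = supervisor_input_tokens(turns=4, prompt_tokens=100, tokens_per_turn=500)
eight_turns = supervisor_input_tokens(turns=8, prompt_tokens=100, tokens_per_turn=500)
print(four_turns, eight_turns)  # 3400 14800
```

Under this model, doubling the number of turns more than quadruples the Supervisor's input tokens, which is why compressing or summarizing sub-agent output matters as tasks get longer.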
Best for:
- Tasks with clear structure and well-defined sub-task boundaries
- Scenarios requiring strong audit trails (finance, healthcare, government)
- Teams that prioritize debugging and observability
Pattern 2: Swarm
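The Swarm pattern is a lightweight multi-agent approach that OpenAI introduced in 2024 with its experimental Swarm library and carried forward in the OpenAI Agents SDK: eliminate the Supervisor and let agents hand off directly to each other. Each agent decides whether to "pass" the task to another agent. The example below uses Agents SDK-style handoffs: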
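```python
# Uses the OpenAI Agents SDK (pip install openai-agents), imported as `agents`.
from agents import Agent, handoff

# Placeholder specialists; their instructions are illustrative.
billing_agent = Agent(name="Billing Agent", instructions="Handle billing, invoices, and refunds.")
tech_support_agent = Agent(name="Tech Support Agent", instructions="Resolve technical issues and how-to questions.")
sales_agent = Agent(name="Sales Agent", instructions="Handle upgrades, purchases, and pricing.")

triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You are the entry point. Based on the user's request, "
        "either answer directly or hand off to the appropriate specialist. "
        "Hand off when the request requires specialized handling."
    ),
    # Each handoff is exposed to the model as a tool; the description tells it when to use it.
    handoffs=[
        handoff(billing_agent, tool_description_override="When user asks about billing, invoices, or refunds"),
        handoff(tech_support_agent, tool_description_override="When user reports a technical issue or asks how-to questions"),
        handoff(sales_agent, tool_description_override="When user wants to upgrade, purchase, or learn about pricing"),
    ],
)
```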
Core mechanisms of the Swarm pattern:
- Each agent decides for itself whether to hand off (based on "handoff conditions" in its prompt)
- The full conversation history is passed during handoff
- There is no central coordinator; agents form a directed graph
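The three mechanisms above can be sketched without any framework. In this minimal illustration, `SwarmAgent`, `run_swarm`, and the keyword rules are hypothetical; the keyword rule stands in for the LLM's handoff decision, and there is deliberately no loop guard.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SwarmAgent:
    name: str
    # Returns the name of the agent to hand off to, or None to keep the task.
    decide_handoff: Callable[[str], Optional[str]]


def run_swarm(agents: dict[str, SwarmAgent], start: str, user_message: str) -> tuple[str, list[str]]:
    """Follow handoffs from `start` until some agent keeps the task.
    The full history travels with every handoff; nothing coordinates centrally."""
    history = [f"user: {user_message}"]
    current = agents[start]
    while True:
        target = current.decide_handoff(user_message)
        if target is None:
            history.append(f"{current.name}: handled")
            return current.name, history
        history.append(f"{current.name}: handoff -> {target}")
        current = agents[target]


def triage_rule(message: str) -> Optional[str]:
    text = message.lower()
    if "refund" in text:
        return "billing"
    if "error" in text:
        return "tech"
    return None


agents = {
    "triage": SwarmAgent("triage", triage_rule),
    "billing": SwarmAgent("billing", lambda message: None),
    "tech": SwarmAgent("tech", lambda message: None),
}
final_agent, history = run_swarm(agents, "triage", "I need a refund for my invoice")
```

The handoff targets of each agent are the outgoing edges of the directed graph; `run_swarm` only walks them.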
Swarm pattern strengths:
- Low latency: no Supervisor scheduling overhead
- Natural division of labor: each agent decides what it is "not good at"
- Easy to scale: add a new agent by including it in another agent's handoffs list
- Simple to implement: a few dozen lines of code are enough to get started
Swarm pattern pain points:
- No global view: there is no "conductor" aware of overall task progress
- Loop risk: agent A hands off to B, B hands off back to A
- Hard to debug: decisions are scattered across agents, traces are non-intuitive
- State inconsistency: each agent maintains its own context, cross-agent state merging is complex
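The loop risk in particular is cheap to guard against. The sketch below tracks the chain of agents visited and rejects a handoff that revisits an agent or exceeds a hop limit. `check_handoff`, `HandoffLoopError`, and the default of five hops are assumptions; rejecting every revisit is also a strict policy, since some flows legitimately return to triage.

```python
class HandoffLoopError(RuntimeError):
    """Raised when a handoff chain revisits an agent or runs too long."""


def check_handoff(path: list[str], target: str, max_hops: int = 5) -> list[str]:
    """Return the extended path, or raise if the handoff looks like a loop.

    `path` lists the agents visited so far, starting with the entry agent.
    """
    if target in path:
        raise HandoffLoopError(f"cycle: {' -> '.join(path + [target])}")
    if len(path) > max_hops:
        raise HandoffLoopError(f"more than {max_hops} hops: {' -> '.join(path)}")
    return path + [target]


path = check_handoff(["triage"], "billing")
try:
    check_handoff(path, "triage")  # billing tries to hand back to triage
    loop_detected = False
except HandoffLoopError:
    loop_detected = True
```

A runner would call `check_handoff` before switching agents and fall back to a default response, or to a human, when it raises.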
Best for:
- Customer-service triage (triage plus several specialists)
- Tasks with even granularity and no strong inter-agent dependencies
- Simple tasks that need a quick prototype
Pattern 3: Graph
The Graph pattern is the "state machine" multi-agent architecture introduced by frameworks like LangGraph: the entire system is modeled as a directed graph, where nodes are agents or functions and edges are conditional routes. Unlike Supervisor mode, Graph does not require a centralized decision-maker; routing rules can be jointly determined by code, prompts, or learned models.
```python
from langgraph.graph import END, StateGraph
from typing_extensions import TypedDict

# research_agent, fact_checker_agent, coder_agent, test_runner, and writer_agent
# are assumed to be defined elsewhere.


class ResearchState(TypedDict):
    query: str
    research_result: str | None
    verified_facts: list[str]
    code: str | None
    test_results: dict | None
    final_summary: str | None


# Each node returns only the keys it updates; LangGraph merges them into the state.
def research_node(state: ResearchState) -> dict:
    return {"research_result": research_agent.invoke(state["query"])}


def verify_node(state: ResearchState) -> dict:
    return {"verified_facts": fact_checker_agent.invoke(state["research_result"])}


def should_retry_research(state: ResearchState) -> str:
    # Retry research until at least three facts are verified
    return "research" if len(state["verified_facts"]) < 3 else "code"


def code_node(state: ResearchState) -> dict:
    return {"code": coder_agent.invoke(state["verified_facts"])}


def test_node(state: ResearchState) -> dict:
    return {"test_results": test_runner.run(state["code"])}


def should_fix_code(state: ResearchState) -> str:
    # Loop back to the coder while any test fails
    return "code" if state["test_results"]["failed"] > 0 else "summarize"


def summarize_node(state: ResearchState) -> dict:
    return {"final_summary": writer_agent.invoke({
        "research": state["research_result"],
        "code": state["code"],
        "tests": state["test_results"],
    })}


workflow = StateGraph(ResearchState)
workflow.add_node("research", research_node)
workflow.add_node("verify", verify_node)
workflow.add_node("code", code_node)
workflow.add_node("test", test_node)
workflow.add_node("summarize", summarize_node)
workflow.set_entry_point("research")
workflow.add_edge("research", "verify")
workflow.add_conditional_edges("verify", should_retry_research, {"research": "research", "code": "code"})
workflow.add_edge("code", "test")
workflow.add_conditional_edges("test", should_fix_code, {"code": "code", "summarize": "summarize"})
workflow.add_edge("summarize", END)
# Both retry loops are bounded at run time by LangGraph's recursion limit.
app = workflow.compile()
```
Graph pattern strengths:
- High expressiveness: supports conditional routes, loops, parallelism, timeouts, and other complex control flows
- High determinism: routing rules are controlled by code, behavior is predictable
- Excellent observability: graph execution is itself the trace; every node's input and output are clear
- Native support for human-in-the-loop: arbitrary nodes can be marked for human review
- Persistence-friendly: graph state is serializable, supports checkpointing and resumption
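The persistence point rests on one property: if the state is plain serializable data, a run can be snapshotted after each node and resumed later. LangGraph ships its own checkpointers; the stdlib sketch below only illustrates the underlying idea, and `run_with_checkpoints`, the `done` key, and the two toy nodes are hypothetical.

```python
import json
from typing import Callable

State = dict
Node = Callable[[State], State]


def run_with_checkpoints(nodes: list[tuple[str, Node]], state: State, checkpoints: list[str]) -> State:
    """Run nodes in order, skipping any already listed in state["done"],
    and append a JSON snapshot after each node completes."""
    for name, node in nodes:
        if name in state["done"]:
            continue  # finished before the interruption
        state = {**state, **node(state)}
        state["done"] = state["done"] + [name]
        checkpoints.append(json.dumps(state))
    return state


nodes = [
    ("research", lambda s: {"facts": ["f1", "f2"]}),
    ("code", lambda s: {"code": f"uses {len(s['facts'])} facts"}),
]
checkpoints: list[str] = []

# Simulate an interruption after the first node, then resume from the snapshot.
run_with_checkpoints(nodes[:1], {"done": []}, checkpoints)
resumed = json.loads(checkpoints[-1])
final = run_with_checkpoints(nodes, resumed, checkpoints)
```

On resume the `research` node is skipped and only `code` runs, which is the behavior a checkpointer gives a long-running graph.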
Graph pattern pain points:
- Implementation complexity: must design state schema, nodes, and routes
- Upfront design cost: the shape of the graph must be planned in advance
- Lower flexibility: fixed-path graphs struggle with completely novel tasks
- Steeper learning curve: requires understanding of state machines and directed graphs
Best for:
- Tasks with clear process and explicit stages
- Scenarios requiring human-in-the-loop
- Complex business logic (compliance, approval flows)
- Long-lived production systems
Comparison of the Three Patterns
| Dimension | Supervisor | Swarm | Graph |
|---|---|---|---|
| Decision centralization | Central (Supervisor) | Decentralized (per agent) | Encoded in graph edges (code or LLM routes) |
| Implementation complexity | Medium | Low | High |
| Debugging difficulty | Medium (clear traces) | High (scattered decisions) | Low (graph is the trace) |
| Scalability | Medium (Supervisor prompt must change) | High (add handoffs) | Medium (modify graph structure) |
| Best for | Structured tasks, strong audit | Simple triage, customer service | Complex workflows, production systems |
| Context management | Single global | Per agent | Explicit state schema |
| Loop risk | Low | High (handoff ping-pong between agents) | Very low (code-controlled) |
| Typical frameworks | LangGraph, CrewAI, Agno | OpenAI Swarm, Agno Swarm | LangGraph, Agno |
Practical Recommendations
Start with Supervisor. If your multi-agent task has five or fewer sub-agents, Supervisor mode is the easiest to understand and debug. If the Supervisor later becomes a bottleneck, migrating a working Supervisor design to Graph mode is easier than designing a Graph from scratch.
Swarm is suited for triage-style scenarios. If your multi-agent setup is essentially a triage plus a few specialists (customer service, ticket dispatch, lead routing), Swarm mode is lighter than Supervisor. But do not add "task progress tracking" logic into Swarm -- that breaks Swarm's "no center" advantage.
Use Graph for complex workflows. When your task involves loops (generate-verify-retry), parallelism (multiple researchers investigating simultaneously), or conditional branches (choose the next step based on intermediate results), Graph expresses these control flows most directly, because each one maps to explicit nodes and edges. For long-lived systems, that explicitness is what keeps a multi-agent design maintainable.
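Of the three control flows, parallelism is the one the earlier Graph example does not show. A fan-out step can be sketched with plain `asyncio`; here `research` and `fan_out` are hypothetical names, and the short sleep stands in for an LLM or search call.

```python
import asyncio


async def research(topic: str) -> str:
    """Hypothetical researcher; the sleep stands in for an LLM or search call."""
    await asyncio.sleep(0.01)
    return f"notes on {topic}"


async def fan_out(topics: list[str]) -> list[str]:
    """Run one researcher per topic concurrently; results keep the input order."""
    return await asyncio.gather(*(research(topic) for topic in topics))


results = asyncio.run(fan_out(["pricing", "competitors", "regulation"]))
```

A downstream node would then merge `results` into the shared state before the graph continues.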
Mix and match. Modern multi-agent frameworks (LangGraph, Agno) allow embedding Supervisor or Swarm sub-graphs inside Graph nodes, leveraging each pattern's strengths. For example: a top-level Graph controls overall flow (research -> implement -> test), and a single "implement" node is internally a Supervisor dispatching multiple sub-agents.
Implementation Checklist
Define task boundaries:
- Decomposed into five or fewer sub-tasks with no loops or branches: Supervisor is enough
- Decomposed into 6-20 sub-tasks, or any flow with loops/branches: Graph
- Simple "triage + specialist" pattern: Swarm
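These rules of thumb can be written down as a small decision function. The sketch below is one reading of the checklist, not a definitive rule: `choose_pattern` is a hypothetical helper, the single-agent branch reflects the article's advice to check whether one agent is enough, and treating any flow with loops or branches as a Graph candidate is an assumption.

```python
def choose_pattern(num_subtasks: int, has_loops_or_branches: bool, is_triage: bool) -> str:
    """Map the task-boundary checklist to a suggested pattern."""
    if num_subtasks <= 1:
        return "single-agent"  # no decomposition needed
    if is_triage:
        return "swarm"         # triage plus specialists
    if has_loops_or_branches or num_subtasks > 5:
        return "graph"         # explicit control flow or many stages
    return "supervisor"        # small, linear decomposition


suggestions = [
    choose_pattern(3, has_loops_or_branches=False, is_triage=False),
    choose_pattern(12, has_loops_or_branches=True, is_triage=False),
    choose_pattern(4, has_loops_or_branches=False, is_triage=True),
]
```

Here `suggestions` comes out as supervisor, graph, and swarm in that order.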
Choose a framework:
- LangGraph: preferred for Graph mode, LangChain ecosystem
- CrewAI: role-playing Supervisor with simplified role definitions
- Agno: lightweight, multi-pattern support
- OpenAI Swarm: educational Swarm framework
Key design principles:
- Give each agent clear "boundaries" -- what it should do, what it should not
- Maintain a clear "handoff protocol" -- what state, what history is passed during handoff
- Design "fallback paths" -- how to degrade when an agent fails
- Cap context windows -- long sessions must be compressed or truncated
- Monitor every agent's token consumption -- runaway agents are a common failure
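The last two principles are easy to prototype. The sketch below shows a per-agent token budget and a simple context cap that keeps the task message plus the most recent turns. `TokenBudget`, `cap_context`, and the limits used are illustrative assumptions; production systems often summarize older turns rather than drop them.

```python
from collections import defaultdict


class TokenBudget:
    """Tracks per-agent token use and flags agents that exceed a cap."""

    def __init__(self, per_agent_limit: int) -> None:
        self.per_agent_limit = per_agent_limit
        self.used: defaultdict[str, int] = defaultdict(int)

    def charge(self, agent: str, tokens: int) -> bool:
        """Record usage; return False once the agent is over budget."""
        self.used[agent] += tokens
        return self.used[agent] <= self.per_agent_limit


def cap_context(messages: list[str], max_messages: int) -> list[str]:
    """Keep the first message (the task) plus the most recent ones."""
    if max_messages < 2:
        raise ValueError("max_messages must be at least 2")
    if len(messages) <= max_messages:
        return messages
    return [messages[0]] + messages[-(max_messages - 1):]


budget = TokenBudget(per_agent_limit=1000)
first_ok = budget.charge("coder", 600)
second_ok = budget.charge("coder", 600)  # now over the 1000-token cap
trimmed = cap_context(["task", "a", "b", "c", "d"], max_messages=3)
```

When `charge` returns False, the orchestrator can stop scheduling that agent and take the fallback path instead of letting it run away.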
Summary
Multi-agent collaboration is not "the more the better" -- it is "use the right pattern for the right task." Supervisor fits structured tasks, Swarm fits triage scenarios, Graph fits complex production systems. The three are not mutually exclusive; you can embed Supervisor sub-graphs inside a Graph to combine their strengths.
Picking the wrong pattern costs far more than starting with a single agent would -- first evaluate whether a single agent is truly insufficient, then choose the simplest multi-agent pattern that works.
Reference frameworks: CrewAI (role-playing multi-agent), AG2 (AutoGen) (conversational multi-agent), LangGraph (de facto standard for Graph mode), OpenAI Swarm (lightweight Swarm paradigm), and Agno (multi-pattern multi-agent framework) cover the engineering implementations of the three main paradigms.
Projects in this article
- CrewAI (54.6k ⭐): a multi-agent framework for orchestrating role-playing, autonomous AI agents that collaborate like a team to tackle complex tasks.
- AG2 (4.7k ⭐): formerly AutoGen, an open-source AgentOS providing a multi-agent conversation framework with flexible agent orchestration, tool integration, and distributed collaboration for building complex multi-agent systems.
- LangGraph (36.2k ⭐): a framework for building controllable, debuggable, long-running stateful agents, expressing agent state and control flow as a graph.
- OpenAI Swarm (21.8k ⭐): a lightweight multi-agent collaboration framework focused on simplicity and controllability, ideal for learning and prototyping.
- Agno (40.9k ⭐): a high-performance agent framework for building multimodal AI agents with memory, knowledge, and tool-use capabilities, supporting multiple LLM providers.