Multi-Agent Collaboration Patterns: Supervisor vs Swarm vs Graph

When a single agent cannot handle a complex task, multi-agent collaboration becomes a necessary option. But "multi-agent" does not automatically mean "more powerful" -- a system built on the wrong collaboration pattern often performs worse than a single agent, costs more, and is harder to maintain. This article provides a production-engineering comparison of three mainstream multi-agent collaboration patterns -- Supervisor, Swarm, and Graph -- with actionable selection criteria and implementation templates.

Why Multi-Agent Collaboration Is Needed

A single LLM agent struggles with three categories of tasks.

First, tasks that demand deep specialization. A composite task like "research + write code + write documentation" handled by a single agent produces mediocre output in all three domains. Multi-agent mode lets a "researcher agent," a "coder agent," and a "technical writer agent" each focus on its own subtask.

Second, tasks with exploding state space. When an agent must repeatedly switch between tools, data sources, and steps, the context window inflates rapidly. Splitting into multiple agents lets each agent maintain its own state space, while the orchestrator only passes summaries and decisions.
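The idea of keeping state local and passing only summaries upward can be sketched without any framework. The example below is a minimal illustration under stated assumptions: `SubAgent`, `orchestrate`, and the fake "steps" are hypothetical stand-ins for real LLM and tool calls.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    # Private working context; it never leaves the agent.
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        # Stand-in for many tool calls and LLM turns.
        self.history += [f"step {i} for {task}" for i in range(1, 4)]
        # Only a one-line summary is returned to the orchestrator.
        return f"{self.name}: finished '{task}' in {len(self.history)} steps"


def orchestrate(tasks: dict[str, str]) -> list[str]:
    """Run one sub-agent per task and collect summaries only."""
    orchestrator_context: list[str] = []
    for agent_name, task in tasks.items():
        agent = SubAgent(agent_name)
        orchestrator_context.append(agent.run(task))
    return orchestrator_context


summaries = orchestrate({"researcher": "survey", "coder": "prototype"})
```

The orchestrator's context grows by one line per sub-agent, regardless of how many internal steps each sub-agent took.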

Third, tasks requiring audit and permission boundaries. In enterprise scenarios, "data query agents" and "action execution agents" should have different identities, different permissions, and different audit trails. Multi-agent mode naturally supports this isolation.
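One way to picture this isolation is a per-agent tool allowlist with its own audit log. The sketch below is hypothetical: `AgentIdentity`, the tool names, and the log format are assumptions made for illustration, not part of any framework mentioned in this article.

```python
from dataclasses import dataclass, field


@dataclass
class AgentIdentity:
    name: str
    allowed_tools: frozenset[str]
    audit_log: list[str] = field(default_factory=list)

    def call_tool(self, tool: str, tools: dict) -> str:
        """Run a tool only if this agent is permitted; record every attempt."""
        if tool not in self.allowed_tools:
            self.audit_log.append(f"DENIED {tool}")
            raise PermissionError(f"{self.name} may not call {tool}")
        self.audit_log.append(f"ALLOWED {tool}")
        return tools[tool]()


# Hypothetical tools standing in for a database query and a side-effecting action.
TOOLS = {
    "query_orders": lambda: "3 open orders",
    "issue_refund": lambda: "refund issued",
}

query_agent = AgentIdentity("data-query", frozenset({"query_orders"}))
action_agent = AgentIdentity("action-exec", frozenset({"issue_refund"}))
```

Because each identity carries its own log, a denied call by the query agent shows up in that agent's trail without touching the action agent's.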

But a multi-agent design has significant costs: additional coordination overhead, debugging complexity, and token expense. Adopt it only when you are confident the benefits outweigh those costs.

Pattern 1: Supervisor

The Supervisor pattern is the classic multi-agent architecture: a "conductor" agent handles task decomposition, sub-agent scheduling, and result aggregation. Sub-agents are independent and only report back to the Supervisor.

The Supervisor uses a state machine plus an LLM decision function to pick the next agent. Below is a LangGraph-based core implementation:

from typing import Annotated

from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage, AIMessage, SystemMessage

# supervisor_llm, researcher_agent, coder_agent, and writer_agent are assumed
# to be defined elsewhere (a chat model and three invokable agents).

class SupervisorState(TypedDict):
    # add_messages appends returned messages instead of overwriting the list
    messages: Annotated[list[BaseMessage], add_messages]
    next_agent: str
    final_answer: str | None

def supervisor_node(state: SupervisorState) -> dict:
    # Ask the LLM which agent should act next and store the choice in state.
    response = supervisor_llm.invoke([
        SystemMessage(content=(
            "You are a supervisor managing three agents: "
            "researcher (web search), coder (code execution), writer (technical writing). "
            "Reply with one of: researcher, coder, writer, FINISH."
        )),
        *state["messages"]
    ])
    return {"next_agent": response.content.strip()}

# Each worker returns a state update rather than mutating state in place.
def researcher_node(state: SupervisorState) -> dict:
    result = researcher_agent.invoke(state["messages"])
    return {"messages": [AIMessage(content=f"[Researcher] {result}")]}

def coder_node(state: SupervisorState) -> dict:
    result = coder_agent.invoke(state["messages"])
    return {"messages": [AIMessage(content=f"[Coder] {result}")]}

def writer_node(state: SupervisorState) -> dict:
    result = writer_agent.invoke(state["messages"])
    return {"messages": [AIMessage(content=f"[Writer] {result}")]}

workflow = StateGraph(SupervisorState)
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)
workflow.add_node("writer", writer_node)

# Route on the decision stored by the supervisor; "FINISH" ends the run.
workflow.add_conditional_edges(
    "supervisor",
    lambda state: state["next_agent"],
    {"researcher": "researcher", "coder": "coder", "writer": "writer", "FINISH": END}
)
# Every worker reports back to the supervisor.
for n in ["researcher", "coder", "writer"]:
    workflow.add_edge(n, "supervisor")
workflow.set_entry_point("supervisor")
app = workflow.compile()

Supervisor pattern strengths:

Centralized decisions: all scheduling logic lives in one node, easy to audit
Easy to understand: traditional master-worker model fits most mental models
Strong observability: all conversations flow through the Supervisor, traces are clear
Simple state management: single global state, no risk of sub-agent state drift

Supervisor pattern pain points:

Supervisor is a single bottleneck: every decision is mediated by LLM calls, latency and cost compound
Context inflation: the Supervisor must retain every sub-agent's conversation history
Error propagation: one bad dispatch by the Supervisor can derail the entire task
Hard to scale: adding a new sub-agent requires rewriting the Supervisor's prompt
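The error-propagation and cost concerns above can be softened by validating the Supervisor's reply and capping the number of dispatches. The following is a minimal sketch, not part of LangGraph: `parse_decision`, `VALID_AGENTS`, and the default budget of 8 steps are assumptions chosen for illustration.

```python
VALID_AGENTS = {"researcher", "coder", "writer"}


def parse_decision(raw: str, steps_taken: int, max_steps: int = 8) -> str:
    """Map a raw supervisor reply to a safe routing decision."""
    if steps_taken >= max_steps:
        return "FINISH"  # step budget exhausted, stop dispatching
    # Tolerate stray whitespace, a trailing period, and casing differences.
    decision = raw.strip().strip(".").lower()
    if decision == "finish":
        return "FINISH"
    # Anything that is not a known agent ends the run instead of crashing the router.
    return decision if decision in VALID_AGENTS else "FINISH"
```

Ending the run on an unknown reply is one possible policy; retrying the Supervisor call or escalating to a human are equally reasonable choices depending on the task.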

Best for:

Tasks with clear structure and well-defined sub-task boundaries
Scenarios requiring strong audit trails (finance, healthcare, government)
Debugging and observability as priorities

Pattern 2: Swarm

The Swarm pattern is a lightweight multi-agent approach that OpenAI introduced in 2024: it eliminates the Supervisor and lets agents hand off directly to each other. Each agent decides whether to pass the task to another agent.

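# Uses the OpenAI Agents SDK (package "openai-agents", imported as "agents"),
# the successor to the experimental Swarm library.
from agents import Agent, handoff

# Specialist agents; instructions are shortened for illustration.
billing_agent = Agent(name="Billing Agent", instructions="Handle billing, invoices, and refunds.")
tech_support_agent = Agent(name="Tech Support Agent", instructions="Resolve technical issues and answer how-to questions.")
sales_agent = Agent(name="Sales Agent", instructions="Handle upgrades, purchases, and pricing questions.")

triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You are the entry point. Based on the user's request, "
        "either answer directly or hand off to the appropriate specialist. "
        "Hand off when the request requires specialized handling."
    ),
    handoffs=[
        # tool_description_override tells the model when to choose each handoff
        handoff(billing_agent, tool_description_override="When user asks about billing, invoices, or refunds"),
        handoff(tech_support_agent, tool_description_override="When user reports a technical issue or asks how-to questions"),
        handoff(sales_agent, tool_description_override="When user wants to upgrade, purchase, or learn about pricing"),
    ]
)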

Core mechanisms of the Swarm pattern:

Each agent decides for itself whether to hand off (based on "handoff conditions" in its prompt)
The full conversation history is passed during handoff
There is no central coordinator; agents form a directed graph
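The three mechanisms above can be sketched in plain Python, independent of any SDK. In this hypothetical illustration each agent is a function that receives the full history and returns a reply plus the name of the next agent (or `None` to stop); `triage`, `billing`, and `run_swarm` are invented names, and keyword matching stands in for an LLM's handoff decision.

```python
from typing import Callable, Optional

# Each agent: full history in, (reply, next agent name or None) out.
AgentFn = Callable[[list[str]], tuple[str, Optional[str]]]


def triage(history: list[str]) -> tuple[str, Optional[str]]:
    text = history[0].lower()
    if "refund" in text or "invoice" in text:
        return "Routing you to billing.", "billing"
    return "I can answer that directly.", None


def billing(history: list[str]) -> tuple[str, Optional[str]]:
    return "Your refund has been submitted.", None


AGENTS: dict[str, AgentFn] = {"triage": triage, "billing": billing}


def run_swarm(user_message: str, start: str = "triage") -> tuple[str, list[str]]:
    """Follow handoffs with no coordinator; return the last agent and the history."""
    history = [user_message]
    current = start
    while True:
        reply, next_agent = AGENTS[current](history)
        history.append(f"[{current}] {reply}")
        if next_agent is None:
            return current, history
        current = next_agent  # the whole history travels with the handoff
```

Note that `run_swarm` only follows whatever the current agent returns; it makes no routing decision of its own, which is the defining property of the pattern.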

Swarm pattern strengths:

Low latency: no Supervisor scheduling overhead
Natural division of labor: each agent decides what it is "not good at"
Easy to scale: add a new agent by including it in another agent's handoffs list
Simple to implement: a few dozen lines of code are enough to get started

Swarm pattern pain points:

No global view: there is no "conductor" aware of overall task progress
Loop risk: agent A hands off to B, B hands off back to A
Hard to debug: decisions are scattered across agents, traces are non-intuitive
State inconsistency: beyond the shared conversation history, each agent keeps its own working context, so merging state across agents is complex
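The loop risk in particular can be contained with a small guard around the handoff chain. The sketch below is an assumption-laden illustration: `guarded_handoffs` and its `route` argument (a function from agent name to the next agent name, or `None` when the agent answers) are hypothetical and not part of any Swarm library.

```python
from typing import Callable, Optional


def guarded_handoffs(
    route: Callable[[str], Optional[str]], start: str, max_hops: int = 5
) -> list[str]:
    """Follow handoffs until an agent answers, a ping-pong appears, or hops run out."""
    path = [start]
    while True:
        nxt = route(path[-1])
        if nxt is None:
            return path  # the current agent answered
        if len(path) >= 2 and nxt == path[-2]:
            # A handed off to B and B is handing straight back to A.
            raise RuntimeError(f"handoff ping-pong: {path[-2]} <-> {path[-1]}")
        if len(path) - 1 >= max_hops:
            raise RuntimeError(f"exceeded {max_hops} handoffs: {path}")
        path.append(nxt)
```

This only detects the immediate A-B-A case; longer cycles are caught by the hop limit rather than by cycle detection.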

Best for:

Customer-service triage (triage plus several specialists)
Tasks with even granularity and no strong inter-agent dependencies
Simple tasks that need a quick prototype

Pattern 3: Graph

The Graph pattern is the "state machine" multi-agent architecture introduced by frameworks like LangGraph: the entire system is modeled as a directed graph, where nodes are agents or functions and edges are conditional routes. Unlike Supervisor mode, Graph does not require a centralized decision-maker; routing rules can be jointly determined by code, prompts, or learned models.

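from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict

# research_agent, fact_checker_agent, coder_agent, writer_agent, and test_runner
# are assumed to be defined elsewhere.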

class ResearchState(TypedDict):
    query: str
    research_result: str | None
    verified_facts: list[str]
    code: str | None
    test_results: dict | None
    final_summary: str | None

def research_node(state: ResearchState):
    return {"research_result": research_agent.invoke(state["query"])}

def verify_node(state: ResearchState):
    return {"verified_facts": fact_checker_agent.invoke(state["research_result"])}

def should_retry_research(state: ResearchState) -> str:
    return "research" if len(state["verified_facts"]) < 3 else "code"

def code_node(state: ResearchState):
    return {"code": coder_agent.invoke(state["verified_facts"])}

def test_node(state: ResearchState):
    return {"test_results": test_runner.run(state["code"])}

def should_fix_code(state: ResearchState) -> str:
    return "code" if state["test_results"]["failed"] > 0 else "summarize"

def summarize_node(state: ResearchState):
    return {"final_summary": writer_agent.invoke({
        "research": state["research_result"],
        "code": state["code"],
        "tests": state["test_results"],
    })}

workflow = StateGraph(ResearchState)
workflow.add_node("research", research_node)
workflow.add_node("verify", verify_node)
workflow.add_node("code", code_node)
workflow.add_node("test", test_node)
workflow.add_node("summarize", summarize_node)

workflow.set_entry_point("research")
workflow.add_edge("research", "verify")
workflow.add_conditional_edges("verify", should_retry_research, {"research": "research", "code": "code"})
workflow.add_edge("code", "test")
workflow.add_conditional_edges("test", should_fix_code, {"code": "code", "summarize": "summarize"})
workflow.add_edge("summarize", END)

app = workflow.compile()
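Both retry loops in the graph above run until their condition clears, so a research step that never yields three verified facts would cycle indefinitely. One common remedy is to count attempts in state and route to a fallback once a budget is spent. The sketch below shows only the routing logic in plain Python; `BoundedState`, `research_attempts`, `MAX_RESEARCH_ATTEMPTS`, and the `give_up` target are hypothetical additions, not part of the original graph.

```python
from typing import TypedDict

MAX_RESEARCH_ATTEMPTS = 3  # hypothetical retry budget


class BoundedState(TypedDict):
    verified_facts: list[str]
    research_attempts: int


def research_step(state: BoundedState) -> dict:
    # A real node would also call the research agent; this only counts the attempt.
    return {"research_attempts": state["research_attempts"] + 1}


def route_after_verify(state: BoundedState) -> str:
    """Retry research while facts are insufficient, but only within the budget."""
    if len(state["verified_facts"]) >= 3:
        return "code"
    if state["research_attempts"] >= MAX_RESEARCH_ATTEMPTS:
        return "give_up"  # hypothetical fallback node, e.g. escalate to a human
    return "research"
```

To use a router like this in the graph, the `give_up` target would need its own node and an entry in the conditional-edge mapping; the same counter technique applies to the code/test loop.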

Graph pattern strengths:

High expressiveness: supports conditional routes, loops, parallelism, timeouts, and other complex control flows
High determinism: routing rules are controlled by code, behavior is predictable
Excellent observability: graph execution is itself the trace; every node's input and output are clear
Native support for human-in-the-loop: arbitrary nodes can be marked for human review
Persistence-friendly: graph state is serializable, supports checkpointing and resumption
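The persistence point follows from the state being plain data: if the state and the name of the next node can be serialized, a run can stop and resume later. The sketch below illustrates the idea with the standard library only; it does not use LangGraph's own checkpointer classes, and `save_checkpoint`, `resume`, and the two toy nodes are hypothetical.

```python
import json


def save_checkpoint(state: dict, next_node: str) -> str:
    """Serialize the state together with the node that should run next."""
    return json.dumps({"next_node": next_node, "state": state})


def resume(checkpoint: str, nodes: dict) -> dict:
    """Reload a checkpoint and keep running nodes until one returns no successor."""
    data = json.loads(checkpoint)
    state, node = data["state"], data["next_node"]
    while node is not None:
        update, node = nodes[node](state)
        state = {**state, **update}  # merge the partial update into the state
    return state


# Toy nodes: each returns (state update, name of next node or None).
def code_step(state: dict):
    return {"code": "print('hi')"}, "summarize"


def summarize_step(state: dict):
    return {"final_summary": f"{state['query']}: done"}, None


NODES = {"code": code_step, "summarize": summarize_step}

checkpoint = save_checkpoint({"query": "q"}, "code")
final_state = resume(checkpoint, NODES)
```

A human-review pause fits the same shape: save the checkpoint before the node that needs approval and call `resume` only after the reviewer signs off.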

Graph pattern pain points:

Implementation complexity: must design state schema, nodes, and routes
Upfront design cost: the shape of the graph must be planned in advance
Lower flexibility: fixed-path graphs struggle with completely novel tasks
Steeper learning curve: requires understanding of state machines and directed graphs

Best for:

Tasks with clear process and explicit stages
Scenarios requiring human-in-the-loop
Complex business logic (compliance, approval flows)
Long-lived production systems

Comparison of the Three Patterns

Dimension	Supervisor	Swarm	Graph
Decision centralization	Central (Supervisor)	Decentralized (per agent)	Encoded in graph edges (code-defined routes)
Implementation complexity	Medium	Low	High
Debugging difficulty	Medium (clear traces)	High (scattered decisions)	Low (graph is the trace)
Scalability	Medium (Supervisor prompt must change)	High (add handoffs)	Medium (modify graph structure)
Best for	Structured tasks, strong audit	Simple triage, customer service	Complex workflows, production systems
Context management	Single global	Per agent	Explicit state schema
Loop risk	Low	High (handoff ping-pong between agents)	Very low (code-controlled)
Typical frameworks	LangGraph, CrewAI, Agno	OpenAI Swarm, Agno Swarm	LangGraph, Agno

Practical Recommendations

Start with Supervisor. If your multi-agent task has five or fewer sub-agents, Supervisor mode is the easiest to understand and straightforward to debug. Even if the Supervisor later becomes a bottleneck, migrating a working Supervisor to Graph mode is usually easier than designing a Graph from scratch.

Swarm is suited for triage-style scenarios. If your multi-agent setup is essentially a triage plus a few specialists (customer service, ticket dispatch, lead routing), Swarm mode is lighter than Supervisor. But do not add "task progress tracking" logic into Swarm -- that breaks Swarm's "no center" advantage.

Use Graph for complex workflows. When your task involves loops (generate-verify-retry), parallelism (multiple researchers investigating simultaneously), or conditional branches (choose the next step based on intermediate results), Graph expresses these patterns most directly, because the control flow lives in code rather than in prompts. That makes Graph mode the most maintainable choice for multi-agent systems of this kind.

Mix and match. Modern multi-agent frameworks (LangGraph, Agno) allow embedding Supervisor or Swarm sub-graphs inside Graph nodes, leveraging each pattern's strengths. For example: a top-level Graph controls overall flow (research -> implement -> test), and a single "implement" node is internally a Supervisor dispatching multiple sub-agents.
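The nesting described above can be pictured without any framework: an outer function fixes the research -> implement -> test order, while the implement step delegates to an inner dispatcher. This is a hypothetical sketch; `run_pipeline`, `implement_supervisor`, the worker names, and the fixed `plan` are assumptions, and a real inner Supervisor would ask an LLM which worker to call next.

```python
def implement_supervisor(task: str) -> dict:
    """Inner Supervisor: dispatches sub-agents in an order it decides itself."""
    workers = {
        "backend": lambda t: f"backend code for {t}",
        "frontend": lambda t: f"frontend code for {t}",
    }
    plan = ["backend", "frontend"]  # stand-in for LLM-driven dispatch
    return {name: workers[name](task) for name in plan}


def run_pipeline(task: str) -> dict:
    """Outer Graph: fixed research -> implement -> test flow."""
    state = {"task": task}
    state["research"] = f"notes on {task}"
    state["implementation"] = implement_supervisor(task)
    # Toy "test" stage: every worker must have produced some output.
    state["tests_passed"] = all(state["implementation"].values())
    return state


result = run_pipeline("login page")
```

The outer flow never needs to know how many workers the inner Supervisor used, which is what keeps the two patterns composable.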

Implementation Checklist

Define task boundaries:

Decomposed into five or fewer sub-tasks: Supervisor is enough
Decomposed into more than five sub-tasks (up to roughly 20) with loops/branches: Graph
Simple "triage + specialist" pattern: Swarm
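These boundary rules are simple enough to write down as a function, which can be handy in a design review. The sketch below is one possible encoding: `choose_pattern` and its parameter names are hypothetical, and the order in which the rules are checked (triage first, then loops or size) is an assumption rather than something the checklist prescribes.

```python
def choose_pattern(num_subtasks: int, has_loops_or_branches: bool, is_triage: bool) -> str:
    """Suggest a collaboration pattern from the task-boundary checklist."""
    if num_subtasks <= 1:
        return "single agent"  # nothing to coordinate
    if is_triage:
        return "swarm"  # triage plus specialists
    if has_loops_or_branches or num_subtasks > 5:
        return "graph"
    return "supervisor"  # five or fewer sub-tasks, linear flow
```

For example, a four-step linear task maps to Supervisor, while the same task with a generate-verify-retry loop maps to Graph.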

Choose a framework:

LangGraph: preferred for Graph mode, LangChain ecosystem
CrewAI: role-playing Supervisor with simplified role definitions
Agno: lightweight, multi-pattern support
OpenAI Swarm: educational Swarm framework

Key design principles:

Give each agent clear "boundaries" -- what it should do, what it should not
Maintain a clear "handoff protocol" -- what state, what history is passed during handoff
Design "fallback paths" -- how to degrade when an agent fails
Cap context windows -- long sessions must be compressed or truncated
Monitor every agent's token consumption -- runaway agents are a common failure
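The last two principles, capping context and monitoring token use, can be sketched together. Everything here is an illustrative assumption: `estimate_tokens` uses a crude four-characters-per-token guess instead of a real tokenizer, and `trim_history` and `TokenMeter` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: about four characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order


class TokenMeter:
    """Track estimated token use per agent against a fixed limit."""

    def __init__(self, limit_per_agent: int):
        self.limit = limit_per_agent
        self.used: dict[str, int] = {}

    def record(self, agent: str, text: str) -> bool:
        """Add usage; return False once the agent is over its limit."""
        self.used[agent] = self.used.get(agent, 0) + estimate_tokens(text)
        return self.used[agent] <= self.limit
```

Truncation is the simplest policy; summarizing the dropped messages into one line is a common alternative when older context still matters.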

Summary

Multi-agent collaboration is not "the more the better" -- it is "use the right pattern for the right task." Supervisor fits structured tasks, Swarm fits triage scenarios, Graph fits complex production systems. The three are not mutually exclusive; you can embed Supervisor sub-graphs inside a Graph to combine their strengths.

Picking the wrong pattern costs far more than starting with a single agent, so first evaluate whether a single agent is truly insufficient, then choose the simplest multi-agent pattern that works.

Reference frameworks: CrewAI (role-playing multi-agent), AG2 (AutoGen) (conversational multi-agent), LangGraph (de facto standard for Graph mode), OpenAI Swarm (lightweight Swarm paradigm), and Agno (multi-pattern multi-agent framework) cover the engineering implementations of the three main paradigms.
