Voice Agent Production Guide: LiveKit Agents from Prototype to Millions of Concurrent Calls

Why Voice Agents Are Different

Voice agents need end-to-end latency of around 500 ms. On a phone call, a 3-second silence feels like a dropped connection, whereas chatbot users tolerate response times of 2-5 s.

Pipeline Architecture

User Audio → VAD → STT → Agent (LLM) → TTS → User Audio
                  ↑_______________↓
              Turn Detection (Interruption Handling)
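
To make the flow above concrete, here is a minimal sketch of the pipeline as chained async generators. Every stage is a stand-in invented for illustration (string "frames" instead of audio, a fixed echo reply instead of a model), and turn detection and interruption handling are left out. It only shows how each stage consumes the previous stage's output as it arrives.

```python
import asyncio
from typing import AsyncIterator

async def mic() -> AsyncIterator[str]:
    """Hypothetical audio source: each string is one frame, '' is silence."""
    for frame in ["hi ", "there", "", "bye", ""]:
        yield frame

async def vad(frames: AsyncIterator[str]) -> AsyncIterator[list]:
    """Group voiced frames into utterances, closing one on each silent frame."""
    utterance = []
    async for frame in frames:
        if frame:
            utterance.append(frame)
        elif utterance:
            yield utterance
            utterance = []

async def stt(utterances: AsyncIterator[list]) -> AsyncIterator[str]:
    """Stand-in transcription: join the frames of an utterance into text."""
    async for frames in utterances:
        yield "".join(frames)

async def llm(texts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in agent: echo the transcript back as a reply."""
    async for text in texts:
        yield f"You said: {text}"

async def tts(replies: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in synthesis: wrap the reply in a fake audio marker."""
    async for reply in replies:
        yield f"<audio:{reply}>"

async def run_pipeline() -> list:
    # User Audio -> VAD -> STT -> Agent (LLM) -> TTS -> User Audio
    return [chunk async for chunk in tts(llm(stt(vad(mic()))))]

print(asyncio.run(run_pipeline()))
# ['<audio:You said: hi there>', '<audio:You said: bye>']
```

Because each stage is a generator, the first reply is produced as soon as the first utterance closes, before the second utterance has been read.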

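Per-stage budget: VAD < 50 ms, STT < 300 ms, LLM < 200 ms, TTS < 200 ms. End-to-end target: 400-600 ms. The stage ceilings add up to 750 ms, so the target is reachable only if stages stream into one another or finish well under their ceilings.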
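
One way to keep an eye on this budget is to compare each turn's measured stage latencies against the ceilings. The sketch below is an assumption about how that might look. The names STAGE_BUDGET_MS, TurnReport, and check_turn are hypothetical, and the measured values are made up for the example.

```python
from dataclasses import dataclass

# Per-stage ceilings in milliseconds, taken from the budget above.
STAGE_BUDGET_MS = {"vad": 50, "stt": 300, "llm": 200, "tts": 200}

@dataclass
class TurnReport:
    over_budget: dict           # stage -> milliseconds over its ceiling
    sequential_total_ms: float  # cost of the turn if no stages overlapped

def check_turn(measured_ms: dict) -> TurnReport:
    """Compare one turn's measured stage latencies against the ceilings."""
    over = {
        stage: measured_ms[stage] - ceiling
        for stage, ceiling in STAGE_BUDGET_MS.items()
        if measured_ms.get(stage, 0) > ceiling
    }
    return TurnReport(over_budget=over, sequential_total_ms=sum(measured_ms.values()))

# Made-up measurements for a single turn.
report = check_turn({"vad": 30, "stt": 320, "llm": 180, "tts": 150})
print(report.over_budget)          # {'stt': 20}
print(report.sequential_total_ms)  # 680
```

Here sequential_total_ms is the worst case with no overlap between stages. A streaming pipeline should come in below it.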

LiveKit Agents Framework

livekit/agents (11.1k GitHub stars, Apache 2.0 license) powers ChatGPT's Advanced Voice mode. It is built on the LiveKit WebRTC SFU, an Agent SDK (Python and Node.js), and a plugin ecosystem.

STT plugins: Deepgram, OpenAI Whisper, Azure, and more.
TTS plugins: Cartesia, ElevenLabs, OpenAI, Azure, Deepgram, and more.
LLM plugins: OpenAI Realtime, GPT, Claude, Groq, Together, Ollama.

Tool / MCP support: Expose tools and MCP servers to the agent through function calling.
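
Function calling boils down to this: the model emits a tool name plus JSON arguments, and the runtime looks up the function and runs it. The sketch below shows that dispatch step with the standard library only. It is not the LiveKit API. The registry TOOLS, the tool decorator, dispatch, and the lookup_order tool are all hypothetical names made up for illustration.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so a model can call it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_order(order_id: str) -> str:
    # Hypothetical tool; a real one would query an order system.
    return f"Order {order_id} ships tomorrow."

def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-emitted call and return its result."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"Unknown tool: {call['name']}"
    return fn(**call["arguments"])

print(dispatch('{"name": "lookup_order", "arguments": {"order_id": "A17"}}'))
# Order A17 ships tomorrow.
```

In a voice agent the returned string would go back to the LLM, which then phrases the spoken answer.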

Quick Start

pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia

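from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, inference

server = AgentServer()

@server.rtc_session(agent_name="support-agent")
async def entrypoint(ctx: agents.JobContext):
    # STT, LLM, and TTS are each selected by a model string.
    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3", language="multi"),
        llm=inference.LLM(model="openai/chat-latest"),
        tts=inference.TTS(model="cartesia/sonic-3"),
    )
    # `instructions` is the agent's system prompt, not its first utterance.
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful support agent."),
    )
    # Have the agent speak first with a greeting.
    await session.generate_reply(instructions="Greet the caller and ask how you can help.")

if __name__ == "__main__":
    # Start the worker through the LiveKit Agents CLI.
    agents.cli.run_app(server)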

Production Considerations

Interruption handling: Combine acoustic VAD with semantic turn detection (inference.TurnDetector()).
Agent dispatch: Use lk dispatch create or the Python Server SDK to route calls.
SIP integration: Connect to the PSTN via LiveKit Phone Numbers or a SIP trunk, with support for inbound and outbound calls, DTMF, and recording.
Observability: Transcripts, OpenTelemetry traces, turn-by-turn telemetry.
Keep-alive: If the caller stays silent for 15 s, have the agent prompt them so the line does not seem dropped.
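
The keep-alive item above can be sketched as a resettable timer. This is a minimal asyncio illustration, not a LiveKit feature. SilenceWatchdog and its methods are hypothetical names, and the demo uses a 0.2 s timeout in place of 15 s so it finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, Optional

class SilenceWatchdog:
    """Calls on_silence once if no activity is reported for timeout_s seconds."""

    def __init__(self, timeout_s: float, on_silence: Callable[[], Awaitable[None]]):
        self._timeout_s = timeout_s
        self._on_silence = on_silence
        self._task: Optional[asyncio.Task] = None

    def activity(self) -> None:
        """Restart the countdown; call this whenever the user or agent speaks."""
        self.stop()
        self._task = asyncio.create_task(self._countdown())

    def stop(self) -> None:
        """Cancel any pending countdown, for example when the call ends."""
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _countdown(self) -> None:
        await asyncio.sleep(self._timeout_s)
        await self._on_silence()

async def demo() -> tuple:
    prompts = []

    async def prompt_user() -> None:
        prompts.append("Are you still there?")

    watchdog = SilenceWatchdog(timeout_s=0.2, on_silence=prompt_user)
    watchdog.activity()
    await asyncio.sleep(0.1)
    watchdog.activity()           # speech resets the countdown
    await asyncio.sleep(0.1)      # only 0.1 s since the last activity
    fired_early = bool(prompts)   # still False: the timer was reset
    await asyncio.sleep(0.3)      # silence now exceeds the timeout
    watchdog.stop()
    return fired_early, prompts

print(asyncio.run(demo()))
```

In a real agent, activity() would be wired to speech events from both sides, and on_silence would ask the session to speak a short check-in.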

Deployment

LiveKit Cloud: Managed service with global edge nodes and 50 hours free per month.
Self-hosted: Docker Compose for LiveKit Server plus agents, for cases where data sovereignty matters.

Open-Source Comparison

Feature	LiveKit	Pipecat	Vocode
GitHub stars	11.1k	~13k	~3.8k
MCP support	Native	Community	None
SIP telephony	Native	DIY	Limited
Managed cloud	Yes	No	No

Summary

Three decisions matter most: STT/TTS selection (Deepgram plus Cartesia for the best latency/quality balance), interruption strategy (semantic turn detection is required), and deployment path (Cloud for validation, self-hosting for scale or data sovereignty).
