Voice Agent Production Guide: LiveKit Agents from Prototype to Millions of Concurrent Calls

LiveKit Agents (11.1k GitHub stars, powering ChatGPT's Advanced Voice) is an open-source framework for real-time voice agents. This article breaks down the voice pipeline and walks through building a production-ready agent.

AgentList Team · June 22, 2026
Tags: Voice Agent, LiveKit, Real-Time Voice, WebRTC, Voice AI, STT, TTS

Why Voice Agents Are Different

Voice agents must respond within roughly 500 ms end to end. On a phone call, a 3-second silence reads as a dropped connection, whereas a 2-5 s wait is acceptable in a text chatbot.

Pipeline Architecture

User Audio → VAD → STT → Agent (LLM) → TTS → User Audio
                  ↑_______________↓
              Turn Detection (Interruption Handling)
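
As a rough illustration of the flow above, the sketch below chains stub stages as async generators so that each stage consumes the previous one's output as it arrives. Every function here (mic, fake_vad, fake_stt, fake_llm, fake_tts) is a hypothetical stand-in rather than a LiveKit API, and the LLM stub simply waits for the whole user turn before replying, which is a simplification of real turn detection.

```python
import asyncio
from typing import AsyncIterator

# All stages below are hypothetical stubs for illustration, not LiveKit APIs.

async def mic(frames: list[bytes]) -> AsyncIterator[bytes]:
    """Pretend microphone: yield audio frames one at a time."""
    for frame in frames:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield frame

async def fake_vad(frames: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Pretend VAD: treat empty frames as silence and drop them."""
    async for frame in frames:
        if frame:
            yield frame

async def fake_stt(frames: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Pretend STT: emit one word per speech frame as soon as it arrives."""
    async for frame in frames:
        yield frame.decode()

async def fake_llm(words: AsyncIterator[str]) -> AsyncIterator[str]:
    """Pretend LLM: collect the full user turn, then stream reply tokens."""
    user_turn = " ".join([w async for w in words])
    for token in f"You said: {user_turn}".split():
        yield token

async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Pretend TTS: turn each token into an 'audio' chunk immediately."""
    async for token in tokens:
        yield token.upper().encode()

async def run_pipeline(frames: list[bytes]) -> list[bytes]:
    # User Audio -> VAD -> STT -> LLM -> TTS -> User Audio
    audio_out = fake_tts(fake_llm(fake_stt(fake_vad(mic(frames)))))
    return [chunk async for chunk in audio_out]

result = asyncio.run(run_pipeline([b"hello", b"", b"agent"]))
```

Because each stage is a generator, downstream stages can start work before upstream stages finish, which is the property a real streaming pipeline relies on to keep latency low.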

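Budget: VAD < 50 ms, STT < 300 ms, LLM < 200 ms, TTS < 200 ms. End-to-end target: 400-600 ms. Run strictly one after another, the per-stage ceilings add up to 750 ms, so meeting the target depends on the stages streaming and overlapping rather than each waiting for the previous one to finish.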
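
To make the arithmetic concrete, here is a small sketch that sums the per-stage ceilings and flags stages that exceed them. The budget numbers come from the line above; the over_budget helper and the sample measurements are hypothetical, for illustration only.

```python
# Per-stage ceilings and end-to-end target from the budget above (milliseconds).
STAGE_BUDGET_MS = {"vad": 50, "stt": 300, "llm": 200, "tts": 200}
TARGET_MS = (400, 600)

# Worst case if every stage ran to its ceiling, strictly in sequence.
sequential_ms = sum(STAGE_BUDGET_MS.values())

# How much overlap between stages is needed to reach the top of the target range.
required_overlap_ms = sequential_ms - TARGET_MS[1]

def over_budget(measured_ms: dict[str, float]) -> list[str]:
    """Return the stages whose measured latency reaches or exceeds its ceiling."""
    return [
        stage
        for stage, ms in measured_ms.items()
        if ms >= STAGE_BUDGET_MS[stage]
    ]

# Hypothetical measurements from one conversational turn.
sample = {"vad": 30, "stt": 340, "llm": 180, "tts": 120}
slow_stages = over_budget(sample)
```

A check like this is most useful when fed from per-turn telemetry, so that a regression in one stage is visible before it shows up as an end-to-end miss.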

LiveKit Agents Framework

livekit/agents (11.1k stars, Apache 2.0) powers ChatGPT's Advanced Voice mode. It combines the LiveKit WebRTC SFU, an Agent SDK (Python and Node.js), and a plugin ecosystem.

Plugins cover each pipeline stage:

  • STT: Deepgram, OpenAI Whisper, Azure, and more
  • TTS: Cartesia, ElevenLabs, OpenAI, Azure, Deepgram, and more
  • LLM: OpenAI Realtime, GPT, Claude, Groq, Together, Ollama

Tool / MCP support: Bring tools and MCP servers into the conversation via Function Calling.
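
The general shape of function calling can be sketched without any framework: tools are registered by name, the model emits a JSON call, and the runtime dispatches it and returns a JSON result. The sketch below is framework-agnostic and does not reproduce LiveKit's own tool-definition API; the tool decorator, lookup_order, and dispatch_tool_call are hypothetical names introduced for illustration.

```python
import json
from typing import Any, Callable

# Hypothetical in-process tool registry (not the LiveKit API).
TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_order(order_id: str) -> dict:
    """Hypothetical tool: return a canned order status."""
    return {"order_id": order_id, "status": "shipped"}

def dispatch_tool_call(call_json: str) -> str:
    """Run the tool named in a model-emitted call and return a JSON result."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Report unknown tools back to the model instead of raising.
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    return json.dumps(fn(**call.get("arguments", {})))

reply = dispatch_tool_call(
    '{"name": "lookup_order", "arguments": {"order_id": "A17"}}'
)
```

In a voice agent the tool result is fed back to the LLM, which then phrases it for TTS, so slow tools count against the same latency budget as the rest of the pipeline.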

Quick Start

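Install the SDK and plugins:

pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia

Then define the agent (for example in agent.py):

from livekit import agents
from livekit.agents import AgentServer, AgentSession, Agent, inference

server = AgentServer()

@server.rtc_session(agent_name="support-agent")
async def entrypoint(ctx: agents.JobContext):
    # One session per call: wire STT, LLM, and TTS together.
    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3", language="multi"),
        llm=inference.LLM(model="openai/chat-latest"),
        tts=inference.TTS(model="cartesia/sonic-3"),
    )
    # `instructions` is the agent's system prompt, not a spoken greeting.
    await session.start(room=ctx.room, agent=Agent(instructions="You are a helpful support agent."))
    # Have the agent speak first (assumes generate_reply is available in your SDK version).
    await session.generate_reply(instructions="Greet the caller and ask how you can help.")

if __name__ == "__main__":
    # Start the agent server from the command line.
    agents.cli.run_app(server)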

Production Considerations

  1. Interruption handling: Combine acoustic VAD with semantic turn detection (inference.TurnDetector()).
  2. Agent dispatch: Use lk dispatch create or the Python Server SDK to route calls.
  3. SIP integration: Connect to PSTN via LiveKit Phone Numbers or SIP Trunk — inbound, outbound, DTMF, recording.
  4. Observability: Transcripts, OpenTelemetry traces, turn-by-turn telemetry.
  5. Keep-alive: Prompt the caller after 15 s of silence so the line does not seem dropped.
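
The keep-alive item can be sketched as a silence watchdog: a timer that resets on any activity and fires a prompt when it expires. This is a framework-agnostic sketch, not LiveKit's implementation (the framework may provide a built-in equivalent); SilenceWatchdog, notify_activity, and prompt_user are hypothetical names, and the demo uses a 50 ms timeout in place of the 15 s from the list so that it finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable

class SilenceWatchdog:
    """Call `on_silence` each time `timeout_s` passes with no reported activity."""

    def __init__(self, timeout_s: float, on_silence: Callable[[], Awaitable[None]]):
        self._timeout_s = timeout_s
        self._on_silence = on_silence
        self._activity = asyncio.Event()

    def notify_activity(self) -> None:
        """Report user or agent speech; this restarts the silence timer."""
        self._activity.set()

    async def run(self) -> None:
        while True:
            try:
                # Wait for activity, but give up after the silence timeout.
                await asyncio.wait_for(self._activity.wait(), timeout=self._timeout_s)
                self._activity.clear()
            except asyncio.TimeoutError:
                await self._on_silence()

async def demo() -> list[str]:
    prompts: list[str] = []

    async def prompt_user() -> None:
        # A real agent would speak this through TTS.
        prompts.append("Are you still there?")

    watchdog = SilenceWatchdog(timeout_s=0.05, on_silence=prompt_user)
    task = asyncio.create_task(watchdog.run())
    await asyncio.sleep(0.02)
    watchdog.notify_activity()   # speech detected: timer restarts
    await asyncio.sleep(0.15)    # then a long silence
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return prompts

prompts = asyncio.run(demo())
```

In production the timeout would be the 15 s from the list above, and notify_activity would be driven by VAD or transcript events. Capping the number of consecutive prompts before hanging up is a reasonable extension.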

Deployment

  • LiveKit Cloud: Managed, global edge nodes, 50h free monthly
  • Self-hosted: Docker Compose for LiveKit Server + agents (data sovereignty)

Open-Source Comparison

Feature         LiveKit   Pipecat     Vocode
GitHub stars    11.1k     ~13k        ~3.8k
MCP support     Native    Community   None
SIP support     Native    DIY         Limited
Managed cloud   Yes       No          No

Summary

Three decisions matter most: STT/TTS selection (Deepgram plus Cartesia for the best latency/quality trade-off), interruption strategy (semantic turn detection is required), and deployment path (LiveKit Cloud for validation, self-hosting for scale).