Framework · Voice & Conversational

Pipecat

Open-source Python framework for building real-time voice and multimodal conversational agents by composing frame processors into pipelines that orchestrate STT, LLM, TTS, transports and tools.

Description

Pipecat is an open-source Python framework whose unit of design is a Pipeline of FrameProcessors that exchange Frames (audio, image, text, transcription, transport messages). Built-in services span STT (Deepgram, AssemblyAI, OpenAI Whisper, ...), LLMs (Anthropic, OpenAI, Gemini, ...), TTS (ElevenLabs, Google, Azure, ...) and speech-to-speech models (OpenAI Realtime, Gemini Multimodal Live). The framework also provides parallel pipelines, function calling, user-input muting and turn-detection strategies. Pipecat Flows layers structured conversation flows on top of pipelines; the Subagents framework adds multi-agent support.
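The composition idea above can be sketched in a few lines of plain Python. This is a toy model only: the class names mirror the vocabulary of the description, but the `process` method, the `Uppercase` and `DropEmpty` stages, and the synchronous return-a-list design are illustrative assumptions, not Pipecat's actual API (which is asynchronous).

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass
class Frame:
    """Base type for anything that travels through the pipeline."""


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class TranscriptionFrame(TextFrame):
    user_id: str = "user"


class FrameProcessor:
    """One stage: receives a frame, returns zero or more frames."""

    def process(self, frame: Frame) -> List[Frame]:
        return [frame]  # default behaviour: pass the frame through


class Uppercase(FrameProcessor):
    """Hypothetical stage that rewrites text frames, keeping their type."""

    def process(self, frame: Frame) -> List[Frame]:
        if isinstance(frame, TextFrame):
            return [replace(frame, text=frame.text.upper())]
        return [frame]


class DropEmpty(FrameProcessor):
    """Hypothetical stage that filters out blank text frames."""

    def process(self, frame: Frame) -> List[Frame]:
        if isinstance(frame, TextFrame) and not frame.text.strip():
            return []
        return [frame]


class Pipeline(FrameProcessor):
    """Chains processors; each stage's output feeds the next stage."""

    def __init__(self, processors: Iterable[FrameProcessor]):
        self.processors = list(processors)

    def process(self, frame: Frame) -> List[Frame]:
        frames = [frame]
        for processor in self.processors:
            frames = [out for f in frames for out in processor.process(f)]
        return frames
```

In this sketch a `Pipeline` is itself a `FrameProcessor`, so pipelines can be nested inside other pipelines; frames a stage does not recognise simply pass through.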

Solution

An asynchronous pipeline of frame processors. A transport feeds audio frames in; an STT service converts them to transcripts; an LLM service produces text or tool calls; a TTS service synthesizes audio, which streams back out through the transport. Control and system frames orchestrate the pipeline lifecycle. Parallel pipelines fan out with synchronized inputs and outputs. Function-calling frames interleave with the conversation. User-input muting suppresses user input while the bot is responding, and turn-detection strategies decide when a speaker's turn has ended.
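A minimal sketch of that flow, using only the standard library, may help make the stages concrete. Everything here is a stand-in: the frame classes, the `stub_stt`, `stub_llm` and `stub_tts` stages, and the `mute_while_bot_speaks` gate are hypothetical names, audio is represented as strings, and the bot-speaking control frame arrives on the input stream purely to keep the example short. It does not use or reproduce Pipecat's API.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class AudioIn:        # user audio chunk from the transport (stubbed as text)
    payload: str

@dataclass
class BotSpeaking:    # control frame: bot started (True) or stopped (False)
    active: bool

@dataclass
class Transcript:     # output of the STT stage
    text: str

@dataclass
class LLMText:        # output of the LLM stage
    text: str

@dataclass
class AudioOut:       # synthesized audio headed back to the transport
    payload: str


async def transport_in(events) -> AsyncIterator[object]:
    """Stand-in transport: yields frames one at a time."""
    for event in events:
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield event


async def mute_while_bot_speaks(frames) -> AsyncIterator[object]:
    """Drops user audio between BotSpeaking(True) and BotSpeaking(False)."""
    bot_speaking = False
    async for frame in frames:
        if isinstance(frame, BotSpeaking):
            bot_speaking = frame.active  # control frame is consumed here
        elif isinstance(frame, AudioIn) and bot_speaking:
            continue                     # muted: discard the user's audio
        else:
            yield frame


async def stub_stt(frames) -> AsyncIterator[object]:
    async for f in frames:
        yield Transcript(f.payload) if isinstance(f, AudioIn) else f


async def stub_llm(frames) -> AsyncIterator[object]:
    async for f in frames:
        yield LLMText(f"echo: {f.text}") if isinstance(f, Transcript) else f


async def stub_tts(frames) -> AsyncIterator[object]:
    async for f in frames:
        yield AudioOut(f"<audio:{f.text}>") if isinstance(f, LLMText) else f


async def run(events) -> List[object]:
    """Wires transport -> mute gate -> STT -> LLM -> TTS and drains it."""
    stream = transport_in(events)
    for stage in (mute_while_bot_speaks, stub_stt, stub_llm, stub_tts):
        stream = stage(stream)
    return [frame async for frame in stream]
```

Calling `asyncio.run(run([...]))` with a list of `AudioIn` and `BotSpeaking` frames returns only the `AudioOut` frames for audio that arrived while the bot was silent. Each stage pulls from the previous one lazily, so a frame can reach the TTS stub before the transport has produced the next one.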

Primary use cases

voice assistants and AI companions
multimodal interfaces over WebRTC
interactive storytelling and business agents
complex dialog systems with structured flows
