Pipecat
Open-source Python framework for building real-time voice and multimodal conversational agents by composing frame processors into pipelines that orchestrate STT, LLM, TTS, transports and tools.
Description
Pipecat is an open-source Python framework whose unit of design is a Pipeline of FrameProcessors that exchange Frames (audio, image, text, transcription, transport messages). Built-in services span STT (Deepgram, AssemblyAI, OpenAI Whisper, ...), LLMs (Anthropic, OpenAI, Gemini, ...), TTS (ElevenLabs, Google, Azure, ...) and speech-to-speech models (OpenAI Realtime, Gemini Multimodal Live), plus parallel pipelines, function calling, user-input muting and turn-detection strategies. Pipecat Flows layers structured conversation flows on top of the core framework, and the Subagents framework adds multi-agent support.
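To make the Frame / FrameProcessor vocabulary concrete, here is a minimal pure-Python sketch of the idea. It assumes nothing from Pipecat itself: every class and helper below (`Frame`, `AudioFrame`, `TextFrame`, `Processor`, `FakeSTT`, `Upper`, `Collector`, `link`) is a hypothetical stand-in, not Pipecat's actual API. Each processor handles one frame at a time and pushes its result to the next processor, so a pipeline is simply a chain of processors.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical base type for everything that flows through the chain."""


@dataclass
class AudioFrame(Frame):
    pcm: bytes  # toy "audio": raw bytes that happen to be UTF-8 text


@dataclass
class TextFrame(Frame):
    text: str


class Processor:
    """Hypothetical frame processor: handle a frame, then push results downstream."""

    def __init__(self) -> None:
        self.next: Optional["Processor"] = None

    async def process_frame(self, frame: Frame) -> None:
        # Default behaviour: forward frames untouched.
        await self.push_frame(frame)

    async def push_frame(self, frame: Frame) -> None:
        if self.next is not None:
            await self.next.process_frame(frame)


class FakeSTT(Processor):
    """Stand-in for an STT service: turns audio frames into text frames."""

    async def process_frame(self, frame: Frame) -> None:
        if isinstance(frame, AudioFrame):
            await self.push_frame(TextFrame(frame.pcm.decode()))
        else:
            await self.push_frame(frame)


class Upper(Processor):
    """Toy text transform standing in for an LLM step."""

    async def process_frame(self, frame: Frame) -> None:
        if isinstance(frame, TextFrame):
            frame = TextFrame(frame.text.upper())
        await self.push_frame(frame)


class Collector(Processor):
    """Sink that records every frame reaching the end of the chain."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: List[Frame] = []

    async def process_frame(self, frame: Frame) -> None:
        self.seen.append(frame)


def link(*processors: Processor) -> Processor:
    """Wire processors into a linear chain and return its head."""
    for upstream, downstream in zip(processors, processors[1:]):
        upstream.next = downstream
    return processors[0]


sink = Collector()
head = link(FakeSTT(), Upper(), sink)
asyncio.run(head.process_frame(AudioFrame(b"hello pipecat")))
print(sink.seen)  # [TextFrame(text='HELLO PIPECAT')]
```

The design point the sketch tries to capture is that processors only know about frames and their immediate neighbour, so services can be swapped or reordered without touching each other.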
Solution
Asynchronous pipeline of frame processors. A transport feeds audio frames in; an STT service converts them to transcripts; an LLM service produces text or tool calls; a TTS service streams audio out via the transport. Control and system frames orchestrate the pipeline lifecycle. Parallel pipelines fan out with synchronized inputs and outputs. Function-calling frames are interleaved with the conversation. User-input muting strategies suppress user input while the bot is responding, and turn-detection strategies decide when the user has finished speaking.
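The flow above can be approximated with plain `asyncio` queues: one task per stage, frames passed in order from queue to queue, an end-of-stream system frame that each stage forwards before shutting down, and a muting stage that drops user audio while a bot-speaking control frame is active. This is a toy sketch under stated assumptions, not Pipecat code: the frame classes, the `stage` runner and the `MuteWhileBotSpeaks` filter are hypothetical, the STT/LLM/TTS steps are string stubs, and the bot-speaking signal is injected in-band with the input purely to keep the example deterministic.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """Hypothetical base frame."""


@dataclass
class AudioIn(Frame):
    data: str  # toy "audio": the words the user spoke


@dataclass
class Transcript(Frame):
    text: str


@dataclass
class Reply(Frame):
    text: str


@dataclass
class AudioOut(Frame):
    data: str


@dataclass
class BotSpeaking(Frame):
    active: bool  # control frame: True when the bot starts talking, False when it stops


@dataclass
class End(Frame):
    """System frame: tells every stage to shut down."""


Handler = Callable[[Frame], List[Frame]]


async def stage(handler: Handler, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Run one processor: read frames, transform them, forward results in order."""
    while True:
        frame = await inbox.get()
        if isinstance(frame, End):
            await outbox.put(frame)  # propagate shutdown downstream, then stop
            return
        for out in handler(frame):
            await outbox.put(out)


class MuteWhileBotSpeaks:
    """Hypothetical muting strategy: drop user audio while the bot is talking."""

    def __init__(self) -> None:
        self.bot_speaking = False

    def __call__(self, frame: Frame) -> List[Frame]:
        if isinstance(frame, BotSpeaking):
            self.bot_speaking = frame.active
            return []  # consume the control frame
        if isinstance(frame, AudioIn) and self.bot_speaking:
            return []  # muted: user input is suppressed
        return [frame]


def stt(frame: Frame) -> List[Frame]:
    return [Transcript(frame.data)] if isinstance(frame, AudioIn) else [frame]


def llm(frame: Frame) -> List[Frame]:
    return [Reply(f"you said: {frame.text}")] if isinstance(frame, Transcript) else [frame]


def tts(frame: Frame) -> List[Frame]:
    return [AudioOut(frame.text.upper())] if isinstance(frame, Reply) else [frame]


async def run(inputs: List[Frame]) -> List[Frame]:
    handlers: List[Handler] = [MuteWhileBotSpeaks(), stt, llm, tts]
    queues = [asyncio.Queue() for _ in range(len(handlers) + 1)]
    tasks = [
        asyncio.create_task(stage(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for frame in inputs:
        await queues[0].put(frame)
    await queues[0].put(End())
    await asyncio.gather(*tasks)  # every stage exits after forwarding End
    out: List[Frame] = []
    while not queues[-1].empty():
        out.append(queues[-1].get_nowait())
    return out


frames_out = asyncio.run(run([
    AudioIn("hi"),
    BotSpeaking(True),
    AudioIn("interrupting"),  # dropped by the muting stage
    BotSpeaking(False),
    AudioIn("thanks"),
]))
print(frames_out)
# [AudioOut(data='YOU SAID: HI'), AudioOut(data='YOU SAID: THANKS'), End()]
```

Running each stage as its own task means a slow service only delays the frames behind it, and forwarding the end frame before exiting gives every stage a chance to flush its work in order.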
Primary use cases
- voice assistants and AI companions
- multimodal interfaces over WebRTC
- interactive storytelling and business agents
- complex dialog systems with structured flows