Pipecat
Type: full-code · Vendor: Daily / pipecat-ai community · Language: Python · License: BSD-2-Clause · Status: active · Status in practice: mature · First released: 2023-12-27
Open-source Python framework for building real-time voice and multimodal conversational agents by composing frame processors into pipelines that orchestrate STT, LLM, TTS, transports and tools.
Description. Pipecat is an open-source Python framework whose unit of design is a Pipeline of FrameProcessors that exchange Frames (audio, image, text, transcription, transport messages). Built-in services span STT (Deepgram, AssemblyAI, OpenAI Whisper, ...), LLMs (Anthropic, OpenAI, Gemini, ...), TTS (ElevenLabs, Google, Azure, ...) and speech-to-speech models (OpenAI Realtime, Gemini Multimodal Live), plus parallel pipelines, function calling, user-input muting and turn-detection strategies. Pipecat Flows layers structured conversation flows on top of pipelines; the Subagents framework adds multi-agent support.
Agent loop shape. Asynchronous pipeline of frame processors. A transport feeds audio frames in; an STT service converts them to transcripts; an LLM service produces text or tool calls; a TTS service streams audio out via the transport. Control and system frames orchestrate lifecycle. Parallel pipelines fan out with synchronized inputs and outputs. Function-calling frames interleave with the conversation; user-input muting suppresses incoming user audio and transcriptions during bot responses, while turn-detection strategies decide when the user's turn has ended.
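The loop shape described above can be approximated in a few lines of plain asyncio. This is a toy model under stated assumptions, not Pipecat code: the frame classes, the `stage` helper and the `fake_stt` / `fake_llm` / `fake_tts` functions are hypothetical stand-ins, and real services stream partial results rather than mapping one frame to exactly one frame.

```python
import asyncio
from dataclasses import dataclass


# Toy frame types for illustration only; these are NOT Pipecat's classes.
@dataclass
class AudioFrame:
    audio: bytes


@dataclass
class TranscriptFrame:
    text: str


@dataclass
class ReplyFrame:
    text: str


@dataclass
class EndFrame:
    """Control frame that tells every stage to shut down."""


async def stage(transform, inbox, outbox):
    """Run one processor: read a frame, transform it, push it downstream."""
    while True:
        frame = await inbox.get()
        if isinstance(frame, EndFrame):
            await outbox.put(frame)  # propagate shutdown, then stop
            return
        await outbox.put(transform(frame))


# Hypothetical stand-ins for the STT, LLM and TTS services.
def fake_stt(frame):
    return TranscriptFrame(text=frame.audio.decode())


def fake_llm(frame):
    return ReplyFrame(text=f"You said: {frame.text}")


def fake_tts(frame):
    return AudioFrame(audio=frame.text.encode())


async def run_pipeline(utterances):
    transforms = [fake_stt, fake_llm, fake_tts]
    # One queue in front of each stage, plus one for the final output.
    queues = [asyncio.Queue() for _ in range(len(transforms) + 1)]
    tasks = [
        asyncio.create_task(stage(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for utterance in utterances:  # the "transport" feeds audio in
        await queues[0].put(AudioFrame(audio=utterance.encode()))
    await queues[0].put(EndFrame())
    await asyncio.gather(*tasks)

    spoken = []
    while not queues[-1].empty():  # the "transport" drains audio out
        frame = queues[-1].get_nowait()
        if isinstance(frame, AudioFrame):
            spoken.append(frame.audio.decode())
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["hello", "bye"])))
```

Each stage runs as its own task and only talks to its neighbours through queues, which is what lets a control frame such as the toy `EndFrame` travel through the same path as the data and shut stages down in order.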
Primary use cases
- voice assistants and AI companions
- multimodal interfaces over WebRTC
- interactive storytelling and business agents
- complex dialog systems with structured flows
Key concepts
- Pipeline (docs) — Composition of FrameProcessors; ParallelPipeline runs branches with synchronized I/O.
- Frames (docs) — Audio, image, text, transcription and transport message frames plus control/system frames.
- FrameProcessor (docs) — Unit of pipeline logic; users write custom processors for domain logic.
- Services (docs) — STT / TTS / LLM / speech-to-speech integrations (50+ providers).
- Function calling → tool-use (docs) — LLM tool use within the voice pipeline.
- Pipecat Flows (docs) — Structured conversation-flow layer on top of pipelines.
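The Frames and FrameProcessor concepts above can be illustrated with a small, library-free sketch. It assumes a simplified synchronous model: the classes below only borrow names such as `TextFrame` and `TranscriptionFrame` as local stand-ins, and the `Processor` base class and `run` helper are hypothetical, not Pipecat's API. The point it shows is the usual convention for custom processors: act on the frame types you care about and pass everything else through unchanged.

```python
from dataclasses import dataclass, replace


# Local stand-in frame types (hypothetical, not imported from Pipecat).
@dataclass
class Frame:
    pass


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class TranscriptionFrame(TextFrame):
    user_id: str = ""


@dataclass
class AudioRawFrame(Frame):
    audio: bytes


class Processor:
    """Base processor: returns the frames to send downstream."""

    def process(self, frame):
        return [frame]  # default behaviour is pass-through


class DropBlankText(Processor):
    """Filters out text frames that contain only whitespace."""

    def process(self, frame):
        if isinstance(frame, TextFrame) and not frame.text.strip():
            return []
        return [frame]


class UppercaseText(Processor):
    """Rewrites text-bearing frames; other frame types pass through."""

    def process(self, frame):
        if isinstance(frame, TextFrame):
            # replace() keeps the concrete subclass and its other fields.
            return [replace(frame, text=frame.text.upper())]
        return [frame]


def run(processors, frames):
    """Push a batch of frames through the processors in order."""
    for processor in processors:
        frames = [out for frame in frames for out in processor.process(frame)]
    return frames


if __name__ == "__main__":
    incoming = [
        TranscriptionFrame(text="hi there", user_id="u1"),
        AudioRawFrame(audio=b"\x00\x01"),
        TextFrame(text="   "),
    ]
    for frame in run([DropBlankText(), UppercaseText()], incoming):
        print(frame)
```

Dispatching on the frame's type keeps each processor small and lets unrelated frames, such as raw audio here, flow through stages that only deal with text.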
Patterns this full-code framework implements
- ★★Pipes and Filters
Pipecat's central abstraction is a Pipeline of FrameProcessors that exchange Frames — a textbook pipes-and-filters topology.
- ★★Stop / Cancel
User mute strategies suppress incoming user audio/transcriptions during critical bot operations; configurable interruption strategies gate barge-in.
- ★★Tool Use
Function calling is a first-class pipeline feature; handlers are registered via register_function with per-function timeout and cancel-on-interruption controls.
- ★★Event-Driven Agent
The pipeline is built on streaming frames; high-priority SystemFrames carry interruptions, user input, errors and lifecycle events; ControlFrames signal boundaries and config changes.
- ★★Streaming Typed Events
Frames are explicitly typed (TextFrame, LLMTextFrame, TranscriptionFrame, InterimTranscriptionFrame, OutputAudioRawFrame, TTSAudioRawFrame, OutputImageRawFrame) and stream through processors in guara…
- ★Multilingual Voice Agent Stack
Multilingual capability is delegated to the chosen STT/TTS/LLM services; the framework itself is language-agnostic, so any multilingual provider stack works. Pipecat docs do not document a multilingual-specif…
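The Tool Use and Stop / Cancel entries above mention per-function timeouts and cancel-on-interruption controls. The sketch below shows one way such controls can behave, using only asyncio. It is an illustration under assumptions: `ToolRegistry`, its `register`, `call` and `interrupt` methods, and the parameter names `timeout` and `cancel_on_interruption` are hypothetical and do not reproduce the signature of Pipecat's register_function.

```python
import asyncio


class ToolRegistry:
    """Toy function-call registry (hypothetical, not Pipecat's API)."""

    def __init__(self):
        self._tools = {}
        self._cancellable = set()  # in-flight calls an interruption may cancel

    def register(self, name, handler, *, timeout=None, cancel_on_interruption=True):
        self._tools[name] = (handler, timeout, cancel_on_interruption)

    async def call(self, name, **kwargs):
        handler, timeout, cancellable = self._tools[name]
        task = asyncio.create_task(handler(**kwargs))
        if cancellable:
            self._cancellable.add(task)
        try:
            # wait() returns once the task finishes or the timeout elapses.
            done, _pending = await asyncio.wait({task}, timeout=timeout)
        finally:
            self._cancellable.discard(task)
        if not done:
            task.cancel()  # per-function timeout expired
            return {"status": "timeout"}
        if task.cancelled():
            return {"status": "cancelled"}  # user barged in
        return {"status": "ok", "result": task.result()}

    def interrupt(self):
        """Simulate a user interruption: cancel every cancellable call."""
        for task in list(self._cancellable):
            task.cancel()


# Hypothetical handlers standing in for real tools.
async def fast_lookup(city):
    return f"sunny in {city}"


async def slow_lookup(city):
    await asyncio.sleep(10)
    return f"sunny in {city}"


async def demo():
    registry = ToolRegistry()
    registry.register("fast", fast_lookup, timeout=1.0)
    registry.register("slow", slow_lookup, timeout=0.05)
    registry.register("interruptible", slow_lookup, timeout=5.0)

    ok = await registry.call("fast", city="Oslo")
    timed_out = await registry.call("slow", city="Oslo")

    pending = asyncio.create_task(registry.call("interruptible", city="Oslo"))
    await asyncio.sleep(0.01)  # let the call start before interrupting
    registry.interrupt()
    interrupted = await pending
    return ok, timed_out, interrupted


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Returning a status to the caller instead of raising keeps the conversation loop in control: it can tell the model that a tool timed out or was cancelled and carry on with the dialogue.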