Full-Code · Voice & Conversationalactive

Pipecat

Type: full-code · Vendor: Daily / pipecat-ai community · Language: Python · License: BSD-2-Clause · Status: active · Status in practice: mature · First released: 2023-12-27

Links: homepage docs repo

Open-source Python framework for building real-time voice and multimodal conversational agents by composing frame processors into pipelines that orchestrate STT, LLM, TTS, transports and tools.

Description. Pipecat is an open-source Python framework whose unit of design is a Pipeline of FrameProcessors that exchange Frames (audio, image, text, transcription, transport messages). Built-in services span STT (Deepgram, AssemblyAI, OpenAI Whisper, ...), LLMs (Anthropic, OpenAI, Gemini, ...), TTS (ElevenLabs, Google, Azure, ...) and speech-to-speech models (OpenAI Realtime, Gemini Multimodal Live), plus parallel pipelines, function calling, user-input muting and turn-detection strategies. Pipecat Flows layers conversation flows; the Subagents framework adds multi-agent.

Agent loop shape. Asynchronous pipeline of frame processors. A transport feeds audio frames in; an STT service converts them to transcripts; an LLM service produces text or tool calls; a TTS service streams audio out via the transport. Control and system frames orchestrate lifecycle. Parallel pipelines fan out with synchronized inputs and outputs. Function-calling frames interleave with the conversation; user-input muting and turn-detection strategies suppress processing during bot responses.

Primary use cases

voice assistants and AI companions
multimodal interfaces over WebRTC
interactive storytelling and business agents
complex dialog systems with structured flows

flowchart TD transport[Transport WebRTC/Daily/LiveKit] --> audioin[Audio frames in] audioin --> stt[STT FrameProcessor] stt --> ctx[Context frames] ctx --> llm[LLM FrameProcessor] llm --> fc{Function call?} fc -->|yes| tool[Tool FrameProcessor] tool --> llm fc -->|no| tts[TTS FrameProcessor] tts --> audioout[Audio frames out] audioout --> transport parallel[ParallelPipeline] -.synced I/O.-> llm mute[User input muting] -.turn detection.-> stt flows[Pipecat Flows] -.structured turns.-> llm

Key concepts

Pipeline (docs) — Composition of FrameProcessors; ParallelPipeline runs branches with synchronized I/O.
Frames (docs) — Audio, image, text, transcription and transport message frames plus control/system frames.
FrameProcessor (docs) — Unit of pipeline logic; users write custom processors for domain logic.
Services (docs) — STT / TTS / LLM / speech-to-speech integrations (50+ providers).
Function calling → tool-use (docs) — LLM tool use within the voice pipeline.
Pipecat Flows (docs) — Structured conversation-flow layer on top of pipelines.

Pipecat

Neighbourhood

Alternatives & relatives

Listed as alternative by (4)

References

Provenance