Full-Code · Voice & Conversational · active

Pipecat

Type: full-code  ·  Vendor: Daily / pipecat-ai community  ·  Language: Python  ·  License: BSD-2-Clause  ·  Status: active  ·  Status in practice: mature  ·  First released: 2023-12-27

Open-source Python framework for building real-time voice and multimodal conversational agents by composing frame processors into pipelines that orchestrate STT, LLM, TTS, transports and tools.

Description. Pipecat is an open-source Python framework whose unit of design is a Pipeline of FrameProcessors that exchange Frames (audio, image, text, transcription, transport messages). Built-in services span STT (Deepgram, AssemblyAI, OpenAI Whisper, ...), LLMs (Anthropic, OpenAI, Gemini, ...), TTS (ElevenLabs, Google, Azure, ...) and speech-to-speech models (OpenAI Realtime, Gemini Multimodal Live). The framework also provides parallel pipelines, function calling, user-input muting and turn-detection strategies. Pipecat Flows layers structured conversation flows on top of pipelines; the Subagents framework adds multi-agent support.

Agent loop shape. Asynchronous pipeline of frame processors. A transport feeds audio frames in; an STT service converts them to transcripts; an LLM service produces text or tool calls; a TTS service synthesizes the text and streams audio out via the transport. Control and system frames orchestrate lifecycle. Parallel pipelines fan out with synchronized inputs and outputs. Function-calling frames interleave with the conversation. User-input muting suppresses user input while the bot is responding, and turn-detection strategies decide when the user has finished speaking.
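The loop shape above can be sketched with plain asyncio. This is a toy model, not Pipecat's API: the frame classes, the Processor base class, the FakeSTT / FakeLLM / FakeTTS stubs and the run_pipeline helper are all hypothetical names invented for illustration. It only shows the idea of typed frames flowing through a chain of asynchronous stages, with a control frame shutting the chain down.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    """Base class for everything that flows through the toy pipeline."""


@dataclass
class AudioFrame(Frame):
    data: bytes


@dataclass
class TranscriptionFrame(Frame):
    text: str


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class EndFrame(Frame):
    """Control frame: tells every stage to finish."""


class Processor:
    """Hypothetical stand-in for a frame processor; passes frames through."""

    async def process(self, frame):
        yield frame


class FakeSTT(Processor):
    async def process(self, frame):
        # Pretend the audio bytes are already the spoken text.
        if isinstance(frame, AudioFrame):
            yield TranscriptionFrame(frame.data.decode())
        else:
            yield frame


class FakeLLM(Processor):
    async def process(self, frame):
        if isinstance(frame, TranscriptionFrame):
            yield TextFrame(f"You said: {frame.text}")
        else:
            yield frame


class FakeTTS(Processor):
    async def process(self, frame):
        if isinstance(frame, TextFrame):
            yield AudioFrame(frame.text.encode())
        else:
            yield frame


async def run_stage(proc, inbox, outbox):
    # Each stage runs as its own task and forwards whatever it produces.
    while True:
        frame = await inbox.get()
        async for out in proc.process(frame):
            await outbox.put(out)
        if isinstance(frame, EndFrame):
            break


async def run_pipeline(processors, frames):
    # One queue between each pair of neighbouring stages.
    queues = [asyncio.Queue() for _ in range(len(processors) + 1)]
    tasks = [
        asyncio.create_task(run_stage(p, queues[i], queues[i + 1]))
        for i, p in enumerate(processors)
    ]
    for frame in frames:
        await queues[0].put(frame)
    await queues[0].put(EndFrame())
    await asyncio.gather(*tasks)
    out = []
    while not queues[-1].empty():
        out.append(queues[-1].get_nowait())
    return out


if __name__ == "__main__":
    stages = [FakeSTT(), FakeLLM(), FakeTTS()]
    print(asyncio.run(run_pipeline(stages, [AudioFrame(b"hello")])))
```

In this sketch the input list plays the role of the transport. A real deployment would stream audio continuously and use the framework's own services in place of the stubs.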

Primary use cases

  • voice assistants and AI companions
  • multimodal interfaces over WebRTC
  • interactive storytelling and business agents
  • complex dialog systems with structured flows

Key concepts

  • Pipeline: Composition of FrameProcessors; ParallelPipeline runs branches with synchronized I/O.
  • Frames: Audio, image, text, transcription and transport message frames plus control/system frames.
  • FrameProcessor: Unit of pipeline logic; users write custom processors for domain logic.
  • Services: STT / TTS / LLM / speech-to-speech integrations (50+ providers).
  • Function calling (tool use): LLM tool use within the voice pipeline.
  • Pipecat Flows: Structured conversation-flow layer on top of pipelines.
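
The parallel fan-out idea from the Pipeline entry can be illustrated with a minimal sketch. It assumes nothing about Pipecat's real ParallelPipeline class: fan_out, shout and reverse are hypothetical names, and frames are reduced to plain strings. Every branch receives the same input, and results are collected in branch order once all branches have finished.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical branch type: an async function from one frame to one frame.
Branch = Callable[[str], Awaitable[str]]


async def shout(text: str) -> str:
    await asyncio.sleep(0.02)  # the slower branch
    return text.upper()


async def reverse(text: str) -> str:
    await asyncio.sleep(0.01)  # the faster branch
    return text[::-1]


async def fan_out(frame: str, branches: List[Branch]) -> List[str]:
    # gather() waits for every branch and keeps results in branch order,
    # regardless of which branch finishes first.
    results = await asyncio.gather(*(branch(frame) for branch in branches))
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(fan_out("hello", [shout, reverse])))
```

Waiting for all branches before emitting anything is the simplest form of synchronization; the framework's own behaviour may be finer-grained than this sketch.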
