Framework · Voice & Conversational

Hume EVI

Hosted speech-to-speech voice API from Hume AI that pairs an emotionally aware response model with a configurable supplemental LLM, measuring vocal prosody and adapting tone in real time.

Description

EVI is Hume AI's real-time speech-to-speech interface. It streams measurements of the tune, rhythm and timbre of the user's voice, responds with matching prosody, and remains interruptible at all times. EVI is configured through a Config object (system prompt, voice, supplemental LLM, tools) and can be supplemented with partner LLMs from Anthropic, OpenAI, Google or Fireworks. Both developer-defined tools and Hume's built-in tools are supported, but parallel function calls are not yet supported. Chat Groups link sessions so that a conversation can resume across disconnects.
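As a rough illustration of how a Config and a Chat Group might be selected when a session opens, the sketch below builds a connection URL. The endpoint and the query parameter names (`api_key`, `config_id`, `resumed_chat_group_id`) are assumptions modelled on Hume's public documentation and should be checked against the current API reference; `build_chat_url` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; verify against Hume's API reference before use.
EVI_CHAT_URL = "wss://api.hume.ai/v0/evi/chat"


def build_chat_url(
    api_key: str,
    config_id: Optional[str] = None,
    resumed_chat_group_id: Optional[str] = None,
) -> str:
    """Build the WebSocket URL for a chat session (hypothetical helper).

    config_id selects a stored Config (prompt, voice, supplemental LLM, tools).
    resumed_chat_group_id, if given, asks to continue an earlier Chat Group.
    Both parameter names are assumptions.
    """
    params = {"api_key": api_key}
    if config_id:
        params["config_id"] = config_id
    if resumed_chat_group_id:
        params["resumed_chat_group_id"] = resumed_chat_group_id
    return f"{EVI_CHAT_URL}?{urlencode(params)}"
```

A client would store the Chat Group id it receives at the start of a session and pass it back on the next connection, so a dropped call continues the same conversation rather than starting a new one.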

Solution

Hosted speech-to-speech loop over a WebSocket. Caller audio streams in; EVI emits prosody measurements, decides the response (optionally via a supplemental LLM), streams generated speech back, and is interruptible by design. Tool calls go out to the developer's backend, except for Hume's built-in tools, which EVI invokes itself. A Chat Group ID can be passed to resume a conversation across reconnects.
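The client side of this loop can be sketched as a small message dispatcher. This is a minimal, network-free illustration: the message type names (`audio_output`, `user_interruption`, `tool_call`, `tool_response`, `tool_error`) and field names (`data`, `tool_call_id`, `name`, `parameters`, `tool_type`) are assumptions modelled on Hume's documentation, and `EviSession` is a hypothetical class, not an SDK type.

```python
import json
from collections import deque
from typing import Any, Callable, Deque, Dict, List

ToolFn = Callable[[Dict[str, Any]], str]


class EviSession:
    """Hypothetical client-side state for one EVI WebSocket session."""

    def __init__(self, tools: Dict[str, ToolFn]):
        self.tools = tools
        self.playback: Deque[str] = deque()      # encoded audio chunks waiting to play
        self.outbox: List[Dict[str, Any]] = []   # messages to send back over the socket

    def handle(self, raw: str) -> None:
        """Route one incoming JSON message by its (assumed) type field."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "audio_output":
            self.playback.append(msg["data"])
        elif kind == "user_interruption":
            # The user spoke over the assistant: drop any queued speech.
            self.playback.clear()
        elif kind == "tool_call":
            # Built-in tools are invoked by Hume itself, so only developer tools run here.
            if msg.get("tool_type") != "builtin":
                self._run_tool(msg)
        # Other message types (transcripts, prosody scores, ...) are ignored in this sketch.

    def _run_tool(self, msg: Dict[str, Any]) -> None:
        call_id = msg["tool_call_id"]
        fn = self.tools.get(msg["name"])
        if fn is None:
            self.outbox.append(
                {"type": "tool_error", "tool_call_id": call_id,
                 "error": f"unknown tool: {msg['name']}"}
            )
            return
        try:
            # Parameters are assumed to arrive as a JSON-encoded string.
            args = json.loads(msg.get("parameters") or "{}")
            content = fn(args)
        except Exception as exc:
            self.outbox.append(
                {"type": "tool_error", "tool_call_id": call_id, "error": str(exc)}
            )
            return
        self.outbox.append(
            {"type": "tool_response", "tool_call_id": call_id, "content": content}
        )
```

In a real client, `handle` would be called for each frame read from the socket, and the contents of `outbox` would be serialised and sent back. Because parallel function calls are not supported, handling one tool call at a time is sufficient here.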

Primary use cases

emotionally aware voice agents across consumer and support apps
multilingual speech-to-speech experiences on EVI 4 / 4-mini
function-calling voice agents that hit external APIs
long-running conversations resumed via Chat Groups
