Assistants

Overview

An assistant defines a voice agent's behavior: its prompt/instructions, interaction settings, optional tools, and optional end-call behavior.

Assistant execution supports two LLM modes:

pipeline: OpenAI realtime handles STT+LLM, and a separate TTS provider speaks the output.
realtime: Gemini realtime handles STT+LLM+TTS in one model.

Supported TTS providers for pipeline mode are cartesia, sarvam, elevenlabs, and mistral.

assistant_llm_mode="pipeline" requires both assistant_tts_model and assistant_tts_config.
assistant_llm_mode="realtime" requires assistant_llm_config.
In realtime mode, assistant_tts_model and assistant_tts_config are ignored by the runtime.
assistant_start_instruction is used as the opening response when assistant_interaction_config.speaks_first=true.
assistant_interaction_config.speaks_first works in both pipeline and realtime modes.
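
The mode rules above can be sketched as a small client-side check. This is a minimal illustration, not the service's own validation: the helper names validate_assistant and opening_response are hypothetical, the payloads are assumed to be plain dictionaries keyed by the field names in this section, and every value in the two example payloads is a placeholder rather than a real model, voice, or config key. Treating an empty value as missing is also an assumption.

```python
from typing import Any, Optional


def validate_assistant(payload: dict[str, Any]) -> list[str]:
    """Return a list of rule violations (empty when none are found).

    Hypothetical helper: it only mirrors the per-mode requirements stated
    in this section and treats empty values as missing (an assumption).
    """
    errors: list[str] = []
    mode = payload.get("assistant_llm_mode")

    if mode == "pipeline":
        # Pipeline mode needs both TTS fields.
        for field in ("assistant_tts_model", "assistant_tts_config"):
            if not payload.get(field):
                errors.append(f"{field} is required in pipeline mode")
    elif mode == "realtime":
        # Realtime mode needs the LLM config; TTS fields are ignored.
        if not payload.get("assistant_llm_config"):
            errors.append("assistant_llm_config is required in realtime mode")
    else:
        errors.append(f"unknown assistant_llm_mode: {mode!r}")

    return errors


def opening_response(payload: dict[str, Any]) -> Optional[str]:
    """Return assistant_start_instruction when speaks_first is true, else None."""
    interaction = payload.get("assistant_interaction_config") or {}
    if interaction.get("speaks_first") is True:
        return payload.get("assistant_start_instruction")
    return None


# Placeholder payloads: all values below are illustrative, not real settings.
pipeline_example = {
    "assistant_llm_mode": "pipeline",
    "assistant_tts_model": "example-tts-model",
    "assistant_tts_config": {"example_key": "example-value"},
    "assistant_start_instruction": "Greet the caller.",
    "assistant_interaction_config": {"speaks_first": True},
}

realtime_example = {
    "assistant_llm_mode": "realtime",
    "assistant_llm_config": {"example_key": "example-value"},
    "assistant_interaction_config": {"speaks_first": False},
}
```

Calling validate_assistant on either example returns an empty list, while a pipeline payload with no TTS fields yields one message per missing field. opening_response behaves the same way in both modes, matching the note that speaks_first is mode-independent.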