Assistants
Overview
Assistants define voice agent behavior, prompt/instructions, interaction settings, optional tools, and optional end-call behavior.
Assistant execution supports two LLM modes:
- pipeline: OpenAI realtime handles STT+LLM and a separate TTS provider speaks output.
- realtime: Gemini realtime handles STT+LLM+TTS in one model.
Supported TTS providers for pipeline mode are cartesia, sarvam, elevenlabs, and mistral.
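
As a rough illustration, the two modes and the provider list above can be written down as plain Python data. The names LLM_MODES, PIPELINE_TTS_PROVIDERS, and is_supported_tts_provider are hypothetical helpers invented for this sketch and are not part of the assistant API. How a provider is actually identified in a request is also an assumption here.

```python
# Hypothetical lookup tables summarizing the overview; not an official API.
LLM_MODES = {
    "pipeline": {"stt_llm": "OpenAI realtime", "tts": "separate TTS provider"},
    "realtime": {"stt_llm": "Gemini realtime", "tts": "Gemini realtime"},
}

# Providers listed as supported for pipeline mode.
PIPELINE_TTS_PROVIDERS = frozenset({"cartesia", "sarvam", "elevenlabs", "mistral"})


def is_supported_tts_provider(provider: str) -> bool:
    """Return True if the name matches a listed pipeline TTS provider.

    Matching is case-insensitive and ignores surrounding whitespace,
    which is a convenience of this sketch, not documented behavior.
    """
    return provider.strip().lower() in PIPELINE_TTS_PROVIDERS
```

A client could use a check like this to reject an unsupported provider before sending a request.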
Mode Rules
- assistant_llm_mode="pipeline" requires both assistant_tts_model and assistant_tts_config.
- assistant_llm_mode="realtime" requires assistant_llm_config.
- In realtime mode, assistant_tts_model and assistant_tts_config are ignored by runtime.
- assistant_start_instruction is used as the opening response when assistant_interaction_config.speaks_first=true.
- assistant_interaction_config.speaks_first works in both pipeline and realtime modes.
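
The mode rules can be sketched as a small client-side check. This is a minimal illustration, assuming the assistant is represented as a plain dict keyed by the field names above. The functions validate_assistant and opening_instruction are hypothetical and do not describe how the runtime itself enforces the rules. The sketch also assumes a field counts as provided whenever it is present and not None.

```python
from typing import Any, Dict, List, Optional


def validate_assistant(payload: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the payload passes."""
    errors: List[str] = []
    mode = payload.get("assistant_llm_mode")

    if mode == "pipeline":
        # Pipeline mode needs both TTS fields.
        for field in ("assistant_tts_model", "assistant_tts_config"):
            if payload.get(field) is None:
                errors.append(f"{field} is required in pipeline mode")
    elif mode == "realtime":
        # Realtime mode needs the LLM config; TTS fields are ignored,
        # so their presence is not treated as an error here.
        if payload.get("assistant_llm_config") is None:
            errors.append("assistant_llm_config is required in realtime mode")
    else:
        errors.append(f"unknown assistant_llm_mode: {mode!r}")

    return errors


def opening_instruction(payload: Dict[str, Any]) -> Optional[str]:
    """Return assistant_start_instruction when speaks_first is true, else None."""
    interaction = payload.get("assistant_interaction_config") or {}
    if interaction.get("speaks_first") is True:
        return payload.get("assistant_start_instruction")
    return None
```

For example, a pipeline payload that sets assistant_tts_model but omits assistant_tts_config yields one violation, while a realtime payload with assistant_llm_config passes even if TTS fields are also present.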