Assistants

Overview

An assistant defines a voice agent's behavior: its prompt/instructions, interaction settings, optional tools, and optional end-call behavior.

Assistant execution supports two LLM modes:

  • pipeline: OpenAI realtime handles STT+LLM, and a separate TTS provider speaks the output.
  • realtime: Gemini realtime handles STT+LLM+TTS in one model.

Supported TTS providers for pipeline mode are cartesia, sarvam, elevenlabs, and mistral.

Mode Rules

  • assistant_llm_mode="pipeline" requires both assistant_tts_model and assistant_tts_config.
  • assistant_llm_mode="realtime" requires assistant_llm_config.
  • In realtime mode, assistant_tts_model and assistant_tts_config are ignored by the runtime.
  • assistant_start_instruction is used as the opening response when assistant_interaction_config.speaks_first=true.
  • assistant_interaction_config.speaks_first works in both pipeline and realtime modes.
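
The mode rules above can be checked client-side before a payload is sent. The following is a minimal Python sketch, not the service's own validation: the field names come from the rules above, while the helper names validate_assistant and opening_instruction, the error strings, and the placeholder values in the usage example are hypothetical. It assumes a field counts as missing when it is absent or null.

```python
from typing import Any, Dict, List, Optional


def validate_assistant(payload: Dict[str, Any]) -> List[str]:
    """Return a list of mode-rule violations (empty if the payload passes).

    Hypothetical helper: the real API may validate differently.
    """
    errors: List[str] = []
    mode = payload.get("assistant_llm_mode")

    if mode == "pipeline":
        # Pipeline mode needs both TTS fields.
        for field in ("assistant_tts_model", "assistant_tts_config"):
            if payload.get(field) is None:
                errors.append(f"pipeline mode requires {field}")
    elif mode == "realtime":
        # Realtime mode needs the LLM config; TTS fields are ignored,
        # so their presence is not treated as an error here.
        if payload.get("assistant_llm_config") is None:
            errors.append("realtime mode requires assistant_llm_config")
    else:
        errors.append(f"unknown assistant_llm_mode: {mode!r}")
    return errors


def opening_instruction(payload: Dict[str, Any]) -> Optional[str]:
    """Return assistant_start_instruction only when speaks_first is true."""
    interaction = payload.get("assistant_interaction_config") or {}
    if interaction.get("speaks_first") is True:
        return payload.get("assistant_start_instruction")
    return None


if __name__ == "__main__":
    # Placeholder values; real model names and config keys are not shown here.
    draft = {
        "assistant_llm_mode": "pipeline",
        "assistant_tts_model": "tts-model-placeholder",
        "assistant_interaction_config": {"speaks_first": True},
        "assistant_start_instruction": "Greet the caller.",
    }
    print(validate_assistant(draft))
    print(opening_instruction(draft))
```

In this sketch the draft payload fails validation because assistant_tts_config is missing, while opening_instruction still returns the start instruction, since speaks_first applies in both modes.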

Endpoints