Introduction
The Realtime Speech API Playground is an interactive environment for developers, businesses, and hobbyists to experiment with and fine-tune AI-powered speech interactions. It lets you test the platform’s capabilities in real time and simulate various conversational scenarios using customizable settings. Whether you’re building voice assistants, automated customer support agents, or seamless conversational AI experiences, the Playground equips you with the tools to create, test, and refine your AI-driven speech applications.
Interact with Agent
- Adjust **Playground Settings** to tailor conversations with your AI agent.
- Choose between **Manual** or **VAD** mode, then click **Connect** to start interacting.
- View and review conversation logs in the **Transcriptions** tab.
- Use the **Functions** panel to integrate tool calling and track actions triggered by the AI agent.
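The steps above can be sketched as a small configuration helper. This is purely illustrative: the field names (`mode`, `transcription_log`, `functions`) are hypothetical placeholders, not the platform’s actual API.

```python
def build_session(mode: str = "vad") -> dict:
    """Build a hypothetical Playground session configuration.

    mode: 'manual' (push-to-talk) or 'vad' (automatic speech detection).
    """
    if mode not in ("manual", "vad"):
        raise ValueError("mode must be 'manual' or 'vad'")
    return {
        "mode": mode,             # how the agent listens for input
        "transcription_log": [],  # filled in as the conversation runs
        "functions": [],          # tool definitions the agent may call
    }

# Connect in push-to-talk mode for more controlled interactions.
session = build_session("manual")
```

The same structure would carry any tool definitions you add in the Functions panel, so the agent’s triggered actions can be tracked alongside the transcription log.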
Key Features of the Playground
Tune your playground to test various scenarios for your AI agent.

- Customizable system prompts: Define your AI agent’s behavior and personality by setting a system prompt and an optional default message. This helps guide how the agent understands context and responds to user input.
  - Custom prompt box: Input your unique system prompt to define the AI’s context.
  - Default input greeting: Optionally add a starting message or greeting that initializes the interaction context.
- Template agent prompts: Choose from a set of pre-defined prompts tailored to common scenarios for building your AI agents.
  - Available templates: Tech support, Real estate, Human Resources, Healthcare.
  - Flexible: Quickly set up and modify templates as needed to test scenarios.
- Model selection and response settings: Customize how your AI thinks, sounds, and responds.
  - Input language: Choose from multiple input languages to match your AI agent’s needs.
  - Model selection: Choose from the available models based on your requirements. Optimize your AI’s performance with models tailored for efficiency, speed, or high-quality, use-specific output.
  - Voice settings: Personalize your AI agent’s interaction style with a variety of voice options.
    - Voice type: Male, female, or neutral.
    - Language and accent: Choose from multiple languages and regional accents to better suit your audience.
    - Prosodic features: Choose a voice whose pitch, intonation, and stress patterns best match your desired naturalness.
    - Tonality: Select voices with emotional tones suitable for your use case.
    - Speed: Opt for voices with speech rates appropriate for clarity and your audience.
- Temperature control: Temperature determines the randomness of the AI’s responses. Customize the variability in your AI agent’s responses to suit your specific needs.
  - Low temperature (e.g., 0.2): Produces consistent and predictable outputs.
  - High temperature (e.g., 0.8): Generates more creative and varied responses.
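Under the hood, temperature typically rescales the model’s output distribution before sampling; lower values sharpen the distribution, higher values flatten it. A minimal, self-contained sketch of that mechanism (illustrative only, not the platform’s internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax.

    Lower temperature -> a peakier distribution (more predictable picks);
    higher temperature -> a flatter distribution (more varied picks).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                       # toy model scores
low = softmax_with_temperature(logits, 0.2)    # nearly deterministic
high = softmax_with_temperature(logits, 0.8)   # flatter, more creative
```

With temperature 0.2 the top-scoring option dominates almost completely; at 0.8 the other options keep meaningful probability, which is why responses feel more varied.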
- VAD settings: Use Voice Activity Detection (VAD) to adjust how your AI agent detects when someone is speaking and determines the right moment to respond, ensuring smoother, more natural interactions.
  - VAD mode: Choose how your AI listens. **VAD** mode detects speech and responds on its own, while **Manual** mode uses push-to-talk for more controlled interactions.
  - Threshold: Sets the minimum audio level (volume) required to detect speech. A lower threshold makes the agent more sensitive to quiet voices, while a higher threshold helps ignore background noise.
  - Prefix padding: Adds a short buffer of audio before the point where speech is detected. This helps capture the beginning of speech more accurately, avoiding cut-off words or syllables.
  - Silence duration: Defines how long a period of silence must last before the agent decides the speaker has finished talking. Adjust this to balance responsiveness against premature interruptions.
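The three knobs combine into a simple gating loop over audio frames. The toy sketch below is an energy-based illustration of how threshold, prefix padding, and silence duration interact; it is not the platform’s actual detection algorithm, and the frame-based units are assumptions for the example:

```python
def segment_speech(levels, threshold=0.5, prefix_padding=2, silence_duration=3):
    """Toy energy-based VAD over per-frame audio levels (0.0-1.0).

    Returns (start, end) frame indices for each detected utterance.
    - threshold: minimum level counted as speech
    - prefix_padding: frames of audio kept before the first loud frame
    - silence_duration: consecutive quiet frames that end an utterance
    """
    segments = []
    start = None   # start frame of the utterance in progress, if any
    quiet = 0      # consecutive quiet frames seen while speaking
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = max(0, i - prefix_padding)  # keep leading audio
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= silence_duration:           # speaker has finished
                segments.append((start, i - silence_duration + 1))
                start, quiet = None, 0
    if start is not None:                           # speech ran to the end
        segments.append((start, len(levels)))
    return segments

levels = [0.1, 0.1, 0.8, 0.9, 0.2, 0.1, 0.1, 0.1, 0.7, 0.8]
```

Raising `threshold` makes the quiet frames in `levels` invisible to the detector, while a larger `silence_duration` merges nearby utterances instead of splitting them, mirroring the responsiveness trade-off described above.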