Introduction

The Realtime Speech API Playground is an interactive environment where developers, businesses, and hobbyists can experiment with and fine-tune AI-powered speech interactions. It lets you test the platform’s capabilities in real time and simulate conversational scenarios with customizable settings. Whether you’re building voice assistants, automated customer support agents, or other conversational AI experiences, the Playground gives you the tools to create, test, and refine your speech applications.

Interact with Agent

  • Adjust Playground Settings to tailor conversations with your AI agent.
  • Choose between Manual or VAD mode, then click Connect to start interacting.
  • Review conversation logs in the Transcriptions tab.
  • Use the Functions panel to integrate tool calling and track actions triggered by the AI agent.
Note: The Playground consumes credits in the background.

Key Features of the Playground

Tune the Playground to test various scenarios for your AI agent.
  • Customizable system prompts: Define your AI agent’s behavior and personality by setting a system prompt and an optional default message. This guides how the agent understands context and responds to user input.
    • Custom prompt box: Input your unique system prompt to define the AI’s context.
    • Default Input Greeting: Optionally add a starting message or greeting that initializes the interaction context.
  • Template agent prompts: Choose from a set of predefined prompts tailored to common scenarios to build your AI agents.
    • Available templates: Tech support, Real estate, Human Resources, Healthcare
    • Flexible: Start from a template and modify it as needed to test your scenarios.
  • Model selection and response settings: Customize how your AI thinks, sounds, and responds.
    • Input Language: Choose from multiple input languages to match your AI agent’s needs.
    • Model selection: Choose from the available models based on your requirements. Optimize your AI’s performance with models tailored for efficiency, speed, or high-quality, use-specific output.
    • Voice Settings: Personalize your AI agent’s interaction style with a variety of voice modes.
      • Voice Type: Male, female, or neutral.
      • Language and Accent: Choose from multiple languages and regional accents to better suit your audience.
      • Prosodic Features: Choose a voice that best matches your desired pitch, intonation, and stress patterns for enhanced naturalness.
      • Tonality: Select voices with emotional tones suitable for your use case.
      • Speed: Opt for voices with appropriate speech rates for clarity and audience suitability.
    • Temperature Control: Temperature determines the randomness of the AI’s responses. Customize the variability in your AI agent’s responses to suit your specific needs.
      • Low Temperature (e.g., 0.2): Produces consistent and predictable outputs.
      • High Temperature (e.g., 0.8): Generates more creative and varied responses.
  • VAD settings: Use Voice Activity Detection to adjust how your AI agent detects when someone is speaking and determines the right moment to respond, ensuring smoother, more natural interactions.
    • VAD mode: Choose how your AI listens. VAD Mode detects speech and responds on its own, while Manual Mode uses push-to-talk for more controlled interactions.
    • Threshold: Sets the minimum audio level (volume) required to detect speech. A lower threshold makes the agent more sensitive to quiet voices, while a higher threshold helps ignore background noise.
    • Prefix padding: Includes a short buffer of audio from just before speech is detected. This helps capture the beginning of speech more accurately, avoiding cut-off words or syllables.
    • Silence duration: Defines how long a period of silence must last before the agent decides the speaker has finished talking. Adjust this to balance responsiveness with avoiding premature interruptions.
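
To make the interplay of threshold, prefix padding, and silence duration more concrete, here is a minimal Python sketch of a simple volume-based detector. It is not the Playground’s implementation: the names VadConfig and detect_turns, the default values, and the 100 ms frame size are assumptions made for illustration, and each input value is taken to be the volume of one audio frame on a 0.0 to 1.0 scale.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VadConfig:
    # All names and defaults here are hypothetical, chosen for illustration.
    threshold: float = 0.5          # minimum frame level treated as speech
    prefix_padding_ms: int = 200    # audio kept from before speech was detected
    silence_duration_ms: int = 300  # silence required before the turn ends
    frame_ms: int = 100             # duration of one audio frame


def detect_turns(levels: Sequence[float], cfg: VadConfig) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive, one per turn."""
    prefix_frames = cfg.prefix_padding_ms // cfg.frame_ms
    silence_frames = max(1, cfg.silence_duration_ms // cfg.frame_ms)
    turns: List[Tuple[int, int]] = []
    start = None   # first frame of the current turn, or None when idle
    quiet = 0      # consecutive below-threshold frames inside a turn
    prev_end = 0   # end of the previous turn, so padding never overlaps it

    for i, level in enumerate(levels):
        if level >= cfg.threshold:
            if start is None:
                # Speech begins: reach back to include the prefix padding.
                start = max(prev_end, i - prefix_frames)
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= silence_frames:
                # Enough silence: close the turn at the last speech frame.
                prev_end = i - quiet + 1
                turns.append((start, prev_end))
                start, quiet = None, 0

    if start is not None:
        # The stream ended mid-turn: close it at the last speech frame.
        turns.append((start, len(levels) - quiet))
    return turns


sample = [0.0, 0.1, 0.0, 0.9, 0.8, 0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0]
print(detect_turns(sample, VadConfig()))                # [(1, 7), (10, 13)]
print(detect_turns(sample, VadConfig(threshold=0.85)))  # [(1, 4), (10, 13)]
```

In this sketch, raising the threshold from 0.5 to 0.85 causes the quieter frames to be treated as silence, so the first turn ends earlier. A longer silence duration would have the opposite effect, keeping a turn open across short pauses.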

Transcriptions and Functions

Review your audio transcripts and the events triggered by your AI agent through function calling. For a detailed guide, see Tool Calling.
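
As a rough illustration of what happens on the application side when the agent triggers a function, the Python sketch below routes a function name and JSON-encoded arguments to a local handler and returns a JSON result. The event shape, the handle_function_call helper, and the get_order_status tool are all hypothetical; the actual schema and registration steps are described in the Tool Calling guide.

```python
import json
from typing import Any, Callable, Dict


def get_order_status(order_id: str) -> Dict[str, Any]:
    """Hypothetical tool: look up an order (stubbed with a fixed answer)."""
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping the function names the agent may call to local handlers.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_order_status": get_order_status,
}


def handle_function_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments; return a JSON string."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(arguments_json)
        result = handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the handler signature.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(handle_function_call("get_order_status", '{"order_id": "A1"}'))
```

Returning errors as JSON instead of raising keeps the conversation going when the agent calls an unknown function or passes unexpected arguments, and each call and result can then be logged for review.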