Introduction

The Realtime Speech API Playground is an interactive environment designed to enable developers, businesses, and hobbyists to experiment with and fine-tune AI-powered speech interactions. It allows users to test and understand the capabilities of the platform in real-time, providing the ability to simulate various conversational scenarios using customizable settings. Whether you’re building voice assistants, automated customer support agents, or seamless conversational AI experiences, the Playground equips you with the tools to create, test, and refine your AI-driven speech applications.

Create an account on TensorStudio

Create a free account on TensorStudio to get started. You’ll need this account to access the API keys and manage your projects. The signup process takes less than a minute - just provide your email, create a password, and verify your email address.

Key Features of the Playground

  • Customizable Prompts: Define your AI agent’s behavior by providing a system prompt and an optional default message.
  • Model Selection: Choose from a variety of models optimized for different scenarios.
  • Voice Settings: Select from a range of voices to tailor the user experience.
  • Temperature Control: Adjust the randomness of AI responses to align with your use case.
  • Preset Prompts: Quickly get started with pre-defined prompts tailored for common scenarios.
  • VAD (Voice Activity Detection) Modes: Choose between manual and automatic interaction modes for optimal control.
  • Push-to-Talk Functionality: Enable manual interaction for precise, user-driven engagements.
  • Events Section: Monitor all events generated by the API in real-time.

System Prompt

The system prompt is the core instruction that defines the AI agent’s behavior and personality. By setting this, you can control how the agent responds to user inputs.

  • Custom Prompts: Input your unique system prompt to define the AI’s context.
  • Default Messages: Optionally add a starting message that initializes the interaction context.

Preset Prompts

Preset prompts are pre-configured examples designed to showcase common use cases. Use these to:

  • Quickly set up and test scenarios.
  • Learn from best practices.
  • Save time when experimenting with new ideas.

Model Selection

Choose from the available models based on your requirements. Each model is designed for specific use cases, such as:

  • Cost efficiency.
  • Low-latency interactions.
  • High-quality responses tailored to specific use cases.

Voice Settings

Select the voice that will represent your AI agent. Voice settings include:

  • Voice Type: Male, female, or neutral.
  • Language and Accent: Choose from multiple languages and regional accents to better suit your audience.
  • Prosodic Features: Choose a voice that best matches your desired pitch, intonation, and stress patterns for enhanced naturalness.
  • Tonality: Select voices with emotional tones suitable for your use case.
  • Speed: Opt for voices with appropriate speech rates for clarity and audience suitability.

Temperature Control

Temperature determines the randomness of the AI’s responses:

  • Low Temperature (e.g., 0.2): Produces consistent and predictable outputs.
  • High Temperature (e.g., 0.8): Generates more creative and varied responses.

VAD Mode (Voice Activity Detection)

Control how the AI detects and responds to user input:

  • Automatic Mode: The AI automatically detects when the user is speaking and responds accordingly.
  • Manual Mode: Users must press the Push-to-Talk button to interact with the AI, offering precise control over the interaction.