Streaming Voice Agent

Step 1: Install dependencies
Step 2: Build the agent
Step 3: Create the streaming TTS callback
Step 4: Run the agent with the callback
Step 5: Flush the buffer
Full example
When to use this pattern
See also

This pattern minimizes “time to first audio”. Instead of waiting for the agent to finish its whole response, it forwards every token to a streaming TTS callback, which buffers the text into sentences and sends each sentence to the speech engine as soon as it is complete.
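To make the latency argument concrete, here is a minimal sketch that replays a fixed token list and records the token index at which each completed sentence would be handed to TTS. The token list, the `dispatch_points` helper and the terminator set (`.`, `?`, `!`) are illustrative assumptions, not the voice-agents implementation.

```python
# Assumed sentence terminators; the real callback's rules may differ.
TERMINATORS = (".", "?", "!")


def dispatch_points(tokens: list[str]) -> list[int]:
    """Hypothetical helper: token indices at which a finished sentence could go to TTS."""
    points, buffer = [], ""
    for i, token in enumerate(tokens):
        buffer += token
        if buffer.rstrip().endswith(TERMINATORS):
            points.append(i)  # sentence complete: synthesis could start here
            buffer = ""
    return points


tokens = ["Nuclear", " leads", ".", " Solar", " follows", ".", " Gas", " is", " third", "."]
points = dispatch_points(tokens)
print(points)  # [2, 5, 9]
```

In this toy stream the first sentence is ready after token 2, so audio could begin while seven more tokens are still arriving.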

Step 1: Install dependencies

pip install -U swarms voice-agents
export OPENAI_API_KEY=sk-...
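Both the agent (model_name="gpt-4.1") and the TTS callback (model="openai/tts-1") are assumed here to read the key from this environment variable. A small pre-flight check, sketched below with hypothetical `openai_key_configured` and `require_openai_key` helpers, fails fast with a clear message instead of partway through a run.

```python
import os
from typing import Mapping, Optional


def openai_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def require_openai_key() -> None:
    """Hypothetical pre-flight check: call it before building the agent."""
    if not openai_key_configured():
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the agent.")
```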

Step 2: Build the agent

from swarms import Agent

agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
)

Step 3: Create the streaming TTS callback

StreamingTTSCallback is a callable that satisfies the Swarms streaming_callback contract, so it can be passed directly to agent.run(). With stream_mode=True, the audio for each sentence is played as soon as it has been synthesised.

from voice_agents.main import StreamingTTSCallback

tts_callback = StreamingTTSCallback(
    voice="alloy",
    model="openai/tts-1",
    stream_mode=True,
)

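OpenAI’s TTS models offer several voices, including the original six: alloy, echo, fable, onyx, nova, shimmer. Newer models may support additional voices, so check OpenAI’s voice list for the model you choose.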
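Since the contract is satisfied by a callable, it can be illustrated without any audio at all. The sketch below assumes the agent calls the callback once per text chunk with a single string argument; `SentenceTTSSketch` and its injected `speak` function are hypothetical stand-ins for synthesis and playback, not the StreamingTTSCallback source. It has no flush(), so unfinished text simply stays in the buffer.

```python
from typing import Callable


class SentenceTTSSketch:
    """Hypothetical streaming callback: buffers chunks, speaks whole sentences."""

    TERMINATORS = (".", "?", "!")  # assumed terminator set

    def __init__(self, speak: Callable[[str], None]) -> None:
        self.speak = speak  # stand-in for "synthesise and play"
        self.buffer = ""

    def __call__(self, chunk: str) -> None:
        # Called once per streamed chunk (assumed contract).
        self.buffer += chunk
        if self.buffer.rstrip().endswith(self.TERMINATORS):
            self.speak(self.buffer.strip())
            self.buffer = ""


spoken: list[str] = []
cb = SentenceTTSSketch(spoken.append)
for chunk in ["Hi", " there", ".", " Next"]:
    cb(chunk)
# spoken is now ["Hi there."]; " Next" is still buffered
```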

Step 4: Run the agent with the callback

Pass the callback to agent.run() as streaming_callback. Each token is added to the agent’s response and forwarded to the TTS callback as it arrives, so speech can start before the response is complete.

out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
    streaming_callback=tts_callback,
)
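The sketch below models how one token stream can feed both the returned response and the TTS callback, under one assumption: the agent calls the callback synchronously for each chunk as it arrives, while also collecting the chunks into the string that run() returns. `simulated_run` is a hypothetical stand-in for agent.run(), not Swarms code.

```python
from typing import Callable, Iterable, Optional


def simulated_run(
    chunks: Iterable[str],
    streaming_callback: Optional[Callable[[str], None]] = None,
) -> str:
    """Hypothetical agent.run(): builds the response and forwards each chunk."""
    response = []
    for chunk in chunks:
        response.append(chunk)  # sink 1: the text run() will return
        if streaming_callback is not None:
            streaming_callback(chunk)  # sink 2: the TTS callback, chunk by chunk
    return "".join(response)


seen: list[str] = []
out = simulated_run(["Uranium", " demand", " is", " rising", "."], streaming_callback=seen.append)
```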

Step 5: Flush the buffer

StreamingTTSCallback holds text in its buffer until it sees a sentence terminator (., ?, !, …). If the response ends without one, the final sentence stays in the buffer and is never spoken, so always call flush() after run() returns.

tts_callback.flush()
print(out)
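The failure mode that flush() guards against is easy to reproduce with a toy buffer. In this sketch (the `FlushableBuffer` class and its `spoken` list are hypothetical, and only `.`, `?` and `!` count as terminators), a response that ends without a terminator leaves its last fragment unspoken until flush() is called.

```python
class FlushableBuffer:
    """Hypothetical sentence buffer showing why flush() matters."""

    TERMINATORS = (".", "?", "!")

    def __init__(self) -> None:
        self.buffer = ""
        self.spoken: list[str] = []  # stand-in for audio actually played

    def __call__(self, chunk: str) -> None:
        self.buffer += chunk
        if self.buffer.rstrip().endswith(self.TERMINATORS):
            self._emit()

    def flush(self) -> None:
        # Emit whatever is left, even without a terminator.
        self._emit()

    def _emit(self) -> None:
        text = self.buffer.strip()
        if text:
            self.spoken.append(text)
        self.buffer = ""


cb = FlushableBuffer()
for chunk in ["Solar", " is", " second", ".", " Gas", " is", " third"]:  # no final terminator
    cb(chunk)
before = list(cb.spoken)  # ["Solar is second."]; "Gas is third" is still buffered
cb.flush()
after = list(cb.spoken)   # ["Solar is second.", "Gas is third"]
```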

Full example

from swarms import Agent
from voice_agents.main import StreamingTTSCallback

# Build the agent (same configuration as Step 2)
agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
)

# Speak each sentence as soon as it is complete
tts_callback = StreamingTTSCallback(
    voice="alloy", model="openai/tts-1", stream_mode=True
)

# Stream tokens into the TTS callback while the agent runs
out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
    streaming_callback=tts_callback,
)

# Speak any text still buffered after the last terminator
tts_callback.flush()
print(out)

Source: examples/guides/voice_agents/voice_agents_examples/agent_with_speech.py
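If agent.run() raises partway through, the flush() call in the script above is skipped and any buffered text is never spoken. One way to guard against that, sketched here with a hypothetical `run_and_flush` helper, is to flush in a `finally` block; whether you also want a partial answer spoken after an error is a judgment call.

```python
from typing import Callable


def run_and_flush(run: Callable[[], str], flush: Callable[[], None]) -> str:
    """Hypothetical helper: run the agent, then flush whether or not run() raised."""
    try:
        return run()
    finally:
        # Executes on success and on error, so buffered text is always dispatched.
        flush()


# With the objects from the full example (not executed here):
# out = run_and_flush(
#     lambda: agent.run(task=task, streaming_callback=tts_callback),
#     tts_callback.flush,
# )
```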

When to use this pattern

You want time to first audio as low as possible.
The agent’s output is long enough that waiting for completion would be awkward.
You’re fine with sentence-level granularity (the callback buffers per sentence, not per token).
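To see what sentence-level granularity trades off, the sketch below lists the texts a hypothetical `tts_requests` helper would send to a speech engine when dispatching per token versus per sentence. Per-token dispatch means many tiny requests, and typically less natural intonation because the engine never sees a full sentence; per-sentence dispatch waits for a terminator but sends whole sentences. The terminator set and the helper are assumptions for illustration.

```python
TERMINATORS = (".", "?", "!")  # assumed sentence terminators


def tts_requests(tokens: list[str], per_sentence: bool = True) -> list[str]:
    """Hypothetical model of the texts sent to TTS under each granularity."""
    if not per_sentence:
        return [t for t in tokens if t.strip()]  # one request per non-blank token
    requests, buffer = [], ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(TERMINATORS):
            requests.append(buffer.strip())
            buffer = ""
    if buffer.strip():
        requests.append(buffer.strip())  # what a final flush would send
    return requests


tokens = ["Nuclear", " leads", ".", " Solar", " follows", "."]
print(len(tts_requests(tokens, per_sentence=False)))  # 6
print(tts_requests(tokens))  # ['Nuclear leads.', 'Solar follows.']
```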
