
The simplest voice-agent pattern: let the agent finish thinking, then hand the final text to stream_tts_openai for narration. This is ideal when you only care about the final answer, not intermediate tokens.

Step 1: Install dependencies

pip install -U swarms voice-agents
export OPENAI_API_KEY=sk-...

Step 2: Build the agent

Use any LiteLLM-compatible model. Here we build a quantitative trading agent.

from swarms import Agent

agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
)

Step 3: Run the agent

The agent runs to completion and returns the full response as a string.

out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
)

Step 4: Stream the result through TTS

stream_tts_openai accepts a list of strings and streams them through OpenAI’s TTS engine. With stream_mode=True, audio chunks play as they’re synthesised.

from voice_agents.main import stream_tts_openai

stream_tts_openai(
    [out],
    stream_mode=True,
)
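Because stream_tts_openai takes a list of strings, you can optionally split a long answer into sentences first, so each list item is synthesised as a shorter utterance. A minimal sketch, assuming a simple regex split is good enough for your text (split_sentences is a hypothetical helper, not part of voice-agents):

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split text into sentences on ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Hypothetical usage: hand shorter chunks to the TTS call instead of one long string.
# stream_tts_openai(split_sentences(out), stream_mode=True)
```

Whether shorter chunks actually reduce time-to-first-audio depends on how stream_tts_openai batches its requests; with stream_mode=True the single-string call already plays audio as it is synthesised.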

Full example

from swarms import Agent
from voice_agents.main import stream_tts_openai

agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
)

out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
)

stream_tts_openai(
    [out],
    stream_mode=True,
)

When to use this pattern

  • You only need to narrate the final answer.
  • Latency to first audio is not critical (you wait for the agent to finish before any speech).
  • Simplicity wins — no callback wiring and no flush() calls, just run the agent and pass the result along.
For sentence-by-sentence narration as the agent generates, see Streaming Voice Agent.
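For contrast, the streaming pattern has to buffer partial tokens and flush complete sentences as they arrive. A hypothetical sketch of that wiring (SentenceBuffer and its speak callback are illustrations, not part of swarms or voice-agents), showing the extra machinery the blocking pattern avoids:

```python
class SentenceBuffer:
    """Accumulate streamed tokens and emit complete sentences via a callback."""

    def __init__(self, speak):
        self.speak = speak  # called once per complete sentence
        self.buf = ""

    def feed(self, token: str) -> None:
        """Append a token; emit any complete sentences now in the buffer."""
        self.buf += token
        while True:
            # Earliest sentence terminator currently in the buffer, if any.
            cut = min((i for i in (self.buf.find(c) for c in ".!?") if i != -1),
                      default=-1)
            if cut == -1:
                return
            sentence, self.buf = self.buf[:cut + 1], self.buf[cut + 1:].lstrip()
            if sentence.strip():
                self.speak(sentence.strip())

    def flush(self) -> None:
        """Speak whatever remains when the token stream ends."""
        if self.buf.strip():
            self.speak(self.buf.strip())
        self.buf = ""
```

With the blocking pattern, all of this collapses into a single call after agent.run() returns.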