The simplest voice-agent pattern: let the agent finish thinking, then hand the final text to stream_tts_openai for narration. This is ideal when you only care about the final answer, not intermediate tokens.
Step 1: Install dependencies
pip install -U swarms voice-agents
export OPENAI_API_KEY=sk-...
Step 2: Build the agent
Use any LiteLLM-compatible model. This example configures a quantitative trading agent that runs on gpt-4.1.
from swarms import Agent

agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
)
Step 3: Run the agent
The agent runs to completion and returns the full response as a string.
out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
)
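Because the response is a plain string, it can be tidied before narration. The following is a minimal sketch, assuming the answer may contain Markdown markers (asterisks, hashes, backticks) that would sound awkward when read aloud. `prepare_for_tts` is a hypothetical helper for illustration and is not part of swarms or voice-agents.

```python
import re


def prepare_for_tts(response: object) -> str:
    """Hypothetical helper: turn an agent response into plain text for narration."""
    # Coerce defensively in case the caller passes something other than a string.
    text = response if isinstance(response, str) else str(response)
    # Drop common Markdown markers (assumption: they add nothing when spoken).
    text = re.sub(r"[*#`]+", "", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        raise ValueError("agent returned no text to narrate")
    return text
```

The cleaned string can then replace `out` in the next step. Raising on empty text is a design choice of this sketch, made so that a silent failure in the agent does not become a silent narration.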
Step 4: Stream the result through TTS
stream_tts_openai accepts a list of strings and streams them through OpenAI’s TTS engine. With stream_mode=True, audio chunks play as they’re synthesised.
from voice_agents.main import stream_tts_openai

stream_tts_openai(
    [out],
    stream_mode=True,
)
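Since the function takes a list of strings, a long answer does not have to be passed as a single element. The sketch below groups sentences into smaller chunks; `split_into_chunks` and its `max_chars` default are hypothetical, and whether chunk size affects playback in voice-agents is an assumption that has not been verified here.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Hypothetical helper: group sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Usage with the names from this guide (not executed here):
# stream_tts_openai(split_into_chunks(out), stream_mode=True)
```

Passing `[out]` as shown above remains the simplest option; chunking is only worth considering if very long answers turn out to be a problem.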
Full example
from swarms import Agent
from voice_agents.main import stream_tts_openai

agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
)

out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
)

stream_tts_openai(
    [out],
    stream_mode=True,
)
When to use this pattern
- You only need to narrate the final answer.
- Latency to first audio is not critical (you wait for the agent to finish before any speech).
- Simplicity wins: no callback wiring, no flush() calls.
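To judge whether the latency trade-off is acceptable, the wait before speech can be measured. This is a rough sketch: `time_before_speech` is a hypothetical helper, and the two callables stand in for `agent.run` and `stream_tts_openai`. It reports only the time spent waiting for the agent, which is a lower bound on time to first audio because synthesis adds its own delay.

```python
import time
from typing import Callable


def time_before_speech(
    run_agent: Callable[[], str],
    start_tts: Callable[[str], None],
) -> float:
    """Hypothetical helper: seconds spent waiting for the agent before TTS starts."""
    start = time.perf_counter()
    text = run_agent()  # blocks until the full answer is available
    waited = time.perf_counter() - start
    start_tts(text)  # narration begins only after the wait above
    return waited


# Usage with the names from this guide (not executed here):
# waited = time_before_speech(
#     lambda: agent.run(task="..."),
#     lambda text: stream_tts_openai([text], stream_mode=True),
# )
# print(f"Waited {waited:.1f}s before narration could start")
```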
For sentence-by-sentence narration as the agent generates, see Streaming Voice Agent.