What you can build

The voice-agents package plugs directly into any Swarms agent through the standard streaming_callback parameter. Tokens stream from the LLM into a streaming text-to-speech (TTS) pipeline, so the agent starts speaking as soon as the first sentence is generated instead of waiting for the whole response to finish.

| Pattern | Example | What it shows |
|---|---|---|
| Basic post-run TTS | Basic Speech Agent | Run the agent normally, then narrate the final result. |
| Streaming TTS callback | Streaming Voice Agent | Speak each sentence as the LLM produces it. |
| Autonomous loop + bash + voice | Autonomous Voice Agent | max_loops="auto" agent with terminal access narrating its work. |
| Multi-agent debate | Voice Debate | Two agents alternate, each with a distinct voice. Optional STT input. |
| Hierarchical swarm | Hierarchical Speech Swarm | Director and workers, each with their own voice. |
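The Basic post-run TTS and Streaming TTS callback patterns differ mainly in when speech starts. Below is a minimal sketch of the post-run pattern. It assumes a hypothetical `speak(sentence)` function that stands in for whatever TTS call you use; it is not part of voice-agents. The whole result is split into sentences only after `agent.run()` returns, so nothing is spoken until the agent has finished.

```python
import re
from typing import Callable, List

# Split after ".", "!" or "?" followed by whitespace. This is a simple
# heuristic and will also split after abbreviations such as "e.g. ".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def narrate_result(text: str, speak: Callable[[str], None]) -> List[str]:
    """Post-run TTS: speak a finished agent result one sentence at a time.

    `speak` is a hypothetical stand-in for a real TTS call.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    for sentence in sentences:
        speak(sentence)
    return sentences


if __name__ == "__main__":
    spoken: List[str] = []
    # In a real run, the text would be the string returned by agent.run(...).
    narrate_result("Hello there. The task is done! Anything else?", spoken.append)
    print(spoken)
```

With the streaming pattern, the same sentences are spoken while the LLM is still generating, which is what removes the wait.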

Prerequisites

Install

pip install -U swarms voice-agents

API keys

Set API keys for the LLM that drives the agent and for the TTS provider (OpenAI's tts-1 is the default):
export OPENAI_API_KEY=sk-...           # required for OpenAI TTS and for OpenAI models such as gpt-4.1
export ANTHROPIC_API_KEY=sk-ant-...    # only if using Claude models
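A missing key usually surfaces only when the agent or the TTS engine makes its first request, so a quick preflight check can help you fail fast. The helper below is hypothetical and is not part of swarms or voice-agents. It assumes that Anthropic model names contain the substring "claude" and that OpenAI's TTS (and therefore OPENAI_API_KEY) is always in use.

```python
import os
from typing import List, Mapping, Optional


def missing_api_keys(model_name: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required environment variables that are unset or empty.

    Hypothetical preflight helper: OPENAI_API_KEY is always required because
    the default TTS engine is OpenAI's tts-1; ANTHROPIC_API_KEY is required
    only when the model name looks like a Claude model (an assumption).
    """
    env = os.environ if env is None else env
    required = ["OPENAI_API_KEY"]
    if "claude" in model_name.lower():
        required.append("ANTHROPIC_API_KEY")
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Prints an empty list when everything needed for gpt-4.1 + OpenAI TTS is set.
    print("Missing keys:", missing_api_keys("gpt-4.1"))
```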

How the integration works

StreamingTTSCallback is a callable that receives one token at a time, buffers tokens until it has a complete sentence, and sends each sentence to the configured TTS engine. Because it implements the streaming_callback contract, it works anywhere Swarms exposes per-token callbacks: single agents, autonomous loops, hierarchical swarms, debates, and so on.
from swarms import Agent
from voice_agents import StreamingTTSCallback

# Speaks each completed sentence using OpenAI's tts-1 model and the "alloy" voice
tts = StreamingTTSCallback(voice="alloy", model="openai/tts-1")

agent = Agent(model_name="gpt-4.1", max_loops=1)
# Each generated token is handed to tts as soon as it arrives
result = agent.run(task="Hello!", streaming_callback=tts)
tts.flush()  # emit any remaining text in the buffer
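To make the streaming_callback contract concrete, here is a minimal sketch of how a sentence-buffering callback can work. It is not the voice-agents implementation: `SentenceBufferCallback` and its `speak` argument are hypothetical names, and the end-of-sentence rule (buffered text ending in ".", "!" or "?") is a simplifying assumption. It also shows why `flush()` matters: a trailing fragment with no terminator stays in the buffer until it is flushed.

```python
from typing import Callable, List


class SentenceBufferCallback:
    """Hypothetical per-token callback that hands whole sentences to `speak`."""

    TERMINATORS = (".", "!", "?")

    def __init__(self, speak: Callable[[str], None]) -> None:
        self.speak = speak
        self._buffer = ""

    def __call__(self, token: str) -> None:
        # Called once per streamed token; accumulate until a sentence ends.
        self._buffer += token
        if self._buffer.rstrip().endswith(self.TERMINATORS):
            self._emit()

    def flush(self) -> None:
        # Speak whatever is left, e.g. a final fragment with no terminator.
        self._emit()

    def _emit(self) -> None:
        sentence = self._buffer.strip()
        self._buffer = ""
        if sentence:
            self.speak(sentence)


if __name__ == "__main__":
    spoken: List[str] = []
    callback = SentenceBufferCallback(spoken.append)
    for token in ["Hello", " there", ".", " Working", " on", " it"]:
        callback(token)
    print(spoken)  # ['Hello there.']
    callback.flush()
    print(spoken)  # ['Hello there.', 'Working on it']
```

The terminator check here is deliberately naive: a token such as "3." in "3.5" would end a "sentence" early, so a production callback needs a smarter boundary rule.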

Available voices

OpenAI's TTS engine offers several built-in voices, including alloy, echo, fable, onyx, nova, and shimmer. When several agents speak, give each one a distinct voice so listeners can tell them apart.

Always call flush() on the callback at the end of every run (tts.flush() in the example above). The callback holds text in its buffer until it sees a sentence terminator, so a final sentence without one is never spoken unless flush() forces it out.
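When several agents speak, a small helper can hand each one a different voice. This is a hypothetical sketch, not a voice-agents API. The voice list is the six names listed above, and the helper raises an error rather than silently reusing a voice when there are more agents than voices.

```python
from typing import Dict, List, Sequence

OPENAI_VOICES: List[str] = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def assign_voices(agent_names: Sequence[str], voices: Sequence[str] = OPENAI_VOICES) -> Dict[str, str]:
    """Give every agent a distinct voice, in order (hypothetical helper)."""
    if len(set(agent_names)) != len(agent_names):
        raise ValueError("agent names must be unique")
    if len(agent_names) > len(voices):
        raise ValueError(f"{len(agent_names)} agents but only {len(voices)} distinct voices")
    return dict(zip(agent_names, voices))


if __name__ == "__main__":
    print(assign_voices(["Director", "Researcher", "Writer"]))
    # {'Director': 'alloy', 'Researcher': 'echo', 'Writer': 'fable'}
```

Each value would then be passed to its own callback, for example StreamingTTSCallback(voice=voices["Director"], model="openai/tts-1"), mirroring the constructor call in the integration example.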