Voice Agents Overview

What you can build

The voice-agents package plugs into any Swarms agent through the standard streaming_callback parameter. Tokens stream from the LLM into a streaming text-to-speech (TTS) pipeline, so the agent starts speaking as soon as its first sentence is complete instead of waiting for the full response to finish.

Pattern	Example	What it shows
Basic post-run TTS	Basic Speech Agent	Run the agent normally, then narrate the final result.
Streaming TTS callback	Streaming Voice Agent	Speak each sentence as the LLM produces it.
Autonomous loop + bash + voice	Autonomous Voice Agent	`max_loops="auto"` agent with terminal access narrating its work.
Multi-agent debate	Voice Debate	Two agents alternate, each with a distinct voice. Optional STT input.
Hierarchical swarm	Hierarchical Speech Swarm	Director and workers, each with their own voice.

Prerequisites

Install

pip install -U swarms voice-agents

API keys

Set the API keys for the LLM provider that drives the agent and for the TTS provider (OpenAI's tts-1 is the default):

export OPENAI_API_KEY=sk-...           # required for OpenAI TTS
export ANTHROPIC_API_KEY=sk-ant-...    # only if using Claude models
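
Checking for the keys before constructing an agent gives a clearer error than a failed API call mid-run. The sketch below is one way to do that with the standard library only; `missing_keys` is a hypothetical helper written for this page, not part of swarms or voice-agents.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# OPENAI_API_KEY covers the default OpenAI TTS; add ANTHROPIC_API_KEY
# to the list only when the agent is driven by a Claude model.
missing = missing_keys(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.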

How the integration works

StreamingTTSCallback is a callable that accepts one token at a time, buffers tokens until a sentence is complete, and dispatches each finished sentence to the configured TTS engine. Because it implements the streaming_callback contract, it works anywhere Swarms exposes per-token callbacks: single agents, autonomous loops, hierarchical swarms, debates, and so on.

from swarms import Agent
from voice_agents import StreamingTTSCallback

tts = StreamingTTSCallback(voice="alloy", model="openai/tts-1")

agent = Agent(model_name="gpt-4.1", max_loops=1)
result = agent.run(task="Hello!", streaming_callback=tts)
tts.flush()  # emit any remaining text in the buffer
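
To make the buffering behaviour concrete, here is a minimal sketch of a sentence-buffering callback. It is not the library's implementation: `SentenceBuffer`, its `speak` argument, and the boundary rule (a `.`, `!` or `?` followed by whitespace) are all assumptions chosen for illustration, and `speak` simply collects text instead of calling a TTS engine.

```python
import re


class SentenceBuffer:
    """Toy stand-in for a streaming TTS callback (hypothetical, not the library class)."""

    # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    _BOUNDARY = re.compile(r"(?<=[.!?])\s+")

    def __init__(self, speak):
        self._speak = speak  # called once per complete sentence
        self._buffer = ""

    def __call__(self, token: str) -> None:
        self._buffer += token
        parts = self._BOUNDARY.split(self._buffer)
        # Every piece except the last is a complete sentence.
        for sentence in parts[:-1]:
            sentence = sentence.strip()
            if sentence:
                self._speak(sentence)
        # The last piece may still be growing, so keep it buffered.
        self._buffer = parts[-1]

    def flush(self) -> None:
        """Emit whatever is left, even without a following boundary."""
        text = self._buffer.strip()
        if text:
            self._speak(text)
        self._buffer = ""


spoken = []
callback = SentenceBuffer(spoken.append)
for token in ["Hello", " world", ".", " How", " are", " you", "?"]:
    callback(token)
callback.flush()
print(spoken)  # ['Hello world.', 'How are you?']
```

The second sentence is only emitted by `flush()`, because no whitespace ever follows its question mark. That is the same reason the real callback needs a final flush.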

Available voices

OpenAI’s TTS engine supports six voices out of the box: alloy, echo, fable, onyx, nova, shimmer. Pick distinct voices when you have multiple agents speaking so users can tell them apart.
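
For a debate or hierarchical swarm, one simple approach is to assign voices up front so no two agents share one unless there are more agents than voices. The sketch below uses only the voice names listed above; `assign_voices` and the agent names are hypothetical. Each mapped voice could then be passed as the `voice` argument when creating that agent's StreamingTTSCallback.

```python
from itertools import cycle

# The six voice names listed in this section.
OPENAI_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def assign_voices(agent_names, voices=OPENAI_VOICES):
    """Map each agent name to a voice, in order.

    Voices repeat only when there are more agents than voices.
    Hypothetical helper for illustration.
    """
    return dict(zip(agent_names, cycle(voices)))


voice_map = assign_voices(["Director", "Researcher", "Writer"])
print(voice_map)
# {'Director': 'alloy', 'Researcher': 'echo', 'Writer': 'fable'}
```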

Always call flush() on the callback (tts.flush() in the example above) at the end of every run. The streaming callback holds the last sentence in its buffer until the agent emits a sentence terminator; flush() forces it out.
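
One way to make sure the flush is never skipped, even when the run raises, is to wrap the call in try/finally. The sketch below assumes only what the earlier example shows: `agent.run(task=..., streaming_callback=...)` and a `flush()` method on the callback. `run_with_voice` and the two underscore-prefixed classes are hypothetical stand-ins so the snippet runs without API keys.

```python
def run_with_voice(agent, task, tts):
    """Run `agent` on `task` with `tts` as the streaming callback, always flushing."""
    try:
        return agent.run(task=task, streaming_callback=tts)
    finally:
        # Runs on success and on error, so buffered text is never dropped.
        tts.flush()


class _EchoAgent:
    """Stand-in for a Swarms agent: streams the task back word by word."""

    def run(self, task, streaming_callback=None):
        for token in task.split():
            streaming_callback(token)
        return task


class _RecordingTTS:
    """Stand-in for the TTS callback: records tokens and whether flush ran."""

    def __init__(self):
        self.tokens = []
        self.flushed = False

    def __call__(self, token):
        self.tokens.append(token)

    def flush(self):
        self.flushed = True


demo_tts = _RecordingTTS()
demo_result = run_with_voice(_EchoAgent(), "Hello there", demo_tts)
print(demo_result, demo_tts.tokens, demo_tts.flushed)
```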

Related

Agent Streaming: the underlying token-streaming mechanism the voice callback uses.
voice-agents on PyPI: package source and TTS/STT API reference.
