Agent Streaming

Stream tokens from an Agent the moment the LLM produces them — across every internal loop including tool-call turns, synthesis turns, and the autonomous plan/execute/summary cycle. The Agent exposes two generator methods:

agent.run_stream(task) — sync generator yielding str tokens
agent.arun_stream(task) — async generator yielding str tokens

Both work for any max_loops value (1, integer > 1 with tools, or "auto").

Sync Streaming with a Multi-Loop Tool-Calling Agent

Tokens stream during the tool-call turn AND the synthesis turn that runs after the tool returns.

from swarms import Agent


def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


agent = Agent(
    agent_name="Calculator",
    model_name="gpt-4.1-mini",
    max_loops=3,
    tools=[add],
    persistent_memory=False,
    print_on=False,
)

for token in agent.run_stream(
    "Use the add tool to compute 17 + 25, then state the result."
):
    print(token, end="", flush=True)
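Because run_stream yields plain str tokens, the full response can be rebuilt by joining them while they are echoed. The following is a minimal sketch of that pattern; echo_and_collect and fake_stream are hypothetical names introduced here for illustration, and fake_stream only stands in for agent.run_stream(...) so the snippet runs without an API key.

```python
import sys
from typing import Iterable, Iterator, TextIO


def echo_and_collect(tokens: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each token to `out` as it arrives and return the joined text."""
    parts = []
    for token in tokens:
        out.write(token)
        out.flush()  # show the token immediately instead of waiting for a newline
        parts.append(token)
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for agent.run_stream(...): yields str tokens.
    yield from ["17 + 25", " = ", "42"]


full_text = echo_and_collect(fake_stream())
```

With a real agent, the same helper would presumably be called as echo_and_collect(agent.run_stream(task)), since it only assumes an iterable of strings.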

Async Streaming

Drop-in for any async caller. The agent loop runs in a thread executor; tokens flow through an asyncio.Queue so the caller’s event loop is never blocked.

import asyncio
from swarms import Agent

agent = Agent(
    agent_name="Writer",
    model_name="gpt-4.1-mini",
    max_loops=1,
    persistent_memory=False,
    print_on=False,
)


async def main():
    async for token in agent.arun_stream(
        "Explain the difference between concurrency and parallelism in two sentences."
    ):
        print(token, end="", flush=True)


asyncio.run(main())
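The executor-plus-queue arrangement described above can be sketched with the standard library alone. This is not the library's actual implementation, only an illustration of the general technique under stated assumptions: stream_in_thread and fake_tokens are hypothetical names, and fake_tokens stands in for a blocking, token-yielding agent loop.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


async def stream_in_thread(
    make_tokens: Callable[[], Iterator[str]],
) -> AsyncIterator[str]:
    """Run a blocking token generator in a worker thread and re-yield its tokens."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def producer() -> None:
        try:
            for token in make_tokens():
                # asyncio.Queue is not thread-safe, so hand each item to the loop thread.
                loop.call_soon_threadsafe(queue.put_nowait, token)
        except Exception as exc:
            # Forward errors so the consumer sees them instead of hanging.
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    future = loop.run_in_executor(None, producer)
    while True:
        item = await queue.get()  # suspends this coroutine only, not the event loop
        if item is _DONE:
            break
        if isinstance(item, Exception):
            raise item
        yield item
    await future


def fake_tokens() -> Iterator[str]:
    # Hypothetical stand-in for a blocking agent loop that yields tokens.
    for token in ["con", "cur", "rent"]:
        time.sleep(0.01)
        yield token


async def collect() -> list:
    return [token async for token in stream_in_thread(fake_tokens)]


received = asyncio.run(collect())
```

The blocking work stays in the worker thread, and the consumer only ever awaits queue.get(), which is why other tasks on the same event loop can keep running between tokens. The sketch does not stop the producer if the consumer exits early.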

Streaming Through the Autonomous Loop

When max_loops="auto", the agent runs a plan→execute→summary cycle. All phases stream their tokens — including the final summary phase.

import asyncio
from swarms import Agent


def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


agent = Agent(
    agent_name="AutoBot",
    model_name="gpt-4.1-mini",
    max_loops="auto",
    tools=[add],
    persistent_memory=False,
    print_on=False,
)


async def main():
    async for token in agent.arun_stream(
        "Use the add tool to compute 99 + 1, then briefly explain the answer."
    ):
        print(token, end="", flush=True)


asyncio.run(main())

run_stream and arun_stream perform real LLM streaming, not buffered chunking. Tokens arrive over the wall-clock duration of the LLM call (typically 10–80 ms apart inside a network burst), not all at once at the end.
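One way to check that tokens really arrive spread over time is to record the gap before each token. The sketch below assumes only that the stream is an iterable of strings; time_tokens and slow_stream are hypothetical names, and slow_stream simulates delays instead of calling a model.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def time_tokens(tokens: Iterable[str]) -> Tuple[str, List[float]]:
    """Return the joined text and the gap in seconds before each token."""
    parts = []
    gaps = []
    last = time.perf_counter()
    for token in tokens:
        now = time.perf_counter()
        gaps.append(now - last)
        last = now
        parts.append(token)
    return "".join(parts), gaps


def slow_stream() -> Iterator[str]:
    # Hypothetical stand-in for a live stream: each token is delayed by about 20 ms.
    for token in ["a", "b", "c"]:
        time.sleep(0.02)
        yield token


text, gaps = time_tokens(slow_stream())
print([f"{gap * 1000:.0f} ms" for gap in gaps])
```

A buffered source would show one long first gap followed by gaps near zero, whereas a live stream shows gaps distributed across the whole call.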

Related

Agent Configuration: streaming_on, streaming_callback, and the streaming method signatures
Streaming: full overview of every streaming mode
SequentialWorkflow Streaming: pipeline streaming across multiple agents
