
Stream tokens from an Agent the moment the LLM produces them — across every internal loop including tool-call turns, synthesis turns, and the autonomous plan/execute/summary cycle. The Agent exposes two generator methods:
  • agent.run_stream(task) — sync generator yielding str tokens
  • agent.arun_stream(task) — async generator yielding str tokens
Both work for any max_loops value: 1, an integer greater than 1 (with tools), or "auto".
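
Because both generators yield plain str tokens, reassembling the complete streamed text is a single join. A minimal sketch (the agent settings mirror the examples below; note that with tools or max_loops="auto" the joined string contains tokens from every streamed turn, not only the final answer):

from swarms import Agent

agent = Agent(
    agent_name="Joiner",
    model_name="gpt-4.1-mini",
    max_loops=1,
    persistent_memory=False,
    print_on=False,
)

# Collect every streamed token and join them into the full response text.
full_text = "".join(agent.run_stream("Summarize token streaming in one sentence."))
print(full_text)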

Sync Streaming with a Multi-Loop Tool-Calling Agent

Tokens stream during the tool-call turn AND the synthesis turn that runs after the tool returns.

from swarms import Agent


def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


agent = Agent(
    agent_name="Calculator",
    model_name="gpt-4.1-mini",
    max_loops=3,
    tools=[add],
    persistent_memory=False,
    print_on=False,
)

for token in agent.run_stream(
    "Use the add tool to compute 17 + 25, then state the result."
):
    print(token, end="", flush=True)

Async Streaming

Drop-in for any async caller. The agent loop runs in a thread executor; tokens flow through an asyncio.Queue so the caller’s event loop is never blocked.

import asyncio
from swarms import Agent

agent = Agent(
    agent_name="Writer",
    model_name="gpt-4.1-mini",
    max_loops=1,
    persistent_memory=False,
    print_on=False,
)


async def main():
    async for token in agent.arun_stream(
        "Explain the difference between concurrency and parallelism in two sentences."
    ):
        print(token, end="", flush=True)


asyncio.run(main())
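
Under the hood this is the classic sync-to-async bridging pattern. The sketch below is a hypothetical reimplementation of that pattern, not the library's actual code: a worker thread drives any sync token generator (such as run_stream) and relays each token into the event loop through an asyncio.Queue.

import asyncio
from typing import AsyncIterator, Iterator


async def bridge(gen: Iterator[str]) -> AsyncIterator[str]:
    # Hypothetical sketch of the documented mechanism, not the swarms source.
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    done = object()  # sentinel: the worker thread has finished

    def pump() -> None:
        try:
            for token in gen:  # blocks in the worker thread, not the event loop
                # Thread-safe hand-off into the event loop's queue.
                loop.call_soon_threadsafe(queue.put_nowait, token)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, done)

    worker = loop.run_in_executor(None, pump)
    while (item := await queue.get()) is not done:
        yield item
    await worker  # re-raises anything the worker thread threw

Usage would look like async for token in bridge(agent.run_stream(task)). The event loop only ever awaits queue.get(), so it stays responsive while the model call blocks the worker thread.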

Streaming Through the Autonomous Loop

When max_loops="auto", the agent runs a plan→execute→summary cycle. All phases stream their tokens, including the final summary phase.

import asyncio
from swarms import Agent


def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


agent = Agent(
    agent_name="AutoBot",
    model_name="gpt-4.1-mini",
    max_loops="auto",
    tools=[add],
    persistent_memory=False,
    print_on=False,
)


async def main():
    async for token in agent.arun_stream(
        "Use the add tool to compute 99 + 1, then briefly explain the answer."
    ):
        print(token, end="", flush=True)


asyncio.run(main())
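
Because arun_stream is an async generator, it plugs directly into async web frameworks, for any max_loops setting including the autonomous cycle above. A minimal sketch using FastAPI (FastAPI and the /stream route are illustrative assumptions, not part of swarms):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from swarms import Agent

app = FastAPI()

agent = Agent(
    agent_name="Writer",
    model_name="gpt-4.1-mini",
    max_loops=1,
    persistent_memory=False,
    print_on=False,
)


@app.get("/stream")
async def stream(task: str):
    # StreamingResponse consumes the async generator and forwards each
    # token to the client as it arrives.
    return StreamingResponse(agent.arun_stream(task), media_type="text/plain")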

run_stream and arun_stream are real LLM streaming, not buffered chunking. Tokens arrive over the wall-clock duration of the LLM call (typically 10–80 ms apart inside a network burst), not all at once at the end.
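
A quick way to verify this is to record the gap between consecutive tokens: real streaming spreads the gaps across the whole call, while buffered chunking would show one long pause followed by near-zero gaps. A minimal sketch:

import time
from swarms import Agent

agent = Agent(
    agent_name="Timer",
    model_name="gpt-4.1-mini",
    max_loops=1,
    persistent_memory=False,
    print_on=False,
)

gaps = []
prev = time.perf_counter()
for token in agent.run_stream("Name three prime numbers."):
    now = time.perf_counter()
    gaps.append(now - prev)  # time since the previous token
    prev = now

# The first gap is time-to-first-token, so expect it to dominate the max.
print(f"{len(gaps)} tokens, mean gap {sum(gaps) / len(gaps) * 1000:.1f} ms, "
      f"max gap {max(gaps) * 1000:.0f} ms")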