Building Agents with Cerebras

Cerebras runs Llama models on its wafer-scale chips and delivers inference speeds well above typical GPU-based providers, frequently over 1,000 tokens per second. It’s the right pick when latency is the dominant constraint: real-time customer support, voice agents, autocomplete-style UIs, and high-throughput agent swarms.

Installation

pip install -U swarms

Environment Setup

export CEREBRAS_API_KEY="..."

Get an API key at cloud.cerebras.ai.
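Without the key, the failure tends to show up only when the first request is rejected. The following is a minimal sketch that fails fast at startup instead. It assumes only that the key lives in the `CEREBRAS_API_KEY` variable shown above. The helper name `require_cerebras_key` is hypothetical and not part of swarms.

```python
import os


def require_cerebras_key(env=None) -> str:
    """Return the Cerebras API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set. Create a key at cloud.cerebras.ai "
            "and export it before starting your agents."
        )
    return key
```

Call `require_cerebras_key()` once at the top of your entry point, before constructing any `Agent`.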

Quick Start

Every Cerebras model uses the cerebras/ prefix:

from swarms import Agent

agent = Agent(
    agent_name="Cerebras-Agent",
    model_name="cerebras/llama-3.3-70b",
    max_loops=1,
)

print(agent.run("Summarize the architectural innovations behind wafer-scale compute in three paragraphs."))

Model Names

Model	`model_name`	Best for
Llama 3.3 70B	`"cerebras/llama-3.3-70b"`	Default: most capable of the three, at peak speed
Llama 3.1 70B	`"cerebras/llama3-70b-instruct"`	Previous-generation 70B instruction-tuned model
Llama 3.1 8B	`"cerebras/llama3.1-8b"`	Smallest and fastest; suited to simple, high-volume tasks such as classification
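If you switch between these models through configuration, you may want to keep the IDs from the table in one place. You can also make sure every name carries the `cerebras/` prefix. The following is a minimal sketch of that idea. The `CEREBRAS_MODELS` dict, its alias keys, and `resolve_model` are hypothetical conveniences, not part of swarms.

```python
# Model IDs copied from the table above, keyed by hypothetical short aliases.
CEREBRAS_MODELS = {
    "default": "cerebras/llama-3.3-70b",
    "previous-70b": "cerebras/llama3-70b-instruct",
    "fast": "cerebras/llama3.1-8b",
}

PREFIX = "cerebras/"


def resolve_model(name: str) -> str:
    """Turn an alias or a bare/full model ID into a `cerebras/`-prefixed model_name."""
    if name in CEREBRAS_MODELS:
        return CEREBRAS_MODELS[name]
    if name.startswith(PREFIX):
        return name
    # Bare IDs such as "llama-3.3-70b" get the provider prefix added.
    return PREFIX + name
```

For example, `Agent(agent_name="Cerebras-Agent", model_name=resolve_model("fast"), max_loops=1)` would select the 8B model.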

Speed-Critical Use Cases

Voice Agent Loop

Cerebras’s low latency is what makes real-time voice agents feel natural: with streaming enabled, a short reply starts arriving quickly enough to keep a spoken conversation flowing.

from swarms import Agent

voice_agent = Agent(
    agent_name="Voice-Assistant",
    model_name="cerebras/llama-3.3-70b",
    system_prompt="You are a friendly voice assistant. Keep responses under 2 sentences.",
    streaming_on=True,
    max_loops=1,
)

# Pass in the user's transcribed speech (STT) and send the reply to text-to-speech (TTS)
reply = voice_agent.run("What's a good weeknight dinner I can make in 20 minutes?")
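The speech-to-text and text-to-speech stages live outside swarms. The sketch below shows one turn of such a loop. It assumes only that the agent's `run()` takes a string and returns the reply as text. `transcribe` and `speak` are hypothetical placeholders for whatever STT and TTS services you use.

```python
from typing import Callable, Protocol


class TextAgent(Protocol):
    """Anything with run(str) -> str, such as the agent above (assumed)."""

    def run(self, task: str) -> str: ...


def voice_turn(
    agent: TextAgent,
    audio: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical STT hook
    speak: Callable[[str], bytes],  # hypothetical TTS hook
) -> bytes:
    """Run one conversational turn: user audio in, synthesized reply out."""
    text_in = transcribe(audio)  # STT: audio -> text
    if not text_in.strip():
        return b""  # nothing was said; stay silent
    reply = agent.run(text_in)  # LLM: text -> reply text
    return speak(reply)  # TTS: reply text -> audio
```

You would call `voice_turn` once per detected utterance. A production pipeline would more likely hand partial streamed text to TTS sentence by sentence to cut perceived latency further, but that refinement is left out here.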

High-Volume Classification

For high-volume work such as routing thousands of support tickets, the small 8B model keeps per-item latency low:

from swarms import Agent

classifier = Agent(
    agent_name="Cerebras-Classifier",
    model_name="cerebras/llama3.1-8b",
    system_prompt="Classify each input as one of: support, sales, billing, other. Reply with the label only.",
    max_loops=1,
)

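# Example inputs; in practice these come from your ticketing system
tickets = [
    "I was charged twice for my subscription this month.",
    "Do you offer discounts for teams of 50 or more?",
    "The export button does nothing when I click it.",
]

def route(ticket: str, label: str) -> None:
    # Hypothetical router: replace with your own queueing or ticketing logic
    print(f"[{label}] {ticket}")

for ticket in tickets:
    label = classifier.run(ticket)
    route(ticket, label)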
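At this volume, two refinements are common. The first guards against replies that are not exactly one of the allowed labels. The second classifies several tickets at once instead of one after another. The following is a minimal sketch using Python's `concurrent.futures`. `normalize_label` and `classify_all` are hypothetical helpers, and you should keep the worker count within your Cerebras rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LABELS = ("support", "sales", "billing", "other")


def normalize_label(raw: str) -> str:
    """Map a reply such as ' Billing. ' onto an allowed label, else 'other'."""
    cleaned = raw.strip().strip(".").lower()
    return cleaned if cleaned in LABELS else "other"


def classify_all(
    classify: Callable[[str], str], tickets: List[str], max_workers: int = 8
) -> List[str]:
    """Classify tickets concurrently; the result list keeps input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        raw_labels = pool.map(classify, tickets)  # map preserves ordering
        return [normalize_label(r) for r in raw_labels]
```

With the agent from the example above, this could be called as `classify_all(classifier.run, tickets)`. Whether a single `Agent` instance can safely be shared across threads depends on how it stores conversation history. If in doubt, create one agent per worker.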

Streaming

With streaming_on=True, tokens are streamed as they are generated, so on Cerebras the output begins almost immediately:

from swarms import Agent

agent = Agent(
    agent_name="Streaming-Cerebras",
    model_name="cerebras/llama-3.3-70b",
    streaming_on=True,
    max_loops=1,
)

agent.run("Write a 200-word explanation of how transformer attention works.")
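To check how fast streaming actually feels for your own prompts, measure time to first token and total generation time. The following is a minimal, library-agnostic sketch that measures both for any iterator of text chunks. How you obtain such an iterator from your client is left open, and `fake_stream` exists only for illustration. In the sketch the clock starts at the first read from the iterator. For an accurate figure against a real API, start it immediately before the request is sent.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a text stream; return (full_text, ttft_seconds, total_seconds)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start  # time to first chunk
        parts.append(chunk)
    total = time.perf_counter() - start
    # An empty stream has no first token; report the total instead.
    return "".join(parts), (first if first is not None else total), total


def fake_stream(words: int, delay: float = 0.001) -> Iterator[str]:
    """Stand-in for a real streaming response: yields one word at a time."""
    for i in range(words):
        time.sleep(delay)
        yield f"w{i} "
```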

Massive Parallel Agent Swarms

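Per-call speed matters even more in multi-agent setups. Because ConcurrentWorkflow runs its agents in parallel, total wall-clock time tracks the slowest agent rather than the sum of all of them, so even 20 agents can finish in a few seconds (rate limits on your account permitting):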

from swarms import Agent, ConcurrentWorkflow

agents = [
    Agent(
        agent_name=f"Reviewer-{i}",
        model_name="cerebras/llama-3.3-70b",
        system_prompt=f"You are reviewer #{i}. Give a one-paragraph critique.",
        max_loops=1,
    )
    for i in range(20)
]

workflow = ConcurrentWorkflow(agents=agents)
reviews = workflow.run("Draft proposal: build an in-house vector database instead of using Pinecone.")
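The benefit of running agents in parallel comes down to simple arithmetic. When calls run concurrently, wall-clock time is close to the slowest single call rather than the sum of all calls. The sketch below demonstrates this with hypothetical stub reviewers that just sleep. It uses `concurrent.futures` as a stand-in and is not a description of how `ConcurrentWorkflow` is implemented internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fan_out(workers: List[Callable[[str], str]], task: str) -> List[str]:
    """Send the same task to every worker at once; results keep worker order."""
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        futures = [pool.submit(w, task) for w in workers]
        return [f.result() for f in futures]


def make_stub_reviewer(i: int, latency: float) -> Callable[[str], str]:
    """Hypothetical reviewer that simulates one model call taking `latency` seconds."""
    def review(task: str) -> str:
        time.sleep(latency)
        return f"Reviewer-{i}: critique of {task!r}"
    return review


if __name__ == "__main__":
    reviewers = [make_stub_reviewer(i, 0.2) for i in range(20)]
    start = time.perf_counter()
    reviews = fan_out(reviewers, "in-house vector database")
    elapsed = time.perf_counter() - start
    # Sequentially: about 20 * 0.2 = 4 s. Concurrently: roughly 0.2 s plus overhead.
    print(f"{len(reviews)} reviews in {elapsed:.2f}s")
```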

Tool Use

Cerebras’s Llama models support function calling:

from swarms import Agent

def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"{city}: 21°C, partly cloudy"

agent = Agent(
    agent_name="Cerebras-Assistant",
    model_name="cerebras/llama-3.3-70b",
    tools=[get_weather],
    dynamic_temperature_enabled=True,
    max_loops=3,
)

print(agent.run("What's the weather in Tokyo right now?"))
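A plain function works as a tool because its name, type hints, and docstring carry what a function-calling schema needs. The sketch below shows that mapping for simple signatures using `inspect`. It illustrates the general OpenAI-style JSON shape under that assumption. It is not the converter swarms actually uses, and `function_to_tool_schema` is a hypothetical name.

```python
import inspect
from typing import Any, Callable, Dict, List

# Assumed mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style tool description from a function's signature."""
    sig = inspect.signature(fn)
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in sig.parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default -> the model must supply it
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"{city}: 21°C, partly cloudy"
```

This is why a clear docstring and accurate type hints on each tool function are worth the effort. They are the model's only description of what the tool does.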

Production Defaults

from swarms import Agent

agent = Agent(
    agent_name="Production-Cerebras",
    model_name="cerebras/llama-3.3-70b",
    max_loops=1,
    persistent_memory=True,
    context_compression=True,
    autosave=True,
    retry_attempts=3,
    print_on=False,
)
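Beyond the agent's own `retry_attempts` setting, you may want retries around a whole `run()` call in your application code. A growing delay between attempts helps bursts of traffic back off from rate limits. The following is a minimal sketch of that pattern. `run_with_retries` is a hypothetical helper. Which exceptions count as retryable is an assumption you should narrow for your client.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `call()` up to `attempts` times with exponential backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time (0.5 s, 1 s, 2 s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("attempts must be at least 1")
```

Usage with the agent above: `run_with_retries(lambda: agent.run("Summarize today's support tickets."))`.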

Next Steps

Building Agents with Groq: also very fast, with a broader model selection
Building Agents with Ollama: run open models locally
Building Agents with vLLM: self-host open models at scale
Model Providers Overview
