Installation
Environment Setup
Quick Start
Every Cerebras model uses thecerebras/ prefix:
Model Names
| Model | model_name | Best for |
|---|---|---|
| Llama 3.3 70B | "cerebras/llama-3.3-70b" | Default — frontier open model at peak speed |
| Llama 3.1 70B | "cerebras/llama3-70b-instruct" | Llama 3.1 70B instruction-tuned |
| Llama 3.1 8B | "cerebras/llama3.1-8b" | Smaller, even faster |
Speed-Critical Use Cases
Voice Agent Loop
Cerebras’s speed is what makes real-time voice agents feel natural — the model can respond in tens of milliseconds:High-Volume Classification
When you need to process thousands of items per minute:Streaming
Streaming on Cerebras feels essentially instant:Massive Parallel Agent Swarms
Cerebras’s speed compounds in multi-agent setups — 20 agents in parallel can still finish in a couple seconds:Tool Use
Cerebras’s Llama models support function calling:Production Defaults
Next Steps
- Building Agents with Groq — also very fast, broader model selection
- Building Agents with Ollama — run open models locally
- Building Agents with vLLM — self-host open models at scale
- Model Providers Overview