Installation
Install Ollama from ollama.ai, then install Swarms with `pip install -U swarms`.
Environment Setup
No API key required. Ollama runs entirely on your machine. By default it listens on `http://localhost:11434`.
If you’re running Ollama on a different host, use that host’s address in place of `localhost`.
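A minimal sketch of pointing at a remote Ollama host from Python. It assumes Swarms routes `ollama/` models through LiteLLM and that LiteLLM reads the `OLLAMA_API_BASE` environment variable; verify both against your installed versions. The address `192.168.1.50` is a placeholder.

```python
import os

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_base_url() -> str:
    """Return the configured Ollama URL, falling back to the local default."""
    # OLLAMA_API_BASE is assumed to be the variable the client library reads.
    return os.environ.get("OLLAMA_API_BASE", DEFAULT_OLLAMA_URL).rstrip("/")


if __name__ == "__main__":
    # Placeholder address for another machine on your network running Ollama.
    os.environ["OLLAMA_API_BASE"] = "http://192.168.1.50:11434"
    print(ollama_base_url())
```

Set the variable before creating any agents so that every request in the process uses the same host.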
Quick Start
Every Ollama model uses the `ollama/` prefix.
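A minimal quick-start sketch, assuming the Swarms `Agent` constructor accepts `agent_name`, `system_prompt`, `model_name`, and `max_loops`. The helper `ollama_model` and the agent name are illustrative and not part of Swarms.

```python
def ollama_model(tag: str) -> str:
    """Add the ollama/ prefix to a model tag unless it is already present."""
    return tag if tag.startswith("ollama/") else f"ollama/{tag}"


def build_agent(tag: str = "llama3.2"):
    """Create a single-loop agent backed by a locally pulled model."""
    # Imported lazily so the helper above works without Swarms installed.
    from swarms import Agent

    return Agent(
        agent_name="Local-Assistant",  # illustrative name
        system_prompt="You are a concise, helpful assistant.",
        model_name=ollama_model(tag),
        max_loops=1,
    )
```

With Ollama running and the model pulled, `build_agent().run("Explain what a mutex is.")` should return the model's reply.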
Model Names
Any model you have pulled in Ollama is usable. Common choices:

| Model | model_name | Notes |
|---|---|---|
| Llama 3.2 | "ollama/llama3.2" | Meta’s latest small Llama (3B / 1B) |
| Llama 3.3 70B | "ollama/llama3.3" | Frontier Meta open model |
| Qwen 2.5 | "ollama/qwen2.5" | Strong open model from Alibaba |
| Mistral | "ollama/mistral" | Fast 7B European model |
| Phi 3 | "ollama/phi3" | Microsoft’s small but capable model |
| DeepSeek R1 | "ollama/deepseek-r1" | Local R1 distillation |
| Code Llama | "ollama/codellama" | Code-specialized Llama |

Run `ollama list` to see what you have installed locally.
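The same information is available programmatically. The sketch below queries Ollama's `/api/tags` endpoint and converts each entry into a `model_name` string. The function names are illustrative, and the server is assumed to be at the default address.

```python
import json
import urllib.request


def to_model_names(tags_payload: dict) -> list[str]:
    """Convert an /api/tags response into ollama/-prefixed model names."""
    return [f"ollama/{m['name']}" for m in tags_payload.get("models", [])]


def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are pulled locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return to_model_names(json.load(resp))
```

Calling `installed_models()` raises `urllib.error.URLError` if no server is listening, which doubles as a quick connectivity check.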
Tool Use
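A minimal sketch of giving a local agent a tool, assuming the Swarms `Agent` accepts plain Python callables through its `tools` parameter and that the chosen model supports function calling. The `word_count` tool and the agent name are illustrative.

```python
def word_count(text: str) -> str:
    """Count the words in a piece of text.

    Args:
        text: The text to analyze.

    Returns:
        A short sentence stating the number of words.
    """
    return f"The text contains {len(text.split())} words."


def build_tool_agent():
    """Create an agent that is allowed to call word_count."""
    # Imported lazily so the tool itself can be tested without Swarms.
    from swarms import Agent

    return Agent(
        agent_name="Tool-Agent",  # illustrative name
        model_name="ollama/llama3.2",
        max_loops=1,
        tools=[word_count],
    )
```

Type hints and a docstring on the tool matter here: they are typically what the framework uses to describe the function to the model.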
Modern Ollama models (Llama 3.1+, Qwen 2.5+) support function calling.
Streaming
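A minimal sketch of turning on streaming for a local model, assuming the Swarms `Agent` exposes a `streaming_on` flag for Ollama in the same way it does for hosted providers. The agent name is illustrative.

```python
STREAMING_AGENT_KWARGS = {
    "agent_name": "Streaming-Agent",  # illustrative name
    "model_name": "ollama/llama3.2",
    "max_loops": 1,
    "streaming_on": True,  # assumed flag: emit tokens as they are generated
}


def build_streaming_agent():
    """Create an agent that streams its output instead of returning it at once."""
    # Imported lazily so the configuration can be inspected without Swarms.
    from swarms import Agent

    return Agent(**STREAMING_AGENT_KWARGS)
```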
Streaming works the same as any other provider.
Privacy-First Workflows
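The privacy guarantee only holds while the endpoint really is local. The sketch below refuses to build an agent when the configured URL is not a loopback address. It assumes the endpoint is configured through `OLLAMA_API_BASE`; the helper names and the agent name are illustrative.

```python
import os
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_endpoint(url: str) -> bool:
    """Return True when the URL points at this machine's loopback interface."""
    return urlparse(url).hostname in LOCAL_HOSTS


def build_private_agent():
    """Create a summarizer agent only if inference stays on this machine."""
    base_url = os.environ.get("OLLAMA_API_BASE", "http://localhost:11434")
    if not is_local_endpoint(base_url):
        raise RuntimeError(f"Refusing to send sensitive data to {base_url}")

    # Imported lazily so the guard above can be tested without Swarms.
    from swarms import Agent

    return Agent(
        agent_name="Private-Summarizer",  # illustrative name
        system_prompt="Summarize the document. Do not repeat personal details.",
        model_name="ollama/llama3.2",
        max_loops=1,
    )
```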
Because nothing leaves your machine, Ollama is ideal for processing sensitive data.
Multi-Agent on Local Hardware
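A minimal two-agent sketch that runs fully on local hardware, assuming Swarms provides a `SequentialWorkflow` that takes an `agents` list and passes each agent's output to the next. The agent roles and prompts are illustrative.

```python
LOCAL_MODEL = "ollama/llama3.2"

# (agent name, system prompt) pairs; both agents share one local model.
AGENT_SPECS = [
    ("Researcher", "List the key facts relevant to the task."),
    ("Writer", "Turn the researcher's notes into a short summary."),
]


def build_pipeline():
    """Create a researcher -> writer pipeline backed by a single local model."""
    # Imported lazily so the specs can be inspected without Swarms installed.
    from swarms import Agent, SequentialWorkflow

    agents = [
        Agent(
            agent_name=name,
            system_prompt=prompt,
            model_name=LOCAL_MODEL,
            max_loops=1,
        )
        for name, prompt in AGENT_SPECS
    ]
    return SequentialWorkflow(agents=agents, max_loops=1)
```

A sequential layout suits a single GPU because only one agent requests inference at a time.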
You can run multi-agent setups entirely locally, which is useful for offline R&D.
Performance Tips
- Use quantized models: `ollama/llama3.1:8b-instruct-q4_K_M` runs much faster than the full-precision version on consumer hardware.
- Set `context_length` honestly: local models have small effective context windows. `8192` or `16384` is realistic for most setups.
- One agent at a time on a single GPU: concurrent agents on the same machine will queue at the inference engine.
Production Defaults
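One possible starting point, assuming the defaults simply encode the performance tips above and that the Swarms `Agent` accepts a `context_length` argument. The values and helper names are illustrative, not settings published by the Swarms maintainers.

```python
LOCAL_DEFAULTS = {
    "model_name": "ollama/llama3.2",  # swap in a quantized tag you have pulled
    "context_length": 8192,           # realistic window for most local setups
    "max_loops": 1,
}


def local_agent_kwargs(**overrides) -> dict:
    """Merge per-agent overrides on top of the shared local defaults."""
    return {**LOCAL_DEFAULTS, **overrides}


def build_local_agent(**overrides):
    """Create an agent from the shared defaults plus any overrides."""
    # Imported lazily so the defaults can be inspected without Swarms installed.
    from swarms import Agent

    return Agent(**local_agent_kwargs(**overrides))
```

For example, `build_local_agent(agent_name="Reviewer", context_length=16384)` keeps the shared model while widening the context window for one agent.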
Next Steps
- Building Agents with vLLM — production self-hosting at scale
- Building Agents with Cerebras — fastest hosted open models
- Building Agents with Groq — fast hosted open models
- Model Providers Overview