How to Run Llama with OpenClaw
Meta's Llama is the most popular open-weight AI model family. Llama 3.1 and 3.2 run locally via Ollama, so you can operate an AI agent with zero API costs.
Install Ollama first, then pull the Llama model you want. Llama 3.1 8B is the default recommendation — it fits in 8 GB RAM and delivers solid performance for most tasks.
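On Linux, the install-and-pull sequence looks like this (the one-line installer script is Ollama's documented Linux install path; macOS and Windows users can download the app from ollama.com instead):

```shell
# Install Ollama (Linux one-liner; on macOS/Windows, download the app instead)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the default recommendation
ollama pull llama3.1

# Confirm the model is available locally
ollama list
```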
Model sizes: 8B (8 GB RAM, good for general use), 70B (64 GB RAM, near-GPT-4 quality), and 405B (enterprise-only, needs a cluster). For most home users, 8B is the practical choice.
Llama 3.2 introduced smaller vision-capable models: 11B and 90B with image understanding. These are useful if you want your agent to analyze screenshots or photos sent via a messenger.
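Ollama publishes these under the `llama3.2-vision` tag (11B is the default; 90B is `llama3.2-vision:90b`). A sketch, assuming a local image file at `./screenshot.png`:

```shell
# Pull the 11B vision model (the default tag)
ollama pull llama3.2-vision

# Ask about a local image by including its path in the prompt
ollama run llama3.2-vision "Describe this screenshot: ./screenshot.png"
```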
Performance varies by hardware. On an M2 Mac Mini with 16 GB RAM, Llama 3.1 8B generates about 20-30 tokens per second — responsive enough for conversational use. On a modest VPS (4 GB RAM), expect slower speeds.
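To measure throughput on your own hardware, run a prompt with the `--verbose` flag, which prints timing statistics; the `eval rate` line is the generation speed in tokens per second:

```shell
# --verbose prints load time, prompt eval rate, and eval rate (tokens/s)
ollama run llama3.1 --verbose "Summarize the plot of Hamlet in two sentences."
```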
Llama is best for: privacy-focused deployments, offline operation, home servers, and users who want zero API costs. For quality-critical tasks, cloud models like Claude are still recommended.
```shell
# Pull Llama 3.1
ollama pull llama3.1

# Or the larger version for better quality
ollama pull llama3.1:70b

# Configure OpenClaw
openclaw config set provider ollama
openclaw config set model llama3.1
```
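Before pointing OpenClaw at Ollama, it's worth a quick sanity check that the local server is up. Ollama listens on port 11434 by default and exposes a `/api/generate` endpoint:

```shell
# Smoke test against the local Ollama server (default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Say hello in one word.",
  "stream": false
}'
```

If this returns a JSON response with a `response` field, Ollama is serving correctly and OpenClaw should be able to reach it.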