Ollama
The easiest way to build and run open-source AI models locally
Free tier available·All audiences·Powered by Open-source models (Llama, Mistral, Gemma, etc.)·API available·Open source
Key strengths
- Run open-source LLMs fully offline and locally
- One-command install and model management via CLI
- OpenAI-compatible REST API for easy integration
- Optional cloud tier for larger, faster models
- Privacy-first: data is never used for training
Free tier + paid plans · from $20 USD/mo
San Francisco, United States
Founded 2023
Self-hostable
Developer Documentation
Installation
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Verify
ollama --version
Key CLI Commands
ollama pull llama3 # Download a model
ollama run llama3 # Run interactive session
ollama list # List installed models
ollama serve # Start the local API server (default: port 11434)
ollama launch openclaw # Launch a compatible app
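The commands above can also be scripted. The following is a minimal sketch that pulls a model only when `ollama list` does not already show it. It assumes the listing prints a header row followed by one row per model, with the model name (such as `llama3:latest`) in the first column; the helper names `installed_models`, `has_model`, and `ensure_model` are hypothetical and not part of Ollama.

```python
import subprocess


def installed_models(list_output: str) -> set[str]:
    """Parse `ollama list` text output into a set of model names.

    Assumption: the first line is a header and each later line starts
    with the model name in the first whitespace-separated column.
    """
    names = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def has_model(name: str, installed: set[str]) -> bool:
    """Return True if `name` is installed, treating a bare name as `name:latest`."""
    wanted = name if ":" in name else f"{name}:latest"
    return wanted in installed


def ensure_model(name: str) -> None:
    """Run `ollama pull NAME` unless `ollama list` already shows the model."""
    listing = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    if not has_model(name, installed_models(listing)):
        subprocess.run(["ollama", "pull", name], check=True)
```

Calling `ensure_model("llama3")` before `ollama run llama3` would avoid a repeated download, provided the listing format matches the assumption above.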
REST API (OpenAI-compatible)
Ollama exposes an HTTP API at http://localhost:11434. Its /v1 endpoints accept requests in the OpenAI Chat Completions format:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"messages": [{"role": "user", "content": "Hello!"}]
}'
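The same request can be sent from a program. Below is a minimal sketch using only the Python standard library; it assumes the server is running on the default address and that the response follows the OpenAI Chat Completions shape, with the reply under `choices[0].message.content`. The function names `build_chat_request` and `chat` are illustrative, not part of any Ollama client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # default local address


def build_chat_request(model, messages, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt):
    """Send one user message and return the assistant's reply text.

    Assumption: the response body uses the Chat Completions layout.
    """
    req = build_chat_request(model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server started and the model pulled, `print(chat("llama3", "Hello!"))` would be the equivalent of the curl call.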
Key Parameters
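| Parameter | Description |
|---|---|
| model | Model name (e.g., llama3, mistral, gemma) |
| messages | Chat history array |
| stream | Boolean; enables token streaming |
| options | Model options (temperature, num_ctx, top_p, etc.); accepted by the native /api/chat endpoint, while /v1/chat/completions takes temperature and top_p as top-level fields |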
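As a rough illustration of `stream` and `options` together, the sketch below builds a request body and reassembles a streamed reply. It assumes Ollama's native `/api/chat` endpoint, which takes an `options` object and, when streaming, returns newline-delimited JSON chunks carrying `message.content` and a final `done` flag. The helper names and the default values for `temperature` and `num_ctx` are illustrative choices, not recommendations.

```python
import json


def build_native_chat_payload(model, messages, stream=True,
                              temperature=0.7, num_ctx=4096):
    """Build a request body for the native /api/chat endpoint (assumed shape)."""
    return {
        "model": model,
        "messages": messages,
        "stream": stream,
        "options": {"temperature": temperature, "num_ctx": num_ctx},
    }


def join_stream(lines):
    """Concatenate message text from newline-delimited JSON chunks.

    `lines` may be any iterable of str or bytes, one JSON object per line.
    Reading stops at the first chunk whose `done` field is true.
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Because an open `urllib.request.urlopen` response can be iterated line by line, it could be passed straight to `join_stream` after posting the payload to `/api/chat`.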
Hardware Acceleration
- Apple Silicon: Metal (automatic)
- NVIDIA: CUDA (automatic if CUDA drivers present)
- AMD: ROCm
- Fallback: CPU inference
