Have you ever wanted to run a powerful large language model (LLM) like Llama 3 or Gemma right on your own computer, but needed a consistent, portable setup? That’s where using Ollama with Docker and Docker Compose comes in.
Docker Compose is a fantastic tool that allows you to define and run multi-container Docker applications. By using it with Ollama, you get a clean, isolated environment that is easy to manage and replicate across different machines.
This guide will walk you through the process of setting up Ollama and your first model using Docker Compose.
Step 1: Create Your Docker Compose Configuration
The core of this setup is a single file: docker-compose.yml. This file defines the services (containers) you want to run. You can place this file in any directory you choose.
Create a new file named docker-compose.yml and add the following content:
version: '3.8'

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: always

volumes:
  ollama_data:
What this file does:
- version: '3.8': Specifies the Docker Compose file format version.
- services: Defines the containers to be run.
- ollama: This is the name of our service.
- image: ollama/ollama:latest: Tells Docker to use the latest official Ollama image from Docker Hub.
- ports: - "11434:11434": Maps the container’s port 11434 to the same port on your host machine, which is the default port for Ollama’s API.
- volumes: - ollama_data:/root/.ollama: Creates a persistent volume named ollama_data to store the downloaded models. This means your models will not be deleted if the container is removed.
- restart: always: Ensures that the container restarts automatically if it stops.
- The top-level volumes: block: Defines the named volume ollama_data itself.
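Before starting anything, you can ask Docker Compose to validate the file and print the resolved configuration. This is a standard Compose command and is purely an optional sanity check:
docker compose config
If the file has an indentation or syntax mistake, this command reports it without starting any containers.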
Step 2: Start the Ollama Service
Now, open your terminal or Command Prompt, navigate to the directory where you saved your docker-compose.yml file, and run the following command:
docker compose up -d
- up: Starts the services defined in the docker-compose.yml file.
- -d: Runs the containers in “detached” mode, so they run in the background and don’t tie up your terminal.
Docker will now download the Ollama image and start the container. You’ll know it’s running when you get your command prompt back.
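If you want to confirm that the container is actually up before moving on, these standard Docker Compose commands are a quick check (the service name ollama matches the one defined above):
docker compose ps
docker compose logs -f ollama
The first shows the container’s status and port mapping; the second follows the Ollama server logs (press Ctrl+C to stop following).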
Step 3: Finding and Using Pre-made Models
Ollama hosts a wide variety of pre-trained models that are ready for you to use. You can find the full library on the official website at https://ollama.com/library.
The library is a fantastic resource for exploring models based on their size, purpose, and capabilities. You’ll find popular models like Llama 3, Mistral, and Gemma, as well as specialized models for coding (CodeLlama) and image understanding (LLaVA).
How to choose a model:
- Check your hardware: LLMs are memory-intensive, whether they run in your system RAM or on a GPU. A general rule of thumb is that 7B (billion-parameter) models need at least 8 GB of RAM/VRAM, 13B models need 16 GB, and larger models need even more (see the example after this list for pulling a specific size).
- Consider the model’s purpose: Look for models fine-tuned for a specific task. For general-purpose conversations, Llama 3 is an excellent choice. For coding assistance, CodeLlama is highly recommended.
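As an illustration of matching a model to your hardware, many library entries offer multiple parameter-size tags. The tag name below (llama3:8b) is taken from the library page at the time of writing and may change, so treat it as an example rather than a guarantee:
docker compose exec ollama ollama pull llama3:8b
An 8B tag fits comfortably on a machine with 8 GB or more of RAM/VRAM, while the 70B tag of the same model needs far more memory.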
How to download and run a model:
Ollama provides a simple command to download and run any model from its library, and the docker compose exec command is a great way to manage models from your terminal.
To download a model without running it immediately, use ollama pull:
docker compose exec ollama ollama pull mistral
To download and immediately start chatting with a model, use ollama run:
docker compose exec ollama ollama run llama3
If you don’t have the model already, the run command will automatically pull it for you first.
To see all the models you have downloaded, use the ollama list command:
docker compose exec ollama ollama list
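Models can take up several gigabytes each, so it is worth knowing how to remove one you no longer need. ollama rm is the standard command for this; the model name mistral is just an example:
docker compose exec ollama ollama rm mistral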
Step 4: Next Steps and Advanced Usage
Option 1: Executing a command inside the container
You can use docker compose exec to run commands inside the container, which lets you manage models directly from your terminal.
To run a popular model like Llama 3, use this command:
docker compose exec ollama ollama run llama3
The first time you run this, it will download the model. After the download is complete, you can start chatting with the model directly in your terminal.
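You don’t have to use the interactive chat. ollama run also accepts a prompt as an argument and exits after printing the answer, which is handy for quick one-off questions or shell scripts (the prompt here is just an example):
docker compose exec ollama ollama run llama3 "Explain Docker volumes in one sentence."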
Option 2: Using the Ollama API
Since you’ve mapped port 11434, you can also interact with Ollama via its API, just as you would with a local installation. For example, you can use curl to send a request:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
This method is particularly useful for integrating Ollama into applications and scripts.
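By default, the generate endpoint streams its answer as a series of JSON objects. If you’d rather receive one complete JSON response, the API accepts a stream parameter; this is the same request with streaming turned off:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'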
Step 5: Getting Started with the OpenAPI Specification
Ollama’s API is fully documented, and because it is a straightforward REST API it can be described with the OpenAPI (formerly Swagger) standard, which means you can use it with various API clients and frameworks for seamless integration.
The API Base URL
While there isn’t a single, downloadable JSON file, all of the API endpoints follow a consistent base URL.
- Base URL: http://localhost:11434/api
Common Endpoints
Here are a few of the most important endpoints you’ll use:
- POST /api/generate: The main endpoint for generating completions from a prompt. This is used for single-turn text generation.
- POST /api/chat: Used for multi-turn conversations; you send the conversation history as a list of messages, and the model replies with that context (see the example below).
- GET /api/tags: Retrieves a list of all models that have been downloaded and are available on your local Ollama instance.
- POST /api/pull: Used to pull (download) a new model from the Ollama library.
- POST /api/embeddings: Generates a numerical vector for a given text prompt, which is essential for Retrieval-Augmented Generation (RAG) applications.
You can find the full API reference in the official Ollama documentation, which details all the request and response body formats for each endpoint. This documentation is your key to programmatically interacting with your local Ollama instance, for example, from a Spring AI application.
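As a concrete illustration of the chat endpoint, the request below sends the conversation history as a list of messages. The model name and message content are placeholders; adjust them to whatever you have pulled locally:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "What is Docker Compose used for?" }
  ],
  "stream": false
}'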
Step 6: Enabling GPU Acceleration
Enabling GPU acceleration is a critical step for getting the best performance from your models. The method you use depends on your operating system.
For Linux and Windows (NVIDIA GPUs)
For a significant performance boost on systems with NVIDIA GPUs, you’ll need to configure Docker Compose to give the container access to the GPU. This requires the NVIDIA Container Toolkit to be installed on your host system.
Once the toolkit is installed, update your docker-compose.yml to include the deploy and runtime options:
version: '3.8'

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    runtime: nvidia
    restart: always

volumes:
  ollama_data:
This configuration tells Docker to use the NVIDIA runtime and to reserve all available GPUs for the ollama container, ensuring you get the best possible performance.
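To confirm the container can actually see the GPU, you can try running nvidia-smi inside it. This assumes the NVIDIA Container Toolkit exposes the driver utilities to the container, which is the usual behavior with the configuration above; if the command isn’t available, check the Ollama server logs for a line reporting GPU detection instead:
docker compose exec ollama nvidia-smi
docker compose logs ollama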
For macOS (Apple Silicon)
On a Mac, the situation is different: there is nothing to add to the docker-compose.yml file, because Docker on macOS cannot pass the Apple GPU through to Linux containers. The official Ollama application for macOS, on the other hand, uses Apple’s Metal Performance Shaders (MPS) for acceleration by default.
This means that if you’re using the native Ollama application on your Mac (which is the recommended approach for the best performance), it will automatically use the GPU. If you run Ollama inside Docker Desktop on a Mac instead, inference falls back to the CPU, which is noticeably slower for larger models.
In short: on macOS, GPU acceleration is a built-in feature of the native Ollama application; the Docker Compose setup still works, but it runs on the CPU.
Step 7: Adding a Web User Interface (Optional)
While the command line and API are powerful, a web-based UI can make interacting with models much easier. Open WebUI is an excellent open-source choice that integrates seamlessly with Ollama.
To add Open WebUI to your setup, update your docker-compose.yml file to include a second service.
version: '3.8'

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: always

  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: always

volumes:
  ollama_data:
  open-webui_data:
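After saving the updated file, run docker compose up -d again. Compose pulls the Open WebUI image and starts the new service alongside the existing Ollama container, and the UI should then be reachable in your browser at http://localhost:3000 (the host port mapped above):
docker compose up -d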
That’s it! You now have a complete, performant, and user-friendly setup for running local LLMs.