Have you ever wanted to run a powerful large language model (LLM) like Llama 3 or Gemma right on your own computer, but with a consistent, portable setup? That’s where using Ollama with Docker and Docker Compose comes in.

Docker Compose is a fantastic tool that allows you to define and run multi-container Docker applications. By using it with Ollama, you get a clean, isolated environment that is easy to manage and replicate across different machines.

This guide will walk you through the process of setting up Ollama and your first model using Docker Compose.

Step 1: Create Your Docker Compose Configuration

The core of this setup is a single file: docker-compose.yml. This file defines the services (containers) you want to run. You can place this file in any directory you choose.

Create a new file named docker-compose.yml and add the following content.

version: '3.8'

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: always

volumes:
  ollama_data:

What this file does:

  • version: '3.8': Specifies the Compose file format version (newer Docker Compose releases treat this key as obsolete and may print a warning, but it’s harmless to keep).
  • services: Defines the containers to be run.
  • ollama: This is the name of our service.
  • image: ollama/ollama:latest: Tells Docker to use the latest official Ollama image from Docker Hub.
  • ports: - "11434:11434": Maps the container’s port 11434 to the same port on your host machine, which is the default port for Ollama’s API.
  • volumes: - ollama_data:/root/.ollama: This creates a persistent volume named ollama_data to store the downloaded models. This means your models will not be deleted if the container is removed.
  • restart: always: Restarts the container automatically if it exits or when Docker itself restarts.
  • The top-level volumes: block: Declares the ollama_data named volume referenced by the service.
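
Before you bring anything up, it’s worth letting Compose validate the file. Running docker compose config in the same directory parses docker-compose.yml and prints the fully resolved configuration, so indentation or syntax mistakes show up immediately:

docker compose config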

Step 2: Start the Ollama Service

Now, open your terminal or Command Prompt, navigate to the directory where you saved your docker-compose.yml file, and run the following command:

docker compose up -d

  • up: Starts the services defined in the docker-compose.yml file.
  • -d: Runs the containers in “detached” mode, so they run in the background and don’t tie up your terminal.

Docker will now download the Ollama image and start the container. Because the service runs detached, your prompt comes back as soon as the container is created; the commands below are a quick way to confirm Ollama is actually up.
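
The first two are standard Compose status and log commands; the curl check assumes you kept the default 11434 port mapping, and Ollama’s root endpoint simply replies with a short “Ollama is running” message:

# List the Compose services and their status
docker compose ps

# Follow the Ollama container’s logs (Ctrl+C to stop following)
docker compose logs -f ollama

# The API answers on the mapped port once it’s ready
curl http://localhost:11434/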

Step 3: Finding and Using Pre-made Models

Ollama hosts a wide variety of pre-trained models that are ready for you to use. You can find the full library on the official website at https://ollama.com/library.

The library is a fantastic resource for exploring models based on their size, purpose, and capabilities. You’ll find popular models like Llama 3, Mistral, and Gemma, as well as specialized models for coding (CodeLlama) and image understanding (LLaVA).

How to choose a model:

  • Check your hardware: LLMs are memory-intensive, whether they run in RAM or in GPU VRAM. A general rule of thumb is that 7B (billion-parameter) models need at least 8 GB of RAM/VRAM, 13B models need 16 GB, and larger models need even more.
  • Consider the model’s purpose: Look for models fine-tuned for a specific task. For general-purpose conversations, Llama 3 is an excellent choice. For coding assistance, CodeLlama is highly recommended.

How to download and run a model:

Ollama provides simple commands to download and run any model from its library. Because Ollama is running inside a container, you prefix each one with docker compose exec ollama to run it from your terminal.

To download a model without running it immediately, use ollama pull:

docker compose exec ollama ollama pull mistral

To download and immediately start chatting with a model, use ollama run:

docker compose exec ollama ollama run llama3

If you don’t have the model already, the run command will automatically pull it for you first.

To see all the models you have downloaded, use the ollama list command:

docker compose exec ollama ollama list
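
Two more model-management commands are worth knowing. Most library models come in several sizes that you select with a tag; the llama3:8b tag below is just an illustrative example, so check a model’s library page for the tags it actually offers. And ollama rm deletes a model you no longer need, freeing its disk space:

# Pull a specific size variant by tag (llama3:8b is an example tag)
docker compose exec ollama ollama pull llama3:8b

# Remove a downloaded model to free disk space
docker compose exec ollama ollama rm mistral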

Step 4: Next Steps and Advanced Usage

Option 1: Executing a command inside the container

You can use docker compose exec to run any ollama command inside the container, exactly as you did in Step 3.

To start an interactive chat with a popular model like Llama 3, use this command:

docker compose exec ollama ollama run llama3

The first time you run this, it will download the model. After the download is complete, you can chat with the model directly in your terminal; type /bye to end the session.

Option 2: Using the Ollama API

Since you’ve mapped port 11434, you can also interact with Ollama via its API, just as you would with a local installation. For example, you can use curl to send a request:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

This method is particularly useful for integrating Ollama into applications and scripts.
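
By default, /api/generate streams its answer back as a series of JSON objects, one per chunk. If you’d rather receive a single JSON object, which is often easier to handle in scripts, set "stream": false. This sketch assumes the llama3 model has already been pulled:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'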

Step 5: Getting Started with the Ollama API

Ollama’s REST API is fully documented in the official repository. Ollama doesn’t ship an official OpenAPI (formerly Swagger) specification, but the API is a small, consistent set of JSON-over-HTTP endpoints, which makes it easy to call from API clients, scripts, and frameworks.

The API Base URL

All of the endpoints share a consistent base URL on the port you mapped earlier.

  • Base URL: http://localhost:11434/api

Common Endpoints

Here are a few of the most important endpoints you’ll use:

  • POST /api/generate: The main endpoint for generating completions from a prompt. This is used for single-turn text generation.
  • POST /api/chat: Used for multi-turn conversations, as it maintains context and conversation history.
  • GET /api/tags: Retrieves a list of all models that have been downloaded and are available on your local Ollama instance.
  • POST /api/pull: Used to pull (download) a new model from the Ollama library.
  • POST /api/embeddings: Generates a numerical vector for a given text prompt, which is essential for Retrieval-Augmented Generation (RAG) applications.

You can find the full API reference in the official Ollama documentation, which details all the request and response body formats for each endpoint. This documentation is your key to programmatically interacting with your local Ollama instance, for example, from a Spring AI application.
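
To make the endpoints above concrete, here is a short curl sketch against /api/chat and /api/tags. It assumes the default port mapping and an already-pulled llama3 model; "stream": false again keeps the chat response to a single JSON object:

# Multi-turn chat: the messages array carries the conversation history
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "What is Docker Compose?" }
  ],
  "stream": false
}'

# List the models available on this local instance
curl http://localhost:11434/api/tags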

Step 6: Enabling GPU Acceleration

Enabling GPU acceleration is a critical step for getting the best performance from your models. The method you use depends on your operating system.

For Linux and Windows (NVIDIA GPUs)

For a significant performance boost on systems with NVIDIA GPUs, you’ll need to configure Docker Compose to give the container access to the GPU. This requires the NVIDIA Container Toolkit to be installed on your host system.

Once the toolkit is installed, update your docker-compose.yml to include the deploy and runtime options.

version: '3.8'

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    runtime: nvidia
    restart: always

volumes:
  ollama_data:

This configuration tells Docker to use the NVIDIA runtime and to reserve all available GPUs for the ollama container, ensuring you get the best possible performance.
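
Once the GPU-enabled stack is running, you can check whether a model actually lands on the GPU. ollama ps reports the processor a loaded model is using, and nvidia-smi (mounted into the container by the NVIDIA Container Toolkit) should list your GPU; treat this as a quick sanity-check sketch:

# Load a model with a one-off prompt, then see whether it runs on GPU or CPU
docker compose exec ollama ollama run llama3 "Say hello"
docker compose exec ollama ollama ps

# Confirm the container can see the GPU at all
docker compose exec ollama nvidia-smi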

For macOS (Apple Silicon)

On a Mac, the situation is different. You do not need to modify the docker-compose.yml file, because the GPU settings above are NVIDIA-specific. The official Ollama application for macOS uses Apple’s Metal framework for GPU acceleration by default.

That means if you’re using the native Ollama application on your Mac (the recommended approach for the best performance), it will automatically use the GPU. Docker Desktop on macOS, however, does not pass the Apple GPU through to Linux containers, so Ollama running inside Docker on a Mac will fall back to the CPU.

In short: on macOS, GPU acceleration comes from running the native Ollama application; the Docker Compose setup still works on a Mac, but it will be CPU-only.

Step 7: Adding a Web User Interface (Optional)

While the command line and API are powerful, a web-based UI can make interacting with models much easier. Open WebUI is an excellent open-source choice that integrates seamlessly with Ollama.

To add Open WebUI to your setup, update your docker-compose.yml file to include a second service.

version: '3.8'

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: always
  
  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    restart: always

volumes:
  ollama_data:
  open-webui_data:
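
Apply the change the same way as before: Compose will pull the Open WebUI image and start the new container alongside Ollama. The UI is then served on port 3000 of your host (that’s the 3000:8080 mapping above), and on your first visit you’ll be asked to create a local account:

docker compose up -d

Then open http://localhost:3000 in your browser.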

That’s it! You now have a complete, performant, and user-friendly setup for running local LLMs.


