This guide shows how to run the Ollama server on a Kubernetes node with an NVIDIA GPU. It assumes the NVIDIA device plugin is installed on the cluster.
FluxCD manages the Ollama deployment under gitops/clusters/homelab/apps/ollama/.
Commit the manifest files to the repository and Flux will create the namespace,
deployment and service automatically. The deployment mounts an emptyDir volume at
/root/.ollama for model storage. Replace it with a PersistentVolumeClaim if the models
should survive pod restarts.
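A minimal sketch of that PersistentVolumeClaim and the matching volume swap in the pod spec, assuming the cluster has a default StorageClass (the claim name, namespace, and size are illustrative assumptions, not taken from the repository):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models        # assumed name
  namespace: ollama          # assumed namespace
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi          # size the claim for the models you plan to keep
---
# In the deployment's pod spec, replace the emptyDir volume with:
# volumes:
#   - name: ollama-data
#     persistentVolumeClaim:
#       claimName: ollama-models
```

Commit the claim alongside the other manifests so Flux reconciles it with the deployment.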
Pull a model into the running pod:
kubectl exec deployment/ollama-gpu -- ollama pull qwen3:4b
Verify the model is available:
kubectl exec deployment/ollama-gpu -- ollama list
Send a test request to the service:
curl http://<service-ip>:80/api/generate -d '{"model":"qwen3:4b","prompt":"Hello"}'
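By default /api/generate streams its reply as one JSON object per line, each carrying a "response" fragment and a "done" flag; concatenating the fragments yields the full answer. A quick sketch of stitching them together with standard tools (the sample lines below are illustrative, not captured server output):

```shell
# Sample NDJSON lines in the shape /api/generate streams (illustrative data)
printf '%s\n' \
  '{"model":"qwen3:4b","response":"Hel","done":false}' \
  '{"model":"qwen3:4b","response":"lo","done":false}' \
  '{"model":"qwen3:4b","response":"","done":true}' |
grep -o '"response":"[^"]*"' |      # pull out each response fragment
sed 's/^"response":"//; s/"$//' |   # strip the key and surrounding quotes
tr -d '\n'                          # concatenate the fragments -> Hello
```

Add "stream": false to the request body to receive a single JSON object instead of a stream.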
To swap models (pull new, remove old, verify GPU):
scripts/ollama/update-model.sh qwen3:4b qwen2.5:3b
Add --ha to also update the Home Assistant conversation agent.