Wong Edan's

Build Your Own Monster AI Homelab with Ollama and Proxmox

April 10, 2026 • By Azzar Budiyanto

Why Rent Clouds When You Can Own the Storm?

Listen up, you beautiful band of data-hoarders and silicon-worshippers! If you are still paying monthly tributes to the cloud giants just to ask a chatbot how to boil an egg, you are officially Edan (crazy)—and not the good kind. Why let Sam Altman have all the fun when you can turn your bedroom into a data center that sounds like a jet engine and doubles as a space heater? Today, we are diving deep into the madness of setting up an AI Server Homelab. We aren’t just running a script; we are building a local brain using Ollama, Open WebUI (OWUI), and the sheer power of Proxmox 9. Grab your screwdriver and some strong coffee; it’s about to get technical, loud, and incredibly rewarding.

The goal is simple: total digital sovereignty. We want a local LLM setup that rivals ChatGPT but runs entirely on your own hardware. Whether you’re rocking a $2000 EPYC beast to crunch through Deepseek R1 671b or a modest setup with an AMD iGPU, this guide covers the “how-to” for the modern homelab enthusiast. We’re focusing on the Ollama setup within a Proxmox LXC environment because, let’s be honest, virtual machines are so 2010. We want that sweet, sweet near-native performance with the efficiency of containers.

The Foundation: Hardware Selection and the EPYC Dream

Before we touch the keyboard, we need to talk iron. You can’t run a 671-billion parameter model like Deepseek R1 on a toaster. Recent community breakthroughs have shown that a $2000 EPYC system is the current sweet spot for running high-end models locally. Why? Because memory bandwidth is the king of AI, and enterprise-grade CPUs with multichannel DDR4/DDR5 give you the lanes you need to move those weights around.

However, if your budget is more “instant noodles” than “enterprise hardware,” don’t panic. The AI Server Homelab is surprisingly flexible. You can build a quiet, efficient setup using Docker + AMD iGPU or a standard Ubuntu-based box with a used NVIDIA GPU. The critical components you need to consider are:

  • The Hypervisor: Proxmox VE (we’re looking at version 9) is the gold standard for homelabbers. It allows you to segment your AI services from the rest of your network.
  • The Compute: NVIDIA is still the easiest path due to CUDA, but AMD support via ROCm and iGPU passthrough is rapidly maturing.
  • The RAM: For LLMs, RAM is often more important than the CPU. If you’re running Ollama, you need enough VRAM (GPU) or System RAM (CPU) to hold the model weights.

Step 1: Preparing the Proxmox Host – Don’t Skip the Ritual!

I have seen many brave souls fall into the abyss because they rushed this step. You must update the Proxmox host system correctly before you even think about passing through a GPU. This isn’t just a suggestion; it’s a survival requirement. If you ignore the order of operations, you will find yourself rerunning update-initramfs later while crying into your keyboard.

First, access your Proxmox shell and ensure your repositories are correct. Then, run the standard updates:

apt update && apt dist-upgrade -y

Next, we need to install the headers for your kernel, which are essential for the NVIDIA drivers to hook into the system properly. This is where the GPU Passthrough magic begins. Use the following command to ensure you have the necessary build tools:

apt install build-essential dkms pve-headers-$(uname -r)

Wong Edan’s Pro-Tip: If you are planning to use an NVIDIA GPU, you must blacklist the open-source Nouveau drivers. If you don’t, the kernel will grab the card before the proprietary drivers can, and your AI server will be as useful as a chocolate teapot.
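A minimal sketch of that blacklist, using the standard modprobe.d convention (the filename is just a convention; any name ending in .conf works):

```shell
# Tell the kernel never to load the open-source Nouveau driver
echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf

# Rebuild the initramfs so the blacklist applies at boot, then reboot
update-initramfs -u
reboot
```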

Step 2: The Dark Art of GPU Passthrough in Proxmox LXC

To run Ollama with hardware acceleration inside a Proxmox LXC, we need to share the host’s GPU resources with the container. This is a bit like trying to share a single lollipop between two toddlers—it requires finesse and firm rules. This is often where people run into Proxmox CUDA installation issues.

You need to install the NVIDIA drivers on the host but NOT the CUDA toolkit (unless you really want to). Inside the LXC container, you will install the libraries but not the kernel modules. The versions must match exactly. To verify your host can see the card, run:

nvidia-smi

Once the host is ready, you’ll need to modify the LXC configuration file located at /etc/pve/lxc/ID.conf. You’ll be adding lines to allow the container to access /dev/nvidia0, /dev/nvidiactl, and /dev/nvidia-uvm. Without these permissions, Ollama will revert to CPU mode, and your model generation speed will drop faster than my motivation on a Monday morning.
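Here is an illustrative set of entries for that config file, assuming a Proxmox 7+ host using cgroup2. The device major numbers (195 for the NVIDIA devices, often 510 or 511 for nvidia-uvm) vary between systems, so check yours with ls -l /dev/nvidia* before copying:

```
# /etc/pve/lxc/ID.conf — example GPU passthrough entries (majors are system-specific!)
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```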

Step 3: Setting Up Ollama – The Heart of the Homelab

Now we get to the fun part. Ollama has revolutionized the beginner’s AI homelab landscape because it packages complex LLMs into a simple, manageable service. No more wrestling with 70 different Python dependencies or broken conda environments.

Inside your LXC (running Debian 11 or Ubuntu), installing Ollama is a one-liner:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, Ollama runs as a service in the background. You can start pulling models immediately. Want to try the latest? Run:

ollama run llama3

But wait! If you’re a real power user, you’re looking at the Ollama Multimodal features. This allows you to feed images to your local AI. Just remember, multimodal models are heavier and require more VRAM. If your Proxmox 9 LXC setup is correctly configured for GPU passthrough, Ollama will automatically detect the CUDA cores and start munching on those tokens at lightning speed.
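As a quick sketch of the multimodal workflow: vision-capable models like llava accept a local image path right in the prompt (the image filename here is just a placeholder):

```shell
# Pull a vision-capable model from the Ollama library
ollama pull llava

# Hand it an image by local path inside the prompt (placeholder filename)
ollama run llava "Describe this image: ./living-room.jpg"
```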

Step 4: Open WebUI (OWUI) – Your Custom ChatGPT Interface

Running AI in a terminal is cool for showing off to your cat, but for daily use, we want a “ChatGPT-like” experience. This is where Open WebUI (formerly Ollama WebUI) comes in. It provides a stunning, feature-rich interface that supports user accounts, model switching, and even RAG (Retrieval-Augmented Generation).

The best way to deploy OWUI is via Docker. If you are already in a Debian-based LXC, install Docker and run the following:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Why the host-gateway flag? Because the Docker container needs to talk to the Ollama service running on the host (or the LXC). Once this is running, you can point your browser to http://[LXC-IP]:3000 and boom—you have your own private, local AI assistant. No subscriptions, no data harvesting, just pure, unadulterated intelligence.
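Before blaming Open WebUI for a blank model list, it’s worth confirming Ollama is actually answering. Ollama listens on port 11434 by default and exposes a simple API; /api/tags returns the models you have pulled:

```shell
# Run this inside the LXC where Ollama lives — a JSON list of your
# pulled models means the service is up and reachable
curl http://localhost:11434/api/tags
```

If that works locally but Open WebUI still sees nothing, the usual culprit is the host-gateway mapping or a firewall between the Docker bridge and the host.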

Step 5: Advanced Setup – vLLM and Deepseek R1

For those who have built the aforementioned $2000 EPYC beast, you might find Ollama’s simplicity a bit limiting. You might want to explore vLLM local AI. vLLM is a high-throughput serving engine that is particularly good at handling massive models like Deepseek R1 671b.

Deepseek R1 has caused a stir in the community because it offers performance comparable to proprietary models but is fully runnable locally if you have the hardware. Setting up vLLM requires a bit more Linux “Kung Fu,” involving Python virtual environments and specific pip installs for torch and vllm. However, the performance gains in tokens-per-second are worth the headache if you are serving multiple users in your homelab.
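A minimal sketch of that setup, assuming Python 3.10+ and a CUDA-capable GPU. The model ID is illustrative: the full 671B R1 needs far more VRAM than most homelabs have, so a distilled variant is shown here:

```shell
# Isolate vLLM in its own virtual environment
python3 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install vllm

# Serve an OpenAI-compatible API on port 8000
# (model ID is a placeholder — pick one your VRAM can actually hold)
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --max-model-len 8192
```

Once it is up, anything that speaks the OpenAI API format — including Open WebUI — can point at http://[host]:8000/v1 as a backend.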

“The setup that lets you run a custom ChatGPT isn’t just about the software; it’s about the orchestration of Proxmox, Docker, and the underlying hardware drivers working in perfect harmony.”

Troubleshooting the “Initramfs” and Driver Nightmare

As hinted at earlier, the initramfs stage is where most beginners fail. When you install NVIDIA drivers on the Proxmox host, the kernel needs to be updated to recognize these changes at boot. If you see an error where the GPU isn’t detected after a reboot, you likely need to run:

update-initramfs -u

Another common issue is the “version mismatch.” If your host has NVIDIA driver 535 and your container tries to use 550 libraries, everything will break. Keep them synced. Also, if you are building a “quiet” home lab using an AMD iGPU, your path will involve llama.cpp or specific Docker builds designed for ROCm. It’s a different path, but the destination—local AI glory—is the same.
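A quick way to check for that mismatch (the container ID 101 is just an example):

```shell
# Driver version on the Proxmox host
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Driver version inside the container — the two numbers must match exactly
pct exec 101 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader
```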

Networking and Security: Avoiding the VPN Trap

You’ve built this amazing AI Server Homelab, and now you want to use it from your phone while you’re at the coffee shop. Stop! Don’t just open a port on your router. That is a recipe for getting your server turned into a botnet node.

The modern approach to leveling up your HomeLab involves avoiding traditional VPN servers with broad network access. Instead, use a zero-trust overlay network like Tailscale or a specialized reverse proxy with authentication. This ensures that only you can access your Open WebUI instance, keeping your local LLM private and secure.
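As a sketch of the Tailscale route, using their official install script (run inside the LXC or on the host; note that an unprivileged LXC may also need /dev/net/tun passed through):

```shell
# Tailscale's official one-line installer — see tailscale.com for details
curl -fsSL https://tailscale.com/install.sh | sh

# Authenticate and join your tailnet
tailscale up

# Then browse to Open WebUI over the tailnet, e.g.:
#   http://<tailscale-ip-of-the-lxc>:3000
```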

Wong Edan’s Verdict: Is It Worth It?

So, should you spend weeks of your life and thousands of your currency units setting up a local AI server? ABSOLUTELY.

Building an AI Server Homelab with Ollama, Proxmox 9, and Open WebUI isn’t just about having a chatbot. It’s about learning the stack that will define the next decade of computing. From handling NVIDIA CUDA issues to mastering LXC GPU passthrough, the skills you gain here are invaluable. Plus, there is no feeling quite like watching a 70B parameter model generate text at 50 tokens per second on a machine sitting right next to your feet. It’s powerful, it’s private, and yes, it’s a little bit Edan. Now go forth, update your initramfs, and let your local AI live!