This week I ended up spending a surprising amount of time out and about, and have been patiently cleaning my office thoroughly (there are a couple of construction sites nearby and I like to have the windows open, so dust had accumulated), so personal productivity is still sub-optimal.
Sandboxing
Since I wanted to have a Jupyter notebook environment with direct GPU access, I decided to clean up my LLM/ML sandbox setup and rebuild my GPU VM atop a fresh Fedora 40 install.
As much as I have been enjoying Silverblue and Bluefin, waiting two minutes for `rpm-ostree` every time I wanted to add a new package was getting really old (and development containers felt like too much overhead for iterating quickly on what is effectively a single project), so I just re-imaged the GPU VM (preserving the data volume with all the models and data) and set up Portainer on it[^1].
AI Stack Dump
Here’s the current stack, slightly redacted, but with the required GPU settings and a few creature comforts:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui
    container_name: open-webui
    volumes:
      - /mnt/data/open-webui:/app/backend/data
    ports:
      - 3010:8080
    links:
      - litellm
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    volumes:
      - /mnt/data/open-webui/litellm/config.yaml:/app/config.yaml
    command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "2" ]
  jupyter:
    container_name: jupyter
    image: quay.io/jupyter/pytorch-notebook:cuda12-python-3.11
    volumes:
      - /mnt/data/jupyter:/home/jovyan:rw
    ports:
      - 8888:8888
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    volumes:
      - /mnt/data/searxng:/etc/searxng:rw
    ports:
      - 3080:8080
  node-red:
    container_name: node-red
    image: nodered/node-red
    restart: always
    environment:
      - TZ=Europe/Lisbon
    volumes:
      - /mnt/data/node-red:/data
    links:
      - litellm
    ports:
      - 1880:1880
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    network_mode: host
    environment:
      - TZ=Europe/Lisbon
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_SCHEDULE=0 30 3 * * *
      - WATCHTOWER_ROLLING_RESTART=false
      - WATCHTOWER_TIMEOUT=30s
      - WATCHTOWER_INCLUDE_STOPPED=true
      - WATCHTOWER_REVIVE_STOPPED=false
      - WATCHTOWER_NOTIFICATIONS=shoutrrr
      - WATCHTOWER_NOTIFICATION_URL=pushover://shoutrrr:${PUSHOVER_API_KEY}@${PUSHOVER_USER_KEY}/
```
This setup provides most of what I need:
- Open Web UI as a general-purpose LLM interface without any frills (I use it mostly to try out prompt styles across different LLMs and rely on its chat history to keep tabs on experiments)
- LiteLLM as a general-purpose proxy for most LLMs except `ollama` (I have all my Azure OpenAI endpoints configured on it, so I don't need to mess about with access keys and other annoyances)
- SearXNG as a meta-search engine that can take API calls, perform RAG across multiple search engines and give me tidy results (it saves me the trouble of using Tor to talk to DuckDuckGo and finagling the parsing)
- Node-RED as a general-purpose RAD environment where it's trivial to do API calls, sort out JSON payloads, and generally try out RAG and function calling before taking what I've learned and turning it into Python services (which I prototype in Jupyter and then add to a proper program)
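For reference, the `config.yaml` mounted into the `litellm` container follows LiteLLM's standard `model_list` format; here is a minimal sketch (the deployment name, endpoint URL, and environment variable names are placeholders, not my actual settings):

```yaml
model_list:
  - model_name: gpt-4o                    # the name clients ask for
    litellm_params:
      model: azure/my-gpt4o-deployment    # placeholder Azure deployment name
      api_base: https://example.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY   # read from the environment at startup
```

With something like this in place, Open Web UI only needs the proxy's OpenAI-compatible base URL (`http://litellm:4000/v1` inside the Compose network) and every configured model shows up under its `model_name`.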
`ollama` runs outside the stack, since I've yet to find it useful to run it under Docker, and installing the NVIDIA Container Toolkit for Jupyter was enough of a hassle.
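One gotcha worth noting if you want SearXNG to answer API calls: JSON output is disabled by default, so the `settings.yml` in the mounted volume needs it enabled (a fragment, not my full config):

```yaml
search:
  formats:
    - html
    - json   # required for ?format=json API queries
```

After that, queries like `http://searxng:8080/search?q=test&format=json` return machine-readable results that are easy to feed into Node-RED flows.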
I’ve been running most of these for around six months, and even though the stack breaks at least every couple of weeks[^2], it’s been a pretty productive setup.
Joining the Jetsons
Thanks to an acquaintance (you know who you are–I appreciated this a lot!), I got my hands on a Jetson Nano development kit, which I already have running using the default (if fairly outdated) JetPack image–and got far enough to realize that powering it from micro-USB won’t fly if I need to use all its compute power, so today I’m rifling through my drawers for a jumper and a suitable barrel jack power supply.
Then I can go back and work my way through CUDA versions until everything I need is working–but `jetson-containers` seems to have me covered.
It’s going to be interesting to line it up against the other SBCs I have been testing, and even more so if I can actually get directly comparable support for the NPUs in them.
[^1]: I also set up my usual `xorgxrdp-glamor` config (which I keep forgetting to remove `Xvnc` from) and the `pipewire` module, so I have a really great remote desktop experience. ↩︎

[^2]: `watchtower` tries to keep the containers up to date, but both LiteLLM and Open Web UI are really bad at handling upgrades, occasionally losing their settings–that’s why I run them without a `restart` policy, since I’d rather go into the logs after a single failure than sit through hours of retries… ↩︎