Notes for June 17-23

This week I ended up spending a surprising amount of time out and about, and have been patiently cleaning my office thoroughly (there are a couple of construction sites nearby and I like to have the windows open, so quite a bit of dust had accumulated), so personal productivity is still sub-optimal.

Sandboxing

Since I wanted to have a Jupyter notebook environment with direct GPU access, I decided to clean up my LLM/ML sandbox setup and rebuild my GPU VM atop a fresh Fedora 40 install.

As much as I have been enjoying running an immutable, rpm-ostree-based system, waiting two minutes for rpm-ostree every time I wanted to add a new package was getting really old (and development containers felt like too much overhead for iterating quickly on what is effectively a single project), so I just re-imaged the GPU VM (preserving the data volume with all the models and data) and set up Portainer on it1.

AI Stack Dump

Here’s the current stack, slightly redacted, but with the required GPU settings and a few creature comforts:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui
    container_name: open-webui
    volumes:
      - /mnt/data/open-webui:/app/backend/data
    ports:
      - 3010:8080
    links:
      - litellm

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    volumes:
      - /mnt/data/open-webui/litellm/config.yaml:/app/config.yaml
    command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "2" ]

  jupyter:
    container_name: jupyter
    image: quay.io/jupyter/pytorch-notebook:cuda12-python-3.11
    volumes:
      - /mnt/data/jupyter:/home/jovyan:rw
    ports:
      - 8888:8888
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    volumes:
      - /mnt/data/searxng:/etc/searxng:rw
    ports:
      - 3080:8080

  node-red:
    container_name: node-red
    image: nodered/node-red
    restart: always
    environment:
      - TZ=Europe/Lisbon
    volumes:
      - /mnt/data/node-red:/data
    links:
      - litellm
    ports:
      - 1880:1880

  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    network_mode: host
    environment:
      - TZ=Europe/Lisbon
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_SCHEDULE=0 30 3 * * * # 6-field cron (with seconds): daily at 03:30
      - WATCHTOWER_ROLLING_RESTART=false
      - WATCHTOWER_TIMEOUT=30s
      - WATCHTOWER_INCLUDE_STOPPED=true
      - WATCHTOWER_REVIVE_STOPPED=false
      - WATCHTOWER_NOTIFICATIONS=shoutrrr
      - WATCHTOWER_NOTIFICATION_URL=pushover://shoutrrr:${PUSHOVER_API_KEY}@${PUSHOVER_USER_KEY}/

This setup provides most of what I need:

  • Open Web UI as a general-purpose LLM interface without any frills (I use it mostly to try out prompt styles across different LLMs and rely on its chat history for keeping tabs on experiments)
  • LiteLLM as a general-purpose proxy for most LLMs except ollama (I have all my Azure OpenAI endpoints configured on it, so I don’t need to mess about with access keys and other annoyances)
  • SearXNG as a meta-search engine that can take API calls, perform RAG across multiple search engines and give me tidy results (it saves me the trouble of using Tor to talk to DuckDuckGo and finagling the parsing).
  • Node-RED as a general-purpose RAD environment where it’s trivial to do API calls, sort out JSON payloads, and generally try out RAG and function calling before taking what I’ve learned and turning it into Python services (which I prototype in Jupyter and then add to a proper program).
  • ollama runs outside the stack, since I’ve yet to find it useful to run it under Docker and installing the NVIDIA Container Toolkit for Jupyter was enough of a hassle.

I’ve been running most of these for around six months, and even though the stack breaks at least once every couple of weeks2, it’s been a pretty productive setup.
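Since nearly everything above talks to models through LiteLLM, it helps that it exposes an OpenAI-compatible API; here’s a sketch of what a chat request from another container in the stack might look like (the URL, model name and helper are assumptions for illustration, not my actual setup):

```python
import json
from urllib.request import Request

# From inside the compose network the proxy is reachable as http://litellm:4000;
# the model name below is a placeholder for whatever config.yaml exposes
LITELLM_URL = "http://litellm:4000/v1/chat/completions"


def chat_request(model: str, prompt: str) -> Request:
    """Build (but don't send) an OpenAI-style chat completion request
    aimed at the LiteLLM proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return Request(
        LITELLM_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request("gpt-4o", "Summarize this week's notes")
```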

Joining the Jetsons

Thanks to an acquaintance (you know who you are–I appreciated this a lot!), I got my hands on a Jetson Nano development kit, which I already have running using the default (if fairly outdated) JetPack image–and I got far enough to realize that powering it from micro-USB won’t fly if I need to use all of its compute power, so today I’m rifling through my drawers to find a jumper and a suitable barrel jack power supply.

Then I can go back and work my way through CUDA versions until everything I need is working–but jetson-containers seems to have me covered.

It’s going to be interesting to line it up against the other single-board computers I have been testing, and even more so if I can actually get directly comparable support for the NPUs in them.


  1. I also set up xrdp (which I keep forgetting to remove Xvnc from) and its pipewire module, so I have a really great remote desktop experience. ↩︎

  2. watchtower tries to keep the containers up to date, but both LiteLLM and Open Web UI are really bad at handling upgrades and occasionally lose their settings–that’s why I run them without a restart policy, since I’d rather check the logs after a single failure than sit through hours of retries… ↩︎
