Deployment Pains

This week I found myself late into the evening wrestling with a Kubernetes ingress controller to get multiple TLS endpoints working in a cluster and realizing that I had spent all my free time dealing with something that ought to be a solved problem by now.

In fact, the last time I decided to solve that for my particular circumstances I ended up creating piku, and the developer in me is more than slightly annoyed at the number of hoops I have to jump through on Kubernetes: writing YAML to define storage, volume claims, ingress controllers, SSL certificates, etc.

And yes, I could use PaaS services like Azure Functions or Zappa, but most of the stuff I do runs on fairly exotic language runtimes and is just plain weird when compared to your typical enterprise app. So I need a bit more control.

Too Much SRE Work Makes Rui a Dull Boy

Of course, the sysadmin/SRE in me is fascinated with the intricacy of the ritual invocations required to get things to even partially work, but more than a little weary of the amount of YAML required to do something as simple as deploying multiple services with Let’s Encrypt. So today I set up a Rancher server, “imported” my existing AKS cluster, set up most of what I needed with a few clicks and exported the YAML back for safekeeping.

But that got me thinking: my work forces me to do an insane amount of context switching as I move from customer to customer, and I have ended up doing more meta-coding around infra and data flows than actual application code over the past couple of months.

And the overhead involved in understanding the infrastructure just keeps piling up, let alone being able to evolve dozens of different architectures over time [1].

Less Pain, More Fun

But let’s get back to lessening the pain of developers through automated deployments, and take my own example as a (somewhat biased, but likely viable) yardstick. Right now, most of my personal stuff falls into three categories:

  • “push and forget” websites and simple workers that I deploy in a Heroku-like fashion through piku, which provides instant gratification by letting me deploy nearly anything via git push, and which sets up TLS automatically for me.
  • Containerized applications (either pre-packaged stuff like Node-RED or n8n that I use as accelerators or complex API endpoints) that I deploy using Traefik on a couple of VMs by editing a single docker-compose file, and which also sets up TLS automatically for me (theoretically [2]).
  • Containerized applications I struggle to deploy on Kubernetes by editing a maze-like structure of little YAML documents, in a perverse reversion of origami crafted from ransom letters.

Rancher positions itself as the “easy button” for the latter (and it demonstrably helped me this time around), but I can’t help but think it’s just papering over the cracks of an overly complex solution. Of course Kubernetes does solve a lot of issues related to scalability and reliability, but many projects simply don’t need it.

And, crucially, I have the most fun when I deploy on the first two environments, because there is much less cognitive overhead and I can focus on the code I’m writing rather than how to get it to run in a sane, well-configured environment in less than 2 minutes.

This is not a new argument (and I am not giving up on Kubernetes here), but my current imbalance between “SRE time” and “dev time” is becoming a strong incentive to see if I can easily tweak piku to deploy simple container services against a “raw” Traefik setup or k3s.

Going Dark

In other news (and flipping over to the dev side of things), I finally found the time to add prefers-color-scheme: dark (i.e., “dark mode”) support to the site, and since that involved a few color scheme tweaks I also made syntax highlighting WCAG AA-compliant. The rest of the color scheme likely isn’t, but there were a lot of long-overdue tweaks to be done to the CSS and I ended up doing around half of them in a single pass.

The compelling event was getting tired of “white flashes” when looking up stuff in the evenings from my iOS 13 devices (all of which switch to dark mode automatically now). I’m reasonably happy with the results so far, but I’m letting it sink in and adding prospective tweaks to my to-do list.
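For reference, the WCAG AA check mentioned above boils down to a contrast ratio between foreground and background colors, with a 4.5:1 threshold for normal text and 3:1 for large text. The snippet below is a minimal sketch of that calculation using the WCAG 2.x relative luminance formula. The function names and the sample palette are hypothetical, made up purely for illustration, and are not the colors or tooling this site uses.

```python
def _linear(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) upwards."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True if the pair meets WCAG AA (4.5:1, or 3:1 for large text)."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    # Hypothetical dark-mode syntax palette, not the site's real colors.
    background = "#1e1e1e"
    palette = {"keyword": "#ff79c6", "string": "#f1fa8c", "comment": "#6272a4"}
    for token, color in palette.items():
        ratio = contrast_ratio(color, background)
        verdict = "ok" if passes_aa(color, background) else "too low"
        print(f"{token:8} {color} {ratio:5.2f}:1 {verdict}")
```

Running something like this over every token color in both the light and dark schemes is a cheap way to catch the one comment color that quietly falls below the threshold.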

uWSGI Weirdness

As to the back-end, I’m still happy with the engine, especially now that I removed Dropbox sync and post via git, which is hopelessly geeky but saves me a lot of headaches regarding syncing and remote editing.

The current codebase is now around three years old and pretty much battle-tested (even if it has a few legacy warts to tolerate several generations of post formats), but I’ve been trying to track down a niggling cache corruption issue that occasionally messes up the home page (and apparently nothing else).

I suspected Cloudflare at first, given that I traditionally play merry havoc with all sorts of HTTP headers to minimize server-side processing. I currently suspect, however, that it is due to my running the web workers inside uWSGI as a single process with four threads and using functools.lru_cache, which seems not to be thread-safe. So after three weeks of live testing, I just merged the new caching policy to master and expect things to remain stable for a year or two.

Yeah, right…
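As an aside on the threading suspicion above: Python’s documentation describes functools.lru_cache as keeping its internal structure coherent across threads, but notes that the wrapped function can still be called more than once for the same arguments when calls overlap. One way to rule that out in a single-process, multi-threaded worker is to serialize access with a lock. The snippet below is a minimal sketch of that idea and not the caching policy that was actually merged; locked_lru_cache and render_page are hypothetical names used only for illustration.

```python
import threading
from functools import lru_cache, wraps


def locked_lru_cache(maxsize=128):
    """Hypothetical helper: an lru_cache whose calls are serialized by a lock,
    so overlapping threads never compute the same entry twice."""
    def decorator(func):
        cached = lru_cache(maxsize=maxsize)(func)
        lock = threading.RLock()  # re-entrant, so cached functions may recurse

        @wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return cached(*args, **kwargs)

        # Expose the usual lru_cache management helpers.
        wrapper.cache_clear = cached.cache_clear
        wrapper.cache_info = cached.cache_info
        return wrapper
    return decorator


calls = []  # records every real (uncached) render, for demonstration


@locked_lru_cache(maxsize=32)
def render_page(name):
    """Hypothetical stand-in for an expensive page render."""
    calls.append(name)
    return "<html>" + name + "</html>"  # immutable result, safe to share


if __name__ == "__main__":
    # Four threads, mirroring the single-process, four-thread worker setup.
    threads = [threading.Thread(target=render_page, args=("home",)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(calls), render_page.cache_info())
```

The trade-off is that every lookup, including cache hits, now takes the lock, which is usually fine for a handful of threads. Returning immutable values (strings or tuples rather than lists or dicts) also matters, since any cache hands the same object to every caller.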

  1. And I do mean architecture, not infrastructure or more natural ways to express it. Stuff like Pulumi (despite being cute, flexible and quite readable) is just another meta layer (and I already use Terraform), so it’s not the answer here. ↩︎

  2. Traefik 2.0 seems to be a bit brittle, though–a friend has been trying to do something as simple as password-protect its dashboard for a couple of days now to no avail, and I’ve had Let’s Encrypt fail in mysterious ways a few times. ↩︎