Site Engine Update

It’s been a while since I moved this site to Azure static storage, and I think a few notes are in order.

The site is still powered by my Wiki engine, and I still keep it updated by tossing files into a git repository, but it now has a few more unique twists.

Current Setup

TL;DR: Instead of rebuilding the entire site whenever a single page changes (or even syncing generated content), my current engine does delta updates directly onto an Azure storage container upon git push.

Or, to put it another way, it:

  • Takes the deltas off that push.
  • Matches them against what is currently published in $web.
  • Figures out what (if any) images, related pages and back-links need updating via a small SQLite database that holds all those references.
  • POSTs the generated HTML for any updated pages directly to a $web public container (which it does wickedly fast thanks to my own asyncio library for talking to Azure Storage).
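In rough strokes, that flow looks something like the minimal sketch below (illustrative schema and function names, with the stock azure-storage-blob async client standing in for my own library):

    import asyncio
    import sqlite3
    import subprocess

    from azure.storage.blob import ContentSettings
    from azure.storage.blob.aio import ContainerClient

    def changed_sources(old_rev: str, new_rev: str) -> list[str]:
        # Ask git which files the push actually touched
        # (a post-receive hook hands you both revisions).
        out = subprocess.check_output(
            ["git", "diff", "--name-only", old_rev, new_rev], text=True)
        return [p for p in out.splitlines() if p.endswith(".md")]

    def affected_pages(db: sqlite3.Connection, changed: list[str]) -> set[str]:
        # Expand the delta with back-links and related pages
        # (illustrative schema: links(source, target)).
        pages = set(changed)
        for path in changed:
            pages.update(row[0] for row in db.execute(
                "SELECT source FROM links WHERE target = ?", (path,)))
        return pages

    async def publish(rendered: dict[str, bytes], container_url: str) -> None:
        # Upload every updated page straight into the $web container,
        # with all the uploads in flight concurrently.
        async with ContainerClient.from_container_url(container_url) as web:
            await asyncio.gather(*(
                web.upload_blob(name, html, overwrite=True,
                                content_settings=ContentSettings(
                                    content_type="text/html"))
                for name, html in rendered.items()))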

This contrasts with the way most static site generators work: rebuilding the entire site, saving the HTML to a folder and then syncing all of it to a public bucket/container, which can be pretty I/O intensive.

Or (like many people do these days with GitHub Actions et al) spawning and provisioning an entire VM to do so.

I don’t like either, and find the “pseudo-CI” approach a tad lazy (and wasteful).

Bang For The Buck

While it’s true that the site generator is on an “always on” (and admittedly pokey) Azure B1ls instance (1 VCPU, 512MB RAM), that machine is shared with a bunch of other small services, and the builder service is actually removed from RAM when idle (thanks to Piku/uwsgi magic), so overall costs are negligible.
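For the curious, the RAM reclaiming boils down to uWSGI’s cheap/idle modes; these are illustrative settings (Piku generates its own configuration):

    [uwsgi]
    ; don't spawn workers until the first request arrives
    cheap = true
    ; tear workers down again after ten minutes of inactivity
    idle = 600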

The process has minimal storage and I/O impact (it only needs to work with a copy of the raw content and a SQLite database with the page relationships), and takes less than 10 seconds to generate and publish a new page (or even a few hundred of them) from the moment I type git push, which compares rather well with my age-old “live” syncing approach.

I could run the whole thing on a Pi Zero (well, I’ve actually tested it on a 3A+, since the Zero is quite slow for full rebuilds), so that’s fine.

Pluses

There are a few distinct advantages with this setup:

  • No more worrying about HTTP handling, or about the constant flurry of botnets rather heavy-handedly trying to hack their way into something that was never a WordPress site to begin with.
  • Much better availability, since previously there would occasionally be some OS weirdness, VM update or configuration tweak that took the site offline until I noticed.
  • Maybe a smidgeon more performance (even considering I had optimized HTTP request handling and have been using Cloudflare for years, the site does feel a tad snappier).

Minuses

There are a few things I’m not happy about, though:

  • My “magical” Wiki URL redirection feature was replaced by a gnarly workaround that I’m not exactly enamoured with. I might push that back to a Cloudflare Worker, but I am wary of relying too much on any single service.
  • I’m not particularly impressed with DuckDuckGo site search. Even after fiddling with various things like Bing Webmaster Tools (since DuckDuckGo gets part of its catalog from Bing), search is just not good enough. I publish a complete sitemap and have OpenGraph metadata everywhere, but their crawler simply doesn’t do as good a job as my old SQLite-powered full text indexer (see the sketch below), and it’s getting very annoying.
  • The engine is a bit slow when rendering the full working set of 8000+ pages (when I do layout changes, for instance). Parallelism can only do so much, and even though I pre-bake a lot of stuff, I’d like it to be a bit faster when parsing and rendering the actual pages.

I’ve been putting off using Azure Search because I don’t want to have to maintain a search endpoint, but I might end up doing it regardless.
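For reference, what the old indexer did is roughly what SQLite’s FTS5 extension gives you out of the box; here is a minimal sketch, with an illustrative schema:

    import sqlite3

    db = sqlite3.connect("index.db")
    # FTS5 virtual table: title and body are indexed, path is stored as-is.
    db.execute("""CREATE VIRTUAL TABLE IF NOT EXISTS pages
                  USING fts5(path UNINDEXED, title, body)""")
    db.execute("INSERT INTO pages VALUES (?, ?, ?)",
               ("space/hy", "Hy", "A LISP dialect embedded in Python."))
    db.commit()
    # Ranked queries with highlighted snippets, no external service needed.
    for path, snip in db.execute(
            """SELECT path, snippet(pages, 2, '<b>', '</b>', '…', 8)
               FROM pages WHERE pages MATCH ? ORDER BY rank""", ("lisp",)):
        print(path, snip)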

Nice To Haves

There are a few minor things I’d still like to get done, though:

  • I still haven’t pushed out the code to render archives, which has an annoying bug I haven’t had the time (or willpower) to fix because it will eventually force me to break a couple of longstanding abstractions. Update: I decided to not overthink it and just use my old Wiki code to generate archive pages.
  • OpenGraph metadata can be improved a little further, especially where images are concerned. Adding better image descriptions and the like feels like something I might have some fun doing, so I will probably be adding Azure Cognitive Services to my rendering pipeline (sketched after this list).
  • I would love to have the builder run as an Azure Function, but there is simply no way I can get git to run inside one, and Piku has rendered this kind of service so simple and easy to maintain that I don’t feel the urge to tackle the amount of extra complexity required to work around that.
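That bit would likely boil down to something like this (a sketch using the stock Computer Vision SDK; the endpoint, key and candidate count are placeholders):

    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials

    # Placeholder endpoint and key -- these would come from my environment.
    client = ComputerVisionClient(
        "https://example.cognitiveservices.azure.com/",
        CognitiveServicesCredentials("YOUR_KEY"))

    def describe(image_url: str) -> str:
        # Ask the service for caption candidates and keep the likeliest one.
        analysis = client.describe_image(image_url, max_candidates=3)
        captions = sorted(analysis.captions,
                          key=lambda c: c.confidence, reverse=True)
        return captions[0].text if captions else ""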

Engine Maintenance

Finally, the current codebase could do with a little cleaning up, since it has suffered a bit of feature creep over the years.

It’s relatively small at ~4000 LOC (discounting templating and HTML), but the actual static generator is a ~1000 LOC bolt-on that could be streamlined and blended in a little better, so I’d like to trim the whole thing back into the ~3000 LOC range.

And, in general, I just wish the entire codebase was simpler, for Python is a wonderful language but still lacks the conciseness of LISP, and I would very much like to go back to something like the Hy implementation I had a few years back [1].

And yet, replacing Python would be a tall order indeed. There are a few interesting candidates (Janet and Fennel among them), but none of them has the full range of wonderful libraries I currently rely on.

I do, however, hear the siren call of building something leaner and meaner, so I might do some experiments over summer break.

Not Going To Happen

There are a few things I most definitely won’t be doing, though:

  • Building a Docker container (again, Piku has made things so easy to deploy and update that setting up a private registry, a full-blown CI/CD pipeline and whatever else would feel like a chore).
  • Getting this to run on Kubernetes (ha!).
  • Porting the engine to another language (I gave it a go and realized that implementing the XML transformation pipelines I have would be a bit of a chore).

So, in essence, things are stable and reliable, with enough headroom and enough small things to tweak to keep it all interesting, but definitely low impact.

I might do a redesign, though. It’s about time to see if there is something else I can do layout-wise (although I do like Georgia’s readability and universal reach, and the lack of any unnecessary frills).


  1. Besides wanting to move to Python 3, another reason I held off was that Hy kept repeatedly breaking over time, and I’d need it to settle into some sort of long-term stable form first. They’re about to go 1.0 (which took its time), but I am definitely going to wait and see in that regard.
