It’s been a few months since I moved this site to Azure static storage, and I think a few notes are in order.
The site is still powered by my Wiki engine and I still keep it updated by tossing Markdown files into a git repository, but it now has even more unique twists.
TL;DR: Instead of rebuilding the entire site whenever a single page changes (or even syncing generated content), my current engine does delta updates directly onto an Azure storage container upon
Or, to put it another way, it:
- Takes the deltas off that
- Matches them against what is currently published in
- Figures out what (if any) images, related pages and back-links need updating via a small SQLite database that holds all those references.
- POSTs the generated HTML for any updated pages directly to a $web public container (which it does wickedly fast thanks to my own asyncio library for talking to Azure Storage).
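As a rough illustration of the back-link step (a minimal sketch, not the actual engine code), the lookup can be a single query against a table of page references. The `links` table, its `source`/`target` columns, the page names and the `affected_pages` helper are all assumptions made up for this example:

```python
import sqlite3


def affected_pages(db, changed):
    """Return the changed pages plus every page that links to one of them."""
    todo = set(changed)
    for page in changed:
        # Hypothetical schema: one row per (linking page, linked page) pair.
        rows = db.execute("SELECT source FROM links WHERE target = ?", (page,))
        todo.update(source for (source,) in rows)
    return todo


# Tiny in-memory stand-in for the references database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE links (source TEXT, target TEXT)")
db.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [
        ("blog/2020/notes", "apps/piku"),
        ("links/python", "apps/piku"),
        ("blog/2020/notes", "dev/azure"),
    ],
)

# Editing one page also flags the two pages that link to it.
print(sorted(affected_pages(db, ["apps/piku"])))
```

Only the pages in that set would need to be re-rendered and uploaded, which is what keeps a single-page edit from turning into a full rebuild.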
This contrasts with the approach of many static site generators of rebuilding the entire site, saving the HTML to a folder and then syncing all of that to a public bucket/container, which can be pretty I/O intensive.
Or (like many people do these days with GitHub Actions et al) spawning and provisioning an entire VM to do so.
I don’t like either, and find the “pseudo-CI” approach a tad lazy (and wasteful).
Bang For The Buck
While it’s true that the site generator is on an “always on” (and admittedly pokey) Azure B1ls instance (1 vCPU, 512MB RAM), that machine is shared with a bunch of other small services, and the builder service is actually removed from RAM when idle (thanks to Piku/uwsgi magic), so overall costs are negligible.
The process has minimal storage and I/O impact (it only needs to work with a copy of the raw content and a SQLite database with page relationships), and takes less than 10 seconds to generate and publish a new page (or even a few hundred of them) from the moment I type git push, which compares well with my age-old, Dropbox-driven “live” syncing approach.
I could run the whole thing on a Pi Zero (and yes, I’ve tested it on a 3A+, just because the Zero is quite slow for full rebuilds), so that’s fine.
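Publishing a few hundred pages in seconds mostly comes down to overlapping the uploads instead of sending them one by one. The sketch below is not the actual library code, just one plausible shape for the idea: `publish_all`, the `limit` of 16 and `fake_upload` (a stand-in for a real request to the storage container) are all assumptions for illustration.

```python
import asyncio


async def publish_all(pages, upload, limit=16):
    """Upload every (path, html) pair concurrently, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def one(path, html):
        # The semaphore caps how many uploads are in flight at once.
        async with gate:
            await upload(path, html)
            return path

    # gather() returns results in the same order the tasks were passed in.
    return await asyncio.gather(*(one(p, h) for p, h in pages.items()))


async def fake_upload(path, html):
    # Hypothetical uploader: simulates network latency instead of doing HTTP.
    await asyncio.sleep(0.01)


pages = {f"page{i}/index.html": "<html></html>" for i in range(200)}
done = asyncio.run(publish_all(pages, fake_upload))
print(len(done))
```

Bounding the concurrency matters on a small machine: it keeps memory and open connections in check while still hiding most of the per-request latency.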
There are a few distinct advantages with this setup:
- No more worrying about HTTP handling and the constant flurry of botnets trying, in a rather heavy-handed way, to hack their way into something that was never a WordPress site.
- Much better availability, since there would occasionally be some OS weirdness, VM update or configuration tweak that would take the old site offline until I noticed.
- Maybe a smidgeon more performance (even considering I had optimized HTTP request handling and have been using Cloudflare for years, the site does feel a tad snappier).
There are a few things I’m not happy about, though:
- I’m not particularly impressed with DuckDuckGo site search. Even after fiddling with various things like Bing Webmaster tools (since DuckDuckGo gets part of its catalog from it), search is just not good enough. I publish a complete sitemap and have OpenGraph everywhere, but their crawler simply doesn’t do as good a job as my old SQLite-powered full text indexer, and it’s getting very annoying.
- The engine is a bit slow when rendering the full working set of 8000+ pages (when I do layout changes, for instance). Python can only do so much, and even though I pre-bake a lot of stuff, I’d like it to be a bit faster when parsing and rendering the actual pages.
I’ve been putting off using Azure Search because I don’t want to have to maintain a search endpoint, but I might end up doing it regardless.
Nice To Haves
There are a few minor things I’d like to have done, though:
- I still haven’t pushed out the code to render archives, which has an annoying bug I haven’t had the time (or willpower) to fix because it will eventually force me to break a couple of longstanding abstractions. Update: I decided to not overthink it and just use my old Wiki code to generate archive pages.
- OpenGraph can be improved a little further, especially where images are concerned. Adding better image descriptions and the like feels like something I might have some fun doing, so I will probably be adding Azure Cognitive Services to my rendering pipeline.
- I would love to have the builder run as an Azure Function, but there is simply no way I can get git to run inside one, and Piku has rendered this kind of service so simple and easy to maintain that I don’t feel the urge to tackle the amount of extra complexity required to work around that.
Finally, the current codebase could do with a little cleaning up, since it has suffered a bit of feature creep over the years.
It’s relatively small at ~4000 LOC (discounting templating and HTML), but the actual static generator is a ~1000 LOC bolt-on that could be streamlined and blended in a little better, so I’d like to trim the whole thing back into the ~3000 LOC range.
And, in general, I just wish the entire codebase was simpler, for Python is a wonderful language but still lacks the conciseness of LISP–and I would very much like to go back to something like the Hy implementation I had a few years back.
And yet, replacing Python would be a tall order indeed. There are a few interesting candidates (Janet, Fennel, Clojure, Kotlin and even F#), but none of them have the full range of wonderful libraries I currently rely on.
I do, however, hear the siren call of building something leaner and meaner, so I might do some experiments over summer break.
Not Going To Happen
There are a few things I most definitely won’t be doing, though:
- Building a Docker container (again, Piku has made things so easy to deploy and update that setting up a private registry, a full-blown CI/CD pipeline and whatever else would feel like a chore).
- Getting this to run on Kubernetes (ha!).
- Porting the engine to Rust (I gave it a go and realized that implementing the XML transformation pipelines I have would be a bit of a chore).
So, in essence, things are stable and reliable, with enough headroom and small things to tweak to stay interesting, but definitely low impact.
I might do a redesign, though. It’s about time to see if there is something else I can do layout-wise (although I do like Georgia’s readability and universal reach, and the lack of any unnecessary frills).