Re-Linking

Over the past couple of years, I’ve been teeter-tottering between the need to clean up some things in this site, the notion of re-writing the engine completely (again, for the third time), or just going “static” and using something like Hugo or my little sample data pipeline to push the whole thing to a storage bucket of some kind.

That goes against the entire philosophy of the current engine, which is heavily optimized for on-the-fly rendering (a throwback to its ancient PhpWiki origins, the time I spent tuning everything for low-level HTTP responses, stacking Varnish on top, etc.).

These days I can just use CloudFlare for all of that HTTP optimization, so a lot of that code is just sitting there. But this was also designed to be an internal Wiki engine, which is why it still has its own internal indexing, automatic re-linking and search mechanisms, and those I don’t want to throw away just yet.

In fact, I use an older, internal instance of it at home for miscellaneous stuff (I’ve yet to find a simple private Wiki that I like).

But turning the current engine into yet another static site generator is something I’ve been loath to do. And that’s largely because it’s Python, which means it’s beautifully concise and trivially easy to hack, but slow to render all the 8000+ pages that I’ve published over the past 18 years.

On the other hand, none of the current crop of static site generators fit my needs. To point out just two examples (and yes, I’ve looked at many more): Hugo is extremely popular but would require me to edit every single file (which is just not going to happen), and Zola is by far the sanest thing I’ve found but would at least require massive batch renaming and some editing (less, but still a lot), largely because it decided to use TOML for front matter (which no Markdown editor currently handles properly).

And starting anew is not a trivial option, because I still use this site as a personal notepad for things of interest, as my pages on JavaScript, Python and Clojure readily attest. And that kind of content relies on a heavily customized bit of link table generation that I would always have to code in somehow.

But either way it goes (and I’m currently leaning toward rolling my own as a quick hack), I need to do a few things ahead of time.

To go static, there is a lot of old content to clean up (some of it is still Textile or even raw HTML), and (most notably) the URL convention I’ve been following throughout the years has a few quirks (like pages with a space in their names), so I’m going to tackle those first and do two things:

  • Normalize space handling and casing across every single page (and use the engine’s built-in auto-aliasing to generate 301 redirects so that at least search engines can figure things out, even if I’ll be breaking some direct links).
  • Clear out (or at least annotate) some junk that is no longer applicable to this decade (like pages about ancient apps that are long gone).
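The first item boils down to deriving a canonical slug for every page and keeping a map of old names for 301 aliases. A minimal sketch of that, in Python (the function and naming here are illustrative assumptions, not the engine’s actual code):

```python
import re

def normalize_slug(name: str) -> str:
    """Lower-case a page name and collapse spaces/underscores into dashes."""
    slug = name.strip().lower()
    slug = re.sub(r"[\s_]+", "-", slug)  # spaces and underscores become dashes
    slug = re.sub(r"-{2,}", "-", slug)   # collapse runs of dashes
    return slug

def build_redirects(old_names: list[str]) -> dict[str, str]:
    """Map every old page name to its normalized slug (for 301 aliases),
    skipping names that are already conformant."""
    return {old: normalize_slug(old) for old in old_names
            if normalize_slug(old) != old}

# e.g. build_redirects(["Some Page", "other_page", "already-fine"])
# yields {'Some Page': 'some-page', 'other_page': 'other-page'}
```

The redirect map can then feed whatever auto-aliasing mechanism serves the 301s, so old URLs keep resolving after the rename.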

The first bit is starting now, since the sooner the better in terms of SEO: I want people to be able to at least search for things if they come across an external reference from years back, in case whatever I pick for static hosting can’t do redirects or something like that.

And since CloudFlare now lets me plug directly into the Internet Archive, I can check what is and isn’t being handled properly.

The second will take longer, and will certainly happen piecemeal over the next few months. I have a lot of outdated content, but some of it I’ll keep around for entertainment value.

So please expect some link breakage for a few days (perhaps weeks) as I change page names and do some surgical site-wide searching and replacing, based on a few patches I’ve been making to the engine itself to spit out lists of broken/non-conformant internal links.
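The link audit itself is simple to sketch. Assuming a wiki-style `[[Page Name]]` / `[[Page Name|label]]` link syntax (an assumption on my part, not necessarily what the engine uses), something like this flags both broken targets and names that don’t follow the new convention:

```python
import re

# Assumed internal link syntax: [[Page Name]] or [[Page Name|label]].
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

def audit_links(pages: dict[str, str]) -> list[tuple[str, str, str]]:
    """Given {page name: markdown source}, report internal links that are
    broken (target page doesn't exist) or non-conformant (spaces/uppercase)."""
    problems = []
    for name, text in pages.items():
        for target in WIKI_LINK.findall(text):
            if target not in pages:
                problems.append((name, target, "broken"))
            elif " " in target or target != target.lower():
                problems.append((name, target, "non-conformant"))
    return problems
```

Running something along these lines over the whole page store yields the lists that drive the search-and-replace passes.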

Most recent content shouldn’t be affected, but some older stuff may be hard to find the first time around.