Building The Anti-Wiki

Ah, yes, I really ought to do something about renewing the site, and show some progress on my NewWikiMigration, which people still ask about every now and then.

So here's a precis. I've been half-heartedly hacking away at Yaki for a long while now, usually in hour-long breaks every week or so (depending on how much real work I bring home, my hopelessly sparse free time and my motivation). It sort of works at this point, but not to the extent that I can just throw away PhpWiki.

Building a Better Content Grill

The three main goals of Yaki were:

  • To get rid of MySQL-based storage for my Wiki
  • To change from PhpWiki to something smaller and simpler
  • To be easily deployable anywhere

After looking around for a while I eventually hit upon Snakelets, which I've been using for months now for all sorts of things.

What drew me to Snakelets is that it works a lot like a Java container: most of the application state resides in memory, so you can do stuff like session variables, run-time content conversion and, of course, extensive caching control, all of which are great for interactive applications.

Considering that Ajax is prone to hammering your server with a gazillion spurious HTTP requests, being able to issue replies without assembling a database query or dashing through half the inodes on your hard disk (or both) deserves some serious consideration, and it was an aspect I was particularly attuned to (remember, this site ran off a 733MHz machine for years).
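To make the point about in-memory replies concrete, here is a minimal sketch in plain Python (not Snakelets' own API) of a page cache that keeps rendered HTML in RAM and answers conditional requests with a 304. The `PageCache` class, its `render` callback and the SHA-1-based ETag are hypothetical choices for illustration; a real If-None-Match header may also carry a list of tags, which this sketch ignores.

```python
import hashlib


class PageCache:
    """Hypothetical in-memory cache of rendered pages with ETag support."""

    def __init__(self, render):
        self.render = render  # callable: page name -> HTML string
        self.pages = {}       # page name -> (etag, body bytes)

    def invalidate(self, name):
        """Drop a page so the next request re-renders it."""
        self.pages.pop(name, None)

    def get(self, name, if_none_match=None):
        """Return (status, headers, body) for a request for `name`."""
        if name not in self.pages:
            body = self.render(name).encode("utf-8")
            etag = '"%s"' % hashlib.sha1(body).hexdigest()
            self.pages[name] = (etag, body)
        etag, body = self.pages[name]
        if if_none_match == etag:
            # The client already has this version: no body needed.
            return 304, {"ETag": etag}, b""
        headers = {"ETag": etag, "Content-Type": "text/html; charset=utf-8"}
        return 200, headers, body
```

A cache hit costs one dictionary lookup and no disk or database access; `invalidate()` would be called whenever a node is edited.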

Plus, you can run multiple applications in parallel (sharing user sessions if necessary), and it comes with a simple templating mechanism (which is also pre-processed and cached internally) and a web interface to manage and restart individual applications. So despite not being HTTP/1.1-compliant, Snakelets is pretty damn fast, and I hope it gets more attention from the Python community, which has so far mostly ignored it in favor of more traditional CGI-based approaches.

As far as I'm concerned, it's their loss. No matter how fast FastCGI is or how neatly designed Django happens to be, Python developers are almost completely blind to the application container metaphor, and I think they're still paying the price for that - just look at the number of times they've reinvented the wheel atop standalone WSGI-compliant "mini servers"... But I digress.

As Simple As Possible

In Yaki, the content store is a folder tree. Each Wiki node is represented by a folder containing a plain UTF-8 text file, with metadata in a pseudo-RFC 2822 format:

$ cat space/Sandbox/index.txt

From: Author
Subject: Sandbox
Tags: sandbox, @wiki, demo
Content-Type: text/html


I chose this approach because the file format is pretty damned eternal: there's no shortage of libraries to handle RFC 2822 formats, the headers let me store pretty much whatever kind of metadata I want, and I can store "attachments" to the Wiki nodes in the folder alongside index.txt and reference them (in HTML) with the cid: scheme, which should be familiar to anyone who ever had to encode HTML inside MIME. (I decided against a pure MIME, XHTML or XML format because any of them would be a right pain to edit directly.)
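Since the node format is close enough to RFC 2822, Python's standard `email` package can read it directly. A minimal sketch, assuming a node file shaped like the Sandbox example above (the body text here is made up) and a comma-separated `Tags:` header; `parse_node` is a hypothetical helper name:

```python
from email.parser import Parser

# Made-up node contents, shaped like space/Sandbox/index.txt.
SAMPLE = """From: Author
Subject: Sandbox
Tags: sandbox, @wiki, demo
Content-Type: text/html

<p>Welcome to the sandbox.</p>
"""


def parse_node(text):
    """Split a node into (headers, tags, body) using the RFC 2822 parser."""
    msg = Parser().parsestr(text)
    # Header names are case-insensitive, so normalize them for lookups.
    headers = {name.lower(): value for name, value in msg.items()}
    tags = [t.strip() for t in headers.get("tags", "").split(",") if t.strip()]
    return headers, tags, msg.get_payload()
```

Reading a real node would just mean passing it the contents of `space/Sandbox/index.txt` read as UTF-8; attachments referenced via `cid:` would then be looked up as files in the same folder.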

Wiki cross-linking is done via scheduled indexing: using a Snakelets plugin, all content is parsed and a link table is built up in RAM (and serialized to and from disk whenever the application is restarted). Content is pre-rendered, stuffed into a template (which has slots for both page content and navigation bars), served up and cached with my usual amount of HTTP header shenanigans. So far it's shaping up nicely, and I've created a small but useful set of generic Python classes to deal with it all.
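The link table itself can be as simple as two dictionaries: outgoing links per node and their inverse (backlinks), which is what a SeeAlso bar needs. This sketch uses the standard library's `html.parser` and `pickle` rather than the plugin described above, and it assumes, hypothetically, that any `href` without a colon or leading `#` is a local Wiki node, so `http:`, `cid:` and InterWiki links like `Wikipedia:...` are skipped.

```python
import pickle
from collections import defaultdict
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href targets that look like local Wiki node names."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: no colon and no leading '#' means a local node.
        if href and ":" not in href and not href.startswith("#"):
            self.targets.add(href)


def build_link_table(pages):
    """pages maps node name -> HTML; returns (outgoing, backlinks)."""
    outgoing, backlinks = {}, defaultdict(set)
    for name, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        outgoing[name] = parser.targets
        for target in parser.targets:
            backlinks[target].add(name)
    return outgoing, dict(backlinks)


def save_table(table, path):
    """Serialize the table to disk, e.g. before a restart."""
    with open(path, "wb") as f:
        pickle.dump(table, f)


def load_table(path):
    """Load a previously saved table."""
    with open(path, "rb") as f:
        return pickle.load(f)
```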

I haven't hit upon the right way to implement Wiki plugins or tie in versioning, but with plain text files I can use just about anything, so I'm not worried. I have mostly stopped caring about versioning, but I can plug in anything from RCS to Subversion if the need arises...

Yaki currently handles text/plain and text/vnd.textile besides plain HTML, and I have written a simple HTML-to-Textile converter that is good enough to enable round-trip editing via a web interface without mangling the contents too much.
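To give an idea of what an HTML-to-Textile converter involves, here is a deliberately small sketch built on `html.parser`. It is an illustration, not the converter Yaki uses: it only covers headings, paragraphs, `<strong>`/`<em>`, links and unordered lists, using the usual Textile forms (`h2.`, `*bold*`, `_italic_`, `"text":url`, `*` bullets), and any other markup is reduced to its text.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
INLINE = {"strong": "*", "b": "*", "em": "_", "i": "_"}


class HtmlToTextile(HTMLParser):
    """Tiny HTML-to-Textile converter covering a small subset of tags."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.hrefs = []       # pending link targets for open <a> tags
        self.list_depth = 0   # current <ul> nesting level

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.out.append("\n\n%s. " % tag)
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "ul":
            self.list_depth += 1
        elif tag == "li":
            self.out.append("\n" + "*" * self.list_depth + " ")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append('"')
        elif tag == "br":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")
        elif tag == "ul":
            self.list_depth = max(0, self.list_depth - 1)
            if self.list_depth == 0:
                self.out.append("\n\n")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.out.append('":' + self.hrefs.pop())

    def handle_data(self, data):
        # HTML collapses whitespace, so do the same here.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_textile(html):
    parser = HtmlToTextile()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    text = re.sub(r"[ \t]+\n", "\n", text)   # drop trailing spaces
    text = re.sub(r"\n[ \t]+", "\n", text)   # drop leading spaces
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line
    return text.strip()
```

Going the other way (Textile back to HTML) is what makes round-trip editing work, so a converter like this only needs to emit the subset the Textile renderer reads back faithfully.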

Of course, migration entails converting my old content. After a few attempts at doing this from the raw PhpWiki markup, it turned out to be significantly simpler to feed the PhpWiki-generated HTML through Beautiful Soup, strip out redundant attributes, and just store all my old content as HTML.

Using Beautiful Soup, it was a trivial matter to handle the InterWikiMap and have, say, links to Wikipedia stored like:

... <a href="Wikipedia:Two Cows">Economics</a> ...
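At render time, a link like that only needs its prefix looked up in a table. A minimal sketch, assuming a hypothetical `INTERWIKI_MAP` that maps each prefix to a base URL (which is the job the InterWikiMap does) and Wikipedia-style page names with spaces turned into underscores:

```python
from urllib.parse import quote

# Hypothetical prefix -> base URL table, in the spirit of the InterWikiMap.
INTERWIKI_MAP = {
    "Wikipedia": "https://en.wikipedia.org/wiki/",
}


def resolve_interwiki(href, interwiki_map=INTERWIKI_MAP):
    """Expand 'Prefix:Page Name' into a full URL; leave other hrefs alone."""
    prefix, sep, page = href.partition(":")
    if not sep or prefix not in interwiki_map:
        return href  # local node, or an ordinary URL such as http://...
    return interwiki_map[prefix] + quote(page.replace(" ", "_"))
```

Storing the short form and expanding it late means the map can change without touching any stored pages.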

There are a few niggling issues with bits of my old (pre-UTF-8) pages, but I hope to have those sorted out soon.

Raw Fish

Now, one of the things you quickly realize when you have all your content neatly organized on a filesystem is that very little work is required to batch-process it and generate HTML.

And let's face it, plain and simple HTML has a lot going for it. For starters, there's no dependency on a particular web server, which is worth considering when you start looking at lighttpd benchmarks, and something that has been on my mind as traffic to this site keeps increasing.

Sure, you lose interactivity, but most of my site is not that interactive. There is a lot of caching and image processing going on in some sections, but by and large it is pretty deterministic. The SeeAlso navigation bars are a good example of the things that are dynamically generated for each page, but even they are cached and only change when content is updated, which is a relatively rare event for something that runs around the clock.

So it turns out that pretty much everything on the site can be generated from a static filesystem tree and the right kind of templating and dependency tracking.
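The simplest form of that dependency tracking is the rule make (and rake's file tasks) use: rebuild a target when it is missing or older than its source. A minimal sketch of that rule in Python, assuming a hypothetical layout where `space/<Node>/index.txt` renders to `out/<Node>/index.html`, and leaving the `render` function (the templating) up to the caller:

```python
from pathlib import Path


def stale_nodes(space, out_dir):
    """Yield (node, source, target) where the HTML is missing or out of date."""
    space, out_dir = Path(space), Path(out_dir)
    for source in sorted(space.rglob("index.txt")):
        node = source.parent.relative_to(space)
        target = out_dir / node / "index.html"
        # Same rule as make: rebuild if the target is absent or older.
        if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
            yield node, source, target


def build(space, out_dir, render):
    """Render every stale node; render maps index.txt text -> HTML text."""
    built = []
    for node, source, target in stale_nodes(space, out_dir):
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(source.read_text(encoding="utf-8")), encoding="utf-8")
        built.append(node.as_posix())
    return built
```

Running `build` twice in a row does nothing the second time, which is exactly the property that makes batch generation cheap.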

But it has to be some pretty smart dependency tracking to, say, update all the SeeAlso references in Wiki articles that are linked from a new node, automatically update TOCs, render RSS and Atom feeds, etc., etc.
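The SeeAlso case needs one step beyond timestamps: when a node's outgoing links change, every page it gained or lost a link to now has a different set of backlinks. A sketch, assuming SeeAlso bars are built from backlinks and that the old and new link tables (page to set of linked pages) are at hand; `aggregates` stands in for hypothetical pages such as TOCs or feeds that depend on everything:

```python
def pages_to_rebuild(changed, old_links, new_links, aggregates=()):
    """Return the pages that need regenerating after `changed` were edited.

    old_links/new_links map page -> set of pages it links to, before and
    after the edit. Assumes SeeAlso bars list backlinks, so any page that
    gained or lost an incoming link must be re-rendered too.
    """
    rebuild = set(changed)
    for page in changed:
        before = old_links.get(page, set())
        after = new_links.get(page, set())
        rebuild |= before ^ after  # targets whose backlinks changed
    if changed:
        rebuild |= set(aggregates)  # e.g. TOCs and feeds depend on everything
    return rebuild
```

A brand-new node has no entry in `old_links`, so every page it links to gets rebuilt, which covers the "new node" case above.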

Enter Sashimi, a little toy project I kicked off today, which boils down to trying to use Python's distutils, Ruby's rake or a plain old Makefile to maintain a Wiki-like site.

So far I haven't decided on what to actually use, but with the availability of Rubyful Soup and my constant drive to learn new things, I think rake might be it.

Wrapping it in Rice and Seaweed

There are, of course, a lot of things that a static HTML site will never be able to do. I'm not giving up on Yaki by any means, but having your content in a filesystem tree lets you try out other, simpler approaches in parallel and compare them as you go, and you can do so anywhere (on any machine, at almost any time) without requiring a full-fledged web server or database server.

When I started hacking my first rakefile and looking at the raw HTML content against the fully-generated pages, it quickly dawned on me that creating the HTML for a good deal of the site navigation could well be handled on the client side - think TiddlyWiki, but with less pizzazz.

So yes, I could have a dumb (but fast) web server send off bits of Sashimi to a clever browser, which would then render (or, most likely, decorate) them using Ajax, building the navigation bars and suchlike from snippets of XML that hold the whole thing together. Those snippets would be much easier to generate, either dynamically or in batch, than fully-fledged HTML content.
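The XML snippets could be tiny. Here is a sketch that emits a SeeAlso fragment from a backlink table with `xml.etree.ElementTree`; the `<seealso>`/`<link>` element names are made up for illustration, and the browser side (Prototype or otherwise) is left out:

```python
import xml.etree.ElementTree as ET


def see_also_xml(page, backlinks):
    """Build a SeeAlso fragment listing the pages that link to `page`."""
    root = ET.Element("seealso", page=page)
    for name in sorted(backlinks.get(page, ())):
        link = ET.SubElement(root, "link", href=name)
        link.text = name
    return ET.tostring(root, encoding="unicode")
```

The same function works for a batch job that writes one `.xml` file per node or for a handler that builds the fragment on request.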

Replace Ajax and XML with rice and seaweed, and you have... Sushi.

Sushi is a bit too Web 2.0 for my own liking, but Prototype has been growing on me and I'm feeling a lot less predisposed against JavaScript than I used to. So doing my own Sushi might be a good way to both get over it and learn more about the gory details.

The Green Stuff

In tandem with all of this, I've recently had the occasion to stumble upon a bunch of Flash-related stuff that rekindled my interest in ActionScript, and I've tried to devote some of my (almost nonexistent) free time to playing around with it.

Given that Java has (demonstrably) lost a lot of mind-share (and browser pixels) to Flash, and that Flash is gaining more and more runtime features (like XML handling and decent JavaScript 'glue'), I've started thinking of it as a viable way to do more than photo albums.

Sure, it's a more or less monopolistic format almost solely controlled by Adobe, and I honestly don't like the idea of a combined PDF/Flash plugin (fancy having banner ads pester you to download updates to Acro-Flash?) - but it's nice, fast, flexible, and eminently spicy.

So I've started porting some Python graphics-related stuff to ActionScript, and called it (predictably enough) Wasabi.

ActionScript is a nightmare compared to the cleanliness of Python, but MTASC makes it somewhat bearable (even if you have to jump through some hoops to avoid enduring the torment of the Flash GUI for development).

What, No Dessert?

Of course, all of these things will need time to cook, yes, even Sashimi. In retrospect, I should probably have called Yaki "Miso" (in honor of Beautiful Soup, which I use extensively under the hood), but given that these are solely hobby projects at this point (coding hasn't been my main source of income for a decade or so), I won't get hung up on naming conventions.

I just hope it doesn't take another year to replace this Wiki, regardless of which turns out to be the best dish...