On Yaki and Google App Engine

Thanks to the guys at Google (yay!) I’ve had a go at Google App Engine to see if I can port Yaki atop it and, eventually, migrate this entire site into the Google cloud. And yes, I think it’s doable, but it’s bound to take a while.

Yaki has, at its core, a little over 3800 lines of code right now (that is, excluding Snakelets, extra Python libraries and HTML). That relatively meagre amount of code has been holding this site together for almost a year with nary a hitch (save those I cause myself from time to time) and manages nearly 5000 Wiki pages (including 2000 blog posts and a little over 500 linkblog items).

All in all, it’s one of the pieces of code I’m most proud of (although it should be said that the codebase that is currently up on Google Code is woefully outdated – something I cannot see getting fixed anytime soon).

Its resilience and flexibility (at least where my wants and needs are concerned) are, of course, directly attributable to what it’s coded in. Python is a truly wonderful language to code in when your time is limited and you only re-visit your stuff every six months or so.

And Snakelets remains, in my eyes, one of the hidden gems of Python web application development – WSGI and its ilk be buggered.

So moving to App Engine is going to be challenging, to say the least.

Yaki Design Principles

Yaki was heavily influenced by my loathing of database storage and my love of the Java application server model [1]. As such, it relies very heavily upon:

  1. Filesystem storage – all the Wiki pages are stored on a plain filesystem tree (which the web application treats as read-only). This makes editing, offline or otherwise, and versioning extremely easy – today I use Mercurial to version all the content and mirror it across my Macs.
  2. Pre-rendering and caching – all the content is pre-rendered as HTML and handled as such internally, and all of it is stored in a way that makes serving a page mostly a matter of spitting out static files (filtered through the Snakelets template engine, but with negligible overhead).
  3. Background Indexing – all the internal Wiki links are tallied by a background indexer that crawls the content and builds a set of internal hash tables kept in RAM.
  4. Application Contexts – This is the reason why I went with Snakelets instead of all the other WSGI frameworks. I can keep data in RAM that is persistent across requests and URL handlers, making it trivial to build stuff like SeeAlso and the Referrers table.
  5. Minimal Exposure – you can’t break Yaki via the web. At least not without breaking Python or Snakelets, since the only user inputs are the search form and the Archives. I suppose it might be hackable in some way (and many have tried, confusing it with a WordPress site), but right now it has no web interface to speak of (I publish stuff via Mercurial and a particularly convoluted SSH configuration), and I would call it reasonably secure (as such things go).
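
To make the background indexing idea (item 3) a little more concrete, here is a minimal sketch of a crawler that walks a content tree and builds an in-memory backlinks table. It is not Yaki's actual code: the `[[PageName]]` link syntax, the `.txt` extension and the `build_backlinks` name are all assumptions made for illustration.

```python
import os
import re
from collections import defaultdict

# Hypothetical link syntax: [[PageName]] or [[PageName|label]].
# Yaki's real markup may well differ.
LINK_RE = re.compile(r"\[\[([^\]|]+)")


def build_backlinks(root):
    """Walk a content tree and map each link target to the pages linking to it."""
    backlinks = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)
            # The page name is the path relative to the root, minus the extension.
            page = os.path.relpath(path, root)[:-len(".txt")].replace(os.sep, "/")
            with open(path, encoding="utf-8") as handle:
                for target in LINK_RE.findall(handle.read()):
                    backlinks[target.strip()].add(page)
    return backlinks
```

In a container that keeps application state between requests, a table like this can be rebuilt periodically by a background thread and consulted by any URL handler, which is roughly what makes things like SeeAlso cheap to serve.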

Likely Impacts of Porting

Well, there are a few obvious ones – storage, data model, having to put up with a database again, losing application contexts and reverting to a CGI-like model, etc. But regardless of the actual changes to the Yaki codebase and data model, there are a few things that worry me:

  1. Content Migration and Easy Editing – I need to figure out a way to be able to import and edit the site content without too much hassle. Right now I foresee two possible approaches: either I implement an HTML page editing back-end (which I am loath to do, since it completely breaks my I-can-edit-with-vim philosophy and adds a number of security hassles), or I pull out all the stops and try to build a WebDAV or MacFUSE interface to the data store (either is equally fraught with pitfalls, but there’s PyFileServer to build upon).
  2. Real-Time Rendering – I loathe the prospect of wasting CPU cycles, but it seems that my pre-rendering approach will have to go out the window – yes, there is handler caching, I can optimize HTTP transactions to my usual paranoid standards and I could store pre-processed content inside the data store, but it would be wasteful in terms of storage. I’m not sure how I would handle Wiki backlinks, but that might be fixable through a decent data model (and I spent enough time rooting inside PhpWiki and MySQL to come up with something).
  3. Search – Of course I’d use Google Search. But, still, there are a number of things I can do with the current approach that need major re-thinking. Some of them might be doable by changing the storage model completely and adding extra metadata to entries, but most won’t.
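
As for the "decent data model" for backlinks mentioned in item 2, one possible shape is to store each page's outbound links alongside its markup, so that backlinks become a membership query rather than a crawl. The sketch below uses a plain dictionary as a stand-in for the data store; `Page` and `PageStore` are hypothetical names and none of this is the App Engine API.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str
    markup: str
    links: list = field(default_factory=list)  # outbound wiki links


class PageStore:
    """In-memory stand-in for a data store (an assumption, not a real API)."""

    def __init__(self):
        self._pages = {}

    def put(self, page):
        self._pages[page.name] = page

    def backlinks(self, name):
        # Equivalent in spirit to a "links contains name" query.
        return sorted(p.name for p in self._pages.values() if name in p.links)
```

The trade-off is that the link list has to be recomputed every time a page is saved, but that moves the work to write time, where it is rare, instead of read time.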

Other Stuff

Regardless of the architectural aspects, there are a few things to note concerning App Engine:

  1. There is very little support for static content – I can store page markup in the data store without too many hassles, but there is no simple way to store or manage images and media files. Plus, the 500MB of storage on offer is most likely not enough for my current data set (this Wiki comprises nearly 400MB in raw markup and inline images alone, not taking into account additional stuff like indexing, the current Snakelets code, additional libraries, etc.).
  2. There is no support for background processes – this is a biggie for building the wiki backlinks tables, doing data cleanups, fetching my linkblog entries, and whatnot (it is also one of the main strengths of Snakelets). I could probably get away with using some sort of timer mechanism, but there isn’t one.
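
One way to fake the missing timer is to piggyback on incoming requests: each request checks whether a job is overdue and, if so, runs it before carrying on. The following is a minimal sketch under that assumption; `LazyTask` is a hypothetical name, and in a CGI-like model the `last_run` timestamp would have to be kept somewhere persistent (such as the data store) rather than in memory as it is here.

```python
import time


class LazyTask:
    """Run a job at most once per interval, triggered by incoming requests."""

    def __init__(self, job, interval, clock=time.time):
        self.job = job
        self.interval = interval  # seconds between runs
        self.clock = clock        # injectable so the logic can be tested
        self.last_run = None

    def maybe_run(self):
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.interval:
            return False  # not due yet
        self.last_run = now
        self.job()
        return True
```

The obvious drawback is that whichever visitor trips the timer pays for the job, so it only suits work that finishes well within a single request.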

Next Steps

Still, it’s doable. I’ll be fooling around with it, but it took me over a year to migrate from PhpWiki to Yaki – I expect it will take me at least that long to move to Google App Engine, and it may well happen that the stuff I’m worried about now will get fixed in the meantime.

Right now, although I’m likely to take a stab at a bare-bones Yaki port and do the brutal thing (i.e., abuse the data store and treat it like a file system), I already have a pretty decent project to deploy atop App Engine – I think it’s perfect for plugging the hole in my current RSS setup and re-implementing the Bayesian classifier logic I had bolted on to newspipe.
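
For what it is worth, the "brutal thing" could look something like the sketch below: records keyed by their full path, with directory listings derived from key prefixes. A dictionary stands in for the data store, and `StoreFS` and its methods are hypothetical names rather than anything App Engine provides.

```python
class StoreFS:
    """Path-keyed blobs in a dict standing in for a data store."""

    def __init__(self):
        self._blobs = {}

    def write(self, path, data):
        self._blobs[path.strip("/")] = data

    def read(self, path):
        return self._blobs[path.strip("/")]

    def listdir(self, path):
        prefix = path.strip("/")
        prefix = prefix + "/" if prefix else ""
        names = set()
        for key in self._blobs:
            if key.startswith(prefix):
                # Keep only the first path component below the prefix.
                names.add(key[len(prefix):].split("/", 1)[0])
        return sorted(names)
```

Scanning every key to list a directory is clearly wasteful, which is part of why this counts as abuse; a real port would want an indexed prefix query or a stored parent field.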

There will, of course, be more to it, but right now I have other stuff to worry about – like my home renovation, which is running late. Too damn late for my liking.

Ah well.

[1] Yes, there is one thing about Java that I like – containers. It’s about the only thing, I think.