Grilling Posts

After Brent Simmons, Tim Bray and Tom Insam popped up in my RSS feed practically in a row writing about baking posts and running sites with site generators rather than fully dynamic platforms, I thought it would be a good time to summarize how I grill this site’s pages in Yaki.

Right now, everything on this site is stored in Dropbox, which replaced Mercurial well over a year ago.

To post, I simply edit a text file, drop it into a folder, and it gets synced automagically to the server.

That’s it. Zero hassle. Completely free choice of format, editor, device, you name it. No headaches yet, and if Dropbox has a glitch or goes away, there are plenty of replicas - and I can replace their service with git in an eyeblink.

A post is published instantly, and Whoosh indexes the content within 30 minutes of it being online (I’ve been playing around with filesystem notifications to make that instantaneous, but Dropbox tends to get inotify all hot under the collar, so that’s not finished yet).

Comments were handled by Disqus until a few days ago[2], and there is really not much else to look at other than content.

From the moment a browser starts talking to this site, a number of things happen:

  • lighttpd[1] takes the request and checks if the content is in a RAM cache, thereby dealing with 90% of the crufty bits like favicons, CSS (which is both minified and gzipped), etc.
  • If not (or if the cache is stale) it then reverse proxies it to Yaki. Yaki can handle being a front-end server just fine, but one of the sites running on this VPS is based on nodejs (which is dumb as a doornail by default) so I need something in front of both. I have been meaning to get rid of lighttpd in favor of nginx for a while, but I’ve been lazy and haven’t figured out the new setup yet.
  • Yaki then takes the request and checks if the relevant page content has been pre-rendered. Pre-rendered content is stored in a Haystack-like binary file (here’s the source for the stable version of that module - I’ve been testing an mmap()-based version, but haven’t yet decided it’s worthwhile to put in production).
  • If the pre-rendered HTML exists and the remote browser doesn’t have it already (in which case it’ll get a 304 Not Modified reply), Yaki then blindly grabs a chunk of the file and spits it out via a template (templates in Yaki/Snakelets are pre-processed Python code, so the whole thing is very fast).
  • If not, then Yaki grabs the page contents off disk (remember, there’s no database, just plain text files and images), filters it through Markdown, Textile or whatever, updates the intra-wiki link map, and sticks the resulting HTML into the haystack before running it through the site template.
  • In either case, HTTP headers are properly set.
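
The render-once, serve-from-the-haystack flow in the steps above can be sketched roughly as follows. This is a minimal illustration and not Yaki's actual module: the `Haystack` class, `render_markup` (a stand-in for Markdown or Textile) and `serve` are all hypothetical names, and the index lives in memory rather than on disk.

```python
import os


class Haystack:
    """Append-only blob file plus an in-memory index (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # page name -> (offset, length, source mtime)
        open(path, "ab").close()  # make sure the blob file exists

    def put(self, name, html, mtime):
        data = html.encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[name] = (offset, len(data), mtime)

    def get(self, name, mtime):
        entry = self.index.get(name)
        if entry is None or entry[2] < mtime:
            return None  # never rendered, or the source changed since
        offset, length, _ = entry
        with open(self.path, "rb") as f:
            f.seek(offset)  # blindly grab one chunk of the file
            return f.read(length).decode("utf-8")


def render_markup(text):
    # Stand-in for a real markup filter: wraps each non-empty line in <p>.
    return "".join("<p>%s</p>" % line for line in text.splitlines() if line.strip())


def serve(store, pages_dir, name):
    """Return (html, was_cached) for a page kept as a plain text file."""
    source = os.path.join(pages_dir, name + ".txt")
    mtime = os.path.getmtime(source)
    html = store.get(name, mtime)
    if html is not None:
        return html, True
    with open(source, encoding="utf-8") as f:
        html = render_markup(f.read())
    store.put(name, html, mtime)
    return html, False
```

Stale entries are simply superseded by a new append, so the file only grows; a real store would need some form of compaction, which this sketch leaves out.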

The fun thing here is that Yaki is very dynamic. I can bolt on a number of filters to the output and re-render the HTML in other ways, do link substitutions, you name it - but the static content store and pre-baked haystack make it brutally efficient HTTP-wise, and the simple caching tweaks I’ve added over the years even more so.
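
As a rough idea of what bolting filters onto pre-rendered output might look like, here is a small sketch. The filter names, the `[[Page Name]]` link syntax and the `/wiki/` URL prefix are assumptions made up for illustration, not Yaki's real plugin interface.

```python
import re


def wiki_links(html):
    # Hypothetical filter: turn [[Page Name]] into an intra-site link.
    def repl(match):
        name = match.group(1)
        return '<a href="/wiki/%s">%s</a>' % (name.replace(" ", "_"), name)
    return re.sub(r"\[\[([^\]]+)\]\]", repl, html)


def mark_external_links(html):
    # Hypothetical filter: tag absolute http(s) links with rel="external".
    return re.sub(r'<a href="(https?://[^"]+)"', r'<a rel="external" href="\1"', html)


def apply_filters(html, filters):
    """Run the HTML through each filter in order and return the result."""
    for f in filters:
        html = f(html)
    return html
```

Because each filter is just a function from HTML to HTML, the cached chunk stays untouched and the filters can be reordered or dropped per request.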

You’d be surprised at the number of “professional” CMS solutions that waste CPU cycles running a database query on every request, parsing the results and rendering them with insanely optimized engines, only to utterly fail at doing something as simple as outputting:

Cache-Control: public, max-age=3600
Date: Sat, 19 Mar 2011 16:46:34 GMT
Expires: Sat, 19 Mar 2011 17:01:34 GMT
Last-Modified: Sat, 19 Mar 2011 16:46:34 GMT
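
Producing headers like these takes only a few lines. The sketch below uses the standard library's `email.utils.formatdate` to get HTTP-style dates; `caching_headers` is a hypothetical helper, and it derives `Expires` from `max_age` so the two always agree (when they disagree, HTTP/1.1 caches go by `max-age`).

```python
from email.utils import formatdate


def caching_headers(last_modified, now, max_age=3600):
    """Build response headers that let browsers and proxies cache a page.

    `last_modified` and `now` are Unix timestamps; `max_age` is in seconds.
    """
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "Date": formatdate(now, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
        "Last-Modified": formatdate(last_modified, usegmt=True),
    }
```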

It’s not rocket science. Back when I was running PhpWiki and MySQL, I managed to cut bandwidth consumption by a full third simply by adding HTTP caching headers (before moving to FeedBurner, further tweaking also cut down on RSS traffic). I keep seeing the same problem on pretty much every kind of site out there - you just have to open the Web Inspector (or Firebug), refresh a page a couple of times and take a look at how many requests get a 200 OK (a full retransmission) instead of a 304 Not Modified (proper cache handling on the server side).
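
The server-side half of that 200-versus-304 decision is small. A minimal sketch, assuming request headers arrive as a plain dict and that `status_for` (a hypothetical name) only has to deal with `If-Modified-Since`, not ETags:

```python
from email.utils import parsedate_to_datetime


def status_for(request_headers, last_modified):
    """Return 304 if the client's copy is still current, else 200.

    `last_modified` is a Unix timestamp for the page content.
    """
    since = request_headers.get("If-Modified-Since")
    if since is None:
        return 200  # first visit, or the client has no cached copy
    try:
        client_time = parsedate_to_datetime(since).timestamp()
    except (TypeError, ValueError):
        return 200  # unparseable date: play safe and resend
    # HTTP dates have one-second resolution, so compare whole seconds.
    return 304 if int(last_modified) <= int(client_time) else 200
```

A 304 reply carries no body, which is where the bandwidth savings come from.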

There are actually very few frameworks that do caching properly for you, and most server-side web development these days seems more focused on architectural sophistication than efficiency - and thus the inherent inefficiency of, say, PHP or Ruby is offset by a ludicrous amount of no-sequeling, memcaching and layering and whatnot.

All of which are fun and useful, but perhaps not really necessary for pushing content out there.

Update: I forgot to mention that Yaki also caches pre-rendered compressed content in gzip binary chunks for some things, thereby saving even more CPU cycles. I’ve been looking at request handling times, and server-side it takes around 0.02 seconds to render and serve a complex page (i.e., lots of markup with syntax highlighting plugins and all) from markup and typically around 0.005 seconds to spit out pre-processed output (from Python).
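
Caching compressed output comes down to running gzip once per page and reusing the bytes afterwards. The sketch below illustrates the idea with a hypothetical `GzipCache` class kept in memory; it is not how Yaki stores its chunks, and it does no invalidation when a page changes.

```python
import gzip


class GzipCache:
    """Keep compressed page bodies so each page is gzipped only once."""

    def __init__(self):
        self._chunks = {}      # page name -> gzip-compressed bytes
        self.compressions = 0  # how many times gzip actually ran

    def body_for(self, name, html, accept_encoding=""):
        """Return (body_bytes, extra_headers) for a response."""
        if "gzip" not in accept_encoding.lower():
            return html.encode("utf-8"), {}  # client did not ask for gzip
        chunk = self._chunks.get(name)
        if chunk is None:
            chunk = gzip.compress(html.encode("utf-8"))
            self._chunks[name] = chunk
            self.compressions += 1
        return chunk, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
```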

Update 2: For extra kicks, I’ve temporarily set up Varnish instead of lighttpd, to see if there would be any benefit of having an aggressive RAM-based cache (set to 128MB, which is plenty enough for the homepage, stylesheets, images, and most recent articles). Time will tell if Yaki even needs it, but I’ve been meaning to play around with it for a while (it’s extensively used at SAPO) and it might be useful to some folk - if you can’t bake, you might try glazing… :)

  1. I actually use lighttpd-improved since it has a few interesting fixes. Still, it will eventually go away. ↩︎

  2. I decided to get rid of them due to visual clutter and eventually realized I didn’t miss them, so I updated the policy page as well. ↩︎