Rainy Day, and Bot Nuisances


Well, things seem to be shaping up. The photo album is fixed (though still running under PHP), RSS feeds now use the proper base URL, and there is a whole bunch of them now (one per namespace, e.g., blog, applications – now under the apps tree – HOWTO, etc.).

I will be adding them to the sidebar soon, but today I’m pretty much taking the day to read and catch up on things, since the weather isn’t very inviting outside.

Still, there are plenty of things to fix, and not just with Yaki. For instance, one of the things I’ve been doing behind the scenes is fiddling with lighttpd, which is what we use as a reverse proxy.

Lately, I’ve been shutting out nuisance bots like MSRBOT (which seems to have a lot of trouble with relative URLs and, despite what it says on that page, hasn’t picked up on robots.txt).

Yes, I know I’ve done this before – isn’t progress wonderful?

The magic for banning these nuisances under lighttpd (lest I forget sometime in the future) is this incantation:

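# Deny every request whose User-Agent matches the regex.
# The empty string in url.access-deny matches any URL suffix.
$HTTP["useragent"] =~ "MSRBOT|msnbot/0\.9" {
  url.access-deny = ( "" )
}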

You can, of course, add anything you like to the regular expression. Popular terms are “sucker”, “grabber”, etc. – a quick pass over your server logs should turn up quite a few more.
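
For that log trawl, something like the following Python sketch might do. It assumes an Apache-style "combined" access log where the user agent is the last quoted field on each line, so adjust the pattern if your log format differs. The function names and the term list are made up for illustration (only MSRBOT and msnbot/0.9 come from the rule above), and the match is case-insensitive to cast a wider net than the lighttpd rule does.

```python
import re
from collections import Counter

# Assumption: "combined"-style log lines, user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

# Illustrative terms only; extend with whatever turns up in your own logs.
SUSPECT = re.compile(r"MSRBOT|msnbot/0\.9|sucker|grabber", re.IGNORECASE)


def count_user_agents(lines):
    """Count how many requests each user agent string made."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspects(counts):
    """Return (user_agent, hits) pairs matching SUSPECT, busiest first."""
    return [(ua, hits) for ua, hits in counts.most_common() if SUSPECT.search(ua)]
```

Feed `count_user_agents` an open log file (or any iterable of lines) and pass the result to `suspects`; whatever comes back is a candidate for the regular expression above, once you have checked by hand that it really is a nuisance.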

Have fun stomping them out.