# Sifting Through Cyberspace

Like a lot of people, I got around 30 copies of the latest e-mail worm. The first two or three made it into my Inbox, but a quick tweak to SpamAssassin got rid of the rest. As a nice bonus, a newsletter I keep trying to unsubscribe from (but couldn't be bothered to filter until now) is now also rated as spam.

None of those 30 e-mails, of course, caused a whit of trouble to my Mac.

But the rising sophistication of e-mail worms (this one, besides posing as a legitimate update from Microsoft, even included fancy HTML formatting and links back to its purported home) is bordering on the absurd. I mean, what's the point? Is it all about doing a disservice to Microsoft? Teen l33t karma?

Oh well. The real issue for me is the steady increase in filtering for everything. Besides firewalls, ACLs, SpamAssassin, etc., we're now up to filtering... websites.

Like Brent Simmons, I have referral filters, robots.txt filters, even automatic redirections for questionable search terms and pages.

Take Kazaa, for instance. My page listing its port numbers and ways to block it was so popular (since it also mentioned several common loopholes around those blocks) that I decided to take it down and redirect all dodgy referrers to it back to Google (which is now filtering for those terms itself).

My approach to it is simple, brutal and (so far) reasonably effective: I added a global include file for all my sites with the following code snippet:

```php
$aRedirects = array(
    "/www.kazaa-ez.com/"                => "http://www.google.com/search?q=kazaa&ie=UTF-8&oe=UTF-8",
    "'^http://askuk.directhit.com'"     => "http://www.google.com",
    "/^http:\/\/(www\.)?slashdot\.org/" => "http://the.taoofmac.com/slashdot.html"
);

// Only bother matching if the browser actually sent a Referer header
if (array_key_exists('HTTP_REFERER', $_SERVER)) {
    foreach ($aRedirects as $pattern => $url) {
        // @ suppresses warnings from any malformed pattern in the list
        if (@preg_match($pattern, $_SERVER['HTTP_REFERER'])) {
            header("Location: $url");
            exit;
        }
    }
}
```

(code edited to remove offensive terms and notorious referrer polluters)

However, the $aRedirects array is steadily growing, so this will most likely have to be scrapped in favour of something that can handle hundreds of patterns, probably read from a flat file to avoid the overhead of running a SQL query and parsing the results. Files are so much more efficient for sub-10K recordsets...
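A minimal sketch of what that file-based version might look like. The pattern-file format (one tab-separated `pattern<TAB>destination` per line, `#` for comments) and the function names are my assumptions, not anything from the actual site:

```php
<?php
// Load "pattern<TAB>destination" pairs from a flat file.
// Hypothetical format: lines starting with '#' are comments.
function loadRedirects($file)
{
    $redirects = array();
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        if ($line[0] === '#') {
            continue;
        }
        list($pattern, $url) = explode("\t", $line, 2);
        $redirects[$pattern] = $url;
    }
    return $redirects;
}

// Return the destination URL for the first pattern that matches
// the referrer, or null if nothing matches.
function matchRedirect($redirects, $referer)
{
    foreach ($redirects as $pattern => $url) {
        // @ suppresses warnings from any malformed pattern in the file
        if (@preg_match($pattern, $referer)) {
            return $url;
        }
    }
    return null;
}
```

The global include would then shrink to a `loadRedirects()` call plus a `header()`/`exit` on a non-null match, and adding a referrer polluter becomes a one-line edit to the pattern file rather than a code change.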

Of course, the funniest filter I've found so far is Simon Willison's piratify. Aaaaar, matey, cider is invaluable for regular expressions, indeed...