Think Different

The Real Blog Wars

After an utterly hectic work week filled with Chipmunk incidents, I finally had the time to check on my blogroll and become aware of the WinerWatcher controversy (via Simon Willison) and the whole enchilada of what I probably should start calling "the re-editing wars" between Dave Winer and Mark Pilgrim.

Amazing. But then, one of the reasons I run my site on PhpWiki is its built-in versioning for every node, which makes it all the easier for everyone to track changes and avoid controversy.

RSS, Bayesian classification and Python

I've made a little progress on my Bayesian RSS viewer (mostly restructuring code and making it more of a package and less of a hack), but nothing presentable yet. In fact, while researching a bit more, I've come across a great twist on the basic RSS viewer concept: getting your RSS news via a POP3 gateway.

Riconcito Sudaca picked up Mark's ultra-liberal RSS parser, wrapped it inside a POP3 pseudo-server, and made it possible to keep track of RSS feeds with any e-mail client.

But the brilliant bit is that he actually fetches inline images and stores them as attachments inside the "messages" fed to the e-mail client so that an HTML-enabled e-mail client can display them without generating HTTP requests.
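For the curious, the image trick can be sketched with Python's standard email package. This is only an illustration, not the aggregator's actual code: the function name `build_item_message`, the `images` mapping, and the `rss.invalid` domain are all made up for the example, and it assumes the images have already been downloaded.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_item_message(subject, html, images):
    """Build an HTML message whose <img> tags point at embedded parts.

    `images` (hypothetical) maps an image URL found in `html` to a
    (bytes, subtype) pair that was fetched beforehand.
    """
    msg = EmailMessage()
    msg["Subject"] = subject

    # Give every image a Content-ID and rewrite its src to a cid: URL,
    # so the mail client never has to issue an HTTP request.
    cids = {}
    for url in images:
        cid = make_msgid(domain="rss.invalid")  # looks like "<...@rss.invalid>"
        cids[url] = cid
        html = html.replace('src="%s"' % url, 'src="cid:%s"' % cid[1:-1])

    msg.set_content(html, subtype="html")

    # add_related() turns the message into multipart/related and
    # attaches each image next to the HTML body.
    for url, (data, subtype) in images.items():
        msg.add_related(data, maintype="image", subtype=subtype, cid=cids[url])
    return msg
```

Calling `build_item_message("Title", html, {url: (png_bytes, "png")})` should yield a multipart/related message with the HTML first and one image part per URL.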

(I'm tweaking that so that it works better with Mozilla and Mail.app, and will post the modified code sometime soon)

There are a lot of advantages to getting your RSS feed via e-mail. The first is, of course, that news items get filtered for spam exactly like everything else (although feeds that only include headlines provide very little useful information). You also get long-term archiving, unified filtering (using your e-mail client or procmail), and the ability to forward interesting posts to your friends or colleagues.

And, last but not least, a way to keep track of all your RSS feeds no matter where you are.

The downside for me? It's written in Python. The damn language seems to crop up every time I need to do something interesting these days, so I've dusted off my copy of Programming Python, added the Python DevCenter to my bookmarks and will be adding the Python Meerkat mob to my blogroll.

Yeah, I'm taking the plunge. Going to get no end of flak from the Perl geeks, but I'd rather learn a language I can read after the code is done.

As to improvements to the POP3 aggregator, my current goals include:

Changing the way messages are formatted so that both HTML and plaintext formats work across all my mail clients (multipart/related was the first thing I added, but inline images must have CIDs to work in Mozilla)
Making it more RedHat- and multi-user-friendly (log and spool file locations, init scripts, the works)
Having it poll feeds only every 30 minutes (adjustable interval), irrespective of POP3 logins (it's all too easy to flood a server with requests if you poll your e-mail accounts every 5 minutes)
Having it import OPML feed lists (maybe search for user.opml before user.txt)
Having it group new items from a feed into a single "digest" message (very useful for "headline-only" feeds from cheapskate sites). Ideally, I should be able to activate this feature per feed, but a global setting or "automagic" grouping (triggered, say, by RSS item size) might be better.
Enabling it to be launched from inetd (I'm not too keen on standalone daemons, and working from stdin instead of a socket opens up other integration possibilities)
Fiddling with the RSS-to-mail header mappings so that dates like 1/1/1970 are changed to now() and feeds without author info are tagged as being From: sitename
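The last item on that list is simple enough to sketch. What follows is a rough guess at how the mapping could look, not code from the aggregator: the `item` dictionary layout (epoch-second `date`, optional `author` and `title`), the `mail_headers` name, and the one-day cutoff for "bogus" dates are all assumptions made for the example.

```python
import time
from email.utils import formatdate

# Assumption: any timestamp that falls on 1 Jan 1970 (UTC) means "no date".
EPOCH_DAY = 86400


def mail_headers(item, site_name, now=None):
    """Map a parsed RSS item (hypothetical dict) to e-mail headers."""
    now = time.time() if now is None else now

    # Missing or epoch-zero dates are replaced by the current time.
    stamp = item.get("date")
    if stamp is None or stamp < EPOCH_DAY:
        stamp = now

    # Feeds without author info are tagged with the site name instead.
    author = item.get("author") or site_name

    return {
        "Date": formatdate(stamp, usegmt=True),
        "From": author,
        "Subject": item.get("title", "(untitled)"),
    }
```

Passing `now` explicitly keeps the function easy to test; in normal use it would be left out so the current time is picked up.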

An idea that comes to mind is allowing for some kind of threading (using Trackback or some other mechanism to add In-Reply-To headers to the pseudo e-mails), but it's still too early.
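One possible shape for this, purely as a thought experiment: derive each pseudo e-mail's Message-ID from the item's permalink, so that an item known to respond to another URL can point at the parent's ID. How the parent URL gets discovered (Trackback or otherwise) is left open here, and the names `message_id_for`, `threading_headers`, and the `rss.invalid` domain are invented for the sketch.

```python
import hashlib


def message_id_for(url, domain="rss.invalid"):
    """Derive a stable Message-ID from an item's permalink."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "<%s@%s>" % (digest, domain)


def threading_headers(item_url, parent_url=None):
    """Return headers that let a mail client thread related items.

    `parent_url` is the post this item responds to, if one is known;
    finding it is outside the scope of this sketch.
    """
    headers = {"Message-ID": message_id_for(item_url)}
    if parent_url:
        parent_id = message_id_for(parent_url)
        headers["In-Reply-To"] = parent_id
        headers["References"] = parent_id
    return headers
```

Because the ID is a hash of the URL, the parent does not need to have been fetched first: the reply's In-Reply-To will match whenever the parent item eventually shows up.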

Besides, it's Saturday. Lots of things to enjoy outside (save for the weather, unfortunately).

Tao of Mac
