The Bayesian Thing, Again


Every once in a while, I have enough time to re-visit my Quest for Easier Information Management, and I must say that pretty much every aspect of it is fulfilled by now, often with far simpler tools than what I anticipated at first.

Nevertheless, the RSS hydra rears its ugly head every now and then, and despite newspipe's profound effects on my RSS reading habits, Bayesian classification springs to mind every once in a while.

I first mentioned it nearly two years ago, and I've even tried doing a custom version of newspipe, but training was always a problem - in fact, there are a lot of other problems, none of which are new.

Click, You Have a Browser

The main one is that using an e-mail client to view your RSS feeds imposes severe restrictions on any sort of training interface you might want to implement, and Mail.app doubly so. Adding any sort of JavaScript to e-mail messages is suicidal (for lots of obvious reasons), and I want to make sure the HTML dialect used to render RSS feeds is simple enough to ensure my mobile front-end to newspipe keeps working.

Looking at your average e-mail client, the most you can do to provide feedback is either flag messages (which works across all IMAP clients) or click on links inside them - which, of course, launches a browser window and clutters up your screen.

You can use message priorities, too, unless you're using Mail.app, which has no built-in message priority support (one of the dumbest shortcomings in Mac OS X, if you ask me). And it needs a lot more improvements, too, but that's not important right now.

The Flagging Game

Flagging does work wonders, though. My typical RSS reading session consists of firing up Mail.app, opening my newspipe folder and scrolling through messages, flagging those I find interesting for later. Or, if I'm on the move, using my front end, which looks like this:

Yes, that's a minimalist IMAP front-end - the icons should be self-explanatory, and it looks the same on my V800, my Blackberry and any other mobile browser worth its salt. Plus, it rotates and resizes images to fit the screen (based on the device's WAP profile), which is great for cartoon strips. You can get the (outdated and buggy) source here. I'll update it one of these days, honest.
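Since flags are the one feedback channel every IMAP client shares, they could in principle double as training labels. What follows is only a minimal sketch of that idea, not anything newspipe or the front-end actually does: a two-class naive Bayes classifier with Laplace smoothing, where `FlagClassifier`, `tokenize` and the sample subject lines are all hypothetical names and data made up for illustration. It assumes you have already pulled message text and its flagged/unflagged state out of the mailbox by some other means.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class FlagClassifier:
    """Hypothetical two-class naive Bayes: 'flagged' (interesting) vs 'unflagged'."""

    LABELS = ("flagged", "unflagged")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, flagged):
        """Record one message, labelled by whether the reader flagged it."""
        label = "flagged" if flagged else "unflagged"
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        """Log of (smoothed prior * smoothed word likelihoods) for one label."""
        vocab = set(self.word_counts["flagged"]) | set(self.word_counts["unflagged"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Laplace-smoothed class prior, so an untrained classifier still works.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        # Add-one smoothing; the extra +1 leaves room for unseen words.
        denominator = total_words + len(vocab) + 1
        for token in tokens:
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def probability_interesting(self, text):
        """Estimated probability (0..1) that the reader would flag this text."""
        tokens = tokenize(text)
        flagged = self._log_score(tokens, "flagged")
        unflagged = self._log_score(tokens, "unflagged")
        # Subtract the maximum before exponentiating to avoid underflow.
        top = max(flagged, unflagged)
        e_flagged = math.exp(flagged - top)
        e_unflagged = math.exp(unflagged - top)
        return e_flagged / (e_flagged + e_unflagged)


if __name__ == "__main__":
    classifier = FlagClassifier()
    # Made-up subject lines standing in for messages in a newspipe folder.
    classifier.train("Bayesian classification of RSS feeds", flagged=True)
    classifier.train("IMAP front-end for mobile browsers", flagged=True)
    classifier.train("Celebrity gossip roundup", flagged=False)
    classifier.train("Weekly sports scores", flagged=False)
    for subject in ("Bayesian filtering for RSS", "Celebrity sports gossip"):
        print(subject, round(classifier.probability_interesting(subject), 3))
```

The catch is that an unflagged message is a weak negative signal (it may simply not have been read yet), so anything built along these lines would probably want to train only on messages already marked as seen.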

Getting a Clue

So it was with interest that, inside 24h, I stumbled upon News Clues - which acts as a proxy for your aggregator, reclassifying items and moving them to a per-topic feed (a very neat approach, but I'd probably rather have it work as an HTTP proxy, which is what it seemed like initially) - and James Tauber's post on attention.xml, which (like all designs by committee) seems overly complex for general usage.

But Is It Really Worth The Trouble?

I've been wondering whether or not investing some of my precious free time in pursuing these would help me spend even less time tracking my RSS feeds, and so far I've yet to find a real reason to dive in. Automated classification is only really useful for prioritization, sure, but it can also cause you to miss those little nuggets of information that our brains are very good at noticing among a wide set of information sources. And you can only notice patterns and trends if you have a very broad range of feeds, which kind of nullifies the point of doing Bayesian classification (and culling) in the first place.

But my main issue is return on investment. Besides the effort involved in coding yet-another-web-based-aggregator (which is the only real way to fix the UI issues), flagging things for follow-up and using newspipe's neat "digest" feature (which groups all new items from a feed into a single e-mail message) is simple enough and fast enough to be a more sensible approach in the short term.

(And if you don't understand how amazingly practical it is to read RSS as e-mail, then I won't try to convince you.)

But the concept is intriguing enough to consider coding up something (assuming I have the time and patience for it) and it's interesting to see that it is still being actively pursued by other people.

Maybe one of them will get it right.