Bayesian Whatsits

Being a heavy SpamAssassin user, I've been acquainted with Bayesian classification for a while. But using it only to filter spam always seemed somewhat short-sighted: most Bayesian classifiers support a (theoretically unlimited) number of buckets, so sorting incoming mail into several distinct categories (instead of simply spam and non-spam) is not an earth-shattering concept. Jon Udell, for one, wrote about exactly that.

Neither is building an RSS reader with Bayesian classification (Bryce Yehl, for one, has mentioned it, and the concept seems to be popping up all over the place). I'm guessing that someone is tinkering with that even as I write, and I personally would love to see Brent Simmons of NetNewsWire fame pull a stunt like this.

If I had to bet on the next "killer apps", these two - mail classification and RSS filtering - would be right at the top of my list, based solely on the actual time-saving benefits for users. After setting up Outclass (which requires POPFile) with several "buckets" to classify my corporate e-mail by project and field, I'm definitely not going back. Outlook, even with extensive use of the Rules Wizard and categories, simply cannot cope with the diverse kinds of project-related e-mail I swap with colleagues, and Outclass is the only thing I could find that could deal with Exchange folders and multiple categories.

And there's a lot to build on. Besides the base math (PDF), there are several implementations of standalone, generic Bayesian classifiers like Reverend, dbacl and libbayes (these are the ones I know about offhand).
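
To give a feel for how little code the core idea takes, here is a minimal sketch of a multi-bucket naive Bayes classifier in Python. It is not how Reverend, dbacl or libbayes work internally: the class name `BucketClassifier`, its methods, the add-one smoothing and the sample bucket names are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class BucketClassifier:
    """Hypothetical multinomial naive Bayes over any number of named buckets."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> occurrences
        self.doc_counts = Counter()              # bucket -> training texts seen
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Crude tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        words = self.tokenize(text)
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return an unnormalized log-probability for each trained bucket."""
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for bucket, docs in self.doc_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Prior: fraction of training texts that landed in this bucket.
            log_prob = math.log(docs / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_prob += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[bucket] = log_prob
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Three buckets instead of the usual spam / non-spam pair.
clf = BucketClassifier()
clf.train("project-alpha", "alpha release schedule slipped, new build on friday")
clf.train("billing", "invoice attached, payment due at end of month")
clf.train("spam", "cheap pills, click here to win money now")

print(clf.classify("please review the invoice before payment"))  # prints "billing"
```

Adding another bucket is just another `train` call with a new name, which is the whole point: nothing in the math cares whether there are two categories or twenty.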

So I guess it's only a matter of time. Who knows, maybe Apple will add customizable Bayesian filtering to its next version of They, at least, have all the bits in place (and understand the need to make users' lives easier).