Bayesian Classification

Reverend Thomas Bayes 1702-1761

Named after the Reverend Thomas Bayes, Bayesian classification (or filtering) is all the rage these days.

Here’s the base math (PDF), or you can lose yourself in the International Society for Bayesian Analysis (ISBA) web site. Fortunately, there are a lot of implementations of Bayesian classifiers (and applications that use them to actually do something useful, like SpamAssassin).
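
For reference, the core of that math fits on one line. This is a sketch of the usual "naive" formulation, where C is a class (say, "interesting") and w_1 ... w_n are the tokens of an item. The symbols, and the assumption that tokens are independent given the class, are mine and not taken from the linked PDF:

```latex
P(C \mid w_1, \dots, w_n)
  = \frac{P(C)\, P(w_1, \dots, w_n \mid C)}{P(w_1, \dots, w_n)}
  \;\propto\; P(C) \prod_{i=1}^{n} P(w_i \mid C)
```

The equality is Bayes' theorem itself. The proportionality on the right holds only under the independence assumption, and it drops the denominator because that is the same for every class being compared.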

Steps for Bayesian classification of RSS feeds

I’ve since , but here’s the general gist of things:

  • Get the feed (basic).
  • Strip the markup (strip_tags and suchlike nonsense).
  • Tokenize (stemming is optional; it works either way).
  • classify somewhat like this

See for some idea of what I’m aiming at.
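
The steps above can be roughed out in Python along these lines. This is only a minimal sketch under my own assumptions, not the code the post refers to. The feed is assumed to be fetched already, with each entry available as an HTML string. The names `strip_tags`, `tokenize` and `NaiveBayes` are hypothetical helpers. Tags are removed with a crude regular expression, stemming is skipped, and add-one smoothing is my choice for handling unseen tokens.

```python
import math
import re
from collections import Counter, defaultdict
from html import unescape


def strip_tags(html):
    """Crudely drop anything that looks like <...>, then decode entities."""
    return unescape(re.sub(r"<[^>]+>", " ", html))


def tokenize(text):
    """Lowercase and split into runs of letters/apostrophes (no stemming here)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()              # label -> number of trained items
        self.vocab = set()                       # every token seen in training

    def train(self, label, html):
        tokens = tokenize(strip_tags(html))
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, html):
        """Return the most probable label; assumes train() was called at least once."""
        tokens = tokenize(strip_tags(html))
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            # Work in log space so many small probabilities do not underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("interesting", "<p>Bayesian spam filter in Python</p>")
    nb.train("boring", "<p>Celebrity gossip and fashion news</p>")
    print(nb.classify("<b>A new Bayesian filter</b>"))
```

Two training items are far too few for real use; the point is only to show how the stripping, tokenizing and classifying steps hang together.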