Reverend Thomas Bayes (1702-1761)
Named after the Reverend Thomas Bayes, Bayesian classification (or filtering) is all the rage these days.
Here’s the base math (PDF), or you can lose yourself in the International Society for Bayesian Analysis (ISBA) web site. Fortunately, there are a lot of implementations of Bayesian classifiers (and applications that use them to actually do something useful, like SpamAssassin).
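For a rough idea of what that math boils down to in a text classifier, here is the usual "naive" Bayes scoring rule. This is a sketch and not necessarily the formulation in the PDF above: the symbols c (a category) and w_1 ... w_n (the tokens of one item) are my own, and it assumes tokens are independent of each other given the category.

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The item goes into whichever category c gives the largest value. In practice the product is computed as a sum of logarithms so it does not underflow.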
Steps for Bayesian classification of RSS feeds
I’ve since gone and coded it properly, but here’s the general gist of things:
- Get the feed (basic).
- strip_tags and suchlike nonsense.
- tokenize (with or without stemming).
- classify somewhat like this
See rss2mail for some idea of what I’m aiming at.
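The steps above can be sketched roughly as follows. This is not the properly coded version mentioned earlier, just a minimal illustration using only the Python standard library. It assumes a naive Bayes classifier with add-one smoothing, skips stemming, and parses an inline sample feed instead of downloading one. SAMPLE_RSS, the "boring" and "interesting" labels, and all function and class names are made up for the example.

```python
import html
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample feed; a real script would download this (step 1).
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>Cheap pills</title>
<description>&lt;p&gt;Buy cheap pills now&lt;/p&gt;</description></item>
<item><title>Python release</title>
<description>&lt;b&gt;New Python release notes&lt;/b&gt;</description></item>
</channel></rss>"""


def feed_items(rss_text):
    """Return 'title description' text for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title") or "") + " " + (item.findtext("description") or "")
        for item in root.iter("item")
    ]


def strip_tags(text):
    """Crude HTML removal: drop anything tag-shaped, then unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def tokenize(text):
    """Lowercase word tokens; stemming is left out of this sketch."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> training documents seen
        self.vocab = set()                       # every token seen in training

    def train(self, label, tokens):
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def scores(self, tokens):
        """Log of prior times token likelihoods, per label."""
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            result[label] = score
        return result

    def classify(self, tokens):
        scores = self.scores(tokens)
        return max(scores, key=scores.get)


def classify_feed(classifier, rss_text):
    """Steps 1-4 for every item: get, strip, tokenize, classify."""
    return [
        classifier.classify(tokenize(strip_tags(text)))
        for text in feed_items(rss_text)
    ]


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("boring", tokenize("buy cheap pills discount offer now"))
    nb.train("interesting", tokenize("python release notes new library code"))
    print(classify_feed(nb, SAMPLE_RSS))
```

A real setup would train on items you have already marked by hand and persist the counts between runs. The regex tag stripper is deliberately crude and will trip over malformed markup.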
Classifiers
- Classifier4j (Java)
- Bayesian Filter (PHP)
- btail - a Bayesian log filter