Reverend Thomas Bayes 1702-1761
Here's the base math (PDF), or you can lose yourself in the International Society for Bayesian Analysis (ISBA) web site. Fortunately, there are a lot of implementations of Bayesian classifiers (and applications that use them to actually do something useful, like SpamAssassin).
Steps for Bayesian classification of RSS feeds:
I've since gone and coded it properly, but here's the general gist of things:
- Get the feed (basic).
- Strip the markup (strip_tags and suchlike nonsense).
- Tokenize (with or without stemming, it works either way).
- Classify, somewhat like this.
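
For a rough idea of how the last three steps fit together, here is a minimal sketch in Python. It is not the properly coded version mentioned above, only an illustration under a few assumptions: the feed has already been fetched and each item is handed over as an HTML string, the regex-based `strip_tags` is a crude stand-in for PHP's function of the same name, there is no stemming, and the classifier is a plain multinomial naive Bayes with add-one smoothing. The `NaiveBayes` class, its method names and the labels are all made up for the example.

```python
import html
import math
import re
from collections import Counter, defaultdict


def strip_tags(markup):
    """Crude stand-in for PHP's strip_tags: drop tags, then decode entities."""
    return html.unescape(re.sub(r"<[^>]*>", " ", markup))


def tokenize(text):
    """Split into lowercase word tokens (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()              # label -> items seen
        self.vocab = set()                       # every token seen in training

    def train(self, label, markup):
        tokens = tokenize(strip_tags(markup))
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def scores(self, markup):
        """Return a log-probability score (up to a constant) per label."""
        tokens = tokenize(strip_tags(markup))
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[tok] + 1) / denom)
            result[label] = score
        return result

    def classify(self, markup):
        scores = self.scores(markup)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("interesting", "<p>New Python release adds faster parser</p>")
nb.train("interesting", "<p>Bayesian filtering for RSS feeds</p>")
nb.train("boring", "<p>Celebrity gossip and fashion news</p>")
nb.train("boring", "<p>Win a free holiday, click now</p>")
print(nb.classify("<b>Python</b> feeds &amp; Bayesian parser"))
```

Working in log space avoids underflow when an item has many tokens, and the smoothing means a single never-seen word does not rule a label out. A real feed reader would want far more training items than the four toy ones here.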
See Projects/rss2mail for some idea of what I'm aiming at.