Reverend Thomas Bayes 1702-1761
Here’s the base math (
Steps for Bayesian classification of RSS feeds
I’ve since gone and coded it properly, but here’s the general gist of things:
- Get the feed (basic).
- Strip the markup (strip_tags and suchlike nonsense).
- Tokenize (with or without stemming; it works either way).
- classify somewhat like this
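
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code I actually wrote: it assumes a plain naive Bayes classifier with add-one smoothing, skips the feed fetching and stemming entirely, and every name in it (`strip_tags` as a Python function, `tokenize`, `NaiveBayes`, the labels) is made up for the example.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Rough stand-in for PHP's strip_tags: drop markup, keep text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercase and pull out runs of letters; no stemming in this sketch."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> items seen
        self.vocab = set()

    def train(self, label, html):
        tokens = tokenize(strip_tags(html))
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, html):
        tokens = tokenize(strip_tags(html))
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log P(label) + sum of log P(word | label), working in log
            # space so long items do not underflow to zero.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("interesting", "<p>Bayesian <b>spam</b> filtering in Python</p>")
nb.train("boring", "<p>Celebrity gossip and football scores</p>")
print(nb.classify("<i>Python</i> filtering tricks"))
```

Feed items would be passed to `train` with whatever labels you care about, then new items go through `classify`. The smoothing is there so that a word never seen under a label does not zero out that label's whole score.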
See rss2mail for some idea of what I’m aiming at.