Thanks mostly to television (the shows are so dull my brain rebels for lack of adequate stimuli), I've been tinkering with Bayesian classification of RSS feeds in PHP. The actual RSS parsing is handled by Magpie, which saves me a whole lot of trouble. It is also very well written: I just dropped it into my usual framework, changed a couple of lines, and off it went.
After a bit more tinkering, I'm now joyfully pulling in random feeds and generating word frequency counts (with a sidetrack investigation into stemming as an aid for tokenization).
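For the curious, the counting step boils down to something like the sketch below. It is in Python purely for brevity and is not the actual PHP code; the stop-word list and the `strip_suffix` helper are made-up placeholders, and the suffix stripping is only a crude stand-in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def strip_suffix(word):
    """Crude suffix stripping, standing in for a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Drop markup, lowercase, split into words, remove stop words, stem."""
    text = re.sub(r"<[^>]+>", " ", text)
    words = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(w) for w in words if w not in STOP_WORDS]


def word_counts(items):
    """Accumulate token frequencies over a list of item texts."""
    counts = Counter()
    for item in items:
        counts.update(tokenize(item))
    return counts
```

Feeding it the title and description of each RSS item gives one big frequency table, and stemming keeps "parsing" and "parsed" from being counted as different words.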
However, some interesting issues remain unaddressed, such as the actual classification code, back-end storage, UI and a few design topics:
- Should I simply classify RSS items as "interesting/not interesting" or aim at multiple categories?
- How should I rate items (i.e., train the classifier)? Since it will be a web-based application, click-throughs will certainly be useful, but I'm not sure they will be enough.
- Should I generate another "sanitized" feed consisting solely of "interesting" items? How do I add rating info to that?
Actually, the more I think about this, the more it looks like Peerkat. I suppose I could try hacking Peerkat to do what I want (there is certainly no lack of Bayesian classifiers for Python), but I've come across two reasons not to:
- I'm not fluent enough in Python to go about hacking Peerkat into submission (I also don't like the UI much, but I'm very peculiar about such things and don't hold it against Rael :))
- I want to code a Bayesian classifier in PHP. Currently there seems to be none, and even if one existed, I'd probably want one I could drop into my framework with minimal hassle - PHP coders are not known for consistent coding styles, or even for proper OO design (Magpie and PhpWiki being two notable exceptions).
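To give an idea of how little code the core needs, here is a rough naive Bayes sketch. Again, it is Python for brevity rather than the PHP I intend to write, and everything in it (the `NaiveBayes` class, the label names, the add-one smoothing) is an illustrative assumption, not a finished design.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing."""

    def __init__(self, labels=("interesting", "boring")):
        # Label names are placeholders; any number of categories works.
        self.word_counts = {label: Counter() for label in labels}
        self.item_counts = {label: 0 for label in labels}

    def train(self, tokens, label):
        """Record one rated item (a list of tokens) under a label."""
        self.word_counts[label].update(tokens)
        self.item_counts[label] += 1

    def scores(self, tokens):
        """Return a log-probability score per label for the tokens."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_items = sum(self.item_counts.values())
        n_labels = len(self.item_counts)
        result = {}
        for label, counts in self.word_counts.items():
            # Smoothed prior, so an untrained label never yields log(0).
            score = math.log((self.item_counts[label] + 1) / (total_items + n_labels))
            # The extra +1 reserves room for words never seen in training.
            denominator = sum(counts.values()) + len(vocab) + 1
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            result[label] = score
        return result

    def classify(self, tokens):
        """Pick the label with the highest score."""
        scores = self.scores(tokens)
        return max(scores, key=scores.get)
```

Training would be a call like `nb.train(tokens, "interesting")` whenever an item gets rated (or clicked through), and `nb.classify(tokens)` would decide whether a new item makes it into the sanitized feed. Because labels are just dictionary keys, the same structure covers both the two-way and the multiple-category options.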
AppleScript Idle Handling
The Reality Distortion Field Strikes Again
Whatever else you might say about him, Steve Jobs has a way of jolting people. To my utter bewilderment, Statesman looks at his usual mock turtleneck and slacks uniform and tries to be scholarly about it. Oh well.
Would you believe it's been 7 years since we crowded through the first portals and began fragging the living daylights out of each other? The number of people still hacking Quake I is simply amazing...
The Mini-ITX shrinks even more.
Still a bit pricey, but it lets me add a 2.5" laptop HD.