Midnight Oil

Spent a couple of hours writing RSS scrapers for a few sites I read (which either offer only summary feeds or fill their regular feeds with dynamic advertising junk) and playing around with .
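For the curious, here's a minimal sketch of that kind of scraper. It is not the code I actually used. It assumes the target page marks its article links with a CSS class (the `link_class` parameter and the `scrape_to_rss` name are hypothetical), and it works on an HTML string you've already fetched. It uses only the standard library and emits a bare-bones RSS 2.0 document:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (title, absolute_url) pairs for <a> tags carrying a given class."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class  # assumed: the site tags article links with this class
        self.items = []               # list of (title, absolute_url)
        self._href = None             # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes and attrs.get("href"):
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        # Accumulate link text, including text inside nested tags like <b>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_to_rss(html, base_url, link_class, feed_title):
    """Turn matching links on a page into a minimal RSS 2.0 string."""
    parser = LinkCollector(base_url, link_class)
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Scraped feed for " + base_url
    for title, link in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link  # the URL doubles as a stable id
    return ET.tostring(rss, encoding="unicode")
```

Keying the scrape on a class attribute is a guess; real sites usually need a site-specific selector, which is exactly why these things break whenever a site gets redesigned.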

In the meantime, I've noticed an increasing number of splogs (the new breed of search magnet that shoves advertising down our eyeballs by reusing content from actual sites) re-publishing excerpts of my feed.

I used to have a big, fat yellow warning that got added to my feed whenever it was fetched by an inefficient aggregator (typically a script-based one that didn't honor the usual conventions), but splog generators seem to have gotten smarter, which is interesting.

But it was even more interesting when I tried to track down the IP addresses they fetched my feeds from (by comparing their page modification times against my publishing times and server logs).
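The correlation itself is simple enough to sketch. The version below is an illustration, not my actual script: it assumes Apache "combined" log lines, a known feed path, the time I published, and the time the copy showed up on the splog. Any IP that successfully fetched the feed between those two moments is a suspect, and intersecting suspects across several incidents narrows things down. The names `suspects` and `common_suspects` are hypothetical:

```python
import re
from datetime import datetime

# Assumed Apache/NCSA "combined" log format; only the fields we need are captured.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def suspects(log_lines, feed_path, published_at, copied_at):
    """Return {ip: first_fetch_time} for clients that got a 200 for feed_path
    between published_at and copied_at (both timezone-aware datetimes)."""
    seen = {}
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or m.group("path") != feed_path or m.group("status") != "200":
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if published_at <= ts <= copied_at and m.group("ip") not in seen:
            seen[m.group("ip")] = ts
    return seen


def common_suspects(incidents):
    """IPs present in every incident's suspect set."""
    sets = [set(result) for result in incidents]
    return set.intersection(*sets) if sets else set()
```

Only 200 responses count because a fetch right after publishing is the one that actually carries the new entry; shared proxies and big aggregators will show up in every window, so the intersection is a hint rather than proof.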

I wrote a script to query Technorati and for a few obscure words (including URL portions) I had used in the past weeks' postings, and lo and behold, I picked up a couple more openly commercial sites re-publishing stuff I link to a couple of hours after my posts.
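Picking the "obscure words" is the interesting part, so here's a rough sketch of one way to do it (again, not the script I used, and it deliberately leaves out the search queries themselves, since I'm not reproducing any search engine's API here). It assumes you have the post and the rest of your archive as plain text, and treats as obscure any word of six or more characters, or any URL path segment, that appears in no other post. The six-character cutoff and the names `tokens` and `obscure_terms` are arbitrary, hypothetical choices:

```python
import re
from urllib.parse import urlparse

WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{5,}")  # words of 6+ characters (arbitrary cutoff)
URL_RE = re.compile(r"https?://\S+")


def tokens(text):
    """Lower-cased words plus URL path segments found in text."""
    found = set()
    for url in URL_RE.findall(text):
        # Strip trailing punctuation that the greedy URL match may have swallowed.
        for segment in urlparse(url.rstrip(".,;:!?)")).path.split("/"):
            if len(segment) >= 6:
                found.add(segment.lower())
    found.update(w.lower() for w in WORD_RE.findall(URL_RE.sub(" ", text)))
    return found


def obscure_terms(post, archive, limit=5):
    """Terms in post that occur in no other archived post, longest first."""
    elsewhere = set()
    for other in archive:
        elsewhere |= tokens(other)
    unique = tokens(post) - elsewhere
    return sorted(unique, key=lambda t: (-len(t), t))[:limit]
```

Each returned term then becomes a search query; a hit on a page you didn't write, timestamped shortly after your post, is a candidate re-publisher.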

Which is not strange on its own - after all, the data is out there for everyone. But out of the couple of dozen samples I took, one particularly obscure item (an app I hadn't linked from my front page) was re-published on one of those high-volume sites (and by high-volume I mean with very strident advertising) something like 15 minutes after I added its node under Applications.

And yes, I'm pretty sure I found it first - guess I have a few subscribers to my full Wiki feed out there.

I wonder how long it will take for to be plastered all over the Net... Well, they deserve the publicity.

Anyway, here's the thought of the day: nothing gets lost on the Internet - it just mutates into advertising fodder and starts clogging search engines.