Midnight Oil

Spent a couple of hours writing RSS scrapers for a few sites I read (but which either have summary feeds or fill their regular feeds with dynamic advertising junk) and playing around with Synfig.

In the meantime, I've noticed an increasing number of splogs (the new kind of Google search magnet aimed at shoving advertising down our eyeballs by using content from actual sites) re-publishing excerpts of my feed.

I used to add a big, fat yellow warning to the feed I served to inefficient aggregators (typically scripting-based ones that didn't honor the usual HTTP conventions), but splog generators seem to have gotten smarter, which is interesting.
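For illustration, here is a minimal sketch of how such aggregators might be spotted. It assumes the "usual HTTP conventions" mean conditional GET (sending `If-None-Match` or `If-Modified-Since` on repeat fetches); the function name, the `min_fetches` threshold and the input shape are all hypothetical, not the check this site actually ran.

```python
from collections import defaultdict

# Request headers that mark a conditional GET (assumed convention).
VALIDATORS = {"if-none-match", "if-modified-since"}

def inefficient_clients(requests, min_fetches=3):
    """Return clients that fetched repeatedly but never sent a cache validator.

    requests: iterable of (client_id, headers_dict) pairs, one per feed fetch.
    """
    fetches = defaultdict(int)
    conditional = defaultdict(int)
    for client, headers in requests:
        fetches[client] += 1
        # Header names are case-insensitive, so normalise before comparing.
        if VALIDATORS & {name.lower() for name in headers}:
            conditional[client] += 1
    return sorted(c for c, n in fetches.items()
                  if n >= min_fetches and conditional[c] == 0)
```

A single unconditional request proves nothing (it may be a first fetch), which is why the sketch only flags clients after several fetches.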

But it was even more interesting when I tried to track down the IP addresses they fetch my feeds from (by comparing page modification times with my publishing times and server logs).
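A rough sketch of that kind of correlation, assuming the server writes Common/Combined Log Format lines and that the feed lives at a single path. The function names, the `/rss` path, the IP addresses and the timestamps are made up for the example; this is not the original procedure.

```python
import re
from datetime import datetime, timedelta, timezone

# Common/Combined Log Format: ip - - [time] "METHOD path PROTO" status ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')

def feed_fetches(log_lines, feed_path):
    """Yield (timestamp, ip) for every logged request to feed_path."""
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, stamp, _method, path, _status = m.groups()
        if path == feed_path:
            yield datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), ip

def suspects(log_lines, feed_path, published, republished):
    """IPs that fetched the feed after a post went up and before the copy appeared."""
    return sorted({ip for ts, ip in feed_fetches(log_lines, feed_path)
                   if published <= ts <= republished})

# Invented sample data (documentation-range IP addresses).
sample = [
    '198.51.100.2 - - [10/Jan/2006:22:00:00 +0000] "GET /rss HTTP/1.1" 200 5120',
    '203.0.113.7 - - [10/Jan/2006:23:05:00 +0000] "GET /rss HTTP/1.1" 200 5120',
    '192.0.2.9 - - [10/Jan/2006:23:06:00 +0000] "GET /index HTTP/1.1" 200 900',
]
published = datetime(2006, 1, 10, 23, 0, tzinfo=timezone.utc)
republished = published + timedelta(minutes=15)
print(suspects(sample, "/rss", published, republished))
```

The narrower the gap between publishing and re-publishing, the shorter the list of candidate addresses.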

I wrote a Python script to query Technorati and Google for a few obscure words (including URL portions) I had used in the past weeks' postings, and lo and behold, I picked up a couple of more openly commercial Mac-oriented sites re-publishing stuff I link to, a couple of hours after my posts.
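The original script isn't reproduced here, but the first half of the idea (picking obscure words and turning them into a query) might look something like the sketch below. It assumes "obscure" means "rare across the site's archive"; `obscure_terms`, `query_string` and `example.org` are hypothetical names, and the code deliberately stops short of calling any search engine, since each service's API and terms are its own matter.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Lowercase words of four or more characters; shorter ones are rarely distinctive.
WORD_RE = re.compile(r"[a-z][a-z0-9_-]{3,}")

def obscure_terms(recent_text, archive_text, how_many=3):
    """Pick the words from recent posts that are rarest in the wider archive."""
    archive_counts = Counter(WORD_RE.findall(archive_text.lower()))
    recent_words = set(WORD_RE.findall(recent_text.lower()))
    # Rarest in the archive first; alphabetical order breaks ties predictably.
    ranked = sorted(recent_words, key=lambda w: (archive_counts[w], w))
    return ranked[:how_many]

def query_string(terms, site_to_exclude):
    """Build a URL-encoded query of quoted terms, excluding my own site."""
    q = " ".join(f'"{t}"' for t in terms) + f" -site:{site_to_exclude}"
    return urlencode({"q": q})
```

Quoting each term and excluding the originating site keeps the results focused on third-party copies.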

Which is not strange on its own - after all, the data is out there for everyone. But out of the couple of dozen samples I took, one instance of particularly obscure stuff (an app that I didn't link from my front page) was re-published on one of those high-volume Mac sites (and by high-volume I mean with very strident advertising) something like 15 minutes after I added its node under Applications.

And yes, I'm pretty sure I found it first - guess I have a few subscribers to my full Wiki feed out there.

I wonder how long it will take for Synfig to be plastered all over the Net... Well, they deserve the publicity.

Anyway, here's the thought of the day: nothing gets lost on the Internet - it just mutates into advertising fodder and starts clogging search engines.

Tao of Mac
