I haven’t activated this feature here yet because, as a fully static site, it’s not significantly affected by badly behaved crawlers. However, I’ve been moderately annoyed by the random spikes in activity from seemingly normal user agents that show up in my Azure storage stats, which makes it clear there are quite a few misbehaving, misrepresenting crawlers out there.
While I’m not particularly concerned about people using my content to train their models (if I were, I might have some fun with prompt injection before enabling similar features), I can understand why individuals who make a living from their content might object.
I am, however, a bit disappointed that this situation seems to signal the effective demise of robots.txt as a civilized way to manage web crawling and indexing.
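For reference, that civilized approach looks something like the sketch below. The user-agent tokens shown (GPTBot, Google-Extended, CCBot) are the publicly documented ones for a few AI-related crawlers, but the whole arrangement only works if crawlers actually honor the file, which is exactly the part that seems to be breaking down.

```
# Illustrative robots.txt: politely asks AI-training crawlers to stay out
# while leaving the site open to everything else. Compliance is voluntary.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```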