Good Riddance, GPTBot

Just as Google constantly indexes the Web, OpenAI is now crawling the open Web, scraping content from websites for free to train its LLM (lucrative language model) “AI” products.

But, as I learned from a post by Ethan on Mastodon, you can keep GPTBot from getting its tiny robot hands on your writing by adding these two lines to your website’s robots.txt:

User-agent: GPTBot
Disallow: /
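
If you want to check that those two lines do what you expect, here is a minimal sketch using Python’s standard-library robots.txt parser. The example.com URL and the is_allowed helper are placeholders of mine, not anything from OpenAI or Ethan’s post. Keep in mind that robots.txt is only a request: this shows how a well-behaved crawler would read the rules, not that any particular bot will actually honor them.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/notes/some-post"  # placeholder URL
    print("GPTBot:", is_allowed(ROBOTS_TXT, "GPTBot", url))        # GPTBot: False
    print("Googlebot:", is_allowed(ROBOTS_TXT, "Googlebot", url))  # Googlebot: True
```

Other crawlers, such as search engines, are unaffected because the rule group only names GPTBot.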

Good riddance, GPTBot! 👋

~

44 Webmentions

@matthiasott
@brunomiguel @matthiasott Well, looks like that's only the tip of the iceberg: https://searchengineland.com/google-content-available-ai-training-publishers-opt-out-430475 Google says all online content should be available for AI training unless publishers opt out
@matthiasott
@matthiasott I also did it about an hour ago. If my hosting allowed it, I would even block their IP ranges
@matthiasott
@matthiasott Hmm yes they will definitely respect this. Hmmm openai is very trustworthy. Mmmmm
@matthiasott
@frederic @matthiasott tech bros making the world a worse place, one stupid shit at a time
@matthiasott
@frederic this reminds me of the old days of getting infected with malware 😍 @matthiasott
@matthiasott
@brunomiguel @matthiasott Maybe we should start taking content off the net and ship discs (well, flash drives nowadays) to friends again instead. 🤷 Imagine getting a USB drive with random cool stuff every month, without knowing what's on it.
@janboddez
@janboddez Yes, I wrote that post on my smartphone in bed and hadn’t found the time to update the robots.txt myself yet. But thanks for the reminder! I just added the two lines. ✅
@matthiasott
@matthiasott I really wish there was a way to disallow all AI bots. I don't want to opt out of search indices but it doesn't seem like there's a good way to get out of Google's LLM models
