Blocking LLM/AI scraping

Can we block LLMs/AI from scraping our weblogs, profile pages, etc.?

Maybe like how it’s possible in Bear.blog via robots.txt, whatever that is.


Yes! If you create a paste.lol entry with the title robots.txt, that will serve as the robots.txt file for your profile page.

For weblog.lol, you can just create an entry that looks like this:

---
Date: 2024-01-01 00:01
Type: File
Content-type: text/plain
Title: robots.txt
Location: /robots.txt
---

User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
...
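If you want to double-check that your rules actually block the crawlers you intend to block, Python's standard-library `urllib.robotparser` can parse a robots.txt body and answer "can this user-agent fetch this URL?" A minimal sketch (the `example.com` URLs are placeholders, not your actual site):

```python
from urllib.robotparser import RobotFileParser

# The same rules as the weblog entry above (body only, no front matter).
rules = """\
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is disallowed everywhere; an unlisted bot falls through to "allowed".
print(rp.can_fetch("GPTBot", "https://example.com/weblog/post"))  # False
print(rp.can_fetch("SomeRandomBot", "https://example.com/"))      # True
```

This only tells you what a rule-following crawler *should* do; robots.txt is advisory, so it relies on the crawler choosing to honor it.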

Hope this helps!


Thanks Adam!

So for the profile page part, I only need to create a paste with robots.txt as the title? No text body or anything else?

Piggybacking off this: does this need to be on every page you don’t want scraped? Or just like make one page with this and the entire weblog is safe?

You’d only need to create a single robots.txt file and that’ll cover your entire weblog! You should be able to do it by just copying and pasting the example above into a new weblog entry. :+1:
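One thing worth noting: the example above only covers OpenAI’s crawlers. Other AI companies use their own user-agent tokens. The names below are commonly documented ones at the time of writing (they can change, so check each vendor’s own docs before relying on them); a fuller rules body might look like:

```text
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Keep in mind robots.txt is a polite request, not an enforcement mechanism — well-behaved crawlers respect it, but it doesn’t technically prevent scraping.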
