Prevent Search Engines from Indexing Certain Pages

Please excuse this perhaps silly question. Is there an easy way to exclude certain pages (not all) of my weblog from being indexed by search engines? (I’m thinking, for example, of the imprint page that state law requires.)


Sure, you can make a robots.txt file as described here: The Web Robots Pages

On weblog.lol, just create a new entry that looks like this:

---
Date: 2023-06-17 00:01
Type: File
Content-type: text/plain
Title: robots.txt
Location: /robots.txt
---

User-agent: *
Disallow: /path-to-hidden-page
Disallow: /path-to-another-hidden-page

Edit it as you see fit, and that will tell search engines where they’re not welcome to crawl/index. :+1:
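If you want to sanity-check the rules before publishing, here is a minimal sketch using Python’s standard `urllib.robotparser`. The paths are the placeholders from the example above, and the `is_allowed` helper is a hypothetical name used only for illustration. Note that robots.txt is a request that well-behaved crawlers honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the entry body above (placeholder paths).
ROBOTS_TXT = """\
User-agent: *
Disallow: /path-to-hidden-page
Disallow: /path-to-another-hidden-page
"""


def is_allowed(path: str, user_agent: str = "*") -> bool:
    """Return True if the rules above let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/path-to-hidden-page", "/some-public-post"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Rules match by path prefix, so `Disallow: /path-to-hidden-page` also covers anything that starts with that path.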


Thanks :pray::star_struck:


One short follow-up question: Do I have to do this on the website, or is there a way to upload this page via the GitHub Action as well? (I’ve tried, and it does not show up in the file section of the editor. I was wondering whether I need a certain file extension or subfolder, or whether it isn’t possible at all.)


For the time being, weblog.lol “files” are only recognized as entries with file metadata. I do hope to be able to support working with regular files as well at some point!

So there is no way to upload them via GitHub, right? I have to use the omg.lol website to do so? (Or can I save a file as .txt somewhere, put the file front matter in, and have it show up the way pages and posts do?)

You can definitely still upload them via GitHub. You just need to use the format shown above (with the front matter / metadata in the page, which I’ll paste below again for reference) and weblog.lol should recognize it as a “file entry”. I realize as I type this out that I’m making it sound more confusing than it is (since it’s still a file, but not a “regular” file, rather a file that represents a weblog entry that is itself a file, lol… oh yikes).

Since I probably made this clear as mud, let me know if you still need any help with it. But if you put a file in your GitHub weblog folder (using the GitHub Action for upload) with these contents, you should be good to go (though be sure to modify the Disallow directives for your specific stuff):

---
Date: 2023-06-17 00:01
Type: File
Content-type: text/plain
Title: robots.txt
Location: /robots.txt
---

User-agent: *
Disallow: /path-to-hidden-page
Disallow: /path-to-another-hidden-page

Thanks, I think I understood, but I was asking again because it does not work for me. I copied your example, changed only the Disallow part, saved it as robots.txt in the same folder where I have all my other stuff, and committed it. It does not show up, but other files I’ve added afterwards do. That is why I asked about the file extension / path: I thought it might have to be saved under a certain name or in a certain location.


Sorry for all the trouble!

Marco


How do you organize your files? I had to structure them like below, so they get uploaded or updated during build time:

/
.../configuration
.../weblog
...|__/files
...|__/pages
...|__/posts
...|__/templates

I have /configuration and /weblog. The template.html and configuration.txt are under /configuration, pages and posts are under /weblog, and so far it has worked. I put robots.txt under /weblog and even tried /configuration, but it did not show up :man_shrugging:. Maybe I should try your approach :thinking:.

It’s no trouble — I’m sorry that this is happening! Let me dig into it and see if I can make it behave correctly. Will update soon.

Okay, got it working; tried a few more things. Using “robots.md” instead of “robots.txt” as the actual filename in the /weblog dir made the file show up under Files on the web. :sweat_smile:


Ah, right, that’s it — every file has to end in .md to be recognized as a weblog entry. I’m working on relaxing that, but for the time being that was the issue. Sorry for the confusion!
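Putting the pieces of this thread together, here is a rough sketch of a script that generates the file entry with the `.md` extension. The function names, the `weblog` folder name, and the default date are assumptions taken from the examples in this thread, not anything weblog.lol prescribes.

```python
from pathlib import Path


def build_robots_entry(disallowed_paths, date="2023-06-17 00:01"):
    """Return the text of a weblog.lol file entry that serves /robots.txt."""
    front_matter = [
        "---",
        f"Date: {date}",
        "Type: File",
        "Content-type: text/plain",
        "Title: robots.txt",
        "Location: /robots.txt",
        "---",
    ]
    rules = ["User-agent: *"] + [f"Disallow: {p}" for p in disallowed_paths]
    # A blank line separates the metadata from the entry body.
    return "\n".join(front_matter) + "\n\n" + "\n".join(rules) + "\n"


def write_robots_entry(weblog_dir, disallowed_paths):
    """Write the entry as robots.md (not robots.txt) so it is picked up."""
    target = Path(weblog_dir) / "robots.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(build_robots_entry(disallowed_paths), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Assumed repository layout: entries live in a "weblog" folder.
    print(write_robots_entry("weblog", ["/path-to-hidden-page"]))
```

The served location still comes from the `Location: /robots.txt` line in the metadata; only the filename in the repository needs the `.md` ending.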


Hey, Adam, jumping in here just to clarify whether the --- prefix and suffix are actually required. I’m in the habit of keeping them on posts and pages, but for some reason none of my Files have them around the front matter, and they work fine. In general, is it best to have them?


They’re totally optional! The weblog.lol parser doesn’t care if they’re present or not; it just looks for Thing: Value patterns followed by two newlines and treats everything before the two newlines as post metadata. The separators are optionally supported for folks batch importing entries from other services that use YAML frontmatter.
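To illustrate the idea, here is a minimal sketch of that kind of parsing. It is not the actual weblog.lol parser, just an approximation of the behavior described above; `parse_entry` is a hypothetical name.

```python
def parse_entry(text: str) -> tuple[dict[str, str], str]:
    """Split an entry into (metadata, body) at the first blank line."""
    # Everything before the first two consecutive newlines is metadata.
    head, _, body = text.partition("\n\n")
    metadata: dict[str, str] = {}
    for line in head.splitlines():
        line = line.strip()
        # Optional YAML-style separators are simply skipped.
        if not line or line == "---":
            continue
        # Split on the first colon only, so values like "00:01" survive.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata, body


WITH_SEPARATORS = """---
Date: 2023-06-17 00:01
Type: File
Location: /robots.txt
---

User-agent: *
Disallow: /path-to-hidden-page
"""

# Same entry with the "---" lines removed.
WITHOUT_SEPARATORS = WITH_SEPARATORS.replace("---\n", "")

if __name__ == "__main__":
    print(parse_entry(WITH_SEPARATORS) == parse_entry(WITHOUT_SEPARATORS))
```

In this sketch both variants yield the same metadata and body, which matches the point that the separators are optional.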


Ahh, but what I’m reading there is that future-proofing myself (what if I need that?! :scream: ) might be the way to go and just have them there in case. Thanks!
