URLs with commas and ampersands (edited title since I have more info now)

Updated info at bottom of post


As the title suggests, a recent post with a comma in the title and the resulting URL did not strip that comma out.

In most places this doesn’t seem to be an issue, but I was running a performance test yesterday and that made me take a look at the URL, which is when I noticed the comma.

For reference, the performance test is https://www.websitecarbon.com/, which asks for a URL and then gives a general grade for the “greenness” of the page, based on criteria for your webpage’s carbon footprint.

It flagged the URL, as seen in this screenshot, and I can only assume it is the comma.

What’s odd is that this post is part of a series in which every title has a comma, yet it appears to be the only one whose URL kept the comma. I also have a much longer series (six posts) whose titles all have a comma, and the comma was stripped from every one of their URLs.

So I guess my questions are: why do you think this happened, and can it be fixed? Or could I perhaps use the URL mapping to send it to a different URL, etc.? Any ideas appreciated.

Edit: this is happening on today’s post as well, so now I wonder whether something changed in URL processing in the last two days. Here are the URLs with the commas:

  • https://weblog.anniegreens.lol/2023/11/diving-into-indiewebify-me-&-microformats,-part-vii
  • https://weblog.anniegreens.lol/2023/11/working-for-a-living-when-your-living-isn't-working,-part-iii
  • https://weblog.anniegreens.lol/2023/11/diving-into-indiewebify-me-&-microformats,-part-vi

(^Please note these no longer work, because I added a hard-coded Location without the special characters.)

Oh, dear! It looks like all the posts with a comma in the title whose URLs originally didn’t have a comma now do, and the previous URLs are 404-ing :grimacing:

^Ignore that. It looks like the post in question also had the comma in the URL, so there are three in total; I just didn’t notice the third one until now. I’ve added it to the list above.

LOL no, I was right, it wasn’t there before. It seems that if I edit one of these posts with a comma in the title, its previous comma-free URL gets regenerated with the comma, and the old URL, which I’ve linked from other posts because it’s a series, returns a 404. Eek, not sure what to do here.

Sorry to keep adding on here, but in addition to the commas, there are also ampersands in the URLs. I’m going to stop for tonight, but I think this might be a bigger issue, with both commas and ampersands not getting stripped. As long as I don’t edit anything else, it’s just the three URLs listed above: two have both commas and ampersands, and one has just a comma because its title has no ampersand.

FINAL UPDATE for tonight (I think): my workaround for now is to add the “Location” front matter and set it to the URL the post should have, or previously had, without the comma and ampersand. That way I won’t be sending people to 404s when they navigate between posts in a series. I also added a new URL map pointing one of the “wrong” URLs with the special characters to the “correct” version without them, because I know that post was getting traffic (thanks to Tinylytics) and I’d hate to have changed it and have a bunch of people hit a 404. I’ll now have a weird hit/Kudos mismatch, but oh well, better than 404s.

Well, it wasn’t a fun night. I forgot that editing the config regenerates post URLs, so adding the URL maps triggered every blog post with special characters in its title to acquire those characters in its URL. I obviously couldn’t let that stand: this isn’t a “new” blog anymore, those links are out there, they’ve been shared in some newsletters and other places, and a few of them still get a bit of traffic.

So, I’ve gone through and “hard-coded” every URL with the Location metadata. Still, there appears to be an issue with special characters now getting put into the automated URLs from posts with special characters in their titles.

As best I can tell, this started Wednesday. If you edit and save a post, even one published before Wednesday, it gets saved with a different URL containing the special characters, unless you add the Location metadata to the front matter with the preferred path without special characters. Editing the config triggers the same thing.
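To make the failure mode concrete, here’s a rough Python sketch of a slug rule that drops commas and ampersands, plus the override a hard-coded Location provides. This is not omg.lol’s actual code: the `slugify` and `post_path` names, the exact character rules, and the `/YYYY/MM/` prefix (inferred from the URLs above) are all assumptions for illustration, and the example title is inferred from its URL.

```python
import re
from datetime import date
from typing import Optional


def slugify(title: str) -> str:
    """Hypothetical slug rule, for illustration only (not omg.lol's)."""
    slug = title.lower()
    # Drop commas and ampersands outright (so "a&b" becomes "ab", not "a-b").
    slug = re.sub(r"[,&]", "", slug)
    # Turn every other run of non-alphanumerics (except apostrophes) into one hyphen.
    slug = re.sub(r"[^a-z0-9']+", "-", slug)
    return slug.strip("-")


def post_path(title: str, published: date, location: Optional[str] = None) -> str:
    """A hard-coded Location wins; otherwise derive /YYYY/MM/<slug> from the title."""
    if location:
        return location
    return f"/{published:%Y}/{published:%m}/{slugify(title)}"


print(post_path("Diving into IndieWebify.me & Microformats, Part VII", date(2023, 11, 20)))
# -> /2023/11/diving-into-indiewebify-me-microformats-part-vii
```

In this sketch the path is recomputed from the title whenever `post_path` runs without a `location`, so any change to the slug rules surfaces as a new URL the next time it’s recomputed; passing an explicit Location is what pins the old one.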

Probably something that should be fixed sooner rather than later or everyone will be getting 404s.

Oh, gosh, I am so sorry about this. This was a case of me fixing another unrelated bug a few days ago, and not realizing the possible downstream impacts. Ugh…

I’ve reverted the prior fix, so things should now behave as they used to. I feel terrible about what you went through, though! I need to work out some kind of “bat signal” that anyone encountering a serious issue can use to get more immediate help. Emailing help@omg.lol is usually a good idea, but ideally we’d have something even more attention-grabbing for times like this. I’ll put some thought into options for sounding a more effective alarm so that hopefully nobody has to go nuts trying to deal with a problem like this for any longer than needed. :sweat:

Eh, it was a holiday weekend and it seems like many places were having issues! Micro.blog had a server explode (probably not literally). Luckily, I found a workaround.

My last issue to clean up is related to using the URL map feature in the config so that people still trying to hit the old URLs with the special characters get pointed to the right place.

I don’t think this is doing a redirect in the backend, but I’m not sure what it is doing. Since I don’t have access to anything other than the config, is there anything I can set there so that the URLs actually being hit are the correct ones? This would clean up some Tinylytics stuff so that I don’t have separate logging for a handful of posts.

Currently, even with the URL map set for these URLs, they’re still registering as the old ones when they get hit in the analytics.

This is not a huge deal, just sort of bothersome.

Edit: oh, and for the bat signal, I think you hang out in other places more than Mastodon or here, so perhaps I need to get on Discord.

I’ve just updated the URL mapping behavior to perform a redirect, so you should be all set there now. Previously it was just serving the mapped content but leaving the requested URL alone — I think redirection makes a whole lot more sense, but my next step will probably be to add a configuration option so folks can choose their desired behavior.
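For anyone wondering why the analytics didn’t change before, here’s a minimal Python sketch contrasting the two behaviors described above. It is not omg.lol’s code: the `handle` function, the dictionaries standing in for the URL map and the page store, and the choice of a 301 status for the redirect are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def handle(path: str, url_map: Dict[str, str], pages: Dict[str, str],
           mode: str = "redirect") -> Response:
    """Serve `path`, consulting a hypothetical URL map first."""
    target = url_map.get(path)
    if target is None:
        # Not mapped: serve the page directly, or 404.
        if path in pages:
            return Response(200, body=pages[path])
        return Response(404, body="Not found")
    if mode == "redirect":
        # The browser follows Location and requests the target URL itself,
        # so scripts on that page (e.g. analytics) see the new URL.
        return Response(301, headers={"Location": target})
    # Alias: serve the target's content while the requested URL stays as-is.
    if target not in pages:
        return Response(404, body="Not found")
    return Response(200, body=pages[target])
```

With the redirect, the browser makes a second request for the target, so anything running on the page logs the new URL. With the alias, the requested URL never changes, which is presumably why hits kept registering under the old paths.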

Also, check your email!

Okay, this makes sense for these items, but there are two maps where I’d like to retain the original URL (my domain landing is one; if you’ll recall, I think that was why you created the map function to begin with).

Any chance of setting this per map? For example, this list uses redirects and these others don’t.

Love the idea of granular control as part of the mapping itself! I’m currently torn between two approaches:

Symbol-based:

URL map: site.tld/foo -> /other-foo
URL map: site.tld/bar => /other-bar

Where -> signifies a redirect, and => signifies an alias (i.e. the old behavior).

Directive-based:

URL map: site.tld/foo -> /other-foo [redirect]
URL map: site.tld/bar -> /other-bar [alias]

With the directive-based approach, omitting the directive would simply have a default behavior (redirection).

Any strong preference on which one to use? :smile:

I like the directive approach, as it’s more obvious, at least to me, when reading the config.

Thanks! Will get it up and running today and will confirm when ready. :+1:

Just curious: did this get implemented with the per-map control?

Finally, yes, just now! Redirection remains the default behavior; adding [redirect] will also redirect, and adding [alias] will preserve the existing URL but show the content referenced in the mapping. :+1:
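Here’s a rough sketch of how a line in the directive-based format might be parsed, with redirection as the default when no directive is given. The `URL map:` syntax comes from the examples above; the `parse_url_map` function, the `UrlMap` tuple, and the decision to reject unknown directives are assumptions for illustration, not how omg.lol actually handles it.

```python
import re
from typing import NamedTuple


class UrlMap(NamedTuple):
    source: str
    target: str
    behavior: str  # "redirect" or "alias"


# "URL map: <source> -> <target> [directive]", where the directive is optional.
MAP_LINE = re.compile(
    r"^URL map:\s*(?P<source>\S+)\s*->\s*(?P<target>[^\s\[]+)"
    r"\s*(?:\[(?P<directive>\w+)\])?\s*$"
)


def parse_url_map(line: str) -> UrlMap:
    match = MAP_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a URL map line: {line!r}")
    # No directive means the default behavior: redirect.
    directive = (match.group("directive") or "redirect").lower()
    if directive not in ("redirect", "alias"):
        raise ValueError(f"unknown directive: {directive!r}")
    return UrlMap(match.group("source"), match.group("target"), directive)


print(parse_url_map("URL map: site.tld/bar -> /other-bar [alias]"))
# -> UrlMap(source='site.tld/bar', target='/other-bar', behavior='alias')
```

Rejecting an unrecognized directive, rather than silently falling back to a redirect, is one way to surface typos like `[alais]` instead of quietly changing a page’s behavior.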

Thanks for getting this in!
