URL encoding using a `+` instead of `%20` for a space

The paths of many websites are navigated based on file names or pre-defined names, and these paths are basically not processed

I spoke incorrectly. The work you’re doing may well be related, but the description you supplied is not. If your technique keeps spaces in the generated URL, then it’s not going to do at all what I want.

I’m not sure I understand everything on that page, but if I use the permalink button, I get something like this copied to the clipboard:

https://shared.tiddlyhost.com/#New Tiddler

But that’s not what I want. If I paste that in a chat or in an email, the link will look like this:

https://shared.tiddlyhost.com/#New

to a nonexistent tiddler New. This is one of the main reasons URLs are encoded.

I’m proposing merely that we change the encoding of spaces from the percent-encoding of %20 to +. I am absolutely not trying to change the habit of spaces in tiddler titles. I think those make for much nicer reading than WordsRunTogether or Underscore_Delimited_Titles or Titles+Using+Plus+Signs.

At the moment, if I try to load one of these urls, it tries to open a nonexistent tiddler:

  • https://shared.tiddlyhost.com/#New_Tiddler
  • https://shared.tiddlyhost.com/#New+Tiddler

If my proposal were in place, then the second one of those would load

https://shared.tiddlyhost.com/#New%20Tiddler
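A minimal sketch of the loading side of this idea, assuming a hypothetical helper named `decodeFragment` (it is not a TiddlyWiki core function). It treats a literal `+` in the fragment as a space and leaves everything else to `decodeURIComponent`, so links that already use `%20` would decode the same way as before:

```javascript
// Hypothetical helper, not part of the TiddlyWiki core.
// Treats a literal "+" in the fragment as a space, then percent-decodes.
const decodeFragment = (hash) => {
  const fragment = hash.startsWith('#') ? hash.slice(1) : hash
  return decodeURIComponent(fragment.replace(/\+/g, '%20'))
}

console.log(decodeFragment('#New+Tiddler'))   // New Tiddler
console.log(decodeFragment('#New%20Tiddler')) // New Tiddler
```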

I’m glad to see that you’re doing related work, but the point of the question was not to get someone to do the work but to find out if there is any appetite for updating the core with this behavior. If the answer turns out to be yes, I will happily try to figure out how to do it effectively.

The reasons for the + sign rather than a - are twofold. First, in English, at least, - is a reasonably common character in titles, for hyphenated words. Second, there is a fair bit of prior art in the use of + replacing spaces in URLs. I mentioned Confluence, but have used others, and the most ubiquitous is probably search engines using it for search queries in URLs:

The underscore (_) would be a reasonable alternative for the first objection, but the prior art makes the plus sign (+) more familiar.

Yes, a change to this function, and one to openStartupTiddlers might be all it takes to accomplish my goals. The question is whether the core team would be interested in such changes.

Yes, although there are other issues with using encodeURI that might come into play. I was thinking that we could either wrap the current encodeURIComponent-based technique or separate out the fragment and encode it separately from the rest.

This sounds like overkill to me. Although I haven’t discussed Permaviews so far, or really thought about it much, it seems like we can also use the pchar | "/" | "?" syntax by using a tilde (~) as an additional separator, so instead of this monstrosity:

https://tiddlywiki.com/#Procedure%20Calls:%5B%5BProcedure%20Calls%5D%5D%20%5B%5BProcedure%20Definitions%5D%5D%20%5B%5BProcedure%20Parameter%20Handling%5D%5D

we might be able to use something like this:

https://tiddlywiki.com/#Procedure+Calls:Procedure+Calls~Procedure+Definitions~Procedure+Parameter+Handling

Or we could let the colon do double-duty like this:

https://tiddlywiki.com/#Procedure+Calls:Procedure+Calls:Procedure+Definitions:Procedure+Parameter+Handling

Much of this is speculative, as I don’t really understand the format. Is the first title in the list the most recently focused tiddler? Obviously this would take more work than the simple two functions above, but it should be doable.

I’m hoping that my suggestion would be able to work so that this isn’t an issue at all. There would be a minor backward incompatibility as you noted above, but that’s only for hand-crafted URLs including + characters that appear in the tiddler title. This wouldn’t affect any urls generated within TW.

Can you give an example of what you mean? If the file name contains spaces or most punctuation characters, those will need to be encoded somehow.

The problem is that we need/should go with web standards. Have a look at my first 2 links in my first post. I think that’s the related standards for the format mentioned in the OP.

IMO it’s not 100% standard, because it’s usually used with HTML POST requests, which is not the case for our use case. But it seems to create valid URLs that our forum renders as a link.

I’m interested to implement it but we need to do more experiments.

I’m on mobile at the moment. So do not expect something from my side this week.

I’m pretty sure my proposal complies with them entirely. And it seems much simpler than this POST-style one.

I did spend a little time to code a first pass at the mechanism to code and decode such urls. I don’t know if it could work as simply as this, but my hope would be that finished versions of these functions could simply take the place of encodeURIComponent and decodeURIComponent in the relevant spots. There are almost certainly corner cases not covered here, but this might show the possibility.

The functions look like:

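// Rewrites a permalink/permaview URL so spaces in the fragment appear as "+"
const encode = (input) => {
  // Strip the [[ ]] wrappers used around titles that contain spaces
  const sub = (s) => s.replace(/\[\[|\]\]/g, '')
  const url = new URL(input)
  const fragment = url.hash && decodeURIComponent(url.hash.slice(1))
  if (fragment) {
    // Text before the first colon, then the title list after it
    const [focus, list = ''] = fragment.split(':')
    const parts = list.split(/\[\[|\]\]\s?/).filter(Boolean).map(s=>s.trim()).map((s) => s.includes(' ') ? `[[${s}]]` : s).filter(Boolean)
    const encoded = sub(focus) + (parts.length ? (':' + parts.map(sub).join('&')) : '')
    url.hash = encoded
  }
  // The URL API percent-encodes the remaining spaces; swap each %20 for "+"
  return url.toString().replaceAll('%20', '+')
}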

and

// Turns a "+"-style URL back into the space-and-[[ ]] form
const decode = (input) => {
  // Percent-encode pluses, spaces and brackets so the URL API accepts them
  const sub = (s) => s.replaceAll('+', '%20').replaceAll(' ', '%20').replaceAll('[', '%5B').replaceAll(']', '%5D')
  const url = new URL(input)
  const fragment = url.hash && url.hash.slice(1)
  if (fragment) {
    const [focus, list = ''] = fragment.split(':')
    // Re-wrap any list entry that contains a space in [[ ]]
    const parts = list.replace(/\+/g, ' ').split('&').map(s => s.includes(' ') ? `[[${s}]]` : s).filter(Boolean)
    const decoded = sub(focus) + (parts.length == 0 ? '' : (':' + sub(parts.join(' '))))
    url.hash = '#' + decoded
  }
  return decodeURIComponent(url.toString())
}

The idea is that if we start with something like this, which seems to be our input to this process:

https://tiddlywiki.com/#Procedure Calls:[[Procedure Definitions]] HelloThere [[Procedure Calls]] [[Procedure Parameter Handling]]

and call encode on it, we would get this:

https://tiddlywiki.com/#Procedure+Calls:Procedure+Definitions&HelloThere&Procedure+Calls&Procedure+Parameter+Handling

Similarly, https://tiddlywiki.com/#Procedure Calls would become https://tiddlywiki.com/#Procedure+Calls

Then for decoding

https://tiddlywiki.com/#Procedure+Calls:Procedure+Definitions&HelloThere&Procedure+Calls&Procedure+Parameter+Handling

would become

https://tiddlywiki.com/#Procedure Calls:[[Procedure Definitions]] HelloThere [[Procedure Calls]] [[Procedure Parameter Handling]]

which is exactly what we want. Moreover, an existing Permalink/Permaview URL, such as

https://tiddlywiki.com/#Procedure%20Calls:%5B%5BProcedure%20Definitions%5D%5D%20HelloThere%20%5B%5BProcedure%20Calls%5D%5D%20%5B%5BProcedure%20Parameter%20Handling%5D%5D

would yield the same value, so there should be very little in the way of backward compatibility problems. (This is the same result we would get if we called decodeURIComponent on that existing Permalink/Permaview URL.)

I’m leaving in the morning for a funeral, but I hope to spend some time on this over the weekend.

I have gained a thorough understanding of this issue in this thread. I will try to put my conclusions and views succinctly rather than showing the details, because I have failed to communicate my conclusions so far.

My recommendation:

  • The OT: No, I do not think we should change the URL fragment encoding.
  • To solve your problem I suggest tiddlers you wish to share be given titles that use underscore _ not + to replace the spaces.
    • Making sure relink updates the mention of the title elsewhere.
    • To automate we can create a button to rename ones with spaces with _
  • If you don’t like the look of _ or can’t rename the tiddler, I presented a whole range of tips and tricks which we can automate. I will address this separately if asked.

Using the existing permalink;

The URI encode decode

  • Under no circumstances should we interfere with the encodeuri and decodeuri operators or the JavaScript functions; these are fixed international standards and will be relied upon by existing plugins and TiddlyWiki scripts.

I will address the use of other symbols in a separate reply to keep this simple.

If for some reason there is a compelling reason to still generate URI’s with the fragment ONLY containing the additional characters ; / ? : @ & = + $ , # there are two approaches;

  • Create a custom permalink button that does not encode the above characters?
  • Modify the core tm-permalink and/or tm-permaview messages to use encodeURI() rather than encodeURIComponent(), for the title/fragment only.
    • This stops the encoding of the above characters in the title
      • It also stops spaces being encoded as %20 but this could be rectified.
      • Most people using this method possibly want to avoid spaces in the title anyway.
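For reference, the difference between the two built-in functions can be seen directly. This is only an illustration with a made-up title; it does not show how the core builds permalinks:

```javascript
// Made-up title containing several of the characters under discussion.
const title = 'Q&A: pros/cons, v1+v2?'

// encodeURIComponent percent-encodes the reserved characters as well.
console.log(encodeURIComponent(title)) // Q%26A%3A%20pros%2Fcons%2C%20v1%2Bv2%3F

// encodeURI leaves ; / ? : @ & = + $ , # alone, but still encodes the space.
console.log(encodeURI(title))          // Q&A:%20pros/cons,%20v1+v2?

// Neither function touches these punctuation characters.
const untouched = "-_.!~*'()"
console.log(encodeURIComponent(untouched) === untouched) // true
```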

Tiddler naming

  • All tiddlers in a wiki automatically result in a URI at which an appropriate encoding of it will open. The question is what encoding do you use when sharing?
  • Every tiddler automatically has a URI, and if it does not contain spaces and uses only A–Z a–z 0–9 - _ . ! ~ * ' ( ) it will not be percent-encoded.
    • the use of _ - . instead of spaces are perhaps the most appropriate.
  • If you are happy to have tiddler titles without spaces then the tiddler and its URI are the same thing.
  • If you would prefer not to see the above characters in a title, when the tiddler is displayed you can use a cascade that alters the display of the title as if it contained spaces.

Want a different URI for the same title? Or more than one URI

  • If you have the tiddler Filter Filter Run Prefix (Examples) create a new tiddler named Filter_Filter_Run_Prefix_(Examples) and inside it place {{Filter Filter Run Prefix (Examples)}}
    • This is what my masquerade tool does.
  • You now have two available URIs, and whichever you publish, they see the same contents.
  • For cases where people are unlikely to read the URI you share you can create a very short tiddler title 01 transcluding the original tiddler and share that.
    • This is in effect your own wikibased short uri tool like bit.ly etc…
    • best for when someone needs to type it in.
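A rough sketch of the stand-in tiddler described above, written as plain JavaScript rather than as the actual masquerade tool. The helper name `makeAliasTiddler` and the shape of the returned object are assumptions for illustration; only the `{{Title}}` transclusion syntax comes from the posts above:

```javascript
// Hypothetical helper: builds the fields for a stand-in tiddler whose
// title has no spaces and whose body transcludes the original tiddler.
const makeAliasTiddler = (title, separator = '_') => ({
  title: title.replaceAll(' ', separator),
  text: `{{${title}}}` // TiddlyWiki transclusion of the original
})

const alias = makeAliasTiddler('Filter Filter Run Prefix (Examples)')
console.log(alias.title) // Filter_Filter_Run_Prefix_(Examples)
console.log(alias.text)  // {{Filter Filter Run Prefix (Examples)}}
```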

Other methods remain available

Ok. I disagree, but I certainly can accept that you don’t like the idea.

This might solve other people’s problems, but not mine. There will be some non-technically-savvy editors for a wiki I’m working on. I want them to be free to use spaces in the titles wherever they see fit. But I also want to make the Permalink urls as pretty and as simple as possible. (Permaview is a much lesser concern, but I would expect that a solution to one would have to solve the other.)

Perhaps I didn’t make it clear: I would certainly not suggest interfering with these operators or JS functions. Not at all. In fact, my sample uses those JS functions.

The relevant specifications are around how a URL is structured, which parts of it are required, what characters are allowed in which parts. For example, this is a perfectly legal URL:

https://%67oogle.com/,

since percent-encoded characters are allowed in the hostname and %67 corresponds to the letter g. According to the spec, this is neither better nor worse than

https://google.com.
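This can be checked with the standard `URL` class available in browsers and Node.js, which percent-decodes the hostname while parsing:

```javascript
const encodedHost = new URL('https://%67oogle.com/')
const plainHost = new URL('https://google.com/')

console.log(encodedHost.hostname)                // google.com
console.log(encodedHost.href === plainHost.href) // true
```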

My suggestion is simply that among the many different ways we might encode a space within the fragment section of a url, we choose to use + instead of %20. Note that en/decodeURI and en/decodeURIComponent are not meant to create a canonical encoding, only to create a legal one. This proposal does nothing more than suggest a different legal encoding.

This does have one backward incompatibility problem. My take is that it’s extremely minor; others might disagree, though. The problem is that if someone handcrafted a URL for a page whose name included a +, keeping the + character intact, the resulting url would no longer point to a tiddler, since the + would be interpreted as a space.
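To make that incompatibility concrete, here is a small sketch. The decoder `plusDecode` and the title are made up for illustration; the point is only that a `+` produced by `encodeURIComponent` arrives as `%2B` and survives, while a literal `+` typed by hand does not:

```javascript
// Hypothetical "+ means space" decoder, as discussed in this thread.
const plusDecode = (fragment) =>
  decodeURIComponent(fragment.replace(/\+/g, '%20'))

const title = 'C++ Notes' // made-up title containing plus signs

// A fragment generated with encodeURIComponent encodes "+" as %2B.
const generated = encodeURIComponent(title)
console.log(generated)             // C%2B%2B%20Notes
console.log(plusDecode(generated)) // C++ Notes

// A hand-crafted fragment that keeps the plus signs literal.
const handCrafted = 'C++%20Notes'
console.log(JSON.stringify(plusDecode(handCrafted))) // "C   Notes"
```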

Please take my “argument” as from an egoless position. I am only trying to further good coding and methods. You do what you want, but I would be remiss if I did not raise my concerns.

  • And generate URI’s with the fragment proportion / tiddlername easy to read!

So to achieve this outcome I would recommend the following;

  • In fact it’s so good I will build it, perhaps called permalink-protection
  • I am still tempted to only use _ to replace spaces.

So if we modified the permalink button to retrieve the title, replace spaces with your nominated character, + or _ and behind the scenes create a tiddler using the reformatted title, and transcluding the original tiddler from within it, the new title will be a “pretty link as needed”.

  • This is done with only an alternative permalink button.
  • This also has the advantage of indicating which tiddler titles someone has used to generate a link, because the alternate tiddler title now exists (if saved)
    • This is critical to retain Search engine optimisation and keep links valid.
    • Even if the original tiddler is renamed the old link will remain valid, relink will rename the transclusion within it and the old name will display the new tiddler, and the new name comes up in search.
    • Any link published externally would remain valid despite changes on the site
  • Perhaps we would hide these “extra” tiddlers from the default search so they only see one.

FYI

  • Officially this is not “legal” it should use encodeURIComponent for the whole link, fragment included. In fact especially the fragment from what I read, although some are starting to break this rule.

This is an interesting proposal, with its biggest advantage over mine of not requiring any changes at all to the core.

Of course mine has the advantage of making prettier URLs for all users. It is also a familiar mechanism to many, used in search engine URLs as well as other places.

I’m off to bed, then likely offline for the next two days. I will try to catch up when I’m back. But thank you for a fascinating discussion.

– Scott

Here is another related file:

https://github.com/Jermolene/TiddlyWiki5/blob/642f8da6ed4210af9552858efaa66988e3b255ed/core/modules/server/routes/get-tiddler-html.js#L20

Hi @Scott_Sauyet using + to encode a space character is only allowed within the query string part of URLs, and not in the main path of the URL.

Here’s a Stack Overflow article that gives a good overview of the situation:

The key passage is:

The use of a ‘%20’ to encode a space in URLs is explicitly defined in RFC 3986, which defines how a URI is built. There is no mention in this specification of using a ‘+’ for encoding spaces - if you go solely by this specification, a space must be encoded as ‘%20’.

The mention of using ‘+’ for encoding spaces comes from the various incarnations of the HTML specification - specifically in the section describing content type ‘application/x-www-form-urlencoded’. This is used for posting form data.

As far as I can tell, any encoding used in fragment identifiers is a matter for the application, and is not affected by the spec.

TiddlyWiki uses URL encoding in a few situations:

  • to construct permalink URLs
  • to construct file paths from tiddler titles when saving static renderings
  • to construct HTTP paths to each tiddler

I think that the only one that meets the criteria for using + to encode spaces is the permalink. Changing the other usages would, for example, break file paths on Windows which doesn’t allow + in filenames.

Changing the encoding used in permalinks wouldn’t in fact require a way to distinguish between the old and new encodings; a particular instance of TiddlyWiki would interpret permalinks according to its prevailing encoding. Upgrades would work, too: existing links to tiddlers with spaces in their titles would still have %20 encoding, but it would still be decoded correctly. I think the only issue would be with existing permalinks to tiddlers with a “+” in their title, which seems tolerable.

So, I think your proposal is feasible as a core change. However, I have always expected that we’d solve the problem of making prettier permalinks through some kind of slug mechanism, where the system generates a unique, readable URL fragment for each tiddler title. That way we would end up with urls like https://tiddlywiki.com/#help-improve-tiddlywiki, and any punctuation etc would be removed.

Ok. Trying to type a response on my mobile phone as an automobile passenger. Let’s see how it goes.


First of all @jeremyruston, thank you for the response. I was planning on working this through a bit more and then if it still seemed feasible, raising an issue (and perhaps a PR) on GitHub. If that doesn’t make sense, please let me know.

I think it’s subtly different than that. The specification says that the space character cannot appear anywhere in a URL, and offers the percent-encoding as a way to encode arbitrary bytes, including the space character. Especially in the fragment part of the URL, though, there’s no prescription of what those characters actually mean. TW is already slightly altering the usual HTML understanding of the fragment as a DOM id, even if the solution is in the same spirit.

Yes, I read that before I asked and reread the specification. Some of the answers are reading in a proscriptiveness I can’t find in the spec.

What do you mean by that third one?

That was my conclusion as well.

I think so, but I’m still not sure whether or not it’s a good idea.

Although that would be both more familiar and more readable than my + solution, I can’t see how to resolve the ambiguities, especially given TW’s dynamic nature.

Would a visitor arriving with #more-coffee be offered the choice between the three existing tiddlers More Coffee?, More Coffee!, and MORE COFFEE!!!? Or would we arbitrarily choose one? It only gets worse with permaviews.
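The ambiguity is easy to reproduce. The `slugify` rule below is only one plausible guess at a slug mechanism (lowercase, with runs of other characters collapsed to a hyphen); nothing in this thread specifies the actual rule:

```javascript
// Hypothetical slug rule: lowercase, collapse every run of characters
// other than a-z and 0-9 into "-", then trim hyphens from both ends.
const slugify = (title) =>
  title.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '')

const titles = ['More Coffee?', 'More Coffee!', 'MORE COFFEE!!!', 'Help improve TiddlyWiki']

// Group the titles by slug to expose collisions.
const bySlug = new Map()
for (const t of titles) {
  const slug = slugify(t)
  bySlug.set(slug, [...(bySlug.get(slug) ?? []), t])
}

console.log(bySlug.get('more-coffee'))             // all three coffee titles
console.log(bySlug.get('help-improve-tiddlywiki')) // [ 'Help improve TiddlyWiki' ]
```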

So I have attached my permalink protection button. It’s a working proof of concept: on any tiddler, click it and it will copy a permalink to the clipboard with the spaces in the tiddler’s title replaced with _, and create a matching tiddler. We can use different rules if we wish. permalink-protection_button.json (2.1 KB): drop it on a TiddlyWiki to see it work, and capture a permalink on a tiddler with spaces. Look in Recent to see what was created and look at or edit it.

  • It would be a simple matter to hide such permalink protection tiddlers from the general search. -[object-type[permalink-tiddler]]
  • Even if you rename the source tiddler, with relink installed the same URI will show the same content.
  • If you ever publish a URI then if you never delete your “permalink protection tiddlers” then not only will the same content be displayed, but you can choose to redirect previously published URI’s to different content in the same wiki.
  • We can even adjust it to appear with the same space separated title.
  • This is better than adding slug to a tiddler because it is change tolerant and you can gather information about the links you have shared outside the wiki.

For me personally as a user that’s a “no go”. I do not want to create 2 tiddlers to be able to use 1 of them. – Especially if I want to publish links from sites I do not control.


I think the “masquerade” concept is OK, but it’s plugin territory.

If I needed to implement a system like this one, I’d like to use my “aliases” field, which is also part of a plugin.

My aliases field already contains “short titles” for tiddler-links. There would be no problem, to add a new value or use an existing one for the permaview or permalink.

But as I wrote – That’s plugin territory

My solution is about publishing links into the wiki, to tiddlers, not publishing links to other resources from the wiki. It’s a single tiddler solution.

  • I am creating a second tiddler for a reason
    • it can be renamed if desired.
  • however this discussion has prompted me to look into incoming links as I have on other web technologies.
  • I can see extending the solution I gave with additional features.

I am sure we could use your alias tools here but this came from encoding permalinks.

I am not sure about your emphasis on plugins. All my solutions are json packages, but could be made plugins. If what you mean is it core or plugin, I vote plugin and argued not to do core changes at the top of the OT.

  • one advantage of masquerade is bringing system tools into the tiddlers name space and searchable without editing shadows.
  • it would be worth looking if aliases can be used this way.

This has been raised as an issue on GitHub, with a Work-in-Progress pull request demonstrating an initial approach.

@Scott_Sauyet

Can you please clarify for me?

  • Is this intended to replace all permalinks that contain spaces to + ?
  • Can it be switched on/off?
  • Is it only applied to the tiddler titles, that is the “fragment”, or the whole URI?
  • What If we prefer underscore _ ?

I too would be in favor of encoding fewer characters in the search or fragment parts of the permalink, to make it more readable, but I believe this extended set needs to be configurable. One reason is that if someone wishes to use permalinks for GET or POST requests, it is not possible without the existing encoding. This is now possible with “HTTP Requests in WikiText”.

Yes, although only in the fragment (starting with # to the end of the url). It does more, transforming permaviews the same way, keeping the : separator between the focused tiddler and the full list, and putting commas between the list elements. Right now, if we have a current tiddler of Foo Bar Baz and also have open Qux and Corge Grault, we separate the current tiddler from the list with a colon (:), space-separate the list, wrapping any elements that include spaces in [[ ... ]], to get a basic fragment of

#Foo Bar Baz:[[Foo Bar Baz]] Qux [[Corge Grault]]

which we then use JavaScript’s encodeURIComponent to turn into this:

#Foo%20Bar%20Baz:%5B%5BFoo%20Bar%20Baz%5D%5D%20Qux%20%5B%5BCorge%20Grault%5D%5D

My suggestion is that we instead turn it into

#Foo+Bar+Baz:Foo+Bar+Baz,Qux,Corge+Grault
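The two forms can be reproduced side by side. This is a sketch under assumptions, not the core implementation or the code in the pull request: `currentFragment` and `proposedFragment` are hypothetical names, and the first only mirrors the description given above.

```javascript
// Current form as described above: space-separated list, [[ ]] around
// titles with spaces, each side passed through encodeURIComponent.
const currentFragment = (focus, open) => {
  const list = open.map((t) => (t.includes(' ') ? `[[${t}]]` : t)).join(' ')
  return '#' + encodeURIComponent(focus) + ':' + encodeURIComponent(list)
}

// Proposed form: "+" for spaces, "," between list entries.
// encodeURIComponent already turns a literal "+", "," or ":" inside a
// title into %2B, %2C or %3A, so those stay unambiguous.
const encodeTitle = (t) => encodeURIComponent(t).replaceAll('%20', '+')
const proposedFragment = (focus, open) =>
  '#' + encodeTitle(focus) + (open.length ? ':' + open.map(encodeTitle).join(',') : '')

const open = ['Foo Bar Baz', 'Qux', 'Corge Grault']
console.log(currentFragment('Foo Bar Baz', open))
// #Foo%20Bar%20Baz:%5B%5BFoo%20Bar%20Baz%5D%5D%20Qux%20%5B%5BCorge%20Grault%5D%5D
console.log(proposedFragment('Foo Bar Baz', open))
// #Foo+Bar+Baz:Foo+Bar+Baz,Qux,Corge+Grault
```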

While this is a definite improvement in those permaviews, it is to my eyes immensely better for permalinks. This:

https://tiddlywiki.com/#Working+with+TiddlyWiki

is tremendously nicer to look at than this:

https://tiddlywiki.com/#Working%20with%20TiddlyWiki

No, it’s intended to be a backward-compatible replacement for loading previously-generated links.1 And it would entirely supplant the current permalink/view creation. I see very little reason to ever want to turn it off. Do you see one?

Only to the fragment.

If the community prefers underscores to plus signs, giving us instead

https://tiddlywiki.com/#Working_with_TiddlyWiki

I’m perfectly happy with that. + is what came to mind first, as I think it’s more common to see, but Wikipedia is one of the biggest sites on the internet, and it uses _. That would be fine. It has an additional advantage in that _ seems less likely to be used in a title than +. But we can bikeshed later over the specific character to use if we decide this overall is a good idea.

But, I don’t see much sense in making this customizable. First, it would make decoding/loading much more difficult. Second it would lead to inconsistencies between wikis for little benefit. Third, the most obvious form of customization would be to allow the wiki creators to supply the character to use in replacing spaces; this would then lead to problems if they choose characters illegal in the fragment, as seems all too likely.

I’m not clear where this is coming from, but I’m pretty sure it’s irrelevant. The fragment is not supposed to be supplied to the server at all, and I’m pretty sure that no major JS clients actually do so. If for some reason a permalink is supplied as a query parameter or part of the body of an HTTP request, then it would presumably be encoded as otherwise required, but that would be layered atop whatever we’ve already done to encode it.
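The separation is visible in the standard `URL` class, where the fragment is its own component. In this sketch the `?lang=en` query is made up, and `requestTarget` is just an illustrative name for the path-plus-query part that an HTTP client places on the request line:

```javascript
// The query string here is a made-up example.
const url = new URL('https://tiddlywiki.com/?lang=en#Working+with+TiddlyWiki')

// Path and query are what go to the server; the fragment stays client-side.
const requestTarget = url.pathname + url.search
console.log(requestTarget) // /?lang=en
console.log(url.hash)      // #Working+with+TiddlyWiki
```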


1 There is a caveat. If someone has a tiddler whose title contains a + or a ,, and they’ve hand-crafted a permalink to it, rather than using the one generated by TW, and that includes the + or , directly, without percent-encoding, then it will link to the wrong place, likely a non-existent tiddler, but conceivably an existing one.

@Scott_Sauyet to cut a long story short, if you are prepared to use the underscore instead, I would be happier, it is to do with encodeURI and encodeURIComponent standards. The plus + is not inside published standards, even in the fragment, although it will work in many cases.

  • Ideally this would be opt-in or opt-out, because I can foresee cases where the existing permalink encoding method is needed.

    • The following characters need not be encoded with either encodeURI or encodeURIComponent: A–Z a–z 0–9 - _ . ! ~ * ' ( ). Note that the _ underscore is among them; + is not.
    • The following additional characters are also valid when encoding with encodeURI: ; / ? : @ & = + $ , #. The + symbol and others are in this subset. So if we modify your proposed change to “not encode these additional characters”, the permalinks will look even easier to read.
      • However, I have read that not all systems may consider this a valid URI, even if only used in the fragment. Thus I believe we need to allow people to opt out or in.
  • I have described this issue earlier in this thread, but I am happy to give more details if you request them.

I have an alternative way to patch this to achieve the above, rather than using a replace with +, which I can show you; we just need to add an opt-in/opt-out process.