An Experiment - UUID v7 for TiddlyWiki

This is fantastic! I really like this idea. I still want my human-readable urls, but this helps fill that gap, while adding a much-needed immutable id to tiddlers.

This was also my first exposure to Crockford’s base32, and I’m quite impressed. The check-digits are a little odd, expanding the alphabet for little gain, but otherwise wonderful.

I’ve only read the docs, but will try it out as soon as I’m done with my current time-consuming project.

Thank you very much for sharing!

Yea, I wanted to know if I could shorten the UUIDv7 format and found ULID, which has a different use case and is encoded using Crockford base32. My plugin only uses the Crockford base32 encoding; the 128 bits are still UUID v7.

c32 in the end is not much shorter, but I think it is a bit more memorable for “geeky” humans.
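For anyone who wants to see the length difference, here is a minimal sketch of encoding a 128-bit UUID as 26 Crockford base32 characters. It is not the plugin's code; the function names `c32_encode` and `c32_decode` are made up for illustration, and the sample value is the UUID v7 example from RFC 9562.

```python
import uuid

# Crockford base32 digits: 0-9 and A-Z without I, L, O, U
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def c32_encode(value: int) -> str:
    """Encode a 128-bit integer as 26 Crockford base32 characters (hypothetical helper)."""
    if not 0 <= value < 1 << 128:
        raise ValueError("value must fit in 128 bits")
    chars = []
    for _ in range(26):          # 26 * 5 = 130 bits, so the top 2 bits are always 0
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))

def c32_decode(text: str) -> int:
    """Decode a c32 string; hyphens are ignored and case does not matter."""
    value = 0
    for ch in text.replace("-", "").upper():
        value = value * 32 + CROCKFORD.index(ch)
    return value

# UUID v7 example value taken from RFC 9562
example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
encoded = c32_encode(example.int)
print(len(example.hex), len(encoded))  # 32 hex characters versus 26 c32 characters
```

This sketch does not map the look-alike letters I, L and O back to 1 and 0 when decoding, which the Crockford spec allows.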

The checksum can be used if c32 values are “transferred” over telephone. It is an easy way to validate the whole string. …
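As a rough sketch of how that validation works: Crockford's check symbol is the value modulo 37, written with the 32 normal digits plus five extra symbols. The function names below are invented for this example and are not taken from the plugin.

```python
# 32 normal digits followed by the five check-only symbols (values 32..36)
CHECK_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U"

def check_symbol(value: int) -> str:
    """Return the Crockford check symbol for an integer (value mod 37)."""
    return CHECK_ALPHABET[value % 37]

def is_valid(value: int, symbol: str) -> bool:
    """Compare a received check symbol against the recomputed one."""
    return check_symbol(value) == symbol.upper()

number = 1234567
symbol = check_symbol(number)
print(is_valid(number, symbol))      # the untouched value passes
print(is_valid(number + 1, symbol))  # a changed value fails
```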

I did find it fun to experiment with time “range sets”. I think a series of the last random elements could make a nice immutable link that survives tiddler renaming.

Me too. – I know, my + separator will clash with your concept. But mine is only a concept, an experiment. It would be nice if we could make it compatible.

This is great!

Could we get a b64 version that’s encoded in only 15 chars??? Or are there b64 poison characters that would make this nontrivial for TW?

With b64url, if you only encode the random elements with a version number it could be done with 14 characters or so, but we additionally need the timestamp, if it should replace the created field.

But the goal was, one field to get both, timestamp + random part. The field name: “created” also needs 7 characters. So it also adds to the tiddler size.
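To illustrate the “one field for both” idea, here is a small sketch that pulls the 48-bit millisecond timestamp out of a UUID v7 and formats it as a 17-digit TW-style timestamp (YYYYMMDDHHMMSSmmm in UTC). The function names are hypothetical, and the sample value is the UUID v7 example from RFC 9562.

```python
import uuid
from datetime import datetime, timezone

def uuid7_timestamp_ms(u: uuid.UUID) -> int:
    """The top 48 of the 128 bits are the Unix timestamp in milliseconds."""
    return u.int >> 80

def tw_created(ts_ms: int) -> str:
    """Format milliseconds since the Unix epoch as a 17-digit UTC timestamp."""
    seconds, millis = divmod(ts_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y%m%d%H%M%S") + f"{millis:03d}"

# UUID v7 example value taken from RFC 9562
example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(tw_created(uuid7_timestamp_ms(example)))
```

The remaining 80 bits hold the version, the variant and the random part, so the timestamp can always be recovered from the ID, but not the other way around.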

I did experiment with 62-character and 64-character alphabets. But those values are not lexicographically sortable, which is the whole reason to use the new UUID v7 format in the first place.

Base64url would be 22 characters without hyphens, and those values are case-sensitive, which makes them very hard for humans to remember, even if you only need to remember partials.

Note: TW created encodes only 64 bits (ms-precision timestamp, no random bits), so bits per char is 64/17 ≈ 3.76. The others all encode the full 128-bit UUID.

Encoding            Chars per value   Bits per char   Sortable   Alphabet
TW created          17                ~3.76           Yes        0-9
Hex (no hyphens)    32                4.00            Yes        0-9a-f
Crockford Base32    26                5.00            Yes        0-9A-Z (no I,L,O,U)
Base62              22                5.95            No         A-Za-z0-9
Base64url           22                6.00            No         A-Za-z0-9-_
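The character counts and the “Sortable” column can be checked with a short sketch. The alphabets below are written in the order given in the table, which is an assumption about the digit order of each encoding; the dictionary and function names are made up for this example.

```python
import math
import string

# Alphabets in the digit order listed in the table (an assumption for this sketch)
ALPHABETS = {
    "hex": "0123456789abcdef",
    "crockford32": "0123456789ABCDEFGHJKMNPQRSTVWXYZ",
    "base62": string.ascii_uppercase + string.ascii_lowercase + string.digits,
    "base64url": string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_",
}

def chars_needed(bits: int, alphabet: str) -> int:
    """Smallest number of characters that can hold the given number of bits."""
    return math.ceil(bits / math.log2(len(alphabet)))

def sorts_like_numbers(alphabet: str) -> bool:
    """Fixed-width strings sort like the numbers they encode only if the
    digit symbols themselves are in ascending code-point order."""
    return list(alphabet) == sorted(alphabet)

for name, alphabet in ALPHABETS.items():
    print(name, chars_needed(128, alphabet), sorts_like_numbers(alphabet))
```

With these digit orders the two larger alphabets fail the sort check because ASCII puts the digits before the letters. A 62-character alphabet ordered 0-9A-Za-z would pass the same check.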

Yes, understood. But if you need a check-sum, it does not have to be a check-digit. ISBN uses a check-digit, and to achieve its base-11 (first prime after 10), they need only add the symbol X to their alphabet, and it will only appear in the last place.

Crockford does a similar thing, but because 37 is five more than his 32 “digits” (0 - 9, A - Z, less I, L, O, and U), he adds five new symbols to his alphabet, all five of which can only appear in the last position: *, ~, $, =, and U/u. That seems overkill. There are lots of alternatives if you don’t insist on a single digit. One possibility, use the U, which was excluded not because it was easily confused with a digit, as were I, L, and O, but mostly because he needed to remove one more, and I guess he was worried about people saying FU too often, which, because he adds it back, can still happen at the end of the word. Here’s one possibility, which uses a U in one of the last two places when you need a check-sum.

u-first u-second
UA - 0 AU - 19
UB - 1 BU - 20
UC - 2 CU - 21
UD - 3 DU - 22
UE - 4 EU - 23
UF - 5 FU - 24
UG - 6 GU - 25
UH - 7 HU - 26
UJ - 8 JU - 27
UK - 9 KU - 28
UM - 10 MU - 29
UN - 11 NU - 30
UP - 12 PU - 31
UQ - 13 QU - 32
UR - 14 RU - 33
US - 15 SU - 34
UT - 16 TU - 35
UV - 17 VU - 36
UW - 18
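The table can be generated mechanically. This is only a sketch of the proposal in this post, not part of Crockford's spec; the names `LETTERS`, `u_pair` and `u_pair_value` are invented here.

```python
# The 19 Crockford letters used in the table, in alphabet order
LETTERS = "ABCDEFGHJKMNPQRSTVW"

def u_pair(check: int) -> str:
    """Write a mod-37 check value as a two-character pair containing one U."""
    if not 0 <= check < 37:
        raise ValueError("check value must be in 0..36")
    if check < 19:
        return "U" + LETTERS[check]      # u-first column: 0..18
    return LETTERS[check - 19] + "U"     # u-second column: 19..36

def u_pair_value(pair: str) -> int:
    """Read a pair back into its check value, rejecting anything not in the table."""
    pair = pair.upper()
    if len(pair) != 2 or pair.count("U") != 1:
        raise ValueError("expected two characters with exactly one U")
    if pair[0] == "U":
        return LETTERS.index(pair[1])
    value = 19 + LETTERS.index(pair[0])
    if value > 36:
        raise ValueError("pair is not part of the scheme")
    return value

number = 1234567
print(u_pair(number % 37))  # check pair for an arbitrary example number
```

Because U is not one of the 32 data digits, a trailing pair containing a U can always be told apart from the encoded value itself.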

I don’t know why he insisted on a check-digit.

I’m sure we could get them together. The last state of play was that Jeremy wanted to consider more general-purpose tools to carry information in the URL fragment. Mine would fit in fine, but wouldn’t be the whole story. This POC would certainly fit in as well. I didn’t have the skills at the time to make the necessary changes in the TW codebase, and it got dropped for 5.4. I think my skills have grown enough that I could reasonably consider this now. I’ll try to squeeze out some time.

I just thought I would point out that you can add a search term to a TiddlyWiki URL. If it searches for a unique id, such as a unique timestamp, then only that tiddler opens in the story.

There are possibly many reasons to use a static reference to a given tiddler, and in TiddlyWiki, where the unique key is the title, which can change over time but often does not, it is a useful feature. However, there are possibly many different solutions and reasons to employ a static reference, and in many cases it need not be universal. With this in mind, perhaps we could document the key features and degrees of freedom, along with the limitations that may exist in different solutions, and then document each solution, such as the one here, against those items, so it is easy to compare and contrast.

Whoa! Brilliant!
I have two different types of comment that I will post separately.

First on the UUID ideas.
Sounds and looks great.
Especially the Crockford with phrases.
I can imagine scenarios where it would definitely be helpful moving stuff between wikis.

Q: If implemented would you be able to switch it off?
Some wikis I create are only titles–these don’t need any other field.

TT

3 posts were split to a new topic: Create random titles using a shuffle algorithm

If I understand the “cut-up” mechanism right, you need a “linear text” first.

  1. You split the text into an array of sentences
  2. You shuffle the array and
  3. Get a new text that way

You are right. Shuffle is one algorithm that is part of the library. But you need the 3 steps above, and you need a definition of how to split the “input text”.

The easiest way would be new-line. But that would need some preparation of the input text.

A second possibility would be a “sentence end marker” like a dot “.” … Which can be problematic for numbers, so it may be dot-space ". " … which is problematic at line ends. … dot-space OR dot-newline. That should be reasonably safe to split ordinary text into sentences.

We get an array of sentences that can now be shuffled and written to a new tiddler, with an “origin” field that points back to the original text, where it comes from. This origin field could be a c32 field :wink:
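Here is a minimal sketch of those three steps, assuming the dot-space OR dot-newline rule from above. It is plain Python rather than a TW widget or filter, and the function names are made up for illustration.

```python
import random
import re

def split_sentences(text: str) -> list[str]:
    """Split after a dot that is followed by spaces or newlines.
    A dot inside a number like 3.14 has no whitespace after it, so it is kept."""
    parts = re.split(r"(?<=\.)[ \n]+", text.strip())
    return [part for part in parts if part]

def cut_up(text: str, seed=None) -> str:
    """Shuffle the sentences of a linear text and join them into a new text."""
    sentences = split_sentences(text)
    random.Random(seed).shuffle(sentences)  # the seed only makes runs repeatable
    return " ".join(sentences)

source = "Pi is 3.14 roughly. This is the second sentence.\nAnd a third one."
print(split_sentences(source))
print(cut_up(source, seed=1))
```

Abbreviations such as “e.g. ” would still be split by this rule, so some preparation of the input text may be needed anyway.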

So IMO you would need to be a bit more specific about:

  • What your input text looks like
  • Can you prepare your text for splitting or not
  • What can be used to split it into sentences
  • How do you want to use it.
    • widget
    • filter operators
    • ?
      -m

I recommend trying this UUIDv7 encoding: Base62id

Efforts are underway to make this encoding the standard compact UUID encoding

Hi Sergey,
That’s very interesting. Did you design it?

One of the main advantages of the c32 format is that it allows a somewhat “human readable” structure, because hyphens can be added.

Hyphens do add length, but also increase human readability and memorability.

The encode / decode algorithm seems to be straightforward to implement, and the result seems to be sortable, according to the docs.

I have to say I do not yet understand what exactly the encoding algorithm does and why it works.

One of the key features I wanted to have, is to select a “range” of tiddlers by using parts of the ID in a URL and open several tiddlers. Is this still possible?

Yes, I am the author. But the encoding requirements and features have been actively discussed with the community for many years. In other words, it’s a collective effort.

Human readability, memorability, and ease of pronunciation weren’t important requirements. In reality, what’s required is ease of copying with a double click.

If UUIDs were sorted in binary representation or in long canonical text representation (with hyphens), they will remain sorted in the same order after encoding with Base62id.

The algorithm can’t be explained in a few words. It’s more complex than for 16, 32, and 64-character encodings. But programmers understand it easily and can code it quickly without asking questions.

Since encoded UUIDs are sorted, a range can also be selected by a few characters on the left side of the encoded identifier, where the timestamp is contained.

Yea, I did test it already. It is short, but not nice to read. Is there a full spec or an RFC draft?

Is it allowed to add hyphens?

Yes, this is a complete spec. It will be the basis for a separate RFC or a modification to RFC 9562. Hyphens are not allowed, as they prevent the entire identifier from being selected by double-clicking.

For readability, I recommend adding identicons generated from the UUID value.

I would recommend hiding the UUID in any format from users’ view, leaving only the identicon instead, with the ability to copy the UUID by clicking on the identicon. The UUID is intended for the information system, not for humans.

This encoding is a great idea, a very nice improvement on most existing encodings, and the click-to-copy is very useful.

But I don’t think it really meets the design goals here. While I can’t speak for @pmario, I think our notions overlap a great deal, and the reasons for his design include things that Base62id can’t really cover:

  • This should be as easily read and transferred as possible. Some of the multiple English word systems would cover this better, but that’s hard to internationalize, and much more complex to implement. If the encoding is difficult to read over the phone, it’s a problem.

  • Case insensitivity. This makes user mistakes much less common.

  • Hyphens and ranges. These work together to help us identify tiddlers written within short time-frames of one another.

  • Short unique-enough identifiers that are somewhat memorizable – 01K + 01VG in the examples.

And as much as I appreciate identicons, I don’t think TW has any place to put them by default, although they could be useful for some multi-user wikis, I’m sure.

I appreciate the work you’re putting into standardizing this. And I do like the tool. But it doesn’t feel like a good fit here to me.

I have been experimenting with base62 (no id) encoding for IDs for quite some time already. I do like that the results are much shorter than UUIDs, but they have the disadvantages Scott described.

Due to the underlying UUID v7 origin, we can still use some characters from the start and end of c62.

  • FdAvKdAg-yeCllbTI4qzZpA
  • FdAvKdDr-mzZ9EAqrvuP87I

As we can see, they start with the same characters, and the F will be there for quite a long time. It contains binary 10 in the most significant bits (MSB), followed by the 48-bit timestamp.

  • FdAvKdD still selects a range
  • FdAvKdD + P887i … should be unique within a wiki. Since every character represents almost 6 bits, fewer characters are needed to be specific than with the other formats.
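A small sketch of how such a “start + end” lookup could work, using the two example IDs above. The `select` function and the pattern syntax are assumptions made for this example. Matching is case-sensitive here, as it has to be for c62.

```python
def select(ids: list[str], pattern: str) -> list[str]:
    """Return the IDs that start with the part before '+' and end with the
    part after it. Hyphens in the stored IDs are ignored."""
    prefix, _, suffix = pattern.replace(" ", "").partition("+")
    matches = []
    for tiddler_id in ids:
        compact = tiddler_id.replace("-", "")
        if compact.startswith(prefix) and compact.endswith(suffix):
            matches.append(tiddler_id)
    return matches

ids = ["FdAvKdAg-yeCllbTI4qzZpA", "FdAvKdDr-mzZ9EAqrvuP87I"]

print(select(ids, "FdAvKd"))          # a shared prefix selects a time range
print(select(ids, "FdAvKdD"))         # a longer prefix narrows the range
print(select(ids, "FdAvKdA + zZpA"))  # prefix + suffix should pick one tiddler
```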

WikiLab edition is updated

In today’s era of instant messaging, copy-paste, and seamless integration of information systems, it’s hard to imagine anyone dictating thirty-character gibberish over the phone, much less trying to remember it. It’s easier to send a message or find a document by the words in its title. But it would be a shame if the document identifier format in this wiki turned out to be incompatible with other information systems that adhere to the standard.

Currently TW has no UUID at all. So this experiment should allow us to “play” with different possibilities, in a somewhat fun way (for geeks :wink:).

TiddlyWiki is special, since our target group is non-techie users, while we still need to enable power users who are willing to follow the “rabbit hole”.

Link rot is a real problem, especially with TW. Finding “stable” tiddler titles is difficult. Refactoring is a main workflow in a wiki.

I did create the uni-link plugin, which allows me to use aliases to create wikilinks [[alias|?]] that do not break if tiddler titles change. A tiddler can have several aliases. Those aliases have the same problem as tiddler titles if content from different wikis is merged: they can clash, so links are not unique anymore.

Another goal of these experiments is to evaluate whether the created field could be used to cover several use cases:

  • Be a timestamp
  • Be a UUID
  • Be manageable by non-techy users
  • Use it as an alias-link
  • Use it to create “unbreakable” URLs
  • Use it to create TW pretty-links [[link text|FdAvT+vs]] that can be remembered if you need to

The last example is much more convenient to type, because it has fewer characters. Those links should not break if tiddler titles change. The main problem is case sensitivity, which is hard to remember.

[[link text|01KMZY-GD3G]] c32 encoded is less unique than c62, but case does not matter and it still won’t break on title change. It’s less collision-resistant than c62 …

So there are pros and cons for every encoding type. As I said – It’s an experiment – There is no guarantee if it ever would land in the core at all.

A standard format would be useful for seamless integration with other systems: Wikipedia, search engines, AI agents, knowledge bases, etc., so that they could easily navigate to a specific document in your wiki or even analyze several related documents. Electronic documents in other information systems often have a structured text format (JSON, YAML, Markdown, TOON), rather than a binary format. Therefore, it’s important to have a standard text UUID format, not just a binary (128-bit) format.

The goal of “Be manageable by non-techy users” is flawed from a UX perspective. Users should be able to manipulate documents, not UUIDs. UUIDs should be hidden from users “under the hood.” Users should be able to see the document title or even its revision history. But there’s no reason why users should see the UUID itself. Even copying a UUID can be done by clicking on the UUID identicon or the document title.

The goal of “Use it to create TW pretty-links [[link text|FdAvT+vs]] that can be remembered if you need to” is also wrong from a UX perspective. There’s no need to force the user to remember anything. Links should be saved in folders and in logs with the navigation history. If you need to determine whether two UUIDs match, you can quickly do so by comparing their identicons or searching the text. If you only have a UUID in text format, you can match several characters in the right-hand random parts of the UUID.

It’s wrong to enter a UUID from the keyboard. Copy-paste is for that.

All UUIDv7 are equally collision-resistant, regardless of the text format. Even if you mistype one or more characters when entering them from the keyboard, the likelihood of collisions is practically zero due to the long random part.