Calculate the text similarity between each entry

This is different from finding duplicate text across tiddlers, and from building a private word-similarity dictionary.

For example, the edit distance algorithm (or some other approximation algorithm) could be used to sort all user entries by similarity.

This would let me become far more sensitive to the subtle differences between individual tiddlers, and discover the connections between them, in an algorithmically assisted way, rather than relying mainly on my brain to recall the possible internal connections between tiddlers. I could then establish two-way links between them.

Any reply would be greatly appreciated

You are exploring some sophisticated uses of TiddlyWiki, good on you.

This kind of thing is possible, but can get quite involved, in part because language can be quite involved.

Find some good keywords and search talk.tiddlywiki, tiddlywiki.com and Google (add "tiddlywiki" to the query to find content from the old Google Groups). Words that come to mind include:

  • nearby
  • neighbours
  • Fuzzy search
  • levenshtein Operator

I will return and edit the list above if more come to mind.

A lot of these ideas have been discussed before in some quite long and valuable threads (though we are always looking for more).

Also look at the freelinks plugin (one of the core plugins), as it highlights content as links to tiddlers if they exist. That is, if a keyword or phrase is already a tiddler title, it becomes a link.

This may be a complicated problem to solve with wiki syntax, but I'm sure I've written this edit distance algorithm in Python before. Perhaps one way is to export my TiddlyWiki data as a JSON file, use Python to do the statistics for me, and then write the results back to TiddlyWiki.

You can always do it the way you know, but that would be annoying if done many times. I also think that, one way or another, you can do any data manipulation you need inside TiddlyWiki, so try that first :nerd_face: and ask for help.
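A minimal sketch of that export-and-rank idea, under stated assumptions: the JSON export is taken to be a list of objects with `title` and `text` fields, and the sample data and the function names `levenshtein` and `rank_pairs` are hypothetical, made up for illustration. It compares every pair of tiddlers, so it is only practical for small wikis or short texts.

```python
import json
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def rank_pairs(tiddlers):
    """Return (distance, title_a, title_b) for every pair, closest pairs first."""
    pairs = []
    for t1, t2 in combinations(tiddlers, 2):
        distance = levenshtein(t1.get("text", ""), t2.get("text", ""))
        pairs.append((distance, t1["title"], t2["title"]))
    return sorted(pairs)


# Stand-in for json.load() on an exported file; the content is invented.
sample_export = json.loads("""
[
  {"title": "Note A", "text": "kitten"},
  {"title": "Note B", "text": "sitting"},
  {"title": "Note C", "text": "kitchen"}
]
""")

for distance, title_a, title_b in rank_pairs(sample_export):
    print(distance, title_a, title_b)
```

The sorted list could then be turned into a field on each tiddler (for example a list of nearest titles) before importing the JSON back, though that write-back step is not shown here.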

Hi @XYZ, hopefully you won't need to resort to Python. The core levenshtein operator performs the standard Levenshtein string distance algorithm.

Whoa! That is super useful!

The levenshtein operator is certainly a potential aid… But I also think that it won’t be entirely sufficient for what you want.

Suppose you already have, say, a very short and a very long version of the same basic tiddler content. The very short tiddler will still “match” (according to levenshtein) every other very short (unrelated) tiddler “better” than it will match the long one that just has more details on the same theme.

(Perhaps there’s a way of combining levenshtein distance with length comparison, so that internally duplicated strings become more salient… :thinking: )
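One possible way to combine the two, offered as a sketch rather than an established method: the edit distance between two strings can never be smaller than their difference in length, so subtracting that difference leaves only the edits that length alone does not explain. The name `excess_distance` and the sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def excess_distance(a: str, b: str) -> int:
    """Edits beyond the unavoidable length difference (always >= 0)."""
    return levenshtein(a, b) - abs(len(a) - len(b))


short = "cats purr"
long_version = "cats purr when they are happy"
unrelated = "dogs bark"

# Plain distance prefers the unrelated short text over the long version...
print(levenshtein(short, unrelated), levenshtein(short, long_version))
# ...while the length-adjusted score prefers the long version.
print(excess_distance(short, unrelated), excess_distance(short, long_version))
```

Here the long version simply extends the short one, so its adjusted score is zero. Real tiddlers will rarely be that clean, and a long text can also score low just because it has room to contain most of the short text's characters in order, so this is a heuristic to experiment with, not a guaranteed fix.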

I’d encourage trying freelinks from the official plugin library (or turning them on for occasional surveying/maintenance work) as another tool, and perhaps taking advantage of an alias plugin.

If you make sure to develop the habit of specifying aliases for your tiddlers (and especially if even those aliases automatically show up as links), you’re less likely to repeat yourself by creating a near-duplicate tiddler using slightly different words.

I also make use of “see-also” fields in some projects, to help me browse through related tiddlers even when tags and internal tiddler links are not the best tool for connecting things.
