Check if at least one tiddler exists - faster way?

Dottore · February 8, 2024, 3:39pm

I made the file available: signalexport.json.gz - Google Drive
The main tiddler to begin with is “Recipient”.

saqimtiaz · February 8, 2024, 4:05pm

I had a very quick look. Here is an adapted version of your original filter; please confirm that it does what is required:

\define openThreadWindow()
<$action-sendmessage $message="tm-open-window" $param="$:/temp/openme"
	template="ThreadWindow"
	windowTitle=`{{!!title}} Thread {{!!thread_id}}`
	width="500"
	height="580"
	windowID=`thread-window{{!!recipient_id}}`
	recipient_id=`{{!!recipient_id}}`/>
\end

<$list filter="[tag[recipient]has:field[thread_id]sort[]] :filter[all[tiddlers]field:thread_id{!!thread_id}!match<currentTiddler>]">
	<$button class="tc-btn-invisible tc-tiddlylink" actions="<<openThreadWindow>>">
		<$transclude/>
	</$button>
</$list>

This relies on only message and recipient tiddlers having a thread_id field. Instead of checking whether any tiddlers have both the tag message and a given thread_id, we check whether any tiddlers with that thread_id remain after excluding the recipient tiddler itself.

If there can be more than one recipient tiddler for a given thread_id, change the filter to:

[tag[recipient]has:field[thread_id]sort[]] :filter[all[tiddlers]field:thread_id{!!thread_id}!tag[recipient]]

Dottore · February 8, 2024, 4:31pm

This seems to do the trick! Thank you!

If I understand correctly the difference is that no check for the tag message is performed. I’m surprised that this makes such a huge difference since in both cases all tiddlers have to be examined.

saqimtiaz · February 8, 2024, 4:37pm

In the filter expression [tag[message]field:thread_id{!!thread_id}first[]], retrieving the tiddlers tagged message is fast because there is an index for it internally. However, after that we have to iterate over a very large number of tiddlers to restrict the results to those with a given thread_id.

In comparison, in [all[tiddlers]field:thread_id{!!thread_id}!tag[recipient]] we retrieve from the field index the tiddlers with a given thread_id and then have to iterate over a much smaller number of tiddlers to remove those with a given tag.
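
To make the difference between the two orderings concrete, here is a small JavaScript model. It is only a sketch under assumptions: `makeWiki`, `tagIndex`, `fieldIndex` and the tiddler counts are hypothetical stand-ins, not TiddlyWiki's actual internals. It counts how many tiddlers each ordering has to examine after the initial index lookup.

```javascript
// Simplified model, not TiddlyWiki's real code: two hypothetical indexes
// built over a synthetic store of message and recipient tiddlers.
function makeWiki(threadCount, messagesPerThread) {
  const tiddlers = [];
  for (let t = 0; t < threadCount; t++) {
    tiddlers.push({ title: `recipient-${t}`, tags: ["recipient"], thread_id: String(t) });
    for (let m = 0; m < messagesPerThread; m++) {
      tiddlers.push({ title: `message-${t}-${m}`, tags: ["message"], thread_id: String(t) });
    }
  }
  const tagIndex = new Map();   // tag -> tiddlers carrying that tag
  const fieldIndex = new Map(); // thread_id value -> tiddlers with that value
  for (const tid of tiddlers) {
    for (const tag of tid.tags) {
      if (!tagIndex.has(tag)) tagIndex.set(tag, []);
      tagIndex.get(tag).push(tid);
    }
    if (!fieldIndex.has(tid.thread_id)) fieldIndex.set(tid.thread_id, []);
    fieldIndex.get(tid.thread_id).push(tid);
  }
  return { tagIndex, fieldIndex };
}

// Order A: tag lookup first, then scan every tagged tiddler for the thread_id.
function tagThenField(wiki, threadId) {
  const candidates = wiki.tagIndex.get("message") || [];
  const results = candidates.filter((tid) => tid.thread_id === threadId);
  return { results, examined: candidates.length };
}

// Order B: thread_id lookup first, then drop the recipient tiddlers.
function fieldThenTag(wiki, threadId) {
  const candidates = wiki.fieldIndex.get(threadId) || [];
  const results = candidates.filter((tid) => !tid.tags.includes("recipient"));
  return { results, examined: candidates.length };
}

const wiki = makeWiki(50, 200);
const a = tagThenField(wiki, "7");
const b = fieldThenTag(wiki, "7");
console.log(a.examined, b.examined);
```

With these made-up numbers (50 threads of 200 messages each), order A examines 10000 tiddlers per recipient while order B examines 201, and both return the same 200 messages.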

What really surprised me is that in the code that I first proposed, this filter was taking up 86% of the execution time:
[tag[recipient]sort[]] :filter[enlist<threads>match{!!thread_id}]

saqimtiaz · February 8, 2024, 7:32pm

Aaaah, that was bothering me but thankfully the mystery is solved. I had worked from Eric’s revised version of my code which dropped the unique[] operator when getting the threads, which meant we were dealing with a list 26773 items long instead of 74 items long for the threads.
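
A rough JavaScript model of why the missing unique[] hurt so much. The list sizes come from the numbers above; everything else is an assumption made for illustration: the even spread of messages over threads, one recipient per thread, and the helper name `comparisonsForAllRecipients`. The sub-filter is modelled as testing every list item once per recipient.

```javascript
// Hypothetical data shaped like the numbers in this thread:
// 26773 message thread_id values spread over 74 distinct threads.
const THREADS = 74;
const MESSAGES = 26773;
const messageThreadIds = Array.from({ length: MESSAGES }, (_, i) => String(i % THREADS));

// Without unique[]: one entry per message. With unique[]: one entry per thread.
const withDuplicates = messageThreadIds;
const deduplicated = [...new Set(messageThreadIds)];

// Model of running `enlist<threads>match{!!thread_id}` once per recipient:
// every run tests each item of the list, so cost = recipients * list length.
function comparisonsForAllRecipients(threadList, recipientCount) {
  let comparisons = 0;
  for (let r = 0; r < recipientCount; r++) {
    const wanted = String(r);
    threadList.filter((id) => {
      comparisons++;
      return id === wanted;
    });
  }
  return comparisons;
}

const recipientCount = THREADS; // assumption: one recipient per thread
const slow = comparisonsForAllRecipients(withDuplicates, recipientCount);
const fast = comparisonsForAllRecipients(deduplicated, recipientCount);
console.log(slow, fast);
```

In this model the duplicated list costs 1981202 comparisons against 5476 for the deduplicated one, which is the same ratio as the list lengths.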

So my original code would have looked as below, though what we arrived at later should still be somewhat faster:

\define openThreadWindow()
<$action-sendmessage $message="tm-open-window" $param="$:/temp/openme"
	template="ThreadWindow"
	windowTitle=`{{!!title}} Thread {{!!thread_id}}`
	width="500"
	height="580"
	windowID=`thread-window{{!!recipient_id}}`
	recipient_id=`{{!!recipient_id}}`/>
\end

<$let threads={{{ [tag[message]get[thread_id]unique[]format:titlelist[]join[ ]] }}}>
<$list filter="[tag[recipient]sort[]] :filter[enlist<threads>match{!!thread_id}]">
	<$button class="tc-btn-invisible tc-tiddlylink" actions=<<openThreadWindow>>>
		<$transclude/>
	</$button>
</$list>
</$let>

The overall key to filter performance is often to limit the length of the results as quickly as possible, preferably using field or tag operators when possible.

Dottore · February 9, 2024, 9:41am

Here you replaced !match<currentTiddler> with !tag[recipient]. This is just as fast.

You say that using [all[tiddlers]field:thread_id{!!thread_id}!tag[recipient]] we have to iterate over a small number of tiddlers. So then why is this so much slower: [all[tiddlers]field:thread_id{!!thread_id}tag[message]]? It iterates over the same number of tiddlers.

This version would be similar to my first approach. And it is safer, since there might be another kind of tiddler with a thread_id that is not tagged with recipient.

The last version you posted seems to be as fast as the previous one. I like the previous one more. At least for me it’s easier to understand.

saqimtiaz · February 9, 2024, 9:59am

To be honest, I am not completely sure and would have to investigate by profiling the tag operator. My guess is that it is dependent on the number of tiddlers with the tag that is being included/excluded, due to the performance of array.indexOf in JavaScript that is used in the code for the tag operator. The operator gets the tiddlers with the given tag and then for each input item checks if it is in that list of tiddlers.
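
A sketch of that guess in JavaScript. This is not the actual TiddlyWiki source: `countingIndexOf` and `filterByTag` are hypothetical helpers that mimic a linear membership scan (the way Array.prototype.indexOf searches) and count element comparisons, and the store sizes are invented.

```javascript
// Linear membership test that counts how many elements it compares.
function countingIndexOf(list, item, counter) {
  for (let i = 0; i < list.length; i++) {
    counter.comparisons++;
    if (list[i] === item) return i;
  }
  return -1;
}

// Model of a tag step in the middle of a filter: for each input title,
// check whether it appears in the list of titles carrying the tag.
function filterByTag(inputTitles, taggedTitles, invert) {
  const counter = { comparisons: 0 };
  const results = inputTitles.filter((title) => {
    const found = countingIndexOf(taggedTitles, title, counter) !== -1;
    return invert ? !found : found;
  });
  return { results, comparisons: counter.comparisons };
}

// Hypothetical store: 100 recipients and 20000 messages. The input is one
// thread: its recipient plus the 200 messages at the end of the tagged list.
const recipients = Array.from({ length: 100 }, (_, i) => `recipient-${i}`);
const messages = Array.from({ length: 20000 }, (_, i) => `message-${i}`);
const input = ["recipient-0", ...messages.slice(-200)];

const keepMessages = filterByTag(input, messages, false);    // like tag[message]
const dropRecipients = filterByTag(input, recipients, true); // like !tag[recipient]
console.log(keepMessages.comparisons, dropRecipients.comparisons);
```

Both calls return the same 200 messages from the same 201 inputs, but in this model the scan against the long message list needs 4000100 comparisons while the scan against the short recipient list needs 20001. If the guess is right, that would explain why the size of the tagged set matters more than the size of the input.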

For large datasets like yours I could well imagine a different implementation of the tag operator that would be much faster when used in the middle of a filter expression, but it would be confusing to know which one to use when.

Edit: turns out the alternate implementation I was thinking of used to be the default but was changed to what it currently is as that is more performant in other circumstances, see Optimise the tag filter · Jermolene/TiddlyWiki5@e4b10d4 · GitHub