Note to self: Avoid using the filter Operator with large lists

I’m normally too nervous to post my small findings, as I’m an amateur who often doesn’t do things in the most optimal way, and this discovery may not be news to the super users either. But in all my searching for how to optimize render and refresh times in my beefier projects, I’ve never found anything that covered this, and it made a huge difference to performance, so I’m making a post about it in case it helps others - or in case it can be improved further. :smile:

I noticed the lag problem in my current pet project, BlinkieWiki. It deals with a large number of tiddlers (as of writing this, just over 20,000) containing web graphics that have complex metadata. The website has a search interface for filtering graphics through these different kinds of data, such as primary color, keywords, and creator.

The primary filter expression used to determine which graphics to display used the filter operator to reference a field value containing additional filter expressions. At the time this seemed necessary, because those values are toggle-able and liable to change on the fly (for example, whether the graphic has certain color tags, or whether certain kinds of content should be filtered out). However, the render time for displaying the graphics every time the search query was edited (even with throttled refresh factored in) could be as long as 10 seconds, and that was unacceptable.

In experimenting with what was causing the lag, I figured out that the overwhelming culprit was the filter operator. Without it, the filter ran orders of magnitude faster, even with the same number of input tiddlers. But I still needed the toggle-able filter expressions. Where my first setup had been this (simplified for the example):

<$list filter="[all[tiddlers]tag[webgraphic]search:keywords{!!keywords}filter{!!ReferenceToExtraFilters}]">list item</$list>

the newer, more performant equivalent became this - macro-ify the entire section and use substitution instead:

\define displayResults(filter) <$list filter="[all[tiddlers]tag[webgraphic]search:keywords{!!keywords}$filter$]">list item</$list>
...
<$macrocall $name="displayResults" filter={{!!ReferenceToExtraFilters}}/>

(I know this syntax is deprecated, so it isn’t best practice…more experienced users are free to comment on how this can be further improved.)
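(For the record, on TiddlyWiki v5.3.0 and later the same substitution should be possible without the deprecated syntax, using \procedure together with a backtick-quoted substituted attribute value - an untested sketch, assuming the same field names as above:

```
\procedure displayResults(extraFilters)
<$list filter=`[all[tiddlers]tag[webgraphic]search:keywords{!!keywords}$(extraFilters)$]`>list item</$list>
\end

<$transclude $variable="displayResults" extraFilters={{!!ReferenceToExtraFilters}}/>
```

Here the `$(extraFilters)$` marker is replaced with the value of the procedure parameter before the attribute is parsed as a filter, mirroring the $filter$ placeholder of the \define version.)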

The filters of course needed to be formatted in such a way that they could fit inside a substitution without breaking the main filter, so they exclude the beginning and ending brackets that normally appear in expressions.

In doing this, the lag was reduced from ~7-10 seconds to a mere 0.8-1.5 seconds(!), which is the single largest performance improvement I’ve ever achieved in this project. It was an important lesson: when working with large lists, and especially large lists that require complex filtering, avoid the filter operator and other redundant steps unless you really have to, as there are ways to achieve the same effect that are significantly more performant. I’m sure the broader applications of this ‘lesson’ are more complicated than that, and there are probably better ways to solve the problem than mine. But I am a little proud of my discovery, so I’m sharing it anyway :stuck_out_tongue:


Just another showcase of the lack of a wikitext linter that could spot such antipatterns and give optimization hints. Sigh.

Can you explain what the antipattern was here?

Also, do you have any suggestions about how such a linter would work?

Thanks for sharing your information. And “grats” for your improvements :wink:

You are right; most of the time there are different ways with TW filters to reach the same result.

I personally always try to make the number of “listed” elements as small as possible, as early as possible in a filter.

In your filter expression you have 2 elements that can be “unpredictable”:

  • “search” and
  • “filter”

In your case, if “filter” “stacks”, that can cause trouble, as you found out. … The problem is that there is no “golden rule” to optimise performance, since it always depends on the current usecase and the underlying data / meta-data.
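As a small illustration of narrowing early, using the OP’s field names: put the cheap, cached tag lookup before the expensive text search, so search only has to scan the tagged subset rather than the whole wiki. A sketch:

```
<!-- slower: search scans every tiddler before tag narrows the list -->
[all[tiddlers]search:keywords{!!keywords}tag[webgraphic]]

<!-- faster: tag[] narrows the list first, then search runs on the subset -->
[all[tiddlers]tag[webgraphic]search:keywords{!!keywords}]
```

The exact gain depends on how many tiddlers carry the tag, but the principle is the same: cheap, selective operators first.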

So thanks for your feedback.


The Filter Operators table has “some” info about operators that create a new, and possibly long, list.

The “C” flag shows this info.

The problem in the OP is that “filters stack” as the user selects several search options. Depending on the “subfilters”, this can lead to complex filters that have the potential to be slow.

IMO it is feedback like in the OP that can help us find more optimised solutions. As I wrote, sadly there is no “golden rule” except: “Make the list short as fast as possible”. … But that’s basically it :wink:

Sometimes it also helps to store the result of one filter in a variable, if the same list will be used several times. But that depends a lot on the data and the usecase.
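A minimal sketch of that caching pattern, with hypothetical names, using the $set widget’s filter attribute to evaluate the expensive part once and enlist to reuse the stored titles:

```
<$set name="graphics" filter="[tag[webgraphic]]">
<$list filter="[enlist<graphics>search:keywords{!!keywords}]">first use</$list>
<$list filter="[enlist<graphics>has[creator]]">second use</$list>
</$set>
```

The tag filter runs once; both lists then start from the stored title list instead of rescanning the whole wiki.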


As I understand, the idea is that an inline filter operator in a longer filter expression can cause performance issues if it operates on long lists. Why not encourage the user to consider using the second version by displaying a warning when such a pattern is found in a filter string?

Technically, there could be a toggleable “Run linter when saving tiddlers” config setting which would process wikitext code and spot these things (another notorious scenario is getting the "pesky brackets"™ wrong in filter expressions) when the save tiddler button is pressed.

I think with great power comes great responsibility. Your use case does first limit the filter to tag[webgraphic], but this is almost inconsequential since you have a large number of tiddlers so tagged; then you use a search operator, which needs to look closely at every tiddler using the contents of a field that can change; then you use filters that can be broad and encapsulate a lot of possible tiddlers.

  • The point being there is very little to necessarily constrain the size of the searchable lists.

In cases like this it is common for the designer to find other ways to reduce the search set so the search does not need to do as much work.

  • Add another logical layer to the search, such as recent first, limit to 100, etc.
  • If you must, force the user to select from a set of criteria and only then start the search
  • Always use operators that cache their listings first; see performance
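A sketch of the first suggestion, using the OP’s tag and field names: sort newest first and cap the result list before it reaches the $list widget:

```
[tag[webgraphic]search:keywords{!!keywords}!sort[modified]limit[100]]
```

The limit keeps the widget from rendering thousands of items at once, which is often where much of the remaining render time goes; the user can always narrow the search to see different results.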

I could suggest other strategies, but they tend to differ depending on the nature and volume of the data.

  • Alas, more and more computer resources get used for flexible and generic solutions, when being forced to set some constraints can often be much more efficient.

Not your fault. Filter expressions should have been designed to run async on a worker thread or server side, but they are sync; that is the root cause.

I suspect you would find similar performance gains with this construct:

[all[tiddlers]tag[webgraphic]search:keywords{!!keywords}subfilter{!!ReferenceToExtraFilters}]

Note the use of subfilter instead of filter. The filter operator invokes its parameter once for every single input item and is useful when you need access to fields from input titles in the filter expression invoked, as it sets the value of currentTiddler to the current input item being evaluated.

The subfilter operator is invoked once and only once, regardless of the number of input elements, and does not change the value of currentTiddler.
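To make the contrast concrete - a sketch with a hypothetical width field, storing the per-title expression in a variable to avoid nesting brackets inside the operator’s parameter:

```
<!-- filter: the inner expression runs once per tiddler tagged webgraphic,
     with currentTiddler set to each candidate in turn -->
<$set name="perTitle" value="[get[width]compare:number:lteq[100]]">
<$list filter="[tag[webgraphic]filter<perTitle>]">list item</$list>
</$set>

<!-- subfilter: the stored expression runs once against the whole input list -->
<$list filter="[tag[webgraphic]subfilter{!!ReferenceToExtraFilters}]">list item</$list>
```

With 20,000 tagged tiddlers, the first form evaluates its inner expression 20,000 times per refresh; the second evaluates it once, which is consistent with the speedup the OP measured.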
