The Cartesian Product: understanding what it is and avoiding it like the plague

Charlie_Veniot · May 11, 2023, 5:09pm

(Note: this applies to any version of TiddlyWiki new and old; however, I do not take into consideration or use features in TiddlyWiki beyond version 5.2.3 as they really are of no interest to me.)

In terms of TiddlyWiki:

A Cartesian product is the result of cycling through every item in a list with every item in the very same list (or any other list) .

To understand that, let’s consider a scenario: producing a list in which a “horizontal rule” is placed after every item except for the last one.

Now let’s look at this TW script:

<$list filter="[tag[Articles]]  +[sort[]]">
    <h2><$link><$view field="title"/></$link></h2>
    <$transclude mode="block" field="text"/>
    <$list filter="[tag[Articles]]  +[sort[]] +[last[]] +[!match<currentTiddler>]"><hr></$list>
</$list>

Let’s pretend that the outer list will result in 50 items.

The inner list gets the same list of 50 items to find out if the current outer list item is the last item.

The cartesian product = 50 times 50 = 2,500. That is the number of items that get processed in total.

In general, this may not matter because TiddlyWiki is blazingly fast.

However, we run the risk of running into that once in a blue moon moment in which something time-consuming is going to happen 2,500 times. That’s bad.

The best way to get around this problem is to rethink things.

Finding out what is the last item in the list involves creating the list entirely and then getting the last item.

Same thing getting the first item in a list: still involves creating the list entirely and then getting the first item.

If we dive into the list widget, we can setup a counter variable. Taking advantage of that, and rejigging the process a little, we can do this:

<$list filter="[tag[Articles]]  +[sort[]]" counter="myCounter">
    <$list filter="[<myCounter>!match[1]]"><hr></$list>
    <h2><$link><$view field="title"/></$link></h2>
    <$transclude mode="block" field="text"/>
</$list>

Now, for every item in the outer list, the stuff inside the list happens 50 times (pretending we are dealing with 50 items in the list).

The inner list, however, will only ever produce one item each of those 50 times.

So the result is 50 (items in the outer list) + 1 items (in the inner list) 50 times (one time for each item in the outer list) = 50 + 50 = 100 items processed in total.

Going from 2,500 items getting processed to 100 items getting processed, that’s 25 times less processing going on. In some scenarios, that could be a big deal.

pmario · May 11, 2023, 6:07pm

Nice write-up!

Just for a more completeness sake.

The following code should have the same performance characteristic as your second example.

Since TW v5.2.0, to detect the “first” or “last” element in a list and do something different, there are 2 new variables, which are present in a list, if the “counter” parameter is set.

See the “counter” info at: https://tiddlywiki.com/#ListWidget

Changing your code, the myCounter-last variable can be used to check if the list reaches the end.

The code below draws the HR as long as the myCounter-last variable is set to “no”

<$list filter="[tag[Articles]sort[]]" counter="myCounter">
    <h2><$link><$view field="title"/></$link></h2>
    <$transclude mode="block" field="text"/>
    <$list filter="[<myCounter-last>match[no]]"><hr></$list>
</$list>

So in a more general example.

<$list filter="[tag[Articles]sort[]]" counter="myCounter">

    <!-- the next line will only be executed once at the start of the list -->
    <$list filter="[<myCounter-first>match[yes]]">Start</$list>

    <<myCounter>>: <$text text=<<currentTiddler>>/>

    <!-- the next  2 lines will only be executed once at the end of the list -->
    <$list filter="[<myCounter-last>match[no]]"><hr></$list>
    <$list filter="[<myCounter-last>match[yes]]">End</$list>
</$list>

Charlie_Veniot · May 11, 2023, 6:18pm

The problem I have with myCounter-last is this: it means the list gets processed twice.

If that’s the case, that’s a little bit inefficient. Not the worst of all evils, and likely only a problem on a blue moon with all the stars and ducks lined up, but not (for my preferences) a great thing.

Scott_Sauyet · May 11, 2023, 7:06pm

Are you sure about that? I haven’t checked the implementation, but I would be very surprised if it were implemented that way. This is written in JS and presumably is using JS arrays, so we know the limits up front, and I would assume that we would use the knowledge of the length of that array to provide this value at minimal cost.

But, again, I haven’t looked.

Scott_Sauyet · May 11, 2023, 7:10pm

Ok, now I looked:

github.com

Jermolene/TiddlyWiki5/blob/master/core/modules/widgets/list.js#L136-L141

    
      
          	if(this.counterName) {
          		parseTreeNode.counter = (index + 1).toString();
          		parseTreeNode.counterName = this.counterName;
          		parseTreeNode.isFirst = index === 0;
          		parseTreeNode.isLast = index === this.list.length - 1;
          	}

It looks as though it should be efficient.

Charlie_Veniot · May 11, 2023, 7:17pm

A look-ahead process to figure out total count before processing items can be efficient, but it can’t be more efficient than not doing that count.

Look-ahead is look-ahead, and it involves work.

The most efficient way to go is to not do it.

But I am a very rare duck: I have a zero-tolerance for any inefficiency foisted upon me (that doesn’t solve any problem for me), and I’m a die-hard minimalist .

Far extreme light and agile. What can I say?

Another unfortunate reason for the latest versions of TW being to me an impossible sell.

jeremyruston · May 11, 2023, 8:39pm

I don’t think that that is the case. There is no “look-ahead process”.

As far as I know, the only inefficiency introduced by using the counter attribute is that it prevents some optimisations for list widget entries that change position as the result of a refresh.

Charlie_Veniot · May 11, 2023, 8:58pm

A count-as-we-process, as in this is the current count, that is a no-brainer. That’s just increment as we process.

A “what is the total count”, I can’t see how that isn’t an immediate doubling of the bare-bones getting of a list for total count before going through the list for output…

To be clear, it is all in the realm of likely nanoseconds.

Not much to quibble about for the great majority, but I suffer from sensory overload and notice absolutely everything (temperature differences, scents, delays, weight), so you aren’t discussing with run-of-the-mill-normal here.

sukima · May 12, 2023, 1:11am

Is an <hr> required? If all you want is a line between entries this can be done with CSS and no computation needed.

<style scoped>
.my-fancy-styled-list > * + * {
  border-top: thick dashed black;
}
</style>
<div class="my-fancy-styled-list">
<$list filter="[tag[Articles]]">
<div>
<h2><$link><$view field="title"/></$link></h2>
<$transclude mode="block" field="text"/>
</div>
</$list>
</div>

This is coined The Lobotomized Owl technique.

If you did need the <hr> you could just put one at the bottom and hide it with CSS as well:

<style scoped>
.my-fancy-styled-list > hr:last-child {
  display: none;
}
</style>
<div class="my-fancy-styled-list">
<$list filter="[tag[Articles]]">
<h2><$link><$view field="title"/></$link></h2>
<$transclude mode="block" field="text"/>
<hr>
</$list>
</div>

Scott_Sauyet · May 12, 2023, 1:13am

TW could certainly be rearchitected to use generators or some other style of lazy listing. And if it was, then references to the length of a collection or the index of the last element would be more expensive.

But that’s not how it works now. Right now the ListWidget constructor calls to its getTiddlerList method, which calls wiki.filterTiddlers to create an array of tiddlers. (That calls wiki.compileFilters, which I haven’t fully grokked yet, but it clearly is returning an array.)

Then inside a call to $tw.utils.each, the constructor calls its makeItemTemplate method, and the code I mentioned above runs. But at this point, the system has already found the full list, so reporting whether it’s the last item is strictly a matter of checking whether the currently processed index from each matches the length of the list.

In other words, using myCounter-last should be no less efficient than checking !match[1], not with the way TW is currently designed.

pmario · May 12, 2023, 7:53am

The list <$list filter="[<myCounter-last>match[no]]"><hr></$list> is just an “if statement”. It reads the value of myCounter compares it with “no” and if it is no, it draws the HR element. So this is very fast.

Only the “outer list” can have a performance impact. [tag[something]] is a shortcut for [all[tiddlers]tag[something]] … So this filter has to check every tiddler that exists in the wiki, if it is tagged: “something”. … Luckily tags are cached. So subsequent usage of [tag[something]] will be fast.

But … you are right if lists with many elements are nested this can quickly become a performance hit. As you wrote in your first example. Even if tags are cached executing the code 50 x 50 times can be noticeable.

pmario · May 12, 2023, 8:26am

------ Warning: Some “Geek Speak” follows -------

The following info should be valid for the time of writing, but may be completely different in the future. We constantly try to improve the core internals.

IMO redesigning is not necessary.

Working with tags, filters and lists is highly optimized already.

As I wrote [tag[something]] is cached / indexed, so only the first time using a tag-filter will need maximum CPU cycles.
As you wrote: wiki.filterTiddlers needs to wiki.compileFilters. That means a filter-run eg: [tag[something]sort[]] internally is converted to javascript functions. Those functions are cached too. So the “compile” step can be skipped for consecutive calls of the same filter-run. Not the resulting tiddler-list is cached, but the javascript functions that create the list, which also is an improvement.
list-widgets only “render” DOM elements if really necessary, because the browser DOM is the Elephant in the room. Manipulating the DOM can cause a complete “redraw” of the whole page.

What can make looooong lists slow is “adding DOM nodes and events” for clickable elements.

So redrawing long lists should be avoided. That’s why, we do recommend to switch to the sidebar “Open tab” instead of “Recent” if you have a lot of content tiddlers.

Hope that’s not too technical.

pmario · May 12, 2023, 8:45am

The [tag[something]count[]] actually is a demanding filter run and should be done only once, outside of a list and the result should be stored in a variable. So it can be used inside the list and other following filter runs eg:

<$let maxNumber={{{ [tag[Articles]count[]] }}}>
<$list filter="[tag[Articles]sort[]]" counter="myCounter">

    <<myCounter>>/<<maxNumber>>: <$text text=<<currentTiddler>>/>

</$list>
</$let>

Knowing which filter-results should be stored in variables is an iterative process. It’s not always immediately clear, which filter-runs can or should be reused.

So if you create new dynamic lists at the beginning there may be many filters, that are exactly the same. Once everything works as expected, there is a possibility to optimize them. … BUT …

Backup first
-m

Scott_Sauyet · May 12, 2023, 12:56pm

Oh, I absolutely wasn’t suggesting that we do so. I was just trying to demonstrate that what Charlie was worried about is not a problem given the current design. It looks as though I wasn’t clear enough.

This is all about testing counter-last. The concern was that we should avoid it because it would require a second iteration of the list. Because this is not stored as a list (linked list, or car-cdr / first-rest), but as an array, this is not a problem: we have constant-time access to the length property. (Interestingly, if we had such a list, then counter-last would be trivial, but counter-first might be problematic, depending on the implementation.)

Charlie_Veniot · May 12, 2023, 1:52pm

ARG! For the love o’ Pete, the light bulb just came on…

“arrays” has been percolating in the back of my sponge from the moment I read this.

For the last month and a half, I’ve been (at the job) working heavily with reports, and the reporting tool, when printing a report, counts records as they are processed and sent to the print queue. For a total count to appear on the first page, the reporting tool has to cycle through all of the records first (to get a total count) before going through the records again for printing.

I’ve been so hyperfocused on the reporting that this mechanism is what I see even in my dreams.

It just grabbed me by the jugular out of nowhere a couple of minutes ago: hey moron, the results of a filter run get put somewhere before everything in between the list-widget tags get rendered. So counting of items is just a quick getting of that “somewhere’s” size.

This facepalm moment is going to smart and leave a mark for a good while.

Thanks for sorting me out, @Scott_Sauyet .

Scott_Sauyet · May 12, 2023, 2:29pm

We’ve all been there. I’d be embarrassed to try to count the number of times I’ve done the same!

TW_Tones · May 12, 2023, 11:58pm

For a little clarity and completeness on this subject the filters that are cached by default and behind the sceens is documented here https://tiddlywiki.com/#Performance

However we could organise the fiters we design into two groups, ones that act on all or a large set of tiddlers and those that do not. Lets concider the first group and concider in the following tiddler https://tiddlywiki.com/#Filter%20Operators the C column, which help you see if you are starting your filter with a fresh list or not.

With the large lists if you build them with one of the optimised filters, you get some optimisation.
however I belive we can also limit the processing demand by using such filters only once in a section of code;

Concider the following;

<$set name=input-list filter="[all[tiddlers]tag[TableOfContents]]">
<$let total={{{ [<input-list>enlist-input[]count[]] }}}>
total=<<total>><br>
<$list filter=" [<input-list>enlist-input[]]">

</$list>
</$let>
</$set>

So in this example the input-list variable is effectivly a cache of tiddler titles, the count is only counting the member of the list, not interogating all tiddlers again, and same with the list.
Of course if a change occurs the tiddler will rerender and go through this again but at least inside the code you are reducing the processing or filtering needed.
Also consider even if your list is large the variable exists only until the set let widgets are closed.
Finaly remember, at least in single file wikis all triddlers, there titles and fields are already “in memory”.

If someone could confirm the assumprtion here in, it would be appreciated.