Field Pollution vs Performance

HistoryBuff · May 22, 2022, 7:05pm

Hello all,

I’ve heard discussions in the past concerning tag pollution, where too many tags might cause a performance hit. Does the same concern apply to custom fields? I ask because, as my TW grows, I’m getting more custom fields. I wouldn’t say that I have a huge number (currently there are 160 total fields in my TW, approximately half of which are custom).

What might be considered too many custom fields, such that performance might be impacted? I’m asking because I’ve been considering converting a bunch of tiddlers that I use to store data in fields to dictionary tiddlers (my TW is basically being used as a database for Kansas railroads). This would reduce the total number of tiddlers I have as well as the number of fields I have in use (not sure of the exact numbers). Part of me wants to do this as an exercise and learning experience, but it would be a considerable amount of work, as I have many filters that I use to present the data. I’d like to know if this might be worth the effort.

I don’t believe I have a big performance issue now that I fixed the issue I mentioned in Violations in console results - #17 by HistoryBuff. However, I would like to avoid one in the future.

Any comments / opinions are welcome.

pmario · May 22, 2022, 8:44pm

Hi,

You should have a closer look at: How the TW internal data structure looks like and why data-tiddlers are not optimal · Discussion #6116 · Jermolene/TiddlyWiki5 · GitHub

Data tiddlers have some inherent disadvantages, because the text-field is “text”. If it contains structured data, this data has to be parsed to be used. That’s a disadvantage.

The TW UI refresh cycle is highly optimized to change only the elements that need to be changed when tiddlers change. Data-tiddlers have a disadvantage here too.
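
To make the parsing point concrete, here is a toy sketch in Python. It is not TiddlyWiki code: plain dicts stand in for tiddlers, the helper `parse_dictionary_text` is hypothetical, and the railroad name and values are invented. It only assumes the `name: value` line layout that dictionary tiddlers use.

```python
# Toy model only: plain Python dicts stand in for tiddlers.

def parse_dictionary_text(text):
    """Parse 'name: value' lines, the layout used by dictionary tiddlers."""
    entries = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            entries[name.strip()] = value.strip()
    return entries

# Data held in fields: each value can be looked up directly.
field_tiddler = {"title": "Example Railroad", "founded": "1859", "state": "Kansas"}

# Data held in a dictionary tiddler: everything sits in one text field.
dictionary_tiddler = {
    "title": "Example Railroad Data",
    "type": "application/x-tiddler-dictionary",
    "text": "founded: 1859\nstate: Kansas",
}

founded_from_field = field_tiddler["founded"]
# The text has to be parsed before a single entry can be read.
founded_from_dictionary = parse_dictionary_text(dictionary_tiddler["text"])["founded"]
```

Both lookups return the same value, but the second one has to walk the whole text first. In this model every entry also lives in the same text field, so editing one entry changes the tiddler as a whole.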

Mark_S · May 22, 2022, 9:00pm

That’s a nice write-up!

If you wanted the best of both worlds, you could run a process that extracted all the info you needed from the dictionary for a particular display and created the necessary, temporary tiddlers. Then you could do your list, graphs, or whatever you needed. When you were done, the tiddlers could be deleted. I’m imagining a particular situation, where you have a huge number of data entries but only need a subset for any particular analysis. Like if you were dealing with millions of railroad entries but only wanted to see stats for “Kansas”.
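
A minimal sketch of that extract, use, delete cycle, written in Python as a toy model rather than as TiddlyWiki code. The `$:/temp/railroads/` prefix, the function names, and the sample entries are all assumptions made up for illustration.

```python
# Toy model only: a dict keyed by title stands in for the wiki store.
TEMP_PREFIX = "$:/temp/railroads/"  # hypothetical prefix for throwaway tiddlers

def extract_subset(entries, state, store):
    """Create one temporary tiddler per dictionary entry matching the state."""
    created = []
    for name, record in entries.items():
        if record.get("state") == state:
            title = TEMP_PREFIX + name
            store[title] = {**record, "title": title}
            created.append(title)
    return created

def delete_temporary(store):
    """Remove every temporary tiddler once the analysis is finished."""
    for title in [t for t in store if t.startswith(TEMP_PREFIX)]:
        del store[title]

# Invented sample data standing in for a large dictionary.
entries = {
    "Line A": {"state": "Kansas", "miles": "120"},
    "Line B": {"state": "Nebraska", "miles": "80"},
    "Line C": {"state": "Kansas", "miles": "45"},
}

store = {}
created = extract_subset(entries, "Kansas", store)
kansas_miles = sum(int(store[title]["miles"]) for title in created)
delete_temporary(store)
```

Only the Kansas entries ever become tiddlers, and the store is empty again once the figures have been computed.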

TW_Tones · May 22, 2022, 11:37pm

As with tags, using fields requires the designer to think about the nature of the information they are storing. A tag makes sense if it could be used on any tiddler in the whole wiki, a field if it is used on at least a subset of tiddlers, and a data tiddler if it’s a lookup for use in random tiddlers and random places within the tiddlers, e.g. a glossary of rare terms.

Along with asking yourself whether a tag, field or data record is the most suitable, we need to consider:

- How rare or common will the use of the tag, field or data record be?
- How many cases will there likely be for each tag, field or data record, both initially and potentially as new information is recorded in the wiki?
- Then consider day-to-day use of the wiki: given the above, how often will the information need to be displayed?
- If displaying such information has a performance impact, could it be selectively shown or hidden so it only has an impact when the user needs it?

If using fields, even one-off fieldnames on tiddlers, keep in mind the impact on the fieldname dropdown when adding fields. Although this could become “polluted”, it would be possible to present alternative new-field dropdowns that list only a subset of possible fields.

An interesting use case I have is Preview request for comment - Link protocol handlers, where you can drop multiple links on a tiddler and they will be stored as link-N fields. Alternatively, you can rename the field to a meaningful name, and this “meaningful name” is used to display the link.

- A range of fieldnames could proliferate.
- However, the same fieldname may be reused a lot, e.g. a discussion field containing links to talk.tiddlywiki.org threads.

Now, once these logical data questions are answered, one can then ask about the performance issues.

The nature of the data will have a major impact on performance; that’s why I put it before performance.
Note that field is among the “indexed filter operators”, so careful use of filters will help:
- [all[tiddlers]field:y[x]...
- [all[shadows]field:y[x]...
- [all[tiddlers+shadows]field:y[x]...
- [all[shadows+tiddlers]field:y[x]...
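
A rough way to picture why an indexed lookup helps, as a toy model in Python. This is not TiddlyWiki's actual index; the function names, tiddler titles, and field values are invented for illustration.

```python
from collections import defaultdict

def build_field_index(tiddlers, field):
    """Map each value of `field` to the titles that carry it (built once)."""
    index = defaultdict(list)
    for tiddler in tiddlers:
        if field in tiddler:
            index[tiddler[field]].append(tiddler["title"])
    return dict(index)

def scan(tiddlers, field, value):
    """Unindexed alternative: inspect every tiddler on every query."""
    return [t["title"] for t in tiddlers if t.get(field) == value]

# Invented sample tiddlers.
tiddlers = [
    {"title": "Line A", "state": "Kansas"},
    {"title": "Line B", "state": "Nebraska"},
    {"title": "Line C", "state": "Kansas"},
    {"title": "Notes"},
]

state_index = build_field_index(tiddlers, "state")
indexed_result = state_index.get("Kansas", [])   # single dictionary lookup
scanned_result = scan(tiddlers, "state", "Kansas")  # touches all tiddlers
```

Both approaches return the same titles. The difference is that the index is built once and then answers each query with one lookup, while the scan cost grows with the number of tiddlers on every query.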

You may be interested in this post, How to have your code and action it too - batch operations on multiple tiddlers, refactoring your wiki, which helps a lot when restructuring a wiki, for example when moving between tags, fields, or even data tiddlers. It supports one-off batch conversions or a reusable action.

HistoryBuff · May 23, 2022, 11:03pm

Thanks to all for the insightful comments. I have some pondering to do. I do think you all have talked me out of wholesale converting to dictionary tiddlers, though.