Can we make standard search a little smarter?

Mohammad · April 22, 2022, 5:03am

Example: Assume I am looking for tags and I want to know how it works.

Go to https://tiddlywiki.com
In the standard search box enter tags (I am looking for tags operator)
See what TW returns (absolutely confusing)

This issue has already been raised by others!

How can we make standard search smarter?

Side note: I am aware of advanced search, custom search, and the many hacks one can do! I am looking to see how to make standard search a little bit smarter.

linonetwo · April 22, 2022, 6:30am

I think this is about optimizing ranking, and maybe additionally providing a preview of search results?

Maybe we can do an experiment with page rank, and manually assign importance to the tiddlers that we think matter most for newcomers.

TW_Tones · April 22, 2022, 7:19am

Mohammad,

Tags are central to TiddlyWiki, so it is no wonder you get a lot of results on the documentation site. One quickly learns to just add an additional word, or even part of one, e.g. tags ope. I often use something like send wid, short for send widget.

However, I have built, but not yet published, a tool I call “search indicators” that monitors the content of the search string and determines whether it represents a tag, a tiddler, or a prefix or suffix of existing tiddlers. An icon then appears indicating one or more of these facts about the search string; click it to open the match or trigger the appropriate “advanced search”.

I mention this because if we wanted to give precedence to tags and operators we could do something like that.

The existing mechanism is to add additional search results tabs by tagging a tiddler with $:/tags/SearchResults, so clone $:/core/ui/DefaultSearchResultList and modify it, e.g.:

$__core_ui_TagSearchResultList.json (1.2 KB)

Better yet, exclude system tiddlers and remove the 250 item limit.
SearchResultListTab-Tags.json (1.1 KB)

and system only tab SearchResultListTab-Tags-system.json (1.1 KB)

These could perhaps be limited only to a title search

TW_Tones · April 22, 2022, 7:27am

In the last 2 above I used tag pills for convenience, but I need to make the popup unique.

I will look for the fix

Tag pills are even more useful if you extend the tag pill dropdown. I need another fix so that close all and open all work when used in the search.

reimagin-tags.json (9.5 KB)

pmario · April 22, 2022, 7:49am

I think it should be possible to use a “sortby”-like function that sorts the results by how closely they resemble the search term. So if you search for “tag”, the titles that start with “tag” should be listed first.
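As a rough illustration of the "titles that start with the search term come first" idea, here is a minimal Python sketch. It is not TiddlyWiki code; the function name `prefix_first` and the sample titles are made up for the example.

```python
def prefix_first(titles, term):
    """Sort titles so those starting with the search term come first.

    Within each group the order is alphabetical, ignoring case.
    """
    needle = term.lower()
    return sorted(
        titles,
        # False sorts before True, so prefix matches lead the list.
        key=lambda t: (not t.lower().startswith(needle), t.lower()),
    )


titles = ["ActionSetFieldWidget", "Tagging", "Date Fields", "tag Operator", "SystemTags"]
print(prefix_first(titles, "tag"))
# ['tag Operator', 'Tagging', 'ActionSetFieldWidget', 'Date Fields', 'SystemTags']
```

A plain alphabetical sort would put ActionSetFieldWidget first; the extra boolean in the sort key is all that changes.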

I think that would be worth some tests. … In a second run, we only need to make sure, that “important” search terms have a corresponding tiddler.

Mohammad · April 22, 2022, 8:58am

Thank you all! Yes I think, standard search in core needs some love! It can be improved!
I like its simplicity and wish to keep it simple but make it smarter!
Ranking is a good suggestion!

pmario · April 22, 2022, 1:00pm

I think manual ranking is a lot of work and very “opinionated”. We don’t really know, what is important for users. IMO it always depends on the current usecase the user wants to solve.

As Mohammad pointed out, his search text was “tags”. IMO the results in the list were there with 3 letters typed (“tag”), but they were sorted at the “bottom” of the list.

pmario · April 22, 2022, 2:26pm

I did a bit of testing with some common search strings for TiddlyWiki. eg: field, tag, title, add, delete, save, tiddler, widget … and so on

There seems to be a pattern.

Usually we do have titles that start with the search term, but they are alphabetically sorted, so they end up low in the list, mainly because of the “Action*” widgets
There are some tiddlers where the search term is the 2nd or 3rd word in the title, eg: “Date Fields”
There are tiddlers where the search term is part of a combined title, eg: TiddlerFields, ActionSetFieldWidget

Using this info would allow us to create some kind of a ranking mechanism based on tiddler titles
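The three title patterns above could be told apart roughly like this. It is only a Python sketch to make the idea concrete; the function name `classify`, the category labels, and the CamelCase regular expression are assumptions, not anything from the core.

```python
import re


def classify(title, term):
    """Return how the search term appears in the title, or None.

    'starts-with' : the title begins with the term
    'later-word'  : a later space-separated word begins with the term
    'combined'    : a CamelCase part of a combined title begins with the term
    """
    needle = term.lower()
    if title.lower().startswith(needle):
        return "starts-with"
    spaced = title.lower().split()
    if any(word.startswith(needle) for word in spaced[1:]):
        return "later-word"
    # Break "ActionSetFieldWidget" into Action, Set, Field, Widget (ASCII only).
    humps = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", title)
    if any(h.lower().startswith(needle) for h in humps):
        return "combined"
    return None


for title in ["Fields", "Date Fields", "TiddlerFields", "ActionSetFieldWidget", "HelloThere"]:
    print(title, "->", classify(title, "field"))
```

A ranking could then simply order the three categories, with plain alphabetical order inside each one.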

In a 2nd or 3rd step we could do a full text search and count the “search term” in the body text. … Similar to the second list in the dropdown. BUT I would want to combine that info in 1 list

In a 3rd or 2nd step we could search the “tags” if it contains the search term …

With all that info combined it may be possible to create an “overall ranking” based on an algorithm.

Just some brainstorming
-mario

pmario · April 22, 2022, 2:33pm

I think some kind of highlighting the search term in combination with a preview may also go a long way already. So users can “skim” the results much faster and decide, what’s important for them.
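A preview with the search term highlighted can be quite small. The Python sketch below is only an illustration; the function name `preview`, the `radius` parameter, and the `**` marker are assumptions (a real TiddlyWiki version would use its own markup).

```python
import re


def preview(text, term, radius=30, mark="**"):
    """Return a short snippet around the first match, with every match
    inside the snippet wrapped in `mark`. Returns None if nothing matches."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return None
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    snippet = text[start:end]
    # Keep the original casing of the matched text.
    return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", snippet)


print(preview("Use the tags operator to list tags.", "tags", radius=8))
# Use the **tags** operato
```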

Mohammad · April 22, 2022, 5:13pm

I like the Mario solution very much and I think that can make search much more usable.

Alvaro · April 22, 2022, 10:37pm

I think using the tags, which are already in the documentation, can be useful to sort the list better. We could use some of the “top level” tags to create a first rank. For example, the tiddlers that are tagged with Reference:

Concepts
Definitions
WikiText
Macros
Variables
SystemTags
Widgets
Filters
Messages
Commands
Mechanisms
Developers

pmario · April 24, 2022, 10:57am

Sorry — It’s not a solution. Just some thoughts about a “rateable” structure, the content at tiddlywiki.com already has … It’s an “opinionated” view about that structure.

eg: User titles may have a completely different structure than we use atm at tw-com.

Using “search term count” in “tags” and the “body” text may be a bit more generic.

I also think it’s a bit of a chicken / egg problem here. User titles are basically “free form”.

If there were a “rating” algorithm that gives “better” results, then user-created titles may (will) change to fit the way the algorithm works. …

Since the algorithm is created by an “opinionated programmer” there is a “backside of the coin”. … Users will be “forced” to adopt this opinion — for good or bad

I was thinking about something like this:

rating point results add up (cumulative)
“combined” words are split on “word boundary”
- eg: ActionSetFieldWidget … action set field widget
- my/title/looks/like/this … my title looks like this (so the “boundary” may be configurable)
searchTerm rating points = 100 / position-in-title
- so word #1 = 100 points, word #3 = 33 points … and so on
- “stop words” like: at, in, on, and, or … may be counted in phase 1, but removed in phase 2
searchTerm in tags get 50 points
- “full” and “partial” match may be considered in phase 2
searchTerm in “defined field” gets 50 points
- This gives us the possibility to use fields like: keywords instead of tags to improve rating
searchTerm as “full word match” in text: rating points = number of occurrences * 10
- To make it faster only eg: the first X words may be used
- X may be 100 by default. If set to 0 … use all words. …
- search terms shorter than Y may be ignored
  - Y = 5 as default
  - This gives us the possibility, that “full text search” kicks in later in the “process”

The numbers may be tweaked or may be configurable, to give the user more control.
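To make the point scheme above easier to discuss, here is a minimal Python sketch of it. It is not core code. The names `Weights`, `words_of` and `rate` are made up, tiddlers are plain dicts, and several details are assumptions: title words match by prefix, tags and the extra field match by substring, and stop words (phase 2) are left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    title: int = 100           # divided by the word position in the title
    tag: int = 50              # search term found in any tag
    extra_field: int = 50      # search term found in the "defined field"
    text: int = 10             # per full-word occurrence in the body
    max_words: int = 100       # X: body words to inspect, 0 means all
    min_text_term: int = 5     # Y: shorter terms skip the body scan
    rated_field: str = "keywords"


def words_of(title):
    """Split on spaces and slashes, then on CamelCase humps (ASCII only)."""
    words = []
    for part in re.split(r"[\s/]+", title):
        words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
    return [w.lower() for w in words]


def rate(tiddler, term, w=Weights()):
    """Add up the rating points for one tiddler (cumulative)."""
    needle = term.lower()
    score = 0
    for position, word in enumerate(words_of(tiddler.get("title", "")), start=1):
        if word.startswith(needle):
            score += w.title // position      # word #1 = 100, word #3 = 33
    if any(needle in tag.lower() for tag in tiddler.get("tags", [])):
        score += w.tag
    if needle in tiddler.get(w.rated_field, "").lower():
        score += w.extra_field
    if len(needle) >= w.min_text_term:        # full text kicks in later
        body = re.findall(r"\w+", tiddler.get("text", "").lower())
        if w.max_words:
            body = body[: w.max_words]
        score += w.text * body.count(needle)
    return score


widget = {"title": "ActionSetFieldWidget", "tags": ["Widgets"],
          "text": "Sets a field on a tiddler"}
dates = {"title": "Date Fields", "tags": ["Concepts"], "keywords": "field date"}
print(rate(widget, "field"))  # 43  (33 for title word #3, 10 for one body match)
print(rate(dates, "field"))   # 100 (50 for title word #2, 50 for keywords)
```

Passing a different `Weights(...)` instance is one way the numbers could be made configurable.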

I think this algorithm is already very complicated. … But since the existing search results already do 2 full iterations to display title and full-text results the performance should still be OK.

No promises, just some brainstorming

-mario

Alvaro · April 24, 2022, 3:40pm

I like it.

The tags and keywords in searchTerm might be a double-edged sword.

The first is better than the second, but it would require adding keywords to every documentation tiddler.
The second (in searchTerm) maybe isn’t good for rating. In the case presented by @Mohammad, the tiddler SystemTags would get fewer tag points than the tiddlers tagged with SystemTags. This is why I talked about “top level” tags; I didn’t explain myself very well. In the documentation there is something like a tag tree in the TOC, so it is possible to give more points to “top level” (= general/generic) tags. It is also possible to rebuild the tree/table of tag points dynamically.

pmario · April 25, 2022, 7:31am

I think you were clear. … The problem is that dynamically evaluating the “toc tree” is expensive in terms of CPU cycles. So it would need some type of indexing / caching to be fast.
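One possible shape for such a cache, sketched in Python: walk the tag tree once, store the points per tag in a table, and rebuild the table only when tags change. The function name `tag_points`, the halving-by-depth rule, and the toy tree are assumptions for illustration, not the real tiddlywiki.com structure.

```python
from collections import deque


def tag_points(children, root, top=50):
    """Walk the tag tree breadth-first once and return {tag: points}.

    Direct children of `root` get `top` points; deeper levels get
    top // depth. The result is the cached table to look up while rating.
    """
    points = {}
    queue = deque((child, 1) for child in children.get(root, []))
    while queue:
        tag, depth = queue.popleft()
        if tag in points:
            continue  # already reached by a shorter path; also stops cycles
        points[tag] = top // depth
        queue.extend((c, depth + 1) for c in children.get(tag, []))
    return points


# Toy tree: parent title -> titles tagged with it.
children = {
    "Reference": ["Concepts", "SystemTags", "Widgets"],
    "Widgets": ["ActionSetFieldWidget"],
}
print(tag_points(children, "Reference"))
# {'Concepts': 50, 'SystemTags': 50, 'Widgets': 50, 'ActionSetFieldWidget': 25}
```

The expensive part is the walk, so the table would have to be invalidated whenever a tiddler's tags change.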

Some more thought will be needed.

pmario · April 25, 2022, 9:39am

I did have a closer look at the core “search code”. It’s part of the wiki-store and it is regexp based. … Internally – if a regexp function “matches” a search term, there is a lot of “structural information” (context), that we “throw away” at the moment.

I think, we can store this contextual info, and make it available for filter functions. … So in theory the existing (and a bit of new ; ) “search code” should be able to create enough information to be able to create the following filter.

It could be: [!is[system]search:title<userInput>sortsub<rank>limit[250]]

The existing default search filter is: [!is[system]search:title<userInput>sort[title]limit[250]]

So the sortsub operator could use a new <<rank>> macro. <<rank>> should “only” be needed to make the info available, that search operator and its flags could create.

So potentially “expensive” tiddler processing should only be needed once.
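The "keep the context, sort later" idea can be shown with a small Python sketch. It is not the wiki-store code: `search_titles` and `ranked` are made-up names, and the only context kept here is the match offset, standing in for whatever a `<<rank>>` macro might read.

```python
import re


def search_titles(titles, term):
    """Return {title: match_start} for every title containing the term.

    The match offset is the kind of context that is normally thrown away.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    context = {}
    for title in titles:
        match = pattern.search(title)
        if match:
            context[title] = match.start()
    return context


def ranked(context):
    """Sort by the stored offset, then by title; no second scan is needed."""
    return sorted(context, key=lambda t: (context[t], t.lower()))


titles = ["ActionSetFieldWidget", "Date Fields", "Fields", "TiddlerFields", "HelloThere"]
hits = search_titles(titles, "field")
print(ranked(hits))
# ['Fields', 'Date Fields', 'TiddlerFields', 'ActionSetFieldWidget']
```

The regexp runs once per title during the search; the sort step only reads the stored numbers.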

At least, that’s the idea. … I’ll create a GitHub issue with more “dev speech” soon.

TiddlyTitch · April 25, 2022, 10:27am

Very true!

As a person who loves regular expressions I sometimes wonder if the solution here is teaching best use of regex?

Obviously TW is more than regex. But I think it a valid question to ask?

Just a query, TT.

TW_Tones · June 10, 2022, 11:12am

After revisiting this subject I had a thought: rather than trying to rate all tiddlers, what if the author of the wiki could indicate key terms, perhaps with a tag “key terms”, such as those in @pmario’s test (field, tag, title, add, delete, save, tiddler, widget), and the very top of the search results first searches those key terms? This would help with any wiki where a subset of terms is found everywhere and yet is also a common subject of searches. It would allow the author to “separate the wheat from the chaff”; optionally we could also set a ranking on these.

This method would allow tiddlywiki.com to separate the tiddlers about key terms from their use elsewhere and satisfy the OT, yet also help others address the same problem when it arises in their own wiki.
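In Python terms the key-terms idea might look like the sketch below. It is only an illustration: the function name `search_with_key_terms`, the dict-based tiddlers, and the sample data are assumptions, and the optional ranking among key terms is left out.

```python
def search_with_key_terms(tiddlers, term, key_tag="key term"):
    """Title search that lists tiddlers tagged `key_tag` before the rest."""
    needle = term.lower()
    hits = [t for t in tiddlers if needle in t["title"].lower()]
    key = sorted((t["title"] for t in hits if key_tag in t.get("tags", [])),
                 key=str.lower)
    rest = sorted((t["title"] for t in hits if key_tag not in t.get("tags", [])),
                  key=str.lower)
    return key + rest


tiddlers = [
    {"title": "ActionSetFieldWidget", "tags": ["Widgets"]},
    {"title": "TiddlerFields", "tags": ["key term"]},
    {"title": "Date Fields", "tags": []},
    {"title": "HelloThere"},
]
print(search_with_key_terms(tiddlers, "field"))
# ['TiddlerFields', 'ActionSetFieldWidget', 'Date Fields']
```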

Here is a hacked example search key terms.json (419 Bytes)

Tag some tiddlers as “key term”
Interestingly, when you go to do this you find we do not necessarily have a tiddler that provides guidance on such key terms.
I would like to see these additional tabs under search to not even display if there were no matches.

TW_Tones · April 27, 2023, 12:48pm

I would suggest that rating and promoting are helpful, but if the information is not there, it is still not visible.

I have seen dozens of gaps in information we also need to fill. One day I will package some.