Search Operator: Astonished to discover

… that it doesn’t quite do what it says on the tin:

words: (the default) treats the search string as a list of tokens separated by whitespace, and matches if all of the tokens appear in the string (regardless of ordering and whether there is other text in between)

With my emphasis

words: (the default) treats the search string as a list of tokens separated by whitespace, and matches if all of the tokens appear in the string […]

I don’t think that’s actually what it does. It seems to treat “tokens” (i.e. words) as plain runs of characters, not even bounded by word boundaries.

Try this on TW.com

tiddler: test search
test: one-two one-two-three 
text: {{{ [is[current]search:test[one]] }}}

Since “one” is not a “token separated by whitespace”, search should not write anything to the output.

This, just for fun…

tiddler: test search
test: Was this really what you had imagined mate?
text: {{{ [is[current]search:test[his hat was a real mat]] }}}

Clearly, whitespace has no bearing on the matter at all. (Couple hours of my life I won’t get back.)

If I can remember how, I’ll submit a PR for the docs – at the very least, I’ll try to make it less misleading.

That’s not true. The search term should be found in any string, even if it is in the middle of the text string.

The parameter “word” does tell you about the search input. It does not tell you anything about the content of the tiddler.

Still, @CodaCoder: if you were confused by this, others likely are as well, and I encourage you to write the text for a docs update. If you have problems with the submission process, someone can help you through it or do it on your behalf.


If it’s not clear, what the docs call “the search string” is not the text we’re searching through, but the text we’re looking to find. Thus {{{ [is[current]search:test[one-three]] }}} would have no matches, since there are no instances of one-three in one-two one-two-three, but {{{ [is[current]search:test[one three]] }}} would match because the two strings one and three are there.
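As a rough illustration of the behavior just described, here is a minimal plain-JavaScript sketch of “words” mode. It is not TiddlyWiki’s actual implementation: `wordsSearch` is a hypothetical name, and the case-insensitive comparison assumes the operator’s default (I believe case-sensitive matching needs the `casesensitive` flag).

```javascript
// Hypothetical sketch of the search operator's default "words" mode.
// Only the search string (the text we look FOR) is split on whitespace;
// the text being searched is never tokenized.
function wordsSearch(searchString, text) {
  const tokens = searchString.trim().split(/\s+/).filter(t => t.length > 0);
  const haystack = text.toLowerCase(); // assumption: default case-insensitive match
  // Every token must appear somewhere in the text, in any order,
  // with anything (or nothing) in between.
  return tokens.every(token => haystack.includes(token.toLowerCase()));
}

console.log(wordsSearch("one three", "one-two one-two-three")); // true
console.log(wordsSearch("one-three", "one-two one-two-three")); // false
console.log(wordsSearch("his hat was a real mat",
  "Was this really what you had imagined mate?"));              // true
```

The last call reproduces the “just for fun” example: every token (“his”, “hat”, “real”, “mat”, …) is found as a fragment of a longer word.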

It’s easy enough to see how someone might be confused by this. Do you have a suggested wording that’s clearer?

I stand by my statement. search does not treat the parameter as a list of tokens separated by whitespace – whitespace has no bearing on the functionality – it simply doesn’t care. Any old sequence of characters would be more accurate and much more correct.

There is no parameter “word”. I’m not talking about the input. But I think you know that…

Clearly. And nobody said it did. I think you think I’m confused by the output specification – I want the output, that’s why I’m using it. What I’m surprised about is the fact that the output doesn’t conform to the written specification for its input (where spec == the docs). Clear now?

I wasn’t confused, Scott, I was surprised to discover (hence the topic title) the docs were so misleading. Like I said above, whitespace has no bearing on the matter (excluding the notional ^ and $ endpoints). I wasn’t expecting it to ignore what it states so clearly in the docs.

Clearly, but I maintain the following is not what anyone would expect given the current wording:


tiddler: test search
test1: This is a test statement where whitespace is included
test2: Thisisateststatementwherewhitespaceisnotincluded
text:
  {{{ [is[current]search:test1[his test is in space]] }}}
  {{{ [is[current]search:test2[his snot is here in space]] }}}


See my response above. Basically, avoid mentioning whitespace and “tokens”. It simply doesn’t care.

Looking back over my previous usage of search, it’s a case where conforming data (data that conforms to how the tool is documented) doesn’t reveal how the code underneath is actually written to work. When supplied with non-conforming data, all is revealed. It’s just matching char-sequences – which is nothing like what the docs would have you (me) believe.

Now, contains, on the other hand, would make sense behaving like this but it doesn’t – and in my case, I need the regexp suffix on search. But that’s an irrelevance here…

Thanks. I’m gonna move on…

I don’t quite follow you. The search operator in “word” mode does definitely split the parameter into separate tokens delimited by whitespace.

This might be the crux of the matter. When we interpret the string one as a “list of tokens separated by whitespace” then we get a single token as the result: one. Interpreting the string one two three in the same way yields a list of three tokens one, two, three.

It’s hard to understand the intent of your examples because it’s not clear what result they produce and what result was expected.

Agreed (I think). But onetwothree also returns one as a match, as does net, wot and etw ← no whitespace.

Here’s what “broke” for me (simplified):

(tiddler 1 field) labels: street highway city no-people no-cars

another:

(tiddler 2 field) labels: airport people cars jets

search:labels[people]

That’s cutting it back to the bare bones. It found people for both tiddler 1 and 2. In tiddler 1, people is not surrounded by whitespace.
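The gap between what the docs seemed to promise and what actually happens can be shown with the two `labels` values above. This is a plain-JavaScript sketch, not TiddlyWiki code, and `asSubstring` / `asToken` are hypothetical helpers: a substring test (how the search operator treats the text being searched) matches both tiddlers, while a test against whitespace-separated tokens of the field matches only tiddler 2.

```javascript
// The two labels fields from the example above.
const labels1 = "street highway city no-people no-cars";
const labels2 = "airport people cars jets";

// Substring test: the searched text is treated as one raw string.
const asSubstring = (field, term) => field.includes(term);

// Whole-token test: split the FIELD on whitespace and compare whole tokens.
const asToken = (field, term) => field.split(/\s+/).includes(term);

console.log(asSubstring(labels1, "people"), asSubstring(labels2, "people")); // true true
console.log(asToken(labels1, "people"), asToken(labels2, "people"));         // false true
```

Both helpers are case-sensitive to keep the comparison simple.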

Do you mean that [search[onetwothree]] matches fields that contain the text net?

Yes. Repeat…


I think now you’re getting there… 🙂

And Jeremy, what I actually wrote was (note the hyphens)

I think @Scott_Sauyet is right: it sounds like you’re reading the docs as if the comment about splitting into tokens applies to the text being searched. It does not; it applies to the text that is being searched for; the target text. So it seems that the confusion arises because the phrase “search string” in the docs is ambiguous.

Yes, and I agree that this should be clarified. I just don’t have a better wording in mind yet.

So you really are saying people should be found when the field contains no-people?

Yes it should. We don’t tokenize "no-people".

Think about it like this:

--  Searching for "one three" in "one-two one-two-three".
    -----------------------------------------------------
-- First split "one three" into the tokens by whitespace, "one" and "three"

-- Now check if each of these tokens appears anywhere in "one-two one-two-three"

  "one": "one-two one-two-three"
          ^^^                    -- Found
"three": "one-two one-two-three"
                          ^^^^^  -- Found

-- Yes they both do, so the current title is returned.

But now let’s try to find “two one-three”:

--  Searching for "two one-three" in "one-two one-two-three".
    ----------------------------------------------------------
-- First split "two one-three" into the tokens by whitespace, "two" and "one-three"
-- Now check if each of these tokens appears anywhere in "one-two one-two-three"

      "two": "one-two one-two-three"
                  ^^^             -- Found
"one-three": "one-two one-two-three"
                                  -- Not Found

-- No, one of them doesn't, so nothing is returned.

Note that we never try to tokenize the string “one-two one-two-three”. That’s just the raw text we’re searching through.
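The same walk-through can be written as a short JavaScript sketch that reports, for each token, the position where it is found (the `^^^` markers) or -1 when it is missing. `traceSearch` is a hypothetical helper for illustration only; it is case-sensitive for simplicity and is not how TiddlyWiki implements the operator.

```javascript
// Hypothetical helper: split the search-for string on whitespace, then look
// each token up in the raw, untokenized text with indexOf.
function traceSearch(searchFor, text) {
  const tokens = searchFor.trim().split(/\s+/).filter(t => t.length > 0);
  const positions = tokens.map(token => ({ token, index: text.indexOf(token) }));
  // The title is returned only if every token was found somewhere.
  const matched = positions.every(p => p.index !== -1);
  return { positions, matched };
}

const text = "one-two one-two-three";
console.log(traceSearch("one three", text));
// "one" at 0, "three" at 16, matched: true
console.log(traceSearch("two one-three", text));
// "two" at 4, "one-three" at -1, matched: false
```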

Thanks, Scott.

Guess I’ll have to write my own, then.

There is a flag “words” in the docs https://tiddlywiki.com/#search%20Operator – I thought you were referring to that one.

So can you explain what you would like? I’m guessing most similar requirements would be relatively easy, but some would be more time-intensive than others.

Are you trying to find if one or several words are contained in a tokenized list of words from the search target? Something like this?:

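// Split both strings into whitespace-separated words (trimming first so that
// leading/trailing spaces don't produce empty tokens), then require every
// query word to appear as a whole word of the text.
const codaSearch = (query, text, words = text.trim().split(/\s+/)) =>
  query.trim().split(/\s+/).every(w => words.includes(w))

codaSearch('one two', "three two one") //=> true
codaSearch('one two', "done artwork")  //=> false, even though
                                       //   the words are there as raw text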

I haven’t looked to see if there are affordances to add your own search modes, but it would be easy enough to override the search tiddler to add this capability.

Like this

I’ll rework the code to use that.

But this, in my case at least…

Note that the regexp option obviates the need for the older regexp operator.

is not the case. Maybe that’s the only text that “needs” to be changed.

Thoughts?

Ha! I spent more than a few minutes trying to think of a word with “two” embedded.

Yes… Sometimes a really intuitive example can make the concern clearer:

If I’m searching for “mental” (as in mental health), I can be overwhelmed if my results list is littered with tiddlers that contain developmental and environmental and elemental etc.

Looking only for whole words (i.e., seeking results in which the search string appears as a whole word) is certainly something an ordinary user might want, without needing to dive into regexp!
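For the “mental” case, a whole-word match can be approximated in plain JavaScript with a regular expression and `\b` word boundaries. This is a hedged sketch, not an existing TiddlyWiki flag; `wholeWordSearch` and `escapeRegExp` are hypothetical helpers. Note that `\b` treats a hyphen as a boundary, so `people` still matches inside `no-people`; splitting on whitespace, as `codaSearch` does, is stricter.

```javascript
// Escape characters that have special meaning in a regular expression,
// so the search term is matched literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: every whitespace-separated token of the query must
// appear in the text as a whole word (bounded by \b), ignoring case.
function wholeWordSearch(query, text) {
  const tokens = query.trim().split(/\s+/).filter(t => t.length > 0);
  return tokens.every(token =>
    new RegExp("\\b" + escapeRegExp(token) + "\\b", "i").test(text));
}

console.log(wholeWordSearch("mental", "Notes on mental health"));          // true
console.log(wholeWordSearch("mental", "developmental and environmental")); // false
console.log(wholeWordSearch("people", "no-people no-cars"));               // true: "-" is a boundary
```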
