Extracting sentences from tiddlers if they contain specific words

goldtiddler · July 6, 2025, 8:54pm

I’m looking for a \procedure to extract sentences from tiddlers if the sentence contains any word from a specific list. The tiddlers of interest are to be selected using tags.

Tiddler1 (tags : nature): The cat sat on the mat. The quick brown fox.
Tiddler2 (tags : nature) : The bird sat on a branch, eating a worm. The day was sunny.
Tiddler3 (tags : astronomy) : The Earth orbits the Sun. Mercury is a planet in the solar system.

If the words of interest are [quick,Mercury, day] and tags of interest is [nature], then the output should be:

Tiddler1: The quick brown fox.
Tiddler2: The day was sunny.

Any suggestions are appreciated.

sukima · July 6, 2025, 9:51pm

You could go so far as to program your own filter in JavaScript, but personally that seems like overkill. For me it would be easier to copy/paste. But if you really want automation, you could put the sentences of interest in a field and then make a tiddler that renders only that field for the filtered tag.

twMat · July 6, 2025, 10:05pm

Basically…

…first filter for all tiddlers that have the desired tags and contain the desired words.

Thereafter split the texts into individual sentences - this is where tricky edge cases come in, i.e. how you define a sentence - and then just search each sentence for the desired words.

If you can define what constitutes a sentence then I don’t think it should be overly difficult.
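
As a rough illustration of those two steps, here is a minimal JavaScript sketch run against the example from the first post. It is only a sketch under stated assumptions: the `tiddlers` array is a hypothetical in-memory stand-in for the wiki, a sentence is naively taken to be any text up to and including `.`, `!` or `?`, and words are matched as plain case-sensitive substrings.

```javascript
// Hypothetical stand-ins for tiddlers; a real wiki would supply these.
const tiddlers = [
  { title: "Tiddler1", tags: ["nature"], text: "The cat sat on the mat. The quick brown fox." },
  { title: "Tiddler2", tags: ["nature"], text: "The bird sat on a branch, eating a worm. The day was sunny." },
  { title: "Tiddler3", tags: ["astronomy"], text: "The Earth orbits the Sun. Mercury is a planet in the solar system." }
];

// Naive sentence definition (an assumption): text up to and including . ! or ?
// Trailing text without a terminator is dropped by this pattern.
function splitSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]+/g) || [];
  return matches.map(s => s.trim());
}

// True if the sentence contains any of the words as a substring.
function containsAnyWord(sentence, words) {
  return words.some(w => sentence.includes(w));
}

// Step 1: keep tiddlers carrying the tag. Step 2: keep matching sentences.
function extractSentences(allTiddlers, tag, words) {
  const result = {};
  for (const t of allTiddlers) {
    if (!t.tags.includes(tag)) continue;
    const hits = splitSentences(t.text).filter(s => containsAnyWord(s, words));
    if (hits.length > 0) result[t.title] = hits;
  }
  return result;
}

const found = extractSentences(tiddlers, "nature", ["quick", "Mercury", "day"]);
console.log(found);
```

Tiddler3 is skipped because of its tag, even though it contains "Mercury".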

clsturgeon · July 7, 2025, 1:22pm

As @twMat states, you need to define what constitutes a sentence. In TW we have too many structures to contend with… HTML, WikiText, bullet points, periods in the middle of sentences (such as ‘The quick brown fox startled Mrs. Emily Green.’). There is also the issue of compound words, or words within words… ‘holiday’ matches when you search for ‘day’.

In any event, I asked CoPilot for a JavaScript solution. It elected to extract HTML from the text and end a sentence with a period or a new line. It also addressed the embedded word issue. Otherwise, the other issues remain. I wrapped the method in a filter operator (called ‘sentences’). You will likely need to tweak it to your needs.
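
For readers who cannot open the attachment, the behaviour described above could look roughly like the following. This is a hypothetical sketch, not the contents of the attached plugin: the function names are made up, tags are removed with a crude regular expression rather than a real HTML parser, and matching is case-sensitive.

```javascript
// Crude tag removal (an assumption): anything between < and > becomes a space.
function stripTags(text) {
  return text.replace(/<[^>]*>/g, " ");
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return sentences that contain at least one of the words as a whole word.
// A sentence ends at a period or at a newline, as described in the post.
function sentencesWithWords(text, words) {
  if (words.length === 0) return [];
  const wordPattern = new RegExp("\\b(?:" + words.map(escapeRegExp).join("|") + ")\\b");
  const pieces = stripTags(text).match(/[^.\n]+\.?/g) || [];
  return pieces
    .map(s => s.replace(/\s+/g, " ").trim())   // collapse whitespace left by tag removal
    .filter(s => s.length > 0 && wordPattern.test(s));
}

const sample = "<p>Our holiday was long. The <b>day</b> was sunny.</p>\nA quick note";
console.log(sentencesWithWords(sample, ["quick", "Mercury", "day"]));
```

The `\b` word boundaries are what keep ‘holiday’ from matching ‘day’. Abbreviations such as ‘Mrs.’ would still split a sentence early.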

You need to write a procedure, but take care… for example, the following does not work well because of the embedded word issue. The first filter uses out-of-box TW filter operators and attempts to find the tiddlers while the second list filter finds the sentences (new operator to return sentences). This first example will find holiday to match with day when finding the tiddler, but then fail to find the sentence because holiday does not match.

<$list filter="[tag[nature]search:text:some[quick Mercury day]]"><$link to={{!!title}}>{{!!title}}</$link>:<br/>
<$list filter="[is[current]sentences[quick Mercury day]]">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{{!!title}}</$list></$list>
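
The mismatch between the two stages can be reproduced outside TiddlyWiki. In this small sketch the substring test is only an assumed stand-in for how the text search selects a tiddler, and the `\b` test stands in for the whole-word matching of the new operator; the sample text is hypothetical.

```javascript
// Hypothetical tiddler text: contains "holiday" but not the standalone word "day".
const holidayText = "We went on holiday. It rained all week.";
const target = "day";

// Stage 1 stand-in: a case-insensitive substring test selects the tiddler.
const tiddlerSelected = holidayText.toLowerCase().includes(target.toLowerCase());

// Stage 2 stand-in: a whole-word test applied to each sentence.
const wholeWord = new RegExp("\\b" + target + "\\b", "i");
const matchingSentences = holidayText
  .split(/(?<=\.)\s+/)            // split after each period
  .filter(s => wholeWord.test(s));

console.log(tiddlerSelected, matchingSentences.length);
```

The tiddler is selected but no sentence matches, so its title would be listed with nothing underneath.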

So, instead, I added another filter operator, ‘tiddlersentences’, to find the tiddlers: this needs a better name.

<$list filter="[tag[nature]tiddlersentences[quick Mercury day]]"><$link to={{!!title}}>{{!!title}}</$link>:<br/>
<$list filter="[is[current]sentences[quick Mercury day]]">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{{!!title}}<br/></$list></$list>

Warning… I did very little testing…

$__plugins_cls_mk_filters_matchsentences.js.json (2.8 KB)

Import the json file, save and reload… and test.

TiddlyTitch · July 7, 2025, 2:12pm

Formal literary definition: a text string that follows either null or stop + whitespace(s); starts with an alphanumeric; and finishes with a stop (+ space(s)) directly adjacent to a final alphanumeric.

That formulation can be made into a regular expression.
What doesn’t it match?
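
One possible reading of that definition as a JavaScript regular expression, offered as a sketch rather than a definitive translation. It assumes a stop is only `.`, an alphanumeric is an ASCII letter or digit, and an engine with lookbehind support; the names are made up.

```javascript
// (?<=^|\.\s+)            preceded by null (start of text) or stop + whitespace(s)
// [A-Za-z0-9]             starts with an alphanumeric
// (?:[^.]*[A-Za-z0-9])?   body without stops, ending in a final alphanumeric
// \.(?=\s|$)              a stop, followed by whitespace or the end of the text
const FORMAL_SENTENCE = /(?<=^|\.\s+)[A-Za-z0-9](?:[^.]*[A-Za-z0-9])?\.(?=\s|$)/g;

function formalSentences(text) {
  return text.match(FORMAL_SENTENCE) || [];
}

console.log(formalSentences("The cat sat on the mat. The quick brown fox."));
console.log(formalSentences("The quick brown fox startled Mrs. Emily Green."));
```

The first call yields the two expected sentences. The second splits after ‘Mrs.’, giving ‘The quick brown fox startled Mrs.’ and ‘Emily Green.’, which is one answer to the question of what it gets wrong.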

TiddlyTitch · July 7, 2025, 2:32pm

Another way is … "all after a null or stop (exclude) that starts with a not-whitespace until a stop (include)."
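
This looser wording might be sketched as below. The assumptions: a stop is only `.`, and the starting character may be anything that is neither whitespace nor a stop. Because each match consumes its closing stop, the next match naturally begins after it.

```javascript
// [^\s.]   first character: not whitespace (and, by assumption, not a stop)
// [^.]*    everything up to the next stop
// \.       the stop itself is included in the match
const LOOSE_SENTENCE = /[^\s.][^.]*\./g;

function looseSentences(text) {
  return text.match(LOOSE_SENTENCE) || [];
}

console.log(looseSentences("$5 is cheap. 'Tis fine."));
```

Unlike the alphanumeric-start formulation, this accepts sentences that open with a symbol or a quotation mark.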

TiddlyTitch · July 7, 2025, 2:41pm

A final ellipsis, written either as three stops (...) or as the single ellipsis character (… = U+2026), is a possible complication.
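
One way to cope, sketched under the assumption that an ellipsis always closes a sentence: treat three stops or U+2026 as a single terminator, tried before the plain stop. Whether a mid-sentence ellipsis should really end a sentence is left open.

```javascript
// Terminator alternatives are tried in order: "..." first, then U+2026, then a single stop.
const SENTENCE_WITH_ELLIPSIS = /[^\s.…][^.…]*(?:\.{3}|…|\.)/g;

function sentencesWithEllipsis(text) {
  return text.match(SENTENCE_WITH_ELLIPSIS) || [];
}

console.log(sentencesWithEllipsis("He waited... Then left. And so on…"));
```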

etardiff · July 7, 2025, 3:51pm

The example given in the same paragraph you quoted:

TiddlyTitch · July 7, 2025, 4:31pm

“That is exactly what I told Donald Jnr.”
“Ah, Donald Jnr., should one exclude him from an ending?”

TiddlyTitch · July 7, 2025, 4:49pm

Mr., Ms., Dr., A.P.A., etc. could be exceptions if needed …
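
A sketch of such an exception list: split naively on stops, then glue a piece back onto the next one whenever it ends in a listed abbreviation. The list itself is an assumption to be adapted per wiki, the pieces are rejoined with a single space, and multi-stop abbreviations such as A.P.A. are not handled.

```javascript
// Hypothetical exception list; extend it to suit the wiki's vocabulary.
const ABBREVIATIONS = ["Mr.", "Mrs.", "Ms.", "Dr.", "Jnr.", "etc."];

function splitWithAbbreviations(text, abbreviations = ABBREVIATIONS) {
  // Naive pieces: from a non-whitespace, non-stop character up to the next stop.
  const pieces = text.match(/[^\s.][^.]*\./g) || [];
  const sentences = [];
  let pending = "";
  for (const piece of pieces) {
    pending = pending ? pending + " " + piece : piece;
    const lastWord = pending.split(/\s+/).pop();
    // Only close the sentence when it does not end in a known abbreviation.
    if (!abbreviations.includes(lastWord)) {
      sentences.push(pending);
      pending = "";
    }
  }
  if (pending) sentences.push(pending);   // text that ends on an abbreviation
  return sentences;
}

console.log(splitWithAbbreviations("The quick brown fox startled Mrs. Emily Green. The day was sunny."));
```

The ‘Donald Jnr.’ case stays ambiguous: when a sentence genuinely ends on an abbreviation and more text follows, this sketch merges it with the next sentence.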

I’d guess the utility of a working (practical) approach will heavily depend on the specific lingo of thy wiki??