How to Solve Regex Issues

jypre · January 28, 2022, 1:11pm

@TiddlyTitch If you can’t forge that criminal regexp (complete or incomplete), then this indicates a very low probability of occurrence. This is akin to a nuclear war: no one has to manage that kind of risk. You just have to live with it. And pray to your gods if you so wish.

jypre · January 28, 2022, 1:23pm

@CodaCoder I don’t see what you mean with your double arrows.

And as far as “with a button inside the hook” is concerned, I’m still not getting your clever idea. Regular management of input data ends up in tiddler fields (or indexes), which trigger automatic updates of the tiddlers that use them. So far I have only used this to write HTML. Are you saying you use this to launch further macros?

Something like this perhaps:

<$set name=nextStep filter="[[data]get[flag]match[goForIt]then[celebrate]]">
<$macrocall $name=<<nextStep>> first=<<userData>> second=<<yetAnotherData>>/>
</$set>

And would the call to the //celebrate// macro also be in the context of a button click? Otherwise, //celebrate// would not be able to have any side effect (other than generating output).

Mohammad · January 28, 2022, 2:31pm

I believe what makes TiddlyWiki crash is not the regex pattern entered in Advanced Search, as that is a non-destructive operation! Rather, the number of results produced and the time needed to render them may cause it, so if you have unsaved data, it will be lost! This freezing has happened to me many times when writing a script in an ordinary tiddler with the preview open!
I learned to just click the save button before such an operation!

CodaCoder · January 28, 2022, 2:34pm

I tried to convey this:

1. The live code reads the content of tiddler:field
2. The user types into input
3. The user clicks button
4. Button copies input into tiddler:field
5. Go to 1.
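
A minimal JavaScript sketch of that loop, outside TiddlyWiki. `CommittedField` and its method names are hypothetical; the only point is that the live code reads the committed value, never the text still being typed.

```javascript
// Hypothetical model of the input -> button -> tiddler:field loop.
class CommittedField {
  constructor(initial = "") {
    this.draft = initial;      // what the user is typing
    this.committed = initial;  // what the live code reads
  }
  type(text) {
    this.draft = text;         // typing never touches the committed value
  }
  commit() {
    this.committed = this.draft; // the button click copies the input into the field
    return this.committed;
  }
}

const field = new CommittedField("\\d+");
field.type("\\d{");                        // a half-typed, not yet valid pattern
const seenWhileTyping = field.committed;   // still the old pattern
field.type("\\d{2}");
field.commit();
const seenAfterClick = field.committed;    // the finished pattern
```

Because only `commit()` changes what the live code sees, a half-typed pattern is never evaluated.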

TiddlyTitch · January 28, 2022, 3:14pm

How would a new user know to do that? To “save before experiment?”

Part of the issue is that it is possible to use a regex in a way that locks the wiki. In my experiments that is rare in ordinary use. But it might be worth mentioning upfront in the docs for “Search In Fields”?

Just a comment
TT

CodaCoder · January 28, 2022, 4:05pm

And…

ignoring it
hoping it won’t happen
saying it can be worked around
or any other bloody-minded approach

is foolish to oneself and others. There. I said it.

Moving on…

@TiddlyTitch TT, I saw your message about trying to think of a hang-causing regex… one that gets me more often than not, is a completely empty regex. Yes, I know that can be easily worked around, usually. However, I’m struggling to find a nice optimal way to guard against this one.

I will find an optimum way, just having to live with it a while longer until it reveals itself.

Here’s how it surfaces…

It’s a setting in BK-TW. The setting is used in the (new to you) Writing Supervisor. The WS can be enabled/disabled in the UI. When active, common grammatical issues are highlighted in the preview pane, live, as you type in the editor.

The user can specify a set of HTML elements to be excluded from a certain “rule” within WS. Looks something like this:

\<(?!(bk-ann|bk-note|bk-problem|inset|interjection|
br|div|h1|h2|h3|hr|p|span|
|red|orange|blue|green|dull|done))[a-zA-Z0-9_\-]+>(?!\n\n)

Because this part of the regex is subsumed by the much larger WS ruleset regex, it’s not so easy to if-and-but it. Follow?

My current thinking is to disable WS completely if it’s blank – because if it is blank and the preview is open on a large tiddler (chapsec) maaan it can take ages to resurface the wiki (first move then, close the damned editor).

The specific TW code in use at the time it receives the empty setting is:

search-replace:g:regexp<var1>,<var2>search-replace:g:regexp<var3>,<var4>search-replace:g:regexp<var5>,<var6>(and so on for about a hundred or so times)

And only ONE of them might be empty to bring it all crashing to a halt. Like I said, I can’t if-and-but in the middle of that, so I think I’ll need to “if” at the very outset, unless…

someone knows a better way?
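
One plausible reason an empty rule is so costly, assuming the empty string ends up being handed to JavaScript's `RegExp` unchanged: an empty pattern matches the empty string at every position in the text. The sketch below shows that behaviour in plain JavaScript, plus a hypothetical `safePattern` guard that swaps in a pattern which can never match. Neither name is a TiddlyWiki API.

```javascript
// An empty pattern matches the empty string before every character and at the end.
const emptyRule = new RegExp("", "g");
const marked = "abc".replace(emptyRule, "*");

// "(?!)" is a negative lookahead on nothing, so it fails at every position.
const NEVER_MATCH = "(?!)";

// Hypothetical guard: replace a blank user setting with the never-matching pattern.
function safePattern(userPattern) {
  return userPattern.trim() === "" ? NEVER_MATCH : userPattern;
}

const untouched = "abc".replace(new RegExp(safePattern("   "), "g"), "*");
```

Here `marked` is `"*a*b*c*"` while `untouched` stays `"abc"`. On a large tiddler, one replacement per character position repeated across a long chain of rules adds up quickly.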

Mohammad · January 28, 2022, 4:22pm

As you suggested, I will add a recommendation to the “Search In Fields” docs.
But this is the way TiddlyWiki behaves! Searches are live!

TiddlyTitch · January 28, 2022, 4:34pm

Right! Sometimes the aliveness is troubling …

TiddlyTitch · January 28, 2022, 5:12pm

@Mark_S suggested that limiting the number of Tiddlers searched to, maybe, 150 could be a good idea whilst developing a regex.
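
As a rough illustration of that suggestion in plain JavaScript (not wikitext), the hypothetical `searchLimited` helper below tests a pattern against only the first 150 titles. The sample titles are made up.

```javascript
// Hypothetical helper: test a pattern against at most `limit` titles.
function searchLimited(titles, pattern, limit = 150) {
  const regex = new RegExp(pattern);
  return titles.slice(0, limit).filter((title) => regex.test(title));
}

// 1000 made-up titles; only "Tiddler 0" to "Tiddler 149" are examined.
const titles = Array.from({ length: 1000 }, (_, i) => "Tiddler " + i);
const hits = searchLimited(titles, "7$");
```

The cap bounds how many strings a pattern under development can touch. It does not bound the cost of one pathological match on a single long string.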

Regarding a general regex “watchdog”, I doubt it is a good idea or viable. Why? Though TW uses regex extensively, I mean majorly, I think that is uncommon. Most web pages never ever use it. Browser back-end support is good but limited. AFAIK there is virtually no serious backend error tracking for regex? Dunno.

A comment, TT

CodaCoder · January 28, 2022, 5:26pm

No, seems not.

There was a serious deep-dive on Github a while back – I searched recently and failed to find it. Maybe @jeremyruston can recall it and dig it out.

TT – did you see my post up top? You may have missed it due to the reshuffle to a new thread.

EricShulman · January 28, 2022, 5:41pm

Clearly the solution is to use the <$sedagive> widget…
Or… have Madeline Kahn sing to it

TiddlyTitch · January 28, 2022, 6:08pm

Dr Shulman, FYI, we applied a <<sedagive “to Abby Normal”>> …

TT

jeremyruston · January 28, 2022, 6:17pm

There is some work on safely executing user-supplied regular expressions as part of this JavaScript interpreter written in JavaScript:

https://neil.fraser.name/software/JS-Interpreter/docs.html

Scroll down to the section on regular expressions. There’s also a demo of the regexp functionality here:

https://neil.fraser.name/software/JS-Interpreter/demos/regexp.html

Based on that, I think the only way to safely execute arbitrary regular expressions would involve writing a new regexp engine in JS. The other alternative I’ve considered is to run the regexp operation in an iframe (or WebWorker) so that we can implement a timeout.
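
A minimal browser-side sketch of the WebWorker-with-timeout idea, not TiddlyWiki code. `testRegexWithTimeout` is a hypothetical helper. It relies only on the standard `Worker`, `Blob` and `URL.createObjectURL` browser APIs, and on `worker.terminate()` to stop a worker that is still busy.

```javascript
// Hypothetical helper: test `pattern` against `text` in a Web Worker and
// reject if no answer arrives within `timeoutMs` milliseconds.
function testRegexWithTimeout(pattern, flags, text, timeoutMs) {
  // The worker body is kept as source text so the sketch needs no extra file.
  const workerSource = `
    self.onmessage = (event) => {
      const { pattern, flags, text } = event.data;
      try {
        self.postMessage({ ok: true, matched: new RegExp(pattern, flags).test(text) });
      } catch (err) {
        self.postMessage({ ok: false, error: String(err) });
      }
    };
  `;
  return new Promise((resolve, reject) => {
    const url = URL.createObjectURL(new Blob([workerSource], { type: "text/javascript" }));
    const worker = new Worker(url);
    const cleanUp = () => {
      clearTimeout(timer);
      worker.terminate();        // also stops a regex that is still running
      URL.revokeObjectURL(url);
    };
    const timer = setTimeout(() => {
      cleanUp();
      reject(new Error("regex timed out after " + timeoutMs + " ms"));
    }, timeoutMs);
    worker.onmessage = (event) => {
      cleanUp();
      if (event.data.ok) resolve(event.data.matched);
      else reject(new Error(event.data.error));
    };
    worker.onerror = (event) => {
      cleanUp();
      reject(new Error(event.message));
    };
    worker.postMessage({ pattern, flags, text });
  });
}
```

With a backtracking-heavy pattern such as `(a+)+$` and a long run of `a` followed by `!`, a call like `testRegexWithTimeout("(a+)+$", "", "a".repeat(40) + "!", 500)` should reject instead of freezing the page. The result arrives asynchronously, so this would not drop straight into code that expects an immediate answer.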

TiddlyTitch · January 28, 2022, 6:30pm

Thank you for taking the time to comment at such length. It is useful.

In practice I’m unsure how much of an issue it is.

Certainly the more we expose TW to raw regex the more likely problems might occur. That said, I have found it difficult to devise any test case that reduces TW to a complete freeze via regex. Maybe that is because I know regex so can’t easily force errors? Maybe a new generation can?

Best, TT

CodaCoder · January 28, 2022, 6:47pm

The TW source was flagged as having an issue by Github itself. That’s “how much of an issue it is”.

But if you really want to dig into an issue, read this: How to Solve Regex Issues - #46 by CodaCoder

TW_Tones · January 29, 2022, 12:12am

Coming late to this thread I just want to add some practices of mine used when the result may be infinite.

Count and/or limit the filter before providing the user with an option to show all.
Add to the search trigger button a save wiki step, so one can safely reload.
Make it so on reload the search is not immediately actioned so I avoid a loop.
I have an edit recent tiddlers (inc system) list in the side bar, so I can open for edit a tiddler without first displaying it, so as to fix a failing tiddler.

However I wonder if we could have a widget that one can use to wrap a slab of wikitext that introduces a timeout or break to the code within it, one that exits gracefully. We could wrap any code at risk of large or infinitely recursive processes to have this additional protection.
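
A rough JavaScript sketch of the "exit gracefully" part of that idea, not an existing widget. `mapWithinBudget` is a hypothetical helper that processes items until a time budget is spent and then returns what it has so far.

```javascript
// Hypothetical helper: apply `fn` to each item until `budgetMs` has elapsed.
// `now` is injectable so the behaviour can be checked without real waiting.
function mapWithinBudget(items, fn, budgetMs, now = Date.now) {
  const deadline = now() + budgetMs;
  const results = [];
  for (const item of items) {
    if (now() > deadline) {
      return { results, complete: false }; // stop early, keep partial results
    }
    results.push(fn(item));
  }
  return { results, complete: true };
}

// Simulated clock: every item "costs" 10 ms.
let fakeTime = 0;
const outcome = mapWithinBudget(
  [1, 2, 3, 4],
  (n) => { fakeTime += 10; return n * 2; },
  15,
  () => fakeTime
);
```

Here `outcome` is `{ results: [2, 4], complete: false }`. The deadline is only checked between items, so a check like this cannot interrupt a single regex match that is already running away.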

TiddlyTitch · January 30, 2022, 3:59pm

CodaCoder:

\<(?!(bk-ann|bk-note|bk-problem|inset|interjection|
br|div|h1|h2|h3|hr|p|span|
|red|orange|blue|green|dull|done))[a-zA-Z0-9_\-]+>(?!\n\n)
Because this part of the regex is subsumed by the much larger WS ruleset regex, it’s not so easy to if-and-but it. Follow?

Ciao @CodaCoder I just wanted to briefly comment that I found your post interesting!

Just FYI the work of Jan Goyvaerts (a regex guru) might be pertinent if you want to go deep into all this. Might have a few relevant tips for your case? See, for instance: Runaway Regular Expressions: Too Many Repetitions & Runaway Regular Expressions: Catastrophic Backtracking

Best
TT

CodaCoder · January 30, 2022, 8:25pm

@TiddlyTitch Thanks. I’ve read that stuff many times. But my problem is not the regex, per se, but how to expand a TW var/macro within a complex or lengthy regex. I need to step back and rethink how the whole thing is constructed.

TiddlyTitch · January 31, 2022, 1:55pm

Right. Let me be an idiot for a minute. Recursion is the bane of regex (and its strength). Nesting regex well, especially if complex, is notoriously difficult to fathom. Also one can get basic memory problems using “(capturing groups)” in any recursion. If you don’t need capture then “(?: non-capturing group)” will save much memory on long recursions.
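
For readers unfamiliar with the two group types, a small JavaScript illustration follows. It only shows the difference in what gets recorded; it makes no claim about how much memory is saved in practice.

```javascript
const text = "abababab";

// A capturing group records the text of its last repetition.
const capturing = text.match(/(ab)+/);

// A non-capturing group matches the same text but records nothing extra.
const nonCapturing = text.match(/(?:ab)+/);
```

Both match `"abababab"`, but `capturing` also carries `"ab"` in slot 1, while `nonCapturing` holds only the overall match.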

Just a tech comment. One you may know well already?

TT

CodaCoder · January 31, 2022, 3:53pm

No, I didn’t, or at least hadn’t given it much thought (but I will now). However, speed of execution isn’t my problem. What I have is easily fast enough to monitor live-as-you-type text input and mark up the preview with “potential issues”.

I took that step back and now I know how to fix it – I think. Either I need a benign regex or a benign replacement string for search-replace to work with when the search string itself is empty, OR, construct the whole thing (see above) from smaller parts, one of which will be the user-supplied part having already dealt with the possibility it might be empty.

I’m favoring the latter right now.

Remember that the whole mess (long negative lookahead, regular regex, another negative lookahead) ends up inside var1 in search-replace<var1>,<var2> and the search-replace is part of a lengthy filter with a huge list of similar search-replaces.

Yeah, I’m going with the latter – deconstruct, rebuild.
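
A sketch of what "deconstruct, rebuild" could look like in plain JavaScript, assuming the excluded element names are available as a list of strings. In the real setup this would be done in wikitext; `buildTagRule` and `escapeRegExp` are hypothetical helpers, and the fixed parts are copied from the rule quoted earlier in the thread.

```javascript
// Escape regex metacharacters in a literal tag name.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: assemble the rule from parts, dealing with blanks first.
function buildTagRule(excludedTags) {
  const names = excludedTags
    .map((name) => name.trim())
    .filter((name) => name.length > 0) // drop blank entries before they reach the regex
    .map(escapeRegExp);
  // With nothing to exclude, leave the lookahead out entirely.
  const lookahead = names.length > 0 ? "(?!(?:" + names.join("|") + "))" : "";
  return new RegExp("<" + lookahead + "[a-zA-Z0-9_\\-]+>(?!\\n\\n)", "g");
}

const withList = "<br> <span>".match(buildTagRule(["br", " ", "div"]));
const withBlank = "<br>".match(buildTagRule([]));
```

With `br` and `div` excluded, only `<span>` is reported; with a blank setting the rule still compiles and simply flags every tag. The design choice is that the blank case is handled once, where the part is built, instead of inside the long chain of search-replace steps.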