How to Solve Regex Issues

No, seems not.

There was a serious deep-dive on GitHub a while back – I searched recently and failed to find it. Maybe @jeremyruston can recall it and dig it out.

TT – did you see my post up top? You may have missed it due to the reshuffle to a new thread.

Clearly the solution is to use the <$sedagive> widget…
Or… have Madeline Kahn sing to it :wink:

Dr Shulman, FYI, we applied a <<sedagive “to Abby Normal”>> …

TT

There is some work on safely executing user-supplied regular expressions as part of this JavaScript interpreter written in JavaScript:

https://neil.fraser.name/software/JS-Interpreter/docs.html

Scroll down to the section on regular expressions. There’s also a demo of the regexp functionality here:

https://neil.fraser.name/software/JS-Interpreter/demos/regexp.html

Based on that, I think the only way to safely execute arbitrary regular expressions would involve writing a new regexp engine in JS. The other alternative I’ve considered is to run the regexp operation in an iframe (or Web Worker) so that we can implement a timeout.
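To make the timeout idea concrete, here is a minimal sketch. It is not TiddlyWiki code and it does not use an iframe or a Web Worker: it assumes a Node.js environment and uses Node's `vm` module as a stand-in, because its `timeout` option gives the same "give up after N milliseconds" behaviour in a few lines. The helper name `testWithTimeout` and the 100 ms budget are made up for illustration.

```javascript
const vm = require("vm");

// Hypothetical helper: evaluate new RegExp(source, flags).test(input),
// abandoning the attempt if it runs longer than timeoutMs.
function testWithTimeout(source, flags, input, timeoutMs = 100) {
  const sandbox = { source, flags, input };
  try {
    const matched = vm.runInNewContext(
      "new RegExp(source, flags).test(input)",
      sandbox,
      { timeout: timeoutMs }
    );
    return { timedOut: false, matched };
  } catch (err) {
    // Node reports an exceeded vm timeout with this error code.
    if (err && err.code === "ERR_SCRIPT_EXECUTION_TIMEOUT") {
      return { timedOut: true, matched: false };
    }
    throw err; // anything else, e.g. a SyntaxError from an invalid pattern
  }
}

// A harmless pattern finishes well inside the budget.
const quick = testWithTimeout("^a+$", "", "aaa");

// Nested quantifiers plus a non-matching tail force exponential backtracking,
// so this call is expected to hit the timeout instead of freezing the process.
const runaway = testWithTimeout("^(a+)+$", "", "a".repeat(40) + "!");

console.log(quick, runaway);
```

In a browser the same shape would presumably need the regexp to run in a worker that the main thread can terminate, with the result passed back by message.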

Thank you for taking the time to comment at such length. It is useful.

In practice I’m unsure how much of an issue it is.

Certainly the more we expose TW to raw regex the more likely it is that problems will occur. That said, I have found it difficult to devise any test case that reduces TW to a complete freeze via regex. Maybe that is because I know regex so can’t easily force errors? Maybe a new generation can?

Best, TT

The TW source was flagged as having an issue by GitHub itself. That’s “how much of an issue it is”.

But if you really want to dig into an issue, read this: How to Solve Regex Issues - #46 by CodaCoder

Coming late to this thread I just want to add some practices of mine used when the result may be infinite.

  • Count and/or limit the filter results before providing the user with an option to show all.
  • Add to the search trigger button a save wiki step, so one can safely reload.
  • Make it so on reload the search is not immediately actioned so I avoid a loop.
  • I have an edit recent tiddlers (inc system) list in the side bar, so I can open for edit a tiddler without first displaying it, so as to fix a failing tiddler.
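The first practice in the list above can be sketched in plain JavaScript. This is only an illustration of the idea, not how I actually do it in wikitext (there the count and limit filter operators do the job); `previewResults` and the cap of 50 are hypothetical.

```javascript
// Hypothetical helper: report the total but only hand back the first `limit`
// results, so the UI can offer "show all" instead of rendering everything.
function previewResults(titles, limit = 50) {
  const shown = titles.slice(0, limit);
  return {
    total: titles.length,
    shown,
    truncated: titles.length > shown.length,
  };
}

// Stand-in for a search that matched far more than anyone wants rendered.
const titles = Array.from({ length: 1200 }, (_, i) => `Tiddler ${i + 1}`);
const preview = previewResults(titles, 50);

console.log(`Showing ${preview.shown.length} of ${preview.total}`);
```

The point is that the expensive or unbounded step only happens after the user has seen the count and asked for it.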

However I wonder if we could have a widget that one can use to wrap a slab of wikitext that introduces a timeout or break to the code within it, one that exits gracefully. We could wrap any code at risk of large or infinitely recursive processes to have this additional protection.

Ciao @CodaCoder, I just wanted to briefly comment that I found your post interesting!

Just FYI the work of Jan Goyvaerts (a regex guru) might be pertinent if you want to go deep into all this. Might have a few relevant tips for your case? See, for instance: Runaway Regular Expressions: Too Many Repetitions & Runaway Regular Expressions: Catastrophic Backtracking

Best
TT

@TiddlyTweeter Thanks. I’ve read that stuff many times. But my problem is not the regex, per se, but how to expand a TW var/macro within a complex or lengthy regex. I need to step back and rethink how the whole thing is constructed :confused:

Right. Let me be an idiot for a minute. Recursion is the bane of regex (and its strength). Nesting regex well, especially if complex, is notoriously difficult to fathom. Also one can get basic memory problems using “(capturing groups)” in any recursion. If you don’t need capture then “(?: non-capturing group)” will save much memory on long recursions.
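A small JavaScript sketch of the difference, assuming nothing beyond the standard RegExp object. It only shows what each form records; it does not measure memory, so treat the savings as my claim rather than something this proves.

```javascript
const input = "ab".repeat(3); // "ababab"

// Capturing group: the engine records the last "ab" it matched.
const capturing = /^(ab)+$/.exec(input);

// Non-capturing group: same match, nothing recorded for the group.
const nonCapturing = /^(?:ab)+$/.exec(input);

console.log(capturing.length, capturing[1]); // 2 "ab"
console.log(nonCapturing.length);            // 1
```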

Just a tech comment. One you may know well already?

TT

No, I didn’t, or at least hadn’t given it much thought (but I will now). However, speed of execution isn’t my problem. What I have is easily fast enough to monitor live-as-you-type text input and mark up the preview with “potential issues”.

I took that step back and now I know how to fix it – I think. Either I need a benign regex or a benign replacement string for search-replace to work with when the search string itself is empty, OR, construct the whole thing (see above) from smaller parts, one of which will be the user-supplied part having already dealt with the possibility it might be empty.
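Roughly what I mean, sketched in plain JavaScript rather than in the actual filter. Everything here is illustrative: `buildPattern`, the two lookaheads and the sample text are made up, and the user-supplied part is assumed to be regex source rather than literal text.

```javascript
// An empty negative lookahead fails at every position, so it never matches.
// This is the "benign regex" used when the user-supplied part is empty.
const NEVER_MATCH = "(?!)";

// Hypothetical builder: `before` and `after` stand in for the surrounding
// lookaheads; the user part is wrapped so it cannot merge with them.
function buildPattern(userPart, before = "", after = "") {
  const core = userPart.trim() === "" ? NEVER_MATCH : `(?:${userPart})`;
  return new RegExp(before + core + after, "g");
}

const text = "foo foobar foo";

// The problem case: an empty pattern matches at every position.
const naive = "abc".replace(new RegExp("", "g"), "*");

// The guarded builder leaves the text alone when the user part is empty...
const guardedEmpty = text.replace(buildPattern(""), "*");

// ...and behaves normally otherwise.
const guarded = text.replace(buildPattern("foo", "(?!foobar)", "(?!\\w)"), "*");

console.log(naive);        // *a*b*c*
console.log(guardedEmpty); // foo foobar foo
console.log(guarded);      // * foobar *
```

Dealing with the empty case in one small builder means none of the larger expressions ever sees an empty part.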

I’m favoring the latter right now.

Remember that the whole mess (long negative lookahead, regular regex, another negative lookahead) ends up inside var1 in search-replace<var1>,<var2> and the search-replace is part of a lengthy filter with a huge list of similar search-replaces.

Yeah, I’m going with the latter – deconstruct, rebuild.

@TiddlyTweeter dunnit. <sigh-of-relief> :sweat_smile:

Three missteps along the way, all amounting to the same issue I was trying to solve (proving the need) – an empty variable which “crashes” the regex engine (killing the wiki, three times).
