A regex wizard needed :)

BurningTreeC · September 14, 2023, 12:25pm

Hello,

In the CodeMirror 6 plugin I need a regex that finds the text to complete for autocompletion.
It is done like this:

var word = context.matchBefore(/\w*/)

context.matchBefore() matches the text directly before the cursor against the regex.

What I want to achieve is that strings like tv-config-toolbar-text also get matched.
Basically, I need a regex that matches any single string.
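
For readers following along, here is a rough sketch of the problem. It assumes that matchBefore applies the pattern to the text in front of the cursor and requires the match to end exactly at the cursor. `matchBeforeCursor` is a hypothetical stand-in written for this thread, not part of CodeMirror, and the sample input is made up.

```javascript
// Hypothetical stand-in for context.matchBefore(): the pattern is applied
// to the text before the cursor and must end exactly at the cursor.
function matchBeforeCursor(textBeforeCursor, pattern) {
  // Wrap the pattern and anchor it at the end of the string.
  const anchored = new RegExp(`(?:${pattern.source})$`, pattern.flags);
  const from = textBeforeCursor.search(anchored);
  if (from < 0) return null;
  return { from, to: textBeforeCursor.length, text: textBeforeCursor.slice(from) };
}

// Assumed sample input: everything typed on the line so far.
const typed = "<$set name=tv-config-toolbar-text";
console.log(matchBeforeCursor(typed, /\w*/).text); // "text"
```

Under that assumption, \w* stops at the last hyphen, so only the final segment is offered for completion.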

Do you have an idea?

Thank you,
Simon

TW_Tones · September 14, 2023, 12:33pm

@TiddlyTitch has skills in this area, perhaps also @Mohammad

ChatGPT may be able to help /^tv-config-toolbar-text$/

Here’s how it works:

^ and $ are start and end anchors. They ensure that the match occurs at the start and end of the string, respectively.
The rest is just the literal string “tv-config-toolbar-text” that you want to match.

Here’s a JavaScript example showing how this regex can be used:

const regex = /^tv-config-toolbar-text$/;

console.log(regex.test("tv-config-toolbar-text"));  // true
console.log(regex.test("not-tv-config-toolbar-text"));  // false

This will only return true for the string “tv-config-toolbar-text”.

john.edw_gmail.com · September 14, 2023, 12:53pm

\w+(?:-\w+)+ to match hyphenated words or tv(?:-\w+)+ to match hyphenated words that begin with tv.

Is that what you are looking for?
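
A quick way to check both patterns in plain JavaScript. The test strings other than tv-config-toolbar-text are made up for illustration.

```javascript
const hyphenated = /\w+(?:-\w+)+/;
const tvPrefixed = /tv(?:-\w+)+/;

console.log("tv-config-toolbar-text".match(hyphenated)[0]); // "tv-config-toolbar-text"
console.log("tv-config-toolbar-text".match(tvPrefixed)[0]); // "tv-config-toolbar-text"

// The second pattern needs a literal "tv" in front of the first hyphen.
console.log("qualify-name".match(tvPrefixed)); // null

// Both patterns require at least one hyphen, so a plain word is not matched.
console.log("toolbar".match(hyphenated)); // null
```

The last case may matter for autocompletion, since a user who has typed only the first segment would get no match yet.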

BurningTreeC · September 14, 2023, 12:56pm

Thank you @john.edw_gmail.com

That’s already a good start!
I need more than hyphenated words, there can also be underscores, dollar signs, slashes, backslashes, brackets… I’m not totally sure yet what should be included

BurningTreeC · September 14, 2023, 1:40pm

On the CodeMirror 6 plugin site you can test your regex how it works with autocomplete, see the last entry in the “Some things to try out” tiddler

Scribs · September 14, 2023, 2:57pm

In addition to the above, I recommend using a site like regex101 if you aren’t already. It makes it way easier (still not easy) to test and debug regexes.

BurningTreeC · September 14, 2023, 2:59pm

Thank you @Scribs, very helpful!

I was using regexr.com

john.edw_gmail.com · September 14, 2023, 2:59pm

\w+(?:[^a-zA-Z0-9\s]\w+)+ match word characters separated by non alphanumeric and non space characters.

matches: Now-is\the/time$for]all[good_men^to&come*to(the)aide+of@their~country

Probably matches on more than you really want, but not knowing which characters are in play…
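
A small check of that claim in JavaScript, plus one made-up input with a space to show where the match stops:

```javascript
const separated = /\w+(?:[^a-zA-Z0-9\s]\w+)+/;

// String.raw keeps the backslash in the sample as a literal character.
const sample = String.raw`Now-is\the/time$for]all[good_men^to&come*to(the)aide+of@their~country`;

console.log(sample.match(separated)[0] === sample); // true

// Whitespace is excluded from the separator class, so the match ends there.
console.log("tv-config toolbar".match(separated)[0]); // "tv-config"
```

Note that two separators in a row (for example "$:/") would also end the match, because each separator must be followed by at least one word character.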

BurningTreeC · September 14, 2023, 3:02pm

Perfect @john.edw_gmail.com!

I cannot know which characters are in play, that’s the problem

I’ll try your regex as soon as I’m at home

grantwparks · September 14, 2023, 3:42pm

\S+ - one or more non-whitespace chars

Scott_Sauyet · September 14, 2023, 3:43pm

That makes it hard to work with.

Well that’s trivial: /.*/, or /.+/ if you want it to contain at least one character. There’s more we can do for multi-line stuff if necessary.

But I’m guessing the requirements are a little more stringent than this.

BurningTreeC · September 14, 2023, 3:46pm

Thanks @Scott_Sauyet ,

yes, the requirements are a bit more stringent

It cannot match ALL the text - it must detect a bit before the cursor but not the whole - and somehow understand what the user wants to complete

BurningTreeC · September 14, 2023, 3:48pm

Thanks @grantwparks ,

that regex does a good job!

Maybe a regex that matches two strings with one or more spaces in between would do it even better.

Would that look like:

\S+\s*\S* ?

Scott_Sauyet · September 14, 2023, 3:52pm

Maybe others are making sense of this, but it’s really not clear to me. How would you expect to tell a regex, which only takes a string, to know anything about a cursor?

Can you give a handful of examples, both input strings and what’s expected to match? That might make it a bit more clear.

BurningTreeC · September 14, 2023, 3:59pm

Sorry @Scott_Sauyet - I wasn’t clear

The regex must set something like a boundary
Like \w
I’m not sure how the CodeMirror 6 function matches the text, but it seems to match from the end of the string back towards the beginning. I will look into the native functions, then I can tell you more.

What I know is that the example where I got my initial regex (\w*) is from the codemirror programmer himself and he must know what he’s doing ^^

I believe the regex must set such a word-boundary so that this works well but I want to figure all possibilities out

grantwparks · September 14, 2023, 4:00pm

the \s would have to use + also instead of * I think
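
To compare the two variants, here is a sketch that assumes, as discussed above, that the match has to end at the cursor. `lastMatch` is a hypothetical helper that anchors each pattern at the end of the typed text; the inputs are invented.

```javascript
// Hypothetical helper: anchor the pattern at the end of the text,
// on the assumption that the match must end at the cursor.
const lastMatch = (text, pattern) => {
  const m = text.match(new RegExp(`(?:${pattern.source})$`));
  return m ? m[0] : null;
};

const optionalSpace = /\S+\s*\S*/;
const requiredSpace = /\S+\s+\S*/;

console.log(lastMatch("foo bar baz", optionalSpace)); // "bar baz"
console.log(lastMatch("foo bar baz", requiredSpace)); // "bar baz"

// With \s+ a single word no longer matches at all.
console.log(lastMatch("foo", optionalSpace)); // "foo"
console.log(lastMatch("foo", requiredSpace)); // null
```

So \s+ only makes sense if a completion should never start before the user has typed a space.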

Scott_Sauyet · September 14, 2023, 4:18pm

Just to be clear, \w* matches any number of “word” characters, and is entirely equivalent to [a-zA-Z0-9_]*, which means it matches any digits, any upper- or lower-case letters between a and z, and any underscores (_)… We can write one that adds other characters you want to include by expanding on what’s in the brackets. Often we will have to escape punctuation characters with a single backslash (\).

For example, if we want to include all the characters above as well as these ones: + - * / ? !, we might write /[a-zA-Z0-9_\+\-\*\/\?\!]*/. The trouble with this (or with /\w*/ for that matter), is that they will match all strings, since we allow for zero characters in our matches.
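
The zero-length point can be seen in a few lines of JavaScript. The class below is a slightly shorter spelling of the same set (inside brackets only the hyphen and the slash are escaped here), and the inputs are made up.

```javascript
const wordOnly = /\w*/;
const extended = /[\w+\-*\/?!]*/;
const extendedNonEmpty = /[\w+\-*\/?!]+/;

// A string of spaces still "matches" \w*, but the match is empty.
console.log(wordOnly.test("   "));            // true
console.log("   ".match(wordOnly)[0] === ""); // true

// The extended class picks up the extra punctuation.
console.log("a+b-c?".match(extended)[0]); // "a+b-c?"

// Switching * to + requires at least one character.
console.log(extendedNonEmpty.test("   ")); // false
```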

BurningTreeC · September 14, 2023, 5:17pm

Thanks for the explanation @Scott_Sauyet !

I’m undecided about what to use at the moment. I feel that a more complicated regex matching all special characters would be better, but I haven’t tested it.

I will test it though, and with your help I now have some insight into how this stuff works. Thank you!

cdaven · September 17, 2023, 6:32pm

As I understand it, you want to “grab” the last entered auto-completable text snippet when the user presses a keyboard shortcut for autocomplete?

If the last thing the user wrote was “tv-config-toolbar-text”, the regex pattern /\w*/ will only match “text”. To get the whole thing, you need /[-\w]*/.

If you really want to match “all” characters, you could use a negative pattern such as /[^\s]*/, which will capture everything until the first/last whitespace. But do you want to include parentheses, periods and so on?
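
A sketch of the difference between the two patterns, assuming the match must end at the cursor. `textBefore` is a hypothetical helper that mimics this by anchoring the pattern at the end of the string, and the input is invented to include a parenthesis.

```javascript
// Hypothetical helper: return the part of the text that matches the
// pattern and ends at the end of the string (the assumed cursor position).
const textBefore = (text, pattern) => {
  const m = text.match(new RegExp(`(?:${pattern.source})$`));
  return m ? m[0] : null;
};

const typedSoFar = "class=(tv-config-toolbar-text";

// Hyphens and word characters only: stops at the parenthesis.
console.log(textBefore(typedSoFar, /[-\w]*/)); // "tv-config-toolbar-text"

// Anything but whitespace: the parenthesis and "class=" are included too.
console.log(textBefore(typedSoFar, /[^\s]*/)); // "class=(tv-config-toolbar-text"
```

Which result is preferable depends on whether punctuation such as parentheses can be part of a completable name.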

“If I had only one hour to solve a problem, I would spend up to two-thirds of that hour in attempting to define what the problem is.”

BurningTreeC · September 18, 2023, 4:22am

Thank you @cdaven

Well, your last regex does it perfectly!