Regular expression to split tags field

Zheng_Bangyou · March 7, 2024, 2:04am

I am trying to split a tags field into individual tags with R, e.g.

tags: [[tag 1 & 2]] tag2 [tag] [[[[tag 4]]]]

into

c("tag 1 & 2", "tag2", "[tag]", "[[tag 4]]")

What regular expression does TiddlyWiki’s JavaScript use to split it?

Sorry, I could not find it in the source code.

Scott_Sauyet · March 7, 2024, 2:30am

The current implementation uses this, but I imagine there’s something simpler for most use-cases:

/(?:^|[^\S\xA0])(?:\[\[(.*?)\]\])(?=[^\S\xA0]|$)|([\S\xA0]+)/mg
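As a rough illustration of how this pattern can be applied, here is a minimal JavaScript sketch. The helper name `splitTags` is hypothetical (it is not a TiddlyWiki API), and the sketch assumes the only post-processing needed is picking whichever capture group matched: group 1 for a `[[bracketed]]` title, group 2 for a bare word.

```javascript
// Pattern copied verbatim from the post above.
const memberRegExp = /(?:^|[^\S\xA0])(?:\[\[(.*?)\]\])(?=[^\S\xA0]|$)|([\S\xA0]+)/mg;

// Hypothetical helper: split a tags/list field string into individual titles.
function splitTags(field) {
  const tags = [];
  for (const match of field.matchAll(memberRegExp)) {
    // Group 1 is set for [[double-bracketed]] titles, group 2 for bare words.
    const tag = match[1] !== undefined ? match[1] : match[2];
    // Skip empty titles such as a literal "[[]]".
    if (tag) tags.push(tag);
  }
  return tags;
}

console.log(splitTags("[[tag 1 & 2]] tag2 [tag] [[[[tag 4]]]]"));
// → [ 'tag 1 & 2', 'tag2', '[tag]', '[[tag 4]]' ]
```

Only the outermost `[[` and `]]` pair is stripped, which is why `[tag]` is returned unchanged and `[[[[tag 4]]]]` becomes `[[tag 4]]`.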

TW_Tones · March 7, 2024, 2:31am

First, you do not need regular expressions to split tags: there are core filter operators for tags specifically. Beyond that, a tags field is what we call a list field, and additional operators such as list and enlist can be used on it.

Could you explain why you want the tags in the following format?

("tag 1 & 2", "tag2", "[tag]", "[[tag 4]]")

This can be constructed but what do you intend to do with the result?
Are you in fact writing a JavaScript solution?
Why not use TiddlyWiki script?

For example:

\function tag-array() [all[current]tags[]addprefix["]addsuffix["]] +[join[, ]] +[addprefix[(]]  +[addsuffix[)]] 

<<tag-array>>

returns ("a", "b", "c", "tag with spaces")

Scott_Sauyet · March 7, 2024, 2:38am

I’m assuming that “with R” in the original post (emphasis added) means that we’re doing this in the R programming language, and hence do not have access to JS/TiddlyWiki tools, but do have a regular expression engine. And

c("tag 1 & 2", "tag2", "[tag]", "[[tag 4]]")

is R’s syntax for a character vector.

Zheng_Bangyou · March 7, 2024, 2:46am

Thanks @Scott_Sauyet for your suggestions.

Yes, I am trying to use the web API to process tiddlers in R.

Scott_Sauyet · March 7, 2024, 3:11am

This simplification works for me:

/(?:^|[^\S])(?:\[\[(.*?)\]\])(?=[^\S]|$)|(\S+)/mg

I don’t know why the built-in one handles non-breaking spaces (\xA0) but I assume there’s a good reason. However I never have them in my list fields and so for my own uses, I would skip them. That’s the only change here: I removed the handling for them. It’s at least slightly simpler.
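To make the difference between the two patterns concrete, here is a small JavaScript sketch that runs both on a field containing a non-breaking space. It relies on JavaScript's `\s` matching U+00A0, so `\S` does not. The helper name `splitWith` and the sample field are made up for illustration, and the sketch shows only what the patterns do, not why the built-in one was written that way.

```javascript
// Both patterns are copied verbatim from the posts above.
const builtIn = /(?:^|[^\S\xA0])(?:\[\[(.*?)\]\])(?=[^\S\xA0]|$)|([\S\xA0]+)/mg;
const simplified = /(?:^|[^\S])(?:\[\[(.*?)\]\])(?=[^\S]|$)|(\S+)/mg;

// Hypothetical helper: collect whichever capture group matched for each item.
function splitWith(regex, field) {
  return Array.from(field.matchAll(regex), (m) =>
    m[1] !== undefined ? m[1] : m[2]
  );
}

// Sample field: "one" and "two" joined by a non-breaking space, then "three".
const field = "one\xA0two three";

// The built-in pattern keeps the non-breaking space inside a single item.
console.log(splitWith(builtIn, field).length);    // → 2

// The simplified pattern treats the non-breaking space as a separator.
console.log(splitWith(simplified, field).length); // → 3
```

For fields that never contain non-breaking spaces, the two patterns should split identically, which matches the use case described above.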

Zheng_Bangyou · March 7, 2024, 3:14am

Thanks for your help