Challenge: parsing key value pairs "with regular expressions?"

Folks,

For another piece of work I have come across a problem that I believe someone with better regex skills than myself may be able to solve.

  • It seems to me that the larger problem of parsing key/value pairs can be solved once the smaller problem of how to split a string into separate pairs is solved.

First example
number="1",streetaddress="my street",city="my town",postcode=2222,"key name"="key value"

In this example the commas make it possible to split the string into five separate key/value pairs and then handle each one.

  • However, does this mean we cannot use a comma “,” inside the keys or values?
  • What if the input does not include commas?

Second example
number="1" streetaddress="my street" city="my town" postcode=2222 "key name"="key value"

  • Here we can read the key/value pairs as a list, but how can we parse the string into separate key/value pairs?

Third example
number='1' streetaddress="my street there is a 'object' waiting" city="my town" postcode=2222 "key name"="key value" keyname=""" This includes "double quote" in the value""" mykey=45

  • This introduces examples of the different quote rules as documented, again without the commas. How can we split this into key/value pairs?

Note: TiddlyWiki can already parse a range of key/value pairs passed into widgets and macros, and the new set multiple variables widget can take a list of keys plus a list of values to convert key/value pairs into variables.

The problem is the initial splitting of a list of key/value pairs into separate pairs, given the various alternative ways to “quote a value”, or for that matter to quote a key name.

  • It is also a little difficult to then make use of the results once a list of key/value pairs has been split successfully.

I would appreciate any ideas and especially if a split operator such as splitregexp<myexp> could be used to do this for all cases in general.

  • or at least based only on double quotes of the values.
  • or perhaps a tool to programmatically insert the appropriate commas.
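
One way to look at the splitting step is to match each pair instead of splitting between pairs. The following is only a minimal sketch in plain JavaScript, assuming the quote styles seen in the three examples (triple double quotes, double quotes, single quotes, unquoted) and no escaped quotes. The names PAIR and parsePairs are made up for illustration and are not part of TiddlyWiki. The pattern sticks to ECMAScript regex syntax; whether it can be driven from a filter is a separate question.

```javascript
// Hypothetical helper, not a TiddlyWiki API.
// Key:   "quoted", 'quoted', or a bare word without spaces, commas, quotes or "=".
// Value: """triple quoted""", "quoted", 'quoted', or a bare word.
const PAIR = /(?:"([^"]*)"|'([^']*)'|([^\s=,"']+))\s*=\s*(?:"""([\s\S]*?)"""|"([^"]*)"|'([^']*)'|([^\s,"']*))/g;

function parsePairs(input) {
  const pairs = [];
  for (const m of input.matchAll(PAIR)) {
    // Only one key group and one value group take part in each match;
    // the others are undefined, so ?? picks the one that matched
    // (and keeps an empty string such as the value in null="").
    const key = m[1] ?? m[2] ?? m[3];
    const value = m[4] ?? m[5] ?? m[6] ?? m[7];
    pairs.push([key, value]);
  }
  return pairs;
}

// The third example from this post.
const third = `number='1' streetaddress="my street there is a 'object' waiting" city="my town" postcode=2222 "key name"="key value" keyname=""" This includes "double quote" in the value""" mykey=45`;

for (const [key, value] of parsePairs(third)) {
  console.log(`${key} => [${value}]`);
}
```

Commas and whitespace between pairs are simply skipped, so the first and second examples go through the same function. A triple-quoted value keeps its leading space, and anything that does not look like key=value is silently ignored, which may or may not be what you want.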

There may even be a hack that uses the similar behaviour already in TiddlyWiki to achieve the full outcome.

  • For example most widgets accept and validly process such key/value pairs often referred to as a “Hashmap of variables”.

I think it’s easier to use a context-free grammar (CFG) or parser combinators to solve this; TW’s parser tree is an example of a parser combinator.

Maybe you can use $tw.wiki.parseText() to parse the widget, and get these params from result.tree[0].orderedAttributes

You will first need to wrap these params in widget syntax like <$xxx xxx=xxx />

Sorry, I am not a JavaScript coder. I am looking for a non-JavaScript solution, or for someone to write one for me.

Thanks for your suggestions

First example: [<string>split["]split[=]trim[,]!is[blank]trim[ ]] will clean up the input string so that every key is followed by its value. You can then get a value by its “index”: +[first[8]last[1]] or +[nth[8]] will output the value for postcode, since postcode is the 4th key and the value of the nth key sits at position 2n. Alternatively, if you know the key name, +[after[postcode]] will output 2222. You might get false results if a value is identical to one of the keys, and I’m not sure this will work if you have empty values (ideally you should use a default value such as N.A to prevent issues).
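
To make the effect of that filter run easier to follow, here is a rough JavaScript stand-in for the same steps. It is only an approximation under stated assumptions: flattenPairs and valueAfter are hypothetical names, the trimming is simplified, and the way TiddlyWiki removes duplicate titles from filter results is not modelled at all.

```javascript
// Rough stand-in for: split["] split[=] trim[,] !is[blank] trim[ ]
// (hypothetical helper, not how TiddlyWiki implements these operators)
function flattenPairs(input) {
  return input
    .split('"')                                   // split["]
    .flatMap((part) => part.split("="))           // split[=]
    .map((part) => part.replace(/^,+|,+$/g, ""))  // trim[,] (approximation)
    .filter((part) => part !== "")                // !is[blank]
    .map((part) => part.trim());                  // trim[ ] (approximation)
}

// Rough stand-in for after[key]: the item following the first occurrence of key.
function valueAfter(items, key) {
  const i = items.indexOf(key);
  return i >= 0 && i + 1 < items.length ? items[i + 1] : undefined;
}

const first = 'number="1",streetaddress="my street",city="my town",postcode=2222,"key name"="key value"';
const items = flattenPairs(first);

console.log(items[7]);                        // 8th item → "2222"
console.log(valueAfter(items, "postcode"));   // → "2222"

// An empty value is dropped by the blank check, so the next key is
// returned in place of the missing value.
console.log(valueAfter(flattenPairs('null="" mykey=45'), "null")); // → "mykey"
```

The last line shows why empty values are a problem for this approach: once the empty item is filtered out, keys and values no longer alternate.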

This seems to work with your second example too, but not the third. Maybe you could use a macro to concatenate your string inside a let widget? If the commas don’t break the let widget, this would allow you to retrieve each key as a variable.

EDIT: tested it, it does allow commas, but a variable name cannot be quoted. If you find a way to clean up the keys then you could use this method, but honestly the best way would be to have a standardized input string, rather than trying to find a regex or filter expression for every possible case.

Thanks @telumire, your approach is working as expected with commas and with the second case. Unfortunately the third case is ultimately the one I need to solve.

Unfortunately, adding null="" to the string throws these methods off. But it is close.

You have given me a lot to work with.

Verbose, but it keeps things simple and easy to handle with plain filter operators.

number:::1;;;streetaddress:::my street there is a 'object' waiting;;;city:::my town;;;postcode:::2222;;;key name:::key value;;;keyname:::This includes "double quote" in the value;;;mykey:::45

Of course, that doesn’t work so well if any of the values have ::: or ;;;

But easy enough to validate/invalidate data.
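
For anyone who wants to see the validation idea spelled out, here is a minimal sketch in JavaScript. The function name parseDelimited is made up for illustration, and the rule "exactly one ::: per pair and a non-empty key" is an assumption about what counts as valid.

```javascript
// Hypothetical helper: parse key:::value pairs separated by ;;;
function parseDelimited(input) {
  const result = new Map();
  for (const pair of input.split(";;;")) {
    const parts = pair.split(":::");
    // A stray ::: in a value gives more than two parts; a stray ;;; usually
    // leaves a chunk with no ::: at all. Either way the pair is rejected.
    if (parts.length !== 2 || parts[0] === "") {
      throw new Error(`Invalid pair: ${pair}`);
    }
    result.set(parts[0], parts[1]);
  }
  return result;
}

const data = parseDelimited(
  `number:::1;;;streetaddress:::my street there is a 'object' waiting;;;postcode:::2222;;;key name:::key value`
);
console.log(data.get("postcode")); // → "2222"
console.log(data.get("key name")); // → "key value"
```

Quotes need no special treatment here because they are just ordinary characters inside a value.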

If you do want to go the regexp route, I strongly recommend https://regex101.com/ for testing your regexps. This has helped me often to debug my regular expressions in the past.
Make sure to select the flavor “ECMAScript” on the left-hand side when you use advanced operators, to stay compatible with the syntax TW uses.

Have a nice day
Yaisog

PS: For your third example, you could use a hierarchy of delimiters, e.g. split everything at """, then all the parts at ", then at ', and the rest at =. After that, proceed as in @telumire’s answer. After each split, work only with items that still contain a = so you don’t split e.g. my street there is a 'object' waiting further (this check needs to be put into the regexp I think as TW has no foreach-like functionality).
With that you cannot have something like key='Value with "quoted" text', though, where the hierarchy is not respected.
Parsing text using wikitext filters is probably not advisable. You’ll be forever debugging regexps when something unexpected comes along.
