Regexp Search and Replace for Words with Dash (Hyphen)

Mohammad · June 21, 2023, 4:30am

What is the suitable regexp pattern to capitalize the words with dash as below?

Find all words with a dash (i.e. words that contain a dash in the middle)
Replace each captured word with a version in which the first letter of each part is capitalized

Example

simple-words, some text, to-do

→

Simple-Words, some text, To-Do

Partial wikitext

<$let 
  in="simple-words, some text, to-do"
  pat=...?
  rep= ..?
>

<$text text={{{
      [<in>search-replace:gm:regexp<pat>,<rep>]
}}}/>

</$let>

How can the above code be completed? What should pat and rep be?

pmario · June 21, 2023, 5:57am

TW has “dash” -, “endash” -- and “emdash” --- … In your example you only use the single dash.

Here’s a regexp that returns 3 backreferences, plus a detailed description. The regexp works for 1 word-dash combination, so for your example it would need to be applied 3 times.

The 1st backreference contains every character except those in the list ,<space>- (comma, space, dash).

The 2nd backreference contains the “spacers”: em-dash, en-dash, dash, <space>.

The 3rd backreference contains every character except ,<space> (comma, space).

([^, -]+)(---|--|-| )([^, ]+)


// ([^, -]+)(---|--|-| )([^, ]+)
// 
// Options: Case insensitive; Dot doesn’t match line breaks; ^$ don’t match at line breaks
// 
// Match the regex below and capture its match into backreference number 1 «([^, -]+)»
//    Match any single character NOT present in the list below «[^, -]+»
//       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
//       A single character from the list “, ” «, »
//       The literal character “-” «-»
// Match the regex below and capture its match into backreference number 2 «(---|--|-| )»
//    Match this alternative (attempting the next alternative only if this one fails) «---»
//       Match the character string “---” literally «---»
//    Or match this alternative (attempting the next alternative only if this one fails) «--»
//       Match the character string “--” literally «--»
//    Or match this alternative (attempting the next alternative only if this one fails) «-»
//       Match the character “-” literally «-»
//    Or match this alternative (the entire group fails if this one fails to match) « »
//       Match the character “ ” literally « »
// Match the regex below and capture its match into backreference number 3 «([^, ]+)»
//    Match any single character NOT present in the list “, ” «[^, ]+»
//       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
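
For anyone who wants to see the three backreferences in action, here is a minimal sketch in plain JavaScript rather than wikitext. The helper name `listGroups` is made up for illustration, and the `g` flag is added so that all three matches in the example string are found in one pass.

```javascript
// The pattern from the post above, with the global flag added.
const pattern = /([^, -]+)(---|--|-| )([^, ]+)/g;

// Hypothetical helper (not part of TiddlyWiki): collect the three
// backreferences of every match into plain objects.
function listGroups(text) {
  return [...text.matchAll(pattern)].map(([, first, spacer, rest]) => ({
    first,
    spacer,
    rest,
  }));
}

const groups = listGroups("simple-words, some text, to-do");
console.log(groups);
```

Because a plain space is one of the “spacers”, “some text” is matched as well, which is why the example yields three matches rather than two.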

Hope that helps
-m

Charlie_Veniot · June 21, 2023, 6:06am

<$let 
  in="simple-words, some text, to-do"
  pat="([a-z])([a-z]*)-([a-z])([a-z]*)"
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[-]] [[$3]uppercase[]addsuffix[$4]] +[join[]] +[search-replace:g[:::],[ ]]"
>

<$list filter={{{
      [<in>search-replace:g[ ],[:::]search-replace:gm:regexp<pat>,<rep>]
}}}>
<<currentTiddler>>
</$list>

</$let>

or (I prefer this slight change):

<$let 
  in="simple-words, some text, to-do"
  pat="([a-z])([a-z]*)-([a-z])([a-z]*)"
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[-]] [[$3]uppercase[]addsuffix[$4]] +[join[]]"
>

<$list filter={{{
      [<in>search-replace:g[ ],[:::]search-replace:gm:regexp<pat>,<rep>] +[search-replace:g[:::],[&nbsp;]] }}}>
<<currentTiddler>>
</$list>

</$let>

pmario · June 21, 2023, 7:36am

hmm. There are languages which have different characters than included here. IMO the only way to catch all possible characters is to go with a pattern, that includes every possible char and excludes the separators.
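
A rough sketch of that idea in plain JavaScript, assuming the separators are comma, whitespace and dash (that list is an assumption, not something settled in this thread). Everything that is not a separator counts as a word character, so non-ASCII letters are covered too. The function name `capitalizeHyphenated` is made up for illustration.

```javascript
// Assumption: comma, whitespace and dash are the only separators.
// Any other character is treated as part of a word.
const hyphenated = /([^,\s-])([^,\s-]*)(-+)([^,\s-])([^,\s-]*)/gu;

// Uppercase the first character on each side of the dash(es).
function capitalizeHyphenated(text) {
  return text.replace(
    hyphenated,
    (_, a, b, dash, c, d) => a.toUpperCase() + b + dash + c.toUpperCase() + d
  );
}

console.log(capitalizeHyphenated("über-groß, some text, to-do"));
// → "Über-Groß, some text, To-Do"
```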

Charlie_Veniot · June 21, 2023, 12:29pm

Maybe so, but of zero importance at this stage of the design and analysis game.

We’re just at the proof-o’-concept stage based on the narrow scope of the OP.

One problem at a time. Incremental and iterative tweaking of the prototype as Mohammad expands further on his needs/requirements. Refine over time.

Big requirements up front: bleurk.

Springer · June 21, 2023, 4:21pm

I’m confused why you would specifically want to exclude capitalizing some text … maybe you’re just trying to draw attention to the hard parts?

I wonder whether a workable solution effectively replaces each hyphen with hyphen-plus-space, then capitalizes as usual, and then re-collapses each hyphen-space combination.

As far as I know, there is no place where hyphen is appropriately followed by a space…

… Actually, some people use two hyphens to represent an em-dash — like what sets this phrase apart — so I guess those shorthand em-dashes (--) would need to be corrected before doing any of that handling of actual hyphens (which you’re calling “dash” here).
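
The three steps described above can be sketched in plain JavaScript as follows. The `titleCase` helper is a stand-in for “capitalizes as usual”, since the post does not say which capitalization step is meant, and both function names are made up for illustration.

```javascript
// Hypothetical "capitalize as usual" step: uppercase the first
// non-space character at the start of the text and after whitespace.
const titleCase = (text) =>
  text.replace(/(^|\s)(\S)/g, (_, gap, ch) => gap + ch.toUpperCase());

function capitalizeViaSpacing(text) {
  const spaced = text.replace(/-/g, "- "); // 1. hyphen becomes hyphen-plus-space
  const capped = titleCase(spaced);        // 2. capitalize every word
  return capped.replace(/- /g, "-");       // 3. collapse hyphen-space back to hyphen
}

console.log(capitalizeViaSpacing("simple-words, some text, to-do"));
// → "Simple-Words, Some Text, To-Do"
```

Note that this variant also capitalizes “some text”, and that step 3 would collapse any hyphen-space pair that was already present in the input.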

Scott_Sauyet · June 21, 2023, 6:01pm

I don’t understand why the version below doesn’t work. It does the proper thing with the hyphenated words, but strips out spaces. I can fix it with Charlie’s space conversion/restoration process, but I don’t know why that’s necessary.

This is the code:

<$let 
  in="simple-words, some text, to-do"
  pat=(\w)(\w*)(-+)(\w)(\w*)
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[$3]] [[$4]uppercase[]addsuffix[$5]] +[join[]]"
>

This vanilla JS equivalent works just fine:

"simple-words, some text, to-do".replace(
   /(\w)(\w*)(-+)(\w)(\w*)/gm, 
  (_, a, b, c, d, e) => a.toUpperCase() + b + c + d.toUpperCase() + e 
) //=> "Simple-Words, some text, To-Do"

Can someone explain why spaces – not captured by the regex – are removed with this call to search-replace?

Charlie_Veniot · June 21, 2023, 6:09pm

It isn’t the regex stripping out spaces. It is the list widget.

Scott_Sauyet · June 21, 2023, 6:10pm

Oh, of course. Thank you.

Mohammad · June 22, 2023, 10:14am

A post was merged into an existing topic: Dynamic Filters in One Go

Charlie_Veniot · June 21, 2023, 6:27pm

Evaluate what is inside the triple curly brackets first.

Say {{{ whatever }}} evaluates to A space B space C

<$list filter = "A B C">

You see why the spaces go away, right?

Say {{{ whatever }}} evaluates to A &nbsp; B &nbsp; C

The spaces go away like before, and then the html space codes get rendered at render time.
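
As a rough analogy only (this is not TiddlyWiki's actual filter parser), the effect can be imitated in plain JavaScript by treating whitespace as the separator between list items and then emitting the items back to back. The names `toItems` and `render` are made up for illustration.

```javascript
// Analogy, not TiddlyWiki code: whitespace separates the items...
const toItems = (filterText) => filterText.split(/\s+/).filter(Boolean);

// ...and the items are emitted one after another with nothing between them.
const render = (items) => items.join("");

console.log(render(toItems("A B C")));
// → "ABC"
console.log(render(toItems("A &nbsp; B &nbsp; C")));
// → "A&nbsp;B&nbsp;C"
```

In the second case the plain spaces still disappear, but the `&nbsp;` entities survive as items of their own and show up as spaces once the HTML is rendered.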

Scott_Sauyet · June 21, 2023, 6:33pm

Thanks. I got that from your earlier response, and I do understand what you’re doing with the multiple replaces – adding a placeholder for spaces, then replacing that placeholder with &nbsp; – but my last two attempts above don’t use the <$list> widget; one uses <$text> and the other uses a <$let> variable. Am I implicitly using <$list>? Or is something else going on?

I’m sure it’s me being dense about syntax again, but I don’t see what’s wrong with them.

Charlie_Veniot · June 21, 2023, 6:38pm

The text widget just takes the string coming at it and spits out that string without evaluating it. Curly bracket content gets evaluated, then text widget spits that out as-is.

The list widget takes a string and evaluates it. Curly brackets get evaluated, then the list widget evaluates the result of the curly bracket.

We need double evaluation. Triple curly brackets cause one evaluation, and then list widget causes another evaluation.

The only other widget you can use instead of the list widget for the second evaluation is the wikify widget. (TW 5.2.3 viewpoint)

Scott_Sauyet · June 21, 2023, 6:43pm

That’s going to take a while to sink in. Thank you very much for your help.

saqimtiaz · June 21, 2023, 6:46pm

Your code dynamically constructs a string representing a filter expression. Great. You now need to evaluate it as a filter expression.

The first filter evaluation constructs your filter expression, the second evaluates it.

Try this:

<$let 
  in="simple-words, some text, to-do"
  pat=(\w)(\w*)(-+)(\w)(\w*)
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[$3]] [[$4]uppercase[]addsuffix[$5]] +[join[]]"
>

<$text text={{{
      [<in>search-replace:gm:regexp<pat>,<rep>] :map[subfilter<currentTiddler>]
}}}/>

</$let>

Charlie_Veniot · June 21, 2023, 6:52pm

BTW, there is no need at all for the ::: stuff in there. That just makes it way easier to debug things, because they stand out more than &nbsp;

For those who don’t care about easy visual things for debugging, the following should work (but I have not tried):

[<in>search-replace:g[ ],[&nbsp;]search-replace:gm:regexp<pat>,<rep>] }}}

Scott_Sauyet · June 21, 2023, 6:59pm

But that still strips out spaces. Obviously I can use Charlie’s technique, but I really want something simpler to work. I may simply have to get used to disappointment.

Charlie_Veniot · June 21, 2023, 7:06pm

Know TiddlyWiki and think as TiddlyWiki is, and there is no struggle and have no disappointment.

Do not know TiddlyWiki and think as not TiddlyWiki is, and you will struggle and have disappointment.

I am happy as a clam, because I think as TiddlyWiki is, and it works as I think.

You are getting in your own way. Empty your cup.

Charlie_Veniot · June 21, 2023, 7:46pm

Yes, I’ve watched Kung Fu Panda a few too many times …

TW_Tones · June 21, 2023, 11:51pm

It may not be relevant but when splitting titles or strings to perform operations on them you can use +[join[ ]] containing a space to rejoin them restoring the spaces.
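
The same split-then-rejoin idea, sketched in plain JavaScript rather than as a TiddlyWiki filter. The `capitalize` helper is made up for illustration.

```javascript
// Hypothetical helper: uppercase the first character of a word.
const capitalize = (word) => word.charAt(0).toUpperCase() + word.slice(1);

// Split on spaces, operate on each piece, then rejoin with a space
// so the separators that the split removed are restored.
const rejoined = "some text here".split(" ").map(capitalize).join(" ");

console.log(rejoined);
// → "Some Text Here"
```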