Regexp Search and Replace for Words with Dash (Hyphen)

What is the suitable regexp pattern to capitalize the words with dash as below?

  • Find all words with em-dash (e.g. contains a dash in the middle)
  • Replace the captured words with first letter of each part capitalized

Example

simple-words, some text, to-do

Simple-Words, some text, To-Do

Partial wikitext

<$let 
  in="simple-words, some text, to-do"
  pat=...?
  rep= ..?
>

<$text text={{{
      [<in>search-replace:gm:regexp<pat>,<rep>]
}}}/>

</$let>

How the above code can be completed? What are the pat and rep?

TW has “dash” -, “endash” -- and “emdash” --- … In your example you only use the single dash.

Here’s a regexp, that returns 3 backreferences + detailed description. The regexp works for 1 word-dash combination. So for your example it would need to be done 3 times

The 1st backreference contains every character except those in the list: ,<space>- comma, space, dash

The 2nd backref. contains the “spacers”, m-dash, n-dash, dash, <space>

The 3rd beckred. contains every char except ,<space>

([^, -]+)(---|--|-| )([^, ]+)

// ([^, -]+)(---|--|-| )([^, ]+)
// 
// Options: Case insensitive; Dot doesn’t match line breaks; ^$ don’t match at line breaks
// 
// Match the regex below and capture its match into backreference number 1 «([^, -]+)»
//    Match any single character NOT present in the list below «[^, -]+»
//       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
//       A single character from the list “, ” «, »
//       The literal character “-” «-»
// Match the regex below and capture its match into backreference number 2 «(---|--|-| )»
//    Match this alternative (attempting the next alternative only if this one fails) «---»
//       Match the character string “---” literally «---»
//    Or match this alternative (attempting the next alternative only if this one fails) «--»
//       Match the character string “--” literally «--»
//    Or match this alternative (attempting the next alternative only if this one fails) «-»
//       Match the character “-” literally «-»
//    Or match this alternative (the entire group fails if this one fails to match) « »
//       Match the character “ ” literally « »
// Match the regex below and capture its match into backreference number 3 «([^, ]+)»
//    Match any single character NOT present in the list “, ” «[^, ]+»
//       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Hope that helps
-m

<$let 
  in="simple-words, some text, to-do"
  pat="([a-z])([a-z]*)-([a-z])([a-z]*)"
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[-]] [[$3]uppercase[]addsuffix[$4]] +[join[]] +[search-replace:g[:::],[ ]]"
>

<$list filter={{{
      [<in>search-replace:g[ ],[:::]search-replace:gm:regexp<pat>,<rep>]
}}}>
<<currentTiddler>>
</$list>

</$let>

or (I prefer this slight change):

<$let 
  in="simple-words, some text, to-do"
  pat="([a-z])([a-z]*)-([a-z])([a-z]*)"
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[-]] [[$3]uppercase[]addsuffix[$4]] +[join[]]"
>

<$list filter={{{
      [<in>search-replace:g[ ],[:::]search-replace:gm:regexp<pat>,<rep>] +[search-replace:g[:::],[&nbsp;]] }}}>
<<currentTiddler>>
</$list>

</$let>

hmm. There are languages which have different characters than included here. IMO the only way to catch all possible characters is to go with a pattern, that includes every possible char and excludes the separators.

Maybe so, but of zero importance at this stage of the design and analysis game.

We’re just at the proof-o’-concept stage based on the narrow scope of the OP.

One problem at a time. Incremental and iterative tweaking of the prototype as Mohammad expands further on his needs/requirements. Refine over time.

Big requirements up front: bleurk.

I’m confused why you would specifically want to exclude capitalizing some text … maybe you’re just trying to draw attention to the hard parts?

I wonder whether a workable solution effectively replaces each hyphen with hyphen-plus-space, then capitalizes as usual, and then re-collapses each hyphen-space combination.

As far as I know, there is no place where hyphen is appropriately followed by a space…

:thinking: … Actually, some people use two hyphens to represent an em-dash — like what sets this phrase apart — so I guess those shorthand em-dashes (--) would need to be corrected before doing any of that handling of actual hyphens (which you’re calling “dash” here).

I don’t understand why the version below doesn’t work. It does the proper thing with the hyphenated words, but strips out sapces. I can fix it with Charlie’s space conversion/restoration process, but I don’t know why that’s necessary.

This is the code:

<$let 
  in="simple-words, some text, to-do"
  pat=(\w)(\w*)(-+)(\w)(\w*)
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[$3]] [[$4]uppercase[]addsuffix[$5]] +[join[]]"
>

This vanilla JS equivalent works just fine:

"simple-words, some text, to-do".replace(
   /(\w)(\w*)(-+)(\w)(\w*)/gm, 
  (_, a, b, c, d, e) => a.toUpperCase() + b + c + d.toUpperCase() + e 
) //=> "Simple-Words, some text, To-Do"

Can someone explain why spaces – not captured by the regex – are removed with this call to search-replace?

It isn’t the regex stripping out spaces. It is the list widget.

Oh, of course. Thank you.

A post was merged into an existing topic: Dynamic Filters in One Go

Evaluate what is inside the triple curly brackets first.

Say {{{ whatever }}} evaluates to A space B space C

<$list filter = "A B C">

You see why the spaces go away, right?

Say {{{ whatever }}} evaluates to A &nbsp; B &nbsp; C

The spaces go away like before, and then the html space codes get rendered at render time.

Thanks. I got that from your earlier response, and I do understand what you’re doing with the multiple replaces – adding a placeholder for spaces, then replacing that placeholder with &nbps; – but my last two attempts above don’t use the <$list> widget; one uses <$text> and the other uses a <$let> variable. Am I implicitly using <$list>? Or is something else going on?

I’m sure it’s me being dense about syntax again, but I don’t see what’s wrong with them.

The text widget just takes the string coming at it and spits out that string without evaluating it. Curly bracket content gets evaluated, then text widget spits that out as-is.

The list widget takes a string and evaluates it. Curly brackets get evaluated, then the list widget evaluates the result of the curly bracket.

We need double evaluation. Triple curly brackets cause one evaluation, and then list widget causes another evaluation.

The only other widget you can use instead of list widget for second evaluation is the wikify widget. (TW 5.2.3 viewpoinr)

That’s going to take a while to sink in. Thank you very much for your help.

Your code dynamically constructs a string representing a filter expression. Great. You now need to evaluate it as a filter expression.

The first filter evaluation constructs your filter expression, the second evaluates it.

Try this:

<$let 
  in="simple-words, some text, to-do"
  pat=(\w)(\w*)(-+)(\w)(\w*)
  rep="[[$1]uppercase[]addsuffix[$2]addsuffix[$3]] [[$4]uppercase[]addsuffix[$5]] +[join[]]"
>

<$text text={{{
      [<in>search-replace:gm:regexp<pat>,<rep>] :map[subfilter<currentTiddler>]
}}}/>

</$let>
1 Like

BTW, there is no need at all for the ::: stuff in there. That just makes it way easier to debug things, because they stand out more than &nbsp;

For those who don’t care about easy visual things for debugging, the following should work (but I have not tried):

<in>search-replace:g[ ],[&nbsp;]search-replace:gm:regexp<pat>,<rep>] ] }}}

1 Like

But that still strips out spaces. Obviously I can use Charlie’s technique, but I really want something simpler to work. I may simply have to get used to disappointment.

Know TiddlyWiki and think as TiddlyWiki is, and there is no struggle and have no disappointment.

Do not know TiddlyWiki and think as not TiddlyWiki is, and you will struggle and have disappointment.

I am happy as a clam, because I think as TiddlyWiki is, and it works as I think.

You are getting in your own way. Empty your cup.

Yes, I’ve watched Kung Fu Panda a few too many times …

It may not be relevant but when splitting titles or strings to perform operations on them you can use +[join[ ]] containing a space to rejoin them restoring the spaces.