Compare two strings char by char?

RetRoland · April 25, 2024, 11:02pm

I’d like to compare two strings character by character.
The result would ideally be the first position where one string differs from the other.
For example, abcdef is the first string and abcrsdef the second.

I can think of two filters, like [[abcdef]split[]] and [[abcrsdef]split[]], where the result should be 4.
Unfortunately, I have no idea how to relate these two filters to each other.

Any hint is much appreciated. Thank you in advance.

TW_Tones · April 25, 2024, 11:18pm

I imagine there are regular expressions that could help, but if you look closely at the set widget, there is a way to reference the Nth entry in a variable (its select attribute, which counts from zero). You could save your two lists after split[] as variables, count the number of items, then use range[] starting at 0 to step through each position, using $set select=<<item>> to access the same position in both lists and compare them. As soon as one does not match, that item number plus one is the (1-based) position.

I can spell this out in more detail, but this is a quick answer.
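The stepping idea can be sketched outside TiddlyWiki to make the index arithmetic explicit. This is a minimal Python sketch, not TiddlyWiki code: Python lists stand in for the split[] results, a zero-based index plays the role of $set select, and a position past the end of a list is treated as an empty string (an assumption). Stepping over the longer list is also an assumption, so that extra trailing characters count as a difference. The name `first_mismatch` is hypothetical.

```python
from typing import Optional


def first_mismatch(a: str, b: str) -> Optional[int]:
    """Return the 1-based position of the first differing character, or None if equal."""
    chars_a = list(a)  # stands in for [<a>split[]]
    chars_b = list(b)  # stands in for [<b>split[]]
    # Step through every position of the longer list; i is zero-based, like $set select.
    for i in range(max(len(chars_a), len(chars_b))):
        # A position past the end of a list is treated as an empty string (assumption).
        ca = chars_a[i] if i < len(chars_a) else ""
        cb = chars_b[i] if i < len(chars_b) else ""
        if ca != cb:
            return i + 1  # convert the zero-based index to a 1-based position
    return None  # every position matched


print(first_mismatch("abcdef", "abcrsdef"))
```

Returning at the first mismatch mirrors "as soon as one does not match"; there is no need to compare the remaining positions.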

Charlie_Veniot · April 26, 2024, 1:31am

While you wait for somebody to give you more “modern” (or sleeker) code, here’s one old-school way to go about it, just for the giggles. (Only for the giggles?)

Paste this into a new tiddler and try different values for the string1 and string2 variables.

<$let string1="abcdefg"
      string2="ahcd"
      length1={{{ [<string1>split[]count[]] }}}
      length2={{{ [<string2>split[]count[]] }}}
      endrange={{{ [<length1>match<length2>] [<length1>add[1]] [<length2>add[1]] +[minall[]] }}} >


<$wikify name="wikify1" text="""
<$list variable="r" filter="[range[1],<endrange>]" counter="count">
<$let char1={{{ [<string1>split[]nth<r>else[:::]] }}}
         char2={{{ [<string2>split[]nth<r>else[:::]] }}}>
<$list variable="mismatch" filter="[<char1>!match<char2>]">
mismatch at character position <<r>>: {{{ [<char1>search-replace[:::],[end of string]] }}} and {{{ [<char2>search-replace[:::],[end of string]] }}},
</$list>
</$let>
</$list>
""">
<$list filter="[<wikify1>split[,]nth[1]]">
{{!!title}}
</$list>
</$wikify>

</$let>
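For readers following the wikitext step by step, here is a rough Python transcription of its logic (not TiddlyWiki code; the function name is hypothetical). endrange stops at the common length when both strings are equally long, otherwise one character past the shorter string, and ":::" is the sentinel for "no character here". Like the wikitext, it reports nothing when the strings match.

```python
from typing import Optional

SENTINEL = ":::"  # stands for "past the end of the string", as in the wikitext


def first_mismatch_message(string1: str, string2: str) -> Optional[str]:
    length1, length2 = len(string1), len(string2)
    # endrange: the common length if equal, otherwise one past the shorter string
    endrange = length1 if length1 == length2 else min(length1, length2) + 1
    for r in range(1, endrange + 1):  # like range[1],<endrange>
        # like split[]nth<r>else[:::]
        char1 = string1[r - 1] if r <= length1 else SENTINEL
        char2 = string2[r - 1] if r <= length2 else SENTINEL
        if char1 != char2:
            # like search-replace[:::],[end of string]
            shown1 = char1.replace(SENTINEL, "end of string")
            shown2 = char2.replace(SENTINEL, "end of string")
            return f"mismatch at character position {r}: {shown1} and {shown2}"
    return None  # no mismatch: the strings are identical


print(first_mismatch_message("abcdefg", "ahcd"))
```

Going exactly one step past the shorter string is what catches the "abc" vs "abcd" case without looping over the whole longer string.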

Charlie_Veniot · April 26, 2024, 2:05am

A little tweak to handle cases in which the strings match:

<$let string1="abcd"
      string2="abcd"
      length1={{{ [<string1>split[]count[]] }}}
      length2={{{ [<string2>split[]count[]] }}}
      endrange={{{ [<length1>match<length2>] [<length1>add[1]] [<length2>add[1]] +[minall[]] }}} >


<$wikify name="wikify1" text="""
<$list variable="r" filter="[range[1],<endrange>]" counter="count">
<$let char1={{{ [<string1>split[]nth<r>else[:::]] }}}
         char2={{{ [<string2>split[]nth<r>else[:::]] }}}>
<$list variable="mismatch" filter="[<char1>!match<char2>]">
mismatch at character position <<r>>: {{{ [<char1>search-replace[:::],[end of string]] }}} and {{{ [<char2>search-replace[:::],[end of string]] }}},
</$list>
</$let>
</$list>""" output="formattedtext">
<$list filter="[<wikify1>split[,]nth[1]regexp[mismatch]] [[Matching Strings !!!]] +[nth[1]]">
{{!!title}}
</$list>
</$wikify>

</$let>

EDIT: originally used “html” as the wikify widget’s output; changed it to “formattedtext”.
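The final $list in this version implements a "first result or fallback" pattern. It takes the first comma-separated chunk of the wikified text if that chunk mentions "mismatch", appends the fallback title, and keeps only the first item. A hypothetical Python equivalent, assuming the wikified text is a comma-separated list of mismatch messages:

```python
def first_or_fallback(wikified: str, fallback: str = "Matching Strings !!!") -> str:
    # [<wikify1>split[,]nth[1]regexp[mismatch]]: first chunk (whitespace trimmed),
    # kept only if it mentions "mismatch"
    first_chunk = wikified.split(",")[0].strip()
    candidates = [first_chunk] if "mismatch" in first_chunk else []
    # [[Matching Strings !!!]]: the fallback is always appended after it
    candidates.append(fallback)
    # +[nth[1]]: keep only the first item
    return candidates[0]


print(first_or_fallback(""))
```

Appending the fallback unconditionally and then taking the first item avoids an explicit if/else branch, which is why the pattern suits filter syntax.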

TW_Tones · April 26, 2024, 4:43am

@RetRoland I presume you would consider extra characters as a difference?

Charlie seems to have given a working example, but I am interested in exploring this kind of example, as I am currently working on enhancing string handling. Before I share another approach:

@RetRoland you can also look at the various difference tools in TiddlyWiki, such as the DiffTextWidget:

<$diff-text source=<<a-string>> dest=<<b-string>>/>


Whether a difference is shown as an addition or a deletion depends on which string is the source and which is the destination, and contiguous differences are counted as one.
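To illustrate what "contiguous differences count as one" means, and how positions could be obtained for further processing, here is a sketch using Python's standard difflib.SequenceMatcher. This is a swapped-in tool, not what the DiffTextWidget uses internally. Its non-equal opcodes each describe one contiguous run of changed characters. The helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def diff_runs(source: str, dest: str) -> List[Tuple[str, int, int, int, int]]:
    """Return only the non-equal opcodes: each one is a contiguous run of differences."""
    matcher = SequenceMatcher(None, source, dest)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def first_diff_position(source: str, dest: str) -> Optional[int]:
    """1-based position in `source` where the first run of differences starts."""
    runs = diff_runs(source, dest)
    return runs[0][1] + 1 if runs else None


print(diff_runs("abcdef", "abcrsdef"))  # [('insert', 3, 3, 3, 5)]: "rs" is one inserted run
print(diff_runs("abcrsdef", "abcdef"))  # [('delete', 3, 5, 3, 3)]: same run, seen as a deletion
print(first_diff_position("abcdef", "abcrsdef"))  # 4
```

Swapping the arguments turns the same run from an insertion into a deletion, which is the source/destination dependence described above.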

Using the set widget’s select parameter:

\define a-string() abcdef
\define b-string() abcrsdef
\function zthitem() [<item>subtract[1]]
\function test() [<each>!match<comparing>then<item>]

<$list filter="[<a-string>split[]]" counter=item variable=each>
<$set name=comparing filter="[<b-string>split[]]" select=<<zthitem>> >
<!-- <<item>> <<each>>/<<comparing>>  --><<test>>
</$set>
</$list>

The above returns 4 5 6.

This shows the position of every mismatch. All you need is the first position, 4, but note that it does not account for which string is tested first (the list only iterates over the characters of a-string).
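A minimal Python sketch of what the example computes (the name `all_mismatches` is hypothetical). For each character of the first string, it compares that character with the one at the same zero-based position in the second string and collects every differing 1-based position. A position past the end of the second string is treated as an empty string, which is an assumption about how an out-of-range select behaves. Running it in both directions shows the asymmetry.

One more thing worth checking in the wikitext: TiddlyWiki normally removes duplicate titles from a filter run's results, so [<a-string>split[]] may collapse repeated characters (e.g. "aab") and shift positions. The `=` filter run prefix keeps duplicates.

```python
from typing import List


def all_mismatches(a: str, b: str) -> List[int]:
    """1-based positions where a differs from b, looping over a's characters only."""
    positions = []
    for item, each in enumerate(a, start=1):        # counter=item, variable=each
        zth = item - 1                              # zthitem: zero-based index for select
        comparing = b[zth] if zth < len(b) else ""  # out of range -> empty (assumed)
        if each != comparing:
            positions.append(item)                  # test(): emit the 1-based item
    return positions


print(all_mismatches("abcdef", "abcrsdef"))
print(all_mismatches("abcrsdef", "abcdef"))
```

The first element of the list (if any) answers the original question. Looping over the longer of the two strings instead would make the result independent of argument order.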

I believe I can compress this further using functions.

RetRoland · April 26, 2024, 2:58pm

This is a very interesting approach that I had never thought of.
It looks promising; I will try to follow this one.
Maybe it could be done sleeker, but not by me.
Thank you!

Edit:
Reading on, I think that @TW_Tones’ approach suits my needs better.
But this doesn’t diminish your snippets; they’re very informative to me, and I will keep them in my library for future use.

RetRoland · April 26, 2024, 3:11pm

I discovered the Diff-Text-Widget some days ago, and it is extremely helpful for quickly visualizing differences, e.g. in TW code.

It doesn’t solve my problem, though, because I can’t get any values back from it for further processing, so I had to look for another solution.

I didn’t remember the each-operator, which seems to be the key to a solution. Had I remembered it, I would never have needed to ask my question.
Thank you very much; this is what I was searching for, and thank you for providing a working example, too.
Great stuff!

tw-FRed · April 26, 2024, 6:24pm

Hi folks!

Here’s my take on it, just for fun…

Specs:

Only filter syntax, no wikitext, because… why not?
We want a function with 2 parameters considered as strings (empty values are Ok)
When both parameters are identical, the function returns nothing (even if both parameters are empty)
Else the function returns the index of the first different character, “1-based”.

Here’s the code:

\function str.secondIsLonger(s1, s2) [<s1>length[]] :filter[<s2>length[]compare::gt<currentTiddler>then<currentTiddler>]

\function str.firstDiffIndex(str1, str2) [<str1>split[]] :map[<str2>split[]zth<index>else[]!match<currentTiddler>then<index>] :filter[<currentTiddler>!is[blank]] [str.secondIsLonger<str1>,<str2>] :else[[-1]] +[minall[]!match[-1]add[1]]

---

Examples:

---

Should be "same":

{{{ [str.firstDiffIndex[abcdefgh],[abcdefgh]] :else[[same]] }}}

---

Should be "same":

{{{ [str.firstDiffIndex[],[]] :else[[same]] }}}

---

Should be "4":

{{{ [str.firstDiffIndex[abcdefgh],[abcZdefgh]] :else[[same]] }}}

---

Should be "9":

{{{ [str.firstDiffIndex[abcdefgh],[abcdefghZ]] :else[[same]] }}}

---

Should be "9":

{{{ [str.firstDiffIndex[abcdefghZ],[abcdefgh]] :else[[same]] }}}

---

Should be "1":

{{{ [str.firstDiffIndex[],[abcdefgh]] :else[[same]] }}}

---

Should be "1":

{{{ [str.firstDiffIndex[abcdefgh],[]] :else[[same]] }}}

Notes:

The str.firstDiffIndex function’s first filter run splits every char from its first parameter str1 ([<str1>split[]])
In the second filter run, the second parameter (str2) characters are split and relevant char is chosen using the index variable of the filter run (<str2>split[]zth<index>). When str2 is shorter than str1, the result may be empty, in which case it’s replaced by an empty char (else[]) in order to induce a difference.
At this point, the nth char of str2 (or an empty string) is compared with the current str1 character, and if they don’t match, the result is the current index (!match<currentTiddler>then<index>).
Then a :filter filter run removes the empty values left by the :map filter run (the :map result always has at least as many items as its input): :filter[<currentTiddler>!is[blank]].
Now we need to address the special case of “abc” vs “abcZ”, i.e. str2 is identical to str1 but with added content. In this case the result should be the length of str1 + 1. That’s the purpose of the str.secondIsLonger function.
Function str.secondIsLonger compares the lengths of its parameters and returns the length of the first one only if the second one is longer. It leverages one great power of functions: they can be composed of several filter runs but behave like a filter operator when used. Filter syntax doesn’t allow things like <str2>length[]compare::gt[ <str1>length[] ], so here 2 filter runs are used.
As a result of all this, we now have a list that may contain some difference indexes, then maybe the length of str1, or… nothing at all when both strings are the same.
The lowest value in this list is the first difference, so +[minall[]] should do the trick, but… when given no input, the result of +[minall[]] is “Infinity” (literally)! To avoid this behavior, a special value is added when the list is empty (:else[[-1]]) just before choosing the minimum (+[minall[]]), and then excluded again (!match[-1]).
Lastly, any index value is “0-based”, but we want a more human-readable “1-based” value, so we add 1 (add[1]), hence the last two filter runs: :else[[-1]] +[minall[]!match[-1]add[1]].
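For tracing the notes above, here is a Python port of the two functions, step by step. `second_is_longer` and `first_diff_index` are just Python spellings of Fred's function names. Python's min() raises on an empty list instead of returning Infinity, so an early return takes the place of the -1 sentinel.

One behavioural difference is worth checking. In TiddlyWiki, the results of an ordinary filter run are deduplicated, so the first run [<str1>split[]] may collapse repeated characters (e.g. "aab"); the `=` run prefix keeps duplicates. The port below has no such deduplication. Fred's examples all use distinct characters, so they are unaffected.

```python
from typing import List, Optional


def second_is_longer(s1: str, s2: str) -> Optional[int]:
    """Like str.secondIsLonger: the length of s1, but only when s2 is longer."""
    return len(s1) if len(s2) > len(s1) else None


def first_diff_index(str1: str, str2: str) -> Optional[int]:
    """Like str.firstDiffIndex: 1-based index of the first difference, None if identical."""
    indexes: List[int] = []
    for index, ch in enumerate(str1):                     # [<str1>split[]] plus :map's index
        other = str2[index] if index < len(str2) else ""  # zth<index>else[]
        if other != ch:                                   # !match<currentTiddler>then<index>
            indexes.append(index)                         # blanks never added (:filter run)
    extra = second_is_longer(str1, str2)                  # [str.secondIsLonger<str1>,<str2>]
    if extra is not None:
        indexes.append(extra)
    if not indexes:
        # min() raises on an empty list; returning early replaces :else[[-1]] and !match[-1]
        return None
    return min(indexes) + 1                               # minall[] then add[1]


# Fred's examples: None plays the role of "same"
for s1, s2 in [("abcdefgh", "abcdefgh"), ("", ""), ("abcdefgh", "abcZdefgh"),
               ("abcdefgh", "abcdefghZ"), ("abcdefghZ", "abcdefgh"),
               ("", "abcdefgh"), ("abcdefgh", "")]:
    print(repr(s1), repr(s2), first_diff_index(s1, s2))
```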

Please note that all this code is quite expensive, so it’s not ready for production use: for each character of the first parameter, the second parameter is split even if a difference has already been found (or worse, if there’s no more character in str2 to compare to)! Anyway, it’s been a lot of fun!
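The cost described here comes from re-splitting str2 for every character and carrying on after the first difference. Outside filter syntax, the usual remedy is a single pass that stops at the first mismatch and falls back to the prefix case afterwards. A hypothetical Python sketch (the name `first_diff_index_fast` is illustrative):

```python
from typing import Optional


def first_diff_index_fast(str1: str, str2: str) -> Optional[int]:
    """Single pass with early exit: 1-based index of the first difference, None if identical."""
    # zip() stops at the end of the shorter string, so each pair is compared once
    for index, (c1, c2) in enumerate(zip(str1, str2)):
        if c1 != c2:
            return index + 1                     # first differing character, 1-based
    if len(str1) != len(str2):
        return min(len(str1), len(str2)) + 1     # one string is a prefix of the other
    return None                                  # identical (including both empty)


print(first_diff_index_fast("abcdefgh", "abcZdefgh"))
```

It does at most min(len(str1), len(str2)) comparisons, against roughly len(str1) splits of str2 in the filter version.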

Fred

TW_Tones · April 26, 2024, 10:00pm

Lots of interesting methods and ideas in your example, @tw-FRed; thanks for sharing.

I am still working through how it works
I like your approach to dealing with the different lengths, but wonder if there are even more ways?
I am impressed with the reduction to two filters.

You may be interested in the stringlength operator I created (you can ignore the controversy in that thread): Idea: Enhance the length operator - #32 by TW_Tones

I wonder if a filter operator to return the longest/shortest string would be practical?
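Purely to illustrate the semantics such an operator might have (it is only an idea in this thread, not an existing operator), here is a Python sketch that returns the longest or shortest string of a list. It keeps the first one on ties and returns nothing for an empty list. The function names are hypothetical.

```python
from typing import List, Optional


def longest(strings: List[str]) -> Optional[str]:
    # max() returns the first item among equally long strings
    return max(strings, key=len) if strings else None


def shortest(strings: List[str]) -> Optional[str]:
    # min() likewise returns the first item on ties
    return min(strings, key=len) if strings else None


print(longest(["abcdef", "abcrsdef"]), shortest(["abcdef", "abcrsdef"]))
```

In the comparison problem above, picking the longer string first would let the loop run over it, so extra trailing characters count as a difference regardless of argument order.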

I think any other ideas for enhancing string operations should be identified and collated.

Mario · April 27, 2024, 4:41am

5 posts were split to a new topic: Should there be a substring filter operator?