How to Filter A + (B-A)? - and AI Generated Code

Ah, I didn’t notice that the code wasn’t formatted as such. I do apologize for jumping to the conclusion that the code was untested.

[Slightly edited:]

In some respects, it doesn’t ultimately matter what the origin is, so long as you’ve tested the code and made sure it works (ideally, in an elegant/flexible way).

However, our community norm is to avoid posting LLM/AI-generated code without acknowledgment. And in this case, even before seeing the bits that were stripped out within angle-brackets, it's clear that the approach in that code is really not a straightforward and elegant solution compared to the neat solution posted earlier in the thread by @saqimtiaz.

I don't think it's a coincidence that such verbose and roundabout code was generated by an LLM, though I myself have also sometimes built very complex Rube Goldberg contraptions when I couldn't find an elegant solution!

For such reasons, I do think our norm should be not only to make sure code works, but also to include a disclaimer if/when LLM-generated code is posted by someone who is not prepared to explain how it works and how to learn from it.


So, this is fantastic — documentation updates can be quite fast:

https://tiddlywiki.com/#unique%20Operator:[[unique%20Operator]]%20[[Dominant%20Append]]
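For anyone skimming, a quick sketch of the contrast those two pages draw. The sample lists here are my own, purely for illustration:

<$let
  list1="[[33]] [[22]] [[44]]"
  list2="[[22]] [[33]] [[11]]">
{{{ [enlist<list1>] [enlist<list2>] }}}
{{{ [enlist<list1>] :all[enlist<list2>] :and[unique[]] }}}
</$let>

If I'm reading the docs right, the first filter uses the default Dominant Append behavior, so duplicates keep their later position (44 22 33 11), while the second uses unique, so duplicates keep their earlier position (33 22 44 11).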

Thanks again @saqimtiaz for helping ordinary folks step up here and there.

@twMat: if you had seen this kind of documentation, would that have helped you avoid feeling perplexed?


UPDATE: I do now see the code, and how it works. (What follows still reflects my thinking about whether it’s a good solution, but “good” is a matter of degree.)

Although the code you curated through the LLM may work, I do think, given its complexity, that it's a bit of a kluge. It works, and shows that a solution is possible, but it is cumbersome and indirect.

If someone asks, “Is this possible?” the answer (in TiddlyWiki) is pretty much always “yes.” But if they ask, “How should I do this?” I think our goal should be to give people the conceptual tools they need so that they can solve similar problems, parsimoniously (elegantly).

There are lots of convoluted ways to get around the Dominant Append behavior when one wants the opposite (that is, to keep the earlier value when there are duplicates). In the past, I have tried a workaround with a set of reverse[] operations: feeding a "backwards" order into the de-duplication process and then re-reversing the order. Although that, too, works, I knew it was not the most elegant solution.

At any rate, my suggestion is that when you post something that was generated by an LLM process, and you don't yourself understand how or why it works (and therefore probably don't understand whether it's an efficient or flexible solution), please be up-front about this fact, with a disclaimer such as "This code comes from an LLM, but I tested it, and it does seem to work".

Still, I do think people would welcome this kind of tool. Sometimes we encounter one-off problems and are more interested in getting any solution that works, so we can move on with the larger tasks at hand. :slight_smile:

@Springer – have a look at How to Show Backticks in Discourse Threads to see how to explain backticks more easily.

It’s a wiki, so everyone can see the code. … and improve it.

First of all: thank you for improving the docs! That bit about unique keeping the first instance was absolutely news to me, as was the conceptual contrast with "dominant append".

Would it have helped? Well, it will from now on. I turn to the docs quite a lot, especially for the filter ops, and sometimes without any specific problem to solve but rather with the intention to generally understand what is possible. For my OP I don't think I had considered unique at all, so I don't think it would have helped (in the "past tense"), but I think it will help next time because I now have a better sense of what unique does.


Norbert, welcome! Thanks for posting. I didn't see this until after your code was polished a bit, so I happily missed the confusion around it :wink:
With that said, that's an interesting solution. One key question is whether it is more efficient than other solutions. It does seem to involve more stuff than Saq's solution - but your solution can be simplified into this:

<$let 
  list1="[[33]] [[22]] [[44]]"
  list2="[[22]] [[33]] [[11]]">
{{{ [enlist<list2>] -[enlist<list1>] +[prepend<list1>] }}}
</$let>

i.e. the filter:
[enlist<list2>] -[enlist<list1>] +[prepend<list1>]
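
Spelling out each run with the sample lists above (the arrows are my annotations of what I expect each step to yield, not actual output):

[enlist<list2>]       → 22 33 11
-[enlist<list1>]      → 11              (drop anything that is also in list1)
+[prepend<list1>]     → 33 22 44 11     (put all of list1 back at the front)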

…and now I don't know which is more efficient, this or @saqimtiaz's, i.e.:
[enlist<list1>] :all[enlist<list2>] :and[unique[]]

Does anyone know?

[enlist<list1>] =[enlist<list2>] +[unique[]]

Nice job seeing through the weeds to get at the minimalist form of what was working in the code pasted in by @stelzi. (Note we can further streamline the code by @saq by using short-form filter run prefixes.)
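
For anyone who hasn't met the short-form prefixes: as far as I know, = is shorthand for :all and + is shorthand for :and, so these two spellings should behave identically:

[enlist<list1>] =[enlist<list2>] +[unique[]]
[enlist<list1>] :all[enlist<list2>] :and[unique[]]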

There's the question of what's more efficient (conceivably with performance implications, though probably not in this case), but also the question of which code feels more semantic.

The code by @saqimtiaz reads well, from left to right: "Get these things that belong at the beginning; also get those (without de-duplication) that belong next in order; then remove duplicates (favoring earlier instances)." The LLM code says: "Get the second-ranked list, weed out anything that's also part of the first-ranked list, then stick all of that first-ranked list at the front of the train."

If you think of your solution as a pattern to modify or re-deploy in future scenarios — perhaps cases that combine three or four overlapping lists (while needing to keep the earliest copy of any duplicate) — which solution generalizes better and is easier to troubleshoot?
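
For what it's worth, the unique-based pattern seems to extend naturally to more lists; here is a sketch for three (list3 is hypothetical, and I have only eyeballed this, not tested it):

[enlist<list1>] =[enlist<list2>] =[enlist<list3>] +[unique[]]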

Maybe even more semantic for potential expansion/adaptation (with "append" admittedly inspired by the prepend in @stelzi's solution) is this concise version:

[enlist<list1>append<list2>unique[]]

(For reasons I don’t understand, the enlist operator seems not to matter for lists beyond the first, perhaps because “append” implicitly enlists? Anyway, this minimalist code (fewer keystrokes than either of the initial candidates) is the shortest I can come up with… :thinking:)
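
My understanding (worth verifying against the docs) is that append parses its operand as a title list, so the explicit enlist is only needed for the run's initial input. A step-by-step sketch with the same sample lists:

[enlist<list1>]   → 33 22 44
append<list2>     → 33 22 44 22 33 11   (plain concatenation; duplicates survive)
unique[]          → 33 22 44 11         (first instance of each title wins)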

Personally I would say:

  • Always declare the use of LLMs to produce content, possibly until the heat death of the universe.
  • Always test first, and say whether it has been fully tested.

@twMat I think it is productive and helpful to present such logical set requirements in plain language; this often also leads to finding ways to rephrase the question. It also helps to give a hint as to what the filter is trying to achieve. That said, I am aware you are already simplifying in order to state the question.


:rofl:

I suspect there may be grey-area cases, where code is generated by an LLM, but the person working with the LLM and sharing the result here recognizes not just that it works, but is also fluent enough to see how and why it works, and has enough understanding to recognize why it is a decently straightforward solution: an "I see how I would have gotten there eventually" kind of moment.

Even so, of course, I see wisdom in erring on the side of acknowledgment, because that affects what we can expect in further conversation.

Basically, if you post code here as a suggested approach to a problem, you're implicitly saying that the code reflects your own understanding, and that you're paying attention to how and why some solutions might count as better than others.

Only in exceptional circumstances should someone post code (LLM or otherwise) that does not express any of their own substantive understanding. It would have to be a case where:

  1. The source of the functional but poorly-understood code is acknowledged, and
  2. Nobody else has solved the problem at hand already, and/or
  3. The poster is at least interested in developing a substantive understanding, as in “Oh, I found this (other) approach (via LLM/on another wiki/etc.), and I’m curious: Why does this (also) work, and are there advantages/disadvantages to doing it this way?”

Quite relevant: AI Generated Content on talk.tiddlywiki.org.

Right, I was replying with that thread in mind, but thanks for flagging it here.

I replied there.

What, specifically, is not clear in my OP?

Oh! That's very neat and seems very efficient. In my ignorance about these things, I'm guessing that appending is just a matter of the system changing a pointer or two, and then unique does a more costly traversal to compare titles, but that can't be avoided given the requirement.


There is unlikely to be a significant performance difference between this filter and the one I suggested, as the costly bit is the unique operator. So I suggest using whichever fits your cognitive pattern best.

It is not that we can't get clarity on what your problem is, but phrased in another way we may be able to state it in terms of logic or plain language much more easily. What is the reason you want this to be evaluated, and specifically to maintain this order?

You don’t have to answer that, since you have the answer already.

I just think we may be able to provide a whole class of solutions, not just a specific one.

Given list A append list B, retaining the order and removing duplicates

This is exactly how [enlist<A>append<B>unique[]] reads.
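
To make that concrete, a minimal example (the values are assumed, just for illustration):

<$let
  A="[[33]] [[22]] [[44]]"
  B="[[22]] [[33]] [[11]]">
{{{ [enlist<A>append<B>unique[]] }}}
</$let>

This should render 33 22 44 11: all of A in order, followed by whatever B adds that A did not already contain.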

Hi folks, I was not able to split the AI posts from the OP, so @twMat I changed the thread title to fit the actual content.

Fascinating! I would not have thought that unique could be more costly than any of the other listops when working with similar numbers of items. Are you suggesting that an additional remove step (as in [enlist<list2>] -[enlist<list1>] +[prepend<list1>]) would be less costly (computationally) than using unique (as in [enlist<list1>append<list2>unique[]])? If so, we have a genuine tradeoff here between machine-efficient code and natural-reading, easily-expandable code.

If Dominant Append is the most efficient (machine-wise), and unique is the least so, then perhaps I should finally trot out my own Rube Goldberg solution now (which respects one of my desiderata: easily accommodating any number of lists, something the remove-prepend acrobatics cannot do):

[enlist<list2>reverse[]] [enlist<list1>reverse[]] +[reverse[]]
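
For the curious, here is how I believe it plays out with the sample lists from earlier; the combination of the first two runs relies on Dominant Append, so later duplicates win before the final reversal:

[enlist<list2>reverse[]]   → 11 33 22
[enlist<list1>reverse[]]   → 44 22 33, dominant-appended onto the above: 11 44 22 33
+[reverse[]]               → 33 22 44 11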

Just backing that train right up to the station… :nerd_face:

To clarify, the unique operator is costly compared to the rest of the operations of those rather simple filter expressions.

Potentially, though I would need to study the code implementation to answer definitively. However, I do think that unless we are dealing with lists hundreds of thousands of items long, the difference if any is going to be negligible.

My uninformed gut feeling says that might be the slowest option, since the reverse operator needs to loop through all its input.

  • I appreciate your insight here

I suppose what we need is some kind of reference for the missing number, one that helps us scale it to a meaningful time cost.

The unique operator may cost twice as much compared to the rest of the operations, say 2¢ vs 1¢ per hundred thousand items.

Perhaps we could use an existing large data set and a standard environment to extract some metrics as a general rule?
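
One possible starting point, if memory serves (worth double-checking against the current docs): the core has a hidden performance-instrumentation setting that logs filter execution times to the browser console, which would at least give comparable numbers on a shared data set. A sketch of the config tiddler:

title: $:/config/Performance/Instrumentation
text: yes

After setting that and reloading, filter timings should appear in the developer console; I believe calling $tw.perf.log() there prints a summary table as well, though I have not verified this recently.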
