AI Generated Content on talk.tiddlywiki.org

That paper, in turn, appeals to Frankfurt's philosophical account of bullshit.

And indeed, the reason I’m suspicious of LLM output is not so much because it’s “dumb machine circuits” rather than “smart organic humans” but because of how LLMs have been trained up. They’ve been trained to throw gobs of stuff at the wall repeatedly until something sticks.

When our children do that, we send a clear visceral signal that there are lines that must not be crossed. Throw spaghetti at the wall one more time, and you’ll be regretting it for a while.

When we are in a learning role, we need a sense of finitude, humility, social deference to those who are likely to know what we don’t, and shame (being mortified, taken aback, and rendered at least briefly dumb with soul-searching when we really mess up, hit the third rail, encounter the silent treatment, etc.).

Alas, those who have been training up our LLMs chose an initial recipe that is 100% shameless exuberance and 0% of the critical capacity (awareness of one's own ignorance) that is equally vital to intelligence. LLM disclaimer boilerplate (about their limits) and "taboo patches" (censorship around certain words or patterns) are tacked on at the edges, rather than built into the core of their handling of meaningful stuff.

As Socrates realized (trying to make sense of the oracle that proclaimed no Athenian was wiser than he): The first step in wisdom is recognizing one’s own ignorance.

6 Likes

His book On Bullshit just arrived yesterday… some light vacation reading! :slight_smile:

Moreover, what sticks has nothing to do with underlying models of the domain in question, but simply, “What is the word most likely to come next in my partial output?” That’s great for creating syntactically reasonable output, but has little to do with the underlying semantics.
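
To make "what is the word most likely to come next" concrete, here is a toy sketch (plain Python, invented for illustration; real LLMs use learned neural networks over subword tokens, not raw counts): it counts which word follows which in a tiny corpus and always emits the most frequent continuation, with no model of what any of the words mean.

    # Toy illustration of "pick the most likely next word":
    # count word bigrams in a tiny corpus, then greedily extend a prompt.
    # Real LLMs learn these probabilities with neural networks over subword
    # tokens, but the objective -- predict the next token -- is the same in spirit.
    from collections import Counter, defaultdict

    corpus = "mary was sick yesterday . mary was tired yesterday . mary was sick today .".split()

    follows = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        follows[current][nxt] += 1

    def continue_text(prompt, steps=3):
        words = prompt.split()
        for _ in range(steps):
            candidates = follows.get(words[-1])
            if not candidates:
                break
            words.append(candidates.most_common(1)[0][0])  # most frequent follower
        return " ".join(words)

    print(continue_text("mary was"))  # e.g. "mary was sick yesterday ."

Nothing in that loop knows anything about Mary or about illness; it only knows which strings tended to follow which.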

Surely there’s a continuum here, and no clear line.

I suspect much of human speech is poor in semantic depth, and mostly a matter of having a "knack" for "what to say when".

Some humans, including some with a great deal of power, skim over the surface of language without engaging carefully with meanings. They engage in bullshit just as much as LLMs. (I’m just going to leave that observation there, to avoid derailing the conversation…)

And even while I like to decry such behavior, we are all more or less sphexish in how we navigate the world and language (without implying anything about determinism here — just being fallible and inept around the edges).

So the difference between semantic depth and mere “figuring out what sticks” is a matter of degree, and can’t be a litmus-test difference between humans and LLMs.

2 Likes

Yes, of course. For the past year, I've been playing the role of the cantankerous sceptic about LLMs at my day job, a counterbalance to the extreme enthusiasm most everyone seems to express. (Although the last two months have seen a drastic turn-around.)

Some of that carries over here.

I loved seeing “sphexish” – I’m a long-term Hofstadter fan. And I mostly agree with his suggestion that there is a huge continuum between form and content. I love the example (either from one of his Scientific American columns or from his commentary on one of them in Metamagical Themas) that goes something like this:

Given Sentence M: “Mary was sick yesterday,” we can describe varying levels of understanding:

  1. Sentence M contains twenty characters.
  2. Sentence M contains four English words.
  3. Sentence M contains one proper noun, one verb, one adjective, and one adverb, in that order.
  4. Sentence M contains one human’s name, one linking verb, one adjective describing a potential health state of a living being, and one temporal adverb, in that order.
  5. The subject of Sentence M is a pointer to an individual named ‘Mary’, the predicate is an ascription of ill health to the individual so indicated, on the day preceding the statement’s utterance.
  6. Sentence M asserts that the health of an individual named ‘Mary’ was not good the day before today.
  7. Sentence M says that Mary was sick yesterday.

I don’t think LLMs are yet capable of level 4; they’re not even close to level 5.

But yes, I certainly overreached. Blame it on overexposure to LLM fan-boys steering GigantiCorp’s technical direction! And I’m still unconvinced that there’s much there there. I do think The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con explains a great deal of the “success” of LLMs. I am getting older; perhaps it’s just a matter of being crotchety too. :wink:

3 Likes

All of the suggested features can be achieved with the functions I published. Like you, I would not want to oblige people to search for them just to get the point I am trying to make.

  • On the one hand, a reasonable percentage of readers may just scan the code samples, or even ignore the quoted section.
  • On the other hand, someone who understands the code can:
    • Critically review the proposed functions
    • Perhaps go directly to coding a trigger or action
    • Tell me how to code such triggers or actions.

As a sceptic and critical thinker I have questioned the claims about LLMs and call them exactly that: LLMs. They are not A.I.

I have spent substantial time researching how to make use of them, and how they work. Some of this has been made visible within talk.tiddlywiki.

If you totally ignore the positive hype, the doomsayers and the overly critical, you find they are a powerful tool; but as with any tool, a trusted friend, or any human expert, you need to be vigilant, sceptical and critical.

  • Because of this, never use an LLM to publish something you yourself do not know, without validation. This is the strong argument supporting discouraging raw LLM answers here.

You also discover that asking the right questions, the new art/science of "prompt engineering", is the main determinant of the effectiveness of LLMs. It would be unwise to emphasise too heavily that LLMs produce bullshit, because it may imply your questions were bullshit: garbage in, garbage out (GIGO).

To put it another way: just as search engines, or queries against a "Data Warehouse", help us gain access to a large "corpus" of knowledge, LLMs provide us with another avenue to query a larger, less curated corpus of knowledge. But, by their nature, they require even more scepticism, criticism and validation, especially when you are outside your own knowledge domain.

  • Despite this, they remain a powerful tool, or subjective framework, for seeking and managing knowledge.

The notion of bullshit as I meant it refers to that article, which paraphrases Frankfurt to say

“The models are in an important way indifferent to the truth of their outputs.”

And I'm afraid that I believe, as a very outside observer, that prompt engineering is similar to both the psychic's con and to the nonsense about knowledge from previous lives that Plato promotes in the Socratic dialogue, Meno. With enough triangulation from questions, you can arrive at an answer that suits you, but that has no relationship whatsoever with what's actually true or what's actually known.

But the subject at hand is community standards. I don’t think posting small chat outputs here is a big deal, so long as they’re properly labelled and not passed off as original work. Unless the discussion is about the LLMs, though, I would find it much more helpful if the poster synthesized the information in their own words and did enough research to verify that it’s not hallucinations/bullshit.

Such posters should also recognize that in using this material, they may be losing some not insignificant portion of their potential audience. I read most everything you post here, @TW_Tones, but I skipped that one, entirely on the basis of your sourcing from ChatGPT.

2 Likes

It’s fair to say, as you do here, that there’s no source of information we should approach in an absolutely uncritical way.

In at least one way, though, LLMs are very much unlike trusted friends and human experts. They are impervious to our interest in accountability, and their developers have shown no interest in building genuine accountability into their interactions.

Pattern-recognition (what LLMs do) is a matter of what the logician and semiotician C.S. Peirce calls "firstness" (registering similarities, resemblances, which are always a matter of degree), while an orientation toward distinguishing the true from false, appropriate from inappropriate (etc.) involves engaging with signs in their third (most normative) dimension.

My friends, and human experts, do participate in this field of thirdness (being accountable for registering what’s ok vs what’s not). For this reason, there’s such a thing as developing some appropriate degree of trust in them. If I mislead you by relying on them (in precisely the areas where they’ve committed to being reliable), I can apologize to you, but I can also turn around and hold them to account.

Meanwhile, there is no appropriate degree of trust (normative trust) in an LLM — though of course there’s such a thing as making good probability bets. How I interact with LLMs (or share their output, etc.) involves a distinct responsibility that can only lie with me and with other socially-connected beings who show me that they give a damn about not screwing up.

(I don’t imagine that we’re disagreeing on anything of substance here, @TW_Tones — I’m just hoping to articulate one qualitative difference that is easily overlooked in the "just as with other sources of information… " line of argument.)

5 Likes

Bravo, @Springer

The universe AI cares not.

3 Likes

Yes, @Springer, I think we are in agreement.

@Springer, yes, my point was that even if you trust a friend you need to be sceptical; if I couldn't trust them at all, they would not be a friend. I was not saying LLMs are a trusted friend; they are an untrusted tool. If they were useless, I would not use them at all. I don't trust hammers either; it's the way I use them.

I have made use of LLMs to great effect, as a tool. You could say it takes fuzzy search to new heights. It is in fact important to ask the right questions and be critical of the result.

  • I don't think I need to argue here that they have many effective applications.

Until such time as real A.I. exists, we use LLMs as tools, and we don't hold our tools accountable; we ourselves are accountable for what we do with our tools. In fact, when I produce something, even with the advice of trusted humans, I need to be accountable for what advice I relied on.

  • I think this points to a negative consequence of claiming LLMs are A.I.: some are inclined to assign responsibility to the LLM and ignore their own.
  • Yes, because they are LLMs, not A.I., or even I.
  • Even the title of this thread makes the error of suggesting there is any A.I. in LLMs.
    • If we needed it, I could demonstrate this by listing what LLMs can't, don't or will not do.

It is true that some people will over-rely on LLMs as a tool and pass the output off as their own work when it is not, and yes, this should be discouraged. Like any information taken without sufficient scepticism, it will be unreliable.

I think your larger statement "The universe cares not" is the more reliable one. This is also clear in evolutionary science: to start with, we need a "theory of mind" at a minimum.

  • Care is perhaps only potentially valid between individuals and in social species.

I am sure there are people out there who don't know what they are talking about, but there is a serious new science emerging around prompt engineering, with some highly successful approaches. We ignore that at our peril. Prompt engineering is about setting the context, the input, the filters, and asking the right questions.

  • Not-so-fuzzy questions, asked of a fuzzy data/knowledge base; a rough sketch of this kind of framing follows below.
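
As a rough illustration only (a minimal sketch: build_prompt and the send_prompt call are hypothetical stand-ins, not any real chat API), this is what "setting the context" can look like in practice. The same underlying question is wrapped with a role, a scope, and explicit constraints before it is sent.

    # Minimal sketch of prompt framing. send_prompt() is a hypothetical
    # placeholder for whatever chat interface is actually used; the point is
    # only that context, scope and constraints are set before the question.

    def build_prompt(question, role, scope, constraints):
        return (
            f"You are {role}.\n"
            f"Scope: {scope}\n"
            f"Constraints: {constraints}\n"
            f"Question: {question}\n"
        )

    bare = "How do I toggle a CSS class on a button?"

    framed = build_prompt(
        question=bare,
        role="an experienced front-end developer advising on a TiddlyWiki-based website",
        scope="plain HTML, JavaScript and CSS only; no external frameworks",
        constraints="state which browser APIs you rely on, and say so when you are unsure",
    )

    # send_prompt(bare)    -> tends toward a generic, unscoped answer
    # send_prompt(framed)  -> more likely to stay within the stated scope

The framed version does not make the model any more truthful; it only shifts the probabilities toward answers that fit the stated scope.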

I would suggest it is an errant approach to ignore content based on an arbitrary classification, especially if you normally find my content of use. However, it may not have been content of use to you; I scan some content for importance or relevance all the time.

But I am ready to read your content. I am not ready to read ChatGPT’s content.

With a limited amount of time to invest on any subject, I might briefly scan widely, but I will invest time in sources that seem more reliable. During the height of Covid, for instance, I paid far more attention to serious medical researchers than to the likes of Joseph Mercola.

ChatGPT is emphatically not a reliable source. Its training material contains plenty that is true, but also contains vast swaths of falsehoods. And it has no mechanism to distinguish them. The only way it might get a relatively high truth rating is if the material it’s trained on is substantially more truthful than the Internet as a whole. And that would require enormous amounts of expertise.

Because such LLMs are trained on more correct information than disinformation, they are useful to suggest areas of study, to help with brainstorming, and so on. But their raw output is not worth much more than that. I choose not to invest my time with them.

Had you synthesized the information yourself, done some research into those techniques, and only used LLM output as a jumping-off point, then I would have happily read your post.

Yes, I've been following the LLM journey somewhat closely. As the backlash seems to be setting in, I'm guessing we'll see much more of a realization that prompt engineering is just another example of trying to fit people to the limitations of software rather than crafting software designed for people. I'm guessing it will end up a short-lived fad. I'll bet my NFTs on it. :wink:

No, there are plenty of intelligences indifferent to the truth; we call them bullshitters. For at least current-crop LLMs, there is simply no notion of truth to consult. I agree that "AI" is misapplied to LLMs, but pretty much every historical application of the term to some field of technology has been problematic. LLMs' level of (un-)intelligence has little to do with their reliability.

3 Likes

I urge further investigation into this. For example, with prompt engineering, if you ask for the answers that a qualified professional would give, you shift the probabilities toward getting better answers.

As I said, it is the quality of the questions you ask.

I have already created a custom chat session that includes specific TiddlyWiki source material in its corpus; for example, the word "wikitext" returns valid wikitext. It is not perfect, because it needs more training or references.

  • It also gives code examples you can test; a rough sketch of how such a grounded session might be assembled follows below.
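
For readers wondering what "including source material in the session" can amount to mechanically, here is a minimal sketch under stated assumptions: the file names and the send_prompt placeholder are hypothetical, and real custom sessions may work quite differently. The idea shown is simply prepending relevant TiddlyWiki documentation to the question as grounding context.

    # Minimal sketch of grounding a session in local source material.
    # File paths and send_prompt() are hypothetical placeholders; the "custom
    # corpus" simply becomes extra context included ahead of the question.
    from pathlib import Path

    def load_context(paths, max_chars=4000):
        chunks = []
        for p in paths:
            text = Path(p).read_text(encoding="utf-8")
            chunks.append(f"--- {p} ---\n{text[:max_chars]}")  # truncate long files
        return "\n\n".join(chunks)

    context = load_context([
        "docs/WikiText.tid",     # hypothetical local copies of TiddlyWiki docs
        "docs/ListWidget.tid",
    ])

    question = "Show wikitext that lists all tiddlers tagged 'Project'."
    grounded_prompt = (
        "Answer using only the reference material below. "
        "If the material does not cover the question, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    # send_prompt(grounded_prompt)

Grounding of this kind narrows what the model draws on, but the output still needs the same sceptical validation as any other LLM answer.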

So do you have suggestions on how a prompt engineer would have updated your prompt so as to get a more correct result… without any foreknowledge of the deprecation in its second suggestion or the incorrect code in the third one?

The result was reasonable as it stood; have a look at the [Edit] comments by @Mario, who did the due diligence.

  • My question set the scope by saying "website", which meant HTML, JavaScript, and CSS were relevant.
  • I did not expect perfection, and I did not pretend it was so.

Here we see that I, ChatGPT, and @Mario have collaborated to generate usable info and code. Now we need to create some action triggers and make them into modules, which is currently just beyond my reach.

I think we’re going to have to agree to disagree here.

From my point of view, it required outside expertise to correct 2/5ths of the suggestions, and there is no obvious way to have found that using prompt engineering or other current techniques. Therefore the original chat response was wrong far too often for me to want to take it seriously. Even 1 in 10 would be awfully high.

But clearly you see it differently. I don’t think there’s much chance of changing one another’s minds.

But if I get a vote, I’m in favor of Jeremy’s suggestion in the OP.

3 Likes

Yes, we will agree to disagree. It is not a matter of producing perfect output from the LLM, but appropriate output. Remember, these calls in plain JavaScript need to be read in the TiddlyWiki context, and the post was made to facilitate collaboration, which has occurred.

  • In fact, the way I presented it, it may very well be usable in another context, independent of TiddlyWiki.
  • As soon as one sets hard, fast and somewhat inflexible rules, the community has started a slow death, in my opinion.
  • Remember all the caveats: I prefaced it with a discussion, explained its source, quoted it in full and invited comment, which I got, and @Mario tested it.
    • So if I did something wrong there, then what is being suggested amounts to inflexible rules or harsh criticism.

It's interesting how I take a short break from substantial contributions, then return and quickly find grief. This is not the way to encourage and maintain participation.

Please, TW_Tones, I think you take this far too personally. You are appreciated, very active, and helping a lot of us. Thank you!
Read the original post once more. It is about not copying AI-generated content, not about you using it and commenting.
I often have problems due to a lack of language skills, and, to be honest, also from not knowing enough about TiddlyWiki. I often find that I come back to a thread at a later date and understand it better.
What I really did understand is the wish to avoid this forum becoming yet another echo chamber for other AIs to be trained on. As a user, that is also in your interest.
Yes, your post was mentioned as an example: the post, not you! If you had not done it, somebody else would have.
The discussion has been interesting to read!

1 Like

Thanks for your comment @Birthe

That is correct, and my "responsible post" was given as an example, implying it was breaking the proposed rule. Subsequent comments supported that position, and I think it is WRONG.

  • This is not about it being personal; it is about it being difficult to put a reasonable argument.
  • It is me saying that I think the proposed moderation is inappropriate, especially if my post falls under the rule; it was given as an example.

Everything is easy if I agree, but if I want to have a second opinion respected, then it seems like grief.

I don’t understand why this is being discussed at all.

Just like two years ago, if someone posts…

  1. outright crap, they are banned
  2. bad quality (illegible, often stupid suggestions, etc), they are ignored
  3. good stuff, people read it

…and the same applies to any stuff you copy, AI generated or not!

Obviously, if someone just copy-pastes any gob of output without curating it, then that is just a bad member and they will get treated as such.

How is this not obvious?