AI Generated Content on talk.tiddlywiki.org

jeremyruston · July 2, 2024, 9:18am

I would like to propose an amendment to our code of conduct:

Please do not post the raw output of AI generated content from tools like ChatGPT or Claude.

The fundamental reason that we should try to contain the slop that AIs generate is that they are not reliably truthful or accurate. There are several other arguments against publishing AI slop:

It wastes the time of readers

Pollution of searches with inaccurate results

Pollution of future AIs trained on the content of this forum with inaccurate information

Everyone here has the ability to access AIs themselves, and so nothing is gained by reproducing the verbatim output of a conversation.

For clarity, this is about publishing the raw output of AIs. There are situations where AI-derived content is acceptable .

For example, if one asks an AI for 1,000 suggestions for the name of a new thing, and then manually chooses the 20 best suggestions, then it seems perfectly reasonable to publish that list of 20 suggestions. In other words it is acceptable to publish information synthesised from the output of an AI

It is also acceptable to post links to raw conversations to facilitate discussion

From now on, moderators will hide posts that violate this policy, and work with the authors to refine their post so that the raw AI output does not need to be included.

This is not about discouraging people from using AIs. Personally, I find them very useful in some situations. The concerns here are about inadvertently spreading inaccurate information and the problems that that gives us in the future.

As somebody else said, why should I read something that you cannot even be bothered to write for yourself?

Scott_Sauyet · July 2, 2024, 12:52pm

Agreed.

I wonder if we should have one more bullet to point out that discussions about AI-generated content are within bounds. (“Nonsense like the following from ChapGPT are going to make people think…”, “I know it’s just a dumb stochastic model, but this sentence from Gemini seems positively inspired!”)

Perhaps something like this?:

For clarity, this is about publishing the raw output of AIs. There are situations where AI-derived content is acceptable .

For example, if one asks…

It is also acceptable to post…

Moreover, it is acceptable to post discussions about AI-generated content, but that content should be clearly marked as such.

Springer · July 2, 2024, 2:19pm

Yes. I did encounter one post a few weeks ago that struck me as probably including some AI-generated code (with some inappropriate bits and some bizarre “just wtf?” bits) — all while passing it off as original work. I did flag it (for the inappropriateness), though I’m not sure anything happened as a result. I also thought about posting something on the AI-code-sharing theme. I’m glad you’ve taken the initiative.

UPDATE: I can confirm that @pmario did follow up to trim lots of the nonsense (which struck me as conceivably shaped by AI) from the post that I had flagged. Thanks!

TW_Tones · July 10, 2024, 5:15am

I agree with the intent of this thread but don’t totally agree, in so far as I have already posted tested and working code that is very meaningfull and not bloated, for two reasons;

A good and working solution
A working solution asking for review.

Arguably it was not verbatim, as I worked it first. But the final code was verbatim.

Because I used ChatGPT to generate solutions for tiddlywiki, aware of and tested in tiddlywiki, I doubt there will be harm done if it is re-ingested, in fact may be good.

I would suggest that the output of such an LLM

Always indicated as sourced from there.
Not make up the content of a whole reply

All your suggestions can otherwise apply in general.

An important note: The output of an LLM is primarily driven by the questions asked and the quality can vary as much as any other contributions. We do not want low quality posts without appropriate information and sources with LLM’s or otherwise.

jeremyruston · July 10, 2024, 8:56am

Hi @TW_Tones the focus of this policy is about publishing the raw, unreviewed output of LLMs. So it is indeed perfectly OK to publish an LLM-generated solution that you have tested and verified.

Your recent post about unload event handlers makes a good concrete example to discuss.

The majority of that post is the output of ChatGPT summarising very basic principles of web development, without any insight or applicability to TiddlyWiki. It would not be useful to a TiddlyWiki end user nor to an experienced developer. I would argue that including the ChatGPT output here adds no value to the post. Anyone interested can consult an LLM for themselves.

Worse is that you are burning through the time and energy of readers. A relevant quote that I like: Why should I read it if you can’t be bothered to write it?

Scott_Sauyet · July 10, 2024, 9:47am

I’m going to steal this for discussions at work!

Mario · July 10, 2024, 10:51am

I think that’s not entirely valid. The post has a clear request for “JavaScript gurus” to implement some plugins.

TW_Tones

Ideally one of our TiddlyWiki JavaScript gurus, could build a set of plugins that allow a designer to set up a set of triggers. The idea would be to typically trigger events such as a logout, save, check in etc… that requires no user intervention, although I think there is value remaining agnostic, to permit innovative solutions from the community.

Then there was the AI created overview of possible events, with some code.

I think the overview was not bad at all. For me it was OK to read.

There is a clear structure.
Every possible event had max 2 sentences intro text + code.
There even is a Considerations section, which points out possible pitfalls - I did like that one.

I think it could be used as a starting point for a plugin author to see, what’s expected in TW_Tones request. – I did know that browsers have “tab visibility detection”, but I would not have thought about it from Tony’s request alone and for sure I would have not known the exact name of the event.

Hmmm, I see it different. As an experienced developer I would have needed to look up the stuff myself, which definitely takes more time, than reading what’s already there. – I do like LLMs for brainstorming.

As a TiddlyWiki “end user” which is not interested in JavaScript that’s true. But if there is a bit of interest, it gives a good overview about "What is possible " – The post itself is in the “Developers section”

As I wrote – IMO It makes it clear “what’s expected” from “to be done” plugins.
I saw some value.

That said: As an admin, I would have hidden the post, if there would have been more issues with the AI generated code. But only one “jumped out” immediately – So I decided to fix it and added some more MDN links to the OP.

Just my thoughts

saqimtiaz · July 10, 2024, 11:16am

I had a rather contrary experience as a developer and found it extremely tedious to read and overall not the best use of my time, and as a result stopped reading the thread even though it might otherwise have been of interest as I have worked with some of the relevant techniques. What I would have preferred instead was a summary along the lines of: “Some of the techniques that ChatGPT suggests might be useful are x, y and z” with a link to the ChatGPT conversation for further details. Just a link to the ChatGPT conversation without a summary would have been preferable as well, so I could focus on the remainder of the post.

This just goes to illustrate that we should be trying to look at this from a community perspective rather than a personal one, as personal preferences will always vary.

I think there is considerable merit to the proposed code of conduct for AI generated content from a community perspective for the reasons already discussed above. It errs on the side of caution in terms of the risk of AI generated content being posted on this forum, but that costs us nothing. After all, examples such as the discussion in the linked thread can still be achieved by the author providing a link to the ChatGPT conversation, rather than reproducing the text of that conversation on this forum.

I believe that the proposed code of conduct does not hinder discussion in any meaningful manner, it just requires some discipline from community members which should be a reasonable expectation in any constructive discourse.

TW_Tones · July 10, 2024, 11:24am

Well we may need to agree to disagree, I presented a list of java script functionality I would like to have plugins to turn into triggers.

I wrote substantial text, declared the source after scanning through the code
It provides Sufficient information to proceed, so I don’t think that should result in a sanction about LLM’s.
It is in the developers forum.
The whole LLM output is quoted.

I belive you can only say that if you already decided to scan it and rate my contribution as not worth while. Something you do quite a bit to me

I believe only a little imagination is needed to see how the examples given can be applied and used in tiddlywiki, providing primitives to allow innovation.

Do you really have a problem with that?

Scott_Sauyet · July 10, 2024, 12:03pm

The first response I received:

jeremyruston · July 10, 2024, 12:21pm

Hi @TW_Tones what value do you think was added by including the verbatim extracts from ChatGPT, as opposed to linking to them? That’s the key question here.

I am suggesting that since anyone can get the same information for themselves there is no value from including the verbatim extracts. As I said, I think that posting these verbatim extracts is damaging because it obliges people to read them.

Scott_Sauyet · July 10, 2024, 12:55pm

You stuck with it longer than I did. I read the initial words, “I asked ChatGPT the following question…” scanned the post to see that 80% of the content was verbatim ChatGPT output, and skipped the rest of it. Looking at it now, most of it looks accurate, at least after edits from @Mario.

Knowing how much of what ChatGPT produces is bullshit^*, I don’t find reading the output to be worth my time unless the results have been validated by someone in the know… which mostly defeats the purpose.

@TW_Tones, you did the right thing by clearly marking the LLM results as such. But do be aware that including such output will stop some people reading your post at all.

^*I mean this in the technical sense found Hicks, Humphries, and Slater’s paper ChatGPT is Bullshit.

Springer · July 10, 2024, 2:25pm

Which paper, in turn, appeals to Frankfurt’s philosophical account of bullshit.

And indeed, the reason I’m suspicious of LLM output is not so much because it’s “dumb machine circuits” rather than “smart organic humans” but because of how LLMs have been trained up. They’ve been trained to throw gobs of stuff at the wall repeatedly until something sticks.

When our children do that, we send a clear visceral signal that there are lines that must not be crossed. Throw spaghetti at the wall one more time, and you’ll be regretting it for a while.

When we are in a learning role, we need a sense of finitude, humility, social deference to those who are likely to know what we don’t, and shame (being mortified, taken aback, and rendered at least briefly dumb with soul-searching when we really mess up, hit the third rail, encounter the silent treatment, etc.).

Alas, those who have been training up our LLMs chose an initial recipe that is 100% shameless exuberance and 0% of the critical capacity (awareness of one’s own ignorance) that is equally vital to intelligence. LLM disclaimer boilerplate (about their limits) and “taboo patches” (censorship around certain words or patterns) are tacked on at the edges, rather than built into the core of its handling of meaningful stuff.

As Socrates realized (trying to make sense of the oracle that proclaimed no Athenian was wiser than he): The first step in wisdom is recognizing one’s own ignorance.

Scott_Sauyet · July 10, 2024, 2:31pm

His book On Bullshit just arrived yesterday… some light vacation reading!

Moreover, what sticks has nothing to do with underlying models of the domain in question, but simply, “What is the word most likely to come next in my partial output?” That’s great for creating syntactically reasonable output, but has little to do with the underlying semantics.

Springer · July 10, 2024, 2:50pm

Surely there’s a continuum here, and no clear line.

I suspect much of human speech (by humans) is poor in semantic depth, and mostly a matter of having a “knack” for “what to say when”.

Some humans, including some with a great deal of power, skim over the surface of language without engaging carefully with meanings. They engage in bullshit just as much as LLMs. (I’m just going to leave that observation there, to avoid derailing the conversation…)

And even while I like to decry such behavior, we are all more or less sphexish in how we navigate the world and language (without implying anything about determinism here — just being fallible and inept around the edges).

So the difference between semantic depth and mere “figuring out what sticks” is a matter of degree, and can’t be a litmus-test difference between humans and LLMs.

Scott_Sauyet · July 10, 2024, 3:16pm

Yes of course. For the past year, I’ve been playing to role of the cantankerous sceptic about LLMs at my day job, a counterbalance to the extreme enthusiasm most everyone seems to express. (Although the last two months have seen a drastic turn-around.)

Some of that carries over here.

I loved seeing “sphexish” – I’m a long-term Hofstadter fan. And I mostly agree with his suggestion that there is a huge continuum between form and content. I love the example (either from one of his Scientific American columns or from his commentary on one of them in Metamagical Themas) that goes something like this:

Given Sentence M: “Mary was sick yesterday,” we can describe varying levels of understanding:

Sentence M contains twenty characters.
Sentence M contains four English words.
Sentence M contains one proper noun, one verb, one one adverb, in that order.
Sentence M contains one human’s name, one linking verb, one adjective describing a potential health state of a living being, and one temporal adverb, in that order.
The subject of Sentence M is a pointer to an individual named ‘Mary’, the predicate is an ascription of ill health to the individual so indicated, on the day preceding the statement’s utterance.
Sentence M asserts that the health of an individual named ‘Mary’ was not good the day before today.
Sentence M says that Mary was sick yesterday.

I don’t think LLMs are yet capable of level 4; they’re not even close to level 5.

But yes, I certainly overreached. Blame it on overexposure to LLM fan-boys steering GigantiCorp’s technical direction! And I’m still unconvinced that there’s much there there. I do think The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con explains a great deal of the “success” of LLMs. I am getting older; perhaps it’s just a matter of being crotchety too.

TW_Tones · July 10, 2024, 11:53pm

All of the suggested features can be achieved with the functions I published. In a similar way to you, I would not want to oblige people to search for them just to get the point I am trying to make.

On one hand, A reasonable percentage of readers may just scan the code samples, even ignore the quoted section.
On the other hand someone who understands the code can;
- Critically review the proposed functions
- Perhaps go directly to coding a trigger or action
- Tell me how to code such triggers or actions.

TW_Tones · July 11, 2024, 12:18am

As a sceptic and critical thinker I have questioned the claims about LLMs and call them exactly that, LLMs. they are not A.I.

I have spent substantial time researching how to make use of them, and how they work. Some of this has being made visible within talk.tiddlywiki

If you totally ignore the positive hype, the doomsdayers and the overly critical, you find they are a powerful tool, but like any tool, a trusted friend, or any human expert, you need to be vigilant, sceptical, critical.

Because of this never use it to publish something you yourself do not know without validation. This is the strong argument supporting discouraging raw LLM answers here.

You also discover that asking the right questions, the new art/science of “prompt engineering” is the main determinant of the effectiveness of LLM’s. It would be unwise to emphasise too heavily that LLM’s produce bullshit, because it may imply your questions were bullshit. GIGO

To put it another way, just as search engines help us gain access to a large “corpus” of knowledge, or querying a “Data Warehouse”, LLM’s provide us another avenue to query a larger less curated corpus of knowledge. But, by their nature they require even more sceptical, critical and validation especially when outside your own knowledge domain.

Despite this they remain a powerful tool, or subjective framework, to seek and manage knowledge.

Scott_Sauyet · July 11, 2024, 2:06am

The notion of bullshit as I meant it refers to that article, which paraphrases Frankfurt to say

“The models are in an important way indifferent to the truth of their outputs.”

And I’m afraid that I believe – as a very outside observer – that prompt engineering is similar to both the pyschic’s con and to the nonsense about knowledge from previous livess that Plato promotes in the Socratic dialogue, Meno. With enough triangulation from questions, you can arrive at an answer that suits you, but that has no relationship whatsoever with what’s actually true or what’s actually known.

But the subject at hand is community standards. I don’t think posting small chat outputs here is a big deal, so long as they’re properly labelled and not passed off as original work. Unless the discussion is about the LLMs, though, I would find it much more helpful if the poster synthesized the information in their own words and did enough research to verify that it’s not hallucinations/bullshit.

Such posters should also recognize that in using this material, they may be losing some not insignificant portion of their potential audience. I read most everything you post here, @TW_Tones, but I skipped that one, entirely on the basis of your sourcing from ChatGPT.

Springer · July 11, 2024, 2:49am

It’s fair to say, as you do here, that there’s no source of information we should approach in an absolutely uncritical way.

In at least one way, though, LLMs are very much unlike trusted friends and human experts. They are impervious to our interest in accountability, and their developers have shown no interest in building genuine accountability into their interactions.

Pattern-recognition (what LLMs do) is a matter of what the logician and semiotician C.S. Peirce calls “firstness,” (registering similarities, resemblances, which are always a matter of degree), while an orientation toward distinguishing the true from false, appropriate from inappropriate (etc.) involves engaging with signs in their third (most normative) dimension.

My friends, and human experts, do participate in this field of thirdness (being accountable for registering what’s ok vs what’s not). For this reason, there’s such a thing as developing some appropriate degree of trust in them. If I mislead you by relying on them (in precisely the areas where they’ve committed to being reliable), I can apologize to you, but I can also turn around and hold them to account.

Meanwhile, there is no appropriate degree of trust (normative trust) in an LLM — though of course there’s such a thing as making good probability bets. How I interact with LLMs (or share their output, etc.) involves a distinct responsibility that can only lie with me and with other socially-connected beings who show me that they give a damn about not screwing up.

(I don’t imagine that we’re disagreeing on anything of substance here, @TW_Tones — I’m just hoping to articulate one qualitative difference that is easily overlooked in the "just as with other sources of information… " line of argument.)