About TiddlyWiki Content Licenses, to Be Used by AI LLMs

I’m learning LangChain now for career reasons. One of the first projects I want to build at home is something to make LLMs much more accurate with TiddlyWiki questions. Currently thinking a RAG based on TW-com and GrokTW would be a great start. (I won’t make it public without getting permission from the authors of those sites.) I think our community would be helped by having good AI support.

About tiddlywiki.com, tiddlywiki.com/dev

The TW Contributor License Agreements (CLAs) also inform users about the license our content is published under: 2.3 Outbound License states that we use CC-BY 3.0 for plain text content.

So you do not need to ask for permission to use the content. IMO it is licensed properly for you to use it.

On the other hand, we would very much welcome it if you also made your work available to the broader community.


Talk-TiddlyWiki terms of service can be found at: Terms of Service - Talk TW


About Grok TiddlyWiki: at the time of writing (7.Dec.2025) it uses CC BY-NC-ND 4.0, as found in the project’s Copyright tiddler.

Since it is ND … which stands for “No derivatives”, IMO you’ll need to contact the author, since LLMs will most likely create derivatives of the content, and many examples probably do not make sense if they are used “out of context”. … @sobjornstad (any comments?)

Have fun!
Mario

PS - Disclaimer: I am not a lawyer, so take everything I post here with a grain of salt.

I’m not a lawyer either, and I don’t speak in any way for the Creative Commons team or any other users of the CC licenses, but I don’t personally consider simply training or fine-tuning an LLM on a resource or giving it access to that resource as a reference to constitute creating a derivative of the source. That seems little different to me, philosophically speaking, to a human reading the text and then doing something with the knowledge they gain, or to indexing the text with a search engine.

If the RAG in your project is designed so the agent can give explicit examples or quotations from the manual or GTW text to the user in response to questions, then it gets murkier. (Though I’m not aware of any case law here yet. Arguably it might still not be too dissimilar from a search engine.)

In any event, as long as you cite the source in a manner similar to that expected by the CC-BY-NC-ND license (either next to the answers or in some reasonably visible, single list of sources for the tool, or whatever seems reasonable), you’re absolutely welcome to include Grok TiddlyWiki in a public version.

Right. IF an LLM was only used locally.

But that analogy only works IF Joe Bloggs only ever commands an LLM locally, as a singleton.

The rip-off factor of roving LLMs reporting home to millions is high.

Clearer licensing AND enforcement is needed.

A semi-political comment
TT

This is awesome! Thanks for doing my homework for me :slightly_smiling_face:

I know enough about RAGs in AI now to be able to summarize them like this: I’d be finding tiddlers relevant to the user’s question, and telling the LLM to reference them when generating its answer. So, in a sense, I’d be writing a system that generated derivative works on the fly.

I would not think of making such a resource public without prominently giving credit to those like yourself who shared their knowledge in written form, and linking to the original sources. Not sure if I can do that on a per-answer basis (though I think I can, and I will try). But I will certainly give prominent credit on the homepage.
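To make the idea concrete for myself, here is a minimal sketch of the retrieve-then-cite flow described above, in plain Python. Everything in it is an assumption for illustration: the `Tiddler` fields, the `retrieve` and `build_prompt` helpers, the placeholder tiddler texts and the example.com URLs are all made up, and the keyword-overlap scoring is only a stand-in for whatever embedding search a real RAG would use. The license strings are the ones mentioned earlier in this thread.

```python
import re
from dataclasses import dataclass


@dataclass
class Tiddler:
    """Hypothetical record for one indexed tiddler."""
    title: str
    text: str
    source: str   # site the tiddler came from
    license: str  # license named in this thread for that site
    url: str      # placeholder link back to the original


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, tiddlers: list[Tiddler], k: int = 2) -> list[Tiddler]:
    """Return up to k tiddlers sharing the most words with the question.

    Keyword overlap is a stand-in here; tiddlers with no overlap are dropped.
    """
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(t.title + " " + t.text)), t) for t in tiddlers]
    scored = [(score, t) for score, t in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]


def build_prompt(question: str, hits: list[Tiddler]) -> tuple[str, list[str]]:
    """Build the LLM prompt plus one credit line per retrieved tiddler."""
    context = "\n\n".join(f"[{i}] {t.title}\n{t.text}" for i, t in enumerate(hits, 1))
    credits = [
        f'[{i}] "{t.title}", {t.source} ({t.license}), {t.url}'
        for i, t in enumerate(hits, 1)
    ]
    prompt = (
        "Answer using only the numbered sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return prompt, credits


# Placeholder data only: these are not real tiddler texts or real URLs.
SAMPLE_TIDDLERS = [
    Tiddler("Filter basics", "placeholder text about filter operators and steps",
            "tiddlywiki.com", "CC-BY 3.0", "https://example.com/tw/filter-basics"),
    Tiddler("Widgets overview", "placeholder text about widgets and rendering",
            "Grok TiddlyWiki", "CC BY-NC-ND 4.0", "https://example.com/gtw/widgets"),
    Tiddler("Saving", "placeholder text about saving a wiki file",
            "tiddlywiki.com", "CC-BY 3.0", "https://example.com/tw/saving"),
]

if __name__ == "__main__":
    question = "How do filter operators work?"
    prompt, credits = build_prompt(question, retrieve(question, SAMPLE_TIDDLERS))
    print(prompt)
    print("\nSources:")
    print("\n".join(credits))
```

The point of returning the credit lines separately from the prompt is that the tool can print them next to every answer regardless of what the LLM does with the sources, which is the per-answer attribution I am hoping for.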

Thank you for your kind response. I may even reach out to you directly when I have something worth sharing, to be sure you’re happy with the results.

Just a side note: I have a custom GPT for TiddlyWiki, and it is more useful than simple open chats. However, when you come to use ChatGPT for TiddlyWiki problems, consider this discussion: With an LLM it is all about how good your Question is! I have also found that if I want to rewrite or create a new filter operator, for example, selecting the specific code within TiddlyWiki that is similar to what I want to start with is much more productive and effective.

You can even ask the LLM to ingest all JavaScript at tiddlywiki.com and answer questions based on this.

Come to think of it – again IANAL, but I don’t think that quoting something normally constitutes “creating a derivative work” of it, either logically or legally. If you wrote a post on this forum that quoted a relevant portion of Grok TiddlyWiki, that wouldn’t be creating a derivative work of Grok TiddlyWiki, it would be writing a forum post that quoted a book, which is a normal thing that people do without comment every day. Or if someone wrote a blog post about how to use TiddlyWiki for something and included a couple of judicious, properly cited quotes from GTW, it would be absurd for me to get mad at them – they followed all the relevant norms, made their post more useful, and gave me free publicity, what’s not to like?

Legally, quotes of a reasonable length that don’t replace the original source are, for the most part, fair use (though it’s a little more complicated than that). You don’t need a license in the first place for fair use, so any restrictions placed by the CC license are irrelevant.

(Using the information to answer a question or solve a problem without quoting it directly seems even more clear-cut. If I read Guns, Germs, and Steel and then write a paper analyzing some claim it makes, that is not a derivative work of the book in any meaningful sense, and certainly not in a legal sense.)

In contrast, if your tool translated the entire book on the fly, or rewrote all of the exercises to use a different story/sample wiki, then that generated content would be a derivative work by the normal definitions. That is a much more substantial use.

I don’t see any reason your tool’s output ought to be different, copyright-wise, from what happens when a human creates the same content. Unless the theory is that the software itself is a derivative work of Grok TiddlyWiki when it includes the ability to search Grok TiddlyWiki, which seems fairly far-fetched to me. (Legally there can be a difference between retrieving the content live on the web and bundling it into a distribution with the software, since you are then distributing copies of the content in the latter case. If I were designing a tool that did this, I would definitely do the former if it was practical.)

It’s fair to point out, as others have above, that LLMs raise some questions because of how readily they can create text, in particular because in more cases than previous tools like Google, they might end up replacing uses of the original source. But I don’t think we should try to reinterpret existing copyright laws to fit our moral intuitions here (and not everyone has the same moral intuitions here given that this is a whole new kind of thing!). Existing copyright law has never really been updated for digital content created by humans in the first place, and it’s a pretty bad fit in many cases. If we see this creating a big problem for society, we should design a different kind of IP right (training rights? LLM quoting rights?) that applies here.

On a more personal note, the goal of writing GTW was to create a learning resource people could work through, do the exercises, and so on, to learn how everything (or specific parts of TiddlyWiki they care about) works fully. That people can sometimes search it to find answers to specific questions is a free side effect, and for all I care someone could sell that service and give me nothing – good on them for doing something useful with it without hurting me! The only time I’d feel ripped off would be if someone, e.g., picked up the entire book, made minor changes to it, removed my donation link, and sold it to people; now they’re just rent-seeking. That’s what the license is supposed to prevent.

It is important to note whether the content is based on an open source system. It can be quite different if you are the unique publisher of something. Where traditionally you may have made money from advertising on your site, LLMs can answer questions relating to your expertise, the user of that info never visits your website, and you lose your revenue.

  • In fact, the facts about your special area would not exist but for your efforts and costs.

Exactly. LLMs are likely, overall, potentially, corroding human agency.

Of course it is complex with many levels.

There are good & bad LLM functions AND “hallucinations” (the first widespread neologism for something generative AI can be prone to).

IMO we are seriously lacking the conceptual clarity needed to address the issues adequately.

Just an opinion from TT,
Anthropologist of Modernity

Do you see it as a “big problem”?

If so, what might you suggest that is practical?

A comment
TT

Are you saying that “hallucination” itself is a neologism? If so, etymologists seem to disagree.

Also, I am of the school that thinks “hallucinating” is a misnomer here anyway, as it involves too much of the idea of actual perception. I agree with those who prefer “bullshitting”.

No. It is an ancient word.

Merely that it was re-run as an accurate descriptor of the evident behaviour/output of some major AI systems.

The old word “hallucination” had widened its semantics in a cogent, useful way.
People need to know AI can sometimes hallucinate.

TT

Whatever.

The point is cogency in grasping WTF “AI” is doing.

TT