Collecting TiddlyWiki PDF documentation to create a "privateGPT" for TW

Using https://ollama.ai

What documents would you suggest to build a privateGPT that could help with TW programming?

Supported extensions are:

.csv: CSV,
.docx: Word Document,
.doc: Word Document,
.enex: EverNote,
.eml: Email,
.epub: EPub,
.html: HTML File,
.md: Markdown,
.msg: Outlook Message,
.odt: Open Document Text,
.pdf: Portable Document Format (PDF),
.pptx: PowerPoint Document,
.ppt: PowerPoint Document,
.txt: Text file (UTF-8)

Thanks @papiche. I just installed ollama on my computer. It would be really interesting if you could share how you train your own models.

Hi @Zheng_Bangyou
Running an LLM locally is possible :wink:

I am starting to learn how to train mine…
I learned the method from that example

For now, it doesn’t maintain memory after a restart
The idea is to create a “TW programming professor”…

Let’s share our experiments
And get PDFs to feed the Llama :wink:

Material collection is a work in progress in the thread “Can I get talk forum and gg dataset to train AI?”, but there has been no progress.

Oh, nice… But getting “clean documents” is hard to achieve.

Maybe instead of parsing forums that contain TWC & TW5 related information,
we could start with an export of https://tiddlywiki.com/

My program needs a PDF.
Let’s make it in a quick & dirty manner…

It produces this 861-page document

ingesting
Trying it…

The answers it produces are not very good.

  • Maybe because the PDF links still point to the original HTML?
  • Maybe we should use “codellama” instead of “mistral”?
  • Maybe we need a better curated source…

@linonetwo what is your environment?
Have you had success with a specific dataset?

I think you can export JSON and turn it into txt instead (without the HTML tags of an .html file, it is much smaller)

  1. Use [all[]!is[binary]field:type[text/vnd.tiddlywiki]] to export the tiddlers that might be meaningful to GPT
  2. Export the JSON to C:\Users\linonetwo\Downloads\tiddlers.json
  3. Use the following script to turn it into txt
  4. Use the txt directly or print it to PDF
const fs = require('fs'); // Node.js only; in a browser, read the file with the FileReader API instead

// Read the exported tiddlers.json (absolute path from step 2)
const jsonData = fs.readFileSync('C:/Users/linonetwo/Downloads/tiddlers.json', 'utf8');

// Parse the JSON data (an array of tiddler objects)
const tiddlers = JSON.parse(jsonData);

// Concatenate each title with its text; tiddlers without a text field get an empty body
let concatenatedText = '';
tiddlers.forEach(tiddler => {
    concatenatedText += `${tiddler.title}:\n${tiddler.text || ''}\n\n`;
});

// Write the result next to the input file
fs.writeFileSync('C:/Users/linonetwo/Downloads/tiddlers.txt', concatenatedText, 'utf8');

The script was written by GPT3.5, and the result is pretty small (1.9M)
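
A possible refinement, only as a sketch: the type selection could also be done on the Node side after a plain export. This assumes the exported JSON is an array of objects with `title`, `text` and an optional `type` field, and that a tiddler without a `type` counts as wikitext (TiddlyWiki's default). The names `isWikitext` and `tiddlersToText` are made up for illustration.

```javascript
// Hypothetical helpers; the names are illustrative, not part of TiddlyWiki.
const WIKITEXT = 'text/vnd.tiddlywiki';

// Assumption: a missing or empty type means wikitext (TiddlyWiki's default type).
function isWikitext(tiddler) {
  return !tiddler.type || tiddler.type === WIKITEXT;
}

// Join "title:\ntext" blocks, skipping non-wikitext tiddlers and tiddlers with no text.
function tiddlersToText(tiddlers) {
  return tiddlers
    .filter(isWikitext)
    .filter(t => typeof t.text === 'string' && t.text.trim() !== '')
    .map(t => `${t.title}:\n${t.text}`)
    .join('\n\n');
}

// Tiny in-memory sample instead of reading tiddlers.json from disk.
const sample = [
  { title: 'HelloThere', text: 'Welcome', type: 'text/vnd.tiddlywiki' },
  { title: 'NoType', text: 'Plain wikitext' },
  { title: 'Logo', text: '<svg/>', type: 'image/svg+xml' },
  { title: 'Empty' }
];
const sampleText = tiddlersToText(sample);
console.log(sampleText);
```

With the real export, `sample` would be replaced by the parsed content of tiddlers.json.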

Great!
Did you make that PDF?

Because when I try to apply that filter in a CLI command

tiddlywiki \
--load ~/.zen/tmp/tiddlywiki.html \
--output ~/.zen/tmp \
--render '.' 'tiddlers.json' '$:/core/templates/exporters/JsonFile' 'exportFilter' '[all[]!is[binary]field:type[text/vnd.tiddlywiki]]'

output is empty…

I saw this and created a guide with ChatGPT, “ChatGPT - TiddlyWiki Guide” (requires ChatGPT Plus), but I have not tested it much, and I am sure more material can be added.

  • I think going via PDF may be unnecessary.

An example question and answer from my guide:

How do I code a filter operator in javascript?

Creating a custom filter operator in TiddlyWiki using JavaScript involves several steps, but it’s a great way to extend TiddlyWiki’s functionality to meet your specific needs. Here’s a basic guide to get you started:

1. Understand Filter Operator Structure

A filter operator in TiddlyWiki processes an input list of tiddler titles and outputs a modified list according to specific criteria. It typically has this structure:

  • Name: The operator’s identifier in filter expressions.
  • Function: The JavaScript function that implements the operator’s logic.
  • Parameters: Arguments that can modify the operator’s behavior.

2. Create Your JavaScript Module

You’ll need to create a JavaScript module that TiddlyWiki can recognize and load. This involves writing your filter operator’s code and saving it in a .js file within your TiddlyWiki plugins folder, or embedding it directly in a tiddler set to the correct module type.

Here’s an example of a simple filter operator that filters tiddlers based on whether their titles are of even length:

/*\
title: $:/plugins/yourname/yourplugin/filters/evenlength.js
type: application/javascript
module-type: filteroperator

Filter operator to select tiddlers with titles of even length.
\*/

exports.evenlength = function(source, operator, options) {
    var results = [];
    source(function(tiddler, title) {
        if(title.length % 2 === 0) {
            results.push(title);
        }
    });
    return results;
};

3. Module Type and Path

  • Set the module type to filteroperator in your tiddler’s fields or your .js file’s metadata.
  • Choose a meaningful title/path. For plugins, a common convention is $:/plugins/[yourname]/[pluginname]/filters/[operatorname].

4. Using Your Filter Operator

After adding the custom filter operator to your TiddlyWiki, you can use it like any built-in operator. For instance, assuming your operator is named evenlength, you could use it in a filter like this:

[all[tiddlers]evenlength[]]

This filter would return all tiddlers with titles of even length.

5. Testing and Debugging

  • Test your filter operator thoroughly to ensure it behaves as expected.
  • Use the browser’s developer tools to debug if something doesn’t work. Look for errors in the console, and check if your operator is being loaded correctly.
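
The operator logic can also be exercised outside TiddlyWiki. The sketch below assumes that `source` is a function that invokes a callback once per input title with `(tiddler, title)`, which is how the example above uses it. `makeSource` and the sample titles are hypothetical stand-ins, not TiddlyWiki APIs.

```javascript
// Same logic as the evenlength operator above, written as a plain function.
function evenlength(source, operator, options) {
  const results = [];
  source(function (tiddler, title) {
    if (title.length % 2 === 0) {
      results.push(title);
    }
  });
  return results;
}

// Hypothetical stand-in for the source iterator:
// calls the callback once per title; the tiddler object is not needed here.
function makeSource(titles) {
  return function (callback) {
    titles.forEach(title => callback(undefined, title));
  };
}

const evenTitles = evenlength(
  makeSource(['Home', 'About', 'GettingStarted', 'FAQ']),
  {},
  {}
);
console.log(evenTitles);
```

This only checks the selection logic; whether the module is actually loaded by the wiki still has to be verified in the browser console.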

6. Documentation and Sharing

  • Document how to use your filter operator, including examples.
  • If you plan to share your plugin, consider publishing it on the TiddlyWiki community platforms or repositories.

Further Learning

For more complex behaviors, delve into TiddlyWiki’s core code to understand how built-in filter operators work. The TiddlyWiki documentation and source code are excellent resources for learning more about filter operators and TiddlyWiki’s internals.

Remember, creating custom functionality in TiddlyWiki can be incredibly rewarding. Don’t hesitate to experiment and explore the possibilities!

Are you using the correct HTML? I did that in the advanced search on tiddlywiki.com.

But I find that RAG's performance is not good; we may still need model training.

You are using a “proprietary AI” that I can’t access…
My aim is to create an open “TW5 assistant” model that could perform as well as “chatGPT4”
or better.

@linonetwo sorry, I forgot the 'text/plain' render type argument

tiddlywiki --load ~/.zen/tmp/tiddlywiki.html --output ~/.zen/tmp \
--render '.' 'tiddlers.json' 'text/plain' '$:/core/templates/exporters/JsonFile' 'exportFilter' '[all[]!is[binary]field:type[text/vnd.tiddlywiki]]'
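
To make the argument order explicit, here is a small Node sketch that builds the same `--render` arguments from named parts. It assumes the positional order tiddler filter, filename filter, render type, template, then variable name/value pairs, as in the corrected command above. `buildRenderArgs` is a made-up helper name.

```javascript
// Hypothetical helper: builds the argument list for one --render call.
function buildRenderArgs({ tiddlerFilter, filename, renderType, template, variables = {} }) {
  const args = ['--render', tiddlerFilter, filename, renderType, template];
  // Variable name/value pairs follow the template, in insertion order.
  for (const [name, value] of Object.entries(variables)) {
    args.push(name, value);
  }
  return args;
}

const renderArgs = buildRenderArgs({
  tiddlerFilter: '.',
  filename: 'tiddlers.json',
  renderType: 'text/plain', // the argument that was missing in the first attempt
  template: '$:/core/templates/exporters/JsonFile',
  variables: { exportFilter: '[all[]!is[binary]field:type[text/vnd.tiddlywiki]]' }
});
console.log(renderArgs.join(' '));
```

The resulting array could be appended to the `--load` and `--output` arguments and passed to `child_process.spawnSync`, which avoids shell quoting problems with the filter brackets.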

I am a genuine user.
I only know RAG, but I am curious about other methods. What is yours?
Are you using ollama?

Of course. My point, however, is that I did not ingest a PDF but pointed it to some wikis.

I am all for a public open tiddlywiki LLM.

Can you support me on “Can I get talk forum and gg dataset to train AI?”?

I have many GPUs; I only lack training data.

I’m using www.coze.com, which I think uses GPT4. I put all the tw5-com docs into its database, and it still can’t understand filter expressions and widgets well.

Maybe https://floneum.com/ can be used with local models from

This article is interesting:
How to convert any text corpus into a Graph of Concepts (GC)

https://towardsdatascience.com/how-to-convert-any-text-into-a-graph-of-concepts-110844f22a1a