Collecting TiddlyWiki PDF documentation to create a "privateGPT" for TW

papiche · March 5, 2024, 11:54pm

What documents would you suggest for building a privateGPT that could help with TW programming?

Supported extensions are:

.csv: CSV,
.docx: Word Document,
.doc: Word Document,
.enex: EverNote,
.eml: Email,
.epub: EPub,
.html: HTML File,
.md: Markdown,
.msg: Outlook Message,
.odt: Open Document Text,
.pdf: Portable Document Format (PDF),
.pptx: PowerPoint Document,
.ppt: PowerPoint Document,
.txt: Text file (UTF-8),

Zheng_Bangyou · March 6, 2024, 2:50am

Thanks @papiche. I just installed ollama on my computer. It would be really interesting if you could share how you train your own models.

papiche · March 6, 2024, 12:17pm

Hi @Zheng_Bangyou
Running an LLM locally is possible.

I am starting to learn how to train mine…
I learned about the method from that example.

For now, it doesn’t maintain memory after a restart.
The idea is to create a “TW programming professor”…

Let’s share our experiments
and collect PDFs to feed the Llama.

linonetwo · March 9, 2024, 1:08am

Material collection is a work in progress in “Can I get talk forum and gg dataset to train AI?”, but there has been no progress.

papiche · March 9, 2024, 12:21pm

Oh, nice… But getting “clean documents” is hard.

Maybe instead of parsing forums that contain TWC- and TW5-related information,
we could start with an export of https://tiddlywiki.com/

My program needs a PDF.
Let’s make one in a quick and dirty manner…

source
HTML export
PDF print

It produces this 861-page document.

ingesting
Trying it…

It is not producing very good answers.

Maybe because the PDF links still point to the original HTML?
Maybe we should use “codellama” instead of “mistral”?
Maybe we need a better-curated source…

@linonetwo what is your environment?
Have you had success with a specific dataset?

linonetwo · March 10, 2024, 10:34am

I think you can export JSON and turn it into txt instead (without the HTML tags of an .html file, it is much smaller).

I use [all[]!is[binary]field:type[text/vnd.tiddlywiki]] to export tiddlers that might be meaningful to GPT.
Export the JSON to C:\Users\linonetwo\Downloads\tiddlers.json
Use the following script to turn it into txt.
Use the txt directly or print it to PDF.

const fs = require('fs'); // Node.js file system module
// In a browser environment you would need a different approach to read the file, such as the FileReader API.

// Read the exported tiddlers.json file (adjust the path to your own download location)
const jsonData = fs.readFileSync('C:/Users/linonetwo/Downloads/tiddlers.json', 'utf8');

// Parse the JSON data (an array of tiddler objects)
const tiddlers = JSON.parse(jsonData);

// Concatenate each tiddler's title with its text field
let concatenatedText = '';
tiddlers.forEach(tiddler => {
    concatenatedText += `${tiddler.title}:\n${tiddler.text}\n\n`;
});

// Write the result as plain text next to the input file
fs.writeFileSync('C:/Users/linonetwo/Downloads/tiddlers.txt', concatenatedText, 'utf8');

The script was written by GPT-3.5, and the result is pretty small (1.9M).
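One thing to watch with the script above: a tiddler that has no text field would be written out as the literal word "undefined". Below is a minimal sketch of a variant that skips such tiddlers. The function name `tiddlersToText` and the sample data are made up for illustration; the output format (title, colon, text) follows the script above.

```javascript
// Sketch: turn an array of exported tiddlers into plain text,
// skipping tiddlers whose text field is missing or blank.
// tiddlersToText is a hypothetical helper name, not part of TiddlyWiki.
function tiddlersToText(tiddlers) {
  return tiddlers
    .filter(t => typeof t.text === 'string' && t.text.trim() !== '')
    .map(t => `${t.title}:\n${t.text.trim()}\n`)
    .join('\n');
}

// Made-up sample data in the shape of a TiddlyWiki JSON export
const sample = [
  { title: 'HelloThere', text: 'Welcome to TiddlyWiki.' },
  { title: 'Empty' },
  { title: 'Filters', text: 'Filters select tiddlers.\n' },
];

console.log(tiddlersToText(sample));
```

To use it on a real export, the array passed in would be the result of JSON.parse on the contents of tiddlers.json, as in the script above.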

papiche · March 11, 2024, 5:33pm

Great!
Did you make that PDF?

Because when I try to apply that filter in a CLI command:

tiddlywiki \
--load ~/.zen/tmp/tiddlywiki.html \
--output ~/.zen/tmp \
--render '.' 'tiddlers.json' '$:/core/templates/exporters/JsonFile' 'exportFilter' '[all[]!is[binary]field:type[text/vnd.tiddlywiki]]'

the output is empty…

TW_Tones · March 11, 2024, 11:20pm

I saw this and created a guide with ChatGPT, “ChatGPT - TiddlyWiki Guide” (requires ChatGPT Plus), but I have not tested it much, and I am sure more material can be added.

I think going via PDF may be unnecessary.

An example question and answer from my guide:

How do I code a filter operator in javascript?

Creating a custom filter operator in TiddlyWiki using JavaScript involves several steps, but it’s a great way to extend TiddlyWiki’s functionality to meet your specific needs. Here’s a basic guide to get you started:

1. Understand Filter Operator Structure

A filter operator in TiddlyWiki processes an input list of tiddler titles and outputs a modified list according to specific criteria. It typically has this structure:

Name: The operator’s identifier in filter expressions.

Function: The JavaScript function that implements the operator’s logic.

Parameters: Arguments that can modify the operator’s behavior.

2. Create Your JavaScript Module

You’ll need to create a JavaScript module that TiddlyWiki can recognize and load. This involves writing your filter operator’s code and saving it in a .js file within your TiddlyWiki plugins folder, or embedding it directly in a tiddler set to the correct module type.

Here’s an example of a simple filter operator that filters tiddlers based on whether their titles are of even length:

/*\
title: $:/plugins/yourname/yourplugin/filters/evenlength.js
type: application/javascript
module-type: filteroperator

Filter operator to select tiddlers with titles of even length.
\*/

exports.evenlength = function(source, operator, options) {
    var results = [];
    source(function(tiddler, title) {
        if(title.length % 2 === 0) {
            results.push(title);
        }
    });
    return results;
};

3. Module Type and Path

Set the module type to filteroperator in your tiddler’s fields or your .js file’s metadata.

Choose a meaningful title/path. For plugins, a common convention is $:/plugins/[yourname]/[pluginname]/filters/[operatorname].

4. Using Your Filter Operator

After adding the custom filter operator to your TiddlyWiki, you can use it like any built-in operator. For instance, assuming your operator is named evenlength, you could use it in a filter like this:

[all[tiddlers]evenlength[]]

This filter would return all tiddlers with titles of even length.
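To see how the operator consumes its input, the same logic can be exercised outside TiddlyWiki. This is only a sketch: `makeSource` is a hypothetical stand-in for the iterator that TiddlyWiki passes as `source`, and it supplies null where TiddlyWiki would supply a tiddler object.

```javascript
// Same logic as the evenlength operator above, written as a plain function.
function evenlength(source, operator, options) {
  const results = [];
  source((tiddler, title) => {
    if (title.length % 2 === 0) {
      results.push(title);
    }
  });
  return results;
}

// Hypothetical stand-in for TiddlyWiki's source iterator:
// it calls the callback once per title, with null in place of the tiddler object.
function makeSource(titles) {
  return callback => titles.forEach(title => callback(null, title));
}

console.log(evenlength(makeSource(['Home', 'About', 'Filter'])));
// prints [ 'Home', 'Filter' ]
```

"Home" (4 characters) and "Filter" (6) pass; "About" (5) is dropped.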

5. Testing and Debugging

Test your filter operator thoroughly to ensure it behaves as expected.

Use the browser’s developer tools to debug if something doesn’t work. Look for errors in the console, and check if your operator is being loaded correctly.

6. Documentation and Sharing

Document how to use your filter operator, including examples.

If you plan to share your plugin, consider publishing it on the TiddlyWiki community platforms or repositories.

Further Learning

For more complex behaviors, delve into TiddlyWiki’s core code to understand how built-in filter operators work. The TiddlyWiki documentation and source code are excellent resources for learning more about filter operators and TiddlyWiki’s internals.

Remember, creating custom functionality in TiddlyWiki can be incredibly rewarding. Don’t hesitate to experiment and explore the possibilities!

linonetwo · March 12, 2024, 7:01am

Are you using the correct HTML? I did that in tiddlywiki.com’s advanced search.

But I find RAG’s performance is not good; we may still need model training.
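For readers new to RAG: the retrieval step picks the documents most relevant to a question and places them in the prompt, so answer quality depends heavily on what gets retrieved. Here is a toy sketch of that step over exported tiddlers. It ranks by plain word overlap, which is a stand-in chosen for simplicity; real RAG tools typically use embedding similarity. The function names and sample tiddlers are made up.

```javascript
// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Toy retrieval: score each tiddler by how many distinct query words it contains,
// then return the top k titles with a non-zero score.
function retrieve(tiddlers, query, k = 2) {
  const queryWords = new Set(tokenize(query));
  return tiddlers
    .map(t => {
      const words = new Set(tokenize(`${t.title} ${t.text || ''}`));
      let score = 0;
      for (const w of queryWords) {
        if (words.has(w)) score++;
      }
      return { title: t.title, score };
    })
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Made-up sample tiddlers
const docs = [
  { title: 'Filter Operators', text: 'A filter operator selects tiddlers.' },
  { title: 'ListWidget', text: 'The list widget displays tiddlers matching a filter.' },
  { title: 'Saving', text: 'How to save changes.' },
];

console.log(retrieve(docs, 'filter operator'));
```

Even in this toy form, a question phrased with words that do not appear in the relevant tiddler retrieves nothing useful, which is one way retrieval can let the model down.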

papiche · March 12, 2024, 6:52pm

You are using a “proprietary AI” that I can’t access…
My aim is to create an open “TW5 assistant” model that could perform as well as “chatGPT4”
or better.

@linonetwo sorry, I forgot 'text/plain'

tiddlywiki --load ~/.zen/tmp/tiddlywiki.html --output ~/.zen/tmp \
--render '.' 'tiddlers.json' 'text/plain' '$:/core/templates/exporters/JsonFile' 'exportFilter' '[all[]!is[binary]field:type[text/vnd.tiddlywiki]]'
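A quick way to confirm that the export is no longer empty is to parse the file and count what it holds. This is a minimal Node.js sketch: `summarizeExport` is a hypothetical helper, and the file location is assumed from the --output and --render arguments in the command above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: parse a JSON export and report the number of tiddlers
// and the total number of characters in their text fields.
function summarizeExport(jsonText) {
  if (jsonText.trim() === '') {
    return { count: 0, chars: 0 }; // empty output file
  }
  const tiddlers = JSON.parse(jsonText);
  if (!Array.isArray(tiddlers)) {
    throw new Error('Expected a JSON array of tiddlers');
  }
  const chars = tiddlers.reduce((n, t) => n + (t.text ? t.text.length : 0), 0);
  return { count: tiddlers.length, chars };
}

// Assumed location, based on the command above
const file = path.join(os.homedir(), '.zen', 'tmp', 'tiddlers.json');
if (fs.existsSync(file)) {
  console.log(summarizeExport(fs.readFileSync(file, 'utf8')));
} else {
  console.log(`No export found at ${file}`);
}
```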

I am a genuine user.
I only know RAG, but I am curious about other methods. What is yours?
Are you using ollama?

TW_Tones · March 12, 2024, 10:33pm

Of course. My point, however, is that I did not ingest a PDF but pointed it to some wikis.

I am all for a public open tiddlywiki LLM.

linonetwo · March 13, 2024, 10:17am

Can you support me on “Can I get talk forum and gg dataset to train AI?”?

I have many GPUs; I only lack training data.

linonetwo · March 13, 2024, 10:18am

I’m using www.coze.com; I think it uses GPT-4. I put all the tw5-com docs into its database, and it still can’t understand filter expressions and widgets well.

papiche · March 13, 2024, 11:52am

Maybe https://floneum.com/ can be used with local models from

papiche · March 19, 2024, 10:35am

This article is interesting:
“How to convert any text corpus into a Graph of Concepts (GC)”

https://towardsdatascience.com/how-to-convert-any-text-into-a-graph-of-concepts-110844f22a1a