- Explain TW core WikiText with LLM <- we are here
- human validation of dataset
- train a wikitext generation expert model
- get talk forum and gg dataset to train a chat model
- fine-tune a recent open-source LLM
- RLHF
I’m working on step 1. If you have any doubts or ideas, please read and comment on the GitHub repository tiddly-gittly/TiddlyWiki-LLM-dataset (WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki, WIP).
You can help by adding more tasks in TiddlyWiki-LLM-dataset/wiki/tiddlers/prompts/data (master branch of tiddly-gittly/TiddlyWiki-LLM-dataset on GitHub). Either a PR or a comment on this post is OK.
If you are interested in step 2, please stay tuned for my follow-up post. I will invite reviewers to help once the AI-generated dataset is ready.