Preparing AI training materials Season 1 - Explain TW core WikiText

  1. Explain TW core WikiText with an LLM <- we are here
  2. Human validation of the dataset
  3. Train a WikiText generation expert model
  4. Get the talk forum and gg datasets to train a chat model
  5. Fine-tune a recent open-source LLM
  6. RLHF

I’m working on step 1. If you have any doubts or ideas, please read and comment on the GitHub repository tiddly-gittly/TiddlyWiki-LLM-dataset (WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki, WIP).

You can help by adding more tasks in TiddlyWiki-LLM-dataset/wiki/tiddlers/prompts/data on the master branch of tiddly-gittly/TiddlyWiki-LLM-dataset on GitHub. Either a PR or a comment on this post is fine.

If you are interested in step 2, please stay tuned for my follow-up post. I will invite reviewers to help once the AI-generated dataset is ready.
