AI Tools Plugin

stevesuny · January 28, 2025, 7:12pm

Hi all, there has been some discussion (not too recent) about the AI Tools Plugin, so thought I would revive it here.

I am interested in using the plugin (https://github.com/TiddlyWiki/TiddlyWiki5/pull/8365) to import a single conversation in JSON format from ChatGPT. I use the ChatGPT Exporter userscript, running under Tampermonkey (https://www.tampermonkey.net/), which purports to export in OpenAI JSON format.

I’ve renamed the exported file to conversations.json, but that didn’t work.

Any thoughts or suggestions?

FYI: the specific purpose is to capture the content of my audio transcripts, which are not reported in the html or the md exports, for some reason, but appear to be in the json exports. I’m trying to avoid having to process my entire corpus of chatgpt for a single conversation.

well-noted · January 28, 2025, 10:58pm

I do not know how the ChatGPT exporter works. I would suggest writing a Python script that takes the audio file as input, sends it through the Whisper system to get the transcript, and formats the result as JSON output compatible with TiddlyWiki, using whatever fields you like.

You can ask ChatGPT how to do that.

Then bulk import those jsons.
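
A minimal sketch of the formatting and bulk-import steps, assuming the transcript text is already in hand (the Whisper call is left out because it depends on which Whisper package or API you use). The helper names `transcript_to_tiddler` and `tiddlers_to_json` and the choice of fields are illustrative only. TiddlyWiki imports a JSON file that contains an array of objects whose keys are tiddler fields.

```python
import json
from datetime import datetime, timezone


def transcript_to_tiddler(title, transcript, tags="transcript", created=None):
    """Build one tiddler as a dict of string fields (hypothetical helper).

    `created` is assumed to be a UTC datetime; it defaults to now.
    """
    created = created or datetime.now(timezone.utc)
    millis = f"{created.microsecond // 1000:03d}"
    return {
        "title": title,
        "text": transcript.strip(),
        "tags": tags,
        "type": "text/vnd.tiddlywiki",
        # TiddlyWiki date fields are UTC strings of the form YYYYMMDDHHMMSSmmm
        "created": created.strftime("%Y%m%d%H%M%S") + millis,
    }


def tiddlers_to_json(tiddlers):
    """Serialise a list of tiddler dicts as one importable JSON array."""
    return json.dumps(tiddlers, indent=2, ensure_ascii=False)
```

Writing the result of `tiddlers_to_json(...)` to a `.json` file gives you one file to drag onto the wiki, instead of one file per transcript.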

stevesuny · January 30, 2025, 2:17am

I’ve had ChatGPT generate a Python script, and that kind of works, but I want all the additional data of the full JSON export. Jeremy’s plugin has worked; it probably needs a tweak to deal with a single-conversation export.

VikingMage · January 30, 2025, 10:49am

$__plugins_vm_importChatGPT.tid (3.6 KB)

I’m working on an extension to import ChatGPT conversations. It’s early days, but it creates a JSON tiddler for each conversation.
It registers a custom file format (.chatgpt) and a deserializer which handles the import automatically. Just drag your renamed conversation file onto your wiki after the plugin is installed and the wiki has been restarted.

Note: This is for ChatGPT export through the settings in ChatGPT

WARNING. I’m new to writing plugins and this is the first time I’ve tried something this advanced, so test it on an empty wiki or make sure you have a backup.

well-noted · January 30, 2025, 6:47pm

I happened to be working on my notation processing script today, so I thought I’d attach my system prompt:

system_prompt = (
    'You are a highly skilled and motivated assistant, assigned with organizing all sections of this text file '
    'into the .json format appropriate for Tiddlywiki. Follow these steps:\n'
    '1. Set the text of each section as the "text" field of a tiddler.\n'
    '2. Set the tags to the "tags" field and the page number to the "pagenum" field (enclosed in quotes).\n'
    '3. Use the timestamp as the "title" field.\n'
    '4. Notes with multiple tags must be contained within quotation marks without commas or additional quotation marks surrounding separate notes, such as "one two three".\n'
    '5. Multi-word tags are wrapped in double brackets [[like this]]. Single-word tags do not need brackets.\n'
    '6. Separate each section by exactly one <<split>>.\n'
    '7. Remove or correct any characters that might interfere with being imported into Tiddlywiki.\n'
    '8. Add spaces to any words that might have gotten combined incorrectly.\n'
    '9. List properties in the order: title, text, tags, type, pagenum, caption1, structure, cover, parent.\n'
    '10. Set the type of each to "text/vnd.tiddlywiki".\n'
    '11. Set "caption1", "structure", "cover", and "parent" to empty strings.\n'
    '12. Double-check against the original to ensure the pagenum, parameter names, and text are correct.\n'
    '13. Ensure each section is correctly separated by exactly one <<split>> and no more.\n'
    '14. Verify that each note has a unique timestamp and modify if necessary to ensure uniqueness.\n'
    '15. Add possessive apostrophes where appropriate which may have gotten removed incorrectly.\n'
    '16. After completing your task, please look over everything and make sure that all json structures are correct and you have not missed any content. \n'
    '17. Do not include any additional formatting such as \n\n{{{{||$:/config.template}}}}.\n'
    'Example output:\n'
    '<<split>>\n'
    '{\n'
    '  "title": "202408130907000001",\n'
    '  "text": "This is the text of the first section.",\n'
    '  "tags": "example",\n'
    '  "type": "text/vnd.tiddlywiki",\n'
    '  "pagenum": "1",\n'
    '  "caption1": "",\n'
    '  "structure": "",\n'
    '  "cover": "",\n'
    '  "parent": ""\n'
    '}\n'
    '<<split>>\n'
    '{\n'
    '  "title": "202408130907000002",\n'
    '  "text": "This is the text of the second section.",\n'
    '  "tags": "[[multi word tag]]",\n'
    '  "type": "text/vnd.tiddlywiki",\n'
    '  "pagenum": "2",\n'
    '  "caption1": "",\n'
    '  "structure": "",\n'
    '  "cover": "",\n'
    '  "parent": ""\n'
    '}\n'
    '<<split>>\n'
    '{\n'
    '  "title": "202408130907000003",\n'
    '  "text": "This is the text of the third section.",\n'
    '  "tags": "example [[multi word tag]] another",\n'
    '  "type": "text/vnd.tiddlywiki",\n'
    '  "pagenum": "3",\n'
    '  "caption1": "",\n'
    '  "structure": "",\n'
    '  "cover": "",\n'
    '  "parent": ""\n'
    '}\n'
    '<<split>>'
)
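
A rough sketch of how the model's reply could be checked before import, assuming it follows the `<<split>>`-delimited layout in the example output above. The function names are hypothetical and not part of the original script. It parses each chunk as JSON, collects the chunks that fail to parse, and flags duplicate titles, since step 14 of the prompt asks for unique timestamps.

```python
import json


def parse_split_output(raw, marker="<<split>>"):
    """Split model output on the marker and parse each chunk as JSON.

    Returns (tiddlers, errors): the parsed dicts, plus (index, message)
    pairs for chunks that were not valid tiddler objects.
    """
    tiddlers, errors = [], []
    chunks = [c.strip() for c in raw.split(marker)]
    for i, chunk in enumerate(c for c in chunks if c):
        try:
            obj = json.loads(chunk)
        except json.JSONDecodeError as exc:
            errors.append((i, str(exc)))
            continue
        if isinstance(obj, dict) and "title" in obj:
            tiddlers.append(obj)
        else:
            errors.append((i, "not a JSON object with a title"))
    return tiddlers, errors


def find_duplicate_titles(tiddlers):
    """Return the titles that occur more than once, sorted."""
    seen, dupes = set(), set()
    for t in tiddlers:
        (dupes if t["title"] in seen else seen).add(t["title"])
    return sorted(dupes)
```

Dumping the returned list with `json.dumps` produces a single JSON array that TiddlyWiki can import in one go.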

stevesuny · February 6, 2025, 9:26pm

Hi this sounds great. Just out of curiosity: why not have a tiddler for each prompt and each response, instead of each conversation?
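
To make the question concrete, here is a rough sketch of the per-message alternative. It assumes each exported conversation is an object with a `title` and a `mapping` of nodes, where each node may carry a `message` with `author.role`, `create_time` and `content.parts`. That layout is an assumption about the export, and the function names, title scheme and field names are made up for illustration.

```python
def message_text(message):
    """Join the textual parts of one message; non-text parts are skipped."""
    parts = (message.get("content") or {}).get("parts") or []
    texts = []
    for part in parts:
        if isinstance(part, str):
            texts.append(part)
        elif isinstance(part, dict) and isinstance(part.get("text"), str):
            texts.append(part["text"])
    return "\n".join(t for t in texts if t.strip())


def conversation_to_message_tiddlers(conversation):
    """Return one tiddler dict per non-empty message, ordered by create_time."""
    conv_title = conversation.get("title") or "Untitled"
    messages = [
        node["message"]
        for node in conversation.get("mapping", {}).values()
        if node.get("message")
    ]
    messages.sort(key=lambda m: m.get("create_time") or 0)
    tiddlers = []
    for msg in messages:
        text = message_text(msg)
        if not text:
            continue  # e.g. empty system messages
        role = (msg.get("author") or {}).get("role", "unknown")
        n = len(tiddlers) + 1
        tiddlers.append({
            "title": f"{conv_title} / {n:04d} {role}",
            "text": text,
            "tags": "[[" + conv_title + "]]",  # assumes the title has no "]]"
            "role": role,
            "message-id": msg.get("id", ""),
        })
    return tiddlers
```

Sorting by `create_time` flattens the conversation, so if the mapping contains branches (for example regenerated responses), all of them end up in one sequence. Following the parent/children links would be needed to keep only one branch.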

stevesuny · February 7, 2025, 12:03am

Hi, I tried the plugin, and while it did import and create a tiddler for each conversation, I didn’t get any of the prompts or messages. I’ve uploaded the tiddlers created for one of my conversations: Spotify content access query.tid (265 Bytes)

Here is my test wiki

Thanks!

buggyj · February 10, 2025, 8:01am

You could also try my setup for importing chats that have been exported from chatgpt.

VikingMage · February 13, 2025, 2:34am

Did you rename your conversation to have the extension .chatgpt?

The imported tiddlers will have the prefix ChatGPT: and the suffix @<conversation start date>
The body of the tiddler will be the JSON for the whole conversation.
I have tested importing to your test wiki and my conversations.chatgpt loaded correctly
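
For anyone who wants to reproduce that result outside the plugin, here is a rough Python sketch of the same idea. It is not the plugin's code (the plugin is a TiddlyWiki deserializer), and it assumes each conversation object has `title` and `create_time` (Unix seconds) keys. The date format in the suffix and the function names are guesses for illustration. It also accepts a file holding a single conversation object instead of a list, which is the case raised at the start of the thread.

```python
import json
from datetime import datetime, timezone


def conversation_tiddler(conversation):
    """Wrap one exported conversation in a JSON tiddler."""
    started = datetime.fromtimestamp(
        conversation.get("create_time") or 0, tz=timezone.utc
    )
    title = conversation.get("title") or "Untitled"
    return {
        # Prefix and suffix as described above; the date format is assumed
        "title": f"ChatGPT: {title} @{started:%Y-%m-%d}",
        "type": "application/json",
        "text": json.dumps(conversation, indent=2, ensure_ascii=False),
    }


def export_to_tiddlers(export_text):
    """Accept either a list of conversations or a single conversation object."""
    data = json.loads(export_text)
    conversations = data if isinstance(data, list) else [data]
    return [conversation_tiddler(c) for c in conversations]
```

Two conversations with the same title started on the same day would collide under this naming scheme, so a real importer would need some duplicate handling.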

As for splitting into separate files, I’m not sure how much of the metadata I want to keep yet.

VikingMage · February 13, 2025, 2:50am

I like your extension, @buggyj. It looks good, and I like the logic for handling duplicate imports.

VikingMage · February 13, 2025, 3:27am

It looks like the ChatGPT export has folders named with the conversation id, and inside each of those folders is an audio folder with .wav files.

Looking at the conversation metadata, there are references to sediment:// files.
If you split the name of an audio file at the first -, the part before it matches the file name in the metadata's sediment:// pointer.

eg:
Filesystem:
./ChatGPT_export_2022-02-12/6719bfcc-76e0-8003-b4a0-9d80c28f566f/audio/file_6719bff3b250203c3694f96b520e9147395159a8cb10a5b7ba7dcc96e8dcc9b1df462079404e2aa8e6e8de6740b5d9d4-3d57a5f2-bb63-4653-b403-d1879e1cd0b2.wav
metadata:

[
    {
        "2b5de450-06f7-4519-8cf7-cd500543c678": {
            "id": "2b5de450-06f7-4519-8cf7-cd500543c678",
            "message": {
                "id": "2b5de450-06f7-4519-8cf7-cd500543c678",
                "author": {
                    "role": "user",
                    "name": null,
                    "metadata": {}
                },
                "create_time": 1729740787.720517,
                "update_time": null,
                "content": {
                    "content_type": "multimodal_text",
                    "parts": [
                        {
                            "expiry_datetime": null,
                            "content_type": "real_time_user_audio_video_asset_pointer",
                            "frames_asset_pointers": [],
                            "video_container_asset_pointer": null,
                            "audio_asset_pointer": {
                                "expiry_datetime": null,
                                "content_type": "audio_asset_pointer",
                                "asset_pointer": "sediment://file_6719bff3b250203c3694f96b520e9147395159a8cb10a5b7ba7dcc96e8dcc9b1df462079404e2aa8e6e8de6740b5d9d4",
                                "size_bytes": 317804,
                                "format": "wav",
                                "metadata": {
                                    "start_timestamp": null,
                                    "end_timestamp": null,
                                    "pretokenized_vq": null,
                                    "interruptions": null,
                                    "original_audio_source": null,
                                    "transcription": null,
                                    "start": 0,
                                    "end": 6.62
                                }
                            },
                            "audio_start_timestamp": 7.3895760560408235
                        },
                        {
                            "content_type": "audio_transcription",
                            "text": "Conceptually, could a perception be the same as an embedding?",
                            "direction": "in",
                            "decoding_id": null
                        }
                    ]
                },
                "status": "finished_successfully",
                "end_turn": null,
                "weight": 1,
                "metadata": {
                    "voice_mode_message": true,
                    "request_id": "235b50c3-33dc-44ad-9dc5-caea79faada9",
                    "message_source": null,
                    "timestamp_": "absolute",
                    "message_type": null,
                    "real_time_audio_has_video": false
                },
                "recipient": "all",
                "channel": null
            },
            "parent": "500c7f74-c723-499b-a68a-eae6d6c026c2",
            "children": [
                "b738ab1f-1381-4212-8ef4-200242e77973"
            ]
        }
    }
]
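
A small sketch of that matching rule, together with pulling the transcription text out of a message shaped like the one above. The function names are made up for illustration, and only the nesting shown in this sample (an `audio_asset_pointer` inside a part, plus a separate `audio_transcription` part) is handled. Other message shapes may exist in the export and would need their own cases.

```python
from pathlib import Path

SEDIMENT_PREFIX = "sediment://"


def pointer_key(asset_pointer):
    """'sediment://file_abc' -> 'file_abc'."""
    if asset_pointer.startswith(SEDIMENT_PREFIX):
        return asset_pointer[len(SEDIMENT_PREFIX):]
    return asset_pointer


def audio_file_key(path):
    """'.../audio/file_abc-3d57.wav' -> 'file_abc' (text before the first '-')."""
    return Path(path).name.split("-", 1)[0]


def voice_message_summary(message):
    """Collect audio file keys and transcription text from one message's parts."""
    keys, texts = [], []
    for part in (message.get("content") or {}).get("parts") or []:
        if not isinstance(part, dict):
            continue
        audio = part.get("audio_asset_pointer")
        if isinstance(audio, dict) and audio.get("asset_pointer"):
            keys.append(pointer_key(audio["asset_pointer"]))
        if part.get("content_type") == "audio_transcription" and part.get("text"):
            texts.append(part["text"])
    return {"file_keys": keys, "text": "\n".join(texts)}


def find_audio_files(file_key, audio_paths):
    """Return the paths whose file name starts with the given pointer key."""
    return [p for p in audio_paths if audio_file_key(p) == file_key]
```

Running `voice_message_summary` over every node's `message` in a conversation gives the transcript text for each voice turn, and `find_audio_files` links each turn back to its .wav file on disk.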