Question - How-To - Convert: PDF to TiddlyWiki (.tid)?

Hi Folks,

After searching Google and the TiddlyWiki sites (the main site and here), I can’t seem to find a good way to “convert” my PDF files (mainly text) to TiddlyWiki’s .tid format, so that I can then “search” the imported content.

I know there are discussions about converting “from” TiddlyWiki “to” other formats - PDF, MD, etc.

But none of these go the other way, from PDF to .tid (example).


I can convert my PDF files to HTML5 (one large file per PDF) and import them, and they display just fine. But I can’t get search to work on them in the standard TiddlyWiki way.


So my goal is to convert the PDFs to .tid files, even if I have to convert PDF to MD and then to TID, or take some other route. What I want to end up with:

  • Search - I want to search for items in the PDF files using TiddlyWiki’s native search feature
  • Tags - I want to tag the imported files with different tags, to then be able to list them by tag (tags are pretty OK so far)
  • Filters - I want to be able to search my imported files by “keywords” that I can convert to lists

Maybe I am overthinking this, but I just can’t seem to search imported PDF or HTML files in TiddlyWiki.

Thanks,

TwN00b

"One step closer to migrating my stuff to TiddlyWiki,…"

Do you need to do this from the command line or as a batch process? If not, there may be other ways to convert a PDF’s content manually, now and then.

  • There may also be some semi-automatic ways as well.

I am able to convert them to HTML, so this is a start. It’s not tough, and this part is done.

Next - I am looking at how to convert the HTML to .tid. Or I might just stay with HTML, since the files look like the PDFs but are HTML.
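For what it’s worth, here is one possible way to go from HTML to .tid, as a rough Python sketch rather than a tested converter. It assumes a .tid file is plain text made of `field: value` header lines, a blank line, and then the tiddler text. The names `html_to_text` and `make_tid` and the sample tags are made up for illustration. It keeps only the visible text, so the PDF-like layout is lost, but the content ends up as ordinary tiddler text that the native search can see.

```python
import re
from html.parser import HTMLParser

# Tags treated as paragraph boundaries; an illustrative, incomplete list.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # content that is never visible


class VisibleText(HTMLParser):
    """Collect visible text, marking block boundaries with newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Source line wraps become single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse each run of block boundaries into one blank line.
    return re.sub(r" *\n[ \n]*", "\n\n", text).strip()


def make_tid(title, text, tags=()):
    header = [f"title: {title}"]
    if tags:
        # Tags containing spaces are wrapped in [[double brackets]].
        header.append("tags: " + " ".join(
            f"[[{t}]]" if " " in t else t for t in tags))
    # Assumption: text/plain, so the body is not parsed as wikitext.
    header.append("type: text/plain")
    return "\n".join(header) + "\n\n" + text + "\n"
```

Saving the result of `make_tid(...)` as a UTF-8 file with a `.tid` extension and dragging it into the wiki should import it like any other tiddler. A single large HTML file becomes one large tiddler this way; splitting it per chapter or page would need extra logic that is not shown here.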

At this point I am researching how to “search” the HTML content (the viewable part), to see how I can create my categorized tags.

It makes sense that there isn’t really a direct “conversion” from PDF to tiddler. The reason is that PDF is a complex print-oriented format that can include so many layout features, graphic elements, and metadata details that don’t cross over easily. At best you could extract certain elements (such as recognizable text). Some PDFs don’t even have embedded text; they just contain page images as captured by a scanner, so any text in them can only be recovered through OCR, with varying success…

Sounds like your HTML is a good path, though, if you’re mostly interested in text, and the HTML conversion is including the text that interests you.

Even if you don’t have the full text searchability, using a keywords field is a great intermediate step!
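To make the keywords idea concrete, here is a small Python sketch that adds a `keywords` line to the header of an existing .tid file. It assumes the usual .tid layout (header lines, a blank line, then the text) and that a custom field is written the same way as `tags`, with multi-word values in double square brackets. The function names and the sample keywords are invented for illustration.

```python
def format_list_field(values):
    """Format values the way a list field such as tags is written."""
    return " ".join(f"[[{v}]]" if " " in v else v for v in values)


def add_keywords(tid_text, keywords):
    """Return the .tid text with a keywords header line added or replaced."""
    # The header ends at the first blank line.
    header, _, body = tid_text.partition("\n\n")
    lines = [ln for ln in header.split("\n")
             if not ln.startswith("keywords:")]
    lines.append("keywords: " + format_list_field(keywords))
    return "\n".join(lines) + "\n\n" + body


tid = "title: Report\ntags: imported\n\nHello world\n"
print(add_keywords(tid, ["invoice", "tax 2023"]))
```

Once imported, a filter along the lines of `[contains:keywords[invoice]]` should list the tiddlers carrying that keyword, which covers the “keywords that I can convert to lists” goal even when the full text is not searchable.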

Thanks @Springer - yeah, I can “sort of” search the HTML files.

  • I’ll just recreate my old categories (from a non-TiddlyWiki app) and add them to each HTML file

I still feel there is a way - it’s just on the fringes of my brain - ,…

Thanks all - calling this “Solved” as I think it has run its course.

TwN00b