Looking for advice

Bob_Jansen · July 20, 2023, 6:40am

I am working on building a digital archive of the paper-based archive of Central Street Gallery in Sydney. This was one of the first contemporary art spaces in Sydney in the 60’s and thus is an important organisation of that time that mentored many of today’s important artists.

You can see the work in progress at Central Street Archive.

The advice I need is for the display of multi-page documents.

The example, from the archive, is the tiddler, Magazine: Contemporary Art Society, November 1974 (Central Street Archive)

This is a 240MB document, obviously too large to display as is; the download time is prohibitive.

So I am thinking about a way to paginate the document and provide simple access to the pages. I have already split the large document into 21 single-page documents of about 30MB each.

I know I could have a sequence of tiddlers, one for each page but I am hoping I can provide a ‘page turning’ facility.

I already use a PageTurner tiddler, which some knowledgeable person from this group provided me some time ago, to browse through the items. You can see this in action through the << and >> icons on each page. This is implemented using the ViewTemplate facility.

I am looking for suggestions on how to implement this second page-turning facility. I have tried using a duplicate of the PageTurner facility, but TW seems to get confused: when I try to turn a document page, TW displays the next item in the items list. I don’t know whether this is because the SORT is getting confused about sorting items and pages. If I can implement two such facilities, it is important that turning pages does not interfere with turning through the items, and vice versa.
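To make the requirement concrete, here is a minimal sketch of the separation I am after. It is written in Python, not TiddlyWiki wikitext, and it is not how the existing PageTurner works. The field names (`kind`, `parent`, `order`) and the titles are hypothetical. The idea is that each turner builds its own list before sorting, so the two sorts never see each other's tiddlers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tiddler:
    title: str
    kind: str                     # hypothetical field: "item" or "page"
    parent: Optional[str] = None  # for pages: title of the owning item
    order: int = 0                # sort key within its own list


def neighbours(sequence, current):
    """Return (previous, next) titles around `current`; None at either end."""
    titles = [t.title for t in sorted(sequence, key=lambda t: t.order)]
    i = titles.index(current)
    prev_title = titles[i - 1] if i > 0 else None
    next_title = titles[i + 1] if i < len(titles) - 1 else None
    return prev_title, next_title


def item_turner(store, current_item):
    # Only items take part in this list, so pages can never appear in it.
    items = [t for t in store if t.kind == "item"]
    return neighbours(items, current_item)


def page_turner(store, current_page):
    # Only pages that share the current page's parent take part in this list.
    page = next(t for t in store if t.title == current_page)
    siblings = [t for t in store
                if t.kind == "page" and t.parent == page.parent]
    return neighbours(siblings, current_page)


# Made-up example data.
store = [
    Tiddler("Item A", "item", order=1),
    Tiddler("Item B", "item", order=2),
    Tiddler("Item C", "item", order=3),
    Tiddler("Item B page 1", "page", parent="Item B", order=1),
    Tiddler("Item B page 2", "page", parent="Item B", order=2),
    Tiddler("Item B page 3", "page", parent="Item B", order=3),
]

print(item_turner(store, "Item B"))
print(page_turner(store, "Item B page 2"))
```

In TW terms, I assume this would mean giving the second turner a filter restricted to the pages of the current item, rather than reusing the filter that lists all items.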

All advice/suggestions/guidance greatly appreciated.

bobj

twMat · July 20, 2023, 7:36am

Quick thought: How would the info optimally be presented on any non-TW based system? Do you know of any great website that presents comparable information? Thereafter, this can perhaps be replicated with TW.

TW_Tones · July 20, 2023, 7:48am

I was thinking that as well @twMat; it’s too big for anywhere. I still have not seen the PDF file, but I think it is worth revisiting the PDF itself to use lower-resolution images, because I understand it is a scanned document. Then it can be kept all together as one file.

With a proper PDF editor one could also move the above contents details into the PDF allowing navigation within the viewer.
There is a lot one can do to reduce the size of a scanned document with little or no apparent loss of quality.
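As a rough illustration of why resolution matters so much, the sketch below estimates the raw pixel data of one scanned page at a few resolutions. It assumes A4 pages and uncompressed pixels; the thread does not state the actual page size or scan settings, and real PDFs are compressed, so the absolute numbers are only indicative. The point is the scaling: halving the dpi cuts the pixel data to about a quarter.

```python
# Rough sizing sketch. A4 page size and uncompressed pixels are assumptions.
A4_INCHES = (8.27, 11.69)


def raw_scan_megabytes(dpi, bytes_per_pixel, page_inches=A4_INCHES):
    """Uncompressed size of one scanned page in MB (1 MB = 1,000,000 bytes)."""
    width_px = round(page_inches[0] * dpi)
    height_px = round(page_inches[1] * dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


for dpi in (600, 300, 150):
    colour = raw_scan_megabytes(dpi, 3)  # 24-bit colour
    grey = raw_scan_megabytes(dpi, 1)    # 8-bit greyscale
    print(f"{dpi:>3} dpi: colour {colour:6.1f} MB, greyscale {grey:6.1f} MB")
```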

pmario · July 20, 2023, 9:11am

I did download one of your pdfs to see the size. It’s a 2 page text only page, with no images and just some names listed. It uses 2.3MByte and took about 2 minutes to load.

Then I did switch to the next page and it took 4 minutes for a 6MB file.

It seems to me that your internet provider puts a download speed limit, or some other rate limit, on the connection.
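The arithmetic behind that impression, as a small sketch. The two measurements are the ones above; the 30MB and 240MB sizes are the figures from the first post. It assumes 1 MB = 1000 KB and that the transfer rate stays constant, which it may not.

```python
def throughput_kb_per_s(size_mb, seconds):
    """Average transfer rate in KB/s, taking 1 MB = 1000 KB."""
    return size_mb * 1000 / seconds


def minutes_to_download(size_mb, kb_per_s):
    """Time to fetch size_mb at a constant rate, in minutes."""
    return size_mb * 1000 / kb_per_s / 60


# (size in MB, seconds) for the two downloads described above
observations = [(2.3, 2 * 60), (6.0, 4 * 60)]
rates = [throughput_kb_per_s(mb, s) for mb, s in observations]

for (mb, _), rate in zip(observations, rates):
    print(f"{mb} MB file: about {rate:.1f} KB/s")

best = max(rates)
for size_mb in (30, 240):
    minutes = minutes_to_download(size_mb, best)
    print(f"{size_mb} MB at {best:.0f} KB/s: about {minutes:.0f} minutes")
```

Both downloads work out to a few tens of KB/s, which is far below what an ordinary broadband connection delivers.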

Every modern mobile device makes photos that are about 4MB in size, and they do not need that much time to be handled, even if the image comes from Australia and has to be sent to Austria (my place).

So I would check with the storage provider. … It may also be possible that the PDFs are stored in “cold storage”, which is much cheaper than “hot” storage, where the info is available immediately … I would highly recommend checking where your PDFs are stored.

TW_Tones · July 20, 2023, 9:54am

Yes, I learned it’s in the U.S. with smaller files. A content delivery network would help a lot.

Bob_Jansen · July 20, 2023, 11:12am

@TW_Tones,

I have looked at reducing the file but for docs I tried, they became almost unreadable.

My preferred option at the moment is to split the file in some way. Three come to mind:

1. A single page per file rather than the current two. Double the number of files, but each only half the size.
2. Split into pages with each page as two magazine pages. This is what I’ve been playing around with.
3. Split depending on the story, so each story is in a single file. Given that most people would, I guess, be more interested in a specific story, this might be the best solution. But then what do we do with long stories?

Might try your suggestion of photographing rather than scanning and see if that makes a difference.

Bobj

TW_Tones · July 20, 2023, 11:21am

I did not mean that. Even after scanning you can apply different compression algorithms to any image file; it’s too complex to describe all the methods. But this has been an issue for decades with websites in general, and thus a lot of tools and methods abound. If you can share something, like some scans, I could have a brief look.

Jason_Cunliffe · July 20, 2023, 1:36pm

I suggest you follow a couple of paths simultaneously:

Focus on the Catalog Archive structure and contents, without worrying initially about web or filesize.

LaTeX has very powerful semantic features, beautiful typography, and deep features to adapt the same core contents to various layouts and exports (print, convert). A key feature is inlining graphics and images, PDFs included.
One can take existing external PDFs and pull them in as whole full pages as part of a new document, and/or select specific pages for inclusion. They can be scaled from thumbnail catalog matrices up to zoomed-in, cropped, positioned diagrams and illustrations.

There is excellent free desktop software such as TeXstudio. And these days Overleaf.com is a brilliant online application. Its key feature is that it offers collaborative online editing. Overleaf merged with the aptly named ShareLaTeX; they are now one.

The upside is that images (scans, photos, etc.) can be as high-res as you want.
The output quality and format is then a set of downstream choices, via small script variations.

The object is getting the content and all its associated images and text into a powerful system.

Meanwhile, and from there, Tiddlywiki can be a brilliant companion platform. A 2-way interface.

hth
~ Jason

Jason_Cunliffe · July 20, 2023, 1:44pm

Is a good online read ∆

The ‘pdfpages’ package is a LaTeX package with excellent options via just a line or two of script commands.
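To give a feel for it, here is a small sketch in Python that writes out the kind of LaTeX involved. The helper names and the file name `magazine.pdf` are made up for illustration. The `\includepdf` command and its `pages`, `nup` and `frame` options come from pdfpages itself.

```python
def includepdf(path, pages="-", **options):
    """Build one pdfpages \\includepdf line; pages="-" means all pages."""
    opts = [f"pages={{{pages}}}"] + [f"{k}={v}" for k, v in options.items()]
    return f"\\includepdf[{','.join(opts)}]{{{path}}}"


def build_document(lines):
    """Wrap the given body lines in a minimal article that loads pdfpages."""
    body = "\n".join(lines)
    return (
        "\\documentclass{article}\n"
        "\\usepackage{pdfpages}\n"
        "\\begin{document}\n"
        f"{body}\n"
        "\\end{document}\n"
    )


# "magazine.pdf" is a hypothetical file name.
tex = build_document([
    includepdf("magazine.pdf", pages="3-5"),               # pull in pages 3 to 5 only
    includepdf("magazine.pdf", nup="3x2", frame="true"),   # thumbnail sheet, 6 per page
])
print(tex)
```

Changing `pages` or `nup` is all it takes to go from a full-page excerpt to a thumbnail overview of the same source PDF.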

Tiddlywiki has its own LaTeX features

Here is a short 8-minute video which gives a taste of how very small changes in LaTeX code can be fast and powerful. Just edit a parameter or two to get what you need.

One can then create and collect a library of documents, illustrations == Lego parts and assemblies. Then select compose and use as needs be.

There is a strong transclusionary similarity between LaTeX documents and Tiddlywiki

The roots and DNA of html & hypertext > Wikis…can be found in typesetting codes and syntax.

What PDFs have done for documents is marvellous for the end user, the reader. Likewise for distribution and publishing.

/// The problem in PDFs is the loss of semantic structure.
PDFs are pretty stupid and not interested in the semantics. They just focus on page layout, clean typography and no headaches. Great

But for creators who need to develop and grow their content — reorder, structure, sequence, tube, adapt etc PDFs are problematic.

LaTeX 3 = major version long-term dream now project to advance and implement semantic features seriously.
And like Tiddlywiki= born mid 1990s.
The developers put it aside for years while improving the prevailing code.
Don’t break things for people who are using them-Dept. !

But LaTeX 3 is a work in progress now.
Its timing is good.
Like the spectacular features and Dev User community of TW5.3

The interplay between both is something I am actively exploring myself currently.
Learning curve but the rewards and long-term benefits are impressive

Bob_Jansen · July 21, 2023, 3:36am

I’d like to thank everyone who contributed for their advice and experience.
You have given me much to think about. Now to do the thinking and tinkering…

bobj