Time consuming JS macro

I am writing a JS macro to perform an import of GEDCOM data. My first few tests were with small data sets; these files import quickly (a few seconds). I recently tried a very large data file, which took considerably longer to process.

What are some best practices for processing such files? I had to click the “wait” button in a browser dialog three or four times during this import.

How can I inform the user of progress? I tried the TW notification method; it worked as long as I did not call it more than once.

Thoughts? Thank you in advance.

Although it is not really truthful, I have seen an animated GIF displayed beforehand to imply that progress is continuing, even once the tab is no longer responding to the user.

The simplest form is to display a “.” for each item processed, so progress is evident, but I am not sure how you would do that from a JS module in TiddlyWiki.

Other approaches are to design restartable processes, so you can import the first N items, then the next N items, and so on…

  • This could allow you to provide an indicator or notification before each restart.

Perhaps if you import the whole GEDCOM source into a tiddler and then write the parser in this restartable way (sketched below), it would be easier. If the source is in CSV, or can be converted to CSV, I would use a dedicated wiki with the JSON Mangler plugin and import all the data into a plugin, with each entry a shadow tiddler.

  • You can just drag the result to the wiki that processes it.
  • The JSON Mangler import allows you to select what to convert, thus reducing the data size if needed.
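
A minimal sketch of the restartable idea, assuming the raw GEDCOM text has already been pasted into a tiddler and a cursor is kept in a state tiddler; the tiddler titles and the processGedcomLine helper are hypothetical:

    // Process the next batch of lines each time the user triggers the macro,
    // keeping a cursor in a state tiddler so the import can be resumed.
    var BATCH_SIZE = 200,
        lines = $tw.wiki.getTiddlerText("GedcomSource", "").split("\n"),
        cursor = parseInt($tw.wiki.getTiddlerText("$:/state/gedcom-cursor", "0"), 10),
        end = Math.min(cursor + BATCH_SIZE, lines.length);
    for(var i = cursor; i < end; i++) {
        processGedcomLine(lines[i]); // your per-line parsing work (hypothetical)
    }
    // Remember where we stopped so the next run carries on from here
    $tw.wiki.setText("$:/state/gedcom-cursor", "text", null, String(end));

Between batches the browser gets control back, so a notification, or simply displaying the cursor tiddler, gives the user an up-to-date indication of progress.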

It is possible to increase the “inactivity” timeout period for a browser, something I found essential in the past, but not so much lately.
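
For example (from memory, so check your browser’s documentation, as this may differ between versions), Firefox exposes the slow-script threshold as a preference in about:config; Chromium-based browsers do not offer an equivalent setting as far as I know:

    dom.max_script_run_time = 30    (seconds before the slow-script warning; 0 reportedly disables it)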

Macros are meant to return text, not to have other side effects such as altering tiddlers.

The recommended way to approach your use case would be to send a custom widget message using action-sendmessage, say tm-import-gedcom, and implement a listener for it in the root widget.

In the callback for the listener, save the imported data in a tiddler. That tiddler can be transcluded where you want the data to appear, or the data in it further processed.

This approach will make the import asynchronous and should also avoid it being interrupted by a refresh process.
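
A hedged sketch of that pattern: the message name tm-import-gedcom, the tiddler titles and the parseGedcom helper are only examples, but $tw.rootWidget.addEventListener and the action-sendmessage widget are the core mechanisms being described. The JS would live in a tiddler with type application/javascript and module-type: startup:

    exports.name = "gedcom-import-listener";
    exports.after = ["rootwidget"];
    exports.synchronous = true;

    exports.startup = function() {
        // Listen for the custom message sent from wikitext, e.g.
        //   <$button>
        //     <$action-sendmessage $message="tm-import-gedcom" $param={{GedcomSource}}/>
        //     Import GEDCOM
        //   </$button>
        $tw.rootWidget.addEventListener("tm-import-gedcom", function(event) {
            var result = parseGedcom(event.param); // hypothetical parser
            $tw.wiki.addTiddler(new $tw.Tiddler({
                title: "GedcomImportResult",
                text: JSON.stringify(result, null, 2)
            }));
            return true; // message handled
        });
    };

Because the work happens in a message handler rather than during rendering, it is not re-run every time the refresh mechanism fires.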

What is a “small” data set for you? How many records?

And what is “large”?

I think it also depends on what you do with your data while importing. I don’t know the GEDCOM data format, but a quick glance at the spec indicates that it is “line-based”, with a lot of “back-references” that may require modifying existing records (tiddlers) several times, depending on what your data structure looks like. … That takes time.

Is there an easy way to “split” the data-set into several smaller “chunks” and import them that way?

— some browser JS background —

Browser engines work with a single event loop. That means everything is done in one “never-ending” loop. The browser UI handling and the site’s JavaScript code both run in this one loop.

So if a JS programmer creates an endless loop in JS, e.g. while (1) {}, the browser UI will be completely blocked.

Today’s browsers detect this kind of long-running JS code and show a popup with buttons to “continue” or “block” the script. … If it is blocked, it can only resume if the page is reloaded!

So if you import “large” data in a for() loop, after some time the browser will show the popup you mentioned.

— end background —

In my opinion, this can only be avoided by splitting and queueing the import data and then processing the queue in smaller chunks until everything is imported: one “chunk” per pass through the browser loop.

At the moment TW doesn’t provide any queueing and task-execution API to developers.

Such a mechanism would allow us to give the user some progress feedback, since the UI stays responsive while the tasks are executed.
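
Until then, the same idea can be sketched in plain JS with setTimeout, handing one chunk to each turn of the event loop. The chunk size, the status tiddler title and the processRecord helper below are assumptions, not TW APIs:

    // Queue the parsed records and process one chunk per turn of the event
    // loop so the browser UI stays responsive between chunks.
    function importInChunks(records, chunkSize) {
        var index = 0;
        function step() {
            var end = Math.min(index + chunkSize, records.length);
            for(; index < end; index++) {
                processRecord(records[index]); // hypothetical per-record work
            }
            // Write progress to a status tiddler that a banner or notification can show
            $tw.wiki.setText("$:/temp/gedcom-progress", "text", null,
                index + " / " + records.length);
            if(index < records.length) {
                setTimeout(step, 0); // yield to the browser, then continue
            }
        }
        step();
    }

    importInChunks(parsedRecords, 250); // e.g. 250 records per chunk

Because the loop yields between chunks, the refresh cycle gets a chance to run and the progress tiddler actually updates on screen while the import is still going.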


As I wrote: how many lines is a “large” data set?

That’s not possible. While a JS loop is executing, no other code can run, so drawing the dot “.” will not happen until the loop is finished. … All the dots would be drawn when they are no longer needed. …

See my background info from the other post.

There are various ways to measure this: file size, number of lines in the file, and number of records. For those unfamiliar, GEDCOM is a flat-file format containing genealogical data, with definitions for individuals, families, and sources. Within these records are notes and events. The GEDCOM format does not align with how my data in TW is defined, which adds challenges to the import.

My first two small files contained ~400 and ~1200 individuals. Both imported within a few seconds. The large file had ~12000 individuals.

As you indicated, there are complex data relationships in these incoming files, so, good or bad, I have a three-step process:

  1. Import all the data into JS objects
  2. Prepare objects for import. This is nothing more than reviewing all the objects so they generate unique tiddlers
  3. Generate tiddlers from JS objects

There are numerous pros and cons to this approach. Biggest con is memory usage and biggest pro is it works.

Having it reread the incoming data to address complex data relationships did not sit well with me. So I went with read once.
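
For illustration, a minimal sketch of step 3, assuming each parsed individual has ended up as a plain JS object; the object shape and the person tag are made up:

    // Turn the prepared JS objects into tiddlers
    individuals.forEach(function(person) {
        $tw.wiki.addTiddler(new $tw.Tiddler(
            $tw.wiki.getCreationFields(),
            {
                title: person.title,   // already made unique in step 2
                tags: "person",
                gedcom_id: person.id,
                text: person.notes || ""
            },
            $tw.wiki.getModificationFields()
        ));
    });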

Breaking up the data for import has its challenges. For data integrity you would want to split the data along family lineages. I suspect there are existing tools on the market to do this.

Some time ago I wrote a PowerShell script to convert GEDCOM to JSON, with the JSON already in TW’s tiddler format. The large GEDCOM file, converted to JSON, also imported into TW slowly.

Every genealogist has their own unique data set, and some are proud of their huge volume of data. I have thought about adding a “Check” or “Review” button/macro that reviews a GEDCOM file, reports some statistics about it to the user, and indicates how viable the file is for importing. I personally do not feel 12000 individuals is appropriate. I belong to a family history group; this was their GEDCOM file.

Once imported, having ~12000 individuals with associated event data and notes is very taxing in TW. The system with ~1200 individuals responds very well. Aside: if this was hosted in Node.js, would you expect better performance?

Thank you for this. I will give this a try.

Thanks for your explanations.

No. … The Node.js server reads all the tiddlers from disk, builds the content and ships it to the browser. So it’s basically the same thing once it’s in the browser.

12000 tiddlers shouldn’t be a problem for TW. … It’s the filters, and the connected info that is visible at any time, that can make the UI slower. …

Without seeing the wiki and the structure it’s hard to guess what’s going on.

I hadn’t enabled the performance setting in a while. I see the problem. I’m going to need to be creative; I’m bad for going over the top with showing more and more data in my lists. With 12000 individuals and even more event tiddlers, the performance results suggest that how I obtain birth and death dates from event tiddlers is killing the performance. E.g. a list filter to obtain people sorted by title, then within it a sub-list getting the birth event:

[tag[event]tag[birth]contains:people{!!title}]

Then another filter for the death event:

[tag[event]tag[death]contains:people{!!title}]

@clsturgeon

As @pmario said,

With TiddlyWiki’s interactive, always-up-to-date state, the main contributor to performance is what you are displaying on screen and the work needed to keep it up to date. Performance can be improved right away by hiding the sidebar or closing tiddlers.

  • We previously played with an Esperanto dictionary of 66,000+ tiddlers, which became usable with these tweaks.
  • There are ways to look at your data and see if there are more efficient approaches, such as filters that eliminate the largest number of tiddlers first (see the example after this list).
  • One approach, for large list or dashboard tiddlers, is to take a static snapshot of the tiddler’s output, display that, and have the user refresh the snapshot as needed. snapshot.json (6.2 KB) is one I prepared earlier.
    • This may be optimised further.
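
For instance, assuming far fewer tiddlers carry the birth tag than the event tag, starting the run with the more selective step means every later step, including the contains check, scans a shorter list:

[tag[birth]tag[event]contains:people{!!title}]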

Finally, I want to reiterate my suggestions, which seem to have been forgotten in this thread (Time consuming JS macro - #2 by TW_Tones); I can illustrate these further if asked.

This is more of a usage response to your long-running JS. As a user who would use MK for research, I’m not sure I’d want to bring in 12000 individuals from a GEDCOM. I can see the convenience, but research tends to be around a family group, locality, etc., so I would import a subset of my main FH database, then tag them as associated with a specific research task. Your main FH program is the best place for your master dataset. MK is not going to compete with a genealogy program for its reporting, charting, etc.

Also, if you are doing research on the move, in libraries, you will probably be on a mobile device, so do you really want to have those 12000 in MK?

I would be happy if it worked reliably with a recommendation of a max of, say, 1000 records in a Gedcom.

Don’t let me stop you optimising your code though!

@myfta I cannot disagree with you. MK will in no way compete with traditional genealogy software. Most genealogists demand very specific charts and reports, something I never made my top priority. My focus has been data, and as you can see some of the data I capture is not traditional.

I have addressed a few performance issues and will attempt to address more.

I’ll look into the possibility of restricting the import. For example, the user could select an individual in the GEDCOM and then import either the ancestors or the descendants of that individual.

What other options could I provide that make sense? I could support a surname import, but such an import would leave a lot of gaps in the data. Thoughts?
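
A hedged sketch of the ancestors option, assuming the parsed objects are keyed by their GEDCOM ID and keep a parents array of IDs (both assumptions about the data structure):

    // Collect the selected individual plus every ancestor reachable
    // through the (hypothetical) parents pointers.
    function collectAncestors(individualsById, startId) {
        var keep = {}, queue = [startId];
        while(queue.length > 0) {
            var id = queue.pop();
            if(keep[id]) { continue; } // already visited
            keep[id] = true;
            var person = individualsById[id];
            ((person && person.parents) || []).forEach(function(parentId) {
                queue.push(parentId);
            });
        }
        return Object.keys(keep).map(function(id) { return individualsById[id]; });
    }

The descendants option would be the same walk over child pointers.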

Did you look at the help document (which still has too many gaps)? There are a few more side projects in the works. Check out the Genealogical Questionnaire, the SNP project and a new interactive timeline. The latter 2 are under construction.

@TW_Tones I have taken your post seriously. It prompted me to review the performance statistics and make changes, which have produced solid performance improvements. The TW with 12000+ imported individuals is now in a usable state.

I now need to find time to work on the original reason for the post. Plus I want to review this “dashboard” idea. However, this long weekend has me enjoying the outdoors. Thanks again.

We have heavy and persistent rain here 🙁 enjoy the weekend.

Feel free to send me your data, or part thereof (I will also see the performance), and I will use my approach on it to go part way. Confidentiality assured.

I use Family Historian, and like most such programs it allows you to create a partial GEDCOM export. The most useful options are “related to the person exported” and “spouses”. I agree that surname would not be so useful.

I will give the other document areas a look, but maybe others would like to share their thoughts…