Scripts for Backing Up TiddlyHost

Of course, I back up my files locally on a regular basis, and with greater frequency whenever I’m doing big adventurous work — like “clipping in” while rock-climbing.

But my point was that if TiddlyHost were to cease to be available (or to cease to be reliable), then an important feature (the fine-grained incremental backups — recently enhanced with labels, thanks to @simon) would be effectively lost. I have all those eggs in that one basket, because otherwise it would not be realistic to keep all those eggs at all :wink: I don’t think I’m confused about the importance of backing up. I’m simply wringing my hands over various ways in which TiddlyHost has been amazing, and (for my own case) there’s no other path that puts all those features together in such a lovely “just works” [until now :grimacing:] way.

(As of just now, I tried to open three tiddlyhost sites: one was zippy, one took 30 seconds, and one gave me a gateway timeout after 60 seconds. Really feels random right now: not in a “small chance of a problem” way, but in a “small chance it’ll work responsively” way.)


Ok, here’s a script that runs in PowerShell. But I’ve only tested it on Linux, so I might need some feedback. There might be pathing discrepancies. It would be neat if the shell could retry files after timeouts, but that is more than I am up to today.

With this version, you make a file for your password, one for your user id (email address), and one for your list of files. From the comments:

# Example usage:
#   <this scriptname> [FileList] [Password File] [User Id File] [Download Dir]
# Where
#   [FileList] is a file with a list of projects/files you want to download.
#      Do not include the .html extension.
#   [Password File] is a file with your password (so it doesn't need to be hard-coded here)
#      Defaults to "./thostpass.txt"
#   [User Id File] is a file with your user id (usually your email)
#      Defaults to "./user.txt"
#   [Download Dir] is the name of the directory where you want your downloads to go.
#      Defaults to "." (current directory)

So FileList is a file with a list of the files you want to process. One name per line.

You can also specify parameters with parameter names, rather than by order.

If you have an error with one file, it will skip to the next. So watch the output for error messages. This is different from the bash shell script, where an error with one file would terminate the program.

In terms of warnings, well, do not run this where it will overwrite your existing backups … in case there’s something wrong with these new backups. If you try this, let me know if the status field returns a number, because on Linux it doesn’t.
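
In case it helps to picture it, here is a simplified sketch of the per-file loop. This is only an illustration, not the actual script: the download URL pattern, the authentication, and the variable names are placeholders.

# Sketch only -- URL pattern, auth, and variable names are assumptions
$Sites    = Get-Content $FileList                          # one site name per line, no .html
$Password = (Get-Content $PasswordFile -Raw).Trim()
$UserId   = (Get-Content $UserIdFile -Raw).Trim()
$Secure   = ConvertTo-SecureString $Password -AsPlainText -Force
$Cred     = New-Object System.Management.Automation.PSCredential -ArgumentList $UserId, $Secure

foreach ($Site in $Sites) {
    $Url  = "https://$($Site).tiddlyhost.com"              # hypothetical URL pattern
    $Dest = Join-Path $DownloadDir "$Site.html"
    try {
        Invoke-WebRequest -Uri $Url -Credential $Cred -OutFile $Dest -TimeoutSec 60
        Write-Host "Downloaded $Site"
    } catch {
        # On a failure (e.g. gateway timeout) report it and move on to the next site
        $Status = $_.Exception.Response.StatusCode
        Write-Host "FAILED $Site (status: $Status)"
    }
}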


That would be great! The caveat is that I haven’t heard back from any Windows users.

Have you used it yourself?

For testing of course – on Linux.

PowerShell runs on Windows, Linux and macOS. I think it comes by default on Windows 10 and 11. It may have slight differences in abilities from the Linux/Mac version, but (I think) these usually relate to hardware access, which should be irrelevant for this use. It would be handy if someone out there (@twMat ?) gave the script a test run on Windows.

I’ll give it a try within a day or two. I didn’t previously because, by the time you kindly posted it, I had already backed up almost everything manually. But I realize now that I probably didn’t communicate that! Thank you @Mark_S

And, of course, a huge thank you to @simon ! Speed seems to be back to normal now! Regarding:

installing a robots.txt file to instruct web-crawlers not to relentlessly follow every filter and sort link in the “Explore” page.

…it’s a bit strange though that the problems arose suddenly, not gradually, so if web-crawlers were always “following every filter…” then why would it change all of a sudden…? Or maybe they themselves, or how they do their thing, changed ~2 weeks ago? Or maybe there was a sudden influx of users? Or an abrupt change in how users use the system?

I’m working with this backup script, and it keeps stalling. My feeling is that the site problem isn’t over yet.

I have gotten the PowerShell script to work on my old Win 7 machine. I forgot that PowerShell actually works better under Linux than Windows. There are a couple of extra steps people will need to know about.
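
The main one is likely the execution policy (an assumption on my part, since the steps aren’t written up yet): by default, Windows blocks running downloaded .ps1 files, so something like this is typically needed first. The script name below is just a placeholder.

# Either relax the policy for the current user ...
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# ... or bypass it for a single run without changing any settings:
powershell -ExecutionPolicy Bypass -File .\thost-backup.ps1 files.txt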

I’m adding code to retry download attempts.


@Mark_S many thanks for the script. Until now I was using the bash script with some modifications for multiple wikis through WSL. This will make things much easier on Windows.

I have tried it on Windows 10 and it has worked well so far – it correctly downloaded a couple of wikis. I really like how easy it is to configure with the file list.

In a rather locked-down business environment I get the error “New-Object : Cannot create type. Only core types are supported in this language mode.” on line 44 ($WebSession = New-Object Microsoft.PowerShell.Commands.WebRequestSession). I’ll give it a try at home later today.
However, it seems this line is not necessary anyway; the $WebSession variable is only referenced in commented-out code.
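
A possible workaround (just a sketch, assuming the line really isn’t needed) is to delete it, or to guard it so it only runs when the language mode allows it:

# Only create the session object outside Constrained Language Mode,
# where New-Object on non-core types is blocked.
if ($ExecutionContext.SessionState.LanguageMode -eq 'FullLanguage') {
    $WebSession = New-Object Microsoft.PowerShell.Commands.WebRequestSession
}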

Btw, @simon, is it possible to download a local core version of external core wikis (and the core js) using these scripted methods?
I tried appending ?mode=local_core (like in the TH sites panel) to the url, but with no effect.
This would make scripted downloads a better tool for backing up in case of TH issues.


It works.

Long live Progeny Of Polly!!! (PoP)

TT, x


I have a new version of the downloader in PS. It has code to trap and retry files that have timed out, up to 5 times. Unfortunately, it has only been lightly tested because … TiddlyHost has been working too well!
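
The retry logic is along these lines (a simplified sketch, with a placeholder URL rather than the script’s real download code):

$MaxTries = 5
foreach ($Site in $Sites) {
    for ($Try = 1; $Try -le $MaxTries; $Try++) {
        try {
            # Placeholder single-download call; stands in for the real download code
            Invoke-WebRequest -Uri "https://$($Site).tiddlyhost.com" -OutFile "$Site.html" -TimeoutSec 60
            Write-Host "Downloaded $Site (attempt $Try)"
            break                                    # success, move on to the next site
        } catch {
            Write-Host "Attempt $Try of $MaxTries failed for $Site"
            if ($Try -eq $MaxTries) { Write-Host "Giving up on $Site" }
            else { Start-Sleep -Seconds 5 }          # brief pause before retrying
        }
    }
}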

I’ve downloaded 12 out of 12 files every time I’ve tested over the last 12 hours. Hopefully this means TH is back in the game.


Things seem to work well on TH right now but regardless:

OK, I’m looking into your Tiddlyhost downloader script now (thank you!) and immediately, even before running it, hit a “conceptual” problem for my use case:

[FileList] is a file with a list of projects/files you want to download.

First, I’m guessing “file” refers to the wiki names seen in the tiddlyhost.com/sites list… right? (Not entirely clear. Maybe better to use “sites”, or even clearer, “wiki names” / “WikiList”?)

Second, having to manually list all files nixes the point of using the script to begin with: it is not difficult to manually download a wiki, so the purpose - or at least my purpose - of using a script would be to eliminate as much manual hassle as possible. But having to search a 100+ list with sometimes semi-cryptic titles and flip back and forth to type out names in a list is burdensome. I’d suggest an option to download all the wikis - maybe even have this as the default (e.g. if no list is given). The user can thereafter manually delete the undesired ones, but I imagine this is typically used as a backup, so it may never even be relevant to delete stuff afterwards.

I can see how typing titles is not a big deal if you have some 10 TH sites… but then I also wonder why one would need a script at all?

OK, hope this made sense. Again, I didn’t yet get to the actual execution of the script because of this.

Thank you for sharing Mark!

Clicking 143 sites, every week (or however often you do a backup), isn’t a hassle?

In any event, once you’ve downloaded your sites, it should be pretty easy to make the list.

But the real reason I did it this way is because the original script has you pass the title.

But downloading everything would be useful. So it would depend on whether it’s even possible. @simon , are the sites in TH listable (once you’ve logged in, of course) ?

Oh, once a week is a different use case from what I thought this was about. I’ve seen this as a kind of emergency action for when/if TH behaves strangely. But you’re right, it makes sense to back up more often, and yes, manually clicking 143 sites is a hassle … but, still, identifying and typing out the titles of, say, 50 of those is even more of a hassle IMO.

I wouldn’t type them all by hand. Since you have them downloaded, you just do something like

dir > files.txt

Then edit files.txt to cut out the cruft. Also remove the .html extensions, which probably needs to be stated more directly in the usage notes. There are also utilities on Windows (File Commander ??) that will let you copy a nice little list.
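
Or, in PowerShell itself, you can build the list and strip the extensions in one go (assuming the downloaded .html files are sitting in the current directory):

# Build the file list from existing downloads, dropping the .html extension
Get-ChildItem -Filter *.html | ForEach-Object { $_.BaseName } | Set-Content files.txt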

Right now you could probably scrape out a site list. (I’ll try it and share some example code.) In future perhaps we’ll have a JSON endpoint to provide a list of all your sites.


OK, I’m late to the game here, partly because your OP mentions macOS, but I knew nothing about PowerShell… except what I see now by googling it, and getting invited down rabbit-holes of instructions about how to set it up; the “how-to” pages all seem to presuppose familiarity with stuff that’s Greek to me.

If I’m on macOS and have access to the terminal, can I run some version of your script to grab a download of each of my projects? Can you point me to a current step-by-step on how to download/enable PowerShell? Ideally not one that just lands me at a GitHub maze where it’s opaque (to a non-coder) how to find/download what I need.

And on the details of your script: Does it work smartly by prioritizing the most recently-changed projects, and bypassing downloads that would be identical to what’s already in my backups archive?

Many thanks!

I’m not a Mac person, but it looks like you might be able to run the original script. Apparently macOS can run shell scripts. The question is whether it has all the tools the script needs – especially the curl executable. The actual instructions seem to be a matter of setting execute permissions on the script (which is what you also have to do on Linux).

I would think that this is the authoritative resource for installing PowerShell on macOS:

OK, and looking at that page, it appears that macOS must have the curl executable available.

I guess I would try the “brew” instructions first, and see if they work.

That would be a cool future script. Currently, the script works blindly, not knowing anything about the status of the site. Possibly the script that @simon might come up with could do that.

It’s sort of a “Better to have an umbrella with holes than no umbrella at all” kind of script.

Here’s a new “download all my sites” example script.

Actually it does use a newly added JSON endpoint to avoid needing to scrape the HTML.
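
The general “list, then download everything” pattern looks roughly like this in PowerShell (sketch only; the endpoint URL and JSON field names below are placeholders, not necessarily what the real script uses):

# Sketch: endpoint URL and field names are assumptions
$Cred  = Get-Credential                                    # TiddlyHost login
$Sites = Invoke-RestMethod -Uri "https://tiddlyhost.com/sites.json" -Credential $Cred

foreach ($Site in $Sites) {
    $Name = $Site.name                                     # assumed field name
    Invoke-WebRequest -Uri "https://$($Name).tiddlyhost.com" -Credential $Cred -OutFile "$Name.html" -TimeoutSec 60
}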

FWIW if you’re using a Mac you should be able to use the example scripts just as they are, i.e. no need for PowerShell unless you have a special reason to use it.

There’s no “smart” downloading/backup at this stage, but such a thing would be possible. A simple approach might be to download and then delete the downloaded file if it’s the same as the previously downloaded file.
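
Something along these lines, using Get-FileHash (a sketch with made-up file names):

# Keep the newly downloaded file only if it differs from the previous backup
$New = "mysite.new.html"         # freshly downloaded copy (illustrative names)
$Old = "mysite.html"             # previous backup

if ((Test-Path $Old) -and ((Get-FileHash $New).Hash -eq (Get-FileHash $Old).Hash)) {
    Remove-Item $New             # identical to the last backup, nothing new to keep
} else {
    Move-Item $New $Old -Force   # changed (or first download): replace the old copy
}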

Hi folks,

I just wanted to let you know about my “old” backup scripts used for tiddlyspace, which sadly does not exist anymore.


I had a short look at the source code of both download helpers and saw that they both start to download without informing the user what’s going on.

So I thought it would be nice to have 3 steps.

  1. Download some metadata, save it to files, and let the user know what’s going on
  2. Tell the user it may take some time to download and ask if they want to continue
    2.a The user may say No and may first want to modify the list of files they really want / need
  3. If they want to go on, use the metadata files and download according to those files

Step 2 may not seem to be necessary, but I tell you, if there are 22000+ spaces you can choose from, this info is important :wink:
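
In PowerShell terms, step 2 could be as small as a prompt like this (just a sketch of the idea, not the actual scripts):

# Step 2 sketch: show what was found in step 1 and ask before the long download starts
Write-Host "Found $($Sites.Count) sites to download. This may take a while."
$Answer = Read-Host "Continue? (y/n)"
if ($Answer -ne 'y') {
    Write-Host "Stopping here. Edit the site list and re-run when ready."
    return
}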

I did create 2 backup scripts for tiddlyspace, 8 years ago. Both of them follow this scheme.

The repos are at:

Just some thoughts.
Mario
