Tiddlyhost outage Feb 14 ...?

New outage, just noticed (Feb 14 21:13 UTC)

I’ll delete this post if it comes back up very soon. Otherwise perhaps this will catch the attention of @simon

Should be back up now.

A quick share on what I think is going on:

  • Every so often Tiddlyhost gets hit by heavy load. Sometimes it’s scripting attacks, sometimes it’s web crawler bots that don’t respect the rate limit specified in the robots.txt file. (Or they do but they don’t understand that *.tiddlyhost.com is all the same server, so they think they can fetch everything at once.)
  • Because of the heavy server load, the CPU usage spikes up, and things start getting slower.
  • Sometimes it recovers after a few minutes, but other times there’s a tipping point where the server doesn’t recover and it needs a manual reboot.

In this screenshot you can see a smaller spike that it did recover from, and the bigger one from just now, that required a reboot.

So I’m trying to better understand what kind of events cause the load spike, what the tipping point is and how to avoid it, etc. I think that this helped, but I really don’t have enough visibility into what’s going on, especially during the high CPU load. I can pull down log files from nginx, but I don’t have good tools for analyzing them, or tracking metrics (other than the Linode console where that screenshot comes from).
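As a starting point for the log analysis mentioned above, here is a minimal sketch in Python. It assumes the nginx logs use the default "combined" format (not confirmed in this thread); the function name `summarize` and the sample values are made up for illustration. It counts requests per minute, per user agent, and per client IP, which might help show what a spike looks like.

```python
import re
from collections import Counter

# Assumption: nginx's default "combined" log format.
# Adjust the pattern if the server uses a custom log_format.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, top=5):
    """Count requests per minute, per user agent, and per client IP."""
    per_minute = Counter()
    per_agent = Counter()
    per_ip = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        # The timestamp looks like "14/Feb/2024:21:13:05 +0000";
        # the first 17 characters identify the minute.
        per_minute[m.group("time")[:17]] += 1
        per_agent[m.group("agent")] += 1
        per_ip[m.group("ip")] += 1
    return {
        "busiest_minutes": per_minute.most_common(top),
        "top_agents": per_agent.most_common(top),
        "top_ips": per_ip.most_common(top),
    }
```

For example, `summarize(open("access.log"))` (the file name is a placeholder) returns the busiest minutes alongside the agents and IPs responsible for the most requests.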

Thanks for your patience, and apologies for the unscheduled downtime. And if you’re a site reliability expert with suggestions on how to make Tiddlyhost more reliable, I’m interested!


I’m just appreciative that you do this at all!

Remember everyone, you can pay some money to tiddlyhost to help Simon cover costs!

https://tiddlyhost.com/pricing


Thanks Simon! Much appreciated…

Sorry to be the bearer of bad news @simon, but it appears that there may be another outage going on right now.

Hopefully it isn’t a scripting attack :pensive:

Edit: Possibly not! Just as I posted, it appears Tiddlyhost is back up and running! Happy to see that it wasn’t anything serious :grin:


Seems to be down again, for at least half an hour now, could you please take a look @simon ?


Yup, it’s down atm for me.

One possible approach would be to have a script running that checks whether a number of hosted wikis respond. If 10 out of 10 wikis don’t respond, a server reboot could be initiated, or perhaps just an alert message sent to one of several people capable of responding. One check every N minutes (N = 60 for hourly checks).

The script should not be running on the server itself, as otherwise the script could stop too.

I think the TiddlyTools plugin would be a good place to start, as it has several time functions.
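Whichever tool ends up running it, the check logic described above can be sketched roughly as follows. This is only an illustration in Python: the function names, the timeout, and the "ok / degraded / down" labels are assumptions, and what happens on "down" (an alert, a reboot) is left to the caller.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (e.g. 404).
        return err.code < 500
    except OSError:
        # Covers connection failures, DNS errors, and timeouts.
        return False


def check(urls, probe=is_up):
    """Classify the service from a list of wiki URLs.

    Returns ("down", failed) only when every URL fails, so a single
    broken wiki does not trigger an alert or a reboot.
    """
    failed = [u for u in urls if not probe(u)]
    if failed and len(failed) == len(urls):
        return "down", failed
    if failed:
        return "degraded", failed
    return "ok", failed
```

The `probe` parameter is there so the decision logic can be tried out without touching the network. In real use, `urls` would be a list of ten or so hosted wiki addresses chosen by whoever runs the check.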


Although TiddlyTools’ $action-timeout widget can be used to periodically invoke wikitext-based scripts that could, in theory, be used to check the status of a remote server (such as Tiddlyhost), this would require a TiddlyWiki that is always loaded in a client-side browser, and a browser that does not suspend processing when it is in the background.

A much better approach for checking server status would be to have a server-side periodic cron job that uses curl with a --connect-timeout parameter to attempt to fetch “a number of hosted wikis” from TiddlyHost. Of course, this server-side cron job would have to be running on a separate server from the one that runs TiddlyHost. This is similar to the way https://www.githubstatus.com/ is a separate server that is used to check the operational status of https://github.com.
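A rough sketch of what such a cron-driven curl check might look like, wrapped in Python so the result can be acted on. The timeout values and function names are assumptions; the curl options used (`--connect-timeout`, `--max-time`, `--silent`, `--output`, `--write-out`) are standard ones.

```python
import subprocess


def curl_command(url, connect_timeout=5, max_time=20):
    """Build a curl invocation that prints only the HTTP status code."""
    return [
        "curl", "--silent", "--output", "/dev/null",
        "--connect-timeout", str(connect_timeout),  # give up if no connection in time
        "--max-time", str(max_time),                # cap the whole request as well
        "--write-out", "%{http_code}",
        url,
    ]


def fetch_status(url, **kwargs):
    """Run curl and return the HTTP status as an int, or 0 if curl failed.

    Assumes curl is installed on the monitoring machine; if it is not,
    subprocess.run raises FileNotFoundError.
    """
    result = subprocess.run(
        curl_command(url, **kwargs), capture_output=True, text=True
    )
    if result.returncode != 0:
        return 0
    return int(result.stdout.strip() or 0)
```

A crontab entry on the separate monitoring server, something like `*/5 * * * * python3 /path/to/check.py` (path and interval are placeholders), would then run the check every five minutes.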

-e


tiddlyhost still down? :cry: :broken_heart:

Looks like it, I just checked.

Would love to have @simon confirm (after this outage event is resolved) whether this kind of system would be easy to set up.

Just as important as the detection routine, however, is the question of what process is triggered if Tiddlyhost is down… As always, my hope is that more people could be in the notification / troubleshooting loop so that no one person is on the hook, 24/7, as the only person who can get Tiddlyhost back online…

Should be back up now, apologies

The automated outage detection and restart is a good idea, I’ll see if I can figure something out.


If the amount of work isn’t too much, I’d recommend rewriting it in golang. I’ve found that backend applications written in golang are less prone to crashing.

No need to know golang. Just use AI assist.

This got me really triggered.