MultiWikiServer and SQLite

I wanted to discuss an important but more technical aspect of the recently announced MultiWikiServer plugin, which is its usage of SQLite for data storage.

TiddlyWiki on Node.js currently stores tiddlers as “.tid” files in the file system. At startup, all the tiddler files are loaded into memory for rapid access and any subsequent changes are written to the file system.

This approach is not practical for MultiWikiServer because of the need to cope with large amounts of data. Instead, I’ve chosen to use a very popular database called SQLite for data storage. SQLite is an embedded database that is designed to be integrated directly into applications like TiddlyWiki. It does not function anything like a traditional server-based database accepting asynchronous connections from multiple clients.

SQLite is blazingly fast and efficient. In fact, it has proven to be faster than storing data in the file system for some applications. It is also highly extensible. See https://sqlite.org/ for more detail. For further background, I recommend this interview with the principal author of SQLite.

The key rationale for using SQLite is that the long term plan for TiddlyWiki in the browser includes the ability to use a WASM version of SQLite to store tiddler data. (You can see my experiments on SQLite in the browser in this PR).

For users, the most important implication of the use of SQLite is that all the tiddlers are now stored in a single file with the extension .sqlite. This is a standard file format that can be opened with many off-the-shelf tools, but there’s no question that it is a big change from seeing tiddlers as individual files.

The solution is that we will have the ability to synchronise the database with external data sources such as the file system. Of course, this is the same technique we use to process tiddlers that have been modified in memory. Overall, it’s best to consider the database as being the equivalent of the memory store used in standard TiddlyWiki.

At the current point in the development process, there is a robust SQL schema that models the recipes/bags/tiddlers/fields of the system. There are efficient queries for simple operations like saving and retrieving a tiddler/bag/recipe, and for enumerating the tiddlers in a bag/recipe. In the future I want to develop the SQL interface much further, ultimately compiling TiddlyWiki filter expressions dynamically into SQL queries.
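
As a purely illustrative sketch, a simple filter such as [tag[Reference]] might one day compile into something like the query below, assuming a fields table that stores one row per tiddler field (the compiler does not exist yet, and a real implementation would parse the [[...]] quoting of the tags field rather than using a naive LIKE match):

-- Hypothetical output of a filter compiler for [tag[Reference]]
SELECT t.title
FROM tiddlers t
JOIN fields f ON f.tiddler_id = t.tiddler_id
WHERE f.field_name = 'tags'
	AND f.field_value LIKE '%Reference%';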

I have not yet done any work on optimising performance, being focussed on proving the correctness of the design through extensive tests, but I expect performance to be much better than with the existing implementation.

Any questions and comments welcome.


Greetings!
I am excited to see the next evolutionary step is towards performance. I run TiddlyWiki on NodeJS and there is hardly anything out there that can compete with that. I love the fact that it caches the content in local .tid files. Feels safe and unrestricted.

I hope the future implementation has a setting somewhere which would allow “hard tiddler” backups in the same shape and form, because it would be a great loss if we lost that.

Is anything along the lines of an API in the works? Something that would allow Unreal Engine or Godot to communicate with the core?


The plan is to offer syncing to a folder of individual .tid files as a backup option.

Yes, there is an API. It is an evolution of the existing API. Is there anything particular you’d like to achieve with it?

This is awesome. Coincidentally, I’m currently working on a personal project using SQLite WASM too, but I was told that the plain text search capabilities of SQLite were poor in terms of performance. I came across the FTS5 extension but have not been able to test it myself yet. Do you think the TiddlyWiki search engine will be able to be as fast as the current implementation with SQLite WASM?

Also, this project might be an interesting way to deal with large databases when using a MultiWikiServer setup: Using sql.js-httpvfs with browser - DEV Community (demo)

There is also an interesting blog post on the various current implementations of SQLite WASM that can be found here: The Current State of SQLite Persistence on the Web (powersync.com)


If you’re looking for a backup system - and since the door has been opened to TiddlyWiki w/SQLite - you might want to take a look at Fossil for backup controls.

I use the autosync feature, which does the commit → push → pull → update → merge, keeping everything in sync.

It’s made by Dr. Hipp of SQLite fame.


@jeremyruston – Nice work!

I am curious about some design decisions. So here are the questions:

  1. Why did you decide to use SQLite as a backend, instead of going with a “file based” approach or another database?

  2. Since the SQLite database is one large file: what’s the limit on the database file size?

  3. Did you find good performance data for a database that represents a specific use case like TiddlyWiki?

  4. What about the hardware requirements for SQLite?

  5. What about “human readable” backups, aka files? IMO there is a reason why users like files :wink:

  6. Databases usually do not delete content. They mark it as deleted and keep it.
    6.1 Is that also true for SQLite? (found an answer - below)
    6.2 Is it possible to “purge” unused / deleted data from the database? (It’s about privacy concerns)


I did find an answer for 6ff (according to ChatGPT) –

  • SQLite does mark deleted data-space as available, so it will be overwritten by new data over time.
  • There is a purge command, which is called “vacuum” (see the sketch below)
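
For example (untested; based on the SQLite docs, and the table name is made up):

-- Overwrite deleted content as it is removed, addressing the privacy concern
PRAGMA secure_delete = ON;
DELETE FROM some_table WHERE id = 42;
-- Rebuild the database file, reclaiming the freed pages
VACUUM;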

-mario

The first issue is that our existing wiki folder format has some peculiarities that make it unsuitable for high performance applications. Notably, tiddlers may be provided in any of a number of formats, including some obscure variants like .multids files which allow multiple tiddlers to be specified in a single file. That means that the task of writing changed tiddlers back to the file system might be impossible without the system somehow changing the format of the files involved.

To be able to do syncing properly we need to be able to store metadata against each tiddler. That is awkward when using the file system, and much easier when using a database.

The reason I chose SQLite is, as mentioned above, to give us commonality between the browser and the server.

The use of one large file seems like a shortcoming, but it’s really just a common technique to get the performance benefits of bypassing the file system and accessing disc sectors directly. When I last worked with SQL Server it worked in the same way.

I’ve made tiddler databases >10GB without any problems, and SQLite itself has an upper limit of 281 terabytes. I would hope that databases up to 100GB will perform adequately.

The SQLite docs are very good on this topic; see https://www.sqlite.org/whentouse.html.

I’m not sure if I understand correctly. If you’re asking about the rationale for using SQLite, the answer is that the commonality with the browser trumps other considerations.

SQLite runs on anything down to mobile phones and embedded devices. It is just a single 9MB C source file that compiles to a few hundred KB of native code.

I think I’ve covered that point?

At the moment the schema that I am using does delete content in the usual way. I do intend to investigate retaining the revision history, but it’s not implemented at the moment.

@jeremyruston thanks for continuing to show TiddlyWiki the love it deserves and for once again innovating.

A few questions and thoughts

Will this approach “function like a traditional server-based database accepting asynchronous connections from multiple clients?” (To reverse one of your statements above).

From what you outlined it is possible to speculate what this means for TiddlyWiki, but may I ask some specific questions? I am happy to follow your other posts if and when you answer them. I expect in some cases the MultiWikiServer plugin will not do what I suggest below; I just ask that these be kept in mind during development so they may be possible to accommodate one day.

  • Do you think it will be possible to publish a read-only single file wiki that, with an internet connection, can retrieve its content from an internet-hosted database, and thus allow changes to be saved?
    • I believe it will be important to see if we can host such database(s) on a variety of hosts and “give TiddlyWiki Wings”, as this is currently a major limitation. Node is somewhat unsafe for internet-facing use.
  • It’s great that filters can continue to operate on this data store; perhaps there would be value in allowing SQL-like queries as well?
  • Using a database backend model does suggest we may be able to actively support multi-user wikis; is that part of this current development?
    • Even if it was just serial editors based around locking by recipes/bags/tiddlers
      • although if one day it got to the field lock level, that would be interesting
  • The idea of being able to switch / hot switch the database is an interesting one.
    • Load a wiki, login to a database, now the wiki is what the database content dictates.
    • Start a wiki based on one data store (an edition) but clone it to another (an instance of an edition) and gain a custom wiki. Basically by changing the database store.
  • Can we also choose between the internal and external core introduced recently?
    • With an external core, a static wiki + a user content database, additional instances of a wiki edition could be as small as the content the user adds to their wiki.
    • Perhaps an external core + MultiWikiServer Plugin on a CDN would make sense.
  • Being able to take the wiki offline on different devices remains a valuable feature, can this model accommodate this?
    • all this with the ability to save locally as a single file wiki is powerful.

Thanks for your innovative development approach and reaching out to the community.

Thanks for the contribution! :slight_smile:

Edited.
I had difficulties launching the server for testing; these steps worked for me though:

  1. git clone https://github.com/Jermolene/TiddlyWiki5.git --branch multi-wiki-support
  2. cd TiddlyWiki5
  3. node ./tiddlywiki.js ./editions/tw5.com-server --listen

I believe that the FTS5 extension is now included in the default WASM distribution of SQLite. I’ve only looked at it superficially, but I am hoping that the new schema should be particularly well suited to full text search because of the way that it stores tiddlers as multiple separate records in a fields table:

-- Tiddlers are contained in bags and have titles
CREATE TABLE IF NOT EXISTS tiddlers (
	tiddler_id INTEGER PRIMARY KEY AUTOINCREMENT,
	bag_id INTEGER,
	title TEXT,
	FOREIGN KEY (bag_id) REFERENCES bags(bag_id) ON UPDATE CASCADE ON DELETE CASCADE,
	UNIQUE (bag_id, title)
);

-- Tiddlers also have unordered lists of fields, each of which has a name and associated value
CREATE TABLE IF NOT EXISTS fields (
	tiddler_id INTEGER,
	field_name TEXT,
	field_value TEXT,
	FOREIGN KEY (tiddler_id) REFERENCES tiddlers(tiddler_id) ON UPDATE CASCADE ON DELETE CASCADE,
	UNIQUE (tiddler_id, field_name)
);
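
For example, a full text index over the field values might look something like the following sketch (the fields_fts table is hypothetical and not part of the current schema):

-- Hypothetical FTS5 index over field values
CREATE VIRTUAL TABLE IF NOT EXISTS fields_fts USING fts5(field_value);

-- Populate the index from the existing fields table
INSERT INTO fields_fts(rowid, field_value)
	SELECT rowid, field_value FROM fields;

-- Find the titles of tiddlers with any field matching a search term
SELECT DISTINCT t.title
FROM fields_fts
JOIN fields f ON f.rowid = fields_fts.rowid
JOIN tiddlers t ON t.tiddler_id = f.tiddler_id
WHERE fields_fts MATCH 'sqlite';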

I did indeed see that library when it first popped up. It does open up some very interesting possibilities, such as a configuration hosted on a static server in which tiddlers are dynamically loaded from the database as required.

Fossil looks great, and I really liked SQLite’s rationale for not using git.


No. I was describing the shape of a regular SQL database like Postgres or SQL Server to emphasise the differences from SQLite.

What is an internet hosted database? Is it the same as a server? In our architecture, the database is a subcomponent of the server.

It might make sense to allow end users to write SQL queries for importing or syncing data, but I don’t think end users would want to be writing queries that target the schema used by the tiddler store tables.

What do you mean by active support for multi-user wikis? Both vanilla TW on Node.js and TW with MWS support multiple users at the same time.

In our architecture the code running in the browser is loaded from the server, which retrieves the data for the wiki from the database. Once the wiki is running in the browser it uses the API to save changes back to the server, which saves the changes back to the database.

So, the wiki running in the browser isn’t aware that there is a database, it just knows about the server from which it came.

I do plan to support the external core build, just as TiddlyWiki on Node.js does today.

I think you’re asking about working on an offline wiki that syncs back to a server? I do also intend to support that configuration.


I think the missing step is npm install, which installs the better-sqlite3 library that we use.

WASM and SQLite are very exciting. TiddlyWiki will gain immense development possibilities.

Does this mean that Jeremy Rust-on will make TiddlyWiki be able to get on with Rust - or to get its rust . . . on. badda ching . . . ug

How this ends up with WASM will be an extraordinarily interesting topic to follow, and it will be very cool to see it in action!

I think I am starting to understand the scope of this project: primarily an alternate “back end” for saving tiddlers that enables multiple wikis to be hosted on top of Node.js.

  • If I am wrong please correct me.

In this case, by “an internet hosted database” I was thinking of a separate database, not a subcomponent. For example, the earlier solution of using CouchDB and PouchDB (in Node). I was aware there is SQLite hosting available.

  • If this is not a component of the current work, that is fine, but it seems to me one of the TiddlyWiki server’s limitations is the complexity of safely putting Node servers on the internet.
  • I had thought that we might replace the tiddler store with a separate internet-hosted SQLite database and avoid Node altogether, i.e. rather than the less available CouchDB
  • perhaps this could be in our future?
  • This was in relation to a database independent of Node.js, but I see I misunderstood the scope of “MultiWikiServer and SQLite”.

Thanks everyone for the feedback, much appreciated.

I’m certainly not complaining about the felicitous naming, but I am of course more generally excited about Rust and WASM. My thinking at the moment is that if I was to ever rewrite the TiddlyWiki core I would do so in Rust, compiling it to WASM for the browser…

I think that’s a very fair way of capturing the scope of this work.

Another way of thinking about it is that we’re moving TiddlyWiki on Node.js from “proof of concept” to “industrial strength supertanker that also happens to run on your mobile phone”…

That makes sense. Those kinds of databases are generally available via an HTTP API, which should be usable with TiddlyWiki’s existing handling for the tm-http-request message.

Thanks for helping me wrap my head around this @jeremyruston

Perhaps we can build a solution, as a separate piece of work, to make this more accessible and set a standard?

  • Perhaps consider leveraging code in “MultiWikiServer and SQLite”

This worries me a bit. One of my chief reasons for choosing the (current) Node backend is that while I’m in active development of a wiki, I can make multiple git commits, each of them containing only minimal diffs. When I’m in a groove, I might do this as often as five to ten times an hour.

In the new scenario, will the equivalent be to commit the .sqlite file every time? How heavyweight will that be?

I know there is some backup-to-file facility. Will that offer me a chance at a similar workflow?


Hi @Scott_Sauyet we already have good support for storing tiddlers as files but there are a number of reasons why I don’t think we can go much further with it. Using SQLite is a way to get around the limitations of TiddlyWiki that stem from the use of a file store. If we stay with a file store then we won’t be able to fix/improve some important features.

To give a meaningful example, I need to first explain how syncing is accomplished between the server and the browser. One aspect of it is that the browser needs to be able to say to the server “Have any of your tiddlers changed?”.

This is accomplished by the server returning a list of all the tiddler titles that it knows about, along with their revision count. The revision count is a number that is incremented each time the tiddler is changed.

The part of TiddlyWiki running in the browser then compares those revision counts with the revision counts it received the last time around, and thus can deduce the tiddlers that have changed.

A severe problem we have with this arrangement is that the revision counts are only maintained in memory on the server, which means that the revision counts are reset to zero whenever the server restarts.

This means that any browsers connected to the server at the time it is restarted may experience data loss because they will see the tiddlers on the server as being older than they really are.

To fix this, we need to be able to persist the revision count against each tiddler. But that turns out to be very hard given the constraints we’ve given ourselves:

  • We could store the revision count as a field within the tiddler itself. The problem then is that if a user were to drag those tiddlers to another wiki folder then the revision counts would no longer be accurate
  • We could store the revision count elsewhere (say in a blob of JSON stored in the filesystem), but then we end up with something rather brittle: what happens if the JSON file gets out of sync with the tiddler files?

Using SQLite makes things much easier: the revision count is an intrinsic part of the tiddler table, and it is maintained for us by SQLite itself. It is not possible for a tiddler to be modified without the revision count being updated.

Better than that, robust and durable revision counts will enable us to make synchronisation much faster and more efficient. SQLite itself tracks the maximum current revision count, which will let the browser ask “Which tiddlers have changed since revision count X?”, and get a list of just the tiddlers that have changed. Even better, all of this can be accomplished using standard HTTP “etag” semantics, which means that API clients other than TiddlyWiki will be able to take advantage of these features.
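
In SQL terms, the heart of that exchange might look something like this sketch (assuming a revision_id column on the tiddlers table; the actual schema may differ):

-- Return just the tiddlers in a bag that have changed since the
-- revision the client last saw (both values supplied by the client)
SELECT title, revision_id
FROM tiddlers
WHERE bag_id = :bag_id
	AND revision_id > :last_seen_revision;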

Taking us all the way back to the top, this more efficient syncing is what will enable us to much more easily and efficiently synchronise data from the database to external stores such as Git (or Fossil).


Thank you for the detailed explanation. I did assume that the rationale was something like this. I have nothing against SQLite – in fact I’m a big fan.

Please don’t take my comment as criticism. This was more a lament for part of my workflow which has been fruitful for me. That doesn’t dim my excitement for the overall changes.

And this makes me wonder if I will be able to simply add one command to my usual workflow:

> ??   # some SQLite sync command goes here
> git status
> git add .
> git commit -m "Add the foobar section"
> git push

I guess I’ll have to see. Meanwhile, I’m reading up on Fossil, and might have to try it at home for my next project. Thanks for the lead!