Proposal: Persistent Indexer Snapshots — eliminating redundant index rebuilds on startup

This follows the discussion in this thread, where Mario noted: “The global cache being cleared unconditionally has bugged me for quite some time… if getGlobalCache is to be useful, it probably needs a dedicated invalidate function — and that should go through the indexer route.” This proposal is a concrete design in exactly that direction.

The Problem

TiddlyWiki’s three core indexers (TagIndexer, FieldIndexer, BackLinksIndexer) use correct incremental update logic — but they are purely volatile runtime structures. Every time a wiki is opened, all three are rebuilt from scratch with a full O(n) scan, producing results identical to the last saved session. In single-file HTML mode, this is entirely redundant computation.

Additionally, several high-frequency operators have no indexer backing and fall back to full table scans on every invocation:

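| Operator                 | Current        | With index             |
|--------------------------|----------------|------------------------|
| [has[field]]             | O(n) full scan | O(1) lookup            |
| [!has[field]]            | O(n) full scan | O(1) lookup            |
| [field:date>=[20260101]] | O(n) full scan | O(log n) binary search |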
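
To illustrate where the O(log n) figure comes from: if the values of one field are kept in a sorted array, a range filter reduces to a binary search for the lower bound, followed by reading off everything after it. The sketch below is only an illustration under that assumption; `modifiedIndex`, `lowerBound` and `titlesFrom` are hypothetical names, not existing TiddlyWiki APIs, and it relies on fixed-width date strings comparing correctly as plain strings.

```javascript
// Hypothetical sorted index for one field: entries ordered by value.
// Fixed-width date strings (YYYYMMDD) sort correctly as plain strings.
const modifiedIndex = [
  { value: "20251103", title: "Old note" },
  { value: "20251231", title: "Year end review" },
  { value: "20260101", title: "New year plan" },
  { value: "20260215", title: "February log" },
];

// Return the first position whose value is >= target (classic lower bound).
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid].value < target) {
      lo = mid + 1; // everything up to mid is too small
    } else {
      hi = mid; // mid itself may be the answer
    }
  }
  return lo;
}

// Titles whose field value is >= target: O(log n) to find the start,
// plus O(k) to copy out the k matching titles.
function titlesFrom(entries, target) {
  return entries.slice(lowerBound(entries, target)).map((entry) => entry.title);
}

console.log(titlesFrom(modifiedIndex, "20260101"));
```

Note that the O(log n) cost covers locating the start of the range; returning k matches still costs O(k).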

Proposed Changes (three small, targeted modifications)

1. Serialise at save time
When a save is triggered, each indexer writes its current state — compressed — into a dedicated system tiddler ($:/indexes/TagIndexer, etc.). These tiddlers are included in the saved HTML alongside all other tiddlers, requiring no changes to the saver pipeline beyond a hook.
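
A minimal sketch of what the save-time step could look like, leaving compression out for clarity. It assumes the tag index is held as a Map from tag to a Set of titles; the function name `serialiseTagIndex` and the `tiddler-count` field are hypothetical choices for illustration, not part of the current core.

```javascript
// Turn a hypothetical in-memory tag index (tag -> Set of titles)
// into the fields of a snapshot tiddler.
function serialiseTagIndex(tagIndex, tiddlerCount) {
  const buckets = {};
  for (const [tag, titles] of tagIndex) {
    // Sets are not JSON-serialisable, so store each bucket as a sorted array.
    buckets[tag] = Array.from(titles).sort();
  }
  return {
    title: "$:/indexes/TagIndexer",
    type: "application/json",
    // Assumed field: recorded so startup can sanity-check the snapshot.
    "tiddler-count": String(tiddlerCount),
    text: JSON.stringify(buckets),
  };
}

// Usage with a tiny example index.
const exampleIndex = new Map([
  ["todo", new Set(["Buy milk", "Answer forum thread"])],
  ["done", new Set(["Write proposal"])],
]);
console.log(serialiseTagIndex(exampleIndex, 3));
```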

2. Deserialise at startup
Before the full-scan rebuild, each indexer checks for a valid snapshot tiddler. If found, it deserialises directly into memory and skips the scan. If not found, or if a sanity check fails (e.g. tiddler count diverges beyond a threshold), it falls back to the existing full-scan behaviour — unchanged and fully backwards-compatible.
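
The startup path could be sketched roughly as follows. This assumes a snapshot tiddler whose text is a JSON object of tag buckets and which carries a `tiddler-count` field; both the field name and the `maxDrift` threshold are assumptions made for this example. Any missing, stale or corrupt snapshot falls through to the ordinary full scan.

```javascript
// Full-scan rebuild: stands in for the existing behaviour, used as the fallback.
function rebuildTagIndex(tiddlers) {
  const index = new Map();
  for (const tiddler of tiddlers) {
    for (const tag of tiddler.tags || []) {
      if (!index.has(tag)) index.set(tag, new Set());
      index.get(tag).add(tiddler.title);
    }
  }
  return index;
}

// Try the snapshot first; rebuild if it is absent, stale, or unreadable.
function loadTagIndex(snapshot, tiddlers, maxDrift = 0) {
  if (snapshot) {
    try {
      const savedCount = Number(snapshot["tiddler-count"]);
      const drift = Math.abs(savedCount - tiddlers.length);
      // Sanity check: the tiddler count must be within the allowed drift.
      if (Number.isFinite(savedCount) && drift <= maxDrift) {
        const buckets = JSON.parse(snapshot.text);
        const index = new Map();
        for (const tag of Object.keys(buckets)) {
          index.set(tag, new Set(buckets[tag]));
        }
        return { index, source: "snapshot" };
      }
    } catch (err) {
      // Corrupt snapshot text: ignore it and rebuild below.
    }
  }
  return { index: rebuildTagIndex(tiddlers), source: "rebuild" };
}
```

A count comparison is a deliberately weak check; a content hash would be stricter but costs a scan of its own.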

3. Fill index coverage gaps
Add two new lightweight indexers under the existing module-type: indexer contract:

  • HasFieldIndexer — a Set<title> per field name; answers [has[field]] in O(1)
  • SortedFieldIndexer — a sorted (value, title) array per field; enables range queries via binary search in O(log n)

Both follow the existing incremental update(descriptor) contract and are automatically eligible for persistence under changes 1 and 2.
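
As a rough sketch of the first of these, here is how a HasFieldIndexer might maintain its sets incrementally. The descriptor shape (`old` / `new`, each with `exists` and `tiddler`) is modelled on what the existing indexers receive but should be treated as an assumption here, as should the `lookup` method name. A field is counted as present only when its value is non-empty.

```javascript
// Sketch of a per-field membership index: field name -> Set of titles.
class HasFieldIndexer {
  constructor() {
    this.index = new Map();
  }

  // Names of fields that carry a non-empty value on this tiddler.
  fieldsOf(tiddler) {
    return Object.keys(tiddler.fields).filter((name) => {
      const value = tiddler.fields[name];
      return value !== undefined && value !== null && value !== "";
    });
  }

  // Incremental update: remove the old version's entries, add the new one's.
  update(descriptor) {
    const oldTiddler = descriptor.old && descriptor.old.exists ? descriptor.old.tiddler : null;
    const newTiddler = descriptor.new && descriptor.new.exists ? descriptor.new.tiddler : null;
    if (oldTiddler) {
      for (const name of this.fieldsOf(oldTiddler)) {
        const titles = this.index.get(name);
        if (titles) {
          titles.delete(oldTiddler.fields.title);
          if (titles.size === 0) this.index.delete(name); // drop empty buckets
        }
      }
    }
    if (newTiddler) {
      for (const name of this.fieldsOf(newTiddler)) {
        if (!this.index.has(name)) this.index.set(name, new Set());
        this.index.get(name).add(newTiddler.fields.title);
      }
    }
  }

  // Answer a has-style query without scanning: returns the Set of titles.
  lookup(field) {
    return this.index.get(field) || new Set();
  }
}

// Usage: create a tiddler, then modify it so that "due" is removed.
const indexer = new HasFieldIndexer();
const v1 = { fields: { title: "Task", due: "20260101" } };
const v2 = { fields: { title: "Task" } };
indexer.update({ old: { exists: false }, new: { exists: true, tiddler: v1 } });
indexer.update({ old: { exists: true, tiddler: v1 }, new: { exists: true, tiddler: v2 } });
```

A SortedFieldIndexer would follow the same remove-then-add pattern, with a sorted insert in place of `Set.add`.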

On compression: Index data has extremely high repeated-string density (tiddler titles appear across many tag/field buckets), which makes it well suited to LZ-family compression, typically 70–90% reduction. The actual HTML size increase is small. Snapshots are stored as deflate-compressed data, Base64-encoded in the tiddler text; TiddlyWiki already stores Base64 text natively, so only the deflate step needs a library. Implementation would use fflate (pure JS, synchronous API, ~10KB gzipped) to avoid async timing issues during the boot sequence.
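
The round trip can be tried out with a small script. To keep the sketch dependency-free it uses Node's built-in `zlib` as a stand-in for fflate; in the browser, fflate's synchronous deflate and inflate functions would take its place. The helper names `packSnapshot` and `unpackSnapshot` and the synthetic data are assumptions made for this example, and the ratio it prints says nothing about real wikis.

```javascript
// Node's built-in zlib stands in for fflate here (assumption for the sketch).
const zlib = require("zlib");

// JSON -> deflate -> Base64 text suitable for a tiddler's text field.
function packSnapshot(indexObject) {
  const json = JSON.stringify(indexObject);
  return zlib.deflateSync(Buffer.from(json, "utf8")).toString("base64");
}

// Base64 text -> inflate -> JSON -> object.
function unpackSnapshot(base64Text) {
  const json = zlib.inflateSync(Buffer.from(base64Text, "base64")).toString("utf8");
  return JSON.parse(json);
}

// Synthetic index: 300 titles spread across 20 tags, so each title
// is repeated in several buckets, as it would be in a real tag index.
const titles = Array.from({ length: 300 }, (_, i) =>
  "Project Notes/Meeting " + String(i).padStart(4, "0")
);
const sampleIndex = {};
for (let t = 0; t < 20; t++) {
  sampleIndex["tag-" + t] = titles.filter((_, i) => i % (t + 2) === 0);
}

const packed = packSnapshot(sampleIndex);
const rawLength = JSON.stringify(sampleIndex).length;
console.log(`raw JSON: ${rawLength} chars, packed Base64: ${packed.length} chars`);
```

Both calls are synchronous, which is the property the boot sequence needs; Base64 adds roughly a third on top of the compressed size, so the net saving is smaller than the raw deflate ratio.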

This does not change the single-file nature of TiddlyWiki. Index snapshots live inside the HTML file alongside tiddlers. The wiki remains a single, self-contained file.

Full design document: GIST LINK

What this proposal does NOT address

  • The unconditional clearGlobalCache behaviour — not directly fixed here, but this proposal lays the groundwork for a future dependency-aware invalidation scheme
  • Filter execution pipeline optimisations (lazy limit[], operator reordering) — separate follow-on work
  • Node.js / client-server deployments — out of scope; the single-file save model is the key enabler here

Open questions for the community

  1. Should serialise() / deserialise() be optional on the indexer interface (indexers that don’t implement them simply always rebuild), or required for all new indexers going forward?
  2. Compression library: bundle fflate as a core dependency, or let each indexer choose its own serialisation strategy?
  3. Should $:/indexes/ tiddlers be included or excluded by default in tiddler exports? (Included = exported wiki opens fast; excluded = smaller export, rebuilds on first open)
  4. Is per-indexer granularity sufficient, or do very large wikis need sub-indexer sharding (e.g. TagIndexer split by first character)?