Deriving wiki structure from titles (long)

I described this project in three different threads a few months ago. The idea is to port the current policy manual for my regional school system from a pile of PDFs to a single wiki.

After a hiatus, I’ve come back with a significant refactoring. I’d love to hear thoughts on the new approach, and to find out whether others have similar needs. If there is an appetite for this approach, I probably still won’t try to make it more generic up front, but I will at least keep such needs in mind while I make design decisions.

A policy manual is something like a collection of legal documents. One of the major goals of this project is to follow the Philosophy of Tiddlers and ensure that our tiddlers are the “smallest semantically meaningful units”. Going one step further, we want each of those units to be easily addressable, easy to link to, etc. These should be built up into documents which combine their children in various outline or block styles to match the original PDFs.

The old approach

The tiddler Restrictions on Use of School Facilities (#Policy1410(C)) looks like this:

title: Policy1410(C)
tags: Policy1410
caption: Restrictions on Use of School Facilities
marker: C
parent: Policy1410

The following restrictions shall apply to the use of school facilities:

<<children outline-block>>

Any violation of this Policy or any applicable Administrative Regulations may result in
permanent revocation of the privilege to use school facilities against the organization and/or
individuals involved. 

And its child, Refreshment Restrictions (#Policy1410(C)(3)) looks like this:

title: Policy1410(C)(3)
tags: Policy1410(C)
caption: Refreshment restrictions
marker: 3
parent: Policy1410(C)

Refreshments may not be prepared, served or consumed without the prior
approval. Notwithstanding, only those beverages permitted by state law may be
sold during the school day. Upon approval, refreshments may be prepared, served
and consumed only in areas designated by the responsible administrator.

I used tags for one of the TOC macros. But for various bits of custom code, like the breadcrumbs at the top, I used the parent field. I used the marker field to show the C or 3 in the appropriate lists. The original PDFs have many different outline structures. I used the <<children outline-block>> call inside the text to show where the child nodes are to be placed and which of the many structure styles to use. (That is, another section might include <<children plain>>, <<children definition-quoted>>, or a dozen others.)

The infrastructure code to use all this was relatively simple, as I had named fields for everything. But the per-tiddler setup was fiddly and time-consuming.

Another goal is to eventually remove myself from the maintenance of this. The admin likely to take over has no TW experience. I’m still going to need to develop tools for maintenance, but I didn’t like the conceptual weight – even if my tools automated the creation of all the appropriate fields, I wanted this to be simpler to understand as well.

So I refactored.

The newer approach

The current code for Restrictions on Use of School Facilities (#Policy1410(C)) looks like this:

title: Policy1410(C)
caption: Restrictions on Use of School Facilities
structure: outline-block

The following restrictions shall apply to the use of school facilities:

<<sections>>

Any violation of this Policy or any applicable Administrative Regulations may result in
permanent revocation of the privilege to use school facilities against the organization and/or
individuals involved. 

And its child, Refreshment Restrictions (#Policy1410(C)(3)), looks like this:

title: Policy1410(C)(3)
caption: Refreshment restrictions

Refreshments may not be prepared, served or consumed without the prior
approval. Notwithstanding, only those beverages permitted by state law may be
sold during the school day. Upon approval, refreshments may be prepared, served
and consumed only in areas designated by the responsible administrator.

What should be obvious is the reduced number of fields involved. There are no more tags, parent, or marker fields, which reduces the conceptual weight. There are occasional other uses of tags, but we are not cluttering up the tag namespace with every non-leaf node in the tree.

Techniques

So how did we accomplish this?

By using the titles to derive all this information.

We had used the parent field to derive our breadcrumbs. Well, we can find the equivalent algorithmically. The parent of Policy1410(C)(3) is found by removing the last parenthesized portion of the title, giving Policy1410(C). Doing it again, we find its grandparent is Policy1410. And using just the initial 1 in the title, we find its great-grandparent is Section1000.

Code

In $:/_/rham/procedures/breadcrumbs, we use this function to find the parents of any of our main tiddlers:

In $:/_/rham/procedures/sections, we have

\function get.parent(tid)
  [<tid>!prefix[Policy]then[]] 
  [<tid>regexp<marker>then<tid>search-replace::regexp<marker>,[$1]]
  [<tid>removeprefix[Policy]split[]first[]addsuffix[000]addprefix[Section]]
  +[first[]] 
\end get.parent
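For anyone who doesn’t read filter syntax fluently, here is a rough Python sketch of the same derivation. It is not the wiki’s code, only an illustration of the rule described above. The names get_parent and breadcrumbs are made up for this sketch, and it assumes every policy title looks like Policy followed by a number and zero or more parenthesized markers.

```python
import re

# A title ending in a parenthesized marker; group 1 is everything before it.
TRAILING_MARKER = re.compile(r"^(.*)\(([^()]+)\)$")


def get_parent(title):
    """Return the parent title, or None when the title has no parent here."""
    if not title.startswith("Policy"):
        return None  # e.g. Section1000 is treated as a root in this sketch
    match = TRAILING_MARKER.match(title)
    if match:
        # Policy1410(C)(3) -> Policy1410(C)
        return match.group(1)
    number = title[len("Policy"):]
    if not number:
        return None
    # Policy1410 -> Section1000 (first digit, padded with 000)
    return "Section" + number[0] + "000"


def breadcrumbs(title):
    """List the ancestors of a title, outermost first."""
    chain = []
    parent = get_parent(title)
    while parent is not None:
        chain.insert(0, parent)
        parent = get_parent(parent)
    return chain


print(breadcrumbs("Policy1410(C)(3)"))
```

With the titles from this post, the chain printed is Section1000, Policy1410, Policy1410(C).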

We had used the marker field to show outline markers, usually as links to the related tiddlers. Now we can just extract that last parenthesized portion of the title as our marker.

Code

In $:/_/rham/procedures/sections, we have

\define markerRegex() .*\(([^\(\)]+)\)$
<!-- ... -->
<$let marker={{{ [<tid>search-replace::regexp<markerRegex>,[$1]] }}}>

and that marker is then passed to any of the outline handlers.
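The same regular expression can be tried outside the wiki. This is a small Python sketch, not part of the wiki itself; get_marker is a hypothetical name. One deliberate difference: it returns None when a title has no parenthesized suffix, whereas a search-replace that finds no match leaves its input unchanged.

```python
import re

# Same pattern as markerRegex: capture the last parenthesized portion.
MARKER_REGEX = re.compile(r".*\(([^()]+)\)$")


def get_marker(title):
    """Return the trailing marker of a title, or None if there is none."""
    match = MARKER_REGEX.match(title)
    return match.group(1) if match else None


for title in ["Policy1410(C)(3)", "Policy1410(C)", "Policy1410"]:
    print(title, "->", get_marker(title))
```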


And we had used tags to define the hierarchy to show in the TOC. Now we have written our own TOC macro, patterned on the core versions but simplified. It uses functions that filter all tiddlers to find the children of a given Section, Policy, or Policy section, based only on the title.

Code

The macro is in $:/_/rham/procedures/toc. It’s too long to include here, but suffice it to say that it’s patterned very closely on the core’s toc-selective-expandable.

The functions it uses to list child-nodes are in $:/_/rham/functions/children:

\function sec.children(pol)
  [<pol>removeprefix[Section]removesuffix[000]addprefix[Policy]] :map[all[tiddlers]prefix<currentTiddler>!regexp[\(]enlist-input[]join[ ]]
\end sec.children

<!-- todo: simplify and combine regexes -->
\function pol.children(pol)
  [prefix<pol>] -[<pol>] :filter[trim:prefix<pol>regexp[^\(]!regexp[.*\(.*\(]] +[enlist-input[]join[ ]]
\end pol.children

\function children(pol)
  [function[sec.children],<pol>]
  [function[pol.children],<pol>]
  :filter[!is[empty]first[]]
\end children
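As a plain-language check on what those three functions are meant to do, here is a Python sketch of the same rules as I understand them. It is an approximation for discussion, not a translation I would vouch for line by line. The function names are invented, and Policy1420 and Policy2100 in the sample list are hypothetical titles added only to exercise the rules.

```python
import re


def section_children(title, all_titles):
    """Section1000 -> every Policy1... title that has no parentheses."""
    match = re.fullmatch(r"Section(.*)000", title)
    if not match:
        return []
    prefix = "Policy" + match.group(1)
    return [t for t in all_titles if t.startswith(prefix) and "(" not in t]


def policy_children(title, all_titles):
    """Direct children only: the title plus exactly one more (marker)."""
    kids = []
    for t in all_titles:
        if t == title or not t.startswith(title):
            continue
        rest = t[len(title):]
        # The remainder must begin with "(" and contain only one "(".
        if rest.startswith("(") and rest.count("(") == 1:
            kids.append(t)
    return kids


def children(title, all_titles):
    """Use the section rule if it yields anything, else the policy rule."""
    return section_children(title, all_titles) or policy_children(title, all_titles)


titles = [
    "Section1000", "Policy1410", "Policy1410(C)",
    "Policy1410(C)(3)", "Policy1420", "Policy2100",
]
print(children("Section1000", titles))
print(children("Policy1410", titles))
print(children("Policy1410(C)", titles))
```

The point of the sketch is that nothing but the title list is consulted: no tags, no parent field, no list field.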

We’ve also moved the outline-block from a macro parameter to a structure field, and renamed children to sections. (In a document on education policy, children really has another, more important meaning.) Moving this doesn’t remove any additional conceptual weight; it simply seemed more logical as field metadata than as a macro parameter in the text.

(One might think that we should be able to use a ViewTemplate to avoid having to call a procedure for this altogether. I chose this example in part to demonstrate why we can’t. Note that the procedure call is nested between other blocks of text. For some policies or policy sections, it will be the only text content. For others, there will be preceding text. For others there will be following text. And sometimes, like here, there’s both. And that makes doing it with a template next to impossible. [Yes, I can think of ways, but they’re terrible!])

Note that the new infrastructure code is significantly more complex than its older counterpart. But this seems worthwhile if it makes content entry and maintenance much easier.

Questions

This brings to mind a few questions. I’d love to hear your feedback:

  • Would you use a more generic version of this idea for any of your wikis? The idea, in short, is that we structure our titles in a manner that allows us to derive document structure. Here we just use short parenthesized markers and the simple relationship between titles like Policy3141 and Section3000 to derive a hierarchical table of contents, lists of breadcrumbs, outline markers and the like.

  • Do you have suggested improvements? I’d love to hear anything, from typo corrections to major algorithmic alterations. But I’m especially interested in simplifications to the infrastructure code (i.e. anything in the $:/_/rham/ namespace).

  • Do you have any ideas on how to make a useful content creation/modification interface? I would really like to make it simple to append a sub-sub-section to a tiddler. Or a section. If we’re inserting into the middle of an outline list, we should automatically renumber later entries. Have you built or seen any components that would help with this?

  1. This macro is suitable for structures like xxx(a)(1) or “a.1 title” or “1.1.1 title”.

  2. You use the “structure” field instead of <<sections outline>>. Is this so that tiddlers can be generated from spreadsheets, or conveniently edited in them? Otherwise, <<sections outline>> seems more conventional.

  3. Placing this numbering within headings is suitable for migrating PDFs. However, when creating new documents rather than migrating existing ones, it leads to significant re-numbering workload after inserting content during editing. Therefore, omitting numbering and using simple {{xxx}} is a simpler approach. Using spreadsheets as the editing tool can resolve this automatic numbering issue.

  4. The format “1.1.1 title” can be used to remove the caption field.

  5. TiddlyWiki’s list field automatically assigns sequential numbering. Therefore, generating content based on tags and list fields requires less workload. You can customize the numbering scheme in macro parameters—choosing between 1, 2, 3 or a, b, c.

https://tiddlywiki.com/#list-tagged-draggable%20Macro%20(Examples)

Thank you very much for your considered response. This is all great feedback.

That’s the main reason I’m asking the question about a more generic version. Are there other use-cases with similar title-derivable hierarchies? And if so, is there a not-too-complex set of tools we could write to treat these uniformly?

Also note, this is not a single macro, procedure, function, or template. This is a large collection of them working in tandem. If I do make this more generic, it would be more like an edition.

No, there is no thought of using spreadsheets for this. This is an attempt to replicate an existing collection of PDFs in a more useful format.

The original version was like that, and I am not at all committed to this change. I switched because there can logically only be a single such call per tiddler. It’s a marker for “Put my main content here, please,” for any tiddler that has children. As noted, I can’t just infer this, because there may well be some prefatory or following text that logically belongs with the children. So I have to be explicit about it. But as this describes the structure of the tiddler and not specifically the structure of the inclusion, it seemed to make more sense as tiddler metadata.

It also has one additional advantage: If necessary, I can now easily use that structure field to selectively apply templates to different styles of blocks.

For my use-case, it’s not just a matter of migrating. These documents cannot be casually changed. All but the most trivial changes are reviewed or suggested by the Board of Education’s lawyer. The school superintendent reviews them and submits them to the Board’s Policy Committee, who can accept, reject, or alter them. If that committee approves, they get brought before the whole Board for a first reading, and then at a subsequent meeting, they may be approved, rejected, or sent back for changes.

So the goal here is to update only the format of the documents. There should be no structural changes at all.

Absolutely. I will be looking at tools to make this easier. I would like to create a procedure that allows an insert, but with a warning like

Are you sure you want to insert a new section, E, here? That will involve moving the current E, F, and G to F, G, and H, as well as similar changes for any of their children. Proceed? (y/n)
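To make the size of that job concrete, here is a Python sketch of how the rename plan behind such a warning could be computed. Nothing like this exists in the wiki yet. plan_insert is a hypothetical name, Policy9999 and its sections are invented titles, and the sketch assumes the siblings use single uppercase letters as markers.

```python
def plan_insert(parent, new_marker, all_titles):
    """Map old title -> new title for every tiddler that has to move."""
    renames = {}
    for title in all_titles:
        if not title.startswith(parent + "("):
            continue
        # Split "E)(1)" into the sibling marker "E" and the tail "(1)".
        marker, closed, tail = title[len(parent) + 1:].partition(")")
        if not closed or len(marker) != 1 or not ("A" <= marker <= "Z"):
            continue  # not a single-letter sibling or one of its descendants
        if marker < new_marker:
            continue  # earlier siblings stay where they are
        if marker == "Z":
            raise ValueError("no letter after Z in this sketch")
        shifted = chr(ord(marker) + 1)
        renames[title] = f"{parent}({shifted}){tail}"
    return renames


titles = [
    "Policy9999(D)", "Policy9999(E)", "Policy9999(E)(1)",
    "Policy9999(F)", "Policy9999(G)",
]
for old, new in plan_insert("Policy9999", "E", titles).items():
    print(old, "->", new)
```

A real tool would also have to apply the renames from the last sibling backwards, so that G is moved before F is renamed to G, and would have to update any links that point at the old titles.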

I am curious how you would use spreadsheets for this. I can’t think of a workflow where that would be easier than using the built-in TW tools… and I’m one who does a lot of bulk tiddler manipulation in Node.

I could. It would make all the code that’s generating the hierarchy from the titles still more complex. But it’s certainly doable.

However, I’ve long had a strong preference for simple and readable URLs… I would much rather point people to https://crosseye.github.io/rham-policy/#Policy1410(C)(3) than to (the imaginary) https://crosseye.github.io/rham-policy/0.5.0/#Policy1410.C.3%20Refreshment%20restriction. Keeping simple URLs is definitely one of the goals of this project.

This was the main alternative I considered to the current approach. It would allow dragging and dropping titles/captions for rearrangement, which would be much easier than writing custom tools.

But it would make linkability much more difficult. And it would make it awfully easy to restructure these legal documents without the legal review process. The main reasons I chose not to go this route were to have reasonable permalinks and to retain the structure of the source documents.


Again, thank you for your response. You’ve given me lots to think about!

OK. I understand why you set the structure field. You want to read information from the field when assembling content. The <<sections>> tag only serves as an anchor.

Since the policy cannot be changed, you can only transfer content unidirectionally from PDF to TiddlyWiki. Therefore, the numbering remains fixed.

The URL design is your preference.

The data structure of TiddlyWiki [{title:xx},{title:yy}] is inherently a two-dimensional table. It is well-suited for bidirectional conversion with spreadsheets. In spreadsheets, auto-filling can be used to automatically generate serial numbers.

If the workload isn’t too heavy, I’m looking forward to a macro for assembling content based on tags and lists.

Exactly.

Yes, although there are certainly changes made over time. Most policies list a number of revision dates. But most of these are minor and only rarely affect the outline structure. I still do have to worry about it, as it can happen, but it’s not a high priority now.

I think I see. If there is a two-dimensional data-set, this could make sense. I don’t think it affects my current project, though.

I’m afraid I’m not following. The version I’m refactoring away from stored its hierarchy in tags and a few other fields. The point of the refactoring, though, was to simplify the data structures required for this. Right now it involves only the title and structure fields.


Again, thank you for the feedback. It’s very helpful.