How to simplify a JS macro that searches for the first match and formats a result using it

I’m looking for a way to simplify a somewhat complex JS macro, and was wondering if anyone had advice.

I’m building a wiki that is meant to replace the collection of PDFs that serve as the Policy Manual for a local school district. It’s modeled on another project I did, which attempts to replace the single PDF town charter for my town with a TiddlyWiki version. Both documents contain links to various Connecticut statutes; in the originals, these are just mentioned by name, but here it’s helpful to offer actual links. The Charter, though, had fewer than twenty such links, easy enough to create manually. I’m guessing this one will have 500 or more. I don’t want to look those up by hand and add all those links. So I created a macro to do this.

There is a hierarchy of Volume > Title > Chapter > Section, and the sections break down in many ways. But to cite them, you need only the title and section. There are nearly 1100 chapters, each containing from a handful to many dozens of sections. The links, however, need to know the chapter number, as the sections are just anchors on chapter pages. For instance, School climate survey is in Volume 1 > Title 10 (Education) > Chapter 170 (Boards of Education) > Section 222gg. It’s cited as CGS 10-222gg, but the URL includes the chapter: https://www.cga.ct.gov/current/pub/chap_170.htm#sec_10-222gg.

What I have built is a macro to take 10-222gg and generate from it something like this:

<a class="cgs" href="https://www.cga.ct.gov/current/pub/chap_170.htm#sec_10-222gg" >10-222gg</a>

To do this, I first extracted a list of chapters and their start/end sections from all the links off the titles page of the state law webpage. I parsed this list to get a JSON object with records like this

{
    "Title": "1",
    "ChapterType": "Chapter",
    "ChapterNumber": "1",
    "FirstSection": "1-1",
    "LastSection": "1-3b",
    "Description": "Construction of Statutes"
}

But I only need the ChapterType (mostly “Chapter”, but for a few, “Article”), the ChapterNumber and the LastSection in order to find the correct record and format the output. And since this is a lot of text, I reduce this to C|1|1-3b to store in my macro.

All this was one-time setup. At startup, the macro expands those shortened forms back into {ChapterType, ChapterNumber, LastSection} records. When it’s called with, say, 10-222gg, it searches through them one by one until it finds one whose last section is greater than 10-222gg. (This has a good deal of complexity on its own, as we have to break that apart into numeric sections, punctuation, and alphabetic sections, and the alphabetic ones need to sort a, b, c, ...z, aa, bb, cc, ...zz, aaa, bbb, ..., but I have already written that code.) Once I have the right chapter, it’s a matter of some fairly simple formatting. Well, with some caveats about Chapter versus Article, and the different naming conventions for them: not hard, just annoying.
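
To make that concrete, here is a rough JavaScript sketch of the shape of that expansion, comparison, and scan (invented names, only a few sample records, and none of the real edge cases):

// One-time expansion of the compressed "C|170|10-239k" strings back into records.
const compressed = ["C|1|1-3b", "C|2|1-6", /* ...roughly 1100 of these... */ "C|170|10-239k"];
const chapters = compressed.map(s => {
  const [type, num, last] = s.split("|");
  return { ChapterType: type === "A" ? "Article" : "Chapter",
           ChapterNumber: num, LastSection: last };
});

// Split a section id into numeric and alphabetic pieces: "10-222gg" -> ["10", "222", "gg"].
const parts = s => s.match(/\d+|[a-z]+/gi) || [];

// Numeric pieces compare as numbers; alphabetic pieces sort a, b, ...z, aa, bb, ...zz, aaa, ...
function cmpPart(a, b) {
  if (/^\d/.test(a) && /^\d/.test(b)) return a - b;
  if (a.length !== b.length) return a.length - b.length;
  return a.localeCompare(b);
}

function cmpSection(a, b) {
  const pa = parts(a), pb = parts(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const c = cmpPart(pa[i] || "", pb[i] || "");
    if (c !== 0) return c;
  }
  return 0;
}

// Per call: scan in order for the first chapter whose last section is not before
// the requested one; that record supplies the type and number needed for the URL.
const findChapter = section =>
  chapters.find(ch => cmpSection(ch.LastSection, section) >= 0);

// findChapter("10-222gg") -> { ChapterType: "Chapter", ChapterNumber: "170", LastSection: "10-239k" }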


So that’s the macro. It’s got a long, ordered list of shortened formats of the 1100 chapters, which expand into more usable objects. When called, it searches those objects for the first one that will include the section parameter supplied, and then uses that object and the section parameter to format a link.

The process is convoluted, and the code is fairly complex. I can’t even think about porting this to wikitext unless I can simplify things. So I’m looking for advice. Can anyone suggest ways to simplify this?

Compare this, if you will, to the quite simple procedure for federal citations. Here’s the whole thing:

\procedure usc(title, section)
<a class="usc" href=`https://www.law.cornell.edu/uscode/text/$(title)$/$(section)$`>
  <<title>> U.S.C. § <<section>>
</a>
\end

This is helped by Cornell Law School taking it upon themselves to organize this in a logical way, with links that depend only upon the parts needed for citation: title and section. I would rather be able to cite official sources, but the hierarchy in federal law is even more complex, and I don’t think I could comfortably replicate it without pages and pages of code, if I could even do so at all. For instance, here’s the link to what’s cited as 20 U.S.C. § 7973:

https://www.govinfo.gov/content/pkg/USCODE-2023-title20/html/USCODE-2023-title20-chap70-subchapVIII-partF-subpart5-sec7973.htm

That includes the Year > Title > Chapter > Subchapter > Part > Subpart > Section. I don’t even want to think about it!

In any case, you can see some sample calls to the two macros in Legal links demo.


Is there any good way to simplify this?

@Scott_Sauyet does the resource promise to keep the urls it currently uses?

One trick may be to see if they have a good search facility that you can reverse engineer, as we can with Google, i.e. create a URL with the search terms included. Then if you search for a particular reference and it has been changed, moved, or added to, the new references may come up in a search. Similarly, it would still work if they change the URL structure.


That structure has been static for around 20 years, maybe more. So I’m not worried about that.

I’ve found no useful search feature for it. Their whole site is antiquated, even by government standards. I suppose I could use something like the Google API, but that would involve a very different sort of interface. Right now, I am linking to the definitive location for a specific statute. With a search API, even if I limited it to the correct site and took only the most prominent result, I would be mostly crossing my fingers that the result is the right one.

That, I agree, would be very nice. With my current approach, I’d have to update my macro.


Thank you for the response. My first impression is that this wouldn’t do what I want. But I will look again tomorrow. It’s an approach I hadn’t considered.

As you say, it may be possible to craft a Google search with sufficient terms to search the destination site. It may add value.

Are you married to a) having an ordered list of chapters, and b) calling the macro with a section only? If the goal is ultimately to convert this to wikitext, my first instinct is to convert your list of chapters to a JSON data tiddler, e.g.

{
    "C 1 1-3b": "Construction of Statutes",
    "C 2 1-6": "Legal Holidays and Standard of Time",
    "C 3 1-21l": "Public Records: General Provisions",
    "C 4 1-25": "Oaths",
    "C 5 1-27": "Bonds",
    "C 6 1-41": "Uniform Acknowledgment Act"
}

I did a quick test using your full data set here: Link Indexes.tid (62.3 KB)

  • You don’t really need the chapter titles, of course, though personally I think it’s nice to have them since it opens up more avenues for filtering. If you do want the titles, we’re stuck with JSON format since some of them include colons. (Edit: that’s incorrect; I recommend using a dictionary tiddler without reservation.)
  • If you don’t need the English, you could do this with a dictionary tiddler instead, and that would be a little more concise and legible.

The upside to storing the data this way is that it’s easy to retrieve the value of a given index. The downside is that it doesn’t preserve the original order of your list, so your current approach to searching doesn’t really work, and we may have to resort to something slightly more verbose:

\procedure cgs2(type, num, sec, label)
\function get.type() [{!!title}split[ ]first[]match[A]then[art]] ~chap
\function display() [<label>!match[]] ~[<sec>]
<a class="cgs"
	href=`https://www.cga.ct.gov/current/pub/$(get.type)$_$(num)$.htm#sec_$(sec)$`
><<display>></a>
\end

* `<<cgs2 C 170 10-222gg>>`: <<cgs2 C 170 10-222gg>>
* `<<cgs2 C 170 10-222gg>>`: <<cgs2 C 170 10-222gg "School climate survey">>

It would also be pretty easy to extend the macro to link to whole chapters:

\procedure cgs2(type, num, sec, label)
\function get.type() [{!!title}split[ ]first[]match[A]then[art]] ~chap
\function index() [<type>] [<num>] +[join[ ]] :map[[Link Indexes]indexes[]prefix{!!title}]
\function display() [<label>!match[]] ~[<sec>!match[]] ~[[Link Indexes]getindex<index>]
<a class="cgs"
	href=`https://www.cga.ct.gov/current/pub/$(get.type)$_$(num)$.htm${ [<sec>!match[]addprefix[#sec_]] }$`
><<display>></a>
\end

* `<<cgs2 C 170 10-222gg>>`: <<cgs2 C 170 10-222gg>>
* `<<cgs2 C 170 10-222gg>>`: <<cgs2 C 170 10-222gg "School climate survey">>
* `<<cgs2 C 170>>`: <<cgs2 C 170>>


Apologies if this isn’t the sort of thing you’re looking for. At any rate, I thought it was an interesting question!


Not at all. That was the first thing I could think of that might possibly work.

Well, yes, I think so. I got too deep in the weeds here and didn’t really describe the big picture well. My goal is to relatively easily convert each of the 150 or so current documents into one or more tiddlers. And then I would like this to become the primary format for this manual, which means that I would need to train some non TW-savvy users to update this on a regular basis.

But the source documents I’m using, and which such users would eventually work from, have only text like

Conn. Gen. Stat. 10-222gg

That needs to be converted to this:

https://www.cga.ct.gov/current/pub/chap_170.htm#sec_10-222gg

And the trouble is that there’s nothing in the original which gives me that chapter number of 170. I need some way to derive it. I don’t want to look up all ~500 of them as I convert the documents. And I definitely didn’t want those responsible for later maintenance to have to do that. It has to be mechanical.

So this whole process is meant to simplify that.

I don’t care if I call it like

<<cgs 10-222gg>>

or

<<cgs 10 222gg>>

so long as it’s mechanical.

I will look when I’m back on a computer; it’s challenging on a phone.

Is that the case? Again I won’t be able to easily test from my phone, but I would have assumed that a dictionary tiddler would treat everything after the first colon until the end of the line as the value, including further colons.
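
For instance, I’d expect a line like

C 3 1-21l: Public Records: General Provisions

to give the key C 3 1-21l the value Public Records: General Provisions, second colon and all.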

While I don’t know now if that might be necessary, it certainly would be convenient to have it available.

On the contrary, while I didn’t originally spell out the limitations that would make this approach difficult, it’s exactly the sort of thing I was hoping for.

And more broadly, please don’t apologize for trying to help. Ever. You are one of the most helpful people on this forum!

Moreover, I think this sparked an approach that will work well and be much cleaner. So thank you very much.

That approach is a variant of one I had previously rejected as involving too much data. I can spider the Connecticut site to capture the chapter/section containment hierarchy and store that as you suggested in a dictionary tiddler. I would need to repeat this after Connecticut’s biennial updates, which is fine. I had originally rejected this when thinking I would need the full JSON payload I started with, which would be too large for my rough estimate of 10k - 20k total sections. I never reconsidered it once I compressed these to C|170|10-222gg. While that’s still a lot of data, it’s manageable. I could have records either like this:

...
10-222ff: C170
10-222gg: C170
10-222hh: C170
...

or, with more compression but slower searching, like

...
chap170: ... 10-222ff 10-222gg 10-222hh ...
...

And with a little work, I could remove all those 10- prefixes.
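
Just to illustrate the trade-off in throwaway JavaScript (made-up data, not the eventual wikitext): the first format is a direct key lookup, while the second means scanning chapter entries until one contains the section:

// Format 1: section -> chapter. One key per section; a single direct lookup.
const bySection = { "10-222ff": "C170", "10-222gg": "C170", "10-222hh": "C170" };
const hit1 = bySection["10-222gg"];                           // "C170"

// Format 2: chapter -> space-separated section list. More compact,
// but every lookup scans chapters until a match turns up.
const byChapter = { "chap170": "10-222ff 10-222gg 10-222hh" };
const hit2 = Object.keys(byChapter)
  .find(k => byChapter[k].split(" ").includes("10-222gg"));   // "chap170"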

This sounds like it will work perfectly. Now to go see if I still have any of my old code to spider the state law website…

Me too! :wink:

Thank you very much for your help! I’ll report back when I have some results. (Likely a few days; this weekend is packed.)


I was afraid that might be the issue. I don’t suppose the section IDs are unique across chapters? I know you weren’t responsible for the source documents; I’m just struggling to understand how anyone was ever expected to find the appropriate section from “Conn. Gen. Stat. 10-222gg” alone, if any chapter can hypothetically include a section 10-222gg. Are you meant to simply understand from context that it pertains to the BoE chapter?

No, you’re correct. I thought I’d had an issue along those lines before, but I may have been conflating it with line breaks, which do actually require a JSON index.

That’s very kind, thank you. I’d nearly finished typing by the time I realized that I might be missing some complicating factor, and thought, “Hmm, I hope I’m not missing the point entirely…” I’m glad it was able to inspire a solution regardless!

If you do go the index route (and particularly if you choose a more compressed version like chap170: ... 10-222ff 10-222gg 10-222hh ..., which would be my inclination, too) you may be interested in @pmario’s KeyValues plugin, which makes it much simpler to search index values and return the results in your desired format.

I’m eager to see how it goes! I’ve been admiring your use of TW for civic education and outreach for quite a while. Your community is very lucky to have you.

The full section ids, including the title (that is, the 10 in 10-222gg) are universally unique. But the second part might be repeated often. There’s a 1-5, and a 2-5, and a 42-5, etc. There might well be no other 222gg, though!

No, that’s not quite it. As above, the 10 indicates the title. The title is full of chapters, and the chapter numbers are unique across the site. The chapters contain sections, and the full section numbers are unique across the whole site; the latter parts, such as 222gg, are unique within the title. My guess is that Chapters were a later addition to a Volume > Title > Section hierarchy. That’s what it feels like. But to find a specific section, the manual process looks like this:

  • Ok, I want 10-222gg. Let’s scan the root for Title 10.
  • There it is (inside Volume 3). Let’s visit Title 10.
  • Next, which chapter contains 10-222gg? We scan again, recognizing the slightly odd sorting order of z being followed by aa and then bb. We find that Chapter 170 contains Secs. 10-218 to 10-239k, which must include our 222*. So we visit Chapter 170.
  • Now we’re cooking with gas. We can quickly (maybe? or maybe CTRL-F) scan through the ~180 items in the TOC, to find that the section we’re looking for is at https://www.cga.ct.gov/current/pub/chap_170.htm#sec_10-222gg.

My code was mostly meant to automate that scanning. The last step is easily automatable. Once we have the chapter, the link to the section is clear. The whole thing might take a minute, certainly no more than two. But it involves loading several different pages, and scanning through what might be large lists.

Thanks. I haven’t seen that one. I’ll check it out.

Now you’ve made me blush! But thank you. I got back involved with TW before I got back involved with the local community. But it seems I dove head-first into both, and it’s really nice when I can use one obsession to help with another. :smiley:


Here we go :wink: [INTRO] KeyValues Plugin


I have some results.

I wrote some code to scrape the state law website and turned the result into a plugin, containing a dictionary tiddler for each Title in the law, with records like

222gg:170

reporting that Section 222gg can be found in Chapter 170.

For the one title (42a) that uses “Articles” in place of “Chapters”, the value is prefaced with an A:

2-101:A002
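
The scraping step isn’t part of the plugin, but roughly it amounts to something like this (a simplified Node 18+ sketch, run as an ES module; the regex and page list are illustrative rather than the exact code):

// For each chapter (or article) page, pull out the "sec_<title>-<section>" anchors
// and group them by title, yielding dictionary lines like "222gg:170".
const pages = [
  { url: "https://www.cga.ct.gov/current/pub/chap_170.htm", chapter: "170" },
  // ...one entry per chapter/article, harvested from the titles page;
  // for title 42a the chapter value carries the Article prefix, e.g. "A002"...
];

const byTitle = {};                        // e.g. byTitle["10"] -> Set of "222gg:170" lines

for (const { url, chapter } of pages) {
  const html = await (await fetch(url)).text();
  for (const m of html.matchAll(/sec_(\d+[a-z]*)-([0-9a-z-]+)/gi)) {
    const [, title, section] = m;
    (byTitle[title] ??= new Set()).add(`${section}:${chapter}`);
  }
}

// Each byTitle entry, joined with newlines, becomes one dictionary tiddler in the plugin.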

These tiddlers are bundled together into a plugin, along with the procedure that accepts a statute reference (such as 10-222gg), splits it into the title (10) and section number (222gg), looks up the section number in the dictionary tiddler for title 10 to find that we’re in chapter 170, and then constructs the URL and builds a link from it. (I think a useful refactoring at some point would be to extract the URL generation into its own public procedure, and layer the link generation as a thin shell around that. We’ll see.)

That code is relatively simple, although the Article/Chapter handling adds complexity, as does a second optional parameter that lets you display text different from the statute reference. It looks like this:

\procedure cgs(section, text)
<$let
  title={{{ [<section>split[-]first[]] }}}
  sect={{{ [<section>removeprefix<title>removeprefix[-]] }}}
  display={{{ [<text>!match[]then<text>else<section>] }}}
  _chap={{{ [[$:/plugins/crosseye/statutes/cgs/title/]addsuffix<title>getindex<sect>] }}}
  type={{{ [<_chap>prefix[A]then[art]else[chap]] }}}
  chap={{{ [<_chap>trim:prefix[A]] }}}
>
<% if [<chap>!match[]] %>
<a href=`https://www.cga.ct.gov/current/pub/$(type)$_$(chap)$.htm#sec_$(section)$` target="_blank" class="cgs"><<display>></a>
<% else %>
<<section>>
<% endif %>
</$let>
\end
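
Sample calls look like this:

<<cgs 10-222gg>>
<<cgs 10-222gg "School climate survey">>

The first renders a link labeled 10-222gg pointing to https://www.cga.ct.gov/current/pub/chap_170.htm#sec_10-222gg; the second shows the supplied display text instead.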

This plugin is included in the latest version: $:/plugins/crosseye/statutes/cgs

@etardiff: I opted not to go for the version like

chap170: ... 10-222ff 10-222gg 10-222hh ..., 

because the logic to handle that is significantly more complex. (I don’t think Mario’s plugin would help, although I admit to only skimming the docs.) And it would by its very nature be less performant. I originally thought that this would be much more compact, because we don’t repeat the chapter number, but with all the repetition of 10- (or whatever title number we’re using), we don’t save much. At some point I may play with it, just to be sure, but it’s falling to low priority at the moment.


This takes a lot more storage than my initial approach (which is still available for comparison in version 0.1.2), but the code feels cleaner, and that’s what I was looking for.

Thank you, @TW_Tones, @etardiff, and @pmario, very much for your help! I’d love to hear any feedback, but I feel I have this in good enough shape right now to move back to working on the main content.


It makes me ask whether we could use the tm-http* messages made available a few versions ago to test each of a filtered set of HTML links for validity, that is, that the destination exists.

Hmmm, I’m not sure if this is what you meant, but I could see that as an interesting alternative for the Article/Chapter distinction. Most everything is a Chapter, but a few are called Articles, and we could first test whether the content we’re looking for is under chap_NNN and, if not, check art_NNN. It might simplify a bit of the logic. But that would come at the expense of making an HTTP request for every such link we want to render.

To use it for sections would be more problematic, since that would involve parsing the chapter(/article) documents. I definitely wouldn’t want to do that in real time.

Or did you have some other idea entirely?

I was only thinking that once you construct the links, initially or in a couple of years, it would be good if there were a way to have all the links tested to ensure no “link rot” had occurred.

  • Something any wiki with external links could benefit from.

This could be related to this item, Using links[] in current tiddler, which would support extracting a full pretty/title/link from the content of one or more “links” within a tiddler.

  • i.e. each tiddler could do a link-rot test of its external links once opened. It may be a long bow to draw, but maybe not.

That’s an interesting concept. I don’t think it would work well here, as we’d actually have to parse the page, and not just check HTTP headers, since our links all have a #fragment-identifier, pointing to a part of the page. We’d have to parse to know if that id actually exists. But I can see the potential usefulness in other circumstances.
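
For what it’s worth, a batch check run outside the wiki (rather than via tm-http) might look something like this untested Node sketch; the anchor-matching regex is a guess at the page markup:

// Report whether a CGS link's page loads and its #fragment target exists in the HTML.
async function checkCgsLink(url) {
  const [page, fragment] = url.split("#");
  const res = await fetch(page);
  if (!res.ok) return false;                 // the page itself is gone
  if (!fragment) return true;                // nothing more to verify
  const html = await res.text();
  // Anchors might appear as id="sec_10-222gg" or <a name="sec_10-222gg">.
  // (fragment is assumed to contain no regex metacharacters.)
  return new RegExp(`(?:id|name)=["']?${fragment}["'>\\s]`).test(html);
}

// e.g. await checkCgsLink("https://www.cga.ct.gov/current/pub/chap_170.htm#sec_10-222gg")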