More precise titlecase?

Springer · June 16, 2023, 5:18pm

I see that the titlecase filter operator relies on a bit of javascript in the core.

For many academic purposes, titlecase is much more specific than simply capitalizing each word (see here for details).

How hard would it be to make a plugin that adds a fine-grained English-specific apa-titlecase (or whatever we’d call it), that yields results like this:

Rethinking Sustainability on Planet Earth: A Time for New Framings

Note, above has been edited to reflect consistency through subtitle string

In other words: starting the second word after colon in each string, there’s a list of short (particle) words that don’t get capitalized, but other words are capitalized.

Currently, this issue is the main obstacle discouraging me from using variations on the built-in citation templates within Refnotes by @Mohammad in academic-facing contexts; I end up having to run around typing proper citation strings, for each bibliographic tiddler, manually.

jeremyruston · June 16, 2023, 5:38pm

Thank you @Springer. The style guide link is very interesting. The rules are well defined and not hugely complicated so it seems possible that a wikitext solution might be possible, but that would take a bit more investigation.

The alternative might be a plugin that embeds an existing JavaScript library that implements the APA rules for title case. I found a couple that look reasonable at first glance:

GitHub - words/ap-style-title-case: Convert a value to AP/APA title case
GitHub - gouch/to-title-case: A JavaScript method for intelligently converting strings to title case.

The plugin could work by overriding the core implementation of the existing titlecase operator.

Springer · June 16, 2023, 5:42pm

If anyone is willing to take this on, a bunch of academic users including myself will be very pleased!

Of course, there are other variations on this theme, both in English and in other languages. So, surely this should not simply replace the existing titlecase, but should be a variant that can be called as needed by refnotes and by end-users.

Thanks @jeremyruston for the speedy confirmation that it seems possible!

TW_Tones · June 16, 2023, 11:34pm

Without reading the full standard: could what you want be achieved by the inclusion of stop words, i.e. words not to be capitalized? In your example, the part before the : would be title case excluding stop words, and the part after the : sentence case?

I have not tested it but proper nouns should remain.

Stop words could be stored but if one can write a macro or even function or custom widget, that implements your standard then as the method becomes available you can implement in one place.

Perhaps it would even allow you to build it up over time.

[edited] I am looking forward to making a custom operator to include/exclude titles/words in TW 5.3.x, which could be used to deal with stop words.

Springer · June 17, 2023, 1:02am

Exactly.

One slight complication is that Refnotes currently imposes a double-operation of lowercase first, then sentencecase (applied to all parts of the title). This initial lowercase step makes sense, since some sloppy bibtex resources may arrive with some or all fields in all caps.

If we can achieve a more precise apa-titlecase operator, I would tweak my refnotes templates to forego that first imposition of lowercase, in order to retain capitalized proper nouns in subtitles (the part after the colon) when they are subjected to sentencecase operator. (I’d then have to add a “clean-up” step if I ever did end up importing bibtex resources with the allcaps problem.)

TW_Tones · June 17, 2023, 1:29am

Here is a list of some stop words: NLTK's list of english stopwords · GitHub

Have you a different list?

pmario · June 17, 2023, 2:11am

There seems to be an ISO list of stop-words for many languages. They would need to be filtered down to words of 3 characters or fewer to fit APA style. But those lists would probably be handy as a configuration parameter for the algorithm.
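A minimal sketch of that length filter, in JavaScript. The sample list and the name `shortStopWords` are made up for illustration; a real list would come from whichever stop-word source is chosen. As noted elsewhere in this thread, length alone is not enough: short verbs or nouns that appear in a stop-word list would still need to be capitalized, so the result is only a starting point.

```javascript
// Hypothetical sample; a real list would be loaded from a stop-word source
const sampleStopWords = ["a", "about", "an", "and", "between", "for", "of", "the", "through", "to", "with"];

// Keep only words of maxLength letters or fewer, lowercased and de-duplicated
function shortStopWords(words, maxLength = 3) {
  const short = words
    .map((word) => word.trim().toLowerCase())
    .filter((word) => word.length > 0 && word.length <= maxLength);
  return [...new Set(short)];
}

console.log(shortStopWords(sampleStopWords));
```

The `maxLength` parameter is there so the same helper could serve as the configuration parameter mentioned above.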

I personally would also be happy with an editor toolbar button, that converts a selected portion of text into APA style. Eg: for headings I would prefer plain text instead of a macro call.

TW_Tones · June 17, 2023, 2:11am

This may be a solution using existing code to deal with the first case; please play with it and test if true;

<$list filter="[[this is a set of words extravagant IBM]sentencecase[]split[ ]]" variable=word>
   <$list filter="[<word>] -[subfilter<english-stop-words>]" emptyMessage=<<word>> variable=word>
           <$text text={{{ [<word>titlecase[]] }}}/>^^1^^
   </$list>
</$list>

Where english-stop-words is a macro containing a LOWERCASE list of stop words on each line linked previously.

Result

The 1 indicates “not stop words”.

I still struggle with some aspects of filter run prefixes, so I am not sure how to do this in a single filter.

[Edited] I see value as @pmario suggests to do this in the editor, rather than run this every time if you do have an established standard.

[Edited2] We could use something similar to extract initials or acronyms.

Mohammad · June 17, 2023, 5:09am

I will have a look! But I remember it was not an easy task and needs some complex scripting. I thought that, as most TWs are used for personal knowledge management, such a small issue could be ignored.
With TW 5.3.0 this would be easier. I will update you if I come to a solution.

Mohammad · June 17, 2023, 5:36am

You need also to convert stop words to lowercase, if they were fed in wrong case. Like: this Is A TEST

TW_Tones · June 17, 2023, 6:16am

If the incoming data is systematically wrong, you can add "Is" to the stop words; if it is a one-off, edit the input. Perhaps we could just force stop words only to lowercase.

emptyMessage={{{ [<word>lowercase[]] }}}

Springer · June 17, 2023, 10:51am

Ah, the price of success!

I’m using TW and Refnotes in student-facing applications all the time, have written academic papers in it, and have also used it as my presentation engine for public talks, using the Krystal story river.

I have not experienced such irregular data in bibtex resources; when that happens it’s not terrible to have to manually correct input now and then. Occasionally, I do see allcaps records, but an optional as-needed lowercasing step would be best there (since only with a manual step can I make sure to fix proper nouns and acronyms in the process).
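To make the "as-needed" idea concrete, here is a small JavaScript sketch that lowercases a title only when it arrives entirely in capitals and leaves mixed-case titles alone. The function names are hypothetical and this is not how Refnotes works today. Proper nouns and acronyms in an all-caps record would still need the manual pass described above.

```javascript
// True when the text contains cased letters and none of them are lowercase
function isAllCaps(text) {
  return text === text.toUpperCase() && text !== text.toLowerCase();
}

// Lowercase only all-caps input; mixed-case input is returned unchanged
function lowercaseIfAllCaps(text) {
  return isAllCaps(text) ? text.toLowerCase() : text;
}

console.log(lowercaseIfAllCaps("RETHINKING SUSTAINABILITY ON PLANET EARTH"));
console.log(lowercaseIfAllCaps("A history of IBM"));
```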

Mostly, I hope to be able to use Title Case when appropriate, and still use Sentence case when appropriate (The APA does require sentence case for the final compiled bibliography, which is exactly as you have designed showrefs and bibliography macros within Refnotes – except that the forced lowercase that happens first is currently overriding proper nouns).

Thanks for all your work through now on Refnotes!

Mohammad · June 18, 2023, 3:47am

Great! So you need a more correct APA representation of the reference list. I will add this to my tasks and will try to improve the Refnotes APA output format!

Scott_Sauyet · June 23, 2023, 2:47pm

Here’s an attempt at this:

APA_Titles.json (3.5 KB)

(This is a JS macro, so you will need to save and reload to see it.)

The macro <<apa-title>> attempts to capture the APA-style guidelines for titles. This is an early draft. To use it, just call

<<apa-title "the text you wish to convert">>

Which will yield

The Text You Wish to Convert

Example outputs

The Text You Wish to Convert
Rethinking Sustainability on Planet Earth: A time for new framings
On the Wings of a Dove
On and On
The Hyphenation-Pattern: A use-case

Note the lower-case words not converted, such as "to", "on", "of", and "a". These are hard-coded, and pulled from a random internet site. The APA guidelines might well specify them. But note that these are not the same as the stop-words for English; they are only a subset.

Implementation

This is implemented as a JavaScript module in $:/_/modules/macros/apa-title.js. The process involves first splitting on colons, handling the first part as title case, and the others as sentence case. Sentence case is easy, simply uppercasing the first (non-space) character. Title case additionally upper-cases the first letter of each additional word, then lower-cases certain short words, then re-upper-cases them again if they are the first or last words. Those words are: "a", "an", "and", "at", "but", "by", "for", "in", "nor", "of", "on", "or", "so", "the", "to", "up", and "yet".
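For readers who do not want to open the JSON, the process described above can be sketched roughly as follows. This is an independent illustration, not the code in the attachment; the names `MINOR_WORDS`, `sentenceCase`, `titleCase` and `apaTitle` are assumptions, and treating hyphens as word boundaries is a guess based on the example outputs.

```javascript
// Short words left in lowercase unless they are the first or last word
const MINOR_WORDS = new Set(["a", "an", "and", "at", "but", "by", "for", "in",
  "nor", "of", "on", "or", "so", "the", "to", "up", "yet"]);

// Sentence case: uppercase the first non-space character only
function sentenceCase(text) {
  return text.replace(/\S/, (ch) => ch.toUpperCase());
}

// Title case: capitalize each word, keep minor words lowercase except at the edges.
// Words are runs of characters other than whitespace and hyphens (an assumption).
function titleCase(text) {
  const wordCount = (text.match(/[^\s-]+/g) || []).length;
  let index = 0;
  return text.replace(/[^\s-]+/g, (word) => {
    const isEdge = index === 0 || index === wordCount - 1;
    index += 1;
    if (!isEdge && MINOR_WORDS.has(word.toLowerCase())) {
      return word.toLowerCase();
    }
    return word.charAt(0).toUpperCase() + word.slice(1);
  });
}

// Split on colons: first part in title case, later parts in sentence case
function apaTitle(text) {
  return text
    .split(":")
    .map((part, i) => (i === 0 ? titleCase(part) : sentenceCase(part)))
    .join(":");
}

console.log(apaTitle("on the wings of a dove"));
```

Mapping every part through `titleCase` instead of `sentenceCase` would capitalize the subtitle as well.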

TODO

This is a JS macro. Can it be better done in wikitext?
The list of words not to capitalize is hard-coded. It should probably be dynamic. Moreover, there should be some facility for internationalization.
This also has internal sentenceCase and titleCase functions. Should they also be exposed? There are already filter operators for both of these.
Should this simply be an editor button? Or should there be one as well that uses this?

Note that this has nothing to do with the RefNotes plugin, and I have no idea how difficult it might be to integrate. It’s just a stand-alone implementation of APA title style.

Scott_Sauyet · June 23, 2023, 6:31pm

Here’s a follow-up to deal with the corrected capitalization in @springer’s original post:

APA_Titles2.json (3.5 KB)

Example Outputs

The Text You Wish to Convert
Rethinking Sustainability on Planet Earth: A Time for New Framings
On the Wings of a Dove
On and On
The Hyphenation-Pattern: A Use-Case

Mohammad · June 24, 2023, 5:37am

Thank you Scott! Great job!

JS gives more flexibility for such a complex task. I tested the second version and it works great! I did not test all cases, such as verbs and nouns that are 3 or fewer characters long and still need to be title-cased. I am sure @Springer can provide more examples here.

Springer · March 3, 2024, 11:52pm

I don’t think I ever properly acknowledged or replied to this!

This macro is definitely helpful!

One thing I don’t yet understand well is how easy it is to make a macro like this “play well” in complex cases.

Here’s my current look at the challenge of handling titles well with just one complexity, namely, proper sorting of titles (moving initial “The” or “A” to the end of a title in the process)… while still maintaining your careful case tweaks.

https://biblio-springer.tiddlyhost.com/#title%20list%20challenge

Eventually, good title handling would need to include openness to additional splicing as well (into longer citation strings, handling title strings with internal italics, leaving subtitles out or in depending on local needs, etc.):

As always, a joy to benefit from your javascript and all-around willingness to take on a challenge!

Scott_Sauyet · March 4, 2024, 2:49am

I’d totally forgotten about this.

I don’t really know how this would be used. I don’t see much use for it as a macro, honestly. But the underlying JS code could be used as a filter operator easily enough as well.

But I don’t understand the likely usage. To me, this would be most likely done on import, translating, e.g., “Changing a poorly-cased title into an APA-style one: a new study for the changing world”, into “Changing a Poorly-Cased Title Into an APA-Style One: A New Study for the Changing World” for the title or some other field of a newly-created tiddler. If you’re likely to want to do this later, leaving the original text, but displaying it APA-style, then the macro might well come in handy. (Probably today it would be a procedure or function instead.)

Never mind, I found the sort of thing you’re looking for. I believe this is a more helpful link to your biblio challenge: https://biblio-springer.tiddlyhost.com/#title%20list%20%E2%80%94%C2%A0sort%20by%20first%20significant%20word%3F

It shouldn’t be hard to convert that JS macro to a JS filter. I’ll take a look. After that, you should be able to combine it with the restructure-and-order technique.

Scott_Sauyet · March 4, 2024, 4:50am

(As I said above, I think this is a better link: https://biblio-springer.tiddlyhost.com/#title%20list%20%E2%80%94%C2%A0sort%20by%20first%20significant%20word%3F )

I think this might do what you’re looking for, or at least be a step in that direction:
https://crosseye.github.io/TW5-demos/2024-03-03a/

We expose the apa function that was used above in a macro, this time in a filter-operator, apatitle.

We use it like this:

\define by-apa() [{!!title}apatitle[]search-replace:i:regexp[^(The |An |A )],[]]

<ul>
<$list filter="[{Titles}split[
]sortsub:string<by-apa>]">
<li><$link to=<<currentTiddler>> ><$text text={{{[<currentTiddler>apatitle[]search-replace:i:regexp[^(The |An |A )(.*)],[$2, $1]]}}}/></$link></li>
</$list>
</ul>

Here we combine the APA functionality with the technique already described to sort without leading "The"'s, "A"'s (and I’ve added "An"'s) and to display by moving those articles to the end. As I write this up, I realize there’s a small bug in this technique: It would not properly sort "The gold ring" and "A gold ring", except by luck. I’m not going back to fix this in my demo, but you could reuse the search-replace used in the display in the sort, and it should be fine.
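The suggested fix (sorting on the same rearranged string that is displayed) can be sketched in plain JavaScript, outside TiddlyWiki. The names `moveArticleToEnd` and `sortByFirstSignificantWord` are hypothetical, and the comparison is a simple lowercase string comparison rather than anything locale-aware.

```javascript
// Matches a leading article followed by the rest of the title
const LEADING_ARTICLE = /^(The|An|A)\s+(.*)$/i;

// "The gold ring" becomes "gold ring, The"; other titles are returned unchanged
function moveArticleToEnd(title) {
  const match = title.match(LEADING_ARTICLE);
  return match ? `${match[2]}, ${match[1]}` : title;
}

// Sort on the rearranged form, so the article only breaks ties
function sortByFirstSignificantWord(titles) {
  const key = (title) => moveArticleToEnd(title).toLowerCase();
  return [...titles].sort((a, b) => {
    const ka = key(a);
    const kb = key(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

const sorted = sortByFirstSignificantWord(["Zebra", "The gold ring", "An apple", "A gold ring"]);
console.log(sorted.map(moveArticleToEnd));
```

Because the article stays in the sort key, "A gold ring" and "The gold ring" get a defined order instead of comparing as equal.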

That filter operator is written in JavaScript. I don’t think there’s anything fundamental that would prevent this from being written in wikitext, but I’m not interested in trying right now.
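For anyone curious what such an operator looks like, here is a rough sketch of its general shape, not the code from the demo. It assumes the usual form of a TiddlyWiki filter operator module, where the function receives a `source` iterator and returns an array of titles. `toApaTitle` is a placeholder standing in for the real APA function.

```javascript
// Placeholder transform (assumption): only uppercases the first non-space character.
// A real module would call the full APA title function here.
function toApaTitle(text) {
  return text.replace(/\S/, (ch) => ch.toUpperCase());
}

// Filter operator shape: `source` calls back with (tiddler, title) for each input
// title, and the operator returns the transformed titles as an array.
function apatitle(source, operator, options) {
  const results = [];
  source((tiddler, title) => {
    results.push(toApaTitle(title));
  });
  return results;
}

// In a module tiddler with module-type "filteroperator", the function would be
// exported under the operator's name:
// exports.apatitle = apatitle;

// Stand-in for TiddlyWiki's source iterator, only for trying the sketch outside TW
const fakeSource = (callback) => ["on and on", "a dove"].forEach((t) => callback(undefined, t));
console.log(apatitle(fakeSource, {}, {}));
```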

pmario · March 4, 2024, 8:45am

Have a closer look at the sortsub operator examples

The last example is exactly about that challenge.