More precise titlecase?

I think this algorithm should be a suffix for either the format-operator or the titlecase-operator.

It will be easier to implement as an option for the “format-operator”, since it already has a core mechanism that lets each new suffix live in a separate JS file.

This format-suffix mechanism also makes it easy to implement the whole thing as a 3rd-party plugin.

You may want to raise a ticket on GitHub; otherwise it will be forgotten.

Language-specific “stop words” can be implemented as operator parameters. We would have to take a closer look at the different specs.

Springer’s example showed that she had seen this, as well as my APA-title implementation. She just couldn’t find a way to put them together, probably with good reason, since my implementation was a macro.

My latest version implements it as a stand-alone filter operator, similar, I suppose, to the current titlecase operator. I’ll look at the format-suffix mechanism when I have a chance.

My worry about trying to introduce this to the core is that APA is just one of the major citation styles, many with different capitalization rules. While it’s probably overall the most popular, there’s also MLA, Chicago, IEEE, Bluebook, and many more. Emphasizing one over the others may belong in user-land, not the core.

Right, so might we have something like a FlexTitleCase function, where some default parameters are tucked into shadow tiddlers that can be overridden…

Such shadows could include a decent common-stopword list in English, a small initial-article list (consisting of only “A” “An” and “The” for sortsub operations) plus one decent regexp specification of how title case works.
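To make the idea a bit more concrete, here is a minimal sketch in plain JavaScript, not a TiddlyWiki module. The names `DEFAULTS`, `flexTitleCase` and `sortKey` are hypothetical, the stop-word list is only a starter rather than any official APA list, and the sketch assumes the input already carries capitals wherever the content requires them.

```javascript
// Hypothetical defaults. In a wiki these would live in shadow tiddlers
// that a user can override; here they are plain objects.
const DEFAULTS = {
  stopWords: ["a", "an", "and", "as", "at", "but", "by", "for", "if",
              "in", "nor", "of", "on", "or", "the", "to"],
  initialArticles: ["a", "an", "the"]
};

// Uppercase the first letter of a word and leave the rest untouched,
// so capitals already present in the input survive.
function capitalizeFirst(word) {
  return word.replace(/\p{L}/u, letter => letter.toUpperCase());
}

// Title-case a string: stop words stay as they are unless they start
// the title or follow a word ending in ":", "?", "!" or ".".
function flexTitleCase(title, overrides = {}) {
  const config = { ...DEFAULTS, ...overrides };
  const stopWords = new Set(config.stopWords);
  const words = title.split(" ");
  return words.map((word, index) => {
    const startsSegment = index === 0 || /[:?!.]$/.test(words[index - 1]);
    // Strip punctuation before looking the word up in the stop list.
    const bare = word.toLowerCase().replace(/[^\p{L}]/gu, "");
    if (!startsSegment && stopWords.has(bare)) {
      return word;
    }
    return capitalizeFirst(word);
  }).join(" ");
}

// Drop a leading article so that titles sort on their first real word.
function sortKey(title, articles = DEFAULTS.initialArticles) {
  const [first, ...rest] = title.split(" ");
  return rest.length > 0 && articles.includes(first.toLowerCase())
    ? rest.join(" ")
    : title;
}
```

With these defaults, `flexTitleCase("the republic of Plato: a new translation")` gives "The Republic of Plato: A New Translation", and passing `{ stopWords: [] }` as the second argument capitalizes every word.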

My guess is APA is better than the other standards as a default starting-point. APA is the most widespread. Also, Chicago and MLA are a bit more humanities-oriented, while a TiddlyWiki reference solution is likely to attract people in citation-dense fields like medicine and social sciences.

(Even in my non-science field, APA is often accepted, and at any rate is much more respectable than the awkward results my students currently see when existing tools either capitalize Every Word In A String or leave every word — including plato and mackinnon — in lowercase)

The initial/basic FlexTitleCase solution might as well assume a certain level of clean data, in which all title strings come in without capitalization except on initial words and where the capitalization is required by the content (or is at least consistent with desired title case results). That assumption sidesteps the need for a list of proper names and other special strings…

(Perhaps a slot, within the solution, could be available for a user’s list of specific strings encountered in their body of work that need special case treatment (acronyms, CamelCase tech terms, Scottish names, proprietary strings such as eBay). A shadow tiddler with some starter terms of this kind might be helpful as a model… but that’s less high priority. A user will only need that if they’ve inherited dirty data in the form of ALL_CAPS or all_bluntly_lowercased bibtex data (unfortunately not rare) that would be convenient to display nicely even prior to cleanup.)
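A rough sketch of what such a slot could do, assuming the user keeps a simple list of strings in their canonical spelling. `SPECIAL_CASES` and `applySpecialCases` are made-up names, and the starter list is purely illustrative.

```javascript
// Hypothetical starter list; a user would maintain their own.
const SPECIAL_CASES = ["eBay", "NASA", "MacKinnon", "TiddlyWiki"];

// Replace any word that matches a listed string, ignoring case,
// with the canonical spelling from the list.
function applySpecialCases(text, specialCases = SPECIAL_CASES) {
  const canonical = new Map(specialCases.map(s => [s.toLowerCase(), s]));
  return text.replace(/[\p{L}\p{N}]+/gu, word => {
    return canonical.get(word.toLowerCase()) ?? word;
  });
}
```

Because the lookup ignores case, the same list repairs both ALL_CAPS and all-lowercase input. It only handles single words, so multi-word strings would need a different approach.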

At any rate, it would be fantastic for the “bibliographic edition” to have a TiddlyWiki-novice–friendly solution that allows for modular tweaking on top of decent shadows.

Ideally these shadows and custom filters should be open to arrangement within a cascade, even if the cascade has only one node in the default implementation. The cascade mechanism will be helpful if a project includes titles from multiple languages, in which case a biblio-entry tiddler would include a language field that sends the cascade down a different capitalization path. Variants appropriate to non-English languages could be developed as community users’ attention and time allows…
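In plain JavaScript terms, the cascade idea amounts to something like the sketch below: an ordered list of nodes, where the first node that returns a result wins. This only imitates the concept; it is not TiddlyWiki's cascade implementation, and the rule names and the "de" language code are invented for illustration.

```javascript
// Each node inspects a biblio entry and either names a rule set
// or returns undefined to pass the entry on to the next node.
const cascade = [
  entry => (entry.language === "de" ? "title-case/de" : undefined),
  () => "title-case/apa" // default node: always matches
];

// Walk the nodes in order and return the first non-empty result.
function runCascade(entry, nodes) {
  for (const node of nodes) {
    const result = node(entry);
    if (result) {
      return result;
    }
  }
  return undefined;
}
```

A default implementation would ship with only the last node; language-specific variants could be added in front of it later without touching the default.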

Any initial solution will surely be tentative, and will prompt some problem-solving and discussion that will be helpful in rounding out a general solution over time.

Looks great!

Will spend more time with this soon…

That’s right. But I still think the format-operator suffixes are a good way to go, even with a 3rd party plugin, because they already have the needed core infrastructure. The core format operator suffixes are: date, json, relativedate, timestamp and titlelist at the moment.

We could establish a new prefix for such suffixes, e.g. user- or 3p-, giving user-apa or 3p-apa and probably many others. IMO this would be a good discussion point with Jeremy at GH discussions. I’ll start one after I finish this response.

  • 3p- … 3rd-party … If I saw this one, I’d start searching for what it is good for. So we would need some good docs.
  • user- … IMO self explanatory
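For orientation, here is a rough sketch of what such a suffix module might look like. It rests on an assumption from my reading of the core format modules: that each one exports a function named after its suffix, taking `(source, operand, options)` and returning a list of strings. Please check the core source before relying on that. `toTitleCase` is only a placeholder for the real rules, and `formatSuffixes` and `makeSource` are stand-ins so the snippet runs outside TiddlyWiki.

```javascript
// Placeholder for the real capitalization rules: capitalizes every word.
function toTitleCase(text) {
  return text.replace(/(^|\s)(\p{L})/gu, (match, lead, letter) => {
    return lead + letter.toUpperCase();
  });
}

// Stand-in for a module with "module-type: formatfilteroperator".
// In a real module this object would be `exports`.
const formatSuffixes = {};

// ASSUMPTION: the (source, operand, options) signature and the
// source(callback(tiddler, title)) iteration follow the core modules.
formatSuffixes["user-apa"] = function (source, operand, options) {
  const results = [];
  source((tiddler, title) => {
    results.push(toTitleCase(title));
  });
  return results;
};

// Stub iterator that mimics how a filter run hands titles to an operator.
function makeSource(titles) {
  return callback => titles.forEach(title => callback(undefined, title));
}
```

If the assumption holds, a filter step like `format:user-apa[]` would end up calling this function with the incoming titles.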

just some thoughts


About the implementation – Follow the “rabbit-hole” / links – If you want :slight_smile:

My draft-PR containing an improved version of the “dumpvariables-macro” contains a new variables-suffix and a format-parameter.

It is used at: https://github.com/Jermolene/TiddlyWiki5/blob/28237b3b4a00cbe73aa9a940c874c7751796f316/core/wiki/macros/dumpvariables.tid#L19

The new docs is at: TiddlyWiki — a non-linear personal web notebook

Here’s the discussion at GH: Idea: Should we promote the: "module-type: formatfilteroperator" for 3rd-party suffixes and parameters? · Jermolene/TiddlyWiki5 · Discussion #8038 · GitHub

This sounds like a good design. I wish I had more time to try to implement it, but I won’t, not for at least a month.

I don’t know the data you’re dealing with, but that strikes me as an extremely stringent requirement – not that certain words are capitalized – but that other words are not. But yes, we would never get anywhere if we tried to automate proper names; we’d need a full semantic parser to understand that Patty might eat a patty or that each brad was nailed in by Brad.

I don’t disagree, but it would take more thinking before I was sure I agreed. Mostly, though, I had to respond because of this:

What a wonderfully confusing phrase for a clearly understandable concept!

All my “clean data” assumption amounts to, in practice, is expecting title strings to use initial caps for any proper nouns (names etc.) and to refrain from imposing initial capital letters on the little bitty words that need to stay lowercase. (So that’s one positive expectation, and one negative.)

The latter sounds like a lot of what my code samples actually try to do to the output. If the input is already arriving like this, they might as well sleep in! :wink:

:slight_smile: – Yes it is what it is.

In one of your earlier posts you mentioned an editor toolbar button. As a user, I personally would want to select some text and then apply the APA-like formatting, so that I end up with “standard” text only instead of macro or widget calls, which IMO make the plain text harder to read.

Shouldn’t the words "as" and "if" also be in the lowercase list?

I don’t recall from what random site I pulled my list. I’m pretty sure it claimed to be a definitive APA list, but I’m not certain such a thing actually exists.

The list can be changed easily in the code. And if we pursue this, it should be relatively easy to instead make the list dynamic. (That will lead to an optimization versus ease-of-use question, but that’s for another day.)

This inspired an interesting idea. It is very “meta”.

To transform the text in place, rather than make the text a “parameter to a macro or widget”, we may still use the macro or widget, but use it in an editor toolbar button. In effect apply the macro, capture the result and replace the selection. This requires the design of a toolbar button.

  • This is not new.
  • Just note that the output of such transformations is wikitext and markup.

My first idea was

What if we made it very, very easy to go from applying a macro or widget to a selection to having an editor toolbar button that uses that macro or widget to do this for us?

  • I am wondering how we could automate the creation of this button.
  • We could then use any widget or macro solution to create an editor toolbar button that transforms and replaces a selection.
    • The replacement should be wikitext markup, without the macro or widget call. That is, “the result” of the macro or widget.

[Edited] For example

If we had a macro for APA “title case”, <<apa "text to transform">>, then one would select the text to transform, hit a button and select apa, and it would replace the selection with the text transformed according to the APA standard, i.e. the macro’s output.

  • If, however, we can reduce all macro calls to filters, even better. You would then need to have your transformers as filter operators, [<selectedtext>apa[]], maybe in a separate transformer dropdown.

But there is more…

I then had an idea: what about a generic wikitext transform button?

The idea would be you author your tiddler text, including the use of macros and widgets to “transform” pieces or sections. The tiddler will now look as intended in the view template, however the text field in edit mode will contain all the transformation macros and widgets, which you @pmario are not so keen on.

The trick would be a special “transformer” editor button.

Select a slab of text containing wiki text, macros and widgets and the button will evaluate it all, and replace it with the resulting wiki text. Now without any macros or widget calls.

  • Basically, the macros and widgets are just transitory and are used by the author to transform what they enter or paste into the text.
    • I already have an editor toolbar button, designed to insert such widgets or macros from a most-recently-used list.
  • Initially the user selects the part of the text to which this transformation applies.
  • Later we could have a list of “transformer” macros and widgets, and place a button on the whole tiddler: click to transform. But macros and widgets that are used for dynamic content will be left as is.
    [Edited]
  • As reflected above, we could have a purely filter-based set of transformers.
    • We could then put the filters in the text with a tool, and choose later whether they are dynamic or one-off transforms.

To be clear, this is for macros and widgets we are happy to be used once and removed, not macros and widgets that are dynamic.

By dynamic I mean that it will update automatically with future changes in the wiki’s content (such as lists of tiddlers), or even changes in the macros or widgets.

What do you think, will we start a new community effort on this?

[Post Script]

One example of utility: if we had a transclusion {{tiddlername}}, it would be trivial to select this {{tiddlername}} and transform it into the actual content of that tiddler, effectively the reverse of excise, but retaining the separate tiddler.

When it comes to something like citation formats for bibliographic records, one-off conversions are really different from what I’m looking for. Even an old-style macro or widget is not dynamic enough; what’s needed is a procedure (or something similarly powerful) that handles three things dynamically:

(a) corrections to the data (When I fix a misspelling in the author’s name, that fix flows into all citation strings “downstream” that draw on that field)
(b) adjustments to the standard (I take a project tailored to conform to one standard, and now, without changing any of the substantive data, I need a way to get citations appearing in MLA standard this time around, ideally by touching only one setting, in a way that can toggle back and forth, so I don’t destroy the work put into one standard in order to access another standard)
(c) complex embedding, so this FlexTitleCase procedure (or whatever) can be called within a larger wikitext-based frame, and any tweak to the details of the FlexTitleCase procedure (or whatever it is) is a one-time/one-place change that fixes everything “downstream” without needing to dig around in the bowels of the larger viewtemplate (or whatever).

I suspect that most people who would care about a precise titlecase (as in this thread’s title) are interested in dynamic flexibility — or will be, down the road, if their projects grow enough.

But perhaps one use-case for a one-time-conversion tool comes up in the data cleanup phase. Say I get some incoming bibtex records that are in ALLCAPS. This needs to be fixed, because I can’t generally trust any algorithm to parse whether any of the non-initial words in the title need capitalization. There’s no information embedded in a DUMBCAPS record that would be lost by overwriting with lowercase, and I need to level the strings into lowercase in order to add careful capitalization info back in. So I would benefit from an “Impose sentence-case on this mess” button as the first step in cleaning it up. (If I’m lucky, there’s no further cleanup needed for some records.)
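A minimal sketch of what such a button might run, in plain JavaScript with hypothetical names (`isAllCaps`, `imposeSentenceCase`, `cleanIfAllCaps`). The guard is there so that records which already contain mixed case, and therefore real capitalization information, are left alone.

```javascript
// True when the string has at least one uppercase letter
// and no lowercase letters at all.
function isAllCaps(title) {
  return title === title.toUpperCase() && /\p{Lu}/u.test(title);
}

// Lowercase everything, then capitalize the first letter of the string
// and the first letter after ":", ".", "?" or "!" followed by whitespace.
function imposeSentenceCase(title) {
  return title.toLowerCase().replace(
    /(^|[:.?!]\s+)(\p{L})/gu,
    (match, lead, letter) => lead + letter.toUpperCase()
  );
}

// Only touch records that are entirely in capitals.
function cleanIfAllCaps(title) {
  return isAllCaps(title) ? imposeSentenceCase(title) : title;
}
```

For example, `cleanIfAllCaps("THE REPUBLIC OF PLATO: A NEW TRANSLATION")` yields "The republic of plato: A new translation", which still needs "Plato" restored by hand, while "The Republic of Plato" passes through unchanged.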

Of course, in the big picture, it would be most efficient to apply this not in a one-off way, but with a whole batch of records (on specific fields or across fields) — perhaps through the Tiddler Commander tool that @Mohammad has developed, or even an option at the import-records stage.

I do imagine, though, that I’d be somewhat nervous about tinkering with such a conversion action affordance (that is, tinkering to make it handle complex contextual details via regexp), since if there’s a glitch in its implementation, data can be lost — perhaps even in ways that wouldn’t be noticed until later. :flushed:

Thanks for clarifying your viewpoint. We use a similar mechanism for the TW docs, where we have several doc-macros solely so that we can modify them globally, as you described. That’s a valid use case.

My personal desire to have “title-cased” headings in plain text using a toolbar button is a completely different one.

Implementing the algorithm with filters, we can satisfy both of them.

There are good ideas here, but some of them will be hard to implement.

At the moment we cannot create “complex” wikitext output from rendered macro or widget output. Rendered output is HTML, which does not carry enough information about how it was created to convert it back to wikitext without information loss.

There is a 3rd-party converter that converts HTML syntax back to “simple” wikitext syntax, such as headings, lists, bold, italic and so on. For many use cases this is OK. → This plugin uses a toolbar button.

So I’m not sure we should re-implement it in the core, since the plugin already exists.

It is clear the functionality you want is dynamic. In such cases you need to maintain a reference to a function or procedure definition that is defined elsewhere and that you can change.

  • In this case you would not use the above transformers, as they generate static results by definition.
  • Perhaps if there were a wikitext way to apply a macro. I am thinking of the way we can apply a class to wikitext symbols: *.classname bullet text. I believe @pmario’s custom wikitext solution can do this. Perhaps we could extract a subset.
  • I wonder if we could use relink here?

Converting to HTML is a valid approach to produce static results, but it does reduce readability compared to wikitext.

Perhaps we can use the substitution operator, backtick parameters or \define to do our reformatting, and thus not let the wikitext get rendered by our transformation?

  • The trick will be understanding the input.

The plugin converts HTML to wikitext. So <h1>test</h1> will become ! test
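Just to illustrate that mapping (this is not the plugin’s code, and `headingsToWikitext` is a made-up name), a heading conversion can be sketched with a single regular expression. It assumes simple headings without attributes or nested tags; a real converter would need a proper HTML parser.

```javascript
// Turn <h1>..</h1> through <h6>..</h6> into wikitext headings,
// using one "!" per heading level.
function headingsToWikitext(html) {
  return html.replace(/<h([1-6])>(.*?)<\/h\1>/g, (match, level, text) => {
    return "!".repeat(Number(level)) + " " + text;
  });
}
```

So `headingsToWikitext("<h1>test</h1>")` returns "! test", and an `<h3>` would come out with three exclamation marks.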

I was not referring to the plugin but to replacing selected wikitext containing macros with the wikified HTML result; but that is not good enough.

  • I think that because filters understandably focus on titles, we are at a slight disadvantage when writing filters to manipulate strings, including those containing wikitext.
  • A few simple filter operators, whether coded in JavaScript or written as custom filter operators/functions, may be sufficient to improve this.

Can you find this plugin and provide a reference?

  • Although I am not currently looking for it.