I’m tweaking my bibliographic resource toward working well as a demo/model project that amplifies the power of RefNotes tools and maximizes “intertwingularity”. And I’m hitting a stumbling block that is not unique to me…
Objective: On the hyperlinked interface I want, any author name within any bibliographic record will function as a link to a virtual tiddler gathering info about that author. (Fancier: an author name will function as a link only if there’s more to browse about that author, beyond what’s in the tiddler being displayed.)
Any “missing” tiddler can serve as a virtual tiddler/node (gathering a dynamic-table overview of bibliographic resources attributed to that author) if the name of that missing tiddler appears in the bibtex-author field of existing tiddlers.
All good so far in theory. I have been developing such “missing/virtual” utility/hub tiddlers for a while now. (Author names could alternately show up as filter-pills in ways that @TW_Tones has been developing.)
First Wrinkle: bibtex records as they come in “from the wild” include both `LastName, FirstName MiddleName` format and `FirstName MiddleName LastName` format for author names, plus other variants such as `LastName, F M`. It’s possible that I could modify the import process so as to standardize somewhat. Cleaning incoming data and sticking to a standard is generally a good thing, but it’s nowhere near a sufficient solution to this problem.
(In case anyone doesn’t see why “cleaning up” in batch-mode or at import-time doesn’t help much: Some records come in with only first initials, as in “F M Alexander”. If I overwrite the actual field data for both “Mills, Charles W.” and “C Wright Mills” down to the “least-common-denominator” value of “C W Mills” — in order to standardize for matching purposes, then I lose actual information. Charles Wade Mills (who goes by Charles W Mills) will end up irreversibly conflated with the author who goes by C Wright Mills (both real authors I do cite!). Now I grant I may inherit some records that list only “C W Mills” as author. Still, I can avoid making the problem worse!)
So, I wonder what’s the most elegant filter-based way to make sure that the missing/virtual tiddler for `Jane Addams` (which someone can click wherever that name appears in a `bibtex-author` role) could, in its dynamic table or filter-pill, pull up tiddlers with “Jane Addams” and ALSO tiddlers with “Addams, Jane” in the `bibtex-author` field.
The simple connection between `LastName, FirstName` and `FirstName LastName` variants would be a great start, and I think I could accomplish this by myself. But I pause because…
Additional Complication: The missing/virtual tiddler (or filter-pill) associated with author string “Nussbaum, Martha C.” should catch tiddlers with `bibtex-author` values “Nussbaum, Martha” and “Martha Nussbaum” and “M Nussbaum” and “M C Nussbaum” and “Martha C. Nussbaum” and “Martha Craven Nussbaum” but not “Martha H Nussbaum” (or any other incompatible variant on the name, with some 98%-sufficient rule-of-thumb algorithm… we could get into the weeds with Jr. and such… cross some bridges later!)
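To pin down what I mean by “compatible”, here is a minimal sketch in Python (not TiddlyWiki filter syntax, just a way to state the rule precisely). The function names `name_tokens`, `parts_compatible` and `weak_match` are made up for illustration. It assumes the last word of the reordered name is the family name, and it ignores suffixes like Jr. entirely.

```python
import re
from typing import List


def name_tokens(name: str) -> List[str]:
    """Lowercased name parts in First ... Last order, with periods dropped."""
    # Exactly one comma: move whatever precedes it to the end.
    if name.count(",") == 1:
        last, rest = name.split(",")
        name = f"{rest} {last}"
    return [t for t in re.split(r"[\s.]+", name.lower()) if t]


def parts_compatible(a: str, b: str) -> bool:
    """Full parts must be equal; a one-letter part matches any part it starts."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def weak_match(x: str, y: str) -> bool:
    """True when the last parts agree and no first/middle part conflicts."""
    tx, ty = name_tokens(x), name_tokens(y)
    if not tx or not ty or tx[-1] != ty[-1]:
        return False
    # zip() stops at the shorter name, so a missing middle name never conflicts.
    return all(parts_compatible(a, b) for a, b in zip(tx[:-1], ty[:-1]))
```

Under this rule “Nussbaum, Martha C.” accepts all six variants listed above and rejects “Martha H Nussbaum”. It also treats “C Wright Mills” and “Mills, Charles W.” as a weak match (compatible initials) while rejecting “Charles Wade Mills” against “C Wright Mills”, which is the kind of weak false positive I can live with.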
The basic trick for the filter is to standardize an author-name string into `FirstName MiddleName LastName` format (if there’s exactly one comma in the string, bump whatever’s before it to the end), and then to check it against values in the `bibtex-author` field of other tiddlers… which may require splitting those fields at the `;` character (since sometimes there are multiple authors, but perhaps we’ll focus on single-author works for now!), and setting a variable to `FirstName... LastName` standard to check for the right kind of match (meaning a match that avoids glaring false positives, such as names with conflicting first/middle data, while erring on the side of weak false positives with compatible initials).
(The reason to standardize into that `FirstName ... LastName` order rather than `LastName, FirstName M [etc]` is that otherwise our algorithm will have to impose some guess about whether “Simone de Beauvoir” ought to be “Beauvoir, Simone de” or “de Beauvoir, Simone” and whether “Gabriel García Márquez” ought to be “García Márquez, Gabriel” or “Márquez, Gabriel García” (etc.), all of which invites more trouble than we want!)
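The reorder-and-split step described above can be sketched like this. Again it is Python rather than a TiddlyWiki filter, and it only handles exact matches after reordering. The helper names and the record titles (`rec-a` etc.) are hypothetical, and I’m assuming `;` is the only author separator.

```python
from typing import Dict, List


def to_first_last(name: str) -> str:
    """Turn 'Last, First Middle' into 'First Middle Last'; leave the rest alone."""
    name = " ".join(name.split())  # collapse stray whitespace
    if name.count(",") == 1:
        last, rest = (part.strip() for part in name.split(","))
        if last and rest:
            return f"{rest} {last}"
    return name


def authors_in_field(field: str) -> List[str]:
    """Split a multi-author field at ';' and reorder each name."""
    return [to_first_last(a) for a in field.split(";") if a.strip()]


def records_for(author: str, records: Dict[str, str]) -> List[str]:
    """Titles of records whose author field contains the (reordered) author."""
    wanted = to_first_last(author)
    return [title for title, field in records.items()
            if wanted in authors_in_field(field)]


# Hypothetical records: title -> raw bibtex-author field value.
records = {
    "rec-a": "Addams, Jane",
    "rec-b": "Jane Addams; Nussbaum, Martha C.",
    "rec-c": "Martha Nussbaum",
}
print(records_for("Jane Addams", records))
```

Because the comma form is only ever reordered, never guessed at, “Beauvoir, Simone de” and “de Beauvoir, Simone” both come out as “Simone de Beauvoir”.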
Why I’m coming to Y’all: Of course, I am willing to put a bunch of trial-and-error into this from my end. However, I suspect:
- RegExp wizards may be able to do this much more easily than I can;
- A solution to this name-order problem may actually be useful for other purposes, such as projects that batch-import or otherwise inherit name data in multiple formats and levels of completeness. For example:
  - genealogical records
  - student names
  - employee names
Dreaming BIG: Maybe TiddlyWiki could eventually offer a new type for the `compare` filter operator so that a filter can use `compare:person-name:weak-match` (for example) to catch weak matches (regardless of whether `LastName, FirstName` convention is used, and regardless of whether first/middle names are reduced to an initial, with or without period, flattening all diacritic-marked characters to ascii, all case-insensitive, etc.), while filtering out genuine conflicts. Something like `compare:person-name:strong-match` could compensate only for the lastname-order variation, plus perhaps certain differences in punctuation.
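To make the “strong” end of that idea concrete, here is a rough Python sketch of the folding it would need. To be clear, no such `person-name` type exists in TiddlyWiki; `fold`, `strong_key` and `strong_match` are names I’m inventing here. This version goes slightly beyond order and punctuation by also folding case and diacritics, and it handles only accents that decompose into a base letter plus a combining mark.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining accents and fold case (e.g. 'Márquez' -> 'marquez')."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()


def strong_key(name: str) -> str:
    """Comparison key: First ... Last order, folded, periods and extra spaces dropped."""
    if name.count(",") == 1:
        last, rest = name.split(",")
        name = f"{rest} {last}"
    cleaned = fold(name).replace(".", " ")
    return " ".join(cleaned.split())


def strong_match(a: str, b: str) -> bool:
    """Same name once order, case, accents and periods are ignored."""
    return strong_key(a) == strong_key(b)
```

With this, “García Márquez, Gabriel” and “gabriel garcia marquez” compare equal, while “Martha C. Nussbaum” and “M C Nussbaum” do not; the latter would need the weaker initial-tolerant comparison.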
I’m not the best person to invent this wheel. But I can describe it!
(And to be fair, I think some folks have grappled with variants on this problem before, including @Mohammad. Please do point me to any threads that document progress on this front, since my only impression is that things were left mostly unresolved!)