Identifying proper nouns

That’s probably my fault. I responded to the wrong topic. When I participated in the other thread about finding proper nouns, I misunderstood something fundamental. I thought the goal was to find an automated, repeatable way to recognize proper nouns in a string of text, presumably a tiddler. This topic (edit: that is, Searching/indexing web site/page) made it clear that this is a one-off operation, an attempt to simplify some manual conversion effort. So a number of false positives, as long as they’re not overwhelming, would probably not be an issue, and that opens up more possibilities.

I will try to move these posts to that topic.

I find this an interesting problem. I don’t know if I’ll find time, but if I do, would a version that gave results like the following be helpful?

{
  /* ... */
  "Botanic Gardens": [
    "make me that survive, the Botanic Gardens, the Domain, Trumper Park, Moore Park: they are the "
  ],
  /* ... */
  "But": [
    "notice it at the time, but in retrospect it seems to have been happy. It is",
    "But it wasn’t really the football, it was the atmosphere",
    "prepare from Friday night onwards, but going to Trumper is easy.",
    /* ... */
  ],
  /* ... */
  "Trumper Park": [
    "Trumper Park has been persistent in my life.",
    "the people who went to Trumper Park on Sunday"
  ],
  /* ... */
}

(or perhaps with the potential proper noun in ALL CAPS?)
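
To make the idea concrete, here is a rough sketch of how an index like the one above might be built. It is only an illustration under some loud assumptions: a "candidate" is simply a run of capitalized words (so sentence-initial words like "But" will show up as false positives), occurrences are then looked up case-insensitively, and the names `CANDIDATE`, `index_candidates`, `context` and `highlight` are all made up for this example. The `highlight` flag covers the ALL CAPS variant.

```python
import json
import re

# Assumption: a candidate is one or more capitalized words separated by
# single spaces. This is a crude heuristic, not real proper-noun detection.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")


def index_candidates(text, context=40, highlight=False):
    """Map each capitalized phrase to snippets around every occurrence of it.

    Occurrences are matched case-insensitively, so a sentence-initial word
    such as "But" also collects the places where "but" appears mid-sentence.
    """
    candidates = sorted({m.group() for m in CANDIDATE.finditer(text)})
    index = {}
    for phrase in candidates:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        snippets = []
        for m in pattern.finditer(text):
            before = text[max(0, m.start() - context):m.start()]
            after = text[m.end():m.end() + context]
            hit = m.group().upper() if highlight else m.group()
            # Collapse runs of whitespace (PDF text often has stray newlines).
            snippets.append(" ".join((before + hit + after).split()))
        index[phrase] = snippets
    return index


if __name__ == "__main__":
    sample = (
        "Trumper Park has been persistent in my life. "
        "But it was fun, but going to Trumper is easy."
    )
    print(json.dumps(index_candidates(sample, context=20), indent=2))
```

Because every capitalized run becomes a key, the result is deliberately over-inclusive; the intent is that a person skims the snippets and discards the obvious non-names by hand, which fits a one-off conversion.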

How useful this is depends directly on the quality of the underlying PDF → text conversion, which for this document is not great.

But if this might be helpful, I’ll see if I can find the time. As I said, I find it an interesting problem.