Wordsmithing stop words, verbs past and present tense

TW_Tones · September 15, 2022, 8:15am

Folks,

I am keen to progress some ideas that make use of standard English found in titles or paragraphs, and as a result I am collecting lists of stop words and of verbs in past and present tense.

Do you have a view on the following?

Should I use JSON tiddlers to store these lists and test whether a given word is in a data tiddler, i.e. has a key/index? Or should I import the lists using CSV and a JSON mangler to create shadow tiddlers in a plugin, and simply test whether the word has a shadow tiddler?

As I progress I will share this data here for others to make use of.

TW_Tones · September 16, 2022, 2:23am

To give this Topic a little bump, consider this;

List all or a subset of tiddlers named with plain English titles.

Now remove the stop words to see the words that are most meaningful. These remaining words could be used;
- in a search across other tiddlers to find related content
- to locate possible duplicates with similar names
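
To make the idea concrete, here is a minimal sketch in Python rather than wikitext, so it only models the logic and is not a TiddlyWiki implementation. The tiny `STOP_WORDS` set and the function names are my own placeholders; a real list would hold hundreds of entries.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}

def meaningful_words(title, stop_words=STOP_WORDS):
    """Return the words of a title, in order, with stop words removed."""
    words = re.findall(r"[A-Za-z']+", title)
    return [w for w in words if w.lower() not in stop_words]

def shared_words(title_a, title_b):
    """Lower-cased meaningful words that two titles have in common."""
    a = {w.lower() for w in meaningful_words(title_a)}
    b = {w.lower() for w in meaningful_words(title_b)}
    return a & b
```

Titles that share several meaningful words are candidates for "related content" or "possible duplicate"; how many shared words count as a match is left open here.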
Similarly, search the original title for all verbs (doing words) to find tiddlers that;
- Suggest something needs to be done
- If in the past tense, describe things that have been done and as a result have become history or knowledge
In addition, we could look at the tiddler title suffix for punctuation such as ? or !, and combinations thereof, to discover outstanding or answered questions, assertions of fact, or assertions that are being questioned.
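
A rough sketch of the suffix idea, again in Python as a model only. The mapping from punctuation to labels is entirely my assumption; the thread does not settle which combination means what.

```python
import re

# Assumed mapping from trailing punctuation to a label (hypothetical, not agreed anywhere).
SUFFIX_LABELS = {
    "?": "open question",
    "?!": "answered question",
    "!": "assertion",
    "!?": "questioned assertion",
}

def classify_title(title):
    """Label a title by the run of ? and ! characters at its end."""
    match = re.search(r"[?!]+$", title.rstrip())
    if not match:
        return "unmarked"
    return SUFFIX_LABELS.get(match.group(), "unrecognised")
```

Any other combination, such as "???", falls through to "unrecognised" so that it can be shown to the user rather than guessed at.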
Provide a tool to automatically rename a tiddler's verbs from present to past tense, indicating that it is no longer outstanding.
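
The rename could be driven by a simple present-to-past lookup, sketched below in Python. The `PAST_TENSE` table is a hypothetical sample; in practice it would be built from the verb lists being collected, and irregular verbs are the reason a table seems safer than a rule such as "append -ed".

```python
import re

# Hypothetical present -> past pairs (sample data for illustration only).
PAST_TENSE = {"write": "wrote", "fix": "fixed", "call": "called", "send": "sent"}

def to_past_tense(title, table=PAST_TENSE):
    """Replace every known present-tense verb in a title with its past tense."""
    def swap(match):
        word = match.group()
        past = table.get(word.lower())
        if past is None:
            return word  # not a known verb, leave untouched
        # Keep a leading capital if the original word had one.
        return past.capitalize() if word[0].isupper() else past
    return re.sub(r"[A-Za-z]+", swap, title)
```

Words that are not in the table are left alone, so an incomplete verb list degrades to "no change" rather than to a wrong rename.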

Note also that this need not apply only to tiddler titles; it could also apply to lines, sentences or even paragraphs in a tiddler's content.

Of course, if this is successful we could do this for any language.

So what do you think is the best method to store and test words found in titles and content against large word lists of 100-1000 words?

Data tiddler
Shadow tiddlers
Word tiddlers
Variables or list fields containing the wordlists.

Which methods are most performant given the way TiddlyWiki works, for example by using indexes or by saving storage space or working memory?
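
To show the two main shapes side by side, here is a small Python model of "one data tiddler with a key per word" versus "one tiddler per word". It says nothing about real TiddlyWiki performance; the title prefix `$:/wordlists/stop/` and all names are assumptions for illustration.

```python
import json

# Option 1: one JSON data tiddler whose keys (indexes) are the words.
data_tiddler_text = json.dumps({"the": "", "and": "", "of": ""})
stop_word_index = json.loads(data_tiddler_text)

def in_data_tiddler(word):
    """True if the word is a key in the parsed data tiddler."""
    return word.lower() in stop_word_index

# Option 2: one tiddler per word, under a hypothetical title prefix.
PREFIX = "$:/wordlists/stop/"
tiddler_titles = {PREFIX + w for w in ("the", "and", "of")}

def has_word_tiddler(word):
    """True if a tiddler titled PREFIX + word exists."""
    return PREFIX + word.lower() in tiddler_titles
```

In this model both tests are a single hash lookup, so the difference between the options would come from elsewhere: how often the JSON has to be parsed, how many tiddlers the store has to carry, and how easy it is to attach extra fields to a single word.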

[Edited] Additional content

Further to comparing the words used against word lists, it would make sense to let the user characterise found words that belong to other “word sets”, such as nouns;

What is a noun? A noun is a word that names something, such as a person, place, thing, or idea.

Let's say we have removed verbs and stop words from a tiddler's title or from a line of a paragraph. Various words will remain, including for example a person's name, e.g. “Tony”.

If we make it simple to do so, we can add “Tony” to a list of nouns, either in a dataset or as a tiddler (the key question in this thread). It then makes sense to indicate what kind of noun it is: a person, thing or place. A capitalised first letter lets us determine whether it is a proper noun or a common noun.
In this case and others we could use a trailing “s” as a sign of a plural to discover a set, perhaps with a list of plurals not ending with an “s”, e.g. sheep/set/pack/package.
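
Both checks are simple heuristics, sketched here in Python as a model only. The function names are mine, and the `IRREGULAR_PLURALS` set just reuses the examples above; a real list would be longer.

```python
# Hypothetical list of plural or collective words that do not end in "s".
IRREGULAR_PLURALS = {"sheep", "set", "pack", "package"}

def is_proper_noun_candidate(word):
    """A capitalised first letter suggests, but does not prove, a proper noun."""
    return word[:1].isupper()

def is_plural_candidate(word, irregular=IRREGULAR_PLURALS):
    """A trailing "s" or membership of the irregular list suggests a plural or set."""
    lower = word.lower()
    return lower in irregular or (len(lower) > 1 and lower.endswith("s"))
```

These only produce candidates. The first word of a title is capitalised whether or not it is a name, and words like "bus" end in "s" without being plural, which is exactly where asking the user comes in.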

Using the smartest part of the computer, the person using it.

With various systematic discoveries as above, and by asking simple questions such as the type of noun or whether it is plural or singular, TiddlyWiki could generate a list of really simple questions, many answered simply with a checkbox.
Such questions and the resulting answers would become relationships and, where appropriate, tiddlers. Have an idle moment? Look at a list of simple but outstanding questions about the content in your wiki and answer them.
As the answers become encoded in your wiki, new sets, tiddlers and relationships will be added, enhancing the discovery process even more.
- For example, it may ask whether this is the same “Tony” referenced in tiddlername and, if not, ask for a surname for this one and the other, and provide a disambiguation tiddler. E.g. the “Tony” tiddler is a proper noun but there are two instances of “Tony”: “Tony Hwan” and “Tony Choo”.
- We can choose when a new tiddler will be created and thus appear in the standard search.
Using such algorithms would add more information to your wiki and increase the order and knowledge contained within. See also Measure of order in a tiddlywiki - #15 by TW_Tones

Temporary data answering my own Question (This Topic)?

The advantage of word lists being shadow tiddlers in a plugin is that, in a “learning mode”, a number of large word lists can be accessed as shadow tiddlers in data plugins. If used, these word tiddlers can be edited, perhaps adding some context information to the tiddler (e.g. first used in tiddlername). Then, when about to publish online, you can remove these long word-list plugins, retaining only the edited ones, thereby reducing the published wiki's size and complexity. They can always be added back later for more authorship.